(196f) Computational Completion of Partial Chemical Reaction Equations | AIChE

Authors 

Weber, J. - Presenter, Technische Universiteit Delft
van Wijngaarden, M., Technische Universiteit Delft
Vogel, G., TU Delft
The availability of large chemical reaction data sets has significantly influenced the rapid development of machine learning models for chemical prediction tasks. For instance, machine learning models trained on the USPTO dataset successfully predict reaction products, reaction conditions, retrosynthesis routes, and more. Additionally, large-scale chemical reaction data offer a unique opportunity for reaction pathway optimisation, i.e., finding an optimal sequence of chemical reactions from feedstock to product molecules. Yet most reaction records contain incomplete reaction equations, i.e., equations with missing co-reactants and co-products and/or missing stoichiometric coefficients. This potentially poses a challenge for all algorithms trained on the data, but it is a particular bottleneck for reaction pathway optimisation, where mass balances are used to formulate model constraints [1].
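
To make the mass-balance bottleneck concrete, the block below sketches a generic flux-based balance constraint of the kind used in reaction network optimisation. The symbols are assumptions introduced here for illustration and are not taken from the abstract: \nu_{ij} is the stoichiometric coefficient of species i in reaction j, f_j is the flux through reaction j, and b_i is the net production of species i.

```latex
\begin{equation}
  \sum_{j \in \mathcal{R}} \nu_{ij} \, f_j = b_i
  \qquad \forall\, i \in \mathcal{S},
  \qquad f_j \ge 0 \quad \forall\, j \in \mathcal{R}
\end{equation}
```

Under this reading, a record with a missing co-product or an unknown coefficient leaves some \nu_{ij} undefined, so the balance for species i cannot be written correctly.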

We first illustrate the reaction pathway optimisation problem based on large-scale chemical reaction data [2] and then turn to the task of completing partial reaction equations [3]. We combine two approaches for the computational completion of partial equations: a chemical rule-based method and a machine learning (ML) model, a fine-tuned version of the Molecular Transformer. The rule-based method balances incomplete reactions using a linear solver and different sets of small molecules (helper species). The ML model takes partial reactions as input and predicts the molecules and stoichiometries that complete the reaction equation. It is trained on the data previously completed by the rule-based method. Our approach uses the USPTO STEREO chemical reaction data set and completes almost half of the reactions with the rule-based method. The fine-tuned Molecular Transformer achieves > 99% validity in predicted SMILES and > 85% top-1 accuracy in the interpolation task, yet it remains limited in its extrapolation capabilities. We benchmark our work against a similar hybrid approach that was developed concurrently [4]. Our results suggest that an autoregressive encoder-decoder transformer is a well-suited model choice, and they represent a significant step towards completing large-scale chemical reaction data.
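
The idea of balancing a partial reaction with helper species and a linear solver can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `element_deficit` and `solve_helpers`, the dictionary-based species representation, and the choice to fix the coefficients of the recorded species are all hypothetical. The sketch computes the per-element atom deficit of the recorded reaction and solves exactly for integer helper coefficients.

```python
from fractions import Fraction


def element_deficit(reactants, products):
    """Atoms on the reactant side minus atoms on the product side, per element.

    Each side is a list of (coefficient, {element: count}) pairs.
    """
    deficit = {}
    for sign, side in ((1, reactants), (-1, products)):
        for coeff, composition in side:
            for element, count in composition.items():
                deficit[element] = deficit.get(element, 0) + sign * coeff * count
    return deficit


def solve_helpers(deficit, helpers):
    """Solve A x = d exactly, where column k of A is the composition of helper k.

    Returns {helper_name: integer coefficient} (positive: add to products,
    negative: add to reactants), or None if there is no unique integer solution.
    """
    names = list(helpers)
    n = len(names)
    elements = sorted(set(deficit) | {e for h in helpers.values() for e in h})
    # Augmented matrix [A | d], one row per element.
    rows = [
        [Fraction(helpers[name].get(e, 0)) for name in names]
        + [Fraction(deficit.get(e, 0))]
        for e in elements
    ]
    pivot_row, pivot_cols = 0, []
    for col in range(n):  # Gauss-Jordan elimination with exact fractions
        pivot = next(
            (r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        scale = rows[pivot_row][col]
        rows[pivot_row] = [v / scale for v in rows[pivot_row]]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_cols.append(col)
        pivot_row += 1
    # A zero row with a non-zero right-hand side means the helpers cannot balance it.
    if any(all(v == 0 for v in row[:n]) and row[n] != 0 for row in rows):
        return None
    if len(pivot_cols) < n:  # underdetermined: no unique completion
        return None
    solution = {names[c]: rows[i][n] for i, c in enumerate(pivot_cols)}
    if any(v.denominator != 1 for v in solution.values()):
        return None
    return {name: int(v) for name, v in solution.items() if v != 0}


# Hypothetical helper set and a partial esterification record (water is missing).
HELPERS = {"H2O": {"H": 2, "O": 1}, "HCl": {"H": 1, "Cl": 1}}
acetic_acid = {"C": 2, "H": 4, "O": 2}
ethanol = {"C": 2, "H": 6, "O": 1}
ethyl_acetate = {"C": 4, "H": 8, "O": 2}

deficit = element_deficit([(1, acetic_acid), (1, ethanol)], [(1, ethyl_acetate)])
print(solve_helpers(deficit, HELPERS))  # {'H2O': 1}
```

In this example the solver adds one water molecule to the product side. A reaction whose deficit contains an element that no helper provides returns `None`, which corresponds to a record that the rule-based step alone cannot complete.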

References

[1]: Voll, A., & Marquardt, W. (2012). Reaction network flux analysis: Optimization‐based evaluation of reaction pathways for biorenewables processing. AIChE Journal, 58(6), 1788-1801.

[2]: Weber, J. M., Guo, Z., & Lapkin, A. A. (2022). Discovering Circular Process Solutions through Automated Reaction Network Optimization. ACS Engineering Au, 2(4), 333-349.

[3]: van Wijngaarden, M., Vogel, G., & Weber, J. M. (2024). Completing Partial Reaction Equations with Rule and Language Model-based Methods. Computer Aided Chemical Engineering. In press.

[4]: Zhang, C., Arun, A., & Lapkin, A. A. (2023). Completing and balancing database excerpted chemical reactions with a hybrid mechanistic-machine learning approach. ChemRxiv.