(59ag) Chemistry-Aware Retrosynthesis and Forward Reaction Prediction Using SMILES Grammar Tree Transformer | AIChE

Authors 

Zhang, K., Columbia University
Venkatasubramanian, V., Columbia University

With recent advances in machine learning algorithms, complemented by significant improvements in computational capabilities (better hardware, faster processing, and cheaper memory), computational chemistry is seeing applications that leverage complex machine learning models. Some of these methods have proven extremely successful, thanks to the inherent efficiency of machine learning models in capturing the complex, non-linear dependencies among the factors that govern reaction systems. To this end, several purely black-box approaches to modeling chemical reactions have been developed. Although such approaches may appear to perform well on traditional prediction metrics, there is a disconnect between the underlying model architectures and the chemistry principles that human chemists apply when performing these tasks.

In this work, we have built chemistry-aware retrosynthesis prediction and forward reaction prediction models that combine powerful data-driven models with chemistry knowledge. We represent molecules using a hierarchical tree representation [1,2] that captures underlying chemistry information otherwise ignored by models based on purely SMILES string representations [3,4]. Using these chemistry-aware representations, we perform functional group-based convolution operations before the modeling step. We report a significant improvement in model performance on both the forward reaction prediction task (given reactants, predict the product) and the retrosynthesis prediction task (given a target molecule, predict its precursors). We conclude that combining chemistry knowledge with powerful model architectures is required to develop deployable models that can be used in practice.
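To make the grammar-tree idea concrete, the sketch below parses a toy SMILES subset with a hand-written recursive-descent parser and flattens the resulting tree into a pre-order rule sequence, which is the general shape of input a grammar-based sequence model can consume instead of raw characters. This is a minimal illustration only: the three-production grammar, function names, and node format are assumptions made for this sketch, not the representation used in [1,2] or a full SMILES grammar.

```python
# Minimal sketch (not the authors' code): parse a toy SMILES subset
# into a grammar parse tree. The grammar is illustrative and far
# smaller than real SMILES:
#
#   chain  -> atom branch? chain?
#   branch -> '(' chain ')'
#   atom   -> 'C' | 'N' | 'O'

ATOMS = {"C", "N", "O"}

def parse_chain(s, i=0):
    """Parse a chain at index i; return (node, next_index).

    Each node is (rule_label, children), where children are child
    nodes or terminal symbols.
    """
    if i >= len(s) or s[i] not in ATOMS:
        raise ValueError(f"expected atom at position {i}")
    children = [("atom", [s[i]])]
    i += 1
    if i < len(s) and s[i] == "(":          # optional branch
        node, i = parse_branch(s, i)
        children.append(node)
    if i < len(s) and s[i] in ATOMS:        # optional continuation
        node, i = parse_chain(s, i)
        children.append(node)
    return ("chain", children), i

def parse_branch(s, i):
    """branch -> '(' chain ')'"""
    inner, i = parse_chain(s, i + 1)        # skip '('
    if i >= len(s) or s[i] != ")":
        raise ValueError(f"unclosed branch at position {i}")
    return ("branch", [inner]), i + 1       # skip ')'

def parse(smiles):
    """Parse a full toy-SMILES string into a grammar tree."""
    tree, i = parse_chain(smiles)
    if i != len(smiles):
        raise ValueError(f"trailing input at position {i}")
    return tree

def preorder(node):
    """Flatten the tree into a pre-order symbol sequence: the kind of
    grammar-rule sequence a transformer could be trained on."""
    label, children = node
    out = [label]
    for child in children:
        out += [child] if isinstance(child, str) else preorder(child)
    return out

print(preorder(parse("CC(O)C")))  # branched four-token toy string
```

The key design point the sketch mirrors is that the tree makes chemically meaningful structure (here, branching) explicit in the representation, whereas a character-level SMILES model must rediscover matching parentheses on its own.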

References:

1. Mann, Vipul, and Venkat Venkatasubramanian. "Predicting chemical reaction outcomes: A grammar ontology‐based transformer framework." AIChE Journal 67.3 (2021): e17190.

2. Mann, Vipul, and Venkat Venkatasubramanian. "Retrosynthesis prediction using grammar-based neural machine translation: An information-theoretic approach." Computers & Chemical Engineering 155 (2021): 107533.

3. Schwaller, Philippe, et al. "Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction." ACS Central Science 5.9 (2019): 1572-1583.

4. Tetko, Igor V., et al. "State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis." Nature Communications 11.1 (2020): 5575.