(297a) Embedding Stereoelectronic Effects into Molecular Representations for Machine Learning | AIChE

Authors 

Gomes, G. - Presenter, University of Toronto
Molecular representation learning lies at the intersection of machine learning and chemistry and is crucial for property prediction tasks. Many representation methods have been developed to date, ranging from simple physicochemical descriptors to sophisticated geometric models. However, existing approaches typically capture only covalent interactions and ignore non-bonding electronic information. To address this problem, we propose a novel molecular representation that generalizes to other interactions by incorporating stereoelectronic effects. We accomplish this by building on Natural Bond Orbital (NBO) analysis, a well-established method for capturing complex electronic interactions. To limit the scope of this work to organic molecules, we applied NBO analysis to the structures in the QM9 dataset. We then built a graph neural network that predicts these interactions in a supervised fashion. The method yields a heterogeneous graph with representations for atom, bond, and lone-pair orbitals, augmented with atom-specific orbitals and second-order interaction information. It also suggests a more general way to represent molecular structures: in addition to nodes for atoms and edges for chemical bonds, the graph includes bond orbital nodes and second-order interaction edges. We developed an end-to-end pipeline that takes a molecular geometry as input and returns a representation carrying stereoelectronic information. First, we enrich the standard molecular graph with lone pairs (predicted by a separate network) and bond orbital nodes, which are solved analytically for a given geometry. Second, we use a multi-task approach that learns all the interaction values provided by the NBO analysis.
Finally, we use the resulting representations, infused with stereoelectronic information, in downstream tasks; our experiments show that they can improve performance. The approach is particularly well suited to learning from limited datasets, with strong applicability to catalysis and reaction design.
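
As an illustration of the graph structure described above, the following is a minimal Python sketch and not the authors' implementation. It builds a toy heterogeneous graph with atom, lone-pair, and bond orbital nodes, then adds candidate second-order interaction edges from donor to acceptor orbitals. All class, function, and node-type names are hypothetical. The split of each bond into a bonding and an antibonding orbital follows the usual NBO donor/acceptor picture and is an assumption about how the nodes are organized. Lone-pair counts are passed in by hand here, standing in for the separate network mentioned in the abstract, and the interaction value on each edge is left empty because it is the quantity a model would be trained to predict.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroMolGraph:
    """Toy heterogeneous graph: typed nodes plus typed edges with attributes."""
    nodes: list = field(default_factory=list)  # index -> (node_type, label)
    edges: list = field(default_factory=list)  # (src, dst, edge_type, attrs)

    def add_node(self, node_type, label):
        self.nodes.append((node_type, label))
        return len(self.nodes) - 1

    def add_edge(self, src, dst, edge_type, **attrs):
        self.edges.append((src, dst, edge_type, attrs))

    def ids(self, *node_types):
        return [i for i, (t, _) in enumerate(self.nodes) if t in node_types]

    def edges_of_type(self, edge_type):
        return [e for e in self.edges if e[2] == edge_type]


def build_graph(atoms, bonds, lone_pairs):
    """atoms: element symbols; bonds: (i, j) index pairs;
    lone_pairs: {atom_index: count}, assumed given (hypothetical input)."""
    g = HeteroMolGraph()
    # Atoms are added first, so their node ids equal their input indices.
    for symbol in atoms:
        g.add_node("atom", symbol)
    for i, j in bonds:
        g.add_edge(i, j, "covalent")
        # Assumption: one bonding and one antibonding orbital node per bond.
        for node_type in ("bond_orbital", "antibond_orbital"):
            orb = g.add_node(node_type, f"{atoms[i]}{i}-{atoms[j]}{j}")
            g.add_edge(orb, i, "orbital_of")
            g.add_edge(orb, j, "orbital_of")
    for i, count in lone_pairs.items():
        for k in range(count):
            lp = g.add_node("lone_pair", f"LP({k + 1}) {atoms[i]}{i}")
            g.add_edge(lp, i, "orbital_of")
    return g


def add_interaction_edges(g):
    """Connect every filled (donor) orbital to every empty (acceptor) orbital.
    The attribute e2 is a placeholder for the value to be learned."""
    donors = g.ids("lone_pair", "bond_orbital")
    acceptors = g.ids("antibond_orbital")
    for d in donors:
        for a in acceptors:
            g.add_edge(d, a, "second_order", e2=None)
    return g


# Water as a small example: O bonded to two H atoms, two lone pairs on O.
graph = add_interaction_edges(
    build_graph(["O", "H", "H"], [(0, 1), (0, 2)], {0: 2})
)
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
```

In a real pipeline the candidate edges would likely be pruned, for example by distance or by an energy threshold, and the node and edge tables would be converted into tensors for a graph neural network library; those choices are outside what the abstract specifies.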