(186f) Machine-Learning-Guided Discovery of New Electrochemical Reactions | AIChE

Authors 

Andrew Zahrt, Yiming Mo, Yanfei Guan, Esther Heid, Kakasaheb Nandiwale, Klavs Jensen*

Significance

The discovery of new transformations constitutes a significant portion of research effort in organic chemistry, with hundreds of researchers globally working to develop new synthetic methods. The impact of the field is exemplified by the incorporation of modern methods into medicinal chemistry programs, as reflected in the types of reactions used in discovery campaigns. Between 1976 and 2015, the number of synthetic transformation types employed by medicinal chemists increased by 27 %, with 123 different reaction types accounting for 95 % of reactions used in 1976 versus 159 in 2015.[1] Further, a comparison between methods employed in 1984 and 2014 reveals that a number of synthetic method developments have changed the landscape of medicinal chemistry, including the Suzuki-Miyaura coupling, the Buchwald-Hartwig coupling, urea formation reactions, the Mitsunobu reaction, and others.[2] However, despite these developments, the chemical space explored by most discovery campaigns remains limited. Five types of reactions make up over 60 % of reactions used in drug candidate surveys.[2],[3] Further, a survey of drug molecules published in 2010 indicated that the top 50 molecular frameworks covered ~50 % of experimentally approved drugs,[4] a number not significantly different from that of a similar study using data from 1996.[5] In addition, a study of drugs developed before 2013 indicated that only 2 % of known monocyclic or bicyclic ring systems are present in drug compounds.[6],[7] Although one might argue that this relative structural invariability is simply a consequence of the privileged bioactivity of these select scaffolds, the commonly cited “escape from flatland” concept indicates that this is likely not the case and that exploring 3D chemical space is essential to further the field of drug development.[8] Clearly, the toolbox of the medicinal chemist must expand to increase the rate of drug discovery.

Background

Applications of computer-guided approaches to new reaction discovery are rare, with most early examples never extending beyond proof-of-principle.[9]-[12] Recently, Cronin and coworkers have reinvigorated the field of computer-guided reaction discovery, demonstrating the ability to use classification models to predict productive reactant combinations.18 However, the synthetic chemistry community still has no computer-guided, generalizable method capable of discovering new reactions. This work seeks to address this limitation by combining high-throughput experimentation with a machine-learning-guided protocol for reaction discovery that can computationally evaluate vast numbers of virtual reactions, including those with reactants not included in the training data. This workflow is intended to increase the success rate of experimental reaction discovery campaigns. In this case, we explore the chemical space of convergent-paired electrolytic reactions.

Results

To achieve this goal, we postulated that advanced molecular representations would be required to produce generalizable models from sparse datasets. To develop this representation, a literature dataset of 370 electrochemical reactions was gathered with the objective of predicting the site selectivity of electrochemical oxidation reactions. Different molecular representations were evaluated, and their performance was compared in generating models that classify atomic centers as reactive (i.e., the atom is oxidized in the reaction) or unreactive. In the design of this study, we chose to divide the dataset on the basis of the overall transformation, such that one category (or “reaction template”) is left out as the test set. In this data-partitioning design, fingerprint-based models struggle to predict into the “left out” reaction class. However, augmenting the DFT features with additional electronic structure information derived from NBO analysis of the neutral, oxidized, and reduced molecules improved the model significantly (precision/recall of 0.18/0.31 with Morgan fingerprints vs. 0.51/0.68 with engineered features). This finding is significant, suggesting that engineered features outperform structure-based fingerprints when generalization/extrapolation is the goal.
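The leave-one-template-out partitioning and the precision/recall comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the record layout, the template names, and the helper names `leave_one_template_out` and `precision_recall` are assumptions made for illustration, and the toy records carry no chemical meaning.

```python
from collections import defaultdict


def leave_one_template_out(records):
    """Yield (held_out_template, train, test), holding out one template at a time.

    Each record is assumed (hypothetically) to be a dict with a "template" key
    naming the overall transformation it belongs to.
    """
    by_template = defaultdict(list)
    for rec in records:
        by_template[rec["template"]].append(rec)
    for held_out in sorted(by_template):
        test = by_template[held_out]
        train = [
            rec
            for template, recs in by_template.items()
            if template != held_out
            for rec in recs
        ]
        yield held_out, train, test


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive ("reactive" = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy atom-level records; template names and labels are placeholders.
records = [
    {"template": "template_A", "atom": 0, "reactive": 1},
    {"template": "template_A", "atom": 1, "reactive": 0},
    {"template": "template_B", "atom": 0, "reactive": 0},
    {"template": "template_B", "atom": 1, "reactive": 1},
    {"template": "template_C", "atom": 0, "reactive": 1},
    {"template": "template_C", "atom": 1, "reactive": 0},
]

splits = list(leave_one_template_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))

# Scoring a hypothetical set of predictions for one held-out template.
precision, recall = precision_recall([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
```

Because every atom from the held-out template is absent from training, a model scored this way has to extrapolate to an unseen transformation, which is the regime in which the abstract reports the gap between fingerprints and engineered features.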

The engineered DFT-level features were atom-level features used to classify individual atoms as reactive or unreactive. To combine these atom-level features into a molecular representation, the graph2vec method[13] was used to embed the molecular graph (with atom-level features as node attributes) into a fixed-length vector. To validate this molecular representation, models were produced to predict molecular properties (e.g., ionization potential, HOMO energy, etc.) of molecules using a data-partitioning scheme designed to force extrapolative predictions. In these cases, the engineered vector significantly outperformed Morgan fingerprints. With a validated representation identified, DFT calculations were performed for ~40K molecules containing the atoms H, C, N, O, F, Si, P, S, Cl, Br, I, Li, Na, and K, and these data were used to learn the desired representation, enabling instantaneous generation of the desired vectors.
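As a rough illustration of how a graph with node attributes becomes input for graph2vec, the sketch below shows only the first stage of that method: Weisfeiler-Lehman relabeling, which turns each atom's neighbourhood into a "word" so that the whole molecule can be treated as a "document". The subsequent doc2vec-style embedding step is omitted. The toy graph, the binned labels such as `C:low`, and the function names are assumptions; how continuous DFT features were encoded as node attributes is not stated in the abstract.

```python
import hashlib


def wl_subgraph_labels(adjacency, node_labels, iterations=2):
    """Return, per node, its Weisfeiler-Lehman labels for depths 0..iterations.

    adjacency:   dict mapping node -> list of neighbouring nodes
    node_labels: dict mapping node -> string label (here, an assumed
                 discretized atom-level feature)
    """
    current = dict(node_labels)
    history = {node: [label] for node, label in current.items()}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # Sorting makes the signature independent of neighbour order.
            neighbour_labels = sorted(current[n] for n in neighbours)
            signature = current[node] + "|" + ",".join(neighbour_labels)
            updated[node] = hashlib.sha1(signature.encode()).hexdigest()[:8]
        current = updated
        for node, label in current.items():
            history[node].append(label)
    return history


def graph_document(adjacency, node_labels, iterations=2):
    """Flatten all per-node WL labels into one sorted bag of 'words'."""
    history = wl_subgraph_labels(adjacency, node_labels, iterations)
    return sorted(label for labels in history.values() for label in labels)


# Toy three-atom chain; the labels are hypothetical binned atom features.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
node_labels = {0: "C:low", 1: "C:mid", 2: "O:high"}
document = graph_document(adjacency, node_labels)

# The same graph with renamed, reordered nodes yields the same document.
adjacency_renamed = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
labels_renamed = {"a": "O:high", "b": "C:mid", "c": "C:low"}
document_renamed = graph_document(adjacency_renamed, labels_renamed)
```

In graph2vec these label bags are then embedded so that graphs sharing many rooted subgraphs land close together in the vector space; the fixed-length vector is the learned embedding of the graph, not the bag itself.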

With this computational groundwork established, an automated microfluidic platform previously published by our laboratory[14] was used to test ~150 electrochemical reactions. In these reactions, the cathodic reaction was intended to be held constant as the 1-electron reduction of 1,4-dicyanobenzene, and the anodic reaction was varied with a series of different partners to discover new convergent-paired electrolytic reactions. Using this dataset, a model was generated that classifies whether a reactant will be a competent reactive partner with 74 % accuracy. To date, this workflow has resulted in the discovery of 4 new electrochemical reactions. We now seek to expand this dataset and couple the reaction discovery model with the site-selectivity model to produce a catalog of hypothetical reactions to be evaluated and (if successful) optimized on the automated high-throughput platform.

References

([1]) Schneider, N.; Lowe, D. M.; Sayle, R. A.; Tarselli, M. A.; Landrum, G. A. Big data from pharmaceutical patents: a computational analysis of medicinal chemists’ bread and butter. J. Med. Chem. 2016, 59, 9, 4385-4402.

([2]) Brown, D. G.; Boström, J. Analysis of past and present synthetic methodologies on medicinal chemistry: where have all the new reactions gone? J. Med. Chem. 2016, 59, 4443−4458.

([3]) Roughley, S. D.; Jordan, A. M. The medicinal chemist’s toolbox: an analysis of reactions used in the pursuit of drug candidates. J. Med. Chem. 2011, 54, 3451–3479.

([4]) Wang, J.; Hou, T. Drug and drug candidate building block analysis. J. Chem. Inf. Model. 2010, 50, 55–67.

([5]) Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887–2893.

([6]) Taylor, R. D.; MacCoss, M.; Lawson, A. D. Rings in drugs. J. Med. Chem. 2014, 57, 5845–5859.

([7]) Taylor, R. D.; MacCoss, M.; Lawson, A. D. Combining molecular scaffolds from FDA approved drugs: application to drug discovery. J. Med. Chem. 2017, 60, 1638–1647.

([8]) Lovering, F.; Bikker, J.; Humblet, C. Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success. J. Med. Chem. 2009, 52, 6752-6756.

([9]) Balaban, A.T. Chemical graphs. 3. Reactions with Cyclic 6-Membered Transition States. Rev. Roum. Chem. 1967, 12, 875-902.

([10]) a) Hendrickson, J.B. The Variety of Thermal Pericyclic Reactions. Angew. Chem., Int. Ed. 1974, 13, 47-76. b) Arens, J.F. A Formalism for the Classification and Design of Organic Reactions. I. The Class of (-+)n+ and (-+)n – reactions. Recl. Des Trav. Chim. Des Pays-Bas. 1979, 98, 395-399. c) Arens, J.F. A Formalism for the Classification and Design of Organic Reactions II. The Classes of (+-)nC reactions. Recl. Des Trav. Chim. Des Pays-Bas. 1979, 98, 471-483. d) Arens, J.F. A Formalism for the Classification and Design of Organic Reactions III. The Class of (+-)nC Reactions. Recl. Des Trav. Chim. Des Pays-Bas. 1979, 98, 471-483. e) Zefirov, N.S.; Tratch, S.S. Formal-Logical Approach to Multicentered Processes with Cyclic Electron Transfer. Match. 1977, 263-264. f) Zefirov, N.S.; Tratch, S.S. Systematization of Tautomeric Processes and Formal-Logical Approach to the Search for New Topological and Reaction Types of Tautomerism. Chem. Scr. 1980, 15, 4-12.

([11]) a) Bauer, J.; Herges, R.; Fontain, E.; Ugi, I. IGOR and Computer Assisted Innovation in Chemistry. Chimia (Aarau). 1985, 39, 43-53. b) Bauer, J. IGOR2: A PC-program for Generating New Reactions and Molecular Structures. Tetrahedron. Comput. Methodol. 1989, 2, 269-280.

([12]) a) Herges, R. Reaction Planning: Prediction of New Organic Reactions. J. Chem. Inf. Comput. Sci. 1990, 30, 377-383. b) Herges, R.; Hoock, C. Reaction Planning: Computer-Aided Discovery of a Novel Elimination Reaction. Science. 1992, 255, 711-713. c) Zefirov, N.S.; Baskin, I.I.; Palyulin, V.A. SYMBEQ Program and Its Application in Computer-Assisted Reaction Design. J. Chem. Inf. Comput. Sci. 1994, 34, 994-999. d) Zefirov, N.S.; Tratch, S.; Molchanova, M. The Argent Program System: A Second-Generation Tool Aimed at Combinatorial Search for New Types of Organic Reactions. 1. Main Concepts and Potentialities. MATCH Commun. Math. Comput. Chem. 2002, 46, 253-273. e) Molchanova, M.S.; Tratch, S.S.; Zefirov, N.S. Computer-Aided Design of New Organic Transformations: Exposition of the ARGENT-1 Program. J. Phys. Org. Chem. 2003, 16, 463-474.

([13]) Narayanan, A.; Chandramohan, M.; Venkatesan, R.; Chen, L.; Liu, Y.; Jaiswal, S. graph2vec: Learning Distributed Representations of Graphs. arXiv:1707.05005.

([14]) a) Mo, Y.; Lu, Z.; Rughoobur, G.; Patil, P.; Gershenfeld, N.; Akinwande, A.I.; Buchwald, S.L.; Jensen, K.F. Microfluidic Electrochemistry for Single-Electron Transfer Redox-Neutral Reactions. Science. 2020, 368, 1352-1357. b) Mo, Y.; Rughoobur, G.; Nambiar, A.M.K.; Zhang, K.; Jensen, K.F. A Multifunctional Microfluidic Platform for High-Throughput Experimentation of Electroorganic Chemistry. Angew. Chem. Int. Ed. 2020, 59, 20890-20894.