(219h) Characterizing Complex Solvent Environments in Acid-Catalyzed Reactions Using Molecular Dynamics Simulations and 3D Convolutional Neural Nets
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Applications of Data Science to Molecules and Materials
Applications of Data Science in Catalysis and Reaction Engineering III
Tuesday, November 17, 2020 - 9:15am to 9:30am
As an alternative to designing descriptors via human intuition, machine learning methods are increasingly used to infer molecular properties by automatically extracting features from complex data sources [7-13]. For example, convolutional neural networks (CNNs) can identify and quantify patterns within two-dimensional (2D) spatial datasets such as images [14]. By training on a suitable set of labeled image data, CNNs learn spatial features without requiring hand-designed descriptors and then use these features to classify image contents. CNNs can be generalized to extract features from three-dimensional (3D) volumetric data [15], which facilitates the analysis of 3D molecular structures. For example, 3D CNNs have recently been used to detect protein functional sites [16], evaluate protein-ligand binding sites [17], and quantify protein-ligand binding affinities [18] by training on structures from protein databases. Based on these examples and our prior success using classical molecular dynamics (MD) simulations to predict acid-catalyzed reaction outcomes [3], we hypothesize that 3D CNNs can exploit the output of classical MD simulations to predict solvent effects on acid-catalyzed reaction rates more accurately.
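To make the idea of feature extraction from volumetric data concrete, the following is a minimal sketch of a single 3D convolution followed by a ReLU activation, written in NumPy. It is only an illustration of the operation a 3D CNN layer performs; the function name `conv3d_valid`, the toy grid, and the averaging kernel are assumptions for this example and are not taken from any of the cited models, whose kernels are learned during training.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Cross-correlate a 3D volume with a 3D kernel ('valid' mode, stride 1)."""
    # sliding_window_view exposes every kernel-sized sub-block of the volume
    # as a view with shape (out_x, out_y, out_z, k_x, k_y, k_z).
    windows = np.lib.stride_tricks.sliding_window_view(volume, kernel.shape)
    # Sum the element-wise product over the three kernel axes.
    return np.einsum("xyzijk,ijk->xyz", windows, kernel)

# Toy 4x4x4 occupancy grid with a single filled voxel (illustrative data).
volume = np.zeros((4, 4, 4))
volume[1, 2, 3] = 1.0

# Hypothetical 2x2x2 averaging kernel; a trained CNN would learn its kernels.
kernel = np.full((2, 2, 2), 1.0 / 8.0)

# One convolution followed by a ReLU gives a 3x3x3 feature map.
feature_map = np.maximum(conv3d_valid(volume, kernel), 0.0)
print(feature_map.shape)
```

Every output voxel summarizes a small 3D neighborhood of the input, which is how stacked layers can build up descriptions of local structure without hand-designed descriptors.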
In this work, we developed 3D CNNs that use atomic positions obtained from classical MD simulation trajectories to predict the rates of liquid-phase, acid-catalyzed biomass conversion reactions in mixed-solvent environments. We constructed 3D grids of voxels (the 3D analogs of 2D pixels) that represent the atomic positions sampled in the corresponding MD simulations. We find that our 3D CNN model, which we call SolventNet, predicts experimental reaction rates more accurately than both models based on human-selected, MD-derived descriptors [3] and previously developed 3D CNNs (ORION [19] and VoxNet [20]). Surprisingly, reaction rate predictions with SolventNet require as little as 2 ns of classical MD trajectory data, roughly 100-fold less than the 205 ns of MD data used in models based on human-selected descriptors [3]. This indicates that 3D atomic positions encode significant information about solvent effects on reaction rates, even in short trajectories. We further show that SolventNet generalizes to new system compositions using leave-one-out cross-validation, in which all data for one cosolvent-water mixture or one reactant were excluded from model training and used as the test set. Finally, we tested the predictive power of SolventNet for reactants in three additional polar aprotic cosolvents not included in model training: dimethyl sulfoxide, acetonitrile, and acetone. SolventNet still accurately predicts experimentally measured reaction rates in solvent mixtures containing these cosolvents despite their distinct properties (e.g., functional groups, basicity, and polarizability). To our knowledge, this work is the first to integrate 3D CNNs and classical MD simulations for the prediction of acid-catalyzed reaction rates. We envision that the computational efficiency of combining 3D CNNs with classical MD simulations will enable the integration of these tools with process models to screen solvents and optimize reactor conditions for biomass conversion processes [21].
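The two data-handling steps described above, building voxel grids from atomic positions and holding out whole groups of data for testing, can be sketched as follows. This is a minimal illustration under stated assumptions, not the SolventNet implementation: the abstract does not specify the grid size, the channel definition, or how frames are combined, so the function names (`voxelize`, `leave_one_group_out`), the one-channel-per-atom-type layout, the frame averaging, and the random coordinates are all hypothetical.

```python
import numpy as np

def voxelize(positions, atom_types, n_types, box_length, n_voxels):
    """Count atoms per voxel, with one channel per (assumed) atom type.

    positions: (n_atoms, 3) coordinates inside a cubic box [0, box_length].
    Returns an array of shape (n_types, n_voxels, n_voxels, n_voxels).
    """
    edges = np.linspace(0.0, box_length, n_voxels + 1)
    grid = np.zeros((n_types, n_voxels, n_voxels, n_voxels))
    for t in range(n_types):
        # Atoms outside the box edges are ignored by histogramdd.
        grid[t], _ = np.histogramdd(positions[atom_types == t],
                                    bins=(edges, edges, edges))
    return grid

def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield g, np.flatnonzero(groups != g), np.flatnonzero(groups == g)

# Illustrative stand-in for an MD trajectory: 5 frames, 50 atoms, 3 atom types.
rng = np.random.default_rng(0)
frames = rng.uniform(0.0, 4.0, size=(5, 50, 3))
atom_types = rng.integers(0, 3, size=50)

# Assumption: frames are combined by averaging the per-frame voxel counts.
grid = np.mean([voxelize(f, atom_types, 3, 4.0, 20) for f in frames], axis=0)
print(grid.shape)

# Hypothetical group labels, e.g. one label per cosolvent-water mixture.
labels = ["A", "A", "B", "C"]
for held_out, train_idx, test_idx in leave_one_group_out(labels):
    print(held_out, train_idx, test_idx)
```

Holding out every sample that shares a label, rather than individual samples, is what lets the test measure generalization to a cosolvent or reactant the model has never seen.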
[1] L. Shuai and J. Luterbacher, ChemSusChem, 2016, 9, 133-155.
[2] M. A. Mellmer, C. Sener, J. M. R. Gallo, J. S. Luterbacher, D. M. Alonso and J. A. Dumesic, Angew Chem Int Edit, 2014, 53, 11872-11875.
[3] T. W. Walker, A. K. Chew, H. X. Li, B. Demir, Z. C. Zhang, G. W. Huber, R. C. Van Lehn and J. A. Dumesic, Energy & Environmental Science, 2018, 11, 617-628.
[4] J. J. Varghese and S. H. Mushrif, Reaction Chemistry & Engineering, 2019, 4, 165-206.
[5] S. H. Mushrif, S. Caratzoulas and D. G. Vlachos, PCCP, 2012, 14, 2637-2644.
[6] A. K. Chew and R. C. Van Lehn, Front Chem, 2019, 7, 439.
[7] C. W. Coley, W. Jin, L. Rogers, T. F. Jamison, T. S. Jaakkola, W. H. Green, R. Barzilay and K. F. Jensen, Chem Sci, 2019, 10, 370-377.
[8] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarelli, T. Hirzel, A. Aspuru-Guzik and R. P. Adams, 2015.
[9] R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams and A. Aspuru-Guzik, ACS Central Science, 2018, 4, 268-276.
[10] N. E. Jackson, A. S. Bowen, L. W. Antony, M. A. Webb, V. Vishwanath and J. J. de Pablo, Sci Adv, 2019, 5, eaav1190.
[11] E. Y. Lee, B. M. Fulan, G. C. L. Wong and A. L. Ferguson, Proceedings of the National Academy of Sciences, 2016, 113, 13588-13593.
[12] Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing and V. Pande, Chem Sci, 2018, 9, 513-530.
[13] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt and K.-R. Müller, Sci Adv, 2017, 3, e1603015.
[14] W. Rawat and Z. H. Wang, Neural Comput, 2017, 29, 2352-2449.
[15] R. D. Singh, A. Mittal and R. K. Bhatia, Multimedia Tools and Applications, 2019, 78, 15951-15995.
[16] W. Torng and R. B. Altman, Bioinformatics, 2018, 35, 1503-1512.
[17] J. Jiménez, S. Doerr, G. Martínez-Rosell, A. S. Rose and G. De Fabritiis, Bioinformatics, 2017, 33, 3036-3042.
[18] J. Jiménez, M. Škalič, G. Martínez-Rosell and G. De Fabritiis, J Chem Inf Model, 2018, 58, 287-296.
[19] N. Sedaghat, M. Zolfaghari, E. Amiri and T. Brox, arXiv preprint arXiv:1604.03351, 2016.
[20] D. Maturana and S. Scherer, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, 922-928.
[21] D. M. Alonso, S. H. Hakim, S. Zhou, W. Won, O. Hosseinaei, J. Tao, V. Garcia-Negron, A. H. Motagamwala, M. A. Mellmer, K. Huang, C. J. Houtman, N. Labbé, D. P. Harper, C. T. Maravelias, T. Runge and J. A. Dumesic, Sci Adv, 2017, 3, e1603301.