(197aq) Strategic Use of Molecular Simulations to Expand Predictive Capability of Machine Learning Models | AIChE

Authors 

Shah, J., Oklahoma State University
Gas solubility in a solvent can be estimated by calculating the excess chemical potential of the solute and determining the corresponding Henry’s constant. Rigorous computational methods commonly use the Bennett acceptance ratio (BAR) approach to estimate the excess chemical potential. In this approach, the free energy change due to reversible dissolution of the gas is calculated by gradually introducing (or eliminating) the Coulombic and van der Waals interactions between the solute and solvent in a series of intermediate steps. This process is generally slow and computationally expensive, which makes it inefficient for screening large chemical spaces in a high-throughput manner. Further, it can be used only for solvents with reliable force fields, which severely limits its use. These limitations motivate machine learning approaches to predicting gas solubility. However, machine learning methods do not necessarily capture the underlying physics of the system and rely heavily on the quantity and quality of training data. In this work, we implement a hybrid approach to test whether the predictive capability of machine learning models can be improved by strategically leveraging molecular simulations to expand the dataset. For example, extensive data is available for CO2 solubility in linear alkanes, or in linear alkyl-substituted molecules in general, while data on branched alkanes is limited. Specifically, we investigate the feasibility of using rigorous molecular simulations on a few simple branched-chain molecules to generate data that can greatly expand the predictive capability of the machine learning model towards branched-chain alkyl-substituted molecules. We also identify the key features of the molecules whose data facilitates rapid improvement of the model.
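The route from free energy to Henry’s constant described above can be illustrated with a minimal Python sketch. It is not the authors’ workflow. It assumes a single pair of adjacent alchemical states, solves Bennett’s self-consistent equation by bisection, and converts an excess chemical potential to a Henry’s constant through the standard relation H = ρ R T exp(μ_ex / RT), where ρ is the molar density of the pure solvent. The function names, arguments, and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def bar_free_energy(w_forward, w_reverse, beta, tol=1e-10):
    """Free energy difference between two states from Bennett's equation.

    w_forward: U_1 - U_0 evaluated on configurations sampled in state 0
    w_reverse: U_0 - U_1 evaluated on configurations sampled in state 1
    beta:      1/(RT) in units inverse to those of the energies
    """
    n_f, n_r = len(w_forward), len(w_reverse)
    m = math.log(n_f / n_r)

    def fermi(x):
        # 1 / (1 + exp(x)), written to avoid overflow for large |x|
        if x > 0:
            e = math.exp(-x)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(x))

    def imbalance(df):
        # Left side minus right side of Bennett's equation;
        # this difference increases monotonically with df.
        lhs = sum(fermi(m + beta * (w - df)) for w in w_forward)
        rhs = sum(fermi(-m + beta * (w + df)) for w in w_reverse)
        return lhs - rhs

    # The root lies between the extreme forward and negated reverse values.
    lo = min(min(w_forward), -max(w_reverse)) - 1.0
    hi = max(max(w_forward), -min(w_reverse)) + 1.0
    for _ in range(200):  # bisection with a hard iteration cap
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def henry_constant(mu_ex, temperature, solvent_molar_density):
    """Henry's constant in Pa (mole-fraction basis).

    mu_ex in J/mol, temperature in K, solvent_molar_density in mol/m^3.
    """
    rt = R * temperature
    return solvent_molar_density * rt * math.exp(mu_ex / rt)


# Illustrative usage with made-up energy differences in J/mol (not real data).
T = 300.0
forward = [1200.0, 900.0, 1500.0, 1100.0]
reverse = [-1000.0, -1300.0, -800.0]
mu_ex = bar_free_energy(forward, reverse, beta=1.0 / (R * T))
print(mu_ex, henry_constant(mu_ex, T, solvent_molar_density=5000.0))
```

In a full calculation the solute is coupled through many intermediate states, so the estimate for each adjacent pair would be summed to obtain the total excess chemical potential before the conversion to Henry’s constant.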