(70c) Solubility Prediction of Industrial Chemicals: Feeding Graph Neural Networks with Physics-Based Simulations Data
AIChE Spring Meeting and Global Congress on Process Safety
2023
2023 Spring Meeting and 19th Global Congress on Process Safety
Industry 4.0 Topical Conference
Data Analytics and Smart Manufacturing II
Tuesday, March 14, 2023 - 11:15am to 11:45am
As a first step, a limited set of molecules from a specific chemical family (or families) is selected, and their solubility is calculated using molecular simulations. More specifically, we estimate the Hansen solubility parameters (1) using a fast Molecular Dynamics technique in which the simulation box is successively compressed and decompressed to reach the correct density and thermodynamic equilibrium more quickly (2). The solubility parameters are then obtained from the intermolecular (cohesive) energy of the simulation box.
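To make the link between energy and solubility parameter concrete, the sketch below applies the standard Hansen definition, in which each component is the square root of a cohesive energy contribution divided by the molar volume. It assumes the dispersion, polar, and hydrogen-bonding contributions have already been extracted from the simulation; the abstract does not say how that decomposition is done, and the function name and numerical inputs are hypothetical, chosen only for illustration.

```python
import math


def hansen_parameters(e_disp, e_polar, e_hbond, molar_volume):
    """Return Hansen solubility parameters in MPa^0.5.

    Energies are cohesive energy contributions in J/mol and molar_volume
    is in cm^3/mol, so each ratio is in J/cm^3, which equals MPa.
    """
    if molar_volume <= 0:
        raise ValueError("molar_volume must be positive")
    for name, value in (("e_disp", e_disp), ("e_polar", e_polar), ("e_hbond", e_hbond)):
        if value < 0:
            raise ValueError(f"{name} must be non-negative")

    delta_d = math.sqrt(e_disp / molar_volume)   # dispersion component
    delta_p = math.sqrt(e_polar / molar_volume)  # polar component
    delta_h = math.sqrt(e_hbond / molar_volume)  # hydrogen-bonding component
    # The total parameter combines the three components in quadrature.
    delta_t = math.sqrt(delta_d**2 + delta_p**2 + delta_h**2)
    return {"dispersion": delta_d, "polar": delta_p, "hbond": delta_h, "total": delta_t}


# Illustrative inputs only, not simulation results.
params = hansen_parameters(e_disp=16000.0, e_polar=4000.0, e_hbond=9000.0, molar_volume=100.0)
```

Because the total is built from the three components in quadrature, its square equals the total cohesive energy density, here (16000 + 4000 + 9000) / 100 = 290 MPa.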
We subsequently use the simulation data to train a Graph Neural Network (GNN). The machine learning model was carefully selected. Traditional machine learning and deep learning algorithms (3) were also tested and gave highly accurate results, but they required molecular descriptors to be generated as model inputs (e.g. asphericity, molecular weight, number of hydrogen-bond donors/acceptors, radius of gyration). The advantage of GNNs is that the complex chemical structure of a molecule is naturally represented by a graph, defined by the connectivity between a set of nodes (atoms) and a set of edges (bonds) (4). This allows us to train directly against the target solubility parameters without additional descriptor calculations.
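As a rough illustration of the graph representation described above, the sketch below encodes a molecule as atoms (nodes) and bonds (edges) and performs one unweighted message-passing step followed by a sum readout. It is not the architecture used in this work: a real GNN would apply learned weights at each step and regress the readout vector onto the solubility parameters. The class, the function names, and the three-element vocabulary are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical element vocabulary for one-hot node features.
ELEMENTS = ("C", "O", "N")


@dataclass
class MolecularGraph:
    atoms: list  # element symbol per node, e.g. ["C", "C", "O"]
    bonds: list  # undirected edges as (i, j) index pairs

    def node_features(self):
        # One-hot encode each atom against the element vocabulary.
        return [[1.0 if atom == el else 0.0 for el in ELEMENTS] for atom in self.atoms]

    def neighbors(self):
        # Build an adjacency list from the undirected bond list.
        adjacency = {i: [] for i in range(len(self.atoms))}
        for i, j in self.bonds:
            adjacency[i].append(j)
            adjacency[j].append(i)
        return adjacency


def message_passing_step(graph, features):
    # Each node adds the feature vectors of its bonded neighbours to its own.
    adjacency = graph.neighbors()
    updated = []
    for i, own in enumerate(features):
        aggregated = list(own)
        for j in adjacency[i]:
            aggregated = [a + b for a, b in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated


def readout(features):
    # Sum over nodes to obtain a fixed-size, order-independent graph vector.
    return [sum(column) for column in zip(*features)]


# Heavy-atom graph of ethanol: C-C-O.
ethanol = MolecularGraph(atoms=["C", "C", "O"], bonds=[(0, 1), (1, 2)])
h0 = ethanol.node_features()
h1 = message_passing_step(ethanol, h0)
graph_vector = readout(h1)
```

The graph is built directly from connectivity, so no hand-crafted descriptors such as radius of gyration are needed as inputs.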
Our hybrid protocol provides a fast and accurate method for calculating the solubility of industrial chemicals. The fast Molecular Dynamics technique needs to be performed only once for a specific chemical family. Once the GNN is trained, it predicts the corresponding value for any other chemical of the same family. Such a framework is more accurate than any Quantitative Structure-Property Relationship (5) or data-driven approach, while being much faster than any particle simulation technique.
(1) Hansen, C. M., J. Paint Technol. 1967, 39, 511
(2) Belmares, M.; Blanco, M.; Goddard, W. A.; Ross, R.; Caldwell, G.; Chou, S.-H.; Pham, J.; Olofson, P. M.; Thomas, C., J. Comput. Chem. 2004, 25, 1814–1826
(3) Deng, T.; Jia, G.-Z., Molecular Physics, 2020, 118:2
(4) Lee, S.; Lee, M.; Gyak, K.-W.; Kim, S.; Kim, M.-J.; Min, K., ACS Omega 2022, 7, 14, 12268–12277
(5) Chinta, S.; Rengaswamy, R., Ind. Eng. Chem. Res. 2019, 58, 8, 3082–3092