(70c) Solubility Prediction of Industrial Chemicals: Feeding Graph Neural Networks with Physics-Based Simulations Data | AIChE

Authors 

Aglave, R. - Presenter, Siemens PLM Software
Mallick, S., Siemens
Boioli, F., Kaizen Solutions
Petris, P., Siemens Digital Industries Software
Mas, P., Siemens Digital Industries Software
The solubility of industrial chemicals dictates performance, stability, and process development. Digital solutions can assess this important material property quickly and cost-effectively. The literature offers many physics-based simulation approaches and many data-driven ones. Our computational framework combines the two to provide a complete solution to this industrial need.

As a first step, a limited set of molecules from one or more specific chemical families is selected, and their solubility is calculated using molecular simulations. More specifically, we estimate the Hansen solubility parameters (1) using a fast Molecular Dynamics technique in which the simulation box is successively compressed and decompressed to reach the correct density and thermodynamic equilibrium more quickly (2). The solubility parameters are then obtained from the intermolecular energy of the simulation box.
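The link between intermolecular energy and the Hansen parameters can be sketched as follows. Each component is the square root of a cohesive energy density, delta_k = sqrt(E_k / V_m), and the total parameter satisfies delta_t^2 = delta_d^2 + delta_p^2 + delta_h^2. This is a minimal sketch, assuming the simulation output has already been split into dispersion, polar, and hydrogen-bonding cohesive energy contributions; the function name `hansen_parameters` and the numerical inputs are hypothetical and are not taken from the authors' workflow.

```python
import math


def hansen_parameters(e_disp, e_polar, e_hbond, molar_volume):
    """Return (delta_d, delta_p, delta_h, delta_total) in MPa^0.5.

    e_disp, e_polar, e_hbond: cohesive energy contributions in J/mol,
        given as positive quantities (assumed sign convention).
    molar_volume: molar volume in cm^3/mol.
    Since 1 J/cm^3 equals 1 MPa, sqrt(E / V) comes out in MPa^0.5.
    """
    if molar_volume <= 0:
        raise ValueError("molar_volume must be positive")
    energies = (e_disp, e_polar, e_hbond)
    if any(e < 0 for e in energies):
        raise ValueError("cohesive energy contributions must be non-negative")
    # One Hansen component per energy contribution.
    components = [math.sqrt(e / molar_volume) for e in energies]
    # The total parameter combines the components in quadrature.
    total = math.sqrt(sum(d * d for d in components))
    return (*components, total)


# Illustrative inputs only; these are not simulation results.
delta_d, delta_p, delta_h, delta_t = hansen_parameters(
    e_disp=14600.0, e_polar=4500.0, e_hbond=22000.0, molar_volume=58.5
)
print(f"delta_d={delta_d:.2f} delta_p={delta_p:.2f} "
      f"delta_h={delta_h:.2f} delta_t={delta_t:.2f} MPa^0.5")
```

In this formulation the total parameter equals the square root of the total cohesive energy density, so the three components are simply a decomposition of one simulated quantity.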

We subsequently use the simulation data to train a Graph Neural Network (GNN). The machine learning model was selected carefully. Traditional machine learning and deep learning algorithms (3) were also tested and gave highly accurate results, but they required molecular descriptors to be generated first as model inputs (e.g. asphericity, molecular weight, number of hydrogen-bond donors/acceptors, radius of gyration). The advantage of GNNs is that the complex chemical structure of a molecule is naturally represented by a graph, defined by the connectivity relationships between a set of nodes (atoms) and a set of edges (bonds) (4). This allows us to train directly against the target solubility parameters without additional descriptor calculations.
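To make the graph representation concrete, the sketch below encodes the heavy atoms of a small molecule as nodes and its bonds as edges, then applies one untrained message-passing step followed by a sum readout. It is an illustration under stated assumptions, not the authors' model: the element vocabulary, the helper names (`build_graph`, `message_passing_step`, `readout`), and the plain sum aggregation are hypothetical choices, and a real GNN would apply learned weights and nonlinearities at each step.

```python
# Hypothetical element vocabulary for one-hot node features.
ELEMENTS = ("C", "N", "O")


def one_hot(symbol):
    """Encode an element symbol as a one-hot feature vector."""
    return [1.0 if symbol == element else 0.0 for element in ELEMENTS]


def build_graph(atoms, bonds):
    """Return node features and an adjacency list for an undirected graph."""
    features = [one_hot(symbol) for symbol in atoms]
    neighbors = {index: [] for index in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return features, neighbors


def message_passing_step(features, neighbors):
    """Update each node with its own features plus the sum over its neighbors."""
    updated = []
    for index, own in enumerate(features):
        aggregated = list(own)
        for j in neighbors[index]:
            aggregated = [a + b for a, b in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated


def readout(features):
    """Sum node vectors into one fixed-size vector for the whole molecule."""
    return [sum(column) for column in zip(*features)]


# Ethanol heavy-atom skeleton: C-C-O (hydrogens omitted for brevity).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
features, neighbors = build_graph(atoms, bonds)
hidden = message_passing_step(features, neighbors)
graph_vector = readout(hidden)
print(graph_vector)
```

The readout vector has the same length for any molecule, which is what lets a regression head map it to target values such as solubility parameters without hand-built descriptors.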

Our hybrid protocol provides a very fast and accurate method for calculating the solubility of industrial chemicals. The fast Molecular Dynamics technique needs to be performed only once for a specific chemical family. Once the GNN is trained, it automatically predicts the corresponding solubility parameters for any other chemical of the same family. Such a framework is more accurate than any Quantitative Structure-Property Relationship (5) or data-driven approach, while being much faster than any particle simulation technique.

(1) Hansen, C. M., J. Paint Technol. 1967, 39, 511

(2) Belmares, M.; Blanco, M.; Goddard, W. A.; Ross, R.; Caldwell, G.; Chou, S.-H.; Pham, J.; Olofson, P. M.; Thomas, C., J. Comput. Chem. 2004, 25, 1814–1826

(3) Deng, T.; Jia, G.-Z., Mol. Phys. 2020, 118, 2

(4) Lee, S.; Lee, M.; Gyak, K.-W.; Kim, S.; Kim, M.-J.; Min, K., ACS Omega 2022, 7, 14, 12268–12277

(5) Chinta, S.; Rengaswamy, R., Ind. Eng. Chem. Res. 2019, 58, 8, 3082–3092