(126f) Towards Transferrable and User-Friendly Machine Learning Models for Thermophysical Property Prediction - a Case Study with Melting Points
AIChE Annual Meeting
2024
Engineering Sciences and Fundamentals
Thermophysical Properties and Phase Behavior I
Monday, October 28, 2024 - 1:35pm to 1:48pm
Predicting melting points has proven challenging because this property depends not only on the structure of the molecule itself but also on the structure of the solid (crystalline) phase and the molten (liquid) state of the substance [2]. Current prediction methods include group contribution (GC) methods as well as machine learning (ML) models based on quantitative structure-property relationships (QSPR). GC methods have limited accuracy because they cannot capture complex intermolecular and intramolecular interactions and because sufficient information for functional group increments is lacking.
A recent evaluation by Elliott et al. showed that existing methods are, on average, accurate to no better than 20%, and that outliers are common even for simple organic compounds [3]. Even recent ML methods have root-mean-squared errors of 40 K, which is not a significant improvement. ML methods use molecular descriptors, relying primarily on 2D information and constitutional descriptors for single molecules. These descriptors are inherently better suited to vapor-phase properties.
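The two kinds of error figure quoted above (a percentage and a root-mean-squared error in kelvin) can be computed as in the minimal sketch below. It assumes the percentage is a mean relative deviation of the absolute melting temperature, which the abstract does not specify. The function names and the three temperature values are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(predicted, measured):
    """Root-mean-squared error, in the same units as the inputs (here K)."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def mean_abs_percent_error(predicted, measured):
    """Mean absolute deviation relative to the measured value, in percent.

    Assumption: temperatures are absolute (K), so no measured value is zero.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    relative = [abs(p - m) / abs(m) for p, m in zip(predicted, measured)]
    return 100.0 * sum(relative) / len(relative)


# Hypothetical melting points in K, for illustration only (not real data).
measured_K = [300.0, 400.0, 250.0]
predicted_K = [330.0, 360.0, 275.0]

print(f"RMSE: {rmse(predicted_K, measured_K):.1f} K")
print(f"Mean abs. error: {mean_abs_percent_error(predicted_K, measured_K):.1f} %")
```

The two metrics weight errors differently: RMSE penalizes large outliers more heavily, while a relative error scales each deviation by the magnitude of the measured melting point.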
The hypothesis of this work is that adding liquid-phase and 3D descriptors will improve ML performance in predicting melting points. Two methods, molecular dynamics (MD) and ab initio (first-principles) simulations, were used to calculate these descriptors. MD simulations provide geometric information for the condensed phase. Ab initio calculations yield quantitative information about the 3D structure, properties, and behavior of molecules based on their atomic composition and electronic structure. The results show that including the best 2D, MD, and ab initio descriptors improves interpretability and yields better overall prediction models.
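One way to picture combining the three descriptor families is as a merge of per-compound tables into a single feature set. This is a minimal sketch, not the authors' pipeline. The function `combine_descriptors`, the block prefixes, the descriptor names, and all numeric values are hypothetical placeholders.

```python
def combine_descriptors(*blocks):
    """Merge descriptor blocks into one flat feature dict per compound.

    Each block is a (prefix, table) pair, where table maps a compound id
    to a {descriptor_name: value} dict. Only compounds present in every
    block are kept, and each feature name is prefixed with its source so
    that its origin stays visible when the model is interpreted.
    """
    if not blocks:
        return {}
    common = set(blocks[0][1])
    for _, table in blocks[1:]:
        common &= set(table)
    merged = {}
    for compound in sorted(common):
        row = {}
        for prefix, table in blocks:
            for name, value in table[compound].items():
                row[f"{prefix}:{name}"] = value
        merged[compound] = row
    return merged


# Placeholder descriptor tables (hypothetical names and values).
two_d = {"A": {"mol_weight": 1.0}, "B": {"mol_weight": 2.0}}
md = {"A": {"liquid_density": 3.0}, "B": {"liquid_density": 4.0}}
ab_initio = {"A": {"dipole_moment": 5.0}}  # compound B is missing here

features = combine_descriptors(("2D", two_d), ("MD", md), ("QM", ab_initio))
print(features)
```

Dropping compounds that lack any one block keeps the feature matrix complete. Training the same model on the "2D" features alone and on the merged set would then give a direct test of the hypothesis.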
References
[1] Yingqing Ran and Samuel H. Yalkowsky. Prediction of drug solubility by the general solubility equation (GSE). Journal of Chemical Information and Computer Sciences, 41:354–357, 2001.
[2] Laura D. Hughes, David S. Palmer, Florian Nigsch, and John B. O. Mitchell. Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and log P. Journal of Chemical Information and Modeling, 48:220–232, 2008.
[3] J. Richard Elliott, Vladimir Diky, Thomas Knotts, and W. Vincent Wilding. The Properties of Gases and Liquids. McGraw Hill, sixth edition, 2023.