(126f) Towards Transferrable and User-Friendly Machine Learning Models for Thermophysical Property Prediction - a Case Study with Melting Points | AIChE

Authors 

Mtetwa, F. - Presenter, Brigham Young University
Knotts, T. IV, Brigham Young University
Wilding, W. V., Brigham Young University

Accurately predicting thermophysical properties is essential for many engineering and scientific applications, such as designing thermal systems, processing materials, and analyzing transport phenomena. One such property is the melting point, the temperature at which a solid-to-liquid phase transition occurs. Melting points provide information on the stability of a crystal structure and insight into polymorphism, which makes them important in both academic and industrial settings. In addition, the general solubility equation (GSE) uses the melting point to predict aqueous solubility [1].
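
The GSE of Ran and Yalkowsky [1] estimates the base-10 logarithm of molar aqueous solubility as log S_w = 0.5 - 0.01(MP - 25) - log K_ow, where MP is the melting point in degrees Celsius and K_ow is the octanol-water partition coefficient. The sketch below is a minimal illustration, assuming this published form and the common convention that the melting term is dropped for compounds that are liquid at 25 °C; the function names are hypothetical. It also makes the link to melting-point accuracy concrete: for a solid, every 1 K of melting-point error shifts log S_w by 0.01, so a 40 K error shifts it by 0.4 log units.

```python
def gse_log_solubility(melting_point_c: float, log_kow: float) -> float:
    """Estimate log10 of molar aqueous solubility (mol/L) with the GSE.

    melting_point_c: melting point in degrees Celsius.
    log_kow: log10 of the octanol-water partition coefficient.
    """
    # Crystal (melting) term; by convention it is zero for compounds that are
    # liquid at 25 C, i.e. when the melting point is at or below 25 C.
    melt_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - melt_term - log_kow


def kelvin_to_celsius(t_k: float) -> float:
    """Convert a melting point given in kelvin to the Celsius scale the GSE uses."""
    return t_k - 273.15


# A hypothetical solid melting at 125 C with log Kow = 2.0:
print(gse_log_solubility(125.0, 2.0))  # -2.5
```

Because the melting term is linear in MP, a model's melting-point error carries over directly into the solubility estimate.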

Predicting melting points has proven challenging because the property depends not only on the structure of the molecule itself but also on the structure of the solid (crystalline) phase and of the molten (liquid) state [2]. Current prediction methods include group contribution (GC) methods and machine learning (ML) models based on quantitative structure-property relationships (QSPR). GC methods have limited accuracy because they cannot capture complex intermolecular and intramolecular interactions, and because sufficient information is often lacking to determine functional-group increments.
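
A GC estimate has the additive form T_m = T_0 + Σ n_i Δ_i, summing an increment Δ_i for each of the n_i occurrences of group i (the same structure used by, for example, the Joback method). The sketch below illustrates both limitations named above under loudly stated assumptions: the base constant and increments are hypothetical placeholders, not published parameters, and `gc_melting_point` is an illustrative name. Groups contribute independently, so no term can encode interactions, and a group with no fitted increment yields no estimate at all.

```python
# Hypothetical increment table (kelvin). These numbers are placeholders for
# illustration only; they are NOT published group-contribution parameters.
HYPOTHETICAL_INCREMENTS_K = {"CH3": -5.0, "CH2": 11.0, "OH": 45.0}
HYPOTHETICAL_BASE_K = 120.0


def gc_melting_point(group_counts, increments=HYPOTHETICAL_INCREMENTS_K,
                     base=HYPOTHETICAL_BASE_K):
    """Estimate a melting point as base + sum(count * increment) over groups."""
    missing = sorted(g for g in group_counts if g not in increments)
    if missing:
        # No increment means no estimate: the data-coverage limitation of GC.
        raise KeyError(f"no increment available for groups: {missing}")
    # Each group contributes independently; nothing in this sum represents
    # interactions between groups or how the molecules pack in the crystal.
    return base + sum(n * increments[g] for g, n in group_counts.items())


# A hypothetical molecule with one CH3, two CH2 and one OH group:
print(gc_melting_point({"CH3": 1, "CH2": 2, "OH": 1}))  # 182.0
```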

A recent evaluation by Elliott et al. showed that, on average, existing methods are accurate to no better than 20%, and that outliers are common even for simple organic compounds [3]. Even recent ML methods have root-mean-square errors (RMSE) of 40 K, which is not a significant improvement. These ML methods use molecular descriptors, relying primarily on 2D and constitutional descriptors computed for single molecules. Such descriptors are inherently better suited to vapor-phase properties.
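
The two accuracy measures quoted above answer different questions: a relative error scales with the magnitude of the melting point, while RMSE is in kelvin and, because errors are squared, is pulled up by a few large outliers. The minimal sketch below computes both on hypothetical toy data; the function names are illustrative, and the exact error definition used in [3] is not restated here.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root-mean-square error, in the units of the data (here kelvin)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def mean_abs_relative_error_pct(predicted, observed):
    """Mean absolute relative error in percent; observed values must be nonzero."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs(p - o) / abs(o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical observed and predicted melting points (K):
observed_k = [300.0, 400.0]
predicted_k = [330.0, 360.0]
print(rmse(predicted_k, observed_k))                         # about 35.4
print(mean_abs_relative_error_pct(predicted_k, observed_k))  # about 10
```

Both points here are off by 10%, yet the 40 K miss weighs more in RMSE than the 30 K miss, which is why RMSE is sensitive to the outliers mentioned above.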

The hypothesis of this work is that adding liquid-phase and 3D descriptors will improve the performance of ML models for predicting melting points. Two simulation approaches, molecular dynamics (MD) and ab initio (first-principles) calculations, were used to compute these descriptors. MD simulations provide geometric information about the condensed phase. Ab initio calculations yield quantitative information about the 3D structure, properties, and behavior of molecules from their atomic composition and electronic structure. The results show that combining the best 2D, MD, and ab initio descriptors improves interpretability and yields better prediction models overall.
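
The abstract does not specify how the descriptor sets were combined or how the "best" descriptors were chosen. As a hedged illustration only, the sketch below assumes each source (2D, MD, ab initio) produces a per-compound descriptor table, joins the tables under source-prefixed names, and ranks the merged descriptors by absolute Pearson correlation with measured melting point. The correlation filter is a simple stand-in, not necessarily the authors' selection method, and all descriptor names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def merge_descriptor_blocks(blocks):
    """Join {source: {compound: {name: value}}} into {compound: {"source:name": value}}.

    Only compounds present in every source are kept.
    """
    common = set.intersection(*(set(table) for table in blocks.values()))
    merged = {}
    for cid in sorted(common):
        row = {}
        for source, table in blocks.items():
            for name, value in table[cid].items():
                row[f"{source}:{name}"] = value
        merged[cid] = row
    return merged


def top_features(merged, targets, k):
    """Rank merged descriptors by |Pearson r| with the target and keep the top k."""
    ids = [cid for cid in merged if cid in targets]
    y = [targets[cid] for cid in ids]
    names = list(merged[ids[0]])
    scores = {name: abs(pearson([merged[cid][name] for cid in ids], y)) for name in names}
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical descriptor tables and melting points (K), for illustration only.
blocks = {
    "2D": {"A": {"mol_weight": 100.0}, "B": {"mol_weight": 150.0}, "C": {"mol_weight": 200.0}},
    "MD": {"A": {"liquid_density": 0.9}, "B": {"liquid_density": 0.8}, "C": {"liquid_density": 1.0}},
    "abinitio": {"A": {"dipole": 1.0}, "B": {"dipole": 1.0}, "C": {"dipole": 1.0}, "D": {"dipole": 2.0}},
}
melting_points_k = {"A": 200.0, "B": 250.0, "C": 300.0}

merged = merge_descriptor_blocks(blocks)
print(top_features(merged, melting_points_k, k=2))  # ['2D:mol_weight', 'MD:liquid_density']
```

Prefixing each name with its source keeps the selected features traceable to the 2D, MD, or ab initio calculation that produced them, which supports the interpretability the abstract reports. A univariate filter like this ignores redundancy between descriptors; a wrapper or model-based selection would be a natural alternative.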

References
[1] Yingqing Ran and Samuel H. Yalkowsky. Prediction of drug solubility by the general solubility equation (GSE). Journal of Chemical Information and Computer Sciences, 41:354–357, 2001.

[2] Laura D. Hughes, David S. Palmer, Florian Nigsch, and John B. O. Mitchell. Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P. Journal of Chemical Information and Modeling, 48:220–232, 2008.

[3] J. Richard Elliott, Vladimir Diky, Thomas A. Knotts IV, and W. Vincent Wilding. The Properties of Gases and Liquids. McGraw Hill, sixth edition, 2023.