(472e) Towards Transferrable and User-Friendly Machine Learning Models for Thermophysical Property Prediction-a Case Study with Normal Boiling Point and Critical Constants | AIChE

Authors 

Knotts, T. IV - Presenter, Brigham Young University
Mtetwa, F., Brigham Young University
Wilding, W. V., Brigham Young University
Giles, N., Brigham Young University

Nearly every modern field of study and commerce is trying to capture the potential of machine learning (ML) for improvement, and ML prediction of thermophysical properties has grown rapidly in popularity, with many groups publishing on the subject. However, wide adoption of ML models and methods is often hindered by the “black box” nature of the technique. In particular, it is often difficult for one group to reproduce another's work because the data sets are not provided, the features are not completely specified, or the model itself is withheld from the literature. Another issue when developing ML approaches for chemicals is determining an optimal set of features. The chemicals comprising the training, validation, and test sets must be described in a mathematical form that ML algorithms can use, and a nearly unlimited number of descriptors can be generated. However, supplying all such descriptors can result in sub-optimal models, and choosing the right subset is not straightforward.
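The abstract does not say which selection methods were compared. As one common baseline, a minimal unsupervised filter can be sketched under these assumptions: descriptors are already computed as numeric columns, near-constant columns are dropped, and a descriptor is kept only if its absolute Pearson correlation with every descriptor already kept does not exceed a threshold. The function names, the thresholds `var_tol` and `corr_max`, and the toy values are hypothetical, not the authors' settings.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(columns, var_tol=1e-8, corr_max=0.95):
    """Greedy unsupervised filter over descriptor columns (hypothetical helper).

    columns: dict of descriptor name -> list of values, one per compound.
    Returns the names kept, in input order.
    """
    kept = []
    for name, values in columns.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance <= var_tol:
            continue  # (near-)constant: cannot distinguish compounds
        if any(abs(pearson(values, columns[k])) > corr_max for k in kept):
            continue  # largely redundant with a descriptor already kept
        kept.append(name)
    return kept


# Made-up descriptor values for five hypothetical compounds.
descriptors = {
    "n_carbon": [1, 2, 3, 4, 5],
    "mol_weight": [16.0, 30.1, 44.1, 58.1, 72.2],  # tracks n_carbon almost exactly
    "n_oxygen": [0, 0, 0, 0, 0],                   # constant in this set
    "n_rings": [0, 1, 0, 1, 1],
}
print(filter_descriptors(descriptors))  # ['n_carbon', 'n_rings']
```

Because the filter is greedy, its result depends on column order; sorting columns by some relevance measure first (for example, correlation with the target property) is one way to make the choice less arbitrary. Supervised or model-based selection methods are alternatives to this kind of filter.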

This talk will begin by explaining these issues and giving examples of how each can affect model performance. The case study concerns the normal boiling point and the critical point of a compound. The training, validation, and test sets are drawn from evaluated experimental data in the DIPPR 801 database. Emphasis is placed on comparing and contrasting multiple approaches to feature selection and their effect on model performance. The talk will also discuss the impact of limited data sets, such as those for critical properties, compared with larger data sets such as that for the normal boiling point.
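Reproducibility problems often start with the data split itself. One way to make a train/validation/test split that others can recreate without sharing row order is to assign each compound by hashing a stable identifier. The sketch below assumes each record is keyed by an identifier such as a CAS number or canonical SMILES; the abstract does not say how the DIPPR data were split, and the 10%/10% fractions, the `salt` parameter, and the function names are illustrative assumptions.

```python
import hashlib


def assign_split(compound_id, val_frac=0.1, test_frac=0.1, salt="split-v1"):
    """Deterministically assign one compound to "train", "val", or "test".

    Depends only on the identifier and the salt, so the assignment survives
    reordering the data or adding new compounds.
    """
    key = f"{salt}:{compound_id}".encode("utf-8")
    # First 32 bits of SHA-256 mapped to [0, 1] act as a reproducible "random" draw.
    u = int(hashlib.sha256(key).hexdigest()[:8], 16) / 0xFFFFFFFF
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def split_dataset(records, **kwargs):
    """Group (compound_id, value) pairs into train/val/test lists."""
    sets = {"train": [], "val": [], "test": []}
    for compound_id, value in records:
        sets[assign_split(compound_id, **kwargs)].append((compound_id, value))
    return sets


# Placeholder identifiers and values, for illustration only.
records = [(f"cmpd-{i}", float(i)) for i in range(1000)]
print({name: len(rows) for name, rows in split_dataset(records).items()})
```

With small sets, such as those for critical constants, a 10% test fraction leaves relatively few compounds, so the test error can vary noticeably with the split; rerunning with a few different `salt` values is a cheap way to gauge that variability.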

The work culminates in a description of an optimal model for accurate prediction of these properties. A hallmark of the work is ease of use by outside groups: the only model input is the compound's SMILES string, a piece of information that is easy to generate. Because the model was built with TensorFlow, it can be transferred as an h5 (HDF5) file, so other groups can readily run it to predict normal boiling points for their compounds of interest.
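To illustrate a SMILES-only interface, here is a deliberately crude sketch: a regular expression counts explicitly written atoms by element, the counts form a fixed-length feature vector, and any callable "model" maps those vectors to predictions. The element list, function names, and toy model are hypothetical; the abstract does not describe the authors' features, and a real pipeline would typically parse and validate the string with a cheminformatics toolkit such as RDKit (`Chem.MolFromSmiles`). With a Keras model saved in HDF5 format, the callable could wrap `tf.keras.models.load_model(path).predict` applied to a NumPy array of the feature vectors (path hypothetical).

```python
import re
from collections import Counter

# Matches either a whole bracket atom such as [nH], [O-] or [13CH3], capturing
# its element symbol, or an organic-subset atom written without brackets.
# "Br" and "Cl" come before the single letters so "Cl" is not read as carbon.
_ATOM_RE = re.compile(
    r"\[\d*(?P<bracket>[A-Z][a-z]?|[a-z]+)[^\]]*\]"
    r"|(?P<organic>Br|Cl|[BCNOSPFI]|[bcnosp])"
)

# Hypothetical fixed feature order; elements outside this list are ignored.
ELEMENTS = ("C", "N", "O", "S", "F", "Cl", "Br", "I")


def element_counts(smiles):
    """Count explicitly written atoms by element (toy parser, no validation).

    Implicit hydrogens are not counted.
    """
    counts = Counter()
    for match in _ATOM_RE.finditer(smiles):
        symbol = match.group("bracket") or match.group("organic")
        counts[symbol.capitalize()] += 1  # aromatic "c" -> "C", "se" -> "Se"
    return counts


def featurize(smiles):
    """Map a SMILES string to a fixed-length vector of element counts."""
    counts = element_counts(smiles)
    return [counts[element] for element in ELEMENTS]


def predict_from_smiles(smiles_list, model):
    """SMILES in, predictions out.

    `model` is any callable taking a list of feature vectors and returning
    one prediction per vector (hypothetical interface).
    """
    return model([featurize(s) for s in smiles_list])


if __name__ == "__main__":
    def toy_model(rows):
        # Stand-in only: sums the counts; NOT a fitted property model.
        return [sum(row) for row in rows]

    print(predict_from_smiles(["CCO", "c1ccccc1Cl"], toy_model))  # [3, 7]
```

Keeping featurization inside the shipped code is what makes a SMILES-only interface reproducible: if users had to compute descriptors themselves, any mismatch in descriptor definitions would silently change the predictions.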