(364d) Multi-Task Property Prediction: Importance of the “Chemist-in-the-Loop” in Model Building | AIChE

Authors 

Aouichaoui, A. - Presenter, Technical University of Denmark
Sin, G., Technical University of Denmark
Abildskov, J., Technical University of Denmark
Mansouri, S. S., Technical University of Denmark
In the absence of experimental data, the ability to predict the thermophysical properties of chemicals of interest becomes crucial in many engineering applications, including phase-equilibria calculations, energy balances, and the evaluation of process alternatives [1]. Quantitative Structure-Property Relationship (QSPR) models enable such predictions by relating the molecular structure, encoded in a machine-readable format (molecular descriptors), to the property of interest through a mathematical model [2].

Recent developments in Deep Learning (DL), and especially in Graph Neural Networks (GNNs), have eliminated the tedious task of choosing a suitable molecular descriptor for the task at hand: a GNN can learn a task-specific representation directly from the molecular graph and map it to the target property, with both steps trained through backpropagation [3], [4]. Traditionally, models are built to predict one specific property or target. However, DL models can also predict several properties simultaneously, an approach known as multi-task learning [5]. Here, the models might improve their performance through inductive transfer: while learning to predict property “A”, the model might need less effort to learn to predict property “B”, and might be able to transfer the newly gained knowledge to the new domain (task) [5]. This is especially relevant in cases where good-quality experimental data are scarce [6]. However, multi-task learning does not always improve a model's predictive performance, and “negative” transfer might occur [7].
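The idea of sharing a learned representation across properties can be sketched in a few lines. The example below is a minimal illustration, not the model used in this work: it assumes hard parameter sharing (one shared hidden layer feeding one linear output per property), uses a plain feature vector in place of a GNN, and assumes that missing property labels are encoded as NaN so that each molecule only contributes to the tasks for which it has data. All function and parameter names are hypothetical.

```python
import numpy as np


def init_params(n_features, n_hidden, n_tasks, seed=0):
    """Randomly initialise a shared layer and one linear head per task."""
    rng = np.random.default_rng(seed)
    return {
        "W_shared": rng.normal(scale=0.1, size=(n_features, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_heads": rng.normal(scale=0.1, size=(n_hidden, n_tasks)),
        "b_heads": np.zeros(n_tasks),
    }


def forward(x, params):
    """Predict all tasks at once from a representation shared by every task."""
    # Shared layer: every task reads the same hidden representation.
    hidden = np.tanh(x @ params["W_shared"] + params["b_shared"])
    # Task heads: column j of the output is the prediction for property j.
    return hidden @ params["W_heads"] + params["b_heads"]


def masked_multitask_mse(pred, target, task_weights=None):
    """Weighted sum of per-task mean squared errors, skipping NaN labels."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    has_label = ~np.isnan(target)  # True where a measurement exists
    # Residuals are set to zero wherever the label is missing.
    residual = np.where(has_label, pred - np.nan_to_num(target), 0.0)
    n_labels = has_label.sum(axis=0)  # number of labels per task
    per_task = (residual ** 2).sum(axis=0) / np.maximum(n_labels, 1)
    if task_weights is None:
        task_weights = np.ones(per_task.shape[0])
    total = float(np.dot(task_weights, per_task))
    return total, per_task


# Four hypothetical molecules with three features each, two properties.
params = init_params(n_features=3, n_hidden=5, n_tasks=2)
predictions = forward(np.ones((4, 3)), params)  # array of shape (4, 2)
```

Because the gradient of the summed loss flows through the shared layer from every task, a poorly matched task can also pull that representation in an unhelpful direction, which is one way to picture negative transfer. The `task_weights` argument is an assumed knob for balancing the tasks; the abstract does not state how, or whether, the tasks are weighted.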

In this work, we demonstrate that domain knowledge plays an important role in ensuring “positive” transfer in the multi-task prediction of molecular properties by showcasing two case studies: 1) “seemingly” unrelated properties, the Gibbs free energy of formation and the acentric factor; and 2) theoretically related properties, the critical temperature and the acentric factor. For each property, a single-task GNN-based model is developed as a benchmark against which the potential improvement from multi-task learning is measured. The comparison between the two case studies will demonstrate that, although AI-based techniques and tools might offer improved results compared to conventional modeling techniques, the “chemist-in-the-loop” [8] is an indispensable element in building and improving AI-based property prediction models.
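As background on why the second pair counts as theoretically related, recall the conventional (Pitzer) definition of the acentric factor. It is stated here for context rather than taken from this abstract, and the symbols are the usual ones: the acentric factor is evaluated from the saturation pressure at a reduced temperature of 0.7, so the critical temperature enters its definition directly.

```latex
\omega = -\log_{10}\!\left( P_r^{\mathrm{sat}} \right)\Big|_{T_r = 0.7} - 1,
\qquad T_r = \frac{T}{T_c},
\qquad P_r^{\mathrm{sat}} = \frac{P^{\mathrm{sat}}}{P_c}
```

Here $T_c$ and $P_c$ are the critical temperature and pressure, and $P^{\mathrm{sat}}$ is the saturation (vapor) pressure at temperature $T$.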

References

[1] J. Frutiger, I. Bell, J. P. O’Connell, K. Kroenlein, J. Abildskov, and G. Sin, “Uncertainty assessment of equations of state with application to an organic Rankine cycle,” Mol. Phys., vol. 115, no. 9–12, pp. 1225–1244, 2017.

[2] N. D. Austin, N. V. Sahinidis, and D. W. Trahan, “Computer-aided molecular design: An introduction and review of tools, applications, and solution techniques,” Chem. Eng. Res. Des., vol. 116, pp. 2–26, 2016.

[3] C. W. Coley, R. Barzilay, W. H. Green, T. S. Jaakkola, and K. F. Jensen, “Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction,” J. Chem. Inf. Model., vol. 57, no. 8, pp. 1757–1772, Aug. 2017.

[4] K. Yang et al., “Analyzing Learned Molecular Representations for Property Prediction,” J. Chem. Inf. Model., vol. 59, no. 8, pp. 3370–3388, 2019.

[5] S. Ruder, “An Overview of Multi-Task Learning in Deep Neural Networks,” arXiv, Jun. 2017.

[6] A. M. Schweidtmann, J. G. Rittig, A. König, M. Grohe, A. Mitsos, and M. Dahmen, “Graph Neural Networks for Prediction of Fuel Ignition Quality,” Energy & Fuels, vol. 34, no. 9, pp. 11395–11407, Sep. 2020.

[7] W. Zhang, L. Deng, L. Zhang, and D. Wu, “Overcoming Negative Transfer: A Survey,” arXiv, pp. 1–15, Sep. 2020.

[8] T. J. Wills, D. A. Polshakov, M. C. Robinson, and A. A. Lee, “Impact of Chemist-In-The-Loop Molecular Representations on Machine Learning Outcomes,” J. Chem. Inf. Model., vol. 60, no. 10, pp. 4449–4456, Oct. 2020.