(364d) Multi-Task Property Prediction: Importance of the “Chemist-in-the-Loop” in Model Building
AIChE Annual Meeting
2021 Annual Meeting
Topical Conference: Applications of Data Science to Molecules and Materials
Applications of Data Science in Molecular Sciences II
Tuesday, November 9, 2021 - 4:06pm to 4:18pm
Recent developments in Deep Learning (DL), and especially Graph Neural Networks (GNNs), have eliminated the tedious task of choosing a suitable molecular descriptor for the problem at hand, as these models can learn an optimal representation from a molecular graph and map it to the target property through backpropagation [3], [4]. Traditionally, models are built to predict one specific property or target. However, DL models can predict several properties simultaneously, an approach known as multi-task learning [5]. Here, the models might improve their performance through inductive transfer: while learning to predict property “A”, the model might need less effort to learn how to predict property “B”, and might also be able to transfer the newly gained knowledge to the new domain (task) [5]. This is especially relevant in cases where good-quality experimental data are scarce [6]. However, multi-task learning does not always improve a model's predictive performance, and “negative” transfer might occur [7].
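As a rough illustration of the multi-task setup described above, the following Python sketch shows hard parameter sharing: one shared encoder feeds a separate output head per property, and the joint loss skips missing labels. Everything in it is an assumption made for illustration and is not taken from this work: the function names (`shared_encoder`, `predict_all_tasks`, `masked_multitask_mse`), the single dense layer that stands in for a GNN's message-passing encoder, and the equal default task weights.

```python
def shared_encoder(features, shared_weights):
    """Hypothetical stand-in for a GNN encoder: one dense layer with ReLU.

    In a real model this would be message passing over the molecular graph.
    """
    return [
        max(0.0, sum(w * x for w, x in zip(row, features)))
        for row in shared_weights
    ]


def predict_all_tasks(features, shared_weights, head_weights):
    """Predict every property from the same shared hidden representation."""
    hidden = shared_encoder(features, shared_weights)
    # One linear head per task; only the heads are task-specific.
    return [sum(w * h for w, h in zip(head, hidden)) for head in head_weights]


def masked_multitask_mse(predictions, targets, task_weights=None):
    """Weighted sum of per-task mean squared errors.

    predictions: list of rows, one predicted value per task.
    targets: same shape; None marks a missing experimental label.
    Returns (total_loss, per_task_losses).
    """
    n_tasks = len(predictions[0])
    if task_weights is None:
        task_weights = [1.0] * n_tasks  # assumption: tasks weighted equally
    per_task = []
    for t in range(n_tasks):
        # Only molecules with a measured value contribute to task t.
        errors = [
            (pred[t] - true[t]) ** 2
            for pred, true in zip(predictions, targets)
            if true[t] is not None
        ]
        per_task.append(sum(errors) / len(errors) if errors else 0.0)
    total = sum(w * loss for w, loss in zip(task_weights, per_task))
    return total, per_task


# Usage with made-up numbers (not data from this study).
example_prediction = predict_all_tasks(
    features=[1.0, 2.0],
    shared_weights=[[1.0, 0.0], [0.0, -1.0]],
    head_weights=[[2.0, 1.0], [0.5, 3.0]],
)
example_loss, example_per_task = masked_multitask_mse(
    predictions=[[1.0, 2.0], [3.0, 4.0]],
    targets=[[1.0, None], [1.0, 2.0]],
)
```

Because the encoder weights would receive gradients from every task during training, a property with few labels can still benefit from the representation learned for a better-sampled one. Whether that sharing helps or hurts depends on how the tasks relate, which is the question the abstract raises.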
In this work, we will demonstrate that domain knowledge plays an important role in ensuring “positive” transfer in the multi-task prediction of molecular properties by showcasing two case studies: 1) “seemingly” unrelated properties, namely the Gibbs free energy of formation and the acentric factor; and 2) theoretically related properties, namely the critical temperature and the acentric factor. For each property, a single-task GNN-based model will be developed as a benchmark against which the potential improvement from multi-task transfer learning is measured. The comparative assessment of the two case studies will demonstrate that, although AI-based techniques and tools might offer improved results compared to conventional modeling techniques, the “chemist-in-the-loop” [8] is an indispensable element in building and improving AI-based property prediction models.
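One way to see why the critical temperature and the acentric factor count as theoretically related is the standard Pitzer definition of the acentric factor, which the abstract itself does not spell out. It is reproduced here only as background; the symbols \(P^{\mathrm{sat}}\), \(P_c\), \(T_c\), and \(T_r\) are the usual textbook notation, not notation from this work.

```latex
\omega = -\log_{10}\!\left( \frac{P^{\mathrm{sat}}}{P_c} \right)\Bigg|_{T_r = 0.7} - 1,
\qquad T_r = \frac{T}{T_c}
```

The saturation pressure is evaluated at a temperature fixed by \(T_c\) (namely \(T = 0.7\,T_c\)), so the acentric factor depends on the critical temperature by construction. The Gibbs free energy of formation does not appear in this definition.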
References
[1] J. Frutiger, I. Bell, J. P. O’Connell, K. Kroenlein, J. Abildskov, and G. Sin, “Uncertainty assessment of equations of state with application to an organic Rankine cycle,” Mol. Phys., vol. 115, no. 9–12, pp. 1225–1244, 2017.
[2] N. D. Austin, N. V. Sahinidis, and D. W. Trahan, “Computer-aided molecular design: An introduction and review of tools, applications, and solution techniques,” Chem. Eng. Res. Des., vol. 116, pp. 2–26, 2016.
[3] C. W. Coley, R. Barzilay, W. H. Green, T. S. Jaakkola, and K. F. Jensen, “Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction,” J. Chem. Inf. Model., vol. 57, no. 8, pp. 1757–1772, Aug. 2017.
[4] K. Yang et al., “Analyzing Learned Molecular Representations for Property Prediction,” J. Chem. Inf. Model., vol. 59, no. 8, pp. 3370–3388, 2019.
[5] S. Ruder, “An Overview of Multi-Task Learning in Deep Neural Networks,” arXiv, no. May, Jun. 2017.
[6] A. M. Schweidtmann, J. G. Rittig, A. König, M. Grohe, A. Mitsos, and M. Dahmen, “Graph Neural Networks for Prediction of Fuel Ignition Quality,” Energy & Fuels, vol. 34, no. 9, pp. 11395–11407, Sep. 2020.
[7] W. Zhang, L. Deng, L. Zhang, and D. Wu, “Overcoming Negative Transfer: A Survey,” arXiv, pp. 1–15, Sep. 2020.
[8] T. J. Wills, D. A. Polshakov, M. C. Robinson, and A. A. Lee, “Impact of Chemist-In-The-Loop Molecular Representations on Machine Learning Outcomes,” J. Chem. Inf. Model., vol. 60, no. 10, pp. 4449–4456, Oct. 2020.