(364d) Multi-Task Property Prediction: Importance of the “Chemist-in-the-Loop” in Model Building

Conference

AIChE Annual Meeting

Year

2021

Proceeding

2021 Annual Meeting

Group

Topical Conference: Applications of Data Science to Molecules and Materials

Session

Applications of Data Science in Molecular Sciences II

Time

Tuesday, November 9, 2021 - 4:06pm to 4:18pm

Authors

Aouichaoui, A. - Presenter, Technical University of Denmark

Sin, G., Technical University of Denmark

Abildskov, J., Technical University of Denmark

Mansouri, S. S., Technical University of Denmark

In the absence of experimental data, the ability to predict the thermophysical properties of chemicals of interest becomes crucial in many engineering applications, including phase-equilibria calculations, energy balances, and the evaluation of process alternatives [1]. Quantitative Structure-Property Relationship (QSPR) models enable such predictions by relating the molecular structure, encoded in a machine-readable format (molecular descriptors), to the property of interest through a mathematical model [2].

Recent developments in Deep Learning (DL), and especially in Graph Neural Networks (GNNs), have eliminated the tedious task of choosing a suitable molecular descriptor for the task at hand, as these models can learn an optimal representation from a molecular graph and map it to the target property, with training carried out through backpropagation [3], [4]. Traditionally, models are built to predict one specific property or target. However, DL models can predict several properties simultaneously, an approach known as multitask learning [5]. Here, the models might improve their performance through inductive transfer learning: while learning to predict property “A”, the model might need less effort to learn how to predict property “B”, and it might also be able to transfer the newly gained knowledge into the new domain (task) [5]. This is especially relevant where good-quality experimental data are scarce [6]. However, the model's predictive performance does not always improve, and “negative” transfer might occur [7].
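As a minimal sketch of the multitask setting described above, and not the authors' implementation, the following pure-Python function computes a masked mean-squared-error loss over several properties. It assumes that some molecules lack a measurement for some properties (marked `None`), which is the typical situation when experimental data are scarce. The function name, the equal default task weights, and the data layout are hypothetical choices made for illustration, and the numbers are made up.

```python
from typing import Optional, Sequence


def masked_multitask_mse(
    predictions: Sequence[Sequence[float]],
    targets: Sequence[Sequence[Optional[float]]],
    task_weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-task MSEs, skipping missing labels (None).

    predictions[i][t] is the model output for molecule i and property t;
    targets[i][t] is the measured value, or None if it was never measured.
    """
    n_tasks = len(predictions[0])
    if task_weights is None:
        # Assumption: all properties count equally unless stated otherwise.
        task_weights = [1.0] * n_tasks

    total = 0.0
    for t in range(n_tasks):
        sq_errors = [
            (pred[t] - targ[t]) ** 2
            for pred, targ in zip(predictions, targets)
            if targ[t] is not None
        ]
        if sq_errors:  # a property with no labels contributes nothing
            total += task_weights[t] * sum(sq_errors) / len(sq_errors)
    return total


# Illustrative, made-up values: three molecules, two properties.
preds = [[1.0, 0.5], [2.0, 0.0], [3.0, 1.0]]
targs = [[1.0, None], [4.0, 0.0], [None, 3.0]]

loss = masked_multitask_mse(preds, targs)
weighted_loss = masked_multitask_mse(preds, targs, task_weights=[1.0, 0.5])
print(loss, weighted_loss)
```

Masking lets every molecule contribute to the shared representation even when only one of its properties has been measured. The task weights are one place where domain knowledge can enter, for example to balance properties with very different scales.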

In this work, we will demonstrate that domain knowledge plays an important role in ensuring “positive” learning in the multi-task prediction of molecular properties by presenting two case studies: 1) “seemingly” unrelated properties, namely the Gibbs free energy of formation and the acentric factor, and 2) theoretically related properties, namely the critical temperature and the acentric factor. For each property, a single-task GNN-based model will be developed as a benchmark against which the potential improvement from multi-task transfer learning can be measured. The comparative assessment of the two case studies will demonstrate that, although AI-based techniques and tools might offer improved results compared to conventional modeling techniques, the “chemist-in-the-loop” [8] is an indispensable element in building and improving AI-based property prediction models.
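The comparison against a single-task benchmark can be sketched as follows. This is an illustration under stated assumptions, not the evaluation protocol of this work: the function names, the choice of a relative error change as the criterion, and the tolerance are hypothetical, and the error values are placeholders rather than results.

```python
def relative_error_change(single_task_error: float, multi_task_error: float) -> float:
    """Fractional change in test error when moving from the single-task
    benchmark to the multi-task model. Negative values mean improvement."""
    if single_task_error <= 0:
        raise ValueError("single_task_error must be positive")
    return (multi_task_error - single_task_error) / single_task_error


def classify_transfer(
    single_task_error: float,
    multi_task_error: float,
    tolerance: float = 0.0,
) -> str:
    """Label the outcome as 'positive', 'negative', or 'neutral' transfer.

    tolerance is a hypothetical dead band (as a fraction of the benchmark
    error) below which a difference is treated as noise.
    """
    change = relative_error_change(single_task_error, multi_task_error)
    if change < -tolerance:
        return "positive"
    if change > tolerance:
        return "negative"
    return "neutral"


# Placeholder error values for one property, not reported results.
improved = classify_transfer(0.10, 0.08)
degraded = classify_transfer(0.10, 0.12)
within_noise = classify_transfer(0.10, 0.102, tolerance=0.05)
print(improved, degraded, within_noise)
```

In practice the same error metric and the same test split would be used for both models, so that the label reflects the effect of adding the second property rather than a change in the data.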

References

[1] J. Frutiger, I. Bell, J. P. O’Connell, K. Kroenlein, J. Abildskov, and G. Sin, “Uncertainty assessment of equations of state with application to an organic Rankine cycle,” Mol. Phys., vol. 115, no. 9–12, pp. 1225–1244, 2017.

[2] N. D. Austin, N. V. Sahinidis, and D. W. Trahan, “Computer-aided molecular design: An introduction and review of tools, applications, and solution techniques,” Chem. Eng. Res. Des., vol. 116, pp. 2–26, 2016.

[3] C. W. Coley, R. Barzilay, W. H. Green, T. S. Jaakkola, and K. F. Jensen, “Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction,” J. Chem. Inf. Model., vol. 57, no. 8, pp. 1757–1772, Aug. 2017.

[4] K. Yang et al., “Analyzing Learned Molecular Representations for Property Prediction,” J. Chem. Inf. Model., vol. 59, no. 8, pp. 3370–3388, 2019.

[5] S. Ruder, “An Overview of Multi-Task Learning in Deep Neural Networks,” arXiv, Jun. 2017.

[6] A. M. Schweidtmann, J. G. Rittig, A. König, M. Grohe, A. Mitsos, and M. Dahmen, “Graph Neural Networks for Prediction of Fuel Ignition Quality,” Energy & Fuels, vol. 34, no. 9, pp. 11395–11407, Sep. 2020.

[7] W. Zhang, L. Deng, L. Zhang, and D. Wu, “Overcoming Negative Transfer: A Survey,” arXiv, pp. 1–15, Sep. 2020.

[8] T. J. Wills, D. A. Polshakov, M. C. Robinson, and A. A. Lee, “Impact of Chemist-In-The-Loop Molecular Representations on Machine Learning Outcomes,” J. Chem. Inf. Model., vol. 60, no. 10, pp. 4449–4456, Oct. 2020.

Topics

Computational Molecular Engineering

Computing and Systems Engineering

Physical Properties
