(531f) Keynote Talk - Machine Learning of Molecular and Materials Properties at the Low-Data Limit | AIChE


Authors 

Rangarajan, S. - Presenter, Lehigh University - Dept of Chem & Biomolecular
Tian, H., Lehigh University
Li, B., Lehigh University
The development of next-generation chemical manufacturing technologies will benefit immensely from robust, reliable, and accurate multiscale models that capture the underlying physical phenomena at multiple scales. A challenge, to this end, is to develop fast and accurate models of molecular and material properties using a combination of experimental and ab initio techniques. In this context, recent years have witnessed rapid growth in the application of data science and machine learning to predict such properties from large, homogeneous, and often computed molecular/material property datasets. Many properties, however, are hard to measure or compute; consequently, the available data in such cases are scarce, heterogeneous (that is, drawn from a variety of sources), and of differing fidelity. This talk will focus on data-driven model building for such “imperfect” datasets.

Using examples from molecule design and catalysis, this talk will present several methods, arising from process systems engineering, that can be applied to address these challenges. First, active learning techniques that balance exploration of the molecule/material space against exploitation of the current model will be discussed for learning linear (generalized group additive) and nonlinear (graph convolutional neural network) property models. Second, transfer learning will be discussed, whereby information (model structure and features) from a model of a molecular property for which data are plentiful is leveraged (“transferred”) while training a model of a related property for which data are scarce. Third, we will show that, by infusing a relatively large amount of low-fidelity data via multitask learning, the thermodynamic properties of adsorption on catalytic surfaces can be modeled to a high level of accuracy using only small amounts of high-accuracy data.
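
As a rough illustration of the exploration/exploitation balance in active learning, the sketch below runs a pool-based loop on a linear, group-additive-style model. It is not the presenters' method: the Bayesian linear regression surrogate, the upper-confidence-bound acquisition rule, the synthetic group-count features, and all names (`fit_bayesian_linear`, `acquisition`, `active_learning`, `kappa`, `oracle`) are assumptions made only for this example.

```python
import numpy as np

def fit_bayesian_linear(X, y, alpha=1.0, noise_var=0.01):
    """Posterior over weights for y = X w + noise, with prior w ~ N(0, I / alpha)."""
    d = X.shape[1]
    precision = alpha * np.eye(d) + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov

def acquisition(X_pool, mean, cov, kappa=2.0):
    """Upper confidence bound: predicted value (exploit) + kappa * std (explore)."""
    mu = X_pool @ mean
    var = np.einsum("ij,jk,ik->i", X_pool, cov, X_pool)
    return mu + kappa * np.sqrt(np.maximum(var, 0.0))

def active_learning(X_pool, oracle, n_init=3, n_queries=10, kappa=2.0, seed=0):
    """Label a few random candidates, then repeatedly query the best-scoring one."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    y = [oracle(i) for i in labeled]
    for _ in range(n_queries):
        mean, cov = fit_bayesian_linear(X_pool[labeled], np.array(y))
        scores = acquisition(X_pool, mean, cov, kappa)
        scores[labeled] = -np.inf          # never re-query a labeled candidate
        nxt = int(np.argmax(scores))
        labeled.append(nxt)
        y.append(oracle(nxt))              # the "expensive" measurement or calculation
    mean, _ = fit_bayesian_linear(X_pool[labeled], np.array(y))
    return mean, labeled

# Synthetic stand-in data: each row counts how often 5 hypothetical groups occur
# in a candidate molecule; the property is a noisy sum of group contributions.
data_rng = np.random.default_rng(1)
X_pool = data_rng.integers(0, 4, size=(200, 5)).astype(float)
w_true = np.array([1.0, -0.5, 2.0, 0.3, -1.2])

def oracle(i):
    return float(X_pool[i] @ w_true + data_rng.normal(0.0, 0.05))

w_hat, queried = active_learning(X_pool, oracle)
print("estimated group contributions:", np.round(w_hat, 2))
print("candidates labeled:", len(queried), "of", len(X_pool))
```

Setting `kappa` to zero makes the loop purely exploitative (it always picks the candidate with the highest predicted property), while larger values favor candidates whose predictions are most uncertain. A nonlinear surrogate such as a graph convolutional network would need a different uncertainty estimate, which this sketch does not attempt.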
