(477k) Interpretable Model for Molecular Data Fusion | AIChE

Authors 

Bhattacharjee, H. - Presenter, University of Delaware
Vlachos, D. - Presenter, University of Delaware - Catalysis Center For Ener
In this work, we introduce a graph-theoretical framework for fusing thermochemical data across different levels of quantum theory. We show that it can be used to predict different thermochemical quantities at a higher level of theory from a quantity computed at a lower level of theory. The generalizability of the model is investigated, and rigorous statistical tests are used to guarantee bounds on model predictions. Two important aspects of machine-learned models are addressed: domain knowledge and interpretability. We show how our model draws on chemical knowledge and thus provides an interpretable mapping across levels of theory and quantities of interest. This allows us to draw physical insight from the learning process. The approach is illustrated with multiple mapping tasks and levels of quantum theory on a dataset of ~12k molecules. Chemical accuracy (1 kcal/mol) is approached for one task and surpassed for all the others. Our approach provides a blueprint for merging disparate and incomplete literature datasets, built using different levels of theory, into a more comprehensive thermochemical database for applications such as Big Data analysis.
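
The abstract does not specify the model's functional form, so the following is only a minimal illustrative sketch of what an interpretable mapping between levels of theory could look like, not the authors' method. It assumes each molecule is described by counts of graph-derived fragments (the fragment names, counts, energies, and weights below are all synthetic placeholders) and fits one additive correction per fragment by ordinary least squares, so that each learned weight can be read as a per-fragment shift between the low and high level of theory.

```python
# Toy sketch (assumption, not the published model): learn per-fragment
# corrections that map a low-level energy onto a high-level one.
# All fragment names and numbers are synthetic placeholders.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_fragment_corrections(counts, e_low, e_high):
    """Least-squares fit of w in: e_high - e_low ~= counts . w"""
    k = len(counts[0])
    residual = [h - l for h, l in zip(e_high, e_low)]
    # Normal equations: (X^T X) w = X^T r
    xtx = [[sum(row[i] * row[j] for row in counts) for j in range(k)]
           for i in range(k)]
    xtr = [sum(row[i] * r for row, r in zip(counts, residual))
           for i in range(k)]
    return solve_linear_system(xtx, xtr)


def predict_high(counts_row, e_low_value, weights):
    """Low-level value plus the sum of per-fragment corrections."""
    return e_low_value + sum(n * w for n, w in zip(counts_row, weights))


# Hypothetical fragment types and synthetic "true" corrections (kcal/mol).
FRAGMENTS = ["C-H", "C-C", "C-O"]
TRUE_W = [0.5, -1.0, 2.0]

# Synthetic training set: fragment counts and low-level energies.
counts = [[4, 0, 0], [6, 1, 0], [8, 2, 0], [4, 0, 1], [6, 1, 1], [10, 3, 0]]
e_low = [-10.0, -20.0, -30.0, -15.0, -25.0, -40.0]
e_high = [predict_high(row, lo, TRUE_W) for row, lo in zip(counts, e_low)]

weights = fit_fragment_corrections(counts, e_low, e_high)
for name, w in zip(FRAGMENTS, weights):
    print(f"{name}: correction {w:+.3f} kcal/mol per fragment")

# Apply the fitted corrections to an unseen (synthetic) molecule.
print(predict_high([8, 2, 1], -35.0, weights))
```

Because the synthetic high-level energies are generated from the same additive form, the fit recovers the placeholder weights; real data would leave a residual, and it is on that residual that statistical bounds of the kind mentioned in the abstract would be assessed.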