(509cs) Combining Uncertainty Metrics to Control Neural Network Error and Accelerate Chemical Exploration | AIChE

Authors 

Musa, E. - Presenter, University of Michigan
Gruich, C. - Presenter, Mississippi State University
Machine learning (ML) promises to accelerate chemical and materials discovery in computational chemistry. State-of-the-art ML methods such as artificial neural networks (NNs) have been shown to reproduce the structures and energetics obtained from high-fidelity quantum mechanical calculations at a fraction of the computational cost. However, applying ML effectively in computational chemistry workflows requires that the uncertainty of the model’s predictions be estimated accurately and the error controlled. Established uncertainty metrics for NNs are either costly to obtain (e.g., the ensemble method) or of limited effectiveness in predicting error (e.g., Monte Carlo dropout).

Previous work using an NN to predict the energetics of small molecules showed that a k-nearest-neighbors distance in the latent space predicts error more accurately than the dropout method and performs comparably to the ensemble method while being more computationally tractable [1]. While promising, it is unknown how well these small-molecule results translate to solid-state materials and heterogeneous catalysis. Here we address this question and examine two new latent space uncertainty metrics, which we dub the “latent density” and the “latent probability”. We also hypothesize that combining latent space metrics with input (feature) space metrics gives a better indicator of prediction error than latent space metrics alone.
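
As a rough illustration of the latent space distance idea, the sketch below scores a query point by its mean Euclidean distance to the k nearest training points in latent space. It is a minimal sketch under stated assumptions, not the implementation from [1]: the function name `knn_latent_distance`, the choice to average over the k neighbors, the Euclidean norm, and the toy 2-D latent vectors are all hypothetical. In practice the latent vectors would be taken from a hidden layer of the trained NN.

```python
import math


def knn_latent_distance(query, train_latents, k=5):
    """Mean Euclidean distance from `query` to its k nearest training latent vectors.

    Assumption: averaging over the k nearest neighbors is one common choice;
    the abstract does not specify the exact definition.
    """
    if not 1 <= k <= len(train_latents):
        raise ValueError("k must be between 1 and the number of training points")
    # Distances from the query to every training point, smallest first.
    dists = sorted(math.dist(query, z) for z in train_latents)
    return sum(dists[:k]) / k


# Toy latent vectors (hypothetical values, for illustration only).
train_latents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

near = knn_latent_distance((0.5, 0.5), train_latents, k=2)  # inside the training cloud
far = knn_latent_distance((5.0, 5.0), train_latents, k=2)   # far from all training data
```

A larger score means the query lies farther from the training data in latent space, which is the situation where a larger prediction error would be expected.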

We analyze the proposed latent space uncertainty metrics and compare their efficacy in controlling NN error on the Open Catalyst 2020 (OC20) dataset [2]. We train an NN on a subset of reference data from OC20 and test the error prediction performance of existing NN uncertainty metrics (Monte Carlo dropout, the ensemble method, and feature space distances), latent space metrics (k-nearest-neighbors distance, latent density, and latent probability), and metrics that combine latent space and feature space information. Ultimately, better uncertainty estimation for NNs applied to solid-state materials will enable wider adoption of ML in computational chemistry research and accelerate the exploration of novel materials.
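
The abstract does not say how latent space and feature space metrics are combined or how error control is measured, so the following is only one plausible sketch. It assumes each metric is standardized (z-scored) and then mixed with a hypothetical `weight`, and it assumes error control means keeping only predictions whose combined score falls below a cutoff and reporting their mean absolute error. All function names and the toy numbers are illustrative assumptions, not results from this work.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_metrics(latent_scores, feature_scores, weight=0.5):
    """Weighted sum of z-scored latent and feature space metrics (assumed scheme)."""
    zl, zf = zscore(latent_scores), zscore(feature_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(zl, zf)]


def mae_below_cutoff(scores, abs_errors, cutoff):
    """Mean absolute error of the predictions whose score is at or below `cutoff`."""
    kept = [e for s, e in zip(scores, abs_errors) if s <= cutoff]
    return mean(kept) if kept else None


# Arbitrary toy values, for illustration only.
latent = [0.1, 0.2, 0.9, 1.5]     # e.g. latent space distances
feature = [0.3, 0.1, 1.0, 2.0]    # e.g. feature space distances
abs_err = [0.05, 0.04, 0.30, 0.90]

combined = combine_metrics(latent, feature)
mae_all = mean(abs_err)
mae_kept = mae_below_cutoff(combined, abs_err, cutoff=0.0)
```

If a metric tracks error well, tightening the cutoff should lower the error of the retained predictions, and comparing that trend across metrics is one way to rank them.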

References:

[1] Janet, Duan, Yang, Nandy, and Kulik. “A quantitative uncertainty metric controls error in neural network-driven chemical discovery”, Chem. Sci., (2019), 10, 7913

[2] Chanussot et al. “The Open Catalyst 2020 (OC20) Dataset and Community Challenges”, arXiv, (2021), 2010.09990