(509cs) Combining Uncertainty Metrics to Control Neural Network Error and Accelerate Chemical Exploration

Conference

AIChE Annual Meeting

Year

2021

Proceeding

2021 Annual Meeting

Group

Catalysis and Reaction Engineering Division

Session

Poster Session: Catalysis and Reaction Engineering (CRE) Division

Time

Wednesday, November 10, 2021 - 3:30pm to 5:00pm

Authors

Musa, E. - Presenter, University of Michigan

Gruich, C. - Presenter, Mississippi State University

Goldsmith, B.

Doherty, F.

Machine learning (ML) promises to accelerate chemical and materials discovery in computational chemistry. State-of-the-art ML methods such as artificial neural networks (NNs) have been shown to reproduce the structures and energetics obtained from high-fidelity quantum mechanical calculations at a fraction of the computational cost. However, applying ML effectively in computational chemistry workflows requires that the uncertainty of the ML model’s predictions be accurately estimated and the error controlled. Established uncertainty metrics for NNs are either costly to obtain (e.g., the ensemble method) or of limited effectiveness in predicting error (e.g., Monte Carlo dropout).
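As a rough illustration of the two established metrics named above, the sketch below computes an ensemble spread and a Monte Carlo dropout spread with NumPy. It is not the authors' implementation: the function names, the single-hidden-layer ReLU network, and the parameters `rate`, `n_passes`, and `seed` are all assumptions chosen for brevity.

```python
import numpy as np

def ensemble_uncertainty(member_predictions):
    """Mean and sample standard deviation across independently trained models.

    member_predictions: array of shape (n_models, n_samples). Training the
    n_models networks is the expensive part and is not shown here.
    """
    preds = np.asarray(member_predictions, dtype=float)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

def mc_dropout_uncertainty(x, w1, w2, rate=0.1, n_passes=100, seed=0):
    """Mean and spread over stochastic forward passes of ONE network.

    Hypothetical toy network: hidden = relu(x @ w1), output = hidden @ w2,
    with dropout applied to the hidden layer at prediction time.
    """
    rng = np.random.default_rng(seed)
    hidden = np.maximum(np.asarray(x, dtype=float) @ w1, 0.0)
    samples = []
    for _ in range(n_passes):
        # Keep each hidden unit with probability (1 - rate), then rescale
        mask = rng.random(hidden.shape) >= rate
        samples.append((hidden * mask / (1.0 - rate)) @ w2)
    samples = np.stack(samples)
    return samples.mean(axis=0), samples.std(axis=0, ddof=1)
```

The ensemble estimate needs several trained networks, whereas the dropout estimate reuses one network at the price of repeated forward passes, which is the cost trade-off the abstract refers to.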

Previous work using a NN to predict the energetics of small molecules showed that a K-nearest neighbors distance in the latent space predicts error more accurately than the dropout method and performs comparably to the ensemble method while being more computationally tractable^[1]. While promising, it is unknown how well these results for small molecules translate to solid-state materials and heterogeneous catalysis. Here we address this question and examine two new latent space uncertainty metrics that we dub the “latent density” and “latent probability”. We also hypothesize that combining latent space metrics with input space metrics gives a better indicator of prediction error than latent space metrics alone.
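A K-nearest neighbors latent distance can be sketched as follows, assuming the latent vectors (for example, activations of the NN's last hidden layer) have already been extracted for the training set and for the query structures. The function name, the Euclidean norm, the averaging over neighbors, and the default `k=5` are illustrative assumptions. The “latent density” and “latent probability” metrics are not defined in this abstract, so they are not sketched here.

```python
import numpy as np

def knn_latent_distance(train_latent, query_latent, k=5):
    """Mean Euclidean distance from each query point to its k nearest
    training points in latent space. Larger values mean the query lies
    farther from the training data.
    """
    train_latent = np.asarray(train_latent, dtype=float)
    query_latent = np.asarray(query_latent, dtype=float)
    if not 1 <= k <= len(train_latent):
        raise ValueError("k must be between 1 and the number of training points")
    # Pairwise distances, shape (n_query, n_train)
    diffs = query_latent[:, None, :] - train_latent[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # np.partition places the k smallest distances in the first k columns
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

Unlike an ensemble, this needs only one trained network plus a distance computation. For large training sets, a tree-based or approximate neighbor search would replace the dense pairwise distance matrix used here.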

We analyze the proposed latent space uncertainty metrics and compare their efficacy in controlling NN error on the Open Catalyst 2020 (OC20) dataset^[2]. We train a NN on a subset of reference data from OC20 and test the error prediction performance of existing NN uncertainty metrics (Monte Carlo dropout, ensemble method, and feature space distances), latent space metrics (K-nearest neighbors distance, latent density, and latent probability), and metrics combining latent space and feature space information. Ultimately, better uncertainty estimation approaches for NNs applied to solid-state materials will enable wider adoption of ML in computational chemistry research and accelerate the exploration of novel materials.
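One simple way to quantify how well an uncertainty metric controls error is to keep only the predictions whose metric falls below a cutoff and then report the fraction retained and their mean absolute error. The sketch below assumes this cutoff-based protocol; the abstract does not specify the evaluation procedure, and the function name and example numbers are hypothetical.

```python
import numpy as np

def error_below_cutoff(uncertainty, abs_error, cutoff):
    """Return (fraction retained, MAE of retained predictions) when only
    predictions with uncertainty <= cutoff are trusted.
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    abs_error = np.asarray(abs_error, dtype=float)
    keep = uncertainty <= cutoff
    fraction = float(keep.mean())
    # MAE is undefined when nothing passes the cutoff
    mae = float(abs_error[keep].mean()) if keep.any() else float("nan")
    return fraction, mae

# Made-up numbers for illustration only, not results from OC20
u = [0.1, 0.2, 0.9, 1.5]
e = [0.05, 0.10, 0.60, 1.20]
fraction, mae = error_below_cutoff(u, e, cutoff=0.5)
```

Sweeping the cutoff and comparing the resulting error-versus-retention curves across metrics gives a like-for-like comparison: a useful metric yields a lower error at the same retained fraction.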

References:

[1] Janet, Duan, Yang, Nandy, and Kulik. “A quantitative uncertainty metric controls error in neural network-driven chemical discovery”, Chem. Sci., (2019), 10, 7913

[2] Ulissi et al. “The Open Catalyst 2020 (OC20) Dataset and Community Challenges”, arXiv, (2021), 2010.09990

Topics

Catalysis

Computational Molecular Engineering

Nanomaterials
