(203c) Characterizing Uncertainty and Error in Machine Learning Chemical Property Prediction

Authors 

Heid, E. - Presenter, Massachusetts Institute of Technology
McGill, C. J., North Carolina State University
Vermeire, F., Massachusetts Institute of Technology
Green, W., Massachusetts Institute of Technology

Deep neural networks have recently made a tremendous impact across chemical engineering disciplines, where graph-convolutional neural networks can predict molecular properties with higher accuracy than previous state-of-the-art techniques. One of the main remaining challenges is to better understand and quantify the different sources of uncertainty associated with molecular property prediction. The uncertainty of a machine learning model's predictions, and thus their deviation from the true target values, stems both from the model itself (epistemic uncertainty, comprising model bias, parameter uncertainty, and interpolation uncertainty) and from the underlying data and its noise (aleatoric uncertainty). Allocating a model's error between epistemic and aleatoric contributions is non-trivial, but essential for identifying where performance can still be improved. This categorization is especially difficult in chemical applications, where the vast chemical space and the diverse nature and number of targets give rise to a multitude of possible sources of prediction error. The high dimensionality of chemical space further complicates distinguishing interpolation from extrapolation for a given prediction.
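
For squared-error regression, this split can be made concrete through the standard bias-variance-noise decomposition (written here for a target y = f(x) + ε with Var(ε) = σ²; the notation is illustrative and not taken from the abstract):

    E[(y − ŷ(x))²] = (E[ŷ(x)] − f(x))² + Var[ŷ(x)] + σ²
                   = bias² + variance + noise,

where the bias² and variance terms are epistemic (reducible in principle with better models or more data) and the σ² noise term is aleatoric (irreducible for a given dataset).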

We systematically study the influence of model bias (such as errors due to the model architecture and input representation), model variance, and target data noise on the performance of graph-convolutional neural networks on chemical prediction tasks. By designing molecular prediction tasks for which an exact solution is known and achievable by a graph-convolutional neural network, we can introduce errors into the data and models in a controlled manner and study their effect on model performance. We combine the addition of controlled errors with different uncertainty estimation techniques, changes to model architecture, and changes in the size or makeup of the dataset to demonstrate trends important to users of machine learning for property prediction. We show that, in the presence of random noise in the training and test sets, the true performance of a model can continue to improve with larger datasets while the apparent performance approaches an asymptote and ceases to improve. Further, we demonstrate the utility of heteroscedastic and homoscedastic loss functions for assessing the presence of noise in the dataset, both when that noise is associated with model features and when it is not. We apply measured ensemble variance to assess epistemic error, and use statistical analysis of the results to project how much of the model error is due to variance observable through ensembling and how much is a baseline bias. Using trends over batch size and observed interactions between the different uncertainty characterizations, we provide methods for estimating the contribution of each error type to model performance, the likely effect of adding more data, and the maximum benefit available from ensembling.
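
To make the techniques above concrete, the following is a minimal PyTorch sketch of a heteroscedastic (Gaussian negative log-likelihood) loss, its homoscedastic counterpart, and the ensemble-based split into epistemic and aleatoric estimates; the function names and the assumption that each model predicts a mean and a log-variance are illustrative, not the authors' implementation:

```python
import torch

def heteroscedastic_nll(mean, log_var, target):
    # Gaussian negative log-likelihood with a per-sample predicted
    # variance: minimizing it lets the model attribute part of the
    # residual error to input-dependent (aleatoric) noise.
    return 0.5 * (log_var + (target - mean) ** 2 / torch.exp(log_var)).mean()

def homoscedastic_mse(mean, target):
    # Fixed-noise counterpart: a Gaussian likelihood with a single
    # global variance reduces to mean squared error up to constants.
    return ((target - mean) ** 2).mean()

def ensemble_error_split(preds, pred_log_vars):
    # preds, pred_log_vars: tensors of shape (K, N) collected from K
    # ensemble members on N test molecules. The variance across members
    # estimates the epistemic (model-variance) component; the mean
    # predicted variance estimates the aleatoric (data-noise) component.
    mean = preds.mean(dim=0)
    epistemic = preds.var(dim=0, unbiased=False)
    aleatoric = pred_log_vars.exp().mean(dim=0)
    return mean, epistemic, aleatoric
```

Two caveats follow directly from the abstract's framing: the ensemble variance captures only the variance component of epistemic error, so any bias shared by all ensemble members remains invisible to it (hence the statistical projection of a baseline bias), and label noise in the test set places a floor of roughly σ² under the apparent squared error, which is one way to read the asymptote described above.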