(474i) Exploring the Impact of Training Data Distributions on the Accuracy of Machine Learning Force Fields | AIChE

Neural network interatomic potentials (NNIPs) have become increasingly popular in molecular simulations because they provide accurate and computationally efficient evaluations of atomic interactions. In this study, we investigate how the configurational sampling distribution of the training data affects the accuracy of NNIPs for two molecular systems, butane and alanine dipeptide. Specifically, we examine scenarios in which the range of the collective variables (CVs) required for free energy surface determination is not known a priori. To address this, we create representative datasets that mimic various distributions of configurations and use hyperparameter optimization to train the NNIP models, analyzing their dependence on system size and shape. Additionally, we propose a comprehensive testing procedure to validate the accuracy and stability of the models; it evaluates energy/force predictions, molecular dynamics stability, structural properties, thermodynamic properties, and free energy surface determination using enhanced sampling techniques. Our study highlights the challenges of generating a training dataset that sufficiently represents the CVs for these molecular systems, and provides a robust testing methodology for accurate free energy surface determination with NNIPs and enhanced sampling when the required CV range is unknown.
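
The abstract does not say how the datasets that "mimic various distributions of configurations" were built. As a minimal sketch of one possible approach, assuming a pool of configurations already labelled with a single CV (for example the butane C-C-C-C dihedral in degrees), the pool can be resampled so that its CV histogram follows a chosen target. The function name `resample_to_target`, the binning, and the resampling scheme are all hypothetical illustrations, not the authors' method.

```python
import numpy as np

def resample_to_target(cv, target_weights, bin_edges, n_samples, seed=0):
    """Pick configuration indices so the CV histogram follows target_weights.

    cv             : 1-D array of CV values, one per configuration in the pool
    target_weights : desired relative weight of each bin (need not sum to 1)
    bin_edges      : bin edges, length len(target_weights) + 1
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    cv = np.asarray(cv, dtype=float)
    target = np.asarray(target_weights, dtype=float)
    # Bin index of every configuration; clip so out-of-range values
    # and the right-most edge fall into the first/last bin.
    bins = np.clip(np.digitize(cv, bin_edges) - 1, 0, len(target) - 1)
    counts = np.bincount(bins, minlength=len(target))
    # Probability per configuration: its bin's target weight divided by
    # that bin's population. Bins with no configurations cannot be
    # sampled, so their target weight is effectively dropped.
    p = target[bins] / counts[bins]
    p /= p.sum()
    return rng.choice(len(cv), size=n_samples, replace=True, p=p)

# Example: a training set confined to a CV sub-range, mimicking the case
# where the CV range needed for the free energy surface is not known.
pool_cv = np.random.default_rng(1).uniform(-180.0, 180.0, 5000)
edges = np.linspace(-180.0, 180.0, 13)   # twelve 30-degree bins
weights = np.zeros(12)
weights[4:8] = 1.0                       # keep only -60 to 60 degrees
train_idx = resample_to_target(pool_cv, weights, edges, n_samples=1000)
```

Changing `weights` (uniform, Boltzmann-like, or truncated as above) gives training sets with different CV coverage from the same pool, which is the kind of comparison the abstract describes. Sampling here is with replacement, so duplicates are possible when a bin is sparsely populated.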