(544l) Optimization of VASPsol Solvation Free Energy Predictions
2023 AIChE Annual Meeting
Computational Molecular Science and Engineering Forum
Recent Advances in Molecular Simulation Methods II
Thursday, November 9, 2023 - 5:42pm to 5:54pm
This work demonstrates techniques for reducing VASPsol's prediction error for solvation free energies against a dataset of neutral compounds solvated in various solvents. We apply three methods to find a set of VASPsol cavity parameters that lowers the error across the entire Truhlar set. First, we perform a grid search over the electronic cutoff and the transition of the diffuse cavity, with the surface tension of the cavity held fixed. Second, we apply a Nelder-Mead search for the minimum error over all three VASPsol parameters, including the surface tension. Third, we train artificial neural networks on the errors collected from these searches and use them to guide the optimization.
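The grid search step can be sketched as follows. This is a minimal illustration only: `toy_mae` is a hypothetical stand-in for a batch of VASPsol calculations scored against experiment, its quadratic form and minimum are invented, and the parameter values are in arbitrary units rather than real VASPsol settings.

```python
import itertools


def toy_mae(cutoff, width, tension):
    """Hypothetical error surface (arbitrary units).

    In practice each evaluation would be a set of VASPsol runs compared
    against experimental solvation free energies.
    """
    return 1.0 + (cutoff - 2.0) ** 2 + 0.5 * (width - 0.5) ** 2 + 0.25 * (tension - 1.0) ** 2


def grid_search(error_fn, cutoffs, widths, tension):
    """Scan every (cutoff, width) pair with the surface tension held fixed."""
    best = min(
        itertools.product(cutoffs, widths),
        key=lambda pair: error_fn(pair[0], pair[1], tension),
    )
    return best, error_fn(best[0], best[1], tension)


# Illustrative grid values, not recommended VASPsol parameters.
cutoffs = [1.0, 1.5, 2.0, 2.5, 3.0]
widths = [0.25, 0.5, 0.75, 1.0]
best_params, best_error = grid_search(toy_mae, cutoffs, widths, tension=1.0)
print(best_params, best_error)
```

For the three-parameter search, one option (an assumption here, since the abstract does not name an implementation) is `scipy.optimize.minimize` with `method="Nelder-Mead"`, started from the best grid point.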
For our machine learning descriptors, we use the COSMO-SAC sigma profile. The sigma profile is a histogram of the charge density on the solvent-accessible surface of each molecule, as defined by the COSMO procedure, and we use it as a feature in the general analysis of the molecular set. From a dataset of sigma profiles, VASPsol parameters, and solvation energy errors, we construct artificial neural networks trained to learn the error that VASPsol produces for a given molecule and set of VASPsol parameters. Experimental solvation energies are taken from the Truhlar Minnesota dataset. We first assemble a set of 9 candidate molecules balanced by their sigma profiles, which we obtain from online databases such as the Virginia Tech dataset by Mullins et al. We then build a set of 9 training molecules from chemical intuition. These molecules also have an average sigma profile that matches the average sigma profile of the Truhlar set.
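The two ideas in this paragraph, a sigma profile as a histogram and a subset whose average profile matches the full set, can be sketched as below. All names are hypothetical, the area weighting of the histogram is an assumption, and the exhaustive subset search is only practical for very small toy sets; the abstract does not state how the balanced set was actually selected.

```python
import itertools


def sigma_profile(charge_densities, areas, edges):
    """Area-weighted histogram of surface-segment charge densities."""
    profile = [0.0] * (len(edges) - 1)
    for sigma, area in zip(charge_densities, areas):
        for k in range(len(profile)):
            # Half-open bins [edges[k], edges[k + 1])
            if edges[k] <= sigma < edges[k + 1]:
                profile[k] += area
                break
    return profile


def mean_profile(profiles):
    """Bin-by-bin average of several sigma profiles."""
    count = len(profiles)
    return [sum(column) / count for column in zip(*profiles)]


def profile_distance(p, q):
    """Euclidean distance between two profiles on the same bins."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def best_matching_subset(profiles, size):
    """Indices of the subset whose mean profile is closest to the full-set mean."""
    target = mean_profile(profiles)
    return min(
        itertools.combinations(range(len(profiles)), size),
        key=lambda idx: profile_distance(mean_profile([profiles[i] for i in idx]), target),
    )


# Toy segment data in arbitrary units, invented for illustration.
edges = [-0.02, -0.01, 0.0, 0.01, 0.02]
example = sigma_profile([-0.015, -0.005, 0.005, 0.005], [1.0, 2.0, 3.0, 4.0], edges)

# Three toy two-bin profiles; pick the pair that best matches the overall mean.
toy_profiles = [[4.0, 0.0], [0.0, 4.0], [3.0, 1.0]]
chosen = best_matching_subset(toy_profiles, 2)
print(example, chosen)
```

In this toy case the two most dissimilar profiles are chosen together, because their average lies closest to the set average. A matching average does not guarantee a representative set.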
Parameters obtained from grid searches or direct optimization with the training set produce larger errors against the entire Truhlar set. Despite our efforts, the training set appears to be biased, which caused us to overfit the VASPsol parameters. The minima from each procedure and their corresponding errors are shown in Figure 1. The traditional approaches, applied to a small dataset, resulted in worse errors than the default VASPsol parameters.
As a new approach, we train an artificial neural network on a dataset constructed from benchmarks and prior optimization attempts, namely the grid search, the Nelder-Mead optimizations, and other validations conducted with the VASPsol code against the Truhlar set. The neural network predicts the error that VASPsol produces for each molecule, given the sigma profile representation and the VASPsol cavity parameters, and is used to guide the optimization towards a minimum for the entire Truhlar dataset. This approach, known as delta-learning, allows us to minimize the output of the neural network by changing the VASPsol cavity parameters. An additional benefit of this method is that we obtain a model that generalizes how VASPsol predictions deviate in response to parameter changes for any molecule. By analyzing the resulting errors across chemical groups, we optimize VASPsol parameters for multiple solvents. We show that these optimized parameters allow VASPsol to deliver more accurate simulations for the larger community.
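One way to write down the surrogate-guided optimization described above is sketched below. The notation is an assumption and not taken from the abstract: $\mathbf{s}_i$ is the sigma profile of molecule $i$, $\mathbf{p}$ the vector of VASPsol cavity parameters, $f_\theta$ the trained network, and $N$ the number of molecules in the set. The sign convention of the error and the use of the mean absolute error as the objective are likewise assumed.

```latex
% The network learns the signed error of VASPsol relative to experiment
\[
f_\theta(\mathbf{s}_i, \mathbf{p}) \;\approx\;
\Delta G^{\mathrm{VASPsol}}_{\mathrm{solv},\,i}(\mathbf{p})
- \Delta G^{\mathrm{exp}}_{\mathrm{solv},\,i}
\]

% The cavity parameters are then chosen to minimize the predicted error
% averaged over the full molecule set
\[
\mathbf{p}^{*} \;=\; \arg\min_{\mathbf{p}} \;
\frac{1}{N} \sum_{i=1}^{N}
\left| f_\theta(\mathbf{s}_i, \mathbf{p}) \right|
\]
```

Because evaluating $f_\theta$ is cheap compared with a VASPsol calculation, the minimization over $\mathbf{p}$ can cover the whole set rather than a small training subset.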