(544l) Optimization of Vaspsol Solvation Free Energy Predictions | AIChE

Authors 

Hennig, R. G., Cornell University
Florez, S., University of Florida
Density functional theory (DFT) can accurately predict material properties and reaction barriers. However, DFT is often limited to small system sizes by its high computational cost. In addition, many properties and reaction barriers change dramatically when solvent molecules surround the atomic system. To approximate the solvation effect, computational chemists use continuum models in place of the very large number of explicit solvent molecules in these systems. Continuum models attempt to capture the effect of the solvent on solute molecules and surfaces while dramatically reducing the computational cost. VASPsol implements a polarizable continuum model within VASP, a plane-wave DFT code. VASPsol has three parameters that control the cavity shape and, thus, the solvation energy: the electronic cutoff of the cavity (n_c), the width of the diffuse cavity boundary (σ), and the cavity surface tension (τ). The present work sought to optimize the VASPsol cavity parameters by minimizing solvation energy errors relative to experimentally measured values.
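
To make the role of the cutoff and the boundary width concrete, the following is a minimal sketch of the cavity shape function as it is usually written for VASPsol-type models, S(n) = 0.5 erfc(ln(n / n_c) / (σ√2)), together with the resulting local permittivity. The default values and the bulk permittivity of water are quoted from memory of the VASPsol literature and should be treated as assumptions; the function names are illustrative and are not part of VASPsol.

```python
import math

# Default values as commonly quoted for VASPsol (an assumption of this sketch;
# check them against the VASPsol documentation before relying on them).
NC_DEFAULT = 0.0025    # electronic cutoff n_c, in electrons per cubic angstrom
SIGMA_DEFAULT = 0.6    # dimensionless width of the diffuse cavity boundary
TAU_DEFAULT = 5.25e-4  # cavity surface tension, in eV per square angstrom


def cavity_shape(n, n_c=NC_DEFAULT, sigma=SIGMA_DEFAULT):
    """Return S(n) = 0.5 * erfc(ln(n / n_c) / (sigma * sqrt(2))).

    S is close to 0 where the solute electron density n is well above n_c
    (inside the solute) and close to 1 where n is well below n_c (solvent).
    """
    if n <= 0.0:
        return 1.0  # no solute density: treat the point as bulk solvent
    return 0.5 * math.erfc(math.log(n / n_c) / (sigma * math.sqrt(2.0)))


def local_permittivity(n, eps_bulk=78.4, n_c=NC_DEFAULT, sigma=SIGMA_DEFAULT):
    """Dielectric function eps(n) = 1 + (eps_bulk - 1) * S(n)."""
    return 1.0 + (eps_bulk - 1.0) * cavity_shape(n, n_c, sigma)
```

In this picture, lowering n_c pushes the cavity boundary outward into regions of lower electron density, while a smaller σ makes the transition from solute to solvent sharper. The surface tension τ does not enter S(n) itself; it scales the cavitation energy, which is proportional to the cavity surface area.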

Our work demonstrates the techniques used to reduce VASPsol's prediction error for solvation free energies against a dataset of neutral compounds solvated in various solvents. We apply three methods to find a set of VASPsol cavity parameters that gives lower errors across the entire Truhlar set. First, we perform a grid search over the electronic cutoff and the width of the diffuse cavity boundary, with the cavity surface tension held fixed. Second, we apply a Nelder-Mead search for the minimum error over all three VASPsol parameters, including the surface tension. Third, we train artificial neural networks on the resulting data to guide the optimization, as described below.
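
The two direct searches can be sketched as follows. This is a minimal illustration, not the workflow used in the study: `toy_error` is a hypothetical smooth stand-in for the mean solvation-energy error (a real objective would require a VASPsol calculation for every molecule), and `nelder_mead` is a simplified pure-Python variant of the simplex method that uses only reflection, expansion, inside contraction, and shrink steps.

```python
import itertools


def toy_error(n_c, sigma, tau):
    """Hypothetical stand-in for the mean solvation-energy error.

    A smooth bowl with its minimum at n_c=0.003, sigma=0.5, tau=4e-4,
    chosen only so that the sketch runs without any DFT calculation.
    """
    return (((n_c - 0.003) / 0.001) ** 2
            + ((sigma - 0.5) / 0.1) ** 2
            + ((tau - 4e-4) / 1e-4) ** 2)


def grid_search(n_c_values, sigma_values, tau_fixed):
    """Evaluate every (n_c, sigma) pair at fixed tau and return the best pair."""
    return min(itertools.product(n_c_values, sigma_values),
               key=lambda p: toy_error(p[0], p[1], tau_fixed))


def nelder_mead(f, x0, steps, max_iter=500, tol=1e-12):
    """Simplified Nelder-Mead minimization of f starting from x0.

    `steps` sets the initial simplex size per parameter, which matters here
    because n_c, sigma, and tau differ by orders of magnitude.
    """
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += steps[i]
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the centroid, away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)  # halfway between centroid and worst
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)
```

A call such as `nelder_mead(lambda p: toy_error(*p), [0.0025, 0.6, 5.25e-4], [5e-4, 0.05, 5e-5])` starts the simplex at an assumed default parameter set. In practice each function evaluation is a batch of DFT calculations, so the number of evaluations is the dominant cost of either search.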

For our machine learning descriptors, we used the COSMO-SAC sigma profile. The sigma profile is a histogram of the charge density over the solvent-accessible surface of each molecule, as defined by the COSMO procedure. The COSMO sigma profile thus serves as a feature in the general analysis of the molecular set. Using this constructed dataset of sigma profiles, VASPsol parameters (the electronic cutoff n_c and the boundary width σ), and solvation energy errors, we construct artificial neural networks trained to learn the errors that VASPsol produces for a given molecule and set of VASPsol parameters. We used experimental solvation energies from the Truhlar Minnesota dataset. We first assembled a set of 9 candidate molecules balanced using sigma profiles, which we obtained from online databases such as the Virginia Tech dataset by Mullins et al. We then built a set of 9 training molecules from chemical intuition. These molecules also have an average sigma profile that matches the average sigma profile of the Truhlar set.
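
As an illustration of the descriptor and of the balancing criterion, the sketch below builds an area-weighted sigma profile from a list of surface segments and measures how far the average profile of a subset lies from that of a full set. The 51-bin grid from -0.025 to 0.025 e/Å², the nearest-bin assignment, and the L1 distance are assumptions of this sketch; the segment data format and the function names are hypothetical.

```python
def sigma_profile(segments, sigma_min=-0.025, sigma_max=0.025, n_bins=51):
    """Area-weighted histogram of surface charge densities, normalized to 1.

    `segments` is an iterable of (charge_density, area) pairs, one per
    surface segment. Each segment is assigned to its nearest bin center,
    which is a simplification of the usual COSMO-SAC binning.
    """
    width = (sigma_max - sigma_min) / (n_bins - 1)
    profile = [0.0] * n_bins
    for sigma, area in segments:
        index = round((sigma - sigma_min) / width)
        index = max(0, min(n_bins - 1, index))  # clamp outliers to the end bins
        profile[index] += area
    total = sum(profile)
    return [p / total for p in profile] if total > 0 else profile


def mean_profile(profiles):
    """Bin-by-bin average of several sigma profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def profile_distance(p, q):
    """L1 distance between two profiles (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def subset_mismatch(all_profiles, subset_indices):
    """Distance between the subset average and the full-set average profile."""
    subset = [all_profiles[i] for i in subset_indices]
    return profile_distance(mean_profile(subset), mean_profile(all_profiles))
```

A small `subset_mismatch` value indicates that a candidate training set reproduces the average sigma profile of the full set. As the results below suggest, matching the average profile alone does not guarantee that the subset is representative.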

Parameters obtained from grid searches or direct optimization with the training set produce larger errors against the entire Truhlar set. Despite our efforts, the training set appears to be biased, which caused us to overfit the VASPsol parameters. The minima from each procedure and their corresponding errors are shown in Figure 1. The traditional approaches with a small dataset resulted in larger errors than the default VASPsol parameters. As a new approach, we train an artificial neural network on a dataset constructed from benchmarks and prior optimization attempts. This network is used to guide the optimization towards a minimum for the entire Truhlar dataset when computed with VASPsol. We train the networks on the grid search results, the Nelder-Mead optimizations, and other validations conducted with the VASPsol code against the Truhlar set. The neural network then predicts the error VASPsol produces for each molecule given its sigma profile representation and the VASPsol cavity parameters. This approach, known as delta-learning, allows us to minimize the output of the neural network by changing the VASPsol cavity parameters. An additional benefit of this method is a model that generalizes how VASPsol predictions deviate in response to parameter changes for any molecule. By analyzing the resulting errors across chemical groups, we optimize VASPsol parameters for multiple solvents. We show that VASPsol can yield more accurate simulations for the larger community using these optimized parameters.
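
The surrogate-guided step can be illustrated with a deliberately small example. The sketch below trains a one-hidden-layer network, written in plain Python, to predict a signed error from a descriptor and one scaled cavity parameter, then picks the parameter value that minimizes the mean absolute predicted error over all molecules. Everything here is hypothetical: the two-bin "sigma profile", the synthetic error function, the network size, and the training settings are placeholders and do not reflect the architecture or data used in the study.

```python
import math
import random


def init_mlp(n_in, n_hidden, rng):
    """One tanh hidden layer and a linear output, with small random weights."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return the predicted error and the hidden activations."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    return sum(w * hj for w, hj in zip(net["W2"], h)) + net["b2"], h


def sgd_step(net, x, target, lr):
    """One stochastic gradient step on the loss 0.5 * (prediction - target)**2."""
    y, h = forward(net, x)
    d = y - target
    for j, hj in enumerate(h):
        dh = d * net["W2"][j] * (1.0 - hj * hj)  # backpropagate through tanh
        net["W2"][j] -= lr * d * hj
        net["b1"][j] -= lr * dh
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * dh * xi
    net["b2"] -= lr * d


def mse(net, data):
    """Mean squared prediction error over (features, target) pairs."""
    return sum((forward(net, x)[0] - t) ** 2 for x, t in data) / len(data)


def features(polar_fraction, s):
    """Toy descriptor: a two-bin 'sigma profile' plus one scaled cavity parameter."""
    return [polar_fraction, 1.0 - polar_fraction, s]


def synthetic_error(polar_fraction, s):
    """Hypothetical signed error; a real value would come from a VASPsol run."""
    return s - (polar_fraction - 0.5)


def train(data, n_hidden=6, epochs=300, lr=0.05, seed=0):
    """Train on the data and return the network with its initial and final MSE."""
    rng = random.Random(seed)
    net = init_mlp(len(data[0][0]), n_hidden, rng)
    initial = mse(net, data)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, t in samples:
            sgd_step(net, x, t, lr)
    return net, initial, mse(net, data)


def best_parameter(net, molecules, candidates):
    """Pick the candidate with the lowest mean absolute predicted error."""
    def score(s):
        return sum(abs(forward(net, features(p, s))[0]) for p in molecules) / len(molecules)
    return min(candidates, key=score)


MOLECULES = [0.1, 0.3, 0.5, 0.7, 0.9]     # toy polar fractions, one per molecule
CANDIDATES = [-1.0, -0.5, 0.0, 0.5, 1.0]  # scaled values of one cavity parameter
DATA = [(features(p, s), synthetic_error(p, s)) for p in MOLECULES for s in CANDIDATES]
```

Calling `train(DATA)` and then `best_parameter(net, MOLECULES, CANDIDATES)` mirrors the workflow described above: every surrogate evaluation is cheap, so the parameter search can cover the whole molecule set instead of a small training subset.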