(356a) Shifting Paradigms: Using Molecular Dynamics with Machine Learning for the Calculation of Thermodynamic Properties of Fluids
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Applications of Data Science to Molecules and Materials
Applications of Data Science in Molecular Sciences II
Tuesday, November 17, 2020 - 8:00am to 8:15am
In this work, we employ two machine learning algorithms, namely artificial neural networks (ANN) and Gaussian process regression (GPR). The pros and cons of the models created by each technique are analysed, considering the amount of data required, the accuracy of the model, and the further applications and drawbacks of each technique. Using prior knowledge of the underlying theory that relates these thermodynamic properties, we create hybrid models, in the sense that the data are transformed in theoretically motivated ways before training.
Using the SAFT-γ Mie coarse-grained force field [1], molecules can be described as chains of identical spherical segments characterised by four key molecular descriptors. A key aspect of the model is that an existing equation of state (EoS) [2] can reproduce the outputs of the molecular simulations with quantitative accuracy. As a proof of concept, we generate pseudo-experimental data over a range of these parameters and attempt to replicate the SAFT-VR Mie equation of state with machine learning, predicting critical properties, vapour-liquid phase equilibrium properties and supercritical densities from the molecular descriptors and state properties.
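For orientation, the segment-segment interaction behind a Mie-type force field can be sketched as below. This is a minimal illustration of the standard Mie potential, not code from this work; the function name, the argument names and the default attractive exponent of 6 are assumptions made for the example.

```python
def mie_potential(r, sigma, epsilon, lambda_r, lambda_a=6.0):
    """Mie pair potential between two spherical segments.

    sigma: segment diameter, epsilon: well depth,
    lambda_r / lambda_a: repulsive / attractive exponents.
    The default lambda_a = 6 is an assumption for this sketch.
    """
    # Prefactor chosen so that the minimum of the potential equals -epsilon.
    c = (lambda_r / (lambda_r - lambda_a)) * (lambda_r / lambda_a) ** (
        lambda_a / (lambda_r - lambda_a)
    )
    return c * epsilon * ((sigma / r) ** lambda_r - (sigma / r) ** lambda_a)


# With exponents (12, 6) the Mie potential reduces to the Lennard-Jones form.
r_min = 2.0 ** (1.0 / 6.0)
print(mie_potential(1.0, 1.0, 1.0, 12.0))    # zero crossing at r = sigma
print(mie_potential(r_min, 1.0, 1.0, 12.0))  # well depth at the minimum
```

The prefactor keeps epsilon equal to the well depth for any pair of exponents, so the energy scale and the softness of the repulsion can be varied independently.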
The comparison of PvT properties between the EoS and the ML model provides a benchmark for the viability of different machine-learned models in predicting fluid phase properties. Different properties call for machine-learned models of different complexity: critical properties take no state properties as input and give the simplest model, while supercritical properties require two state properties (temperature and pressure). Both ANN and GPR models can correlate the data with a high degree of statistical performance, with R² values above 0.99, but the selection of activation functions and data transformation methods is essential. For a vapour pressure machine-learned model, directly correlating pressure with temperature results in a model that appears to give a good R² score but corresponds to an extremely large average absolute deviation (AAD), as pressure varies across several orders of magnitude. Alternatively, using the Clausius-Clapeyron relationship and training a model to correlate ln P with 1/T might appear worse near the critical point, but the overall AAD improves significantly. Similarly, selecting the correct activation function improves the curves of the VLE envelope. Looking at individual saturated density curves, models with less smooth activation functions or kernels (such as the Matérn kernel for GPR or the ELU function for ANN) present irregular shapes despite giving a statistically acceptable performance. This is because the machine is "informed" of the general shape of the model through the activation functions, which is essential to produce a good correlation. We show that the ideal activation function for ANN in this case is the tanh activation function, while the radial basis function is similarly the ideal kernel for GPR.
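The effect of the data transformation on the two metrics can be illustrated with a toy example. The sketch below is not the ANN or GPR models of this work: it assumes a synthetic vapour-pressure curve of the form ln P = A - B/T with hypothetical constants, and uses simple polynomial fits as stand-ins for the regressors, purely to show how R² and AAD can disagree.

```python
import numpy as np

# Synthetic vapour-pressure data, ln P = A - B / T (hypothetical constants,
# reduced units). Pressure spans roughly two orders of magnitude.
A, B = 4.0, 6.0
T = np.linspace(0.6, 1.2, 50)
P = np.exp(A - B / T)


def r2_score(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def aad_percent(y, y_hat):
    """Average absolute (relative) deviation in percent."""
    return 100.0 * np.mean(np.abs((y_hat - y) / y))


# Direct correlation: P fitted as a quadratic in T (stand-in regressor).
P_direct = np.polyval(np.polyfit(T, P, 2), T)

# Transformed correlation: ln P fitted as a straight line in 1/T.
coeffs = np.polyfit(1.0 / T, np.log(P), 1)
P_transformed = np.exp(np.polyval(coeffs, 1.0 / T))

r2_direct, aad_direct = r2_score(P, P_direct), aad_percent(P, P_direct)
r2_trans, aad_trans = r2_score(P, P_transformed), aad_percent(P, P_transformed)

print(f"direct:      R2 = {r2_direct:.4f}, AAD = {aad_direct:.2f} %")
print(f"transformed: R2 = {r2_trans:.4f}, AAD = {aad_trans:.2e} %")
```

Because R² is dominated by the largest pressures, the direct fit scores well even though its relative error at low pressure is large. The transformed fit spreads the error evenly on a logarithmic scale, which is what the AAD rewards.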
A pathway from molecular dynamics simulation data to machine-learned models of fluid properties is being drawn out, with both machine learning techniques finding application. Comparing the two algorithms, GPR requires much less data to achieve quantitative accuracy, and it is the more sophisticated technique in that it also calculates the variance of its predictions, which is useful for error analysis. However, it becomes increasingly expensive computationally as the complexity of the problem grows with more input dimensions. In this sense, ANN is a more robust and flexible technique, but it requires a much larger data set to be generated. Discussion along these lines is included in the presentation.
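The predictive variance and the cost of GPR mentioned above can be seen in a bare-bones implementation. This is a minimal sketch written with NumPy only, assuming a one-dimensional input, a radial basis function kernel with hand-picked (not optimised) hyperparameters, and a toy tanh-shaped target; none of these choices come from the work itself.

```python
import numpy as np


def rbf_kernel(xa, xb, length_scale=0.3, signal_var=1.0):
    """Radial basis function kernel between two 1-D arrays of inputs."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)


def gpr_predict(x_train, y_train, x_test, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    n = len(x_train)
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(n)
    # The Cholesky factorisation scales as O(n^3) in the number of training
    # points, which is the dominant cost of exact GPR.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(x_train, x_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


# Toy data: eight training points on a smooth, tanh-shaped curve.
x_train = np.linspace(0.0, 1.0, 8)
y_train = np.tanh(4.0 * (x_train - 0.5))

# One test point inside the training range, one far outside it.
x_test = np.array([0.5, 3.0])
mean, var = gpr_predict(x_train, y_train, x_test)
print("mean:", mean)
print("variance:", var)
```

The variance is small where training data are dense and returns to the prior variance far from the data, which is the information an error analysis would draw on. The same factorisation step is what makes GPR costly as the training set needed to cover more input dimensions grows.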
References:
[1] Müller, E. A., Jackson, G. (2014). Force-field parameters from the SAFT-γ equation of state for use in coarse-grained molecular simulations. Annu. Rev. Chem. Biomolec. Eng., 5(1), 405-427.
[2] Lafitte, T., Apostolakou, A., Avendaño, C., Galindo, A., Adjiman, C. S., Müller, E. A., Jackson, G. (2013). Accurate statistical associating fluid theory for chain molecules formed from Mie segments. J. Chem. Phys., 139(15), 154504.