(356a) Shifting Paradigms: Using Molecular Dynamics with Machine Learning for the Calculation of Thermodynamic Properties of Fluids | AIChE

Authors 

Müller, E. A. - Presenter, Imperial College London
Zhu, K., Imperial College London
The accurate prediction of fluid properties is essential in chemical engineering, with applications ranging from process design and reaction engineering to fluid dynamics. For this purpose, current chemical engineering folklore employs empirical and/or theory-based correlations fitted to a very limited number of experimental observations. The limitations are obvious, particularly when extrapolating to unknown compounds and/or to thermodynamic conditions remote from those represented by the experiments. We propose a new workflow that combines molecular simulations (validated against experiments), which deliver an extensive and reasonably accurate database of thermodynamic properties, with machine learning (ML) algorithms, which synthesise and correlate that information. We suggest that this new paradigm can, in some instances, take over from the classical experiment/equation-of-state tools currently in use in the field. As a proof of concept, we explore the use of the SAFT-γ Mie force field [1] to generate critical properties, vapour pressures and saturated densities of a representative family of common fluids and correlate them with ML techniques.

In this work, we employ machine learning algorithms such as artificial neural networks (ANN) and Gaussian process regression (GPR). The pros and cons of the models created by each technique are analysed, considering the amount of data required, the accuracy of the model, and the further applications and drawbacks of each technique. Using prior knowledge of the underlying theory relating these thermodynamic properties, we create hybrid models, in the sense that the data are transformed in ways that have a theoretical basis.

Using the SAFT-γ Mie coarse-grained force field, molecules can be described as chains of identical spherical segments employing four key molecular descriptors. A key aspect of the model is that there is an existing equation of state (EoS) [2] that can reproduce the outputs of the molecular simulations with quantitative accuracy. By generating pseudo-experimental data for a range of these parameters, as a proof of concept, we attempt to replicate the SAFT-VR Mie equation of state with machine learning, to predict critical properties, vapour-liquid phase equilibrium properties and supercritical densities from these molecular descriptors and state properties.

The comparison of PvT properties between the EoS and the ML model provides a benchmark for the viability of different machine-learned models in predicting fluid-phase properties. Different properties call for machine-learned models of different complexity: critical properties, which take no state properties as input, give the simplest model, while supercritical properties require two state properties (temperature and pressure). While both ANN and GPR models are able to correlate the data with a high degree of statistical performance, with R² values above 0.99, the selection of activation functions and data transformation methods is essential. For a vapour pressure machine-learned model, directly correlating vapour pressure with temperature results in a model that appears to give a good R² score but corresponds to an extremely large average absolute deviation (AAD), as the pressure varies across several orders of magnitude. Alternatively, although using the Clausius-Clapeyron relationship and training a model to correlate the logarithm of the vapour pressure with the reciprocal temperature might appear worse near the critical point, the overall performance in AAD improves significantly. Similarly, selecting the correct activation function improves the curves of the VLE envelope. Looking at individual saturated density curves, models with less smooth functions (such as the Matérn kernel for GPR, or the ELU function for ANN) present irregular shapes despite giving a statistically acceptable performance. This is because the machine is “informed” of the general shape of the model through the activation functions (or, for GPR, the kernel), which is essential to produce a good correlation. We show that the ideal activation function for ANN in this case is the tanh activation function, while the radial basis function is similarly the ideal kernel for GPR.
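The gap between R² and AAD for vapour pressure can be reproduced on toy data. The following is a minimal sketch, not the authors' workflow. It assumes a hypothetical Antoine-type vapour pressure curve with made-up constants (A, B, C), uses a degree-5 polynomial as a stand-in for a flexible "direct" model of pressure against temperature, and compares it with a straight-line fit of ln P against 1/T, the form suggested by the Clausius-Clapeyron relationship. All function names are illustrative, and only the Python standard library is used.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (lowest order first) via the normal equations."""
    powers = [[xi ** k for k in range(degree + 1)] for xi in x]
    gram = [[sum(p[i] * p[j] for p in powers) for j in range(degree + 1)]
            for i in range(degree + 1)]
    rhs = [sum(p[i] * yi for p, yi in zip(powers, y)) for i in range(degree + 1)]
    return solve_linear(gram, rhs)

def polyval(coeffs, x):
    """Evaluate a polynomial given coefficients ordered lowest power first."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def aad_percent(y_true, y_pred):
    """Average absolute (relative) deviation, in percent."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical Antoine-type curve ln P = A - B / (T + C); the constants are made up.
A, B, C = 10.0, 4000.0, -40.0
temps = [250.0 + 5.0 * i for i in range(51)]           # 250 K to 500 K
p_true = [math.exp(A - B / (t + C)) for t in temps]    # spans several orders of magnitude

# Model 1 ("direct"): polynomial in scaled temperature fitted to P itself.
u = [(t - 375.0) / 125.0 for t in temps]               # scale T to [-1, 1] for conditioning
coeffs_direct = polyfit(u, p_true, 5)
p_direct = [polyval(coeffs_direct, ui) for ui in u]

# Model 2 ("transformed"): straight line in ln P against 1000 / T.
inv_t = [1000.0 / t for t in temps]
coeffs_log = polyfit(inv_t, [math.log(p) for p in p_true], 1)
p_log = [math.exp(polyval(coeffs_log, xi)) for xi in inv_t]

r2_direct, aad_direct = r2_score(p_true, p_direct), aad_percent(p_true, p_direct)
r2_log, aad_log = r2_score(p_true, p_log), aad_percent(p_true, p_log)
print(f"direct fit:      R2 = {r2_direct:.4f}, AAD = {aad_direct:.1f} %")
print(f"ln P vs 1/T fit: R2 = {r2_log:.4f}, AAD = {aad_log:.1f} %")
```

On this toy curve the direct fit should score an R² above 0.99 while its AAD is far larger than that of the transformed fit, because absolute errors that are negligible near the highest pressures are large relative to pressures that are orders of magnitude smaller.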

A pathway from molecular dynamics simulation data to a machine-learned correlation of fluid properties is outlined, with applications of both machine learning techniques. Comparing the two algorithms, GPR requires much less data to achieve quantitative accuracy, and it is the more sophisticated technique in that it also provides the variance of its predictions, which is useful for error analysis. However, it becomes increasingly expensive computationally as the complexity of the problem escalates with more input dimensions. In this sense, ANN is a more robust and flexible technique, but it requires a much larger data set to be generated. Discussion along these lines is included in the presentation.
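The trade-off noted for GPR (a predictive variance at the price of a growing computational cost) can be seen in a from-scratch sketch. The block below is an illustration under stated assumptions, not the authors' implementation: it uses a one-dimensional RBF kernel with a hand-picked, hypothetical length scale, an arbitrary smooth toy curve in place of simulation data, and a plain Cholesky factorisation. Exact GPR factorises an n-by-n kernel matrix, so the cost scales as n³ in the number of training points n. All names are illustrative.

```python
import math

def rbf_kernel(x1, x2, length_scale):
    """Radial basis function (squared-exponential) kernel with unit prior variance."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y

def solve_lower_transposed(low, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gpr_fit(x_train, y_train, length_scale, noise=1e-8):
    """Factorise the kernel matrix once; this O(n^3) step dominates the cost."""
    y_mean = sum(y_train) / len(y_train)  # centre the data, since the prior mean is zero
    k = [[rbf_kernel(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    low = cholesky(k)
    centred = [y - y_mean for y in y_train]
    alpha = solve_lower_transposed(low, solve_lower(low, centred))
    return low, alpha, y_mean

def gpr_predict(model, x_train, x_new, length_scale):
    """Posterior mean and variance at a single new input."""
    low, alpha, y_mean = model
    k_star = [rbf_kernel(x_new, b, length_scale) for b in x_train]
    mean = y_mean + sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    variance = max(1.0 - sum(vi * vi for vi in v), 0.0)  # clip tiny negative round-off
    return mean, variance

def toy_density(x):
    """Arbitrary smooth toy curve, loosely shaped like a saturated liquid density."""
    return 1.0 + 0.75 * (1.0 - x) + 1.75 * (1.0 - x) ** (1.0 / 3.0)

x_train = [0.5, 0.6, 0.7, 0.8, 0.9]
y_train = [toy_density(x) for x in x_train]
LENGTH_SCALE = 0.15  # hypothetical value, not optimised

model = gpr_fit(x_train, y_train, LENGTH_SCALE)
mean_train, var_train = gpr_predict(model, x_train, 0.7, LENGTH_SCALE)   # at a training point
mean_mid, var_mid = gpr_predict(model, x_train, 0.65, LENGTH_SCALE)      # between training points
mean_far, var_far = gpr_predict(model, x_train, 1.5, LENGTH_SCALE)       # far outside the data
print(f"x = 0.70: mean = {mean_train:.4f}, variance = {var_train:.2e}")
print(f"x = 0.65: mean = {mean_mid:.4f}, variance = {var_mid:.2e}")
print(f"x = 1.50: variance = {var_far:.2e} (the mean reverts towards the training average)")
```

The variance is close to zero at a training point, small between training points, and approaches the prior variance far from the data, which is the kind of information that supports error analysis. In practice the kernel hyperparameters would be optimised rather than fixed by hand as assumed here.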

References:

[1] Müller, E. A.; Jackson, G. (2014). Force-field parameters from the SAFT-γ equation of state for use in coarse-grained molecular simulations. Annu. Rev. Chem. Biomol. Eng., 5(1), 405–427.

[2] Lafitte, T.; Apostolakou, A.; Avendaño, C.; Galindo, A.; Adjiman, C. S.; Müller, E. A.; Jackson, G. (2013). Accurate statistical associating fluid theory for chain molecules formed from Mie segments. J. Chem. Phys., 139(15), 154504.