(257a) Physics and Data-Informed Formulation Design and Development for Chemical Processes | AIChE

Authors 

Aglave, R. - Presenter, Siemens PLM Software
Mallick, S., Siemens
Boioli, F., Kaizen Solutions
Petris, P., Siemens Digital Industries Software
Mas, P., Siemens Digital Industries Software
The design of novel material formulations is a critical innovation process. The shelf life and performance of the final product depend on fast, informed decisions made early in the design phase. Digital methods can not only shorten project timescales but also reduce repetitive tasks and errors. Formulation management has been digital in industry for some time. However, it has been disconnected from the deep chemical insight that innovation requires.

Molecular-scale modeling has also been used in industry, but it has not been integrated into formulation management workflows. In this work we present how formulation management and chemistry innovation can be connected in the context of innovation project management.

For this purpose, we take solubility as the variable of interest, since it is critical in many aspects of formulation development. As a first step, a limited set of molecules from a specific chemical family (or families) is selected, and their solubility is calculated using molecular simulations. More specifically, we estimate the Hansen solubility parameters (1) using a fast Molecular Dynamics technique in which the simulation box is successively compressed and decompressed to reach the correct density and thermodynamic equilibrium more quickly (2). The solubility parameters are then obtained from the intermolecular energy of the simulation box.
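The last step above, going from intermolecular energy to solubility parameters, can be sketched as follows. This is a minimal illustration, not the authors' workflow: it assumes the cohesive energy has already been split into dispersion, polar, and hydrogen-bonding contributions (how that split is made depends on the force field), and it uses the standard definition in which each parameter is the square root of a cohesive energy density. The function name `hansen_parameters` and its inputs are hypothetical.

```python
import math


def hansen_parameters(e_coh_kj_mol, molar_volume_cm3_mol):
    """Convert cohesive-energy components into Hansen parameters (MPa**0.5).

    e_coh_kj_mol: dict with keys "d", "p", "h" holding the dispersion, polar,
        and hydrogen-bonding cohesive energies in kJ/mol (assumed inputs).
    molar_volume_cm3_mol: molar volume of the equilibrated box in cm^3/mol.
    """
    if molar_volume_cm3_mol <= 0:
        raise ValueError("molar volume must be positive")

    params = {}
    for key, energy in e_coh_kj_mol.items():
        if energy < 0:
            raise ValueError(f"cohesive energy component {key!r} is negative")
        # 1 kJ/cm^3 = 1000 J/cm^3 = 1000 MPa, hence the factor of 1000.
        params[key] = math.sqrt(1000.0 * energy / molar_volume_cm3_mol)

    # Total parameter: delta_t^2 = delta_d^2 + delta_p^2 + delta_h^2.
    params["total"] = math.sqrt(sum(value ** 2 for value in params.values()))
    return params


# Illustrative numbers only; they are not simulation results.
example = hansen_parameters({"d": 10.0, "p": 2.5, "h": 0.0}, 100.0)
```

In practice the energies and volume would be averaged over the equilibrated part of the trajectory before this conversion is applied.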

However, to further reduce development timescales, we combined Graph Neural Network (GNN) architectures with physics-based simulations to estimate the material properties of industrial formulations. The machine learning model was selected carefully. Traditional machine learning and deep learning algorithms (3) were also tested and gave highly accurate results, but they required molecular descriptors to be generated as training inputs (e.g. asphericity, molecular weight, number of hydrogen-bond donors/acceptors, radius of gyration). The advantage of GNNs is that the complex chemical structure of a molecule can be represented naturally as a graph, defined by the connectivity between a set of nodes (atoms) through a set of edges (bonds) (4). This allows us to train directly against the target solubility parameters without additional calculations.
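The graph view of a molecule described above can be illustrated with a small, dependency-free sketch. It is not the architecture used in this work: it builds a graph for the heavy atoms of ethanol, applies a single untrained sum-aggregation message-passing step, and pools the node vectors into one molecule-level vector. All names (`ATOM_TYPES`, `build_graph`, `message_passing_step`, `readout`) are hypothetical.

```python
ATOM_TYPES = ["C", "O", "N"]  # assumed, deliberately tiny atom vocabulary


def one_hot(symbol):
    """Encode an element symbol as a one-hot node feature vector."""
    return [1.0 if symbol == atom_type else 0.0 for atom_type in ATOM_TYPES]


def build_graph(atoms, bonds):
    """Return node features and an adjacency list for an undirected graph."""
    features = [one_hot(symbol) for symbol in atoms]
    neighbors = {index: [] for index in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return features, neighbors


def message_passing_step(features, neighbors):
    """Update each node with the sum of its own and its neighbors' features."""
    updated = []
    for index, own in enumerate(features):
        aggregated = list(own)
        for j in neighbors[index]:
            aggregated = [a + b for a, b in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated


def readout(features):
    """Sum-pool node vectors into a single molecule-level vector."""
    return [sum(column) for column in zip(*features)]


# Ethanol heavy atoms: C-C-O (hydrogens left implicit).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
features, neighbors = build_graph(atoms, bonds)
hidden = message_passing_step(features, neighbors)
graph_vector = readout(hidden)
```

In a trained GNN, each aggregation would be followed by learned weights and a nonlinearity, and the pooled vector would feed a regression head that predicts the solubility parameters. No hand-crafted descriptors are needed, because the atoms and bonds themselves are the input.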

This information can now be made available in digital formulation management tools. Physics- and data-informed prediction of material properties in milliseconds is essential for agile formulation development and decision-making. It allows the vast chemical space to be explored for the optimal chemistry, something that physical testing cannot come close to achieving at industrially relevant cost and time scales.
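One way such fast predictions could be used for screening is sketched below. This is an assumption about downstream use, not a description of the authors' tool: candidates are ranked by the standard Hansen distance Ra to a target, where Ra^2 = 4(Δδd)^2 + (Δδp)^2 + (Δδh)^2. The candidate names and parameter values are placeholders, not measured or simulated data.

```python
import math


def hansen_distance(a, b):
    """Hansen distance Ra between two parameter sets with keys "d", "p", "h"."""
    delta_d = a["d"] - b["d"]
    delta_p = a["p"] - b["p"]
    delta_h = a["h"] - b["h"]
    return math.sqrt(4.0 * delta_d ** 2 + delta_p ** 2 + delta_h ** 2)


def rank_candidates(target, candidates):
    """Return candidate names sorted from closest to farthest from the target."""
    return sorted(
        candidates,
        key=lambda name: hansen_distance(target, candidates[name]),
    )


# Placeholder values in MPa**0.5, for illustration only.
target = {"d": 18.0, "p": 5.0, "h": 5.0}
candidates = {
    "candidate_A": {"d": 17.0, "p": 5.0, "h": 5.0},
    "candidate_B": {"d": 18.0, "p": 8.0, "h": 9.0},
    "candidate_C": {"d": 18.0, "p": 5.0, "h": 5.0},
}
ranking = rank_candidates(target, candidates)
```

Because each prediction takes milliseconds, a ranking like this could in principle be repeated over very large candidate libraries before any physical testing is scheduled.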

(1) Hansen, C. M., J. Paint Technol. 1967, 39, 511

(2) Belmares, M.; Blanco, M.; Goddard, W. A.; Ross, R.; Caldwell, G.; Chou, S.-H.; Pham, J.; Olofson, P. M.; Thomas, C., J. Comput. Chem. 2004, 25, 1814–1826

(3) Deng, T.; Jia, G.-Z., Molecular Physics 2020, 118 (2)

(4) Lee, S.; Lee, M.; Gyak, K.-W.; Kim, S.; Kim, M.-J.; Min, K., ACS Omega 2022, 7 (14), 12268–12277