(62a) Physics-Guided Machine Learning for Transferable Prediction of Polymer Properties | AIChE

Authors 

Jiang, S. - Presenter, University of Wisconsin-Madison
Data-driven machine learning (ML) plays a crucial role in chemistry and materials science by facilitating rapid property prediction [1], novel molecule discovery [2], and synthesis route design [3]. The application of ML to polymer science is particularly promising, as polymers exhibit intriguing physical phenomena across extensive time and length scales, where experimental characterization and simulations can be both challenging and costly. Recent studies have demonstrated the success of ML in predicting various properties of polymers [4,5]. However, these studies focus on improving prediction accuracy within a limited set of polymer systems, without specifically considering prediction generalizability (or model transferability).

Generalizability refers to the ability of a predictive model to perform its intended purpose on unseen data, even when the distribution of properties differs significantly from the training data. This is crucial for predicting polymer properties, which can vary considerably due to their structural, compositional, and chemical complexities. To enhance the generalizability of ML models, several methods have been proposed, including cross-validation, model regularization, and data augmentation. However, these practices focus solely on the model training phase and are limited to a small set of polymer systems. As a result, they do not rigorously ensure generalizability to unseen systems. Increasing dataset diversity also has the potential to improve model generalizability, but it can be impractical when data points are costly to acquire. In contrast, polymer physics models are known for their generalizability due to the use of explicit analytical expressions. However, physics-based models are often based on simplified assumptions, which can result in low prediction accuracy in complex systems.

In this study, we present a graph neural network (GNN) model guided by a polymer physics baseline [6] to predict the characteristic size distribution of polymers (i.e., the mean and variance of the squared radius of gyration). This baseline-guided GNN model, BaseGNN, is developed using an original dataset that comprises coarse-grained molecular dynamics data for over 18,000 polymers with various topologies, compositions, and chemical patterns, covering a wide range of molecular weights. The prediction accuracy and generalizability of BaseGNN are rigorously compared with those of a pure GNN and the pure baseline model on diverse unseen datasets that differ in molecular weight, topology, and chemical pattern. The features learned by BaseGNN also provide interpretability, explaining how generalizability varies with polymer topology. This work expands the utility of physics-informed machine learning for polymer property prediction and demonstrates how such algorithms can facilitate accurate and generalizable prediction for a variety of unseen polymer systems.
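
As an illustration of what a polymer physics baseline of this kind can look like, the sketch below evaluates the ideal Gaussian-molecule result for the mean and variance of the squared radius of gyration from the eigenvalues of the connectivity (Kirchhoff) matrix, in the spirit of [6]. It assumes equal-size beads, a single connected molecule, and a uniform root-mean-square bond length b. The function name `gaussian_rg2_stats`, the edge-list input format, and the choice b = 1 are illustrative assumptions; the abstract does not specify the exact form of the baseline or how it is combined with the GNN.

```python
import numpy as np


def gaussian_rg2_stats(edges, n_beads, bond_length=1.0):
    """Mean and variance of Rg^2 for an ideal Gaussian molecule.

    edges: iterable of (i, j) bead-index pairs, one per bond
           (hypothetical input format chosen for this sketch).
    """
    if n_beads < 2:
        raise ValueError("need at least two beads")

    # Build the Kirchhoff (graph Laplacian) matrix of the bead-bond graph.
    kirchhoff = np.zeros((n_beads, n_beads))
    for i, j in edges:
        kirchhoff[i, i] += 1.0
        kirchhoff[j, j] += 1.0
        kirchhoff[i, j] -= 1.0
        kirchhoff[j, i] -= 1.0

    # eigvalsh returns eigenvalues in ascending order; the first one is the
    # zero mode (center-of-mass translation) of a connected molecule.
    eigenvalues = np.linalg.eigvalsh(kirchhoff)
    nonzero = eigenvalues[1:]
    if nonzero[0] < 1e-9:
        raise ValueError("edges must describe one connected molecule")

    # Each normal mode k contributes three Gaussian coordinates with
    # variance b^2 / (3 * lambda_k), and Rg^2 = (1/N) * sum_k |q_k|^2.
    b2 = bond_length ** 2
    mean_rg2 = b2 / n_beads * np.sum(1.0 / nonzero)
    var_rg2 = 2.0 * b2 ** 2 / (3.0 * n_beads ** 2) * np.sum(1.0 / nonzero ** 2)
    return mean_rg2, var_rg2


# Example topologies with 10 beads each: a linear chain and a ring.
n = 10
linear_edges = [(i, i + 1) for i in range(n - 1)]
ring_edges = linear_edges + [(n - 1, 0)]

for name, edges in [("linear", linear_edges), ("ring", ring_edges)]:
    mean_rg2, var_rg2 = gaussian_rg2_stats(edges, n)
    print(f"{name}: <Rg^2> = {mean_rg2:.4f}, Var(Rg^2) = {var_rg2:.4f}")
```

For a linear chain of N beads the mean reduces to the closed form b^2 (N^2 - 1) / (6N), and for a ring to b^2 (N^2 - 1) / (12N), which gives a quick check on the implementation. Because the calculation needs only the bond graph, it applies to any topology without fitting, but it neglects excluded volume and chemistry-specific interactions, which is where a learned model would be expected to contribute.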

[1] Pilania, Ghanshyam, et al. "Accelerating materials property predictions using machine learning." Scientific Reports 3.1 (2013): 2810.

[2] Soleimany, Ava P., et al. "Evidential deep learning for guided molecular property prediction and discovery." ACS Central Science 7.8 (2021): 1356-1367.

[3] Moosavi, Seyed Mohamad, Kevin Maik Jablonka, and Berend Smit. "The role of machine learning in the understanding and design of materials." Journal of the American Chemical Society 142.48 (2020): 20273-20287.

[4] Kim, Chiho, et al. "Polymer genome: a data-powered polymer informatics platform for property predictions." The Journal of Physical Chemistry C 122.31 (2018): 17575-17585.

[5] Park, Jaehong, et al. "Prediction and interpretation of polymer properties using the graph convolutional network." ACS Polymers Au 2.4 (2022): 213-222.

[6] Eichinger, Bruce E. "Configuration statistics of Gaussian molecules." Macromolecules 13.1 (1980): 1-11.