(197bc) Expanding BigSMILES for Automated Simulations and Machine Learning Representation of Polymeric Systems | AIChE

Authors 

Schneider, L. - Presenter, University of Chicago
de Pablo, J., University of Wisconsin-Madison
Olsen, B., Massachusetts Institute of Technology
BigSMILES is a line notation used to describe complex polymeric systems. However, it does not describe a single system, but rather a family of possible realizations. To address this limitation, we present a compatible expansion of the BigSMILES line notation that adds information to make it generative.

With the expanded BigSMILES, the information from the line notation is sufficient to generate a full polymer system from scratch, including molecular weight distributions, percentages of monomers, reactivity and affinity of the monomers, as well as specifications such as solvents and mixtures. This enables the detailed atomistic generation of an ensemble of polymer molecules, providing a starting point for molecular dynamics simulations and the building of digital twins.
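As a rough illustration of what "generative" means here, the sketch below draws an ensemble of random copolymer chains from two inputs: monomer fractions and a chain-length distribution. It is a toy model and not the expanded BigSMILES syntax or the authors' software. The function names, the geometric (Flory-Schulz-like) length distribution, and the example fractions are all assumptions made for illustration, and reactivity, affinity, solvents, and atomistic detail are ignored.

```python
import random

def sample_chain(monomer_fractions, p_extend, rng):
    """Draw one random copolymer chain as a list of monomer labels.

    monomer_fractions: dict mapping label -> mole fraction (hypothetical input).
    p_extend: probability of adding another monomer, which gives
              geometrically distributed chain lengths (an assumption).
    """
    labels = list(monomer_fractions)
    weights = [monomer_fractions[m] for m in labels]
    # Every chain contains at least one monomer.
    chain = [rng.choices(labels, weights)[0]]
    while rng.random() < p_extend:
        chain.append(rng.choices(labels, weights)[0])
    return chain

def sample_ensemble(monomer_fractions, p_extend, n_chains, seed=0):
    """Draw n_chains independent chains with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_chain(monomer_fractions, p_extend, rng) for _ in range(n_chains)]

# Example: 70% A, 30% B, expected chain length 1 / (1 - 0.95) = 20.
ensemble = sample_ensemble({"A": 0.7, "B": 0.3}, p_extend=0.95, n_chains=2000)
total_monomers = sum(len(c) for c in ensemble)
mean_length = total_monomers / len(ensemble)
frac_a = sum(c.count("A") for c in ensemble) / total_monomers
```

In this sketch the ensemble statistics (mean length, composition) are fixed by the inputs while each individual chain is random, which mirrors the distinction between one described ensemble and its many molecular realizations.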

This expanded notation can also be used to generate initial conditions for high-throughput pipelines that analyze polymers using simulations and/or automated experimentation from a single-line input, combining completeness with human readability. Moreover, because the expanded BigSMILES describes exactly one ensemble of (random) polymer molecules, it is possible to determine the probability that a given molecule belongs to the described ensemble. This provides a starting point for training auto-encoders to represent these polymer ensembles for machine learning purposes, closing the loop between generating and quantifying molecules.
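The membership probability mentioned above can be made concrete with a toy model. The sketch below assumes a chain is built by drawing each monomer independently from fixed fractions and extending the chain with a fixed probability, so its log-probability is a sum of per-monomer and per-extension terms. The function name and the model are assumptions for illustration only; the probability model actually implied by an expanded BigSMILES string may be considerably richer.

```python
import math

def chain_log_probability(chain, monomer_fractions, p_extend):
    """Log-probability of one chain under a toy independent-monomer model.

    Assumed model: P(chain) = (1 - p_extend) * p_extend**(n - 1) * prod(f_m),
    where n is the chain length and f_m the fraction of each monomer in it.
    """
    if not chain:
        return float("-inf")  # the toy model never produces empty chains
    # Length term: n - 1 extensions followed by one termination.
    log_p = (len(chain) - 1) * math.log(p_extend) + math.log(1.0 - p_extend)
    for monomer in chain:
        fraction = monomer_fractions.get(monomer, 0.0)
        if fraction <= 0.0:
            return float("-inf")  # monomer cannot occur in this ensemble
        log_p += math.log(fraction)
    return log_p

fractions = {"A": 0.7, "B": 0.3}
log_p_aab = chain_log_probability(["A", "A", "B"], fractions, p_extend=0.95)
log_p_foreign = chain_log_probability(["A", "C"], fractions, p_extend=0.95)
```

A chain containing a monomer outside the ensemble, such as the hypothetical "C" here, gets probability zero, while chains that can occur get a finite log-probability that could serve as a training signal.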

Overall, the expanded BigSMILES line notation supports both the design and the analysis of polymeric systems, and it allows more of the research and development workflow to be automated.