(291a) Naming, Classifying, and Comparing Polymers in the Era of Data Science
AIChE Annual Meeting
2021
2021 Annual Meeting
Materials Engineering and Sciences Division
Area Plenary 1: Emerging Areas in Polymer Science and Engineering (Invited Talks)
Tuesday, November 9, 2021 - 12:30pm to 1:00pm
Recently, we developed BigSMILES, a stochastic line notation that captures polymer structures in a way directly analogous to chemical structure drawings while retaining all the advantages of, and full compatibility with, the SMILES line notation for small molecules. However, BigSMILES, like a chemical structure drawing, defines only the set of possible molecules; defining their probabilities requires characterization data. To address this, we have put forward the PolyDAT schema, which links characterization data to the line notation and thereby provides a complete chemical definition of a polymer. Together, these tools make it possible to address several open challenges. First, we demonstrate how polymer structures can be canonicalized, both with empirical rules and by analogy to automata in computer science. Second, we show how BigSMILES can drive polymer vectorization. Third, we show how BigSMILES can serve as the basis for polymer similarity comparisons.
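As a toy illustration of the idea behind rule-based canonicalization (a sketch under stated assumptions, not the authors' algorithm), the code below treats a homopolymer repeat unit as a plain sequence of backbone atom tokens. This is a hypothetical simplification that ignores branches, rings, stereochemistry, and BigSMILES bonding descriptors. The same chain can be written starting from any backbone atom, read in either direction (assuming non-directional bonding), or with the repeat unit doubled. The sketch therefore reduces the unit to its shortest period and then picks the lexicographically smallest rotation or reversal. The function names `primitive_period`, `canonical_repeat_unit`, and `same_homopolymer` are illustrative only.

```python
from typing import List, Tuple


def primitive_period(tokens: List[str]) -> List[str]:
    """Shortest block whose repetition reproduces the whole unit (e.g. CCCC -> C)."""
    n = len(tokens)
    for p in range(1, n + 1):
        if n % p == 0 and tokens == tokens[:p] * (n // p):
            return tokens[:p]
    return tokens


def rotations(tokens: List[str]) -> List[Tuple[str, ...]]:
    """All cyclic shifts: the same chain written from a different starting atom."""
    return [tuple(tokens[i:] + tokens[:i]) for i in range(len(tokens))]


def canonical_repeat_unit(tokens: List[str]) -> Tuple[str, ...]:
    """Pick one representative among all equivalent writings of a repeat unit.

    Hypothetical simplification: tokens are backbone atoms only, and the chain
    may be read in either direction, so reversals are treated as equivalent.
    """
    if not tokens:
        raise ValueError("repeat unit must contain at least one token")
    unit = primitive_period(list(tokens))
    candidates = rotations(unit) + rotations(list(reversed(unit)))
    return min(candidates)  # lexicographically smallest candidate is canonical


def same_homopolymer(a: List[str], b: List[str]) -> bool:
    """Two writings denote the same (toy) homopolymer if their canonical forms match."""
    return canonical_repeat_unit(a) == canonical_repeat_unit(b)
```

For example, `["O", "C", "C"]`, `["C", "O", "C"]`, and `["C", "C", "O"]`, three writings of a poly(ethylene oxide) repeat unit, all map to `("C", "C", "O")`, while `["C", "C"]` and `["C", "C", "C", "C"]` both reduce to `("C",)`. A real canonicalizer must also handle branching, stereochemistry, end groups, and directional bonding descriptors, all of which this sketch deliberately omits.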
Extending the original BigSMILES grammar, we have also developed BigSMARTS, an extension of SMARTS that enables searching of polymer structures. We have further shown that BigSMILES is compatible with the concepts behind SELFIES, so that polymers can be written in a form better suited to genetic algorithms. Finally, because BigSMILES is stochastic, it is inherently compatible with non-covalent bonds, an advantage over deterministic line notations. We use this feature to extend BigSMILES to a wide variety of molecular constructs relevant to colloidal and supramolecular materials.