(169bz) Accelerating Polymer Informatics Via Polymer Similarity
AIChE Annual Meeting
2024
Computational Molecular Science and Engineering Forum
Poster Session: Computational Molecular Science and Engineering Forum
Monday, October 28, 2024 - 3:30pm to 5:00pm
Even when the exact distributions are not known, the similarity between synthetic polymers can still be computed. First, each synthetic polymer is represented in BigSMILES, a structure-based line notation for describing macromolecules. The BigSMILES string is then canonicalized and converted to a graph-based representation. Next, this stochastic graph representation is separated into three parts: repeat units, end groups, and polymer topology. The earth mover's distance is used to calculate the similarity of the repeat units and of the end groups, while the graph edit distance is used to calculate the similarity of the topology. These three values can be combined linearly or nonlinearly to yield an overall pairwise chemical similarity score for polymers that is largely consistent with the chemical intuition of expert users and can be adjusted to reflect the relative importance of different chemical features for a given similarity problem. [Shi et al. Macromolecules 2023, 56, 18, 7344-7357]
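The final combination step can be sketched as follows. This is a minimal illustration, not the published implementation: the function names, the weighted arithmetic and geometric means, and the example weights are all assumptions chosen for clarity, and the three component scores are assumed to have already been computed and scaled to the range [0, 1].

```python
import math


def _validate(scores, weights):
    """Check that scores lie in [0, 1] and weights are usable."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("each similarity score must lie in [0, 1]")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")


def combine_linear(scores, weights):
    """Hypothetical linear combination: a weighted arithmetic mean."""
    _validate(scores, weights)
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)


def combine_geometric(scores, weights):
    """Hypothetical nonlinear combination: a weighted geometric mean.

    A zero score on any weighted component forces the overall score to zero.
    """
    _validate(scores, weights)
    if any(s == 0.0 and w > 0.0 for s, w in zip(scores, weights)):
        return 0.0
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights) if w > 0.0)
    return math.exp(log_sum / sum(weights))


# Assumed component scores: (repeat units, end groups, topology).
scores = (0.9, 0.5, 1.0)
# Assumed weights that emphasize repeat-unit chemistry.
weights = (0.5, 0.25, 0.25)

overall_linear = combine_linear(scores, weights)
overall_geometric = combine_geometric(scores, weights)
```

Changing the weights is one way to express the relative importance of repeat units, end groups, and topology for a given problem. The geometric variant penalizes a poor match on any single component more strongly than the linear one.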
When polymers are represented as ensembles whose distributions are characterized, the earth mover's distance is proposed as a metric for calculating the pairwise similarity score between two polymer ensembles. Its power for this purpose is illustrated with four examples: two-chain copolymer ensembles; first-order Markov linear copolymer ensembles; nonlinear star polymer ensembles with varying arm length, topology, and composition; and polymer ensembles represented by molecular mass distributions. These examples demonstrate that the earth mover's distance captures differences neglected by the average method and offers greater resolution of chemical distinctions between polymer ensembles. Without supervision, the earth mover's distance gives a quantitative and reliable calculation of the pairwise similarity between two polymer ensembles. [Shi et al. ACS Polymers Au 2024, 4, 1, 66–76]
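The molecular mass case gives a compact way to see why a distribution-aware metric can separate ensembles that an average cannot. The sketch below is illustrative only: it assumes two discrete mass distributions defined on the same sorted grid of masses, uses the standard one-dimensional result that the earth mover's distance equals the area between the two cumulative distributions, and uses made-up numbers. The names `emd_1d` and `mean_mass` are hypothetical.

```python
def emd_1d(values, p, q):
    """Earth mover's distance between two discrete 1D distributions.

    `values` is a sorted grid shared by both distributions; `p` and `q`
    are non-negative weights on that grid (normalized internally).
    In 1D the distance is the area between the two cumulative distributions.
    """
    if not (len(values) == len(p) == len(q)):
        raise ValueError("values, p and q must have the same length")
    if any(b <= a for a, b in zip(values, values[1:])):
        raise ValueError("values must be strictly increasing")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total weight")

    cdf_gap = 0.0   # running difference between the two CDFs
    distance = 0.0
    for i in range(len(values) - 1):
        cdf_gap += p[i] / total_p - q[i] / total_q
        distance += abs(cdf_gap) * (values[i + 1] - values[i])
    return distance


def mean_mass(values, weights):
    """Weighted average of the grid values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up example: masses in kg/mol and two ensembles with the same mean.
masses = [10.0, 20.0, 30.0]
narrow = [0.0, 1.0, 0.0]   # every chain at 20 kg/mol
broad = [0.5, 0.0, 0.5]    # half at 10 kg/mol, half at 30 kg/mol

same_mean = mean_mass(masses, narrow) == mean_mass(masses, broad)  # True
distance = emd_1d(masses, narrow, broad)  # 10.0
```

Both ensembles average 20 kg/mol, so a comparison based on the mean alone treats them as identical, while the earth mover's distance of 10 kg/mol reflects the cost of moving half of the chains by 10 kg/mol in each direction.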
Our similarity methods can handle sequential, compositional, molecular mass, and topological differences, which traditional methods typically ignore entirely or in part. This enables distinctions between ensembles that would otherwise be erroneously judged identical. In conclusion, our similarity methodology represents a critical advance in the quantitative calculation of polymer similarity and accelerates the progress of cheminformatics for polymers, including applications such as property prediction and classification.