(362e) Calculating Pairwise Similarity of Polymer Ensembles Via Earth Mover's Distance
AIChE Annual Meeting
2023
Engineering Sciences and Fundamentals
Faculty Candidates in CoMSEF/Area 1a, Session 1
Monday, November 6, 2023 - 8:40am to 8:50am
In this work, we propose using the earth mover's distance to calculate a pairwise similarity score between two polymer ensembles. First, the pairwise distance d_ij between each polymer chain p_i in ensemble P and each polymer chain q_j in ensemble Q is calculated with a commonly used single-chain similarity method, such as the sequence mismatch percentage or the graph edit distance, yielding a distance matrix D = [d_ij]. Second, the earth mover's distance combines the distance matrix D with the relative weights of the polymer chains in each ensemble and quantifies the similarity or dissimilarity between the two ensembles by solving a linear optimization problem. We illustrate the approach with four examples: two-chain copolymer ensembles, first-order Markov linear copolymer ensembles, nonlinear star polymer ensembles with varying arm length, topology, and composition, and polymer ensembles represented by molecular mass distributions. These examples demonstrate that the earth mover's distance captures differences neglected by the average method and offers greater resolution of chemical distinctions between polymer ensembles. Without supervision, the earth mover's distance gives a quantitative and reliable measure of pairwise similarity between two polymer ensembles. Our methodology represents an important advance in the quantitative calculation of polymer ensemble similarity and supports the progress of cheminformatics for polymers. It is promising for applications such as search queries in polymer databases and polymer inverse design, fostering a more comprehensive understanding and use of polymer data.
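The two steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mismatch_fraction` and `earth_movers_distance` are hypothetical, the single-chain distance is a toy mismatch fraction that only handles equal-length sequences, the weights of each ensemble are normalized to sum to 1, and the linear program is solved with SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def mismatch_fraction(a: str, b: str) -> float:
    """Toy single-chain distance: fraction of positions where a and b differ.

    Assumption: both chains have the same length.
    """
    if len(a) != len(b):
        raise ValueError("this toy distance assumes equal-length chains")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def earth_movers_distance(D, w_p, w_q):
    """Solve the transport LP: minimize sum_ij f_ij * d_ij subject to
    sum_j f_ij = w_p[i], sum_i f_ij = w_q[j], f_ij >= 0.

    Returns the optimal cost and the flow matrix f.
    """
    D = np.asarray(D, dtype=float)
    w_p = np.asarray(w_p, dtype=float)
    w_q = np.asarray(w_q, dtype=float)
    m, n = D.shape

    # Assumption: weights are relative, so normalize each ensemble to 1.
    w_p = w_p / w_p.sum()
    w_q = w_q / w_q.sum()

    # Flow variables f_ij are flattened row-major: index i * n + j.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # row sums equal w_p[i]
    for j in range(n):
        A_eq[m + j, j::n] = 1.0           # column sums equal w_q[j]
    b_eq = np.concatenate([w_p, w_q])

    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, res.x.reshape(m, n)


# Two ensembles with the same average composition (50% A, 50% B).
P = ["AAAA", "BBBB"]
w_P = [0.5, 0.5]
Q = ["AABB"]
w_Q = [1.0]

# Step 1: distance matrix D = [d_ij] between chains of P and Q.
D_pq = [[mismatch_fraction(p, q) for q in Q] for p in P]

# Step 2: earth mover's distance between the two ensembles.
emd_pq, flow_pq = earth_movers_distance(D_pq, w_P, w_Q)

# Sanity check: an ensemble compared with itself.
D_pp = [[mismatch_fraction(a, b) for b in P] for a in P]
emd_pp, _ = earth_movers_distance(D_pp, w_P, w_P)
```

In this toy case both ensembles have the same average composition, so a comparison based only on averages would see no difference, while `emd_pq` evaluates to 0.5 and `emd_pp` to 0. For larger ensembles, a dedicated optimal transport solver would likely scale better than a dense constraint matrix.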