Prediction of Polymer Solubility Using Quality Datasets | AIChE

Polymers are large molecules built from many repeating units. Their high strength-to-weight ratio makes them versatile and appealing to many industries, including food and drink packaging, medical equipment, furniture, construction, and travel. Although plastic is crucial to these industries and more, most of the plastic generated becomes waste at the end of its life. According to the US Environmental Protection Agency, only 8.7 percent of the plastic generated in the US was recycled in 2018. Higher rates of mechanical recycling would be ideal, but mechanical recycling of plastic is much less feasible than the recycling of other materials: every time plastic is mechanically recycled, the polymer chain becomes shorter, losing strength and durability. To resolve this, chemical recycling can depolymerize the plastic to its monomer form for repolymerization, attaining the same quality as a virgin polymer. For chemical recycling, the polymer must first be dissolved in a solvent. However, since polymers are large and complex molecules, predicting which solvents can be used is challenging. Many factors, such as polymer and solvent functional groups, polarity, boiling/melting point, and molecular weight, impact solubility.

To better predict polymer solubility without experimentally testing a polymer in every possible solvent, machine learning can be used. However, machine learning models are only as good as the data they are trained on. In this work, I compare a high-quality experimental dataset collected at Georgia Tech with a lower-quality dataset compiled from the literature to find mismatches in their solubility classifications. The objective is to identify likely inaccurate classifications in the literature dataset by comparison with the experimental dataset, and to determine whether adding certain data or correcting mismatches improves the performance of machine learning models trained on the literature dataset. The experimental dataset contains 176 unique polymer-solvent combinations tested at cold, room, and hot temperatures, while the literature dataset does not specify temperature. I analyzed whether mismatches occur more frequently for polymer-solvent pairs that the literature dataset lists as insoluble or as soluble, and whether more mismatches occur at hot or cold temperatures. I found more instances, about 6% of the data points, in which the literature dataset labeled a polymer as soluble while the experimental data showed it to be completely insoluble in the same solvent at all tested temperatures. This analysis is being used to prioritize which experiments to perform in order to improve the dataset and thus the quality of the machine learning predictions.
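
The comparison described above can be illustrated with a minimal Python sketch. It is not the code used in this work: the polymer and solvent names, the data layout, and the category labels (including the separate "temperature_dependent" bucket) are all hypothetical assumptions chosen for illustration, and none of the values come from either dataset.

```python
from collections import Counter

# Hypothetical records; none of these values come from the study.
# Experimental data: (polymer, solvent) -> {temperature: is_soluble}
experimental = {
    ("polymer_A", "solvent_1"): {"cold": False, "room": False, "hot": False},
    ("polymer_A", "solvent_2"): {"cold": False, "room": True, "hot": True},
    ("polymer_B", "solvent_1"): {"cold": True, "room": True, "hot": True},
    ("polymer_B", "solvent_2"): {"cold": False, "room": False, "hot": False},
}

# Literature data: (polymer, solvent) -> is_soluble, temperature unspecified
literature = {
    ("polymer_A", "solvent_1"): True,
    ("polymer_A", "solvent_2"): True,
    ("polymer_B", "solvent_1"): False,
    ("polymer_B", "solvent_2"): False,
    ("polymer_C", "solvent_1"): True,  # no experimental counterpart, skipped
}


def classify(lit_label, exp_by_temp):
    """Compare one literature label with the experimental labels at all temperatures."""
    observed = set(exp_by_temp.values())
    if observed == {lit_label}:
        return "agree"
    if len(observed) == 2:
        # Experimental result changes with temperature, so the literature
        # label may be right at some temperatures and wrong at others.
        return "temperature_dependent"
    # Experimental result is the same at every temperature and contradicts
    # the literature label.
    if lit_label:
        return "lit_soluble_exp_insoluble"
    return "lit_insoluble_exp_soluble"


def compare(literature, experimental):
    """Classify every polymer-solvent pair present in both datasets."""
    return {
        pair: classify(lit_label, experimental[pair])
        for pair, lit_label in literature.items()
        if pair in experimental
    }


results = compare(literature, experimental)
counts = Counter(results.values())

for pair, category in sorted(results.items()):
    print(pair, category)
print(dict(counts))
```

Counting the categories in this way shows which direction of disagreement dominates, and the pairs in the two contradicting categories are natural candidates for follow-up experiments or label correction.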