Prediction of Polymer Solubility Using Quality Datasets
2023 AIChE Annual Meeting
Annual Student Conference: Competitions & Events
Undergraduate Student Poster Session: Materials Engineering and Sciences
Monday, November 6, 2023 - 10:00am to 12:30pm
Machine learning can predict polymer solubility without experimentally testing a polymer in every possible solvent. However, a machine learning model is only as good as the data it is trained on. In this work, I compare a high-quality experimental dataset collected at Georgia Tech with a lower-quality dataset compiled from the literature to find mismatches in their solubility classifications. The objective is to identify classifications in the literature dataset that are likely inaccurate, using the experimental dataset as the reference, and to determine whether adding certain data or correcting the mismatches improves the performance of machine learning models trained on the literature dataset. The experimental dataset contains 176 unique polymer-solvent combinations tested at room, cold, and hot temperatures, whereas the literature dataset does not specify temperature. I analyzed whether mismatches occur more frequently for polymer-solvent pairs that the literature dataset lists as insoluble or as soluble, and whether more mismatches occur at hot or at cold temperatures. The more frequent mismatch, about 6% of the data points, was the case in which the literature dataset labeled a polymer as soluble but the experiments found it completely insoluble in the same solvent at all three temperatures. This analysis is being used to prioritize which experiments to perform next in order to improve the dataset and thus the quality of the machine learning predictions.
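The comparison described above can be illustrated with a minimal Python sketch. It is not the code used in this work. The data layout, the category names, the functions `classify_pair` and `compare`, and every polymer, solvent, and label in the toy data are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy data: names and labels are illustrative only, not real results.
# Experimental set: one soluble/insoluble flag per temperature for each pair.
experimental = {
    ("polymer_A", "solvent_1"): {"cold": False, "room": False, "hot": False},
    ("polymer_A", "solvent_2"): {"cold": False, "room": True, "hot": True},
    ("polymer_B", "solvent_1"): {"cold": True, "room": True, "hot": True},
    ("polymer_B", "solvent_2"): {"cold": False, "room": False, "hot": True},
}
# Literature set: a single flag per pair, with no temperature recorded.
literature = {
    ("polymer_A", "solvent_1"): True,
    ("polymer_A", "solvent_2"): True,
    ("polymer_B", "solvent_1"): True,
    ("polymer_B", "solvent_2"): False,
}


def classify_pair(lit_soluble, exp_by_temp):
    """Compare one literature label with the experimental labels at each temperature."""
    observed = set(exp_by_temp.values())
    if observed == {lit_soluble}:
        return "agree_all_temps"
    if lit_soluble and observed == {False}:
        return "lit_soluble_exp_insoluble_all_temps"
    if not lit_soluble and observed == {True}:
        return "lit_insoluble_exp_soluble_all_temps"
    # Experimental result changes with temperature, so the mismatch is partial.
    return "temperature_dependent"


def compare(literature, experimental):
    """Classify every polymer-solvent pair that appears in both datasets."""
    shared = sorted(literature.keys() & experimental.keys())
    return {pair: classify_pair(literature[pair], experimental[pair]) for pair in shared}


results = compare(literature, experimental)
counts = Counter(results.values())
for category, n in sorted(counts.items()):
    print(f"{category}: {n} of {len(results)} pairs")
```

Pairs in the `lit_soluble_exp_insoluble_all_temps` category correspond to the kind of mismatch highlighted above, and a tally like `counts` is one simple way to decide which pairs to re-test first.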