(634e) Systematic Chem-Informatics and Machine Learning Studies for Gas Permeability and Selectivity in Polymers

Conference

AIChE Annual Meeting

Year

2021

Proceeding

CO2 Capture for Power Generation

Time

Thursday, November 11, 2021 - 4:30pm to 4:45pm

Authors

Shi, W. - Presenter, National Energy Technology Laboratory, U.S. Department of Energy

Tiwari, S., Leidos Research Support Team

Budhathoki, S., AECOM

Steckel, J., National Energy Technology Laboratory

Sekizkardes, A., National Energy Technology Laboratory

Zhu, L., National Energy Technology Laboratory

Yi, S., Georgia Institute of Technology

Kusuma, V. A., Leidos Research Support Team

Resnik, K. P., Leidos Research Support Team - US DOE/NETL

Hopkinson, D., NETL

Fast, reliable prediction of gas permeability and selectivity in polymers is important for developing effective and efficient polymers for specific gas separation applications, such as membranes for post-combustion carbon capture. Several challenges stand in the way: building a polymer database that covers the permeabilities of multiple gases, designing a scheme to encode polymer repeat units, generating fingerprints and descriptors that can distinguish different repeat unit structures, and building the machine learning (ML) models. We will address these challenges in this presentation.

We extended the simplified molecular-input line-entry system (SMILES), originally developed for small molecules, to encode polymer repeat units. The extension uses the [*] ghost atom with different charges to mark the distinct head and tail atoms of the repeat units of both simple and ladder homopolymers, and multimers can be generated from this extended SMILES scheme. Starting from the Membrane Society of Australasia (MSA) polymer database, which contains permeabilities for different gases, we built an in-house polymer database by carefully checking the data and references for accuracy and adding SMILES for the repeat units of homopolymers, random polymers, and polymer blends. We added more than 100 new data sets as well as other polymer properties, such as glass transition temperature, polymer density, and free volume fraction. The in-house database contains 1,674 data sets, each corresponding to one polymer. Of these, 1,210 correspond to homopolymers, which comprise only 807 unique repeat units.

Five fingerprints (RDKfingerprint, MACCS keys, AtomPair fingerprint, Torsional fingerprint, and Morgan fingerprint) were tested along with 189 2D descriptors. Although the RDKfingerprint has been used by another research group [2] to develop ML models for predicting gas permeability in polymers, we found that it fails to distinguish 2.2% of the 807 unique repeat units; that is, it assigns identical fingerprint values to some structurally different repeat units, which is unexpected. To fix this problem, we combined the RDKfingerprint with the 189 2D descriptors, and the combined representation was able to distinguish the different polymer structures.
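The duplicate-fingerprint check described above can be illustrated with a minimal sketch. It is not the authors' code: `toy_fingerprint` and `toy_descriptors` are hypothetical stand-ins (simple character counts on the SMILES string) for the RDKfingerprint and the 189 2D descriptors, the four repeat units are made-up examples, and the plain `[*]` head/tail markers do not reproduce the charged ghost-atom convention of the abstract. The sketch only shows the general idea: group structures by their feature vector, flag groups with more than one member, and see whether appending extra descriptors separates them.

```python
from collections import defaultdict


def find_collisions(smiles_list, featurize):
    """Return groups of distinct SMILES that share an identical feature vector."""
    groups = defaultdict(list)
    for smi in smiles_list:
        groups[featurize(smi)].append(smi)
    return [members for members in groups.values() if len(members) > 1]


def toy_fingerprint(smi):
    # Hypothetical stand-in for a structural fingerprint:
    # counts of the characters C, N, O, F in the SMILES string.
    return tuple(smi.count(symbol) for symbol in "CNOF")


def toy_descriptors(smi):
    # Hypothetical stand-in for additional 2D descriptors:
    # number of branch openings in the SMILES string.
    return (smi.count("("),)


def combined(smi):
    # Concatenate fingerprint and descriptors into one feature vector.
    return toy_fingerprint(smi) + toy_descriptors(smi)


# Made-up repeat units; [*] marks the attachment points (assumed notation).
repeat_units = ["[*]CCCC[*]", "[*]CC(C)C[*]", "[*]CCO[*]", "[*]CC(F)(F)[*]"]

fp_collisions = find_collisions(repeat_units, toy_fingerprint)
n_colliding = sum(len(group) for group in fp_collisions)
print(f"fingerprint only: {n_colliding} of {len(repeat_units)} units collide")

combined_collisions = find_collisions(repeat_units, combined)
print(f"fingerprint + descriptors: {len(combined_collisions)} colliding groups")
```

With a real cheminformatics toolkit, the two toy functions would presumably be replaced by the actual fingerprint and descriptor calculations, while the grouping logic stays the same.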

In addition to comparing how well the different fingerprints and descriptors describe polymer repeat units, we will show their ML performance using a Gaussian process regression (GPR) algorithm and different numbers of repeat units. Our work shows that the RDKfingerprint combined with the 189 2D descriptors gives the best ML performance at a polymer repeat unit length of 10. Furthermore, shuffling the data was found to significantly affect ML performance. For example, using the same 1,071 data sets for CO₂ permeability with a 70%/30% training/test split, shuffling the data (which changes the assignment to training and test sets) gives R² values (coefficient of determination) for the test sets between 0.84 and 0.91, although the R² values for the training sets are approximately 0.98. We will show strategies to alleviate this problem. We will also show preliminary results of designing polymers with the iQSPR method [3,4].
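The sensitivity of the test-set R² to shuffling can be quantified by repeating the 70%/30% split with different random seeds and looking at the spread of scores. The sketch below is an illustration under stated assumptions, not the authors' workflow: it uses synthetic one-dimensional data and an ordinary least-squares line in place of GPR on fingerprint features, and `split_r2`, `fit_line`, and the noise level are all hypothetical choices made so the example runs with the Python standard library alone.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_r2(xs, ys, seed, train_fraction=0.7):
    """Shuffle with the given seed, train on 70%, return R^2 on the other 30%."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * len(indices))
    train, test = indices[:cut], indices[cut:]
    slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
    predictions = [slope * xs[i] + intercept for i in test]
    return r_squared([ys[i] for i in test], predictions)


# Synthetic data (assumption): a noisy linear relationship.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 3.0) for x in xs]

# Same data, same model, 20 different shuffles.
scores = [split_r2(xs, ys, seed) for seed in range(20)]
print(f"test R^2 over {len(scores)} shuffles: "
      f"min={min(scores):.2f}, max={max(scores):.2f}")
```

Reporting the minimum, maximum, or mean over many shuffles, rather than a single split, makes the dependence on the training/test assignment visible.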

References:

1. https://docs.google.com/spreadsheets/d/1LXwkZfhrdLtLuG7WvrZBgjZ2u5QsATsby9yN_FW6kcw/edit#gid=1
2. Sci. Adv. 2020, 6, eaaz4301
3. J. Comput. Aided Mol. Des. 31, 379–391
4. Mol. Inf. 2020, 39, 1900107

Topics

Computing and Systems Engineering

Membrane-Based Separations
