(711c) Beyond Molecular Structure: Predicting CO2 Solubility in Amine Solvents Under Varying Operating Conditions with Transformer Models
AIChE Annual Meeting
2024
Computing and Systems Technology Division
10B: AI/ML Modeling, Optimization and Control Applications II
Thursday, October 31, 2024 - 4:02pm to 4:18pm
Recent studies have used machine learning models, such as artificial neural networks, to predict CO2 solubility in amine solutions [3][4]. These models are widely applied in cheminformatics because of their high accuracy in system-specific property prediction. However, traditional machine learning approaches face significant limitations in this context: they typically require extensive feature engineering and are often developed for a narrow range of compounds, which limits their generalizability to other molecules [4]. In contrast, recent work has turned to transformer-based models. These models have shown promise in cheminformatics, offering accurate predictions of system-specific properties directly from the SMILES notation of compounds. This is a significant improvement over previous models, because the self-attention mechanism captures relationships among the atoms and substructures within a molecule. However, while transformers are effective at predicting properties under standard conditions, their use for predicting properties under varying operating conditions, which is crucial for designing and optimizing industrial processes, remains largely unexplored.
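The idea that a transformer works directly from SMILES, with self-attention relating every token of a molecule to every other token, can be illustrated with a minimal pure-Python sketch. It assumes a regex tokenizer of the kind common in chemical language models and uses random, untrained embeddings and projection matrices. `tokenize_smiles`, `toy_embeddings` and `self_attention` are hypothetical names, and nothing here reproduces the pre-trained encoder used in this work.

```python
import math
import random
import re

# Regex SMILES tokenizer of the kind commonly used by chemical language models.
# ASSUMPTION: the abstract does not state which tokenizer its encoder uses.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom, bond, branch and ring-closure tokens."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def toy_embeddings(tokens: list[str], dim: int = 8, seed: int = 0) -> list[list[float]]:
    """Give each distinct token a fixed random vector (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    table = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [table[tok] for tok in tokens]


def matmul(a, b):
    """Plain-Python product of an (n x k) and a (k x m) nested list."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, seed: int = 1):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V.

    The projection matrices are random (untrained); in a real encoder they are learned.
    """
    d = len(x[0])
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]

    w_q, w_k, w_v = rand_matrix(), rand_matrix(), rand_matrix()
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scores = matmul(q, transpose(k))  # scores[i][j] = q_i . k_j, one row per token
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return weights, matmul(weights, v)  # each output row mixes the values of all tokens


tokens = tokenize_smiles("CN(CCO)CCO")  # MDEA
weights, contextual = self_attention(toy_embeddings(tokens))
# weights[1] holds how strongly the N token attends to every token in the molecule
```

A real encoder stacks many such heads and layers with learned weights; the sketch only shows the mechanics of attending across the tokens of one molecule.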
To address the limitations of traditional machine learning approaches in predicting the solubility of CO2 in amine solvents under varying operating conditions, we have developed a novel method based on an encoder-only transformer. The encoder was pre-trained on 1.1 billion molecules from the PubChem database, which enables it to capture the intricate relationships between chemical structures and their properties. It generates a numeric molecular representation that can be regarded as a 'universal chemical language' for property prediction. In the fine-tuning stage, our database, comprising 146 solvent molecules under diverse operating conditions, is passed through the encoder to obtain molecular embeddings. These embeddings are combined with the up-projected operating conditions (the conditions mapped to a higher-dimensional vector) and fed into a neural network that predicts CO2 solubility.
This approach addresses two significant challenges in using transformers for property prediction in chemical engineering. First, because operating conditions are excluded from the pre-training phase, training the encoder does not require a large labeled dataset. This is particularly advantageous given the scarcity of high-quality data in many chemical engineering domains, which has hindered the application of transformers in these areas. Second, augmenting the molecular embeddings with operating conditions makes the model more relevant to process design. Chemical processes are inherently dynamic, and the ability to predict properties under varying conditions is crucial for designing more efficient processes. This adaptation improves the model's generalizability and offers a significant advantage for carbon capture process design. Because a single pre-trained encoder is reused across systems, the architecture reduces the computational burden of training transformers for each new system while maintaining accurate predictions.
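The fine-tuning architecture described above (a molecular embedding combined with up-projected operating conditions and passed to a neural network) can be sketched as a forward pass in plain Python. Several details are assumptions, not taken from this work. The encoder is replaced by a hypothetical `encode_molecule` stub that returns a fixed pseudo-embedding. The operating conditions are assumed to be temperature, CO2 partial pressure and amine concentration, with made-up scaling in the hypothetical `scale_conditions`. The embedding and the conditions are fused by concatenation, the layer widths are arbitrary, and the weights are untrained, so the output is a placeholder rather than a solubility.

```python
import math
import random
from functools import lru_cache

# All sizes and feature choices below are HYPOTHETICAL, for illustration only.
EMB_DIM = 16   # width of the molecular embedding (real encoders are much wider)
COND_DIM = 3   # temperature, CO2 partial pressure, amine concentration (assumed)
UP_DIM = 16    # width of the up-projected operating conditions
HIDDEN = 32    # hidden width of the prediction network


def init_layer(n_in, n_out, rng):
    """Uniform(-1/sqrt(n_in), 1/sqrt(n_in)) weights and zero bias."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Affine map y = W x + b, with W stored as a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


@lru_cache(maxsize=None)
def encode_molecule(smiles: str) -> tuple[float, ...]:
    """HYPOTHETICAL stand-in for the pre-trained encoder.

    Returns a deterministic pseudo-embedding per SMILES string. The cache means
    each solvent is encoded once and reused for every operating condition.
    """
    rng = random.Random(smiles)
    return tuple(rng.gauss(0.0, 1.0) for _ in range(EMB_DIM))


def scale_conditions(temp_k, p_co2_kpa, amine_wt_frac):
    """Bring raw conditions to O(1) values; the offsets and scales are made up."""
    return [(temp_k - 298.15) / 100.0, math.log10(p_co2_kpa), amine_wt_frac]


class SolubilityHead:
    """Fuses a molecular embedding with up-projected operating conditions."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.up = init_layer(COND_DIM, UP_DIM, rng)              # conditions -> UP_DIM
        self.hidden = init_layer(EMB_DIM + UP_DIM, HIDDEN, rng)  # fused vector -> hidden
        self.out = init_layer(HIDDEN, 1, rng)                    # hidden -> scalar

    def predict(self, smiles, conditions):
        emb = list(encode_molecule(smiles))
        cond = relu(linear(*self.up, conditions))  # up-project the scaled conditions
        fused = emb + cond                         # concatenation (assumed fusion rule)
        hidden = relu(linear(*self.hidden, fused))
        return linear(*self.out, hidden)[0]        # untrained: placeholder value only


model = SolubilityHead(seed=0)
for temp in (313.15, 333.15, 353.15):
    print(temp, model.predict("NCCO", scale_conditions(temp, 10.0, 0.30)))
print(encode_molecule.cache_info().misses)  # 1: MEA was encoded once for all three temperatures
```

Caching `encode_molecule` reflects one possible reading of the computational saving: if the encoder is kept frozen during fine-tuning, each of the 146 solvents needs to be encoded only once, however many operating conditions it is paired with. If the encoder is instead fine-tuned end to end, that cache no longer applies.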
References:
[1] IEA. "CO2 Emissions in 2023." International Energy Agency, 2024, https://www.iea.org/reports/co2-emissions-in-2023. Accessed 3 April 2024.
[2] Dziejarski, Bartosz, Renata Krzyżyńska, and Klas Andersson. "Current Status of Carbon Capture, Utilization, and Storage Technologies in the Global Economy: A Survey of Technical Assessment." Fuel, vol. 342, 2023, pp. 127776.
[3] Chen, Guangying, et al. "Artificial Neural Network Models for the Prediction of CO2 Solubility in Aqueous Amine Solutions." International Journal of Greenhouse Gas Control, vol. 39, 2015, pp. 174-184.
[4] Li, Tianci, et al. "Experimental Investigations and Developing Multilayer Neural Network Models for Prediction of CO2 Solubility in Aqueous MDEA/PZ and MEA/MDEA/PZ Blends." Greenhouse Gases: Science and Technology, vol. 11, no. 4, 2021, pp. 712-733.