(556d) Using Data Mining and Large Language Models for a Large-Scale Comparison of Adsorption from Molecular Simulation and Experiment
AIChE Annual Meeting
2024
Separations Division
Molecular and Data Science Modeling of Adsorption
Wednesday, October 30, 2024 - 1:15pm to 1:30pm
In the study of adsorption in metal-organic frameworks (MOFs), molecular simulations, particularly grand canonical Monte Carlo (GCMC) simulations, have become an indispensable tool. A long-standing question is how well simulations predict adsorption isotherms compared with experiments. This question is usually answered by comparing a few systems of interest, and a large-scale comparison remains elusive. One reason for this gap is the lack of a systematic naming system for MOFs. Without one, it is difficult to establish a definitive link between published experimental isotherms, typically reported as figures, and crystal structures, typically reported in text or as CIF files. Even with resources such as the NIST-ISODB, which provides adsorption data for a range of adsorbents, and the Cambridge Structural Database (CSD), which catalogs MOF crystal structures, establishing this link remains challenging, especially for materials without well-known common names. Matched experimental isotherms and crystal structures are critical not only for improving the reproducibility of adsorption measurements but also for refining molecular simulations that predict and explain adsorption in MOFs. To address this need, we have developed a novel integration of data mining tools and large language models (LLMs). This approach bridges the gap between theoretical predictions and experimental measurements, and it provides new tools that should be useful across MOF-based adsorption research, paving the way for more accurate, reproducible, and insightful investigations.
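The naming problem can be made concrete with a deliberately simple sketch. This is not the authors' data mining and LLM pipeline. It is a hypothetical exact-match lookup: MOF names taken from isotherm records are normalized and checked against a small alias table. HKUST-1, Cu-BTC, and MOF-199 are common names for the same framework. The structure IDs (`STRUCT_0001`, ...) are placeholders rather than real CSD refcodes, and the function names `normalize_name` and `link_isotherms` are illustrative.

```python
import re

# Hypothetical alias table: several common names can refer to one framework.
# Structure IDs are placeholders, not real CSD refcodes.
ALIASES = {
    "hkust1": "STRUCT_0001",
    "cubtc": "STRUCT_0001",
    "mof199": "STRUCT_0001",
    "zif8": "STRUCT_0002",
}


def normalize_name(name: str) -> str:
    """Lowercase a MOF name and drop spaces, hyphens, and other punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def link_isotherms(records):
    """Split (name, isotherm_id) records into matched and unmatched lists."""
    matched, unmatched = [], []
    for name, isotherm_id in records:
        structure = ALIASES.get(normalize_name(name))
        if structure is None:
            # No alias entry: the name alone cannot identify a structure.
            unmatched.append((name, isotherm_id))
        else:
            matched.append((isotherm_id, structure))
    return matched, unmatched


records = [
    ("HKUST-1", "iso_01"),
    ("Cu-BTC", "iso_02"),
    ("ZIF-8", "iso_03"),
    ("compound 3", "iso_04"),  # paper-specific label with no common name
]
matched, unmatched = link_isotherms(records)
print(matched)
print(unmatched)
```

Different names for the same material resolve to one structure only if the alias table already lists them. A paper-specific label such as "compound 3" is left unmatched, and resolving it requires context from the source publication. That is the kind of case where a simple lookup breaks down for materials without well-known common names.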