(556d) Using Data Mining and Large Language Models for a Large-Scale Comparison of Adsorption from Molecular Simulation and Experiment | AIChE

Authors 

Snurr, R. - Presenter, Northwestern University
Siepmann, J., University of Minnesota-Twin Cities
Kohen, D., Carleton College
Wang, N., Carleton College
Wolters, H., Carleton College
Zakhia, S., University of Minnesota
In the study of adsorption in metal-organic frameworks (MOFs), molecular simulations, particularly grand canonical Monte Carlo (GCMC) simulations, have become an indispensable tool. A long-standing question is how well simulations predict adsorption isotherms compared to experiments. This question is usually answered by making comparisons for a few systems of interest, and a large-scale comparison remains elusive. One reason for this gap is the lack of a systematic naming system for MOFs, which makes it difficult to establish a definitive link between published experimental isotherms, typically in figure format, and crystal structures, typically in text or CIF format. Even with resources such as the NIST-ISODB, which provides valuable adsorption data for a range of adsorbents, and the Cambridge Structural Database (CSD), which catalogs crystal structures of MOFs, establishing this link remains challenging, especially for materials with less well-known common names. The availability of matched experimental isotherms and crystal structures is critical not only for enhancing the reproducibility of adsorption measurements but also for refining and advancing molecular simulations that can predict and explain adsorption behavior in MOFs. To address this need, we have developed a novel integration of data mining tools and large language models (LLMs). This approach not only bridges the existing gap between theoretical predictions and experimental measurements but also provides new tools that should be useful in a variety of MOF-based adsorption research and should pave the way for more accurate, reproducible, and insightful investigations.