(455j) Simulation-Free, Two-Dimensional Histograms As Effective Adsorbent Representations for Machine-Learning Based Adsorption Predictions
AIChE Annual Meeting
2024
Computational Molecular Science and Engineering Forum
Machine Learning for Soft and Hard Materials II
Wednesday, October 30, 2024 - 9:48am to 10:00am
It is known that the efficiency of ML models is highly dependent on the problem representation: "meaningful" representations of the problem can aid model learning and even reduce the data burden for training. To this end, we demonstrate the efficacy of two-dimensional histograms as adsorbent fingerprints that encode information about the MOF pore shape/size and the interaction parameters associated with grids of adsorption sites in MOFs. Specifically, these histograms encode information about the MOF atoms closest to each site and the (average) non-bonded interaction parameters associated with that site. These histograms thus encode MOF properties that are independent of the adsorbed molecules yet intrinsically related to adsorption behavior in general, facilitating their applicability across a variety of adsorption scenarios (instead of the usual customized MOF representation specific to a given adsorption problem). As a side benefit, these histograms can be generated for each MOF without the need for significant "domain knowledge," making them accessible to a wider research community.
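As a rough illustration of the idea, the following is a minimal sketch of how such a fingerprint might be assembled. It is not the authors' implementation: the function name `histogram_fingerprint`, the choice of axes (distance to the closest atom versus a mean Lennard-Jones-like well depth over the `k` nearest atoms), the bin counts, and the value ranges are all assumptions made for this example, and periodic boundary conditions are ignored for brevity.

```python
import numpy as np

def histogram_fingerprint(atom_xyz, atom_eps, site_xyz, k=4, bins=(8, 8),
                          dist_range=(0.0, 10.0), eps_range=(0.0, 1.0)):
    """Hypothetical 2D-histogram fingerprint of a framework.

    atom_xyz : (n_atoms, 3) framework atom coordinates
    atom_eps : (n_atoms,) a non-bonded parameter per atom (assumed: LJ epsilon)
    site_xyz : (n_sites, 3) grid of candidate adsorption sites
    """
    # Site-to-atom distances; periodic images are NOT considered here.
    d = np.linalg.norm(site_xyz[:, None, :] - atom_xyz[None, :, :], axis=-1)

    # Indices of the k atoms closest to each site.
    nearest = np.argsort(d, axis=1)[:, :k]

    # Axis 1 (assumed): distance from each site to its closest atom.
    site_dist = np.take_along_axis(d, nearest[:, :1], axis=1)[:, 0]

    # Axis 2 (assumed): average interaction parameter of the k closest atoms.
    site_eps = atom_eps[nearest].mean(axis=1)

    hist, _, _ = np.histogram2d(site_dist, site_eps, bins=bins,
                                range=[dist_range, eps_range])

    # Normalize so frameworks with different numbers of sites are comparable.
    total = hist.sum()
    return hist / total if total > 0 else hist
```

The flattened array (`fingerprint.ravel()`) could then serve as the input vector to any standard regression or classification model. Note that nothing in the construction refers to the adsorbate, which is what would make the same fingerprint reusable across adsorption problems.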
To demonstrate this, we evaluate and discuss the efficacy of these histogram-based MOF representations in facilitating the application of data science techniques to three MOF screening scenarios requiring adsorption predictions: i) imbalanced learning to find adsorbent materials to work as NH3-extracting membranes in plasma reactors for green NH3 synthesis; ii) multitask learning to simultaneously predict the adsorption of multiple small molecules with the same model; and iii) transfer learning to predict the adsorption of molecule "A" at condition "X", leveraging data for adsorption of molecule "B" at condition "Y".
For instance, we show that iterative, imbalanced learning using our proposed histogram representation found MOFs for NH3-selective membranes that were better than at least 70% of the structures that would have been assumed to constitute the top-200 MOFs had one relied on traditional hierarchical screening. For reference, in traditional hierarchical screening of adsorbents, an early filter based on cheap calculations of adsorption Henry's constants is used to reduce the number of structures on which to perform full molecular simulations. Here, we instead use the structures coming out of that filter to "jumpstart" iterative imbalanced learning, which ultimately allowed us to find more promising MOFs while performing full molecular simulations on less than 10% of the targeted MOF database. Additionally, we showed that the generated data were sufficient to yield meaningful data-driven design rules that could even be applied beyond MOFs to design adsorbents in general for these applications.
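The seeded, iterative workflow described above can be sketched generically as follows. This is only an illustration of a model-guided screening loop, not the authors' imbalanced-learning procedure: the ridge-regression surrogate, the greedy top-`batch` selection rule, and the names `iterative_screen` and `run_simulation` (a stand-in for a full molecular simulation) are all assumptions, and no special handling of class imbalance is included.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with a bias column (surrogate model)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def iterative_screen(X, seed_idx, run_simulation, n_rounds=5, batch=10):
    """Hypothetical loop: simulate the seed set, then repeatedly retrain and
    simulate the structures the surrogate currently ranks highest.

    X              : (n_structures, n_features) flattened fingerprints
    seed_idx       : indices that passed the cheap early filter (assumed input)
    run_simulation : callable index -> performance metric (hypothetical)
    """
    labeled = [int(i) for i in seed_idx]
    y = {i: run_simulation(i) for i in labeled}

    for _ in range(n_rounds):
        w = fit_ridge(X[labeled], np.array([y[i] for i in labeled]))
        pred = predict_ridge(w, X)
        pred[labeled] = -np.inf          # never re-select simulated structures
        for i in np.argsort(pred)[::-1][:batch]:
            i = int(i)
            y[i] = run_simulation(i)     # the expensive step
            labeled.append(i)

    return labeled, y
```

With a seed set of `s` structures, the loop performs `s + n_rounds * batch` simulations in total, so the fraction of the database that is fully simulated is fixed in advance by those three numbers.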