(455j) Simulation-Free, Two-Dimensional Histograms As Effective Adsorbent Representations for Machine-Learning Based Adsorption Predictions | AIChE

Authors 

Fajardo-Rojas, J. - Presenter, Colorado School of Mines
Liu, T. W., Colorado School of Mines
Gómez-Gualdrón, D. A., Texas A&M University
Vilas, T. G. D., University of São Paulo
Gaining full control of adsorption phenomena in adsorbent materials could enable a myriad of applications across engineering, and metal-organic frameworks (MOFs) are among the most promising adsorbents for achieving such control. The promise of MOFs, however, is tied to their extreme chemical and structural tunability, which in turn gives rise to an overwhelmingly large number of possible MOFs. Thus, while molecular simulation has accelerated the study of adsorption in MOFs, in some instances even enabling high-throughput screening of hundreds of thousands of MOFs, an exhaustive study of adsorption phenomena in MOFs likely requires “instantaneous” adsorption predictions such as those that machine learning (ML) models can provide.

It is known that the efficiency of ML models depends strongly on the problem representation: a “meaningful” representation can aid model learning and even reduce the amount of data needed for training. To this end, we demonstrate the efficacy of two-dimensional histograms as adsorbent fingerprints that encode information about the MOF pore shape/size and the interaction parameters associated with grids of adsorption sites in MOFs. Specifically, these histograms encode information about the MOF atoms closest to each site and the (average) non-bonded interaction parameters associated with it. The histograms thus encode MOF properties that are independent of the adsorbed molecules yet intrinsically related to adsorption behavior in general, which makes them applicable across a variety of adsorption scenarios (in contrast to the usual customized MOF representation specific to a given adsorption problem). As a side benefit, these histograms can be generated for each MOF without the need for significant “domain knowledge,” making them accessible to a wider research community.
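As a rough illustration of the kind of fingerprint described above, the following minimal sketch bins a grid of sites by (a) the distance to the closest framework atom and (b) the average interaction parameter of the nearest atoms. It is not the authors' implementation. The cubic periodic cell, the per-atom parameter `eps`, the number of neighbors `k`, the bin ranges, and the function name `histogram_fingerprint` are all assumptions made for this example. Plain Python is used for portability; a vectorized array library would be the natural choice at scale.

```python
import itertools
import math

def histogram_fingerprint(atoms, box, n_grid=6, k=2, bins=(8, 8),
                          r_max=10.0, eps_max=1.0):
    """Build a normalized 2D histogram over a regular grid of sites.

    atoms: non-empty list of (x, y, z, eps) tuples; eps is a hypothetical
           per-atom interaction parameter (e.g. a Lennard-Jones well depth).
    box:   edge length of a cubic periodic cell (an assumption of this sketch).
    """
    n_r, n_e = bins
    hist = [[0.0] * n_e for _ in range(n_r)]
    ticks = [(i + 0.5) * box / n_grid for i in range(n_grid)]
    k = min(k, len(atoms))
    for site in itertools.product(ticks, repeat=3):
        # Minimum-image distance from this site to every atom.
        near = []
        for x, y, z, eps in atoms:
            d2 = 0.0
            for s, a in zip(site, (x, y, z)):
                delta = s - a
                delta -= box * round(delta / box)
                d2 += delta * delta
            near.append((math.sqrt(d2), eps))
        near.sort()
        d_closest = near[0][0]                       # pore size/shape proxy
        eps_mean = sum(e for _, e in near[:k]) / k   # average interaction parameter
        # Clamp indices so that every site lands in some bin.
        i = max(0, min(int(d_closest / r_max * n_r), n_r - 1))
        j = max(0, min(int(eps_mean / eps_max * n_e), n_e - 1))
        hist[i][j] += 1.0
    n_sites = n_grid ** 3
    return [[count / n_sites for count in row] for row in hist]

# Toy two-atom "framework" in a 10 x 10 x 10 cell (made-up numbers).
toy_atoms = [(0.0, 0.0, 0.0, 0.30), (5.0, 5.0, 5.0, 0.05)]
fingerprint = histogram_fingerprint(toy_atoms, box=10.0)
```

Because nothing in this construction refers to the adsorbate, the same fingerprint could in principle be reused as model input for different molecules and conditions, which is the property the abstract emphasizes.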

We evaluate and discuss how well these histogram-based MOF representations support the application of data science techniques in three MOF screening scenarios that require adsorption predictions: i) imbalanced learning to find adsorbent materials that can serve as NH3-extracting membranes in plasma reactors for green NH3 synthesis, ii) multitask learning to simultaneously predict the adsorption of multiple small molecules with the same model, and iii) transfer learning to predict the adsorption of molecule “A” at condition “X” by leveraging data for the adsorption of molecule “B” at condition “Y”.

For instance, we show that iterative, imbalanced learning using our proposed histogram representation found MOFs for NH3-selective membranes that were better than at least 70% of the structures that would have been assumed to constitute the top-200 MOFs had one relied on traditional hierarchical screening. For reference, in traditional hierarchical screening of adsorbents, an early filter based on cheap calculations of adsorption Henry’s constants is used to reduce the number of structures on which full molecular simulations are performed. Here, we instead use the structures coming out of that filter to “jumpstart” iterative imbalanced learning, which ultimately allowed us to find more promising MOFs while performing full molecular simulation on less than 10% of the targeted MOF database. Additionally, we show that the generated data was sufficient to yield meaningful data-driven design rules that could even be applied beyond MOFs to design adsorbents in general for these applications.
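The loop described above, in which a cheap filter seeds the training set and the model then chooses which structures receive full simulation, can be sketched as follows. This is only a schematic under stated assumptions, not the authors' workflow. The nearest-neighbor predictor is a stand-in for whatever ML model is actually used, `toy_simulate` replaces a full molecular simulation, the seed selection replaces the Henry's constant filter, and the budget and batch size are made-up values. The sketch also omits any explicit treatment of class imbalance.

```python
import random

def knn_predict(x, labeled, k=3):
    """Average the scores of the k labeled points nearest to x (stand-in model)."""
    ranked = sorted(
        labeled,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])),
    )
    top = ranked[:k]
    return sum(score for _, score in top) / len(top)

def iterative_screen(features, seed_ids, simulate, budget, batch=5):
    """Greedy screening loop; returns {candidate index: simulated score}.

    seed_ids must be non-empty: they play the role of the structures that
    pass the cheap early filter and "jumpstart" the loop.
    """
    scores = {i: simulate(i) for i in seed_ids}
    while len(scores) < budget:
        labeled = [(features[i], s) for i, s in scores.items()]
        pool = [i for i in range(len(features)) if i not in scores]
        if not pool:
            break
        # Rank unlabeled candidates by predicted score, best first.
        pool.sort(key=lambda i: knn_predict(features[i], labeled), reverse=True)
        # Spend the expensive simulation only on the top-ranked candidates.
        for i in pool[:min(batch, budget - len(scores))]:
            scores[i] = simulate(i)
    return scores

# Toy database of 200 candidates with two made-up features each.
rng = random.Random(0)
features = [(rng.random(), rng.random()) for _ in range(200)]

def toy_simulate(i):
    # Hypothetical "expensive" score with an optimum near (0.8, 0.2).
    x, y = features[i]
    return -((x - 0.8) ** 2 + (y - 0.2) ** 2)

# Stand-in for the cheap early filter: keep the 10 candidates with largest x.
seed_ids = sorted(range(200), key=lambda i: features[i][0], reverse=True)[:10]
results = iterative_screen(features, seed_ids, toy_simulate, budget=18)
```

With a budget of 18 out of 200 candidates, the toy run simulates under 10% of the database, mirroring the proportion quoted in the text; the actual selection criterion and stopping rule used in the study are not specified here.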