(420h) Advances Featurization and Characterization of MOF Pores for Adsorption Applications
2023 AIChE Annual Meeting
Computational Molecular Science and Engineering Forum
Automated Molecular and Materials Discovery: Integrating Machine Learning, Simulation, and Experiment II
Tuesday, November 7, 2023 - 5:15pm to 5:30pm
Modern-day computational materials design
State-of-the-art approaches to the computational discovery of MOFs can be thought of in terms of three key components, as shown in Figure 1:
- A representation engine: a way to represent MOFs, e.g., as a network of atoms and bonds or as a connected string of building blocks.
- A prediction engine: a method to predict the desired property, e.g., molecular simulations, ab initio calculations, or machine learning.
- A structure sampler/generator: a plan for searching the space of all possible MOFs, e.g., databases such as CoRE-MOFs or structures built with ToBaCCo, or optimizers based on genetic algorithms or Bayesian optimization.
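The three components can be pictured as interchangeable functions plugged into one screening loop. The sketch below is a toy illustration only: `ScreeningPipeline`, `screen`, and the stand-in lambdas are hypothetical names, and in practice molecular simulations or ML models, CoRE-MOF queries, or optimizers would replace the placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class ScreeningPipeline:
    """Hypothetical composition of the three components of a computational MOF search."""
    represent: Callable[[str], Sequence[float]]   # representation engine: structure -> features
    predict: Callable[[Sequence[float]], float]   # prediction engine: features -> property
    sample: Callable[[], Iterable[str]]           # structure sampler: yields candidate structures

    def screen(self, top_k: int = 3) -> List[Tuple[str, float]]:
        """Score every sampled candidate and return the top_k by predicted property."""
        scored = [(mof, self.predict(self.represent(mof))) for mof in self.sample()]
        scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep sample order
        return scored[:top_k]


# Toy stand-ins: the "feature" is the name length and the "property" is that length.
pipeline = ScreeningPipeline(
    represent=lambda mof: [float(len(mof))],
    predict=lambda features: features[0],
    sample=lambda: ["Cu-BTC", "MOF-5", "ZIF-8"],
)
print(pipeline.screen(top_k=2))  # [('Cu-BTC', 6.0), ('MOF-5', 5.0)]
```

This assumes a static sampler; an optimizer-based sampler (genetic algorithm, Bayesian optimization) would additionally feed the predictions back into the generator between rounds.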
Main challenge in representing MOFs for adsorption
How we represent materials is key to the efficiency of all three components above. For adsorption, properties correlate more strongly with the energetics and connectivity of the voids in a material than with the chemistry of its framework. On one hand, the popular string and graph representations of MOFs (e.g., structure graphs, MOFid) are geared primarily toward the chemical structure of the MOF and are too far removed from its geometry to describe the 3D cavities well. On the other hand, 3D grid (image) representations of materials are very comprehensive but are too computationally expensive and memory-intensive to compile into databases for high-throughput use. We therefore present an intuitive and robust workflow for efficiently representing the 3D confining environment of crystalline nanoporous materials. The result is a “pore graph”, which is flexible enough to retain any cavity information relevant to adsorption while being easy to compute, store, and retrieve for characterization or machine learning.
The pore graph workflow
The void space in a material is converted into a network of interconnected pockets using 3D image segmentation. As an example, consider this workflow applied to Cu-BTC (HKUST-1), as shown in Figure 2. First, the 3D distance map of the material (Figure 2, Step 1) is generated from its CIF file (Figure 2, Step 0): each voxel is assigned its distance to the nearest atom surface. The distance map is then thresholded into a binary image in which the void space is represented by 1s and the solid material by 0s (not shown in Figure 2). Next, watershed segmentation divides the binary image into regions around the local maxima of the distance map (Figure 2, Step 2), and these regions are converted into a network of interconnected pockets called the pore graph (Figure 2, Step 4). Each node of this graph represents a pocket in the material, and each edge represents a connection between two pockets through one or more pore windows.
In crystalline materials, periodic boundary conditions (PBC) must be handled correctly when describing cavities, so that properties can be averaged over complete pockets. Because the unit cell can cut through pockets in many different ways, this is non-trivial, so we implemented new algorithms to account for PBC. The result is a regrouped list of region indices, called periodic groups, that indicates which regions should be combined to form complete pockets (Figure 2, Step 3). For example, in Cu-BTC the 8 regions at the corners of the unit cell (Figure 2, Step 3) together form one complete pocket, so its periodic group has 8 entries. By contrast, a pocket that lies entirely inside the unit cell has only one entry in its periodic group. This information is then annotated onto the nodes of the pore graph.
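The workflow above can be sketched end to end on a toy cubic cell. This is a minimal sketch under stated assumptions, not the authors' implementation: the cell is cubic, atoms are given as fractional coordinates with fixed radii, the void threshold is zero (no probe radius), the watershed is a hand-written priority-flood (Meyer) algorithm, and the periodic-group rule (merge regions whose seed plateaus touch across a cell face) is a simple heuristic standing in for the algorithms described above. All function names are hypothetical.

```python
import heapq
import itertools

import numpy as np
from scipy import ndimage


def distance_map(atoms_frac, radii, cell_length, n):
    """Distance (units of cell_length) from each voxel to the nearest atom surface; cubic cell."""
    grid = np.arange(n) / n                                   # fractional voxel coordinates
    pts = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)
    dist = np.full((n, n, n), np.inf)
    for pos, radius in zip(atoms_frac, radii):
        d = pts - np.asarray(pos, dtype=float)
        d -= np.round(d)                                      # minimum-image convention (PBC)
        dist = np.minimum(dist, np.linalg.norm(d, axis=-1) * cell_length - radius)
    return dist


def watershed(dist, void):
    """Priority-flood watershed on -dist, seeded at local maxima, confined to the unit cell."""
    peaks = (dist == ndimage.maximum_filter(dist, size=3, mode="nearest")) & void
    seeds, _ = ndimage.label(peaks)                           # one seed label per peak plateau
    labels = seeds.copy()
    heap = [(-dist[idx], i, idx) for i, idx in enumerate(zip(*np.nonzero(seeds)))]
    heapq.heapify(heap)
    count = len(heap)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while heap:
        _, _, idx = heapq.heappop(heap)                       # largest remaining distance first
        for step in steps:
            nb = tuple(int(c + s) for c, s in zip(idx, step))
            inside = all(0 <= c < size for c, size in zip(nb, dist.shape))
            if inside and void[nb] and labels[nb] == 0:
                labels[nb] = labels[idx]                      # inherit the flooding region's label
                count += 1
                heapq.heappush(heap, (-dist[nb], count, nb))
    return labels, seeds


def periodic_groups(seeds):
    """Heuristic: regions whose seed plateaus touch across a cell face form one pocket."""
    parent = list(range(int(seeds.max()) + 1))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]                     # path halving
            a = parent[a]
        return a

    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue
        rolled = np.roll(seeds, shift, axis=(0, 1, 2))        # periodic 26-neighbour labels
        touching = (seeds > 0) & (rolled > 0) & (seeds != rolled)
        for a, b in zip(seeds[touching].tolist(), rolled[touching].tolist()):
            parent[find(a)] = find(b)
    groups = {}
    for lab in range(1, len(parent)):
        groups.setdefault(find(lab), []).append(lab)
    return sorted(groups.values())


def pore_graph_edges(labels, groups):
    """Edges between pockets (periodic groups) whose regions share a face, wrapping the cell."""
    node_of = {lab: k for k, group in enumerate(groups) for lab in group}
    edges = set()
    for axis in range(3):
        nb = np.roll(labels, -1, axis=axis)
        mask = (labels > 0) & (nb > 0) & (labels != nb)
        for a, b in zip(labels[mask].tolist(), nb[mask].tolist()):
            u, v = node_of[a], node_of[b]
            if u != v:
                edges.add((min(u, v), max(u, v)))
    return edges


# Toy cell with one atom at the body centre: the only pocket is cut into 8 corner regions.
dist = distance_map([(0.5, 0.5, 0.5)], [1.5], cell_length=10.0, n=16)
labels, seeds = watershed(dist, void=dist > 0.0)
groups = periodic_groups(seeds)
print(len(groups), [len(g) for g in groups])  # 1 [8]
```

The toy cell mirrors the Cu-BTC corner pocket: one pocket centred on the cell corners is segmented into 8 regions that regroup into a single periodic group, whereas moving the atom to the origin yields one region lying fully inside the cell.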
Two proof-of-concept characterization applications
Breaking down the void space into its constituent pockets provides a template for getting the most out of simple pore descriptors and describing the cavities more completely. As examples, solutions to two challenging characterization problems using this workflow are described below:
- Detect pore windows in materials:
When two regions are connected, the outer boundary of one region overlaps with the inner boundary of the other. A window can then be located at the centroid of this shared boundary. If two regions are connected through multiple windows, an agglomerative clustering step groups the shared boundary into separate windows. For example, Figure 3, Application 1 shows the windows detected in Cu-BTC, overlaid on its distance isosurface and its pore graph.
- Detect the types of pockets in a material:
Once PBC is handled correctly, a simple point-to-centroid distance histogram is calculated for each periodic group. This simple geometric descriptor is then used with agglomerative clustering to detect the different types of pockets in the material. For example, Figure 3, Application 2 shows the three types of pockets detected in Cu-BTC and their corresponding size histograms.
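Both applications can be sketched on a labelled voxel image like the one produced by the segmentation step. Assumptions: the label image is non-periodic (PBC already resolved), regions meet through face-adjacent voxels, the parameter values `link_dist`, `bins`, `r_max`, and `threshold` are illustrative choices, `detect_windows` and `pocket_types` are hypothetical names, and SciPy's hierarchical clustering stands in for whatever agglomerative-clustering implementation the authors use.

```python
import numpy as np
from scipy import ndimage
from scipy.cluster.hierarchy import fcluster, linkage


def detect_windows(labels, a, b, link_dist=2.0):
    """Window centres between regions a and b in a non-periodic label image."""
    outer_a = ndimage.binary_dilation(labels == a) & (labels != a)   # outer boundary of region a
    shared = np.argwhere(outer_a & (labels == b)).astype(float)      # ...that lies inside region b
    if len(shared) == 0:
        return []                                                     # regions are not connected
    if len(shared) == 1:
        ids = np.ones(1, dtype=int)
    else:
        # Single-linkage agglomerative clustering splits the shared boundary into separate windows
        ids = fcluster(linkage(shared, method="single"), t=link_dist, criterion="distance")
    return sorted(tuple(shared[ids == k].mean(axis=0).tolist()) for k in np.unique(ids))


def pocket_types(pockets, bins=12, r_max=6.0, threshold=0.3):
    """Cluster pockets by their normalised point-to-centroid distance histograms."""
    hists = []
    for coords in pockets:                                            # (m, 3) voxel coords of one pocket
        coords = np.asarray(coords, dtype=float)
        r = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
        counts, _ = np.histogram(r, bins=bins, range=(0.0, r_max))
        hists.append(counts / counts.sum())
    tree = linkage(np.array(hists), method="average")                 # agglomerative clustering
    return fcluster(tree, t=threshold, criterion="distance")


# Two regions separated by a wall at x = 5 that has two holes -> two windows.
labels = np.zeros((10, 5, 5), dtype=int)
labels[:5] = 1
labels[6:] = 2
labels[5, 1, 1] = labels[5, 3, 3] = 2
print(detect_windows(labels, 1, 2))  # [(5.0, 1.0, 1.0), (5.0, 3.0, 3.0)]

# Two small cubic pockets and one large one -> two pocket types.
small, large = np.argwhere(np.ones((3, 3, 3))), np.argwhere(np.ones((7, 7, 7)))
types = pocket_types([small, small + 10, large])
```

Single linkage suits window detection because one window's boundary voxels form a contiguous patch, so a threshold slightly above the voxel diagonal keeps each patch together while separating distinct windows.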
Concluding remarks
Many other characterization problems can also be solved with this workflow, including breaking down total simulated isotherms into regional isotherms using adsorption snapshots, and comparing the shapes of pockets across different materials and/or molecules using persistent homology. Beyond pure characterization, pore graphs can be annotated with any node (pocket) or edge (connection) attribute (e.g., largest cavity diameter (LCD), window size, energy histogram) and fed into graph ML algorithms to predict complex adsorption properties that are often inaccessible to simpler architectures. This is much more efficient than training convolutional neural networks (CNNs) on cumbersome 3D images of materials. In summary, pore graphs help us understand the cavities in a material better by representing them as a network of interconnected pockets. This reveals underlying similarities between cavities and opens the way to solving otherwise difficult characterization problems and to predicting adsorption properties via machine learning.
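As an illustration of feeding an annotated pore graph to graph ML, the sketch below converts node and edge attributes into arrays and applies one generic mean-aggregation message-passing layer followed by a sum readout. This is a textbook message-passing step written in NumPy, not the authors' model; the dictionary input format, the attribute choices, and the random weights are assumptions, and the graph is assumed to have at least one edge.

```python
import numpy as np


def to_arrays(nodes, edges):
    """nodes: {pocket: [features]}, edges: {(u, v): [features]} -> X, edge_index, E.

    Each undirected edge is stored in both directions; assumes at least one edge."""
    ids = sorted(nodes)
    index = {pid: i for i, pid in enumerate(ids)}
    X = np.array([nodes[pid] for pid in ids], dtype=float)
    src, dst, feats = [], [], []
    for (u, v), f in edges.items():
        for a, b in ((u, v), (v, u)):
            src.append(index[a])
            dst.append(index[b])
            feats.append(f)
    return X, np.array([src, dst]), np.array(feats, dtype=float)


def message_pass(X, edge_index, E, W_self, W_msg):
    """h_i = relu(x_i W_self + mean over neighbours j of [x_j, e_ji] W_msg)."""
    src, dst = edge_index
    msgs = np.concatenate([X[src], E], axis=1) @ W_msg    # one message per directed edge
    agg = np.zeros((X.shape[0], W_msg.shape[1]))
    np.add.at(agg, dst, msgs)                             # sum messages arriving at each node
    deg = np.bincount(dst, minlength=X.shape[0])[:, None]
    return np.maximum(X @ W_self + agg / np.maximum(deg, 1), 0.0)


def graph_embedding(nodes, edges, W_self, W_msg):
    """Sum readout over pockets, invariant to how the pockets are numbered."""
    X, edge_index, E = to_arrays(nodes, edges)
    return message_pass(X, edge_index, E, W_self, W_msg).sum(axis=0)


# Hypothetical annotations: node = [LCD, volume fraction], edge = [window diameter].
nodes = {"large": [13.0, 0.5], "small_1": [5.0, 0.1], "small_2": [5.0, 0.1]}
edges = {("large", "small_1"): [6.5], ("large", "small_2"): [6.5]}
rng = np.random.default_rng(0)
W_self, W_msg = rng.normal(size=(2, 4)), rng.normal(size=(3, 4))
print(graph_embedding(nodes, edges, W_self, W_msg).shape)  # (4,)
```

The sum readout makes the graph-level vector independent of pocket numbering, which matters because the order of regions produced by segmentation is arbitrary.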