(106e) The Open Catalyst Project Dataset | AIChE

Authors 

Ulissi, Z. - Presenter, Carnegie Mellon University
Yoon, J., Carnegie Mellon University
Tran, K., Carnegie Mellon University
Palizhati, A., Carnegie Mellon University
Heras-Domingo, J., Carnegie Mellon University
Das, A., Georgia Tech
Parikh, D., Georgia Tech and Facebook AI Research
Chanussot, L., Facebook AI Research
Goyal, S., Facebook AI Research
Ho, C., Facebook AI Research
Lavril, T., Facebook AI Research
Riviere, M., Facebook AI Research
Zitnick, C. L., Facebook AI Research

The Open Catalyst Project aims to develop new ML methods and models that accelerate catalyst simulation for renewable energy technologies and improve our ability to predict activity and selectivity across catalyst compositions. Achieving this in the short term requires participation from the ML community in solving key challenges in catalysis. One path to that interaction is the development of grand-challenge datasets that are representative of common problems in catalysis, large enough to excite the ML community, and large enough to take advantage of, and encourage advances in, deep learning models. Similar datasets have had a large impact in small-molecule drug discovery, organic photovoltaics, and inorganic crystal structure prediction. We present the first open dataset from this effort, covering thermochemical intermediates across stable multi-metallic and p-block-doped surfaces. The dataset includes full-accuracy DFT calculations across 53 elements and their binary and ternary materials, on various low-index facets. Adsorbates span 56 common reaction intermediates relevant to carbon, oxygen, and nitrogen thermal and electrochemical reactions. Off-equilibrium structures are also generated and included to aid in the design and fitting of machine learning force fields. Collectively, this is the largest systematic dataset bridging organic and inorganic chemistry, and it will enable a new generation of catalyst structure/property relationships. We will discuss fixed train/test splits that represent common chemical challenges, along with an open challenge website, both intended to encourage competition and buy-in from the ML community.
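
As a rough illustration of what a fixed split "representing a chemical challenge" could look like, the sketch below holds out every calculation involving a chosen adsorbate, so the test set asks a model to generalize to an intermediate it never saw in training. The abstract does not describe the actual split procedure; the record fields, the toy entries, and the function name `split_by_held_out` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: one entry per adsorbate/surface calculation.
Record = Dict[str, str]


def split_by_held_out(
    records: List[Record], key: str, held_out: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Deterministically split records by holding out whole categories.

    Every record whose `key` value is in `held_out` goes to the test set;
    everything else goes to the training set. No randomness is involved,
    so the split is fixed and reproducible.
    """
    train = [r for r in records if r[key] not in held_out]
    test = [r for r in records if r[key] in held_out]
    return train, test


# Toy data for illustration only (not taken from the dataset).
records = [
    {"surface": "Cu(111)", "adsorbate": "*CO"},
    {"surface": "Cu(111)", "adsorbate": "*OH"},
    {"surface": "PtNi(100)", "adsorbate": "*CO"},
    {"surface": "PtNi(100)", "adsorbate": "*NH2"},
]

# Hold out one adsorbate entirely to probe generalization to unseen intermediates.
train, test = split_by_held_out(records, key="adsorbate", held_out={"*NH2"})
```

The same helper could hold out a surface or an element combination instead by passing a different `key`, which is one way a single dataset might support several distinct generalization tests.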