(106e) The Open Catalyst Project Dataset
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Applications of Data Science to Molecules and Materials
Applications of Data Science in Catalysis and Reaction Engineering II
Monday, November 16, 2020 - 9:00am to 9:15am
The Open Catalyst Project aims to develop new ML methods and models to accelerate the catalyst simulation process for renewable energy technologies and improve our ability to predict activity/selectivity across catalyst composition. To achieve that in the short term we need participation from the ML community in solving key challenges in catalysis. One path to interaction is the development of grand challenge datasets that are representative of common challenges in catalysis, large enough to excite the ML community, and large enough to take advantage of and encourage advances in deep learning models. Similar datasets have had a large impact in small molecule drug discovery, organic photovoltaics, and inorganic crystal structure prediction. We present the first open dataset from this effort on thermochemical intermediates across stable multi-metallic and p-block doped surfaces. This dataset includes full-accuracy DFT calculations across 53 elements and their binary/ternary materials, various low-index facets. Adsorbates span 56 common reaction intermediates with relevance to carbon, oxygen, and nitrogen thermal and electrochemical reactions. Off-equilibrium structures are also generated and included to aid in machine learning force field design and fitting. Collectively, this dataset represents the largest systematic dataset that bridges organic and inorganic chemistry and will enable a new generation of catalyst structure/property relationships. Fixed train/test splits that represent common chemical challenges and an open challenge website will be discussed to encourage competition and buy-in from the ML community.