(285e) A Database with Automated Quantum Chemistry Calculations and Machine Learning for Functional Transition Metal Complex Discovery | AIChE

Authors 

Taylor, M., Massachusetts Institute of Technology
Harper, D., Massachusetts Institute of Technology
Nandy, A., Massachusetts Institute of Technology
Arunachalam, N., Massachusetts Institute of Technology
Liu, F., Stanford University

Large-scale databases of density functional theory (DFT) properties have proven invaluable in accelerating materials discovery, with notable examples for both solid-state materials and small organic molecules. A comparable database of transition metal (TM) complexes with DFT properties is absent, however, because of unique challenges in structure curation and in the domain of applicability of DFT for their properties. To address these challenges, we have developed the open-source toolkits molSimplify and AutomaticDesign, which incorporate both chemical rules and data-driven machine-learning (ML) models. Here, we highlight the first large-scale computational database of TM complexes, containing over 40,000 attempted DFT geometry relaxations of octahedral complexes. From this database, we select a subset of over 24,000 complexes with higher-quality geometry and electronic structure. For these complexes, we calculated properties such as solubility, spin-splitting energy, redox potential, frontier orbital energies, and the dependence of energetics on the choice of functional. To extend the database continuously over time, we further built a workflow that automatically performs DFT geometry relaxations on experimentally determined complexes (i.e., from the Cambridge Structural Database) selected by diversity search, running whenever supercomputing resources are idle. We also describe our automatic ML-model retraining module, which updates all of our ML models when new results become available. With this automated workflow, we can readily monitor the evolution and quality of our ML models as the database grows. We anticipate that our database, enhanced by automated quantum chemistry calculations and ML-model retraining, will be a useful tool for accelerating the discovery of functional transition metal complexes.
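
As an illustration only, the diversity-search selection of candidate complexes mentioned above could be sketched as greedy farthest-point (max-min) sampling over descriptor vectors. The abstract does not state which algorithm, descriptors, or distance metric the workflow uses, so the function name `farthest_point_selection`, the toy 2-D descriptors, and the Euclidean metric below are all assumptions rather than the authors' method.

```python
import math

def farthest_point_selection(features, n_select, start=0):
    """Greedy max-min selection: repeatedly pick the candidate that is
    farthest from everything selected so far (hypothetical helper)."""
    if not features or n_select <= 0:
        return []
    n_select = min(n_select, len(features))
    selected = [start]
    # Distance from each candidate to its nearest already-selected point.
    min_dist = [math.dist(f, features[start]) for f in features]
    while len(selected) < n_select:
        # The next pick maximizes the distance to the current selection.
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))
    return selected

# Assumed toy descriptors for five candidate complexes (not real data).
candidates = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
picked = farthest_point_selection(candidates, 3)
print(picked)
```

In this sketch the near-duplicate candidates (indices 1 and 2) are skipped in favor of points that cover the descriptor space more evenly, which is the general motivation for selecting by diversity before spending compute on geometry relaxations.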