(4ef) Integrate Machine Learning in Automated Quantum Chemistry Calculation Workflows: Towards Faster and More Accurate Chemical Discovery | AIChE

Machine learning (ML) has begun to accelerate chemical discovery by providing the gains in efficiency needed to overcome the combinatorial challenge of computational materials design. A large, high-fidelity dataset lies at the center of successful ML model training and is thus vital for both forward and inverse ML-accelerated chemical discovery. However, current automated quantum chemistry (QC) calculation workflows, with density functional theory (DFT) as the workhorse, lead to many attempted calculations that are doomed to fail, and they introduce bias and inaccuracy into the training data for systems that may lie outside the domain of applicability of DFT. Such systems include many compelling functional materials and catalytic processes that are difficult because of their complex electronic structure, such as those involving strained chemical bonds, open-shell radicals and diradicals, or metal–organic bonds to open-shell transition-metal centers. We address these challenges of computational efficiency and accuracy by integrating ML approaches into conventional DFT-based QC workflows. We built two types of classifiers to predict the likelihood of calculation success: (1) a static model applied prior to calculations and (2) a dynamic model that monitors calculations on the fly. The static classifier is a near-zero-cost model that rapidly filters out the candidate calculations most likely to fail, while the dynamic model monitors an already running calculation and terminates it if it is predicted to fail with high confidence. Together, these classifiers save half of the computational resources. We also developed multiple types of ML classifiers to predict the presence of strong static correlation, which is usually a sign that a system lies outside the domain of applicability of DFT. Our models require only calculations at DFT cost and can classify which systems in a dataset will require more expensive but more accurate wavefunction theory calculations, leading to high overall fidelity across the entire dataset.
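As a rough illustration of how the static and dynamic classifiers described above might sit in a workflow, the following minimal Python sketch may help. It is not the authors' implementation: the function names (`static_filter`, `run_with_monitor`), the thresholds, and the toy stand-in "classifiers" are all hypothetical, and real models would be trained on calculation data instead.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def static_filter(
    candidates: Iterable[dict],
    predict_failure: Callable[[dict], float],
    threshold: float = 0.5,
) -> List[dict]:
    """Keep candidates whose predicted failure probability is below threshold.

    `predict_failure` stands in for a trained static classifier (hypothetical).
    """
    return [c for c in candidates if predict_failure(c) < threshold]


def run_with_monitor(
    steps: Iterable[float],
    predict_failure: Callable[[Sequence[float]], float],
    confidence: float = 0.9,
) -> Tuple[bool, int]:
    """Consume calculation steps, terminating early on a confident failure call.

    `steps` stands in for per-step output of a running calculation (here, toy
    energies). Returns (completed, number_of_steps_run).
    """
    history: List[float] = []
    for step in steps:
        history.append(step)
        if predict_failure(history) >= confidence:
            return False, len(history)  # terminated early
    return True, len(history)


# Toy stand-ins for trained models (assumptions, for illustration only).
def toy_static(candidate: dict) -> float:
    return candidate["risk"]


def toy_dynamic(history: Sequence[float]) -> float:
    # Flag failure when the last three energies are strictly increasing.
    if len(history) < 3:
        return 0.0
    a, b, c = history[-3:]
    return 1.0 if a < b < c else 0.0


candidates = [{"name": "A", "risk": 0.1}, {"name": "B", "risk": 0.8}]
kept = static_filter(candidates, toy_static)

diverging = [-10.0, -10.5, -10.4, -10.2, -10.1, -10.6]
converging = [-10.0, -10.5, -10.6, -10.61]
diverging_result = run_with_monitor(diverging, toy_dynamic)
converging_result = run_with_monitor(converging, toy_dynamic)
```

In this toy run, candidate B is filtered out before any calculation starts, and the diverging job is stopped after four steps while the converging job runs to completion.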
Because electronic structure information is encoded in the inputs, our models are readily transferable to larger systems and to systems with unseen elements. Lastly, we investigated the bias of data generated with a single density functional approximation (DFA) and its influence on ML model training and on the lead complexes identified during chemical discovery. By requiring consensus among the ML-predicted properties, we improve the correspondence of these computational lead compounds with literature-mined, experimental compounds over the single-DFA approach typically employed. All these classifier models represent first efforts toward autonomous workflows that move past the need for expert determination of the robustness of DFT-based materials discoveries.
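One way to picture the consensus requirement mentioned above is the short Python sketch below. It assumes, purely for illustration, that each candidate has one ML-predicted property value per DFA and that a lead must fall inside a target window for a chosen fraction of DFAs. The function name `consensus_leads`, the window, the agreement fraction, and the placeholder values are hypothetical and not taken from the work itself.

```python
from typing import Dict, List


def consensus_leads(
    predictions: Dict[str, Dict[str, float]],
    lo: float,
    hi: float,
    min_agree: float = 1.0,
) -> List[str]:
    """Return candidates whose predicted property lies in [lo, hi] for at
    least `min_agree` (a fraction) of the DFAs.

    `predictions` maps candidate name -> {DFA label: predicted value}.
    """
    leads = []
    for name, by_dfa in predictions.items():
        votes = [lo <= value <= hi for value in by_dfa.values()]
        if votes and sum(votes) / len(votes) >= min_agree:
            leads.append(name)
    return leads


# Placeholder numbers for illustration only (not real data).
predictions = {
    "complex_1": {"dfa_a": 0.2, "dfa_b": -0.1, "dfa_c": 0.4},
    "complex_2": {"dfa_a": 0.3, "dfa_b": 2.5, "dfa_c": 0.1},
}

strict = consensus_leads(predictions, lo=-0.5, hi=0.5)
majority = consensus_leads(predictions, lo=-0.5, hi=0.5, min_agree=0.6)
single_dfa = [n for n, p in predictions.items() if -0.5 <= p["dfa_a"] <= 0.5]
```

Here a single-DFA screen using only `dfa_a` would accept both candidates, whereas full consensus keeps only `complex_1`, since the `dfa_b` prediction for `complex_2` falls outside the window.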

Research Interests: computational chemistry, machine learning, automated computation workflow, chemical discovery