(4hf) Machine Learning and Computational Tools for Molecular Properties and Reaction Systems | AIChE

Authors 

McGill, C. J. - Presenter, North Carolina State University
Research Interests

Machine learning (ML) is a powerful tool that has proven effective at solving otherwise intractable problems in many scientific domains. Recent applications of ML in chemistry have yielded significant results by extracting complex features from representations of the molecular graph. I will develop systems that improve the ML treatment of chemical properties by taking advantage of the unique context of molecular features. Specifically, I will incorporate boosting (combining multiple sequential ML models) into broadly accessible machine learning software. In boosting, each model stage draws on the input features to reduce the residual errors left by the previous stages. In many contexts the features available at each stage are the same, but this is not so with the molecular graph. Chemical systems are unusually well suited to boosting because the complexity of molecular graph features means that new information (or new combinations of information) can be used at each stage. Boosting in chemical systems therefore offers the opportunity to build a model that incorporates all of the available feature information instead of relying on tools that curate the best subset of features. This approach will be applied both to important properties for which reliable models are still being developed (such as pKa across the whole pH range) and to properties for which models already exist but may still be improved.
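
The residual-fitting idea behind boosting can be sketched in a few lines. This is a minimal illustration, not the software described above: it assumes plain least-squares linear stages and two hypothetical descriptor blocks ("atom-level" and "bond-level" features, generated at random) standing in for different information drawn from the molecular graph. Practical boosting usually uses many weaker learners (for example shallow trees) and a learning rate below 1.

```python
import numpy as np

def _design(X):
    # Append an intercept column so each stage can also shift the mean.
    return np.column_stack([X, np.ones(len(X))])

def boost(feature_blocks, y, learning_rate=1.0):
    """Sequentially fit one linear stage per feature block.

    Each stage is fit by least squares to the residual left by all previous
    stages, so later stages can exploit information earlier ones ignored.
    """
    prediction = np.zeros(len(y))
    stages, rmse_history = [], []
    for X in feature_blocks:
        residual = y - prediction
        coef, *_ = np.linalg.lstsq(_design(X), residual, rcond=None)
        prediction = prediction + learning_rate * (_design(X) @ coef)
        stages.append(coef)
        rmse_history.append(float(np.sqrt(np.mean((y - prediction) ** 2))))
    return stages, prediction, rmse_history

def predict(stages, feature_blocks, learning_rate=1.0):
    """Apply fitted stages to data (blocks in the same order as in training)."""
    total = np.zeros(len(feature_blocks[0]))
    for coef, X in zip(stages, feature_blocks):
        total += learning_rate * (_design(X) @ coef)
    return total

# Hypothetical toy data: two descriptor blocks contributing to one property.
rng = np.random.default_rng(0)
atom_feats = rng.normal(size=(200, 3))
bond_feats = rng.normal(size=(200, 2))
y = (atom_feats @ np.array([1.0, -2.0, 0.5])
     + bond_feats @ np.array([3.0, 1.0])
     + 0.1 * rng.normal(size=200))

stages, pred, history = boost([atom_feats, bond_feats], y)
print(history)  # training RMSE after each stage
```

Because each stage is a least-squares fit to the current residual, the training error cannot increase from one stage to the next; the second stage picks up the bond-level signal that the first stage had no access to.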

I will also develop systems and tools to advance uncertainty quantification in ML applications to chemical systems. ML models often lack context and interpretability, giving the user black-box predictions. Even when whole-model measures of performance and error are reported, the uncertainty of individual predictions can vary widely without any indication to the user. Several uncertainty quantification methods already exist, and all of them can be calibrated to perform appropriately on average over a whole dataset. However, this average behavior masks poor resolution when subsets of the data are considered. I will develop and apply metrics for scoring uncertainty quantification methods in order to guide improvements to these calculations. Additionally, as part of uncertainty quantification, I will develop techniques to separate and quantify the different sources of error (noise, model bias, and model variance); conflating these sources is a source of inconsistency that existing techniques do not address. In tandem with developing ML architectures for chemistry and uncertainty prediction, I will incorporate these advances into user-accessible software. A system that non-experts in ML can use productively will help scientists in a wide range of fields benefit from these tools.
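
One simple way to see how average calibration can hide subset-level problems is to compare the mean squared z-score (prediction error divided by the predicted standard deviation, squared) over the whole dataset with the same statistic inside subsets. The sketch below assumes Gaussian-style uncertainties, uses constructed errors and hypothetical subset labels, and introduces a hypothetical `worst_subset_gap` score; it is one illustrative diagnostic, not the scoring metrics proposed above.

```python
import numpy as np

def mean_squared_z(errors, sigmas):
    """Average squared z-score; close to 1 for Gaussian errors with calibrated sigmas."""
    z = np.asarray(errors) / np.asarray(sigmas)
    return float(np.mean(z ** 2))

def subset_calibration(errors, sigmas, groups):
    """Mean squared z-score computed separately within each subset label."""
    errors, sigmas, groups = map(np.asarray, (errors, sigmas, groups))
    return {g: mean_squared_z(errors[groups == g], sigmas[groups == g])
            for g in np.unique(groups)}

def worst_subset_gap(errors, sigmas, groups):
    """Hypothetical score: largest |mean z^2 - 1| over subsets (0 is ideal)."""
    return max(abs(v - 1.0) for v in subset_calibration(errors, sigmas, groups).values())

# Constructed example: the model reports sigma = 1 everywhere, but its real
# errors are small for one (hypothetical) compound class and large for another.
n = 500
groups = np.array(["aromatic"] * n + ["aliphatic"] * n)
errors = np.concatenate([np.tile([0.5, -0.5], n // 2),
                         np.tile([np.sqrt(1.75), -np.sqrt(1.75)], n // 2)])
sigmas = np.ones(2 * n)

print(mean_squared_z(errors, sigmas))              # ~1.0: calibrated on average
print(subset_calibration(errors, sigmas, groups))  # ~1.75 (aliphatic) vs ~0.25 (aromatic)
print(worst_subset_gap(errors, sigmas, groups))    # ~0.75
```

The whole-dataset statistic looks ideal, while the per-subset values show the uncertainty is too large for one class and too small for the other.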

In addition to ML tools, I will study ways to improve the treatment of complex chemical mechanisms. Many of the most interesting reaction systems comprise very large sets of reactions. Tools such as Reaction Mechanism Generator (RMG) allow these systems to be generated procedurally and studied. Large mechanisms frequently contain subsets of reactions that are semi-equilibrated; the resulting spread of time scales makes the systems stiff and expensive to solve. Existing tools de-stiffen these systems through quasi-steady-state assumptions and species lumping. Such lumping tools are important for enabling large-system calculations at any level of available computational resources. However, these techniques are often non-invertible and difficult to interpret. I will use time-constant analysis to perform an analogous lumping function, but with greater interpretability for the user, dynamic time-constant setting, and complete invertibility. Dynamic time-constant adjustment will allow the lumping structure to change appropriately with temperature and reactant concentrations across different reaction conditions and, in some cases, over the course of a single simulation.
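
A minimal sketch of the time-scale reasoning, assuming the simplest possible motif: a fast reversible pair A ⇌ B (rate constants k1, k2) draining slowly through B → C (k3). It reads the time constants off the eigenvalues of the system Jacobian, lumps A and B into a pool L = A + B only when the fast and slow times are well separated, and keeps the equilibrium split so the lump can be inverted. The function names and the factor-of-100 separation threshold are hypothetical; this is a textbook timescale-separation argument, not the method proposed above.

```python
import numpy as np

def time_constants(jacobian):
    """Characteristic times -1/lambda of the decaying modes, sorted fast to slow."""
    eig = np.linalg.eigvals(jacobian).real
    scale = np.max(np.abs(eig))
    decaying = eig[eig < -1e-9 * scale]  # drop the conserved (zero) mode
    return np.sort(-1.0 / decaying)

def lump_pair(k1, k2, k3, separation=100.0):
    """Lump A and B into L = A + B if A <-> B equilibrates much faster than B -> C."""
    # Jacobian of d[A, B, C]/dt for A <-> B (k1, k2) and B -> C (k3).
    J = np.array([[-k1, k2, 0.0],
                  [k1, -(k2 + k3), 0.0],
                  [0.0, k3, 0.0]])
    tau_fast, tau_slow = time_constants(J)
    if tau_slow / tau_fast < separation:
        return None  # time scales not separated: keep the full mechanism
    K = k1 / k2             # equilibrium constant [B]/[A]
    frac_B = K / (1.0 + K)  # fraction of the pool present as B
    k_eff = k3 * frac_B     # effective first-order rate constant for L -> C
    return {"tau_fast": tau_fast, "tau_slow": tau_slow,
            "frac_B": frac_B, "k_eff": k_eff}

def unlump(L, frac_B):
    """Invert the lump: recover [A] and [B] from the pool via the equilibrium split."""
    return (1.0 - frac_B) * L, frac_B * L

print(lump_pair(1e3, 1e3, 1.0))  # fast A <-> B: lumped, k_eff = 0.5
print(lump_pair(1.0, 1.0, 1.0))  # comparable time scales: None
```

Because the rate constants depend on temperature, re-running the check under new conditions can switch a lump on or off, which is the kind of dynamic behavior described above. The effective rate constant also agrees closely with the slow eigenvalue, a quick consistency check on the lump.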

In addition to handling such systems generally, I will study individual chemical mechanisms, with a focus on organometallic systems. Chemical manufacturing is full of chemistries that are widely used and well studied, yet not understood at the level of a detailed elementary mechanism. These reactions may already be run with great efficiency, but their full potential cannot be assessed without a fuller understanding of the elementary reactions at play. Such understanding allows a practitioner to better tune reagents and conditions, and a complete mechanism could open new pathways in analogous systems or provide solutions for unwanted byproducts. Recent advances in quantum chemistry have opened the way to studying these long-used mechanisms. Multistructural transition state theory allows the treatment of unusual vibrational modes that are difficult to approximate as simple hindered rotors. Solvation effects can now be approximated with more resolution by QM/MM techniques that represent solvent molecules explicitly, in place of implicit polarizable solvent models. Barrierless reactions can be treated with variable reaction coordinate transition state theory. Together, advances in these three areas make it possible to study organometallic mechanisms at the elementary-reaction level with higher quality than ever before.

Teaching Interests

My teaching philosophy centers instruction on students achieving the ultimate goals of the course and the program. This entails considering carefully how the intermediate learning objectives lead to those goals and capabilities, and designing lectures and assessments around them. Framing each component of the class so that students know which goals it leads to helps align the instructor's goals for the class with those of the students. The approach also involves recognizing where students stand in content and skills from other courses, so that appropriate emphasis can be placed on the learning objectives they need to advance. I intend to apply this approach to both undergraduate and graduate courses.

  • Material Balances - The introductory undergraduate course. Underlying the complex analysis of systems in every chemical engineering course is a core set of principles, many of which students first encounter in the introductory material balances course. I believe that a good conceptual foundation laid in material balances helps students develop an engineering frame of mind that will benefit them in their other courses and beyond.

  • Controls - The senior-level undergraduate course. In industry, most interactions between engineers and their chemical systems are mediated by controls technology, making a functional understanding of controls necessary for applying engineering insight from other domains. Controls combines numerical tools and practical guidance in a way that is rewarding to teach. I also see controls as one of the best courses for giving students hands-on exposure to data science and the ever-present challenges of working with real data.

  • Reaction Engineering - The senior-level undergraduate course or the graduate course. Before taking reaction engineering, undergraduate students will have studied the mathematical treatment of physical systems in transport phenomena and, separately, organic chemistry. Reaction engineering, with its study of kinetics and their impact on system design, is an important bridge between these two realms. Similarly, at the graduate level, reaction engineering links mathematical analysis with systems thinking in a uniquely chemical engineering context. It is a course I am passionate about and one that relates to both my research experience and my interests.

  • Machine Learning for Molecular Sciences - An elective graduate course or senior undergraduate elective. Data science and machine learning are increasingly applied in all manner of scientific and industrial contexts. This class will introduce the practical classes of ML models and the data science best practices that students may encounter in the field. Once students are introduced to the basics, the course will focus on ML models applied to chemical systems and on new developments in ML for chemistry.

  • Quantum Chemistry and Kinetics - An elective graduate course or senior undergraduate elective. This course will cover the mathematical underpinnings of quantum mechanical calculations and the practical aspects of carrying out such calculations in available software packages. It will then use the output of these calculations to compute thermochemical properties and reaction rates through transition state theory. Other advanced kinetics topics, such as variational transition state theory and master equation theory, will also be discussed.
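
The final step from a computed barrier to a rate constant in conventional transition state theory is the Eyring equation, k = κ (k_B T / h) exp(-ΔG‡ / RT). The sketch below assumes a unimolecular reaction, a Gibbs energy of activation already obtained from quantum chemical calculations (in J/mol, standard state implied), and a transmission coefficient κ defaulting to 1, i.e. no tunneling correction. The function name is illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K (exact SI value)
H = 6.62607015e-34   # Planck constant, J s (exact SI value)
R = 8.314462618      # molar gas constant, J/(mol K)

def eyring_rate(delta_g_act, temperature, kappa=1.0):
    """Conventional TST rate constant in 1/s for a unimolecular step.

    delta_g_act: Gibbs energy of activation, J/mol.
    temperature: K.
    kappa: transmission coefficient (1.0 means no tunneling correction).
    """
    prefactor = K_B * temperature / H  # attempt frequency k_B T / h, 1/s
    return kappa * prefactor * math.exp(-delta_g_act / (R * temperature))

print(eyring_rate(80e3, 298.15))  # roughly 0.06 1/s for an 80 kJ/mol barrier at 25 C
```

Because the barrier enters exponentially, an error of a few kJ/mol in ΔG‡ changes the rate by a large factor, which is why the higher-level treatments discussed in the course matter.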