Molecular Simulations, Neural Networks, and Active Learning for Molecular Design | AIChE

Research Interests:

Molecular dynamics (MD) simulations are powerful tools capable of predicting important physical quantities ranging from the viscosity of liquids to the strength of drug binding while simultaneously yielding a full molecular picture of the phenomena of interest. In the past decade, the advent of GPU computing has resulted in dramatic improvements in the computational speed of MD simulations, making it possible to study and predict the behavior of larger, more complex systems. Concurrently, the field of deep learning (DL) has experienced a renaissance, with neural networks being successfully employed for a variety of problems including reading text, classifying images, and, most recently, folding proteins. Thus, a new field is emerging that combines the wealth of data arising from rapid MD simulations with the DL tools needed to mine these data. My research group will explore the intersection between molecular simulations, statistical physics, and machine learning to develop integrated MD/DL workflows capable of exploring chemical space efficiently and thoroughly to design novel compounds and materials with optimized properties.

PhD Research (2016-2020):

I carried out my PhD research at RPI with Steve Cramer and Shekhar Garde and at Lawrence Livermore National Laboratory (LLNL) where I was mentored by Ed Lau through the Advanced Simulation and Computation Graduate Fellowship Program. My research focused on understanding how water mediates complex, multimodal interactions in the context of protein chromatography using MD simulations. In particular, we were interested in understanding why a series of structurally similar, commercially available chromatography ligands behaved differently when separating proteins. In collaboration with industrial researchers at Merck and Bio-Rad Laboratories, we explored this question by performing extensive simulations of multimodal chromatographic surfaces and found that interactions between neighboring chromatography ligands drive the self-assembly of patterned surfaces containing large hydrophobic and charged patches. Moreover, we discovered that the length scale and distribution of these patches can be controlled by ligand chemistry and solution conditions. We used these insights to develop a Monte Carlo model to rapidly evaluate pattern formation for new multimodal ligands as well as to test hypotheses about the fundamental nature of these self-assembly processes.

PostDoc Research (2020-2021):

I am currently working as a postdoctoral researcher at MIT with Klavs Jensen focusing on developing generative deep learning models to discover optimized compounds for a variety of molecular design problems including designing novel dyes, enantioselective catalysts, and soluble drugs. We have recently developed a generative modeling workflow that is capable of producing novel molecules that are optimized with respect to multiple objectives or constraints. Through collaborations with Dow Chemical Company, the Machine Learning and Pharmaceutical Discovery and Synthesis Consortium, and other groups in the MIT Chemical Engineering and Computer Science Departments, I have been able to design molecules with optimized properties that can be synthesized and tested in an experimental setting. In particular, I am working to combine our generative model with state-of-the-art property prediction models, uncertainty characterization approaches, retrosynthetic pathway builders, and active learning techniques in order to integrate it into a closed-loop, experimental workflow.

Future Directions:

My unique combination of research experiences has strategically positioned me to build a group that focuses on developing integrated molecular simulations and machine learning workflows to solve important challenges in molecular design. My group will explore the following research directions:

  1. Characterizing Pattern Formation on Functionalized Surfaces using Simulations and Machine Learning

Recent findings from my PhD work showed that tunable patterns can be formed on surfaces functionalized with small, multimodal ligands. These tunable patterns can be extremely valuable for developing surfaces with desirable properties ranging from anti-fouling to selective adsorption. My group will develop an integrated machine learning and simulation workflow to efficiently explore the surface pattern design space. We will additionally explore approaches for generative modeling to achieve a reversible mapping between ligand structure and the resulting self-assembled surface pattern. Finally, we will work with experimental collaborators to validate the observed surface properties.

  2. Capturing the Context-Dependence of Hydrophobicity using Machine-Learned Molecular Neighborhood Fingerprints

While macroscopic hydrophobicity can be measured using a simple contact angle, molecular-scale hydrophobicity is context dependent and challenging to characterize, typically requiring expensive enhanced sampling MD simulations. Despite this, molecular-scale hydrophobicity plays a key role in governing important phenomena ranging from the behavior of colloids to protein folding. Guided by my expertise in water-mediated interactions, my group will develop a new deep learning architecture to represent molecular neighborhoods and to predict quantities from them. We will focus on solving the challenge of predicting context-dependent hydrophobicity near surfaces and proteins, a problem of key importance in many biophysical systems.

  3. Breaking Through the Accuracy-Cost Tradeoff in Molecular Simulations using Active Learning and Machine Learning

While molecular simulations are versatile tools that can be applied to quantify a range of properties, there exists a tradeoff between simulation cost and accuracy. Some research groups are addressing this by training machine learning models to predict simulation outcomes, but this strategy requires generating vast amounts of training data up front, which limits the time and length scales and the chemical complexity of the problems that can be solved. To address this limitation, my group will develop an active learning workflow capable of performing simulations at multiple levels of fidelity in an automated fashion to obtain highly accurate machine learning models with minimal computational expense. This research will build on recent advances in transfer learning and uncertainty quantification to enable the combination of data from multiple accuracy levels. While we will focus our attention on developing these workflows for applications using molecular simulations, the resulting machine learning and active learning tools will also be well suited to high-throughput experimentation platforms.
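
As a rough illustration of the general idea behind uncertainty-guided active learning, and not of the proposed workflow itself, the sketch below labels only the candidates on which a small committee of models disagrees most. Everything in it is an assumption made for illustration: `expensive_simulation` is a hypothetical stand-in for a costly simulation, the surrogate is a toy nearest-neighbor regressor fit to bootstrap resamples (a query-by-committee scheme), and only a single level of fidelity is considered.

```python
import math
import random
import statistics


def expensive_simulation(x):
    """Hypothetical stand-in for a costly simulation (illustrative only)."""
    return math.sin(3.0 * x) + 0.5 * x


def knn_predict(train, x, k=2):
    """Toy surrogate: average the labels of the k nearest labeled points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def active_learning(candidates, n_initial=4, n_queries=8, n_models=10, seed=0):
    """Label the candidates on which a bootstrap committee disagrees most."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)

    # Start from a small random set of labeled points.
    labeled = [(x, expensive_simulation(x)) for x in pool[:n_initial]]
    pool = pool[n_initial:]

    for _ in range(n_queries):
        # Each committee member sees a bootstrap resample of the labeled data.
        committee = [
            [rng.choice(labeled) for _ in labeled] for _ in range(n_models)
        ]

        def disagreement(x):
            # Variance across committee predictions serves as the uncertainty.
            return statistics.pvariance([knn_predict(m, x) for m in committee])

        # Run the expensive calculation only where the committee is least sure.
        query = max(pool, key=disagreement)
        pool.remove(query)
        labeled.append((query, expensive_simulation(query)))

    return labeled, pool


if __name__ == "__main__":
    grid = [i / 49 * 2.0 for i in range(50)]
    labeled, remaining = active_learning(grid)
    print(f"labeled {len(labeled)} of {len(grid)} candidates")
```

In a multi-fidelity setting, the same loop would also have to decide which level of theory to run for each query, a choice this sketch does not attempt to model.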

Teaching Interests:

ChemE Core Courses: Thermodynamics, Separations

ChemE Electives: Machine Learning (focusing on ChemE applications), Statistical Mechanics, Molecular Modeling

I have been fortunate enough to mentor a variety of students at the high school, undergraduate, and graduate levels through the research process. Helping students grow and develop their passion for research to the point where they are excited to work independently and propose new ideas has been extremely rewarding. In addition to mentoring students, I have tutored in undergraduate Fluid Mechanics and guest-lectured in Chromatographic Separations (an elective course at RPI). Most recently, I have been awarded a Communication Fellowship through the MIT Communication Lab. As a fellow, I am currently organizing workshops and mentoring students and postdocs in one-on-one settings to help them improve their communication skills.