(147ac) Empowering Molecular Simulations with Machine Learning to Understand and Engineer Molecular Systems | AIChE

Authors 

Dasetty, S. - Presenter, The University of Chicago

Research Interests

The central theme of my interests is developing and applying inverse computational methods to solve (bio)molecular and materials science problems. In recent work, I have focused on developing and applying data-driven methods that combine molecular simulations, enhanced sampling, and machine learning to (a) inversely design self-assembling materials, (b) design molecular probes for detecting forever chemicals in water, (c) build robust multiscale models for understanding complex biological processes, and (d) develop scientific software that enables enhanced sampling on GPUs.

Abstract

Empowering conventional methods with machine learning enables robust and efficient data-driven solutions to problems such as the discovery of novel sustainable materials, catalyst optimization, and the development of effective therapeutics. In my presentation, I will discuss our work on integrating classical molecular simulations and machine learning to (a) engineer molecular probes for detecting forever chemicals in water and (b) map transient structures of multimolecular systems that are not accessible via conventional methods.

Per- and polyfluoroalkyl substances (PFAS), highly persistent chemicals often referred to as “forever chemicals,” can cause detrimental effects in humans such as high cholesterol levels, birth defects, and cancer. Robust and efficient methods to detect PFAS in water are therefore highly desirable. To this end, we developed a hybrid machine learning and enhanced sampling approach to discover effective molecular probes for detecting PFAS in water within a vast chemical search space. This approach integrates molecular simulations, metadynamics, deep representation learning of probes, and multi-objective Bayesian optimization. We demonstrate our approach by engineering linear probes that selectively bind a harmful forever chemical, perfluorooctanesulfonic acid (PFOS), in the presence of an interferent, sodium dodecyl sulfate (SDS). By searching just ~6% of the search space, we identified the optimal sensitivity (KbPFOS = 177.4 ± 12.7) and selectivity (KbPFOS/KbSDS = 4.6 ± 1.7) that linear probes can achieve, thereby benchmarking their potential as molecular probes. In my presentation, I will discuss the development of our computational framework and present the discovered optimal linear probes as well as their design rules.
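
Because sensitivity and selectivity are optimized together, a multi-objective search typically reports a set of non-dominated (Pareto-optimal) candidates rather than a single best one. The sketch below illustrates only that filtering step, using the Python standard library. It is not the framework described in this abstract: the `Candidate` type, the probe labels, and all numbers are hypothetical and made up for illustration, and the surrogate model and acquisition function of Bayesian optimization are omitted.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str           # hypothetical probe label
    sensitivity: float  # stand-in for a PFOS binding constant (higher is better)
    selectivity: float  # stand-in for a PFOS/SDS binding ratio (higher is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a.sensitivity >= b.sensitivity and a.selectivity >= b.selectivity
    better = a.sensitivity > b.sensitivity or a.selectivity > b.selectivity
    return no_worse and better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up values for illustration only; these are not results from the study.
probes = [
    Candidate("probe_A", 100.0, 2.0),
    Candidate("probe_B", 150.0, 1.5),
    Candidate("probe_C", 90.0, 1.8),   # dominated by probe_A
    Candidate("probe_D", 60.0, 4.0),
    Candidate("probe_E", 150.0, 1.0),  # dominated by probe_B
]

front = pareto_front(probes)
print([c.name for c in front])
```

In this toy set, the probes that survive the filter each represent a different trade-off between the two objectives; choosing among them would require a preference that the filter itself does not encode.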

In addition, I will present a multiscale modeling method to predict transient structures of complex multimolecular systems for which sampling via conventional methods is computationally prohibitive. Modeling these rare events provides another example of how machine learning can empower molecular simulations. Our method integrates nonlinear manifold learning techniques, deep generative modeling methods, and biased molecular simulations. I will demonstrate the capability of our method by predicting transient structures of integrin (a large heterodimeric protein). Using diffusion maps, we learn a unified low-dimensional embedding of the gross structural relationships of integrin in different metastable states, and we pair this with an inverse mapping from the learnt low-dimensional embedding to the high-dimensional molecular state using a generative adversarial network (GAN). Finally, we will show the transient structures of integrin hallucinated by the trained GAN and discuss their biophysical significance.
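
As a rough illustration of the diffusion-map step mentioned above, the sketch below computes the leading nontrivial diffusion coordinate for a handful of toy 2D points that stand in for molecular configurations in two metastable states. It is a minimal sketch under stated assumptions, not the method presented in this abstract: the function name `diffusion_coordinate`, the Gaussian kernel bandwidth `epsilon`, and the toy data are all hypothetical, the eigenvector is found by simple power iteration using only the standard library, and the GAN-based inverse mapping is not shown.

```python
import math


def diffusion_coordinate(points, epsilon=1.0, n_iter=200):
    """Return (eigenvalue, coordinates) of the leading nontrivial diffusion mode."""
    n = len(points)
    # Gaussian kernel on pairwise squared Euclidean distances.
    kernel = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / epsilon)
               for q in points] for p in points]
    degree = [sum(row) for row in kernel]
    # S = D^-1/2 K D^-1/2 is symmetric and shares its eigenvalues with the
    # row-normalized Markov matrix P = D^-1 K.
    sym = [[kernel[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
           for i in range(n)]
    # The trivial top eigenvector of S (eigenvalue 1) is proportional to sqrt(degree).
    total = math.sqrt(sum(degree))
    top = [math.sqrt(d) / total for d in degree]

    def deflate(v):
        # Remove the component along the trivial eigenvector.
        proj = sum(a * b for a, b in zip(v, top))
        return [a - proj * b for a, b in zip(v, top)]

    def normalize(v):
        length = math.sqrt(sum(a * a for a in v))
        return [a / length for a in v]

    def apply(v):
        return [sum(sym[i][j] * v[j] for j in range(n)) for i in range(n)]

    # Power iteration restricted to the subspace orthogonal to the trivial mode.
    vec = normalize(deflate([float(i) for i in range(n)]))
    for _ in range(n_iter):
        vec = normalize(deflate(apply(vec)))
    eigenvalue = sum(a * b for a, b in zip(vec, apply(vec)))  # Rayleigh quotient
    # Convert the eigenvector of S into a right eigenvector of P.
    coords = [v / math.sqrt(d) for v, d in zip(vec, degree)]
    return eigenvalue, coords


# Toy data: two tight clusters standing in for two metastable states.
state_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
state_b = [(3.0, 3.0), (3.1, 3.0), (3.0, 3.1), (3.1, 3.1)]
eigenvalue, coords = diffusion_coordinate(state_a + state_b)
print(eigenvalue)
print(coords)
```

For two well-separated clusters, the leading nontrivial coordinate takes opposite signs on the two clusters, which is the sense in which such an embedding organizes configurations by metastable state. A production workflow would more likely rely on a numerical linear algebra library and a structure-aware distance between configurations.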