(346ay) How Can Machine Learning Accelerate the Sampling and Interpretation of Molecular Dynamics Simulations? | AIChE

Authors 

Feng, J. - Presenter, University of Illinois at Urbana-Champaign
Selvam, B., University of Illinois at Urbana-Champaign
Shukla, D., Massachusetts Institute of Technology
Molecular dynamics simulations offer high spatial and temporal resolution for deciphering the functional mechanisms of membrane proteins. Major improvements in hardware, algorithms, and computational resources have now made it possible to investigate larger and more realistic protein systems. However, the capabilities of molecular dynamics simulations are still severely limited by three major problems. First, only a few crystal structures are available for membrane proteins, and these structures are mostly restricted to key metastable states that the protein adopts. Second, membrane protein simulations require large computational resources, on the order of several hundred thousand hours on a single Nvidia GTX1080 GPU using a popular molecular simulation package such as NAMD. Finally, it is difficult to extract valuable insights from the resulting high-dimensional simulation data (several terabytes).

In this study, we aim to develop efficient computational tools to accelerate the sampling and interpretation of molecular dynamics simulations. To address the absence of structural information, we developed a machine-learning-based algorithm (FingerprintContacts) that quickly predicts multiple protein structures by combining agglomerative clustering with co-evolutionary information. We have demonstrated the capabilities of FingerprintContacts on eight proteins with varying conformational motions. To enhance sampling efficiency, we propose that evolutionary couplings can be used as reaction coordinates to efficiently guide the sampling of complex conformational free energy landscapes. To interpret the resulting high-dimensional simulation data, we developed a genetic-algorithm-based method that automatically selects features for dimensionality reduction. Integrating the developed algorithms with all-atom molecular dynamics simulations has allowed us to characterize long-timescale conformational transitions and the complete substrate translocation cycle of two nitrogen transporters. This work aims to establish efficient computational frameworks for understanding long-timescale biophysical processes.
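
As a rough illustration of genetic-algorithm feature selection, the sketch below evolves boolean masks over candidate features (for example, residue-residue distances). The abstract does not describe the encoding, operators, or scoring used in the actual method, so everything here is an assumption: the names `fitness` and `select_features` are hypothetical, the score is a toy stand-in (total variance of the selected features minus a per-feature penalty), and the data are synthetic.

```python
import random
from statistics import pvariance


def fitness(mask, data, penalty=0.5):
    """Toy score (an assumption, not the authors' objective):
    total variance of the selected features minus a size penalty."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return float("-inf")  # an empty feature set is never acceptable
    total_var = sum(pvariance([row[i] for row in data]) for i in cols)
    return total_var - penalty * len(cols)


def select_features(data, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve boolean feature masks; return the indices kept by the best mask."""
    rng = random.Random(seed)
    n = len(data[0])  # number of candidate features (must be at least 2)
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=lambda m: fitness(m, data), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation on each gene with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=lambda m: fitness(m, data))
    return [i for i, keep in enumerate(best) if keep]


# Synthetic data: 50 frames, 6 features; only features 0 and 3 vary strongly.
data_rng = random.Random(1)
data = [
    [data_rng.gauss(0, 2.0 if j in (0, 3) else 0.1) for j in range(6)]
    for _ in range(50)
]
selected = select_features(data)
print(selected)
```

In a real workflow the toy score would presumably be replaced by a measure of how well the reduced representation captures slow dynamics, but that choice is not specified in the abstract.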