(310a) Denoising Diffusion Models Meet Statistical Mechanics for Protein-Protein Docking
AIChE Annual Meeting
2024
Computational Molecular Science and Engineering Forum
Recent Advances in Molecular Simulation Methods II
Tuesday, October 29, 2024 - 12:30pm to 12:42pm
Traditional protein docking approaches comprise two stages: (1) a sampling step that generates an ensemble of candidate docked structures and (2) a ranking step that evaluates the poses from the sampling step. Recent deep learning breakthroughs, particularly likelihood-based models such as denoising diffusion probabilistic models, have been proposed for the rigid-body protein-protein docking task. The main idea behind a diffusion process is to transform the data distribution into a Gaussian prior over a series of time steps and to learn a score function, the gradient of the log probability density, $\nabla_x \log P_t(x)$. The learned score function can then be used to sample from the underlying probability distribution of the data. Using this approach, we have developed a rigid-body docking method, DockDiffusion. DockDiffusion was trained by adding rotational and translational noise to docked protein complexes, which the model then learns to reverse. During inference, the model takes two unbound monomers as input and solves a reverse diffusion process formulated as a stochastic differential equation.
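The training-time corruption described above (rotational and translational noise applied to one docking partner) can be illustrated with a minimal NumPy sketch. This is not DockDiffusion's implementation: the function names, the noise scales `sigma_rot` and `sigma_tr`, and the choice of a Gaussian-distributed angle about a uniformly random axis are all assumptions made for illustration. A proper treatment would draw the rotation from a distribution defined on SO(3).

```python
import numpy as np


def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix for `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def perturb_rigid_body(coords, sigma_rot, sigma_tr, rng):
    """Apply one random rigid-body perturbation to an (N, 3) coordinate array.

    Assumptions (illustrative only): the rotation axis is uniform on the
    sphere, the rotation angle is Gaussian with scale `sigma_rot` (radians),
    and the translation is isotropic Gaussian with scale `sigma_tr`.
    """
    coords = np.asarray(coords, dtype=float)
    axis = rng.normal(size=3)              # normalized inside the helper
    angle = rng.normal(scale=sigma_rot)
    R = rotation_from_axis_angle(axis, angle)
    t = rng.normal(scale=sigma_tr, size=3)
    centroid = coords.mean(axis=0)
    # Rotate about the centroid, then translate the whole body.
    return (coords - centroid) @ R.T + centroid + t


rng = np.random.default_rng(0)
ligand = rng.normal(size=(5, 3))           # placeholder coordinates, not real data
noisy_ligand = perturb_rigid_body(ligand, sigma_rot=0.5, sigma_tr=2.0, rng=rng)
```

Because the perturbation is a rigid-body transform, all intramolecular distances in `noisy_ligand` match those in `ligand`; only the relative pose of the two partners changes, and that pose is what a rigid-body docking model would learn to recover.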
A grand question in deep learning-based protein structure prediction is whether these models have learned any of the thermodynamics associated with protein folding. To approach this question, we estimated the change in exact likelihood, $\log P_0(x) - \log P_T(x) = -\Delta \log P$, of a pre-trained diffusion model for the protein-protein docking task and correlated it with the Rosetta energy function. We draw inspiration from statistical mechanics, where the thermodynamic potential (the Gibbs energy in the case of the isobaric-isothermal ensemble) is proportional to the logarithm of the probability of observing the system in a particular state. Hence, $-\Delta \log P$ in the diffusion-based docking task might be a measure of the binding energy between two protein partners in contact. Exact likelihood estimation from a diffusion-based docking model serves two purposes: (1) it helps rank candidate poses without an external energy function, and (2) it helps interpret whether diffusion-based docking methods have learned generalizable physical principles. We observed a robust correlation between $-\Delta \log P$ and the Rosetta energy function in one test case, whereas in another test case no discernible correlation was detected. These results suggest that the estimated likelihood does not consistently correlate with Rosetta energies and that such models might not yet have "learned" generalizable physics. Answering fundamental questions like why and how deep-learning models work for protein design and modeling will help us build more robust tools for biomolecular engineering.
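The statistical-mechanics relation invoked above, and the kind of correlation check it motivates, can be sketched as follows. This is a hedged illustration, not the authors' analysis pipeline: the function names are hypothetical, the temperature default is an assumption, and the rank correlation shown here ignores ties. Whether a diffusion model's likelihoods actually follow a Boltzmann-like relation is the open question the abstract raises.

```python
import numpy as np

# Boltzmann constant in molar units (the gas constant), kcal / (mol * K).
K_B = 0.0019872041


def free_energy_difference(delta_log_p, temperature=298.15):
    """Boltzmann relation between two states A and B.

    If P(state) is proportional to exp(-G / (k_B T)), then
    G_A - G_B = -k_B T * (ln P_A - ln P_B).
    `delta_log_p` is ln P_A - ln P_B; the result is in kcal/mol.
    """
    return -K_B * temperature * np.asarray(delta_log_p, dtype=float)


def spearman_rho(x, y):
    """Spearman rank correlation, assuming no tied values in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # argsort of argsort converts values to 0-based ranks (valid without ties).
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])
```

In use, `x` would hold the $-\Delta \log P$ values of a set of candidate poses and `y` their Rosetta energies. A rank correlation is a natural choice for this comparison because the two quantities need not share units or a linear relationship for the likelihood to be useful as a ranking score.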