(310a) Denoising Diffusion Models Meet Statistical Mechanics for Protein Protein Docking

Authors 

Chu, L. S., Johns Hopkins University
Gray, J. J., Johns Hopkins University
Chen, J., Merck
Understanding how proteins interact is crucial because protein-protein interactions underlie nearly all cellular functions in living organisms, from enzyme catalysis to signaling and gene regulation. Emerging protein-protein docking methods based on generative AI provide a fast way to predict the structure of a protein complex and its binding affinity given the structures of its unbound monomeric partners.

Traditional protein docking approaches comprise two stages: (1) a sampling step that generates ensembles of candidate docked structures and (2) a ranking step that evaluates the poses produced by the sampling step. Recent breakthroughs in deep learning, particularly likelihood-based models such as denoising diffusion probabilistic models, have been applied to the rigid-body protein-protein docking task. The main idea behind a diffusion process is to transform the data distribution into a Gaussian prior over a series of time steps and to learn a score function, the gradient of the log probability density ∇_x log p_t(x). The learned score function can then be used to sample from the underlying probability distribution of the data. Using this approach, we have developed a rigid-body docking method, DockDiffusion. DockDiffusion was trained by adding rotation and translation noise to docked protein complexes, which the model then learns to reverse. During inference, the model takes two unbound monomers as input and solves a reverse-time stochastic differential equation.
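As a hedged illustration of the reverse-time sampling step (a toy sketch, not DockDiffusion's implementation), the example below runs an Euler-Maruyama integration of the reverse SDE for a variance-exploding diffusion over a 3-D translation vector only; the rotational diffusion over SO(3) needed for rigid-body docking is omitted, and the noise schedule, the Gaussian "data" distribution, and the closed-form score that stands in for the learned neural network are all assumptions for the example.

```python
import numpy as np

# Toy sketch (not the DockDiffusion implementation): reverse-time SDE sampling
# with Euler-Maruyama for a variance-exploding diffusion over a 3-D translation.
# A closed-form Gaussian score stands in for the learned neural score network;
# rotational diffusion over SO(3) is omitted for brevity.

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0   # assumed geometric noise schedule
DATA_STD = 1.0                      # std of the toy "docked translation" data


def sigma(t):
    """Noise scale sigma(t) for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def score(x, t):
    """Closed-form score grad_x log p_t(x) when p_0 ~ N(0, DATA_STD^2 I) is
    perturbed with N(0, sigma(t)^2 I) noise; a trained network replaces this."""
    return -x / (DATA_STD**2 + sigma(t)**2)


def reverse_sde_sample(n_steps=1000, seed=0):
    """Draw one sample by integrating the reverse-time SDE from t=1 to t=0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma(1.0), size=3)     # start from the Gaussian prior
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = i / n_steps
        g2 = 2.0 * sigma(t)**2 * np.log(SIGMA_MAX / SIGMA_MIN)  # g(t)^2 = d(sigma^2)/dt
        # reverse-time update: x <- x + g^2 * score * dt + g * sqrt(dt) * z
        x = x + g2 * score(x, t) * dt + np.sqrt(g2 * dt) * rng.normal(size=3)
    return x


print("sampled translation:", reverse_sde_sample())
```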

A grand question in deep learning-based protein structure prediction is whether these models have learned any of the thermodynamics associated with protein folding. To approach this question, we estimated the change in exact likelihood (log P_0(x) − log P_T(x) = −Δlog P) of a pre-trained diffusion model for the protein-protein docking task and correlated it with the Rosetta energy function. We draw inspiration from statistical mechanics, where the thermodynamic potential (the Gibbs energy in the case of an isobaric-isothermal ensemble) is proportional to the log of the probability of observing a system in a particular state. Hence, −Δlog P in the diffusion-based protein docking task might be a measure of the binding energy between two protein partners in contact. This exact likelihood estimation from a diffusion-based protein docking model serves two purposes: (1) it allows us to rank candidate poses without an external energy function, and (2) it helps us interpret whether diffusion-based docking methods have learned generalizable physical principles. We observed a robust correlation between −Δlog P and the Rosetta energy function in one test case, whereas in another test case no discernible correlation was detected between the two quantities. These results suggest that the likelihood estimate does not consistently correlate with Rosetta energies and that such models might not yet have "learned" generalizable physics. Answering fundamental questions about why and how deep learning models work for protein design and modeling will help us build more robust tools for biomolecular engineering.
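To make the likelihood quantity concrete, the sketch below (a toy continuation of the example above, not the actual DockDiffusion code) computes log p_0(x) exactly via the probability-flow ODE and the instantaneous change-of-variables formula, log p_0(x_0) = log p_T(x_T) + ∫ ∇·[dx/dt] dt. In the toy case the divergence is available in closed form; for a real docking model it would typically be estimated with a Hutchinson trace estimator over the learned score network, and −Δlog P for each candidate pose could then be compared against Rosetta energies, e.g. via a rank correlation.

```python
import numpy as np

# Toy continuation of the sketch above (not the actual DockDiffusion code):
# exact log-likelihood via the probability-flow ODE and the instantaneous
# change-of-variables formula,
#     log p_0(x_0) = log p_T(x_T) + integral_0^T div[dx/dt] dt .
# The divergence is in closed form here; a real model would estimate it with
# a Hutchinson trace estimator over the learned score network.

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0
DATA_STD = 1.0
LOG_RATIO = np.log(SIGMA_MAX / SIGMA_MIN)


def sigma(t):
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def log_likelihood(x0, n_steps=2000):
    """Integrate the probability-flow ODE forward from t=0 to t=1, accumulating
    the divergence term, then add the prior log-density at t=1."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    div_integral = 0.0
    for i in range(n_steps):
        t = (i + 0.5) / n_steps
        var = DATA_STD**2 + sigma(t)**2
        g2 = 2.0 * sigma(t)**2 * LOG_RATIO      # g(t)^2 for the VE SDE
        drift = 0.5 * g2 * x / var              # dx/dt = -(1/2) g^2 * score
        div_integral += (0.5 * g2 * x.size / var) * dt   # closed-form divergence
        x = x + drift * dt
    prior_var = DATA_STD**2 + sigma(1.0)**2     # exact t=1 marginal of the toy
    log_pT = -0.5 * (x @ x) / prior_var - 0.5 * x.size * np.log(2 * np.pi * prior_var)
    return log_pT + div_integral


x0 = np.array([0.3, -0.1, 0.5])
print("ODE estimate :", log_likelihood(x0))
var0 = DATA_STD**2 + SIGMA_MIN**2               # analytic check for the toy Gaussian
print("closed form  :", -0.5 * (x0 @ x0) / var0 - 1.5 * np.log(2 * np.pi * var0))
```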