(435f) Playing Bayesian Jigsaw with PDB Structures: Optimizing Rigid Body Representations for Integrative Modeling of Macromolecular Complexes | AIChE

Authors 

Sanyal, T. - Presenter, University of California Santa Barbara
Sali, A., University of California San Francisco
Chait, B. T., The Rockefeller University
Integrative modeling of macromolecular protein complexes typically involves representing the structurally characterized components of the complex (from X-ray crystallography, cryo-electron microscopy, or comparative modeling) as rigid bodies, and sampling their relative spatial arrangements to generate an ensemble of models consistent with the input experimental information. While the literature offers some sparse guidelines for optimizing the resolution of the flexible regions used to represent missing residues within and between the rigid components, the definition of the rigid regions themselves is entirely ad hoc. This often leads to several tedious rounds of trial and error between the computational modeler and their experimental collaborator, alternating between guessing a rigid body definition (based on prior experience and the literature) and running a computationally expensive structural sampling with that definition, until the model achieves both a desirable precision and a quantifiably good fit to the input data. Here, we combine concepts from stochastic graph partitioning and on-lattice spin-glass statistical mechanical models to develop a maximum likelihood algorithm that automatically scores and samples alternative rigid body definitions and provides the modeler with an optimal set of rigid components for subsequent structural assembly. We illustrate the method on a pathological case of structural inconsistency between a cryo-EM structure of the eleven-subunit yeast helicase complex CMG and a high-density dataset of ~1000 chemical crosslinks describing its binding to the Mcm10 protein. Our algorithm efficiently dissects CMG into an optimal collection of rigid bodies without requiring the modeler to enumerate all candidate definitions manually, and the optimal set of rigid components significantly increases the overall crosslink satisfaction compared with treating the entire CMG as a single rigid body.
The method is standalone and could serve as a general tool for preprocessing available structures to improve their consistency with available crosslink data before computationally expensive structural sampling is carried out.
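The abstract does not specify the scoring function or the sampler, so the following is only a minimal, hypothetical sketch of the general idea under assumptions of our own. Each pre-defined structural segment of the complex is treated as a lattice site carrying a Potts "spin" (its rigid body label, with at most `q` bodies). A ferromagnetic coupling `J` rewards segments that are in contact in the input structure for staying in the same rigid body, and a penalty `lam` is charged for every crosslink whose input-structure distance exceeds a cutoff (30 Å here, an assumed value) while both of its ends are frozen inside one rigid body. Crosslinks spanning two different bodies are optimistically counted as satisfiable. Labelings are sampled with Metropolis Monte Carlo at temperature `T`, and the lowest-energy labeling seen (the most probable one if P(s) ∝ exp(-E/T)) is returned. The names `crosslink_satisfaction`, `energy`, `sample_partition` and all parameter values are illustrative, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical data model (assumption, not from the abstract):
# a crosslink is (segment_a, segment_b, distance_in_input_structure_in_angstrom),
# a contact is a pair of segments touching each other in the input structure.
Crosslink = Tuple[int, int, float]
Contact = Tuple[int, int]


def crosslink_satisfaction(labels: Sequence[int], crosslinks: Sequence[Crosslink],
                           cutoff: float = 30.0) -> float:
    """Fraction of crosslinks counted as satisfied under a rigid body labeling.

    Assumption: a crosslink inside one rigid body keeps its input-structure
    distance, so it is satisfied only if that distance <= cutoff; a crosslink
    spanning two rigid bodies is optimistically counted as satisfiable.
    """
    if not crosslinks:
        return 1.0
    ok = sum(1 for a, b, d in crosslinks if labels[a] != labels[b] or d <= cutoff)
    return ok / len(crosslinks)


def energy(labels: Sequence[int], contacts: Sequence[Contact],
           crosslinks: Sequence[Crosslink], J: float = 1.0, lam: float = 2.0,
           cutoff: float = 30.0) -> float:
    """Assumed Potts-style energy of a labeling; lower is better."""
    # Ferromagnetic coupling: reward contacting segments that share a rigid body.
    coupling = -J * sum(1 for i, j in contacts if labels[i] == labels[j])
    # Penalty for crosslinks frozen at a violating distance inside one body.
    violated = sum(1 for a, b, d in crosslinks
                   if labels[a] == labels[b] and d > cutoff)
    return coupling + lam * violated


def sample_partition(n_segments: int, contacts: Sequence[Contact],
                     crosslinks: Sequence[Crosslink], q: int = 4,
                     n_steps: int = 5000, T: float = 0.5, seed: int = 0,
                     **params) -> Tuple[List[int], float]:
    """Metropolis Monte Carlo over labelings; returns the lowest-energy one seen."""
    rng = random.Random(seed)
    labels = [0] * n_segments  # start from "the whole complex is one rigid body"
    e = energy(labels, contacts, crosslinks, **params)
    best, best_e = labels[:], e
    for _ in range(n_steps):
        i = rng.randrange(n_segments)          # pick one segment
        old, new = labels[i], rng.randrange(q)  # propose a new body label
        if new == old:
            continue
        labels[i] = new
        e_new = energy(labels, contacts, crosslinks, **params)
        # Metropolis criterion: accept downhill moves, uphill with exp(-dE/T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / T):
            e = e_new
            if e < best_e:
                best, best_e = labels[:], e
        else:
            labels[i] = old  # reject: restore the previous label
    return best, best_e


if __name__ == "__main__":
    # Toy four-segment chain; the (1, 2) crosslink is too long for one body.
    contacts = [(0, 1), (1, 2), (2, 3)]
    xl = [(0, 1, 12.0), (2, 3, 15.0), (1, 2, 45.0)]
    print("single body:", crosslink_satisfaction([0, 0, 0, 0], xl))
    best, best_e = sample_partition(4, contacts, xl)
    print("best labeling:", best, best_e, crosslink_satisfaction(best, xl))
```

In this toy chain, treating all four segments as one rigid body leaves the 45 Å crosslink violated (satisfaction 2/3), while the lowest-energy labeling splits the chain into {0, 1} and {2, 3}, satisfying all three crosslinks. The coupling `J` matters: without it, giving every segment its own body would trivially satisfy every crosslink, so `J` and `lam` together trade rigidity against fit to the crosslink data.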