(394c) Experimentally-Informed Optimization of Amino Acid Interactions for Predictions of Intrinsically Disordered Protein Phase Behavior
2024 AIChE Annual Meeting
Computational Molecular Science and Engineering Forum
Machine Learning for Soft and Hard Materials
Tuesday, October 29, 2024 - 3:54pm to 4:06pm
The demixing of intrinsically disordered proteins (IDPs) into coexisting protein-deficient and protein-enriched phases through liquid-liquid phase separation (LLPS) underlies the formation of membraneless organelles (MLOs). The formation and dissociation of MLOs, which play a key role in numerous cellular processes such as transcription and stress response, are sensitive to the IDP sequence and to environmental conditions. Because LLPS involves large length and time scales, coarse-grained (CG) models have been instrumental in studying how the primary sequence of an IDP affects its phase behavior. However, two main challenges remain in accurately modeling LLPS of a diverse set of IDPs with CG models. First, the system of interest must be mapped appropriately onto the reduced representation. Second, the reduced representation must be transferable, i.e., the model needs to be able to describe most proteins. To address these challenges, amino-acid-specific interaction potentials are usually developed and refined against experimentally derived single-chain dimensions of a subset of IDPs, then validated against in vitro phase separation assays targeting a handful of residue substitutions. In this work, we examine whether more accurate models of protein LLPS can be developed by accounting for in vitro phase behavior during the optimization of the interaction potentials. To do this, we adopt a Bayesian optimization protocol, iteratively refining the pair-potential parameters against recently collected experimental data that probe a wider range of amino acid substitutions. We compare the pair potentials optimized for different proteins, as well as different physics-based loss functions, to elucidate the factors influencing the development of transferable CG models. The insights gained in this work facilitate a deeper understanding of how data-driven methods can be applied to study sequence-dependent phase separation of biomolecules.
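For readers unfamiliar with Bayesian optimization, the loop described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' protocol: it assumes a single hypothetical interaction-strength parameter `eps` on [0, 1], a Gaussian-process surrogate with a squared-exponential kernel, and an expected-improvement acquisition rule. The function `toy_loss` and its target value are arbitrary placeholders standing in for "run a CG simulation and compare the result with experiment"; none of these names or numbers come from the abstract.

```python
import math
import random

JITTER = 1e-4  # small diagonal term that keeps the kernel matrix well conditioned


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar parameter values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular factor L of a symmetric positive-definite matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, zs):
    """Factor the kernel matrix once; zs are standardized loss values."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (JITTER if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    return L, backward_sub(L, forward_sub(L, zs))


def predict(xs, L, alpha, x_new):
    """Posterior mean and standard deviation of the zero-mean GP at x_new."""
    k_star = [rbf(x, x_new) for x in xs]
    mu = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best):
    """Expected improvement over the current best value (minimization)."""
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def toy_loss(eps):
    """Hypothetical placeholder for a simulation-vs-experiment mismatch."""
    return (eps - 0.63) ** 2  # 0.63 is an arbitrary illustrative target


def bayes_opt(loss, lo=0.0, hi=1.0, n_init=3, n_iter=12, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [loss(x) for x in xs]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]  # candidate values
    for _ in range(n_iter):
        mean = sum(ys) / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mean) / std for y in ys]  # standardize the losses
        L, alpha = fit_gp(xs, zs)
        best = min(zs)
        x_next = max(grid, key=lambda x: expected_improvement(
            *predict(xs, L, alpha, x), best))
        xs.append(x_next)
        ys.append(loss(x_next))  # the expensive step in a real workflow
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]
```

Calling `best_eps, best_loss = bayes_opt(toy_loss)` returns the best parameter value found and its loss. In a realistic setting the parameter space would be multi-dimensional (one entry per pair-potential parameter) and each loss evaluation would require a phase-coexistence simulation, which is why a sample-efficient surrogate-based search is attractive.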