(169bo) Protein Preparation: Is One Protonation State Enough?
2024 AIChE Annual Meeting
Computational Molecular Science and Engineering Forum
Poster Session: Computational Molecular Science and Engineering Forum
Monday, October 28, 2024 - 3:30pm to 5:00pm
Drawing on our expert knowledge of manual protein preparation, we've built a tool that efficiently enumerates multiple protonation and rotameric states. Even if we assume that a single protein state is sufficient, we currently do not have access to all feasible states from which to identify "the best" one. Because several amino acids have multiple protonation and/or rotameric states, the problem can look like a daunting combinatorial explosion. By exploiting the local environment, we can evaluate the hydrogen-bonding energy around an ambiguous residue, identify unlikely states, and reduce the number of possible outcomes. In the end, a handful of residues may remain with multiple solutions, and we use these to generate all possible states; the whole procedure completes within minutes. Using unrestrained molecular dynamics, we can then assess how well the varied protonation and rotameric states recapitulate the X-ray structure.
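The prune-then-enumerate idea described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the residue names, state labels, energy values, and the 0.5 energy cutoff are all hypothetical, and the real scoring of the hydrogen-bonding environment is not specified in this abstract.

```python
from itertools import product
from math import prod


def prune_states(candidates, energy_cutoff):
    """For each residue, keep the states whose local H-bond energy lies
    within `energy_cutoff` of that residue's best (lowest) energy."""
    kept = {}
    for residue, states in candidates.items():
        best = min(states.values())
        kept[residue] = [
            name for name, energy in states.items()
            if energy - best <= energy_cutoff
        ]
    return kept


def enumerate_states(kept):
    """Yield every combination of the surviving per-residue states."""
    residues = sorted(kept)
    for combo in product(*(kept[r] for r in residues)):
        yield dict(zip(residues, combo))


# Hypothetical local H-bond energies (arbitrary units) per candidate state.
candidates = {
    "HIS57": {"HID": -3.1, "HIE": -2.9, "HIP": 0.4},
    "ASN102": {"original": -1.8, "flipped": 1.2},
    "ASP189": {"ASP": -2.5, "ASH": -0.2},
    "GLN192": {"original": -1.0, "flipped": -0.9},
}

naive_count = prod(len(states) for states in candidates.values())
kept = prune_states(candidates, energy_cutoff=0.5)
protein_states = list(enumerate_states(kept))
print(naive_count, len(protein_states))
```

In this toy input the naive product over four residues gives 24 combinations, while pruning on local energy leaves only two residues ambiguous and 4 full protein states to carry forward.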
Multiple rotamer and protonation states can be consistent with the X-ray structure, which makes the selection of a single state arbitrary. We evaluate the rotamer and protonation states of 68 proteins from the DUD-E set [3] and refine each structure with 10 ns of unrestrained molecular dynamics to understand how many states are possible and how they relate back to the X-ray structure. For most protein systems the preparation finds fewer than 10 possible states, with a maximum of just over 250 (black squares in Figure 1). Although some proteins have many potential states, the numbers remain tractable, which undercuts the assumption that this is a combinatorial explosion. Furthermore, the average structure from each simulation was compared with the X-ray structure: the RMSD was calculated for the 50, 70, and 90% of atoms with the lowest B-factors, and the number of states with an RMSD below 1.0 Å is reported (Figure 1). For most proteins, multiple states match the X-ray structure with an RMSD < 1.0 Å over the 50-70% of atoms with the lowest B-factors, underscoring how widely the model can vary and that selecting a single state is arbitrary.
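The B-factor-filtered RMSD comparison can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: it assumes the time-averaged structure is already superimposed on the X-ray structure with atoms in the same order, and the function names and toy coordinates are hypothetical.

```python
from math import sqrt


def rmsd_lowest_bfactor(xray, model, bfactors, fraction):
    """RMSD over the `fraction` of atoms with the lowest X-ray B-factors.

    Assumes `xray` and `model` are pre-aligned lists of (x, y, z) tuples
    in the same atom order as `bfactors`.
    """
    n_keep = max(1, round(fraction * len(bfactors)))
    # Indices of the n_keep atoms with the lowest B-factors.
    order = sorted(range(len(bfactors)), key=lambda i: bfactors[i])[:n_keep]
    squared = sum(
        sum((a - b) ** 2 for a, b in zip(xray[i], model[i]))
        for i in order
    )
    return sqrt(squared / n_keep)


def count_matching_states(xray, averaged_models, bfactors, fraction, cutoff=1.0):
    """Count states whose averaged structure has RMSD below `cutoff` (in Å)."""
    return sum(
        1 for model in averaged_models
        if rmsd_lowest_bfactor(xray, model, bfactors, fraction) < cutoff
    )


# Toy four-atom example: the two high-B-factor atoms move, the others do not.
xray = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
bfactors = [10.0, 40.0, 15.0, 60.0]
model = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 4.0)]

for fraction in (0.5, 0.75, 1.0):
    print(fraction, rmsd_lowest_bfactor(xray, model, bfactors, fraction))
```

Restricting the comparison to well-ordered (low B-factor) atoms keeps flexible regions, where the X-ray coordinates are least certain, from dominating the RMSD.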
These findings have downstream consequences for understanding the protein, and selecting a single state can be detrimental when attempting to discover novel chemical matter. Even with the protein conformation fixed to that of the X-ray structure, as in this study, residues proximal to the binding pocket vary and can change the environment quite dramatically. In several PDB entries we found residues adjacent to significant pockets that have multiple stable states throughout the 10 ns unrestrained MD simulation. In all of these cases the varying states lead to dramatic changes in the local environment, with acceptors, donors, or lipophilic groups all being stable. Using a single state would omit critical information when making design choices, undermining the predictive power of the methods.
This work is just the start of a much larger conversation around protein preparation, but it provides a much more promising outlook on enumerating protein states. In all cases explored here, multiple rotameric and protonation states vary and still recapitulate the X-ray structure, yet all remain tractable given sufficient compute resources. For optimization or for exploring a fixed protein conformation, one could easily study all relevant states and make more informed decisions during the design process. Extending this concept, we can see how these different states can affect sampling of protein motion or of the conformational landscape when running long MD simulations.
Figure 1: For 68 protein systems selected from the DUD-E set [3], the total number of states determined by our method (black squares); the number of these states that produce a time-averaged structure after 10 ns of unrestrained MD with an RMSD of less than 1 Å for the lowest 50% of atoms in the original X-ray structure, selected by B-factor (red diamonds); the number of states with an RMSD of less than 1 Å for the lowest 70% by B-factor (blue triangles); and the number of states with an RMSD of less than 1 Å for the lowest 90% by B-factor (brown circles).
References
1: Bender, B.J., Gahbauer, S., Luttens, A. et al. A practical guide to large-scale docking. Nat Protoc 16, 4799–4832 (2021). https://doi.org/10.1038/s41596-021-00597-z
2: SPRUCE: OpenEye tools
3: DUD-E: Mysinger, M.M., Carchia, M., Irwin, J.J., Shoichet, B.K. J. Med. Chem. (2012, Jul 5). https://doi.org/10.1021/jm300687e