(284b) Single Sequence Prediction of Protein Structure and Impacts on Computational Protein Design | AIChE


Authors 

Chowdhury, R. - Presenter, Harvard Medical School
The long-term goal of the Chowdhury Lab is to simulate, in a bottom-up fashion, a simple metazoan cell with sufficient fidelity that any experimentally measurable cell-biological quantity can be predicted in silico with comparable accuracy. To this end, we have first set up a deep learning pipeline that predicts protein structures directly from sequences using methods from natural language processing. This paves the way toward understanding protein-protein and protein-non-protein interactions. Our models extract the ‘grammar’ of protein folding from a corpus of all ~250M known protein sequences and uncover general rules that guide folding patterns. The approach mirrors the biological process of protein folding, which does not rely on any sequence alignment: we predict protein structure from the amino acid sequence alone, without evolutionary data in the form of multiple sequence alignments (MSAs), and on certain protein classes we outperform MSA-dependent methods such as AlphaFold2. We also map protein sequences into high-dimensional spaces in which functionally related proteins cluster as nearby points, without needing expensive ‘labelled’ data (e.g., experimental PDB structures).
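The idea of mapping sequences into a vector space where related proteins land as nearby points can be illustrated with a toy sketch. This is not the authors' language model: as a deliberately simple, alignment-free stand-in, it encodes each sequence as its normalized dipeptide (2-mer) composition over the 20 standard amino acids and compares vectors by cosine similarity. The function names `embed`, `cosine`, and `nearest` and the example sequences are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 dipeptides in a fixed order, so every vector shares the same axes.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def embed(sequence: str) -> list[float]:
    """Toy stand-in for a learned embedding: 400-dim dipeptide frequencies.

    Dipeptides containing non-standard letters (e.g. 'X') are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DIPEPTIDES) or 1  # avoid division by zero
    return [counts[d] / total for d in DIPEPTIDES]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query: str, database: dict[str, str]) -> str:
    """Name of the database sequence whose vector is closest to the query's."""
    q = embed(query)
    return max(database, key=lambda name: cosine(q, embed(database[name])))


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    db = {"seq_a": "MKTAYIAKQRQISFVKSHFSRQ", "seq_b": "GGGGSGGGGSGGGGS"}
    print(nearest("MKTAYIAKQRQISFVKSHF", db))  # seq_a shares every dipeptide
```

A learned protein language model would replace `embed` with per-residue representations pooled into one vector per sequence; the nearest-neighbour step on top stays the same.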

We explain how this lays the foundation for constructing biochemistry-aware machine learning models of proteins and their interactions to (a) derive insight into their functions, (b) tune existing proteins for biochemical and pharmaceutical applications, and (c) support point-of-care diagnostics.