(284b) Single Sequence Prediction of Protein Structure and Impacts on Computational Protein Design
AIChE Annual Meeting
2022
Topical Conference: Chemical Engineers in Medicine
Big Data and Machine Learning to Advance Medicine
Tuesday, November 15, 2022 - 8:19am to 8:38am
The long-term goal of the Chowdhury Lab is to simulate, in a bottom-up fashion, a simple metazoan cell with sufficient fidelity that any experimentally measurable cell-biological quantity can be predicted in silico with comparable accuracy. To this end, we have first set up a deep learning pipeline that predicts protein structures directly from sequences using natural language processing, which paves the way for understanding protein-protein and protein-non-protein interactions. These models extract the "grammar" of protein folding from a corpus of all ~250M known protein sequences and unravel general rules that guide folding patterns. We predict a protein's structure from its amino acid sequence alone, without evolutionary data in the form of multiple sequence alignments (MSAs). Much like the real biological process of protein folding, our approach does not depend on any sequence alignment, and it outperforms MSA-dependent methods like AlphaFold2 on certain protein classes. The same models map protein sequences to high-dimensional spaces in which functionally proximal proteins cluster as nearby points, without needing expensive "labelled" data (e.g., experimental PDB structures).
We explain how this lays the foundation for constructing biochemistry-aware machine learning models of proteins and their interactions to (a) derive insight into their functions, (b) tune existing proteins for biochemical and pharmaceutical applications, and (c) enable point-of-care diagnostics.
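The idea of mapping sequences to a vector space in which related proteins sit close together can be illustrated with a minimal sketch. This is not the lab's pipeline: the learned language-model embedding is replaced here by a simple normalized k-mer count vector, and the function names (`embed`, `cosine`, `nearest`) and the toy sequences are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def embed(sequence, k=2):
    """Map a sequence to a unit-length vector of k-mer counts.

    Assumption: this is a crude stand-in for a learned embedding from a
    protein language model; it needs no MSA and no structural labels.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vec = [counts.get(kmer, 0) for kmer in kmers]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


def nearest(query, database, k=2):
    """Return the name of the database entry closest to the query sequence."""
    q = embed(query, k)
    return max(database, key=lambda name: cosine(q, embed(database[name], k)))


# Toy, made-up sequences (not real proteins), used only to exercise the code.
toy_db = {
    "helical_like": "AEELLKKAEELLKK",
    "gly_rich": "GSGSGGSGSGGSGS",
}
closest = nearest("AEELLKKLAEELLKR", toy_db)
```

In a real setting the `embed` function would be replaced by the output of a trained sequence model, and the nearest-neighbor search would run over a large sequence database; the clustering logic itself stays the same.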