(594d) EnzRank-2.0: A Deep Learning Tool for Selection and Re-Design of Enzymes for Novel Substrates
2024 AIChE Annual Meeting
Food, Pharmaceutical & Bioengineering Division
Machine Learning Based Protein Engineering
Wednesday, October 30, 2024 - 4:46pm to 5:04pm
Here, we propose a novel deep learning tool, EnzRank-2.0, to address these challenges. We train enzyme-reaction compatibility models using latent representations of both partners: enzyme sequences are encoded by adapting a pretrained protein language model (ESM; ref. 4), and reactions are encoded as reaction fingerprints. Using contrastive learning, we project the enzyme encodings and reaction fingerprints into a joint latent space such that the trained model assigns higher compatibility scores to known (positive) enzyme-reaction pairs and lower scores to negative pairs, which are drawn from a synthesized dataset of randomly generated enzyme-reaction associations. The trained model can be used directly to prioritize enzyme sequences for novel reactions identified during biosynthesis pathway design. By integrating the trained model with a generative deep learning framework (ref. 5) that uses the ESM protein language model decoder, we further extend EnzRank-2.0 to optimize wild-type enzyme sequences by introducing mutations that improve the compatibility score for a given reaction. This capability allows both high-throughput screening of natural enzymes and their optimization for novel reaction compatibility. Libraries of highly promising sequences, including both natural and mutant enzymes, can then be tested experimentally, potentially improving the entire enzyme discovery pipeline. Because entire reaction fingerprints are used during training, the proposed tool can also be used to design co-substrate/cofactor specificity. We envision that this paradigm can serve as an integrated tool for enzyme selection and design within biosynthetic pathways.
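The scoring and ranking idea described above can be illustrated with a minimal sketch. It is not the EnzRank-2.0 implementation: the function names, the fixed projection matrices `W_enz` and `W_rxn`, the toy vectors standing in for ESM embeddings and reaction fingerprints, and the margin-based hinge loss are all assumptions chosen for illustration, since the abstract does not specify the loss function or the architecture.

```python
import math

def project(matrix, vector):
    """Linear projection into the joint latent space (matrix times vector)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def compatibility(enzyme_vec, reaction_fp, W_enz, W_rxn):
    """Compatibility score of an enzyme-reaction pair in the joint space."""
    return cosine(project(W_enz, enzyme_vec), project(W_rxn, reaction_fp))

def margin_contrastive_loss(pos_score, neg_score, margin=0.5):
    """Hinge loss (an assumed choice): zero once the positive pair
    outscores the negative pair by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)

def rank_enzymes(candidates, reaction_fp, W_enz, W_rxn):
    """Return (name, score) pairs sorted from most to least compatible."""
    scored = [(name, compatibility(vec, reaction_fp, W_enz, W_rxn))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): 3-d "enzyme encodings", a 4-d "reaction
# fingerprint", and hand-picked projections into a 2-d joint space.
W_enz = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
W_rxn = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
candidates = {
    "enzA": [1.0, 0.0, 0.3],
    "enzB": [0.0, 1.0, 0.2],
    "enzC": [1.0, 1.0, 0.0],
}
reaction_fp = [1.0, 0.0, 1.0, 0.0]

ranking = rank_enzymes(candidates, reaction_fp, W_enz, W_rxn)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a trained model the projections would be learned by minimizing the contrastive loss over known positive pairs and randomly generated negative pairs; here they are fixed so that the ranking step can be followed by hand.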
References
- Kumar, A., Wang, L., Ng, C. Y. & Maranas, C. D. Pathway design using de novo steps through uncharted biochemical spaces. Nat Commun 9, (2018).
- Upadhyay, V., Boorla, V. S. & Maranas, C. D. Rank-ordering of known enzymes as starting points for re-engineering novel substrate activity using a convolutional neural network. Metab Eng 78, (2023).
- Kroll, A., Ranjan, S., Engqvist, M. K. M. & Lercher, M. J. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat Commun 14, 2787 (2023).
- Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
- Verkuil, R. et al. Language models generalize beyond natural proteins. bioRxiv 2022.12.21.521521 (2022) doi:10.1101/2022.12.21.521521.