(594d) Enzrank-2.0: A Deep Learning Tool for Selection and Re-Design of Enzymes for Novel Substrates | AIChE

Authors 

Boorla, V. S. - Presenter, Pennsylvania State University
Hartley, A., The Pennsylvania State University
Retro-biosynthesis algorithms assemble pathways to convert a source metabolite to a target by searching databases of known enzyme chemistries [1]. However, because enzyme sequence space is vast, several candidate sequences are often known to perform a given chemistry. Computational tools such as EnzRank [2] and ESP [3] use machine-learned (ML) associations between natural enzyme sequences and their substrates to predict enzyme-substrate compatibility. Such tools can screen large numbers of enzyme sequences and shortlist the candidates most likely to have a required activity. Promising candidates are tested experimentally, and successful ones are then subjected to structure-guided or directed-evolution enzyme design campaigns. Because these campaigns are expensive, their throughput is often limited. This disconnect between enzyme selection and design can miss potentially active sequences that are only a few mutations away from untested sequences. The ML models mentioned above are trained only on natural sequences and thus have limited applicability for screening non-natural (mutant) enzymes. Furthermore, existing ML tools do not adequately account for co-substrates, cofactors, and other species that are essential for enzymatic reactions. Structure-guided algorithms, on the other hand, can screen the effects of mutations, but they require extensive knowledge of enzyme-substrate binding conformations, which is difficult to obtain for most reactions.

Here, we propose a novel deep learning tool, EnzRank-2.0, to address these challenges. We train enzyme-reaction compatibility models in which enzyme sequences are represented by latent embeddings adapted from pretrained protein language models (ESM [4]) and reactions are represented by reaction fingerprints. Using contrastive learning, we project the enzyme encodings and reaction fingerprints into a joint latent space such that the trained ML model assigns higher compatibility scores to known (positive) enzyme-reaction pairs and lower scores to negative pairs (from a synthesized dataset of randomly generated enzyme-reaction associations). This trained model can be readily used to prioritize enzyme sequences for novel reactions identified in biosynthesis pathway design. By integrating the trained model with a generative deep learning framework [5] that uses the ESM protein language model decoder, we further extend EnzRank-2.0 to optimize wild-type enzyme sequences by introducing mutations that improve the compatibility score for a given reaction. This capability allows both high-throughput screening of natural enzymes and their optimization for novel reaction compatibility. Libraries of highly promising enzyme sequences, including both natural and mutant enzymes, can then be tested experimentally, potentially improving the entire enzyme discovery pipeline. Because entire reaction fingerprints are used during training, the proposed tool can also be exploited to design co-substrate/cofactor specificity. We envision that this paradigm can serve as an integrated tool to address enzyme selection and design within biosynthetic pathways.
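The contrastive scoring idea described above can be illustrated with a minimal sketch. The abstract does not specify the projection architecture, the similarity measure, or the loss function, so everything below is an assumption made for illustration: linear projections stand in for the learned encoders, cosine similarity stands in for the compatibility score, a margin-based hinge loss stands in for the contrastive objective, and all function names (`project`, `compatibility`, `sample_negatives`, `margin_loss`) are hypothetical. In practice the enzyme vectors would come from a protein language model and the reaction vectors from reaction fingerprints; here they are short toy lists.

```python
import math
import random


def project(matrix, vec):
    """Linear projection (assumed form): matrix (d_out x d_in) times vec (d_in)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small epsilon avoids division by zero


def compatibility(enzyme_emb, rxn_fp, w_enz, w_rxn):
    """Score a pair by comparing both projections in the joint latent space."""
    return cosine(project(w_enz, enzyme_emb), project(w_rxn, rxn_fp))


def sample_negatives(positives, n_reactions, rng):
    """For each known (enzyme, reaction) index pair, draw a random reaction
    index that is not a known partner of that enzyme (assumed negative)."""
    known = set(positives)
    negatives = []
    for enz, _ in positives:
        candidates = [r for r in range(n_reactions) if (enz, r) not in known]
        if candidates:
            negatives.append((enz, rng.choice(candidates)))
    return negatives


def margin_loss(pos_scores, neg_scores, margin=0.5):
    """Hinge-style contrastive loss: zero once each positive score exceeds
    its paired negative score by at least `margin`."""
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for enzyme embeddings and reaction fingerprints.
    enzymes = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
    reactions = [[1.0, 0.1], [0.1, 1.0]]
    # Untrained projection weights into a 2-dimensional joint space.
    w_enz = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    w_rxn = [[1.0, 0.0], [0.0, 1.0]]

    positives = [(0, 0), (1, 1)]
    negatives = sample_negatives(positives, len(reactions), rng)
    pos = [compatibility(enzymes[e], reactions[r], w_enz, w_rxn) for e, r in positives]
    neg = [compatibility(enzymes[e], reactions[r], w_enz, w_rxn) for e, r in negatives]
    print("loss:", margin_loss(pos, neg))
```

Training would adjust the projection weights to reduce this loss, and the resulting `compatibility` score could then rank candidate enzymes for a new reaction. Sampling negatives at random carries the risk that some sampled pairs are in fact active but unannotated, which is a general caveat of this kind of synthesized negative set.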

References

  1. Kumar, A., Wang, L., Ng, C. Y. & Maranas, C. D. Pathway design using de novo steps through uncharted biochemical spaces. Nat Commun 9, (2018).
  2. Upadhyay, V., Boorla, V. S. & Maranas, C. D. Rank-ordering of known enzymes as starting points for re-engineering novel substrate activity using a convolutional neural network. Metab Eng 78, (2023).
  3. Kroll, A., Ranjan, S., Engqvist, M. K. M. & Lercher, M. J. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nat Commun 14, 2787 (2023).
  4. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
  5. Verkuil, R. et al. Language models generalize beyond natural proteins. bioRxiv 2022.12.21.521521 (2022) doi:10.1101/2022.12.21.521521.