(2fs) Towards Machine Learning Prediction of Kinetic Properties of Enzyme Variants | AIChE

Authors 

Boorla, V. S. - Presenter, Pennsylvania State University
The turnover number (kcat) and the Michaelis constant (Km) of the Michaelis-Menten equation are key enzyme kinetic parameters, needed both for assessing individual enzyme performance and for parameterizing metabolic models. Their values tend to span orders of magnitude, reflecting the nature of the reaction chemistry, the enzyme's sequence and structural properties, and the bioenergetic and biosynthetic demands imposed by living cells. De novo prediction of these values remains challenging because of the mechanistic complexity of the underlying biological processes. Nevertheless, a few recent studies have shown that organism-independent machine learning models can be trained to predict either kcat or Km values using only the enzyme's amino acid sequence and the substrate's chemical features. Because these methods require enzyme sequences, however, most of the available experimental data could not be used to train any of them.
To address this issue, we present DeepMMPred, a novel sequence-agnostic training framework that learns separate feature vectors (embeddings) for the EC number and for the taxonomic classification (TC) of the organism of origin of each enzyme. These embedding vectors are trained together with task-specific, graph neural network-based molecular fingerprints for substrates by fitting them to kcat and Km measurements in a dataset curated from the BRENDA database. In randomized cross-validation on log-scale values, DeepMMPred achieves a coefficient of determination (R²) of 0.45 and a mean absolute error (MAE) of 0.85 for kcat prediction, and an R² of 0.45 and an MAE of 0.75 for Km prediction. In addition, we perform comprehensive enzyme-blind and organism-blind cross-validation studies to assess how well the training framework extrapolates to uncharacterized enzymes and organisms, respectively.
Furthermore, we present DeepMMPred-Seq, which appends enzyme sequence features and extends the framework to predicting kinetic parameters of enzyme variants using available datasets. The sequence features are extracted from state-of-the-art protein language models such as ESM2. We evaluate how well the trained models transfer knowledge to kinetic parameter prediction for enzyme variants, both with and without training on large-scale experimental datasets of kinetic measurements of enzyme variants. We highlight the current bottlenecks in training such models and their applicability to enzyme engineering.
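
The evaluation protocol described above (log-scale R² and MAE, plus blind cross-validation) can be illustrated with a minimal Python sketch. It is not the DeepMMPred code: the function names `log10_metrics` and `group_blind_folds` are hypothetical, and using the EC number as the grouping key for an enzyme-blind split is an assumption made here for illustration only.

```python
import math
import random


def log10_metrics(y_true, y_pred):
    """Return (R2, MAE) computed on log10-transformed kinetic values."""
    t = [math.log10(v) for v in y_true]
    p = [math.log10(v) for v in y_pred]
    mean_t = sum(t) / len(t)
    ss_res = sum((a - b) ** 2 for a, b in zip(t, p))  # residual sum of squares
    ss_tot = sum((a - mean_t) ** 2 for a in t)        # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(a - b) for a, b in zip(t, p)) / len(t)
    return r2, mae


def group_blind_folds(groups, n_folds=5, seed=0):
    """Split sample indices into folds so that each group stays in one fold.

    `groups` holds one label per sample, e.g. an EC number (assumed key for
    an enzyme-blind split) or an organism name (organism-blind split).
    """
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    # Round-robin assignment of whole groups to folds.
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    folds = [[] for _ in range(n_folds)]
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return folds
```

Each fold returned by `group_blind_folds` can serve as a held-out test set, so that no EC number (or organism) in the test set was seen during training. In a randomized split, by contrast, measurements for the same enzyme can land on both sides, which is why blind splits give a stricter estimate of extrapolation.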

Research Interests:

• Deep learning algorithms for enzyme kinetic property prediction
• Modeling of protein-protein complexes and binding affinity prediction
• Design of targeted antibodies
• Computational design of protein-based nanopores
• Protein-structure informed metabolic modeling