(218f) Using Machine Learning Approaches to Estimate Enzyme Kinetic Parameters

Conference

AIChE Annual Meeting

Year

2022

Proceeding

2022 Annual Meeting

Group

Topical Conference: Applications of Data Science to Molecules and Materials

Session

Applications of Data Science in Molecular Sciences II

Time

Monday, November 14, 2022 - 4:45pm to 5:00pm

Authors

Boorla, V. S. - Presenter, Pennsylvania State University

Maranas, C. D.

The catalytic turnover number (k_cat) is a key kinetic property of an enzyme, defined as the maximal number of substrate molecules converted to products per active site per unit time. Accurate and generalizable methods for estimating enzyme kinetic parameters, such as the one presented here, can be invaluable for applications ranging from metabolic modeling to enzyme re-engineering. Enzyme databases such as BRENDA¹ contain a repository of turnover numbers measured in vitro. However, the available data are noisy because of varying experimental conditions and missing or incomplete annotations for several entries. We curated a dataset of ~6,000 turnover numbers from BRENDA by applying several quality filters. Using this dataset, we trained a convolutional neural network (CNN) model to learn amino acid embeddings that accurately estimate k_cat values when used as enzyme features, together with Morgan fingerprints as substrate features. In a 5-fold cross-validation evaluation, the trained model achieved an average Pearson correlation coefficient of 0.78 (standard deviation 0.008) and an average root mean squared error (RMSE) of 0.96 (standard deviation 0.061). An RMSE of 0.96 in log scale corresponds to an error of less than an order of magnitude in linear scale, which is small compared with the overall range of k_cat values (1E-06 to 1E+07). The low standard deviations across cross-validation folds suggest that training is robust and generalizes across the entire dataset. The success of our model can be attributed to the ability of CNNs to extract complex local patterns of amino acid residues that may be responsible for actual enzyme-substrate interactions at the molecular level. Comparison of our model with existing methods, along with its current limitations and avenues for improvement, will be discussed.
In particular, the use of state-of-the-art protein language model embeddings as features, and of graph-based architectures that overcome the limitations of CNNs and provide more meaningful insights, will be discussed. ML models trained to accurately capture the mapping between amino acid mutations and changes in turnover numbers can be used to guide directed evolution and/or targeted enzyme engineering approaches.
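The claim that an RMSE of 0.96 in log scale means less than an order-of-magnitude error can be checked with a little arithmetic. The sketch below assumes base-10 logarithms of k_cat (the abstract does not state the base); with natural logarithms the typical fold error would be even smaller, about e^0.96 ≈ 2.6.

```python
import math

def fold_error_from_log10_rmse(rmse_log10: float) -> float:
    """Convert an RMSE on log10(k_cat) into a typical multiplicative (fold) error."""
    # An error of d in log10 units means predicted/true = 10**d.
    return 10 ** rmse_log10

def orders_of_magnitude(k_min: float, k_max: float) -> float:
    """Number of decades spanned by the interval [k_min, k_max]."""
    return math.log10(k_max) - math.log10(k_min)

# Values quoted in the abstract; the log10 base is an assumption.
rmse = 0.96
fold = fold_error_from_log10_rmse(rmse)
span = orders_of_magnitude(1e-6, 1e7)

print(f"RMSE {rmse} (log10 units) ~ {fold:.1f}-fold typical error")
print(f"k_cat range spans {span:.0f} orders of magnitude")
```

A roughly 9-fold typical error is indeed below one decade, while the data span about 13 decades.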
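The 5-fold cross-validation protocol behind the reported mean and standard deviation of Pearson r and RMSE could be organized as in the minimal stdlib sketch below. Several pieces are assumptions: the CNN is replaced by a hypothetical one-feature least-squares stand-in (`fit_linear`), the data are synthetic, folds are drawn by shuffling and round-robin assignment, and the spread is reported as a sample standard deviation (`statistics.stdev`); the abstract does not specify these details.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(y_true, y_pred):
    """Root mean squared error (here in log k_cat units)."""
    return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal indices round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_linear(xs, ys):
    """HYPOTHETICAL stand-in for the CNN: ordinary least squares y = a*x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def cross_validate(xs, ys, k=5, seed=0):
    """Return (mean r, sd r, mean RMSE, sd RMSE) over k held-out folds."""
    rs, errs = [], []
    for test_idx in kfold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_linear([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in test_idx]
        y_pred = [model(xs[i]) for i in test_idx]
        rs.append(pearson(y_true, y_pred))
        errs.append(rmse(y_true, y_pred))
    return (statistics.fmean(rs), statistics.stdev(rs),
            statistics.fmean(errs), statistics.stdev(errs))

# Synthetic, hypothetical data: one feature and a noisy log10 k_cat target.
rng = random.Random(42)
xs = [rng.uniform(-6, 7) for _ in range(500)]
ys = [x + rng.gauss(0, 1.0) for x in xs]

r_mean, r_sd, e_mean, e_sd = cross_validate(xs, ys)
print(f"Pearson r = {r_mean:.2f} (sd {r_sd:.3f}); RMSE = {e_mean:.2f} (sd {e_sd:.3f})")
```

Because every sample is held out exactly once, the per-fold spread is a rough indicator of how stable training is across different subsets of the data.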
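The idea that a CNN extracts local residue patterns can be illustrated with a single convolutional filter: one-hot encode the sequence, slide a kernel along it (valid 1-D cross-correlation, as most deep-learning libraries implement "convolution"), apply ReLU, and take a global max. The hand-built `GxSxG` kernel below is hypothetical and chosen only for readability; in a trained model the filter weights are learned, and the presented model uses learned amino acid embeddings rather than one-hot vectors.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a protein sequence as an L x 20 matrix of 0/1 floats."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]

def motif_kernel(pattern):
    """HYPOTHETICAL hand-built W x 20 kernel: +1 per matching residue, 'x' = wildcard."""
    return [[1.0 if p == a else 0.0 for a in AMINO_ACIDS] for p in pattern]

def conv1d_maxpool(x, kernel):
    """Valid 1-D cross-correlation + ReLU, then global max pooling to one feature."""
    width, channels = len(kernel), len(kernel[0])
    scores = []
    for start in range(len(x) - width + 1):
        s = sum(kernel[j][c] * x[start + j][c]
                for j in range(width) for c in range(channels))
        scores.append(max(s, 0.0))  # ReLU
    return max(scores) if scores else 0.0

kernel = motif_kernel("GxSxG")
print(conv1d_maxpool(one_hot("MKLVAGHSAGAAL"), kernel))  # contains G-H-S-A-G
print(conv1d_maxpool(one_hot("MKLVAAHTAAAAL"), kernel))  # no G or S at all
```

Global max pooling makes the feature depend on whether the pattern occurs, not on where it occurs, which is one reason convolutional filters suit variable-length protein sequences.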
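Using a trained model to guide enzyme engineering could take the form of an in silico single-mutant scan: enumerate every single-residue substitution, score each with a k_cat predictor, and rank by predicted change in log k_cat relative to the wild type. This is a minimal sketch under stated assumptions: `toy_predictor` is a hypothetical placeholder (not a real k_cat model) that ignores the substrate string, and any trained predictor with the same (sequence, substrate) → log k_cat signature could be swapped in.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(seq):
    """Yield (name, mutant_sequence) for every single substitution, e.g. ('K2A', ...)."""
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]

def rank_mutants(seq, substrate, predict_log_kcat, top=5):
    """Rank single mutants by predicted change in log k_cat versus the wild type."""
    base = predict_log_kcat(seq, substrate)
    scored = [(name, predict_log_kcat(mut, substrate) - base)
              for name, mut in single_mutants(seq)]
    scored.sort(key=lambda t: t[1], reverse=True)  # stable sort: ties keep scan order
    return scored[:top]

def toy_predictor(seq, substrate):
    """HYPOTHETICAL placeholder model: ignores the substrate and simply
    rewards hydrophobic residues. It does not predict real k_cat values."""
    return 0.1 * sum(aa in "AILMFVW" for aa in seq)

best = rank_mutants("MKTAYIAK", "CCO", toy_predictor, top=3)
for name, delta in best:
    print(f"{name}: predicted change in log k_cat = {delta:+.2f}")
```

The top-ranked substitutions would then be candidates for experimental testing, which is where a model that captures mutation effects on turnover could shorten directed-evolution cycles.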

References:

1. Chang, Antje et al. “BRENDA, the ELIXIR core data resource in 2021: new developments and updates.” Nucleic Acids Research, vol. 49, D1 (2021): D498-D508.

Topics

Protein Engineering

Computational Molecular Engineering

Metabolic Engineering
