MLProScape: Machine Learning (ML) Based Method for Engineering Enzymes Faster By Modeling Protein Fitness Landscape (ProScape) | AIChE

Authors 

Gupta, S. T. P. - Presenter, Great Lakes Bioenergy Research Center
Glasgow, E., University of Wisconsin Madison, Madison, WI
Fox, B. G., Great Lakes Bioenergy Research Center
Ramanathan, P., University of Wisconsin Madison
Reed, J., University of Wisconsin Madison
Protein engineering aims to improve the functional properties of a protein, such as thermostability, binding affinity, and/or catalytic activity, and has become a vital step in developing industrial enzymes. This work describes a machine learning (ML) based method called ‘MLProScape’ that builds an accurate model of the protein fitness landscape (ProScape) and then uses the model to design synthetic proteins with superior functional properties. Unlike directed evolution approaches, which require experimentally screening millions of protein variants, MLProScape requires far fewer variants (on the order of tens to hundreds) to be tested experimentally, owing to the power of statistical inference.

As a proof of concept, MLProScape was applied to enhance the catalytic activity of glycoside hydrolases, a key class of enzymes used to degrade lignocellulosic biomass for biofuel production. Experimentally measured specific activities for a diverse set of glycoside hydrolases were used to train the ML models. The resulting elastic net regression models have high predictive power (with correlation coefficient and R² values as high as 0.896 and 0.714, respectively, between the predicted and experimentally measured specific activities under 5-fold cross-validation). Moreover, because the models use position-specific features, amino acid positions distal to the active site that might play a key role in modulating the activity level can be identified. MLProScape can also model complex design criteria, such as engineering the catalytic activity of an enzyme towards multiple substrates simultaneously while accounting for other desirable traits such as high stability and better in vivo expression.
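
The workflow described above (position-specific features, elastic net regression, 5-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not specify. The sequences and activities are synthetic, and the residue sets in `ALLOWED`, the helper names, and the hyperparameters `alpha` and `l1_ratio` are all assumptions chosen for illustration. The elastic net is fitted with a small hand-written coordinate descent so that the example needs only the Python standard library.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Position-specific features: one 0/1 indicator per (position, residue)."""
    return [1.0 if aa == res else 0.0 for aa in seq for res in AMINO_ACIDS]

def fit_elastic_net(X, y, alpha=0.01, l1_ratio=0.5, n_iter=100):
    """Coordinate descent on (1/2n)*||y - b - Xw||^2
    + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    b = sum(y) / n
    w = [0.0] * p
    resid = [yi - b for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # residue never observed at this position
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + col_sq[j] * w[j]
            # Soft-thresholding handles the L1 term; the L2 term enters the denominator.
            soft = math.copysign(max(abs(rho) - alpha * l1_ratio, 0.0), rho)
            new_w = soft / (col_sq[j] + alpha * (1.0 - l1_ratio))
            delta = new_w - w[j]
            if delta:
                resid = [r - delta * c for r, c in zip(resid, cols[j])]
                w[j] = new_w
        shift = sum(resid) / n  # unpenalized intercept update
        b += shift
        resid = [r - shift for r in resid]
    return w, b

def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def cross_val_predict(X, y, k=5, seed=0):
    """Return one held-out prediction per sample from k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        w, b = fit_elastic_net([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict([X[i] for i in test], w, b)):
            preds[i] = p
    return preds

def pearson_and_r2(y, preds):
    n = len(y)
    my, mp = sum(y) / n, sum(preds) / n
    cov = sum((a - my) * (c - mp) for a, c in zip(y, preds))
    sy = sum((a - my) ** 2 for a in y)
    sp = sum((c - mp) ** 2 for c in preds)
    ss_res = sum((a - c) ** 2 for a, c in zip(y, preds))
    return cov / math.sqrt(sy * sp), 1.0 - ss_res / sy

# Synthetic stand-in data (assumption): 80 variants over 6 aligned positions,
# with additive hidden per-residue effects plus measurement noise.
rng = random.Random(42)
ALLOWED = ["AST", "DEN", "KRH", "FWY", "ILV", "GPA"]  # hypothetical residue choices
effects = {(pos, aa): rng.gauss(0.0, 1.0)
           for pos, opts in enumerate(ALLOWED) for aa in opts}
seqs = ["".join(rng.choice(opts) for opts in ALLOWED) for _ in range(80)]
activity = [sum(effects[(i, aa)] for i, aa in enumerate(s)) + rng.gauss(0.0, 0.1)
            for s in seqs]

X = [one_hot(s) for s in seqs]
preds = cross_val_predict(X, activity)
r, r2 = pearson_and_r2(activity, preds)
print(f"5-fold CV: Pearson r = {r:.3f}, R^2 = {r2:.3f}")
```

Because each feature corresponds to one residue at one position, the fitted weights with the largest magnitude indicate which positions the model associates with activity, which is one plausible way that positions far from the active site could be flagged. With real data, an established library implementation of elastic net would normally replace the hand-written solver.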

Methods like MLProScape will complement and add value to the growing repertoire of in silico pathway engineering tools by enabling metabolic engineers to alleviate bottleneck steps en route to a target chemical of interest.

Key words: machine learning, enzyme engineering, sequence-to-function, experimental design