(272b) Machine Learning Predicts Functional Classes of Family 7 Glycoside Hydrolases with High Accuracy

Conference

AIChE Annual Meeting

Year

2018

Proceeding

2018 AIChE Annual Meeting

Group

Computational Molecular Science and Engineering Forum

Session

Data Mining and Machine Learning in Molecular Sciences I

Time

Tuesday, October 30, 2018 - 8:15am to 8:30am

Authors

Gado, J. - Presenter, University of Kentucky

Payne, C. M., National Renewable Energy Laboratory

Ståhlberg, J., Swedish University of Agricultural Sciences

Borisova, A., Swedish University of Agricultural Sciences

Glycoside hydrolases (GHs) are enzymes that catalyze the hydrolysis of glycosidic bonds in saccharides. They are used in industries such as biofuel production and textiles for the enzymatic degradation and modification of saccharides. GHs are currently classified into 152 families based on amino acid sequence similarity. Family 7 glycoside hydrolases (GH7s) are found predominantly in fungi and often make up the largest fraction, by mass, of the secretomes of cellulolytic fungi. In the biofuel industry, GH7s are the primary components of the enzyme cocktails used to degrade cellulose. GH7s fall into one of two functional classes: cellobiohydrolases (CBHs) or endoglucanases (EGs). GH7 CBHs hydrolyze cellulose processively; that is, they carry out multiple catalytic steps without dissociating from the substrate. GH7 EGs, by contrast, are non-processive and dissociate from the substrate after hydrolyzing a glycosidic bond. Processive GH7s (CBHs) have become a focus of research because they provide the greatest hydrolytic potential in enzymatic cellulose degradation. Because many known GH7 sequences have not yet been classified by activity, we set out to develop a predictive approach for classifying GH7 function. We first retrieved a large, diverse set of 1,521 GH7 sequences from genomic databases; the functional class (CBH or EG) is reported for only about 30% of them. We trained multiple machine learning classifiers (decision tree, support vector machine (SVM), naïve Bayes, and logistic regression), using known structural differences between GH7 CBHs and EGs as features. Using Monte Carlo cross-validation, we found that the overall accuracy of the classifiers ranges from 95 to 97%, suggesting that the functional class of a GH7 can be readily predicted from its sequence alone.
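The abstract names Monte Carlo cross-validation (repeatedly drawing a random train/test split, fitting on the training part, and averaging the accuracy on the held-out part) but gives no details. The sketch below illustrates that procedure in plain Python with one of the four classifier types listed, naïve Bayes, here in its Bernoulli form for binary features. Everything else is an assumption for illustration: the function names, the presence/absence feature encoding, the 80/20 split, the number of repetitions, and the synthetic data, none of which are the authors' GH7 dataset, features, or settings.

```python
import math
import random
from statistics import mean


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary (0/1) feature vectors.

    Returns {class: (log_prior, [(log P(x_j=1), log P(x_j=0)), ...])},
    using add-alpha (Laplace) smoothing for the per-feature probabilities.
    """
    model = {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        feature_logs = []
        for j in range(n_features):
            ones = sum(row[j] for row in rows)
            p = (ones + alpha) / (len(rows) + 2 * alpha)
            feature_logs.append((math.log(p), math.log(1.0 - p)))
        model[c] = (log_prior, feature_logs)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior score for vector x."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, feature_logs) in model.items():
        score = log_prior + sum(
            log_p1 if xj else log_p0 for xj, (log_p1, log_p0) in zip(x, feature_logs)
        )
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def monte_carlo_cv(X, y, n_splits=100, test_fraction=0.2, seed=0,
                   fit=train_bernoulli_nb, predict_fn=predict):
    """Monte Carlo cross-validation: for each repetition, shuffle the indices,
    hold out a random test_fraction of the samples, fit on the rest, and
    record the held-out accuracy. Returns one accuracy per split."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, round(test_fraction * len(X)))
    accuracies = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / n_test)
    return accuracies


if __name__ == "__main__":
    # Synthetic toy data (NOT the GH7 dataset): four hypothetical binary
    # structural features per sequence, each flipped with probability 0.1.
    rng = random.Random(42)
    patterns = {"CBH": [1, 1, 1, 0], "EG": [0, 0, 0, 1]}
    X, y = [], []
    for label, pattern in patterns.items():
        for _ in range(100):
            X.append([bit ^ (rng.random() < 0.1) for bit in pattern])
            y.append(label)
    accs = monte_carlo_cv(X, y, n_splits=200)
    print(f"mean accuracy over {len(accs)} splits: {mean(accs):.3f}")
```

Because `monte_carlo_cv` takes the fitting and prediction functions as parameters, a decision tree, SVM, or logistic regression could be substituted without changing the resampling loop. In practice, scikit-learn's `ShuffleSplit` passed as `cv` to `cross_val_score` provides the same resampling scheme.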

Topics

Biofuels (Energy)
