(62g) Metabolic Modeling Tools in KBase to Integrate Multi-Omics Data, Understand Microbiome Interactions, and Explore Energy Biosynthesis Mechanisms

Conference

AIChE Annual Meeting

Year

2022

Proceeding

2022 Annual Meeting

Group

Food, Pharmaceutical & Bioengineering Division

Session

Systems and Quantitative Biology: Modeling Biological Processes

Time

Monday, November 14, 2022 - 9:48am to 10:30am

Authors

Henry, C. S. - Presenter, Argonne National Laboratory

Multi-omics data are becoming increasingly accessible for the study of a wide range of complex biological systems. Today, large-scale metagenomes can be readily obtained from soil microbiome systems, while the instruments and protocols for collecting metabolomic and proteomic data are constantly improving. Yet analysis methods still struggle to annotate and integrate these individual datasets, let alone combine them to discover new biological principles. Three problems stand out: many of the species observed within microbiome systems have never been cultured, and assembling their genomes from metagenomic data is a challenge; many of the genes observed within microbiomes are unannotated or misannotated; and many of the metabolites observed cannot be identified or associated with known biochemical pathways. Here we will discuss approaches we are applying with KBase and ModelSEED to address all three of these problems.

First, I'll be discussing recent advances in our ModelSEED reconstruction pipeline that substantially improve the quantitative accuracy of our draft models in predicting ATP yields across the tree of life. We apply this pipeline to 5000 reference and representative genomes, showing marked improvements in model size, accuracy, and completeness compared with previous versions of ModelSEED. Compared with competing tools such as CarveMe, our models also show improved accuracy. By correcting ATP predictions in draft reconstructions, we empower a broad family of model analysis algorithms, such as E-matrix analysis, that rely on accurate ATP predictions to work properly.

In collaboration with Mikayla Borton and Kelly Wrighton, we applied our improved pipeline to multi-omics data from the Genome Resolved Open Watersheds (GROW) project. In this project, the GROW team has already sequenced 161 river microbiomes, from which they have assembled 2093 metagenome-assembled genomes (MAGs) dereplicated across all samples. We applied the annotation and modeling systems in KBase to construct and characterize metabolic models for all of these MAGs, exposing insights into variation in metabolic pathways and energy biosynthesis mechanisms across these strains.

Unfortunately, draft metabolic models still struggle to accurately predict phenotypes for new genomes and MAGs, as they are extremely vulnerable to errors propagated from annotation gaps in metabolic pathways. Machine learning (ML) is more robust to incomplete data and generally more accurate than models when sufficient training data are available, but ML lacks mechanistic detail in its output. Clearly there is great potential synergy in combining models and ML. We explore this idea in collaboration with Jim Davis by developing ML classifiers to predict growth on 60 distinct carbon substrates based on a training set of 178 diverse genomes. We also generated draft models for the same genomes, demonstrating that the ML predictions significantly outperform the draft models. Thus, by fitting models to ML predictions for new genomes, we can rapidly identify missing annotations and raise the quality of models, even for incomplete genomes.
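The comparison between classifier predictions and draft-model predictions can be illustrated with a minimal sketch. It is not the pipeline used in this work: the substrate names, growth calls, and the helper functions `accuracy` and `gapfill_targets` are all hypothetical, chosen only to show how disagreements between the two prediction sources could flag substrates where a draft model may be missing annotations.

```python
# Toy illustration with hypothetical data; none of these values come from the study.

def accuracy(predicted, observed):
    """Fraction of shared substrates where the predicted growth call matches the observed one."""
    shared = [s for s in observed if s in predicted]
    if not shared:
        raise ValueError("no substrates in common")
    hits = sum(predicted[s] == observed[s] for s in shared)
    return hits / len(shared)


def gapfill_targets(ml_calls, model_calls):
    """Substrates where the classifier predicts growth but the draft model does not.

    These are candidate conditions to fit the model to; a substrate absent
    from model_calls is treated as 'no growth'.
    """
    return sorted(
        s for s, grows in ml_calls.items()
        if grows and not model_calls.get(s, False)
    )


# Hypothetical growth phenotypes (True = growth) for a single genome.
observed = {"glucose": True, "acetate": True, "xylose": False, "lactate": True}
ml_calls = {"glucose": True, "acetate": True, "xylose": False, "lactate": True}
model_calls = {"glucose": True, "acetate": False, "xylose": False, "lactate": False}

print("ML accuracy:", accuracy(ml_calls, observed))
print("Draft model accuracy:", accuracy(model_calls, observed))
print("Candidate gapfilling targets:", gapfill_targets(ml_calls, model_calls))
```

For a new genome with no observed phenotypes, only `gapfill_targets` would apply: the classifier's growth calls stand in for experimental data when deciding which conditions the model should be fitted to.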

Another significant challenge is that many metabolic pathways remain unknown, while the genes associated with these pathways remain unannotated or misannotated. To address this challenge, I'll be discussing the gene and pathway function discovery pipeline in KBase. We apply this pipeline to predict the pyridine degradation pathway and its associated genes in Arthrobacter luteus. The pipeline integrates data from genomes, models, cheminformatics, transcriptomes, and protein structure to identify the most probable candidate gene, which was subsequently validated in the lab. We also applied this pipeline to predict novel metabolic pathways in the minimal genome developed by JCVI, boosting our ability to explain the metabolome of this species from 10% to 50% of the observed metabolites.

Topics

Systems Biology
