(62g) Metabolic Modeling Tools in KBase to Integrate Multi-Omics Data, Understand Microbiome Interactions, and Explore Energy Biosynthesis Mechanisms | AIChE

Authors 

Henry, C. S. - Presenter, Argonne National Laboratory
Multi-omics data is becoming increasingly accessible for the study of a wide range of complex biological systems. Today, large-scale metagenomes can be readily obtained from soil microbiome systems, while the instruments and protocols for collecting metabolomic and proteomic data are constantly improving. Yet analysis methods still struggle to annotate and integrate these individual datasets, let alone combine them to discover new biological principles. Three problems stand out: many of the species observed within microbiome systems have never been cultured, and assembling their genomes from metagenomic data is a challenge; many of the genes observed within microbiomes are unannotated or misannotated; and many of the metabolites observed cannot be identified or associated with known biochemical pathways. Here we will discuss approaches we are applying with KBase and ModelSEED to address all three of these problems.

First, I’ll discuss recent advances in our ModelSEED reconstruction pipeline that substantially improve the quantitative accuracy of our draft models in predicting ATP yields across the tree of life. We apply this pipeline to 5000 reference and representative genomes, showing dramatic improvements in model size, accuracy, and completeness compared with previous versions of ModelSEED. We also show improved accuracy relative to competing tools like CarveMe. By correcting ATP predictions in draft reconstructions, we empower a broad family of model analysis algorithms, such as E-matrix analysis, that rely on accurate ATP predictions to work properly.

In collaboration with Mikayla Borton and Kelly Wrighton, we applied our improved pipeline to analyze multi-omics data from the Genome Resolved Open Watersheds (GROW) project. In this project, the GROW team has already sequenced 161 river microbiomes, from which they have assembled 2093 metagenome-assembled genomes (MAGs), dereplicated across all samples. We applied the annotation and modeling systems in KBase to construct and characterize metabolic models for all of these MAGs, revealing variation in metabolic pathways and energy biosynthesis mechanisms across these strains.

Unfortunately, draft metabolic models still struggle to accurately predict phenotypes for new genomes and MAGs, as they are extremely vulnerable to errors propagated from annotation gaps in metabolic pathways. Machine learning (ML) is more robust to incomplete data and generally more accurate than models when sufficient training data is available, but ML lacks mechanistic detail in its output. There is therefore great potential synergy in combining models and ML. We explore this idea in collaboration with Jim Davis by developing machine learning classifiers to predict growth on 60 distinct carbon substrates based on a training set of 178 diverse genomes. We also generated draft models for these same genomes and found that the machine learning predictions significantly outperform the draft models. Thus, by fitting models to ML predictions for new genomes, we can rapidly identify missing annotations and raise the quality of models, even for incomplete genomes.
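
The idea of fitting models to ML predictions can be illustrated by comparing two growth/no-growth tables for a single genome. The sketch below is only an illustration and not the KBase or ModelSEED implementation: the function name, the dictionary layout, and the substrate calls are all hypothetical, and in practice each flagged substrate would still have to be passed to a gapfilling step.

```python
from typing import Dict, List


def find_gapfill_targets(
    ml_growth: Dict[str, bool],
    model_growth: Dict[str, bool],
) -> Dict[str, List[str]]:
    """Compare ML and draft-model growth calls for one genome.

    Both inputs map a carbon substrate name to a predicted growth call.
    All names are hypothetical; this is not a KBase or ModelSEED API.
    """
    # Only substrates scored by both predictors can be compared.
    shared = sorted(set(ml_growth) & set(model_growth))
    # ML predicts growth but the model cannot grow: a pathway step is
    # probably missing, so these substrates are gapfilling candidates.
    missing = [s for s in shared if ml_growth[s] and not model_growth[s]]
    # The model grows but ML predicts no growth: the model may contain
    # an over-permissive annotation that deserves review.
    extra = [s for s in shared if model_growth[s] and not ml_growth[s]]
    agree = [s for s in shared if ml_growth[s] == model_growth[s]]
    return {
        "gapfill_candidates": missing,
        "review_candidates": extra,
        "agree": agree,
    }


# Toy calls, invented purely for illustration.
ml_calls = {"glucose": True, "acetate": True, "pyridine": True, "xylose": False}
model_calls = {"glucose": True, "acetate": False, "pyridine": False, "xylose": True}
targets = find_gapfill_targets(ml_calls, model_calls)
```

Separating the two kinds of disagreement matters because they imply different fixes: a missing reaction calls for gapfilling, while an unexpected growth call points at an annotation to re-examine.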

Another significant challenge is that many metabolic pathways remain unknown, while the genes associated with these pathways remain unannotated or misannotated. To address this challenge, I’ll discuss the gene and pathway function discovery pipeline in KBase. We apply this pipeline to predict the pyridine degradation pathway and its associated genes in Arthrobacter luteus. The pipeline integrates data from genomes, models, cheminformatics, transcriptomes, and protein structure to identify the most probable candidate gene, which was subsequently validated in the lab. We also applied this pipeline to predict novel metabolic pathways in the minimal genome developed by JCVI, raising the share of observed metabolites that we can explain for this species from 10% to 50%.
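
The "explained" share of a metabolome can be read as the fraction of observed metabolites that also appear as compounds in the model. The following is a minimal sketch under that assumption; the function name and the compound identifiers are hypothetical placeholders, not data from the JCVI study, and the actual pipeline may match metabolites in a more sophisticated way.

```python
from typing import Iterable


def explained_fraction(observed: Iterable[str], model_compounds: Iterable[str]) -> float:
    """Fraction of observed metabolites that the model contains."""
    observed_set = set(observed)
    if not observed_set:
        return 0.0  # nothing observed, so nothing to explain
    return len(observed_set & set(model_compounds)) / len(observed_set)


# Placeholder identifiers m0..m9, invented for illustration.
observed_metabolites = [f"m{i}" for i in range(10)]
base_model = {"m0"}                                   # before pathway prediction
extended_model = {"m0", "m1", "m2", "m3", "m4"}       # after adding predicted pathways

before = explained_fraction(observed_metabolites, base_model)
after = explained_fraction(observed_metabolites, extended_model)
```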