(35g) Development of Machine Learning Methods for Prediction of Microbial Metabolisms and Biosynthesis Performance (Invited Speaker) | AIChE

Authors 

Tang, Y. - Presenter, Washington University in St. Louis
Mechanism-based metabolic models have been widely used to predict cell physiology under simplified bioprocess conditions. For highly complex biological systems, machine learning models may be able to decode the relationships among genetic, enzymatic, and environmental factors that lead to specific metabolic states. To test this hypothesis, we have attempted to develop and apply data-driven models to predict microbial physiology. In the first case study, we have shown that machine learning can reasonably predict fluxomes as a function of bacterial species, substrate type, growth rate, oxygen condition, and cultivation method. The training data came from 13C-metabolic flux analysis (13C-MFA), a powerful tool for experimentally measuring in vivo enzyme reaction rates in microorganisms: we leveraged published data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolism and manually extracted features from them. We then tested three machine learning methods (Support Vector Machines, k-Nearest Neighbors, and Decision Trees) to capture the complex relationship between environmental/genetic factors and metabolic flux configurations. We also implemented quadratic programming to adjust the machine-learning-predicted flux profiles so that they satisfy stoichiometric (mass balance) constraints. The resulting model can forecast bacterial fluxomes under specified conditions and facilitate metabolic analysis.
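
A minimal sketch of the two-step idea (predict fluxes with a learner, then enforce mass balance), assuming a k-nearest-neighbor regressor as the learner and a hypothetical three-reaction linear pathway. The feature encoding, toy data, and function names (`knn_predict_fluxes`, `project_to_mass_balance`) are illustrative assumptions, not the authors' implementation. When only the steady-state equality constraint S v = 0 is imposed, the quadratic program "minimize ||v - v_pred||^2 subject to S v = 0" has a closed-form optimum: the orthogonal projection of v_pred onto the null space of S. Real 13C-MFA networks with flux bounds or irreversible reactions would need a general QP solver instead.

```python
import numpy as np

# Hypothetical toy data (not from the study). Features per case:
# [scaled growth rate, aerobic flag]. Fluxes per case: [v1: uptake -> A, v2: A -> B, v3: B -> out].
X_train = np.array([[0.10, 1.0], [0.20, 1.0], [0.50, 0.0], [0.60, 0.0]])
V_train = np.array([[10.0, 7.0, 9.0], [10.0, 9.0, 9.0], [5.0, 5.0, 4.0], [6.0, 6.0, 6.0]])

# Stoichiometric matrix: rows are internal metabolites A and B, columns are reactions v1..v3.
S = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])


def knn_predict_fluxes(X, V, x_query, k=3):
    """Predict a flux vector as the mean of the k nearest training cases (Euclidean distance)."""
    dists = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return V[nearest].mean(axis=0)


def project_to_mass_balance(v_pred, S):
    """Solve min ||v - v_pred||^2 subject to S v = 0.

    With equality constraints only, the QP optimum is the orthogonal projection of
    v_pred onto the null space of S: v = v_pred - pinv(S) @ (S @ v_pred).
    """
    return v_pred - np.linalg.pinv(S) @ (S @ v_pred)


v_ml = knn_predict_fluxes(X_train, V_train, np.array([0.15, 1.0]), k=2)
v_balanced = project_to_mass_balance(v_ml, S)
print(v_ml)        # raw ML estimate; violates S v = 0
print(v_balanced)  # adjusted estimate; satisfies S v = 0
```

In this toy case the two nearest training cases average to a flux vector that is unbalanced at metabolites A and B; the projection redistributes the imbalance so all three fluxes become equal, which is the only feasible pattern for a linear chain. Using the pseudoinverse keeps the projection well defined even if S has redundant (linearly dependent) rows.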

In the second case study, we integrated data-driven methods with genome-scale metabolic models to assess microbial bio-production (yield, titer, and rate). Using Escherichia coli and Yarrowia lipolytica as example hosts, we curated data sets from recent biomanufacturing papers. We then augmented the features extracted from the literature (e.g., product and substrate types, bioreactor conditions, and genetic background) with additional features derived from genome-scale model simulations. To alleviate the challenges of sparse data sets, we employed data augmentation and ensemble learning. The hybrid framework achieved reasonably high cross-validation accuracy in predicting cell factory performance metrics under presumed bioprocess and pathway conditions. These predictions could be: 1. used to assess and rank the factors that influence bio-production; 2. integrated with technoeconomic analysis to estimate cell factory outcomes a priori (i.e., to serve as a risk assessment tool); or 3. employed alongside genome-scale modeling to improve computational design tools.
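
The sketch below illustrates, under stated assumptions, one way sparse literature data might be handled with data augmentation, a bagged ensemble, and k-fold cross-validation. The learner (ridge regression), the Gaussian-jitter augmentation, the hyperparameters, and the single "model-derived" column standing in for a genome-scale simulation output (for example, a flux-balance-analysis theoretical yield) are all hypothetical choices; the abstract does not specify them, and the data here are synthetic. Augmentation is applied only inside each training fold so that jittered copies of a test sample never leak into training.

```python
import numpy as np


def augment(X, y, n_copies, noise_sd, rng):
    """Hypothetical augmentation: append jittered copies of each sample; targets are unchanged."""
    X_aug = [X] + [X + rng.normal(0.0, noise_sd, X.shape) for _ in range(n_copies)]
    return np.vstack(X_aug), np.tile(y, n_copies + 1)


def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an intercept column."""
    A = np.hstack([np.ones((len(X), 1)), X])
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)


def predict_ridge(w, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ w


def fit_bagged(X, y, n_models, lam, rng):
    """Bootstrap aggregating: each ensemble member is fit on a resampled training set."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))
        models.append(fit_ridge(X[idx], y[idx], lam))
    return models


def predict_bagged(models, X):
    """Ensemble prediction = mean of member predictions."""
    return np.mean([predict_ridge(w, X) for w in models], axis=0)


def cross_val_r2(X, y, n_folds=5, seed=0):
    """k-fold cross-validated R^2; augmentation touches training folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    y_true, y_pred = [], []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        X_tr, y_tr = augment(X[train_idx], y[train_idx], n_copies=3, noise_sd=0.02, rng=rng)
        models = fit_bagged(X_tr, y_tr, n_models=10, lam=1e-3, rng=rng)
        y_true.append(y[test_idx])
        y_pred.append(predict_bagged(models, X[test_idx]))
    y_true, y_pred = np.concatenate(y_true), np.concatenate(y_pred)
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)


# Synthetic stand-in data (made up): two "literature" features plus one "simulation-derived" feature.
data_rng = np.random.default_rng(42)
n = 40
lit_features = data_rng.uniform(0.0, 1.0, (n, 2))  # e.g., scaled bioreactor settings
gsm_feature = data_rng.uniform(0.0, 1.0, (n, 1))   # e.g., a simulated theoretical yield
X = np.hstack([lit_features, gsm_feature])
y = 2.0 * lit_features[:, 0] + 1.0 * gsm_feature[:, 0] + data_rng.normal(0.0, 0.05, n)
print(f"cross-validated R^2: {cross_val_r2(X, y):.3f}")
```

Bagging averages members trained on different bootstrap resamples, which tends to reduce variance when only a few dozen literature records are available; any other regressor (e.g., decision trees) could replace `fit_ridge` without changing the cross-validation scaffolding.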
