(291e) Application of Machine Learning and Active Learning to Enhance Chemical Yields in Microbes | AIChE

Authors 

Adamczyk, P., DOE Great Lakes Bioenergy Research Center
Zhang, X., University of Wisconsin Madison
Ramanathan, P., University of Wisconsin Madison
Reed, J., University of Wisconsin Madison
Microbes can be used to produce a wide variety of important chemicals from low-cost substrates. Their biochemical production capabilities can be improved through metabolic engineering, in which metabolic and regulatory processes are adjusted by genetic manipulation. Computational methods have been developed to study the metabolic and regulatory networks of microbes and to identify the genetic interventions needed to enhance production of the desired metabolite(s). Kinetic models can suggest strategies, based on changes in enzyme levels and/or kinetic properties, for improving flux through a metabolic pathway, but they require extensive experimental data (e.g., proteomic, metabolomic, and fluxomic data) for parameterization. Hence, computational methods are needed that can predict the expression levels required to achieve metabolic engineering goals from limited experimental data and without kinetic details of the biochemical pathways.

We developed an active learning framework called ActiveOpt to design expression constructs for a metabolic pathway of interest. ActiveOpt does not need a detailed kinetic model; instead, it uses a Support Vector Machine (SVM) classifier to predict whether product yields or productivities will be high or low from ribosome binding site (RBS) strengths estimated by the RBS Calculator [1]. ActiveOpt first trains an SVM classifier on a few experiments, in which the RBSs in gene expression constructs are varied and the measured product yields are labeled as high or low, and then proposes the next experiments to conduct. With relatively little experimental data and no mechanistic or kinetic details of the pathway, ActiveOpt can design experiments that achieve high biochemical yields in a small number of experiments.
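The train-then-propose loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not the ActiveOpt implementation. It assumes each candidate construct is represented by a feature vector of RBS strengths (for example, log-scaled), one entry per gene; it fits a linear soft-margin SVM by subgradient descent, which stands in for whatever SVM solver and kernel ActiveOpt actually uses; and it assumes a selection rule (run the untested construct with the largest SVM decision value) that may differ from ActiveOpt's. The names `train_linear_svm`, `decision`, `active_learning_sketch`, `threshold`, and `budget` are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Linear soft-margin SVM fitted by full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    Labels must be +1 (high yield) or -1 (low yield)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this experiment
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        step = lr / t ** 0.5  # diminishing step size
        w = [wj - step * g for wj, g in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def decision(w: List[float], b: float, x: Vector) -> float:
    """Signed SVM score; larger means 'more confidently high yield'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def active_learning_sketch(candidates: List[Vector],
                           run_experiment: Callable[[Vector], float],
                           initial: List[int], threshold: float,
                           budget: int) -> Tuple[int, float, List[int]]:
    """Hypothetical ActiveOpt-style loop: measure the initial constructs, label
    yields as high (>= threshold) or low, retrain the SVM, and propose the next
    construct until `budget` experiments have been run.
    Returns (best construct index, its measured yield, order tested)."""
    measured: Dict[int, float] = {}
    order: List[int] = []
    for i in initial:
        measured[i] = run_experiment(candidates[i])
        order.append(i)
    while len(measured) < budget:
        remaining = [i for i in range(len(candidates)) if i not in measured]
        if not remaining:
            break
        tested = list(measured)
        X = [candidates[i] for i in tested]
        y = [1 if measured[i] >= threshold else -1 for i in tested]
        w, b = train_linear_svm(X, y)
        # Assumed selection rule: run the untested construct that the
        # classifier scores as most confidently high yield.
        nxt = max(remaining, key=lambda i: decision(w, b, candidates[i]))
        measured[nxt] = run_experiment(candidates[nxt])
        order.append(nxt)
    best = max(measured, key=measured.get)
    return best, measured[best], order
```

For example, with `candidates` built from a grid of RBS strengths for two genes and a simulated `run_experiment`, a call such as `active_learning_sketch(candidates, run_experiment, initial=[0, 24, 4, 15], threshold=0.5, budget=8)` runs at most eight experiments and returns the best construct measured. Retraining after every new result is what lets a handful of labeled experiments steer the search.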

ActiveOpt was tested on two datasets: (i) a newly generated valine yield dataset and (ii) a published neurosporene productivity dataset [2]. The valine dataset comprised 91 experiments in which plasmids expressing valine biosynthesis and exporter genes (ilvBNIHCDE and ygaZH) with varying RBS strengths were transformed into Escherichia coli and valine yields were measured. Leave-one-out cross-validation showed that SVM classifiers built from this dataset have high precision (75%) and recall (87%). Starting from just a few of the 91 possible experiments, ActiveOpt identified expression constructs giving at least 95% of the highest valine yield measured across all 91 experiments in a small number of experiments (typically fewer than 7), and identified the genes whose RBS strengths significantly affect valine yield. On the previously published neurosporene dataset, the algorithm again identified high-productivity expression constructs in fewer than 10 experiments, compared with the 101 experiments conducted in the original study.
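Leave-one-out cross-validation, used above to estimate precision and recall, trains on all experiments but one, predicts the held-out experiment, and repeats this for every experiment. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes labels of +1 (high yield) and -1 (low yield) and takes a generic `fit` function (a hypothetical interface; in this setting it would wrap an SVM trainer) so that the evaluation logic stays independent of the classifier.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], int]
# Hypothetical interface: fit(X, y) returns a predictor; labels are +1/-1.
Fitter = Callable[[List[Vector], List[int]], Predictor]


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (+1, high-yield) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # high, predicted high
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)  # low, predicted high
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)  # high, predicted low
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def loocv_predictions(X: List[Vector], y: List[int], fit: Fitter) -> List[int]:
    """Leave-one-out: for each experiment i, train on all others and predict i."""
    preds = []
    for i in range(len(X)):
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(X[i]))
    return preds
```

Usage: `precision, recall = precision_recall(y, loocv_predictions(X, y, fit))`. Returning 0.0 when a ratio is undefined (no positive predictions, or no positive labels) is a convention chosen here for simplicity.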

These results show that ActiveOpt can efficiently design gene expression constructs that lead to high chemical yields in organisms within a very small number of experiments. It can also identify the genes whose expression (as predicted from RBS strengths) significantly influences biochemical production.

References

1. Salis, H.M., E.A. Mirsky, and C.A. Voigt, Automated design of synthetic ribosome binding sites to control protein expression. Nat Biotechnol, 2009. 27(10): p. 946-50.

2. Farasat, I., et al., Efficient search, mapping, and optimization of multi-protein genetic systems in diverse bacteria. Mol Syst Biol, 2014. 10: p. 731.