Automated Statistical Design of Experiments for Metabolic Engineering | AIChE

Authors 

Olson, B. - Presenter, Lawrence Berkeley National Laboratory
Dehal, P., Lawrence Berkeley National Laboratory

The growing field of metabolic engineering for natural product production typically involves transplanting a heterologous pathway comprising multiple genes into a host organism. A common challenge in metabolic engineering is balancing enzyme expression levels in this pathway to maximize product production. Finding this optimal balance requires designing, building, and testing many strains with different combinations of genetic components (genes, promoters, ribosome binding sites, etc.). Since pathways can contain non-linear interactions, components cannot be optimized independently, and the number of potential strains to consider grows exponentially with the length of the pathway. Software tools make it easy to design constructs that cover the entire space of potential strains; however, the cost of building and testing constructs means that only a small fraction of the design space can be experimentally validated, even in a high-throughput environment. Thus, effective Design of Experiments (DoE) is key to pathway optimization given limited experimental resources.
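To make the scale of the design space concrete, the following minimal Python sketch counts and enumerates full-factorial designs. The function names and the choice of five part variants per gene are assumptions made for illustration; the abstract does not state library sizes.

```python
from itertools import product
from math import prod


def design_space_size(variants_per_gene):
    """Number of distinct strains in a full-factorial design.

    variants_per_gene: one integer per pathway gene, giving how many
    alternative parts (e.g. promoter/RBS choices) are available for it.
    """
    return prod(variants_per_gene)


def enumerate_designs(parts_per_gene):
    """Iterate over every combination of one part choice per gene."""
    return product(*parts_per_gene)


if __name__ == "__main__":
    # Hypothetical library: 5 part variants per gene (an assumption).
    for n_genes in (3, 6, 9):
        print(n_genes, "genes:", design_space_size([5] * n_genes), "strains")
```

With a fixed number of variants per gene, the count is that number raised to the power of the pathway length, which is why exhaustive construction quickly becomes impractical.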

We approach this DoE challenge as a combinatorial optimization problem. The goal is to find a high target-producing strain out of all potential strains in the pathway design space, while minimizing the total number of constructs that must actually be built and tested. The field of combinatorial optimization offers many algorithms for approaching such a problem. We propose an Estimation of Distribution Optimization (EDO) framework with an iterated regression model of the target metabolic pathway using proteomics and metabolomics data. This approach leverages multiple rounds of experiments to home in on the goal strain. In each round we first build a statistical model from the previous round’s omics data to predict target product production for any potential strain in the design space. We then sample from this model to generate the DoE for the next round. After each round, the model becomes more accurate and can direct the next round of experiments more effectively.
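The round-by-round loop described above can be sketched in Python as follows. This is an illustrative simplification, not the authors' implementation: the regression model on omics data is swapped for smoothed per-gene frequencies among the best strains seen so far (a basic univariate estimation-of-distribution scheme), and `measure_titer` is a synthetic stand-in for building and testing a strain. All names, sizes, and parameter values are assumptions.

```python
import random

N_GENES, N_LEVELS = 4, 3  # hypothetical: 4 genes, 3 expression levels each


def measure_titer(design):
    """Synthetic stand-in for a build-and-test experiment (assumption)."""
    target = (2, 0, 1, 2)  # arbitrary optimum chosen for this demo
    score = sum(1.0 for d, t in zip(design, target) if d == t)
    # Non-linear interaction: extra yield only if genes 0 and 1 both match.
    if design[0] == target[0] and design[1] == target[1]:
        score += 2.0
    return score


def fit_marginals(elite, alpha=1.0):
    """Per-gene level frequencies among elite designs, Laplace-smoothed."""
    marginals = []
    for g in range(N_GENES):
        counts = [alpha] * N_LEVELS
        for design in elite:
            counts[design[g]] += 1
        total = sum(counts)
        marginals.append([c / total for c in counts])
    return marginals


def sample_designs(marginals, n, rng, seen):
    """Draw up to n not-yet-tested designs from the product of marginals."""
    batch, attempts = set(), 0
    while len(batch) < n and attempts < 1000 * n:
        attempts += 1
        design = tuple(
            rng.choices(range(N_LEVELS), weights=w)[0] for w in marginals
        )
        if design not in seen:
            batch.add(design)
    return sorted(batch)


def run_edo(rounds=4, batch_size=8, elite_frac=0.25, seed=0):
    """Iterate: sample a batch, 'test' it, refit the sampling distribution."""
    rng = random.Random(seed)
    marginals = [[1.0 / N_LEVELS] * N_LEVELS for _ in range(N_GENES)]
    results = {}  # design -> measured titer
    for _ in range(rounds):
        for design in sample_designs(marginals, batch_size, rng, results):
            results[design] = measure_titer(design)
        ranked = sorted(results, key=results.get, reverse=True)
        elite = ranked[: max(1, int(len(ranked) * elite_frac))]
        marginals = fit_marginals(elite)
    best = max(results, key=results.get)
    return best, results[best], len(results)


if __name__ == "__main__":
    best, titer, n_tested = run_edo()
    print("best design:", best, "titer:", titer, "strains tested:", n_tested)
```

In this sketch at most 32 of the 81 possible designs are ever tested. A closer analogue of the described approach would replace `fit_marginals` with a regression model trained on measured data and sample designs according to its predictions.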

Previous approaches to guiding metabolic engineering show that statistical models are able to capture non-obvious interactions between pathway components, but they fail to realize the potential of machine learning as a tool for automated DoE. Previous methods, which employ PCA and linear regression, are limited in their ability to capture arbitrarily complex interactions in high-dimensional data sets and require significant manual interpretation. Machine learning algorithms allow us to train arbitrarily complex regression models, and the EDO framework provides automated, statistically sound DoE.
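As a small worked illustration of why a purely additive model can miss interactions, the Python sketch below fits a two-gene, two-level factorial with and without an interaction term. The titer values are invented for the example, and the helper names are hypothetical; for a balanced design with -1/+1 coding, the averages computed here coincide with the least-squares coefficients.

```python
# Hypothetical titers for a 2x2 factorial:
# (gene A level, gene B level) -> measured titer. Values are invented.
titers = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 2.0, (1, 1): 6.0}


def code(level):
    """Map a 0/1 level to the -1/+1 coding used in factorial designs."""
    return 2 * level - 1


def fit_effects(data):
    """Least-squares coefficients for a balanced 2x2 factorial design."""
    n = len(data)
    mean = sum(data.values()) / n
    b_a = sum(code(a) * y for (a, b), y in data.items()) / n
    b_b = sum(code(b) * y for (a, b), y in data.items()) / n
    b_ab = sum(code(a) * code(b) * y for (a, b), y in data.items()) / n
    return mean, b_a, b_b, b_ab


def predict(design, coeffs, with_interaction):
    """Predict titer from main effects, optionally adding the A*B term."""
    mean, b_a, b_b, b_ab = coeffs
    xa, xb = code(design[0]), code(design[1])
    y = mean + b_a * xa + b_b * xb
    return y + b_ab * xa * xb if with_interaction else y


if __name__ == "__main__":
    coeffs = fit_effects(titers)
    for design, observed in titers.items():
        additive = predict(design, coeffs, with_interaction=False)
        full = predict(design, coeffs, with_interaction=True)
        print(design, "observed", observed, "additive", additive, "full", full)
```

The additive model underestimates the strain carrying both high-level parts, while the model with the interaction term reproduces all four measurements. With many genes, the number of possible interaction terms grows quickly, which motivates more flexible regression models.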

We test the effectiveness of this approach on several existing metabolic engineering data sets. Our toolkit predicts high-yield strains for different target natural products on distinct pathways and suggests additional experiments that are likely to yield even higher production levels. These predictions are computed without the need for manual interpretation; however, the toolkit also provides visualizations of high-dimensional omics data for qualitative representations of the pathway model.