Automated metabolic pathway design and characterization | AIChE

The automation and scaling of DNA synthesis and sequencing is ushering in a new era of metabolic engineering. Where it was once prohibitively costly to test many designs during a project, automated fabs are now being built to prototype biosynthetic pathways and test them on a massive scale. In conjunction with these efforts, my lab has written algorithms to automate the design of biosynthetic pathways. The Act Ontology is an aggregator of standardized Observations of biochemical reactions, which we have populated from public databases and from natural language processing (NLP) of PubMed abstracts. On top of this ontology, we have written algorithms that process and abstract the observed reactions, binning them by chemical relatedness. Building on these abstractions, a design synthesis tool uses the resulting models to predict sets of genes that, when added to a particular cell, will result in production of a target biochemical. The tool can exhaustively enumerate the chemical space accessible through concrete, monofunctional enzyme reactions; we refer to the molecules in this space as the Reachables. It can also extrapolate beyond this space by allowing one speculated reaction. The Reachables can then be surveyed for societal benefit, profit, biosafety, regulatory, or intellectual property considerations. For any one of the Reachables, Act can compute all known paths to that molecule and all known genes that could catalyze each reaction in each path. Concrete predictions can thus be enumerated and then ranked for plausibility by incorporating expression data, thermodynamic considerations, and similar evidence. We have demonstrated the utility of the tool through the biosynthesis of acetaminophen (Tylenol), an unnatural chemical that is, surprisingly, only one enzymatic step away from native E. coli metabolism. Additionally, we have demonstrated unexpected glucaric acid pathways that included predictions mined through NLP methods.
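The Reachables computation and path enumeration described above can be illustrated with a small sketch. It is not Act's implementation. It assumes a hypothetical `Reaction` record (substrate set, product set, candidate genes), uses a simple fixpoint search in the style of network expansion, and uses placeholder molecule and gene names (`M1`, `g1`, and so on) rather than real chemistry. Note that `trace_pathway` recovers only the first-discovered path to a target rather than all known paths, and `gene_combinations` picks one candidate gene per reaction along that path.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Reaction:
    """Hypothetical abstracted reaction: fires only when every substrate is available."""
    substrates: frozenset
    products: frozenset
    genes: tuple  # candidate genes observed to catalyze this reaction


def compute_reachables(natives, reactions):
    """Expand from native metabolites until no reaction adds a new molecule.

    Returns the reachable set and, for each non-native molecule, the reaction
    that first produced it (used later to trace a pathway back).
    """
    reachable = set(natives)
    parent = {}
    changed = True
    while changed:
        changed = False
        for rxn in reactions:
            if rxn.substrates <= reachable:
                for mol in rxn.products - reachable:
                    reachable.add(mol)
                    parent[mol] = rxn
                    changed = True
    return reachable, parent


def speculative_reachables(natives, reactions, speculated):
    """For each speculated reaction (added one at a time), list the newly reachable molecules."""
    base, _ = compute_reachables(natives, reactions)
    gains = {}
    for spec in speculated:
        extended, _ = compute_reachables(natives, list(reactions) + [spec])
        if extended - base:
            gains[spec] = extended - base
    return gains


def trace_pathway(target, natives, parent):
    """Walk back from target to native metabolites; earlier reactions come first."""
    ordered, seen = [], set()

    def visit(mol):
        if mol in natives or mol in seen:
            return
        seen.add(mol)
        rxn = parent[mol]
        for sub in sorted(rxn.substrates):
            visit(sub)
        if rxn not in ordered:
            ordered.append(rxn)

    visit(target)
    return ordered


def gene_combinations(pathway):
    """Enumerate concrete gene sets: one candidate gene per reaction in the pathway."""
    return list(product(*(rxn.genes for rxn in pathway)))


# Toy network with placeholder names (not real metabolites or genes).
NATIVES = {"M1", "M2"}
R1 = Reaction(frozenset({"M1"}), frozenset({"M3"}), ("g1",))
R2 = Reaction(frozenset({"M2", "M3"}), frozenset({"M4"}), ("g2", "g3"))
R3 = Reaction(frozenset({"M5"}), frozenset({"M6"}), ("g4",))  # M5 never available
S1 = Reaction(frozenset({"M4"}), frozenset({"M7"}), ("g5",))  # a speculated reaction

reachable, parent = compute_reachables(NATIVES, [R1, R2, R3])
pathway = trace_pathway("M4", NATIVES, parent)
print(sorted(reachable))           # ['M1', 'M2', 'M3', 'M4']
print(gene_combinations(pathway))  # [('g1', 'g2'), ('g1', 'g3')]
print(speculative_reachables(NATIVES, [R1, R2, R3], [S1])[S1])  # {'M7'}
```

Adding speculated reactions one at a time loosely mirrors the "one speculated reaction" extrapolation. Ranking the enumerated gene sets by expression or thermodynamic evidence would be a further step that this sketch does not attempt.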
Ultimately, the algorithms are only as accurate, detailed, and extensive as the body of Observations that supports them. In parallel with these studies, my wet lab has developed multiplexed characterization assays to scale up the acquisition of new Observations. These methods include MOLSET, a workflow that goes from DNA microarrays to expression data acquired through deep sequencing. A similar technique, DLENCA, allows enzymatic reactions to be monitored at scale. Large data sets from such experiments, and from pathway prototyping experiments, are fed into Act as Observations. This completes an automated design-test-learn cycle in which new scientific knowledge accumulates automatically through the testing of predictions.
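The feedback loop can be sketched as an observation store whose accepted reactions drive the same kind of reachability search. Everything here is an illustrative assumption rather than Act's actual data model: the `Observation` fields, the source labels, and especially the acceptance rule (at least `min_support` positive observations and more positive than negative ones) are hypothetical. The toy run shows an assay-confirmed reaction expanding the reachable set, while an NLP-mined reaction contradicted by an assay is withheld.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical record of one piece of evidence about a reaction."""
    substrates: frozenset
    products: frozenset
    gene: str
    source: str     # e.g. "database", "nlp", "assay" (illustrative labels)
    positive: bool  # True if the evidence supports the reaction


class ObservationStore:
    """Aggregates Observations per reaction and decides which ones to trust."""

    def __init__(self, min_support=1):
        self.min_support = min_support
        self._by_reaction = defaultdict(list)

    def add(self, obs):
        self._by_reaction[(obs.substrates, obs.products)].append(obs)

    def accepted_reactions(self):
        """Assumed rule: enough positive evidence, and more positive than negative."""
        accepted = []
        for (subs, prods), evidence in self._by_reaction.items():
            positives = [o for o in evidence if o.positive]
            negatives = len(evidence) - len(positives)
            if len(positives) >= self.min_support and len(positives) > negatives:
                genes = tuple(sorted({o.gene for o in positives}))
                accepted.append((subs, prods, genes))
        return accepted


def reachables(natives, reactions):
    """Fixpoint expansion over accepted (substrates, products, genes) triples."""
    reachable = set(natives)
    changed = True
    while changed:
        changed = False
        for subs, prods, _genes in reactions:
            if subs <= reachable and not prods <= reachable:
                reachable |= prods
                changed = True
    return reachable


fs = frozenset
store = ObservationStore(min_support=1)
store.add(Observation(fs({"M1"}), fs({"M2"}), "g1", "database", True))
before = reachables({"M1"}, store.accepted_reactions())

# "Test" step: new multiplexed assay results are fed back as Observations.
store.add(Observation(fs({"M2"}), fs({"M3"}), "g2", "assay", True))
store.add(Observation(fs({"M3"}), fs({"M4"}), "g3", "nlp", True))
store.add(Observation(fs({"M3"}), fs({"M4"}), "g3", "assay", False))
after = reachables({"M1"}, store.accepted_reactions())

print(sorted(before))  # ['M1', 'M2']
print(sorted(after))   # ['M1', 'M2', 'M3']
```

A real system would likely weigh evidence by source and assay quality; a simple vote is used here only to keep the sketch short.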