(88a) A Bayesian Experimental Design Framework for Optimizing Microbial Communities | AIChE

Authors 

Thompson, J. - Presenter, University of Wisconsin-Madison
Zavala, V., University of Wisconsin-Madison
Venturelli, O., University of Wisconsin-Madison
Microbial communities have enormous functional potential, including the ability to add value to biofuel production [1,2], improve agricultural yields [3], and detoxify waste [4]. However, designing microbial communities to perform desired functions remains a technical challenge. Although communities of microbial species can outperform individual species at industrially relevant tasks, identifying such communities and optimizing their operating conditions is difficult because the underlying mechanisms are poorly understood and it is not feasible to test every possible condition experimentally. Recently, data-driven approaches to rationally select microbial communities have emerged as a promising avenue for microbiome engineering [5]. Despite a handful of initial successes in data-driven optimization of microbial communities [6,7], these studies relied on fitting simple ecological models to small, noisy data sets and did not leverage model uncertainty to maximize the information content of experimental designs. Improving the ability to engineer microbial communities will therefore require advances in model development, parameter estimation, and optimal experimental design.

In this work, we present a Bayesian design-of-experiments framework for modeling and optimizing microbial communities directly from data. The framework has three components: a recurrent neural network (RNN) architecture tailored to model microbial community dynamics, a Bayesian inference method for parameter estimation and quantification of prediction uncertainty, and a model-guided optimization approach that selects batches of microbial communities which collectively maximize information content and functional design objectives.
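To make the batch-selection idea concrete, the following is a minimal sketch that assumes a model has already produced a predictive mean and standard deviation for each candidate community. The abstract does not state the form of the utility function, so the upper-confidence-bound style score used here, along with the names `select_batch` and `explore_weight`, is an illustrative assumption and not the authors' method.

```python
import numpy as np

def select_batch(pred_mean, pred_std, batch_size, explore_weight=1.0):
    """Rank candidate communities by a simple utility and return the top ones.

    ASSUMPTION: utility = predicted objective + explore_weight * predictive
    standard deviation. The source does not specify its utility function.
    """
    pred_mean = np.asarray(pred_mean, dtype=float)
    pred_std = np.asarray(pred_std, dtype=float)
    utility = pred_mean + explore_weight * pred_std
    # Sort candidates from highest to lowest utility (stable for ties).
    order = np.argsort(-utility, kind="stable")
    return order[:batch_size].tolist(), utility

# Hypothetical predictions for four candidate communities.
mean = [0.50, 0.80, 0.60, 0.20]   # predicted objective (exploitation term)
std = [0.05, 0.05, 0.40, 0.90]    # predictive uncertainty (exploration term)

explore_batch, _ = select_batch(mean, std, batch_size=2, explore_weight=1.0)
exploit_batch, _ = select_batch(mean, std, batch_size=2, explore_weight=0.0)
print("with exploration:", explore_batch)
print("pure exploitation:", exploit_batch)
```

Raising `explore_weight` shifts the batch toward communities the model is uncertain about, while setting it to zero selects only on the predicted objective. Independently ranking candidates in this way does not account for redundancy within a batch, which a collective batch criterion would need to address.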

Interactions between microbial species are complex and not yet well understood, which calls for flexible modeling approaches that learn how species interact from experimental data. Machine learning methods that model the dynamics of species abundance over time, such as recurrent neural networks, are therefore compelling; however, they can produce physically unrealistic predictions, such as negative species abundances or the appearance of a species that was absent from the initial community. To overcome this limitation, we present the Microbiome Recurrent Neural Network (MRNN), a modified RNN architecture that eliminates the possibility of predicting physically unrealistic species abundances and metabolite concentrations. Because biological data sets are typically limited to a small collection of noisy observations, we present a rigorous, automated approach to optimize the degree of model regularization and avoid over-fitting [8]. Furthermore, data acquisition is often time-consuming and expensive. Consequently, selecting an informative set of experiments is crucial for developing models that capture system properties while minimizing the time and resources spent on experiments [9]. We adopt a Bayesian experimental design strategy to optimize dynamic biological systems using RNNs.
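The two physical constraints described above can be illustrated with a small sketch of a recurrent update. The abstract does not say how the MRNN enforces them, so the choices here are assumptions made for illustration: a softplus output keeps predictions non-negative, and a presence mask built from the initial abundances keeps absent species at exactly zero. The function name `constrained_step`, the weight shapes, and the random parameters are all hypothetical.

```python
import numpy as np

def constrained_step(x_prev, h_prev, params, present_mask):
    """One recurrent update whose output is non-negative and respects absence.

    ASSUMPTION: this is one possible way to impose the constraints, not the
    MRNN architecture itself, which the source does not specify.
    """
    W_xh, W_hh, W_hy = params
    h = np.tanh(W_xh @ x_prev + W_hh @ h_prev)   # standard RNN hidden update
    raw = W_hy @ h                               # unconstrained output
    x_next = np.logaddexp(0.0, raw)              # softplus: always > 0
    x_next = x_next * present_mask               # absent species stay at 0
    return x_next, h

rng = np.random.default_rng(0)
n_species, n_hidden = 4, 8
params = (
    rng.normal(size=(n_hidden, n_species)),
    rng.normal(size=(n_hidden, n_hidden)),
    rng.normal(size=(n_species, n_hidden)),
)

x0 = np.array([0.1, 0.0, 0.3, 0.0])   # species 1 and 3 are not inoculated
mask = (x0 > 0).astype(float)

x, h = x0, np.zeros(n_hidden)
trajectory = [x0]
for _ in range(5):
    x, h = constrained_step(x, h, params, mask)
    trajectory.append(x)
trajectory = np.stack(trajectory)
print(trajectory)
```

With untrained random weights the numbers themselves are meaningless; the point is only that every predicted abundance is non-negative and that the two species missing from the inoculum remain at zero at every time step.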

Once fit to experimental data, the MRNN is used to make probabilistic predictions for previously unobserved experimental conditions. Using a utility function that balances exploration with exploitation, a subset of experimental conditions can then be proposed for future experiments. We show that the MRNN outperforms a more flexible machine learning model in predicting species abundances over time, using ground truth species abundance data simulated from a generalized Lotka-Volterra model. We then show that the MRNN accurately predicts confidence intervals of species abundance and metabolite concentration for an experimental dataset that contains 95 different microbial communities composed of unique subsets of 25 species. To demonstrate the ability of the framework to seek informative experimental designs that optimize a time-dependent objective, we show that the MRNN can be applied to increase the abundance of a set of beneficial microbial species using a ground truth resource-competition model.
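For readers unfamiliar with the generalized Lotka-Volterra (gLV) model mentioned above, the sketch below simulates its standard form, dx_i/dt = x_i (r_i + sum_j A_ij x_j), with a fixed-step fourth-order Runge-Kutta integrator. The growth rates `r`, interaction matrix `A`, initial condition, and step size are hypothetical values chosen for illustration; the abstract does not report the parameters used to generate its ground truth data.

```python
import numpy as np

def glv_rhs(x, r, A):
    """Generalized Lotka-Volterra right-hand side: dx/dt = x * (r + A @ x)."""
    return x * (r + A @ x)

def simulate_glv(x0, r, A, t_end=30.0, dt=0.01):
    """Integrate the gLV equations with fixed-step RK4; returns all states."""
    x = np.asarray(x0, dtype=float)
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, x.size))
    traj[0] = x
    for k in range(n_steps):
        k1 = glv_rhs(x, r, A)
        k2 = glv_rhs(x + 0.5 * dt * k1, r, A)
        k3 = glv_rhs(x + 0.5 * dt * k2, r, A)
        k4 = glv_rhs(x + dt * k3, r, A)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[k + 1] = x
    return traj

# HYPOTHETICAL parameters: three weakly competing species.
r = np.array([1.0, 0.8, 0.6])
A = np.array([
    [-1.0, -0.2, -0.1],
    [-0.3, -1.0, -0.2],
    [-0.1, -0.2, -1.0],
])
x0 = np.array([0.1, 0.1, 0.0])   # third species is absent from the inoculum

traj = simulate_glv(x0, r, A)
print("final abundances:", traj[-1])
```

Because each growth term is multiplied by the species' own abundance, a species that starts at zero stays at zero in the gLV model, which is the same physical property the MRNN is designed to respect.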

References:

[1] Scarborough MJ, Lynch G, Dickson M, McGee M, Donohue TJ, Noguera DR. Increasing the economic value of lignocellulosic stillage through medium-chain fatty acid production. Biotechnology for Biofuels. 2018;11(1):1–17.

[2] Agler MT, Spirito CM, Usack JG, Werner JJ, Angenent LT. Chain elongation with reactor microbiomes: upgrading dilute ethanol to medium-chain carboxylates. Energy & Environmental Science. 2012;5(8):8189–8192.

[3] Kaul S, Choudhary M, Gupta S, Dhar MK. Engineering host microbiome for crop improvement and sustainable agriculture. Frontiers in Microbiology. 2021;12:1125.

[4] Löffler FE, Edwards EA. Harnessing microbial activities for environmental cleanup. Current Opinion in Biotechnology. 2006;17(3):274–284.

[5] Lawson CE. Retooling microbiome engineering for a sustainable future. mSystems. 2021;6(4):e00925–21.

[6] Clark RL, Connors BM, Stevenson DM, Hromada SE, Hamilton JJ, Amador-Noguez D, et al. Design of synthetic human gut microbiome assembly and butyrate production. Nature Communications. 2021;12(1):1–16.

[7] Stein RR, Tanoue T, Szabady RL, Bhattarai SK, Olle B, Norman JM, et al. Computer-guided design of optimal microbial consortia for immune system modulation. eLife. 2018;7:e30916.

[8] Bishop CM. Pattern Recognition and Machine Learning. Springer; 2006.

[9] Box GE, Lucas H. Design of experiments in non-linear situations. Biometrika. 1959;46(1/2):77–9