Redemption: Reduced Dimensional Ensemble Modeling and Parameter Estimation | AIChE

Redemption: Reduced Dimensional Ensemble Modeling and Parameter Estimation




P355740.docx

The creation of kinetic ordinary differential equation (ODE) models of biological networks is often hampered by imprecise knowledge of reaction rate equations and their associated parameters. The model is generally written as:
dX/dt = Sv(x,p) (1)
where X is the vector of species concentrations, S is the stoichiometric matrix, v(X,p) is the vector of rate equations, and p is the vector of kinetic parameters. Here, the estimation of unknown kinetic parameters from time-series concentration data frequently becomes the bottlenecking step, motivating the development of a large number of methods [1]. Existing methods can generally be divided into two approaches: integral and differential approach, based on whether or not the ODE model is integrated during the estimation. In the integral approach, the parameter estimation is formulated as a constrained optimization to minimize model prediction error. Meanwhile, the differential approach involves a pre- processing step, in which the time-series data are smoothen and differentiated to give estimates of dX/dt and Eq (1) is then used to compute dynamic reaction flux estimates v. Afterwards, the parameters are estimated by minimizing flux prediction errors. The main advantage of the differential over the integral approach is computational speed, as the ODEs need not be integrated and the parameter estimation can be done one flux at a time. However, the differential approach is known to give biased parameter estimates, whose the parameter accuracy is sensitive to the data-smoothing step.
In metabolic networks, the number of metabolites often exceeds that of reactions, and thus the matrix S does not have a full column rank. Consequently, the differential approach cannot be directly applied to metabolic networks, as the dynamic flux estimation becomes underdetermined. Despite this restriction, it may be possible to estimate the dynamic fluxes when the data are sufficiently dense or by taking minimum norm solution [2, 3]. In a new class of methods, we have used the relationship between dX/dt and v in Eq. (1) to reduce the dimensionality of the parameter estimation [4]. Specifically, we first divided the fluxes into independent and dependent subsets, vi and vd, such that the dependent fluxes could be uniquely determined from the independent fluxes according to Eq. (1). Given the values of the (independent) parameters of vi(X,p) and data of X, the flux vi could be calculated according to the rate equations vi(X,p). Subsequently, the flux vd were obtained by solving Eq. (1) using the estimates of dX/dt from data and the vi values above. The (dependent) parameters of vd(X,p) could then be estimated by least square optimization of the differences between the flux vd and the flux function vd(X,p), performed one flux at a time. Therefore, the parameter estimation was done only over the independent parameters of vi(X,p). We have shown using case studies that such a strategy could provide more than two orders of magnitude improvement in the computational speed over the traditional parameter estimation using the same error function [4].
However, the strategy above still depends on time-series data preprocessing and thus suffers from the same issues affecting any differential estimation method. Bhatt and coworkers [5-7] recently proposed differential estimation methods that avoided time-series data smoothing and differentiation, by estimating the extents of reactions, not the reaction fluxes. Nevertheless, the estimation of reaction extents requires the S matrix to have a full column rank. In this work, we have developed a new reduced dimensional parameter estimation that can generate unbiased parameter estimates. Motivated by the
extents of reactions, the new method involved the calculation of integrated fluxes V, based on rewriting
Eq. (1) in the integral form: X(t) â?? X(0) = SV(X,p) (2)
The method was implemented following the procedure described above. In other words, we divided the
integrated fluxes V into independent and dependent components, and performed the parameter search only for those that appeared in the independent subset. We have demonstrated and compared the performance of the integrated flux method using two case studies: an ODE model of branched metabolic network and a lin-log model of L. lactis metabolic pathways. The case studies showed that the integrated flux strategy could provide unbiased parameter estimates and better accuracy than our previous method, at the cost of slightly higher computational speed. For lin-log models, the integrated flux however performed faster than the previous method, by taking advantage of the structure of the model equations where the integrations of log(X) were only computed once.
The difficulty in the parameter estimation above has been attributed to the parameter identifiability, or the lack thereof. In other words, there can exist multiple parameter combinations that are indistinguishable according to the data. This issue has motivated the application of ensemble modeling through the creation of a family of models. Within the umbrella of ensemble modeling, we have created a method for the generation of an ensemble of parameters that can provide statistically equivalent fit to the experimental data [8]. The method was based on the reduced dimensional parameter estimation method above. Here, instead of finding the global minimum solution of the error function, we used a combination of an out-of-equilibrium Adaptive Monte Carlo method and a multiple ellipsoids-based sampling method [9] to identify a set of parameters whose error functions were within a specified threshold. The threshold error value could be set using statistical score (e.g. F-distribution) or estimated by bootstrapping the original data using Monte Carlo approach. Our ensemble method has the advantage that the ensemble could be compactly defined over only the space of the independent parameters.
The methods mentioned above are available as a MATLAB based toolbox, called REDEMPTION (REduced Dimensional Ensemble Modeling and Parameter estimaTION). Here, we have used eSS (Enhanced Scatter Search) algorithm from SSmGO (Scatter Search for Matlab Global Optimization) Toolbox for the global optimization [10, 11], and CVODE from SUNDIALS (SUite of Nonlinear and DIfferential/ALgebraic equation Solvers) for solving ODEs [12]. For the exploration of parameters for ensemble modeling, we have used the HYPERSPACE toolbox [9]. The beta version of the toolbox is available upon request.
1. Chou, I.C. and E.O. Voit, Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Mathematical Biosciences, 2009. 219(2): p.
57-83.
2. Chou, I.C. and E.O. Voit, Estimation of dynamic flux profiles from metabolic time series data.
BMC Systems Biology, 2012. 6.
3. Voit, E.O., Characterizability of metabolic pathway systems from time series data. Mathematical
Biosciences, 2013.
4. Jia, G.J., G. Stephanopoulos, and R. Gunawan, Incremental parameter estimation of kinetic metabolic network models. BMC Systems Biology, 2012. 6.
5. Amrhein, M., et al., Extents of Reaction and Flow for Homogeneous Reaction Systems with Inlet and Outlet Streams. Aiche Journal, 2010. 56(11): p. 2873-2886.
6. Bhatt, N., M. Amrhein, and D. Bonvin, Incremental Identification of Reaction and Mass-Transfer Kinetics Using the Concept of Extents. Industrial & Engineering Chemistry Research, 2011. 50(23): p. 12960-12974.
7. Bhatt, N., et al., Incremental identification of reaction systems-A comparison between rate-based and extent-based approaches. Chemical Engineering Science, 2012. 83: p. 24-38.
8. Jia, G., G. Stephanopoulos, and R. Gunawan, Ensemble Kinetic Modeling of Metabolic Networks from Dynamic Metabolic Profiles. Metabolites, 2012. 2(4): p. 891-912.
9. Zamora-Sillero, E., et al., Efficient characterization of high-dimensional parameter spaces for systems biology. BMC Systems Biology, 2011. 5.
10. Egea, J.A., et al., Scatter search for chemical and bio-process optimization. Journal of Global
Optimization, 2007. 37(3): p. 481-503.
11. Rodriguez-Fernandez, M., J.A. Egea, and J.R. Banga, Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics, 2006. 7.
12. Hindmarsh, A.C., et al., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers.
Acm Transactions on Mathematical Software, 2005. 31(3): p. 363-396.