(541d) Identification of Families of Signal Transduction Models Using Pareto Optimal Ensemble Techniques (POETs)
2010 AIChE Annual Meeting
Systems Biology
In Silico Systems Biology: Intracellular Signaling and Gene Regulation
Wednesday, November 10, 2010 - 4:30pm to 4:55pm
Mathematical modeling of complex signal transduction and gene expression programs is an emerging tool for understanding disease mechanisms. However, conventional wisdom holds that the data required to identify and validate complex mechanistic models is prohibitively large. Typically, model parameters cannot be uniquely identified, even with extensive training data and perfect models. This reality has raised a number of interesting questions. For example, do we actually need exact parameter knowledge to predict qualitatively important properties of a molecular network? Ensemble approaches, which use parametrically and potentially even structurally uncertain model families, have emerged to deal with uncertainty in systems biology and in other fields such as weather prediction. Their central value has been the ability to quantify simulation uncertainty and to constrain model predictions, even when only order-of-magnitude parameter estimates are available.
In this study, we introduce Pareto Optimal Ensemble Techniques (POETs) to identify a family of simplified proof-of-concept signal transduction models. POETs integrate Simulated Annealing (SA) with Pareto optimality to estimate parameter sets on or near the optimal tradeoff surface between competing training objectives. The proof-of-concept model described, in general terms, the integration of extracellular signals with kinase activation, the phosphorylation of transcription factors and the up-regulation of an associated transcriptional program (including molecular descriptions of transcription, translation initiation and nuclear transport). Thus, while the example was not specific to a particular growth factor, signal integration cascade or expression program, it contained many of the general features encountered when identifying specific models.
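The combination of Simulated Annealing with Pareto optimality can be illustrated with a minimal sketch. The code below is not the authors' implementation: the Pareto-rank acceptance rule, the log-space perturbation, the cooling schedule, and every name in it (`pareto_rank`, `poets_sketch`, `toy_objectives`) are assumptions chosen for illustration. The two toy objectives stand in for competing training errors.

```python
import math
import random


def pareto_rank(candidate, archive):
    """Count the archive members that dominate `candidate` (0 = on the front)."""
    def dominates(a, b):
        # a dominates b if it is no worse in every objective and better in one
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return sum(dominates(other, candidate) for other in archive)


def poets_sketch(objectives, initial, n_steps=2000, temperature=1.0,
                 cooling=0.999, step=0.1, max_rank=1, seed=0):
    """Hypothetical SA loop that uses Pareto rank as the annealing 'energy'."""
    rng = random.Random(seed)
    current = list(initial)
    ensemble = [(current, objectives(current))]
    for _ in range(n_steps):
        # Multiplicative (log-space) perturbation keeps rate constants positive
        trial = [p * math.exp(step * rng.gauss(0.0, 1.0)) for p in current]
        errors = objectives(trial)
        rank = pareto_rank(errors, [e for _, e in ensemble])
        # Rank-0 trials are always accepted; worse ranks with decaying probability
        if rng.random() < math.exp(-rank / temperature):
            current = trial
            if rank <= max_rank:
                ensemble.append((trial, errors))
        temperature *= cooling
    # Re-rank at the end so that only near-optimal members are returned
    all_errors = [e for _, e in ensemble]
    return [(p, e) for p, e in ensemble
            if pareto_rank(e, all_errors) <= max_rank]


def toy_objectives(params):
    """Two competing toy errors: no single k minimizes both."""
    k = params[0]
    return ((k - 1.0) ** 2, (k - 2.0) ** 2)


ensemble = poets_sketch(toy_objectives, [5.0])
```

In this sketch the returned `ensemble` holds parameter sets on or near the tradeoff curve of the two toy errors, rather than a single best-fit point.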
We modeled the molecular interactions in the prototypical signaling network using mass action kinetics within an ordinary differential equation (ODE) framework (64 ODEs in total). We assumed spatial homogeneity but differentiated between cytosolic, membrane and nuclear localized processes. The true model was used to generate synthetic data sets, with which we tested the ability of the POET algorithm to identify the 117 unknown model parameters. Each synthetic measurement was assumed to be a Northern or Western blot. Thus, we knew only relative amounts of protein or mRNA for any specific condition or time. To integrate these types of measurements systematically into the model identification problem, we implemented a novel scaling procedure. To constrain the absolute concentration scale, we assumed a single ELISA or qRT-PCR measurement for the highest intensity band in each case. Lastly, we limited our training data to 20 samples per experiment (an upper limit on the lanes available on a Western blot).
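The abstract does not spell out the scaling procedure, so the following is only one plausible sketch of how relative blot data might enter a training objective. It assumes that each band is normalized by the highest intensity band of its experiment, that the single absolute measurement anchors that peak, and that simulated values are normalized the same way before comparison. The function names and the squared-error form are hypothetical.

```python
def scale_to_max(values):
    """Normalize a series so that its largest entry equals 1."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("need at least one positive value to scale by")
    return [v / peak for v in values]


def absolute_from_blot(intensities, peak_concentration):
    """Convert band intensities to concentrations, given one absolute
    measurement (e.g. ELISA) of the highest intensity band."""
    return [peak_concentration * s for s in scale_to_max(intensities)]


def scaled_squared_error(simulated, intensities):
    """Compare simulation and blot on the same relative (0 to 1) scale."""
    sim = scale_to_max(simulated)
    obs = scale_to_max(intensities)
    return sum((s - o) ** 2 for s, o in zip(sim, obs))


# Three lanes of a hypothetical blot, with the peak band measured at 2.0 units
concentrations = absolute_from_blot([10.0, 40.0, 20.0], 2.0)
error = scaled_squared_error([1.0, 4.0, 2.0], [10.0, 40.0, 20.0])
```

Because both series are divided by their own maximum, a simulation that reproduces the relative pattern of the bands gives zero error whatever the arbitrary intensity units of the blot.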
The POET algorithm generated an ensemble of 1600 signal transduction models with several interesting features. First, the ODE model used here was deterministic and did not describe stochastic gene expression fluctuations. However, because many different models were sampled, the deterministic ensemble exhibited population-like behavior. For example, scaled gene expression levels were approximately normally distributed following the addition of extracellular ligand. Thus, while gene expression was not described at a single-cell level, the ensemble captured coarse-grained expression heterogeneity. This suggested that deterministic ensembles could perhaps be used to model heterogeneous populations when stochastic simulation is not tractable. Second, the model ensemble captured critical robustness and fragility features of the true model, despite significant parameter uncertainty. Edge and node ranks computed over the ensemble recovered the true rankings for highly fragile and highly robust network components. This suggested that, in practice, sensitivity analysis results obtained from model ensembles could represent true behavior, at least for highly fragile or robust network features. Taken together, analysis of the experimentally constrained ensemble of models generated using POETs suggested that we do not need exact parameter information to understand qualitatively important network features.
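The idea that rankings can survive parameter uncertainty can be illustrated with a small sketch. It is not the edge and node analysis used in the study: it assumes normalized finite-difference sensitivities of a single scalar output, ranks parameters within each ensemble member, and averages the ranks. The toy output function and all names are hypothetical.

```python
def sensitivity_ranks(model, params, rel_step=1e-3):
    """Rank parameters by |d ln(output) / d ln(parameter)|; 1 = most fragile."""
    base = model(params)
    magnitudes = []
    for i, p in enumerate(params):
        bumped = list(params)
        bumped[i] = p * (1.0 + rel_step)
        # Forward finite-difference estimate of the normalized sensitivity
        magnitudes.append(abs((model(bumped) - base) / (base * rel_step)))
    order = sorted(range(len(params)), key=lambda i: -magnitudes[i])
    ranks = [0] * len(params)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def mean_ranks(model, ensemble):
    """Average each parameter's rank over all ensemble members."""
    all_ranks = [sensitivity_ranks(model, p) for p in ensemble]
    return [sum(column) / len(all_ranks) for column in zip(*all_ranks)]


def toy_output(p):
    """Toy response: strongly, moderately and weakly sensitive parameters."""
    return p[0] ** 2 * p[1] ** 0.5 / (1.0 + 0.01 * p[2])


# Members differ by up to two orders of magnitude in each parameter
toy_ensemble = [[1.0, 1.0, 1.0], [10.0, 0.1, 10.0], [0.1, 5.0, 0.1]]
ranks = mean_ranks(toy_output, toy_ensemble)
```

In this toy case every member yields the same ordering even though the parameter values differ widely, which is the kind of behavior described above for highly fragile and highly robust components.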