(302b) Optimal Identification in Systems Biology. Applications in Cell Signaling | AIChE

Authors 

Balsa-Canto, E. - Presenter, Spanish Council for Scientific Research
Rodriguez-Fernandez, M. - Presenter, Spanish Council for Scientific Research
Alonso, A. A. - Presenter, Spanish Council for Scientific Research
Banga, J. R. - Presenter, Spanish Council for Scientific Research


More than 10% of the proteins encoded in the human genome are involved in intracellular signalling cascades, which regulate typical cellular responses such as growth, division, differentiation and apoptosis. The malfunction of these signalling pathways, particularly those involving phosphorylation cascades, is strongly related to the development of diseases including cancer, diabetes, Alzheimer's disease and Parkinson's disease. The aim of modelling cell signalling pathways is to provide a systematic framework to generate hypotheses and make predictions "in silico", in order to gain better insight into the disease process and ultimately to identify potential drug targets.

In particular, the modelling and simulation of cellular signalling pathways as networks of biochemical reactions has received major attention in recent years (see the review by Kholodenko, 2006). Most proposed models assume that the system is well mixed, so that the mass conservation law results in sets of non-linear ordinary differential equations (ODEs). These models depend on several parameters (kinetic constants, etc.) and possibly on some initial conditions (initial concentrations or numbers of molecules of some proteins) which are not accessible to experimental determination and must therefore be estimated by fitting the model to experimental data (model calibration). Model calibration is performed by minimizing a cost function which quantifies the differences between model predictions and measurements. This task is often rather complicated, mainly for the following reasons (Banga et al., 2005; Rodriguez-Fernandez et al., 2006):

- the presence of a large number of parameters (usually dozens, or even hundreds)
- the multimodal character of the optimization problem, i.e. the presence of several sub-optimal solutions
- the presence of identifiability problems, that is, the impossibility of calculating unique values for all parameters.
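As an illustration of the calibration step described above, the following minimal sketch fits a hypothetical one-step phosphorylation model (not one of the models considered in this work) to synthetic data by weighted least squares. The model, the parameter values, the noise level and the multi-start loop, which is only a crude stand-in for a proper global optimization method, are all assumptions made for the example.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares


def simulate(params, t_obs, u=1.0, x0=0.0):
    """Phosphorylated fraction x(t) of a hypothetical one-step model:
    dx/dt = k1 * u * (1 - x) - k2 * x."""
    k1, k2 = params
    rhs = lambda t, x: k1 * u * (1.0 - x) - k2 * x
    sol = solve_ivp(rhs, (0.0, t_obs[-1]), [x0], t_eval=t_obs,
                    rtol=1e-8, atol=1e-10)
    return sol.y[0]


def residuals(params, t_obs, y_obs, sigma):
    """Weighted differences between model predictions and measurements."""
    return (simulate(params, t_obs) - y_obs) / sigma


# Synthetic "measurements" (assumed values, for illustration only)
rng = np.random.default_rng(0)
t_obs = np.linspace(0.25, 5.0, 20)
true_params = np.array([0.8, 0.3])
sigma = 0.01
y_obs = simulate(true_params, t_obs) + sigma * rng.normal(size=t_obs.size)

# Multi-start local search: restart from random points and keep the best fit
best = None
for start in rng.uniform(0.01, 5.0, size=(8, 2)):
    fit = least_squares(residuals, start, bounds=(1e-6, 10.0),
                        args=(t_obs, y_obs, sigma))
    if best is None or fit.cost < best.cost:
        best = fit

estimate = best.x
print("estimated (k1, k2):", estimate)
```

In this toy problem every restart is expected to reach the same solution. With many parameters and sparse data, different restarts may end in different sub-optimal solutions, which is the multimodality listed above.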
Moreover, model calibration can only be performed successfully if the sources of information are of sufficiently high quality. Unfortunately, experiments in molecular biology are usually time-consuming and expensive, and rarely produce large and accurate data sets (Kutalik et al., 2004). This raises the following question: can the parameters be given unique values using a particular experimental procedure? The answer may well be negative, so a careful experimental design is required. This work proposes an iterative experimental design procedure which involves several phases:

1. preliminary structural identifiability analysis
2. use of parametric sensitivities to measure how the model output is affected by a slight modification of the parameters
3. use of the relative parametric sensitivities to rank the parameters in order of importance
4. computation of collinearity indexes to evaluate practical identifiability problems in groups of two to several parameters
5. the solution of an optimal experimental design (OED) problem for parameter estimation
6. the calculation of robust confidence regions for the parameter estimates

The first four phases make it possible to classify the parameters into two groups: the parameters in one group (set K) are estimated from the experimental data, whereas the parameters in the other group (set k) are kept constant. The OED phase is then devoted to obtaining the optimal experimental design for the estimation of the parameters in set K. Optimal experimental design consists of determining the scheme of measurements that generates the maximum amount of information, so that the parameters can be estimated with the greatest precision and/or decorrelation (see, for example, Banga et al., 2002). The amount and quality of information can be measured in terms of a scalar function of the Fisher Information Matrix (FIM) computed at a given (near-optimal) value of the parameters. In the context of cell signalling, Faller et al. (2003) made use of simulation-based techniques to calculate polynomial optimal input profiles in order to enhance parameter estimation accuracy for a MAP kinase cascade; Kutalik et al. (2004) proposed the calculation of optimal sampling times so as to reduce the variation of the parameter estimates.

Here, the optimal experimental design problem is formulated as a more general dynamic optimization problem, and its solution is approached using the so-called control vector parameterization (CVP) approach. The CVP scheme proceeds by dividing the duration of the experiment (time horizon) into a number of elements and approximating the input functions inside these elements using low-order polynomials. As a result, a non-linear programming problem (NLP) is obtained, where the decision variables are the polynomial coefficients plus the sampling times and possibly the experimental initial conditions. The evaluation of the objective function requires the simulation of the system dynamics plus the calculation of the parametric sensitivities needed to compute the FIM.
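The structure of the resulting problem can be sketched for a hypothetical one-step model with a single stimulus. The sketch assumes zero-order polynomials (a piecewise-constant input), fixed sampling times, independent Gaussian noise of constant variance and the determinant of the FIM (D-criterion) as the scalar measure. An exhaustive search over a coarse grid of input levels stands in for the global NLP solver, so this is an illustration of the formulation rather than the method used in this work.

```python
import itertools

import numpy as np
from scipy.integrate import solve_ivp


def simulate_with_sensitivities(levels, params, edges, t_obs):
    """Integrate dx/dt = k1*u*(1-x) - k2*x together with its forward
    sensitivities dx/dk1 and dx/dk2, with u constant on each element."""
    k1, k2 = params
    state = np.zeros(3)                    # x(0) = 0 and zero initial sensitivities
    out = np.zeros((t_obs.size, 3))
    for u, t0, t1 in zip(levels, edges[:-1], edges[1:]):
        def rhs(t, z, u=u):
            x, s1, s2 = z
            dfdx = -(k1 * u + k2)
            return [k1 * u * (1.0 - x) - k2 * x,
                    dfdx * s1 + u * (1.0 - x),
                    dfdx * s2 - x]
        sol = solve_ivp(rhs, (t0, t1), state, dense_output=True,
                        rtol=1e-8, atol=1e-10)
        mask = (t_obs > t0) & (t_obs <= t1)
        if mask.any():
            out[mask] = sol.sol(t_obs[mask]).T
        state = sol.y[:, -1]               # restart the next element from here
    return out[:, 0], out[:, 1:]


def d_criterion(levels, params, edges, t_obs, sigma):
    """Determinant of the FIM for constant measurement noise sigma."""
    _, S = simulate_with_sensitivities(levels, params, edges, t_obs)
    fim = S.T @ S / sigma**2
    return np.linalg.det(fim)


params = np.array([0.8, 0.3])              # assumed near-optimal parameter values
sigma = 0.01                               # assumed measurement noise
n_elements = 4
edges = np.linspace(0.0, 8.0, n_elements + 1)
t_obs = np.linspace(0.5, 8.0, 16)          # fixed sampling times (assumption)

# Coarse enumeration of input levels: a stand-in for a global NLP solver
grid = (0.0, 0.5, 1.0)
best_levels, best_value = None, -np.inf
for levels in itertools.product(grid, repeat=n_elements):
    value = d_criterion(levels, params, edges, t_obs, sigma)
    if value > best_value:
        best_levels, best_value = levels, value

constant_value = d_criterion((1.0,) * n_elements, params, edges, t_obs, sigma)
print("best input levels:", best_levels)
print("det(FIM), best vs constant stimulus:", best_value, constant_value)
```

Replacing the enumeration with a continuous global optimizer, and adding the sampling times and initial conditions to the decision vector, would bring the sketch closer to the formulation described above.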
Note that the non-linear character of the mathematical models of cell signalling pathways leads to multimodal NLPs; the use of global optimization methods is therefore required. Finally, the computed optimal dynamic experiments are used to generate hundreds of pseudo-experimental data sets, and the corresponding parameter estimation problems are then solved to estimate robust confidence intervals for the parameter estimates and thus measure the quality of the experimental design. The applicability and advantages of this iterative experimental design procedure are illustrated by considering a mitogen-activated protein (MAP) kinase cascade, which is frequently involved in larger cell signalling pathways and is known to regulate several cellular processes of major importance. The results obtained clearly indicate that dynamic experiments combined with optimal sampling times yield more information than classical experiments using a constant stimulus and equidistant measurements. Moreover, the resulting confidence regions for the parameter estimates are significantly reduced.
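The Monte Carlo step described above can be sketched as follows, assuming a hypothetical one-step model under constant stimulus whose closed-form solution keeps the example fast (a real pathway model would require numerical integration). The nominal parameter values, the noise levels, the number of replicates and the use of percentile intervals are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import least_squares


def model(params, t):
    """Closed-form solution of dx/dt = k1*(1-x) - k2*x with x(0) = 0."""
    k1, k2 = params
    rate = k1 + k2
    return k1 / rate * (1.0 - np.exp(-rate * t))


def fit(t_obs, y_obs, start):
    """Local least-squares fit; a global method may be needed for real models."""
    res = least_squares(lambda p: model(p, t_obs) - y_obs, start,
                        bounds=(1e-6, 10.0))
    return res.x


def monte_carlo_intervals(nominal, t_obs, sigma, n_rep=300, level=0.95, seed=0):
    """Refit n_rep noisy pseudo-experiments and return percentile intervals."""
    rng = np.random.default_rng(seed)
    y_clean = model(nominal, t_obs)
    estimates = np.array([
        fit(t_obs, y_clean + sigma * rng.normal(size=t_obs.size), nominal)
        for _ in range(n_rep)
    ])
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(estimates, [tail, 100.0 - tail], axis=0)
    return lower, upper, estimates


nominal = np.array([0.8, 0.3])          # assumed parameter values
t_obs = np.linspace(0.25, 5.0, 20)      # sampling times of the design under test

lower, upper, estimates = monte_carlo_intervals(nominal, t_obs, sigma=0.02)
for name, lo, hi in zip(["k1", "k2"], lower, upper):
    print(f"{name}: 95% interval [{lo:.3f}, {hi:.3f}]")
```

Running the same function for two candidate designs (for example, two sets of sampling times) and comparing the interval widths gives a simple measure of which design is more informative.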

References

Banga, J.R., Versyck, K.J. and Van Impe, J.F. (2002) Computation of optimal identification experiments for nonlinear dynamic process models: a stochastic global optimization approach. Ind. & Eng. Chem. Res., 41, 2425-2430.
Banga, J.R., Balsa-Canto, E., Rodríguez, M. and Alonso, A.A. (2005) Model calibration in Systems Biology. BioForum Europe, 9, 42-43.
Faller, D., Klingmüller, U. and Timmer, J. (2003) Simulation methods for optimal experimental design in systems biology. Simulation, 79, 717-725.
Kutalik, Z., Cho, K-H. and Wolkenhauer, O. (2004) Optimal sampling time selection for parameter estimation in dynamic pathway modelling. BioSystems, 75, 43-55.
Rodriguez-Fernandez, M., Mendes, P. and Banga, J.R. (2006) A hybrid approach for efficient and robust parameter estimation in biochemical pathways. BioSystems, 83 (2-3), 248-265.