(725e) NEW Capabilities for Large-Scale Models in Computational Biology | AIChE

Authors 

Abbott, C. - Presenter, Brigham Young University
Haseltine, E. L., California Institute of Technology


NEW CAPABILITIES FOR LARGE-SCALE MODELS IN COMPUTATIONAL BIOLOGY

Casey S. Abbott and John D. Hedengren

Brigham Young University

Provo, Utah

Abstract

Introduction

         Advances in biomedical research are producing a growing volume of experimental data to be interpreted in the context of reaction pathways, molecular transport, and population dynamics. Kinetic modeling is a common way to interpret these data and is used in the pharmaceutical industry in developing clinical trials for new medications [1]. These models are based on first principles such as mole balances and kinetic reactions. Often in developing a model there are parameters and initial conditions that are costly to measure, or that cannot be measured directly through experimental procedures. These parameters are usually estimated through optimization techniques. It is reasonable to believe that the role of biological modeling will only increase as pharmaceutical companies such as Pfizer look to scale back and become more focused, spending less on R&D while expecting more results [2]. One company that sees biological modeling as key to the future is Vertex Pharmaceuticals, which is "working ... to develop improved models that can be used to more rapidly identify and optimize lead molecules and drug candidates than currently used methods" [3].

         Another indicator of the growing interest in biological models is the large repository of models publicly available in the Systems Biology Markup Language (SBML), which includes hundreds of contributions. Many models in this standard format for computational biology contain detailed metabolic reaction pathways that describe biological systems, including cause-and-effect relationships in the human body. While simulations of these biological systems have been applied successfully for many years, aligning them with available measurements remains a challenge. Researchers report that the best available solution techniques still limit the reconciliation of models with measurements to small- and medium-scale problems. This limits the usefulness of the models because of the many assumptions and simplifications required for the optimizer to be able to perform parameter estimation.

         This study investigates the ability of advanced process monitor (APM) software to estimate parameters of large-scale models. APM is proven optimization and control software, developed in the petrochemical industry, that uses an optimization technique known as the simultaneous approach. This approach shows promise in efficiently optimizing large models with thousands of variables and parameters [4]. In this method, the model and the optimization problem are solved simultaneously, as opposed to the traditional approach of solving the differential and algebraic equation (DAE) model sequentially. In the sequential approach, each iteration of the optimization requires a full solution of the DAE model. Much of the recent development of the simultaneous approach has occurred in the petrochemical industry, where on-line process control applications require optimization of nonlinear models with many decision variables within minutes. Given the success of APM in the petrochemical industry on such models, this study examines whether similar results can be achieved in computational biology.
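
The contrast between the two approaches can be illustrated on a toy problem. In the sketch below (an illustration of the simultaneous idea, not APM's actual implementation), a single rate constant k in dx/dt = -k x is estimated: the implicit-Euler discretization of the model enters the optimization as equality constraints, so the states and the parameter are solved for together rather than by repeated integration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy model dx/dt = -k*x discretized by implicit Euler:
#   x[i+1]*(1 + k*dt) = x[i]
dt, n, k_true = 0.1, 20, 0.5
data = np.array([1.0 / (1.0 + k_true * dt) ** i for i in range(n + 1)])

def objective(z):
    # z = [k, x0, ..., xn]; least-squares mismatch to the data
    return np.sum((z[1:] - data) ** 2)

def model_residuals(z):
    # Implicit-Euler equations imposed as equality constraints,
    # so the model is satisfied *simultaneously* with the fit
    k, x = z[0], z[1:]
    return x[1:] * (1.0 + k * dt) - x[:-1]

z0 = np.concatenate(([1.0], data))  # poor initial k, states at the data
sol = minimize(objective, z0, method="SLSQP",
               constraints={"type": "eq", "fun": model_residuals})
k_hat = sol.x[0]
```

A sequential approach would instead integrate the model to obtain x(t; k) at every trial value of k; the simultaneous formulation avoids repeated integrations at the cost of a larger but sparser problem, which modern nonlinear programming solvers exploit.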

Results

         The first step in testing APM's capabilities is to show that it can accurately simulate small biological models. To do this, the results of the APM simulations were compared against the literature and against MATLAB simulation values for two small models. The first was a basic model describing the concentration of HIV virions over thirty days, with nine parameters, three variables, and three differential equations [5]. APM successfully replicated the results published in the literature for this model. The second model describes the dynamics of HIV infection of CD4+ T cells [6]. It is slightly larger, with nine parameters, four variables, and five DAEs. This model was obtained from the BioModels Database and was manually converted to a format usable by APM. APM also simulated this model accurately, matching values from the literature and from MATLAB simulations.
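
The basic virus-dynamics model of [5] has the three-state structure shown below. The sketch simulates it with SciPy rather than APM or MATLAB, and the parameter values are illustrative placeholders, not those used in the cited study.

```python
import numpy as np
from scipy.integrate import solve_ivp

def virus_dynamics(t, y, lam, d, beta, a, k, u):
    """Basic virus-dynamics model (Nowak & May form):
    uninfected cells x, infected cells w, free virus v."""
    x, w, v = y
    dx = lam - d * x - beta * x * v     # cell production, death, infection
    dw = beta * x * v - a * w           # infection, infected-cell death
    dv = k * w - u * v                  # virion production and clearance
    return [dx, dw, dv]

# Illustrative parameter values (placeholders, not from the literature)
params = (10.0, 0.1, 0.01, 0.5, 10.0, 3.0)
sol = solve_ivp(virus_dynamics, (0.0, 30.0), [100.0, 0.0, 1.0],
                args=params, dense_output=True, rtol=1e-8)
virus_load = sol.sol(np.linspace(0, 30, 31))[2]  # virus over thirty days
```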

         Once it was shown that APM could simulate biological models accurately, the next step was to verify its parameter estimation capabilities. This was done using an HIV model similar to those mentioned above. The objective function was set to minimize the absolute error between the model and synthetic data. Measurement noise of plus or minus 0.5 log order was added to the synthetic data to make it more realistic. All six parameters were estimated in order to verify that APM could find the correct parameter values. The estimation was started from several different starting points to ensure the accuracy of APM over the design space. Figure 1 shows the concentration of HIV virions from the synthetic data and the model predictions using the estimated parameters. As seen in the figure, APM found parameter values that allowed the model to fit the synthetic data. With this same model it was shown that APM supports parallel processing, allowing multiple parameter estimations to be run simultaneously.

Figure 1: Parameter estimation capabilities of APM to fit a model to synthetic data
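
The estimation procedure above can be sketched on a simpler surrogate: a one-compartment viral decay model in place of the six-parameter HIV model. The sketch reproduces the three ingredients described in the text: an absolute-error objective, synthetic data corrupted by plus or minus 0.5 log order of noise, and multiple starting points. The model form and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Surrogate model: viral load v(t) = v0*exp(-c*t), fit in log10 space
t = np.arange(0, 31.0)
v0_true, c_true = 1e3, 0.25
clean = np.log10(v0_true) - c_true * t / np.log(10)

# Synthetic data with +/- 0.5 log-order measurement noise
data = clean + rng.uniform(-0.5, 0.5, size=t.size)

def abs_error(q):
    # q = [log10(v0), c]; minimize absolute (L1) error in log space
    return np.sum(np.abs(q[0] - q[1] * t / np.log(10) - data))

# Several starting points spread over the design space
starts = [(1.0, 0.02), (5.0, 1.0), (3.0, 0.1), (2.0, 0.6)]
fits = [minimize(abs_error, q0, method="Nelder-Mead") for q0 in starts]
best = min(fits, key=lambda f: f.fun)   # keep the best of the multi-start runs
log_v0_hat, c_hat = best.x
```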

         Before APM's parameter estimation could be applied to large-scale biological models, it was necessary to create an automatic conversion from SBML to a format usable by APM. This not only eliminates human error in the conversion process but also allows many publicly available models to be evaluated quickly.
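
The conversion utility itself is not reproduced here, but its core task can be sketched with Python's standard-library XML parser: read species and parameters out of an SBML document and re-emit them as model declarations. Both the SBML snippet and the APM-style output syntax below are simplified illustrations, not the utility's actual input or output.

```python
import xml.etree.ElementTree as ET

# Minimal, hypothetical SBML fragment (illustrative, not a real BioModels entry)
SBML = """<?xml version="1.0"?>
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="hiv_basic">
    <listOfSpecies>
      <species id="virus" initialConcentration="1.0"/>
      <species id="target_cells" initialConcentration="100.0"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="beta" value="0.01"/>
      <parameter id="clearance" value="3.0"/>
    </listOfParameters>
  </model>
</sbml>"""

def local(tag):
    # Strip the XML namespace so lookups work across SBML levels/versions
    return tag.rsplit("}", 1)[-1]

root = ET.fromstring(SBML)
species, parameters = {}, {}
for el in root.iter():
    if local(el.tag) == "species":
        species[el.get("id")] = float(el.get("initialConcentration", "0"))
    elif local(el.tag) == "parameter":
        parameters[el.get("id")] = float(el.get("value", "0"))

# Emit a simplified APM-style block (syntax approximated for illustration)
lines = ["Parameters"]
lines += [f"  {k} = {v}" for k, v in parameters.items()]
lines += ["End Parameters", "Variables"]
lines += [f"  {k} = {v}" for k, v in species.items()]
lines += ["End Variables"]
apm_text = "\n".join(lines)
```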

         This conversion utility was used to automatically convert a model that describes the ErbB signaling pathways [7]. It is a large model with 225 parameters, 504 variables, and 1331 DAEs. In the original study, 75 initial conditions and rate constants were estimated out of the 229 identified by sensitivity analysis. This was accomplished through simulated annealing and required, on average, 100 annealing runs and 24 hours on a 100-node cluster computer to obtain one good fit.

         APM is currently able to simulate the ErbB model but does not match the literature values, although it does reproduce the dynamics reported in the literature. The mismatch is believed to stem from a limitation of the conversion utility, which does not yet properly handle piecewise functions. Even if the literature values are not replicated exactly, parameter estimation can still be performed to show the contribution of APM. The current parameter values can be taken as the correct values; these values will then be perturbed, and APM will attempt to recover the assumed correct values through parameter estimation. Once again, measurement error will be applied to the values used to conduct the parameter estimation. Instead of using simulated annealing to estimate the parameters, a multi-start approach will be used: the parameter values will be randomly varied by up to plus or minus 2.5 log orders from the prior value and then optimized to minimize an objective function. To test the full capabilities of APM, all of the parameters will be estimated. A large number of these runs will be performed and the results compared to the base values. If the design space proves too flat, or if there are too many local optima, other optimization techniques such as simulated annealing or a genetic algorithm will be considered. Once this is complete, the results will be compared to those found in the paper. The time required to solve the parameter estimation will also be analyzed and compared to the traditional sequential approach. It is believed that APM will significantly decrease the time required to perform parameter estimation of large-scale biological models and will enable the use of large-scale models in the pharmaceutical industry.
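
The multi-start perturbation described above can be written compactly: each starting point scales the base parameter vector by a factor drawn log-uniformly within plus or minus 2.5 orders of magnitude. This sketch covers only the sampling step, not the subsequent optimization, and the base parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def multistart_points(p_base, n_starts, log_range=2.5):
    """Perturb each base parameter by up to +/- `log_range` orders of
    magnitude, i.e. multiply by a log-uniform random factor."""
    p_base = np.asarray(p_base, dtype=float)
    factors = 10.0 ** rng.uniform(-log_range, log_range,
                                  size=(n_starts, p_base.size))
    return p_base * factors

base = np.array([0.01, 0.5, 3.0])   # hypothetical rate constants
starts = multistart_points(base, n_starts=100)
```

Sampling multiplicative factors in log space keeps every starting point positive and spreads the starts evenly across orders of magnitude, which matters when rate constants differ by factors of thousands.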

References

1. Adiwijaya, B. S., Herrmann, E., Hare, B., Kieffer, T., Lin, C., Kwong, A. D., Garg, V., Randle, J. C. R., Sarrazin, C., Zeuzem, S., and Caron, P. R. (2010), "A multi-variant, viral dynamic model of genotype 1 HCV to assess the in vivo evolution of protease-inhibitor resistant variants," PLoS Comput. Biol., 6(4):e1000745.

2. Thomas, K. (2012, May 1), "Pfizer Races to Reinvent Itself," New York Times. Retrieved from

http://www.nytimes.com/2012/05/02/business/pfizer-profit-declines-19-aft...

3. Vertex (2011, January 19). Retrieved from

http://www.vrtx.com/a-network-of-minds/our-network.html

4. Biegler, L. T. (2007), "An overview of simultaneous strategies for dynamic optimization," Chemical Engineering and Processing: Process Intensification, 46(11), pp. 1043-1053.

5. Nowak, M. and May, R. (2000), Virus Dynamics: Mathematical Principles of Immunology and Virology. Oxford, New York: Oxford University Press.

6. Perelson, A. S., Kirschner, D. E., and De Boer, R. (1993), "Dynamics of HIV infection of CD4+ T cells," Math. Biosci., March, pp. 81-125.

7. Chen, W. W., Schoeberl, B., Jasper, P. J., Niepel, M., Nielsen, U. B., Lauffenburger, D. A., and Sorger, P. K. (2009), "Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data," Molecular Systems Biology, 5:239.