(490d) Developing An Information Theoretic Framework for Model Selection in Systems Biology | AIChE

Authors 

DeVilbiss, F. T. - Presenter, Purdue University
Ramkrishna, D., Purdue University
Song, H. S., Purdue University



During model development, systems biologists often reach a crossroads: many modeling approaches can describe the functionality of a given network, and one of them must be selected. The available system descriptions range from low-level resolutions up to the level of the whole organism. While there is no universal rule for determining the absolute utility of a model, certain metrics show promise in providing a coherent, rational, and objective basis for comparing model predictions. To address the problem of model selection, it is prudent to borrow arguments from information theory. From this perspective, models can be distinguished from one another by their ability to describe the regularity in a given system’s output data. Capturing this regularity is itself an act of compressing the data, which is why information theoretic methods are applicable. In particular, the Minimum Description Length (MDL) principle can be applied to candidate biological models to compare how well they describe regularity in data. Metrics derived from MDL principles seek to identify the single model, out of a set of candidates, that best compresses the features in the data. They do so by weighing both the model’s likelihood of fit and its inherent complexity. The model that best compresses data from a process can be said to be the most efficient and useful description of the system.
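
As a point of reference, the trade-off between fit and complexity described above is often written in the two-part ("crude") form of MDL. The notation below is an assumption introduced for illustration and is not taken from the abstract: $\mathcal{M}$ is the set of candidate models, $L(M)$ is the number of bits needed to describe model $M$, and $L(D \mid M)$ is the number of bits needed to describe the data $D$ with the help of $M$, taken here as the negative log-likelihood at the fitted parameters $\hat{\theta}_M$.

```latex
\hat{M} \;=\; \arg\min_{M \in \mathcal{M}} \bigl[\, L(M) + L(D \mid M) \,\bigr],
\qquad
L(D \mid M) \;=\; -\log P\bigl(D \mid \hat{\theta}_M\bigr)
```

A more complex model usually lowers $L(D \mid M)$ by fitting better but raises $L(M)$; the selected model is the one with the shortest total description.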

In this work, several metrics built on these concepts, including the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Normalized Maximum Likelihood (NML), are applied to a group of dynamic metabolic models to determine which model best compresses metabolic data by maximizing model fit while minimizing model complexity. For both nested and non-nested metabolic models, the analysis identifies a point of diminishing returns beyond which additional model complexity provides little gain in describing the data accurately. Starting with a comparison of flux predictions made by constraint-based, kinetic, and cybernetic metabolic models, this work develops a framework intended to be extended to other model comparison applications in systems biology.
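
To make the fit-versus-complexity trade-off concrete, the following is a minimal sketch of how AIC and BIC could be computed for a set of candidate models using their standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. It is not the authors' implementation. The Gaussian error assumption, the function names, and the three candidates with their parameter counts and residual sums of squares are all hypothetical, chosen only for illustration; they are not results from this work. NML is omitted because it requires normalizing the maximized likelihood over all possible data sets, which depends on the model class.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood assuming i.i.d. Gaussian errors,
    with the error variance estimated as rss / n (an assumption)."""
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical candidates: (name, free parameters k, residual sum of squares).
# These numbers are made up for illustration only.
candidates = [
    ("model_A", 3, 12.0),
    ("model_B", 8, 9.5),
    ("model_C", 20, 9.2),
]
n_points = 50  # hypothetical number of data points

scores = {}
for name, k, rss in candidates:
    log_lik = gaussian_log_likelihood(rss, n_points)
    scores[name] = {"aic": aic(log_lik, k), "bic": bic(log_lik, k, n_points)}

# Lower is better for both criteria.
best_aic = min(scores, key=lambda m: scores[m]["aic"])
best_bic = min(scores, key=lambda m: scores[m]["bic"])

for name, s in scores.items():
    print(f"{name}: AIC={s['aic']:.2f}  BIC={s['bic']:.2f}")
print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
```

Because BIC penalizes each parameter by ln n rather than by 2, it favors simpler models than AIC once n exceeds about 7, so the two criteria need not select the same candidate.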