(180a) Extraction Of Transcriptional Signaling Networks Via Globally Optimal Biclustering | AIChE

Authors 

Yang, E. - Presenter, Rutgers University
Androulakis, I. - Presenter, Rutgers - The State University of New Jersey
King, K. R. - Presenter, Massachusetts General Hospital/Shriners Burn Hospital/Harvard Medical School
Yarmush, M. L. - Presenter, Center for Engineering in Medicine, Massachusetts General Hospital, Harvard Medical School, Shriners Hospital for Children


One difficulty in reconstructing gene networks is that the problem is ill-posed[1]. A successful reconstruction must determine three things: the network architecture, the weights that reflect the effect of each transcription factor on gene expression, and the activity of each transcription factor[2]. Without a sufficient number of experimental conditions, many different models can fit the expression data perfectly, simply because there are more unknowns than the data can constrain. Techniques such as partial least squares (PLS) and network component analysis (NCA) constrain the overall solution space and incorporate information about the underlying network architecture to make the problem tractable[3]. However, it remains unclear whether the constraints they impose are biologically relevant.
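The ill-posedness can be made concrete with a short worked equation. The block below assumes the bilinear model used in NCA[2], which this abstract does not itself state; the symbols E, A, P, X and the dimensions N, M, L are notation introduced here for illustration only.

```latex
% Assumed model (from NCA, ref. 2): N genes, M conditions, L transcription factors
E = A\,P, \qquad
E \in \mathbb{R}^{N \times M},\quad
A \in \mathbb{R}^{N \times L},\quad
P \in \mathbb{R}^{L \times M}

% Counting: N L + L M unknowns (weights and activities) against N M measurements.
% Even when N M is large, the factorization is not unique:
E = A\,P = (A X)\,(X^{-1} P) \quad \text{for any invertible } X \in \mathbb{R}^{L \times L}
```

Under this assumption, every invertible X yields a different pair of weights and activities that reproduces the data exactly, which is why additional constraints or direct measurements of P are needed.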

The Living Cell Array (LCA) is a microfluidic device which, through the use of artificial reporter genes, indicates the overall activity of a given transcription factor under in vivo conditions[4]. This activity is itself one piece of information required for network reconstruction. However, we can also use the LCA to identify a possible underlying network structure via biclustering. To do so, we have formulated an optimization-based approach to biclustering in which the problem is treated as a mixed-integer problem. This formulation allows both the identification of globally optimal clusters and the imposition of constraints that let us find arbitrary overlapping biclusters which are not wholly a subset of previous solutions, something which has not been handled before. Most current biclustering algorithms are heuristic, and they handle overlapping clusters either by defining specific qualities of the overlap or by fixing how many clusters a given gene/condition entry can participate in. This is problematic for the analysis of biological data because it is difficult to tell a priori how interconnected a given gene is under a given condition, and therefore how many clusters it ought to belong to. Our approach, on the other hand, uses linear constraints to eliminate solutions that are a subset of previous solutions, thereby eliminating the need to pre-define the structure of the overlap.
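The idea of excluding subsets of earlier solutions with a linear constraint can be sketched on a toy problem. This is not the authors' formulation, which the abstract does not give. It assumes a binary response matrix, an objective of maximal all-ones area, and exhaustive search standing in for a mixed-integer solver so that each solution is still globally optimal. The names `enumerate_biclusters`, `violates_cut`, and `EXAMPLE` are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(n):
    """All non-empty subsets of range(n), smallest first, as frozensets."""
    return [frozenset(c) for k in range(1, n + 1)
            for c in combinations(range(n), k)]


def is_all_ones(data, rows, cols):
    """Assumed bicluster criterion: every selected cell equals 1."""
    return all(data[i][j] == 1 for i in rows for j in cols)


def violates_cut(rows, cols, prev_rows, prev_cols):
    """Linear exclusion cut for one previous solution (prev_rows, prev_cols).

    With binary selectors r_i and c_j, the cut reads
        sum_{i not in prev_rows} r_i + sum_{j not in prev_cols} c_j >= 1,
    so a candidate lying entirely inside the previous solution is rejected.
    """
    outside = len(rows - prev_rows) + len(cols - prev_cols)
    return outside < 1


def enumerate_biclusters(data, max_solutions):
    """Repeatedly find the largest-area bicluster that satisfies all cuts."""
    row_sets = nonempty_subsets(len(data))
    col_sets = nonempty_subsets(len(data[0]))
    found = []
    for _ in range(max_solutions):
        best = None
        for rows in row_sets:
            for cols in col_sets:
                if not is_all_ones(data, rows, cols):
                    continue
                if any(violates_cut(rows, cols, pr, pc) for pr, pc in found):
                    continue
                area = len(rows) * len(cols)
                if best is None or area > best[0]:
                    best = (area, rows, cols)
        if best is None:  # every remaining candidate is a subset of a solution
            break
        found.append((best[1], best[2]))
    return found


# Toy data: rows could be stimuli, columns reporters (illustrative values only).
EXAMPLE = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

if __name__ == "__main__":
    for rows, cols in enumerate_biclusters(EXAMPLE, max_solutions=10):
        print(sorted(rows), sorted(cols))
```

On this toy matrix the search stops after four biclusters, all of which share the cell at row 1, column 1, without any limit on overlap having been specified in advance. Exhaustive search scales exponentially, so a real instance would need an actual mixed-integer solver.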

Nominally, this leads to a bipartite network in which stimuli and responses are linked, which is essentially a feed-forward network. However, given additional information, specifically the link between a given transcription factor and its reporter, we are able to translate this bipartite network into a directed acyclic graph. This gives us the second piece of information, the network architecture. With both the network architecture and the individual activity of each transcription factor in hand, reconstructing the overall network is greatly simplified, because all that remains is to solve for the individual weights that specify the effect a given transcription factor has on the expression level of a specific gene. The biclustering results suggest the existence of feed-forward interactions from TNF-α to IL6, as well as complex feedback loops involving IL6, IFN-γ, and dexamethasone, perhaps indicating how the administration of corticosteroids can mediate the overall level of inflammation. One of the critical benefits of the biclustering framework over a simpler direct-effect model is that we are able to hypothesize the existence of negative regulation, even though the LCA in its current iteration does not explicitly allow the identification of negative interactions.
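Once the architecture and the transcription factor activities are fixed, solving for the weights reduces to one small regression per gene. The sketch below assumes the linear model from NCA[2] (expression equals weights times activities), which the abstract does not state, and uses ordinary least squares through the normal equations. The names `fit_weights`, `EXPRESSION`, `ACTIVITY`, and `ARCHITECTURE`, and all numbers, are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("activities of the chosen regulators are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_weights(expression, activity, architecture):
    """Least-squares weights for each gene, restricted to its known regulators.

    expression[i][m]: expression of gene i under condition m
    activity[k][m]:   activity of transcription factor k under condition m
    architecture[i]:  indices of the factors assumed to regulate gene i
    """
    weights = []
    for e_i, regulators in zip(expression, architecture):
        row = [0.0] * len(activity)  # factors outside the architecture stay 0
        if regulators:
            P_s = [activity[k] for k in regulators]
            G = [[dot(p, q) for q in P_s] for p in P_s]  # normal equations
            b = [dot(p, e_i) for p in P_s]
            for k, w in zip(regulators, solve_linear(G, b)):
                row[k] = w
        weights.append(row)
    return weights


# Illustrative values only: 2 factors, 3 conditions, 2 genes.
ACTIVITY = [[1.0, 0.0, 1.0],
            [0.0, 1.0, 1.0]]
ARCHITECTURE = [[0], [0, 1]]      # gene 0 <- factor 0; gene 1 <- factors 0, 1
EXPRESSION = [[2.0, 0.0, 2.0],
              [1.0, 3.0, 4.0]]

if __name__ == "__main__":
    print(fit_weights(EXPRESSION, ACTIVITY, ARCHITECTURE))
```

Because the zero pattern of the weight matrix is fixed by the architecture, each gene is fitted independently, and the number of unknowns per gene equals its number of regulators rather than the total number of transcription factors.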

1. Sabatti C, Rohlin L, Lange K, Liao JC: Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites. Bioinformatics 2005, 21:922-931.

2. Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP: Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci U S A 2003, 100:15522-15527.

3. Boulesteix AL, Strimmer K: Predicting transcription factor activities from combined analysis of microarray and ChIP data: a partial least squares approach. Theor Biol Med Model 2005, 2:23.

4. King KR, Wang S, Irimia D, Jayaraman A, Toner M, Yarmush ML: A high-throughput microfluidic real-time gene expression living cell array. Lab Chip 2007, 7:77-85.