(732a) Identifying Molecular Targets of Drugs Using an Integrative Network Analysis of Protein-Protein, Protein-DNA and Transcriptomics Data | AIChE

Identifying the molecular targets of compounds is important in drug discovery and drug repurposing, not only for understanding the mechanism of action but also for elucidating hidden phenotypes and off-target effects. Gene expression (transcriptomics) analysis is a common means of obtaining genome-wide information on the cellular response to a drug and of identifying genes whose expression is altered by the treatment. Computational algorithms have been developed and applied to such transcriptomics data to determine the direct molecular targets of drugs, as opposed to indirect (downstream) targets. One family of these algorithms uses a network of gene regulatory interactions to predict direct drug targets by viewing drug action as a network perturbation. For example, causal-reasoning methods [1,2] and the Master Regulator Inference algorithm (MARINa) [3] employ literature-based and reverse-engineered networks, respectively, to score a potential target based on the statistical enrichment of its downstream molecules among the differentially expressed genes. More recent methods leverage multiple sources of information on protein and gene interaction networks, including protein-protein interactions (PPI) and transcription factor binding sites (TFBS). The state-of-the-art method, Detecting Mechanism of Action by Network Dysregulation (DeMAND) [4], scores drug targets based on the transcriptional dysregulation of downstream genes using a protein-gene network that combines PPI and an inferred gene transcriptional network.
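The enrichment-based scoring idea described above can be illustrated with a one-sided hypergeometric test. This is only a minimal sketch of the general principle, not the scoring used by the causal-reasoning methods [1,2] or MARINa [3]; the function names, the `downstream` dictionary layout, and the toy gene identifiers are all assumptions made for illustration.

```python
from math import comb


def enrichment_pvalue(n_genes, n_de, n_downstream, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X is the number of a candidate's downstream genes that fall among
    the n_de differentially expressed (DE) genes, when DE genes are
    drawn at random from n_genes measured genes.
    """
    upper = min(n_de, n_downstream)
    total = comb(n_genes, n_de)
    tail = sum(
        comb(n_downstream, i) * comb(n_genes - n_downstream, n_de - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


def score_candidates(downstream, de_genes, all_genes):
    """Return an enrichment p-value for each candidate target.

    downstream: dict mapping a candidate protein to its downstream genes
                (hypothetical layout, taken from some regulatory network).
    """
    universe = set(all_genes)
    de = set(de_genes) & universe
    scores = {}
    for protein, targets in downstream.items():
        targets = set(targets) & universe
        scores[protein] = enrichment_pvalue(
            len(universe), len(de), len(targets), len(targets & de)
        )
    return scores


# Toy example: P1's downstream genes are exactly the DE genes, P2's are not.
all_genes = [f"g{i}" for i in range(10)]
de_genes = ["g0", "g1", "g2"]
downstream = {"P1": ["g0", "g1", "g2"], "P2": ["g7", "g8", "g9"]}
scores = score_candidates(downstream, de_genes, all_genes)
```

A smaller p-value means the candidate's downstream genes are more over-represented among the DE genes, so candidates would be ranked by ascending p-value in this sketch.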

In this work, we developed Systems Analysis and Learning for inferring Modifiers of Networks (SALMON), a novel method for identifying drug targets. SALMON uses a protein-gene network (PGN) constructed from a combination of TFBS and PPI datasets. The strengths and signs of the protein-gene interactions in the PGN are inferred from gene transcriptomics data by solving, with ridge regression, the linear regression model described in our previous network inference method, DeltaNeTS [5]. To determine the targets of each drug, each candidate protein is scored based on the dysregulation of its protein-gene interactions in the drug treatment sample. A positive score indicates that the drug enhances the gene regulatory activity of a particular protein, whereas a negative score implies that the drug attenuates the regulatory activity of that protein.
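As a rough illustration of how signed interaction strengths could be estimated on a fixed network with ridge regression, the sketch below fits one target gene using only the regulators that the PGN allows. It is a generic ridge fit under assumed inputs, not the DeltaNeTS regression model or the SALMON implementation; the function name `ridge_edge_weights`, the matrix layout, and the penalty `lam` are hypothetical.

```python
import numpy as np


def ridge_edge_weights(X, y, regulators, lam=1.0):
    """Estimate signed edge weights for one target gene.

    X:          (samples x proteins) matrix of regulator expression changes.
    y:          (samples,) vector of expression changes of the target gene.
    regulators: column indices of X that the network permits as regulators
                of this gene; all other weights are fixed at zero.
    lam:        ridge penalty (assumed value, would need tuning in practice).
    """
    regulators = np.asarray(regulators, dtype=int)
    Xr = X[:, regulators]
    # Closed-form ridge solution: (Xr^T Xr + lam * I)^-1 Xr^T y
    gram = Xr.T @ Xr + lam * np.eye(len(regulators))
    w = np.linalg.solve(gram, Xr.T @ y)
    weights = np.zeros(X.shape[1])
    weights[regulators] = w
    return weights


# Toy data: the target is activated by protein 0 and repressed by protein 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2]
weights = ridge_edge_weights(X, y, regulators=[0, 2], lam=1e-6)
```

In this toy case the recovered weights are close to +2 and -1.5, so both the strength and the sign (activation versus repression) of each permitted edge are read directly from the fit.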

We tested the performance of SALMON using three drug treatment datasets from mammalian cells: (A) the NCI-DREAM drug synergy challenge [6], which comprises time-series gene expression profiles of human lymphoma cells from 14 compound treatment experiments; (B) a genotoxic compound study using human liver cancer cells [7], which consists of time-series profiles from 62 genotoxic and non-genotoxic compound treatment experiments; and (C) a drug treatment study using mouse pancreas cells [8], which includes time-series samples from 29 chromatin-targeting compound treatments. For the human cells, we used tissue-specific TFBS from the RegulatoryCircuit database [9] and human PPI from the Enrichr [10] and STRING [11] databases. For the mouse cells, we used mouse-pancreas-cell-specific TFBS from CellNet [12] and mouse PPI from STRING.

We compared the predictions of SALMON with those from DeMAND, our previous strategy DeltaNeTS [5], and differential gene expression (DE) analysis. SALMON significantly outperformed DeMAND, DeltaNeTS and DE analysis in terms of AUROC (the area under the receiver operating characteristic curve, which plots the true positive rate against the false positive rate) for all three datasets (see Table 1). Moreover, SALMON differentiated multiple classes of drugs based on their known activities, such as genotoxicity and histone deacetylation, more accurately than the other methods. For example, in the NCI-DREAM drug synergy dataset, SALMON specifically detected the action of mitomycin, a DNA crosslinking agent, in activating DNA crosslink repair, which DeMAND failed to reveal.
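For readers unfamiliar with the metric, the following sketch computes AUROC for one compound from a list of candidate scores and known-target labels, and then averages over compounds as in Table 1. It uses the standard pairwise (Mann-Whitney) formulation; the function names and the toy scores are assumptions, and how each method's scores are turned into a ranking (for example by magnitude) is not specified here.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    scores: one number per candidate, higher meaning "more likely a target".
    labels: truthy for known targets, falsy otherwise.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, is_target in zip(scores, labels) if is_target]
    neg = [s for s, is_target in zip(scores, labels) if not is_target]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def mean_auroc(per_compound):
    """Average AUROC over compounds; each item is a (scores, labels) pair."""
    values = [auroc(scores, labels) for scores, labels in per_compound]
    return sum(values) / len(values)


# Toy example with two hypothetical compounds.
compound_a = ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
compound_b = ([1.0, 0.0], [1, 0])
average = mean_auroc([compound_a, compound_b])
```

An AUROC of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every known target above every non-target.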

The superior performance of SALMON in predicting drug targets results from its ability to integrate multiple sources of information on the protein-gene network and to account for the strengths and signs of the protein-gene interactions. Because SALMON uses a kinetic model of the gene transcriptional network, the method naturally accommodates time-series transcriptomics datasets, a common type of data in drug treatment studies. Finally, SALMON gives not only a prediction of the drug targets, but also the sign of the drug action (i.e. enhancement or attenuation). Our test cases demonstrated that SALMON is a robust algorithm, providing high prediction accuracy across three different mammalian cell types without the need for exhaustive parameter tuning.

Table 1. Averaged AUROC over compounds in three data sets by SALMON, DeMAND, DeltaNeTS, and DE

Dataset                   SALMON   DeMAND   DeltaNeTS   DE
Drug synergy challenge    0.807    0.694    0.627       0.625
Genotoxic study           0.768    0.734    0.675       0.636
Mouse cell data           0.900    0.749    0.618       0.677

[1] Chindelevitch, L., D. Ziemek, A. Enayetallah, R. Randhawa, B. Sidders, C. Brockel, and E. S. Huang. 2012. Causal reasoning on biological networks: Interpreting transcriptional changes. Bioinformatics 28 (8): 1114–1121.

[2] Martin, F., T. M. Thomson, A. Sewer, D. A. Drubin, C. Mathis, D. Weisensee, D. Pratt, J. Hoeng, and M. C. Peitsch. 2012. Assessment of network perturbation amplitudes by applying high-throughput data to causal biological networks. BMC Systems Biology 6: 54.

[3] Lefebvre, C., P. Rajbhandari, M. J. Alvarez, P. Bandaru, W. K. Lim, M. Sato, K. Wang, et al. 2010. A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Molecular Systems Biology 6: 377.

[4] Woo, J. H., Y. Shimoni, W. S. Yang, P. Subramaniam, A. Iyer, P. Nicoletti, M. Rodríguez Martínez, et al. 2015. Elucidating Compound Mechanism of Action by Network Perturbation Analysis. Cell 162 (2): 441–51.

[5] Noh, H., H. Ziyi, and R. Gunawan. 2016. Inferring Causal Gene Targets from Time Course Expression Data. IFAC-PapersOnLine 49 (26): 350–356.

[6] Bansal, M., et al. 2014. A community computational challenge to predict the activity of pairs of compounds. Nature Biotechnology: 1213–1222.

[7] Magkoufopoulou, C., S. M. H. Claessen, M. Tsamou, D. G. J. Jennen, J. C. S. Kleinjans, and J. H. M. van Delft. 2012. A transcriptomics-based in vitro assay for predicting chemical genotoxicity in vivo. Carcinogenesis 33 (7): 1421–1429.

[8] Kubicek, S., J. C. Gilbert, D. Fomina-Yadlin, A. D. Gitlin, and Y. Yuan. 2012. Chromatin-targeting small molecules cause class-specific transcriptional changes in pancreatic endocrine cells.

[9] Marbach, D., et al. 2016. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nature Methods.

[10] Kuleshov, M. V., M. R. Jones, A. D. Rouillard, N. F. Fernandez, Q. Duan, Z. Wang, S. Koplev, et al. 2016. Enrichr: A comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research: gkw377.

[11] Szklarczyk, D., A. Franceschini, S. Wyder, K. Forslund, D. Heller, J. Huerta-Cepas, M. Simonovic, et al. 2015. STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Research 43 (D1): D447–D452.

[12] Cahan, P., H. Li, S. A. Morris, E. Lummertz Da Rocha, G. Q. Daley, and J. J. Collins. 2014. CellNet: Network biology applied to stem cell engineering. Cell 158 (4): 903–915.