Review
Systems Biomedicine 1:1, 35–41; January/February/March 2013; © 2013 Landes Bioscience
Network as biomarker
Quantifying transcriptional co-expression to stratify cancer clinical phenotypes
Rotem Ben-Hamo and Sol Efroni*
The Mina and Everard Goodman Faculty of Life Science; Bar Ilan University; Ramat-Gan, Israel
Keywords: GSEA, networks, biomarker, PPI networks, network inference, network medicine
Abbreviations: GSEA, gene set enrichment analysis; PPI, protein-protein interaction; GO, gene ontology
Identifying robust biomarkers for cancer phenotypes has challenged the biological and pharmacological communities for many years, more so since the availability of screening methods that reveal the expression levels of all the genes in the genome. A host of different approaches have been used to address this lack of robustness. These methods have included a spectrum of approaches from gene enrichment analysis to network inference analysis. More recently, some methods that use the network properties of genes have demonstrated an ability to provide a more robust signature. In this review, we survey different network-as-biomarker methods used to identify various biomarkers and we discuss the critical role of networks in the progress toward personalized medicine. We also discuss the ability of the network to identify misguided processes, rather than the gene itself, as the core of distinctions among phenotypes. Discussions about the importance of the molecular pathway view and about processes (rather than the gene per se) at the core of understanding cancer are not new. However, this review focuses on the set of tools available for actually measuring the pathway, or the process, when the expression levels of their components are available.
*Correspondence to: Sol Efroni; Email: [email protected]
Submitted: 10/29/2012; Revised: 07/23/2013; Accepted: 07/23/2013
http://dx.doi.org/10.4161/sysb.26474

Introduction
Cancer is the leading cause of death in economically developed countries and the second leading cause of death in developing countries. Overall, an estimated 12.7 million new cancer cases and 7.6 million cancer deaths occurred in 2008 according to GLOBOCAN estimates.1 WHO has estimated on the basis of current trends that the global cancer burden will increase from 10 million new cases per year in 2000 to 16 million in 2020,2 which suggests an alarming increase in the number of new cases and a slow drug development rate. On the molecular biology front of target discovery, the 1990s introduced gene expression microarrays, and with those came a common dogma: enrichment analysis. The effort in enrichment
analyses is directed at finding gene-sets that are associated with a change in phenotype (outcome, stages, subtypes, etc.). Another popular approach to deciphering the results of microarrays is based on ranking genes according to their differential expression levels, and then identifying an enriched characteristic in this (differentially expressed) set (GO annotations, pathway membership, biological process membership, etc.). However, mounting evidence suggests that biological, cellular and disease outcomes depend strongly on intricate interactions among genes, RNA molecules or proteins within signaling and other pathways.3 An additional level of complexity is introduced by the coexistence of cancer cells with a complex ecosystem populated by host cells including fibroblasts, endothelial cells, and leukocytes, all within a scaffold of extracellular matrix.4 It is only reasonable to infer that their interactions within this ecosystem should be taken into consideration as well. The understanding of complex biological systems calls for the integration of experimental and computational research. Systems biology, as a scientific discipline, aims to quantify all molecular elements of a biological system so they can be treated as a whole, in order to evaluate their interactions and integrate them into a model.5,6 This discipline is based on the understanding that by measuring the system as a unit and by charting its intranet design we gain unique insights. Complete repertoires of molecular activity in and among tissues provided by new high-dimensional “omics” technologies hold great promise for characterizing human physiology at all levels of biological hierarchies.7 The new post-genomic era of genomic technologies is leading toward insight into the complex global system of changes that are critical for the evolution and maintenance of cancer. 
Oncogenesis must be considered not just as a rearrangement of chromosomes but also as a rearrangement of regulatory networks, their interactions and their interconnections. Cancer is not a disease of a few genes but of clusters of genes, whose altered global interactions evolve into the transformed and malignant state.8 Accepting this premise of systems biology and systems biomedicine therefore leads to the conclusion that quantifying the behavior of the network itself would yield a highly informative quantification of the phenotype. Network behavior itself, once it is quantifiable, could serve as a biomarker. The network as biomarker may lead to a better understanding of the structure, centrality, dynamics and operating principle of the disease network, revealing new features and new configurations that are obscured by a single-molecule perspective.
Gene Enrichment Analysis
The availability of gene-expression microarray technology has led to a primary interest in identifying differentially expressed genes that significantly stratify data into known phenotypic groups. Such groups can provide insight when they are affiliated with biological processes.9 The list of differentially expressed genes is usually selected using an arbitrary threshold and then assessed for enrichment of GO annotations, pathways, etc.; the primary aim is to identify the biological processes that are most pertinent to the study. For example, gene expression signatures have been used for the molecular classification of breast cancer into distinct sub-types (basal, luminal A, luminal B, ERBB2 and normal-like),10,11 stratification of patients by clinical outcome (survival analysis),12 predicting response to chemotherapy, and even for routine clinical use. Specifically, Boutros et al.12 identified a 6-gene signature that could predict survival of non-small-cell lung cancer (NSCLC) cases and validated their results in four independent data sets. A 76-gene signature identified by Wang et al.13 was shown to predict distant metastasis in lymph node-negative breast cancer patients. Miller et al.14 reported on a 32-gene expression signature that distinguishes p53-mutant and wild type tumors with different histologies, and outperforms sequence-based assessments of p53 in predicting prognosis and therapeutic response. Gene expression signature profiling also identified a 23-gene signature that predicts recurrence in Dukes’ B colon cancer patients; this was validated using an independent data set.15 Bioinformatics tools that provide the community with enrichment analysis therefore abound (for a detailed survey, see ref. 16). However, because there is no unified method or “gold standard”, the statistical methods used have varied among tools.
For example, tools have relied on the hypergeometric,17-21 Fisher exact,22-25 Z score,26 Chi-square,27,28 t test,29 binomial, Kolmogorov-Smirnov,30 and Gaussian31 tests or distributions, among others. A well-known example of a gene signature is Agendia’s MammaPrint, the first FDA-cleared breast cancer prognosis marker chip, which contains a 70-gene signature. Classification was based on the correlations between the expression profile of the “leave-one-out” sample and the mean expression levels of the remaining samples from the good- and the poor-prognosis patients, respectively.32 However, when this 70-gene signature was compared with the 76-gene signature produced by Wang et al.,13 only three genes overlapped. This absence of a gold standard for the statistical method or for the cutoff threshold produces dramatic changes in the gene lists obtained. For example, Michiels et al.33 re-analyzed data from seven published studies that have attempted to predict the prognosis of cancer patients on the basis of DNA microarray data. They found that the list of genes identified as predictors of prognosis was highly unstable, and that the proportion misclassified decreased as the number of patients in the training set increased. Another inconsistency can be observed when the results from two
breast cancer studies are compared: Ramaswamy et al.34 reported on a 17-gene signature that stratified prognosis, while Sorlie et al.11 reported on a list of 456 genes. Comparison revealed that those lists shared only two genes. The inevitable conclusion is that the sets of genes identified are not unique; they are strongly influenced by the subset of patients, the statistical method and the threshold chosen. Many equally predictive lists could have been produced from the same analysis,35 which indicates an urgent need for a better classification that could survive the transition between data sets and experiments.
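The hypergeometric test listed above among the enrichment statistics can be illustrated with a minimal sketch. It asks how likely it is to see at least the observed number of gene-set members in a list of differentially expressed genes drawn without replacement from all measured genes. The function name and all numbers are hypothetical and are not taken from any of the cited tools.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Upper-tail probability P(X >= overlap) for a hypergeometric draw.

    universe_size: number of genes measured on the array
    set_size:      number of those genes annotated to the gene set
    list_size:     number of genes in the differentially expressed list
    overlap:       observed number of list genes that belong to the set
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 measured genes, a 100-gene pathway,
# a 200-gene differential list, 8 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 200, 8)
```

The same calculation can be used to ask whether the overlap between two published signatures is larger than chance. Both uses depend on the threshold that defined the gene list in the first place, which is the source of the instability discussed above.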
Network Inference
Cancer is not a static disease; disease progression correlates with molecular modifications to evade natural defenses and to adapt to new environmental circumstances. Such drastic changes call for systemic modification; indeed, the initiation and progression of the disease are driven by interactions among genes, proteins and the environment rather than by simple alterations in genetic variants.3,36 Systems biology enables experimental and computational research to be integrated and addresses fundamental questions regarding the complexity and connectivity of a system. Procedures for inferring, or “reverse-engineering”, regulatory networks from gene-expression data through the use of computational algorithms can be divided into two classes. The first, based on “physical interactions”, will be discussed later; the second, based on “influence interaction”, attempts to relate the expression of two genes and does not necessarily imply a physical interaction between them.37 Influence interaction or network inference techniques uncover the network mostly from gene expression data without using preexisting biological information.37 The inferred network can represent characteristics such as correlations, mutual information, expression pattern, etc. One technique for network inference uses time-series or steady-state expression data to identify co-expressed genes. A number of studies have demonstrated benefit from such algorithms.38-42 By clustering genes into groups with similar expression profiles, co-expressed genes can be discovered. The rationale behind such clustering is that co-expressed genes may be regulated simultaneously in accordance with the specific requirements of a cellular process.37 Another approach seeks mechanisms by analyzing the correlation signatures of genes. By calculating pairwise gene-gene correlations across data sets, inferred networks are constructed.
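As a minimal sketch of the pairwise-correlation idea, the code below computes Pearson correlations between all gene pairs and keeps the pairs whose absolute correlation reaches a threshold. The gene names, expression values and the 0.9 threshold are hypothetical, and the sketch does not reproduce any specific published tool.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_network(expression, threshold=0.9):
    """Return {(gene1, gene2): r} for pairs with |r| >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges

# Hypothetical expression profiles across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
network = correlation_network(expression)
```

In this toy input only geneA and geneB are joined by an edge. An edge here reflects co-expression only and, as noted above, does not imply a physical interaction.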
BioLayout Express43 and OryzaExpress44 are examples of algorithms that tackle this issue and construct correlation networks from gene-expression data. Slavov et al.45 identified cancer-specific correlation signatures by comparing cancer and non-cancer data. This signature was found on the basis of correlations between genes. It would therefore not follow changes in individual gene expression levels. Using correlation networks, Wang et al.46 introduced a method for cancer classification based on the concept of biomarker association networks. They modeled the biomarker association
patterns responsible for cancer using a neural network structure that captured the core biomarkers. ARACNE,47 an algorithm for reconstructing gene regulatory networks in the mammalian cellular context, is an information-theoretical algorithm for the reverse engineering of transcriptional networks from microarray data. Unlike the methods explained previously, ARACNE uses mutual information and aims to deduce direct transcriptional interactions with high confidence. Thus, it enables the identification of novel significant interactions that could contribute to disease progression.
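To make the mutual-information measure concrete, the following sketch estimates mutual information between two expression profiles after equal-width binning. It is an illustration of the quantity only and is not the ARACNE implementation: ARACNE additionally prunes indirect edges using the data processing inequality, which is omitted here. The binning scheme and bin count are assumptions.

```python
from collections import Counter
from math import log2

def discretize(values, n_bins=3):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant profiles fall into bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=3):
    """Plug-in estimate of mutual information (in bits) from binned data."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

# Hypothetical profile across six samples.
profile = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
mi_self = mutual_information(profile, profile)       # equals the entropy of the binned profile
mi_constant = mutual_information(profile, [7.0] * 6)  # a constant profile carries no information
```

Unlike a correlation coefficient, this quantity also responds to non-linear dependence, although the plug-in estimate is sensitive to the number of bins and the number of samples.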
Metrics Based on Curated Protein–Protein Interaction Networks
Calculations on protein–protein interaction (PPI) networks are a well-established approach in systems biology. Biological networks are often characterized as scale-free: a few nodes have a tremendous number of connections and are referred to as hubs, while most other nodes display far fewer connections, with the number of connections per node following a power law. This scale-free feature of the network is of special importance, as one characteristic of scale-free systems is robustness in the face of accidental failures, yet vulnerability to a coordinated attack (e.g., pharmacological targeting).48,49 Furthermore, the nodes themselves as well as their centrality in the network carry meaning with respect to the node’s function and its essentiality and evolutionary status.50-52 Curated networks are also mostly protein-protein interaction networks. In such networks, every node represents a protein and the edges stand for known interactions supported by experiments and by the literature. PID,53 Reactome,54 and Biocarta are all examples of publicly available pathway interaction data sets. By merging differentially expressed genes with biological knowledge via curated network information, key sub-networks can be identified. Several pathway-level tools are available for incorporating pathway topology to interpret high-throughput data sets. PPI spider55 uses a Monte Carlo simulation procedure to compute the statistical significance of an inferred model. The network models are inferred from an experimentally identified protein list based on the topology of the global PPI network. PARADIGM56 is a method for inferring patient-specific genetic activities incorporating curated pathway interactions among genes. A gene is modeled by a factor graph as a set of interconnected variables encoding the expression and known activity of that gene and its products.
Using probabilistic inference, the method predicts the degree by which pathway activities (e.g., internal gene states, interactions or high-level “outputs”) are altered in the patient. HotNet57 identifies subnetworks within a PPI network that are altered (in sequence) in a significant number of patients. HotNet uses a heat diffusion model that combines the frequency of the genes that are mutated and the topology of the network. GenMAPP58 is another tool for the analysis of microarray data on biological pathways. However, unlike most other tools, GenMAPP enables users to modify pathways according to their own use, to design new pathways and to apply complex criteria for viewing gene expression data on those pathways. SurvNet59
(http://bioinformatics.mdanderson.org/SurvNet/index.html) is a tool for the identification of network-based biomarkers that correlate most strongly with patients’ survival. It incorporates PPI network information, molecular profiling and patients’ survival data. PathOlogist60 provides a metric for quantifying the nature of the interactions themselves within a pathway. In PathOlogist, every pathway is quantified as a numerical value. This metric is computed using the probabilities that genes in the interaction are in a functionally active/inactive state. This active/inactive probability is calculated using gene mRNA expression levels. The outputs of PathOlogist are two quantifiable metrics termed “activity” and “consistency”. Intuitively, the activity of an interaction captures the probability that the input states of the interaction are true, while its consistency captures the probability that the output states are consistent with the result of the interaction. A growing number of pathway-based studies aimed at identifying novel prognostic biomarkers have been published recently. Frohlich61 derived a consensus signature from seemingly different prognostic gene signatures in breast cancer by taking knowledge about protein-protein interactions into account, and reported that the signature was significantly more stable. Wang et al.62 integrated gene expression data with PPI information to develop a network-based biomarker in lung carcinogenesis. Pierobon et al.63 used reverse-phase protein microarray analysis of laser capture-microdissected CRC tumor specimens to profile broad cell signaling pathways from patients who presented with liver metastasis vs. patients who remained recurrence-free after follow-up. They revealed that the EGFR and COX2 signaling pathways are highly activated in the primary tumors of patients with synchronous metastatic disease.
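The intuition behind the “activity” and “consistency” metrics described above can be sketched as follows. This is an illustrative reading of that description under stated assumptions, not the PathOlogist implementation: gene states are treated as independent, the per-gene probabilities of being in the active state are taken as given (the tool derives them from mRNA expression levels), and averaging over interactions to obtain a pathway-level value is an assumption. All names and numbers are hypothetical.

```python
from math import prod

def interaction_activity(input_up_probs):
    """Probability that all inputs of an interaction are active (assumes independence)."""
    return prod(input_up_probs)

def interaction_consistency(input_up_probs, output_up_probs):
    """Probability that the outputs agree with what the inputs imply.

    The outputs agree when the interaction is active and the outputs are
    active, or when the interaction is inactive and the outputs are inactive.
    """
    activity = interaction_activity(input_up_probs)
    output_up = prod(output_up_probs)
    return activity * output_up + (1 - activity) * (1 - output_up)

def pathway_scores(interactions):
    """Average activity and consistency over (inputs, outputs) pairs (an assumption)."""
    n = len(interactions)
    activity = sum(interaction_activity(i) for i, _ in interactions) / n
    consistency = sum(interaction_consistency(i, o) for i, o in interactions) / n
    return activity, consistency

# Hypothetical interaction: two input genes, one output gene.
activity = interaction_activity([0.9, 0.8])
consistency = interaction_consistency([0.9, 0.8], [0.7])
```

The point of the sketch is that both values are properties of the interaction rather than of any single gene, so a pathway can score differently between phenotypes even when no individual gene is strongly differentially expressed.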
Genetic Variation
Cancer develops after somatic mutations overcome the multiple checks and balances on cellular proliferation. Those normal checks and balances define a robust genetic control system that protects against perturbations. A complex collection of genomic alterations occurs during tumor cell evolution, including mutations, translocations, copy number alterations and methylations. Such genomic changes influence the expression levels of genes. Hence, data on such genomic alterations can be valuable in any search for a prognostic biomarker. The integration of pathway-based knowledge with genetic changes can reveal novel core mechanisms that were previously hidden. Efroni et al.64 proposed a method for identifying specific networks that are significantly altered by regions of copy number variation. By applying the hyper-geometric function and overlaying it onto Fisher omnibus, which is a test for detecting deviations from normality due either to skewness or kurtosis,65 on large-scale data, the authors were able to extract pathways that were significantly and non-randomly targeted. Implementation of the method on a breast cancer data set identified the CDC25 and CHK1 pathway as highly targeted by genomic alterations. In addition, the method was able to stratify patients’ prognoses on the basis of the metric levels of the same pathways.
Figure 1. As we include further knowledge about the participating agents in the produced signature, we are able to identify a more robust signature. By adding the nature of interactions to simple tagging of processes or pathways, we quantify modifications to the network itself and are thus able to produce a more robust signature.
A major challenge in cancer characterization is to distinguish “driver mutations”, the initial modifications that lie at the core of cancer, from the succeeding “passenger mutations”, which have accumulated in somatic cells but did not cause the initial processes. HotNet66 considers a large-scale interaction network and mutation data from many patients. The tool then finds subnetworks, or clusters of interacting genes, that are mutated in a significant number of patients. The algorithm thus generalizes the analysis of recurrent mutations in single genes. PathScan67 is a computational tool for the scenario in which pathway mutations collectively contribute to tumor development. By considering the distributions of both gene lengths within a pathway and mutations among samples, it aims to improve the accuracy of the results. The combined events of genomic alterations are at the core of understanding cancer. The large repertoire of somatic mutations, variable changes in copy numbers and the collection of various cancer genes expressed among patients with the same clinicopathological features indicate a need to integrate systems-based methods with genetic variation data sets. The methods detailed above point the way to a more comprehensive analysis that could reveal novel events contributing to disease progression.
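HotNet was described earlier as combining the frequency of mutated genes with network topology through a heat diffusion model. The sketch below illustrates that general idea with a simple discrete diffusion on a toy graph: each gene starts with heat equal to its mutation frequency and, at every step, passes a fixed share of its heat equally to its neighbours. This is a simplified illustration, not HotNet's formulation or its subnetwork significance test. The graph, the frequencies, and the `alpha` and `steps` parameters are hypothetical.

```python
def diffuse_heat(adjacency, mutation_frequency, alpha=0.5, steps=3):
    """Spread mutation 'heat' over an undirected graph for a few steps.

    adjacency:          {gene: [neighbouring genes]}
    mutation_frequency: {gene: fraction of patients with a mutation}
    alpha:              share of a gene's heat passed on at each step
    """
    heat = {gene: mutation_frequency.get(gene, 0.0) for gene in adjacency}
    for _ in range(steps):
        # Isolated genes keep all of their heat; others keep a (1 - alpha) share.
        updated = {
            gene: heat[gene] * ((1 - alpha) if adjacency[gene] else 1.0)
            for gene in adjacency
        }
        # Each gene sends alpha of its heat, split equally among its neighbours.
        for gene, neighbours in adjacency.items():
            for neighbour in neighbours:
                updated[neighbour] += alpha * heat[gene] / len(neighbours)
        heat = updated
    return heat

# Hypothetical path A-B-C-D plus an isolated gene E.
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}
mutation_frequency = {"A": 0.3, "C": 0.2}
diffused = diffuse_heat(adjacency, mutation_frequency)
```

Gene B is never mutated in this toy example, yet it accumulates heat because it sits between two mutated genes. This is how a network view can flag a subnetwork in which no single gene is recurrently mutated.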
Network Medicine
Auffray, Chen and Hood recently suggested that, “Systems approaches will transform the way drugs are developed through academy-industry partnerships that will target multiple components of networks and pathways perturbed in diseases.”68 Such changes in our perception of the combined influences of molecular agents on disease have often been called “network medicine”. Several published research papers have taken up this challenge and reported on highly significant and robust biomarkers. All of these have identified pathway-based analysis as more significant and robust than the individually-scored gene markers. Lee et al.69 incorporated pathway information into disease classification procedures to classify disease according to the activities of entire signaling pathways or protein complexes. For each pathway, an activity level is summarized from the gene-expression levels, using a classification method that takes the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. By applying their method to two cross-data set experiments from lung cancer, they found a robust set of pathways that classified patient prognosis. Su et al.70 proposed a classification method based on probabilistic inference of pathway activities. For a given sample, they compute the log-likelihood ratio between different disease phenotypes based on the expression levels of each gene. The activity of a given pathway is then inferred by combining the log-likelihood ratios of the constituent genes. They applied the method to the classification of breast cancer metastases and showed that it leads to a robust classifier yielding symmetric results for data set inversion. Teschendorff et al.71 developed a network-based method for estimating pathway activation in tumors from model signatures. By applying their method to ER+ and ER– breast cancer data sets they demonstrated that ER– tumors characterized by simultaneous
high activation of a Th-1 differentiation module and low activation of a TGFB pathway module had better clinical outcomes than tumors stratified by each pathway alone. In a recent study published by our laboratory, we reported on a single pathway that significantly and robustly correlates with glioblastoma outcome.72 Using five independent molecular and clinical data sets with a set of computational algorithms, we were able to identify a gene-gene and gene-microRNA network that significantly stratifies patient prognosis. By combining gene expression microarray data with microRNA expression levels, copy number alterations, drug response and clinical data, combined with network knowledge, we were able to identify a single pathway at the core of glioblastoma. Additional work on ovarian cancer identified the PDGF signaling pathway as a novel biomarker for survival that was consistent over three independent data sets.73 A key challenge for improving our understanding of heterogeneity in clinical outcome and response to therapy is to map out the activation levels of cancer-relevant pathways across clinical tumor specimens. The integration between high-throughput data, genetic variations and network knowledge will undoubtedly shed light on the biology and pathogenesis of cancer. Finding a robust signature that carries the transition between data sets is a first step toward relying on a molecular signature to explain disease etiology and to design drug interventions.
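The log-likelihood-ratio approach attributed above to Su et al. can be illustrated with a minimal sketch. It assumes that each gene's expression within a phenotype follows a Gaussian distribution fitted to training samples and that per-gene log-likelihood ratios are combined by summation. Both choices are assumptions made for illustration and may differ from the published method. All gene names and values are hypothetical.

```python
from math import log, pi
from statistics import mean, stdev

def gaussian_log_density(x, mu, sigma):
    """Log of the normal density N(x | mu, sigma^2)."""
    return -0.5 * log(2 * pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def fit_gene_models(training):
    """{phenotype: {gene: [values]}} -> {phenotype: {gene: (mean, stdev)}}."""
    return {
        phenotype: {gene: (mean(v), stdev(v)) for gene, v in genes.items()}
        for phenotype, genes in training.items()
    }

def pathway_activity(sample, pathway_genes, models, phenotype_a, phenotype_b):
    """Sum of per-gene log-likelihood ratios; positive values favour phenotype_a."""
    return sum(
        gaussian_log_density(sample[g], *models[phenotype_a][g])
        - gaussian_log_density(sample[g], *models[phenotype_b][g])
        for g in pathway_genes
    )

# Hypothetical training data for a two-gene pathway.
training = {
    "metastatic": {"g1": [5.0, 5.5, 6.0], "g2": [2.0, 2.5, 3.0]},
    "non_metastatic": {"g1": [3.0, 3.5, 4.0], "g2": [4.0, 4.5, 5.0]},
}
models = fit_gene_models(training)
genes = ["g1", "g2"]
score_a = pathway_activity({"g1": 5.4, "g2": 2.6}, genes, models,
                           "metastatic", "non_metastatic")
score_b = pathway_activity({"g1": 3.6, "g2": 4.4}, genes, models,
                           "metastatic", "non_metastatic")
```

In this toy setting the first sample receives a positive pathway activity and the second a negative activity of the same magnitude. Because evidence is pooled over the genes of the pathway, the resulting score is less dependent on any single gene than a gene-by-gene signature.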
Concluding Remarks
In our effort to understand cancer on the one hand and to treat it on the other, we, as a community, quickly highlight any molecular finding that explains phenotype and can serve as a drug candidate. This is especially true in view of the previous decade, which was heralded as the start of the genomic era. This concept brought with it the notion that the genome would bear fruit in the form of new drug targets.74 While this hope has not been realized so far, we can view recent progress as heralding major findings.75 The two ends of this complex disease, etiology and treatment, characterize much of the effort in cancer research. Because a benefit for treatment would in itself be sufficient, we rush to implement novel findings. The integrative effort of piecing basic findings together has perhaps not been given proper attention. Such integrative effort, from a computational point of view, can be seen in work such as refs. 76-78, in which the objective is to collect basic information and integrate it into a whole biological process. This integrative computational biology is the bridge between the reduction we see in screening thousands of patients into their core genes or core molecular pathways, and the intensity with which these basic findings are brought to the clinic. The limited success achieved by this massive reduction in dimensionality, at least considering its initial promise that a handful of genes would be found to stand at the core of cancer, has been troubling the cancer community for over a decade. Lack of robustness, demonstrated repeatedly (see refs. 79 and 80 for many such examples), has interfered with our ability to solidify such genomic findings into the canon of basic findings. Without
solid basic findings, integration into understanding is not possible. In this overview of the subject, we have tried to demonstrate the shift from the gene view to the network view. More accurately, we deal with the shift from the gene or the random group of genes as biomarker, to the network among genes and the nature of modifications in the connections among this group of genes. We show how this shift, this metric of the network as biomarker, yields a quantification that does meet the criterion of robustness. The complexity of the system demands initial building blocks if we are to make sense of it. If the recent progress in the field is any indicator, exploiting these networks is also destined to revolutionize our view of fundamental human biology as well as disease progression, diagnosis, and treatment.81 Nevertheless, the network metric does not only confer robustness. The network-as-biomarker concept distinctly identifies the misguided process as the core of any distinction between phenotypes. We hope this concept will serve as one parameter by which we can calibrate molecular findings (Fig. 1).

Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.

References
1. Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D. Global cancer statistics. CA Cancer J Clin 2011; 61:69-90; PMID:21296855; http://dx.doi.org/10.3322/caac.20107
2. Lingwood RJ, Boyle P, Milburn A, Ngoma T, Arbuthnott J, McCaffrey R, Kerr SH, Kerr DJ. The challenge of cancer control in Africa. Nat Rev Cancer 2008; 8:398-403; PMID:18385682; http://dx.doi.org/10.1038/nrc2372
3. Ziogas DE, Katsios C, Roukos DH. From traditional molecular biology to network oncology. Future Oncol 2011; 7:155-9; PMID:21345133; http://dx.doi.org/10.2217/fon.10.190
4. Camacho DF, Pienta KJ. Disrupting the networks of cancer. Clin Cancer Res 2012; 18:2801-8; PMID:22442061; http://dx.doi.org/10.1158/1078-0432.CCR-12-0366
5. Hood L, Heath JR, Phelps ME, Lin B. Systems biology and new technologies enable predictive and preventative medicine. Science 2004; 306:640-3; PMID:15499008; http://dx.doi.org/10.1126/science.1104635
6. Kitano H. Systems biology: a brief overview. Science 2002; 295:1662-4; PMID:11872829; http://dx.doi.org/10.1126/science.1069492
7. Schadt EE, Björkegren JL. NEW: network-enabled wisdom in biology, medicine, and health care. Sci Transl Med 2012; 4:rv1; PMID:22218693; http://dx.doi.org/10.1126/scitranslmed.3002132
8. Green JE, Desai K, Ye Y, Kavanaugh C, Calvo A, Huh JI. Genomic approaches to understanding mammary tumor progression in transgenic mice and responses to therapy. Clin Cancer Res 2004; 10:385S-90S; PMID:14734496; http://dx.doi.org/10.1158/1078-0432.CCR-031201
9. Nam D, Kim SY. Gene-set approach for expression pattern analysis. Brief Bioinform 2008; 9:189-97; PMID:18202032; http://dx.doi.org/10.1093/bib/bbn001
10. Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, et al. Molecular portraits of human breast tumours. Nature 2000; 406:747-52; PMID:10963602; http://dx.doi.org/10.1038/35021093
11. Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001; 98:10869-74; PMID:11553815; http://dx.doi.org/10.1073/pnas.191367098
12. Boutros PC, Lau SK, Pintilie M, Liu N, Shepherd FA, Der SD, Tsao MS, Penn LZ, Jurisica I. Prognostic gene signatures for non-small-cell lung cancer. Proc Natl Acad Sci U S A 2009; 106:2824-8; PMID:19196983; http://dx.doi.org/10.1073/pnas.0809444106
13. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005; 365:671-9; PMID:15721472
14. Miller LD, Smeds J, George J, Vega VB, Vergara L, Ploner A, Pawitan Y, Hall P, Klaar S, Liu ET, et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci U S A 2005; 102:13550-5; PMID:16141321; http://dx.doi.org/10.1073/pnas.0506230102
15. Wang Y, Jatkoe T, Zhang Y, Mutch MG, Talantov D, Jiang J, McLeod HL, Atkins D. Gene expression profiles and molecular markers to predict recurrence of Dukes’ B colon cancer. J Clin Oncol 2004; 22:1564-71; PMID:15051756; http://dx.doi.org/10.1200/JCO.2004.08.186
16. Huang W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009; 37:1-13; PMID:19033363; http://dx.doi.org/10.1093/nar/gkn923
17. Robinson MD, Grigull J, Mohammad N, Hughes TR. FunSpec: a web-based cluster interpreter for yeast. BMC Bioinformatics 2002; 3:35; PMID:12431279; http://dx.doi.org/10.1186/1471-2105-3-35
18. Martínez-Cruz LA, Rubio A, Martínez-Chantar ML, Labarga A, Barrio I, Podhorski A, Segura V, Sevilla Campo JL, Avila MA, Mato JM. GARBAN: genomic analysis and rapid biological annotation of cDNA microarray and proteomic data. Bioinformatics 2003; 19:2158-60; PMID:14594726; http://dx.doi.org/10.1093/bioinformatics/btg291
19. Castillo-Davis CI, Hartl DL. GeneMerge--postgenomic analysis, data mining, and hypothesis testing. Bioinformatics 2003; 19:891-2; PMID:12724301; http://dx.doi.org/10.1093/bioinformatics/btg114
20. Wrobel G, Chalmel F, Primig M. goCluster integrates statistical analysis and functional interpretation of microarray expression data. Bioinformatics 2005; 21:3575-7; PMID:16020468; http://dx.doi.org/10.1093/bioinformatics/bti574
21. Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM, Sherlock G. GO:TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 2004; 20:3710-5; PMID:15297299; http://dx.doi.org/10.1093/bioinformatics/bth456
22. Draghici S, Khatri P, Bhavsar P, Shah A, Krawetz SA, Tainsky MA. Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res 2003; 31:3775-81; PMID:12824416; http://dx.doi.org/10.1093/nar/gkg624
23. Masseroli M, Martucci D, Pinciroli F. GFINDer: Genome Function INtegrated Discoverer through dynamic annotation, statistical analysis, and mining. Nucleic Acids Res 2004; 32(Web Server issue):W293-300; PMID:15215397; http://dx.doi.org/10.1093/nar/gkh432
24. Bauer S, Grossmann S, Vingron M, Robinson PN. Ontologizer 2.0--a multifunctional tool for GO term enrichment analysis and data exploration. Bioinformatics 2008; 24:1650-1; PMID:18511468; http://dx.doi.org/10.1093/bioinformatics/btn250
25. Dennis G Jr., Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003; 4:3; PMID:12734009; http://dx.doi.org/10.1186/gb-2003-4-5-p3
26. Tu K, Yu H, Zhu M. MEGO: gene functional module expression based on gene ontology. Biotechniques 2005; 38:277-83; PMID:15727134; http://dx.doi.org/10.2144/05382RR04
27. Zhong S, Storch KF, Lipan O, Kao MC, Weitz CJ, Wong WH. GoSurfer: a graphical interactive tool for comparative analysis of large gene sets in Gene Ontology space. Appl Bioinformatics 2004; 3:261-4; PMID:15702958; http://dx.doi.org/10.2165/00822942-200403040-00009
28. Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, Wang J, Li S, Li R, Bolund L, et al. WEGO: a web tool for plotting GO annotations. Nucleic Acids Res 2006; 34(Web Server issue):W293-7; PMID:16845012; http://dx.doi.org/10.1093/nar/gkl031
29. Boorsma A, Foat BC, Vis D, Klis F, Bussemaker HJ. T-profiler: scoring the activity of predefined groups of genes using gene expression data. Nucleic Acids Res 2005; 33(Web Server issue):W592-5; PMID:15980543; http://dx.doi.org/10.1093/nar/gki484
30. Backes C, Keller A, Kuentzer J, Kneissl B, Comtesse N, Elnakady YA, Müller R, Meese E, Lenhof HP. GeneTrail--advanced gene set enrichment analysis. Nucleic Acids Res 2007; 35(Web Server issue):W186-92; PMID:17526521; http://dx.doi.org/10.1093/nar/gkm323
31. Smid M, Dorssers LC. GO-Mapper: functional analysis of gene expression data using the expression level as a score to evaluate Gene Ontology terms. Bioinformatics 2004; 20:2618-25; PMID:15130934; http://dx.doi.org/10.1093/bioinformatics/bth293
32. van ’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002; 415:530-6; PMID:11823860; http://dx.doi.org/10.1038/415530a
33. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 2005; 365:488-92; PMID:15705458; http://dx.doi.org/10.1016/S0140-6736(05)17866-0
34. Ramaswamy S, Ross KN, Lander ES, Golub TR. A molecular signature of metastasis in primary solid tumors. Nat Genet 2003; 33:49-54; PMID:12469122; http://dx.doi.org/10.1038/ng1060
35. Ein-Dor L, Kela I, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 2005; 21:171-8; PMID:15308542; http://dx.doi.org/10.1093/bioinformatics/bth469
36. Correia AL, Bissell MJ. The tumor microenvironment is a dominant force in multidrug resistance. Drug Resist Updat 2012; 15:39-49; PMID:22335920; http://dx.doi.org/10.1016/j.drup.2012.01.006
37. Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D. How to infer gene networks from expression profiles. Mol Syst Biol 2007; 3:78; PMID:17299415
38. D’haeseleer P, Liang S, Somogyi R. Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 2000; 16:707-26; PMID:11099257; http://dx.doi.org/10.1093/bioinformatics/16.8.707
39. Kyoda KM, Morohashi M, Onami S, Kitano H. A gene network inference method from continuous-value gene expression data of wild-type and mutants. Genome Inform Ser Workshop Genome Inform 2000; 11:196-204; PMID:11700600
40. Bansal M, Della Gatta G, di Bernardo D. Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics 2006; 22:815-22; PMID:16418235; http://dx.doi.org/10.1093/bioinformatics/btl003
41. Reiss DJ, Baliga NS, Bonneau R. Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics 2006; 7:280; PMID:16749936; http://dx.doi.org/10.1186/1471-2105-7-280
42. Ernst J, Bar-Joseph Z. STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics 2006; 7:191; PMID:16597342; http://dx.doi.org/10.1186/1471-2105-7-191
43. Theocharidis A, van Dongen S, Enright AJ, Freeman TC. Network visualization and analysis of gene expression data using BioLayout Express(3D). Nat Protoc 2009; 4:1535-50; PMID:19798086; http://dx.doi.org/10.1038/nprot.2009.177
44. Hamada K, Hongo K, Suwabe K, Shimizu A, Nagayama T, Abe R, Kikuchi S, Yamamoto N, Fujii T, Yokoyama K, et al. OryzaExpress: an integrated database of gene expression networks and omics annotations in rice. Plant Cell Physiol 2011; 52:220-9; PMID:21186175; http://dx.doi.org/10.1093/pcp/pcq195
45. Slavov N, Dawson KA. Correlation signature of the macroscopic states of the gene regulatory network in cancer. Proc Natl Acad Sci U S A 2009; 106:4079-84; PMID:19246374; http://dx.doi.org/10.1073/pnas.0810803106
46. Wang HQ, Wong HS, Zhu H, Yip TT. A neural network-based biomarker association information extraction approach for cancer classification. J Biomed Inform 2009; 42:654-66; PMID:19162234; http://dx.doi.org/10.1016/j.jbi.2008.12.010
47. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006; 7(Suppl 1):S7; PMID:16723010; http://dx.doi.org/10.1186/1471-2105-7-S1-S7
48. Barabási AL, Bonabeau E. Scale-free networks. Sci Am 2003; 288:60-9; PMID:12701331; http://dx.doi.org/10.1038/scientificamerican0503-60
49. Albert R, Jeong H, Barabási AL. Error and attack tolerance of complex networks. Nature 2000; 406:378-82; PMID:10935628; http://dx.doi.org/10.1038/35019019
50. Jeong H, Mason SP, Barabási AL, Oltvai ZN. Lethality and centrality in protein networks. Nature 2001; 411:41-2; PMID:11333967; http://dx.doi.org/10.1038/35075138
51. Zotenko E, Mestre J, O’Leary DP, Przytycka TM. Why do hubs in the yeast protein interaction network tend to be essential: reexamining the connection between the network topology and essentiality. PLoS Comput Biol 2008; 4:e1000140; PMID:18670624; http://dx.doi.org/10.1371/journal.pcbi.1000140
52. He XL, Zhang JZ. Why do hubs tend to be essential in protein networks? PLoS Genet 2006; 2:e88; PMID:16751849; http://dx.doi.org/10.1371/journal.pgen.0020088
53. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH. PID: the Pathway Interaction Database. Nucleic Acids Res 2009; 37(Database issue):D674-9; PMID:18832364; http://dx.doi.org/10.1093/nar/gkn653
54. Joshi-Tope G, Gillespie M, Vastrik I, D’Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 2005; 33(Database issue):D428-32; PMID:15608231; http://dx.doi.org/10.1093/nar/gki072
55. Antonov AV, Dietmann S, Rodchenkov I, Mewes HW. PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks. Proteomics 2009; 9:2740-9; PMID:19405022; http://dx.doi.org/10.1002/pmic.200800612
56. Vaske CJ, Benz SC, Sanborn JZ, Earl D, Szeto C, Zhu J, Haussler D, Stuart JM. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 2010; 26:i237-45; PMID:20529912; http://dx.doi.org/10.1093/bioinformatics/btq182
57. Vandin F, Upfal E, Raphael BJ. Algorithms for detecting significantly mutated pathways in cancer. J Comput Biol 2011; 18:507-22; PMID:21385051; http://dx.doi.org/10.1089/cmb.2010.0265
58. Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR. GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet 2002; 31:19-20; PMID:11984561; http://dx.doi.org/10.1038/ng0502-19
59. Jun Li PR. Stefan Grünewald, Han Liang 2012.
60. Greenblum SI, Efroni S, Schaefer CF, Buetow KH. The PathOlogist: an automated tool for pathway-centric analysis. BMC Bioinformatics 2011; 12:133; PMID:21542931; http://dx.doi.org/10.1186/1471-2105-12-133
61. Fröhlich H. Network based consensus gene signatures for biomarker discovery in breast cancer. PLoS One 2011; 6:e25364; PMID:22046239; http://dx.doi.org/10.1371/journal.pone.0025364
62. Wang YC, Chen BS. A network-based biomarker approach for molecular investigation and diagnosis of lung cancer. BMC Med Genomics 2011; 4:2; PMID:21211025; http://dx.doi.org/10.1186/1755-8794-4-2
63. Pierobon M, Calvert V, Belluco C, Garaci E, Deng J, Lise M, et al. Multiplexed cell signaling analysis of metastatic and nonmetastatic colorectal cancer reveals COX2-EGFR signaling activation as a potential prognostic pathway biomarker. Clin Colorectal Cancer 2009; 8:110-7; PMID:19739273; http://dx.doi.org/10.3816/CCC.2009.n.018
64. Efroni S, Ben-Hamo R, Edmonson M, Greenblum S, Schaefer CF, Buetow KH. Detecting cancer gene networks characterized by recurrent genomic alterations in a population. PLoS One 2011; 6:e14437; PMID:21283511; http://dx.doi.org/10.1371/journal.pone.0014437
65. D’Agostino RB. An omnibus test of normality for moderate and large size samples. Biometrika 1971; 58:341; http://dx.doi.org/10.1093/biomet/58.2.341
66. Vandin F, Upfal E, Raphael BJ. De novo discovery of mutated driver pathways in cancer. Genome Res 2012; 22:375-85; PMID:21653252; http://dx.doi.org/10.1101/gr.120477.111
67. Wendl MC, Wallis JW, Lin L, Kandoth C, Mardis ER, Wilson RK, Ding L. PathScan: a tool for discerning mutational significance in groups of putative cancer genes. Bioinformatics 2011; 27:1595-602; PMID:21498403; http://dx.doi.org/10.1093/bioinformatics/btr193
68. Auffray C, Chen Z, Hood L. Systems medicine: the future of medical genomics and healthcare. Genome Med 2009; 1:2; PMID:19348689; http://dx.doi.org/10.1186/gm2
69. Lee E, Chuang HY, Kim JW, Ideker T, Lee D. Inferring pathway activity toward precise disease classification. PLoS Comput Biol 2008; 4:e1000217; PMID:18989396; http://dx.doi.org/10.1371/journal.pcbi.1000217
70. Su J, Yoon BJ, Dougherty ER. Accurate and reliable cancer classification based on probabilistic inference of pathway activity. PLoS One 2009; 4:e8161; PMID:19997592; http://dx.doi.org/10.1371/journal.pone.0008161
71. Teschendorff AE, Gomez S, Arenas A, El-Ashry D, Schmidt M, Gehrmann M, Caldas C. Improved prognostic classification of breast cancer defined by antagonistic activation patterns of immune response pathway modules. BMC Cancer 2010; 10:604; PMID:21050467; http://dx.doi.org/10.1186/1471-2407-10-604
72. Ben-Hamo R, Efroni S. Gene expression and network-based analysis reveals a novel role for hsa-miR-9 and drug control over the p38 network in glioblastoma multiforme progression. Genome Med 2011; 3:77; PMID:22122801; http://dx.doi.org/10.1186/gm293
73. Ben-Hamo R, Efroni S. Biomarker robustness reveals the PDGF network as driving disease outcome in ovarian cancer patients in multiple studies. BMC Syst Biol 2012; 6:3; PMID:22236809; http://dx.doi.org/10.1186/1752-0509-6-3
74. Baggs JE, Hughes ME, Hogenesch JB. The network as the target. Wiley Interdiscip Rev Syst Biol Med 2010; 2:127-33; PMID:20836017; http://dx.doi.org/10.1002/wsbm.57
75. Lander ES. Genome-sequencing anniversary. The accelerator. Science 2011; 331:1024; PMID:21350161; http://dx.doi.org/10.1126/science.1204037
76. Angermann BR, Klauschen F, Garcia AD, Prustel T, Zhang F, Germain RN, Meier-Schellersheim M. Computational modeling of cellular signaling processes embedded into dynamic spatial contexts. Nat Methods 2012; 9:283-9; PMID:22286385; http://dx.doi.org/10.1038/nmeth.1861
77. Setty Y, Cohen IR, Dor Y, Harel D. Four-dimensional realistic modeling of pancreatic organogenesis. Proc Natl Acad Sci U S A 2008; 105:20374-9; PMID:19091945; http://dx.doi.org/10.1073/pnas.0808725105
78. Efroni S, Harel D, Cohen IR. Emergent dynamics of thymocyte development and lineage determination. PLoS Comput Biol 2007; 3:e13; PMID:17257050; http://dx.doi.org/10.1371/journal.pcbi.0030013
79. Ein-Dor L, Zuk O, Domany E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci U S A 2006; 103:5923-8; PMID:16585533; http://dx.doi.org/10.1073/pnas.0601231103
80. Venet D, Dumont JE, Detours V. Most random gene expression signatures are significantly associated with breast cancer outcome. PLoS Comput Biol 2011; 7:e1002240; PMID:22028643; http://dx.doi.org/10.1371/journal.pcbi.1002240
81. Ideker T, Sharan R. Protein networks in disease. Genome Res 2008; 18:644-52; PMID:18381899; http://dx.doi.org/10.1101/gr.071852.107