
The Plant Journal (2010) 61, 166–175

doi: 10.1111/j.1365-313X.2009.04043.x

TECHNICAL ADVANCE

Probing differentially expressed genes against a microarray database for in silico suppressor/enhancer and inhibitor/activator screens

José J. Reina-Pinto, Derry Voisin, Roxana Teodor and Alexander Yephremov*

Max-Planck-Institut für Züchtungsforschung, Carl-von-Linné-Weg 10, 50829 Köln, Germany

Received 12 May 2009; revised 2 September 2009; accepted 25 September 2009; published online 5 November 2009.

* For correspondence (fax +49 221 5062 113; e-mail [email protected]).

SUMMARY

High-density oligonucleotide arrays are widely used for analysis of gene expression on a genomic scale, but the generated data remain largely inaccessible for comparative analysis. Similarity searches in databases with differentially expressed gene (DEG) lists may be used to assign potential functions to new genes and to identify potential chemical inhibitors/activators and genetic suppressors/enhancers. Although this is a very promising concept, it requires the compatibility and validity of DEG lists to be significantly improved. Using Arabidopsis and human datasets, we have developed guidelines for performing similarity searches against databases that collect microarray data. We found that, in comparison with many other methods, a rank-product analysis achieves a higher degree of inter- and intra-laboratory consistency of DEG lists, and is advantageous for assessing similarities and differences between them. To support this concept, we developed a tool called MASTA (microarray overlap search tool and analysis), and re-analyzed over 600 Arabidopsis microarray expression datasets. This revealed that large-scale searches produce reliable intersections between DEG lists that prove useful for genetic analysis, thus aiding in the characterization of cellular and molecular mechanisms. We show that this approach can be used to discover unexpected connections and to illuminate unanticipated interactions between individual genes.

Keywords: analysis of microarrays, differentially expressed genes, rank product, suppressor/enhancer screen, extracellular matrix, reactive oxygen.

INTRODUCTION

Microarray analysis is frequently used to identify groups of genes that show a transcriptional response to external stimuli or genetic modifications. Microarray analysis is now a common tool in molecular biology, and data from thousands of microarray experiments are publicly available (Edgar, 2002; Brazma et al., 2003). Revealing similarities and differences between gene expression responses across microarrays is an attractive goal, because it could further expand the potential of this technology. For example, genetic and pharmacological inhibition of gene function can result in similar changes in global gene expression (Marton et al., 1998). A proof-of-concept study by Hughes et al. (2000) demonstrated a general approach for functional annotation of uncharacterized genes and pharmacological perturbations. The study used a reference database of expression profiles corresponding to 300 diverse chemical treatments and mutations in yeast.

There have been significant advances in the storage, analysis and mining of microarray data, as well as the development of web-based tools. Nevertheless, Cahan et al. (2005, 2007) noted that the resulting differentially expressed gene (DEG) lists were not amenable to electronic access, and remained inaccessible for comparison purposes. Poor consistency in determining DEGs and uncertainty in the choice of calculation methods may account for this shortfall. For example, Hosack et al. (2003) used various normalization procedures coupled with ordinary t-tests and significance analysis of microarrays (SAM) to select DEGs. They found that even the same dataset can produce DEG lists that show limited overlap with one another (7–60% intersection, depending on the method used). Other disagreements between DEG lists obtained using different statistical methods in different laboratories have been reported (Marshall, 2004; Shi et al., 2005; Jeffery et al., 2006; MAQC Consortium, 2006). It is reasonable to assume that large-scale comparative analyses based on poorly defined DEG lists will miss a substantial proportion of true cases.

The aim of this paper is to offer guidelines and suggest procedures for performing an effective large-scale comparative analysis using meta-data from a reference database. Our approach does not rely on discriminating gene expression ‘signatures’ or ‘fingerprints’, or on classification techniques such as cluster analysis. Instead, it resembles sequence similarity searches of the type performed by BLAST and FASTA, and enables a researcher to compare query microarray data with a large collection of datasets in a repository. We term this method overlap meta-analysis. We refer to it here as MASTA (microarray overlap search tool and analysis), by analogy with FASTA, which is used for sequence similarity searching against nucleotide and protein databases. As in sequence similarity searches, the main question is: which responses are most similar to the queried gene expression response? The rationale of the method is in some ways reminiscent of that of FASTA, and is based on the consistency of gene order in DEG lists. We demonstrate how MASTA can be used for in silico suppressor/enhancer and inhibitor/activator screens in Arabidopsis thaliana.

© 2009 The Authors. Journal compilation © 2009 Blackwell Publishing Ltd

RESULTS

Before implementing the overlap meta-analysis, several issues should be taken into account. The most important of these concern the uniformity and consistency of data processing, and the detection of differential gene expression.

Considerations in the selection of microarray data

Laboratories use multiple microarray platforms, which often calls into question the validity of studies that compare the results of one microarray experiment with other microarray data. It also causes difficulties in statistical analysis and interpretation, because usually only some of the probes map to genes common to the platforms. Nevertheless, this problem can be resolved (MAQC Consortium, 2006; Yi et al., 2007). To prove the MASTA concept, we decided to focus on a widely used platform. As an example, we surveyed the Gene Expression Omnibus (GEO) database for contributions from various microarray platforms for Arabidopsis thaliana, an experimental model plant. By the end of 2008, the GEO had received submissions for 390 Affymetrix GeneChip ATH1 Genome Array experiments, comprising 72.6% of the total number of cases (537). The contributions of the other platforms were much lower (Figure 1).

Sample tables in public repositories contain expression data that have undergone several successive pre-processing steps. These include background adjustment, normalization and, for Affymetrix GeneChip data, summarization. To avoid bias from the various pre-processing strategies, we sought to re-analyze published microarray datasets uniformly to derive mutually comparable DEG lists. We therefore primarily considered accessions containing raw data from the array scan in CEL files.

Figure 1. Bubble-plot representation of Arabidopsis microarray data in the National Center for Biotechnology Information (NCBI) GEO database (December 2008). The areas of the circles are proportional to platform contributions to the database. The GEO accessions for the platforms are as follows: GPL198, Affymetrix GeneChip Arabidopsis ATH1 Genome Array; GPL71, Affymetrix GeneChip Arabidopsis Genome Array; GPL1979, AtTile1F to Arabidopsis Tiling 1.0F; GPL5337, CATMA_v2.1; GPL4346, CATMA_v2.2; GPL6403, CATMA_v2.3; GPL888, Agilent Arabidopsis 2 Oligo Microarray. The light-gray section indicates Affymetrix ATH1 GeneChip datasets for which CEL files are not available (22%).

Choosing the right statistical method to generate DEG lists

Rank-ordering and cut-off principle. Which computational methods are preferable may be a matter of debate. Any DEG list derived from a microarray data analysis is ambiguous, however, because it may contain false positives (genes selected as differentially expressed when they are not) and may miss some genes that are truly differentially expressed. The length and composition of the selected DEG lists vary with the particular data, the statistical methods and the cut-off criteria applied. Selecting DEGs is the primary task of many computational methods, but not all of them are equally suitable for MASTA, which is aimed at similarity searches. For example, short DEG lists reduce the false positive rate, but they may not provide enough information for a similarity assessment. On the other hand, MASTA can tolerate relatively high false positive rates in longer gene lists. This is because arbitrarily selected items do not undermine a hypergeometric evaluation of the significance of the overlaps. Therefore, to simplify the MASTA algorithm, we applied fixed gene-number cut-offs to gene lists rank-ordered by false discovery rate (FDR) value (lowest FDR first). We did not use cut-offs based on fold change (FC), P value or FDR-controlling procedures. Fixed cut-offs provide uniformity across compared lists, and make it easier to evaluate overlaps statistically as the proportion of DEGs shared by two individual microarray analyses.

Reasons not to use certain well-known statistical methods. A number of statistical procedures can be used to select DEGs. We examined the approximately 600 Arabidopsis ATH1 array datasets in our current database. They usually include only a few hybridizations, and practically all of them have fewer than five replicates. Most datasets (57%) comprise control versus treatment comparisons in a two-replicate design. Under these conditions, methods that apply conventional tests to each individual gene cannot sensibly be used. The MicroArray Quality Control (MAQC) project tested several microarray products (platforms) and three alternative technologies. It showed that the within-platform consistency of DEG lists from five-replicate experiments in different laboratories was quite low when t-test P values were the primary criterion for selecting DEGs (between 20 and 40% for the top-100 gene lists) (Guo et al., 2006; MAQC Consortium, 2006). We reasoned that, if a method produces inconsistent gene lists from replicate samples, it does not suit MASTA, which aims to reveal similarities between diverse data from different studies.
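The hypergeometric evaluation of overlaps between fixed-length DEG lists, referred to above, can be sketched in a few lines. The Python below is an illustration only, not the authors' R scripts, and the function names are hypothetical. It computes the chance probability P(X ≥ k) that two lists share at least k genes. It also finds the smallest overlap whose chance probability falls below a chosen level. The background of 22 810 probe sets is our assumption for the ATH1 array; the text does not state the value used.

```python
from math import comb


def overlap_pvalue(n_total: int, len_a: int, len_b: int, k: int) -> float:
    """Hypergeometric tail P(X >= k) for the overlap X of two gene lists.

    List B (len_b genes) is treated as a random draw, without replacement,
    from n_total genes, of which len_a belong to list A.
    """
    upper = min(len_a, len_b)
    tail = sum(
        comb(len_a, x) * comb(n_total - len_a, len_b - x)
        for x in range(k, upper + 1)
    )
    # Exact integer arithmetic; a single division gives the float result.
    return tail / comb(n_total, len_b)


def random_threshold(n_total: int, len_a: int, len_b: int, alpha: float) -> int:
    """Smallest overlap size whose chance probability falls below alpha."""
    for k in range(min(len_a, len_b) + 1):
        if overlap_pvalue(n_total, len_a, len_b, k) < alpha:
            return k
    raise ValueError("no overlap size reaches the requested alpha")


if __name__ == "__main__":
    N_PROBES = 22810  # assumed number of ATH1 probe sets (not stated in the text)
    print(random_threshold(N_PROBES, 200, 200, 1e-4))  # prints 9
    print(overlap_pvalue(N_PROBES, 200, 200, 9))
```

Under these assumptions, two 200-gene lists need at least nine shared genes before the overlap falls below P = 10^-4. This is consistent with the random threshold of nine genes used for Figure 3. Because the lists have a fixed length, a single threshold of this kind applies to every comparison in the database.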
The MAQC project directly compared four methods (FC, ordinary t-test, SAM and the Wilcoxon test). Consistency was high only when FC ranking was used to select DEGs (MAQC Consortium, 2006). In the two dissimilar human RNA reference samples examined (MAQC Consortium, 2006), the differences in gene expression produced thousands of DEGs, with FC values of up to several thousand. This is an extreme and unusual situation. However, more recent MAQC results from rat toxicogenomics datasets, with moderate differential expression between sample groups, confirmed this conclusion (Guo et al., 2006). This is a remarkable finding. Current practice in microarray data analysis does not favor the FC method, which is expected to preferentially select genes expressed at low levels because these tend to show higher FC values. It may therefore be unwise to rely on FC ranking in MASTA, which deals with public microarray datasets of variable quality.

A possible solution lies in ‘variance shrinkage’ or rank-based nonparametric methods. When analyzing low-replicate microarray data, these have higher statistical power than gene-specific tests. In variance shrinkage methods, the standard error estimate for each gene is optimized using data from other ‘similar’ genes or from all genes on the microarray. In rank-based methods, each gene expression value is converted to a rank, which depends on the expression levels of all genes on the microarray. Both approaches therefore differ methodologically from traditional approaches, which compare the expression of individual genes without considering the other genes on the microarray.

Comparative evaluation of the methods: the settings. Borrowing information from multiple genes using variance shrinkage methods should stabilize the selection of DEGs. To study this issue further, we directly compared the FC method and the ordinary t-test (TT) with six variance shrinkage and rank-based methods: Cyber-T (CT) (Long et al., 2001), local pooled error (LPE) (Jain et al., 2003), two-sample Bayes T (BT) (Fox and Dimmic, 2006), empirical Bayes (EB) (Wright and Simon, 2003), SAM (Tusher et al., 2001) and RankProd (RP) (Breitling et al., 2004; Breitling and Herzyk, 2005). For this evaluation, we selected human and Arabidopsis datasets. The human files came from the Affymetrix 47K platform in the MAQC datasets (GEO GSE5350) and contain five replicates. The independent Affymetrix 22K Arabidopsis datasets comprise four replicates for each factor. The first covers treatment of the rre1 (rapid response to elicitor1) mutant and the wild-type with chitin (GEO GSE2169). The second covers powdery mildew (Erysiphe cichoracearum) infection of wild-type plants (GEO GSE431). For the MAQC data, inter-laboratory comparisons included factorial contrasts between datasets generated by six laboratories that used two contrasting standard RNA samples.
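A short sketch can contrast the two ways of borrowing information across genes described above. The Python below is illustrative only and does not reproduce any of the cited packages.

- `moderated_t` shrinks each gene's pooled variance toward the median variance of all genes. The prior weight `d0` is an assumed empirical-Bayes-style choice made for illustration.
- `rank_product` computes the geometric mean of each gene's up-regulation ranks across replicate comparisons. This follows the rank product idea of Breitling et al. (2004), but omits the permutation-based pfp estimate.

```python
import math
import statistics


def moderated_t(ctrl, trt, d0=4.0):
    """Two-sample t statistics with per-gene variances shrunk toward the median.

    ctrl, trt: per-gene lists of replicate values (genes x replicates, log scale).
    d0: assumed prior weight; d0 = 0 gives the ordinary pooled t-test.
    """
    n1, n2 = len(ctrl[0]), len(trt[0])
    df = n1 + n2 - 2
    variances = []
    for c, t in zip(ctrl, trt):
        mc, mt = statistics.fmean(c), statistics.fmean(t)
        ss = sum((x - mc) ** 2 for x in c) + sum((x - mt) ** 2 for x in t)
        variances.append(ss / df)
    prior = statistics.median(variances)  # information borrowed from all genes
    stats = []
    for c, t, s2 in zip(ctrl, trt, variances):
        s2_mod = (d0 * prior + df * s2) / (d0 + df)  # shrunken variance
        se = math.sqrt(s2_mod * (1 / n1 + 1 / n2))
        stats.append((statistics.fmean(t) - statistics.fmean(c)) / se)
    return stats


def rank_product(log_ratios):
    """Rank product for up-regulation (smaller = more consistently up).

    log_ratios[g][i]: log ratio of gene g in replicate comparison i.
    """
    n_genes, k = len(log_ratios), len(log_ratios[0])
    log_sum = [0.0] * n_genes
    for i in range(k):
        # Rank 1 = largest log ratio within comparison i (depends on all genes).
        order = sorted(range(n_genes), key=lambda g: log_ratios[g][i], reverse=True)
        for rank, g in enumerate(order, start=1):
            log_sum[g] += math.log(rank)
    # Geometric mean of ranks, computed in log space to avoid overflow.
    return [math.exp(s / k) for s in log_sum]
```

With two replicates per group, a gene whose replicates happen to agree closely gets a tiny variance, and so an inflated ordinary t. Shrinkage toward the median damps this effect. The rank product never uses variances at all: a gene scores well only if it ranks near the top in every comparison.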
DEG lists were calculated for each contrast and compared pairwise, giving rise to 15 comparisons (Figure 2a). This computational design is identical to that used by the MAQC Consortium (2006). To assess intra-laboratory consistency, the original Arabidopsis datasets were subdivided into 36 two-replicate contrasts, resulting from all possible permutations. DEG lists calculated for independent contrasts were compared pairwise, giving a total of 18 intra-laboratory comparisons (Figure 2b). The graphs in Figure 2 show the mean consistency values from the 15 and 18 comparisons. The process was repeated for each of the eight statistical methods. DEG lists can be expected to be compatible within any particular method. The consistency of gene ranking, measured as the percentage of overlapping genes, should then reflect the quality of the ranking each method produces (Shi et al., 2005). We reasoned that a method could be recommended for a large-scale MASTA database search only if it produces consistent DEG lists in intra- and inter-laboratory comparisons.

The consistency criterion used in this study is very similar, but not identical, to the percentage of overlapping genes (POG) criterion used by the MAQC Consortium (2006). Here, we used a sliding-window approach, in which DEGs are scanned by a so-called search or detection window. Consistency was calculated as the proportion of genes that occurred in both DEG list subsets defined by the sliding window (Figure 2c). A sliding window of 40 genes, giving a non-overlapping partitioning, was adequate for graphing consistency results. Unlike POG, this criterion does not depend on the lengths of the gene lists, and allows uniform monitoring of concordance along the entire range of selected DEGs.

[Figure 2 panels: (a, b) schematics of pairwise comparisons of DEG lists between laboratories (a) and between replicate permutations (b); (c) the sliding-window scheme; (d–g) consistency of the DEG (%) plotted against rank in the DEG list (20–5000). Panel titles: Human MAQC datasets (d, e); Chitin treatment in Arabidopsis (f); Erysiphe cichoracearum infection in Arabidopsis (g).]

Figure 2. Intra- and inter-laboratory consistency of differentially expressed gene (DEG) rankings obtained by various statistical methods in Arabidopsis and human datasets. (a, b) Schematic representation of the computational procedures for determining the inter-laboratory (a) (L denotes laboratory) and intra-laboratory (b) consistency of DEG lists by an independent contrast approach. For each method, DEG lists are calculated from raw data (CEL files), represented by black and orange squares for the factor levels. They are compared pairwise using the sliding-window technique shown in (c). The consistency of the method is then calculated as an array of mean values over all independent pairwise contrasts (pc). (c) The lists, sorted by decreasing reliability, are divided into a number of sub-blocks, each containing W elements. The number of common elements, C (red stripes), is determined for each pair. For the ith sub-block, consistency (%) is defined as C_i/W_i × 100. To generate the graphs in (d–g), lists of the 5000 most up-regulated genes were compared using a sliding window of size 40, as shown in (c). (d) Inter-laboratory consistency analysis comparing popular pre-processing algorithms for the RP method on the human MAQC dataset (GEO GSE5350). The algorithms are GCRMA_seqMM (full model), GCRMA_seq (uses probe sequence information and ignores mismatches), GCRMA_MM (uses intensity values read from the mismatches and ignores sequence information), RMA and MAS5. (e) Inter-laboratory consistency analysis with contrasting RNA reference samples from human MAQC datasets (GEO GSE5350). (f) Intra-laboratory consistency analysis for the response to chitin treatment in the Arabidopsis rre1 mutant and wild-type (GEO GSE2169). (g) Intra-laboratory consistency analysis for the response to powdery mildew (Erysiphe cichoracearum) infection in Arabidopsis wild-type plants (GEO GSE431). In (e–g), raw data were pre-processed with the GCRMA_seq algorithm (using the sequence-based background adjustment setting) and analyzed using one of the eight methods, as indicated: FC, fold change; CT, Cyber-T; RP, RankProd; LPE, local pooled error; SAM, significance analysis of microarrays; EB, empirical Bayes; BT, Bayes T; TT, t-test.
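The sliding-window consistency defined in Figure 2(c) can be written down directly. The Python sketch below is an illustration, not the authors' R code, and the function names are hypothetical. It assumes that C_i counts the genes shared by the ith windows of the two ranked lists, and that windows do not overlap (W = 40, as in the text). It then averages the per-window values over several pairwise comparisons, as was done for the graphs in Figure 2.

```python
def sliding_window_consistency(list_a, list_b, window=40):
    """Per-window consistency (%) of two ranked gene lists.

    The lists are cut into consecutive, non-overlapping windows of `window`
    genes; for window i the score is 100 * C_i / W, where C_i is the number
    of genes found in window i of both lists. A trailing partial window is
    ignored (our assumption).
    """
    n = min(len(list_a), len(list_b))
    scores = []
    for start in range(0, n - window + 1, window):
        shared = set(list_a[start:start + window]) & set(list_b[start:start + window])
        scores.append(100.0 * len(shared) / window)
    return scores


def mean_consistency(pairs, window=40):
    """Mean per-window consistency over several pairwise comparisons."""
    curves = [sliding_window_consistency(a, b, window) for a, b in pairs]
    # zip(*curves) aligns window i across comparisons (truncates to shortest).
    return [sum(values) / len(values) for values in zip(*curves)]
```

For example, two lists that agree in their first 40 genes and share nothing in the next 40 give `[100.0, 0.0]`. Averaging this comparison with an identical pair gives `[100.0, 50.0]`.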


We chose to pre-process oligonucleotide array data using the GeneChip RMA (GCRMA) algorithm. GCRMA appears to perform better for genes expressed at low levels, and compares favorably with other procedures (Wu and Irizarry, 2004; Allison et al., 2006; Vardhanabhuti et al., 2006). We performed our analyses using the methods listed above: FC, TT, CT, LPE, BT, EB, SAM and RP. The analysis steps for each method may be summarized as follows:

(i) define factorial contrasts by selecting sets of replicates, i.e. CEL files;
(ii) apply the method to obtain up-regulated DEG lists that run from the highest to the lowest ranks of differential expression, i.e. ordered by decreasing reliability;
(iii) define pairwise comparisons and estimate the consistency between each pair of DEG lists by the sliding-window technique;
(iv) calculate the mean consistency across pairwise comparisons within each partition window; and
(v) plot the results on a two-dimensional graph.

Comparative evaluation of the methods: the results. For the MAQC human DEG lists, agreement was greatest with the RP and FC methods, at 82 and 79%, respectively, in the first window (Figure 2e). The CT and LPE approaches achieved a maximum of 55%, followed by BT (38%) and ST (35%). The other methods performed relatively poorly, barely reaching 28% for the first 40 genes. These results confirm that MAQC DEG lists produced with FC ranking show minimal inter-laboratory variation. They also show that RP gene ranking matches the performance of the FC method. We applied the same statistical methods to assess intra-laboratory consistency in two Arabidopsis sample datasets (Figure 2f,g), and examined the ranking of the DEGs. Sets of independent contrasts were obtained through permutations of replicates within sites (Figure 2b).
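With four replicates per factor, the design yields 36 two-replicate contrasts and 18 independent pairwise comparisons. Enumerating the contrasts confirms these counts. In this illustrative Python sketch, ‘independent’ means that two contrasts share no CEL file, which is our reading of Figure 2(b). The replicate labels and function names are hypothetical.

```python
from itertools import combinations


def two_replicate_contrasts(controls, treatments):
    """All contrasts built from two control and two treatment replicates."""
    return [
        (frozenset(c), frozenset(t))
        for c in combinations(controls, 2)
        for t in combinations(treatments, 2)
    ]


def independent_pairs(contrasts):
    """Unordered pairs of contrasts that share no replicate (CEL file)."""
    return [
        (x, y)
        for x, y in combinations(contrasts, 2)
        if not (x[0] & y[0]) and not (x[1] & y[1])
    ]


if __name__ == "__main__":
    ctrl = ["C1", "C2", "C3", "C4"]  # hypothetical CEL file labels
    trt = ["T1", "T2", "T3", "T4"]
    contrasts = two_replicate_contrasts(ctrl, trt)
    print(len(contrasts), len(independent_pairs(contrasts)))  # prints: 36 18
```

With only four replicates per factor, each contrast has exactly one fully disjoint partner: the complementary control pair combined with the complementary treatment pair. The 36 contrasts therefore yield 36/2 = 18 independent comparisons.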
For the chitin treatment experiment (Figure 2f), RP gives a 78% concordance rate and FC 76%, followed by LPE (74%) and CT (62%). For the powdery mildew experiment (Figure 2g), the values are similar: 78% for RP, 73% for FC, 68% for LPE and 51% for CT. In both tests, SAM and TT generated inconsistent DEG lists, with only 7–14% of genes consistently defined. These results support the findings of the MAQC analysis, which found high concordance between DEG lists obtained with the FC method, but not with the widely used SAM or t-tests. Remarkably, our findings also show that the RP method is accurate and may outperform the FC approach in defining consistent DEGs. The human and Arabidopsis microarray experiments differ considerably in numbers of replicates, probes and FC values. Despite this, RP and FC minimized intra- and inter-laboratory variation and surpassed the other methods on the consistency criterion. LPE was developed to handle microarray experiments with a small number of replicates (Jain et al., 2003). It appeared to rival the RP and FC approaches for Arabidopsis, but lagged behind for the MAQC data. These results led us to conclude that SAM- and TT-generated DEGs are impractical for MASTA.

Although we have focused on Affymetrix GeneChip arrays in this paper, a further benchmark is worth noting. In the HG-U133 Latin Square three-replicate experiment (http://www.affymetrix.com/support/technical/sample_data/datasets.affx), 42 transcripts were spiked at various known concentrations into the same biological RNA sample. SAM and TT have relatively low precision in identifying these transcripts. Affymetrix generated this benchmark dataset for the development of statistical algorithms. It provides a framework for assessing the discriminatory power of DEG selection methods by receiver operating characteristic (ROC) analysis. Compared with TT, SAM, LPE and Resolver RatioBuild, ROC curve analysis favored RP (Vardhanabhuti et al., 2006). Jeffery et al. (2006) noted that RP performs well when datasets have low numbers of replicates (e.g. five) or high levels of noise, as measured by the pooled variance. This supports the view that RP is particularly suitable for the microarray datasets that are currently available.

A discussion of the differences between pre-processing algorithms is beyond the scope of this paper. However, the choice of algorithm and parameters can affect the inter-laboratory consistency of DEG lists (Guo et al., 2006; MAQC Consortium, 2006). We therefore tested the RP and FC methods in combination with three popular pre-processing algorithms: MAS5, RMA and GCRMA. For GCRMA, we used options to correct for non-specific hybridization (mismatch), GC content bias, or both.
Figure 2(d) shows a graph compiled from MAQC data, which indicates the relatively poor performance of the MAS5 algorithm combined with the RP approach. Figure S1 shows similar Arabidopsis data for RP and FC. The other pre-processing algorithms performed similarly in our tests. When making use of MM (mismatch) probes, however, GCRMA performed slightly better on average with Arabidopsis data. Analysis of the Latin Square spike-in dataset also supports the compatibility between the GCRMA algorithm and the RP method (Vardhanabhuti et al., 2006). We are aware that suitable post-processing, such as cut-off filtering of FC and P values, could often improve the quality of a calculated DEG list. However, this inevitably leads to DEG lists of unequal length. As mentioned above, this risks generating gene lists that are too short and restrictive.

Probing DEG lists against a microarray database

To demonstrate the proposed methodology, we sought to re-analyze all data from multiple published studies of Arabidopsis, and downloaded a major portion of the available CEL files from several repositories. Based on the descriptions of the experiments, we set up 614 contrasts (e.g. mutant versus wild-type, or pathogen-challenged plants versus control plants). These were calculated using the RP method (Breitling et al., 2004; Breitling and Herzyk, 2005; Hong et al., 2006).

Length of the probe list. Figure 2 shows a remarkable correspondence between the results in humans and Arabidopsis with regard to the consistency of DEGs. For example, in both species, the probability that RP consistently selects a gene is about 60% for the 50th gene in the list and about 25% for the 200th. The corresponding FDRs, estimated as 100% minus the actual consistency of DEGs, are about 40 and 75%, respectively. We generated 5000-gene lists by the RP approach, but used the 200 top-ranked genes for the example search below. This provided reasonable estimates of DEG list similarity, but the optimal cut-off point still needs to be determined. The DEG lists need not be of equal length, and could instead be defined by specific FDR cut-offs. If required, FDR values can be obtained from RP result tables. In these tables, the percentage of false-positive predictions (pfp) for each gene is calculated using a permutation-based procedure (Breitling and Herzyk, 2005). However, fixed DEG list lengths made the results easier to present graphically.

Genetic terminology of differential gene list overlaps in MASTA. For convenience of comparison, we adopted terminology similar to that commonly used in gene linkage analysis. We call it a ‘coupling phase’ when overlap occurs between DEGs mis-regulated in the same direction in both microarray experiments, i.e.
between up-regulated and up-regulated genes, or between down-regulated and down-regulated genes. The term ‘repulsion phase’ refers to cases in which overlap occurs between DEGs mis-regulated in opposite directions in the two microarray experiments. That is, the overlap is between up-regulated and down-regulated genes, or between down-regulated and up-regulated genes. A significant coupling-phase overlap between two DEG lists can indicate an enhancer–target or activator–target gene interaction. A suppressor–target or inhibitor–target gene interaction should be suspected when a significant repulsion-phase overlap occurs between two DEG lists. In general, MASTA detects synergistic and antagonistic interactions as coupling- and repulsion-phase overlaps, respectively.

Application of MASTA in Arabidopsis. Our database contains Arabidopsis data re-analyzed with RP. We used the RP-generated DEGs from each of the 614 contrasts as queries in similarity searches against this database, and plotted the results of each search as a bar graph. One of the 614 graphs is shown in Figure 3(a). All computations were performed using custom-written R scripts. To illustrate the efficiency of the approach, we selected examples related to the role of the extracellular matrix in the biotic and abiotic stress response in Arabidopsis. The full-size bar graphs showing 1228 DEG overlaps are available as Figures S2–S4. Subsets exhibiting representative overlaps are shown in Figure 3(b,c).

Example query 1: asFBP1. Cell wall-associated peroxidases and respiratory burst oxidases (plasma membrane NADPH oxidases) are known to produce reactive oxygen species (ROS). ROS are required for hypersensitive cell death in plant cells challenged with pathogens or elicitors. Antisense expression of a French bean (Phaseolus vulgaris) cDNA encoding the extracellular peroxidase FBP1 produced transgenic Arabidopsis plants (hereafter called asFBP1). These plants were extremely susceptible to bacterial and fungal pathogens. In a leaf-disc assay, the asFBP1 plants showed a decreased oxidative burst in response to elicitors and pathogens. They were also strikingly resistant to fumonisin B1 (FB1), a phytotoxin from Fusarium molds that induces apoptosis in mammalian cells and elicits programmed cell death in plants (Bindschedler et al., 2006). In the subset shown here, we used DEGs from the transgenic asFBP1 plants as queries. Figure 3(b) shows the most significant coupling-phase overlaps. One lies between asFBP1 down-regulated DEGs and eds16 down-regulated DEGs (111 genes, P < 2.111 × 10^-187). The other lies between asFBP1 up-regulated DEGs and eds16 up-regulated DEGs (54 genes, P < 3.711 × 10^-66).
The enhanced disease susceptibility 16 (eds16) mutant (also known as sid2), which carries a mutation in the isochorismate synthase gene ICS1 of the salicylic acid (SA) biosynthetic pathway, has very low levels of SA, is more susceptible to the powdery mildew fungus (Erysiphe cichoracearum), and is impaired in the systemic acquired resistance (SAR) that develops in uninfected parts of the plant. The asFBP1 down-regulated DEGs also show a substantial coupling-phase overlap with NahG down-regulated DEGs (79 genes, P < 3.246 × 10⁻¹¹⁴). Because NahG is a bacterial salicylate hydroxylase that converts salicylic acid to catechol, NahG over-expression in CaMV35S:NahG plants mimics SA-deficient mutants with regard to transcriptional responses and results in a breakdown of SAR. Moreover, 69 genes (P < 5.117 × 10⁻²⁴) overlap between asFBP1 down-regulated DEGs and those in the eds1 mutant, which accumulates about 20% less SA than the wild-type plant (Clarke et al., 2001). The lipase-like EDS1 protein is considered to be a regulator of SA-mediated resistance responses. The up-regulated DEGs in asFBP1 and eds1 also overlap (36 genes, P < 6.201 × 10⁻³⁷). Further, exogenous application of SA appears to induce, in wild-type plants, a number of genes that are down-regulated in asFBP1 plants (67 genes, P < 3.936 × 10⁻⁹⁰), suggesting that asFBP1 may suppress SA signaling. Finally, the repulsion-phase overlap between asFBP1 down-regulated DEGs and pmr4 up-regulated DEGs (57 genes, P < 1.752 × 10⁻⁷¹) identifies asFBP1 as a putative suppressor of powdery mildew resistant 4 (pmr4). This is in agreement with the observation that blocking the SA signaling pathway renders this callose synthase mutant susceptible to powdery mildew (Figure 3b) (Nishimura et al., 2003). Collectively, these and the other DEG similarities shown in Figure 3(b) led us to hypothesize that extracellular peroxidases and SA, a key signaling molecule, could work in the same pathway in plant immunity. This makes it worthwhile both to analyze SA accumulation in asFBP1 plants and to test for epistasis using double mutants.

Figure 3. The analysis subset from a large-scale search of the microarray DEG database for Arabidopsis. Affymetrix raw data files (CEL files) from databases were re-analyzed using an R script to calculate the statistical parameters according to the RP method (Breitling and Herzyk, 2005; Hong et al., 2006) and to detect the top 5000 up-regulated and down-regulated DEGs. The first 200 genes were used for the overlap search shown here. (a) Typical bar plot of the large-scale search. Genes deregulated in response to hydrogen peroxide treatment (GEO GSE5530) were probed against a collection of 1228 DEG lists (representing 614 contrasts) using the MASTA algorithm. Bars show overlaps with the up-regulated DEGs (right-hand side) and down-regulated DEGs (left-hand side) in the query (the full-size bar graph is shown in Figure S2). (b, c) Subsets from searches with DEGs in asFBP1 transgenic plants and pmr6 mutants, respectively (the full-size bar graphs are shown in Figures S3 and S4). In all contrasts, mutants or transgenic plants were compared to wild-types, and treatments were compared to controls, as indicated by the minus sign; e.g. the difference between the gene expression profiles in 35SMYB41 and wild-type was defined as 35SMYB41 minus wild-type. The numbers next to the bars show the actual number of identical gene probes in the corresponding lists (see Table S1). The workbench version of MASTA allows the user to display the list of Affymetrix probe identifiers in an overlap by clicking next to the respective bar. Up-regulated DEGs in overlaps are shown as red bars and down-regulated DEGs as green bars; the random threshold level (the number of identical genes that appear at random in both lists of 200 genes) was set to 9 (blue lines), corresponding to P < 7.122 × 10⁻⁵ based on a hypergeometric distribution.

Example query 2: PMR6. In the second example, we performed a similarity search with a set of genes deregulated in POWDERY MILDEW RESISTANT 6 (PMR6)-deficient plants (Figure 3c). In pmr6, resistance is probably not attributable to a strengthening of immune defenses, as is the case in certain Arabidopsis mutants, e.g. acd, cim, cpr, lsd and mlo (Vogel et al., 2002). A different mechanism appears to operate in pmr6, because its resistance to powdery mildew is not compromised in double mutants that are defective in the salicylic acid (SA), jasmonic acid (JA) and ethylene (ET) signaling pathways, which appear to control resistance to pathogens in plants. Accordingly, MASTA found only moderate, albeit statistically significant, repulsion-phase overlaps (28–38 genes) between up-regulated DEGs in pmr6 and down-regulated DEGs in SA-deficient (CaMV35S:NahG and eds16) and coronatine-insensitive (coi1) plants (Figure 3c). PMR6 is a pectate lyase-like gene, consistent with regulation of this process by the cell wall (Vogel et al., 2002).
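The overlap P values quoted in these examples, and the random threshold of 9 shared genes between two 200-gene lists in Figure 3, rest on an upper-tail hypergeometric test. The authors computed these with an online calculator; the Python sketch below only illustrates the same kind of calculation. The universe size `n_total` is an assumption, because the text does not state which gene or probe count was used, so the sketch is not expected to reproduce the published P values digit for digit. The function names `overlap_p_value` and `random_threshold` are hypothetical.

```python
from math import comb


def overlap_p_value(n_total: int, n_a: int, n_b: int, k: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    X is the number of genes shared by a list of n_a genes and a list of n_b
    genes, both drawn at random without replacement from n_total genes.
    """
    if not (0 <= n_a <= n_total and 0 <= n_b <= n_total):
        raise ValueError("list sizes must lie between 0 and n_total")
    # Exact integer sum of the hypergeometric terms for overlaps k, k+1, ...;
    # dividing only at the end keeps very small tails (e.g. 1e-135) representable.
    numerator = sum(
        comb(n_b, x) * comb(n_total - n_b, n_a - x)
        for x in range(max(k, 0), min(n_a, n_b) + 1)
    )
    return numerator / comb(n_total, n_a)


def random_threshold(n_total: int, n_a: int, n_b: int, alpha: float) -> int:
    """Smallest overlap size whose upper-tail P value is below alpha."""
    for k in range(min(n_a, n_b) + 1):
        if overlap_p_value(n_total, n_a, n_b, k) < alpha:
            return k
    return min(n_a, n_b) + 1  # no attainable overlap is that unlikely
```

For two top-200 lists, a call such as `overlap_p_value(22810, 200, 200, 9)` treats 22,810 as an assumed, approximate number of ATH1 probe sets; a different universe size shifts the P value, which is why the threshold should be recomputed for each array platform.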
Strikingly, the MASTA search reveals a notable coupling-phase overlap (Figure 3c) between genes up-regulated in pmr6 and those up-regulated in antisense alternative oxidase (asAOX1) plants (89 genes, P < 1.118 × 10⁻¹³⁵) (Umbach et al., 2005). AOX, which eliminates reactive oxygen species (ROS), is a plant mitochondrial enzyme that appears to function in stress resistance and may control cell death (Robson and Vanlerberghe, 2002). asAOX1 plants are incapable of inducing AOX1, but an epistatic relationship between asAOX1 and pmr6 has not been reported previously. The MASTA search also revealed that AOX1, PMR5 and PMR6 appear to act in the same pathway; however, the primary overlap between pmr5 and pmr6 DEGs may in part be due to the use of the same wild-type reference samples in this microarray experiment (GEO accession GSE9957). These examples highlight the power of MASTA to provide a theoretical basis for studying metabolic and signaling pathway networks.

DISCUSSION

In this paper, we describe the strategy and algorithms of MASTA, which may help in predicting functional relationships from microarray datasets and in better understanding gene functions and their regulatory networks.

Recently developed databases, such as LOLA (Cahan et al., 2005), L2L (Newman and Weiner, 2005), Oncomine (Rhodes et al., 2004) and EXALT (Yi et al., 2007), make it possible to search for similarity against DEG lists from published mammalian microarray studies. LOLA and L2L have been compiled from the literature and contain hundreds of DEG lists selected from various microarray platforms using various methods, whereas in both Oncomine, the cancer-specific gene expression database, and EXALT, DEG lists were deduced using Student's t-test statistics. Prior to performing a t-test, data are either pre-processed (e.g. log-transformed, median-centered per array, or standard deviation-normalized in Oncomine) or used as they are (as pre-processed data from the GEO database). Another approach, called functional associations by response overlap (FARO), has been suggested by Nielsen et al. (2007). The DEG lists in FARO were obtained by re-analysis of a compendium of 242 Arabidopsis gene expression responses using the t-test or ANOVA. The results can be viewed using the FARO browser, but (at the time of writing) researchers' own data cannot be incorporated into the database and compared. Nevertheless, FARO has advantages over existing methods such as co-expression analysis, and appears to be capable of revealing biologically meaningful associations between experimental factors, e.g. phytohormones, stresses, pathogens, growth stages, tissue types and mutants (Nielsen et al., 2007). In contrast to these approaches, we have paid special attention to the method by which DEG lists are generated.
We demonstrate that an overlap analysis based on rank product-selected DEG lists shows inter- and intralaboratory consistency and has substantial power for the low-replicate microarray data that are prevalent in microarray databases such as ArrayExpress (Brazma et al., 2003), GEO (Edgar, 2002), Genevestigator (Zimmermann et al., 2004; Laule et al., 2006) and others. Interestingly, because it detects DEGs across thousands of genes, MASTA appears to offer an alternative to direct measurement of the cellular levels of biologically active small molecules. As microarray data that monitor gene expression in response to various phytohormonal signals are available, MASTA could be a useful tool for diagnosing the responses elicited by pathogens and stresses in mutant and wild-type plants. Indeed, such an in silico prediction of reduced SA levels was found to be correct for transgenic plants with elevated levels of very long-chain fatty acids (Reina-Pinto et al., 2009; A.Y., unpublished results). An important feature of MASTA, compared with association analyses such as that in the Connectivity Map project (Lamb et al., 2006), is that it distinguishes between coupling and repulsion phases when describing overlaps between DEGs, and details them down to the individual gene probes.
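The distinction between coupling and repulsion phases can be illustrated with a small Python sketch. Given the ranked up- and down-regulated probe lists of a query contrast and of one database contrast, it truncates each list to its top 200 entries (the list length used for Figure 3), forms the four pairwise intersections, and keeps those that reach the random threshold of 9 shared probes. This is not MASTA's code: the function names, the string labels and the inclusive `>=` comparison against the threshold are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical label for one of the four pairwise comparisons: (phase, directions).
PhaseKey = Tuple[str, str]


def top_n(ranked_probes: Sequence[str], n: int) -> Set[str]:
    """Keep the n highest-ranked probe identifiers of a ranked DEG list."""
    return set(ranked_probes[:n])


def phase_overlaps(
    query_up: Sequence[str],
    query_down: Sequence[str],
    target_up: Sequence[str],
    target_down: Sequence[str],
    n: int = 200,
    threshold: int = 9,
) -> Dict[PhaseKey, List[str]]:
    """Return the overlaps between two contrasts that reach `threshold`.

    Coupling phase: probes changed in the same direction in both contrasts.
    Repulsion phase: probes changed in opposite directions.
    """
    q_up, q_down = top_n(query_up, n), top_n(query_down, n)
    t_up, t_down = top_n(target_up, n), top_n(target_down, n)
    candidates: Dict[PhaseKey, Set[str]] = {
        ("coupling", "up/up"): q_up & t_up,
        ("coupling", "down/down"): q_down & t_down,
        ("repulsion", "up/down"): q_up & t_down,
        ("repulsion", "down/up"): q_down & t_up,
    }
    # Report probe identifiers, not just counts, for overlaps at or above the threshold.
    return {key: sorted(probes) for key, probes in candidates.items() if len(probes) >= threshold}
```

Returning the probe identifiers rather than bare counts mirrors the way MASTA details overlaps down to individual probes; a repulsion-phase hit, like the asFBP1/pmr4 overlap described in the Results, is the kind of result that points to a candidate suppressor.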


Distinguishing between coupling and repulsion phases allows MASTA to predict genetic and molecular interactions between factors that have been studied using gene expression profiling. The ability to predict enhancer–target and suppressor–target gene interactions has proved to be a powerful means of assessing the complex, multi-factorial phenotypes of the Arabidopsis cuticular mutants lacerata (lcr) and bodyguard (bdg). As predicted by MASTA, a mutation in SERRATE (SE), a known regulator of miRNA biogenesis and pre-mRNA splicing (Laubinger et al., 2008), suppresses the morphological and epidermal adhesion phenotypes of lcr and bdg, suggesting that RNA signaling may be an important regulatory mechanism in the response to cuticular mutations (Voisin et al., 2009). The class of methods that includes Sample Angler (http://bar.utoronto.ca) and Hormonometer (Volodarsky et al., 2009) is worth mentioning here, as these tools compare gene expression profiles from two or more microarray experiments. However, they are based on co-expression metrics, such as Pearson correlation, rather than on the similarity of DEG lists, and are therefore outside the scope of this paper. Finally, we note that a significant proportion of datasets in the repositories lack raw data files (about 20% in GEO for the Affymetrix ATH1 GeneChip). This shortage of raw data for Arabidopsis is reminiscent of that noted for human and rat datasets (Larsson and Sandberg, 2006), and remedying it requires the joint efforts of authors and editors to ensure that public data collections are complete and accessible for large-scale similarity searches. To summarize, we have found that a rank product method provides consistent DEG lists. This could benefit practical similarity searches across large numbers of poorly replicated microarray datasets. The MASTA approach appears to be sensitive, quantitative, and easy to automate.
MASTA detects biologically relevant similarities between DEGs, including unanticipated parallelism. Further advancement of this high-throughput post-genomic technology requires the development of suitable web-based bioinformatic tools, cross-platform consolidation and integration with existing repositories. MASTA can be accessed through the website of the Bio-Array Resource for Arabidopsis Functional Genomics (http://bar.utoronto.ca/).

EXPERIMENTAL PROCEDURES

Raw microarray data were obtained from ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/), the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo), AtGenExpress (http://www.arabidopsis.org/index.jsp) and the Integrated Microarray Database System (http://ausubellab.mgh.harvard.edu/imds), via the NASC Affywatch subscription service (http://nasc.nott.ac.uk/), from authors' websites, or directly from the authors. A suite of tools for microarray data management, analysis and plotting of results was written in the programming language R and run under Mac OS X. Computations for the RP method were performed using custom-written R scripts with packages available through the Bioconductor project (http://www.bioconductor.org). User-friendly R scripts facilitating the standard RP analysis, together with the manual, are available in Appendices S1–S5. For all other methods, the FlexArray software package for the statistical analysis of microarray data (http://genomequebec.mcgill.ca/FlexArray/) was used; FlexArray provides an interface through which to handle data and run Bioconductor packages. DEGs were ranked by decreasing reliability using fold change values for FC, pfp values for RP, t values for BT, CT, EB and TT, d scores for SAM and the Z statistic for LPE. Prism 5 (GraphPad Software), Canvas (ACD Systems) and Adobe Illustrator (Adobe Systems) were used for graph production, drawing and image editing, respectively. The statistical significance of the overlap between two groups of DEGs was calculated from the hypergeometric distribution using an online program available at http://elegans.uky.edu/MA/progs/overlap_stats.html. P values are reported without correction for multiple comparisons.
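The RP statistic and the pfp values used above for ranking can be sketched in Python following the published description of the method (Breitling et al., 2004): genes are ranked within each replicate comparison, the rank product is the geometric mean of the relative ranks, and the pfp is estimated by comparing each observed value with rank products computed from independently shuffled replicates. This is a minimal sketch, not the authors' R scripts or the RankProd package. It assumes a single data origin, log ratios already computed for each replicate comparison, and ranking for up-regulation only (negating the log ratios ranks down-regulated genes); the function names are hypothetical.

```python
import math
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def relative_ranks(values: Sequence[float]) -> List[float]:
    """Rank genes within one replicate (rank 1 = largest log ratio), divided by n."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    for position, gene in enumerate(order, start=1):
        ranks[gene] = position / len(values)
    return ranks


def rank_products(log_ratios: Sequence[Sequence[float]]) -> List[float]:
    """Geometric mean of relative ranks across k replicate comparisons.

    log_ratios[j][g] is the log fold change of gene g in replicate comparison j.
    """
    k = len(log_ratios)
    log_rp = [0.0] * len(log_ratios[0])
    for column in log_ratios:
        for gene, rel_rank in enumerate(relative_ranks(column)):
            log_rp[gene] += math.log(rel_rank) / k
    return [math.exp(value) for value in log_rp]


def rank_product_pfp(
    log_ratios: Sequence[Sequence[float]], n_perm: int = 100, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Return (rank products, pfp) for up-regulation, via permutation."""
    rng = random.Random(seed)
    observed = rank_products(log_ratios)
    n_genes = len(observed)
    hits = [0] * n_genes
    for _ in range(n_perm):
        # Shuffling each replicate independently destroys any real gene signal.
        permuted = [rng.sample(list(column), len(column)) for column in log_ratios]
        null = sorted(rank_products(permuted))
        for gene, rp in enumerate(observed):
            hits[gene] += bisect_right(null, rp)  # permuted RPs <= observed RP
    pfp = [0.0] * n_genes
    by_rp = sorted(range(n_genes), key=lambda g: observed[g])
    for position, gene in enumerate(by_rp, start=1):
        expected = hits[gene] / n_perm  # expected number of genes this extreme by chance
        pfp[gene] = expected / position
    return observed, pfp
```

Small RP and pfp values mark the most consistently up-regulated genes; sorting genes by pfp, as described above, gives the ranked DEG list from which the top entries are taken for overlap searches.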

ACKNOWLEDGEMENTS

We would like to thank Shauna Somerville (Energy Biosciences Institute and Department of Plant and Microbial Biology, University of California), Jean-Pierre Métraux (Department of Biology, University of Fribourg), Chiara Tonelli, Eleonora Cominelli (Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano), Herman Höfte (INRA, Versailles, France), Jae-Heung Ko (Department of Forestry, Michigan State University), Gary Stacey, Jinrong Wan (National Center for Soybean Biotechnology, University of Missouri), Françoise Thibaud-Nissen (The Institute for Genomic Research, Rockville, MD), Steve Howell (Plant Sciences Institute, Iowa State University), Stephen Goldman (Plant Science Research Center, University of Toledo), Scott Baerson (National Center for Natural Products Research, University of Mississippi), Trevor Stevenson (Department of Applied Biotechnology and Environmental Biology, RMIT University) and Scott Poethig (Department of Biology, University of Pennsylvania) for making their raw microarray data available to us. This work was supported by an International Max Planck Graduate Research School (IMPRS) fellowship to R.T. and by Max Planck Gesellschaft (MPG) postdoctoral research fellowships to J.J.R.-P. and D.V.

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article:
Figure S1. Effects of various pre-processing algorithms on the intralaboratory consistency of DEG rankings obtained using RP and FC methods in Arabidopsis.
Figure S2. Probing the MASTA database using genes differentially expressed in response to hydrogen peroxide treatment (GEO GSE5530).
Figure S3. Probing the MASTA database using genes differentially expressed in asFBP1 transgenic plants (http://ausubellab.mgh.harvard.edu/imds/).
Figure S4. Probing the MASTA database using genes differentially expressed in the pmr6 mutant (http://ausubellab.mgh.harvard.edu/imds/).
Table S1. Gene list overlaps.
Appendices S1–S4. R scripts allowing user-friendly calculations in the RP analysis.
Appendix S5. RPtable user manual (Mac OS X and Windows).
Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.


REFERENCES

Allison, D.B., Cui, X., Page, G.P. and Sabripour, M. (2006) Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet. 7, 55–65.
Bindschedler, L.V., Dewdney, J., Blee, K.A., Stone, J.M., Asai, T., Plotnikov, J., Denoux, C., Hayes, T., Gerrish, C. and Davies, D.R. (2006) Peroxidase-dependent apoplastic oxidative burst in Arabidopsis required for pathogen resistance. Plant J. 47, 851–863.
Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., Holloway, E., Kapushesky, M., Kemmeren, P. and Lara, G.G. (2003) ArrayExpress – a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31, 68.
Breitling, R. and Herzyk, P. (2005) Rank-based methods as a non-parametric alternative of the T-statistic for the analysis of biological microarray data. J. Bioinform. Comput. Biol. 3, 1171–1189.
Breitling, R., Armengaud, P., Amtmann, A. and Herzyk, P. (2004) Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. 573, 83–92.
Cahan, P., Ahmad, A.M., Burke, H., Fu, S., Lai, Y., Florea, L., Dharker, N., Kobrinski, T., Kale, P. and McCaffrey, T.A. (2005) List of lists-annotated (LOLA): a database for annotation and comparison of published microarray gene lists. Gene, 360, 78–82.
Cahan, P., Rovegno, F., Mooney, D., Newman, J.C., St Laurent, G. and McCaffrey, T.A. (2007) Meta-analysis of microarray results: challenges, opportunities, and recommendations for standardization. Gene, 401, 12–18.
Clarke, J.D., Aarts, N., Feys, B.J., Dong, X. and Parker, J.E. (2001) Constitutive disease resistance requires EDS1 in the Arabidopsis mutants cpr1 and cpr6 and is partially EDS1-dependent in cpr5. Plant J. 26, 409–420.
Edgar, R. (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210.
Fox, R.J. and Dimmic, M.W. (2006) A two-sample Bayesian t-test for microarray data. BMC Bioinformatics, 7, 126.
Guo, L., Lobenhofer, E.K., Wang, C., Shippy, R., Harris, S.C., Zhang, L., Mei, N., Chen, T., Herman, D. and Goodsaid, F.M. (2006) Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat. Biotechnol. 24, 1162–1169.
Hong, F., Breitling, R., McEntee, C.W., Wittner, B.S., Nemhauser, J.L. and Chory, J. (2006) RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics, 22, 2825–2827.
Hosack, D.A., Dennis, G. Jr, Sherman, B.T., Lane, H.C. and Lempicki, R.A. (2003) Identifying biological themes within lists of genes with EASE. Genome Biol. 4, R70.
Hughes, T.R., Marton, M.J., Jones, A.R., Roberts, C.J., Stoughton, R., Armour, C.D., Bennett, H.A., Coffey, E., Dai, H. and He, Y.D. (2000) Functional discovery via a compendium of expression profiles. Cell, 102, 109–126.
Jain, N., Thatte, J., Braciale, T., Ley, K., O'Connell, M. and Lee, J.K. (2003) Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics, 19, 1945–1951.
Jeffery, I.B., Higgins, D.G. and Culhane, A.C. (2006) Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics, 7, 359.
Lamb, J., Crawford, E.D., Peck, D., Modell, J.W., Blat, I.C., Wrobel, M.J., Lerner, J., Brunet, J.P., Subramanian, A. and Ross, K.N. (2006) The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science, 313, 1929–1935.
Larsson, O. and Sandberg, R. (2006) Lack of correct data format and comparability limits future integrative microarray research. Nat. Biotechnol. 24, 1322–1323.
Laubinger, S., Sachsenberg, T., Zeller, G., Busch, W., Lohmann, J.U., Rätsch, G. and Weigel, D. (2008) Dual roles of the nuclear cap-binding complex and SERRATE in pre-mRNA splicing and microRNA processing in Arabidopsis thaliana. Proc. Natl Acad. Sci. USA, 105, 8795–8800.
Laule, O., Hirsch-Hoffmann, M., Hruz, T., Gruissem, W. and Zimmermann, P. (2006) Web-based analysis of the mouse transcriptome using Genevestigator. BMC Bioinformatics, 7, 311.
Long, A.D., Mangalam, H.J., Chan, B.Y.P., Tolleri, L., Hatfield, G.W. and Baldi, P. (2001) Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. Analysis of global gene expression in Escherichia coli K12. J. Biol. Chem. 276, 19937–19944.
MAQC Consortium. (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161.
Marshall, E. (2004) Getting the noise out of gene arrays. Science, 306, 630–631.
Marton, M.J., DeRisi, J.L., Bennett, H.A., Iyer, V.R., Meyer, M.R., Roberts, C.J., Stoughton, R., Burchard, J., Slade, D. and Dai, H. (1998) Drug target validation and identification of secondary drug target effects using DNA microarrays. Nat. Med. 4, 1293–1301.
Newman, J.C. and Weiner, A.M. (2005) L2L: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol. 6, R81.
Nielsen, H.B., Mundy, J. and Willenbrock, H. (2007) Functional associations by response overlap (FARO), a functional genomics approach matching gene expression phenotypes. PLoS ONE, 2, e676.
Nishimura, M.T., Stein, M., Hou, B.H., Vogel, J.P., Edwards, H. and Somerville, S.C. (2003) Loss of a callose synthase results in salicylic acid-dependent disease resistance. Science, 301, 969–972.
Reina-Pinto, J.J., Voisin, D., Kurdyukov, S. et al. (2009) Misexpression of FATTY ACID ELONGATION1 in the Arabidopsis epidermis induces cell death and suggests a critical role for phospholipase A2 in this process. Plant Cell, 21, 1252–1272.
Rhodes, D.R., Yu, J., Shanker, K., Deshpande, N., Varambally, R., Ghosh, D., Barrette, T., Pandey, A. and Chinnaiyan, A.M. (2004) Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl Acad. Sci. USA, 101, 9309–9314.
Robson, C.A. and Vanlerberghe, G.C. (2002) Transgenic plant cells lacking mitochondrial alternative oxidase have increased susceptibility to mitochondria-dependent and -independent pathways of programmed cell death. Plant Physiol. 129, 1908–1920.
Shi, L., Tong, W., Fang, H., Scherf, U., Han, J., Puri, R.K., Frueh, F.W., Goodsaid, F.M., Guo, L. and Su, Z. (2005) Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics, 6, S12.
Toufighi, K., Brady, S.M., Austin, R., Ly, E. and Provart, N.J. (2005) The Botany Array Resource: e-Northerns, expression angling, and promoter analyses. Plant J. 43, 153–163.
Tusher, V.G., Tibshirani, R. and Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.
Umbach, A.L., Fiorani, F. and Siedow, J.N. (2005) Characterization of transformed Arabidopsis with altered alternative oxidase levels and analysis of effects on reactive oxygen species in tissue. Plant Physiol. 139, 1806–1820.
Vardhanabhuti, S., Blakemore, S.J., Clark, S.M., Ghosh, S., Stephens, R.J. and Rajagopalan, D. (2006) A comparison of statistical tests for detecting differential expression using Affymetrix oligonucleotide microarrays. OMICS, 10, 555–566.
Vogel, J.P., Raab, T.K., Schiff, C. and Somerville, S.C. (2002) PMR6, a pectate lyase-like gene required for powdery mildew susceptibility in Arabidopsis. Plant Cell, 14, 2095–2106.
Voisin, D., Nawrath, C., Kurdyukov, S., Franke, R.B., Reina-Pinto, J.J., Efremova, N., Will, I., Schreiber, L. and Yephremov, A. (2009) Dissection of the complex phenotype in cuticular mutants of Arabidopsis reveals a role of SERRATE as a mediator. PLoS Genet. 5, e1000703. doi: 10.1371/journal.pgen.1000703.
Volodarsky, D., Leviatan, N., Otcheretianski, A. and Fluhr, R. (2009) HORMONOMETER: a tool for discerning transcript signatures of hormone action in the Arabidopsis transcriptome. Plant Physiol. 150, 1796–1805.
Wright, G.W. and Simon, R.M. (2003) A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics, 19, 2448–2455.
Wu, Z. and Irizarry, R.A. (2004) Preprocessing of oligonucleotide array data. Nat. Biotechnol. 22, 656–658.
Yi, Y., Li, C., Miller, C. and George, A.L. Jr (2007) Strategy for encoding and comparison of gene expression signatures. Genome Biol. 8, R133.
Zimmermann, P., Hirsch-Hoffmann, M., Hennig, L. and Gruissem, W. (2004) GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol. 136, 2621–2632.