The Plant Journal (2005) 41, 212–220
doi: 10.1111/j.1365-313X.2004.02295.x
Transcriptional divergence of the duplicated oxidative stress-responsive genes in the Arabidopsis genome
H. Stanley Kim1, Yan Yu1, Erik C. Snesrud1, Linda P. Moy1, Lara D. Linford1, Brian J. Haas1, William C. Nierman1 and John Quackenbush1,2,3,4,*
1 The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA,
2 Department of Biochemistry, The George Washington University, Washington, DC 20037, USA,
3 Department of Statistics, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD 21205, USA, and
4 Department of Chemical Engineering, the University of Maryland, College Park, MD 20742, USA
Received 21 July 2004; revised 5 October 2004; accepted 14 October 2004.
* For correspondence (fax +301 838 0208; e-mail [email protected]).
Summary
Previous studies have indicated that Arabidopsis thaliana experienced a genome-wide duplication event shortly before its divergence from Brassica, followed by extensive chromosomal rearrangements and deletions. While a large number of the duplicated genes have significantly diverged or lost their sister genes, we found 4222 pairs that are still highly conserved and, as a result, had similar functional assignments during the annotation of the genome sequence. Using whole-genome DNA microarrays, we identified 906 duplicated gene pairs in which at least one member exhibited a significant response to oxidative stress. Among these, only 117 pairs were up- or down-regulated in both members, and many of these exhibited dissimilar patterns of expression. Examination of the expression patterns of PAL1 and PAL2, ADC1 and ADC2, genes coding for two Hsp20s, various P450s, and electron transfer flavoproteins suggests that Arabidopsis evolved a number of distinct oxidative stress response mechanisms using similar gene sets following the duplication of its genome.
Keywords: DNA microarray, gene expression, gene duplication, Arabidopsis, oxidative stress, H2O2.
Introduction
Analysis of the Arabidopsis thaliana genome sequence revealed the existence of pairs of chromosomal blocks that exhibit high-level synteny and similarity in gene content (Arabidopsis Genome Initiative, 2000; Blanc et al., 2000; Lynch and Conery, 2000; Bowers et al., 2003). These blocks span nearly the entirety of the genome without overlapping and are suggested to be remnants of a recent genome-wide duplication. Statistical and phylogenetic studies suggest that this genome duplication most likely occurred after the Arabidopsis lineage diverged from that of soybean (Glycine max) and shortly before it diverged from its sister genus, Brassica (Blanc et al., 2003; Bowers et al., 2003; Ermolaeva et al., 2003), with gene loss following duplication contributing to the speciation process (Ermolaeva et al., 2003; Lynch and Force, 2000). The possible existence of more ancient duplication events has also been reported. One appears to pre-date the divergence of Arabidopsis from the other dicots analyzed but post-date its divergence from the monocots about 170–235 Ma (Blanc et al., 2003; Bowers et al., 2003), and Bowers et al. (2003) reported an even older event that may pre-date monocot–dicot divergence. However, the signal from these more ancient duplications is much more difficult to detect because the duplicated blocks have been severely rearranged by more recent duplication events.

Soon after a duplication event, degenerative mutations are likely to eliminate many duplicated genes from the genome (Force et al., 1999; Lynch and Conery, 2000; Lynch and Force, 2000; Otto and Whitton, 2000). The remaining duplicated genes are thought to be a source of biochemical diversity, although little is known about the mechanism that selectively preserves duplicates. Lately there has been much interest in whether a positive correlation exists between coding region divergence and expression divergence. Two studies (Gu et al., 2002; Wagner, 2000) used yeast microarray data to test for such a correlation on a genome-wide scale. While Wagner (2000) did not find a significant correlation between coding sequence (CDS) divergence and expression divergence, Gu et al. (2002) observed that the expression divergence between duplicated genes correlated significantly with their synonymous divergence and also partly with their non-synonymous divergence. Makova and Li (2003) analyzed duplicated gene pairs in humans and suggested that protein sequence divergence is at least initially coupled to spatial expression patterns.

Here we present an analysis of the expression of duplicated genes arising from the most recent genome-wide duplication in Arabidopsis. As a model, we examined the genes associated with oxidative stress response, which is an integral component of the plant’s central defense mechanism and consequently must constantly evolve to allow survival in an ever-changing environment (Kovtun et al., 2000; Lamb and Dixon, 1997; McDowell and Dangl, 2000). We found that although the duplicated genes retain a high degree of CDS conservation, the transcriptional response of the majority has diverged, suggesting adaptive evolutionary mechanisms are at work. As an example, we discuss the evolutionary response of a few key genes and their significance in the stress response mechanism in Arabidopsis.

© 2004 Blackwell Publishing Ltd

Results and discussion
To identify duplicated genes in Arabidopsis, we used the whole-genome comparison tool MuMmer (TIGR, Rockville, MD, USA; http://www.tigr.org/software), requiring a minimum 30% identity over at least 40% of the shorter protein length, and found 4222 putative duplicated gene pairs in 37 paired genomic segments (Table 1; Figure 1). These represent the recently duplicated blocks within the genome and are nearly identical to previously identified duplicated segments (Arabidopsis Genome Initiative, 2000; Blanc et al., 2003; Bowers et al., 2003; Ermolaeva et al., 2003).
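The similarity criterion described above (at least 30% identity over at least 40% of the shorter protein) can be illustrated with a minimal sketch. This is not the MuMmer pipeline itself: the `Hit` record, its field names, and the example values are hypothetical, and the sketch assumes that each candidate alignment has already been summarized as a percent identity and an aligned length.

```python
from dataclasses import dataclass

MIN_IDENTITY = 30.0   # percent identity threshold stated in the text
MIN_COVERAGE = 0.40   # fraction of the shorter protein that must be aligned


@dataclass
class Hit:
    """Hypothetical summary of one protein-protein alignment."""
    gene_a: str
    gene_b: str
    len_a: int         # protein length of gene_a (residues)
    len_b: int         # protein length of gene_b (residues)
    aligned_len: int   # residues covered by the alignment
    identity: float    # percent identity over the aligned region


def is_putative_duplicate(hit: Hit) -> bool:
    """Apply both thresholds relative to the shorter of the two proteins."""
    shorter = min(hit.len_a, hit.len_b)
    coverage = hit.aligned_len / shorter
    return hit.identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


# Placeholder gene names and values, for illustration only.
hits = [
    Hit("geneA1", "geneB1", 500, 480, 400, 82.0),  # passes both thresholds
    Hit("geneA2", "geneB2", 500, 300, 100, 90.0),  # coverage is only 0.33
    Hit("geneA3", "geneB3", 200, 210, 150, 25.0),  # identity below 30%
]
pairs = [(h.gene_a, h.gene_b) for h in hits if is_putative_duplicate(h)]
```

Measuring coverage against the shorter protein keeps a short gene from being rejected merely because its partner has gained extra sequence.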
As noted in previous studies, we found that only about one-third of the genes located within any duplicated segment can be associated with a corresponding gene in its duplicated partner (Table 1), suggesting that a major fraction of the duplicated copies may have greatly diverged or been deleted.

We then exposed Arabidopsis Col-0 plants grown in liquid culture to H2O2 (5 mM final concentration) and collected plants at 0, 1, 3, 6, and 12 h following exposure. To reduce biological variation, plants at each time point were collected and pooled from three replicate growth cultures, each containing about 100 individual plants. Messenger RNA was extracted from the plant pools, and gene expression profiles relative to the control (0 h) sample were obtained using whole-genome DNA microarrays. Of the 25 636 genes represented on these arrays, 20 469 satisfied our quality control standards, and we were able to identify 1907 genes that were differentially expressed (95% confidence) in response to oxidative stress at one or more time points (Supplemental Table S1). Comparing these differentially
regulated genes to the set of 4222 duplicated gene pairs, we found 906 pairs containing at least one member responsive to oxidative stress; of these, in only 117 pairs did both members exhibit a significant response (Table 1; Figure 2). This suggests that in most of these duplicated pairs of oxidative stress-responsive genes, either the two copies have functionally diverged or one copy has been transcriptionally silenced, so that the copies are no longer redundant, although we cannot rule out that some may retain partial redundancy.

While patterns of expression varied among individual genes, we found two pairs of duplicated segments (labeled 11 and 32 in Table 1 and containing 20 and 98 genes, respectively) in which a number of genes in one of the two sister segments responded significantly to oxidative stress while none of their duplicated partners in the corresponding segment had detectable expression levels (Figure 3a). Although most of the apparently silenced segments are scattered throughout the genome, two are located adjacent to each other near the centromere of chromosome 4, suggesting this may be a heterochromatic region. However, this region is not enriched in pseudogenes and transposable elements, as most known heterochromatic regions are (Copenhaver et al., 1999; McCombie et al., 2000). Within the silenced regions are genes coding for subtilisin-like serine proteases and the disease resistance protein AIG1 (pair 11), which have been previously implicated in stress response (Reuber and Ausubel, 1996; Tornero et al., 1996), and for various protein kinases (pair 32) that play pivotal roles in stress-related signaling (Kovtun et al., 2000; Levine et al., 1994). This apparent silencing of genes implicated in stress response warrants further study to understand the potential functions encoded by these genes and the mechanisms responsible for suppressing their expression.
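The pair-level bookkeeping reported above (906 pairs with at least one responsive member, of which 117 have two) amounts to intersecting the list of duplicated pairs with the set of differentially expressed genes. The sketch below shows one way to do this; the function name and the toy gene labels are hypothetical, and only the totals 789, 117, and 906 come from the text and Table 1.

```python
from collections import Counter


def classify_pairs(pairs, responsive):
    """Count duplicated pairs by how many members are stress-responsive."""
    labels = ("neither", "one", "both")
    counts = Counter()
    for gene_a, gene_b in pairs:
        n_responsive = (gene_a in responsive) + (gene_b in responsive)
        counts[labels[n_responsive]] += 1
    return counts


# Toy data with placeholder gene names (not real loci).
pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g7", "g8")]
responsive = {"g1", "g2", "g3", "g8"}
counts = classify_pairs(pairs, responsive)

# The same bookkeeping applied to the reported totals.
one_member, both_members = 789, 117
total_responsive_pairs = one_member + both_members      # 906 pairs
fraction_both = both_members / total_responsive_pairs   # roughly 13%
```

The final ratio corresponds to the "percentage in both" entry in the Total row of Table 1.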
Technically, the data from these segment pairs demonstrate that our microarray assays can distinguish expression patterns of genes that share a high degree of sequence similarity; genes in these duplicated pairs range from 53 to 84% identity at the DNA level (Figure 3b). This is consistent with the results reported by Miki et al. (2001), in which they showed using mouse full-length cDNA arrays that the signal intensity due to cross-hybridization between genes sharing 80% identity was only about one-tenth of that from the legitimate perfect match probe.

Among the 117 gene pairs that showed significant responses in both partners, the patterns of expression between duplicates ranged from similar to very distinct (Figure 4). Using quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) we confirmed the microarray data for four pairs of duplicated genes, three with similar expression profiles and one in which the expression profiles diverged (Figure 5); the Pearson correlation coefficient between results from these techniques is 0.95. Our ability to recover the expression patterns observed on the
Table 1 Expression in the duplicated regions of the Arabidopsis genome

      Group 1 (a)                               Group 2 (a)                               Duplicated pairs between groups 1 and 2
                    Number of genes H2O2-responsive                 Number of genes H2O2-responsive             H2O2-responsive genes
Pair  Loc.    Total  In pairs Total  In pairs   Loc.    Total  In pairs Total  In pairs    Pairs  One group  Both  % both
1     Chr 1     344       119    23        11   Chr 1     468       136    33        15      282         47     4       8
2     Chr 1     186        85    20        12   Chr 1     260        92    13         8      116         30     3       9
3     Chr 1     507       210    40        25   Chr 1     598       230    56        24      271         52     4       7
4     Chr 1     344        87    26         6   Chr 1     238        79    16         7      107         15     1       6
5     Chr 1     402       149    20        10   Chr 2     646       162    58        14      230         26     5      16
6     Chr 1     122        56    13         8   Chr 2     219        54    16         7       71         10    10      50
7     Chr 1     212        49    23         7   Chr 2     107        41     5         1       61         13     0       0
8     Chr 1      61        21     3         1   Chr 3     135        23    13         4       24          3     1      25
9     Chr 1     370       111    16         8   Chr 3     438       117    26         9      133         19     2      10
10    Chr 1     452       130    28        10   Chr 3     434       127    35        16      219         56    13      19
11    Chr 1     115        34    14         6   Chr 4      82        30     0         0       78         20     0       0
12    Chr 1     204        55    17         6   Chr 4     342        57    18         3       71          9     0       0
13    Chr 1      68        34     5         3   Chr 5     100        31    13         4       36          6     1      14
14    Chr 1     117        38     7         0   Chr 5      79        30     5         4       69         19     0       0
15    Chr 1      28         8     3         1   Chr 5      31         8     2         1       10          0     1     100
16    Chr 2    1249       373   100        43   Chr 3    1076       394    58        28      490         73     8      10
17    Chr 2     222        79    17         9   Chr 4     215        85    24        15       94         17     5      23
18    Chr 2     205        65    15         5   Chr 4     211        75    10         4       83          9     1      10
19    Chr 2     279       104    29        15   Chr 4     321       111    43        22      170         43    10      19
20    Chr 2     265        75    30         7   Chr 4     302        88    22         8      104         15     2      12
21    Chr 3     329        72    16         4   Chr 4     245        71    14         4       80         10     1       9
22    Chr 3     310       124    21         9   Chr 5     442       131    33        15      168         30    12      29
23    Chr 3      98        31    12         3   Chr 5     112        29     5         3       36          5     1      17
24    Chr 3      40        19     1         1   Chr 5      64        22     5         1       22          0     1     100
25    Chr 3     118        26     4         0   Chr 5     138        27    16         6       29          6     0       0
26    Chr 3     341       112    28        14   Chr 5     562       120    42         8      137         25     3      11
27    Chr 3     936       201    60         1   Chr 5      61        27     9         3       28          4     0       0
28    Chr 3      66        21     1         0   Chr 5      85        20     9         1       23          1     0       0
29    Chr 3     215        69    20         4   Chr 5     205        62    14         8      139         10     3      23
30    Chr 3     401       111    27         8   Chr 5     542       148    41        11       99         19     2      10
31    Chr 4      79        26     5         5   Chr 4      91        26     4         2       42          9     4      31
32    Chr 4     181        57     0         0   Chr 4     162        66    22        17      210         83     0       0
33    Chr 4     402       119    27        18   Chr 5     365       132    16        10      220         47    12      20
34    Chr 4     125        34     7         3   Chr 5     226        39    12         4       46          8     0       0
35    Chr 4     154        80     8         6   Chr 5     325        90    22        10      112         14     5      26
36    Chr 5     304       115    16         8   Chr 5     472       124    28         6      112         11     2      15
37    Chr 5     338        99    16         2   Chr 5     301        61     6         5        0         10     0       0
Total         10189      3198   718       279             10700      3165   764       308     4222        789   117      13

a Groups 1 and 2 represent the collections of the genes present in a segment or the other in each of the 37 pairs. Segments from the pairs were randomly picked to be grouped together.
b (Pair) Pairs of duplicated chromosomal segments found by MuMmer.
c (Loc.) Location of the chromosomal segments in the genome.
d (Total) Total number of genes present in a segment.
e (In pairs) Genes that have matching genes of high similarity (i.e. possible duplicated genes) in the other segment. These comprise about one-third of the total genes.
f (Pairs) Number of the gene pairs identified by MuMmer in each segment pair. Some genes have multiple partners.
One group, Both, % both: H2O2-responsive genes only in one group, in both groups, and percentage in both.
arrays using qRT-PCR, a gene sequence-specific technique, confirms that the similarity in expression patterns observed for these three pairs is not due to cross-hybridization but reflects the actual expression levels of the individual genes.
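Agreement between the two platforms, summarized above by a Pearson correlation coefficient of 0.95, can be computed along the following lines. This is a minimal sketch: the log2 ratios below are invented placeholder values, not the measurements behind Figure 5, and the `pearson` helper is written out only to keep the example self-contained.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 expression ratios for eight genes (labels A to H),
# one value per gene from each platform.
microarray_log2 = [2.1, 1.8, 3.0, 2.7, 0.9, 1.1, -0.6, 1.9]
qrtpcr_log2 = [2.4, 2.0, 3.3, 2.5, 1.2, 1.0, -0.9, 2.2]

r = pearson(microarray_log2, qrtpcr_log2)
```

Comparing log2 ratios rather than raw intensities puts both platforms on the same relative scale, so the coefficient reflects agreement in direction and magnitude of regulation.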
While the physiological and evolutionary significance of the observed patterns of expression for each gene pair remains to be exhaustively studied, it is possible that many of the pairs in which the genes have similar patterns of
[Figure 1 image: chromosomes Chr1 to Chr5 drawn against a position scale in Mbp (ticks at 10, 20, and 30).]
Figure 1. Genome-wide segmental duplications identified in the Arabidopsis genome. Matching duplicated regions are denoted with connecting lines. The entire list of the duplicated genes and their physical locations on the chromosomes can be found at: http://www.tigr.org/tdb/e2k1/ath1/ arabGenomeDups.html.
expression have retained identical or similar functions. In fact, phenylalanine ammonia lyases 1 and 2, encoded by At2g37040 and At3g53260 respectively, are functionally redundant, and their expression is both qualitatively and quantitatively similar in a number of plant tissues and under various inductive conditions (Chong et al., 2001; Wanner et al., 1995). These enzymes play key roles in the synthesis of lignin and antimicrobial agents collectively called phytoalexins (Chong et al., 2001), and rapid accumulation
Figure 2. Oxidative stress-responsive gene expression in Arabidopsis. Approximately 9% of the genes tested (1907 of 20 469) exhibited a significant transcriptional response at one or more time points. Within this set were representatives of 906 of the 4222 duplicated gene pairs identified through sequence analysis; 789 pairs were represented by a single significantly regulated gene, while in 117 pairs both members were identified as significant.
of these enzymes is believed to help the plant defend itself against environmental challenges, including pathogen attack (Grant et al., 2000). Thus, we cannot rule out the possibility that both copies have been retained because rapidly increased gene dosage provides a fitness advantage. However, it is also possible that both copies have undergone quantitative subfunctionalization, whereby each copy has partially degraded so that both are required for optimal fitness while a partial overlap in function remains (Force et al., 1999; Lynch and Force, 2000).

Genes At1g07400 and At2g29500, which also exhibited similar patterns of expression, are predicted to encode heat-shock proteins (Hsp) of the Hsp20 family that form large, highly ordered chaperone complexes (heat-stress granules, HSG) under heat stress (Waters et al., 1996). HSGs are found in all tissues and are believed to protect denatured proteins under stress (Vierling, 1991; Waters et al., 1996). It will be interesting to investigate whether the protein products of At1g07400 and At2g29500 simply provide higher copy numbers of the Hsp or, alternatively, whether they have undergone quantitative subfunctionalization and are now specialized to different parts of the cell.

Genes At2g16500 and At4g34710 encode arginine decarboxylase 1 (ADC1) and arginine decarboxylase 2 (ADC2), respectively (Soyka and Heyer, 1999; Watson et al., 1997). Both proteins are involved in the synthesis of the diamine putrescine, which protects against various stresses, including osmotic, salt, heat, chilling, and oxidative stresses (Bouchereau et al., 1999). While ADC1 has been found in all tissues tested, ADC2 is primarily expressed in siliques and cauline leaves (Soyka and Heyer, 1999).
Although it remains to be experimentally verified, we can postulate that ADC2 has retained expression only in a subset of the tissues in which the ancestral gene was expressed, while ADC1 has retained expression in a larger set of tissues, but that both still maintain similar regulation schemes in response to oxidative stress.

There are approximately 20 duplicated gene pairs whose members exhibited distinct (often opposite) expression patterns (Figure 4). Among these are the genes coding for various cytochrome P450s (At2g23220 and At4g37370, At2g23220 and At4g37410, At2g25160 and At4g31970) and electron transfer flavoproteins (At4g20830 and At5g44380, At4g20830 and At5g44410, At4g20840 and At5g44380, At4g20840 and At5g44410, At4g20860 and At5g44380, At4g20860 and At5g44410) that are implicated in diverse cellular reactions including detoxifications (Massey, 2000). Diverged expression patterns between the duplicated copies may imply a functional divergence in response to oxidative stress. Genes showing such transcriptional divergence are good candidates for further mutational and genetic studies that compare single and double mutant phenotypes to determine whether little or no phenotypic synergy remains between the copies. As is common for flavoproteins, genes are not only duplicated across
Figure 3. A region in chromosome 4 appears silenced in response to oxidative stress. (a) Two rectangles represent genes on chromosomes 4 and 1 in the order in which they appear with genes identified as responsive to oxidative stress denoted with black dots; ‘CEN’ indicates the location of the centromere. Segment pairs 11 and 32 (see Table 1) and their relative locations and orientations are shown with blue and red yellow arrows, respectively. (b) Percentage identity of the 98 significantly regulated genes in homology block 32 (group 2) relative to the corresponding unresponsive genes in the corresponding duplicated segment (group 1); the percentage identity of the genes is shown in the order the genes are present in group 2.
segments, but exist within each segment as tandem duplicates (Figure 4). The fact that these tandem duplicates have similar expression patterns within each segment, but that the segmental duplications exhibit divergent expression suggests that tandem duplication may have occurred after segmental duplication. However, we cannot rule out the possibility that the tandemly duplicated genes are simply under the same regulation.

To determine whether there is a correlation between sequence similarity and expression pattern, we limited our analysis to 57 gene pairs for which data were available at all time points (Table S2). For these, we calculated the Pearson correlation coefficient between the measured sequence percentage identity, using the annotated CDS regions as well as 1000 base pairs upstream to capture presumed promoter regions, and the Euclidean distance measured between paired gene expression vectors. The relationship is shown graphically for the CDS sequences in Figure 6. Overall similarity levels in the CDS (r = 0.16) and promoter sequences (r = 0.10) did not significantly correlate with the observed expression patterns. Thus, although sequence divergence is generally associated with functional divergence, our data indicate that the relationship is not simply linear. Rather, mutational changes must have affected regulatory motifs in order to produce a change in expression.

Our data are consistent with the paradigm that the majority of duplicated gene copies lose their function, while a smaller proportion retain their original function or acquire a new function through sequence divergence (Force et al., 1999; Lynch and Force, 2000; Mitchell-Olds and Clauss, 2002;
Otto and Whitton, 2000). While much additional work remains to dissect the functions of duplicated gene pairs, further analysis of genome-wide expression in response to a variety of conditions, coupled with analysis of sequence conservation, will provide a clearer picture of the evolution of the entire set of duplicated genes. The growing body of whole-genome expression data, from studies using a variety of techniques such as whole-genome tiling arrays (Yamata et al., 2003) and massively parallel signature sequencing (Myers et al., 2004), promises to provide the data needed to elucidate these relationships further. Indeed, such analysis is necessary because much of the existing homology-based annotation emerging from genome projects fails to distinguish between paralogous genes, leading to potentially spurious conclusions about gene function. Further, we believe that analysis of the stress-responsive duplicated genes will lead to a more complete understanding of plant stress biology. In a practical sense, such information will also be of great use in agriculture for developing disease- and stress-resistant crops.
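The sequence-versus-expression comparison described above (percentage identity against the Euclidean distance between paired expression vectors, summarized by a Pearson coefficient) can be sketched as follows. The identities and expression vectors are invented placeholders, the four-element vectors are assumed to hold log2 ratios at the 1, 3, 6, and 12 h time points, and the helper names are hypothetical; the sketch illustrates the calculation rather than reproducing the r = 0.16 result.

```python
import math
from statistics import mean, pstdev


def euclidean_distance(u, v):
    """Distance between two expression vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_r(x, y):
    """Pearson coefficient as covariance over the product of std deviations."""
    mean_x, mean_y = mean(x), mean(y)
    cov = mean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


# (percent CDS identity, expression of copy 1, expression of copy 2);
# all values are placeholders chosen for illustration.
gene_pairs = [
    (84.0, [1.0, 2.0, 1.5, 0.5], [1.2, 1.8, 1.4, 0.6]),
    (80.0, [0.5, 1.0, 2.0, 1.0], [-0.5, -1.0, -1.5, -0.5]),
    (57.0, [2.0, 2.5, 1.0, 0.0], [1.8, 2.4, 1.1, 0.2]),
    (65.0, [0.0, 1.0, 1.0, 0.5], [1.0, 0.0, -1.0, 0.5]),
]

identities = [identity for identity, _, _ in gene_pairs]
distances = [euclidean_distance(a, b) for _, a, b in gene_pairs]
r = pearson_r(identities, distances)
```

A coefficient near zero, as reported for both the CDS and promoter regions, would indicate that overall identity says little about how far the expression profiles of two copies have drifted apart.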
Experimental procedures
Plant material
Wild-type Arabidopsis thaliana Columbia plants were germinated and grown in 500 ml flasks (approximately 120 seeds per flask) with 100 ml of 0.5× Murashige and Skoog (MS) medium (pH 5.7) (Murashige and Skoog, 1962), supplemented with vitamins and sucrose without hormone, for 14 days with shaking at 100 rpm under constant light. H2O2 (Sigma, St Louis, MO, USA) was added to the liquid medium to a final concentration of 5 mM. Treated flasks
[Figure 4 image: expression profiles of the 117 duplicated gene pairs in which both members responded significantly to oxidative stress. Each row was labeled with the AGI locus identifier and annotation of both pair members, from At1g09860 / At1g57943 to At4g27070 / At5g54810.]
[Figure 5 image: log2 expression ratios from microarray and RT-PCR for genes A to H; R = 0.95.]

Genes shown in Figure 5:
Label  Gene       Description                          Identity (%)
A      At2g37040  phenylalanine ammonia lyase (PAL1)   83.70
B      At3g53260  phenylalanine ammonia-lyase
C      At1g07400  heat shock protein, putative         80.20
D      At2g29500  putative small heat shock protein
E      At2g16500  arginine decarboxylase               80.60
F      At4g34710  arginine decarboxylase SPE2
G      At4g20830  FAD-linked oxidoreductase family     56.90
H      At5g44380  FAD-linked oxidoreductase family

Figure 5. RT-PCR confirmation of microarray data. Microarray and RT-PCR data of 0–3-h oxidative stress response are compared using the eight selected genes discussed in the text. For each gene pair, percentage identity at the nucleotide level is shown.

[Figure 6 image: Euclidean distance plotted against % CDS identity; r = 0.16.]

Figure 6. Relationship between percentage identity in the coding sequence (CDS) for 57 duplicated gene pairs and Euclidean distance between their expression vectors; CDS identities and Euclidean distance are shown in blue and red, respectively.
were harvested at 0, 1, 3, 6, and 12 h following exposure, and the plants were immediately frozen in liquid nitrogen for storage until RNA purification. Three replicate flasks, each containing approximately 100 individual plants, were collected at each time point and replicates were pooled to reduce variation prior to RNA extraction.
Microarray analysis

Whole-genome Arabidopsis thaliana DNA microarrays were fabricated as described previously for chromosome 2 (Hegde et al., 2000; Kim et al., 2003). Briefly, regions approximately 1.15 kb in length and anchored at the predicted 3′ stop codon of each of 25 636 predicted gene models in the nuclear, chloroplast, and mitochondrial genomes were extracted from the complete genome sequence. PCR primers were designed to these regions, and the corresponding genomic segments were amplified, purified, and printed on Corning (Acton, MA, USA) UltraGAPS™ aminosilane-coated microscope slides using a robotic spotter built by Intelligent Automatic Systems (Cambridge, MA, USA) and cross-linked by ultraviolet illumination. Total RNA representing each time point was extracted, and mRNA was enriched from the samples as described previously. Indirect labeling reactions using enriched mRNA and hybridizations were performed using slight modifications of published protocols (Hegde et al., 2000; Kim et al., 2003) as detailed at http://atarrays.tigr.org. Labeled cDNA from each time point was cohybridized with that from the 0 time point to profile expression changes following exposure to H2O2. All hybridizations were performed using dye-reversal replication to eliminate any possible bias in labeling.

Hybridized slides were scanned using the Axon GenePix 4000B microarray scanner, and the independent TIFF images from each channel were analyzed using TIGR Spotfinder (Saeed et al., 2003; http://www.tigr.org/software/) to obtain relative expression levels. Individual array elements were subjected to rigorous quality control checks to flag and eliminate those with poor spot morphology, dust, scratches and other contaminants, and those with mean signal less than 1.5 times the local background. Raw fluorescence intensity measures and associated quality control flags were stored in the AGED relational database developed for this project. Data were normalized using intensity-dependent local regression (lowess) implemented in MIDAS (Saeed et al., 2003; http://www.tigr.org/software/). All calculated gene expression ratios were log2-transformed and averaged over dye-swap replicates at each time point. Differentially expressed genes at the 95% confidence level were determined using intensity-dependent Z-scores (with Z = 1.96) as implemented in MIDAS, and the union of all genes identified at each time point was considered significant in this experiment. The resulting data were visualized and further explored using TIGR MeV (Saeed et al., 2003; http://www.tigr.org/software/).
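The ratio processing described above can be illustrated with a minimal Python sketch. It assumes per-spot ratios that have already been lowess-normalized, and it is not the MIDAS implementation: the function names, the sliding-window size, and the sliding-window approach itself are hypothetical choices made for illustration. The sketch applies the 1.5-fold signal-to-background filter, averages log2 ratios over a dye-swap pair, and flags spots whose ratio lies more than 1.96 local standard deviations from the local mean of spots with similar intensity.

```python
import math
import statistics


def passes_qc(signal, background, min_fold=1.5):
    """Keep a spot only if its mean signal is at least min_fold x local background."""
    return signal >= min_fold * background


def dye_swap_log2(ratio_forward, ratio_reversed):
    """Average the log2 ratio over a dye-swap pair.

    ratio_forward is treated/reference; ratio_reversed comes from the slide
    with the dyes exchanged (reference/treated), so its log2 value is negated.
    """
    return (math.log2(ratio_forward) - math.log2(ratio_reversed)) / 2.0


def intensity_dependent_z(log_ratios, log_intensities, window=51):
    """Z-score each spot against the spots closest to it in intensity.

    The window size is a hypothetical parameter, not a value from the paper.
    """
    order = sorted(range(len(log_ratios)), key=lambda i: log_intensities[i])
    half = window // 2
    z = [0.0] * len(log_ratios)
    for rank, idx in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        local = [log_ratios[order[k]] for k in range(lo, hi)]
        mean = statistics.mean(local)
        sd = statistics.pstdev(local)
        z[idx] = 0.0 if sd == 0 else (log_ratios[idx] - mean) / sd
    return z


def significant(z_scores, cutoff=1.96):
    """Flag spots beyond the two-sided 95% cutoff."""
    return [abs(z) > cutoff for z in z_scores]
```

Taking the union of the flagged genes across time points, as the text describes, would then amount to combining the per-time-point flag lists with a logical OR.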
Microarray validation: RT-PCR analysis

Differential expression of the eight selected genes discussed in the text was assessed by SYBR Green real-time qRT-PCR using the ΔCT method implemented in the ABI 7900 (Applied Biosystems, Foster City, CA, USA), with primers designed from their exon sequences (Table 2). qRT-PCR reactions were performed on the same pooled samples used for microarray hybridization. Absolute transcript levels at the 0- and 3-h time points were estimated, and the log2(3/0 h) ratios were compared with the corresponding estimates derived from the microarray assays (Figure 5).
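The comparison of RT-PCR and microarray ratios can be sketched as follows. This is a minimal illustration that assumes ideal amplification efficiency (one PCR cycle corresponds to a two-fold change), so that log2(3/0 h) reduces to a difference in threshold cycles. The function names are hypothetical, and no reference-gene normalization is included because the text does not describe one.

```python
import math


def log2_ratio_from_ct(ct_0h, ct_3h):
    """log2(3 h / 0 h) assuming 100% efficiency: a lower Ct means more template."""
    return ct_0h - ct_3h


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of log2 ratios."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Under these assumptions, a transcript whose Ct drops from 25 to 23 between 0 and 3 h has a log2 ratio of 2, that is, a four-fold induction. Applying `pearson_r` to the RT-PCR and microarray log2 ratios of the eight genes would give a correlation coefficient of the kind reported in Figure 5.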
Figure 4. Heatmap representation of gene expression profiles for the 117 duplicated gene pairs for which both members were identified as being significantly regulated. The top bar indicates colors corresponding to the range of the observed expression ratios on a log2 scale. Expression patterns for the duplicated gene pairs are displayed in parallel. Pairs representing intra- and inter-chromosomal duplications are denoted in the middle with yellow and blue solid bars, respectively. Duplicated gene pairs discussed in the text are denoted in purple boxes.
© Blackwell Publishing Ltd, The Plant Journal, (2004), 41, 212–220
Table 2 Primers used in RT-PCR confirmation of the microarray data

Target gene  Forward primer           Reverse primer           Amplicon size (bp)
At2g37040    TTTGTCAAGCTGTGGATTTGAGA  AGGATGAAGCTCACCATTGACTC  119
At3g53260    CAAAGGAGCAGAGATTGCTATGG  ACGAGACGAGATCAAACCAAGAG  133
At1g07400    ACAGTGTGCTCAAGATCAGTGGA  GCCTTGACTTGATCCATCTTCAC  142
At2g29500    CCTGGATTGAAGAAGGAGGAAGT  TCCTCGTAAACTGTCCACTCGAT  142
At2g16500    CATTGTGTCTCATCACTCGGTGT  TCGTAATCACCTCGAACTTCCTC  123
At4g34710    TCAGTTTTCACTTCGATTCCTGA  GAACTTGTTGATCTTCCCGTCAC  144
At4g20830    GCTTCTGTGGTTGCTCTGTTTTT  CCACCATAAAGCAGACTGAAACC  132
At5g44380    TGGACTCCTTACGGTGGTATGAT  GCCAGTTCGCGTAATAGAGAATC  103
Data availability

Microarray expression data presented in this manuscript are available through ArrayExpress (http://www.ebi.ac.uk/arrayexpress) with accession numbers A-TIGR-4 (array design) and E-TIGR-5 (experimental data).

Acknowledgements

We thank Steven Salzberg for valuable advice on the analysis of the duplicated genome segments in Arabidopsis. We also thank Jennie Larkin, Ka-Yin Kwong, and Hong-Ying Wang for helpful discussions. This work was supported by a grant to JQ from the US National Science Foundation.

Supplementary Material

The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/TPJ/TPJ2295/TPJ2295sm.htm
Table S1 Expression data of the H2O2-responsive genes in Arabidopsis
Table S2 Fifty-seven genes selected to determine the correlation between sequence identity and expression pattern
References

Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.
Blanc, G., Barakat, A., Guyot, R., Cooke, R. and Delseny, M. (2000) Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell, 12, 1093–1101.
Blanc, G., Hokamp, K. and Wolfe, K.H. (2003) A recent polyploidy superimposed on older large-scale duplication in the Arabidopsis genome. Genome Res. 13, 137–144.
Bouchereau, A., Aziz, A., Larher, F. and Martin-Tanguy, J. (1999) Polyamines and environmental challenges: recent development. Plant Sci. 140, 103–125.
Bowers, J.E., Chapman, B., Rong, J. and Paterson, A. (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature, 422, 433–438.
Chong, J., Pierrel, M.-A., Atanassova, R., Werck-Reichhart, D., Fritig, B. and Saindrenan, P. (2001) Free and conjugated benzoic acid in tobacco plants and cell cultures. Induced accumulation upon elicitation of defense responses and role as salicylic precursors. Plant Physiol. 125, 318–328.
Copenhaver, G.P., Nickel, K., Kuromori, T., Benito, M.-I., Kaul, S., Lin, X., Bevan, M., Murphy, G., Harris, B., Parnell, L.D. et al. (1999) Genetic definition and sequence analysis of Arabidopsis centromeres. Science, 286, 2468–2474.
Ermolaeva, M.D., Wu, M., Eisen, J.A. and Salzberg, S.L. (2003) The age of the Arabidopsis thaliana genome duplication. Plant Mol. Biol. 51, 859–866.
Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L. and Postlethwait, J. (1999) Preservation of duplicate gene by complementary, degenerative mutations. Genetics, 151, 1531–1545.
Grant, J.J., Yun, B.-W. and Loake, G.J. (2000) Oxidative burst and cognate redox signaling reported by luciferase imaging: identification of a signal network that functions independently of ethylene, SA and Me-JA but is dependent of MAPKK activity. Plant J. 24, 569–582.
Gu, Z., Nicolae, D., Lu, H.H.-S. and Li, W.-H. (2002) Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet. 18, 609–613.
Hegde, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Earle-Hughes, J., Snesrud, E., Lee, N. and Quackenbush, J. (2000) A concise guide to cDNA microarray analysis. Biotechniques, 29, 548–562.
Kim, H., Snesrud, E.C., Haas, B., Fu, C., Town, C.D. and Quackenbush, J. (2003) Gene expression analyses of Arabidopsis chromosome 2 using a genomic DNA amplicon microarray. Genome Res. 13, 327–340.
Kovtun, Y., Chiu, W.-L., Tena, G. and Sheen, J. (2000) Functional analysis of oxidative stress-activated mitogen-activated protein kinase cascade in plants. Proc. Natl Acad. Sci. USA, 97, 2940–2945.
Lamb, C. and Dixon, R.A. (1997) The oxidative burst in plant disease resistance. Annu. Rev. Plant Physiol. Plant Mol. Biol. 48, 251–275.
Levine, A., Tenhaken, R., Dixon, R. and Lamb, C. (1994) H2O2 from the oxidative burst orchestrates the plant hypersensitive disease resistance response. Cell, 79, 583–593.
Lynch, M. and Conery, J. (2000) The evolutionary fate and consequences of duplicate genes. Science, 290, 1151–1155.
Lynch, M. and Force, A. (2000) The probability of duplicate gene preservation by subfunctionalization. Genetics, 154, 459–473.
Makova, K. and Li, W.-H. (2003) Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res. 13, 1638–1645.
Massey, V. (2000) The chemical and biological versatility of riboflavin. Biochem. Soc. Trans. 28, 283–296.
McCombie, W.R., de la Bastide, M., Habermann, K., Parnell, L.D., Dedhia, N., Gnoj, L., Schutz, K., Huang, E., Spiegel, L., Yordan, C. et al. (2000) The complete sequence of a heterochromatic island from a higher eukaryote. Cell, 100, 377–386.
McDowell, J.M. and Dangl, J.L. (2000) Signal transduction in the plant immune response. Trends Biochem. Sci. 25, 79–82.
Miki, R., Kadota, K., Bono, H. et al. (2001) Delineating developmental and metabolic pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length enriched mouse cDNA arrays. Proc. Natl Acad. Sci. USA, 98, 2199–2204.
Mitchell-Olds, T. and Clauss, M.J. (2002) Plant evolutionary genomics. Curr. Opin. Plant Biol. 5, 74–79.
Murashige, T. and Skoog, F. (1962) A revised medium for rapid growth and bioassays with tobacco tissue culture. Physiol. Plant 15, 473–479.
Myers, B.C., Vu, T.H., Tej, S.S., Ghazal, H., Matvienko, M., Agrawal, V., Ning, J. and Haudenschild, C.D. (2004) Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing. Nat. Biotechnol. 22, 1006–1011.
Otto, S.P. and Whitton, J. (2000) Polyploid incidence and evolution. Annu. Rev. Genet. 34, 401–437.
Reuber, T.L. and Ausubel, F.M. (1996) Isolation of Arabidopsis genes that differentiate between resistance responses mediated by the RPS2 and RPM1 disease resistance genes. Plant Cell, 8, 241–249.
Saeed, A.I., Sharov, V., White, J. et al. (2003) TM4: A free, open source system for microarray data management and analysis. Biotechniques, 34, 374–378.
Soyka, S. and Heyer, A.G. (1999) Arabidopsis knockout mutation of ADC2 gene reveals inducibility by osmotic stress. FEBS Lett. 458, 219–223.
Tornero, P., Mayda, E., Gomez, M.D., Canas, L., Conejero, V. and Vera, P. (1996) Characterisation of LRP, a leucine-rich repeat (LRR) protein from tomato plants that is processed during pathogenesis. Plant J. 10, 315–330.
Vierling, E. (1991) The role of heat shock proteins in plants. Annu. Rev. Plant Physiol. Plant Mol. Biol. 42, 579–620.
Wagner, A. (2000) Decoupled evolution of coding region and mRNA expression patterns after gene duplication: implication for the neutralist-selectionist debate. Proc. Natl Acad. Sci. USA, 97, 6579–6584.
Wanner, L.A., Li, G., Ware, D., Somssich, I.E. and Davis, K.R. (1995) The phenylalanine ammonia-lyase gene family in Arabidopsis thaliana. Plant Mol. Biol. 27, 327–338.
Waters, E.R., Lee, G.J. and Vierling, E. (1996) Evolution, structure, and function of the small heat shock proteins in plants. J. Exp. Bot. 47, 325–328.
Watson, M.B., Yu, W., Galloway, G. and Malmberg, R.L. (1997) Isolation and characterization of a second arginine decarboxylase cDNA from Arabidopsis (accession no. AF009647). Plant Physiol. 114, 1569.
Yamata, K., Lim, J., Dale, J.M. et al. (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science, 302, 842–846.