The Plant Journal (2011) 67, 130–144

doi: 10.1111/j.1365-313X.2011.04581.x

Vascular expression in Arabidopsis is predicted by the frequency of CT/GA-rich repeats in gene promoters

Roberto Ruiz-Medrano1, Beatriz Xoconostle-Cázares1, Byung-Kook Ham2, Gang Li2 and William J. Lucas2,*

1Department of Biotechnology and Bioengineering, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Avenida IPN 2508, Zacatenco, 07360 Mexico DF, Mexico, and 2Department of Plant Biology, College of Biological Sciences, University of California, Davis, CA 95616, USA

Received 6 January 2011; revised 23 February 2011; accepted 16 March 2011; published online 9 May 2011. * For correspondence (fax +1 530 752 5410; e-mail [email protected]).

SUMMARY

Phloem-transported signals play an important role in regulating plant development and in orchestrating responses to environmental stimuli. Among such signals, phloem-mobile RNAs have been shown to act as long-distance signaling agents. At maturity, angiosperm sieve elements are enucleate; thus, transcripts in the phloem translocation stream probably originate in the nucleate companion cells. In the present study, a pumpkin (Cucurbita maxima) phloem transcriptome was used to test whether the promoters of this unique set of genes share common motifs that may function to coordinate expression in cells of the vascular system. A bioinformatics analysis of the upstream sequences of 150 Arabidopsis genes homologous to members of the pumpkin phloem transcriptome identified degenerate sequences containing CT/GA- and GT/CA-rich motifs that were common to many of these promoters. Parallel studies performed on genes previously shown to be expressed in phloem tissues identified similar motifs. An expanded analysis, based on cucumber (Cucumis sativus) homologs of the pumpkin phloem transcriptome, identified similar sets of common motifs within the promoters of these genes. Promoter analysis supported the hypothesis that these motifs regulate expression within the vascular system. Our findings are discussed in terms of a role for these motifs in coordinating gene expression within the companion cell/sieve element system. These motifs could provide a useful bioinformatics tool for genome-wide screens of plants from which phloem tissues cannot readily be obtained.

Keywords: companion cells, long-distance signals, phloem, promoter analysis, sieve elements, vascular system.

© 2011 The Authors
The Plant Journal © 2011 Blackwell Publishing Ltd

INTRODUCTION

During the transition to a terrestrial habitat, plants evolved developmental programs essential for formation of a vascular system, which allowed expansion of form and complexity. This vascular system comprises xylem and phloem tissues: the xylem conducts water and mineral nutrients from the roots to aerial regions of the plant, while the phloem delivers fixed carbon and other nutrients from mature, photosynthetic source leaves to heterotrophic tissues and organs. In angiosperms, the phloem translocation stream moves by bulk flow, driven by a pressure gradient, through a specialized sieve tube system. This system comprises two cell types, companion cells (CCs) and sieve elements (SEs), which are interconnected by plasmodesmata. In the physiologically competent state, CCs are metabolically active and provide support for the enucleate SEs, which have become highly reduced in cellular complexity (van Bel et al., 2002). The simplified form of individual SEs, together with the development of specialized sieve plate pores between neighboring SEs, provides a low-resistance pathway for fluid flow (Ruiz-Medrano et al., 2004; Lough and Lucas, 2006).

The developmental programs that arose to control vascular differentiation are currently under intense investigation. Important insights have been obtained regarding the signaling molecules and transcription factors involved in cambial, xylem and phloem specification (see, for example, Bonke et al., 2003; Hawker and Bowman, 2004; Motose et al., 2004; Scarpella et al., 2004; Zhou et al., 2004; Carlsbecker and Helariutta, 2005; Ito et al., 2006; Sieburth and Deyholos, 2006; Demura and Fukuda, 2007; Yokoyama et al., 2007; Etchells and Turner, 2010). However, information on the mechanisms that coordinate patterns of gene expression in the developing vascular system, and within the specific tissues of the cambium, phloem and xylem, is still limited. A similar situation exists with respect to the genetic regulatory networks that control phloem function within mature tissues, particularly operation of the CC–SE complex. As with phloem development, sequences that drive expression of genes involved in specific phloem functions are being characterized (Schneider et al., 2006; Schneidereit et al., 2008; Ma et al., 2009). Of note, sequences in the promoter of galactinol synthase that drive expression in the minor veins of cucurbits have been identified (Ayre et al., 2003) and mapped to homologous genes in other species, including Arabidopsis. However, although similar sequences are present in other vascular-expressed genes (Keller and Baumgartner, 1991; Medberry et al., 1992; Kosugi et al., 1995), in many cases their significance and/or functionality remain to be elucidated.

Recent studies have revealed that the functional sieve tube system of angiosperms contains a subset of mRNA molecules and a diverse population of proteins (Giavalisco et al., 2006; Roney et al., 2007; Gaupels et al., 2008; Hannapel, 2010). As SEs are enucleate at maturity, the 1000 or more mRNA species present in the cucurbit phloem translocation stream (Ruiz-Medrano et al., 1999, 2007; Xoconostle-Cázares et al., 1999; Haywood et al., 2005; Lough and Lucas, 2006; Ham et al., 2009) probably originate in the neighboring nucleate CCs (Ruiz-Medrano et al., 2001, 2004; van Bel, 2003; Lough and Lucas, 2006; Huang and Yu, 2009; Turgeon and Wolf, 2009). A similar situation could exist for the more than 1000 proteins detected in the phloem sap of cucurbits (Walz et al., 2004; Lin et al., 2007, 2009).
If so, the phloem transcriptome and/or proteome of the cucurbits could serve as a valuable resource to test for common motifs within the promoters of these genes that may function to coordinate their expression in CCs. In the present study, we analyzed Arabidopsis homologs of genes in the pumpkin phloem transcriptome database. For this purpose, we first searched the upstream sequences of 150 homologous Arabidopsis genes for motifs common to their promoters. We identified degenerate sequences containing CT/GA-rich and, to a lesser extent, GT/CA-rich motifs that were common to many of these promoters. Parallel studies performed on genes previously shown to be expressed in phloem tissues identified similar motifs. An expanded analysis, based on homologs of the pumpkin phloem transcriptome from cucumber (C. sativus), rice (Oryza sativa) and poplar (Populus trichocarpa), provided additional support for the hypothesis that a similar set of common motifs is located within the promoters of these genes. Experimental support for this hypothesis was obtained using representative upstream sequences, as well as minimal promoter cassettes, to drive a uidA–GFP reporter gene system. Our findings are discussed in terms of a role for these motifs in coordinating gene expression within the CC–SE system. These motifs should provide a useful bioinformatics tool for genome-wide screens of plants from which phloem tissues cannot readily be obtained.

RESULTS

Gene promoters of SE-derived transcripts share common degenerate motifs

Transcripts detected in pumpkin phloem sap are probably derived from transcription occurring in neighboring CCs. To obtain insight into the production of this unique set of phloem transcripts, we first used a previously established EST database, generated from poly(A)+ mRNA extracted from pumpkin phloem sap, that comprises more than 1200 transcripts (Lough and Lucas, 2006). Because the pumpkin genome has yet to be sequenced, and pumpkin is recalcitrant to routine transformation, a bioinformatics analysis was performed to identify the Arabidopsis homologs of these genes. For this purpose, from among the most abundant pumpkin transcripts (based on the number of ESTs), we selected genes annotated as being involved in various forms of signaling. From these, we identified 150 Arabidopsis genes encoding putative phloem transcription factors, protein kinases, protein phosphatases, cell-cycle regulators and hormone response factors (Table 1). To identify potentially conserved sequences among the promoters of these genes, we used both enumerative methods, based on the total count of given motifs in a dataset (PROMOMER and YMF), and probabilistic methods, based on a position weight matrix [Gibbs motif sampler, ALIGNACE (based on the Gibbs sampler algorithm) and MEME].
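The probabilistic programs named above represent a motif as a position weight matrix (PWM) rather than as a fixed string. As a rough illustration only, the Python sketch below builds a log-odds PWM from a few aligned sites and scans a sequence for its best-scoring window. The function names, the 0.5 pseudocount, the uniform 0.25 background and the example sites are assumptions made for illustration, not parameters of ALIGNACE, MEME or the Gibbs motif sampler.

```python
import math

BASES = "ACGT"


def build_pwm(sites, background=None, pseudocount=0.5):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # uniform background (assumption)
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        column = [s[i].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        # smoothed frequency of each base at position i, as log2 ratio to background
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum of the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (start, score) of the best-scoring window, or None if no window is valid."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = [
        (start, score_window(pwm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
        if set(sequence[start:start + width]) <= set(BASES)  # skip windows containing N, etc.
    ]
    return max(hits, key=lambda hit: hit[1]) if hits else None


sites = ["CTCTCT", "CTCTCT", "CTTTCT"]  # hypothetical aligned CT-rich sites
pwm = build_pwm(sites)
print(best_hit(pwm, "AAAACTCTCTAAAA"))
```

A common way to keep composition-biased (for example A/T-rich) windows from scoring highly simply because they are frequent is to pass a `background` dictionary estimated from the promoter set itself instead of the uniform default.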
Table 1 Gene IDs for pumpkin phloem sap transcripts and their closest Arabidopsis gene homologs used for promoter motif analysis

Gene ID | Arabidopsis homolog | Function
Cucurbita_000445 | At3g46290 | Receptor protein kinase-like/receptor-like
Cucurbita_006206 | At3g47570 | Leucine-rich repeat transmembrane protein kinase
Cucurbita_011090 | At4g05420 | UV-damaged DNA binding factor-like protein/XPE
Cucurbita_007894 | At5g66080 | Protein phosphatase 2C-like protein
Cucurbita_000239 | At5g65210 | bZIP transcription factor, TGA1
Cucurbita_009893 | At2g17290 | Putative calmodulin-domain protein kinase CPK6
Cucurbita_008102 | At3g15220 | Putative MAP kinase/similar to BnMAP4Kα2
Cucurbita_008186 | At4g26930 | Myb family transcription factor (MYB97)
Cucurbita_008169 | At1g49620 | Kip-related protein 7
Cucurbita_001629 | At5g47840 | AMK2
Cucurbita_003901 | At1g63700 | Putative protein kinase/similar to MAP3Kα1
Cucurbita_000087 | At4g26690 | Glycerophosphodiester phosphodiesterase/kinase
Cucurbita_008005 | At5g03300 | ADK2
Cucurbita_010531 | At1g80070 | Splicing factor Prp8, putative
Cucurbita_009689 | At1g53300 | Tetratricopeptide repeat domain thioredoxin
Cucurbita_009998 | At1g19210 | AP2 domain transcription factor, putative
Cucurbita_004036 | At3g24240 | Leucine-rich repeat transmembrane protein kinase
Cucurbita_010871 | At2g16750 | Putative protein kinase
Cucurbita_004647 | At5g03790 | Homeodomain protein
Cucurbita_007842 | At5g54380 | Receptor-protein kinase-like protein
Cucurbita_010144 | At5g67380 | Casein kinase II α subunit
Cucurbita_007907 | At3g25840 | Protein kinase, putative
Cucurbita_004074 | At1g51800 | Receptor protein kinase, putative
Cucurbita_006340 | At1g43700 | VirE2-interacting protein VIP1
Cucurbita_011143 | At1g57700 | CRK1 protein, putative
Cucurbita_010330 | At1g34260 | Phosphatidylinositol-4-phosphate 5-kinase family protein
Cucurbita_011219 | At1g19220 | Auxin response factor, putative
Cucurbita_010929 | At4g23900 | Nucleoside diphosphate kinase 4 (NDK4)
Cucurbita_011249 | At4g17880 | bHLH protein
Cucurbita_009895 | At3g14205 | Phosphoinositide phosphatase family protein
Cucurbita_010872 | At1g16330 | Cyclin B3
Cucurbita_011155 | At3g17730 | Hypothetical protein/similar to GRAB1 protein
Cucurbita_011181 | At1g77460 | Armadillo/β-catenin repeat family protein
Cucurbita_001979 | At1g18160 | MAP kinase, putative/similar to MAP3Kδ1
Cucurbita_010134 | At1g11950 | Putative DNA-binding protein
Cucurbita_003734 | At4g14550 | IAA7-like protein
Cucurbita_002400 | At1g80070 | Splicing factor Prp8, putative
Cucurbita_031271 | At1g79580 | NAM (no apical meristem)-like protein OsNAC4, putative
Cucurbita_010471 | At5g07370 | Phosphatidylinositol kinase (IPK2a)
Cucurbita_009855 | At4g11800 | Protein serine/threonine phosphatase
Cucurbita_010468 | At2g40270 | Protein kinase family protein
Cucurbita_007857 | At3g24550 | Proline extensin-like receptor kinase 1
Cucurbita_010505 | At3g07610 | IBM1 protein
Cucurbita_008667 | At3g03770 | Leucine-rich repeat transmembrane protein kinase
Cucurbita_009955 | At1g77450 | GRAB1-like protein
Cucurbita_009029 | At1g61370 | Receptor protein kinase (IRK1), putative
Cucurbita_001979 | At1g18160 | MAP kinase, putative/similar to MAP3Kδ1
Cucurbita_011373 | At4g00460 | Rho guanyl-nucleotide exchange factor
Cucurbita_010764 | At3g15730 | Phospholipase D, putative
Cucurbita_010798 | At3g14980 | PHD finger protein, putative
Cucurbita_000123 | At3g07650 | CONSTANS B-box zinc finger family protein
Cucurbita_007834 | At3g04830 | Auxin-regulated protein
Cucurbita_010139 | At1g79640 | Kinase, putative/similar to Ste-20 related kinase
Cucurbita_007967 | At1g62310 | Transcription factor, JUMONJI (jmjC) domain-containing
Cucurbita_004499 | At1g58100 | Auxin-induced basic helix-loop-helix transcription
Cucurbita_009773 | At1g52150 | HD-Zip transcription factor
Cucurbita_008262 | At1g51070 | bHLH protein/similar to bHLH transcription factor
Cucurbita_010344 | At1g30330 | Auxin response transcription factor (ARF6)
Cucurbita_010176 | At1g20696 | HMGB1
Cucurbita_007868 | At1g02230 | NAC domain-containing protein
Cucurbita_007884 | At5g65530 | RBK1
Cucurbita_010282 | At4g38520 | Putative protein phosphatase 2C
Cucurbita_011532 | At4g29230 | NAC domain-containing protein
Cucurbita_006504 | At4g18020 | Pseudo-response regulator 2 (APRR2)
Cucurbita_008052 | At3g20770 | Ethylene-insensitive 3 (EIN3)
Cucurbita_000582 | At3g15030 | TCP3-like protein
Cucurbita_006544 | At3g05050 | Cyclin-dependent protein kinase
Cucurbita_008235 | At3g04730 | Auxin-induced transcription factor
Cucurbita_007941 | At3g03300 | DEAD/DEAH box helicase, carpel factory-related
Cucurbita_010586 | At1g80840 | WRKY family transcription factor
Cucurbita_008066 | At1g66340 | Ethylene-response protein, ETR1
Cucurbita_011480 | At1g61550 | Receptor kinase, putative
Cucurbita_000153 | At1g27730 | Salt-tolerance zinc finger protein
Cucurbita_009862 | At1g20080 | C2 domain-containing protein
Cucurbita_005788 | At1g17720 | Type 2A protein serine/threonine phosphatase
Cucurbita_011082 | At1g13800 | Pentatricopeptide (PPR) repeat-containing protein
Cucurbita_011051 | At5g62310 | IRE (root hair elongation)
Cucurbita_001705 | At5g07100 | WRKY family transcription factor
Cucurbita_010957 | At5g03730 | CTR1 serine/threonine protein kinase
Cucurbita_003380 | At4g36010 | Thaumatin-like protein
Cucurbita_010773 | At4g28540 | Protein kinase ADK1-like protein
Cucurbita_010209 | At4g27410 | Desiccation-induced NAC transcription factor
Cucurbita_000634 | At4g11460 | Serine/threonine kinase-like protein/receptor-like
Cucurbita_008117 | At3g51550 | Receptor-protein kinase-like protein
Cucurbita_011203 | At3g43220 | Phosphoinositide phosphatase family protein
Cucurbita_007812 | At3g22790 | Kinase-interacting family protein
Cucurbita_010585 | At3g19510 | Putative homeobox protein, HAT3.1
Cucurbita_008533 | At3g14270 | Phosphatidylinositol-4-phosphate 5-kinase family
Cucurbita_011202 | At3g02130 | Leucine-rich repeat transmembrane protein kinase
Cucurbita_010568 | At2g13370 | Putative chromodomain-helicase-DNA-binding protein
Cucurbita_010507 | At2g12900 | bZIP transcription factor family protein
Cucurbita_007385 | At1g79620 | Leucine-rich repeat transmembrane protein kinase
Cucurbita_007847 | At1g76630 | Tetratricopeptide repeat (TPR)-containing protein
Cucurbita_003663 | At1g56090 | Tetratricopeptide repeat (TPR)-containing protein
Cucurbita_011366 | At1g51220 | Zinc finger (C2H2 type) protein (WIP5)
Cucurbita_008924 | At1g20640 | RWP-RK domain-containing protein
Cucurbita_011175 | At1g01060 | DNA-binding protein, putative
Cucurbita_004985 | At5g67190 | AP2 domain-containing transcription factor, putative
Cucurbita_010521 | At5g66210 | Calcium-dependent protein kinase
Cucurbita_010421 | At5g65710 | Leucine-rich repeat transmembrane protein kinase
Cucurbita_002537 | At5g63940 | Putative protein kinase
Cucurbita_011303 | At5g62940 | Dof zinc finger protein
Cucurbita_000261 | At5g61430 | NAM-like protein
Cucurbita_000260 | At5g61420 | Myb-related transcription factor
Cucurbita_010855 | At5g54680 | bHLH protein
Cucurbita_010187 | At5g46510 | Disease resistance protein (TIR-NBS-LRR class), putative
Cucurbita_010656 | At5g39810 | MADS box protein
Cucurbita_007753 | At5g12430 | DnaK heat shock N-terminal domain-containing protein
Cucurbita_011069 | At5g11100 | CLB1-like protein
Cucurbita_009810 | At5g10200 | Putative protein/tetratricopeptide repeat protein
Cucurbita_006560 | At5g05140 | Transcription elongation factor-related protein
Cucurbita_006458 | At5g03140 | Receptor-like protein kinase
Cucurbita_002475 | At4g37250 | Leucine-rich repeat transmembrane protein kinase
Cucurbita_006111 | At4g34000 | Abscisic acid-responsive element binding factor (ABF3)
Cucurbita_011182 | At4g33920 | Putative protein/protein phosphatase Wip1
Cucurbita_010159 | At4g31800 | WRKY family transcription factor
Cucurbita_009611 | At4g31770 | RNA lariat debranching enzyme-like protein
Cucurbita_011350 | At4g30480 | Tetratricopeptide repeat protein
Cucurbita_010918 | At4g29090 | Reverse transcriptase, putative
Cucurbita_010649 | At4g28980 | Cyclin-dependent kinase F
Cucurbita_010963 | At4g28600 | Calmodulin-binding protein
Cucurbita_011513 | At4g16360 | Kinase-like protein
Cucurbita_011508 | At4g12020 | Putative disease resistance protein/DNA-binding
Cucurbita_011388 | At3g53110 | DEAD/DEAH box RNA helicase protein, putative
Cucurbita_011107 | At3g45790 | Putative protein/MAP3Kα1 protein kinase
Cucurbita_011033 | At3g23890 | Topoisomerase II
Cucurbita_010145 | At3g19290 | Abscisic acid-responsive element binding factor
Cucurbita_010373 | At3g15540 | Early auxin-induced protein, IAA19
Cucurbita_008401 | At3g15260 | Protein phosphatase 2C (PP2C)
Cucurbita_008101 | At3g15210 | Ethylene-responsive element binding factor 4 (AtERF4)
Cucurbita_010689 | At3g14350 | Leucine-rich repeat transmembrane protein kinase
Cucurbita_010491 | At3g13840 | Scarecrow transcription factor family protein
Cucurbita_010663 | At3g11540 | Spindly (gibberellin signal transduction protein)
Cucurbita_002882 | At3g10070 | TAF12/TAFII58
Cucurbita_000307 | At3g09780 | Putative protein kinase/similar to Pto kinase
Cucurbita_011169 | At3g09400 | Protein serine/threonine phosphatase
Cucurbita_011269 | At3g09010 | Putative receptor serine/threonine protein kinase
Cucurbita_000076 | At3g06220 | Transcription factor B3 family protein
Cucurbita_010934 | At3g06030 | NPK1-related protein kinase 3
Cucurbita_007977 | At3g05330 | Peptidyl-prolyl cis-trans isomerase cyclophilin-type family protein
Cucurbita_000759 | At3g04450 | Myb family transcription factor, putative
Cucurbita_010435 | At3g02750 | Protein phosphatase 2C (PP2C)
Cucurbita_011229 | At2g47430 | Cytokinin-responsive histidine kinase (CKI1)
Cucurbita_001889 | At2g41900 | Putative CCCH-type zinc finger protein
Cucurbita_010181 | At2g17290 | Putative calmodulin-domain protein kinase CPK6
Cucurbita_000219 | At2g01460 | Phosphoribulokinase/uridine kinase family protein

As background sequences (to serve as controls), we incorporated into our analysis sets of pollen-, guard cell-, ribosomal protein- and cyclin-specific genes, because it seemed reasonable to assume that their signature motifs would differ from those of vascular-expressed genes. We assumed that genes expressed at the highest levels in a given tissue or organ would display more common motifs, or a more conserved motif, than those expressed at lower levels. Thus, for the pollen set (Table S1; based on data reported by Honys and Twell, 2004) and the guard cell set (Table S2; based on data reported by Leonhardt et al., 2004), genes were selected according to their expression levels. Genes for ribosomal proteins were selected at random (Table S3; based on data reported by Barakat et al., 2001). A search was performed to identify the entire complement of cyclin genes within the Arabidopsis genome (Table S4). The ALIGNACE program (Hughes et al., 2000) was used to analyze the sieve element transcript promoter (SETP) and control promoter sets for the presence of common degenerate motifs. The resulting alignments were visualized using the WEBLOGO program (http://weblogo.berkeley.edu). The most common sequences in all gene promoter sets were A/T-rich motifs (Figure S1; Hampson et al., 2002); however, such motifs are well known to be abundant throughout eukaryotic genomes. Additionally, probabilistic methods tend to report homopolymeric sequences as over-represented; because A/T-rich motifs are abundant in eukaryotic genomes, false positives could result. Thus, more robust results are obtained when probabilistic methods are combined with enumerative methods. Importantly, the second most abundant motif, a CT/GA-rich repeat, was present in the SETP set. Degenerate motifs containing CT/GA repeats were also present in the control promoter sets, but their lower maximum a posteriori probability (MAP) scores, which reflect the relative frequency of the motif, indicated that they were far less abundant in these promoters. The A/T-rich sequences are probably non-specific, whereas other motifs appear to be unique to each promoter set. This was the case for the TCP motif in the ribosomal protein gene set, which was also predicted by the Gibbs motif sampler. In general, the SETP CT/GA-rich signature motif was under-represented in the control promoter sets. Analysis of guard cell genes offered additional support for the hypothesis that this motif is enriched mainly in vascular-specific promoters.
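Comparing how often a degenerate motif occurs in the SETP set and in a control set can be illustrated with a simple count-based sketch. The code below is a hypothetical helper, not part of any program used in this study: it converts an IUPAC-coded motif into a regular expression, counts overlapping matches on both strands (a CT repeat on one strand is a GA repeat on the other, which is why such motifs are reported as CT/GA), and normalizes the pooled total to hits per kilobase. The motif and toy sequences are assumptions for illustration.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# complement of every IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W and N unchanged)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    # zero-width lookahead so that overlapping hits inside repeats are all counted
    return re.compile("(?=" + "".join(IUPAC[c] for c in motif.upper()) + ")")


def count_both_strands(motif, seq):
    """Count overlapping matches of the motif and of its reverse complement."""
    seq = seq.upper()
    motif = motif.upper()
    hits = len(motif_regex(motif).findall(seq))
    rc = reverse_complement(motif)
    if rc != motif:  # a reverse-complement palindrome would otherwise be counted twice
        hits += len(motif_regex(rc).findall(seq))
    return hits


def hits_per_kb(motif, promoters):
    """Motif occurrences per 1000 bp, pooled over a set of promoter sequences."""
    total_hits = sum(count_both_strands(motif, p) for p in promoters)
    total_bp = sum(len(p) for p in promoters)
    return 1000.0 * total_hits / total_bp


target = ["CTCTCTCTAAAA", "GAGAGAGATTTT"]       # hypothetical "SETP-like" sequences
background = ["ACGTACGTACGT", "AATTAATTAATT"]   # hypothetical control sequences
print(hits_per_kb("CTCT", target), hits_per_kb("CTCT", background))
```

A raw frequency ratio like this says nothing about significance on its own; a statistical comparison against a background model, such as those performed by PROMOMER or MEME, is still required.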
Similar CT/GA-rich motifs were also detected within the promoters of Arabidopsis homologs of (i) tomato (Solanum lycopersicum) transcripts previously shown to be transported through a graft union into dodder scions (Roney et al., 2007), and (ii) pumpkin phloem transcripts bound by CmRBP50 (Ham et al., 2009); however, these samples (12 sequences for the tomato phloem-mobile transcripts and 10 for the pumpkin CmRBP50-bound phloem RNAs) were too small to allow a more thorough analysis. Additional analysis of reported vascular-specific promoters (Table S5) using the ALIGNACE program identified similar A/T- and CT/GA-rich motifs (Figure S2).

To further test the hypothesis that the SETP set is enriched in CT/GA-rich motifs, we used the Multiple EM for Motif Elicitation (MEME) program (Bailey et al., 2006). MEME predicts repeated sub-sequences (motifs) within a set of longer sequences; however, as with ALIGNACE, it limits the analysis to 46 sequences, each 1000 bp in length. Figure 1 presents the expectation (E) values obtained for the SETP and control (background sequence) promoter sets. The control sequences, which drive expression of genes in other cell types, are expected to have different signature motifs. For each set, Figure 1 shows the two motifs with the lowest E values. The CT/GA-rich motif is present in both the phloem and the background (control) promoter sets. However, this motif was enriched by several orders of magnitude in the SETP set relative to the control promoter sets: E values in the SETP, pollen and guard cell sets were 3.6 × 10⁻¹⁹, 7.0 × 10⁻¹⁰ and 1.3 × 10⁻⁶, respectively. Low E values for a related CT/GA motif were also found for ribosomal protein and cyclin gene promoters (3.5 × 10⁻⁹ and 3.0 × 10⁻⁹, respectively).

[Figure 1 panels: sequence logos of Motif 1 and Motif 2 for each promoter set. Phloem transcriptome, E = 3.6 × 10⁻¹⁹ and 1.2 × 10⁻¹¹; Pollen, E = 8.1 × 10⁻⁷ and 8.0 × 10⁻⁶; Guard Cell, E = 1.3 × 10⁻⁶ and 6.1 × 10⁻⁵; Ribosomal Proteins, E = 2.9 × 10⁻²⁰ and 3.5 × 10⁻⁹; Cyclins, E = 3.0 × 10⁻⁹ and 1.1 × 10⁻⁴.]

Figure 1. MEME analysis of Arabidopsis homologs for the pumpkin sieve element transcript promoters (phloem transcriptome) and background (control) gene promoter sets. Over-represented motifs in each promoter set were identified by MEME analysis. The two motifs with the lowest expectation (E) values are shown.

To assess the enrichment of these motifs in the SETP set more stringently, the sequences were filtered and then shuffled using the Sequence Manipulation Suite (http://www.bioinformatics.org/sms2/about.html). The randomized sequences were then separated manually into 1 kb blocks for MEME analysis, which indicated that there were no enriched motifs in this dataset. Indeed, the only sequence appearing more than once was a GC-rich motif (detected twice) with an E value of 105, which is 15–20 orders of magnitude higher than those of the enriched CT/GA motifs identified in the SETP set.

An inherent limitation of both the ALIGNACE and MEME methods was the amount of sequence that could be analyzed simultaneously. To overcome this problem, we used the PROMOMER program (http://bar.utoronto.ca/ntools/cgi-bin/BAR_Promomer.cgi; Toufighi et al., 2005), as it can compare the frequency of a motif within a specific promoter set using a larger sample of Arabidopsis upstream regions. Here, we used 400 gene promoter sequences from an expanded gene promoter set (Table S6), except for the ribosomal protein and cyclin gene promoter sets, for which there are only 253 and 39 genes in the Arabidopsis genome, respectively. This approach revealed that the CT/GA-rich motif was over-represented only in the SETP set (Table S7); only A/T-rich motifs were over-represented in the control sets. Another motif over-represented exclusively in the SETP set was a CA/GT-rich repeat, which scored even higher than the CT/GA-rich motif. However, this motif was not identified using the ALIGNACE or MEME methods, so we are uncertain of its significance. A phloem/cambium transcriptome dataset, established from the root–hypocotyl boundary of 8-week-old Arabidopsis plants (Zhao et al., 2005), was next analyzed using the PROMOMER program. Interestingly, this analysis revealed that the CT/GA-rich motif was over-represented in the promoters of the phloem/cambium and xylem/cambium gene sets, but not in the phloem cambium/non-vascular gene set (Table S8).
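The shuffling control described above can be sketched as follows. The sketch assumes a simple mononucleotide shuffle; the exact randomization performed by the Sequence Manipulation Suite may differ. Discarding an incomplete final block and the function names are also assumptions. Each promoter keeps its base composition while its base order, and therefore any CT/GA repeat structure, is destroyed. The pooled result is then cut into 1 kb blocks suitable for MEME input.

```python
import random
from collections import Counter


def shuffle_sequence(seq, rng):
    """Mononucleotide shuffle: keeps base composition, destroys base order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def shuffled_blocks(promoters, block_size=1000, seed=0):
    """Shuffle each promoter, pool the results and cut them into fixed-size blocks."""
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    pooled = "".join(shuffle_sequence(p, rng) for p in promoters)
    # keep only complete blocks so that every MEME input has the same length
    return [
        pooled[start:start + block_size]
        for start in range(0, len(pooled) - block_size + 1, block_size)
    ]


promoters = ["ACGT" * 300, "CT" * 400]  # hypothetical 1200-bp and 800-bp promoters
blocks = shuffled_blocks(promoters)
print(len(blocks), [len(b) for b in blocks])                   # 2 [1000, 1000]
print(Counter("".join(blocks)) == Counter("".join(promoters)))  # True
```

A mononucleotide shuffle also destroys dinucleotide frequencies. A stricter control would use a dinucleotide-preserving shuffle, such as the Altschul–Erickson method, so that any residual enrichment cannot be attributed to dinucleotide composition alone.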
Alternative methods reveal similar motifs within the SETP set

Sequence motifs within the SETP and control gene sets were searched for in the 1 kb upstream region of each gene using the Gibbs motif sampler (Lawrence et al., 1993). Each gene promoter set harbored specific motifs that were either more abundant in it than in the other promoter sets or absent from them. This analysis revealed the presence of the well-characterized TCP motif (Welchen and González, 2006) in ribosomal protein gene promoters, in which it had not previously been reported. As shown in Figure S3, each of the four control promoter sets harbors specific over-represented motifs, as reflected by their low log MAP values. In the SETP set, a CT/GA motif was the most abundant; similar motifs were found in the other promoter sets but at lower frequencies. Two additional enumerative approaches, based on the Weeder algorithm (Pavesi et al., 2007) and the YMF method (Sinha and Tompa, 2002), were also used in this study. The Weeder algorithm predicted A/T-rich motifs with a P value