Mol Genet Genomics (2002) 267: 459–471 DOI 10.1007/s00438-002-0675-4

ORIGINAL PAPER

A. El Amrani · L. Marie · A. Aïnouche · J. Nicolas · I. Couée

Genome-wide distribution and potential regulatory functions of AtATE, a novel family of miniature inverted-repeat transposable elements in Arabidopsis thaliana

Received: 25 November 2001 / Accepted: 5 April 2002 / Published online: 15 May 2002 © Springer-Verlag 2002

Abstract A study of transgenic promoter::β-glucuronidase lines showed that the promoters of the two Arabidopsis ARGININE DECARBOXYLASE paralogues, ADC1 and ADC2, exhibited extremely different patterns of activity. One major feature of the promoter of ADC1 was the presence of a novel transposable element, which was shown to possess all of the characteristics of Miniature Inverted-repeat Transposable Elements (MITEs), and to be present in 26 full-length copies and 1617 partial copies and fragments distributed throughout the Arabidopsis genome. TRANSFAC analysis showed that this transposable element possesses a significant number of transcription-factor binding motifs. A bioinformatics approach based on a suffix-tree compilation was used to obtain an exhaustive description of exact copy numbers and positions of the element in the Arabidopsis genome. The distribution among the chromosomes was nonrandom, and a significant number of copies were found in regions flanking genes. Full-length copies of the transposable element were detected in the immediate vicinity of 22 genes, either upstream or downstream.

Keywords MITEs (Miniature Inverted-repeat Transposable Elements) · Arginine decarboxylase · Bioinformatics · Genome sequence · Arabidopsis

Communicated by G. P. Georgiev

A. El Amrani (✉) · L. Marie · A. Aïnouche · I. Couée
Centre National de la Recherche Scientifique, Université de Rennes 1, UMR 6553, Campus de Beaulieu, Bâtiment 14, 263 Avenue du Général Leclerc, 35042 Rennes Cedex, France
E-mail: [email protected]
Fax: +33-2-23235026

L. Marie · J. Nicolas
Institut de Recherche en Informatique et Systèmes Aléatoires, Université de Rennes 1, Campus de Beaulieu, Bâtiment 12A, 263 Avenue du Général Leclerc, 35042 Rennes Cedex, France

Introduction

Two major classes of mobile genetic elements are classically considered. Class I elements, or retrotransposons, transpose through reverse transcription of an RNA intermediate (Kumar and Bennetzen 1999). LTR retrotransposons are characterised by the presence of long terminal repeats. Non-LTR retrotransposons, such as LINEs (long interspersed repetitive elements) and SINEs (short interspersed repetitive elements), lack these characteristic sequences (Kumar and Bennetzen 1999). Class II elements transpose via a DNA intermediate and are characterised by the presence of terminal inverted repeats (TIRs) (Casacuberta et al. 1998; Le et al. 2000). Some elements are difficult to classify (Feschotte and Mouchès 2000a). Such is the case for Miniature Inverted-repeat Transposable Elements (MITEs), which were first described in plants (Wessler et al. 1995) and later in other organisms (Tu 1997).

Arabidopsis thaliana has one of the smallest known genomes among higher plants (The Arabidopsis Genome Initiative 2000). All of the previously described classes of mobile genetic elements, including MITEs, have been found in its genome (Casacuberta et al. 1998; Kumar and Bennetzen 1999; Le et al. 2000), and complete genome sequence information has been available since the end of 2000 (The Arabidopsis Genome Initiative 2000). This sequence is therefore a powerful model for genome-wide analysis of the variety and distribution of transposable elements and their potential impact on gene expression.

In the present study, these questions were approached as a result of analysing the transcriptional regulation of ARGININE DECARBOXYLASE (ADC) genes in Arabidopsis at the promoter level. The enzyme (EC 4.1.1.19) encoded by these genes catalyses the formation of agmatine from L-Arg, which is the committed step at the initiation of polyamine synthesis. Polyamines are ubiquitous growth regulators in prokaryotic and eukaryotic organisms, and have been implicated in a range of developmental processes in higher plants, including embryogenesis, root development, flowering, and leaf senescence (Evans and Malmberg 1989; Watson et al. 1998). The ADC gene was duplicated at the origin of the Brassicaceae family, thus yielding two paralogues, which are generally called ADC1 (Accession AAD26494), located on chromosome II, and ADC2 (Accession CAB80188), located on chromosome IV of A. thaliana (Galloway et al. 1998). The protein sequences derived from these two genes show extremely high homology, and are approximately 80% identical. However, the putative enzyme activities of these proteins may differ somewhat, and differences in expression and function can be envisaged.

Detailed analysis of the promoters of ADC1 and ADC2 genes in Arabidopsis revealed the presence of a novel 742-bp transposable sequence with the characteristics of a MITE specifically in the promoter of ADC1. This MITE was named AtATE, for A. thaliana ADC-related Transposable Element. A systematic analysis of its genome-wide distribution was carried out using a bioinformatics approach based on a suffix-tree compilation of the whole sequence of A. thaliana. Our study shows that AtATE exists as a set of 26 full-length copies and as several hundred partial copies, which are not distributed at random between or within the five chromosomes of Arabidopsis. Finally, the distribution of full-length copies of AtATE strongly suggests a potential promoter-level relationship between ADC1 and a set of at least 11 other genes.

Materials and methods

Plant and DNA materials

BAC clones F1P15 and T4L20, encompassing the genomic regions of the Arabidopsis genes ADC1 and ADC2, respectively, were obtained from the Nottingham Arabidopsis Stock Centre. For seed germination and seedling growth, Arabidopsis seeds of the WS ecotype were axenically grown on 1× MS medium (Murashige and Skoog 1962) containing 3% (w/v) sucrose. Growth of mature plants was carried out in a growth chamber at 25°C/17°C under fluorescent light (16 h light, 8 h dark).

Bioinformatics and sequence information

Small-scale sequence analysis was done with tools available at the infobiogen WWW server (at http://www.infobiogen.fr). Multiple-sequence alignments were constructed using CLUSTALW version 1.7 (Thompson et al. 1994), followed by BOXSHADE version 3.21 (written by K. Hofmann and M. Baron). Analysis of binding sites for transcription factors was carried out with TRANSFAC programs and databases (Wingender et al. 2000) at http://www.genome.ad.jp. The non-redundant set of nucleotide and protein databases at http://www.ncbi.nlm.nih.gov was searched (most recently in May 2001) by BLASTN and BLASTX (Altschul et al. 1990), respectively, using default parameters and the sequence of AtATE that is present in the promoter of ADC1 as the query. Sequences were extracted from BLASTN hit results, transformed into their plus-strand versions and classified using a locally developed program written in PERL (Practical Extraction and Report Language).

The complete (February 2001 version) genome sequence of Arabidopsis (Arabidopsis Genome Initiative 2000) was downloaded from the WWW server of the Munich Information Centre for Protein Sequences – Forschungszentrum für Umwelt und Gesundheit (MIPS-GSF) at ftp://ftpmisp.gsf.de/cress/. The sequence was analysed and organised by the suffix-tree structure (Weiner 1973; Bieganski 1995) using the algorithm of Giegerich and Kurtz (1997) with programs written in Python and Java. Such a structure may be built in time and space linear in the length of the sequence. Files of families of sequences were used as queries against whole chromosome sequences for pattern matching. Sequence comparisons using the suffix-tree approach were carried out chromosome by chromosome using a SUN Enterprise 450 server with 4 Gb of RAM, with a global search time of approximately 10 min. Results were processed with locally developed programs written in Java and PERL.

Characteristics and sequence variation parameters were determined from multiple sequence alignment using options available in PAUP version 4.0 (Swofford 1998). Multiple sequence alignments required inference of a consistent number of insertion/deletion events (indels) to adjust the overall alignment. These were coded as additional multistate characters (A, T, G, C, 0, 1) in a novel data matrix in order to exploit their potential phylogenetic information. Each of the homologous indels having the same position and length (of one or more base pairs) in the sequence alignment was scored as a single event. The novel data matrix, including all ungapped sites with indels replaced by coded characters, was subjected to phylogenetic analyses.

Reconstruction of phylogenetic relationships was performed using the neighbour-joining and maximum-parsimony methods in the PAUP program. The maximum parsimony analysis was performed by heuristic searches using Fitch parsimony. The RANDOM strategy of stepwise addition of sequences (with 100 replicates) was conducted to search for possible undiscovered islands of most parsimonious trees (Maddison 1991). Characters and character states were weighted equally. Branches of zero length were collapsed. The strict consensus tree was reconstructed from all the unrooted most parsimonious trees generated by the analysis (Margush and McMorris 1981). The bootstrap method (with 1000 replicates) was used to examine the robustness of the various nodes revealed in the consensus tree (Felsenstein 1985).

Construction of ADC1 and ADC2 promoter::β-glucuronidase (GUS) gene fusions

PCR was used to generate 5′ promoter fragments. The BAC clones F1P15 and T4L20 were used as templates to generate, respectively, pADC1 and pADC2. The right flanking primers 5′-CAGCGGATCCCATCTTCTTCTTCTTCAACG-3′, which was homologous to the –17/+3 sequence of the ADC1 promoter, and 5′-TCGCGGATCCCATCTTTATCTTCACCCTCT-3′, which was homologous to the –17/+3 sequence of the ADC2 promoter, contained a BamHI restriction site (GGATCC), which was integrated into the 5′ end in order to facilitate cloning. These primers and the left flanking primers 5′-CGCGGTCGACTAATTACTATAGTTCACTTC-3′, homologous to positions –1485 to –1465 for pADC1, and 5′-ACTGGTCGACAGAAGAAGCAGAGACGAAAC-3′, homologous to positions –1500 to –1480 for pADC2, containing an integrated SalI site (GTCGAC) at their 5′ ends, were used to amplify pADC1 and pADC2 fragments. PCR amplification was carried out with a high-fidelity Taq DNA polymerase with 3′→5′ exonuclease activity (Sigma). Reactions were performed as follows: four cycles of 45 s at 94°C, 45 s at 45°C and 2 min at 72°C were followed by 35 cycles of 45 s at 94°C, 45 s at 55°C and 2 min at 72°C. In order to generate transcriptional fusions with GUS, the amplified pADC fragments were digested with SalI and BamHI and ligated into these sites in the binary vector pBIN101 (Jefferson et al. 1987), upstream of the β-glucuronidase coding sequence which includes the NOS transcription terminator. These constructs were mobilized into Agrobacterium tumefaciens for plant transformation. Constructs were verified by sequencing, which was carried out by Genome Express (Montreuil, France).

Plant transformation and histochemical analysis of GUS activity

A. thaliana (ecotype WS) was transformed via Agrobacterium by the method of Clough and Bent (1998). Batches of 50 plants presenting numerous immature floral buds and few siliques were transformed. Seeds were harvested after 3–4 weeks of growth. Histochemical assays for GUS activity were performed as described previously by Jefferson et al. (1987). Plant tissues containing pADC1::GUS or pADC2::GUS were stained for GUS activity at 37°C overnight. Tissues were then washed with sodium phosphate buffer (50 mM, pH 7), then left overnight in 70% ethanol. No background staining was observed in any of the untransformed plants.
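The whole-chromosome pattern matching described under "Bioinformatics and sequence information" can be illustrated on a toy scale. The sketch below is not the authors' Python/Java programs: it swaps the suffix tree for a naively built suffix array (sorted suffix start positions, without the linear-time construction), and reports exact occurrences of a query on both strands. The function names and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_suffix_array(text):
    """Naive suffix array: suffix start indices sorted lexicographically.

    Assumption: toy inputs only. This sorting approach is far slower than
    the linear-time suffix-tree construction cited in the text.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted 0-based start positions of exact matches of pattern."""
    m = len(pattern)
    # First suffix whose m-letter prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose m-letter prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[start:lo])


def search_both_strands(text, pattern):
    """Return (position, strand) pairs for the pattern and its reverse complement."""
    suffix_array = build_suffix_array(text)
    hits = [(pos, "+") for pos in find_occurrences(text, suffix_array, pattern)]
    rc = reverse_complement(pattern)
    if rc != pattern:
        hits += [(pos, "-") for pos in find_occurrences(text, suffix_array, rc)]
    return sorted(hits)


if __name__ == "__main__":
    toy_chromosome = "ACGTTACGAACGT"  # hypothetical sequence
    print(search_both_strands(toy_chromosome, "ACG"))
```

Once the index is built, each query costs only two binary searches, which is the property that makes it practical to match a whole file of element copies against each chromosome in turn.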

Results

ADC1 and ADC2 promoters show contrasting patterns of activity

Constructs carrying the promoters of either ADC1 (pADC1) or ADC2 (pADC2) were fused to the GUS reporter gene and introduced into Arabidopsis via Agrobacterium-mediated transformation in order to generate stable transformants. The 1.5-kb regions located upstream of the ATG start codon, including transcribed 5′ untranslated leaders, were considered to be potentially important for gene regulation (Bolle et al. 1994). The promoter regions that were fused to the GUS reporter gene therefore spanned from approximately –1500 to +3 relative to the ATG. Twenty and 25 independent transformants were obtained for pADC1::GUS and pADC2::GUS, respectively. Arabidopsis lines were observed at the plantlet or adult stages. All the pADC1::GUS fusion lines showed low promoter activity in plantlets, with no detectable GUS staining in roots. Adult plants presented specific and strong expression in anthers, specifically in pollen grains (Fig. 1). In fully developed flowers, GUS expression was also localized at the bottom of the gynoecium and in the stigma; no GUS expression was found in rosette leaves. In contrast, all the pADC2::GUS fusion lines presented strong GUS staining in plantlet roots and leaves, and a more generalized pattern of GUS expression in different parts of inflorescences and in inflorescence and rosette leaves (Fig. 1). Thus, a significant and reproducible difference in activity was found between the two promoters.

Fig. 1a, b Histochemical localisation of GUS activity driven by the promoters of ADC1 (a) and ADC2 (b). Inflorescences (a1, b1), stamen (a2, a3), tapetum and pollen grains (a4), inflorescence leaves (b2), and rosette leaves (a5, b3) are shown

Analysis of ADC promoters and identification of a putative transposable element associated with the ADC1 promoter

The two ADC genes in the Arabidopsis genome and their corresponding gene products are described in the databases [ADC1, Accession No. AAD26494 (X. Lin, direct submission, 2000); ADC2, CAB80188 (M. Bevan, direct submission, 2000)]. No data have been published on regulatory elements located in the promoter regions that might determine the specific expression of each ADC gene. The cDNAs corresponding to both ADC genes show transcribed 5′ untranslated sequences of 382 bp for ADC1 (Accession No. U52851) and 74 bp for ADC2 (Accession No. AF009647) (Fig. 2a). The two promoter sequences were found to present a low level of global homology, showing only 28.8% identity, with numerous gaps. A striking feature was that the ADC1 promoter (pADC1) exhibited, 332 bp upstream of the TATA box, a 742-bp insertion (Fig. 2b), with imperfect terminal inverted repeats (TIRs). BLASTN analysis of the non-redundant set of databases at http://www.ncbi.nlm.nih.gov, using default parameters and the 742-bp insertion as query, yielded several hundred hits, all in the Arabidopsis genome. Preliminary analysis of these sequences revealed that a small number (approximately 40) represented full-length copies of the 742-bp sequence, giving homology scores of over 1000 (E value = 0.0), while several hundred represented partial copies. This strongly suggested that the 742-bp insertion corresponded to a family of transposable elements, which we named AtATE, for A. thaliana ADC-related Transposable Element.
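Whether the two ends of an insertion form imperfect terminal inverted repeats can be checked by comparing its 5′ end with the reverse complement of its 3′ end. The sketch below only illustrates that idea: the function name `tir_matches`, the ungapped position-by-position comparison, and the toy element are assumptions, and the default of 23 nt simply mirrors the TIR length reported for AtATE.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tir_matches(element, tir_length=23):
    """Count positions at which the 5' end matches the reverse complement
    of the 3' end (ungapped comparison; a perfect TIR scores tir_length)."""
    if len(element) < 2 * tir_length:
        raise ValueError("element is shorter than two TIRs")
    left = element[:tir_length]
    right_rc = reverse_complement(element[-tir_length:])
    return sum(a == b for a, b in zip(left, right_rc))


if __name__ == "__main__":
    # Hypothetical 24-nt element with a 10-nt TIR carrying one mismatch.
    toy_element = "CAGTAAAACC" + "ATAT" + "GGTTTTACTC"
    print(tir_matches(toy_element, tir_length=10), "of 10 positions match")
```

A score below the TIR length, as in the toy example, corresponds to an imperfect repeat.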


Fig. 3 AT content of the sequence of the transposable element associated with the ADC1 promoter. The plot was generated using the "riche en" program available at http://www.up.univ-mrs.fr/wabim/d_abim/d_docs/riche-en.html
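A profile of the kind shown in Fig. 3 can be approximated with a sliding window. The sketch below is not the "riche en" program: the function names, the window size, and the step are assumptions chosen only for illustration.

```python
def at_content(seq):
    """Return the A+T percentage of a DNA string."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def at_profile(seq, window=50, step=10):
    """Return (window_start, A+T %) pairs for every full window along seq."""
    return [
        (start, at_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "AAAATTTTGCGCGCGCATATATAT"  # hypothetical sequence
    for start, percent in at_profile(toy, window=8, step=8):
        print(start, round(percent, 1))
```

Applied to a full element, `at_content` gives the overall figure (78% A+T is the value reported for the AtATE copy at ADC1), while `at_profile` shows how that richness varies along the sequence.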

Fig. 2a, b Sequence and general features of the promoters of the genes ADC1 and ADC2. The schematic diagram of ADC1 and ADC2 (a) indicates the coding region, the ATG start codon, the 5′ UTR segment, and the position of the MITE transposable element. The 1.5-kb promoter sequence of ADC1 (b) shows the 742-bp insertion (highlighted against a black background) with the imperfect terminal inverted repeats (highlighted against a grey background), the TATA box (boxed), the transcription start site (arrow) and the initiation codon (underlined)

Characterization of the transposable element associated with the ADC1 promoter

The AtATE located at the 5′ border of ADC1 was AT-rich (78% A+T) as shown in Fig. 3, and had the potential to form stable secondary structures (data not shown) with ΔG values comparable to those reported for MITE families (Bureau and Wessler 1992, 1994; Tu 1997). Its size is similar to that of the Emigrant MITE of Arabidopsis (604 bp; Casacuberta et al. 1998) and to the element MER (862 bp; Feschotte and Mouchès 2000a). Translation of the six reading frames of its sequence did not suggest any coding activity, except for a small peptide of 44 amino acids with no significant homology to any protein in the non-redundant set of protein databases (based on BLASTX analysis using default parameters; http://www.ncbi.nlm.nih.gov). No significant sequence identity to any other known transposable element was found. Moreover, AtATE did not possess a signature for RNA polymerase III promoters, unlike SINEs, which carry one (Gilbert and Labuda 1999). However, AtATE does have imperfect TIRs of 23 bp (Table 1) that show high similarity to those of Mimo and MER, which are MITE families that have been described in Culex pipiens and in human, respectively (Feschotte and Mouchès 2000a). These imperfect TIRs were most similar (15–17/23 nt) to those of the Emigrant MITE family of Arabidopsis (Casacuberta et al. 1998). On the other hand, AtATE did not appear to have any obvious target sequence preference, but was preferentially distributed in AT-rich regions (data not shown) as already described for MULEs in the Arabidopsis genome (Le et al. 2000). This characteristic is shared by Basho, also found in Arabidopsis (Le et al. 2000). AtATE therefore represents a novel MITE as it shares all of the characteristics of this type of sequence: length, copy number, presence of TIRs, lack of coding potential, AT-rich character, and the potential to form stable secondary structures.

The novel ADC1-related MITE contains several eukaryotic cis-acting regulatory domains

We searched for cis-acting regulatory domains in the AtATE found in the ADC1 promoter. Almost 70 eukaryotic binding motifs could be detected using the TRANSFAC database. Comparison of AtATE sequences with TRANSFAC motif matrices confirmed the presence in AtATE of 10 potential cis-acting regulatory domains that are described in Table 2. Most of them are located at the proximal end of AtATE, adjacent to ADC1. Some of these domains are related to regulation of developmental genes; the P-binding motif is involved in regulation of genes whose products are involved in flavonoid biosynthesis; the ATAGAT motif is related to transcription of nitrogen-regulated genes. The presence of AtATE in the vicinity of a given gene would thus position potential domains implicated in transcription factor binding. The insertion of AtATE and positioning of cis-acting regulatory domains in the promoter region of only one of the ADC genes is consistent with the differential promoter activity of the two paralogues (Fig. 1).

Table 1 Homology in terminal inverted repeats (TIRs) of AtATE with members of the MITE families Emigrant, Mimo and MER

Elementᵃ | TIRs (5′→3′)ᵇ | Size (bp) | A+T content (%) | Copy number | Number of complete copiesᶜ | –ΔG (kcal/mol)
Emigrant | CAGTAAAACCTCTATAAATTAATA | 604 | 83.3 | 500–1000 | 14 | 87.9
AtATE | CATTRTMT-ATCTATAAACTTATA | 742 | 78 | 1617 | 26 | 69.9
Mimo | CAGTAGTTGTTCGGTAA | 346 | 60 | 1000 | ? | ?
MER | CTGGGCCAGTNGTCCCTCGNTATCCGCGGG | 862 | 75 | 1000 | 8 | ?

ᵃ Emigrant, Mimo and MER are found, respectively, in Arabidopsis (Casacuberta et al. 1998), in the mosquito Culex pipiens (Feschotte and Mouchès 2000b) and in human (Smit and Riggs 1996)
ᵇ Alignment of the TIR sequences was carried out by CLUSTALW (Thompson et al. 1994) using default parameters. Conserved nucleotides relative to the AtATE sequence are indicated in bold. N indicates a highly variable nucleotide in the corresponding family. R=G or A; M=C or A
ᶜ Numbers of full-length or nearly full-length copies are derived from database descriptions (MER) or from suffix-tree analysis of the whole genome (AtATE)

Table 2 Cis-acting eukaryotic regulatory domains in AtATE

Potential binding motif (position in AtATE)ᵃ | Corresponding transcriptional factor | Reference
TCAAATATCTATCTG (724) | Transcription repressor CDP | Andres et al. (1994)
AAATATCTAT (726) | GATA-binding factor 1 | Whyatt et al. (1993)
AAAATATAC (708) | CF2-II | Gogos et al. (1992)
ATTAAAAGGCT (652) | Dof1/MNB1a single zinc finger transcription factor | Yanagisawa and Schmidt (1999)
GCAAGTTAATATCTAG (636) | Deformed | Ekker et al. (1992)
GCCTACCTG (618) | Maize activator P of flavonoid biosynthetic genes | Grotewold et al. (1994)
CAACTCAC (519) | GA-regulated myb gene from barley | Gubler et al. (1999)
ATAGAT (493) | Activator of nitrogen-regulated genes | Fu and Marzluf (1990)
CTAGAATTGACGGG (386) | Gut-enriched Krüppel-like factor | Shields and Yang (1998)
TATCATCATTATAGA (117) | Arabidopsis thaliana homeo box protein | Sessa et al. (1993)

ᵃ Analysis of eukaryotic binding motifs in the AtATE of the ADC1 promoter was carried out using TRANSFAC, and revealed the presence of approximately 70 potential motifs. AtATE sequences were then visually compared with TRANSFAC motif matrices of most probable nucleotide positions, which resulted in the validation of the 10 cis-acting regulatory domains listed
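Locating motifs such as those of Table 2 within a sequence can be sketched with plain exact-string search. This is a simplification and an assumption on our part: TRANSFAC scores sequences against position matrices, which is not reproduced here. The function names, the motif labels, and the toy sequence are hypothetical; positions are reported 1-based.

```python
def find_motif_positions(sequence, motif):
    """Return 1-based start positions of every exact, possibly overlapping,
    occurrence of motif in sequence (case-insensitive)."""
    if not motif:
        raise ValueError("empty motif")
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(motif, start + 1)
    return positions


def scan_motifs(sequence, motifs):
    """Map each motif label to the list of positions found in sequence."""
    return {
        label: find_motif_positions(sequence, motif)
        for label, motif in motifs.items()
    }


if __name__ == "__main__":
    # Motif strings are taken from Table 2; the labels and the toy
    # sequence are made up for this example.
    motifs = {"nitrogen_activator": "ATAGAT", "maize_P": "GCCTACCTG"}
    print(scan_motifs("GGATAGATAGATCC", motifs))
```

Running the same scan over each full-length copy would show which copies retain which motifs, for example the absence of the P-binding motif from a given copy.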

Genome-wide distribution of full-length copies of AtATE

Determination of positions in the genome was facilitated by treatment of all the different versions of AtATE as a single query file. A file of the 40 full-length copies of AtATE that had been identified by BLASTN analysis was matched with the whole sequence of each Arabidopsis chromosome, compiled in a suffix-tree data structure. The 40 sequences corresponded to 26 non-redundant copies, ranging in size from 720 to 749 bp (Fig. 4). Copy numbers on chromosomes I, II, III, IV, and V were, respectively, 4, 7, 1, 4, and 10. The density of full-length copies thus ranged from 0.043 copy per Mb on chromosome III to 0.371 copy per Mb on chromosome V. A multiple alignment of these 26 non-redundant full-length copies is shown in Fig. 4. Large domains of complete consensus were detected. Table 3 summarises the positions, and distances with respect to 5′ (ATG) or 3′ borders of the closest genes. Copies of AtATE were found within 1700 bp upstream of the ATG of 12 genes. Most of these genes are predicted to encode putative or hypothetical proteins. A small number of these genes, such as ADC1 (Galloway et al. 1998), and genes encoding a delta subunit of mitochondrial F1-ATPase, a mitotic checkpoint protein, and a gibberellin oxidase, have been clearly identified or functionally characterised. Exhaustive description of transcription-factor binding sites depends on the ongoing accumulation of experimental evidence. Given the present stage of knowledge and small variations in the full-length copies of the AtATE family (see Fig. 4), the AtATEs in ADC1 and the gene encoding the delta subunit of mitochondrial F1-ATPase were found to share 9 of the 10 potential cis-acting regulatory domains listed in Table 2. The P binding motif was absent from the AtATE associated with the gene encoding the delta subunit of mitochondrial F1-ATPase. Moreover, full-length copies of AtATE were also found 368–1092 bp downstream of 10 genes, among which are PINHEAD/ZWILLE (Lynn et al. 1999) and a gene encoding a starch phosphorylase. Thus, 22 genes situated on chromosomes I (5), II (6), III (1), IV (1) and V (9), were found to have full-length copies of AtATE close to their coding regions.

Fig. 4a, b Multiple alignment of the 26 non-redundant full-length copies of AtATE present in the genome of A. thaliana. Conserved nucleotides are highlighted in white. Imperfect terminal inverted-repeat sequences (TIRs) are indicated by the arrows

Table 3 Distribution and position of the 26 non-redundant full-length copies of AtATE

Chromosomeᵃ | Positionᵇ | Size (bp)ᶜ | Clone source and MITE numberᵈ | Products of adjacent genesᵉ
I | 9830810 | 742 | AC069471 (AtATE-1) | Hypothetical protein [5′ 1357] AAG51478.1; Unknown protein [5′ 8189] AAG51481.1
I | 22158908 | 749 | AC018908 (AtATE-2) | Putative coatomer zeta subunit [5′ 834] AAG51650.1; Putative gibberellin 20-oxidase [5′ 1603] AAG51653.1
I | 22701717 | 742 | AC000375 (AtATE-3) | Pseudogene [5′ 1351]; Hypothetical protein [5′ 868]
I | 29960647 | 742 | AC018849 (AtATE-4) | Putative glycerol kinase [3′ 3021] AAF27123.1; Hypothetical protein [3′ 3542] AAF27122.1
II | 394623 | 742 | AC007069 (AtATE-5) | Hypothetical protein [5′ 1657] AAD21779.1; Putative purple acid phosphatase [3′ 623] AAD21785.1
II | 499274 | 743 | AC006532 (AtATE-6) | Unknown protein [3′ 716] AAD20098.1; Putative C2H2-type zinc finger protein [5′ 4514] AAD20087.1
II | 7302587 | 742 | AC007195 (AtATE-7) | Arginine decarboxylase [5′ 801] AAD26494.1; Putative vacuolar proton-ATPase 16 kDa proteolipid [3′ 5560] AAD26493.1
II | 11434992 | 745 | AC002505 (AtATE-8) | AR781, similar to yeast pheromone receptor [5′ 2117] AAC14501.1; Hypothetical protein [3′ 815] AAC14502.1
II | 12242332 | 720 | AC006283 (AtATE-9) | Unknown protein [3′ 6916] AAD20694.1; Unknown protein [5′ 2285] AAD20693.1
II | 13485937 | 743 | AC006593 (AtATE-10) | Hypothetical protein [3′ 5503] AAD20672.1; Hypothetical protein [5′ 1640] AAD20673.1
II | 17403198 | 747 | AC005662 (AtATE-11) | Putative embryo-abundant protein [3′ 1311] AAC78535.1; Unknown protein [5′ 2472] AAC78534.1
III | 17596234 | 742 | AL133329 (AtATE-12) | Putative helicase [5′ 2202] CAB61942.1; Starch phosphorylase H (cytosolic form)-like protein [3′ 614] CAB61943.1
IV | 1348682 | 742 | AL161495 (AtATE-13) | Putative oxido-reductase [3′ 4490] CAB77790.1; Putative oxido-reductase [3′ 1363] CAB77791.1
IV | 3111299 | 740 | AL161505 (AtATE-14) | Transposon [2576], orientation not determined; Pseudogene [10593], orientation not determined
IV | 4505243 | 742 | AL161512 (AtATE-15) | Putative lipid transfer protein [5′ 2148] CAB77992.1; Putative MuDR-A-like transposon protein [3′ 784] CAB77993.1
IV | 10326069 | 744 | AL161554 (AtATE-16) | Putative protein [5′ 3892] CAB79134.1; Putative transposable element [3′ 2203] CAB79135.1
V | 1897623 | 740 | AB006700 (AtATE-17) | Putative protein [5′ 3559] BAB08947.1; Putative protein [3′ 4406] BAB08948.1
V | 4327498 | 742 | AL163572 (AtATE-18) | Delta subunit of mitochondrial F1-ATPase [5′ 883] CAB87152.1; Putative protein [5′ 2763] CAB87153.1
V | 8816095 | 741 | AC006259 (AtATE-19) | Similar to translation elongation factor 2 [3′ 2075] NP197905.1; Putative protein [3′ 368] NP197906.1
V | 10005385 | 742 | AC007627 (AtATE-20) | Phosphatase-like protein [5′ 4508] NP568503.1; Similar to WD repeat protein [5′ 9739] NP568505.1
V | 14085660 | 740 | AB025602 (AtATE-21) | Pseudogene [3′ 3757] F14A1.11; Transposon [3′ 850] F14A1.12
V | 15609423 | 743 | AB009048 (AtATE-22) | Cytochrome P450-like protein [5′ 7796] BAB08653.1; Putative protein [3′ 514] BAB08654.1
V | 16360615 | 742 | AB011477 (AtATE-23) | Putative protein [5′ 909] BAB11345.1; Unknown protein [3′ 2151] BAB11346.1
V | 17621504 | 742 | AB026651 (AtATE-24) | PINHEAD/ZWILLE (translation initiation factor) [3′ 1092] BAB11310.1; Putative protein [5′ 552] BAB11311.1
V | 17779868 | 743 | AB005239 (AtATE-25) | Putative protein [3′ 505] BAB10983.1; Putative protein [5′ 3633] BAB10984.1
V | 20284804 | 741 | AB024032 (AtATE-26) | Myrosinase binding protein-like [3′ 8407] BAA97008.1; Mitotic checkpoint protein [5′ 609] BAA97009.1

ᵃ The exact positions and genic environments of the 26 non-redundant copies on each of the five chromosomes (I–V) were determined from the suffix-tree structure as described in Materials and methods
ᵇ Exact position on chromosome
ᶜ Size of AtATE copy
ᵈ Accession No. of BAC clone
ᵉ Products of the genes adjacent to each AtATE copy, together with the corresponding GenBank numbers of the proteins. The distance (in nt) between the AtATE copy and the closest border, whether 5′ (ATG) or 3′, of neighbouring genes is given in brackets

Sequence variation and phylogenetic analysis of full-length copies of AtATE

Most full-length copies of AtATE fall into the size range from 740 to 749 bp; only AtATE-9 was significantly shorter than the others (720 bp), due to a 23-bp deletion in its 3′ region (Fig. 4). Optimal alignment of these sequences involved the inference of 25 insertion/deletion events (indels) covering a total of 55 gapped sites: 22 indels of 1 bp, one insertion of 3 bp (present in AtATEs 8 and 11), one insertion of 7 bp (in AtATE-2), and one deletion of 23 bp (in AtATE-9). Pairwise comparison of nucleotide differences among these elements (with indels not coded) showed moderate values for sequence divergence, ranging from 0.67% (between AtATE-4 and AtATE-24) to 7.07% (between AtATE-7 and AtATE-17). These values were only slightly modified when indels were converted into 25 additional characters. Among 208 variable sites, 127 are uninformative (autapomorphous substitutions), while 81 nucleotide substitutions were potentially informative from the cladistic point of view. Conversion of indels provided eight additional potentially informative characters (from 25 new coded sites). Both maximum-parsimony and neighbour-joining analyses of phylogenetic relationships among the 26 AtATEs (including coded indels) generated similar results. The unrooted strict consensus tree yielded by maximum-parsimony analysis (Fig. 5) revealed six groups (A–F) of phylogenetically closely related AtATE elements. In groups A, C, D and F, no correlation was found between phylogenetic relationship and distribution of the AtATE elements on the chromosomes. In contrast, groups B and E each consisted of two closely related AtATE copies found on the same chromosome (II and V, respectively). Within their respective groups, elements share high levels of sequence identity, ranging from 96.1% within group F to 98.8% for group E (Fig. 5A). Within group A (Fig. 5B), which contains the ADC-related transposable element of reference (AtATE-7^II) and the element related to the PINHEAD/ZWILLE gene (AtATE-24^V), elements are phylogenetically very closely related (separated from each other by 0, 2 or 4 informative mutations). A well-supported node (71% bootstrap) was found to relate AtATE-7^II to AtATE-26^V. AtATEs exhibited different molecular evolutionary rates within most groups (A, B, C, D, and F), as shown by variable branch lengths. For example, since its transposition onto chromosome II, AtATE-7^II appears to have evolved faster than the other elements within group A (Fig. 5B).
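The two alignment-derived quantities used in this analysis, pairwise divergence with indels left uncoded and indels scored as single events, can be sketched as follows. This is an illustration under stated assumptions rather than the PAUP procedure: gaps are written as "-", divergence is the plain percentage of differing sites among ungapped columns, and each distinct (start, length) gap run becomes one presence/absence character (a simpler coding than the multistate scheme described in Materials and methods). All names and the toy alignment are hypothetical.

```python
def pairwise_divergence(seq_a, seq_b, gap="-"):
    """Percent of differing sites among aligned columns without a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # indels are not coded in this measure
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differences / compared


def indel_events(aligned_seq, gap="-"):
    """Return one (start, length) tuple per uninterrupted run of gaps."""
    events, run_start = [], None
    for i, ch in enumerate(aligned_seq):
        if ch == gap:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            events.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:
        events.append((run_start, len(aligned_seq) - run_start))
    return events


def code_indels(alignment, gap="-"):
    """Score each distinct (start, length) event as a 1/0 character per sequence."""
    per_seq = {name: set(indel_events(seq, gap)) for name, seq in alignment.items()}
    all_events = sorted(set().union(*per_seq.values()))
    matrix = {
        name: [1 if event in events else 0 for event in all_events]
        for name, events in per_seq.items()
    }
    return all_events, matrix


if __name__ == "__main__":
    toy = {"a": "ACGT-ACGT", "b": "ACGTTACGA", "c": "AC---ACGT"}  # hypothetical
    print(pairwise_divergence(toy["a"], toy["b"]))
    print(code_indels(toy))
```

Two sequences sharing a gap run with the same start and length receive the same character state, which matches the rule that homologous indels of identical position and length count as a single event.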

Distribution and gene-related position of the several hundred partial copies of AtATE

Fig. 5A, B Phylogenetic analysis of the 26 non-redundant fulllength copies of AtATE in the A. thaliana genome. Elements are numbered 1–26 in Arabic numerals, in the same order as in Table 3. Exponents in Roman numerals indicate the chromosome in which each element is inserted. The unrooted strict consensus tree (A) shows the main groups of closely related elements (A–F; with bootstrap support above 70%). For each group, the mean sequence identity for all members is given in parentheses. Bootstrap confidence values above 50% are indicated for each node in bold Arabic numerals. Details of relationships within group A are shown in B. The numbers in italics indicate the number of character changes along each branch

Besides full-length copies (Fig. 4), several hundred partial copies or fragments of AtATE were also identified by BLASTN analysis. A single file containing these AtATE fragments was used as the query against the whole sequence of each Arabidopsis chromosome by suffix-tree-based pattern matching. To visualise the resulting large population of fragments, we generated histograms depicting the frequency of occurrence as a function of relative position within each chromosome (Fig. 6). A total of 1617 non-redundant partial copies was found. All five chromosomes carried a significant number of AtATE fragments, with densities (number of copies per Mb) of 15.2 (chromosome I), 13.4 (chromosome II), 14.0 (chromosome III), 10.2 (chromosome IV), and 13.8 (chromosome V). Fragments were generally found everywhere along the chromosome (Fig. 6), except on chromosome IV, which is characterised by a large 7-Mb zone that is free of AtATE copies. However, the distribution was non-random. For instance, chromosome I showed a Gaussian distribution with a main central region, between positions 12 and 14 Mb, showing the highest frequency of AtATE sequences. Positions were also classified according to the main sub-domains of genomic regions, i.e., distant 5′-upstream region with putatively low involvement in promoter activity, neighbouring 5′-upstream region with potentially significant involvement in promoter activity, coding regions plus introns, and 3′-flanking regions (Fig. 7). Fragments of AtATE were found in large numbers in all types of regions on the five chromosomes. A significant number of fragments was found within transcribed regions (coding regions and introns). However, the number of fragments in coding regions plus introns was generally lower than that in non-transcribed regions. This was particularly true for chromosome II, which showed a 10- to 100-fold lower level of fragments in coding regions plus introns than in flanking regions, and harboured fewer fragments in the coding regions and introns of genes than any of the other four chromosomes.
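The densities and per-interval histograms reported above amount to simple counting over copy positions. A minimal Python sketch, assuming that a list of copy start positions (in bp) is already available for a chromosome; the positions and chromosome length used in the example are toy values, not data from this study, and the function names are hypothetical:

```python
from collections import Counter


def copies_per_mb(n_copies, chromosome_length_bp):
    """Density of copies, expressed as number of copies per megabase."""
    return n_copies / (chromosome_length_bp / 1_000_000)


def histogram_per_interval(positions_bp, interval_bp=100_000):
    """Count copies in consecutive fixed-size intervals along a chromosome.

    Returns a dict mapping interval index (0 = first 100 kb) to copy count,
    sorted by interval index. Empty intervals are simply absent.
    """
    counts = Counter(pos // interval_bp for pos in positions_bp)
    return dict(sorted(counts.items()))


# Toy example (hypothetical positions on a hypothetical 2-Mb sequence).
toy_positions = [5, 99_999, 100_000, 250_000]
print(copies_per_mb(len(toy_positions), 2_000_000))
print(histogram_per_interval(toy_positions))
```

Intervals without any copy do not appear in the returned dictionary, so a copy-free zone such as the one described for chromosome IV would show up as a run of missing interval indices.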


Fig. 6a–c Genome-wide distribution of the 1617 partial copies of AtATE on each of the five chromosomes of Arabidopsis. The distribution was analysed using the suffix-tree structure. Copy numbers per 100-kb interval are given for each chromosome
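The search for copies is described in the text as suffix-tree-based pattern matching. As a rough illustration of the underlying idea (index the sequence once, then locate every exact occurrence of a query by searching the index), the following Python sketch uses a suffix array, which is a simpler relative of the suffix tree and is not the structure used in this study. The toy sequence, the function names and the naive construction are assumptions made purely for illustration; a chromosome-scale index would need a linear-time construction.

```python
def build_suffix_array(text):
    """Return the start indices of all suffixes of `text`, in sorted order.

    Naive construction by direct sorting: acceptable for a toy sequence,
    far too slow and memory-hungry for a whole chromosome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of exact matches of `pattern`."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes between the two bounds start with the pattern.
    return sorted(suffix_array[first:lo])


toy_sequence = "ACGTACGTTACG"  # hypothetical sequence, not genomic data
index = build_suffix_array(toy_sequence)
print(find_occurrences(toy_sequence, index, "ACG"))  # [0, 4, 9]
```

Because the index is built once per sequence, many fragment queries can be run against the same chromosome without rescanning it, which is the practical appeal of suffix-based structures for this kind of exhaustive copy search.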

Discussion

Striking features of the novel MITE AtATE

Fig. 7 Distribution of the 1617 partial copies of AtATE relative to genomic domains. Analysis of distribution and position was carried out using the suffix-tree structure. Coding regions plus introns, proximal 5′ flanking regions (closer than position –1500), distal 5′ flanking regions (further than position –1500), and 3′ flanking regions were distinguished. Copy numbers in each region are given for each chromosome
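The four genomic domains named in the legend of Fig. 7 can be expressed as a small classification rule. The Python sketch below is one possible reading of that rule, not the procedure used in the study: it assumes a single gene described by the start and end coordinates of its coding-plus-intron span and by its strand, it treats a copy exactly 1500 bp upstream as proximal, and the function name and domain labels are hypothetical.

```python
PROXIMAL_LIMIT_BP = 1500  # boundary between proximal and distal 5' flanks


def classify_position(pos, gene_start, gene_end, strand="+"):
    """Assign a copy position to a domain relative to one gene.

    `gene_start` <= `gene_end` are chromosome coordinates of the span
    covering coding regions plus introns; `strand` is "+" or "-".
    """
    if gene_start <= pos <= gene_end:
        return "coding regions plus introns"

    # Upstream means before the start on "+", after the end on "-".
    upstream = pos < gene_start if strand == "+" else pos > gene_end
    if not upstream:
        return "3' flanking"

    distance = gene_start - pos if strand == "+" else pos - gene_end
    if distance <= PROXIMAL_LIMIT_BP:  # boundary handling is an assumption
        return "proximal 5' flanking"
    return "distal 5' flanking"


# Toy example with hypothetical coordinates.
print(classify_position(4_900, 5_000, 6_000))
```

In a genome-wide analysis each copy would also have to be assigned to its nearest gene before this rule is applied; that step is left out here.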

Analysis of the whole sequence has shown that, despite its small size, the Arabidopsis genome contains significant numbers of all the types of transposable elements found in many other plant species (The Arabidopsis Genome Initiative 2000). The first description of a MITE family in Arabidopsis, the Emigrant family, was recently provided by Casacuberta et al. (1998). To our knowledge, the present study, using the suffix-tree approach, gives the first complete description of exact copy numbers and positions in the Arabidopsis genome for a family of MITEs. The Emigrant family is characterised by a low estimated copy number (500–1000) in the genome and quasi-absence from coding and flanking regions of genes (Casacuberta et al. 1998). Only one member of the family lay within 1 kb upstream or downstream of an ORF (Casacuberta et al. 1998). In contrast, 16 of the 26 full-length AtATE copies lie within 1 kb of a coding sequence (Table 3). Emigrant seems to be related to Wujin, a MITE family present in the yellow-fever mosquito Aedes aegypti (Tu 1997), and the TIRs of AtATE (Table 1) showed homologies with those of Emigrant and Wujin. However, AtATE TIRs differed at their 5′ ends from the typical CAGT/CACT motif found in Emigrant/Wujin TIRs (Feschotte and Mouchès 2000a). Moreover, AtATE is present in higher copy numbers than Emigrant, more similar to the copy number of Wujin in the A. aegypti genome (Tu 1997). MITEs have been reported to show a bias towards non-coding regions (Tu 1997). This is also true for full-length copies of AtATE (Table 3). However, the location of complete and partial copies of AtATE in the proximal flanking regions of numerous genes (Table 3; Fig. 7) contrasts sharply with the corresponding properties of Emigrant. Moreover, a significant number of partial copies of AtATE is found within transcribed regions (Fig. 7).

Significance of the genome-wide distribution of AtATE copies

MITEs are units of successful transposition, and have exploited various mechanisms of transposition (Tu 2000), such as snap-back transposition or the transposition machinery of autonomous DNA transposons. Thus, successful transposition of the MITE Angel in fish has resulted in copy numbers of 1000 to 10,000 (Izsvak et al. 1999). When transposition occurs via RNA-mediated processes, errors in reverse transcription yield truncated, non-functional insertions (The International Human Genome Sequencing Consortium 2001). During DNA-mediated transposition of non-autonomous elements, multiple deletions and imperfect replication can also occur, generating heterogeneity in length and sequence among members of the same family (Feschotte and Mouchès 2000a). The distribution of AtATE copies in the Arabidopsis genome could therefore be related to transposition activity and its mechanisms, in association with the multiple effects of genome shuffling.
Evidence from different eukaryotic genomes suggests a DNA transposon origin for some MITE families, such as Emigrant elements in Arabidopsis (Feschotte and Mouchès 2000a) and MERs in human (Smit and Riggs 1996). However, as underlined by Feschotte and Mouchès (2000a), any extension of this hypothesis to all MITE families must be treated with caution due to the lack of relevant data. The abundance of non-redundant partial AtATE copies scattered throughout the genome of Arabidopsis strongly suggests that extensive bursts of amplification have occurred in the past. Poor resolution in the central part of the phylogenetic tree for full-length copies (Fig. 5A) indicates that most of the main groups and some individual AtATE copies derive from weakly divergent founder elements. However, the unrooted tree did not allow for accurate determination of the relative evolutionary age of the different amplification events. Further studies should first evaluate the presence and diversity of AtATE elements among different populations of A. thaliana, and within genomes of other related taxa, primarily in Brassicaceae.

It has been suggested that small plant genomes have relatively few repetitive DNAs, and that most of these are found in large blocks, such as satellites (Pelissier et al. 1995), centromeres (Jiang et al. 1996), telomeres (Richards and Ausubel 1988) and centromere-associated regions. The distribution of AtATE copies could be associated with general features of each chromosome (The Arabidopsis Genome Initiative 2000). Thus, the 14.5–18 and 11–18 Mb zones of, respectively, chromosomes II and IV, which lack copies of AtATE, correspond to zones with a much lower frequency of all types of transposable elements (The Arabidopsis Genome Initiative 2000). Conversely, the peaks of AtATE numbers on chromosomes I and V appeared to correspond to zones with higher frequencies of all types of transposable elements (The Arabidopsis Genome Initiative 2000). Chromosome III presents a peculiar case. On the one hand, it shows the same average level of transposable elements as other chromosomes (The Arabidopsis Genome Initiative 2000), and the same density of partial copies of AtATE as other chromosomes (Fig. 6). On the other hand, it carries only one full-length copy of AtATE (AtATE-12), suggesting that transposition mechanisms for AtATE were less efficient in chromosome III.

Potential implications of the distribution of AtATE for gene regulation

Some putative transposable DNA elements of solanaceous plants, with sizes in the same range as AtATE (0.1–1 kb), have been reported to be found only in introns and flanking regions of genes (Oosumi et al. 1995). In the present case, it was found that AtATE could occur in different parts of genes (Fig. 7), with specific patterns of distribution between transcribed and flanking regions depending on the chromosome. The existence of eukaryotic cis-acting regulatory elements within the transposon (Table 2) is consistent with the potential involvement of AtATE in gene regulation.
This notion is also supported by the finding of full-length AtATE copies in the vicinity of TATA boxes in 5′ flanking regions (Table 3). In addition, hundreds of partial copies were present within the 1500 bp proximal to the 5′ ends of genes (Fig. 7).

The existence of two ADC genes in Arabidopsis (Galloway et al. 1998) is compatible with the duplicated status of its genome (The Arabidopsis Genome Initiative 2000). Otherwise, it would be tempting to speculate that the ADC gene had been duplicated during a transposition of AtATE, as it has been suggested that transposable elements may have the ability to mobilise segments of genomes, including genes, and move them to new locations, thus increasing gene copy numbers (Bennetzen 2000). In any case, the stabilisation of genome organisation would then have provided an opportunity for functional divergence between the two genes. A double physiological role is generally assigned to ADC, during developmental processes in non-dividing tissues and in responses to environmental stress (Tiburcio et al. 1990; Soyka and Heyer 1999). ADC2, which lacks AtATE, is a likely candidate for the stress-responsive ADC gene in Arabidopsis (Soyka and Heyer 1999). In terms of development, ADC gene expression in pea is high in young developing tissues such as shoot tips, young leaflets and flower buds (Pérez-Amador et al. 1995). The control of ADC expression is complex. However, transcriptional activation of ADC in response to light has been reported (Chang et al. 2000). Our results provide the first evidence that control of the differential, development-related, expression of ADC1 and ADC2 may occur at promoter level. Thus, ADC1 promoter activity appeared to be restricted to specific parts of inflorescences and plantlets, whereas ADC2 promoter activity was more extensive (Fig. 1). If post-transcriptional and post-translational regulation are superimposed on the promoter activity of ADC1, it is likely that under normal conditions of development ADC1 mRNA and protein show an extremely localized pattern of distribution.

An association of MITEs with plant genes has been reported especially in grasses (Bureau et al. 1996; Yang et al. 2001). The AtATE-7 insertion at 332 bp upstream of the TATA box (Fig. 2) of the ADC1 gene ranks amongst the closest associations between a MITE and a gene reported to date in eukaryotic organisms. Thus, the closest gene-related insertions of Pony elements in the yellow-fever mosquito and of Kiddo elements in rice are, respectively, situated at positions –705 and –126 upstream of coding sequences (Tu 2000; Yang et al. 2001). It is reasonable to assume that such close associations of AtATE with promoter regions would have an impact on promoter activity. The presence/absence of a MITE in the promoters of the UBIQUITIN2 orthologues in two subspecies of rice has also been associated with striking differences in promoter activity (Yang et al. 2001).
One possible explanation is the presence of eukaryotic cis-acting regulatory domains in MITEs, such as those described in Table 2 for the AtATE copy found near ADC1. A number of these cis-acting regulatory domains correspond to functionally characterised transcription factors of Arabidopsis, such as Athb-1 (Aoyama et al. 1995) and Myb (Hoeren et al. 1998). Thus, the GCCTACCTG and CAACTCAC binding motifs found in the ADC1 AtATE possess the typical (A/T)AC central motif characteristic of Myb binding (Hoeren et al. 1998). The distribution of full-length copies of AtATE also revealed potential links between ADC1 and other genes, such as genes encoding the delta subunit of mitochondrial F1 ATPase, a mitotic checkpoint protein, and a gibberellin oxidase (Table 3). It is interesting to relate this to the significant activity of pADC1 in inflorescences and the importance of mitochondrial respiration for flower development (Landschütze et al. 1995), or to the links between ADC expression, gibberellin action and the control of division/differentiation (Pérez-Amador et al. 1995). Further experiments with pADC::GUS lines under conditions of environmental stress, and experiments designed to obtain deletion mutations of AtATE, are underway, with the goal of further investigating the regulatory effects of AtATE on gene expression.

Acknowledgements We thank Yoann Mescam and Mathias Rossignol (Institut de Recherche en Informatique et Systèmes Aléatoires, Rennes, France) for help with the application of the suffix-tree approach.

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman D (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Andres V, Chiara MD, Mahdavi V (1994) A new bipartite DNA-binding domain: cooperative interaction between the cut repeat and homeo domain of the cut homeo proteins. Genes Dev 8:245–257
Aoyama T, Dong CH, Wu Y, Carabelli M, Sessa G, Ruberti I, Morelli G, Chua NH (1995) Ectopic expression of the Arabidopsis transcriptional activator Athb-1 alters leaf cell fate in tobacco. Plant Cell 7:1773–1785
Bennetzen JL (2000) Transposable element contributions to plant gene and genome evolution. Plant Mol Biol 42:251–269
Bieganski P (1995) Genetic sequence data retrieval and manipulation based on generalized suffix trees. Ph.D. thesis, University of Minnesota, Minneapolis-St. Paul
Bolle C, Sopory S, Lübberstedt T, Herrmann RG, Oelmüller R (1994) Segments encoding 5′-untranslated leaders of genes for thylakoid proteins contain cis-elements essential for transcription. Plant J 6:513–523
Bureau TE, Wessler SR (1992) Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell 4:1283–1294
Bureau TE, Wessler SR (1994) Stowaway: a new family of inverted-repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6:907–916
Bureau TE, Ronald PC, Wessler SR (1996) A computer-based systematic survey reveals the predominance of small inverted-repeat elements in wild-type rice genes. Proc Natl Acad Sci USA 93:8524–8529
Casacuberta E, Casacuberta JM, Puigdomenech P, Monfort A (1998) Presence of miniature inverted-repeat transposable elements (MITEs) in the genome of Arabidopsis thaliana: characterization of the Emigrant family of elements. Plant J 16:79–87
Chang KS, Lee SH, Hwang SB, Park KY (2000) Characterization and translational regulation of the arginine decarboxylase gene in carnation (Dianthus caryophyllus L.). Plant J 24:45–56
Clough SJ, Bent AF (1998) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735–743
Ekker SC, von Kessler DP, Beachy PA (1992) Differential DNA sequence recognition is a determinant of specificity in homeotic gene action. EMBO J 11:4059–4072
Evans PT, Malmberg RL (1989) Do polyamines have roles in plant development? Annu Rev Plant Physiol Plant Mol Biol 40:235–269
Felsenstein J (1985) Confidence limits on phylogenies: an approach using bootstrap. Evolution 39:783–791
Feschotte C, Mouchès C (2000a) Evidence that a family of miniature inverted-repeat transposable elements (MITEs) from Arabidopsis thaliana has arisen from a pogo-like transposon. Mol Biol Evol 17:730–737
Feschotte C, Mouchès C (2000b) Recent amplification of miniature inverted-repeat transposable elements in the vector mosquito Culex pipiens: characterization of the Mimo family. Gene 250:109–116
Fu YH, Marzluf GA (1990) Nit-2, the major positive-acting nitrogen regulatory gene of Neurospora crassa, encodes a sequence-specific DNA-binding protein. Proc Natl Acad Sci USA 87:5331–5335
Galloway GL, Malmberg RL, Price RA (1998) Phylogenetic utility of the nuclear gene ARGININE DECARBOXYLASE: an example from Brassicaceae. Mol Biol Evol 15:1312–1320
Giegerich R, Kurtz S (1997) From Ukkonen to McCreight and Weiner: a unifying view of linear-time suffix tree construction. Algorithmica 19:331–353
Gilbert N, Labuda D (1999) CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs. Proc Natl Acad Sci USA 96:2869–2874
Gogos JA, Hsu T, Bolton J, Kafatos FC (1992) Sequence discrimination by alternatively spliced isoforms of a DNA binding zinc finger domain. Science 257:1951–1954
Grotewold E, Drumond BJ, Bowen B, Peterson T (1994) The myb-homologous P gene controls phlobaphene pigmentation in maize floral organs by directly activating a flavonoid biosynthetic gene subset. Cell 76:543–553
Gubler F, Raventos D, Keys M, Watts R, Mundy J, Jacobsen JV (1999) Target genes and regulatory domains of the GAMYB transcriptional activator in cereal aleurone. Plant J 17:1–9
Hoeren FU, Dolferus R, Wu Y, Peacock WJ, Dennis ES (1998) Evidence for a role for AtMYB2 in the induction of the Arabidopsis ALCOHOL DEHYDROGENASE gene by low oxygen. Genetics 149:479–490
Izsvak Z, Ivics Z, Shimoda N, Mohn D, Okamoto H, Hackett PB (1999) Short inverted-repeat transposable elements in teleost fish and implications for a mechanism of their amplification. J Mol Evol 48:13–21
Jefferson RA, Kavanagh TA, Bevan MW (1987) GUS fusion: β-glucuronidase as a sensitive and versatile marker in higher plant. EMBO J 6:3901–3907
Jiang J, Nasuda S, Dong F, Scherrer CW, Woo S-S, Wing RA, Gill BS, Ward DC (1996) A conserved repetitive DNA element located in the centromeres of cereal chromosomes. Proc Natl Acad Sci USA 93:14210–14213
Kumar A, Bennetzen JL (1999) Plant retrotransposons. Annu Rev Genet 33:479–532
Landschütze V, Willmitzer L, Müller-Röber B (1995) Inhibition of flower formation by antisense repression of mitochondrial citrate synthase in transgenic potato plants leads to specific disintegration of the ovary tissues of flowers. EMBO J 14:660–666
Le QH, Wright S, Yu Z, Bureau T (2000) Transposon diversity in Arabidopsis thaliana. Proc Natl Acad Sci USA 97:7376–7381
Lynn K, Fernandez A, Aida M, Sedbrook T, Tasaka M, Masson P, Barton MK (1999) The PINHEAD/ZWILLE gene acts pleiotropically in Arabidopsis development and has overlapping functions with the ARGONAUTE1 gene. Development 126:469–481
Maddison DR (1991) The discovery of multiple islands of most parsimonious trees. Syst Zool 40:315–328
Margush T, McMorris FR (1981) Consensus n-trees. Bull Math Biol 43:239–244
Murashige T, Skoog F (1962) A revised medium for rapid growth and bioassay with tobacco tissue cultures. Physiol Plant 15:473–497
Oosumi T, Garlick B, Belknap WR (1995) Identification and characterization of putative transposable DNA elements in solanaceous plants and Caenorhabditis elegans. Proc Natl Acad Sci USA 92:8886–8890
Pelissier T, Tutois S, Deragon JM, Tourmente S, Genestier S, Picard G (1995) Attila, a new retroelement from Arabidopsis thaliana. Plant Mol Biol 29:441–452
Pérez-Amador MA, Carbonell J, Granell A (1995) Expression of arginine decarboxylase is induced during early fruit development and in young tissues of Pisum sativum (L.). Plant Mol Biol 28:997–1009
Richards EJ, Ausubel FM (1988) Isolation of a higher eukaryotic telomere from Arabidopsis thaliana. Cell 53:127–136
Sessa G, Morelli G, Ruberti I (1993) The Athb-1 and -2 HD-Zip domains homodimerize forming complexes of different DNA binding specificities. EMBO J 12:3507–3517
Shields JM, Yang VW (1998) Identification of the DNA sequence that interacts with the gut-enriched Krüppel-like factor. Nucleic Acids Res 26:796–802
Smit AFA, Riggs AD (1996) Tiggers and other DNA transposon fossils in the human genome. Proc Natl Acad Sci USA 93:1443–1448
Soyka S, Heyer AG (1999) Arabidopsis knockout mutation of ADC2 gene reveals inducibility by osmotic stress. FEBS Lett 458:219–223
Swofford DL (1998) PAUP: phylogenetic analysis using parsimony, version 4.0. Illinois Nature History Survey, Champaign, Ill.
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
The International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
Thompson JD, Desmond D, Higgins DG, Gibson TJ (1994) CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighing, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Tiburcio AF, Kaur-Sawhney R, Galston AW (1990) Polyamine metabolism in plants. In: Miffin BJ, Lea PJ (eds) Intermediary nitrogen metabolism (The Biochemistry of plants, vol. 16). Academic Press, New York, pp 283–325
Tu Z (1997) Three novel families of miniature inverted-repeat transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc Natl Acad Sci USA 94:7475–7480
Tu Z (2000) Molecular and evolutionary analysis of two divergent subfamilies of a novel miniature inverted repeat transposable element in the yellow fever mosquito, Aedes aegypti. Mol Biol Evol 17:1313–1325
Watson MB, Emory KK, Piatak RM, Malmberg RL (1998) Arginine decarboxylase (polyamine synthesis) mutants of Arabidopsis thaliana exhibit altered root growth. Plant J 13:231–239
Weiner P (1973) Linear pattern matching algorithms. Proceedings of the 14th IEEE Symposium on Switching and Automata Theory 1–11
Wessler SR, Bureau TE, White SE (1995) LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr Opin Genet Dev 5:814–821
Whyatt DJ, deBoer E, Grosveld F (1993) The two zinc finger-like domains of GATA-1 have different DNA binding specificities. EMBO J 12:4993–5005
Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Prüss M, Reuter I, Schacherer F (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28:316–319
Yanagisawa S, Schmidt RJ (1999) Diversity and similarity among recognition sequences of Dof transcription factors. Plant J 17:209–214
Yang G, Dong J, Chandrasekharan MB, Hall TC (2001) Kiddo, a new transposable element family closely associated with rice genes. Mol Genet Genomics 266:417–424