Euphytica 139: 227–247, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.


Plant transposable elements, with an emphasis on grass species

François Sabot, Delphine Simon & Michel Bernard∗

UMR INRA-UBP Amélioration et Santé des Plantes, Domaine de Crouël, 234 Avenue du Brézet, F-63039 Clermont-Ferrand Cedex 2, France; (∗ author for correspondence: e-mail: [email protected])

Received 2 June 2004; accepted 12 September 2004

Key words: evolution, grass, genome organization, transposable elements

Summary

Transposable elements are present in all genomes known so far, and are able to change their genomic location and/or their copy number within the genome. They are mobile endogenous genetic elements, with a large variety of structures and transposition mechanisms. In crops, they make up the major part of the nuclear DNA, up to 80% in some cereals such as maize and wheat. Despite their omnipresence, they remain largely unknown and uncharacterized within the Poaceae family. In this review, we describe a possible classification of the elements present in this family, some of their known transposition mechanisms, their known activity and possible action in crops, and their possible origin.

Abbreviations: DR: Direct Repeats; LARD: LArge Retrotransposon Derivative; LINE: Long Interspersed Nuclear Element; LTR: Long Terminal Repeats; MITE: Miniature Inverted Repeats Transposable Element; ORF: Open Reading Frame; SINE: Short Interspersed Nuclear Element; TRIM: Terminal Repeats In Miniature; TSD: Target-Site Duplication; UTR: Untranslated Region

Introduction

Transposable elements are mobile genomic DNA sequences that can modify their number and/or their location within a host genome. The presence of transposable elements was suspected in maize in the early 1950s by McClintock (1950), who studied seed color variegation. Since then, they have been identified in bacterial as well as in eukaryotic genomes (for first reviews, Doolittle & Sapienza, 1980; Hickey, 1982; Orgel & Crick, 1980). They constitute a large part of eukaryotic genomes, from 5% for S. cerevisiae (Finnegan, 1989) to more than 80% for cereal crops (Flavell et al., 1977), with entire or partial sequences scattered throughout the genome (Thompson-Stewart et al., 1994). Their omnipresence in all analyzed genomes, their similarities even across the kingdoms of life, and their large share of genome content make them the central theme of numerous studies.
The exact origin of these transposable elements is unknown, and their importance at the genomic level and their involvement in cellular regulation are poorly understood, but their commonness across all kingdoms of life suggests that they are essential to life. They are not merely “junk DNA” or “semi-parasitic genes” of the host genome but constitute a component of the genome. In this review, we survey the transposable elements of cereal crops: we propose a classification, an overview of transposable element actions, some of their known regulations, and their genomic and cellular environment. Finally, we discuss their origin and their transmission among individuals, offspring and species.

Classification

Transposable elements are generally classified according to their transposition intermediate and their known or supposed transposition mechanisms (Finnegan, 1989; for review, Capy et al., 1998; Kumar & Bennetzen, 1999). Transposition occurs in cis when the element is autonomous and translates its own proteins, or in trans when the element is defective, in which case it depends on the coding abilities of autonomous elements. The simple classification based on transposition intermediates (Finnegan, 1989) was soon outdated by the growth of knowledge on transposable elements. However, the transposition intermediate still separates transposable elements into two major classes: Class I elements have a reverse-transcribed RNA intermediate, whereas Class II elements have only a DNA intermediate (Table 1). Other subclasses of elements have been identified in other plants or in animals, such as BEL, DIRS-1 (Goodwin & Poulter, 2001), or Penelope (Volff et al., 2001a). These types of elements have not been identified so far in grass genomes, and are not considered in this review.

Table 1. Non-exhaustive list of main elements found in main grass species

Class     Subclass                 Group      Species           Examples (non-exhaustive)
Class I   LTR retrotransposon      gypsy      Rice              RIRE3, SZ, OSR
                                              Wheat             Fatima, Sabrina
                                              Barley            BAGY
                                              Maize             Reina, Grande
                                   copia      Rice
                                              Wheat             Angela, Wis
                                              Barley            BARE-1
                                              Maize
                                   athila     Wheat and barley  Wham, Sabrina
                                   LARDs      Wheat and barley  Sukkula
                                              Rice              Spip and Sqiq
                                   TRIM       Wheat and barley  Veju
          Non-LTR retrotransposon  LINE       Wheat and barley  Karine
                                   SINE       Wheat             Au
Class II  DNA transposon           CACTA      Wheat and barley  Caspar, Jorge
                                   hAT        Maize             Ac/Ds
                                   Mutator    Maize             MuDR
                                              Wheat and barley  Mutator
          MITEs                    Stowaway   Wheat and barley  Hades, Fortuna
                                   MDM        Rice              MDM
          LITEs/FoldBack           NA         Wheat and barley  Apollon, Zeus
          Helitrons                NA         Maize             Helitron

Class I elements

Class I elements (retroelements) have an RNA intermediate during transposition, which is reverse-transcribed into DNA by a reverse transcriptase (RT). If the element is autonomous, this RNA is integrated in a new genomic location by a (normally) self-encoded protein complex. This RNA intermediate either integrates indirectly, through a cytoplasmic reverse-transcribed DNA intermediate, or is directly reverse-transcribed at the new genomic site. Autonomous Class I elements contain complete coding sequences, in one or more open reading frames (ORFs), and generally have no introns: their mRNA is normally not prone to splicing. However, the ribosome may shift between different ORFs (when they exist) during synthesis of the protein. Class I elements generally integrate into a new genomic location after an asymmetric double-strand break. Resolution of the integration forms direct repeats (DR), or target-site duplications (TSD), at both ends of the newly integrated element (Figures 1–3) (for review, Capy et al., 1998). This is a “Copy-and-Paste” replicative mechanism.

Figure 1. Class I elements structure. LTR: Long Terminal Repeat; U3: Unique 3′ RNA; R: Repeated RNA; U5: Unique 5′ RNA; CP: Capsid Protein; EN: ENdonuclease; INT: INTegrase; PR: PRotease; RT: Reverse Transcriptase; PBS: Primer Binding Site; PPT: PolyPurine Tract; NA: Nucleic Acid binding moiety; IR: Inverted Repeat; DR: flanking target Direct Repeat; TRIMs: Terminal Repeats In Miniature; TDR: Terminal Direct Repeats; LARDs: LArge Retrotransposon Derivatives; 5′ UTR: 5′ UnTranslated Region; 3′ UTR: 3′ UnTranslated Region; Pol III A & B: RNA Polymerase III promoter recognition sites; LINEs: Long Interspersed repetitive Elements; SINEs: Short Interspersed repetitive Elements.

Figure 2. Retrotransposition model (LTR retrotransposon example). See the text for further explanations. VLP: Virus-Like Particle.

Figure 3. Retroposition model (LINE with poly(A) tail example). (a) A staggered double-strand break is made by the LINE itself or by another mechanism. (b) After poly(A) hybridization, reverse transcription proceeds from 3′ to 5′ along the LINE mRNA. (c) 90% of the time, reverse transcription stops before completion, with elimination of the DNA/RNA duplex; the RNA is destroyed. (d) The DNA is then repaired and complemented, with formation of direct repeats (DR).

The specific mechanism of transposition is used to distinguish two subclasses within the Class I elements. Sequence homologies at the protein level and the sequence organization of the element are classically used to determine the superfamily of the element (Xiong & Eickbush, 1990) within each subclass. Sequence homology and variation within the superfamily are then used to determine the specific family of each element. Retrotransposons form the first subclass; their reverse transcription takes place in a virus-like particle (VLP, Figure 2) in the cytoplasm: they have a cytoplasmic DNA intermediate. The second subclass is formed by the retroposons, which are directly reverse-transcribed at their new insertion site (Figure 3). The copy number of Class I elements increases at each transposition cycle; this class is therefore generally very abundant in grass genomes (more than 100,000 copies per haploid genome for some elements of the same family).

Retrotransposons

LTR retrotransposons: The first retroelement subclass is composed of elements with long terminal repeats, or LTRs. This superfamily is the best known, and probably the most abundant in crops. The average length of these elements depends on their subclass. Their LTRs are of various lengths (from 100 bp to 5 kb), organized in three regions (U3, R and U5), and have inverted repeats (IR, Figure 1) at their ends, generally delimited by the dinucleotides 5′-TG...CA-3′. An RNA polymerase II promoter (TATA box) and a transcription initiation site are located within the LTR of active elements. Between the two LTRs, one to three ORFs are present, depending on the family. This division is based on the structural organization of the Pol region (Figure 1), the origin of the reverse transcriptase sequence. There are numerous different elements, and tens of families in crop genomes (Matsuoka & Tsunewaki, 1996); each type of element can be present in a very large number of copies: from three for Tos17 in rice to 120,000 per haploid genome for BARE-1 elements in barley (Suoniemi et al., 1996).
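The structural features described above (LTR length range, 5′-TG...CA-3′ termini, and the direct repeats flanking a new insertion) can be illustrated with a minimal Python sketch. The function names, the default upper bound on repeat length and the toy sequences are assumptions made for illustration only; they do not come from any published annotation tool, and real pipelines rely on alignment rather than exact string matching.

```python
def looks_like_ltr(seq: str, min_len: int = 100, max_len: int = 5000) -> bool:
    """True if seq has 5'-TG...CA-3' termini and a length between 100 bp and 5 kb."""
    s = seq.upper()
    return min_len <= len(s) <= max_len and s.startswith("TG") and s.endswith("CA")


def find_tsd(left_flank: str, right_flank: str, max_len: int = 16) -> str:
    """Longest exact direct repeat flanking an element.

    Looks for the longest suffix of the left flank that is identical to the
    prefix of the right flank. max_len is an assumed upper bound for this sketch.
    """
    left, right = left_flank.upper(), right_flank.upper()
    for k in range(min(max_len, len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


# Toy data (invented sequences, for illustration only)
candidate_ltr = "TG" + "ACGT" * 30 + "CA"
print(looks_like_ltr(candidate_ltr))
print(find_tsd("AAACGTTAC", "GTTACGGG"))
```

In this toy case the five bases GTTAC occur on both sides of the hypothetical element, as expected for a target-site duplication produced by a staggered break.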
They are transcribed into a single mRNA, which allows the formation of the Gag and Pol proteins; the Pol product is then cleaved during post-translational maturation by its own internal protease (PR, Figure 1). The Gag proteins polymerize and form the cytoplasmic VLP. A structural RNA (a tRNA, for example) is generally used as primer for the circular reverse transcription (Figure 2). As a result of this mechanism, the two LTRs are identical at the time of insertion. After synthesis of the second DNA strand, the new double-stranded DNA integrates at the genomic target site (2 to 16 bp) in an asymmetric double-strand break made by the integrase (INT, Figures 1 & 2). In grasses, and in plants generally, there are many more LTR retrotransposons and derived sequences than in animals: this diversity reflects a possible ancient association between plant genomes and LTR retrotransposons (Wessler et al., 1995). Xiong & Eickbush (1990) first described two types of LTR retrotransposons, but three groups can now be identified in crops from sequence homologies and Pol sequence organization.

1. copia group: the first to be discovered were Ty1 in yeast (Transposon yeast; for review, Boeke, 1989) and copia in the fruit fly Drosophila melanogaster (Young, 1977; for review, Mount & Rubin, 1985). Their Pol region is organized as PR-INT-RT-RNAseH. RT and RNAseH are part of a single protein with two active domains. Their mean length is generally about 8 kb.
2. gypsy group: the Pol region is organized as PR-RT-RNAseH-INT. The first to be identified in this group were Ty3 in yeast (for review, Boeke, 1989) and gypsy in D. melanogaster (Modolell et al., 1983). They are longer than copia elements, with a mean length of about 10 kb.
3. athila group: these elements, first isolated in A. thaliana (Pélissier et al., 1995), are closely related to gypsy LTR retrotransposons and are 10–12 kb long. Nevertheless, their internal sequences differ from those of gypsy elements, and their LTRs are generally longer.

Terminal-repeat retrotransposons in miniature (TRIMs): Witte et al. (2001) identified a new Class I superfamily, very similar to LTR retrotransposons, called TRIMs. They are short (about 500 bp) and bordered by 100–250 bp long “LTRs”, called terminal direct repeats (TDRs). TRIM insertion also forms target-site duplications (TSD or DR, Figure 1). The TDRs surround a central region of 100–300 bp (Figure 1), with a PBS and a PPT but no coding sequences. TRIMs are defective and non-autonomous, despite their large copy number.
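The Pol domain orders given above for the copia and gypsy groups lend themselves to a simple lookup. The following Python sketch is only an illustration: the function name and the input format (an ordered list of domain labels, assumed to come from some upstream annotation step) are hypothetical, and the sketch does not attempt to separate athila elements, for which the text gives no distinct domain order.

```python
# Pol domain orders as stated in the text for the two groups.
POL_ORDERS = {
    ("PR", "INT", "RT", "RNAseH"): "copia",
    ("PR", "RT", "RNAseH", "INT"): "gypsy",
}


def classify_by_pol_order(domains):
    """Return 'copia', 'gypsy' or 'unclassified' for an ordered list of Pol domains."""
    return POL_ORDERS.get(tuple(domains), "unclassified")


print(classify_by_pol_order(["PR", "INT", "RT", "RNAseH"]))
print(classify_by_pol_order(["PR", "RT", "RNAseH", "INT"]))
print(classify_by_pol_order(["RT", "PR"]))
```

Elements lacking coding sequences, such as TRIMs, would simply come out as unclassified under this criterion.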
TRIM transposition probably depends on LTR retrotransposons, and their TDRs are probably identical at the time of insertion, provided that they share the same retrotransposition mechanism. It is also possible that these elements arose through recombination between LTR retrotransposons that reduced their size: large internal deletions may have occurred in a large portion of LTR retrotransposons, leading to the appearance of TRIMs. They are present in both monocotyledonous and dicotyledonous plants. Poaceae TRIMs and Arabidopsis TRIMs share about 60–75% sequence homology, even 60 million years after divergence. These observations suggest either (i) a constant trans-activation of TRIMs, although reverse transcriptase errors would then have led to more variation than observed, or (ii) sequence conservation through recruitment by the plant for currently unknown function(s). TRIMs are present throughout the Arabidopsis genome, and are associated with coding sequences (ESTs from rice and maize), introns or promoters. These elements are often targeted for insertion by other types of retrotransposons, LTR retrotransposons or others (for review, Witte et al., 2001).

LARDs (LArge Retrotransposon Derivatives) were recently identified in barley and related genomes (Kalendar et al., 2004). These elements were previously characterized as gypsy LTR retrotransposons, such as the Sukkula elements in barley (Jiang et al., 2002a, 2002b) or the Dasheng elements in rice (Shirasu et al., 2000). LARDs consist of large LTRs (about 4.5 kb) and a highly conserved internal domain (about 3.5 kb) (Figure 1). No ORF can be detected within the internal sequence, but potential RNA secondary structures with a high level of stability have been demonstrated. These elements are non-autonomous, and probably depend on gypsy elements such as Erika (a barley and wheat element) or RIRE3 (a rice element) for their transposition. LARDs seem to be recent elements, likely to be active in Triticeae genomes: empty sites were detected in the Ae. tauschii D genome (Kalendar et al., 2004).

Retroposons

Retroposons are not bordered by LTRs/TDRs, and probably form a very large number of different families; the copy number of each element is also thought to be large (up to 100,000 in animals). They are split into two groups, based on their coding sequences (or lack of coding sequences). Their retrotransposition mechanism, also called “retroposition,” does not use a cytoplasmic DNA intermediate, unlike that of retrotransposons.
Long Interspersed transposable Elements, or Long Interspersed Nuclear Elements (LINEs), are 1–8 kb long, contain one or two ORFs, and end at their 3′ extremity with a polyadenine tract or with repeats of a simple motif (Figure 1). The first ORF seems to be equivalent to a Gag sequence (Dawson et al., 1997), and the second ORF to Pol, but without recognizable PR and INT sequences. Their supposed expression involves internal RNA polymerase II and III promoters, with the formation of a bicistronic mRNA (two adjacent ORFs on the same mRNA). An apurinic/apyrimidinic (AP) endonuclease, Gag-like ORF (Volff et al., 2001b), allows the integration of the newly reverse-transcribed DNA (reviewed in Kumar & Bennetzen, 1999). Reverse transcription occurs during integration, directly at the target location (Figure 3). This endonuclease-like protein could also have an activity comparable to that of a restriction enzyme, with target-site recognition (Jurka, 1997), instead of an AP endonuclease activity (Volff et al., 2001b). Moreover, this protein contains zinc-finger-like structures, which are known to allow protein–DNA interactions (Schmidt, 1999), with cooperative single-strand nucleic acid binding ability (Kolosha & Martin, 1997). LINEs can also integrate during DNA repair (Moore & Haber, 1996), although the double-strand breaks could be made by the restriction enzyme-like activity of the AP endonuclease (Jurka, 1997). Their integration into genomic DNA starts at the 3′ end, but is completed only 10% of the time, so that the genome contains many more truncated LINEs, retaining their 3′ part, than complete ones (for review, Jurka, 1997) (Figure 3).

Short Interspersed transposable Elements, or Short Interspersed Nuclear Elements (SINEs), are 100–300 bp long and do not code for any transposition functions (Figure 1). They are thought to be omnipresent in eukaryotes, but seem to be less common in grasses than other Class I elements. Only a few examples have been described in the Triticineae, which have some of the largest grass genomes, such as the Au SINE (Yasui et al., 2001). SINEs are RNA polII or polIII-derived products integrated after accidental reverse transcription and transposition errors of LINEs or LTR retrotransposons. As SINEs are parasites of the transposition machinery of autonomous Class I elements, their transposition mode depends on the type of element (LINE or retrotransposon) that led to the creation of the SINE. Unlike spliced pseudogenes (which are also reverse-transcribed sequences), they generally contain an internal RNA polIII promoter system that enables their transcription.
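The 10% completion frequency quoted above for LINE integration explains why truncated copies dominate. The short Python simulation below is a sketch under stated assumptions: the full length of 6,000 bp (chosen within the 1–8 kb range given in the text), the uniform distribution of the stopping point, and the function name are all hypothetical choices made for illustration, not measured properties of any LINE family.

```python
import random


def simulate_line_insertions(n_insertions: int, full_length: int = 6000,
                             completion_rate: float = 0.10, seed: int = 0):
    """Return the lengths of simulated LINE copies.

    Reverse transcription starts at the 3' end; with probability
    completion_rate it reaches the 5' end (complete copy), otherwise it stops
    at a uniformly drawn position (assumption), leaving a copy that keeps
    only its 3' part.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_insertions):
        if rng.random() < completion_rate:
            lengths.append(full_length)
        else:
            lengths.append(rng.randint(1, full_length - 1))
    return lengths


lengths = simulate_line_insertions(10_000)
complete = sum(length == 6000 for length in lengths)
print(f"complete copies: {complete} of {len(lengths)}")
```

With these assumptions roughly one simulated copy in ten is full-length, and all others are truncated at their 5′ side.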
Class II elements

In grasses, Class II elements have either a conservative transposition mechanism (“Cut-and-Paste”) or a replicative one (“Copy-and-Paste”). For all Class II elements, the transposition intermediate is DNA. The DNA sequence moves or duplicates “itself” from its first location (donor site) to another genomic location (acceptor site, Figure 4); in the first two subclasses, the protein responsible for these movements is the transposase. The transposition enzymes can be encoded by the element itself (autonomous element), or by another element (in the case of a “defective” element). As the intermediate is DNA, the element sequence can have introns, which can be spliced. Class II elements are distributed in three groups in grasses, depending on their transposition mechanism and/or their length, each group being further subdivided into superfamilies, families, subfamilies, etc.

Figure 4. “Cut-and-Paste” transposition system for Class II transposable elements. The double-strand break is staggered, which leads to the formation of direct repeats (DR). After excision, the DR stay at the donor site, leaving a footprint of the transposition/excision. gDNA: genomic double-strand DNA.

DNA transposons per se

Transposons were historically the first transposable elements identified: the Ac/Ds (Activator/Dissociation) pair of elements in maize was responsible for the grain color variegation observed by McClintock (1950), but they were cloned only 30 years later (Fedoroff et al., 1983). Transposons are divided into different subclasses, according to their transposase sequence and their target-site sequence. The first difference between the two subclasses is the presence or absence of a DDE motif (two aspartates, D, followed by a glutamate, E, 35 residues later) in the transposase sequence (Capy et al., 1998). There can be small variations of this DDE/DDD motif, such as DD(34)E or DD(35)D. In crops, the majority of known DNA transposons do not exhibit this feature. DNA transposons are further classified according to their sequence homologies, and each superfamily is named after its most representative element, or the first one discovered (Capy et al., 1998). The first subgroup contains DNA transposons showing the DDE/DDD motif in their transposase sequence, and is subdivided into superfamilies. The second subgroup of DNA transposons lacks the DDE/DDD motif, and brings together various superfamilies on this single criterion. These superfamilies are hAT, P, MuDR, CACTA, Tx and ISb, for example. In grass genomes, the majority of DNA transposons belong to the second group, with the hAT, MuDR and CACTA superfamilies and the Ac, MuDR and En/Spm elements as their main respective representatives. In Poaceae, and especially in Triticeae, CACTA elements are widespread, with a high copy number (more than 3000 copies) in each genome (Langdon et al., 2003; Wicker et al., 2003).

Whatever the subgroup, the supposed standard mechanism of these elements requires the transposase to recognize its target sequence: a specific transposase recognizes specific terminal inverted repeats (TIRs). This transposase is translated from the mRNA of an active element. The transposase allows the excision of the DNA transposon sequence and its re-integration at the target site (Figure 4). This activity generates excision footprints: even when the element is correctly excised from the donor site, the repair of the gaps generated by excision leaves two DR at the donor site, forming a footprint (Figure 4). Moreover, excision can be abnormal, and the element can then transfer additional DNA sequences to the acceptor site (transduction of flanking genomic sequences). When an element integrates, the transposase makes an asymmetric double-strand break, which is repaired by homologous or heterologous recombination (end joining) (for review, Pastink et al., 2001). This creates TSDs on each side of the element, by duplication of the target site (Figure 5). Target-site selection seems to be influenced by the neighboring upstream and downstream sequences (Plasterk, 1999).

Figure 5. Canonical structure of Class II elements. DRs/TSDs are formed by target-site duplication, and leave a footprint after element excision. The coding sequence can contain several ORFs and introns.

A defective element, with no functional transposase but recognizable TIRs, can be mobilized in trans. This has been shown for Ds, which is recognized by the Ac transposase, or for Dspm, which uses the Spm transposase. Conversely, an element coding for a functional transposase but containing imperfect TIRs cannot move. This last property can be used in genetic transformation (Kumar & Hirochika, 2001). It must be noted that some MULE DNA transposons (Mutator-like elements) do not have TIRs and are called non-TIR-MULEs. They are very abundant in genomes, and can transpose because they have non-TIR sequences recognized by their own specific transposase (Turcotte et al., 2001). These elements are long (8 kb mean length, and up to 23 kb; Chopra et al., 1999), but their copy number is generally lower than that of Class I elements, in the hundreds for each element (for review, Capy et al., 1998). This “lower abundance” is due to the fact that transposon copy number can increase only if transposition occurs during the S phase (DNA synthesis) of the cell cycle, and only if the acceptor site is located downstream from the replication fork (Figure 6). In S phase, the double-strand break at the donor site is repaired by homologous recombination with the newly synthesized sister chromatid (Figure 6), which results in two copies. The transposition of these elements requires only a transposase source in vitro, but host factors are necessary to resolve transposition in vivo (Beall & Rio, 1996; Mahillon & Chandler, 1998; Mizuuchi, 1992), possibly as a regulation mechanism.

Figure 6. Class II element transposition during DNA replication, in S phase of the cell cycle. The transposition double-strand break is repaired by ectopic recombination with the sister chromatid. One non-replicative element thus gives rise to two elements.

Miniature inverted-repeat transposable elements (MITEs)

MITEs are small Class II elements (from 30 to 500 bp), terminated by TIRs and flanked by TSDs. They are AT-rich, have specific target sites, and are often associated with genes (in 5′ or 3′, or even in coding sequences) (for first reviews, White et al., 1994), or nested (Jiang & Wessler, 2001). Nevertheless, they do not contain coding sequences. The most studied in cereal crops are the Stowaway and Tourist elements. Transposase sources have now been identified for different MITEs. The maize Tourist MITEs are mobilized by the transposase of PIF (a DNA transposon), whereas the rice Tourist element mPING is under the influence of the PONG transposase (another DNA transposon) (Jiang et al., 2003; Zhang et al., 2001, 2004). Similarly, the Stowaway-like MITEs from rice seem to depend on mariner-like elements for their transposition (Feschotte et al., 2003). Finally, some Mutator-derived MITEs (MDM-1 and MDM-2) are present in rice (Yang & Hall, 2003). It has thus been proposed to classify each MITE according to its transposase source, as an element of the corresponding subfamily (Capy, 2003). MITEs are present in a large number of copies (up to 100,000 for HeartBreaker in maize) (Casa et al., 2000), and are generally more numerous than their corresponding DNA transposons. MITEs are able to form stable secondary DNA structures, and are often associated with Matrix Attachment Regions (MARs) (Bennetzen, 2000).

Helitron-like elements: rolling-circle transposons

Kapitonov & Jurka (2001) recently discovered Class II elements with a rolling-circle transposition cycle (Figure 7), like some bacterial elements (e.g., IS91), and called them helitrons. Helitrons are very different from other Class II elements: they are not bounded by TIRs, they have a very small target site (AT, as for Tc1/mariner Class II elements) but generate no DR, and they have a replicative transposition mechanism. They do not contain specific internal motifs, but start with 5′-TC and end with CTRR-3′ (R = A/G, Figure 7). Moreover, there is a palindromic sequence of 16–20 nucleotides, 11 bp upstream from the helitron 3′ end. This structure seems to be important for transposition. Their internal sequence encodes a helicase that enables the transposition of the element. Their lack of canonical structure prevented their earlier identification as transposable elements in the majority of the analyzed genomes. Nevertheless, they seem to be present in a great number of copies in the eukaryotic genomes studied: they constitute 2% of the A. thaliana and C. elegans genomes (Feschotte & Wessler, 2001; Kapitonov & Jurka, 2001). They have recently been detected in the maize and Sorghum genomes (Lal et al., 2003); the maize helitrons are specific and differ from Arabidopsis helitrons. In maize, their insertions are responsible for some mutations, such as the Sh2 phenotype. Sorghum seems to have fewer helitron sequences than maize (Lal et al., 2003). Helitrons are mainly defective (internal deletions), but they could probably be trans-mobilized: the defective copies are too similar to each other to have been created by different deletion events. However, their trans-activation seems difficult because of their transposition mechanism. It appears that helitrons are independent of host cell proteins for their transposition, unlike other Class II elements, and code for all the proteins necessary for their transposition (transposase, resolvase, etc.) (reviewed in Feschotte & Wessler, 2001). Moreover, helitrons can sometimes transfer other DNA sequences to new genomic locations by transduction. Kapitonov & Jurka (2001), Feschotte & Wessler (2001), and Lal et al. (2003) suspect that helitrons could be the evolutionary missing link between prokaryotic rolling-circle elements and geminiviruses, or, conversely, that helitrons could derive from geminiviruses.

Figure 7. Helitron structure and possible transposition mechanism. (a) General structure and important features possibly required for transposition. C. elegans helitrons have only one gene, but A. thaliana and O. sativa helitrons have two or three ORFs. (b) Possible “Rolling-Circle” transposition mechanism, based on a bacterial element (IS91). The element (in dark grey) can be autonomous or not. Two transposase molecules (black dots) cut the donor and acceptor sites, and ligate the 3′ donor strand to the 5′ acceptor strand. Replication starts at the free 3′ end of the donor site, and displaces one strand of the helitron. If the palindromic structure and the 3′ sequence are correctly recognized (left), a cut is made after the CTRR, and the heteroduplex is resolved during DNA replication, without adding sequences other than the helitron. If the palindromic structure, or the CTRR, is not recognized (right), the cut is made further away, and helitron transposition leads to the transduction of flanking sequences (in light grey) to another genomic location: “exon-shuffling”.

FoldBack elements

The FoldBack elements (Potter, 1982) have not been extensively studied in grasses, and their mechanism of transposition is unknown. It was first proposed that they transpose by means of an RNA intermediate. They can form a perfect or nearly perfect secondary structure called a hairpin (Adé & Belzile, 1999). In grasses, they are generally small (less than 3 kb), and are flanked by very imperfect LTR-like structures (reviewed in Capy et al., 1998; Rebatchouk & Narita, 1997).
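Whether the two ends of a sequence can pair with each other, as in the hairpins formed by FoldBack elements or the terminal inverted repeats of other Class II elements, can be checked by comparing the first bases with the reverse complement of the last bases. The Python sketch below is a simplified illustration: the function names and the toy sequences are assumptions, gaps are not handled, and only the four unambiguous bases A, C, G and T are supported.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_identity(seq: str, k: int) -> float:
    """Fraction of the first k bases that can pair with the last k bases.

    1.0 means a perfect inverted repeat over k bases; lower values indicate
    an imperfect one. No gaps are allowed in this sketch.
    """
    if k <= 0 or 2 * k > len(seq):
        raise ValueError("k must be positive and at most half the sequence length")
    s = seq.upper()
    head = s[:k]
    tail_rc = reverse_complement(s[-k:])
    return sum(a == b for a, b in zip(head, tail_rc)) / k


# Toy sequences (invented): a perfect and an imperfect 10-bp inverted repeat
perfect = "GGCCAGTCAC" + "TTTTTTTT" + reverse_complement("GGCCAGTCAC")
imperfect = "GGCCAGTCAC" + "TTTT" + "GTGACTGGCA"
print(terminal_inverted_repeat_identity(perfect, 10))
print(terminal_inverted_repeat_identity(imperfect, 10))
```

The second toy sequence differs from a perfect repeat at a single terminal position, so nine of its ten terminal bases can still pair.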
It has been proposed that FoldBacks are long MITEs (Long Inverted repeat Transposable Elements, LITEs), longer than 500 bp, probably derived from MuDR Class II DNA transposons.

The major problem with the classification of transposable elements is that new elements that do not fit any of the existing classes are regularly identified (TRIMs, LARDs). The discovery of such new elements makes a clear systematic classification more difficult. A system based on synapomorphies and virus-like structure has been proposed for LTR retrotransposons (Hull, 1999), but this kind of organization does not allow the classification of atypical elements, like TRIMs. Moreover, it should be noted that the LTR retrotransposon integrase sequence is very close to the transposase sequence of the DDE DNA transposon subclass (Capy et al., 1998), and that this homology is much greater than that between the DDE and non-DDE transposases.

Host-element interactions and transposition regulation

Transposable elements are defined by their ability to transpose, which means that they can insert on their own into a new genomic location. As this action can have negative effects (in the short term), the host genome and the elements have selected different regulation/repression (trans-regulation) or self-regulation (cis-regulation) systems, respectively (Matyunina et al., 1996). Okamoto & Hirochika (2001) hypothesize that the regulation of a given transposable element by the host genome can differ between two species.

Self-regulation of elements: A few examples

As they need to preserve the viability of their host genome, transposable elements have evolved strategies to reduce their hazardous effects. There is probably one specific system per element, and these mechanisms can be equivalent or totally different even between closely related elements. In cereal genomes, retrotransposons are clustered in very large intergenic spacers, where elements are nested within each other, like pyramids or “Russian dolls” (Figure 8) (Suoniemi et al., 1996; Chantret et al., 2004; Jiang & Wessler, 2001; Kumar & Bennetzen, 1999; SanMiguel et al., 1996). These nested insertions reduce the active copy number, because the multimers thus formed cannot fully transpose. This type of insertion specificity may be a selection criterion for the plant: if one element blocks its “rivals,” this element must be conserved (Kumar & Bennetzen, 1999). These intergenic spacers separate “gene islands” with a high gene density (Avramova et al., 1996; Chantret et al., 2004; SanMiguel et al., 1996). Cereal centromeric regions (heterochromatic regions) seem to be mainly composed of gypsy LTR retrotransposons (Ananiev et al., 1998; Miller et al., 1998; Presting et al., 1998).

Figure 8. Nested insertions in “Russian dolls”. Each arrow represents another transposable element. Such structures can have more than four levels of nested insertions. Here, the oldest element is interrupted by three other elements, and only the most recent element retains an intact structure.

LTRs contain transcriptional silencing sequences, or do not code for cis-activators: they cannot activate themselves without host cell enzymes (Wessler et al., 1995). This is also true for the vast majority of Class II elements, which need host proteins to transpose, with the exception of helitrons (Beall & Rio, 1996; Feschotte & Wessler, 2001; Kapitonov & Jurka, 2001; Mahillon & Chandler, 1998; Mizuuchi, 1992). MITEs insert preferentially in AT-rich regions, which are usually gene-poor regions (Hu et al., 2000), but possibly regulatory regions (Bureau & Wessler, 1994a) or introns (Chantret et al., 2004): they do not interrupt coding sequences, but can disrupt the regulation of genes. Small defective elements like TRIMs, SINEs or MITEs are more abundant than the complete elements that allow them to transpose. Some authors suggest that this is because small elements have a weak mutagenic “power” (Hu et al., 2000; Kumar & Bennetzen, 1999; Wessler et al., 1995; Witte et al., 2001). However, “low-copy” number elements are more often found near coding sequences, or in active and non-methylated regions, in contrast to “high-copy” number elements.
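The “Russian doll” organization of nested insertions described above can be quantified from annotated element coordinates. The following Python sketch assumes, purely for illustration, that each element is given as a (start, end) pair covering its whole span, inserted elements included; the function name and the coordinates are hypothetical.

```python
def nesting_levels(spans):
    """Map each (start, end) span to the number of other spans that contain it.

    Level 0 is the outermost (oldest) element; higher levels correspond to
    elements inserted into elements that were already present.
    """
    levels = {}
    for start, end in spans:
        levels[(start, end)] = sum(
            1
            for other_start, other_end in spans
            if (other_start, other_end) != (start, end)
            and other_start <= start
            and end <= other_end
        )
    return levels


# Invented coordinates: one old element interrupted by three successive insertions
spans = [(0, 1000), (200, 800), (300, 500), (350, 400)]
levels = nesting_levels(spans)
for span in spans:
    print(span, levels[span])
```

Under this representation, the innermost element is the most recent insertion and the only one expected to retain an intact structure.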
The latter remain in non-coding, hyper-methylated and inactive sequences; high-copy number LTR retrotransposons make up 70% of the maize genome, but few mutations are associated with their insertion (Bennetzen, 2000; Kumar & Bennetzen, 1999; Kumar & Bennetzen, 2000). It is also possible that insertions of such elements near genes or regulatory sequences are eliminated by the host genome, and are counter-selected.

Transposable elements sequence variation and polymorphism

In all the genomes studied to date, transposable elements have been found, even in the simplest prokaryotic genomes. Nevertheless, the elements are not identical, even within a subfamily: their levels of polymorphism differ between species and between element families within species. There are many reasons for this polymorphism, which fall into two major groups: (1) the element itself, (2) the host genome and other external factors (see later section).

The element itself

The most representative example is the retrotransposon reverse transcriptase: it shows a mutation rate of 2.5 × 10⁻⁵ errors per nucleotide per replication cycle, 100–1000 times more than the host genome DNA polymerase (Gabriel et al., 1996). Different types of reverse transcriptase errors have been identified (Boutabout et al., 2001; Preston, 1996): sequence extension after a base mismatch, low dNTP discrimination, absence of 3′ or 5′ exonuclease activity (no repair), primer or template slippage on a long repetitive sequence (ORF modification), and addition of bases at the end of the synthesized sequence. All these errors can lead to the accumulation of defective copies: they produce new insertions that lack the capacity to transpose, and can thus limit the risk of a transposition burst, as is the case for virus infections (Gabriel et al., 1996; Purugganan & Wessler, 1994). The trans-mobilization of defective elements, as in the Ac/Ds system, also increases this polymorphism, as does recombination between different copies of elements of the same type (Shirasu et al., 2000). Two theories have been put forward to explain the transcription/transposition of SINEs (Deragon et al., 1994): (i) the “Cascade”: all new entire SINEs can be retroposed, and can generate new copies; (ii) the “Master Copy” (Deininger et al., 1996): only a very small number of SINE copies are active and able to retropose, and the remaining derived sequences are inactive. Master copies are under the control of a strong RNA pol III promoter (Arnaud et al., 2001; Deragon et al., 1996).

Species or external factors

Transposable element polymorphism is much greater in plants, and particularly in grasses, than in animals.
Two reasons, one mechanical and one developmental (reviewed in Kumar & Bennetzen, 1999), can explain this: (i) The larger genomes of crops allow them to withstand many more new insertions, partly due to retrotransposon activity, and thus much more polymorphism between the different copies of the same element. (ii) The differentiation mechanism of the germline, which, unlike in animals, is not determined during embryogenesis. In animals, the early specification of the germline allows the use of repressive and protective mechanisms against mutations (methylation and repressive chromatin). In plants, any meristem can form a new germline at different times of life, even after a long period of vegetative growth, when the cells have accumulated many mutations, and this also increases the polymorphism level of transposable elements.

General regulation mechanisms by host genome

Transcriptional level

The major route by which the host genome regulates or represses transposable elements seems to be cytosine methylation (Bender, 1998; Henikoff & Matzke, 1997; Yoder et al., 1997). For these authors, methylation-mediated epigenetic regulation was selected to block transposable element activity: a loss or decrease of methylation in transposable element sequences allows their reactivation (Okamoto & Hirochika, 2001), strongly suggesting an important role for methylation in the regulation of transposable elements. Methylation is generally associated with the formation of repressive chromatin (heterochromatin). Cytosine methylation generally occurs in a symmetric context (CpG and CpNpG), but also in an asymmetric context in grasses (CXX).

Post-transcriptional level

PTGS (Post-Transcriptional Gene Silencing) mechanisms seem to participate in transposition control and repression (Baulcombe, 2000; Lawson et al., 1994; Nakayashiki et al., 2001). In maize, the DNA transposon MuDR regulates its transposition through the production of transcripts for non-functional, competing MuDR-related proteins (Rudenko & Walbot, 2001). These transcripts seem to be generated by the nuclear genome, because they are also present in MuDR-defective lines. Moreover, small RNAs of 21–26 nucleotides (which are implicated in silencing) specific to MuDR are constitutively expressed in maize, even in lines lacking Mutator activity (Rudenko et al., 2003). Finally, the accumulation of nuclear MuDR RNA enables the inhibition of this element, along with the methylation of TIRs.

Sequence elimination and mutations

Regulation of transposable elements by the host genome can occur through simple sequence elimination, especially for Class I elements.
The clearest example is the formation of “solo-LTRs” for LTR retrotransposons, LARDs and TRIMs (Figure 9) (Kalendar et al., 2004; Shirasu et al., 2000; Witte et al., 2001). This mechanism allows the direct elimination of the major part of a retrotransposon, leaving behind a solo-LTR. It can also occur between the LTRs of two related but distinct elements, which permits the elimination of a large block of sequences. Solo-LTRs are evidence that genome length expansion is reversible, and is not a “one-way pass.” Kalendar et al. (2000) analyzed wild barley (Hordeum spontaneum) accessions from the Evolution Canyon (Mount Carmel, Israel) and showed that the number of BARE-1 LTRs was greater than that of the complete sequences of this element: solo-LTR formation is an important mechanism. One can hypothesize that small genomes are small partly because of their ability to eliminate repeated sequences faster than large genomes. Moreover, sequence elimination may explain the disappearance of some element families from one genome, and not from another related genome. Repeated sequences are less subject to selection pressure than coding sequences. They are thus more susceptible to point mutations, and this holds for transposable elements (Kumar & Bennetzen, 1999). Methylated and/or heterochromatic sequences are less efficiently repaired after DNA polymerase errors during DNA replication in S phase. Moreover, methylation increases 10-fold the rate of transition from methylated cytosine to thymine. All these mechanisms result in the rapid inactivation and evolution of transposable elements.

Counter selection of deleterious insertions

As for any evolutionary mechanism, a “negative” insertion will be counter-selected, and the individuals bearing this insertion will be lost. Then, in a natural population, only the neutral or “positive” insertions will be retained and transmitted to the offspring. Thus, if an insertion near or within genes behaves as a “negative” insertion, it will disappear during the evolution of the species, so that we do not observe it today.
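As a rough illustration of the solo-LTR mechanism described under “Sequence elimination and mutations” (Figure 9), the following Python sketch treats a genome as a plain string and collapses one LTR-internal-LTR unit into a single LTR. The function names, the toy sequences and the copy-number estimate (total LTR copies minus two per complete element) are assumptions made for this example; they are not taken from the cited studies, which rely on experimental and sequence data rather than exact string matching.

```python
def form_solo_ltr(genome: str, ltr: str) -> str:
    """Recombine the first two copies of `ltr` found in `genome`.

    Everything between the end of the first copy and the end of the
    second copy is removed, so a single (solo) LTR is left behind.
    """
    first = genome.find(ltr)
    if first == -1:
        return genome  # no LTR at all: nothing to recombine
    second = genome.find(ltr, first + len(ltr))
    if second == -1:
        return genome  # only one copy: already a solo LTR
    return genome[: first + len(ltr)] + genome[second + len(ltr):]


def estimate_solo_ltrs(ltr_copies: int, internal_copies: int) -> int:
    """Toy estimate, assuming each complete element accounts for two LTRs."""
    return max(ltr_copies - 2 * internal_copies, 0)


# Hypothetical toy sequences (lower case = flanking host DNA).
LTR = "TGTTGGAA"
element = LTR + "GAGPOLINTERNAL" + LTR
genome = "acgt" + element + "ttga"

collapsed = form_solo_ltr(genome, LTR)
print(collapsed)
print(estimate_solo_ltrs(ltr_copies=10, internal_copies=3))
```

Calling the function again on its own output recombines the next pair of LTR copies, if any, which mimics the loss of a larger block when the LTRs of two related neighboring elements recombine.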

Transposable elements activity and activation

In spite of regulation systems, transposable elements are active under several conditions. In general, stress can activate most of the elements (Hirochika, 1995; Wendel & Wessler, 2000), but some elements have a minimal basal activity throughout the life of the cell under some physiological conditions (Jääskeläinen et al., 1999; Raina et al., 1993). In various crops, activating stresses can be either biotic (bacterial or viral infections, interspecific breeding, ...) or abiotic (chemical products, wind, radiations, ...) (Hirochika, 1995). For example, the rice MITE mPING increases its activity after anther culture (Kikuchi et al., 2003). In the same way, the formation of synthetic wheat allotetraploids triggers the reactivation of retroelements, in particular of Wis2-like LTR retrotransposons (Kashkush et al., 2002, 2003). McClintock (1984) hypothesized that transposable elements are able to reshape genomes under extreme conditions (genomic shock), because transposition is reactivated during different types of stress.

Figure 9. Formation mechanism of “solo” LTRs/TDRs. Recombination occurs between two nearby retrotransposon LTRs, and the intervening DNA sequence is eliminated as a circle. The resulting genomic DNA sequence retains only the solo LTR. This is the case for LTR retrotransposons and TRIMs.

Putative roles and actions of transposable elements

For Orgel & Crick (1980), transposable elements were “selfish DNA”, able to increase their copy number within the genome, but without making any notable contribution to the phenotype. For the same reasons, for Doolittle & Sapienza (1980), transposable elements do not fit the phenotype paradigm: “The major and perhaps only way in which a gene can ensure its own perpetuation is by ensuring the perpetuation of the organism it inhabits.” In the early 1980s, the only known effect of transposable elements was mutations, which are most of the time very deleterious for the organism. All these effects influence only a small part of the studied organisms. Transposable elements were considered as obligatory intracellular, but (in general) non-infectious, parasites, or as “junk DNA” (Doolittle & Sapienza, 1980). It was only during the 1990s, after the systematic sequencing of model genomes, that their abundance and ubiquity became apparent to biologists, despite their lack of major phenotypic effects (Pardue, 2000). According to Fedoroff (2000), transposable elements may play an important role in eukaryotic genomes, but their phenotypic effects often go unseen. For Jurka (1998), transposable elements have contributed to genome creation and improvement, and they continue to act in this way. Thus, the difficulty is to decide between the idea that the elements are purely parasitic and the possibility that they are “symbiotic” partners of genomes. Transposable elements are fundamental components of all known genomes, even if we do not know whether their presence is always necessary for correct genome activity. Sometimes they seem obligatory for the life of the host, sometimes they are neutral, and sometimes they are really deleterious. For Kempken & Windhofer (2001), transposable elements are the “Dr Jekyll and Mr Hyde” of the genome. Even if it is impossible to be definitive about the role of transposable elements, it is now accepted that they are not “junk DNA,” but that they participate in genomic organization and evolution, directly and indirectly.

Null alleles by insertion

The most visible harmful effect of transposable elements is their insertion within coding sequences and, in general, the resulting inactivation of the corresponding gene or protein. For example, this happened to the glutenin gene Glu-1A in bread wheat (Triticum aestivum), where the LTR retrotransposon Wis2 is inserted (Harberd et al., 1987), and to the Y gene of Sorghum bicolor (which controls the accumulation of red phlobaphene pigment in the seed), where a 23 kb DNA transposon is inserted (Chopra et al., 1999). Most often, the first known elements were identified through their insertion in coding sequences, giving null alleles.
Even the wrinkled-seed character that Mendel used to establish his genetic laws in pea results from the ancient insertion of a fossil DNA transposon in the coding sequence of the starch-branching enzyme (Bhattacharyya et al., 1990).

Genomic resource pool, genome restructuration and sequence reorganization

Transposable elements can generate coding sequences: the Stowaway MITE exists in some cDNA 3′ ends, where it creates a polyadenylation site, and in the 5′ region of genes, as a cis-regulatory sequence (Bureau & Wessler, 1994b). For example, the TouristC MITE from rice is associated with ESTs from flowers (Iwamoto & Higo, 2003). Some elements can also rearrange promoter sequences, by promoter scrambling (Kloeckener-Gruissem & Freeling, 1995), and thereby create a new regulation pattern. For example, the rice foldback element Tnr8 might have played a role in shaping the rice genome (Cheng et al., 2000), as might Stowaway elements (Hu et al., 2000): their ubiquitous location within this genome suggests a constant action in the past and probably in the present. They are found in coding and regulatory sequences as well as in non-coding sequences. Some studies suggest that eukaryotes, and principally higher plants such as crops, tolerate a high number of transposable elements within their genome because of the capacity of these elements to produce genetic variability (Kidwell & Lisch, 1997). For instance, these elements can generate sequence variations, and they can move exons from place to place (transduction) during transposition: this is “exon shuffling” (Eickbush, 1999; Feschotte & Wessler, 2001; Giroux et al., 1994; Moran et al., 1999). This transduction can be performed by LTR retrotransposons (maize Bs1; Bureau et al., 1994; Elrouby & Bureau, 2001), by LINEs, DNA transposons or helitrons (Feschotte & Wessler, 2001), even on an entire gene. Class II element excision can also lead to abnormal repair: for the Ac/Ds element, this excision/repair is under the control of sequences 39 bp upstream (5′ sequences) or 18 bp downstream (3′ sequences), through proteins acting on transposition (abnormal excision), on repair (abnormal repair), or on both (Rinehart et al., 1997). It has also been suggested that retrotransposons have played a role in the formation of promoter and enhancer regions (White et al., 1994).
In the same way, the high copy number of Class II CACTA elements (CASPAR, in particular) suggests an important role for these elements in the evolution and structure of grass genomes (Kumar & Bennetzen, 1999; Wessler et al., 1995; Wicker et al., 2003). In synthetic allotetraploid wheats, the reactivation of the Wis2-like LTR retrotransposon leads to the transcription of antisense RNA for various cDNAs, and thus to their elimination by PTGS (Kashkush et al., 2003). Hu et al. (2000) suggest that Stowaway MITEs inserted into R family genes from rice (regulation of the anthocyanin biosynthesis pathway) allow and enhance diversification, and hence decrease interactions between mRNAs that could lead to gene silencing: two or more identical expressed sequences are driven to silencing through sequence homology recognition (co-suppression mechanism). Similarly, the close relationship between TouristC MITEs and ESTs in rice flowers may imply an action of these MITEs in the specificity of genes regulating flowering (Iwamoto & Higo, 2003). According to Wessler (2001), cereal crops are in a phase of genome restructuring by transposable elements, and are the perfect tool to analyze these elements and the genomic modifications they induce.

Genome size increase

Transposable elements can increase the quantity of DNA in the nucleus, but they are not the only contributor to genome size increase (reviewed in Petrov, 2001). Cereals have large genomes that consist mostly of transposable elements. LTR retrotransposons alone are responsible for the doubling of the length of the maize genome during the last 3 million years (SanMiguel et al., 1996). This illustrates the genome “obesity” of plant genomes (Wessler, 2001), or the “C-value paradox”, i.e. the lack of correlation between genome length and organism complexity. Walbot & Petrov (2001) point out that genome expansion is not linear: there are hot-spots of expansion. Some authors suggest that the increase in length simply allows a better resistance under different conditions: a larger genome involves larger nuclei, vacuoles and cells (for a harmonious nucleocytoplasmic ratio), which may confer better drought resistance (Nevo, 2001) because of the higher water content of the cell (Kalendar et al., 2000). On the other hand, an increase in nuclear size without an increase in cell volume leads to a more concentrated cytosol, and thus allows a better resistance to cold (Kalendar et al., 2000). However, there might be a ratio between gene number and genome length: if this ratio is exceeded, a system against excess retrotransposon copies should be initiated (Rabinowicz, 2000), which may involve recombination. More and more evidence suggests an equilibrium between increase and decrease of the genome (Kalendar et al., 2000; Rabinowicz, 2000; Vitte & Panaud, 2003); hence, the increase is not irreversible: it is not a “one-way ticket to genome obesity” (Devos et al., 2002). Thus, for rice (Oryza sativa), Vitte & Panaud (2003) suggest an “increase/decrease model,” with the possibility of a “genomic diet” through solo-LTR formation, internal deletion of transposable elements, and various types of recombination (Bennetzen, 2002).
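The balance between genome expansion and contraction evoked in this section can be illustrated with a deliberately simple toy model, which is not taken from Vitte & Panaud (2003) or from any other cited work. It assumes that a fixed amount of transposable element DNA is gained per time step, and that a fixed fraction of the existing transposable element DNA is lost per step (for instance through solo-LTR formation or internal deletions). All parameter values are hypothetical and serve only to show that such a model settles at an equilibrium equal to the gain divided by the loss fraction, whether the starting content is below or above that value.

```python
def simulate_te_content(initial_mb, gain_mb_per_step, loss_fraction_per_step, steps):
    """Return the TE content (in Mb) at each step of a toy gain/loss model."""
    content = initial_mb
    history = [content]
    for _ in range(steps):
        # Constant gain from new insertions, proportional loss from deletions.
        content = content + gain_mb_per_step - loss_fraction_per_step * content
        history.append(content)
    return history


def equilibrium_te_content(gain_mb_per_step, loss_fraction_per_step):
    """TE content at which gain and loss cancel out in the toy model."""
    return gain_mb_per_step / loss_fraction_per_step


# Hypothetical parameters: 10 Mb gained per step, 1% of TE DNA lost per step.
GAIN, LOSS = 10.0, 0.01
growing = simulate_te_content(100.0, GAIN, LOSS, steps=5000)
shrinking = simulate_te_content(2000.0, GAIN, LOSS, steps=5000)

print(equilibrium_te_content(GAIN, LOSS))
print(round(growing[-1], 3), round(shrinking[-1], 3))
```

In this sketch a genome that starts above the equilibrium shrinks towards it, which is one simple way to picture a “genomic diet”; real rates of insertion and deletion vary over time and between lineages.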

Chromosomal structure and MARs

In grasses, centromeric regions consist mostly of transposable elements (Ananiev et al., 1998; Dong et al., 1998; Fukui et al., 2001; Kishii et al., 2001; Miller et al., 1998; Presting et al., 1998), and this suggests that these elements may play an important function in centromeric activity (Bennetzen, 2000). It now seems that the vast majority of centromeric and pericentromeric regions (heterochromatic regions) are composed of LTR retrotransposons (Ananiev et al., 1998; Miller et al., 1998; Presting et al., 1998). The question is now whether transposable elements insert into heterochromatin or whether their insertion leads to the heterochromatinisation of the region, and whether transposable elements are responsible for the chromatin environment. In the same way, the putative “preference” of MITEs and SINEs for insertion in MARs (matrix attachment regions) suggests a possible involvement of these elements in the supposed recognition/attachment functions of MARs, maybe because of their ability to form secondary structures (Bennetzen, 2000; Lenoir et al., 2001; Tikhonov et al., 2001).

Chromosomal breaks and translocations

Transposable elements were historically discovered because of their ability to induce chromosomal breaks in maize (Ac/Ds system) (McClintock, 1950, 1984). Class II elements are responsible for these chromosomal breaks or for translocations, as they are cut from their original location. If two of them, in opposite orientation, transpose simultaneously from the two sister chromatids, this will most of the time result in a chromosomal breakage and a translocation (Bennetzen, 2000).

Splicing, intron creation, and transposable elements

For Hickey (1982), splicing could be a mechanism selected by transposable elements, resulting in the elimination from their RNA of all non-transposable element sequences. Further, the host genome could have co-opted this system, in its own interest, to eliminate transposable elements from coding sequences.
Thus, introns would be degenerate remnants of transposable elements, and transposable elements only “semi-parasitic genes”; this theory is still under discussion. In the maize Sh2 gene, the Ds element was involved in the creation of a new intron (Giroux et al., 1994). In addition, many Class II elements that are inserted into coding sequences contain cryptic splicing sites, which avoid most of the deleterious effects of their insertion (for review and examples, see Kumar & Bennetzen, 1999). These mechanisms could produce new ORFs and thus new proteins, new enzymes and new cellular functions. According to Barbara McClintock (1984), transposable elements are a way to react quickly to an unexpected and violent environmental change. They bring about a quick and violent genome remodeling, normally lethal, but which allows some individuals to survive under exceptional conditions. It is possible that sequences created during abnormal transpositions could play a role in resistance or survival in response to a new kind of stress. This is the “neo-Lamarckism” theory: transposable elements are able, under special circumstances, to modify the host genome in one generation, and even during the individual lifespan, in order to adapt to a new environment, and these modifications are transmitted to offspring (Capy et al., 1998; Kumar & Bennetzen, 1999). They also have an impact on the regulation and the modulation of gene expression.

Origin of transposable elements, vertical and horizontal transfers

The origin of transposable elements remains unclear because of their presence in all life kingdoms. They were probably present in ancestral bacterial genomes. It is suspected that transposable elements are remnants of an ancestral RNA genome (Waldrop, 1989), or tools selected to evolve from the RNA to the DNA world during evolution (Jurka, 1998), but there is no real evidence yet. The first elements seem to be prokaryotic, judging from their sequences (Capy et al., 1998). Then, during host genome evolution, the elements captured some genes, enabling their own evolution. Malik & Eickbush (2001) have shown by RNase H sequence analysis that LTR retrotransposons and retroviruses appeared after LINEs, probably following recombination between LINEs and DNA transposons (Figure 10) (Capy et al., 1998). One possibility for LTR retrotransposons is that they are retrovirus derivatives, where the virus mutated and lost its env coding ability. But phylogenetic analyses seem to show the reverse, as some gypsy elements have gained the env coding sequence (Temin, 1980; Xiong & Eickbush, 1990). For SINEs, the origin is clearer: they arise from accidents of reverse transcription (from LINEs or LTR retrotransposons). MITEs and LITEs are probably derived from Class II DNA transposons, after elimination of the internal sequences, as shown by Feschotte et al. (2003) and Wicker et al. (2003). The helitrons are probably related to geminiviruses, but their ancestors have not yet been identified (Feschotte & Wessler, 2001). LARDs probably originate from gypsy LTR retrotransposons after numerous cycles of deletions/recombinations/mutations (Kalendar et al., 2004). The origin of TRIMs is still an enigma: they might be derived from LTR retrotransposons, but intermediate structures of these elements have not been identified yet. Because of their spreading ability (even for Class II non-replicative elements), transposable elements could invade the genome of a species much faster than Mendelian segregation would allow (Arkhipova & Meselson, 2000), even through vertical transmission. Another transposable element spreading mechanism, particularly for elements belonging to related species, could be horizontal transfer from one species to another or even from one kingdom to another (Bannister & Parker, 1985; Bowen & McDonald, 2001; Flavell, 1999; Jordan et al., 1999; Yamanouchi, 2000). Different examples of elements that are common between different species, but very different from other elements within one species, were reported (Flavell et al., 1992; Takasaki et al., 1994). However, the horizontal transfer hypothesis is probably not the main reason for element sequence variation between closely related species: genetic drift, vertical transfer and founder effects are probably also important for these variations (Kumar & Bennetzen, 1999; Kumekawa et al., 1999).

Figure 10. Evolutionary relationships between elements. Solid arrows identify known relationships, and dashed arrows show supposed relationships.

In conclusion, transposable elements are essential actors in, and components of, genome plasticity, but they are still poorly known and poorly understood. Their ability to alter the chromatin environment, expression patterns and sequence organization suggests their importance in evolutionary mechanisms, particularly in speciation, genome differentiation and gene regulation. Future studies of these elements are needed to improve our understanding of their behavior, origin and usefulness in host genomes.

Acknowledgments

We thank Pierre Sourdille, Marie-Angèle Grandbastien, Thierry Langin and Sylvie Bernard for their useful corrections. This work is supported by the French Institute for Agronomical Research (INRA) and the Région Auvergne.

References

Adé, J. & F.J. Belzile, 1999. Hairpin elements, the first family of foldback transposons (FTs) in Arabidopsis thaliana. Plant J 19(5): 591–597.
Ananiev, E.V., R.L. Phillips & H.W. Rines, 1998. Chromosome-specific molecular organization of maize (Zea mays L.) centromeric regions. Proc Natl Acad Sci USA 95: 13073–13078.
Arkhipova, I. & M. Meselson, 2000. Transposable elements in sexual and ancient asexual taxa. Proc Natl Acad Sci USA 97(26): 14473–14477.
Arnaud, Ph., Y. Yukawa, L. Lavie, T. Pélissier, M. Sugiura & J.M. Deragon, 2001. Analysis of the SINE S1 Pol III promoter from Brassica; impact of methylation and influence of external sequences. Plant J 26(3): 295–305.
Avramova, Z., A. Tikhonov, Ph. SanMiguel, Y.-K. Jin, C. Liu, S.S. Woo, R.A. Wing & J.L. Bennetzen, 1996. Gene identification in a complex chromosomal continuum by local genomic cross-referencing. Plant J 10(6): 1163–1168.

Bannister, J.V. & M.W. Parker, 1985. The presence of a copper/zinc superoxide dismutase in the bacterium Photobacterium leiognathi: A likely case of gene transfer from eucaryotes to prokaryotes. Proc Natl Acad Sci USA 82(1): 149–152.
Baulcombe, D.C., 2000. Unwinding RNA silencing. Science 290: 1108–1109.
Beall, R. & D.C. Rio, 1996. Drosophila IRBP/Ku p70 corresponds to the mutagen-sensitive mus309 gene and is involved in P-element excision in vivo. Genes Dev 10(8): 921–933.
Bender, J., 1998. Cytosine methylation of repeated sequences in eukaryotes: The role of DNA pairing. TIBS 23: 252–256.
Bennetzen, J.L., 2000. Transposable element contributions to plant gene and genome evolution. Plant Mol Biol 42: 251–269.
Bennetzen, J.L., 2002. Mechanism and rates of genome expansion and contraction in flowering plants. Genetica 115: 29–36.
Bhattacharyya, M.K., A.M. Smith, T.H. Ellis, C. Hedley & C. Martin, 1990. The wrinkled-seed character of pea described by Mendel is caused by a transposon-like insertion in a gene encoding starch-branching enzyme. Cell 60(1): 115–122.
Boeke, J.D., 1989. Transposable elements in Saccharomyces cerevisiae. In: D.E. Berg & M.H. Howe (Eds.), Mobile DNA, pp. 335–374, American Society for Microbiology, Washington, DC.
Boutabout, M., M. Wilhelm & F.-X. Wilhelm, 2001. DNA synthesis fidelity by the reverse transcriptase of the yeast retrotransposon Ty1. Nucleic Acids Res 29(11): 2217–2222.
Bowen, N.J. & J.F. McDonald, 2001. Drosophila euchromatic LTR retrotransposons are much younger than the host species in which they reside. Genome Res 11: 1527–1540.
Bureau, T.E. & S.R. Wessler, 1994a. Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses. Proc Natl Acad Sci USA 91: 1411–1415.
Bureau, T.E. & S.R. Wessler, 1994b. Stowaway: A new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6: 907–916.
Bureau, T.E., S.E. White & S.R. Wessler, 1994. Transduction of a cellular gene by a plant retroelement. Cell 77: 479–480.
Capy, P., 2003 (Eds.). Proceedings of the XIe colloque Elements Transposables, Montpellier, France.
Capy, P., C. Bazin, D. Higuet & Th. Langin, 1998 (Eds.). Dynamics and Evolution of Transposable Elements. Springer, Landes Biosciences, Library of Congress, Austin, Texas.
Casa, A.M., C. Brouwer, A. Nagel, L. Wang, Q. Zhang, S. Kresovich & S.R. Wessler, 2000. The MITE family HeartBreaker (Hbr): Molecular markers in maize. Proc Natl Acad Sci USA 97(18): 10083–10089.
Chantret, N., A. Cenci, F. Sabot, O. Anderson & J. Dubcovsky, 2004. Sequencing of the Triticum monococcum hardness locus reveals good microcolinearity with rice. Mol Genet Genomics 271: 377–386.
Cheng, C., S. Tsuchimoto, H. Ohtsubo & E. Ohtsubo, 2000. Tnr8, a foldback transposable element from rice. Genes Genet Syst 75: 327–333.
Chopra, S., V. Brendel, J. Zhang, A.D. Axtell & Th. Peterson, 1999. Molecular characterization of a mutable pigmentation phenotype and isolation of the first active transposable element from Sorghum bicolor. Proc Natl Acad Sci USA 96(26): 15330–15335.
Dawson, A., E. Hartswood, T. Paterson & D.J. Finnegan, 1997. A LINE-like transposable element in Drosophila, the I factor, encodes a protein with properties similar to those of retroviral nucleocapsids. EMBO J 16(14): 4448–4455.

Deininger, P.L., H. Tiedge, J. Kim & J. Brosius, 1996. Evolution, expression, and possible function of a master gene for amplification of an interspersed repeated DNA family in rodents. Prog Nucleic Acid Res Mol Biol 52: 67–88.
Deragon, J.M., N. Gilbert, L. Rouquet, A. Lenoir, Ph. Arnaud & G. Picard, 1996. A transcriptional analysis of the S1Bn (Brassica napus) family of SINE retroposons. Plant Mol Biol 32: 869–878.
Deragon, J.M., B.S. Landry, T. Pelissier, S. Tutois, S. Tourmente & G. Picard, 1994. An analysis of retroposition in plants based on a family of SINEs from Brassica napus. J Mol Evol 39: 378–386.
Devos, K.M., J.K.M. Brown & J.L. Bennetzen, 2002. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res 12: 1075–1079.
Dong, F., J.T. Miller, S.A. Jackson, G.-L. Wang, P.C. Ronald & J. Jiang, 1998. Rice (Oryza sativa) centromeric regions consist of complex DNA. Proc Natl Acad Sci USA 95: 8135–8140.
Doolittle, W.F. & C. Sapienza, 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601–603.
Eickbush, T.H., 1999. Exon shuffling in retrospect. Science 283: 1465–1467.
Elrouby, N. & T.E. Bureau, 2001. A novel hybrid ORF formed by multiple cellular gene transductions by a plant LTR-retroelement. J Biol Chem 276(45): 41963–41968.
Fedoroff, N.V., 2000. Transposons and genome evolution in plants. Proc Natl Acad Sci USA 97(13): 7002–7007.
Fedoroff, N.V., S.R. Wessler & M. Shure, 1983. Isolation of the transposable maize controlling elements Ac and Ds. Cell 35(1): 235–242.
Feschotte, C., L. Swamy & S.R. Wessler, 2003. Genome-wide analysis of mariner-like transposable elements in rice reveals complex relationships with Stowaway MITEs. Genetics 163: 747–758.
Feschotte, C. & S.R. Wessler, 2001. Treasures in the attic: Rolling circle transposons discovered in eukaryotic genomes. Proc Natl Acad Sci USA 98(16): 8923–8924.
Flavell, A.J., 1999. Long terminal repeat retrotransposons jump between species. Proc Natl Acad Sci USA 96(22): 12211–12212.
Flavell, A.J., D.B. Smith & A. Kumar, 1992. Extreme heterogeneity of Copia-Ty family retrotransposons in plants. Mol Gen Genet 231(2): 233–242.
Flavell, R.B., J. Rimpau & D.B. Smith, 1977. Repeated sequence DNA relationship in four cereals genomes. Chromosoma 63: 205–222.
Fukui, K.-N., G. Suzuki, E.S. Lagudah, S. Rahman, R. Appels, M. Yamamoto & Y. Mukai, 2001. Physical arrangement of retrotransposon-related repeats in centromeric regions of wheat. Plant Cell Physiol 42(2): 189–196.
Gabriel, A., M. Willems, E.H. Mules & J.D. Boeke, 1996. Replication infidelity during a single cycle of Ty1 retrotransposition. Proc Natl Acad Sci USA 93: 7767–7771.
Giroux, M.J., M. Clancy, J. Baier, L. Ingham, D. McCarty & L.C. Hannah, 1994. De novo synthesis of an intron by the maize transposable element Dissociation. Proc Natl Acad Sci USA 91: 12150–12154.
Goodwin, T.J.D. & R.T.M. Poulter, 2001. The DIRS1 group of retrotransposons. Mol Biol Evol 18(11): 2067–2082.
Harberd, N.P., R.B. Flavell & R.D. Thompson, 1987. Identification of a transposon-like insertion in a Glu-1 allele of wheat. Mol Gen Genet 209: 326–332.
Henikoff, S. & M.A. Matzke, 1997. Exploring and explaining epigenetic effects. Trends Genet 13(8): 293–295.

Hickey, D.A., 1982. Selfish DNA: A sexually-transmitted nuclear parasite. Genetics 101: 519–531.
Hirochika, H., 1995. Activation of plant retrotransposons by stress. In: Modification of Gene Expression and Non-Mendelian Inheritance. NIAR, Japan, pp. 15–21.
Hu, J., V.S. Reddy & S.R. Wessler, 2000. The rice R gene family: Two distinct subfamilies containing several miniature inverted-repeat transposable elements. Plant Mol Biol 42: 667–678.
Hull, R., 1999. Classification of reverse transcribing elements: A discussion document. Arch Virol 144(1): 209–214.
Iwamoto, M. & K. Higo, 2003. Tourist C transposable elements are closely associated with genes expressed in flowers in rice (Oryza sativa). Mol Genet Genomics 268: 771–778.
Jääskeläinen, M.J., A.H. Mykkanen, T. Arna, C.M. Vicient, A. Suoniemi, R. Kalendar, H. Savilahti & A.H. Schulman, 1999. Retrotransposon BARE-1: Expression of encoded proteins and formation of virus-like particles in barley cells. Plant J 20(4): 413–422.
Jiang, N., Z. Bao, S. Temnykh, Z. Cheng, J. Jiang, R.A. Wing, S.R. McCouch & S.R. Wessler, 2002. Dasheng: A recently amplified nonautonomous LTR element that is a major component of pericentromeric regions in rice. Genetics 161(3): 1293–1305.
Jiang, N., Z. Bao, X. Zhang, H. Hirochika, S.R. Eddy, S.R. McCouch & S.R. Wessler, 2003. An active DNA transposon family in rice. Nature 421: 163–167.
Jiang, N., K. Jordan & S.R. Wessler, 2002. Dasheng and RIRE2. A nonautonomous long terminal repeat element and its putative autonomous partner in the rice genome. Plant Physiol 130: 1697–1705.
Jiang, N. & S.R. Wessler, 2001. Insertion preference of maize and rice miniature inverted repeat transposable elements as revealed by the analysis of nested element. Plant Cell 13: 2553–2564.
Jordan, I.K., L.V. Matyunina & J.F. McDonald, 1999. Evidence for the recent horizontal transfer of long terminal repeat retrotransposon. Proc Natl Acad Sci USA 96(22): 12621–12625.
Jurka, J., 1997. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc Natl Acad Sci USA 94: 1872–1877.
Jurka, J., 1998. Repeats in genomic DNA: Mining and meaning. Curr Opin Struct Biol 8: 333–337.
Kalendar, R., J. Tanskanen, S. Immonen, E. Nevo & A.H. Schulman, 2000. Genome evolution of wild barley (Hordeum spontaneum) by BARE-1 retrotransposon dynamics in response to sharp microclimatic divergence. Proc Natl Acad Sci USA 97(12): 6603–6607.
Kalendar, R., C.M. Vicient, O. Peleg, K. Anamthawat-Jonsson, A. Bolshoy & A.H. Schulman, 2004. LArge Retrotransposon Derivatives: Abundant, conserved but nonautonomous retroelements of barley and related genomes. Genetics 166: 1437–1450.
Kapitonov, V.V. & J. Jurka, 2001. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA 98(15): 8714–8719.
Kashkush, K., M. Feldman & A.A. Levy, 2002. Gene loss, silencing and activation in a newly synthesized wheat allotetraploid. Genetics 160: 1651–1659.
Kashkush, K., M. Feldman & A.A. Levy, 2003. Transcriptional activation of retrotransposons alters the expression of adjacent genes in wheat. Nat Genet 33: 102–106.
Kempken, F. & F. Windhofer, 2001. The hAT family: A versatile transposon group. Chromosoma 110: 1–9.
Kidwell, M.G. & D. Lisch, 1997. Transposable elements as sources of variation in animals and plants. Proc Natl Acad Sci USA 94: 7704–7711.

Kikuchi, K., K. Terauchi, M. Wada & H.-Y. Hirano, 2003. The plant MITE mPING is mobilized in anther culture. Nature 421: 167–170.
Kishii, M., K. Nagaki & H. Tsujimoto, 2001. A tandem repetitive sequence located in the centromeric region of common wheat (Triticum aestivum) chromosomes. Chromosome Res 9: 417–428.
Kloeckener-Gruissem, B. & M. Freeling, 1995. Transposon-induced promoter scrambling: A mechanism for the evolution of new alleles. Proc Natl Acad Sci USA 92: 1836–1840.
Kolosha, V.O. & S.L. Martin, 1997. In vitro properties of the first ORF protein from mouse LINE-1 support its role in ribonucleoprotein particle formation during retrotransposition. Proc Natl Acad Sci USA 94: 10155–10160.
Kumar, A. & J.L. Bennetzen, 1999. Plant retrotransposons. Annu Rev Genet 33: 479–532.
Kumar, A. & J.L. Bennetzen, 2000. Retrotransposons: Central players in the structure, evolution and function of plant genomes. Trends Plant Sci 5(12): 509–510.
Kumar, A. & H. Hirochika, 2001. Applications of retrotransposons as genetics tools in plants biology. Trends Plant Sci 6(3): 127–134.
Kumekawa, N., E. Ohtsubo & H. Ohtsubo, 1999. Identification and phylogenetic analysis of Gypsy-type retrotransposons in the plant kingdom. Genes Genet Syst 74: 299–307.
Lal, S.K., M.J. Giroux, V. Brendel, C.E. Vallejos & L.C. Hannah, 2003. The maize genome contains a Helitron insertion. Plant Cell 15: 381–391.
Langdon, T., G. Jenkins, R. Hasterok & I.P. King, 2003. A high-copy-number CACTA family transposon in temperate grasses and cereals. Genetics 163: 1097–1108.
Lawson, E.J.R., S.R. Scofield, C. Sjodin, J.D.G. Jones & C. Dean, 1994. Modification of the 5′ untranslated leader region of the maize Activator element leads to increased activity in Arabidopsis. Mol Gen Genet 245: 608–615.
Lenoir, A., L. Lavie, J.-L. Prieto, C. Goubely, J.-C. Cote, T. Pelissier & J.M. Deragon, 2001. The evolutionary origin and genomic organization of SINEs in Arabidopsis thaliana. Mol Biol Evol 18(12): 2315–2322.
Mahillon, J. & M. Chandler, 1998. Insertion sequences. Microbiol Mol Biol Rev 62(3): 725–774.
Malik, H.S. & T.H. Eickbush, 2001. Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable element and retroviruses. Genome Res 11: 1187–1197.
Matsuoka, Y. & K. Tsunewaki, 1996. Wheat retrotransposon families identified by reverse transcriptase domain analysis. Mol Biol Evol 13(10): 1384–1392.
Matyunina, L.V., I.K. Jordan & J.F. McDonald, 1996. Naturally occurring variation in Copia expression is due to both element (cis) and host (trans) regulatory variation. Proc Natl Acad Sci USA 93: 7097–7102.
McClintock, B., 1950. Mutable loci in maize. In: Carnegie Institution of Washington Year Book, pp. 174–181, Washington.
McClintock, B., 1984. Significance of responses of the genome to challenge. Science 226: 792–801.
Miller, J.T., F. Dong, S.A. Jackson, J. Song & J. Jiang, 1998. Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150: 1615–1623.
Mizuuchi, K., 1992. Transpositional recombination: Mechanistic insights from studies of Mu and other elements. Ann Rev Biochem 61: 1011–1051.

Modolell, J., W. Bender & M. Meselson, 1983. Drosophila melanogaster mutations suppressible by the suppressor of Hairywing are insertions of a 7.3-kilobases mobile element. Proc Natl Acad Sci USA 80(6): 1678–1682.
Moore, J.K. & J.E. Haber, 1996. Capture of retrotransposon DNA at the sites of chromosomal double-strand breaks. Nature 383: 644–646.
Moran, J.V., R.J. DeBernardinis & H.H. Kazazian, Jr., 1999. Exon shuffling by L1 retrotransposition. Science 283: 1530–1534.
Mount, S.M. & G.M. Rubin, 1985. Complete nucleotide sequence of the Drosophila transposable element Copia: Homology between Copia and retroviral proteins. Mol Cell Biol 5(7): 1630–1638.
Nakayashiki, H., K. Ikeda, Y. Hashimoto, Y. Tosa & S. Mayama, 2001. Methylation is not the main force repressing the retrotransposon MAGGY in Magnaporthe grisea. Nucleic Acids Res 29(6): 1278–1284.
Nevo, E., 2001. Evolution of genome-phenome diversity under environmental stress. Proc Natl Acad Sci USA 98(11): 6233–6240.
Okamoto, H. & H. Hirochika, 2001. Silencing of transposable element in plants. Trends Plant Sci 6(11): 527–534.
Orgel, L.E. & F.H.C. Crick, 1980. Selfish DNA: The ultimate parasite. Nature 284: 604–607.
Pardue, M.L., 2000. Transposable elements: Friends, foes, or merely fellow travelers? Trends Genet 16(4): 155–156.
Pastink, A., J.C.J. Eeken & P.H.M. Lohman, 2001. Genomic integrity and the repair of double-strand DNA breaks. Mut Res 480–481: 37–50.
Pelissier, T., S. Tutois, J.M. Deragon, S. Tourmente, S. Genestier & G. Picard, 1995. Athila, a new retroelement from Arabidopsis thaliana. Plant Mol Biol 29(3): 441–452.
Petrov, D.A., 2001. Evolution of genome size: New approaches to an old problem. Trends Genet 17(1): 23–28.
Potter, S.S., 1982. DNA sequence analysis of a Drosophila foldback transposable element rearrangement. Mol Gen Genet 188(1): 107–110.
Presting, G.G., L. Malysheva, J. Fuchs & I. Schubert, 1998. A Ty3/Gypsy retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J 16(6): 721–728.
Preston, B.D., 1996. Error-prone retrotransposition: Rime of the ancient mutators. Proc Natl Acad Sci USA 93: 7427–7431.
Purugganan, M.D. & S.R. Wessler, 1994. Molecular evolution of Magellan, a maize Ty3/Gypsy-like retrotransposon. Proc Natl Acad Sci USA 91: 11674–11678.
Rabinowicz, P.D., 2000. Are obese plant genome on a diet? Genome Res 10: 893–894.
Raina, R., D. Cook & N.V. Fedoroff, 1993. Maize Spm transposable element has an enhancer-insensitive promoter. Proc Natl Acad Sci USA 90: 6355–6359.
Rebatchouk, D. & J.O. Narita, 1997. Foldback transposable elements in plant. Plant Mol Biol 34(5): 831–835.
Rinehart, T.A., C. Dean & C.F. Weil, 1997. Comparative analysis of non-random DNA repair following Ac transposon excision in maize and Arabidopsis. Plant J 12(6): 1419–1427.
Rudenko, G.N., A. Ono & V. Walbot, 2003. Initiation of silencing of maize MuDR/Mu transposable elements. Plant J 33: 1013–1025.
Rudenko, G.N. & V. Walbot, 2001. Expression and posttranscriptional regulation of maize transposable element MuDR and its derivatives. Plant Cell 13: 553–570.
SanMiguel, Ph., A. Tikhonov, Y.-K. Jin, N. Motchoulskaia, D. Zakharov, A. Melake-Berhan, P.S. Springer, K.J. Edwards, M. Lee, Z. Avramova & J.L. Bennetzen, 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 765–768.
Schmidt, T., 1999. LINEs, SINEs and repetitive DNA: Non-LTR retrotransposons in plant genomes. Plant Mol Biol 40: 903–910.
Shirasu, K., A.H. Schulman, T. Lahaye & P. Schulze-Lefert, 2000. A contiguous 66-kb barley DNA sequence provides evidence for reversible genome expansion. Genome Res 10(7): 908–915.
Suoniemi, A., K. Anamthawat-Jonsson, T. Arna & A.H. Schulman, 1996. Retrotransposon BARE-1 is a major, dispersed component of the barley (Hordeum vulgare L.) genome. Plant Mol Biol 30: 1321–1329.
Takasaki, N., S. Murata, M. Saitoh, T. Kobayashi, L. Park & N. Okada, 1994. Species-specific amplification of tRNA-derived short interspersed repetitive elements (SINEs) by retroposition: A process of parasitization of entire genomes during the evolution of salmonids. Proc Natl Acad Sci USA 91: 10153–10157.
Temin, H.M., 1980. Origin of retroviruses from cellular moveable genetic elements. Cell 21: 599–600.
Thompson-Stewart, D., G.H. Karpen & A.C. Spradling, 1994. A transposable element can drive the concerted evolution of tandemly repetitious DNA. Proc Natl Acad Sci USA 91: 9042–9046.
Tikhonov, A., L. Lavie, Ch. Tatout, J.L. Bennetzen, Z. Avramova & J.M. Deragon, 2001. Target sites for SINE integration in Brassica genomes display nuclear matrix binding activity. Chromosome Res 9: 325–337.
Turcotte, K., S. Srinivasan & T.E. Bureau, 2001. Survey of transposable element from rice genomic sequences. Plant J 25(2): 169–179.
Vitte, C. & O. Panaud, 2003. Formation of Solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons. Mol Biol Evol 20(4): 528–540.
Volff, J.-N., U. Hornung & M. Schartl, 2001a. Fish retroposons related to the Penelope element of Drosophila virilis define a new group of retrotransposable elements. Mol Genet Genomics 265: 711–720.
Volff, J.-N., C. Korting, A. Froschauer, K. Sweeney & M. Schartl, 2001b. Non-LTR retrotransposons encoding a restriction enzyme-like endonuclease in vertebrates. J Mol Evol 52: 351–360.
Walbot, V. & D.A. Petrov, 2001. Gene galaxies in the maize genome. Proc Natl Acad Sci USA 98(15): 8163–8164.

Waldrop, M., 1989. Did life really start out in an RNA world? Science 246(4935): 1248–1249.
Wendel, J.F. & S.R. Wessler, 2000. Retrotransposon-mediated genome evolution on a local ecological scale. Proc Natl Acad Sci USA 97(12): 6250–6252.
Wessler, S.R., 2001. Plant transposable element. A hard act to follow. Plant Physiol 125: 149–151.
Wessler, S.R., T.E. Bureau & S.E. White, 1995. LTR-retrotransposons and MITEs: Important players in the evolution of plant genomes. Curr Opin Genet Dev 5: 814–821.
White, S.E., L.F. Habera & S.R. Wessler, 1994. Retrotransposons in the flanking regions of normal plant genes: A role for Copia-like elements in the evolution of gene structure and expression. Proc Natl Acad Sci USA 91: 11792–11796.
Wicker, Th., R. Guyot, N. Yahiaoui & B. Keller, 2003. CACTA transposon in Triticeae. A diverse family of high-copy repetitive elements. Plant Physiol 132: 52–63.
Witte, C.-P., Q.H. Le, T.E. Bureau & A. Kumar, 2001. Terminal-repeat retrotransposons in miniature (TRIM) are involved in restructuring plant genomes. Proc Natl Acad Sci USA 98(24): 13778–13783.
Xiong, Y. & T.H. Eickbush, 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9(10): 3353–3362.
Yamanouchi, K., 2000. Potential risk of genotransplant-associated infections. Transplant Proc 32: 1155–1156.
Yang, G. & T. Hall, 2003. MDM-1 and MDM-2: Two Mutator-derived MITE families in rice. J Mol Evol 56: 255–264.
Yasui, Y., S. Nasuda, Y. Matsuoka & T. Kawahara, 2001. The Au family, a novel short interspersed element (SINE) from Aegilops umbellulata. Theor Appl Genet 102: 463–470.
Yoder, J.A., C.P. Walsh & T.H. Bestor, 1997. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet 13(8): 335–340.
Zhang, X., C. Feschotte, Q. Zhang, N. Jiang, W.B. Eggleston & S.R. Wessler, 2001. P instability factor: An active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases. Proc Natl Acad Sci USA 98(22): 12572–12577.
Zhang, X., N. Jiang, C. Feschotte & S.R. Wessler, 2004. PIF- and Pong-like transposable elements: Distribution, evolution and relationship with Tourist-like miniature inverted repeat transposable elements. Genetics 166: 971–986.