Abundance, distribution and dynamics of retrotransposable elements ...

Evolution of Retrotransposable Elements Cytogenet Genome Res 110:426–440 (2005) DOI: 10.1159/000084975

Abundance, distribution and dynamics of retrotransposable elements and transposons: similarities and differences

A. Hua-Van, A. Le Rouzic, C. Maisonhaute, P. Capy

Laboratoire Populations, Génétique et Evolution, CNRS, Gif/Yvette (France)

Manuscript received 20 January 2004; accepted in revised form for publication by J.-N. Volff April 2004.

Abstract. Retrotransposable elements and transposons are generally both found in most eukaryotes. These two classes of elements are usually distinguished on the basis of their differing mechanisms of transposition. However, their respective frequencies, their intragenomic dynamics and distributions, and the frequencies of their horizontal transfer from one species to another can also differ. The main objective of this review is to compare these two types of elements from a new perspective, using data provided by genome sequencing projects and relating this to the theoretical and observed dynamics. It is shown that the traditional division into two classes, based on the transposition mechanisms, becomes less obvious when other factors are taken into consideration. A great diversity in distribution and dynamics within each class is observed. In contrast, the impact on and the interactions with the genome can show striking similarities between families of the two classes.

Whatever their class (Class I: retroelements, that transpose through an RNA intermediate, and Class II: DNA transposons, that transpose through DNA), transposable elements (TEs) are often viewed as selfish DNA that expands into the genome (Doolittle and Sapienza, 1980; Orgel and Crick, 1980). However, it is now accepted that their contribution to the generation of variability has important consequences for genome evolution. These views are not mutually exclusive. In the course of the cohabitation between these elements and host genomes, regulation mechanisms have developed to avoid or to limit detrimental effects.

It is usually claimed that TEs are found in all living organisms. So far, only two exceptions have been reported. First, there is no IS-like element (Insertion Sequence, Class II) in the genome of the sequenced strain of Bacillus subtilis (Kunst et al., 1997). The second example is the apparent absence of any known TEs in the genome of Plasmodium falciparum (Gardner et al., 2002). All other organisms contain TEs belonging to at least one class, and usually to both classes. The numerous reports describing the TE content of genomes reveal both the great natural diversity of their distribution and abundance, and the fact that only a limited number of families or superfamilies have been identified. Extreme situations have been reported, such as the classical example of Saccharomyces cerevisiae, in which only elements with Long Terminal Repeats (LTR retrotransposons) have been described (Kim et al., 1998). Another example is provided by the bdelloid rotifers, in which only Class II elements were initially suspected (Arkhipova and Meselson, 2000). However, elements with a reverse transcriptase, although unusual, have recently been described (Arkhipova et al., 2003). On the other hand, if we assume that retrons and group II introns are not transposable elements (this is still an open question), then we can say that bacteria do not contain retroelements. All possible intermediate situations are found between these extreme cases, regarding two main criteria: the number of different families within a genome and the number of copies within one family. For instance, in the human genome, most of the copies are retroelements belonging to a few families of LINE (Long INterspersed Element) and SINE (Short INterspersed Element), while the remaining TEs represent numerous families with relatively low copy numbers.

The two transposition mechanisms that define the TE classes involve some similar biochemical steps, but they are globally very different, as they use either an RNA or a DNA intermediate. This mechanistic difference is thought to have some major consequences for the TE life cycle. But whatever their class, TEs all have to deal with the same problems: invading the genome, remaining and multiplying within it, and finally colonizing other genomes (species). All these steps have to be done without being so deleterious as to lead to the death of the host genome. Moreover, each genome will produce particular conditions, and the success or failure of an element to settle within it is therefore the result of the interactions between the element and this specific genome. What different strategies can be adopted by the different classes of TE, and what evolutionary forces are involved? The aim of this review is not to answer these questions, which is almost impossible, but to try to extract, from the mass of data available, tendencies that could represent the first step towards a better understanding.

In this review, we propose to list and discuss the features that distinguish the retrotransposons (Class I) from the transposons (Class II), using two different approaches (see Fig. 1). We will first try to identify differences concerning the structure, specific distribution, relative abundance, and chromosomal location of TEs in sequenced genomes. The second approach will consist of evaluating the different scenarios advanced to explain their dynamics, from their transposition rate and regulation, to their ability to transfer horizontally. Although numerous prokaryotic genomes are available, they mainly contain Class II elements; we will therefore focus on eukaryotes, which display a great diversity of elements of each class.

Fig. 1. Once in a genome, the fate of a TE will be determined by the balance between different factors, which will lead either to the domestication, the maintenance or the loss of the TE, if deleterious effects have not triggered the death of the host. To compare Class I and Class II elements, we used two approaches, one based on the genome sequencing projects and the other based on dynamics studies. [Diagram labels: genome invasion or de novo formation; dynamics of invasion; host defense; multiplication; evolution; regulation; variability; genetic burden; deleterious effect; death; domestication; loss; maintenance.]

Supported by the GDR no. 2157 “Evolution of transposable elements: from genome to populations” of the CNRS, by the CNRS UPR9034 and by the University Paris 11. Request reprints from Pierre Capy, Laboratoire Populations, Génétique et Evolution, CNRS, FR–91198 Gif/Yvette (France); telephone: 33 1 69.82.37.09; fax: 33 1 69.07.04.21; e-mail: [email protected]

© 2005 S. Karger AG, Basel. 1424–8581/05/1104–0426$22.00/0. Accessible online at: www.karger.com/cgr

Distribution in sequenced genomes

Structural differences

Basically, the main differences between the two classes of TEs are their structure and their mode of transposition. All retroelements (Class I) use a reverse transcriptase for their transposition, which involves a copy-and-paste mechanism. In this case, the transposition intermediate is an RNA molecule, transcribed from the donor copy and reverse transcribed before insertion. Classical DNA transposons use a cut-and-paste mechanism with a DNA intermediate: the excised donor copy, which reinserts elsewhere. The recently discovered Helitron elements replicate by a rolling-circle mechanism, using a DNA molecule. This means that they can be classified as Class II elements, even though their transposition does not involve a classical cut-and-paste mechanism. This new subclass will not be considered here, because of the lack of information about its abundance, chromosomal location or transposition dynamics.

Within Class I, autonomous elements (i.e. encoding proteins that promote their own transposition) are divided into two subclasses (LTR retrotransposons and LINEs), based on their differing overall structures (the presence or absence of LTRs), even though they contain the same kind of proteins. The first ORF encodes a gag-like protein, the role of which has not yet been totally elucidated. The second gene, the pol gene, contains four domains in the LTR retrotransposons (retroposonineae): a protease (PR), an integrase (IN), a reverse transcriptase (RT) and an RNAse H (RH). As mentioned in Capy (this issue), the order of these domains differs in the Pseudoviridae (PR-RT-RH-IN) and in the Metaviridae (PR-IN-RT-RH). In the non-LTR retrotransposons (Order: Retrales; Suborder: Retroposinae), RT and RH domains can be easily identified, but the IN and PR domains seem to be absent. However, in some of these elements an endonuclease (EN) can be found (Feng et al., 1996; Volff et al., 2001). A third ORF is sometimes specifically found in LTR retroelements. This ORF is thought to encode an envelope protein (env). Such a gene could be partly responsible for the infectivity of the elements. This has been strongly suggested for the gypsy element in D. melanogaster (Kim et al., 1994), but no clear conclusion could be drawn for other elements. The presence of such an ORF suggests that the env-containing LTR retrotransposons are closely related to retroviruses. However, env genes could have been acquired independently by several different elements, such as elements of the gypsy group (Errantiviruses) and the SIRE-1 element of soybean (Hemiviruses) (Laten et al., 1998; Lerat et al., 1999).

The size of Class I full-length elements is generally around or more than 5 kb. Each type gives rise to non-autonomous derivatives. LINE copies are often 5′ truncated due to the transposition mechanism used, whereas LTR retroelements can give rise to solo-LTRs through recombination between LTRs. Another type of short non-autonomous Class I element exists. These are known as SINEs and are about 500 bp long, lack coding capacity and utilize the LINE machinery to transpose, although they appear to be unrelated to LINEs, and are primarily derived from RNA genes. Finally, in addition to structural differences, the transposition mechanisms of these elements can be radically different, although they keep in common the use of an RNA intermediate.

Class II elements were traditionally viewed as being structurally more homogeneous. Indeed, all these elements (with the exception of the Helitrons) are flanked by Inverted Terminal Repeats (ITRs) and the autonomous elements all encode at least one protein, the transposase. However, several superfamilies can be distinguished that show no similarities in their transposases.
In some of them, a second ORF is present (maize Mutator element). For some others, such as the P element of D. melanogaster and the Ac element of Zea mays, the transposase gene contains introns. In any case, the transposase protein normally contains several motifs inherent to its function, such as a nuclear localization signal and a DNA binding motif. These autonomous elements are usually shorter than Class I elements (less than 3 kb). Short elements, flanked by ITRs but without any coding capacity, known as MITEs (Miniature Inverted-repeat Transposable Elements), have recently been included in this class. For several of them, similarities with the ITRs of complete elements have been identified (Feschotte et al., 2002). On this basis, MITEs related to virtually any Class II superfamily have been detected. It is therefore assumed that they transpose using the transposase of the autonomous element (Zhang et al., 2001). However, the origin of these elements remains uncertain. For instance, it has been suggested that the Emigrant elements of Arabidopsis thaliana derive from elements belonging to the pogo family (Feschotte and Mouches, 2000). However, the alternative hypothesis, that MITEs are newly evolving and opportunistic elements, cannot be ruled out at this time.

So both classes contain both large autonomous (coding) elements and non-autonomous ones that use the transposition machinery of the former to transpose. It is obvious, however, that these short elements have arisen independently by different mechanisms on several occasions in each class.
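The structural criteria described in this section can be summarized as a small decision procedure. The sketch below is only an illustration: the `Element` fields, the `classify` function and the returned labels are hypothetical names introduced here, and the rules are deliberately simplified (for example, a 5′ truncated LINE copy that has lost its coding capacity would be labelled as a SINE by this toy rule).

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Structural features of a TE copy (field names are illustrative)."""
    rna_intermediate: bool        # transposes through an RNA intermediate
    has_ltr: bool = False         # flanked by Long Terminal Repeats
    has_itr: bool = False         # flanked by Inverted Terminal Repeats
    coding: bool = True           # encodes proteins for its own transposition
    rolling_circle: bool = False  # Helitron-like replication


def classify(e: Element) -> str:
    """Assign a class/subclass label from the features listed in the text."""
    if e.rna_intermediate:                       # Class I (retroelements)
        if e.has_ltr:
            return "Class I: LTR retrotransposon"
        return "Class I: LINE" if e.coding else "Class I: SINE"
    if e.rolling_circle:                         # Class II, not cut-and-paste
        return "Class II: Helitron"
    if e.has_itr:                                # Class II, cut-and-paste
        return "Class II: DNA transposon" if e.coding else "Class II: MITE"
    return "unclassified"


# Order of the pol domains in the two groups of LTR retrotransposons.
POL_DOMAIN_ORDER = {
    ("PR", "RT", "RH", "IN"): "Pseudoviridae",
    ("PR", "IN", "RT", "RH"): "Metaviridae",
}


def ltr_group(domains) -> str:
    """Return the group matching a pol domain order, or 'unknown'."""
    return POL_DOMAIN_ORDER.get(tuple(domains), "unknown")
```

For instance, `classify(Element(rna_intermediate=False, has_itr=True, coding=False))` gives the MITE label, in line with the description of MITEs as short, non-coding elements flanked by ITRs.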

How many elements, on which chromosome?

Element detection

The complete genome sequences of Bacteria, Protozoa, Metazoa, Plants and Fungi have been released during the last decade. Today about 196 genomes of Bacteria and Archaea, and 29 of Eukarya are available (http://genomesonline.org), making it possible to carry out comparative studies and comprehensive surveys of TEs based on their diversity, number and chromosomal distribution (Cline et al., 2002; Nanda et al., 2002; Tettelin et al., 2002). These data can also be used for reconstructing phylogenies, or estimating the age of a family of TE, and can be particularly useful for elucidating their putative involvement in genome architecture and function (Kim et al., 1998).

It must be stressed that TE analysis at a genome-wide scale requires very effective software programs. Commonly used programs (RepeatMasker, BLAST) generally detect only sequences that are fairly similar to known elements. This inevitably leads to a severe underestimation of the number of elements. Other specific programs have been developed to identify a particular class of TE. They generally rely on amino-acid-translated sequences or on known structures (presence of an ITR or LTR). However, this means that they may fail to detect elements with unusual structures. This is probably one of the reasons for the delay in discovering Helitron-like elements (Kapitonov and Jurka, 2001). Another difficulty also resides in the heavily deleted copies that do not match the query sequences, even though these fragments appear to be an important part of “junk” DNA.

In this review, only eukaryotes for which sufficient data concerning TEs are available will be examined.
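As an illustration of what detection based on known structures means, the following minimal sketch tests whether a candidate sequence is flanked by inverted terminal repeats. It is not the method of any program cited above: the function names are hypothetical, only perfect repeats are recognized, and real tools must tolerate mismatches, indels and truncated copies.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def terminal_inverted_repeat_length(seq: str, max_len: int = 50) -> int:
    """Length of the longest perfect inverted repeat shared by both termini.

    The 3' end is an inverted repeat of the 5' end when the sequence and its
    reverse complement start with the same bases, so the two are compared
    position by position until the first mismatch.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n
```

A candidate built as an 8 bp terminus, an unrelated internal region and the reverse complement of that terminus returns 8, whereas a sequence such as `AAAACCCC` returns 0. An element with an unusual structure (no terminal repeats at all) is invisible to this kind of test, which is the limitation pointed out in the text.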
Our pool of complete genomes will include the genomes of two yeasts (Saccharomyces cerevisiae and Schizosaccharomyces pombe), one filamentous fungus (Neurospora crassa), two plants (Oryza sativa and Arabidopsis thaliana), two insects (Drosophila melanogaster and Anopheles gambiae), a nematode (Caenorhabditis elegans) and three vertebrates (Homo sapiens, Mus musculus and Takifugu rubripes) as well as some protozoans. Species for which only partial sequences are available, such as the fish Tetraodon nigroviridis, the slime mold Dictyostelium discoideum, and maize (Zea mays), will also be considered. Although extrapolation to the whole genome may be questionable, these last examples provide us with a useful forecast of the TE content, at least in terms of family diversity, until a more complete analysis becomes available.

TE abundance and genome size

The abundance of TEs within a genome can be quantified in different ways. One can consider the number of families within each subclass (diversity), the copy number of a given family, or the percentage of the genome occupied by TE sequences. On the basis of this last criterion, the proportion of the genome consisting of TEs appears to vary considerably,

Table 1. Relative abundance of transposable elements in the main sequenced genomes. The number of families within each group of TEs is generally given, except for Mus musculus and Homo sapiens, for which the percentage of the total genome is provided. For A. thaliana, the value shown under LINEs (a grey box in the original table) corresponds to the total number of LINEs + SINEs.

Species                     Genome size (Mb)  Copy no. of TE  % of genome  Total no. of families  Type I LTRs  Type I LINEs  Type I SINEs  Type II DNA  Reference
Saccharomyces cerevisiae    12                331             3.1          5                      5            0             0             0            Kim et al. (1998)
Caenorhabditis elegans      97                3718            6.5          34                     19           3             5             7            Duret et al. (2000); The C. elegans Sequencing Consortium (1998); Ganko et al. (2001)
Drosophila melanogaster     137               1572            3.1          ~100                   49           27            0             19           Kaminker et al. (2002)
Anopheles gambiae           278               ?               16?          ~80                    8            2             1             0            Holt et al. (2002)
Mus musculus                2700              2.9 × 10^6      37.9         ?                      10%          19%           8%            0.9%         Waterston et al. (2002)
Homo sapiens sapiens        2900              3.16 × 10^6     45           263a                   8%           21%           13%           3%           Lander et al. (2001)
Arabidopsis thaliana        125               5500            14           160                    70a          10a (LINEs + SINEs)         80a          The Arabidopsis Genome Initiative (2000); Kidwell (2002)

a Kidwell (2002).

depending on the species (see Table 1). Moreover, as suspected by several authors, genome size is closely correlated with the abundance of transposable elements (Kidwell, 2002). This relationship is almost linear when computed for 12 genomes, and the equation of the regression line is:

Y = -92.41 + 0.51 X,

where X is the genome size in megabases and Y the total size of TE (Kidwell, 2002).

In terms of the relative proportion of each class of element, retroelements appear to be the most abundant type of TEs. The copy number of Class I families is generally higher than that of DNA transposons (if some MITE elements are excluded), and the large size of full-length Class I elements means that they constitute a high proportion of the TE sequences. However, the family distribution is highly variable across genomes. In a given species, one family or subclass of TEs will often predominate, probably as a consequence of its recent amplification, whereas the same family can be absent in another species. For instance, LINE elements are absent from the genome of S. cerevisiae, but are the main super-family in the human genome.

In humans, 40–45 % of the DNA consists of transposable elements. Most of this corresponds to non-LTR retroelements represented by few super-families, only two of which are present in a huge copy number: more than half a million for the LINE L1 and one million for the SINE Alu. The LTR retroelement subclass (which includes the vertebrate-specific endogenous retroviruses, or ERVs) is composed of more than 100 different families (and more than 200 subfamilies). However, human ERVs are present in relatively small copy numbers (Eickbush and Furano, 2002) and account for only 8 % of the human genome (Lander et al., 2001). Finally, DNA transposons are diverse, but correspond to less than 3 % of this genome. In mouse, the contrast is even sharper, since less than 1 % of the genome consists of DNA transposons. In the large genome of maize (2,700 Mb), Class I also predominates over Class II. Class I elements account for about 50 % of the genome, and consist mainly of about 20 LTR retroelement families (SanMiguel et al., 1996; Meyers et al., 2001).
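To make the regression quoted above from Kidwell (2002) concrete, the sketch below evaluates it for some of the genome sizes listed in Table 1 and compares the result with the TE content implied by the "% of the genome" column. The function names and the dictionary are introduced here for illustration only; the 12 genomes used to fit the line are not listed in this section, so the comparison is indicative rather than a re-analysis.

```python
def predicted_te_mb(genome_size_mb: float) -> float:
    """Total TE content (Mb) given by the regression Y = -92.41 + 0.51 X."""
    return -92.41 + 0.51 * genome_size_mb


def observed_te_mb(genome_size_mb: float, percent_te: float) -> float:
    """TE content (Mb) implied by a genome size and a TE percentage."""
    return genome_size_mb * percent_te / 100


# (genome size in Mb, % of the genome made of TEs), copied from Table 1.
TABLE_1 = {
    "Saccharomyces cerevisiae": (12, 3.1),
    "Caenorhabditis elegans": (97, 6.5),
    "Drosophila melanogaster": (137, 3.1),
    "Arabidopsis thaliana": (125, 14.0),
    "Mus musculus": (2700, 37.9),
    "Homo sapiens": (2900, 45.0),
}

for species, (size_mb, percent_te) in TABLE_1.items():
    print(
        f"{species:26s}"
        f" observed {observed_te_mb(size_mb, percent_te):8.1f} Mb"
        f"  predicted {predicted_te_mb(size_mb):8.1f} Mb"
    )
```

Because of the negative intercept, the fitted line crosses zero near 181 Mb (92.41 / 0.51), so it returns negative values for the smaller genomes in the table. The equation is therefore best read as a description of the overall trend across genomes of very different sizes, not as a predictor for compact genomes.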

The preponderance of Class I elements is not restricted to large genomes (Kidwell, 2002). One of the smallest vertebrate genomes, that of the fish Takifugu rubripes (365 Mb), contains less than 3 % of TE sequences. Overall, more than 40 families have been identified, equally distributed between Class I and Class II and representing all known super-families. But the copy number within Class I families is higher (up to 6,500) (Aparicio et al., 2002). In the related species Tetraodon nigroviridis, TEs are estimated to make up less than 1 % of the 46 Mb sequenced, and as in Fugu, LINEs are the most frequent (Crollius et al., 2000).

Class I is also the most abundant class in insects. The genome analysis of Anopheles gambiae uncovered about 40 different types of TEs, encompassing all known types of element (Holt et al., 2002). In mosquito, LTR elements and SINEs are the most abundant types in terms of copy number, followed by MITEs. Once again this is an underestimate, as more than 100 families of non-LTR elements and a total of 400 different families have recently been recognized (Biedler and Tu, 2003; Quesneville et al., 2003). In Drosophila, TEs account for 4–5 % of the euchromatic part of the genome (Hoskins et al., 2002; Kaminker et al., 2002), and exhibit great diversity. The TE panel has also been extended to 330 families as a result of using high-performance programs (Quesneville et al., 2003). It must be stressed that this species lacks the SINE and MITE families, but does have the still-unclassified FB element. In both the euchromatic and the heterochromatic parts, retroelements (with LTRs) are the most abundant in percentage terms, as well as in terms of copy number. However, unlike the situation in the human genome, no high copy number family has been found (Kaminker et al., 2002).
Nevertheless, the predominance of Class I elements is not absolute, and some genomes appear to have been invaded by particular families of short elements, MITEs, now recognized as non-autonomous Class II elements. These elements are present in high copy numbers and represent diverse families, most of which are related to an autonomous family (Feschotte et al., 2002). This is the case of genomes such as those of O. sativa, A. thaliana and C. elegans. As MITEs are short (< 500 bp), the percentage of the genome they occupy usually remains moderate. However, they contribute more than 70 % of TEs in O. sativa, and about 50 % in A. thaliana, despite the wide diversity of retroelements displayed (Le et al., 2000). MITEs are also the most common elements found in C. elegans, in which both classes are otherwise present with about 12 DNA transposons related to the main Class II superfamilies, 12 LINEs (Duret et al., 2000) and 20 LTR retroelement families (Ganko et al., 2001). In these LTR retroelement families, very few copies are observed and for some of them only fragmented copies or solo-LTRs have been detected. In contrast, the numerous families of C. elegans MITEs are often related to a known Class II transposon, and are sometimes present in thousands of copies.

The picture is somewhat different in unicellular eukaryotes. In the very compact yeast genomes (S. cerevisiae and S. pombe), no Class II elements have been detected and the number of Class I families is extremely low. Five Ty families coexist in the baker’s yeast, in relatively few copies (Kim et al., 1998). Only two Class I elements have been found in the fission yeast (Tf1 and Tf2), and only one is present as full-length copies in the sequenced strain, also with a low copy number, although the detection of several different solo-LTRs suggests the past existence of some other families (Bowen et al., 2003).

Parasitic protozoans include several groups of phylogenetically unrelated organisms, such as the Trypanosomatids (Trypanosoma cruzi, T. brucei, Crithidia fasciculata, Leishmania major), Apicomplexans (Plasmodium falciparum), Mycetozoans (Entamoeba histolytica) or Diplomonads (Giardia lamblia). In all these species, no DNA transposons are found and only LINE elements have been detected, except in T. cruzi, in which an LTR retroelement exists (Bennetzen, 2000; Bringaud et al., 2002), and in Plasmodium falciparum and Leishmania, which apparently possess no TEs at all (Carlton et al., 2002; Gardner et al., 2002).
All these diverse species apparently harbor only Class I elements. However, protozoans form a paraphyletic group with no common features apart from being unicellular and having a parasitic life style. Moreover, related, non-parasitic species do not display this pattern of TE content. Indeed, several TEs from both classes have been detected in D. discoideum, distantly related to Entamoeba (Glöckner et al., 2002), and several Tc1-mariner elements (Class II) have been found in Ciliates, a taxon close to Apicomplexans, and could be involved in macronucleus rearrangements (Klobutcher and Jahn, 1991). The first ciliate LINE element has been reported recently (Fillingham et al., 2004).

Inter-chromosomal location

Abundance of TEs can vary locally, and it is usually observed that the TE density differs between chromosomes and even between chromosome arms. In plants, a chromosome-by-chromosome analysis has been done only for a few rice chromosomes. The global percentages of TEs appear to vary from chromosome to chromosome, but there are no data that consider Class I and Class II separately (Feng et al., 2002; Sasaki et al., 2002; Rice Chromosome 10 Sequencing Consortium, 2003). In the human genome, wide variation is found between chromosomes: about 40 % of both chromosomes 21 and 22 are TEs, but Alu elements are twice as frequent on chromosome 22 as on chromosome 21 (Grover et al., 2003). In Drosophila, a large excess of TEs is observed on chromosome 4, consisting mainly of DNA transposons and LINE retroelements (Bartolome et al., 2002; Kaminker et al., 2002). This situation could simply reflect the heterochromatic nature of chromosome 4 (see below). However, in other cases, the observed differences are more likely to be related to family specificities and/or chromatin structure than to the class to which the element belongs.

The Y chromosomes are usually found to be rich in TEs (full-length L1 and LTR in human and mouse, several retroelements in Drosophila sp.). This excess could be explained by the lack of recombination, and greater tolerance, probably due to the paucity of genes or merely to the high heterochromatin content (Junakovic et al., 1998; Boissinot et al., 2001; Waterston et al., 2002).

An accumulation of LINE elements is observed on the X chromosome (L1) in human and mouse. The possible involvement of L1 clusters in X inactivation was invoked by Bailey et al. (2000) to explain this situation. In Anopheles, the highest TE density is found on the X chromosome (59 TE/Mb), while lower values (37–48 TE/Mb) are observed for autosomal chromosomes (Holt et al., 2002). It is not known whether this involves one class of TE more than the other. Conflicting conclusions have been reported for the Drosophila X chromosome: a deficiency of TEs on the X chromosome (Bartolome et al., 2002) or, based on an analysis of Release 3 sequences, no evidence of a reduction of TE density on the X chromosome compared to the major autosomal chromosome arms (Kaminker et al., 2002). Rizzon et al. (2002) observed that the deficiency on X was only significant for DNA and non-LTR retroelements. They further pointed out the marked bias resulting from regions of TE accumulation (chromosome 4, centromeric and pericentromeric regions). When these regions were eliminated, a significant accumulation of LTR retroelements on chromosome X was revealed.
The distributions of some MITEs, LTR retroelements, several DNA transposons and non-LTR retroelements have been investigated in C. elegans. Transposons were uniformly distributed amongst autosomes. Once again, differences were observed between families, but not between classes. Although three out of the four MITEs analyzed were clearly under-represented on the X chromosome (Surzycki and Belknap, 2000), an overall higher density of TEs, mainly due to specific DNA transposons, was observed on the X (Duret et al., 2000). However, C. elegans remains a particular model due to its unusual reproductive system.

The X chromosome is particular since males of numerous species are hemizygous for this chromosome. Two predictions can be made. First, any recessive lethal mutations should be immediately eliminated, and a deficit in TEs is therefore expected (Biémont et al., 1997). Second, ectopic recombination should be less frequent, allowing TE accumulation (Charlesworth et al., 1997). However, the inconsistency of the data between species, or between TE families, does not reveal the relative importance of each of these two models. Moreover, these models predict states that would be observed once equilibrium is reached, a condition that is usually unknown for most of the populations studied.

On which region of the chromosome?

TEs are usually not evenly distributed within a given chromosome. Clustering (adjacent insertions) and nesting (insertion within another element) of transposons have been reported in numerous organisms, such as slime molds (Glöckner et al., 2001), filamentous fungi (Nitta et al., 1997; Cambareri et al., 1998; Hua-Van et al., 2000), cereals (SanMiguel et al., 1996, 2002), fish (Dasilva et al., 2002) and D. melanogaster (Kaminker et al., 2002). The proportion of TEs involved in such arrangements may not be negligible, as it can reach 21 % of euchromatic TEs in Drosophila (Kaminker et al., 2002). In D. discoideum, about two thirds of the TEs not displaying tRNA insertion specificity were found inserted in other TEs (Glöckner et al., 2001).

Chromatin versus heterochromatin

In a number of genomes, DNA transposons and retrotransposons appear to be more abundant within the heterochromatin. In Dipterans, TEs accumulate near centromeres and telomeres. This was observed long ago in Drosophila, where TEs account for 8 % of heterochromatin and 4–5 % of euchromatin. These figures are probably an underestimate, and a recent analysis of heterochromatic unmapped scaffolds revealed that 50 % of the sequences exhibited homology with TEs (Hoskins et al., 2002). In mosquito, TEs account for 60 % of heterochromatin (usually pericentromeric) versus 16 % of euchromatin (Holt et al., 2002). As in Drosophila, the TEs in heterochromatin are highly fragmented. In plants such as O. sativa or A. thaliana, centromeres and pericentromeric regions also contain high levels of TEs. In both these plants, retroelements appeared to be centromeric and DNA transposons more predominantly pericentromeric. Some classical DNA families and MITEs, as well as SINEs, were an exception to this clustering, since these TEs are distributed along the chromosomes in A. thaliana (The Arabidopsis Genome Initiative, 2000; Lenoir et al., 2001).
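The clustering (adjacent insertions) and nesting (insertion within another element) arrangements defined at the start of this section can be expressed as simple interval operations on annotated TE coordinates. The sketch below is an assumption-laden illustration, not the procedure of any study cited here: the `Insertion` type, both function names and the `max_gap` parameter are hypothetical, and it assumes that a host element interrupted by a nested insertion is still annotated as a single span.

```python
from typing import List, NamedTuple, Tuple


class Insertion(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def nested_pairs(insertions: List[Insertion]) -> List[Tuple[str, str]]:
    """Return (inner, outer) names where inner lies entirely within outer."""
    pairs = []
    for inner in insertions:
        for outer in insertions:
            contained = outer.start <= inner.start and inner.end <= outer.end
            shorter = (inner.end - inner.start) < (outer.end - outer.start)
            if contained and shorter:
                pairs.append((inner.name, outer.name))
    return pairs


def clusters(insertions: List[Insertion], max_gap: int = 0) -> List[List[str]]:
    """Group insertions separated by at most max_gap bp (0 = adjacent)."""
    groups: List[List[str]] = []
    current_end = 0
    for ins in sorted(insertions, key=lambda i: i.start):
        if groups and ins.start - current_end <= max_gap:
            groups[-1].append(ins.name)          # extends the current cluster
            current_end = max(current_end, ins.end)
        else:
            groups.append([ins.name])            # starts a new cluster
            current_end = ins.end
    return groups
```

With four insertions A (100 to 5100), B (2000 to 2500), C (5100 to 6000) and D (20000 to 21000), B is reported as nested in A, and A, B and C form one cluster while D stands alone. Counting the elements that appear in either kind of arrangement gives the type of proportion quoted in the text for Drosophila.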
The fine analysis of centromeres in a number of species (insects, plants, fungi) has revealed a typical interspersion of satellite sequences consisting of (short) tandem repeats and transposable elements (mainly retroelements). This is true for the centromeres of several insects (Stratikopoulos et al., 2002; Sun et al., 2003) and for at least one fungus, Neurospora crassa, in which several degenerate retroelements have been found in the centromere of chromosome 7 (Cambareri et al., 1998). Some of these elements are found exclusively in these centromeric regions, such as the highly conserved Ty3/gypsy-like CR retroelements from cereal centromeres (Zhong et al., 2002). Similar organizations are also found in the heterochromatic knobs of plants. For instance, the resolution of the A. thaliana knob of chromosome 5 revealed a tandem repeat of 2.2 kb units embedded within a transposon-rich region including Athila elements (Tabata et al., 2000). In maize, large blocks of high copy-number LTR retrotransposons, which make up to 50 % of the genome, are located between genes, in a presumably methylated and heterochromatinized state, although they have not been reported as heterochromatin (SanMiguel et al., 1998; Bennetzen, 2000). In contrast, maize DNA transposons are mainly found in unmethylated, genetically active euchromatic regions (Bennetzen, 2000). In vertebrates with small genomes, such as that of Tetraodon nigroviridis, most TEs appear clustered in the heterochromatic part of the chromosomes (Dasilva et al., 2002). Pericentromeric regions are also transposon rich, but contain both classes of elements, and display no simple repeats.

These organization patterns, apparently widespread among multicellular eukaryotes, raise the question of the involvement of retroelements in centromeric functions (Dawe, 2003), and more generally of the relationships between transposable elements and heterochromatin. It was shown that transposon tandem arrays (see Marsano et al., 2003, for Bari-1 and Dorer and Henikoff, 1994, for P), and maybe LTR retroelement clusters in maize (SanMiguel et al., 1998), can lead to heterochromatinization. In A. thaliana, some non-autonomous transposons appear to consist of minisatellite-like repeats, and similarly, some satellite repeats arise from DNA transposons (Kapitonov and Jurka, 1999). One should also remember the high sequence similarities between transposases of the pogo-like superfamily and the centromeric CENP-B protein (Smit and Riggs, 1996). Conserved regions include a putative DNA binding domain as well as the DDE domain involved in transposase activity, suggesting that there could be a link between Class II elements and centromere functions. However, neither localisation of transposases at centromeres nor sequence similarity between ITRs (DNA binding sites of the transposase) and the CENP box (DNA binding site of CENP-B) has ever been observed.

To summarize, although they are not yet well understood, the relationships between heterochromatin, TEs and centromeres, and their connection with epigenetics, are obvious. Curiously, the involvement of TEs concerns both classes, but apparently in different ways.
Fig. 2. Each of the three models (rectangles) predicts that TEs accumulate in defined regions of the chromosomes (octagons). However, heterochromatin possesses characteristics of these three regions. [Diagram labels. Models: ectopic recombination, gene disruption, TE products poisoning. Regions: low recombination rate region, low gene density region, low expression level region. Centre: heterochromatin.]

Intragenic versus intergenic insertion

As mentioned above, many elements (notably Class I elements in plants) are found in the gene-poor, heterochromatic fractions of the genome, but not all. It has long been known that some families, belonging to Class I or Class II, tend to be found close to genes. For these elements, insertions must occur in regions where their deleterious effects are limited, such as intergenic regions or introns. Wong et al. (2000) propose that insertions occur (or are maintained) mainly in introns in animals, but preferentially in intergenic sequences in plants. Is this difference due to the nature of the repeated elements or to the different properties of plant and animal genomes (such as different permissivity in different regions)? It may also just reflect two distinct strategies for accepting numerous repeats while minimizing their deleterious effects, each adopted once during evolution and then conserved. In yeast, where there are only LTR-retroelements, several different scenarios are observed. Ty5 elements tend to insert near telomeric inactive DNA, whereas others target the 5′ position of tRNA genes, probably because of the open structure of these regions (Kim et al., 1998). This shows that insertion specificity is clearly not related to the class of elements, but rather to specific families.

Plant genomes usually contain few insertions within introns, although there are several instances involving MITE elements (El Amrani et al., 2002; Santiago et al., 2002; Wright et al., 2003). These short elements prefer the flanking regions of genes (5′ or 3′). They are also fairly often found integrated into other MITE copies (Jiang and Wessler, 2001). Other DNA transposons (Mu, Ac or P) also target genes, as do some Class I elements: the SINE S1 in Arabidopsis (Lenoir et al., 2001) or the LTR retroelement Tos7 in rice (Miyao et al., 2003). In maize, the Adh1-F region is probably the most striking example of the intergenic accumulation of LTR elements, which are found in clusters over a length of several tens of kilobases (SanMiguel et al., 1996). These LTR retroelements correspond to high copy-number families. Curiously, DNA transposons and MITEs are excluded from these regions, and tend to occur either between LTR retroelement clusters and genes, or within introns (Tikhonov et al., 1999). In this region, MITEs have higher copy numbers than LTR-retroelements. In barley (Hordeum vulgare), LTR retroelements appear to be similarly organized in intergenic nests (SanMiguel et al., 2002).

In Drosophila, TEs have not invaded the introns of euchromatic genes. However, the opposite tendency was observed when analyzing heterochromatic introns, in which TE density was 450-fold higher than in euchromatic introns. It was found that more than 50 % of heterochromatic introns (which are larger than the euchromatic ones) originated from TEs (Dimitri et al., 2003). These intronic TE sequences are degenerated and most are Class II elements, like those observed in heterochromatic chromosome 4.

In humans, the different retroelement subclasses have distinctive distributions. Alu SINEs are reported to insert into GC-rich regions whereas L1 elements prefer AT-rich regions, and LTR
retroelements display intermediate patterns (Medstrand et al., 2002). Moreover, L1 and LTR elements are excluded from the genes, whereas Alu elements are over-represented near genes, and even within genes. Alu elements in introns are mainly inserted in the middle of the introns, avoiding their extremities, where they could have deleterious effects (Majewski and Ott, 2002). Curiously, the distribution of Alu within genes is further biased, and involves only some gene categories (proteins involved in metabolism, transport and signaling functions). This suggests that Alu may have a potential role in gene regulation, perhaps via epigenetic modulations (Grover et al., 2003). It has also been reported that 5 % of spliced exons come from Alu-derived alternative splicing (Lev-Maor et al., 2003). Nekrutenko and Li (2001) detected TEs within 4 % of exons of human protein-coding regions. Alu represented half of these occurrences, followed by L1 and LTR retroelements; DNA transposons accounted for less than 9 % of these TE-containing exons, which is not negligible if we take into account the small percentage of DNA transposons in the human genome.

Recombination rate

As described above, TEs are usually found in particular compartments on the chromosomes. Three models have been put forward to explain this situation (see Nuzhdin, 1999 for review, and Fig. 2). The ectopic recombination model states that TEs will accumulate in the low recombination rate regions as a passive consequence of their elimination from high recombination rate regions, where ectopic recombination, supposed to be intense, would have more deleterious effects. The insertion model suggests a higher elimination of TEs from high gene density regions, because of the obvious deleterious effect of insertion within genes. Finally, the third model assumes that high expression of TE-encoded products (DNA modifying enzymes) will be costly for the cell and could have negative consequences on genome integrity. It therefore predicts that TEs will be eliminated from regions of high expression level.

It is frequently found that TEs accumulate in low recombination regions, which at first glance seems to support the first model. However, low recombination regions usually correspond to heterochromatin regions found near the centromere or telomere, where genes are known to be rare and poorly expressed. Hence discriminating between the different models is actually more complex. In Drosophila, one study found that TE abundance was more strongly associated with a low recombination rate than with low gene density, whatever the element type (Bartolome et al., 2002), which supports the ectopic recombination model. However, other reports do not demonstrate any clear correlation between the recombination rate and TE abundance, except for DNA transposons (Rizzon et al., 2002). In A. thaliana, an overall negative correlation between recombination rate and transposon density was observed, but this relied mainly on the centromere-specific Athila and CACTA-like elements (Wright et al., 2003). When only intergenic insertions were considered, a slight positive correlation was found, similar to the situation observed in C. elegans. In this worm, gene density is negatively correlated with recombination rate. TEs were found to be excluded from the central gene-rich regions characterized by a low recombination rate. However, this is not viewed as evidence of a direct correlation between gene density and TE insertions that would support the gene-disruption model (Duret et al., 2000). As observed in Drosophila, a significant correlation was found only for DNA elements.
However it remains to be established whether distribution of DNA transposons is really more affected by these factors (gene density or recombination rate) than Class I, and if so, by which mechanisms. The different models all rely on the effect of natural selection to explain the observed distribution. Gene density, chromatin structure, and recombination rate are viewed as factors that influence the persistence of an element at one site, meaning that a TE could first jump anywhere, but would only remain at positions where it produces little deleterious effect. The relative contribution of each of these factors is far from clear, and other factors, such as DNA accessibility, could also directly influence TE insertion preference. Finally, the previously described striking differences, observed between genomes as well as between transposon families, reflect the complexities and influences of the long-term interactions between genome and TEs.
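Correlations of the kind discussed in this section are usually computed over genomic windows. As a rough illustration only, the Python sketch below computes a Spearman rank correlation between a per-window recombination rate and a per-window TE density. The window values, the variable names and the choice of Spearman's coefficient are assumptions made for this example; they are not data or methods taken from the studies cited above.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-window values, for illustration only (not real data).
    recombination_rate = [0.1, 0.4, 0.9, 1.5, 2.2, 3.0]   # e.g. cM/Mb
    te_density = [40, 35, 22, 25, 10, 8]                   # e.g. TE copies per window
    print(spearman(recombination_rate, te_density))
```

A negative coefficient would correspond to TEs being denser where recombination is lower, but, as stressed above, such a correlation alone cannot distinguish between the three models, because recombination rate, gene density and expression level covary along the chromosomes.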

Age-related distribution of TEs

The last point that should be highlighted is how TEs are distributed according to age. In humans, the various retroelement families display different patterns of insertion relative to GC content. For most of them, an age-related trend is perceptible. For example, young Alu (belonging to a few polymorphic, and hence recent, subfamilies) tend to be found in AT-rich regions, whereas older ones (from families showing more polymorphism) are located in GC-rich regions. LTR retroelements appear to be excluded from gene regions, but here too, exclusion seems to be more marked for older LTR elements, probably because of their deleterious effects on gene transcription (Medstrand et al., 2002). For the Emigrant MITEs of Arabidopsis, subfamilies of different ages display differing integration patterns, with older families tending to be located close to genes, and more recent ones further from them (Santiago et al., 2002). Finally, in Drosophila, the young LINEs are found more randomly distributed than the older ones, which are concentrated in heterochromatin (Blumenstiel et al., 2002). Once again, these data suggest that the apparent location of TEs could result from selection against insertions in euchromatin (more harmful) rather than from a true insertion preference.

Potential impact in the cell (domestication)

Regulatory roles

Human Alu elements can also affect gene expression when inserted in regulatory sequences. On the one hand, a recent analysis of human promoters revealed that 25 % of 5′ promoters contained TE-derived sequences (Jordan et al., 2003). Most of them are Alu sequences, followed by LINEs. Very few examples were found of LTRs present in 5′ promoters. This finding is in agreement with the distribution reported by Medstrand et al. (2002), who observed that Alu elements were preferentially located close to genes, unlike LTR retroelements, which are usually found several kb away from genes. On the other hand, numerous examples of TEs inserted in 3′ UTRs have been reported. Although the presence of regulatory elements in 3′ UTRs is well known, the abundance of TEs could result from little or no selection against such insertions. In the sequenced S. pombe strain, the same tendency was observed for solo LTRs to be located within 500 bp of transcription starts of RNA Pol II promoters (Bowen et al., 2003). In plants, there are numerous cases of retroelements or MITEs associated with genes (Mao et al., 2000), and a change in gene expression has sometimes been reported (Wessler et al., 1995). The MITE Tourist-C was recently found to be strongly associated with genes expressed in flowers (Iwamoto and Higo, 2003). White et al. (1994) reported the presence of copia-like retroelements near several normal plant genes, suggesting that they had acquired a functional role. Ancient transposon recruitment for a regulatory role is well documented in animals, and has been reviewed by Britten (1997). These examples mainly involved Class I elements (LTR, LINEs or SINEs). The presence of particular motifs (binding motifs for regulation factors, RNA pol II signals) further supports the acquisition of a functional role. Again, the most convincing examples come from animal Class I elements (Tomilin, 1999; Jordan et al., 2003).
For Class II elements, regulatory motifs have been detected within a MITE in Arabidopsis (El Amrani et al., 2002), but better evidence remains to be found for MITEs.


Insulator/MARs

Besides their regulatory impact on transcription (functions such as enhancer, silencer, cryptic promoter, tissue specificity and polyadenylation signaling), TEs are known or thought to have been used by the genome for a number of other functions, the best known of which are telomerase activity (Class I) and immunoglobulin recombination (Class II). A potential role in higher-order chromatin organization is now emerging. Indeed, several reports suggest that TEs could serve as Matrix Attachment Regions (MARs). MARs are DNA structural elements that bind to the nuclear matrix and thereby divide the genome into distinct chromatin loops. MARs could thus participate in gene regulation by isolating a gene and its regulatory elements from adjacent genes, and also by modifying DNA accessibility (Holmes-Davis and Comai, 1998). The multiple roles of MARs (insulator, silencer, enhancer, replication) are still being debated. MAR sequences are defined by their ability to bind to the nuclear matrix (at least in vitro). These sequences are usually AT-rich, but AT-richness alone is not sufficient to explain the binding ability. A number of nucleotide motifs that recurrently occur in MARs have been identified, but no clear correlation between their abundance within a MAR sequence and its binding strength has been demonstrated, except for the so-called 90 %-AT-Box (Michalowski et al., 1999). Recently, several reports have identified transposable elements as being part of MARs, located close to MARs or having suspected MAR functions. This MAR activity has yet to be demonstrated in most cases, but it is clear that the candidate elements belong to both Class I (LINEs, SINEs, LTR retroelements) and Class II (MITEs). In cereals, numerous sequences located outside the genes or within introns bind to the nuclear matrix.
MITE elements have been shown to co-localize on the same fragments as MARs (Avramova et al., 1998; Tikhonov et al., 1999), and to bind to isolated nuclear matrices (Tikhonov et al., 2000). Another example of a potential MAR activity is found in the Drosophila gypsy retrovirus. The insulator activity of gypsy, mediated by an Su(Hw) binding site has been recently shown to create chromatin domains (Byrd and Corces, 2003), and this reinforces the finding that a MAR co-localizes with this site (Nabirochkin et al., 1998). In humans, an analysis of the TEs located near genes revealed the presence of numerous MAR motifs in Alu elements and LINEs (Jordan et al., 2003). This confirmed a previous detailed analysis of a human locus in which MARs were found to consist mainly of copies of Class I elements (LINEs, Rollini et al., 1999). Hence, a MAR role does not seem to be restricted to a particular type of element. However, stronger evidence is still required to confirm the involvement of TEs in the important (but still unclear) MAR function.

Dynamics: copy-and-paste versus cut-and-paste

Transposable element activity can lead to an increase in genome size (Kalendar et al., 2000), or to the occurrence of new genetic diversity (Mackay, 1985). These phenomena may have some positive consequences for the host (e.g. Schlenke and Begun, 2004), but their adaptive evolutionary impact is still poorly understood. On the one hand, their universality suggests that they are basic components of genome evolution, but, on the other hand, their activity often induces deleterious mutations or chromosomal abnormalities (McDonald, 1993). These latter observations can lead to the idea that a transposable element is able to spread into a genome by its own duplication ability, independently of any beneficial effect on its host. In this case, TEs can be considered as selfish DNA sequences (Dawkins, 1976; Doolittle and Sapienza, 1980; Orgel and Crick, 1980). Further modeling studies have shown that these sequences can spread even if they have deleterious effects on their host (Hickey, 1982; Charlesworth and Charlesworth, 1983). Among the TEs hosted by a species, those that are overrepresented should have been at one point the most efficient ones in invading a genome, multiplying and persisting within it, at least long enough for their observation. This intragenomic selection would promote the invasion ability of TEs, and the distinct transposition mechanisms of Class I and Class II elements should therefore correspond to two different parasitic strategies.

Transposition rate

Transposition is the process by which elements move into the genome and may increase their copy number. The transposition rate counts all new insertion events. Allowance must also be made for the loss rate (excision rate), which is the frequency of loss of a copy from any locus. The effective rate of copy number increase, which reflects the real invasion ability of an element, is calculated by subtracting the loss rate from the transposition rate. Class II elements transpose using the cut-and-paste mechanism, in which excision is the first and obligatory step of each transposition event. As the excised element is not always reinserted, a transposition event should apparently always lead to no gain or to a loss.
However, the element may be duplicated if the excision occurs on replicated DNA, or if the DNA break left after excision is repaired using a homologous template-dependent process (Brookfield, 1995). In contrast, retrotransposition is described as a copy-and-paste mechanism, since the transcribed copy remains at the donor locus. Numerous RNA molecules can be produced and reverse transcribed, and each one potentially represents a new insertion. “Excision” can occur by random deletion and can be enhanced by intra-element recombination, as a consequence of the homology of LTRs. The retrovirus replication cycle is known to be fast and productive, and a retrotransposon like copia is able to produce hundreds of virus-like particles (VLPs) per cell under particular conditions (Nuzhdin et al., 1996). To summarize, whereas we could say that the “non-loss” of a copy at a given position for a Class II element is accidental, the reverse seems to be true for a retroelement (for which the loss of a copy is an exceptional event). Furthermore, Class I elements, and particularly LTR-retroelements, seem to be better self-replicators, whereas DNA elements only increase their copy number accidentally. This suggests that the Class I copy-and-paste mechanism of invasion should be more efficient than the Class II cut-and-paste strategy.
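The bookkeeping described above (effective rate of increase = transposition rate minus loss rate) can be made concrete with a short sketch. The Python code below assumes, purely for illustration, constant per-copy rates per generation and no selection, regulation or drift; the function names and the numerical values are hypothetical and are not estimates taken from the literature.

```python
def net_increase_rate(transposition_rate, loss_rate):
    """Effective per-copy rate of copy number increase per generation."""
    return transposition_rate - loss_rate


def project_copy_number(initial_copies, transposition_rate, loss_rate, generations):
    """Expected copy number after a given number of generations.

    Assumes constant per-copy rates and ignores selection, regulation and drift.
    """
    growth_factor = 1.0 + net_increase_rate(transposition_rate, loss_rate)
    copies = float(initial_copies)
    for _ in range(generations):
        copies *= growth_factor
    return copies


if __name__ == "__main__":
    # Hypothetical rates (events/copy/generation), for illustration only.
    u = 1e-3  # transposition rate
    v = 1e-4  # loss (excision or deletion) rate
    print(net_increase_rate(u, v))
    print(project_copy_number(10, u, v, 1000))
```

Under these assumptions the copy number grows whenever the transposition rate exceeds the loss rate, whatever the class of the element; what differs between a cut-and-paste and a copy-and-paste element is only how each rate arises mechanistically.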

However, in fact, the observed transposition rates seem to be similar for elements of both classes. Several studies of D. melanogaster report transposition rates ranging from 10⁻⁵ to 10⁻³ events/copy/generation under normal conditions for retroelements (Charlesworth et al., 1992; Nuzhdin and Mackay, 1995; Suh et al., 1995; Maside et al., 2000), about 10⁻² after heat shocks (Vasilyeva et al., 1999) and up to 10⁻¹ in dysgenic crosses (Seleme et al., 1999). The same orders of magnitude are reported for Class II elements, in normal and dysgenic conditions (Eggleston et al., 1988; Biémont, 1994). Although they remain exceptional and hardly quantifiable events, excisions seem to occur more frequently for Class II elements. It is now generally thought that excision rates are at least one order of magnitude below transposition rates (Eggleston et al., 1988; Suh et al., 1995). These data suggest that in vivo, Class I and Class II elements do not exhibit different transposition capacities, at least in Drosophila. These data are in complete agreement with what is observed in the Drosophila genome, where there are no differences between the average copy number of Class I and Class II families (Kaminker et al., 2002). Similar studies on other species are rare, although it has been shown that in maize, Ac and Mu (Raizada et al., 2001) transpose more than they excise (Bennetzen, 2000). However, the interpretation of these data highlights some problems. On one hand, the estimated rates are not very accurate, partly due to the small number of events observed, and partly because the different methods used for transposition detection can give different results (Maside et al., 2001). On the other hand, the implicit hypothesis that transposition rates are constant for each family is manifestly wrong. Firstly, transposition rates measured under standard conditions are far below the maximum rates observed during transposition bursts.
This suggests that, for a given family, amplification is able to vary by several orders of magnitude following environmental (stress) or genomic (regulation or deregulation) events. Secondly, the activity of an element usually differs between laboratory strains of the same species (Maside et al., 2001). For example, the copia transposition rate can vary from 2 × 10⁻⁴ to 1.9 × 10⁻² in D. melanogaster depending on the genomic copia copy number (Nuzhdin et al., 1996). It is suspected that it may also differ from that in natural populations, although there is less evidence for this (Vieira and Biémont, 1997). Finally, amplification bursts could be precisely restricted to rare permissive genetic contexts, such as dysgenic individuals. In this case, the transposition rates observed in a population reflect the mean activity of a majority of genomes, where transposition is strongly restricted, and a few individuals predisposed to a transposition burst (Nuzhdin, 1999).

Selective impact on the genome

TEs are known to induce negative effects on the fitness of their host (e.g., Mackay, 1986). These effects are thought to be at least partly responsible for the apparent limitation in the TE copy number for each family (Charlesworth and Charlesworth, 1983). Several non-exclusive models have been proposed to explain the causes of the deleterious effects of TEs (Nuzhdin, 1999, for review and see also Fig. 2). First, as shown above, a TE can insert close to or into a gene, leading to its inactivation or dysfunction (the deleterious insertion model). Secondly, several repeated sequences dispersed in a genome may recombine, leading to chromosomal translocations, inversions or deletions (the ectopic recombination model). Thirdly, TE transposition activity may have a direct negative effect on the host (the deleterious transposition model). Would Class I and Class II elements have the same consequences for these different modes of selection?

The consequences of an insertion for gene expression (according to the deleterious insertion model) are particularly diverse, and make it difficult to evaluate the average cost in fitness for each class. First, we have to distinguish between the total inactivation of a gene and changes in the expression pattern or in expression level, with the possibility that the balance may be shifted in the context of alternative splicing, or alternative transcription starts. For any element, an insertion within a coding sequence will clearly have a drastic effect on gene activity. However, it seems likely that in the case of introns, short elements will be better tolerated than long elements. Finally, the sequence of the element itself may influence gene expression. Evidence is accumulating that TEs may play a significant role in gene regulation (Labrador and Corces, 1997; Jordan et al., 2003). However, how the insertion into a promoter of an LTR retroelement (which contains strong regulatory and promoter sequences) differs from that of a Class II element is still an open question.

The two classes of element could also have different effects on ectopic recombination (ectopic recombination model). The size difference between retroelements and transposons suggests that, with the same copy number, a long Class I TE could induce more ectopic recombinations than a short Class II element.
Some findings suggest that this may be the case, since long LINE sequences seem to be more sensitive than shorter ones to selective pressure against ectopic recombination (Petrov et al., 2003). Thus, if elimination by ectopic recombination is a major process involved in TE loss, and if the influence of the size of the repeated element is strong enough, we could expect that long TEs, such as retrotransposons, would be eliminated by natural selection to a greater extent than the shorter Class II elements. This could be a good explanation of why several million copies of some Class I elements may be found in species where ectopic recombination is limited, such as mammals, whereas in high recombination rate species, such as D. melanogaster, copy numbers never exceed a few tens (Eickbush and Furano, 2002).

The last hypothesis of how TEs affect the fitness of their host is the suggestion that the deleterious effects could be directly caused by the transposition process itself, rather than by the presence of TEs in the genome (deleterious transposition model). In this case, we could expect some differences between Class I and Class II elements. For example, a harmful effect of Class II elements could result from chromosomal breaks triggered upon intense transposition. This is not expected to occur with Class I elements. For (some) Class I elements, transposition could poison the cell, for example, by producing a high number of VLPs. For both classes, protein production could consume resources and/or titrate host factors. This hypothetical selection mechanism is quite attractive, since it increases the probability
that a mutant regulator element will become fixed (Brookfield, 1996; Quesneville and Anxolabéhère, 2001), as reported in natural populations (Black et al., 1987). However, there is still very little evidence to support this model, and it remains highly speculative.

The selection models described above all propose a possible explanation for the copy number limitation in each family. In such a situation, a genome could host a very large number of TE families. However, although the total number of TE copies that can be tolerated by a genome may be very high, it is not infinite. In addition to this intra-family selection, a TE might be subject to a more global selection process, at the genome level. Indeed, if each TE copy inserted into a genome causes a slight decrease of fitness, then the total TE copy number is limited by the genetic load that the host can bear. Moreover, the various TE families, despite having different transposition mechanisms, use the same cellular machineries. Consequently, the different TE families, even if they do not interact directly, are competing for a common limited resource (Leonardo and Nuzhdin, 2002), and their evolution in a common environment (i.e. the genome of their host) may be less independent than was previously thought.

Regulation

It is clear that transposition of Class I and Class II TEs is controlled by complex regulation systems. Generally, the number of copies within each family seems to be limited, with some notable exceptions (e.g. Alu sequences in the human genome). Depending on the TE family concerned, the limitation of the activity of the transposable elements can occur at least at two levels: during synthesis (of RNA, protein, or cDNA), and also during the transposition step (excision and reinsertion for Class II, insertion alone for Class I).
The amplification process may be limited by the element itself (self-regulation), for example by the production of transposition repressors (Lemaitre et al., 1993), or by the properties of the transposase molecule, such as OPI (Over-Production Inhibition, Lohe and Hartl, 1996). Some of the known regulation systems also involve mutant TE copies (Black et al., 1987). If an excessive amplification is deleterious, a regulatory element produced by mutation from an active copy can become fixed in a population as a beneficial allele (Brookfield, 1996; Quesneville and Anxolabéhère, 2001). Finally, the genome itself has developed diverse mechanisms to defend itself against TE invasion, such as RIP (Repeat Induced Point mutation) in Neurospora crassa (Cambareri et al., 1989), PTGS (Post-Transcriptional Gene Silencing) using RNA interference (Sijen and Plasterk, 2003), and epigenetic mechanisms (Matzke et al., 1999). In some cases, these limitations can be overcome, for instance in the case of hybrid dysgenesis. This syndrome, which occurs among the offspring of particular crosses in Drosophila, is associated with intense transposition in the germline, due to a deregulation of the element (Bregliano and Kidwell, 1983). This leads to reduced fertility as a result of increases in harmful insertions, in chromosomal rearrangements and possibly in DNA breaks. So far, only three elements have been shown to trigger dysgenesis in D. melanogaster: two Class II elements (P and hobo) and one Class I element (I).


The diversity of regulation mechanisms is related to the diversity of transposition processes. However, this diversity does not seem to mask any class-specific regulation ability: self-regulation is found in both classes, as is regulation by mutant copies (Craig et al., 2002). The sensitivity of TEs to genomic regulation seems to depend on their family (Lippman et al., 2003), but, once again, no Class I/Class II distinction is discernible. Even complex regulation systems, such as hybrid dysgenesis, are common to Class I and Class II TEs, and it seems unlikely that regulation specificities could be responsible for differing evolutionary successes.

Horizontal transfers

Several putative horizontal transfers (HTs) of TEs between distantly related species have been reported. This phenomenon occurs more frequently in TEs than in regular genes. In Drosophila, at least four different events have been suggested: two for the retroelements I (Abad et al., 1989) and Copia (Jordan et al., 1999), and two for the DNA elements P (Daniels et al., 1990b) and hobo (Daniels et al., 1990a). Since the first description of the transfer of the P element, probably from D. willistoni to D. melanogaster, many more suspected HTs have been suggested (Clark et al., 1995, 1998; Clark and Kidwell, 1997). However, are horizontal transfers more frequent for Class I or Class II elements?

Before we can attempt to answer this question, we must be able to detect HTs. In many cases, these transfers are highlighted by means of phylogenetic analyses, in particular when TE phylogenies are not congruent with those of the species. However, such inconsistencies must be interpreted carefully, since several alternative hypotheses can be proposed (Capy et al., 1994; Cummings, 1994). These hypotheses include long branch attraction, ancestral polymorphism, and the loss of elements in some species. A recent analysis clearly shows that the frequency of HTs differs in the main families of elements (Silva et al., 2004). More precisely, it looks as though the frequency of this phenomenon is related to the presence of a DNA intermediate during the transposition process. If this assumption is fulfilled, Class II elements should be more prone to HTs than Class I. An HT can be divided into two main steps: first the transfer from a donor species to a recipient one, and then the spread of the transferred TE within the recipient species (Kidwell, 1992). According to these authors, the main transposition mechanism used by Class II elements is more suitable for HTs. This assumption is based on the greater stability of DNA.
Similar arguments can be used to explain the differences observed between LTR and non-LTR elements. On the one hand, in LTR elements, the RNA is first reverse-transcribed to DNA, and only then can the insertion into the genome occur. On the other hand, in the case of non-LTR elements, the RNA must be transferred into the nucleus and the reverse transcription then occurs directly at the target site (see Eickbush and Malik, 2002, for a review), which eliminates any free DNA intermediate. This could explain why HT of non-LTR elements is so rare (Malik et al., 1999).

The vector of TEs involved in HT could be another source of differences between Class I and II. So far the steps involved in HT have not all been explained. Bacteria and viruses are frequently proposed as the putative vectors. In arthropods, Wolbachia could be an attractive candidate vector. Indeed, this endosymbiotic α-proteobacterium is found in about 15–20 % of arthropods, and is maternally transmitted. Moreover, several horizontal transfers of Wolbachia have been described between Drosophila species and their parasitoids (Vavre et al., 1999). Another example is provided by the baculoviruses. These viruses are specific to the Lepidoptera, and several parts of their genome are derived from their hosts. Moreover, numerous host TEs have been found integrated in the genome of these viruses, including the TED LTR retrotransposon and several Class II elements (see Capy et al., 1997 for references). These Class II elements can be excised, and HT could have occurred for one of them, piggyBac, between the lepidopteran Trichoplusia ni and the tephritid Bactrocera dorsalis (Handler and McCombs, 2000).

Conclusions

No striking differences have been detected between Class I and Class II elements regarding their distribution among the different compartments of the genome or their dynamics within and between species. Due to their mobility and their repetitive nature, both are involved in functional and structural events. Some elements seem to have been domesticated and thus recruited for new functions. Particular features depend more on the TE family or on the host genome than on the class of the element. As we have shown in this review, these two classes of elements differ in several aspects, but their fundamental impact on genomes has been quite similar over long evolutionary times.

The sequence similarities between Class I and Class II elements raise the following question: do they share a common origin? Several authors have raised this question during the last decade (see Capy et al., 1997 for a review), but answering it remains very difficult. On the one hand, it is clear that the integrase of Class I elements shares a common origin with the DDE transposase of Class II elements, but it may well be almost impossible to determine which derives from which, or whether both derive from a common ancestral sequence. On the other hand, non-DDE transposases probably have a different origin from that of the DDE transposases. As stressed by Capy and Maisonhaute (2002), it is surprising that all the components necessary to build some non-DDE transposases, such as the P element transposase, are present in prokaryotes (see Lerat et al., 1999 for more details). Therefore, some eukaryotic Class II elements could be an assemblage of prokaryotic domains, whereas others may result from the decay of retroelements and retroviruses. For the moment, it is impossible to provide a complete answer to the question of the origin of all the types of elements. It is generally assumed that TEs are combinations of modules, which could have independent evolutionary histories. This means that it is quite possible that some of them have a prokaryotic origin and that others derive from viruses. If this is the case, it would suggest that the differences and similarities observed between the two main classes of TE could be the result of their having a similar origin followed by different coevolutions within genomes, leading to different evolutionarily stable states.

Acknowledgements

The English text was reviewed by Monika Gosh. We would like to thank the anonymous reviewers for helpful comments.

References

Abad P, Vaury C, Pelisson A, Chaboissier MC, Busseau I, Bucheton A: A long interspersed repetitive element – the I factor of Drosophila teissieri – is able to transpose in different Drosophila species. Proc Natl Acad Sci USA 86:8887–8891 (1989). Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, et al: Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301–1310 (2002). Arkhipova I, Meselson M: Transposable elements in sexual and ancient asexual taxa. Proc Natl Acad Sci USA 97:14473–14477 (2000). Arkhipova IR, Pyatkov KI, Meselson M, Evgen’ev MB: Retroelements containing introns in diverse invertebrate taxa. Nat Genet 33:123–124 (2003). Avramova Z, Tikhonov A, Chen M, Bennetzen JL: Matrix attachment regions and structural colinearity in the genomes of two grass species. Nucleic Acids Res 26:761–767 (1998). Bailey JA, Carrel L, Chakravarti A, Eichler EE: Molecular evidence for a relationship between LINE-1 elements and X chromosome inactivation: the Lyon repeat hypothesis. Proc Natl Acad Sci USA 97:6634–6639 (2000).

Bartolome C, Maside X, Charlesworth B: On the abundance and distribution of transposable elements in the genome of Drosophila melanogaster. Mol Biol Evol 19:926–937 (2002). Bennetzen JL: Transposable element contributions to plant gene and genome evolution. Plant Mol Biol 42:251–269 (2000). Biedler J, Tu Z: Non-LTR retrotransposons in the African malaria mosquito, Anopheles gambiae: Unprecedented diversity and evidence of recent activity. Mol Biol Evol 20:1811–1825 (2003). Biémont C: Dynamic equilibrium between insertion and excision of P elements in highly inbred lines from an M strain of Drosophila melanogaster. J Mol Evol 39:466–472 (1994). Biémont C, Tsitrone A, Vieira C, Hoogland C: Transposable element distribution in Drosophila. Genetics 147:1997–1999 (1997). Black DM, Jackson MS, Kidwell MG, Dover GA: KP elements repress P-induced hybrid dysgenesis in Drosophila melanogaster. EMBO J 6:4125–4135 (1987). Blumenstiel JP, Hartl DL, Lozovsky ER: Patterns of insertion and deletion in contrasting chromatin domains. Mol Biol Evol 19:2211–2225 (2002). Boissinot S, Entezam A, Furano AV: Selection against deleterious LINE-1-containing loci in the human lineage. Mol Biol Evol 18:926–935 (2001).

Bowen NJ, Jordan IK, Epstein JA, Wood V, Levin HL: Retrotransposons and their recognition of pol II promoters: a comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe. Genome Res 13:1984–1997 (2003). Bregliano J-C, Kidwell MG: Hybrid dysgenesis determinants, in Shapiro JA (ed): Mobile Genetic Elements, pp 363–410 (Academic Press, New York 1983). Bringaud F, Garcia-Perez JL, Heras SR, Ghedin E, ElSayed NM, Andersson B, Baltz T, Lopez MC: Identification of non-autonomous non-LTR retrotransposons in the genome of Trypanosoma cruzi. Mol Biochem Parasitol 124:73–78 (2002). Britten RJ: Mobile elements inserted in the distant past have taken on important functions. Gene 205:177– 182 (1997). Brookfield JFY: Transposable elements as selfish DNA, in Sherratt D (ed): Mobile Genetic Elements, pp 131–153 (Oxford University Press, New York 1995). Brookfield JFY: Models and spread of non-autonomous selfish transposable elements when transposition and fitness are coupled. Genet Res Camb 67:199–209 (1996).

Byrd K, Corces VG: Visualization of chromatin domains created by the gypsy insulator of Drosophila. J Cell Biol 162:565–574 (2003). Cambareri EB, Jensen BC, Schabtach E, Selker EU: Repeat-induced G-C to A-T mutations in Neurospora. Science 244:1571–1575 (1989). Cambareri EB, Aisner R, Carbon J: Structure of the chromosome VII centromere region in Neurospora crassa: degenerate transposons and simple repeats. Mol Cell Biol 18:5465–5477 (1998). Capy P: Classification and nomenclature of retrotransposable elements. Cytogenet Genome Res 110:457–461 (2005). Capy P, Anxolabéhère D, Langin T: The strange phylogenies of transposable elements: are horizontal transfers the only explanation? Trends Genet 10:7– 12 (1994). Capy P, Bazin C, Higuet D, Langin T: Dynamics and Evolution of Transposable Elements. (Landes Biosciences, Austin 1997). Capy P, Maisonhaute C: Acquisition and loss of modules: the construction set of transposable elements. Genetika 38:719–726 (2002). Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, et al: Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 419:512–519 (2002). Charlesworth B, Charlesworth D: The population dynamics of transposable elements. Genet Res Camb 42:1–27 (1983). Charlesworth B, Lapid A, Canada D: The distribution of transposable elements within and between chromosomes in a population of Drosophila melanogaster. I. Element frequencies and distribution. Genet Res 60:103–114 (1992). Charlesworth B, Langley CH, Sniegowski PD: Transposable element distributions in Drosophila. Genetics 147:1993–1995 (1997). Clark JB, Kidwell MG: A phylogenetic perspective on P transposable element evolution in Drosophila. Proc Natl Acad Sci USA 94:11428–11433 (1997). Clark JB, Altheide TK, Schlosser MJ, Kidwell MG: Molecular evolution of P transposable elements in the genus Drosophila. I. The saltans and willistoni species groups. 
Mol Biol Evol 12:902–913 (1995). Clark JB, Kim PC, Kidwell MG: Molecular evolution of P transposable elements in the genus Drosophila. III. The melanogaster species group. Mol Biol Evol 15:746–755 (1998). Cline M, Liu G, Loraine AE, Shigeta R, Cheng J, Mei G, Kulp D, Siani-Rose MA: Structure-based comparison of four eukaryotic genomes. Pac Symp Biocomput 127–138 (2002). Craig NL, Craigie R, Gellert M, Lambowitz AM: Mobile DNA II (American Society for Microbiology Press, Washington DC 2002). Crollius HR, Jaillon O, Dasilva C, Ozouf-Costaz C, Fizames C, Fischer C, Bouneau L, Billault A, Quetier F, Saurin W, et al: Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res 10:939–949 (2000). Cummings MP: Transmission patterns of eukaryotic transposable elements: arguments for and against horizontal transfer. Trends Ecol Evol 9:141–145 (1994). Daniels SB, Chovnick A, Boussy IA: Distribution of hobo transposable elements in the genus Drosophila. Mol Biol Evol 7:589–606 (1990a). Daniels SB, Peterson KR, Strausbaugh LD, Kidwell MG, Chovnick A: Evidence for horizontal transmission of the P transposable element between Drosophila species. Genetics 124:339–355 (1990b).

Dasilva C, Hadji H, Ozouf-Costaz C, Nicaud S, Jaillon O, Weissenbach J, Crollius HR: Remarkable compartmentalization of transposable elements and pseudogenes in the heterochromatin of the Tetraodon nigroviridis genome. Proc Natl Acad Sci USA 99:13636–13641 (2002). Dawe RK: RNA interference, transposons, and the centromere. Plant Cell 15:297–301 (2003). Dawkins R: The Selfish Gene (Oxford University Press, Oxford 1976). Dimitri P, Junakovic N, Arcà B: Colonization of heterochromatic genes by transposable elements in Drosophila. Mol Biol Evol 20:503–512 (2003). Doolittle WF, Sapienza C: Selfish genes, the phenotype paradigm and genome evolution. Nature 284:601– 603 (1980). Dorer DR, Henikoff S: Expansions of transgene repeats cause heterochromatin formation and gene silencing in Drosophila. Cell 77:993–1002 (1994). Duret L, Marais G, Biémont C: Transposons but not retrotransposons are located preferentially in regions of high recombination rate in Caenorhabditis elegans. Genetics 156:1661–1669 (2000). Eggleston WB, Johnson-Schlitz DM, Engels WR: P-M hybrid dysgenesis does not mobilize other transposable element families in Drosophila melanogaster. Nature 331:368–370 (1988). Eickbush TH, Furano AV: Fruit flies and humans respond differently to retrotransposons. Curr Opin Genet Dev 12:669–674 (2002). Eickbush TH, Malik HS: Origin and evolution of retrotransposons, in Craig NL, Craigie R, Gellert M, Lambowitz AM (eds): Mobile DNA II, pp 1111– 1146 (American Society for Microbiology Press, Washington DC 2002). El Amrani A, Marie L, Ainouche A, Nicolas J, Couee I: Genome-wide distribution and potential regulatory functions of AtATE, a novel family of miniature inverted-repeat transposable elements in Arabidopsis thaliana. Mol Genet Genomics 267:459–471 (2002). Feng Q, Moran JV, Kazazian HH Jr, Boeke JD: Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916 (1996). 
Feng Q, Zhang Y, Hao P, Wang S, Fu G, Huang Y, Li Y, Zhu J, Liu Y, Hu X, et al: Sequence and analysis of rice chromosome 4. Nature 420:316–320 (2002). Feschotte C, Mouches C: Evidence that a family of miniature inverted-repeat transposable elements (MITEs) from the Arabidopsis thaliana genome has arisen from a pogo-like DNA transposon. Mol Biol Evol 17:730–737 (2000). Feschotte C, Zhang X, Wessler SR: Miniature inverted-repeat transposable elements and their relationship to established DNA transposons, in Craig NL, Craigie R, Gellert M, Lambowitz AM (eds): Mobile DNA II, pp 1147–1158 (American Society for Microbiology Press, Washington DC 2002). Fillingham JS, Thing TA, Vythilingum N, Keuroghlian A, Bruno D, Golding GB, Pearlman RE: A non-long terminal repeat retrotransposon family is restricted to the germ line micronucleus of the ciliated protozoan Tetrahymena thermophila. Eukaryot Cell 3:157–169 (2004). Ganko EW, Fielman KT, McDonald JF: Evolutionary history of Cer elements and their impact on the C. elegans genome. Genome Res 11:2066–2074 (2001). Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, et al: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419:498–511 (2002). Glöckner G, Szafranski K, Winckler T, Dingermann T, Quail MA, Cox E, Eichinger L, Noegel AA, Rosenthal A: The complex repeats of Dictyostelium discoideum. Genome Res 11:585–594 (2001).

Glöckner G, Eichinger L, Szafranski K, Pachebat JA, Bankier AT, Dear PH, Lehmann R, Baumgart C, Parra G, Abril JF, et al: Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418:79–85 (2002). Grover D, Majumder PP, Rao CB, Brahmachari SK, Mukerji M: Nonrandom distribution of Alu elements in genes of various functional categories: insight from analysis of human chromosomes 21 and 22. Mol Biol Evol 20:1420–1424 (2003). Handler AM, McCombs SD: The piggyBac transposon mediates germ-line transformation in the Oriental fruit fly and closely related elements exist in its genome. Insect Mol Biol 9:605–612 (2000). Hickey DA: Selfish DNA: a sexually-transmitted nuclear parasite. Genetics 101:519–531 (1982). Holmes-Davis R, Comai L: Nuclear matrix attachment regions and plant gene expression. Trends Plant Sci 3:91–97 (1998). Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, et al: The genome sequence of the malaria mosquito Anopheles gambiae. Science 298:129–149 (2002). Hoskins RA, Smith CD, Carlson JW, Carvalho AB, Halpern A, Kaminker JS, Kennedy C, Mungall CJ, Sullivan BA, Sutton GG, et al: Heterochromatic sequences in a Drosophila whole-genome shotgun assembly. Genome Biol 3:Research0085 (2002). Hua-Van A, Davière JM, Kaper F, Langin T, Daboussi MJ: Genome organization in Fusarium oxysporum: clusters of class II transposons. Curr Genet 37:339–347 (2000). Iwamoto M, Higo K: Tourist C transposable elements are closely associated with genes expressed in flowers of rice (Oryza sativa). Mol Genet Genomics 268:771–778 (2003). Jiang N, Wessler SR: Insertion preference of maize and rice miniature inverted repeat transposable elements as revealed by the analysis of nested elements. Plant Cell 13:2553–2564 (2001). Jordan IK, Matyunina LV, McDonald JF: Evidence for the recent horizontal transfer of long terminal repeat retrotransposon. Proc Natl Acad Sci USA 96:12621–12625 (1999). 
Jordan IK, Rogozin IB, Glazko GV, Koonin EV: Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet 19:68–72 (2003). Junakovic N, Terrinoni A, Di Franco C, Vieira C, Loevenbruck C: Accumulation of transposable elements in the heterochromatin and on the Y chromosome of Drosophila simulans and Drosophila melanogaster. J Mol Evol 46:661–668 (1998). Kalendar R, Tanskanen J, Immonen S, Nevo E, Schulman AH: Genome evolution of wild barley (Hordeum spontaneum) by BARE-1 retrotransposon dynamics in response to sharp microclimatic divergence. Proc Natl Acad Sci USA 97:6603–6607 (2000). Kaminker JS, Bergman CM, Kronmiller B, Carlson J, Svirskas R, Patel S, Frise E, Wheeler DA, Lewis SE, Rubin GM, et al: The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective. Genome Biol 3:Research0084 (2002). Kapitonov VV, Jurka J: Molecular paleontology of transposable elements from Arabidopsis thaliana. Genetica 107:27–37 (1999). Kapitonov VV, Jurka J: Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA 98:8714– 8719 (2001). Kidwell MG: Horizontal transfer of P elements and other short inverted repeat transposons. Genetica 86:275–286 (1992). Kidwell MG: Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49–63 (2002).

Kim A, Terzian C, Santamaria P, Pelisson A, Prud’homme N, Bucheton A: Retroviruses in invertebrates: the gypsy retrotransposon is apparently an infectious retrovirus of Drosophila melanogaster. Proc Natl Acad Sci USA 91:1285–1289 (1994). Kim JM, Vanguri S, Boeke JD, Gabriel A, Voytas DF: Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res 8:464–478 (1998). Klobutcher LA, Jahn CL: Developmentally controlled genomic rearrangements in ciliated protozoa. Curr Opin Genet Dev 1:397–403 (1991). Kunst F, Ogasawara N, Moszer I, Albertini AM, Alloni G, Azevedo V, Bertero MG, Bessieres P, Bolotin A, Borchert S, et al: The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390:249–256 (1997). Labrador M, Corces VG: Transposable element-host interactions: regulation of insertion and excision. Annu Rev Genet 31:381–404 (1997). Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al: Initial sequencing and analysis of the human genome. Nature 409:860–921 (2001). Laten HM, Majumdar A, Gaucher EA: SIRE-1, a copia/Ty1-like retroelement from soybean, encodes a retroviral envelope-like protein. Proc Natl Acad Sci USA 95:6897–6902 (1998). Le QH, Wright S, Yu Z, Bureau T: Transposon diversity in Arabidopsis thaliana. Proc Natl Acad Sci USA 97:7376–7381 (2000). Lemaitre B, Ronsseray S, Coen D: Maternal repression of the P element promoter in the germline of Drosophila melanogaster: a model for the P cytotype. Genetics 135:149–160 (1993). Lenoir A, Lavie L, Prieto JL, Goubely C, Cote JC, Pelissier T, Deragon JM: The evolutionary origin and genomic organization of SINEs in Arabidopsis thaliana. Mol Biol Evol 18:2315–2322 (2001). Leonardo TE, Nuzhdin SV: Intracellular battlegrounds: conflict and cooperation between transposable elements. Genet Res 80:155–161 (2002).
Lerat E, Brunet F, Bazin C, Capy P: Is the evolution of transposable elements modular? Genetica 107:15–25 (1999). Lev-Maor G, Sorek R, Shomron N, Ast G: The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons. Science 300:1288–1291 (2003). Lippman Z, May B, Yordan C, Singer T, Martienssen R: Distinct mechanisms determine transposon inheritance and methylation via small interfering RNA and histone modification. PLoS Biol 1:420–428 (2003). Lohe AR, Hartl DL: Autoregulation of mariner transposase activity by overproduction and dominant negative complementation. Mol Biol Evol 13:549–555 (1996). Mackay TF: Transposable element-induced response to artificial selection in Drosophila melanogaster. Genetics 111:351–374 (1985). Mackay TFC: Transposable element-induced fitness mutations in Drosophila melanogaster. Genet Res 48:77–87 (1986). Majewski J, Ott J: Distribution and characterization of regulatory elements in the human genome. Genome Res 12:1827–1836 (2002). Malik HS, Burke WD, Eickbush TH: The age and evolution of non-LTR retrotransposable elements. Mol Biol Evol 16:793–805 (1999). Mao L, Wood TC, Yu Y, Budiman MA, Tomkins J, Woo S, Sasinowski M, Presting G, Frisch D, Goff S, et al: Rice transposable elements: a survey of 73,000 sequence-tagged-connectors. Genome Res 10:982–990 (2000).

Marsano RM, Moschetti R, Barsanti P, Caggese C, Caizzi R: A survey of the DNA sequences surrounding the Bari1 repeats in the pericentromeric h39 region of Drosophila melanogaster. Gene 307:167–174 (2003). Maside X, Assimacopoulos S, Charlesworth B: Rates of movement of transposable elements on the second chromosome of Drosophila melanogaster. Genet Res 75:275–284 (2000). Maside X, Bartolome C, Assimacopoulos S, Charlesworth B: Rates of movement and distribution of transposable elements in Drosophila melanogaster: in situ hybridization vs Southern blotting data. Genet Res 78:121–136 (2001). Matzke MA, Mette MF, Aufsatz W, Jakowitsch J, Matzke AJM: Host defenses to parasitic sequences and the evolution of epigenetic control mechanisms. Genetica 107:271–287 (1999). McDonald J: Evolution and consequences of transposable elements. Curr Opin Genet Dev 3:855–864 (1993) Medstrand P, van de Lagemaat LN, Mager DL: Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res 12:1483–1495 (2002). Meyers BC, Tingey SV, Morgante M: Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res 11: 1660–1676 (2001). Michalowski SM, Allen GC, Hall GE Jr, Thompson WF, Spiker S: Characterization of randomly-obtained matrix attachment regions (MARs) from higher plants. Biochemistry 38:12795–12804 (1999). Miyao A, Tanaka K, Murata K, Sawaki H, Takeda S, Abe K, Shinozuka Y, Onosato K, Hirochika H: Target site specificity of the Tos17 retrotransposon shows a preference for insertion within genes and against insertion in retrotransposon-rich regions of the genome. Plant Cell 15:1771–1780 (2003). Nabirochkin S, Ossokina M, Heidmann T: A nuclear matrix/scaffold attachment region co-localizes with the gypsy retrotransposon insulator sequence. J Biol Chem 273:2473–2479 (1998). 
Nanda I, Haaf T, Schartl M, Schmid M, Burt DW: Comparative mapping of Z-orthologous genes in vertebrates: implications for the evolution of avian sex chromosomes. Cytogenet Genome Res 99:178–184 (2002). Nekrutenko A, Li WH: Transposable elements are found in a large number of human protein-coding genes. Trends Genet 17:619–621 (2001). Nitta N, Farman ML, Leong SA: Genome organization of Magnaporthe grisea: integration of genetic maps, clustering of transposable elements and identification of genome duplications and rearrangements. Theor Appl Genet 95:20–32 (1997). Nuzhdin SV: Sure facts, speculations, and open questions about evolution of transposable elements. Genetica 107:129–137 (1999). Nuzhdin SV, Mackay TF: The genomic rate of transposable element movement in Drosophila melanogaster. Mol Biol Evol 12:180–181 (1995). Nuzhdin SV, Pasyukova EG, Mackay TF: Positive association between copia transposition rate and copy number in Drosophila melanogaster. Proc R Soc Lond B Biol Sci 263:823–831 (1996). Orgel LE, Crick FH: Selfish DNA: the ultimate parasite. Nature 284:604–607 (1980). Petrov DA, Aminetzach YT, Davis JC, Bensasson D, Hirsh AE: Size matters: non-LTR retrotransposable elements and ectopic recombination in Drosophila. Mol Biol Evol 20:880–892 (2003). Quesneville H, Anxolabéhère D: Genetic algorithm-based model of evolutionary dynamics of class II transposable elements. J Theor Biol 213:21–30 (2001).

Quesneville H, Nouaud D, Anxolabéhère D: Detection of new transposable element families in Drosophila melanogaster and Anopheles gambiae genome. J Mol Evol 57:1–10 (2003). Raizada MN, Brewer KV, Walbot V: A maize MuDR transposon promoter shows limited autoregulation. Mol Genet Genomics 265:82–94 (2001). Rice Chromosome 10 Sequencing Consortium: In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300:1566–1569 (2003). Rizzon C, Marais G, Gouy M, Biémont C: Recombination rate and the distribution of transposable elements in the Drosophila melanogaster genome. Genome Res 12:400–407 (2002). Rollini P, Namciu SJ, Marsden MD, Fournier RE: Identification and characterization of nuclear matrix-attachment regions in the human serpin gene cluster at 14q32.1. Nucleic Acids Res 27:3779–3791 (1999). SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, et al: Nested retrotransposons in the intergenic regions of the maize genome. Science 274:765–768 (1996). SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL: The paleontology of intergene retrotransposons of maize. Nat Genet 20:43–45 (1998). SanMiguel PJ, Ramakrishna W, Bennetzen JL, Busso CS, Dubcovsky J: Transposable elements, genes and recombination in a 215-kb contig from wheat chromosome 5A(m). Funct Integr Genomics 2:70–80 (2002). Santiago N, Herraiz C, Goni JR, Messeguer X, Casacuberta JM: Genome-wide analysis of the Emigrant family of MITEs of Arabidopsis thaliana. Mol Biol Evol 19:2285–2293 (2002). Sasaki T, Matsumoto T, Yamamoto K, Sakata K, Baba T, Katayose Y, Wu J, Niimura Y, Cheng Z, Nagamura Y, et al: The genome sequence and structure of rice chromosome 1. Nature 420:312–316 (2002). Schlenke TA, Begun DJ: Strong selective sweep associated with a transposon insertion in Drosophila simulans. Proc Natl Acad Sci USA 101:1626–1631 (2004).
Seleme M, Busseau I, Malinsky S, Bucheton A, Teninges D: High-frequency retrotransposition of a marked I factor in Drosophila melanogaster correlates with a dynamic expression pattern of the ORF1 protein in the cytoplasm of oocytes. Genetics 151:761–771 (1999). Sijen T, Plasterk RH: Transposon silencing in the Caenorhabditis elegans germ line by natural RNAi. Nature 426:310–314 (2003). Silva JC, Loreto EL, Clark JB: Factors that affect the horizontal transfer of transposable elements. Curr Issues Mol Biol 6:57–71 (2004). Smit AF, Riggs AD: Tiggers and DNA transposon fossils in the human genome. Proc Natl Acad Sci USA 93:1443–1448 (1996). Stratikopoulos EE, Augustinos AA, Gariou-Papalexiou A, Zacharopoulou A, Mathiopoulos KD: Identification and partial characterization of a new Ceratitis capitata-specific 44-bp pericentromeric repeat. Chromosome Res 10:287–295 (2002). Suh DS, Choi EH, Yamazaki T, Harada K: Studies on the transposition rates of mobile genetic elements in a natural population of Drosophila melanogaster. Mol Biol Evol 12:748–758 (1995). Sun X, Le HD, Wahlstrom JM, Karpen GH: Sequence analysis of a functional Drosophila centromere. Genome Res 13:182–194 (2003). Surzycki SA, Belknap WR: Repetitive-DNA elements are similarly distributed on Caenorhabditis elegans autosomes. Proc Natl Acad Sci USA 97:245–249 (2000).

Tabata S, Kaneko T, Nakamura Y, Kotani H, Kato T, Asamizu E, Miyajima N, Sasamoto S, Kimura T, Hosouchi T, et al: Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana. Nature 408:823–826 (2000). Tettelin H, Masignani V, Cieslewicz MJ, Eisen JA, Peterson S, Wessels MR, Paulsen IT, Nelson KE, Margarit I, Read TD, et al: Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae. Proc Natl Acad Sci USA 99:12391– 12396 (2002). The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815 (2000). The C. elegans Sequencing Consortium: Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012–2018 (1998). Tikhonov AP, SanMiguel PJ, Nakajima Y, Gorenstein NM, Bennetzen JL, Avramova Z: Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc Natl Acad Sci USA 96:7409– 7414 (1999).

Tikhonov AP, Bennetzen JL, Avramova ZV: Structural domains and matrix attachment regions along colinear chromosomal segments of maize and sorghum. Plant Cell 12:249–264 (2000). Tomilin NV: Control of genes by mammalian retroposons. Int Rev Cytol 186:1–48 (1999). Vasilyeva LB, Bubenshchikova EV, Ratner VA: Heavy heat shock induced retrotransposon transposition in Drosophila. Genet Res 74:111–119 (1999). Vavre F, Fleury F, Lepetit D, Fouillet P, Bouletreau M: Phylogenetic evidence for horizontal transmission of Wolbachia in host-parasitoid associations. Mol Biol Evol 16:1711–1723 (1999). Vieira C, Biémont C: Transposition rate of the 412 retrotransposable element is independent of copy number in natural populations of Drosophila simulans. Mol Biol Evol 14:185–188 (1997). Volff JN, Korting C, Froschauer A, Sweeney K, Schartl M: Non-LTR retrotransposons encoding a restriction enzyme-like endonuclease in vertebrates. J Mol Evol 52:351–360 (2001). Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562 (2002). Wessler SR, Bureau TE, White SE: LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr Opin Genet Dev 5:814–821 (1995).

White SE, Habera LF, Wessler SR: Retrotransposons in the flanking regions of normal plant genes: a role for copia-like elements in the evolution of gene structure and expression. Proc Natl Acad Sci USA 91:11792–11796 (1994). Wong GK, Passey DA, Huang Y, Yang Z, Yu J: Is “junk” DNA mostly intron DNA? Genome Res 10:1672–1678 (2000). Wright SI, Agrawal N, Bureau TE: Effects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana. Genome Res 13:1897–1903 (2003). Zhang X, Feschotte C, Zhang Q, Jiang N, Eggleston WB, Wessler SR: P instability factor: an active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases. Proc Natl Acad Sci USA 98:12572–12577 (2001). Zhong CX, Marshall JB, Topp C, Mroczek R, Kato A, Nagaki K, Birchler JA, Jiang J, Dawe RK: Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14:2825–2836 (2002).