J Mol Evol (1996) 42:359–368

© Springer-Verlag New York Inc. 1996

Relationships Between Transposable Elements Based Upon the Integrase-Transposase Domains: Is There a Common Ancestor?

Pierre Capy,1 Renaud Vitalis,1 Thierry Langin,2 Dominique Higuet,3 Claude Bazin1

1 Laboratoire Populations, Génétique et Evolution, CNRS, 91198 Gif/Yvette Cedex, France
2 Laboratoire de Cryptogamie, Bat. 400, Université Paris XI, 91405 Orsay Cedex, France
3 Laboratoire de Dynamique du Génome et Evolution, Institut J. Monod, 75251 Paris Cedex 05, France

Received: 3 July 1995 / Accepted: 9 October 1995

Abstract. The integrase domain of RNA-mediated elements (class I) and the transposase domain of DNA-mediated transposable elements (class II) were compared. A number of elements contain the DDE signature, which plays an important role in their integration. The possible relationships between mariner-Tc1 and IS elements, retrotransposons, and retroviruses were analyzed from an alignment of this region. The mariner-Tc1 superfamily on the one hand, and the LTR retrotransposons and retroviruses on the other, were each found to be monophyletic. However, the IS elements of bacteria were found in several groups. These results were used to propose an evolutionary history that suggests a common ancestor for some integrases and transposases.

Key words: Transposable elements; Evolution; Horizontal transfers and alternative hypotheses

Correspondence to: P. Capy

Introduction

Since the discovery of transposable elements (TE) in the 1950s and the 1960s by Barbara McClintock (see McClintock 1984 for references), many elements have been described in most of the prokaryotes and eukaryotes examined to date, from bacteria to vertebrates. These elements seem to be relatively well tolerated by their hosts and may play an important role in regulating structural genes, in the organization and evolution of genomes, and more generally in the adaptation of natural populations and species (see special issue of Genetica on transposable elements and evolution, McDonald 1992; White et al. 1994).

Several authors have examined the evolution of these elements by comparing the phylogenies of TEs with those of their host species. There were many inconsistencies between these two types of phylogenies (Daniels et al. 1990; Maruyama and Hartl 1991b; Robertson 1993; Capy et al. 1994b). These results have been interpreted by postulating horizontal transmission of TEs between species or by assuming ancestral TEs with particular evolutionary characteristics, such as variable evolutionary rates and/or ancestral polymorphism (Kidwell 1993; Capy et al. 1994a; Cummings 1994). However, these explanations are not mutually exclusive, since, while horizontal transfers are probably exceptional in some organisms (e.g., Drosophila), they can be frequent in bacteria (Hartl and Sawyer 1988).

Transposable elements have been divided into two main groups. Class I elements are TEs that move via an RNA intermediate and use a reverse transcriptase, while class II elements transpose via a DNA intermediate (Finnegan 1989). The conserved regions within each class have been compared and used to define families, such as the IS3 family (Schwartz et al. 1988), and superfamilies, such as the hAT superfamily, which includes the elements Ac of Zea mays, Tam3 of Antirrhinum majus, and hobo of Drosophila melanogaster (Calvi et al. 1991; Atkinson et al. 1993), and the mariner-Tc1 superfamily, which contains the mariner-like and Tc1-like elements of


several organisms, including Drosophila, nematodes, ciliates, and fungi (Doak et al. 1994; Langin et al. 1995; Robertson 1995).

Two of the main questions raised by the evolution of TEs are their origin and the relationship between the different families of elements. Xiong and Eickbush (1990) have proposed an evolutionary history of retrotransposons and retroviruses based on the similarities between their reverse transcriptase domains. These authors showed that retroviruses may be derived from retrotransposons by the acquisition of an envelope gene, while non-LTR retrotransposons could be on another evolutionary branch leading to the group II introns or the reverse transcriptases found in cell organelles such as mitochondria.

In order to check our suggestion that most elements may be derived from ancestral copies (Capy et al. 1994a), we have compared a large number of TEs belonging to different classes. We have analyzed the relationships between families or superfamilies of TEs without regard to the organisms in which they are detected. Thus, the fundamental question was: Can the similarities between families or superfamilies shed any light on their evolution and origin? If they can, then we may be able to determine whether these families are derived from the same ancestral element or are the result of convergence.

This study first compared the transposases of four main groups of class II elements: mariner-Tc1, IS, hAT, and P. We then compared the regions showing the greatest similarities to conserved regions of retrotransposon and retrovirus integrases (class I). There are some similarities between ISs and members of the mariner-Tc1 superfamily, but the two remaining groups of elements, the hAT and P superfamilies, show no similarity either between themselves or with the other groups.

These comparisons were used to prepare alignments of several elements based upon the DDE regions; the resulting trees were used to propose an evolutionary history of some groups of elements.

Table 1. Elements, hosts, and accession numbers in data bases

Element           Host                                        Accession number

IS2               Escherichia coli                            JQ0040 (a)
IS3               Escherichia coli                            X02311 (b)
IS4               Escherichia coli                            J01733 (b)
IS15–IS15D        Salmonella panama                           M12900 (b)
IS26–IS6          Proteus vulgaris                            X00011 (b)
IS30              Escherichia coli                            X00792 (b)
IS150             Escherichia coli                            X07037 (b)
IS231A            Bacillus thuringiensis                      X03397 (b)
IS240B            Bacillus thuringiensis                      J03315 (b)
IS426             Agrobacterium tumefaciens                   S37713 (a)
IS630             Shigella sonnei                             X05955 (b)
IS911             Shigella dysenteriae                        X17613 (b)
IS1086            Alcaligenes eutrophus                       X58441 (b)
IS1151            Clostridium perfringens                     Z18246 (b)
IS4351            Escherichia coli                            M17124 (b)

Bari1             Drosophila melanogaster                     X67681 (b)
HB1               Drosophila melanogaster                     X01748 (b)
Impala            Fusarium oxysporum                          Langin et al. 1995
KZ370             Caenorhabditis elegans                      M98552 (b)
Mar               Dugesia tigrina                             S35068 (b)
Minos             Drosophila hydei                            X61695 (b)
MLE               Hyalophora cecropia                         M63844 (b)
Mos1              Drosophila mauritiana                       M14653 (b)
Tc1               Caenorhabditis elegans                      X01005 (b)
Tc3               Caenorhabditis elegans                      M77679 (b)
Tcb1              Caenorhabditis briggsae                     S01245 (b)
Tec1              Euplotes crassus                            L03359 (b)
Tes1              Eptatretus stouti                           M93038 (b)
Tss               Salmo salar (consensus)                     L12206-07-08 (b)
Uhu               Drosophila heteroneura                      X17356 (b)

Copia             Drosophila melanogaster                     X02599 (b)
Gypsy             Drosophila melanogaster                     X03734 (b)
Ted               Autographa californica                      M32662 (b)
Tnt1              Nicotiana tabacum                           X13777 (*)
Ty1               Saccharomyces cerevisiae                    X03840 (b)
Ty3               Saccharomyces cerevisiae                    M23367 (b)
412               Drosophila melanogaster                     X04132 (b)
1731              Drosophila melanogaster                     X07656 (b)

ARV               Avian reticuloendotheliosis virus           A03959 (a)
BIV               Bovine immunodeficiency virus               B34742 (a)
FIV               Feline immunodeficiency virus               M25538 (b)
HIV1              Human immunodeficiency virus type 1         M93258 (b)
HIV2              Human immunodeficiency virus type 2         X16109 (b)
HTCLV1            Human T-cell lymphotropic virus type 1      C28136 (a)
MoMuLV            Moloney murine leukemia virus               A03956 (a)
MMTV              Mouse mammary tumor virus                   C26795 (a)
RSV               Rous sarcoma virus                          J00844 (b)
SIV               Simian immunodeficiency virus               X15781 (b)
SMR               Squirrel monkey retrovirus                  A05072 (a)

Ac                Zea mays                                    X013820 (b)
Hermes            Musca domestica                             L34807 (b)
Hobo              Drosophila melanogaster                     M69216 (b)
Tam3              Antirrhinum majus                           X55078 (b)

P (Bifasc)        Drosophila bifasciata
P (Guanche)       Drosophila guanche
P (Nebulosa)      Drosophila nebulosa
P (pπ 25.1)       Drosophila melanogaster
P (Subobs 1)      Drosophila subobscura
P (Subobs 2)      Drosophila subobscura
P (Scapto 1)      Scaptomyza pallida
P (Scapto 2)      Scaptomyza pallida
P (Lucilia 1A)    Lucilia cuprina

Accession numbers of the P elements, in the order listed: X60990 (b), M81221 (b), M17424 (b), X06779 (b), V01520 (b), X69493 (b), X60436 (b), X60437 (b), M63342 (b), M63341 (b), M89990 (b)

(a) From NBRF-PIR
(b) From GenBank or EMBL

Materials and Methods

The sequences used in the present work were extracted from GenBank, EMBL, and NBRF-PIR (see Table 1 for the accession numbers). These sequences were aligned using software provided by GCG (Genetics Computer Group 1991) and BISANCE (Dessen et al. 1990). More precisely, alignments were generally performed with the GCG-GAP program when the sequences were highly divergent and with CLUSTAL V, PILEUP, or MULTALIGN when sequences were at least 70% similar. However, the alignments were also manually optimized using published alignments.

Trees were inferred using the programs of the PHYLIP package (version 3.5c for Macintosh, Felsenstein 1993) and PAUP (version 3.1.1, Swofford 1993). The neighbor-joining trees were derived from matrices of distances based on the category distance model proposed by George et al. (1988), in which the amino acids are divided into groups, changes follow the genetic code, and changes between groups are more difficult than changes within a group.
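The idea behind the category distance mentioned in Materials and Methods (substitutions between amino acid groups weigh more than substitutions within a group) can be illustrated with a short Python sketch. This is not the George et al. (1988) model as implemented in PHYLIP: the grouping `ILLUSTRATIVE_GROUPS`, the two cost values, and the function name `category_distance` are assumptions made for illustration, and the genetic-code component of the model is left out.

```python
# Illustrative grouping of the 20 amino acids (an assumption, not necessarily
# the classification used by George et al. 1988 or by PHYLIP).
ILLUSTRATIVE_GROUPS = ["C", "MVLI", "GASPT", "FYW", "HQKR", "END"]
GROUP_OF = {aa: idx for idx, members in enumerate(ILLUSTRATIVE_GROUPS) for aa in members}


def category_distance(seq_a, seq_b, within_cost=1.0, between_cost=2.0):
    """Mean per-site cost between two aligned protein sequences.

    Identical residues cost 0, a change inside one group costs `within_cost`,
    and a change between groups costs `between_cost` (hypothetical weights).
    Sites with a gap or an unknown symbol in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0.0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in GROUP_OF or b not in GROUP_OF:
            continue  # gap or ambiguous residue: not scored
        compared += 1
        if a == b:
            continue
        total += within_cost if GROUP_OF[a] == GROUP_OF[b] else between_cost
    return total / compared if compared else 0.0


if __name__ == "__main__":
    print(category_distance("I", "L"))      # same group in this grouping
    print(category_distance("D", "K"))      # different groups in this grouping
    print(category_distance("AD-", "AK-"))  # the gapped column is ignored
```

A matrix of such pairwise values could then be passed to a neighbor-joining routine; the weights above only show the principle of penalizing between-group changes more heavily.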

Results

Conserved Regions Within and Between Groups of Elements: Synthesis and Comparisons

The Class II Elements

The IS Elements. The ISs include a large number of elements that have been isolated mainly from the chromosomal DNA of bacteria, from plasmids, and from bacteriophages. These elements can be grouped into more-or-less related sets based on their sequence similarities (Galas and Chandler 1989; Rezsöhazy et al. 1993; Mahillon and Chandler, personal communication). Most of these elements contain a specific amino acid signature, DDE (aspartate at position 140, aspartate at position 200, and glutamate at position 236 in the IS3 element), including the members of the IS3, IS4, IS6, IS30, and IS630 families found in Gram-positive and Gram-negative bacteria (Fayet et al. 1990; Khan et al. 1991; Kulkosky et al. 1992; Rezsöhazy et al. 1993). This signature, initially described by Fayet et al. (1990) and also found in most retrotransposons and retroviruses (Khan et al. 1991), is particularly significant for integration of the elements (Skalka 1993; Dyda et al. 1994; Vos and Plasterk 1994; Polard and Chandler 1995).

The existence of such a region could be due to a common ancestor or to convergence. However, convergence is very unlikely because many small motifs are conserved inside the DDE signature, i.e., within 90 amino acids (minimal distance between the first D and the E residues) and particularly within the last 35 amino acids (most common distance between the last D and E residues). Therefore, the similarities between the elements of distantly related species of bacteria strongly suggest a common origin of these elements.
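The spacing of the DDE residues described above can be turned into a simple scan of a protein sequence. The Python sketch below is illustrative only and is not the procedure used in this study, where the alignments were produced with alignment programs and optimized by hand. The function name `find_dde_candidates` and the default spacing windows are assumptions, chosen so that the IS3 positions quoted above (D140, D200, E236) fall inside them.

```python
def find_dde_candidates(protein, min_dd=40, max_dd=120, min_de=30, max_de=40):
    """Return (d1, d2, e) triples of 1-based positions that fit a D..D..E spacing.

    min_dd/max_dd bound the number of residues between the two D residues;
    min_de/max_de bound the number of residues between the second D and the E.
    All four windows are hypothetical defaults, not values from the paper.
    """
    protein = protein.upper()
    d_positions = [i for i, aa in enumerate(protein) if aa == "D"]
    hits = []
    for d1 in d_positions:
        for d2 in d_positions:
            if not (min_dd <= d2 - d1 - 1 <= max_dd):
                continue
            first_e = d2 + 1 + min_de
            last_e = min(len(protein) - 1, d2 + 1 + max_de)
            for e in range(first_e, last_e + 1):
                if protein[e] == "E":
                    hits.append((d1 + 1, d2 + 1, e + 1))
    return hits


if __name__ == "__main__":
    # Synthetic sequence: alanine everywhere except the three IS3-like positions.
    residues = ["A"] * 300
    residues[139] = "D"   # position 140
    residues[199] = "D"   # position 200
    residues[235] = "E"   # position 236
    print(find_dde_candidates("".join(residues)))  # [(140, 200, 236)]
```

Such a scan only flags candidates: real transposases contain many D and E residues, so the conserved small motifs around each residue remain necessary to decide whether a hit is a genuine DDE signature.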

The DDE signature is not found in all IS elements. A few of them, such as the IS1 element (Serre et al. 1995), have some similarities to a signature present in the recombinases of bacteriophages λ, φ80, P22, P2, 186, P4, and P1 (Argos et al. 1986), while others, such as the IS110 and IS117 elements (Lenich and Glasgow 1994), are related to Piv, a protein essential for site-specific DNA inversion in Moraxella lacunata. However, these similarities are presently known in only very few IS elements.

The Mariner-Tc1 Superfamily. Mariner-like and Tc1-like families of elements have been described in many species very different from the species in which they were first found. The mariner element initially described in Drosophila mauritiana (Jacobson et al. 1986) has been found in many Drosophila species (Maruyama and Hartl 1991a,b; Brunet et al. 1994), in the lepidopteran Hyalophora cecropia (MLE element, Lidholm et al. 1991), in several species of Diptera and Hymenoptera (Robertson 1993; Bigot et al. 1994), in platyhelminths (Mar1 element in Dugesia tigrina, Garcia-Fernandez et al. 1993), and in the phytopathogenic fungus Fusarium oxysporum (element Impala, Langin et al. 1995; see also Capy et al. 1994b for a review). There have been similar reports of relatives of the Tc1 element initially described in Caenorhabditis elegans (Emmons et al. 1983; Liao et al. 1983; Rosenzweig et al. 1983). Related elements have recently been found in teleost fishes (Radice et al. 1994) and in several species of Drosophila, such as the Uhu element in Hawaiian Drosophila (Brezinsky et al. 1990), the minos element in D. hydei (Franz and Savakis 1991), and the Bari-1 element in D. melanogaster (Caizzi et al. 1993).

The two families have several similarities, including functional ones such as their excision characteristics, their target sites (TA), the length of their putative transposases (about 340–350 aa), and several conserved regions of their transposases. A detailed analysis of these conserved amino acid blocks was used to define the "shared derived characters or synapomorphies in their transposase sequences" (Robertson 1995). For instance, at positions 154–158 (position in the Mos1 of D. mauritiana), the consensus sequence is DE for the mariner-like elements and WSDE for the Tc1-like elements. There is HDNA at positions 248–252 in all mariners and QDND in almost all Tc1s. Finally, there is the motif YSPDLAP(S/T/I)D in all mariner-like elements and YSPDLNPIE in all Tc1-like elements around position 284 of the Mos1 element of D. mauritiana. According to Doak et al. (1994), Robertson (1995), and Langin et al. (1995), these three regions correspond to the D,D(35)E region (underlined residues in the previous motifs) described for the IS elements. The main difference between the DDE signatures of these elements lies in the third residue, which is a D in mariner-like elements and an E in Tc1-like ones.
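The family-specific motifs listed above suggest a simple way of sorting a transposase sequence into one of the two families. The Python sketch below is a hedged illustration, not a method used in this study: the function name `classify_transposase`, the scoring rule, and the toy test sequences are assumptions, and only the motif strings themselves (HDNA, QDND, YSPDLAP(S/T/I)D, YSPDLNPIE) are taken from the text.

```python
import re

# Motifs quoted in the text; exact matching is an illustrative simplification,
# since real sequences may deviate from these consensus strings.
MARINER_MOTIFS = [re.compile(r"HDNA"), re.compile(r"YSPDLAP[STI]D")]
TC1_MOTIFS = [re.compile(r"QDND"), re.compile(r"YSPDLNPIE")]


def classify_transposase(protein):
    """Label a sequence 'mariner-like', 'Tc1-like', or 'unassigned'.

    Each family scores one point per motif found; the higher score wins and a
    tie (including no motif at all) gives 'unassigned'. The rule is hypothetical.
    """
    protein = protein.upper()
    mariner_score = sum(1 for motif in MARINER_MOTIFS if motif.search(protein))
    tc1_score = sum(1 for motif in TC1_MOTIFS if motif.search(protein))
    if mariner_score > tc1_score:
        return "mariner-like"
    if tc1_score > mariner_score:
        return "Tc1-like"
    return "unassigned"


if __name__ == "__main__":
    # Toy sequences built around the motifs, not real transposases.
    print(classify_transposase("MKKHDNAGGGYSPDLAPSDKK"))   # mariner-like
    print(classify_transposase("MKKQDNDGGGYSPDLNPIEKK"))   # Tc1-like
    print(classify_transposase("MKKAAAAGGG"))              # unassigned
```

An element carrying motifs of both families, or of neither, is returned as unassigned rather than forced into one family.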


The hobo-Ac-Tam3 Superfamily. The relationships between Tam3 (Antirrhinum majus) and Ac (Zea mays) elements were suspected by Haring et al. (1989) and confirmed by Haring et al. (1991), Hehl et al. (1991), Calvi et al. (1991), and Atkinson et al. (1993). This hAT superfamily (Atkinson et al. 1993) also includes elements detected in distantly related species, such as the Hermes element of Musca domestica and the Tag1 element of Arabidopsis thaliana (Warren et al. 1994). These elements have similar structural, excision, and transposition characteristics and similar transposases. Hence, they probably have a common origin. The main regions conserved between these elements are the DMWT, TRWN, and RNRL signatures in regions 1, 2, and 3, respectively, defined by Calvi et al. (1991; see also Warren et al. 1994). However, there is no similarity with the DDE signature.

The P Family. The transposases of P elements detected in several species of the Drosophilidae family and in other Diptera, such as Lucilia cuprina (Perkins and Howells 1992), were aligned. The sequences of exon 2 are relatively well conserved between species, so it is difficult to define any region or signature specific to this family. Therefore, this family is presently composed of elements closely related to the P element (pπ25.1) of D. melanogaster. There are also no similarities with the DDE signature of the ISs and the mariner-Tc1 elements.

The Class I Elements

This particular class of elements is divided into two groups, the retroviral-type retrotransposons (with LTRs, long terminal repeats) and the nonretroviral-type retrotransposons (without LTRs). The first group of elements can be further subdivided into the Ty1-copia and Ty3-gypsy groups, while the second group includes the I element of Drosophila, the Tad element of Neurospora crassa, the LINEs of vertebrates, and the R1 and R2 elements.

While these elements transpose through an RNA intermediate, they do not all use similar integration mechanisms (Eickbush 1992; Burke et al. 1993; Luan et al. 1993). LTR retrotransposons use an integrase domain, while some non-LTR retrotransposons such as R2 elements use an RNA-mediated integration (Luan et al. 1993). The integrase domain of LTR retrotransposons contains some regions similar to the DDE region (Khan et al. 1991; Kulkosky et al. 1992). Another signature, HHCC, seems to be relatively well conserved upstream of this region (Khan et al. 1991; Bushman et al. 1993). According to Khan et al. (1991), these signatures are involved in DNA binding/cutting (DDE) and may also be involved in recognizing LTR sequences (HHCC). However, the function of this last region is still debated (Polard and Chandler 1995). Finally, the HHCC signature is not detected in the transposases of class II elements.

Fig. 1. Alignment of regions corresponding to the DDE signature. The class II elements are represented by the IS (from IS4 to IS630) and mariner-Tc1 (from Mos1 to Minos) elements and the class I elements by the LTR retrotransposons of the copia-Ty1 and gypsy-Ty3 groups (from Ty1 to 412). A few retrovirus sequences are also included (from RSV to HTCLV1). The host species and the accession numbers of all these elements are given in Table 1. Block A corresponds to the first D of the DDE signature. The distance between blocks A and B (not given) varies from one element to the other. Block B is subdivided into B1 and B2. As indicated by Rezsöhazy et al. (1993), a gap of variable length was introduced between B1 and B2 in order to align all these sequences. The number of amino acids between B1 and B2 is given in parentheses. Arrows indicate the position of the DDE signature.

Fig. 2. Trees obtained for the members of the mariner-Tc1 superfamily using PAUP and neighbor-joining procedures with the alignment given in Fig. 1. Members of the mariner family (Mos1, MLE, Mar1, and KZ370) form a specific group that is clearly separated from the members of the Tc1 family. The numbers given for each node correspond to the bootstrap values (100 repetitions).

Fig. 3. Trees obtained for the IS elements using PAUP and neighbor-joining procedures with the alignment shown in Fig. 1. The elements are clustered according to their families (IS4 family = IS4, IS231A, and IS1151; IS6 family = IS6, IS15D, and IS240A; IS3 family = IS3, IS911, IS150, IS2, and IS426; and IS30 family = IS30, IS1086, and IS4351). The numbers given for each node correspond to the bootstrap values (100 repetitions).

Relationships Between Mariner-Tc1, IS, Retrotransposon, and Retroviruses Based Upon the DDE Region

Figure 1 shows a general alignment of the transposases and integrases of elements containing a DDE signature. Two main blocks were considered. The first block (A) corresponds to nine amino acids surrounding the first D residue. The second block (B), subdivided into B1 and B2 blocks, includes the second D and the E residues. The general alignment is based on those of Fayet et al. (1990) and Rezsöhazy et al. (1993) for the IS elements, Robertson (1995) and Doak et al. (1994) for the mariner-Tc1-like elements, and Kulkosky et al. (1992) for retrotransposons and retroviruses. This alignment reveals the great variability among the IS sequences compared to the other groups of elements. For instance, the region of the first D (block A) is not well conserved in the IS4 family according to the alignment provided by Rezsöhazy et al. (1993). Moreover, the distance between the two blocks (A and B) and the distance between the second D (block B1) and the E (block B2) residues vary greatly in these elements. These alignments were confirmed using programs such as MULTALIGN, CLUSTAL, and/or GAP.

This alignment was used to infer unrooted trees within and between the different groups of elements. These trees were obtained using the PAUP program, based on the parsimony principle, and the neighbor-joining algorithm (from the program provided in the PHYLIP package, distance matrix of George et al. 1988; see Materials and Methods).

Figure 2 shows the trees obtained by analysis of the members of the mariner-Tc1-like superfamily. The elements are divided into two main groups: the mariner-like and the Tc1-like elements. While the bootstrap values are relatively weak, partly due to the short length of the region analyzed, the two methods give similar clusters of elements. The main unresolved classifications are the positions of the Impala and Tec1 elements. The PAUP analysis indicates that the Impala element lies in an intermediate position between the mariner and Tc1 families and may be a single member of a new group within this superfamily, as suggested by Langin et al. (1995). However, the neighbor-joining method places both Tec1 and Impala in an intermediate position, although the Tec1 of Euplotes crassus is considered to be a member of the Tc1-like family (Jahn et al. 1993). Despite this problem, the mariner-like elements are clearly differentiated from the other elements.

The tree obtained for the IS elements (Fig. 3) shows them to be clustered into the IS4 (Rezsöhazy et al. 1993), IS6, IS3, and IS30 families (see Galas and Chandler 1989 and references therein). As for the previous elements, and in spite of low bootstrap values, the two techniques used produce similar groups. The only difference between the two trees is the position of the IS630 element, which is more closely related to the IS6 family in the phylogeny based upon the parsimony principle, and more closely related to the IS3 family according to the tree obtained by neighbor-joining.

Analysis of retrotransposons and retroviruses (Fig. 4) shows a clear segregation between the two types of elements. Again, the two techniques used produced similar classifications in spite of low bootstrap values. A more consistent classification of these elements is obtained (Fig. 5) when another domain of the integrase specific to retroviruses and to LTR retrotransposons, the HHCC signature involved in the recognition of LTR sequences (Khan et al. 1991), is added to the DDE matrix of Fig. 1. Under these conditions, retroviruses are clearly separated from the LTR retrotransposons, while the Gypsy-Ty3 elements lie between the retroviruses and the Copia-Ty1 elements. This relationship between these elements is similar to that obtained by comparing the reverse transcriptase domains (Xiong and Eickbush 1990). While this latter tree is more consistent, it leads to a classification of the elements similar to that of the trees based on the DDE signature alone.

Finally, unrooted trees were obtained using only one or two elements of each previous group (Fig. 6). These trees show that the mariner-Tc1 superfamily is monophyletic, as is the group of elements including the retrotransposons and the retroviruses. The most surprising result is the position of the ISs. Some of these elements seem to be more closely related to other groups than to other IS families. For instance, the IS630 element is closely related to the mariner-Tc1 superfamily, while the members of the IS30 family (IS30 and IS4351) are closely related to retrotransposons and retroviruses. Again the bootstrap values are relatively low, but the two techniques used give similar classifications. Moreover, similar classifications and conclusions are deduced from a tree including several elements of each group (tree including 57 elements, not shown).
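The bootstrap values reported for these trees (100 repetitions) come from resampling the columns of the alignment with replacement. The Python sketch below illustrates that resampling step only; it is not the PAUP or PHYLIP implementation, and it replaces tree inference with a much simpler question (how often sequence a is closer to b than to c). The function names, the uncorrected p-distance, and the toy alignment are assumptions.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (same width as the input)."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences differ."""
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def pair_support(alignment, a, b, c, n_reps=100, seed=0):
    """Fraction of replicates in which a is strictly closer to b than to c."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_reps):
        rep = bootstrap_replicate(alignment, rng)
        if p_distance(rep[a], rep[b]) < p_distance(rep[a], rep[c]):
            wins += 1
    return wins / n_reps


if __name__ == "__main__":
    # Toy alignment with hypothetical labels; not data from Fig. 1.
    toy = {"x": "DDEDDEDDED", "y": "DDEDDEDDED", "z": "KKKKKKKKKK"}
    print(pair_support(toy, "x", "y", "z"))  # 1.0, since x and y are identical
```

With a short region such as the DDE blocks, each replicate draws from few columns, which is one reason why support values can stay low even when different methods agree on the grouping.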

Fig. 4. Trees obtained for retrotransposons and retroviruses using PAUP and neighbor-joining procedures with the alignment shown in Fig. 1. The numbers given for each node correspond to the bootstrap values (100 repetitions).

Fig. 5. Trees obtained for retrotransposons and retroviruses using the PAUP procedure. The data matrix was an alignment (not given) of the region including the HHCC and the DDE signatures. The former signature is specific for LTR retrotransposon and retrovirus integrases. The numbers given for each node correspond to the bootstrap values (100 repetitions). The underlined elements are those present in the tree given in Fig. 4. The remaining sequences correspond to other retroviruses (see Table 1 for host and accession numbers).

Fig. 6. Trees obtained using PAUP and neighbor-joining procedures when one or two elements of each group were considered together. The data matrix is that shown in Fig. 1. The numbers given for each node correspond to the bootstrap values (100 repetitions).

Fig. 7. Possible evolutionary history of class I and class II elements. In this model the same symbol (int) is used for the transposases of class II and the integrases of class I elements. a There are two nonexclusive hypotheses for class II elements. First (scenario I), the DDE domain (int) was "trapped" between two inverted repeats (ITR) only once, and the IS families or related superfamilies emerged from the same ancestral element. Second (scenario II), the DDE domain was "trapped" several times. b Retrotransposons could be the result of an association between an ancestral non-LTR retrotransposon, providing the gag-RT-RH (RT = reverse transcriptase and RH = RNase H) domains, and an ancestral transposase of class II elements, providing the integrase domain. With respect to the integrase (int) domain, the IS30 elements seem to be closely related to the LTR retrotransposon and retrovirus integrases. Finally, non-LTR retrotransposons use an RNA-mediated system of integration.

Discussion: Possible Evolutionary History

From the previous results, it is tempting to infer an evolutionary history of the elements sharing the DDE signature similar to that proposed by Xiong and Eickbush (1990) from comparison of the reverse transcriptase domains of retrotransposons and retroviruses. The present analysis strongly suggests that members of class I and class II elements share a common ancestor. Moreover, two results suggest that the ancestral sequence of elements containing a DDE signature could be an ancestral IS, or an element initially present in bacteria: first, the great variability between the transposase domains of the IS elements, and second, the close relationships between some of them and mariner-Tc1-like members, or retrotransposons and retroviruses.

We can therefore propose the evolutionary scenario summarized in Fig. 7 (a and b). This scenario assumes the existence of an ancestral DDE signature with some endonuclease properties. This signature was then "trapped" by two short inverted repeats, leading to an ancestral IS. Several groups of elements emerged from this ancestral TE (or TEs), to give the present-day IS families, such as IS3, IS6, IS30, or IS630. The last two families could then have led to the mariner-Tc1 superfamily and to retrotransposons and retroviruses, respectively. The question underlying this scenario is whether the capture of an endonuclease domain by two inverted repeats occurred once (scenario I) or several times (scenario II). We cannot yet choose between these two alternatives. For instance, the inverted terminal repeats of several class II elements show no clear similarities, except for the four extreme base pairs (see for instance Langin et al. 1995).

In their evolutionary scheme, Xiong and Eickbush (1990) proposed that non-LTR retrotransposons were derived from an ancestral structure containing the gag, reverse transcriptase (RT), RNase H (RH), and integrase (int) domains. Hence, the integrase domains of non-LTR retrotransposons should be similar to those of LTR retrotransposons. However, Luan et al. (1993) showed that the integration mechanisms of LTR and non-LTR retrotransposons can be different. Nevertheless, the reverse transcriptases of non-LTR retrotransposons have some similarities to those of the other retrotransposons and retroviruses (Xiong and Eickbush 1990). These similarities and differences suggest that LTR retrotransposons may result from an association between the transposase of ancestral class II TEs and the gag-RT-RH domains of an ancestral non-LTR retrotransposon. Thus, present-day non-LTR retrotransposons could be closely related to an ancestral non-LTR retrotransposon which could be partly at the origin of LTR retrotransposons and retroviruses. In this model, it is more parsimonious to assume that non-LTR retrotransposons have never used an integrase domain similar to that of LTR retrotransposons than to assume that such a domain was first integrated into these elements and then replaced by an RNA-mediated mechanism. Therefore, the evolutionary history described here indicates that the direction of evolution proposed by Xiong and Eickbush (1990) could be inverted for the non-LTR retrotransposons but not for the LTR retrotransposons and retroviruses.

This hypothesis is reinforced by the following results. Michel and Lang (1985) and Ferat and Michel (1993) have described several group II self-splicing introns that encode proteins that are similar in some respects to the reverse transcriptases of retrotransposons and retroviruses. It is, therefore, possible that


such introns were "trapped" by the ancestors of non-LTR retrotransposons and then transmitted to present-day non-LTR retrotransposons, LTR retrotransposons, and retroviruses. Recently, Zimmerly et al. (1995), from an analysis of group II intron mobility, also proposed that mobile group II introns were ancestors of non-LTR retrotransposons and telomerases. Moreover, one of these group II introns has been found in a DNA sequence closely related to the IS3411 element (Ferat et al. 1994), suggesting that there can be associations between transposases of class II elements and reverse transcriptase domains.

Comparisons of the transposase and integrase domains of several transposable elements clearly show the existence of a DDE signature. The overall similarities within this region (at least 35 aa between the second D and the E) are more likely due to a common ancestor than to convergence. The maintenance of such similarities in these restricted regions may well be due to their function in the integration of the elements (see for instance Fayet et al. 1990; Skalka 1993; Dyda et al. 1994; Vos and Plasterk 1994; Polard and Chandler 1995). Thus, in order to maintain the integration capacities of the elements, these regions could be under high selective constraints (Capy et al. 1994a).

The scenario proposed above makes no reference to the host organisms. Since the phylogeny of species inferred from their transposable elements is frequently inconsistent with their taxonomic classification based on more conventional characters (morphology, structural genes, etc.), it is difficult to use such comparisons to investigate the structural evolution of the elements (Capy et al. 1994a; Cummings 1994). However, as suggested by the comparison of their transposase-integrase domains, some transposable elements may have a common origin. The spread of the IS elements in the tree shown in Fig. 6 suggests that insertion sequences may be closely related to an ancestral form of several class II elements and to the integrase domain of class I elements. Therefore, bacteria may have played, and may still play, an important role in the evolution of transposable elements.

Several groups of elements, such as hAT and P, are not included in this evolutionary history, since there were no similarities with the DDE signature of the above elements. Neither were there any similarities in the transposases of these class II elements to regions related to recombinases, such as the HRY signature defined by Argos et al. (1986) and Serre et al. (1995), or those emerging from the analysis of Lenich and Glasgow (1994). The situation is the same for the non-LTR retrotransposons (I and LINEs), which have no similarities with the integrase domain of LTR retrotransposons. It is therefore possible that all these elements are derived from ancestor(s) in which other systems of integration were "trapped."

In conclusion, the present analysis, together with that of Xiong and Eickbush (1990), shows that comparison of transposable elements in terms of their transposase-integrase domain or their reverse transcriptase domain can provide considerable information about their evolution. However, not all the pieces of the puzzle are available, and the positions of several elements which are not included in the present evolutionary history remain to be defined.

Acknowledgments. We thank Dominique Anxolabéhère, Mickael Chandler, and Ronald Plasterk for their comments, two anonymous reviewers for their helpful criticisms on the first version of the manuscript, and Malcolm Eden and Owen Parks for their help with the English form. This work was supported by the GREG (Groupement de Recherche et d'Etude sur les Génomes): grant No. 48.

References

Argos P, Landy A, Abremski K, Egan JB, Haagard-Ljungquist E, Hoess RH, Khan ML, Kalionis B, Narayana SVL, Pearson III LS, Sternberg N, Leong JM (1986) The integrase family of site-specific recombinases: regional similarities and global diversity. EMBO J 5:433–440
Atkinson PW, Warren WD, O’Brochta DA (1993) The hobo transposable element of Drosophila can be cross-mobilized in houseflies and excises like the Ac element of maize. Proc Natl Acad Sci USA 90:9693–9697
Bigot Y, Hamelin MH, Capy P, Periquet G (1994) Mariner-like elements in hymenopteran species: insertion site and distribution. Proc Natl Acad Sci USA 91:3408–3412
Brezinsky L, Wang GVL, Humphreys T, Hunt J (1990) The transposable elements Uhu from Hawaiian Drosophila: member of the widely dispersed class of Tc1-like transposons. Nucleic Acids Res 18:2053–2059
Brunet F, Godin F, David JR, Capy P (1994) The mariner transposable element in the Drosophilidae family. Heredity 73:377–385
Burke WD, Eickbush DG, Xiong Y, Jakubczack J, Eickbush TH (1993) Sequence relationship of retrotransposable elements R1 and R2 within and between divergent insect species. Mol Biol Evol 10:163–185
Bushman FD, Engelman A, Palmer I, Wingfield P, Craigie R (1993) Domains of the integrase protein of human immunodeficiency virus type 1 responsible for polynucleotidyl transfer and zinc binding. Proc Natl Acad Sci USA 90:3428–3432
Caizzi R, Caggese C, Pimpinelli S (1993) Bari-1 a new transposon-like family in Drosophila melanogaster with a unique heterochromatic organization. Genetics 133:335–345
Calvi BR, Hong TJ, Findley SD, Gelbart WM (1991) Evidence for a common evolutionary origin of inverted repeat transposons in Drosophila and plants: hobo, Activator, and Tam3. Cell 66:465–471
Capy P, Anxolabehere D, Langin T (1994a) The strange phylogenies of transposable elements: are horizontal transfers the only explanation? Trends Genet 10:7–12
Capy P, Langin T, Bigot Y, Brunet F, Daboussi MJ, Periquet G, David JR, Hartl DL (1994b) Horizontal transmission versus ancient origin: mariner in the witness box. Genetica 93:161–170
Cummings MP (1994) Transmission patterns of eukaryotic transposable elements: arguments for and against horizontal transfer. TREE 9:141–145
Daniels SB, Peterson KR, Strausbaugh LD, Kidwell MG, Chovnick A (1990) Evidence for horizontal transmission of the P element between Drosophila species. Genetics 124:339–355
Dessen P, Fondrat C, Valencien C, Mugnier C (1990) BISANCE: a French service for access to biomolecular sequences databases. Cabios 6:355–356
Doak TG, Doerder FP, Jahn CL, Herrick G (1994) A proposed superfamily of transposase-related genes: new members in transposon-like elements of ciliated protozoa and a common "D35E" motif. Proc Natl Acad Sci USA 91:942–946

Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR (1994) Crystal structure of the catalytic domain of the HIV-1 integrase: similarity to other polynucleotidyl transferase. Science 266:1981–1986
Eickbush TH (1992) Transposing without ends: the non-LTR retrotransposable elements. New Biol 4:430–440
Emmons SW, Yesner L, Ruan K, Katzenberg D (1983) Evidence for a transposon in Caenorhabditis elegans. Cell 32:55–65
Fayet O, Ramond P, Polard P, Frère MF, Chandler M (1990) Functional similarities between retroviruses and the IS3 family of bacterial insertion sequences? Mol Microbiol 4:1771–1777
Felsenstein J (1993) PHYLIP (Phylogeny Inference Package) version 3.5.c. University of Washington, Seattle
Ferat JL, LeGouar M, Michel F (1994) Multiple group II self-splicing introns in mobile DNA from Escherichia coli. CR Acad Sci, Life Sciences 317:141–148
Ferat JL, Michel F (1993) Group II self-splicing introns in bacteria. Nature 364:358–361
Finnegan DJ (1989) The I factor and I-R hybrid dysgenesis in Drosophila melanogaster. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington, DC, pp 503–517
Franz G, Savakis C (1991) Minos, a new transposable element from Drosophila hydei, is a member of the Tc1-like family of transposons. Nucleic Acids Res 19:6646–6646
Galas DJ, Chandler M (1989) Bacterial insertion sequences. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington, DC, pp 109–162
Garcia-Fernàndez J, Marfany G, Bagunà J, Salò E (1993) Infiltration of mariner elements. Nature 364:109–110
Genetic Computer Group (1991) Program manual for the GCG package, version 7. Madison, WI
George DG, Hunt LT, Barker WC (1988) Current methods in sequence comparison and analysis. In: Schlesinger DH (ed) Macromolecular sequencing and synthesis. Alan R Liss, New York, pp 127–149
Haring MA, Gao J, Volbeda T, Rommens CM, Nijkamp HJ, Hille J (1989) A comparative study of Tam3 and Ac transposition in transgenic tobacco and petunia plants. Plant Mol Biol 13:189–201
Haring MA, Teeurven-de Vroomen, Nijkamp HJ, Hille J (1991) Transactivation of an artificial dTam3 transposable element in transgenic tobacco plants. Plant Mol Biol 16:39–47
Hartl DL, Sawyer SA (1988) Why do unrelated insertion sequences occur together in the genome of Escherichia coli? Genetics 118:537–541
Hehl R, Nacken WK, Krause A, Saedler H, Sommer H (1991) Structural analysis of Tam3, a transposable element from Antirrhinum majus, reveals homologies to the Ac element from maize. Plant Mol Biol 16:369–371
Jacobson JW, Medhora MM, Hartl DL (1986) Molecular structure of a somatically unstable element in Drosophila. Proc Natl Acad Sci USA 83:8684–8688
Jahn CL, Doktor SZ, Frels JS, Jaraczewski JW, Krikau MF (1993) Structures of the Euplotes crassus Tec1 and Tec2 elements: identification of putative transposase coding regions. Gene 133:71–78
Khan E, Mack JPG, Katz RA, Kulkosky J, Skalka AM (1991) Retroviral integrase domains: DNA binding and the recognition of LTR sequences. Nucleic Acids Res 19:851–860
Kidwell MG (1993) Lateral transfer in natural populations of eukaryotes. Ann Rev Genet 27:645–662
Kulkosky J, Jones KS, Katz RA, Mack JPG, Skalka AM (1992) Residues critical for retroviral integrative recombination in a region that is highly conserved among retroviral/retrotransposon integrases and bacterial insertion sequence transposases. Mol Cell Biol 12:2331–2338
Langin T, Capy P, Daboussi MJ (1995) The transposable element, impala, a fungal member of the Tc1-mariner superfamily. Mol Gen Genet 246:19–28
Lenich AG, Glasgow AC (1994) Amino-acid sequence homology between Piv, an essential protein in site-specific inversion in Moraxella lacunata, and transposases of an unusual family of insertion elements. J Bacteriol 176:4160–4164
Liao LW, Rosenzweig B, Hirsh D (1983) Analysis of a transposable element in Caenorhabditis elegans. Proc Natl Acad Sci USA 80:3585–3589
Luan DD, Korman MH, Jakubczak JL, Eickbush TH (1993) Reverse transcription of R2Bm is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605
Lidholm DA, Gudmundsson GH, Boman HG (1991) A highly repetitive mariner-like element in the genome of Hyalophora cecropia. J Biol Chem 266:11518–11521
Maruyama K, Hartl DL (1991a) Evolution of the transposable element mariner in Drosophila species. Genetics 128:319–329
Maruyama K, Hartl DL (1991b) Evidence for interspecific transfer of the transposable element mariner between Drosophila and Zaprionus. J Mol Evol 33:514–524
McClintock B (1984) The significance of responses of the genome to challenge. Science 226:792–801
McDonald JF (1992) Transposable element and evolution. Special issue of Genetica 86
Michel F, Lang BF (1985) Mitochondrial class II introns encode proteins related to the reverse transcriptases of retroviruses. Nature 316:641–643
Perkins HD, Howells AJ (1992) Genomic sequences with homology to the P element of Drosophila melanogaster occur in the blowfly Lucilia cuprina. Proc Natl Acad Sci USA 89:10753–10757
Polard P, Chandler M (1995) Bacterial transposase and retroviral integrases. Mol Microbiol 15:13–23
Radice AD, Bugaj B, Fitch DHA, Emmons SW (1994) Widespread occurrence of the Tc1 transposon family: properties of Tc1-like transposons from teleost fish. Mol Gen Genet 244:606–612
Rezsöhazy R, Hallet B, Delcour J, Mahillon J (1993) The IS4 family of insertion sequences: evidence for a conserved transposase motif. Mol Microbiol 9:1283–1295
Robertson HM (1993) The mariner transposable element is widespread in insects. Nature 362:241–245
Robertson HM (1995) The mariner-Tc1 superfamily of transposons in animals. J Insect Physiol (in press)
Rosenzweig B, Liao LW, Hirsh D (1983) Sequence of the C. elegans transposable element Tc1. Nucleic Acids Res 11:4201–4209
Schwartz E, Kroeger M, Rak B (1988) IS50: distribution, nucleotide sequence, and phylogenetic relationship of a new E. coli insertion element. Nucleic Acids Res 16:6789–6802
Serre MC, Turlan C, Bortolin ML, Chandler M (1995) Mutagenesis of the IS1 transposase: importance of his-arg-tyr for activity. J Bacteriol 177:5070–5077
Skalka AM (1993) Retroviral DNA integration: lessons for transposon shuffling. Gene 135:175–182
Swofford (1993) Phylogenetic analysis using parsimony. Version 3.1.1. Smithsonian Institution, Washington, DC
Vos JC, Plasterk RHA (1994) Tc1 transposase of Caenorhabditis elegans is an endonuclease with a bipartite binding domain. EMBO J 13:6125–6132
Warren WD, Atkinson PW, O’Brochta DA (1994) The Hermes transposable element from house fly, Musca domestica, is a short inverted repeat-type element of the hobo, Ac and Tam3 (hAT) element family. Genet Res Camb 64:87–97
White SE, Habera LF, Wessler SR (1994) Retrotransposons in the flanking regions of normal plant genes: a role for copia-like elements in the evolution of the gene structure and expression. Proc Natl Acad Sci USA 91:11792–11796
Xiong Y, Eickbush TH (1990) Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9:3353–3362
Zimmerly S, Guo H, Perlman PS, Lambowitz A (1995) Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82:545–554