MOLECULAR AND CELLULAR BIOLOGY, Apr. 1986, p. 1129-1134, Vol. 6, No. 4
0270-7306/86/041129-06$02.00/0
Copyright © 1986, American Society for Microbiology
The Drosophila melanogaster Gypsy Transposable Element Encodes Putative Gene Products Homologous to Retroviral Proteins

ROBIN L. MARLOR,† SUSAN M. PARKHURST, AND VICTOR G. CORCES*

Department of Biology, The Johns Hopkins University, Baltimore, Maryland 21218

Received 25 September 1985/Accepted 16 January 1986
We determined the complete nucleotide sequence of the gypsy element present at the forked locus of Drosophila melanogaster in the f1 allele. The gypsy element shares more homology with vertebrate retroviruses than with the copia element of D. melanogaster or the Ty element of Saccharomyces cerevisiae, both in overall organization and at the DNA sequence level. This transposable element is 7,469 base pairs long and encodes three putative protein products. The long terminal repeats are 482 nucleotides long and contain transcription initiation and termination signals; sequences homologous to the polypurine tract and tRNA primer binding site of retroviruses are located adjacent to the long terminal repeats. The central region of the element contains three different open reading frames. The second one encodes a putative protein which shows extensive amino acid homology to retroviral proteins, including gag-specific protease, reverse transcriptase, and DNA endonuclease.
The Drosophila melanogaster gypsy transposable element is associated with spontaneous mutations whose phenotype can be reversed by mutations at unlinked suppressor loci (9). This element is transcribed in a temporally specific fashion, giving rise to a major 6.5-kilobase RNA which accumulates at highest levels in 2- to 3-day-old pupae (11). We recently proposed (11, 12) that the mutational activity of the gypsy element on suppressible genes is a direct consequence of the transcriptional properties of this element. The mutagenic effect of the transposable element on these loci is due to transcriptional interference with the genes located nearby. To understand the molecular basis of this phenomenon, we investigated the DNA structure of the gypsy element.

Gypsy is a member of a class of structurally similar transposable elements which contain long terminal repeats (LTRs) (1, 5, 9). Other members of this family are the copia-like elements of D. melanogaster (14), the Ty elements of Saccharomyces cerevisiae (13), and vertebrate retrovirus proviruses (20). In addition to the conservation in the organization of the different transcription signals between the LTRs of retroviruses and copia-like elements, the latter also contain nucleotide sequences homologous to the tRNA primer binding site and purine-rich sequences, both necessary for the initiation of DNA synthesis in a retrovirus system (6, 16, 22).

The organization of the protein-coding regions of these different elements nevertheless varies. A typical vertebrate retrovirus consists of three genes, termed gag, pol, and env, required for viral infection and replication (see reference 20 for a review). The gag region encodes a polyprotein which is cleaved, giving rise to several small proteins found in the core of the virus particle (3) and whose exact function is not yet understood. The pol gene is expressed as a gag-pol polyprotein which is the precursor of the mature form of reverse transcriptase. The N-terminal region of this protein product contains the DNA polymerase and RNase H activities of reverse transcriptase, whereas the DNA endonuclease activity required for provirus integration is located in the carboxy-terminal region (21). In addition, the amino-terminal region of this protein encodes a gag-specific protease required for the cleavage of the gag polyprotein (3). Finally, the third open reading frame, located at the 3' end of the retroviral genome, encodes the components of the viral envelope (3, 20).

The copia element of D. melanogaster contains only one open reading frame, whereas Ty has two distinct ones, both elements encoding putative protein products homologous to the reverse transcriptase and endonuclease activities of retroviruses (2, 10). Other Drosophila copia-like elements, such as 17.6, are more similar to vertebrate retroviruses in the organization of their coding potentials, with three different open reading frames, the second one encoding the enzymatic activities (15, 19). Here we present evidence which indicates that the gypsy element contains three different open reading frames and is structurally closer to the Drosophila 17.6 element and vertebrate retroviruses than to copia or Ty.
MATERIALS AND METHODS

DNA isolation and sequencing. Purification of plasmid DNA, digestion with restriction enzymes, and labeling of DNA fragments were done by standard procedures (7). The sequence of the gypsy element was determined on both DNA strands as described by Maxam and Gilbert (8). The analysis of the protein sequence homologies among different retroviral elements was performed with the Bionet computer facility
FIG. 1. Restriction map of the gypsy element. The shaded areas represent the LTRs of gypsy, and the central region of the element is indicated by a thin line. The lower part of the figure indicates the strategy used to obtain the DNA sequence of the gypsy element. The symbols used to represent the restriction enzymes are A, AvaI; C, ClaI; E, EcoRI; H, HindIII; N, NcoI; and X, XbaI.
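To illustrate how a map such as Fig. 1 relates to sequence data, the sketch below scans a DNA string for the recognition sites of the six enzymes named in the legend. The recognition sequences are the standard published ones and are an assumption here, since the paper does not state them; the function name `find_sites` and the short test sequence are hypothetical. The sketch reports where each site starts, not where the enzyme cuts.

```python
import re

# Standard recognition sequences (assumed from general enzyme references,
# not given in the paper). Y = C or T, R = A or G.
RECOGNITION_SITES = {
    "AvaI": "CYCGRG",
    "ClaI": "ATCGAT",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "NcoI": "CCATGG",
    "XbaI": "TCTAGA",
}

# Translate the nucleotide codes used above into regex fragments.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}


def find_sites(sequence):
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    sequence = sequence.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        pattern = "".join(IUPAC[base] for base in site)
        # A lookahead lets overlapping occurrences be reported as well.
        hits[enzyme] = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
    return hits


# Hypothetical toy sequence, not taken from gypsy.
toy = "TTGAATTCAAAAGCTTCCCTCGAGTTATCGATGG"
site_map = find_sites(toy)
```

All six recognition sequences are palindromic, so scanning one strand is sufficient for this purpose.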
* Corresponding author.
† Present address: Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139.
[Figure 2 sequence listing: complete nucleotide sequence of gypsy (positions 1 to 7,469), with LTR boundaries marked and the predicted amino acid sequences of the three open reading frames printed beneath the DNA sequence. The listing is not legible in this copy.]
FIG. 2. Nucleotide sequence of gypsy. The gypsy element located at the forked locus in the f1 allele was cloned as previously reported (11). This complete element was sequenced by the procedure of Maxam and Gilbert (8) and the strategy depicted in Fig. 1. The arrows indicate the starting and ending nucleotides of the LTRs. The different sequences implicated in the transcription and replication of the element are shaded. The CAT box, the Hogness box, the tRNA binding site, the polypurine tract, and the polyadenylation signal have been respectively designated a, b, c, d, and e. Also indicated is the amino acid sequence of the putative gene products encoded by the three different open reading frames located in the central portion of the element.
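Open reading frames such as those annotated in Fig. 2 are commonly located by reading the sequence in each frame and looking for long stretches free of termination codons. A minimal sketch of that idea follows, assuming the three standard stop codons and the forward strand only. The function name `open_frames`, the length threshold, and the toy sequence are hypothetical, and this is not presented as the procedure the authors used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_frames(sequence, min_codons=3):
    """Yield (frame, start, end) for stop-free codon runs on the given strand.

    start/end are 0-based, end-exclusive nucleotide coordinates; the
    terminating stop codon is not included in the run.
    """
    sequence = sequence.upper()
    for frame in range(3):
        run_start = frame
        # The last full codon in this frame starts at or before len - 3.
        for pos in range(frame, len(sequence) - 2, 3):
            if sequence[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    yield frame, run_start, pos
                run_start = pos + 3
        # Run that reaches the end of the sequence without a stop codon.
        end = frame + ((len(sequence) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            yield frame, run_start, end


# Hypothetical toy sequence: two stop-free runs in frame 0 separated by a
# single TAG codon, echoing the ORF2/ORF3 arrangement described in the text.
toy = "ATGGCTGCTAAA" + "TAG" + "GGTGGTGGTCCC"
frames = list(open_frames(toy, min_codons=4))
```

On the toy sequence, frame 0 yields two runs separated only by the single TAG codon, which is the kind of arrangement reported for ORF2 and ORF3.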
(IntelliGenetics, Inc., Palo Alto, Calif.) and the SEARCH and ALIGN programs.

RESULTS AND DISCUSSION

General features of the gypsy transposable element. The gypsy element whose sequence is reported in this manuscript was isolated from the forked locus of D. melanogaster in the f1 allele (11). The restriction map of this element and the neighboring sequences of the f locus is shown in Fig. 1. The complete sequence of this particular copy of the gypsy element was determined by the sequencing protocol of Maxam and Gilbert (8), following the strategy indicated in the lower part of Fig. 1. This sequence information is shown in Fig. 2. The gypsy element has a total length of 7,469 base pairs (bp) and contains two LTRs 482 bp long. The central part of the element contains three different open reading frames which encode putative proteins whose amino acid sequences are shown in Fig. 2.

Structure of the LTRs. The nucleotide sequence of the LTRs of the gypsy elements responsible for the mutant phenotype in the alleles sc1, bx3, and bx34e was determined previously by Freund and Meselson (5). The LTRs of these three different insertions were found to be identical and 482 bp long. In addition, Bayev et al. (1) determined the sequence of the LTRs from two different copies of the gypsy element located at undetermined places in the wild-type Oregon R genome. These LTRs were found to be 479 bp long. In the case of the gypsy element located in the forked locus in f1, the LTRs are identical and 482 bp long. The sequence we obtained differs from that of Bayev et al. (1) by the addition of three nucleotides at positions 163 (A), 165 (T), and 170 (T), and it differs from that of Freund and Meselson (5) by the deletion of one C at position 67 and the addition of another C at position 102. The observed changes do not affect any of the transcription regulatory sequences of the LTRs.
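The length figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the reported insertions and deletions are the only length-changing differences between the three LTR sequences, and that the element consists of two LTRs flanking a single internal region. The variable names are illustrative only.

```python
ELEMENT_LENGTH = 7469      # bp, total length reported for gypsy
LTR_LENGTH = 482           # bp, each LTR in the f1 copy

# Internal region between the two LTRs.
internal_length = ELEMENT_LENGTH - 2 * LTR_LENGTH

# Bayev et al. LTRs (479 bp) plus the three additional nucleotides
# reported at positions 163, 165, and 170.
bayev_reconciled = 479 + 3

# Freund and Meselson LTRs (482 bp): one C deleted, one C added.
freund_reconciled = 482 - 1 + 1

print(internal_length, bayev_reconciled, freund_reconciled)
```

Both earlier LTR lengths reconcile to 482 bp under these assumptions, and the subtraction gives an internal region of 6,505 bp.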
This low rate of divergence among different copies of the same transposable element has also been observed in the case of the copia element (10).

The locations on the LTRs of various transcription signals, such as the CAT and TATA boxes and the polyadenylation sequence, are also shown in Fig. 2. Freund and Meselson (5) proposed that the sequence TATATAA, located at the 3' end of the LTR, could play the role of the Hogness box for initiation of transcription. Because this sequence is located 3' to the polyadenylation signal, the RNA synthesized would not contain all the information necessary to make a new full copy of the transposable element by reverse transcriptase. In agreement with what is generally the case in retroviruses, in which the TATA box is located 20 to 50 bp upstream of the polyadenylation signal (19), we propose that the sequence AATATT (Fig. 2) serves as the transcription initiation signal for gypsy expression. This sequence is similar to the Hogness box of other retroviruses and retrovirus-like elements (20) and is located 30 bp upstream from the polyadenylation signal. In addition, a sequence homologous to the CAT box is present 23 bp upstream of the sequence AATATT in a position similar to that found in other retroviral elements. We do not yet have direct experimental evidence indicating that the sequence AATATT serves as a signal for initiation of transcription, but its location with respect to the different components of the LTR suggests that this might be the case.

Other conserved sequences present in retroviral elements and necessary for the completion of their life cycle are also found in the gypsy element. Immediately 3' of the left LTR there is a sequence similar to the tRNA binding site of other retroviruses (Fig. 2) which is involved in the synthesis of the first strand of viral DNA (20). This sequence has a strong homology with Drosophila tRNAs (tRNALys). In addition, the region located immediately 5' to the second LTR contains a 10-bp purine-rich tract similar to the sequence highly conserved among retroviruses which appears to serve as primer for the synthesis of plus-strand DNA (20).

Insertion site of the gypsy element. Freund and Meselson reported previously that the gypsy element shows a sequence specificity at the insertion point in the Drosophila genome. In all three mutations, sc1, bx3, and bx34e, the gypsy element inserts at the sequence TACA*TA, in which * denotes the insertion site (5). To determine the recognition sequence for gypsy insertion in the f1 mutation, we sequenced both the surrounding region of the forked locus, located outside the gypsy element in f1, and the corresponding wild-type sequences from a Canton S strain (11). From the comparison of the mutant and wild-type sequences (data not shown), we determined that the recognition sequence for gypsy insertion in this particular case is TATA*TA. This is also the insertion site for one of the gypsy elements characterized by Bayev et al. (1). Because the sequence TACATA is present in the forked locus a few nucleotides upstream of the insertion site of the gypsy element (data not shown), it seems that this particular sequence might not be required for the insertion of the transposable element and that the consensus sequence for gypsy insertion is TA(C/T)A*TA. This sequence is similar to that reported by Ikenaga and Saigo for the insertion of the 17.6 element (6). As a consequence of the insertion of the gypsy element, there is a 4-bp duplication at the insertion site.

Gypsy encodes putative gene products homologous to retroviral proteins. The central region of the gypsy element contains three different open reading frames (Fig. 2) organized in a fashion similar to those of the Drosophila 17.6 transposable element (19) and vertebrate retroviruses (20). Using the ALIGN program of the Bionet computer facility, we did not detect any significant homology between putative proteins encoded by the first and third open reading frames (designated ORF1 and ORF3, respectively) and gene products encoded by retroviruses or other transposable elements. These two open reading frames presumably encode gene products respectively similar to the gag and env proteins, which are expected to be different among various retroviruses. The second open reading frame, however, encodes a putative protein (ORF2) highly homologous to proteins typically encoded by retroviruses. It is interesting to note that the putative gene products ORF2 and ORF3 are encoded in the same reading frame and separated by only one translation termination codon (Fig. 2). Thus, a 1-bp change in this codon will yield a gypsy element in which the ORF2 and ORF3 gene products are translated as a single polypeptide. To check whether the existence of this termination codon is a general structural characteristic of the gypsy element or a particular property of the gypsy located in the forked locus in the f1 allele, we have also sequenced the corresponding DNA region of a different gypsy element responsible for the mutant phenotype of y2 (12). This gypsy element is organized in the same fashion as that in the f1 mutant, suggesting that gypsy encodes three different putative gene products, the second and third ones being separated only by a protein termination codon.

The amino acid sequence in three different regions of the putative gypsy ORF2-encoded protein is shown in Fig. 3. These regions are respectively homologous to the retroviral gag-specific protease (Fig. 3A), the reverse transcriptase
[Figure 3 alignment panels: amino acid alignments of the gypsy ORF2 product with the corresponding sequences from 17.6, MoMLV, CaMV, and copia for the protease (A), reverse transcriptase (B), and endonuclease (C) regions. The alignments are not legible in this copy.]
FIG. 3. Alignment of the amino acid sequences of the protease (A), reverse transcriptase (B), and endonuclease (C) activities encoded by the gypsy and other retroviral elements. The positions at which four or more of the sequences share identical or chemically similar amino acids are shaded. The symbol * indicates that the position is occupied by identical residues, and + indicates that similar amino acids are present at that position. Chemically similar amino acids are grouped as follows (17): A, S, T, P, and G; N, D, E, and Q; H, R, and K; M, L, I, and V; F, Y, and W.
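The marking rules in the Fig. 3 legend can be expressed as a short procedure. The sketch below is one possible reading of the legend and is not the authors' program: a column is marked `*` when at least four sequences carry the same residue, `+` when at least four residues fall in the same similarity group, and left blank otherwise. The function name `mark_columns`, the treatment of gaps as never matching, and the toy alignment are assumptions made for illustration.

```python
from collections import Counter

# Similarity groups as listed in the Fig. 3 legend.
GROUPS = ["ASTPG", "NDEQ", "HRK", "MLIV", "FYW"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def mark_columns(alignment, threshold=4):
    """Return one marker per column: '*', '+', or ' '.

    alignment: equal-length strings, '-' for gaps (gaps never match).
    """
    markers = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        # Most frequent single residue in the column, if any.
        identical = Counter(residues).most_common(1)
        # Most frequent similarity group; residues outside every group
        # (for example C) are ignored here.
        similar = Counter(GROUP_OF[aa] for aa in residues if aa in GROUP_OF).most_common(1)
        if identical and identical[0][1] >= threshold:
            markers.append("*")
        elif similar and similar[0][1] >= threshold:
            markers.append("+")
        else:
            markers.append(" ")
    return "".join(markers)


# Hypothetical five-sequence toy alignment, not taken from Fig. 3.
toy_alignment = [
    "DLIKF",
    "DMLRY",
    "DLV-A",
    "DIAKC",
    "ELMHF",
]
markers = mark_columns(toy_alignment)
```

If the legend instead intends `*` to require identity across all aligned sequences, only the threshold test for identical residues would need to change.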
(Fig. 3B), and the endonuclease activity required for provirus integration (Fig. 3C). Also shown in Fig. 3 is a comparison of the amino acid sequence of the ORF2 gypsy gene product with homologous proteins from cauliflower mosaic virus (4), Moloney murine leukemia virus (18), and the Drosophila 17.6 (15) and copia (10) elements. The homology among these different gene products is concentrated in specific regions of the protein, and it is apparent that the gypsy-encoded protein contains the same or a conserved amino acid at those locations in the protein where the sequence has been shown to be invariant among many retroviruses and retroviral elements (19). These locations are indicated in the figure by the symbol * if the different proteins contain the same amino acid at that position or + if there is a conservative change. By comparing the various sequences represented in Fig. 3 at those specific locations, we concluded that the gypsy element is evolutionarily closer to the 17.6 element and vertebrate retroviruses than to the Drosophila copia element.

ACKNOWLEDGMENTS

This work was supported by Public Health Service award GM32036 from the National Institutes of Health and by Biomedical Research support grant S07 RR07041. S.M.P. was funded by National Institutes of Health training grant GM07231.
LITERATURE CITED

1. Bayev, A. A., N. V. Lyubomirskaya, E. B. Dzhumagaliev, E. V. Ananiev, I. G. Amiantova, and Y. V. Ilyin. 1984. Structural organization of transposable element mdg4 from Drosophila melanogaster and a nucleotide sequence of its long terminal repeats. Nucleic Acids Res. 12:3707-3723.
2. Clare, J., and P. Farabaugh. 1985. Nucleotide sequence of a yeast Ty element: evidence for an unusual mechanism of gene expression. Proc. Natl. Acad. Sci. USA 82:2829-2833.
3. Dickson, C., R. Eisenman, H. Fan, E. Hunter, and N. Teich. 1982. Protein biosynthesis and assembly. Cold Spring Harbor Monogr. Ser. 10C:513-648.
4. Franck, A., H. Guilley, G. Jonard, K. Richards, and L. Hirth. 1980. Nucleotide sequence of cauliflower mosaic virus DNA. Cell 21:285-294.
5. Freund, R., and M. Meselson. 1984. Long terminal repeat nucleotide sequence and specific insertion of the gypsy transposon. Proc. Natl. Acad. Sci. USA 81:4462-4464.
6. Ikenaga, H., and K. Saigo. 1982. Insertion of a mobile genetic element, 297, into the TATA box from the H3 histone gene in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 79:4143-4147.
7. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
8. Maxam, A. M., and W. Gilbert. 1980. Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65:499-560.
9. Modolell, J., W. Bender, and M. Meselson. 1983. Drosophila melanogaster mutations suppressible by the suppressor of Hairy-wing are insertions of a 7.3 kilobase mobile element. Proc. Natl. Acad. Sci. USA 80:1678-1682.
10. Mount, S. M., and G. M. Rubin. 1985. Complete nucleotide sequence of the Drosophila transposable element copia: homology between copia and retroviral proteins. Mol. Cell. Biol. 5:1630-1638.
11. Parkhurst, S. M., and V. G. Corces. 1985. forked, gypsys and suppressors in Drosophila. Cell 41:429-437.
12. Parkhurst, S. M., and V. G. Corces. 1986. Interactions among the gypsy transposable element and the yellow and the suppressor of Hairy-wing loci in Drosophila melanogaster. Mol. Cell. Biol. 6:47-53.
13. Roeder, G. S., and G. R. Fink. 1983. Transposable elements in yeast, p. 299-328. In J. A. Shapiro (ed.), Mobile genetic elements. Academic Press, Inc., Orlando, Fla.
14. Rubin, G. M. 1983. Dispersed repetitive DNAs in Drosophila, p. 329-361. In J. A. Shapiro (ed.), Mobile genetic elements. Academic Press, Inc., Orlando, Fla.
15. Saigo, K., W. Kugimiya, Y. Matsuo, S. Inouye, K. Yoshioka, and S. Yuki. 1984. Identification of the coding sequence for a reverse transcriptase-like enzyme in a transposable genetic element in Drosophila melanogaster. Nature (London) 312:659-661.
16. Scherer, G., C. Tschudi, J. Perera, H. Delius, and V. Pirrotta. 1982. B104, a new dispersed repeated gene family in Drosophila melanogaster and its analogies with retroviruses. J. Mol. Biol. 157:435-451.
17. Schwartz, R. M., and M. O. Dayhoff. 1978. Matrices for detecting distant relationships, p. 353-358. In M. O. Dayhoff (ed.), Atlas of protein sequence and structure, vol. 5. National Biomedical Research Foundation, Washington, D.C.
18. Shinnick, T. M., R. A. Lerner, and J. G. Sutcliffe. 1981. Nucleotide sequence of Moloney murine leukemia virus. Nature (London) 293:543-548.
19. Toh, H., R. Kikuno, H. Hayashida, T. Miyata, W. Kugimiya, S. Inouye, S. Yuki, and K. Saigo. 1985. Close structural resemblance between putative polymerase of a Drosophila transposable genetic element 17.6 and pol gene product of Moloney murine leukemia virus. EMBO J. 4:1267-1272.
20. Varmus, H. E. 1983. Retroviruses, p. 411-503. In J. A. Shapiro (ed.), Mobile genetic elements. Academic Press, Inc., Orlando, Fla.
21. Varmus, H. E., and R. Swanstrom. 1982. Replication of retroviruses. Cold Spring Harbor Monogr. Ser. 10C:369-512.
22. Will, B. M., A. A. Bayev, and D. J. Finnegan. 1981. Nucleotide sequence of terminal repeats of 412 transposable element of Drosophila melanogaster. A similarity to proviral long terminal repeats and its implication for the mechanism of transposition. J. Mol. Biol. 153:897-915.