Apr 19, 1985 - Nucleotide Sequence of the Syrian Hamster Intracisternal A-Particle. Gene: Close ...... Spring Harbor Laboratory, Cold Spring Harbor, N.Y.. 6.
Vol. 55, No. 2
JOURNAL OF VIROLOGY, Aug. 1985, p. 387-394
0022-538X/85/080387-08$02.00/0 Copyright © 1985, American Society for Microbiology
Nucleotide Sequence of the Syrian Hamster Intracisternal A-Particle Gene: Close Evolutionary Relationship of Type A Particle Gene to Types B and D Oncovirus Genes MASAO ONO,'* HIROYUKI TOH,2 TAKASHI MIYATA,2 AND TOMOE AWAYA' Department of Molecular Biology, School of Medicine, Kitasato University, Sagamihara-shi, Kanagawa 228,1 and Department ofBiology, Faculty of Science, Kyushu University, Fukuoka-shi, Fukuoka 812,2 Japan Received 8 February 1985/Accepted 19 April 1985
We determined the complete nucleotide sequence of the intracisternal A-particle gene, IAP-H18, cloned from the normal Syrian hamster liver DNA. IAP-H18 was 7,951 base pairs in length with two identical long terminal repeats of 376 base pairs at both ends. On the coding strand, imperfect open reading frames corresponding to gag and pol of the retrovirus genome were observed, whereas many stop codons were present in the region corresponding to env. The putative H18 gag gene (809 amino acids) had a sequence homologous to the N-terminal half of the mouse mammary tumor virus gag gene and locally to the Rous sarcoma virus gag gene. The putative H18 pol gene (900 residues) was homologous to the Rous sarcoma virus pol gene almost throughout the entire region. Two conserved regions among the retrovirus pol genes have been reported. One presumably corresponds to the DNA polymerase and the RNase H domain, and the other corresponds to the DNA endonuclease domain of the multifunctional protein pol. By the comparison of the deduced amino acid sequences of the putative endonuclease domain of six representative oncovirus genomes, a phylogenetic tree of the oncovirus genomes was constructed, and the intracisternal A-particle (type A) genome was found to be more closely related to the mouse mammary tumor virus (type B) and squirrel monkey retrovirus (type D) genomes.
transcription was different from known retrovirus primers. Recently, a significant homology of the deduced amino acid sequence in the pol region among the retrovirus genes has been reported (3, 29). However, due to the incompleteness of the total nucleotide sequencing, the internal structure of the IAP gene and a definite relationship between the IAP and retrovirus genes have yet to be clarified. Among the cloned IAP genes, those from two Mus species were fairly polymorphic (10, 15, 16), and a portion of the IAP gene was missing in some of them. Those from the Syrian hamster, however, were not markedly polymorphic, and no deletion was observed among them (28). Thus, we chose the Syrian hamster IAP gene H18 (28), which is a representative among the IAP genes we have isolated, for determining the complete nucleotide sequence. The nucleotide sequence of 1AP-H18 predicted a typical LTR-gag-pol-env-LTR structure, although many stop codons were present in the region corresponding to env. The computer-assisted comparison of the deduced amino acid sequences corresponding to the putative DNA endonuclease domain of the pol region showed 1AP-H18, mouse mammary tumor virus (MMTV) (type B), and squirrel monkey retrovirus (SMRV) (type D) genomes to be closely related. Furthermore, based on the homology of the deduced amino acid sequences corresponding to that region, we were able to construct a phylogenetic tree of six representative members of the oncovirus subfamily.
Morphologically and biochemically, intracisternal A particles (IAPs) are retrovirus-like structures which are consistently observed in a variety of tumor cells and in early embryonic cells derived from normal rodents, such as mice, rats, and Syrian hamsters (12). Mouse IAPs contain a major gag-like protein of 73,000 daltons (8, 18), a magnesiumdependent reverse transcriptase (31), and a polyadenylated RNA molecule (IAP RNA) (18). DNA sequences complementary to the IAP RNAs (IAP genes) are interspersedly present on the rodent chromosomes (12) in several hundred to a thousand copies per haploid genome (11, 13, 16, 28), although the function of these genes in vivo is unknown. Recently, the amplification of a particular subset of the IAP genes (25) and activation of a proto-oncogene, c-mos, due to the integration of an IAP gene into its coding region close to the N terminal (2), have been observed in mouse myeloma cells. Furthermore, the integration of an IAP gene into the intron of an actively transcribed immunoglobulin kappa light chain gene has been reported to cause inactivation of that gene (9). These findings indicate that a considerable number of IAP genes present in the rodent genome can act as endogenous insertion mutagens which may cause genetic diseases such as cancer. There may be a close evolutionary relationship between the IAP and retrovirus genes, because the molecularly cloned IAP genes from two species of Mus and Syrian hamster were 6 to 8 kilobases in length (10, 15, 16, 28) with long terminal repeat (LTR) sequences of ca. 0.35 kilobases at both ends of the gene (2, 4, 9, 16, 17). These LTRs possessed all of the structural features commonly observed on the retrovirus LTR, although they show no marked sequence homology with other retrovirus LTRs. Furthermore, a presumed primer tRNA (phenylalanine tRNA) (17) for reverse *
MATERIALS AND METHODS Clones and DNA sequencing analysis. The IAP gene clone IAP-H18 from the Syrian hamster has been previously described (28). DNA fragments were labeled either at 5' ends with [y-32P]ATP (Amersham Corp., Arlington Heights, Ill.) and T4 polynucleotide kinase or at 3' ends with [a-32P] ddATP (Amersham Corp.) and terminal deoxynucleotidyl
Corresponding author. 387
388
ONO ET AL.
J. VIROL.
1
(+) 2HII IN 3111
Il
ORF11111111i IIIIIIIIIIIIIIIIIIIII I M11111 II IIIliiuii 111111 I 1III II 111111 1 111111ra ORF5 1 ORFB I IH I I I I11 1 Il lIIIlf I||LII| ORF2 1 ORF3, F7 H
111111
gag
A
I ^
4-@
.
*-
OI o-*
O
I1000 1000
1 *0 4
*0 4
*
4
*
0
O-
4. 404O
0
2000 2000
-* 4-,,O
v,
O_
*
----. 0
0-_
11 1
*
4-
O- *
4000I 4000
4-,0
,
.-*
4-* 0
0
0
0O.
~~~~0
O
*----
I
I
I
5000
6000
7000
0-
*--
.
V O-*
-4
A a
7951
4.4.u
-
*-O
4-0 4* * -
1111 1 1 1 ll 11 1I1 1111111 11111 11
X ) IIE
4-.-
O1
3000 3000 -
X--0
*
pol I1H
II llllI1 11111 1111 lI II11111111 1 11111 iHilll II1 I nI III IIII
1H111 11 I 11 I1 I11 I lTiIii IIIliT 11III1 11 1111 111111 11111 I I III 11 IIIITDllllllll IIIT IIll I II11111 I11111 1111111111111 111111111 III1111111 FT1111 111 1 IIII 11111111111111
FIG. 1. Restriction map, sequencing strategy, and open reading frames of the Syrian hamster IAP gene, H18. The arrows indicate the extent and direction of the sequence determined for the coding strand (top; +) and the noncoding strain (bottom; -). Symbols: 0, labeled at the 5' end; 0, labeled at the 3' end; and V, determined previously (17). Stop codons in each phase of both strands are shown by the vertical lines. Putative gag and pol regions are shown, and two mutually conserved regions, A and B, in the retrovirus pol genes are boxed. Restriction enzyme abbreviations are as follows: Ba, BamHI; Bg, BgIlI; E, EcoRI; H, HindIII; K, Kpnl; P, Pstl; Sa, Sacl; Sc, Scal; St, Stul; and X, XbaI.
transferase. The nucleotide sequence of the fragments was determined by the method of Maxam and Gilbert (14), and the sequencing strategy is shown in Fig. 1. For the comparisons of the deduced amino acid sequences by the homology matrix, a computer program was used to generate diagonal lines indicating segments of 30 residues long that show homology with a probability of occurrence by chance of less than 3 x 10-4 (29). With these matrices, the sequences were aligned manually or by the computer-assisted method of Sankoff (21). Amino acid sequence homology was calculated as the percentage of the mutually identical residues at the same position. In calculating homology, each gap was counted as one substitution, regardless of its length. RESULTS Structural features of IAP-H18. IAP-H18, a representative clone of the IAP genes present in multiple copies in the Syrian hamster genome (13, 28), was 7,951 base pairs (bp) in length with two identical LTRs of 376 bp at both ends (Fig. 2). On each LTR, many structural features commonly observed on the retrovirus LTR, such as the CAT box, TATAA box, and polyadenylation signal, were present (17). About 60% homology at the nucleotide sequence level was detected between H18 LTR and those of two Mus species (16), whereas these IAP LTRs had little sequence homology with either MMTV (6), Rous sarcoma virus (RSV) (22), Moloney murine leukemia virus (Mo-MuLV) (27), or human T-cell leukemia virus type I (HTLV-I) (24) LTRs. On the
H18 LTR, four consecutive enhancer-like direct repeats of 23 bp as a standard unit were found (DR1 in Fig. 2) (17). Starting at position 611, eight consecutive and almost identical direct repeats of 14 bp with unknown functions were present (DR2 in Fig. 2) in the region corresponding to the leader sequence in the retrovirus genome. Except in the regions mentioned above, neither direct nor inverted repeats of more than 13 bp in length could be detected. After translation of the H18 nucleotide sequence into the amino acid sequence, positions of the stop codons on each reading frame were identified (Fig. 1B). On the reading frames of the coding strand, several open reading frames (ORF1 to ORF7) were present in the region corresponding to gag and pol of the retrovirus genome, whereas many stop codons were found in the region corresponding to env. No open reading frame of more than 140 amino acid residues in length was detected on the reading frames of the noncoding strand. IAP-H18 gag gene. To elucidate the internal structure of the H18 gene, deduced amino acid sequences of ORF1 to ORF7 were compared with known retrovirus gag or pol gene products by the computer-assisted graphical matrix method as previously described (29). Deduced amino acid sequences starting from ORF1 to ORF3 showed a meaningful resemblance to the RSV gag gene (22) in three regions (Fig. 3). The first region (N-p27) of ca. 90 residues was located at the N terminus of p27, a major structural protein of the RSV virion core; the second region (C-p27 + N-p12) of ca. 140 residues was positioned at the C terminus of p27 and the N terminus
FIG. 2. Nucleotide sequence of the Syrian hamster IAP gene, H18. The DNA sequence of the coding strand is given. LTRs are enclosed by brackets. The CAT box, TATAA box, and polyadenylation signal are boxed. A presumed primer-binding site (PBS) and short direct repeats (DR1 and DR2) are underlined. Nucleotides differing from a standard 14 bp of DR2 are shown by A. In the putative gag region, deduced amino acid sequences homologous to the RSV gag gene are presented in the box. In the putative pol region, deduced sequences of the two conserved regions, A and B, are presented in the box.
1
I't;AGGAGCCG('CCt'T('G('l'ATt-G('l'Al 'J GCCtI'Gl'A(AtAt(;A1('(;( (;('1'(;~A( 1 ( ACTGTCAGTTGGAGTTAACTGTCGCTGTCAGTTGGAGTTAACTGTCGCTGTCAGTTAGAGTTAMC
,0
AtT'GCA(,CTCC'JAGGC1iA 7GTCGC1TCAGTTh GAGTTCACCGTTI?AA A71' 1VCAt t (A('tACM*'('CG('(' ( 1t CA(:Al C'l'(A ~ATGTCt C1 C lCCCGTCACCl'('1 1
246
iGGGCTGGGTTTCTTAGCTTGGGGTCTCCCTC'AAGAAGc,(''CAh'( A1'CT'Al't.l' l'(AAGAI'Cc
360
CT IlA( ? t ('AGACGAl ('C'GACG(I'ITC'CCCGTCGTTCT'I'GCTCGCCA(I
ACGGTAGCGCGGGAc3rCTGGTGCCGAACCCGGGAACTCI'TCAACAT'G( CGCCGCCCCGCGAGGACCCCI(GC;MAA( CCGGC,C'GATTCAGAICTCCAGGGACGTAGGTTCAGAGAGGT 46C "C1 'TTGI1 CCl'TCl L C'TTCATCGTCGCTCGTG & 0R2 a& TGCTGTCAGT CCATC1GCGCTGTCCGTCTiCGCTGTCCATC CGCTGTCCATC GCGCTGTCCATC GIGCTGTCAGTC GTGCTGTCAG
600
ATGCTTTATCCTG AACAACCCTTTl'CTCGACl'TTGUGCCl l'AG Tl'l'Gl'L'ATCAC CGl'CGGht'CGGG 1''AG 1'C:(.AC,.(;Cl'l'l'6G TC;CTAGT(C
&
720
CTGACAGAACMgCGCTG
ClGGTAACGGAGTTG AGCGCGTTl'TTl'AGATAAGCTGTCAGCGCG C,GG'TCGI CACAGTCAGIGGTTACAGCCTTACAGACTG IC.CTiAAAACAAAGAGAT'CTitAAAAl'l'GCCACCACG
840
1
'GGCCTCATGGAACAAGCTiGGTtAAGAACTAGACAGGAAGCATGCACAt ACACTTCAAMATTTTA'G AAGGAGGTAGATAGAG1'GGCACCATGGI'ATGCTTGI'TL-CGGGTCCCTCACTC
960
1080
AGAAAGTCAGGGAACCTTAGAGGAACTTCAGGATAGC GGAGATCTCTGCTTAGGTACCAAAGCCATTTGGAAGCTAGTTAAAAACIGCTTGGAGGATGAGGCCTCI'CtACCCTGCTATAAI ATG TCAGAAACTGA CGGAGTGAGAGAATGGGAGCTCGCAAAAGAAAAGACATGTC1'AAGAAAAAAGGCCCTCCCCAGGAAGTTAAGAAAGGGGGAGAGAAAGAAGGGAGTTATCATTCA
1200 1 320 1440
GGTCTAGAGGAGGAGG TGGCA CGG TATG AGG AAAAAAG ATATs .A CCCTG ATA.GACACCGG CCA CTC;AAP.AC ^AAAATG AATG TG AG(;C tCCACG C C 1'A1w'ATC CGG CGGG CTCAC GCCC C
TCAGCGCCTCCGCCATATGAGCTTCGGCTGAATACtIGGTACTGACTCCTTI'TTACCTTTAGAGGAACGGAGAAAAATACAAAICGCTI'T 'CAGTATTCGAGAATGCGCAAGCCGGAAGA 1'60 roValPheGluAsnAlaC(l uGlyClyArg . p27 CTTGTCTG AGG ATTTT'GCCATGCACT GTACATACCCCCGGTAGGATTACCAATAATTAAAGAATTGGCCGAATCCGTTCGTAATTAC1GGGGTCAATGACTAACTTTACCAAA ValHGisThrProValAspTyrAsnGlnreLysGlul,euAlaGl uSeralAr gAsnTyrGlyValAsnAlaAsnPeThrThrLeuGlnValGC1uArgLeuAlyaAsnPheAlaMetThr
1680
o00
CCCACTGACTGGCAAACGACAG TGAAAMGCAG TG CTTCCCAACATGGGACAATAC.ATGG AATGG AA.G.CTCI'C'l'GG TATG ATG CAG CCCAGG CACAAG CTAG AG TTAATGCCA CGG CT AA ProThrAspTrpGlnTh rTh rValLysAl aVal LeuPr oAsnMetGlyGInTy rMetG 1uTr pLysAl aLeuTr pTy rAspAl1a.Al1aC 1nAl aC:lnAl aA rgValAsnAl aTh rAI
1
GATGAAGAA ATCGAGGCAGTGATGACATGAAATGCGACAGGTCAGGGTCAACATGCCCTTAAAACCAACTACATCTGGGGCGTCATGAGTCAGGTGTCGGCCGCCGCTATTAAAGCT
1920
7GGAAGGCCCTCACAA
GGAGGTCCAGTGGACACCTTALAAAGATTGTCCAAGGACCCCAAGAGCCATTTi'AGACTTTGTGGuCCAGMTGACAGAGGCCGCCAGTCGCATCTTTG 2040 GlvAroSerSerGlyHisLeuThrLysIleValGlnGlyProGlnGluProPheSerAspPheValAlaArgMetThrGluAlaAlaSerArgIlePhe 2160
GTGATTCAGAACMAGCTATGCCCCTGATTGAGCAACTTG TCTATGAACAAG CAAC CCAGG AATGTCGAG CGG CTATCGC CC CTCG CAt AAG CAAAGG TTTACAAGATTGGCTCAGAG TIG1
GlyAspSerGluGlnAl aMetProLeuIl1eGluGlnLeuValTy rGluGlnAl aTh rGlnG1luCysAr gA1 aAla y 1eAl aProA rgLysSe rlysGlyLeuGlnAspTr pLeuAr gVal
ACTAGCT,GCCGCCAT*kTACAAACG CACAGGTATAGAG ATTTATCTAATCG~AAA; CTTGTTTTAATTG TGGGAGAATGGGGCAC C
2280
*p27= CTTGCATGTTGTZGGTAAAGG TTATCATAGGG CTAGCGtAATG CAG lCAG TCCGGG ATG TAAAGGGAGG CTTC
2400
GCAGGGAGCTGGGGGGACCCCTAACCAATGCAGG
1 euGl nTh r HisAr gTy rAr gAspLeuSe rAsnA r gyA1 aCysPheAsnCysGlyAr gMetG lyHi s CysArgGluLeuGlyGlyProLeuTh rAsnAl aGlyLeuAl aaAlAlaIL pl 2
TTAAGAAMGATTGTCAAGCCCCTGAAAGAACAAGAGAACGA
LeuLysLysAspCysGlnAlaProGluArgThrArgGluSerLysLeuCysTyrArgCysGlyLysGlyTyrHisArgAlaSerGluCysArg 2 520 26 40 2760 2880
TACCCCTAAATGAGAGACCTACAATAAJkAiWTTAATTAATGGAAAAGJkATTTGA GG.TkATGGATTCGGGAGCAGACAAAAGCATTAI-TTCACTCCATTGGTGGCCGAAGTCTTGGC lyIleMetAspSerGlyAlaAspLysSerIlelleSerLeuHisTrpTrpProLysSerTrp CCACTGTTGTTTCATCTCATTCTCTACAA.GTCTTGGATACCAGTCCTCTCCTGCCATC.GTGCCTCAGCCTTAACCTGGCGGGATGCT.AAGGCAAACAGGGATGTTTTACCCCCTAT.
3000 3120
ProShrValValSerSerHisSerLeuGlnGlyLeuGlyTy rGlnSerSer ProAlaIleSerAl aSerAl aLeuTh rTrpArgAspAl aGl uGlyLysGlnGlyCysPheThr ProTyrj 3240
GCTCCCCCCAGGCATCACCATTATGGCAAAGATGGGGlTATACAAATC TGTTGCCACTCCCTG1TAAATTTATGGGGACGAGATGTGTTACAAGCCATGGGCATGACCCTAAC IValLeuPr oLeuPr oValAsnLeuTr pG lyAr gAspVal LeuG lnAl aMe tGlyMe tTh rLeuTh rAsnGl1
*~~~~~~~~ .
ATTACACAACACGGAAATCGGGGTAGAAAAGGACTGGGTTTTAT+1TGGaGGCCGTTGAGGCTTCACGACCCATACCATG
I
3360
seuGlyAlaValGluAlaSerArgProIleProTrp 3480
ATGGAATACACCAATTTTTGTTATTAM"GAAATCTGGGAAGTGGAGATTACTCCACG.CCTGCGGGCCATTAACAATCAGATGCATC;TTTTGGCCCTGTTCAAAGAGGCCTTCCTT'T
A
3600
GCTTTCTG CACTTCCTCAAGATTGGAAGCTTATTATTATsAGATATTAAGGATTGTTTCI'TCTCTATTCCACTTTACCCACGGGATAGACCAAGGTTTG CCTTCACTATC CCTTCTCTTAA 313720 LeuSerAlaLeuProGlnAspTrpLysLeuI leIl1ell1eAspI leLysAspCysPhePheSer Il1eProLeuTy rPr oArgAspArgProAr gPheAlaPheThr IleProSerLeuAsn
TCAkTMkGGAACCAGACAGAGAI,TTTCA%GTGGAAGGTACTGCCGCAAGGCATGGCCAATAGCCCAACAATATGTCAGCTA.TATGTGCAGG'AGG CTTTGGAGCCAATTAGGAAGCAATTT~AC3 84 0 Hi sMetGluProAspLysArgPheGlnTr pLysValLeuPr oG lnG lyMetAl aAsnSe rProThr Il1eCysGl1nLeuTy rVa 1Gl1nGl1uAl al,euG 1uPro Il1eAr gLysGl1nPhe Thrl
ATCTTAATCGTTATCCATTATATGGACGiTATTCTTATTTGTCACAAAG AGTTAGATG TTTACAAAMGG CTTTTCCCATGCTGGTGG CTAATTAAAACAGTGGGGATTAGAGATTGC 13 960 SerLeulleValIleHisTyrNetAspAspIleLeuIleCysHisLysGluLeuAspValLeuGInLysAlaPheProMetLeuValAlaGluLeuLysGlnTrpGlyLeuGluIleAlI CTCAGAAAAGGTTCAAATTGCAGATACAGGG CTCTTTTTGGGCTCAAAGAT ACTCCCA"AATATAGTTCCTCAGAAAATAGAAATTCGCAAAGACCACTTACAACATTAAATGATTT 14080
I
It
SerGluLysValGln Il1eAl aAspTh rG lyLeuPheLeuGlySer Lys TTAAAAATTCCATCI'GCTGATCTAAAACCCCTCTTC GCCTTACAGGAAGCACAAATTACAACGAATTGAGGA TCATCCAAATGCTTCCCCTGCTAAGATTATTGACTG GCCTTATACTGCTGCACAAGTTCAGACCTTGGCAGC
:CGCGCTCGCGAATAAAAGCAG
rTAGTTACTTCCTTTTCAGGAC
4200 43 20 4440 4560 4680 4800 4920 5040
CTGCCTTGTCTACCCATGCAC ^CAGCAAAAGAATTCCATAAGCGCTTTCATGTGACGGCTGAGACCTTGCGCCGTCGTTTTGCTCTATCCAGGAAAGAGGCTAGGGAAATTGTTACTC
5160
AATGCCAGAMTTGCTG CGAATTCTTACCTACTCCCCACATGGGAATAAMCCCACGTGGCATTAGACCA TTGCAG ATG TCGGCAGATGGATG TTACTCATATTCCCTCTTTTGGAAGG CTTC GlnCysGlnAsnCysCysGl uPheLeuProTh rProH isMetGly Il1eAsnProArgGly Il1eArgPr oLeuGl nMetTr pGlnMetAspValTh r His Il1ePr oSer PheGlyArgLeu
5 28 0
AATATGTTCATGTTTCTGTTGACACCTGTTCTGGCGTAATGTTTCCA CACCCCTAACAGGGGAAAMGG CCTCATATG TGATTCAACATTiG CCTTG AGG CTTGGAG TGCTTGGGGAMAAC
5 40 0
CCAGAATTTAAAAACAGACAATGGACCAGCATATACCTCTCAAAAATTTC'GACAG TTTTGTCGG CAAATGGATG TTACCCATCTAACT~GGCCTACCTTATAAC CCTCAAGG ACAGGGCA
5 52 0
laAlaLysGluPheHisLysArgPheHisValThrAlaGluThrLeuArgArgArgPheAlaLeuSerArgLysGluAlaArgGluIleValThr
lGlnTy rVal H isVal SerValAspTh rCysSerG lyValMetPheAl aTh rProLeuTh rGlyGl uLysAl aSe rTy rVal I1leGl nH isCysLeuGl1uAl aTr pSe rA1laTr pGlyLys ProArgl le** *LysThrAspAsnGlyProAl aTy rTh rSerGlnLysPheArgGl nPheCysAr gGl nMetAspValTh rH isLeuTh rGly LeuPr oTy rAsnPr oG lnGlyGlnGly
T .wTGGAACGTGCCCATCGTACACTCAAATCATATCTG ATTAAACAAAMGGGGC GTATTG AGGATG TTC1'ACCCTCGG TACCAAGAG TTG CTG TATCCATGG CACTTTTTAC CCTTAATT
5 64 0
IleValGluArgAlaHfisArgThrLeuLysSerTyrLeuIleLysGInLysGly TCTTGAACTGTGATGCCCAAGGCCACACTGCGGCTGATCGCCACAGTTTAGAGCCTG ATAGGCCAAAAGAAATGGTAAAATGG AAAGATCTCTTGACTGATCTCTGGAAAGGCCCGGATC
5 76 0
CTATTCTTATAAGATCCAGGGGAGCTGTGT,GTGTTTTTCCACAGGAGGAAGAAAACCCCCTGTGGATCCCTGAAMGACTCACTCGAAGAGCTCCTTCACAACTCCAGAATAAGAAGGATG 5 88 0 GAGGCCAAGTGGGTGATGAAGACTCTCCTACTCGGGATGATGGTTCCTAGTATGATGGGAGAGTTACGGTGGGCCATCATGTCTACCTTCCCCCTACCGATGCCAGTGATGCAGAATGCC 16 000
LCAGGTGTTCCCTAGATTTTTCACM TAATAGAGAGTTGGGGCTTGCCTTTTTGCCTTTAGATCCTCAGATACAGAGTTTTCAGGATGAG'AGAACCTTGTCATTAGTAGGAAGCTTGTGC TTTACTGTGATTTCTTCATTAMAGG ATAATTTTATTCATATGGATCCAGAAAGATCTAATG CTAATCCTTTTG AATTAATGG AAG CTG TACTGATCTTTCTCCTATGATAGTGAGAAGGG ACAGATTGGAAAiGGGGCATTAGGCAACAGTCTGAG TGAGGGACAGAGCTCTGGATCATCTGATTGG TCCTC'CTCCAATG TTAAG TATAC CCCTTTTAAGAGAACTAATTAT'TCAG CTG CC CTTTATTTGGGTTGTGAGTAATGATACCTTAGGTAAGAATCAGAAAGAAATAAATTGTTCTGCTTAG AATTG TTTTTATGCATTATG CTGGGATG CTATTAAGCACCCTTTTGCTTTGG T AATGCGCATGCCTCGATTTATTCCAGCTCCTGTTGAGG CGCCAAG TAATTTGAG TTTGTTTAGGG AAAAG CGAGATTTTTGGGATCTCI'G CTATTATTGTGGGCCTGATAGCCAC CGCCG CGGTTACTG CTTCGGTTACCGCCTCGGCCCTTGCACTTTCTG CCACTG TGCAGACAACTCAAACCA TTAATGAACTATAGG CCACTG TGAC CATGG TCCTAGATAAACAGG CCACTGCCA
ATTCTtCAGATACAAGGAGGCCTGATGTTGGTAAACCAACAGATTAATT'l'GGTTCAGGAACAATTGGATATATTGTGGCAGATGGCCAAATTGGGATGTGAACAAAGTTGCCTGGCCTTTG
TG TCACTTAGTACAATATGAGAATTTTACAAGAGCAG CTAAC CTG TCAAMAG CTCTG TCl'CAACATTl'ATT ACAGAATTGG ACTTTTGAG TTTG AGCAGACACTCCGTGAG TT,GAGAATG GCGATCCTGCAAG7GAATTCGAsCGCATCTGGATCTTTCTCTGATGGAAG ACCTCTCCCCCTGG ATTTCATCTG CATTTl'Cl'T ATTT1'AAGG AATGGGTGGGGGTAGGATCATTTAGCCTA GTTCTCTG CTGTGGAATAGTGT1"GATGCTTTGGATGGTGTGTACGCTCAGAG CCCAAACAAAAGG AG AAAG T,GGTGGTGG CA CAAGCACTTG CTGCCCTTAGAACACGGAGCATCCCCTGA CATTTGGTTAACCATGTTAAAGCAATAGGTCGCTGGCCTCCCAGCTCTTGCACCCCACGAGGCl AGTCTCATTGCACGGGATAGAG1sGGGTAGGCTTCAGCAGCCCAAGGAGTTGCAAGG CTAAGCACTG CACAGGAGAGGTCTG CGGTATAACGACTTTCTC CTGGGAGATAAGTCATCI'TG CATG AAGG TTCT'ATG TCATGATC'TCCTTC CC CCAGAAAAAAGACATCGGACTGGTCA GGACTTCCTCTGGGGATAAGACCCTGGAG AGGAGCTTTGTACATG TTC'CGAATT1S CA CATGGGG A1'TTGAC CTCTATCTCCACTCCAMAGTTGTGGGTC TCCTATTG CTTCAAAAATAA
ATGAATAGGGGGAG ;TGAGGAGCCGCCCTCGCTATlCGCTAT'TGCCGTI'AGAAC'.AT('GCC:TCTGACATCCAiTGTAT
612 0 6 24 0 6 36 0 6 480 6 60 0 6 72 0 6 84 0 6 96 0 7 080 7 20 0 7 32 0 7 440O 7 56 0
ORITACGTijGCGTGATMTT6jG768
TCICAGTAGAGTTAACTGTCggCTCAGTTGGAGTTA§t CGTT-1'AAA('1Y; 1*t:( 'Cll('l'CCcG TG ACC:TC'A TCI'GGGCTG ATGCTC.CAAAC CAC~CCGG CTC1'ACACATCTCACTCGGAG
7 80 0
CTCCTAsGGC123GGGCTCGG TT1'Z-1 TAG CTTGGGG 1'Cl''t C ('1'(1'A)\CAAGAACTG A|'(A TCI''AT'CTC1'CtAAWGATCC