Complete Nucleotide Sequence of Wild-Type ... - Journal of Virology

Vol. 61, No. 1

JOURNAL OF VIROLOGY, Jan. 1987, p. 50-59

0022-538X/87/010050-10$02.00/0 Copyright © 1987, American Society for Microbiology

Complete Nucleotide Sequence of Wild-Type Hepatitis A Virus: Comparison with Different Strains of Hepatitis A Virus and Other Picornaviruses

JEFFREY I. COHEN,* JOHN R. TICEHURST, ROBERT H. PURCELL, ALICIA BUCKLER-WHITE, AND BAHIGE M. BAROUDY†

Laboratory of Infectious Diseases, National Institute of Allergy and Infectious Diseases, Bethesda, Maryland 20892

Received 27 June 1986/Accepted 1 October 1986

The complete nucleotide sequence of wild-type hepatitis A virus (HAV) HM-175 was determined. The sequence was compared with that of a cell culture-adapted HAV strain (R. Najarian, D. Caput, W. Gee, S. J. Potter, A. Renard, J. Merryweather, G. V. Nest, and D. Dina, Proc. Natl. Acad. Sci. USA 82:2627-2631, 1985). Both strains have a genome length of 7,478 nucleotides followed by a poly(A) tail, and both encode a polyprotein of 2,227 amino acids. Sequence comparison showed 624 nucleotide differences (91.7% identity) but only 34 amino acid differences (98.5% identity). All of the dipeptide cleavage sites mapped in this study were conserved between the two strains. The sequences of these two HAV strains were compared with the partial sequences of three other HAV strains. Most amino acid differences were located in the capsid region, especially in VP1. Whereas changes in amino acids were localized to certain portions of the genome, nucleotide differences occurred randomly throughout the genome. The most extensive nucleotide homology between the strains was in the 5' noncoding region (96% identity for cell culture-adapted strains versus wild type; >99% identity among cell culture-adapted strains). HAV proteins are less homologous with those of any other picornavirus than the latter proteins are when compared with each other. When the sequences of wild-type and cell culture-adapted HAV strains are compared, the nucleotide differences in the 5' noncoding region and the amino acid differences in the capsid region suggest areas that may contain markers for cell culture adaptation and for attenuation.
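The identity percentages quoted above follow directly from the reported difference counts. The short sketch below reproduces that arithmetic; it assumes that the denominators are the genome length (7,478 nucleotides) and the polyprotein length (2,227 amino acids), which the text does not state explicitly, and the function name is illustrative only.

```python
def percent_identity(differences: int, length: int) -> float:
    """Return percent identity given the number of differing positions."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * (length - differences) / length


# Counts reported for wild-type HM-175 versus the cell culture-adapted strain.
# Assumption: identity is taken over the full genome and polyprotein lengths.
nt_identity = percent_identity(624, 7478)
aa_identity = percent_identity(34, 2227)

print(f"nucleotide identity: {nt_identity:.1f}%")
print(f"amino acid identity: {aa_identity:.1f}%")
```

Under these assumptions the values round to 91.7% and 98.5%, in agreement with the figures given in the text.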

Hepatitis A virus (HAV) is an RNA virus belonging to the picornavirus family. In 1983, about 21,500 cases of hepatitis A were reported in the United States, accounting for about 38% of all reported hepatitis cases (4). The true incidence of hepatitis A is thought to be much higher because most cases are not reported. Hepatitis A is endemic in developing countries, where virtually entire populations are infected during childhood (13). HAV also causes hepatitis in chimpanzees and certain New World monkeys. The virus has been propagated in several human and primate cell lines, including Alexander hepatoma cells, human diploid fibroblasts, and monkey kidney cells (34). Wild-type virus generally grows poorly in cell culture, but after several passages, the virus adapts to growth in vitro, resulting in higher titers of progeny virus and shorter replicative cycles. It is unknown what changes occur in the HAV genome during adaptation to cell culture. However, after only 10 passages of wild-type HAV HM-175 virus in monkey kidney cells, the virus becomes partially attenuated for chimpanzees (9). Recently, several groups have reported the nucleotide sequences of different strains of HAV (2, 23, 28, 30, 42, 44). These strains were isolated from hepatitis outbreaks of diverse geographic origin. Four of the HAV strains (23, 28, 30, 44) were adapted to growth in cell culture before molecular cloning. Strain HM-175 was isolated from an outbreak in Australia (14) and subsequently passaged three times in marmosets. The HM-175 virus used for cDNA cloning was purified from the livers of marmosets with acute hepatitis and had never been in cell culture (42). We determined the complete nucleotide sequence of wild-type HM-175 HAV, portions of which were previously reported (2, 42). We compared this sequence to those from cell culture-adapted HAV strains as well as to sequences from other picornaviruses.

MATERIALS AND METHODS

Cloning. Molecular cloning of cDNA representing greater than 99% of HAV HM-175 was described previously (42). The 5' end of the genome was cloned by primer extension (20, 35). Virion RNA template was extracted from HAV purified from marmoset liver. A primer was prepared from cDNA clone pHAVLB113 (42) by successive digestion with restriction enzymes NciI and NcoI. A 223-nucleotide fragment (HAV nucleotides 45 to 268) was isolated and annealed to HAV RNA; cDNA was synthesized with reverse transcriptase (160 U/ml), and the RNA was subsequently hydrolyzed (42). A homopolymer tail of dCMP was added to the cDNA by terminal deoxynucleotidyl transferase (42), and the second strand of cDNA was synthesized by using the large fragment of Escherichia coli DNA polymerase I with oligo(dG) as the primer (20, 35). The double-stranded DNA was separated on a 2% low-melting-point agarose gel, and DNA fragments from 200 to 500 nucleotides in length were isolated. The selected DNA was tailed with dCMP by deoxynucleotidyl transferase, annealed to oligo(dG)-tailed pBR322, and used to transform E. coli HB101 (42). Recombinant plasmids were screened for the presence of HinfI (base 28) and NcoI (base 45) restriction sites. Plasmids containing both sites were characterized with additional

* Corresponding author. † Present address: Division of Molecular Virology and Immunology, Georgetown University Medical Center, Rockville, MD 20852.


NUCLEOTIDE SEQUENCE OF WILD-TYPE HEPATITIS A VIRUS

VOL. 61, 1987

restriction enzymes, and nucleotide sequences were determined for several clones thought to contain the 5' terminus. One clone (of more than 300 evaluated), pHAVL5375, extended to the 5' end of the genome, and its 5' terminal sequence is similar to that of another strain of HAV (28).

Sequence determination. A portion of the nucleotide sequence (4,886 nucleotides) has been previously published (2). Additional sequence was determined both from labeled DNA fragments by the procedure of Maxam and Gilbert (26) and directly from plasmid DNA by the method of Zagursky et al. using reverse transcriptase with oligonucleotide primers and dideoxynucleotide triphosphates (48). The entire genomic sequence of HAV HM-175 was determined by the method of Zagursky et al., and 95% was also determined by the procedure of Maxam and Gilbert. A total of 95% of the sequence was obtained on both strands, and the remainder was obtained from multiple determinations on one strand.

Computer analysis. Sequences were analyzed by using a VAX 11/750 computer. Comparisons of HAV nucleotide sequences were made with the SEQH program (11). Secondary structures and free energies for the 5' terminus (bases 1 to 750) of HAV RNAs were predicted with the folding programs of Zuker and Stiegler (49). Putative peptide cleavage sites for HAV were identified by the alignment of amino acid sequences from HAV with other picornaviruses. The SEQHP program (11) was used to align sequences with standard parameters (deletion penalty of 8) or by reducing the deletion penalty to 6. Graphic matrix analysis (25) was performed with a window size of 25 and a minimum score of 15. Amino acid sequences surrounding putative dipeptide cleavage sites were compared by using the RELATE program (6), and regions with the highest scores from the mutation data matrix (37) were identified. The two picornaviruses with the highest homology to HAV near each cleavage site were determined. Sequences of 100 amino acids from each of the three viruses were aligned at each site by using a program for comparing the amino acid sequences (27) with a gap penalty of 8.

RESULTS

Sequence of HAV HM-175. The complete nucleotide sequence of HAV HM-175 is shown in Fig. 1. The genome is 7,478 nucleotides long followed by a poly(A) tail and encodes a polyprotein of 2,227 amino acids. A 5' noncoding region (734 bases) precedes a single long open reading frame (6,681 bases), which is followed by a 3' noncoding region (63 bases). Either methionine codon (bases 735 to 737, 741 to 743) may initiate translation; both are surrounded by several of the consensus nucleotides preferred by eucaryotic ribosomes for initiation of protein synthesis (19). The longest open reading frame initiated by an AUG codon in the 5' noncoding region is only 60 nucleotides long (bases 673 to 732). The 5' noncoding region contains two pyrimidine-rich tracts. The first tract, near the 5' terminus (bases 99 to 138), has a 95% pyrimidine content. The second tract (bases 712 to 720) lies immediately before the initiation codon. On the basis of limited amino acid homology with other picornaviruses, we identified putative posttranslational cleavage sites for the polyprotein encoded by HAV (Fig. 1). The exact location of the VP3/VP1 site and approximate location of the VP2/VP3 site have been determined by direct amino acid sequence data from VP1 and VP3 (23). Locations of HAV VP4/VP2, VP3/VP1, 3A/3B, and 3B/3C cleavage


sites have been mapped previously by using amino acid homology with other picornaviruses (2, 45). Amino acid alignments for HAV with other picornaviruses that were used to locate the remaining putative cleavage sites (VP1/2A, 2A/2B, 2B/2C, 2C/3A, and 3C/3D) are shown in Fig. 2A through E. Assignment of the VP1/2A cleavage site is also supported by data from antibodies to synthetic peptides representing sequences surrounding the site. Antibody to a peptide located proximal to the proposed site reacted with VP1; antibody to a peptide located distal to the site failed to react with VP1 by Western blot analysis (C. Wheeler, personal communication). Although a few of the peptide cleavage sites that we predicted differ markedly from those of Najarian et al. (28), other groups have assigned many cleavage sites which are similar to ours (7; A. C. Palmenberg, in D. J. Rowlands, B. W. J. Mahy, and M. Mayo, ed., Molecular Biology of Positive-Strand RNA Viruses, in press). The molecular weights of VP1, VP2, and VP3 predicted from the sequence data (33,200, 24,800, and 27,800) are similar to those determined by Western blot analysis with antibodies to synthetic peptides (33,000, 27,000, and 29,000 [47]). However, the predicted molecular weight of VP4 (2,500) differs from that determined by biophysical methods (14,000 [5]). The nucleotide composition of HAV HM-175 has a very low G+C content (38%), less than that of any of the picornaviruses that have been sequenced (Palmenberg, in press). The 5' noncoding region of HAV has a much higher G+C content (47%) than does the rest of the genome. The low G+C content of HAV RNA is also reflected in the codon selection pattern used for translation (data not shown). The C-G dinucleotide content of HAV (0.52%) is lower than that expected on a random basis (3.5%) considering its base composition. A lower than expected C-G content is also seen in polioviruses (43) and in eucaryotic mRNA (12). Comparison with other strains of HAV. 
(i) Overall genome. The complete nucleotide sequence of one other HAV strain has been previously reported (designated here as strain LA; isolated in Los Angeles, Calif. [28]). Unlike HAV HM-175, strain LA was isolated from an outbreak in the Western hemisphere and was adapted to growth in cell culture before cDNA cloning. A comparison of the nucleotide and predicted amino acid sequences of the two strains is shown in Table 1. Both strains have a genome length of 7,478 nucleotides; compared with strain LA, HM-175 has one additional nucleotide in the 5' noncoding region but one less nucleotide in the 3' noncoding region. Sequence comparison shows 624 nucleotide differences (91.7% identity) but only 34 amino acid differences (98.5% identity). Of the 589 nucleotide differences in the coding region, 511 (87%) are in the third position of codons, 61 (10%) are in the first position, 9 (2%) are in the second position, and 8 are due to insertions or deletions of nucleotides. The small number of changes in the first and second codon positions reflects the low number of amino acid differences between the two strains. When the two strains were compared for the presence of 100 restriction endonuclease sites present in HM-175 cDNA, only 56 of the 100 sites were present in strain LA. (ii) 5' Noncoding region. The complete 5' noncoding region sequence was determined for strains HM-175 and LA. The nucleotide sequence of this region is the most conserved portion of the two genomes (96% identity). Most of the changes (79%) are nucleotide transitions. A possible secondary structure for the 5' terminus of the HAV RNA for both strains is shown in Fig. 3. The nucleotide differences be-

COHEN ET AL.


J. VIROL.

FIG. 1. Complete nucleotide and predicted amino acid sequence of cDNA from wild-type HAV HM-175. Nucleotides at positions 6208, 6282, 6299, and 6301 have been corrected from the previously published sequence of this region (2). The nucleotides were sequenced from multiple determinations on complementary strands. The nucleotide at position 871 is C in pHAVLB39 and T in pHAVLB113 (changes to Ile codon); the nucleotide at position 2196 is G in pHAVLB39 and A in pHAVLB1307 (changes to Ile codon); the nucleotide at position 6216 is T in pHAVLB93 and C in pHAVLB24 (no change in amino acid). Asterisks indicate stop codons.

[Sequence listing not reproduced. This part of the figure carries scale marks 150 to 2150 and covers the 5' noncoding region and the start of the open reading frame. Protein boundaries marked on it: 1A (VP4), nt 735-803 / aa 1-23 / 23 aa; 1B (VP2), nt 804-1469 / aa 24-245 / 222 aa; 1C (VP3), nt 1470-2207 / aa 246-491 / 246 aa.]
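The protein coordinates annotated across the parts of Fig. 1 can be cross-checked against the genome layout given in the text (734-base 5' noncoding region, 6,681-base open reading frame, 63-base 3' noncoding region, 7,478 nucleotides in total). The following is a minimal sketch of such a check; the coordinates are those printed on the figure, whereas the function and variable names are illustrative and not part of the original analysis.

```python
# Protein coordinates as annotated in Fig. 1:
# (name, first nucleotide, last nucleotide, number of amino acids).
FIG1_PROTEINS = [
    ("1A (VP4)", 735, 803, 23),
    ("1B (VP2)", 804, 1469, 222),
    ("1C (VP3)", 1470, 2207, 246),
    ("1D (VP1)", 2208, 3107, 300),
    ("2A", 3108, 3674, 189),
    ("2B", 3675, 3995, 107),
    ("2C", 3996, 5000, 335),
    ("3A", 5001, 5222, 74),
    ("3B (VPg)", 5223, 5291, 23),
    ("3C", 5292, 5948, 219),
    ("3D", 5949, 7415, 489),
]


def check_layout(proteins, genome_length=7478, ncr5=734, ncr3=63):
    """Verify that the proteins tile the open reading frame without gaps."""
    total_aa = 0
    expected_start = ncr5 + 1  # first base after the 5' noncoding region
    for name, start, end, n_aa in proteins:
        if start != expected_start:
            raise ValueError(f"{name}: gap or overlap at nt {start}")
        if end - start + 1 != 3 * n_aa:
            raise ValueError(f"{name}: length is not 3 x {n_aa} bases")
        total_aa += n_aa
        expected_start = end + 1
    orf_length = expected_start - (ncr5 + 1)
    if ncr5 + orf_length + ncr3 != genome_length:
        raise ValueError("regions do not add up to the genome length")
    return total_aa, orf_length


total_aa, orf_length = check_layout(FIG1_PROTEINS)
print(total_aa, orf_length)
```

With the coordinates above, the eleven proteins tile bases 735 to 7415 exactly, giving 2,227 amino acids and a 6,681-base open reading frame, consistent with the values stated in the text.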

tween the strains reside in the loops, resulting in a similar free energy for both structures. Compared with other picornaviruses, the 5' terminal hairpin of HAV is much longer, resulting in a higher free energy (V. Rivera, personal

communication). A large portion (498 nucleotides) of the 5' noncoding region sequence has been determined for HAV CR-326 (23). When sequences representing the two cell culture-adapted strains are compared, there is over 99% identity in the 5' noncoding region sequences. Furthermore, both strains have identical sequences in the last 280 nucleotides of this region. A small portion (188 nucleotides) of the 5' noncoding region sequence has also been determined for a strain of HAV

reported to cause cytopathic changes in cell culture (44). The sequence of the latter strain resembles the other three strains for the first 159 nucleotides; however, there is no detectable homology for the remaining 29 nucleotides. (iii) Coding region. The nucleotide sequence of the entire capsid region of three strains (HM-175, LA, and CR-326) and 95% of the capsid region of a fourth strain (HAS-15 [30]) has been determined. The amino acid differences for the capsid region of the four strains are shown in Fig. 4. Strain HM-175 most closely resembles the consensus sequence for the capsid region (five different amino acids; 99.4% identity). The two most closely related strains in the capsid region are HM-175 and CR-326 (98.6% identity). The two most diver-


FIG. 1.-Continued.

[Sequence listing not reproduced. This part of the figure carries scale marks 2200 to 4200. Protein boundaries marked on it: 1D (VP1), nt 2208-3107 / aa 492-791 / 300 aa; 2A, nt 3108-3674 / aa 792-980 / 189 aa; 2B, nt 3675-3995 / aa 981-1087 / 107 aa; 2C, nt 3996-5000 / aa 1088-1422 / 335 aa.]

gent strains, LA and HAS-15 (97.3% identity), have prominent differences from the consensus sequence. HAS-15 has a deletion of six amino acids in VP1 (positions 26 to 31); this area was confirmed by nucleotide sequence from two different plasmids (30). When strain LA is compared with the sequences of the three other strains, it apparently has three clusters of frameshift mutations (VP1 amino acid positions 49 to 56, 168, and 229 to 232) resulting in the addition of an amino acid (between positions 54 and 55) and subsequent deletion of an amino acid (position 168). VP1 has the most amino acid diversity of the capsid proteins. In the capsid region of the four HAV strains, there are 10 amino acid differences from the consensus sequence that result in amino acids of different charges; 8 of the 10

amino acids are located in VP1. All of the amino acid insertions and deletions occur in VP1. Proteins 2A, 3B, and 3C have identical amino acid sequences in HAV HM-175 and LA. When the complete amino acid sequences of strains HM-175 and LA are compared, there are six amino acid differences resulting in amino acids of different charge in the P1 region; however, there are only two with different charges in the P2 region and two with different charges in the P3 region. Although the P1 region is the site of several amino acid insertions or deletions among different strains, there are no apparent amino acid insertions or deletions in the P2 or P3 region. Comparison of the amino acid sequences of strains HM-175 and LA shows that all of the dipeptide cleavage


FIG. 1.-Continued.

[Sequence listing not reproduced. This part of the figure carries scale marks 4250 to 6200. Protein boundaries marked on it: 3A, nt 5001-5222 / aa 1423-1496 / 74 aa; 3B (VPg), nt 5223-5291 / aa 1497-1519 / 23 aa; 3C, nt 5292-5948 / aa 1520-1738 / 219 aa; 3D, nt 5949-7415 / aa 1739-2227 / 489 aa.]

sites proposed above are conserved. The areas surrounding these cleavage sites are also identical, except for one amino acid at the 1D/2A junction (Fig. 4, VP1 amino acid 297).

(iv) 3' Noncoding region. The complete sequence of the 3' noncoding region was determined for strains HM-175 and LA. The nucleotide sequences of the two strains are most divergent in the 3' noncoding region (88.9% identity). Both strains have a stop codon (UGA) followed six nucleotides later by a second stop codon (UAA).

Comparison with other picornaviruses. HAV shows less homology with other picornaviruses than the non-HAV picornaviruses show with each other. HAV is most homologous with encephalomyocarditis virus (EMCV); proteins 2C and 3C have 28 and 25% amino acid identity, respectively, between the two picornaviruses.
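Amino acid identity between two proteins, as quoted above for 2C and 3C, is computed over an alignment. The sketch below shows one common way of doing this for a pair of pre-aligned sequences. It is an assumption-laden illustration: the text does not state how gapped positions were counted, so here a gap opposite a residue is treated as a mismatch, and the two short sequences are invented for the example rather than taken from HAV or EMCV.

```python
def aligned_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over two aligned sequences of equal length.

    Columns in which both sequences carry a gap are skipped; a gap opposite
    a residue counts as a compared, non-identical position (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
toy_a = "MKV-LTAG"
toy_b = "MRVALT-G"
print(f"{aligned_identity(toy_a, toy_b):.1f}% identity")
```

Other conventions (for example, excluding all gapped columns from the denominator) give somewhat different percentages, so values from different programs are comparable only when the same convention is used.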

Part of the amino acid sequence from the carboxy portion of protein 2C for HAV and other picornaviruses is shown in Fig. 2G. If analogous with poliovirus protein 2C (41), HAV protein 2C may be involved in transcription. The carboxy portion of 2C is highly conserved among different picornaviruses (1). The amino acid at position 198 (Fig. 2G, arrow) has been shown to correlate with guanidine sensitivity for several picornaviruses (33). Poliovirus and rhinovirus type 14 (asparagine at position 198) are inhibited by guanidine; however, EMCV, foot-and-mouth disease virus, and HAV (glycine at position 198) are guanidine resistant.

Protein 3C is a protease in poliovirus (15) and in EMCV (32). Two amino acids in 3C, cysteine and histidine (Fig. 2E, asterisks), are thought to be reactive residues of the functional site for the protease (1). These two amino acids are conserved among several picornaviruses, including HAV.

Protein 3D of HAV has 29% amino acid identity with 3D of poliovirus type 1 (Mahoney). Protein 3D has been identified as an RNA-dependent RNA polymerase in poliovirus (24). HAV, like other picornaviruses, has a 14-amino-acid region (Fig. 2F) consisting of two aspartate residues surrounded by hydrophobic amino acids. This region is thought to be an active site or recognition site of RNA polymerases (17).

Of the 10 proposed dipeptide cleavage sites used by HAV, 4 are shared with EMCV (VP4/VP2, 2A/2B, 2B/2C, and 2C/3A). Other picornaviruses have two or fewer dipeptide cleavage sites shared with HAV.

[FIG. 1. Continued. Nucleotide sequence of wild-type HAV HM-175 from approximately nucleotide 6250 to nucleotide 7478, followed by the poly(A) tail, with the predicted amino acid sequence for each line of nucleotides; the 3D region is labeled.]
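Fig. 1 pairs each line of nucleotide sequence with its predicted amino acid translation. As a minimal sketch of that pairing (an illustration, not the authors' procedure), the Python snippet below translates one 63-nucleotide fragment from this part of Fig. 1 with the standard genetic code. The function name `translate`, the table construction, and the assumption that the fragment starts in frame at its first nucleotide are all illustrative choices.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str) -> str:
    """Translate an in-frame cDNA string into one-letter amino acid codes.

    Trailing nucleotides that do not complete a codon are ignored.
    Any character other than A, C, G, or T raises KeyError.
    """
    cdna = cdna.upper()
    usable = len(cdna) - len(cdna) % 3
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, usable, 3))


# Fragment from Fig. 1 (near nucleotide 6250); reading frame assumed
# to begin at the first nucleotide shown.
fragment = "GCCCCAGGAATTGATGCTATCAACATGGATTCATCTCCTGGATTTCCTTATGTCCAGGAGAAG"
protein = translate(fragment)
print(protein)
```

Under these assumptions the fragment translates to APGIDAINMDSSPGFPYVQEK, which agrees with the residues shown for it in the figure.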

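The identity percentages reported for the HM-175 and LA comparison follow from the difference counts and the sequence lengths: 624 differences over 7,478 nucleotides, and 34 differences over 2,227 amino acids. The sketch below reproduces that arithmetic. It assumes identity is computed as (length - differences) / length over the full genome and the full polyprotein with no alignment gaps; the paper does not spell out the denominator, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(length: int, differences: int) -> float:
    """Percent identity for two aligned sequences of equal length.

    Assumes an ungapped alignment, so every position is either
    identical or different.
    """
    if length <= 0 or not 0 <= differences <= length:
        raise ValueError("require length > 0 and 0 <= differences <= length")
    return 100.0 * (length - differences) / length


# Counts quoted in the paper for HAV HM-175 versus LA.
nucleotide_identity = percent_identity(7478, 624)
amino_acid_identity = percent_identity(2227, 34)

print(f"nucleotide identity: {nucleotide_identity:.1f}%")
print(f"amino acid identity: {amino_acid_identity:.1f}%")
```

With these inputs the two values round to 91.7% and 98.5%, matching the reported figures.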
DISCUSSION

Sequence variation among different HAV strains has been examined previously by using ribonuclease T1 oligonucleotide mapping. With this procedure, the estimated variation in nucleotide sequences ranges from 0.8 to 10% among different strains (46). In contrast, comparison of cDNAs from different HAV strains shows only 50 to 60% conservation of restriction endonuclease sites. When the complete nucleotide sequences of two strains (HM-175 and LA) are compared, there are 624 nucleotide differences (91.7% identity) and 34 amino acid differences (98.5% identity). Thus, restriction site mapping overestimated nucleotide variation, whereas oligonucleotide mapping more closely reflected actual sequence data.

Nucleotide sequences of poliovirus types 1 and 3 have been determined for both vaccine and parent wild-type strains (18, 29, 35, 40, 43). For poliovirus type 2, sequences have been reported for a cell culture-adapted (vaccine) strain and a mouse-adapted (neurovirulent) strain (21, 43). The numbers of nucleotide differences between wild-type and vaccine strains of types 1, 2, and 3 are 57, 1,308, and 10, respectively. The numbers of amino acid differences between the wild-type and vaccine strains of poliovirus types 1, 2, and 3 are 21, 83, and 3, respectively. Thus, the number of amino acid differences between the two HAV strains studied is similar to the number of amino acid differences between wild-type and attenuated variants of the same poliovirus strains.

The majority of amino acid differences between HAV HM-175 and LA occur in the capsid region. Within this region, the highest variation is seen in VP1. When the capsid regions from all four HAV strains are compared, about half (15 of 31) of the amino acid differences are located in the N terminus of VP1. A similar pattern of variation is seen for poliovirus types 1 and 2 (21, 29). When the wild-type and vaccine strains of these viruses are compared, most of the amino acid differences are located in the capsid region, and about half of the amino acid differences in the capsid region are clustered in the N terminus of VP1 (5 of 12 amino acids for poliovirus type 1 and 15 of 22 for type 2). Presumably, different environmental influences (including adaptation to cell culture) may select for amino acid changes in VP1, the immunodominant protein of the virion.

Two groups have attempted to locate antigenic sites in the capsid region of HAV. Emini et al. (8) proposed the location of three HAV antigenic sites by comparing surface probability profiles of HAV with those of poliovirus. A synthetic peptide corresponding to one of three sites induces anti-HAV neutralizing antibody. All four of the HAV strains described above have identical amino acid sequences in the three sites. Palmenberg (personal communication) has

[Figure residue, apparently from FIG. 2: amino acid sequence comparisons between HAV and other picornaviruses (EMCV, RV2, RV14) around the VP1/2A, 2A/2B, 2B/2C, and 2C/3A junctions, with panel labels A to D.]
