JOURNAL OF VIROLOGY, Mar. 1984, p. 782-792 0022-538X/84/030782-11$02.00/0 Copyright © 1984, American Society for Microbiology

Vol. 49, No. 3

Nucleotide Sequence of a Cloned Duck Hepatitis B Virus Genome: Comparison with Woodchuck and Human Hepatitis B Virus Sequences

ELISABETH MANDART, ALAN KAY, AND FRANCIS GALIBERT*

Laboratoire d'Hematologie Experimentale, Centre Hayem, Hopital Saint-Louis, 75475 Paris Cedex 10, France

Received 16 June 1983/Accepted 6 October 1983

The nucleotide sequence of an EcoRI duck hepatitis B virus (DHBV) clone was elucidated by using the Maxam and Gilbert method. This sequence, which is 3,021 nucleotides long, was compared with the two previously analyzed hepatitis B-like viruses (human and woodchuck). From this comparison, it was shown that DHBV is derived from an ancestor common to the two others but has a slightly different genomic organization. There was no intergenic region between genes 5 and 8, which were fused into a single open reading frame in DHBV. Genes for the surface and core proteins were assigned to open reading frames 7 and 5/8. Amino acid comparisons showed some structural relationship between the gene 6 product and avian reverse transcriptase, suggesting either evolution from a common ancestor or convergence to some particular structure to fulfill a specific function. This should be correlated with the synthesis of an RNA intermediate during DNA replication. This is also taken as an argument in favor of the hypothesis that gene 6 codes for the DNA polymerase that is found within the virion. DNA sequence comparison also showed that the two mammalian hepatitis B viruses are more homologous to each other than they are to DHBV, indicating that DHBV started to evolve on its own earlier than the two other viruses, as did birds compared with mammals. From this it is proposed that the viruses evolved in a fashion parallel to the species they infect.

Duck hepatitis B virus (DHBV), which was recently isolated (18), is the fourth member of a new viral family called Hepadnaviridae, whose prototype is human hepatitis B virus (HBV) (24). The four members of this growing family have several characteristics in common, such as ultrastructure, antigenic makeup, DNA size, and structure. Similarities in the pathological field have been described as well (11, 17, 33, 34). In the past few years, by molecular cloning (4, 5) and nucleotide sequence analysis (3, 8-10, 35, 36), our knowledge of the biology of these viruses has increased. Two genes, one coding for the coat protein and the other coding for the core protein, have been identified on the genomes of HBV and woodchuck hepatitis B virus (WHV). Two other possible coding regions have been mapped, but their function and their products have not, as yet, been determined.

In spite of our growing knowledge about the structure of various viral components, the biological study of these viruses is still complicated by the absence of cell cultures susceptible to infection. Some of these complications have been overcome by the discovery of DHBV (18), which is not only able to infect ducks, an animal easily kept in colony, but can also infect and multiply in embryonic eggs. These findings caused us to undertake the elucidation of the nucleotide sequence of the DHBV genome. During the course of this study, the usefulness of the duck-DHBV model was proven by the finding of Summers and Mason (31), who, using an in vitro system, found evidence implicating an RNA intermediate in DHBV DNA replication.

In this paper, we report the complete nucleotide sequence of the genome of DHBV and compare its primary structure with that of the HBV and WHV genomes.

* Corresponding author.

MATERIALS AND METHODS

Enzymes and chemicals. Restriction endonucleases came from New England Biolabs and were used as recommended by the manufacturer. DNA polymerase I came from Boehringer Mannheim Biochemicals, and bacterial alkaline phosphatase and polynucleotide kinase were from P. L. Biochemicals. Chemicals used for nucleotide sequence analysis were as described previously (14). [γ-32P]ATP (specific activity, >2,500 Ci/mmol) and α-32P-labeled nucleotide triphosphate (specific activity, >3,000 Ci/mmol) were from New England Nuclear Corp.

Preparation of Eco and Xho DHBV DNAs. λ-DHBV recombinants were constructed and given to us by W. S. Mason et al. The cloned DNAs were referred to as Eco and Xho DHBV DNAs. Propagation and purification of the recombinants, as well as preparation of the DNAs, were performed as previously described (3, 10, 14).

Containment. Containment conditions were as recommended by the French National Control Committee. The culture of recombinant bacteriophage was done under L3B1 conditions.

DNA nucleotide sequence. Sequences were determined by the Maxam and Gilbert method (19). Usually, ca. 10 pmol of Eco DHBV DNA (20 μg) was fully digested each time with a given restriction enzyme. Fragments were dephosphorylated and labeled with [γ-32P]ATP and polynucleotide kinase as described previously (14). To separate the two labeled ends, fragments were denatured by heating to 92°C in the presence of 30% dimethyl sulfoxide and fractionated by electrophoresis in acrylamide gel (19). Fragments larger than 600 base pairs were hydrolyzed with another restriction enzyme. Under some circumstances, fragments with a recessed 3' end were labeled with an α-32P-labeled nucleotide triphosphate of choice and DNA polymerase I as described by Hartley and Donelson (13).

VOL. 49. 1984

DHBV GENOME NUCLEOTIDE SEQUENCE


FIG. 1. Diagram of analyzed DNA fragments. Vertical bars correspond to the positions of the labeled ends of restriction fragments used. Length of the arrows is relative to the number of analyzed nucleotides. Row 1, HaeIII; row 2, HinfI; row 3, AluI; row 4, Sau3A; row 5, RsaI; row 6, MspI + BglII. Sau3A fragments are labeled at their 3' ends; the others are labeled at their 5' ends.

RESULTS AND DISCUSSION

The complete nucleotide sequence of the EcoRI DHBV DNA cloned in Escherichia coli was determined by the method of Maxam and Gilbert (19), using five different chemical reactions giving specific bands for G, AG, CT, C, and AC. The sequence was derived from a large number of overlapping fragments so that both strands were entirely and independently analyzed, and all starting restriction sites were analyzed within an overlapping fragment (Fig. 1). For reasons explained below, the EcoRI site used for cloning was also overlapped by using an XhoI DHBV clone, eliminating the risk of the loss of a small DNA fragment located between two putative EcoRI sites within the DHBV genome. The DHBV sequence shown in Fig. 2 is 3,021 nucleotides long, compared with 3,182 for the HBV DNA and 3,308 for the WHV DNA. Restriction enzyme data obtained from virion DNA enabled us to define which strand was the L strand and which was the S strand in both HBV and WHV (9, 10). We do not have such data for DHBV, but the position of the open reading frames as well as the nucleotide sequence homologies found within these three genomes indicate that the nucleotide sequence presented in Fig. 2 is complementary to the L strand, also called the minus strand by Summers and Mason (31).

Location of the open reading frames. The number and distribution of stop codons within the L strand is such that, as previously noticed with HBV and WHV, it is difficult to locate a gene which could be transcribed from the S strand (Fig. 3). Only one region spanning from nucleotide 1397 to nucleotide 835 could have a substantial coding capacity. However, (i) the first in-phase ATG in this region (open reading frame 1) is toward the end, at position 904, very close to the stop codon; (ii) there is no obvious acceptor site at the beginning of open reading frame 1 to which an exon containing an ATG could be spliced; and (iii) neither HBV nor WHV has corresponding open reading frames which could give a protein homologous to the putative protein made from open reading frame 1 of DHBV. These observations suggest the absence of a coding function for the S strand, as previously observed for the HBV and WHV genomes (9, 10). On the contrary, the number and distribution of stop codons in the S strand leave open several regions, indicating that mRNA could arise by transcription of the L strand. However, only three open reading frames are displayed along the DHBV genome instead of the four previously noticed for the HBV and WHV genomes. These three regions span from nucleotides 14 to 2527, 684 to 1784, and 2515 to 411. Their relative positions (Fig. 4) as well as information derived from nucleotide and amino acid comparisons of HBV and WHV sequences indicate that (i) the largest open reading frame (nucleotides 14 to 2527) corresponds to gene 6 of HBV; (ii) the open reading frame which is overlapped by region 6 and goes from nucleotides 684 to 1784 corresponds to gene 7, also called S (for surface protein); and (iii) regions 5 and 8 found in HBV and WHV genomes are fused into a single region in DHBV, as suggested by the size and position of the third open reading frame (nucleotides 2515 to 411). Therefore, we call it region 5/8.

Nucleotide sequence comparisons. By using a computer program developed by Staden (29), the nucleotide sequence of the DHBV genome was compared with those of the HBV and WHV genomes. Although nucleotide sequence comparisons show a large degree of homology between HBV and WHV, 62 to 70% along the genomes except in two small regions (9), a much smaller level of homology was observed with the DHBV sequence. Figure 5 shows several graphs which demonstrate this result. For the largest part of the DNA sequence, the degree of homology was below 40%. Four regions located between nucleotides 100 and 200, 300 and 500, 600 and 800, and 1300 and 2100 were more conserved and had about 50% homology with the homologous regions found in the HBV and WHV genomes. The region between nucleotides 1300 and 1400 had the highest homology (70%). This highly conserved sequence is part of regions 6 and 7, which could be read in two different frames. This remarkably high degree of homology is probably due to the product of gene 6, which is slightly more conserved than the product of gene 7, as explained below.

As for HBV, viruses found in nature vary from one to another (2, 28, 32). The two DHBV clones, cloned through the EcoRI or XhoI site, were exactly the same size as demonstrated by their comparative electrophoretic mobility but exhibited a slightly different restriction pattern. In agreement with this observation, the nucleotide sequence we determined around the EcoRI site of the Xho DHBV clone diverged from that of the Eco DHBV clone to an extent of ca. 3%. None of these nucleotide changes alters the coding capacity of this sequence.
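The windowed comparison used for the nucleotide homology plots can be sketched as follows. This is a minimal illustration and not the Staden program cited in the text: the function name `window_matches` and its parameters are hypothetical, and the defaults (a 40-nucleotide window scored at 50% identity or more) follow the criterion given in the legend of the nucleotide comparison figure. Each hit reports the 1-based start of the window in each sequence and the fraction of identical positions.

```python
def window_matches(seq_a, seq_b, window=40, min_identity=0.5):
    """Return (start_a, start_b, identity) for every pair of windows that
    reaches the identity threshold. Start positions are 1-based.

    Brute-force sketch: every window of seq_a is compared with every
    window of seq_b, without gaps.
    """
    seq_a = seq_a.upper()
    seq_b = seq_b.upper()
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical positions inside the two windows.
            same = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            identity = same / window
            if identity >= min_identity:
                hits.append((i + 1, j + 1, identity))
    return hits


if __name__ == "__main__":
    # Toy sequences, not taken from the viral genomes.
    for hit in window_matches("ACGTACGT", "ACGTTCGT", window=4, min_identity=1.0):
        print(hit)
```

Plotting each hit as a point at its two coordinates gives a dot-plot of the kind described for the genome comparisons. The brute-force loops are slow for sequences of about 3,000 nucleotides; they are kept here only for clarity.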

Amino acid sequence comparisons. (i) Region 6. In the three viruses, this region covered ca. 80% of the genome. In DHBV it went from nucleotides 14 to 2528. From the first ATG (residue 20) up to the TAA stop codon (residue 2528), a protein of 836 amino acids can be predicted, as compared with 879 and 838 for the corresponding WHV and HBV proteins (9, 10). These proteins have not been identified so far, but the size of region 6 leaves little doubt that it does have a coding function. A comparison of the predicted amino acid sequences of these three proteins clearly shows that they are related to each other (Fig. 6 and 7). Although the DNA sequences allow the gene 6 products of DHBV and HBV to have nearly identical lengths (836 amino acids, compared with 838), it is difficult to align these two proteins starting from the first methionine in each region. If regions of amino acid homology are paired, then the best coincidence is obtained by assuming that the DHBV region 6 protein starts at the second in-phase ATG. This would reduce the DHBV protein 6 to a size of 786 amino acids. Supporting that hypothesis, the first in-phase ATG of region 6 is preceded at position -3 by a pyrimidine, whereas the second in-phase ATG is preceded at position -3 by a purine (16). Because of its size, which is comparable to that of various polymerases, it has been proposed that the DNA polymerase found within the virion would be coded by gene 6 (10). An experiment performed by Summers and Mason (31) has


MANDART, KAY, AND GALIBERT

J. VIROL.

revealed the existence of an RNA transcript complementary to the L strand during the process of DNA replication. We were therefore interested in comparing the amino acid sequence predicted from the large open regions (region 6) of the three known hepatitis viruses with several viral encoded DNA polymerases or reverse transcriptases. The amino acid comparisons were made with a computer program established by B. Caudron (personal communication) which is able to detect small stretches of 20 amino acids with homology above 25%. With this program, the amino acid sequences predicted from genes 6 were scored against the amino acid sequences of the DNA polymerase encoded in the adenovirus 2 genome (1, 12) and the amino acid sequences of the avian and murine reverse transcriptases (25, 27). Several stretches of amino acids with homologies over 25% were found in all comparisons. However, homologies between adenovirus 2 DNA polymerase and the hepatitis B virus region 6 products generally involved only one of the viruses at a time, and the positions of the various homologies indicated no particular pattern. On the contrary, when the reverse transcriptases were compared with the three hepatitis B virus region 6 products, at least one set of amino acid sequence homologies appeared to give a coherent pattern. These sequences are indicated in Table 1. When DHBV and HBV gene 6 proteins and Rous sarcoma virus reverse transcriptase (RSVRT) are aligned by pairing this set of sequence homologies, other smaller stretches of homology become apparent (Fig. 7). (The same is also true when WHV region 6 protein is included.) Also, in a significant number of cases in which there was an amino acid difference

Start of region 6 (frame 2)

CATGCTCATTTGAAAGCTTATGCAAAAATTAACGAGGAATCACTGGATAGGGCTAGGAGATTGCTTTGGT

71

GGCATTACAACTGTTTACTGTGGGGAGAAGCTCAAGTTACTAACTATATTTCTCGTTTGCGTACTTGGTT

141

GTCAACTCCTGAGAAATATAGAGGTAGAGATGCCCCGACCATTGAAGCAATCACTAGACCAATCCAGGTG

211

GCTCAGGGAGGCAGAAAAACAACTACGGGTACTAGAAAACCTCGTGGACTCGAACCTAGAAGAAGAAAAG

281

TTAAAACCACAGTTGTCTATGGGAGAAGACGTTCAAAGTCCCGGGAAAGGAGAGCCCCTACACCCCAACG

351

End of the fused 5 + 8 region

TGCGGGCTCCCCTCTCCCACGTAGTTCGAGCAGCCACCATAGATCTCCCTCGCCTAGGAAATAAATTACC

421

TGCTAGGCATCACTTAGGTAAATTGTCAGGACTATATCAAATGAAGGGCTGTACTTTTAACCCAGAATGG

491

AAAGTACCAGATATTTCGGATACTCATTTTAATTTAGATGTAGTTAATGAGTGCCCTTCCCGAAATTGGA

561

AATATTTGACTCCAGCCAAATTCTGGCCCAAGAGCATTTCCTACTTTCCTGTCCAGGTAGGGGTTAAACC

631

AAAGTATCCTGACAATGTGATGCAACATGAATCAATAGTAGGTAAATATTTAACCAGGCTCTATGAAGCA

701

GGAATCCTTTATAAGCGGATATCTAAACATTTGGTCACATTTAAAGGTCAGCCTTATAATTGGGAACAGC

771

AACACCTTGTCAATCAACATCACATTTATGATGGGGCAACATCCAGCAAAATCAATGGACGTCAGACGGA

841

TAGAAGGAGGAGAAATACTGTTAAACCAACTTGCCGGAAGGATGATCCCAAAAGGGACTTTGACATGGTC

911

AGGCAAGTTTCCAACACTAGATCACGTGTTAGACCATGTGCAAACAATGGAGGAGATAAACACCCTCCAG

981

AATCAGGGAGCTTGGCCTGCTGGGGCGGGAAGGAGAGTAGGATTATCAAATCCGACTCCTCAAGAGATTC

1051

CTCAGCCCCAGTGGACTCCCGAGGAAGACCAAAAAGCACGCGAAGCTTTTCGCCGTTATCAAGAAGAAAG

1121

ACCACCGGAAACCACCACCATTCCTCCGTCTTCCCCTCCTCAGTGGAAGCTACAACCCGGGGACGATCCA

1191

CTCCTGGGAAATCAGTCTCTCCTCGAGACTCATCCGCTATACCAGTCAGAACCAGCGGTGCCAGTGATAA

1261

AAACTCCCCCCTTGAAGAAGAAAATGTCTGGTACCTTCGGGGGAATACTAGCTGGCCTAATCGGATTACT

1331

GGTAAGCTTTTTCTTGTTGATAAAAATTCTAGAAATACTGAGGAGGCTAGATTGGTGGTGGATTTCTCTC

1401

AGTTCTCCAAAGGGAAAAATGCAATGCGCTTTCCAAGATACTGGAGCCCAAATCTCTCCACATTACGTAG

1471

GATCTTGCCCGTGGGGATGCCCAGGATTTCTTTGGACCTATCTCAGGCTTTTTATCATCTTCCTCTTAAT

Start of region 7 (frame 3)


between the DHBV 6 protein and RSVRT, the HBV 6 proteins and RSVRT were identical. Finally, if one takes into account that several of the observed differences correspond to amino acids of the same family, involving a so-called conservative change (6), then one can suggest that the homologies observed between the hepatitis gene 6 proteins and the avian reverse transcriptase did not occur only by chance. However, more sophisticated computer work is needed to establish more definitely what kind of structural relationship exists between region 6 proteins and reverse transcriptases and whether this is due to evolution from a common ancestor or to convergence to fulfill a similar function.

(ii) Region 7. This open reading frame went from nucleotide 684 in frame 3 to stop codon TAG 1785. It has been shown that the homologous HBV open reading frame codes for the surface protein, and according to DNA sequence homology and amino acid sequence homology, an identical result has been inferred for WHV. As we have already pointed out, significant DNA homology between DHBV and the other two viruses was found in this region. This suggests that open region 7 codes for the duck viral surface (DHBs) protein. Due to the presence of numerous ATGs in the reading frame, translation could start at several different places. From the first encountered ATG at position 693, a protein of 364 amino acids can be predicted. By using different sources of data, such as the molecular weight and the known amino acid sequence of the N-terminal portion of the human viral surface (HBs) protein, it has been deduced that the N-methionine of the HBs protein is not coded by the

1541

CCTGCTAGTAGCAGCAGGCTTGCTGTATCTGACGGACAACGGGTCTACTATTTTAGGAAAGCTCCAATGG

1611

GCGTCGGTCTCAGCCCTTTTCTCCTCCATCTCTTCACTACTGCCCTCGGATCCGAAATCTCTCGTCGCTT

1681

TAACGTTTGGACTTTCACTTATATGGATGACTTCCTCCTCTGCCACCCAAACGCTCGTCACCTTAACGCA

1751

End of region 7

ATTAGCCACGCTGTCTGCTCTTTTTTACAAGAGTTAGGAATAAGAATAAACTTTGACAAAACCACGCCTT

1821

CTCCGGTGAATGAAATAAGATTCCTCGGTTACCAGATTGATGAAAATTTCATGAAGATTGAAGAAAGCAG

1891

ATGGAAAGAATTAAGGACTGTAATCAAGAAAATAAAAGTAGGAGAATGGTATGACTGGAAATGTATTCAA

1961

AGATTTGTCGGGCATTTGAATTTTGTTTTGCCTTTTACTAAAGGTAATATTGAAATGTTAAAACCAATGT

2031

ATGCTGCTATTACTAACCAAGTAAACTTTAGCTTCTCTTCATCCTATAGGACTTTGTTATATAAACTAAC

2101

AATGGGTGTGTGTAAATTAAGAATAAAGCCAAAGTCCTCTGTACCTTTGCCACGTGTTAGCTACAGATGCT

2171

ACCCCAACACATGGCGCAATATCCCATATCACCGGCGGGAGCGCAGTGTTTGCTTTTTCAAAGGTCAGAC

2241

ATATACATGTTCAGGAACTATTGATGTCTTGTTTAGCCAAGATAATGATTAAACCACGTTGTCTCTTATC

2311

TGATTCAACTTTTGTTTGCCATAAGCGTTATCAGACGTTACCATGGCATTTTGCTATGTTGGCCAAACAA

2381

TTGCTCAAACCGATACAATTGTACTTTGTCCCGAGCAAATATAATCCTGCTGACGGCCCATCCAGGCACA

2451

AACCTCCTGATTGGACGGCTTTTCCATACACCCCTCTCTCGAAAGCAATATATATTCCACATAGGCTATG

2521

TGGAACTTAAGAATTACACCCCTCTCCTTCGGAGCTGCTTGCCAAGGTATCTTTACGTCTACATTGCTGT

2591

TGTCGTGTGTGACTGTACCTTTGGTATGTACCATTGTTTATCATTCTTGCTTATATATGGATATCAATGC

2661

TTCTAGAGCCTTAGCCAATGTGTATGATCTACCAGATGATTTCTTTCCAAAAATAGATGATCTTGTTAGA

2731

GATGCTAAAGACGCTTTAGAGCCTTATTGGAAATCAGATTCAATAAAGAAACATGTTTTGATTGCAACTC

2801

ACTTTGTGGATCTCATTGAAGACTTCTGCCAGACTACACAGGGCATGCATGAAATAGCCGAATCATTAAG

Start of the fused 5 + 8 regions in frame 1

End of region 6

2871

AGCTGTTATACCTCCCACTACTACTCCTGTTCCACCGGGTTATCTTATTCAGCACGAGGAAGCTGAAGAG

2941

ATACCTTTGGGAGATTTATTTAAACACCAAGAAGAAAGGATAGTAAGTTTCCAACCCGACTATCCGATTA

3011

CGGCTAGAATT

FIG. 2. Nucleotide sequence of the Eco DHBV DNA clone. The sequence shown is complementary to the viral L strand.
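The protein lengths quoted in the text follow from simple coordinate arithmetic on the circular genome. The sketch below is an illustration only: the function name `predicted_length` is hypothetical, and it assumes that both coordinates are 1-based positions of the first nucleotide of the ATG and of the stop codon, as the text uses them (for example ATG 20 and TAA 2528 for region 6). The modulo handles a reading frame that passes through the EcoRI site, which is safe because the genome length (3,021) is a multiple of 3.

```python
GENOME_LENGTH = 3021  # length of the DHBV sequence reported in the text


def predicted_length(atg_start, stop_start, genome_length=GENOME_LENGTH):
    """Number of codons from an ATG up to, but not including, the stop codon.

    Both arguments are 1-based positions of the first nucleotide of the
    codon. The genome is treated as circular, so the stop codon may lie
    at a smaller coordinate than the ATG.
    """
    span = (stop_start - atg_start) % genome_length
    if span % 3 != 0:
        raise ValueError("ATG and stop codon are not in the same reading frame")
    return span // 3


if __name__ == "__main__":
    print(predicted_length(20, 2528))    # region 6, first ATG to TAA
    print(predicted_length(1284, 1785))  # region 7, ATG 1284 to TAG
    print(predicted_length(2518, 412))   # region 5/8, through the EcoRI site
```

The first two calls reproduce the 836 and 167 amino acids stated in the text. The third, for the frame running from ATG 2518 to TAA 412, gives 305 codons by the same arithmetic.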


FIG. 3. Diagram showing the localization of the nonsense codons on chains S and L. Three reading frames were defined from the 5' end of each DNA strand. On chain S, frame 1 is defined by its first triplet CAT, frame 2 is identified by ATG, and frame 3 is identified by TGC. On chain L, frame 1 is defined by its first triplet AAT, frame 2 is identified by ATT, and frame 3 is identified by TTC. The viral DNA is circular, and its length in nucleotides (3,021) is a multiple of 3; therefore, passing through the EcoRI site does not change the reading frame. Upper vertical bars indicate stop codons. Lower vertical bars represent ATG triplets. Numbers 5/8, 6, and 7 define areas in which a viral gene has been located: region 5/8 goes from 2515 to 411; region 6 goes from 14 to 2527; region 7 goes from 684 to 1784.
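The stop-codon map described in this legend can be reproduced in outline as follows. This is a hedged sketch rather than the procedure used by the authors: the function names `stop_positions` and `longest_open_region` are hypothetical, frames are numbered 0 to 2 here instead of 1 to 3, and the code relies on the property stated in the legend that the circular length is a multiple of 3, so that a frame is unchanged on passing the origin.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, frame):
    """1-based start positions of stop codons in one frame (0, 1 or 2)
    of a circular sequence."""
    seq = seq.upper()
    n = len(seq)
    if n % 3 != 0:
        raise ValueError("this sketch assumes a circular length that is a multiple of 3")
    positions = []
    for i in range(frame, n, 3):
        # Modular indexing lets the last codon wrap around the origin.
        codon = "".join(seq[(i + k) % n] for k in range(3))
        if codon in STOP_CODONS:
            positions.append(i + 1)
    return positions


def longest_open_region(seq, frame):
    """Return (start, codons) for the longest stop-free stretch in a frame,
    where start is the 1-based position just after a stop codon.
    Returns None when the frame contains no stop codon at all."""
    n = len(seq)
    stops = stop_positions(seq, frame)
    if not stops:
        return None
    best_start, best_codons = 0, -1
    for idx, stop in enumerate(stops):
        next_stop = stops[(idx + 1) % len(stops)]
        # Nucleotides between the end of this stop and the next stop, circularly.
        gap = (next_stop - stop - 3) % n
        if gap // 3 > best_codons:
            best_start = (stop + 2) % n + 1
            best_codons = gap // 3
    return best_start, best_codons


if __name__ == "__main__":
    toy = "ATGAAATAGCCCTTTGGG"  # toy circular sequence, not viral data
    for frame in range(3):
        print(frame, stop_positions(toy, frame), longest_open_region(toy, frame))
```

Applied to each strand of the sequence in Fig. 2, a scan of this kind yields the vertical bars of the diagram. Locating ATG triplets within each open stretch would be a separate step.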

first in-phase ATG codon but by the third one (10). An identical deduction has been made for the woodchuck viral surface (WHs) protein (9). According to these results, translation of DHBs protein may start with the second or following in-phase ATG. A comparison of the amino acid sequence deduced from the DNA sequence of open reading frame 7 of HBV, WHV, and DHBV showed no homology for the so-called pre-S region. On the other hand, a significant homology existed between the DHBV amino acid sequence starting with ATG 1284 and the N-terminal amino acid sequences of HBs and WHs proteins, suggesting, if not proving, that the N-terminal amino acid sequence for the DHBs protein starts with ATG 1284, the seventh in-phase ATG within open reading frame 7 (Fig. 8). From ATG 1284 up to TAG 1785, a protein of 167 amino acids with a molecular weight of 18,204 can be predicted, as compared with 25,645 and 25,422 for the WHs and HBs proteins, respectively (3, 8, 35). Electrophoretic results, showing an apparent molecular weight of 17,000 for DHBs protein (Mason et al., personal communication), are in good agreement with the theoretical molecular weight.

A comparison of the amino acid sequences of the three surface proteins showed that a fragment of about 50 amino acids, corresponding roughly to position 105 to 155 of the HBs protein sequence, was absent from the DHBs protein (Fig. 9). This deletion was also seen in the gene 6 protein (Fig. 9) and was visible at the DNA level (Fig. 5). The most intriguing point about this deletion is that the main antigenic epitope of the HBs protein was tentatively located in that region by proteolytic digestion, amino acid comparison, and peptide synthesis (7, 8, 22, 23). Several characteristic features of the HBs and WHs proteins have been previously noted, such as the existence of a very large hydrophobic sequence and of a sequence, Asn-X-Thr/Ser, known to be involved in glycosylation (30). The same is true for the DHBs protein (Fig. 9). There is a hydrophobic region limited by amino acids 79 and 97 with a hydrophobic index equal to 3.31 (26). A potential glycosylation site is also located at position 99. The homology observed between the DHBs protein and the two others was much lower (35%) than between the HBs and WHs proteins (61%), and the carboxylic regions showed very little homology.

It is interesting to note that the DHBs protein was partially encoded by the sequence located between nucleotides 1300 and 1400, in which the largest sequence homology (70%) was observed with the other two genomes. This sequence also coded for the gene 6 protein. This high percentage of homology was mainly due to the gene 6 product, which is more conserved (Fig. 10). Although we cannot formally prove that ATG 1284 codes for the N-terminus of the DHBs protein, it is highly suggested by the amino acid sequence homology observed among the three N-terminal sequences. However, the existence in all three viruses of a large open reading frame preceding the putative N-termini of their surface proteins is most intriguing. Various experiments have located the TATA box and the cap site of the HBs mRNA 150 nucleotides ahead of the first ATG of region 7, suggesting that transcription of the pre-S region does occur (21). Recent experiments by S1 mapping of HBs protein mRNA made in transfected mouse cells have detected a protected DNA fragment starting at position 3160 of the HBV genome (29a). Because of the location at nucleotide 2776 of a TATA box, a more probable hypothesis is that HBs mRNA is spliced and that position 3160 corresponds to the 5' end of its main body. A consensus acceptor sequence at position 3171 in the HBV genome supports this hypothesis. Donor and acceptor consensus sequences (20) are also found in the WHV and DHBV sequences (Fig. 11). The existence of splice and acceptor sequences at the suggested positions raises the question as to the choice of the initiator ATG. In all three cases, the one defining the N-terminal methionine is not the first encountered ATG, and neither is it the first one with a purine at position -3 (16). Another question raised by a possible splicing of HBs protein mRNA concerns the conservation during evolution of an open reading frame within the pre-S region. A working hypothesis is that another protein coded by the pre-S region alone or by the totality of region 7 is expressed.

(iii) Region 5/8. Starting with nucleotide 2515, there was an open reading frame which continued through the EcoRI site up to stop codon TAA 412. Because of its relative position, overlapping the 3' and 5' ends of region 6 with its 5' and 3' ends, respectively, this region looks like a product of fusion of the formerly defined 5 and 8 regions of HBV and WHV (9, 10). We eliminated the possibility that a small DNA fragment was lost during cloning through the EcoRI site by sequencing this region on an independent XhoI DHBV clone. Therefore, the fused 5/8 region represents the actual structure of the DHBV genome. Because the nucleotide sequence homology was too poor in this region, comparisons did not allow us to infer which part of the sequence was lost (or acquired) during evolution. A comparison of the amino acid sequence of DHBV region 5/8 with the corresponding amino acid sequences of regions 5 and 8 of HBV and WHV showed a clear degree of homology between the carboxylic ends of the molecules coded by regions 8 and 5/8. The same peculiar structure, involving repetition and increased amounts of basic amino acids, was observed at the carboxylic end, giving, upon comparison, a characteristic picture (Fig. 12). This clearly suggests that this amino acid sequence corresponds to the core protein. The molecular weight of the DHBV core protein has been estimated by gel electrophoresis to be 35,000 (W. S. Mason and J. Newbolt, personal communication). This would be in good agreement with a protein made with the entire open reading frame from ATG 2518 to TAA 412, for which a molecular weight of 34,986 can be predicted. However, some topological problem may prevent the use of that ATG because of the position of the nick on the minus strand, which was downstream from ATG 2518. If transcription occurs on a nicked genome, then transcription probably starts after the nick,

FIG. 4. Localization of the open reading frames on the viral genomes of HBV and WHV and comparison with the DHBV genome. The striped area in region 7 corresponds to the pre-S sequence. Arrows indicate the position of the first ATG found within an open reading frame. Numbers 1 to 8 refer to the various open reading frames as defined in the text and in Galibert et al. (9, 10).

FIG. 5. Nucleotide sequence comparisons. Sequences 40 nucleotides long with a homology equal to or greater than 50% were scored, and they are indicated by lines whose coordinates correspond to the position of that sequence within the two compared genomes. Left-hand row is a comparison of WHV (W) and HBV (H) genomes; right-hand row is a comparison of DHBV (D) and HBV (H) genomes.

FIG. 6. Comparison of the amino acid sequences of gene 6 proteins of HBV (H), DHBV (D), and WHV (W). Stretches 30 amino acids long with homology equal to or greater than 20% are indicated by a line.

D6 SerThrProGlyLysSerValSer o ArgAspSerSerAlaIleProValArgThrSerGlyAlaSerAspLysAsn RT ThrValAlaLeuHisLeuAlaIleProLeuLysTrpLysProAspHisThrProValTrpIleAspGlnTrpProLeu H6 CysTrpTrp o GlnPheArgAsnSerLysProCysSerAspTyrCysLeuSerLeuIleValAsnLeuLeuGluAsp SerProLeuGluGluGluAsnValTrpTyr o ArgGlyAsnThrSerTrpProAsnArg o ThrGlyLys o PheLeu

ProGlLysLeuValAlau6uThrGlnLeualGluLysGluLeuGlnLeuGlyHisIleGluProProLeuSers

ys

TrpGlyProCysAlaGluHisGlyGluHisHislleArgIleProArgThrProSerArgValThrGlyGIyValPheLeu

ValAspLysAsnSerArgAsnThrGluGlu o ArgLeuValValAspPheSerGlnPheSerLysGlyLys o

o Met

TrpAsnThrProPhePheValIleArgLysAlaSerGlySerTyrArgLeuLeuHisAspLeuArgAlaValAsnAlaLys ValAspLysAsnProHisAsnThrAlaGluSerArgLeuValValAspPheSerGlnPheSer o GlyAsnTyrArgVal ArgPhe o ArgTyrTrpSerProAsnLeuSerThrLeuArgArgIle o o Val o Met o ArgIleSer o o LeuValProPheGlyAlaValGlnGlnGlyAlaProValLeuSerAlaLeuProArgGlyTrpProLeuMetValLeuAsp SerTrp o LysPhe o o ProAsnLeuGlnSerLeuThrAsnLeu o SerSerAsnLeuSerTrpLeuSer o o 0 SerGlnAla o TyrHisLeu o o AsnProAlaSerSerSerArgLeu o ValSerAspGlyGlnArgValTyr LeuLysAspCysPhePheSerIleProLeuAlaGluGlnAspArgGluAlaPheAlaPheThrLeuProSerValAsnAsn ValSerAlaAla o TyrHisLeu o o HisProAlaAlaMetProHisLeuLeuValGlySerSerGlyLeuSerArg

PheArgLysAla o Met o ValGlyLeu o

Tyr

o PheLeu-

LeuHisLeuPhe

GlnAlaProAlaArgArgPheGlnTrpLysValLeuProGlnGlyMetThrCysSerProThrlleCysGInLeuValVal

Tyr//

//PheArgLysIle

54 aa

o Met o ValGlyLeu o

o

PheLeu-LeuAlaGlnPhe

ThrThrAla o GlySerGluIleSerArgArgPheAsnValTrp-ThrPheThr o

o

o

o Phe o

o Cys

FIG. 7. Amino acid sequence comparison between RSVRT (RT, middle line), DHBV gene 6 protein (D6, top line), and HBV gene 6 protein (H6, bottom line). RSVRT is taken as reference. Amino acids in either D6 or H6 that are identical to the corresponding amino acid in RT are represented by an open circle. Amino acids which are not identical but belong to the same family are shaded. The amino acid sequence of D6 starts with the 390th amino acid after the first ATG of the reading frame, and H6 starts with amino acid 312. RT starts with the first amino acid of the mature protein.


TABLE 1. Sequence comparison showing a nonapeptide of very similar sequence that is found within the three hepatitis B gene 6 proteins and the two reverse transcriptases

Protein    Peptide position within protein    Amino acid sequence^a
RSVRT      180                                Tyr Met Asp Asp
MLVRT      342                                Val^b
WHs        583
DHBs       561
HBs        538

a _, Amino acids identical to those shown for RSVRT.
b Amino acids of the same group as the amino acids shown for RSVRT.

and the initiation codon for the 5/8 gene product will then be ATG 2647, which is not favored by the presence of a T at position 2644 (16). On the other hand, the transcriptional template may not have a nick, allowing translation to start with ATG 2518. We found no homology between the N-terminal end of the DHBV 5/8 region and region 5 of HBV or WHV. Therefore, although we believe that the third open reading frame in DHBV does represent a fusion of the 5 and 8 regions, we can draw no firm conclusions. However, it should be noted that there was far less homology between HBV and WHV within region 5 than there was within the rest of the genome. At the present time, no protein coded by region 5 of HBV or WHV has been identified, and no function has been clearly proposed. However, the existence of a gene encoded in this region has been supported by comparative analysis of amino acid and nucleotide sequences (9). Therefore, the question arises whether the protein coded by HBV and WHV region 5 has the same function as the protein coded by region 5/8 of the DHBV genome. In other words, does the protein made by region 5/8 have two roles, one corresponding to the core protein and one corresponding to the gene 5 protein, or does region 5/8, through either RNA splicing or protein processing, give rise to two different proteins?

Replication origin. The genome of the hepatitis B virus is a noncovalent circle (15). The interruption in the L strand has been assigned in HBV and WHV to the only region devoid of

Leu _

Leu

Leu

Valb

Cys Valb

Ala

Glyb

Phe

Valb

Ala

-

Glyb

His -

coding capacity, between open reading frames 5 and 8, in the vicinity of a hairpin structure (9). A similar hairpin structure, although different in sequence, was observed in the DHBV genome, starting at nucleotide 2504 and ending at nucleotide 2525. This hairpin was located at the end of region 6 and overlapped the beginning of region 5/8. It was flanked by two direct repeats (ACACCCCTCTC) at positions 2478 and 2536, which are reminiscent of the short direct repeats found at both ends of retroviruses.

Nucleotide sequence comparisons of the HBV, WHV, and DHBV genomes clearly showed that these viruses belong to the same family and were derived from a common ancestor. Whereas HBV and WHV share 60 to 70% nucleotide sequence homology (9), DHBV showed much less homology (around or below 40%) for a large part of the genome and 50 to 55% homology for the remaining part, except for a small sequence of 100 nucleotides in which homology reached 70%. This indicates that during evolution the three viruses did not separate from each other at the same time, but that DHBV separated earlier from the ancestor of the two others. This is probably related to the fact that DHBV infects birds and that birds started to evolve separately from mammals 250 million years ago. In turn, this could also indicate that, at least in the case of the hepatitis virus family but possibly for all kinds of viruses, viruses appeared very early during evolution and evolved in parallel with the hosts they infect. Unfortunately,

SerLeuLeuGluThrHisProLeuTyrGlnSerGluProAlaValProValIleLysThrPro
DHBs 1206 TCTCTCCTCGAGACTCATCCGCTATACCAGTCAGAACCAGCGGTGCCAGTGATAAAAACTCCC
HBs 79 GGAACAGTAAACCCTGTTCTGACTACTGCCTCTCCCTTATCGTCAATCTTCTCGAGGATTGGG
GlyThrValAsnProValLeuThrThrAla o ProLeuSerSerIlePheSerArgIleGly

ProLeuLysLysLysMetSerGlyThrPheGlyGlyIleLeuAlaGlyLeuIleGlyLeuLeu
DHBs 1269 CCCTTGAAGAAGAAAATGTCTGGTACCTTCCGGGGAATACTAGCTGGCCTAATCGGATTACTG
HBs 142 GACCCTGCGCTGAACATGGAGAACATCACATCAGGATTCCTAGGACCCCTTCTCGTGTTACAG
AspProAlaLeuAsn o GluAsnIleThrSer o Phe o GlyPro o LeuVal o Gln

ValSerPhePheLeuLeuIleLysIleLeuGluIleLeuArgArgLeuAspTrpTrpTrpIle
DHBs 1332 GTAAGCTTTTTCTTGTTGATAAAAATTCTAGAAATACTGAGGAGGCTAGATTGGTGGTGGATT
HBs 205 GCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACT
AlaGly o o o o ThrArg o o Thr o ProGlnSer o o Ser o o Thr

SerLeuSerSerProLysGlyLysMetGlnCysAlaPheGlnAspThrGlyAlaGlnIleSer
DHBs 1395 TCTCTCAGTTCTCCAAAGGGAAAAATGCAATGCGCTTTCCAAGATACTGGAGCCCAAATCTCT
HBs 268 TCTCTCAATTTTCTAGGGCGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTCCCCAACC
o o AsnPheLeuGly o ThrThrVal o LeuGly o AsnSerGlnSerProThr

FIG. 8. Nucleotide and amino acid sequence comparison around ATG 1285 for DHBs protein and ATG 157 for HBs protein, which probably code for the N-terminal methionine of the surface antigen. Although there is no homology upstream from these ATGs, numerous identical amino acids downstream from these positions can be observed. Shading indicates ATG 1285 and ATG 157.
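The relationship between the nucleotide and amino acid lines of Fig. 8 can be checked with a short Python sketch. It translates the rows that begin at DHBs 1332 and HBs 205 with the standard genetic code and counts the positions at which the two proteins carry the same residue (the open circles in the figure). The names `translate`, `count_identical`, `DHBS_1332`, and `HBS_205` are illustrative assumptions introduced here; this is not the procedure used in the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string in frame 1 into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def count_identical(p1, p2):
    """Count positions carrying the same residue in two aligned proteins."""
    return sum(a == b for a, b in zip(p1, p2))

# Rows copied from Fig. 8 (DHBs from position 1332, HBs from position 205).
DHBS_1332 = "GTAAGCTTTTTCTTGTTGATAAAAATTCTAGAAATACTGAGGAGGCTAGATTGGTGGTGGATT"
HBS_205 = "GCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACT"

dhbs = translate(DHBS_1332)
hbs = translate(HBS_205)
print(dhbs)  # VSFFLLIKILEILRRLDWWWI
print(hbs)   # AGFFLLTRILTIPQSLDSWWT
print(count_identical(dhbs, hbs), "of", len(dhbs))  # 11 of 21
```

The count of 11 identical residues matches the number of open circles printed under this row of the figure.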

DHBs  1 MetSerGlyThrPheGlyGlyIleLeuAlaGlyLeuIleGlyLeuLeuValSerPhePheLeuLeuIleLys
HBs   1 o GluAsnIleThrSer o Phe o GlyPro o LeuVal o GlnAlaGly o o o o ThrArg

DHBs 25 IleLeuGluIleLeuArgArgLeuAspTrpTrpTrpIleSerLeuSerSerProLysGlyLysMetGlnCys
HBs  25 o o Thr o ProGlnSer o o Ser o o Thr o o AsnPheLeuGly o ThrThrVal o

DHBs 49 AlaPheGlnAspThrGlyAlaGlnIleSerProHisTyrValGlySerCysProTrpGlyCysProGlyPhe
HBs  49 LeuGly o AsnSerGlnSerProThrSerAsn o SerProThr o o o ProThr o o o Tyr

FIG. 9. Amino acid and nucleotide sequence comparisons of gene 7. The position of a deletion within the DHBV genome affecting the surface antigen and gene 6 protein has been determined by comparing the DHBs and HBs protein sequences, the DHBV and HBV protein 6 sequences, and the DNA sequences. To clarify the figure, the gene 6 protein sequences and the DNA sequences are only shown in part, delimited by //. The DHBs and HBs protein sequences are shown in full. Dots indicate homology between the DNA sequences. An open circle in the HBV protein sequences indicates that the amino acid at that position is the same as the corresponding amino acid in the homologous DHBV protein.

FIG. 10. Amino acid and nucleotide sequence comparison of the highly conserved sequence located in DHBV between nucleotides 1300 and 1400. As can be seen, the gene 6 proteins are more conserved in this region than the surface antigens. Out of 40 amino acids, 24 are identical for protein 6, whereas only 18 out of 39 are identical in the surface antigen. Symbols are the same as those in Fig. 9.

FIG. 11. Comparison of putative control sequences for the expression of gene 7.

this hypothesis cannot be tested for the moment since HBV, WHV, and DHBV so far represent the only examples of viruses which belong to the same family but infect widely different hosts and whose genomes have been entirely sequenced.
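The homology percentages quoted above (around 40%, 50 to 55%, and 70% over a stretch of 100 nucleotides) are figures of the kind obtained by counting identical positions in aligned sequences, either globally or window by window. The Python sketch below shows one minimal way to do this. It assumes the two sequences have already been aligned to equal length with "-" marking gaps; the function names, the window size, and the toy sequences are illustrative assumptions and do not reproduce the comparison method or data of this study.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two aligned sequences.

    Positions where either sequence has a gap ('-') are not compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    same = sum(x == y for x, y in compared)
    return 100.0 * same / len(compared)

def windowed_identity(a, b, size=100, step=100):
    """Percent identity in successive windows along the alignment."""
    return [
        (start, percent_identity(a[start:start + size], b[start:start + size]))
        for start in range(0, len(a) - size + 1, step)
    ]

# Toy example with invented sequences (not taken from any viral genome).
seq1 = "ACGTACGT"
seq2 = "ACGTTTTT"
print(percent_identity(seq1, seq2))            # 62.5
print(windowed_identity(seq1, seq2, 4, 4))     # [(0, 100.0), (4, 25.0)]
```

A window-by-window profile of this kind is what allows a short, highly conserved stretch to be distinguished from a genome-wide average.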

FIG. 12. Amino acid sequence comparison of the core antigen. Stretches of 20 amino acids with homology equal to or above 20% are indicated by lines.
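The criterion stated in the legend of Fig. 12 (stretches of 20 amino acids with at least 20% homology) can be expressed as a sliding-window scan. The Python sketch below is one possible reading of that criterion, assuming two pre-aligned protein sequences of equal length and treating "homology" as residue identity. The function name `conserved_stretches`, its parameters, and the test sequences are assumptions made for illustration; the legend does not specify how the lines in the figure were actually computed.

```python
def conserved_stretches(a, b, window=20, threshold_percent=20):
    """Return merged (start, end) intervals, 0-based and half-open, covered
    by windows whose identity is at least threshold_percent.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    intervals = []
    for start in range(0, len(a) - window + 1):
        end = start + window
        # Identical residues in this window; gap characters never count.
        matches = sum(x == y and x != "-" for x, y in zip(a[start:end], b[start:end]))
        # Integer comparison avoids floating-point rounding at the threshold.
        if matches * 100 >= threshold_percent * window:
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)  # extend previous stretch
            else:
                intervals.append((start, end))
    return intervals

# Invented sequences: identical over the first 20 positions, different afterwards.
p1 = "A" * 20 + "C" * 20
p2 = "A" * 20 + "G" * 20
print(conserved_stretches(p1, p2))  # [(0, 36)]
```

In this toy case the windows starting at positions 0 to 16 each contain at least 4 identical residues out of 20, so they merge into a single stretch covering positions 0 to 35.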

ACKNOWLEDGMENTS

We are very grateful to B. Masson and J. Summers for helpful discussions and for the gift of the DNA cloned recombinants. This work was supported in part by Institut National de la Santé et de la Recherche Médicale through grant SC15.

LITERATURE CITED

1. Alestrom, P., G. Akusjarvi, U. Pettersson, and M. Pettersson. 1982. DNA sequence analysis of the region encoding the terminal protein and the hypothetical N-gene product of adenovirus type 2. J. Biol. Chem. 257:13492-13498.
2. Burrell, C. J., P. Mackay, P. J. Greenaway, P. H. Hofschneider, and K. Murray. 1979. Expression in Escherichia coli of hepatitis B virus DNA cloned in plasmid pBR 322. Nature (London) 279:43-47.
3. Charnay, P., E. Mandart, A. Hampe, F. Fitoussi, P. Tiollais, and F. Galibert. 1979. Localization on the viral genome and nucleotide sequence of the gene coding for the two major polypeptides of the hepatitis B surface antigen (HBs Ag). Nucleic Acids Res. 7:335-346.
4. Charnay, P., C. Pourcel, A. Louise, A. Fritsch, and P. Tiollais. 1979. Cloning in Escherichia coli and physical structure of hepatitis B virion DNA. Proc. Natl. Acad. Sci. U.S.A. 76:2222-2226.
5. Cummings, I. W., J. K. Browne, W. A. Salser, G. V. Tyler, R. L. Snyder, J. M. Smolec, and J. Summers. 1980. Isolation, characterization, and comparison of recombinant DNAs derived from the human hepatitis B and woodchuck hepatitis virus genome. Proc. Natl. Acad. Sci. U.S.A. 77:1842-1846.
6. Dayhoff, M. O., R. V. Eck, and C. M. Park. 1972. A model of evolutionary change in proteins, p. 89-99. In M. O. Dayhoff (ed.), Atlas of protein sequence and structure 1972, vol. 5. National Biomedical Research Foundation, Washington, D.C.
7. Dreesman, G. R., Y. Sanchez, I. Ionescu-Matiu, J. T. Sparrow, H. R. Six, D. L. Peterson, F. B. Hollinger, and J. L. Melnick. 1982. Antibody to hepatitis B surface antigen after a single inoculation of uncoupled synthetic HBs Ag peptides. Nature (London) 295:158-160.
8. Galibert, F., T. N. Chen, and E. Mandart. 1981. Localization and nucleotide sequence of the genes coding for the woodchuck hepatitis virus surface antigen: comparison with the gene coding for the human hepatitis B virus surface antigen. Proc. Natl. Acad. Sci. U.S.A. 78:5315-5319.
9. Galibert, F., T. N. Chen, and E. Mandart. 1982. Nucleotide sequence of a cloned woodchuck hepatitis virus genome: comparison with the hepatitis B virus sequence. J. Virol. 41:51-65.
10. Galibert, F., E. Mandart, F. Fitoussi, P. Tiollais, and P. Charnay. 1979. Nucleotide sequence of the hepatitis B virus genome (subtype ayw) cloned in E. coli. Nature (London) 281:646-650.
11. Gerlich, W. H., M. A. Feitelson, P. L. Marion, and W. S. Robinson. 1980. Structural relationships between the surface antigens of ground squirrel hepatitis virus and human hepatitis B virus. J. Virol. 36:787-795.
12. Gingeras, T. R., D. Sciaky, R. E. Gelinas, J. Bing-Dong, C. E. Yen, M. M. Kelly, P. A. Bullock, B. L. Parsons, K. E. O'Neill, and R. J. Roberts. 1982. Nucleotide sequences from the adenovirus-2 genome. J. Biol. Chem. 257:13475-13491.
13. Hartley, J. L., and J. E. Donelson. 1980. Nucleotide sequence of the yeast plasmid. Nature (London) 286:860-864.
14. Herisse, J., G. Courtois, and F. Galibert. 1980. Nucleotide sequence of the EcoRI D fragment of adenovirus 2 genome. Nucleic Acids Res. 8:2173-2191.
15. Hruska, J. F., D. A. Clayton, J. L. R. Rubenstein, and W. S. Robinson. 1977. Structure of hepatitis B Dane particle DNA before and after the Dane particle DNA polymerase reaction. J. Virol. 21:666-672.
16. Kozak, M. 1981. Possible role of flanking nucleotides in recognition of AUG initiator codon by eukaryotic ribosomes. Nucleic Acids Res. 9:5233-5252.
17. Marion, P. L., L. S. Oshiro, D. C. Regnery, G. H. Scullard, and W. S. Robinson. 1980. A virus in Beechey ground squirrels that is related to hepatitis B virus of humans. Proc. Natl. Acad. Sci. U.S.A. 77:2941-2945.
18. Mason, W. S., G. Seal, and J. Summers. 1980. Virus of Pekin ducks with structural and biological relatedness to human hepatitis B virus. J. Virol. 36:829-836.
19. Maxam, A., and W. Gilbert. 1980. Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65:499-560.
20. Mount, S. M. 1982. A catalogue of splice junction sequences. Nucleic Acids Res. 10:459-472.
21. Pourcel, C., A. Louise, M. Gervais, N. Chenciner, M.-F. Dubois, and P. Tiollais. 1982. Transcription of the hepatitis B surface antigen gene in mouse cells transformed with cloned viral DNA. J. Virol. 42:100-105.
22. Prince, A. M., H. Ikram, and T. P. Hopp. 1982. Hepatitis B virus vaccine: identification of HBs Ag/a and HBs Ag/d but not HBs Ag/y subtype antigenic determinants on a synthetic immunogenic peptide. Proc. Natl. Acad. Sci. U.S.A. 79:579-582.
23. Rao, K. R., and G. N. Vyas. 1976. Biochemical characterization of hepatitis B surface antigen in relation to serological activity. J. Biol. Stand. 4:295-304.
24. Robinson, W. S., P. L. Marion, M. A. Feitelson, and A. A. Siddiqui. 1981. The hepadna virus group: hepatitis B and related viruses, p. 57-58. In W. Szmuness, H. J. Alter, and J. E. Maynard (ed.), Proceedings of the International Symposium on Viral Hepatitis. Franklin Institute Press, Philadelphia.
25. Schwartz, D., R. Tizard, and W. Gilbert. 1983. Nucleotide sequence of Rous sarcoma virus. Cell 32:853-869.
26. Segrest, J. P., and R. J. Feldmann. 1974. Membrane proteins: amino acid sequence and membrane penetration. J. Mol. Biol. 87:853-858.
27. Shinnick, T. M., R. A. Lerner, and J. G. Sutcliffe. 1981. Nucleotide sequence of Moloney murine leukaemia virus. Nature (London) 293:543-548.
28. Sninsky, J. J., A. Siddiqui, W. S. Robinson, and S. N. Cohen. 1979. Cloning and endonuclease mapping of the hepatitis B viral genome. Nature (London) 279:346-348.
29. Staden, R. 1977. Sequence data handling by computer. Nucleic Acids Res. 4:4037-4051.
29a. Stenlund, A., D. Lamy, J. Moreno-Lopez, H. Ahola, U. Pettersson, and P. Tiollais. 1983. Secretion of the hepatitis B virus surface antigen from mouse cells using an extra-chromosomal eucaryotic vector. EMBO J. 2:669-673.
30. Struck, D. K., W. J. Lennarz, and K. Brew. 1978. Primary structural requirements for the enzymatic formation of the N-glycosidic bond in glycoproteins. J. Biol. Chem. 253:5784-5786.
31. Summers, J., and W. S. Mason. 1982. Replication of the genome of a hepatitis-B-like virus by reverse transcription of an RNA intermediate. Cell 29:403-415.
32. Summers, J., A. O'Connell, and I. Millman. 1975. Genome of hepatitis B virus: restriction enzyme cleavage and structure of DNA extracted from Dane particles. Proc. Natl. Acad. Sci. U.S.A. 72:4797-4801.
33. Summers, J., J. M. Smolec, and R. Snyder. 1978. A virus similar to human hepatitis B virus associated with hepatitis and hepatoma in woodchucks. Proc. Natl. Acad. Sci. U.S.A. 75:4533-4537.
34. Summers, J., J. M. Smolec, B. G. Werner, T. J. Kelly, Jr., G. V. Tyler, and R. L. Snyder. 1980. Hepatitis B virus and woodchuck hepatitis virus are members of a novel class of DNA viruses. Viruses in naturally occurring cancers. Cold Spring Harbor Conf. Cell Proliferation 7:459-470.
35. Valenzuela, P., P. Gray, M. Quiroga, J. Zaldivar, H. M. Goodman, and W. J. Rutter. 1979. Nucleotide sequence of the gene coding for the major protein of hepatitis B virus surface antigen. Nature (London) 280:815-819.
36. Valenzuela, P., M. Quiroga, J. Zaldivar, P. Gray, and W. J. Rutter. 1981. The nucleotide sequence of the hepatitis B viral genome and the identification of the major viral genes. In B. Fields, R. Jaenisch, and C. F. Fox (ed.), Animal virus genetics. Academic Press, Inc., New York.