JOURNAL OF VIROLOGY, Apr. 1981, p. 1-7 0022-538X/81/040001-07$02.00/0

Vol. 38, No. 1

Variation in Nucleotide Sequences Coding for the N-Terminal Regions of the Matrix and Nonstructural Proteins of Influenza A Viruses

RUTH M. HALL† AND GILLIAN M. AIR*

Department of Microbiology, John Curtin School of Medical Research, Australian National University, Canberra City, A.C.T. 2601, Australia

Received 19 September 1980/Accepted 17 December 1980

Nucleotide sequences have been determined for complementary DNA transcribed from the 3' ends of RNA segments 7 (matrix gene) and 8 (nonstructural gene) from a number of human influenza A viruses isolated over a period of 43 years and representing H0N1, H1N1, H2N2, and H3N2 subtypes. The pattern of nucleotide variation in both genes suggests that RNA segments 7 and 8 were conserved during the reassortment events which were responsible for the antigenic shifts H1N1 → H2N2 and H2N2 → H3N2. During the 23-year period between the isolation of A/PR/8/34(H0N1) and A/RI/5-/57(H2N2), substitutions occurred at 7 of 230 nucleotides in RNA segment 7 and 13 of 220 nucleotides in RNA segment 8, and in the 20 years from A/RI/5-/57(H2N2) to A/Canberra Grammar/77(H3N2), substitutions occurred at 5 of 230 nucleotides in RNA segment 7 and 12 of 220 nucleotides in RNA segment 8. These give rise to 2 of 67, 5 of 64, 1 of 67, and 5 of 64 amino acid changes, respectively. The number of nucleotide and amino acid changes observed is of the same order of magnitude as that which occurs over a comparable period of drift in RNA segments 4 and 6, which code for the variable antigenic determinants hemagglutinin and neuraminidase.

The influenza A virus genome is segmented and contains eight distinct single-stranded RNA molecules of negative polarity. Studies of the natural variation of human influenza A virus have focused largely on the antigenic variation of the two surface proteins, hemagglutinin and neuraminidase. It is now clear that changes in the amino acid sequence of these proteins arise in two ways. Mutation causes gradual changes in antigenic properties (drift), whereas more dramatic changes in antigenicity (shifts) probably result from recombination events involving the exchange of one or both of the RNA segments 4 and 6, which code for hemagglutinin and neuraminidase, respectively (12, 16).

However, little is known about the evolution and variation of the proteins encoded by the remaining RNA segments. The recent development of rapid methods for sequencing RNA molecules allows a direct study of RNA sequence divergence. Analyses of the 3'-terminal sequences of RNA segments 4 and 6 have shown that valuable information on the relationships among influenza virus strains can be obtained in this way (3, 5). We report here the sequences of the DNA complementary (cDNA) to the 3' terminus of viral RNA segments 7 and 8 from human influenza A virus strains. Sequences presented previously for RNA segments 7 (6) and 8 (4) have been extended and corrected. Comparisons of 230 nucleotides of RNA segment 7 from five human strains and of 220 nucleotides of RNA segment 8 from three human strains suggest that both RNA segments 7 and 8 were conserved at the antigenic shifts H1N1 → H2N2 and H2N2 → H3N2. Further, the numbers of base substitutions found in these RNA species are comparable to the numbers found in RNA segments 4 and 6 from drift strains (3, 5; Air, unpublished data). This implies that the natural error rate which leads to variation of influenza virus RNA sequences is sufficient to account for the observed rate of antigenic drift.

† Present address: Biozentrum der Universität Basel, CH-4056 Basel, Switzerland.

MATERIALS AND METHODS

Viruses. The influenza virus strains used were A/PR/8/34(H0N1), A/FW/1/50(H1N1), A/Loyang/4/57(H1N1), A/RI/5-/57(H2N2), and A/Canberra Grammar/77(H3N2). All virus stocks were provided by W. G. Laver. The viruses were grown in embryonated chicken eggs and purified by adsorption to and elution from erythrocytes, followed by sucrose density gradient centrifugation (11).

Chemicals and enzymes. [α-32P]dATP and [α-32P]dGTP (350 Ci/mol or >2,000 Ci/mmol) were obtained from the Radiochemical Centre, Amersham, England. Dideoxynucleoside triphosphates were from P-L Biochemicals, Inc., Milwaukee, Wis. Reverse transcriptase was a generous gift from J. W. Beard, Life Sciences Inc., St. Petersburg, Fla., under National Cancer Institute contract. The oligonucleotide primer d(AGCAAAAGCAGG), synthesized by Collaborative Research, Inc., Waltham, Mass., was generously supplied by Ching-Juh Lai.

Preparation of template RNA. Extraction of virion RNA, separation of the RNA segments by polyacrylamide gel electrophoresis, and elution of RNA segments 7 and 8 from the gel were as described previously (6), except that the RNA segments were not polyadenylated.

Nucleotide sequencing procedures. cDNA sequences were obtained by the dideoxy chain-terminating method. The procedure was as described previously (2), except that the synthetic dodecamer d(AGCAAAAGCAGG) was used as a primer and the concentration of dideoxynucleoside triphosphate required to give long or short sequences was determined empirically.

Peptide isolation and analysis. The protein methods were essentially as described previously (1, 6). Matrix protein from A/PR/8/34(H0N1) was S-carboxymethylated and then digested with either trypsin or chymotrypsin. Because of the low solubility of the protein, a portion was treated with succinic anhydride in 6 M guanidine hydrochloride to modify ε-amino groups and then digested with trypsin or chymotrypsin. The soluble peptides from these digests were separated on Whatman 3MM paper by electrophoresis at pH 6.5, followed by chromatography in butan-1-ol-acetic acid-water-pyridine (150:30:120:100). The papers were stained with fluorescamine (19). The peptides were eluted with 6 N HCl, hydrolyzed in vacuo for 24 h at 105°C, and analyzed on a Beckman 119CL amino acid analyzer.
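The dodecamer primer anneals to the conserved 3' terminus shared by the viral RNA segments. As a small illustration, not part of the original methods, the template sequence it pairs with can be derived by base-pair complementation. The helper name `template_3prime` is hypothetical and introduced only for this sketch.

```python
# Watson-Crick partner in the RNA template for each base of the DNA primer.
DNA_TO_RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}


def template_3prime(primer_5to3: str) -> str:
    """Return the RNA bases paired with the primer, written 3'->5'.

    Pairing is antiparallel: the 5' end of the primer lies opposite the
    3' end of the template, so complementing base by base in primer order
    gives the template read from its 3' end.
    """
    return "".join(DNA_TO_RNA_PARTNER[base] for base in primer_5to3)


PRIMER = "AGCAAAAGCAGG"  # synthetic dodecamer from the text, written 5'->3'
print("vRNA 3'-" + template_3prime(PRIMER) + "-5'")
```

Because cDNA synthesis starts from this primer, the first 12 nucleotides of every cDNA sequence reported here are the primer itself rather than newly determined sequence.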

RESULTS

Sequence analysis. The sequence of the first 12 nucleotides at the 3' terminus of all influenza A virus RNA segments is identical (14, 18). A synthetic dodecamer, d(AGCAAAAGCAGG), complementary to this common sequence, was used to prime cDNA synthesis on purified RNA segments. When the synthetic primer is used, sequences can routinely be determined for at least 200 nucleotides and, in some cases, for more than 300 nucleotides. As in previous sequencing experiments with an RNA template (4-6), there are positions in the sequence at which "cross bands" appear on the gel in all four lanes. When the templates are stored in ethanol rather than frozen and when they are used soon after preparation, these cross bands are not as intense or as frequent. When the results of several sequencing experiments with the same template are considered, the assignment of a nucleotide becomes clear, and differences among strains are obvious. The origin of these cross bands is not known; they may be due to a secondary structure in the template, but specific degradation of the template is another possibility since they often occur at T residues followed by a G residue.

With the standard reaction mixture, which contains only [α-32P]dATP, nucleotides 13 through 17 are not normally detected. When the concentration of dideoxynucleoside triphosphate is increased and both [α-32P]dATP and [α-32P]dGTP are used, it is possible to determine the sequence from nucleotide 15 of RNA segment 7 and from nucleotide 14 of RNA segment 8. The remaining nucleotides are known only for those strains for which earlier data, obtained with polyadenylated RNA and d(pT8A) as a primer, were available (4, 6). Sequences of cDNA are presented since the mRNA and cDNA are of the same sense and predicted N-terminal amino acid sequences can be read off directly.

Nucleotide sequence at the 3' terminus of RNA segment 7 (matrix gene). The sequence of the first 343 nucleotides of cDNA transcribed from the 3' terminus of RNA segment 7 from strain A/PR/8/34(H0N1) is shown in Fig. 1. This sequence differs from that published previously (6), in that a G residue has been inserted between nucleotides 76 and 77 and the two spaces at 108 and 109 are replaced by a single T residue in all strains examined (Fig. 2). This discrepancy is due to the compression of the pair of G residues at positions 77 and 78, which previously appeared as a single G residue and which have now been resolved with the use of thin polyacrylamide gels (15). This correction alters the reading frame from nucleotide residues 77 to 109 and therefore the predicted amino acid sequence of residues 18 to 27. Peptides corresponding to this region of the matrix protein sequence have now been identified (Tables 1 and 2 and Fig. 1). The sequence shown in Fig. 1 is identical to the recently published sequence for RNA segment 7 of A/PR/8/34(H0N1) (20), except for a single silent nucleotide difference at position 58 (A/T).

The sequences of at least 238 nucleotides of RNA segment 7 from four other strains, A/FW/1/50(H1N1), A/Loyang/4/57(H1N1), A/RI/5-/57(H2N2), and A/Canberra Grammar/77(H3N2), have also been determined. All sequences were highly related; nucleotide substitutions in human virus strains were observed at a total of only 13 positions. The differences among the sequences are shown in Table 3. Three of the alterations lead to changes in the amino acid sequences, namely, 68, A → G, Ile → Val; 143, G → C, Glu → Gln; and 147, T → C, Val → Ala. The remaining 10 substitutions are

INFLUENZA VIRUS M AND NS GENE SEQUENCES

nt 1-25      AGCAAAAGCAGGTAGATATTGAAAG

nt 26-67     ATG AGT CTT CTA ACC GAG GTC GAA ACG TAC GTA CTC TCT ATC
aa 1-14      Met Ser Leu Leu Thr Glu Val Glu Thr Tyr Val Leu Ser Ile

nt 68-127    ATC CCG TCA GGC CCC CTC AAA GCC GAG ATC GCA CAG AGA CTT GAA GAT GTC TTT GCA GGG
aa 15-34     Ile Pro Ser Gly Pro Leu Lys Ala Glu Ile Ala Gln Arg Leu Glu Asp Val Phe Ala Gly

nt 128-187   AAG AAC ACC GAT CTT GAG GTT CTC ATG GAA TGG CTA AAG ACA AGA CCA ATC CTG TCA CCT
aa 35-54     Lys Asn Thr Asp Leu Glu Val Leu Met Glu Trp Leu Lys Thr Arg Pro Ile Leu Ser Pro

nt 188-247   CTG ACT AAG GGG ATT TTA GGA TTT GTG TTC ACG CTC ACC GTG CCC AGT GAG CGA GGA CTG
aa 55-74     Leu Thr Lys Gly Ile Leu Gly Phe Val Phe Thr Leu Thr Val Pro Ser Glu Arg Gly Leu

nt 248-307   CAG CGT AGA CGC TTT GTC CAA AAT GCC CTT AAT GGG AAC GGG GAT CCA AAT AAC ATG GAC
aa 75-94     Gln Arg Arg Arg Phe Val Gln Asn Ala Leu Asn Gly Asn Gly Asp Pro Asn Asn Met Asp

nt 308-343   AAA GCA GTT AAA CTG TAT AGG AAG CTC AAG AGG GAG
aa 95-106    Lys Ala Val Lys Leu Tyr Arg Lys Leu Lys Arg Glu

FIG. 1. Nucleotide sequence of cDNA transcribed from the 3' end of RNA segment 7 from strain A/PR/8/34(H0N1). The first 12 nucleotides represent the sequence of the synthetic dodecamer used as the primer, and nucleotides 13 and 14 were determined previously (6). The amino acid sequence predicted for the matrix protein is also shown.

in the third codon position and do not alter the predicted amino acid. Thus, the amino acid sequence of the matrix protein shows less than 5% (3 of 70) sequence divergence over a period of 43 years.

The three H0N1 and H1N1 strains studied were isolated over a period of 23 years, and the number of nucleotide differences observed is 4 of 238 for 1934 to 1950, 3 of 238 for 1950 to 1957, and 7 of 238 for 1934 to 1957. The H2N2 strain and the latest H1N1 strain were both isolated in 1957, and the sequences differ by only 2 of 238 nucleotides. For these four strains, sequence data to at least 270 nucleotides have been obtained, and no further nucleotide differences were detected. The near identity of the nucleotide sequences of RNA segment 7 from A/Loyang/4/57(H1N1) and A/RI/5-/57(H2N2) argues strongly that RNA segment 7 of the H2N2 strains was derived from an H1N1 strain circulating at the time of the shift event.

RNA segment 7 from A/Canberra Grammar/77(H3N2), a strain isolated in 1977, differs by 5 of 238 nucleotides from A/RI/5-/57(H2N2) and by 12 of 238 nucleotides from A/PR/8/34(H0N1). All nucleotide changes accumulated in A/RI/5-/57(H2N2) are conserved in A/Canberra Grammar/77(H3N2) (Table 3). This pattern of variation suggests that RNA segment 7 was also conserved at the H2N2 → H3N2 shift event.

Matrix protein data. The compositions of tryptic peptides of S-carboxymethylated matrix protein of virus strain A/PR/8/34(H0N1) are given in Table 1. No further information was obtained from the tryptic digest of succinylated protein. The compositions of several of these peptides are in agreement with the predicted amino acid sequence shown in Fig. 1, and the locations of these are indicated in Table 1. The chymotryptic digests yielded less information, partly because there were many instances of incomplete hydrolysis, leading to a more complex mixture of products. Analyses of peptides which gave clear results are shown in Table 2 and are derived from succinylated or non-succinylated digests. It can be seen that only three peptides were obtained which cannot be derived from the predicted N-terminal amino acid sequence of the protein shown in Fig. 1. However, the results shown in Tables 1 and 2 do account for every amino acid of the 105 predicted from the nucleotide sequence in Fig. 1. As previously reported (6), the only processing of the primary translation product at the N-terminus is the removal of the initiating methionine and the addition of a blocking group to the resulting N-terminal serine residue.

FIG. 2. Autoradiograph of an 8% gel showing the sequence of DNA complementary to RNA segment 7 from A/Canberra Grammar/77(H3N2). The sequence extends from residues 51 to 108 and reads C-G-T-A-T-G-T-T-C-T-C-T-C-T-A-T-C-G-T-T-C-C-G-T-C-A-G-G-C-C-C-C-C-T-C-A-A-A-G-C-C-G-A-A-A-T-C-G-C-A-G-A-G-A-C-T. The nucleotides which differ from the A/PR/8/34(H0N1) sequence shown in Fig. 1 occur at residues 55, 58, 68, 70, 94, and 100 and are boldfaced. The two G residues at positions 77 and 78 can be clearly distinguished (see text).

Nucleotide sequence at the 3' terminus of RNA segment 8 (nonstructural gene). The sequence of the first 294 nucleotides of cDNA transcribed from the 3' terminus of RNA segment 8 from strain A/RI/5-/57(H2N2) is shown in Fig. 3. The first 131 nucleotides are identical to the sequence previously published for this strain (4). It has been shown recently that RNA segment 8 codes for two polypeptides, designated NS1 and NS2 (7, 8), and that the 3'-terminal region of the viral RNA encodes the NS1 protein (9). The predicted N-terminal amino acid sequence for NS1 is also shown in Fig. 3.

Sequences of RNA segment 8 have been obtained from A/PR/8/34(H0N1) to 274 nucleotides and from A/Canberra Grammar/77(H3N2) to 221 nucleotides. The differences found in the first 221 nucleotides are shown in Table 4. Included are data from A/Udorn/72(H3N2) RNA segment 8, which Lamb and Lai (10) have sequenced completely as cloned DNA. Differences in human strains occur at a total of 26 positions, and amino acid differences are predicted at 10 positions (Table 4). In three cases nucleotide substitutions have occurred in both the first and the third codon positions: two of these lead to amino acid substitutions (33 to 35, CCA/T → TCC = Pro → Ser, and 189 to 191, AAG → GAA = Lys → Glu); however, in the third case (81 to 83) both CGC and AGA code for Arg. A comparison of the A/PR/8/34(H0N1) and A/RI/5-/57(H2N2) sequences to 274 nucleotides reveals only two further differences: A → C at nucleotide 225 and G → A at nucleotide 236. Neither substitution alters the predicted amino acid.

Both the nucleotide sequence of RNA segment 8 and the protein sequence of NS1 vary more than in the case of RNA segment 7 and matrix protein. In the first 220 nucleotides, differences are observed between A/PR/8/34(H0N1) and A/RI/5-/57(H2N2) at 13 positions, between A/RI/5-/57(H2N2) and A/Canberra Grammar/77(H3N2) at 12 positions, and between A/PR/8/34(H0N1) and A/Canberra Grammar/77(H3N2) at 22 positions. The pattern of nucleotide variation is clearly consistent with the conservation of RNA segment 8 at both major shift events.
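The codon-level consequences reported in this section can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it translates the affected codons with the standard genetic code, using nucleotides 1 to 187 of the Fig. 1 sequence and the codon pairs quoted above for RNA segment 8. The names `codon_effect`, `SEG7_PR8`, and `SEG7_ORF_START` are hypothetical, and the reading-frame start (nucleotide 26) is taken from the position of the initiating ATG in Fig. 1. Amino acids are printed in one-letter code.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotides 1-187 of the A/PR/8/34 segment 7 cDNA, copied from Fig. 1.
SEG7_PR8 = (
    "AGCAAAAGCAGGTAGATATTGAAAG "
    "ATG AGT CTT CTA ACC GAG GTC GAA ACG TAC GTA CTC TCT ATC "
    "ATC CCG TCA GGC CCC CTC AAA GCC GAG ATC GCA CAG AGA CTT GAA GAT GTC TTT GCA GGG "
    "AAG AAC ACC GAT CTT GAG GTT CTC ATG GAA TGG CTA AAG ACA AGA CCA ATC CTG TCA CCT"
).replace(" ", "")
SEG7_ORF_START = 26  # 1-based position of the A in the initiating ATG


def codon_effect(seq, pos, new_base, orf_start):
    """Return (old codon, new codon, old aa, new aa) for a substitution at 1-based pos."""
    offset = (pos - orf_start) % 3      # 0, 1, or 2: position within the codon
    start = pos - offset - 1            # 0-based index of the codon's first base
    old = seq[start:start + 3]
    new = old[:offset] + new_base + old[offset + 1:]
    return old, new, CODON_TABLE[old], CODON_TABLE[new]


# Single substitutions in segment 7 (Table 3): position and replacement base.
for pos, base in [(58, "T"), (68, "G"), (143, "C"), (147, "C")]:
    old, new, aa_old, aa_new = codon_effect(SEG7_PR8, pos, base, SEG7_ORF_START)
    kind = "silent" if aa_old == aa_new else "replacement"
    print(f"segment 7, nt {pos}: {old} -> {new}, {aa_old} -> {aa_new} ({kind})")

# Double substitutions within one codon of segment 8, as quoted in the text.
for old, new in [("CCA", "TCC"), ("AAG", "GAA"), ("CGC", "AGA")]:
    print(f"segment 8: {old} -> {new}, {CODON_TABLE[old]} -> {CODON_TABLE[new]}")
```

Under these assumptions the three segment 7 replacements come out as Ile → Val, Glu → Gln, and Val → Ala, the change at position 58 is silent, and the CGC/AGA pair in segment 8 leaves arginine unchanged.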
DISCUSSION

Sequence analysis of the 3' ends of the human influenza A virus matrix gene (RNA segment 7) and nonstructural gene (RNA segment 8) indicates that with time a sequential accumulation of single base changes has occurred in both genes. When strains are compared pairwise, the number of differences in nucleotide sequence is proportional to the time period between the isolation of the viruses, and the majority of changes observed in intermediate strains are conserved in later isolates. These results suggest that both the matrix gene and the nonstructural gene were conserved at both the H1N1 → H2N2 and the H2N2 → H3N2 shift events. Although


TABLE 1. Amino acid composition of tryptic peptides isolated from S-carboxymethylated matrix protein of influenza virus strain A/PR/8/34(H0N1)ᵃ
Amino acid; composition of peptideᵇ (molar ratio):

Arginine

2.0

1.0 1.0 1.0

1.1

1.0

2.0 1.1

1.0 0.9

1.0 1.0 1.0

1.0 1.0 1.0 1.0

1.9 1.0

Lysine Histidine

1.0

1.2

1.0

0.9 1.0 0.8

0.8 1.0 1.0 1.0

1.0

Carboxymethyl cysteine

Aspartic acid Threonine Serine

2.0 2.1 1.1 3.1 1.1

Glutamic acid Proline

7.0 1.2 3.2 1.1 2.0 1.0 0.8 2.7 2.1 1.0 1.1 2.2 1.1 3.2 1.1 1.8 7.4 1.2 1.3 1.1 3.5

1.0 2.2 0.8 1.0 2.1 1.1 0.5 1.0 1.0 1.1 1.0 1.0 1.8 1.0 1.0 1.0 1.0 2.1 1.0 0.9 1.0 1.1 1.0 0.9 0.9 2.0

0.9

Glycine Alanine Valine Methionine Isoleucine Leucine

0.9

Tyrosine Phenylalanine

0.9 0.7

1.0 1.0 1.0 0.9 0.7 0.5

2.0 1.0 0.9 1.0 1.0

3.1 0.9 2.3 0.8 0.9

1.1

1.1 7.0 1.1 2.8

1.0 2.1 1.1 1.0 0.8 1.0

0.8

ᵃ Soluble tryptic peptides were separated in two dimensions on Whatman 3MM paper, using electrophoresis at pH 6.5 and chromatography in butanol-acetic acid-water-pyridine (15:3:12:10) (19). The peptides were located with fluorescamine, the fluorescent areas were cut out, and peptides were eluted with 6 N HCl. After hydrolysis at 105°C for 22 h, the amino acid composition of each peptide was determined with a Beckman 119CL amino acid analyzer. Tryptophan was not determined.
ᵇ Peptide spots on the map were given an identification number, approximately from the most basic (no. 1) to the most acidic (no. 36).
ᶜ Location in sequence. Several peptides have amino acid compositions which match those predicted from the amino acid sequence deduced from the nucleotide sequence shown in Fig. 1 and can therefore be located in the sequence. For example, peptide 10 has an amino acid composition corresponding to that predicted for amino acids 48 to 57 in Fig. 1.

TABLE 2. Amino acid composition of chymotryptic peptides isolated from S-carboxymethylated matrix protein of influenza virus strain A/PR/8/34(H0N1)ᵃ
Amino acid; composition of peptideᵇ (molar ratio):

S23 (2-10)ᶜ 35 (S16) 37 (S17) 38 (S20)

Lysine Histidine

1.0

Arginine

0.9 1.5

Aspartic acid Threonine Serine Glutamic acid Proline Glycine Alanine Valine Methionine Isoleucine Leucine Tyrosine

1.7 1.0 2.2

0.8

19

1.8 3.0

32 (S25)

S1

1.0

10

0.9 1.1

0.9

2.6 2.0

1.4

1.9 1.0

1.1

2.4

0.8 1.5

1.5 1.0 1.8

1.0

1.0

S8

2.1 1.1 1.1 1.1 1.0

1.1

1.4 0.9 3.0

0.9 1.0

2.1

0.7 2.0 1.3 1.9 2.4 1.9 1.1

0.8 2.2 1.0

5.8

0.9 0.9 0.4 0.8

ᵃ Details as for Table 1. Tryptophan was not determined.
ᵇ Numbers with S prefixes are from a chymotryptic digest of succinylated matrix protein.
ᶜ Location in sequence.
ᵈ NL, Not located.

Phenylalanine

4

1.1

3.1

+

2.0 0.5

S21

2.0

0.9

2.2

1.1

1.4

0.9 0.9

2.0 1.0

1.6 1.0

16

(11-20) (21-32) (33-45) (46-55) (56-62) (63-74) (75-79) (80-100) (NLd) (NLd) (NLd)

1.9 1.3 1.3

1.0

0.7 2.1 1.0 0.5 0.7 1.0

2.0 0.9 1.0


TABLE 3. Nucleotide differences in 238 residues at the 3' terminus of RNA segment 7 and amino acid substitutions in the predicted polypeptide

Differences at nucleotide positionᵃ:

Strain                        31  55  58  68  70  94 100 124 130 143 147 205 214
A/PR/8/34(H0N1)                T   C   A   A   C   G   A   A   G   G   T   A   G
A/FW/1/50(H1N1)                T   C   T   G   C   G   A   T   G   G   C   A   G
A/Loyang/4/57(H1N1)            T   C   T   G   C   G   A   T   A   G   C   G   A
A/RI/5-/57(H2N2)               T   C   T   G   C   G   G   T   G   G   C   G   A
A/Canberra Grammar/77(H3N2)    C   T   T   G   T   A   G   T   G   C   C   G   A

ᵃ Amino acid substitutions for nucleotide positions 68, 143, and 147 were Ile → Val, Glu → Gln, and Val → Ala, respectively.
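The pairwise difference counts quoted in the text for RNA segment 7 follow directly from Table 3. The sketch below tabulates them; it assumes, as the table title implies, that the 13 listed positions are the only variable sites among the 238 residues compared. The names `VARIABLE_SITES` and `differing_positions` are hypothetical.

```python
from itertools import combinations

# Variable positions in RNA segment 7 and the base each strain carries (Table 3).
POSITIONS = [31, 55, 58, 68, 70, 94, 100, 124, 130, 143, 147, 205, 214]
VARIABLE_SITES = {
    "A/PR/8/34(H0N1)": "TCAACGAAGGTAG",
    "A/FW/1/50(H1N1)": "TCTGCGATGGCAG",
    "A/Loyang/4/57(H1N1)": "TCTGCGATAGCGA",
    "A/RI/5-/57(H2N2)": "TCTGCGGTGGCGA",
    "A/Canberra Grammar/77(H3N2)": "CTTGTAGTGCCGA",
}


def differing_positions(strain_a, strain_b):
    """List the nucleotide positions at which two strains carry different bases."""
    pairs = zip(POSITIONS, VARIABLE_SITES[strain_a], VARIABLE_SITES[strain_b])
    return [pos for pos, a, b in pairs if a != b]


for strain_a, strain_b in combinations(VARIABLE_SITES, 2):
    diff = differing_positions(strain_a, strain_b)
    print(f"{strain_a} vs {strain_b}: {len(diff)} of 238 at {diff}")
```

Comparing the earliest and latest isolates in this way gives 12 differences, and the two 1957 strains differ only at positions 100 and 130.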

nt 1-26      AGCAAAAGCAGGGTGACAAAGACATA

nt 27-68     ATG GAT CCT AAC ACT GTG TCA AGC TTT CAG GTA GAT TGC TTC
aa 1-14      Met Asp Pro Asn Thr Val Ser Ser Phe Gln Val Asp Cys Phe

nt 69-128    CTT TGG CAT GTC CGC AAA CAA GTT GCA GAC CAA GAA CTA GGT GAT GCC CCA TTC CTT GAT
aa 15-34     Leu Trp His Val Arg Lys Gln Val Ala Asp Gln Glu Leu Gly Asp Ala Pro Phe Leu Asp

nt 129-188   CGG CTT CGC CGA GAT CAG AAG TCC CTA AGG GGA AGA GGC AGT ACT CTC GGT CTG AAC ATC
aa 35-54     Arg Leu Arg Arg Asp Gln Lys Ser Leu Arg Gly Arg Gly Ser Thr Leu Gly Leu Asn Ile

nt 189-248   GAA ACA GCC ACC CGT GTT GGA AAG CAG ATA GTG GAG AGG ATT CTG AAG GAA GAA TCC GAT
aa 55-74     Glu Thr Ala Thr Arg Val Gly Lys Gln Ile Val Glu Arg Ile Leu Lys Glu Glu Ser Asp

nt 249-294   GAG GCA CTT AAA ATG ACC ATG GCC TCC GCA CCT GCT TCG CGA TAC C
aa 75-89     Glu Ala Leu Lys Met Thr Met Ala Ser Ala Pro Ala Ser Arg Tyr

FIG. 3. Nucleotide sequence of cDNA transcribed from the 3' end of RNA segment 8 from strain A/RI/5-/57(H2N2). The first 12 nucleotides represent the sequence of the synthetic primer, and nucleotide 13 was determined previously (4). The amino acid sequence predicted for the NS1 protein is also shown.

TABLE 4. Nucleotide differences in 221 residues at the 3' terminus of RNA segment 8 and amino acid substitutions in the predicted polypeptide

Differences at nucleotide positionᵃ:

Strain                        21  33  35  50  62  68  81  83  88  90  94 110 131 149 158 164 170 182 183 189 191 192 200 202 205 221
A/PR/8/34(H0N1)                A   C   A   C   T   T   C   C   G   G   C   C   G   A   A   G   T   G   G   A   G   A   A   G   C   G
A/RI/5-/57(H2N2)               G   C   T   C   T   C   C   C   A   G   C   T   G   G   G   A   T   G   A   G   A   A   C   G   T   G
A/Udorn/72(H3N2)ᵇ              G   T   C   T   C   C   C   A   A   G   T   T   G   G   G   A   C   A   A   G   A   G   C   A   T   A
A/Canberra Grammar/77(H3N2)    G   T   C   T   T   C   A   A   A   A   C   T   A   G   G   A   C   A   G   G   A   G   C   A   T   G

ᵃ Amino acid substitutions were as follows for the indicated nucleotide positions: 21, Asn → Asp; 33 and 35, Pro → Ser; 88, Arg → Gln; 90, Val → Ile; 94, Ala → Val → Ala; 183, Asp → Asn → Asp; 189 and 191, Lys → Glu; 192, Thr → Ala; 202, Arg → His; 205, Ala → Val.
ᵇ Data of Lamb and Lai (10).
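Table 4 also allows a direct check of the argument that changes seen in an intermediate strain persist in a later isolate. The sketch below is illustrative only: it counts differences in the first 220 nucleotides for the three strains compared in the text, then asks how many of the sites that changed between A/PR/8/34 and A/RI/5-/57 carry the same base in A/Canberra Grammar/77. The short strain keys and the function names are hypothetical.

```python
# Variable positions in RNA segment 8 and the base each strain carries (Table 4).
POSITIONS = [21, 33, 35, 50, 62, 68, 81, 83, 88, 90, 94, 110, 131, 149, 158,
             164, 170, 182, 183, 189, 191, 192, 200, 202, 205, 221]
SEG8 = {
    "PR8/34": "ACACTTCCGGCCGAAGTGGAGAAGCG",
    "RI/57": "GCTCTCCCAGCTGGGATGAGAACGTG",
    "Canberra/77": "GTCTTCAAAACTAGGACAGGAGCATG",
}


def base_at(strain, pos):
    """Base carried by a strain at one of the tabulated positions."""
    return SEG8[strain][POSITIONS.index(pos)]


def differing_positions(strain_a, strain_b, limit=220):
    """Positions up to `limit` at which two strains differ."""
    return [p for p in POSITIONS
            if p <= limit and base_at(strain_a, p) != base_at(strain_b, p)]


early = differing_positions("PR8/34", "RI/57")
late = differing_positions("RI/57", "Canberra/77")
total = differing_positions("PR8/34", "Canberra/77")
print(len(early), len(late), len(total))

# Sites changed by 1957 that still carry the 1957 base in the 1977 isolate.
retained = [p for p in early if base_at("Canberra/77", p) == base_at("RI/57", p)]
print(f"{len(retained)} of {len(early)} early changes retained; "
      f"changed again at {sorted(set(early) - set(retained))}")
```

With the table as given, most of the early changes are retained in the later isolate, which is the pattern expected if the segment was carried through the shift rather than replaced.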


the data presented here for the nonstructural gene are less extensive than those for the matrix gene, species specificity for the nonstructural gene has been proposed previously on the basis of base sequence homology measured in a hybridization assay (17). The complete sequence of RNA segment 8 from an avian virus, fowl plague, has been determined (13) and does not show a particular relationship to any of the human strains sequenced (Table 4).

The number of nucleotide changes which have accumulated in the RNA sequences in the period between the isolation of two naturally occurring viruses can be used as a measure of the rate of nucleotide divergence. Of particular interest is the comparison of the rate of nucleotide change in the two genes studied here (which do not code for surface proteins) with the rate of divergence of the genes for the two glycoprotein surface antigens, hemagglutinin (RNA segment 4) and neuraminidase (RNA segment 6). Variation in the antigenic properties of hemagglutinin and neuraminidase is responsible for the antigenic drift which occurs in natural influenza virus populations between shift events. Sequence data are available for RNA segment 4 from a number of H0N1 and H1N1 drift strains (3; Air, unpublished data) and for RNA segment 6 from both N1 and N2 drift strains (5). The numbers of nucleotide changes in RNA segments 4 and 6 over the period from 1934 to 1957 are 16 of 250 in RNA segment 4 (3) and 11 of 191 in RNA segment 6 (5). These numbers are comparable to the numbers of changes observed in RNA segment 7 (7 of 230) and RNA segment 8 (13 of 221) in the same period, and the proportions which give amino acid changes are similar.
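The comparison above can be put on a common scale by converting each count to a fraction of sites changed and dividing by the interval between isolations. This is a rough sketch rather than a rate estimate from the original analysis: it uses only the counts quoted in the text, applies no correction for multiple substitutions at one site, and the dictionary and function names are hypothetical.

```python
# Nucleotide changes between the 1934 and 1957 isolates: (changes, residues compared).
CHANGES_1934_1957 = {
    "RNA segment 4 (hemagglutinin)": (16, 250),
    "RNA segment 6 (neuraminidase)": (11, 191),
    "RNA segment 7 (matrix)": (7, 230),
    "RNA segment 8 (nonstructural)": (13, 221),
}
YEARS = 1957 - 1934


def fraction_changed(changes, residues):
    """Uncorrected proportion of compared sites that differ."""
    return changes / residues


fractions = {name: fraction_changed(*counts)
             for name, counts in CHANGES_1934_1957.items()}

for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.1f}% of sites, "
          f"{frac / YEARS:.1e} changes per site per year")

# Ratio of the most to the least divergent segment.
spread = max(fractions.values()) / min(fractions.values())
print(f"largest/smallest ratio: {spread:.1f}")
```

On these figures the four segments differ by a factor of about two, which is what "of the same order" means in this context.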
From this comparison it is possible to conclude that the rate of mutation for at least these four RNA segments is of the same order and, therefore, that no special mechanism is required to generate the high frequency of antigenic variants of hemagglutinin and neuraminidase which appear in natural populations of influenza virus.

ACKNOWLEDGMENTS

This work was supported in part by Public Health Service grant AI-15343 from the National Institute of Allergy and Infectious Diseases. We thank Anne Mackenzie and Sally Campbell for expert technical assistance. Avian myeloblastosis virus reverse transcriptase was kindly supplied by J. W. Beard.

LITERATURE CITED

1. Air, G. M. 1976. Amino acid sequences from the gene F (capsid) protein of bacteriophage φX174. J. Mol. Biol. 107:433-443.
2. Air, G. M. 1979. Nucleotide sequence coding for the signal peptide and N-terminus of the hemagglutinin from an Asian (H2N2) strain of influenza virus. Virology 97:468-472.
3. Air, G. M. 1980. Sequences from the 3' ends of influenza virus RNA segments, p. 135-146. In W. G. Laver and G. M. Air (ed.), Structure and variation in influenza virus. Elsevier/North-Holland Publishing Co., New York.
4. Air, G. M., and J. A. Hackett. 1980. Gene 8 of influenza virus: sequences of cDNA transcribed from the 3' ends of viral RNA of influenza A and B strains. Virology 103:291-298.
5. Blok, J., and G. M. Air. 1980. Comparative nucleotide sequences at the 3' end of the neuraminidase gene from eleven influenza type A viruses. Virology 107:50-60.
6. Both, G. W., and G. M. Air. 1979. Nucleotide sequence coding for the N-terminal region of the matrix protein of influenza virus. Eur. J. Biochem. 96:363-372.
7. Inglis, S. C., T. Barnett, C. M. Brown, and J. W. Almond. 1979. The smallest genome RNA segment of influenza virus contains two genes that may overlap. Proc. Natl. Acad. Sci. U.S.A. 76:3790-3794.
8. Lamb, R. A., and P. W. Choppin. 1979. Segment 8 of the influenza virus genome is unique in coding for two polypeptides. Proc. Natl. Acad. Sci. U.S.A. 76:4908-4912.
9. Lamb, R. A., P. W. Choppin, R. M. Chanock, and C.-J. Lai. 1980. Mapping of the two overlapping genes for polypeptides NS1 and NS2 on RNA segment 8 of influenza virus genome. Proc. Natl. Acad. Sci. U.S.A. 77:1857-1861.
10. Lamb, R. A., and C.-J. Lai. 1980. Sequence of interrupted and uninterrupted mRNAs and cloned DNA coding for the two overlapping nonstructural proteins of influenza virus. Cell 21:475-485.
11. Laver, W. G. 1969. Purification of influenza virus, p. 82-86. In K. Habel and N. P. Salzman (ed.), Fundamental techniques in virology. Academic Press, Inc., New York.
12. Laver, W. G., and R. Webster. 1979. Ecology of influenza viruses in lower mammals and birds. Br. Med. Bull. 35:29-33.
13. Porter, A. G., J. C. Smith, and J. S. Emtage. 1980. Nucleotide sequence of influenza virus RNA segment 8 indicates that coding regions for NS1 and NS2 proteins overlap. Proc. Natl. Acad. Sci. U.S.A. 77:5074-5078.
14. Robertson, J. S. 1979. 5' and 3' terminal nucleotide sequences of the RNA genome segments of influenza virus. Nucleic Acids Res. 6:3745-3757.
15. Sanger, F., and A. R. Coulson. 1978. The use of thin acrylamide gels for DNA sequencing. FEBS Lett. 87:107-110.
16. Scholtissek, C. 1978. The genome of the influenza virus. Curr. Top. Microbiol. Immunol. 80:139-169.
17. Scholtissek, C., and V. von Hoyningen-Huene. 1980. Genetic relatedness of the gene which codes for the nonstructural (NS) protein of different influenza A strains. Virology 102:13-20.
18. Skehel, J. J., and A. J. Hay. 1978. Nucleotide sequences at the 5' termini of influenza virus RNAs and their transcripts. Nucleic Acids Res. 5:1207-1219.
19. Udenfriend, S., S. Stein, P. Bohlen, W. Dairman, W. Leimgruber, and M. Weigele. 1972. Fluorescamine: a reagent for assay of amino acids, peptides and primary amines in the picomole range. Science 178:871-872.
20. Winter, G., and S. Fields. 1980. Cloning of influenza cDNA into M13: the sequence of the RNA segment encoding the A/PR/8/34 matrix protein. Nucleic Acids Res. 8:1965-1974.