Complete Nucleotide Sequence of the ... - Journal of Virology

0 downloads 0 Views 1MB Size Report
plasmid pBR322, and its nucleotide sequence has been determined. The influenza. B NP gene contains 1,839 nucleotides and codes for a protein of 560 amino ...
Vol. 47, No. 3

JOURNAL OF VIROLOGY, Sept. 1983, p. 642-648 0022-538X/83/090642-07$02.00/0 Copyright C 1983, American Society for Microbiology

Complete Nucleotide Sequence of the Nucleoprotein Gene of Influenza B Virus DAVID R. LONDO, ALAN R. DAVIS, AND DEBI P. NAYAK* Jonsson Comprehensive Cancer Center, and Department of Microbiology and Immunology, School of Medicine, University of California, Los Angeles, California 90024 Received 6 April 1983/Accepted 3 June 1983

A DNA copy of influenza B/Singapore/222/79 viral RNA segment 5, containing the gene coding for the nucleoprotein (NP), has been cloned in Escherichia coli plasmid pBR322, and its nucleotide sequence has been determined. The influenza B NP gene contains 1,839 nucleotides and codes for a protein of 560 amino acids with a molecular weight of 61,593. Comparison of the influenza B NP amino acid sequence with that of influenza A NP (A/PR/8/34) reveals 37% direct homology in the aligned regions, indicating a common ancestor. However, influenza B NP has an additional 50 amino acids at its N-terminal end. As is the case with influenza A NP, influenza B NP is a basic protein, with its charged residues relatively evenly distributed rather than clustered. The structural homology suggests functional similarity between the NP of influenza A and B viruses.

Influenza viruses are classified into type A, type B, or type C viruses based on the lack of serological cross-reactivity between their internal components, particularly nucleoprotein (NP) and matrix (M) proteins (39). All virus isolates belonging to a single type possess the crossreactive M and NP proteins, although their surface antigens, hemagglutinin (HA) and neuraminidase (NA), may vary drastically. In addition, there are major differences in the epidemiology and host range among these three types. For example, only type A viruses, although species specific, can infect and produce disease both in humans and in various animal species in nature. Type B and type C viruses, on the other hand, are primarily restricted to humans (see reference 21), although occasionally they have been isolated from animal hosts (41). Type A viruses are known to undergo both antigenic shift and drift, whereas B and C viruses are subject only to antigenic drift. Furthermore, the antigenic variation observed among type A viruses is more pronounced than that observed in type B viruses, which in turn undergo more variation than type C viruses (20, 21). Additionally, genetic exchange yielding stable viruses of mixed genotypes has not been observed to occur between viruses of different types (14). The molecular basis of these biological differences between the three types remains undefined. Clearly, an understanding of the structure of the genes as well as the structure and function of the gene products of all three types of viruses will be required before their differences in epidemiology and species specificity can be under-

stood. Recently, cDNA cloning and DNA sequencing have permitted the determination of complete nucleotide sequences and predicted amino acid sequences of protein products of all eight RNA segments from one or more type A strains (see reference 34). Similarly, sequences of the HA, NA, M, and nonstructural (NS) genes of type B viruses have been determined (3, 4, 16, 28, 35). However, little is known about the structure or function of the influenza B NP (B NP), although the primary structure of the NP gene of two influenza A strains has been determined (12, 33, 36). In both A and B types, NP appears to have a multifunctional role. First, it is the major component of the helical ribonucleoprotein (RNP) core (24), and it has been suggested that each influenza A NP (A NP) molecule may bind to ca. 20 bases (b) of RNA (5). Second, the RNP complexes in association with the polymerase proteins are active in transcription (2, 26). Temperature-sensitive mutants of influenza A virus defective in the NP gene fail to synthesize tnRNA or virion RNA (vRNA) in infected cells at nonpermissive temperatures (15, 30, 31). Also, monoclonal antibodies to A NP inhibit viral transcription in vitro, suggesting that NP may be involved in the process of transcription (40). Furthermore, NP may be important in binding the RNP complex to M protein and thus, may have an important role in the process of budding and virion assembly (23). Evidence of intracistronic complementation between temperature-sensitive mutants defective in the NP gene also supports its multifunctional character

642

NOTES

VOL. 47, 1983

643

B/Singapore/222/79 Insert l

pBR 322

nuclootides + strand S' - strand 3'-

Alu I Bgl II Dde I EcoR I Hinf I Hph I

100 200 3~O

460

a00 600 700 900 900 1000

1160

1200 1300

M600 150 60

1700 18 o

pBR 322 l- 7-' 1K00 II ~~~~~~~~~~~3' il

S'

-'--

I

o-

I

-

-

-4

I -

0-

I o

RsaI Sam 96 1Taq I -

44

I

FIG. 1. Endonuclease restriction enzyme cleavage map and strategy used for sequencing of the cloned B/Sing NP DNA. The insert DNA was characterized by multiple restriction endonuclease digestions, mapping the sites relative to one another. DNA sequencing was carried out by using 5' singly end-labeled fragments by the methods of Maxam and Gilbert (18, 19). Approximately 85 to 90% of both strands were sequenced, and all restriction sites used in sequencing were sequenced through an intact DNA segment cut at a different position. The sequence at the 5' end of the viral RNA segment was determined by using a primer extension procedure (11). Total influenza B vRNA was hybridized to a 5' singly end-labeled DNA fragment of clone B1-28. This primer was extended by reverse transcriptase, and the resulting cDNA was purified and sequenced. The solid line along the top shows the cloned insert, with the broken line representing sequences from pBR322. Listed along the left side are the restriction endonuclease sites which were used for labeling with T4 polynucleotide kinase, as indicated by vertical bars, and sequenced in the direction of the arrows. Both strands are also represented showing their overlapping sequenced regions. The asterisk shows the HinF I to Taql fragment used in the primer extension.

(31). In this report, nucleotide sequence predicted amino acid peptide of a type B

we present the complete of the NP gene and the sequence of the NP poly-

influenza virus (B/Singapore/222/79 [B/Sing]) and a comparison of the sequence homology and diversity with type A NP. Virus from the strain B/Sing was obtained from Alan Kendal, Centers for Disease Control, and grown in 10-day-old embryonated chicken eggs. Virus-specific genomic RNA was isolated from purified virions, enriched for specific RNA segments by fractionation in sucrose velocity gradients, and used for cDNA cloning (7). The genes of influenza B/Sing were cloned into the PstI restriction site of pBR322 by procedures previously described (7, 13). Briefly, the vRNA enriched in the 4th and 5th largest segments was reverse transcribed by using a synthetic dodecanucleotide primer (5'-dAGCAGAAGCAGA-3') complementary to the common 3' ends of influenza B vRNA segments (9). Full length cDNA copies were isolated on 1.4% alkaline agarose gels and converted into double-stranded DNA by using the foldback loop at the 3' end as a self-primer. The doublestranded DNA fragments were treated with S1 nuclease, size fractionated on neutral agarose gels, and cloned into the PstI site of pBR322 with deoxyguanine:deoxycytosine tailing. Esch-

erichia coli 294 was transformed with the recombinant plasmids and screened for tetracycline resistance and ampicillin sensitivity. Cloned DNAs were grouped by insert size and restriction enzyme analyses (7). Groups were identified by hybridization to specific vRNA segments. Clone B1-28, containing an insert of ca. 1.8 kilobases (kb), was identified as a clone of the B NP gene on the following basis. (i) It hybridized exclusively to segment 4 vRNA which has been shown to code for NP (22); (ii) segment 5 vRNA, on the other hand, known to code for the HA gene (22, 32) hybridized exclusively to DNA of clones B1-61 and B2-6 and not to the B1-28 DNA; and (iii) neither B1-28, B1-61, nor B2-6 hybridized to the vRNAs of segments 1, 2, 3 (polymerase genes), or 6, known to code for the NA gene (22, 32). Additionally, the nucleotide sequence of B1-61 insert DNA (unpublished data) is over 90% homologous to the reported sequence of the HA gene of influenza B/Lee/40 (16). A partial restriction endonuclease cleavage map of the cloned B NP gene was determined experimentally and is shown in Fig. 1. This served as the basis for preparing 5' end-labeled fragments used in DNA sequencing. The complete nucleotide sequence of the mRNA sense strand of the B NP gene is shown in Fig. 2. The first four nucleotides from the

644

J. VIROL.

NOTES

B/Sirigapore/222/79 + STRAND EI/S,ingi,ipore/222/79

FW

40

5' AGCAGAAGC.A(if)(.;(.,A'I'I'ITC.'T'+T(il(',Aof'+T'+I'(:(,f')GT'AC'f.'AACAAAAAA('T(;A(iAAI+(',AAA f)'T(,+ W(". AAC A'H3 GAI All' GAC N-Terminus

A/PR/8/34

100 160 140 ATC AAC.; ACT OGA ACA A'T'T GA( AAA ACA C`CA GAA GAA ATA A'Tl- TCT GGA ACU' o(+;I+ (36G (3CA A('(: A6A (To oT(- Alf', AC3,A CCA SCA ACC C1T -1 20 10 0

260 220 180 240 200 WiG AAA A.CC` C'.AA AAG. GCC` CCA CCA AGC AAC AAA CGA ACC CGG AAC CCA 'TCC CCIG GAA AGA GC'A ACC ACA AGC AG'Y' GAA OCT GAI (31C Pro-61,jr-A S U. 1- S C? T- C3 lu A L a A,, r, V i- 1, G) 1,3 A r,:I,-. L. .:is Th r GI r'. tlo 60 40 H F, t A I a-S F:. r G I r. 6 I i r T- r G I r G I r. M e t 61.-.j-- Thr-AsF-61L.,4 s A r -4

340 300 20 280 AAA CAG ACC' C'1CG ACA GAG ATA AAG AAG AGC GITC TAC AAT A'TG GTA GI'G AAA CJS G(iT GAA TTC TAC AAC CAG A ril (3 ATC; C`,TC` AAA GC-1 [36A e Met-Val-U.4s-Ala-61.,; Ttq r -Asr, Val-Va1--L-.iS--Le1_1 Lus 611's Thr-Pro hr-61i.i-Ile Lws--L,4s c-r4 Glil- Phe-T4r- Asrs 80 70 I 1 e G 1 .:t 6 1 .-.j I 1 e G 1,_4 Ar-I Phe-Tvr Ile Gl -Me Cvs--Thr--Gl-.j-L.e1j-Lvs A r,4 G I n Asn--Ala Thr-Glu-Ile AT'A-Ala 11 a Gl,=j-L-js30 40 20

90

440 400 420 360 380 CTC AAC GA'T GAC ATO GAG AGA AAC CTA ATC CAA AAT GCA CAT GCT GTG GAA AGA ATT CI'A TTG GCI' GCC ACT GAI' GAC AAG AAA ACT GAA AsF:-Lvs-L. vs-Thr-Ghj I le-Leki Ala Thr -j- r Ar-l-Asn e -Ile-Gln-A5n Ala-His-Ala-Val Asri-Asp Met 11-0 100 110 (31--i-Ar-4-Arg Ser Phe r- T v r Ct 1,4 A r!A L-e-i-I le-Glr-i-Asr-, Ser-Leu-Thr-I leMet-Val Ser 60 50 520 500 460 480 TTC CAG AAG AAA AAG AAT GCC' AGA GAT GI'C AAA GAA GGG AAA 6AA 6AA Al A GAC CAC AAC AAA ACA GCiA GGC AC`C, I"Tl' TAC, AACi ATG GTA His-Am-, 4 s --T t-i r G 1 .-.j G I .:i l'ti r Ph e 9I, L.:ls-Met A 1 a A rA A s, F, V a I L -i s G I u 6 I .-s L,..$ s 1; I -.f G'L i,j T I Phe--Gln--Lvs--L.4s-L,ds 150 140 I _3 0 Pr c) L., vs, --is-Thr G 1 v G 1,,; PT, o I 1 e T,:; T- A r .A -) T- I 90 80 0 600 580 560 540 AC`C ATG GGG AGA 6AI' GA'T' AAA ACC A'TC TAC rTC AGC CCI' ATA AGA Al'l' ACC TTT I"TA AAA GAA GAG G116 AAA ACA ATC, TAC AAA AC(.. J--Gl,1_1V,1 f) I 1-i T, - M e t I'fi r P ti e L e U 'LI Ar-A,.-Asp.-Asp-L.vs-Thr--Ile--'T'vr Phe-Ser-Pro.-Ile-Ar.-i 18( 170 160 I I e -1 Y, A T- G 1 r, A 1;.., A i:, i., A Leu-Tvr-Asp- L.,,-- D. 1`0 tio T,

640 AGT GAT GGC TTC: AOT Glt.4--Phe-Ser Ser 190 A 1 z-i ---T ti r A 1 a Asp 130

UG-A CIA

Blv-Leu

660 AA-T CAC ATA ATG ATT_6SG C Asri

Ile

Glv

Thr

Met

200 T rr-

r

700 680 CAG ATG AAC GAT GTC, TGT TTC` C'AA AGA TCA AAG GCA CTA AAA A6A 1 a-l- eu L.,3 fi Ph e Glr-,-Ars. se:+ 1. 9 s n- sp V a I Bln Met

Ala-Thr-T-ir-

Asr, -Leu-

760 740 .7-)o GTT GGA CT-T GAC CC'T TCA 'I'TA ATC AGT ACC TTT GCA GGA AGC ACA CTC CCC' AGA AGA T.CA GGT 6,':A r Thr-Phe-Ala :Ser-Thr-[+.c-i.j-F'ro--Ar-q--Arg-Spr---Glt-!-Ala r Val Ser.-Leu-Ile t- e u 230 220 A1a r Leu Me t Ci 1. G I v S e r Th r -L.e-_.r-+, Y. o A r 9 A rg -Se YThr Met Ar-i-Met-Cvs, 170

ril

GOA ACT TTA GTG GCA 19-Th r L. e u A la 250 lw-Thr MetMet 190

820 GCC ATT Ala Ile u

Leki-Val

9

TTT Ptie Met.

67A

T

G e

v

e

L.,j s

r

'T h

A T, .-_l A 1. a L,

Val.

150

140

840 6 C A ATG 6 CA Ala-Met-Ala G 1,3 I 1 e Asn 200

0(C) GG T (1 1 .-t va I

880 860 ATC AAA GCC' AAG GGG CJA TTG AGA GAC L,P U -I A s Alzi--Lvs Ile 0 I'h T, A r .-l A s r-i Ph e -Tr P G-_ I 9 G I -+j A s ri G 1 w A T, -J 210

G 1 .:i L e,, rg

780 ACT G(3 F 6 TT G CO A TC` A A A C; G, A isI'h r Tle If Val 240 V a 1. Ala 6 I m A la 180

fA T,,jj

-[A rj!j

940 960 920 900 ACG GCC TAT GAA AAG A'T'T CI'T CTG AAT CTG AAA AAC AAA TGC 'TCT GCG CCC CAA CAA AAr, GCI C'i A GTT GA'T CAA GTG_ AI'C GGA AGT AGA 1. e G I tl PTI_ C-js-Ser-Ala-Pro-61n Gln-Lvs-Ala Le'.1-Val- Asr,--Glri--Va u- Vs Asri Thr Ala-Tvr-Olli Lvs- I 1. e-Leu-Leu-Asn 30( 290 280 r- rg v s Phe-GIn-Thr-Ala-Ala Gln-L+-.i-_,--A1a Met Met-Al: .qs G1,j Ile Ala-Tvr-Oloi Arg-Met-Cvs-Asn-Ile 240 230 2 1060 1.040 101.10 1000 980 AAT CCA GOG ATT GCA 6AC: ATA GAA GAC CTA ACC CT6 CT1 GC-T CGA AGT A'16 GTC G'TT GT'T AGG CCC TCT G1'(3 GCG AGC AAA GTG GTG C Val-Val Asp-Ile 61-i-Asp-Leu- Ttir- Le-.1 P.-J-1AIa-Ar.1 Ser Met-Val-Val--k)PI Tfl Prc). Sc-r-Val .Ala-ser Asn r Ile 330 3.1( 310 Ser-Cjs 'A 61..j Ser..-Val A Ta. H i s a Glu-Phe Glu.-Asp-Lou-Thr F't-icAsp ro-. Asn 270 260 250

FIG. 2

synthetic dodecanucleotide primer used to re- obtained from the insert of clone Bl-28, which transcribe the virus-specific RNA were represents the complete coding region for the missing from the cloned DNA, presumably lost NP polypeptide. The sequence from nucleotides during manipulation of the transcribed DNA. 1,737 to 1,839 was obtained by the primer extenThe sequence from nucleotides 5 to 1,756 was sion of a DNA fragment (nucleotides 1,664 to verse

VOL. 47, 1983

NOTES

(CCC Al'A AGC(i( TT TAT

AFeeIie I-TT -CVs

CT

OA1 C ('1 (iCto

1'

I. Fr.

...C

,

645

it(iII

wiT!1;;(1iTPi

1

C)

19 I1160 TAT AAT ATG GCA ACA LI'T OTT I'CC AlA 'T"A OVA ATIi (CGA (AC GAT1 ((0 6,.A 10 Tvrt-A-rn-Met-Ala-ThrFePo- Va1 Se lie L ei A.I l Met-G.a-vO', c'c O-AI' L

Oi CIt1 '0C0 to3--eOiL

Leu-Ile Ar Fie -Asr.Ot -Hi,37 L-eu-Cilr.---Asrto--Ser--Glri-Val-TyeSe e t Alz '0'

IIIIIII

f77

-I-s-I III

1240 1260 1280 l10( f iI .V TI CT GCC TAT GAO GAC CTA AGA GTT TTG TCT GCA TTA ACA GGC ACA CAA TTC AAG CCIl AoA (CA C(A TIA AA(C h ci AI a a T-rGli Ap-Lu-Arl-Val-Leu-Ser Ala-Leu-Th- b-Tr eCit-Tj-he A. ",I Ale Tor Clo-Ase--Let 41 l ~~~~~~400 Al Fhe ii1i AsF,Ie-leeAr4-Vall-leu.-See Phe-Ilie-Lvs Cl-Th ro-Vt-e 111--1 I TiI liii(e i(i 340 350

.1

1440 1360 1 8)I0 iI 1-00'r ill GCA AAG GAG0 COAt GT6 GAA GGA ATG GOG GCA GCT CTG ATG TCC ATC 006 CTC CAG 111' 10G3 GCT' CCA AIL Oil iii iC A ilT Ale-lovsf Glri-Valf 6}ioMe Glv-Ala-Ala-Leu.j-Met-Ser-Ilre-L.o-s-Letj-Glr.--FPhe Itr---la Pro-c-t c-r-Ai.~rtCl

Asn-Metf ~jThr

e

Glui-Ser-Ser--Ther-Leu--CitG--L.. j-Atl-A-ei--Or- -Tvt LT 3480

370

1420 GTA

1440

OeI, Ile-A~-. TiArolSe. Gl 9

C

GCT 660 GAC GCA CGC TCT GGC COO ATA ACT TGC AliC CCA C'TC TTT GCA GTA COO 0(0 CI'T OTT 0C1

0 FI1G0

01' AA I(10ii ci II

'L"i

4604/L

Val-C1bo-Glo-Ase-lu-Gl-Ci-Seerfffl7-Gr.TIle-SerC-i%--Ser

FeVal

F1

l Ihe Alia (al 7 li-OlG

Asrt-G-ln-Glnrg-Al-Oa--Ser-Ala Clo--Clit-Ile-Ser Ile-ClriFeehThP Te 410 400

r Sc rlQ '1 lit tj

As-1,

)11

',

I

42

1540 1160 1520 1580l All ACA ATG CTG TCA ATG OAT OTT G00 COO CC!' GOT OCA CAT GTC 000 CGO OAT C(TA CTC 000 A10 ATG 0i'J CAC TCAtt )I a s-l ri~ Il l-l-r Atr --Atr -SCMi Areg-Met-Leu-Ser-Met

Al-l-h-h-l 430

'I

1480

1460

tI -

490 nTrGi-l-r

500 e-r-trG--l-l-r Me GMtIt-.ISee 44045

Tr-e

TO

oc ,--

TC lO;t

1 ---- -

PI I:c

1640 1620 1600 166011 (.ci- 1 ~Oi OAT G60 001 GCT TTC OTT 066 006 000 016 TTT COO ATA TCA GAC 000 AOL 000 0CC OAT CCC GCI I 00G A1 ITCA ,TI -,01-6( 11, tH r- speLos-Asnt-Los-Thr-Asn el Vai-G-j 01 i- Fri-1ti-- I e Glr-IleAsen-Glw-Asri-Ala helie Oi Lws-Lvs-Mlet ". 530 5I 20 C? 0.- 1I1e_ iaI-f'T. r l-LsAaAaSr GuLe Ai I lpGwAgGvVl Gl-s-a-e 4i-

470

460

7 16 1740 1720 1700 CCC OAT TTC TTC TTT GGG AGGOGAC ACA GCA GAC COT TOT GOT GAC CTC GOT TOT TAAAGCAACAAAATAGACACTATGACTGTGATT0TTIlTCA0T'ALGT1T The O-Gui Asp-Tye sp~ Ase-Leu-O-se-Tve Peo-Asn-Phe-Phe GF~ ly-Arg

II 1550111

Gluj-Gly-See-Tye+rtj Phe-Oly

s

490 1820

II

0~~~~~~~~60

Ole-See 4AspJ Aset 498 1839

GCAATGTGGGTGTTTACTCTTATGAATAATATAAAAACGCTG3TTGTTTCTAC-

3'

FIG. 2-Continuted 2. FIG. Complete nucleotide sequence of the NP gene (plus-sense strand) and the deduced amino acid sequence of the NP polypeptide of influenza B/Sing. The first 12 nucleotides represent the synthetic primer used to reverse transcribe the virus-specific RNA. Nucleotides are numbered from the 5t end of the plus-strand. Also shown in alignment with B NP is the amino acid sequence of A/PR/8/34 NP (36). Boxes are drawn to show regions of direct amino acid homology. Dashes indicate where deletions were placed to bring the two sequences into maximum alignment. The precise location of substitution, addition, or deletion of nucleotides is not known, therefore, amino acid deletions were placed in regions only to satisfy the criteria of maximum amino acid homology. Both amino acid sequences are numbered starting from their N-terminal end.

1,732) labeled at the 5t end. The B NP gene segment contains 1,839 nucleotides and has an open reading frame from the first initiation codon (AUG) at position 60 to 62 to a termination codon (UAA) at position 1,740 to 1,742. This open reading frame of 1,680 nucleotides could code for a polypeptide of 560 amino acids, leaving 97 and 59 noncoding nucleotides at the 3t and 5t ends, respectively, of the plus-sense DNA strand. There are no other open reading frames in either strand which could code for more than 125 amino acids. A tract of five

adenine residues at position 1,819 to 1,823 probably represents the polyadenylation site for the virus mRNAs as has been reported for influenza A viruses (25). The B NP gene has a calculated base composition of 35.2% uridine, 22.3% adenine, 22.9% cytosine and 19.6% guanine, reflecting the fact that adenine is more commonly used in the third codon positions than guanine. The frequency of the cytosine-guanine dinucleotide as well as the use of cytosine-guanine-containing codons is low, as is commonly seen in other virus and eukaryotic genes (29).

646

NOTES

J. VIROL.

TABLE 1. Amino acid composition of A and B NP Amino acids

Frequency B/Sing (%)

(%)

A/PR/8/34 (%)"

Avg protein (%)a

Ala Arg Asn Asp

46 (8.2) 39 (7.8) (8.6) 31 (5.5) 49 (9.8) (4.9) 27 (4.8) 25 (5.0) (4.3) 34 (6.1) 23 (4.6) (5.5) Cys 5 (0.9) 6 (1.2) (2.9) Gln 18 (3.2) 21 (4.2) (6.0) Glu 29 (5.2) 36 (7.2) (3.9) Gly 45 (8.0) 41 (8.2) (8.4) His 5 (0.9) 6 (1.2) (2.0) lie 38 (6.8) 26 (5.2) (4.5) Leu 38 (6.8) 32 (6.4) (7.4) Lys 50 (8.9) 21 (4.2) (6.6) Met 27 (4.8) 25 (5.0) (1.7) Phe 22 (3.9) 18 (3.6) (3.6) Pro 25 (4.5) 17 (3.4) (5.2) Ser 38 (6.8) 39 (7.8) (7.0) Thr 35 (6.3) 29 (5.8) (6.1) Trp 1 (0.2) 6 (1.2) (1.3) Tyr 13 (2.3) 15 (3.0) (3.4) Val 33 (5.9) 24 (4.8) (6.6) Total 560 (100) 498 (100) (100) "Amino acid compositions of A/PR/8/34 NP and an average protein were obtained from Winter and Fields (36) and Dayhoff et al. (8), respectively.

The B/Sing NP gene is 274 nucleotides longer than the A NP gene (1,839 versus 1,565; see reference 36). These extra nucleotides are present in both the 5' noncoding (97 b versus 45 b) and the 3' noncoding (59 b versus 26 b) regions as well as in the coding region (1,680 b versus 1,494 b) of the B NP gene. The sequence data show that RNA segments of influenza B virus, excluding the three unstudied polymerase RNA segments, are all larger than the corresponding RNA segments of influenza A virus. However, TABLE 2. Amino acid homology between regions of B NP and A/PR/8 NP" Amino acid region of B NP

% Homology to A/PR/8 NP

51-150 ............................ 31.5 151-250 ............................ 45.8 251-350 ............................ 46.0 351-450 ............................ 38.0 451-550 ............................ 25.8 a The sequence homology was determined for every 100 amino acids beginning with amino acid residue 51 (Fig. 2). Since amino acids corresponding to the Nterminal 50 amino acids of B/Sing NP are not present in A/PR/8 NP, they are not included in the calculation. Also unused were amino acids corresponding to areas of addition or deletion created to obtain optimum alignment of the two amino acid sequences.

since these extra sequences are present in most cases in both coding and noncoding regions, their significance remains unclear. These data also show that the segment 4 RNA of B/Sing encodes the NP gene and, therefore, agree with the gene assignment of three other influenza B strains, namely B/Lee/40, B/Mass/1/71, and B/Md/59 (24). Furthermore, our data also show that the B/Sing NP gene (1,839 nucleotides) is smaller than the B/Lee HA gene (1,882 nucleotides) as was predicted by the migration pattern of glyoxalated RNAs (22). The predicted B/Sing NP polypeptide is 560 amino acids in length (Fig. 2), with a calculated Mr of 61,593, which is slightly less than the 66,000 estimated in gels (22). Of the 560 amino acids of B NP, 86 residues are basic (Arg, 31; Lys, 50; His, 5) and 63 residues are acidic (Asp, 34; Glu, 29; Table 1). B NP is a basic protein with a net charge of +20.5 at pH 7.0, assuming a charge of + 1 for each arginine and lysine, -1 for each glutamic acid and aspartic acid, and +0.5 for each histidine, and, therefore, possesses a greater net positive charge than A NP, with a charge of +14 (36). The amino acid composition (Table 1) shows that B NP is remarkably rich in lysine and methionine but poor in histidine, cysteine, and tryptophan when compared with that of a statistically average protein (8). When compared with A/PR/8/34 NP (36), the B NP polypeptide is 62 amino acids longer than A NP (Fig. 2). Seven regions of deletion or addition or both of amino acids were required for optimum alignment of the two sequences, leaving influenza B NP with an extra three residues at its Cterminal end and 50 residues at its N-terminal end. The amino acid composition (Table 1) shows that both A and B NP are rich in basic amino acids and in methionine but low in cysteine as expected for soluble globular proteins. Only one of the cysteine residues appears conserved between B NP (amino acid 389) and A NP (amino acid 333), supporting the finding of Selimova et al. that disulfide bonds are not important in the formation of the secondary structure of NP (27). This is in sharp contrast to the conservation of cysteine residues between influenza A and B viruses in the HA and NA proteins (A. R. Davis, T. J. Bos, and D. P. Nayak, Proc. Natl. Acad. Sci., in press; 16, 28). The distribution and composition of basic amino acids are different between A and B NP. For example, B NP is rich in lysine and low in arginine, whereas the reverse is true for A NP. Both A and B NPs are low in histidine, and B NP contains only one tryptophan residue. It remains to be seen whether this divergence in the composition of basic amino acids between B and A NP has any structural or functional significance or provides the basis for any differences in biologi-

VOL. 47, 1983

cal specificity. As with A NP, B NP does not possess large clusters of charged residues. Rather, these are evenly distributed along the length of the polypeptide, suggesting multiple contact points between each NP molecule and the viral RNA. Since B NP has a greater net positive charge than A NP, it may bind more tightly to the negatively charged vRNA. Relative nuclease sensitivity of B RNP versus A RNP may determine whether the greater positive charge in B NP provides an increased protection of viral RNA segments by B NP. Comparison of the amino acid sequence homology between B and A NP shows that some parts of the sequence are remarkably conserved, whereas other areas are quite divergent. The conservation of sequence is most striking in the central region of the protein (amino acid 150 to 350) which contains nearly 50% direct homology (Table 2). This region also contains a direct 10 amino acid residue homology (amino acid 230 to 239; B NP) between A and B NPs. This homology is remarkable because such a large stretch of homology is not present even in the highly conserved hydrophobic region at the N terminus of HA 2 of A and B HAs (16, 37). This region then may provide some critical structure for the function of both A and B NP, such as a nucleation point in RNP formation or a critical contact with RNA or with M or polymerase (P) proteins. The entire aligned region (amino acids 51 to 557) contains 37% direct homology (184/492 residues). Since 43% of the differences (133/308 residues) are of conservative nature, i.e., basic, acidic, polar, or nonpolar amino acid replaced by similar residue, the combined structural homology between A and B NP is 64%. The most remarkable difference in the primary structure of A and B NP is the presence of 50 additional amino acids at the N terminus (Fig. 2), making it unique in that respect among the influenza virus proteins, except for influenza B NS1, which is 51 amino acids longer than influenza A NS1 (1, 38). Whether these extra sequences are involved in providing functional type specificity remains to be seen. The proteins coded by the five genes (HA, NP, NA, M, and NS) of influenza B virus that have been completely sequenced show that influenza B proteins possess structural features similar to those of influenza A viral proteins, attesting for their functional similarity. Among these viral proteins, NP shows the greatest amino acid homology between the types. However, diversity in the sequence between influenza A and B proteins is far greater than that observed among the corresponding proteins of influenza A virus strains, suggesting that influenza A and B viruses have undergone similar but independent evolutionary pathways and that

NOTES

647

none of these five RNA segments have been recently acquired by reassortment between influenza A and B viruses. Since stable recombinants between A and B viruses have not been observed even under laboratory conditions (14), it is likely that these viral proteins are functionally incompatible with the genes and gene products of another type. Furthermore, since envelope proteins can form pseudotypes with divergent viruses (6), the functional type specificity is likely to be provided by the internal components such as polymerases and NPs, etc. Since active influenza proteins, including NPs, now can be expressed from cloned cDNAs in eukaryotic cells (17), the function of these proteins as well as chimeric proteins (constructed by joining two or more DNAs) can now be tested by intertype complementation and should provide insight into their functional specificity and incompatibility with each other. We thank N. Sivasubramanian for his helpful discussion and advice. This paper was supported by grant PCM 7823220 from the National Science Foundation, by Public Health Service grant RO1 A116348 from the National Institute of Allergy and Infectious Diseases, and by a biological research support grant BRSG RR 05304 from the National Institutes of Health. LITERATURE CITED 1. Baez, M., R. Taussig, J. J. Zazra, J. F. Young, P. L. Palese, A. Reisfeld, and A. M. Skalka. 1980. Complete nucleotide sequence of the influenza A/PR/8/34 virus NS gene and comparison with the NS genes of the A/Udorn/72 and A/FPV/Rostock/34 strains. Nucleic Acids Res. 8:5845-5859. 2. Bishop, D. H. L., P. Roy, W. J. Bean, Jr., and R. W. Simpson. 1971. Transcription of the influenza ribonucleic acid genome by a virion polymerase. III. Completeness of the transcription process. J. Virol. 10:689-697. 3. Briedis, D. J., and R. A. Lamb. 1982. Influenza B virus genome: sequences and structural organization of RNA segment 8 and the mRNAs coding for the NS, and NS, proteins. J. Virol. 42:186-193. 4. Briedis, D. J., R. A. Lamb, and P. W. Choppin. 1982. Sequence of RNA segment 7 of the influenza B virus genome: partial amino acid homology between the membrane proteins (M,) of influenza A and B viruses and conservation of a second open reading frame. Virology 116:581-588. 5. Choppin, P. W., and R. W. Compans. 1975. The structure of influenza virus, p. 15-51. In E. D. Kilbourne (ed.), The influenza viruses and influenza. Academic Press, Inc., New York. 6. Compans, R. W., M. G. Roth, and F. V. Alonso. 1981. Directional transport of viral glycoproteins in polarized epithelial cells, p. 213-231. In D. P. Nayak (ed.), Genetic variation among influenza viruses. Academic Press, Inc., New York. 7. Davis, A. R., A. L. Hiti, and D. P. Nayak. 1980. Construction and characterization of a bacterial clone containing the hemagglutinin gene of the WSN strain (HON1) of influenza virus. Gene 10:205-218. 8. Dayhoff, M. O., L. T. Hunt, and S. Hurst-Calderone. 1978. Composition of proteins, p. 363-373. In M. 0. 0. Dayhoff (ed.), Atlas of protein sequence and structure, vol. 5, suppl. 3. National Biomedical Research Foundation, Washington, D.C.

648

NOTES

9. Desselberger, U., V. R. Racaniello, J. J. Zazra, and P. Palese. 1980. The 3' and 5'-terminal sequences of influenza A, B and C virus RNA segments are highly conserved and show partial inverted complementarity. Gene 8:315328. 10. Fields, S., and G. Winter. 1982. Nucleotide sequences of influenza virus segments 1 and 3 reveal mosaic structure of a small viral RNA segment. Cell 28:303-313. 11. Hiti, A. L., A. R. Davis, and D. P. Nayak. 1981. Complete sequence analysis shows that the hemagglutinins of the HO and H2 subtypes of human influenza virus are closely related. Virology 111:113-124. 12. Huddleston, J. A., and G. G. Brownlee. 1982. The sequence of the nucleoproteins gene of human influenza A virus, strain A/NT/60/68. Nucleic Acids Res. 10:10291038. 13. Kaptein, J. S., and D. P. Nayak. 1982. Complete nucleotide sequence of the polymerase 3 gene of human influenza virus A/WSN/33. J. Virol. 42:55-63. 14. Kilbourne, E. D. 1975. The influenza viruses and influenza-an introduction, p. 1-13. In E. D. Kilbourne (ed.), The influenza viruses and influenza. Academic Press, Inc., New York. 15. Krug, R. M., M. Ueda, and P. Palese. 1975. Temperaturesensitive mutants of influenza WSN virus defective in virus-specific RNA synthesis. J. Virol. 16:790-796. 16. Krystal, M., R. M. Elliott, E. W. Benz, Jr., J. F. Young, and P. Palese. 1982. Evolution of influenza A and B viruses: conservation of structural features in the hemagglutinin genes. Proc. Natl. Acad. Sci. U.S.A. 79:48004804. 17. Lin, B. C., and C. J. Lai. 1983. The influenza virus nucleoprotein synthesized from cloned DNA in a simian virus 40 vector is detected in the nucleus. J. Virol. 45:434438. 18. Maxam, A. M., and W. Gilbert. 1977. A new method for sequencing DNA. Proc. Natl. Acad. Sci. U.S.A. 74:560564. 19. Maxam, A. M., and W. Gilbert. 1980. Sequencing endlabelled DNA with base-specific chemical cleavages. Methods Enzymol. 65:499-560. 20. Palese, P., R. M. Elliott, M. Baez, J. J. Zazra, and J. F. Young. 1981. Genome diversity among influenza A, B, and C viruses and genetic structure of RNA 7 and RNA 8 of influenza A viruses, p. 127-140. In D. P. Nayak (ed.), Genetic variation among influenza viruses. Academic Press, Inc., London. 21. Palese, P., and J. F. Young. 1982. Variation of influenza A, B, and C viruses. Science 215:1468-1474. 22. Racaniello, V. R., and P. Palese. 1979. Influenza B virus genome: assignment of viral polypeptides to RNA segments. J. Virol. 29:361-373. 23. Rees, P. J., and N. J. Dimmock. 1981. Electrophoretic separation of influenza virus ribonucleoproteins. J. Gen. Virol. 53:125-132. 24. Rees, P. J., and N. J. Dimmock. 1981. Electrophoretic analysis of influenza virus ribonucleoproteins from purified virus and infected cells, p. 341-344. In D. H. L. Bishop and R. W. Compans (ed.), The replication of negative strand viruses. Proceedings of the International Symposium of Negative Strand Viruses. Elsevier/NorthHolland, Amsterdam.

J. VIROL. 25. Robertson, J. S., M. Schubert, and R. A. Lazzarini. 1981. Polyadenylation sites for influenza virus mRNA. J. Virol. 38:157-163. 26. Rochovansky, 0. M. 1976. RNA synthesis by ribonucleoprotein-polymerase complexes isolated from influenza virus. Virology 73:327-338. 27. Selimova, L. M., V. M. Zaides, and V. M. Zhdanov. 1982. Disulfide bonding in influenza virus proteins as revealed by polyacrylamide gel electrophoresis. J. Virol. 44:450457. 28. Shaw, M. W., R. A. Lamb, B. W. Erickson, D. J. Briedis, and P. W. Choppin. 1982. Complete nucleotide sequence of the neuraminidase gene of influenza B virus. Proc. NatI. Acad. Sci. U.S.A. 79:6817-6821. 29. Subak-Sharpe, J. H. 1967. Base doublet frequence patterns in the nucleic acid and evolution of viruses. Br. Med. Bull. 23:161-168. 30. Thierry, F., and 0. Danos. 1982. Use of specific single stranded DNA probes cloned in M13 to study the RNA synthesis of four temperature-sensitive mutants of HK/68 influenza virus. Nucleic Acids Res. 10:2925-2938. 31. Thierry, F., S. B. Spring, and R. M. Chanock. 1980. Localization of the Ts defect in two ts mutants of influenza A virus: evidence for the occurrence of intracistronic complementation between ts mutants of influenza A virus coding for the neuraminidase and nucleoprotein polypeptides. Virology 101:484-492. 32. Ueda, M., K. Tobita, A. Sugiura, and C. Enomoto. 1978. Identification of hemagglutinin and neuraminidase genes of influenza B virus. J. Virol. 25:685-686. 33. Van Rompuy, L., W. Min Jou, D. Huylebroeck, R. Devos, and W. Fiers. 1981. Complete nucleotide sequence of the nucleoprotein gene from the human influenza strain AIPR/8/34 (HON1). Eur. J. Biochem. 116:347-353. 34. Webster, R. G., W. G. Laver, G. M. Air, and G. C. Schild. 1982. Molecular mechanisms of variation in influenza viruses. Nature (London) 296:115-121. 35. Winter, G., and S. Fields. 1980. Cloning of influenza cDNA into M13: the sequence of the RNA segment encoding the A/PR/8/34 matrix protein. Nucleic Acids Res. 8:1965-1974. 36. Winter, G., and S. Fields. 1981. The structure of the gene encoding the nucleoprotein of human influenza virus A/PR18/34. Virology 114:423-428. 37. Winter, G., S. Fields, and G. G. Brownlee. 1981. Nucleotide sequence of the hemagglutinin gene of a human influenza virus Hi subtype. Nature (London) 292:72-75. 38. Winter, G., S. Fields, M. J. Gait, and G. G. Brownlee. 1981. The use of synthetic oligodeoxynucleotide primers in cloning and sequencing segment 8 of influenza virus (A/PR/8/34). Nucleic Acids Res. 9:239-244. 39. World Health Organization. 1971. Report: a revised system of nomenclature for influenza viruses. Bull. W.H.O. 45:119-124. 40. van Wyke, K. L., W. J. Bean, Jr., and R. G. Webster. 1981. Monoclonal antibodies to the influenza A virus nucleoprotein affecting RNA transcription. J. Virol. 39:313-317. 41. Yuanji, G., J. Fengen, W. Ping, W. Min, and Z. Jiming. 1983. Isolation of influenza C virus from pigs and experimental infection of pigs with influenza C virus. J. Gen. Virol. 64:177-182.