

Journal of General Virology (1992), 73, 2067-2077. Printed in Great Britain

Complete nucleotide sequences of two soybean mosaic virus strains differentiated by response of soybean containing the Rsv resistance gene

Ch. Jayaram, John H. Hill* and W. Allen Miller

Department of Plant Pathology, Iowa State University, Ames, Iowa 50011, U.S.A.

The complete nucleotide sequences of the genomic RNAs of strains G2 and G7 of soybean mosaic virus were determined. In both cases, the genome is 9588 nucleotides long, excluding the 3'-terminal poly(A) sequence. A large open reading frame (nucleotides 132 to 9329) encodes a polyprotein of 3066 amino acids with a predicted Mr of either 349542 (strain G2) or 349741 (strain G7). Based on comparison with the proposed locations of cleavage sites of other potyvirus polyproteins, nine mature proteins are predicted. The mature proteins of the two strains share 94 to 100% amino acid identity, with the greatest variability occurring in the 35K and 42K proteins. Differences in local net charge in portions of these proteins, as well as differences in amino acid sequence throughout the genome, are discussed in relation to resistance and susceptibility of host plants to strains G2 and G7. Comparison with other potyviruses may be useful for taxonomic clarification of viruses and strains.

Introduction

Soybean mosaic virus (SMV), a member of the potyvirus group of plant viruses (Hollings & Brunt, 1981), is the cause of one of the most widespread viral diseases of soybean. Several strains of the virus have been identified on the basis of both phenotypic response of differential soybean lines (Buzzel & Tu, 1984; Chen et al., 1988; Cho & Goodman, 1979) and transmission by aphid species (Lucas & Hill, 1980). Like other potyviruses, SMV genomic RNA encodes a large precursor polyprotein that is processed by a virus-encoded protease(s) (Ghabrial et al., 1990) to yield several proteins (Vance & Beachy, 1984a, b). Unlike those potyviruses whose genomes have been extensively characterized [i.e. tobacco etch virus (TEV; Allison et al., 1986), potato virus Y (PVY; Robaglia et al., 1989), tobacco vein mottling virus (TVMV; Domier et al., 1986) and plum pox virus (PPV; Maiss et al., 1989)], SMV is seed-borne (Hill et al., 1980) and its genome structure had not been fully characterized.

Host resistance can occur by interruption of the virus life cycle at one or more of several stages. Siegel (1979) identified six such steps upon which resistance could act. These are (i) entry into the cell, (ii) uncoating of the nucleic acid, (iii) translation of viral proteins, (iv) replication of the viral nucleic acid, (v) assembly of progeny virions and (vi) spread of the virus, both to new cells and new hosts. At the molecular level, evidence has supported resistance mechanisms involving alterations in the function of virus-encoded protease, movement or replicase proteins. One of the best characterized examples of host resistance is that of the cowpea cultivar Arlington to cowpea mosaic virus. In vitro studies suggested that Arlington leaves contain a protease inhibitor that inhibits proteolytic processing of a virus-encoded polyprotein (Sanderson et al., 1985). A translation inhibitor, although not specific to the viral RNA, may also be involved in the resistance mechanism (Ponz et al., 1988). A different resistance mechanism involves blocking of cell-to-cell movement from the initial site of replication. A single Mr 30 000 (30K) protein encoded by the tobacco mosaic virus (TMV) genome has been identified which potentiates cell-to-cell movement of TMV in tobacco plants (Deom et al., 1987; Meshi et al., 1987). It has been speculated that the 30K protein is either not expressed properly or is unable to act on cells in plants which do not support systemic movement of TMV (Moser et al., 1988; Taliansky et al., 1982).

At least three genes, Rsv, Rsv2 and Rsv3, confer resistance to various strains of SMV (Buzzel & Tu, 1984; Kihl & Hartwig, 1979; Lim, 1985). However, the resistance conferred by each gene can be overcome by different strains. For example, strain G7 overcomes resistance conferred by the Rsv gene in the soybean line PI 96983. However, several other SMV strains, namely G1 to G6, do not overcome resistance conferred by this gene (Lim, 1985). To improve understanding of the resistance mechanism conferred by the Rsv gene, the genomes of strains G2 (unable to induce disease in plants containing the Rsv gene) and G7 (able to induce disease in plants containing the Rsv gene) have been sequenced, and their amino acid sequences derived. The potential pathogenic relevance of differences found between genomic sequences is described here. The sequence data are consistent with a genome organization similar to that of other potyviruses (see review by Riechmann et al., 1992) and have relevance to potyvirus taxonomy.

0001-0752 © 1992 SGM

Methods

Virus purification, RNA isolation, cDNA synthesis and cloning. The origins of strains G2 and G7 of SMV and their purification have been described (Hill & Benner, 1980a; Hill et al., 1989). Viral RNA was isolated from purified virions according to the method of Vance & Beachy (1984a). cDNA was synthesized by the method of Gubler & Hoffman (1983), using a modified kit (Pharmacia), and cloned into a pGEM3Zf(+) vector (Promega). Initially, a random-primed cDNA library was constructed using the RNA of strain G7. Approximately 50 clones were sequenced and mapped to different regions of the genome by comparison with published potyviral sequences. From the data, four oligonucleotide primers were synthesized for use in cloning the different regions of the genomes of strains G2 and G7.

cDNA sequencing. The cDNA clones were sequenced both manually and with an Applied Biosystems 370A automated DNA sequencing system using the dideoxynucleotide method (Sanger et al., 1977) and Taq polymerase (Promega). Overlapping cDNA clones of different sizes were used to eliminate almost completely the need for subcloning. Every base was determined by sequencing at least two independent clones or by sequencing twice from a single clone.

RNA sequencing and 5' end determination. RNA was sequenced directly using a modified procedure of Mierendorf & Pfeffer (1987). In a volume of 10 µl, a mixture of 1 µg of viral RNA and 10 pmol of a 25-mer primer, which anneals between bases 66 and 90 at the 5' end, was heated for 3 min at 75 °C and allowed to cool to 42 °C. Two µl (16 units) of avian myeloblastosis virus reverse transcriptase (Promega) and 4 µl of [α-32P]dATP (400 Ci/mmol) were added to the mixture. A 3 µl aliquot was removed and added to 3 µl of a solution containing 250 µM each of dCTP, dGTP and dTTP and one of the four dideoxyNTPs (172 µM-ddCTP, 15.3 µM-ddATP, 1 mM-ddTTP or 250 µM-ddGTP). The reactions were incubated at 42 °C for 15 min, after which 1 µl of chase solution containing 2 mM each of all four dNTPs and 2.5 units of terminal deoxynucleotidyl transferase (BRL) was added, and the mixture was incubated at 42 °C for an additional 15 min.

Nucleotide sequence alignment and data analysis. Sequence data were compiled, aligned and analysed using sequence analysis software from the Genetics Computer Group (version 6.0; Madison, Wis., U.S.A.) and an IBM-compatible program by W. R. Bottomley (CSIRO, Division of Plant Industry, Canberra, Australia).

Results

Phenotype of soybean plants inoculated with virus strains

Soybean lines PI 96983 and Williams '82 responded differently to mechanical inoculation with SMV strains G2 and G7 (Fig. 1). PI 96983, containing the resistance gene Rsv, was resistant to strain G2, but systemic necrosis developed in plants inoculated with strain G7. Systemic mottling occurred when Williams '82, which lacks the Rsv gene, was inoculated with either strain.

Nucleotide sequence analysis of SMV strains G2 and G7

The SMV G2 and G7 cDNA inserts from overlapping sets of 42 and 51 cDNA clones, respectively, were chosen for nucleotide sequence analysis. These cDNA inserts cover the entire genomes of G2 and G7 except the 5'-most 27 and 25 nucleotides (strains G2 and G7, respectively), which were determined by direct RNA sequencing.

Genome organization of SMV RNA

The genomic RNA of both strains of SMV is 9588 nucleotides (nt) long, excluding the 3'-terminal poly(A) sequence (Fig. 2). This is comparable to the genomes of TVMV (9471 nt; Domier et al., 1986), TEV (9495 nt; Allison et al., 1986), PVY (9704 nt; Robaglia et al., 1989) and PPV (9741 nt; Maiss et al., 1989). The base composition of both strains was 32% adenine, 24% guanine, 18% cytosine and 26% uracil, in agreement with previous observations for G2 (Hill & Benner, 1980b). The base composition is nearly identical to that of TVMV (Domier et al., 1986). Computer translation of the RNAs and their complements revealed a single, large open reading frame (ORF) beginning at the first AUG on the genome (base 132) and terminating with a UAA codon at position 9330. This differs from PPV and TVMV, which appear to initiate translation of the polyprotein at the second and third AUG codons, respectively, in the genome. The large ORF of SMV encodes a 3066 amino acid polyprotein with Mrs of 349542 (G2) or 349741 (G7) (Fig. 3).
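The ORF coordinates quoted above can be checked with simple arithmetic: an ORF running from base 132 to base 9329 spans 9198 nt, or 3066 codons, with the UAA stop codon beginning at base 9330. The following minimal Python sketch illustrates that check and a naive scan for the ORF opening at the first AUG. The function names and the toy sequence are hypothetical, introduced only for illustration; this is not the software used in the study.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code


def orf_codons(first_base: int, last_base: int) -> int:
    """Number of sense codons in an ORF given 1-based inclusive coordinates."""
    length = last_base - first_base + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def first_aug_orf(rna: str):
    """Scan the frame opened by the first AUG.

    Returns (start, end, n_codons) with 1-based positions, where `end` is
    the last base before the stop codon, or None if no AUG or no in-frame
    stop codon is found.
    """
    seq = rna.upper().replace("T", "U")
    start = seq.find("AUG")
    if start < 0:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start + 1, i, (i - start) // 3
    return None


# Coordinates reported in the text for SMV G2 and G7.
print(orf_codons(132, 9329))

# Hypothetical toy sequence, not taken from the SMV genome.
print(first_aug_orf("GGAUGAAACCCUAAGG"))
```

The scan deliberately starts at the first AUG only, which matches the SMV case described in the text; for PPV or TVMV, where translation appears to start at a later AUG, the starting index would have to be chosen differently.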

Fig. 1. Phenotypic responses of Williams '82 and PI 96983 soybeans to inoculation with strains G2 and G7 of SMV. Williams '82 plants inoculated with (a) G2 and (b) G7 were susceptible and showed systemic mosaic, whereas PI 96983 was immune to (c) G2 but systemic necrosis occurred in plants inoculated with (d) G7.

Polyprotein cleavage sites

Based upon the proposed locations of cleavage sites and sizes of predicted mature (fully processed) proteins of TEV, TVMV, PVY and PPV (Carrington et al., 1989; Domier et al., 1986; Dougherty & Parks, 1991; Dougherty et al., 1988; Ghabrial et al., 1990; Maiss et al., 1989; Parks & Dougherty, 1991; Robaglia et al., 1989), and based upon alignments of amino acid sequences for each protein, nine mature proteins are predicted for SMV (Fig. 4). At least five sites are cleaved by the nuclear inclusion (NI) protein a (NIa) (27K) protease (Parks & Dougherty, 1991). The consensus cleavage site for this protease from SMV G6 is (E/N)XVXXQ'(G/S) (Ghabrial et al., 1990). [Amino acids in parentheses represent alternatives at that position relative to the cleavage site, which is shown by the ' symbol. X represents any amino acid.] All sites in Fig. 4 that contain a Q are those predicted to be cleaved by the 27K protease. Cleavage between amino acids 2041 and 2042 is a late event separating the VPg (viral protein-genome linked) (21K) from the protease (27K) (Dougherty & Parks, 1991). Although this cleavage has been shown only for TEV, these authors showed a consensus sequence of (E/Q)(D/E/R)(L/V)XXE'(G/S/A)(E/K)(S/A)(L/V) at this site among known potyviruses.

Carrington et al. (1989) identified the cleavage site G'G at the C terminus of the helper component, which catalyses its own cleavage from the polyprotein at this site. Thus, this region is designated HC-PRO (helper component-protease). The N-terminal protein (35K in SMV) also serves as a protease to cleave itself from the polyprotein (Verchot et al., 1991). A consensus of (Y/F)'S has been reported by Mavankal & Rhoads (1991). Full-length sequences that were published before this information was known used Q'S(G/S/A) (the NIa protease consensus) as the cleavage site for all the mature proteins. This led to some predictions of different termini, which we have revised in the alignments of the 35K and 42K ORFs (Fig. 5).
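A consensus such as (E/N)XVXXQ'(G/S) maps directly onto a regular expression, which gives a quick way to list candidate NIa sites in a polyprotein sequence. The sketch below is a hypothetical illustration only: the function name and the toy peptide are invented here, and a pattern match is merely a candidate site, not evidence of cleavage.

```python
import re

# (E/N)XVXXQ'(G/S) as a zero-width lookahead so that overlapping
# candidates are all reported. The scissile bond follows the Q.
NIA_SITE = re.compile(r"(?=[EN].V..Q[GS])")


def nia_cleavage_positions(protein: str):
    """Return 1-based positions of the Q residue preceding each candidate
    NIa cleavage site (cleavage would fall between Q and the next residue)."""
    return [m.start() + 6 for m in NIA_SITE.finditer(protein.upper())]


# Hypothetical toy peptide, not part of the SMV polyprotein.
toy = "MAAEAVKLQGTTTNSVHHQSAA"
print(nia_cleavage_positions(toy))
```

The other consensus sequences quoted in the text, such as (Y/F)'S for the N-terminal protease or G'G for HC-PRO, could be written the same way, although such short patterns would match many positions by chance and would need positional context to be useful.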

Fig. 2. Nucleotide sequences of SMV strains G2 and G7. The full sequence of the G2 strain is shown. Bases of the G7 sequence that differ from G2 are shown below the G2 sequence. [Sequence listing, nucleotides 1 to 9588, not reproduced.]

Table 1. Percentage amino acid sequence identity of predicted mature proteins of SMV strain G7 with those of other potyviruses including SMV strain G2

               SMV G7*
Virus    35K  HC-PRO  42K  CIP   6K  21K  27K  POL   CP
G2        94      98   94   96  100   99   99   98   99
PPV       14      44   29   72   32   53   48   60   51
TVMV       6      45   13   51   32   45   47   54   52
PVY       13      37   17   52   32   45   35   56   64
TEV        9      43   25   51   45   46   34   55   60

* Abbreviations are as designated in the text.
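Identity values like those in Table 1 are obtained by counting matching positions in a pairwise alignment. The following Python sketch shows one common way to do this. It assumes the two sequences are already aligned to equal length and that columns containing a gap are excluded from the denominator; the source does not state which convention was used, so both the convention and the function name are assumptions.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned, gap-free columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# The GRD (G2) and GKD (G7) tripeptides discussed in the text differ at
# one of three positions.
print(round(percent_identity("GRD", "GKD"), 1))
```

Changing the denominator, for example to the length of the shorter sequence or to the full alignment length, shifts the values for divergent pairs, so percentages from different studies are only comparable when the same convention is used.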

Fig. 3. Deduced amino acid sequences of SMV strains G2 and G7. The full sequence of the G2 strain is shown. Amino acids of the G7 sequence that differ from G2 are shown below the G2 sequence. Scissors indicate predicted cleavage sites. [Sequence listing, amino acids 1 to 3066, not reproduced.]

Comparison of mature proteins with those of other potyviruses

The nucleotide and amino acid sequences of SMV G2 and G7 were 94% and 97% identical, respectively, with changes more pronounced in the 5' region (Tables 1 and 2). Based on the proposed genomic map of SMV, an amino acid sequence comparison of strain G7 with TEV, TVMV, PVY, PPV and strain G2 (Table 1) shows that the most conserved regions among the five potyviruses are the cylindrical inclusion protein (CIP), the putative RNA-dependent RNA polymerase (POL; Robaglia et al., 1989) and the coat protein (CP). POL, CIP and 21K proteins show more similarity to homologous proteins of PPV than to those of the other potyviruses. In contrast, SMV CP shows greater similarity to PVY and TEV CP. Overall, SMV is most similar to PPV.

The POL protein of SMV is analogous to the NIb of TEV (Dougherty & Parks, 1991), but nuclear inclusions are not evident in SMV-infected cells (Edwardson & Christie, 1986). The POL protein was identified as the polymerase because it contains the conserved sequence GXXXTXXXN(X)20-40GDD, where (X)20-40 denotes a spacer of 20 to 40 residues, at amino acids 2595 to 2637. This fits the consensus of virtually all known RNA-dependent RNA polymerases (Kamer & Argos, 1984).
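The polymerase motif can be expressed as a pattern with a variable-length spacer. The reported span of amino acids 2595 to 2637 covers 43 residues, of which 12 belong to the fixed parts of the motif (GXXXTXXXN and GDD), leaving a 31-residue spacer, which is within the 20 to 40 range. The Python sketch below illustrates both the arithmetic and a pattern search; the function names and the toy sequence are hypothetical and are not taken from the SMV polyprotein.

```python
import re

# G-X3-T-X3-N, a spacer of 20 to 40 residues, then GDD.
POL_MOTIF = re.compile(r"G.{3}T.{3}N(.{20,40}?)GDD")
FIXED_RESIDUES = 12  # GXXXTXXXN (9) + GDD (3)


def spacer_length(first: int, last: int) -> int:
    """Spacer length implied by 1-based inclusive motif coordinates."""
    return (last - first + 1) - FIXED_RESIDUES


def find_pol_motif(protein: str):
    """Return (start, end, spacer_len), 1-based inclusive, for the first
    motif match, or None if the motif is absent."""
    m = POL_MOTIF.search(protein.upper())
    if m is None:
        return None
    return m.start() + 1, m.end(), len(m.group(1))


# Coordinates reported in the text for SMV.
print(spacer_length(2595, 2637))

# Hypothetical toy sequence with a 25-residue spacer.
toy = "AA" + "GAAATAAAN" + "A" * 25 + "GDD" + "KK"
print(find_pol_motif(toy))
```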


The 21K and the 27K proteins of SMV are analogous to the NIa protein of TEV and, by comparison with TEV (Dougherty & Parks, 1991), consist of a VPg and a protein processing activity, respectively. The tripeptide GRD (2120 to 2122 amino acid position; Fig. 3) in the

[Map, N- to C-terminus: 35K, HC-PRO, 42K, CIP, 6K, 21K, 27K, POL, CP. Cleavage follows amino acids 308, 765, 1164, 1798, 1852, 2041, 2284 and 2801; the polyprotein ends at amino acid 3066.]

Fig. 4. Proposed map of SMV polyprotein. The amino acids between which cleavage occurs and their position in the genome are shown above and below the map, respectively.

27K protein is conserved in TVMV, TEV, PVY and PPV (Domier et al., 1986; Allison et al., 1986; Robaglia et al., 1989; Maiss et al., 1989), with the aspartic acid residue predicted as the active site (Parks & Dougherty, 1991). This tripeptide is also conserved in strain G2. In strain G7, however, the arginine residue at amino acid position 2121 is changed to lysine (GKD). The CIP protein of SMV shares conserved domains with a group of proteins believed to be helicases (Company et al., 1991; Koonin, 1991), including the P80 protein of bovine viral diarrhoea virus, dengue 4 virus non-structural protein 3, mammalian translation initiation factor 4A and PPV CIP [recently shown to have

[Alignment panels: selected regions of the 35K protein and the complete 42K protein of SMV G2, each aligned with the corresponding PPV, PVY, TEV and TVMV sequences.]

Fig. 5. Amino acid sequence alignments of portions of SMV G2 35K protein and all of 42K protein with those of other potyviruses. Each sequence was aligned with G2 in pairwise fashion using the program BESTFIT (GCG sequence analysis software). Amino acids are shown in bold upper case letters where three or more align. G7 was identical to G2 in all conserved (bold) amino acids. Numbers in the 35K alignment indicate positions of amino acids. Intervening amino acids which showed no significant alignments (indicated by double hyphens) are not shown.


Table 2. Nucleotide and amino acid differences between strains G2 and G7 of SMV

Nucleotides: (1) total; (2) total differences; (2/1) percentage differences; (3) total differences leading to amino acid changes; (3/2) percentage differences.
Amino acids: (4) total; (5) total differences; (5/4) percentage differences; (6) total conservative differences†; (7) total non-conservative differences†.

Region*          (1)     (2)     (2/1)   (3)     (3/2)   (4)     (5)     (5/4)   (6)     (7)
5' Non-coding    131     13      10      NA‡     NA      NA      NA      NA      NA      NA
35K              924     52      6       24      46      308     20      6       13      7
HC-PRO           1371    54      4       14      26      457     11      2       8       3
42K              1197    85      7       29      34      399     22      6       15      7
CIP              1903    155     8       37      24      634     27      4       20      7
6K               161     9       6       0       0       54      0       0       0       0
21K              567     36      6       3       8       189     2       1       1       1
27K              729     45      6       2       4       243     2       1       1       1
POL              1551    78      5       13      17      517     12      2       9       3
CP               795     31      4       3       10      265     3       1       2       1
3' Non-coding    259     18      7       NA      NA      NA      NA      NA      NA      NA

* Abbreviations are as designated in the text.
† Conservative and non-conservative differences are defined on the basis of physicochemical properties of amino acids and reflect similarity of function in three-dimensional conformation of proteins as discussed by George et al. (1990).
‡ NA, Not applicable.

RNA helicase and RNA-dependent ATPase activity (Lain et al., 1990, 1991)]. The protein also has a conserved nucleotide binding site at amino acid position 1249, a characteristic of potyviruses (Robaglia et al., 1989). The HC-PRO, the 35K protein and the 42K protein, all encoded near the 5' end, show markedly less similarity to the homologous proteins of other potyviruses. The HC-PRO shares between 37% and 45% identity whereas the 42K has 13% to 29% identity, and the identity of the 35K protein is an insignificant 6% to 14%. Although the amino acid sequences of both 35K and 42K are known to be highly variable among members of the potyvirus group (e.g. Fig. 5), there are no differences between the two SMV strains in the conserved regions of these proteins (Fig. 3).
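The derived columns of Table 2 and the protein boundaries of Fig. 4 can be cross-checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the numbers are transcribed from the coding-region rows of Table 2, and the helper names `pct`, `protein_boundaries` and `locate` are hypothetical and not part of the original analysis.

```python
# Coding-region rows transcribed from Table 2.
# Tuple layout: (nt_total, nt_diff, nt_diff_changing_aa,
#                aa_total, aa_diff, conservative, non_conservative)
TABLE2 = {
    "35K":    (924, 52, 24, 308, 20, 13, 7),
    "HC-PRO": (1371, 54, 14, 457, 11, 8, 3),
    "42K":    (1197, 85, 29, 399, 22, 15, 7),
    "CIP":    (1903, 155, 37, 634, 27, 20, 7),
    "6K":     (161, 9, 0, 54, 0, 0, 0),
    "21K":    (567, 36, 3, 189, 2, 1, 1),
    "27K":    (729, 45, 2, 243, 2, 1, 1),
    "POL":    (1551, 78, 13, 517, 12, 9, 3),
    "CP":     (795, 31, 3, 265, 3, 2, 1),
}


def pct(part, whole):
    """Percentage rounded to the nearest integer, as printed in the table."""
    return round(100 * part / whole)


def protein_boundaries(table):
    """Map each protein to its (first, last) residue in the polyprotein."""
    bounds, first = {}, 1
    for region, row in table.items():  # dicts keep insertion order
        last = first + row[3] - 1
        bounds[region] = (first, last)
        first = last + 1
    return bounds


def locate(position, bounds):
    """Name the mature protein that contains a polyprotein residue number."""
    for region, (first, last) in bounds.items():
        if first <= position <= last:
            return region
    raise ValueError(f"position {position} lies outside the polyprotein")


bounds = protein_boundaries(TABLE2)
total_aa = sum(row[3] for row in TABLE2.values())
total_aa_diff = sum(row[4] for row in TABLE2.values())

for region, row in TABLE2.items():
    print(region, bounds[region], pct(row[1], row[0]), pct(row[4], row[3]))
print("overall amino acid identity (%):",
      round(100 * (1 - total_aa_diff / total_aa)))
print("residue 2121 lies in", locate(2121, bounds))
```

With the transcribed values, the cumulative protein ends coincide with the cleavage positions listed for Fig. 4, the conservative and non-conservative counts add up to column (5) in every row, and the overall amino acid identity rounds to 97%.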

Discussion

The basis for resistance to strain G2 of soybean plants containing the Rsv gene is unknown. This report of the complete sequence of two closely related strains of a potyvirus, differentiated by their ability to infect soybean containing the Rsv resistance gene, should provide the basis for correlating host susceptibility/resistance with alterations in nucleotide sequence occurring among the strains. The Y-terminal proteins may be involved in the ability of a TVMV isolate to overcome host resistance (Hellman et al., 1990). We have shown

that the region with the greatest number of differences between the two SMV strains is in the 5' region of the genome. In particular, the greatest number of non-conservative amino acid differences between strains G2 and G7 occurs in the 42K protein, followed by the 35K, CIP and HC-PRO proteins (Table 2). Although protease and vector transmission functions have been demonstrated for the 35K and HC-PRO proteins with reasonable certainty, other functions of proteins in the 5' region are only speculative. All, however, could relate to host plant resistance and include the suggestion that the 35K, 42K and CIP proteins may be involved in cell-to-cell movement (Domier et al., 1987), regulation of proteolytic processing of the viral polyprotein (Riechmann et al., 1992) and replication (Company et al., 1991; Koonin, 1991; Lain et al., 1990, 1991; Robaglia et al., 1989), respectively. A previous report showed a strong correlation between the ability of TMV strains to overcome resistance and a change in local net charge, because of single amino acid changes, in the putative replicase genes encoding the 126K and 183K proteins (Meshi et al., 1988). A comparison of the hydropathy profiles of the 35K, 42K and POL proteins showed differences in only the first two. The 35K protein of strain G7 showed, with respect to G2, an increase in local net charge at amino acid positions 13 to 25 and 244 to 259, and a decrease at positions 47 to 60 and 132 to 137 (Fig. 6). Upon comparison of the 42K protein of strain G7 with that of strain G2, an increase in local net charge at positions 15

[Hydropathy plots for the 35K and 42K proteins; x-axis, amino acid position. Boxed regions: 35K, positions 13-25, 47-60, 132-137 and 244-259; 42K, positions 15-29.]

Fig. 6. Comparison of hydropathy profiles (Kyte & Doolittle, 1982) of cistrons 35K and 42K of SMV strains G2 and G7. Numbers identify amino acid positions. Boxes outline regions that differ between strains.
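The kind of windowed comparison summarized in Fig. 6 can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the hydropathy values are the published Kyte & Doolittle (1982) scale, but the window width of 9, the treatment of histidine as uncharged, the function names and the toy peptides are all hypothetical choices made here.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: Lys/Arg count +1, Asp/Glu count -1, all other residues 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[res] for res in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def local_net_charge(seq, window=9):
    """Summed charge of every full window along the sequence."""
    values = [CHARGE.get(res, 0) for res in seq]
    return [sum(values[i:i + window])
            for i in range(len(values) - window + 1)]


def charge_differences(seq_a, seq_b, window=9):
    """(1-based window start, charge of b minus charge of a) where they differ.

    Both sequences must have the same length (no alignment gaps).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    a = local_net_charge(seq_a, window)
    b = local_net_charge(seq_b, window)
    return [(i + 1, cb - ca) for i, (ca, cb) in enumerate(zip(a, b))
            if ca != cb]


# Toy peptides for illustration only; they are not SMV sequences.
print(charge_differences("AAGRDAA", "AAGKDAA", window=3))
print(charge_differences("AAGRDAA", "AAGEDAA", window=3))
```

Under these assumptions an arginine to lysine substitution leaves the local net charge unchanged while shifting the hydropathy profile slightly, whereas a substitution to an acidic residue lowers the charge of every window that spans it.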

to 29 and a decrease at positions 328 to 340 were evident. Although their significance is unknown, differences in local net charge have been proposed to affect electrostatic interactions between a host factor and non-structural viral proteins involved in resistance and susceptibility (Meshi et al., 1988). We have shown that the CPs of G2 and G7 differ at only three amino acid positions, 2809, 3018 and 3065 (Fig. 3 and Jayaram et al., 1991). Changes at positions 2809 and 3065 occur within the N- and C-terminal regions of the CP, which are known to be highly variable among potyviruses. But the change from methionine in strain G2 to isoleucine in G7 at amino acid position 3018 occurs within the trypsin-resistant core, which displays significant amino acid identity among all potyviruses examined (Ward & Shukla, 1991). A recent report has shown that a change from glycine to proline in the virus CP correlates with ability of a strain of potato virus X to overcome resistance (Kohm et al., 1991). We have also noted that a change in the amino acid tripeptide GRD to GKD (at 2121) in the 27K protein correlates with strain G7 infection of soybean plants containing the Rsv gene. The role (if any) that these differences play in the interaction between the virus and resistance gene product is unknown. However, since different viral proteins are involved in different resistance mechanisms, the results of this study provide the basis for determination of specific nucleotide sequences involved in overcoming host resistance. Chimeric full-length infectious transcripts generated by exchanging homologous regions of

the two virus strains as well as site-specific mutagenesis will facilitate identification of these sequences.

The variability in CP may be a useful criterion for taxonomy of potyviruses (Shukla & Ward, 1989). The sequence identity of CP is greater than 50% among all potyviruses. However, the relative similarity of different viruses, when based on a single protein, is dependent upon which viral protein is compared. For example, based on CP, SMV is more closely related to TEV and PVY than PPV, but based on CIP, POL, 21K and overall homology, SMV is most closely related to PPV (Table 1). Thus, it may be insufficient to characterize taxonomic relationships based upon a single protein. Furthermore, overall relatedness may be reflected best by comparison of biological properties such as host range as well as viral genes. The results reported here and those of Robaglia et al. (1989) demonstrate that both the 35K and 42K proteins show little similarity among potyviruses. However, because two strains of the same virus, i.e., SMV G7 and SMV G2, share 97% overall identity, comparison of the 35K and 42K proteins of potyvirus genomes may clarify the taxonomic position of closely related potyviruses as, for example, the distinction between SMV and watermelon mosaic virus 2 (Jayaram et al., 1991).

The authors thank J. C. Carrington for access to a manuscript before publication and Carol Manthey of the Iowa State University Nucleic Acids Research Facility for automated DNA sequencing. This research was supported in part by Pioneer Hi-Bred International, Incorporated and by the Iowa Soybean Promotion Board. Journal Paper No. J-14720 of the Iowa Agriculture and Home Economics Experiment Station, Ames, Iowa, U.S.A. Project No. 2428.

References

ALLISON, R. F., JOHNSTON, R. E. & DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: evidence for the synthesis of a single polyprotein. Virology 154, 9-20.
BUZZEL, R. I. & TU, J. C. (1984). Inheritance of soybean resistance to soybean mosaic virus. Journal of Heredity 75, 82.
CARRINGTON, J. C., CARY, S. M., PARKS, T. D. & DOUGHERTY, W. G. (1989). A second proteinase encoded by a plant potyvirus genome. EMBO Journal 8, 365-370.
CHEN, P., BUSS, G. R. & TOLIN, S. A. (1988). Inheritance of reaction to strains G5 and G6 of soybean mosaic virus (SMV) in differential soybean cultivars. Soybean Genetics Newsletter 15, 130-134.
CHO, E. & GOODMAN, R. M. (1979). Strains of soybean mosaic virus: classification based on virulence in resistant soybean cultivars. Phytopathology 69, 467-470.
COMPANY, M., ARENAS, J. & ABELSON, J. (1991). Requirement of the RNA helicase-like protein PRP22 for release of messenger RNA from spliceosomes. Nature, London 349, 487-493.
DEOM, C. M., OLIVER, M. J. & BEACHY, R. N. (1987). The 30kd gene product of tobacco mosaic virus potentiates virus movement. Science 237, 389-394.
DOMIER, L. L., FRANKLIN, K. M., SHAHABUDDIN, M., HELLMANN, G. M., OVERMEYER, J. H., HIREMATH, S. T., SIAW, M. F., LOMONOSSOFF, G. P., SHAW, J. G. & RHOADS, R. E. (1986). The nucleotide sequence of tobacco vein mottling virus RNA. Nucleic Acids Research 14, 5417-5430.


DOMIER, L. L., SHAW, J. G. & RHOADS, R. E. (1987). Potyviral proteins share amino sequence homology with picorna-, como-, and caulimoviral proteins. Virology 158, 20-27.
DOUGHERTY, W. G. & PARKS, T. D. (1991). Post-translational processing of the tobacco etch virus 49-kDa small nuclear inclusion polyprotein: identification of an internal cleavage site and delimitation of VPg and proteinase domains. Virology 183, 449-456.
DOUGHERTY, W. G., CARRINGTON, J. C., CARY, S. M. & PARKS, T. D. (1988). Biochemical and mutational analysis of a plant virus polyprotein cleavage site. EMBO Journal 7, 1281-1287.
EDWARDSON, J. R. & CHRISTIE, R. G. (1986). Viruses infecting forage legumes, vol. 2. Florida Agricultural Experiment Station Monograph Series no. 14.
GEORGE, D. G., BARKER, W. C. & HUNT, L. T. (1990). Mutation data matrix and its uses. Methods in Enzymology 183, 333-351.
GHABRIAL, S. A., SMITH, H. A., PARKS, T. D. & DOUGHERTY, W. G. (1990). Molecular genetic analyses of the soybean mosaic virus NIa proteinase. Journal of General Virology 71, 1921-1927.
GUBLER, U. & HOFFMAN, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
HELLMAN, G. M., THORNBURY, D. W. & PIRONE, T. P. (1990). Molecular analysis of tobacco vein mottling virus (TVMV) pathogenicity by infectious transcripts of chimeric potyviral cDNA genomes. Phytopathology 80, 1036 (abstract).
HILL, J. H. & BENNER, H. I. (1980a). Properties of soybean mosaic virus and its isolated protein. Phytopathologische Zeitschrift 97, 272-281.
HILL, J. H. & BENNER, H. I. (1980b). Properties of soybean mosaic virus ribonucleic acid. Phytopathology 70, 236-239.
HILL, J. H., BENNER, H. I., PERMAR, T. A., BAILEY, T. B., ANDREWS, R. E., JR, DURAND, D. P. & VAN DEUSEN, R. A. (1989). epidemiology of soybean mosaic virus in Iowa. Phytopathology 70, 536-540.
HILL, J. H., BENNER, H. I., PERMAR, T. A., BAILEY, T. B., ANDREWS, R. E., JR, DURAND, D. P. & VAN DEUSEN, R. A. (1989). Differentiation of soybean mosaic virus isolates by one-dimensional trypsin peptide maps immunoblotted with monoclonal antibodies. Phytopathology 79, 1261-1265.
HOLLINGS, M. & BRUNT, A. A. (1981). Potyvirus group. CMI/AAB Descriptions of Plant Viruses, no. 245.
JAYARAM, CH., HILL, J. H. & MILLER, W. A. (1991). Nucleotide sequences of the coat protein genes of two aphid-transmissible strains of soybean mosaic virus. Journal of General Virology 72, 1001-1003.
KAMER, G. & ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerase from plant, animal and bacterial viruses. Nucleic Acids Research 12, 7269-7282.
KIHL, R. A. S. & HARTWIG, E. E. (1979). Inheritance of reaction to soybean mosaic virus in soybeans. Crop Science 19, 372-375.
KOHM, B., SANTA CRUZ, S., GOULDEN, M., KAVANAGH, T. & BAULCOMBE, D. (1991). Molecular study of resistance in Solanum tuberosum cv. Cara and potato virus X (PVX). Abstract No. 1225, Third International Congress of Plant Molecular Biology, Molecular Biology of Plant Growth and Development.
KOONIN, E. V. (1991). Similarities in RNA helicases. Nature, London 352, 290.
KYTE, J. & DOOLITTLE, R. F. (1982). A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157, 105-132.
LAIN, S., RIECHMANN, J. L. & GARCÍA, J. A. (1990). RNA helicase: a novel activity associated with a protein encoded by a positive strand RNA virus. Nucleic Acids Research 18, 7003-7006.
LAIN, S., MARTIN, M. T., RIECHMANN, J. L. & GARCÍA, J. A. (1991). Novel catalytic activity associated with positive-strand RNA virus infection: nucleic acid-stimulated ATPase activity of the plum pox potyvirus helicase-like protein. Journal of Virology 65, 1-6.
LIM, S. M. (1985). Resistance to soybean mosaic virus in soybeans. Phytopathology 75, 199-201.
LUCAS, B. S. & HILL, J. H. (1980). Characteristics of the transmission of three soybean mosaic virus isolates by Myzus persicae and Rhopalosiphum maidis. Phytopathologische Zeitschrift 97, 47-53.


MAISS, E., TIMPE, U., BRISSKE, A., JELKMANN, W., CASPER, R., HIMMLER, G., MATTANOVICH, D. & KATINGER, H. W. D. (1989). The complete nucleotide sequence of plum pox virus RNA. Journal of General Virology 70, 513-524.
MAVANKAL, G. & RHOADS, R. E. (1991). In vitro cleavage at or near the N-terminus of the helper component protein in the tobacco vein mottling virus polyprotein. Virology 185, 721-731.
MESHI, T., WATANABE, Y., SAITO, T., SUGIMOTO, A., MAEDA, T. & OKADA, Y. (1987). Functions of the 30kd protein of tobacco mosaic virus: involvement in cell-to-cell movement and dispensability for replication. EMBO Journal 6, 2557-2563.
MESHI, T., MOTOYOSHI, F., ADACHI, A., WATANABE, Y., TAKAMATSU, N. & OKADA, Y. (1988). Two concomitant base substitutions in the putative replicase genes of tobacco mosaic virus confer the ability to overcome the effects of a tomato resistance gene, Tm-1. EMBO Journal 7, 1575-1581.
MIERENDORF, R. C. & PFEFFER, D. (1987). Sequencing of RNA transcripts synthesized in vitro from plasmids containing bacteriophage promoters. Methods in Enzymology 152, 563-566.
MOSER, O., GAGEY, M.-J., GODEFROY-COLBURN, T., STUSSI-GARAUD, C., ELLWART-TSCHORTZ, M., NITSCHKO, H. & MUNDRY, K.-W. (1988). The fate of the transport protein of tobacco mosaic virus in systemic and hypersensitive tobacco hosts. Journal of General Virology 69, 1367-1373.
PARKS, T. D. & DOUGHERTY, W. G. (1991). Substrate recognition by the NIa proteinase of two potyviruses involves multiple domains: characterization using genetically engineered hybrid proteinase molecules. Virology 182, 17-27.
PONZ, F., GLASCOCK, C. B. & BRUENING, G. (1988). An inhibitor of polyprotein processing with the characteristics of a natural virus resistance factor. Molecular Plant-Microbe Interactions 1, 25-31.
RIECHMANN, J. L., LAIN, S. & GARCÍA, J. A. (1992). Highlights and prospects of potyvirus molecular biology. Journal of General Virology 73, 1-16.
ROBAGLIA, C., DURAND-TARDIF, M., TRONCHET, M., BOUDAZIN, G., ASTIER-MANIFACIER, S. & CASSE-DELBART, F. (1989). Nucleotide sequence of potato virus Y (N strain) genomic RNA. Journal of General Virology 70, 935-947.
SANDERSON, J. L., BRUENING, G. & RUSSELL, M. L. (1985). Possible molecular basis of immunity of cowpeas to cowpea mosaic virus. UCLA Symposia on Molecular and Cell Biology, New Series 22, 401-412.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
SHUKLA, D. D. & WARD, C. W. (1989). Identification and classification of potyviruses on the basis of coat protein sequence data and serology. Archives of Virology 106, 171-200.
SIEGEL, A. (1979). Recognition and specificity in plant virus infection. In Plant Resistance to Viruses, pp. 109-113. Edited by D. Evered & S. Harnett. Chichester: John Wiley and Sons.
TALIANSKY, M. E., MALYSHENKO, S. I., PSHENNIKOVA, E. S. & ATABEKOV, J. G. (1982). Plant virus-specific transport functions. II. A factor controlling host range. Virology 122, 327-331.
VANCE, V. B. & BEACHY, R. N. (1984a). Translation of soybean mosaic virus RNA in vitro: evidence for protein processing. Virology 132, 271-281.
VANCE, V. B. & BEACHY, R. N. (1984b). Detection of genomic-length soybean mosaic virus RNA on polyribosomes of infected soybean leaves. Virology 132, 26-36.
VERCHOT, J.-M., KOONIN, E. V. & CARRINGTON, J. C. (1991). The 35kDa protein from the N-terminus of the potyviral polyprotein functions as a third virus-encoded protease. Virology 185, 527-535.
WARD, C. W. & SHUKLA, D. D. (1991). Taxonomy of potyviruses: current problems and some solutions. Intervirology 32, 269-296.

(Received 25 November 1991; Accepted 31 March 1992)