J. gen. Virol. (1989), 70, 2775-2783. Printed in Great Britain
Key words: WMV 2/SMV-N/nucleotide sequence/3' untranslated region
The Use of 3' Non-coding Nucleotide Sequences in the Taxonomy of Potyviruses: Application to Watermelon Mosaic Virus 2 and Soybean Mosaic Virus-N
By M. J. FRENKEL, C. W. WARD AND D. D. SHUKLA*
CSIRO, Division of Biotechnology, Parkville Laboratory, 343 Royal Parade, Parkville, Victoria 3052, Australia
(Accepted 16 June 1989)
SUMMARY
The sequence of the 3' 1106 nucleotides of the watermelon mosaic virus 2 (WMV 2) genome has been determined. The sequence contains the complete coding region of the viral coat protein followed by a 3' untranslated sequence of 251 nucleotides. When these sequences were compared with the equivalent regions of the N strain of soybean mosaic virus (SMV-N), the coat protein coding regions were 82% homologous, whereas the 3' untranslated sequences were 78% homologous. Optimal alignment of the 3' untranslated regions of RNA from 13 strains of seven other distinct potyviruses revealed that the degree of homology between strains was in the range 83 to 99%. In contrast, the sequences from distinct viruses had identities in the range 39 to 53%, comparable to the level of identity found between the 3' non-coding regions of viruses from unrelated plant virus groups. On the basis of these results, WMV 2 and SMV-N could be regarded as strains of one virus. These results also suggest that the sequence of the 3' untranslated region of the potyvirus genome may be an accurate marker of genetic relatedness and could serve as an aid to identification and classification of potyviruses.
INTRODUCTION
The number of viruses now recognized as potyviruses has expanded rapidly in recent years to a current estimate of at least 175 definitive and possible members (Milne, 1988; Shukla & Ward, 1989b), or 30% of all known plant viruses. Thus the potyvirus group is by far the largest of the 28 plant virus groups and much of its taxonomy is complex, inconsistent and confused. As Francki et al. (1985) pointed out, the existence of so many members and possible members of the potyvirus group, and the frequent description of new candidates, emphasize the need to agree on the characteristics required to assign viruses to the group and those that will distinguish distinct viruses from strains.
0000-8958 © 1989 SGM
In a review of the historical development of general taxonomic principles, Mayr (1982) drew attention to Darwin's comment that taxonomy reflects propinquity of descent and that all true classifications are genealogical. Mayr (1982) also traced the changes in the criteria used for general taxonomy from the use of descriptive morphological characters to the application of biochemical techniques that characterize the variation and evolution of molecules. Plant virus taxonomy is going through similar phases of development from the initial reliance on morphological, biological and serological properties to assignments based on coat protein and nucleic acid sequences. Thus, of the physical, biological and chemical properties used to classify potyviruses, protein and genome sequence information should represent the ultimate criteria. We have shown recently that coat protein sequence data can be used to discriminate between distinct viruses and strains (Shukla & Ward, 1988). Distinct potyviruses have coat protein sequence identities of 38 to 71% compared to 90 to 99% for related strains (see Shukla & Ward, 1989a, for review). These relationships can be readily identified from comparative HPLC
profiles of coat protein peptides (Shukla et al., 1988a) or serology with polyclonal antibodies directed towards virus-specific N termini of the coat proteins (Shukla et al., 1989a, b, c). Despite these advances, there are some problems, such as the occurrence of unexpected and inconsistent paired serological relationships between biologically distinct potyviruses and single sequence changes in key contact residues in the virus-specific epitopes of some strains. These problems remain if identification and classification are based solely on coat protein properties (Shukla & Ward, 1989b).
In this report we draw attention to an alternative approach to discriminating between independent potyviruses and strains. It developed from our attempts to establish whether watermelon mosaic virus 2 (WMV 2; Yu et al., 1989) and soybean mosaic virus-N (SMV-N; Eggenberger et al., 1989) were strains of one virus or distinct but closely related viruses. In a previous report, Yu et al. (1989) showed that the homology (83%) between the amino acid sequences of the coat proteins of WMV 2 and SMV-N lay midway between the homology found between distinct potyviruses and that for related strains. In order to investigate the genetic changes associated with these sequence differences and the degree of genetic relatedness between WMV 2 and SMV-N, we have determined the nucleotide sequences of the coat protein-coding region and the 3' untranslated region of WMV 2 RNA and compared them with the sequence data for RNA of SMV-N (Eggenberger et al., 1989) and other potyviruses (Allison et al., 1985a, b, 1986; Dougherty et al., 1985; Domier et al., 1986; Gough et al., 1987; Gunyuzlu et al., 1987; Lain et al., 1988; Maiss et al., 1989; Ravelonandro et al., 1988; Rosner & Raccah, 1988; van der Vlugt et al., 1989). The comparisons show that nucleotide sequences of the 3' untranslated region of the potyviral genome can serve as an aid to identification and classification of potyviruses.
METHODS
Isolation of WMV 2. The WMV 2 isolate was originally obtained from Mr R. S. Greber (Greber, 1978) and was maintained and propagated in marrow and purified according to methods described previously (Shukla et al., 1989a).
Isolation of RNA and synthesis of a ds cDNA library. RNA was extracted from WMV 2 using a modification of the method described by MacDonald et al. (1987). Briefly, virus particles (in 1 ml) were extracted in 10 ml of 6 M-guanidine hydrochloride, 0.2 M-sodium acetate, pH 5.2 and 10 mM-2-mercaptoethanol. Following the addition of 10 ml of ethanol, the mixture was placed at -20 °C for 2 h. Precipitated material was sedimented, resuspended in 5 ml of the extraction buffer containing EDTA (10 mM) in the place of 2-mercaptoethanol and reprecipitated by the addition of 5 ml of ethanol. The sedimented material was resuspended in 2.5 ml of a solution containing 7 M-urea, 100 mM-Tris-HCl pH 7.5, 0.1 mM-EDTA and 0.1% SDS, the proteins were removed by extraction with buffered phenol and the RNA was precipitated by the addition of ethanol. Double-stranded cDNA was synthesized using avian myeloblastosis virus reverse transcriptase for the first strand and RNase H/DNA polymerase I (Amersham) for the second strand essentially as described by Gubler & Hoffman (1983). EcoRI linkers were added to the ds cDNA and DNA larger than 500 bp was ligated to λgt11 arms, packaged, and used to produce a library of recombinant bacteriophage in Escherichia coli Y1090 cells (Amersham cDNA-λgt11 cloning system).
Screening of the library with a synthetic oligonucleotide. An aliquot of the cDNA library was screened with a 5' radiolabelled, mixed, synthetic oligonucleotide. The deoxyinosine-containing (Takahashi et al., 1985) oligonucleotide used, 5' (C/T)TGATTIGG(C/T)TT(A/G)TA(C/T)TC 3', was the reverse complement of the derived sequence coding for the peptide E85YKPNQ90 (Yu et al., 1989) and was synthesized using an Applied Biosystems Model 381 DNA synthesizer.
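The probe design described above (back-translation of a short peptide, reverse complementation, and use of deoxyinosine at the most degenerate position) can be illustrated with a minimal sketch. The function names and the use of IUPAC ambiguity codes (R = A/G, Y = C/T, N = any base) are assumptions made for illustration; the sketch yields the fully degenerate design and does not claim to reproduce the exact base mixture that was synthesized.

```python
# Degenerate codons (standard genetic code) for the six residues of the
# peptide, written with IUPAC ambiguity codes: R = A/G, Y = C/T, N = any base.
DEGENERATE_CODONS = {
    "E": "GAR",
    "Y": "TAY",
    "K": "AAR",
    "P": "CCN",
    "N": "AAY",
    "Q": "CAR",
}

# Complement of every symbol used in the codons above.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "N": "N"}


def back_translate(peptide):
    """Return the degenerate coding sequence for a peptide (hypothetical helper)."""
    return "".join(DEGENERATE_CODONS[residue] for residue in peptide)


def reverse_complement(sequence):
    """Return the reverse complement of a sequence written in IUPAC codes."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def design_probe(peptide):
    """Antisense probe with deoxyinosine (I) at fourfold-degenerate positions."""
    antisense = reverse_complement(back_translate(peptide))
    return antisense.replace("N", "I")


probe = design_probe("EYKPNQ")
print(probe)  # YTGRTTIGGYTTRTAYTC
```

Each R or Y in the output stands for a two-base mixture at that position, so a fully degenerate probe of this form would be a pool of 32 sequences.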
The methods used to radiolabel the 5' end of the oligonucleotide with [γ-32P]ATP using polynucleotide kinase, and to screen the library, were those described by Maniatis et al. (1982) and Miyada & Wallace (1987), respectively. A temperature of 30 °C was used for both the hybridization and washing of the nitrocellulose filters. Bacteriophage DNA was purified from liquid lysates of two positively reacting clones using the method of Kao et al. (1982).
Subcloning and DNA sequence determination. EcoRI was used to digest the bacteriophage DNA and the sizes of the excised inserts were estimated on agarose gels. Insert DNA was isolated and ligated to EcoRI-digested M13mp19. Following transformation, recombinants were isolated and ssDNA was prepared for sequence determination, essentially as described by Sanger et al. (1980). DNA sequencing reactions were carried out using the chain termination method of Sanger et al. (1980) with either a 17-mer universal sequencing primer (Amersham) or with synthetic oligonucleotides based on previously determined sequences. Pairs of sequences were aligned using the ALIGN program supplied by the National Biomedical Research
Foundation (NBRF), with a matrix bias of zero and a break penalty of 5. The percentage identity is based on the number of possible matches and disregards deletions introduced to maximize the homology.

Fig. 1. Comparison of the complete nucleotide sequences of the WMV 2 and SMV-N (Eggenberger et al., 1989) coat protein genes, their deduced amino acid sequences and their 3' untranslated regions. The SMV-N protein sequence is shown only where it differs from that of WMV 2. The sequences were aligned for maximum identity using the ALIGN program of NBRF with a matrix bias of 0 and a break penalty of 5. Differences in the nucleotide sequences are shown by asterisks and deletions are indicated by dots. The WMV 2 amino acid residue numbers are shown above the protein sequence and the nucleotide numbers (including deletions) are shown at the end of each line.
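The percentage identity calculation described in the Methods can be sketched as follows. This is one plausible reading, not the ALIGN program itself: it assumes that "possible matches" means alignment columns in which both sequences carry a nucleotide, and that deletions are written as dots, as in the figure. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="."):
    """Percentage identity over columns where neither sequence has a deletion.

    Assumption: columns containing a gap symbol in either sequence are
    disregarded, so the denominator is the number of comparable positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == gap or base_b == gap:
            continue  # deletion introduced by the alignment: not counted
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 8 comparable columns, 7 of them identical.
identity = percent_identity("ACGT.ACGTA", "ACGTTAC.TT")
print(identity)  # 87.5
```

Excluding gapped columns from the denominator means that a long deletion, such as the 48 nucleotide deletion in SMV-N, does not lower the reported identity.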
RESULTS
Characterization of a cDNA clone encoding the WMV 2 coat protein sequence
Using a 5' radiolabelled synthetic oligonucleotide as the probe, a screen of 6000 recombinants from the WMV 2 cDNA library yielded 17 positive clones. Two clones were selected that contained inserts of approximately 1600 bp and 800 bp. The inserts were subcloned into M13mp19 in both orientations and the complete sequences of both DNA strands for each clone were determined and found to be identical for the overlapping region. The sequence of the coding strand from the region immediately 5' to the coat protein gene to the poly(A) tail is shown in Fig. 1. The location of the coat protein amino terminus is based on the N-terminal amino acid sequence described by Yu et al. (1989). The coat protein gene comprises an open reading frame of 843 nucleotides, which codes for a protein of 281 residues, and is followed by a 3' untranslated region of 251 nucleotides, excluding the poly(A) tail (Fig. 1). The amino acid sequence deduced from the cDNA clone is identical to that determined by direct protein sequencing (Yu et al., 1989).
Comparison of the 3' ends of the WMV 2 and SMV-N genomes
Fig. 1 shows the alignment for optimal homology of the DNA sequences for the coat protein genes and the 3' untranslated regions of WMV 2 and SMV-N. If the 48 nucleotide deletion in the coding region of SMV-N is disregarded, and some gaps in WMV 2 are introduced to maximize the homology in the 3' non-coding region, the coat protein-coding sequences are 82% homologous, whereas the 3' untranslated sequences are 78% homologous. The alignment shown in Fig. 1 indicates that the differences between the sequences are evenly distributed except for the concentration of mismatches at the 5' end of the coat protein-coding region, where there are 24 differences in the first 69 nucleotides.
Comparisons among the 3' untranslated sequences of different potyviruses and potyvirus strains
Fig. 2 shows the optimal alignments of the 3' untranslated sequences of known and predicted strains of different potyviruses. The degree of identity between strains (Table 1) was 99% for the two strains of tobacco etch virus (TEV) and 98 or 99% for the different strains of plum pox virus (PPV). The N and I strains of potato virus Y (PVY) were 83% identical, whereas these two PVY strains had identities of 83% and 92%, respectively, with pepper mottle virus (PeMV). The sequence homology of 78% between WMV 2 and SMV-N is very similar to that observed between the PVY strains. These and other comparisons are shown in Table 1. Although alignments of the 3' untranslated sequences of 13 strains of seven distinct potyviruses revealed degrees of homology of 83 to 99%, similarities in the range 39 to 53% were found in comparisons between unrelated viruses, with most showing about 45% homology (Table 1). The percentage identity between unrelated virus sequences is very similar to that found for comparisons between viruses from different taxonomic groups (data not shown) and presumably this level of similarity arises by chance. An alignment of the coat protein gene sequences of the potyviruses showed that the degree of homology in comparisons of unrelated viruses was generally in the range 56 to 64%, whereas the corresponding sequences of related strains were 88 to 99% homologous (Table 1).
DISCUSSION
The amino acid sequence homology (83%) between the coat proteins of WMV 2 and SMV-N (Yu et al., 1989) lies midway between the ranges of similarity for independent potyviruses (38 to 71%) and related strains (90 to 99%). Most of the differences (including a 16 residue deletion in SMV-N) occurred in the 45 N-terminal residues, with the remainder of the two coat protein molecules being very similar (92% homology). This observation, that the 235 C-terminal amino acid residues are far more conserved than the surface-exposed, immunodominant N-terminal region (Shukla et al., 1988b), suggested that the 3' end of the potyvirus genome including the untranslated region may be an accurate marker of genetic relatedness. Complete nucleotide sequences are now available for the genomes of three potyviruses, namely the aphid non-transmissible isolate of PPV (PPV-NAT; Maiss et al., 1989), the highly
aphid-transmissible isolate of TEV (TEV-HAT; Allison et al., 1986) and tobacco vein mottling virus (TVMV; Domier et al., 1986) with sequences for 3' coding and non-coding regions available for the D, AT and R strains of PPV (Ravelonandro et al., 1988; E. Maiss, personal communication; Lain et al., 1988), the NAT strain of TEV (Allison et al., 1985b), Johnson grass mosaic virus (JGMV) (Gough et al., 1987), PeMV (Dougherty et al., 1985) and the I and N strains of PVY (Rosner & Raccah, 1988; van der Vlugt et al., 1989). Examination of these sequences reveals that the 3' non-coding regions of distinct viruses differ in length (189 to 475 nucleotides) and display no significant sequence homology (Table 1). The degree of homology, ranging from 39 to 53%, is comparable to that obtained when the 3' untranslated regions of unrelated viruses from other plant virus groups are compared with

Fig. 2. Comparisons of 3' untranslated regions from strains of distinct potyviruses. The sequences were aligned for maximum identity using the ALIGN program of NBRF with a matrix bias of 0 and a break penalty of 5. Sequences are shown only where they differ from the full sequence, and deletions are indicated by dots. The nucleotide numbers (including deletions) are shown at the end of each line. (a) NAT and HAT strains of TEV; (b) D, R, NAT and aphid-transmissible (AT) strains of PPV; (c) I and N strains of PVY and PeMV; (d) WMV 2 and SMV-N. The sources of sequence data are given in Table 1.

Table 1. Percentage nucleotide sequence homology between coat protein-coding and 3' untranslated regions of potyvirus genomes*

             PPV-D   PPV-AT  PPV-NAT    PPV-R  JGMV-JG    PVY-I    PVY-N PVY-PeMV    WMV 2    SMV-N    SMV-V     TVMV  TEV-NAT  TEV-HAT
PPV-D            -       97       97       98       58       63       63       62       61       61       60       60       62       62
PPV-AT          99        -       99       99       59       64       64       64       62       61       60       62       64       64
PPV-NAT         98       98        -       99       58       63       64       64       62       61       61       62       64       64
PPV-R           99       99       99        -       58       63       64       64       62       62       60       63       64       64
JGMV-JG         46       53       46       45        -       60       59       60       62       64       59       59       60       60
PVY-I           51       49       46       44       46        -       88       92       62       60       62       57       63       64
PVY-N           43       44       44       43       44       83        -       88       64       61       62       56       63       64
PVY-PeMV        46       44       48       47       43       92       83        -       63       60       64       58       61       62
WMV 2           45       44       40       44       44       44       42       42        -       82       58       59       62       62
SMV-N           47       45       42       43       45       41       43       50       78        -       59       59       62       61
SMV-V           42       41       44       43       48       47       45       44       44       45        -       63       63       62
TVMV            41       41       39       42       47       47       41       45       47       39       45        -       60       60
TEV-NAT         42       42       42       43       50       48       50       49       42       47       48       52        -       97
TEV-HAT         41       42       42       43       49       48       50       49       42       47       48       53       99        -

* The sources of sequence data were: PPV-D (Ravelonandro et al., 1988); PPV-NAT and PPV-AT (Maiss et al., 1989; E. Maiss, personal communication); PPV-R (Lain et al., 1988); JGMV-JG (Gough et al., 1987); PVY-I (Rosner & Raccah, 1988); PVY-N (van der Vlugt et al., 1989); PeMV (Dougherty et al., 1985); SMV-N (Eggenberger et al., 1989); SMV-V (Gunyuzlu et al., 1987); TVMV (Domier et al., 1986); TEV-NAT and TEV-HAT (Allison et al., 1985a, b, 1986). Above the diagonal, coat protein-coding region sequence identities; below the diagonal, 3' untranslated region sequence identities.

potyviruses and are probably close to the value expected for coincidental matching. In contrast, the 3' untranslated regions of related strains are very similar in length and nucleotide sequence (Table 1, Fig. 2).
We have previously commented (Shukla et al., 1986, 1988c; Shukla & Ward, 1988) on the close structural relationship between the coat proteins of PeMV and strains of PVY, and the possibility that PeMV is a strain of PVY. Comparison of the 3' untranslated region of PeMV (Dougherty et al., 1985) with that published for PVY-I (Rosner & Raccah, 1988) shows that the 3' untranslated sequences are 92% homologous (Table 1), supporting our contention that they are closely related genetically and should be considered strains of one virus. In a more recent paper, van der Vlugt et al. (1989) have determined the nucleic acid sequence of the coat protein-coding and 3' untranslated region of PVY-N and have reached the same conclusion. Moreover, they point out that PeMV was originally classified as PVY-S, the speckling strain of PVY (Zitter, 1972).
As shown in Table 1, the 3' untranslated regions of RNA of WMV 2 and SMV-N are as similar in sequence as those of some strains of PVY, which strongly supports the suggestion that WMV 2 and SMV-N are strains of the one virus. Yu et al. (1989) reached the same conclusion, based on the homology observed for the coat protein sequences, and suggested that SMV-N could be renamed the soybean N strain of WMV (WMV-SN). However, more information is required on the range of viruses causing mosaic disease of soybean before deciding whether SMV-N and WMV 2 should be considered strains of SMV or strains of WMV 2.
There are numerous reports of the use of cDNA hybridization to detect strains of plant viruses (Hull, 1984). However, the use of randomly primed cDNAs to the RNAs of some viruses has resulted in strong hybridization with heterologous viruses, including viruses from different groups (Koenig et al., 1988).
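The reasoning used above to decide between strains and distinct viruses can be sketched as a simple rule applied to 3' untranslated region identities. The two ranges are those reported in this paper (83 to 99% between strains, 39 to 53% between distinct viruses); the function name, the "unresolved" labels and the nearest-range rule for values falling between the ranges are illustrative assumptions, not a published criterion.

```python
# Ranges of 3' untranslated region identity (%) reported in the text.
STRAIN_RANGE = (83, 99)      # between strains of one virus
DISTINCT_RANGE = (39, 53)    # between distinct potyviruses


def _distance_to_range(value, low, high):
    """Distance from a value to the nearest edge of a closed range (0 if inside)."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0


def classify_pair(identity):
    """Label a pairwise identity; values outside both ranges stay 'unresolved'.

    The nearest-range hint is an assumption added for illustration only.
    """
    to_strain = _distance_to_range(identity, *STRAIN_RANGE)
    to_distinct = _distance_to_range(identity, *DISTINCT_RANGE)
    if to_strain == 0:
        return "strains"
    if to_distinct == 0:
        return "distinct viruses"
    if to_strain < to_distinct:
        return "unresolved, nearer strains"
    if to_distinct < to_strain:
        return "unresolved, nearer distinct viruses"
    return "unresolved"


# Identities quoted in the text; 45% is the typical value for unrelated pairs.
examples = {
    "PeMV / PVY-I": 92,
    "TEV-NAT / TEV-HAT": 99,
    "WMV 2 / SMV-N": 78,
    "typical unrelated pair": 45,
}
for pair, identity in examples.items():
    print(pair, identity, classify_pair(identity))
```

Under this rule the WMV 2 and SMV-N value of 78% falls just below the strain range and far above the range for distinct viruses, which mirrors the argument made in the text without settling the question on its own.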
The sequence of the 3' untranslated region of potyvirus RNA may have great value as a marker for distinguishing viruses and strains and can be readily obtained from cDNA clones generated by oligo(dT)-primed synthesis on a viral RNA template. In addition, synthetic oligonucleotides, Taq polymerase-generated fragments (Saiki et al., 1988) or cloned cDNA from the 3' untranslated region should find ready application as sensitive probes (Landegren et al., 1988) for potyvirus detection and classification.
We are most grateful to Drs R. N. Beachy (St Louis), E. Maiss (Braunschweig), S. A. Tolin (Blacksburg) and R. Goldbach (Wageningen) for their generous provision of sequence data before publication. We thank Mr N. Bartone for synthetic oligonucleotide probes, Ms S. L. Tracy for excellent technical assistance, Ms L. Monarch for the photographs and Ms B. Wood for typing the manuscript. This work was supported by the Rural Credits Development Fund of the Reserve Bank of Australia.
REFERENCES
ALLISON, R. F., DOUGHERTY, W. G., PARKS, T. D., WILLIS, L., JOHNSTON, R. E., KELLY, M. E. & ARMSTRONG, F. B. (1985a). Biochemical analysis of the capsid protein of tobacco etch virus: N-terminal amino acids are located on the virion's surface. Virology 147, 309-316.
ALLISON, R. F., SORENSON, J. C., KELLY, M. E., ARMSTRONG, F. B. & DOUGHERTY, W. G. (1985b). Sequence determination of the capsid protein gene and flanking regions of tobacco etch virus: evidence for the synthesis and processing of a polyprotein in potyvirus genome expression. Proceedings of the National Academy of Sciences, U.S.A. 82, 3969-3972.
ALLISON, R. F., JOHNSTON, R. E. & DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: evidence for the synthesis of a single polyprotein. Virology 154, 9-20.
DOMIER, L. L., FRANKLIN, K. M., SHAHABUDDIN, M., HELMAN, G. M., OVERMEYER, J. H., HIREMATH, S. T., SIAW, M. F. E., LOMONOSOFF, G. P., SHAW, J. G. & RHOADS, R. E. (1986). The nucleotide sequence of tobacco vein mottling virus. Nucleic Acids Research 14, 5417-5430.
DOUGHERTY, W. G., ALLISON, R. F., PARKS, T. D., JOHNSTON, R. E., FEILD, M. J. & ARMSTRONG, F. B. (1985). Nucleotide sequence at the 3'-terminus of pepper mottle virus genomic RNA: evidence for an alternative mode of capsid protein gene organization. Virology 146, 282-291.
EGGENBERGER, A. L., STARK, D. M. & BEACHY, R. N. (1989). The nucleotide sequence of a soybean mosaic virus coat protein-coding region and its expression in Escherichia coli, Agrobacterium tumefaciens, and tobacco callus. Journal of General Virology 70, 1853-1860.
FRANCKI, R. I. B., MILNE, R. G. & HATTA, T. (1985). Atlas of Plant Viruses, vol. I and II. Boca Raton: CRC Press.
GOUGH, K. H., AZAD, A. A., HANNA, P. J. & SHUKLA, D. D. (1987). Nucleotide sequence of the capsid and nuclear inclusion protein genes from the Johnson grass strain of sugarcane mosaic virus RNA. Journal of General Virology 68, 297-304.
GREBER, R. S. (1978). Watermelon mosaic virus 1 and 2 in Queensland cucurbit crops. Australian Journal of Agricultural Research 29, 1235-1245.
GUBLER, U. & HOFFMAN, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
GUNYUZLU, P. L., TOLIN, S. A. & JOHNSON, J. L. (1987). The nucleotide sequence of the 3' terminus of soybean mosaic virus. Phytopathology 77, 1766.
HULL, R. (1984). Rapid diagnosis of plant virus infections by spot hybridization. Trends in Biotechnology 2, 88-91.
KAO, F.-T., HARTZ, J. A., LAW, M. L. & DAVIDSON, J. N. (1982). Isolation and chromosomal localization of unique DNA sequences from a human genomic library. Proceedings of the National Academy of Sciences, U.S.A. 79, 865-869.
KOENIG, R., AN, D. & BURGERMEISTER, W. (1988). The use of filter hybridization techniques for the identification, differentiation and classification of plant viruses. Journal of Virological Methods 19, 57-68.
LAIN, S., RIECHMANN, J. L., MENDEZ, E. & GARCIA, J. A. (1988). Nucleotide sequence of the 3'-terminal region of plum pox potyvirus RNA. Virus Research 10, 325-342.
LANDEGREN, U., KAISER, R., CASKEY, C. T. & HOOD, L. (1988). DNA diagnostics - molecular techniques and automation. Science 242, 229-237.
MACDONALD, R. J., SWIFT, G. H., PRZYBYLA, A. E. & CHIRGWIN, J. M. (1987). Isolation of RNA using guanidine salts. Methods in Enzymology 152, 219-227.
MAISS, E., TIMPE, U., BRISSKE, A., JELKMANN, W., CASPER, R., HIMMLER, G., MATTANOVICH, D. & KATINGER, H. W. D. (1989). The complete nucleotide sequence of plum pox virus RNA. Journal of General Virology 70, 513-524.
MANIATIS, T., FRITSCH, E. F. & SAMBROOK, J. (1982). Molecular Cloning: A Laboratory Manual. New York: Cold Spring Harbor Laboratory.
MAYR, E. (1982). The Growth of Biological Thought: Diversity, Evolution and Inheritance. Cambridge: Harvard University Press.
MILNE, R. G. (1988). The Plant Viruses, vol. 4. The Filamentous Plant Viruses. New York & London: Plenum Press.
MIYADA, C. G. & WALLACE, R. B. (1987). Oligonucleotide hybridization techniques. Methods in Enzymology 154, 94-107.
RAVELONANDRO, M., VARVERI, C., DELBOS, R. & DUNEZ, J. (1988). Nucleotide sequence of the capsid protein gene of plum pox potyvirus. Journal of General Virology 69, 1509-1516.
ROSNER, A. & RACCAH, B. (1988). Nucleotide sequences of the capsid protein gene of potato virus Y (PVY). Virus Genes 1, 255-260.
SAIKI, R. K., GELFAND, D. H., STOFFEL, S., SCHARF, S. J., HIGUCHI, R., HORN, G. T., MULLIS, K. B. & ERLICH, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487-491.
SANGER, F., COULSON, A. R., BARRELL, B. G., SMITH, A. J. H. & ROE, B. A. (1980). Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. Journal of Molecular Biology 143, 161-178.
SHUKLA, D. D. & WARD, C. W. (1988). Amino acid sequence homology of coat proteins as a basis for identification and classification of the potyvirus group. Journal of General Virology 69, 2703-2710.
SHUKLA, D. D. & WARD, C. W. (1989a). Structure of potyvirus coat proteins and its application in the taxonomy of the potyvirus group. Advances in Virus Research 36, 273-314.
SHUKLA, D. D. & WARD, C. W. (1989b). Identification and classification of potyviruses on the basis of coat protein sequence data and serology. Archives of Virology 106, 171-200.
SHUKLA, D. D., INGLIS, A. S., McKERN, N. M. & GOUGH, K. H. (1986). Coat protein of potyviruses. 2. Amino acid sequence of coat protein of potato virus Y. Virology 152, 118-125.
SHUKLA, D. D., McKERN, N. M., GOUGH, K. H., TRACY, S. L. & LETHO, S. G. (1988a). Differentiation of potyviruses and their strains by high performance liquid chromatographic peptide profiling of coat proteins. Journal of General Virology 69, 493-502.
SHUKLA, D. D., STRIKE, P. M., TRACY, S. L., GOUGH, K. H. & WARD, C. W. (1988b). The N and C termini of the coat proteins of potyviruses are surface-located and the N terminus contains the major virus-specific epitopes. Journal of General Virology 69, 1497-1508.
SHUKLA, D. D., THOMAS, J. E., McKERN, N. M., TRACY, S. L. & WARD, C. W. (1988c). Coat protein of potyviruses. 4. Comparison of biological properties, serological relationships, and coat protein amino acid sequences of four strains of potato virus Y. Archives of Virology 102, 207-219.
SHUKLA, D. D., JILKA, J., TOSIC, M. & FORD, R. E. (1989a). A novel approach to the serology of potyviruses involving affinity-purified polyclonal antibodies directed towards virus-specific N termini of coat proteins. Journal of General Virology 70, 13-23.
SHUKLA, D. D., TOSIC, M., FORD, R. E., JILKA, J., TOLER, R. W. & LANGHAM, M. A. C. (1989b). Taxonomy of potyviruses infecting maize, sorghum and sugarcane in Australia and the United States as determined by reactivities of polyclonal antibodies directed towards virus-specific N-termini of coat proteins. Phytopathology 79, 223-229.
SHUKLA, D. D., TRIBBICK, G., MASON, T. J., HEWISH, D. R., GEYSEN, H. M. & WARD, C. W. (1989c). Localization of virus-specific and group-specific epitopes of plant potyviruses by systematic immunochemical analysis of overlapping peptide fragments. Proceedings of the National Academy of Sciences, U.S.A. (in press).
TAKAHASHI, Y., KATO, K., HAYASHIZAKI, Y., WAKABAYASHI, T., OHTSUKA, E., MATSUKI, S., IKEHARA, M. & MATSUBARA, K. (1985). Molecular cloning of the human cholecystokinin gene by use of a synthetic probe containing deoxyinosine. Proceedings of the National Academy of Sciences, U.S.A. 82, 1931-1935.
VAN DER VLUGT, R., ALLEFS, S., DE HAAN, P. & GOLDBACH, R. (1989). Nucleotide sequence of the 3'-terminal region of potato virus YN RNA. Journal of General Virology 70, 229-233.
YU, M. H., FRENKEL, M. J., McKERN, N. M., SHUKLA, D. D., STRIKE, P. M. & WARD, C. W. (1989). Coat protein of potyviruses. 6. Amino acid sequences suggest watermelon mosaic virus 2 and soybean mosaic virus-N are strains of the same potyvirus. Archives of Virology 105, 55-64.
ZITTER, T. A. (1972). Naturally occurring pepper virus strains in Florida. Plant Disease Reporter 56, 586-590.
(Received 14 March 1989)