E P. Y P L. D M. A. Y. I H E K G M. S M. D. S P L V V. T T A N. T A. V. P P T. 886 .... S D 1486. G CG C C AC AU GUC~ CUUC U/~JkGAC J C C UUCAUG~.
2357
Journal of General Virology (1991), 72, 2357 2365. Printed in Great Britain
Complete nucleotide sequence and genetic organization of grapevine fanleaf nepovirus RNA1 C. Ritzenthaler, 1 M . Viry, ~ M . Pinck, 1 R. Margis, l t M. Fuchs 2 and L. P i n c k 1. l lnstitut de Biologie Molkculaire des Plantes du C N R S et Universitd Louis Pasteur, Laboratoire de Virologie, 12 rue du Gdnkral Zimmer, 67084 Strasbourg Cedex and 2Station de Recherches Vigne et Vin, Laboratoire de Pathologie Vdgktale I.N.R.A., 28 rue de Herrlisheim, 68021 Colmar, France
The nucleotide sequence of the genomic RNA 1, 7342 nucleotides (nt) of grapevine fanleaf virus strain F13 (GFLV-FI 3) has been determined from cDNA clones. The complete sequence contained only one long open reading frame (ORF) of 6852 nucleotides extending from nucleotide 243 to 7101. The putative polyprotein encoded by this ORF is 2284 amino acids in length with an Mr of 253K. The location of genome-linked protein and comparison of the primary structure of the 253K polyprotein to that of other closely related viral proteins of the picornavirus-like family allows the proposal of a scheme for the genetic organization of GFLV-FI3 RNAI. The primary structure of the
polyprotein includes a putative RNA-dependent RNA polymerase of 92K and a cysteine protease of 25K. This protease shares not only major structural homologies, particularly in the snbstrate-binding pocket, with the trypsin-like serine proteases of other picorna-like viruses, but also their specificity in terms of cleavage. The large region of Mr 133K upstream of the VPg was found to contain at least two domains, one of which could be easily aligned with the NTP-binding sequence pattern and another which may have the characteristics of a protease eofactor. Thus, the 253K protein possesses the same general genetic organization as the corresponding protein of other picorna-like viruses.
Introduction
1990) and 37K for RNA3 (Fuchs et al., 1989). Studies on the distribution of genetic functions between genome segments of nepoviruses have shown that RNAI is able to replicate independently in protoplasts (Robinson et al., 1980) and thus encodes the information necessary for viral replication. Moreover, in vitro processing studies have shown that the RNA 1-encoded polyprotein is able to catalyse the cleavage of the 122K protein into two proteins of 68K and 58K and thus contains a protease activity (Morris-Krsinich et at., 1983). The coat protein (CP) cistron has been precisely positioned within the polyprotein encoded by RNA2 and it has been shown that the CP (Mr 56K) is produced by proteolytic cleavage of the 122K polyprotein at an Arg/Gly site (Serghini et al., 1990). Recently, the VPg protein linked to the 5' end of GFLV strain F13 RNAs has been microsequenced and mapped on RNA1 (Pinck et al., 1991). In this paper we present the complete nucleotide sequence of GFLV RNA1. Together with the sequence of GFLV RNA2 (Serghini et al., 1990) this completes the characterization of the genome of GFLV-FI3. In addition, sequence comparisons have been made between RNA1 species of several nepoviruses [GFLVFI3 and two closely related viruses, tomato black ring virus (TBRV; Greif et al., 1988) and Hungarian
Grapevine fanleaf virus (GFLV) is a member of the nepovirus group and is responsible for an economically significant disease in vineyards. Its genome is composed of two single-stranded positive-sense polyadenylated RNAs which carry a genome-linked protein (VPg) at their 5' ends (Pinck et al., 1988). The length of GFLV RNAs was estimated at about 6800 nucleotides (nt) upon gel electrophoresis under denaturing conditions for RNA1 (Pinck et al., 1988) and at 3774 nt for RNA2 after cloning and complete sequencing (Serghini et al., 1990). An additional smaller satellite RNA (Pinck et al., 1988) has also been identified and is composed of 1114 nt (Fuchs et al., 1989). When translated in vitro in a wheatgerm extract each viral RNA induces the synthesis of a polyprotein corresponding to its entire coding capacity, with apparent Mr values of 225K for RNA1 (Pinck et al., 1988), 122K for RNA2 (Serghini et al.,
t Permanentaddress:EscolaTecnicaFederalde Quimicado Rio de Janeiro, 121 Rua SenadorFurtado, 20270 Rio de Janeiro, Brasil. The sequencedata reportedappearin the EMBLdatabankunderthe accession number D00915. 0001-0366 © 1991 SGM
2358
C. Ritzenthaler and others
grapevine chrome mosaic virus ( G C M V ; Le Gall et al., 1989)], a comovirus, cowpea mosaic virus ( C P M V ; Lomonossoff & Shanks, 1983) and polio picornavirus (human poliovirus, HPV, Sabin strain; Kitamura et al., 1981). A search for consensus sequences for R N A polymerase, protease and nucleic acid-binding protein domains, as well as a search for potential cleavage sites has allowed us to locate precisely these presumed domains within the polyprotein encoded by the R N A l of GFLV.
Methods Virus purification and nucleic acid extraction. These were as described previously (Pinck et al., 1988; Serghini et al., 1990). Synthesis and cloning of double-stranded cDNA. Partial clones containing the Y-terminal sequence of GFLV-FI3 RNA1 were constructed by cDNA synthesis using oligo(dT)-tailed pUC9 primer extension cDNA synthesis (Heidecker & Messing, 1983). First-strand cDNA synthesis and subsequent double-stranded cDNA synthesis were performed as already reported (Serghini et al., 1990). The 5' end of RNAI was cloned using the primer P1626 complementary to nt 77 to 94 of RNA 1. The first strand cDNA was either oligo(dC)- or oligo(dA)-tailed in a 50 ~tl reaction mixture containing 20 units terminal deoxynucleotidyl transferase, 0.1 mMdCTP or -dATP in the tailing buffer from Bethesda Research Laboratories. The mixture was incubated for 20 min at 37 °C and the enzyme was then inactivated by heating at 70 °C for 5 min. After removal of non-incorporated nucleotides the tailed cDNA was amplified by 50 cyclesof the polymerasechain reaction (PCR) (Saiki et al., 1988)with the 5' primer oligo(dG)or oligo(dT)respectivelyand the primer P1626. Melting, annealing and polymerization times and temperature were 90 s at 94°C, 120 s at 45 °C and 60 s at 72°C, respectively, under the conditions specified for Thermophilus aquaticus polymerase (Perkin Elmer Cetus). The amplified DNA was phenolextracted, phosphorylated and ligated with SmaI-linearized Bluescribe M 13+ (Stratagene) before transformation of Escherichia coli TG2. Screening of the recombinant clones. The recombinant plasmids were selected for the presence of the viral cDNA inserts using either nicktranslated RNAl-specific cDNA probes (Pinck et al., 1988)or 5'32Plabelled specific oligodeoxynucleotideprobes. Nucleotide sequence analysis. Sequencing was performed on both strands of overlapping clones with the methods previously described for GFLV RNA2 (Serghini et al., 1990).
Results and Discussion P r i m a r y structure o f R N A 1
The primary sequence was established by sequencing both strands of c D N A inserts from three sets of clones obtained from independent cloning experiments. The clones containing the largest insert of each set were : (i) clone pA87 corresponding to 2538 nt from the 3' end of R N A 1 , (ii) clone pD67 for 4296 nt (from nt 658 to 4954) and (iii) clone pR1 (from nt 44 to 4954). The 5'-terminal
clone was produced by P C R after dC- or dA-tailing of the first strand c D N A as detailed in Methods. The comparison of the sequence of two sets of Y-terminal clones unambiguously indicated the 5' end sequence of RNA1 as A U G A A A A U U U .... The complete nucleotide sequence of GFLV-F13 RNA1 is presented in Fig. 1. The R N A is 7342 nt long, excluding the Y-terminal poly(A) tail. This length is slightly more than the 6800 nt estimated by denaturing gel electrophoresis (Pinck et al., 1988). Six sequence heterogeneities were found on separate clones, four were single nucleotide changes in positions 112, 2204, 6930 and 7011 which resulted in two amino acid changes, E into Q and D into N at postions 2230 and 2257 of the O R F respectively. In one subclone of pA87, two additional U residues were found near the 3' end (Fig. 1). Computer analysis of the RNA1 sequence revealed the presence of a single large O R F of 6852 nt in the positivesense orientation from nt 243 to the U A A termination codon at position 7095. No other long ORFs of more than 252 nt (from nt 6175 to 6426) in the positive-sense orientation and less than 867 nt (from nt 932 to 1798) in the negative-sense orientation of R N A s were found. The A U G codon at position 243, the first potential initiation codon of the sequence, is in a context known to enhance translation in eukaryotic cells, with an A in position - 3, a C in position - 2 and with the G C C motif in position - 4 to - 6 (Kozak, 1989). Assuming that translation begins at the first A U G , the translation product would comprise 2284 amino acids with a calculated Mr of 252786 (253K) which is slightly more than the 225K estimated by 10% S D S - P A G E for the major translation product of RNA1 synthesized in wheatgerm extracts (Pinck et al., 1988). Sequence homologies between R N A 1 and R N A 2 o f G F L V and other nepoviruses
Comparison of the nucleotide sequence of RNA1 with that previously determined for R N A 2 (Serghini et al., 1990) showed extensive similarity in the 5' and in the 3' non-coding regions. The 5' non-coding region of RNA1 is 10 nt longer than in R N A 2 , assuming that the R N A 2 A U G codon at position 233 (the second in-phase A U G ) is the initiation codon. The global similarity between these two sequences is close to 80%, with the highest degree of homology within the 130 first nucleotides. The sequence U G A A A A U U U from nt 2 to 10 is part of the 5' consensus sequence already reported for nepovirus R N A s (Fuchs et al., 1989). The 3' non-coding region of RNA1 is longer by 35 nt than in R N A 2 with an average similarity of 80% as observed also in most of the multipartite single-stranded positive-sense R N A viruses. The 3' similarity is reduced
G F L V R N A 1 sequence and genomic organization
to less than 50~ when compared to the corresponding regions of RNA1 of the two other nepoviruses so far sequenced, TBRV (Greif et al., 1988) and GCMV (Le Gall et al., 1989), which have nearly identical 3' noncoding sequences. Comparison of the 253K polyprotein with other viral proteins and search for significant motifs in the amino acid sequence
The nepoviruses, as members of the picornavirus-like superfamily (Goldbach et al., 1990 and references therein) are similar to comoviruses, potyviruses and picornaviruses in many ways, having a single-stranded positive-sense RNA genome, ending with a VPg and a poly(A) tail, and being translated from the genomic RNA(s) into a polyprotein further cleaved by (a) virusencoded protease(s) and/or autocatalyticatly to yield the mature viral proteins. Thus the 253K protein potentially encoded by GFLV RNA1 is a polyprotein. This is confirmed by an in vitro translation experiment showing that the primary translation product of GFLV RNA1 is processed (Morris-Krsinich et al., 1983) as has also been shown for other nepoviruses (Fritsch et al., 1980; Forster & Morris-Krsinich, 1985; Demangeat et al., 1990). To map the mature viral proteins on the precursor, we aligned the RNAl-encoded polyprotein (253K) with that encoded by other nepoviruses TBRV (254K) and GCMV (250K), comoviruses (CPMV, 200K) and a picornavirus (HPV, 247K) using computer-assisted sequence comparison with the software of Devereux et al. (1984). This procedure revealed domains of homology with a number of conserved residues characteristic of viral R N A polymerases, proteases, nucleotide-binding proteins and protease cofactors. At the C terminus of the polyprotein of GFLV R N A 1, four blocks resembling conserved motifs of RNAdependent polymerases were found between residues 1726 and 1876 (Fig. 2). The similarity ranged from 65 to 7 1 ~ for equivalent regions of GFLV, TBRV and CPMV. One of these motifs corresponds to the Y G D D span described first by Kamer & Argos (1984) and which is included in the third domain of the recently extended consensus sequence for RNA-dependent RNA polymerases (Poch et al., 1989). As proposed by these authors the four motifs may cooperate to form a well ordered domain described as a 'polymerase module' implicated in template seating and polymerase activity. Major structural homologies have been described between proteases that are encoded by animal picornaviruses, plant comoviruses, potyviruses, nepoviruses and cellular/viral trypsin-like serine proteases (Bazan & Fletterick, 1988; Gorbalenya et al., 1989). A triad of highly conserved His, Asp and Cys residues equivalent to the catalytic triad
2359
His, Asp and Ser of trypsin family proteases (Kraut, 1977) are found at structurally similar positions in all of these viral proteases. Other conserved residues have been described and could play an essential structural role or could contribute directly to the homologous catalytic and substrate-binding properties of the serine proteases (Bazan & Fletterick, 1988, 1989; Gorbalenya et al., 1989). The region of the GFLV 253K polyprotein extending from amino acids 1276 to 1443 can be easily aligned with blocks which display significant similarity between viral cysteine and trypsin-like serine proteases and which surround the three active site residues (Bazan & Fletterick, 1988) (Fig. 3). Although structurally very similar, nepovirus proteases (the GFLV putative protease, as well as the TBRV and GCMV protease domains) present some differences from the consensus sequence, especially in the amino acids that have been suggested to be important in forming the substratebinding pocket in large serine proteases (Bazan & Fletterick, 1988). The major point of difference consists of a His to Leu substitution at position 161 (numbering according to the poliovirus 3C protease) in nepovirus proteases. This amino acid is thought to be the key to the Gln/(Gly,Ser,Met) cleavage specificity of picornavirus-like proteases by hydrogen bonding to the Gin ( - 1 position) immediately upstream of the cleavage site. Another point of difference is a Pro Thr to Glu (Ser,Ala) substitution in positions 141 and 142. Amino acids in these positions are also thought to be important in forming the substratebinding pocket in serine proteases. All these changes make nepovirus cysteine proteases more typical of trypsin-like serine proteases in terms of substrate specificity (Kraut, 1977) in that attack occurs preferentially at peptide bonds following an arginine or a lysine residue at position - 1. Indeed, sequencing of the coat protein of three nepoviruses (Brault et al., 1989; Demangeat et al., 1991; Serghini et al., 1990) provides evidence that the cleavage occurs at trypsin-type dipeptide bonds. The purine NTP-binding sequence pattern, consists of two separate motifs, the N-terminal 'A' site [consensus sequence: (G/A)xx(G)xGKS/T, where x may be any amino acid residue] and the C-terminal 'B' site [consensus sequence: < hydrophobic stretch>D(E/D)] as described by Gorbalenya & Koonin (1989). This motif is located at positions 776 to 796 (A site) and 827 to 838 (B site) in the 253K GFLV protein (Fig. 4). Homologies between this region of GFLV 253K, the GCMV and TBRV polyproteins, CPMV 58K protein and poliovirus protein 2C are shown in Fig. 4. This motif has been found in non-structural proteins encoded by a wide range of positive-strand RNA viruses and in the dsDNA viruses so far sequenced
2360
C. Ritzenthaler and others AUG~CCCAC~G~C~AC~ACCG~=GCiUC~CCOGUG~G/~G~&~C~C~C~A~ ~
C
C
6
~
G
~
~
&
~
~
~
A
~
~
~
i00
U
&
~
=
~
=
~
200
U A ~ ~ G ~ G U ~ ; ~ C C ; ~ A U ~ / ~ C C ~ C U C C ~ C ~ C ~ A ~ U ~ ~ ~ MW Q V p E G ~ ~C C C ~G K S ~
S
300 20
~A
~AGGc6;~G~cuccGcuacGuGu&cUcauG~&uGAGcAcacGuc~uu~cu~6ccuccuc~c&~uc~GU~u;Gc E
A
K
E
L
R
Y
V
C
S
C
W
M
S
T
R
L
V
K
GCC CACUCC CCUAAAAUCC~AGGGACCAUUCAGGUCUCACUCCCAAAGGCUACUG P
T
P
L
K
S
K
G
T
I
Q
V
S
L
P
K
A
A
E
A
P
P
Q
Q
S
R
K
400
S
G
I
A
53
GG GUCAAACC CAGUAUCCAUAAAUCCAAAG GAGC UUCUGUGGCU
T
G
V
K
P
S
I
H
K
S
K
G
A
S
V
A
86
ccuc~cc~~b~U~Uc~Ua~Uc~U~U~ccUccU~cc~6~~cU~ccc~c~u~U~~ P
A
P
120
AGUCC UCCAAUAUAGUGGUUC UGCCACCUAC CCAAAAGGUGGAAGUAAGGGUGC CAGUL~GC UGUGCACC CAAGUGGAUGGUGGCUAUCC CUAAGCCACC
700
S
L
N
K
I
Q
V
R
V
C
L
E
P
V
P
V
T
V
Q
Q
K
Y
V
G
E
P
V
P
R
A
V
D
P
I
600 K
S
L
500
V
E
C
L
C
V
A
Y
P
P
K
P
W
L
M
V
V
R
A
E
I
E
P
K
P
GGUGAAGCUAG CCCCUAAAGC UAGCAAGC UACG GUUUCC UAAAGGAGCCGUAGCUUACAAUGGUGUCAAUiK~AUUGACACUAAGGGGAAAGUC V
K
L
A
P
K
A
S
K
L
R
F
P
K
G
A
V
A
Y
N
G
V
N
F
I
D
T
K
G
E
K
V
P
154
GUCC UA
800
V
186
L
UCUGAGGC~GCAAAGAGC~UC CUUAGGGG GAUC CGCG UUGCUG CAAAG CAAC GUCUUC GUGCUGCGCGCAGGUCUGCUGCGUGCAAGAAG GUGAGGGC ~ S
E
G
A
K
R
I
L
R
G
I
R
V
A
A
K
Q
R
L
R
A
A
~
R
S
A
A
C
K
K
V
R
A
900 K
220
AGC GUGC UC [~'UC-CC GAG UUUGAG GCAAUC GUUC AAAG UG;~ACGAUL]GGAC CAGUUAAAGACUGG GUUC CAAG UG GUGC L~CC AG CAC C AAAG AUGAGCUG i000 R
A
C
253
CAGCUUAAAAGAAGCUGCUCC UUCUAC CACUUC UGUGGUGGUAGUUAAGAAGAG GAAGCUGCCAAG GC UUCCUAAAALIUCUG CCUGAGCAGGACUUCUCC
II00
S
L
L
A
K
E
E
F
A
E
A
A
P
I
S
V
T
Q
T
S
S
E
V
R
V
L
V
D
V
K
UGCUUAGAGGGCUUUGACUGGGGGGAGAAAUCCCACCCAGUUGAGG C
L
E
G
F
D
W
G
E
K
S
H
P
V
Q
L
K
K
R
T
K
G
L
F
P
Q
R
V
L
V
P
L
K
P
I
A
L
P
P
K
E
M
Q
S
D
F
S
286
UAGACAUCGAGGAUGAUUGGAUCCUCGUGGAAA~CCUGUCCUUAA~
E
V
D
I
E
D
D
W
I
L
V
E
K
P
V
L
K
1200
R
Q
A
CUG UGCAAACUGCACAG GGCAGG GCAACC GAGG CC UUAACUCGGUUUG CAGCUACCAGUGGCUUCUCACUGGGC GC CCACCAAAAGGUGGAAGAUUUCGC V
Q
T
A
Q
G
R
A
T
E
A
L
T
R
F
A
A
T
S
G
F
S
L
G
A
H
Q
K
V
E
D
F
1300
A
353
UUCG~CUGG~GAAGCGGAA~AUU~GA~GGCAGGAGA~U~GCAGACC~C~GCUUGCUAUCUUUGG~G~AUAAUGA~GCACCGACGUUGUCUGCGAC~AUU S
S
G
E
A
E
Y
L
M
A
G
E
F
A
D
L
C
L
L
S
L
V
Y
N
D
A
P
T
L
S
A
320
1400
T
I
386
GAG GAAC UCAG GGAUAG CAAAGAUU'JJC UAGAG GC UAUC GAAC UCC UC AAGUUGGAAUUAGC UGAAAUUC CAACAGAC UC CACUAC AUGU GC CC CAUUUA 1500 E
E
L
R
D
S
K
D
F
L
E
A
I
E
L
L
K
L
E
L
A
E
I
P
T
D
S
T
T
C
A
P
F
K
AACAA~GGGCUUCUGC~GCCAAGCAGA~GGCUAAAGGGGUUGGCAC~A~GGUUGGGGAUUUUACI~AGAGC~CC~G~G~A~~ Q
W
A
S
A
A
K
Q
M
A
K
G
V
G
T
M
V
G
D
F
T
R
A
A
G
A
A
1600
V
V
I
S
F
D
453
UAUGGCAGUGGAAUUuCUGCAAGAUAAAGCGUUAAAGUUCUGUAAAAGAAUCUUUGACGUAACAAUGGCUCCA~AUCUCCAGCAUCUuGCCAGUC.CACAU M
A
V
E
F
L
Q
D
K
A
L
U CC A U C C U U A A G A A G A U L K ] G G C ~ U U G U S
I
L
K
K
I
W
E
K
K
F
C
K
R
I
F
D
V
T
M
A
P
Y
L
Q
H
L
CAGAAUGGAU GGAAAGC CUCAAGAG]3AAAGCUAGUUUAG ~ G U ~ U G
L
S
E
W
M
E
S
L
K
S
K
A
S
L
A
L
E
V
A
S
A
CG ~ C M
R
Q
H
1700
H
486
GCCA~
1800
A
I
520
UCG CUUUAGGUGCUAUGGUuAUAGGAGGUGUAGUAGUGUUGGUAGAAAAAGUGCUUAUUGCAGCCAAAAUUAUCCCUAAUUGUGGGAUUAUUCUGGGUGC A
L
G
A
M
V
I
G
G
V
V
V
L
V
E
K
V
L
I
A
A
K
I
I
P
N
C
G
I
L
CUULK/UGACAC LKSLVJCUUUGC UAGC CI3GG GGUU AACAGC C CUG GAGUG CACUGCAGAG GAAAUC UU C A G A A U G C A U ~ ~ G ~ G U ~ G F
L
T
L
F
F
A
S
L
G
L
T
A
L
E
C
T
A
E
E
I
F
R
M
H
A
C
420
C
1900
G
A
553
~ CUA~AC
K
S
A
I
2000
Y
586
UCCAuGUAUI.~UGUUC`CAGAGCCUACUAUGGCuC~L~AGGGAG1~UCuCACACUAUC~C`GC`CGACUC.AC,G~CUUGAU1~UGC1~CAG~CCUGACUC 2 1 0 0 S
M
Y
V
A
E
P
T
M
A
D
E
G
E
S
H
T
M
G
A
T
Q
G
L
D
N
A
I
Q
A
L
T
R
2200
GAGUAGGACAGAGUAUGAU~AGCUUCAAAcUGGGGAGCUUUUCAUAUUAUGCAAAGAUAGCCCAGGGGUUUGAcCAACUUGCAAGGGGUAAGcGAGcAAU V G Q S M I S F K L G S F S Y Y A K I A Q G F D Q L A R G K R A
AGGC b~3/~CLrt JACUAGUUC~ CUC~UC C~UC UUGU r.~3~ G UAUUUACUC CCAGG~.gJCUC-GAC AGC ~ G U ~ ~ A C G E L T S W L I D L V G S I Y S Q V S G Q E S T F F D
~ E
~ L
UA~G T I
S
620
I
653
~
2300 686
V
UC-CCUAGAUGUGAGAGCAUGGCUACUGAAAAGUAAGC GUGUUCGGUUGCAAGUGGAAACAAUGGCCAUAGGC GAUAGAAUAACCUUGGAUACUAUUGC CA 2400 C
L
D
V
R
A
W
L
L
K
S
K
R
V
R
L
Q
V
E
T
M
A
I
D
R
I
T
L
D
T
I
A
K
AAUUGUUAGAAGAAG GC CACAAGAUAC UG GUCACUGC GG CAGG UGUIK~CAAG GAAAAC GU CUGC UGAUT.ILK]ACAAUGUGCAUCAAG GAAG A A G U G U C U ~ L L E E G H K I L V T A A G V P R K T A D F T M C I K E E V S K
720
2500 753
GUUGGAAGAAG UGCAUGCCAGAAC G GCAUGUGC GG GAAUCAAUGAG GG UAUG C GAG CUUUUC CUUUL~GG GUGUACAUULrOUGG UG CUUCACAGUC UG GG 2600 L
E
E
V
H
A
R
T
A
C
A
G
I
N
E
G
M
R
A
F
P
F
W
V
Y
I
F
G
A
S
Q
S
G
786
AAGACAACAAUAGCAAAUU CAAUCAUUAUUC CAGCUUUG CUGGAAGAAAU GAAUCUUC CCAAGAGC UCAGUUUAUUCCAG GC CCAAGACUGG UG G G U L ~ K
T
T
I
A
N
S
I
I
I
P
A
L
L
E
E
M
N
L
P
K
S
S
V
Y
S
R
P
K
T
G
G
F
2700 W
820
GGAGUGGUUAUGCUAGG CAAG CAUG UGUGAAAG UIjGACG AUUUUUAUG CAAUUGAG CAGACC CC UAGC CU GG CGAGUUCUAUGAUUGAUG UG GUGAAUL~ 2800 S
G
Y
A
R
Q
A
C
V
K
V
D
D
F
Y
A
I
E
Q
T
P
S
L
A
S
S
M
I
D
V
V
N
S
853
AGAAC CUUAUC CC CUCGACAUGGCUUACAUUCACGAGAAGG GAAUG UCAAUGGAUUCUCCAUUAGUUGUAAC CACUGCAAAUAC CGC C GUGC CUCC UACC 2900 E
P
Y
P
L
D
M
A
Y
I
H
E
K
G
M
S
M
D
S
P
L
V
V
T
T
A
N
T
A
V
P
P
T
886
GFLV RIVAl sequence and genomic organization
AAUUC UCAGGUGG UGGACUUG CC GUCAUDI/~AUAACAGGAGGGCGG CG GUAC UG GAAGUACG C A G G A A G G A U G G ~ ~ A C A C N S Q V V D L P S F Y N R R A A V L E V R R K D G S F F
AIIUCAL~C~Lrt/C,/~GtNJCGCUUCAUC~AU/~U~I3[~UCCGUAL~IJI~AUIX~UGU J ~ ~ ~ C C ~ A G ~ CC~ A G S
C
I
E
V
R
F
M
H
N
K
C
P
Y
V
D
S
A
G
V
P
Q
G
P
T
P
U ~ C C
A
V
N
T
2361
C G C G ~ C G U A ~ 3000 R A Y D 920
C P
A
M
~
~
D
3100
E
G
953
AUGGAUCAC UC CAAGUGAGGC UG UUGCAGUUCUCAAAAAUC UC UUG GG UGAACACAUUUUGG CUGAGGAAGCAAAGCUC-CUGGAAUACAGAGAG CG GAUU 3200 W I T P S E A V A V L K N L L G E H I L A E E A K L L E Y R E R I 986 3300 1020
GGUAAUGACCAUCCCAUAUAUAACGCAGCAAAGGAAUuCAUAGGCAACAUGCAcUAuCCUGGGCAGUGGCUGACUGCUGAACAGAAGAGUACCUAUGGAA G N D H P I Y N A A K E F I G N M H Y P G Q W L T A E Q K S T Y G
UCAAG GAUGAUGGAUUCUCLKli~ CUUGCG GUUGAUGGAAAGAUAUACAAG UACAAUGUAC UG GGUAAGUUGAAUCCAUGUGAGUCUGAAC CACCACAUCC 3400 K D D G F S F L A V D G K I Y K Y N V L G K L N P C E S E P H P 1053
C ~ U G UGAU-clCC.AUG~,rtJG~ u ~ . G ~ . U U A G ~ U U G N
V
I
P
W
L
E
K
K
T
L
r~CAUUGGGAUGUGCA U / ~ C AUAUI.X3CCAC~ GUCCCC~ ~ U G ~ C U ~ ~ G C A U ~
E
I
V
H
W
D
V
H
K
H
I
A
T
G
P
LKKJI~GCAGGGCUUGG U C C / ~ A C / ~ I ~ 3 ~ G U A G A G A G I.X3UGG/~CGUAUGGGG.~.C~.UAG~C UCCG~ C F
L
Q
G
L
V
Q
G
Q
S
K
V
E
S
V
E
R
M
G
K
D
S
S
P
E
R
N
A
L
V
A
A A C ~ ~ ~ C Q
Q
N
F
F
K
C
3500 1086
GC~
3600
L
1120
R
GUUUAUCCGAGAGAAUUUAUC UGAGGCUGUGCCAAAUCC GCAUUGAUAAUAUCCAGAAAGAAGAGC UGGCGG GUUCUGGUAGAG GGCC CAUGC-CAAUAUU 3700 L S E R I Y L R L C Q I R I D N I Q K E E L A G S G R G P M A I L 1153 GAGAGAGUGUC UGAUGAAGAGUAAG CAGGUAGUGGUGGAAAAC UACUCAUUG CUAUUGACAUUG GUGG CUAUUC UCUUGCUUAUCAGUGCUGCUUACACG 3800 R E C L M K K Q V V V E N Y S L L L T L V A I L L L I S A A Y T 1186 C UACUCUCAACAGUGGUGGCUCUGG CGGGUUGCUC U A G ~ C U G G U G G U A U G G U U G L L S T V V A L A G C S S F A G G M V
A
CAGUGACGGCUGUCAACAAUGCUUCUAUACCGUGUUCAGAGC 3900 V T A V N N A S I P C//S E P 1220
CUC GCUUGGAGGAAAGAUAUUCCCCUAGAA UC GUUUUG UC UC GCGAAUUUCUAAAAUUAGGGG UGAG GGAC CUUC CAAAGGACAAGGUGAGCAUGAGGA 4000 R L E E R Y S P R N R F V S R I S K I R G//E G P S K G Q G E H E E 1253 AUUGGUUACUGAGCUCUAUUACUAUUGUGAUGGAGUUAAGAAGCUAAUUUCCACGUGUUGGUUUAAGG GAAG GUCACUCUUGAUGACGAGACAUCAAGCC 4100 L V T E L Y Y Y C D G V K K L I S T C W F K G R S L L M T R H Q A 1286 C UGGC CGUUCCGAUC GGCAAUGAAAUUGAAGUAAUAUACGCUGACGGAACAACAAAAAAGUUAGUUIX~C,CCAGGCAGG CAGGAG GAUGGCAAtKIGCAAAG 4200 L A V P I G N E I E V I Y A D G T T K K L V W P G R Q E D G N C K G 1320 GCUUCGUCGAAUUCC CUGAAAAUGAGUUAGUUGUCUUUGAGCAUCCACACUUAUUGACACUGCC CALrOAAGUAUGAGAAGUACUUUGUUGAUGAUGCCGA 4300 F V E F P E N E L V V F E H F H L L T L P I K Y E K Y F V D D A D 1353 UAGACAGAUUUCC CCCAACGUUGCGGUUAAGUG UUGCGUUGCGCGCUUAGAAGAUGGAAUUC CCCAAUUC CACUtru]UGGAGCAAAUAUGCAACAGCCCGC 4400 R Q I S P N V A V K C C V A R L E D G I P Q F H F W K Y A T A R 1386
AGUG/~G UUCAUACGC U ~ C ~ U G AGGGCGGGGG~J~UGUCUACCAG/~C/~GAU~u~C~ CGCUAUAU-UGU[~AUGCGCAUG.~.GC/~XAG/~GUAUGA~ 4500 S
E
V
H
T
L
K
D
E
G
G
G
N
V
Y
Q
N
K
I
R
R
Y
I
V
Y
A
H
E
A
K
K
Y
D
C 1420
GI..~C42CUUGGCuGUGC~UGUGAUCC1~GG~uCCC1~l~G[)C-AUCGC1~UGCI3UG~~UAGAG~Gu~CCuACUC~CUG~AU~C~ 4600 G
A
L
A
V
A
V
I
Q
G
I
P
K
V
I
A
M
L
V
G
N
R
G
V
T
Y
S
S
V
I
P
N
1453
CUACAGUUCUuCUUUUAUuAGGGGAGAAGUGCCAUAUGUACCAGAAGAUGGGUUAGUGUCGAGGGGAUAUAGGAAAGUGGGUUAUUUGCACGCAUCGGAU 4700 Y S S S F I R /G E V P Y V P E D L V S R G Y R ~[ V G Y L H A S D 1486
GCGCCACAUGUC~CUUCU/~JkGAC~ J CCUUCAUG~.GGUACCAGAI~/~LrtJGUC.42UUI~C~UAIX~CUGAUCCU/~GCAG CCCGCCAUCCI~AGCGCA~GG 4800 A
P
H
V
P
S
K
T
S
F
M
K
V
D
E
L
C
F
P
Y
P
D
AL~/~CGCCUUI~.GCIG/~CC~UCCAUGAAC~AUACACUCCGITkJ~GG~N£]C.4~CAUG~GAAG~ E
R
L
K
G
T
I
H
E
G
Y
T
P
L
R
D
G
M
K
K
F
E
V
A
G
D
M
V
Q
T
W
Y
D
P
G
E
F
L
E
D
I
K
Q
P
A
CU~GC ~ U G U ~ A
CGAUG/~GUUGCAGI3UGACi%UGGr.~C.AGACGUGGUAUC~CCCGGGUC~IrtJCc u r ~ / ~ G A U A ~ D
P
E
~ S
P
A L
M
L
L
L
A
N
A
E
~ C
E
UA~UG
Q
S
CU~ U A ~
Y
~
D
I
E
D 1520
UACU 4900
K
L
L
GG~CAUG ~ A G G
D
M
D
1553
5000
E
1586
GAAUAUUUUGACCCC CUAGUGAUGGACACGUCUGAAGGAUAUCCCGAUGUUUUAGAUC GUAAACCUGGGGAAAAGGGAAAGGCGAGAUUCLK/UGUUGGGG 5100 E Y F D P L V M D T S E G Y P D V L D R K P G E K G K A R F F V G E 1620
1~cCAGG~UAGAGCCUUI.X3UAGCUGGUL~U1~UCCUG~GC~r~ALrtJAcc1~CUAGAA~C~ACUCU~C~GAUACC~CC~GG~AGUAU 5200 P
G
N
R
A
F
V
A
G
C
N
P
E
AGAt~CCC C / k ~ C ~ C G A G A G A U U ~ G / ~ G U ~ U E
T
P
K
D
E
R
L
K
R
S
K
K
A
Y
Y
Q
L
E
E
D
CGAUACUCCUGGUAC~GALrtJ A ~ C I
D
T
P
G
T
R
L
F
S
K
T
K
I
P
S
V
UGUGCUGCCCLr~C~~.UAU/~UCUC~ S
V
L
P
L
A
Y
N
L
L
S
I
~ L
5300 R
1686
5400 1720
GUAAAAUUuuUGUCU•UUUCCCGCCUACUGAuGAAGAAGAGAGGCCACCUGCCUUGUCAAGUGGGCAUCAAUCCUUAUAGUCGCGAAUGGAcCGAUUUGU V K F L S F S R L L M K K R G H L P C Q V G I N P Y S R E W T D L AUCACCGCUUAGGUGAACUCUCUGAUGUCGGAUACAAUUGUGAUUAUAAGGCUUUUGAUGGCCUAAUUACGGAGCAAAUUUUGAGUAcGAUCGCCGAUAU H R L G E L S D V G Y N C D Y K A F D G L I T E Q I L S T A D
1653
M
5500 1753
GAUCAAUGCUGGAUAUC GUGACC CUGUCG GCAAUAGGCAGAGGAAAAAUUUG CUCCUAC-CAAUAUGUGGACGCC UCUCUAUCUGUGGAAAUCAAGUGUAU 5600 I N A G Y R D P V G N R Q R K N L L L A I C G R L S C G N Q V Y 1786 G CCACUGAAGCAGGCAUACCUUCAGGCUGUGcUCUCACAGUAGUACUCAAUUCCAUUUUUAAAGAACUACUGAUGAGAUAUUGC UUCAAGAAAAUAGUUC 5700 A T E A G I P S G C A L T V V L N S I F K E L L M R Y C F K K I V P 1820
C. Ritzenthaler and others
2362
C C C C U G U G U A C A A G G A A U G U U U U G A C A G A U G UG U C G U G C U C A U U A C C U A U G G U G A U G A U A A U G U C U U C A C U G U G G C A C A A U C UG UCAUC-CAG UALrOUUAC 5800 P V Y K E C F D R C V V L I T Y G D D N V F T V A Q s V M Q Y F T 1853 U G G C G A U G C UC U G A A A A U G C A A A U G G C A A A G C U U G GG GUAACUA[]UACUGAUGGGAAAGAUAAG UC UC U U U C C A C U A U U C C A G C CC G U C C A C UG C U G G A A G D A L K M Q M A K L G V T I T D G K D K S L S T I P A R P L L E
5900 1886
UUAGAGUUUUUGAA L E F L K
6000 1920
C G U G G A U U U G trOAGAAGC UC UG GG GG U A U G A U A A A U G C G CC tK]t~ G A A A A A U U A U C A A U A A U G A G U U C U U U G G UC U A C A U C A G A A R G F V R S S G G M I N A P L E K L S I M S S L V Y I R
GUGAU~C U C . . A ~ L ~ UUGCAG~I~CUAtJUGGAC.~u~UG~ J ~ U A C UG~CUL~3b~ GAGCUUUA~UACA~ ~ U A ~ C D
G
S
D
M
L
Q
K
L
L
D
N
V
N
T
A
L
V
E
Y
L
H
G
D
~
R
T
A Y
~
F
AG~ A G 6100 R
1953
A G C U U U U U A C U U C G A G A A A C U C C C U C C UG GC GC U U A U A A G G A G U U G A C A A C G U G G U U U C A A G C A G A A U C A U U [ ~ A U G A G U G U C A G A A G A G U G G A G A ~ G U A F Y F E K L P P G A Y K E L T T W F Q A E S F H E C Q K S G E S
6200 1986
GGUUAUAAG( C A C A A G G G C u C A U U G A G A U u A G U C A U G G A G C A G C U u U u G C C A G U U U u A C U C A A C A A G C U G G U A C A G A G U u G G A A A A G C A U G A C A U u U G C C G Y K P Q G L I E I S H G A A F A S F T Q Q A G T E L E K H D I C
6300 2020
CUGGCUUAUCGALK~ CAGGAACUAAGUACAUUGCUACUGAGAAUGAGAUUGUUUUG G L S I A G T K Y I A ,v E N E I V L
L
E
S
V
UCUC UUAG U U C C G U C C UACC C G G G G A U A G A A A U G U U U U C A A G UU 6400 5 L S S V L P G D R N V F K L 2053
G GAUC U G C C U U G U G G A G A C G G G A U A G G GC G U U U G C C U U C CAAAUGC A G U A U C U U A A A C U U G A A A A C C G G G C U U G G U U A U G A G A U U G U G C A A G C G U G CC 6500 D L P C G D G I G R L P S K C S I L N L R K P G L V M R L C K R A 2086 C A A G A U G A A A A G A A G A C C U U A G U U A U U C G U G A C G A A A G G C C A U A C A U U G G U G C A U G G G C A G U U G C U U G C A U A U G U G G A G A G A G U U U U G G C U U C G GGCAAC 6600 Q D E K K T L V I R D E R Y I G A W A V A C C G E S F G F G Q Q 2120 A G A G C GUrJCUUGC GCUUr,]AUGCC A A C U U G U U G G G A C C AAAUCAGAGGAAUGGCUi]AGC C A G i , R ] A ~ C U G S V L A L Y A N L L G P N Q R N G L A S Y F S
D
A~G~GUC F E
S
P
CC A ~ C A U A U C ~ G ~ I I K K
6700 2153
AGUCCAUGC CAAAACAAACUC UUAUGAGGGG GGUGAAGC UUUAAAG GAAAUUUUCACUUUUUGUGAGACUAUUUUCUAUGAAGCCACCGAAAUGGAUACU V H A K T N S Y E G G E A L K E I F T F C E T I F Y E A T E M D T
6800 2186
A G G A A A G U G A U G U U G C A A A A U C A A C C A G A C G U C U A U C C U A G U A U A A G U C U U G UUGG GGGG GUUUGKJ[Z3CCCAAAUGAGGGAGGAGAGCCUGGAGCCAUGU R K V M L Q N Q P D V Y P S I S L V G G V C F P N E G G E P G A M C ACUCGGAAACAGAUGUUAC GAUGGC UAGAGAGGUUCAAGGAGUCUAUGUAAG UGAAGC GUGCGUGAAAUGUUGCAGGC GUUGUGUAGGAG UAGCAACCAG S E T D V T M A R E/QV Q G V Y V S E A C V K C C R R C V G V A T R .A
6900 2220 7000 2253
GGLF~GUGACUGAUACGC1~3UUUUGGC1~C~.UcUUL~1h~GACUcAUCUUl~GGCUI~G~GG~r`~CA~UCAUACAUGCCUUAG~U~GcC7100 V
V
T
D/NT
Q
L
F
G
N
N
L
L
K
T
H
L
K
A
L
R
K
I
Q
N
H
T
C
L
R
K
2284
U U C C A A U U C LK]GGUAC'UGGGAUAAC CAAG[Z/UAAAUAAC CC AG UL]UCUUrb~JGCC LKIiK:AUCgZUC U U U U A G G C A A U U G A A U A U U A G A G C A U U U G U G ~ A ~
7200
GUuUGUUUUAACUUGuu[yJCUGCUUUAGUGUGUUUUCUUUCAUGCUUUUAGUGGCGACAGUGUGUUGUUUGUCCUUUGGAAcCACUUGCCUUGUUGGACG +U . . +U CAAAAAGAULrtK]CULK]UUTJC ULK]UUACUGUUAUG C A A A U U U U 7342
7300
Fig. 1. Complete nucleotide sequence of G FLV-FI 3 RNA1 and deduced amino acid sequence of the large ORF. N ucleotide sequence heterogeneities are shown above the sequence. The bar under nt 2 to 10 indicates the correspondence with the 5" end consensus for nepovirus RNAs. The double slash bars (//) denote the cleavage sites in the VPg sequence indicated by the underlined residues and the single slash bars (/) indicate the putative R/G cleavage site between RNA polymerase and protease in the 253K polyprotein.
Cons GFLV TBRV GCMV CPMV HPV
DY D LSDVGVYCDyKAFDGL KTNEAINCDYSGFDGL KTNEAINCDYSGFDGL KGNDVLCCDYSSFDGL MEEKLFAFDYTGYDAS
43 47 47 44 38
G PSG T NS VYATEAGIPSGCALTVVLNSIFKELLM VYKVNCGLPSGFALTVVVNSVFNEILI VYKVNCGLPSGFALTVVMNSIFNEILI VWRVECGIPSGFPMTVIVNSIFNEILI IYCVKGGMPSGCSGTSIFNSMINNLII
21 21 21 26 17
YGDD DK LITYGDDNVFT 20 LGVTITDGKDKSL LLVYGDDNLIS 20 KKVKITDGSDKDA LLVYGDDNLIS 20 KRIKITDGSDKDA LTVYGDDNLIS 20 GGVTITDGKDKTS MIAYGDDVIAS 15 DYGLTMTPADKSA
Fig. 2. Alignments of the putative polymerase domains of the 253K GFLV protein (positions 1726 to 1876) with the four conserved motifs of the RNA-dependent RNA polymerases of TBRV (1712 to 1866), GCMV (1695 to 1849), CPMV (1427 to 1583) and HPV (1970 to 2106). The consensus sequence is indicated in the line 'Cons'. The bold faced residues show the extended consensus used by Poch et al. (1989). The lengths separating the motifs are indicated.
(Gorbalenya & K o o n i n , 1989). It has been proposed that all the N T P - b i n d i n g p a t t e r n - c o n t a i n i n g p r o t e i n s throughout the viral k i n g d o m m i g h t all be N T P a s e s i n v o l v e d in either (i) duplex u n w i n d i n g during D N A and R N A replication, transcription, r e c o m b i n a t i o n and
repair and possibly m R N A translation, (ii) D N A p a c k a g i n g or (iii) d N T P generation (Gorbalenya & K o o n i n , 1989; Lain et al., 1990). In light o f w h a t is k n o w n about the maturation o f virus-encoded polyproteins (for review see W e l l i n k &
G F L V RNA1 sequence and genomic organization
Cons GFLV TBRV GCMV CPMV HPV YFV TRP CHT
* GRSLLMTRHQALAVPIGNEIEV NKSVRMTRHQA/~RFQEGEQLTV NKCVI~MTRHQAQSLNEGDELSV LACKHFFTHIKTKLRVEIVMDG DNVA/LPTI]AS PGESIVIDG GVAQGGVRHTMWHVTRGAFLVR SQWVVSAAI:ICYK SGIQVRLG ENWVVTAABCGVTTSDV VVAG I 510 40
51 40 40 34 27 22 30 30
* VDDADRQISPN LEDKEVDLPNH LEDAEVELPSN CWDLFCWDPDK FRDIRQHIPTQ KEDLVAYGGSW NNDIMLIKLKS NNDITLLKLST J 0
48 48 48 46 41 35 64 66
pt *G NKIRRYIVYAHEAKKYDCGALAVAVI HEIPEKITFHYESRNDDC(~4IILCQI HEIPEKIVFHYESRNNDC(~MLLTCQL NKVSRYLEYEAPTIPEDCGSLVIAHI RQTARILMYNFPTRAC.-QCGGVITCTG RNGGEIGAVALDYPSGTSGSPIVNRN MFCAGYLQGGKDSCMGDSGGPWCSG DAMICAGASGVSSCMGDSGGPLVCKK I I I 130 140 150
4 4 4 4 0 1 0 4
2363
G h gg Cleavagesite KVIAMLVSGNR R/G RVVGI~VAGKD K/A KVVGMLVAGKD R/A KIVGVHVAGIQ Q/G,S,M KVIGMHVGGNG Q/G EVIGLYGNGIL K,R{G.S,V ICLQGIVSWGSG K,R/X TLVGIVSWGSS #,M,L/X I 160
Fig. 3. Alignment of the putative cysteine protease domains of GFLV (positions 1276 to 1443) with the similar domains of TBRV (positions 1263 to 1425), GCMV (positions 1248 to 1409), CPMV 24K (positions 983 to 1135), HPV 3C (positions 1594 to 1728), yellow fever virus (YFV) NS3 (positions 1525 to 1642) (Rice et al., 1985), trypsin (TRP) and chymotrypsin (CHT). The four similarity boxes encompassing the 'conserved' catalytic triads of His, Asp and Ser/Cys residues are indicated by (*). Residues conserved among all the large proteases are shown in upper case in the consensus sequence ('Cons' line). Residues thought to be important in forming the substrate-binding pocket in large serine proteases are presented in lower case. Cleavage specificity associated with the different (putative) proteases are presented in the 'cleavage site' column. The symbol # designates an aromatic amino acid. Positions indicated below the alignments refer to numbering of the poliovirus 3C protease.
van Kammen, 1988), several types of mechanism can be considered. The most common mode of expression is certainly the specific cleavage of a viral polyprotein catalysed by virus-encoded protease(s). For some viruses, additional cleavages indispensable for their expresTable 1. Cleavage sites in nepovirus RNA 1 and RNA 2-encoded polyproteins Cleavage sites in polyproteins encoded by GFLV RNA1 NBP*/VPg VPg/protease Protease/polymerase t Protease/polymerase$ GFLV RNA2 Coat protein TBRV RNA2 Coat protein:~ GCMV RNA2 Coat protein§
Amino acid at position -5
-4
A
[]
~
i
F
-3
-2
-1
+1
+2
+3
I
V
I I F
R R I
C G G R
S E E G
E G V E
P P P V
T
V
R
G
L
K
C
N
L
KIA
G
G
E
T
N
L
R/A
G
G
* NBP, Nucleotide-binding protein. t Putative cleavage sites depending on whether cleavage at G/E or R/G occurs. Demangeat et al. (1991). § Brault et aL (1989).
Cons GFLV TBRV GCMV CPMV HPV
*** g g GK s * a t 776 WVYIFGASQSGKTTIANSIII 776 W I Y L F G Q R H C G Y ~ S ~ T L D N 758 WIYLWGPSHCGKSNFMDVLGM 489 TIFFQGKSRTGKSLLMSQVTK 1248 CLLVHGSPGTGKSVATNLIAR A site
30 28 28 29 26
***D d e ACirKVDDFYAIE TFFHVDDLSSVK TIMEIDDLSSIK PFVLMDDFAAVV GVVIMDDLNQNP B site
838 836 818 550 1306
Fig. 4. Alignment of the putative NTP-binding domain of GFLV 253K with the corresponding regions of TBRV, GCMV, CPMV and HPV. The positions of the two sites in the polyprotein domains are indicated by underlined numbers. In the upper part, the consensus sequence ('Cons' line) shows invariant residues in upper case; (*) indicates hydrophobic residues. The distances separating the two motifs, designated "A site' and 'B site', are indicated.
sion are provided by host-encoded proteases and/or by autocatalytic cleavage (Arnold et al., 1987). Cleavage sites of viral proteases are usually characterized by a preference for certain amino acids at one or more of the upstream positions (positions - 1 to - 5 ) , ( B a z a n & Fletterick, 1988). Accordingly and in view of primary structure and cleavage sites of the RNA1- and RNA2encoded polyproteins of GFLV, GCMV and TBRV, we have attempted to establish a motif of conserved amino acids for the GFLV cleavage sites. Table 1 shows the amino acids surrounding the various experimentally determined cleavage sites in the nepoviruses and the two cleavage possibilities between the protease and the polymerase domain in GFLV. The amino acids downstream of the + 1 position in GFLV are not highly conserved and thus seem not to have an important role in determining whether a site is used for proteolytic processing. This is in good agreement with what has been described for other viruses (Wellink & van Kammen, 1988). Nevertheless, as shown in Table 1, Ser residues are always present at positions - 5 and/or - 4 in GFLV (boxed in Table 1) and consequently might be the key residues for the recognition of the site of cleavage by the protease(s). To predict other maturation sites in the RNA1 and RNA2 polyproteins of GFLV we tried to find the same conserved residues in their primary structure. Only the two proposed alternative cleavage sites, R/G or G/E, between protease and polymerase were found to have the same characteristics (Table 1). Note that the two sites are separated from one another by only one residue in the polyprotein sequence [R(1460)/G; G(1461)/E]. As discussed above, all these sites are of the trypsin type except for the Cys/Ser and Gly/Glu sites which occur between the putative NTP-binding protein, the VPg and the putative protease of the GFLV 253K protein respectively (Pinck et al., 1991). Since it is the first time that such a cleavage has been reported for viral proteolytic processing, we do not yet know whether this
2364
C. Ritzenthaler and others
Cons GFLV 471 GCMV 462 TBRV
463
CPMV 192
F * * * ** W I, * * * * ** * L'E* * * ** FDVTMAPYLQHLASAHS I LKKIWEKLSEWMESLKSKAS~QHAI FALGAMVI GGVVVLVEKVLIAAKI I FREALTTIKFELGVAMELVEVLIARVKSWFDTLLAKIDHALASLGKWACYALGILLGIGLCNLIETIIGGHGML FREC IKMIHKELGCAMELIEVMIKKVKDWYNSMLEKLHCGLATLGTYAMYALAI LLGCGLTT/~LERC IGGAGIL FEKMVQEYVFMAHRVCSWLSQLWDKIVQWISQASETMGWFLDGCRDLMTWGIATLATCSALS~VEKLLVAMGFL
544 535 536
265
Fig. 5. Alignmentofpartofthe 32KproteasecofactordomainofCPMVtotheequivalentdomainsofGFLV, TBRVandGCMV. The amino acids conserved between the four sequences are indicated under 'Cons'. (*) indicates homologousamino acids and the strictly conserved residues are bold faced. The numbers indicate the position in each polyprotein.
C/S G/E
ca!_v 253K
::~: :: :::i~.~,~&~,~.~.~,~,~.t2~.~,~.~,~.~" 133K
'
Q/S CPMV 200K
Protcase cofactor 32K
~i~ii~ii!~iii~Q ~ I I I I I I I I I I I I I I 1111111111 * I I I 1111Ill " "25K
QtS Q/M NTP binding protein 58K
(R/G) 92K
Q~
VPg Proteasc 4K 24K
RNA polymerase 87K
Fig. 6. Schemeofgenomeorganizationandproteolyticprocessingofthe 253KpolyproteinencodedbyGFLVRNA1 compared to that of the CPMV B-RNA-encoded polyprotein. The position of the consensus sequences for polymerase (,), protease (0), nucleotidebinding protein (iS]) and protease cofactor (O) are marked with symbols. The predicted cleavage sites between the protease and the polymerasedomains are indicated in parentheses. (?) indicates a possiblecleavage site between the NTP-binding protein and protease cofactor. is due to a change in the specificity of the putative protease of G F L V involving a protease cofactor as in CPMV, or if it is due to another virus- or host-encoded protease. Comparisons of the N-terminal part of the 253K protein, which has no known function, with the proteins present in the N B R F and Swissprot databases allowed us to align the region between residues 471 and 544 with the equivalent regions of TBRV, G C M V and C P M V (Fig. 5). The similarity in this region ranged from 51% to 54~o between nepoviruses or with CPMV. This region in C P M V has no specific proteolytic activity but acts as a cofactor for the 24K protease in processing the R N A Mencoded polyprotein at the Gin/Met site (Vos et al., 1988). Since the C P M V protease cofactor does not change the specificity of the trypsin-type dipeptide cleaved by the protease (Gin/Met), we cannot conclude that the putative cofactor of G F L V allows the recognition by the protease of the uncommon Cys/Ser site at the N terminus of the VPg. This unusual cleavage specificity could either be explained by autocatalytic cleavage (Arnold et al., 1987) or cleavage by a host protease. The study of sequence homology suggests a probable function of each domain in the G F L V polyprotein. Thus the experimentally established location of the VPg and the search for putative cleavage sites within the 253K protein has allowed us to propose the genomic organization of the R N A 1-encoded polyprotein as seen in Fig. 6. The VPg, with 24 residues, has an Mr of 2"9K. The protease domain extending from residues 1241 to 1460 or 1461 and in which only the N-terminal cleavage site has been confirmed, would have an Mr of 25K and the R N A dependent R N A polymerase (positions 1461 to 2264) an
M r of 92K. Since no other cleavage sites are predicted according to the cleavage recognition site described in Table 1, the N-terminal part of the 253K region upstream of the VPg, involving the NTP-binding domain and the putative protease cofactor domain, would have an Mr of 133K (positions 1 to 1217). Recently, in vitro proteolytic processing of the T B R V 250K has been described, and at least four cleavage products could be detected; two of these are located in the N-terminal part of the polyprotein (Demangeat et al., 1991). In addition, in C P M V , two proteins are produced upstream of the VPg. Thus, it is likely that the G F L V 133K is cleaved further into at least two proteins. It is possible, in view of the fact that no other consensus cleavage sites based on our consensus sequence could be found, that the cleavage recognition site is in fact less narrow than that described in Table 1. The organization of the G F L V R N A l - e n c o d e d polyprotein shown in Fig. 6 is hypothetical and, since none of the 253K cleavage products, except the VPg, have been isolated and characterized, the exact sizes of the viral proteins remain unknown and need to be confirmed by in vivo and in ritro studies. R. Margis was supported by a grant from the National Council of Research and Technology (CNPq, Brazil). We thank Dr K. Richards for improving the manuscript.
References ARNOLD, E., LUO, M., VRIEND, G., ROSSMAN, M. G., PALMENBERG, A. C., PARKS, G. D., NICKLIN, J. H. & WIMMER, E. (1987).
Implications of the picornavirus capsid structure for polyprotein processing. Proceedingsof the National Academy of Sciences, U.S.A. 84, 21-25.
G F L V R N A 1 sequence and genomic organization
BAZAN, J. F. & FLETTERICK, R. J. (1988). Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: structural and functional implications. Proceedings of the National Academy of Sciences, U.S.A. 85, 7872-7876. BAZAN, J. F. 86 FLETTERICK, R. J. (1989). Comparative analysis of viral cysteine protease structural models. FEBS Letters 249, 5-7. BRAULT, V., HIBRAND, L., CANDRESSE, T., LE GALL, O. 86 DUNEZ, J. (1989). Nucleotide sequence and genetic organization of Hungarian grapevine chrome mosaic nepovirus RNA2. Nucleic Acids Research 17, 7809 7819. DEMANGEAT, G., GREIF, C., HEMMER, O. 86 FRITSCH, C. (1990). Analysis of the in vitro cleavage products of the tomato black ring virus RNA 1-encoded 250K polyprotein. Journal of General Virology 71, 1649-1654. DEMANGEAT,G., HEMMER, O., FRITSCH, C., LEGALL, O. 86 CANDRESSE, T. (1991). In vitro processing of the RNA2-encoded polyprotein of two nepoviruses: tomato black ring virus and grapevine chrome mosaic virus. Journal of General Virology 72, 247 252. DEVEREUX, J., HAEBERLI, P. & SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for VAX. Nucleic Acids Research 12, 387 395. FORSTER, R. L. S. & MORRIS-KRSlNICH, B. A. M. (1985). Synthesis and processing of the translation products of tobacco ringspot virus in reticulocyte lysates. Virology 144, 516-519. FRITSCH, C., MAYO, M. A. & MURANT, A. F. (1980). Translation products of genome and satellite RNAs of tomato black ring virus. Journal of General Virology 46, 381 389. FUCHS, M., PINCK, M., SERGHINI, M. A., RAVELONANDRO, M., WALTER, B. & P1NCK, L. (1989). The nucleotide sequence of satellite RNA in grapevine fanleaf virus, strain F13. Journal of General Virology 70, 955-962. GOLDBACH, R., EGGEN, R., DE JAGER, C. 86 VAN KAMMEN, A. (1990). Genetic organization, evolution and expression of plant viral RNA genomes. Recognition and response in plant-virus interactions. Cell Biology 41, 147-162. GORBALENYA,A. E. & KOONIN, E. V. (1989). Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Research 17, 8413-8440. GORBALENYA, A. E., DONCHENKO, A. P., BLINOV, V. M. & KOONIN, E. V. (1989). Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases. FEBS Letters 243, 103-114. GREIF, C., HEMMER, O. 86 FRITSCH, C. (1988). Nucleotide sequence of tomato black ring virus RN A 1. Journal of General Virology 69, 15171529. HEIDECKER, G. 86 MESSING,J. (1983). Sequence analysis of zein cDNAs obtained by an efficient mRNA cloning method. Nucleic Acids Research 11, 4891-4906. KAMER, G. 86 ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Research 12, 7269-7282. KITAMURA, N., SEMLER, B. L., ROTHBERG, P. G., LARSEN, G. R., ADLER, C. J., DORNER, A. J., EMINI, E. A., HANECAK, R., LEE, J. L.,
2365
VAN DER WERF, S., ANDERSON, C. W. 86 WIMMER, E. (1981). Primary structure, gene organization and polypeptide expression of poliovirus RNA. Nature, London 291, 547-553. KOZAK, M. (1989). The scanning model for translation: an update. Journal of Cell Biology 108, 229-241. KRAUT, J. (1977). Serine proteases: structure and mechanism of catalysis. Annual Review of Biochemistry 46, 331-358. LAIN, S., RIECHMANN, J. L. & GARCIA, J. A. (1990). RNA helicase; a novel activity associated with a protein encoded by a positive strand RNA virus. Nucleic" Acids Research 18, 7003-7006. LEGALL, O., CANDRESSE, T., BRAULT, V. 86 DUNEZ, J. (1989). Nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus RNA1. Nucleic Acids Research 17, 7795 7807. LOMONOSSOFF, G. P. 86 SHANKS, M. (1983). The nucleotide sequence of cowpea mosaic virus B RNA. EMBO Journal 2, 2253 2258. MORRIS-KRSINICH, B. A. M., FORSTER, R. L. S. & MOSSOP, D. W. (1983). The synthesis and processing of the nepovirus grapevine fanleaf virus proteins in rabbit reticulocyte tysate. Virology 130, 523526. PINCK, L., FUCHS, M., PINCK, M., RAVELONANDRO,M. & WALTER, B. (1988). A satellite RNA in grapevine fanleaf virus strain F13. Journal of General Virology 69, 233 239. PINCK, M., REINBOLT, J., LOUDES, A. M., LE RET, M. & PIr~CK, L. (1991). Primary structure and location of the genome-linked protein (VPg) of grapevine fanleaf nepovirus. FEBS Letters 284,117-119. POCH, O., SAUVAGET, I., DELARUE, M. & TORDO, N. (1989). Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO Journal 8, 3867-3874. RICE, C. M., LENCHES, E. M., EDDY, S. R., SHIN, S. J., SHEETS, R. L. & STRAUSS, J. H. (1985). Nucleotide sequence of yellow fever virus: implications for flavivirus gene expression and evolution. Science 229, 726-733. ROBINSON, D. J., BARKER, H., HARRISON, B. D. & MAYO, M. A. (1980). Replication of RNA1 of tomato black ring virus independently of RNA2. Journal of General Virology 51, 317-326. SAIKI, R. K., GELFAND, D. H., STOFFEL, S., SCHARF,S. J., HIGUCHI, R., HORN, T., MULLIS, K. B. & ERLICH, H. A. (1988). Primer directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487-491. SERGHINI,M. A., FUCHS, M., PINCK, M., REINBOLT, J., WALTER,B. & PINCK, L. (1990). RNA2 of grapevine fanleaf virus: sequence analysis and coat protein cistron location. Journal of General Virology 71, 1433-1441. Vos. P., VERVER, J., JAEGLE, M., WELLINK,J., VAN KAMMEN, A. & GOLDBACH,R. (1988). Two viral proteins involved in the proteolytic processing of the cowpea mosaic virus polyproteins. Nucleic Acids Research 16, 1967-1985. WELLINK, J. 86 VAN KAMMEN, A. (1988). Proteases involved in the processing of viral polyproteins. Archives of Virology 98, 1-26.
(Received 7 May 1991; Accepted 17 June 1991)