The complete nucleotide sequence of turnip mosaic potyvirus RNA

0 downloads 0 Views 1MB Size Report
Journal of General Virology (1992), 73, 2785-2793. ... The complete RNA genome of turnip mosaic potyvirus ... Potyviral RNA is translated into a polyprotein.
Journal of General Virology (1992), 73, 2785-2793. Printed in Great Britain

2785

The complete nucleotide sequence of turnip mosaic potyvirus RNA Olivier Nicolas and Jean-Frangois Lalibert6* Centre de Recherche en Virologie, Institut Armand-Frappier, 531 Boulevard des Prairies, Ville de Laval, Quebec, Canada H 7 N 4Z3

The complete RNA genome of turnip mosaic potyvirus (TuMV) was amplified by seven consecutive reverse transcriptase-polymerase chain reactions and cloned into pUC9. The viral RNA is 9830 nucleotides long and contains a single open reading frame (ORF) of 9489 bases encoding a large polyprotein of 3863 amino acids with a calculated Mr of 358 000. The non-coding region (NCR) preceding the O R F is 129 nucleotides long and has a high AU content (70%). Its predicted secondary structure is characterized by a hairpin loop with a free energy loss o f - 6 9 . 9 kJ/mol. The

termination codon is followed by an AU-rich N C R of 209 bases, excluding the poly(A) tail. Seven potential nuclear inclusion a proteinase (NIa-Pro) recognition heptapeptides are found in the polyprotein. Their sequences agree with consensus potyviral NIa-Pro cleavage sequences except for that at the 6 K - V P g site, which is characterized by a glutamic acid residue preceding the hydrolysed peptide bond. The TuMV proteins are similar to their corresponding potyviral proteins.

Introduction

Potyviral RNA is translated into a polyprotein containing between 3005 (TVMV) and 3206 (PSbMV) amino acids. This polyprotein is proteolytically processed into at least eight mature proteins by three virusencoded proteinases, the N-terminal (P1) protein (Verchot et al., 1991), the helper component-proteinase (HC-Pro) (Carrington et al., 1989; Carrington & Herndon, 1992), which is also involved in aphid transmissibility, and the nuclear inclusion a protein (NIa-Pro) (Carrington & Dougherty, 1987; Chang et al., 1988; Hellmann et aL, 1988; Garcia et al., 1989; Ghabrial et al., 1990). Other viral proteins with known function are the cytoplasmic inclusion protein (CI) which displays an RNA-dependent ATPase activity characteristic of R N A helicases (Lain et al., 1990, 1991); VPg which has been determined to constitute the N-terminal domain of NIa-Pro (Murphy et al., 1990; Dougherty & Parks, 1991), the nuclear inclusion b protein (NIb) thought to be the core replicase (Dougherty & Carrington, 1988) and the capsid protein (CP) (Shukla & Ward, 1989). Other putative proteins are less well characterized: the P3 protein preceding CI on the polyprotein has no known function but has recently been detected in TVMVinfected plants (Rodriguez-Cerezo & Shaw, 1991), and two other small proteins of 6K (p6K 1 and p6K2) have never been detected either in vivo or in vitro. To understand the organization of the TuMV genome and its evolutionary relationship with other R N A viruses, and as a first step toward its genetic manipulation, we undertook the cloning and sequencing of

Turnip mosaic virus (TuMV) belongs to the potyvirus group, which has at least 180 classified members (Ward & Shukla, 1991), and is considered to be the most significant and widespread virus infecting cruciferous crops (Shattuck et al., 1989). Although most potyviruses have narrow restricted host ranges, TuMV infects a large array of economically important crop plants, particularly the Brassica genus, including oil-seed rape (13. napus ssp. oleifera). It is sap-transmissible and spread by over 40 species of aphids in a non-persistent manner (Shattuck et al., 1989). Interestingly, TuMV infects Arabidopsis thaliana, which should provide a valuable model for the study of host-virus interactions. The genome of potyviruses is composed of a positivesense ssRNA of approximately 10 kb, linked covalently at its 5' end to a virus-encoded protein (VPg) and polyadenylated at its 3' end (Dougherty & Carrington, 1988). Of all the members of this group, the genomes of only five have been completely cloned and sequenced: tobacco vein mottling virus (TVMV) (Domier et al., 1986), tobacco etch virus (TEV) (Allison et al., 1986), plum pox virus (PPV) (Lain et al., 1989; Maiss et al., 1989), potato virus Y (PVY) (Robaglia et al., 1989; Turpen, 1989) and pea seed-borne mosaic virus (PSbMV) (Johansen et al., 1991). The nucleotide sequencereported in this paper will appear in the DDBJ, EMBLand GenBanknucleotidesequencedatabasesunder the accession number D10927. 0001-1107 © 1992SGM

2786

O. Nicolas and J.-F. Lalibert6

TuMV RNA. Previously, we have reported the sequence of approximately 1800 nucleotides derived from the 3" end of the genome (Tremblay et al., 1990) and a very efficient method for the cloning of large R N A fragments of this virus by the polymerase chain reaction (PCR) (Nicolas & Lalibertd, 1991). In this paper the complete nucleotide sequence of the TuMV genome is presented. Information is also given concerning the seven cleavage sites recognized by NIa-Pro and the secondary structure of the 5' non-coding region (5'-NCR).

(a) .... 5' CG

FT16

.... 5, AG CTGGAAGCT T C T G G A G C A T T 3 '

FT18

.... 5' c c T T A G C C T G G T A C ( 3 C C T T A A C A T T 3'

FT30

,,, ~' A T G C A A A T G T G A C T G C T G C 3 '

Hind

Hind

Virus and RNA purification. The TuMV isolate originated from Qudbec, Canada and was purified as described by Nicolas & Lalibert~ (1991). RNA was treated with proteinase K, precipitated in ethanol, resuspended in RNase-free water and stored at - 7 0 °C until use. eDNA synthesis and PCR. Purified RNA (1 ktg) was reversetranscribed using random (dN6) as well as oligo(dT)~ 18 primers and Moloney murine leukaemia virus RNase H- reverse transcriptase (Superscript; Bethesda Research Laboratories) as described by the manufacturer. After 1 h at 37 °C, the single-stranded cDNA was digested with 5 ng/ttl of DNase-free RNase A and 0.1 units/ttl of RNase H (Pharmacia LKB Biotechnology) for 30 rain, purified using the GENECLEAN II kit (Bio 101 Inc.) and resuspended in 100 ~tl of water. For the cloning of the 5' end of the RNA, the single-stranded cDNA was polyadenylated using dATP and terminal deoxynucleotidyl transferase (Bethesda Research Laboratories) as described by Sambrook et al. (1989), then purified using GENECLEAN II. PCR was performed with 2-5 units Taq DNA polymerase (BIO/ CAN Scientific) in 100 ~tl of the buffer provided supplemented with 2 to 4 m~,l-MgC12, 50 kt~l-tetramethyl ammonium chloride, 50 pmol each primer, 200 ~tM each dNTP and 1 Ixl of the single-stranded eDNA synthesis reaction mixture. The reactions were carried out in a DNA Thermal Cycler from Perkin-Elmer Cetus. The PCR conditions were as follows. Cycles 1 and 2: denaturation at 94 °C for 30 s, annealing at 37 °C for 30 s, extension at 72 °C for 1 min for each kbp of DNA for amplification (up to 3 min). Cycles 3 to 34: same conditions as above except an increase to 50 °C for the annealing temperature. Cycle 35 : same as cycles 3 to 34 except the elongation time was increased to 5 min. The primers were synthesized on a Gene Assembler (Pharmacia LKB Biotechnology) and are shown in Fig. 1 except for FT0 [5' GGTCTAGAGCTCGAG(T)17 3']. They varied from 21 to 27 nucleotides in length and each pair of primers was composed of a specific 3' end primer from the TuMV sequence and a degenerate 5' end primer based on five published potyvirus sequences (Nicolas & Lalibertd, 1991). Rapid amplification of the cDNA ends (RACE) was performed as described by Frohman et al. (1988). Primers FT30 and FT0 were used for the 5' RACE. PR7 (5" GGTGTTGAGGCTTGGATCCGAACCATGGCAGGTGAAACG 3'), situated at positions 8728 to 8766 of the TuMV RNA sequence, and FT0 were used for the 3' RACE. Molecular cloning, sequencing and primer extension. The PCR products were visualized on a 1.0~ agarose gel, and bands of the expected size (130 bp to 2-2 kbp) were excised and eluted from the gel using GENECLEAN II or MERMAID (Bio 101 Inc.). The fragments were cloned by dC-tailing into dG-tailed pUC9 as described by Nicolas & Lalibert6 (1991). In situ hybridization, using the two primers separately as probes, was performed on white colonies produced by the transforma-

III

I1! .

Kpn

I

(b) PPV 1142 PVY 1131 TVMV1136

Methods

TAGTTAA~,GCTTGTCTTTG 3'

FT12

TEV JF13

PPV PVY

C

lO86

E

R

GA

& ~,(~i~;

D

G

F

V

T'r

Y

K

T

A K N Q E

TVMV594 644

I

21

A

647 621

TEV

S

I

G

A......

Y

C

T

Y T

M T

]

JF17 5' G CCGGTACCGGATATTGCTACAT]3' m

3 PMY 2 TVMV2

PPV

Kpnl

AATATAAAAACTCAACACAACAT ATT AAAACAAC TCAATACAACA AATGAAACAAATCAACACAACA

T T

TEV 3 AATAACAAATCTCAACACAACAT

JF15

5'

AAT AAA A(3A AT TCAACACAACA TI3, T

G TC

Eo*A,

T

J

Fig. 1. Sequence of synthetic oligonucleotides used for PCR. (a) Primers of the FT series were derived from the sequence complementary to the TuMV RNA sequence and their position on the genome is indicated. Asterisks indicate nucleotides not derived from TuMV. (b) Primers of the JF series were derived from homology with the genomes of PPV, PVY, TVM¥ and TEV. For JF13 and JF17, numbers represent the position of the first amino acid in the potyviral polyprotein and amino acids indicate the region from which the primers were derived. For JF 15, numbers represent the position of the first nucleotide in potyviral genomic RNA.

tion of the XLl-blue strain of Escherichia coli (Stratagene). Several clones were analysed by restriction mapping, and fragments were subcloned into pUC19, M13mpl8 or M13mpl9 vectors. The different sabclones were sequenced by the dideoxynucleotide chain termination method (Sanger et al., 1977) using a Pharmacia LKB sequencing kit. In some cases, specific oligonucleotides were used for sequencing. Primer extension using [32p]ATP-radiolabelled FT30 was performed as described by Calzone et al. (1987). The radiolabelled product was loaded onto a 4 ~ sequencing polyacrylamide gel along with the four sequencing reactions of two pRACE-5' clones using FT30 as the sequencing primer, and subjected to electrophoresis. DNA and protein sequences were analysed by the PUSTELL-IBI DNA sequence program. RNA secondary structures were analysed by the RNAFOLD program of Zuker & Stiegler (1981).

Turnip mosaic virus

Results and Discussion Cloning of TuMV RNA The amplification of large fragments of TuMV RNA by PCR has been reported previously (Nicolas & Lalibert6, 1991) and the same procedure was used for the cloning of the complete genomic RNA. Fig. 2 shows the proposed genetic map of TuMV and the strategy used to amplify the RNA, along with the positions of the different PCR clones. Fragments overlapped each other by 50 to 200 nucleotides and their length ranged from 130 to 2200 bp. The 5' end degenerate primers were derived from highly conserved regions of four potyviruses (TEV, TVMV, PVY and PPV) but, interestingly, their specificity was not critical for effective amplification. For example, comparison of the JF 13 primer sequence with the TuMV sequence at positions 3736 to 3754 showed that there was a four nucleotide mismatch which did not prevent amplification. We think that these degenerate primers could be used efficiently for the cloning of other potyviruses. The viral extremities were cloned using the RACE procedure (Frohman et al., 1988). Five independent clones for the 5' end (pTUM-5') were sequenced. Four clones had five adenine residues adjacent to the poly(T) tract introduced during the cloning procedure (clones 5A), whereas one had six adenine residues (clone 6A). To confirm that the 5' end of the viral RNA had been effectively cloned, primer extension experiments using FT30 were performed (Fig. 3). Two bands were observed and these coincided with the adenine residues immediately preceding the poly(T) tract of the pTUM-5' clones. This suggested that there were at least two populations of TuMV RNA. Alternatively, we cannot exclude the possibility that one or more nucleotides could have been removed with VPg during the extraction process. The nucleotide sequence reported in this work begins with five adenines because it was apparently the most abundant species. The nucleotide sequence of the 3' end was also determined from three independent pTUM-3' clones and they all contained a poly(A) tract of at least 20 residues.

Sequencing and analysis of TuMV RNA The nucleotide sequence of TuMV RNA is shown in Fig. 4. The sequence was determined for at least three independent clones from one PCR reaction. The error frequency was estimated as three nucleotide changes for every 1000 bases sequenced. The assembled TuMV eDNA is 9830 nucleotides in length and is the second longest potyvirus sequence published, the longest being that of PSbMV (9924 nucleotides). The base composition

2787

AUQ

UOA

~,

D6K1

ISO

1216

D6K2

8740 6899 6478

30eG 3811

2693

~.

7204

S?$¢5

g61S

pTUM-5'

FTO4'Q'FT30 U DTUM2200 Frfe! jF15

./F~r

pTUM1600

Fr,$

l pTUM1100 ./FI~

pr;., pTUM3300

Jr~

rr8 t

DTUM12OO U, ,,.'FS

DTUM-3' I I P~,

FrO

Fig. 2. Proposed genetic map of TuMV. Boxes represent the coding region of the viral RNA. Vertical lines in the boxes indicate proposed cleavage sites. Numbers below each box indicate the position of the first coding nucleotide of each gene. Horizontal arrows represent clones obtained by PCR. Primers used for each amplification are given in italic. Plasmids pTUM 3300 and pTUM 1200 have been described previously (Nicolas & Lalibert6, 1991).

PE

Clone 6A

[

Clone 5A II

G

A

T

C

G

A

T

C

(A) A -A A

A A A T

A T A

A A

A A C T

c A A Fig. 3. Autoradiograph of the primer extension reaction. Two pTUM-5' clones (clones 5A and 6A) were sequenced using the FT30 primer and loaded onto the same sequencing gel as the primer extension reaction (lane PE). The autoradiograph was turned in mirrorimage so that the nucleotide sequence indicated to the right of the gel is the sequence of the 5' end of TuMV RNA. The nucleotide sequence of the 5' end of clone 5A is indicated for the first 19 residues. A in parentheses represents the first nucleotide of clone 6A.

of TuMV RNA is 32% adenine, 24.3% guanine, 22.8% uracil and 20.9% cytosine. Three in-phase AUG codons were found within the first 400 nucleotides. The first AUG at position 130 is likely to be the initiation codon because it is in a context (CAAAAUGGC) similar to the consensus sequence for translation initiation in plants (AACAAUGGC) described by Lutcke et al. (1987). Computer translation of TuMV RNA starting at this AUG codon revealed a single open reading frame (ORF) of 9489 nucleotides encoding a polyprotein of 3863 amino acids with a calculated Mr of 358000.

2788

O. Nicolas and J.-F. LalibertO

i AAA

AAT

ATA

AAA

ACT

CAA

CAC

AAC

88 AAT 1

CTT

TTG

TCG

TTA

TCA

AAG

CAA

ATA

CAC

AAA

ACG

ATT

AAA

GCA

A/~C A C A

~T

CTT

TCA

AAG

CAT

TCA

AGC

AAT

CAA

AGA

TTT

TCA

~ ~ T ~ ~ ~c cc~ ~c ~ o~ ~I~ o~ ~7 ~ ~° ~ qo ~ ~ o? ~c ~c 7?3 °~ q~ ~.~ ~ c.~c ~c ~ ~o oc~ ~o c? c7 ~Ic ~ o~ T ~ o~o cc~ T ~ ~ ~o ~I¢ c~¢ ~c~ ~ ~ oc~ c~ o~ ~;c ~? ~? ~I~ o? ~ ~ c~ q~ c~o ~ ~o ~ oc~ ~I~ c~ ~ ~? ~o ~c~ %~ ~I~ ~ o~ ~ ~ ~

~o437 ~ ~ o~ ~ ~ o~o

c~ ~? ~o ~c~ ~ c~c ,~ ~7 ~.~

o~c oc~ ~ ~A ocA c~A T~c c~G ~G

T~C~G

Ale ~o ~ ACA c~A ~G

1915 596

ACT T

CAG Q

TAC Y

GAG me

ACT T

AGA

GTA

GTA

CCA

AAC

G~C

TCG

R

V

V

P

N

(~

S

C G A "AAG C T T R

K

L

GCA A

ATT I

GGT G

AAA K

CTC L

ATA I

GTC V

CCG P

ACG T

AAT B

TTT F

GAA E

GTT V

CTG L

~oo8/'~ o? c~o ~o ~ %, o? ~cc ~7 °~ ~c ~37o41~ ~c~ ~ ~ ~

c[c %~oT %~ %~ ~c ~ ~ ~ %~ ~? o~ %~ c.~ c[~ ~ ~ ~[~ q~ ~c~ cc~ o~c ~ ~ ~ occ o~ ~[~ c~ ~7 %~c T ~ ~c~ c~o ~c~ ~ ~ %~o ~c~ ~ ~ c~ ~F ~° ~Ic ~7 ~? °~° ~ c[~ cco •;c ~o %c ~ T q~ ~c

~9~C4 4 oc~ oc~ ~ c~o ~? ~ ~c ~c~

c~ ~c o? ~c o~ ~ o~c u~

,o3 1~o ~ ~c ~o ~ ~o ~c ~ ~ •cc ~o ~o ~ ~c ~ ~o 3394 1089

AAC

GTA

TTA

GTG

TGC

N

V

L

V

C

ATT I

AGT S

TTG L

TTA L

ATC I

AAG K

ATG M

o~ ~ o~ ~ ~ ~ ~ c~

ACT

GCT

GAA

GCA

AAT

CAC

ATT

ATC

ACT

T

A

E

A

N

H

I

I

T

ACG T

CAG Q

AGG R

AGG R

TTG L

AAG K

CTG L

GAC D

Turnip mosaic virus

3s681147 c~G G~c ~ T~ A~AG2 ~c A~o o~o o~G 36~s1176 ,~oT~G ~A

o~ ~ ~

~t~

2789

2790

O. Nicolas and J.-F. LalibertO

23077048CAC ACC ATTI GAAE GCA CAC GAG TGG GTC ~ G

CAC TGG A ?

7135 CAA Q

CTA ATC TCA GAC C C G T AGC ACG GCA GTC TAC °CA s T a V Y a L I S

2336

CC~ GTC AGC TTG

p

V

L

TTC

F

AAA GTG AGC ~ G K V S

7222 ATG TTC GAG CA° CTC ACT % A 2365 ~ L 7309 2394

~C

TAC ~ C

CTG AAA GCT ATA GCA CAT L K A

CAG AT° TTT GAC TTG TAT CTC A~KG TTA CAT GAC GEAA °CA CGA Q M F D L Y L L H D h

ACT AGCs GCC A .I . . . S. . . .W

~T

TCT TTG

0

~C

AITA C ~

T=

GA. . . . S.

a

T~ GoT.AG~ C~G C~T O~O hCh ~G Oho.A~h G~T ~

%h ~

w

~T

GC ThT,TTC,~ ? CCO,hTG.CTG~%C CAG~T~T ~G ~G AGT~~

CTC

24237396 ~T c~A G? ocA T~T ochhh? OhToC~T~CTG~~, T~T °CA A~G CCh hl~ G? G? %A ~ 0 hit GiC T~T ~IT C~G T G? ~G h~h ~,~3 OTT G~ h? ~TC h? T? OhT OTO OOA %T ThT %T T;~ 0? A~A Toc~~J ThT,GTC'4hCT~OhTOGhG,~T GiC A? T~c G? o~TACTT

2452

V

E

V

D

L

R

Y

~4.17~7°~ c A~G ~ , T~c G? %~T %h G~G qG . .•. . ~ %h ~G ~G 25107657 CTGL

KA~ C ~

AGTs TGcC GEAA . R. . . .L. .

TCF TTGL GGGG ~ G

T~T oi • ~II ~o ~TT h~h O~T . .P. . . . .W. L R A

7~3~ ~ho T Q

2568

25977918CCA GiT GGT TGwGGTG,4TATy. C. . . .D. . . .A.

D

~G

GAT TACy TTTF GeTA GAGE TTCF

ACA CCCpGAG GTG A ~ GEA A GEA A ATAI

ATMG ~ A

COST %C

DATG~C TCA C~G

TTT

T~o AGo ~ ? hTo GTG ~T AO~ O~ ~TT TCT hCA 8005 cG~ ~A %~ T~/h~G G? .......... TA GGG GhG ~h hT~ c L~ h~h ~T T~o L Y T I V T I S T 262&

R

L

M

E

W

D

"4

0

E

M

%h c~G ......, ~ h~T G~T G~A GiO ~ 0 A~h ~C hTo. OTO~h ? TTG 8179 GCA 2684

A

GTC

A~C

ThT TCA CTC ~ G ¥

V

S

L

~

% C CCA 0

K

hTT CCA hGT GhG TTG ooc oAo Ao~ ATC ATO h~h TT~ TTO G~C ~O %A GhC OhT TTA 1

P

S

E

L

8~6627~3 c~G ~A h~o G~A c~ ~h ..... • ,T Gho,TAT~ATC,C~T GIC h~T ATG

R

D

S

°CA GACD ~ C

I

I

R

F

F

D

TTTF CGA GEAA CTG GGT CTG ~ G

D

L

TAC ACT TTC GACD

Go° ~h~ hGh hGh GhO GGh ATT TOOhit ~CT ~G O~T 0 ? C~A 2742~53TcG~hGG,hCT hOG,G? ~ . %A OhO~OT~ T~ TTT,hTG.TCG C~.T cho Q G s R R E 0 I W P 27718440GAG CGA AITAGvTATCA A~T CTAL°~A TwGG. .D. . . RG TCG A ~ G~A c~A T~c o~T ooA. TTG~G~o, G~h AIC T~ G~A GC h~G AIT G~G TCG °CO T~ hTO hTT O? O~, G~T O~h ~T~ h~ T~ ~T~ ~CA e ~ o ? A

87ol ~h~ ~ch ~I~ T 2858

Q

87882887GAGE CEAA ~ G 8875 2916

o~G ~AT Th~ o? .....T o~T GAG G~ T~T D

GGC AAG GAT GTC GCA C ?

G

K

Y

D

C~G GCAA GAGE A~ . . . E.

V

h

GEA A ......

E

G

A~ . . . .K .

G

V

E

E

AGA GAG ~ G

AAA CGC GAC A~KG GEA A

K

R

D

M

"4

II

I

Q

9049 hcG °IT ~A T~ ~c A~ ~GT T~h hCG CGA ~G CA° T T 8 T R S T R Q

91363003 A ~

AT° C ~

AITC ATT CTCL ~ T

P

P

h

Q

G

E

T

~

~

GcT,O~hoAcc T

rcT

S

G

Q

F

S

L

A

L T KA~ C~G TTLG GCA TTC ~K G AAAK

GGA ACT TTC AGr GTA CCC AGA CTC ~ G L K T F S V P R

cTc °IT c~T .... Tc c~A ,~ AgG C~G GE..... L L I Q

. A D • DOAC ~ic ACAT TwGG TFTT GAAE %TGTA'4YATGGeTGATTheo?T~GAgGGhG

GGT TTG AGG GTCv TwC'G TGC AITT

3o329~3°AGoG~c GC °IT C;G GTG'4°hA,TT~,CCG~hTC,~

A

°CA GE~ /~KG . .E . . . R. . . .E. . . .R. .

2945~96~hGT~c.~G A~A A~: ~G hTG.OGO.G~G CCA h~h T~T GiG ~ , h~h 2974

A

Q

GiG ~o %h h~o . .8. . . . ~ NC ATh~~C ~

COO.CTC~AIT Gic c~ G~oh~

~G TaG G~G h~G hTG

. . . . . ATTTAGG~hGAIhATG~CC~T';~hGT P

T

F

R

Q

M A

S

...... T ~hG C~ ~T qA ~gc Gic AT° 3o6193~°GAc~~A °C ~? ~co,T~c ATT~G? ~G~ ~GT.~,~. ~ ? ~Ic c~h c~h T~o hTG.~A °~h TA ¥ G L Q M 3o9o939~ hG~ T~h OC ~h ~C O~AhT~T,OATOTT~,ThTyG? hT~.h~T T~T h ~ hgT =h, ATA C G T G C G R A R E A G G C A C ~ C A I C C ; G A ~ G ~ G C G K A C~GG TTA TGA AGT L 9658 CTC GTT AGT ATT 9745

TAT TTT

CTC GCT TAT GGG AAA TAT GTA AGT

CGC CGA ACA

TTT TAT TGG TGT

TTG TTA

TGT

;tAG CA° CCA GTG TGA

TAG CGC ATG TGG TGA GGA TCG TCC TCG

AT° CTG GTA GAC TAT ~ G CTT TGT

CAT GTG

ATT GCC TTA ACA TTT

TAT TTA AGT TTA

TGT TGT TGT TAC TTT

GAT AGG ATG C ~

CTG

GGG AC-An

Fig. 4. N u c l e o t i d e s e q u e n c e o f T u M V R N A . T h e nucleotide s e q u e n c e o f the plus-sense c D N A and the deduced a m i n o acid sequence o f the polyprotein are g i v e n and n u m b e r e d . T h e a m i n o acids w h i c h f o r m the potential c l e a v a g e sites proposed in this article are double underlined and the peptide bonds c l e a v e d are indicated by vertical arrows.

The 5'-NCR preceding the ORF is 129 nucleotides long and has a high AU content (70~). Since viral protein expression does not seem to involve translational shut-off of the host mRNA, it has been proposed that this high AU content facilitates the melting of secondary RNA structures for efficient translation (Jobling & Gerhrke, 1987; Altmann et al., 1990). Fig. 5 shows the predicted secondary structure of the 5"-NCR. This

structure is characterized by a hairpin loop with a free energy loss of - 6 9 . 9 kJ/mol. The two highly conserved regions reported by Lain et al. (1989) and Turpen (1989), namely box 'a' (ACAACAU) and box 'b' (UCAAGCA), are present in TuMV RNA; box a is found in a singlestranded region whereas box b is located within a bulge in the hairpin loop. Interestingly, these boxes in TEV, PPV, TVMV and PVY are found in similar secondary

Turnip mosaic virus

1

AUGC~A c

BOX b

"

C

G A A A ~ j AA A AU GC

CA U 5' A A A

p~K1 ~re912~8

I "c-''ol ,,.// .3 11~

~ ~ - K~VVHQfA {YRVG/Gt TPTVYHQ/T [EAVNHQ/S [EPVTHE/A AAAUG 3' A

p6K2

laTl~e24211e 2369

2~r6, 38e3

I Iv"01," '-l ~ "'0 / I c. I

"iE/s~/,//

~Q/T~/' iQ/A r

Fig. 6. Amino acid sequences of all potential cleavage sites in the TuMV polyprotein. Numbers above vertical lines represent the position of the first amino acid residue of each protein indicated.

C

G A A

C C

AU AU AU CG AU C GC AU.

821

....-- ~

AA AC U AC u UA UA CG UA AU AcAU A AC AU U A C

3~3

2791

A A

C C A

C

.oxen(

AA AG

Fig. 5. Prediction of the secondary structure of the TuMV 5'-NCR for the plus strand. Box a and box b, which are highly conserved in several potyviruses, are boxed.

structures (results not shown). This common structure among several potyviruses suggests that the hairpin and/or the conserved boxes may play a role in virus replication but are not necessary for translation. Indeed, Riechmann et al. (1991) have shown that deletion of the first 100 nucleotides of the PPV 5'-NCR does not affect the efficiency of in vitro translation; a similar conclusion was drawn by Carrington & Freed (1990) for TEV RNA. The stop codon at positions 9619 to 9621 is followed by an N C R of 209 bases, excluding the poly(A) tail. The high AU content of this 3'-NCR (61.3%) is comparable to that of the 5'-NCR (70~), but no sequence homology was observed. The 3'-NCR displayed 93.3~ identity with that of a Chinese strain of TuMV (Kong et al., 1990), but did not agree with data reported by Tremblay et al. (1990). We believe that a cloning artefact was introduced during the previous work. Proteolytic processing o f the polyprotein

The C-terminal domain of NIa-Pro is the proteinase responsible for several of the polyprotein processing

events (Dougherty & Parks, 1991). A seven amino acid block is sufficient to define the cleavage site recognized by NIa-Pro (Dougherty et aL, 1988) and it has been suggested that differential proteolysis of these sites may regulate potyviral gene expression (Dougherty et al., 1989). The amino acid sequence surrounding the cleavage site between TuMV NIb and CP has been determined previously (Tremblay et al., 1990) and is similar to the NIa-Pro consensus cleavage site of other potyviruses (reviewed in Riechmann et aL, 1992). Based on these consensus NIa-Pro recognition cleavage sites and on the N I b - C P cleavage site sequence, we looked for all potential NIa-Pro cleavage sites in the TuMV amino acid sequence. Seven potential heptapeptides were found and these are shown in Fig. 6. The sequences of these cleavage sites agree with consensus potyviral sequences except for the p6K2-VPg site which is characterized by a glutamic acid residue preceding the hydrolysed peptide bond, a situation similar to that in PSbMV (Johansen et aL, 1991). If hydrolysed in vivo, cleavage at these sites would produce seven proteins very similar to previously described potyviral proteins: p6K1, CI, p6K2, NIa (VPg-proteinase), NIb and CP. Certain evidence indicates that p6K1 is released from the polyprotein of PPV (Lain et al., 1989; Riechmann et al., 1992) or TVMV (Rodriguez-Cerezo & Shaw, 1991), although this has not been observed for TEV (Parks et aL, 1992); the controversy lies in whether the P3-p6K1 site is cleaved. The region of the TuMV polyprotein containing the last 52 amino acids of P3, all of p6K1 and the first 53 amino acids of CI was aligned with that of five potyviruses (PPV, TEV, PVY, TVMV and PSbMV) (Fig. 7) and the percentage identity calculated to be 30 ~, 83 ~ and 64 for P3, p6K1 and CI respectively. The evident shift in similarity downstream from the first cleavage site suggests that p6K1 may be released from P3 by NIa-Pro. The protein following p6K1 contains the nucleoside triphosphate-binding motif (NTBM) (Miller & Purcell, 1990) and thus would be CI with an associated helicase

2792

O. Nicolas and J.-F. Lalibertk

TuMV

1124 ME I E W ~ i F H H N

TEV

lO6O

w D N I I

XLTHSASQ

r~_~;~EIELR - ~ T

F Q Y S K L E N P I G Y R S T A

L

TVMV

1012

L E D K T

Y D D F K A K L P E G S

T

PSbMV

1165

RYTLW

AQEGR

N

TuMV PPV TEV PVY TVMV PSbMV

TuMV PPV TEV PVY TVMV PSbMV

H G

R F~_~Q ~ _ _ ~ S

i is

o

QVTRE

E F

Q

EAFE

KS

Q IVQFAQA-

RQRD

KE

VYE

g YKFC

YL

G

QME -M

ME

Y

T

Ab~_A_~L /~_~b~_FJG S DIR S

K

YvT

v *.

T[]PF

T"/QIL

K T s L P

Q P ~ QFqQ M olK T Fic

E

Fig. 7. Amino acid alignment in the region of the p6K1 protein of TuMV, PPV, TEV, PVY, TVMV and PSbMV. Numbers represent the position of the first amino acid in the polyprotein of the six potyviruses. The putative cleavage sites delimiting the p6K 1 protein are indicated by vertical arrows. Amino acids identical in at least three sequences are boxed.

activity. Immediately after this is a region that could potentially encode a protein of 53 amino acids (p6K2) with an Mr of 5966. No similarity greater than 51% was found with the corresponding potyviral proteins. Next is the NIa with a calculated Mr of 49292, which is a multifunctional protein: the N-terminal domain is the 22K VPg and the C-terminal domain the 27K proteinase. Recently, release of the 22K protein from TuMV NIa has been shown to take place efficiently in E. coli (Lalibert6 et al., 1992). The region directly following this is NIb and includes the GDD motif associated with replicase activity (Dougherty & Carrington, 1988) at positions 2709 to 2711 on the polyprotein. The TuMV NIb protein would be composed of 518 amino acids with a calculated Mr of 59594. The last protein is the CP, which has already been characterized (Tremblay et al., 1990). It has been shown that complete processing of the potyviral polyprotein requires two more proteolytic activities: HC-Pro (Carrington et al., 1989) and P1 (Verchot et al., 1991). The consensus HC-Pro cleavage sequence is K-X-Y-X-V-G/G (where cleavage occurs between the two glycine residues) (Carrington & Herndon, 1992). A similar sequence (K-H-Y-R-V-G-G) is present at positions 815 to 821 of the TuMV polyprotein and hydrolysis at this site would release a 40-3K P3 protein from the HC-Pro and the p6K1 protein. On the other hand, the precise P1 cleavage site has not yet been identified but the sequence I-V-H-F-S is

found at positions 359 to 363 in the TuMV polyprotein and may represent the cleavage site sequence (Mavankal & Rhoads, 1991). Furthermore, conserved amino acids found in the C-terminal half of the protein have been identified (Verchot et al., 1991). These same amino acids are located in TuMV P1. We thank Johanne Roger for her technical expertise in sequencing and Robert Lalonde for his computer help. O.N. is supported by a studentship from le Fonds pour la Formation des Chercheurs et l'Aide la Recherche (FCAR) du Qu6bec. This work was supported by the FCAR, Programme de Soutien aux l~quipes de Recherche and le Conseil des Recherches en P6cherie et Agroalimentaire du Qu6bec (CORPAQ).

References ALLISON, R., JOHNSTON, R. E. & DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: evidence for the synthesis of a single protein.

Virology 154, 9 20. ALTMANN, M., BLUM, S., WILSON, T. M. A. & TRACHSEL,H. (1990). The Y-leader sequence of tobacco mosaic virus RNA mediates initiation-factor-4E-independent, but still initiation-factor-4Adependent translation in yeast extracts. Gene 91, 127-129. CALZONE,F. J., BRITTEN, R. J. & DAVIDSON,E. H. (1987). Mapping of gene transcripts by nuclease protection assays and cDNA primer extension. Methods in Enzymology 152, 611-632. CARRINGTON, J. C. 8~ DOUGHERTY, W. G. (1987). Small nuclear inclusion protein encoded by a plant potyvirus genome is a protease.

Virology 61, 2540-2548. CARRINGTON, J. C. 8~ FREED, D. n. (1990). Cap-independent enhancement of translation by a plant potyvirus 5' nontranslated region. Journal of Virology 64, 1590-1597.

Turnip mosaic virus

CARRINGTON,J. C. St, HERNDON, K. L. (1992). Characterization of the potyviral HC-Pro autoproteolytic cleavage site. Virology 187, 308-315. CARRINGTON, J. C., CARY, S. M., PARKS, T. O. 8/. DOUGHERTY, W. G. (1989). A second protease encoded by a plant potyviral genome. EMBO Journal 8, 365-370. CHANG, C.-A., HIEBERT, E. & PURCIPULL, D. E. (1988). Analysis of in vitro translation of bean yellow mosaic virus RNA" inhibition of proteolytic processing by an antiserum to the 49K nuclear inclusion protein. Journal of General Virology 69, 1117-1122. DOMIER, L L., FRANKLIN, K. M., SHAHABUDDIN, M., HELLMANN, G. M., OVERMEYER, J. H., HIREMATH, S. T., SIAW, M. F. E., LOMONOSSOFF, G. P., SHAW, J. G. & RrlOADS, R. E. (1986). The nucleotide sequence of tobacco vein mottling virus RNA. Nucleic Acids Research 14, 5417-5430. OOUGHERTY, W. G. & CARRINGTON, J. C. (1988). Expression and function of potyviral gene products. Annual Review of Phytopathology 26, 123-142. DOUGHERTY, W. G. &. PARKS, T. n . (1991). Post-translational processing of the tobacco etch virus 49-kDa small inclusion polyprotein: identification of an internal cleavage site and delimitation of VPg and proteinase domains. Virology 183, 449-456. DOUGHERTY, W. G., CARRINGTON, J. C., CARY, S. M. & PARKS, T. O 0988). Biochemical and mutational analysis of a plant virus polyprotein cleavage site. EMBO Journal 7, 1281-1287. DOUGHERTY, W. G., CARY, S. M. 8z PARKS, T. D. (1989). Molecular genetic analysis of a plant virus polyprotein cleavage site: a model. Virology 171, 356-364. FROHMAN, M. A., DUSH, M. K. & MARTIN, G. R. (1988). Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proceedings of the National Academy of Sciences, U.S.A. 85, 8998-9002. GARCiA, J. A., RIECHMANN,J. L. & LAIN, S. (1989). Proteolytic activity of the plum pox potyvirus NIa-like protein in Escherichia coll. Virology 170, 362-369. GHABRIAL, S. A., SMITH, H. A., PARKS, T. D. ~,~DOUGHERTY, W. G. (1990). Molecular genetic analyses of the soybean mosaic virus NIa proteinase. Journal of General Virology 71, 1921-1927. HELLMANN, G. M., SHAW, J. G. & RHOADS, R. E. (1988). In vitro analysis of tobacco vein mottling virus NIa cistron: evidence for a virus-encoded protease. Virology 163, 554--562. JOBLING, S. A. & GEHRKE, L. (t 987). Enhanced translation of chimeric messenger RNAs containing a plant viral untranslated leader sequence. Nature, London 325, 622-625. JOHANSEN, E., RASMUSSEN,O. F., HEIDE, M. & BORKHARDT, B. (1991). The complete nucleotide sequence of pea seed-borne mosaic virus RNA. Journal of General Virology 72, 2625-2632. KONG, L.-J., FANG, R.-X., CHEN, Z.-H. & MANG, K.-Q. (1990). Molecular cloning and nucleotide sequence of coat protein gene of turnip mosaic virus. Nucleic Acids Research 18, 5555. LAIN, S., RIECHMANN, J. L. & GARCiA, J. A. (1989). The complete nucleotide sequence of plum pox potyvirus RNA. Virus Research 13, 157-172. LAIN, S., RIECHMANN,J. L. & GARCiA, J. A. (1990). RNA helicase: a novel activity associated with a protein encoded by a positive strand RNA virus. Nucleic Acids Research 18, 7003-7006. LAIN, S., MARTIN, M. T., RIECHMANN, J. L. & GARCiA, J. A. (1991). Novel catalytic activity associated with positive-strand RNA virus infection: nucleic acid-stimulated ATPase activity of the plum pox potyvirus helicase-like protein. Journal of Virology 63, 1-6. LALIBERT~, J.-F., NICOLAS, O., CHATEL, H., LAZURE, C. & MOROSOLI, R. (1992). Release of a 22-kDa protein derived from the aminoterminal domain of the 49-kDa NIa of turnip mosaic potyvirus in Escherichia coll. Virology (in press). LUTCKE, H. A., CHOW, K. C., MICKER, F. S., MOSS,K. A., KERN, H. F. & SCHEELE, G. A. (1987). Selection of A U G initiation codons differs in plants and animals. EMBO Journal 6, 43-48.

2793

MAISS, E., TIMPE, U., BRISSKE, A., JELKMANN, W., CASPER, R., HIMMLER, G., MAI"rANOVICH,D. & KATINGER, H. W. D. (1989). The complete nucleotide sequence of plum pox virus RNA. Journal of General Virology 70, 513-524. MAVANKAL,G. & RHOAOS, R. E. (1991). In vitro cleavage at or near the N-terminus of the helper component protein in the tobacco vein mottling virus polyprotein. Virology 185, 721-731. MILLER, R. H. & PURCELL, R. H. (1990). Hepatitis C virus shares amino acid sequence similarity with pestiviruses and flaviviruses as well as members of two plant virus supergroups. Proceedings of the National Academy of Sciences, U.S.A. 87, 2057 2061. MURPHY, J. F., RHOADS, R. E., HUNT, A. G. & SHAW, J. G. (1990). The VPg of tobacco etch virus RNA is the 49-kDa proteinase or the N-terminal 24-kDa part of the proteinase. Virology 178, 285 288. NICOLAS, O. & LALIBERTI~,J.-F. (1991). The use of PCR for the cloning of large cDNA fragments of turnip mosaic potyvirus. Journal of Virological Methods 32, 57~6. PARKS, T. D., SMITH, H. A. & DOUGHERTY, W. G. (1992). Cleavage profiles of tobacco etch virus (TEV)-derived substrates mediated by precursor and processed forms of the TEV NIa proteinase. Journalof General Virology 73, 149-155. RIECHMANN,J. L., LAIN, S. & GARCiA, J. A. (1991). Identification of the initiation codon of plum pox potyvirus genomic RNA. Virology 185, 544-552. RIECHMANN, J. L., LAIN, S. & GARCiA, J. A. (1992). Highlights and prospects of potyvirus molecular biology. Journal of General Virology 73,1 16. ROBAGLIA, C., DURAND-TARDIF, M., TRONCHET, M., BOUDAZIN, G., ASTIER-MANIFACIER, S. & CASSE-DELBART, F. (1989). Nucleotide sequence of potato virus Y (N strain) genomic RNA. Journal of General Virology 70, 935-947. RODRIGUEZ-CEREZO, E. 8¢. SHAW, J. G. (1991). Two newly-detected non-structural viral proteins in potyvirus-infected cells. Virology 185, 572-579. SAMBROOK, J., FRITSCH, E. F. & MANIATIS, T. (1989). Molecular Cloning: A Laboratory Manual, 2rid edn. New York: Cold Spring Harbor Laboratory. SANGER, F., NICKLEN, S. • COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467. SHA'I~UCK, V. I., BROLLEY, B., STOBBS, L. W. & LOUGHEED, E. C. (1989). The effect of turnip mosaic virus infection on the mineral content and storability of field-grown rutabaga. Communications in Soil Science and Plant Analysis 20, 581 595. SrlUKLA, D. D. & WARD, C. W. (1989). Structure of potyvirus coat proteins and its application in the taxonomy of the potyvirus group. Advances in Virus Research 36, 273-314. TREMBLAY, M.-F., NICOLAS, O., SINHA, R. C., LAZURE, C. St. LALIBERTI~,J.-F. (1990). Sequence of the T-terminal region of turnip mosaic virus RNA and the capsid protein gene. Journal of General Virology 71, 2769-2772. TURPEN, T. (1989). Molecular cloning of a potato virus Y genome: nucleotide sequence homology in non-coding regions of potyviruses. Journal of General Virology 70, 1951-1960. VERCHOT,J., KOONtN, E. V. & CARRINGTON, J. C. (1991). The 35-kDa protein from the N-terminus of the potyviral polyprotein functions as a third virus-encoded proteinase. Virology 185, 527-535. WARD, C. W. & SHUKLA, D. D. (1991). Taxonomy of potyviruses: current problems and some solutions. Intervirology 32, 269-296. ZUKER, M. & STIEGLER, P. (1981). Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research 9, 133-148.

(Received 21 May 1992; Accepted 21 July 1992)

Suggest Documents