Proc. Natl. Acad. Sci. USA Vol. 85, pp. 2061-2065, April 1988 Biochemistry

The predicted DBL oncogene product defines a distinct class of transforming proteins (B-cell lymphoma/transfection/transformation)

ALESSANDRA EVA*, GIANCARLO VECCHIO†, C. DURGA RAO, STEVEN R. TRONICK, AND STUART A. AARONSON

Laboratory of Cellular and Molecular Biology, National Cancer Institute, Building 37, Room 1E24, Bethesda, MD 20892

Communicated by Roscoe O. Brady, November 23, 1987 (received for review August 31, 1987)

ABSTRACT The DBL transforming gene was originally identified by transfection of NIH 3T3 cells with DNA from a human B-cell lymphoma. This gene was found to have arisen as a result of recombination of the 3' portion of the DBL protooncogene coding sequences with an unrelated segment of human DNA. It encodes a cytoplasmic protein that is equally distributed between cytosol and crude membrane fractions. To further characterize this transforming gene, a biologically active cDNA clone of the DBL transforming gene mRNA was isolated. Analysis of the sequence of the DBL oncogene cDNA revealed a long open reading frame that encodes a hybrid protein whose first 50 amino acids (at least) derive from a complete exon of a different locus. No significant homology with known oncogenes or any known protein sequences was demonstrated. Computer analysis of the predicted DBL protein indicated that it is highly hydrophilic, with no hydrophobic domains characteristic of a membrane-spanning region or signal peptide. Thus, the DBL oncoprotein is distinct among known transforming gene products.

A number of human transforming genes have been detected by DNA-mediated gene transfer techniques utilizing NIH 3T3 cells. Oncogenes detected by DNA transfection have been shown to arise from such subtle changes as point mutations (1) or from gross rearrangements that have occurred either in vivo (2) as a consequence of chemical transformation (3) or in vitro during the process of DNA transfer (4-8). The majority of oncogenes identified so far by this approach are members of the RAS gene family. RAS genes encode GTP-binding proteins with autokinase activity and are thought to play a critical but as yet undefined role in signal transduction (9). Other oncogenes detected by DNA transfection analysis include several whose proteins share homology with tyrosine kinases (2, 10, 11). Finally, the predicted products of certain genes detected by this approach possess similarities to the family of fibroblast growth factors (12).

We have molecularly cloned a human oncogene, DBL, which was initially detected by transfection of NIH 3T3 cells with DNA of a B-cell lymphoma (13). An independent isolate of a DBL-related transforming gene was obtained following transfection of NIH 3T3 cells with DNA of a human nodular, poorly differentiated lymphoma (8). Comparison of the structure of these independently isolated DBL oncogenes with that of the normal DBL locus has indicated that they arose as a result of independent gene rearrangements involving portions of two different genes that recombined with the 3' portion of the DBL protooncogene coding sequences. By molecular hybridization, the DBL oncogene lacks detectable homology to characterized oncogenes. The DBL oncogene product has been identified by immunological approaches as a 66-kDa product that fractionates in both the soluble and crude membrane fractions of DBL-transformed cells. The protein is phosphorylated on serine residues but appears to lack detectable kinase activity or GTP-binding properties (14).

To investigate the structure and function of the DBL oncogene product, we endeavored to isolate biologically active cDNA. We describe the nucleotide sequence of the putative DBL transcriptional unit. Our findings establish the nature of the DBL transforming gene product and indicate that DBL may represent a distinct class of oncogenes.

MATERIALS AND METHODS

Cells and Transfection Assay. NIH 3T3 cells and DBL transfectants of NIH 3T3 have been described (13). Cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% (vol/vol) calf serum. DNA transfer into NIH 3T3 cells was performed by the calcium phosphate precipitation method as described (15).

cDNA Cloning. Total cellular RNA was purified and poly(A)+ RNA prepared as described (16-18). Poly(A)+ RNA (10 μg) was used as a template for cDNA synthesis by 2000 units of Moloney murine leukemia virus reverse transcriptase (Bethesda Research Laboratories). The first cDNA strand was synthesized as described by Maniatis et al. (19), and the second strand was synthesized according to Gubler and Hoffman (20). Double-stranded cDNA was methylated at 37°C for 30 min with EcoRI methylase (New England BioLabs), and phosphorylated EcoRI linkers (Pharmacia) were added by ligation at 14°C for 48 hr. DNA was digested with EcoRI, and free linkers were removed by binding the cDNA to NA45 DEAE-cellulose membrane (21) as described by the manufacturer (Schleicher & Schuell). The purified cDNA was ligated to dephosphorylated EcoRI-digested λgt11 arms (Promega Biotec), and the resulting library was screened by plaque hybridization with genomic fragments of the DBL oncogene containing transcribed sequences (8).

RESULTS

Isolation and Characterization of DBL cDNA. We have reported (13) the isolation, from a genomic library, of a recombinant cosmid clone containing the DBL oncogene. Its transcribed sequences were shown to be distributed over a 30-kilobase (kb) span within the molecularly cloned 45-kb segment of human DNA that contained the transforming gene. The physical map of the DBL genomic clone is shown in Fig. 1. By restriction mapping, the transcribed region of DBL corresponded closely to its normal homologue except at the 5' end, where a rearrangement involving transcribed DBL sequences distinguished the oncogene from its normal counterpart. A rearrangement was also detected at the 3' terminus of DBL relative to the DBL protooncogene, but this rearrangement did not involve sequences that were detectably transcribed (8). Further evidence of structural rearrangement of the oncogene derived from findings that its 2.8-kb hybrid transcript was significantly smaller than the 5.3-kb transcript of the DBL protooncogene (8). To characterize the DBL oncogene and define its transcriptional unit, we isolated cDNA clones corresponding to the entire DBL oncogene coding sequences.

*To whom reprint requests should be addressed.
†Present address: University of Naples, Dipartimento di Biologia e Patologia Cellulare e Molecolare, Via Sergio Pansini, 5, Naples, Italy 80131.
‡This sequence is being deposited in the EMBL/GenBank data base (Bolt, Beranek, and Newman Laboratories, Cambridge, MA, and Eur. Mol. Biol. Lab., Heidelberg) (accession no. J03639).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

[Figure 1: restriction maps of the dbl genomic clone, the dbl cDNA, and genomic fragment a (scales in kb)]

FIG. 1. Restriction map and structure of DBL genomic and cDNA clones. (Top) Physical map of the DBL genomic clone. The open boxes represent the 5' and 3' structural rearrangements of the DBL gene involving recombinational events with human sequences of different origin. Genomic fragments a and b were subcloned in pUC13 and used as probes to screen the recombinant phage cDNA library by plaque hybridization. (Middle) Structure and restriction map of the assembled DBL cDNA, as determined by characterization of the isolated DBL cDNA clones. The origin of the 5' end of the DBL transcribed sequences from genomic fragment a was determined by nucleotide sequence analysis. The region of the DBL genomic clone corresponding to its cDNA was approximated on the basis of Southern blot analysis. The hatched box represents the 5'-untranslated region, the solid box represents the whole DBL coding sequence, and the cross-hatched box represents the 3'-untranslated region. (Bottom) Restriction map and structure of DBL genomic fragment a. The stippled box indicates the genomic region where a putative promoter was identified, the hatched box represents the 5'-untranslated region, the solid box represents the first exon of the DBL gene, and the dotted line indicates the beginning of the first intron. Restriction sites are as follows: A, Acc I; C, Cla I; E, EcoRI; H, HindIII; N, Not I; P, Pst I; S, Sal I; Sc, Sac I; Ss, Sac II; X, Xba I. Ss and H cleavage sites shown on the DBL cDNA schematic diagram (Middle) delimit the boundaries of the p4.5 and pC11 inserts (see text).

To isolate DBL cDNA, poly(A)+ RNA was purified from a DBL transfectant and used as the template for oligo(dT)-primed cDNA synthesis. A cDNA library consisting of 10^6 recombinant clones was constructed in λgt11 and screened by plaque hybridization with two repetitive-sequence-free genomic fragments (fragments a and b in Fig. 1), isolated from the cloned DBL oncogene and known to contain coding sequences (8). Twenty-two recombinant phages were identified, each containing EcoRI fragments ranging from 0.18 to 1.8 kb. Further characterization of these fragments by restriction enzyme analysis and hybridization to DBL-induced transfectant DNAs indicated that a series of overlapping clones had been isolated. The physical map of the assembled DBL cDNA was constructed and found to span ≈2.3 kb, as shown in Fig. 1. Results of hybridization of cDNA clones to human placenta DNA were consistent with the presence of a structural rearrangement at the 5' end of the DBL oncogene, involving a recombinational event with coding sequences of a different origin, as reported (8).

Sequence and Organization of the DBL Gene Transcript. The complete nucleotide sequence of DBL cDNA clones was determined by the methods of Maxam and Gilbert (22) and Sanger et al. (23) after subcloning the cDNA fragments into pUC19. As shown in Fig. 2, a single major open reading frame of 1418 base pairs, capable of directing synthesis of a 478-amino acid polypeptide, extended from nucleotide 157 at the 5' end to the termination codon TGA at nucleotide 1590. The first in-frame ATG at position 157 was preceded by a termination codon, TAG, at position 61. Moreover, a purine was present three nucleotides upstream from this methionine, in accordance with some of the conserved features that identify the initiation codon (25). The next in-frame methionine codon was localized within the known transcribed region of DBL (8) at position 388. Thus, the ATG at position 157 is the most likely initiation codon for translation of the DBL oncogene product.

To define the full 5' extent of the DBL transcript, we searched for DBL genomic DNA fragments overlapping with the 5' end of DBL cDNA and extending upstream from it. We chose the genomic fragment a (Fig. 1), used to screen the DBL cDNA library, since it was known to contain 5'-coding sequences and to originate from the 5'-rearranged region of the DBL genomic clone (8). This fragment, whose restriction map is shown in Fig. 1, was found to contain 292 nucleotides overlapping with the cDNA as well as information upstream. Comparison of the sequence with consensus sequences for eukaryotic regulatory elements revealed several regions of significant homology (Fig. 2). In the genomic fragment, ending two nucleotides upstream from the 5' end of the cloned cDNA sequence, a CACTTCTCCT sequence, similar to the described cap site consensus sequence CAYYYYYYYY (where Y = pyrimidine) (26), was identified. The first cytidine of this nucleotide stretch was tentatively identified as the transcriptional initiation site (position 1). The DBL coding sequence was, thus, flanked by a 5'-noncoding stretch of 156 nucleotides. The 5'-untranslated region and the genomic region upstream from the hypothetical cap site were highly G + C rich. The sequences from positions -241 to -1 and from positions +1 to +156 were 76% and 72% G + C, respectively.
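The coordinate and consensus checks described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function names are hypothetical, and the only inputs taken from the text are the cap-site sequence CACTTCTCCT, the consensus CAYYYYYYYY, and the nucleotide positions 61, 157, and 388.

```python
import re

# Cap-site consensus quoted in the text: CA followed by eight pyrimidines (Y = C or T).
CAP_CONSENSUS = re.compile(r"CA[CT]{8}")


def matches_cap_consensus(site: str) -> bool:
    """True if the 10-nucleotide site fits CAYYYYYYYY exactly."""
    return CAP_CONSENSUS.fullmatch(site.upper()) is not None


def in_frame(pos_a: int, pos_b: int) -> bool:
    """True if two codon start positions lie in the same reading frame."""
    return (pos_b - pos_a) % 3 == 0


def residue_number(start_codon_pos: int, codon_pos: int) -> int:
    """1-based residue index of a codon, counting from the initiation codon."""
    if not in_frame(start_codon_pos, codon_pos):
        raise ValueError("codon is not in frame with the initiation codon")
    return (codon_pos - start_codon_pos) // 3 + 1


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence (hypothetical helper)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    print(matches_cap_consensus("CACTTCTCCT"))  # sequence reported at the putative cap site
    print(in_frame(61, 157))                    # upstream TAG versus the ATG at 157
    print(residue_number(157, 388))             # residue index of the next in-frame ATG
```

The G + C figures quoted in the text (76% and 72%) would be obtained by passing the corresponding genomic and 5'-untranslated stretches to `gc_percent`; those stretches are not reproduced here.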
Upstream from the cap site, from positions -117 to -57, three GGGCGG repeats were identified. These same sequences have been found in the promoters of certain viral and cellular genes and appear to serve as binding sites for promoter-specific transcription factors (27, 28). Downstream from the putative initiation codon for the DBL gene product, at position +305, the genomic nucleotide sequence diverged from the cDNA sequence, indicating the end of the first exon. At the point of divergence, a GCCAG/GTGGATG nucleotide stretch was identified, compatible with a donor-splice consensus sequence (29). At position 2271, within the 702 nucleotides comprising the 3'-untranslated region, a polyadenylylation signal consensus sequence, AATAAA (30), was present, followed 17 nucleotides downstream by a stretch of 16 adenines. All of these findings indicated that we had cloned the entire transcript of DBL from two nucleotides downstream of the cap site to the polyadenylylation site. The structures of the DBL cDNA and the overlapping genomic fragment comprising the putative DBL transcriptional unit are diagramed in Fig. 1.

[Figure 2: nucleotide sequence of genomic fragment a and the DBL cDNA, with the starts of the genomic and cDNA sequences marked and the deduced amino acid sequence shown above the coding region]

FIG. 2. Nucleotide sequence and deduced amino acid sequence of DBL transcriptional unit and promoter. Nucleotides are numbered from the deoxycytidine residue of the putative cap site and amino acid residues from the presumed initiating methionine. The three G + C motifs at positions -117, -82, and -62, the cap site, and the in-frame upstream stop codon TAG at position 59 are boxed. Nucleotides in italics indicate the region of divergence between the genomic fragment a (Fig. 1) and the cDNA and probably represent the beginning of the first intron. Two stop codons are boxed at the end of the open reading frame. The polyadenylylation site is indicated at position 2271. This figure was prepared by using the DRAW program of Shapiro and Senapathy (24).

Primary and Secondary Structure of the DBL Oncoprotein. To gain insight into the possible function of the DBL oncogene product, we searched the National Biomedical Research Foundation data base§ for similar sequences by using the FASTP program of Lipman and Pearson (31). No similarity to oncogene products or oncogene-related proteins such as protein kinases was found. Moreover, there were no statistically significant matches between DBL and any other sequences in the data base.

§Protein Identification Resource (1987) Protein Sequence Database (Natl. Biomed. Res. Found., Washington, DC), Release 13.0.
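The nucleotide motif searches reported for the transcriptional unit (GGGCGG repeats in the promoter, the AATAAA polyadenylylation signal, and the terminal run of adenines) amount to simple pattern scans. The following Python sketch illustrates the idea under stated assumptions: the function names are hypothetical, the `toy` sequence is invented for demonstration and is not the DBL sequence, and only the final 3' stretch (TCAGGTT followed by 16 adenines) is taken from the text and Fig. 2.

```python
import re


def find_all(motif: str, seq: str) -> list[int]:
    """Return 1-based start positions of every occurrence of motif,
    including overlapping ones (a zero-width lookahead is used)."""
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


def poly_a_tail_length(seq: str) -> int:
    """Length of the uninterrupted run of A residues at the 3' end."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


# Invented toy sequence (NOT the DBL sequence): two GC boxes, one AATAAA
# signal, and a short poly(A) run, assembled piecewise so positions are clear.
toy = "GGGCGG" + "TT" + "GGGCGG" + "ACGT" + "AATAAA" + "CCTTGG" + "A" * 8

# 3' end of the cDNA as described in the text: a stretch of 16 adenines.
cdna_3prime_end = "TCAGGTT" + "A" * 16

if __name__ == "__main__":
    print(find_all("GGGCGG", toy))              # GC-box positions in the toy sequence
    print(find_all("AATAAA", toy))              # poly(A) signal position in the toy sequence
    print(poly_a_tail_length(cdna_3prime_end))  # length of the terminal adenine run
```

A lookahead pattern is used so that overlapping copies of a motif are all reported; a plain `str.find` loop advancing by the motif length would miss them.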


We also attempted to uncover clues to DBL function by analyzing the sequence for features that might be characteristic of various classes of proteins. The Hopp-Woods algorithm (32) was employed to examine the hydropathic properties of the predicted DBL sequence. The results of this analysis are displayed in Fig. 3. The DBL product was markedly hydrophilic in its overall characteristics. Further analysis with the Kyte-Doolittle algorithm (33) revealed no hydrophobic domains indicative of a membrane-spanning region, nor was there a presecretory signal peptide (34). Further analysis with the Quest program (IntelliGenetics) for signal sequences revealed the presence of one potential site for N-linked glycosylation starting at residue 228. The presence of nine cysteines in the sequence prompted us to determine if they were arranged in a pattern characteristic of metal ion binding and nucleic acid binding domains (35). Serine and threonine kinases (36) as well as phospholipase A2 (37) also contain characteristic spacings of cysteines. None of these patterns was found in the predicted DBL protein sequence. Finally, two putative sites for serine phosphorylation by a cAMP-dependent protein kinase were identified starting at residues 5 and 291, respectively, in agreement with the reported consensus sequence Arg-Arg-Xaa-(Xaa)-Ser-Xaa (38). This is consistent with our findings (14) that the DBL p66 product is a phosphoprotein with phosphorylation occurring specifically at serine residues.

Transforming Activity of DBL cDNA. To establish that a functional DBL cDNA had been isolated, we constructed vectors for expression of the gene in mammalian cells. pZIPNeoSV(X)1 and pSV2/gpt/MuLV, which utilize transcriptional regulatory sequences of the Moloney murine leukemia virus long terminal repeat, have been described (39, 40). The DBL cDNA was first assembled and subcloned in pUC13. The insert was then released from the plasmid by digestion with Sst II and HindIII, with resultant removal of most of the 5'- and 3'-untranslated regions (Fig. 1). The Sst II and HindIII sites were rendered blunt-ended, converted to BamHI recognition sites, and ligated to the BamHI sites of each vector. Two of the recombinant plasmids obtained (p4.5 and pC11) were tested in a DNA transfection assay utilizing NIH 3T3 cells. Foci of transformed cells were detected as early as 4-5 days after transfection and showed the morphology characteristic of NIH 3T3 cells transformed by the DBL oncogene (13). As shown in Table 1, the transforming efficiencies of the clones were 3.6-5.8 x 10^5 foci per pmol of DNA,

[Figure 3: hydrophilicity value (y axis, -3 to 3) plotted against codon number (x axis, 50 to 500)]

FIG. 3. Hydrophilicity of the predicted amino acid sequence of the DBL protein. The sequence was analyzed by using the Hopp-Woods algorithm (in the IntelliGenetics PEP analysis package) and a window of six residues. Hydrophilic and hydrophobic values are greater than and less than zero, respectively.
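The protein-level analyses described in this section (a Hopp-Woods profile averaged over a six-residue window, and searches for short consensus patterns) can be illustrated with the Python sketch below. It is not the IntelliGenetics PEP or Quest software; the function names are hypothetical. The scale values are the published Hopp-Woods hydrophilicity values, the `N_GLYC` pattern is the commonly used Asn-Xaa-Ser/Thr sequon (an assumption, since the text does not state the pattern used), and `n_term` is the first 28 residues of the deduced sequence as read from Fig. 2.

```python
import re

# Hopp-Woods hydrophilicity values (positive = hydrophilic).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hopp_woods_profile(protein: str, window: int = 6) -> list[float]:
    """Mean hydrophilicity of each consecutive window along the sequence."""
    protein = protein.upper()
    if window < 1 or len(protein) < window:
        raise ValueError("window must be between 1 and the sequence length")
    values = [HOPP_WOODS[aa] for aa in protein]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def motif_starts(pattern: str, protein: str) -> list[int]:
    """1-based start positions of all (possibly overlapping) pattern matches."""
    regex = re.compile(f"(?=(?:{pattern}))")
    return [m.start() + 1 for m in regex.finditer(protein.upper())]


# Arg-Arg-Xaa-(Xaa)-Ser-Xaa, the consensus quoted in the text.
PKA_SITE = r"RR.{1,2}S."
# Assumed sequon for N-linked glycosylation: Asn-Xaa-Ser/Thr, Xaa not Pro.
N_GLYC = r"N[^P][ST]"

# N-terminal residues 1-28 of the deduced DBL sequence, as read from Fig. 2.
n_term = "MSSGRRRGSAPWHSFSRFFAPRSPSRDK"

if __name__ == "__main__":
    profile = hopp_woods_profile(n_term)
    print(len(profile), round(profile[0], 3))
    print(motif_starts(PKA_SITE, n_term))
```

On this fragment the loose regular expression reports the site starting at residue 5 noted in the text and also the overlapping match starting at residue 6, because three consecutive arginines satisfy the pattern in two registers.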

Table 1. Transforming activity of DBL cDNA

                    Foci, no. per pmol of DNA
DNA                 No selection    Selection for Eco gpt expression    Selection for neoR expression
p4.5(pSV2/gpt)      4 x 10^5        2 x 10^5                            NT
pC11(pZIP-neo)      3 x 10^5