
Vol. 169, No. 9

JOURNAL OF BACTERIOLOGY, Sept. 1987, p. 4271-4278

0021-9193/87/094271-08$02.00/0 Copyright © 1987, American Society for Microbiology

Nucleotide Sequence of a Glucosyltransferase Gene from Streptococcus sobrinus MFe28

JOSEPH J. FERRETTI,1* MARTYN L. GILPIN,2 AND ROY R. B. RUSSELL2

Department of Microbiology and Immunology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma 73190,1 and Dental Research Unit, Royal College of Surgeons of England, Downe, Kent BR6 7JJ, United Kingdom2

Received 3 April 1987/Accepted 2 June 1987

The complete nucleotide sequence was determined for the Streptococcus sobrinus MFe28 gtfI gene, which encodes a glucosyltransferase that produces an insoluble glucan product. A single open reading frame encodes a mature glucosyltransferase protein of 1,559 amino acids (Mr, 172,983) and a signal peptide of 38 amino acids. In the C-terminal one-third of the protein there are six repeating units containing 35 amino acids of partial homology and two repeating units containing 48 amino acids of complete homology. The functional role of these repeating units remains to be determined, although truncated forms of glucosyltransferase containing only the first two repeating units of partial homology maintained glucosyltransferase activity and the ability to bind glucan. Regions of homology with alpha-amylase and glycogen phosphorylase were identified in the glucosyltransferase protein and may represent regions involved in functionally similar domains.

The glucosyltransferases (EC 2.4.1.5) produced by various species of oral streptococci are of considerable interest because of their production of extracellular glucans from sucrose. These glucans are thought to play a key role in the development of dental plaque because of their ability to adhere to smooth surfaces and mediate the aggregation of bacterial cells and food debris (12). It is known that a single strain can produce several distinct glucosyltransferases differing in electrophoretic, antigenic, or enzymatic properties, although some of this apparent variety may be due to the use of different oral streptococcal strains and different purification procedures and activity assays by different laboratories. The properties and characteristics of the glucosyltransferases of the mutans group streptococci have been reviewed by Ciardi (3) and Mukasa (18). Recently, several glucosyltransferase genes from various strains of streptococci have been cloned by recombinant DNA techniques and have been shown to be expressed in Escherichia coli. Robeson et al. (24) have cloned a glucosyltransferase gene (gtfA) from Streptococcus mutans UAB90 (serotype c) and shown that it produces a protein with a molecular weight of 55,000. A similar gtfA gene has also been cloned by Pucci and Macrina (23) from S. mutans LM7 (serotype e) and by Burne et al. (2) from S. mutans GS-5 (serotype c). Aoki et al. (1) reported the cloning of a glucosyltransferase gene (gtfB) from S. mutans GS-5 that produces a protein with a molecular weight of about 150,000. Another glucosyltransferase gene, gtfC, which specifies a 150,000-molecular-weight polypeptide, has been obtained from S. mutans LM7 by Pucci et al. (22). Finally, Gilpin et al. (9) have cloned two glucosyltransferase genes from Streptococcus sobrinus MFe28 (serotype h): gtfS, which encodes a glucosyltransferase that synthesizes a water-soluble glucan, and gtfI, which encodes a glucosyltransferase that synthesizes a water-insoluble glucan.
The availability of these cloned genes allows further characterization of both the genes and gene products, and in this communication, we report the complete nucleotide sequence of the gtfI gene from S. sobrinus MFe28.

MATERIALS AND METHODS

Bacteria and media. E. coli MAF1 (containing plasmid pMLG1) was the initial source of the S. sobrinus MFe28 gtfI gene (27a). E. coli MAF5 contains plasmid pMLG5, which has the same 5.0-kilobase (kb) fragment as pMLG1 and an additional 0.5-kb fragment from the bacteriophage lambda recombinant in which the gtfI insert was first cloned (9). E. coli JM109 was used as the recipient for transfection experiments with M13 bacteriophage vectors (35) and was routinely grown in 2x YT broth (19). Soft agar overlays consisted of 2x YT broth supplemented with final concentrations of 0.75% agar, 0.33 mM isopropyl-β-D-thiogalactopyranoside, and 0.02% 5-bromo-4-chloro-3-indolyl-β-galactoside for differentiating recombinant and nonrecombinant phages. For the titration of M13 recombinants carrying all or part of gtfI, phages were plated on E. coli JM109 on B-broth agar (26) to which 1% sucrose was added for detection of enzyme activity.

Enzymes and chemicals. Restriction enzymes were purchased from Bethesda Research Laboratories, Inc., Gaithersburg, Md., and were used in accordance with the specifications of the manufacturer. T4 DNA ligase was purchased from Amersham Corp., Arlington Heights, Ill., or Bethesda Research Laboratories. The Klenow fragment of DNA polymerase and the M13 15-base primer were purchased from Bethesda Research Laboratories. The deoxy- and dideoxynucleotide triphosphates were purchased from P-L Biochemicals, Inc., and [α-32P]dATP was purchased from New England Nuclear Corp., Boston, Mass. Isopropyl-β-D-thiogalactopyranoside and 5-bromo-4-chloro-3-indolyl-β-galactoside were purchased from Sigma Chemical Co., St. Louis, Mo.

Subcloning of the gtfI gene and nucleotide sequencing. The gtfI gene was obtained for subcloning experiments by digestion of pMLG1 with HindIII followed by electrophoresis and isolation of the 5.0-kb fragment from 0.8% type VII agarose gels as previously described (16). The fragment was unidirectionally degraded with Bal 31 by a modification of the procedure of Gilmore et al. (8), and all subcloning into M13 phages mp18 and mp19 was done as described by Ferretti et al. (7). A 0.5-kb HindIII fragment was subsequently isolated from pMLG5, cloned into M13 phages, and sequenced. This fragment contained the remainder of the glucosyltransferase sequence not present in the pMLG1 HindIII fragment. Sequencing reactions were performed by the Sanger dideoxy chain termination method (28) by using the procedures described by Amersham. All sequences were confirmed from at least two overlapping clones, and the entire gene sequence was determined on both strands. The sequence information was analyzed by the James M. Pustell DNA/protein sequencing program obtained from International Biotechnologies, Inc., New Haven, Conn.

Gel electrophoresis. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and detection of glucosyltransferase activity by incubation of gels with sucrose in the presence of Triton X-100 were done as described previously (9), but the sensitivity of the method was enhanced by the use of a periodic acid-Schiff reagent procedure modified from published methods (15, 29). After incubation in sucrose (generally for 40 h at 37°C), the gels were fixed for 30 min in 75% ethanol and treated on a shaker for 30 min with 0.7% periodic acid in 5% acetic acid. They were then shaken for 60 min in several changes of 0.2% sodium metabisulfite in 5% acetic acid and placed in Schiff reagent (Sigma) for several hours. Finally, the gels were washed extensively in 45% methanol-45% acetic acid-10% water. All procedures were carried out at room temperature.

Purification of glucosyltransferase and glucan-binding peptides. Glucosyltransferase was purified from cells of E. coli MAF5 (which carries plasmid pMLG5) by subjecting the bacteria to ultrasonic disruption and using a single-step affinity chromatography procedure: the bacterial extract was passed through a column containing Sepharose 1000 (Pharmacia) and mutan; after the column was washed with buffer, bound glucosyltransferase was eluted with 5 M guanidine hydrochloride (26). The same procedure was used for detection of glucan-binding peptides; i.e., disrupted cell extracts (plus, for phage M13-infected cultures, the culture supernatant) were passed through the affinity column. Eluted peptides were analyzed by SDS-PAGE, and derivation from glucosyltransferase was confirmed by Western blotting (immunoblotting) with antiserum against purified glucosyltransferase (27a).

N-terminal sequence analysis. N-terminal sequence analysis of purified glucosyltransferase was done with an Applied Biosystems 470A protein sequencer with an on-line 120A PTH analyzer in accordance with the instructions of the manufacturer.

* Corresponding author.

FIG. 1. SDS-PAGE of glucosyltransferase activity in S. sobrinus MFe28, E. coli MAF5 (carrying pMLG5), and E. coli MAF1 (carrying pMLG1). The right lane contains the following protein standards: RNA polymerase β' subunit (Mr, 165,000), RNA polymerase β subunit (155,000), β-galactosidase (116,000), phosphorylase b (97,400), albumin (66,000), and ovalbumin (45,000).

RESULTS

Cloning of complete gtfI gene. The gtfI gene from S. sobrinus MFe28 was originally cloned into the BamHI sites of bacteriophage lambda L47.1 and was located in a 7.6-kb DNA insert. Subsequently, mapping experiments showed that HindIII cleaved the insert at three points and the lambda DNA at two points outside the insert to give four fragments of 5.0, 2.6, 2.5, and 0.5 kb which carried S. sobrinus DNA. Plasmid pMLG1, from which a functional glucosyltransferase is expressed in E. coli, carries the 5.0-kb fragment (27a). At an early stage of the nucleotide-sequencing program, it became clear that pMLG1 contained a long open reading frame with no termination codon. These results indicated that the entire gtfI gene was not present in pMLG1, and so a further collection of derivatives of pBR322 carrying HindIII fragments from the bacteriophage lambda recombinants was examined. One of these, pMLG5, encoded a glucosyltransferase of 173 kilodaltons (Fig. 1) and was found to have both the 5.0- and 0.5-kb fragments. Restriction mapping of pMLG5 and subsequent nucleotide sequencing confirmed that the 0.5-kb fragment carried the information for the C-terminal region of glucosyltransferase. A partial restriction site map of the 5.5-kb insert containing the gtfI gene is shown in Fig. 2.

Nucleotide sequence. The complete nucleotide sequence of the 4,995-base-pair (bp) fragment carrying the gtfI gene was determined in both orientations and is shown in Fig. 3. Previous evidence had indicated that 0.5 kb of the insert in pMLG1 and pMLG5 was derived from the bacteriophage lambda vector (27a), and this was confirmed by the finding that the first 494 bp of the 5.5-kb fragment showed 100% homology with the corresponding part of the lambda sequence (lambda sequence not shown). An open reading frame containing 4,791 bp codes for the glucosyltransferase protein. The deduced amino acid sequence, which is coded starting at the ATG codon beginning at position 160 and extending to the termination codon TAA at position 4951, contains 1,597 amino acids and has a molecular weight of 177,100. A putative ribosome-binding site sequence (AGGAGGA) is located nine nucleotides upstream from the translation initiation codon. Further upstream is a probable -35 region (TTGACG) separated by 18 bp from a proposed -10 region (TTAAAA).

Amino acid composition. The deduced amino acid composition indicated a highly hydrophilic protein, with a hydrophobic N-terminal region. This region displayed the characteristics expected of a signal peptide, i.e., a basic N-terminal region followed by a central hydrophobic region and a more polar C-terminal region. The residues surrounding amino acid 38 conform with the "-3, -1" rule proposed by von Heijne (31) for amino acids found at a cleavage site. To

FIG. 2. Partial restriction map of the 4,995-bp fragment containing the gtfI gene and the 494-bp fragment of bacteriophage lambda (hatched) carried by plasmid pMLG5. The 5' end of the gene is at the left, and the 3' end is at the right.
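The coordinates quoted in the text can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the reading frame runs from the first base of the ATG at position 160 up to, but not including, the TAA at position 4951; the variable names are illustrative and do not come from the paper.

```python
# Coordinates as reported in the text (1-based positions on the 4,995-bp fragment).
START_CODON_POS = 160    # first base of the ATG initiation codon
STOP_CODON_POS = 4951    # first base of the TAA termination codon
SIGNAL_PEPTIDE_LEN = 38  # residues removed from the precursor
FRAGMENT_BP = 4995       # S. sobrinus fragment carrying gtfI
LAMBDA_BP = 494          # bacteriophage lambda DNA in the insert

# Assumption: the coding region excludes the termination codon itself.
orf_bp = STOP_CODON_POS - START_CODON_POS
codons, remainder = divmod(orf_bp, 3)

# Mature protein length after removal of the signal peptide.
mature_len = codons - SIGNAL_PEPTIDE_LEN

# Total insert size in kilobases (S. sobrinus DNA plus lambda DNA).
insert_kb = (FRAGMENT_BP + LAMBDA_BP) / 1000

print(orf_bp, codons, remainder, mature_len, insert_kb)
```

Under this assumption the open reading frame length is a whole number of codons, and the residue counts and the approximately 5.5-kb insert size agree with the values given in the text.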

FIG. 3. Nucleotide sequence of the gtfI gene and flanking regions. Numbering begins at the 5' end of the sequence. The deduced amino acid sequence of glucosyltransferase is given below the nucleotide sequence; an arrow designates the cleavage site for the removal of the signal peptide. Putative promoter and ribosome-binding site sequences are underlined. The starts of repeat regions A1 to A6, B1, and B2 are marked.
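The relationship between the nucleotide sequence and the deduced amino acid sequence in Fig. 3 can be illustrated with a short translation sketch. It assumes the standard genetic code and uses a seven-codon stretch (AAC CAA GCA GTC TTG ACG GCT) as read from the sequence figure; the exact bases should be treated as illustrative, and the function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, built with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter codes, matching the notation used in the paper.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring spaces and any trailing partial codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Illustrative stretch as read from the sequence figure.
stretch = "AAC CAA GCA GTC TTG ACG GCT"
peptide = translate(stretch)
print("-".join(THREE_LETTER[aa] for aa in peptide))
```

The translated stretch corresponds to the last seven residues of the N-terminal sequence reported for the purified enzyme.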

investigate whether the gtfI gene product was indeed cleaved at this site, the enzyme expressed in E. coli MAF5 was purified by affinity chromatography and subjected to N-terminal amino acid analysis. The first 16 amino acids were identified as Asp-Thr-Glu-Thr-Val-Ser-Glu-Asp-Ser-Asn-Gln-Ala-Val-Leu-Thr-Ala; this sequence is identical to that of the deduced amino acid sequence directly following the postulated cleavage site. Thus, the signal peptide is 38 amino acids long and contains regions similar to those found in most secretory signal sequences (20). The mature glucosyltransferase contains 1,559 amino acids and has a molecular weight of 172,983. The deduced amino acid composition of glucosyltransferase with and without the signal peptide is presented in Table 1.

Amino acid sequence homology. A series of repeating units is located in the C-terminal one-third of the glucosyltransferase molecule (Fig. 4). One of the repeating units, designated A, is 35 amino acids long and is present six times. Although these repeats are not completely identical, repeating unit A4 was found to have the greatest homology with

TABLE 1. Amino acid composition of glucosyltransferase deduced from the nucleotide sequence of gtfI

                  No. of residues(a)
Amino acid        With signal peptide    Without signal peptide
Alanine           143                    136
Arginine          43                     41
Asparagine        109                    108
Aspartic acid     137                    137
Glutamic acid     65                     63
Glutamine         93                     93
Glycine           135                    134
Histidine         19                     18
Isoleucine        50                     49
Leucine           92                     90
Lysine            122                    117
Methionine        27                     24
Phenylalanine     64                     63
Proline           30                     30
Serine            106                    101
Threonine         124                    122
Tryptophan        25                     24
Tyrosine          98                     98
Valine            115                    111
Total             1,597                  1,559

(a) Mr with signal peptide, 177,100; Mr without signal peptide, 172,983.
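The internal consistency of Table 1 can be checked by summing each column and comparing the column difference with the 38-residue signal peptide. The sketch below copies the counts from the table; the dictionary layout and variable names are illustrative choices, not part of the paper.

```python
# Residue counts copied from Table 1: (with signal peptide, without signal peptide).
TABLE_1 = {
    "Alanine": (143, 136),
    "Arginine": (43, 41),
    "Asparagine": (109, 108),
    "Aspartic acid": (137, 137),
    "Glutamic acid": (65, 63),
    "Glutamine": (93, 93),
    "Glycine": (135, 134),
    "Histidine": (19, 18),
    "Isoleucine": (50, 49),
    "Leucine": (92, 90),
    "Lysine": (122, 117),
    "Methionine": (27, 24),
    "Phenylalanine": (64, 63),
    "Proline": (30, 30),
    "Serine": (106, 101),
    "Threonine": (124, 122),
    "Tryptophan": (25, 24),
    "Tyrosine": (98, 98),
    "Valine": (115, 111),
}
MR_MATURE = 172_983  # Mr without signal peptide, from the table footnote

# Column totals for the precursor and the mature protein.
total_precursor = sum(with_sp for with_sp, _ in TABLE_1.values())
total_mature = sum(without_sp for _, without_sp in TABLE_1.values())

# Residues that differ between the columns make up the signal peptide.
signal_peptide = {
    name: with_sp - without_sp
    for name, (with_sp, without_sp) in TABLE_1.items()
    if with_sp != without_sp
}
signal_len = sum(signal_peptide.values())

# Rough plausibility check: mean mass per residue of the mature protein.
mean_residue_mass = MR_MATURE / total_mature

print(total_precursor, total_mature, signal_len, round(mean_residue_mass, 1))
```

The column totals reproduce the 1,597 and 1,559 residues given in the table, and the difference between the columns accounts for the 38 residues of the signal peptide.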

each of the other five repeating units. Based on a comparison with A4, the homologies were as follows: A1, 65%; A2, 38%; A3, 72%; A5, 65%; and A6, 50%. The gene segments corresponding to these regions are also highly conserved and, except for repeat A2, which contained only 38% identical bases, the repeats contained 65 to 72% identical bases. Repeating unit B is present twice and contains 48 amino acids, all of which are identical. The corresponding gene regions contain a stretch of 132 identical bases.

Functional regions of glucosyltransferase. Since functional glucosyltransferase was expressed from both pMLG1 and pMLG5, the terminal 0.5 kb of the gtfI gene is clearly not essential for activity. However, we have previously reported that a deletion extending from the end of the gene to the SacI site at position 3085 resulted in expression of a truncated and enzymatically inactive peptide (27, 27a). To further define the length of the gene sequence required for expression of a functional glucosyltransferase, a series of derivatives of phage M13 containing various lengths of gtfI were examined for expression of peptides which had enzyme activity and the ability to bind to glucan (Fig. 5). The M13 derivatives R7-20, R5-2, and R7-3 all formed polymer when plated with E. coli JM109 on sucrose-containing medium (Fig. 6). The two shortest M13 derivatives tested, R7-34 and R3-6, did not form any polymer either on plates or in a tube assay for glucosyltransferase using 14C-labeled sucrose. Nor did they release reducing sugar from sucrose, whereas the longer derivatives did encode enzyme activity for the release of reducing sugars, as indicated by the fact that when they were plated on E. coli JM109 on sucrose indicator plates (Russell et al., in press), acid was produced by E. coli. The same pattern of function was found when glucan-binding ability was examined; derivatives R7-20, R5-2, and R7-3 all expressed peptides which were retained by a mutan-Sepharose column, whereas R7-34 and R3-6 did not. The results presented above indicated that the genetic information essential for enzyme activity and glucan-binding function was located in the C-terminal one-third of the gene. An in-frame gene fusion was therefore made between pUC8 and the ScaI site of gtfI located at position 3290. The


A
1100  YYFGQDGYMVTGAQNIKGSNYYFLANGAALRNTVY
1163  WRYFKNGVMALGLTTVDGHVQYFDKDGVQAKDKII
1228  YYLGKDGVAVTGAQTVGKQHLYFEANGQQVKGDFV
1293  FYLGKDGAAVTGAQTIKGQKLYFKANGQQVKGDIV
1406  WVYVKSGKVLTGAQTIGNQRVYFKDNGHQVKGQLV
1519  WLYVKDGKVLTGLQTVGNQKVYFDKNGIQAKGKAV

B
1352  VNGKTYYFGSDGTAQTQANPKGQTFKDGSGVLRFYNLEGQYVSGSGWY
1464  VNGKTYYFGSDGTAQTQANPKGQTFKDGSGVLRFYNLEGQYVSGSGWY

FIG. 4. Amino acid sequences of A and B repeating units found in the C-terminal region of the glucosyltransferase protein. The numbers at the left indicate the position number of the first amino acid of each repeating unit.
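The comparison of each A repeat with A4 can be illustrated with a short script. This is only a sketch under stated assumptions: it counts ungapped, position-by-position identities, whereas the text does not say how the homology percentages were scored, so the output should not be expected to reproduce the published values. The labels A1 to A6 are assigned here in order of position, and the function names are illustrative.

```python
# Repeat sequences as printed in Fig. 4 (single-letter amino acid code).
# Assumption: labels A1..A6 follow the order of the starting positions.
A_REPEATS = {
    "A1": "YYFGQDGYMVTGAQNIKGSNYYFLANGAALRNTVY",
    "A2": "WRYFKNGVMALGLTTVDGHVQYFDKDGVQAKDKII",
    "A3": "YYLGKDGVAVTGAQTVGKQHLYFEANGQQVKGDFV",
    "A4": "FYLGKDGAAVTGAQTIKGQKLYFKANGQQVKGDIV",
    "A5": "WVYVKSGKVLTGAQTIGNQRVYFKDNGHQVKGQLV",
    "A6": "WLYVKDGKVLTGLQTVGNQKVYFDKNGIQAKGKAV",
}

# The B repeat; the text states that both copies are identical.
B_REPEAT = "VNGKTYYFGSDGTAQTQANPKGQTFKDGSGVLRFYNLEGQYVSGSGWY"


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions carrying the same residue (no gaps allowed)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for ungapped comparison")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def compare_to_reference(repeats: dict, ref: str = "A4") -> dict:
    """Percent identity of every repeat against the chosen reference repeat."""
    return {
        name: round(percent_identity(seq, repeats[ref]), 1)
        for name, seq in repeats.items()
        if name != ref
    }


if __name__ == "__main__":
    for name, pct in compare_to_reference(A_REPEATS).items():
        print(f"{name} vs A4: {pct}% identical residues")
```

A scoring scheme that also credits conservative substitutions, or that allows gaps, would give different figures; the simple identity count is used here only because it needs no additional assumptions.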

resultant recombinant plasmid pSF86 expressed a 65-kilodalton peptide which reacted with antiserum to glucosyltransferase, indicating that the entire C-terminal one-third of the enzyme was being made. This peptide had no detectable glucosyltransferase activity but did bind to the affinity column.

Protein homology. Comparison of the deduced amino acid sequence of glucosyltransferase with other sequenced proteins revealed partial homologies with three proteins: alpha-amylase from barley, alpha-amylase from Bacillus amyloliquefaciens, and glycogen phosphorylase from rabbits (Fig. 7). The homologies of the glucosyltransferase with the two alpha-amylases overlap in the same general region, suggesting a region of functional homology for the three proteins.

DISCUSSION

Nucleotide sequence analysis of the 5-kb fragment of S. sobrinus MFe28 showed the presence of a single open reading frame which coded for the glucosyltransferase protein. The deduced amino acid sequence of the mature protein had a molecular weight of 172,983, which agreed closely with the value derived by SDS-PAGE. This value varies from

Repeat   First amino acid   First base
A1       1100               3457
A2       1163               3646
A3       1228               3841
A4       1293               4036
B1       1352               4213
A5       1406               4375
B2       1464               4552
A6       1519               4713

Constructs scored for GTF and GBP: pMLG5, pMLG1, R7-20, R5-2, R7-3, R7-34, R3-6, and pSF86.

FIG. 5. The 3' end of the gtfI gene showing positions of repeat regions and termination points of phage M13 derivatives and pSF86, a pUC8 vector carrying the terminal ScaI-HindIII fragment of gtfI. Adjacent to each fragment is shown the ability of its product to exhibit glucosyltransferase (GTF) activity or to function as a glucan-binding protein (GBP).
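The base and amino acid coordinates listed for the repeats in Fig. 5 can be cross-checked with simple arithmetic, since each amino acid corresponds to three bases. The sketch below assumes that both numbering schemes are linear over this region and that the repeat labels pair with the coordinates in the order listed; the pairing is an assumption, and the function name is illustrative. A nonzero residual only flags a value that departs from the three-bases-per-residue expectation relative to the first repeat; it does not say which number, if any, is in error.

```python
# (repeat, first amino acid, first base) as listed for Fig. 5.
# Assumption: labels pair with coordinates in positional order.
REPEAT_COORDS = [
    ("A1", 1100, 3457),
    ("A2", 1163, 3646),
    ("A3", 1228, 3841),
    ("A4", 1293, 4036),
    ("B1", 1352, 4213),
    ("A5", 1406, 4375),
    ("B2", 1464, 4552),
    ("A6", 1519, 4713),
]


def coordinate_residuals(coords):
    """Listed base minus the base predicted from the first entry at 3 bases per residue."""
    _, aa0, base0 = coords[0]
    residuals = {}
    for name, aa, base in coords:
        predicted = base0 + 3 * (aa - aa0)
        residuals[name] = base - predicted
    return residuals


if __name__ == "__main__":
    for name, diff in coordinate_residuals(REPEAT_COORDS).items():
        print(f"{name}: listed base differs from prediction by {diff}")
```

With the values above, the first six entries agree exactly with the prediction, while B2 and A6 differ by a few bases.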


FIG. 6. Accumulation of glucan above points where E. coli JM109 was infected with phage M13 derivative R7-20 on sucrose-containing medium.

previous estimates for glucosyltransferases that produce insoluble glucans (3, 18), although proteolytic degradation and problems associated with molecular weight determinations by gel analysis could easily account for the differences. The deduced amino acid composition of the glucosyltransferase indicates that it is a highly hydrophilic protein, containing 11.5% basic amino acids, 12.6% acidic amino acids, and 41.6% polar amino acids. The restriction map generated from nucleotide sequence analysis is in agreement with previous maps established for this fragment and also supports previous speculations concerning the location of probable transcription and translation initiation sites (27, 27a). These sites are similar to transcription and translation sites reported for other streptococcal genes (6, 7, 14, 17). Downstream of the coding region, a single termination codon is present, but insufficient sequence is available to comment further about sequences involved in transcription termination.

The presence of a 38-amino-acid signal peptide was confirmed by N-terminal amino acid analysis of the purified glucosyltransferase protein, in which the sequence of the first 16 amino acids was identical to the deduced sequence. This signal peptide has properties similar to those of other signal peptides (20), i.e., a positively charged N-terminal region followed by a string of 23 hydrophobic amino acids and a more polar C-terminal region. The cleavage site between Ala and Asp and the surrounding residues are in accordance with the -3, -1 rule proposed by von Heijne (31). The 38-amino-acid signal peptide of glucosyltransferase is in the general size range of other streptococcal signal peptides (6, 14, 17, 32) and of those reported for other gram-positive organisms (20).

It is apparent that E. coli is capable of recognizing the gtfI gene product and cleaving it at the site expected for removal of a secretion signal peptide. Other evidence suggests that the enzyme passes through the cytoplasmic membrane. For example, E. coli strains expressing glucosyltransferase can metabolize sucrose (27a), although sucrose can pass through only the outer membrane and not the cytoplasmic membrane (5). Glucosyltransferase would thus be expected to accumulate in the periplasmic space, but we have been unable, using conventional osmotic shock methods for release of periplasmic proteins (13, 34), to obtain release of the protein.

In view of the observation that much of the C-terminal region of glucosyltransferase is not essential for function, it is tempting to speculate that the commonly observed heterogeneity of molecular sizes in enzyme preparations (18, 25) is due to sequential degradation by proteolytic action on this end of the molecule. As yet, however, there is insufficient evidence to confirm this idea.
The C-terminal region of the glucosyltransferase protein contains two sets of repeated sequences, the A repeating unit present six times and the B repeating unit present twice. The A repeating units exhibit some variability, whereas the B repeating units are completely identical. The manner or sequence in which these duplications and changes occurred is not obvious. However, duplications in other streptococcal

Alpha-amylase (EC 3.2.1.1)

  69a  HSVIQNGYAFTDRYDID---ASKYGNAAELKSL
 840   FQSFATKEEEYTNVVIANNVDKFVSWGITDFEMAPQYVSSTDGQFLDSVIQNGYAFTDRYDLGMSKANKYGTADQLVKA
  40b  FEWYTPNDGQHWK-RLQNDAEHLSDIGITAVWIPPAYKGLSQSD-NGYGPYDLYDLGE-FQQKGYVRTKYGTKSELQDA

 100a  IGALHGKGVQAIADIVINHRCA
 919   IKALHAKGLKVMADWVPDQMYT
 116b  IGSLHRRNVQVYGD

Glycogen phosphorylase (EC 2.4.1.1)

1107   YMVYGAQNIKGSNYYFLANGAALRNTVYTD-AQGQNHYYGNDGKRYENGYQQFGNDSWRYFKNGVMALGLTTVDGHVQY
  37c  FTLVKNRNVATPRDYYFAHALTVRDHLVGRWIRTQQHYYEKDPKRI--YYLSLQFYMGRTLQNTMVNLALENACDEADY

FIG. 7. Alignment of predicted amino acid sequences of glucosyltransferase with regions of alpha-amylase from barley (a), alpha-amylase from Bacillus amyloliquefaciens (b), and glycogen phosphorylase from rabbit (c). The sequences were aligned by the PRTALN program of Wilbur and Lipman (33). Identical (:) and conserved ( ) amino acids are indicated. The numbers at the left indicate the position number of the first amino acid of each protein shown.


genes and proteins have been recently reported, e.g., the group A type 6 M protein (14) and the group G immunoglobulin G-binding protein (6, 11). The functional role of the repeating units is not clear, although the region containing them is essential both for glucosyltransferase activity and for binding of glucans.

It is of interest that the part of the glucosyltransferase protein showing homology with glycogen phosphorylase spans the region containing the first two A repeat units. This region of glycogen phosphorylase is thought to be involved in substrate binding or catalysis (21) but is distinct from the region (amino acids 401 to 443) thought to be involved in the storage site which binds heptamylose (10). The regions of homology with the alpha-amylase proteins are found further upstream, not far from the repeating units. The common feature of all three enzymes is the ability to bind to glucans, and it seems likely that the identified regions of glucosyltransferase homology are also involved in or essential for glucan binding.

The relationship of the gtfI gene to other known glucosyltransferase genes is of considerable interest, especially in view of the different restriction maps reported (1, 22, 24, 27a) and the evolutionary distance between S. mutans (G+C content, 36 to 38%) and S. sobrinus (G+C content, 44 to 46%) (4). In the accompanying paper, Shiroza et al. report a sequence analysis of the S. mutans gtfB gene, which specifies a glucosyltransferase that also produces insoluble glucans (30).

ACKNOWLEDGMENT

We thank David R. Lorenz for his many contributions to this work and Tricia Stalker for excellent technical assistance.

This research was supported by Public Health Service grant DE 08191 from the National Institutes of Health and by the Medical Research Council.

LITERATURE CITED

1. Aoki, H., T. Shiroza, M. Hayakawa, S. Sato, and H. K. Kuramitsu. 1986. Cloning of a Streptococcus mutans glucosyltransferase gene coding for insoluble glucan synthesis. Infect. Immun. 53:587-594.
2. Burne, R. A., B. Rubinfeld, W. H. Bowen, and R. E. Yasbin. 1986. Cloning and expression of a Streptococcus mutans glucosyltransferase gene in Bacillus subtilis. Gene 40:201-209.
3. Ciardi, J. 1983. Purification and properties of glucosyltransferases of Streptococcus mutans: a review, p. 51-64. In R. J. Doyle and J. E. Ciardi (ed.), Glucosyltransferases, glucans, sucrose and dental caries (a special supplement to Chemical Senses). Information Retrieval Limited, Washington, D.C.
4. Coykendall, A. L., and K. B. Gustafson. 1986. Taxonomy of Streptococcus mutans, p. 21-28. In S. Hamada, S. M. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
5. Decad, G. M., and H. Nikaido. 1976. Outer membrane of gram-negative bacteria. XII. Molecular-sieving function of cell wall. J. Bacteriol. 128:325-336.
6. Fahnestock, S. R., P. Alexander, J. Nagle, and D. Filpula. 1986. Gene for an immunoglobulin-binding protein from a group G streptococcus. J. Bacteriol. 167:870-880.
7. Ferretti, J. J., K. S. Gilmore, and P. Courvalin. 1986. Nucleotide sequence analysis of the gene specifying the bifunctional 6'-aminoglycoside acetyltransferase 2"-aminoglycoside phosphotransferase enzyme in Streptococcus faecalis and identification and cloning of gene regions specifying the two activities. J. Bacteriol. 167:631-638.
8. Gilmore, M. S., K. S. Gilmore, and W. Goebel. 1985. A new strategy for "ordered" DNA sequencing based on a novel method for the rapid purification of near milligram quantities of a cloned restriction fragment. Gene Anal. Tech. 2:108-114.
9. Gilpin, M. L., R. R. B. Russell, and P. Morrissey. 1985. Cloning and expression of two Streptococcus mutans glucosyltransferases in Escherichia coli K-12. Infect. Immun. 49:414-416.
10. Goldsmith, E., and R. J. Fletterick. 1983. Oligosaccharide conformation and protein saccharide interactions in solution. Pure Appl. Chem. 55:577-588.
11. Guss, B., M. Eliasson, A. Olsson, M. Uhlen, A.-K. Frej, H. Jornvall, J.-I. Flock, and M. Lindberg. 1986. Structure of the IgG-binding regions of streptococcal protein G. EMBO J. 5:1567-1575.
12. Hamada, S., and H. D. Slade. 1980. Biology, immunology, and cariogenicity of Streptococcus mutans. Microbiol. Rev. 44:331-384.
13. Hazelbauer, G. L., and S. Harayama. 1979. Mutants in transmission of chemotactic signals from two independent receptors of E. coli. Cell 16:617-625.
14. Hollingshead, S. K., V. F. Fischetti, and J. R. Scott. 1986. Complete nucleotide sequence of type 6 M protein of the group A streptococcus. J. Biol. Chem. 261:1677-1686.
15. Konat, G., H. Offner, and J. Mellah. 1984. Improved sensitivity for detection and quantitation of glycoproteins on polyacrylamide gels. Experientia 40:303-304.
16. Kuhn, S., H.-J. Fritz, and P. Starlinger. 1979. Close vicinity of IS1 integration sites in the leader sequence of the gal operon of E. coli. Mol. Gen. Genet. 167:235-241.
17. Malke, H., B. Roe, and J. J. Ferretti. 1984. Nucleotide sequence of the streptokinase gene from Streptococcus equisimilis H46A. Gene 34:337-362.
18. Mukasa, H. 1986. Properties of Streptococcus mutans glucosyltransferases, p. 121-132. In S. Hamada, S. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
19. Muller-Hill, B., L. Crapo, and W. Gilbert. 1968. Mutants that make more lac repressor. Proc. Natl. Acad. Sci. USA 59:1259-1264.
20. Oliver, D. 1985. Protein secretion in Escherichia coli. Annu. Rev. Microbiol. 39:615-648.
21. Palm, D., R. Goerl, and K. J. Burger. 1985. Evolution of catalytic and regulatory sites in phosphorylases. Nature (London) 313:500-503.
22. Pucci, M. J., K. R. Jones, and F. L. Macrina. 1987. Evidence for a duplicated DNA sequence associated with a glucosyltransferase gene in Streptococcus mutans, p. 205-208. In J. J. Ferretti and R. Curtiss III (ed.), Streptococcal genetics. American Society for Microbiology, Washington, D.C.
23. Pucci, M. J., and F. L. Macrina. 1986. Molecular organization and expression of the gtfA gene of Streptococcus mutans LM7. Infect. Immun. 54:77-84.
24. Robeson, J. P., R. G. Barletta, and R. Curtiss III. 1983. Expression of a Streptococcus mutans glucosyltransferase gene in Escherichia coli. J. Bacteriol. 155:211-221.
25. Russell, R. R. B., E. Abdulla, M. L. Gilpin, and K. Smith. 1986. Characterization of Streptococcus mutans surface antigens, p. 61-70. In S. Hamada, S. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
26. Russell, R. R. B., D. Coleman, and G. Dougan. 1985. Expression of a gene for glucan-binding protein from Streptococcus mutans in Escherichia coli. J. Gen. Microbiol. 131:295-299.
27. Russell, R. R. B., and M. L. Gilpin. 1987. Identification of virulence components of mutans streptococci, p. 201-204. In J. J. Ferretti and R. Curtiss III (ed.), Streptococcal genetics. American Society for Microbiology, Washington, D.C.
27a. Russell, R. R. B., M. L. Gilpin, H. Mukasa, and G. Dougan. 1987. Characterization of glucosyltransferase expressed from a Streptococcus sobrinus gene cloned in Escherichia coli. J. Gen. Microbiol. 133:935-944.
28. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
29. Segrest, J. P., and R. L. Jackson. 1972. Molecular weight determinations of glycoproteins by polyacrylamide gel electrophoresis in sodium dodecyl sulphate. Methods Enzymol. 28:54-63.
30. Shiroza, T., S. Ueda, and H. K. Kuramitsu. 1987. Sequence analysis of the gtfB gene from Streptococcus mutans. J. Bacteriol. 169:4263-4270.
31. von Heijne, G. 1983. Patterns of amino acids near signal sequence cleavage sites. Eur. J. Biochem. 133:17-21.
32. Weeks, C. R., and J. J. Ferretti. 1986. Nucleotide sequence of the type A streptococcal exotoxin (erythrogenic toxin) gene from Streptococcus pyogenes bacteriophage T12. Infect. Immun. 52:144-150.
33. Wilbur, W. J., and D. J. Lipman. 1983. Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80:726-730.
34. Witholt, B., M. Boekhout, M. Brock, J. Kingma, H. van Heerikhuizen, and L. de Leij. 1976. An efficient and reproducible procedure for the formation of spheroplasts from variously grown Escherichia coli. Anal. Biochem. 74:160-170.
35. Yanisch-Perron, C., J. Vieira, and J. Messing. 1985. Improved M13 phage cloning vectors and host strains: nucleotide sequences of M13 mp18 and pUC19 vectors. Gene 33:103-119.