Vol. 169, No. 9
JOURNAL OF BACTERIOLOGY, Sept. 1987, p. 4271-4278
0021-9193/87/094271-08$02.00/0 Copyright © 1987, American Society for Microbiology
Nucleotide Sequence of a Glucosyltransferase Gene from Streptococcus sobrinus MFe28

JOSEPH J. FERRETTI,1* MARTYN L. GILPIN,2 AND ROY R. B. RUSSELL2

Department of Microbiology and Immunology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma 73190,1 and Dental Research Unit, Royal College of Surgeons of England, Downe, Kent BR6 7JJ, United Kingdom2

Received 3 April 1987/Accepted 2 June 1987

The complete nucleotide sequence was determined for the Streptococcus sobrinus MFe28 gtfI gene, which encodes a glucosyltransferase that produces an insoluble glucan product. A single open reading frame encodes a mature glucosyltransferase protein of 1,559 amino acids (Mr, 172,983) and a signal peptide of 38 amino acids. In the C-terminal one-third of the protein there are six repeating units containing 35 amino acids of partial homology and two repeating units containing 48 amino acids of complete homology. The functional role of these repeating units remains to be determined, although truncated forms of glucosyltransferase containing only the first two repeating units of partial homology maintained glucosyltransferase activity and the ability to bind glucan. Regions of homology with alpha-amylase and glycogen phosphorylase were identified in the glucosyltransferase protein and may represent regions involved in functionally similar domains.
The glucosyltransferases (EC 2.4.1.5) produced by various species of oral streptococci are of considerable interest because of their production of extracellular glucans from sucrose. These glucans are thought to play a key role in the development of dental plaque because of their ability to adhere to smooth surfaces and mediate the aggregation of bacterial cells and food debris (12). It is known that a single strain can produce several distinct glucosyltransferases differing in electrophoretic, antigenic, or enzymatic properties, although some of this apparent variety may be due to the use of different oral streptococcal strains and different purification procedures and activity assays by different laboratories. The properties and characteristics of the glucosyltransferases of the mutans group streptococci have been reviewed by Ciardi (3) and Mukasa (18).

Recently, several glucosyltransferase genes from various strains of streptococci have been cloned by recombinant DNA techniques and have been shown to be expressed in Escherichia coli. Robeson et al. (24) have cloned a glucosyltransferase gene (gtfA) from Streptococcus mutans UAB90 (serotype c) and shown that it produces a protein with a molecular weight of 55,000. A similar gtfA gene has also been cloned by Pucci and Macrina (23) from S. mutans LM7 (serotype e) and by Burne et al. (2) from S. mutans GS-5 (serotype c). Aoki et al. (1) reported the cloning of a glucosyltransferase gene (gtfB) from S. mutans GS-5 that produces a protein with a molecular weight of about 150,000. Another glucosyltransferase gene, gtfC, which specifies a 150,000-molecular-weight polypeptide, has been obtained from S. mutans LM7 by Pucci et al. (22). Finally, Gilpin et al. (9) have cloned two glucosyltransferase genes from Streptococcus sobrinus MFe28 (serotype h): gtfS, which encodes a glucosyltransferase that synthesizes a water-soluble glucan, and gtfI, which encodes a glucosyltransferase that synthesizes a water-insoluble glucan.
The availability of these cloned genes allows further characterization of both the genes and gene products, and in this communication, we report the complete nucleotide sequence of the gtfI gene from S. sobrinus MFe28.
MATERIALS AND METHODS

Bacteria and media. E. coli MAF1 (containing plasmid pMLG1) was the initial source of the S. sobrinus MFe28 gtfI gene (27a). E. coli MAF5 contains plasmid pMLG5, which has the same 5.0-kilobase (kb) fragment as pMLG1 and an additional 0.5-kb fragment from the bacteriophage lambda recombinant in which the gtfI insert was first cloned (9). E. coli JM109 was used as the recipient for transfection experiments with M13 bacteriophage vectors (35) and was routinely grown in 2x YT broth (19). Soft agar overlays consisted of 2x YT broth supplemented with final concentrations of 0.75% agar, 0.33 mM isopropyl-β-D-thiogalactopyranoside, and 0.02% 5-bromo-4-chloro-3-indolyl-β-galactoside for differentiating recombinant and nonrecombinant phages. For the titration of M13 recombinants carrying all or part of gtfI, phages were plated on E. coli JM109 on B-broth agar (26) to which 1% sucrose was added for detection of enzyme activity.

Enzymes and chemicals. Restriction enzymes were purchased from Bethesda Research Laboratories, Inc., Gaithersburg, Md., and were used in accordance with the specifications of the manufacturer. T4 DNA ligase was purchased from Amersham Corp., Arlington Heights, Ill., or Bethesda Research Laboratories. The Klenow fragment of DNA polymerase and the M13 15-base primer were purchased from Bethesda Research Laboratories. The deoxy- and dideoxynucleotide triphosphates were purchased from P-L Biochemicals, Inc., and [α-32P]dATP was purchased from New England Nuclear Corp., Boston, Mass. Isopropyl-β-D-thiogalactopyranoside and 5-bromo-4-chloro-3-indolyl-β-galactoside were purchased from Sigma Chemical Co., St. Louis, Mo.

Subcloning of the gtfI gene and nucleotide sequencing. The gtfI gene was obtained for subcloning experiments by digestion of pMLG1 with HindIII followed by electrophoresis and isolation of the 5.0-kb fragment from 0.8% type VII agarose gels as previously described (16).
The fragment was unidirectionally degraded with Bal 31 by a modification of the procedure of Gilmore et al. (8), and all subcloning into M13 phages mp18 and mp19 was done as described by Ferretti et al. (7). A 0.5-kb HindIII fragment was subsequently isolated from pMLG5, cloned into M13 phages, and sequenced. This fragment contained the remainder of the glucosyltransferase sequence not present in the pMLG1 HindIII fragment. Sequencing reactions were performed by the Sanger dideoxy chain termination method (28) by using the procedures described by Amersham. All sequences were confirmed from at least two overlapping clones, and the entire gene sequence was determined on both strands. The sequence information was analyzed by the James M. Pustell DNA/protein sequencing program obtained from International Biotechnologies, Inc., New Haven, Conn.

Gel electrophoresis. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and detection of glucosyltransferase activity by incubation of gels with sucrose in the presence of Triton X-100 were done as described previously (9), but the sensitivity of the method was enhanced by the use of a periodic acid-Schiff reagent procedure modified from published methods (15, 29). After incubation in sucrose (generally for 40 h at 37°C), the gels were fixed for 30 min in 75% ethanol and treated on a shaker for 30 min with 0.7% periodic acid in 5% acetic acid. They were then shaken for 60 min in several changes of 0.2% sodium metabisulfite in 5% acetic acid and placed in Schiff reagent (Sigma) for several hours. Finally, the gels were washed extensively in 45% methanol-45% acetic acid-10% water. All procedures were carried out at room temperature.

Purification of glucosyltransferase and glucan-binding peptides. Glucosyltransferase was purified from cells of E. coli MAF5 (which carries plasmid pMLG5) by subjecting the bacteria to ultrasonic disruption and using a single-step affinity chromatography procedure: the bacterial extract was passed through a column containing Sepharose 1000 (Pharmacia) and mutan; after the column was washed with buffer, bound glucosyltransferase was eluted with 5 M guanidine hydrochloride (26). The same procedure was used for detection of glucan-binding peptides; i.e., disrupted cell extracts and, for phage M13-infected cultures, the culture supernatant were passed through the affinity column. Eluted peptides were analyzed by SDS-PAGE, and derivation from glucosyltransferase was confirmed by Western blotting (immunoblotting) with antiserum against purified glucosyltransferase (27a).

N-terminal sequence analysis. N-terminal sequence analysis of purified glucosyltransferase was done with an Applied Biosystems 470A protein sequencer with an on-line 120A PTH analyzer in accordance with the instructions of the manufacturer.

* Corresponding author.

FIG. 1. SDS-PAGE of glucosyltransferase activity in S. sobrinus MFe28, E. coli MAF4 (carrying pMLG5), and E. coli MAF1 (carrying pMLG1). The right lane contains the following protein standards: RNA polymerase β' subunit (Mr, 165,000), RNA polymerase β subunit (155,000), β-galactosidase (116,000), phosphorylase b (97,400), albumin (66,000), and ovalbumin (45,000).
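The statement that the entire gene sequence was determined on both strands can be illustrated with a minimal sketch. It assumes two hypothetical reads (not taken from the gtfI sequence), one from each strand; the function names are illustrative only. A read from the complementary strand should equal the forward read once it is reverse-complemented.

```python
# Minimal sketch: check that a read from the complementary strand agrees
# with a forward-strand read. The reads below are hypothetical examples,
# not data from the gtfI sequence.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strands_agree(forward_read: str, reverse_read: str) -> bool:
    """True if the reverse read, once reverse-complemented, equals the forward read."""
    return reverse_complement(reverse_read).upper() == forward_read.upper()


forward = "ATGGAAAAGAATGAACGT"          # hypothetical forward-strand read
reverse = reverse_complement(forward)   # what the opposite strand reads 5'->3'
print(strands_agree(forward, reverse))
```

In practice the two reads would only overlap in part, so a real check would first align them; the sketch deliberately ignores that step.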
RESULTS

Cloning of complete gtfI gene. The gtfI gene from S. sobrinus MFe28 was originally cloned into the BamHI sites of bacteriophage lambda L47.1 and was located in a 7.6-kb DNA insert. Subsequently, mapping experiments showed that HindIII cleaved the insert at three points and the lambda DNA at two points outside the insert to give four fragments of 5.0, 2.6, 2.5, and 0.5 kb which carried S. sobrinus DNA. Plasmid pMLG1, from which a functional glucosyltransferase is expressed in E. coli, carries the 5.0-kb fragment (27a). At an early stage of the nucleotide-sequencing program, it became clear that pMLG1 contained a long open reading frame with no termination codon. These results indicated that the entire gtfI gene was not present in pMLG1, and so a further collection of derivatives of pBR322 carrying HindIII fragments from the bacteriophage lambda recombinants was examined. One of these, pMLG5, encoded a glucosyltransferase of 173 kilodaltons (Fig. 1) and was found to have both the 5.0- and 0.5-kb fragments. Restriction mapping of pMLG5 and subsequent nucleotide sequencing confirmed that the 0.5-kb fragment carried the information for the C-terminal region of glucosyltransferase. A partial restriction site map of the 5.5-kb insert containing the gtfI gene is shown in Fig. 2.

Nucleotide sequence. The complete nucleotide sequence of the 4,995-base-pair (bp) fragment carrying the gtfI gene was determined in both orientations and is shown in Fig. 3. Previous evidence had indicated that 0.5 kb of the insert in pMLG1 and pMLG5 was derived from the bacteriophage lambda vector (27a), and this was confirmed by the finding that the first 494 bp of the 5.5-kb fragment showed 100% homology with the corresponding part of the lambda sequence (lambda sequence not shown). An open reading frame containing 4,791 bp codes for the glucosyltransferase protein.
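The open reading frame length can be cross-checked against the coordinates and residue counts reported in this paper (ATG at position 160, TAA at position 4951, a 38-residue signal peptide). The sketch below is only an arithmetic check; the constant and function names are illustrative.

```python
# Arithmetic check of the reported open reading frame, using the 1-based
# nucleotide positions given in the text. Names are illustrative only.

ORF_START = 160            # first base of the ATG initiation codon
STOP_CODON_START = 4951    # first base of the TAA termination codon
SIGNAL_PEPTIDE_LEN = 38    # residues removed on secretion


def codon_count(start: int, stop_codon_start: int) -> int:
    """Number of sense codons between the start codon and the stop codon."""
    span = stop_codon_start - start
    if span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


orf_bp = STOP_CODON_START - ORF_START
precursor_len = codon_count(ORF_START, STOP_CODON_START)
mature_len = precursor_len - SIGNAL_PEPTIDE_LEN
print(orf_bp, precursor_len, mature_len)
```

The three values correspond to the 4,791-bp reading frame, the 1,597-residue precursor, and the 1,559-residue mature protein quoted in the text.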
The deduced amino acid sequence, which is coded starting at the ATG codon beginning at position 160 and extending to the termination codon TAA at position 4951, contains 1,597 amino acids and has a molecular weight of 177,100. A putative ribosome-binding site sequence (AGGAGGA) is located nine nucleotides upstream from the translation initiation codon. Further upstream is a probable -35 region (TTGACG) separated by 18 bp from a proposed -10 region (TTAAAA).

Amino acid composition. The deduced amino acid composition indicated a highly hydrophilic protein, with a hydrophobic N-terminal region. This region displayed the characteristics expected of a signal peptide, i.e., a basic N-terminal region followed by a central hydrophobic region and a more polar C-terminal region. The residues surrounding amino acid 38 conform with the "-3, -1" rule proposed by von Heijne (31) for amino acids found at a cleavage site. To
FIG. 2. Partial restriction map of the 4,995-bp fragment containing the gtfI gene and the 494-bp fragment of bacteriophage lambda (hatched) carried by plasmid pMLG5. The 5' end of the gene is at the left, and the 3' end is at the right.
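The HindIII mapping described in the Results can be mimicked in silico once a sequence is available. The sketch below assumes a complete digest of a linear molecule and uses a short hypothetical toy sequence, not the gtfI insert; HindIII recognizes AAGCTT and cuts after the first A.

```python
import re

# In-silico complete digest of a linear DNA molecule. The toy sequence is
# hypothetical and only illustrates how fragment sizes follow from site positions.

HINDIII_SITE = "AAGCTT"   # HindIII recognition sequence
CUT_OFFSET = 1            # the enzyme cuts after the first A (A^AGCTT)


def digest(seq: str, site: str = HINDIII_SITE, cut_offset: int = CUT_OFFSET) -> list:
    """Return fragment lengths, 5' to 3', after cutting at every site."""
    seq = seq.upper()
    # A lookahead is used so that overlapping sites would also be found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "GGATCC" + "AAGCTT" + "ACGT" * 5 + "AAGCTT" + "TTTT"
print(digest(toy))   # fragment lengths sum to len(toy)
```

Two internal sites give three fragments from a linear molecule, in the same way that the sites mapped in the lambda recombinant gave the HindIII fragments listed in the text.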
FIG. 3. Nucleotide sequence of the gtfI gene and flanking regions. Numbering begins at the 5' end of the sequence. The deduced amino acid sequence of glucosyltransferase is given below the nucleotide sequence; an arrow designates the cleavage site for the removal of the signal peptide. Putative promoter and ribosome-binding site sequences are underlined. The starts of repeat regions A1 to A6, B1, and B2 are marked.
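One way to gauge the putative promoter elements reported for gtfI (-35 TTGACG, -10 TTAAAA, 18-bp spacer) is to count matches against a consensus. The sketch below assumes the widely used E. coli sigma-70 consensus hexamers (TTGACA and TATAAT); that choice of reference is an assumption made here, not something stated in the paper.

```python
# Count positional matches between the reported hexamers and a consensus.
# The consensus hexamers are the commonly cited E. coli sigma-70 values,
# used here as an assumed reference.

CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def matches(hexamer: str, consensus: str) -> int:
    """Number of positions at which two equal-length sequences agree."""
    if len(hexamer) != len(consensus):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus.upper()))


reported_35 = "TTGACG"   # probable -35 region from the text
reported_10 = "TTAAAA"   # proposed -10 region from the text

print(matches(reported_35, CONSENSUS_35), "of 6 at -35")
print(matches(reported_10, CONSENSUS_10), "of 6 at -10")
```

Under this assumed consensus the -35 element is the closer match of the two.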
investigate whether the gtfI gene product was indeed cleaved at this site, the enzyme expressed in E. coli MAF5 was purified by affinity chromatography and subjected to N-terminal amino acid analysis. The first 16 amino acids were identified as Asp-Thr-Glu-Thr-Val-Ser-Glu-Asp-Ser-Asn-Gln-Ala-Val-Leu-Thr-Ala; this sequence is identical to that of the deduced amino acid sequence directly following the postulated cleavage site. Thus, the signal peptide is 38 amino acids long and contains regions similar to those found in most secretory signal sequences (20). The mature glucosyltransferase contains 1,559 amino acids and has a molecular weight of 172,983. The deduced amino acid composition of glucosyltransferase with and without the signal peptide is presented in Table 1.

Amino acid sequence homology. A series of repeating units is located in the C-terminal one-third of the glucosyltransferase molecule (Fig. 4). One of the repeating units, designated A, is 35 amino acids long and is present six times. Although these repeats are not completely identical, repeating unit A4 was found to have the greatest homology with
TABLE 1. Amino acid composition of glucosyltransferase deduced from the nucleotide sequence of gtfI

                          No. of residues(a)
Amino acid        With signal peptide    Without signal peptide
Alanine                   143                    136
Arginine                   43                     41
Asparagine                109                    108
Aspartic acid             137                    137
Glutamic acid              65                     63
Glutamine                  93                     93
Glycine                   135                    134
Histidine                  19                     18
Isoleucine                 50                     49
Leucine                    92                     90
Lysine                    122                    117
Methionine                 27                     24
Phenylalanine              64                     63
Proline                    30                     30
Serine                    106                    101
Threonine                 124                    122
Tryptophan                 25                     24
Tyrosine                   98                     98
Valine                    115                    111
Total                   1,597                  1,559

(a) Mr with signal peptide, 177,100; Mr without signal peptide, 172,983.
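The molecular weights in the footnote to Table 1 can be approximated from the residue counts. The sketch below assumes standard average residue masses (rounded to two decimals) plus one water per chain, and assumes that the counts in Table 1 follow the order in which the amino acids are listed; the mass table used by the authors is not stated, so small differences from the reported values are expected.

```python
# Approximate Mr from amino acid composition. Average residue masses (Da)
# are standard textbook values, rounded; they are an assumption here and
# need not be the table the authors used.

RESIDUE_MASS = {
    "Ala": 71.08, "Arg": 156.19, "Asn": 114.10, "Asp": 115.09, "Glu": 129.12,
    "Gln": 128.13, "Gly": 57.05, "His": 137.14, "Ile": 113.16, "Leu": 113.16,
    "Lys": 128.17, "Met": 131.19, "Phe": 147.18, "Pro": 97.12, "Ser": 87.08,
    "Thr": 101.11, "Trp": 186.21, "Tyr": 163.18, "Val": 99.13,
}
WATER = 18.02  # one water per chain (free N and C termini)

# Counts in the row order of Table 1 (same order as the dict keys above).
WITH_SIGNAL = [143, 43, 109, 137, 65, 93, 135, 19, 50, 92,
               122, 27, 64, 30, 106, 124, 25, 98, 115]
WITHOUT_SIGNAL = [136, 41, 108, 137, 63, 93, 134, 18, 49, 90,
                  117, 24, 63, 30, 101, 122, 24, 98, 111]


def molecular_weight(counts: list) -> float:
    """Sum of residue masses weighted by count, plus one water."""
    return sum(RESIDUE_MASS[aa] * n for aa, n in zip(RESIDUE_MASS, counts)) + WATER


for label, counts in (("precursor", WITH_SIGNAL), ("mature", WITHOUT_SIGNAL)):
    print(label, sum(counts), round(molecular_weight(counts)))
```

The residue totals reproduce the 1,597 and 1,559 given in the table, and the computed masses should land close to the reported Mr values.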
each of the other five repeating units. Based on a comparison with A4, the homologies were as follows: A1, 65%; A2, 38%; A3, 72%; A5, 65%; and A6, 50%. The gene segments corresponding to these regions are also highly conserved and, except for repeat A2, which contained only 38% identical bases, the repeats contained 65 to 72% identical bases. Repeating unit B is present twice and contains 48 amino acids, all of which are identical. The corresponding gene regions contain a stretch of 132 identical bases.

Functional regions of glucosyltransferase. Since functional glucosyltransferase was expressed from both pMLG1 and pMLG5, the terminal 0.5 kb of the gtfI gene is clearly not essential for activity. However, we have previously reported that a deletion extending from the end of the gene to the SacI site at position 3085 resulted in expression of a truncated and enzymatically inactive peptide (27, 27a). To further define the length of the gene sequence required for expression of a functional glucosyltransferase, a series of derivatives of phage M13 containing various lengths of gtfI were examined for expression of peptides which had enzyme activity and the ability to bind to glucan (Fig. 5). The M13 derivatives R7-20, R5-2, and R7-3 all formed polymer when plated with E. coli JM109 on sucrose-containing medium (Fig. 6). The two shortest M13 derivatives tested, R7-34 and R3-6, did not form any polymer either on plates or in a tube assay for glucosyltransferase using 14C-labeled sucrose. Nor did they release reducing sugar from sucrose, whereas the longer derivatives did encode enzyme activity for the release of reducing sugars, as indicated by the fact that when they were plated on E. coli JM109 on sucrose indicator plates (Russell et al., in press), acid was produced by E. coli. The same pattern of function was found when glucan-binding ability was examined; derivatives R7-20, R5-2, and R7-3 all expressed peptides which were retained by a mutan-Sepharose column, whereas R7-34 and R3-6 did not. The results presented above indicated that the genetic information essential for enzyme activity and glucan-binding function was located in the C-terminal one-third of the gene. An in-frame gene fusion was therefore made between pUC8 and the ScaI site of gtfI located at position 3290. The
SEQUENCE OF S. SOBRINUS gtfl
4275
A
1100  YYFGQDGYMVTGAQNIKGSNYYFLANGAALRNTVY
1163  WRYFKNGVMALGLTTVDGHVQYFDKDGVQAKDKII
1228  YYLGKDGVAVTGAQTVGKQHLYFEANGQQVKGDFV
1293  FYLGKDGAAVTGAQTIKGQKLYFKANGQQVKGDIV
1406  WVYVKSGKVLTGAQTIGNQRVYFKDNGHQVKGQLV
1519  WLYVKDGKVLTGLQTVGNQKVYFDKNGIQAKGKAV
B
1352  VNGKTYYFGSDGTAQTQANPKGQTFKDGSGVLRFYNLEGQYVSGSGWY
1464  VNGKTYYFGSDGTAQTQANPKGQTFKDGSGVLRFYNLEGQWYVSGSGWY
FIG. 4. Amino acid sequences of A and B repeating units found in the C-terminal region of the glucosyltransferase protein. The numbers at the left indicate the position number of the first amino acid of each repeating unit.
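The comparison of the A repeating units against A4 can be illustrated with a position-by-position identity count over the sequences in Fig. 4. This is only a minimal sketch: it assumes an ungapped alignment and counts strict identities, and the names `A_REPEATS`, `percent_identity`, and `identities_to_reference` are ours, not the authors'. The scoring behind the percentages quoted in the text is not stated in this section, so the values printed here need not reproduce them.

```python
# Repeat sequences transcribed from Fig. 4; the dictionary keys are our own labels.
A_REPEATS = {
    "A1": "YYFGQDGYMVTGAQNIKGSNYYFLANGAALRNTVY",
    "A2": "WRYFKNGVMALGLTTVDGHVQYFDKDGVQAKDKII",
    "A3": "YYLGKDGVAVTGAQTVGKQHLYFEANGQQVKGDFV",
    "A4": "FYLGKDGAAVTGAQTIKGQKLYFKANGQQVKGDIV",
    "A5": "WVYVKSGKVLTGAQTIGNQRVYFKDNGHQVKGQLV",
    "A6": "WLYVKDGKVLTGLQTVGNQKVYFDKNGIQAKGKAV",
}


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions carrying the same residue in two
    equal-length sequences compared without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identities_to_reference(repeats: dict, reference: str = "A4") -> dict:
    """Score every repeat except the reference against the reference repeat."""
    ref_seq = repeats[reference]
    return {
        name: percent_identity(seq, ref_seq)
        for name, seq in repeats.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, value in identities_to_reference(A_REPEATS).items():
        print(f"{name} vs A4: {value:.1f}% identical residues")
```

A scheme that also credits conservative substitutions, or that allows gaps, would give different figures; the strict count is used here only because it needs no substitution matrix.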
resultant recombinant plasmid pSF86 expressed a 65-kilodalton peptide which reacted with antiserum to glucosyltransferase, indicating that the entire C-terminal one-third of the enzyme was being made. This peptide had no detectable glucosyltransferase activity but did bind to the affinity column.
Protein homology. Comparison of the deduced amino acid sequence of glucosyltransferase with other sequenced proteins revealed partial homologies with three proteins: alpha-amylase from barley, alpha-amylase from Bacillus amyloliquefaciens, and glycogen phosphorylase from rabbits (Fig. 7). The homologies of the glucosyltransferase with the two alpha-amylases overlap in the same general region, suggesting a region of functional homology for the three proteins.
DISCUSSION
Nucleotide sequence analysis of the 5-kb fragment of S. sobrinus MFe28 showed the presence of a single open reading frame which coded for the glucosyltransferase protein. The deduced amino acid sequence of the mature protein had a molecular weight of 172,983, which agreed closely with the value derived by SDS-PAGE. This value varies from
Repeat  Base  Amino acid
A1      3457  1100
A2      3646  1163
A3      3841  1228
A4      4036  1293
B1      4213  1352
A5      4375  1406
B2      4552  1464
A6      4713  1519
[Map of the constructs pMLG5, pMLG1, R7-20, R5-2, R7-3, R7-34, R3-6, and pSF86, each scored + or - for GTF and GBP; graphic not reproduced.]
FIG. 5. The 3' end of the gtfI gene showing positions of repeat regions and termination points of phage M13 derivatives and pSF86, a pUC8 vector carrying the terminal ScaI-HindIII fragment of gtfI. Adjacent to each fragment is shown the ability of its product to exhibit glucosyltransferase (GTF) activity or to function as a glucan-binding protein (GBP).
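Because Fig. 5 gives both a base number and an amino acid number for the start of each repeat, nucleotide coordinates such as the SacI (3085) and ScaI (3290) sites can be related to residue numbers by simple frame arithmetic. The sketch below assumes that the reading frame is uninterrupted between the chosen anchor (repeat A1, base 3457 and amino acid 1100) and the queried position, and that the base and amino acid values pair up in the order listed in the figure. The function and constant names are hypothetical.

```python
# Anchor read from Fig. 5: repeat A1 starts at base 3457 and at amino acid 1100.
ANCHOR_BASE = 3457
ANCHOR_RESIDUE = 1100

# (base, amino acid) pairs for the repeat starts as listed in Fig. 5.
FIG5_STARTS = {
    "A1": (3457, 1100),
    "A2": (3646, 1163),
    "A3": (3841, 1228),
    "A4": (4036, 1293),
    "B1": (4213, 1352),
    "A5": (4375, 1406),
    "B2": (4552, 1464),
    "A6": (4713, 1519),
}


def residue_to_base(residue: int) -> int:
    """First base of the codon that encodes the given amino acid number."""
    return ANCHOR_BASE + 3 * (residue - ANCHOR_RESIDUE)


def base_to_residue(base: int) -> int:
    """Amino acid number whose codon contains the given base."""
    # Floor division keeps the mapping correct for bases upstream of the anchor.
    return ANCHOR_RESIDUE + (base - ANCHOR_BASE) // 3


if __name__ == "__main__":
    # Offset of each listed base from the position predicted by the A1 anchor.
    for name, (base, residue) in FIG5_STARTS.items():
        offset = base - residue_to_base(residue)
        print(f"{name}: listed base {base}, offset from anchor frame {offset:+d}")
    print("SacI site (3085) falls in the codon for residue", base_to_residue(3085))
    print("ScaI site (3290) falls in the codon for residue", base_to_residue(3290))
```

Printing the offset for every repeat is a cheap consistency check on the figure values: an entry whose offset is not zero may reflect a transcription slip rather than a real frame shift.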
FIG. 6. Accumulation of glucan above points where E. coli JM109 was infected with phage M13 derivative R7-20 on sucrose-containing medium.
previous estimates for glucosyltransferases that produce insoluble glucans (3, 18), although proteolytic degradation and problems associated with molecular weight determinations by gel analysis could easily account for the differences. The deduced amino acid composition of the glucosyltransferase indicates that it is a highly hydrophilic protein, containing 11.5% basic amino acids, 12.6% acidic amino acids, and 41.6% polar amino acids. The restriction map generated from nucleotide sequence analysis is in agreement with previous maps established for this fragment and also supports previous speculations concerning the location of probable transcription and translation initiation sites (27, 27a). These sites are similar to transcription and translation sites reported for other streptococcal genes (6, 7, 14, 17). Downstream of the coding region, a single termination codon is present, but insufficient sequence
is available to comment further about sequences involved in transcription termination. The presence of a 38-amino-acid signal peptide was confirmed by N-terminal amino acid analysis of the purified glucosyltransferase protein, in which the sequence of the first 16 amino acids was identical to the deduced sequence. This signal peptide has properties similar to those of other signal peptides (20), i.e., a positively charged N-terminal region followed by a string of 23 hydrophobic amino acids and a more polar C-terminal region. The cleavage site between Ala and Asp and the surrounding residues are in accordance with the -3, -1 rule proposed by von Heijne (31). The 38-amino-acid signal peptide of glucosyltransferase is in the general size range of other streptococcal signal peptides (6, 14, 17, 32) and that reported for other gram-positive organisms (20). It is apparent that E. coli is capable of recognizing the gtfI gene product and cleaving it at the site expected for removal of a secretion signal peptide. Other evidence suggests that the enzyme passes through the cytoplasmic membrane. For example, E. coli strains expressing glucosyltransferase can metabolize sucrose (27a), although sucrose can pass through only the outer membrane and not the cytoplasmic membrane (5). Glucosyltransferase would thus be expected to accumulate in the periplasmic space, but we have been unable, using conventional osmotic shock methods for release of periplasmic proteins (13, 34), to obtain release of the protein. In view of the observation that much of the C-terminal region of glucosyltransferase is not essential for function, it is tempting to speculate that the commonly observed heterogeneity of molecular sizes in enzyme preparations (18, 25) is due to sequential degradation by proteolytic action on this end of the molecule. As yet, however, there is insufficient evidence to confirm this idea.
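The size and composition figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the paper (1,559 mature residues, a 38-residue signal peptide, Mr 172,983, and the three composition percentages). It assumes that the basic, acidic, and polar classes do not overlap and that the open reading frame ends with a single termination codon; the variable names are ours.

```python
# Values taken from the text.
MATURE_RESIDUES = 1559      # mature glucosyltransferase
SIGNAL_RESIDUES = 38        # signal peptide removed on secretion
MATURE_MR = 172_983         # deduced molecular weight of the mature protein
BASIC_PCT, ACIDIC_PCT, POLAR_PCT = 11.5, 12.6, 41.6

# Length of the primary translation product and of its coding sequence.
precursor_residues = MATURE_RESIDUES + SIGNAL_RESIDUES
coding_bases = 3 * precursor_residues + 3   # three bases per codon plus one stop codon

# Mean mass contributed by each residue of the mature protein.
mean_residue_mass = MATURE_MR / MATURE_RESIDUES

# Share of residues that are charged or polar (assumes the classes are disjoint).
charged_pct = BASIC_PCT + ACIDIC_PCT
hydrophilic_pct = charged_pct + POLAR_PCT

if __name__ == "__main__":
    print(f"precursor length: {precursor_residues} residues")
    print(f"coding sequence:  {coding_bases} bases including the stop codon")
    print(f"mean residue mass: {mean_residue_mass:.2f}")
    print(f"charged residues:  {charged_pct:.1f}%")
    print(f"charged or polar:  {hydrophilic_pct:.1f}%")
```

The coding length obtained this way fits within the 5-kb fragment that was sequenced, and roughly two-thirds of the residues fall in the charged or polar classes, in line with the description of the protein as highly hydrophilic.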
The C-terminal region of the glucosyltransferase protein contains two sets of repeated sequences, the A repeating unit present six times and the B repeating unit present twice. The A repeating units exhibit some variability, whereas the B repeating units are completely identical. The manner or sequence in which these duplications and changes occurred is not obvious. However, duplications in other streptococcal
Alpha-amylase (EC 3.2.1.1)
69a   HSVIQNGYAFTDRYDID---ASKYGNAAELKSL
840   FQSFATKEEEYTNVVIANNVDKFVSWGITDFEMAPQYVSSTDGQFLDSVIQNGYAFTDRYDLGMSKANKYGTADQLVKA
40b   FEWYTPNDGQHWK-RLQNDAEHLSDIGITAVWIPPAYKGLSQSD-NGYGPYDLYDLGE-FQQKGYVRTKYGTKSELQDA
100a  IGALHGKGVQAIADIVINHRCA
919   IKALHAKGLKVMADWVPDQMYT
116b  IGSLHRRNVQVYGD
Glycogen phosphorylase (EC 2.4.1.1)
1107  YMVYGAQNIKGSNYYFLANGAALRNTVYTD-AQGQNHYYGNDGKRYENGYQQFGNDSWRYFKNGVMALGLTTVDGHVQY
37c   FTLVKNRNVATPRDYYFAHALTVRDHLVGRWIRTQQHYYEKDPKRI--YYLSLQFYMGRTLQNTMVNLALENACDEADY
[Identity and conservation marks between the aligned rows are not reproduced.]
FIG. 7. Alignment of predicted amino acid sequences of glucosyltransferase with regions of alpha-amylase from barley (a), alpha-amylase from Bacillus amyloliquefaciens (b), and glycogen phosphorylase from rabbit (c). The sequences were aligned by the PRTALN program of Wilbur and Lipman (33). Identical (:) and conserved ( ) amino acids are indicated. The numbers at the left indicate the position number of the first amino acid of each protein shown.
genes and proteins have been recently reported, e.g., the group A type 6 M protein (14) and the group G immunoglobulin G-binding protein (6, 11). The functional role of the repeating units is not clear, although the region containing them is essential both for glucosyltransferase activity and for binding of glucans. It is of interest that the part of the glucosyltransferase protein showing homology with glycogen phosphorylase spans the region containing the first two A repeat units. This region of glycogen phosphorylase is thought to be involved in substrate binding or catalysis (21) but is distinct from the region (amino acids 401 to 443) thought to be involved in the storage site which binds heptamylose (10). The regions of homology with the alpha-amylase proteins are found further upstream, not too distant from the repeating unit. The common feature of all three enzymes is the ability to bind to glucans, and it seems likely that the identified regions of glucosyltransferase homology are also involved in or essential for glucan binding. The relationship of the gtfI gene to other known glucosyltransferase genes is of considerable interest, especially in view of the different restriction maps reported (1, 22, 24, 27a) and the evolutionary distance between S. mutans (G+C content, 36 to 38%) and S. sobrinus (G+C content, 44 to 46%) (4). In the accompanying paper, Shiroza et al. report a sequence analysis of the S. mutans gtfB gene, which specifies a glucosyltransferase that also produces insoluble glucans (30).
ACKNOWLEDGMENT
We thank David R. Lorenz for his many contributions to this work and Tricia Stalker for excellent technical assistance. This research was supported by Public Health Service grant DE 08191 from the National Institutes of Health and by the Medical Research Council.
LITERATURE CITED
1. Aoki, H., T. Shiroza, M. Hayakawa, S. Sato, and H. K. Kuramitsu. 1986. Cloning of a Streptococcus mutans glucosyltransferase gene coding for insoluble glucan synthesis. Infect. Immun. 53:587-594.
2. Burne, R. A., B. Rubinfeld, W. H. Bowen, and R. E. Yasbin. 1986. Cloning and expression of a Streptococcus mutans glucosyltransferase gene in Bacillus subtilis. Gene 40:201-209.
3. Ciardi, J. 1983. Purification and properties of glucosyltransferases of Streptococcus mutans: a review, p. 51-64. In R. J. Doyle and J. E. Ciardi (ed.), Glucosyltransferases, glucans, sucrose and dental caries (a special supplement to Chemical Senses). Information Retrieval Limited, Washington, D.C.
4. Coykendall, A. L., and K. B. Gustafson. 1986. Taxonomy of Streptococcus mutans, p. 21-28. In S. Hamada, S. M. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
5. Decad, G. M., and H. Nikaido. 1976. Outer membrane of gram-negative bacteria. XII. Molecular-sieving function of cell wall. J. Bacteriol. 128:325-336.
6. Fahnestock, S. R., P. Alexander, J. Nagle, and D. Filpula. 1986. Gene for an immunoglobulin-binding protein from a group G streptococcus. J. Bacteriol. 167:870-880.
7. Ferretti, J. J., K. S. Gilmore, and P. Courvalin. 1986. Nucleotide sequence analysis of the gene specifying the bifunctional 6'-aminoglycoside acetyltransferase 2"-aminoglycoside phosphotransferase enzyme in Streptococcus faecalis and identification and cloning of gene regions specifying the two activities. J. Bacteriol. 167:631-638.
8. Gilmore, M. S., K. S. Gilmore, and W. Goebel. 1985. A new strategy for "ordered" DNA sequencing based on a novel method for the rapid purification of near milligram quantities of a cloned restriction fragment. Gene Anal. Tech. 2:108-114.
9. Gilpin, M. L., R. R. B. Russell, and P. Morrissey. 1985. Cloning and expression of two Streptococcus mutans glucosyltransferases in Escherichia coli K-12. Infect. Immun. 49:414-416.
10. Goldsmith, E., and R. J. Fletterick. 1983. Oligosaccharide conformation and protein saccharide interactions in solution. Pure Appl. Chem. 55:577-588.
11. Guss, B., M. Eliasson, A. Olsson, M. Uhlen, A.-K. Frej, H. Jornvall, J.-I. Flock, and M. Lindberg. 1986. Structure of the IgG-binding regions of streptococcal protein G. EMBO J. 5:1567-1575.
12. Hamada, S., and H. D. Slade. 1980. Biology, immunology, and cariogenicity of Streptococcus mutans. Microbiol. Rev. 44:331-384.
13. Hazelbauer, G. L., and S. Harayama. 1979. Mutants in transmission of chemotactic signals from two independent receptors of E. coli. Cell 16:617-625.
14. Hollingshead, S. K., V. F. Fischetti, and J. R. Scott. 1986. Complete nucleotide sequence of type 6 M protein of the group A streptococcus. J. Biol. Chem. 261:1677-1686.
15. Konat, G., H. Offner, and J. Mellah. 1984. Improved sensitivity for detection and quantitation of glycoproteins on polyacrylamide gels. Experientia 40:303-304.
16. Kuhn, S., H.-J. Fritz, and P. Starlinger. 1979. Close vicinity of IS1 integration sites in the leader sequence of the gal operon of E. coli. Mol. Gen. Genet. 167:235-241.
17. Malke, H., B. Roe, and J. J. Ferretti. 1984. Nucleotide sequence of the streptokinase gene from Streptococcus equisimilis H46A. Gene 34:337-362.
18. Mukasa, H. 1986. Properties of Streptococcus mutans glucosyltransferases, p. 121-132. In S. Hamada, S. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
19. Muller-Hill, B., L. Crapo, and W. Gilbert. 1968. Mutants that make more lac repressor. Proc. Natl. Acad. Sci. USA 59:1259-1264.
20. Oliver, D. 1985. Protein secretion in Escherichia coli. Annu. Rev. Microbiol. 39:615-648.
21. Palm, D., R. Goerl, and K. J. Burger. 1985. Evolution of catalytic and regulatory sites in phosphorylases. Nature (London) 313:500-503.
22. Pucci, M. J., K. R. Jones, and F. L. Macrina. 1987. Evidence for a duplicated DNA sequence associated with a glucosyltransferase gene in Streptococcus mutans, p. 205-208. In J. J. Ferretti and R. Curtiss III (ed.), Streptococcal genetics. American Society for Microbiology, Washington, D.C.
23. Pucci, M. J., and F. L. Macrina. 1986. Molecular organization and expression of the gtfA gene of Streptococcus mutans LM7. Infect. Immun. 54:77-84.
24. Robeson, J. P., R. G. Barletta, and R. Curtiss III. 1983. Expression of a Streptococcus mutans glucosyltransferase gene in Escherichia coli. J. Bacteriol. 155:211-221.
25. Russell, R. R. B., E. Abdulla, M. L. Gilpin, and K. Smith. 1986. Characterization of Streptococcus mutans surface antigens, p. 61-70. In S. Hamada, S. Michalek, H. Kiyono, L. Menaker, and J. R. McGhee (ed.), Molecular microbiology and immunobiology of Streptococcus mutans. Elsevier Science Publishing, Inc., New York.
26. Russell, R. R. B., D. Coleman, and G. Dougan. 1985. Expression of a gene for glucan-binding protein from Streptococcus mutans in Escherichia coli. J. Gen. Microbiol. 131:295-299.
27. Russell, R. R. B., and M. L. Gilpin. 1987. Identification of virulence components of mutans streptococci, p. 201-204. In J. J. Ferretti and R. Curtiss III (ed.), Streptococcal genetics. American Society for Microbiology, Washington, D.C.
27a. Russell, R. R. B., M. L. Gilpin, H. Mukasa, and G. Dougan. 1987. Characterization of glucosyltransferase expressed from a Streptococcus sobrinus gene cloned in Escherichia coli. J. Gen. Microbiol. 133:935-944.
28. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
29. Segrest, J. P., and R. L. Jackson. 1972. Molecular weight determinations of glycoproteins by polyacrylamide gel electrophoresis in sodium dodecyl sulphate. Methods Enzymol. 28:54-63.
30. Shiroza, T., S. Ueda, and H. K. Kuramitsu. 1987. Sequence analysis of the gtfB gene from Streptococcus mutans. J. Bacteriol. 169:4263-4270.
31. von Heijne, G. 1983. Patterns of amino acids near signal sequence cleavage sites. Eur. J. Biochem. 133:17-21.
32. Weeks, C. R., and J. J. Ferretti. 1986. Nucleotide sequence of the type A streptococcal exotoxin (erythrogenic toxin) gene from Streptococcus pyogenes bacteriophage T12. Infect. Immun. 52:144-150.
33. Wilbur, W. J., and D. J. Lipman. 1983. Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80:726-730.
34. Witholt, B., M. Boekhout, M. Brock, J. Kingma, H. van Heerikhuizen, and L. de Leij. 1976. An efficient and reproducible procedure for the formation of spheroplasts from variously grown Escherichia coli. Anal. Biochem. 74:160-170.
35. Yanisch-Perron, C., J. Vieira, and J. Messing. 1985. Improved M13 phage cloning vectors and host strains: nucleotide sequences of M13 mp18 and pUC19 vectors. Gene 33:103-119.