JOURNAL OF BACTERIOLOGY, June 1998, Vol. 180, No. 12, p. 3091–3099
0021-9193/98/$04.00+0
Copyright © 1998, American Society for Microbiology
Multidomain Structure and Cellulosomal Localization of the Clostridium thermocellum Cellobiohydrolase CbhA

VLADIMIR V. ZVERLOV,1 GALINA V. VELIKODVORSKAYA,1 WOLFGANG H. SCHWARZ,2 KARIN BRONNENMEIER,2 JOSEF KELLERMANN,3 AND WALTER L. STAUDENBAUER2*

Institute of Molecular Genetics, Russian Academy of Science, 123182 Moscow, Russia,1 and Institute for Microbiology, Technical University Munich, 80290 Munich,2 and Max-Planck-Institute for Biochemistry, 82152 Martinsried,3 Germany

Received 29 December 1997/Accepted 16 April 1998
The nucleotide sequence of the Clostridium thermocellum F7 cbhA gene, coding for the cellobiohydrolase CbhA, has been determined. An open reading frame encoding a protein of 1,230 amino acids was identified. Removal of a putative signal peptide yields a mature protein of 1,203 amino acids with a molecular weight of 135,139. Sequence analysis of CbhA reveals a multidomain structure of unusual complexity, consisting of an N-terminal cellulose-binding domain (CBD) homologous to CBD family IV, an immunoglobulin-like β-barrel domain, a catalytic domain homologous to cellulase family E1, a duplicated domain similar to fibronectin type III (Fn3) modules, a CBD homologous to family III, a highly acidic linker region, and a C-terminal dockerin domain. The cellulosomal localization of CbhA was confirmed by Western blot analysis employing polyclonal antibodies raised against a truncated, enzymatically active version of CbhA. CbhA was identified as cellulosomal subunit S3 by partial amino acid sequence analysis. Comparison of the multidomain structures indicates striking similarities between CbhA and a group of cellulases from actinomycetes. Average linkage cluster analysis suggests coevolution of the N-terminal CBD and the catalytic domain, and the spread of this domain pair by horizontal gene transfer among gram-positive cellulolytic bacteria.

Numerous proteins of higher organisms have a multidomain architecture consisting of strings of mobile modules (10). Many of the modules identified so far have defined binding functions, but some may simply act as spacer elements required only to arrange binding surfaces in space. Common types of constituent modules found in extracellular mosaic proteins are the fibronectin type III (Fn3) domain and variants of the immunoglobulin (Ig) domain. These modules have very similar three-dimensional folds that form a sandwich of two antiparallel β-sheets with slightly different strand topologies (7, 25). The broad distribution of these modules in animal proteins is often regarded as evidence for exon shuffling. Many modules are found in multiple copies resulting from several gene duplications after the original shuffling event. Large mosaic proteins are conspicuously absent in plants and fungi but appear to be widespread among bacteria. Thus, cellulases and other glycohydrolases from diverse bacteria have multidomain structures containing, in addition to their catalytic domains, several noncatalytic domains involved in substrate binding or specific protein interactions (46). A particularly interesting example is the cellulosome of Clostridium thermocellum, a cellulolytic multienzyme complex located at the cell surface and consisting of numerous catalytic components, including β-1,4-endoglucanases, cellobiohydrolases, and hemicellulases, attached to the cellulosome-integrating protein (scaffoldin) CipA (3, 4). This attachment is mediated by the conserved dockerin domain of the catalytic subunits and the iterated cohesin domains of CipA (43). Targeting of the cellulosome to its cellulose substrate is accomplished primarily by the cellulose-binding domain (CBD) of CipA. In this paper, we report the structure of the C. thermocellum F7 cellobiohydrolase gene cbhA and the encoded cellulase, CbhA. This enzyme, formerly designated CBH3, has been characterized as a cellobiohydrolase by its ability to hydrolyze crystalline cellulose, yielding cellobiose as the only degradation product (36, 44, 49). It will be shown that CbhA has a highly complex multidomain structure containing, in addition to an Ig-like domain and a catalytic domain homologous to cellulase family E1, two distinct CBDs, a duplicated Fn3-like module, and a dockerin domain. Evidence identifying CbhA as cellulosomal component S3 is presented.
MATERIALS AND METHODS

Bacterial strains and growth conditions. Escherichia coli TG1 harboring recombinant plasmid pCU303 or pCU304 (49) was grown with aeration at 37°C in Luria broth supplemented with ampicillin (0.1 mg/ml). C. thermocellum F7 (obtained from the Institute of Microbial Biochemistry and Physiology, RAS, Puschino, Moscow Region, Russia) was grown under strict anaerobiosis at 60°C in GS-2 medium (21).

Sequence analysis. The DNA sequence was determined on both strands from supercoiled double-stranded plasmid DNA by using the Sequenase kit (Pharmacia) for extension of 5′-biotinylated primers. DNA fragments were detected with a GATC 1500 Direct-Blotting-Electrophoresis apparatus (GATC, Konstanz, Germany) using streptavidin-conjugated alkaline phosphatase and nitroblue tetrazolium–5-bromo-4-chloro-3-indolylphosphate (Serva) as the chromogenic substrate. Sequence data were analyzed with the DNASIS software package (Hitachi Software Engineering). Multiple sequence alignments were carried out by the CLUSTAL procedure (19). Hydrophobic cluster analysis (HCA) was performed by the method of Gaboriaud et al., employing a simplified two-dimensional sequence representation (18). To define hydrophobic clusters, F, I, L, M, V, W, and Y were considered hydrophobic amino acids; alanine was considered hydrophobic only within a hydrophobic cluster. To evaluate the correspondence between two HCA patterns, a matching score (in percent) was calculated as $\text{score} = \frac{2 \times CR \times 100}{RC_1 + RC_2}$, where $RC_1$ and $RC_2$ are the numbers of aligned hydrophobic residues in sequences 1 and 2, respectively, and $CR$ is the total number of matching residues.

Purification of truncated CbhA protein. A 5-liter culture of E. coli TG1(pCU304) was harvested by centrifugation, washed with 50 mM phosphate-citrate (PC) buffer (pH 6.3), suspended in 200 ml of buffer containing 2 mM phenylmethylsulfonyl fluoride, and sonicated by using an ultrasonic disintegrator (MSE).
Cell extracts were heated for 30 min at 60°C and centrifuged (10,000 × g, 20 min). The cleared crude extract was precipitated with ammonium sulfate (60% saturation). The precipitate was collected by centrifugation and dissolved in 50 ml of PC buffer. Column chromatography was performed at room temperature with a fast-performance liquid chromatography system (Pharmacia). Aliquots (1.5 ml) were loaded on a 20- by 900-mm Toyopearl HW-60 column (Toyo-Soda, Shinanyo, Japan) equilibrated with PC buffer and eluted with the same buffer at a flow rate of 0.7 ml/min. Pooled fractions with cellobiohydrolase activity were applied to a MonoQ HR 10/10 column. Elution was performed with a linear NaCl gradient (0.0 to 0.4 M) in PC buffer. Fractions containing CbhA, which eluted at 0.3 M NaCl, were dialyzed, concentrated, and purified to electrophoretic homogeneity by gel filtration on a Superose-12 HR column (16 by 500 mm).

* Corresponding author. Mailing address: Institute for Microbiology, Technical University Munich, Arcisstrasse 21, D-80290 Munich, Germany. Phone: (089) 2892-2372. Fax: (089) 2892-2360. E-mail: zverlov@biol.chemie.tu-muenchen.de.

FIG. 1. Nucleotide and deduced amino acid sequences of the cbhA gene. The potential ribosome-binding site (SD) is in boldface type and underlined. A palindrome is indicated by arrows facing each other. The putative leader sequence is indicated by italic type. The segments encoding the different regions of CbhA (CBD family IV, Ig-like domain, catalytic domain, Fn3-like domain, CBD family III, and dockerin domain) are indicated by boxes of different patterns. The underlined amino acids were determined for cellulosomal protein S3 by liquid-phase sequencing.
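The HCA matching score defined under Sequence analysis, together with a plain percent-identity figure of the kind quoted later for the Fn3-like repeats (26% identity versus an 80% HCA score), can be sketched in Python. This is a simplified illustration, not the Gaboriaud et al. procedure: it assumes the two sequences are already aligned (equal length, '-' for gaps), treats aligned positions where both residues are hydrophobic as the matches counted by CR, and approximates "alanine is hydrophobic only within a cluster" by accepting an alanine only when both sequence neighbors are hydrophobic. Real HCA defines clusters on a two-dimensional helical net, so this alanine rule and all function names are hypothetical.

```python
# Hypothetical sketch of the HCA matching score and percent identity.
# Assumes pre-aligned, equal-length sequences with '-' marking gaps.

HYDROPHOBIC = set("FILMVWY")  # residues treated as hydrophobic in the Methods


def is_hydrophobic(seq: str, i: int) -> bool:
    """Classify residue i; alanine counts only between two hydrophobic neighbors
    (a linear stand-in for 'within a hydrophobic cluster')."""
    aa = seq[i]
    if aa in HYDROPHOBIC:
        return True
    if aa == "A" and 0 < i < len(seq) - 1:
        return seq[i - 1] in HYDROPHOBIC and seq[i + 1] in HYDROPHOBIC
    return False


def hca_matching_score(aln1: str, aln2: str) -> float:
    """Score = 2 * CR * 100 / (RC1 + RC2), in percent."""
    if len(aln1) != len(aln2):
        raise ValueError("sequences must be aligned to equal length")
    positions = range(len(aln1))
    rc1 = sum(is_hydrophobic(aln1, i) for i in positions)
    rc2 = sum(is_hydrophobic(aln2, i) for i in positions)
    # CR: aligned positions where both residues are hydrophobic
    cr = sum(is_hydrophobic(aln1, i) and is_hydrophobic(aln2, i) for i in positions)
    if rc1 + rc2 == 0:
        return 0.0
    return 2 * cr * 100 / (rc1 + rc2)


def percent_identity(aln1: str, aln2: str) -> float:
    """Identical residues as a percentage of gap-free aligned columns."""
    pairs = [(a, b) for a, b in zip(aln1, aln2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment: different residues, same hydrophobic pattern
    s1, s2 = "LIVAKW", "LVIAKY"
    print(hca_matching_score(s1, s2), percent_identity(s1, s2))  # 100.0 50.0
```

The toy pair shows how two sequences can share a hydrophobic pattern (score 100) while being only 50% identical, which is the kind of discrepancy an HCA comparison is designed to pick up.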
Enzyme assay. Cellobiohydrolase activity was assayed at 60°C for 10 min in PC buffer (pH 6.0) by using p-nitrophenyl-β-D-cellobioside (1 mM) as the substrate. Reactions were terminated by the addition of 1 M Na2CO3. One enzyme unit corresponds to the release of 1 μmol of p-nitrophenol per min.

Preparation of cellulosomes. A 0.5-liter culture of C. thermocellum F7 was grown for 36 h in GS-2 medium containing filter paper as the sole carbon source. Cells were harvested by centrifugation, washed six times with 250 ml of deionized water, and resuspended in 30 ml of 100 mM acetate buffer (pH 5.7) containing 10 mM CaCl2, 2 mM EDTA, and 5 mM dithiothreitol. The suspension was sonicated for 3 min in an MSE ultrasonic disintegrator and dialyzed at 50°C against acetate buffer to completely hydrolyze the remaining cellulose fibers (37). After centrifugation (30,000 × g, 30 min), the supernatant was concentrated by ultrafiltration (XM300 membrane; Amicon) to 1 ml and applied to a Superose 6 HR 10/30 column (Pharmacia) equilibrated with 50 mM Tris-HCl (pH 7.5). The purified cellulosomes eluted near the void volume of the column.

Immunological methods. Polyclonal antibodies were raised in white rabbits by injection of 0.25 mg of recombinant CbhA protein in Freund's complete adjuvant (Amersham). Booster injections were given after 7 days, and blood was collected after 14 days. The serum was purified by using a serum IgG column and checked for specificity. For Western blot analysis, sodium dodecyl sulfate (SDS)–12% polyacrylamide gel electrophoresis (PAGE) slabs of purified cellulosomes
were blotted onto a nitrocellulose membrane. The replicates were incubated with anti-CbhA rabbit serum and subjected to immunostaining using donkey anti-rabbit serum conjugated to horseradish peroxidase (Amersham) and 4-chloro-1-naphthol as a chromogenic substrate.

Protein cleavage, isolation of peptides, and sequencing of peptides and N termini. Cellulosomal proteins (100 μg) were separated by SDS–10% PAGE and stained with Coomassie blue. The band corresponding to subunit S3 was cut out and incubated with 2 μg of endoproteinase LysC (Boehringer) in 200 μl of 0.1 M Tris-HCl (pH 8.5) for 6 h at 37°C. The peptide mixture was separated by reversed-phase high-performance liquid chromatography on a Supersphere 60 RP select B column (Merck) at a flow rate of 0.3 ml/min. Solvent A was 0.1% trifluoroacetic acid, and solvent B was 0.1% trifluoroacetic acid in acetonitrile. A gradient of 0 to 70% solvent B was run over 70 min. Selected peptide-containing fractions were subjected to automated sequencing. N-terminal amino acid sequences were determined by Edman degradation using a Procise 492 protein sequencer (Applied Biosystems). The phenylthiohydantoin derivatives were identified by reversed-phase high-performance liquid chromatography.

FIG. 2. Alignment of amino acid sequences of family IV CBDs of bacterial cellulases and endo-1,3-β-glucanases. Abbreviations and accession numbers: Cth-LicA, C. thermocellum LicA, X89732; Tne-LamA, Thermotoga neapolitana LamA (54), Z47974; Cth-CbhA, C. thermocellum CbhA; Cce-CelE, Clostridium cellulolyticum CelE (2), Q46002; Cfi-CenC, Cellulomonas fimi CenC (9), P14090; Sre-Cel1, S. reticuli Cel1 (42), Q05156; Tfu-E1, Thermomonospora fusca E1 (26), Q08166. Shaded boxes highlight positions where residues are conserved in five or more family members, including those of CbhA. The conserved aromatic residues are indicated by asterisks. All sequences are numbered from Met-1.

Nucleotide sequence accession number. The nucleotide and amino acid sequences reported in this study have been submitted to GenBank under accession no. X80993.
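The enzyme unit defined under Enzyme assay (1 μmol of p-nitrophenol released per min) can be turned into a small calculation. The paper does not state how p-nitrophenol was quantified; the sketch below assumes that the absorbance of the Na2CO3-terminated reaction is read at 405 nm and converted with the Beer-Lambert law using a molar absorption coefficient of about 18,000 M⁻¹ cm⁻¹ for p-nitrophenolate. Both the wavelength and that coefficient are assumptions (a p-nitrophenol standard curve would replace them in practice), and the function names are hypothetical.

```python
# Hypothetical helper: convert a blank-corrected A405 reading into enzyme units.
# ASSUMPTION: p-nitrophenolate absorbs at 405 nm with eps ~ 18,000 M^-1 cm^-1.


def pnp_released_umol(a405: float, total_volume_ml: float,
                      epsilon_per_m_cm: float = 18_000.0,
                      path_cm: float = 1.0) -> float:
    """Micromoles of p-nitrophenol in the stopped reaction (Beer-Lambert: A = eps*c*l)."""
    conc_molar = a405 / (epsilon_per_m_cm * path_cm)
    return conc_molar * (total_volume_ml / 1000.0) * 1e6  # mol -> umol


def enzyme_units(a405: float, total_volume_ml: float, minutes: float, **kwargs) -> float:
    """1 U = 1 umol of p-nitrophenol released per min."""
    return pnp_released_umol(a405, total_volume_ml, **kwargs) / minutes


if __name__ == "__main__":
    # e.g. A405 = 0.9 in a 2-ml stopped reaction after the 10-min assay
    units = enzyme_units(0.9, 2.0, 10.0)
    print(f"{units:.3f} U")  # 0.010 U
```

Dividing the result by the milligrams of protein present in the assay would give a specific activity in U/mg.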
RESULTS

Nucleotide sequence of the cbhA gene. The recombinant plasmid pCU303 carries a 10.7-kb insert of C. thermocellum DNA including the cellobiohydrolase gene cbhA. EcoRI digestion of pCU303 resulted in the deletion of a 7.3-kb DNA segment, yielding plasmid pCU304 (49). Sequencing the insert of pCU304 revealed that EcoRI cleavage had removed the 3′-end portion of the cbhA gene, leading to the production of a truncated enzyme species that still exhibited cellobiohydrolase activity. Therefore, the cbhA sequence was completed by sequencing the corresponding region of pCU303 with specific oligonucleotide primers. The sequenced region (4,183 bp) contained only one long open reading frame (ORF) of 3,690 nucleotides, encoding a protein of 1,230 amino acids (Fig. 1). The putative ATG initiation codon was preceded at a spacing of 6 bp by a potential ribosome-binding site with a calculated free energy of Shine-Dalgarno base pairing of −66.5 kJ/mol. The ochre stop codon at position 4129 is followed by another in-frame ochre stop codon at position 4153. As observed previously for other C. thermocellum genes (1), the coding sequence and its flanking regions differed markedly in their G+C contents (43.0 and 29.7%, respectively). A palindromic sequence with a free energy for RNA hairpin formation of −81.2 kJ/mol is located immediately downstream of the ORF. This dyad symmetry element, which is followed by a run of five T's, might function as a factor-independent transcription terminator. Sequence inspection did not reveal any consensus promoter sequence recognized by bacterial RNA polymerases.

Multidomain structure of CbhA. Analysis of the amino acid sequence of CbhA deduced from the nucleotide sequence revealed a multidomain structure of unexpected complexity (Fig. 1). Most structural elements could be readily identified by sequence comparison. Thus, the N-terminal sequence exhibits the typical features of a bacterial signal peptide required for protein secretion (50), with a predicted cleavage site between position 27 (Ala) and position 28 (Leu). Removal of the signal peptide yields a mature protein of 1,203 amino acids with a molecular weight of 135,139.
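The length bookkeeping in this paragraph can be checked with a few lines of Python. A 3,690-nt ORF (stop codon excluded) gives 1,230 codons, removing the 27-residue signal peptide leaves 1,203 residues, and the stop codons at positions 4129 and 4153 are in frame because they are 24 nt (a multiple of 3) apart. The G+C and molecular-weight helpers are generic sketches: the average residue masses are assumed ExPASy-style values, the 135,139 figure itself cannot be recomputed here without the full sequence (Fig. 1; GenBank X80993), and all function names are hypothetical.

```python
# Hypothetical helpers for checking ORF/protein bookkeeping.

AVG_RESIDUE_MASS = {  # average residue masses in Da (assumed ExPASy-style values)
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N and C termini


def codons_in_orf(orf_nt: int) -> int:
    """Number of sense codons in an ORF whose length excludes the stop codon."""
    codons, remainder = divmod(orf_nt, 3)
    if remainder:
        raise ValueError("ORF length is not a multiple of 3")
    return codons


def in_same_frame(pos1: int, pos2: int) -> bool:
    """Two codon start positions share a reading frame if they differ by 3n."""
    return (pos2 - pos1) % 3 == 0


def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return 100 * sum(b in "GC" for b in bases) / len(bases)


def average_mw(protein: str) -> float:
    """Average molecular weight of a linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


if __name__ == "__main__":
    precursor = codons_in_orf(3690)          # residues in the precursor
    mature = precursor - 27                  # signal peptide cleaved after Ala-27
    print(precursor, mature, in_same_frame(4129, 4153))  # 1230 1203 True
```

Applied to the coding region and its flanks, `gc_content` would give the kind of comparison quoted above (43.0 versus 29.7%), and `average_mw` applied to residues 28 to 1230 would give a deduced molecular weight for the mature protein.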
FIG. 3. Alignment of amino acid sequences of selected CBDs from family III. Abbreviations and accession numbers: Cth-CbhA, C. thermocellum CbhA; Cth-CipA, C. thermocellum CipA (12), X67506; Csa-CelB, Caldicellulosiruptor saccharolyticus CelB (41), X13602; Cst-CelZ, Clostridium stercorarium CelZ (20), X55299; Cth-CelI, C. thermocellum CelI (17), L04735; Bla-CelA, Bacillus lautus CelA (15), M76588; Eca-CelV, Erwinia carotovora CelV (33), X79241. Shaded boxes highlight positions where residues are conserved in four or more family members, including those of CbhA. The conserved aromatic residues and the residues that are implicated in Ca2+ binding are indicated by asterisks and solid triangles, respectively. All sequences are numbered from Met-1.
The central region of CbhA contains the catalytic domain, which is homologous to cellulase subfamily E1 (46). It exhibits 38 to 40% sequence identity with the catalytic domains of a carboxymethylcellulase from Pseudomonas fluorescens (14) and a group of cellulases from gram-positive bacteria with high G+C contents, including endoglucanase E1 from Thermomonospora fusca (26), endoglucanase CenC from Cellulomonas fimi (9), and endoglucanase Cel1 from Streptomyces reticuli (42). On the other hand, only 20 to 22% sequence identity was observed between the catalytic domain of CbhA and the C. thermocellum cellulases CelD (24) and CelJ (1), two other members of cellulase subfamily E1. As observed for all enzymes of this subfamily, the catalytic domain of CbhA is preceded by an Ig-like β-barrel domain of unknown function (27). The catalytic core region of CbhA is flanked by two distinct CBDs. The N-terminal domain is homologous to family IV substrate-binding domains of bacterial cellulases and endo-1,3-β-glucanases (Fig. 2), whereas the C-terminal domain is a member of CBD family III (Fig. 3). Both domains consist of two antiparallel β-sheets with the topology of a jelly roll β-sandwich (23, 48). Substrate binding is mediated by a strip of highly conserved aromatic residues flanked by polar hydrogen-bonding groups. The family III CBD of the C. thermocellum scaffoldin CipA also contains a Ca2+-binding site (48), which seems to be present in all members of this family.

Identification of a novel Fn3-like domain. Inspection by HCA (13, 29) of the sequence linking the CbhA catalytic domain to the family III CBD revealed the presence of a repeated domain (Fig. 4). Although the aligned repeats exhibit only 26% identity, their HCA matching score is 80%, which is considered strong evidence for sequence homology (13). The duplicated domain showed no obvious homology to other noncatalytic domains but appeared to be distantly related to the Fn3-like domains of T. fusca endoglucanase E1 and T. fusca exoglucanase E4. Although the similarity is barely detectable at the amino acid sequence level, high-accuracy secondary-structure prediction (11) suggests a β-sheet topology strikingly similar to that of Fn3 modules (Fig. 5). The duplicated CbhA domain also resembles Fn3-like domains in amino acid composition, exhibiting an increased content of valine and hydroxylated aliphatic amino acids (data not shown).

Cellulosomal localization. The C-terminal segment of CbhA is made up of the highly conserved duplicated sequence of 24 amino acids constituting the cellulosomal dockerin module.

FIG. 4. HCA plots of CbhA amino acid positions 825 to 912 (A) and 914 to 1000 (B). Hydrophobic amino acids are shown as gray circles with conserved positions highlighted in dark gray. Proline residues are shown as black circles, and other helix-breaking amino acids (D, G, S, and N) found predominantly in loop regions are shown as white circles.

FIG. 5. Secondary-structure prediction of Fn3-like modules. Abbreviations and accession numbers: Tfu-E4, Thermomonospora fusca exoglucanase E4 (26), L20093; Hum-Fib, human fibronectin (34), P02751. Other abbreviations are as described in the legend to Fig. 2. Secondary-structure states of amino acids were predicted by the PREDATOR program (11) and are represented by an "E" (extended or sheet) and a dash (coil). The seven antiparallel β-strands of the 10th Fn3 module of human fibronectin are designated by the letters A to G (34) at the bottom of the figure. All sequences are numbered from Met-1.

FIG. 6. Detection of CbhA in the cellulosome of C. thermocellum. (Left panel) Western blot analysis of cellulosomal proteins detected with a polyclonal antibody raised against truncated CbhA; (right panel) SDS-PAGE of cellulosomal proteins stained with Coomassie brilliant blue. Cellulosomal subunits S1 to S14 are indicated with corresponding molecular masses.
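The composition comparison made for the duplicated Fn3-like domain (an increased content of valine and of the hydroxylated aliphatic residues serine and threonine) can be sketched as a percentage-point difference between a domain and a reference sequence. The paper does not say which reference composition was used (the data are not shown), so the background sequence, the choice of percentage points as the measure, and the helper names below are all hypothetical.

```python
from collections import Counter


def composition(seq: str) -> dict:
    """Percentage of each residue type in a protein sequence (gaps ignored)."""
    counts = Counter(aa for aa in seq.upper() if aa.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100 * n / total for aa, n in counts.items()}


def enrichment(domain: str, background: str, residues: str = "VST") -> dict:
    """Percentage-point difference of selected residues (default Val, Ser, Thr)."""
    dom, bg = composition(domain), composition(background)
    return {aa: dom.get(aa, 0.0) - bg.get(aa, 0.0) for aa in residues}


if __name__ == "__main__":
    # Toy sequences only; real input would be the CbhA repeats vs. a reference set
    print(enrichment("VTVSVTAG", "AGKLDEAG"))  # {'V': 37.5, 'S': 12.5, 'T': 25.0}
```

Positive values indicate residues that are more frequent in the domain than in the chosen background.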
FIG. 7. Comparison of the domain structure of cellulases of subfamily E1. Domains and regions showing significant similarity are indicated by the same pattern. Abbreviations and accession numbers: Cth-CelJ, C. thermocellum CelJ (1), D83704; Cth-CelD, C. thermocellum CelD (24), X04584; Pfl-EglA, P. fluorescens EglA (14), X12570; Fsu-EgB, Fibrobacter succinogenes EgB (6), L14436; Bfi-CelD, Butyrivibrio fibrisolvens CelD (5), X55732; aa, amino acids. Other abbreviations for the enzymes are described in the legend to Fig. 2.
FIG. 8. Average linkage cluster analysis. Similar amino acids were grouped according to the classification of Risler et al. (40). The dendrogram was derived from pairwise similarity scores by the UPGMA (unweighted pair group method with arithmetic mean) method (45). Abbreviations for enzymes are described in the legends to Figs. 2 and 7.
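The average linkage (UPGMA) procedure named in the Fig. 8 legend can be sketched in Python. The sketch works on distances, so pairwise similarity scores are first converted with distance = 100 − similarity, an assumption that only makes sense for percentage-scaled scores; the Risler-matrix scoring used in the paper is not reproduced, and the function names are hypothetical. At each step the two closest clusters are merged at half their distance, and distances to the new cluster are size-weighted averages of the old ones.

```python
# Minimal UPGMA sketch. Trees are nested tuples (left, right, height);
# leaves are label strings.


def similarity_to_distance(sim):
    """ASSUMPTION: similarities are percentages, so distance = 100 - similarity."""
    return [[100.0 - s for s in row] for row in sim]


def upgma(labels, dist):
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        new = next_id
        next_id += 1
        # Average linkage: size-weighted mean distance to every remaining cluster
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            d[(k, new)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        clusters[new] = ((tree_i, tree_j, dij / 2), size_i + size_j)
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    sim = [[100, 98, 94], [98, 100, 92], [94, 92, 100]]
    print(upgma(["A", "B", "C"], similarity_to_distance(sim)))
    # ('C', ('A', 'B', 1.0), 3.5)
```

Weighting by cluster size is what distinguishes UPGMA from simple (unweighted) averaging of the two merged rows.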
This domain is separated from the family III CBD by an acidic 14-amino-acid linker sequence consisting of repeats of the tripeptide Pro-Glu-Glu (Fig. 1). The presence of a dockerin domain strongly suggests that CbhA is a cellulosome constituent. To test this, polyclonal antibodies were raised against the truncated CbhA protein expressed from pCU304. It should be pointed out that the cloned insert terminates at the EcoRI site at nucleotide 3688 and thus lacks the C-terminal portion of CbhA, including the dockerin domain. The truncated protein is further processed upon expression in E. coli, yielding an enzymatically active protein of 80 kDa, which presumably consists only of the catalytic domain and the flanking Ig- and Fn3-like domains. Western blot analysis indicated that two cellulosomal proteins, S3 and S5, with apparent molecular masses of 150 and 98 kDa, respectively, reacted strongly with anti-CbhA antibodies (Fig. 6). The comparison of molecular masses suggested that CbhA might correspond to subunit S3. The identity of CbhA and S3 was established by amino acid sequence analysis. Because the N terminus was blocked, partial sequences were determined after cleavage of S3 with endoproteinase LysC. The two peptide sequences obtained (see Fig. 1) were fully consistent with the deduced amino acid sequence of CbhA.

DISCUSSION

Although numerous cellulolytic and hemicellulolytic C. thermocellum enzymes are considered cellulosome constituents due to the presence of a dockerin domain, only a few have been correlated with cellulosomal subunits (1, 8, 16, 38, 53). Western blot analysis and amino acid sequence determination clearly demonstrate that CbhA is identical to cellulosomal protein S3. It should be noted that the molecular mass of S3 (150 kDa) determined by SDS-PAGE is considerably larger than the mass of CbhA (135 kDa) deduced from the DNA sequence.
This difference might be due to atypical electrophoretic mobility of CbhA, possibly caused by the highly acidic linker sequence (positions 1150 to 1162). It has been observed previously that linker regions rich in glutamic acid residues can retard the migration of multidomain proteins in SDS-PAGE (31). The immunological data suggest that S5 is either a structurally related protein or a proteolytic degradation product of CbhA. Formation of S5 by proteolytic cleavage of CbhA is consistent with the N-terminal sequence of S5 from C. thermocellum JW20 (8). The reported sequence LEDKS(S)KLPDYKNDL(L)YE is nearly identical to the N terminus of mature CbhA predicted from the sequence data (Fig. 1). Minor sequence variations could reflect differences between C. thermocellum strains JW20 and F7. The size of S5 (98 kDa) suggests that proteolytic cleavage might have occurred between the two Fn3-like modules of CbhA. Truncation of the C-terminal dockerin domain during cellulosome dissociation has recently been reported for subunit S8, which corresponds to cellobiohydrolase CelS (8). The identification of CbhA and CelS as cellulosomal constituents S3 and S8, respectively, implies that the cellulosome contains at least two exoglucanases and refutes the early concept that the cellulosome consists entirely of endoglucanase activities (35). Both exoglucanases have been characterized as cellobiohydrolases (28, 36, 44, 49) but belong to different cellulase families: CbhA is a member of cellulase family E1, whereas CelS belongs to family L (46). The two enzymes also differ strikingly in their domain structures. CelS is less complex and consists of a catalytic domain and a C-terminal dockerin domain (51). Because it lacks CBDs, CelS requires the presence of CipA for the efficient hydrolysis of crystalline cellulose. It has been proposed that the two proteins interact synergistically in an enzyme (CelS)-anchor (CipA) manner (32, 52). The multidomain structure of CbhA was unexpected, considering that the cellulosome is mainly an assembly of catalytic subunits, which are organized for concerted action and targeted to the insoluble substrate by the CipA protein (4).
In particular, the presence of both an N-terminal and a C-terminal CBD appears redundant. However, it should be kept in mind that family IV and family III CBDs differ strikingly in their substrate specificity. Whereas family III domains bind specifically to crystalline cellulose, family IV domains bind with approximately equal affinities to amorphous cellulose, cellooligopentaose, and mixed-linkage β-glucans (22, 47). Conceivably, this binding site could participate directly in cellulose degradation by keeping the amorphous region in a noncrystalline state suitable for enzymatic hydrolysis. On the other hand, the C-terminal family III CBD might assist CipA in attaching the cellulosome to crystalline cellulose fibers. The role of the other noncatalytic domains of CbhA is less obvious. It should be noted that the Ig-like β-barrel domain has so far been found only in members of cellulase family E1, where it is always positioned at the N terminus of the catalytic domain (see Fig. 7). It might therefore be specifically involved in the folding and/or stabilization of the catalytic α6/α6-barrel domain of this cellulase subfamily. In contrast, Fn3-like domains are found in various unrelated prokaryotic depolymerases in widely different arrangements (30). It is therefore likely that these domains have a similar function in prokaryotic and in eukaryotic exoproteins, namely, adhesion to cell surface receptors. In the case of CbhA, this original function became redundant upon integration of the enzyme into the cellulosomal complex. On the other hand, duplication of the module might be required for correct positioning of the C-terminal CBD with respect to the catalytic domain. Structure analysis has shown that such module pairs do not simply function as flexible spacer elements but adopt defined relative orientations stabilized by specific intermodule interactions (7, 39). This change of function could explain the sequence divergence from other prokaryotic Fn3-like modules. Comparison of the domain structures of various other cellulases of subfamily E1 indicates striking similarities between CbhA and a group of enzymes from gram-positive bacteria with high G+C content (Fig. 7). In particular, the endoglucanase E1 of T. fusca has a similar functional design, consisting of an N-terminal and central catalytic region involved in cellulose hydrolysis and a C-terminal portion involved in substrate and cell surface adherence. Average linkage cluster analysis of the N-terminal family IV CBD and the catalytic domains suggests coevolution of these two domains (Fig. 8). Apparently, this domain array arose by a rare recombination event and spread by horizontal transfer among gram-positive cellulolytic bacteria. In contrast to the rearrangements of eukaryotic multidomain proteins proposed to result from exon shuffling, such domain arrays appear to be remarkably stable in bacteria, reflecting a fundamental difference in gene structure.

ACKNOWLEDGMENTS

This work was supported in part by a grant from the Deutsche Forschungsgemeinschaft (SFB 145), by a NATO Collaborative Research grant (HTECH. CRG 930993), by a grant from the Volkswagenstiftung, and by a grant from the Russian Foundation of Basic Research.
REFERENCES

1. Ahsan, M. M., T. Kimura, S. Karita, K. Sakka, and K. Ohmiya. 1996. Cloning, DNA sequencing, and expression of the gene encoding Clostridium thermocellum cellulase CelJ, the largest catalytic component of the cellulosome. J. Bacteriol. 178:5732–5740.
2. Bagnara-Tardif, C., C. Gaudin, A. Belaich, P. Hoest, T. Citard, and J. P. Belaich. 1992. Sequence analysis of a gene cluster encoding cellulases from Clostridium cellulolyticum. Gene 119:17–28.
3. Bayer, E. A., E. Morag, and R. Lamed. 1994. The cellulosome—a treasure-trove for biotechnology. Trends Biotechnol. 12:379–386.
4. Béguin, P., and M. Lemaire. 1996. The cellulosome: an exocellular, multiprotein complex specialized in cellulose degradation. Crit. Rev. Biochem. Mol. Biol. 31:201–236.
5. Berger, E., W. A. Jones, D. T. Jones, and D. R. Woods. 1990. Sequencing and expression of a cellodextrinase (ced1) gene from Butyrivibrio fibrisolvens H17c cloned in Escherichia coli. Mol. Gen. Genet. 223:310–318.
6. Broussolle, V., E. Forano, G. Gaudet, and Y. Ribot. 1994. Gene sequence and analysis of protein domains of EGB, a novel family E endoglucanase from Fibrobacter succinogenes S58. FEMS Microbiol. Lett. 124:439–447.
7. Campbell, I. D., and C. Spitzfaden. 1994. Building proteins with fibronectin type III modules. Structure 2:333–337.
8. Choi, S. K., and L. G. Ljungdahl. 1996. Dissociation of the cellulosome of Clostridium thermocellum in the presence of ethylenediaminetetraacetic acid occurs with the formation of truncated polypeptides. Biochemistry 35:4897–4905.
9. Coutinho, J. B., B. Moser, D. G. Kilburn, R. A. J. Warren, and R. C. Miller. 1991. Nucleotide sequence of the endoglucanase C gene (cenC) of Cellulomonas fimi, its high-level expression in Escherichia coli, and characterization of its products. Mol. Microbiol. 5:1221–1233.
10. Doolittle, R. F. 1995. The multiplicity of domains in proteins. Annu. Rev. Biochem. 64:287–314.
11. Frishman, D., and P. Argos. 1997. Seventy-five percent accuracy in protein secondary structure prediction. Proteins Struct. Funct. Genet. 27:329–335.
12. Fujino, T., P. Béguin, and J. P. Aubert. 1993. Organization of a Clostridium thermocellum gene cluster encoding the cellulosomal scaffolding protein CipA and a protein possibly involved in attachment of the cellulosome to the cell surface. J. Bacteriol. 175:1891–1899.
13. Gaboriaud, C., V. Bissery, T. Benchetrit, and J. P. Mornon. 1987. Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Lett. 224:149–155.
14. Hall, J., and H. J. Gilbert. 1988. The nucleotide sequence of a carboxycellulase gene from Pseudomonas fluorescens subsp. cellulosa. Mol. Gen. Genet. 213:112–117.
15. Hansen, C. K., B. Diderichsen, and P. L. Jorgensen. 1992. celA from Bacillus lautus PL236 encodes a novel cellulose-binding endo-β-1,4-glucanase. J. Bacteriol. 174:3522–3531.
16. Hayashi, H., K.-I. Takagi, M. Fukumura, T. Kimura, S. Karita, K. Sakka, and K. Ohmiya. 1997. Sequence of xynC and properties of XynC, a major component of the Clostridium thermocellum cellulosome. J. Bacteriol. 179:4246–4253.
17. Hazlewood, G. P., K. Davidson, J. I. Laurie, N. S. Huskisson, and H. J. Gilbert. 1993. Gene sequence and properties of CelI, a family E endoglucanase from Clostridium thermocellum. J. Gen. Microbiol. 139:307–316.
18. Henrissat, B., Y. Popineau, and Y. Kader. 1988. Hydrophobic cluster analysis of plant protein sequences. A domain homology between storage and lipid transfer proteins. Biochem. J. 255:901–905.
19. Higgins, D. G., J. D. Thompson, and T. J. Gibson. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:388–402.
20. Jauris, S., K. P. Ruecknagel, W. H. Schwarz, P. Kratzsch, K. Bronnenmeier, and W. L. Staudenbauer. 1990. Sequence analysis of the Clostridium stercorarium celZ gene encoding a thermoactive cellulase (Avicelase I): identification of catalytic and cellulose-binding domains. Mol. Gen. Genet. 223:258–267.
21. Johnson, E. A., A. Madia, and A. L. Demain. 1981. Chemically defined minimal medium for growth of the anaerobic cellulolytic thermophile Clostridium thermocellum. Appl. Environ. Microbiol. 41:1060–1062.
22. Johnson, P. E., P. Tomme, M. D. Joshi, and L. P. McIntosh. 1996. Interaction of soluble cellooligosaccharides with the N-terminal cellulose-binding domain of Cellulomonas fimi CenC. 2. NMR and ultraviolet absorption spectroscopy. Biochemistry 35:13895–13906.
23. Johnson, P. E., M. D. Joshi, P. Tomme, D. G. Kilburn, and L. P. McIntosh. 1996. Structure of the N-terminal cellulose-binding domain of Cellulomonas fimi CenC determined by nuclear magnetic resonance spectroscopy. Biochemistry 35:14381–14394.
24. Joliff, G., P. Béguin, and J. P. Aubert. 1986. Nucleotide sequence of the cellulase gene celD encoding endoglucanase D of Clostridium thermocellum. Nucleic Acids Res. 14:8605–8613.
25. Jones, E. Y. 1993. The immunoglobulin superfamily. Curr. Opin. Struct. Biol. 3:846–852.
26. Jung, E. D., G. Lao, D. Irwin, B. K. Barr, A. Benjamin, and D. B. Wilson. 1993. DNA sequences and expression in Streptomyces lividans of an exoglucanase gene and an endoglucanase gene from Thermomonospora fusca. Appl. Environ. Microbiol. 59:3032–3043.
27. Juy, M., A. G. Amit, P. M. Alzari, R. J. Poljak, M. Claeyssens, P. Béguin, and J. P. Aubert. 1992. Three-dimensional structure of a thermostable bacterial cellulase. Nature 357:89–91.
28. Kruus, K., W. K. Wang, and J. H. D. Wu. 1995. Exoglucanase activities of the recombinant Clostridium thermocellum CelS, a major cellulosome component. J. Bacteriol. 177:1641–1644.
29. Lemesle-Varloot, L., B. Henrissat, C. Gaboriaud, V. Bissery, A. Morgat, and J. P. Mornon. 1990. Hydrophobic cluster analysis: procedures to derive structural and functional information from 2-D-representation of protein sequences. Biochimie 72:555–574.
30. Little, E., P. Bork, and R. F. Doolittle. 1994. Tracing the spread of fibronectin type III domains in bacterial glycohydrolases. J. Mol. Evol. 39:631–643.
31. Lück, A., J. D'Haese, and H. Hinssen. 1995. A gelsolin-related protein from lobster muscle: cloning, sequence analysis and expression. Biochem. J. 305:767–775.
32. Lytle, B., C. Myers, K. Kruus, and J. H. D. Wu. 1996. Interactions of the CelS binding ligand with various receptor domains of the Clostridium thermocellum cellulosomal scaffolding protein, CipA. J. Bacteriol. 178:1200–1203.
33. Mae, A., R. Heikinheimo, and E. T. Palva. 1995. Structure and regulation of the Erwinia carotovora subspecies carotovora SCC3193 cellulase gene celV1 and the role of cellulase in phytopathogenicity. Mol. Gen. Genet. 247:17–26.
34. Main, L. M., T. S. Harvey, M. Baron, J. Boyd, and I. D. Campbell. 1992. The three-dimensional structure of the tenth type III module of fibronectin: an insight into RGD-mediated interactions. Cell 71:671–678.
35. Mayer, F., M. P. Coughlan, Y. Mori, and L. G. Ljungdahl. 1987. Macromolecular organization of the cellulolytic enzyme complex of Clostridium thermocellum as revealed by electron microscopy. Appl. Environ. Microbiol. 53:2785–2792.
36. Mel'nik, M. S., M. L. Rabinovich, and I. V. Voznyi. 1991. Cellobiohydrolase from Clostridium thermocellum, synthesized by a recombinant E. coli strain. Biokhimiya 56:1787–1797.
37. Morag, E., E. A. Bayer, and R. Lamed. 1992. Affinity digestion for the near-total recovery of purified cellulosome from Clostridium thermocellum. Enzyme Microb. Technol. 14:289–292.
38. Morag, E., E. A. Bayer, G. P. Hazlewood, H. J. Gilbert, and R. Lamed. 1993. Cellulase SS (CelS) is synonymous with the major cellobiohydrolase (subunit S8) from the cellulosome of Clostridium thermocellum. Appl. Biochem. Biotechnol. 43:147–151.
39. Potts, J. R., and I. D. Campbell. 1996. Structure and function of fibronectin modules. Matrix Biol. 15:313–320.
40. Risler, J. L., M. O. Delorme, H. Delacroix, and A. Henaut. 1988. Amino acid substitutions in structurally related proteins. A pattern recognition approach. Determination of a new and efficient scoring matrix. J. Mol. Biol. 204:1019–1029.
41. Saul, D. J., L. C. Williams, R. A. Grayling, L. W. Chamley, D. R. Love, and P. L. Bergquist. 1990. celB, a gene coding for a bifunctional cellulase from the extreme thermophile Caldocellum saccharolyticum. Appl. Environ. Microbiol.
56:3117–3124. Schlochtermeier, A., S. Walter, J. Schro¨der, M. Moorman, and H. Schrempf. 1992. The gene encoding the cellulase (Avicelase) Cel1 from Streptomyces reticuli and analysis of protein domains. Mol. Microbiol. 6:3611–3621. Shimon, L. J. W., E. A. Bayer, E. Morag, R. Lamed, S. Yaron, Y. Shoham, and F. Frolow. 1997. A cohesin domain from Clostridium thermocellum: the
44. 45. 46. 47.
48.
49.
50. 51. 52. 53.
54.
3099
crystal structure provides new insights into cellulosome assembly. Structure 5:381–390. Singh, R. N., and V. K. Akimenko. 1993. Isolation of a cellobiohydrolase of Clostridium thermocellum capable of degrading natural crystalline substrates. Biochem. Biophys. Res. Commun. 192:1123–1130. Sokal, R. R., and P. H. A. Sneath. 1963. Principles of numerical taxonomy. Freeman, San Francisco, Calif. Tomme, P., R. A. J. Warren, and N. R. Gilkes. 1995. Cellulose hydrolysis by bacteria and fungi. Adv. Microb. Physiol. 37:1–81. Tomme, P., L. Creagh, D. Kilburn, and C. Haynes. 1996. Interaction of polysaccharides with the N-terminal cellulose-binding domain of Cellulomonas fimi CenC. 1. Binding specificity and calorimetric analysis. Biochemistry 35:13885–13894. Tormo, J., R. Lamed, A. J. Chirino, E. Morag, E. A. Bayer, Y. Shoham, and T. A. Steitz. 1996. Crystal structure of a bacterial family-III cellulose-binding domain: a general mechanism for attachment to cellulose. EMBO J. 15: 5739–5751. Tuka, K., V. V. Zverlov, B. K. Bumazkin, G. A. Velikodvorskaya, and A. Y. Strongin. 1990. Cloning and expression of Clostridium thermocellum genes coding for thermostable exoglucanases (cellobiohydrolases) in Escherichia coli cells. Biochem. Biophys. Res. Commun. 169:1055–1060. von Heijne, G. 1985. Signal sequences. The limits of variation. J. Mol. Biol. 184:99–105. Wang, W. K., K. Kruus, and J. H. D. Wu. 1993. Cloning and DNA sequence of the gene coding for Clostridium thermocellum cellulase SS (CelS), a major cellulosome component. J. Bacteriol. 175:1293–1302. Wu, J. H. D., W. H. Orme-Johnson, and A. L. Demain. 1988. Two components of an extracellular protein aggregate of Clostridium thermocellum together degrade crystalline cellulose. Biochemistry 27:1703–1709. Zverlov, V. V., K. P. Fuchs, W. H. Schwarz, and G. Velikodvorskaya. 1994. Purification and cellulosomal localization of Clostridium thermocellum mixed linkage b-glukanase LicB (1,3-1,4-b-D-glucanase). Biotechnol. Lett. 16:29– 34. 
Zverlov, V. V., I. Y. Volkov, T. V. Velikodvorskaya, and W. H. Schwarz. 1997. Highly thermostable endo-1,3-b-glucanase (laminarinase) LamA from Thermotoga neapolitana: nucleotide sequence of the gene and characterization of the recombinant gene product. Microbiology 143:1701–1708.