Protein Engineering vol.10 no.6 pp.627–634, 1997
High-resolution crystal structure of M-protease: phylogeny aided analysis of the high-alkaline adaptation mechanism
Tsuyoshi Shirai1,3, Atsuo Suzuki1, Takashi Yamane1, Tamaichi Ashida1, Tohru Kobayashi2, Jun Hitomi2 and Susumu Ito2
1Department of Biotechnology, Graduate School of Engineering, Nagoya University, Nagoya 464-01, Japan and 2Tochigi Research Laboratories of Kao Corporation, 2606 Akabane, Ichikai, Haga, Tochigi 321-34, Japan
3To whom correspondence should be addressed
M-protease is a subtilisin-family serine protease produced by an alkaliphilic Bacillus sp. strain. Optimal enzymatic activity of the protein occurs at pH 12.3. The crystal structure of M-protease (space group P2₁2₁2₁, a = 62.3, b = 75.5, c = 47.2 Å) has been refined to a crystallographic R-factor of 17.2% at 1.5 Å resolution. The alkaline adaptation mechanism of the enzyme was analyzed. Molecular phylogeny construction was used to determine the amino acid substitutions that occurred during the high-alkaline adaptation process. This analysis revealed a decrease in the number of negatively charged amino acids (aspartic acid and glutamic acid) and lysine residues and an increase in arginine and neutral hydrophilic amino acids (histidine, asparagine and glutamine) residues during the course of adaptation. These substitutions increased the isoelectric point of M-protease. Some of the acquired arginine residues form hydrogen bonds or ion pairs to combine both N- and C-terminal regions of M-protease. The substituted residues are localized to a hemisphere of the globular protein molecule where positional shifts of peptide segments, relative to those of the less alkaliphilic subtilisin Carlsberg, are observed. The biased distribution and interactions caused by the substituted residues seem to be responsible for stabilization of the conformation in a high-alkaline condition.
Keywords: X-ray crystallography/subtilisin/alkaline adaptation/protein evolution/molecular phylogeny

Introduction
M-protease (EC 3.4.21.14) is a subtilisin-family serine protease produced by alkaliphilic Bacillus sp. KSM-K16 strain (Hakamada et al., 1994; Kobayashi et al., 1995). The most remarkable feature of M-protease is its high alkaline resistance. The enzyme belongs to the extremely alkaliphilic group among related alkaline serine proteases and shows optimal enzymatic activity at pH 12.3.
Extensive use of alkaline proteases in such industrial applications as laundry detergent additives has led to increased interest in the enzymatic properties (Markland and Smith, 1971; Kraut, 1977) and three-dimensional structures of this family of enzymes including subtilisin BPN′ (Wright et al., 1969), proteinase K (Pähler et al., 1984), subtilisin Carlsberg (McPhalen et al., 1985), thermitase (Dauter et al., 1988), alkaline protease from Bacillus alcalophilus (Sobek et al., 1990), Savinase (Betzel et al., 1992), subtilisin BL (Goddette et al., 1992), PB92 (Van Der Laan et al., 1992) and subtilisin E (Chu et al., 1995).
© Oxford University Press
One of the major goals of the structural study of alkaline proteases is to clarify the molecular basis of adaptation to a non-physiological pH condition. Some alkaline proteases (e.g. PB92, elastase YaB and M-protease) derived from alkaliphilic Bacillus species or strains show optimal enzymatic activity at a pH around 12. One of the important goals of protein engineering is to design proteins that are active in extreme conditions (Jaenicke, 1991).
We carried out X-ray structure analysis of the form 1 crystal of M-protease. The crystal structure was refined at 1.5 Å resolution, which provided a vast improvement over previous analysis of the form 2 crystal structure at 2.4 Å resolution (Yamane et al., 1995). The 1.5 Å resolution structure was used for analysis of the high-alkaline adaptation mechanism of M-protease. We investigated the mechanism using molecular evolutionary analysis. Essential structural features of high-alkaline proteases that contribute to their alkaline resistance have developed through an evolutionary process that led to a group of the high-alkaline strains from their relatives. Amino acid substitutions that occurred during the evolutionary process were deduced from the phylogenetic relationship with related proteases. The deduced substitutions were studied with respect to their distribution and interactions based on the three-dimensional structure of M-protease.

Materials and methods
Crystallization and X-ray data collection
Crystallization of purified M-protease was performed by the hanging drop vapor diffusion method at 18°C. A 10 µl drop of 50 mM acetate buffer (pH 5.5), containing 0.7 M ammonium sulfate and 5 mg/ml M-protease, was equilibrated against a 1 ml reservoir solution containing 50 mM acetate buffer (pH 5.5) and 1.4 M ammonium sulfate. Two different forms of the crystals were obtained under the same conditions.
The needle-shaped crystal (form 1) was found to diffract to 1.5 Å, which was much better than the 2.4 Å observed previously in the analysis of the pillar-shaped crystal (form 2). Thus, the form 1 crystal was used for the present high-resolution study. The crystallographic parameters for the form 1 crystal are listed in Table I. Two form 1 crystals were used for data collection using the Weissenberg method with synchrotron radiation at the BL6A station of the Photon Factory, National Laboratory for High Energy Physics. One crystal was rotated around the needle axis (a) and the other around the c axis during the data collection. The Weissenberg camera (Sakabe, 1983) and Fuji imaging plates were used to record the reflections. A total of 32 images were digitized using a Fuji BAS-2000 analyzer and processed by the WEIS program (Higashi, 1989). Reflections up to 1.5 Å resolution were merged into a data set as summarized in Table I.
Structure determination and refinement
The crystal structure was determined by the molecular replacement method (Crowther, 1972) using the previously determined
Table I. Crystallographic parameters, data collection and refinement statistics

Crystallographic parameters
  Space group                                   P2₁2₁2₁
  Unit cell dimensions (Å)
    a                                           62.3
    b                                           75.5
    c                                           47.2
  No. of mol. in one unit cell                  4

Data collection statistics
  X-ray source                                  SR
  Wavelength (Å)                                1.00
  No. of crystals                               2
  Resolution limit (Å)                          1.5
  No. of unique ref. (F > 1σ(F))                29 853
  Rmerge (%)                                    9.4
  Completeness (%)                              82
    of final shell (1.53–1.50 Å)                59

Refinement statistics
  Model contents
    No. of amino acid residues                  269
    No. of protein non-H atoms                  1882
    No. of water mol.                           124
    Ligands                                     Ca²⁺ (1 atom), SO₄²⁻ (1 mol.)
  R-factor (F > 3σ(F) in 8.0–1.5 Å)             0.172
  No. of ref.                                   27 846
  R.M.S. deviations from ideal values
    Bond length (Å)                             0.010
    Bond angle (deg.)                           2.3
    Dihedral angle (deg.)                       23.6
    Improper angle (deg.)                       0.9
form 2 crystal structure as a model. The rotation and the translation functions located the molecules in the unit cell with an initial crystallographic R-factor of 0.330 for the reflections between 5.0 and 3.5 Å resolution. After rigid-body refinement of the positional parameters, the R-factor dropped to 0.316. Structural refinement was performed by repeatedly applying a simulated annealing procedure using X-PLOR (Brünger et al., 1987) and computer graphics aided model manipulation using the turboFRODO program (Cambillau, 1992). The resolution range was extended gradually to 1.5 Å. Refinement against a total of 27 846 reflections (F > 3σ(F)) between 8.0 and 1.5 Å resolution converged at an R-factor of 0.172. The final model contents and deviation from ideal geometry are shown in Table I.
Comparison of structures
The refined structure of M-protease was compared with a less alkaliphilic relative, subtilisin Carlsberg from Bacillus licheniformis (BNL code 1SCA; Fitzpatrick et al., 1993). In addition to canonical Cα deviation analyses, we employed an index that can detect collective dislocations of local peptide segments between two proteins. The index is defined as

$$S_k = \sum_{i=k-n}^{k+n} \; \sum_{j=k-n}^{i-1} \mathbf{r}_i \cdot \mathbf{r}_j ,$$

where $S_k$ is the collective dislocation index of residue k of a protein, and $\mathbf{r}_i$ is the vector from the Cα atom of residue i of a protein to the corresponding atom of a superimposed protein. $S_k$ is large when residues within the window range (residues from k–n to k+n) are dislocated in a similar direction to those of another protein. The collective dislocation index can discriminate a systematic dislocation of atoms from a random one. A window size of 11 (n = 5) was used in this study.
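The index defined above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name and argument layout are hypothetical, the two Cα coordinate arrays are assumed to be already superimposed and residue-matched, and residues closer than n positions to either terminus are left undefined (NaN), since the text does not say how the ends of the chain were treated.

```python
import numpy as np

def collective_dislocation_index(ca_ref, ca_other, n=5):
    """S_k = sum over i in [k-n, k+n] and j in [k-n, i-1] of r_i . r_j.

    ca_ref, ca_other: (N, 3) arrays of matched, superimposed C-alpha coordinates.
    Returns an array of length N; entries without a full window are NaN
    (an assumption of this sketch, not something stated in the paper).
    """
    r = np.asarray(ca_other, dtype=float) - np.asarray(ca_ref, dtype=float)
    n_res = len(r)
    s = np.full(n_res, np.nan)
    for k in range(n, n_res - n):
        window = r[k - n:k + n + 1]      # displacement vectors r_{k-n} .. r_{k+n}
        gram = window @ window.T         # gram[i, j] = r_i . r_j
        s[k] = np.tril(gram, k=-1).sum() # keep only the pairs with j < i
    return s

# Eleven residues all shifted by 1 A in the same direction: 55 pairs, each r_i . r_j = 1.
ref = np.zeros((11, 3))
shifted = ref + np.array([1.0, 0.0, 0.0])
print(collective_dislocation_index(ref, shifted)[5])  # 55.0
```

Because the sum runs over pairs of different residues, displacements that alternate in direction cancel or become negative, whereas a concerted shift of the whole window gives a large positive value; this is what lets the index separate systematic from random dislocation.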
Deduction of ancestral sequences by molecular phylogenetic analysis
A molecular phylogeny of the M-protease and six related enzymes was constructed to facilitate structural comparison. The amino acid sequences of three high-alkaline enzymes, M-protease (Hakamada et al., 1994), PB92 (Van Der Laan et al., 1991) and elastase YaB (Kaneko et al., 1989) and four alkaline enzymes, subtilisin Carlsberg (Jacobs et al., 1985), subtilisin J (Jang et al., 1992), subtilisin BPN′ (Markland and Smith, 1967) and bacillopeptidase F (Sloma et al., 1990) were used to construct the phylogenetic tree and for ancestral sequence deduction. Evolutionary distances among the seven proteins were calculated by the maximum likelihood method (Kishino et al., 1990) according to a multiple sequence alignment of the proteins (Figure 1). The tree topology was deduced by the neighbor joining method (Saitou and Nei, 1987). The program system MOLPHY (Adachi and Hasegawa, 1992) was used for the distance calculation and tree construction.
A phylogenetic relationship of proteins can be applied to deduce their ancestral sequences (Dayhoff et al., 1978; Stewart et al., 1987). The deduced ancestral sequences can be, in turn, used to identify amino acid substitutions between protein groups. In the case of alkaline proteases, amino acid residues essential for activity in a high-alkaline condition would have been substituted in the evolutionary process that resulted in the distinction between high-alkaline and alkaline proteases. Thus, essential residues are among those substituted between ancestral sequences of high-alkaline and alkaline groups.
The procedure applied in this study is explained by a simple artificial phylogenetic tree consisting of four proteins (Figure 2). Suppose proteins 1–4 are related by two ancestral nodes (A and B) as shown in the figure. The two nodes should be occupied by proper ancestral sequences.
By assigning corresponding residue sites of the four proteins and two ancestral proteins at nodes according to particular amino acids as shown in the figure, the minimum number of amino acid substitutions required for the model can be calculated. Furthermore, by exhaustively repeating this for every possible combination of amino acid residues at the ancestral nodes, the model(s) with the most parsimonious number of substitutions were obtained for each residue site.

Results and discussion
Active center and Ca²⁺-binding site structures in M-protease form 1 crystal
All main chain dihedral angles (φ and ψ) of non-glycine residues of the final M-protease model are within the generously allowed region of the Ramachandran plot (data not shown). The error in coordinates of the model is estimated to be ~0.15 Å by the Luzzati plot (Luzzati, 1952) (Figure 3). The crystallization sample of M-protease was treated with the inhibitor phenylmethylsulfonyl fluoride (PMSF) during the purification process (Kobayashi et al., 1995). However, the presence of an inhibitor at the active center was not clearly demonstrated in the previous analysis of the form 2 crystal (Yamane et al., 1995). In the present study, the absence of the inhibitor moiety was confirmed by an improved electron density map and a catalytic assay of the enzyme that was extracted from the crystals. The electron density found at the active center was reasonably interpreted as water molecules that formed hydrogen bonds with catalytic Ser221 or His64
Fig. 1. Alignment of amino acid sequences of M-protease and its relatives. Amino acid sequences of high-alkaline enzymes: M-protease, PB92 and elastase YaB, and alkaline enzymes: subtilisin Carlsberg (sub. Carlsberg), subtilisin J (sub. J), subtilisin BPN′ (sub. BPN′) and bacillopeptidase F (B. peptidase F) are aligned with deduced ancestral sequences (ancestors A and B). Residue numbers of M-protease are indicated at the top of each row. C-terminal 174 residues of bacillopeptidase F, most of which is not essential for protease activity, are omitted. Ancestors A and B are most probable sequences for direct ancestors of high-alkaline and alkaline enzymes, respectively. Their amino acid residues are shown in lower case letters when two or more equally probable amino acids are suggested. Symbols between ancestral sequences indicate residue sites which have been substituted between ancestral sequences. Symbols @, % and * indicate substitutions of ionizable residues, residues at interface of shifted segments and others, respectively. Symbol # indicates inserted/deleted residue. Underlined regions of M-protease sequence indicate positions of shifted segments.
Fig. 2. Model phylogeny of ancestral sequence deduction procedure. Circles labeled 1–4 are proteins. Boxed A and B are nodes (ancestors) from which amino acid types are to be deduced. Known amino acid residues of proteins 1–4 are associated with circles. Some possible combinations of residue types for nodes appear on either side of numbers associated with minimum amino acid substitutions.
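The exhaustive search over ancestral residue types described for the four-protein model tree can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes that proteins 1 and 2 descend from node A, proteins 3 and 4 from node B, and that A and B are joined by one branch (the figure itself is not reproduced here); the function names and the example residues are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def most_parsimonious_nodes(leaves_a, leaves_b):
    """Try every (A, B) residue pair at one alignment site and keep the cheapest.

    leaves_a: residues observed in the proteins attached to node A.
    leaves_b: residues observed in the proteins attached to node B.
    Returns (minimum number of substitutions, list of all optimal (A, B) pairs).
    """
    best_cost, best_models = None, []
    for a, b in product(AMINO_ACIDS, repeat=2):
        cost = (sum(a != x for x in leaves_a)   # changes on branches below A
                + sum(b != x for x in leaves_b) # changes on branches below B
                + (a != b))                     # change on the A-B branch
        if best_cost is None or cost < best_cost:
            best_cost, best_models = cost, [(a, b)]
        elif cost == best_cost:
            best_models.append((a, b))
    return best_cost, best_models

def fraction_substituted(models):
    """Share of equally parsimonious models in which A and B differ."""
    return sum(a != b for a, b in models) / len(models)

cost, models = most_parsimonious_nodes(("R", "R"), ("K", "D"))
print(cost, models)  # 2 [('R', 'D'), ('R', 'K'), ('R', 'R')]
```

In this toy site, three assignments tie at two substitutions and the ancestors differ in two of the three, which illustrates why some ancestral residues cannot be identified uniquely and why a site can be counted as substituted only in a fraction of the accepted models.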
Fig. 3. Luzzati plot of crystallographic R-factor against resolution. A total of 27 846 reflections between 8.0 and 1.5 Å resolution (F > 3σ(F)) was used in calculation.
Fig. 4. Active center structure of M-protease superimposed on the electron density map. Filled circles represent water molecules. Dotted lines indicate hydrogen bonds. Electron density map is contoured at 1.0σ level.
(Figure 4). Proteins from form 1 crystals were enzymatically active (data not shown). Thus, the inhibitor group was excluded from the model.
M-protease and related proteases have two Ca²⁺-binding sites. The geometry of the first site in the M-protease model is similar to that of subtilisin Carlsberg (Fitzpatrick et al., 1993). In the second binding site, however, the distances between the peak electron density at the site and ligand atoms were too long for a standard Ca²⁺ ligation distance; the ligation distances in M-protease range from 2.7 to 3.0 Å, whereas those in Carlsberg range from 2.3 to 2.4 Å. Tentatively, the final model contains a water molecule at the site.

Shifts of polypeptide segments in C-terminal half region
The model of M-protease was compared with that of subtilisin Carlsberg. The root mean square distance between the corresponding Cα atoms of the proteins was 0.85 Å when a total of 269 residues were superimposed. However, when the collective dislocation of Cα atoms between the superimposed structures was calculated, the positional shifts of six segments (referred to as the S1–S6 segments from the N- to C-termini) in M-protease were found to be significantly larger than the other portions (Figure 5). The S2 segment is known as a variable loop in subtilisin families (Goddette et al., 1992). The highly variable nature of the loop in both amino acid sequences and three-dimensional structures can be detected by collective dislocation analysis. Except for the variable loop, the five other segments contact each other directly or indirectly. The five segments are localized to the second Ca²⁺-binding site. These segments cover almost half of the protein surface, which is composed mainly of the C-terminal half region of the polypeptide (Figure 6). The movements of the segments are related to each other. The helix (residues 133–145) in the S3 segment moves toward the N-terminal direction. The loop region in the S5 segment follows this movement.
Fig. 5. Plots of Cα deviation (thin line) and collective dislocation index (thick line) between superimposed M-protease and Carlsberg against residue number of M-protease. Segments (S1–S6), which are composed of residues with indices > 10, are indicated by horizontal lines above plot. Numbers in parentheses indicate residue number of segments. Residue sites that take part in the crystal packing in M-protease and Carlsberg are indicated by open and closed circles, respectively. Positions of helices and strands in M-protease are indicated by filled and open boxes, respectively.
The C-terminal helix (residues 269–275) in the S6 segment moves toward the S1 and S4 segments and repels these segments slightly. The mean distances between corresponding Cα atoms of M-protease and Carlsberg were 0.93, 1.87, 1.29, 1.19, 0.86 and 0.90 Å for the S1–S6 segments, respectively. These values are approximately double the mean deviation (0.49 Å) of the other Cα atoms. Differences in the overall structure of the two proteins resulted from shifting of the surface segments in the C-terminal half region. Since the manners of crystal packing are different between the two proteins (Figure 5), the possibility that the segments are moved differently by the distinct contact modes with neighboring molecules cannot be ruled out. However, an analysis of amino acid substitutions in a subsequent section suggests that the shifting of the segments is an intrinsic difference between the two proteins.

Ancestral sequence deduction based on phylogenetic tree
The subtilisin family contains proteases or elastases which are alkaliphilic to varying degrees. In the present study, seven such enzymes (six proteases and an elastase) are categorized as high-alkaline or
Fig. 6. Collectively dislocated segments between M-protease and Carlsberg. Trace of Cα atoms of M-protease is indicated by thin lines. Peptide segments of Carlsberg, only those corresponding to S1–S6 of M-protease, are shown by thick lines and superimposed on M-protease structure. Residues 23–46, 62–98 and 204–218 of M-protease are omitted to simplify figure. Filled circle indicates the second Ca²⁺-binding site. Labels NT and CT indicate N- and C-termini of protein, respectively.
Fig. 7. Phylogenetic tree of M-protease and its relatives. Evolutionary distances (in PAM: percent accepted point mutations) are shown on the branches. Two boxes indicate positions of ancestors A and B. All branches are more than 95% probable from 1000 bootstrap reconstructions.
alkaline enzymes. The high-alkaline enzymes, M-protease, PB92 and elastase YaB show optimal enzymatic activity at a pH around 12 (Tsai et al., 1986; Van Der Laan et al., 1991; Kobayashi et al., 1995). The optimal pH range for the alkaline proteases, subtilisins Carlsberg, J, BPN′ and bacillopeptidase F is reportedly 8–11 (Matsubara et al., 1958; Markland and Smith, 1971; Wu et al., 1990; Jang et al., 1992). The phylogenetic tree of the enzymes is shown in Figure 7. The three high-alkaline enzymes are clustered and appear to derive from a common ancestor. The amino acid substitutions, which are essential to the highly alkaliphilic characteristic of the three enzymes, likely accumulated during the process of the divergence between ancestors A and B (Figure 7). The ancestral sequences that were deduced to occupy nodes A and B are presented in Figure 1. Since there may be two or more equally probable models to explain the evolutionary process, several residue sites of the ancestral sequences could not be identified. The balance in amino acid composition between the ancestral sequences is shown

Table II. Balance in amino acid compositions from ancestors B to A

Amino acid    Gain    Loss    Difference (gain–loss)
Asn           4.4     0.5      3.9
Arg           2.9     1.0      1.9
His           1.6     0.3      1.3
Thr           3.7     2.5      1.2
Gln           3.1     2.0      1.1
Val           3.7     2.8      0.9
Ala           5.3     4.5      0.8
Trp           0.5     0.3      0.2
Cys           0.0     0.0      0.0
Leu           2.3     2.5     –0.2
Met           0.6     0.9     –0.3
Phe           0.2     0.6     –0.4
Tyr           0.6     1.3     –0.7
Ser           5.1     6.1     –1.0
Pro           1.0     2.1     –1.1
Ile           3.1     4.2     –1.1
Glu           0.9     2.3     –1.4
Gly           1.0     2.5     –1.5
Lys           0.6     2.1     –1.5
Asp           0.7     3.2     –2.5
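As a rough check on how the Table II balance relates to the isoelectric point, the gain and loss columns can be combined with formal side-chain charges. The sketch below is an illustration only: the variable and function names are hypothetical, and it assumes simple formal charges (Asp and Glu as –1, Lys and Arg as +1) while ignoring His, Tyr and the actual ionization state at any given pH.

```python
# Gain and loss per residue type, copied from Table II (ancestor B to ancestor A).
TABLE_II = {
    "Asn": (4.4, 0.5), "Arg": (2.9, 1.0), "His": (1.6, 0.3), "Thr": (3.7, 2.5),
    "Gln": (3.1, 2.0), "Val": (3.7, 2.8), "Ala": (5.3, 4.5), "Trp": (0.5, 0.3),
    "Cys": (0.0, 0.0), "Leu": (2.3, 2.5), "Met": (0.6, 0.9), "Phe": (0.2, 0.6),
    "Tyr": (0.6, 1.3), "Ser": (5.1, 6.1), "Pro": (1.0, 2.1), "Ile": (3.1, 4.2),
    "Glu": (0.9, 2.3), "Gly": (1.0, 2.5), "Lys": (0.6, 2.1), "Asp": (0.7, 3.2),
}

# Assumed formal charges; a simplification made for this sketch.
FORMAL_CHARGE = {"Asp": -1, "Glu": -1, "Lys": +1, "Arg": +1}

def balance(table):
    """Difference column of Table II: gain minus loss for each residue type."""
    return {aa: round(gain - loss, 1) for aa, (gain, loss) in table.items()}

def net_formal_charge_change(table):
    """Net change in formal charge implied by the balance of charged residues."""
    diff = balance(table)
    return round(sum(q * diff[aa] for aa, q in FORMAL_CHARGE.items()), 1)

print(balance(TABLE_II)["Asp"])            # -2.5
print(net_formal_charge_change(TABLE_II))  # 4.3
```

Under these assumptions the substitutions shift the formal charge by roughly four units in the positive direction, which is in line with the increase in pI discussed in the text.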
Table III. Comparison of interactions between M-protease and Carlsberg

                                       M-protease   Carlsberg   Difference (M-pro.–Carls.)a
                                                                 Dall    Drep    Danc
No. of H-bonds                         244          219          25      15      14
No. of ion pairsb                      10           6            4       2       2
No. of non-polar atomic contactsc      1334         1383         –49     –72     –26

aDall, total difference between M-protease and Carlsberg; Drep, difference made by side chains that replaced between M-protease and Carlsberg (103 residues); Danc, difference made by side chains that replaced between ancestors A and B (41 residues).
bNumber of residue pairs in which charged atoms exist within 4.0 Å.
cNumber of atom pairs of aliphatic or aromatic carbons that exist within 5.0 Å.
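The distance criteria in footnotes b and c of Table III lend themselves to a short sketch. The code below is a hedged illustration, not the procedure actually used for the table: the function names are hypothetical, the input is assumed to be pre-selected charged atoms tagged with residue labels, and the coordinates in the example are invented rather than taken from the structure.

```python
import numpy as np

def count_ion_pairs(positive_atoms, negative_atoms, cutoff=4.0):
    """Count residue pairs with at least one pair of charged atoms within cutoff (A).

    positive_atoms, negative_atoms: iterables of (residue_label, (x, y, z)).
    A residue pair is counted once, however many atom pairs satisfy the cutoff.
    """
    pairs = set()
    for res_p, xyz_p in positive_atoms:
        for res_n, xyz_n in negative_atoms:
            if np.linalg.norm(np.subtract(xyz_p, xyz_n)) <= cutoff:
                pairs.add((res_p, res_n))
    return len(pairs)

def percent_change(reference, value):
    """Relative change from reference to value, in percent."""
    return 100.0 * (value - reference) / reference

# Invented coordinates: two Arg19 atoms near one Glu271 atom, Lys251 far from Asp197.
positive = [("Arg19", (0.0, 0.0, 0.0)), ("Arg19", (1.0, 0.0, 0.0)),
            ("Lys251", (20.0, 0.0, 0.0))]
negative = [("Glu271", (3.0, 0.0, 0.0)), ("Asp197", (30.0, 0.0, 0.0))]
print(count_ion_pairs(positive, negative))  # 1

# Relative changes from Carlsberg to M-protease, using the counts in Table III.
print(round(percent_change(219, 244)))    # 11
print(round(percent_change(6, 10)))       # 67
print(round(percent_change(1383, 1334)))  # -4
```

The same pair-counting loop, applied to aliphatic and aromatic carbon atoms with a 5.0 Å cutoff and counting atom pairs instead of residue pairs, would correspond to the non-polar contact definition in footnote c.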
in Table II. Aspartic acid and lysine composition decreased, whereas that of arginine increased, suggesting that the evolutionary process involves an increase in the isoelectric
point (pI) of proteins (Goddette et al., 1992; Van Der Laan et al., 1992). The loss of tyrosine, which is also believed to be involved in this strategy, is not as evident from this analysis. The loss of another negatively charged residue, glutamic acid, appears to contribute to the increase in pI. Asparagine, glutamine and histidine content also increased. These residues are predominantly uncharged at a high-alkaline pH. An increase in these neutral residues might help to maintain protein solubility in water by compensating for the loss of lysine and negatively charged residues. The substitutions appear to be responsible for the increase in pI of M-protease to 10.6 (Kobayashi et al., 1995) from that of Carlsberg at 9.4 (Markland and Smith, 1971).
The total number of substitutions observed when the two ancestral sequences are compared is 42.1 residues. The predicted number of substitutions based on the evolutionary distance (branch length between nodes A and B in the tree) is 37.4, which is ~89% of the observed number. Thus, the number of substitutions appears to be over-predicted by the comparison of ancestral sequences, although the amount is within a reasonable range.

Spatial distribution of the substituted residues between two ancestors
The ancestral sequences (A and B in Figure 1) show that 41 residues have been substituted, and six residues have been deleted during the course of adaptation, i.e. those residue sites of ancestors A and B are occupied by different amino acids in more than half of all accepted evolutionary models. When any pair of high-alkaline and alkaline enzyme sequences is directly compared, the counts of substitutions are within a range of 100–167 sites. Thus, only 25–41% of apparent substitutions are suggested to have accumulated during the high-alkaline adaptation process.
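The percentages quoted in the two preceding paragraphs follow directly from the counts given in the text; the short sketch below only reproduces that arithmetic (the helper name is hypothetical).

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole

# Substitutions predicted from the A-B branch length vs. those counted
# by comparing the two ancestral sequences.
print(round(percent(37.4, 42.1)))  # 89

# The 41 ancestral substitutions relative to the 100-167 differences seen
# when a high-alkaline and an alkaline enzyme are compared directly.
print(round(percent(41, 167)), round(percent(41, 100)))  # 25 41
```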
Spatial distribution of the 41 residues shows that those residues are mainly localized to the hemisphere of the globular protein molecule where the segment shifts are observed (Figure 8a). Thirteen out of the 41 residues are found at the buried interface of the shifted segments (Figure 8b). The residue Thr58 (in the numbering system and residue type of M-protease) is on the S2 segment and forms the interface. The residues Ala108, Leu111, Ala122, Thr134 and Thr143 are found at the interface of the S3 segment. They take part in the interactions between the helix (residues 133–145) in the S3 segment and other regions. The other seven residues, Val11, Leu148, Met175, Val199, Val234, Ile246 and Thr274, are placed in a ring shape and form interfaces with segments S1 and S4–S6. These substitutions at the interior of the protein molecule appear to have been a driving force for the shift in the segments.
The increase in pI appears to be involved in the alkaline adaptation strategy. The pI of M-protease increased as the result of the decrease in aspartic acid, glutamic acid and lysine content, and the increase in arginine content. Thirteen out of the 41 sites were substituted in this manner, although some of these cases were ambiguous (Figures 1 and 8c). A total of three ion pairs are involved in the 13 substituted residues: Arg19 ... Glu271, Arg170 ... Tyr167 and Arg275 ... Glu271 (Figure 8c). It is significant that arginine, which can retain a positive charge under high-alkaline conditions, was acquired to form these ion pairs.

Difference in interactions between M-protease and Carlsberg
Difference in interactions between M-protease and Carlsberg structures is summarized in Table III. Numbers of hydrogen
bonds and ion pairs increased by approximately 11 and 67%, respectively, and that of hydrophobic atomic contacts decreased by 4% from Carlsberg to M-protease. The results suggest that the acquired hydrogen bonds and ion pairs play an important role in alkaline adaptation. Remarkably, the differences in the numbers of hydrogen bonds and ion pairs, which are made by 103 side chains replaced between M-protease and Carlsberg (Drep in Table III), are dominated by the contributions from the 41 substituted residues between ancestors A and B (Danc in Table III). The 41 substituted residues made no direct interaction with the catalytic sites (Asp32, His64 and Ser221) of M-protease and Carlsberg and, as a result, cause no significant alteration in the catalytic center structures.
Some of the acquired arginine residues appear to stabilize the protein structure by fastening together both terminal regions of the protein molecule. The ion pair or hydrogen bond (Arg19 Nη2 ... Glu271 Oε1) between Arg19, one of the acquired arginine residues, and Glu271 is formed as if they combine the segments S1 and S6 that are almost at the N- and C-termini of the protein (Figure 8d). Arg19 also forms a hydrogen bond with the side chain of Thr274 (Arg19 Nη1 ... Thr274 Oγ1). Since Thr274 appears to have been substituted in the branch between the two ancestors (Figure 1), the hydrogen bond was likely to be acquired during the adaptation process. The hydrogen bond network is extended to the second Ca²⁺-binding site through residues Arg275 (Arg275 Nε ... Glu271 Oε1 and Arg275 Nη2 ... Glu271 Oε2), His249 (His249 Nε2 ... Ala273 O) and Lys251 (Lys251 Nζ ... Asp197 Oδ1) which have been acquired during the adaptation process. The other acquired arginine, Arg10 in the S1 segment, forms hydrogen bonds (Arg10 Nη1 ... Gln182 O and Arg10 Nη2 ... Gln182 O) with the S4 segment and combines the segments which are separated by approximately 170 residues in the primary structure.
The hydrogen bond network involves three ion pairs (Arg19 ... Glu271, Lys251 ... Asp197 and Arg275 ... Glu271) and the eight hydrogen bonds (those mentioned above) that are newly acquired in M-protease. The interactions that compose the network account for a significant proportion of the net increase in interactions (Table III). Asp197, the terminal residue of the network, serves as a ligand when a Ca²⁺ ion binds to the second binding site (Figure 8d). Since binding of the Ca²⁺ ion enhances the stability of subtilisins, a possible function of the network is to transmit the stabilizing effect to the portion where both terminal regions of the protein molecule meet.

Localization of the substituted residues
The 13 charge-altering substitutions also appear to be localized to the shifted segments (Figure 8c). Some of these substituted residues (Arg10, Gln12, Arg19, Glu136, Asn173, Val244, Ser265 and Arg275) are on or directly in contact with the segments. Both the substituted residues at the buried interface (Figure 8b) and those in the surface region appear to be localized and related with respect to the shifted segments.
A possible explanation for this localization involves interactions among the amino acid substitutions that occurred as a series of evolutionary events. Amino acid substitutions in a compensatory or correlated manner are often found in proteins. A frequently observed feature is that correlatively substituted residues exist proximal in space (Altschuh et al., 1987; Shindyalov et al., 1994). Substitution of a residue affects
Fig. 8. Residues substituted between two ancestors are viewed on M-protease structure. (a) Cα trace of M-protease is represented by tube model. Shifted segments are shown in dark yellow and other regions are in cyan. Side chains and Cα atoms of 41 substituted residues are represented by space filling models in light green. (b) Thirteen substituted residues found at interface of shifted segments are shown as stick models (in green). (c) Residues that involve ionizable residues in substitutions (in blue) and partners of ion pairs (Tyr167 and Glu271 in yellow green) are shown as stick models. (d) Hydrogen bond network formed by substituted residues. Main chain atoms of shifted segments are shown in dark yellow and those of other regions are in cyan. Side chains of substituted residues (Arg10, Arg19, His249, Lys251, Thr274 and Arg275) are shown in green and residues (Asp197 and Glu271) that mediate network are in blue. Hydrogen bonds are shown in yellow. Light-blue sphere indicates the second Ca²⁺-binding site.
substitutions of contacting residues. This suggests that a substitution influences surrounding residues by a kind of domino effect. Traces of such propagation may be more clearly observed as a biased distribution of the substituted residues, because we extracted substitutions in a specific period during the evolutionary process. It is possible that shifts in the segments and substitutions of hydrophobic residues at the interface occurred in response to substitutions of charged residues. These responses may reduce possible stress that was introduced by an alteration in the charge distribution at the protein surface, or optimize the interactions formed by the acquired residues. The substituted residues are involved in the specific intramolecular interactions that stabilize the structure, rather than simply being modifiers of the net charge of the protein.

Acknowledgements
We thank Drs M.Suzuki, N.Watanabe, K.Sakabe and N.Sakabe for their assistance in data collection at the Photon Factory. Data collection was approved by the Photon Factory Advisory Committee (proposal 95G058). We thank Dr M.Go for her comments on the manuscript. This work was supported in part by Grants-in-Aid for Encouragement of Young Scientists (No. 08780618) to T.S., Developmental Scientific Research (B) (No. 07554059) to T.Y. and Scientific Research on Priority Areas (No. 05244102) to T.A. from the Ministry of Education, Science, Sports and Culture of Japan.

References
Adachi,J. and Hasegawa,M. (1992) Computer Science Monographs, No. 27, Molphy: Programs for Molecular Phylogenetics, I. - PROTML: Maximum Likelihood Inference of Protein Phylogeny. Institute of Statistical Mathematics, Tokyo.
Altschuh,D., Lesk,A.M., Bloomer,A.C. and Klug,A. (1987) J. Mol. Biol., 193, 693–707.
Betzel,C., Klupsch,S., Papendorf,G., Hastrup,S., Branner,S. and Wilson,K.S. (1992) J. Mol. Biol., 223, 427–445.
Brünger,A.T., Kuriyan,J. and Karplus,M. (1987) Science, 235, 458–460.
Cambillau,C. (1992) Turbo-FRODO, Molecular Graphics Program for Silicon Graphics IRIS 4D Series, Version 3.0. Bio-Graphics, Marseille, France.
Chu,N.-M., Chao,Y. and Bi,R.-C. (1995) Protein Engng, 8, 211–215.
Crowther,R.A. (1972) In Rossmann,M.G. (ed.), International Science Review, vol. 13, The Molecular Replacement Method. Gordon & Breach, New York, pp. 173–178.
Dauter,Z., Betzel,C., Höhne,W.-E., Ingelman,M. and Wilson,K.S. (1988) FEBS Lett., 236, 171–178.
Dayhoff,M.O., Schwartz,R.M. and Orcutt,B.C. (1978) In Dayhoff,M.O. (ed.), Atlas of Protein Sequence and Structure, vol. 5, supp. 3, National Biomedical Research Foundation, Washington DC, pp. 345–352.
Fitzpatrick,P.A., Steinmetz,A.C.U., Ringe,D. and Klibanov,A.M. (1993) Proc. Natl Acad. Sci. USA, 90, 8653–8657.
Goddette,D.W., Paech,C., Yang,S.S., Mielenz,J.R., Bystroff,C., Wilke,M.E. and Fletterick,R.J. (1992) J. Mol. Biol., 228, 580–595.
Hakamada,Y., Kobayashi,T., Hitomi,J., Kawai,S. and Ito,S. (1994) J. Ferment. Bioengng, 78, 105–108.
Higashi,T. (1989) J. Appl. Crystallogr., 22, 9–18.
Jacobs,M., Eliasson,M., Uhlén,M. and Flock,J.-I. (1985) Nucleic Acids Res., 13, 8913–8926.
Jaenicke,R. (1991) Eur. J. Biochem., 202, 715–728.
Jang,J.S., Kang,D.O., Chun,M.J. and Byun,S.M. (1992) Biochem. Biophys. Res. Commun., 184, 277–282.
Kaneko,R., Koyama,N., Tsai,Y.-C., Juang,R.-Y., Yoda,K. and Yamasaki,M. (1989) J. Bacteriol., 171, 5232–5236.
Kishino,H., Miyata,T. and Hasegawa,M. (1990) J. Mol. Evol., 31, 151–160.
Kobayashi,T., Hakamada,Y., Adachi,S., Hitomi,J., Yoshimatsu,T., Koike,K., Kawai,S. and Ito,S. (1995) Appl. Microbiol. Biotechnol., 43, 473–481.
Kraut,J. (1977) Annu. Rev. Biochem., 46, 331–358.
Van Der Laan,J.C., Gerritse,G., Mulleners,L.J.S.M., Van Der Hoek,R.A.C. and Quax,W.J. (1991) Appl. Environ. Microbiol., 57, 901–909.
Van Der Laan,J.M., Teplyakov,A.V., Kelders,H., Kalk,K.H., Misset,O., Mulleners,L.J.S.M. and Dijkstra,B.W. (1992) Protein Engng, 5, 405–411.
Luzzati,P.V. (1952) Acta Crystallogr., 5, 802–810.
Markland,F.S. and Smith,E.L. (1967) J. Biol. Chem., 242, 5198–5211.
Markland,F.S. and Smith,E.L. (1971) In Boyer,P.D. (ed.), The Enzymes. Academic Press, New York, vol. 3, pp. 561–608.
Matsubara,H., Hagihara,B., Nakai,M., Komaki,T., Yonetani,T. and Okunuki,K. (1958) J. Biochem., 45, 251–258.
McPhalen,C.A., Schnebli,H.P. and James,M.N.G. (1985) FEBS Lett., 188, 55–58.
Pähler,A., Banerjee,A., Dattagupta,J.K., Fujiwara,T., Lindner,K., Pal,G.P., Suck,D., Weber,G. and Saenger,W. (1984) EMBO J., 3, 1311–1314.
Saitou,N. and Nei,M. (1987) Mol. Biol. Evol., 4, 406–425.
Sakabe,N. (1983) J. Appl. Crystallogr., 16, 542–547.
Shindyalov,I.N., Kolchanov,N.A. and Sander,C. (1994) Protein Engng, 7, 349–358.
Sloma,A., Rufo,G.A., Rudolph,C.F., Sullivan,B.J., Theriault,K.A. and Pero,J. (1990) J. Bacteriol., 172, 1470–1477.
Sobek,H., Hecht,H.J., Hofmann,B., Aehle,W. and Schomburg,D. (1990) FEBS Lett., 274, 57–60.
Stewart,C.-B., Schilling,J.W. and Wilson,A.C. (1987) Nature, 330, 401–404.
Tsai,Y.-C., Lin,S.-F., Li,Y.-F., Yamasaki,M. and Tamura,G. (1986) Biochim. Biophys. Acta, 883, 439–447.
Wright,C.S., Alden,R.A. and Kraut,J. (1969) Nature, 221, 235–242.
Wu,X.-C., Nathoo,S., Pang,A.S.-H., Carne,T. and Wong,S.-L. (1990) J. Biol. Chem., 265, 6845–6850.
Yamane,T., Kani,T., Hatanaka,T., Suzuki,A., Ashida,T., Kobayashi,T., Ito,S. and Yamashita,O. (1995) Acta Crystallogr., D51, 199–206.

Received November 20, 1996; revised January 30, 1997; accepted February 10, 1997