Article No. jmbi.1998.2333 available online at http://www.idealibrary.com on
J. Mol. Biol. (1999) 285, 741-753
Exploring the Conformational Properties of the Sequence Space Between Two Proteins with Different Folds: An Experimental Study Francisco J. Blanco*, Isabelle Angrand and Luis Serrano European Molecular Biology Laboratory (EMBL) Meyerhofstrasse 1 69117 Heidelberg, Germany
We have examined the conformational properties of 27 polypeptides whose sequences are hybrids of two natural protein domains with 8 % sequence identity and different structures. One of the natural sequences (spectrin SH3 domain) was progressively mutated to get closer to the other sequence (protein G B1 domain), with the only constraint of maintaining the residues at the hydrophobic core. Only two of the mutants are folded, each of them having a large sequence identity with one of the two natural proteins. The rest of the mutants display a wide range of structural properties, but they lack a well-defined three-dimensional structure, a result that is not recognized by computational tools commonly used to evaluate the reliability of structural models. Interestingly, some of the mutants exhibit cooperative thermal denaturation curves and a signal in the near-ultraviolet circular dichroism spectra, both typical features of folded proteins. However, they do not have a well-dispersed nuclear magnetic resonance spectrum indicative of a defined tertiary structure. The results obtained here show that both the hydrophobic core residues and the surface residues are important in determining the structure of the proteins, and suggest that the appearance of a completely new fold from an existing one is unlikely to occur by evolution through a route of folded intermediate sequences.
© 1999 Academic Press
*Corresponding author
Keywords: protein folding; secondary structure; protein stability; modeling; evolution
Present address: F. J. Blanco, National Institutes of Health, Laboratory of Chemical Physics, NIDDK, 9000 Rockville Pike, Building 5, Room 406, Bethesda, MD 20892-0520, USA.
Abbreviations used: UV, ultraviolet; ANS, 1-anilinonaphthalene-8-sulfonate; 1D, one-dimensional; 2D, two-dimensional; TFE, trifluoroethanol; SH3, α-spectrin SH3 domain; PG, B1 domain of streptococcal protein G.
E-mail address of the corresponding author: [email protected]
0022-2836/99/020741-13 $30.00/0
Introduction
Understanding protein folding involves unraveling the mechanisms by which natural proteins attain their special physical properties, as compared to random heteropolymers, and in particular their ability to fold into a unique defined state (their native state). It has been theoretically proposed that the necessary and sufficient condition for folding is the existence of an energy gap between the native state and any of the other possible conformations of a protein (Bryngelson et al., 1995; Karplus & Sali, 1995). The fraction of all the possible amino acid sequences with this property is unknown, but from the global examination of protein structures, it can be inferred that a common feature must be their ability to maximize hydrophobic side-chain burial (formation of a hydrophobic core) while keeping the polar groups either solvated or involved in hydrogen bonds (secondary structure formation; Cordes et al., 1996; Beasley & Hecht, 1997). This is supported by the fact that while proteins are very tolerant to mutations, the core remains hydrophobic and maintains a reasonable packing while the surface only shows a bias against large hydrophobic residues (Bowie et al., 1990). It has been proposed that only some of the information (that is, some of the interactions) in a protein sequence is necessary to specify a fold, with the rest just stabilizing that specified structure (the specificity/stability conundrum; Lattman & Rose, 1993; Rose & Creamer, 1994). This is related
to another dualism in protein folding models, the local/global one, where folding is viewed as a global process (driven by non-local interactions) or locally determined (driven by local interactions; Dill et al., 1995; Rose & Creamer, 1994). One way to experimentally explore the relative importance of both types of interactions in determining the final three-dimensional structure of a protein is to change or modify all the local interactions while keeping constant all the non-local ones. This is not possible in practical terms, since many of the amino acid residues of a protein participate to a different degree in both types of interactions. However, in a very rough way we can partition the residues in a protein between those that participate mainly in non-local interactions (those forming the core of the protein) and the rest, which have fewer interactions, most of them local in nature. If non-local hydrophobic interactions are unspecific in nature and mainly stabilize a protein through the hydrophobic effect, we might expect that by changing all the other residues into those of another protein of similar size, while simultaneously creating the core of this target protein, the folding of the chain would change to adopt the structure of the second protein. Alternatively, if non-local interactions specify the fold, keeping the core of one protein unchanged should maintain to a certain extent the ability of the modified sequences to fold into that structure.
The two proteins we have chosen for this study are the α-spectrin SH3 domain (SH3) and the B1 domain of streptococcal protein G (PG; Figure 1). Both are compact globular proteins with very low sequence identity (8 %) and very different three-dimensional structures, as determined by X-ray crystallography (Musacchio et al., 1992b; Gallagher et al., 1994) and NMR (Gronenborn et al., 1991; Blanco et al., 1997).
SH3 is a 62-residue protein that folds as an eight-stranded orthogonal β-sheet sandwich, and PG is a 57-residue protein that has a central α-helix packed against a four-stranded β-sheet. They are small enough to attempt the experimental characterization of a number of hybrid sequences and monitor the change in the structural properties of the sequences as they diverge from one and approach the other. Both proteins follow a simple two-state folding transition (Viguera et al., 1994; Alexander et al., 1992a,b), although the conformational tendencies of their isolated secondary structure elements are very different. In the case of SH3, these tendencies are very small (Viguera et al., 1995), while in PG the second hairpin and the α-helix show significant native-like secondary structure populations (Blanco & Serrano, 1995). The two proteins also have different stabilities (Viguera et al., 1994; Alexander et al., 1992a), PG being particularly stable given its small size. These facts suggest mutation of the sequence of SH3 towards PG, as it could require fewer changes in the sequence to change the fold than starting with PG.
Structures of Hybrid Protein Sequences
Figure 1. Ribbon representation of the three-dimensional structures of (a) the α-spectrin SH3 domain (PDB code 1shg) and (b) the protein G B1 domain (PDB code 1pgb). The Figures were generated with the program MOLSCRIPT (Kraulis, 1991).
Results
Alignment of the proteins and the strategy of mutation
The two proteins have a very low degree of sequence identity in a number of possible alignments. The final alignment was done manually so that the number of aligned hydrophobic residues (Ala, Val, Leu, Ile, Met, Pro, and Phe) and identical residues was maximized (five identities and six hydrophobic residues aligned), while the sequence of protein G was still contained within the slightly longer sequence of SH3. In this way the mutation strategy would not lead to sequences with very different hydrophobic/hydrophilic residue distributions (see the sequence analysis in Materials and Methods). The cores of both proteins were defined as those residues which had less than 5 % solvent-accessible area as measured with the program WHATIF (Vriend, 1990). Eight residues were found with this property in each of the two proteins (Table 1). The substitution of the non-core residues of SH3 by the aligned residues of protein G gradually introduces the core of protein G, but the mutant protein retains the hydrophobic core of SH3 (Table 1). In Table 2, we summarize the structural properties of all the mutants determined from several different techniques. The first mutations were chosen based on solvent accessibility criteria (residues having more than 25 % solvent accessibility). The order and groups of mutations were also selected from practical considerations (the use of one oligonucleotide to mutate several residues close in the sequence) and the knowledge of the folding properties of the secondary structure elements of the proteins.
Table 1. Sequences of the 27 mutants plus the two wild-type sequences of SH3 and PG
Protein | Sequence | aa number | %SH3 | %PG | aacSH3 | aacPG
In color code we show those residues having different solvent accessibility. Red color indicates those residues having less than 5 % solvent accessibility. Blue color indicates residues with less than 25 % solvent accessibility. Black color represents residues with more than 25 % solvent accessibility. The underlined residues are those that have been changed in SH3 either by replacement or insertion. aa number, number of amino acids. %SH3, percentage of identity with SH3. %PG, percentage of identity with PG. aacSH3, number of SH3 core residues (those having less than 5 % accessibility). aacPG, number of PG core residues (those having less than 5 % accessibility).
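The alignment criteria and the accessibility-based core definition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy sequences, the choice of denominator for percent identity, the rule that identical positions are not also counted as hydrophobic pairs, and the treatment of a residue with exactly 25 % accessibility are all hypothetical and are not taken from the article.

```python
# Hydrophobic residue types named in the text (Ala, Val, Leu, Ile, Met, Pro, Phe).
HYDROPHOBIC = frozenset("AVLIMPF")


def alignment_stats(seq_a, seq_b, reference_length):
    """Count identities and aligned hydrophobic pairs in a gapped alignment.

    seq_a and seq_b are equal-length aligned strings with '-' marking gaps.
    reference_length is the denominator for percent identity (an assumption:
    the article does not state which length it used).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identities = 0
    hydrophobic_pairs = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gap positions contribute nothing
        if a == b:
            identities += 1
        elif a in HYDROPHOBIC and b in HYDROPHOBIC:
            # Non-identical but both hydrophobic (this counting rule is an assumption).
            hydrophobic_pairs += 1
    percent_identity = 100.0 * identities / reference_length
    return identities, hydrophobic_pairs, percent_identity


def accessibility_class(percent_accessible):
    """Bin a residue by relative solvent accessibility, using the Table 1 cutoffs."""
    if percent_accessible < 5.0:
        return "core"
    if percent_accessible < 25.0:
        return "intermediate"
    return "exposed"  # 25 % or more; handling of exactly 25 % is an assumption


# Toy aligned sequences, invented for illustration only (not SH3 or PG).
toy_a = "MAVLK-DE"
toy_b = "MLVIKGDQ"
print(alignment_stats(toy_a, toy_b, reference_length=8))
print([accessibility_class(x) for x in (3.0, 12.0, 40.0)])
```

With five identities and the 62-residue SH3 as the reference length, the same formula gives about 8 %, which agrees with the identity quoted in the text if that denominator is the one the authors used.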
Mutants m2, m3, m4, m5, and m7 were made in the region corresponding to the α-helix and second β-hairpin of PG, regions that show native structure formation when isolated from the rest of the protein (Blanco et al., 1994; Blanco & Serrano, 1995). Next we made the sequences more similar to PG at the first hairpin also (mutants m6 and m8), and at the same time we shortened the sequence at the N terminus, as the first five residues of SH3 are quite flexible and do not make contacts with the rest of the protein (Musacchio et al., 1992b; Blanco et al., 1997). A control SH3 protein without three of the N-terminal residues was made to check that the protein was still folded (m22). More mutations were introduced at several points in the sequence, yielding mutants m9 and m12. Mutant m12 has 52 % sequence identity with SH3 and 55 % sequence identity with PG but does not fold into a defined tertiary structure (see below). The inspection of a model structure of this sequence built onto the PG three-dimensional structure suggests that a major reason for the inability of this mutant to adopt the structure of PG could be the presence of a valine residue with backbone dihedral angles in the left-handed helical region of the Ramachandran map (the angles of Lys50 in PG). These angles are very unfavorable for a β-branched residue; however, it is one of the residues at the hydrophobic core of SH3, and it has been shown that its mutation to alanine strongly destabilized the protein (Viguera et al., 1996). However, the mutation of this residue to Lys (mutants m11 and m13) did not yield a protein able to fold as either SH3 or PG.
We then decided to continue the original mutation strategy without changing the core of the SH3 protein by the introduction of an insertion, so that Val53 in SH3 was now aligned with a residue with more favorable backbone angles. Three residues were inserted in the middle of a two-residue β-turn of SH3 (with the sequence Asn47-Asp48), a position where some members of the SH3 family of sequences present insertions of various lengths (Musacchio et al., 1992a). This position permits the displacement of Val51 so that it is aligned with a threonine residue in protein G. A mutant protein was constructed with the sequence of SH3 plus the insertion of three residues (Tyr-Asp-Asp from the sequence of protein G in the alignment). In addition, Asp48 was mutated to Ala to avoid the presence of three contiguous aspartate residues in the sequence. This protein (m14) shows all the structural properties of SH3, although it is less stable (a decrease in the midpoint of temperature denaturation of about 10 K), confirming that the protein tolerates this insertion. The same residues were also inserted in mutant m8, rendering m15.
We continued changing SH3 into PG while keeping the SH3 core residues (m16, m19, m20, m21, m24, and m25). At this point the hypothesis test was completed. Mutant m25 contains the cores of both PG and SH3 (Table 1) but it does not fold into a defined tertiary structure. Even removing the last six residues that could interfere with the structure of protein G (m36) does not favor the folding into the structure of protein G, although the sequence identity is 91 % (and 23 % identity with SH3). In order to achieve the final PG fold it was necessary to eliminate some of the eight residues of the core of SH3. To do that we followed structural criteria. Met25 of SH3 would occupy the N-cap position of the PG helix. Methionine is a poor residue for this structural role, which is filled by an aspartate residue in PG. However, this sole substitution is not enough to achieve proper folding (m27). Two other residues could be impeding the folding into the PG structure: Val44, which occupies the position of a glycine residue at the connection of the helix and the hairpin in PG, and whose side-chain could clash with Phe33 as suggested by a model built onto the structure of PG; and Gly51, which should be replaced by a threonine residue, in the middle of the last β-strand of protein G, since glycine is very unfavorable for β-sheet structure formation (Kim & Berg, 1993). The individual mutations, combined with the Met25 to Asp mutation, do not drive the folding into the structure of PG (m28 and m29), although m29 could form a small population of a folded structure (see below). When the three residue substitutions are made simultaneously in the same sequence (m31, with 96 % sequence identity with protein G), the fold of protein G is finally obtained (Table 2).
Table 2. Summary of the experimental data for all the mutants (except m12 and m15, for which no expression was obtained) and the wild-type proteins
Entries in each block follow the order of the proteins in its Mutant line.
Mutant: SH3; PG
FUV-CD (a): SH3 like; helix sheet
NUV-CD (b): yes; yes
Tempden: cooperative; cooperative
ANS: n.d.; n.d.
NMR: dispersed; dispersed
Mutant: m22; m10; m14
FUV-CD (a): SH3 like; RC; SH3 like
NUV-CD (b): n.d.; no; n.d.
Tempden: cooperative; linear; cooperative
ANS: n.d.; no; n.d.
NMR: dispersed, SH3 like; not dispersed; dispersed, SH3 like
Mutant: m2; m3; m4; m7; m5; m6; m8; m9; m11; m13
FUV-CD (a): SH3 like coil helix RC RC RC helix RC helix sheet helix, agg sheet RC, agg coil
NUV-CD (b): no; n.d.; no; n.d.; no; no; weak; very weak; yes; no
Tempden: cooperative; linear; linear; linear; complex; linear; complex; complex; cooperative; linear
ANS: n.d.; n.d.; n.d.; n.d.; n.d.; n.d.; no; no; no; no
NMR: dispersed; not dispersed; not dispersed; not dispersed; not dispersed; n.d.; not dispersed; not dispersed, broad; not dispersed, broad; not dispersed
Mutant: m16; m21; m20; m19; m24; m28; m25; m26; m27; m29; m30; m31
FUV-CD (a): sheet coil, agg RC RC sheet, agg RC sheet, agg sheet, agg RC RC helix RC helix helix sheet
NUV-CD (b): very weak; no; no; no; no; no; weak; no; no; very weak; yes; yes
Tempden: cooperative; linear; linear; linear; cooperative like; linear; cooperative like; complex; linear; cooperative; cooperative; cooperative
ANS: yes; n.d.; n.d.; n.d.; n.d.; n.d.; n.d.; n.d.; n.d.; n.d.; n.d.; n.d.
NMR: not dispersed, broad; not dispersed; not dispersed; not dispersed*; not dispersed, broad; not dispersed; not dispersed; not dispersed; not dispersed; not dispersed*; not dispersed, broad; dispersed
(a) Secondary structure revealed by far-UV circular dichroism. For SH3, the abundance of aromatic residues, including two tryptophan residues, makes the CD very specific for this domain with no similarity to typical β-sheet protein spectra. For the rest of the sequences, the secondary structure description is just a qualitative evaluation based on the relative intensities of the dichroic bands as compared with reference spectra for random coil (RC), α-helix and β-sheet (Johnson, 1988). Concentration dependency of CD is indicated by agg (aggregation).
(b) Signal in the near-UV circular dichroism spectrum. The criterion used was to compare the spectrum of the protein in the absence and the presence of 6 M urea and with the spectrum of the wild-type sequences.
Tempden, shape of the temperature denaturation curve. ANS, binding of ANS. NMR, shape of the one-dimensional 1H NMR spectra.
* In these cases there are some minor signals which show some dispersion and which disappear at high temperature.
n.d., not determined.
Structural analysis of the mutant proteins
An extensive analysis of the formation of secondary structure by the mutant proteins was performed by UV circular dichroism in a wide range of conditions: pH values in the range of 2.5-12, denaturing agents (6 M urea), mixed organic/aqueous solvents (trifluoroethanol (TFE) or acetonitrile between 5 and 50 % (v/v) in water) and non-ionic detergents such as octyl glucoside (Table 2). The effect of temperature on secondary structure was measured in the range 278-363 K, as a cooperative loss of structure is a common feature of folded proteins (Table 2). For some of the mutants, near-UV circular dichroism was measured in the presence and absence of 6 M urea to monitor the formation of asymmetric structural ordering around the aromatic residues, normally associated with defined tertiary structures (Table 2). The existence of solvent-accessible hydrophobic regions was monitored by the changes in fluorescence of 1-anilino-naphthalene-8-sulfonate (ANS) in the presence and absence of the protein (Table 2). One-dimensional (1D) 1H NMR spectra were also recorded for all the mutants at different temperatures, pH values and mixtures of TFE/water (Table 2). The dispersion of the proton signals gives information about the formation of tertiary structures. For some of the mutants, two-dimensional (2D) spectra were also recorded in order to assign the individual resonances to specific residues in the sequence. In addition, possible aggregation was checked by measuring the concentration dependency of the NMR signal chemical shifts and line widths, and the shape of the far-UV CD spectrum.
The 27 mutants analyzed display a wide range of behaviors. A brief summary of the structural analysis is reported in Table 2. Far-UV circular dichroism spectra, thermal denaturation followed by far-UV CD, and the one-dimensional NMR spectra of some representative mutants are shown in Figures 2, 3, and 4. Based on the data shown in Table 2, the mutants can be grouped into seven different categories:
(1) Mutants that fold into a structure similar to SH3 (m2, m14, and m22). Mutants m22 and m14 fold as SH3 and they are controls for the effect of shortening the N terminus and for the insertion of three residues between Asp47 and Asn48. The m2 protein folds as wild-type SH3, albeit with reduced stability, as shown by the temperature denaturation curve (Figure 3(a); the temperature at the midpoint is reduced by about 10 K as compared with the wild-type sequence). The NMR signals were dispersed, and correspond to the formation of a distinct fold, although they are somewhat broadened relative to the spectrum of SH3 or PG, indicating a more mobile or fluctuating structure (Figure 4).
(2) Soluble mutants that aggregate only at very extreme pH values: mutants m4 and m6 could only be analyzed at pH 2.5, where they formed intermolecular β-sheet structures, or at pH 11, where they behaved as random coils (data not shown).
(3) Mutants that are soluble at pH 6-7 (m9, m11, m16, m24, m25, and m26). They typically have a high content of secondary structure (with different proportions of helix, sheet or coil structures as qualitatively inferred from the shape of the far-UV spectra), but no tertiary structure as monitored by the signal dispersion in the 1H NMR spectrum. The aggregation is detected through the dependency of the far-UV CD spectra on concentration and by broad 1H NMR signals. Apart from these common features, other structural probes yielded different results for these mutants. Some of them present a cooperative transition, with two defined plateaus (m11 and m16), or close to a cooperative one, when one of the plateaus would lie outside the measured 278-363 K temperature range (m24 and m25; data not shown). For mutants m9 (Figures 2-4) and m26 (data not shown) the curves are classified as complex (non-linear curves at low and high temperature and a strong deviation from a sigmoidal curve; Figure 3(b)). Some of the mutants have some signal in the near-UV CD spectrum, a feature normally associated with fixed tertiary structures. Also, most of them do not bind ANS, a probe for the existence of accessible partly formed hydrophobic cores in the protein (with the exception of m16). All these features suggest that these molecules adopt compact but poorly defined globular structures, which make them prone to aggregation.
(4) Mutants with a high content of secondary structure and apparently non-aggregating but without a defined tertiary structure (m5, m8, m29, and m30). Mutants m5 and m8 do not show a dependency of CD signal on concentration, and the NMR signals are not dispersed and are broad, but not as broad as those in the previous group. This suggests the formation of partly folded structures in the monomeric state. The temperature denaturation curves are complex and not reversible, suggesting intermolecular interactions at least at high temperatures (data not shown). The temperature denaturation of m29 exhibits a cooperative transition although the first plateau is not seen (Figure 3(b)), and interestingly, a minor set of dispersed signals is recognizable at the high-field region of the one-dimensional NMR spectrum (Figure 4). This set of signals decreases in intensity with respect to the major ones when the temperature is reduced or increased from 278 K, suggesting that there is a small population of molecules folding into a defined three-dimensional structure, but with low stability. The intensity of the signals is, however, too small to obtain more detailed information from 2D NMR spectra. Mutant m30, in contrast to m29, has a linear dependency of the ellipticity at 220 nm on temperature, and the signals in the NMR spectrum are broad and not dispersed (data not shown).
(5) Mutants behaving as random coils, with no or very little secondary structure (mutants m3, m7, m10, m13, m19, m20, m21, m27, and m28). All these mutants have similar CD and NMR spectra and the temperature denaturation curve is linear with the ellipticity at 220 nm decreasing almost monotonically with temperature, typical of polypeptides with no tertiary structure and little or no secondary structure. In the NMR spectrum of m19, a minor set of dispersed signals is recognizable at the high-field region of the one-dimensional spectrum (data not shown). This set of signals decreases in intensity when the temperature is reduced or increased from 298 K, a similar behavior to mutant m29. Mutant m10 contains only two mutations, at the positions of the two tryptophan residues of SH3, neither of them belonging to the core. This mutant was constructed to obtain a CD spectrum that would better represent the secondary structure of SH3 with a reduced contribution of the aromatic side-chains. However, the mutant behaves as a random coil (data not shown). As has been found in a number of short linear peptides or protein fragments, some amount of locally folded structures (α-helices, β-hairpins or β-turns) could be present in equilibrium with random conformations in these mutants. Most of these mutants also form different populations of helical structures in the presence of TFE, a solvent that is known to stabilize this kind of secondary structure in sequences that have some tendency to form helices. These results suggest that while most of these mutants are largely unstructured, their sequences still have some tendency to form secondary structures.
(6) Mutants not expressed in our system, which is based on the bacteriophage T7 expression system in Escherichia coli. Only the clones of mutants m12 and m15 yielded no detectable expression, even when growing the cells at several temperatures.
(7) Mutants that fold into a structure similar to the protein G structure. Mutant m31 has all the structural attributes of a properly folded protein, and the structure is that of PG as indicated by the conformational shifts of the backbone Cα protons (Figure 5(b)) and nuclear Overhauser enhancements (NOEs; not shown).
Figure 2. Far-UV spectra of (a) mutants m2 (circles), m13 (squares) and m31 (triangles) and (b) mutants m29 (triangles) and m9 at two different concentrations (170 μM, circles; 5 μM, squares). The spectra were obtained at 278 K in 10 mM sodium phosphate (pH 7.0) except for m29, which was measured at pH 6.0.
Figure 3. Temperature denaturation curves monitored by the change in ellipticity for mutants m2, m13, m31 and wild-type SH3 (WT) (a), and mutants m29 and m9 (b). The ellipticity was monitored at 220 nm for all the mutants except m2 and WT, which were measured at 235 nm, where the change in ellipticity is more sensitive in these cases. The buffers used were the same as for Figure 2.
Figure 4. One-dimensional NMR spectra of mutants m2, m9, m13, m29, and m31. The spectra were recorded in 10 mM sodium phosphate buffer at pH 7.0 (m2, m13, m29) or pH 6.0 (m9 and m31) and 298 K (m2 and m31) or 278 K (m9, m13, and m29). The sharp signal at 0.00 ppm is the reference TSP (see Materials and Methods). The strong, sharp signals at around 3.5 ppm in the spectrum of m29 are a contaminant from the membranes used for sample concentration.
NMR structural analysis of an unstructured mutant
Mutant m13 is interesting since it has similar sequence identities with SH3 and PG. This mutant exhibits a far-UV CD spectrum typical of an unordered polypeptide with some minor amounts of secondary structure (negative ellipticity at 220 nm; Figure 2(a)). There is no indication of protein aggregation either by CD or NMR. Its proton NMR spectrum is not dispersed, but the amide proton signals were dispersed enough to assign most of the sequence using homonuclear spectroscopy and combining spectra acquired at several temperatures and pH values. The conformational shifts of the Cα protons (Figure 5(a)) are very small for most of the residues (except for the first and last ones, due to the effect of the terminal charges), indicating that this sequence has little folded structure. At the C terminus there is a group of residues with significantly larger conformational shifts, suggesting the presence of a small population of local non-random structure. However, comparison with the conformational shifts of the Cα protons corresponding to three fragments spanning the whole sequence of PG (Blanco & Serrano, 1995) shows no evidence for the independent formation of a detectable population of β-hairpin and/or α-helix structure as occurs in these fragments. In the presence of increasing concentrations of TFE, the NMR signals broaden and the dispersion of the Cα proton resonances is reduced, with an overall shift to upfield values. These observations suggest helical structure formation, but it is not possible to assign the spectra in these conditions. A titration with increasing proportions of aqueous TFE by CD shows that the molecule adopts a mixture of random coil and helical conformations with a maximum average helical population of 36 % (data not shown).
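The article does not say how the 36 % helical population was derived from the CD titration. One common way to obtain such a number, shown here purely as a hedged sketch, is a two-state interpolation of the mean residue ellipticity between assumed limits for a fully helical and a fully unordered chain. The function names, the limiting values (-40000 with a 2.5-residue end correction for the helix, 0 for the coil), the chain length and the titration data below are all assumptions chosen for illustration, not values from the paper.

```python
def helix_limit(n_residues, theta_infinite=-40000.0, end_correction=2.5):
    """Ellipticity assumed for a fully helical chain of n residues.

    Units are deg cm^2 dmol^-1; theta_infinite and end_correction are
    assumed textbook-style parameters, not values from the article.
    """
    return theta_infinite * (1.0 - end_correction / n_residues)


def helical_fraction(theta_obs, theta_helix, theta_coil=0.0):
    """Two-state estimate of the helical population from mean residue ellipticity."""
    fraction = (theta_obs - theta_coil) / (theta_helix - theta_coil)
    return min(1.0, max(0.0, fraction))  # clamp to the physical range 0..1


# Invented titration: TFE percentage (v/v) -> mean residue ellipticity near 220 nm.
titration = {0: -3000.0, 10: -7000.0, 30: -12000.0, 50: -13500.0}
limit = helix_limit(n_residues=57)  # hypothetical chain length
for tfe_percent, theta in titration.items():
    print(tfe_percent, round(helical_fraction(theta, limit), 2))
```

Because the result scales directly with the assumed helix and coil limits, a population obtained this way is best read as a relative measure along the titration rather than as an absolute value.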
Figure 5. Conformational shifts of the Cα proton resonances. (a) Mutant m13 obtained at 278 K, 10 mM phosphate, pH 7.0 (filled circles and crosses). The resonances corresponding to leucine residues 8, 10, 31, and 33, valine residues 9, 23, 44, 57, and 58, and glutamate residues 45 and 59 could not be unambiguously assigned due to partial overlap with other residues of the same type; however, it was possible to assign a range of values for these protons that result in a range of possible conformational shifts. The biggest values are shown as dots for each of these residues and the smallest ones by crosses. The numbering of the sequences follows that of the wild-type protein, so that the first residue in mutant m13 is number 3. The open squares represent values obtained for three peptides that span the whole sequence of PG (Blanco & Serrano, 1995). (b) Conformational shifts of the Cα proton resonances of wild-type PG at pH 5.4, 298 K (open squares; Gronenborn et al., 1991) and m31 (filled circles) at 298 K, 10 mM phosphate buffer (pH 6.0).
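A conformational shift of the kind plotted in Figure 5 is the observed chemical shift of a proton minus the random-coil value for the same residue type; for Cα protons, negative (upfield) values are typical of helical conformations and positive (downfield) values of β-strands. The sketch below illustrates that arithmetic, including the range of possible shifts for ambiguously assigned residues mentioned in the caption. The random-coil table holds placeholder values for illustration only, and all names and input numbers are hypothetical rather than taken from the article.

```python
# Placeholder random-coil H-alpha shifts in ppm (illustrative values only,
# not from the article; substitute a published reference table in real use).
RANDOM_COIL_HA = {"A": 4.32, "E": 4.35, "K": 4.32, "L": 4.34, "T": 4.35, "V": 4.12}


def conformational_shifts(sequence, observed_ppm, random_coil=RANDOM_COIL_HA):
    """Return observed minus random-coil shift per residue (None if unavailable)."""
    if len(sequence) != len(observed_ppm):
        raise ValueError("one observed shift (or None) is needed per residue")
    shifts = []
    for residue, observed in zip(sequence, observed_ppm):
        reference = random_coil.get(residue)
        if observed is None or reference is None:
            shifts.append(None)  # unassigned proton or residue type not in the table
        else:
            shifts.append(round(observed - reference, 2))
    return shifts


def shift_range(residue, low_ppm, high_ppm, random_coil=RANDOM_COIL_HA):
    """Range of possible conformational shifts for an ambiguously assigned proton."""
    reference = random_coil[residue]
    return round(low_ppm - reference, 2), round(high_ppm - reference, 2)


# Invented example: three residues, the last one unassigned.
print(conformational_shifts("AVK", [4.10, 4.50, None]))
print(shift_range("L", 4.20, 4.40))
```

Reporting both ends of the range for an ambiguous residue mirrors the caption's use of two symbols for the largest and smallest possible values.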
Discussion
The folding behavior of the hybrid sequences
Mutant m2 bears five mutations and its sequence still specifies the SH3 fold. However, the hydrophobic core of the SH3 domain is not enough by itself to sustain its folded structure in other mutants, and, for example, mutant m10, with only two mutations at partly solvent-exposed positions (Trp41 and Trp42), is unstructured. In a similar way, the mutation of all the non-core residues of SH3 (in mutants m25 and m26, which contain the core of PG and most of its non-core residues as well) is not enough to drive folding of the sequence into the PG structure. To obtain the PG fold, it has been necessary to replace three of the remaining eight residues of the core of SH3. These results could change to a certain extent using different criteria for the selection of the core residues (for example Trp42, mutated in m10, could have been classified as a core residue although it has 25 % of solvent-accessible surface), but it would be a trivial result that enlarging the number of core residues increases the number of sequences able to fold as SH3. The results would probably lead to the same conclusion, that the hydrophobic core is a global feature of all folded proteins but a particular core does not define an exclusive set of interactions with the ability to specify the fold. The core is necessary to stabilize a structure, but it is not sufficient to specify it.
The set of sequences analyzed here are hybrids of the sequences of SH3 and PG and represent a more or less uniform sampling of sequence identities between 100 and 10 % with each protein, but only those sequences very similar to the wild-type proteins have unique folds. Would it have been possible to find a sequence with comparable identity to both proteins which still folded into one of the two folds? Our experimental results suggest that following our strict and simple strategy of mutating everything but the core of SH3 would not succeed. We could do it, however, if we were able to evaluate the energies of all possible interactions and then identify a subset that would make a fold unique, overriding other subsets that would favor the other fold. Regan and co-workers have shown that such comprehensive knowledge is not indispensable (Dalal et al., 1997). They have designed a sequence which seems to have this property. It folds into a dimeric helical structure while holding about 50 % identity with PG. This is a remarkable result, made possible by the great deal of knowledge accumulated over the last years about the determinants of helical structure formation (Chakrabartty & Baldwin, 1995; Muñoz & Serrano, 1995) and from the previous extensive characterization of the folding of the protein Rop (their design target, with a four-helix bundle structure). Especially useful was the information on the effect of helix packing on Rop stability (Munson et al., 1996), and the fact that the designed sequence was not constrained to be a hybrid of PG and Rop sequences. However, more importantly for our work, in their case the hydrophobic core of PG was disrupted by introducing polar residues. A similar experiment used another helical protein as the target for design (the B domain of protein A) but their sequence was unfolded (Jones et al., 1996). Recently, Yuan & Clark (1998) have reported an approach that is more similar to what we have done in our experiment. They designed a hybrid sequence between their parent (434 Cro, with a helical structure) and their target protein (PG) while keeping the core residues of the parent protein. This hybrid was unfolded and the authors concluded that the core residues and a similar binary pattern of polar and apolar residues are not enough to specify the fold of PG. Although the number of experiments of this kind is small, it is tempting to speculate that in order to change the fold of one sequence into another, its hydrophobic core needs to be destroyed while simultaneously introducing the new one. The reason behind this is that alternative compact conformations with energies similar to that of the target could be present, competing with the expected native fold.
The identification of foldable sequences
It is interesting to consider if it is possible to obtain a simple and reliable prognosis of the ability of these sequences to fold into the SH3 or PG structures before actually doing the experiments. To do so we have modeled the 27 hybrid sequences onto the three-dimensional structures of both SH3 and PG proteins.
The quality of the models has been evaluated with two of the more common tools available for the evaluation of protein structure quality and fold recognition: PROSAII (Sippl, 1993) and the packing quality evaluation implemented in the program WHATIF (Vriend, 1990). Both tools are based on the use of threading potentials, and the simplest output is a global Z-score that describes how well a particular sequence of amino acid residues fits into the three-dimensional structure. The Z-scores of all the models plus the crystal structures of SH3 and PG are represented in Figure 6. Based on the results with high-quality protein structures, PROSAII Z-scores between −10 and −5 would correspond to a "correct" model for a protein of the size of SH3 and PG. In the case of WHATIF, Z-scores smaller than −3 are considered bad structures or poorly refined models, and only values under −5 are classified as wrong structures. The results show that the two criteria recognize as wrong structures the SH3 sequence modeled onto the PG structure and the PG sequence modeled onto the SH3 structure, but many of the hybrid sequences can be "correctly" modeled onto both structures. PROSAII detects seven wrong SH3 models and ten wrong PG models. WHATIF recognizes only one wrong SH3 model (the one built with the PG sequence) and six wrong PG models. Both programs accept more models based on the SH3 structure than on the PG structure, very likely due to the presence of the SH3 core residues in most of the mutant sequences. This reflects the fact that the main criterion used in these potentials to discriminate between correct and wrong structures is the partition of polar and apolar residues of the sequence into the surface and core of the trial structure. In fact, when using just the inside/outside distribution, the Z-scores calculated by WHATIF, which measure how this partitioning of the residues compares with what is typical for globular proteins, give similar results, with nine models on the PG structure classified as outside the range expected for protein structures (data not shown).

Figure 6. Plots of the Z-scores of the models of all the sequences modeled onto the structure of SH3 (shaded bars) or onto the structure of PG (stippled bars). (a) PROSAII Z-scores; (b) WHATIF packing quality Z-scores. The horizontal lines in both plots delimit the values that define a model as correct (see the text). Sequences are ordered as in Table 1 by increasing identity with protein G. S and G stand for the SH3 and PG sequences, respectively.

These results might suggest that the hybrid sequences cannot fold because they are similarly suited to fold into SH3 or PG. However, the fact that both methods assign a good score to m10, which is unfolded although close in sequence to SH3, indicates that the main problem is that the potentials are incomplete or not precise enough to detect anything other than really bad models. Would other approximations be more efficient? It could be so, as suggested by recent nanosecond-range molecular dynamics trajectories on solvated models of several variants of PG and on one of the mutants analyzed here, m29, modeled on PG (D. Cregut & L.S., unpublished results). In this work, the molecules that were folded maintained stable structures after 1 ns of free molecular dynamics simulation, while the structural models of molecules that were unfolded were unstable and quickly lost long-range order during the simulation. However, this procedure has two limitations. First, it is computationally very expensive. Second, although one or a few trajectories can be run on a single molecule, reliable results would require statistics over several such long trajectories, which is impractical today for many different models of 60-residue proteins.

It follows that in order to evaluate the foldability of a sequence an experimental analysis is indispensable. Even working with the actual protein samples, it has been necessary to make several complementary measurements to examine their folding behavior. We have found that, in our expression system, protein expression levels and the presence of the protein in the supernatant or the pellet after cell disruption have no predictive value for the foldability of a sequence (m13 is highly expressed as soluble protein, for example).
The widely used observation of a sigmoidal dependence of a structural probe on a denaturing agent is not a definitive criterion and can lead to errors, by classifying as folded structures conformational states that lack the specificity of typical protein structures (Uversky & Ptitsyn, 1995). With the molecules analyzed here, the simplest and the only definitive criterion has been the observation of a well-dispersed 1H NMR spectrum with sharp signals, as expected for a small folded protein.

Sequence space, protein folding and evolution

The sequence space between SH3 and PG is enormous, and we have sampled just a tiny portion of it, with the constraint that all the sequences are hybrids of the SH3 and PG sequences. However, our results suggest that only a small fraction of this space will have adequate properties for folding into a unique structure. The evolution of natural proteins can probably be described as a random walk over the sequence space subjected to a complex selection pressure, involving function,
stability, and rapid folding, that define the "fitness" of a particular protein (Govindarajan & Goldstein, 1997a). Simulations of the evolution of sequences, using simplified lattice structure models, suggest that a strong selective pressure on foldability is necessary to reproduce the general features of the frequencies of residue-per-residue mutations found in natural proteins (Govindarajan & Goldstein, 1997a). Furthermore, as the pressure increases, the simulated evolutionary walks become increasingly confined to neutral networks of sequences where different sets of interactions preserve the same structure. It is argued that for highly optimized sequences (high foldability) the sequence space allowed could be large enough to overlap with other spaces determining another fold, so that it could be reached within a valid evolutionary trajectory, one in which all the sequences are foldable (Govindarajan & Goldstein, 1997b). Our experimental results suggest that this is unlikely to happen. However, nature could have other means to preserve evolving sequences that are not able to fold. For example, whenever the function of the original folded sequence is maintained by gene duplication, one gene will be able to become non-functional (pseudogene) as long as the other remains functional (Haldane, 1933). The fact that most of the hybrid sequences survived in the bacterial environment, and that there are several examples of proteins that exist in "native unfolded states" (Weinreb et al., 1996), indicates that the avoidance of aggregation or degradation is not a major obstacle to creating protein structure diversity through random sequence drift.
Materials and Methods

Mutagenesis and cloning

The pET3d plasmid coding for the wild-type SH3 domain was a generous gift from Dr M. Saraste. A clone of the B2 domain of protein G was a gift from Dr J. P. Derrick, from which the B1 domain was obtained by mutagenesis. The site-directed mutagenesis was done by PCR using either two mutagenic oligonucleotides (Hemsley et al., 1989) or only one (megaprimer strategy; Sarkar & Sommer, 1990). All the constructs were subcloned into the plasmid pBAT (Paränen et al., 1996) within the NcoI and HindIII cloning sites and sequenced by the dideoxy method (Sanger et al., 1977; Hattori & Sasaki, 1986). The use of NcoI introduces as the second residue either Gly, Asp, Val or Ala. Since the residue preceding the first threonine residue in the B1 domain is an aspartate (in the full sequence of streptococcal protein G; Fahnestock et al., 1986), this amino acid residue was selected as the second after the initial methionine in our studies. This aspartate residue is not present in the clone used in the structure determination of the domain (Gronenborn et al., 1991; Gallagher et al., 1994). The structural properties of both clones are the same, although the clone with the aspartate is slightly less stable (D. Cregut & L.S., unpublished results).
Protein expression and purification

The proteins were expressed in E. coli BL21(DE3) cells grown in Luria broth medium at 37 °C. Expression of the proteins was induced with 1 mM isopropyl-β-D-thiogalactopyranoside when the absorbance at 600 nm was around 0.6, and the cells were allowed to grow further for about three hours. Only two of the clones did not show protein expression as monitored by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) stained with Coomassie blue, even when the cells were grown at 30 °C. The cells were then pelleted by centrifugation, resuspended in 10 mM sodium phosphate buffer (pH 7.0) and broken by extensive sonication. After ultracentrifugation, most of the mutant proteins were found both in the soluble fraction and in the pellet. Only four mutants were predominantly soluble (m2, m13, m29, and m31) and were purified essentially as described (Viguera et al., 1994). The rest were recovered by washing the pellets with 6 M urea, which solubilized most of the protein. Solubilized nucleic acids were precipitated with 0.05 % (v/v) polyethylenimine. The supernatant was applied to a HiLoad 26/60 Superdex 75 gel-filtration column (Pharmacia) with 6 M urea, 150 mM NaCl, 10 mM sodium phosphate (pH 7.0) as running buffer on an FPLC system (Pharmacia). Fractions containing the protein were pooled, diluted tenfold with different buffers depending on the optimal solubility of the proteins and then concentrated in a stirred cell (Amicon) using a membrane with a cut-off of 3000 Da. The concentrated samples were extensively dialyzed against the appropriate buffer (10 mM sodium phosphate (pH 6-7 or pH 3.5)) in a membrane with a 3500 Da cut-off (Electrapor) and further concentrated in small Amicon concentration units. At this point most of the samples appeared as a single band in SDS-PAGE. Some of the proteins needed further purification by anion-exchange chromatography on HR monoQ columns (Pharmacia). In these cases, after elution with a gradient of NaCl, the samples were dialyzed and concentrated as explained above.

ANS binding assays

These were performed by fluorescence measurements at 278 K using 1 cm path length cells in an Aminco Bowman fluorimeter. The concentrations of the protein and ANS were 10 and 250 µM, respectively. Emission spectra of the ANS were recorded from 400 to 600 nm with an excitation wavelength of 396 nm.
Circular dichroism spectroscopy

CD spectra were recorded at 5 °C on a Jasco-710 instrument calibrated with (1S)-(+)-10-camphorsulfonic acid, in the continuous mode, with 1 nm bandwidth, one second response and 100 nm/minute scan speed. In the far-UV region, the spectra were the average of 20 scans, recorded in cuvettes with path lengths of 0.01 cm and 1 cm on samples that were between 180 and 3 µM to monitor aggregation effects. In the near-UV region, 40 scans were averaged, obtained in 1 cm cells on protein samples between 70 and 130 µM. The concentration of protein stock solutions was measured by ultraviolet absorbance (Gill & von Hippel, 1989). Thermal denaturation curves were measured in 2 mm path length cuvettes closed with a Teflon cap on 20 µM protein solutions in the range 278 to 368 K. Point measurements were taken every 0.5 or 0.1 K at a rate of 1 K/minute and the CD signal was measured at 220 nm or 235 nm, depending on the shape of the spectra. Reversibility was tested for some of the samples by monitoring the change in signal upon cooling the sample back to 278 K. For the rest of the samples, reversibility was checked simply by comparison of the spectra at the starting and ending temperatures and again at 278 K after cooling the sample down.
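The ultraviolet absorbance determination of stock concentrations mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the per-residue coefficients (5690 for Trp, 1280 for Tyr and 120 for cystine, in M^-1 cm^-1 at 280 nm) are the values commonly quoted for this kind of sequence-based estimate and should be checked against the cited reference, and the function names and the example sequence are hypothetical.

```python
# Sketch: protein concentration from A280 via the Beer-Lambert law.
# Assumption: per-residue molar extinction coefficients at 280 nm
# (M^-1 cm^-1) as commonly quoted for sequence-based estimates.
EPS_TRP = 5690
EPS_TYR = 1280
EPS_CYSTINE = 120


def molar_extinction_280(sequence, n_cystine=0):
    """Estimate the molar extinction coefficient at 280 nm from a
    one-letter amino acid sequence. Disulfide-bonded cystines cannot be
    inferred from the sequence, so the caller supplies their number."""
    seq = sequence.upper()
    return (EPS_TRP * seq.count("W")
            + EPS_TYR * seq.count("Y")
            + EPS_CYSTINE * n_cystine)


def concentration_molar(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)


if __name__ == "__main__":
    # Hypothetical sequence with one Trp and two Tyr, for illustration only.
    eps = molar_extinction_280("MWYAY")
    print(eps, concentration_molar(0.825, eps))
```

A sequence lacking both Trp and Tyr gives a coefficient of zero, in which case this estimate cannot be used and the function raises an error rather than returning a meaningless concentration.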
1H NMR spectroscopy
NMR samples were prepared at different concentrations, depending on protein solubility and availability, in 600 µl of H2O/2H2O, 9:1 (v/v) or in H2O/perdeuterated TFE (CF3C2H2O2H). Minute amounts of HCl or NaOH were present in the appropriate buffer. Sodium 3-trimethylsilyl (2,2,3,3-2H4) propionate (TSP) was used as an internal reference at 0.00 parts per million (ppm). NMR experiments were performed on Bruker AMX-500 or DRX-500 spectrometers. One-dimensional spectra were acquired with 32,000 data points which were zero-filled to 64,000 data points before Fourier transformation. Two-dimensional (2D) spectra were acquired in the phase-sensitive mode using the time proportional phase incrementation (TPPI) technique (Marion & Wüthrich, 1983). 2D double-quantum-filtered scalar correlated spectroscopy (DQF-COSY; Piantini et al., 1982) and 2D nuclear Overhauser effect spectroscopy (NOESY; Kumar et al., 1980) experiments were recorded using standard pulse sequences. Total correlation spectroscopy (TOCSY; Bax & Davis, 1985) spectra were acquired using the standard MLEV17 spin lock sequence and 80 ms mixing time. The water signal was suppressed by presaturation during the relaxation delay (one second) and also during the mixing time of NOESY spectra (50-150 ms). The data were processed with the programs UXNMR or XWINNMR from Bruker on Silicon Graphics workstations. For mutants m13 and m31, the 1H NMR resonances were assigned by the sequential assignment procedure (Wüthrich, 1986). A complete assignment was obtained for the backbone protons of m31 and only 82 % for m13, due to severe signal overlap. The specific patterns of cross-peaks typical of each amino acid were identified as part of the sequential assignment strategy, but no exhaustive assignment of side-chain protons was performed.
The conformational shifts of the Cα protons were obtained by subtracting the random coil values of each amino acid (Bundi & Wüthrich, 1979; Merutka et al., 1995) from those measured in the mutants, and corrected for neighboring aromatic side-chain contributions as described (Merutka et al., 1995; Viguera et al., 1995).

Sequence analysis, structure modeling and model quality evaluation

The sequences of the mutants were analyzed by the Statistical Analysis of Protein Sequences (SAPS) server accessible at the following address: http://ulrec3.uni.ch/software/SAPS_form.html. This statistical analysis involves a wide variety of protein sequence properties including the distribution of charged or hydrophobic residues (Brendel et al., 1992). No unusual features were detected in the sequences. The sequences were modeled on the crystal structures of SH3 or PG using the procedures implemented in the program WHATIF (Vriend, 1990), which make use of position-specific rotamers and the sorting of the residues to be modeled as a function of their freedom in rotamer space (Chinea et al., 1995). The models were then
evaluated with the program PROSAII (Sippl, 1993) and the WHATCHECK option of WHATIF (http://www.sander-embl-heidelberg.de/whatcheck). PROSAII Z-scores were obtained with the pair and surface potentials combined by automatic weighting of both energy terms.
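The Z-score cut-offs quoted in the discussion of model quality (PROSAII values between −10 and −5 taken as "correct"; WHATIF values below −3 as bad or poorly refined and below −5 as wrong) can be applied mechanically once the scores are in hand. The sketch below only illustrates that bookkeeping. It does not call PROSAII or WHATIF; the function names, the labels, the treatment of values exactly at a cut-off and the example scores are all assumptions made here for illustration, not data from this study.

```python
# Sketch: classify model Z-scores with the cut-offs quoted in the text.
# Assumptions: boundary values are treated as inside the PROSAII window,
# and a PROSAII score outside the window is only flagged, not interpreted.


def classify_prosa(z_score, low=-10.0, high=-5.0):
    """Return 'correct' when low <= z <= high, else 'outside_range'."""
    return "correct" if low <= z_score <= high else "outside_range"


def classify_whatif(z_score, poor_cutoff=-3.0, wrong_cutoff=-5.0):
    """Return 'wrong' below wrong_cutoff, 'poor' below poor_cutoff,
    and 'acceptable' otherwise."""
    if z_score < wrong_cutoff:
        return "wrong"
    if z_score < poor_cutoff:
        return "poor"
    return "acceptable"


def count_label(z_scores, classifier, label):
    """Count how many scores the classifier assigns to the given label."""
    return sum(1 for z in z_scores if classifier(z) == label)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = [-7.2, -4.1, -6.0, -2.5]
    print([classify_prosa(z) for z in example])
    print(count_label(example, classify_whatif, "poor"))
```

Keeping the cut-offs as keyword arguments makes it easy to test how sensitive the number of rejected models is to the exact threshold chosen.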
Acknowledgments

We are grateful to Matti Saraste for the plasmid encoding the SH3 domain, and to Marko Hyvönen for plasmid pBAT and assistance in molecular biology. We thank María J. Macías, Jose Martínez and David Cregut for help performing some of the experimental work and Emmanuel Lacroix for helpful discussions. F. J. B. was supported by a postdoctoral fellowship from the Spanish Ministry of Science and Education.
References

Alexander, P., Orban, J. & Bryan, P. (1992a). Kinetic analysis of folding and unfolding the 56 amino acid IgG-binding domain of streptococcal protein G. Biochemistry, 31, 7243-7248.
Alexander, P., Fahnestock, S., Lee, T., Orban, J. & Bryan, P. (1992b). Thermodynamic analysis of the folding of the streptococcal protein G IgG-binding domains B1 and B2: why small proteins tend to have high denaturation temperatures. Biochemistry, 31, 3597-3603.
Bax, A. & Davis, D. G. (1985). MLEV-17 based two-dimensional homonuclear magnetization transfer spectroscopy. J. Magn. Reson. 65, 355-360.
Beasley, J. R. & Hecht, M. H. (1997). Protein design: the choice of de novo sequences. J. Biol. Chem. 272, 2031-2034.
Blanco, F. J. & Serrano, L. (1995). Folding of protein G B1 domain studied by the conformational characterization of fragments comprising its secondary structure elements. Eur. J. Biochem. 230, 634-649.
Blanco, F. J., Rivas, G. & Serrano, L. (1994). A short linear peptide that folds into a native stable β-hairpin in aqueous solution. Nature Struct. Biol. 1, 584-590.
Blanco, F. J., Ramírez, A. & Serrano, L. (1997). 1H and 15N assignment and solution structure of the SH3 domain of spectrin: comparison of unrefined and refined structure sets with the crystal structure. J. Biomol. NMR, 9, 347-357.
Bowie, J. U., Reidhaar-Olson, J. F., Lim, W. A. & Sauer, R. T. (1990). Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science, 247, 1306-1310.
Brendel, V., Bucher, P., Nourbakhsh, I., Blaisdell, B. E. & Karlin, S. (1992). Methods and algorithms for statistical analysis of protein sequences. Proc. Natl Acad. Sci. USA, 89, 2002-2006.
Bryngelson, J. D., Onuchic, J. N., Socci, N. D. & Wolynes, P. G. (1995). Funnels, pathways and the energy landscape of protein folding: a synthesis. Proteins: Struct. Funct. Genet. 21, 167-195.
Bundi, A. & Wüthrich, K. (1979). 1H-NMR parameters of the common amino acid residues measured in aqueous solution of the linear tetrapeptides H-Gly-Gly-X-L-Ala-OH. Biopolymers, 18, 285-298.
Chakrabartty, A. & Baldwin, R. L. (1995). Stability of α-helices. Advan. Protein Chem. 46, 141-176.
Chinea, G., Padron, P., Hooft, R. W. W., Sander, C. & Vriend, G. (1995). The use of position-specific rotamers in model-building by homology. Proteins: Struct. Funct. Genet. 23, 415-421.
Cordes, M. H. J., Davidson, A. R. & Sauer, R. T. (1996). Sequence space, folding and protein design. Curr. Opin. Struct. Biol. 6, 3-10.
Dalal, S., Balasubramanian, S. & Regan, L. (1997). Protein alchemy: changing β-sheet into α-helix. Nature Struct. Biol. 4, 512-548.
Dill, K. A., Bromberg, S., Yue, K., Fiebig, K. M., Yee, D. P., Thomas, P. D. & Chan, H. S. (1995). Principles of protein folding. A perspective from simple exact models. Protein Sci. 4, 561-602.
Fahnestock, S. R., Alexander, P., Nagel, J. & Filpula, D. (1986). Gene for an immunoglobulin-binding protein from a group G Streptococcus. J. Bacteriol. 167, 870-880.
Gallagher, T., Alexander, P., Bryan, P. & Gilliland, G. L. (1994). Two crystal structures of the B1 immunoglobulin-binding domain of streptococcal protein G and comparison with NMR. Biochemistry, 33, 4721-4729.
Gill, S. C. & von Hippel, P. H. (1989). Calculation of protein extinction coefficient from amino acid sequence data. Anal. Biochem. 182, 319-326.
Govindarajan, S. & Goldstein, R. A. (1997a). The foldability landscape of model proteins. Biopolymers, 42, 427-438.
Govindarajan, S. & Goldstein, R. A. (1997b). Evolution of model proteins on a foldability landscape. Proteins: Struct. Funct. Genet. 29, 461-466.
Gronenborn, A. M., Filpula, D. R., Essig, N. Z., Achari, A., Whitlow, M., Wingfield, P. T. & Clore, G. M. (1991). A novel, highly stable fold of the immunoglobulin binding domain of streptococcal protein G. Science, 253, 657-661.
Haldane, J. B. S. (1933). The part played by recurrent mutation in evolution. Am. Nat. 67, 5-7.
Hattori, M. & Sasaki, Y. (1986). Dideoxy sequencing method using denatured plasmid templates. Anal. Biochem. 152, 232-238.
Hemsley, A., Arnheim, N., Toney, M. D., Cortopassi, G. & Galas, D. J. (1989). A simple method for site specific mutagenesis using the polymerase chain reaction. Nucl. Acids Res. 17, 6545-6551.
Johnson, W. C. (1988). Secondary structure of proteins through circular dichroism spectroscopy. Annu. Rev. Biophys. Biophys. Chem. 17, 145-166.
Jones, D. T., Moody, C. M., Uppenbrink, J., Viles, J. H., Doyle, P. M., Harris, C. J., Pearl, L. H. & Thornton, J. M. (1996). Towards meeting the Paracelsus challenge: the design, synthesis and characterization of paracelsin-43, an α-helical protein with over 50 % sequence identity to an all-β-protein. Proteins: Struct. Funct. Genet. 24, 502-513.
Karplus, M. & Sali, A. (1995). Theoretical studies of protein folding and unfolding. Curr. Opin. Struct. Biol. 5, 58-73.
Kim, C. A. & Berg, J. M. (1993). Thermodynamic beta-sheet propensities measured using a zinc-finger host peptide. Nature, 362, 267-270.
Kraulis, P. J. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallog. 24, 946-950.
Kumar, A., Ernst, R. R. & Wüthrich, K. (1980). A two-dimensional Overhauser effect enhancement (2D NOE) experiment for the elucidation of complete proton-proton cross-relaxation networks in biological macromolecules. Biochem. Biophys. Res. Commun. 95, 1-6.
Lattman, E. E. & Rose, G. D. (1993). Protein folding-what's the question? Proc. Natl Acad. Sci. USA, 90, 439-441.
Marion, D. & Wüthrich, K. (1983). Application of phase sensitive two dimensional correlated spectroscopy (COSY) for measurements of 1H-1H spin-spin coupling constants in proteins. Biochem. Biophys. Res. Commun. 113, 967-974.
Merutka, G., Dyson, H. J. & Wright, P. E. (1995). Random coil 1H chemical shifts obtained as a function of temperature and trifluoroethanol concentration for the peptide series GGXGG. J. Biomol. NMR, 5, 14-24.
Muñoz, V. & Serrano, L. (1995). Helix design, prediction and stability. Curr. Opin. Biotech. 6, 382-386.
Munson, M., Balasubramanian, S., Fleming, K. G., Nagi, A. D., O'Brien, R., Sturtevant, J. M. & Regan, L. (1996). What makes a protein a protein? Hydrophobic core designs that specify stability and structural properties. Protein Sci. 5, 1584-1593.
Musacchio, A., Gibson, T., Lehto, V. & Saraste, M. (1992a). SH3-an abundant protein domain in search of a function. FEBS Letters, 307, 55-61.
Musacchio, A., Noble, M. E. M., Pauptit, R., Wierenga, R. K. & Saraste, M. (1992b). Crystal structure of a Src-homology (SH3) domain. Nature, 359, 851-855.
Paränen, J., Rikkonen, M., Hyvönen, M. & Kääriäinen, L. (1996). T7 vectors with a modified T7 lac promoter for expression of proteins in Escherichia coli. Anal. Biochem. 236, 371-373.
Piantini, U., Sørensen, O. W. & Ernst, R. R. (1982). Multiple quantum filters for elucidating NMR coupling networks. J. Am. Chem. Soc. 104, 6800-6801.
Rose, G. D. & Creamer, T. P. (1994). Protein folding: predicting predicting. Proteins: Struct. Funct. Genet. 19, 1-3.
Sanger, F., Nicklen, S. & Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl Acad. Sci. USA, 74, 5463-5467.
Sarkar, G. & Sommer, S. S. (1990). The "megaprimer" method of site-directed mutagenesis. Biotechniques, 8, 404-407.
Sippl, M. J. (1993). Recognition of errors in three-dimensional structures of proteins. Proteins: Struct. Funct. Genet. 17, 355-362.
Uversky, V. N. & Ptitsyn, O. B. (1995). All-or-none solvent induced transitions between native, molten globule and unfolded states in globular proteins. Folding Design, 1, 117-122.
Viguera, A. R., Martinez, J. C., Filimonov, V., Mateo, P. & Serrano, L. (1994). Thermodynamic and kinetic analysis of the SH3 domain of spectrin. Biochemistry, 33, 2142-2150.
Viguera, A. R., Jiménez, M. A., Rico, M. & Serrano, L. (1995). Conformational analysis of peptides corresponding to β-hairpins and a β-sheet, that represent the entire sequence of α-spectrin SH3-domain. J. Mol. Biol. 255, 507-521.
Viguera, A. R., Serrano, L. & Wilmanns, M. (1996). Different folding transition states may result in the same native structure. Nature Struct. Biol. 3, 874-880.
Vriend, G. (1990). WHATIF: a molecular modelling and drug design program. J. Mol. Graph. 8, 52-65.
Weinreb, P. H., Zhen, W., Poon, A. W., Conway, K. A. & Lansbury, P. T. (1996). NACP, a protein implicated in Alzheimer's disease and learning, is natively unfolded. Biochemistry, 35, 17709-17715.
Wüthrich, K. (1986). NMR of Proteins and Nucleic Acids, John Wiley & Sons, New York.
Yuan, S. & Clarke, N. D. (1998). A hybrid sequence approach to the Paracelsus challenge. Proteins: Struct. Funct. Genet. 30, 136-143.
Edited by A. R. Fersht (Received 1 September 1998; received in revised form 22 October 1998; accepted 22 October 1998)