Combinatorial approaches: A new tool to search for highly structured β-hairpin peptides
Maria Teresa Pastor*, Manuela López de la Paz†, Emmanuel Lacroix†, Luis Serrano†, and Enrique Pérez-Payá*‡
*Department of Biochemistry and Molecular Biology, University of València, València, E-46100 Burjassot, Spain; and †European Molecular Biology Laboratory, Structures and Biocomputing, Meyerhofstrasse 1, D-69012 Heidelberg, Germany
Communicated by William F. DeGrado, University of Pennsylvania School of Medicine, Philadelphia, PA, November 1, 2001 (received for review September 6, 2001)
Here we present a combinatorial approach to evolve a stable β-hairpin fold in a linear peptide. Starting with a de novo-designed linear peptide that shows a β-hairpin structure population of around 30%, we selected four positions to build up a combinatorial library of 20⁴ sequences. Deconvolution of the library using circular dichroism reduced this sequence complexity to 36 defined sequences. Circular dichroism and NMR of these peptides resulted in the identification of two linear 14-aa-long peptides that in plain buffered solutions showed a percentage of β-hairpin structure higher than 70%. Our results show how combinatorial approaches can be used to obtain highly structured peptide sequences that could be used as templates in which functionality can be introduced.
In recent years, the number of natural and nonnatural peptides applied for pharmacological purposes has increased significantly (1). Poor bioavailability and pharmacokinetics, however, have often prevented their wide use as therapeutic agents. An additional problem is their conformational flexibility, which results in poor binding to the target. Therefore, a major challenge in peptide design is to develop methods that allow the rapid discovery of new sequences that give rise to properly structured peptides. The conventional research process to design peptides with an intended fold has consisted of the search for protein fragments that exhibit the desired structural motif (2) or the de novo design of new sequences based on amino acid propensities and statistical preferences (3). Although this stepwise approach has served well, it is long and expensive because of the number of iterative steps that are required to obtain a folded peptide. The rapid discovery of new target products by large-scale genomics and proteomics initiatives therefore necessitates the search for alternative strategies to design peptides and proteins capable of specific-target recognition. Combinatorial approaches are among the most promising strategies. Synthetic peptide combinatorial libraries have arisen as a source of new lead compounds (4); combinatorial libraries of genes have provided new proteins and protein domains (5, 6); and peptide libraries built on α-helical scaffolds have appeared as a useful strategy for the identification of new antimicrobial and catalytic synthetic α-helical peptides (7, 8). In contrast, libraries of linear peptides that fold as β-hairpin or β-sheet structures have not been constructed thus far. This has been mainly the result of the scarcity of data and the incomplete understanding of the factors determining formation of such secondary structure motifs (9).
Most of the available knowledge on β-hairpin formation and stability has been obtained from sequences from natural proteins that fold partly into a β-hairpin conformation outside of their original protein context (10), as well as from de novo rationally designed artificial peptides (11–13). In this work, we intended to extend the use of synthetic combinatorial libraries to the design of stable β-hairpins. As a framework sequence, we have used a de novo-designed linear β-hairpin peptide, BHKE (12, 13), which adopts a two-stranded β-hairpin conformation with four residues per strand and a two-residue type I′ β-turn, with a structured population of ≈30% in aqueous solution. The screening of the peptide library for molecules that fold into the desired β-hairpin conformation has been carried out by using a simple spectroscopic selection protocol. We show that defined peptides isolated from those libraries can present a higher tendency to fold than the original template sequence.

614–619 | PNAS | January 22, 2002 | vol. 99 | no. 2

Materials and Methods

Conformational-Defined Peptide Library and Individual Peptide Synthesis. The library and individual peptides were prepared by
simultaneous multiple-peptide synthesis, using Fmoc chemistry as described elsewhere (14, 15). The mixture ("X") positions were incorporated by coupling a mixture of 19 L-amino acids (cysteine was omitted), with the relative ratios suitably adjusted to yield close to equimolar incorporation. The quality of the synthesized peptide mixtures was validated by electrospray mass spectrometry. Individual peptides were purified by preparative RP-HPLC. Peptide identity was confirmed by laser desorption–time of flight mass spectrometry.

Circular Dichroism (CD) Measurements. Spectra were acquired on a Jasco (Eaton, MD) J-810 CD spectropolarimeter. CD spectra were the average of a series of 20 scans recorded at 5°C in 5 mM acetate buffer (pH 5). Peptide concentrations were determined spectrophotometrically as described (8).

Peptide Aggregation. Before the conformational analysis of the peptides, we checked their aggregation state. No concentration effects were found on the far-UV CD spectra (0.01–0.1 mM) of the defined sequences (peptides MBH1–MBH36) in water or in 40% (vol/vol) methanol. The 1D ¹H NMR spectra at 1 mM concentration, in pure water, showed signal line widths and chemical shifts identical to those acquired at 0.1 mM for the selected peptides analyzed. These results are good evidence that these peptides are monomeric up to 1 mM. To further check the monomeric state of the peptides, they were analyzed by sedimentation equilibrium experiments on dilute (0.1 mM) and concentrated (0.5–1 mM) samples. The averaged molecular weight (Mav) obtained from fitting the data to a single-component model was in agreement with the molecular weight calculated from the amino acid composition (Mth). This result indicated that these peptides are monomeric under all experimental conditions.

NMR. All NMR experiments were performed on a Bruker (Billerica, MA) DRX500 spectrometer on peptide samples of ≈1–2 mM concentration in pure D₂O and 90% H₂O/10% D₂O solutions at pH 5 and in 40% CD₃OD/60% H₂O. All chemical shifts were internally referenced to the sodium salt of trimethylsilylpropionate (TSP). Phase-sensitive total correlation spectroscopy (mixing time 50 ms) and nuclear Overhauser effect (NOE) spectroscopy (mixing times of 150, 200, and 250 ms) experiments were performed at 283 K, collecting 2,048 points in f2 and 512 points in f1. Solvent suppression was achieved by selective presaturation during the relaxation delay (1.2 s) or by field-gradient pulses. The proton resonances were assigned by the sequential assignment procedure (16). Heteronuclear multiple quantum correlation experiments were performed to assign ¹³Cα chemical shifts.

Abbreviations: CD, circular dichroism; NOE, nuclear Overhauser effect.
Data deposition: The atomic coordinates have been deposited in the Protein Data Bank, www.rcsb.org (PDB ID code 1K43).
‡To whom reprint requests should be addressed. E-mail: [email protected].
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
www.pnas.org/cgi/doi/10.1073/pnas.012583999
Structure Calculation. NOE intensities were evaluated qualitatively and used to obtain upper-limit distance constraints. NOEs were classified as strong (≤3.0 Å), between strong and medium (3.0 to ≤3.5 Å), medium (3.5 to ≤4.0 Å), between medium and weak (4.0 to ≤4.5 Å), weak (4.5 to ≤5.0 Å), and very weak (5.0 to ≤6.0 Å). Pseudo atom corrections were added when necessary. For those residues whose coupling constants could not be measured, φ angles were constrained to the range −180° to −20°, except for amino acids in positions c and L (see Results). For residue Ile9 (³JHα-HN = 8.6 Hz), φ was constrained between −145° and −95°; for Thr10 (³JHα-HN = 8.0 Hz), between −160° and −80°; and for Tyr11 (³JHα-HN = 8.3 Hz), between −150° and −90°. With a total of 60 upper-limit distance restraints derived from NOEs, a family of 50 structures without NOE violations larger than 0.2 Å was generated by using DYANA (Dynamics Algorithm for NMR Applications; ref. 17). Overlay and mean of the 10 best structures calculated by using DYANA were obtained with MOLMOL (18). Energy minimization of the mean structure and of the DYANA-calculated structure with the smallest rms deviation with respect to the mean was done with the GROMOS96 implementation of SwissPdbViewer. Figures showing three-dimensional structures were prepared with MOLMOL (Fig. 6A) and RASMOL (Fig. 6B).

Protein Structure Database Analysis. The search on the Protein Data Bank (PDB) was carried out by using the SCAN3D option of the program WHATIF (19) and by using the Kabsch and Sander (20) definition of secondary structure. Fifty-six β-hairpin structures with asparagine or aspartic acid at position L1 and glycine at position L2 of the turn were found. The average φ angle values obtained from these structures were as follows (the actual angles of peptide MBH12 obtained from the NMR structure are given in parentheses): position −B3 = −120 ± 46° (−150°); position −B2 = −107 ± 35° (−131°); position −B1 = −127 ± 20° (−147°); position L1 = +52 ± 8.0° (+71°); position L2 = +81 ± 11° (+94°); position +B1 = −108 ± 16° (−100°); position +B2 = −89 ± 20° (−81°); position +B3 = −122 ± 13° (−139°).

Calculation of β-Sheet Population from Hα Conformational Chemical Shifts. β-Sheet populations were ranked from CαH conformational chemical shifts, using as reference for the chemical shifts of the unfolded state the linear peptides that correspond to the two strands of the β-hairpin: RGKWTYNG-NH2, RGKYTYNG-NH2, Ac-NGITYEGR, Ac-NGHTYEGR, and Ac-NGHTDEGR (-NH2 and Ac- denote C terminus-amidated and N terminus-acetylated peptides, respectively). For those peptides for which we did not have a reference peptide with the exact sequence of one of the two strands, we used the values of the closest reference in terms of sequence, corrected for local effects such as the presence of proline at i + 1 (21). We chose the two threonines in the middle of the strands and the asparagine at the β-turn to rank the peptides in order of structured population (see Results). As a reference for 100% folded peptide, we used the data obtained from a circular version of BHKE (T. Kortemme and L.S., unpublished data) or from a mutated version of the spectrin SH3 domain containing the BHKE sequence inserted as an elongation of a protein β-hairpin (22). As reference for the random coil, we used the random-coil peptide BH1 (11) in the case of the asparagine residue. To estimate the population we assumed a two-state transition. The population of the β-hairpin state for each peptide was calculated at each indicator residue from ΔδHα data via

% population = [(δobs − δU)/(δF − δU)] × 100 [Eq. 1]

where δobs is the observed chemical shift and δU and δF are the chemical shifts of the unfolded and fully folded references. Experimental uncertainties reflect error propagation from the ±0.01 ppm uncertainty in the δHα measurement.

Results

Selection of the Template Sequence. The selection of the combinatorialized positions on the BHKE original scaffold (Arg1-Gly2-Lys3-Ile4-Thr5-Val6-Asn7-Gly8-Lys9-Thr10-Tyr11-Glu12-Gly13-Arg14) was performed with care to preserve the tendency to fold into the desired conformation, to avoid alternative secondary structure conformations, and to prevent aggregation processes, while allowing enough sequence diversity. Residues Asn7 and Gly8 forming the central β-turn were kept as part of the sequence template to favor the formation of a β-hairpin with proper strand register and to decrease the possibility of α-helix formation (23). Residues Arg1-Gly2 and Gly13-Arg14 were included to improve solubility (11). The polar face of the BHKE β-hairpin peptide was chosen as the core of the β-hairpin basal template, committing residues Lys3, Thr5, Thr10, and Glu12. These residues provide favorable β-strand formers (Thr5, Thr10) and good side-chain interactions across strands (Thr5-Thr10 and Lys3-Glu12), and they also prevent aggregation. Therefore, only the four residues (Ile4, Val6, Lys9, and Tyr11; ref. 11) located at the partially hydrophobic face of BHKE were combinatorialized. The potential to find sequences that generate a substantial stabilization is high because cooperative interactions can develop in this cluster of residues. The scaffold sequence is shown in Table 1, in which B means that the residue is part of a β-strand (−B, first β-strand; +B, second β-strand), L is part of the β-turn, and c is the random coil.

Table 1. Scaffold sequence
Position  c1   c2   −B4  −B3  −B2  −B1  L1   L2   +B1  +B2  +B3  +B4  c13  c14
Residue   Arg  Gly  Lys  Xaa  Thr  Xaa  Asn  Gly  Xaa  Thr  Xaa  Glu  Gly  Arg

Design of the Combinatorial Library. The library was synthesized in the so-called positional scanning format (8). It is composed of four sublibraries: RG-KO(−B3)TX-NG-XTXE-GR, RG-KXTO(−B1)-NG-XTXE-GR, RG-KXTX-NG-O(+B1)TXE-GR, and RG-KXTX-NG-XTO(+B3)E-GR. Each sublibrary comprises 20 peptide samples, in which the position labeled O corresponds to a defined natural amino acid, whereas the combinatorialized positions (X) are close to equimolar mixtures of 19 natural amino acids (cysteine was excluded to avoid the formation of intermolecular disulphide bridges). Each peptide sample thus contains 19³ (6,859) different peptides.

Spectroscopic Screening. The library was screened for particular amino acids that favor β-hairpin formation, at each of the positions selected for design, by a deconvolution process based on CD spectroscopy. β-Hairpin conformations render a characteristic far-UV CD signal, with a relative minimum ellipticity at 217 nm and a positive ellipticity band close to 200 nm. In contrast, random-coil ensembles show a minimum around 202 nm. We expected the number of combinations able to fold into a β-hairpin to be small and, therefore, the CD spectra to be dominated by the random-coil contribution. As a consequence, a low population of structured conformations and concentration errors might obscure the minimum at 217 nm. Thus, to avoid these problems, we used the concentration-independent ratio of the ellipticities at 217 and 202 nm (217/202 ratio; ref. 24). All 80 peptide mixtures were analyzed in buffer solution and in the presence of 40% MeOH. This solvent is a stabilizing agent, which increases β-hairpin population proportionally to the structure existing in water (25). Most peptide mixtures indeed showed a good correlation between the 217/202 ratio in buffer and in aqueous MeOH (Fig. 1). Furthermore, the presence of an isodichroic point suggested that no significant portion of the peptides present in each sample populates conformations other than the random coil and β-hairpin (Fig. 2A). On the basis of the 217/202 ratio (Fig. 1), the most suitable amino acids for each position were selected. This selection was done by determining the average 217/202 ratio value, in aqueous and MeOH solution, for all peptides and determining the SD. Any amino acid whose 217/202 value was equal to or higher than the sum of the mean plus one SD, in water and/or in MeOH, was preselected. Based on this criterion, tryptophan and tyrosine are clearly superior for position −B3, and aspartic acid, proline, tyrosine, and tryptophan for position +B3. At position −B1, aspartic acid, proline, tryptophan, and tyrosine fulfilled the criteria. Finally, at position +B1, tryptophan, isoleucine, histidine, and asparagine are above the average. To minimize the number of defined sequences to further characterize and also to simplify the experimental analysis, we decided to consider aspartic acid, proline, and tyrosine at −B1; isoleucine, histidine, and asparagine at +B1; and tyrosine and aspartic acid at +B3, together with tryptophan and tyrosine at −B3. The combinations of the selected amino acids resulted in the generation of a set of 36 individual peptides (2 × 3 × 3 × 2 = 36).

Fig. 1. Structure-based screening of the four separate sublibraries. Each bar represents the 217/202 ratio obtained from the CD spectra of each peptide mixture in buffer (empty bars) and in 40% MeOH (solid bars), with the x axis representing the defined amino acid (O position). For the selection of the most suitable amino acids for each position, we calculated the average 217/202 ratio value, in aqueous and MeOH solution, for the four positions (the average values are within the error for the four positions when considered independently). Any residue whose 217/202 value was equal to (with an error of ±0.1) or higher than the sum of the mean plus one SD, in water and/or in MeOH, was considered to favor β-hairpin formation.

CD Analysis of Selected Individual Peptides. It is well known that
aromatic side chains contribute to the far-UV CD spectra of peptides (26). Therefore, to carry out a reliable ranking of the structural content of our peptides by CD, we only compared peptides that bear an identical number of aromatic residues located at the same positions. For this reason, we classified the 36 selected peptides into 8 families, and structural content comparisons were made only within each family. Table 2 lists the 217/202 ratio for the most and the least structured peptide of each family. The most folded sequences of each family were selected for further structural studies. Some of the selected individual peptides exhibit a CD spectrum typical of a fully folded β-hairpin in water solution (i.e., MBH12 and MBH36; Fig. 2B). The CD spectra of these peptides did not change after addition of MeOH (Fig. 2C), thus indicating a highly structured population in water solution.

Fig. 2. (A) CD spectra in buffer of the 20 different peptide mixtures that define the sublibrary RG-KO(−B3)TX-NG-XTXE-GR. Note the presence of an isodichroic point at 208 nm. (B and C) CD spectra of selected peptides that fold into β-hairpin structures. The CD spectra of peptides MBH12 (solid line) and MBH36 (dotted line) (sequences listed in Table 2) were acquired at 5°C in 5 mM acetate buffer (pH 5) (B) and in the presence of 40% MeOH (C). The CD spectra of the original scaffold peptide BHKE (dashed line) are included as reference.

NMR Analysis of the β-Hairpin Population of the Selected Peptides. ¹Hα conformational shifts (difference between the observed ¹Hα chemical shifts of the studied peptide and those from a random-coil reference peptide) are good indicators of secondary structure formation (27). In our study, we used as random-coil reference peptides the isolated strands of the β-hairpins (see Materials and Methods) to take into account the fact that short, unfolded peptides can adopt nonrandom conformations (28) and that aromatic or other residues can contribute to the ¹Hα chemical shifts of neighboring residues (21). (Chemical shifts for all of the peptides are available upon request as additional information.) Ranking of the β-hairpin content of the selected peptides and comparison to the original peptide BHKE were carried out by using the conformational shifts of the ¹Hα of threonine (−B2), threonine (+B2), and asparagine (L1) (Table 2). These residues were selected because they are conserved in all peptides from the sequence template. We have found that the estimate of β-hairpin population is consistent, regardless of the probe used to evaluate it. Nevertheless, threonine (+B2) consistently reported lower β-hairpin population in peptides containing aromatic residues at position −B3.
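The two-state estimate of Eq. 1 (Materials and Methods) can be illustrated with a minimal Python sketch. The function name `hairpin_population` and the averaging over probes are our own choices, not the authors' code. The inputs are the conformational shifts listed for BHKE and for the fully folded BH19-Bergerac reference in Table 2, used here only as an example; because the folded-state reference used in the paper can differ by residue, the output need not reproduce the tabulated population exactly.

```python
from statistics import mean


def hairpin_population(delta_obs, delta_folded):
    """Two-state population estimate (Eq. 1), in percent.

    delta_obs    -- observed conformational shift, d_obs - d_U (ppm)
    delta_folded -- conformational shift of the folded reference, d_F - d_U (ppm)
    """
    if delta_folded == 0:
        raise ValueError("folded and unfolded reference shifts must differ")
    return 100.0 * delta_obs / delta_folded


# Illustrative inputs (ppm), read from Table 2; the dictionary keys are ours.
bhke = {"Thr(-B2)": 0.29, "Thr(+B2)": 0.17, "Asn(L1)": -0.11}
folded_reference = {"Thr(-B2)": 0.97, "Thr(+B2)": 0.76, "Asn(L1)": -0.36}

per_probe = {
    probe: hairpin_population(bhke[probe], folded_reference[probe])
    for probe in bhke
}
average = mean(per_probe.values())

for probe, population in per_probe.items():
    print(f"{probe}: {population:.0f}%")
print(f"mean over the three probes: {average:.0f}%")
```

Note that the sign of the asparagine shift cancels in the ratio, so turn and strand probes can be averaged on the same scale.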
Table 2. Structural parameters and β-hairpin population of defined peptides

                                                               ΔδCαH† (ppm)
Family  Peptide          Sequence         217/202 ratio  Thr (−B2)  Thr (+B2)  Asn (L1)  % Population
        BHKE             RGKITVNGKTYEGR    0.60           0.29       0.17      −0.11     24 ± 6
        BH19-Bergerac*   RGKITVNGKTYEGR   −0.76           0.97       0.76      −0.36     100
1       MBH6             RGKWTPNGHTDEGR    0.43           0.28       0.10      −0.10     23 ± 5
1       MBH4             RGKWTDNGHTDEGR    0.18
2       MBH9             RGKWTYNGNTDEGR    0.26
2       MBH8             RGKWTYNGHTDEGR    0.31           0.11       0.12      −0.20     25 ± 5
3       MBH10            RGKWTDNGITYEGR    0.40           0.28       0.08      −0.13     24 ± 4
3       MBH16            RGKWTDNGNTYEGR    0.22
4       MBH12            RGKWTYNGITYEGR   −8.50           0.67       0.21      −0.36     66 ± 4
4       MBH18            RGKWTYNGNTYEGR    0.38
5       MBH20            RGKYTPNGITDEGR    0.27           0.29       0.07      −0.06     18 ± 3
5       MBH22            RGKYTDNGHTDEGR    0.12
6       MBH21            RGKYTYNGITDEGR    0.26           0.05       0.11      −0.10     15 ± 4
6       MBH27            RGKYTYNGNTDEGR    0.19
7       MBH28            RGKYTDNGITYEGR    1.10           0.30       0.10      −0.15     28 ± 5
7       MBH33            RGKYTDNGNTYEGR    0.10
8       MBH36            RGKYTYNGITYEGR   −0.09           0.67       0.27      −0.37     69 ± 4
8       MBH35            RGKYTYNGNTYEGR    1.00

*The BH19-Bergerac corresponds to the BH19 placed inside the spectrin SH3 domain so as to elongate a natural β-hairpin. By different criteria, this peptide can be considered 100% folded (22).
†The ΔδCαH for BHKE was calculated as described (13). In the case of the other peptides, the corresponding controls are indicated in Materials and Methods. For peptides having a proline residue, because we did not have the short peptide corresponding to the β-strand, we used the control peptide most similar in sequence, corrected for the presence of a proline residue at position i + 1 (21).
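The 36 defined peptides follow from combining the residues retained at the four combinatorialized positions (2 × 3 × 3 × 2). The following Python sketch enumerates them on the scaffold of Table 1. The names `CHOICES` and `build_peptide` are hypothetical, and the family key (identity of the aromatic residue at −B3, and whether tyrosine occupies −B1 and +B3) is only our reading of "an identical number of aromatic residues located at the same positions"; it treats histidine at +B1 as nonaromatic and is not taken from the authors' procedure. The MBH numbering is not reproduced.

```python
from collections import defaultdict
from itertools import product

# Residues retained at each combinatorialized position (one-letter codes).
CHOICES = {
    "-B3": "WY",
    "-B1": "DPY",
    "+B1": "IHN",
    "+B3": "YD",
}


def build_peptide(minus_b3, minus_b1, plus_b1, plus_b3):
    """Place four residues on the scaffold RG-K?T?-NG-?T?E-GR."""
    return f"RGK{minus_b3}T{minus_b1}NG{plus_b1}T{plus_b3}EGR"


# Cartesian product over the four positions, in the order given in CHOICES.
peptides = [build_peptide(*combo) for combo in product(*CHOICES.values())]

# Assumed family key: residue at -B3, Tyr at -B1 (yes/no), Tyr at +B3 (yes/no).
# String indices 3, 5, and 10 are sequence positions 4, 6, and 11.
families = defaultdict(list)
for peptide in peptides:
    key = (peptide[3], peptide[5] == "Y", peptide[10] == "Y")
    families[key].append(peptide)

print(f"{len(peptides)} peptides grouped into {len(families)} families")
for key, members in sorted(families.items()):
    print(key, len(members), members[0])
```

Under this assumed key the families are not equally sized, which is one reason comparisons within a family rather than across the whole set are the natural unit for the CD ranking.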
Aromatic ring current effects on this proton produced by a tryptophan or tyrosine side chain could explain this result. The ¹³Cα conformational shift values of Thr10 are also anomalous (positive or close to zero; Fig. 3). This behavior is also observed for the cyclic version of BHKE, whose population is close to 100% (T. Kortemme and L.S., unpublished data), and for the most structured peptides, MBH12 and MBH36 (Fig. 3). Because ¹³Cα chemical shifts are mainly determined by the φ and ψ backbone dihedral angles, we looked for the explanation for this result by measuring the dihedral angles of the structure calculated for peptide MBH12. The φ angle of Thr10 is −81°. This value is not within the expected range for a residue with a β-strand conformation. A search with WHATIF (19) showed that position +B2 of the 2-residue turn β-hairpin presents a φ angle of −89° (±20°), which is consistent with the value measured in the calculated structure of MBH12. Nevertheless, Thr10 ¹³Cα conformational shift values from these peptides move in the right direction, as compared with values from less structured members of their respective families, MBH18 and MBH35, and to cyclic BHKE (Fig. 3). The structural content ranking obtained from this analysis is MBH12 ~ MBH36 > MBH28 > MBH8 ~ MBH10 ~ BHKE ~ MBH6 > MBH21 > MBH20. For the most structured peptides, this ranking is consistent with that derived from the CD 217/202 ratio values (Table 2) and with the number of long-range NOEs observed for these peptides (data not shown). For the less structured peptides, the classification is not always consistent between NMR and CD. Although peptide MBH20 is the one with the lowest structural content by all criteria, peptides MBH21, MBH6, and MBH8 can be found more or less structured depending on the parameter being measured. This finding might be explained by the fact that factors other than structured population, i.e., aromatic contributions to both the CD and the NMR chemical shifts, have a proportionally stronger effect when the population is low. Furthermore, the error in the measurement of small populations is much larger than for highly folded peptides, which also strongly contributes to the disagreement between both techniques. Thus, deconvolution of the libraries and recombinations provided at least three peptides that by all criteria used adopt well-folded β-hairpin structures to a higher extent than the original BHKE hairpin.

Fig. 3. ¹³Cα conformational shift profile of (A) MBH12 (black bars; Aa9 is I) and MBH18 (gray bars; Aa9 is N) and (B) MBH36 (black bars; Aa9 is I) and MBH35 (gray bars; Aa9 is N). Random-coil reference values have been taken from Wishart et al. (29). The profiles are consistent with the intended β-hairpin conformation. There are some anomalous values (positive or null) that likely reflect those residues that do not adopt an extended conformation. This finding is consistent with the φ angles measured from regular β-hairpins of this size that have been found in the Protein Data Bank.

NMR Structural Characterization. A more detailed NMR study was carried out for the two most structured peptides, MBH12 and MBH36. Both peptides showed the expected Hα conformational shift profile for a β-hairpin (Fig. 4): positive values for the strand residues and negative values for Asn7 and Gly8 at the turn. Fig. 5 shows a summary of the NOEs found for these peptides in aqueous solution. For both molecules, in water the expected Hαi-Hαj NOEs between residues Thr5-Thr10 and Lys3-Glu12 were observed. A three-dimensional model structure of peptide MBH12 was calculated with DYANA (17). For this peptide, among the 50 best structures, the pairwise rms deviation for residues 3–12 was 0.38 ± 0.21 Å for the backbone atoms and 1.36 ± 0.35 Å for all heavy atoms. The backbone traces of the 10 best structures appear in Fig. 6A. Fig. 6B shows both faces of the minimized mean of the 10 best structures. The peptide exhibits the expected β-hairpin conformation. The first and last two residues (RG- and -GR) are disordered, as expected, because they were added just to avoid aggregation. The side chains of Trp4, Tyr6, Ile9, and Tyr11 interact on one side of the β-hairpin, forming a hydrophobic cluster, and the pairs Lys3-Glu12 and Thr5-Thr10 interact on the other side (Fig. 6B). Asn7 is at position L1 of the β-turn and is directed outwards from the β-hairpin, as expected for a type I′ β-turn. The structure of peptide MBH36 was not calculated because the severe overlapping of the ¹Hα signals of the three tyrosines that MBH36 contains prevented the unambiguous assignment of most long-range NOEs at the hydrophobic face of this peptide.

Fig. 4. Observed conformational CαH chemical shift increments (ΔδCαH) relative to the chemical shifts of the unfolded state for peptides MBH12 (black bars; Aa4 is W) and MBH36 (gray bars; Aa4 is Y).

Fig. 5. Schematic diagram of the structure of peptides MBH12 and MBH36 in solution. The observed NOEs in water are shown as thick lines.

Discussion

To evolve the sequence of a peptide that adopts a β-hairpin conformation in aqueous solution and make it more structured, we have developed a framework that combines rational design and combinatorial approaches. The rational design provides a constrained scaffold shared by all of the peptide sequences in the library that allows the fine-tuned selection of key amino acids that satisfy all of the restrictions imposed in the selection phase.

Odds for Success. The challenge in this work was to find, for each position, a limited set of amino acid choices that would interact favorably with the peptide template residues, and most importantly with the other residues yet to be designed. The first selection criterion should have favored residues with good β-strand propensities, i.e., β-branched residues (30). The second should result in the selection of those residues having a larger number of favorable combinations with the other 19 residues at the remaining three positions. We have checked whether there is any correlation between the preferences for particular positions of β-hairpins in the protein database and the ranking order we have found experimentally in
Fig. 6. (A) Backbone traces of the 10 best structures obtained from NMR restraints for peptide MBH12 using DYANA. (B) Minimized mean of the 10 best NMR structures calculated for peptide MBH12. View of the hydrophobic (a) and polar face (b) of MBH12. Side chains are shown in space-fill representation. 618 兩 www.pnas.org兾cgi兾doi兾10.1073兾pnas.012583999
Pastor et al.
Quality of the Template Sequence. A good sequence template should cause numerous sublibrary sequences to fold into -hairpins, as well as prevent alternative nonnative conformations. It is very difficult to conclude this from the CD analysis of the four sublibraries, because the structure content in the best of the cases cannot be above 20% (comparing the 217兾202 values of the sublibraries to the defined peptides). However, when analyzing the MeOH effect on our library, we find that in the majority of the cases the CD spectra become more -hairpin like and there is a correlation between the 217兾202 values in water and MeOH. Thus, it seems that in all our sublibraries there is a sizeable proportion of sequences that tend to fold as -hairpins, which are stabilized by MeOH. Even more, taking into account the success 1. Latham, P. W. (1999) Nat. Biotech. 17, 755–757. 2. Starovasnik, M. A., Braisted, A. C. & Wells, J. A. (1997) Proc. Natl. Acad. Sci. USA 94, 10080–10085. 3. DeGrado, W. F., Summa, C. M., Pavone, V., Nastri, F. & Lombardi, A. (1999) Annu. Rev. Biochem. 68, 779–819. 4. Dooley, C. T., Chung, N. N., Wilkes, B. C., Schiller, P. W., Bidlack, J. M., Pasternak, G. W. & Houghten, R. A. (1994) Science 266, 2019–2022. 5. West, M. W., Wang, W., Patterson, J., Mancias, J. D., Beasley, J. R. & Hecht, M. H. (1999) Proc. Natl. Acad. Sci. USA 96, 11211–11216. 6. Jermutus, L., Honegger, A., Schwesinger, F., Hanes, J. & Plu ¨ckthun, A. (2001) Proc. Natl. Acad. Sci. USA 98, 75–80. (First Published December 26, 2000; 10.1073兾pnas.011311398) 7. Blondelle, S. E., Takahashi, E., Houghten, R. A. & Pe´rez-Paya´, E. (1996) Biochem. J. 313, 141–147. 8. Pe´rez-Paya´, E., Houghten, R. A. & Blondelle, S. E. (1996) J. Biol. Chem. 271, 4120–4126. 9. Lacroix, E., Kortemme, T., Lopez de la Paz, M. & Serrano, L. (1999) Curr. Opin. Struct. Biol. 9, 487–493. 10. Viguera, A. R., Jimenez, M. A., Rico, M. & Serrano, L. (1996) J. Mol. Biol. 255, 507–521. 11. Ramirez-Alvarado, M., Blanco, F. J. 
& Serrano, L. (1996) Nat. Struct. Biol. 3, 604–612. 12. Ramirez-Alvarado, M., Kortemme, T., Blanco, F. J. & Serrano, L. (1999) Bioorg. Med. Chem. 7, 93–103. 13. Ramirez-Alvarado, M., Blanco, F. J. & Serrano, L. (2001) Protein Sci. 10, 1381–1392. 14. Houghten, R. A. (1985) Proc. Natl. Acad. Sci. USA 82, 5131–5136. 15. Fields, G. B. & Noble, R. L. (1990) Int. J. Pept. Protein Res. 34, 161–214.
Pastor et al.
of our screening, it seems that the election of our template framework was successful. The main failure in our strategy has been the selection of proline as one of the best residues at several positions. As we have discussed in Results, all of the peptides having proline at position ⫺B1 analyzed by NMR are not very structured. However, they tend to have a good 217兾202 value. That proline is not good for -hairpin formation in our peptides can be understood from the fact that the four selected positions are involved in two main chain–main chain H-bonds, and proline will not be able to make one of them. So why does proline tend to give high 217兾202 values? One reason is that proline is quite favorable at position i⫹2 of type I -turns (9). Formation of the turn could produce a CD spectrum with negative ellipticity around 217 nm. It seems that when doing this type of approximations to select for structured peptides, one should avoid amino acids that could favor nonnative conformations. Otherwise, when dealing with low populations, nonnative conformations can be easily accepted as good ones. Success of the Use of Libraries To Find Structured Peptides. The
success of our approach is demonstrated by the identification of several peptide sequences that are more structured than the parent peptide and that in some cases approach a β-hairpin population higher than 70% in plain buffered aqueous solution. The best peptides, MBH12 and MBH36, were selected from an initial pool of close to 140,000 peptide sequences contained in our β-hairpin library. From our screening, it is clear that aromatic residues stabilize the β-hairpin conformation at positions −B3, −B1, and +B3. In particular, tryptophan at −B3 and tyrosine at −B3, −B1, and +B3 were found to be the amino acids that best meet the selection criteria. Interestingly, Cochran et al. (33, 34) recently found, through a combination of design and empirical analysis, a sequence that is highly structured and also contains a large number of aromatic residues. As shown in the calculated structure, the three aromatic residues pack against each other, and the aliphatic residue makes a small hydrophobic minicore. The balance of hydrophobicity and hydrophilicity is probably critical to stabilize a monomeric β-hairpin conformation in aqueous solution and to disfavor aggregation. In our structure-directed in vitro evolution approach, the problem of obtaining such an optimized balance has been solved by the selection of amphipathic aromatic amino acids.

We thank Dr. Cristina Carreño at DiverDrugs for helpful advice in the quality analysis of the peptide libraries. This work was supported by European Union Biotechnology Grant BIO4-CT97-2086 (to L.S. and E.P-P.). M.T.P. was a recipient of fellowships from the University of Valencia and the European Molecular Biology Organization (short-term). M.L.P. was supported by fellowships from the Ministerio de Educación y Cultura (Spain, until the end of 2000) and from the European Union Marie Curie Program.

16. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids (Wiley, New York).
17. Güntert, P., Mumenthaler, C. & Wüthrich, K. (1997) J. Mol. Biol. 273, 283–298.
18. Koradi, R., Billeter, M. & Wüthrich, K. (1996) J. Mol. Graph. 14, 29–32.
19. Vriend, G. (1990) J. Mol. Graph. 8, 52–56.
20. Kabsch, W. & Sander, C. (1983) Biopolymers 22, 2577–2637.
21. Merutka, G., Dyson, H. J. & Wright, P. E. (1995) J. Biomol. NMR 5, 14–24.
22. Viguera, A. R. & Serrano, L. (2001) J. Mol. Biol. 311, 377–391.
23. O'Neil, K. T. & DeGrado, W. F. (1990) Science 250, 646–651.
24. Jiménez, M. A., Muñoz, V., Rico, M. & Serrano, L. (1994) J. Mol. Biol. 242, 487–496.
25. Buck, M. (1998) Q. Rev. Biophys. 31, 297–355.
26. Chakrabartty, A., Kortemme, T., Padmanabhan, S. & Baldwin, R. L. (1993) Biochemistry 32, 5560–5565.
27. Williamson, M. P. (1990) Biopolymers 29, 1423–1431.
28. Smith, L. J., Fiebig, K. M., Schwalbe, H. & Dobson, C. M. (1996) Folding Des. 1, 95–106.
29. Wishart, D. S., Bigam, C. G., Holm, A., Hodges, R. S. & Sykes, B. D. (1995) J. Biomol. NMR 5, 67–81.
30. Muñoz, V. & Serrano, L. (1994) Proteins 20, 301–311.
31. Minor, D. L., Jr., & Kim, P. S. (1994) Nature (London) 367, 660–663.
32. Smith, C. K. & Regan, L. (1995) Science 270, 980–982.
33. Cochran, A. G., Tong, R. T., Starovasnik, M. A., Park, E. J., McDowell, R. S., Theaker, J. E. & Skelton, N. J. (2001) J. Am. Chem. Soc. 123, 625–632.
34. Cochran, A. G., Skelton, N. J. & Starovasnik, M. A. (2001) Proc. Natl. Acad. Sci. USA 98, 5578–5583. (First Published May 1, 2001; 10.1073/pnas.091100898)
PNAS | January 22, 2002 | vol. 99 | no. 2 | 619
our peptide mixtures. Essentially, the correlation is nonexistent (data not shown). Similar results are found when comparing with the experimental scales of β-sheet propensities (31, 32). This observation means that the strongest selection in our experiment is for those residues that, in combination with a number of residues at other positions, contribute significantly to β-hairpin stability. What are the chances of selecting one residue as favorable if it can participate in only one good β-hairpin-forming sequence? With three randomized positions and 19 residues at each, such a sequence makes up only 1/19^3 of the mixture (1 in 6,859). Even for a fully folded β-hairpin, this fraction is too small to contribute significantly to the CD spectrum of the mixture. There must therefore be more than one sequence combination involving a particular amino acid at a defined position that is favorable for the β-hairpin conformation. Judging from the CD spectra, we estimate that the β-hairpin content of the selected peptide mixtures is roughly between 15 and 25%. Thus, if the peptides in the mixtures were either fully folded or fully unfolded, roughly one of every five peptides in the mixture would be fully folded. Of course, in reality we will have many partly folded sequences, but this estimate indicates that the number of sequence combinations involving our selected residues that are compatible with a β-hairpin structure must be quite large. Therefore, it seems that a locally random environment reports on the average β-hairpin interaction of an amino acid with all possible sequences. As a result, we are in principle selecting for those residues that are compatible in a β-hairpin structure with a large number of different residues. That the 36 defined sequences obtained from combining the selected residues are in general folded as β-hairpins with populations higher than 20% (except for those containing a proline) supports this idea. This fact is especially interesting from the point of view of making combinatorial libraries, because selection of these residues will help bias the mixtures toward the desired conformation.
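The dilution argument above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes an equimolar mixture and the idealized two-state picture used in the text, in which every sequence is either fully folded or fully unfolded.

```python
from fractions import Fraction


def single_sequence_fraction(n_residues: int = 19, n_random_positions: int = 3) -> Fraction:
    """Fraction of an equimolar mixture made up by one specific sequence
    when n_random_positions positions each vary over n_residues residues."""
    return Fraction(1, n_residues ** n_random_positions)


def folded_sequences_needed(hairpin_content: float,
                            n_residues: int = 19,
                            n_random_positions: int = 3) -> int:
    """Two-state estimate (an assumption): number of fully folded sequences
    required to give the observed average hairpin content of the mixture."""
    total = n_residues ** n_random_positions
    return round(hairpin_content * total)


if __name__ == "__main__":
    frac = single_sequence_fraction()
    print(f"One sequence is {frac} of the mixture ({float(frac):.4%})")
    # Bracket the 15-25% hairpin content estimated from the CD spectra.
    for content in (0.15, 0.20, 0.25):
        n = folded_sequences_needed(content)
        print(f"{content:.0%} content -> about {n} fully folded sequences out of {19 ** 3}")
```

Under these assumptions, a single folded sequence accounts for roughly 0.015% of the signal, whereas the estimated 15 to 25% content corresponds to over a thousand folded sequences per mixture, which is the sense in which the number of compatible combinations "must be quite large".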