Nonplanar peptide bonds in proteins are common and conserved but not biased toward active sites

Donald S. Berkholzᵃ,ᵇ, Camden M. Driggersᵃ, Maxim V. Shapovalovᶜ, Roland L. Dunbrack, Jr.ᶜ, and P. Andrew Karplusᵃ,¹

ᵃDepartment of Biochemistry and Biophysics, Oregon State University, 2011 Agriculture and Life Sciences Building, Corvallis, OR 97331; ᵇDepartments of Physiology and Biomedical Engineering and Pediatric and Adolescent Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN 55905; and ᶜInstitute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111

The planarity of peptide bonds is an assumption that underlies decades of theoretical modeling of proteins. Peptide bonds strongly deviating from planarity are considered very rare features of protein structure that occur for functional reasons. Here, empirical analyses of atomic-resolution protein structures reveal that trans peptide groups can vary by more than 25° from planarity and that the true extent of nonplanarity is underestimated even in 1.2 Å resolution structures. Analyses as a function of the φ,ψ-backbone dihedral angles show that the expected value deviates by up to ±8° from planar as a systematic function of conformation, but that the large majority of variation in planarity depends on tertiary effects. Furthermore, we show that those peptide bonds in proteins that are most nonplanar, deviating by over 20° from planarity, are not strongly associated with active sites. Instead, highly nonplanar peptides are simply integral components of protein structure related to local and tertiary structural features that tend to be conserved among homologs. To account for the systematic φ,ψ-dependent component of nonplanarity, we present a conformation-dependent library that can be used in crystallographic refinement and predictive protein modeling.

Edited by Axel T. Brunger, Stanford University, Stanford, CA, and approved October 19, 2011 (received for review May 4, 2011)

PNAS ∣ January 10, 2012 ∣ vol. 109 ∣ no. 2 ∣ 449–453
BIOPHYSICS AND COMPUTATIONAL BIOLOGY
www.pnas.org/cgi/doi/10.1073/pnas.1107115108

omega torsion angle ∣ peptide planarity ∣ protein geometry ∣ kernel density regression ∣ strain

Author contributions: D.S.B. and P.A.K. designed research; D.S.B., C.M.D., and M.V.S. performed research; R.L.D. and M.V.S. developed kernel-regression methods; D.S.B. and P.A.K. analyzed data; and D.S.B. and P.A.K. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
¹To whom correspondence should be addressed. E-mail: [email protected]. edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1107115108/-/DCSupplemental.

The prediction of the dominant forms of secondary structure in proteins, α-helices and β-strands, was enabled by the simplifying assumption that the peptide bond was planar, consistent with its expected partial double-bond character and evidence from small-molecule crystal structures (1–3). Pauling et al. were aware that deformations from planarity associated with an energetic cost could occur, but the expectation was that the minimum-energy conformation was always planar (2, 4).

In proteins, the ω torsion angle measures peptide planarity, with ω = 180° and ω = 0° representing planar trans and cis peptides, respectively. In an early large-scale empirical study of peptide planarity, MacArthur and Thornton (5) found that in proteins determined at better than 2 Å resolution and in small-molecule peptides, the ω-distributions were Gaussian-like with averages of 179.6° (σ = 4.7°) and 179.7° (σ = 5.9°), respectively. These authors further proposed that the smaller spread seen in proteins was an artifact due to the planarity restraints used in crystallographic refinements. This study and that of Karplus (6) also showed that the average ω-value varies as a function of the conformation of the backbone torsion angles φ and ψ, with MacArthur and Thornton suggesting that the direction of nonplanarity was related to the handedness of the φ,ψ-associated chain twist (5).

As more structures were analyzed at ultrahigh (≤1.2 Å) resolutions (7), larger deviations from planarity have emerged (8–13). It has also been proposed that highly nonplanar residues are biased toward active sites (14), and a number of descriptions of protein structures emphasized nonplanar peptide bonds in the active site (14–17). The question of conformation dependence was revisited by Esposito et al. (8) using structures refined at better than 1.2 Å resolution, and a correlation with the handedness of the chain twist was not found. Instead, peptide planarity was seen to depend most strongly on the ψ torsion angle of the residue preceding the peptide bond in question (8), with additional influence caused by participation in an α-helix or a β-strand. The authors proposed that accounting for these variations by conformation-dependent crystallographic restraints would be beneficial (8).

In a related effort, we recently created the Protein Geometry Database (PGD; 18) and used it to document how protein backbone bond lengths and angles vary as a function of φ and ψ and to produce a backbone conformation-dependent library (CDL) for use in protein modeling (19). We further showed that using this CDL to move beyond the paradigm of a single, context-independent ideal geometry does greatly improve the behavior of crystallographic refinements (20). Here, we extend this CDL to include the nonplanarity of the peptide bond. In the course of the analysis, we gain additional insight into aspects of peptide nonplanarity that allow it to be viewed as a feature that is widely seen in folded proteins and heavily influenced by nonlocal interactions.

Results and Discussion

The Resolution Dependence of Observed Deviations from Planarity. Consistent with earlier studies, for nonredundant structures determined at 1.0 Å resolution or better (see Materials and Methods), the distribution of ω-values for trans peptides has σ = 6.3°, much broader than the σ = 4.8° distribution seen for structures determined at the lesser but still quite high resolution of 1.7 Å (Fig. 1A). This sizable increase in the standard deviation brings the 1 Å resolution structures to a spread on par with the deviation from planarity of σ = 5.9° seen in linear small-molecule peptides (5). What has not yet been documented is at which resolution the artifact due to planarity restraints used in refinement ceases to be a problem. Compared to the standard deviation of the distribution, a more sensitive measure of the effects of restraints is the number of highly deviating residues. This is because those residues incur the largest restraint penalties: for instance, a 20° outlier incurs a fourfold greater penalty pushing it back toward a planar conformation than a 10° outlier does, assuming a harmonic restraint such as is used in protein crystallography (23). Indeed, the fractions of peptides deviating by >10° or >20° from planarity are about twofold and threefold higher, respectively, for structures at 1 Å resolution compared with those at 1.6 Å resolution (Fig. 1B), and the electron density at the highest resolutions provides unambiguous evidence for the reality and the level of nonplanarity of such extreme outliers (Fig. 1 C and D).
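The scaling of a harmonic restraint with the size of the deviation can be illustrated with a minimal Python sketch. The function name `harmonic_penalty` and the restraint width `sigma_deg = 5.0` are hypothetical choices made only for illustration; they are not taken from any refinement program, and the ratio of penalties does not depend on the width chosen.

```python
def harmonic_penalty(omega_deg, target_deg=180.0, sigma_deg=5.0):
    """Harmonic restraint penalty ((omega - target) / sigma)^2 in arbitrary units.

    sigma_deg is a hypothetical restraint width used only for illustration.
    """
    # Wrap the difference into [-180, 180) so that, e.g., -170 deg and
    # 190 deg are both treated as 10 deg away from a 180 deg target.
    delta = (omega_deg - target_deg + 180.0) % 360.0 - 180.0
    return (delta / sigma_deg) ** 2


penalty_10 = harmonic_penalty(190.0)  # peptide 10 deg from planar
penalty_20 = harmonic_penalty(200.0)  # peptide 20 deg from planar

# Doubling the deviation quadruples a harmonic penalty.
print(penalty_20 / penalty_10)
```

Because the penalty grows with the square of the deviation, restraints act most strongly on exactly the residues counted in the >10° and >20° tallies, which is why those tallies are a sensitive probe of restraint artifacts.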

[Figure 1, panels A–D. (A) Observations vs. ω (°) for structures at 0–1.0 Å and 1.66–1.69 Å resolution. (B) % residues with |ω − 180°| > 10° and > 20° vs. resolution (Å).]

Fig. 1. Observed nonplanarity in peptide bonds increases at atomic resolution. (A) Histogram of the distribution of ω angles for two resolution ranges (≤1.00 Å and 1.66–1.69 Å). Observation numbers and means of the two histograms are 32,549/179.1° and 17,001/179.3°, respectively. (B) The percent of general peptides that are modeled as highly nonplanar (≥10° and ≥20° as noted in figure) is plotted as a function of resolution in 0.1 Å resolution slices. (C, D) Two well-defined highly nonplanar peptide bonds (one rotated in each direction) located outside of protein active sites. Shown are the peptide bonds between (C) residues Ile102-Asp103 from a carboxylic esterase (PDB code 1qlw) (21) with (ω − 180°) = −26° (electron density at 6.0 ρrms), and (D) residues Asp105-Asn106 from a β-glycoside hydrolase (PDB code 7a3h) (22) with (ω − 180°) = 23° (electron density at 4.6 ρrms). In a planar peptide bond, all five atoms would lie in the plane shown in gray. Searches were done with the PGD (18) for dipeptides, using a 90% sequence-identity threshold and otherwise default search parameters.

Assuming that the proteins in each resolution bin have similar behavior in terms of nonplanarity, a surprising finding is that even at the normally used ultrahigh resolution threshold of 1.2 Å, crystal structures still underestimate the numbers of peptides that have deviations from planarity of >10° and >20°, with the numbers seen at the highest resolutions being ca. 30% and 100% greater, respectively. It is not until ∼0.9–1.0 Å resolution that the curves level out. At 0.9 Å resolution the number of observations (at ∼6,900 residues) is still large enough to be considered broadly representative, so we suspect the increase in outlier observation between 1.0 Å and 0.9 Å is real. The fewer observations at 0.8 Å (∼1,900 residues) and especially 0.7 Å (∼500 residues) lead us not to propose a more stringent resolution cutoff associated with the reliable determination of extreme outlier ω-values.

The Local Conformation Dependence of Observed Deviations from Planarity. To analyze the dependence of peptide planarity on backbone φ,ψ-angles, we used a dataset of 28,917 well-defined 3-residue segments from diverse protein chains determined at 1.0 Å resolution or better (see Materials and Methods) and carried out separate statistical analyses for eight groups of residues (Gly, Pro, Ile/Val, other “general” residues, and each of these groups preceding Pro) as well as control calculations that grouped all residues together and all prePro residues together (19). Although even 1 Å resolution structures may not have fully accurate ω-values for extreme outliers, we have chosen this 1 Å resolution cutoff as a trade-off that provides sufficient numbers of observations to carry out a φ,ψ-dependent analysis while at the same time providing sufficiently accurate ω-values for extreme outliers. Because a peptide bond resides halfway between two residues, we assessed the φ,ψ-dependence of the planarity of both the peptide bonds before and after the central residue (residue 0):

Xaa₋₁ – ωbefore – Xaa₀ – ωafter – Xaa₊₁

With this nomenclature, ωbefore is the omega-angle traditionally assigned as belonging to residue 0. For observing conformation-dependent trends, we used kernel density regression with periodic von Mises functions as a method for achieving smooth local regressions as a function of φ and ψ (24). As seen in Fig. 2 for general residues, the variation of ωbefore is largely φ-dependent with a pattern of vertical stripes, and for ωafter the dependence is mostly on ψ, resulting in horizontal stripes. For both peptide units, the conformation-dependent averages vary over ∼16–17°, yet the standard deviation within each conformation remains near 6°, close to the 6.3° standard deviation of the overall distribution. For other residue types (i.e., Ile/Val, Gly, Pro, prePro) the variations as a function of conformation show similar trends yet include distinct features (see Figs. S1, S2, S3, and S4).
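The idea of a periodic kernel regression over φ and ψ can be sketched in a few lines of Python. This is a simplified, fixed-bandwidth weighted average with a product of von Mises kernels, not the adaptive procedure of ref. 24; the function names, the concentration `kappa = 50.0`, and the four toy observations are all hypothetical and serve only to show the mechanics.

```python
import math


def von_mises_weight(phi0, psi0, phi, psi, kappa=50.0):
    """Unnormalized product of von Mises kernels centred on (phi0, psi0), in degrees.

    kappa is a hypothetical fixed concentration; larger values give narrower kernels.
    """
    dphi = math.radians(phi - phi0)
    dpsi = math.radians(psi - psi0)
    # exp(kappa * (cos d - 1)) equals 1 at d = 0 and is periodic in 360 deg.
    # Normalization constants cancel in the weighted averages below.
    return math.exp(kappa * (math.cos(dphi) - 1.0) + kappa * (math.cos(dpsi) - 1.0))


def kernel_regress_omega(phi0, psi0, observations, kappa=50.0):
    """Kernel-weighted mean and standard deviation of omega at (phi0, psi0).

    observations: list of (phi, psi, omega) tuples in degrees, for trans
    peptides with omega expressed on a 0..360 scale (planar = 180).
    """
    weights = [von_mises_weight(phi0, psi0, phi, psi, kappa)
               for phi, psi, _ in observations]
    devs = [omega - 180.0 for _, _, omega in observations]
    total = sum(weights)
    mean_dev = sum(w * d for w, d in zip(weights, devs)) / total
    var = sum(w * (d - mean_dev) ** 2 for w, d in zip(weights, devs)) / total
    return 180.0 + mean_dev, math.sqrt(var)


# Toy observations (phi, psi, omega); made-up numbers, not data from this study.
toy = [(-60.0, -45.0, 182.0), (-65.0, -40.0, 178.0),
       (-120.0, 130.0, 170.0), (179.0, -179.0, 185.0)]

mean_omega, sd_omega = kernel_regress_omega(-60.0, -45.0, toy)
print(f"omega estimate near (-60, -45): {mean_omega:.1f} +/- {sd_omega:.1f}")
```

Using cosines of the angular differences makes the weights wrap correctly at ±180°, so a query at φ = 180° gives the same estimate as one at φ = −180°, which a Gaussian kernel on raw angles would not.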


Fig. 2. Conformation-dependent variation in the planarity of peptide bonds for general residues. Ramachandran plots of the averages (A) and standard deviations (B) are shown for ωbefore and ωafter as a function of the φ,ψ-angles of residue 0, and for ωafter as a function of the ψ of residue 0 and φ of residue +1 (i.e., ωbetween; right boxes). Within each plot, colors indicate ω values ranging from the global minimum (blue) to the global maximum (red) as calculated using kernel density regression (see SI Methods). The global minimum and maximum are provided in each plot. With ∼90% of the data in bins having ≥36 observations (N), the standard errors of the means (s/√N) are below 1° for the large majority of residues.
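The standard-error statement in the caption can be checked with one line of arithmetic. The sketch below assumes a within-bin standard deviation of about 6°, the typical value quoted in the text; the function name is hypothetical.

```python
import math


def standard_error(s_deg, n_obs):
    """Standard error of the mean, s / sqrt(N), in degrees."""
    return s_deg / math.sqrt(n_obs)


# Assumed typical within-bin standard deviation (about 6 deg) at the
# N = 36 threshold mentioned in the caption.
print(standard_error(6.0, 36))
```

At N = 36 the standard error sits at 1°, and it falls further as N grows, so conformation-dependent averages that differ by several degrees are well resolved even though individual ω-values scatter widely.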


Given these patterns of dependence, the peptide planarity varies mostly with ψ of the preceding residue and φ of the following residue. Focusing on the peptide unit following the central residue (i.e., ωafter), the dependence of its planarity on these two torsion angles can be visualized with a φ₊₁,ψ₀ plot. Although this plot (called ωbetween because the analyzed peptide bond is between the two torsion angles being varied) shows a smaller total spread of only 13.5°, it appears that the extreme values are more centrally located in populated regions. Also, the standard deviations, while similar, are a few tenths of a degree smaller throughout. This plot also shows that the ψ₀-dependence appears to dominate over the φ₊₁-dependence (i.e., the main variations occur with ψ₀ so plots have horizontal stripes of relatively constant ω), so for generating a local CDL we expect that using either ωafter or ωbetween will lead to the highest predictive power; ωbetween will likely have somewhat higher predictive power, because it has both dihedrals adjacent to the peptide in question. Despite the large standard deviations, histograms of the ωbetween distributions for selected regions with high and low averages emphasize their distinct natures (Fig. 3A). Also rather striking is that the distribution of the φ,ψ-dependent averages shows distinct maxima ∼4° to either side of 180° in addition to a main peak near 179° (Fig. 3B).

To assess the impact of secondary structure on nonplanarity, we carried out separate analyses of ωbetween for residues adopting α-region φ,ψ-angles but not residing in helices and for residues adopting β-region φ,ψ-angles but not in β-strands (Fig. 3A). For the non-α-helical subpopulation, the average ω-value shifted from 180° to 183° and the σ rose from 2.5° to 3.9°. For the non-β-strand subpopulation, the average ω-value shifted from 172° to 176° with little change in the spread of the distribution (σ of 6.9° vs. 6.7°). Thus secondary structure formation causes a systematic ∼3°–4° adjustment in the expected ω-values (in one case closer to planar and in the other case away from planar) that modulates the ∼15° range correlated with variations in φ and ψ.

The observation of high (∼6°) standard deviations for the individual φ,ψ-bins contrasts strongly with the behavior of backbone bond angles, for which the standard deviations of the conformation-dependent distributions (at ∼1.0°–1.5°) were about half of the standard deviations seen for the population as a whole (19). This distinct behavior of ω, with values within each φ,ψ-bin spanning ∼25° (which is ±2σ with σ ≈ 6°), implies that longer-range interactions are playing a dominant role in influencing individual ω-values. This implication is consistent with a quantum mechanics study of peptide planarity in which calculated φ,ψ-associated deviations did not match closely with the ω-values in the small protein crambin (25). In contrast, the authors reported that quantum mechanics calculations done for each residue in the context of the whole protein produced much better agreement. In this light, the near 3° standard deviation of residues in α-helices can be explained by their highly consistent longer-range context even compared with β-strands. Interestingly, the ∼3° standard deviation seen for α-helices may be limited by coordinate accuracy, as it roughly matches the estimated uncertainty of ω-values in the 1.2 Å resolution crystal structure of ribonuclease (13) and the agreement between noncrystallographic symmetry related extreme nonplanar peptides in this study (see Table S1).

Given the strong dependence of planarity on tertiary factors, the first-generation ω-CDL we generate here will not capture the full diversity of ω-values in proteins. Nevertheless, such a CDL is still a valuable step forward compared to a universal target value of 180°. To decide which parameters to use in this first-generation ω-CDL, a set of trial CDLs was generated and tested for predictive power (Table S2). As expected, both ωafter and ωbetween strongly outperformed ωbefore. For CDLs using ωafter or ωbetween, and with or without residue classes, the performance differences are smaller, but overall the CDL based on ωbetween and using classes of residue types performed best (Table S2). Fig. 3C illustrates the systematically improved agreement of this ωbetween-CDL with the observed ω-values in protein structures. The slope near 1 shows that the CDL, as expected based on how it was developed, is a good match to the averages of the observed values. However, the large spread in ωexptl at each ωCDL value indicates the dominant impact of tertiary factors. The coefficient of determination of ∼0.20–0.25 implies that the local conformational dependence accounts for about one-quarter of the total variation.

Fig. 3. Properties and implications of conformation-dependent ω deviations. (A) Observed deviations from planarity (ωbetween − 180°) are shown for observations from selected 10° × 10° bins in three regions of the bordering torsion angles [ψ₀/φ₊₁]: α-helical residues in the region [−45 ± 5/−65 ± 5]; β-strand residues in the region [+155 ± 5/−115 ± 5]; and all residues in the region [−45 ± 5/−95 ± 5]. In addition, distributions for residues in the first two bins but not in α-helices or β-strands, respectively, are shown. The distributions, based on structures at ≤1.2 Å resolution, are normalized to have the same area and smoothed using Gaussian kernel-density estimates with a 1.5° bandwidth. (B) The distribution of median values for peptide nonplanarity (ωbetween − 180°) seen in the 10° × 10° φ,ψ bins. Distributions are treated as in A, with a 0.5° bandwidth. (C) The predictive power of a CDL-estimated ωbetween (ωCDL), plotted as the observed deviation from planarity for ωbetween (ωexptl − 180°) vs. ωCDL-predicted nonplanarity (ωCDL − 180°). In contrast, fixed 180° predictions would collapse all data to x = 0. Plotted are observations in the ωexptl range shown as well as the best-fit linear regression (black line and equation), which has a standard uncertainty in the slope of 0.01. The coefficient of determination indicates that the CDL accounts for ∼20–25% of nonplanarity. The slope of >1.0 results from extreme deviations in less-populated regions being damped by the kernel density estimate fitting; this damping is intended to create a better predictive model. As an estimate of the gain in predictive accuracy, comparing use of the CDL vs. a fixed prediction of 180°, the ω rmsd of the models from the reference is 5.6° vs. 6.3°. (D) Conceptual illustration of how a shift in the minimum of a harmonic energy well for nonplanarity with no change in its width would make large nonplanarities accessible at a lower computed strain energy. The true nature of the potential functions in relevant environments is unknown, but for the illustration we use a generic energy form previously suggested [Energy = A sin² ω, with A = 30 kcal/mol (2, 4)], and show minimum-energy ω-values shifted 10° (red, blue) from 180° (gray). This conceptual illustration shows how, all other things being equal, the 10°-shifted potentials enable a nonplanarity of 30° to be reached at a 4 kcal/mol lower computed energetic cost relative to the minimal energy in the relevant environment.
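The 4 kcal/mol figure quoted for Fig. 3D can be reproduced with a short Python calculation. The sketch assumes that the shifted wells have the form E(ω) = A sin²(ω − ωmin), which is one reading of "a shift in the minimum with no change in width" and is not stated explicitly in the caption; the function and constant names are hypothetical.

```python
import math

A_KCAL_PER_MOL = 30.0  # amplitude quoted in the Fig. 3D caption


def strain_energy(omega_deg, omega_min_deg=180.0, a=A_KCAL_PER_MOL):
    """Generic strain energy A * sin^2(omega - omega_min), in kcal/mol.

    The shifted form is an assumption made for this illustration.
    """
    return a * math.sin(math.radians(omega_deg - omega_min_deg)) ** 2


# Cost of reaching a nonplanarity of 30 deg (omega = 210 deg) when the
# well minimum is at 180 deg versus shifted by 10 deg to 190 deg.
unshifted = strain_energy(210.0, omega_min_deg=180.0)
shifted = strain_energy(210.0, omega_min_deg=190.0)

print(f"unshifted well: {unshifted:.2f} kcal/mol")
print(f"shifted well:   {shifted:.2f} kcal/mol")
print(f"difference:     {unshifted - shifted:.2f} kcal/mol")
```

With the minimum at 180° the cost is 30 × sin²(30°) = 7.5 kcal/mol, whereas with the minimum at 190° it is 30 × sin²(20°) ≈ 3.5 kcal/mol, a difference of about 4 kcal/mol, in line with the caption.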
Also, the tendency of the most extreme deviations to occur in the same direction as the local effects supports the further insight that the conformation-dependent shifts in the expected value of ω enable the larger deviations to occur at a much lower computed energetic cost (Fig. 3D).

Extreme Deviations from Planarity Tend to Be Conserved but Do Not Favor Functional Sites. To investigate the conservation and functional significance of the most extreme examples of peptide nonplanarity, we searched the PGD (18) for peptides ≥20° from planar using a slightly less stringent resolution criterion of ≤1.2 Å. Manual inspection of the electron-density evidence for each of the occurrences yielded 116 examples of proven reliability (Table S1). We assessed evolutionary conservation by finding the subset of these proteins for which a homolog was also known at ≤1.2 Å resolution. This search yielded homologs (having ∼25–50% sequence identity) for eight proteins (from five protein families) that included 16 of the 116 highly deviating peptides. For 15 of the 16 cases, the local backbone conformation is conserved and the equivalent peptide in the homolog is strongly nonplanar in the same direction, greater than 9° in every case with a median value of 16° (Table S3). For seven of these, the high deviation from planarity is maintained despite mutation of the residue. For one of the 16 cases [PDB code 1o5x:Phe150 (26–28)], the local backbone conformation in the homolog changed, and the nonplanarity was not conserved.

Viewing the distribution of these ω-outliers in the protein structures, we were surprised that the large majority of them, 13 of 16, were not associated with the protein’s active site (Fig. 4). To carry out a more general assessment of any correlation between the most extreme nonplanar peptides and functional sites in proteins, we used the Sequence Annotated by Structure (SAS) resource (31). Automated searches for all 116 ω-outliers showed no significant enrichment (at p ≤ 0.05) at functional sites compared to a control set of randomly chosen residues (Table S4).

Interestingly, a consideration of the secondary structural context of the ω-outliers reinforces the idea that secondary structure is not a strong determinant of peptide nonplanarity. Considering the central tripeptide residue (i.e., residue 0) of each of the reliable ω-outliers, all secondary structure types are represented: 65 are in β-structure, 12 in α-/3₁₀-helices, 14 in H-bonded turns, 11 in non-H-bonded bends, and 14 have no defined secondary structure (Table S1). Interestingly, all five secondary structure types include ω-outliers on both sides of 180°, proving that within a given secondary structure context, tertiary factors can cause ω to vary over 40°.

Outlook. In this work, we have conclusively shown that peptide nonplanarity is a common, even mundane, feature of proteins that is distributed throughout their structures, and it is not in general a marker for functional sites. The perceived association with active sites appears to be due to a bias in what has been noticed rather than reflecting what exists. Indeed, the overwhelming majority of the extreme outliers we studied [including those in Fig. 1 C and D (20, 21)] were not mentioned in the original structure reports. We also show that using current refinement methodologies, better than 1 Å resolution data are required to accurately model the most extreme outliers and that based on such structures, a generic protein will have on the order of 10–15% of general residues deviating ≥10° from planarity with occasional residues deviating over 30° from planarity. When backbone path is conserved, such extreme ω-deviations also tend to be conserved. One factor that makes such extreme deviations more energetically accessible is the φ,ψ-dependent shift in the thermodynamically most stable ω-value (Fig. 3D). We have documented these φ,ψ-dependent shifts in a first-generation ω-CDL. As was seen for a backbone bond length and angle CDL (19), the implementation of this CDL should help with the accuracy of protein modeling, even though in this case the local effects only capture a modest portion of the variations in planarity. As specific longer-range effects that influence peptide planarity are discovered and the effects of secondary structure are more fully worked out, these can be incorporated into future more general “context-dependent” restraint libraries. Finally, the prevalence of widespread and substantial deviations from planarity in proteins supports the view that the exquisite packing of folded proteins is not as ideal as it appears to the eye. Instead, folded protein structures are filled with hidden strain (6) and are a dynamic ensemble of many similar energy structures that are “minimally frustrated,” but nevertheless frustrated (32, 33).

Fig. 4. Highly nonplanar residues are not dominantly present in active sites. For five protein families having extreme ω-outliers and at least two divergent members analyzed at atomic resolution (Table S1), a backbone ribbon is shown with ≥20° ω-outlier residues (red sticks) labeled and the active site region identified by a bound ligand (cyan sticks). (A) Penicillinopepsin at 0.95 Å resolution (PDB code 1bxo) (29). (B) Triose phosphate isomerase at 1.10 Å resolution (PDB code 1n55) (27). (C) Cellulase 6A at 1.11 Å resolution (PDB code 1oc7) (30). (D) Nitrophorin at 1.10 Å resolution (PDB code 1pm1). (E) β-glucosidase at 0.99 Å resolution (PDB code 1ug6). A stereoview of each of these molecules that in addition has the backbone ribbon colored by ω from 160° (red) to 200° (blue) is provided as Fig. S5. Those images provide additional visualization of the lack of correlation of ω-variations and active sites.

Materials and Methods

Quantifying φ,ψ-Dependent Variations in Peptide Planarity. The φ,ψ-dependent variations in ω were derived in the same way as were the φ,ψ-dependent variations in bond lengths and angles by Berkholz et al. (19). Briefly, a PGD (18) search of structures determined at ≤1.0 Å resolution with a maximum sequence identity of 25% as determined by the PISCES (34) 06-18-2011 dataset resulted in 28,917 well-ordered three-residue segments (from 204 protein chains) with average main-chain, side-chain, and Cγ B-factors below 25 Å². The systematic ω-variations are represented using a smoothing technique called kernel density regression (SI Methods), which lacks the artifacts caused by binning.

Creation and Analysis of a Set of Extreme ω Outliers. The set of extreme ω-outliers was created by a PGD (18) search similar to that above but using a ≤1.2 Å resolution for three-residue segments with ωafter ≥ 20° from planarity (performed in July 2009). For each of the 66 proteins containing an ω-outlier, a BLASTP (35) search of the Protein Data Bank (SI Methods) was used to identify all homologs with structures determined at 1.2 Å resolution or better. Automated searches of the SAS (31) server were carried out using the wsSAS interface (36) (SI Methods). For each homolog, the two residues bordering the ω-outlier (i.e., positions “0” and “+1”) were searched for all functional annotations. The control consisted of equivalent searches based on five randomly chosen peptides in the same protein chain.

Library Availability. The ω-CDL is freely available at http://dunbrack.fccc.edu/ and http://proteingeometry.sourceforge.net/.

ACKNOWLEDGMENTS. This work was supported by National Institutes of Health (NIH) Grant R01-GM083136 (to P.A.K.), NIH grants P20-GM76222 and R01 GM84453 (to R.L.D.), and an American Heart Association (Midwest affiliate) postdoctoral fellowship (to D.S.B.).

1. Pauling L, Corey RB (1951) Configurations of polypeptide chains with favored orientations around single bonds: two new pleated sheets. Proc Natl Acad Sci USA 37:729–740.
2. Corey RB, Pauling L (1953) Fundamental dimensions of polypeptide chains. P R Soc Lond B 141:10–20.
3. Corey RB, Branson HR, Pauling L (1951) The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci USA 37:205–211.
4. Edison AS (2001) Linus Pauling and the planar peptide bond. Nat Struct Biol 8:201–202.
5. MacArthur MW, Thornton JM (1996) Deviations from planarity of the peptide bond in peptides and proteins. J Mol Biol 264:1180–1195.
6. Karplus PA (1996) Experimentally observed conformation-dependent geometry and hidden strain in proteins. Protein Sci 5:1406–1420.
7. Schmidt A, Lamzin VS (2002) Veni, vidi, vici - atomic resolution unraveling the mysteries of protein function. Curr Opin Struct Biol 12:698–703.
8. Esposito L, De Simone A, Zagari A, Vitagliano L (2005) Correlation between ω and ψ dihedral angles in protein structures. J Mol Biol 347:483–487.
9. Esposito L, Vitagliano L, Zagari A, Mazzarella L (2000) Pyramidalization of backbone carbonyl carbon atoms in proteins. Protein Sci 9:2038–2042.
10. Herzberg O, Moult J (1991) Analysis of the steric strain in the polypeptide backbone of protein molecules. Proteins 11:223–229.
11. Kang BS, Devedjiev Y, Derewenda U, Derewenda ZS (2004) The PDZ domain of syntenin at ultra-high resolution: bridging the gap between macromolecular and small molecule crystallography. J Mol Biol 338:483–493.
12. Longhi S, Czjzek M, Lamzin V, Nicolas A, Cambillau C (1997) Atomic resolution (1.0 Å) crystal structure of Fusarium solani cutinase: stereochemical analysis. J Mol Biol 268:779–799.
13. Sevcik J, Dauter Z, Lamzin VS, Wilson KS (1996) Ribonuclease from Streptomyces aureofaciens at atomic resolution. Acta Crystallogr D 52:327–344.
14. Merritt EA, et al. (1998) The 1.25 Å resolution refinement of the cholera toxin B-pentamer: evidence of peptide backbone strain at the receptor-binding site. J Mol Biol 282:1043–1059.
15. Lawson CL (1996) An atomic view of the L-tryptophan binding site of trp repressor. Nat Struct Biol 3:986–987.
16. Xu Q, Buckley D, Guan C, Guo HC (1999) Structural insights into the mechanism of intramolecular proteolysis. Cell 98:651–661.
17. Dobson RCJ, et al. (2008) Conserved main-chain peptide distortions: a proposed role for Ile203 in catalysis by dihydrodipicolinate synthase. Protein Sci 17:2080–2090.
18. Berkholz DS, Krenesky PB, Davidson JR, Karplus PA (2010) Protein Geometry Database: a flexible engine to explore backbone conformations and their relationships to covalent geometry. Nucleic Acids Res 38:D320–325.
19. Berkholz DS, Shapovalov MV, Dunbrack RL, Jr, Karplus PA (2009) Conformation dependence of backbone geometry in proteins. Structure 17:1316–1325.
20. Tronrud DE, Berkholz DS, Karplus PA (2010) Using a conformation-dependent stereochemical library improves crystallographic refinement of proteins. Acta Crystallogr D 64:834–842.
21. Sevrioukova IF, Li H, Poulos TL (2004) Crystal structure of putidaredoxin reductase from Pseudomonas putida, the final structural component of the cytochrome P450cam monooxygenase. J Mol Biol 336:889–902.
22. Davies GJ, et al. (1998) Snapshots along an enzymatic reaction coordinate: analysis of a retaining β-glycoside hydrolase. Biochemistry 37:11707–11713.
23. Evans PR (2007) An introduction to stereochemical restraints. Acta Crystallogr D 63:58–61.
24. Shapovalov MV, Dunbrack RL, Jr (2011) A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19:844–858.
25. Ramek M, Yu C-H, Sakon J, Schafer L (2000) Ab initio study of the conformational dependence of the nonplanarity of the peptide group. J Phys Chem A 104:9636–9645.
26. Parthasarathy S, Eaazhisai K, Balaram H, Balaram P, Murthy MR (2003) Structure of Plasmodium falciparum triose-phosphate isomerase-2-phosphoglycerate complex at 1.1-Å resolution. J Biol Chem 278:52461–52470.
27. Kursula I, Wierenga RK (2003) Crystal structure of Triosephosphate isomerase complexed with 2-phosphoglycolate at 0.83-Å resolution. J Biol Chem 278:9544–9551.
28. Jogl G, Rozovsky S, McDermott AE, Tong L (2003) Optimal alignment for enzymatic proton transfer: structure of the Michaelis complex of triosephosphate isomerase at 1.2-Å resolution. Proc Natl Acad Sci USA 100:50–55.
29. Khan AR, et al. (1998) Lowering the entropic barrier for binding conformationally flexible inhibitors to enzymes. Biochemistry 37:16839–16845.
30. Varrot A, et al. (2003) Structural basis for ligand binding and processivity in cellobiohydrolase Cel6A from Humicola insolens. Structure 11:855–864.
31. Milburn D, Laskowski RA, Thornton JM (1998) Sequences annotated by structure: a tool to facilitate the use of structural information in sequence analysis. Protein Eng 11:855–859.
32. Panchenko AR, Luthey-Schulten Z, Wolynes PG (1996) Foldons, protein structural modules, and exons. Proc Natl Acad Sci USA 93:2008–2013.
33. Zhuravlev PI, Papoian GA (2010) Functional versus folding landscape: the same yet different. Curr Opin Struct Biol 20:16–22.
34. Wang G, Dunbrack RL, Jr (2005) PISCES: recent improvements to a PDB sequence culling server. Nucleic Acids Res 33:W94–98.
35. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410.
36. Talavera D, Laskowski RA, Thornton JM (2009) WSsas: a web service for the annotation of functional residues through structural homologues. Bioinformatics 25:1192–1194.