ARTICLE IN PRESS
Journal of Theoretical Biology 241 (2006) 152–157 www.elsevier.com/locate/yjtbi
Relationships between the folding rate constant and the topological parameters of small two-state proteins based on general random walk model

Dong Mi (a,b,*), G.R. Liu (b,c), J.-S. Wang (b,d), Z.R. Li (c)

(a) Department of Physics, Dalian Maritime University, Dalian 116026, PR China
(b) Singapore-MIT Alliance (SMA), Singapore 117576, Singapore
(c) Department of Mechanical Engineering, Centre for Advanced Computations in Engineering Science, National University of Singapore, Singapore 119260, Singapore
(d) Department of Computational Science, National University of Singapore, Singapore 117543, Singapore

Received 10 June 2005; received in revised form 27 September 2005; accepted 10 November 2005
Available online 28 December 2005
Abstract

In this paper, we propose an analytically tractable model of protein folding based on a one-dimensional general random walk. A second-order differential equation for the mean folding time of a single protein is constructed, which can be used to derive the observed relationship between the folding rate constant and the number of native contacts. The parameters appearing in the model can be determined by fitting the theoretical prediction to the experimental result. In addition, taking into account the fact that the number of native contacts is almost proportional to the relative contact order, we can also explain the observed relationship between the folding rate constant and the relative contact order.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Protein folding; Folding rate; Contact order; Native contact; Random walk model
* Corresponding author. Department of Physics, Dalian Maritime University, Dalian 116026, PR China. E-mail addresses: [email protected], [email protected] (D. Mi).
0022-5193/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.jtbi.2005.11.011

1. Introduction

Protein folding is one of the most complicated problems in structural biology (see, e.g. Berendsen, 1998). There are two aspects to this problem: one is the prediction of the three-dimensional native structure of a protein from its sequence, and the other concerns the question of how a protein reaches its native (folded) structure from its denatured (unfolded) state. Although it is not yet possible to reliably predict the native structure of a protein from its sequence, our understanding of the mechanisms governing protein folding, which could also lead to better algorithms for structure prediction, has progressed considerably. Such progress was achieved through intensive experimental and theoretical studies of a broad class of small single-domain proteins (less than 110 amino acids) (see, e.g. Plaxco et al., 2000; Mirny and Shakhnovich, 2001; Myers and Oas, 2002). Experiments have shown that many small proteins fold with simple two-state kinetics, i.e., only the folded and unfolded states are stable, whereas all partially folded states are unstable (Jackson, 1998). This two-state feature makes small single-domain proteins ideal systems for studying the folding mechanism.

It is interesting to note that proteins not only spontaneously fold to a unique structure (Anfinsen, 1973), but also do so far more quickly than would be possible by randomly exploring all possible conformations of the unfolded state (Levinthal, 1968). Moreover, they fold with very different rate constants, ranging from $10^{5}$ to $10^{-1}\ \mathrm{s}^{-1}$ (Jackson, 1998). Thus, a method for the quantitative description of the time-scale of protein folding is useful for the study of the folding mechanism, and the successful candidate of protein folding
theories should also explain the six-orders-of-magnitude range of observed two-state folding rates.

From a thermodynamic standpoint, protein folding can be understood in terms of the free-energy change. Folding can occur when the combination of the enthalpy change and the entropy change of the whole system makes the overall free-energy change negative. To describe the folding rates of two-state proteins, one usually needs the mass-action model and a corresponding Arrhenius diagram (Schonbrun and Dill, 2003). However, both thermodynamics and kinetics are macroscopic theories, which cannot give microscopic insight into protein folding. To describe protein folding in detail, it is necessary to develop a microscopic theory that studies the folding behavior of a single protein. The simulation of protein folding in atomistic detail, however, has proven computationally overwhelming. The large majority of protein folding models have been based on highly simplified representations of the polypeptide chain rather than on a fully detailed protein, including lattice polymers and many off-lattice computational models. Although many of the simulation- and experiment-based microscopic theories of folding kinetics capture some of the potentially relevant aspects of real proteins to some extent, they generally cannot account for the vast range of folding rates observed in experiments (Gillespie and Plaxco, 2004). In addition, in the usual microscopic folding models, such as the "folding funnel" model (Leopold et al., 1992; Wolynes et al., 1995; Dill and Chan, 1997), the solvent–protein interactions are generally not taken into account explicitly, but rather by "renormalizing" the protein–protein interactions (Dill, 1999). This makes the protein's "internal free energy" difficult to calculate accurately. Recently, structure-based models have emerged, in which the native structures of proteins are used to infer the sequences of folding events (Weikl et al., 2004).
For the small two-state proteins, it has been observed that both the relative contact order (CO) and the number of native contacts (N) are significantly correlated with the logarithm of the folding rate constant (ln k) (Plaxco et al., 1998; Makarov et al., 2002). Here CO and N are both topological parameters of a protein in its native state, which refer to the set of non-covalent contacts. In this paper, we develop a simple model of protein folding based on a one-dimensional general random walk model (GRWM), which belongs to the class of structure-based methods. The model describes protein folding from the "kinematic viewpoint", which does not require the calculation of any thermodynamic quantities. This simple model allows us to illustrate a surprising number of essential aspects of small single-domain protein folding, such as the unique native structure, a large number of conformations, fast folding to the native state, and a cooperative (first-order-like) folding transition. In particular, the GRWM of protein folding can predict the observed relationship between the folding rate constant and the number of
native contacts. In addition, because the number of native contacts is almost proportional to the relative contact order, this model can also explain the observed relationship between the folding rate constant and the relative contact order.

2. Theoretical model

Although vast, complex interatomic interactions are at work during protein folding, some recent studies show that the folding rates and mechanisms appear to be largely determined by the topology of the native state (Baker, 2000). So, simple models based on the structure of the native state should be able to predict the overall features of protein folding to some extent. In this work, we assume that only native interactions are present in the protein folding process (Munoz and Eaton, 1999; Alm and Baker, 1999; Galzitskaya and Finkelstein, 1999) and regard protein folding as a random walk along a one-dimensional discrete chain with $N + 1$ lattice sites, where N is the number of native contacts of the folded protein. Suppose that the folding conditions are turned on (e.g. by removing denaturant) at time $t = 0$; then the conformation of the unfolded polypeptide chain evolves in time because of the stochastic interactions with solvent molecules. Two residues near enough in the sequence, say $i_1$ and $j_1$, can readily find each other through a small conformational search and form a contact $C(i_1, j_1)$. Then, $C(i_1, j_1)$ serves as a constraint, forcing two other residues $i_2$ and $j_2$ into spatial proximity and forming another contact $C(i_2, j_2)$. Of course, instead of forming $C(i_2, j_2)$ at the second step, $C(i_1, j_1)$ may also unravel and the polypeptide chain goes back to its initial state owing to the thermal motion of solvent molecules. If $C(i_2, j_2)$ is formed, it can further constrain the chain, bringing $i_3$ and $j_3$ into spatial proximity and forming contact $C(i_3, j_3)$.
Again, instead of forming $C(i_3, j_3)$ at the third step, $C(i_2, j_2)$ can also unravel and the polypeptide goes back to the state with only contact $C(i_1, j_1)$ present, etc. Once the Nth contact $C(i_N, j_N)$ is formed, the protein can quickly and irreversibly fold into its final native state owing to the interatomic "order interactions" in the polypeptide, such as static and van der Waals forces. This almost instantaneous process accounts for a negligible fraction of the total folding time and does not determine the folding rate. Thus, the state in which the Nth native contact is formed corresponds to the transition state of this model, and the folding time equals the time needed to form all N native contacts. We want to point out that although the native and the transition states have the same native contacts at the present coarse-grained level, their structures differ at the atomic level. During the rapid, free-energy downhill process from the transition state to the native state, many atoms of the polypeptide may quickly rearrange their relative positions owing to the interatomic interactions, but all these adjustments in atomic positions are too slight to change the N native contacts already formed.
The above protein folding picture is somewhat similar to the "HZ" model (Dill et al., 1993). However, the GRWM allows the contact zipper to "unzip" temporarily during the folding process. Before the transition state is formed, all the conformations are short-lived. As a result, the folding intermediates cannot be observed in experiments owing to their very low concentrations. The only visible macroscopic states are the initial state (with 0 contacts) and the final state (with N contacts). This is consistent with the two-state feature of small-protein folding observed in experiments.

Now we turn to the mathematical description of the above folding picture. Consider a one-dimensional discrete chain with a homogeneous distribution of $N + 1$ lattice sites, and let $\Delta x$ be the distance between any two adjacent sites. Suppose that these lattice sites represent the $N + 1$ states of a folding protein with different contact numbers. If no contact exists, the polypeptide is "located" at one terminal of the chain, say, the left end. When the polypeptide randomly "walks" to the right end of the chain, all N native contacts are formed. If the polypeptide has formed X contacts (X is an integer from 1 to $N - 1$) at a given time, then it is at position $x = X\,\Delta x$ of the chain. From this position, it will randomly "walk" to the right site $x + \Delta x$ with probability $r(x)$, or to the left site $x - \Delta x$ with probability $1 - r(x)$. For the present coarse-grained model of protein folding, since there is no sufficient reason to say that the time taken to form (or unravel) one contact is longer or shorter than the time taken to form (or unravel) another, we tentatively assume that it takes the same amount of time to form (or unravel) any contact. Furthermore, we assume that the time step to form a contact is equal to the time step to break one. Let $\Delta t$ be the time interval of each step; it is then independent of "position".
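The walk just described is simple to simulate. The following Python sketch is illustrative only and is not part of the original analysis: the function names, the number of walkers and the seed are assumptions. It further assumes a position-independent probability r and that a chain with no contacts always gains one at the next step, which corresponds to the reflecting left end adopted for the boundary condition at x = 0.

```python
import random


def folding_steps(n_contacts, r, rng):
    """Count the steps one walker needs to go from 0 to n_contacts contacts.

    Assumption: at X = 0 the walker always steps right (reflecting end);
    elsewhere it steps right with probability r and left with 1 - r.
    """
    x = 0
    steps = 0
    while x < n_contacts:
        if x == 0 or rng.random() < r:
            x += 1  # a contact forms
        else:
            x -= 1  # the last contact unravels
        steps += 1
    return steps


def mean_folding_time(n_contacts, r, dt, n_walkers=2000, seed=0):
    """Estimate the mean folding time as dt times the mean number of steps."""
    rng = random.Random(seed)
    total = sum(folding_steps(n_contacts, r, rng) for _ in range(n_walkers))
    return dt * total / n_walkers


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to show the call.
    print(mean_folding_time(n_contacts=20, r=0.47, dt=1.0))
```

Because the number of steps grows quickly with the number of contacts when r is below 1/2, a sketch of this kind is practical only for modest values of `n_contacts`.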
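The mean time $T(x)$ for the protein to randomly "walk" from position x to the right end n satisfies the equation

$$T(x) = \Delta t + r(x)\,T(x+\Delta x) + [1 - r(x)]\,T(x-\Delta x). \tag{1}$$

This equation can be rewritten as

$$\frac{T(x+\Delta x) - 2T(x) + T(x-\Delta x)}{(\Delta x)^2} + \frac{2r(x)-1}{1-r(x)}\,\frac{1}{\Delta x}\,\frac{T(x+\Delta x) - T(x)}{\Delta x} + \frac{1}{1-r(x)}\,\frac{\Delta t}{(\Delta x)^2} = 0. \tag{2}$$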
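The step from Eq. (1) to Eq. (2) is pure algebra. The sketch below spells it out, writing $T_\pm$ for $T(x \pm \Delta x)$, $T$ for $T(x)$ and $r$ for $r(x)$; this shorthand is introduced here for brevity and is not part of the original notation.

```latex
% Multiply the second difference by (1 - r) and eliminate T_- with Eq. (1),
% which gives (1 - r) T_- = T - \Delta t - r T_+ :
\begin{align*}
(1-r)\,(T_+ - 2T + T_-)
  &= (1-r)\,T_+ - 2(1-r)\,T + \bigl[\,T - \Delta t - r\,T_+\,\bigr] \\
  &= (1-2r)\,(T_+ - T) - \Delta t .
\end{align*}
% Move all terms to the left-hand side and divide by (1 - r)(\Delta x)^2:
\begin{equation*}
\frac{T_+ - 2T + T_-}{(\Delta x)^2}
+ \frac{2r-1}{1-r}\,\frac{1}{\Delta x}\,\frac{T_+ - T}{\Delta x}
+ \frac{1}{1-r}\,\frac{\Delta t}{(\Delta x)^2} = 0 .
\end{equation*}
```

The last line is Eq. (2), so Eq. (2) is an exact rearrangement of Eq. (1) and involves no approximation.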
It is worth noting that, in the limit of small steps, $\Delta x \to 0$, Eq. (2) becomes a second-order differential equation

$$\frac{d^2T(x)}{dx^2} + \frac{2r(x)-1}{1-r(x)}\,\frac{1}{\Delta x}\,\frac{dT(x)}{dx} + \frac{1}{1-r(x)}\,\frac{\Delta t}{(\Delta x)^2} = 0. \tag{3}$$

The above two equations can be solved for a given set of boundary conditions. Given the solution of Eq. (2) or (3), one can further determine the rate constant of protein folding, the inverse of the mean folding time. In this way, although we study protein folding from a microscopic viewpoint (single-protein folding), our results can provide a macroscopic description which can be directly tested by experiment. We would like to emphasize here that the one-dimensional discrete chain used in our model represents an abstract state space rather than a real chain. This is different from the case in our previous work (Chen et al., 1997), in which, based on Berg's random walk model (Berg, 1993), the GRWM was developed and used to study ATP-driven helicase translocation along DNA.

3. Results and comparison with experiments

In order to study the mean folding time of a single protein by using the GRWM, we first derive the relationship between the macroscopic rate constants of forming or breaking a contact and the microscopic probabilities of randomly moving to the right and to the left in the GRWM. For an ensemble of proteins, we assume that the number of proteins with X contacts at time t is $P(x,t)$; then $P(x,t)$ evolves in time during one time step according to the first-order rate law

$$\frac{dP(x,t)}{dt} = -[k_f(x) + k_d(x)]\,P(x,t), \tag{4}$$

where $k_f(x)$ and $k_d(x)$ are the mean rates of forming and breaking a contact at position x, respectively. Generally, these rates depend on the number of contacts which have been formed. The solution of the above equation is

$$P(x,t) = P(x,t_0)\,e^{-[k_f(x)+k_d(x)]\,t}, \tag{5}$$

where $P(x,t_0)$ is the number of proteins with X contacts at the initial time $t_0$ of one time interval from $t_0$ to $t_0 + \Delta t$. Correspondingly, for a single protein, the probability that the polypeptide keeps X contacts at time t is

$$p(x,t) = e^{-[k_f(x)+k_d(x)]\,t}. \tag{6}$$

When $t \to t_0 + \Delta t$ ($\Delta t$ is the time of each step in the GRWM of protein folding), $p(x,t) \to 0$. This means that a new contact was formed or the existing contact unravelled. Let the formation and dissociation probabilities for a contact at position x be $r_{k_f}(x)$ and $r_{k_d}(x)$, respectively; then

$$1 - p(x, t_0+\Delta t) = 1 = r_{k_f}(x) + r_{k_d}(x). \tag{7}$$

Noticing that $r_{k_f}(x) \propto [1 - e^{-k_f(x)\Delta t}]$ and $r_{k_d}(x) \propto [1 - e^{-k_d(x)\Delta t}]$, it is easy to see that

$$r_{k_f}(x) = \frac{1 - e^{-k_f(x)\Delta t}}{2 - e^{-k_f(x)\Delta t} - e^{-k_d(x)\Delta t}}, \qquad r_{k_d}(x) = \frac{1 - e^{-k_d(x)\Delta t}}{2 - e^{-k_f(x)\Delta t} - e^{-k_d(x)\Delta t}}. \tag{8}$$

We can see that the greater the rate constant, the greater the corresponding probability. From the two-state kinetics of protein folding we know that the partially folded conformations are thermodynamically unfavorable. This implies that the formation of more and more native contacts corresponds to "proceeding uphill in free energy", and hence the rate of breaking a
contact is higher than the rate of forming it for most contacts (Makarov and Metiu, 2002), i.e. $k_d(x) > k_f(x)$. Combining this and Eq. (8) results in

$$r_{k_d}(x) > r_{k_f}(x). \tag{9}$$

That is, the polypeptide "walks" to the right with a lower probability than it "walks" to the left in the GRWM of protein folding. Since there is a lack of experimental data and theoretical methods that can be used to determine the formation rate for each contact, as an approximation we use the mean formation rate to tentatively represent the formation rate for each contact. Similarly, we use the mean dissociation rate to represent the dissociation rate for each contact. Thus, both probabilities $r_{k_f}(x)$ and $r_{k_d}(x)$ are independent of position x in the model under consideration. Letting $r(x) = r_{k_f}(x) = r$, Eq. (3) becomes

$$\frac{d^2T(x)}{dx^2} - a\,\frac{dT(x)}{dx} + b = 0, \tag{10}$$

with $a = \frac{1-2r}{1-r}\,\frac{1}{\Delta x}$ and $b = \frac{1}{1-r}\,\frac{\Delta t}{(\Delta x)^2}$. Using Eqs. (7) and (9), one can see that $1 - r > r$. This gives $r < \tfrac{1}{2}$. So, both parameters a and b are greater than zero. This is very important in the case under consideration; otherwise Eq. (10) would have a very different solution (see Eq. (11)). The general solution of Eq. (10) is

$$T(x) = \frac{b}{a}\,x + C_1 e^{ax} + C_2, \tag{11}$$

where $C_1$ and $C_2$ are two constants which are determined by the boundary conditions. In the present model, a natural choice is to take an absorbing boundary at the right end n of the discrete chain, i.e. $T(n) = 0$. Considering that if no contact exists at a given time, the polypeptide will form the first contact at the next step, one can assume that $x = 0$ is a reflecting boundary. This implies that the mean time does not vary with x there, which gives $dT/dx = 0$ at $x = 0$ (Berg, 1993). Thus, the solution of Eq. (10) under the above boundary conditions is

$$T(x) = -\frac{1}{1-2r}\,\frac{\Delta t}{\Delta x}\,(n-x) + \frac{1-r}{(1-2r)^2}\,\Delta t\; e^{\frac{1-2r}{1-r}\frac{n}{\Delta x}} \left[1 - e^{-\frac{1-2r}{1-r}\frac{n-x}{\Delta x}}\right]. \tag{12}$$

According to the former hypothesis, letting $x = 0$ in Eq. (12) (with $n = N\,\Delta x$), we obtain the mean folding time of a protein as

$$T(0) = -\frac{1}{1-2r}\,\Delta t\,N - \frac{1-r}{(1-2r)^2}\,\Delta t + \frac{1-r}{(1-2r)^2}\,\Delta t\; e^{\frac{1-2r}{1-r}N}. \tag{13}$$

It is obvious that the folding time is independent of the parameter $\Delta x$, so it has disappeared from the above equation. Generally, no matter how r varies between 0 and $\tfrac{1}{2}$, compared with the third term, the first two terms in the above equation are too small to contribute considerably to $T(0)$. Neglecting the first two terms in Eq. (13), we have

$$T(0) \approx A_1 e^{A_2 N}, \tag{14}$$

where $A_1 = \frac{1-r}{(1-2r)^2}\,\Delta t$ and $A_2 = \frac{1-2r}{1-r}$. This result is not surprising. On the one hand, according to the kinetics of protein folding, the larger the number of rate-determining native contacts, the slower a protein folds. On the other hand, from the former analysis, one can see that the probability of forming a contact is smaller than that of breaking one. Combining these two conditions yields the following result: the folding time of a protein will increase quickly with the number of native contacts. This qualitative analysis is consistent with the above quantitative result: the folding time of a protein increases exponentially with the number of native contacts. The rate constant of protein folding is given by

$$k = \frac{1}{T(0)} = \frac{1}{A_1}\,e^{-A_2 N}. \tag{15}$$
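As a numerical illustration of Eqs. (13)–(15), the Python sketch below evaluates the full expression, its dominant-term approximation and the resulting rate constant. For comparison it also sums the discrete recurrence of Eq. (1) directly, taking one lattice step per contact and assuming that a chain with no contacts always gains one at the next step. The function names and sample values are assumptions made for illustration. The continuum and discrete results are related but need not coincide, since Eq. (3) replaces finite differences by derivatives.

```python
import math


def t_eq13(n, r, dt):
    """Mean folding time T(0) from Eq. (13); requires 0 < r < 1/2."""
    a1 = (1.0 - r) / (1.0 - 2.0 * r) ** 2 * dt
    a2 = (1.0 - 2.0 * r) / (1.0 - r)
    return -n * dt / (1.0 - 2.0 * r) - a1 + a1 * math.exp(a2 * n)


def t_eq14(n, r, dt):
    """Dominant-term approximation T(0) ~ A1 * exp(A2 * N), Eq. (14)."""
    a1 = (1.0 - r) / (1.0 - 2.0 * r) ** 2 * dt
    a2 = (1.0 - 2.0 * r) / (1.0 - r)
    return a1 * math.exp(a2 * n)


def k_eq15(n, r, dt):
    """Folding rate constant from Eq. (15), the inverse of Eq. (14)."""
    return 1.0 / t_eq14(n, r, dt)


def t_discrete(n, r, dt):
    """Mean first-passage time of the discrete walk from 0 to n contacts.

    d holds the mean number of steps needed to go from X to X + 1 contacts:
    d = 1 at X = 0 (assumed reflecting end), and for X >= 1
    d_X = (1 + (1 - r) * d_{X-1}) / r.
    """
    total_steps = 0.0
    d = 1.0
    for x in range(n):
        if x > 0:
            d = (1.0 + (1.0 - r) * d) / r
        total_steps += d
    return total_steps * dt


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to show the calls.
    n, r, dt = 50, 0.47, 1.0
    print(t_eq13(n, r, dt), t_eq14(n, r, dt), t_discrete(n, r, dt))
    print(k_eq15(n, r, dt))
```

How closely Eq. (14) tracks Eq. (13) depends on N and r, so it may be worth evaluating both for the range of N of interest.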
Taking the natural logarithm of both sides of Eq. (15), we obtain the following linear relationship between ln k and N:

$$\ln k = -\ln A_1 - A_2 N. \tag{16}$$
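Eq. (16) can be inverted for the two model parameters: given the intercept $-\ln A_1$ and slope $-A_2$ of a straight-line fit of ln k against N, the definitions of $A_1$ and $A_2$ give $r = (1 - A_2)/(2 - A_2)$ and $\Delta t = A_1 (1-2r)^2/(1-r)$. The Python sketch below is a minimal illustration of this inversion; the function name is an assumption, and the sample coefficients are the regression values reported for the 22-protein data set in Eq. (18).

```python
import math


def walk_parameters(intercept, slope):
    """Invert Eq. (16), ln k = -ln(A1) - A2 * N, for r and dt.

    intercept is -ln(A1) and slope is -A2, i.e. the coefficients of a
    linear fit of ln k against N.
    """
    a1 = math.exp(-intercept)
    a2 = -slope
    if not 0.0 < a2 < 1.0:
        raise ValueError("A2 must lie in (0, 1) so that 0 < r < 1/2")
    r = (1.0 - a2) / (2.0 - a2)                    # from A2 = (1 - 2r) / (1 - r)
    dt = a1 * (1.0 - 2.0 * r) ** 2 / (1.0 - r)     # from A1 = (1 - r) dt / (1 - 2r)^2
    return r, dt


if __name__ == "__main__":
    r, dt = walk_parameters(intercept=9.98, slope=-0.112)
    print(r, dt)
```

With these coefficients the result should land close to the values of r and $\Delta t$ quoted in Eq. (19); small differences are expected from rounding of the published coefficients.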
In addition, taking into account the fact that the number of native contacts is almost proportional to the relative contact order (the linear regression of N vs. CO using the experimental data shows this linear relationship; see Eq. (20)), from Eq. (16) we obtain

$$\ln k = B_1 + B_2\,\mathrm{CO}, \tag{17}$$

where $B_1$ and $B_2$ are constants for the selected protein data set. Eq. (17) indicates that ln k is linear in CO, which is the well-known observation by Plaxco et al. (1998).

Now, let us consider the corresponding experimental results. Using the measured rate constants and native structures for a set of 22 small two-state proteins (Table 1), we can see that ln k is linearly correlated with N,

$$\ln k = 9.98 - 0.112\,N \tag{18}$$
with a correlation coefficient R = 0.84. The p-value associated with the correlation, p < 0.0001, is extremely low, suggesting that the observed correlation is highly unlikely to have arisen by chance in the 22-member test set. (We assume that two residues in the folded protein are in contact if the straight-line distance between their $C_\alpha$ atoms is less than d, and if there are more than l residues between them along the chain. To calculate the number of native contacts N, we take the cutoffs d = 6.0 Å and l = 12. We found that cutoffs d from 4.0 to 8.0 Å, and l from 4 to 15, do not significantly affect the correlations described in this work.) Table 1 in Plaxco et al. (2000) contains 24 small proteins that fold with two-state kinetics. Two of these proteins were considered unsuitable for the case under
Table 1
List of the selected proteins in this article

Protein          PDB code   L     N     CO (%)   ln(k_obs)
cyt b562         256B       106   11     7.47    12.21
λ-repressor      1LMB        80    9     9.37    11.01
PSBD             2PDD        41    4    11.20     9.68
cyt c            1HRC       104   30    11.22     8.76
Im9              1CEI        85   14    12.07     7.28
ACBP             2ABD        86   25    13.99     6.57
N-terminal L9    1DIV        56   23    12.74     6.61
ubiquitin        1UBQ        76   38    15.11     7.35
CI-2             1CIS        64   49    16.40     4.03
U1A              1URN       102   52    16.91     5.83
ADAh2            1PBA        81   37    16.96     6.64
protein G        1PGB        56   20    17.30     5.67
protein L        2PTL        62   27    17.62     4.10
FKBP             1FKB       107   94    17.70     1.38
HPr              1POH        85   48    18.35     2.70
MerP             1AFI        72   50    18.90     0.60
mAcP             1APS        98   78    21.20     1.47
CspB             1CSP        67   45    16.40     6.54
TnFNIII          1TEN        90   77    17.35     1.06
TiI27            1TIT        89   63    17.82     3.48
fyn SH3          1SHF        59   38    18.28     4.54
twitchin         1WIT        93   98    19.70     0.41
The columns in this table are as follows: Protein, name of protein; PDB code, Protein Data Bank entry; L, number of residues in the protein used in the experimental study; N, number of native contacts; CO (%), relative contact order (%); ln(k_obs), natural logarithm of the experimental folding rate in water.

Fig. 1. The linear regressions of (A) ln k vs. N, (B) N vs. CO and (C) ln k vs. CO together with the experimental points for the 22 small proteins used in this study.

consideration and were excluded: myoglobin and villin 14T (>110 amino acids). If these two proteins were included, the linear correlation coefficient between ln k and N would be 0.78. Compared with the correlation coefficient 0.84, the correlation between ln k and N is evidently weaker than in the case excluding these two proteins. We speculate that, for the small two-state proteins with less than about 110 amino acids, the protein's topology may be the most important factor in determining the folding rate; for the proteins with more than about 110 amino acids, besides the protein's topology, other factors (such as the chain length, the stability of the protein, etc.) should be considered when discussing protein folding kinetics. The latter case goes beyond the present discussion. So, for the small, single-domain, two-state proteins (less than 110 amino acids), the random walk model of protein folding can predict well the observed relationship between the folding rate constant and the number of native contacts. Comparing Eqs. (16) and (18), we can determine the two constants in Eq. (16), and further obtain the two parameters r and $\Delta t$ in the model:

$$r = 0.47, \qquad \Delta t = 3.14 \times 10^{-7}\ \mathrm{s}. \tag{19}$$

As expected, the probability r is less than $\tfrac{1}{2}$. The inverse of $\Delta t$, which equals $3.18 \times 10^{6}\ \mathrm{s}^{-1}$, corresponds to the mean rate constant of forming and breaking a contact. Notice that, since $r \approx \tfrac{1}{2}$, the rate constant of forming a contact and that of breaking a contact should have the same order of magnitude. The order of magnitude of the rate calculated here
agrees well with that of the experimental observations (Bieri et al., 1999; Lapidus et al., 2000).

In addition, using the experimental data (Table 1), the linear regressions of N vs. CO and ln k vs. CO show that N is linearly related to CO,

$$N = -45.49 + 5.61\,\mathrm{CO}, \tag{20}$$

with a correlation coefficient 0.75, and that ln k is linearly related to CO,

$$\ln k = 19.48 - 0.91\,\mathrm{CO}, \tag{21}$$

with a correlation coefficient 0.91. In both cases, the p-values are smaller than 0.0001, suggesting that the observed correlations are highly credible. Fig. 1 shows the linear regressions of ln k vs. N, N vs. CO and ln k vs. CO together with the experimental points (Table 1).

4. Conclusions

In addition to deriving the observed relationship between the folding rate constant and the number of native contacts, the model analysed in this work can reproduce most of the essential aspects of small-protein folding. We hope that this simple model, with only a few hypotheses, may contribute to the understanding of the protein folding mechanism from a certain point of view. Of course, the GRWM deals with protein folding from the "kinematic viewpoint" rather than the "dynamic viewpoint". If the parameters in the model, namely the probabilities to form and break a contact and the duration of each step, can be obtained from fundamental physical principles, the present model will be further improved. Future work will also seek to merge the model into a theoretical framework that allows the enthalpy, entropy and free energy of protein folding to be estimated.

Acknowledgements

We acknowledge Profs. F. Jiang, L.H. Lai, Y.Z. Chen, D.E. Makarov and F.X. Han for their help and useful discussions. This work was partially supported by the Research Foundation of Singapore-MIT Alliance and the National Natural Science Foundation of China (Grant no. 10273004).

References

Alm, E., Baker, D., 1999. Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc. Natl Acad. Sci. USA 96, 11305–11310.
Anfinsen, C., 1973. Principles that govern the folding of protein chains. Science 181, 223–230.
Baker, D., 2000. A surprising simplicity to protein folding. Nature 405, 39–42.
Berendsen, H.J.C., 1998. A glimpse of the holy grail. Science 282, 642–643.
Berg, H.C., 1993. Random Walks in Biology. Princeton University Press, Princeton, p. 42.
Bieri, O., Wirtz, J., Hellrung, B., Schutkowski, M., Drewello, M., Kiefhaber, T., 1999. The speed limit for protein folding measured by triplet–triplet energy transfer. Proc. Natl Acad. Sci. USA 96, 9597–9601.
Chen, Y.Z., Mi, D., Song, H.S., Wang, X.J., 1997. General random walk model of ATP-driven helicase translocation along DNA. Phys. Rev. E 56 (1), 919–922.
Dill, K.A., 1999. Polymer principles and protein folding. Protein Sci. 8, 1166–1180.
Dill, K.A., Chan, H.S., 1997. From Levinthal to pathways to funnels: the "new view" of protein folding kinetics. Nat. Struct. Biol. 4, 10–19.
Dill, K.A., Fiebig, K.M., Chan, H.S., 1993. Cooperativity in protein-folding kinetics. Proc. Natl Acad. Sci. USA 90, 1942–1946.
Galzitskaya, O.V., Finkelstein, A.V., 1999. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc. Natl Acad. Sci. USA 96, 11299–11304.
Gillespie, B., Plaxco, K.W., 2004. Using protein folding rates to test protein folding theories. Annu. Rev. Biochem. 73, 837–859.
Jackson, S.E., 1998. How do small single-domain proteins fold? Folding Des. 3, R81–R91.
Lapidus, L.J., Eaton, W.A., Hofrichter, J., 2000. Measuring the rate of intramolecular contact formation in polypeptides. Proc. Natl Acad. Sci. USA 97, 7220–7225.
Leopold, P.E., Montal, M., Onuchic, J.N., 1992. Protein folding funnels: a kinetic approach to the sequence–structure relationship. Proc. Natl Acad. Sci. USA 89, 8721–8725.
Levinthal, C., 1968. Are there pathways for protein folding? J. Chim. Phys. 65, 44–45.
Makarov, D.E., Metiu, H., 2002. A model for the kinetics of protein folding: kinetic Monte Carlo simulations and analytical results. J. Chem. Phys. 116, 5205–5216.
Makarov, D.E., Keller, C.A., Plaxco, K.W., Metiu, H., 2002. How the folding rate constant of simple, single-domain proteins depends on the number of native contacts. Proc. Natl Acad. Sci. USA 99 (6), 3535–3539.
Mirny, L., Shakhnovich, E., 2001. Protein folding theory: from lattice to all-atom models. Annu. Rev. Biophys. Biomol. Struct. 30, 361–396.
Munoz, V., Eaton, W.A., 1999. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc. Natl Acad. Sci. USA 96, 11311–11316.
Myers, J.K., Oas, T.G., 2002. Mechanism of fast protein folding. Annu. Rev. Biochem. 71, 783–815.
Plaxco, K.W., Simons, K.T., Baker, D.J., 1998. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994.
Plaxco, K.W., Simons, K.T., Ruczinski, I., Baker, D., 2000. Topology, stability, sequence, and length: defining the determinants of two-state protein folding kinetics. Biochemistry 39, 11177–11183.
Schonbrun, J., Dill, K.A., 2003. Fast protein folding kinetics. Proc. Natl Acad. Sci. USA 100, 12678–12682.
Weikl, T.R., Palassini, M., Dill, K.A., 2004. Cooperativity in two-state protein folding kinetics. Protein Sci. 13, 822–829.
Wolynes, P.G., Onuchic, J.N., Thirumalai, D., 1995. Navigating the folding routes. Science 267, 1619–1620.