Fluctuations in the DNA double helix.

Michel Peyrard, Santiago Cuesta López
Laboratoire de Physique, École Normale Supérieure de Lyon, 46 allée d’Italie, 69364 Lyon Cedex 07, France.

Dimitar Angelov
Laboratoire Joliot Curie, École Normale Supérieure de Lyon, 46 allée d’Italie, 69364 Lyon Cedex 07, France.

(Dated: January 30, 2007)

DNA is not the static entity suggested by the famous double helix structure. It shows large fluctuational openings, in which the base pairs, which contain the genetic code, are temporarily open. Therefore it is an interesting system to study the effect of nonlinearity on the physical properties of a system. A simple model for DNA, at a mesoscopic scale, can be investigated by computer simulation, in the same spirit as the original work of Fermi, Pasta and Ulam. These calculations raise fundamental questions in statistical physics because they show a temporary breaking of equipartition of energy, regions with large amplitude fluctuations being able to coexist with regions where the fluctuations are very small, even when the model is studied in the canonical ensemble. This phenomenon can be related to nonlinear excitations in the model. The ability of the model to describe the actual properties of DNA is discussed by comparing theoretical and experimental results for the probability that base pairs open at a given temperature in specific DNA sequences. These studies give us indications on the proper description of the effect of the sequence in the mesoscopic model.

I. INTRODUCTION

Fermi-Pasta-Ulam showed that a simple model for a solid can yield extremely interesting results, and in particular exhibit a new class of excitations, nonlinear localized excitations, the solitons. However the model was oversimplified for a solid, and, except in very special cases, the lattice solitons detected by Fermi-Pasta-Ulam are not seen in experiments. This raises two related questions:

1. Are there real systems where nonlinear excitations are very likely to exist and can be observed in experiments?
2. Can we design a simple model that would provide a realistic description of a system as complex as a solid or a macromolecule and allow us to study these nonlinear excitations?

In this paper we show that the answer to both questions is yes, by considering the example of DNA. As in the Fermi-Pasta-Ulam study, the nonlinear excitations which appear when the DNA model is studied by molecular dynamics simulations raise fundamental questions in statistical physics because they are responsible for a temporary breaking of equipartition of energy, regions with large amplitude fluctuations being able to coexist with regions where the fluctuations are very small, even when the model is studied in the canonical ensemble.

In the first part we briefly review experimental facts about DNA, and in particular discuss observations that show the importance of nonlinear excitations in the molecule. In the second part, we examine the design of a DNA model which is simple, yet accurate enough to describe experimental observations related to the nonlinear excitations in DNA, including the effects of the base-pair sequence.

II. NONLINEAR EXCITATIONS IN DNA: SOME EXPERIMENTAL FACTS.

Since the famous paper of Watson and Crick [1], the double helix structure of DNA has become familiar to everybody. DNA is actually a polymer or, more precisely, a set of two entangled polymers. Its structure is presented in detail in the books of Saenger [2] and Calladine [3] and shown in Fig. 1. Each of the monomers that make up these polymers is a nucleotide, which is a compound of three elements: a phosphate group $\mathrm{PO_4^-}$, a sugar ring, i.e. a five-atom cyclic group, and a base, which is a complex organic group that can have one or two cycles. The backbone is formed by a sequence of phosphate groups and sugars, and it is oriented because on one side of the sugar the phosphate group is linked to a carbon atom that does not belong to the sugar ring while, on the other side, it is linked to a carbon atom which is part of the sugar ring.

FIG. 1: The double helix structure of DNA shown in a full atomic representation (left) and schematically (right).

The sugars and phosphates of all nucleotides are the same, but nucleotides differ by their bases, which can be of 4 types belonging to two categories: the purines, Adenine (A) and Guanine (G), which have two organic cycles, and the pyrimidines, Cytosine (C) and Thymine (T), which are smaller because they have only one organic cycle. One important observation that led Watson and Crick to the famous discovery of the DNA double helix structure is that the bases tend to assemble in pairs through hydrogen bonds, and that pairs formed by a purine and a pyrimidine have almost the same size, so that such pairs give a very regular structure to the two chains of nucleotides linked by the hydrogen bonds, as schematized in Fig. 2.

FIG. 2: Schematic view of the two nucleotide chains, assembled by hydrogen bonds (thick lines) forming DNA. (Diagram labels: P and S mark the phosphates and sugars of the first and second strands; A, T, G, C mark the bases forming the base pairs.)

The biological role of DNA is to store the genetic code in the sequence of base pairs. Its structure, which contains two complementary strands, led Watson and Crick to write the famous sentence: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.” But, in order to read the code, one needs access to the bases to allow some chemical reactions. As shown in Fig. 1, owing to the size of the atoms forming the sugar-phosphate backbone, the bases are hardly accessible. Thus, if one looks at the static structure of DNA, the code seems well protected but unreadable!

But this static view of DNA is very incomplete. DNA is a highly dynamical entity and its structure is not frozen. The “breathing” of DNA has been known to biologists for decades. It consists of the temporary opening of the base pairs. This is attested by proton-deuterium exchange experiments. DNA is put in solution in deuterated water, and one observes that the imino protons, which are the protons forming hydrogen bonds between the two bases in a base pair, are exchanged with deuterium coming from the solvent. As these protons are deeply buried in the DNA structure, the exchange indicates that bases can open, at least temporarily, to expose the imino protons to the solvent [4]. The determination of the lifetime of a base pair, i.e. the time during which it stays closed, has been the subject of some controversy [5] because the rate-limiting step in the exchange may be either the rate at which base pairs open, or the time necessary for the exchange. Accurate experiments, using NMR to detect the exchange, showed that the lifetime of a base pair is of the order of 10 ms. These experiments also show that the protons of one particular base pair can be exchanged while those of a base pair next to it are not exchanged.
This indicates that the large conformational changes that lead to base pair opening in DNA are highly localized, which means that the coupling between successive bases along the DNA helix is weak enough to allow consecutive bases to move almost independently from each other.

The fluctuational opening of the base pairs and the structure of DNA itself are strongly temperature dependent. When the molecule is heated up to a sufficiently high temperature (70 °C to 90 °C, depending on the sequence) it is thermally denatured. The fluctuations are so large that they break all the hydrogen bonds connecting the two strands, and the strands separate from each other. This transition is also called the “melting” of DNA and it can be followed easily by UV absorption spectroscopy because the breaking of the base pairs is accompanied by a large increase of the absorbance of UV light at 260 nm. Experiments have been performed on artificial DNAs which are homopolymers, i.e. have only one type of base pair. In this case one detects a sharp transition between the double helix and the separated strands, as shown in Fig. 3, showing that the “melting” appears as a genuine phase transition.

FIG. 3: Denaturation curve of a homopolymer of DNA having only G-C base pairs: UV absorption versus temperature (axis ticks at 60 °C and 70 °C). (Figure adapted from Ref. [6])

Natural DNA molecules show a denaturation that occurs in multiple steps and is highly sensitive to the details of the sequence [7]. The observation of DNA molecules under denaturation, using cryomicroscopy, shows that denaturation starts by local openings, called “denaturation bubbles”, that grow with temperature and invade the whole molecule at the denaturation temperature, causing the separation of the two strands, as schematically plotted in Fig. 4.

FIG. 4: Schematic picture of the thermal denaturation of DNA, showing that it starts locally by “denaturation bubbles” (axis: temperature).

What makes DNA highly attractive to study nonlinear excitations is that the fluctuational opening of the base pairs, which is a very large amplitude motion of some parts of the molecule, highly nonlinear and well localized in space, can be studied very accurately by physicists, using biological tools. One widely used method is to attach “sensors” to the DNA double helix. They are made of two parts, a fluorophore, i.e. a fluorescent organic group, and a quencher. These two elements are chemically bound to the molecule near the base pair of interest, each one being attached to a strand. When the base pair is closed, the two elements are next to each other in space and there is an electronic transfer between the fluorophore and the quencher which prevents the fluorescence. But when the two strands separate locally, the fluorophore is far enough from the quencher to become fluorescent. Therefore a study of the time dependence of the fluorescence by confocal microscopy provides a measurement of the local opening of the molecule [8].

In this approach biochemical methods are used to bind the fluorophore and the quencher at the desired site in DNA. This requires the preparation of a special molecular probe for each site where the opening has to be studied. This is why we are using a different approach, developed by D. Angelov and A. Spassky [9], which is particularly interesting because, in a single experiment, it gives information on the opening of all the GC base pairs at once. The principle of the experiment is schematized in Fig. 5.



FIG. 5: Experiment to observe the local fluctuations of the GC base pairs (from Ref. [9]). Left: schematic picture of the process, in three steps: (i) DNA-structure-dependent ionization of G (5 ns); (ii) competitive transformation of G+ into 2 stable products (~1 µs); (iii) DNA cleavage at G* by piperidine or Fpg and sequencing gel analysis. A radioactively marked DNA is exposed to high intensity UV pulses that cause some ionization of the guanines. Then it is cleaved at the site of the ionization. The efficiency of the cleavage by different reactants depends on the state of the base pair when the molecule was irradiated (local conformation). The migration of the fragments on a gel, which is followed thanks to the radioactivity that they carry, can be used to measure the length of the fragments with a resolution of a single base pair (right part of the figure).

Radioactively marked DNA molecules are exposed to high intensity UV laser pulses which are able to ionize the guanines through a two-photon process. After the ionization, the electrons of the neighboring bases are redistributed, and the final product depends on the local conformation of the DNA molecule when it was hit by the laser [10, 11]. Open or closed GC base pairs lead to different products. These products may be identified by two reactants, piperidine or Fpg, each one being able to cleave a strand at the site of one of the two products. Therefore measuring the ratio between the two types of cleavage gives a measure of the probability that the DNA molecule was open or closed when it was hit by the laser. Moreover one can determine where each type of cleavage occurred by measuring the length of the molecular fragments. This is done by a standard biological method: electrophoresis on a denaturing gel, i.e. a gel in which the two strands separate from each other, due to a chemical effect. Short fragments move faster than longer ones, and therefore they go farther in the gel during the electrophoresis. Because the strand of interest carries a radioactive label, a photographic plate can be used to determine the position reached by the fragments, and how many of them reached this position, i.e. how many had a given length. Such an experiment can attain an accuracy of a single base pair. As a result, by taking the ratio of the two types of cleavage, one can get a map of the probability of opening of all the GC base pairs in a particular DNA sequence.

FIG. 6: Temperature evolution of the fluctuational opening of different GC base pairs of a short DNA sequence (indicated on top of the figure). The indices of the Guanine bases marked on the graph correspond to the indices indicated on the sequence (From Ref. [9]).

Figure 6 shows an example of the results that the method can provide (from Ref. [9]). The fluctuational opening of some base pairs rises sharply at a particular temperature in a way which is very similar to the rise of the UV absorbance observed at the thermal denaturation. This can be viewed as a “local melting” temperature of some base pairs, while others show large fluctuations even at room temperature. Note for instance that one of these highly fluctuating base pairs, involving Guanine number 8, is next to a segment containing many AT base pairs, which are bound by only 2 hydrogen bonds, instead of 3 for GC, and thus are more prone to opening. This suggests that local fluctuations of AT-rich regions affect their vicinity, and this has been confirmed by more accurate experiments performed very recently [12].

To summarize this experimental part, it should be noticed that DNA is an ideal “nonlinear lattice” because it can be manipulated with biological tools and studied in detail locally. The very large fluctuations which exist in the structure of the molecule, even at room or biological temperature, can be highly localized and can exist even in homopolymers, i.e. in artificial DNA sequences where all the base pairs are the same. Therefore the existence of these localized excitations should not be attributed to disorder-induced localization, and instead their origin should be sought in the nonlinear aspect of the motion of the bases in DNA. In natural DNA, the local fluctuations are moreover affected by the sequence, and for instance AT-rich regions are more likely to show large amplitude fluctuations. However the effect of the sequence may be rather subtle [13].

III. MODELING DNA NONLINEAR DYNAMICS.

Having recognized that DNA is an interesting “nonlinear lattice”, it is tempting to try to revive the Fermi-Pasta-Ulam idea with this particular system, i.e. use the computer to study its dynamics and, in particular, examine how it thermalizes. The first step is therefore to select a model for this study.

A. Selection of a model

In their original study Fermi-Pasta-Ulam designed a very crude model for a solid, which was simply restricted to a one-dimensional lattice of particles interacting with a nonlinear potential. This was of course imposed by the very small computing abilities of the first electronic computers. We are now used to powerful machines, so that it is tempting to consider a model which is as close as possible to reality. Figure 7 illustrates the difficulty of simulations which try to stay too close to the actual system. Even for a short DNA segment, the number of degrees of freedom for all the atoms of the molecule is very large, and, as the calculations involve complex interaction potentials, they are slow. But, more importantly, in a realistic calculation, one does not study the molecule alone. It would be meaningless to try to describe all the details of the molecular structure and ignore the water which surrounds the molecule, and often contributes to the atom-atom interactions of the molecule itself through hydrogen bond bridges. As a result the program spends most of its time studying the solvent, and calculations for a single molecular-dynamics trajectory can require weeks or months, depending on the size of the molecule which is studied and the number of processors used.

FIG. 7: (a) Atomic model of a short single strand of DNA forming a hairpin studied in an all-atom numerical simulation. (b) The actual system which is studied in the simulation, including the solvent molecules.

In a study which is interested in the thermalization of the molecule, or in its properties as a function of temperature, a single trajectory in the phase space is not sufficient. One needs many of them to get meaningful averaged properties. Moreover large pieces of the molecule have to be considered to be able to investigate the energy flow between different parts. For DNA, if one wants to be able to take into account, for instance, the denaturation bubbles that we have described above, tens or hundreds of base pairs have to be considered. Then the simulations become completely out of reach of all-atom molecular dynamics. Even if it were possible, such a calculation might be inappropriate because it would produce a huge amount of data, from which it would be very difficult to extract mesoscopic quantities such as the fraction of open base pairs at a given temperature, which can be compared with experiments.

This is why models at intermediate scales are useful. The fact that these models are less precise does not imply that their development is simpler. In fact it can even be more difficult because one cannot rely on first principles to establish the models and determine their parameters. For instance the interaction potentials between individual atoms are now rather well known. But this is not the case for the interaction potentials between complex groups such as the bases of DNA. Such potentials involve many atoms, but also solvent molecules or ions, so that they are hard to evaluate.
This is why it is very important to check the validity of the models and to calibrate their parameters by comparison with well controlled experiments.

At the scale of a base pair, the simplest model that one can think of is the Ising model. The state of a base pair is simply described by a two-state variable, equal to 0 if the base pair is closed and 1 if it is open. Such models have been proposed and successfully used to study the thermal denaturation of DNA [14, 15]. The drawbacks for our purpose are the following:

• They contain a large number of parameters which cannot be obtained (or even estimated) from basic physical laws. For instance the energy to open a base pair can be evaluated, but, to describe collective effects, one also needs parameters such as the probability that a base pair opens if a neighboring one is open, or if it is closed. They can only be obtained by fitting denaturation curves.
• As the model uses a two-state variable to describe the status of a base pair, it cannot represent the nonlinear dynamics of opening or closing, which requires the description of a continuum of intermediate states.

The next step is to keep only one degree of freedom per base pair, but use a real variable that describes the stretching of the bond linking the bases instead of the discrete 0/1 variable of the Ising model. Let us denote by $y_n$ the stretching of the $n$th base pair. The value $y_n = 0$ corresponds to a closed base pair as in the Ising model, but now $y_n$ can increase continuously to infinity if the two bases separate completely as in DNA denaturation. The variable $y_n$ can even take negative values, corresponding to a compression of the bond linking the bases with respect to its equilibrium length. Large negative values will be forbidden by steric hindrance, which is introduced in the model by the potential linking the bases in a pair.

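FIG. 8: The simple dynamical model for DNA nonlinear dynamics, described by Hamiltonian (1). (The sketch shows base pairs $n-1$, $n$, $n+1$, with stretching $y_n$ subjected to the on-site potential $V(y_n)$ and coupled to its neighbor by $W(y_n, y_{n-1})$; the inset plots the Morse potential $V(y) = D\left(e^{-ay} - 1\right)^2$ versus $y$.)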

The model is shown in Fig. 8 and it is defined by its Hamiltonian

$$H = \sum_n \left[ \frac{p_n^2}{2m} + W(y_n, y_{n-1}) + V(y_n) \right], \qquad \text{with } p_n = m \frac{dy_n}{dt}, \tag{1}$$

where $m$ is the reduced mass of the bases. At this stage we do not explicitly include the genetic code and all base pairs are considered to be the same. The potential $V(y)$ describes the interaction between the two bases in a pair. We use a Morse potential

$$V(y) = D \left( e^{-ay} - 1 \right)^2, \tag{2}$$

where $D$ is the dissociation energy of the pair and $a$ is a parameter, homogeneous to the inverse of a length, which sets the spatial scale of the potential. This expression has been chosen because it is a standard expression for chemical bonds and, moreover, it has the appropriate qualitative shape:

• it includes a strong repulsive part for $y < 0$, corresponding to the steric hindrance between the bases,
• it has a minimum at the equilibrium position $y = 0$,
• it becomes flat for large $y$, giving a force between the bases that tends to vanish, as expected when the bases are very far apart; this feature allows a complete dissociation of the base pair, which would be forbidden if we had chosen a simple harmonic potential.

The potential $W(y_n, y_{n-1})$ describes the interaction between adjacent bases along the DNA molecule. It has several physical origins:

• the presence of the sugar-phosphate strand, which is rather rigid and connects the bases. Pulling a base out of the stack in a translational motion tends to pull the neighbors due to this link. One should notice however that we have not specified the three-dimensional motion of the bases in this simple model. An increase of the base pair stretching could also be obtained by rotating the bases out of the stack, around an axis parallel to the helix axis and passing through the attachment point between a base and the sugar-phosphate strand. Such a motion would not couple the bases through the strands. The potential $W(y_n, y_{n-1})$ is an effective potential which can be viewed as averaging over the different possibilities to displace the bases.
• the direct interaction between the base pair plateaux, which is due to an overlap of the π-electron orbitals of the organic rings that make up the bases.

In a first stage, we shall use for $W(y_n, y_{n-1})$ the simplest expression, i.e. the expansion of the potential around its minimum, which is reached when $y_n = y_{n-1}$:

$$W(y_n, y_{n-1}) = \frac{1}{2} K (y_n - y_{n-1})^2. \tag{3}$$
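As an illustration, Eqs. (1)-(3) can be written as a few lines of Python. This is only a minimal sketch: the function names are hypothetical, the numerical values are the typical ones quoted in this section, taking the quoted average nucleotide mass for $m$ is an assumption, and so is the choice of an open chain (the first site has no left neighbor), since the boundary conditions are not specified here.

```python
import math

# Typical values quoted in this section (units: eV, angstrom, atomic mass unit).
D = 0.03    # Morse dissociation energy, eV
A = 4.5     # inverse range of the Morse potential, 1/angstrom
K = 0.06    # harmonic stacking constant, eV/angstrom^2
M = 300.0   # mass per site, amu (assumption: quoted average nucleotide mass)


def morse(y):
    """On-site potential V(y) of Eq. (2)."""
    return D * (math.exp(-A * y) - 1.0) ** 2


def stacking(y_n, y_prev):
    """Harmonic stacking potential W(y_n, y_{n-1}) of Eq. (3)."""
    return 0.5 * K * (y_n - y_prev) ** 2


def hamiltonian(y, p):
    """Total energy of Eq. (1) for an open chain (assumed boundary condition)."""
    kinetic = sum(pn * pn / (2.0 * M) for pn in p)
    on_site = sum(morse(yn) for yn in y)
    coupling = sum(stacking(y[n], y[n - 1]) for n in range(1, len(y)))
    return kinetic + on_site + coupling


if __name__ == "__main__":
    # The three qualitative features of V(y): minimum at 0, plateau at D,
    # and a wall that is steeper on the compression side.
    print(morse(0.0), morse(10.0), morse(-0.1), morse(0.1))
    print(hamiltonian([0.0, 0.1, 0.0], [0.0, 0.0, 0.0]))
```

The chain energy is evaluated with plain lists so that the sketch stays dependency-free; for long chains an array library would be the natural choice.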

Such a harmonic approximation would be good if the stacking interaction were strong enough to keep $y_n$ close to $y_{n-1}$ at all times. This is not true for DNA, but the harmonic approximation allows easier calculations, and it is sufficient to get some interesting results which agree with some experimental observations. However we shall see that the expression of $W$ has to be improved to provide a correct description of the thermal denaturation (see Sec. III B), and for calculations interested in quantitative properties for actual DNA we use expressions of the potential $W(y_n, y_{n-1})$ which take the sequence into account (see Sec. III C).

The choice of the potential parameters is a very difficult question because the potentials entering in the model are effective potentials, which combine many actual interactions. For instance $V(y)$ includes the hydrogen bonds between the bases but also the repulsion between the charged phosphate groups, which is partly screened by the ions which are in solution. Therefore, while the parameters can be estimated from our knowledge of the order of magnitude of the strength of the bonds involved in the effective interactions, their exact values are not deduced from theoretical calculations. The parameters that we use have been calibrated by comparison with experiments, in particular the thermal denaturation, as discussed in Sec. III C. Typical parameters for $V(y)$ are $D = 0.03$ eV, which is slightly above $k_B T$ at room temperature ($k_B$ being the Boltzmann constant), and $a = 4.5$ Å$^{-1}$. For a stretching of the base pair distance of 0.1 Å, these parameters give a variation of energy of 0.006 eV, which is consistent with the values observed for hydrogen bonds. The typical value chosen for $K$ is $K = 0.06$ eV/Å$^2$, which corresponds to a weak coupling between the bases, as attested by the experimental results showing that proton-deuterium exchange can occur on one base pair without affecting the neighbors.
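These orders of magnitude are easy to check numerically. The short sketch below is only an illustration: taking 300 K for room temperature is an assumption, and the function names are hypothetical. It compares $D$ with $k_B T$ and evaluates the energy change for a displacement of 0.1 Å. The quoted 0.006 eV appears to correspond to the small-displacement (harmonic) estimate $D a^2 y^2$; the full Morse expression is already anharmonic at this amplitude and gives a smaller value on stretching and a larger one on compression.

```python
import math

K_B = 8.617333e-5   # Boltzmann constant, eV/K
D = 0.03            # Morse dissociation energy, eV
A = 4.5             # inverse range of the Morse potential, 1/angstrom
T_ROOM = 300.0      # K (assumption: value taken for "room temperature")


def morse(y):
    """Morse potential of Eq. (2), y in angstrom, result in eV."""
    return D * (math.exp(-A * y) - 1.0) ** 2


def harmonic_estimate(y):
    """Leading term of the expansion of V(y) around y = 0: D a^2 y^2."""
    return D * A * A * y * y


if __name__ == "__main__":
    print(f"kB*T at {T_ROOM:.0f} K          : {K_B * T_ROOM:.4f} eV")
    print(f"harmonic estimate, 0.1 A : {harmonic_estimate(0.1):.4f} eV")
    print(f"Morse, stretching  0.1 A : {morse(0.1):.4f} eV")
    print(f"Morse, compression 0.1 A : {morse(-0.1):.4f} eV")
```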
The average mass of the nucleotides is 300 atomic mass units. The values of the constants have been given in a system of units adapted to the scale of the problem: lengths in units of $\ell = 1$ Å, energies in units of $e = 1$ eV, masses in units of $m_0 = 1$ atomic mass unit. This defines a natural time unit $t_0$ through $e = m_0 \ell^2 t_0^{-2}$, which gives $t_0 = 1.018 \times 10^{-14}$ s, of the order of magnitude of the period of the vibrational motions of the base pairs.

B. Thermal properties of the DNA model.

In the original study of Fermi-Pasta-Ulam, the question that they chose to investigate was the thermalization, i.e. the study of the evolution from a specific initial condition towards a thermally equilibrated state. For the DNA model, this also leads to very interesting results when one starts from an initial condition which is a plane wave, i.e. a uniform energy density. The calculation shows that, instead of reaching equilibrium quickly, the model develops large amplitude localized nonlinear modes [16], which are discrete breathers [17].

However these nonlinear modes would be merely mathematical curiosities if they could not appear in an actual physical system. It is in this respect that the study of DNA allows us to go beyond the kind of investigations performed by Fermi-Pasta-Ulam, because the results of the numerical experiments on a model nonlinear lattice can be compared to the experimental observations which can be made at a local scale, as explained above. For this, instead of studying the evolution of a particular initial condition, one needs to study the molecule in the presence of its thermal fluctuations. Properly simulating a thermal bath is not a simple task, but several methods have been designed to satisfactorily approximate the thermal fluctuations. A simple one is to add to the equations of motion a fluctuating force and a damping term, related by the fluctuation-dissipation theorem, leading to a set of coupled Langevin equations. A more efficient way has been developed by Nosé [18] and improved by Martyna et al. [19]. The idea is to simulate an extended system which includes not only the physical system of interest but a few additional dynamical variables corresponding to a chain of “thermostats”.
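The simpler of the two approaches, the coupled Langevin equations $m \ddot{y}_n = -\partial H / \partial y_n - m \gamma \dot{y}_n + \eta_n(t)$ with $\langle \eta_n(t) \eta_{n'}(t') \rangle = 2 m \gamma k_B T \, \delta_{nn'} \delta(t - t')$, can be sketched as follows for the model of Eqs. (1)-(3). This is a minimal illustration and not the thermostat-chain method used for the results of this paper. The integrator (a standard BAOAB-type splitting), the friction coefficient `gamma`, the chain length, the open boundary conditions, the time step and all function names are assumptions of the sketch. Times are expressed in the natural unit $t_0$ of Sec. III A.

```python
import math
import random

# Units: angstrom, eV, amu; the time unit is t0 = angstrom * sqrt(amu / eV).
T0_SECONDS = 1.0e-10 * math.sqrt(1.66053907e-27 / 1.602176634e-19)
K_B = 8.617333e-5                      # Boltzmann constant, eV/K
D, A, K, M = 0.03, 4.5, 0.06, 300.0    # typical parameters of Sec. III A


def forces(y):
    """Return -dH/dy_n for the Morse + harmonic stacking chain (open ends)."""
    n_sites = len(y)
    f = []
    for n in range(n_sites):
        e = math.exp(-A * y[n])
        fn = 2.0 * D * A * e * (e - 1.0)       # -V'(y_n) for the Morse potential
        if n > 0:
            fn -= K * (y[n] - y[n - 1])        # coupling to the left neighbor
        if n < n_sites - 1:
            fn -= K * (y[n] - y[n + 1])        # coupling to the right neighbor
        f.append(fn)
    return f


def langevin_run(n_sites=16, temperature=150.0, gamma=0.02, dt=1.0,
                 n_steps=10000, n_equil=2000, seed=1):
    """Integrate the Langevin equations; return (y, kinetic temperature in K)."""
    rng = random.Random(seed)
    y = [0.0] * n_sites
    p = [0.0] * n_sites
    # Damping factor and noise amplitude, tied by fluctuation-dissipation.
    c1 = math.exp(-gamma * dt)
    c2 = math.sqrt(M * K_B * temperature * (1.0 - c1 * c1))
    f = forces(y)
    kinetic_sum, n_samples = 0.0, 0
    for step in range(n_steps):
        for n in range(n_sites):
            p[n] += 0.5 * dt * f[n]                       # half kick
            y[n] += 0.5 * dt * p[n] / M                   # half drift
            p[n] = c1 * p[n] + c2 * rng.gauss(0.0, 1.0)   # friction + noise
            y[n] += 0.5 * dt * p[n] / M                   # half drift
        f = forces(y)
        for n in range(n_sites):
            p[n] += 0.5 * dt * f[n]                       # half kick
        if step >= n_equil:
            kinetic_sum += sum(pn * pn for pn in p) / (M * n_sites)
            n_samples += 1
    # Equipartition: <p^2 / m> = kB T for each degree of freedom.
    return y, kinetic_sum / (n_samples * K_B)


if __name__ == "__main__":
    y_final, t_kin = langevin_run()
    print(f"time unit t0 = {T0_SECONDS:.4e} s")
    print(f"kinetic temperature = {t_kin:.1f} K (target 150 K)")
```

The kinetic temperature returned by `langevin_run` gives a first check that the bath is doing its job; a more demanding test, as noted in the text for the thermostat chain, is to look at the fluctuations of the kinetic energy as well.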
One of the thermostats is coupled to all the degrees of freedom of the physical system, according to the method proposed by Nosé that leads to exact canonical properties for the physical system, and the others contribute to properly randomize the first thermostat in order to ensure a proper exploration of the phase space of the system. This approach leads to a faster thermalization than Langevin simulations, and one can check that not only does the average kinetic energy correspond to the expected temperature, but also the fluctuations of the kinetic energy have the value $N (k_B T)^2$ expected for a one-dimensional canonical system with $N$ degrees of freedom.

Figure 9 shows the properties of the DNA model when it is kept at different temperatures ($T = 150$ K, $T = 310$ K, $T = 350$ K). It displays two types of characteristic patterns. The most apparent, at high temperature, are large red spots, which correspond to regions where the base pair stretching is very large over a few tens of consecutive bases for a limited time. Such regions correspond to the “denaturation bubbles” observed in the experiments. Another noticeable feature is the presence of vertical dotted lines, i.e. successions of red and blue regions involving a few base pairs, located in one particular place of DNA. Such patterns are created by very large amplitude localized vibrational modes of the molecule. They show that the discrete breathers predicted by theoretical analysis, and which can emerge from a particular initial condition, as the solitons do in the Fermi-Pasta-Ulam model [16], can actually be thermally generated. It means that they should play a role in the actual physical properties of DNA, at least if the model is accurate enough to describe an actual DNA molecule in spite of its simplicity. They could correspond to the “breathing of DNA” observed in experiments.

FIG. 9: Numerical simulations of the dynamics of the DNA model in contact with a thermal bath simulated according to the method of Martyna et al. [19]. The model has 256 base pairs. The amplitude of the base pair stretching is shown by a color scale ranging from blue (base pair fully closed) to red (base pair completely open). The horizontal axis extends along the molecule and the vertical axis corresponds to time. (a) Simulation at constant temperature $T = 150$ K, (b) $T = 310$ K, (c) $T = 350$ K.

The simulation also points out an interesting feature of the model: the existence of localized large amplitude motions appears as a temporary breaking of energy equipartition in the system, although it is in equilibrium. Figure 9 clearly shows regions where the displacements are large, separated by narrower “cold” regions where the fluctuations of the base pairs are much smaller. Of course energy equipartition is obeyed, but to observe it one must follow the dynamics on very long time scales. On times which can be as large as thousands of periods of the small amplitude vibrations of the bases, one finds “hot regions” coexisting with “cold regions”, but, on even longer times, the amplitude of the fluctuations in some hot regions will decrease while they grow in some other regions. A true uniformity in the fluctuations is never achieved, but the inhomogeneities move from place to place. This phenomenon is reminiscent of turbulence, where pressure in a fluid shows large local fluctuations although an average pressure can be well defined. A quantitative analysis of the properties of a lattice of coupled nonlinear oscillators shows that the relationship with turbulence may go well beyond a simple qualitative analogy [20].

To investigate the thermal properties of the DNA model further, one needs to use the methods of statistical physics. For such a one-dimensional model with nearest neighbor interactions, the calculation of the canonical partition function $Z$ and the free energy $F = -k_B T \ln Z$ is possible [21]. From these quantities, one can calculate all the quantities of physical interest, for instance the fraction of closed base pairs at a given temperature, or the entropy of the system, as shown in Fig. 10. First, this figure shows that there exists a particular temperature $T_c$ above which the fraction of closed base pairs vanishes. This temperature is the temperature of the “melting” of DNA that we introduced in Sec. II. From the fundamental point of view this result is interesting because it shows that the one-dimensional DNA model that we have introduced can have a phase transition at finite temperature, contrary to the common expectation for such a one-dimensional system with short range interactions [22, 23]. However the result is not very satisfactory because the transition occurs smoothly, over a wide temperature range, in contradiction with the experimental observations. This suggests a drawback in the model, which is due to the oversimplified form chosen for the stacking interaction (Eq. 3).
The interaction between the bases of DNA is more complex, but we can qualitatively predict that, when base pairs are open, which means that the overlap of the plateaux of consecutive bases decreases, the strength of the stacking interaction decreases as well. This effect can be introduced in the model as a nonlinear stacking interaction

$$ W(y_n, y_{n-1}) = \frac{1}{2} K \left[ 1 + \rho\, e^{-\delta (y_n + y_{n-1})} \right] (y_n - y_{n-1})^2 . \qquad (4) $$

With such an interaction potential Fig. 10 shows that the transition becomes very sharp. This can be understood qualitatively. When a small bubble is formed locally by a large fluctuation, the stacking interaction drops to a lower value in the two pieces of strands within this bubble. As a result these parts become very flexible, and can fluctuate more easily, increasing the entropy of the system. This entropy increase tends to decrease the free energy F = E − T S of the system, where E is the energy and S the entropy. It implies that, in spite of the energy cost associated with the breaking of the hydrogen bonds linking the bases in pairs, the opening of the bubble can lead to a free energy gain. This tends to promote an extension of the bubble. Therefore the nonlinear stacking interaction increases the cooperativity of the opening transition, which can become very sharp, in agreement with the experiments.

This example points out the interest of working with a system like DNA for which detailed experiments can be performed. The comparison between theory and experiments imposes some constraints on the model. The study of the nonlinear excitations can then become more relevant for physics because it is made on a model which has been validated, at least for some aspects.

C. Quantitative properties of DNA fluctuations: sequence effects.

The studies presented up to now have considered a homogeneous nonlinear lattice, where all sites are equivalent, which corresponds to a DNA homopolymer. They suggest that the model is sufficient to describe actual DNA properties, such as the existence of “DNA breathing” or its melting transition. Thus it is tempting to extrapolate to real DNA the properties that we have found with the nonlinear model, such as the temporary breaking of equipartition which is associated with the coexistence of “hot” and “cold” regions, due to the discrete breathers. This would imply that computer simulation studies in the spirit of Fermi-Pasta-Ulam, i.e. with a highly simplified model of the actual system, are useful to infer peculiar properties of a complex physical system. Before drawing such a conclusion one must check the validity of the model by more quantitative studies which compare experimental results to theoretical (or numerical) predictions. For that purpose one must not ignore the essential property of DNA: it stores the genetic code and therefore it is not a homopolymer. The base sequence means that the model is inhomogeneous. It appears as disordered even though this “disorder” actually codes some meaningful information. In the domain of nonlinear science, this immediately means that we touch a difficult topic which was the object of numerous studies for many years, and is still not fully understood [24]. One of the questions is the respective role of disorder and nonlinearity in the existence of the localized excitations. For DNA, the sequence is made of AT pairs linked by two hydrogen bonds and GC pairs linked by three hydrogen bonds. The natural idea to introduce sequence effects in the Hamiltonian (1) is to modify the on-site potential V(y_n) which describes the binding between the bases, by introducing site-dependent parameters D_n, a_n, which take two


FIG. 10: Theoretical result for the fraction of closed base pairs versus temperature (in this calculation a base pair is considered as closed if the average stretching of the bond linking the bases is smaller than y_c = 2 Å). Inset: variation of the entropy versus temperature. The temperature T is given in reduced units, by dividing it by T_c, the temperature at which the denaturation transition occurs. The triangles correspond to the model where the stacking interaction is harmonic (Eq. 3) and the diamonds correspond to the nonlinear stacking interaction W (Eq. 4).

different values, depending on whether the base pair is AT or GC. Because the interaction potentials of the model are effective potentials which are very difficult to calculate quantitatively, these parameters have been determined by fitting actual denaturation curves of short DNA molecules having different sequences [25]. The values obtained by Campa and Giansanti in these studies are D_AT = 0.05 eV, D_GC = 0.0755 eV, a_AT = 4.20 Å^-1, a_GC = 6.90 Å^-1, K = 0.025 eV/Å^2, ρ = 2.0, δ = 0.35 Å^-1. Once these parameters have been determined the model can be used to study other sequences, for instance to predict the local opening as a function of temperature. A first study along this line has shown that this approach can be fruitful because the model is found to be in good agreement with experimental observations using biological methods [26]. We showed however that a model based only on a description of the sequence in terms of the on-site potential V(y) is not sufficient to predict biologically significant sites, such as initiation sites where the reading of a gene starts [13]. This indicates that the sequence should enter in the model in a more subtle way. Measurements of the stacking energies of different base-pair doublets, shown in Fig. 11, show a very good correlation with the melting temperature of the corresponding homopolymers and draw attention to the large differences which are observed in the stacking interactions of different doublets. This points out the need to take into account the inhomogeneities of the stacking when the sequence is introduced in the theoretical model. But this has an immediate consequence: it drastically increases the number of parameters of the model, and one must make sure that the results that one can extract from the theory stay meaningful: with a sufficient number of parameters, one can fit any curve!
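To make the parameter values quoted above more tangible, the following minimal Python sketch evaluates the two potentials of the model. It assumes the standard Morse form V(y) = D (e^{-a y} − 1)^2 for the on-site potential (Eq. 2 is not reproduced in this section, so this form is an assumption here) and the nonlinear stacking interaction of Eq. (4). The function names `morse`, `stacking` and `effective_stiffness` are illustrative only and do not come from the original work.

```python
import math

# Parameter values quoted in the text (fit of Campa and Giansanti [25]).
# Units: energies in eV, lengths in angstrom.
K_STACK = 0.025   # stacking constant K, eV / angstrom^2
RHO = 2.0         # rho, dimensionless
DELTA = 0.35      # delta, 1 / angstrom
MORSE_PARAMS = {"AT": (0.05, 4.20), "GC": (0.0755, 6.90)}  # (D, a) per pair type


def morse(y, pair):
    """On-site potential, assuming the Morse form D * (exp(-a*y) - 1)**2."""
    d, a = MORSE_PARAMS[pair]
    return d * (math.exp(-a * y) - 1.0) ** 2


def stacking(y_n, y_prev):
    """Nonlinear stacking interaction W(y_n, y_{n-1}) of Eq. (4)."""
    coupling = 1.0 + RHO * math.exp(-DELTA * (y_n + y_prev))
    return 0.5 * K_STACK * coupling * (y_n - y_prev) ** 2


def effective_stiffness(y_mean):
    """Prefactor K * [1 + rho * exp(-2*delta*y)] when both neighbours are stretched by y_mean."""
    return K_STACK * (1.0 + RHO * math.exp(-2.0 * DELTA * y_mean))


if __name__ == "__main__":
    # Closed neighbours (y = 0) feel K * (1 + rho); widely open ones feel nearly K.
    for y in (0.0, 2.0, 10.0):
        print(f"y = {y:5.1f} A  stiffness = {effective_stiffness(y):.4f} eV/A^2")
    # Both Morse wells saturate at their depth D for large stretching.
    print(morse(5.0, "AT"), morse(5.0, "GC"))
```

With ρ = 2, the stacking stiffness falls from K(1 + ρ) = 0.075 eV/Å^2 for closed neighbours towards K = 0.025 eV/Å^2 inside a bubble, which is the softening invoked in the entropy argument given with Eq. (4).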
However this problem is not as serious as it appears at first glance if the same parameter set is used to analyze experiments on all sequences, because a finite number of parameters is involved in the description of a number of experiments that is, in principle, infinite. Determining the appropriate parameter set is nevertheless a challenge. Fortunately, as shown in Sec. II, experiments can now provide a large body of data, obtained by different methods, with which the theoretical analysis can be confronted. The measurements involve both local and global measurements. In the current studies we are considering global measurements, i.e. the denaturation curves of some specific sequences, to calibrate the model. The next stage will be to test the results with the local measurements presented in Sec. II. The determination of a parameter set suitable to quantitatively describe the properties of various DNA sequences is still a work in progress. Let us present here some results that illustrate the achievements and difficulties of this task. In a clever series of experiments, Montrichok et al. [28] managed to get experimental results on DNA denaturation which go beyond the standard melting curves obtained by UV absorbance measurements. The standard methods measure the fraction of open base pairs at a given temperature. But a fraction of 0.5 for instance can indicate that all DNA molecules in the sample are half open, or that 50% of the molecules are fully open and the others fully closed. On short sequences, this situation is perfectly possible because the denaturation is not a true phase transition

[Figure 11 plot: melting temperature T_MN (°C) and enthalpy ΔH_MN (kcal/mol) of the base-pair doublets, plotted against the stacking energy (kcal/mol doublet); points are labeled by doublet.]

FIG. 11: Variation of the melting temperatures and stacking energies of different DNA sequences (from [27]).

FIG. 12: The denaturation curve of the sequence CATAATACTTTATATTTAATTGGCGGCGCATCGGGACCCGTGCGCCGCC obtained by Montrichok et al. [28]. The open circles show the standard denaturation curve obtained from a UV absorbance experiment. The squares show the fraction of open base pairs for partly open molecules, and the black points show the fraction of fully open molecules. This particular sequence tends to open partially (approximately to 70% of its length), and complete the denaturation of the last part at a higher temperature, only after some plateau in the opening.

as in an infinite system. Instead it is an equilibrium between different species. What Montrichok et al. are able to measure is the nature of the species at a given temperature: closed double helices, partly open molecules, and single strands resulting from the full denaturation of some molecules. The trick is to consider well chosen test sequences. In these sequences, each single strand of the double helix has its two end sections made of bases which are complementary to the corresponding bases at the other end section of the same strand. As a result, the single strand can fold on itself and assemble its two end sections in a short double helix, while the central part of the strand, not assembled with a complementary part, makes a loop. In this self-assembled state, each single strand has the shape of a hairpin. Thus, when the two strands that make up the double helix in the sequence which is studied are separated from each other by heating, they can evolve in two ways in subsequent cooling: i) each strand can make a hairpin, ii) the two strands reassemble to make a double helix. But the second way is much slower because the two strands have to find each other in solution before assembling into a double helix. On the contrary the formation of the hairpins can be very quick because the two ends of the strand that self-assemble are close to each other since they belong to the same single strand. This allowed the authors of this study to answer the question that we raised above


FIG. 13: Theoretical calculation of the opening versus temperature of the DNA sequence shown in Fig. 12. The upper plots show the fraction of open base pairs versus temperature, while the lower surface plots show the probability of opening versus T for each base pair. (a) calculation made with the model parameters determined by Campa and Giansanti [25]. The sequence is only introduced through the on-site Morse potential. (b) calculation made with a parameter set which introduces sequence-dependent parameters for stacking as well.

when a UV experiment gives an open fraction of 0.5. At this stage they cool the DNA solution very quickly. If the DNA molecules were partly open they reassemble in double helices because the two strands were not fully separated. If 50% of the molecules were fully open, with the quick cooling they make hairpins and the outcome is one half of the DNA in hairpin form and the other half in double helices (those 50% that were still closed). Figure 12 shows the result of such an experiment. The denaturation curve shows a slope change when the fraction of open base pairs reaches approximately 0.7, and the study is able to indicate that it is associated with a partial opening of the molecules: 70% of the base pairs open in some temperature range, and then, in a second stage at higher temperature, the last segment breaks and the two strands separate.

In order to confront the model with such experimental results, we need to calculate how the molecule opens with temperature, not only on average for the full molecule, but also locally for each base pair. This is given by a statistical physics calculation [29]. For base pair n we can introduce a characteristic function that indicates whether it is closed or open

$$ \theta_n(y_n) = \theta(y_n - \xi) . \qquad (5) $$

Here θ(·) is the Heaviside step function, equal to 1 when the stretching y_n of base pair n exceeds a threshold ξ (chosen as ξ = 2 Å in our calculations) and 0 otherwise. Computing the statistical average of θ_n(y_n) as a function of temperature,

$$ \langle \theta_n(y_n) \rangle = \frac{\int dy^N \, \theta_n(y_n) \exp[-\beta U(y^N)]}{\int dy^N \, \exp[-\beta U(y^N)]} , \qquad (6) $$

where β = 1/(k_B T), U is the potential energy of the model (from Eq. 1) and y^N designates the whole set of variables of the system of interest (y_1, y_2, ..., y_N, with N = 48 for the sequence shown in Fig. 12), gives the probability of opening of base pair n versus temperature. This integral is highly multidimensional but, for the one-dimensional model that we study, it can be factorized and evaluated in a few hours of computation, whereas reaching a similar accuracy with molecular dynamics would take years [29].

Figure 13 compares the results obtained with the parameters determined by Campa and Giansanti [25] and with a parameter set that includes, in addition, the sequence effects in the stacking interactions. Comparing this figure with the experimental results of Fig. 12 clearly shows that introducing the sequence in the intra base-pair interaction (making the Morse potential sequence dependent) is not enough. The surface plot of Fig. 13.a indicates a tendency of the molecule to open first at one end (towards the low index n, which is the part of the sequence which is richer in AT pairs) but this effect is very weak, and the denaturation curve is smooth, without the slope change observed in the experiments. On the contrary Fig. 13.b indicates that the molecule opens first at one end, and only at a higher temperature at the other end, giving a melting curve with a slope variation in the middle. The result is not yet quantitatively perfect because the variation in the opening rate appears when about 50% of the molecule is open, instead of the 70% observed in the experiments. In order to get better results we need to take into account three-body interactions, which have to be evaluated using the local-opening measurements introduced in Sec. II.
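The factorization of the integral in Eq. (6) can be illustrated with a short transfer-matrix sketch in Python. This is not the method of Ref. [29]: it is a simplified illustration under several assumptions, namely the Morse form V(y) = D (e^{-a y} − 1)^2 for the on-site potential, a homogeneous stacking interaction given by Eq. (4), free chain ends, and a stretching variable restricted to a finite grid [y_min, y_max]. The cutoff y_max is needed because the integrals do not converge on an unbounded domain, and the numerical values of the probabilities depend on it. The function name and the grid settings are hypothetical.

```python
import numpy as np

KB = 8.617333e-5  # Boltzmann constant in eV/K

# Parameter values quoted in the text [25]; each base letter stands for its pair.
MORSE = {"A": (0.05, 4.20), "T": (0.05, 4.20),
         "G": (0.0755, 6.90), "C": (0.0755, 6.90)}
K_STACK, RHO, DELTA = 0.025, 2.0, 0.35
XI = 2.0  # opening threshold of Eq. (5), in angstrom


def opening_probabilities(sequence, temperature,
                          y_min=-0.4, y_max=15.0, n_grid=600):
    """Return <theta_n> of Eq. (6) for every base pair of `sequence`.

    The y range and grid size are assumptions of this sketch.
    """
    beta = 1.0 / (KB * temperature)
    y = np.linspace(y_min, y_max, n_grid)

    # Boltzmann weight of the (assumed Morse) on-site potential at each site.
    site = []
    for base in sequence:
        d, a = MORSE[base]
        site.append(np.exp(-beta * d * (np.exp(-a * y) - 1.0) ** 2))

    # Symmetric kernel exp(-beta * W(y, y')) built from Eq. (4).
    yi, yj = np.meshgrid(y, y, indexing="ij")
    w = 0.5 * K_STACK * (1.0 + RHO * np.exp(-DELTA * (yi + yj))) * (yi - yj) ** 2
    kernel = np.exp(-beta * w)

    n = len(sequence)
    # Forward pass: weight of sites 0..i, as a function of y_i.
    fwd = [site[0] / site[0].sum()]
    for i in range(1, n):
        v = site[i] * (kernel @ fwd[i - 1])
        fwd.append(v / v.sum())  # rescale to avoid underflow
    # Backward pass: weight of sites i+1..n-1, as a function of y_i.
    bwd = [None] * n
    bwd[n - 1] = np.full(n_grid, 1.0 / n_grid)
    for i in range(n - 2, -1, -1):
        v = kernel @ (site[i + 1] * bwd[i + 1])
        bwd[i] = v / v.sum()

    # Marginal distribution of y_i, then the fraction of it above the threshold.
    is_open = y > XI
    probs = []
    for i in range(n):
        marginal = fwd[i] * bwd[i]
        probs.append(marginal[is_open].sum() / marginal.sum())
    return np.array(probs)


if __name__ == "__main__":
    for temp in (300.0, 350.0, 400.0):
        print(temp, np.round(opening_probabilities("ATATATGCGCGC", temp), 3))
```

Each site costs one matrix-vector product on the grid, so the N-dimensional integral reduces to 2N such products per temperature. Sequence-dependent stacking, as used for Fig. 13.b, would require one kernel per doublet instead of the single kernel used here.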
This example shows the limits of a simple model to describe the properties of DNA, which are observed more and more precisely with different experimental methods. However, although it is not yet fully quantitative, the model is nevertheless beginning to be able to describe some subtle properties, such as the variation of the opening rate versus T as a function of the sequence. It is no longer confined to a simple calculation of the denaturation temperature of the molecule.

IV. CONCLUSION

In this review of current studies of DNA based on a simple model which does not attempt to introduce all degrees of freedom, we have exhibited a few points of interest:

• The model exhibits very interesting properties such as the spontaneous formation of highly localized large-amplitude fluctuations which lead to a temporary breaking of energy equipartition because “hot regions” can coexist with “cold regions” for a time scale which is much larger than the period of the vibrational modes. This is similar to the observation made by Fermi-Pasta-Ulam with a simple model of a solid, which suggested that equipartition of energy was not reached. Of course this does not mean that statistical physics is wrong because equipartition is recovered on average on a very long time scale. Although the experiments on DNA do not exactly observe this phenomenon they nevertheless clearly show that DNA can have spontaneous large amplitude, highly localized excitations, the “breathing of DNA” observed by the biologists.

• A proper introduction of the effect of the sequence is necessary to obtain useful results on DNA. It has to be done by taking into account the inhomogeneities in the stacking interactions and not only those arising from the base pairing. But, in terms of nonlinear science, it raises difficult questions. Are the localized excitations in DNA due to the sequence only or are they dominated by nonlinear effects? It would be interesting to investigate the role of the sequence in a study of the localization phenomenon, which leads to large amplitude discrete breathers when the model is excited by a plane wave in the absence of thermal fluctuations [16]. Investigations of thermal “bubbles” have shown that sequence effects may be subtle [13].

• Modeling DNA at a meso-scale, with only one degree of freedom per base pair, can give useful results although it cannot claim to describe all the dynamical features of the molecule. Some studies [26] suggest that it could even have biological applications by providing a method for sequence screening that could be able to exhibit structures which are not detected by usual methods based on a lexical analysis of the sequence.

• Perhaps the main lesson for nonlinear science is that DNA can be a marvelous object of studies for nonlinear physics. Its structure is rich but much simpler than the structure of proteins. It exhibits very large amplitude fluctuations, which are essential for biological functions, and can be studied experimentally with many tools borrowed from biology. On the theoretical side, DNA can be described in a meaningful way by fairly simple models, which are amenable to studies based on the theory of nonlinear dynamical systems [30].

Acknowledgments

Part of this work has been supported by Région Rhône-Alpes under the program SRESR–Cible 2006.

[1] J.D. Watson and F.H.C. Crick, Molecular structure of nucleic acids. Nature 171, 737-738 (1953)
[2] W. Saenger, Principles of Nucleic Acid Structure, Springer Verlag, Berlin (1984)
[3] C.R. Calladine and H.R. Drew, Understanding DNA, Academic Press, London (1992)
[4] M. Gueron, M. Kochoyan and J.L. Leroy, A single mode of DNA base-pair opening drives imino proton exchange. Nature 328, 89 (1987)
[5] M. Frank-Kamenetskii, How the double helix breathes. Nature 328, 17 (1987)
[6] R.B. Inman and R.L. Baldwin, Helix-Random Coil Transitions in DNA Homopolymer Pairs. J. Mol. Biol. 8, 452-469 (1964)
[7] R.M. Wartell and A.S. Benight, Thermal denaturation of DNA molecules: a comparison of theory with experiments. Physics Reports 126, 67 (1985)
[8] G. Altan-Bonnet, A. Libchaber and O. Krichevsky, Bubble Dynamics in Double-Stranded DNA. Phys. Rev. Lett. 90, 138101 (2003)
[9] A. Spassky and D. Angelov, Temperature-dependence of UV Laser one-electron oxidative guanine modifications as a probe of local stacking fluctuations and conformational transitions. J. Mol. Biol. 323, 9-15 (2002)
[10] T. Douki, D. Angelov and J. Cadet, UV Laser Photolysis of DNA: Effect of Duplex Stability on Charge-Transfer Efficiency. J. Am. Chem. Soc. 123, 11360-11366 (2001)
[11] D. Angelov, B. Beylot and A. Spassky, Origin of the heterogeneous distribution of the yield of guanyl radical in UV laser photolyzed DNA. Biophys. J. 88, 2766-2778 (2005)
[12] S. Cuesta-López, D. Angelov and M. Peyrard, unpublished.
[13] T.S. van Erp, S. Cuesta-López, J.-G. Hagmann and M. Peyrard, Can one predict DNA Transcription Start Sites by studying bubbles? Phys. Rev. Lett. 95, 218104 (2005)
[14] D. Poland and H.A. Scheraga, Occurrence of a phase transition in nucleic acid models. J. Chem. Phys. 45, 1464-1469 (1966)
[15] E. Yeramian, Gene and the physics of the DNA double helix. Gene 255, 139-150 (2000)
[16] I. Daumont, T. Dauxois and M. Peyrard, Modulational instability: first step toward energy localization in nonlinear lattices. Nonlinearity 10, 617-630 (1997)
[17] R.S. MacKay and S. Aubry, Proof of existence of breathers for time-reversible or Hamiltonian networks of weakly coupled oscillators. Nonlinearity 7, 1623-1643 (1994)
[18] S. Nosé, A unified formulation of the constant temperature molecular dynamics methods. J. Chem. Phys. 81, 511-519 (1984)
[19] G.J. Martyna, M.L. Klein and M. Tuckerman, Nosé-Hoover chains: The canonical ensemble via continuous dynamics. J. Chem. Phys. 97, 2635-2643 (1992)
[20] I. Daumont and M. Peyrard, One dimensional “turbulence” in a discrete lattice. Chaos 13, 624-636 (2003)
[21] M. Peyrard, Nonlinear dynamics and statistical physics of DNA. Nonlinearity 17, R1-R40 (2004)
[22] T. Dauxois, N. Theodorakopoulos and M. Peyrard, Thermodynamic instabilities in one dimension: correlations, scaling and solitons. J. Stat. Phys. 107, 869-891 (2002)
[23] J.A. Cuesta and A. Sanchez, General non-existence theorem for phase transitions in one-dimensional systems with short range interactions, and physical examples of such transitions. J. Stat. Phys. 115, 869-893 (2004)
[24] Nonlinearity with disorder, Tashkent, October 1-7, 1990, F. Abdullaev, A.R. Bishop and S. Pnevmatikos Eds, Springer, Berlin (1984); Nonlinearity and Disorder: Theory and Applications, Tashkent, October 2-6, 2001, F. Abdullaev, O. Bang and M.P. Sørensen Eds, NATO Science Series, Kluwer Academic Publishers (2001)
[25] A. Campa and A. Giansanti, Experimental tests of the Peyrard-Bishop model applied to the melting of very short DNA chains. Phys. Rev. E 58, 3585-3588 (1998)
[26] C.H. Choi, G. Kalosakas, K.O. Rasmussen, M. Hiromura, A.R. Bishop and A. Usheva, DNA dynamically directs its own transcription initiation. Nucleic Acids Research 32, 1584-1590 (2004)
[27] O. Gotoh and Y. Tagashira, Stabilities of nearest-neighbor doublets in double helical DNA determined by fitting calculated melting profiles to observed profiles. Biopolymers 20, 1033-1042 (1981)
[28] A. Montrichok, G. Gruner and G. Zocchi, Trapping intermediates in the melting transition of DNA oligomers. Europhys. Lett. 62, 452-458 (2003)
[29] T.S. van Erp, S. Cuesta-López and M. Peyrard, Bubbles and denaturation in DNA. Eur. Phys. J. E 20, 421-434 (2006)
[30] N. Theodorakopoulos, M. Peyrard and R.S. MacKay, Nonlinear structures and thermodynamic instabilities in a one-dimensional lattice system. Phys. Rev. Lett. 93, 258101-1-4 (2004)