
Improving the replica exchange molecular dynamics method for efficient sampling in the temperature space

Changjun Chen, Yi Xiao and Yanzhao Huang*
Biomolecular Physics and Modelling Group, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China

ABSTRACT Replica Exchange Molecular Dynamics (REMD) is a popular sampling method in molecular simulation. By frequently exchanging replicas at different temperatures, the molecule can escape from local minima and sample conformational space efficiently. Although REMD has proven practical in many applications, it has a critical limitation: all the replicas at all the temperatures must be simulated for a period between replica exchange steps. This becomes problematic for reactions with high free energy barriers, because too many replicas are then required. To reduce the computational cost and improve performance, in this paper we propose a modified REMD method. During the simulation, each replica at each temperature is in either an active or an inactive state, and switches between the states only at the exchange step. In the active state, the replica samples freely in the canonical ensemble by normal molecular dynamics; in the inactive state, it is frozen until the next exchange step. The number of replicas in the active state (active replicas) depends on the number of CPUs in the computer. With the additional inactive replicas, a REMD simulation can cover a wider temperature space. Practical applications show that the modified REMD method is reliable. With the same number of active replicas, it gives a more reasonable free energy surface around the free energy minima than the standard REMD method.

Keywords: Conformational sampling; Replica exchange molecular dynamics; Free energy calculation; Molecular dynamics
PACS number(s): 87.15.ap, 87.15.hp, 87.15.Cc

*Corresponding author: [email protected]

I. Introduction

Conformational sampling is critical in molecular simulation. By sampling all the important states of a molecule sufficiently, one can reliably analyze its free energy surface and determine the reaction path. Many methods have been proposed to enhance sampling, such as Replica Exchange Molecular Dynamics (REMD) [1-5], Accelerated Molecular Dynamics [6, 7], Essential Dynamics (ED) [8], Amplified Collective Motion (ACM) [9] and Directed Essential Dynamics (DED) [10]. Of these, REMD is the most widely used. REMD simulates many replicas of the molecule at different temperatures, either simultaneously or sequentially. At fixed time intervals (exchange steps), each pair of replicas at neighboring temperatures may be exchanged with a specific probability. All replicas therefore have the opportunity to reach high temperatures, gain kinetic energy and escape from minima on the free energy surface, which is the source of the method's high sampling efficiency in conformational space. Since REMD differs from normal molecular dynamics (MD) only in the replica exchange strategy, it is easy to implement. As a powerful sampling tool, REMD has been used in many fields of research. As of 2012, a Microsoft Academic Search query (website: http://academic.research.microsoft.com/) showed over 291 papers on REMD published in academic journals, cited over 1922 times. Besides its many applications, the validity of REMD has also been discussed frequently. In the past, the free energies or free energy surfaces of small peptides (e.g., a 21-residue peptide in Ref. [11], a 7-residue peptide in Ref. [12] and a 5-residue peptide in Ref. [13]) obtained from REMD simulations were compared with those from long MD simulations. The results show that REMD produces correct results and converges rather fast.
When REMD is combined with another sampling method such as Accelerated Molecular Dynamics [6, 7], its sampling efficiency can be enhanced further [14]. Moreover, REMD has been modified into various versions for different purposes. For example, the original REMD places replicas at a series of temperatures, whereas a revised version, λ-REMD [15], places them at different positions along a reaction coordinate; the latter is particularly useful for studying ligand binding. Besides the temperature and the reaction coordinate, even the Hamiltonian of the molecule can be exchanged in the simulations (Hamiltonian REMD [16-18]).

It must be pointed out that, for a reaction with very high free energy barriers, the upper temperature limit must be high enough for the reaction to be simulated successfully. At the same time, the temperature intervals determine the acceptance ratio of replica exchange and must be small enough to allow a random walk of the replicas in temperature space. Without a high acceptance ratio, the replicas at low temperatures can be trapped for a long time and produce biased free energy values. In general, therefore, REMD has to use a large number of replicas at the same time, which greatly increases the computational cost.

To improve the performance of REMD, in this paper we make one change to the method. Standard REMD alternates between sampling in the generalized ensemble (by replica exchange) and in the canonical ensemble (by normal MD). These two sampling processes are independent of each other, so even if some replicas stop sampling in conformational space, they can still perform a random walk in temperature space through replica exchanges. Based on this idea, we divide all the replicas into two categories: inactive replicas and active replicas. At the exchange step, all the replicas exchange their temperatures (or, equivalently, their coordinates and velocities) with neighboring replicas according to the Metropolis criterion [19]. The active replicas then continue sampling in the canonical ensemble by normal MD, while the inactive replicas keep their coordinates and velocities until the next exchange step. This is the critical modification to the standard REMD method: it allows a few active replicas to walk in a wide temperature space. Moreover, in standard REMD the number of replicas (temperatures) must be an integer multiple of the number of CPUs in the computer, which constrains the distribution of the replicas in temperature space. In the modified REMD, these two quantities are completely independent: the number of active replicas can be fixed according to the number of CPUs, while the temperature levels are set freely. Computer resources are therefore used more efficiently. Tests on ALA dipeptide and trpzip2 show that the modified REMD method is better than the standard REMD method at locating minima on the free energy surface and calculating their free energies.

II. Materials and Methods

A. Standard replica exchange molecular dynamics

In REMD [1, 2], a number of replicas are assigned to a sequence of temperatures and simulated independently by normal molecular dynamics. All these non-interacting replicas constitute a global generalized system. Any state of the system can be represented by a state vector $s = (\ldots, \mathbf{r}_i^{(i)}, \mathbf{p}_i^{(i)}, \ldots)$, where $\mathbf{r}$ and $\mathbf{p}$ are the coordinates and momenta of the atoms in each replica. The subscript $i$ is the index of the replica and the superscript $(i)$ is the index of the temperature; for example, $\mathbf{r}_i^{(i)}$ denotes the coordinates of the $i$th replica at the $i$th temperature.
The state distribution function of the system in the generalized ensemble, $P(s)$, is just the product of the Boltzmann factors of all the replicas [1, 2]:

$$ P(s) = \frac{1}{Z}\prod_i \exp\left[-\frac{H(\mathbf{r}_i^{(i)}, \mathbf{p}_i^{(i)})}{k_B T^{(i)}}\right] \tag{1} $$

Here $H(\mathbf{r}_i^{(i)}, \mathbf{p}_i^{(i)})$ is the Hamiltonian of replica $i$ at the $i$th temperature $T^{(i)}$, $k_B$ is the Boltzmann constant, and $Z$ is a constant that normalizes $P(s)$. In a standard REMD simulation, two randomly selected replicas (replica $i$ and replica $j$) are tried for exchange at a fixed time step. That is, the generalized system attempts to move from state $s_1 = (\ldots, \mathbf{r}_i^{(i)}, \mathbf{p}_i^{(i)}, \ldots, \mathbf{r}_j^{(j)}, \mathbf{p}_j^{(j)}, \ldots)$ to state $s_2 = (\ldots, \mathbf{r}_j^{(i)}, \mathbf{p}_j^{(i)}, \ldots, \mathbf{r}_i^{(j)}, \mathbf{p}_i^{(j)}, \ldots)$. The Metropolis criterion [19] can be used to determine the transition probability $P_{ij}$ [1, 2]:

$$ P_{ij} = \min\left\{1, \frac{\exp\left[-V(\mathbf{r}_j^{(i)})/k_B T^{(i)}\right]\exp\left[-V(\mathbf{r}_i^{(j)})/k_B T^{(j)}\right]}{\exp\left[-V(\mathbf{r}_i^{(i)})/k_B T^{(i)}\right]\exp\left[-V(\mathbf{r}_j^{(j)})/k_B T^{(j)}\right]}\right\} \tag{2} $$

where $V(\mathbf{r}_i^{(i)})$ is the potential energy of the $i$th replica. Exchanges based on this criterion satisfy detailed balance, $P(s_1)P_{ij} = P(s_2)P_{ji}$, in the generalized ensemble and produce the limiting distribution of Eq. (1). The discussion above only outlines the main idea of REMD; the details are given in Refs. [1, 2]. The advantage of REMD is clear. Exchanges between replicas at different temperatures allow them to walk freely in temperature space, so each replica can cross free energy barriers at high temperature and then sample in the canonical ensemble to produce the free energy surface at room temperature. Since REMD was first proposed in 1999 [1], it has become one of the most popular sampling methods in computational physics, chemistry and biology.

B. The modified replica exchange molecular dynamics

The sampling efficiency of a standard REMD simulation depends strongly on two factors: the intervals between neighboring temperatures and the upper temperature limit. The temperature intervals affect the acceptance ratio of replica exchanges. For example, suppose a molecule of 100 atoms has two replicas placed at 300K and 400K. The difference in their average kinetic energies, $(3N/2)k_B\Delta T$ with $N = 100$ and $\Delta T = 100$ K, is about 29.805 kcal/mol. If the potential energy difference between the two replicas is taken to equal this kinetic energy difference (neglecting potential energy fluctuations), the approximate acceptance ratio for exchanging them (Eq. (2)) is only $3.727\times10^{-6}$. This is clearly far too small: the two replicas can hardly perform a random walk in a finite simulation time. The exchange problem is more severe for large systems. Because the potential energy difference between replicas at different temperatures grows with system size, the temperature intervals must be decreased further as the system becomes larger. Besides the temperature intervals, the sampling efficiency of REMD is also determined by the highest temperature in the simulation. There may be many minima on the free energy surface, any of which can trap the replicas for a long time. A sufficiently high temperature allows the molecule to cross the barriers and escape from these minima efficiently. In summary, both the temperature intervals and the temperature range are important to the REMD method.
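As a rough check of the numbers above, the sketch below implements the acceptance rule of Eq. (2) and applies it to the 100-atom example. Because a swap does not change a replica's coordinates, $V(\mathbf{r}_j^{(i)}) = V(\mathbf{r}_j^{(j)})$, so the ratio in Eq. (2) reduces to $\exp[(1/k_BT^{(i)} - 1/k_BT^{(j)})(V_i - V_j)]$, where $V_i$ and $V_j$ are the potential energies of the replicas currently at $T^{(i)}$ and $T^{(j)}$. The function name `exchange_probability` and the value $k_B = 0.001987$ kcal/(mol K) are our assumptions, and the kinetic energy gap uses equipartition, $(3N/2)k_B\Delta T$, ignoring constraints.

```python
import math

KB = 0.001987  # Boltzmann constant in kcal/(mol K) (assumed value)


def exchange_probability(v_i, v_j, t_i, t_j):
    """Acceptance probability of Eq. (2) for swapping two replicas.

    v_i and v_j are the potential energies (kcal/mol) of the replicas
    currently at temperatures t_i and t_j (K).
    """
    delta = (1.0 / (KB * t_i) - 1.0 / (KB * t_j)) * (v_i - v_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


# Worked example: 100 atoms at 300 K and 400 K. Assume the potential-energy
# gap equals the mean kinetic-energy gap (3N/2) kB dT and ignore fluctuations.
n_atoms, t_low, t_high = 100, 300.0, 400.0
dv = 1.5 * n_atoms * KB * (t_high - t_low)
acc = exchange_probability(0.0, dv, t_low, t_high)
print(f"energy gap = {dv:.3f} kcal/mol, acceptance = {acc:.3e}")
```

Since the exponent is proportional to $N$ for a fixed temperature pair, doubling the number of atoms squares this already tiny probability, which is the quantitative reason the temperature intervals must shrink for larger systems.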
However, either decreasing the temperature intervals or expanding the temperature range immediately increases the number of replicas, and with limited computer resources, using too many replicas is not practical. In this paper, we improve the standard REMD method for this situation. Although all the replicas constitute a large system in the generalized ensemble, there are no interactions between them. Sampling in the generalized ensemble for the system and sampling in the canonical ensemble for each replica are two independent processes. In other words, it is not necessary to run MD for all the replicas, simultaneously or sequentially, between exchange steps. It is safe to run MD for only some of the replicas after an exchange step and let the rest stay where they are until the next exchange step. For simplicity, replicas currently propagated by MD are said to be in the active state, and those at rest are said to be in the inactive state. The modified REMD simulation proceeds as follows. First, a number of temperatures are set from room temperature to the highest temperature. The upper temperature limit must be high enough to ensure sufficient sampling of the whole conformational space, and the temperature intervals must be small enough to ensure sufficient exchanges between the replicas. To make the acceptance ratio approximately uniform across all pairs of neighboring temperatures, the temperatures increase exponentially [1]:

$$ T_i = T_0\exp\left[\frac{i\ln(T_{n-1}/T_0)}{n-1}\right], \qquad i = 0, \ldots, n-1 \tag{3} $$

Here $n$ is the number of temperatures and $i$ is the index of each temperature (replica); $T_0$ and $T_{n-1}$ are the lower and upper temperature limits, respectively. Second, a short MD simulation is performed for each replica in order from the lowest to the highest temperature, and at the end of each simulation the atomic coordinates and velocities are saved as the initial state of that replica. Third, all replica pairs containing at least one active replica are tried for exchange with the transition probability of Eq. (2). After the exchange operation, some of the replicas are randomly selected as active replicas and propagated by normal MD. The rest stay in the inactive state and keep their coordinates and velocities until the next exchange step. Sampling in the generalized ensemble (by replica exchange) and sampling in the canonical ensemble (by MD) alternate until the end of the simulation. Note that, for a fixed number of active replicas, using more inactive replicas decreases the sampling time at each temperature. The free energy surface at each temperature in a modified REMD simulation may therefore converge more slowly than in a standard REMD simulation with the same number of active replicas. On the other hand, a high replica exchange rate between neighboring temperatures also improves the sampling of the free energy surface. Our test simulations show that the latter factor plays the leading role.

C. Simulation details

To evaluate the validity of the modified REMD method, we use three test models: a quantum harmonic oscillator, ALA dipeptide and trpzip2. In quantum mechanics, a harmonic oscillator has discrete, equally spaced energy levels

$$ E_n = \hbar\omega\left(n + \frac{1}{2}\right), \qquad n = 0, 1, 2, \ldots \tag{4} $$

where $\hbar$ is the reduced Planck constant ($h/2\pi$) and $\omega$ is the angular frequency; $n = 0$ is the ground state. The limiting distribution in energy space is the Boltzmann distribution

$$ P(E_n, T) = \frac{1}{Z}\exp\left(-\frac{E_n}{k_B T}\right) \tag{5} $$

where $k_B$ is the Boltzmann constant, $T$ is the temperature and $Z$ is the partition function

$$ Z = \sum_{n=0}^{\infty}\exp\left(-\frac{E_n}{k_B T}\right) \tag{6} $$

Since the limiting distribution of the harmonic oscillator is known exactly, its average energy and heat capacity at any temperature $T$ can also be calculated analytically:

$$ \langle E(T)\rangle = \sum_{n=0}^{\infty} E_n P(E_n, T), \qquad C_v(T) = \frac{\langle E(T)^2\rangle - \langle E(T)\rangle^2}{k_B T^2} \tag{7} $$
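The benchmark quantities of Eqs. (4)-(7) are straightforward to evaluate numerically. The sketch below works in units where $k_B = 1$, so energies are in kelvin (ħω = 100 K, corresponding to the ħω = 100.0 kB used in the oscillator simulations) and $C_v$ is in units of $k_B$. Truncating the sums at the eleven levels $n = 0, \ldots, 10$ sampled in the simulations is our assumption, and the function name `oscillator_theory` is hypothetical.

```python
import math

HBAR_OMEGA = 100.0  # hbar*omega in units of k_B, i.e. kelvin
N_LEVELS = 11       # assumption: sums truncated at n = 10, as in the test


def oscillator_theory(t, n_levels=N_LEVELS, hbar_omega=HBAR_OMEGA):
    """Return (P(E_n), <E>, C_v) at temperature t from Eqs. (4)-(7), k_B = 1."""
    energies = [hbar_omega * (n + 0.5) for n in range(n_levels)]  # Eq. (4)
    weights = [math.exp(-e / t) for e in energies]
    z = sum(weights)                                              # Eq. (6)
    probs = [w / z for w in weights]                              # Eq. (5)
    mean_e = sum(p * e for p, e in zip(probs, energies))          # Eq. (7)
    # Variance computed as <(E - <E>)^2> to avoid cancellation at low T.
    var_e = sum(p * (e - mean_e) ** 2 for p, e in zip(probs, energies))
    return probs, mean_e, var_e / t ** 2                          # Eq. (7)


for t in (5.0, 1000.0):
    probs, mean_e, cv = oscillator_theory(t)
    print(f"T = {t:6.1f}  P(E_0) = {probs[0]:.4f}  <E> = {mean_e:.2f}  Cv = {cv:.4f}")
```

At 5 K essentially all weight sits in the ground state ($\langle E\rangle \approx \hbar\omega/2$), while at 1000 K the eleven levels are broadly populated, so the two ends of the temperature range probe very different parts of the distribution.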

These theoretical values provide a good benchmark for checking the numerical results. In this work, two different simulations are performed for the quantum harmonic oscillator. The first uses the parallel tempering method [20] (also called the replica exchange method, or "REM" for short [1]), whose replica-exchange idea is the same as that of standard REMD. In this simulation, the harmonic oscillator has eleven energy levels (n = 0 to n = 10), and the ten temperatures range from 5.0 K to 1000.0 K (determined by Eq. (3)). ħω is set to 100.0 kB (J). At each temperature, one replica samples the energy space by the standard Monte Carlo method [19], so there are ten active replicas in total. The simulation runs for $1.0\times10^{8}$ sampling steps; sampling data are collected every 5 steps, and exchange between replicas at neighboring temperatures is tried every 20 steps. The second simulation uses the modified replica exchange method proposed in this paper ("modified REM" for short). All simulation protocols are the same as in the first simulation except the number of active replicas: only three active replicas are used. The distribution function, average energy and heat capacity are calculated from both simulations and compared with the theoretical values. The second model, ALA dipeptide, is a small molecule with only 22 atoms (sequence: ACE-ALA-NME). For this model, we perform three simulations. The first is a 200 ns standard REMD simulation at seven temperatures: 300K, 383K, 489K, 624K, 797K, 1018K and 1300K (Eq. (3)). The second is also a 200 ns standard REMD simulation, but with only three replicas, at 300K, 624K and 1300K. The third 200 ns simulation applies the modified REMD method of this paper. It places seven replicas at the same seven temperatures (300K to 1300K) as the first simulation, but only three of them are active at a time. All the simulations are carried out in vacuum, and replica exchange is tried every 1 ps. The force field is AMBER PARM96 [21] and the molecular dynamics software is Tinker [22].
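Eq. (3) can be checked against the temperature lists quoted in this section. The sketch below evaluates the geometric ladder; for 300K to 1300K with seven levels it reproduces the listed ALA dipeptide temperatures to within 1 K (the text quotes whole kelvin). The function name `temperature_ladder` is ours.

```python
import math


def temperature_ladder(t_min, t_max, n):
    """Eq. (3): T_i = T_0 exp(i ln(T_{n-1}/T_0) / (n - 1)), i = 0..n-1."""
    if n < 2:
        raise ValueError("need at least two temperatures")
    step = math.log(t_max / t_min) / (n - 1)
    return [t_min * math.exp(i * step) for i in range(n)]


ala_ladder = temperature_ladder(300.0, 1300.0, 7)
print(" ".join(f"{t:.1f}" for t in ala_ladder))
# Neighbouring temperatures share one constant ratio, (T_max/T_min)^(1/(n-1)).
print(f"ratio between neighbours: {ala_ladder[1] / ala_ladder[0]:.4f}")
```

A constant ratio $r$ makes $1/T^{(i)} - 1/T^{(i+1)} = (1 - 1/r)/T^{(i)}$; if the typical energy gap between neighbours grows in proportion to $T^{(i)}$ (roughly constant heat capacity), the exponent in Eq. (2) is then the same for every pair, which is why this spacing gives approximately uniform acceptance.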
All the REMD codes are written in Fortran and parallelized with MPICH2 [23]. To compare the performance of standard and modified REMD, the free energy surface of ALA dipeptide from the first simulation (seven active replicas) is used as the benchmark. The free energy surface is a 2D space spanned by two collective variables, the backbone Φ and Ψ angles [24]. Each collective variable is divided into 40 grids from -π to π, and the free energy values on the 40×40 grids are calculated from the histogram. We use this free energy surface to check the data from the second and third simulations (three active replicas) and determine which is more reliable. The convergence of the free energy calculation is another important measure. In this work, it is defined by the root-mean-square difference between the free energy surfaces at successive fixed time intervals:

$$ \Delta F_{\mathrm{rms}}(t) = \sqrt{\frac{1}{n_{\mathrm{grids}}}\sum_{i=1}^{n_{\mathrm{grids}}}\left[F_i(t) - F_i(t+\Delta t) + \left\langle F_i(t+\Delta t) - F_i(t)\right\rangle_i\right]^2} \tag{8} $$

Here $n_{\mathrm{grids}}$ is the number of grids in the collective variable space, $F_i(t)$ and $F_i(t+\Delta t)$ are the free energies of the $i$th grid at times $t$ and $t+\Delta t$, and $\langle\cdot\rangle_i$ denotes an average over the grids. The time interval $\Delta t$ is 1 ns. This free energy difference serves as an error function for monitoring the convergence of the free energy calculation during the simulation. The third test model is the twelve-residue peptide trpzip2 [25], with sequence SWTWENGKWTWK. This peptide has been widely studied by simulations [26, 27] and experiments [28-30]. All these studies show that trpzip2 has a stable hairpin-like structure, which can be regarded as its native state. In this work, we perform one 50 ns standard REMD simulation and one 50 ns modified REMD simulation of trpzip2 to find out which method is better at locating the free energy minima in conformational space. Two collective variables are defined for the simulations. One is the RMSD of the backbone atoms of the peptide. The other is the average radius of gyration of the two aromatic pairs (residues 4-9 and residues 2-11) between the two strands of the hairpin-like structure:

$$ R_g = \left\{\frac{1}{2}\left[R_g^2(4,9) + R_g^2(2,11)\right]\right\}^{1/2} \tag{9} $$

$R_g(4,9)$ is the radius of gyration of the side-chain atoms of residues 4 and 9, and $R_g(2,11)$ is that of residues 2 and 11. All the simulations use the AMBER PARM96 force field [21] and the Generalized Born/Surface Area (GB/SA) implicit solvent model [31, 32]. The initial structure is an extended β-strand [27]. The temperatures range from 300K to 800K. For standard REMD, six replicas are placed at six temperature levels: 300K, 365K, 444K, 540K, 657K and 800K. For modified REMD, there are sixteen replicas at sixteen temperatures: 300K, 320K, 341K, 365K, 389K, 416K, 444K, 474K, 506K, 540K, 576K, 615K, 657K, 701K, 749K and 800K (Eq. (3)). At every exchange step, six of them are randomly selected as active replicas, and the rest stay in the inactive state.

III. Results and discussions

We first analyze the data from the simulations of the quantum harmonic oscillator, which use ten temperatures from 5.0 K to 1000.0 K. Fig. 1(a) shows the canonical probability distribution over all eleven energy levels of the oscillator at the lowest and highest temperatures. Both the REM and modified REM methods clearly reproduce the correct distribution function of the harmonic oscillator. Figs. 1(b) and 1(c) show the average energy and heat capacity, which also agree closely with the theoretical values (Eq. (7)). Note that in the REM simulation all the replicas at all the temperatures are active, so ten replicas sample their own energy spaces all the time. In the modified REM simulation, however, only three replicas are active. The rest, i.e., the inactive replicas, do not move in energy space; their role is to help the active replicas walk in temperature space, which is also important for sampling in energy space.
The consistent distribution functions, average energies and heat capacities indicate that the modified REM method is as reliable as the standard REM method. Moreover, the modified REM method is more economical: it eases the burden on the CPUs, since a small number of active replicas can produce the physical quantities over a wide temperature space.
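To make the active/inactive bookkeeping concrete, the sketch below runs a toy version of the modified REM test on the eleven-level harmonic oscillator: ten temperatures from Eq. (3), Metropolis Monte Carlo moves $n \to n \pm 1$, sampling every 5 steps and an exchange step every 20 steps, with only three replicas active per cycle. Several details are our assumptions rather than the authors' code: the active set is drawn uniformly at random among the temperature slots, exchanges alternate between even and odd neighbouring pairs and are tried only for pairs containing an active replica, and the run is far shorter than the $10^8$ steps used in the paper. Function names are hypothetical, and units follow $k_B = 1$ (energies in kelvin).

```python
import math
import random

HBAR_OMEGA = 100.0     # hbar*omega / k_B in kelvin
N_LEVELS = 11          # energy levels n = 0..10
LEVELS = [HBAR_OMEGA * (n + 0.5) for n in range(N_LEVELS)]


def ladder(t_min, t_max, n):
    """Geometric temperature ladder, Eq. (3)."""
    step = math.log(t_max / t_min) / (n - 1)
    return [t_min * math.exp(i * step) for i in range(n)]


def exact_mean_energy(t):
    """<E> from Eqs. (5)-(7), summed over the same eleven levels."""
    w = [math.exp(-e / t) for e in LEVELS]
    return sum(e * wi for e, wi in zip(LEVELS, w)) / sum(w)


def modified_rem(temps, n_active, n_cycles, steps=20, sample_every=5, seed=0):
    """Replica exchange MC in which only n_active replicas move per cycle."""
    rng = random.Random(seed)
    n_temp = len(temps)
    level = [0] * n_temp          # level[k]: state of the replica at temps[k]
    e_sum = [0.0] * n_temp
    n_obs = [0] * n_temp
    for cycle in range(n_cycles):
        # Choose the active replicas at random; the others stay frozen.
        active = set(rng.sample(range(n_temp), n_active))
        for k in active:
            for step in range(1, steps + 1):
                trial = level[k] + rng.choice((-1, 1))
                if 0 <= trial < N_LEVELS:           # out of range: reject
                    d_e = LEVELS[trial] - LEVELS[level[k]]
                    if d_e <= 0 or rng.random() < math.exp(-d_e / temps[k]):
                        level[k] = trial
                if step % sample_every == 0:        # collect a sample
                    e_sum[k] += LEVELS[level[k]]
                    n_obs[k] += 1
        # Exchange step (Eq. 2): neighbouring pairs, even/odd alternating,
        # tried only if at least one member was active in this cycle.
        for k in range(cycle % 2, n_temp - 1, 2):
            if k in active or k + 1 in active:
                delta = (1.0 / temps[k] - 1.0 / temps[k + 1]) * (
                    LEVELS[level[k]] - LEVELS[level[k + 1]])
                if delta >= 0 or rng.random() < math.exp(delta):
                    level[k], level[k + 1] = level[k + 1], level[k]
    return [s / c if c else float("nan") for s, c in zip(e_sum, n_obs)]


temps = ladder(5.0, 1000.0, 10)
means = modified_rem(temps, n_active=3, n_cycles=20000)
for t, m in zip(temps, means):
    print(f"T = {t:7.1f}  simulated <E> = {m:7.1f}  exact = {exact_mean_energy(t):7.1f}")
```

Setting `n_active` equal to the number of temperatures turns the same loop into ordinary REM, which makes a side-by-side comparison of the two schemes straightforward.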


FIG. 1. (Color online) (a) Canonical distribution function in the energy space of the quantum harmonic oscillator at T = 5.0 K and T = 1000.0 K. The data from the REM simulation (ten active replicas) and the modified REM simulation (three active replicas) are shown by the symbols “×” and “◦”, respectively. The lines represent the theoretical values [Eq. (5)]. (b) Average energies of the harmonic oscillator at all temperatures. (c) Heat capacity of the harmonic oscillator at all temperatures. Theoretical values in (b) and (c) are calculated by Eq. (7). Energy and heat capacity are in units of kB (J) and (J/K), respectively.

Now we turn to the second test model, ALA dipeptide. It is a very small molecule with only 22 atoms. Owing to its simple structure and the availability of all-atom force fields, it has been widely used as a model system for testing sampling algorithms, such as metadynamics [33], Adaptive Biasing Force (ABF) [34], driven Adiabatic Free Energy Dynamics (d-AFED) [35] and other methods for finding the minimum free energy path [24, 36-38]. Here, to evaluate the modified REMD method, three different simulations are performed for this molecule in vacuum. The first, a 200 ns standard REMD simulation, has seven replicas at seven temperatures (300K, 383K, 489K, 624K, 797K, 1018K, 1300K) and provides the benchmark for the free energy calculation. For comparison, the second and third simulations have only three active replicas each, but the replicas are arranged differently in temperature space. In the second simulation (standard REMD), the three active replicas are fixed at three temperatures (300K, 624K, 1300K); in the third simulation (modified REMD), they are randomly distributed over three of the seven temperatures at each exchange step. Although the computational cost and the temperature range of the last two simulations are equal, their temperature intervals differ: neighboring temperatures are closer to each other in the third simulation.
This effectively improves replica exchange. As for the quantum harmonic oscillator, we first compare the canonical probability distributions in energy space from the three simulations (Fig. 2). For simplicity, only the distributions at three temperatures are shown: the left, middle and right peaks are the distributions at 300K, 624K and 1300K, respectively. Data from the first, second and third simulations are shown by lines, dash-dot lines and dashed lines, respectively. All of them are consistent with each other, indicating that 200 ns is long enough for the MD simulations at the fixed temperatures.

(Color online) Fig.2 Canonical probability distributions of ALA dipeptide in energy space at 300K (left), 624K (middle) and 1300K (right) in the three simulations. The lines, dash-dot lines and dashed lines represent the data from the first, second and third simulations, respectively.

The ideal test of a sampling method is the free energy surface. Fig. 3 shows the free energy surfaces of ALA dipeptide from the three simulations. The collective variables are the backbone Φ and Ψ angles, and the free energy values are calculated from the histogram of the samples in the collective variable space. Panel (a) shows the free energy surface at 300K from the first standard REMD simulation. This simulation uses seven active replicas at seven temperatures (from 300K to 1300K), and the large number of replicas in temperature space makes its free energy surface a reference standard. The surface at 300K clearly shows three minima: two close to each other and a third isolated one. As marked in the figure, two of the minima are named C7eq and C7ax. They are completely separated from each other; free energy information is missing on the surface between them because of the high free energy barrier. This barrier makes direct transitions between the two minima very difficult in a normal MD simulation at room temperature. In the second REMD simulation, we use only three replicas, placed at 300K, 624K and 1300K. Although only four of the seven replicas are removed, the average acceptance ratio of replica exchange drops sharply from 0.284 to 0.001.
Such a small ratio can severely impede the random walk of the replicas in temperature space and thus reduce their sampling efficiency at low temperatures. Fig. 3(b) shows the free energy surface of ALA dipeptide from the second REMD simulation. It also has three minima, and its free energy values are similar to those of the first simulation; we examine this in more detail below. For comparison, Fig. 3(c) shows the free energy surface from the third, modified REMD simulation. This simulation uses seven replicas at the same seven temperatures as the first REMD simulation, but only three of them are randomly selected as active replicas at each exchange step, so its computational cost is identical to that of the second simulation. More temperatures (or replicas) in temperature space make replica exchange more frequent: the average acceptance ratio in the modified REMD simulation is 0.284, the same as in the first seven-replica REMD simulation and much higher than that in the second three-replica REMD simulation (0.001).

(Color online) Fig.3 Free energy surface of ALA dipeptide in the 2D collective variable space from the seven-replica REMD simulation (a), the three-replica REMD simulation (b) and the three-replica modified REMD simulation (c). The free energy profiles along the path connecting the minima C7eq and C7ax (marked in (a)) are shown in (d) by a line, a dash-dot line and a dashed line, respectively.

To check the free energy data, Fig. 3(d) shows the free energy profile along a specific path from minimum C7eq to minimum C7ax (marked by the dashed line connecting the two minima in panel (a)). The x-axis, used as the reaction coordinate, is the distance from the initial state C7eq. The line, dash-dot line and dashed line represent the profiles from the first, second and third simulations, respectively. The free energy barrier in the third, modified REMD simulation is slightly lower than in the other two simulations. This is because of the shorter sampling time at room temperature (300K): the sampling time at each temperature in the modified REMD simulation is 3 (active replicas) × 200 ns / 7 = 85.714 ns, compared with 200 ns in the other simulations. In fact, the same barrier can be obtained by doubling the simulation time of the third simulation. Although the free energy barrier converges more slowly in the modified REMD simulation, this tradeoff is acceptable given the smaller number of active replicas (CPUs). Moreover, beyond the barriers, we are more interested in the minima of the free energy surface. Correctly locating the minima and calculating their free energies reveals the sampling ability of the method, and from this information the transition path of the reaction can be determined.
Owing to the high acceptance ratio of replica exchange, the free energy difference between the two minima of ALA dipeptide (C7eq and C7ax) in the modified REMD simulation is 2.53±0.08 kcal/mol, very close to that in the first simulation (2.61±0.07 kcal/mol) and quite different from that in the second simulation (3.3±0.5 kcal/mol). The statistical errors (root-mean-square errors) are obtained from five independent trajectories (see the next paragraph). These results indicate the reliability of the modified REMD method: using more replicas at more temperature levels, even inactive ones, can effectively improve the free energy calculation.

We now examine the convergence of the free energy surface in the three simulations, measured by the root-mean-square difference of the free energy surface over successive 1 ns intervals (Eq. (8)). The results from the first REMD simulation (line), the second REMD simulation (dash-dot line) and the third, modified REMD simulation (dashed line) are shown in Fig. 4(a). All the root-mean-square errors level off after 80 ns, indicating that 200 ns is long enough for all the REMD simulations and that the free energy surfaces of ALA dipeptide have converged by the end of the simulations. We also check the convergence of the free energy profile along the transition path between minima C7eq and C7ax (shown in Fig. 3(d)). For each kind of simulation (standard REMD or modified REMD), the root-mean-square error of the free energy profile is calculated from five independent 200 ns trajectories. All the errors are shown in Fig. 4(b). The free energy error in the modified REMD simulation is clearly small, and its fluctuation is close to that of the first standard REMD simulation. The free energy error in the second, three-replica REMD simulation, however, is much larger, meaning that its free energy results are more sensitive to the initial conditions.
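The histogram-based free energy surface and the convergence measure of Eq. (8) can be sketched with NumPy as follows. The 40×40 grid on $[-\pi, \pi]$ and $F = -k_BT\ln P$ follow the text; skipping empty bins (infinite free energy), the value of $k_B$, and the synthetic von Mises samples standing in for Φ/Ψ trajectories are our assumptions, and the two "basins" are placed arbitrarily rather than at C7eq and C7ax. Subtracting the grid-averaged difference removes the arbitrary additive constant of each free energy surface, which is our reading of Eq. (8).

```python
import numpy as np

KB = 0.0019872  # Boltzmann constant in kcal/(mol K) (assumed value)


def free_energy_surface(phi, psi, temperature=300.0, n_bins=40):
    """F = -kB T ln P on an n_bins x n_bins grid over [-pi, pi]^2 (kcal/mol)."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    hist, _, _ = np.histogram2d(phi, psi, bins=[edges, edges])
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        fes = -KB * temperature * np.log(prob)   # empty bins become +inf
    return fes - fes[np.isfinite(fes)].min()     # set the global minimum to 0


def rms_change(f_t, f_next):
    """RMS difference between two surfaces after removing their mean offset."""
    mask = np.isfinite(f_t) & np.isfinite(f_next)  # skip unsampled bins
    diff = f_t[mask] - f_next[mask]
    diff -= diff.mean()
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-in for (phi, psi) samples: two arbitrary basins.
rng = np.random.default_rng(0)
n = 50_000
basin = rng.random(n) < 0.8
phi = np.where(basin, rng.vonmises(-1.4, 8.0, n), rng.vonmises(1.2, 8.0, n))
psi = np.where(basin, rng.vonmises(1.2, 8.0, n), rng.vonmises(-1.0, 8.0, n))

f_half = free_energy_surface(phi[: n // 2], psi[: n // 2])
f_full = free_energy_surface(phi, psi)
print(f"RMS change between half and full data: {rms_change(f_half, f_full):.3f} kcal/mol")
```

In a real analysis, `f_half` and `f_full` would be replaced by the surfaces accumulated up to $t$ and $t + \Delta t$ of one trajectory, and the resulting series would be plotted against $t$ as in Fig. 4(a).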

(Color online) Fig.4 (a) Convergence of the free energy surface in the first standard REMD simulation (line), the second standard REMD simulation (dash-dot line) and the third modified REMD simulation (dashed line). The data are updated every 1 ns by eq.(8). (b) Average root-mean-square errors of the free energy profiles on the transition path in the three simulations (see the location of the path in Fig.3(a) and the profiles in Fig.3(d)). The root-mean-square error of each simulation is calculated from five independent 200 ns trajectories.

Previously we showed that the modified REMD method can produce a more reliable free energy surface for ALA dipeptide at the minima, especially for the stable state C7ax, which lies far away from the initial state C7eq. This is an important result: it indicates that the modified REMD method is not only a good free energy calculation tool but also a data mining tool for exploring more stable states in the conformational space. To illustrate its performance, we finally perform a standard REMD simulation and a modified REMD simulation for the third test model, trpzip2. Both simulations use six active replicas from 300 K to 800 K and last for 50 ns, but the modified REMD simulation has ten additional inactive replicas. The peptide starts from an extended β-strand conformation [27]. As discussed in "Materials and Methods", this peptide has a hairpin-like native state [27], so the initial state is very different from the native state (backbone RMSD about 11.56 Å). The free energy surfaces of trpzip2 in both simulations are shown in Fig.5. The collective variables are the backbone RMSD and the average radius of gyration of the aromatic pairs (eq.(9)). From the free energy surfaces, it is clear that trpzip2 successfully samples the native state (backbone RMSD < 1.0 Å) in the modified REMD simulation (Fig.5(b)), but fails to do so in the standard REMD simulation (Fig.5(a)), where the near-native state at position (2.0, 5.0) traps the peptide until the end of the simulation. This comparison indicates the high sampling ability of the modified REMD method: using more inactive replicas in the simulation helps the conformational sampling as well as the free energy calculation.
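The trpzip2 comparison runs six active replicas over sixteen temperature slots (six active plus ten inactive) between 300 K and 800 K. The exact rules of the modified method are not restated in this section, so the Python sketch below is only an illustration under loudly stated assumptions: a geometric temperature ladder, the standard Metropolis exchange criterion min{1, exp[(β_i − β_j)(E_i − E_j)]} applied to neighbouring slots, and a hypothetical rule that re-draws the active slots uniformly at random at every exchange step. Inactive configurations keep the potential energy they had when they were frozen. The names `geometric_ladder`, `swap_accepted` and `exchange_step` are illustrative and do not come from the paper.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol*K)


def geometric_ladder(t_min, t_max, n):
    """ASSUMPTION: n temperatures spaced geometrically from t_min to t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** i for i in range(n)]


def swap_accepted(e_i, e_j, t_i, t_j, rng):
    """Metropolis test for swapping configurations between T_i and T_j.

    e_i, e_j: potential energies of the configurations currently at T_i and T_j.
    Accepts with probability min(1, exp[(beta_i - beta_j) * (e_i - e_j)]).
    """
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return delta >= 0.0 or rng.random() < math.exp(delta)


def exchange_step(temps, config_at, energy, n_active, parity, rng):
    """One exchange step of a hypothetical active/inactive REMD scheme.

    temps[k]     -- temperature of slot k (ascending)
    config_at[k] -- index of the configuration currently held by slot k
    energy[c]    -- potential energy of configuration c (frozen if inactive)
    Swaps are attempted on the non-overlapping pairs (k, k+1) whose lower
    index has the given parity; then a new set of active slots is returned.
    """
    for k in range(parity, len(temps) - 1, 2):
        a, b = config_at[k], config_at[k + 1]
        if swap_accepted(energy[a], energy[b], temps[k], temps[k + 1], rng):
            config_at[k], config_at[k + 1] = b, a
    # HYPOTHETICAL rule: draw the active slots uniformly at random, independent
    # of the configurations, so that the choice itself cannot bias the ensembles.
    return set(rng.sample(range(len(temps)), n_active))


if __name__ == "__main__":
    rng = random.Random(0)
    temps = geometric_ladder(300.0, 800.0, 16)  # 6 active + 10 inactive slots
    config_at = list(range(len(temps)))
    energy = [-200.0 + 8.0 * k for k in range(len(temps))]  # toy energies
    for step in range(4):
        active = exchange_step(temps, config_at, energy, 6, step % 2, rng)
        # Only configurations in active slots would be propagated by MD here;
        # a Gaussian kick stands in for a short MD segment in this toy loop.
        for k in active:
            energy[config_at[k]] += rng.gauss(0.0, 5.0)
        print(step, sorted(active))
```

Alternating the pair parity between exchange steps lets a configuration travel along the whole ladder, and only `n_active` slots need a CPU at any time, which is what allows the number of temperatures to exceed the number of processors.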

(Color online) Fig.5 Free energy surface of trpzip2 in the standard REMD simulation (a) and the modified REMD simulation (b). The two collective variables are the backbone RMSD and the average radius of gyration of the aromatic pairs (eq.(9)).

IV. Conclusion

As an efficient sampling algorithm, the REMD method has been widely used in molecular simulations. Random walks in the temperature space allow all the replicas to escape from metastable states and cross free energy barriers. To maximize the sampling efficiency, the temperature intervals must be small enough and the temperature range must be wide enough. This inevitably increases the number of replicas in the simulation and requires more computer resources. In this paper, we propose a modified REMD method to circumvent this issue. It uses a large number of inactive replicas in addition to the active replicas. The active replicas are responsible for sampling in the canonical ensemble, and the inactive replicas serve as a bridge that helps the active replicas walk in a wider temperature space. The modified REMD method has two advantages. First, it provides a flexible way to perform the REMD simulation: one can freely set the number of active replicas as well as the number of temperatures in the simulation. This differs from the standard REMD method, which requires the number of replicas (temperatures) to be an integral multiple of the number of CPUs in the computer. Second, by using the inactive replicas, it improves the sampling in both the temperature space and the energy space. In other words, with the same number of active replicas, this method is more effective than the standard REMD method and provides more reasonable free energy values at the minima of the free energy surface.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. 31370848, No. 11074084, No. 11174093).


References
[1] Y. Sugita and Y. Okamoto, Replica-exchange molecular dynamics method for protein folding, Chem. Phys. Lett. 314, 141 (1999).
[2] Y. Sugita, A. Kitao, and Y. Okamoto, Multidimensional replica-exchange method for free-energy calculations, J. Chem. Phys. 113, 6042 (2000).
[3] M. Andrec, A. K. Felts, E. Gallicchio, and R. M. Levy, Protein folding pathways from replica exchange simulations and a kinetic network model, Proc. Natl. Acad. Sci. USA 102, 6801 (2005).
[4] W. Li, J. Zhang, J. Wang, and W. Wang, Metal-coupled folding of Cys2His2 zinc-finger, J. Am. Chem. Soc. 130, 892 (2008).
[5] K. Ostermeir and M. Zacharias, Advanced replica-exchange sampling to study the flexibility and plasticity of peptides and proteins, Biochim. Biophys. Acta 1834, 847 (2013).
[6] D. Hamelberg, J. Mongan, and J. A. McCammon, Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules, J. Chem. Phys. 120, 11919 (2004).
[7] D. Bucher, L. C. T. Pierce, J. A. McCammon, and P. R. L. Markwick, On the use of accelerated molecular dynamics to enhance configurational sampling in ab initio simulations, J. Chem. Theory Comput. 7, 890 (2011).
[8] I. Daidone, A. Amadei, D. Roccatano, and A. Di Nola, Molecular Dynamics Simulation of Protein Folding by Essential Dynamics Sampling: Folding Landscape of Horse Heart Cytochrome c, Biophys. J. 85, 2865 (2003).
[9] Z. Zhang, Y. Shi, and H. Liu, Molecular dynamics simulations of peptides and proteins with amplified collective motions, Biophys. J. 84, 3583 (2003).
[10] C. Chen, Y. Xiao, and L. Zhang, A Directed Essential Dynamics Simulation of Peptide Folding, Biophys. J. 88, 3276 (2005).
[11] W. Zhang, C. Wu, and Y. Duan, Convergence of replica exchange molecular dynamics, J. Chem. Phys. 123, 154105 (2005).
[12] X. Periole and A. E. Mark, Convergence and sampling efficiency in replica exchange simulations of peptide folding in explicit solvent, J. Chem. Phys. 126, 014903 (2007).
[13] E. Rosta and G. Hummer, Error and efficiency of replica exchange molecular dynamics simulations, J. Chem. Phys. 131, 165102 (2009).
[14] M. Fajer, D. Hamelberg, and J. A. McCammon, Replica-Exchange Accelerated Molecular Dynamics (REXAMD) Applied to Thermodynamic Integration, J. Chem. Theory Comput. 4, 1565 (2008).
[15] W. Jiang, M. Hodoscek, and B. Roux, Computation of Absolute Hydration and Binding Free Energy with Free Energy Perturbation Distributed Replica-Exchange Molecular Dynamics (FEP/REMD), J. Chem. Theory Comput. 5, 2583 (2009).
[16] G. Xu, J. Wang, and H. Liu, A Hamiltonian replica exchange approach and its application to the study of side chain type and neighbor effects on peptide backbone conformations, J. Chem. Theory Comput. 4, 1348 (2008).
[17] V. Babin and C. Sagui, Conformational free energies of methyl-alpha-L-iduronic and methyl-beta-D-glucuronic acids in water, J. Chem. Phys. 132, 104108 (2010).
[18] W. Jiang and B. Roux, Free Energy Perturbation Hamiltonian Replica-Exchange Molecular Dynamics (FEP/H-REMD) for Absolute Ligand Binding Free Energy Calculations, J. Chem. Theory Comput. 6, 2559 (2010).
[19] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21, 1087 (1953).
[20] U. H. E. Hansmann, Parallel tempering algorithm for conformational studies of biological molecules, Chem. Phys. Lett. 281, 140 (1997).
[21] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules, J. Am. Chem. Soc. 117, 5179 (1995).
[22] P. Ren and J. W. Ponder, Polarizable Atomic Multipole Water Model for Molecular Mechanics Simulation, J. Phys. Chem. B 107, 5933 (2003).
[23] D. Buntinas, C. Coti, T. Herault, P. Lemarinier, L. Pilard, A. Rezmerita, E. Rodriguez, and F. Cappello, Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI protocols, Future Gener. Comp. Sy. 24, 73 (2008).
[24] C. Chen, Y. Huang, W. Jiang, and Y. Xiao, A fast tomographic method for searching the minimum free energy path, J. Chem. Phys. 141, 154109 (2014).
[25] A. G. Cochran, N. J. Skelton, and M. A. Starovasnik, Tryptophan zippers: stable, monomeric β-hairpins, Proc. Natl. Acad. Sci. USA 98, 5578 (2001).
[26] J. Zhang, M. Qin, and W. Wang, Folding Mechanism of β-Hairpins Studied by Replica Exchange Molecular Simulations, Proteins: Struct. Funct. Bioinf. 62, 672 (2006).
[27] C. Chen and Y. Xiao, Observation of multiple folding pathways of β-hairpin trpzip2 from independent continuous folding trajectories, Bioinformatics 24, 659 (2008).
[28] W. Y. Yang, J. W. Pitera, W. C. Swope, and M. Gruebele, Heterogeneous Folding of the trpzip Hairpin: Full Atom Simulation and Experiment, J. Mol. Biol. 336, 241 (2004).
[29] D. Du, Y. Zhu, C. Y. Huang, and F. Gai, Understanding the key factors that control the rate of β-hairpin folding, Proc. Natl. Acad. Sci. USA 101, 15915 (2004).
[30] C. D. Snow, L. Qiu, D. Du, F. Gai, S. J. Hagen, and V. S. Pande, Trp zipper folding kinetics by molecular dynamics and temperature-jump spectroscopy, Proc. Natl. Acad. Sci. USA 101, 4077 (2004).
[31] W. C. Still, A. Tempczyk, R. C. Hawley, and T. Hendrickson, Semianalytical treatment of solvation for molecular mechanics and dynamics, J. Am. Chem. Soc. 112, 6127 (1990).
[32] D. Qiu, P. S. Shenkin, F. P. Hollinger, and W. C. Still, The GB/SA Continuum Model for Solvation. A Fast Analytical Method for the Calculation of Approximate Born Radii, J. Phys. Chem. A 101, 3005 (1997).
[33] B. Ensing, M. De Vivo, Z. Liu, P. Moore, and M. L. Klein, Metadynamics as a Tool for Exploring Free Energy Landscapes of Chemical Reactions, Acc. Chem. Res. 39, 73 (2006).
[34] J. Henin, G. Fiorin, C. Chipot, and M. L. Klein, Exploring Multidimensional Free Energy Landscapes Using Time-Dependent Biases on Collective Variables, J. Chem. Theory Comput. 6, 35 (2010).
[35] J. B. Abrams and M. E. Tuckerman, Efficient and Direct Generation of Multidimensional Free Energy Surfaces via Adiabatic Dynamics without Coordinate Transformations, J. Phys. Chem. B 112, 15742 (2008).
[36] L. Maragliano and E. Vanden-Eijnden, On-the-fly string method for minimum free energy paths calculation, Chem. Phys. Lett. 446, 182 (2007).
[37] C. Chen, Y. Huang, and Y. Xiao, Free-energy calculations along a high-dimensional fragmented path with constrained dynamics, Phys. Rev. E 86, 031901 (2012).
[38] C. Chen, Y. Huang, X. Ji, and Y. Xiao, Efficiently finding the minimum free energy path from steepest descent path, J. Chem. Phys. 138, 164122 (2013).