INSTITUTE OF PHYSICS PUBLISHING
PHYSICAL BIOLOGY
doi:10.1088/1478-3975/2/4/S08
Phys. Biol. 2 (2005) S137–S147
Protein flexibility using constraints from molecular dynamics simulations Tatyana Mamonova1, Brandon Hespenheide2, Rachel Straub1, M F Thorpe2 and Maria Kurnikova1 1 2
Chemistry Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA Department of Physics and Astronomy, Arizona State University, Tempe, AZ 85287, USA
E-mail:
[email protected]
Received 4 July 2005 Accepted for publication 22 August 2005 Published 9 November 2005 Online at stacks.iop.org/PhysBio/2/S137 Abstract Proteins are held together in the native state by hydrophobic interactions, hydrogen bonds and interactions with the surrounding water, whose strength as well as spatial and temporal distribution affects protein flexibility and hence function. We study these effects using 10 ns molecular dynamics simulations of pure water and of two proteins, the glutamate receptor ligand binding domain and barnase. We find that most of the noncovalent interactions flicker on and off over typically nanoseconds, and so we can obtain good statistics from the molecular dynamics simulations. Based on this information, a topological network of rigid bonds corresponding to a protein structure with covalent and noncovalent bonds is constructed, with account being taken of the influence of the flickering hydrogen bonds. We define the duty cycle for the noncovalent interactions as the percentage of time a given interaction is present, which we use as an input to investigate flexibility/rigidity patterns, in the algorithm FIRST which constructs and analyses topological networks.
1. Introduction It has been long recognized that molecular flexibility is as critical for protein function as its structure [1–15]. For example, the binding of a ligand, a substrate or another protein may change protein flexibility in addition to changing its conformation [3, 4, 9, 10], with, e.g., possible implications for functionality and allosteric pathways. Another example of the functional importance of protein flexibility is that the dynamic synchronization of positioning of the substrates in an enzyme active site and the subsequent prompt release of products may be critically important for enzymatic catalysis. Understanding, predicting and manipulating protein function (e.g. in drug design) therefore requires not only understanding of protein structure but also protein flexibility and how it relates to function on a molecular basis. Protein dynamics spans a range of time scales and may exhibit a variety of amplitudes of characteristic fluctuations as reflected, e.g., in B-factors of x-ray structures, an ensemble of structures typically produced in nuclear magnetic resonance (NMR) refinement, time resolved NMR [16, 17], Fourier transform infrared spectroscopy (FTIR) [18–20] and other 1478-3975/05/040137+11$30.00
emerging experimental techniques [7, 10, 21, 22]. While in recent years, rapid progress in time-resolved techniques for studying protein dynamics has been made, the spatial and temporal resolution for such measurements remains limited and often insufficient to characterize functionally important dynamic processes. Molecular dynamics (MD) simulations can provide a fairly realistic description of a protein and solvent dynamics and have proved useful for assessing protein flexibility and dynamics on a nanosecond time scale [23]. One obstacle on the route of extending classical MD simulations to mesoscopic (here hundreds of nanoseconds through microsecond to millisecond) time scales is that the basic time step used in the simulation is limited to resolving the fastest motions in the system. These are often vibrations involving covalent bonds, which occur on the pico-second time scale, and hence require femto-second time steps in the MD for sufficient accuracy. Clearly, the slower the dynamics of functionally important transitions, the less important for its characterization are the fast degrees of freedom such as valence bond and angle vibrational motions, rotation of torsion angles and libration of small chemical groups constituting the amino acid side chains. Based on these and similar observations,
© 2005 IOP Publishing Ltd Printed in the UK
S137
T Mamonova et al
several coarse grained approaches to building a protein model which would preserve molecular structure but may potentially be utilized in studying meso-scopic protein dynamics without accurate description of fast modes have been implemented, such as the Gaussian network models (GNM) [24–27]. The procedure FIRST (floppy inclusions and rigid substructure topography) software recently developed by Thorpe and co-workers [28, 29] allows the flexible regions to be determined by looking for the virtual motion associated with hinges in the protein structure. In this approach, a protein is described by a network of topological bars connecting bodies at the positions of atoms in the three-dimensional structure of the molecule. Rigid bars are substituted for covalent bonds, hydrogen bonds (h-bonds), salt bridges and hydrophobic contacts or tethers. Bodies have six degrees of freedom and the motion between adjacent bodies is constrained by bars (five for covalent and h-bonds, six for locked double bonds and two for hydrophobic tethers) [30]. Then this constructed network is analyzed via a graph-theoretical algorithm [28, 31, 32] to characterize flexibility/rigidity of the network. All atomic positions which are found to be immobile with respect to each other form the so called ‘rigid clusters’ and are thus expected to move in a correlated fashion. As a result of such an analysis, the protein structure is partitioned into a few rigid blocks, the motion of which with respect to each other is expected to dominate the diffusive or long-time-scale dynamics of the protein. FIRST flexibility analysis has been successfully applied to predict protein flexibility [28] and protein unfolding patterns [33]. To simplify the notation in this paper we will further refer to the rigidity analysis of topological networks performed with the graph theoretical algorithm by the term FIRST. This information can then be input in dynamics routines [34] which monitor the actual diffusive motion of the biomolecule or biomolecular complex, while filtering out the high-frequency vibrations. This has been done in the program FRODA (framework rigidity optimized dynamic algorithm) which uses geometric ghost templates formed from the rigid regions to guide the motion [35]. The construction of a topological network which represents a protein structure is not unique and requires certain artistry. Previous implementations of FIRST used energetic and geometric criteria [28, 33] for building hydrogen bonds, salt bridges and hydrophobic tethers; however no information about dynamics of these contacts, such as bond life times, and fraction of time present, was taken into account. In this paper we refer to the fraction of time that a bond between the same specific elements is ‘on’ as the duty cycle [36], which is expressed as a fraction or percentage and for which we use the symbol p. We adopt this term from pulsed lasers, where the duty cycle is the fraction or percentage of time that the pulse is ‘on’, the rest of the time there being no power. Therefore the effective power is the product of the power generated while a pulse is on, multiplied by the duty cycle [36]. A question arises how stable are the noncovalent bonds on time scales relevant to the functioning of a protein. Clearly, analyzing only the average static structure corresponding to the crystallographic structure of a protein is a big oversimplification when looking into such an inherently S138
dynamic problem as understanding and characterizing protein flexibility. Intuitively it seems reasonable to expect that some number of noncovalent native contacts (bonds) are dynamically breaking and reforming in a protein solvated in water at room temperature (the ‘flickering’ of bonds). It is, however, unclear a priori to what extent the dynamics of a folded protein is facilitated by flickering bonds and to what extend the short-time-scale dynamics of flickering bonds is important for predicting long-time-scale flexibility of a protein. A similar question has been posed and investigated in a recent paper by Gohlke et al [12]. These authors employed a statistical approach in which a flexibility index for each residue was calculated using FIRST for every snapshot of an MD trajectory and the results then averaged, i.e. a residue flexibility index defined as an ensemble average of FIRST predictions over instantaneous configurations. These authors concluded that the FIRST approach (applied as described above) and MD simulations agreed with each other rather well. However, in such an approach the relative importance of contacts, with different duty cycles cannot be taken into account. Also, as we will demonstrate in this paper, the rigid region decomposition performed on instantaneous configurations of a protein may vary significantly rendering averaging meaningless in some cases. We analyze the time-dependent behavior of noncovalent bonds in the course of an MD simulation, which is sufficiently long by MD standards but short on a mesoscopic time scale. We will use this information further to represent a protein by a unique network of topological bonds (inferred from the MD simulation and based on the noncovalent bonds duty cycles) which may be analyzed for patterns of rigidity/flexibility using FIRST. Such an implementation allows us to take into account not only the energy of a bond but also its duty cycle, through the effective strength. The two proteins chosen for this study are dynamically quite different. While barnase exhibits its full range of physiologically relevant conformational motions on the simulation time scale (10 ns) the glutamate receptor ligand binding domain (GluR2) remains in its open conformation. This paper is organized as follows. Section 2 discusses liquid water and amorphous ice as a reference point, section 3 describes the two proteins modeled, details of the MD simulations, definitions of hydrogen bonds (h-bonds), hydrophobic tethers and salt bridges. We define criteria for building a topological network and a brief description of the FIRST flexibility algorithm. In section 4 we present and analyze results of the MD simulations, with results for duty cycles. Section 5 summarizes our results.
2. Comment on water To provide a reference system, we will first consider liquid water. In water, the majority of molecules participate in a strong h-bond network almost all the time. Building a topological network for a snapshot of liquid water using an energetic criterion for an h-bond (i.e. directly applying the FIRST algorithm to instantaneous water structure) will result in a network in which all the molecules are bound by several
Protein flexibility using constraints from molecular dynamics simulations
bonds with their immediate neighbors and thus form a single rigid block or amorphous ice. However, recalling that the life time of each of these h-bonds is extremely short. According to [37] the average lifetime of hydrogen bonds in liquid water is about 1 ps at T = 300 K. We can define a ‘persistent h-bond’ as a bond whoose life time is longer than a certain threshold and chose this threshold value to be >2 ps (to ensure that it is longer than most experimental observations reported). If only persistent bonds participate in building a topological network, the water molecules become ‘disconnected’ and the FIRST model will correctly predict their mutual mobility and hence the fluidity of liquid water. The duty cycle is essentially zero, as the molecules diffuse away from the original contact, and hence the effective strength is zero. The effective strength of a noncovalent bond is defined as the product of the strength and the duty cycle. This is quite different in amorphous ice, where an instantaneous snapshot would be identical to liquid water, but where the duty cycle approaches 100% and the effective strength is equal to the actual strength for the hydrogen bonds. Liquid water presents a clear example of energetically strong yet short-lived h-bonds. As we will demonstrate in this paper, such bonds also exist in proteins and in water which is in immediate contact with the protein surface. The contribution of water to the structural, dynamic and functional properties of proteins is widely acknowledged [2, 38, 39] but not well characterized from a dynamics point of view. It is commonly believed that surface and buried water molecules may tighten or rigidify the protein, but in some cases it has been found that the presence of buried water actually makes the protein more flexible due to entropic effects [2]. The majority of earlier simulation studies aimed at addressing this question in quantitative fashion were performed for short times; 1–2 ns at most [39]. At these time scales, water often appears bound in ligand-binding pockets and at the surface of a protein. On longer time scales (10 ns) as reported in this study and [40], the water behavior near a protein surface can be very different, i.e. water can often diffuse in and out of its initial positions. It remains undetermined as to what extent water participates in the formation of persistent hydrogen bonds with the protein, i.e. may influence its rigidity/flexibility. Determining the relevant duty cycle associated with the hydrogen bonds between the protein and water is paramount to address the above question. In the following sections we will present results of our MD simulations of pure liquid water and water near protein interface which support and illustrate this introductory comment. Details of the MD simulations and system setup are described in section 3 and results of the simulations are presented in section 4.
3. Methods 3.1. Molecular dynamics simulations We have performed extensive equilibrium MD simulations of barnase and the glutamate receptor ligand binding domain (GluR2). Barnase (PDB access code 1A2P) [41] resolved by x-ray ˚ is a 110 residue crystallography at a resolution of 1.5 A
single-domain exotoxin with ribonuclease activity, produced by Bacillus amyloliquefaciens. Barnase can be co-expressed and co-folded in E. coli with an inhibitor, barstar. The function of barnase has not been elucidated but it is presumed to function as a digestive enzyme or as an exotoxin against predators or competitors [42]. The GluR2 ligand binding domain crystal structure was ˚ [13] (PDB access code 1FTO). This protein is resolved at 2.2 A a globular water soluble construct of a ligand binding domain of the AMPA glutamate receptor [14, 15]. In the intact GluR2 receptor, a conformational transition of the S1S2 domain upon ligand binding initiates an opening of a trans-membrane ion channel. The binding determinants of the GluR2 protein with its ligand glutamate has been studied at several levels of theory [43, 44]. All the MD simulations reported in this paper were performed using the AMBER 7 package [45] with the Cornell et al force field [46]. Equilibrium trajectories were collected every 0.5 ps. Equilibrium simulations were performed at a constant temperature of 300 K. Proteins were solvated in TIP3P water. Periodic boundary conditions were applied with particle mesh Ewald (PME) method for long-range electrostatics as implemented in AMBER. The temperature was controlled using a Berendsen thermostat [47]. Equilibration simulations were performed at constant pressure and equilibrium simulations were performed at a constant volume to speed up the simulation. Pure water simulations were performed for 5607 TIP3P molecules which formed a box with dimensions of 38 × 38 × ˚ Simulations of barnase were performed in water box 38 A. ˚ comprising 15050 water with dimensions 55 × 54 × 55 A molecules. The GluR2 protein was solvated with 19 914 water ˚ molecules resulting in a box of dimensions 60 × 70 × 65 A. Total equilibrium simulation time for the GluR2 protein was 10 ns. The first 2 ns of the simulations were discarded and the last 8 ns of the collected MD trajectory of the GluR2 ligand binding domain were used for analysis presented in section 4. Total simulation time of barnase was slightly less than 8 ns. 3.2. FIRST The program FIRST (floppy inclusions and rigid substructure topography) reads a file listing the 3D coordinates of a molecular system, such as a protein. The atom positions can be thought of as the vertices of a 3D graph. The covalent constraints are easily identified from the chemistry, i.e. the covalent bond lengths and angles and double bonds such as the peptide bond, around which rotations are not allowed. The h-bonds, salt bridges and hydrophobic tethers, which comprise the noncovalent bonds, are determined from the local geometries and a local energy function, and included in the constraint set [28, 29]. The final molecular graph, which consists of all the atoms, covalent bonds and noncovalent interactions, is then mapped to a generic body–bar graph, where the atoms are represented by bodies with six degrees of freedom [30]. Flexibility analysis of the body–bar graph proceeds via the pebble game algorithm [31, 32], which is embedded within FIRST. The results indicate for each bond in S139
T Mamonova et al (a)
(b)
2
1.5
RMSD (Å)
RMSD (Å)
2
1 0.5
1.5 1 0.5 0
0 0
2
4 time (ns)
6
8
2
4
6 time (ns)
8
10
Figure 1. The RMSD of main chain N, C and Cα atoms, relative to the starting structure, during approximately the last 8 ns of an MD trajectory for (a) barnase and (b) the glutamate receptor protein GluR2 S1S2 protein.
the body–bar graph whether it can participate in a dihedral angle rotation (flexible) or not (rigid). The rotations are rarely independent but occur together over a correlated region which defines collective motions. The results from FIRST are decomposed into rigid clusters, which are regions of contiguous rigid bonds, and flexible regions. This information is then mapped back onto the molecular graph, and the rigid and flexible regions of the protein can be used as an aid to interpret structure/function relationships, etc. 3.3. Topological bond networks The rigid cluster decomposition method implemented in the software FIRST [28] requires the input of a network of bonds, which is then analyzed for rigidity/flexibility. In the present work a network of topological bonds representing a protein structure was constructed from both covalent and noncovalent bonds of protein and water molecules inferred from MD simulations3 . While construction of topological bonds corresponding to chemical covalent bonds is trivial (each topological bond corresponds to a single covalent bond in the molecule), construction of topological bonds which represent noncovalent interactions such as hydrogen bonds, salt bridges and hydrophobic contacts requires special attention. The quality of the results depends critically on this input, in that it is this part of the algorithmic input which determines whether a rigid block decomposition is adequate to the dynamical properties of a protein under study and is the focus of this paper. An algorithm for constructing topological bonds corresponding to noncovalent bonds proposed previously [28, 30] was based on analyzing a static crystallographic or an NMR structure of a protein and energetic and geometric criteria [48–50]. In this work we have used MD simulations to generate a list of hydrogen bonds, salt bridges and hydrophobic contacts based on both local energy and geometric criteria and on duty cycles as determined from MD, and thus we include important new dynamical information into our criterion for defining a topological bond. The main idea is to use short MD simulations over a few nanoseconds to define the h-bond and hydrophobic tethers and subsequently use FIRST to analyze this constructed network 3
It is important to note that we used the ‘hbin’ and ‘phin’ options of FIRST5 that bypass the standard algorithms for noncovalent bond inclusion and used the list of bonds inferred from MD simulations as described in this section. The algorithm for the rigid region decomposition of an existing topological bond network has been unmodified as described in this work and elsewhere [30, 33].
S140
for rigid and flexible regions, thus permitting approaches such as FRODA [51] to be used to study diffusive motions.
4. Results 4.1. MD simulations and FIRST MD simulations of proteins barnase and GluR2 were performed in explicit water (TIP3P model) at room temperature. The root mean square deviation (RMSD) of atomic positions was calculated for both structures as a measure of structural stability. The N, C and Cα atomic RMSD with respect to the starting structure are shown for both proteins in figure 1. Both proteins were stable in the course of the simulation as reflected by low RMSD values, and thus remain in the folded state as would be expected. Shown in figure 2(a) are average atomic Cα RMSD values for barnase calculated per residue with respect to the average conformation of the backbone—that is the backbones of all the MD conformers were globally aligned. For comparison, shown in the same figure are atomic Cα RMSD values for the ensemble of conformations from an NMR experiment. In figure 2(b), the barnase structure is shown color-coded according to the residues RMSD values (figure 2(a)) to reveal the relative mobility of residues with respect to the average conformation. As seen from figure 2(b), barnase appears to have one relatively rigid core region comprised of the β-sheet and one α-helix, and a relatively mobile region that controls entrance to the binding site of the protein. The most rigid (cold) regions are shown in blue and the most mobile (hot) regions are shown in red in figure 2(b) In figure 3, the barnase structure is color-coded according to the rigid cluster decomposition in FIRST [28]. To produce this decomposition, hydrogen atoms were added to the structure downloaded from PDB databank (PDB access code 1A2P) and slightly optimized to remove steric clashes of atoms using AMBER 7; then the structure was analyzed using FIRST which generated a topological bond network [28] and then graph theory was used to find rigid clusters in the network [28]. The largest rigid cluster, which defines the ‘rigid core’ of barnase, is shown in blue. Comparing figures 2(b) and 3, we can conclude that the largest rigid cluster predicted by FIRST coincides mostly with the region of the protein which is least mobile in the MD simulation. We have applied a similar analysis to the GluR2 protein, in order to gain some additional perspective.
Protein flexibility using constraints from molecular dynamics simulations
(b)
(a)
2.5
RMSD (Å)
2 1.5 1 0.5 0 0
20
40
60
80
100
residue number
(c)
Figure 2. (a) The RMSD values of barnase calculated per residue for Cα atoms which have been globally aligned from our 10 ns MD simulation (solid line) and from the NMR experiment [52] (PDB access code 1fw 7) (broken line). (b) Backbone of barnase colored ˚ green is for 1 A ˚ < RMSD < 1.25 A, ˚ according to the RMSD values in (a). The coloring scheme is as follows, blue is for RMSD < 1 A, ˚ < RMSD < 2 A ˚ and red is for RMSD > 2 A. ˚ (c) Two MD snapshots of barnase aligned using a list of residues with yellow is for 1.25 A ˚ (blue region in (b)); shown in light green color is the structure at 4.475 ns (closed conformation) and in grey is the structure at RMSD < 1 A 8.2 ns (open conformation).
4.2. Duty cycles of noncovalent bonds from MD simulations
Figure 3. A ribbon diagram showing a rigid cluster decomposition of barnase using FIRST. The five largest rigid clusters are colored in descending order according to size; blue, red, orange, yellow and green, respectively. Clusters of smaller sizes are lumped together and shown in grey.
In figure 4, an analysis of GluR2 flexibility during the MD simulation is shown as an average RMSD plot (figure 4(a)) and the color-coded structure of the protein is shown in figure 4(b). In figure 5, the GluR2 structure is shown colored according to the rigid cluster decomposition obtained by using FIRST. For GluR2 (as well as in the case of barnase described above) the rigid cluster decomposition and residue mobilities attributed from RMSD values shown in figure 4 are in reasonable agreement.
An MD simulation produces an ensemble of conformational isomers of a protein in the vicinity of its native state. At room temperature atomic positions in a protein molecule fluctuate constantly due to thermal noise, within the water bath. Despite the fact that these fluctuations may occur in the vicinity of one global conformation, they may involve, for example, rotation and bending of side chains near their average positions, as well as positional fluctuation of backbone loops and hinges, which in turn may result in the short-lived formation or breaking of h-bonds and hydrophobic contacts. Thus each static conformer (a snapshot from an MD simulation) can be characterized by its own distribution of h-bonds, salt bridges and hydrophobic tethers. It is not clear from ad hoc considerations how sensitive the rigidity of the topological network constructed using the FIRST software will be to such thermal fluctuations of the protein structure at room temperature. In order to answer this question we applied FIRST [28] to several snapshots of the GluR2 structure extracted from the MD simulation4 . Visual inspection of the rigid cluster decomposition shown in figure 6 revealed that the FIRST prediction changes significantly from snapshot to a snapshot. Therefore it appears that properties 4
Here we used the FIRST 3.0 software as described in section 2.2 and [28, 30].
S141
T Mamonova et al
RMSD (Å)
(b) 3.5 3 2.5 2 1.5 1 0.5 0
(a)
0
50
100
150
200
250
residue number
Figure 4. (a) The RMSD values of GluR2 S1S2 calculated per residue for the Cα backbone atoms. (b) The backbone GluR2 S1S2 colored ˚ green is for 1 A ˚ < RMSD < 1.25 A, ˚ yellow is for 1.25 A ˚ < RMSD < 2 A ˚ and red is for RMSD > 2 A. ˚ as follows, blue is for RMSD < 1 A,
2ns
MDL
3ns
MDL
4ns
MDL
Figure 6. Snapshots of the GluR2 structure (PDB access code 1FTO) taken from an MD simulation at indicated times and analyzed with FIRST.
Figure 5. The GluR2 structure is shown colored by rigid region decomposition predicted using FIRST [28]. FIRST is applied to the structure downloaded from PDB (PDB access code 1FTO) and minimized for 100 steps with the steepest descent algorithm (AMBER). The largest rigid cluster is shown in blue. Other clusters are colored to distinguish them from one another. All clusters that include three atoms or less are colored yellow.
of topological networks constructed using instantaneous lists of noncovalent bonds may vary randomly and thus be not representative of the actual rigidity/flexibility of a protein. In order to understand the time-dependent behavior of noncovalent contacts which causes variations in the rigid region decompositions, illustrated in figure 6, we first looked at various hydrophobic and hydrogen bonds in the course of the MD simulation. In figure 7(a), the time-dependent behavior of a very stable long-lived h-bond is shown. In this figure, the donor–hydrogen–acceptor distance is plotted as a function of time (left panel) and the donor–hydrogen–acceptor angle is plotted in the right panel. For this bond, both the distance and the angle satisfied the h-bond definition criteria (section 3.3) [28, 30] most of the time; however fluctuations out of the range of geometric definitions of noncovalent bonds do exist for even the most stable bonds. A typical example of a flickering bond is shown in figure 7(c). In this particular case, the bond fluctuates in a steady fashion and has a duty cycle of 75%. Analysis performed on trajectory fragments of various lengths resulted in roughly equivalent estimates of the duty cycle, indicating S142
that even a short-time 2 ns simulation is sufficient to correctly deduce this bond’s duty cycle. In figure 7(d ), we show the behavior of a salt bridge that took the first part of the trajectory to form properly, after which it remained stably bonded for the rest of the simulation. This is again an example in which observation at shorter times may produce an incorrect assessment of duty cycle for this bond. Overall the number of salt-bridges is not very high. In our simulations, behavior of most salt bridges stabilized after 4 ns of the simulation. One atom may often form noncovalent bonds with several neighboring atoms in a single chemical group. Often such contacts occur within tightly packed groups of hydrophobic contacts. Due to the thermal motion, atoms in such groups may switch nearest neighbors from time to time and thus switch bonds without violating their packing. Figure 8 illustrates this observation. In figure 8(a), the two carbon atoms of the Val 234 residue take turns to form a contact with one atom of Ile 218, whereas in figure 8(b) the Val 4 carbon exchanges its bond with two atoms of Leu 49. Note that in both cases shown in figure 8, only one bond for a pair of residues exists at any one time. Thus, in some cases a question arises: how many stable contacts should be counted for a group of atoms found in a close contact? The distributions of the duty cycle p of noncovalent bonds during the MD simulation of the GluR2 protein are plotted in figure 9. We have considered several fragments of the total MD trajectory of various time lengths, which were carved out of the total trajectory using different starting points (all plotted in figure 9) to assess stability of the results.
Protein flexibility using constraints from molecular dynamics simulations 200
Distance (Å)
Angle (deg.)
4
(a)
3 2 1
150 100 50
0 2
4
6
8
0
10
2
4
6 time (ns)
time (ns) 200
8
150
Angle (deg.)
10
Distance (Å)
(b)
6 4 2
50 0
2
4
6
8
10
2
4
6 time (ns)
8
10
2
4
6 time (ns)
8
10
2
4
6 time (ns)
8
10
time (ns)
6 Angle (deg.)
Distance (Å)
200
4 2
160 120 80 40
0
0
2
4
6 time (ns)
8
10
200
8
160
Angle (deg.)
10 Distance (Å)
(d )
10
100
0
(c)
8
6 4 2
120 80 40
0 2
4
6 time (ns)
8
0
10
8
8
6
6
Distance (Å)
Distance (Å)
Figure 7. In the GluR2 protein shown are hydrogen–acceptor distances (left panels) and donor–hydrogen–acceptor angles (right panels) between possible h-bond forming atoms: (a) Gly173.N–H and Thr169.O (bond duty cycle is 92%), (b) Thr5.OG1–HG1 and Glu73.OE1 (bond duty cycle is 57%), (c) Thr115.N–H and Lys112.O (bond duty cycle is 75%) and (d ) salt bridge Arg144.NE–HE and Glu141.OE2 (bond duty cycle is 70%).
4 2
4 2 0
0 2
4
(a)
6
8
2
10
8
Distance (Å)
10
8
Distance (Å)
6
8
10
8
10
time (ns)
10
6 4 2
6 4 2 0
0 2
(b)
4
time (ns)
4
6 time (ns)
8
10
2
4
6 time (ns)
Figure 8. In the GluR2 protein shown are distances between possible hydrophobic contact forming atoms. (a) Left panel: Ile218.CG1 and Val234.CG2 (bond duty cycle is 59%); right panel: Ile218.CD1-Val234.CG2 (bond duty cycle is 23.5%). (b) Left panel: Val4.CG1 and Leu49.CD1 (bond duty cycle is 40%); right panel: Val4.CG1 and Leu49.CD2 (bond duty cycle 14%).
S143
T Mamonova et al 160
(a)
120
number of h-bonds
number of h-bonds
140
100 80 60 40 20
120 100 80 60 40 20 0
0
20 30
40
60
50
70 80
90
100
20
30
40
50
60
70
80
90
100
duty cycle (%)
duty cycle (%) 20 number of h-bonds
(b)
140
(c)
16 12 8 4 0
20
30
40
50
60
70
80
90
100
duty cycle (%)
Figure 9. Number of noncovalent bonds in GluR2 from the MD simulation distributed according their duty cycles. The blue (the left column in each panel) columns in each panel represent distributions of noncovalent bonds calculated using the fragment of the length 8 ns of the trajectory, which started at 2 ns of the trajectory absolute time and lasted until 10 ns. Red columns is for the number of bonds observed in a trajectory fragment of 4 ns duration (starting at 6 ns ending at 10 ns); orange columns correspond to the analysis of the second fragment of 4 ns duration (starting at 2 ns ending at 6 ns in trajectory absolute time); Green is for the trajectory fragment of 2 ns length (starting at 2 ns ending at 4 ns). The order of columns also corresponds to the order of the above description and is preserved in all three panels: (a) hydrogen bonds, (b) hydrophobic tethers and (c) salt bridges.
One can see from figure 9(a) that long-lived hydrogen bonds dominate the distribution with a relatively smaller number of bonds which were present with a duty cycle in the 30–70% range. Hydrophobic tethers exhibit strikingly different behavior (see figure 9(b)) in comparison to h-bonds. An approximately exponentially decaying distribution of duty cycles in figure 9(b) with a large number of shorter duty cycles being probably a manifestation of the fact that these interactions are not particularly specific, i.e. as long as the total number of contacts is preserved the structure will remain in the collapsed state. It is interesting to note that we found an approximately equal number of h-bonds and hydrophobic tethers among bonds with duty cycles exceeding 50% of the total trajectory length, regardless of the length of the trajectory fragment among the fragments considered, indicating that the MD simulations were sufficiently long to accurately estimate the distribution of noncovalent bond duty cycles. The total number of salt bridges (figure 9(c)) is significantly smaller than the number of h-bonds and hydrophobic tethers, and so the statistics are poorer. The shape of the distribution of their duty cycle distribution is intermediate between the h-bonds and hydrophobic tether distributions. On average, salt bridges are found to be less stable. We can attribute this observation to the fact that charged side groups that form salt bridges reside mostly at the surface of the protein, where they can be well solvated with water, and thus their interactions with other charges in the vicinity S144
are weakened by the dielectric screening. It is interesting to note that the patterns of duty cycle distributions for all three categories of noncovalent bonds are preserved for shorter observation times (2 ns and 4 ns, respectively). However, the distribution of duty cycles of salt bridges observed for the fragment of 2 ns duration differs significantly from that of the 8 ns trajectory fragment (see figure 9(c)). The observation during the longer trajectory fragments are of course the most meaningful. The distributions of duty cycles for either half of the trajectory (the first 4 ns of the trajectory and the last 4 ns of the trajectory in figure 9) were found to be very similar to each other, indicating that the dynamics of the protein has been approximately at a steady state. Nevertheless, a small number of examples in which a bond is present for only half of the trajectory time do exist. One such example is the time-dependent behavior of a bond shown in figure 7(b). This particular case and a few others are found occasionally (e.g., compare distributions for two 4 ns trajectory pieces in figure 9) but they may be important for defining regions of possible conformational transitions and may influence predicted rigidity/flexibility patterns. In the next subsection we will use the information about the duty cycles of noncovalent bonds inferred from the MD simulation to construct topological networks of bonds and analyze them for rigidity/flexibility.
Protein flexibility using constraints from molecular dynamics simulations
60%
50%
45%
80%
65%
Figure 10. Rigid region decompositions for GluR2, where blue, red, orange, green and yellow represent rigid clusters are arranged sequentially from the largest one (blue) to the fifth largest (yellow). Threshold values for the duty cycle pc of noncovalent bonds which were included in topological network are indicated in figure panels. (a)
(b)
(c)
Figure 11. Rigid cluster decompositions for GluR2, where blue, red, orange, green and yellow represent rigid clusters arranged sequentially from the largest one (blue) to the fifth biggest (yellow). Threshold values for noncovalent bond duty cycle which were included in topological network and trajectory fragment lengths are as follows: (a) pc = 55%; MD fragment length is 2 ns, (b) pc = 55%; MD fragment length is 4 ns and (c) pc = 50%; MD fragment length is 8 ns.
4.3. MD based topological bond network analyzed with FIRST In this subsection a topological network corresponding to the GluR2 protein 3D structure has been constructed in which atom positions are vertexes of the network which are connected by rigid constraints corresponding to all the covalent chemical bonds in the system, as well as a number of noncovalent bonds. The rigidity/flexibility of this network of bonds is then analyzed using the algorithm included in the FIRST program. Noncovalent bonds were chosen based on the duty cycles found from the MD simulation results. Only bonds whose presence in the MD simulation was extended beyond the threshold value of the duty cycle pc were included in the topological network. This critical value of pc was taken to be identical for h-bonds, hydrophobic tethers and salt bridges. In figure 10, the GluR2 protein is shown colored according to the rigid cluster decomposition calculated with several different threshold duty cycle values pc. In general, choosing smaller values of pc results in a larger number of noncovalent bonds being included in the topological network of bonds and thus leads to a more rigid structure. Note that the largest rigid region (colored in blue) includes nearly the entire protein at pc = 45%. For pc = 80% the largest rigid cluster shrinks significantly, while most of the protein remains flexible.
Decomposition of the protein into rigid clusters with duty cycle thresholds in the range 50–65% is very similar with the largest rigid cluster gradually shelling-off residues and loops from its periphery. It is useful to revisit diagrams in figure 9 to see that the variation in the number of h-bonds (figure 9(b)) and salt-bridges (figure 9(c)) for this range of threshold values pc is relatively small. A large variation in the number of hydrophobic tethers can be expected when the duty cycle threshold varies between 50% and 65% (figure 9(b)). However by construction of the FIRST algorithm, and because of their nonspecific nature, the hydrophobic tethers influence on the protein rigidity is weaker than that of h-bonds. By comparing figure 10 with the RMSD distribution shown in figure 3 and experimental observations of McFeeters and Oswald [53], we may empirically conclude that taking into account bonds with duty cycles longer than 50% resulted in the network that adequately characterizes rigidity patterns in this protein on the nano-second time scale. The next important question is to determine what is the shortest MD trajectory which is sufficient to adequately predict the protein rigidity pattern? In figure 11, we have plotted the rigid cluster decomposition performed with duty cycle threshold values of 50–55% for trajectory fragments of 2 ns, 4 ns and 8 ns duration, respectively. All three predicted S145
T Mamonova et al
patterns compare well, with only minor differences in predicting rigid cluster decomposition. Discrepancies between analyses made using 2 ns and 8 ns long trajectory fragments are slightly larger than between the 4 ns and 8 ns fragments, indicating that the 2 ns long trajectory was not quite sufficient to correctly capture the behavior of bonds which had longer transient periods in the beginning of the simulation, e.g. the one shown in figure 7(d ) and discussed in the previous section. 4.4. Water dynamics and protein flexibility An important question addressed in this study is to what extent surface and buried water play a role in protein flexibility/rigidity. Most published crystal structures of proteins contain (buried) water molecules, thus the question arises whether these water molecules need to be taken into account for analyzing protein flexibility at room temperature. To help answer this question we have monitored water behavior in the bulk solution and near the protein surface during the MD simulation. Water molecules in the liquid phase form a dense network of hydrogen bonds at every instant of time, therefore taking into account only energetic or geometric criteria to define h-bonds and hydrophobic tethers will result in a strongly interconnected network forming a single rigid block that includes all water molecules, i.e. an amorphous ice. However from our comments on water in section 2, it is known that the life time of a hydrogen bond in liquid water at room temperature is short, ca 1 ps [37]. In this work, our simulations of bulk water at room temperature resulted in average life time of a hydrogen bond of 10–15 ps. We have used the TIP3P water model which is required for simulating a solvated protein with Cornell et al force field [46]. TIP3P is notoriously inaccurate [54] in reproducing water dynamic properties, overestimating h-bond life times by the order of magnitude. However, even largely overestimated life times are much less than any duty cycle threshold pc values we used above, over time scales greater than a few nanoseconds. This is because the water molecule is free to diffuse away after the h-bond is broken in the liquid. As a result, no hydrogen bonds formed by bulk water were good candidates for inclusion in the final network of topological bonds. For comparison we have also performed the simulation of bulk water at 100 K, which is typically used for determining protein structure in x-ray crystallography. Lowering the temperature of the simulation resulted in a sharp increase in h-bond life times with some of them persisting for as long as 100% of the 1.5 ns simulation. Thus, despite the fact that TIP3P does not quantitatively reproduce mobility in bulk water, for the purpose of our analysis, it exhibited qualitatively correct behavior. Next, we looked at life times of water molecules at the surface and in the binding pocket of the GluR2 protein, distances and angles of all water atoms to all protein atoms were monitored over 2 ns. Only two water molecules located inside the protein formed stable hydrogen bonds of duration of slightly less than 2 ns; both as hydrogen acceptors with Lys140.N and Ala84.N. Thus, most water molecules which S146
solvated the protein, even those that were present in the low temperature x-ray structure, have not exhibited long enough h-bond duty cycles at room temperature to be included in final topological network and thus play no role in FIRST. This validates the omission of most liquid water in previous work [33], which only took into account ‘buried water’ as seen in x-ray crystallographic structures. As our simulation demonstrates even buried water may exchange with bulk water at time scales shorter than 2 ns and thus be fluid as far as slower conformational reorganizations are concerned, and should probably be ignored in FIRST. Note that if a hydrogen bond involving a specific site in the protein, switches from one water molecule to another, that is not counted as part of the duty cycle.
5. Conclusions In this work we have analyzed the interplay between protein flexibility/rigidity and the duty cycles of hydrogen bonds, salt bridges and hydrophobic tethers formed in the tertiary structure of proteins. The duty cycle is defined as the fraction of time that a bond is formed. We have utilized molecular dynamics (MD) simulations, followed by a rigid region decomposition of networks formed by covalent and noncovalent bonds using FIRST. While FIRST was originally developed to analyze the rigidity and flexibility of a protein based solely on its 3D structure obtained by either x-ray or NMR-crystallography, in this paper we have developed a novel composite approach which allows us to use FIRST for the rigid region decomposition of a protein based on both the protein structure and the noncovalent bond duty cycles, which determine what fraction of time a bond is present. We have utilized a two step protocol in which initially, stability and the distribution of duty times of noncovalent contacts are determined during the MD simulation. Secondly, the topological network of bonds was constructed using the MD inferred information about noncovalent contacts stability and further analyzed for rigid cluster decomposition using FIRST. Two proteins, barnase and the Glutamate receptor ligand binding domain (GluR2), were chosen for MD simulations and their apparent flexibility during these simulation accessed. In 10 ns simulations, the MD predicted flexibility of barnase compare well with the NMR ensemble of conformations, leading us to conclude that barnase exhibited its full range of motions in the MD simulations. Rigid region decomposition of barnase using the FIRST algorithm adequately identified the least flexible region of the protein as its largest rigid cluster. The GluR2 protein which is known to undergo a conformational change upon ligand binding remained in one conformation during the 10 ns MD simulation and thus appeared relatively rigid. We performed an analysis of the duty cycles of the noncovalent bonds for both proteins and found that a high percentage of hydrogen bonds were stable during the simulation while hydrophobic tethers fluctuate and switch partners often. Taking into account bonds with life times of at least 4 ns and analyzing this constructed network with FIRST,
Protein flexibility using constraints from molecular dynamics simulations
resulted in a rigid region decomposition which compared well with the MD simulation predicted mobility of residues and experimental results. We have focused on the duty cycle to determine which noncovalent interactions to include in rigidity analysis, and this has indeed proved useful. However it seems it would be useful to investigate defining an effective noncovalent bond strength by time averaging the strength over a suitable interval (∼10 ns). Thus noncovalent bonds, both hydrogen bonds and hydrophobic tethers, would be chosen on the basis of the strength of the effective interaction for inclusion (or not) in rigidity analysis using FIRST.
Acknowledgments We would like to acknowledge support from the NIH under grant number GM067249, the Biodesign Institute at Arizona State University, the Research Corporation and the Pittsburgh Supercomputer Center (NSF PACI grant). We should like to thank Maria Kilfoil for a critical reading of the manuscript.
References [1] Carlson H A, Masukawa K M and McCammon J A 1999 J. Phys. Chem. A103 10213–9 [2] Fischer S and Verma C S 1999 Proc. Natl Acad. Sci. USA 96 9613–5 [3] Gallicchio E, Kubo M M and Levy R M 1998 J. Am. Chem. Soc. 120 4526–7 [4] Celej M S, Montich C G and Fidelio G D 2003 Protein Sci. 12 1496–506 [5] Jimenez R, Salazar G, Yin J, Joo T and Romesberg F E 2004 Proc. Natl Acad. Sci. USA 101 3803–8 [6] Kern D and Zuiderweg E R P 2003 Curr. Opin. Struct. Biol. 13 748–57 [7] Markelz A, Whitmire S, Hillebrecht J and Birge R 2002 Phys. Med. Biol. 47 3797–805 [8] McFeeters R L and Oswald R E 2004 FASEB J. 18 428–38 [9] Merlino A, Vitagliano L, Ceruso M A and Mazzarella L 2003 Proteins: Struct. Funct. Genet. 53 101–10 [10] Merlino A, Vitagliano L, Sica F, Zagari A and Mazzarella L 2004 Biopolymers 73 689–95 [11] Radivojac P, Obradovic Z, Smith D K, Zhu G, Vucetic S, Brown C J, Lawson J D and Dunker A K 2004 Protein Sci. 13 71–80 [12] Gohlke H, Kuhn L A and Case D A 2004 Proteins—Struct. Funct. Bioinform. 56 322–37 [13] Hogner A, Kastrup J S, Jin R, Liljefors T, Mayer M L, Egebjerg J, Larsen I K and Gouaux E 2002 J. Mol. Biol. 322 93–109 [14] Madden D R 2002 Nat. Rev. Neurosci. 3 91–101 [15] Mayer M L and Armstrong N 2004 Annu. Rev. Physiol. 66 161–81 [16] Eisenmesser E Z, Bosco D A, Akke M and Kern D 2002 Science 295 1520–3 [17] Kern D, Eisenmesser E Z and Wolf-Watz M 2005 Methods Enzymol. 394 507–24 [18] Jayaraman V, Thiran S and Hess G P 1999 Biochemistry 38 11372–8 [19] Jayaraman V, Keesey R and Madden D R 2000 Biochemistry 39 8693–7 [20] Jayaraman V, Thiran S and Madden D R 2000 FEBS Lett. 475 278–82
[21] Bernado P, Fernandes M X, Jacobs D M, Fiebig K, de la Torre J G and Pons M 2004 J. Biomol. NMR 29 21–35 [22] McElroy C, Manfredo A, Wendt A, Gollnick P and Foster M 2002 J. Mol. Biol. 323 463–73 [23] Hansson T, Oostenbrink C and van Gunsteren W F 2002 Curr. Opin. Struct. Biol. 12 190–6 [24] Brooks B and Karplus M 1983 Proc. Natl Acad. Sci. USA 80 6571–5 [25] Tama F and Sanejouand Y H 2001 Protein Eng. 14 1–6 [26] Bahar I, Erman B, Jernigan R L, Atilgan A R and Covell D G 1999 J. Mol. Biol. 285 1023–37 [27] Tama F and Brooks C L III 2005 J. Mol. Biol. 345 299–314 [28] Jacobs D J, Rader A J, Kuhn L A and Thorpe M F 2001 Proteins: Struct. Funct. Genet. 44 150–65 [29] Thorpe M F, Lei M, Rader A J, Jacobs D J and Kuhn L A 2001 J. Mol. Graph. Model. 19 60–9 [30] Hespenheide B M, Jacobs D J and Thorpe M F 2004 J. Phys.: Condens Matter 16 S5055–64 [31] Jacobs D J and Thorpe M F 1995 Phys. Rev. Lett. 75 4051–4 [32] Jacobs D J and Thorpe M F 1996 Phys. Rev. E Stat. Phys., Plasmas, Fluids, and Related Interdisciplinary Topics 53 3682–93 [33] Hespenheide B M, Rader A J, Thorpe M F and Kuhn L A 2002 J. Mol. Graph. Model. 21 195–207 [34] Lei M, Zavodszky M I, Kuhn L A and Thorpe M F 2004 J. Comput. Chem. 25 1133–48 [35] Wells S, Dove M and Tucker M 2004 J. Appl. Crystallogr. 37 536–44 [36] Basov N G 1977 Temporal Characteristics of Laser Pulses and Interaction of Laser Radiation with Matter (New York: Plenum ) [37] Matsumoto M, Saito S and Ohmine I 2002 Nature 416 409–13 [38] Lemieux R U 1996 Acc. Chem. Res. 29 373–80 [39] Ni H H, Sotriffer C A and McCammon J A 2001 J. Med. Chem. 44 3043–7 [40] Henchman R H, Tai K S, Shen T Y and McCammon J A 2002 Biophys. J. 82 2671–82 [41] Bernstein F C, Bernstein F C, Williams B, Meyer E F, Rodgers J R, Shimanouchi T and Tasumi M 1977 J. Mol. Biol. 112 535–42 [42] Hartley R W 1989 Trends Biochem. Sci. 14 450–4 [43] Speranskiy K and Kurnikova M 2004 J. Chem. Phys. 121 1516–24 [44] Speranskiy K and Kurnikova M 2005 Biochemistry 44 11508–17 [45] Pearlman D A, Case D A, Caldwell J W, Ross W S, Cheatham T E, Debolt S, Ferguson D, Seibel G and Kollman P 1995 Comput. Phys. Commun. 91 1–41 [46] Cornell W D, Cieplak P, Bayly C I, Gould I R, Merz K M, Ferguson D M, Spellmeyer D C, Fox T, Caldwell J W and Kollman P A 1995 J. Am. Chem. Soc. 117 5179–97 [47] Berendsen H J C, Postma J P M, Vangunsteren W F, Dinola A and Haak J R 1984 J. Chem. Phys. 81 3684–90 [48] Stickle D F, Presta L G, Dill K A and Rose G D 1992 J. Mol. Biol. 226 1143–59 [49] Xu D, Tsai C J and Nussinov R 1997 Protein Eng. 10 999–1012 [50] Dahiyat B I, Gordon D B and Mayo S L 1997 Protein Sci. 6 1333–7 [51] Wells S, Menor S, Hespenheide B M and Thorpe M F 2005 Phys. Biol. 2 S127–36 [52] Nolde S B, Arseniev A S, Orekhov V Y and Billeter M 2002 Proteins: Struct. Func. Genet. 46 250–8 [53] McFeeters R L and Oswald R E 2002 Biochemistry 41 10472–81 [54] Mark P and Nilsson L 2001 J. Chem. Phys. B 105 24A
S147