A simplified force field for describing vibrational protein dynamics over the whole frequency range

Konrad Hinsen and Gerald R. Kneller
Centre de Biophysique Moléculaire (UPR 4301 CNRS), Rue Charles Sadron, 45071 Orléans Cedex 2, France
E-Mail: [email protected] / [email protected]

Abstract

The empirical force fields used for protein simulations contain short-ranged terms (chemical bond structure, steric effects, van-der-Waals interactions) and long-ranged electrostatic contributions. It is well known that both components are important for determining the structure of a protein. We show that the dynamics around a stable equilibrium state can be described by a much simpler mid-range force field made up of the chemical bond structure terms plus unspecific harmonic terms with a distance-dependent force constant. A normal mode analysis of such a model can reproduce the experimental density of states as well as a conventional Molecular Dynamics simulation using a standard force field with long-range electrostatic terms. This finding is consistent with a recent observation that effective Coulomb interactions are short-ranged for systems with a sufficiently homogeneous charge distribution.
PACS numbers: 87.14.Ee, 87.15.Aa, 87.15.He

In addition to their fundamental importance for biological systems, proteins are also interesting dynamical systems from a purely physical point of view, combining liquid-like behavior at high frequencies with elastic behavior at low frequencies. The main experimental techniques for studying protein dynamics are inelastic neutron scattering [1] for short to medium time scales up to about 1 ns and nuclear magnetic resonance (NMR) [2] for nanosecond to microsecond motions. Because of the structural complexity of proteins, the interpretation of the experimental results requires numerical calculations on atomic models. Protein dynamics is therefore a field marked by close cooperation between experiment and simulation.
Numerous experimental and theoretical studies have provided a general picture of protein dynamics spanning the whole frequency range [3, 4]. As in all physical systems, low frequencies correspond to collective motions, whereas high frequencies describe localized motions. The highest frequencies in a protein, around 100 THz (3000 cm$^{-1}$), represent bond-stretching vibrations that involve a hydrogen atom. Moving towards lower frequencies, there are the bond-stretching vibrations between two heavy atoms, bond angle vibrations, motions of larger chemical groups, residue deformations and residue rigid-body motions, secondary structure deformations, and finally large-scale collective motions, for example domain motions.

The main theoretical techniques for studying protein dynamics are molecular dynamics (MD) simulations and normal mode analysis [5]. The standard protein model consists of one classical point mass for each atom with interactions described by empirical force fields that contain long-ranged electrostatic contributions and short-ranged terms describing the chemical bond structure, excluded-volume effects, and van-der-Waals interactions:

$$
U = \sum_{\text{bonds } ij} k_{ij} \left( r_{ij} - r_{ij}^{(0)} \right)^2
  + \sum_{\text{angles } ijk} k_{ijk} \left( \phi_{ijk} - \phi_{ijk}^{(0)} \right)^2
  + \sum_{\text{dihedrals } ijkl} k_{ijkl} \cos\left( n_{ijkl}\,\theta_{ijkl} - \delta_{ijkl} \right)
  + \underbrace{\sum_{\text{all pairs } ij} 4\epsilon_{ij} \left( \frac{\sigma_{ij}^{12}}{r_{ij}^{12}} - \frac{\sigma_{ij}^{6}}{r_{ij}^{6}} \right)
  + \sum_{\text{all pairs } ij} \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}}}_{\text{nonbonded}}
\tag{1}
$$
The quantities $k_{ij}$, $r_{ij}^{(0)}$, $k_{ijk}$, $\phi_{ijk}^{(0)}$, $k_{ijkl}$, $n_{ijkl}$, $\delta_{ijkl}$, $\epsilon_{ij}$, $\sigma_{ij}$, and $q_i$ are parameters obtained by fitting to experimental data or from more detailed calculations; they depend on the atoms involved. Due to the long-ranged terms, all elements of the second-derivative matrix that is diagonalized in normal mode analysis are nonzero. This is the cause of the enormous memory requirement that is the limiting factor in normal mode calculations of macromolecules.

The high-frequency part of the spectrum has been analyzed in detail by classical spectroscopy techniques on small peptide chains. It is mainly determined by the first three terms in Eq. (1), which describe the chemical bond structure. The very low frequency motions have been studied in detail as well, because they contain the highly specific domain motions which determine a protein's function [6]. Several studies have shown that they are not sensitive to the details of the force field, but can be obtained with a simple harmonic force field with a distance-dependent force constant. Such models have been found sufficient to reproduce the frequency spectrum up to ≈ 0.5 THz (15 cm$^{-1}$) [7] and to identify
the biologically relevant domain motions [8]. An even simpler Gaussian network model was able to reproduce crystallographic temperature factors [9]. However, domain motions occupy only a tiny part of the frequency spectrum; normal mode calculations of proteins of various sizes show that the number of modes describing them is roughly equal to a hundredth of the number of residues [10].

The frequency range between domain motions and intra-residue motions is much less well understood. Specifically, it is not clear which parts of the force field are essential to obtain a satisfactory description. In this letter we study a combination of the generic harmonic mid-range force field used for domain motion identification [8] and those terms from a detailed empirical force field that describe the chemical bond structure. We find that this combination works very well over the whole frequency range. The density of states obtained from this force field by normal mode analysis is shown to be as close to the experimental data as a density of states from an MD simulation using a standard detailed force field. We conclude that neither the long-ranged electrostatic terms nor the Lennard-Jones terms describing excluded-volume effects are important for protein dynamics around an equilibrium structure, although they certainly play an important role in defining this structure.

We do not consider solvent effects in this work. There is both experimental [14] and theoretical evidence [11] showing that solvation has little effect on the shape of the vibrational density of states in the frequency range under consideration, and the observed importance of solvation for MD simulations [12] is essentially due to structural stabilization. This is not an issue for normal mode calculations, which study small vibrations around a stable structure.

We construct our simplified potential by starting from the Amber 94 force field [13], which has the general form given in Eq. (1).
We first removed the nonbonded interactions (electrostatic and Lennard-Jones, marked as "nonbonded" in Eq. (1)) and replaced them by the harmonic deformation potential from Ref. [8], which is given by

$$
U_{ij}(\mathbf{r}) = k\left(\mathbf{R}_{ij}^{(0)}\right) \left( |\mathbf{r}| - \left|\mathbf{R}_{ij}^{(0)}\right| \right)^2 ,
\tag{2}
$$

where $\mathbf{R}_{ij}^{(0)}$ is the pair distance vector $\mathbf{R}_i - \mathbf{R}_j$ in the input configuration. The distance-dependent force constant is given by

$$
k(\mathbf{r}) = c \cdot \exp\left( -\frac{|\mathbf{r}|^2}{r_0^2} \right) ,
\tag{3}
$$

and the parameter $r_0$ was left at the optimal value found in Ref. [8], i.e. 0.3 nm. For the bonded interactions, we kept the force constants from the Amber 94 potential, but modified the parameters that define the position of the minima in such a way that, just as for the non-bonded harmonic terms, each individual term becomes zero for the input configuration. For the bond terms, for example, the parameter $r_{ij}^{(0)}$ in Eq. (1) was set equal to the value of $r_{ij}$ in the input
configuration; a similar procedure was applied to the other terms. This modification ensures that an essential property of the deformation force field is kept: the force field is constructed in such a way that the given input conformation is a minimum; it is thus not necessary to perform a lengthy minimization in order to calculate normal modes. The Amber 94 rules for excluded pairs and 1-4 factors were applied without change. The factor $c$ in Eq. (3) was adjusted to make the lowest non-zero normal mode frequency of a small molecule (an alanine residue with a nitrogen/methyl C terminus and an acetone N terminus) agree with that of the full Amber 94 potential; the optimal value was 300 kJ/(mol Å$^2$).

An important practical advantage of this force field is the absence of long-range terms. Its second-derivative matrix is therefore sparse, and using iterative eigenvector algorithms one can treat much larger proteins than with standard force fields such as Amber 94.

As test cases we chose the proteins crambin, lysozyme, and myoglobin, all treated in vacuum. The crystallographic structures from the Protein Data Bank were minimized up to a remaining energy gradient of $10^{-4}$ kJ/mol/nm using the Amber 94 force field without any cutoffs. The density of states obtained from normal mode analysis with the Amber 94 force field in Fig. 1 shows almost no difference between the three proteins. This was to be expected for the high-frequency range, since all proteins are made up of the same amino acids, but not necessarily for low and intermediate frequencies. The high similarity between the three proteins allows us to limit all further calculations to crambin.

The inset in Fig. 1 shows the low-frequency part of the spectrum for myoglobin in comparison with neutron scattering data ("dry myoglobin" from Fig. 5 in [14]) and MD results (from Fig. 6 in [15]) for myoglobin in vacuum, obtained with the united-atom version of the CHARMM force field [16] using a distance-dependent dielectric and a cutoff of 0.8 nm, softened by a switching function, for the long-range interactions. Considering the inclusion of anharmonic effects in the MD simulation and the use of different force fields, the two theoretical curves are remarkably similar, indicating that the normal mode approximation is not unrealistic for our study. The neutron scattering data show much larger differences, which are due to both inaccuracies in the theoretical model and technical limitations in the experiments [1].

Fig. 2 shows the density of states for the individual ingredients of our simplified force field: the chemical bond structure terms (bond, angle, and dihedral terms), and the deformation force field. The chemical bond structure terms alone reproduce the spectrum above 15 THz (450 cm$^{-1}$) very well. However, the lower end of the spectrum, which contains the collective motions, is completely wrong, showing a substantial shift of all frequencies towards zero. This effect becomes more pronounced for larger proteins, and can even produce unphysical zero frequencies for the slowest collective motions. The deformation force field, on the other hand, is completely wrong for everything except these collective motions below about 1 THz.
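The deformation term of Eqs. (2) and (3) and the resulting normal mode frequencies can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the deformation term (no bond, angle, or dihedral terms), uses a dense matrix for readability where a real calculation on a large protein would exploit sparsity, and assumes that the quoted value of c reads as 300 kJ per mol per Å squared, converted here to nm. The function names are hypothetical. At the input configuration each pair term of Eq. (2) has zero gradient, and its second-derivative block is 2k times the outer product of the unit pair vector with itself, which is what the code assembles. With masses in amu, lengths in nm, and energies in kJ/mol, the square roots of the eigenvalues are angular frequencies in rad/ps, so dividing by 2π gives THz.

```python
import numpy as np

R0_NM = 0.3            # range parameter r_0 quoted in the text (nm)
C_KJ_MOL_NM2 = 3.0e4   # assumed reading: 300 kJ/(mol Å^2) = 3e4 kJ/(mol nm^2)


def force_constant(distance, c=C_KJ_MOL_NM2, r0=R0_NM):
    """Distance-dependent force constant of Eq. (3)."""
    return c * np.exp(-(distance / r0) ** 2)


def deformation_hessian(coords, c=C_KJ_MOL_NM2, r0=R0_NM):
    """Dense second-derivative matrix of the pair terms of Eq. (2),
    evaluated at the input configuration `coords` (shape (n, 3), in nm)."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[i] - coords[j]
            dist = np.linalg.norm(d)
            unit = d / dist
            # Eq. (2) has no factor 1/2, hence the factor 2 in the curvature.
            block = 2.0 * force_constant(dist, c, r0) * np.outer(unit, unit)
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            hess[si, si] += block
            hess[sj, sj] += block
            hess[si, sj] -= block
            hess[sj, si] -= block
    return hess


def frequencies_thz(hess, masses):
    """Normal mode frequencies (THz) from a Hessian in kJ/(mol nm^2)
    and atomic masses in amu."""
    inv_sqrt_m = np.repeat(1.0 / np.sqrt(masses), 3)
    mass_weighted = hess * inv_sqrt_m[:, None] * inv_sqrt_m[None, :]
    eigvals = np.linalg.eigvalsh(mass_weighted)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return np.sqrt(eigvals) / (2.0 * np.pi)


def density_of_states(freqs, n_bins=50):
    """Histogram of frequencies, normalized so that its integral is one."""
    return np.histogram(freqs, bins=n_bins, density=True)
```

Because every pair term vanishes at the input configuration, no energy minimization is needed before calling `deformation_hessian`. For any non-degenerate three-dimensional structure, the six lowest frequencies returned by `frequencies_thz` are numerically zero and correspond to rigid-body translations and rotations.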
A comparison of the directions of the normal mode displacement vectors results in a similar observation. The starting point for a comparison of two sets of orthonormal vectors $\mathbf{v}_i$ and $\mathbf{w}_j$ is the overlap matrix $U$ whose elements are defined by $U_{ij} = \mathbf{v}_i \cdot \mathbf{w}_j$. This is an orthogonal matrix which would be an identity matrix if the two mode sets were equal. The deviation from the identity matrix can be measured by

$$
d = \frac{\left\| U - U^T \right\|^2}{8N} ,
\tag{4}
$$
where $N$ is the number of modes. The quantity $d$ is essentially the Euclidean matrix norm of the asymmetric part of $U$ and is normalized to ensure $0 \le d \le 1$. Relative to the Amber 94 normal modes we find $d = 0.35$ for our simplified force field, $d = 0.37$ for only the chemical bond structure terms, and $d = 0.58$ for only the deformation term. However, these global numbers say nothing about the similarity of specific parts of the normal mode sets. A more detailed similarity measure is the "spread" that has been defined in Ref. [8] as

$$
s_i = \sqrt{ \sum_j j^2\, U_{ij}^2 - \left( \sum_j j\, U_{ij}^2 \right)^2 } .
\tag{5}
$$
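The two similarity measures of Eqs. (4) and (5) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the mode vectors are assumed to be stored as rows of two arrays that are each orthonormal, the "Euclidean matrix norm" is taken to be the Frobenius norm, and the function names are hypothetical. The spread is computed as the standard deviation of the mode index j under the weights $U_{ij}^2$, which sum to one for each i when U is orthogonal.

```python
import numpy as np


def overlap_matrix(modes_v, modes_w):
    """Overlap matrix U_ij = v_i . w_j for two mode sets stored row-wise."""
    return modes_v @ modes_w.T


def global_deviation(overlap):
    """Global measure d of Eq. (4), using the Frobenius norm (assumption)."""
    n_modes = overlap.shape[0]
    return np.linalg.norm(overlap - overlap.T, ord="fro") ** 2 / (8.0 * n_modes)


def spread(overlap):
    """Spread s_i of Eq. (5) for every mode i of the first set."""
    index = np.arange(1, overlap.shape[1] + 1)   # mode numbers j = 1..N
    weights = overlap ** 2                        # U_ij^2
    mean_index = weights @ index                  # sum_j j U_ij^2
    mean_square = weights @ index ** 2            # sum_j j^2 U_ij^2
    variance = np.clip(mean_square - mean_index ** 2, 0.0, None)
    return np.sqrt(variance)
```

For two identical mode sets the overlap matrix is the identity, so both `global_deviation` and every element of `spread` are zero. A mode that is an equal mixture of two adjacent modes of the other set has a spread of one half.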
The spread of mode $i$ in the first set measures how many modes from the other set are necessary to reproduce it; it is zero when the vectors are equal, and its maximal value is $N/2$. Fig. 3 shows the spread of the mode sets computed with our simplified force field and its two ingredients (bond and deformation terms) compared to the Amber 94 normal modes. The overall difference is compatible with the matrix norm measure, which supports the meaningfulness of both quantities. The pure deformation force field shows a large spread except for the very first modes. The bond terms and the full simplified force field have a much smaller spread which is almost identical except for the first 40 modes (frequencies up to 1 THz), where the presence of the deformation term becomes important. Due to the form of our force field, the very low frequency collective modes and the high frequency local modes are well reproduced by construction. Therefore the spread is small in these frequency regions and shows a maximum for low-frequency non-collective modes.

Fig. 4 compares the density of states from the simplified force field to both the Amber 94 normal mode result and the density of states for myoglobin from Molecular Dynamics using the CHARMM force field [15]. The latter differs significantly in the upper frequency range due to the use of a united-atom model. In the more interesting lower frequency part, it is clear that our highly simplified force field stays within the range of deviations that exist between different electrostatic cutoff schemes [1]. Since none of the empirical force fields used in protein simulations, and particularly none of the popular cutoff schemes, was developed with the goal of an accurate description of low-frequency dynamics, there remains a large uncertainty as to what the correct description of this frequency range is. Our
simplified force field thus cannot be said to be clearly worse than the detailed empirical force fields. It is also clear from Fig. 4 that anharmonic effects, whose absence in normal mode calculations is often claimed to be a serious defect, do not have a significant influence on the frequency spectrum of small motions around a stable conformation.

We conclude that a simple harmonic model, which takes into account the chemical bond structure and a generic mid-ranged deformation energy, yields a description of protein dynamics around an equilibrium state that is as good as one obtained from a detailed molecular mechanics force field. The chemical bond structure terms alone describe the largest part of the spectrum well enough, but cannot identify the particularly important low-frequency collective motions. The combined model has the advantage of simplicity and computational efficiency. A detailed description of the long-ranged electrostatic interactions is not necessary. This observation is supported by the recent finding that the effective electrostatic interactions in systems that are electrically neutral on a sufficiently small length scale are short-ranged [17].

We thank Dr. W. Doster for providing the neutron scattering data from Ref. [14] used in Fig. 1.
References

[1] J.C. Smith, Q. Rev. Biophys. 24, 227 (1991)
[2] L.E. Kay, Nature Struct. Biol., NMR supplement, 513 (1998)
[3] J.A. McCammon and S.C. Harvey, Dynamics of proteins and nucleic acids, Cambridge University Press, Cambridge, 1987
[4] C.L. Brooks, M. Karplus and B.M. Pettitt, Adv. Chem. Phys. 71, 259 (1988)
[5] A. Kitao and N. Go, Curr. Opin. Struct. Biol. 9, 164 (1999)
[6] G.E. Schulz, Curr. Opin. Struct. Biol. 1, 883 (1991)
[7] M.M. Tirion, Phys. Rev. Lett. 77, 1905 (1996)
[8] K. Hinsen, Proteins 33, 417 (1998)
[9] I. Bahar, A.R. Atilgan and B. Erman, Folding & Design 2, 173 (1997)
[10] K. Hinsen, A. Thomas and M.J. Field, Proteins, in press
[11] E. Clementi, G. Corongiu, M. Aida, U. Niesar and G. Kneller, in: Modern Techniques in Computational Chemistry: MOTECC-90, Ed. E. Clementi, Escom, Leiden, 1990
[12] M. Levitt and R. Sharon, Proc. Nat. Acad. Sci. USA 85, 7557 (1988)
[13] W.D. Cornell, P. Cieplak, C.I. Bayly, I.R. Gould, K.M. Merz Jr, D.M. Ferguson, D.C. Spellmeyer, T. Fox, J.W. Caldwell and P.A. Kollman, J. Am. Chem. Soc. 117, 5179 (1995)
[14] M. Settles and W. Doster, in: Biological Macromolecular Dynamics, Eds. S. Cusack, H. Büttner, M. Ferrand, P. Langan and P. Timmins, Adenine Press, New York, 1997
[15] S. Furois-Corbin, J.C. Smith and G.R. Kneller, Proteins 16, 141 (1993)
[16] B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan and M. Karplus, J. Comp. Chem. 4, 187 (1983)
[17] D. Wolf, P. Keblinski, S.R. Phillpot and J. Eggebrecht, J. Chem. Phys. 110, 8254 (1999)
Figures

Figure 1

[Plot: density of states versus Frequency [THz]. Main panel curves: crambin, lysozyme, myoglobin. Inset curves: normal modes, molecular dynamics, neutron scattering.]

The density of states of crambin, lysozyme, and myoglobin from Amber 94 normal modes, showing the high similarity of the dynamics of different proteins. The inset shows the low-frequency part of the myoglobin spectrum in comparison with neutron scattering data from Ref. [14] and molecular dynamics results from Ref. [15]. There is a general agreement between the normal mode and molecular dynamics results, in spite of different force fields.
Figure 2

[Plot: density of states versus Frequency [THz]. Curves: Amber 94, simplified, deformation, chemical bond terms.]

The density of states for crambin for the full Amber 94 force field, for the chemical bond terms (bond/angle/dihedral) of Amber 94, and for the deformation force field defined in Eq. (2). The integral over all spectra is one. The chemical bond subset describes the upper part of the spectrum very well, but yields much too low frequencies for the collective motions at the lower end of the spectrum. Inversely, the deformation force field is sufficient only for a tiny fraction of the spectrum at its lower end. The simplified force field yields a satisfactory description over the whole frequency range.
Figure 3

[Plot: Spread [mode numbers] versus Mode number. Curves: simplified (a), chemical bond structure (b), deformation (c). Inset: magnification of the first modes.]

The spread (defined in Eq. (5) as a similarity measure between normal mode directions) of the modes obtained with the simplified force field and its two contributions, the chemical bond structure terms and the deformation term, with respect to the full Amber 94 modes. The inset shows a magnification of the low-frequency part. The scale of the main plot is chosen such that the full height corresponds to the maximum value that the spread can have. As for the density of states shown in Fig. 2, the chemical bond subset is a rather good approximation except for the low-frequency range, whereas the deformation term is good only in this range. The combination is virtually indistinguishable from the chemical bond subset, except for the low-frequency part, where the deformation term becomes important.
Figure 4

[Plot: density of states versus Frequency [THz]. Legend entries as printed: Amber 94, intermediate 1, molecular dynamics.]

The density of states for crambin from the simplified force field compared to the density of states from a normal mode calculation with the Amber 94 force field and the density of states for myoglobin obtained by MD using the CHARMM force field. The latter is shown only for frequencies below 30 THz, because at higher frequencies an agreement cannot be expected due to the use of a united-atom model. For low frequencies, the simplified force field stays within the limits of variations due to different parameter sets and cutoff schemes.