Quantifying the Entropy of Binding for Water Molecules in ... - Core

8 downloads 0 Views 264KB Size Report
water molecule to a protein cavity will not typically be greater than 7.0 cal/mol/K per water molecule, corresponding to a .... as well as the protonation state of all ionizable residues. .... number of solvent molecules (n) modeled as the pure bulk solvent ...... water molecules in the bacteriorhodopsin proton channel: a molecular.
928

Biophysical Journal

Volume 108

February 2015

928–936

Article Quantifying the Entropy of Binding for Water Molecules in Protein Cavities by Computing Correlations David J. Huggins1,* 1

Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK

ABSTRACT Protein structural analysis demonstrates that water molecules are commonly found in the internal cavities of proteins. Analysis of experimental data on the entropies of inorganic crystals suggests that the entropic cost of transferring such a water molecule to a protein cavity will not typically be greater than 7.0 cal/mol/K per water molecule, corresponding to a contribution of approximately þ2.0 kcal/mol to the free energy. In this study, we employ the statistical mechanical method of inhomogeneous fluid solvation theory to quantify the enthalpic and entropic contributions of individual water molecules in 19 protein cavities across five different proteins. We utilize information theory to develop a rigorous estimate of the total two-particle entropy, yielding a complete framework to calculate hydration free energies. We show that predictions from inhomogeneous fluid solvation theory are in excellent agreement with predictions from free energy perturbation (FEP) and that these predictions are consistent with experimental estimates. However, the results suggest that water molecules in protein cavities containing charged residues may be subject to entropy changes that contribute more than þ2.0 kcal/mol to the free energy. In all cases, these unfavorable entropy changes are predicted to be dominated by highly favorable enthalpy changes. These findings are relevant to the study of bridging water molecules at protein-protein interfaces as well as in complexes with cognate ligands and small-molecule inhibitors.

INTRODUCTION Experimental techniques such as x-ray crystallography commonly identify water molecules in the internal cavities of proteins (1,2). A number of analyses on databases of protein structures have shown that this is a common observation and that protein cavities commonly accommodate one to three water molecules (3,4). Such water molecules tend to be stabilized by hydrogen bonding interactions and are thus expected to exhibit strong translational and orientational ordering (5). Numerous computational studies have been performed to quantify the binding free energy of such water molecules using free energy methods (6–9). In this context, the binding free energy is the free energy of transfer of a water molecule from the bulk to the cavity. As expected, the binding free energies for crystallographically observed water molecules in protein cavities are predicted to be favorable. However, the degree of ordering relative to bulk water may mean that this favorable free energy is comprised of a favorable enthalpy component and an unfavorable entropy component (5). Analysis from Dunitz calculated the entropy contribution of individual water molecules to the entropy of inorganic crystals (10). This analysis suggests that an entropic cost of transferring an individual water molecule to a protein cavity will not typically be greater than 7.0 cal/mol/K. This corresponds to a contribution of

Submitted August 25, 2014, and accepted for publication December 17, 2014. *Correspondence: [email protected] This is an open access article under the CC BY license (http:// creativecommons.org/licenses/by/4.0/). Editor: Nathan Baker Ó 2015 by The Authors 0006-3495/15/02/0928/9 $2.00

approximately þ2.0 kcal/mol to the free energy. From this data, the conclusion was that water molecules in proteins are unlikely to make an entropic contribution of greater than þ2.0 kcal/mol to the free energy. In this study, we employ the statistical mechanical method of inhomogeneous fluid solvation theory (IFST) (5,11–14) to quantify the enthalpic and entropic contributions of individual water molecules in protein cavities. IFST has previously been used to rationalize kinase selectivity (15), identify ligand-binding hotspots at protein surfaces (16), and understand the hydrophobic effect (17). It has proven a very useful tool in modeling networks of water molecules in biological systems. IFST has also been used to study water molecules in cavities of IL-1b (18), and in this article we extend such a study to five different proteins and nineteen protein cavities. In addition, we combine the k-nearest neighbors (KNN) (19,20) algorithm with information theory (21–24) to study doubly occupied cavities and explicitly calculate the contributions of changes in water-water correlations and the associated entropy changes. To our knowledge, this approach represents the first estimate of this quantity using mutual information and provides a complete framework to calculate two-particle entropies, as envisioned in the original development of IFST (11,12). MATERIALS AND METHODS Five test systems were chosen for this study: IL-1b (2), T4 Lysozyme (1), FKBP-2, Carbonic Anhydrase (CA-II) (25), and b-Lactamase (26) (see Table 1). From these five structures, 19 internal cavities were identified. Fifteen of these cavities were singly occupied and four were doubly occupied, and thus the analysis considers 23 water molecules in total. Details of

http://dx.doi.org/10.1016/j.bpj.2014.12.035

Cavity Water Entropy

929

TABLE 1 The five proteins considered in this study along with crystallographic data, the number of singly and doubly occupied internal cavities, the initial charge of the protein before adding counterions, and the residues altered during the preparation stage Protein PDB ID Protein chain ˚) Resolution (A Single cavities Double cavities Initial charge Residues altered

IL-1b

T4 Lysozyme

FKBP-2

CA-II

b-Lactamase

2NVH A 1.53 2 2 0 Q14 Flip H30 Protonate Q48 Flip Q116 Flip N119 Flip Q126 Flip N129 Flip Q149 Flip

3DKE X 1.25 2 1 þ9 H31 Protonate N68 Flip Q69 Flip Q123 Flip N140 Flip N144 Flip N163 Flip

2PBC A 1.8 1 1 0 H55 ε-Hydrogen Q 72 Flip

3GZ0 A 1.26 5 0 0 H4 Protonated H10 ε-Hydrogen H15 ε-Hydrogen H17 ε-Hydrogen H36 ε-Hydrogen H64 ε-Hydrogen H119 ε-Hydrogen N178 Flip N253 Flip

2P74 A 0.88 5 0 þ1 H112 Flip Q31 Flip Q141 Flip Q206 Flip Q254 Flip

the PDB crystal structures and the residue numbers for these 23 water molecules are listed in Table 1.

System setup Protein structures were downloaded from the Protein Databank (27). Selenomethionines were changed to methionines and missing sidechains were added using Schrodinger’s Preparation Wizard (28), which was also used to check the orientations of the asparagine, glutamine, and histidine residues, as well as the protonation state of all ionizable residues. All heteroatomic species such as buffer solvents and ions were removed, with the exception of the zinc ion in the case of Carbonic Anhydrase. The changes made to each structure to improve the hydrogen bonding patterns are detailed in Table 1. The hydrogen-atom positions were then built using the HBUILD facility of CHARMM (29) with the CHARMM27 energy function (30,31), and the force field parameters and partial charges were assigned from the CHARMM27 force field (30,31). Only the crystallographic water molecules identified in internal cavities were retained, with all others being deleted. Water molecules were modeled with the TIP4P-2005 water model (32). To ensure an overall charge of zero in the system, nine and one additional chloride ions were added in the cases of T4 Lysozyme and Beta-Lactamase ˚ from the protein. respectively. All ions were placed farther than 15 A

Molecular dynamics simulations Equilibration was performed for 1.0 ns in an NVT ensemble at 300 K using Langevin temperature control (33). All systems were brought to equilibrium before continuing, by verifying that the energy fluctuations were stable. MD simulations were performed using an MD time step of 2.0 fs. Electrostatic interactions were modeled with a uniform dielectric and a dielectric constant of 1.0 throughout the equilibration and production ˚ with switching runs. Van der Waals interactions were truncated at 11.0 A ˚ . Electrostatics were modeled using the particle mesh Ewald from 9.0 A method (34), and the systems were treated using rhombic dodecahedral periodic boundary conditions. All non-water atoms were fixed for the entirety of the equilibration and production simulations. MD simulations were performed using NAMD version 2.9 (35). To explore the effect of protein conformation on the IFST results, we generated 10 conformations of T4 Lysozyme by running a 10.0 ns simulation with heavy-atom restraints of ˚ 2 and storing the coordinates every 1.0 ns. Each unique pro1.0 kcal/mol/A tein conformation was then used for a separate 10.0 ns IFST calculation with a fixed protein structure. For comparison, the 10.0 ns simulation with heavy-atom restraints was also analyzed.

Free energy perturbation calculations Free energy perturbation (FEP) calculations were performed for each internal cavity, to calculate the binding energy of the buried water molecules (DGbind). The total free energy change for an FEP calculation (DGFEP) was calculated as the sum of free energy changes for a series of N small steps between intermediate states a and b (36). The change in free energy was calculated for each small step (DGa/b) using the partition functions (Q) for the two states, which are calculated from the Hamiltonians (H):

DGFEP ¼

XN a ¼ 1; b ¼ aþ1

DGa/b

DGa/b ¼ Gb  Ga   Qb ¼ kT ln : Qa   ¼ kT ln hexpð  ðHb  Ha Þ=kTÞia

(1)

(2)

The results for the forward and backward FEP simulations were combined using the Bennett Acceptance Ratio (BAR) method (37,38). BAR was implemented using the ParseFEP Plugin from Visual Molecular Dynamics and the statistical error was estimated in each case (39). The estimated statistical error in the FEP free energy predictions using BAR was less than 0.1 kcal/mol in all cases. We used 32 equally spaced l windows for the forward FEP simulations and 32 equally spaced l windows for the backward FEP simulations. A soft-core potential was employed with a van der Waals radius-shifting coefficient of 5.0 (40,41), electrostatic interactions were scaled down to zero between l ¼ 0.0 and l ¼ 0.5, and van der Waals interactions were scaled down to zero between l ¼ 0.5 and l ¼ 1.0 (38). Equilibration was performed for 250 ps for each lambda window, and production simulation was performed for 750 ps for each lambda window. An NVT ensemble was used throughout and all nonwater atoms were fixed for the entirety of the FEP simulations. The free energy cycle for calculating DGbind can be seen in Fig. 1. The first step transfers the water molecule from 55 M to a fixed point in solution (DGliberation), and the second step annihilates the water molecule from bulk (DGinsertion). DGinsertion was calculated to be 6.95 kcal/mol for a fixed TIP4P-2005 water molecule using FEP at 1 atm and 300 K in an NPT ensemble. The third step transfers the water molecule from the fixed location back to 55 M in vacuum (DGliberation), and thus the two liberation terms in the cycle cancel one another. The fourth step is to harmonically restrain the oxygen atom of the water molecule to aid convergence of the free energy calculations Biophysical Journal 108(4) 928–936

930

Huggins the context of protein systems, water molecules tend to cluster at distinct locations termed hydration sites, and it is natural to compute the contribution of individual water molecules to the hydration free energy (14,18,47,48). For the IFST calculations, 100.0 ns of production simulation in an NVT ensemble were performed at 300 K for each system. System snapshots were saved every 5.0 ps, yielding 20,000 snapshots in total for each system. We calculated a mean and standard deviation for each of the IFST quantities by considering 10 blocks from the 100 ns simulation, each derived from 2000 randomly selected snapshots. DGIFST is calculated from the contributions of the hydration energy (DEIFST) and hydration entropy (DSIFST) to the hydration free energy:

DGIFST ¼ DEIFST  TDSIFST :

FIGURE 1 The steps in the free energy cycle used to calculate DGbind for a water molecule. The term aq represents a water molecule in aqueous conditions, the term vac represents a water molecule in vacuum, and the term cav represents a water molecule in a protein cavity. 55 M represents a water molecule at the standard bulk concentration of 55 mol/dm3, fixed represents a water molecule that is at a fixed point, and harm represents a water molecule that is harmonically restrained. To see this figure in color, go online. (7,9,42,43). This harmonic restraint leads to an analytic free energy penalty (DGrestrain) given by Eq. 3:

DGrestrain ¼ RT ln½C0 Vharm  Vharm ¼

2pRT kharm

32 =



(3)

;

(4)

where C0 is the concentration of a water molecule in bulk (55 M), Vharm is the volume available to the harmonically restrained water molecule, and kharm is ˚ 2 in the harmonic force constant. The value of kharm was set to 0.5 kcal/mol/A all cases, and thus DGrestrain was calculated to be þ0.23 kcal/mol. The fifth step is exnihilation of the water molecule in the cavity (DGexnihilation) using FEP. The final step is to remove the harmonic restraint (DGunrestrain) and this contribution is assumed to be zero, as in previous work, and is justified because the dynamics of the water molecule in the cavity are not affected by a force constant that is small in relation to the atomic fluctuations (9,18). For cavities containing two water molecules, the exnihilation is performed on both simultaneously and interactions between the exnihilated water molecules are also scaled to decouple them (18). Considering the steps described above, DGbind can be calculated using Eq. 5:

DGbind ¼ DGexnihilation  DGinsertion þ DGrestrain :

(5)

The symmetry contribution to the binding free energy (0.41 kcal/mol in the case of a water molecule) is only appropriate if there is a difference between the sampling of the symmetry-related states in the bound and unbound states (44). The unbound water molecules are treated as fixed and cannot sample the two symmetry-related states. In this case, the bound water molecules were not observed to sample the two symmetry-related states, presumably because of a large kinetic barrier. Thus, there is no symmetry contribution.

IFST calculations IFST calculates the hydration free energy of a solute (DGIFST) by computing the difference in free energy between a solution and the same number of solvent molecules (n) modeled as the pure bulk solvent (11,12). DGIFST can be calculated for small subvolumes of the system, allowing the contribution of specific regions to be estimated (13,45,46). In Biophysical Journal 108(4) 928–936

(6)

DEIFST is calculated from the mean solute-water interaction energy (Esw), the mean water-water interaction energy (Eww), and the mean interaction energy of a bulk water molecule (Ebulk):

DEIFST ¼ Esw þ Eww  nEbulk : ¼ Esw þ DEww

(7)

Ebulk and Eww are defined as half the interaction energy of a water molecule with all other water molecules in the system. For the TIP4P-2005 water model, Ebulk is calculated to be 11.5748 kcal/mol. The total hydration entropy is calculated as the sum of local (Slocal) and nonlocal contributions (Snonlocal). Slocal is the summed contribution of each local subvolume and Snonlocal is the sum of the volume entropy (Sve) and the change in liberation entropy (DSlib) (12). Slocal is calculated from the two-particle correlation function (gsw) that is a function of the position (r) and orientation (u) of the water molecule. The number density of bulk TIP4P-2005 water (r) is ˚ 3: calculated to be 0.03324 molecules/A

Z

DSlocal ¼ Rr

½gsw ln gsw  gsw þ 1dV þ DSww Z R ¼ Rr gsw ln gsw dV  Rr ½1 gsw dV þDSww Z ¼ DSIFST  Rr ½1  gsw dV (8)

DSnonlocal ¼ Sve þ DSlib ¼ Rð1  rVs Þ þ RðaT  rkkTÞ ¼ R þ RrVs þ RaT  RrkkT   R ¼ RþRr kkT þ ½1gsw dV þRaT  RrkkT R ¼ RðaT  1Þ þ Rr ½1  gsw dV (9) DShydration ¼ Slocal þ Snonlocal ¼ DSIFST þ RðaT  1Þ

;

(10)

zDSIFST  R where Vs is the partial molar volume of the solute, a is the thermal expansion coefficient of the solvent, and k is the isothermal compressibility of the solvent. Equation 9 uses the Kirkwood-Buff relationship for Vs (49). The two remaining integrals in Eqs. 8 and 9 can be understood as a local term corresponding to a reduction in the volume accessible to the solvent (Vs) and a non-local term corresponding to an increase in the volume accessible to the solvent (Vs) as the system expands under constant pressure. These two terms are equal and cancel one another. DSIFST is calculated from the two-particle correlations using the solute-water entropy (Ssw)

Cavity Water Entropy

931

and the difference in water-water entropy (DSww); higher-order correlations are not considered:

DSIFST ¼ Ssw þ Sww  nSbulk ¼ Ssw þ DSww

:

(11)

For the TIP4P-2005 water model, Sbulk is calculated from the values for Ebulk and DGinsertion to be -15.5097 cal/K/mol. In recent work, we developed a novel KNN approach to calculate Ssw using the translational and orientational distance metrics (50). The translational distance (dtrans) between two water molecules is simply the Euclidean norm between the Cartesian coordinates of the two water molecules. The orientational distance (dorient) between two water molecules is the distance between the rotations required to bring the two orientations to the same reference orientation. The correct distance metric for the rotation group is twice the geodesic distance on the unit sphere (51). The KNN algorithm provides an unbiased estimate of the absolute entropy (Habs) from the general expression in Eq. 12:

Habs

" # p p=2 ndi;k p 1Xn ¼ ln  Lk1 þ g n i ¼ 1 Gðp=2 þ 1Þ Xj

1 ; i¼1 i

(12)

molecule k in frame l are denoted by qij and qkl, respectively. In this work, we extend the nearest neighbor approach to calculate Sww from the twoparticle correlation functions (gsw, gsw’, and gww’) and the triplet correlation function (gsww’), which is a function of the 12 variables representing the positions and orientations of a pair of water molecules (p ¼ 12):

1 Sww ¼  Rr2 2

ZZ

gsw gsw0 ½gww0 ln gww0  gww0 þ 1dwdw0

ZZ 1 2 gsw gsw0 gww0 ln gww0 dwdw0 ¼  Rr 2 ZZ 1 2 gsw gsw0 ½1  gww dwdw0  Rr 2   ZZ 1 gsww0 gsww0 ln ¼  Rr2 dwdw0 2 gsw gsw0 ZZ 1 gsw gsw0 ½1  gww dwdw0  Rr2 2 X ¼  ½Iww þ Svol 

:

pairs

(13)

(18)

where n is the number of samples, di,k is the distance between sample point i and its kth nearest neighbor, p is the number of degrees of freedom, G is the gamma function, L0 is 0, and g is Euler’s constant. In the context of an individual water molecule, the relative solute-water thermodynamic entropy (Ssw) is calculated using the total distance (dtotal) between two water molecules in six dimensions (p ¼ 6) (50). It is calculated from the difference between the absolute entropy of the distribution (Habs) and the absolute entropy of a uniform distribution (Huni). We disregard the symmetry number (two in this case) as it is present in both Habs and Huni. We use the first nearest neighbor in this work (k ¼ 1) and the number of samples is equal to the number of frames where a water molecule is present in the hydration site:

Iww is best understood as a mutual information (MI) term as it represents the additional correlation between two water molecules that is not captured by the solute-water entropy term. The Svol term accounts for the exclusion of solvent from the volume occupied by other solvent molecules and is related to the Kirkwood-Buff integral (11,49). For a single water molecule, Sww can be calculated by considering the pair correlation with all other water molecules in the system. In this work, we restrict the sum to pairs of water ˚ . One can easily consider Iww as a combination of molecules within 4.0 A two- and three-particle entropy terms:

Lj ¼

Ssw ¼ RfHabs  Huni g  6 3  2   X 1 n ndtotal p 8p ln þ g  ln ¼ R  L 0 n i¼1 Gð4Þ r  6 3  2   X 1 n ndtotal p 1Xn 8p ln ln ¼ R þ g  n i¼1 n i¼1 6 r  6   X 1 n ndtotal pr ln þg ¼ R n i¼1 48 (14) dtotal

dtrans

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ d 2trans þ d2orient ;

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  2  2  2 xij  xkl þ yij  ykl þ zij  zkl ; ¼   dorient ¼ 2  acos qij :qkl ;

(15) (16) (17)

where R is the gas constant and converts the entropy to thermodynamic units and G(4) is equal to 6. The Cartesian coordinates of water molecule j in frame i and its nearest neighbor water molecule k in frame l are denoted by xij, yij, zij and xkl, ykl, zkl, respectively. The quaternion representations of the rotations for water molecule j in frame i and its nearest neighbor water

Iww

1 ¼ Rr2 2 ¼

1 2 Rr 2

ZZ

gsww0 ZZ

  gsww0 ln dwdw0 gsw gsw0

½gsww0 ln gsww0

;

(19)

0

 gsww0 ln gsw  gsww0 ln gsw0 dwdw ¼ Ssww0  Ssw  Ssw0

where S*sw and S*sw’ represent the two-particle entropies computed from the pair data and can be calculated using Eq. 14. The three-particle entropy (Ssww’) can be calculated from the total distance (dpair) between two pairs of water molecules. In this case, both hydration sites are occupied in every snapshot for each doubly occupied cavity and thus all n frames contain pair data:

Ssww0 ¼ RfHabs  Huni g ( " #  ) 12 ndpair p6 1Xn 64p4 ¼ R ln  L0 þ g  ln n i¼1 Gð7Þ r2 ( " #  ) 12 ndpair p6 1Xn 1X n 64p4 ¼ R ln ln þg n i¼1 n i¼1 720 r2 ( " # ) 12 ndpair p2 r 2 1Xn ¼ R ln þg n i¼1 46080 (20) Biophysical Journal 108(4) 928–936

932

Huggins

dpair

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ d2total þ d2total0 :

(21)

G(7) is equal to 720. In practice, problems arise from combining KNN terms of different dimensionality in Eq. 19 (52). Thus, we use the method of permuted fill modes as described by Hensen et al. (53) The permuted set of distances (dperm) captures the correlation of the individual water molecules with the solute but decouples the correlation between the water molecules by computing the entropy of the artificially decorrelated data:

Iww ¼ Ssww0  Sperm sww0 # ) ( " 12 r2 p2 ndpair 1X n þg ¼ R ln n i¼1 46080 ( # ) " 12 r2 p2 ndperm : 1X n R þg ln n i¼1 46080 " # 12 dpair RXn ¼ ln 12 n i ¼ 1 dperm

(22)

Whereas the individual entropy terms, such as the one defined in Eq. 20, obey a power law convergence (as previously observed), the MI estimate in Eq. 22 does not appear to obey a power law convergence. It is also interesting to note that this MI estimate is not biased, as the bias terms from the two entropy estimates cancel one another. Within a singly occupied cavity, Iww and Svol are negligible because there are no significant water-water pair correlations. In a cavity with two or more water molecules, only pairs of subvolumes in which gsw and gsw’ are nonzero will make a contribution to Svol. These regions are very small and Svol is thus expected to be negli-

TABLE 2 reported

gible. For this reason, we assume that Svol is zero in all cases. As a comparison, the magnitude of Svol in bulk water can be calculated from the radial distribution function and has a value of 0.99 cal/mol/K for the TIP4P2005 water model, making a contribution of þ0.29 kcal/mol to the excess free energy.

RESULTS AND DISCUSSION We begin by considering the IFST estimates of enthalpy, entropy, and free energy contributions of each water molecule to the protein hydration free energy. The calculations for the 23 sites are presented in Table 2. The standard deviations are all below 0.1 kcal/mol and generally much smaller. The predicted free energies are in the range from 2.6 to 16.0 kcal/mol, and this is in excellent agreement with previous IFST studies that showed a range from 1.9 to 17.2 kcal/mol (5,14). The predicted enthalpies are larger in magnitude than the entropies and make the dominant contributions to the free energies for all 23 water molecules. This is also in agreement with previous studies. The mean contribution of þ1.82 kcal/mol is in good agreement with the contribution of þ2.0 kcal/mol estimated by Dunitz, and the maximum contribution of þ2.67 kcal/mol is not far above this value. As expected, the most favorable DEIFST and unfavorable TDSIFST are found for protein cavities containing charged amino acid sidechains. Table 2 also shows that the Iww term is small in magnitude for each pair of water molecules in the four

The results of the IFST calculations for the 23 hydration sites, with the mean and standard deviation from 10 blocks

System IL-1b IL-1b IL-1b IL-1b IL-1b IL-1b T4 Lysozyme T4 Lysozyme T4 Lysozyme T4 Lysozyme FKBP-2 FKBP-2 FKBP-2 CA-II CA-II CA-II CA-II CA-II b-Lactamase b-Lactamase b-Lactamase b-Lactamase b-Lactamase Minimum Maximum Mean

PDB water ID

Esw (kcal/mol)

Eww (kcal/mol)

DEIFST (kcal/mol)

202 204 203 207 200 209 902 905 904 920 207 208 203 2004 2015 2031 2042 2055 2023 2048 2073 2105 2327

15.93 5 0.03 16.73 5 0.02 14.56 5 0.05 13.75 5 0.02 21.21 5 0.02 19.04 5 0.03 22.13 5 0.02 18.67 5 0.02 16.23 5 0.03 21.10 5 0.02 17.96 5 0.01 19.54 5 0.03 26.49 5 0.02 20.77 5 0.03 27.52 5 0.03 22.40 5 0.02 24.11 5 0.02 17.80 5 0.02 16.08 5 0.03 27.94 5 0.01 28.60 5 0.03 24.70 5 0.02 30.27 5 0.02

2.69 5 0.01 2.72 5 0.01 1.97 5 0.02 1.99 5 0.02

7.05 5 0.03 7.88 5 0.02 4.96 5 0.04 4.16 5 0.03 9.48 5 0.02 7.33 5 0.03 13.50 5 0.02 10.06 5 0.02 4.65 5 0.03 9.54 5 0.02 8.76 5 0.01 10.40 5 0.02 14.91 5 0.02 9.47 5 0.03 15.92 5 0.03 11.09 5 0.02 12.56 5 0.02 6.20 5 0.02 3.95 5 0.03 16.33 5 0.01 16.47 5 0.03 13.08 5 0.02 18.70 5 0.02 18.70 3.95 10.28

Biophysical Journal 108(4) 928–936

2.95 5 0.01 2.97 5 0.01 2.37 5 0.01 2.44 5 0.01

TSsw (kcal/mol) 6.35 5 6.41 5 6.63 5 5.77 5 6.53 5 5.08 5 6.72 5 6.45 5 6.27 5 6.40 5 6.01 5 6.09 5 7.03 5 6.47 5 6.85 5 6.28 5 6.59 5 6.66 5 6.02 5 7.09 5 6.52 5 5.93 5 7.29 5

0.04 0.03 0.05 0.03 0.04 0.04 0.03 0.03 0.03 0.03 0.03 0.05 0.04 0.03 0.04 0.02 0.03 0.03 0.04 0.03 0.03 0.05 0.02

TSww (kcal/mol)

TDSIFST (kcal/mol)

DGIFST (kcal/mol)

0.07 5 0.02 0.07 5 0.02 0.10 5 0.03 0.10 5 0.03

1.80 5 0.05 1.86 5 0.04 2.11 5 0.06 1.25 5 0.03 1.91 5 0.04 0.46 5 0.04 2.16 5 0.03 1.88 5 0.03 1.65 5 0.03 1.78 5 0.03 1.48 5 0.03 1.56 5 0.06 2.41 5 0.04 1.85 5 0.03 2.23 5 0.04 1.66 5 0.02 1.97 5 0.03 2.04 5 0.03 1.40 5 0.04 2.47 5 0.03 1.90 5 0.03 1.31 5 0.05 2.67 5 0.02 0.46 2.67 1.82

5.25 5 0.04 6.02 5 0.03 2.85 5 0.04 2.91 5 0.03 7.57 5 0.03 6.87 5 0.02 11.34 5 0.04 8.18 5 0.03 3.00 5 0.01 7.76 5 0.01 7.29 5 0.02 8.85 5 0.05 12.50 5 0.02 7.62 5 0.02 13.70 5 0.01 9.44 5 0.02 10.58 5 0.02 4.17 5 0.03 2.55 5 0.02 13.87 5 0.03 14.57 5 0.02 11.77 5 0.04 16.03 5 0.01 16.03 2.55 8.46

0.05 5 0.02 0.05 5 0.02 0.08 5 0.01 0.08 5 0.01

Cavity Water Entropy

933

doubly occupied cavities. The IFST and FEP results are reported in Table 3. The agreement between IFST and FEP is extremely good, with an R2 coefficient of determination of 0.995 and a mean unsigned difference (MUD) of 0.45 kcal/mol. The accuracy of the estimates for Iww are supported by the close agreement of DGIFST and DGbind. In addition to analyzing the protein conformation from the crystal structure, we considered the effect of the protein conformation on the predictions of IFST in the case of T4 Lysozyme. This was achieved by analyzing 10 different protein conformations generated from a simulation with harmonically restrained heavy atoms. The results are presented in Table 4. Comparing the results from the fixed crystal structure with the results from the harmonically restrained structure, the TDSIFST term is reduced from an average of 1.87 kcal/mol to an average of 0.90 kcal/mol, respectively. This is in agreement with previous work showing that the TDSIFST term is smaller in magnitude when using data from a harmonically restrained protein simulation and is the expected result, because of a blurring of the probability densities when the protein is mobile (18). The DEIFST and DGIFST terms also differ to a small extent, with the most notable difference being that the binding free energy is more favorable by 2.15 kcal/mol in the case of water 904. Comparing the IFST results from the fixed crystal structure with the results from the ensemble of 10 structures, the differences are lower than 1.5 kcal/mol in all cases, and the qualitative picture remains the same. However, it is worth noting that the standard deviations from the 10 simulations are relatively large, with a maximum of 1.40 kcal/mol for DEIFST in the case of water 904. TABLE 3

CONCLUSIONS In this work, we have predicted the free energy of transferring water molecules from the bulk into a buried protein cavity using the methods of IFST and FEP. This can be viewed either as the binding free energy of the water molecule (DGbind) or the contribution of the water molecule to the hydration free energy of the protein (DGIFST). The free energy contributions are all strongly favorable and dominated by a favorable enthalpy component. The entropy contributions to the free energy are all unfavorable, but relatively small in magnitude. The water molecules with the strongest binding affinities tend to be in hydrophilic cavities making one (b-Lactamase W2073) or two (b-Lactamase W2327) hydrogen bonds with charged amino acids. Conversely, the water molecules with the weakest binding affinities tend to be in hydrophobic cavities (T4 Lysozyme W904) with very little potential for hydrogen bonding, as expected. The agreement between IFST and FEP for the 19 protein cavities in the five test systems is extremely good, with an R2 coefficient of determination of 0.995 and a MUD of 0.45 kcal/mol. To our knowledge, this study also represents the first calculation of the total two-particle entropy term (DSIFST) using information theory. The excellent agreement between IFST and FEP indicates that the mutual information terms between the pairs of water molecules (Iww) have been calculated correctly and that this term is negligible for water molecules in protein cavities. This is an important result because it demonstrates that the majority of the correlation is captured by the solute-water entropy (Ssw) term. This means that the total change in water-water entropy (DSww) is approximately equal to Sbulk (rather than zero) and makes a significantly favorable contribution of 4.62 kcal/mol to the binding free energy for the

The results of the FEP and IFST calculations for the 19 cavities

System IL-1b IL-1b IL-1b IL-1b T4 Lysozyme T4 Lysozyme T4 Lysozyme FKBP-2 FKBP-2 CA-II CA-II CA-II CA-II CA-II b-Lactamase b-Lactamase b-Lactamase b-Lactamase b-Lactamase Mean

PDB water IDs

DGbind (kcal/mol)

DGIFST (kcal/mol)

Signed difference (kcal/mol)

Unsigned difference (kcal/mol)

202 and 204 203 and 207 200 209 902 and 905 904 920 207 and 208 203 2004 2015 2031 2042 2055 2023 2048 2073 2105 2327

11.77 6.18 7.09 6.90 20.41 3.33 8.29 16.70 13.07 8.21 14.05 10.06 11.09 4.29 2.31 14.30 13.98 12.06 16.53

11.27 5.76 7.57 6.87 19.52 3.00 7.76 16.13 12.50 7.62 13.70 9.44 10.58 4.17 2.55 13.87 14.57 11.77 16.03

0.49 0.42 0.49 0.03 0.89 0.33 0.53 0.57 0.57 0.59 0.36 0.62 0.51 0.12 0.24 0.44 0.59 0.29 0.51 0.31

0.49 0.42 0.49 0.03 0.89 0.33 0.53 0.57 0.57 0.59 0.36 0.62 0.51 0.12 0.24 0.44 0.59 0.29 0.51 0.45

Biophysical Journal 108(4) 928–936

934

Huggins

TABLE 4 The results of the IFST calculations for 10 conformations of T4 Lysozyme, alongside the results for the first 10 ns of the fixed crystal structure simulation, and the results for the 10 ns harmonically restrained simulation Fixed crystal structure PDB water ID 902 905 904 920

Ten simulation structures

Harmonically restrained

DEIFST (kcal/mol)

TDSIFST (kcal/mol)

DGIFST (kcal/mol)

DEIFST (kcal/mol)

TDSIFST (kcal/mol)

DGIFST (kcal/mol)

DEIFST (kcal/mol)

TDSIFST (kcal/mol)

DGIFST (kcal/mol)

13.50 10.06 4.65 9.54

2.16 1.90 1.66 1.78

11.34 8.16 2.99 7.76

13.08 5 0.83 9.11 5 0.39 5.62 5 1.40 8.31 5 0.96

2.32 5 0.32 2.12 5 0.17 1.21 5 0.25 1.53 5 0.29

10.76 5 0.84 6.99 5 0.45 4.41 5 1.26 6.79 5 0.84

13.30 8.80 5.35 8.52

1.30 1.38 0.21 0.70

12.00 7.41 5.14 7.82

For the 10 conformations, the means and standard deviations are reported.

TIP4P-2005 water model. This result is expected to hold for highly ordered water molecules in protein active sites. It is important to note that small values of Iww in no way suggest an absence of correlation between the pair of water molecules in a doubly-occupied protein cavity. Solute-water and water-water correlations are explicitly wrapped up in the total two-particle entropy change (DSIFST), and the Iww term serves to capture additional correlations not quantified by the Ssw term. One could equally well proceed by considering the water-water correlations first, but the choice of a solute reference frame significantly simplifies the calculations. In this study, we have considered cavities containing up to two water molecules. Studies on cavities with three or more water molecules would allow the accuracy of the two-particle approximation to be assessed. This is an important issue that should be addressed in future work. At the same time, it will also be important to estimate the Svol term explicitly, as it may be significant and vary between buried and surface-exposed hydration sites. The average prediction of TDSIFST is þ1.82 kcal/mol and this is consistent with the work of Dunitz, who noted that water molecules in inorganic salts differ in entropy from bulk solvent by an amount corresponding to a free energy difference of approximately þ2.0 kcal/mol. The largest value of TDSIFST in this study is found for a water molecule near two charged residues and is þ2.67 kcal/mol. In comparison, the greatest difference in Dunitz’s analysis is þ3.5 kcal/mol in the case of zinc sulfate monohydrate (10). It is important to note that the majority of entropy estimates in this work are for a fixed protein structure. The use of a fixed protein allows direct comparison of the IFST and FEP predictions and thus validation of the IFST calculations. It is clear that using a harmonically restrained or fully mobile protein structure within the current framework of IFST will lead to misestimation of the entropies because of a blurring of the probability densities. However, the results from calculations on 10 different fixed conformations of T4 Lysozyme suggest that IFST results should be robust in cases where the solute does not show a significant deviation in size, shape, or electrostatics. This is true for the cavities within these compact protein structures and should extend to many cases of interest in biology. Despite this, it is clear that further work is needed to develop probabilityBiophysical Journal 108(4) 928–936

based statistical mechanical methods such as IFST to make accurate predictions. The timescales for the IFST and FEP simulations are similar, though the timescales for FEP depend on the number of lambda windows. This makes FEP a technique that requires performing benchmark simulations for each particular case or user input in configuring the calculations. Conversely, the convergence of IFST calculations can be automatically monitored throughout a simulation until the required accuracy is reached. Importantly, a single IFST simulation is informative about every hydration site in a system whereas FEP requires a separate simulation for each water molecule. In addition, IFST can be used to study individual water molecules within a network whereas FEP is best suited to studying single water molecules or groups of water molecules as a whole. This is because annihilating a water molecule from a network requires artificially creating an unphysical vacuum in a hydration site. For this reason, IFST is unique in yielding spatially resolved prediction of water thermodynamics and this is extremely useful in identifying ligand-binding hotspots at protein surfaces (48) and guiding the design of high-affinity small-molecule inhibitors (47). The IFST calculation of the hydration entropy is defined in the context of a fixed solute. The use of a mobile protein has been shown to reduce DSIFST, and this can be attributed to a blurring of the probability densities. The effect of the protein conformation on the results of IFST has been investigated by performing simulations for 10 different fixed conformations of T4 Lysozyme. The results are very similar to those for the crystal structure conformation, with no difference greater than 1.5 kcal/mol. This suggests that IFST predictions will be robust if the protein confirmation does not deviate significantly from the conformation observed in the crystal structure. However, the standard deviations are significant and thus the choice of protein confirmation will affect the results if a single-protein confirmation is used. The method of combining results from multiple IFST calculations on an ensemble of protein confirmations allows one to account for molecular flexibility and estimate the coupling of solute and solvent degrees of freedom. Clearly, FEP has the major advantage of considering molecular flexibility and, alongside thermodynamic integration, remains the tool of choice for estimation of absolute and

Cavity Water Entropy

relative binding free energies. However, when studying water molecules, IFST has a number of unique advantages and these make it a very useful tool for understanding the role of hydration in the structure and function of biological systems. In summary, to our knowledge, we have developed a new approach combining KNN and MI to calculate the total twoparticle entropy in the context of IFST, and we have shown that the resulting predictions for the contribution of water molecules in protein cavities to the hydration free energy agree extremely well with equivalent predictions using FEP. The predicted entropy contributions to the free energy are in the range of þ0.46 to þ2.67 kcal/mol and this is in excellent agreement with historical estimates. In the future, it will be interesting to apply the entropy estimates developed in this work to extended water networks at protein surfaces and within protein binding sites. In these cases, the coupling of solute and solvent degrees of freedom is expected to be more significant and will need to be treated appropriately. ACKNOWLEDGMENTS Acknowledgments go to the Cambridge High Performance Computing Service for technical assistance, Professor Mike Gilson for helpful discussions, and Professor Bruce Tidor for support. This work was supported by the MRC under grant ML/L007266/1. All calculations were performed using the Darwin Supercomputer of the University of Cambridge High Performance Computing Service (http://www.hpc.cam.ac.uk/) provided by Dell, Inc., using Strategic Research Infrastructure Funding from the Higher Education Funding Council for England and were funded by the EPSRC under grants EP/F032773/1 and EP/J017639/1.

REFERENCES

935 9. Hamelberg, D., and J. A. McCammon. 2004. Standard free energy of releasing a localized water molecule from the binding pockets of proteins: double-decoupling method. J. Am. Chem. Soc. 126:7683–7689. 10. Dunitz, J. D. 1994. The entropic cost of bound water in crystals and biomolecules. Science. 264:670. 11. Lazaridis, T. 1998. Inhomogeneous fluid approach to solvation thermodynamics. 1. Theory. J. Phys. Chem. B. 102:3531–3541. 12. Lazaridis, T. 1998. Inhomogeneous fluid approach to solvation thermodynamics. 2. Applications to simple fluids. J. Phys. Chem. B. 102:3542–3550. 13. Lazaridis, T. 2000. Solvent reorganization energy and entropy in hydrophobic hydration. J. Phys. Chem. B. 104:4964–4979. 14. Li, Z., and T. Lazaridis. 2006. Thermodynamics of buried water clusters at a protein-ligand binding interface. J. Phys. Chem. B. 110:1464–1475. 15. Robinson, D. D., W. Sherman, and R. Farid. 2010. Understanding kinase selectivity through energetic analysis of binding site waters. Chem. Med. Chem. 5:618–627. 16. Beuming, T., Y. Che, ., W. Sherman. 2012. Thermodynamic analysis of water molecules at the surface of proteins and applications to binding site prediction and characterization. Proteins Struct. Funct. Bioinf. 80:871–883. 17. Snyder, P. W., J. Mecinovic, ., G. M. Whitesides. 2011. Mechanism of the hydrophobic effect in the biomolecular recognition of arylsulfonamides by carbonic anhydrase. Proc. Natl. Acad. Sci. USA. 108:17889– 17894. 18. Raman, E. P., and A. D. MacKerell, Jr. 2013. Rapid estimation of hydration thermodynamics of macromolecular regions. J. Chem. Phys. 139:055105. 19. Singh, H., N. Misra, ., E. Demchuk. 2003. Nearest neighbor estimates of entropy. Am. J. Math. Manag. Sci. 23:301–321. 20. Hensen, U., H. Grubmu¨ller, and O. F. Lange. 2009. Adaptive anisotropic kernels for nonparametric estimation of absolute configurational entropies in high-dimensional configuration spaces. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 80:011913. 21. Killian, B. J., J. Yundenfreund Kravitz, and M. K. Gilson. 2007. Extraction of configurational entropy from molecular simulations via an expansion approximation. J. Chem. Phys. 127:024107. 22. Matsuda, H. 2000. Physical nature of higher-order mutual information: intrinsic correlations and frustration. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Topics. 62 (3 Pt A):3096–3102.

1. Liu, L., M. L. Quillin, and B. W. Matthews. 2008. Use of experimental crystallographic phases to examine the hydration of polar and nonpolar cavities in T4 lysozyme. Proc. Natl. Acad. Sci. USA. 105:14406– 14411.

23. Hnizdo, V., J. Tan, ., M. K. Gilson. 2008. Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods. J. Comput. Chem. 29:1605–1614.

2. Quillin, M. L., P. T. Wingfield, and B. W. Matthews. 2006. Determination of solvent content in cavities in IL-1beta using experimentally phased electron density. Proc. Natl. Acad. Sci. USA. 103:19749–19753.

24. Hnizdo, V., E. Darian, ., H. Singh. 2007. Nearest-neighbor nonparametric method for estimating the configurational entropy of complex molecules. J. Comput. Chem. 28:655–668.

3. Hubbard, S. J., K. H. Gross, and P. Argos. 1994. Intramolecular cavities in globular proteins. Protein Eng. 7:613–626.

25. Avvaru, B. S., S. A. Busby, ., R. McKenna. 2009. Apo-human carbonic anhydrase II revisited: implications of the loss of a metal in protein structure, stability, and solvent network. Biochemistry. 48:7365–7372.

4. Williams, M. A., J. M. Goodfellow, and J. M. Thornton. 1994. Buried waters and internal cavities in monomeric proteins. Protein Sci. 3:1224–1235. 5. Li, Z., and T. Lazaridis. 2003. Thermodynamic contributions of the ordered water molecule in HIV-1 protease. J. Am. Chem. Soc. 125:6636–6637. 6. Olano, L. R., and S. W. Rick. 2004. Hydration free energies and entropies for water in protein interiors. J. Am. Chem. Soc. 126:7991– 8000. 7. Roux, B., M. Nina, ., J. C. Smith. 1996. Thermodynamic stability of water molecules in the bacteriorhodopsin proton channel: a molecular dynamics free energy perturbation study. Biophys. J. 71:670–681. 8. Yu, H., and S. W. Rick. 2010. Free energy, entropy, and enthalpy of a water molecule in various protein environments. J. Phys. Chem. B. 114:11552–11560.

26. Chen, Y., R. Bonnet, and B. K. Shoichet. 2007. The acylation mechanism of CTX-M b-lactamase at 0.88 a resolution. J. Am. Chem. Soc. 129:5378–5380. 27. Berman, H. M., J. Westbrook, ., P. E. Bourne. 2000. The Protein Data Bank. Nucleic Acids Res. 28:235–242. 28. Sastry, G. M., M. Adzhigirey, ., W. Sherman. 2013. Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 27:221–234. 29. Brooks, B. R., C. L. Brooks, 3rd, ., M. Karplus. 2009. CHARMM: the biomolecular simulation program. J. Comput. Chem. 30:1545–1614. 30. MacKerell, A. D., D. Bashford, ., M. Karplus. 1998. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B. 102:3586–3616. Biophysical Journal 108(4) 928–936

936 31. Mackerell, Jr., A. D., M. Feig, and C. L. Brooks, 3rd. 2004. Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J. Comput. Chem. 25:1400–1415. 32. Abascal, J. L. F., and C. Vega. 2005. A general purpose model for the condensed phases of water: TIP4P/2005. J. Chem. Phys. 123:234505. 33. Feller, S. E., Y. H. Zhang, ., B. R. Brooks. 1995. Constant-pressure molecular dynamics simulation: the Langevin piston method. J. Chem. Phys. 103:4613–4621. 34. Essmann, U., L. Perera, ., L. G. Pedersen. 1995. A smooth particle mesh Ewald method. J. Chem. Phys. 103:8577–8593. 35. Phillips, J. C., R. Braun, ., K. Schulten. 2005. Scalable molecular dynamics with NAMD. J. Comput. Chem. 26:1781–1802. 36. Zwanzig, R. W. 1954. High-temperature equation of state by a perturbation method. I. nonpolar gases. J. Chem. Phys. 22:1420–1426.

Huggins 43. Boresch, S., F. Tettinger, ., M. Karplus. 2003. Absolute binding free energies: a quantitative approach for their calculation. J. Phys. Chem. B. 107:9535–9551. 44. Gilson, M. K., and K. K. Irikura. 2010. Symmetry numbers for rigid, flexible, and fluxional molecules: theory and applications. J. Phys. Chem. B. 114:16304–16317. 45. Huggins, D. J., and M. C. Payne. 2013. Assessing the accuracy of inhomogeneous fluid solvation theory in predicting hydration free energies of simple solutes. J. Phys. Chem. B. 117:8232–8244. 46. Nguyen, C. N., T. K. Young, and M. K. Gilson. 2012. Grid inhomogeneous solvation theory: hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril. J. Chem. Phys. 137:044101. 47. Young, T., R. Abel, ., R. A. Friesner. 2007. Motifs for molecular recognition exploiting hydrophobic enclosure in protein-ligand binding. Proc. Natl. Acad. Sci. USA. 104:808–813.

37. Bennett, C. H. 1976. Efficient estimation of free energy differences from Monte-Carlo data. J. Comput. Phys. 22:245–268.

48. Huggins, D. J., M. Marsh, and M. C. Payne. 2011. Thermodynamic properties of water molecules at a protein-protein interaction surface. J. Chem. Theory Comput. 7:3514–3522.

38. Pohorille, A., C. Jarzynski, and C. Chipot. 2010. Good practices in free-energy calculations. J. Phys. Chem. B. 114:10235–10253.

49. Kirkwood, J. G., and F. P. Buff. 1951. The statistical mechanical theory of solutions. I. J. Chem. Phys. 19:774–777.

39. Liu, P., F. Dehez, ., C. Chipot. 2012. A toolkit for the analysis of freeenergy perturbation calculations. J. Chem. Theory Comput. 8:2606– 2616.

50. Huggins, D. J. 2014. Estimating translational and orientational entropies using the k-nearest neighbors algorithm. J. Chem. Theory Comput. 10:3617–3625.

40. Beutler, T. C., A. E. Mark, ., W. F. van Gunsteren. 1994. Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations. Chem. Phys. Lett. 222:529–539.

51. Huggins, D. J. 2014. Comparing distance metrics for rotation using the k-nearest neighbors algorithm for entropy estimation. J. Comput. Chem. 35:377–385.

41. Zacharias, M., T. P. Straatsma, and J. A. McCammon. 1994. Separation-shifted scaling, a new scaling method for Lennard-Jones interactions in thermodynamic integration. J. Chem. Phys. 100:9025–9031. 42. Gilson, M. K., J. A. Given, ., J. A. McCammon. 1997. The statisticalthermodynamic basis for computation of binding affinities: a critical review. Biophys. J. 72:1047–1069.

52. Kraskov, A., H. Sto¨gbauer, and P. Grassberger. 2004. Estimating mutual information. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 69:066138.

Biophysical Journal 108(4) 928–936

53. Hensen, U., O. F. Lange, and H. Grubmu¨ller. 2010. Estimating absolute configurational entropies of macromolecules: the minimally coupled subspace approach. PLoS ONE. 5:e9179.

Suggest Documents