algorithms Communication

Using Force-Field Grids for Sampling Translation/Rotation of Partially Rigid Macromolecules

Mihaly Mezei

Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; [email protected]; Tel.: +1-212-659-5475

Academic Editor: Henning Fernau
Received: 30 October 2016; Accepted: 23 December 2016; Published: 4 January 2017

Abstract: An algorithm is presented for the simulation of two partially flexible macromolecules where the interaction between the flexible parts and rigid parts is represented by energy grids associated with the rigid part of each macromolecule. The proposed algorithm avoids the transformation of the grid upon molecular movement at the expense of the significantly lesser effort of transforming the flexible part.

Keywords: force-field grid; protein loops; Monte Carlo; partially flexible molecules

1. Introduction

Atomic level simulation of macromolecules is currently performed almost exclusively by molecular dynamics. Its success, due to it following the actual time course of the system, is also the source of its ultimate limitation: the time scale it can reach. However, it was argued earlier that Monte Carlo, the preferred technique in the early days of molecular simulations, has the potential to overcome this limitation as it is not bound to time [1]. In spite of this potential, there are few groups working along this line [2–4]. A further advantage of the Monte Carlo approach over molecular dynamics is the ease of limiting the sampling to certain parts of the system and/or to a limited set of degrees of freedom.

Representing the field of rigid macromolecules on grids specific to the type of the interacting atoms is a standard technique employed in many docking programs [5,6] where the target is held rigid and a flexible ligand is moved around the target. Assuming that there are NR atoms in the rigid part, in the energy calculation for each moving atom, the grid representation of the rigid part of the macromolecule replaces the calculation of NR distances with the calculation of the distances from the eight nearest gridpoints, followed by interpolating the corresponding eight energy terms, which adds an effort equivalent to about two more distance calculations. Each moving atom thus costs the equivalent of about ten distance calculations instead of NR. Since the most likely macromolecule type to be thus modeled is a protein and protein domains have well over a thousand atoms (i.e., NR ≈ 1000 is a low estimate in most cases), using the grid representation saves a factor of NR/10, that is, likely a factor of ~100 or more. This idea has also been combined with the Monte Carlo approach for the simulation of protein loops [7] where only a small part of the molecule is moved during the simulation, achieving similar speedup.
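The grid lookup described above can be illustrated with a minimal sketch. It assumes trilinear interpolation over the eight corners of the enclosing grid cell, a regular grid stored as nested lists, and a single scalar energy per gridpoint; the paper does not specify the interpolation scheme, so the function name `trilinear` and its parameters are illustrative only.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a scalar grid energy at `point`.

    grid[i][j][k] holds the energy at origin + spacing * (i, j, k).
    This is an assumed layout, not the one used by any specific program.
    """
    # Fractional grid coordinates of the point.
    g = [(p - o) / spacing for p, o in zip(point, origin)]
    # Lower corner of the enclosing cell and the offsets inside the cell.
    i0 = [int(math.floor(v)) for v in g]
    t = [v - i for v, i in zip(g, i0)]
    shape = (len(grid), len(grid[0]), len(grid[0][0]))
    if any(i < 0 or i + 1 >= m for i, m in zip(i0, shape)):
        raise ValueError("point lies outside the grid")
    value = 0.0
    # Weighted sum over the eight nearest gridpoints.
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((t[0] if di else 1.0 - t[0])
                     * (t[1] if dj else 1.0 - t[1])
                     * (t[2] if dk else 1.0 - t[2]))
                value += w * grid[i0[0] + di][i0[1] + dj][i0[2] + dk]
    return value


# Toy 3x3x3 grid sampling the linear function f = x + 2y + 3z (spacing 1.0).
demo_grid = [[[x + 2 * y + 3 * z for z in range(3)]
              for y in range(3)] for x in range(3)]
demo_value = trilinear(demo_grid, (0.0, 0.0, 0.0), 1.0, (0.5, 1.25, 0.75))
```

The cost per moving atom is independent of NR: eight lookups and a fixed number of multiplications, regardless of how many atoms the rigid part contains. Note that the index arithmetic is this simple only because the grid axes coincide with the Cartesian axes.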
The fact that the rigid part of the macromolecule is also kept in place was an important aspect for the success of these applications since it allowed keeping the grid(s) unchanged during the entire simulation. The algorithm described here is designed to maintain the gains achievable by the use of the grids for the case when a macromolecule is moved. When modeling the interaction of two partially rigid macromolecules, the straightforward extension of these algorithms would run into the difficulty of having to move at least one of the sets of grids. This introduces additional computational expenses: (a) finding the nearest eight gridpoints would be more involved since the rotated grids would not be aligned with the Cartesian axes and (b) the additional computational burden of transforming Nt × Ng³ grid values (with Nt being the number of grid types and Ng the number of gridpoints along one Cartesian direction) whenever a whole macromolecule is moved. The proposed algorithm would eliminate these problems at negligible added computational cost.

Algorithms 2017, 10, 6; doi:10.3390/a10010006

www.mdpi.com/journal/algorithms

2. The Algorithm Proposed

This note describes an algorithm that allows the movement of the rigid part while maintaining the grids in their original positions, giving rise to the same savings as obtained for the single molecule system. Figure 1 shows the schematics of the system considered. The rigid parts of the macromolecules are represented by the oblongs labeled with R1 and R2 and the flexible parts are represented by the loops F1 and F2. Macromolecule 2 will be held in place with only the F2 part moving while macromolecule 1 is allowed to translate/rotate whilst also changing the conformation of the part represented by the atoms forming F1.

Figure 1. Schematics of the system considered. R1, R2: rigid part of macromolecules 1 and 2, respectively; F1, F2: flexible part of macromolecules 1 and 2, respectively.

The key to the algorithm proposed is the simultaneous updating of two equivalent copies of the flexible parts, labeled F1′ and F2′ in Figure 2. The difference between the primed and unprimed versions is just a translation/rotation of the rigid part; their conformations are identical. The initial position of macromolecule 1 is R1 and its position after rigid translation/rotation is R1′. Such a transformation would also require the transformation of grid G1 into G1′, as shown in Figure 2, but the proposed algorithm avoids this. Instead, the algorithm introduces the alternative position of macromolecule 2, R2′. It is constructed by translating/rotating the whole complex in such a way that the rigid part of macromolecule 1 is overlaid on its initial position/orientation. Note that, in reality, only the coordinates of the flexible part F2 are involved in the calculation.
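The construction of the primed copies can be sketched as follows. The sketch assumes that the rigid move of macromolecule 1 is given as a rotation matrix plus a translation, so that F1′ is obtained by applying the move to F1 and F2′ by applying the inverse move to F2; the helper names (`rot_z`, `apply_move`, `apply_inverse_move`) are hypothetical. It then checks, for a single pair of atoms, that the distance between F2 and the moved R1′ equals the distance between F2′ and the unmoved R1, which is why the untransformed grid G1 can be used.

```python
import math


def rot_z(angle):
    """Rotation matrix for a rotation by `angle` (radians) about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def apply_move(rot, shift, points):
    """Rigid move x -> rot @ x + shift, applied to a list of 3D points."""
    return [tuple(sum(rot[r][c] * p[c] for c in range(3)) + shift[r]
                  for r in range(3))
            for p in points]


def apply_inverse_move(rot, shift, points):
    """Inverse rigid move x -> rot^T @ (x - shift)."""
    moved = []
    for p in points:
        d = [p[c] - shift[c] for c in range(3)]
        moved.append(tuple(sum(rot[c][r] * d[c] for c in range(3))
                           for r in range(3)))
    return moved


# Rigid move of macromolecule 1: quarter turn about z, then a shift along x.
rot, shift = rot_z(math.pi / 2), (1.0, 0.0, 0.0)

r1_atom = (1.0, 0.0, 0.0)   # an atom of the rigid part R1 (initial position)
f2 = [(3.0, 2.0, 1.0)]      # an atom of the flexible part F2

r1_moved = apply_move(rot, shift, [r1_atom])[0]   # the same atom in R1'
f2_prime = apply_inverse_move(rot, shift, f2)     # F2', relative to unmoved R1

# The two distances agree, so E(F2, G1') can be read from the fixed grid G1.
d_moved = math.dist(f2[0], r1_moved)
d_fixed = math.dist(f2_prime[0], r1_atom)
```

Only the few atoms of F2 are transformed here; the grid values themselves are never touched.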
Since E(F1′,G1′) = E(F1,G1) and E(F2,G1′) = E(F2′,G1) where E(F,G) represents the energy of flexible part F interacting with grid G, this arrangement allows the calculation of the energy of a new complex as follows:

E = E(F1,G1) + E(F2,G2) + E(F1′,G2) + E(F2′,G1) + E(F1′,F2) (1)

where E(F1′,F2) is the energy of the interaction between flexible parts F1′ and F2.

Two kinds of changes are considered: translate/rotate macromolecule 1 in the current conformation of its flexible part, F1, or change the conformation of the flexible part of one of the macromolecules, F1 or F2. When macromolecule 1 is translated/rotated, the coordinates of F1′ and F2′ have to be updated. When the flexible part of macromolecule 1 is changed, the coordinates of F1 and F1′ have to be updated. When the flexible part of macromolecule 2 is changed, the coordinates of F2 and F2′ have to be updated.

Note that in Equation (1), the energy of the interaction between the two rigid parts, being separated by the layers of the flexible part, is assumed to be negligible. For the Van der Waals interactions, this assumption holds to a good approximation. As for the electrostatic interactions, it is possible to include them, at negligible computational expense, using a multipole representation of the charge distribution since the two rigid parts can be enclosed into nonoverlapping spheres, ensuring the validity of the multipole representation.
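A minimal sketch of how Equation (1) and the update rules might be organized in code is given here. It assumes that each grid is available as a lookup function returning the energy of one atom at a given position and that the direct flexible-flexible term is a sum over atom pairs of a distance-dependent potential; all names (`total_energy`, `COPIES_TO_UPDATE`, the move labels) are hypothetical and the toy lookups carry no physical meaning.

```python
import math


def grid_energy(atoms, grid_lookup):
    """E(F, G): sum of per-atom grid energies for the atoms of a flexible part."""
    return sum(grid_lookup(a) for a in atoms)


def pair_energy(atoms_a, atoms_b, pair_potential):
    """Direct interaction between two flexible parts, summed over atom pairs."""
    return sum(pair_potential(math.dist(a, b)) for a in atoms_a for b in atoms_b)


def total_energy(f1, f1p, f2, f2p, g1, g2, pair_potential):
    """Equation (1): both grids stay in their original positions."""
    return (grid_energy(f1, g1)        # E(F1, G1)
            + grid_energy(f2, g2)      # E(F2, G2)
            + grid_energy(f1p, g2)     # E(F1', G2)
            + grid_energy(f2p, g1)     # E(F2', G1)
            + pair_energy(f1p, f2, pair_potential))  # E(F1', F2)


# Which coordinate copies change for each kind of move (from the text above).
COPIES_TO_UPDATE = {
    "rigid_move_1": ("F1'", "F2'"),
    "flex_move_1": ("F1", "F1'"),
    "flex_move_2": ("F2", "F2'"),
}

# Toy stand-ins for the grid lookups and the pair potential (assumptions).
g1 = lambda p: p[0]
g2 = lambda p: 2.0 * p[1]
pair = lambda r: r

demo_total = total_energy(
    f1=[(1.0, 0.0, 0.0)], f1p=[(0.0, 3.0, 0.0)],
    f2=[(0.0, 1.0, 0.0)], f2p=[(4.0, 0.0, 0.0)],
    g1=g1, g2=g2, pair_potential=pair,
)
```

In an actual Monte Carlo step only the terms involving the updated copies would need recomputation; for a rigid move of macromolecule 1 these are E(F1′,G2), E(F2′,G1) and E(F1′,F2).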

Figure 2. Schematics of the copies needed for the proposed algorithm. R1, R2: rigid part of macromolecules 1 and 2, respectively; F1, F2: flexible part of macromolecules 1 and 2, respectively; G1, G2: the set of grids representing the field of the rigid parts R1 and R2, respectively; primed items represent the systems after change from the initial position, orientation and conformation. The two interacting systems are shown with their distance increased for clarity.

3. Discussion

This note presented an algorithm that extends the use of grid-based force fields to the modeling of two partially flexible macromolecules by eliminating the added computation involved in determining the nearest gridpoints, and the computation to transform the grids. As for the determination of the nearest gridpoints, the proposed algorithm's use of a transformed copy adds a similar computational expense, so there is no net effect. The main saving is achieved by eliminating the transformation of the grids.

Assuming that there are 10 different grids corresponding to different atom types, the proposed algorithm would save the transformation of ~10 × Ng³ points at the negligible expense of transforming the additional NF atoms of the flexible part. Assuming that Ng ≈ 200 (a reasonable value in our experience) and that the flexible part is about 10% of the whole molecule, ~8 × 10⁷ (= 10 × 200³) additional transformations can be eliminated at the negligible expense of ~2 × 10² transformations of the flexible part.

The estimate of the overall effect is less straightforward. It is reasonable to assume that the volume of the grid is proportional to the volume of the flexible atoms or, equivalently, that the number of gridpoints along an axis is proportional to the cube root of the number of flexible atoms, i.e., Ng = c1 × NF^(1/3), and that the overall energy calculation will be dominated by the direct interactions of the flexible parts. Assuming further, as is generally true for Markov-chain Monte Carlo, that a move involves a small limited fraction of the atoms available for moving, the work in calculating the energy change is c2 × NF. This means that the ratio of the work involved in the energy calculation per step and the corresponding savings by using the algorithm proposed here is [c2 × NF]/[W × 10 × (c1 × NF^(1/3))³] = c2/(10 × W × c1³) where W is the fraction of the time during which the whole molecule is moved. Thus, in the limit of a large number of flexible atoms, the expense of transforming the grid is commensurable to the rest of the energy calculation, and the upper limit of the savings is a constant, depending, among others, on the frequency of whole molecule moves. Likely applications, however, are far from this limit: assuming (a relatively large) NF ≈ 100 and Ng ≈ 200 and the number of changed flexible atoms ~50 (an overestimate), the ratio of the energy calculation to the grid transformation calculation would be (50 × 100)/(W × 10 × 200³) × F = 10⁻⁴ × (5/(W × 8)) × F where F is the ratio of the computational expenses of calculating a distance and transforming a coordinate. Clearly, this shows a significant gain for systems of likely size.

Thus, it is expected that this algorithm would significantly speed up the modeling of protein complexes where the interaction region is relatively small compared to the overall volume of the complex and foster further developments in the Monte Carlo methodology. It is planned to implement it into the program called MMC [4] that already includes the single grid version. A typical example for its use would be modeling major histocompatibility complexes (MHC) [8]. An example for such a system is shown in Figure 3.
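The arithmetic behind the estimate for likely system sizes can be checked with a few lines. The values are the ones quoted in the Discussion (10 grid types, Ng ≈ 200, NF ≈ 100, ~50 changed atoms); the variable names and the choices W = 1 and F = 1 in the example call are illustrative assumptions, not recommendations.

```python
# Quantities quoted in the Discussion.
n_grid_types = 10     # number of atom-type specific grids
n_g = 200             # gridpoints along one Cartesian direction
n_f = 100             # number of flexible atoms (a relatively large value)
n_changed = 50        # flexible atoms changed per move (an overestimate)

# Grid values that would have to be transformed per whole-molecule move.
grid_values = n_grid_types * n_g ** 3

# Pair distances evaluated per move for the direct flexible-flexible term.
energy_terms = n_changed * n_f


def energy_to_grid_ratio(w, f=1.0):
    """Ratio of energy-calculation work to grid-transformation work.

    w: fraction of steps that move the whole molecule.
    f: cost of a distance calculation relative to a coordinate transformation.
    """
    return energy_terms / (w * grid_values) * f


ratio_every_step = energy_to_grid_ratio(w=1.0)
ratio_one_in_ten = energy_to_grid_ratio(w=0.1)
```

With these numbers `grid_values` is 8 × 10⁷ and the ratio is (5/8) × 10⁻⁴ when every step moves the whole molecule, so transforming the grids would dominate the cost by several orders of magnitude unless whole-molecule moves were extremely rare.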

Figure 3. Crystal structure of MHC class I HLA-A2.1 bound to a photocleavable peptide (PDB ID: 2X70). The possible choices of flexible parts corresponding to F1 and F2 are shown in red and blue spheres, respectively.

Acknowledgments: This work was supported in part through the computational resources and staff expertise provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.

Conflicts of Interest: The author declares no conflict of interest.

References

1. Mezei, M. On the potential of Monte Carlo methods for simulating macromolecular assemblies. In Third International Workshop for Methods for Macromolecular Modeling Conference Proceedings; Gan, H.H., Schlick, T., Eds.; Springer: New York, NY, USA, 2002.
2. Jorgensen, W.L.; Tirado-Rives, J. Molecular modeling of organic and biomolecular systems using BOSS and MCPRO. J. Comput. Chem. 2005, 26, 1689–1700.
3. Ullmann, R.T.; Ullmann, G.M. GMCT: A Monte Carlo simulation package for macromolecular receptors. J. Comput. Chem. 2012, 33, 887–900.
4. Mezei, M. MMC: Monte Carlo Program for Molecular Assemblies. Available online: http://inka.mssm.edu/~mezei/mmc (accessed on 27 December 2016).
5. Goodford, P.J. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 1985, 28, 849–857.
6. Meng, X.-Y.; Zhang, H.-X.; Mezei, M.; Cui, M. Molecular docking: A powerful approach for structure-based drug discovery. Curr. Comput. Aided Drug Des. 2011, 7, 146–157.
7. Cui, M.; Mezei, M.; Osman, R. Prediction of protein loop structures using a local move Monte Carlo approach and a grid-based force field. Protein Eng. Des. Sel. 2008, 21, 729–735.
8. Janeway, C.A., Jr.; Travers, P.; Mark, W.; Mark, J.S. Immunobiology: The Immune System in Health and Disease, 5th ed.; Garland Science: New York, NY, USA, 2001.

© 2017 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
