A diffusion process-controlled Monte Carlo method for finding the global energy minimum of a polypeptide chain. I. Formulation and test on a hexadecapeptide
Philippe Derreumaux
Citation: The Journal of Chemical Physics 106 (12), 5260 (22 March 1997); doi: 10.1063/1.473525
Laboratoire de Biochimie Théorique, URA 77 CNRS, Institut de Biologie Physico-Chimique, 13, rue Pierre et Marie Curie, 75005, Paris, France
(Received 17 September 1996; accepted 18 December 1996)
Finding the global energy minimum region of a polypeptide chain, independently of the starting conformation and in a reasonable computational time, is of fundamental interest. To approach this problem, a new Monte Carlo method is proposed and applied to the hexadecapeptide model Ac-(AAQAA)₃Y(NH₂), in which the global energy minimum conformation, an α helix, is known. In order to reduce the available conformational space, the backbone dihedral angles φ and ψ are restricted to a discrete set of ten regions and the side chains are modeled by a two-point representation. The energy used in these off-lattice simulations is of Amber type with a simplified hydrophobic potential. The novelty of the method is that, prior to the minimization of the energy, the move from the current conformation to the next must satisfy a kinetic requirement. The kinetic requirement is that there exists an upper bound on the escape time from the current conformation. From diffusion considerations it is shown that the escape time correlates with the angular deviations of the residues. The effectiveness of the approach is illustrated by a total of 25 biased simulations (i.e., using specific probabilities for the ten φ–ψ regions) and five unbiased simulations (i.e., the ten regions are equiprobable before application of the kinetic requirement), starting from various conformations. It is found that all biased and unbiased simulations find the global minimum energy structure in ∼10²–10³ Monte Carlo steps, although the estimated probability of getting the full α helix is ∼10⁻¹¹–10⁻¹⁶. © 1997 American Institute of Physics. [S0021-9606(97)50712-4]
I. INTRODUCTION
Predicting the native structure of a polypeptide from its sequence has long motivated the development of methods capable of surmounting the multiple minima problem. Current theoretical approaches using molecular mechanics force fields include buildup procedures,1 systematic search strategies,2 high-temperature simulated annealing,3 high-temperature molecular dynamics simulations,4 mathematically inspired methods,5,6 methods that deform the potential energy surface,7,8 and various Monte Carlo (MC) methods.9–12 Among all these approaches, prediction of the global minimum region for medium-size systems (consisting, at most, of 20 residues) is achieved by only three methods: the MC chain growth method,10 the biased probability Monte Carlo (BPMC) method,9 and the electrostatically driven Monte Carlo (EDMC) method.11 In the chain growth method, the polypeptide is grown atom by atom using local dihedral angle biases so as to generate a Boltzmann-distributed ensemble of conformations at a high temperature. Then the distinct conformations are minimized. Both the EDMC and BPMC methods are based at each iteration on the minimization of the total energy. They differ, however, in two respects. (1) The BPMC method uses local dihedral angle biases and allows several residues to change simultaneously. In contrast, the EDMC method employs two different techniques to change one residue at a time. One considers electrostatic information so as to improve the orientations of the permanent dipoles in the local field generated by the
molecular system; the other involves a random sampling. (2) The BPMC method uses the Metropolis criterion13 to accept or reject the proposed move, with a simulation temperature of 600 K. In EDMC, when the search is unable to modify the currently accepted conformation after a significant number of trials, the process is forced to accept, probabilistically, a high-energy conformation among those previously rejected by the Metropolis criterion. Thus, the effective temperature of the simulation is unknown. At present, extension of these methods to small globular proteins is not feasible because they are too slow to locate the region in which the global minimum lies. The CPU time scales exponentially with the number of dihedral angles in EDMC, and quadratically with the number of residues in the chain growth method. No relationship is given by BPMC, but a total of 2500 energy minimizations is typically required for predicting the native structure of the hexadecapeptide Ac-(AAQAA)₃Y(NH₂).9 To develop a method applicable to proteins, two problems must be solved. One is related to the level of molecular detail to be used without sacrificing the properties of the system. In this work, to reduce the number of particles, the backbone atoms are treated explicitly and the side chains are represented by one or two sites, depending on their size and complexity. This protein model has the advantage of reducing the total number of conformations, but still contains a number of local minima much larger than can be searched by any simulation. The other is to design a global optimization approach that substantially reduces the number of Monte
Carlo steps, independently of the initial conformation. In this paper, the concepts behind a new global optimization method are presented and the results on the hexadecapeptide model Ac-(AAQAA)₃Y(NH₂) are compared to those obtained by the BPMC method. This sequence was shown to be helical and monomeric in solution.14 The organization of this paper is as follows. In Sec. II, the level of molecular detail, the backbone conformational space, the force field, and the algorithm are presented. To limit the available conformational space, the backbone dihedral angles φ, ψ of each residue are restricted to a discrete set of ten regions. The basis for the algorithm is the Monte Carlo minimization method, i.e., minimization of the total energy in conjunction with the Metropolis acceptance criterion. The novelty of the method is that a diffusion-controlled process is proposed for determining the residues to be moved, rather than limiting the number of residues to be moved so as to keep the acceptance ratio close to 50% (by definition, the acceptance ratio is the ratio of the number of accepted conformations to the number of trial conformations), or changing one randomly chosen dihedral angle at a time. This scheme is based on the hypothesis that, to optimize cooperativity, there is an upper bound on the escape time from the current conformation at each iteration. It must be stressed that this process does not bear any resemblance to the diffusive funnel dynamics picture.15 In Sec. III, the results are presented. The efficiency of the new method is illustrated by two series of simulations. One uses local dihedral angle biases, i.e., conformational probabilities, and is referred to as the biased simulation; the other is free of local dihedral angle biases before application of the kinetic requirement, and is referred to as the unbiased simulation.
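Two ingredients named in this outline lend themselves to a short numerical illustration: the probability of drawing the α region at every residue, and the Metropolis test applied to energies. The Python sketch below is only an illustration under stated assumptions, not the implementation used in this work. The function names are hypothetical, residues are assumed to be sampled independently, energies are assumed to be in kcal/mol, and the diffusion-controlled selection of residues is not modeled.

```python
import math
import random

# Boltzmann constant in kcal/(mol K). The energy unit is an assumption;
# the text only states that the force field is of Amber type.
K_B = 0.0019872041

N_RESIDUES = 16  # residues in Ac-(AAQAA)3Y(NH2)
N_REGIONS = 10   # discrete phi-psi regions per residue


def all_alpha_probability(p_alpha, n_residues=N_RESIDUES):
    """Probability that every residue lands in the alpha region,
    assuming independent draws with the same per-residue probability."""
    return p_alpha ** n_residues


def implied_per_residue_probability(p_total, n_residues=N_RESIDUES):
    """Inverse of all_alpha_probability: the uniform per-residue
    probability that would reproduce a given whole-chain probability."""
    return p_total ** (1.0 / n_residues)


def metropolis_accept(e_old, e_new, temperature, rng=random):
    """Metropolis test on two energies at a temperature given in K.
    Downhill moves are always accepted; uphill moves are accepted
    with probability exp(-(e_new - e_old) / (K_B * temperature))."""
    if e_new <= e_old:
        return True
    boltzmann_factor = math.exp(-(e_new - e_old) / (K_B * temperature))
    return rng.random() < boltzmann_factor


def acceptance_ratio(n_accepted, n_trials):
    """Number of accepted conformations divided by number of trials."""
    return n_accepted / n_trials
```

With ten equiprobable regions, `all_alpha_probability(1 / N_REGIONS)` evaluates (1/10)^16, which is the order of magnitude of 10⁻¹⁶ quoted for the unbiased case. Passing the 600 K used by BPMC as `temperature` shows how quickly the uphill acceptance probability falls with the energy difference.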
A total of 25 biased and 5 unbiased simulations starting from arbitrary conformations and using different random number seeds are performed. All of them converge to the expected global minimum structure in a limited number of MC minimization steps. The biased simulations find the native state in a range of 24 to 215 steps, although the probability of getting the full α helix is only 1.1×10⁻¹¹; the unbiased simulations require 229 to 1083 steps, although the probability of an α helix is 10⁻¹⁶. In comparison, the BPMC method requires 2.5×10³ steps, with a corresponding probability of 2×10⁻⁵. A discussion on this method follows in Sec. IV.
II. METHODS
A. The model
1. Level of molecular detail
Since the number of possible conformations is an exponential function of the number of degrees of freedom (3N in Cartesian space, with N the number of atoms), it is highly desirable to find alternatives to all-atom models, which are too complex to be useful, except for short peptides. There are three bodies of data that justify the use of a simplified side chain representation, without sacrificing the properties of proteins, if the physical property differences between polar and nonpolar residues are considered. First, the polarity of residues rather than the precise identity of residues is conserved in a given fold. For instance, the fold of some proteins, such as globins, can be achieved by sequences that have less than 20% sequence identity.16 Second, the observed secondary structure correlates with the periodicity of polar and nonpolar residues in self-assembling oligomeric peptides.17,18 Third, Dill and co-workers, in their theoretical work, have demonstrated that sequences expressed in a two-residue code, hydrophobic and polar, have many protein-like properties.19 Current simplified side chain models involve either one-point20,21 or two-point representations.22,23 As a first step toward the simplification of the protein structure, the backbone N, H, C, O, and Cα atoms are treated explicitly (the Hα atom is not included in the list), and the side chains are represented by two points, as proposed by Wallqvist and Ullner.22 Generally, the first site involves the hydrophobic atoms and the second site includes the hydrophilic component. Here, for the hexadecapeptide, the first site involves the β carbon, and the second site encompasses C₆H₄OH in Tyr and CH₂CONH₂ in Gln.
2. Local main chain conformational preferences
It is clear from protein structures that the bond lengths and bond angles vary little from their equilibrium values and that the φ–ψ map and the dihedral angle ω of each residue are restricted to high probability regions. In this work, the bond lengths and bond angles are free to move, the angles ω are forced to be trans, and the entire φ–ψ map is subdivided into ten regions. The division of the φ–ψ plot is carried out on the basis of the observed regular secondary structure motifs, and observed irregular conformations. The regular secondary structures include the right-hand α helix, β sheets, and the polyproline (PII) conformation. Note that the 3₁₀ helix (φ = −49°, ψ = −26°) is not included in the list because it lies within the boundaries of the α region. The reproduction of the known turns requires the following regions:24 the classic type I, VIa, VIb, II′, and III turns involve a second loop residue in the β1 (φ = −90°, ψ = 0°) region; the type II and I′ turns involve a second loop residue in the β2 (φ = 90°, ψ = 0°) region; the β hairpin I′ has the first loop residue in the left-hand helix (αL) area; the β hairpin II′ involves the first residue in the II′ (φ = 60°, ψ = −120°) region; the usual γ turn is centered around the C7ax (φ = 60°, ψ = −60°) state.25 Two other regions are also included, namely the ε (φ = ±164°, ψ = ±164°) and the δ (φ = −110°, ψ = 70°) regions.9 The centers of the ten allowed regions are given in Table I. The sizes of the ten regions are derived from previous statistical analysis for 191 protein chains.9 Since the observed sizes for the high probability zones are a function of the identity of the residue and of the identity of the zone, simplifications are highly desirable. In this work, the angles φ, ψ are allowed to vary by ±14° from the center of the α state, i.e., the α region is defined to have −74°