A grid-based genetic algorithm combined with an ... - Springer Link

2 downloads 0 Views 992KB Size Report
May 8, 2008 - Crescenzi et al. (1998); Calland (2003). This leads to the mention of the Levinthal's paradox (Cyrus Levinthal 1969), which states that, despite ...
Soft Comput (2008) 12:1185–1198 DOI 10.1007/s00500-008-0298-8

FOCUS

A grid-based genetic algorithm combined with an adaptive simulated annealing for protein structure prediction Alexandru-Adrian Tantar · Nouredine Melab · El-Ghazali Talbi

Published online: 8 May 2008 © Springer-Verlag 2008

Abstract A hierarchical hybrid model of parallel metaheuristics is proposed, combining an evolutionary algorithm and an adaptive simulated annealing. The algorithms are executed inside a grid environment with different parallelization strategies: the synchronous multi-start model, parallel evaluation of different solutions and an insular model with asynchronous migrations. Furthermore, a conjugated gradient local search method is employed at different stages of the exploration process. The algorithms were evaluated using the protein structure prediction problem, having as benchmarks the tryptophan-cage protein (Brookhaven Protein Data Bank ID: 1L2Y), the tryptophan-zipper protein (PDB ID: 1LE1) and the α-Cyclodextrin complex. Experimentations were performed on a nation-wide grid infrastructure, over six distinct administrative domains and gathering nearly 1,000 CPUs. The complexity of the protein structure prediction problem remains prohibitive as far as large proteins are concerned, making the use of parallel computing on the computational grid essential for its efficient resolution.

The current article is developed within the context of the DOCK—Conformational Sampling and Docking on Grids project, sustained by ANR (Agence Nationale de la Recherche, http://www.gip-anr.fr). The project is a joint work between LIFL (USTL-CNRS-INRIA), IBL (CNRS-INSERM) and CEA DSV/DRDC. A.-A. Tantar (B) · N. Melab · E.-G. Talbi Laboratoire d’Informatique Fondamentale de Lille, LIFL/CNRS UMR 8022, DOLPHIN Project - INRIA Futurs, Cité Scientifique, 59655 Villeneuve d’Ascq Cedex, France e-mail: [email protected]; [email protected] N. Melab e-mail: [email protected] E.-G. Talbi e-mail: [email protected]

1 Introduction The protein structure prediction problem (PSP), is one of the particularly interesting challenges of parallel computing on computational grid. The PSP problem consists in determining the ground-state conformation of a specified protein given its amino-acids sequence (the primary structure). The ground-state conformation term designates the associated tridimensional native form, referred to as zero energy tertiary structure. The importance of the PSP problem is determined by the ubiquitousness of proteins in the living organisms. Real world applications of computational PSP are directly related to computer assisted drug and molecular design. From a structural point of view, proteins are complex organic compounds composed of amino-acid residues chains joined by peptide bonds (see Fig. 1). They are involved in immune response mechanisms, enzymatic activity, signal transduction, etc. Due to the intrinsic relation between the structure of a molecule and its functionality, the problem implies important consequences in medicine and biology related fields. A focus is set by current research on molecular structure prediction, molecular folding and molecular docking. Computational modeling and prediction offer an alternative to laboratory in vitro experimentation, unfeasible for large domain experimentation. A viable approach for addressing the implied complexity is grid computing. Nowadays, it is regarded as a powerful way for achieving high performance on computational-intensive applications. Considering the inherent complexity of the problem, from an algorithmic perspective, hybrid search algorithms present themselves as effective resolution approaches. In this study, a model combining an Evolutionary Algorithm (EA) and an Adaptive Simulated Annealing (ASA) technique is considered. Evolutionary algorithms are

123

1186

Fig. 1 First row structure of an amino-acid. a NCα C back-bone structure, b polymeric structure; ω,  and  relate to dihedral angles; R designates the specific amino acid’s side chain characteristic. Second row: tryptophan-cage protein (PDB ID 1L2Y), a multiple near-native conformations; b ribbon-ball & stick representation of a single conformation. The graphical representations of the tryptophan-cage protein has been created using UCSF Chimera (Pettersen et al. 2004)

stochastic iterative search techniques, with a large area of application—epistatic, multi-modal, multicriterion and highly constrained problems (Alba et al. 2005). Genetic algorithms (GAs) (Holland 1975), one of the most popular variants of EAs, are Darwinian-evolution inspired, population-based metaheuristics that allow a powerful exploration of the conformational space. However, they have limited search intensification capabilities, which are essential for neighborhood-based improvement. The neighborhood of a solution refers to a part of the landscape of the problem. Simulated annealing (SA) algorithms are optimization techniques capable of dealing with multimodal functions of a large nonlinearity and discontinuity degree. They were developed by Kirkpatrick et al. (1983) as a generalization of the Metropolis Monte Carlo techniques (Metropolis et al. 1953). It extends the previous methods by including a temperature schedule. This metaheuristic simulates metal recrystallization in the process of annealing. The entropy of an initial disordered system is adiabatically reduced to low entropy states, maintaining at each step a thermodynamic equilibrium. The SAs represent a viable alternative to gradient based local search methods, having the advantage of being less prone to getting trapped into local minima. Moreover, the SA implementation does not impose complex development

123

A.-A. Tantar et al.

constraints. However, SA methods are extensively sequential in their nature thus being difficult to parallelize, requiring long execution times. Their design and characteristics makes them more appropriate for use as local search. A comprehensive study comparing a Monte Carlo SA, a common GA and a Lamarckian GA was conducted in Morris et al. (1999). The article discusses the various aspects related to the algorithms, problem concepts, empirical free energy function definition, etc. Overall, it is concluded over the efficiency of the Lamarckian GA under study, indicating it as the best candidate for ligands with increasing number of degrees of freedom. Considering the diverse parametric elements modeling the behavior of conformational sampling methods, the work of Thomsen and Rene (2003) is important to mention. It discusses the impact of local search hybrids and variation operators, employed for flexible ligand docking with EAs. The EA performance is evaluated by varying different specific settings of the algorithm, e.g., population size, mutation, crossover and local search operators. The presented results lead to an interesting conclusion: local search operators determine the EAs to be more prone to getting trapped in local minima, hence being unable to sample an extensive conformational domain. As a counterpart, an annealing scheme for variance control in the mutation operator is identified as being an important and determinant component to efficiency. In this context, the search for different annealing schemes is underlined as an interesting area for further research. In our investigation, we take into consideration a hybrid model combining a hierarchical parallel genetic algorithm and an ASA algorithm. Augmented computational resources allow for higher and more complex algorithmic constructions thus rendering possible the design of hierarchical concurrent and parallel approaches. Although briefly presented, the underlying technical aspects of the used framework, as well as the grid environment, are not detailed. We considered it as out of scope for this study since it allows us to focus on the algorithms. Under the same considerations, no in-depth details are included for the employed force field. Part of the notions and definitions presented in this paper can also be found in our previous works (Alexandru-Adrian et al. 2006; Tantar et al. 2007), although they are not related to this paper. The remainder of the paper is organized as follows: a brief overview of the field is proposed in Sect. 2 indicating the main directions for solving the PSP problem. Sections 3 and 4 describes the algorithms, offering details of the parallel models that are used. In Sect. 5, the ParadisEO framework is presented, as well as the general implementation aspects. In Sect. 6, experimentation results are given with an introductory presentation of the GRID5000 computational grid. Finally, Sect. 7 concludes the paper.

A grid-based genetic algorithm

2 Protein structure prediction The PSP problem consists in determining the ground-state conformation of a specified protein, given its amino-acid sequence. A closely related problem, molecular folding, has as desired outcome the pathways followed along the folding process in a molecule. Additionally, molecular docking describes the complexed macromolecule resulting from the binding of two separate folded molecules, exerting geometrical and chemical complementarity. From a computational standpoint, in silico docking simulates molecular recognition, although not relating to the molecular pathways of the process but to the final complexed result. Having been studied for more than a decade, and of particular interest, protein-protein docking, as a singular case, is fundamental in understanding biomolecular processes such as, for example, interactions between antibodies and antigens, intra-cellular signaling modulation mechanisms, inhibitor design, macromolecular interactions, and others. As an example of PSP complexity aspects, for a reduced size molecule composed of 40 residues, a number of 1040 conformations must be taken into account when considering, in average, 10 conformations per residue. Additionally, if a number of 1014 conformations per second is explored, a time of more than 1018 years is needed to find the native-state conformation. For example, the [met]-enkephalin pentapeptide, composed of 75 atoms and having five amino-acids, Tyr-Gly-Gly-Phe-Met, and 22 variable backbone dihedral angles, a number of 1011 local optima is estimated. Detailed aspects concerning complexity issues are discussed in Crescenzi et al. (1998); Calland (2003). This leads to the mention of the Levinthal’s paradox (Cyrus Levinthal 1969), which states that, despite the numerous pathways, in vivo molecular folding expands over a time scale magnitude of several milliseconds. Notes on molecular structure prediction complexity may be found in Ngo and Marks (1992). As a conclusion, no simulation or resolution is possible unless extensive computational power is applied. It may be inferred that no polynomial time resolution is achievable if no or little a priori knowledge is employed. An extended referential resource for protein structural data may be accessed through the Brookhaven Protein Data Bank1 (Bernstein et al. 1977). A comprehensive introductory article on the structure of proteins and related notions and aspects, can be found in Neumaier (1997), and a glossary of terms in Van de Waterbeemd et al. (1997). The inter-atomic interactions to be considered for the PSP problem are a resultant of electrostatic forces, entropy, hydrophobic characteristics, hydrogen bonding, etc. Precise energy determination relies also on the solvent effect enclosed in 1

http://www.rcsb.org—Brookhaven Protein Data Bank; offers geometrical structural data for a large number of proteins

1187

the dielectric constant  and in a continuum model based term. In practice, a trade-off is accepted opposing accuracy against the approximation level, varying from exact, physically correct mathematical formalisms to purely-empirical approaches. The main categories to be mentioned are de novo, ab initio electronic structure calculations, semi-empirical methods and molecular mechanics based models. The mathematical model accurately describing molecular systems is formulated upon the Schrödinger equation, which makes use of molecular wavefunctions for modeling the spatio-temporal probability distribution of constituent particles (Dorsett and White 2000). It should be noted that, though offering the most accurate approximation, the Schrödinger equation cannot be accurately solved for more than two interacting particles. For aspects related with its resolution, please consult (Islas and Schober 2003; Moore and Reich 2003). Ab initio (first principles) calculations rely on quantum mechanics for determining different molecular characteristics, comprising no approximations and with no a priori required experimental data. Molecular orbital methods make use of basis functions for solving the Schrödinger equation. The high computational complexity of the formalism restricts their application area to systems composed with a reduced number of atoms. Semi-empirical methods substitute computationally expensive segments by approximating ab initio techniques. A decrease in the time required for calculus is obtained by employing simplified models for electron-electron interactions: extended Hückel model, neglect of differential overlap, neglect of diatomic differential overlap, etc. Extended explanations for the exposed directions are available via (Dorsett and White 2000; White et al. 2000; Sherwood 2000; Neumaier 1997). Empirical methods rely upon molecular dynamics (classical mechanics based methods), and were introduced by Alder and Wainwright (1957) and Alder and Wainwright (1959). After more than a decade, protein simulations were initiated on the Bovine Pancreatic Trypsin Inhibitor (BPTI) (McCammon et al. 1977). They often represent the only applicable technique for large molecular systems, namely, proteins and polymers, and do not make use of quantum mechanics formalism. They rely solely upon classical Newtonian mechanics. As to the basis of the considered approach, we should mention that, according to recent results (Lattman 2001; Bonneau et al. 2001), empirical methods exceed in terms of quality the ab initio ones. Conceptually, molecular dynamics models do not dissociate atoms into electrons and nuclei but regard them as indivisible entities. Finally, hybrid and layered methods exist (Vreven et al. 2003; Kikuchi et al. 2002; Nakano et al. 2001), connecting several of these techniques through various computing architectures. This is made in an attempt to obtain accurate results

123

1188

at low computational costs, and, consequently, in a reduced period of time.

3 Adaptive simulated annealing The main directions to improve the basic SA algorithm are modifications on the generation of the probability density function, the temperature schedule or parallel and hybridization schemes. The ASA algorithm presented in the work of Ingber (1993a) relies on periodically reannealing the temperature while modifying different control parameters. This is done in concordance with the explored landscape. A simulated quenching technique may also be employed. Ingber’s adaptive variant samples the ergodic space in a D + 1 dimensional space, where D represents the cardinality of a solution, i.e., the number of configuration parameters. The additional element stands for the associated fitness value. Adaptive SA represents an outcome of a previous method, very fast simulated reannealing (VFSR). VFSR exploits the characteristics of a specific-designed generating function, to allow for an exponential faster annealing process, as compared to Boltzman annealing. Adding a reannealing step permits the adjustment of the algorithm’s control parameters, as influenced by the sensitivity measured on the multidimensional space, for each of the configuration parameters which model a solution. Next, a few basic notions are exposed in order to set the basis for formally describing the phases of an ASA. Please refer to the work of Ingber for detailed descriptions, including comparison and test case studies (Ingber 2001, 1996, 1993a,b; Ingber and Rosen 1993). The associated source code for the method is publicly available. A simplified form of simulated annealing, a Boltzman annealing algorithm, consists of: (1) a probability density function of the state space, g(x); (2) an acceptance probability function, h(E); and (3), an annealing schedule T (k). The annealing schedule is defined over a number of discrete steps, serving as control parameter for the acceptance probability function. The acceptance function offers a quantification for the probability of performing a transition from an E k energy state to a new state with energy E k+1 , being defined as follows: exp(−E k+1 /T ) h(E) = exp(−E k+1 /T ) + exp(−E k /T ) 1 = 1 + exp(E/T ) 1 ≈ exp(−E/T ), E = E k+1 − E k 1 + exp(E/T ) This function allows for high rates at high temperatures while leading to low rates with the decrease of temperature, thus a bias towards improved solutions is induced. High

123

A.-A. Tantar et al.

temperatures determine high acceptance rates with low discrimination between solutions. The method acts like a global search procedure whistl local search is performed at low temperatures. Considering a classical Gaussian–Markov system with a probability density state space function g(x) defined as g(x) = (2π T )−D/2 exp[−x 2 /(2T )], for an appropriate initial temperature T0 , the global minimum can be found if the temperature is decreased no faster than T (k) = T0 / ln k. The main difficulty in designing a Boltzman SA consists in determining the starting temperature as well as an efficient schedule for the problem under study. In practice, a T0 / ln k schedule does not offer a fast enough annealing. An exponentially decreasing schedule is preferred, for example, T (k) = T0 exp((c − 1)k), T (k) = c T0 , with 0 < c < 1, e.g., c ≈ 0.98. It should be noted that, although the algorithm offers faster cooling schedules, the proof of statistical asymptotic convergence does not stand for the aforementioned exponential annealing schemes. As a general formalism, employing Ingber’s notations, we consider a minimization problem under the following form: min F(α), α = (α1 , ..., αn ),  = {α : Ai ≤ αi ≤ Bi } , α∈

where α = (α1 , ..., αn ) represents an ensemble of control parameters identifying a solution in the search space. Moreover,  denotes a set of bounded solutions, defining the interest area, for each parameter αi , a lower and an upper limit being specified— Ai and Bi , respectively. According to the Ingber’s proof, for a generating function gT defined as gT (y) =

D  i=1

 1 ≡ gTi (y i ) i 2(|y | + Ti ) ln(1 + 1/Ti ) D

i=1

the associated cumulative probability distribution can be written as follows: y

1

G T (y) =

y

D

... −1

G iT (y i ) =

dy 1 ...dy D gT (y  ) ≡

G iT (y i )

i=1

−1

1 + 2

D 

sgn(y i )

ln(1 + |y i |/Ti )

2

ln(1 + 1/Ti )

In this case, the global optimum is statistically attainable for a Ti (k) = T0i exp(−ci k 1/D ) annealing schedule: ∞  k0

gk ≈

∞  k0



D  i=1

1 2|y i |ci



1 =∞ k

A grid-based genetic algorithm

1189

The ASA generation function is defined as follows, over a set of uniformly generated random variables, u i ∈ U [0, 1]: i αk+1 =αki +y i (Bi −Ai ), where y i is defined as

y i = sgn(u i −1/2) Ti [ (1+1/Ti )|2u

i −1|

− 1 ], y i ∈ [−1, 1]

The acceptance of newly generated solutions is performed according to the Metropolis criterion as previously exposed for the Boltzman SA. In addition to the standard SA approach, a reannealing step is periodically applied making use of pre-computed sensitivities which offer information describing the landscape structure. For each, αi ∈ α the associated sensitivity si is computed by including gradient information. No restriction is imposed on defining different sensitivity measures. At a specified number of accepted solutions, reannealing takes place using the pre-computed sensitivities, for updating the Tik factors and the ki step indexes as follows:   ln(T0i /Tik ) D smax ∂F   Tik = Tik , ki = , si = si ci ∂αi Considering also simulated quenching, the annealing schedule may be re-defined for including quenching factors: Q i /D

Tki = T0i exp(−ci ki

launched ASAs were randomly initialized. The algorithm’s implementation details are presented later in the following sections.

), where ci = m i exp(−n i Q i /D)

m i , n i defining control parameters which may be adjusted for fine-tuning the algorithm for a specific problem. The adaptive features of the algorithm are given by the sensitivity derived factors which are used in generating new solutions. The reannealing step updates each of the factors in concordance with the computed sensitivities also adjusting the current schedule temperature. In this study several approaches were considered for the ASA design. First, when generating a new solution, we considered applying reduced step perturbations to only one control parameter of the previous solution instead of perturbing the entire array of parameters. Furthermore, the magnitude of the perturbation has been adjusted to allow distant solutions for high-energy conformations but to bias towards local solutions for low-energy conformations. Moreover, in case the value of the perturbed control parameter falls out of its associated interval for a specified number of trials, a uniform value is sampled from its domain. In order to explore for optimized initialization parameters, a meta-algorithm has been developed deploying ASA algorithms on distributed computing resources. Each of the deployed algorithms has been iteratively executed on a specified PSP benchmark conformation, the final score being given as a mean of the obtained energies. The results presented in the final section were obtained by using the ASA algorithm initialized with the optimized control parameters, as well as with randomly uniform parameters. For the former case, a bias has been set: only one out of five synchronously

4 Parallel metaheuristics for solving the PSP 4.1 Encoding of the conformations The algorithmic resolution of the PSP, in heuristic context, is directed through the exploration of the molecular energy surface. The sampling process is performed by altering the structure of the molecule, i.e., backbone structure, associated torsional angles, etc., in order to obtain different structural variations. Different encoding approaches can be found in the literature. The trivial approach is the direct coding of atomic Cartesian coordinates (Rabow and Scheraga 1996). The main disadvantage of direct coding is the fact that, it requires filtering and correcting mechanisms inducing non-negligible affected times. Additionally, hydrophobic/hydrophilic models have been developed by using amino-acid based codings (Krasnogor et al. 1999). Several variations exist, making use of all-heavy-atom coordinates, Cα coordinates or backbone and residues atom coordinates, where amino-acids are approximated by their centroids. For our method, an indirect, less error-prone, torsional angle based representation has been preferred. More specifically, each individual is coded as a vector of torsion angle values (see Fig. 2). The defined number of torsion angles represents the degree of flexibility. Apart from torsion angles which move less than a specified parameter, e.g., within a specified interval, all torsion angles are rotatable. Rotations are performed in integer increments. Energy quantification of covalent bonds and nonbonded atoms interactions are used as an optimality evaluation criterion.

Fig. 2 Chromosome encoding based on specifying the backbone torsional angles

123

1190

A.-A. Tantar et al.

Table 1 Scoring function quantifying the inter-atomic interactions  E= K b (b − b0 )2 bonds  + K θ (θ − θ0 )2 +

angles  tor sions

+ +



V an der W aals  Coulomb

+



desolvation

K φ (1 − cos n(φ − φ0 )) K iaj di12 j



K ibj di6j

qi q j 4π εdi j K qi2 V j +q 2j Vi di4j

4.2 Scoring function The scoring function is computed by making use of bonded atoms energy and non-bonded atoms energy through an independently developed CVFF-based force field. The quantification of energy is performed by using empirical molecular mechanics, as depicted in Table 1. An extensive discussion on force fields designed for protein simulations, with in-depth details, is offered in Ponder and Case (2003). The first part of the mentioned work covers the evolution of the force fields, starting from the 1980s and discussing various formulations which include the Amber, CHARMM and OPLS force fields. A specific constant is associated with each type of interaction, notationally denoted by K inter , i.e., K b , K θ , K φ , K iaj for bonds, angles, torsion angles and Van der Waals interactions, respectively. An optimal value for the considered entity (bond, angle, torsion) is introduced in the equation as reference—(A − A0 ). A0 specifies the natural, experimentally observed value, while A stands for the experimentation Fig. 3 Free energy surface for the tryptophan-cage protein around a deep optima. High-energy points are depicted in light colors, the low-energy points being identified by the darker areas

123

value. More specific, b represents the bond length, θ the bond angle, φ the torsion angle and qa , di j and V p the electrostatic charge associated with a given atom, the distance between the i and the j atoms and a volumetric measure for the p atom, respectively. Although a part of the designed algorithms, we consider as out of scope for this study to enter into further details on the employed force field. An example of free energy surface for the tryptophan-cage is given in Fig. 3. The sampling values used for constructing the free energy surface were computed by employing the Gibbs free energy over an ensemble of locally sampled conformations E i : G=−

1  −β Ei ln e β

The lighter areas on the obtained surface correspond to high-energy conformations. The hyper-surface, generated by varying the entire set of torsional angles has an extremely rough landscape, with a large number of local optima. 4.3 Conjugated gradient local search The envisaged methods may benefit from relying on a hybrid architecture, combining, for example, an EA with a torsiondriven conjugated gradient-based local search method. The exploration/intensification capabilities of the exploratory algorithms, do not suffice as paradigm when addressing rough molecular energy function landscapes. Small variations of the torsion angle values may generate extremely different individuals, with respect to the fitness function. As a consequence, a nearly optimal configuration, considering the torsion angle values, may have a very high energy value, and

A grid-based genetic algorithm

1191

thus, it may not be taken into account for future iterations of the algorithm. In order to correct the above exposed problem, a conjugated-gradient based method may be applied for local search, alleviating the drawbacks determined by the conformation of the landscape. As a counterpart, gradient-based methods have the disadvantage of getting easily trapped in local optima requiring a fine-tuned exploration balance. 4.4 Adaptive simulated annealing algorithm As the intrinsic SA algorithm is difficult to parallelize, a synchronous multi-start model has been employed for launching several SAs in parallel on a specified set of solutions. The adaptive algorithm was first executed as a stand-alone procedure, having as an initial starting point random generated conformations. Later, it was executed as part of a hybrid model. For the first case, the system starts from an initial disordered state, gradually following an adaptive cooling schedule and maintaining the thermodynamic equilibrium. The stand alone execution case served only as a preliminary tuning procedure. Modifications of the current state are accepted on a Boltzmann probability distribution. In the second case, the method has been executed on solutions which represent the outcome of the evolutionary process of the GA. The initial control parameters of the ASA method were initially drawn from a random uniform distribution. Finally, a meta-algorithm has been employed in order to search for optimized control parameters. As the reannealing step requires sensitivity measures, analytic gradient information was used for computing the si values in a first step. A second approach considers numeric gradient values computed by performing reduced perturbations:    F(α ∗ +λi δ)− F(α ∗ )   , smax = max{si , 1 ≤ i ≤ D} si =   δ λi = (λi0 , . . . , λi D ), λi j =

conformation to be optimized passed as parameter to the adaptive simulated algorithm. 4.5 Genetic algorithm The pseudo-code in Algorithm 4.5 exposes the generic components of an EA. Considering GAs, a random population of individuals is iteratively evolved in generations through different strategies in order for convergence to be achieved. The genotype represents the raw encoding of individuals while the phenotype encloses the coded features. For each generation, individuals are selected on a fitness basis and genotype alteration being performed by means of genetic operators: crossover and mutation. The application of the genetic operators has as result, the modification of the population’s structure. This results in the intensification of the exploration inside a region of interest or, a diversification purpose. Algorithm 4.5. EA pseudo-code. Generate(P(0)); t := 0; while not Termination_Criterion(P(t)) do Evaluate(P(t)); P  (t) := Selection(P(t)); P  (t) := Apply_Reproduction_Ops(P  (t)); P(t + 1) := Replace(P(t), P  (t)); t := t + 1; endwhile

The presented GA is parallelized in a hierarchical manner, including three levels of parallelism: the island model, the parallel evaluation of the population and the synchronous multi-start model. A conceptual simplistic model is shown

0, i = j 1, i = j

where α ∗ stands for the best found value at a given execution point, δ representing a small perturbation step value. At the end of a complete exploration cycle, after sampling a specified number of conformations for a given temperature, temperature update occurs: 1/D

ki = ki + 1, Tik = Ti0 exp(−cki

)

Each ki step was set to 0 as initial value, the associated Ti factors (except for the initial acceptance temperature) being set to 1. The initial acceptance temperature was set to be equal to γ F(α) where γ represents a pre-defined multiplication factor ranging from 1e − 3.0 to 1, α representing the

Fig. 4 Island deployment model and parallel evaluation of the individuals: multiple algorithms are executed independently in a concurrent manner, migrations of the individuals occurring on a periodic basis. Parallel evaluation of the population is also employed by each algorithm—for simplicity the figure depicts only one algorithm performing the parallel evaluation step

123

1192

in Fig. 4. The parallel affinity of the EAs represents a feature determined by their intrinsic population-based nature. First, several GAs cooperate by exchanging their genetic material, emigrants and immigrants, at a predefined number of iterations [the parallel island model (Alba et al. 2005)]. The exchange is performed in an asynchronous manner, i.e., no synchronization between the execution/generations of the algorithms is imposed. This is an important aspect to consider, when there is an interest for the algorithms to converge at different rates. The emigrant conformations are selected through a stochastic tournament technique. Immigrant individuals are integrate by replacing the worst individuals in the population. At each migration phase, one sixth of the population is selected for the exchange. In addition, at each generation, the best obtained conformation is preserved. The migrating individuals contribute to the diversity maintenance while assuring a coordinated convergence of all islands. Second, as the fitness function of each GA is timeintensive, the fitness evaluation phase is parallelized [the parallel evaluation of the population model (Alba et al. 2005)]. The granularity of the problem, as a counterpart term for the computationally expensive fitness evaluations, biased the resolution pattern towards a parallel, cooperative island-model approach. As a consequence, several populations evolve on a central machine with fitness function evaluations being distributed on remotely available computing units. We have to note that, the evaluation of the fitness function consists of several stages, including the calculation of Cartesian atomic coordinates, inter-atomic distances determination, etc. Third, for the hybrid approach, at the end of each generation, local search is performed in parallel on a set of selected solutions [the synchronous multi-start model (Alba et al. 2005)]. Also, one tenth of the population is subjected to adaptive simulated annealing local search. The initial individuals are selected by applying a random-selection strategy. In order to exploit the local search capabilities of the conjugated gradient local search method, two different sets of operators were designed. For each operator type, crossover and mutation, a simple approach was followed: a twopoint crossover and, respectively, a one-point torsion angle mutation. In addition, we designed operators that apply local search method on the offsprings. Crossover generates two new conformations from two specified parents. Next, we optimize the respective offsprings by applying local search. Similarly, for mutation, after applying a random torsion angle variation, on a random chosen angle, local search is also performed. One potential drawback of this technique is that it may lead to premature convergence. A careful balancing of the two sets of operators is required to allow, at the same time, diversity and convergence. For a complete overview on parallel and grid specific metaheuristcs, please consult (Cahon et al. 2006; Talbi 2002; Alba et al. 2005; Cahon et al. 2004).

123

A.-A. Tantar et al.

5 ParadisEO based implementation ParadisEO2 (Cahon et al. 2006; Melab et al. 2006) is a framework dedicated to the reusable design of parallel hybrid metaheuristics by providing a broad range of features, including EAs, local search methods, parallel and distributed models, different hybridization mechanisms, etc. The rich content and utility of ParadisEO increases its usefulness. ParadisEO is a C++ LGPL white-box open source framework, based on a clear conceptual separation of the metaheuristics from the problems they are intended to solve. This separation, and the large variety of implemented optimization features, allow a maximum code and design reuse. Changing existing components and adding new ones can be easily done, without impacting the rest of the application. ParadisEO is one of the rare frameworks that provide the most common parallel and distributed models, portable on distributed-memory machines and shared-memory multiprocessors, as they are implemented using standard libraries such as MPI, PVM and PThreads. The models can be exploited in a transparent way - one has just to instantiate its associated ParadisEO components. The user has the possibility of choosing, by a simple instantiation, the MPI or the PVM for the communication layer. The models have been validated on academic and industrial problems, and the experimental results demonstrate their efficiency (Cahon et al. 2004). The architecture of ParadisEO is layered, as it is illustrated in Fig. 5. From a top-down view, the first level supplies the optimization problems to be solved by using the framework. The second level represents the ParadisEO framework, including the optimization solvers, embedding single and multicriterion metaheuristics (EAs and local search). The third level provides interfaces for standard MPI based programming. At this level, virtually any standard conforming MPI distribution may be placed as layer: MPICH-VMI (Pant and Jafri 2004) has been chosen for our specific case. The fourth and lowest level supplies communication and resource management services (in this case, the VMI interfacing component which offers support for MPICH-VMI). The implementation relies on invariant elements provided by the ParadisEO framework, offering support for the insular model approach, as well as for distributed and parallel aspects concerning the parallel population evaluation. In this context, parallelization related aspects are transparent, the focus being oriented on the algorithm-specific elements. The main steps to construct the algorithm consist in specifying the: encoding of the individuals, specific operators and fitness function. Furthermore, elements concerning selection mechanisms and replacement strategies must be specified,

2

http://paradiseo.gforge.inria.fr

A grid-based genetic algorithm

1193

from Sophia; Capricorne and Sagittaire from Lyon; and GDX cluster from Orsay. In overall, the gathered resources cumulated 964 CPUs. The GRID is designed to allow a per reservation utilization of the resources. No interferences may occur during the experiments with resource allocation associated only with the user which made the reservation. The demanded resources are completely available during the entire experimentation time, unless, in extremis exceptional events occur. Detailed information regarding all the involved functional aspects may be accessed on the GRID5000 web site: https:// www.grid5000.fr. The considered benchmark instances were the tryptophancage mini-protein (PDB ID 1L2Y), tryptophan-zipper (PDB ID 1LE1) and the α-Cyclodextrin molecular complex (which is not a protein but it represents a special interest due to its toroidal structure and its applications in drugs development). Figure 6 offers an overview of the structure of each benchmark molecule.

Fig. 5 A layered architecture of ParadisEO

along with configuration parameters (e.g., number of individuals and number of generations).

6 Experiments and results The underlying support for performing the experiments was GRID5000, a French nation-wide experimental grid, connecting several sites which host clusters of PCs interconnected by RENATER3 (the French academic network). GRID5000 is promoted by CNRS, INRIA and several universities4 [for details, we refer the reader to (Bolze et al. 2006)]. At the time of this writing, the GRID is gathering more than 4000 processors with more than 100 Tb of non-volatile storage capacity. Inter-connections sustain communications of 10 Gbps. The project currently aims at interconnecting sites with hundreds to thousands of processors. For the moment, Grid5000 is regrouping nine centers: Bordeaux, Grenoble, Lille, Lyon, Nancy, Orsay, Rennes, SophiaAntipolis, and Toulouse. The runs were executed on three sites (Sophia, Lyon and Orsay) gathering six clusters, namely, Azur, Helios, and Sol 3

http://www.renater.fr

4

CNRS - http://www.cnrs.fr/index.html; INRIA - http://www.inria.fr.

Fig. 6 Structural overview of the considered benchmarks—α-Cyclodextrin, tryptophan-zipper (1LE1) and tryptophan-cage (1L2Y )

123

1194

Considering the number of involved distributed resources, no comparable study has been carried out, to the extent of our knowledge. The focus of our study is mainly set on validating the described algorithms from both exploratory capabilities and scalability perspectives. As the study confirms the scalability of the approach, the proposed algorithmic resolution can be used as basis for further more complex schemes. 6.1 Configuration of the algorithms 6.1.1 Genetic algorithm The algorithm is set to iterate for 100 generations, with a population size of 300 individuals. Two sets of operators are employed with the following probabilities: 0.95—crossover operator, 0.15—local search crossover, 0.05—mutation operator and 0.05—local search mutation. In addition, overall probabilities were associated for each type of operators: 1.0 and 0.1 for the crossover and the mutation, respectively. This results, for example, in a probability of 0.95 for applying the simple crossover operator and of 0.15 for applying the local search crossover. A stochastic tournament selection strategy was enclosed in the algorithm to select the offsprings. The replacement phase for back-integrating the offsprings is performed by using also a stochastic tournament strategy. Another important element is the parameterization of the asynchronous migration. This has important consequences on the algorithm. Frequent migrations may result in a premature convergence while distant migrations fall in the opposite case. In our situation, one sixth of the population migrates at each five generations, in asynchronous manner (migrations occur at different times, depending on the evolutionary search process). A tournament selection strategy is being applied for selecting the emigrant individuals while the immigrant discard the worse individuals in the target population. Migrations are performed inside a ring topology. Each algorithm has a source island for receiving individuals and a destination island for sending the emigrant individuals. The ring topological model, depending on the frequency of the migrations, balances diversity and coordination. For a large ring, a migrant surviving the selection process of all the algorithms will be integrated in his originating population after several generations. As a counterpart, low coordination between the algorithms is obtained. At the opposite end, a small ring does not virtually assure for diversity while maintaining a strong coordination of the algorithms. For the hybrid approach model, at the end of each generation, a number of conformations equal to one tenth of the population are randomly selected, becoming subject to parallel improvement via synchronously launched adaptive simulated annealing algorithms - the multi-start model. The reinsertion of the resulting conformations consists of a worse-replacement strategy.

123

A.-A. Tantar et al.

6.1.2 Adaptive simulated annealing A simple distributed EA is used for finding the optimal parameterization of the ASA. The algorithm has been executed inside the same grid environment. In this case, each individual of the meta-algorithm represents an encoding of the different parameterization values, i.e., initial temperature, number of accepted solutions, quenching factor, etc. The fitness of each individual has been computed as the average improvement obtained after running the ASA on a set of five known difficult conformations. A maximum number of 3,000 samplings was set, for each of the five conformations, on every fitness evaluation run. As an example, one of the chosen resulting parameterizations had a reannealing limit of 111 accepted conformations with 97 sampling points at each temperature and a large quenching factor of 33.16. For the synchronous multi-start execution, two approaches were considered. The ASA algorithm is either executed in order to sample 3,000 solutions in one run, or, in alternative, 10 short runs with 300 samples. In addition, at the end of one ASA run, the outcome conformation is further optimized by applying a 30 step gradient. As a general note, it should be mentioned that, the assignation policy process employed for distributing the parallel tasks to the available resources is transparent for the algorithms. The underlying framework maintains a pool of the available resources used for launching tasks. The assignation of a task to a specific resource attracts its exclusion from the pool until the termination of the task. As each island launches parallel tasks with comparable granularity no significant delays is noticed in the algorithm’s evolution. 6.2 Experimental outcomes All the best scored solutions obtained for the considered benchmarks, using the ASA-hybridized EA, are bellow the native reference energy. This is not a guarantee of a correct solution, but rather a measure of the exploration capabilities of the algorithm. The best scored conformation for the 1L2Y protein had an energy of 28.9 kcal mol−1 (reference energy: 46.6 kcal mol−1 , ball and stick representation given in Fig. 7, −3.5 kcal mol−1 (reference energy: 11.1 kcal mol−1 ) for the 1LE1 protein and 161.6 kcal mol−1 for α-Cyclodextrin (242.4 kcal mol−1 ). Although it comes as a result of the force field parameterization, the obtained low-energy conformations sustain the chosen approach as a viable technique. An important improvement is obtained by applying the gradient local search method. The main disadvantage is that it blocks easily in local optima points. As a comparison, for the α-cyclodextrin, the set of solutions found by the GA hybridized with the gradient method, had an average of 201.37 kcal mol−1 (std dev. 21.82 kcal mol−1 ), with a maximum of 243.05 kcal mol−1 and a minimum of

A grid-based genetic algorithm

1195 160 ASA Fitness Improvement

Number of conformations

140 120 100 80 60 40 20 0 0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Fitness improvement

160

ASA Improvement < 1%

140

Energy (kcal mol-1)

161.69 kcal mol−1 while the genetic algorithm alone gave a set of solutions with an average of 3790.56 kcal mol−1 (std dev. 708.54 kcal mol−1 ) and a maximum, minimum of 5845.27 kcal mol−1 , 2,470 kcal mol−1 , respectively. For both approaches, the maximum number of generations was maintained at 100. The number of steps for the gradient method was set to 30. A number of 30 independent executions were performed for the gradient hybridized GA as well as for the GA alone. The presented results depict, in a first step, the outcomes of the ASA alone, executed on completely random generated conformations. The second part discusses the hybridization results. In the case of the ASA algorithm launched on random conformations, a bias can be observed. Figure 8 shows it towards large improvements ratios, corresponding to the initial phases of the algorithm, and small, near-zero improvements ratios, corresponding to the final/local optima exploration phases. Around 16% of the conformations allowed for an improvement above 10% whistl only 3% attained and improvement rate above 90%. Different line-patterns may be observed in Fig. 9. The near-zero improvement conformations, which give a vertical line-pattern, correspond to strong local optima which act as attractors. The horizontal line-patterns (more visible near 70, 60 and 46 kcal mol−1 ) correspond to near local-optima solutions. They were generated under the influence of the previously described conditions. The evolution of the ASA algorithm on solutions generated by the evolutionary algorithm are shown in Fig. 10. The clusters which may be observed at low energy values (final energy around 40 and 60 kcal mol−1 ) indicate basins in the landscape containing strong attractors. Figure 11 depicts the evolution of the three ASA hybridized EAs. The algorithms are connected by following a ring insular-model topology.

Fig. 8 Experimental results: histogram indicating for a given threshold the number of improved conformations. The conformations which had an improvement ratio bellow 0.1 are not included in the figure

120

100

80

60

40

0

0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01

ASA Fitness Improvement

Fig. 9 Experimental results: conformations with an improvement ratio bellow 0.1; different line-patterns may be extracted from the figure, relating to different strong local optima 180

Tryptophan-cage (1L2Y) - ASA

160

Final Energy (kcal mol-1)

Fig. 7 Experimental results: tryptophan-cage structure (28.9 kcal mol−1 )

140 120 100 80 60 40 20 20

40

60

80

100

120

140

160

180

200

Initial Energy (kcal mol-1)

Fig. 10 Experimental results: ASA improvement for the 1L2Y protein. The X axis corresponds to the initial energy while the final energy obtained after applying the local search is depicted on the Y axis

123

1196

A.-A. Tantar et al. 200

1

Tryptophan-cage (1L2Y) - GA+ASAwGrad

180

0.8

Fitness (de)correlation

160

Energy (kcal mol-1)

Tryptophan-cage (1L2Y) - GenAlg

140 120 100 80

0.6 0.4 0.2 0 -0.2

60 50

100

150

200

250

-0.4

300

0

10

20

30

40

Generation

Fig. 11 Experimental results: hybrid model including three insularmodel genetic algorithms with asynchronous migrations and multi-start synchronous local search improvement at the end of each generation

Energy values bellow the native reference conformation energy have been obtained for all the considered benchmarks. This proves the efficiency of the hybridization scheme. Even under these conditions, the high combinatorial constraints and the large conformational space still impose limitations on the given approach. In order to alleviate these, the robustness of the solutions should be considered. As free energy calculations may offer a representative score for a large ensemble

123

70

80

90

100

Tryptophan-cage (1L2Y) - GA+ASAwGrad

Fitness (de)correlation

0.15 0.1 0.05 0 -0.05 -0.1 -0.15 -0.2 -0.25

10

20

30

40

50

60

70

80

90

100

Generation

Fig. 13 Experimental results: fitness (de)correlation measures for the hybrid model, genetic algorithm + adaptive simulated annealing and conjugated gradient

0.4

Fitness-Distance Correlation (population)

7 Conclusions and future work

60

Fig. 12 Experimental results: execution of an island-model genetic algorithm hybridized with a conjugated gradient local search method 0.2

Asynchronous migrations are performed periodically. Moreover, at the end of each generation, a synchronous multi-start execution of the ASA component is performed. Figure 12 represents a fitness (de)correlation measure for the island-executed GAs alone (no ASA hybridization) over all the generations, offering an overview of the fitness dynamics along the execution path. The measure is computed by using the highest ranked individual in the population as reference. The fitness-distance correlation measurements offers information regarding the convergence rate of the algorithm, thus given the possibility of deciding on its different parameters. It can be observed the tendency of convergence towards the upper limit, near the 100 generation. The ASA hybridized variant comports an improved convergence rate, as depicted in Fig. 13. To a certain extent, this may be an effect of the fact that SA algorithms do not necessarily follow the closest steepest path. They also search for distant improved solutions, thus increasing conformations diversity. The evolution of the convergence tendency is also visible in Fig. 14. Several steps on the energy level scale correspond to convergence phases occurred during the execution of the algorithm.

50

Generation

Tryptophan-cage (1L2Y) - GA+ASA

0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4

0

50

100

150

200

250

300

Fitness value (kcal mol-1)

Fig. 14 Experimental results: fitness-distance correlation measures for the GA hybridized with the ASA. As above, the same basins can be identified by the formation of clusters

A grid-based genetic algorithm

of solutions, a bounded region in the conformational space can be evaluated as a single entity through a set of internal samples. A meta-algorithm is envisaged acting on bounded regions. The designed algorithms can then be employed as exploratory techniques. The modified approach only requires the replacement of the fitness evaluation with a sampling algorithm. This algorithm must be capable of offering a high confidence free energy evaluation for the considered bounded region it explores. In addition, an interesting direction to be taken should consider a centralized coordinated algorithm. This approach should allow for the exploration process of several parallel algorithms, guided in concordance with the improvement rate. Finally, adaptive features and operators inside the genetic algorithm should allow better performance alleviating the burden of finding optimized parameters. References Alba E, Luque G, Talbi E-G, Melab N (2005) Metaheuristics and parallelism. In: Alba E (ed) Parellel metaheuristics: a new class of algorithms, Chap. 4. Wiley Series on Parallel and Distributed Computing. Wiley, New Jersey, pp 79–104. ISBN 0-471-67806-6 Alder BJ, Wainwright TE (1957) Phase transition for a hard sphere system. J Chem Phys 27:1208 Alder BJ, Wainwright TE (1959) Studies in molecular dynamics I: General method. J Chem Phys 31:459 Bernstein FC, Koetzle TF, Williams GJ, Meyer E, Bryce MD, Rogers JR, Kennard O, Shikanouchi T, Tasumi M (1977) The protein data bank: a computer-based archival file for macromolecular structures. J Mol Biol 112:535–542 Bolze R, Cappello F, Caron E, Dayde M, Desprez F, Jeannot E, Jegou Y, Lanteri S, Leduc J, Melab N, Mornet G, Namyst R, Primet P, Quetier B, Richard O, Talbi E-G, Touche I (2006) Grid’5000: a large scale and highly reconfigurable experimental grid testbed. Int J High Performance Comput Appl 20(4):481–494 Bonneau R, Tsui J, Ruczinski I, Chivian D, Strauss CME, Baker D (2001) Rosetta in CASP4: progress in ab-initio protein structure prediction. Proteins 45:119–126 Calland P-Y (2003) On the structural complexity of a protein. Protein Eng 16(2):79–86 Cahon S, Melab N, Talbi E-G (2004) ParadisEO: a framework for the reusable design of parallel and distributed metaheuristics. J Heuristics 10:357–380 Cahon S, Melab N, Talbi E-G (2005) An enabling framework for parallel optimization on the computational grid. In: Proceedings of 5th IEEE/ACM international symposium on cluster computing and the grid (CCGRID’2005), Cardiff, 9–12 May Crescenzi P, Goldman D, Papadimitriou CH, Piccolboni A, Yannakakis M (1998) On the complexity of protein folding. J Comput Biol 5(3):423–466 Dorsett H, White A (2000) Overview of molecular modelling and ab initio molecular orbital methods suitable for use with energetic materials. Department of Defense, Weapons Systems Division, Aeronautical and Maritime Research Laboratory, DSTOGD-0253, Salisbury South Australia, September Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor Ingber L (1993a) Adaptive simulated annealing (ASA), global optimization C-code, Caltech Alumni Association Ingber L (1993b) Simulated annealing: practice versus theory. Math Comput Model 11(18):29–57

1197 Ingber L (1996) Adaptive simulated annealing (ASA): lessons learned. Control Cybern 25(1):33–54 Ingber L (2001) Adaptive Simulated Annealing (ASA) and PathIntegral (PATHINT) algorithms: generic tools for complex systems, ASA-PATHINT Lecture Plates, Lester Ingber Research Ingber L, Rosen B (1993) Genetic algorithms and very fast simulated reannealing: A comparison. Oper Res Manage Sci 5(33):523 Islas AL, Schober CM (2003) Multi-symplectic integration methods for generalized Schrödinger equations. Future Generation Comput Syst 19:403–413 Kikuchi H, Kalia RK, Nakano A, Vashishta P, Iyetomi H, Ogata S, Kouno T, Shimojo F, Tsuruta K, Saini S (2002) Collaborative simulation grid: multiscale quantum-mechanical/classical atomistic simulations on distributed PC clusters in the US and Japan. IEEE, New York Krasnogor N, Hart W, Smith J, Pelta D (1999) Protein structure prediction problem with evolutionary algorithms. In: Proceedings of the genetic and evolutionary computation conference Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680 Lattman EE (2001) CASP. Proteins 4( 44):399 Levinthal C (1969) Mossbauer spectroscopy in biological systems. In: DeBrunner JTP, Munck E (eds) Proc. How to Fold Graciously). University of Illinois Press, proceedings of a meeting held at Allerton House, Monticello, pp 22–24 McCammon JA, Gelin BR, Karplus M (1977) Dynamics of folded proteins. Nature (Lond.) 267(5612):585–590 Melab N, Cahon S, Talbi E-G (2006) Grid computing for parallel bioinspired algorithms. J Parallel Distrib Comput (JPDC) 66(8):1052– 1061 Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. doi:10.1063/1.1699114 Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson AJ (1999) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19:1639–1662 Moore BE, Reich S (2003) Multi-symplectic integration methods for Hamiltonian PDEs. Future Generation Syst 19:395–402 Nakano A, Kalia RK, Vashishta P, Campbell TJ, Ogata S, Shimojo F, Saini S (2001) Scalable atomistic simulation algorithms for materials research. SC2001 November 2001, Denver (c) 2001. ACM, New York Neumaier A (1997) Molecular modelling of proteins and mathematical prediction of protein structure. SIAM Rev 39:407–460 Ngo JT, Marks J (1992) Computational Complexity of a Problem in Molecular-Structure Prediction. Protein Eng 5(4):313–321 Pant A, Jafri H (2004) Communicating efficiently on cluster based grids with MPICH-VMI. In: 2004 IEEE International Conference on Cluster Computing (CLUSTER’04), pp 23–33 Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE (2004) UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem 25:1605–1612 Ponder JW, Case DA (2003) Force fields for protein simulations. Adv Protein Chem 66:27–85 Rabow A, Scheraga H (1996) Protein. Science 5:1800–1815 Sherwood P (2000) Hybrid quantum mechanics/molecular mechanics approaches, modern methods and algorithms of quantum chemistry, proceedings, 2nd edn. In: Grotendorst J (ed) John von Neumann Institute for Computing, Jülich, NIC Series, vol 3. ISBN 3-00-005834-6, pp 285–305 Tantar A-A, Melab N, Talbi E-G, Toursel B (2006) Solving the protein folding problem with a bicriterion genetic algorithm on the grid. CCGRID, Vol. 2, ISBN 0-7695-2585-7, 43, 2006

123

1198 Tantar A-A, Melab N, Talbi E-G, Parent B, Horvath D (2007) A parallel hybrid genetic algorithm for protein structure prediction on the computational grid. Future Generation Comput Syst 23:398–409 Talbi E-G (2002) A taxonomy of hybrid metaheuristics. J Heuristics 8:541–564 Thomsen R (2003) Flexible ligand docking using evolutionary algorithms: investigating the effects of variation operators and local search hybrids. Biosystems 72:57–73 Vande Waterbeemd H, Carter RE, Grassy G, Kubinyi H, Martin YC, Tute MS, Willett P (1997) Glossary of terms used in computational drug design. Pure Appl Chem 69(5):1137–1152 Vreven T, Morokuma K, Farkas Ö, Schlegel HB, Frisch MJ (2003) Geometry optimization with QM/MM,ONIOM, and other com-

123

A.-A. Tantar et al. bined methods. I. Microiterations and constraints. Wiley Periodicals Inc. J Comput Chem 24:760–769 Westhead DR, Clark DE, Murray CW (1997) A comparison of heuristic search algorithms for molecular docking. J Comput Aided Molec Des 11:209–228 White A, Zerilli FJ, Jones HD (2000) Ab initio calculation of intermolecular potential parameters for gaseous decomposition products of energetic materials. Department of Defense, Energetic Materials Research and Technology Department, Naval Surface Warfare Center, DSTO-TR-1016, Melbourne Victoria 3001 Australia, August

Suggest Documents