An MPI Implementation of the Fast Messy Genetic Algorithm

George H. Gates, Jr.
Air Force Research Laboratory, WL HPC Technical Focal Point, Wright-Patterson AFB, OH 45433
[email protected]

Gary B. Lamont
Department of Electrical and Computer Engineering, Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH 45433
[email protected]

Laurence D. Merkle
Center for Plasma Theory and Computation, Phillips Laboratory, Kirtland AFB, NM 87117
[email protected]

Abstract
The fast messy genetic algorithm (fmGA) belongs to a class of algorithms inspired by the principles of evolution, known appropriately as "evolutionary algorithms" (EAs). These techniques operate by applying biologically-inspired operators, such as recombination, mutation, and selection, to a population of individuals. EAs are frequently applied as optimum seeking techniques, by way of analogy to the principle of "survival of the fittest." In contrast to many EAs, the fmGA consists of several evolutionary phases, each with distinct characteristics of local/global computation. These are explained in the paper. Parallel computational experiments are performed to determine the effectiveness, efficiency, and scaled efficiency of fmGA implementations for several parallel/distributed environments.
The underlying communication libraries are based on the Message Passing Interface (MPI), as well as an NX-based Intel Paragon implementation. Each implementation is evaluated with respect to two selection strategies and two recombination strategies. The optimization problem for these experiments is the minimization of the CHARMM energy model of the pentapeptide [Met]-Enkephalin. With this application, the use of the MPI constructs permits effective and efficient execution on a variety of platforms.
1 Introduction

The native message-passing libraries of various parallel supercomputers, including the Intel Paragon, are syntactically distinct and typically incompatible, despite the fact that the commonly used message-passing operations on the various platforms are essentially identical semantically. As a consequence of this incompatibility, applications implemented using these libraries are not portable, and considerable effort is expended in continually porting applications to next generation supercomputers. The purpose of the Message Passing Interface (MPI) standard [25] is to improve the portability of parallel supercomputer applications.

This paper discusses an MPI-based implementation of the fast messy genetic algorithm (fmGA), and computational experiments which evaluate the effectiveness and efficiency of the implementation on several parallel architectures. The fmGA belongs to the class of evolutionary algorithms, which is discussed in Section 2. Section 3 describes the computational experiments. Section 4 presents an evaluation of the performance of the MPI-based implementation on each of several parallel architectures, as well as that of an earlier NX-based fmGA implementation [7].
2 Evolutionary Algorithms

Many problems in science, engineering, and operations research may be viewed as optimization problems. Informally, an optimization problem involves a number of alternatives, and the solution to the problem is the set of alternatives which maximize or minimize some criterion. Only one general method exists which is guaranteed to obtain the optimum alternative of an arbitrary optimization problem, that being exhaustive search [31] (pure random search being a special case thereof). For many optimization problems of practical interest, the number of alternative solutions prohibits their enumeration. Thus the practitioner is often forced to settle for solutions which are "good enough," even though they are not optimal. When the problem is posed as such, it may be referred to as a semi-optimization problem [30], and the solution techniques employed are called optimum-seeking techniques. Even finding an "acceptable" solution may require exploring a significant part of a large dimensional search space.

The various techniques used to solve optimization problems are part of what has come to be a unified theory of optimization. Some techniques have been known and used for
centuries, while the advent of the high speed digital computer has enabled the application of other techniques which were previously impractical due to the large number of calculations required. It also brought about the development of entirely new techniques designed explicitly to exploit the strengths of the digital computer. Still, problem sizes are limited by processor speed and memory size, and the relative efficiency and effectiveness of existing algorithms are not necessarily preserved by hardware advances.

This last observation is especially important as multiprocessor architectures (parallel, distributed, or otherwise) become more prevalent. Realization of the effectiveness and efficiency improvements made possible by such architectures depends on the design and use of appropriate multiprocessor algorithms. Just as the arrival of single processor computers spawned the development of new techniques, multiprocessor architectures invite the investigation of a new set of optimization tools. A promising set of candidates for such investigation consists of those algorithms inspired by the principles of evolution, known appropriately as evolutionary algorithms (EAs) [1]. These techniques operate by applying biologically-inspired operators, such as recombination, mutation, and selection, to a population of individuals, each of which represents a candidate solution (alternative). The use of EAs as optimum seeking techniques derives from the resulting analogy to the principle of "survival of the fittest." The best known EA is the simple genetic algorithm (sGA) [8]. Another EA, which is historically related to the sGA, is the fast messy genetic algorithm [11].
2.1 Fast Messy Genetic Algorithms
In contrast to the uniform-length binary string representation scheme of the sGA, the fmGA representation scheme is order-invariant, and individuals are not necessarily of uniform length. Also in contrast to the sGA, the fmGA consists of distinct phases and is typically applied iteratively, with successive iterations performed for varying building block sizes k. The upper limit of the iteration controls a tradeoff between execution time and solution quality. The three phases of each iteration, which are illustrated in Figure 1, are the initialization, primordial, and juxtapositional phases. The initialization phase consists of a random sampling of individuals of length ℓ′ = ℓ − k, where ℓ is the number of discrete optimization variables; Probabilistically Complete Initialization (PCI) is the method used to generate the initial population. The goal of the primordial phase is to obtain a population containing individuals of length k which can, with high probability, be juxtaposed to obtain an optimal individual. Tournament selection focuses the search on highly fit individuals, while building block filtering (BBF) is a mutation operator which is used to periodically reduce the lengths of the individuals. The juxtapositional phase uses cut-and-splice (a recombination operator) to construct highly fit individuals of length ℓ from the length-k individuals surviving the primordial phase. Tournament selection is used to focus the search on highly fit combinations. In both the primordial and juxtapositional phases, a locally optimal solution, called the competitive template, is used to "fill in the gaps" in partially specified solutions to allow their evaluation.
(Figure 1: Fast Messy Genetic Algorithm flow chart. Start; PCI with t := 0; initialization phase; primordial phase loop (Select, BBF, t := t + 1) while t < t_p; juxtapositional phase loop (Select, Cut and Splice, t := t + 1) while t < t_f; Stop.)

Also, in order to prevent the cross-competition between building blocks caused by non-uniform scaling, competition is restricted to those individuals which are defined at some threshold number of common loci.

PCI generates a population of random individuals in which each building block has an expected number of copies sufficient to overcome sampling noise. Each individual in the population is defined at ℓ′ = ℓ − k loci, which are selected randomly without replacement (it is assumed that k ≪ ℓ). The population size is

$$N = \frac{\ell!\,(\ell - 2k)!}{\left((\ell - k)!\right)^2}\; 2 z_\alpha^2 \beta^2 (m-1) 2^k \qquad (1)$$

where m is the number of building blocks in a fully specified solution, α is a parameter specifying the probability of selection error between two competing building blocks, z_α satisfies P[Z ≤ z_α] = 1 − α where Z is a standard normal random variable, and β² is a parameter specifying the maximum inverse signal-to-noise ratio per subfunction to be detected [11].

The fast messy GA primordial phase enriches the initial population via alternating tournament selection and building block filtering (BBF). Tournament selection increases the proportion of individuals containing highly fit building blocks. BBF then randomly deletes some number of genes from every individual, the number being chosen so that BBF is expected to disrupt many but not all of the highly fit building blocks. Those individuals still containing highly fit building blocks receive additional copies in subsequent iterations of tournament selection. The net effect is to produce a population of partial strings of length k with a high expected proportion of highly fit building blocks.
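To make the primordial-phase operators concrete, the following self-contained C sketch implements building block filtering over the messy (locus, allele) representation; the data layout and function names are illustrative assumptions, not the AFIT toolkit's actual code.

#include <stdio.h>
#include <stdlib.h>

/* Messy GA individuals are lists of (locus, allele) pairs, so strings need
 * not be fully specified.  BBF cuts an individual from its current length
 * down to a target length by deleting randomly chosen genes. */
typedef struct { int locus; int allele; } Gene;

void bbf_filter(Gene *ind, int *len, int target_len) {
    while (*len > target_len) {
        int victim = rand() % *len;      /* choose a gene uniformly at random */
        ind[victim] = ind[*len - 1];     /* delete it by swapping in the last gene */
        (*len)--;
    }
}

int main(void) {
    Gene ind[8];
    int len = 8;
    for (int i = 0; i < len; i++) { ind[i].locus = i; ind[i].allele = rand() % 2; }
    bbf_filter(ind, &len, 3);            /* filter down toward block size k = 3 */
    for (int i = 0; i < len; i++) printf("(%d,%d) ", ind[i].locus, ind[i].allele);
    printf("\n");
    return 0;
}

In the actual algorithm this deletion is applied to every individual on a schedule, interleaved with tournament selection as described above.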
Competition is restricted to those pairs of individuals that contain a specified number of common defining loci. The threshold for each generation is specified as an input parameter. Current practice is to use an empirically determined filtering and threshold schedule, although theoretical work is in progress to allow a priori schedule design [16, 21]. Frequently, in order to obtain good effectiveness, it is necessary to use quite restrictive thresholding. In this case, the algorithmic complexity of selection is dominated by the O(N²) search for pairs that satisfy the threshold, where N is the population size (a sketch of this compatibility test appears at the end of this section).

One attractive feature of genetic algorithms is that they are highly parallelizable, in the sense that it is possible to obtain near-linear or better speedups on a wide variety of parallel architectures, typically with relatively little development effort. Many parallel genetic algorithm implementations use the "island" model, wherein the population is partitioned amongst the processors, and each processor executes an independent genetic algorithm [13]. The interprocessor communication for this model is easily tailored to a specific architecture, since no communication is required for fitness evaluation. Numerous variations exist in which either the selection operation executing on a particular processor is affected by other processors' subpopulations, or processors communicate some portion of their subpopulations to other processors. These approaches extend, with some modifications, to the messy GA [24].
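The thresholded compatibility test, and the quadratic pair search it induces, can be sketched as follows; the flag-array representation is an assumption for illustration, not the toolkit's data structure.

#include <stdio.h>
#include <stdlib.h>

#define LEN 240   /* loci per fully specified string (24 angles x 10 bits) */

typedef struct { unsigned char defined[LEN]; } Individual;

/* Number of loci at which both individuals are defined. */
static int common_loci(const Individual *a, const Individual *b) {
    int count = 0;
    for (int i = 0; i < LEN; i++)
        count += (a->defined[i] && b->defined[i]);
    return count;
}

/* Find a tournament partner sharing at least `theta` defined loci.
 * Each call scans O(N) candidates, so pairing a whole population of
 * size N costs O(N^2) compatibility tests. */
static int find_partner(const Individual *pop, int n, int self, int theta) {
    for (int j = 0; j < n; j++)
        if (j != self && common_loci(&pop[self], &pop[j]) >= theta)
            return j;
    return -1;   /* no compatible partner at this threshold */
}

int main(void) {
    enum { N = 100 };
    static Individual pop[N];
    for (int i = 0; i < N; i++)                  /* random partial strings */
        for (int l = 0; l < LEN; l++)
            pop[i].defined[l] = (unsigned char)(rand() % 2);
    printf("partner of 0: %d\n", find_partner(pop, N, 0, 60));
    return 0;
}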
2.1.1 Parallel Algorithm Design
A parallel fast messy GA (PFMGA) is designed based on the island model [13]. The PFMGA is implemented in C on the Intel Paragon parallel supercomputer and the IBM SP2. These programs are part of AFIT's GA Toolkit, which includes several sequential and parallel GAs [18].

A "controller" processor inputs the GA parameters, creates a competitive template, and broadcasts them to the remaining processors. Then, each processor (including the controller) independently performs PCI to generate and evaluate an initial population of N/P individuals, where the global population size N is given by Equation 1 and P is the processor count. Because N depends on building block size, it is determined independently for each iteration.

During the primordial phase, selection may be performed either globally or locally. The search trajectories resulting from global selection are essentially equivalent to those of a single processor run (in these experiments, the pseudo-random number generators for the processors are mutually independent, so the search trajectory does depend on the processor count). Specifically, global selection consists of a communication step (collection), one or more selection steps, and another communication step (distribution): each processor other than the controller sends its subpopulation to the controller, which collects the subpopulations into a single population, and the controller performs tournament selection on the combined population repeatedly until BBF is required.
The controller then partitions the new population into subpopulations and distributes them amongst the processors.

In contrast, the search trajectories resulting from local selection depend fundamentally on the number and size of the subpopulations, and hence on the processor count. Specifically, each processor (including the controller) independently performs tournament selection on its subpopulation. Thus, local selection does not require communication. Following selection, each processor (including the controller) performs BBF and function evaluation for its subpopulation. In the context of global selection, this step may be viewed as a farming model design.

The juxtapositional phase may also be performed either globally or locally. In contrast to the situation with primordial phase selection, the global option does not correspond directly to a single processor run. Specifically, the global combined strategy begins with two communication steps: each processor other than the controller sends its subpopulation to the controller, which collects the subpopulations into a single global population, and the controller broadcasts the global population to the remaining processors. Each processor (including the controller) then independently applies tournament selection and cut-and-splice to the global population (using different pseudo-random number sequences). The communication steps are absent from the independent strategy, in which each processor (including the controller) independently applies tournament selection and cut-and-splice to its final primordial phase population.

At the end of each iteration, regardless of the selection method and the juxtapositional phase strategy, each processor other than the controller sends its best solution to the controller. The controller determines the overall best solution, which becomes the competitive template for the next iteration, and reports execution statistics. The communication pattern is sketched below.
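A minimal MPI sketch of this pattern, under an assumed flat data layout (the paper does not show the toolkit's actual structures): collection, controller-side selection, distribution, and the final best-solution report map naturally onto MPI_Bcast, MPI_Gather, MPI_Scatter, and MPI_Reduce with MPI_MINLOC.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define SUBPOP 363        /* illustrative N/P (cf. Table 3) */
#define LEN    240        /* bits per fully specified individual */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Controller (rank 0) creates the competitive template and broadcasts it. */
    unsigned char template_[LEN] = {0};
    MPI_Bcast(template_, LEN, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

    unsigned char sub[SUBPOP * LEN];       /* local subpopulation (from PCI) */
    memset(sub, 0, sizeof sub);
    unsigned char *pop = NULL;             /* combined population, rank 0 only */
    if (rank == 0)
        pop = malloc((size_t)nprocs * SUBPOP * LEN);

    /* Global selection, collection step: gather all subpopulations. */
    MPI_Gather(sub, SUBPOP * LEN, MPI_UNSIGNED_CHAR,
               pop, SUBPOP * LEN, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

    /* ... controller runs tournament selection on the combined population ... */

    /* Distribution step: partition and return the new subpopulations. */
    MPI_Scatter(pop, SUBPOP * LEN, MPI_UNSIGNED_CHAR,
                sub, SUBPOP * LEN, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

    /* End of iteration: controller locates the overall best (lowest) energy;
     * the winner becomes the next competitive template. */
    struct { double energy; int rank; } local = { 0.0, rank }, best;
    MPI_Reduce(&local, &best, 1, MPI_DOUBLE_INT, MPI_MINLOC, 0, MPI_COMM_WORLD);

    if (rank == 0)
        free(pop);
    MPI_Finalize();
    return 0;
}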
3 Experimental Design

This section describes computational experiments which compare the performance of several island model [13] fmGAs: an NX-based Intel Paragon implementation [7], as well as MPI-based implementations for the Intel Paragon and the IBM SP2. In each case, experiments are conducted to determine the effectiveness (overall minimum energy), speedup, (absolute) efficiency, and scaled efficiency for each combination of global vs. local selection and independent vs. global combined strategy.
Absolute efficiency is measured by varying the processor count while fixing the global population size, as determined by Equation 1. Scaled efficiency is measured by varying the processor count while holding the subpopulation size fixed, as determined by Equation 1 and assuming an intermediate number of processors. Each execution uses four fmGA iterations, and results are averaged over five executions using independent pseudo-random number sequences.

The optimization problem for these experiments is the minimization of the CHARMM energy model [27] of the pentapeptide [Met]-Enkephalin. Previous research efforts have established [Met]-enkephalin conformations which are known to be low in energy, as measured by the CHARMM energy model [15, 23]. Appendix A describes the general polypeptide conformation problem in more detail.
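The paper does not restate its performance measures; assuming the standard definitions, with $T_P$ the measured execution time on $P$ processors,

$$S_P = \frac{T_1}{T_P}, \qquad E_P = \frac{S_P}{P} = \frac{T_1}{P\,T_P},$$

and scaled efficiency is computed analogously from runs holding the per-processor subpopulation size, rather than the global population size, fixed.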
4 Results and Analysis

This section discusses the observed effects of the various experimental factors on the performance (overall minimum energy, speedup, absolute efficiency, and scaled efficiency) of the PFMGA. For local selection with fixed subpopulation sizes and the independent strategy, Tables 1 and 2 show the performance (effectiveness and efficiency) for the indicated processor counts; the subpopulation size changes with block size across iterations, and results are averaged over 5 experiments.
Table 1: Overall minimum energies (kcal/mol) obtained by the MPI-based Paragon implementation using local selection, fixed subpopulation size, and the independent juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time
  (1,2,4,8)        2            360          -29           360
                   3            492          -29           560
                   4            757          -30          1150
On the SP2, the best minimum energy values (effectiveness) are found using 8 or 16 processors. The computation times should be roughly constant across processor counts, since fixed subpopulation sizes are employed per processor. On the SP2, computation time grows with the increasing subpopulation size approximately in proportion to block size. The Paragon computation time increases at a somewhat faster rate because of increased message size. It is interesting to note that the SP2 energies are worse than the Paragon energies, even though the SP2 has larger global populations for larger processor counts. The Paragon local subpopulation of 757 provides better results; note that this subpopulation size is larger than any local subpopulation on the SP2. As the number of processors increases, one can increase the subpopulation size.
Table 2: Overall minimum energies (kcal/mol) obtained by the MPI-based SP2 implementation using local selection, fixed subpopulation size, and the independent juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time
  (1,2)            2            180          -20            48
                   3            246          -22            68
                   4            378          -23           118
  (4,32)           2            180          -25            48
                   3            244          -26            68
                   4            376          -27           118
  (8,16)           2            176          -24            47
                   3            240          -28            68
                   4            376          -30           119
  (64)             2            128          -22            35
                   3            192          -25            55
                   4            320          -25            97
This may provide better energies; however, the Paragon will have higher average computation times than the SP2, as shown, because of its architecture.

Now, for local selection with fixed global populations and the independent strategy, Table 3 shows the average performance, again averaged over 5 experiments. The apparent superlinear speedup can be attributed to the fmGA shuffle size parameter. This parameter forces a search through the population of order n²; thus, reducing the subpopulation by half reduces by a factor of four the part of the execution time spent searching for compatible individuals, resulting in the observed superlinear speedup (made concrete in the relation below). Also, note that a small number of processors achieves the overall minimum energy value, due to the critical population size in this application. Scaled efficiency is relatively the same for each block size and processor count, although the Paragon has better scaled efficiencies. The scaled efficiency is increasing and exceeds 1.0 due to the superlinear speedup, and thus has little meaning.

Considering global selection with fixed population and global combination, the Paragon with 8 or fewer processors did not generate lower energy results, as shown in Table 5. The same performance was observed for the SP2 with 8 or fewer processors. However, Table 6 reflects excellent SP2 results for 16 processors. Speedup calculations for global selection are based on the execution times of local selection single processor runs, because global and local selection are equivalent for single processor runs. Although the minimum energy found is very good, speedup decreases due to extensive communication as the population increases. Using more SP2 processors improves the value of the minimum energy found (i.e., the global combine algorithm becomes more effective).
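The superlinear-speedup argument can be made concrete. If the compatible-pair search dominates and costs on the order of n² for a subpopulation of size n, then splitting a global population N across P processors gives

$$T_P \propto \left(\frac{N}{P}\right)^2 = \frac{T_1}{P^2},$$

so the search component alone would contribute speedup near P², and measured speedups between P and P² are consistent with a mix of quadratic search and linear work.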
Table 3: Overall minimum energies (kcal/mol), speedups, and absolute efficiencies obtained by the MPI-based Paragon implementation using local selection, fixed global population size, and the independent juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (1)              1           1440          -21          1340        N/A        N/A
                   2           1452          -25          2000        N/A        N/A
                   3           1968          -26          3650        N/A        N/A
                   4           3028          -28          9200        N/A        N/A
  (2)              2            726          -25           840        2.4        1.2
                   3            980          -26          1380        2.8        1.4
                   4           1514          -27          3150        2.9        1.4
  (4)              2            363          -29           365        6.0        1.5
                   3            492          -30           560        8.7        2.2
                   4            757          -30          1150        8.3        2.1
  (8)              2            181          -23           170       12.0        1.5
                   3            246          -25           250       14.6        1.8
                   4            378          -26           460       20.0        2.3

As with any fmGA, appropriate building block filtering schedules, along with efficient communication, impact performance. Note the decreasing efficiency above, due to increased communication time. Other experiments in the matrix using global selection with fixed population (independent and global combination variations) did not produce good minimization values or efficiencies. For example, with the independent strategy, fixed population, and global selection, the effectiveness is less than with the associated global combined strategy; however, as seen from Tables 7 and 8, the computation time is less, along with better speedup and efficiency.

In general, the Paragon fmGA performance results are similar to our previous fmGA serial and parallel results using NX [7]. These results are summarized in Tables 9, 10, 11, and 12. As indicated above, the larger populations within these two experiments acquired good minimum energies as compared to our other GA experiments using NX [22, 23]. Scaled efficiencies for fixed subpopulations with the independent strategy indicate poor performance. For fixed global populations, the NX Paragon model reflects efficiencies greater than one, due again to superlinear speedup as discussed previously. Note that the Paragon-NX minimum energy values do not approach those of the Paragon-MPI algorithm. In the first case, the large fixed population size of 4096 yielded relatively poor performance. This can be compared to the smaller fixed population sizes of the MPI version, which result in better minimum energy values. With large MPI fixed populations, minimum energy values close to the "global" minimum have been found. This is not so for the small fixed subpopulation size of 32 in the NX version. However, computation times for Paragon-NX and Paragon-MPI are close to the same values.
Table 4: Overall minimum energies (kcal/mol), speedups, and absolute efficiencies obtained by the MPI-based SP2 implementation using local selection, fixed global population size, and the independent juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (1)              1           1440          -21           365        N/A        N/A
                   2           1452          -25           470        N/A        N/A
                   3           1968          -26           790        N/A        N/A
                   4           3028          -28          1740        N/A        N/A
  (2)              2            726          -25           210        2.3        1.2
                   3            984          -26           330        2.6        1.3
                   4           1514          -27           660        2.7        1.4
  (4)              2            363          -29            99        4.7        1.2
                   3            492          -30           140        5.6        1.4
                   4            757          -30           270        6.3        1.6
  (8)              2            181          -23            47       10.         1.3
                   3            246          -25            69       11.         1.4
                   4            378          -26           120       15.         1.8
  (16)             2             90          -22            26       18.         1.1
                   3            123          -25            34       23.         1.6
                   4            189          -26            58       29.         1.8
  (32)             2             45          -21            13       36.         1.1
                   3             61          -23            19       40.         1.3
                   4             94          -24            27       64.         2.0
  (64)             2             22          -19             7       66.         1.0
                   3             30          -22             9       89.         1.4
                   4             47          -22            14      124.         2.0

We observed that the use of MPI as a communication library permitted easy rehosting on various computational platforms and running the multitude of experiments. Of course, one has to consider the specific C-language compiler constructs and the associated MPI implementation (mpich on the Paragon, IBM's MPI on the SP2). Although not included in the extensive set of experiments, limited experiments on the SGI Power Challenge and the SGI Origin 2000 indicate similar minimum energy trends. Of course, their communication backplane architectures (shared buses) are relatively slower than those of the Paragon (400 MB/s) and SP2 (150 MB/s). Thus, wall-clock times will be much longer on these platforms, as well as on the Ethernet backplane structures of networks of workstations (NOWs). For larger fmGA populations, computational platforms with high-speed backplanes and high-speed processors generate effective and efficient minimum energy polypeptide conformations and are therefore recommended for this type of application.
Table 5: Overall minimum energies (kcal/mol), speedups, and absolute efficiencies obtained by the MPI-based Paragon implementation using global selection, fixed global population size, and the global combined juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (8)              1            180          -26           950        1.4        .18
                   2            181          -26          1425        1.4        .18
                   3            246          -27          2860        1.3        .16
                   4            378          -27          7728        1.2        .15
Table 6: Overall minimum energies (kcal/mol), speedups, and absolute efficiencies obtained by the MPI-based SP2 implementation using global selection, fixed global population size, and the global combined juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (8)              1            180          -22           250        1.5        .19
                   2            181          -26           308        1.5        .19
                   3            246          -26           550        1.4        .18
                   4            378          -27          1303        1.3        .16
  (16)             2             90          -30           290        1.6        .10
                   3            123          -31           535        1.5        .10
                   4            189          -31          1331        1.3        .08
5 Conclusions & Recommendations

This paper discusses the development of an MPI-based implementation of the fast messy genetic algorithm. The effectiveness and efficiency of this implementation are determined experimentally on the Intel Paragon and IBM SP2, and the MPI performance is compared to that obtained for an NX-based implementation of the fast messy genetic algorithm. MPI-based fmGA algorithms can thus provide effective and efficient solutions for very high dimensional problems over a range of high performance computing platforms. As a characteristic of the application, the SP2 will generally have better performance than the Paragon given the same population size. ...

An additional communication strategy to consider is the combined strategy. Here, each node sends its entire population to node 0 at the end of the primordial phase. Node 0 combines the subpopulations into a single population, with which it executes the juxtapositional phase. Each of the other nodes executes the juxtapositional phase with just its subpopulation, then sends its best solution to node 0. Node 0 again determines the overall best solution. A sketch of this strategy's communication pattern follows.
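A minimal MPI sketch of the combined strategy, with a hypothetical data layout and a stub standing in for the juxtapositional phase (this is not the toolkit's code):

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define SUBPOP 128
#define LEN    240

/* Stub for the juxtapositional phase (selection plus cut-and-splice). */
static void juxtapose(unsigned char *pop, int n) { (void)pop; (void)n; }

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    unsigned char sub[SUBPOP * LEN];
    memset(sub, 0, sizeof sub);
    unsigned char *pop = NULL;
    if (rank == 0)
        pop = malloc((size_t)nprocs * SUBPOP * LEN);

    /* Every node sends its population to node 0 at the end of the
     * primordial phase. */
    MPI_Gather(sub, SUBPOP * LEN, MPI_UNSIGNED_CHAR,
               pop, SUBPOP * LEN, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0)
        juxtapose(pop, nprocs * SUBPOP);   /* combined population on node 0 */
    else
        juxtapose(sub, SUBPOP);            /* other nodes: own subpopulation */

    /* Each node then sends its best solution to node 0, which determines
     * the overall best (e.g., via MPI_Reduce with MPI_MINLOC). */
    if (rank == 0)
        free(pop);
    MPI_Finalize();
    return 0;
}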
Table 7: Overall minimum energies (kcal/mol), speedups, and scaled efficiencies obtained by the MPI-based Paragon implementation using global selection, fixed global population size, and the independent juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (8)              1            180          -20           380        3.5        .44
                   2            181          -25           930        2.2        .28
                   3            246          -26          2200        1.7        .21
                   4            378          -27          6400        1.4        .18
Table 8: Overall minimum energies (kcal/mol), speedups, and scaled efficiencies obtained by the MPI-based SP2 implementation using global selection, fixed global population size, and the independent juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (8)              1            180          -21           123        3.0        .38
                   2            181          -25           208        2.3        .29
                   3            246          -26           416        1.9        .24
                   4            378          -27          1141        1.5        .19
  (16)             1             90          -21            62        5.9        .37
                   2             90          -25           140        3.4        .21
                   3            123          -27           320        2.6        .16
                   4            189          -27           987        1.8        .11

Other computational networks to be addressed in the future include heterogeneous networks of Sun and Silicon Graphics workstations, the SGI Power Challenge, the SGI Origin 2000, and a network of Intel Pentium Pro workstations. Based upon our experience, the rehosting of MPI communication structures within the polypeptide structure prediction problem should be relatively easy.

We wish to acknowledge the use of the DoD High Performance Computing facilities at the Maui HPCC and at the Wright-Patterson AFB Major Shared Resource Center (MSRC) in support of this research.
6 Appendix A: Protein Structure Prediction

In this appendix, we discuss the objective function associated with our polypeptide energy minimization application, as well as the binary encoding scheme. The capability to predict a polypeptide's molecular structure given its amino acid sequence is important to numerous scientific, medical, and engineering applications [5]. The effort to develop a general technique for such structure prediction is commonly referred to as the protein folding problem, in which the minimization of an energy function in conformational space is required [34].
Table 9: Overall minimum energies (kcal/mol), speedups, and scaled efficiencies obtained by the NX-based Paragon implementation using local selection, fixed subpopulation size, and the independent juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (1)              3             32          -5            190        N/A        1.
  (2)              3             32          -7.5          190        2          1.
  (4)              3             32          -12           190        4          1.
  (8)              3             32          -14.5         190        8          1.
  (16)             3             32          -17           190       16          1.
  (32)             3             32          -18           190       32          1.
  (64)             3             32          -20           190       64          1.
  (128)            3             32          -22           190      128          1.

One optimization algorithm which has been applied to the protein folding problem is the genetic algorithm [3]. Genetic algorithms (GAs) are a class of stochastic, population-based optimization algorithms, and are described in detail elsewhere (cf. Holland [14], Goldberg [8], or Michalewicz [26]). One general class of functions for which the canonical (or "simple") GA tends to converge toward suboptimal solutions is that of deceptive functions [35]. The messy GA [12, 9, 10] is designed specifically to overcome deception.
6.1 Objective Function
The conformational state of a molecule, defined by the physical locations of the component atoms, is represented by the internal coordinate system. The location of atom i is defined in terms of a bond length (i, j), a bond angle (i, j, k), and a dihedral angle (i, j, k, l) with respect to neighboring atoms. Bond lengths and angles are assumed fixed, leaving the dihedral angles (φ_i, ψ_i, ω_i, χ_i) as independent variables (see Figure 2). The objective function, which we seek to minimize, is based on the CHARMM [4] energy function

$$E = \sum_{(i,j) \in B} K_r (r_{ij} - r_{eq})^2 + \sum_{(i,j,k) \in A} K_\theta (\theta_{ijk} - \theta_{eq})^2 + \sum_{(i,j,k,l) \in D} K_\phi \left[1 + \cos(n_{ijkl}\,\omega_{ijkl} - \gamma_{ijkl})\right]$$
$$\qquad + \sum_{(i,j) \in N} \left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon\, r_{ij}}\right] + \frac{1}{2} \sum_{(i,j) \in N'} \left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon\, r_{ij}}\right] \qquad (2)$$

where the five terms (which we denote E_B, E_A, E_D, E_N, and E_{N'}) represent the energy due to bond stretching, bond angle deformation, dihedral angle deformation, non-bonded interactions, and 1-4 interactions, respectively. Specifically, B is the set of bonded atom pairs, A is the set of atom triples defining bond angles, D is the set of atom 4-tuples defining dihedral angles, N is the set of non-bonded atom pairs, N′ is the set of 1-4 interaction pairs, r_{ij} is the distance between atoms i and j, θ_{ijk} is the angle formed by atoms i, j, and k, ω_{ijkl} is the dihedral angle formed by atoms i, j, k, and l, q_i is the partial atomic charge of atom i, and the K_r's, r_{eq}'s, K_θ's, θ_{eq}'s, K_φ's, γ_{ijkl}'s, A_{ij}'s, B_{ij}'s, and ε are empirically determined constants (taken from the QUANTA parameter files).

The CHARMM energy model seems to be the most complex of those available for polypeptide prediction, including AMBER and ECEPP/2, which we have employed elsewhere [22]. The primary determinants of a protein's 3-D structure, and thus the energetics of the system, are its independent dihedral angles [34]. Our genetic algorithm operates on individuals which encode these dihedral angles [7]. In Equation 2, E is expressed as a function of both the internal coordinates (bond lengths r_{ij} for (i, j) ∈ B, bond angles θ_{ijk}, and dihedral angles ω_{ijkl}) and the inter-atomic distances r_{ij} for (i, j) ∈ N ∪ N′. Thus, in order to calculate E (and hence the fitness) for the conformation encoded by an individual, it is necessary to calculate its Cartesian coordinates from its internal coordinates.

Table 10: Overall minimum energies (kcal/mol), speedups, and scaled efficiencies obtained by the NX-based Paragon implementation using global selection, fixed subpopulation size, and the global combined juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (1)              3             32          -5            190        N/A        N/A
  (2)              3             32          -4            230        1.6        .80
  (4)              3             32          -12           340        1.0        .25
  (8)              3             32          -14.5         500        1.2        .15
  (16)             3             32          -17           900        1.3        .08
  (32)             3             32          -18          1200        1.3        .04
  (64)             3             32          -20          4000        1.2        .02
  (128)            3             32          -22          7800        1.1        .01
Table 11: Overall minimum energies (kcal/mol), speedups, and (absolute) efficiencies obtained by the NX-based Paragon implementation using global selection, fixed global population size, and the independent juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (1)              3           4096          -18        100000        N/A        N/A
  (2)              3           2048          -20         12000        2.5        1.25
  (4)              3           1024          -21.5        9000       10.         2.5
  (8)              3            512          -21.7        1250       12.         1.5
  (16)             3            256          -20.5        1050       80.         5.0
  (32)             3            128          -21           750      180.         5.6
  (64)             3             64          -21.3         240      300.         4.7
  (128)            3             32          -22           200      500.         3.9
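To make the evaluation concrete, the following C sketch computes the non-bonded contribution of Equation 2 from Cartesian coordinates over an explicit pair list; the pair list, parameter values, and dielectric constant are illustrative assumptions, not values from the QUANTA parameter files.

#include <math.h>
#include <stdio.h>

typedef struct { double x, y, z, q; } Atom;   /* position and partial charge */

/* Non-bonded term of Equation 2:  A_ij/r^12 - B_ij/r^6 + q_i q_j/(4 pi eps r). */
double nonbonded_energy(const Atom *atoms, const int (*pairs)[2], int npairs,
                        const double *A, const double *B, double eps) {
    double e = 0.0;
    for (int p = 0; p < npairs; p++) {
        const Atom *ai = &atoms[pairs[p][0]];
        const Atom *aj = &atoms[pairs[p][1]];
        double dx = ai->x - aj->x, dy = ai->y - aj->y, dz = ai->z - aj->z;
        double r2 = dx * dx + dy * dy + dz * dz;
        double r6 = r2 * r2 * r2;
        e += A[p] / (r6 * r6)                            /* repulsive r^-12 term  */
           - B[p] / r6                                   /* attractive r^-6 term  */
           + ai->q * aj->q / (4.0 * M_PI * eps * sqrt(r2));  /* electrostatic */
    }
    return e;
}

int main(void) {
    Atom atoms[2] = { {0.0, 0.0, 0.0, 0.4}, {3.0, 0.0, 0.0, -0.4} };
    int pairs[1][2] = { {0, 1} };
    double A[1] = {1.0e4}, B[1] = {1.0e2};               /* illustrative only */
    printf("E_N = %f\n", nonbonded_energy(atoms, pairs, 1, A, B, 1.0));
    return 0;
}

The 1-4 interaction term E_{N'} has the same form, scaled by the factor 1/2 shown in Equation 2.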
6.2 Encoding Scheme
Two specific biomolecules are investigated, based on a variety of reasons. The first is the pentapeptide [Met]-enkephalin, whose five amino acids are, in order, tyrosine, glycine, glycine, phenylalanine, and methionine [33]. This molecule is chosen because it has been used as a test problem for many other energy minimization investigations (e.g., [19, 29]), and its minimum energy conformation is known (with respect to the ECEPP/2 energy model) [20]. A minimum energy for [Met]-enkephalin with respect to CHARMM was published [28], but there is a question about this minimum, since the ECEPP optimum was used as the starting conformation.
[Met]-enkephalin has 75 atoms. In it, 24 dihedral angles are treated as independent; the rest are either fixed or treated as dependent. In the binary representation, each individual is a fixed length binary string encoding the independent dihedral angles of a polypeptide conformation. The decoding function used is the affine mapping $D : \{0,1\}^{10} \to [-\pi, \pi]$ of 10-bit subsequences to dihedral angles such that

$$D(a_1, a_2, \ldots, a_{10}) = -\pi + 2\pi \sum_{j=1}^{10} a_j\, 2^{-j}. \qquad (3)$$
This encoding yields a precision of approximately one third of one degree.
Table 12: Overall minimum energies (kcal/mol), speedups, and (absolute) efficiencies obtained by the NX-based Paragon implementation using global selection, fixed global population size, and the global-combined juxtapositional strategy. Computation time is in seconds.

  Proc count   Block size   SubPop-Size   Min Energy   Comp Time   Speedup   Efficiency
  (1)              3           4096          -18.0      100000        N/A        N/A
  (2)              3           2048          -19.8       38000        3.0        1.5
  (4)              3           1024          -19.9       10900        7.0        1.75
  (8)              3            512          -22.5       10000       10.         1.25
  (16)             3            256          -23.8        9990       11.         0.7
  (32)             3            128          -24.8        9950       11.         0.3
  (64)             3             64          -24.0        9910       11.         0.2
  (128)            3             32          -23.9        9900       11.         0.1

As in previous studies [15, 23], this research takes the 24 φ, ψ, ω, and χ dihedral angles of [Met]-Enkephalin to be the independent variables for optimization; hence its string length is 240.
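Equation 3 transcribes directly into C; the following self-contained sketch (not the toolkit's actual decoding routine) also illustrates the stated resolution of 2π/1024 radians, about one third of a degree.

#include <math.h>
#include <stdio.h>

/* Decoding function D of Equation 3: an affine map from a 10-bit
 * subsequence to a dihedral angle in [-pi, pi]. */
double decode_dihedral(const unsigned char bits[10]) {
    double frac = 0.0;
    for (int j = 0; j < 10; j++)
        frac += bits[j] * pow(2.0, -(j + 1));   /* sum of a_j 2^{-j}, j = 1..10 */
    return -M_PI + 2.0 * M_PI * frac;
}

int main(void) {
    unsigned char bits[10] = {1, 0, 0, 0, 0, 0, 0, 0, 0, 0};  /* frac 0.5 -> angle 0 */
    printf("angle = %f rad\n", decode_dihedral(bits));
    return 0;
}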
6.3 Application of the PFMGA to Energy Minimization
Each individual in the PFMGA population, once overlaid on the competitive template, is a binary string representation of a subset of the internal coordinates required to fully specify a conformation. The N_C represented coordinates are referred to as the "independently variable" coordinates. Which of the coordinates are to be independently variable is specified in a Z-matrix [3], which is input during initialization. Each independently variable coordinate V is represented by bits $(b_{V_0+1}, b_{V_0+2}, \ldots, b_{V_0+\hat{k}})$ in the string such that
$$V = V_{\min} + (V_{\max} - V_{\min}) \sum_{i=1}^{\hat{k}} b_{V_0+i}\, 2^{-i} \qquad (4)$$
where $\hat{k} = \ell/N_C$ and $V \in [V_{\min}, V_{\max}]$. The values of the coordinates which are not independently variable are also specified in the Z-matrix, while the empirical constants of Equation 2 are specified in another input file which is read during initialization [3]. Finally, the interatomic distances for a conformation are determined by the internal coordinates [3]. Thus, the fitness of an individual is determined by overlaying it on the competitive template, "decoding" each of the independently variable coordinates, calculating the interatomic distances, and then calculating the energy associated with the conformation via Equation 2.
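The overlay step can be sketched as follows, assuming the (locus, allele) messy representation and the usual messy GA convention that the first occurrence of an over-specified locus takes precedence; both assumptions are illustrative rather than taken from the paper.

#include <stdio.h>
#include <string.h>

#define LEN 240

typedef struct { int locus; unsigned char allele; } Gene;

/* Template bits supply every locus the individual leaves unspecified. */
void overlay(unsigned char expressed[LEN], const unsigned char tmpl[LEN],
             const Gene *ind, int len) {
    unsigned char set[LEN] = {0};
    memcpy(expressed, tmpl, LEN);            /* start from the template */
    for (int g = 0; g < len; g++) {
        int l = ind[g].locus;
        if (!set[l]) {                       /* first occurrence wins */
            expressed[l] = ind[g].allele;
            set[l] = 1;
        }
    }
}

int main(void) {
    unsigned char tmpl[LEN] = {0}, expressed[LEN];
    Gene ind[2] = { {5, 1}, {5, 0} };        /* duplicate locus: first wins */
    overlay(expressed, tmpl, ind, 2);
    printf("locus 5 = %d\n", expressed[5]);  /* prints 1 */
    return 0;
}

The expressed string is then decoded coordinate by coordinate via Equation 4 before the energy of Equation 2 is evaluated.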
(Figure 2: Tri-peptide showing internal coordinate system references: bond lengths, bond angles, and the dihedral angles Φ, Ψ, ω, and χ along the backbone.)
References
[1] Thomas Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.

[2] Richard K. Belew and Lashon B. Booker, editors. Proceedings of the Fourth International Conference on Genetic Algorithms, San Mateo, CA, July 1991. Morgan Kaufmann Publishers.

[3] Donald J. Brinkman, Laurence D. Merkle, Gary B. Lamont, and Ruth Pachter. Parallel genetic algorithms and their application to the protein folding problem. In JoAnne Wold, editor, Proceedings of the Intel Supercomputer Users' Group 1993 Annual North America Users' Conference, 1993. Intel Supercomputer Systems Division.

[4] Bernard R. Brooks et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4(2):187-217, 1983.

[5] Hue Sun Chan and Ken A. Dill. The protein folding problem. Physics Today, pages 24-32, February 1993.

[6] Stephanie Forrest, editor. Proceedings of the Fifth International Conference on Genetic Algorithms, San Mateo, CA, July 1993. Morgan Kaufmann Publishers.
[7] George H. Gates, Jr., Ruth Pachter, Laurence D. Merkle, and Gary B. Lamont. Parallel simple and fast messy GAs for protein structure prediction. In Proceedings of the Intel Supercomputer Users' Group 1995 Annual North America Users Conference, Beaverton, Oregon, 1995. Intel Supercomputer Systems Division.

[8] David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA, 1989.

[9] David E. Goldberg, K. Deb, and B. Korb. Messy genetic algorithms revisited: Studies in mixed size and scale. Complex Systems, 4:415-444, 1990.

[10] David E. Goldberg, K. Deb, and B. Korb. Don't worry, be messy. In Belew and Booker [2].

[11] David E. Goldberg, Kalyanmoy Deb, Hillol Kargupta, and Georges Harik. Rapid, accurate optimization of difficult problems using fast messy genetic algorithms. In Forrest [6], pages 56-64.

[12] David E. Goldberg, B. Korb, and K. Deb. Messy genetic algorithms: Motivation, analysis, and first results. Complex Systems, 3:493-530, 1989.

[13] V. Scott Gordon and Darrell Whitley. Serial and parallel genetic algorithms as function optimizers. In Forrest [6], pages 177-183.

[14] John H. Holland. Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA, first MIT Press edition, 1992.

[15] Charles E. Kaiser, Jr., Laurence D. Merkle, Gary B. Lamont, George H. Gates, Jr., and Ruth Pachter. Case studies in protein structure prediction with real-valued genetic algorithms. In Michael Heath, Virginia Torczon, Greg Astfalk, Petter E. Bjørstad, Alan H. Karp, Charles H. Koelbel, Vipin Kumar, Robert F. Lucas, Layne T. Watson, and David E. Womble, editors, Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, Philadelphia, PA, 1997. SIAM, Society for Industrial and Applied Mathematics.

[16] Hillol Kargupta. SEARCH, Polynomial Complexity, and the Fast Messy Genetic Algorithm. PhD thesis, University of Illinois, 1995. Also available as IlliGAL Report No. 95008.

[17] Gary B. Lamont, editor. Compendium of Parallel Programs for the Intel iPSC Computers. Department of Electrical and Computer Engineering, Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH 45433, 1993.

[18] Gary B. Lamont et al. Evolutionary algorithms. In Lamont [17].

[19] Scott M. LeGrand and Kenneth M. Merz, Jr. The application of the genetic algorithm to the minimization of potential energy functions. Journal of Global Optimization, 3:49-66, 1991.

[20] Zhenqin Li and Harold A. Scheraga. Monte Carlo-minimization approach to the multiple-minima problem in protein folding. Proceedings of the National Academy of Sciences USA, 84:6611-6615, 1987.

[21] Laurence D. Merkle. Analysis of Linkage-Friendly Genetic Algorithms. PhD thesis, Graduate School of Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB, OH 45433, December 1996.
[22] Laurence D. Merkle, George H. Gates, Jr., Gary B. Lamont, and Ruth Pachter. Hybrid genetic algorithms for minimization of a polypeptide specific energy model. In Proceedings of the Third IEEE Conference on Evolutionary Computation, 1996.

[23] Laurence D. Merkle, Robert L. Gaulke, George H. Gates, Jr., Gary B. Lamont, and Ruth Pachter. Hybrid genetic algorithms for polypeptide energy minimization. In Applied Computing 1996: Proceedings of the 1996 Symposium on Applied Computing, New York, 1996. The Association for Computing Machinery.

[24] Laurence D. Merkle and Gary B. Lamont. Comparison of parallel messy genetic algorithm data distribution strategies. In Forrest [6], pages 191-198.

[25] Message Passing Interface Forum. MPI-2: Extensions to the message-passing interface. Technical report, University of Tennessee, Knoxville, TN, November 1996.

[26] Zbigniew Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer, 1992.

[27] Molecular Simulations, Incorporated. CHARMm version 22.0 Parameter File, 1992.

[28] Thierry Montcalm, Weili Cui, Hong Zhao, Frank Guarnieri, and Stephen R. Wilson. Simulated annealing of met-enkephalin: low-energy states and their relevance to membrane-bound, solution and solid-state conformations. Journal of Molecular Structure (Theochem), 308:37-51, 1994.

[29] Akbar Nayeem, Jorge Vila, and Harold A. Scheraga. A comparative study of the simulated-annealing and Monte Carlo-with-minimization approaches to the minimum-energy structures of polypeptides: [Met]-enkephalin. Journal of Computational Chemistry, 12(5):594-605, 1991.

[30] Judea Pearl. Heuristics. Addison-Wesley, Reading, MA, 1989.

[31] Donald A. Pierre. Optimization Theory with Applications. Dover, New York, 1986.

[32] Gregory J. E. Rawlins, editor. Foundations of Genetic Algorithms. Morgan Kaufmann, San Mateo, CA, 1991.

[33] Lubert Stryer. Biochemistry (4th Edition). W. H. Freeman, New York, 1995.

[34] Maximiliano Vásquez, G. Némethy, and H. A. Scheraga. Conformational energy calculations on polypeptides and proteins. Chemical Reviews, 94:2183-2239, 1994.

[35] Darrell Whitley. Fundamental principles of deception in genetic search. In Rawlins [32].