Parallel Implementations of Evolutionary Strategies

Garrison W. Greenwood
Dept. of Elec. & Comp. Engr.
Western Michigan University
Kalamazoo, MI 49008

Sanjay Ahire
Dept. of Management
Western Michigan University
Kalamazoo, MI 49008

Ajay Gupta and R. Munnangi
Department of Computer Science
Western Michigan University
Kalamazoo, MI 49008

Abstract

Evolutionary algorithms can efficiently solve a diverse set of optimization problems. The convergence rate of these techniques improves with larger population sizes, but at the expense of an increase in computation time per generation. This paper presents an evolutionary algorithm which uses a large population for rapid convergence but a parallel implementation to minimize overall computation time. An instance of the preventive maintenance problem is used to evaluate this parallel implementation.

1 Introduction

Many deceptively simple problems are, in fact, NP-hard to solve. For example, suppose we are given a set of N integers. Can we partition this set into two subsets such that the sum of the integers in one subset equals the sum of the integers in the other subset? For N ≥ 100, this problem cannot be expected to be solved in a reasonable finite time [1]. Finding optimal solutions to more complex problems is no easier. This has led researchers to adopt heuristic approaches. The evolutionary algorithm (EA) is a heuristic approach that has generated a great deal of recent interest. Though EAs have a number of paradigms, in this paper we will discuss only the evolutionary strategies paradigm in depth.
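To see where the intractability comes from, consider a naive exhaustive check for the partition question. This sketch is ours, for illustration only, and is not from the paper; it enumerates subsets and therefore does Θ(2^N) work in the worst case, which is hopeless for N ≥ 100.

    from itertools import combinations

    def has_equal_partition(nums):
        """Decide the partition question by exhaustive search.

        Every subset is a candidate for one side of the partition, so the
        worst case examines all 2^N subsets -- infeasible for N >= 100.
        """
        total = sum(nums)
        if total % 2:                  # an odd total can never split evenly
            return False
        target = total // 2
        return any(sum(c) == target
                   for r in range(len(nums) + 1)
                   for c in combinations(nums, r))

    print(has_equal_partition([3, 1, 4, 2]))   # True: {1, 4} and {3, 2}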

* Research supported in part by a Fellowship from the Faculty Research and Creative Activities Support Fund, WMU-FRACASF 94-040, and by the National Science Foundation under grant CCR-9405377.

Evolutionary strategies have been successfully used to solve a diverse set of optimization problems, including task scheduling in distributed systems [2], the design of neural structures [3], and system identification [4]. An EA program may have a long (albeit finite) execution time for large problems; the EA, however, is ideally suited for parallel implementation. In this paper we present a parallel evolutionary strategy algorithm implemented on a 128-processor nCUBE2 machine (configured as a 7-dimensional hypercube). Both sequential and parallel versions were implemented and compared for quality of results and running time. Our results indicate that the parallel implementation is quite efficient.

As a test case, we used instances of the preventive maintenance (PM) problem, which is known to be NP-hard. In this problem, there exist N distinct preventive maintenance tasks that must be scheduled for execution. The PM problem is a variation of the assembly-line-balancing problem, which is of considerable interest in the production and operations management field [5]. This paper presents some of our preliminary results, which are based upon randomly generated instances of the PM problem. We are currently in the process of obtaining actual data from several commercial airlines to evaluate our technique under a real-world scenario; those results will be presented in a subsequent paper.

In the past, researchers have developed parallel algorithms for EAs (e.g., see [6] and [7]). One topic of interest has been the effect that population size has on the quality of the final solution [8]. We also investigated this issue. A key result of this paper is that while the evolutionary strategy speeds up and scales linearly with the number of processors, the quality of the final solution can degrade as the speedup increases.

2 Evolutionary Algorithms

Evolutionary algorithms are heuristic-based search techniques that have been applied to a large number of optimization problems. Two of the more common paradigms are genetic algorithms (GA) and evolutionary strategies (ES). Though this paper discusses only the ES in depth, a general discussion of both paradigms is worthwhile.

EAs are based upon the principles of adaptive selection found in the natural world. Each generation (iteration of the EA) takes a population of individuals (potential solutions) and modifies the genetic material (problem parameters) to produce new offspring. Both the parents and the offspring are evaluated, but only the fittest individuals (better solutions) survive over multiple generations. This means the EA simultaneously investigates several regions of the search space, which greatly decreases the amount of time required to locate a good solution. EAs have been successfully used to solve various types of optimization problems; the reader is referred to Michalewicz for an excellent discussion of EAs [9].

The particular genetic encoding of an individual is referred to as the genotype. New genotypes are created by special operations (e.g., mutation) that modify the genetic material. Decoding this genetic material gives the observed characteristics of the individual, which are referred to as the phenotype. The fitness of the genotype quantifies how closely the observed characteristics match a desirable solution; highly fit individuals exhibit highly desirable characteristics. The EA terminates after a fixed number of generations (Γ) have been produced and evaluated, or earlier if an acceptable solution is found.

GAs and ES were independently developed. While each maintains a population of trial solutions, imposes random changes to those solutions, and incorporates selection to determine which solutions survive, there are a number of differences. GAs operate at the genotype level and attempt to model genetic operators that exist in nature (such as crossover). ES are more concerned with phenotypic effects and emphasize mutational transformations [10]. The algorithm below implements the ES.

1. Randomly generate an initial population of μ individuals.

2. Mutate each individual to generate one offspring. Add all offspring to the population.

3. Evaluate all individuals to determine their fitness.

4. Select the μ fittest individuals for survival. Discard the other individuals.

5. Proceed to step 2 unless an acceptable solution has been found or Γ generations have been evaluated.

In step 2, each individual is mutated to produce a single offspring. The mutation operator produces this offspring by making a small, random perturbation to the problem parameters encoded by the genotype. Exactly how a mutation operator produces an offspring depends on the genotype data structure and is thus application dependent. It is also possible to have more than one mutation operator; in such cases, the k-th mutation operator is applied to an individual with probability p_k. Since Σ_k p_k = 1.0, all individuals are subjected to exactly one mutation operation. In Section 4 we discuss the genotype data structure and the two mutation operators used for the PM problem.

While it is true that mutation operators are stochastic in nature, this does not mean that the ES is merely a random search. The ES concentrates the search in those regions of the search space where good solutions have previously been found; regions which show little promise are no longer investigated. (Specifically, individuals with high fitness survive and those with low fitness die out.) There is much recent evidence that macro-mutations such as crossover are not always necessary to achieve satisfactory performance [10].

In steps 3 and 4 of the ES algorithm, all individuals are evaluated to determine their fitness and the μ best are kept. Not only does this mean that offspring compete equally with their parents for survival, it also identifies regions of the search space that show promise of containing good solutions. This survival mechanism permits the ES to converge monotonically toward an acceptable solution. A sketch of this loop is given at the end of this section.

With large problems the number of generations normally must be increased, as more mutation operations are required for the ES to converge. In part this is due to the "curse of dimensionality": the search space grows exponentially as the problem size grows. The quality of the solution is generally proportional to the size of μ and to the number of generations, but high-quality solutions may then require excessive computation time. In such cases, a parallel implementation should be used.
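As a concrete illustration of steps 1-5, the following minimal sketch implements a (μ+μ) selection loop in Python. The real-valued genotype, Gaussian mutation, and sphere fitness function are placeholder assumptions for illustration; they are not the PM encoding of Section 4, and the early-exit test of step 5 is omitted for brevity.

    import random

    def evolve(fitness, mu=50, n=10, gamma=200, sigma=0.1):
        """Minimal (mu + mu) ES: every parent creates one offspring by
        mutation, and the mu fittest of parents + offspring survive."""
        # Step 1: random initial population of mu real-valued genotypes.
        pop = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(mu)]
        for _ in range(gamma):                       # at most Gamma generations
            # Step 2: mutate each individual to generate one offspring.
            kids = [[x + random.gauss(0, sigma) for x in ind] for ind in pop]
            # Steps 3-4: evaluate all 2*mu individuals, keep the mu fittest.
            pop = sorted(pop + kids, key=fitness)[:mu]   # lower value = fitter
        return pop[0]

    # Toy usage: minimize the sphere function sum(x_i^2).
    best = evolve(lambda ind: sum(x * x for x in ind))
    print(sum(x * x for x in best))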

3 Parallel Evolutionary Strategies

In many ES applications a population size of μ = 50-100 is traditionally chosen. While this is adequate for moderately sized problems, for large problems it is quite small and, intuitively, does not adequately cover the entire search space. This limitation can lead to sub-optimal solutions. The limitation can be removed by choosing a larger μ; however, this also increases the computation time for each generation and the memory requirements on a single processor. To significantly improve the computation time, the ES should be executed on a multiprocessor system. In this section we describe how this can be done.

Suppose we have a large μ and a p-processor machine. Without loss of generality, we assume μ and p are integer powers of 2. The initial population is partitioned equally among the processors, and each processor executes the ES on its local population of μ′ = μ/p individuals. Ideally, after every generation the global fittest should be selected to form the next generation, but this would involve significant communication costs in a multiprocessor system. A better approach is to first let each processor execute for g generations using its local subset of the individuals and then participate in a global introduction of new genetic material. Let S be the set of λp candidates formed when each processor contributes its λ fittest individuals; λ can range from 1 to μ′. The λ globally fittest individuals from S are then broadcast to every processor, where they replace the λ least fit local individuals. This is implemented with the call to the global-fittest procedure in Figure 1.

The purpose of introducing new genetic material is to prevent a local ES computation from becoming trapped at a sub-optimal solution. In order to reduce the possibility of prematurely saturating the population, λ should be quite small (≤ 0.1μ′). The exact implementation of the global-fittest procedure depends on the interconnection network and routing mechanism of the multiprocessor system. For hypercubes, it can be done in O(λ log p) time.

An exact complexity analysis of the parallel ES algorithm in comparison to the sequential ES algorithm should be done with care.
    Procedure Parallel_ES
      k ← 0
      randomly generate population P(k) with μ individuals
      partition the individuals in P(k) into p subsets
      assign a unique subset of size μ′ = μ/p to each processor
      For every processor do in parallel
        evaluate local population
      endfor
      While (k < Γ) do
        For every processor do in parallel
          k ← k + 1
          T ← local individuals          /* copy current generation */
          use local individuals to create μ′ children
          T ← T ∪ {children}
          evaluate T
          P(k) ← μ′ fittest individuals from T
          if (k mod g == 0) then Call global-fittest
        endfor
      endwhile

Figure 1: An Outline of a Parallel ES Algorithm.

Our parallel ES algorithm requires the introduction of new genetic material from a partitioned population; the sequential ES algorithm has only a single population to process. Also, our parallel algorithm requires broadcasting the λ globally fittest individuals to all population partitions after every g generations; again, this makes no sense for the sequential ES algorithm with its single population partition. Nonetheless, an approximate analysis can be performed as follows.

We assume that both the sequential and parallel ES algorithms are run for the same number of generations (i.e., Γ). The sequential ES algorithm executes a single generation in τ(μ, N) time, where μ is the population size and N is the size of an individual in the population. (In the PM problem, this size equals the number of preventive maintenance tasks that must be scheduled.) This gives a total execution time of Γ · τ(μ, N). For the parallel ES algorithm, each processor contains a partitioned population of μ/p individuals, so the time to evaluate a local population on a processor is τ(μ/p, N). Since each processor works on this population for g generations, the total execution time prior to an introduction of new genetic material is O(g · τ(μ/p, N)). Given λ local fittest individuals at every processor, selection and broadcasting of the globally fittest individuals in the p-processor hypercube requires O(λ log p) time, and this is repeated Γ/g times. It is now easy to see that the total execution time of the parallel ES algorithm is

    O((Γ/g)(g · τ(μ/p, N) + λ log p)),

resulting in a speedup of

    O(τ(μ, N) / (τ(μ/p, N) + (λ log p)/g)).

In a number of problems where genetic algorithms or evolutionary strategies are used, the time τ(μ, N) to evaluate a single generation is O(μN) (certainly true in our PM problem of Section 4). Using this, the speedup is

    O(μN / (μN/p + (λ log p)/g)).

In the worst case we compute the globally fittest individuals every generation (i.e., g = 1). Setting λ to its maximum value of 0.1μ′ = 0.1μ/p, the speedup can be calculated as

    O(p / (1 + 0.1 (log p)/N)).

For N > log p the speedup is bounded by O(p/1.1).
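To make the bound concrete, the short sketch below (ours, not from the paper) evaluates the speedup model under the stated assumptions τ(μ, N) = μN, g = 1, and λ = 0.1μ/p; the base-2 logarithm is an assumption reflecting hypercube broadcast depth.

    import math

    def predicted_speedup(p, mu, N, lam, g):
        """Speedup model of Section 3 with tau(mu, N) = mu * N:
        tau(mu, N) / (tau(mu/p, N) + lam * log2(p) / g)."""
        return (mu * N) / ((mu / p) * N + lam * math.log2(p) / g)

    # Worst case: migrate every generation (g = 1) with the maximum
    # migrant count lambda = 0.1 * mu' = 0.1 * mu / p.
    mu, N = 200, 256
    for p in (2, 4, 8, 16, 32, 64):
        s = predicted_speedup(p, mu, N, lam=0.1 * mu / p, g=1)
        print(f"p = {p:2d}: predicted speedup = {s:6.2f}   (p/1.1 = {p / 1.1:5.2f})")

For these parameters N ≫ log p, so the predicted speedup stays close to the ideal p, consistent with the O(p/1.1) bound.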

4 The PM Problem

To evaluate the performance of our parallel ES algorithm, we took an instance of the PM problem. In this problem there exist N distinct preventive maintenance tasks that must be scheduled for execution. For convenience, the tasks are assigned distinct integer labels from the set {1, ..., N}. Each task has a known completion time and a known set of skills that technicians must have to perform the task. For example, a PM task may require the assignment of an electrician for 3 hours and a mechanical technician for 1 hour. Given a set of technicians (with specified skills), the goal is to assign them to the N PM tasks such that all tasks are completed in minimal time. It is well known that this problem is NP-hard to solve.

For our tests we assumed that there are N = 200 tasks and 20 technicians available for assignment. Tasks were randomly assigned a completion time of between 10 and 20 time units. Each task required the assignment of 1 to 5 technical people. These technical people could be electricians, mechanics, or electro/mechanics. (The latter individuals can be assigned to either an electrician's task or a mechanic's task.) A PM task could not commence until all of the technical people required to perform that task were available. Once assigned, a technician would remain with that PM task until that particular skill was no longer required. A technician could be assigned to only one PM task at a time.

Recall that the genotype is a data structure that encodes the problem's parameters. For the PM problem the data structure is an N-element list of integers corresponding to the N tasks. The left-to-right order of the list specifies the order in which tasks are to be assigned available personnel. Let L denote the set of available technical people. The cardinality of this set varies as PM tasks are executed, since all personnel assigned to a PM task might not be needed for the same amount of time. For example, a mechanic could be required to work for 5 time units while an electrician might be required to work for only 2 time units. Once a technician has completed his or her part of a PM task, (s)he is "returned" to L to await assignment to another task.

The schedule for a specific ordering of PM tasks is computed as follows. The first task in the integer list is selected and all needed personnel for this task are taken from L. If L is not empty, all remaining tasks are scanned (from left to right) to see if any other task can be started. Tasks cannot hoard resources, which means all needed personnel must exist in L before a task can begin. Technicians are reassigned to other PM tasks whenever they have completed their current task. Using the example above, the electrician would be returned to L after two time units and could be reassigned; the mechanic would have three more time units of work before (s)he could be reassigned. The above process continues until all PM tasks are completed. The completion time of the last task determines the schedule length (also known as the makespan). The lower the schedule length, the higher the fitness of the phenotype.

The initial population is randomly generated and produces μ phenotypes. New phenotypes are created by mutation. We used two mutation operators. Mutation operator M1 randomly selects two points in the integer list and perturbs the order of all tasks between the two points. The other operator, M2, randomly selects two integers from the integer list and swaps them. Since both operators change the order of tasks in the list, the order of execution is likewise affected. A sketch of the decoding procedure and both operators is given at the end of this section.

Both sequential and parallel versions of the ES were constructed, run, and the results tabulated. For the first set of runs, mutation operators M1 and M2 were applied with probability p1 = 0.7 and p2 = 0.3, respectively. Notice that p1 + p2 = 1, which means all individuals in a population are mutated during each generation. The initial population used for the sequential version was also used for the parallel version to permit a fair comparison of the results. To evaluate the parallel version, the ES was run on a 128-node nCUBE2 hypercube array. The sequential version was run on just one of these processors.
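The decoding procedure and the two operators can be sketched as follows. This is our reading of the description above, not the authors' code: tasks are hypothetically represented as lists of (skill, duration) requirements and technicians as skill labels, with 'EM' marking the electro/mechanics who can cover either kind of work.

    import random

    def makespan(order, tasks, techs):
        """Greedy left-to-right decoding of a genotype (a task permutation).

        order : list of task indices (the genotype)
        tasks : tasks[i] is a non-empty list of (skill, duration) requirements
        techs : list of skill labels, e.g. ['E', 'E', 'M', 'EM', ...]
        """
        free = list(techs)                 # idle technicians (the set L)
        busy = []                          # (release_time, skill) pairs
        pending = list(order)
        now = finish = 0.0

        def take(skill):
            for s in (skill, 'EM'):        # exact skill first, then a generalist
                if s in free:
                    free.remove(s)
                    return s
            return None

        while pending:
            # Start every task (scanned in genotype order) that can be fully
            # staffed from the idle pool -- tasks may not hoard technicians.
            progress = True
            while progress:
                progress = False
                for t in pending:
                    grabbed = []
                    for skill, dur in tasks[t]:
                        s = take(skill)
                        if s is None:                          # cannot fully staff:
                            free.extend(g for g, _ in grabbed) # undo and skip task
                            grabbed = []
                            break
                        grabbed.append((s, dur))
                    if grabbed:
                        busy.extend((now + d, s) for s, d in grabbed)
                        finish = max(finish, now + max(d for _, d in grabbed))
                        pending.remove(t)
                        progress = True
                        break
            if not busy:
                break                      # remaining tasks can never be staffed
            # Advance time to the next technician release.
            now = min(r for r, _ in busy)
            free.extend(s for r, s in busy if r == now)
            busy = [(r, s) for r, s in busy if r > now]
        return finish

    def mutate_M1(order):
        """M1: perturb the order of all tasks between two random cut points."""
        child = order[:]
        i, j = sorted(random.sample(range(len(child)), 2))
        seg = child[i:j + 1]
        random.shuffle(seg)
        child[i:j + 1] = seg
        return child

    def mutate_M2(order):
        """M2: swap two randomly chosen tasks in the list."""
        child = order[:]
        i, j = random.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
        return child

    # Toy usage: three tasks, three technicians, and mutation with
    # p1 = 0.7 (M1) and p2 = 0.3 (M2) as in the runs described above.
    tasks = [[('E', 3), ('M', 1)], [('M', 4)], [('E', 2)]]
    parent = [0, 1, 2]
    print(makespan(parent, tasks, ['E', 'M', 'EM']))          # 5.0
    child = mutate_M1(parent) if random.random() < 0.7 else mutate_M2(parent)
    print(child, makespan(child, tasks, ['E', 'M', 'EM']))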

[Plot: speedup (vertical axis, 1 to 65) versus number of processors (horizontal axis, 1 to 64); the curve is near-linear.]
Figure 2: Speedup on nCUBE2 for Γ = 200, g = 10, μ = 200, N = 256.

Figure 2 shows the typical speedup as the number of processors increases. For these test cases we used μ = 200 and N = 256, and the ES was run for Γ = 200 generations. For p ≥ 2 processors, each processor was assigned μ/p individuals and thus locally ran the ES algorithm. A global search was conducted to determine the fittest individual; after every g = 10 generations, this fittest individual was broadcast to each processor, and each processor then replaced its least fit individual with the received one. From Figure 2 it is clear that near-linear speedup is achievable. This matches well with the speedup bound derived in Section 3.

We did not keep track of the percent improvement in schedule length that the ES produces between the initial and final populations. Such information is difficult to interpret: the initial population is randomly generated, making it theoretically possible for it to contain the globally optimal solution, in which case the ES would produce a 0% improvement! Conversely, the initial population could contain the globally worst possible solution while the final population contains the global optimum.

Figure 3 shows the schedule length as a function of the number of processors for a typical run. Notice that the schedule length decreases and then monotonically increases as the number of processors (and speedup) increases.

[Plot: schedule length (vertical axis, 43500 to 45600 time units) versus number of processors (horizontal axis, 1 to 64); the curve dips and then rises.]
Figure 3: Schedule length on nCUBE2 for Γ = 200, g = 10, μ = 200, N = 256.

If too small a population size is selected, the algorithm will converge too quickly to a local optimum [8]. Recall that the population is evenly distributed among the processors; consequently, each local population becomes smaller as the number of processors increases. This means that, for a fixed-size problem, high speedup may lead to premature convergence.

One can always run the ES for a larger number of generations to see if continued mutation will help avoid premature convergence. We tried running the ES for Γ = 500 generations. The final solution improved by slightly over 2%, but the run time of the algorithm increased by almost 200%.

We also investigated the effect of migrating good individuals between processors less often. A number of runs were conducted with g = 10 and g = 25. Despite the fact that the smaller g leads to increased network traffic, there was no perceptible change in speedup. As expected, the smaller g produced a better solution, since migration occurs more often. However, the quality of the final solution differed by less than 1%, which suggests that the technique is robust with respect to this parameter.

5 Conclusions

We have implemented an instance of the PM problem using both a sequential and a parallel version of an ES. Near-linear speedup was achieved when run on 64 processors of an nCUBE2 parallel processing machine. Our results indicate that there is a correlation between speedup and the quality of the final solution. This trade-off becomes a fundamental decision that must be faced by all users of evolutionary techniques. In fact, it suggests that a parallel implementation may only be appropriate when investigating large problems. Yet this may actually prove to be a benefit: the larger the population size, the higher the likelihood that all regions of the search space are represented. This permits a more thorough investigation, which increases the chances of finding the global optimum.

References

[1] E. Lamagna, "Infeasible Computation: NP-Complete Problems", Abacus, Vol. 4, No. 3, pp. 18-33, Spring 1987.

[2] G. Greenwood, A. Gupta, and V. Mahadik, "Multiprocessor Scheduling of High Concurrency Algorithms", Proc. of Florida AI Research Symp., pp. 265-269, May 1994.

[3] R. Lohmann, "Structure Evolution and Incomplete Induction", Parallel Problem Solving from Nature 2, edited by R. Männer and B. Manderick, Amsterdam: North-Holland, pp. 175-186, 1992.

[4] W. Kuhn and A. Visser, "Identification of the System Parameter of a 6-Axis Robot with the Help of an Evolutionary Strategy", Robotersysteme, Vol. 8, No. 3, pp. 123-133, 1992.

[5] S. Ghosh and R. Gagnon, "A Comprehensive Literature Review and Analysis of the Design, Balancing and Scheduling of Assembly Systems", Int'l Jour. of Prod. Research, 27(4), pp. 637-670, 1989.

[6] C. Farrell, D. Kieronska, and M. Schulze, "Genetic Algorithms for Network Division Problem", 1st IEEE Conf. on Evol. Comp., pp. 422-427, June 1994.

[7] F. Hoffmeister, "Scalable Parallelism by Evolutionary Algorithms", Parallel Comp. and Math. Opt., edited by D. B. Grauer, Berlin: Springer, pp. 177-198, 1991.

[8] D. E. Goldberg, "Sizing Populations for Serial and Parallel Genetic Algorithms", Proc. ICGA-89, pp. 70-79, 1989.

[9] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 2nd ed., Springer-Verlag, 1994.

[10] D. Fogel, Evolutionary Computation, IEEE Press, 1995, pp. 103-104.