COMBINING DISTRIBUTED POPULATIONS AND PERIODIC CENTRALIZED SELECTIONS IN COARSE-GRAIN PARALLEL GENETIC ALGORITHMS

R. Bianchini, C. M. Brown, M. Cierniak, and W. Meira

{ricardo,brown,cierniak,meira}@cs.rochester.edu
Department of Computer Science
University of Rochester
Rochester, New York 14627

Abstract

In this paper we demonstrate that parallel genetic algorithms can profit from performing periodic centralized selections of distributed populations. With this combination, implementations can benefit from the variety of environments provided by distributed approaches, while periodically being able to consider the population as a whole and discard very unfit individuals. We study four different parallel genetic algorithm implementation strategies, each striking a different balance between centralization and distribution. These strategies are applied to several well-known benchmark problems. Our results show that the implementations using periodic centralized selections exhibit remarkable robustness in terms of their search quality, while keeping running time overheads under control. Our main conclusion is that performing centralized selections represents an improvement over traditional population distribution approaches, which are susceptible to somewhat inefficient search.

1 Introduction

Genetic Algorithms (GAs) have proven to be very effective approaches to solving various hard problems in state-space search and optimization. However, some of these algorithms may require a large amount of time to find good problem solutions. A common approach to reducing this excessive execution time is to resort to parallel GAs (e.g. [1, 3, 7, 10, 12]).

Many search quality issues must be considered when designing a parallel GA for a certain application. Although centralization of the population (in the form of centralized selections) can cause premature convergence, it allows for global knowledge of the best individuals at each moment in time. In contrast, in a distributed organization, processors replicate individuals within smaller, local subpopulations. This isolation can cause some loss of computing cycles whenever individual processors spend time working on very unfit individuals or get stuck in local minima (or maxima). Nevertheless, distributed algorithms have the great advantage that the larger number of different environments composing the population (each corresponding to a separate subpopulation) may improve the search.

Many researchers have studied the effects of population distribution on the search quality of parallel genetic algorithms. Neuhaus [10] mentions that a centralized solution to the process mapping problem delivered better search quality than a distributed solution to the same problem. Cohoon et al. [3] present results in which distributed algorithms with migration found better solutions than a sequential GA for optimization problems. Kommu and Pomeranz [9] present the same type of results for the Set Covering Problem. In [12], Tanese concluded that a distributed algorithm without migration searched at least as effectively as a distributed implementation with periodic migrations for several function optimization problems; in these experiments, both distributed implementations found fitter individuals than the sequential GA. Other work (e.g. [4]) considers the local selection of distributed populations as leading to a faster, more diverse, and more robust type of search than its global counterpart. Collectively, these results show that, although distributed populations generally search better and faster, centralized ones can also deliver good results, depending on the problem at hand.

With respect to execution performance, centralization of the population entails potentially large amounts of communication and creates execution bottlenecks, causing serialization and reduced performance. Distributed implementations, on the other hand, lack central bottlenecks and have reduced communication needs.

In this paper we investigate the profitability of using distributed populations with periodic centralized selections. With such a hybrid approach, implementations can benefit from the variety of environments provided by distributed approaches, while periodically being able to consider the population as a whole and to discard very unfit individuals. In order to assess the search quality and execution time performance of parallel GAs with periodic centralized selections, we compare a set of implementation alternatives [2]. Our results show that hybrid strategies are extremely robust and do not degrade running time performance significantly. Our main conclusion is that periodically performing centralized selections represents an improvement over traditional distributed GAs.

The remainder of this paper is organized as follows. Section 2 describes the different parallel GA strategies we study. Section 3 describes our benchmark suite, the parameters used in our experiments, and our main search quality and execution time results. Section 4 concludes the paper.

2 Implementation Strategies

Table 1 summarizes the main characteristics of the different parallel GA implementation strategies we study. Our first strategy, called Centralized (Cent), is organized in a master-slave fashion, interconnected logically as a star, with the master in the center. The computation proceeds with the master and each slave processor performing reproduction, crossovers, and mutations on different groups of individuals for a certain number of generations. After computing independently for a while, the slaves communicate with the master so that it can perform a completely centralized reproduction. After each centralized reproduction phase, the population is redistributed for one more round of independent computations.
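To make this control flow concrete, the following is a minimal single-process sketch of Cent under stated assumptions: the helper local_step is a toy GA generation using truncation reproduction (purely for brevity; the paper's actual selection scheme is described in Section 3), and the parallel master/slave message passing is modeled with plain Python lists rather than the Mercury runtime.

```python
import random

def local_step(subpop, fitness, p_mut=0.01):
    """One toy GA generation on a subpopulation (minimization)."""
    ranked = sorted(subpop, key=fitness)
    parents = ranked[:max(2, len(ranked) // 2)]    # keep the fitter half
    children = []
    while len(children) < len(subpop):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))           # one-point crossover
        children.append([bit ^ (random.random() < p_mut)
                         for bit in a[:cut] + b[cut:]])
    return children

def cent(population, n_procs, rounds, local_gens, fitness):
    size = len(population) // n_procs
    groups = [population[i * size:(i + 1) * size] for i in range(n_procs)]
    for _ in range(rounds):
        # Master and slaves evolve their groups independently for a while.
        for _ in range(local_gens):
            groups = [local_step(g, fitness) for g in groups]
        # The slaves then report back, and the master reproduces over the
        # *whole* population, discarding the globally most unfit half
        # (a toy stand-in for the centralized reproduction phase).
        whole = sorted((ind for g in groups for ind in g), key=fitness)
        whole = whole[:len(whole) // 2] * 2
        random.shuffle(whole)                       # redistribute the population
        groups = [whole[i * size:(i + 1) * size] for i in range(n_procs)]
    return min((ind for g in groups for ind in g), key=fitness)
```

For example, cent([[random.randint(0, 1) for _ in range(32)] for _ in range(72)], 8, 20, 5, sum) would evolve 8 logical groups with a centralized reproduction every 5 local generations.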

Strategy    Topology            Comm Style                      Centralized Selections?
Cent        star                master-slave                    yes
Semi-Dist   clusters of stars   master-slave + near-neighbor    yes
Dist        torus               near-neighbor                   no
Tot Dist    torus               –                               no

Table 1: Main Implementation Characteristics

Our second implementation strategy, Semi-Distributed (Semi-Dist), provides a greater degree of data distribution than the centralized algorithm, while alleviating the performance bottleneck caused by that algorithm's single master processor. The idea is to have clusters of processors that work as in the centralized implementation. Clusters can be made sufficiently small that contention is relieved. The clusters' masters are connected to form a (logical) torus topology. Communication between masters happens with some desired frequency, exchanging some of the best individuals between clusters of processors.

In contrast with the aforementioned algorithms, our Distributed (Dist) implementation does not perform centralized selections; it is based on the traditional organization of parallel GAs for distributed-memory architectures (e.g. [12]). There are no shared populations; each processor holds a piece of the total population and runs a complete sequential GA on it. Communication happens periodically, when processors send some of their best individuals to their nearest neighbors. Each processor includes the individuals it receives in its population. The processor interconnection topology we use in this implementation is again (logically) a torus.

Finally, our Totally Distributed (Tot Dist) strategy is a simple variation of the distributed implementation in which individuals are never transferred between processors. Processors communicate only in the initialization and termination phases of the algorithm.
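A sketch of one Dist migration phase on a logical torus follows. Two details are assumptions, as the text does not specify them at this level: the grid is rows x cols with one subpopulation per node, and incoming migrants replace the locally worst individuals so subpopulation sizes stay fixed.

```python
def migrate(subpops, rows, cols, fitness):
    """One near-neighbor migration phase on a logical rows x cols torus
    (minimization: lower fitness is better)."""
    # Snapshot each node's best individual first, so all sends use the
    # pre-migration populations, as concurrent message passing would.
    best = [min(p, key=fitness) for p in subpops]
    for r in range(rows):
        for c in range(cols):
            me = r * cols + c
            neighbors = [((r - 1) % rows) * cols + c,   # north
                         ((r + 1) % rows) * cols + c,   # south
                         r * cols + (c - 1) % cols,     # west
                         r * cols + (c + 1) % cols]     # east
            incoming = [best[n] for n in neighbors]
            # Include the received individuals, dropping the locally worst
            # ones to keep the size constant (an assumption; the paper only
            # says received individuals are included in the population).
            keep = sorted(subpops[me], key=fitness)[:len(subpops[me]) - len(incoming)]
            subpops[me] = keep + incoming
    return subpops
```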

3 Experimental Results

Our evaluation is centered on a number of well-known benchmark problems:



DeJong's five function minimization problems (f1–f5) [8]:

f_1(x) = \sum_{i=1}^{3} x_i^2

f_2(x) = 100 (x_1^2 - x_2)^2 + (1 - x_1)^2

f_3(x) = \sum_{i=1}^{5} \mathrm{integer}(x_i)

f_4(x) = \sum_{i=1}^{30} i x_i^4 + \mathrm{Gauss}(0, 1)

f_5(x) = 0.002 + \sum_{j=1}^{25} \left[ j + \sum_{i=1}^{2} (x_i - a_{ij})^6 \right]^{-1}

Traveling Salesman Problem (TSP). TSP consists of finding the shortest route through n cities, such that each city is visited exactly once. Our instance of TSP involves 58 cities.

0-1 Integer Linear Programming problem (ILP). ILP [11] is an NP-complete optimization problem in which one wants to minimize a linear function of the form f(x_1, \dots, x_n), subject to a set of linear constraints, where the variables (x_1, \dots, x_n) can only take the values 0 or 1. The problem can be stated as the minimization of

f = \sum_{j=1}^{n} c_j x_j,

subject to the constraints

\sum_{j=1}^{n} a_{ij} x_j \le b_i,

where the a_{ij}, b_i, and c_j are real constants, i = 1, \dots, m and j = 1, \dots, n. Our instance of this problem assumes n = 50 and m = 10.
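The paper does not say how constraint violations are handled when ILP candidates are evaluated by the GA. A common approach, sketched below under that assumption, is to penalize the objective in proportion to the total violation; the coefficients here are random stand-ins for the (unstated) instance data.

```python
import random

n, m = 50, 10                      # instance dimensions used in the paper
random.seed(0)
c = [random.uniform(-10, 10) for _ in range(n)]                    # objective
A = [[random.uniform(0, 5) for _ in range(n)] for _ in range(m)]   # constraint matrix
b = [random.uniform(0, 100) for _ in range(m)]                     # bounds

def ilp_cost(x, penalty=1000.0):
    """Penalized objective for a 0-1 vector x of length n (lower is better)."""
    obj = sum(cj * xj for cj, xj in zip(c, x))
    violation = sum(max(0.0, sum(A[i][j] * x[j] for j in range(n)) - b[i])
                    for i in range(m))
    return obj + penalty * violation
```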

Our implementations use binary encoding and perform simple (one-point) crossover and mutation on randomly picked individuals. Evolution is generational with an elitist selection strategy based on the stochastic remainder without replacement method [6].
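A compact sketch of these operators follows, assuming individuals are Python lists of 0/1 bits and a maximization fitness; the stochastic remainder step follows the textbook description in [6], and the elitist step (carrying the best individual into the next generation) is omitted for brevity.

```python
import random

def one_point_crossover(a, b):
    """Simple (one-point) crossover of two equal-length bitstrings."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(ind, p=0.01):
    """Independent bit-flip mutation with probability p per bit."""
    return [bit ^ (random.random() < p) for bit in ind]

def stochastic_remainder_selection(pop, fitness):
    """Each individual enters the mating pool floor(e_i) times, where
    e_i = f_i / f_avg, plus once more with probability frac(e_i)."""
    f = [fitness(ind) for ind in pop]
    avg = sum(f) / len(f)
    pool = []
    for ind, fi in zip(pop, f):
        e = fi / avg
        pool += [ind] * int(e)                   # deterministic part
        if random.random() < e - int(e):         # stochastic remainder
            pool.append(ind)
    while len(pool) < len(pop):                  # rare shortfall from rounding
        pool.append(random.choice(pop))
    random.shuffle(pool)
    return pool[:len(pop)]
```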

[Figure omitted: best fitness (log scale, 1e-06 to 100) versus generations (0 to 4000), with curves for Cent, Dist, Semi-Dist, Tot Dist, and Seq.]

Figure 1: Search Quality on f4.

Our experiments were run on an 8-processor SGI Challenge using the Mercury runtime library [5] for parallelism. We present a summary of the results of 15 runs for each of the benchmarks; each run corresponds to a different random seed controlling the search. Each experiment was run for 2000 generations, except for the ones optimizing f4 and TSP, which were run for 4000 generations. Crossover and mutation probabilities were 0.8 and 0.01, respectively. Population sizes roughly correspond to the complexity of the landscape being searched (72 for f1, 96 for f2, 168 for f3, and 216 for f4, f5, ILP, and TSP). We studied several communication frequencies, from 5 to 40 generations between centralized reproductions (Cent and Semi-Dist) or near-neighbor migrations (Dist). The frequency of communication between the two master processors in Semi-Dist was kept constant, at once every 50 generations. In Dist, each processor sends its best individual to each neighbor during the migration phase.

Our results show that the strategies with periodic centralized selections exhibit remarkable robustness in terms of their search quality. In cases where the sequential GA achieves good performance (f2 and f4), Cent and Semi-Dist approach or surpass that performance. Where the more distributed strategies achieve the best performance, i.e. where the sequential GA performs poorly (f1, f3, f5, ILP, and TSP), the implementations with centralized selections closely approach those results.
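For reference, the experimental parameters above can be collected into a single configuration sketch (the dictionary layout itself is illustrative, not from the paper):

```python
PARAMS = {
    "runs": 15,                          # random seeds per benchmark
    "generations": {"default": 2000, "f4": 4000, "TSP": 4000},
    "p_crossover": 0.8,
    "p_mutation": 0.01,
    "population_size": {"f1": 72, "f2": 96, "f3": 168,
                        "f4": 216, "f5": 216, "ILP": 216, "TSP": 216},
    "comm_intervals": [5, 10, 40],       # generations between communications
    "master_exchange_interval": 50,      # Semi-Dist master-to-master period
}
```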

[Figure omitted: best fitness (log scale, 16384 to 131072) versus generations (0 to 4000), with curves for Cent, Dist, Semi-Dist, Tot Dist, and Seq.]

Figure 2: Search Quality on TSP.

As representative examples of these two scenarios, Figures 1 and 2 present the overall best fitness at each generation, averaged over 15 different runs, for each of our implementation strategies running on 8 processors. Figure 1 shows the results achieved for f4. In these experiments Cent and Semi-Dist perform master-slave communication every 5 generations, while near-neighbor migrations are performed every 10 generations in Dist. Figure 2 shows the best results achieved for TSP. For this program near-neighbor migrations are again performed every 10 generations; best performance for Cent and Semi-Dist was also achieved with this frequency of communication between master(s) and slaves.

As an overview of our results, Table 2 presents the implementation and period between communications that achieved the best search quality (in terms of the average of the best fitnesses over 15 runs) for each of our benchmarks. The results in the table show that, besides being robust, implementations with periodic centralized selections frequently deliver the best overall solutions for our benchmarks. Dist also exhibits good performance for several benchmarks, while Tot Dist invariably delivers significantly worse search quality than all the other implementations.

Benchmark   Strategy/Interval Between Comms
f1          Dist/40
f2          Cent/5
f3          Semi-Dist/40; Dist/5,10,40
f4          Seq
f5          Dist/5,10,40
ILP         Cent/5,10,40; Semi-Dist/5,10,40; Dist/5,10,40
TSP         Dist/5

Table 2: Average of Best Fitnesses

In addition to good search quality, parallel GAs must also deliver good running time performance. Table 3 presents the execution times of each implementation optimizing f3 on 8 processors. The results in this table show that, at least for the small machine configurations we studied, centralized selections do not entail significant performance degradation. The greatest performance difference between Dist and the implementations with centralized selections is only about 12%. Note also that Semi-Dist performed slightly worse than Cent, even though the former strategy is supposed to address a bottleneck of the latter. The reason for this result is that Semi-Dist has a more complicated organization than Cent, and our multiprocessor is based on a shared bus, which renders Semi-Dist's clusters ineffective. On multiprocessors with scalable interconnection media, we expect Semi-Dist to easily surpass Cent's execution time performance.

4 Conclusions

Our main conclusion is that performing centralized selections represents an improvement over traditional population distribution approaches. Strategies performing periodic centralized selections are extremely robust and frequently achieve superior search results, while keeping execution time overheads under control.

Strategy    Comm Freq   Run Time   Speedup
Seq         –           17.1       1.0
Cent        5           8.1        2.1
            10          7.5        2.3
            40          7.0        2.4
Semi-Dist   5           8.1        2.1
            10          7.6        2.3
            40          7.4        2.3
Dist        5           7.5        2.3
            10          7.2        2.4
            40          7.2        2.4
Tot Dist    –           6.7        2.6

Table 3: Running Times (in Seconds) and Speedups for f3

Acknowledgements

We would like to thank Leonidas Kontothanassis for his help with the Mercury runtime library. This work was supported by Brazilian CAPES, CNPq, and NUTES/UFRJ fellowships, the National Science Foundation under grants IRI-8920771 and CDA-8822724, and DARPA contract MDA972-92-J-1012. The Government has certain rights in this material.

References

[1] S. Baluja. Structure and performance of fine-grain parallelism in genetic search. In Proc. of the 5th Int. Conf. on Genetic Algorithms and their Applications, pages 155–162, 1993.
[2] R. Bianchini and C. Brown. Parallel genetic algorithms on distributed-memory architectures. In Transputer: Research and Applications, May 1993.
[3] J. P. Cohoon, W. N. Martin, and D. S. Richards. A multi-population genetic algorithm for solving the k-partition problem on hyper-cubes. In Proc. of the 3rd Int. Conf. on Genetic Algorithms and their Applications, 1989.
[4] R. J. Collins and D. R. Jefferson. Selection in massively parallel genetic algorithms. In Proc. of the 4th Int. Conf. on Genetic Algorithms and their Applications, pages 249–256, 1991.
[5] R. J. Fowler and L. I. Kontothanassis. Mercury: Object-affinity scheduling and continuation passing on multiprocessors. In Parallel Architectures and Languages Europe (PARLE) '94, June 1994.
[6] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
[7] J. J. Grefenstette. Parallel adaptive algorithms for function optimization. Technical Report CS-81-19, Vanderbilt University, Computer Science Dept., Nashville, 1981.
[8] K. A. De Jong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, University of Michigan, 1975.
[9] V. Kommu and I. Pomeranz. Effect of communication in a parallel genetic algorithm. In Proc. of the 1992 Int. Conf. on Parallel Processing, pages III:310–317, 1992.
[10] P. Neuhaus. Solving the mapping-problem – experiences with a genetic algorithm. In Parallel Problem Solving from Nature, pages 170–175. Springer-Verlag, 1991.
[11] H. A. Taha. Integer Programming — Theory, Applications and Computations. Academic Press, 1975.
[12] R. Tanese. Distributed genetic algorithms. In Proc. of the 3rd Int. Conf. on Genetic Algorithms and their Applications, 1989.
