Spatial Genetic Algorithm and Its Parallel Implementation¹

Andrzej Broda and Witold Dzwinel
Institute of Computer Science AGH, Al. Mickiewicza 30, 30-059 Cracow, Poland
dzwinel@uci.agh.edu.pl

Abstract. The spatial genetic algorithm (SGA) is presented. Locality is realized by mapping the GA population onto a cellular automaton. The role of neighborhood in genetic search is shown by comparing SGA with the parallel recombinative simulated annealing (PRSA) approach proposed by Mahfoud and Goldberg in [1]. It appears that even an unoptimized SGA outperforms PRSA on the loose optimization problems defined in [1] and is only slightly worse on the tight ones. Moreover, owing to the high potential parallel speedup of SGA and the possibility of domain decomposition afforded by its locality, an efficient parallel realization of SGA appears more promising than one of PRSA. The SGA opens the way for investigating more sophisticated neighborhood definitions based on the lattice-gas and molecular-dynamics paradigms, which scale well (in the parallel-computing sense) and can simulate the natural space-and-time environment lacking in pure GAs.

1 Introduction

Disruption of good subsolutions by the mutation and crossover operators is the main disadvantage of genetic algorithms (GAs). A tradeoff between diversity and performance consists in controlling the frequency of disruption events. Maintaining diversity is a principal demand. Although crossover and mutation generate new solutions, certain limitations follow from two facts: crossing nearly identical strings yields offspring similar to the parents, and mutation examines the full solution space but may take an extensively long time to reach an acceptable solution. The parallel recombinative simulated annealing (PRSA) approach proposed by Mahfoud and Goldberg in [1] can be interpreted as a method whose goal is to raise diversity while simultaneously limiting the number of worthless solutions (population members). This is done by maintaining the Boltzmann distribution in the population and using the SA (simulated annealing) cooling paradigm. Introducing minimization of a cost function (energy) in addition to fitness reinforces the search process, making the recombination and mutation operators "energy directed". As proved in [1], this makes the search convergent to the global solution (for a suitable, problem-dependent cooling schedule). Interestingly, for relatively large populations the solution is obtained before the cooling threshold is reached; for smaller populations, the "energy directed" mutation is able to find the proper solution. The tradeoff between the "GA factor" and the "SA factor" lets us find the conditions of best performance of the PRSA algorithm. The priceless advantages of PRSA over both pure GA and SA are as follows:

1. Unlike GAs, it provides the possibility of obtaining a good solution for problems characterized by long strings, for which GAs require huge populations to obtain a reasonable result. As shown in [1], PRSA is able to find the solution of the 24-bit, eight-subfunction, loose version of Goldberg's order-3 deceptive problem for a small population (n of the order of 10) in a reasonable period of time, while pure GA failed even for n=5000 [1].

2. It introduces the population and implicit parallelism into the SA paradigm, which, via the polynomial increase of the best partial solutions (schemata), speeds up the search. Unlike SA, PRSA is inherently parallel, which enables a high performance for a parallel implementation of the algorithm [1].

However, the diversity introduced by the SA paradigm is computationally expensive and requires hand optimization not only of the GA parameters (probabilities of recombination and mutation, number of population members, etc.) but of the SA parameters as well: cooling period, cooling constants, and the starting, switching and final temperatures. Another approach, complementary to PRSA, is to introduce locality by mapping onto a cellular automaton. This idea was presented in [2,3] as a way of facilitating parallelization.

¹This work is sponsored by the Polish Committee of Scientific Research (KBN), Grant No. KBN 8 S503 006 07.
In fact, despite the intrinsic parallelism of GAs, they are still difficult to parallelize to a level of high scalability. This is a consequence of the fact that GAs (PRSA included) use global knowledge in their selection process. Therefore, their parallel (synchronous) versions use a master-slave approach, in which the master sends the current population to the slaves, receives it back, and randomly shuffles pointers to population elements. Taking into account that for large-scale computations in a multiprocessor environment all population members have to travel between master and slaves, both the limited memory of the master processor and the transmission delay become bottlenecks of efficient massively parallel processing. On the other hand, asynchronous parallel processing (often used in GAs), in which population members migrate between processors, suffers from slow convergence due to migration [1] and scales poorly with an increasing number of processors. Mapping the population onto a spatial structure resolves the scalability problem, as will be shown in the course of this paper. Moreover, the mapping provides a new technique for ensuring diversity in the GA population. In this paper the SGA (spatial genetic algorithm) is proposed and compared with PRSA on the 24-bit, eight-subfunction, loose and tight versions of Goldberg's order-3 deceptive problem. Both serial and potential parallel speedups are compared. Two parallel versions of SGA are discussed. Finally, possible directions for future research towards a more adaptive and problem-independent algorithm are suggested.

2 The algorithm

Let us assume that N is the number of population members, G the number of generations, and ℜ a square array of size n = Sz×Sz. The array is closed by periodic boundary conditions, i.e., the closest neighbors of edge elements are located on the opposite sides of the array. In Fig.1 the neighborhood of an element C is shown. It consists of the eight closest neighbors of C, marked S.
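The periodic neighborhood can be sketched in a few lines of Python (an illustration, not code from the paper): index arithmetic modulo the array side makes edge cells neighbor the opposite edge.

```python
# Moore neighborhood (the 8 closest cells S of Fig.1) on an Sz x Sz array
# with periodic boundaries: indices wrap, so edge cells neighbor the
# opposite side of the array.
def neighbors(row, col, sz):
    """Return the 8 periodic neighbors of cell (row, col) on an sz x sz grid."""
    return [((row + dr) % sz, (col + dc) % sz)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

# A corner cell of the 5x5 array of Fig.1 wraps to the opposite edges:
print(neighbors(0, 0, 5))
```

For cell (0,0) the list contains, among others, (4,4), i.e., the opposite corner, exactly as the periodic boundary conditions demand.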


Fig.1. The ℜ(5×5) array. 25 population members are mapped onto all the elements of the array. The closest neighbors of element C are marked by S.

Spatial genetic algorithm:

(A) Generate N members of an initial population and map each element of the population onto a randomly selected, void element of the ℜ array.

(B) Repeat for i = 1,...,G: inspect all elements j (population members) of the ℜ array and replace the current genotype C in j by the genotype found as follows:
(1) Out of the closest neighbors of j (see Fig.1) and j itself, select two individuals A and B. The selection probability is an increasing function of the fitness of the neighboring individuals and of j.
(2) Substitute C by the better individual of A and B.
(3) Substitute C by one of the two possible results of the crossover of A and B (either one), with probability pc (pc=0.5 will be considered).
(4) Mutate C with probability pm (pm=0.5 will be considered).

Owing to the disruptive effects of the genetic operators, GAs are inclined to lose solutions and their substructures or bits [4]. Upon disruption, the pure GA will not maintain old but better solutions, especially those for which below-average but promising schemata must be discarded. This effect can be overcome only by using larger population sizes. Even so, the promising, low-order, wide-span schemata have little chance of surviving. On the one hand, after introducing the neighborhood, the exponential increase in the number of above-average schemata, typical of the simple GA, may no longer hold (a population member is compared only to its neighbors, not to the whole population). On the other hand, a higher diversity of the population can be expected. In general, better schemata will not die out but will dominate their surroundings. Their propagation will be steady but inevitable, and not dependent on their advantage in fitness. In order to prevent a better solution from dominating the others too fast, the fitness function of a pure GA is usually rescaled to balance the individuals; it may then happen that an above-average individual is lost after such "equalization". Unlike in GAs, here an individual is safe until a better one is found in its vicinity. This gives it time for self-improvement and, if possible, for elaborating a better solution by the moment of confrontation. Similarly, the best individuals do not die out, because they are better than their neighbors. As a result, there is little danger of the population being dominated by a superior individual too fast, and little risk of losing an individual which is only slightly better than its neighbors. Because long schemata are easier to disrupt, only short schemata play a considerable role in classical GAs. Since an SGA individual is surrounded by its "relatives", a process of reconstruction of disrupted schemata is more likely. This fact was confirmed in the SGA tests and experiments with loose problems presented in the course of this paper. As stated in [1], GAs usually miss good solutions represented by schemata of wide span. It will be shown in the next section that SGA not only uses wide-span schemata to find the solution but also makes the search faster than the PRSA proposed by Mahfoud and Goldberg.
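Steps (A)-(B) above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the OneMax placeholder fitness, the one-bit mutation, and the fitness-proportionate selection details are our assumptions (the paper optimizes Goldberg's order-3 deceptive function and does not specify these details).

```python
import random

SZ = 5              # grid side; Fig.1 uses a 5x5 array
L = 24              # genotype length (24-bit problem of Section 3)
PC, PM = 0.5, 0.5   # crossover / mutation probabilities used in the paper

def fitness(geno):
    # Placeholder fitness (OneMax: count of ones); the paper uses
    # Goldberg's order-3 deceptive function instead.
    return sum(geno)

def step(grid):
    """One SGA generation: every cell j is updated from its 3x3 neighborhood."""
    new = [row[:] for row in grid]
    for r in range(SZ):
        for c in range(SZ):
            # (1) fitness-proportionate selection of A and B among
            # the 8 periodic neighbors of j and j itself
            cells = [grid[(r + dr) % SZ][(c + dc) % SZ]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            w = [fitness(g) + 1e-9 for g in cells]   # avoid all-zero weights
            a, b = random.choices(cells, weights=w, k=2)
            # (2) take the better of A and B
            child = max(a, b, key=fitness)[:]
            # (3) one of the two one-point crossover results, with prob. pc
            if random.random() < PC:
                cut = random.randrange(1, L)
                child = random.choice([a[:cut] + b[cut:], b[:cut] + a[cut:]])
            # (4) mutate one random bit with probability pm (one reading
            # of "mutate C with pm probability")
            if random.random() < PM:
                i = random.randrange(L)
                child[i] ^= 1
            new[r][c] = child
    return new

random.seed(0)
grid = [[[random.randint(0, 1) for _ in range(L)] for _ in range(SZ)]
        for _ in range(SZ)]
for _ in range(20):
    grid = step(grid)
```

Note the synchronous update into a fresh array `new`, which mirrors the cellular-automaton reading of the algorithm; an in-place sweep would be an equally plausible asynchronous variant.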

3 SGA against PRSA

The tests were carried out for the global minimum search of the 24-bit, eight-subfunction version of Goldberg's order-3 deceptive problem defined in [1]. The 24-bit function is a sum of eight 3-bit subfunctions, each representing an order-3 deceptive problem. This choice is justified by the fact that deceptive problems present the hardest challenge for GAs [4]. Two representations were tested:

tight: aaabbbcccdddeeefffggghhh
loose: abcdefghabcdefghabcdefgh

The loose version maximally separates the bits of the subfunctions, forcing the algorithm to work on long schemata. Because classical GAs are not able to solve this problem [1], only PRSA will be considered in the confrontation with SGA. In Fig.2 and Tabs. 1 and 2 the comparison between the serial implementations of PRSA and SGA is shown. "FE" stands for the number of function evaluations and "SS" denotes the serial speedup relative to the pure SA algorithm. The results for PRSA (see Fig.2 and Tab.1) presented in [1] were obtained after tuning the algorithm and testing different cooling speeds; they therefore represent the best PRSA realization. However, this fact is a potential source of trouble with the self-adaptability of the algorithm to other optimization tasks and may result in considerable deceleration. In contrast, SGA does not involve any tuning.
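The two representations above can be made concrete in Python. The subfunction values below are a commonly cited variant of Goldberg's order-3 deceptive function and are an assumption here; the exact values used in [1] may differ. The problem is stated as a minimization in the paper; equivalently (up to sign) we maximize the subfunction sum, whose optimum lies at the all-ones string while low-order schemata deceptively point toward all-zeros.

```python
# A commonly cited variant of Goldberg's order-3 deceptive subfunction
# (assumed for illustration; [1] may use different values). The optimum
# is f(111)=30, but the deceptive attractor is 000 with f(000)=28.
F3 = {(0, 0, 0): 28, (0, 0, 1): 26, (0, 1, 0): 22, (0, 1, 1): 0,
      (1, 0, 0): 14, (1, 0, 1): 0,  (1, 1, 0): 0,  (1, 1, 1): 30}

def eval24(bits, layout):
    """Sum the eight 3-bit subfunctions under a given bit layout.

    layout[k] lists the three string positions holding subfunction k's bits.
    """
    return sum(F3[tuple(bits[i] for i in layout[k])] for k in range(8))

# tight: aaabbbccc... -> subfunction k owns adjacent positions 3k, 3k+1, 3k+2
tight = [(3 * k, 3 * k + 1, 3 * k + 2) for k in range(8)]
# loose: abcdefghabcdefgh... -> subfunction k owns positions k, k+8, k+16
loose = [(k, k + 8, k + 16) for k in range(8)]

print(eval24([1] * 24, tight))   # global optimum value, 8 * 30
```

The layouts show why the loose version is hard: each subfunction's bits span 17 string positions, so one-point crossover almost always separates them.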


Moreover, apart from the mutation (pm) and crossover (pc) probabilities, it does not possess any parameters to be set. The values of these two parameters are arbitrary (pc=0.5 and pm=0.5), which means they may not be optimal. This fact, together with certain convergence for populations n≥36 (a limit which is intuitively obvious given the specificity of the algorithm), makes SGA a convenient and versatile tool for constructing an adaptive genetic algorithm in the future (like the adaptive simulated annealing proposed by Ingber [5]).

Tab.1. FE and SS for PRSA for different populations n

  n    Tight FE  Tight SS   Loose FE  Loose SS
  2     195791     2.69      205456     2.56
  4     127492     4.13       92192     5.71
  8     112341     4.69      158784     3.31
 16      94634     5.56      169600     3.10
 32      53280     9.88      193690     2.72
 64      16672    31.5       394752     1.33
128        -        -           -         -
256        -        -           -         -
512        -        -           -         -

Tab.2. FE and SS for SGA for different populations n

  n    Tight FE  Tight SS   Loose FE  Loose SS
 36      46933    11.21       75107     7.01
 64      32154    16.37       60659     8.68
100      21740    24.21       64150     8.20
144      17323    30.38       71280     7.38
196      22070    23.85       74500     7.06
225      17055    30.86       67073     7.05
256      18355    28.68       89446     5.88
324      18468    28.50       88614     5.94
400      19000    27.70      107360     4.90
484      26088    20.18      125501     4.19
625      32813    16.04      118500     4.44

Fig.2. PRSA versus SGA for the tight and loose problems (y axis: number of function evaluations needed to obtain the global minimum; x axis: population size). As shown in Fig.2 and Tabs. 1 and 2, the best realization of PRSA (for n=64) is slightly better than the best SGA one (for n=144) on the tight problem. As reported in [1], for PRSA and n>64 the number of FE increases considerably (unfortunately Mahfoud and Goldberg do not give any FE values for n>64), while for SGA the FE count is relatively stable over a wide range of population sizes. However, the striped version of PRSA with a fast-quenching cooling schedule [1] gives FE=6000 for n=128, which shows the strength of PRSA tuning.


Nevertheless, the strength of SGA consists in the efficient processing of wide-span schemata and in explicit, highly scalable geometric (domain) parallelism. The former explains the results obtained by SGA for the loose problem: the best SGA result outdoes the best PRSA one by about 50% (and, according to [1], pure GA is not able to solve the loose problem even for n=5000!). Moreover, the performance of SGA is stable over a wide range of n, whereas the PRSA leader is almost twice as good as the second best (see Fig.2). Taking the parallel realization into account, SGA proves even better. The maximum potential parallel speedup PS is defined in [1] as the serial speedup multiplied by the maximal number of processors which can theoretically be engaged in the computations for a given n. Assuming that the number of processors is equal to 256, the PS values for the tight and loose problems for SGA and PRSA are shown in Fig.3.
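The PS definition can be made concrete as follows. The SS values are taken from Tab.2; the assumption that SGA can engage one processor per cell of the ℜ array, capped at the 256 machine processors of Fig.3, is ours rather than a statement from the paper.

```python
# Maximal potential parallel speedup as defined in [1]: serial speedup (SS)
# times the number of processors that can theoretically be engaged.
# Assumption (ours): for SGA, one processor per cell, capped at 256.
def parallel_speedup(ss, n, max_procs=256):
    return ss * min(n, max_procs)

print(parallel_speedup(30.38, 144))   # best tight SGA entry of Tab.2, n=144
print(parallel_speedup(28.68, 256))   # tight SGA entry of Tab.2, n=256
```

Under this assumption the tight-problem PS for SGA lands in the thousands, which is consistent with the scale of the Fig.3 axis.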


Fig.3. The comparison of maximal potential parallel speedups for the tight and loose problems for the SA (PS=1), PRSA and SGA algorithms on an ideal (without memory-access problems) 256-processor shared-memory computer.

However, such a theoretical shared-memory approach is currently infeasible, because the problem of simultaneous memory access remains unresolved. Therefore, a message-passing realization on massively parallel computers has to be considered.

4 Parallel realization of SGA

As stated in the introduction, GAs are difficult to parallelize to a level of high scalability, because GAs (PRSA included) use global knowledge in their selection process. Usually, the parallel message-passing (synchronous) versions use a master-slave approach, in which the master sends the current population to the slaves, receives it back, and randomly shuffles pointers to population elements. For large-scale computations all population members have to be sent between master and slaves, so both the limited memory of the master processor and the transmission delay become bottlenecks of efficient massively parallel processing. On the other hand, asynchronous parallel processing (often used in GAs), in which population members migrate between processors, suffers from slow convergence due to migration [1] and scales poorly with an increasing number of processors. In contrast, the parallelization of SGA is straightforward, because its spatial component enables domain decomposition. Two approaches were examined (see Fig.4).
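One natural domain-decomposition scheme, sketched here as an assumption rather than the paper's exact protocol, is a 1-D strip decomposition: each of P workers owns a band of rows, and only the first and last row of each strip must be exchanged with the two neighboring strips per generation, so per-processor communication stays constant as P grows.

```python
# Sketch (assumed scheme): 1-D strip decomposition of the SGA grid over
# n_procs workers, with periodic boundaries as in Section 2.
def strip_bounds(sz, n_procs):
    """Row ranges [lo, hi) owned by each processor for an sz-row grid."""
    base, extra = divmod(sz, n_procs)
    bounds, lo = [], 0
    for p in range(n_procs):
        hi = lo + base + (1 if p < extra else 0)  # spread the remainder
        bounds.append((lo, hi))
        lo = hi
    return bounds

def halo_rows(bounds, p, n_procs):
    """Rows processor p must receive each generation: the last row of the
    strip above and the first row of the strip below (periodic wrap)."""
    above = bounds[(p - 1) % n_procs][1] - 1
    below = bounds[(p + 1) % n_procs][0]
    return above, below

bounds = strip_bounds(10, 4)
print(bounds)                      # row ranges of the four strips
print(halo_rows(bounds, 0, 4))     # rows strip 0 needs from its neighbors
```

Because each cell's update in Section 2 reads only the 3×3 neighborhood, this single-row halo is all the remote data a strip ever needs, which is what makes the decomposition scale.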

