Continuous Selection and Self-Adaptive Evolution Strategies

Thomas Philip Runarsson, Science Institute, University of Iceland ([email protected])

Xin Yao, School of Computer Science, University of Birmingham ([email protected])

Abstract—The intention of this work is to eliminate the need for a synchronous generation scheme in the (µ +, λ) evolution strategy. It is motivated by the need for a more practical implementation of selection strategies on parallel machine architectures. This selection strategy is known as continuous or steady-state selection. Continuous selection is known to significantly reduce the number of function evaluations needed to reach an optimum in evolutionary search for some problems. Here evolution strategy theory is used to illustrate when continuous selection is more efficient than generational selection. How this gain in efficiency may influence the overall effectiveness of the evolution strategy is also investigated. The implementation of continuous selection becomes problematic for algorithms using explicitly encoded self-adaptive strategy parameters. Self-adaptation is therefore given special consideration in this work. The discussion leads to a new evolution strategy version.

I. INTRODUCTION

Evolutionary algorithms using continuous selection add individuals directly to the population without waiting for the conclusion of a discrete generation. This selection method is also known as steady-state selection and, more specifically, (µ + 1) in evolution strategies (ES) [9]. The (µ + 1) selection method is nevertheless rarely used, since an effective method of controlling the mutation strength is lacking [11]. Some recent work on median selection [14] has been proposed to solve this problem. This method uses the covariance matrix adaptation CMA-ES [7], which has been criticized for lacking any resemblance to natural computation [5]. It is not known whether and how median selection is applicable to the more commonly used methods of self-adaptation, like those in [12], [3]. There exist also other evolutionary models, such as the predator-prey approach to multi-objective optimization [8], which use some form of continuous selection. The continuous selection presented in this work is motivated and developed on the basis of available (µ +, λ) ES theory [12], [3]. The common motivations for using continuous selection are:

• The algorithm scales better in a multi-processor environment, as the algorithm is 'continuous' and there is no need to wait for the completion of a discrete generation.
• Accelerated evolution on a single processor computer. By keeping the population up-to-date, the replication of low-fitness individuals may be avoided.
• The approach is more like natural systems, where selection is a continuous process.

In the past there have been arguments against the use of continuous selection. One of them has been mentioned: the problem posed for self-adaptation. Another is that continuous versus generational selection is essentially a change in the exploration/exploitation balance [6]. For global optimization problems a greater exploration of the search space is desirable. However, there are other means of achieving exploration, for example by the proper choice of mutation strength, search distribution, and the number of offspring generated.

The paper is organized as follows. Section II discusses progress rates, which are then used to argue why continuous selection requires fewer function evaluations than generational selection for some problems. This leads to a discussion of the self-adaptation of mutation strengths in section III. Then a new approach to continuous selection, motivated by the need for achieving self-adaptation, is presented in section IV. The continuous selection is demonstrated and compared with ES algorithms using generational selection on some standard unimodal, time-varying, and multimodal test problems in section V. In the final section a summary and conclusions are drawn.

(The first author would like to thank Professor Hans-Paul Schwefel for his constructive comments, which helped to improve the clarity of this paper greatly.)

II. RATE OF PROGRESS

The driving force of any evolutionary algorithm (EA) is the rate of imperfect replication of data structures. The number of times any structure is replicated is limited by selection, determined by its fitness and in some instances its generational age. The data structures' imperfect replication has a probability density function that can either be predefined, self-adaptive, or dependent on the population. Constant rate bit-flip mutations and other fixed mutation strengths are examples of predefined perturbations of a data structure. Parameterized probability distributions, such as those using variable standard deviations, are examples of a self-adaptive perturbation. Recombination operators are examples of perturbations where the adaptation of step lengths and search directions are dependent on a population of data structures. A means of determining whether an EA "works", independently of the data structure type and the way it is perturbed, is by estimating its rate of progress. If the goal is x⃗* and a distance metric d(x⃗, x⃗*) ≥ 0 can be defined, then the rate of progress may be written as follows:

$$\varphi = \mathrm{E}\big[\, d(\vec{x}^{\,(t)}_{\pi(1)}, \vec{x}^{*}) - d(\vec{x}^{\,(t+\Delta t)}_{\pi(1)}, \vec{x}^{*}) \,\big] \qquad (1)$$

where Δt is some elapsed time period. As long as ϕ > 0, the EA "works". The following notation is used: π⁻¹(i) is the rank given to individual i and π(j) is the individual assigned the rank j. Since π(j) is the individual assigned the rank j, the bracket notation π = ⟨π(1), π(2), . . . , π(µ), . . . , π(λ)⟩ corresponds to listing all individuals in their ranked order. This ranking is determined by a fitness function, with individual π(1) as the fittest individual and π(λ) the worst.

In the canonical evolution strategy there are in principle two forms of selection: (µ +, λ). In order to minimize computational costs, in terms of the number of fitness evaluations needed, a population size which maximizes the overall progress is computed. For a single processor computer the optimal λ̂ for a single parent (µ = 1) strategy, working with an optimal normalized step length σ̂*, has been calculated for a number of function models [12]. For the (1, λ) strategy the optimal λ̂ is ≈ 2.5 for the inclined plane, ≈ 4.7 for the sphere model, and ≈ 6.0 for the corridor model. These optimal values are determined for when the maximal normalized universal rate of progress ϕ̂* divided by λ is maximal; in other words, the maximal progress per function evaluation or individual. For these models using a (1 + λ) strategy it is found that the optimal λ̂ is 1 [12]. If the maximal progress for (1, λ) and the maximal progress gained by (1 + 1) times λ are compared, then it becomes clear that the (1 + 1) is the fastest strategy. For example, the maximal normalized progress rate for the sphere model is ϕ*₍₁,₅₎(σ̂*) ≈ 0.7, while ϕ*₍₁₊₁₎(σ̂*) ≈ 0.2, or ≈ 1.0 for 5 function evaluations. Since the (1 + 1) strategy is also the simplest form of continuous selection, using a replace-worst strategy, it is clear that EAs using continuous selection will require fewer function evaluations than generational selection to reach an optimum. Similar arguments can be made for µ > 1 for these models. In general there may, however, be cases where a (µ, λ) will require fewer function evaluations than a (1 + 1) strategy. Such an example, the sphere model in a noisy environment, has been found for the (µ/µ, λ) ES, i.e. using recombination [1]. It is also worth noting that the (µ/µ, λ) will attain the same progress as (1 + 1) times λ for the sphere model [3, p. 255]. Nevertheless, the theory is only valid as long as optimal step lengths are maintained. Maintaining near optimal step sizes is therefore of considerable concern. The question now is how effective step size control may be achieved using continuous selection. Before answering this, a review of how self-adaptation¹ works for the (µ +, λ) ES is needed.

¹For the purposes of this work only the isotropic and non-isotropic self-adaptation using the log-normal update rule will be considered.

III. SELF-ADAPTATION

Of the two canonical ES selection methods (µ +, λ), the (µ, λ) method is preferable for self-adaptation. The reason for this is the possibility of generating super-fit individuals whose inherited search distribution is highly unsuitable for their new situation [12, p. 145]. These individuals are deleted in the following generation by the (µ, λ) strategy, but may remain indefinitely under the (µ + λ) strategy. As a result the search may stagnate. The (µ, λ) strategy effectively introduces a limited capacity of reproduction, ξ = λ/µ, of any data structure within a single generation. That is, any structure will only be replicated and mutated on average ξ times. Essentially the idea is that if no progress is obtained in this number of trials then it is best to delete the individual and its corresponding strategy parameters.

Consider now the (1, λ) strategy for self-adaptation. This is equivalent to looking at any single parent in a (µ, λ) strategy with the same ξ. At any generation ξ trial step sizes are created by mutating the parent step size σ⁽ᵍ⁾_π(1), each of which is then tested on the parent x⃗⁽ᵍ⁾_π(1). The isotropic self-adaptation using the log-normal update rule is as follows:

$$\eta_k = \sigma^{(g)}_{\pi(1)} \exp\!\big(N(0, \tau_o^2)\big), \qquad \vec{x}^{\,(g+1)}_k = \vec{x}^{\,(g)}_{\pi(1)} + \vec{N}(0, \eta_k^2), \qquad k = 1, \ldots, \xi \qquad (2)$$

where τₒ ≃ c₍₁,λ₎/√n and the step size is updated in the following discrete generation by setting σ⁽ᵍ⁺¹⁾_π(1) = η_π(1). Similarly, the non-isotropic self-adaptation is performed as

$$\eta_{k,i} = \sigma^{(g)}_{\pi(1),i} \exp\!\big(N(0, \tau'^2) + N_i(0, \tau^2)\big), \qquad x^{(g+1)}_{k,i} = x^{(g)}_{\pi(1),i} + N_i(0, \eta_{k,i}^2), \qquad k = 1, \ldots, \xi, \; i = 1, \ldots, n \qquad (3)$$

where τ′ ∝ 1/√(2n) and τ ∝ 1/√(2√n). The step size is updated as before by setting σ⃗⁽ᵍ⁺¹⁾_π(1) = η⃗_π(1).

The primary aim of the step size control is to tune the search distribution so that maximal progress is maintained. For this some basic conditions for achieving optimal progress must be satisfied. The first lesson in self-adaptation is taken from the 1/5-success rule [10, p. 367]. The rule's derivation is based on the probability wₑ that the offspring is better than the parent. This probability is calculated for the case where the optimal standard deviation is used, ŵₑ, from which it is then determined that the number of trials λ must be greater than or equal to 1/ŵₑ if the parent using the optimal step size is to be successful. Founded on the sphere and corridor models, this is the origin of the 1/5 value. Furthermore, for the sphere model, the (1, λ) ES will not work, i.e. ϕ ≤ 0, for σ* > 2c₍₁,λ₎, and for λ = 1 as c₍₁,₁₎ = 0, where σ̂* = c₍₁,λ₎ [3, p. 72].

In a mutative step size control, such as the one given by (2), there is no single optimal standard deviation being tested, but rather a series of trial step sizes ηₖ, k = 1, . . . , ξ centered² around the parent step size σ_π(1). Consequently, the number of trials ξ should be greater than that specified by the 1/5-success rule. Frequently, ξ = λ/µ ≈ 7 is used, but 10 is also common. Clearly larger values must be chosen in order to guarantee success³.

²The expected median is σ_π(1).
³This may especially be the case when the number of free strategy parameters increases, as for the non-isotropic case. It is also possible that in this case intermediate recombination of the strategy parameters will reduce the number of trials needed.
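To make the update rules concrete, the following is a minimal Python sketch of (2) and (3), assuming NumPy; the function names and the fixed choice τₒ = 1/√n (in place of c₍₁,λ₎/√n) are ours, not prescribed by the text above:

```python
import numpy as np

def isotropic_trials(x_parent, sigma_parent, xi, rng):
    """Log-normal isotropic self-adaptation, rule (2): each of the xi
    offspring first mutates the parent step size, then the object vector."""
    n = x_parent.size
    tau_o = 1.0 / np.sqrt(n)                   # assumed stand-in for c_(1,lambda)/sqrt(n)
    etas = sigma_parent * np.exp(tau_o * rng.standard_normal(xi))
    xs = x_parent + etas[:, None] * rng.standard_normal((xi, n))
    return xs, etas

def nonisotropic_trials(x_parent, sigma_parent, xi, rng):
    """Non-isotropic rule (3): one global and one per-coordinate log-normal
    factor. Here sigma_parent is a vector of n step sizes."""
    n = x_parent.size
    tau_prime = 1.0 / np.sqrt(2.0 * n)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))
    global_z = rng.standard_normal((xi, 1))    # shared within each offspring
    local_z = rng.standard_normal((xi, n))     # drawn anew per coordinate
    etas = sigma_parent * np.exp(tau_prime * global_z + tau * local_z)
    xs = x_parent + etas * rng.standard_normal((xi, n))
    return xs, etas
```

The surviving offspring's η would then become the parent σ of the next generation, exactly as in the update σ⁽ᵍ⁺¹⁾_π(1) = η_π(1) above.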

If enough trial steps for success are generated near the optimal standard deviation, then this trial step size will be inherited via the corresponding offspring. This offspring will necessarily also be the most likely to achieve the greatest progress and hence be the fittest. The fluctuations of σ_π(1), of the trial standard deviations ηₖ, and consequently also of the optimal mutation strength will degrade the performance of the ES; the theoretical maximal progress rate is impossible to obtain. Any reduction of this fluctuation will therefore improve performance [3, p. 315]. The CMA-ES mentioned in the introduction has been proposed as one method of achieving this. Intermediate recombination of strategy parameters may be another means [2, p. 346].

IV. CONTINUOUS SELECTION

The continuous selection method developed here is essentially a continuous version of the (µ, λ) strategy. As in the (µ, λ) strategy, ξ = λ/µ trial step sizes are generated and tested on x⃗_π(j), j = 1, . . . , µ. However, unlike the generational approach, where the ranking is computed at each discrete generation, the ranking π will be determined immediately following the evaluation of an individual. In this way only the best individuals are reproduced, and time is not wasted on the lesser fit. So essentially one may consider ξ to be an upper bound on the number of replications a parent can undergo. A single replication and isotropic mutation is described as follows:

$$\eta_{\pi(k)} = \sigma_{\pi(j)} \exp\!\big(N(0, \tau_o^2)\big), \qquad \vec{x}_{\pi(k)} = \vec{x}_{\pi(j)} + \vec{N}(0, \eta_{\pi(k)}^2), \qquad \zeta_{\pi(k)} = 0, \qquad \zeta_{\pi(j)} = \zeta_{\pi(j)} + 1 \qquad (4)$$

where the counter ζ_π(j) keeps a record of the number of replications made of parent π(j). The individual replaced is the worst one, that is k = λ. Only when the ξ trials have been completed will the mutation strength be updated by the trial step size that produced it,

$$\sigma_{\pi(j)} = \eta_{\pi(j)} \qquad (5)$$

and again ξ trials are made using this new parent strategy parameter. When these trials are over the parent must be deleted, because both σ_π(j) and η_π(j) have failed to produce a better individual. This is done by setting k = j. This new continuous selection scheme is given the notation (µ ξ, λ). The full algorithm details, for the (µ ξ, λ) ES using isotropic self-adaptation, are given in figure 1. In this new scheme it is possible that ξ > λ/µ; however, in order to minimize the presentation of new material, ξ is set to approximately λ/µ. The variable λ should now be thought of as the population size and not in terms of the number of offspring produced.

By introducing a limited reproductive capacity it appears reasonable to maintain a population. By maintaining a population of solutions, a record of successful trial points and mutation strengths is stored in memory. These individuals wait for their turn at one of the top µ positions. Those that once lost this status may eventually regain it and continue replicating from where they left off. These individuals can also be utilized for recombination. For the non-isotropic self-adaptation this may be essential for the reduction of fluctuations in the mutation strength.

Initialize: σᵢ := σₒ, ζᵢ := 0, x⃗ᵢ uniform randomly ∈ [x⃗_low, x⃗_up], evaluate f(x⃗ᵢ) and permutation π; i := 1, . . . , λ.

 1  while termination criteria not satisfied do
 2    j ← parent uniform randomly chosen from {1, . . . , µ}
 3    if (ζ_π(j) = ξ) then
 4      σ_π(j) ← η_π(j)                 (update parent step size with trial step size)
 5    fi
 6    if (ζ_π(j) = (2ξ − 1)) then
 7      k ← j                           (overwrite parent, both parent and trial step size have failed)
 8    else
 9      k ← λ                           (overwrite worst)
10      σ_π(k) ← σ_π(j)                 (inherit parent step size)
11      ζ_π(j) ← ζ_π(j) + 1             (increment reproduction counter)
12    fi
13    ζ_π(k) ← 0                        (zero offspring reproduction counter)
14    η_π(k) ← σ_π(k) exp(N(0, τₒ²))    (create trial step size)
15    x⃗_π(k) ← x⃗_π(j) + N⃗(0, η²_π(k))   (variate object variables)
16    evaluate f(x⃗_π(k)) and permutation π
    od

Fig. 1. The new self-adaptive evolution strategy using continuous selection.
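For readers who prefer running code, one possible NumPy rendering of figure 1 follows. This is a sketch under our own naming, not the authors' implementation; in particular, the paper maintains the permutation π incrementally, whereas this sketch simply re-sorts the population each step:

```python
import numpy as np

def continuous_es(f, n, lam, mu, xi, sigma0, lo, hi, budget, seed=0):
    """Sketch of the (mu xi, lambda) ES of Fig. 1 with isotropic
    self-adaptation (minimization). All names here are ours."""
    rng = np.random.default_rng(seed)
    tau_o = 1.0 / np.sqrt(n)
    x = rng.uniform(lo, hi, (lam, n))           # object variables
    sigma = np.full(lam, sigma0)                # parent step sizes
    eta = sigma.copy()                          # trial step sizes
    zeta = np.zeros(lam, dtype=int)             # reproduction counters
    fit = np.array([f(row) for row in x])
    evals = lam
    while evals < budget:                       # line 1: termination criterion
        pi = np.argsort(fit)                    # ranking, best first
        j = pi[rng.integers(mu)]                # line 2: random parent in top mu
        if zeta[j] == xi:                       # lines 3-5
            sigma[j] = eta[j]                   #   adopt the trial step size
        if zeta[j] == 2 * xi - 1:               # lines 6-7
            k = j                               #   both step sizes failed: overwrite parent
        else:                                   # lines 8-12
            k = pi[-1]                          #   overwrite the worst individual
            sigma[k] = sigma[j]                 #   inherit parent step size
            zeta[j] += 1
        zeta[k] = 0                             # line 13
        eta[k] = sigma[k] * np.exp(tau_o * rng.standard_normal())  # line 14
        x[k] = x[j] + eta[k] * rng.standard_normal(n)              # line 15
        fit[k] = f(x[k])                        # line 16
        evals += 1
    return x[np.argmin(fit)], fit.min()
```

A call such as continuous_es(lambda v: float(np.sum(v**2)), n=30, lam=5, mu=1, xi=5, sigma0=2/np.sqrt(12), lo=-1, hi=1, budget=50_000) would mimic the flavour of the (1 5, 5) sphere experiment below, though none of the exact experimental settings are implied here.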

V. EXPERIMENTAL STUDIES

In this section four unimodal and three multimodal test functions are studied. The unimodal test cases are the sphere, elliptic, correlated model, and a time-varying elliptic model. The multimodal models are Ackley, Kowalik, and Rastrigin's function.

A. Unimodal

The aim of this study is to empirically verify the new continuous selection method and compare it with the canonical approach. In all cases the number of variables is n = 30 and the population is initialized uniformly and randomly ∈ [−1, 1]. The default initial mean step size is set at σ⁽⁰⁾ = 2/√12, as in [5]. No parametric constraints are forced on either the object variables or the strategy parameters.

Sphere model: In this section the sphere model,

$$\min f_1(\vec{x}) = \sum_{i=1}^{n} x_i^2$$

is used to study isotropic self-adaptation (SA). The different evolution strategy schemes are: 1) (1 + 1) with optimal mutation strength, 2) (1, 5) with optimal mutation strength, 3) (1 5, 5) isotropic self-adaptation with c₍₁,₅₎ = 1.1630, 4) (1, 5) and (1, 10) isotropic self-adaptation using c₍₁,₅₎ = 1.1630 and c₍₁,₁₀₎ = 1.5388 respectively. The optimal mutation strength used by the first two schemes is [10, p. 366]:

$$\hat{\sigma} = 1.224\, \lVert \vec{x}^{*} - \vec{x} \rVert / n \qquad (6)$$

The optimal (1 + 1) and (1, 5) ES are used to illustrate where the performance limits lie. The (1, λ) SA-ES strategy is bounded by the limit set by the optimal (1, λ) ES, whereas the (1 λ, λ) SA-ES is bounded by the optimal (1 + 1) ES.
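As an illustration of these reference strategies, the following is a hedged sketch of the (1 + 1) ES driven by rule (6); it "cheats" by using the known optimum x⃗* = 0 of the sphere, and all names are ours:

```python
import numpy as np

def optimal_sigma(x, x_star, n):
    """Rule (6): sigma_hat = 1.224 * ||x* - x|| / n."""
    return 1.224 * np.linalg.norm(x_star - x) / n

def one_plus_one_optimal(n=30, budget=10_000, seed=0):
    """(1+1) ES on the sphere model f1 using the optimal step size."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    x_star = np.zeros(n)                          # known optimum of f1
    for _ in range(budget):
        y = x + optimal_sigma(x, x_star, n) * rng.standard_normal(n)
        if np.sum(y**2) <= np.sum(x**2):          # plus-selection: keep the better
            x = y
    return float(np.sum(x**2))
```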

[Fig. 2. Average simulation results for the sphere model: mean best function value versus function evaluations (×10), for the (1, 5) SA-ES, (1, 10) SA-ES, optimal (1, 5) ES, (1 5, 5) SA-ES, and optimal (1 + 1) ES.]

Figure 2 shows the best function value versus the number of function evaluations averaged over 1000 independent runs. For both self-adaptive schemes the initial step size is too large and therefore there is a temporary worsening of the objective function value. This is illustrated in the extra figure, plotted in linear scale, for the first 500 function evaluations. The quality of the (1, 10) SA-ES is better than that of the (1, 5) SA-ES: for the fluctuating step size a larger number of trials may be needed. However, as the number of trials is increased further, the progress per function evaluation is again reduced. The performance of the (1 5, 5) SA-ES approaches and exceeds that of the optimal (1, 5) ES. This empirically verifies that self-adaptation using continuous selection works and, as expected from the theory, is more efficient than the SA-ES using generational selection.

Elliptic model: In this section the elliptic model,

$$\min f_2(\vec{x}) = \sum_{i=1}^{n} 1.5^{(i-1)} x_i^2$$

is used to study non-isotropic self-adaptation. This model is similar to the sphere model, but each variable now has an unequal contribution to the objective function. For this reason the isotropic mutation is inadequate for solving the task [5]. The (1 λ, λ) algorithm is equivalent to the one in figure 1, but now uses the non-isotropic mutation (3) in line 14. In these experiments it is decided to keep a fixed population size of 100 and the reproductive capacity, ξ or λ/µ, at 10; the mutation parameters are τ′ = 1.73/√(2n) and τ = 1.73/√(2√n). Initially the (1 10, 100) and (10, 100) SA-ES are compared. Both methods stagnate, the former sooner than the latter. In order to avoid stagnation, intermediate recombination on the strategy parameters is introduced. What this means for the algorithm using continuous selection is that line 4 in figure 1 should be replaced by an average of the top ρ trial step sizes,

$$\sigma_{\pi(j),i} = \frac{1}{\rho} \sum_{k=1}^{\rho} \eta_{\pi(k),i}, \qquad i = 1, \ldots, n. \qquad (7)$$

This algorithm is denoted (1/ρ_I ξ, λ), where the subscript I denotes intermediate recombination. When (1/10_I 10, 100) and (10/10_I, 100) are compared, the continuous selection method outperforms the generational one. However, in practice the canonical ES also commonly uses dominant recombination on the object variables, as done in [5]. The performance of this strategy, (10/10_ID, 100), is better and almost as good as the one using continuous selection. All of these results are depicted in figure 3, which is an average taken over 1000 independent runs. It is interesting to observe that the (1/λ_I λ, λ) strategy, which does not use recombination on the object variables, should achieve this performance. It is expected that the strategy parameters averaged are more closely related (i.e. originate from the same parent) than those of a (µ/µ_I, λ) strategy. This may also be one reason why the canonical approach requires the additional recombination of object variables.

[Fig. 3. Average simulation results for the elliptic model: mean best function value versus function evaluations (×10), for (1 10, 100), (10, 100), (10/10_I, 100), (10/10_ID, 100), and (1/10_I 10, 100).]

Correlated model: The unimodal model studied here is the correlated model,

$$\min f_3(\vec{x}) = \sum_{i=1}^{n} \Big(\sum_{j=1}^{i} x_j\Big)^{2}$$

Unlike the sphere model, the correlated model is not spherically symmetric. The model has been used to illustrate the need for rotation angles in ES. This requires up to another n(n − 1)/2 strategy parameters in addition to the usual n used for the non-isotropic mutation. Generating successful trial search distributions would then be even more difficult than in the previous study. In this study an isotropic self-adaptation will be used. The progress velocity for an optimal isotropic mutation will be less than that of the optimal correlated non-isotropic mutation. However, one must also consider the adaptation time for a strategy using n(n − 1)/2 + n different strategy parameters. The search time for the isotropic self-adaptation will also increase, since the reproductive capacity must be increased in order to guarantee success. When the results presented here are compared with those in [5] using correlated mutations, it is noticed that the isotropic mutation is faster.

Two experiments are run on the (1, λ) and (1 λ, λ) ES using a self-adaptive isotropic mutation with τₒ = 1/√n. The first experiment uses a population size (λ) of 10 and the second 20. Again 1000 independent experiments are performed and the averaged best function value versus number of function evaluations plotted, see figure 4.
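Relating back to the elliptic experiments, the intermediate recombination (7) amounts to a one-line change of the figure 1 algorithm; a hedged sketch in Python (the array name and sorting convention are ours):

```python
import numpy as np

def recombined_step_size(eta_ranked, rho):
    """Rule (7): replace line 4 of Fig. 1 with the coordinate-wise mean
    of the rho best trial step-size vectors (non-isotropic case).
    eta_ranked has shape (lambda, n), rows sorted best-first."""
    return eta_ranked[:rho].mean(axis=0)
```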


[Fig. 4. Average simulation results for the correlated model: mean best function value versus function evaluations (×100), for (1, 10), (1 10, 10), (1, 20), and (1 20, 20).]

In the case where λ = 10, the self-adaptation of (1 10, 10) fails, but interestingly not that of the (1, 10). When the reproductive capacity is increased, and therefore also the probability of success, the self-adaptation for (1 20, 20) works. The performance is also significantly better than both (1, λ) experiments.

B. Time varying

In this section the self-adaptive properties of the new ES version are examined on the time-varying elliptic function:

$$\min f_4(\vec{x}) = \sum_{i=1}^{n} r_i\,(x_i - x_i^{*})^2$$

where rᵢ is a shuffled array of the integers 1, . . . , n = 30, and the optimum (the values xᵢ* ∈ [−1, 1]) is generated at random and anew every 1000 × λ function evaluations. The algorithms that successfully solved the elliptic model (f₂) are compared, i.e. (1/ρ_I ξ, λ) and (µ/ρ_ID, λ). Here the following parameters are used: λ = 100, µ = ρ = 15, ξ = 5, τ′ = 1.558/√(2n), and τ = 1.558/√(2√n). For the (1/ρ_I ξ, λ) strategy the function values held by individuals in the population become invalid as soon as the problem changes; in this case they are simply set to ∞. Figure 5 shows the best function value and the corresponding standard deviation for variable x₁₅, averaged over 100 independent runs. From the figure one can see that both methods behave similarly and are able to adapt to a changing function landscape.

[Fig. 5. Mean best function value and mean mutation strength in the 15th variable for both (1/15_I 5, 100) and (15/15_ID, 100), versus function evaluations (×100).]

C. Multimodal

The unimodal models studied so far have illustrated the self-adaptive properties and superior local search performance of continuous selection. In order to give some indication of its effect on global search performance, multimodal functions must also be examined. Two approaches will be used to increase the likelihood of a global hit. Firstly, using a larger number of parents (µ); this may result in a more thorough exploration of the search space. Secondly, using a large mutation strength. Initially a large mutation strength, covering the search space, is usually selected. Large mutation strengths come at the cost of a lower success probability and so the reproductive capacity ξ (i.e. λ/µ) must be increased if they are to be maintained. A search is only local to the effect of the mutation strength; if the mutation strength is large one might call it global search [2]. These principles will now be demonstrated on Ackley, Kowalik, and Rastrigin's function respectively. In all cases the number of variables is again n = 30 and the population is initialized uniformly and randomly ∈ [−32, 32], [−5, 5] and [−1, 1], respectively. The Ackley experiments use an isotropic mutation with σ⁽⁰⁾ = 64/√n and the Kowalik experiments a non-isotropic mutation with σ⃗⁽⁰⁾ = 1/√n. The Rastrigin experiments will also be used to demonstrate the use of recombination for the new ES version, and to illustrate how a larger mutation strength may be maintained longer by using smaller learning rates (τₒ). An isotropic mutation is used in this case with an initial step size of σ⁽⁰⁾ = 2/√12. The studies will support the argument that the use of continuous or generational selection has little impact on overall global search performance. The mutation parameters are: τ′ = 1/√(2n), τ = 1/√(2√n), and τₒ = 1/√n (varied in the Rastrigin experiment).

Ackley: In this study the global hit performance of the (µ, 200) SA-ES and the (µ ξ, 200) SA-ES are examined for Ackley's function:

$$\min f_5(\vec{x}) = -20 \exp\!\Big(-0.2\sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_i^2}\Big) - \exp\!\Big(\tfrac{1}{n}\sum_{i=1}^{n} \cos 2\pi x_i\Big) + 20 + e$$

The parent number and reproductive capacity are varied and the global hits recorded for 100 independent runs. The number of function evaluations is 500 × λ. The results are depicted in table I. As may have been expected, the number of global hits increases as µ and ξ increase. For this test case the global search performance of continuous selection is better than that of generational selection.

TABLE I. THE NUMBER OF GLOBAL HITS FOR ACKLEY'S FUNCTION.

            (µ ξ, 200)                    (µ, 200)
µ\ξ     5    10    20    30    40
10     38    65    92    97    97           22
20     58    90    98   100   100           24
30     61    92   100   100   100           26
40     72    94   100   100   100           23
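For completeness, Ackley's function as written above translates directly to NumPy; a sketch, with the function name our own:

```python
import numpy as np

def ackley(x):
    """f5: global minimum 0 at x = 0."""
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.mean(x**2)))
            - np.exp(np.mean(np.cos(2.0 * np.pi * x))) + 20.0 + np.e)
```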

Kowalik: The same experiment is now performed on Kowalik's function:

$$\min f_6(\vec{x}) = \sum_{i=1}^{11} \bigg(a_i - \frac{x_1(b_i^2 + b_i x_2)}{b_i^2 + b_i x_3 + x_4}\bigg)^{2}$$

where a⃗ = (.1957, .1947, .1735, .1600, .0844, .0627, .0456, .0342, .0323, .0235, .0246) and b⃗⁻¹ = (.25, .5, 1, 2, 4, 6, . . . , 16). The number of function evaluations is 1000 × λ. The number of global hits for 100 independent runs is given in table II. The results are not as decisive as for the Ackley experiment. On closer inspection it is revealed that local minima are found outside the object variable range initially specified. There appears to be little performance difference between continuous and generational selection in this case, although the global hit performance of (µ, 200) is reduced as µ is increased. This may be due to the corresponding reduction in reproductive capacity by 200/µ.

TABLE II. THE NUMBER OF GLOBAL HITS FOR KOWALIK'S FUNCTION.

            (µ ξ, 200)                    (µ, 200)
µ\ξ     5    10    20    30    40
10     74    75    75    71    67           87
20     74    81    79    80    77           72
30     79    77    79    82    76           53
40     79    71    85    76    90           14

Rastrigin: In [4] a (50/50_ID, 100) strategy was used to successfully solve Rastrigin's function:

$$f_7(\vec{x}) = \sum_{i=1}^{n} \big[x_i^2 + B(1 - \cos(2\pi x_i))\big]$$

where B = 2. A learning rate of τₒ = 1/√(2n) was used. This is a smaller value than the commonly used τₒ = 1/√n. By using a smaller τₒ the rate of change in the mutation strength is reduced. The reduction of the initial mutation strength, covering the search space, is therefore slowed down. As a result more trials are made using a larger mutation strength and so the likelihood of a global hit is increased. This will be illustrated by performing three sets of experiments with τₒ equal to 1/√n, 1/√(2n), and 1/√(4n) respectively. In each case 100 independent runs are performed. Furthermore, dominant recombination is introduced to the new ES version. This implies that for the algorithm in figure 1, line 15 must be replaced by:

$$x_{\pi(k),i} = x_{\pi(\ell),i} + N_i(0, \eta_{\pi(k)}^2), \qquad i = 1, \ldots, n \qquad (8)$$

where ℓ ∈ {1, . . . , ρ} is chosen randomly and anew for each i. The algorithms compared are therefore (µ/µ_ID, λ) and (µ/µ_ID ξ, λ), where λ = 100, µ = ρ = 50, and ξ = 5. The number of global hits is given in table III. Additionally, the number of function evaluations needed to reach an average function value of 0.01 is also shown. This gives some idea of the average time needed for the different experiments. As expected, a longer search time is required for the smaller τₒ values; however, the number of global hits increases. Here, as before, the new ES is faster (and also achieves a greater number of global hits). It must be reiterated that the global/local attractor is determined by the search distribution (in this case the mutation strength) and the number of offspring generated (i.e. the probability of success). Continuous selection appears simply to offer faster progress towards this attractor.
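The dominant recombination (8) can likewise be sketched as a drop-in for line 15 of figure 1; the array shape and names assumed below are ours:

```python
import numpy as np

def dominant_offspring(x_ranked, eta_k, rho, rng):
    """Rule (8): each coordinate i copies from a parent pi(ell), with ell
    drawn anew from the top rho ranks, then is mutated with eta_k.
    x_ranked has shape (lambda, n), rows sorted best-first."""
    n = x_ranked.shape[1]
    ell = rng.integers(rho, size=n)             # one donor rank per coordinate
    return x_ranked[ell, np.arange(n)] + eta_k * rng.standard_normal(n)
```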

TABLE III. NUMBER OF GLOBAL HITS FOR RASTRIGIN'S FUNCTION AND MEAN NUMBER OF EVALUATIONS NEEDED TO REACH A FUNCTION VALUE OF 0.01. Entries are hits / evaluations÷λ.

τₒ                1/√n       1/√(2n)     1/√(4n)
(µ/µ_ID, λ)      76/405      86/740      94/1520
(µ/µ_ID ξ, λ)    89/285      95/530     100/1050

VI. CONCLUSION

The continuous selection proposed in this work is based on the available evolution strategy theory. Its key contribution has been the notion of an upper bound on the reproductive capacity ξ. This limited capacity to reproduce [12, p. 119] should not be confused with the "generational age" κ introduced by the contemporary evolution strategies [13], [12, p. 247]. A limited capacity to reproduce had already been introduced in the (µ, λ) strategy by the ratio λ/µ. Here it has been brought more clearly to attention and reformulated for continuous selection. As a result a new ES version has been created. Based on the preliminary findings presented here, there appears to be little support for the necessity of maintaining a synchronous generational selection scheme.

REFERENCES

[1] D. V. Arnold and H.-G. Beyer. Local performance of the (µ/µ_I, λ)-ES in a noisy environment. In W. Martin and W. M. Spears, editors, Foundations of Genetic Algorithms 6 (FOGA 2000), pages 127–141, 2001.
[2] H.-G. Beyer. Toward a theory of evolution strategies: self-adaptation. Evolutionary Computation, 3(3):311–347, 1996.
[3] H.-G. Beyer. The Theory of Evolution Strategies. Springer-Verlag, Berlin, 2001.
[4] H.-G. Beyer and H.-P. Schwefel. Evolution strategies: A comprehensive introduction. Natural Computing, 1, 2002. In press.
[5] K. Deb and H.-G. Beyer. Self-adaptation in real-parameter genetic algorithms with simulated binary crossover. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 172–179, San Francisco, CA, 1999. Morgan Kaufmann.
[6] K. A. De Jong and J. Sarma. Generation gaps revisited. In Darrell Whitley, editor, Foundations of Genetic Algorithms 2, pages 19–28, Vail, CO, 1993. Morgan Kaufmann.
[7] N. Hansen and A. Ostermeier. Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In IEEE International Conference on Evolutionary Computation, pages 312–317, 1996.
[8] M. Laumanns, G. Rudolph, and H.-P. Schwefel. A spatial predator-prey approach to multi-objective optimization: A preliminary study. In Parallel Problem Solving from Nature – PPSN V, pages 241–249, Berlin, 1998. Springer.
[9] W. Porto. Evolutionary programming. In Th. Bäck, D. B. Fogel, and Z. Michalewicz, editors, Handbook of Evolutionary Computation, chapter B1.4:5. Oxford University Press, New York, and Institute of Physics Publishing, Bristol, 1997.
[10] I. Rechenberg. Evolutionsstrategie '94. Frommann-Holzboog, Stuttgart, 1994.
[11] G. Rudolph. Evolution strategies. In Th. Bäck, D. B. Fogel, and Z. Michalewicz, editors, Handbook of Evolutionary Computation, chapter B1.3:2. Oxford University Press, New York, and Institute of Physics Publishing, Bristol, 1997.
[12] H.-P. Schwefel. Evolution and Optimum Seeking. Wiley, New York, 1995.
[13] H.-P. Schwefel and G. Rudolph. Contemporary evolution strategies. In Advances in Artificial Life. Third ECAL Proceedings, pages 893–907, Granada, Spain, June 4–6, 1995. Springer, Berlin.
[14] J. Wakunda and A. Zell. A new selection scheme for steady-state evolution strategies. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 794–801, Las Vegas, Nevada, July 10–12, 2000. Morgan Kaufmann Publishers, San Francisco, California.
