Noisy Optimization Problems - A Particular Challenge for Differential Evolution?

Thiemo Krink*

Bogdan Filipič

Gary B. Fogel

*EVALife, Dept. of Computer Science, University of Aarhus, Aabogade 34, 8200 Aarhus N, Denmark. Email: [email protected]

Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia. Email: [email protected]

Natural Selection Inc., 3333 N. Torrey Pines Ct., Suite 200, La Jolla, CA, USA. Email: [email protected]

René Thomsen*
Email: [email protected]

Abstract— The popularity of search heuristics has led to numerous new approaches in the last two decades. Since algorithm performance is problem dependent and parameter sensitive, it is difficult to consider any single approach as of greatest utility over all problems. In contrast, differential evolution (DE) is a numerical optimization approach that requires hardly any parameter tuning and is very efficient and reliable on both benchmark and real-world problems. However, the results presented in this paper demonstrate that standard methods of evolutionary optimization are able to outperform DE on noisy problems when the fitness of candidate solutions approaches the fitness variance caused by the noise.

I. INTRODUCTION

Evolutionary algorithms (EAs) and related search heuristics are very popular because they are easy to implement and applicable to a wide range of different problems. Continued development in the community has led to the discovery of several shortcomings, such as premature convergence, as well as new techniques to avoid these pitfalls. For experts and practitioners alike, it has become very hard to keep an overview of the most useful approaches, since the performance of heuristics is notoriously problem dependent and parameter sensitive. When compared to other forms of EAs, differential evolution (DE) [18], [13] is a relatively unknown numerical optimization procedure, despite having been proposed almost a decade ago. Through experimentation, we recently noticed that DE has exceptional performance compared to other search heuristics in numerical optimization. Surprisingly, DE requires hardly any parameter tuning and works very reliably with excellent overall results over a wide set of benchmark and real-world problems, such as partitional data clustering [12], parameter identification of system models in engineering [21], and flexible ligand docking in bioinformatics [20]. Of course, this exceptional performance is not in conflict with the No-Free-Lunch (NFL) theorem [23]. However, NFL considers literally all search problems, including totally random problems, which are not very meaningful in practice. Real search problems have certain meaningful characteristics, and in that respect there are algorithms (and algorithmic settings) that differ in their utility.


In many real-world applications, the search-space characteristics (such as multi-modality) are usually not known in advance, and because of the lengthy time requirement for candidate solution evaluation there is not much room for elaborate tuning and pre-experimentation. Especially in this respect, DE can be a very attractive choice for real-world applications. The performance of DE on the test problems we tried was so striking compared to standard EA approaches that we reasoned an unreported weakness of DE on certain types of optimization problems must exist. DE uses a rather greedy and less stochastic approach to problem solving than EAs, and because of this we hypothesized that noisy fitness functions might pose great difficulty for DE.

The problem of dealing with noisy function evaluation has been addressed by various experts in the field, mainly for evolution strategies (ES) [14], evolutionary programming (EP) [4], genetic algorithms (GAs) [5], and particle swarm optimization (PSO) [7]. Resampling is perhaps the most obvious and simple approach to deal with noise. The idea is simply to evaluate the same candidate solution m times and to estimate the 'true' fitness value by the mean of the samples. Resampling involves a high cost, especially if the time to evaluate candidate solutions is the main bottleneck of the optimization, which is typical of real-world applications. However, since resampling can greatly help to prevent the search from being deceived away from the optimal solution, it can be a good investment when weighed against population size and number of iterations, the two other factors that determine the overall number of candidate solution evaluations. Note that in our study, we considered the trade-off in computation time between resampling and the number of iterations and population size. Darwen [3] noted that the performance improvement obtained with additional samples eventually reaches a point of stagnation and that additional resampling might even be harmful for small populations. A simple way to counter increased noise in the system is to apply a corresponding increase in the population size; however, this requires proportionally more candidate solution evaluations.
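To make the resampling idea concrete, the following minimal sketch (ours, not taken from any of the cited works) estimates the 'true' fitness of a solution as the mean of m noisy evaluations. Because the added noise is independent across samples, the standard deviation of the estimate shrinks by a factor of √m, at the price of m evaluations per solution; the noisy sphere objective used here is purely illustrative.

```python
import numpy as np

def estimate_fitness(noisy_f, x, m, rng):
    """Resampling: average m independent noisy evaluations of the same solution."""
    return float(np.mean([noisy_f(x, rng) for _ in range(m)]))

# Illustration with an additively noisy sphere function, f(x) + N(0, 1)
rng = np.random.default_rng(0)
noisy_sphere = lambda x, rng: float(np.sum(np.asarray(x) ** 2)) + rng.normal(0.0, 1.0)

x = [0.1, 0.2]
single = [estimate_fitness(noisy_sphere, x, 1, rng) for _ in range(2000)]
averaged = [estimate_fitness(noisy_sphere, x, 25, rng) for _ in range(2000)]
print(np.std(single), np.std(averaged))  # roughly 1.0 versus 0.2 (= 1/sqrt(25))
```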


Rudolph [15] suggested that the fitness relationship between individuals for non-noisy problems can be considered a total order, whereas for noisy problems it becomes a partial order. In other words, for noisy problems not all solutions can be compared (because of noise). However, solutions whose fitness confidence intervals do not overlap can be compared directly, so that resampling is unnecessary. This consideration is mainly relevant at the beginning of the run, when fitness differences are particularly large and computation time can be saved by less frequent resampling of solutions. However, it requires a priori knowledge of the fitness variance caused by the noise and is also based on the assumption that the fitness variance is roughly the same for all candidate solutions.

A different way of tackling noise, called 'thresholding', was introduced by Markon et al. [8]. The idea is to use a new selection operator for ES, such that a new candidate solution can only replace an existing one if the fitness difference is larger than a threshold τ. This operator was investigated for the s-ring problem (a simplification of an elevator control problem) and the sphere problem. In practice, this approach requires determining a useful τ, which depends on the variance of the noise and the fitness distance to the optimal fitness value. Although the former might be roughly estimated by preliminary experimentation, it is rather unrealistic that the latter can be determined empirically. Moreover, the authors concluded from their empirical results that thresholding requires a modified adaptation rule for the mutation rate.

Branke [1], [2] suggested tackling noise by estimating an individual's fitness with local regression over the fitness of neighboring individuals. The underlying assumptions are that the (true) fitness function (without noise) can be locally approximated by a low-order polynomial, that the variance in a local neighborhood is constant, and that the noise is normally distributed. In order to ensure that these assumptions were sufficiently fulfilled, Branke introduced a weighting function that determines the influence of a data point on the fitness estimate at another location; this influence decreases with distance and is cut off beyond a certain threshold. This approach also introduced a new critical parameter (here for the weight function), which can be hard to determine in practice. A somewhat similar idea by Tamaki and Arai [19] was to use the fitness value of the parents to estimate the 'true' fitness. However, as Sano and Kita point out [16], the parents might have survived previous selection themselves by a stochastic 'overestimation', which causes a systematic error in the estimation of the fitness value of the offspring.

Another approach, utilizing EP, was a self-adaptation variant called robust evolutionary programming (REP) [9]. REP utilizes Cauchy instead of Gaussian mutation and adds redundant (inactive) strategy parameters to the representation of candidate solutions, which are swapped with active parameters during the run. Regarding GAs, Sano and Kita [16] introduced a memory-based fitness evaluation GA (MFEGA). Their main idea was to estimate the fitness as a weighted average of fitness values sampled during the run, using maximum likelihood estimation.

For PSO, Parsopoulos and Vrahatis [11] investigated the effect of noise in optimization functions and concluded that PSO performance was very stable and efficient, although their data also showed a substantial performance loss for the noisy Beale function.

In this paper, we compare the performance of DE with that of PSO and an EA on noisy optimization problems and with resampling. The paper is organized as follows. In Section II we briefly introduce DE and the other search heuristics used for comparison. In Section III, we outline our experimental design and the parameter settings of the algorithms, and introduce the noisy and non-noisy optimization benchmark functions. The results of our study are shown in Section IV and discussed in Section V. Finally, Section VI concludes with a summary of the main results.

II. METHODS

For the experiments in our investigation, we have implemented conventional versions of DE, PSO, and an EA. In our study, candidate solutions were represented as arrays of floating-point numbers. Moreover, for all three algorithms, the initial population is composed of randomly generated candidate solution vectors, i.e., for each dimension of each vector we generate a uniformly distributed random number in the interval of the corresponding search-space dimension. Afterwards, all solutions are evaluated according to the benchmark fitness functions presented in Section III-C. Noise in fitness functions can be tackled in different ways, as outlined in the introduction. In our study, we used repeated re-evaluation of candidate solutions in order to estimate the 'true' fitness by the mean of the re-evaluations.

A. Differential Evolution

DE [18], [13] is a population-based heuristic related to evolutionary computation. After generating and evaluating the initial population, the solutions are iteratively refined as follows: For each candidate solution j in the population, choose three other solutions k, l, and m randomly from the population (with j ≠ k ≠ l ≠ m), subtract solution vector k from l, scale the resulting vector by a parameter factor f, and create an offspring c by adding the scaled vector to the solution vector of m. The only additional twist in this process is that not the entire chromosome of offspring c is created in this way; instead, genes are partly inherited from individual j. The proportion is determined by the so-called crossover factor CF, which determines how many consecutive genes of the difference vector are, on average, copied to the offspring. The offspring solution is evaluated and deterministically substitutes its parent j if its fitness is better. This alteration and selection procedure is also known as the 'DE/rand/1/exp' operator. Note that there are also other frequently used DE operators, such as 'DE/rand/1/bin' and 'DE/best/2/bin', which we did not investigate in this study, because an elaborate investigation of the problem-dependent best choice of operators and parameters for all three algorithms was beyond the scope of this paper.
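The following is a minimal sketch of the DE/rand/1/exp scheme just described, with optional resampling of noisy evaluations. It is our illustration, not the implementation used in the experiments; the noisy sphere objective, the bound clipping, and the parameter defaults are assumptions made for the example.

```python
import numpy as np

def noisy_sphere(x, rng):
    """Illustrative noisy objective: f(x) + N(0, 1)."""
    return float(np.sum(x ** 2)) + rng.normal(0.0, 1.0)

def resampled_fitness(f, x, s, rng):
    """Estimate the 'true' fitness as the mean of s noisy evaluations (s = 1: no resampling)."""
    return float(np.mean([f(x, rng) for _ in range(s)]))

def de_rand_1_exp(f, bounds, pop_size=50, cf=0.8, scale=0.5, num_it=200, s=1, seed=0):
    """Minimal DE/rand/1/exp loop for minimization, following the verbal description above."""
    rng = np.random.default_rng(seed)
    low = np.array([b[0] for b in bounds], dtype=float)
    high = np.array([b[1] for b in bounds], dtype=float)
    dim = len(bounds)
    pop = rng.uniform(low, high, size=(pop_size, dim))           # random initial population
    fit = np.array([resampled_fitness(f, ind, s, rng) for ind in pop])
    for _ in range(num_it):
        for j in range(pop_size):
            # three mutually distinct solutions k, l, m, all different from j
            k, l, m = rng.choice([i for i in range(pop_size) if i != j], size=3, replace=False)
            mutant = pop[m] + scale * (pop[l] - pop[k])          # c = x_m + f * (x_l - x_k)
            # exponential crossover: copy a run of consecutive genes from the mutant,
            # its expected length governed by the crossover factor CF
            trial = pop[j].copy()
            d = int(rng.integers(dim))
            copied = 0
            while True:
                trial[d] = mutant[d]
                copied += 1
                d = (d + 1) % dim
                if copied == dim or rng.random() >= cf:
                    break
            trial = np.clip(trial, low, high)                    # bound handling (an assumption)
            trial_fit = resampled_fitness(f, trial, s, rng)
            if trial_fit < fit[j]:                               # deterministic replacement of parent j
                pop[j], fit[j] = trial, trial_fit
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Example: noisy 5-D sphere, each fitness estimated from s = 5 resamples
best_x, best_f = de_rand_1_exp(noisy_sphere, [(-100.0, 100.0)] * 5, s=5)
```

With s > 1, each fitness value is the mean of s noisy evaluations, which is the resampling strategy used throughout the experiments below.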


B. Particle Swarm Optimization

PSO [7] is a search technique inspired by the swarming behaviour of animals and human social behaviour. A particle swarm consists of a population of particles, where each particle is a moving object that 'flies' through the search space and is attracted to previously visited locations with high fitness. Each particle consists of a position vector x, which represents the candidate solution, its fitness, a velocity vector v, and a memory vector p of the best candidate solution encountered by the particle, together with its recorded fitness. The velocity of a particle is updated according to

v = w·v + ϕ1·(p − x) + ϕ2·(pg − x)

where w is the inertia weight described in [17] and pg is the best position known for all particles. ϕ1 and ϕ2 are random values, different for each particle and for each dimension. Moreover, particle velocity is limited by a maximum bound vmax, which is typically half of the domain size for each dimension of vector x. Afterwards, the positions of all particles are updated by x = x + v and then evaluated with the fitness function. Typically, the inertia weight is linearly decreased during the run. All particles interact with each other directly through their attraction to the vector pg. This unconstrained interaction is also known as the star or fully connected neighbourhood topology [6]. Apart from the candidate solutions, all velocity vectors need to be initialized at the beginning of the run with uniformly distributed random numbers in the interval [0, vmax]. After initialization, the memory of each particle is updated and the velocity and position update rules are applied. If a new position vector is outside the domain, it is moved back into the search space by subtracting the distance by which it exceeds the boundary. This process is applied to all particles and repeated for a fixed number of iterations.

C. Evolutionary Algorithm

An EA [10] is a search heuristic that evolves a population of candidate solutions with operators for selection and alteration. In our implementation, we use tournament selection of size 2, i.e., for each individual i we randomly select one other individual j of the old population and carry over the individual with the better fitness to the new population. For alteration we use arithmetic crossover and Gaussian mutation. With crossover probability pc, each individual i is substituted by the offspring o of i and another randomly selected individual j (and is otherwise carried over unchanged), such that

x_o,k = w_k · x_i,k + (1 − w_k) · x_j,k

where k indexes the dimensions of the candidate solution vectors and all w_k are uniform random numbers in [0, 1]. Moreover, we apply Gaussian mutation with probability pm to each individual i, using a fixed mutation variance σm, such that

x_i,k = x_i,k + N(0, 1) · σm · (x_max,k − x_min,k).

After initialization and after each generation, all altered individuals are evaluated and an elite of the n best individuals is determined, which is kept untouched by the mutation and crossover operators during the next generation. The process is repeated for a fixed number of iterations.
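For completeness, here is a corresponding sketch (again ours, not the authors' code) of the particle update from Sec. II-B and the EA variation operators from Sec. II-C. The upper bound of 2.0 on the random weights follows Table I, the boundary handling is our reading of the description above, and tournament selection and elitism are omitted for brevity; in this sketch, mutation is applied to the (possibly crossed-over) individual.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w, v_max, low, high, rng, phi_max=2.0):
    """One particle update: v = w*v + phi1*(p - x) + phi2*(pg - x), then x = x + v."""
    phi1 = rng.uniform(0.0, phi_max, size=x.shape)   # random weights, per particle and per dimension
    phi2 = rng.uniform(0.0, phi_max, size=x.shape)
    v = w * v + phi1 * (p_best - x) + phi2 * (g_best - x)
    v = np.clip(v, -v_max, v_max)                    # velocity limited by the maximum bound v_max
    x = x + v
    # a position outside the domain is moved back by the distance it exceeds the boundary,
    # which places it on the boundary (our interpretation of the description above)
    x = np.clip(x, low, high)
    return x, v

def ea_variation(pop, pc, pm, sigma_m, low, high, rng):
    """Arithmetic crossover (prob. pc) and Gaussian mutation (prob. pm), per individual."""
    new_pop = pop.copy()
    n, dim = pop.shape
    for i in range(n):
        if rng.random() < pc:                               # offspring of i and a random mate j
            j = int(rng.integers(n))
            w = rng.uniform(0.0, 1.0, size=dim)
            new_pop[i] = w * pop[i] + (1.0 - w) * pop[j]    # x_o,k = w_k*x_i,k + (1-w_k)*x_j,k
        if rng.random() < pm:                               # Gaussian step scaled by the domain width
            new_pop[i] = new_pop[i] + rng.normal(0.0, 1.0, size=dim) * sigma_m * (high - low)
    return new_pop
```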

III. EXPERIMENTS

A. Experimental set-up

In our experiments, we investigated the performance of DE, PSO, and the EA on noisy and non-noisy versions of a set of numerical benchmark functions (see Section III-C). Each experiment was repeated 30 times. In all runs, we kept the number of function evaluations constant for each algorithm to provide a fair performance comparison, such that numEval = popSize × numIt × s − numUnchanged, with popSize = population size, numIt = number of iterations, s = number of (candidate) solution resamples, and numUnchanged = number of unchanged individuals during the run. The latter refers to candidate solutions that were evaluated previously and that we did not re-evaluate if they remained unchanged, such as the elite of the EA population. We used numEval = 100,000 for the low-dimensional Schaffer F6 (2D) and Sphere (5D) functions, and numEval = 500,000 for the 50-dimensional Griewank, Rastrigin F1, and Rosenbrock functions. For each function and algorithm, we ran experiments with s = 1, 5, 20, 50, and 100 evaluations per candidate solution to estimate the 'true' (non-noisy) fitness value. Note that, for instance, in experiments with s = 100 the number of iterations was 100 times smaller than for s = 1 (the population size was constant), i.e., numIt is inversely proportional to s. For the analysis of the results, we used the 'true' fitness, i.e., the fitness value of a candidate solution evaluated with the non-noisy version of the benchmark function, which allows an objective comparison of result quality. Note that with noisy fitness, a good candidate solution might by chance receive a worse fitness value than a bad one. Moreover, when recording the best noisy fitness results, the data are biased towards the optimization goal.

B. Algorithmic settings

Table I shows the parameter settings for the search heuristics. We deliberately did not tune the parameters of each heuristic to each problem. This would of course be possible for simple benchmark problems, like those reported in this paper. However, in real-world applications, evaluation time can be substantial, which only allows minor pre-experimentation for tuning. The parameter settings and choice of operators for the EA, DE, and PSO are based on experience and have turned out to yield very satisfying performance results for a wide range of numerical problems. The number of iterations was numIt = numEval/(s × popSize).

C. Benchmark functions

As benchmark functions we first used noisy and non-noisy versions of the Schaffer F6 (2D) and Sphere (5D) problems. The Sphere function has been used in a number of earlier investigations of EA performance on noisy optimization problems (e.g. [2], [8]).


Moreover, we ran additional experiments with the 50-dimensional Griewank, Rastrigin F1, and (generalized) Rosenbrock functions to investigate the performance of the heuristics on harder (high-dimensional) problems. All benchmarks are minimization problems. The non-noisy definitions are as follows:

Schaffer F6 (2D):
  f_SchafferF6(x1, x2) = 0.5 + (sin²(√(x1² + x2²)) − 0.5) / (1 + 0.001·(x1² + x2²))²,  with −100 ≤ x1, x2 ≤ 100

Sphere (5D):
  f_Sphere(x) = Σ_{i=1}^{5} xi²,  with −100 ≤ xi ≤ 100

Griewank (50D):
  f_Griewank(x) = (1/4000)·Σ_{i=1}^{50} (xi − 100)² − Π_{i=1}^{50} cos((xi − 100)/√i) + 1,  with −600 ≤ xi ≤ 600

Rastrigin F1 (50D):
  f_RastriginF1(x) = 200 + Σ_{i=1}^{50} (xi² − 10·cos(2π·xi)),  with −5.12 ≤ xi ≤ 5.12

Rosenbrock (50D):
  f_Rosenbrock(x) = Σ_{i=1}^{49} [100·(x_{i+1} − xi²)² + (xi − 1)²],  with −50 ≤ xi ≤ 50

The noisy versions of the benchmark functions are defined as

  f_Noisy(x) = f(x) + N(0, 1)

where N(0, 1) denotes the normal (Gaussian) distribution with mean 0 and variance 1.

TABLE I
Algorithmic parameters. popSize = population size, CF = crossover factor (DE), f = scaling factor, w = inertia weight (its value is linearly decreased from 1.0 to 0.7 during the run), χ = constriction coefficient, ϕmin, ϕmax = lower and upper bounds of the random velocity rule weights, pc = crossover rate (EA), pm = mutation rate, σm = mutation variance, n = elite size.

  DE:  popSize = 50,  CF = 0.8,  f = 0.5
  PSO: popSize = 20,  w = 1.0 → 0.7,  ϕmin = 0.0,  ϕmax = 2.0
  EA:  popSize = 100, pc = 1.0,  pm = 0.3,  σm = 0.01,  n = 10

TABLE II
Mean and standard errors (±SE) of the final results. f1 = Schaffer F6 (2D), f1* = Schaffer F6 (2D) (noisy), f2 = Sphere (5D), f2* = Sphere (5D) (noisy), s = number of solution resamples.

  Function    | DE                  | PSO                 | EA
  f1          | 0±0                 | 0.00453±0.00090     | 3.33067E-17±0
  f1*, s=1    | 0.48988±0.00582     | 0.44486±0.01667     | 0.25829±0.03045
  f1*, s=5    | 0.40360±0.03030     | 0.37603±0.02978     | 0.12859±0.01678
  f1*, s=20   | 0.16597±0.02753     | 0.19964±0.03328     | 0.06730±0.01066
  f1*, s=50   | 0.12729±0.01829     | 0.09242±0.01634     | 0.04769±0.00757
  f1*, s=100  | 0.09795±0.01209     | 0.06972±0.00639     | 0.06277±0.00743
  f2          | 4.12744E-152±0      | 2.51130E-8±0        | 6.71654E-20±0
  f2*, s=1    | 0.25249±0.02603     | 0.36484±0.05182     | 0.04078±0.00543
  f2*, s=5    | 0.13315±0.01266     | 0.16702±0.03072     | 0.02690±0.00363
  f2*, s=20   | 0.07364±0.00811     | 0.11501±0.01649     | 0.02205±0.00290
  f2*, s=50   | 0.07004±0.00686     | 0.06478±0.00739     | 0.01765±0.00233
  f2*, s=100  | 0.08165±0.00800     | 0.07135±0.00938     | 0.03929±0.00396

TABLE III
Mean and standard errors (±SE) of the final results. f3 = Griewank (50D), f3* = Griewank (50D) (noisy), f4 = Rastrigin F1 (50D), f4* = Rastrigin F1 (50D) (noisy), f5 = Rosenbrock (50D), f5* = Rosenbrock (50D) (noisy), s = number of solution resamples.

  Function    | DE                  | PSO                 | EA
  f3          | 0±0                 | 1.54900±0.06695     | 0.00624±0.00138
  f3*, s=1    | 3.31514±0.07388     | 11.2462±0.50951     | 1.14598±0.00307
  f3*, s=5    | 2.42183±0.03616     | 16.6429±0.70800     | 1.10223±0.00342
  f3*, s=20   | 2.67093±0.03895     | 85.4865±2.13148     | 1.44349±0.01381
  f3*, s=50   | 46.8197±0.96449     | 143.021±2.33228     | 3.69626±0.13127
  f3*, s=100  | 233.802±6.25840     | 194.188±4.90959     | 18.0858±0.99646
  f4          | 0±0                 | 13.1162±1.44815     | 32.6679±1.94017
  f4*, s=1    | 2.35249±0.06062     | 55.9704±2.19902     | 30.7511±1.32780
  f4*, s=5    | 14.0355±0.47935     | 160.500±2.67500     | 31.4725±2.02356
  f4*, s=20   | 167.628±2.12569     | 313.184±3.93659     | 39.1777±2.11529
  f4*, s=50   | 314.762±2.88650     | 380.178±4.88706     | 74.8577±2.69437
  f4*, s=100  | 438.036±3.67504     | 418.265±5.35434     | 147.800±2.93208
  f5          | 35.3176±0.27444     | 5142.45±2929.47     | 79.8180±10.4477
  f5*, s=1    | 47.6188±0.15811     | 4884.68±886.599     | 118.940±13.2322
  f5*, s=5    | 47.0404±0.13932     | 368512±39755.5      | 341.788±49.6738
  f5*, s=20   | 7917.46±352.851     | 1.61E+7±1.18E+6     | 1859.06±261.844
  f5*, s=50   | 1.65E+7±903677      | 5.57E+7±2.38E+6     | 35477.7±4656.17
  f5*, s=100  | 2.98E+8±1.04E+7     | 1.17E+8±7.38E+6     | 257488±19371.2

IV. RESULTS

The results reported in this section refer to the 'true' (non-noisy) evaluation of candidate solutions. Tables II and III summarize all final results of our experiments for the low- and high-dimensional benchmarks, respectively. In all figures, the graphs represent the mean of the best 'true' (non-noisy) evaluation in 30 repeated runs over the number of evaluations (not iterations). Note that there was not enough space to show the graphs for the Rosenbrock function. Standard errors of the final results are shown in the tables.

Figures 1, 5, 9, and 13 show a comparison between DE, PSO, and EA performance for the non-noisy functions. The remaining figures show, for each noisy benchmark, a performance comparison of experiments with s = 1, 5, 20, 50, and 100 candidate solution resamples for DE, PSO, and EA, respectively.

[Figures 1–16 plot the mean best 'true' (non-noisy) fitness versus the number of evaluations; only the captions are retained here.]

Fig. 1. DE versus PSO and EA for the non-noisy Schaffer F6.
Fig. 2. DE performance; Noisy Schaffer F6.
Fig. 3. PSO performance; Noisy Schaffer F6.
Fig. 4. EA performance; Noisy Schaffer F6.
Fig. 5. DE versus PSO and EA for the non-noisy Sphere (5D).
Fig. 6. DE performance; Noisy Sphere (5D).
Fig. 7. PSO performance; Noisy Sphere (5D).
Fig. 8. EA performance; Noisy Sphere (5D).
Fig. 9. DE versus PSO and EA for the non-noisy Griewank (50D).
Fig. 10. DE performance; Noisy Griewank (50D).
Fig. 11. PSO performance; Noisy Griewank (50D).
Fig. 12. EA performance; Noisy Griewank (50D).
Fig. 13. DE versus PSO and EA for the non-noisy Rastrigin F1 (50D).
Fig. 14. DE performance; Noisy Rastrigin F1 (50D).
Fig. 15. PSO performance; Noisy Rastrigin F1 (50D).
Fig. 16. EA performance; Noisy Rastrigin F1 (50D).

As expected, on the noisy benchmarks all heuristics performed much worse than on the non-noisy versions. For the non-noisy versions of all benchmarks, DE achieved the best results of all heuristics in terms of accuracy and variance of the final result as well as convergence speed. For the Schaffer F6 and Sphere (5D), investing more computation time in evaluations for resampling while proportionally sacrificing search iterations (for fixed population sizes) improved the results for all algorithms up to s=50 for the Sphere (5D), and up to s=50 for the EA and s=100 for DE and PSO for the Schaffer F6 (2D). For an illustration regarding the Schaffer F6, see Figures 2–4.

In contrast to the results for the non-noisy optimization, DE produced consistently worse results than the EA when noise was present. In comparison to PSO, the DE results were roughly comparable for the noisy Schaffer F6 and Sphere (5D). For the 50-dimensional Griewank, Rastrigin F1, and Rosenbrock functions, we found the following: For the noisy Griewank (50D) function, the DE performance was again worse than that of the EA, but better than that of PSO for all resampling values. However, for the noisy Rastrigin F1 (50D) and noisy Rosenbrock (50D) functions, DE achieved better results than the EA and PSO with low resampling values (see Table III). At first sight this result might appear to contradict the findings for the other test functions. However, this is not the case. A closer look at the data shows that PSO and the EA had problems achieving results close to the optimal value of 0. In fact, PSO stagnated early in the runs for the noisy Griewank (see Fig. 11), noisy Rastrigin F1 (see Fig. 15), and noisy Rosenbrock functions (figure not shown). The means of the best results for PSO were 11.25, 13.12, and 5142, respectively (see Table III), whereas the optimum for all three functions is 0. The EA also stagnated prematurely for the noisy Rastrigin F1 (see Fig. 16) and Rosenbrock functions (figure not shown), with mean best results of 32.7 and 79.8, but got quite close to the optimal fitness for the noisy Griewank function (see Fig. 12), with a result of 1.15 for s=1 and 1.10 for s=5 (see Table III). Note that the noise added by the normal distribution used in the noisy benchmark functions had a standard deviation of 1, i.e., roughly 95% of the added noise values fell in the interval [−2, 2]. Therefore, noise only had a significant effect on the performance of the search heuristics when the fitness differences between individuals were small compared to this noise level. In other words, DE produced superior results for the noisy Rastrigin F1 and Rosenbrock functions because both PSO and EA performance stagnated early, before reaching the noise-critical region of these two functions.

V. DISCUSSION

Our results confirm that DE is a search heuristic with remarkable performance in numerical optimization. However, the results of our experiments with noisy problems indicate that the performance of a conventional DE is worse than that of a conventional EA and comparable to that of PSO when the fitness of candidate solutions approaches the fitness variance caused by the noise. In other words, for fitness functions with substantial noise, a conventional DE is not a good approach for achieving results with high accuracy.

To our knowledge, this paper is the first to recognize this DE deficiency. However, we would like to point out that we only used resampling as a strategy to cope with noise in this comparison. Other techniques, such as thresholding, might work better for DE relative to the other algorithms; this should be investigated further.

Why is noise a problem for DE? One main reason might be the deterministic selection and the lack of a mutation operator, which make DE 'greedy'. In fact, DE relies on an iterative approximation of the features of the search space based on the distribution of its individuals. Noise seems to confuse the algorithm substantially. However, when noise plays a rather minor role compared to the complexity of the problem, such as for the Rosenbrock (50D) with Gaussian noise N(0, 1), DE can still be a good choice, since alternative search heuristics such as EAs and PSO might stagnate long before the effect of noise becomes significant.

Another result of our study is that frequent resampling, while proportionally sacrificing search iterations (for an equal total number of evaluations), can greatly pay off when noise matters. As mentioned in the introduction, Darwen [3] suggests making the population as large as possible. However, a larger population size means proportionally more evaluations of candidate solutions. For DE we experimented with a population size of 100 instead of 50 (again for the same total number of evaluations) and found that this made its performance worse for both test functions (data not shown).

Compared to DE, our EA implementation seems to be less severely affected by noise. This is perhaps not so surprising, since our EA uses a Gaussian mutation operator with a fixed mutation variance in order to maintain a certain amount of exploration throughout the run. Our parameter and operator choices for the EA implementation are based on conclusions from numerous investigations and allow it to solve many numerical problems well without the need for elaborate parameter tuning. Its large elite of 10% of the population, the representation of candidate solutions by floating-point numbers, and the rather high mutation variance make the EA rather similar to ES and EP.

The PSO results of our investigation are rather disappointing compared to the EA and DE. In general, the performance of PSO is known to be particularly parameter sensitive [22]. In our investigation we used just one setting for all experiments, which we have identified as a good compromise for most numerical optimization problems. However, we are aware that the performance can be greatly improved by elaborate problem-specific tuning, which might explain the rather poor results. Note that we deliberately did not tune the parameters of any of the three heuristics, since in many realistic applications the evaluation time for a single candidate solution can be very long (e.g., minutes) and thus no elaborate parameter pre-tuning can be carried out.

The results from the 50D problems reported in this paper raise the question of under which conditions noise really matters for the choice of the search heuristic. This is indeed an important question, in particular for real-world applications.


In general, one should note that noise only really matters if the required accuracy of the optimization result is in the ballpark of the variance caused by the noise source. In a real-world application, one could estimate the variance of the noise based on samples of different and identical candidate solutions. However, one would have to assume that this variance is roughly the same for all candidate solutions and that it follows a certain distribution (such as the normal distribution). If the search problem is likely to be hard and noise is only a minor factor, DE still appears to be a good choice, since it does not require elaborate tuning, it is very easy to implement, and its performance is very reliable compared to other search heuristics. However, if noise is substantial compared to the required accuracy of the results, our experiments indicate that one should rather use an EA instead of a conventional DE. One possibility that we have not investigated in this study is to extend the conventional DE approach with a threshold selection operator, as suggested for ES [8], for applications to noisy problems. Another possibility would be to implement a noisy mutation operator that introduces additional exploration, or to use stochastic instead of deterministic selection.

VI. CONCLUSION

In our study, we have made a preliminary investigation of the effect of noise in numerical optimization problems on the performance of conventional DE. Our results confirm that DE is a very powerful heuristic for non-noisy optimization problems, but that noise is indeed a serious problem for conventional DE when the fitness of candidate solutions approaches the fitness variance caused by the noise. This deficiency of DE should be investigated further. Possible ways to achieve better results with DE for noisy problems could be to introduce a modified selection operator, such as thresholding or stochastic selection, or to use a mutation operator in order to maintain a higher level of exploration during the search.

ACKNOWLEDGMENT

The authors would like to thank the Danish Research Council for financial support (EVALife project) and the Slovenian Ministry of Education, Science and Sport for the support of Slovenian-Danish scientific exchange (project Evolutionary Optimization of Dynamic Systems).

REFERENCES

[1] Branke, J. Creating robust solutions by means of evolutionary algorithms. In Parallel Problem Solving from Nature V (1998), vol. 1498 of Lecture Notes in Computer Science, Springer Verlag, pp. 119–128.
[2] Branke, J., Schmidt, C., and Schmeck, H. Efficient fitness estimation in noisy environments. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001) (2001), Morgan Kaufmann, pp. 243–250.
[3] Darwen, P. J. Computationally intensive and noisy tasks: Coevolutionary learning and temporal difference learning on Backgammon. In Proc. of the 2000 Congress on Evolutionary Computation (2000), IEEE Press, Piscataway, NJ, USA, pp. 872–879.
[4] Fogel, L. J., Owens, A. J., and Walsh, M. J. Artificial Intelligence through Simulated Evolution. John Wiley & Sons, New York, 1966.
[5] Holland, J. H. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975.

[6] Kennedy, J. Small worlds and mega-minds: Effects of neighborhood topology on particle swarm performance. In Proceedings of the Congress on Evolutionary Computation (6-9 July 1999), vol. 3, IEEE Press, Piscataway, NJ, USA, pp. 1931–1938.
[7] Kennedy, J., and Eberhart, R. C. Particle swarm optimization. In Proceedings of the 1995 IEEE International Conference on Neural Networks (Perth, Australia, 1995), vol. 4, IEEE Service Center, Piscataway, NJ, pp. 1942–1948.
[8] Markon, S., Arnold, D. V., Baeck, T., Beielstein, T., and Beyer, H.-G. Thresholding - a selection operator for noisy ES. In Proceedings of the 2001 Congress on Evolutionary Computation CEC2001 (27-30 May 2001), IEEE Press, Piscataway, NJ, USA, pp. 465–472.
[9] Matsumura, Y., Ohkura, K., and Ueda, K. Evolutionary dynamics of evolutionary programming in noisy environment. In Proceedings of the 2001 Congress on Evolutionary Computation CEC2001 (2001), IEEE Press, Piscataway, NJ, USA, pp. 17–24.
[10] Michalewicz, Z., and Fogel, D. B. How to Solve It: Modern Heuristics. Springer, Berlin, 2000.
[11] Parsopoulos, K. E., and Vrahatis, M. N. Particle swarm optimizer in noisy and continuously changing environments. Artificial Intelligence and Soft Computing (2001), 289–294.
[12] Paterlini, S., and Krink, T. High performance clustering using differential evolution. In Proceedings of the Sixth Congress on Evolutionary Computation (CEC-2004) (in press), IEEE Press, Piscataway, NJ, USA.
[13] Price, K. V. An introduction to differential evolution. In New Ideas in Optimization, D. Corne, M. Dorigo, and F. Glover, Eds. McGraw-Hill, London, 1999, pp. 79–108.
[14] Rechenberg, I. Evolution strategy: Optimization of technical systems by means of biological evolution. Fromman-Holzboog, Stuttgart, 1973.
[15] Rudolph, G. A partial order approach to noisy fitness functions. In Proceedings of the 2001 Congress on Evolutionary Computation CEC2001 (2001), IEEE Press, Piscataway, NJ, USA, pp. 318–325.
[16] Sano, Y., and Kita, H. Optimization of noisy fitness functions by means of genetic algorithms using history of search with test of estimation. In Proceedings of the 2002 Congress on Evolutionary Computation CEC2002 (2002), IEEE Press, Piscataway, NJ, USA, pp. 360–365.
[17] Shi, Y., and Eberhart, R. C. Parameter selection in particle swarm optimization. In Evolutionary Programming VII (Berlin, 1998), V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, Eds., Springer, pp. 591–600. Lecture Notes in Computer Science 1447.
[18] Storn, R., and Price, K. Differential evolution - a simple and efficient adaptive scheme for global optimization over continuous spaces. Tech. Rep. TR-95-012, International Computer Science Institute, Berkeley, 1995.
[19] Tamaki, H., and Arai, T. A genetic algorithm approach to optimization problems in an uncertain environment. In Proceedings of the 1997 International Conference on Neural Information Processing and Intelligent Information Systems (1997), vol. 1, pp. 436–439.
[20] Thomsen, R. Flexible ligand docking using differential evolution. In Proceedings of the Fifth Congress on Evolutionary Computation (CEC-2003) (2003), vol. 4, IEEE Press, Piscataway, NJ, USA, pp. 2354–2361.
[21] Ursem, R. K., and Vadstrup, P. Parameter identification of induction motors using differential evolution. In Proceedings of the Fifth Congress on Evolutionary Computation (CEC-2003) (2003), IEEE Press, Piscataway, NJ, USA, pp. 790–796.
[22] Vesterstrøm, J. S., Riget, J., and Krink, T. Division of labor in particle swarm optimisation. In Proceedings of the Fourth Congress on Evolutionary Computation (CEC-2002) (2002), vol. 2, IEEE Press, Piscataway, NJ, USA, pp. 1570–1575.
[23] Wolpert, D. H., and Macready, W. G. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1, 1 (1997), 67–82.

