Evolution Strategies and Composed Test Functions

I. De Falco, R. Del Balio, A. Della Cioppa and E. Tarantino
Institute for Research on Parallel Information Systems (IRSIP)
National Research Council of Italy (CNR)
Via P. Castellino 111, 80131 Naples, ITALY
e-mail: {ivan, renato, dean, [email protected]}
Abstract -- Classical test functions known as F1 to F10 are often used as a benchmark for heuristic optimisation techniques. Recently it has been remarked in the literature that relying on the results obtained for their optimisation may be improper or even misleading when assessing the efficiency of one technique with respect to others, and a method for developing new, more reliable test functions has been given. In this paper a well-known class of heuristic strategies, the Evolution Strategies, is applied to one such function, and its results are compared against those obtained by classical discrete Genetic Algorithms, adaptive search strategies and hill-climbing methods.

Keywords -- Evolution Strategies, Genetic Algorithms, Adaptive Search Strategies, Hill-Climbing, Composed Test Functions.

I. Introduction
With the definition and implementation of heuristic optimisation techniques such as Genetic Algorithms (GAs) [1], [2] and Evolution Strategies (ES) [3], [4], [5], researchers started in the 1970s to introduce test functions to compare techniques and their variants. A test function set was thus gradually built up, with the aim of providing scientists in the field with a suitable evaluation and measurement tool. This set grew with time because every now and then researchers designed some new function, but nobody paid much attention to the actual form of those functions. De Jong, in 1975, created a set of five functions, widely known as F1 to F5 [6], representing different classes of problems. Later on, Griewangk, Schwefel, Rastrigin and many others introduced new functions in the literature (F8, F6 and F7, respectively [7]), and the whole community of heuristic optimisers used them. To date, more than twenty functions are known worldwide and used to tune and compare search strategies. Recently, Whitley and his colleagues [7] have argued, by looking at their mathematical form, that these test functions are not as well suited for these purposes as supposed, since many of them are not as difficult as expected, even though some of their features are interesting. Because of this, they have suggested a way to create new test functions by composing the old ones, so building more difficult functions while still retaining their positive features. As an example of heuristic optimisation techniques, in this paper we wish to consider ES. They were designed and implemented in the 1960s in Germany by H. P. Schwefel and I. Rechenberg. Since then ES have proved their ability in solving many classes of multivariable problems, both continuous and discrete.
Their birth and development was independent of the contemporaneous development of GAs; in fact, the two families share some ideas, like the evolution of a population of candidate solutions, but they differ in the representation of the problem variables (direct real values in ES, binary encoding for earlier GAs), in the selection mechanisms (deterministic in ES, probabilistic in GAs) and in the operators used to create new individuals, and in the underlying search philosophy (mainly based on mutation for ES, driven mostly by crossover for GAs). Nevertheless, both have been widely applied with encouraging results in several fields. In this paper we apply ES to one of the new test functions designed by Whitley, namely the composed function F8F2. While much work has been done on the classical test functions by supporters of both ES and GAs, on this new set of functions, due to its recent introduction, only results for GAs are available. Section II briefly describes the ES in their basic form, while in section III the seminal work by Whitley is summarised and the F8F2 function is defined. In section IV our experimental results are reported and compared against those found in the literature for discrete GAs, adaptive search strategies and hill-climbing methods. Section V contains our conclusions and foreseen future work in this field.

II. Evolution Strategies
ES have evolved considerably over the years, so many forms and variants are known. In this section we describe the most commonly used; the interested reader may find an excellent formal presentation and overview in [5]. Let us suppose we have a mathematical model of the optimisation problem under consideration in the form

  U(x_1, ..., x_n) → min

where U is a scalar potential function defined over an n-dimensional space {x_1, ..., x_n}. For this type of problem ES can handle real variables directly, representing them as real numbers, without encoding them as happens in classical discrete GAs. Furthermore, the fitness function Φ of an individual a is identical to its scalar potential function, i.e. Φ(a) = U(a). Another very important feature of ES is that each individual in the population contains not only the n real variables representing the problem (the `object variables'), but also the strategy parameters. These parameters represent the variances σ_i and covariances (rotation angles) α_j of the n-dimensional Gaussian mutation probability density used for exploring the space of the object variables. Because of this, the general form of an individual in ES may be briefly represented as follows:

  a = (x, σ, α)

We have to decide how many probability distributions we wish to use and how strictly these distributions should be correlated. So we have to consider a number of, say, n_σ variances and n_α correlations. As a consequence an individual in ES will consist of n + n_σ + n_α components; let us denote this number by ℓ. It is beyond the scope of this paper to discuss the best values for the two parameters n_σ and n_α and how they influence the search; what is important to point out here is that the larger n_σ, the wider the search, and the higher n_α, the less the search will be trapped in local optima, since it will try to optimise in more directions at the same time rather than optimising independently in each dimension.

Basically, ES start with a population of, say, μ elements. Starting from them, a number λ of new elements are created by means of recombination and mutation. Extinctive (i.e. deterministic) selection is then performed to obtain the new population of μ elements again; this may take place in two basic ways, called respectively (μ, λ)-ES (or comma strategy) and (μ + λ)-ES (or plus strategy). In the former, with λ > μ, the best μ are deterministically selected out of the λ offspring constituting the current population, while in the latter, with λ ≥ 1, the choice is performed among both the μ parents and the λ offspring, so that elements belonging to the previous population can survive if they are strong enough. More recently, an extension of the two versions of ES, the (μ, κ, λ, ρ)-ES, has been set up [8]. In it a life span value κ ≥ 1, representing the maximum number of generations one individual can stay alive even if very strong, has been introduced. In this way the two original strategies become special cases of the (μ, κ, λ, ρ)-ES (κ = 1 gives the (μ, λ)-ES, κ = ∞ gives the (μ + λ)-ES). Furthermore, this extension allows a free number ρ of parents, tournament selection as an alternative to truncation selection, free probabilities for recombination and mutation, and further recombination mechanisms including crossover. The μ elements undergo recombination and mutation to create the λ new individuals.
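As an illustration of the scheme just described, the following is a minimal sketch (not the authors' implementation) of a (μ + λ)-ES with uncorrelated, self-adapted step sizes (n_σ = n, n_α = 0), applied to a simple sphere function. All constants, helper names and the learning rates are illustrative choices of ours:

```python
import math
import random

# Illustrative parameters: mu parents, lambda offspring, n dimensions.
MU, LAMBDA, N = 15, 100, 10
TAU0 = 1.0 / math.sqrt(2.0 * N)            # shared (global) learning rate
TAU = 1.0 / math.sqrt(2.0 * math.sqrt(N))  # coordinate-wise learning rate

def fitness(x):
    # Sphere function: global minimum 0 at the origin.
    return sum(xi * xi for xi in x)

def new_individual():
    # An individual is a pair (object variables, step sizes sigma).
    return ([random.uniform(-2.048, 2.048) for _ in range(N)],
            [1.0] * N)

def recombine(a, b):
    # Discrete recombination on object variables,
    # intermediate recombination on step sizes.
    x = [random.choice(pair) for pair in zip(a[0], b[0])]
    s = [(sa + sb) / 2.0 for sa, sb in zip(a[1], b[1])]
    return (x, s)

def mutate(ind):
    # Log-normal self-adaptation of sigma, then Gaussian mutation of x.
    x, s = ind
    z0 = random.gauss(0.0, TAU0)  # one shared factor per individual
    s2 = [si * math.exp(z0 + random.gauss(0.0, TAU)) for si in s]
    x2 = [xi + random.gauss(0.0, si) for xi, si in zip(x, s2)]
    return (x2, s2)

def es(generations=200):
    pop = [new_individual() for _ in range(MU)]
    for _ in range(generations):
        offspring = [mutate(recombine(*random.sample(pop, 2)))
                     for _ in range(LAMBDA)]
        # Plus strategy: parents compete with offspring, best mu survive.
        pop = sorted(pop + offspring, key=lambda ind: fitness(ind[0]))[:MU]
    return fitness(pop[0][0])
```

A comma strategy would be obtained by selecting from `offspring` alone instead of `pop + offspring`; the plus strategy shown here is elitist, so the best fitness can never worsen between generations.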
Recombination can be performed in many ways; in all of them, however, the probability of undergoing recombination is the same for all individuals. Basically, it is applied with probability p_r and starts from two randomly chosen individuals x = (x_1, ..., x_ℓ) and y = (y_1, ..., y_ℓ), creating a new individual z = (z_1, ..., z_ℓ) according to one of the following rules:

  z_i = x_i or y_i              (discrete recombination)
  z_i = (x_i + y_i)/2           (intermediate recombination)
  z_i = α x_i + (1 − α) y_i     (random intermediate recombination)

where α is a uniform random real variable in [0.0, 1.0]. These operators can have either a dual or a global form. In the former, to create a new element only two parents are taken into account, and all the components of the offspring are created by applying the chosen operator to the corresponding components of the parents; in the latter, to create each component of a new element a number ρ of parents are randomly chosen and recombination is carried out on the corresponding components.

After recombination, mutation is performed with a given probability p_m on each element z = (z_1, ..., z_ℓ) so as to obtain a new element w = (w_1, ..., w_ℓ). This requires several steps, as stated in the following. Firstly, we mutate the recombined variances:

  σ'_i = σ_i · exp(z_i + z_0)

with z_0 ∈ N(0, τ_0²) and z_i ∈ N(0, τ²), i = 1, ..., n_σ (N(ξ, δ²) denotes the normal distribution with mean ξ and variance δ²). Secondly, we mutate the recombined covariances:

  α'_j = α_j + z_j

with z_j ∈ N(0, β²), j = 1, ..., n_α. Finally, we mutate the recombined object variables using the recombined and already mutated variances and covariances:

  x'_i = x_i + cor_i(σ', α')

where cor is a random vector with normally distributed, possibly correlated, components, built from the σ' and α' obtained in the previous steps.

III. Test functions and their features
Test functions have often been used to tune and refine variants of a single evolutionary algorithm and to argue the superiority of one approach over another. As Whitley and others state in [7], the danger in this practice is that algorithms can become customised for a particular set of test problems; this is troubling if the test problems do not represent the types of problems that evolutionary algorithms are best suited for in practice. An excellent overview of why the usual set of test functions (F1 to F10) is not really difficult and meaningful can be found in the above-mentioned paper. We wish here simply to recall the basic points of that paper, expressing the limitations of the widely known test functions.

A. Separability
Most of the classical test problems are separable, i.e. there are no nonlinear interactions between variables. This means that they can be solved by simply optimising independently in each dimension; as a consequence, exact search methods exist that require only O(n) function evaluations to completely solve, for example, F6 and F7. Furthermore, it is remarked in that paper that the F8 function becomes easier as its dimensionality increases.

B. Symmetry
For 2-dimensional functions symmetry exists if F(x, y) = F(y, x). All the classical test functions are symmetric. If a function is separable and symmetric, it can also display increased symmetry at higher dimensions. If a function possesses this feature, then for a function of n variables up to n! equivalent solutions may exist, so the problem is actually easier.
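To make the separability point concrete, here is a small illustrative sketch (the sample function and all names are ours, not from the paper) showing that a separable function can be completely minimised by n independent one-dimensional searches, i.e. in O(n) work per grid resolution, with no coordinate ever interacting with another:

```python
# A separable function f(x) = sum_i g(x_i): the coordinates do not
# interact, so each one can be optimised on its own.

def g(xi):
    return (xi - 3.0) ** 2          # 1-D component, minimum at xi = 3.0

def f(x):
    return sum(g(xi) for xi in x)   # separable n-dimensional function

def coordinate_search(n, lo=-5.12, hi=5.12, steps=1000):
    # n independent 1-D grid searches, one per dimension.
    best = []
    for _ in range(n):
        grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
        best.append(min(grid, key=g))
    return best
```

For a nonseparable function such as F2, where the term (x² − y)² couples adjacent variables, this decomposition is no longer valid, which is exactly why the composed functions of the next section are harder.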
C. Scalability

Scalability is a desired feature, so as to allow search algorithms to be tested on problems with progressively higher dimensionality. The problem with the classical test functions is that the nonlinear interactions themselves should be sensitive to scaling.

Moreover, in that paper the authors state that test suites should contain problems having all the following features: resistant to hill-climbing (i.e. to simple search strategies), nonlinear, nonseparable and nonsymmetric; furthermore, they should contain scalable functions and problems with both scalable evaluation cost and a canonical form.

Finally, in that paper it is shown how to compose the already existing test functions to create new, more reliable ones showing the desired features of nonlinearity, nonseparability and scalability. The basic idea is to use a nonlinear function of two variables F(x, y) as a starting function. The function can then be scaled to n variables by expanding it in such a way that the newly obtained expanded function EF is no longer separable. The new EF function has the general form:

  EF(x_1, x_2, ..., x_n) = F(x_1, x_2) + F(x_2, x_3) + ... + F(x_{n-1}, x_n) + F(x_n, x_1)

This can be done by writing down a bidimensional table containing all the possible combinations of the variables x_1, ..., x_n taken two by two, defining a weight for each such combination and taking into account all the possible combinations according to predefined strategies like minor diagonal scaling (`wrap'), upper or lower triangle, or full matrix. Starting from the same fundamental function we will obtain different composed functions depending on the strategy chosen and on the weights, containing O(n) terms for wrap and O(n²) terms in the other cases.

As an example of this strategy, in the above-referenced paper the authors propose a composition which starts with a primitive function of one variable and composes it with an inner function that takes in two variables and outputs a single value falling into the domain of the outer primitive function. Going into detail, they have composed the Griewangk function (F8) with De Jong's F2 in the following manner (with wrap expansion):

  F8F2(x_1, x_2, x_3, ..., x_n) = F8(F2(x_1, x_2)) + F8(F2(x_2, x_3)) + ... + F8(F2(x_{n-1}, x_n)) + F8(F2(x_n, x_1))

It is nonsymmetric (because F2 is nonsymmetric) with many local minima, nonlinear, scalable in n, and avoids the scale-up problems of F8, since the latter is used in its 1-dimensional form.

IV. Experimental results

We have used the F8F2 function defined in the previous section to test the effectiveness of ES both in absolute terms (the global best value is 0, attained at x_i = 1 for every i, regardless of the function size) and with reference to the results obtained by Whitley and his colleagues in [7] using other heuristic techniques. Those techniques are either discrete Genetic Algorithms, like the classical Simple Genetic Algorithm with Elitism and Tournament Selection (ESGAT) [2] and Genitor by Whitley [9], or adaptive search algorithms like Eshelman's CHC [10], or hill-climbing-based ones like the Random Bit Climber (RBC) by Davis [11] and the Line Search Algorithm by Whitley [7]. CHC, in particular, has many features in common with GAs (importance of recombination) but it also has features which would classify it as a (μ + λ)-ES. We have taken into account several dimensions for this function, namely 10, 20, 50 and 100 (the same considered in [7]). We have run our (μ + λ)-ES on a RISC processor (actually a node of the parallel machine Meiko Computing Surface CS2). In order to obtain average values our program has been executed 30 times for each size (as was done for the other techniques), assigning different values to the random number generator seed. We have stopped our ES after 500,000 evaluations to obtain results consistent with those in [7].

As regards the parameter set, values of 15 and 100 have been used for μ and λ respectively. These are quite reasonable default values which can be found in several papers on ES; they respect a value of about 1/7 for the ratio μ/λ, which is suggested by experts in ES. We have set n_σ = n and n_α = 0 (meaning that no correlation among object variables has been taken into account). The latter choice depends on the fact that in order to completely correlate n variables one should consider n(n − 1)/2 covariances; this would mean that for the 50-sized problem we would need 50 · 49/2 = 1225 covariances, and for n = 100 this number increases to 100 · 99/2 = 4950, so the individual length would become excessive. We might have made some choice and decided to correlate only some variables rather than others, but we did not have any knowledge of the problem, so we did not know which directions to prefer in the linkage, if any.

With these premises, our individuals are represented by vectors of 2n real values, the former n representing the object variables (varying in the range [−2.048, 2.048]) and the latter n the variances (varying between 0.0 and an upper limit which has become a parameter itself, although, according to theory, ES are capable of self-adjusting this value). In Table I we report the average final values for all the techniques.

As can be seen, ES always outperform all the different forms of discrete Genetic Algorithms, and are about comparable to RBC apart from the case n = 10, where the latter algorithm found an extraordinarily good mean value. As the function size increases, the superiority of ES with respect to GAs becomes more and more evident. The comparison against the hill-climbing-based methods shows that the Line method is, as a general trend, better than ES. For the size of 100, Line is not able to obtain excellent values, and this depends on its structure, which limits the search in this case because of the limit on the maximum allowed number of evaluations. In this case the ES obtain results quite close to Line, while CHC is by far the best. The comparison with RBC, instead, shows that for small sizes the latter is better than ES, but as the size increases the two achieve quite similar results, and for the largest size
TABLE I
The average final results obtained by several techniques on the F8F2 function.

             n = 10              n = 20              n = 50               n = 100
           average   sigma     average   sigma     average    sigma     average    sigma
CHC         1.344    0.921      5.630    2.862     75.0995    49.644    670.223   377.59
RBC         0.139    0.422      7.243   11.289    301.561     72.745   1655.557   605.268
ESGAT       4.077    2.742     47.998   32.615    527.100    176.988   2991.89    596.470
Genitor     4.365    2.741     21.452   19.459    398.120    220.284   2844.389   655.159
Line        3.0294   5.1526     2.5025   3.2801     2.6519     2.5583  1606.4    1583.5
ES          2.117    1.417      8.852   21.698    141.679    155.052   1579.264   631.193
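For reference, the composed function used in these experiments can be written down in a few lines. The sketch below assumes the standard one-dimensional Griewangk form and De Jong's F2 (the two-dimensional Rosenbrock function), which is our reading of the definitions in [7]; it is illustrative, not the authors' code:

```python
import math

def f2(x, y):
    # De Jong's F2 (Rosenbrock, 2-D): nonsymmetric and nonlinear,
    # with its minimum value 0 at (1, 1).
    return 100.0 * (x * x - y) ** 2 + (1.0 - x) ** 2

def f8(x):
    # Griewangk's F8 in its 1-dimensional form, minimum 0 at x = 0.
    return 1.0 + x * x / 4000.0 - math.cos(x)

def f8f2(xs):
    # Wrap expansion: F8(F2(x1,x2)) + ... + F8(F2(x_{n-1},x_n)) + F8(F2(xn,x1)).
    n = len(xs)
    return sum(f8(f2(xs[i], xs[(i + 1) % n])) for i in range(n))
```

Since F2 vanishes only at (1, 1) and F8 vanishes at 0, the composed function attains its global minimum 0 at the all-ones vector, for any problem size n.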
ES become preferable. So, the results for ES are quite encouraging, as they look more and more effective as the problem size grows, while other techniques, very powerful for small sizes, tend to lose their efficiency. Nonetheless, we have also tried to tune the parameters, and we have found that smaller values for μ and λ (10 and 50, respectively) improve the performance, at least for this function. In fact, really excellent values, especially for the 50- and 100-sized problems, have been found, as can be seen in Table II, where we report both the average final value and the five final values found for each problem. From Table II it appears evident that ES, once tuned, can reach about the same performance as hill-climbing-based techniques over the whole size range, that they outperform many of them at the highest problem sizes, and that for any problem instance they are almost equivalent to the best technique for that instance; thus their robustness is demonstrated. In Figs. 1 to 4 we sketch the evolution of the best fitness value as a function of the generation number for the best run obtained for problem instances of size 10, 20, 50 and 100 respectively. From them it can be seen that there is an initial phase (about 5%-10% of the generations) in which the fitness values decrease very fast (about 90% of the distance between the initial best and the global best is covered in this phase). During the remaining generations, instead, further improvement is obtained quite slowly, so that ES seem very well suited if we need to find a `reasonably good' solution within a low number of evaluations.

V. Conclusions and future work

In this paper an approach for writing new, more difficult test functions has been reported, and one such function has been thoroughly investigated by using Evolution Strategies. The results obtained have been compared against those achieved for the same function by using discrete Genetic Algorithms, adaptive search strategies and hill-climbing methods. The comparison shows that Evolution Strategies perform better than discrete Genetic Algorithms on this problem, and the difference increases as the function size does. Furthermore, they become more and more attractive even with respect to the other techniques as the problem dimension increases. Our future work in this field will consist firstly in implementing some kind of continuous or real-coded Genetic Algorithm so as
to compare our results against those obtained with this latter technique. Moreover, we plan to implement more such composed functions, to see whether the results of this comparison are more general than what the consideration of just this one function yields.

Fig. 1. The evolution of the best solution obtained for the 10-sized function.

Acknowledgement
We wish to wholeheartedly thank Prof. Darrell Whitley, K. Mathias, S. Rana and J. Dzubera for the results contained in their paper [7] and reported by us here, and for their kindness in sending us some of their most important recent papers on the test function problem.

References

[1] J. H. Holland, Adaptation in Natural and Artificial Systems. MIT Press, 1975.
[2] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, Mass., 1989.
[3] H. P. Schwefel, "Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie," in Interdisciplinary Systems Research, Birkhäuser, Basle, Switzerland, 1977.
[4] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart, Germany, 1973.
[5] T. Bäck and H. P. Schwefel, "A Survey of Evolution Strategies," in Proceedings of the Fourth International Conference on Genetic Algorithms (Booker and Belew, eds.), pp. 2-9, Morgan Kaufmann, 1991.
[6] K. De Jong, "An Analysis of the Behavior of a Class of Genetic Adaptive Systems," PhD thesis, University of Michigan, 1975.
TABLE II
The average, the sigma and the final values (over five runs) achieved by using μ = 10 and λ = 50.

           n = 10    n = 20    n = 50      n = 100
average    0.9865    3.9679    32.3395     840.1513
sigma      0.0001    0.0208     2.5742     355.3459
run 1      0.9864    3.9476    28.74441    255.4123
run 2      0.9864    3.9523    31.60573    844.2608
run 3      0.9865    3.9626    32.06928    861.4934
run 4      0.9865    3.9765    33.50923   1064.037
run 5      0.9867    4.0005    35.76927   1175.553
Fig. 2. The evolution of the best solution obtained for the 20-sized function.
Fig. 3. The evolution of the best solution obtained for the 50-sized function.
[7] D. Whitley, K. Mathias, S. Rana, and J. Dzubera, "Evaluating Evolutionary Algorithms," accepted for publication in Artificial Intelligence, 1996.
[8] H. P. Schwefel and G. Rudolph, "Contemporary Evolution Strategies," in Advances in Artificial Life, Third International Conference on Artificial Life, vol. 929 of Lecture Notes in Artificial Intelligence, pp. 893-907, Springer-Verlag, Berlin, 1995.
[9] D. Whitley, "The GENITOR Algorithm and Selective Pressure: Why Rank-Based Allocation of Reproductive Trials is Best," in Proceedings of the Third International Conference on Genetic Algorithms (J. D. Schaffer, ed.), pp. 116-121, Morgan Kaufmann, 1989.
[10] L. Eshelman, "The CHC Adaptive Search Algorithm," Morgan Kaufmann, 1991.
[11] L. Davis, "Representational Bias and Test Suite Design," in Proceedings of the Fourth International Conference on Genetic Algorithms (Booker and Belew, eds.), pp. 18-23, Morgan Kaufmann, 1991.
Fig. 4. The evolution of the best solution obtained for the 100-sized function.