Differential Evolution Based on Improved Learning Strategy

Yuan Shi, Zhen-zhong Lan and Xiang-hu Feng

Software School, Sun Yat-sen University, Guangzhou, P.R. China
[email protected]
Abstract. From a learning perspective, the mutation scheme in differential evolution (DE) can be regarded as a learning strategy: when mutating, three random individuals are selected and placed in a random order. This strategy, however, may suffer from drawbacks that slow down the convergence rate. To improve the efficiency of classic DE, this paper proposes differential evolution based on an improved learning strategy (ILSDE). The proposed learning strategy, inspired by the learning theory of Confucius, places the three individuals in a more reasonable order. Experiments on 23 test functions demonstrate that ILSDE performs better than classic DE.

Keywords: Differential evolution, mutation scheme, learning strategy
1 Introduction
For human beings, learning is a fundamental process through which we gain knowledge and explore the outside world. By learning, an individual is able to obtain information from others and thus broaden personal experience. Learning therefore provides a way for a group of people to cooperate and share their ideas and experience; in other words, it provides a cooperative way to solve complicated problems. This is one of the essential characteristics of a class of algorithms, namely evolutionary algorithms (EAs).

EAs, inspired by biological phenomena, are a branch of stochastic search algorithms. During the last several decades, a variety of EAs have been developed and applied to various problems in science and engineering. Popular examples include the genetic algorithm (GA) [1], ant colony optimization (ACO) [2], particle swarm optimization (PSO) [3] and differential evolution (DE) [4]. In GA, an individual learns from another one through the crossover operation, and the direction of learning is guaranteed by the selection operation. In ACO, through pheromone deposit and pheromone decay, good information is reinforced while bad information is weakened; ants are therefore attracted to superior areas and further improve the solutions. In PSO, each particle learns from two positions, the global best position found so far and its personal best position found so far, so particles quickly fly towards promising areas, just like a bird flock. DE adopts another distinctive learning strategy, in which an individual learns from three other individuals: the strategy first randomly selects one individual as the mutation base, then adds the weighted difference between two other randomly selected individuals to the mutation base.

The mutation scheme of classic DE is meaningful and shares some similarity with the learning theory of Confucius, one of the most famous ideologists of ancient China. He said, "When I walk along with two others, they may serve me as my teachers. I will select their good qualities and follow them, know their bad qualities and avoid them." [5] According to this theory, human beings learn in a similar yet more intelligent and effective way than classic DE does; the major difference is that human beings are sophisticated enough to distinguish good information from bad. The efficiency and effectiveness of learning in classic DE can therefore be improved by borrowing ideas from Confucius' learning theory.

In this paper, we design an improved learning strategy based on the learning theory of Confucius and apply it to the mutation scheme of DE. In the proposed mutation scheme, the information of the good individual and the difference between the good and bad ones are utilized in a specific manner; in other words, the three randomly selected individuals are placed in a specific and more reasonable order. Numerical experiments on 23 test functions are carried out to test the performance of differential evolution based on the improved learning strategy (ILSDE). Experimental results demonstrate the superiority of ILSDE.
2 Problem Formulation
In this paper, the following function optimization problem is addressed:

    minimize f(x)  s.t.  x ∈ Ω

where x = (x_1, x_2, ..., x_N) is the continuous variable vector in domain Ω ⊂ R^N, and f(x): Ω → R is a continuous real-valued function. The domain Ω is defined by specifying lower and upper bounds, l = (l_1, l_2, ..., l_N) and u = (u_1, u_2, ..., u_N), respectively.

3 Differential Evolution
Differential Evolution (DE) is a population-based EA over a continuous domain. The population is composed of PS individuals x_{i,G}, i = 1, 2, ..., PS, where G denotes the current generation. Each individual x_{i,G} is represented by a D-dimensional vector: x_{i,G} = (x_{i1,G}, x_{i2,G}, ..., x_{iD,G}).
Like other EAs, DE guides the population towards the global optimum through repeated cycles of evolutionary operations: mutation, crossover and selection. Detailed descriptions of the three operations are given as follows.

(1) Mutation
For each individual x_{i,G}, i = 1, 2, ..., PS, the mutation operation first randomly selects three other individuals, denoted by x_{r1,G}, x_{r2,G} and x_{r3,G}; a mutated individual v_{i,G} is then generated according to

    v_{i,G} = x_{r1,G} + F · (x_{r2,G} − x_{r3,G})    (1)

where i, r1, r2, r3 ∈ [1, PS] are mutually distinct, and F ∈ (0, 2] is a positive real factor named the scaling factor. Literature [4] discusses some other variants of DE in which more than three individuals are involved in the mutation operation.

(2) Crossover
The crossover operation recombines x_{i,G} and v_{i,G} to yield a trial individual u_{i,G+1}:

    u_{ij,G+1} = v_{ij,G}   if r(j) < CR or j = rn(i),
                 x_{ij,G}   otherwise,    (2)

with j = 1, 2, ..., D. Here r(j) ∈ [0, 1] is the jth evaluation of a uniform random number generator, and rn(i) ∈ {1, ..., D} is a randomly chosen index which ensures that u_{i,G+1} inherits at least one element from v_{i,G}.
(3) Selection
After crossover, a one-to-one competition is played between x_{i,G} and u_{i,G+1}, and the better one is promoted to the next generation:

    x_{i,G+1} = u_{i,G+1}   if f(u_{i,G+1}) < f(x_{i,G}),
                x_{i,G}     otherwise.    (3)
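To make the three operations concrete, here is a minimal, self-contained Python sketch of one generation of classic DE (Eqs. 1–3). The function name `de_step` and the pure-Python list representation are our own illustrative choices, not from the paper:

```python
import random

def de_step(pop, f, F=0.5, CR=0.9):
    """One generation of classic DE (Eqs. 1-3, minimization).
    pop: list of PS individuals, each a list of D floats; f: objective."""
    PS, D = len(pop), len(pop[0])
    new_pop = []
    for i in range(PS):
        # Mutation (Eq. 1): three mutually distinct individuals, all != i
        r1, r2, r3 = random.sample([r for r in range(PS) if r != i], 3)
        v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
        # Crossover (Eq. 2): binomial, index rn guarantees one element from v
        rn = random.randrange(D)
        u = [v[j] if (random.random() < CR or j == rn) else pop[i][j]
             for j in range(D)]
        # Selection (Eq. 3): one-to-one competition
        new_pop.append(u if f(u) < f(pop[i]) else pop[i])
    return new_pop
```

Because selection is greedy, each individual's objective value never worsens from one generation to the next.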
4 DE Based on Improved Learning Strategy
The mutation scheme of DE can be regarded as a learning strategy: in classic DE, it randomly determines the mutation base and the two individuals used to generate the weighted difference. This strategy, however, may suffer from two drawbacks. First, if the quality of the randomly selected mutation base is relatively bad, the generated mutated individual is likely to be unfavorable. Second, since the two individuals for calculating the weighted difference are placed in random order, the direction of the difference may be too arbitrary, making the mutation inefficient. These two drawbacks can slow down the convergence rate of the algorithm considerably.
How can the efficiency of the mutation scheme, i.e., of the learning strategy, be improved? Confucius' learning theory puts forward a feasible and promising answer. Following his words, "I will select their good qualities and follow them, know their bad qualities and avoid them", we design an improved learning strategy in which the mutated individual learns the good qualities but avoids the bad qualities of the randomly selected individuals. Generally, suppose K (K ≥ 3) individuals are randomly chosen to perform the mutation operation. To "follow the good qualities", the best one among the K individuals, referred to as lbest, is set as the mutation base. To "avoid the bad qualities", the direction of the weighted difference should point not only towards a good individual but also away from a bad one; thus, the weighted difference between the second best individual (slbest) and the worst one (lworst) is used. Mathematically, this improved learning strategy can be represented by

    v_{i,G} = lbest + F · (slbest − lworst)    (4)

where F and v_{i,G} are the same as in (1).
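As a sketch, the mutation of Eq. (4) can be implemented as follows; the helper name `ilsde_mutate` is our own, and crossover and selection remain the same as in classic DE:

```python
import random

def ilsde_mutate(pop, f, i, F=0.5, K=3):
    """ILSDE mutation (Eq. 4): sample K individuals (excluding i), sort them
    by objective value, and combine lbest, slbest and lworst (minimization)."""
    cand = random.sample([r for r in range(len(pop)) if r != i], K)
    cand.sort(key=lambda r: f(pop[r]))              # best first
    lbest, slbest, lworst = pop[cand[0]], pop[cand[1]], pop[cand[-1]]
    # Follow the good qualities (lbest as base); the difference term points
    # from lworst towards slbest, i.e. away from the bad individual.
    return [lbest[j] + F * (slbest[j] - lworst[j]) for j in range(len(lbest))]
```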
5 Numerical Experiments
5.1 Test Functions
To test the performance of ILSDE, 23 commonly used test functions [6] from three categories are chosen; Table 1 lists the functions and their key properties. f1–f7 are unimodal functions, which are relatively easy to optimize and can be used to test the convergence rate of an optimization algorithm. f8–f13 are multimodal functions in which the number of local minima increases exponentially with the problem dimension. f14–f23 are multimodal functions with only a few local minima, often used to test the ability of an optimization algorithm to escape from deceptive optima and locate the desired near-global solution. Details of the functions can be found in [6].

Table 1. Test Functions.
Benchmark Function | Ω | fmin
f1 = Σ_{i=1}^n x_i^2 | [-100,100]^30 | 0
f2 = Σ_{i=1}^n |x_i| + Π_{i=1}^n |x_i| | [-10,10]^30 | 0
f3 = Σ_{i=1}^n (Σ_{j=1}^i x_j)^2 | [-100,100]^30 | 0
f4 = max_i {|x_i|, 1 ≤ i ≤ n} | [-100,100]^30 | 0
f5 = Σ_{i=1}^{n-1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2] | [-30,30]^30 | 0
f6 = Σ_{i=1}^n (⌊x_i + 0.5⌋)^2 | [-100,100]^30 | 0
f7 = Σ_{i=1}^n i·x_i^4 + random[0,1) | [-1.28,1.28]^30 | 0
f8 = Σ_{i=1}^n −x_i·sin(√|x_i|) | [-500,500]^30 | −12569.5
f9 = Σ_{i=1}^n [x_i^2 − 10cos(2πx_i) + 10] | [-5.12,5.12]^30 | 0
f10 = −20exp(−0.2·√((1/n)Σ_{i=1}^n x_i^2)) − exp((1/n)Σ_{i=1}^n cos(2πx_i)) + 20 + e | [-32,32]^30 | 0
f11 = (1/4000)Σ_{i=1}^n x_i^2 − Π_{i=1}^n cos(x_i/√i) + 1 | [-600,600]^30 | 0
f12 = (π/n){10sin^2(πy_1) + Σ_{i=1}^{n-1}(y_i − 1)^2[1 + 10sin^2(πy_{i+1})] + (y_n − 1)^2} + Σ_{i=1}^n u(x_i,10,100,4), with y_i = 1 + (x_i + 1)/4 | [-50,50]^30 | 0
f13 = 0.1{sin^2(3πx_1) + Σ_{i=1}^{n-1}(x_i − 1)^2[1 + sin^2(3πx_{i+1})] + (x_n − 1)^2[1 + sin^2(2πx_n)]} + Σ_{i=1}^n u(x_i,5,100,4) | [-50,50]^30 | 0
f14 = [1/500 + Σ_{j=1}^{25} 1/(j + Σ_{i=1}^2 (x_i − a_{ij})^6)]^{-1} | [-65.536,65.536]^2 | 1
f15 = Σ_{i=1}^{11} [a_i − x_1(b_i^2 + b_i·x_2)/(b_i^2 + b_i·x_3 + x_4)]^2 | [-5,5]^4 | 0.0003
f16 = 4x_1^2 − 2.1x_1^4 + (1/3)x_1^6 + x_1·x_2 − 4x_2^2 + 4x_2^4 | [-5,5]^2 | −1.03
f17 = (x_2 − (5.1/4π^2)x_1^2 + (5/π)x_1 − 6)^2 + 10(1 − 1/8π)cos(x_1) + 10 | [-5,10]×[0,15] | 0.398
f18 = [1 + (x_1 + x_2 + 1)^2(19 − 14x_1 + 3x_1^2 − 14x_2 + 6x_1x_2 + 3x_2^2)] × [30 + (2x_1 − 3x_2)^2(18 − 32x_1 + 12x_1^2 + 48x_2 − 36x_1x_2 + 27x_2^2)] | [-2,2]^2 | 3
f19 = −Σ_{i=1}^4 c_i·exp[−Σ_{j=1}^3 a_{ij}(x_j − p_{ij})^2] | [0,1]^3 | −3.86
f20 = −Σ_{i=1}^4 c_i·exp[−Σ_{j=1}^6 a_{ij}(x_j − p_{ij})^2] | [0,1]^6 | −3.32
f21 = −Σ_{i=1}^5 [(x − a_i)(x − a_i)^T + c_i]^{-1} | [0,10]^4 | −10.1532
f22 = −Σ_{i=1}^7 [(x − a_i)(x − a_i)^T + c_i]^{-1} | [0,10]^4 | −10.4029
f23 = −Σ_{i=1}^{10} [(x − a_i)(x − a_i)^T + c_i]^{-1} | [0,10]^4 | −10.5364
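For illustration, several of the benchmarks in Table 1 translate directly into code. The sketch below (function names match the table; the implementation itself is ours) implements f1, f5 and f9:

```python
import math

def f1(x):
    """Sphere function: unimodal and separable."""
    return sum(xi * xi for xi in x)

def f5(x):
    """Rosenbrock function: global minimum inside a narrow curved valley."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))

def f9(x):
    """Rastrigin function: highly multimodal."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)
```

Each reaches its Table 1 minimum of 0 at x = (0, ..., 0) for f1 and f9, and at x = (1, ..., 1) for f5.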
5.2 Experimental Setup
For both algorithms, the parameters are set the same for all test functions: population size PS = 100, crossover rate CR = 0.9, and scaling factor F = 0.5. These settings follow the suggestions of Storn and Price [4] and Vesterstrom [7]. The initial population is generated uniformly at random in the search domain of each function.

5.3 Comparisons between DE and ILSDE
The performance of ILSDE is evaluated on the 23 test functions. Comparisons between ILSDE and DE are reported in Table 2, in which mean and variance denote the mean value and standard deviation of the optima over 30 runs, respectively. The results reveal that ILSDE achieves higher accuracy on both unimodal and multimodal functions.

Table 2. Comparison between DE and ILSDE.

FN   Computational effort   ILSDE Mean Best (Variance)    DE Mean Best (Variance)
f1   100000                 1.19×10^-19 (1.34×10^-19)     1.81×10^-8 (7.54×10^-8)
f2   100000                 1.67×10^-9 (7.95×10^-9)       1.52×10^-4 (3.94×10^-4)
f3   200000                 1.26×10^-8 (1.27×10^-8)       9.00×10^-2 (3.34×10^-2)
f4   200000                 8.51×10^-6 (3.37×10^-6)       1.85×10^-1 (2.46×10^-1)
f5   500000                 1.08×10^-29 (2.31×10^-29)     1.10×10^-12 (1.79×10^-11)
f6   50000                  0 (0)                         0 (0)
f7   50000                  2.60×10^-2 (5.63×10^-3)       5.21×10^-2 (1.26×10^-2)
f8   50000                  -6281.2 (3.90×10^2)           -5354.0 (2.22×10^2)
f9   50000                  187.37 (9.21)                 199.8 (12.05)
f10  50000                  1.66×10^-4 (4.57×10^-5)       3.21×10^-1 (1.26×10^-1)
f11  50000                  7.42×10^-4 (4.18×10^-3)       2.39×10^-1 (1.42×10^-1)
f12  50000                  3.67×10^-8 (1.89×10^-8)       2.24×10^-3 (1.34×10^-3)
f13  50000                  4.78×10^-7 (2.98×10^-7)       4.98×10^-2 (2.05×10^-2)
f14  10000                  0.998 (9.93×10^-17)           0.998 (3.07×10^-16)
f15  50000                  3.99×10^-4 (2.74×10^-4)       4.29×10^-4 (3.11×10^-4)
f16  5000                   -1.03 (9.45×10^-9)            -1.03 (5.73×10^-7)
f17  5000                   0.398 (5.82×10^-9)            0.398 (2.17×10^-7)
f18  5000                   3.00 (8.36×10^-13)            3.00 (2.84×10^-9)
f19  5000                   -3.86 (1.78×10^-12)           -3.86 (3.48×10^-8)
f20  5000                   -3.25 (5.54×10^-2)            -3.21 (2.97×10^-2)
f21  10000                  -10.15 (8.25×10^-11)          -10.15 (3.69×10^-5)
f22  10000                  -10.40 (2.08×10^-13)          -10.40 (1.48×10^-7)
f23  10000                  -10.53 (9.21×10^-12)          -10.53 (2.26×10^-6)

5.4 Detailed Analysis
(1) Unimodal Functions
Without loss of generality, f1 and f5 are chosen to represent the unimodal functions and analyzed in more detail. f1 is relatively simple to optimize and is commonly used to test the convergence rate of an algorithm. f5, by contrast, is often among the most difficult functions: its global minimum lies inside a long, narrow, parabolic flat valley, the variables are strongly dependent, and the gradients generally do not point towards the optimum. As indicated in Table 2 and Fig. 1, ILSDE outperforms DE in terms of convergence rate.
Fig. 1. Comparison of convergence rate between DE and ILSDE on f1 and f5.
(2) Multimodal Functions
f9, f10 and f16, f17 are selected to represent multimodal functions with high and low dimensions, respectively. f9 and f10 are highly multimodal, and most evolutionary algorithms with relatively small populations tend to get trapped in local optima. Indeed, on f9 both DE and ILSDE get trapped in local minima when CR is 0.9, but when CR is set to 0.1 both algorithms obtain much better results. As can be seen in Fig. 2 and Table 2, ILSDE performs better at escaping from deceptive optima. f16 and f17 are two-dimensional functions with a few local optima; from Fig. 3 and Table 2, no significant difference can be observed between DE and ILSDE on this category of functions.
Fig. 2. Comparison of convergence rate between DE and ILSDE on f9 and f10.
Fig. 3. Comparison of convergence rate between DE and ILSDE on f16 and f17.
In the previous experiments, K is set to 3. It is interesting to study the influence of larger values of K, so another set of experiments is carried out with K = 5 and K = 10. Fig. 4 illustrates the performance on f1, f5, f9 and f10, showing that these settings converge faster than ILSDE with K = 3. On the other hand, as observed in some test cases, a faster convergence rate may make trapping into local minima more likely. Further analysis and study will be made in our future work.
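A self-contained harness along the following lines can be used to repeat this kind of K comparison. This is our own sketch: the function name `ilsde_run` and its defaults are illustrative, not the paper's exact experimental code.

```python
import random

def ilsde_run(f, D, bounds, PS=100, F=0.5, CR=0.9, K=3, gens=50, seed=0):
    """One ILSDE run (sketch, minimization). K is the number of individuals
    sampled for the Eq. (4) mutation; returns the best objective value found."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(PS)]
    fit = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(PS):
            # Improved learning strategy: sort K samples, use lbest/slbest/lworst
            cand = rng.sample([r for r in range(PS) if r != i], K)
            cand.sort(key=lambda r: fit[r])
            lb, sb, lw = pop[cand[0]], pop[cand[1]], pop[cand[-1]]
            v = [lb[j] + F * (sb[j] - lw[j]) for j in range(D)]
            # Binomial crossover (Eq. 2) and one-to-one selection (Eq. 3)
            rn = rng.randrange(D)
            u = [v[j] if rng.random() < CR or j == rn else pop[i][j]
                 for j in range(D)]
            fu = f(u)
            if fu < fit[i]:
                pop[i], fit[i] = u, fu
    return min(fit)

# e.g. compare K = 3 against K = 5 or 10 on the sphere function f1:
sphere = lambda x: sum(xi * xi for xi in x)
# ilsde_run(sphere, D=30, bounds=(-100, 100), K=3)  vs.  K=5, K=10
```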
Fig. 4. Comparison of convergence rate of ILSDE using different K on f1, f5, f9 and f10.
6 Conclusion
In this paper, the mutation scheme in DE is studied as a learning strategy. The strategy in classic DE has drawbacks because it randomly orders the selected individuals when generating the mutated individual. Borrowing ideas from the learning theory of Confucius, we designed an improved learning strategy that uses this information more intelligently. Extensive experiments show that ILSDE converges faster and obtains better results than classic DE in most test cases. Our future work will focus on finding more effective learning strategies and on applying the proposed approach to real-world problems.
References
1. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor (1975)
2. Dorigo, M.: Optimization, Learning and Natural Algorithms. PhD thesis, Dipartimento di Elettronica, Politecnico di Milano, Italy (1992)
3. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proc. IEEE Int. Conf. on Neural Networks, pp. 1942--1948 (1995)
4. Storn, R., Price, K.V.: Differential Evolution -- A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization, vol. 11, no. 4, pp. 341--359 (1997)
5. Confucius, Lau, D.C.: Confucius: The Analects. Chinese University Press (1992)
6. Yao, X., Liu, Y., Lin, G.: Evolutionary Programming Made Faster. IEEE Trans. on Evolutionary Computation, vol. 3, pp. 82--102 (1999)
7. Vesterstrom, J., Thomsen, R.: A Comparative Study of Differential Evolution, Particle Swarm Optimization, and Evolutionary Algorithms on Numerical Benchmark Problems. In: Proc. IEEE Congress on Evolutionary Computation (CEC 2004), vol. 2, pp. 1980--1987 (2004)