An Improved Particle Swarm Optimizer with Momentum

Tao Xiang, Jun Wang, and Xiaofeng Liao

Abstract— In this paper, an improved particle swarm optimization algorithm with momentum (mPSO) is proposed, inspired by the back-propagation (BP) learning algorithm with momentum in neural networks. The momentum acts as a low-pass filter to relieve excessive oscillation and also extends the PSO velocity updating equation to a second-order difference equation. Experimental results verify its superiority in both robustness and efficiency.
T. Xiang and X. Liao are with the College of Computer Science, Chongqing University, Chongqing 400044, China (e-mail: [email protected]; [email protected]). J. Wang is with the Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, China (e-mail: [email protected]).

I. INTRODUCTION

Particle swarm optimization (PSO) was first introduced by Kennedy and Eberhart in 1995. It is a metaphor of the social behavior of animals such as bird flocking and fish schooling [1], [2]. Investigations and applications have proved its capability and efficiency on various optimization problems. PSO represents each potential solution of the target function directly by a particle. These particles are conceptual: they possess neither volume nor weight, but each has its own position coordinate and velocity. Positions and velocities are randomly initialized first; all particles then "fly" through the solution space and update their positions until they find the optimal solution. During this iterative process, each particle's velocity is adjusted according to its own experience and social cooperation.

Since the introduction of PSO, numerous variants have emerged to enhance its capability of tackling various complex optimization problems. Almost all variants aim to balance global exploration and local exploitation, keep the diversity of particles, and accelerate or guarantee convergence.

In this paper, an improved PSO algorithm with momentum is proposed. The inspiration is drawn from the improvement of the BP learning algorithm with momentum in neural networks. The proposed method is tested on benchmarks and compared with other prevalent variants. Conclusions are drawn at the end.

II. STANDARD PSO ALGORITHM AND ITS VARIANTS

In this section, the standard PSO algorithm and two important variants are described in detail.

A. Standard PSO Algorithm

PSO is a population-based iterative stochastic optimization algorithm. In the standard PSO (SPSO) algorithm, all particles compose a swarm. In an n-dimensional search space, at the t-th iteration, the position vector and the velocity vector of the i-th particle can be represented as X_i^t = (x_{i,1}^t, x_{i,2}^t, ..., x_{i,n}^t) and V_i^t = (v_{i,1}^t, v_{i,2}^t, ..., v_{i,n}^t), respectively. The velocity and position updating rules are respectively given as follows:

    v_{i,j}^{t+1} = v_{i,j}^t + c_1 r_1 (y_{i,j}^t - x_{i,j}^t) + c_2 r_2 (\hat{y}_j^t - x_{i,j}^t)    (1)

    x_{i,j}^{t+1} = x_{i,j}^t + v_{i,j}^{t+1}    (2)

for i = 1, 2, ..., p; j = 1, 2, ..., n, where p is the swarm size (i.e., the number of particles in the swarm), n is the dimension of the search space, r_1 and r_2 are uniformly distributed random numbers between 0 and 1 (i.e., r_1, r_2 ∈ U(0, 1)), c_1 and c_2 are positive constants referred to as acceleration constants (a recommended value for them is c_1 = c_2 = 2 [1]), y_{i,j} is the j-th element of the best position found so far by the i-th particle, and \hat{y}_j is the j-th element of the global best position found so far by all particles of the swarm. y and \hat{y} are updated as follows:

    y_i^{t+1} = { y_i^t,      if f(x_i^{t+1}) ≥ f(x_i^t)
                { x_i^{t+1},  if f(x_i^{t+1}) < f(x_i^t)    (3)

    \hat{y}^{t+1} = arg min_{y ∈ {y_1^t, y_2^t, ..., y_p^t}} f(y)    (4)
Generally, a maximum velocity (V_max) for each dimension is defined in order to clamp the excessive roaming of particles. This limitation is described as follows:

    v_{i,j}^t = sign(v_{i,j}^t) · min(V_max, |v_{i,j}^t|)    (5)
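To make Eqs. (1), (2), and (5) concrete, the following is a minimal sketch of one SPSO iteration in NumPy. The function name `spso_step` and the (p, n) array layout are my own choices for illustration, not from the paper.

```python
import numpy as np

def spso_step(x, v, y, y_hat, c1=2.0, c2=2.0, v_max=None, rng=None):
    """One SPSO iteration: velocity update (1), clamping (5), position update (2).

    x, v  : (p, n) arrays of particle positions and velocities
    y     : (p, n) personal best positions
    y_hat : (n,) global best position
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)  # fresh U(0, 1) draws each iteration
    r2 = rng.random(x.shape)
    v = v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)  # Eq. (1)
    if v_max is not None:
        v = np.clip(v, -v_max, v_max)                   # Eq. (5)
    x = x + v                                           # Eq. (2)
    return x, v
```

A full optimizer would wrap this in a loop that also updates the personal bests y via Eq. (3) and the global best ŷ via Eq. (4) after every step.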
B. PSO Algorithm with Inertia Weight

The PSO algorithm with inertia weight (WPSO) was introduced in [3]. The inertia weight (ω) is employed to balance global exploration and local exploitation: a large inertia weight facilitates global exploration, while a small one enhances local exploitation. The velocity updating rule of this scheme is given as follows:

    v_{i,j}^{t+1} = ω v_{i,j}^t + c_1 r_1 (y_{i,j}^t - x_{i,j}^t) + c_2 r_2 (\hat{y}_j^t - x_{i,j}^t)    (6)
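A hedged sketch of Eq. (6), together with the commonly used linearly decreasing weight schedule reported in [4]; the helper names `wpso_velocity` and `linear_omega` are illustrative, not from the paper.

```python
import numpy as np

def wpso_velocity(v, x, y, y_hat, omega, c1=2.0, c2=2.0, rng=None):
    """WPSO velocity update, Eq. (6): the previous velocity is scaled by omega."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    return omega * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)

def linear_omega(t, t_max, w_start=0.9, w_end=0.4):
    """Linearly decrease the inertia weight from w_start to w_end over t_max iterations."""
    return w_start - (w_start - w_end) * t / t_max
```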
It is pointed out in [3] that WPSO takes the fewest iterations on average to find the global optimum when ω takes a value in [0.9, 1.2]. In another paper [4], Shi and Eberhart empirically studied the parameter selection of the inertia weight and suggested linearly decreasing ω, which substantially improved the performance of WPSO. At the beginning of the iterations, a large inertia weight helps particles explore new promising areas independently, while in the later iterations, a decreased
1-4244-1340-0/07/$25.00 ©2007 IEEE
inertia weight facilitates fine-tuning in the consensual area so as to accelerate convergence. A commonly used setting is to linearly decrease ω from 0.9 to 0.4 [4].

C. PSO Algorithm with Constriction Factor

The other prevalent concept is the constriction factor, which was introduced in [5]. The constriction factor (χ) is a constant utilized to constrict the magnitude of the flying velocity of particles. In this way, excessive roaming out of the defined search space is controlled, and the convergence rate is accelerated. The velocity updating rule of the PSO with constriction factor (CPSO) is given as follows:

    v_{i,j}^{t+1} = χ (v_{i,j}^t + c_1 r_1 (y_{i,j}^t - x_{i,j}^t) + c_2 r_2 (\hat{y}_j^t - x_{i,j}^t))    (7)

where

    χ = 2κ / |2 − φ − sqrt(φ^2 − 4φ)|    (8)

φ = c_1 + c_2, and κ ∈ [0, 1]. The constant κ controls the speed of convergence: when κ ≈ 0, fast convergence to a stable point is obtained, while κ ≈ 1 results in slow convergence [6]. In practice, κ = 1, φ = 4.1, and χ = 0.729 were recommended by Clerc et al. [5].

III. IMPROVING PSO WITH MOMENTUM

In this section, the PSO algorithm with momentum (mPSO) is introduced. The inspiration is drawn from the improvement of the back-propagation (BP) learning algorithm with momentum in neural networks.

The BP algorithm is one of the most prevalent algorithms for training multi-layer feedforward neural networks [7]. It uses the gradient descent method to minimize the error between actual output and expected output, but it tends to stagnate or oscillate in superficial local minima and fail to converge to the global minimum. The momentum was introduced to deal with this issue [8]. It acts as a low-pass filter to smoothen the alteration of the weights. Its mathematical principle can be simply explained as follows:

    y^t = (1 − λ) x^t + λ y^{t−1}    (9)

for 0 ≤ λ < 1, where x denotes the signal sequence to be smoothened, x^t the discrete signal value at time t, and y^t the filter's output at time t. λ is the so-called momentum factor: the greater the value of λ, the stronger the smoothing capability.

Inspired by this, a momentum is introduced into the velocity updating equation. For simplicity, the momentum is applied to SPSO. The new velocity updating rule is given as follows:

    v_{i,j}^{t+1} = (1 − λ)[v_{i,j}^t + c_1 r_1 (y_{i,j}^t − x_{i,j}^t) + c_2 r_2 (\hat{y}_j^t − x_{i,j}^t)] + λ v_{i,j}^{t−1}    (10)

Comparing (10) with the original SPSO velocity updating equation (1), the term λ v_{i,j}^{t−1} (i.e., the momentum) is added; λ, the momentum factor, indicates the effect of the momentum. This variant is unprecedented because, to the best of our knowledge, all velocity updating rules in existing schemes are first-order difference equations in the particle velocity, whereas in this variant, the momentum is employed to extend the rule to a second-order difference equation.

mPSO has at least the following three advantages. Firstly, it is just as simple as SPSO in both representation and implementation because it does not introduce any complicated operator or data structure. Secondly, it smoothens the movement of particles, which can eliminate the oscillation at the end of the iterations as well as the stagnation in local minima. Finally, it is a generalized technique that can be incorporated into all existing PSO schemes that have explicit velocity updating equations. For example, mPSO can be incorporated into hierarchical PSO [9], hybrid PSO, etc., where SPSO, WPSO, and CPSO are currently employed.
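The second-order update of Eq. (10) can be sketched as follows. The function name `mpso_step` is an illustrative choice; the one practical consequence of the second-order recurrence is that the caller must keep the velocity of the previous iteration (v^{t−1}) alongside the current one.

```python
import numpy as np

def mpso_step(x, v, v_prev, y, y_hat, lam=0.3, c1=2.0, c2=2.0, rng=None):
    """One mPSO iteration, Eq. (10): the bracketed SPSO-style update is weighted
    by (1 - lambda) and blended with the momentum term lambda * v^{t-1}."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    spso_part = v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)
    v_new = (1.0 - lam) * spso_part + lam * v_prev  # second-order: uses v^t and v^{t-1}
    return x + v_new, v_new, v  # new position, v^{t+1}, and v^t (next step's v^{t-1})
```

Note that setting lam = 0 recovers the plain SPSO rule of Eq. (1), which matches the constraint 0 ≤ λ < 1 of the low-pass filter in Eq. (9).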
IV. EXPERIMENTS AND DISCUSSIONS

This section lays out the experimental configurations and numerical results, as well as their discussion.

A. Experimental Setup

Six benchmarks, which are widely used in the literature, are utilized to test our scheme and compare it with other variants. Their mathematical representations and related parameter configurations are listed in Table I. The threshold column in Table I indicates the tolerable error range for judging whether a run of an algorithm finds the a priori global minimizer successfully. The population size is set to 30 throughout the following experiments. Each algorithm is run for 5000 iterations per run. In order to eliminate stochastic discrepancy, each algorithm is repeated 50 times. All criteria data are calculated after each run, and their average values constitute the final criteria data.

B. Criteria

Generally speaking, there are four criteria to evaluate the performance of a PSO algorithm: convergence precision error, success ratio, convergence speed, and convergence curve. The convergence precision error (ε) denotes the difference between the a priori minimum of an objective function and the actual convergent value; the smaller, the better. Here ε equals the final best value because the a priori minimum of each benchmark is 0. The success ratio (γ), which indicates the robustness of a PSO algorithm, can be defined as follows:

    γ = (number of successful runs) / (total number of runs)    (11)

where a successful run means the algorithm converges to the global minimizer of a benchmark within its error threshold. The convergence speed indicates the time cost to converge to the global minimizer. But due to the discrepancy among test functions (fitness evaluation is time-consuming, and the difference in time overhead among evaluating disparate functions is significant), this criterion is often measured by the number of evaluations before convergence (τ) instead, which
2007 IEEE Congress on Evolutionary Computation (CEC 2007)
is exactly adopted in this paper. If an algorithm cannot converge within the error threshold on a benchmark, this value is asserted to be inf. The convergence curve plots the tendency of the global best value of the cost function during the iterations. This criterion is widely given in the literature to observe the convergence behavior of a PSO algorithm or to compare performance among several algorithms.

TABLE I
BENCHMARKS FOR SIMULATIONS

Function  Name        Mathematical representation                                        Dimension  Domain         Initialization Range  Global Minimum  Threshold
f1        Sphere      f1(x) = Σ_{i=1}^{n} x_i^2                                          30         [−100, 100]    [50, 100]             0               0.01
f2        Rosenbrock  f2(x) = Σ_{i=1}^{n−1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2]       30         [−30, 30]      [15, 30]              0               100
f3        Rastrigrin  f3(x) = Σ_{i=1}^{n} [x_i^2 − 10 cos(2πx_i) + 10]                   30         [−5.12, 5.12]  [2.56, 5.12]          0               100
f4        Griewank    f4(x) = (1/4000) Σ_{i=1}^{n} x_i^2 − Π_{i=1}^{n} cos(x_i/√i) + 1   30         [−600, 600]    [300, 600]            0               0.1
f5        Ackley      f5(x) = −20 exp(−0.2 √((1/n) Σ_{i=1}^{n} x_i^2)) − exp((1/n) Σ_{i=1}^{n} cos(2πx_i)) + 20 + e
                                                                                         30         [−32, 32]      [16, 32]              0               0.1
f6        Schaffer    f6(x) = 0.5 + (sin^2(√(x_1^2 + x_2^2)) − 0.5) / (1 + 0.001(x_1^2 + x_2^2))^2
                                                                                         2          [−100, 100]    [50, 100]             0               0.00001

C. Results and Discussions

In our experiments, different values of λ (from 0.1 to 0.9 with 0.1 increments) are deployed and compared with other prevalent variants. It is found that mPSO gains the best performance when λ takes values between 0.2 and 0.4. Therefore, only the results of mPSO with λ = 0.2, 0.3, and 0.4 are shown here. The detailed results are presented in Table II and Fig. 1.

In Table II, as a whole, the criteria data of mPSO are better than those of SPSO, WPSO, and CPSO on average. In particular, mPSO with λ = 0.2 has the best robustness, as its success ratio is the highest on the five benchmarks other than f6, and it gains a 100% success ratio on four benchmarks (f1 to f4). But its best cost function value is not remarkable, as it only wins on the Rastrigrin and Ackley functions. As for convergence speed, mPSO with λ = 0.3 and λ = 0.4 have their advantages: mPSO with λ = 0.4 wins on four benchmarks (f1, f2, f3, and f6) and that with λ = 0.3 wins on two benchmarks (f4 and f5). When the best cost function value is considered, the victors are distributed more dispersedly; mPSO with λ = 0.2 to 0.4 still wins on four benchmarks (f1, f2, f3, and f5).

In order to give a detailed performance view of the algorithms, the convergence curves are also drawn in Fig. 1. For each benchmark, the mPSO whose ε is the best is plotted. In Fig. 1, the convergence curves of mPSO always descend rapidly during the iterations and reach the lowest point compared to those of SPSO, WPSO, and CPSO.

V. CONCLUSION

In this paper, an improved PSO algorithm with momentum is proposed. The momentum acts as a low-pass filter to relieve the excessive oscillation of particles in superficial minima
and accelerate the convergence rate. It also extends the PSO velocity updating equation to a second-order difference equation. In order to investigate its performance, a series of experiments is conducted on the benchmarks, and the results are compared with other prevalent versions (SPSO, WPSO, CPSO) under several criteria. From the experimental results, it is concluded that the mPSO algorithm with λ = 0.2 to 0.4 has superior performance in both efficiency and robustness. It gains the best robustness when λ = 0.2, whereas a greater value of λ (0.3 or 0.4) gives more efficiency in finding better solutions in fewer iterations. The proposed mPSO algorithm does not introduce any time-consuming operation, complicated topological structure, or profound velocity updating rule. It actually belongs to the foundational variants like WPSO and CPSO, but outperforms them significantly. Undoubtedly, mPSO can also be used in all extended variants where SPSO, WPSO, and CPSO are currently employed.

REFERENCES

[1] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proc. IEEE International Conference on Neural Networks, Perth, Australia, Nov. 1995, pp. 1942–1948.
[2] R. C. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Proc. of the Sixth International Symposium on Micro Machine and Human Science (MHS'95), Nagoya, Japan, Oct. 1995, pp. 39–43.
[3] Y. Shi and R. Eberhart, "A modified particle swarm optimizer," in Proc. IEEE International Conference on Evolutionary Computation, Anchorage, USA, May 1998, pp. 69–73.
[4] Y. Shi and R. C. Eberhart, "Parameter selection in particle swarm optimization," in Proc. of the 1998 Annual Conference on Evolutionary Programming, San Diego, USA, Mar. 1998, pp. 591–600.
[5] M. Clerc and J. Kennedy, "The particle swarm–explosion, stability, and convergence in a multidimensional complex space," IEEE Trans. on Evolutionary Computation, vol. 6, no. 1, pp. 58–73, 2002.
[6] F. van den Bergh and A. P. Engelbrecht, "A study of particle swarm optimization particle trajectories," Information Sciences, vol. 176, pp. 937–971, 2006.
[7] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.
[8] T. P. Vogl, J. K. Mangis, A. K. Zigler, W. T. Zink, and D. L. Alkon, "Accelerating the convergence of the backpropagation method," Biological Cybernetics, vol. 59, pp. 256–264, 1988.
[9] S. Janson and M. Middendorf, "A hierarchical particle swarm optimizer and its adaptive variant," IEEE Trans. on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 35, no. 6, pp. 1272–1282, 2005.
[Fig. 1. Convergence curves of mPSO, SPSO, WPSO, and CPSO on different benchmarks. Six panels plot the cost function value against the iteration number (0 to 5000): (a) f1 (Sphere), (b) f2 (Rosenbrock), (c) f3 (Rastrigrin), (d) f4 (Griewank), (e) f5 (Ackley), (f) f6 (Schaffer).]
TABLE II
RESULTS OF mPSO VS. SPSO, WPSO, AND CPSO
(for each benchmark: ε = final best value, γ = success ratio, τ = evaluations to convergence)

Algorithm        f1: ε        γ     τ        f2: ε        γ     τ        f3: ε     γ     τ
SPSO             2.9806e+004  0     inf      5.3967e+007  0     inf      320.9357  0     inf
WPSO             2.1328e-028  1     3093     58.5404      0.82  3388.5   28.6150   1     2402.9
CPSO             6.6114e-072  1     374.28   19.8988      0.94  952.34   96.4709   0.52  202
mPSO (λ = 0.2)   4.3695e-055  1     516.26   18.1544      1     715.76   18.5659   1     160.16
mPSO (λ = 0.3)   1.6699e-086  1     310.66   1.7942       1     395.02   30.8039   1     85.66
mPSO (λ = 0.4)   1.9148e-087  1     296.42   20.6266      0.98  310.06   36.2364   1     74.32

Algorithm        f4: ε     γ     τ        f5: ε     γ     τ        f6: ε        γ     τ
SPSO             272.3830  0     inf      19.7324   0     inf      0.0089148    0     inf
WPSO             0.0166    1     3010.6   3.6217    0.82  3019.2   3.3073e-291  1     1323
CPSO             0.0292    0.96  336.27   11.9804   0.06  434      0.0034977    0.64  539.5
mPSO (λ = 0.2)   0.0143    1     468.6    0.1591    0.88  467.61   0.0019432    0.8   377.73
mPSO (λ = 0.3)   0.0418    0.88  274.61   3.0933    0.02  237      0.0046636    0.52  429.46
mPSO (λ = 0.4)   0.0413    0.9   250.91   4.0192    0     inf      0.0042750    0.56  125.21
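As a concrete reference for the six benchmarks behind Tables I and II, the formulas of Table I can be written down directly in NumPy. These definitions are my own sketch (the paper provides no code); all functions have global minimum 0, at the origin except for Rosenbrock, whose minimum is at (1, ..., 1).

```python
import numpy as np

def sphere(x):        # f1
    return np.sum(x**2)

def rosenbrock(x):    # f2
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2)

def rastrigin(x):     # f3 (spelled "Rastrigrin" in Table I)
    return np.sum(x**2 - 10.0 * np.cos(2 * np.pi * x) + 10.0)

def griewank(x):      # f4
    i = np.arange(1, x.size + 1)
    return np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0

def ackley(x):        # f5
    n = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / n))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / n) + 20.0 + np.e)

def schaffer(x):      # f6 (two-dimensional)
    s = x[0]**2 + x[1]**2
    return 0.5 + (np.sin(np.sqrt(s))**2 - 0.5) / (1.0 + 0.001 * s)**2
```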