Population Diversity in Particle Swarm Optimization: Definition, Observation, Control, and Application

Thesis submitted in accordance with the requirements of the University of Liverpool for the degree of Doctor in Philosophy

by

Shi Cheng

Department of Electrical Engineering and Electronics
School of Electrical Engineering, Electronics and Computer Science
University of Liverpool

February 2013
Abstract

Optimization, in general, concerns finding the "best available" solution(s) for a given problem; the problem may have several or numerous optimum solutions, of which many are local optima. The goal of global optimization is to make the fastest possible progress toward the "good enough" solution(s).

Swarm intelligence is a set of search and optimization techniques. To search a problem domain, a swarm intelligence algorithm processes a population. A population is a collection of individuals, and each individual represents a potential solution of the problem being optimized. Particle Swarm Optimization (PSO) is a population-based stochastic algorithm modeled on the social behaviors observed in flocking birds. Each particle represents a solution and flies through the search space with a velocity that is dynamically adjusted according to its own and its companions' historical behaviors. The particles tend to fly toward better search areas over the course of the search process.

Premature convergence happens in PSO algorithms partially due to improper propagation of search information: fast propagation of search information causes particles to cluster together quickly. The most important factor affecting an optimization algorithm's performance is its balance of exploration and exploitation. Exploration means the ability of a search algorithm to explore different areas of the search space in order to have a high probability of finding good promising solutions. Exploitation, on the other hand, means the ability to concentrate the search around a promising region in order to refine a candidate solution. To solve the premature convergence problem, an optimization algorithm should optimally balance these two conflicting objectives.

In this thesis, the research on the population diversities of PSO algorithms comprises four parts. First, theoretical analyses of population diversities are proposed. A unified definition of population diversity is given, and the differences between different population diversity definitions are compared. Population diversity is a way to monitor the degree of convergence or divergence in the PSO search process. In other words, the particles' current distribution and moving tendency, whether the swarm is in a state of "flying" across a large search space or refining within a local area, can be obtained from this measurement. The population diversities of PSO, which include position diversity, velocity diversity, and cognitive diversity, are utilized to monitor the particles' search status during the
optimization process. Position diversity, velocity diversity, and cognitive diversity represent the distribution of the current solutions, the particles' "moving potential", and the particles' "moving target", respectively.

Second, the population diversities of different PSO algorithms solving single- and multi-objective problems are observed. Particles prematurely converge not only on local optima, but also on the boundary. Boundary constraints handling strategies are analysed and compared based on population diversity observations. Different variants of PSO algorithms have different search properties, partially because of their different search information propagation strategies. The search information propagation strategies among particles are analysed based on population diversity. The population diversities under different search information propagation strategies, diversity promotion for multimodal and large scale problems, and population diversities in multiobjective optimization are observed and discussed.

Third, population diversity based adaptive PSO algorithms are proposed. To increase the possibility of particles "jumping out" of local optima, while keeping the ability of the algorithm to find "good enough" solutions, a variant of the PSO algorithm, based on population diversity enhancement and dynamical search space reduction, is proposed to solve multimodal and large scale optimization problems. Also, an adaptive inertia weight is utilized to control the population diversity.

Fourth, the application of PSO algorithms to text categorization problems is presented. Text categorization is the problem of finding the correct category (or categories) for documents, given a set of categories (subjects, topics) and a collection of text documents. A text mining problem is modeled as an optimization problem in this thesis. The particle swarm optimizer is utilized in a semi-supervised learning classifier to optimize the categorized samples. A particle swarm optimizer based k nearest neighbor method is proposed to tune the parameter k and the number of examples in each class. The experimental results show that the classification accuracy is improved.

Finally, potential future research on some further questions and possible extensions is also discussed.
Contents

Abstract  iii
Contents  vi
List of Figures  x
List of Tables  xiii
List of Algorithms  xiii
List of Abbreviations  xiv

1 Introduction  1
  1.1 Overview  1
  1.2 Objectives  2
  1.3 Major Contributions  2
  1.4 Outline of the Thesis  3

2 Literature Review  5
  2.1 Overview  5
  2.2 Optimization  5
    2.2.1 Single Objective Optimization  6
    2.2.2 Multiobjective Optimization  9
  2.3 Evolutionary Computation  10
  2.4 Swarm Intelligence  11
  2.5 Particle Swarm Optimizer/Optimization  12
    2.5.1 Original Particle Swarm  14
    2.5.2 Classical Particle Swarm  15
    2.5.3 Fully Informed Particle Swarm  16
    2.5.4 Topology Structure  17
  2.6 Text Categorization  19
    2.6.1 Introduction  19
    2.6.2 Swarm Intelligence in Data Mining  20
    2.6.3 Chinese Text Categorization Process  21

3 Population Diversity  23
  3.1 Overview  23
  3.2 Population Diversity Definition  24
    3.2.1 Introduction  24
    3.2.2 Position Diversity  25
    3.2.3 Velocity Diversity  26
    3.2.4 Cognitive Diversity  27
    3.2.5 Discussion  28
  3.3 Diversity Monitoring and Analysis  29
    3.3.1 Comparison of Different PSO Diversity Definitions  29
    3.3.2 PSO Diversity Analysis  31
  3.4 Search Information Propagation  31
    3.4.1 Experimental Study  34
    3.4.2 Population Diversity Analysis and Discussion  38
    3.4.3 Conclusions  44
  3.5 Normalized Population Diversity  44
    3.5.1 Vector Norm and Matrix Norm  46
    3.5.2 Normalized Position Diversity  47
    3.5.3 Normalized Velocity Diversity  48
    3.5.4 Normalized Cognitive Diversity  49
    3.5.5 Experimental Studies  50
    3.5.6 Diversity Analysis and Discussion  53
    3.5.7 Conclusions  68

4 Population Diversity Observation  70
  4.1 Overview  70
  4.2 Population Diversity based Boundary Constraints Handling Analysis  70
    4.2.1 Boundary Constraints Handling  71
    4.2.2 Experimental Study  73
    4.2.3 Experimental Results  74
    4.2.4 Population Diversity Analysis and Discussion  82
    4.2.5 Conclusions  98
  4.3 Population Diversity in Single and Multi-Objective Optimization  100
    4.3.1 Introduction  100
    4.3.2 Multi-Objective Optimization  101
    4.3.3 Performance Metrics in Multi-Objective Optimization  102
    4.3.4 Diversity Change in Single Objective Optimization  109
    4.3.5 Population Diversity in Multiobjective Optimization  111
    4.3.6 Experimental Study  114
    4.3.7 Analysis and Discussion  118
    4.3.8 Conclusions  123

5 Population Diversity Control  125
  5.1 Overview  125
  5.2 Population Diversity Control  126
    5.2.1 Based on Random Noise  126
    5.2.2 Based on Average of Current Velocities  127
    5.2.3 Based on Current Position and Average of Current Velocities  128
    5.2.4 Conclusions  132
  5.3 Dynamical Exploitation Space Reduction  132
    5.3.1 Diversity Maintenance  133
    5.3.2 Experimental Study  136
    5.3.3 Diversity Analysis and Discussion  137
    5.3.4 Conclusions  143
  5.4 Adaptive Inertia Weight  146
    5.4.1 Diversity Analysis  146
    5.4.2 Particles with Different Inertia Weight  147
    5.4.3 Particles with Similar Inertia Weight  149
    5.4.4 Experimental Study  149
    5.4.5 Conclusions  160

6 Text Categorization  162
  6.1 Overview  162
    6.1.1 Preliminary Processing  163
  6.2 Similarity Metrics  164
    6.2.1 Distance Measure  165
    6.2.2 Similarity Measure  166
    6.2.3 Measure Comparison  168
  6.3 Categorization Methods  169
    6.3.1 Nearest-neighbor Classifier  169
    6.3.2 k Nearest Neighbor  169
    6.3.3 k Weighted Nearest Neighbor  170
    6.3.4 Semi-Supervised Learning  172
  6.4 Particle Swarm Optimization based Semi-Supervised Learning  172
    6.4.1 Particle Swarm Optimization based Semi-Supervised Learning  172
    6.4.2 Experimental Results and Analysis  173
    6.4.3 Nearest Neighbor  175
    6.4.4 Semi-Supervised Learning  177
    6.4.5 Particle Swarm Optimization based Semi-Supervised Learning  180
    6.4.6 Conclusions  181
  6.5 Particle Swarm Optimization based Nearest Neighbor  181
    6.5.1 k Value Optimization  182
    6.5.2 k Value and Labeled Examples Optimization  183
    6.5.3 Experimental Results and Analysis  184
    6.5.4 Conclusions  186

7 Conclusions  190
  7.1 Conclusions  190
    7.1.1 Particle Swarm Optimization  190
    7.1.2 Population Diversity  191
    7.1.3 Single / Multi-Objective Optimization  191
    7.1.4 Text Categorization  192
  7.2 Future Research  192

A Benchmark Functions  195
  A.1 Single Objective Optimization  195
    A.1.1 Unimodal Function  195
    A.1.2 Multimodal Function  196
    A.1.3 Unimodal Shifted Function  197
    A.1.4 Multimodal Shifted Function  198
  A.2 Multi-Objective Optimization  199
    A.2.1 The CEC 2009 Multiobjective Optimization Test Instances  200

Bibliography  228
Index  229
List of Figures

2.1 The mapping from solution space to objective space.  6
2.2 Topology structures in particle swarm optimization.  18
3.1 Different definitions of PSO population diversity.  30
3.2 PSO population diversity analysis.  32
3.3 Population diversity changing curves on PSO solving unimodal function f0.  39
3.4 Population diversity changing curves on PSO solving multimodal function f5.  41
3.5 Comparison of population diversities for PSO solving unimodal function f0.  42
3.6 Comparison of population diversities for PSO solving multimodal function f5.  43
3.7 Population diversity changing while PSO solving unimodal Parabolic function f0.  54
3.8 Population diversity changing while PSO solving unimodal Schwefel's P1.2 function f2.  55
3.9 Population diversity changing while PSO solving multimodal Rosenbrock function f5.  56
3.10 Population diversity changing while PSO solving multimodal Ackley function f8.  57
3.11 Comparison of different normalizations of PSO population diversity on unimodal Parabolic function f0.  60
3.12 Comparison of different normalizations of PSO population diversity on unimodal Parabolic function f0.  61
3.13 Comparison of different normalizations of PSO population diversity on unimodal Schwefel's P1.2 function f2.  62
3.14 Comparison of different normalizations of PSO population diversity on unimodal Schwefel's P1.2 function f2.  63
3.15 Comparison of different normalizations of PSO population diversity on multimodal Rosenbrock function f5.  64
3.16 Comparison of different normalizations of PSO population diversity on multimodal Rosenbrock function f5.  65
3.17 Comparison of different normalizations of PSO population diversity on multimodal Ackley function f8.  66
3.18 Comparison of different normalizations of PSO population diversity on multimodal Ackley function f8.  67
4.1 Position diversity changing curves for PSO solving parabolic function f0 with different strategies.  84
4.2 Position diversity changing curves for PSO solving multimodal function f5 with different strategies.  85
4.3 Comparison of PSO population diversities for solving unimodal function f0 with classic boundary constraints handling techniques.  86
4.4 Comparison of PSO population diversities for solving multimodal function f5 with classic boundary constraints handling techniques.  87
4.5 Comparison of PSO population diversities for solving unimodal function f0 with exceeding boundary constraints handling techniques.  88
4.6 Comparison of PSO population diversities for solving multimodal function f5 with exceeding boundary constraints handling techniques.  89
4.7 Comparison of PSO population diversities for solving unimodal function f0 with deterministic boundary constraints handling techniques.  90
4.8 Comparison of PSO population diversities for solving multimodal function f5 with deterministic boundary constraints handling techniques.  91
4.9 Comparison of PSO population diversities for solving unimodal function f0 with stochastic boundary constraints handling techniques that randomly reset particles in half of the search space.  92
4.10 Comparison of PSO population diversities for solving multimodal function f5 with stochastic boundary constraints handling techniques that randomly reset particles in half of the search space.  93
4.11 Comparison of PSO population diversities for solving unimodal function f0 with stochastic boundary constraints handling techniques that randomly reset particles in a small, close-to-boundary search space.  94
4.12 Comparison of PSO population diversities for solving multimodal function f5 with stochastic boundary constraints handling techniques that randomly reset particles in a small, close-to-boundary search space.  95
4.13 Comparison of PSO population diversities for solving unimodal function f0 with stochastic boundary constraints handling techniques that randomly reset particles in a linearly decreased search space.  96
4.14 Comparison of PSO population diversities for solving multimodal function f5 with stochastic boundary constraints handling techniques that randomly reset particles in a linearly decreased search space.  97
4.15 Examples of outperformance relations.  103
4.16 Drawbacks of the C measure.  104
4.17 The relative value of the S metric depends upon an arbitrary choice of reference point z_ref.  106
4.18 The comparison between the C measure and the D measure.  106
4.19 The relative value of the D metric depends upon an arbitrary choice of reference point z_ref.  107
4.20 The solutions of the particle swarm optimizer solving multiobjective UCP problems.  117
4.21 Population diversities observation on particle swarm optimizer with star structure solving single objective problems.  118
4.22 Population diversities observation on particle swarm optimizer with ring structure solving single objective problems.  119
4.23 Population diversity change observation on PSO with star structure solving single objective problems.  119
4.24 Population diversity change observation on PSO with ring structure solving single objective problems.  120
4.25 The ratio of position diversity to cognitive diversity on PSO solving single objective problems.  121
4.26 Population diversities observation on PSO solving multiobjective UCP problems.  122
5.1 PSO population diversity control based on current position and average of current velocities.  129
5.2 For each dimension, the exploitation space is divided into four equal parts.  136
5.3 Comparison of population diversities changing for PSO solving unimodal function f4.  144
5.4 Comparison of population diversities changing for PSO solving multimodal function f9.  145
5.5 Comparison of fitness value changing for variants of PSO.  153
5.6 Definition of population diversities changing for variants of PSO solving unimodal function f4.  154
5.7 Definition of population diversities changing for variants of PSO solving multimodal function f6.  155
5.8 Definition of population diversities changing for variants of PSO solving multimodal function f8.  156
5.9 Comparison of population diversities changing for variants of PSO solving unimodal function f4.  157
5.10 Comparison of population diversities changing for variants of PSO solving multimodal function f6.  158
5.11 Comparison of population diversities changing for variants of PSO solving multimodal function f8.  159
6.1 The process of knowledge discovery in databases (KDD).  163
6.2 The error rate of the nearest neighbor method with different k.  186
6.3 The performance of PSO utilized to optimize k in the NN method.  187
6.4 The performance of PSO utilized in the NN method.  188
A.1 The Pareto front of unconstrained (bound constrained) problems.  203
List of Tables

3.1 Parameters and criteria of different topologies.  33
3.2 Parameters and criteria for the test functions.  35
3.3 Results of the classic PSO and fully informed PSO with different topologies.  36
3.4 Results of the classic PSO and fully informed PSO with different topologies.  37
3.5 Results of each particle following a random neighbor in its neighborhood.  38
3.6 Results of PSO with global star and local ring structure for solving benchmark functions.  52
3.7 Some representative benchmark functions.  53
4.1 Results of the strategy that a particle "sticks in" the boundary when it exceeds the boundary constraints.  75
4.2 Results of the strategy that a particle "sticks in" the boundary when it exceeds the boundary constraints.  76
4.3 Results of the strategy that a particle stays at the boundary when it exceeds the boundary constraints.  77
4.4 Results of the strategy that a particle ignores the boundary when it exceeds the boundary constraints.  78
4.5 Results of PSO with a deterministic boundary constraint strategy.  79
4.6 Results of the strategy that a particle is randomly re-initialized within half of the search space when the particle meets the boundary constraints.  80
4.7 Results of the strategy that a particle is randomly re-initialized within a limited search space when the particle meets the boundary constraints.  81
4.8 Results of the strategy that particles are randomly re-initialized in a linearly decreased search space when particles meet the boundary constraints.  82
4.9 Results of PSO with global star and local ring structure for solving benchmark functions.  116
5.1 Representative results of PSO with diversity control based on random noise.  127
5.2 Representative results of PSO with diversity control based on average of current velocities.  128
5.3 Representative results of PSO with diversity control based on current position and average of current velocities.  130
5.4 Representative results of PSO with diversity control based on current position and average of current velocities.  131
5.5 Results of variants of PSO with star structure solving unimodal benchmark functions.  138
5.6 Results of variants of PSO with ring structure solving unimodal benchmark functions.  139
5.7 Results of variants of PSO with star structure solving multimodal benchmark functions.  140
5.8 Results of variants of PSO with ring structure solving multimodal benchmark functions.  141
5.9 Results of variants of PSO solving large scale benchmark functions.  142
5.10 Results of variants of PSO solving unimodal benchmark functions.  151
5.11 Results of variants of PSO solving multimodal benchmark functions.  152
6.1 The imbalanced test corpus contains 950 texts, separated unequally among the categories.  174
6.2 The categorization result of KNN, k = 1.  175
6.3 The categorization result of KNN, k = 3.  176
6.4 The categorization result of KWNN, k = 3.  176
6.5 The categorization result of KWNN, k = 7.  176
6.6 The categorization result of KWNN, k = 11.  177
6.7 The categorization result of self-training, k = 1.  177
6.8 The categorization result of self-training, k = 3.  177
6.9 The categorization result of self-training, k = 7.  178
6.10 The categorization result of self-training, k = 11.  178
6.11 The categorization result of rotated self-training, k = 1.  178
6.12 The categorization result of rotated self-training, k = 3.  179
6.13 The categorization result of rotated self-training, k = 7.  179
6.14 The categorization result of rotated self-training, k = 11.  179
6.15 The categorization result of PSO based SSL, k = 1.  180
6.16 The categorization result of PSO based SSL, k = 3.  180
6.17 The categorization result of PSO based SSL, k = 7.  180
6.18 The categorization result of PSO based SSL, k = 11.  181
6.19 The imbalanced test corpus contains 2816 texts, separated unequally among the categories.  184
6.20 The categorization error rate of the nearest neighbor method with different k.  185
6.21 The minimum error rate for each class and the categorization process.  187
6.22 The minimum error rate for the categorization.  187
List of Algorithms

1 The basic procedure of particle swarm optimization  16
2 Exploitation space dynamical reduction in particle swarm optimization  135
3 Adaptive inertia weight in particle swarm optimization  148
4 The k nearest neighbor categorization algorithm  170
5 The k weighted nearest neighbor categorization algorithm  171
6 The particle swarm optimization based semi-supervised learning  173
7 The k value optimized nearest neighbor categorization algorithm  183
8 The k value and labeled examples optimized nearest neighbor categorization algorithm  184
List of Abbreviations

ACO   Ant Colony Optimization
CI    Computational Intelligence
EA    Evolutionary Algorithm
EC    Evolutionary Computation
EMO   Evolutionary Multiobjective Optimization
EP    Evolutionary Programming
ES    Evolution Strategies
GA    Genetic Algorithm
KNN   k Nearest Neighbor
KWNN  k Weighted Nearest Neighbor
MOO   Multi-Objective Optimization
PSO   Particle Swarm Optimizer/Optimization
SI    Swarm Intelligence
SSL   Semi-Supervised Learning
TC    Text Categorization/Classification
Chapter 1

Introduction

1.1 Overview
Many real-world applications can be represented as optimization problems, for which algorithms are required to have the capability to search for optima. Most traditional methods can only be applied to continuous and differentiable functions [202]. Meta-heuristic algorithms have been proposed to solve those problems that traditional methods cannot solve, or at least find difficult to solve. Recently, a class of meta-heuristic algorithms termed swarm intelligence has been attracting more and more attention from researchers.

Swarm intelligence is a set of search and optimization techniques. To search a problem domain, a swarm intelligence algorithm processes a population. A population is a collection of individuals. Each individual represents a potential solution of the problem being optimized. Particle Swarm Optimization (PSO), which is one of the swarm intelligence techniques, was invented by Eberhart and Kennedy in 1995 [74, 133]. It is a population-based stochastic algorithm modeled on the social behaviors observed in flocking birds. Each particle, which represents a solution, flies through the search space with a velocity that is dynamically adjusted according to its own and its companions' historical behaviors. The particles tend to fly toward better and better search areas over the course of the search process [75, 111].

The most important factor affecting an optimization algorithm's performance may be its ability to explore and exploit [80, 223]. Exploration means the ability of a search algorithm to explore different areas of the search space in order to have a high probability of finding good promising solutions. Exploitation, on the other hand, means the ability to concentrate the search around a promising region in order to refine a candidate solution. A good optimization algorithm should optimally balance these two conflicting objectives.
Population diversity is a way to monitor the degree of convergence or divergence in the PSO search process [44, 251]. In other words, the particles' current distribution and moving tendency, whether the swarm is in a state of "flying" across a large search space or refining within a local area, can be obtained from this measurement.

Optimization, in general, concerns finding the "best available" solution(s) for a given problem within an allowable time, and the problem may have several or numerous optimum solutions, of which many are local optima. According to the number of objectives, optimization problems can be divided into single objective and multiobjective problems. Normally, the difficulty of a problem increases with the number of variables and objectives. In particular, problems with a large number of variables, e.g., more than a thousand variables, are termed large scale problems.

Text categorization, also termed text classification (TC), is the problem of finding the correct category (or categories) for documents, given a set of categories (subjects, topics) and a collection of text documents. Text categorization can be considered as a mapping f : D → C from the document space D onto the set of classes C. The objective of a classifier is to obtain accurate categorization results or a high confidence in its predictions.
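To make the update rule and the diversity measurement concrete, the following sketch implements the classical inertia-weight PSO update together with a simple dimension-wise position diversity monitor. This is a minimal illustration only, not the exact formulation developed in later chapters: the sphere benchmark, the parameter values (w = 0.72, c1 = c2 = 1.496), and the mean-absolute-deviation diversity measure are assumptions made for the example.

```python
import numpy as np

def sphere(x):
    """Illustrative unimodal benchmark: f(x) = sum(x_i^2)."""
    return np.sum(x ** 2, axis=-1)

def position_diversity(positions):
    """Mean absolute deviation of particle positions from the swarm
    centre, averaged over dimensions (one simple diversity measure)."""
    center = positions.mean(axis=0)
    return np.abs(positions - center).mean()

def pso(fitness, dim=10, n_particles=20, iters=100,
        w=0.72, c1=1.496, c2=1.496, bound=100.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-bound, bound, (n_particles, dim))  # positions
    v = np.zeros((n_particles, dim))                    # velocities
    pbest = x.copy()                                    # personal bests
    pbest_val = fitness(x)
    gbest = pbest[np.argmin(pbest_val)].copy()          # global best

    for t in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # classical update: inertia + cognitive part + social part
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        val = fitness(x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
        # shrinking position diversity signals convergence/refinement
        if t % 20 == 0:
            print(f"iter {t:3d}  best {pbest_val.min():.3e}  "
                  f"diversity {position_diversity(x):.3e}")
    return gbest

if __name__ == "__main__":
    pso(sphere)
```

Running the sketch on the sphere function shows the typical pattern discussed in this thesis: the best fitness decreases while the position diversity shrinks, i.e., the swarm moves from exploring the space to exploiting a small region.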
1.2 Objectives
Exploration and exploitation are the two cornerstones of search algorithms for solving optimization problems [80]. The balance between exploration and exploitation can be achieved through population diversity control. The main objectives of this thesis are to analyze the search process of particle swarm optimization based on population diversity, to propose new kinds of PSO variants to solve multimodal and large scale problems, and to apply PSO to text categorization problems.
1.3 Major Contributions
The major contributions of the thesis are the following:

1. Diversity is important in search algorithms. An algorithm's exploration and exploitation abilities can be adjusted through population diversity control. A unified definition of population diversity is proposed, and the differences between different population diversity definitions are compared.

2. Particles prematurely converge not only on local optima, but also on the boundary. Boundary constraints handling strategies are analysed and compared based on population diversity observations.

3. Different variants of particle swarm optimization algorithms have different search properties. This is partially due to their different search information propagation strategies. The search information propagation strategies among particles are analysed based on population diversity.

4. The balance between exploration and exploitation can be achieved through population diversity control. A population diversity control method is proposed to balance the algorithm's exploration and exploitation abilities.

5. To increase the possibility of particles "jumping out" of local optima, while keeping the ability of the algorithm to find "good enough" solutions, a variant of the particle swarm optimization algorithm, based on population diversity enhancement and dynamical search space reduction, is proposed to solve multimodal and large scale optimization problems.

6. Text mining attempts to discover useful information or knowledge from textual data. A text mining problem is modeled as an optimization problem in this thesis. A PSO based semi-supervised learning method and a PSO optimized nearest neighbor method are utilized to solve text categorization problems.
1.4 Outline of the Thesis
The rest of this thesis is organized as follows. Chapter 2 presents a brief literature review. Section 2.2 briefly introduces optimization. Section 2.4 reviews swarm intelligence. Section 2.5 reviews the basic particle swarm optimization algorithm and the fully informed particle swarm algorithm. Section 2.6 briefly reviews the Chinese text categorization problem.

Chapter 3 introduces the theoretical analysis of population diversity. Section 3.2 gives a definition of population diversity. Section 3.3 analyses the differences between, and the properties of, different population diversity definitions. Section 3.4 analyses the search information propagation among particles based on population diversity. Section 3.5 introduces the definitions and properties of normalized population diversities.

Chapter 4 introduces the observation of population diversity in PSO. Section 4.2 introduces the observation of population diversity under different boundary constraints handling strategies in PSO, and compares the performance of different boundary constraints handling strategies in PSO variants. Section 4.3 introduces population diversity in PSO solving multiobjective problems.

Chapter 5 introduces population diversity control in PSO. Section 5.2 introduces a diversity control method. Section 5.3 introduces dynamical exploitation space reduction for solving large scale problems. Section 5.4 introduces population diversity based inertia weight adaptation in PSO.
Chapter 6 introduces the application of PSO to the Chinese text categorization problem. Section 6.2 reviews similarity and distance metrics for vectors. Section 6.3 reviews nearest neighbor based categorization methods. Section 6.4 introduces a PSO based semi-supervised learning method for Chinese text categorization problems. Section 6.5 introduces a PSO based nearest neighbor algorithm for Chinese text categorization. Finally, Chapter 7 gives the conclusions and future research. Appendix A describes the benchmark functions utilized in this thesis, which include single objective and multiobjective optimization functions, respectively.
Chapter 2

Literature Review

2.1 Overview
This chapter introduces the background information on optimization and text categorization, covering:

• The concept of optimization, which includes single objective optimization and multi-objective optimization.
• The evolutionary computation and swarm intelligence, especially particle swarm optimization.
• The processes of Chinese text categorization.
2.2 Optimization
Optimization concerns finding the "best available" solution(s) for a given problem within an allowable time. Optimization problems can be simply divided into unimodal problems and multimodal problems. As indicated by the name, a unimodal problem has only one optimum solution; on the contrary, a multimodal problem has several or numerous optimum solutions, of which many are local optima. For evolutionary optimization algorithms, or simply evolutionary algorithms (EAs), it is generally difficult to find the global optimum solutions of multimodal problems due to the possible occurrence of premature convergence [59, 100, 165].

In mathematical terms, an optimization problem in R^n, or simply an optimization problem, is a mapping f : R^n → R^k, where R^n is termed the decision space [3] (or parameter space [121], problem space [229]), and R^k is termed the objective space [216].
Optimization problems can also be divided into two categories by the value of k: when k = 1, the problem is called Single Objective Optimization (SOO), and when k > 1, it is called Multi-Objective Optimization (or Many Objective Optimization, MOO) [4, 45, 188]. (Note that the terms "multi-objective" and "multiobjective" are used interchangeably throughout this thesis.) The evaluation function in optimization, f(x), maps decision variables to objective vectors. Each solution in the decision space is associated with a point in the objective space. This situation is represented in Figure 2.1 for the case n = 3 and k = 2.
[Figure 2.1: The mapping from solution space to objective space. Panel (a) shows the solution space Ω = {x ∈ R^n} with axes x1, x2, x3; panel (b) shows the objective space Λ = {y ∈ R^k} with axes f1, f2.]

One of the main differences between single objective and multiobjective optimization is that multiobjective problems constitute a multidimensional objective space R^k. In addition, a set of solutions representing the trade-off among the different objectives,
rather than a unique optimal solution, is sought in multiobjective optimization. This set of solutions is also known as the Pareto optimal set, and these solutions are also termed "noninferior," "admissible," or "efficient" solutions. The corresponding objective vectors of these solutions are termed "nondominated", and each objective component of any nondominated solution in the Pareto optimal set can only be improved by degrading at least one of its other objective components.
2.2.1 Single Objective Optimization
Single objective problems concern finding the maximum or minimum of a fitness function. Traditional single objective problems are well studied, but there are three kinds of issues that still need to be addressed.

Large Scale Problems

Many optimization methods suffer from the "curse of dimensionality", which implies that their performance deteriorates quickly as the dimension of the search space increases [11, 33, 66, 105, 145]. There are several reasons for this phenomenon. First, the solution space of a problem often increases exponentially with the problem dimension, and more efficient search strategies are required to explore all promising
regions within a given time budget. Evolutionary computation and swarm intelligence are based on the interaction of a group of solutions, and the promising regions or the landscape of a problem are very difficult to reveal from small solution samples (compared with the number of all feasible solutions). The "empty space phenomenon" gives an example of problems becoming hard when the dimension increases [145, 196, 224]. The number of possible solutions increases exponentially with the dimension: there are m^n possible solutions in total for a problem with m possible solutions in each dimension (assuming that each dimension has the same number of possible solutions). For example, when m equals 1000, 100 samples cover 10% of the solutions of a one-dimensional problem, but the same 100 samples cover only 0.01% of the solutions of a two-dimensional problem. For continuous problems, even taking computational accuracy into account, the number of possible solutions in one dimension is larger than 1000, so the percentage of sampled points decreases rapidly. The search performance of most algorithms is based on previous search experience; given limited computational resources, the percentage of data points that have been retrieved approaches zero as the dimension grows large. The performance of algorithms therefore deteriorates as the problem dimension increases.
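The sample-coverage arithmetic above can be reproduced directly; the tiny sketch below (purely illustrative) computes the fraction of an m^n grid covered by a fixed budget of samples, showing how coverage collapses as the dimension grows.

```python
# Fraction of an m^n discrete search space covered by a fixed sample budget.
m, samples = 1000, 100
for n in range(1, 5):
    coverage = samples / m ** n          # assumes all samples are distinct
    print(f"n = {n}: coverage = {coverage:.2e} ({coverage:.4%})")
# n = 1: 1.00e-01 (10.0000%);  n = 2: 1.00e-04 (0.0100%);  and so on.
```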
Second, the characteristics of a problem may change with the scale. Problems become more difficult and complex when the dimension increases. Rosenbrock's function (see A.1), for instance, is unimodal as a two-dimensional problem but becomes multimodal in higher dimensions [199]. Because of such a worsening of the features of an optimization problem resulting from an increase in scale, a previously successful search strategy may no longer be capable of finding an optimal solution.

Third, the direction of "good" solutions is difficult to determine. Swarm intelligence takes an "update on each dimension, evaluate on whole dimensions" strategy. It is very difficult for an algorithm to determine which of two solutions is better when both have some good parts and their fitness values are equally bad. A similar scenario also happens in multiobjective optimization: in the Pareto domination measurement, nearly all solutions are Pareto non-dominated when the number of objectives is larger than 10 [115].

Last, the bias is accumulated. In swarm intelligence, each solution is updated dimension by dimension, and the fitness value is calculated for the whole solution. The solution update depends on the combination of several vectors, i.e., the current value, the difference between the current value and the previous best value, the difference between the current value and the neighborhood best value, or the difference between two random solutions, etc. In a low dimensional space, the direction of the vector combination has a high probability of pointing to the global optimum. However, the distance metrics utilized in low dimensional spaces are not effective in high dimensional spaces [5, 228], and the search direction moves far away from the global optimum due to bias accumulation.

Many effective strategies have been proposed for high dimensional optimization problems, such as problem decomposition and subcomponent cooperation [248], parameter adaptation [245], and surrogate-based fitness evaluations [118, 121]. Based on swarm intelligence, an effective method could find good solutions for large scale problems, in terms of both time complexity and result accuracy.

Dynamic Problems

Evolutionary computations (ECs) have been widely applied to solve stationary optimization problems. However, many real-world applications are actually dynamic. Dynamic problems, sometimes termed non-stationary environments [172] or uncertain environments [119], change dynamically over time. The two most common dynamics studied are a slowly drifting motion and an oscillatory motion. Over the years, a number of dynamic test problems have been utilized to compare the performance of EAs in dynamic environments, e.g., the "moving peaks" benchmark (MPB) proposed by Branke [20], the test function generator DF1 for non-stationary problems [172], and the single and multi-objective dynamic test problem generators built by dynamically combining different objective functions of existing stationary multi-objective benchmark problems [120, 147, 148].

Evolutionary computations often have to solve optimization problems in the presence of a wide range of uncertainties. Generally, uncertainties in evolutionary computation can be divided into the following four categories. First, the fitness function is noisy. Second, the design variables and/or the environmental parameters may change after optimization, and the quality of the obtained optimal solution should be robust against environmental changes or deviations from the optimal point. Third, the fitness function is approximated, which means that the fitness function suffers from approximation errors. Fourth, the optimum of the problem to be solved changes over time, and thus the optimizer should be able to track the optimum continuously. In all these cases, additional measures must be taken so that evolutionary algorithms are still able to solve dynamic problems satisfactorily [24, 119, 241].

Constrained Problems

Most optimization problems have constraints of different types (e.g., physical, time, geometric, etc.) which modify the shape of the search space. During the last few years, a wide variety of meta-heuristics have been designed and applied to solve constrained optimization problems. Evolutionary algorithms and most other meta-heuristics, when used for optimization, naturally operate as unconstrained search techniques. Therefore, they require an additional mechanism to incorporate constraints into their fitness function. Constrained optimization handles problems that have equality and/or inequality constraints [95, 96, 153, 160, 169]. A general constrained problem can be defined as

\[
\begin{aligned}
\text{minimize} \quad & f(x), \quad x = (x_1, x_2, \cdots, x_n) \\
\text{subject to} \quad & g_i(x) \le 0, \quad i = 1, \cdots, p \\
& h_j(x) = 0, \quad j = 1, \cdots, q
\end{aligned}
\]

where x is the vector of solutions, the g_i(x) are inequality constraints, p is the number of inequality constraints, the h_j(x) are equality constraints, and q is the number of equality constraints (in both cases, the constraints can be linear or nonlinear). Equality constraints are more difficult to handle than inequality constraints; therefore, in most cases, equality constraints are transformed into inequality constraints:

\[
|h_j(x)| - \epsilon \le 0, \quad j = 1, \cdots, q
\]

where \(\epsilon\) is a threshold set to a tiny value.

Traditionally, in evolutionary algorithms and in mathematical programming, the most common approach to handling constraints is penalty functions. In general, penalty functions have several limitations. Firstly, penalty functions do not easily handle the boundary between the feasible and infeasible regions. They are not a good choice when the optimum lies on the boundary between the feasible and the infeasible regions, or when the feasible region is disjoint. Secondly, the parameters are difficult to set: penalty functions require careful fine-tuning to determine the most appropriate penalty factors [153].
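As a concrete example of the penalty-function approach discussed above, the sketch below converts a constrained problem into an unconstrained fitness by adding weighted constraint violations, including the |h_j(x)| − ε relaxation of equality constraints. The penalty factor and threshold values are arbitrary illustrative choices, and choosing the penalty factor well is exactly the fine-tuning difficulty noted above.

```python
import numpy as np

EPS = 1e-4      # threshold for relaxing equality constraints (assumed value)
PENALTY = 1e6   # penalty factor; choosing it well is the hard part

def penalized_fitness(f, g_list, h_list, x):
    """Unconstrained surrogate: f(x) plus penalties for violated constraints.

    g_list: inequality constraints g_i(x) <= 0
    h_list: equality constraints h_j(x) = 0, relaxed to |h_j(x)| - EPS <= 0
    """
    violation = sum(max(0.0, g(x)) for g in g_list)
    violation += sum(max(0.0, abs(h(x)) - EPS) for h in h_list)
    return f(x) + PENALTY * violation

# Toy usage: minimize x0^2 + x1^2 subject to x0 + x1 - 1 = 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
h = lambda x: x[0] + x[1] - 1.0
x = np.array([0.5, 0.5])
print(penalized_fitness(f, [], [h], x))   # feasible point: no penalty added
```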
2.2.2 Multiobjective Optimization
Multiobjective optimization refers to optimization problems that involve two or more objectives, for which a set of solutions is sought instead of one [45]. A general multiobjective optimization problem can be described as a vector function f that maps a tuple of n parameters (decision variables) to a tuple of k objectives. Without loss of generality, minimization is assumed throughout this thesis.

\[
\begin{aligned}
\text{minimize} \quad & f(x) = (f_1(x), f_2(x), \cdots, f_k(x)) \\
\text{subject to} \quad & x = (x_1, x_2, \cdots, x_n) \in X \\
& y = (y_1, y_2, \cdots, y_k) \in Y
\end{aligned}
\]

where x is called the decision vector, X is the decision space, y is the objective vector, Y is the objective space, and f : X → Y consists of k real-valued objective functions.

Let u = (u_1, · · · , u_k) and v = (v_1, · · · , v_k) ∈ Y be two vectors; u is said to dominate v (denoted as u ≺ v) if u_i ≤ v_i for all i = 1, · · · , k and u ≠ v. A point x* ∈ X is called Pareto optimal if there is no x ∈ X such that f(x) dominates f(x*). The set of all the Pareto optimal points is called the Pareto set (denoted as PS). The set of all the Pareto optimal objective vectors, PF = {f(x) | x ∈ PS}, is called the Pareto front (denoted as PF).
Unlike single objective optimization, multiobjective problems have many, or even infinitely many, optimal solutions [19]. The optimization goal of an MOP consists of three objectives:

1. The distance of the resulting nondominated solutions to the true optimal Pareto front should be minimized;

2. A good (in most cases uniform) distribution of the obtained solutions is desirable;

3. The spread of the obtained nondominated solutions should be maximized, i.e., for each objective a wide range of values should be covered by the nondominated solutions.

In a multiobjective optimization problem, we aim to find the set of optimal trade-off solutions known as the Pareto optimal set. Pareto optimality is defined with respect to the concept of nondominated points in the objective space.
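The dominance relation defined above maps directly into code. The short sketch below (an illustration for minimization problems, not a formulation taken from this thesis) tests whether one objective vector dominates another and filters a set of objective vectors down to its nondominated subset, i.e., an approximation of the Pareto front.

```python
def dominates(u, v):
    """True if u dominates v (minimization): u <= v componentwise, u != v."""
    return all(a <= b for a, b in zip(u, v)) and \
           any(a < b for a, b in zip(u, v))

def nondominated(points):
    """Return the subset of objective vectors not dominated by any other."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(nondominated(pts))   # (3.0, 3.0) is dominated by (2.0, 2.0)
```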
2.3 Evolutionary Computation
Many real-world applications can be represented as optimization problems, for which algorithms are required to have the capability to search for optima. Originally, these optimization problems were mathematically represented by continuous and differentiable functions, so that algorithms such as hill-climbing algorithms could be designed and/or utilized to solve them. Traditionally, these hill-climbing-like algorithms are single-point based algorithms, such as gradient descent algorithms, which move from the current point along the direction pointed to by the negative of the gradient of the function at the current point. These hill-climbing algorithms can find solutions quickly for unimodal problems, but they are sensitive to the initial search point and are easily trapped in local optima for nonlinear multimodal problems. Furthermore, these mathematical functions need to be continuous and differentiable, which greatly narrows the range of real-world problems that can be solved by hill-climbing algorithms.

More recently, evolutionary algorithms have been designed and utilized to solve optimization problems. Different from traditional single-point based algorithms such as hill-climbing algorithms, each evolutionary algorithm is a population-based algorithm, which consists of a set of points (a population of individuals). The population of individuals is expected to have a high tendency to move towards better and better solution areas iteration after iteration through cooperation and/or competition among themselves. There are a lot of evolutionary algorithms in the literature. The most popular evolutionary algorithms are evolutionary programming [92, 93], genetic algorithms [54, 104, 106, 108, 158, 170, 210, 219], evolution strategies [13, 191], and genetic programming [141], which were inspired by biological evolution.

In evolutionary algorithms, a population of individuals survives into the next iteration. The probability that an individual survives is proportional to its fitness value according to some evaluation function. The surviving individuals are then updated by utilizing evolutionary operators such as the crossover operator and the mutation operator. In evolutionary programming and evolution strategies, only the mutation operation is employed, while in genetic algorithms and genetic programming, both the mutation operation and the crossover operation are employed. The optimization problems to be optimized by evolutionary algorithms do not need to be mathematically represented as continuous and differentiable functions; they can be represented in any form. The only requirement is that each individual can be evaluated as a value called the fitness value. Therefore, evolutionary algorithms can be applied to solve more general optimization problems, especially those that are very difficult, if not impossible, for traditional hill-climbing algorithms to solve.

Swarm intelligence [18, 134], or evolutionary computation [91, 240, 250], is a search and optimization technique. To search a problem domain, a swarm intelligence algorithm processes a population. A population is a collection of individuals. Each individual represents a potential solution of the problem being optimized.
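As an illustration of the operators just described, the following sketch performs one generation of a simple genetic algorithm with fitness-proportional selection, one-point crossover, and bit-flip mutation on bit-string individuals. All parameter values and the bit-counting fitness are arbitrary choices for the example, not a formulation taken from this thesis.

```python
import random

def ga_generation(pop, fitness, pc=0.9, pm=0.01):
    """One GA iteration: fitness-proportional selection, one-point
    crossover, and bit-flip mutation (maximization)."""
    # small offset so selection still works if all fitness values are zero
    weights = [fitness(ind) + 1e-9 for ind in pop]
    new_pop = []
    while len(new_pop) < len(pop):
        p1, p2 = random.choices(pop, weights=weights, k=2)   # selection
        c1, c2 = p1[:], p2[:]
        if random.random() < pc:                             # crossover
            cut = random.randrange(1, len(p1))
            c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        for child in (c1, c2):                               # mutation
            new_pop.append([b ^ 1 if random.random() < pm else b
                            for b in child])
    return new_pop[:len(pop)]

# Toy usage: maximize the number of ones in a 20-bit string.
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(50):
    pop = ga_generation(pop, fitness=sum)
print(max(sum(ind) for ind in pop))
```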
2.4 Swarm Intelligence
Swarm intelligence (SI), which works with a population of individuals, is a collection of nature-inspired search techniques [78, 83, 182]. Swarm intelligence, which belongs to the computational intelligence (CI) discipline, concerns the design of intelligent multi-agent systems that take inspiration from the collective behavior of social insects such as ants, termites, bees, and wasps, as well as from other animal societies such as flocks of birds or schools of fish. Colonies of social insects have fascinated researchers for many years, and the mechanisms that govern their behaviors remained unknown for a long time. Even though a single member of these colonies is a non-sophisticated individual, together they are able to achieve complex tasks through cooperation. Coordinated colony behavior emerges from relatively simple actions or interactions between the colonies' individual members. Many aspects of the collective activities of social insects are self-organized and work without central control. Recently, swarm intelligence algorithms have been attracting more and more attention from researchers. Swarm intelligence algorithms are usually nature-inspired optimization algorithms, as opposed to evolution-inspired optimization algorithms such as evolutionary algorithms. Similar to evolutionary algorithms, a swarm intelligence algorithm is also a population-based optimization algorithm. Different from evolutionary algorithms, each individual in a swarm intelligence algorithm represents a simple object
such as an ant, bird, or fish. So far, a lot of swarm intelligence algorithms have been proposed and studied. Among them are particle swarm optimization [74, 76, 130, 131, 133, 203, 204], the ant colony optimization algorithm (ACO) [68–70], the bacterial foraging optimization algorithm (BFO) [184–186], the cuckoo search algorithm (CS) [246], the firefly optimization algorithm (FFO) [242–245], bee swarm intelligence [125] (which includes the bee colony optimization algorithm (BCO) [175, 176] and the artificial bee colony algorithm (ABC) [124, 126–128]), artificial immune systems (AIS) [58, 137], the fish school search optimization algorithm (FSO) [9, 10], the shuffled frog-leaping algorithm (SFL) [84, 85], the intelligent water drops algorithm (IWD) [197], and the brain storm optimization algorithm (BSO) [201, 202], to name just a few.

In a swarm intelligence algorithm, an individual represents a simple object such as a bird in PSO, an ant in ACO, a bacterium in BFO, etc. These simple objects cooperate and compete among themselves so as to have a high tendency to move toward better and better search areas. As a consequence, it is the collective behavior of all individuals that makes a swarm intelligence algorithm effective in problem optimization. For example, in PSO, each particle (individual) is associated with a velocity. The velocity of each particle is dynamically updated according to its own historical best performance and its companions' historical best performance. All the particles in the PSO population fly through the solution space in the hope that particles will fly towards better and better search areas with high probability.

Mathematically, the updating of the population of individuals over iterations can be viewed as a mapping from one population of individuals to another from one iteration to the next, which can be represented as P_{t+1} = f(P_t), where P_t is the population of individuals at iteration t and f() is the mapping function. In general, the expected fitness of the returned solution should improve as the search method is given more computational resources in time and/or space. More desirable still, in any single run, the quality of the solution returned by the method should improve monotonically; that is, the quality of the solution at time t + 1 should be no worse than the quality at time t, i.e., fitness(t + 1) ≤ fitness(t) for minimization problems [89].
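The mapping view P_{t+1} = f(P_t) and the monotonicity requirement can be written generically: whatever the population update f is, archiving the best-so-far solution guarantees that the quality of the returned solution never degrades. A minimal sketch, with an assumed random-perturbation update standing in for a real swarm intelligence algorithm:

```python
import random

def optimize(update, evaluate, population, iters):
    """Iterate a population mapping P_{t+1} = f(P_t), archiving the
    best-so-far so the returned quality is monotonically non-increasing
    for minimization problems."""
    best = min(population, key=evaluate)
    for _ in range(iters):
        population = update(population)           # P_{t+1} = f(P_t)
        candidate = min(population, key=evaluate)
        if evaluate(candidate) < evaluate(best):  # archive keeps monotonicity
            best = candidate
    return best

# Toy usage: Gaussian perturbation as the mapping, minimizing |x|.
pop = [random.uniform(-10.0, 10.0) for _ in range(20)]
step = lambda P: [x + random.gauss(0.0, 0.5) for x in P]
print(optimize(step, evaluate=abs, population=pop, iters=100))
```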
2.5 Particle Swarm Optimizer/Optimization
Particle swarm optimizer/optimization is one of the swarm intelligence techniques [74, 133]. It is a population-based stochastic algorithm modeled on the social behaviors observed in flocking birds. Each particle, which represents a solution, flies through the search space with a velocity that is dynamically adjusted according to its own and its companions' historical behaviors. The particles tend to fly toward better and better search areas over the course of the search process [75, 76, 111].
There have been many analyses of the search process of PSO, including analyses of particle trajectories [221, 222], evolving problem landscapes [143], information exchange in multiple cooperating swarms [82], and search information propagation [36]. The dynamic update rule of particle swarm optimization is formulated as a second-order stochastic difference equation in [15], and general relations are derived for search focus, search spread, and swarm stability at stagnation.

Many variants of PSO have been proposed to solve different kinds of problems. Ratnaweera et al. proposed a self-organizing hierarchical PSO with linearly time-varying acceleration coefficients (HPSO-TVAC) [190]. Liang and Suganthan proposed a dynamic multi-swarm particle swarm optimizer (DMS-PSO), which is characterized by a dynamically changing neighborhood topology [154]. A two-stage particle swarm optimizer was proposed by Zhuang et al. [263]. Zhan et al. presented an adaptive particle swarm optimization which enables the automatic control of the inertia weight, the acceleration coefficients, and other algorithmic parameters according to evolutionary states [251]. Nickabadi et al. proposed a novel adaptive inertia weight approach which uses the success rate of the swarm as its feedback parameter to ascertain the particles' situation [177].

Another active research trend in PSO aims to design different neighborhood topologies. A PSO with the Von Neumann topological structure was proposed by Kennedy and Mendes to enhance the performance in solving multimodal problems [135]. Peram et al. developed the fitness-distance-ratio-based PSO (FDR-PSO) with near neighbor interactions [187]. Mendes and Kennedy introduced the fully informed particle swarm (FIPS), in which the information of the entire neighborhood is utilized to update the velocity [166, 168].

Combining PSO with other auxiliary search techniques is an effective way to improve its performance. Genetic operators such as selection, crossover, and mutation have been hybridized with PSO to keep the best particles. Differential evolution [55], ant colony optimization, and conventional local search techniques have also been combined with PSO. In addition, to avoid revisiting previously detected solutions, techniques such as deflection, stretching, repulsion, self-organizing, and dissipative methods have been used in hybrid PSO algorithms. Some biology-inspired operators, such as niching and speciation techniques, have been introduced into PSO to prevent the swarm from crowding too closely and to locate as many optimal solutions as possible. A cellular particle swarm optimization, in which a cellular automata mechanism is integrated into the velocity update to modify the trajectories of particles, was proposed in [200].

Designing a new learning strategy may be the most promising way to improve the performance of PSO. Liang et al. proposed the comprehensive learning particle swarm optimizer (CLPSO), in which particles are allowed to use different particles' personal best
positions to update their flight on different dimensions [152]. An integrated learning particle swarm optimizer was proposed in [193]; this algorithm finds diverged particles and accelerates them towards the optimal solution, updating the velocity according to a hyper-spherical coordinate system [192]. Wang et al. proposed a self-adaptive learning based particle swarm optimization which simultaneously adopts four PSO based search strategies [225]. The probability of selecting each search strategy is determined according to the strategy's ability to generate better quality solutions in the past iterations. Inspired by the social phenomenon that multiple good examples can guide a crowd towards making progress, Huang et al. developed an example-based learning particle swarm optimization [112]. Zhan et al. proposed an orthogonal learning strategy for PSO to utilize more of the useful information previously discovered, via orthogonal experimental design [252]; this strategy can guide particles to fly in better directions by constructing a promising and efficient exemplar. Li proposed a cooperative coevolving particle swarm optimization (CCPSO) algorithm in an attempt to address the issue of scaling up particle swarm optimization algorithms to solve large-scale optimization problems (up to 2000 real-valued variables) [150]. The particle swarm optimizer has been utilized to solve many kinds of problems, such as the facility location problem [227], portfolio optimization [225], university examination timetabling problems [87], and urban traffic signal control [259], just to name a few.

Particle swarm optimization is a population-based stochastic algorithm modeled on social behaviors observed in flocking birds [74, 133]. A particle flies through the search space with a velocity that is dynamically adjusted according to its own and its companions' historical behaviors, and each particle's position represents a solution to the problem. Particles tend to fly toward better and better search areas over the course of the search process [36, 75, 111]. Particle swarm optimization emulates swarm behavior, and the individuals represent points in the n-dimensional search space. A particle represents a potential solution. Each particle is associated with two vectors, i.e., the velocity vector and the position vector. For the purpose of generality and clarity, $m$ represents the number of particles and $n$ the number of dimensions. The position of a particle is represented as $x_{ij}$, where $i$ denotes the $i$th particle, $i = 1, \cdots, m$, and $j$ the $j$th dimension, $j = 1, \cdots, n$. The velocity of a particle is represented as $v_{ij}$.
2.5.1 Original Particle Swarm

Each particle represents a potential solution in particle swarm optimization, and this solution is a point in the n-dimensional solution space. The original PSO algorithm is simple in concept and easy to implement [74, 133]. The velocity $v_{ij}$ and position $x_{ij}$ of the $j$th dimension of the $i$th particle are updated as follows [76, 134]:
$$v_{ij} = v_{ij} + c_1 \mathrm{rand}()(p_{ij} - x_{ij}) + c_2 \mathrm{Rand}()(p_{nj} - x_{ij}) \tag{2.1}$$
$$x_{ij} = x_{ij} + v_{ij} \tag{2.2}$$
where $c_1$ and $c_2$ are positive constants, and rand() and Rand() are two random functions generating uniformly distributed random numbers in the range [0, 1), drawn independently for each dimension and each particle; $p_{ij}$ is the $j$th dimension of the $i$th particle's personal best position, and $p_{nj}$ is the $j$th dimension of the best position found so far in the $i$th particle's neighborhood.
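As a concrete illustration of equations (2.1) and (2.2), the following minimal Python sketch (not from the thesis) applies one update to a single particle; the function name and the choice c1 = c2 = 2.0, a common setting in the early PSO literature, are illustrative assumptions.

```python
import random

def update_particle(v, x, p, p_n, c1=2.0, c2=2.0):
    """One application of equations (2.1) and (2.2) to a single particle.

    v, x, p, and p_n are lists of length n holding the particle's velocity,
    position, personal best, and neighborhood best. The thesis only requires
    c1 and c2 to be positive constants; 2.0 is an assumed common value.
    """
    for j in range(len(x)):
        # Equation (2.1): rand() and Rand() are drawn anew per dimension.
        v[j] = (v[j]
                + c1 * random.random() * (p[j] - x[j])
                + c2 * random.random() * (p_n[j] - x[j]))
        x[j] = x[j] + v[j]   # equation (2.2)
    return v, x
```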
2.5.2 Classical Particle Swarm
In the original particle swarm optimizer, the velocity is difficult to control during the search process, and the final solution is heavily dependent on the initial seeds (population). Different problems require different balances between the local search ability and the global search ability. Shi and Eberhart introduced a new parameter, an inertia weight $w$, to balance exploration and exploitation [203, 204]. The inertia weight $w$ is added to equation (2.1); it can be a constant, a value decreasing linearly over time [205], or a fuzzy value [206, 209]. The new velocity update equation is as follows:
$$v_{ij} = w v_{ij} + c_1 \mathrm{rand}()(p_{ij} - x_{ij}) + c_2 \mathrm{Rand}()(p_{nj} - x_{ij}) \tag{2.3}$$
Adding an inertia weight $w$ increases the probability that the algorithm converges to better solutions and provides a way to control the whole search process. Generally speaking, an algorithm should have more exploration and less exploitation ability at first, which gives a high probability of finding more local optima; exploration should then be decreased and exploitation increased to refine candidate solutions over time. Accordingly, the inertia weight $w$ should be linearly decreased, or even dynamically determined by a fuzzy system. The equations of the classical particle swarm optimization algorithm can be rewritten in vector form. The velocity and position update equations are as follows [76, 134, 203]:
$$\mathbf{v}_i \leftarrow w \mathbf{v}_i + c_1 \mathrm{rand}()(\mathbf{p}_i - \mathbf{x}_i) + c_2 \mathrm{Rand}()(\mathbf{p}_n - \mathbf{x}_i) \tag{2.4}$$
$$\mathbf{x}_i \leftarrow \mathbf{x}_i + \mathbf{v}_i \tag{2.5}$$
where $w$ denotes the inertia weight and usually is less than 1 [35, 204, 206], $c_1$ and $c_2$ are two positive acceleration constants, rand() and Rand() are two random functions generating uniformly distributed random numbers in the range [0, 1), drawn independently for each dimension and each particle, $\mathbf{x}_i$ represents the $i$th particle's position, $\mathbf{v}_i$ represents the $i$th particle's velocity, $\mathbf{p}_i$ is termed the personal best, which refers to the best position found by the $i$th particle, and $\mathbf{p}_n$ is termed the neighborhood (or local) best, which refers to the position with the best fitness evaluation value found so far by the members of the $i$th particle's neighborhood.
The inertia weight $w$ can be different for different particles and on different dimensions, in which case it can be written as $w_i$. Taking the iteration number $t$ into account, the equations are rewritten as:
$$\mathbf{v}_i(t+1) \leftarrow w_i \mathbf{v}_i(t) + c_1 \mathrm{rand}()(\mathbf{p}_i - \mathbf{x}_i(t)) + c_2 \mathrm{Rand}()(\mathbf{p}_n - \mathbf{x}_i(t))$$
$$\mathbf{x}_i(t+1) \leftarrow \mathbf{x}_i(t) + \mathbf{v}_i(t+1)$$

Random variables are frequently utilized in swarm optimization algorithms, and the length of the search step is not determined beforehand. This approach belongs to an interesting class of algorithms known as randomized algorithms. A randomized algorithm does not guarantee an exact result, but instead provides a high-probability guarantee that it will return the correct answer or one close to it. The result(s) of optimization may differ in each run, but the algorithm has a high probability of finding a "good enough" solution.

Algorithm 1: The basic procedure of particle swarm optimization
1. Initialization: initialize the velocity and position randomly for each particle in every dimension;
2. while the "good enough" solution has not been found and the maximum number of iterations has not been reached do
3.   Calculate each particle's fitness value;
4.   Compare the fitness value of the current position with that of the best position in history (personal best, termed pbest); for each particle, if the fitness value of the current position is better than that of pbest, update pbest to the current position;
5.   Select the particle that has the best fitness value in the current particle's neighborhood; this particle is called the neighborhood best (termed nbest). If the current particle's neighborhood includes all particles, this neighborhood best is the global best (termed gbest); otherwise, it is the local best (termed lbest);
6.   for each particle do
7.     Update the particle's velocity and position according to equations (2.4) and (2.5), respectively;
The basic procedure of PSO is shown in Algorithm 1. A particle updates its velocity according to equation (2.4) and its position according to equation (2.5). The $c_1 \mathrm{rand}()(\mathbf{p}_i - \mathbf{x}_i)$ part can be seen as cognitive behavior, while the $c_2 \mathrm{Rand}()(\mathbf{p}_n - \mathbf{x}_i)$ part can be seen as social behavior.
In particle swarm optimization, a particle not only learns from its own experience, it also learns from its companions. This indicates that a particle's movement is determined by both its own experience and its neighbors' experience [31].
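As an illustration of Algorithm 1 and equations (2.4) and (2.5), here is a minimal, self-contained Python sketch of the classical PSO with the star (gbest) topology; the function names, the bounds, and the default parameter values (taken from the standard settings quoted later in Section 3.4.1) are assumptions for the example, not thesis code.

```python
import random

def pso(fitness, n, m=32, w=0.721, c1=1.193, c2=1.193,
        x_min=-100.0, x_max=100.0, max_iter=1000):
    """Minimal gbest PSO following Algorithm 1 and equations (2.4)-(2.5).

    `fitness` maps a position (list of n floats) to a value to minimize.
    The star topology is used here for brevity, so nbest equals gbest.
    """
    # Initialization: random positions and velocities in every dimension.
    x = [[random.uniform(x_min, x_max) for _ in range(n)] for _ in range(m)]
    v = [[random.uniform(-(x_max - x_min), x_max - x_min) * 0.1
          for _ in range(n)] for _ in range(m)]
    pbest = [xi[:] for xi in x]
    pbest_val = [fitness(xi) for xi in x]

    for _ in range(max_iter):
        # gbest: the neighborhood best when the neighborhood is the swarm.
        g = min(range(m), key=lambda i: pbest_val[i])
        for i in range(m):
            for j in range(n):
                # Equation (2.4): inertia, cognitive part, social part.
                v[i][j] = (w * v[i][j]
                           + c1 * random.random() * (pbest[i][j] - x[i][j])
                           + c2 * random.random() * (pbest[g][j] - x[i][j]))
                # Equation (2.5)
                x[i][j] += v[i][j]
            val = fitness(x[i])
            if val < pbest_val[i]:          # update personal best
                pbest_val[i], pbest[i] = val, x[i][:]
    g = min(range(m), key=lambda i: pbest_val[i])
    return pbest[g], pbest_val[g]

# Example: minimize the Sphere function in 10 dimensions.
best_x, best_f = pso(lambda x: sum(xj * xj for xj in x), n=10)
```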
2.5.3
Fully Informed Particle Swarm
Fully informed PSO (FIPS) does not use the concept of a "global/local best". A particle in FIPS does not follow a single leader in its neighborhood, but follows all the other particles in its neighborhood. The basic equations of the FIPS algorithm are as follows [136, 168]:
$$\mathbf{v}_i \leftarrow \chi \left( \mathbf{v}_i + \frac{\sum_{k=1}^{N_i} U(0, \varphi)(\mathbf{p}_{nbr(k)} - \mathbf{x}_i)}{N_i} \right) \tag{2.6}$$
$$\mathbf{x}_i \leftarrow \mathbf{x}_i + \mathbf{v}_i \tag{2.7}$$
where $\chi$ denotes the constriction coefficient, $U(0, \varphi)$ is a random function generating uniformly distributed random numbers in the range $[0, \varphi]$, $N_i$ represents the neighborhood size of the $i$th particle, and $\mathbf{p}_{nbr(k)}$ represents the personal best position of the $i$th particle's $k$th neighbor.
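A minimal sketch of one FIPS velocity update, equation (2.6), might look as follows in Python; the function name is hypothetical, and the values χ = 0.72984 and ϕ = 4.1 follow the settings quoted later in Section 3.4.1.

```python
import random

def fips_velocity(v_i, x_i, pbest_nbrs, chi=0.72984, phi=4.1):
    """One FIPS velocity update, equation (2.6), for a single particle.

    `pbest_nbrs` is the list of personal best positions of the N_i
    neighbors; chi and phi follow the settings quoted in Section 3.4.1.
    """
    n_i = len(pbest_nbrs)
    new_v = []
    for j in range(len(x_i)):
        # Every neighbor's personal best contributes, each with its own
        # random weight U(0, phi); the sum is averaged over N_i.
        influence = sum(random.uniform(0.0, phi) * (p[j] - x_i[j])
                        for p in pbest_nbrs)
        new_v.append(chi * (v_i[j] + influence / n_i))
    return new_v  # the position update is then x_i <- x_i + new_v, eq. (2.7)
```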
2.5.4 Topology Structure
A particle updates its position in the search space at each iteration. The velocity update in equation (2.4) consists of three parts: the previous velocity, the cognitive part, and the social part. The cognitive part means that a particle learns from its own search experience, and correspondingly, the social part means that a particle can learn from other particles, or in particular from the best particle among its neighbors. The topology defines the neighborhood of a particle. Different topology structures can be utilized in PSO, each giving a different strategy for sharing search information among particles. The global star and the local ring are the two most commonly used topology structures. A PSO with the global star structure, where all particles are connected to each other, has the smallest average distance in the swarm; on the contrary, a PSO with the local ring structure, where every particle is connected to two nearby particles, has the biggest average distance in the swarm [31, 168].

The particle swarm optimization algorithm has different kinds of topology structures, e.g., the star, ring, four clusters, or Von Neumann structure. A particle in a PSO with a different structure has a different number of particles in its neighborhood, with a different scope. Learning from a different neighbor means that a particle follows a different neighborhood (or local) best; in other words, the topology structure determines the connections among particles and the strategy of search information propagation. Although it does not relate to the particle's cognitive part directly, the topology can affect the algorithm's convergence speed and its ability to avoid premature convergence, i.e., the PSO algorithm's ability of exploration and exploitation.

A topology structure can be seen as the environment for the particles [12]. Particles live in the environment, and each particle competes to be the global, local, or neighborhood best. If a particle is chosen to be the global, local, or neighborhood best, its (position) information will affect other particles' positions, and this particle is considered a leader in its neighborhood. The structure of the PSO determines the environment of the particles, and the process of a particle competing to be a leader is like an animal struggling for dominance in its population.
In this thesis, the four most commonly used topology structures are considered [135, 166]. They are the star, ring, four clusters, and Von Neumann structures, which are shown in Figure 2.2.
Figure 2.2: Topologies used in this thesis are presented in the following order: (a) Star topology, where all particles or nodes share the search information in the whole swarm; (b) Ring topology, where every particle is connected to two neighbors; (c) Four clusters topology, where four fully connected subgroups are inter-connected among themselves by linking particles; (d) Von Neumann topology, which is a lattice and where every particle has four neighbors that are wrapped on four sides. Each swarm has 16 particles.
• The star topology is shown in Figure 2.2 (a). Because all particles or nodes are connected, search information is shared in a global scope; this topology is frequently termed the global or all topology. With this topology, search information is shared in the whole swarm, and the particle with the best fitness value is chosen to be the "leader." The other particles follow the leader to find the optimum. This topology can be seen as a complete competition pattern: each particle competes with all others in the population, and this requires N − 1 total competitions for a single-species population of N particles.
• The ring topology is shown in Figure 2.2 (b). A particle is connected with two neighbors in this topology. A particle first compares its fitness value with its left neighbor, and then the winner compares with the right neighbor; the particle with the better fitness value in this small scope is determined by these two comparisons. This is like a small competition environment in which each particle only competes with its two neighbors, and it requires 2(N − 1) total competitions for a population of N particles.
• The four clusters topology is shown in Figure 2.2 (c). As the name indicates, the whole swarm is divided into four subgroups. Each subgroup is a small star topology with a "leader" particle that shares its own search information within the subgroup. Each subgroup also has three linking particles that link to the other three subgroups; the linking particles are used to exchange search information with them. For N particles, each subgroup has N/4 particles; this needs N − 4 competitions, plus 12 search information exchanges, which gives N + 8 total competitions.
• The Von Neumann topology is shown in Figure 2.2 (d). This topology is also named the square topology [168] or the NEWS neighborhood (since each particle has four neighbors, in the North, East, West, and South) [71]. In this topology, every particle has four neighbors that are wrapped on four sides, and the swarm is organized as a mesh. For N particles, this needs 4(N − 1) total competitions.

The topology determines the structure of the particles' connections and the transmission of search information in the swarm. The star and the ring are the two most commonly used structures. A PSO with a star structure, where all particles are connected to each other, has the smallest average distance in the swarm; on the contrary, a PSO with a local ring structure, where every particle is connected to two nearby particles, has the biggest average distance in the swarm [166–168].
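To make the neighborhood structures concrete, the following Python sketch (an illustration, not thesis code) builds neighbor lists for the star, ring, and Von Neumann topologies; the four clusters topology is omitted for brevity, and the row-major layout of the Von Neumann lattice is an assumption.

```python
def star_neighbors(m):
    """Global star topology: every particle sees all the others."""
    return [[k for k in range(m) if k != i] for i in range(m)]

def ring_neighbors(m):
    """Local ring topology: each particle sees its two nearby neighbors."""
    return [[(i - 1) % m, (i + 1) % m] for i in range(m)]

def von_neumann_neighbors(rows, cols):
    """Von Neumann lattice: four wrapped neighbors (N, E, W, S)."""
    nbrs = []
    for r in range(rows):
        for c in range(cols):
            nbrs.append([((r - 1) % rows) * cols + c,    # north
                         r * cols + (c + 1) % cols,      # east
                         r * cols + (c - 1) % cols,      # west
                         ((r + 1) % rows) * cols + c])   # south
    return nbrs
```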
2.6 Text Categorization

2.6.1 Introduction
Knowledge discovery in databases (KDD) is the process of converting raw data into useful information. Data mining, the analysis step of the KDD process, is the process that attempts to discover useful information (or patterns) in large data repositories [217]. Information retrieval (IR) is concerned with obtaining material of an unstructured or semi-structured nature, relevant to an information need, from large collections of information resources [161]. Traditional data mining techniques focus on the well-structured collections that exist in either relational databases or data warehouses. Text mining techniques focus on semi-structured data or natural-language documents, which are far less structured, such as Web page contents [6, 156]. Text categorization is a field at the intersection of data mining and information retrieval. The task of text categorization is to classify documents into a fixed number of predefined classes. A class is considered a semantic category that groups documents that have certain properties in common. Generally, a document can be in multiple classes, exactly one class, or no class. Yet, with the task of information filtering in mind, i.e., the categorization of documents as either relevant or non-relevant, we assume that each document is assigned to exactly one class. More precisely, the text categorization problem is as follows [144]. Assume a space of textual documents $D$ and a fixed set of $k$ classes $C = \{c_1, \cdots, c_k\}$, which implies a disjoint, exhaustive partition of $D$. Text categorization is a mapping, $f : D \rightarrow C$, from the document space onto the set of classes.
2.6.2 Swarm Intelligence in Data Mining
Data mining has been a popular academic topic in computer science and statistics for decades, while swarm intelligence is a relatively new subfield of computational intelligence (CI) which studies the collective intelligence in a group of simple individuals. In swarm intelligence, useful information can be obtained from the competition and cooperation of individuals. Generally, there are two kinds of approaches that apply swarm intelligence as a data mining technique [164]. The first category consists of techniques where individuals of a swarm move through a solution space and search for solution(s) to the data mining task. This is an effective search approach in which the swarm intelligence is applied to optimize the data mining technique, e.g., for parameter tuning. In the second category, swarms move data instances that are placed on a low-dimensional feature space in order to arrive at a suitable clustering or low-dimensional mapping of the data. This is a data organizing approach in which the swarm intelligence is applied directly to the data samples, e.g., for dimensionality reduction of samples. Swarm intelligence, especially particle swarm optimization and ant colony optimization algorithms, has been utilized in data mining to solve single-objective [2] and multi-objective problems [48]. Based on the two characteristics of the particle swarm, self-cognition and social learning, the particle swarm has been utilized in data clustering techniques [8, 49, 220, 237], document clustering [1, 53], variable weighting in clustering high-dimensional data [159], semi-supervised learning based text categorization [35], and Web data mining [181].
2.6.3 Chinese Text Categorization Process
In the Chinese text categorization process, the texts are divided into sets of words, and each word represents a dimension in the vector space. The vector space model, or term vector model, is an algebraic model for representing text documents as vectors of identifiers. The cosine metric is utilized to measure the similarity between the test example (a vector) and the examples in the training set. In solving the text categorization problem, each text is transformed into a collection of words. In other words, the raw text needs some preprocessing before the classifier works. This preprocessing includes punctuation cleaning, Chinese word segmentation, and feature selection.

Data Preparation and Cleaning

One important step in text preprocessing is tokenization, or punctuation cleaning [157, 161]. By the properties of the language, a single word contains no punctuation, so changing all punctuation marks to empty spaces is a useful method to simplify text. After punctuation cleaning, the text becomes several short sentences, and the sentences become sets of words that can be looked up in a dictionary. The content of a text becomes a vector in which each element is the frequency with which a single word occurs in the text. Each word is one dimension of the text space, so the text is represented in a vector space model, and the distance and similarity between two texts can be measured using this model.

Chinese Word Segmentation

Chinese characters are examples of "ideograms", and many have specific logographic qualities or functions. There is no white space to separate the words in texts, and indexing of Chinese documents is impossible without a proper segmentation algorithm [226, 236]. The most commonly used method to segment Chinese sentences is dictionary-based: the text is split into terms containing two or three Chinese characters, and if a term is found in the dictionary, then this term is a word. The quality of the dictionary may affect the performance of a classifier, so constructing a proper dictionary is an important step in Chinese word segmentation. The contents of each text are transformed into a collection of words, each word is a dimension of the categorization space, and the frequency of each word is a feature of the categorized examples.

Feature Selection

In text mining domains, effective feature selection is essential to make the mining task efficient and more accurate. There are more than ten thousand words in Chinese or English. The number of different words is large even in relatively small documents such as short news articles or article abstracts. The number of different words in big
document collections can be huge [88]. The dimension of the bag-of-words feature space for a big collection can reach hundreds of thousands; moreover, the document representation vectors, although sparse, may still have hundreds or thousands of nonzero components. Feature selection is necessary to make large problems computationally efficient. An empirical study of twelve feature selection metrics on text classification benchmark problems was presented in [97]. Most of those words are irrelevant to the categorization task and can be dropped with no harm to a classifier's performance; dropping them may even result in improvement owing to noise reduction. Such words, like "an" or "the" in English, are termed stop words [151]. The feature selection process removes the irrelevant words from the word collections.
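As a small illustration of the vector space model described above, the following Python sketch (not from the thesis) computes the cosine similarity between two bag-of-words term-frequency vectors; it assumes that segmentation and stop-word removal have already been performed, and the example word lists are invented.

```python
import math
from collections import Counter

def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    tf_a, tf_b = Counter(words_a), Counter(words_b)
    # Only words occurring in both documents contribute to the dot product.
    dot = sum(tf_a[w] * tf_b[w] for w in set(tf_a) & set(tf_b))
    norm_a = math.sqrt(sum(f * f for f in tf_a.values()))
    norm_b = math.sqrt(sum(f * f for f in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Example with already-segmented word lists:
doc1 = ["swarm", "intelligence", "optimization", "swarm"]
doc2 = ["swarm", "optimization", "diversity"]
print(cosine_similarity(doc1, doc2))
```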
Chapter 3

Population Diversity

3.1 Overview
Diversity is important in population-based algorithms. There have been many discussions of diversity in evolutionary algorithms, such as genetic algorithms [107] and estimation of distribution algorithms (EDA) [255]. In addition, many studies have been done on population diversity in particle swarm optimization [14, 179, 207, 208, 253]. Particles fly in the search space; if particles easily get clustered together in a short time, they will lose their "search potential." Population premature convergence around a local optimum is a common problem for population-based algorithms. It is a result of individuals hastily congregating within a small region of the search space. An algorithm's exploration ability is decreased when premature convergence occurs, and particles have a low possibility of exploring new search areas. Normally, diversity that is lost due to particles getting clustered together is not easily recovered, so an algorithm may lose its search efficacy due to premature convergence. Once a population has converged, the algorithm spends most of the remaining iterations searching in a small region.

Premature convergence and low search efficacy happen partially because a population-based algorithm has an improper search information propagation strategy. For example, when solving a difficult problem, the star topology in the classical PSO or the fully informed PSO may cause too fast a propagation of search information. In contrast, when solving a simple problem, the ring topology in the classical PSO has slow search information propagation, which may lead to slow convergence and therefore may need many iterations to find a "good enough" solution. Although many methods have been designed in the literature to avoid premature convergence, these methods did not incorporate an effective way to measure the degree of premature convergence; in other words, the measurement of particles' exploration/exploitation still needs to be investigated. Shi and Eberhart gave several definitions of diversity measurement based on particles' positions, velocities, and historical best positions [207, 208]. Through diversity measurements, useful exploration and/or exploitation search information can be obtained.

The most important factor affecting an optimization algorithm's performance is its ability of exploration and exploitation. Exploration means the ability of a search algorithm to explore different areas of the search space in order to have a high probability of finding good promising solutions. Exploitation, on the other hand, means the ability to concentrate the search around a promising region in order to refine a candidate solution. A good optimization algorithm should optimally balance these two conflicting objectives. The population diversity of PSO is useful for measuring and dynamically adjusting an algorithm's ability of exploration or exploitation accordingly. Shi and Eberhart gave three definitions of population diversity: position diversity, velocity diversity, and cognitive diversity [207, 208]. Position, velocity, and cognitive diversity are used to measure the distribution of particles' current positions, current velocities, and pbests (the best position found so far by each particle), respectively. Modified definitions of the three diversity measures based on the L1 norm were introduced in [28, 30]. From diversity measurements, useful information can be obtained; the detailed definitions of PSO population diversities can be found in [28–32].
3.2 Population Diversity Definition

3.2.1 Introduction
The population diversity of PSO measures the distribution of particles, and the diversity's changing rate is a way to monitor the degree of convergence/divergence of the PSO search process [44, 251]. In other words, the particles' current distribution and velocity tendency, i.e., whether the swarm is in the state of "flying" through a large search space or refining in a local area, can be obtained from this measurement; equivalently, the status of the particles, whether in a state of exploration or exploitation, can be obtained from it. The population diversity of PSO is useful for measuring and dynamically adjusting an algorithm's ability of exploration or exploitation accordingly. Shi and Eberhart gave several definitions of PSO population diversity measurements in [207, 208, 253], and these definitions can be divided into three parts: position diversity, velocity diversity, and cognitive diversity. For the purpose of generality and clarity, $m$ represents the number of particles and $n$ the number of dimensions. Each particle is represented as $x_{ij}$, where $i$ denotes the $i$th particle, $i = 1, \cdots, m$, and $j$ the $j$th dimension, $j = 1, \cdots, n$. The definition of population diversity can be dimension-wise or element-wise. In the dimension-wise population diversity, each dimension is measured independently; on the contrary, in the element-wise population diversity, all dimensions are considered together to calculate the population diversity.
The element-wise and dimension-wise position diversities in [207] are defined as follows:

• Element-wise position diversity
$$\bar{x} = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij} \qquad D_E^p = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} (x_{ij} - \bar{x})^2$$
where $\bar{x}$ is the mean of the current positions of all particles over all dimensions, and $D_E^p$ measures all particles' position diversity over all dimensions.

• Dimension-wise position diversity
$$\bar{x}_j = \frac{1}{m}\sum_{i=1}^{m} x_{ij} \qquad D_j^p = \sqrt{\frac{1}{m}\sum_{i=1}^{m} (x_{ij} - \bar{x}_j)^2}$$
where $\bar{x} = [\bar{x}_1, \cdots, \bar{x}_j, \cdots, \bar{x}_n]$ represents the mean of the particles' current positions on each dimension, and $D^p = [D_1^p, \cdots, D_j^p, \cdots, D_n^p]$ measures the particles' position diversity based on the L2 norm for each dimension.

The dimension-wise definition of velocity diversity given by Shi and Eberhart is as follows [208]:
$$v_{ij}^{nor} = \frac{v_{ij}}{\sqrt{\sum_{j=1}^{n} v_{ij}^2}} \qquad \bar{v}_j^{nor} = \frac{1}{m}\sum_{i=1}^{m} v_{ij}^{nor} \qquad D_j^v = \sqrt{\frac{1}{m}\sum_{i=1}^{m} (v_{ij}^{nor} - \bar{v}_j^{nor})^2}$$
where $D^v = [D_1^v, \cdots, D_j^v, \cdots, D_n^v]$ measures the velocity diversity based on the L2 norm for each single dimension.
Shi and Eberhart gave three definitions of population diversity: position diversity, velocity diversity, and cognitive diversity [207, 208]. Position, velocity, and cognitive diversity are used to measure the distribution of particles' current positions, current velocities, and pbests (the best position found so far by each particle), respectively. Position diversity and cognitive diversity can be seen as the "average distance" of the particles' current positions or personal best positions, and velocity diversity can likewise be seen as the "average value" of the current velocities. The distance between each particle and the positions/velocities center can be measured by the Euclidean distance (L2 distance) or the Manhattan distance (L1 distance) [207, 208]. Cheng and Shi introduced modified definitions of the three diversity measures based on the L1 distance [28, 30–32]. In the following definitions, the L1 distance is utilized to measure the distance between each particle and the positions/velocities center.
3.2.2 Position Diversity
Position diversity measures the distribution of particles' current positions and therefore can reflect the particles' dynamics. Position diversity gives the current position distribution information of the particles; whether the particles are going to diverge or converge can be reflected by this measurement. From diversity measurements, useful search information can be obtained.

Dimension-Wise Diversity

The definition of dimension-wise position diversity, which is based on the L1 distance, is as follows:
$$\bar{x}_j = \frac{1}{m}\sum_{i=1}^{m} x_{ij} \qquad D_j^p = \frac{1}{m}\sum_{i=1}^{m} |x_{ij} - \bar{x}_j| \qquad D^p = \sum_{j=1}^{n} w_j D_j^p$$
where $\bar{x}_j$ represents the pivot of the particles' positions in dimension $j$, and $D_j^p$ measures the particles' position diversity based on the L1 norm for dimension $j$. We then define $\bar{x} = [\bar{x}_1, \cdots, \bar{x}_j, \cdots, \bar{x}_n]$, which represents the mean of the particles' current positions on each dimension, and $D^p = [D_1^p, \cdots, D_j^p, \cdots, D_n^p]$, which measures the particles' position diversity based on the L1 norm for each dimension. $D^p$ measures the whole swarm's position diversity. Without loss of generality, every dimension is considered equally. Setting all weights $w_j = \frac{1}{n}$, the dimension-wise position diversity can be rewritten as:
$$D^p = \sum_{j=1}^{n} \frac{1}{n} D_j^p = \frac{1}{n}\sum_{j=1}^{n} D_j^p$$
Element-Wise Diversity

The definition of element-wise PSO position diversity is as follows:
$$\bar{x} = \frac{1}{n \times m}\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij} \qquad D_E^p = \frac{1}{n \times m}\sum_{i=1}^{m}\sum_{j=1}^{n} |x_{ij} - \bar{x}|$$

3.2.3 Velocity Diversity
Velocity diversity, which represents the diversity of particles' "moving potential", measures the distribution of particles' current velocities. In other words, velocity diversity measures the "activity" information of the particles. Based on the measurement of velocity diversity, the particles' tendency toward expansion or convergence can be revealed.

Dimension-Wise Diversity

The dimension-wise velocity diversity based on the L1 distance is defined as follows:
$$\bar{v}_j = \frac{1}{m}\sum_{i=1}^{m} v_{ij} \qquad D_j^v = \frac{1}{m}\sum_{i=1}^{m} |v_{ij} - \bar{v}_j| \qquad D^v = \sum_{j=1}^{n} w_j D_j^v$$
where $\bar{v}_j$ represents the pivot of the particles' velocities in dimension $j$, and $D_j^v$ measures the particles' velocity diversity based on the L1 norm for dimension $j$. We then define $\bar{v} = [\bar{v}_1, \cdots, \bar{v}_j, \cdots, \bar{v}_n]$, which represents the mean of the particles' current velocities on each dimension, and $D^v = [D_1^v, \cdots, D_j^v, \cdots, D_n^v]$, which measures the velocity diversity of all particles on each dimension. $D^v$ represents the whole swarm's velocity diversity based on the L1 norm. Without loss of generality, every dimension is considered equally. Setting all weights $w_j = \frac{1}{n}$, the dimension-wise velocity diversity can be rewritten as:
$$D^v = \sum_{j=1}^{n} \frac{1}{n} D_j^v = \frac{1}{n}\sum_{j=1}^{n} D_j^v$$
Element-Wise Diversity

The definition of element-wise PSO velocity diversity is as follows:
$$\bar{v} = \frac{1}{n \times m}\sum_{i=1}^{m}\sum_{j=1}^{n} v_{ij} \qquad D_E^v = \frac{1}{n \times m}\sum_{i=1}^{m}\sum_{j=1}^{n} |v_{ij} - \bar{v}|$$

3.2.4 Cognitive Diversity
Cognitive diversity, which represents the distribution of particles' "moving target", measures the distribution of the historical best positions of all particles. The measurement definition of cognitive diversity is the same as that of position diversity, except that it utilizes each particle's personal best position instead of the current position.

Dimension-Wise Diversity

The definition of dimension-wise PSO cognitive diversity is as follows:
$$\bar{p}_j = \frac{1}{m}\sum_{i=1}^{m} p_{ij} \qquad D_j^c = \frac{1}{m}\sum_{i=1}^{m} |p_{ij} - \bar{p}_j| \qquad D^c = \sum_{j=1}^{n} w_j D_j^c$$
where $\bar{p}_j$ represents the pivot of the particles' previous best positions in dimension $j$, and $D_j^c$ measures the particles' cognitive diversity based on the L1 norm for dimension $j$. We then define $\bar{p} = [\bar{p}_1, \cdots, \bar{p}_j, \cdots, \bar{p}_n]$, which represents the average of all particles' personal best positions in history (pbest) on each dimension, and $D^c = [D_1^c, \cdots, D_j^c, \cdots, D_n^c]$, which represents the particles' cognitive diversity for each dimension based on the L1 norm. $D^c$ measures the whole swarm's cognitive diversity. Without loss of generality, every dimension is considered equally. Setting all weights $w_j = \frac{1}{n}$, the dimension-wise cognitive diversity can be rewritten as:
$$D^c = \sum_{j=1}^{n} \frac{1}{n} D_j^c = \frac{1}{n}\sum_{j=1}^{n} D_j^c$$
Element-Wise Diversity

The definition of element-wise PSO cognitive diversity is as follows:
$$\bar{p} = \frac{1}{n \times m}\sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij} \qquad D_E^c = \frac{1}{n \times m}\sum_{i=1}^{m}\sum_{j=1}^{n} |p_{ij} - \bar{p}|$$
3.2.5 Discussion
Element-wise measurement takes all particles and all dimensions together as an entirety to calculate the population diversity of PSO; therefore, this kind of definition has some disadvantages:

• Lack of each dimension's diversity information: $D_E^p$ represents the swarm position diversity, with no measurement of the particles' position diversity on a single dimension.

• Confusion about the difference between dimensions: consider a simple scenario of using two particles to solve a problem with two dimensions, with the particles at (1, 7) and (7, 1), respectively, or with the two particles converged to (1, 7). The results of element-wise diversity are the same in both cases: $\bar{x} = 4$, $D_E^p = 9$. However, these two situations should have different results of population diversity measurement.

The above example shows that element-wise diversity cannot give useful information when the problem's optimum takes different values on different dimensions. Besides, the definition of diversity on dimensions has a clearer geometric meaning: $\bar{x}$ is the center of all positions on each dimension, and $D_j^p$ is the average distance of particles from the center in dimension $j$. In other words, if the swarm moves to the center from the current distribution, the distance all particles need to move is $m D_j^p$ in dimension $j$, and the total distance is
$$m \sum_{j=1}^{n} D_j^p = m \times n \times D^p = mnD^p.$$
28
The above discussion gives the definitions of population diversity in three parts: position, velocity, and cognitive. These diversity definitions are based on the L1 norm, which has a clearer geometric meaning. The three definitions have the same form and are easy to understand and implement. More search information of optimization algorithms can be revealed from these measurements.
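Because the three dimension-wise L1 definitions share the same form, a single function suffices to compute all of them. The following Python sketch (illustrative, not thesis code) returns the scalar and per-dimension diversity for any set of vectors, and the example reproduces the two-particle scenario from the discussion above.

```python
def dimension_wise_diversity(vectors):
    """Dimension-wise L1 diversity of a set of m n-dimensional vectors.

    Passing current positions gives position diversity D^p, current
    velocities gives velocity diversity D^v, and personal best positions
    gives cognitive diversity D^c; the definition is identical.
    """
    m, n = len(vectors), len(vectors[0])
    # Pivot (mean) on each dimension j.
    pivot = [sum(vec[j] for vec in vectors) / m for j in range(n)]
    # Average absolute deviation from the pivot on each dimension.
    d_j = [sum(abs(vec[j] - pivot[j]) for vec in vectors) / m
           for j in range(n)]
    # Equal weights w_j = 1/n give the scalar whole-swarm diversity.
    return sum(d_j) / n, d_j

# Example: two particles at (1, 7) and (7, 1); unlike the element-wise
# measure, this gives 0 if both particles instead converge to (1, 7).
d_scalar, d_per_dim = dimension_wise_diversity([[1.0, 7.0], [7.0, 1.0]])
print(d_scalar, d_per_dim)   # 3.0, [3.0, 3.0]
```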
3.3 Diversity Monitoring and Analysis
Wolpert and Macready have proved that under certain assumptions no algorithm is better than any other on average over all problems [232–235]. Considering generality, twelve benchmark functions were used in our experimental studies to monitor and analyze diversity [152, 249]. The aim of the experiment is not to compare the ability or the efficacy of PSO algorithms with different parameter settings or structures, e.g., global star or local ring, but to compare the measurement of runtime information when the PSOs are executed. The 12 benchmark functions are given in Appendix A.1. Functions f0 − f4 are unimodal problems, which include the Parabolic (Sphere) function f0, Schwefel's P2.22 function f1, Schwefel's P1.2 function f2, the Step function f3, and the Quartic noise function f4; f4 is a noisy quadric function, where random[0, 1) is a uniformly distributed random variable in [0, 1). Functions f5 − f11 are multimodal problems, which include the generalized Rosenbrock function f5 (which has two minima when the dimension is n = 4 ∼ 30 [199]), the Schwefel function f6, the generalized Rastrigin function f7, the Noncontinuous Rastrigin function f8, the Ackley function f9, the Griewank function f10, and the generalized penalized function f11. In all experiments, the PSO has 50 particles, and the parameters are set as in the standard PSO [21]. Each algorithm runs 50 times, with 10 000 iterations in each run. The simulation results of three representative benchmark functions are reported here. These functions include the Quartic noise function f4, which is a unimodal function with random noise; the Noncontinuous Rastrigin function f8, which is a noncontinuous multimodal function; and the Griewank function f10, a continuous multimodal function.
3.3.1 Comparison of Different PSO Diversity Definitions
Figure 3.1 shows a comparison of different PSO population diversity definitions. First, for the PSO with the global star structure, Figure 3.1 (a) shows the cognitive diversity of function f4, Figure 3.1 (b) shows the position diversity of function f8, and Figure 3.1 (c) shows the velocity diversity of function f10; second, for the PSO with the local ring structure, Figure 3.1 (d) shows the velocity diversity of f4, Figure 3.1 (e) shows the cognitive diversity of f8, and Figure 3.1 (f) shows the position diversity of f10.
Figure 3.1: Different definitions of PSO population diversity, comparing the dimension-wise L1, element-wise, and dimension-wise L2 measurements over iterations (log scale). Global star structure: (a) f4 cognitive, (b) f8 position, (c) f10 velocity; local ring structure: (d) f4 velocity, (e) f8 cognitive, (f) f10 position.
From Figure 3.1, the conclusion can be drawn that dimension-wise population diversities are better than element-wise ones: the measurement of element-wise diversity cannot give useful information about the particles' distribution. The diversities based on the L1 norm and the L2 norm have the same changing curve. The L1 norm is better than the L2 norm for the following reasons:

• In general terms, the L1 norm can be calculated easily, and it has high computational efficiency.

• The L1 norm is easy to understand, and it has a clear geometric meaning in high-dimensional space.

• The value of the L1 norm is larger than that of the L2 norm (Figure 3.1: (a), (b), (e), (f)), or the value of the L1 norm has a larger variation range (Figure 3.1: (c), (d)). This shows that diversity based on the L1 norm can reveal more significant information, at least for the tested benchmark functions under dimension-wise population diversity.
3.3.2 PSO Diversity Analysis
Figure 3.2 displays the population diversities, which include position diversity, cognitive diversity, and velocity diversity. First, Figure 3.2 (a), (b), (c) display the diversities of functions f4, f8, and f10 for the PSO with the global star structure, respectively; second, Figure 3.2 (d), (e), (f) display the diversities of functions f4, f8, and f10 for the PSO with the local ring structure, respectively. From the figures, the conclusion can be drawn that position diversity and cognitive diversity have the same changing tendency, and the cognitive diversity curve is a simplified position diversity curve without oscillation. Velocity diversity falls to a tiny value after the algorithm finds a "good enough" value or gets "stuck" in a local optimum. It is observed from running PSO that particles "fly" from one side of the optimum to the other side on each dimension continually [213]. Velocity diversity and position diversity therefore usually oscillate continuously.
3.4 Search Information Propagation
A particle updates its position in the search space at each iteration. The velocity update in equation (2.4) consists of three parts: the previous velocity, the cognitive part, and the social part. The cognitive part means that a particle learns from its own search experience, and correspondingly, the social part means that a particle can learn from other particles, or in particular from the best particle among its neighbors. The topology defines the neighborhood of a particle. The particle swarm optimization algorithm has different kinds of topology structures, e.g., the star, ring, four clusters, or Von Neumann structure.
Figure 3.2: PSO population diversity analysis, showing cognitive, position, and velocity diversity over iterations (log scale). Global star structure: (a) f4 population diversities, (b) f8 population diversities, (c) f10 population diversities; local ring structure: (d) f4 population diversities, (e) f8 population diversities, (f) f10 population diversities.
The figures of the different topologies are given in Section 2.5.4. A particle in a PSO with a different structure has a different number of particles in its neighborhood, with a different scope. Learning from a different neighbor means that a particle follows a different neighborhood (or local) best; in other words, the topology structure determines the connections among particles and the strategy of search information propagation. Although it does not relate to the particle's cognitive part directly, the topology can affect the algorithm's convergence speed and its ability to avoid premature convergence, i.e., the PSO algorithm's ability of exploration and exploitation. The effect of graph type and population size on the transmittal of a solution, i.e., the solution transfer rate, has been investigated in graph-based evolutionary algorithms [50]. In this section, a population diversity based study of search information propagation in particle swarm optimization is analyzed and discussed.

Table 3.1 gives different properties of the topologies, where "Neighbors" indicates the number of neighbors that a particle has, "Diameter" indicates the diameter of a topology, which is the largest number of steps needed to propagate search information from one particle to another, and "Average Distance" indicates the average number of steps of search information propagation in the swarm. The "Average Distance" of a topology structure is always less than or equal to its "Diameter."

Table 3.1: Parameters and criteria of different topologies.

Topology      | Neighbors         | Diameter          | Average Distance
Star          | m − 1             | 1                 | 1
Ring          | 2                 | m/2 or (m − 1)/2  | m²/(4(m − 1)) or (m + 1)/4
Four Clusters | m/4 − 1 (or m/4)  | 3                 | (5m − 14)/(2(m − 1))
Von Neumann   | 4                 | 2 + ⌊m/8⌋         | (7d² − 11d)/(2(m − 1)) or (7d² − 5d − 4)/(2(m − 1))
The star structure has the most neighbors and the smallest diameter; it needs only one iteration to propagate "good" search information over the whole swarm. The ring structure has the longest diameter among the four topologies. The calculations of its diameter and average distance differ depending on whether the population size is even or odd. When the population size $m$ is an even number, the search information needs $m/2$ steps to propagate over the whole swarm; therefore, the diameter is $m/2$. A particle needs only one step to reach its two nearby neighbors, another step to reach the next two particles, and so on. The average distance is calculated as follows:
$$dis_{ring} = \frac{2 \times \left(1 + 2 + \cdots + \frac{m-2}{2}\right) + \frac{m}{2}}{m-1} = \frac{m^2}{4(m-1)}$$
On the contrary, when the population size $m$ is an odd number, the search information needs $(m-1)/2$ steps to propagate over the whole swarm. The average distance is calculated as follows:
$$dis_{ring} = \frac{2 \times \left(1 + 2 + \cdots + \frac{m-1}{2}\right)}{m-1} = \frac{m+1}{4}$$

The population size in the four clusters structure should be a multiple of 4, and larger than 16. A particle inside a cluster has $m/4 - 1$ neighbors, and a linking particle has $m/4$ neighbors. A swarm with the four clusters topology has 12 linking particles and $m - 12$ inner particles. The average distance is calculated as follows:
$$dis_{four} = \frac{(m-12)\left[(m/4-1) + 3 \times 2 + (3m/4-3) \times 3\right] + 12\left[m/4 + (2 + m/4 - 1) \times 2 + (m/2 - 2) \times 3\right]}{m(m-1)} = \frac{5m-14}{2(m-1)}$$

The population size in the Von Neumann structure should normally also be a multiple of 4, and larger than 16. The diameter $d$ equals $2 + \lfloor m/8 \rfloor$ in this structure. The calculation of the average distance differs depending on the value of $\bmod(m, 8)$. When $\bmod(m, 8) = 0$, the average distance is
$$dis_{von} = \frac{4 \times (1 + \lfloor m/8 \rfloor) + 1 \times (2 + \lfloor m/8 \rfloor) + 4 + 7 \times (2 + \cdots + \lfloor m/8 \rfloor)}{m-1} = \frac{7 \times (2 + \cdots + \lfloor m/8 \rfloor) + 5 \times (2 + \lfloor m/8 \rfloor)}{m-1} = \frac{7d^2 - 11d}{2(m-1)}$$
When $\bmod(m, 8) = 4$, the average distance is
$$dis_{von} = \frac{6 \times (1 + \lfloor m/8 \rfloor) + 2 \times (2 + \lfloor m/8 \rfloor) + 4 + 7 \times (2 + \cdots + \lfloor m/8 \rfloor)}{m-1} = \frac{7 \times (2 + \cdots + \lfloor m/8 \rfloor) + 8 \times (2 + \lfloor m/8 \rfloor) - 2}{m-1} = \frac{7d^2 - 5d - 4}{2(m-1)}$$

The diameter and average distance are fixed numbers in the star topology. In the four clusters topology the diameter is 3, and the larger the population size $m$ is, the closer to 2.5 the average distance gets. In the ring and Von Neumann topologies, the diameter and average distance increase with the population size, and they increase faster in the ring topology than in the Von Neumann topology.
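These closed-form values can be checked numerically. The following Python sketch (an illustration, not from the thesis) computes the diameter and average distance of any topology by breadth-first search over the neighborhood graph; the ring example reproduces the m²/(4(m − 1)) formula for m = 32.

```python
from collections import deque

def distances_from(start, neighbors):
    """Breadth-first distances from one particle over a topology graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter_and_average(neighbors):
    m = len(neighbors)
    total, diam = 0, 0
    for i in range(m):
        dist = distances_from(i, neighbors)
        diam = max(diam, max(dist.values()))
        total += sum(dist.values())
    # Average over all ordered pairs of distinct particles.
    return diam, total / (m * (m - 1))

# Ring with m = 32 (even): expect diameter m/2 = 16 and
# average distance m^2 / (4(m - 1)) = 1024 / 124 = 8.258...
m = 32
ring = [[(i - 1) % m, (i + 1) % m] for i in range(m)]
print(diameter_and_average(ring))
```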
3.4.1 Experimental Study
Benchmark Test Functions

The experiments have been conducted on the benchmark functions listed in Appendix A.1. Without loss of generality, five standard unimodal and five multimodal test functions are selected [152, 249].
All functions are run 50 times to ensure a reasonable statistical result, which is necessary to compare the different approaches. Every tested function's optimal point in the solution space $S$ is shifted to a randomly generated point with a different value in each dimension, where $S \subseteq R^n$ and $R^n$ is an $n$-dimensional Euclidean space.

Parameter Setting

Clerc provided a reference setting of parameters in [43]. The population size is calculated according to the formula $m = \lfloor 10 + 2\sqrt{n} \rfloor$, where $n$ is the dimension of the problem, while the four clusters and Von Neumann topologies require the population size to be a multiple of 4. In all experiments, the problems have 100 dimensions; the formula gives $\lfloor 10 + 2\sqrt{100} \rfloor = 30$ particles, which is adjusted to $m = 32$ so that the population size is a multiple of 4. The other parameters are set as follows: $w = \frac{1}{2\ln(2)} \simeq 0.721$, and $c_1 = c_2 = 0.5 + \ln(2) \simeq 1.193$. Correspondingly, in the fully informed PSO, $\chi$ is set to 0.72984 and $\varphi$ to 4.1 [136]. Each algorithm runs 50 times. According to Table 3.1, the parameters and criteria of the topologies used in the experiments are listed in Table 3.2.

Table 3.2: Parameters and criteria of the topologies used in the experiments.

Topology      | Neighbors | Diameter | Average Distance
Star          | 31        | 1        | 1
Ring          | 2         | 16       | 8.25806
Four Clusters | 7 (8)     | 3        | 2.35483
Von Neumann   | 4         | 6        | 3.0
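The parameter values above can be reproduced in a few lines. In the following Python sketch (illustrative), the rounding of the population size up to the next multiple of 4 is an inference from Table 3.2 rather than an explicit formula in the thesis.

```python
import math

n = 100                                  # problem dimension
m = 10 + 2 * int(math.sqrt(n))           # Clerc's population-size rule: 30
m += (-m) % 4                            # round up to a multiple of 4 -> 32
w = 1 / (2 * math.log(2))                # inertia weight, ~0.721
c1 = c2 = 0.5 + math.log(2)              # acceleration constants, ~1.193
print(m, round(w, 3), round(c1, 3))      # 32 0.721 1.193
```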
Experimental Results

Tables 3.3 and 3.4 give the experimental results of the classical PSO and the fully informed PSO with the star, ring, four clusters, and Von Neumann structures solving the benchmark functions. The fully informed PSO with the ring structure performs best among the variants of PSO in this experiment. The two common measures of center are the mean and the median: the mean is the "average value" and the median is the "middle value." These are two different ideas of "center", and the two measures behave differently. An important weakness of the mean as a measure of center is that it is sensitive to the influence of a few extreme observations, i.e., the median is more resistant than the mean [171].

Two extreme variants of PSO are also tested in this experiment. The classical PSO is based on the concept of "following the leader" in the neighborhood; in contrast, Table 3.5 gives the results obtained when a particle follows a randomly chosen particle in the star or ring topology structure, respectively. For the PSO with the star structure, a particle follows a random particle in the whole swarm, so the search information propagates very slowly.
Table 3.3: Results of the classical PSO and the fully informed PSO with different topologies. All algorithms are run 50 times, where "Best", "Median", "Mean", and "Std. Dev." indicate the best, middle, average, and standard deviation of the best fitness values over 50 runs, respectively. The maximum iteration number is 4 000.

Func. | fmin   | PSO     | Topology | Best       | Median     | Mean       | Std. Dev.
f0    | -450.0 | Classic | star     | -449.9449  | -439.7927  | -255.2338  | 560.8554
f0    | -450.0 | Classic | ring     | -449.9999  | -449.9999  | -449.9999  | 0.000138
f0    | -450.0 | Classic | four     | -449.9999  | -449.9999  | -449.9999  | 4.20E-07
f0    | -450.0 | Classic | von      | -449.9999  | -449.9999  | -449.9999  | 7.99E-10
f0    | -450.0 | FIPS    | star     | 339613.56  | 447076.36  | 441831.66  | 35227.44
f0    | -450.0 | FIPS    | ring     | -450       | -450       | -450       | 1.89E-13
f0    | -450.0 | FIPS    | four     | 14285.00   | 24681.19   | 24684.67   | 5389.470
f0    | -450.0 | FIPS    | von      | -449.9999  | -449.9999  | -449.9999  | 0.000344
f1    | -330.0 | Classic | star     | -328.4412  | -291.4290  | -289.4810  | 21.8558
f1    | -330.0 | Classic | ring     | -329.9996  | -329.9643  | -305.2955  | 88.8413
f1    | -330.0 | Classic | four     | -329.9999  | -329.9992  | -329.9925  | 0.01781
f1    | -330.0 | Classic | von      | -329.9999  | -329.9999  | -329.9996  | 0.00129
f1    | -330.0 | FIPS    | star     | 7.996E+36  | 3.263E+43  | 2.274E+48  | 1.36E+49
f1    | -330.0 | FIPS    | ring     | -329.9999  | -329.9999  | -329.9999  | 4.26E-10
f1    | -330.0 | FIPS    | four     | 9.053966   | 4.801E+09  | 7.803E+15  | 5.44E+16
f1    | -330.0 | FIPS    | von      | -329.9997  | -329.9939  | -329.8927  | 0.679876
f2    | 450.0  | Classic | star     | 116093.81  | 216822.59  | 232813.60  | 65798.359
f2    | 450.0  | Classic | ring     | 26076.057  | 54623.792  | 58720.959  | 16602.364
f2    | 450.0  | Classic | four     | 12389.844  | 67274.542  | 71598.946  | 40728.281
f2    | 450.0  | Classic | von      | 14159.233  | 45973.174  | 51052.310  | 29712.105
f2    | 450.0  | FIPS    | star     | 806115.43  | 1555663.2  | 1560989.7  | 380104.02
f2    | 450.0  | FIPS    | ring     | 47111.608  | 56729.710  | 57214.995  | 5839.6394
f2    | 450.0  | FIPS    | four     | 202707.49  | 249662.99  | 251088.27  | 34874.554
f2    | 450.0  | FIPS    | von      | 29281.216  | 84296.259  | 88102.738  | 41992.824
f3    | 330.0  | Classic | star     | 6781       | 22969      | 23315.9    | 10306.952
f3    | 330.0  | Classic | ring     | 385        | 489        | 544.38     | 195.5423
f3    | 330.0  | Classic | four     | 603        | 1502       | 1989.12    | 1590.592
f3    | 330.0  | Classic | von      | 395        | 722        | 1111.06    | 1066.0466
f3    | 330.0  | FIPS    | star     | 355037     | 455072     | 451635.92  | 34292.83
f3    | 330.0  | FIPS    | ring     | 330        | 331        | 331.18     | 1.68154
f3    | 330.0  | FIPS    | four     | 33363      | 50566      | 49443.56   | 7546.029
f3    | 330.0  | FIPS    | von      | 541        | 1332       | 1510.9     | 771.316
f4    | -450.0 | Classic | star     | -444.9641  | -432.7642  | -424.9775  | 34.21905
f4    | -450.0 | Classic | ring     | -448.7550  | -447.9319  | -447.7522  | 0.652466
f4    | -450.0 | Classic | four     | -449.5577  | -448.4410  | -448.2649  | 0.959494
f4    | -450.0 | Classic | von      | -449.6993  | -449.0370  | -448.8236  | 0.907857
f4    | -450.0 | FIPS    | star     | 4315.991   | 6068.306   | 6083.339   | 694.3036
f4    | -450.0 | FIPS    | ring     | -449.9204  | -449.8675  | -449.8651  | 0.031783
f4    | -450.0 | FIPS    | four     | -357.3188  | -314.8768  | -315.3167  | 28.69503
f4    | -450.0 | FIPS    | von      | -445.2883  | -441.6522  | -440.7968  | 3.350950
fmin
PSO Classic
f5
−330.0 FIPS
Classic f6
450.0 FIPS
Classic f7
180.0 FIPS
Classic f8
120.0 FIPS
Classic f9
330.0 FIPS
star ring four von star ring four von star ring four von star ring four von star ring four von star ring four von star ring four von star ring four von star ring four von star ring four von
Best -18.22195 -173.3592 -192.9005 -193.7088 4.106E+07 -234.3696 227170.13 -150.5334 1004.776 903.9942 905.6888 820.1231 2308.406 830.7015 1387.589 789.2801 198.6934 182.6997 186.5255 183.5734 201.1163 180.0000 199.5553 185.2589 120.0778 120.0000 120 120 3218.739 120 224.9534 120.0000 331.3039 330.4914 330.0001 330.0000 1.173E+10 330.0239 1.168E+07 331.2250
37
Median 23163.16 -21.9800 20.00018 -19.9443 5.513E+07 -131.6252 581041.02 256.6969 1373.615 1084.227 1096.078 994.3507 2534.831 956.7745 1500.302 861.9119 199.5733 189.4261 194.1494 185.7672 201.2654 180.0000 200.2754 190.7780 121.0613 120.0005 120.0633 120.0682 4137.406 120 336.2165 120.0657 335.7367 334.0971 330.9404 330.7812 1.678E+10 332.8482 5.228E+07 11120.27
Mean 60475.23 -23.1920 3224.695 876.3252 5.485E+07 -134.3712 594981.22 1224.615 1352.159 1086.382 1106.205 990.3344 2522.431 953.9749 1486.953 862.8679 199.4646 191.6607 194.0570 186.3510 201.2582 180.0000 200.1863 190.8429 121.9329 120.0219 120.1254 120.1016 4138.720 120.0011 334.5281 120.2018 336.9716 334.3615 331.2466 331.0321 1.676E+10 332.8362 5.267E+07 83274.45
Std. Dev. 94959.45 74.34104 18677.85 3844.901 6.055E+06 55.86204 173187.8 2167.309 144.4495 78.35568 115.0447 93.4592 105.0067 42.5777 47.9096 39.06456 0.268229 6.172737 4.149344 2.16691 0.037203 4.643E-08 0.262970 2.644279 3.965766 0.043576 0.165114 0.153764 280.8450 0.003452 47.16474 0.506643 5.89327 2.31153 1.22104 1.11112 1.966E+09 1.607897 2.112E+07 169937.51
clustered together in a small region. For PSO with ring structure, a particle has two neighbors in ring topology. Each particle has a fifty-fifty chance to follow the particle with a better fitness value when particles follow neighbor randomly. At this time, the search information propagation is slower than the classical PSO with ring topology. Results in Table 3.5 show that it is difficult to find a good result when following a particle randomly selected from whole swarm, however, following a particle randomly in small region can obtain a good result. For unimodal function f3 , particle following a random neighbor in ring topology has better result than classical PSO and fully informed PSO. Table 3.5: Results of each particle following a random neighbor in its neighborhood. All algorithms are run 50 times, where “Best”, “Median”, “Mean”, and “Std. Dev.” indicate the best, middle, average, and standard deviation of the best fitness values over 50 runs for each benchmark function, respectively. The maximum iteration number is 4 000.
| Func. | fmin | Topology | Best | Median | Mean | Std. Dev. |
|---|---|---|---|---|---|---|
| f0 | -450.0 | Star | 24817.424 | 29325.182 | 29249.152 | 2456.377 |
| | | Ring | -449.9995 | -449.9990 | -449.9989 | 0.000443 |
| f1 | -330.0 | Star | -156.1868 | -107.6218 | 83989.389 | 571063.228 |
| | | Ring | -329.9977 | -329.9949 | -329.9948 | 0.001400 |
| f2 | 450.0 | Star | 320829.942 | 395701.648 | 397395.835 | 32888.245 |
| | | Ring | 124026.453 | 181956.903 | 180529.670 | 24434.898 |
| f3 | 330.0 | Star | 24154 | 28940 | 28866.46 | 2431.157 |
| | | Ring | 330 | 330 | 330.04 | 0.1959591 |
| f4 | -450.0 | Star | -413.9879 | -398.2961 | -397.4175 | 7.58119 |
| | | Ring | -449.8921 | -449.82399 | -449.8235 | 0.031174 |
| f5 | -330.0 | Star | 250096.782 | 433522.28 | 433543.07 | 75651.926 |
| | | Ring | -230.0535 | -86.4069 | -74.2590 | 90.33934 |
| f6 | 450.0 | Star | 1401.5184 | 1459.6367 | 1457.5322 | 26.62942 |
| | | Ring | 962.6106 | 1157.5832 | 1152.435 | 52.92584 |
| f7 | 180.0 | Star | 193.8776 | 195.1455 | 195.0929 | 0.313985 |
| | | Ring | 180.0055 | 180.0081 | 180.0083 | 0.002285 |
| f8 | 120.0 | Star | 327.1738 | 386.8496 | 385.8214 | 25.7765 |
| | | Ring | 120.0002 | 120.0007 | 120.0022 | 0.006460 |
| f9 | 330.0 | Star | 1.543E+07 | 3.013E+07 | 3.055E+07 | 6822010.6 |
| | | Ring | 333.1731 | 337.1858 | 337.2432 | 2.036253 |

3.4.2 Population Diversity Analysis and Discussion
Without loss of generality, and for the sake of simplicity and clarity, the results for one function from the five unimodal benchmark functions and one function from the five multimodal functions are displayed, because the results for the others are similar. There are several definitions of the measurement of population diversity [30, 207, 208]. The dimension-wise population diversity based on the L1 norm is utilized in this chapter. Figures 3.3 and 3.4 display the population diversity changing curves of the classical PSO
Figure 3.3: Population diversity changing curves of PSO solving unimodal function f0: (a) star, (b) star random, (c) ring, (d) ring random, (e) four clusters, (f) von Neumann.
and the fully informed PSO with four kinds of topologies solving the unimodal function f0 and the multimodal function f5, respectively. In both figures, (a) is for the classical PSO and the fully informed PSO with star structure, (b) is for the classical PSO and for particles following a random neighbor with star structure, (c) is for the classical PSO and the fully informed PSO with ring structure, (d) is for the classical PSO and for particles following a random neighbor with ring structure, (e) is for the classical PSO and the fully informed PSO with four clusters structure, and (f) is for the classical PSO and the fully informed PSO with von Neumann structure.

From the figures, we can see that the classical PSO generally has a faster decrease of population diversity than the fully informed PSO. In the classical PSO, particles follow the "leader" in their neighborhood, which causes faster search information propagation, so the swarm gets clustered into a small region quickly. The fully informed PSO with a star structure and a particle following a random neighbor can be seen as two extreme examples. For the fully informed PSO with a star structure, the velocity diversity shrinks to a tiny value rapidly, while the position diversity and cognitive diversity remain at stable values during the whole search process. This is because too much search information is propagated in the swarm: all particles quickly move into a small region, and premature convergence happens in this case. In contrast, there is little search information propagation in the swarm when each particle follows a random neighbor chosen from the whole swarm. Particles can hardly get clustered, so the three population diversities remain large over the iterations. Search information propagates slowly when particles follow a random neighbor within a small neighborhood; correspondingly, the population diversities of particles following a random neighbor decrease more slowly than those of the classical PSO in Figures 3.3(b), (d) and Figures 3.4(b), (d).

Figures 3.5 and 3.6 display the population diversity changing curves of the classical PSO and the fully informed PSO solving the unimodal function f0 and the multimodal function f5, respectively. In both figures, (a) is for position diversity, (b) for velocity diversity, and (c) for cognitive diversity. The fully informed PSO with star topology uses the whole swarm's search information to update each particle's position. The particles get clustered together after a few iterations, and the velocity diversity falls to a tiny value. Velocity diversity represents the distribution of particles' "moving potential"; from this case, we can conclude that this search potential needs to be kept during the search process. In general, the classical PSO with ring structure and the fully informed PSO with ring structure perform better than the others. Cognitive diversity represents the distribution of particles' "moving target," and position diversity represents the current solutions' distribution. Position diversity should tend to follow the cognitive diversity, or vice versa. Dynamically changing position diversity to adapt to cognitive diversity, or, on the contrary, changing cognitive diversity to adapt to position diversity, may improve the algorithm's performance.
Figure 3.4: Population diversity changing curves of PSO solving multimodal function f5: (a) star, (b) star random, (c) ring, (d) ring random, (e) four clusters, (f) von Neumann.
Figure 3.5: Comparison of population diversities for PSO solving unimodal function f0: (a) position, (b) velocity, (c) cognitive.
Figure 3.6: Comparison of population diversities for PSO solving multimodal function f5: (a) position, (b) velocity, (c) cognitive.
3.4.3 Conclusions

An algorithm's abilities of exploration and exploitation are important in the optimization process. With good exploration ability, an algorithm can explore more areas of the search space and find potential regions in which "good enough" solutions may exist. On the other hand, an algorithm with good exploitation ability can finely search the potentially good regions and ultimately find the optimum. An algorithm should keep a good balance between exploration and exploitation during the search process.

In this section, we have analyzed the diameter and average distance of search information propagation in PSO with different topologies. PSO with star topology has the smallest diameter and average distance, which means that search information propagates fastest among all topologies; on the contrary, PSO with ring topology has the largest diameter and average distance. Topology determines the search information propagation in the swarm; with improper search information propagation, premature convergence or a low-efficacy search may happen.

An algorithm's exploration and exploitation can be monitored through the change of population diversity during the search process. In this section, we monitored the population diversities of PSO with different topologies. In general, we can conclude that the classical PSO with ring structure has faster propagation of search information than the fully informed PSO with ring structure; correspondingly, the population diversities decrease faster in the classical PSO at the same iteration. In this case, the fully informed PSO can preserve population diversity for more iterations, and the performance of the fully informed PSO with ring structure is better than that of the classical PSO with ring structure on most benchmark functions. However, this does not mean that slower search information propagation is always the better choice. The fully informed PSO with star topology shares most of the search information across the whole swarm. Its velocity diversity decreases to a very tiny value after a few iterations, and the swarm loses its "search potential," even though the position diversity and cognitive diversity are preserved during the search process.
3.5 Normalized Population Diversity

Position diversity represents the distribution of current solutions. Cognitive diversity represents the distribution of particles' "moving target." Position diversity should tend to follow the cognitive diversity, or vice versa. Velocity diversity represents the particles' "moving potential." This potential should be enlarged when particles are "stuck in" a small search region where a "good enough" solution cannot be found, and be reduced when particles find an area where a "good enough" solution may exist. Dynamically adjusting particles' velocities according to the relationship between position diversity and cognitive diversity could improve the algorithm's performance [208].

The basic procedure of PSO is shown as Algorithm 1 in Section 2.5. A particle updates its velocity according to equation (2.4), and updates its position according to equation (2.5). The $c_1\,\mathrm{rand}()(p_i - x_i)$ part can be seen as cognitive behavior, while the $c_2\,\mathrm{Rand}()(p_g - x_i)$ part can be seen as social behavior.
In particle swarm optimization, a particle learns not only from its own experience but also from its companions' experience. This indicates that a particle's "moving position" is determined by its own experience and its neighbors' experience [31]. The particles' update equations (2.4) and (2.5) in Section 2.5 can also be written in matrix form as follows:

$$V = wV + c_1\,\mathrm{rand}()\,(P - X) + c_2\,\mathrm{Rand}()\,(N - X) \qquad (3.1)$$

$$X = X + V \qquad (3.2)$$

where rand() and Rand() are different for each matrix element, and

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & & x_{ij} & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \qquad
V = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{bmatrix} = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & & v_{ij} & \vdots \\ v_{m1} & v_{m2} & \cdots & v_{mn} \end{bmatrix}$$

$$P = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_m \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & & p_{ij} & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{bmatrix} \qquad
N = \begin{bmatrix} p^*_1 \\ p^*_2 \\ \vdots \\ p^*_m \end{bmatrix} = \begin{bmatrix} p^*_{11} & p^*_{12} & \cdots & p^*_{1n} \\ p^*_{21} & p^*_{22} & \cdots & p^*_{2n} \\ \vdots & & p^*_{ij} & \vdots \\ p^*_{m1} & p^*_{m2} & \cdots & p^*_{mn} \end{bmatrix}$$
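To make the matrix form concrete, here is a minimal numpy sketch of a single update step according to equations (3.1) and (3.2); the swarm size, dimensionality, bounds, and random seed are hypothetical placeholders, not values prescribed by this thesis.

```python
import numpy as np

# One matrix-form PSO update step, equations (3.1)-(3.2).
m, n = 50, 100                            # hypothetical swarm size and dimension
w, c1, c2 = 0.72984, 1.496172, 1.496172   # standard PSO parameter setting

rng = np.random.default_rng(0)
X = rng.uniform(-100.0, 100.0, (m, n))    # position matrix X
V = rng.uniform(-20.0, 20.0, (m, n))      # velocity matrix V
P = X.copy()                              # cognitive (personal best) matrix P
N = X.copy()                              # social (neighboring best) matrix N

# rand() and Rand() are drawn independently for every matrix element.
V = w * V + c1 * rng.random((m, n)) * (P - X) + c2 * rng.random((m, n)) * (N - X)
X = X + V
```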
The above four matrices are termed the position matrix X, the velocity matrix V, the cognitive (personal best) matrix P, and the social (neighboring best) matrix N. The matrix N is a simplified personal-best matrix that contains, for each particle, only one best position: either the global best position in the global star structure, or that particle's neighborhood best in other structures, e.g., the local ring. The three definitions of population diversity, which are position diversity, velocity diversity, and cognitive diversity, are given in [208]. According to this matrix representation of PSO, diversity is a measurement of the variance of the elements in each dimension or in the whole matrix. Position diversity measures the distribution of particles' current positions, that is, it concerns the elements in matrix X. Velocity diversity measures the distribution of the swarm's current velocities, that is, it concerns the elements in matrix V. Cognitive diversity measures the distribution of the best positions found so far by each particle, that is, it concerns the elements in matrix P. Which diversity definition is utilized to measure the diversity of the swarm is determined by the properties of the particle swarm optimization algorithm and the problem to be solved.

The population diversity measurement can be based on each dimension or on the whole swarm. In particle swarm optimization, each vector $x_i = [x_{i1}, \cdots, x_{in}]$ in the position matrix X is a solution of the problem. Each vector $x_i$ is measured independently in dimension-wise population diversity, while for element-wise diversity, all vectors are combined together to find the center of the particles. For dimension-wise diversity, the contribution of each $x_{ij}$ to the diversity is evaluated independently; for element-wise diversity, it is not independent. Therefore, for dimension-wise diversity it is preferred to normalize the population diversity with a vector norm, while for element-wise diversity it is preferred to normalize the diversity with a matrix norm.

Diversity has been defined to measure the search process of an evolutionary algorithm. Generally, it does not measure whether the algorithm has found a "good enough" solution or not, but measures the distribution of individuals in the population (current solutions). Leung et al. used Markov chain analysis to measure a degree of population diversity in the premature convergence process of genetic algorithms [146]. Olorunda and Engelbrecht utilized swarm diversity to measure the state of exploration or exploitation during the particles' search [179]. The three different definitions of population diversity for measuring the PSO search process are introduced in [207, 208]. These three kinds of population diversities have been utilized on different subjects, including population diversity control [28], handling of search space boundary constraints [31], promoting diversity to solve multimodal problems [32], search information propagation analysis [36], and dynamical exploitation space reduction for solving large scale problems [33].

Compared with other evolutionary algorithms, e.g., the genetic algorithm, PSO carries more search information, including not only the solution (position), but also the velocity and the previous best solution (cognitive). Population diversities, which include position diversity, velocity diversity, and cognitive diversity, are utilized to measure this information, respectively. There are several definitions of the measurement of population diversity [30, 207, 208]. Because different problems have different dynamic ranges, the dynamic ranges of these defined diversities will generally be different. As a consequence, the diversity observation on one problem will differ from that on another problem. It is therefore necessary to have normalized diversity definitions.
3.5.1 Vector Norm and Matrix Norm

A vector norm is a map $f: \mathbb{R}^n \rightarrow \mathbb{R}$. The $p$-norms, a useful class of vector norms, are defined by

$$\|x\|_p = \left(|x_1|^p + \cdots + |x_n|^p\right)^{1/p}, \quad p \ge 1 \qquad (3.3)$$

The $L_1$ and $L_\infty$ norms can be utilized to normalize the population diversity; their definitions are as follows:

$$\|x\|_1 = |x_1| + \cdots + |x_n| \qquad (3.4)$$

$$\|x\|_\infty = \max_{1 \le i \le n} |x_i| \qquad (3.5)$$

All norms on $\mathbb{R}^n$ are equivalent [101], i.e., if $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are norms on $\mathbb{R}^n$, there exist positive constants $c_1$ and $c_2$ such that $c_1\|x\|_\alpha \le \|x\|_\beta \le c_2\|x\|_\alpha$. Vector norms have the property that

$$\|x\|_1 \ge \|x\|_2 \ge \cdots \ge \|x\|_\infty \qquad (3.6)$$
A matrix norm is a map $f: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$. The most frequently used matrix norms are the $p$-norms:

$$\|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}$$

and the matrix norms have the properties that

$$\max_{i,j} |a_{ij}| \le \|A\|_2 \le \sqrt{mn}\,\max_{i,j} |a_{ij}|, \qquad \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}|, \qquad \|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}|$$

By applying matrix norms in PSO, the meaning of a matrix norm is as follows:

• For each dimension, calculate the sum of absolute position values over all particles; the maximum of these sums is the matrix $L_1$ norm of the position matrix.

• For each particle, calculate the sum of absolute position values over all dimensions; the maximum of these sums is the matrix $L_\infty$ norm of the position matrix.

The distinction between the matrix $L_1$ norm and the matrix $L_\infty$ norm is the perspective taken on the position matrix: the matrix $L_1$ norm measures the largest value over dimensions, while the matrix $L_\infty$ norm measures the largest value over particles. Considering whether the vectors depend on each other or not, vector norms are preferred for normalizing dimension-wise population diversity, and the matrix $L_\infty$ norm is preferred for element-wise population diversity.
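As a quick numerical check of these definitions (with arbitrary example data), the following sketch evaluates the vector norms and the two matrix norms; numpy's np.linalg.norm returns the maximum absolute column sum for ord=1 and the maximum absolute row sum for ord=inf on a matrix.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
A = np.array([[1.0, -2.0],
              [3.0, 4.0]])

# Vector norms satisfy inequality (3.6): ||x||_1 >= ||x||_2 >= ||x||_inf.
l1 = np.linalg.norm(x, 1)          # |3| + |-4| + |1| = 8
l2 = np.linalg.norm(x, 2)          # sqrt(26), about 5.10
linf = np.linalg.norm(x, np.inf)   # max(|3|, |-4|, |1|) = 4
assert l1 >= l2 >= linf

# Matrix norms: L1 is the maximum column sum, L-inf the maximum row sum.
m1 = np.linalg.norm(A, 1)          # max(|1|+|3|, |-2|+|4|) = 6
minf = np.linalg.norm(A, np.inf)   # max(|1|+|-2|, |3|+|4|) = 7
```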
3.5.2 Normalized Position Diversity

Position diversity measures the distribution of particles' current positions. Whether the swarm is going to diverge into a wider search space or converge to a small area can be obtained from this measurement. Position diversity concerns the elements in the position matrix.

Dimension-Wise Diversity

For dimension-wise diversity, the vectors in the position matrix are independent. Vector norms are preferred to normalize the positions, and three methods are as follows. These normalizations are based on the vector $L_1$ norm, the vector $L_\infty$ norm, or the maximum value of position:

$$x^{nor}_{ij} = \frac{x_{ij}}{\|x\|_1} = \frac{x_{ij}}{\sum_{j=1}^{n} |x_{ij}|}, \quad \text{or} \quad x^{nor}_{ij} = \frac{x_{ij}}{\|x\|_\infty} = \frac{x_{ij}}{\max_j |x_{ij}|}, \quad \text{or} \quad x^{nor}_{ij} = \frac{x_{ij}}{X_{max}}$$

Considering the inequality (3.6), a normalized position based on another vector norm is always larger than the position normalized by the $L_1$ norm and smaller than the position normalized by the $L_\infty$ norm. Normalized dimension-wise position diversities are calculated as follows:

$$\bar{x}^{nor}_j = \frac{1}{m}\sum_{i=1}^{m} x^{nor}_{ij}, \qquad D^p_j = \frac{1}{m}\sum_{i=1}^{m} \left|x^{nor}_{ij} - \bar{x}^{nor}_j\right|, \qquad D^p = \frac{1}{n}\sum_{j=1}^{n} D^p_j$$

where $\mathbf{D}^p = [D^p_1, \cdots, D^p_n]$ are the diversities on each dimension, and $D^p$ is the normalized position diversity of the particles.
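A minimal sketch of the dimension-wise computation, assuming a numpy position matrix X of shape m × n with nonzero rows; the helper name is illustrative, and the same function applies unchanged to the velocity matrix V and the cognitive matrix P of Sections 3.5.3 and 3.5.4.

```python
import numpy as np

def dimension_wise_diversity(X, norm="l1"):
    """Normalized dimension-wise diversity of an m-by-n swarm matrix."""
    if norm == "l1":
        scale = np.abs(X).sum(axis=1, keepdims=True)   # vector L1 norm per particle
    elif norm == "linf":
        scale = np.abs(X).max(axis=1, keepdims=True)   # vector L-inf norm per particle
    else:
        raise ValueError(f"unknown norm: {norm}")
    Xn = X / scale                                     # normalized positions
    mean_j = Xn.mean(axis=0)                           # per-dimension mean
    D_j = np.abs(Xn - mean_j).mean(axis=0)             # diversity of each dimension
    return D_j.mean()                                  # overall diversity D^p
```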
Element-Wise Diversity

For element-wise diversity, a vector in the position matrix is related to the other vectors, and this connection should be considered in the diversity measurement. Three methods are preferred to normalize the position of each particle: the maximum $|x_{ij}|$ in the matrix X, the matrix $L_\infty$ norm of X, or the maximum value of position. The normalized position is as follows:

$$x^{nor}_{ij} = \frac{x_{ij}}{\max |x_{ij}|}, \quad \text{or} \quad x^{nor}_{ij} = \frac{x_{ij}}{\|X\|_\infty} = \frac{x_{ij}}{\max_{1 \le i \le m} \sum_{j=1}^{n} |x_{ij}|}, \quad \text{or} \quad x^{nor}_{ij} = \frac{x_{ij}}{X_{max}}$$

After normalizing the positions, the normalized element-wise position diversity is calculated as follows:

$$\bar{x}^{nor} = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} x^{nor}_{ij}, \qquad D^p = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} \left|x^{nor}_{ij} - \bar{x}^{nor}\right|$$

where $D^p$ is the normalized position diversity of the particles at this iteration.
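A matching sketch for the element-wise case under the same assumptions, normalizing by the matrix L∞ norm (the maximum absolute row sum):

```python
import numpy as np

def element_wise_diversity(X):
    """Normalized element-wise diversity: all dimensions combined."""
    Xn = X / np.abs(X).sum(axis=1).max()   # matrix L-inf norm of X
    mean_all = Xn.mean()                   # mean over all m*n elements
    return np.abs(Xn - mean_all).mean()    # overall diversity D^p
```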
3.5.3 Normalized Velocity Diversity

Velocity diversity, which gives the tendency information of the particles, measures the distribution of the particles' current velocities. In other words, velocity diversity measures the "activity" of the particles. From the measurement of velocity diversity, the particles' tendency of expansion or convergence can be obtained.
Dimension-Wise Diversity

The vectors in the velocity matrix are independent when measuring the dimension-wise diversity. The vector $L_1$ norm, the vector $L_\infty$ norm, or the maximum value of velocity is applied to normalize the velocity:

$$v^{nor}_{ij} = \frac{v_{ij}}{\|v\|_1} = \frac{v_{ij}}{\sum_{j=1}^{n} |v_{ij}|}, \quad \text{or} \quad v^{nor}_{ij} = \frac{v_{ij}}{\|v\|_\infty} = \frac{v_{ij}}{\max_j |v_{ij}|}, \quad \text{or} \quad v^{nor}_{ij} = \frac{v_{ij}}{V_{max}}$$

Normalized dimension-wise velocity diversities are calculated as follows:

$$\bar{v}^{nor}_j = \frac{1}{m}\sum_{i=1}^{m} v^{nor}_{ij}, \qquad D^v_j = \frac{1}{m}\sum_{i=1}^{m} \left|v^{nor}_{ij} - \bar{v}^{nor}_j\right|, \qquad D^v = \frac{1}{n}\sum_{j=1}^{n} D^v_j$$

where $\mathbf{D}^v = [D^v_1, \cdots, D^v_n]$ are the diversities on each dimension, and $D^v$ is the normalized velocity diversity of the particles.

Element-Wise Diversity

In the measurement of element-wise diversity, the vectors in the velocity matrix are not independent. One of three operators, the maximum $|v_{ij}|$ in the velocity matrix, the matrix $L_\infty$ norm of V, or the maximum value of velocity, is applied to normalize the velocity:

$$v^{nor}_{ij} = \frac{v_{ij}}{\max |v_{ij}|}, \quad \text{or} \quad v^{nor}_{ij} = \frac{v_{ij}}{\|V\|_\infty} = \frac{v_{ij}}{\max_{1 \le i \le m} \sum_{j=1}^{n} |v_{ij}|}, \quad \text{or} \quad v^{nor}_{ij} = \frac{v_{ij}}{V_{max}}$$

The normalized element-wise velocity diversity is calculated as follows:

$$\bar{v}^{nor} = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} v^{nor}_{ij}, \qquad D^v = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} \left|v^{nor}_{ij} - \bar{v}^{nor}\right|$$

where $D^v$ is the normalized velocity diversity of the particles at this iteration.
3.5.4 Normalized Cognitive Diversity

Cognitive diversity represents the distribution of the targets found so far by all particles. The measurement of cognitive diversity is the same as that of position diversity, except that each particle's current personal best position is used instead of its current position. Therefore, the analysis for position diversity is also valid for cognitive diversity.
Dimension-Wise Diversity

The normalized cognitive positions are as follows:

$$p^{nor}_{ij} = \frac{p_{ij}}{\|p\|_1} = \frac{p_{ij}}{\sum_{j=1}^{n} |p_{ij}|}, \quad \text{or} \quad p^{nor}_{ij} = \frac{p_{ij}}{\|p\|_\infty} = \frac{p_{ij}}{\max_j |p_{ij}|}, \quad \text{or} \quad p^{nor}_{ij} = \frac{p_{ij}}{X_{max}}$$

Normalized dimension-wise cognitive diversities are calculated as follows:

$$\bar{p}^{nor}_j = \frac{1}{m}\sum_{i=1}^{m} p^{nor}_{ij}, \qquad D^c_j = \frac{1}{m}\sum_{i=1}^{m} \left|p^{nor}_{ij} - \bar{p}^{nor}_j\right|, \qquad D^c = \frac{1}{n}\sum_{j=1}^{n} D^c_j$$

where $\mathbf{D}^c = [D^c_1, \cdots, D^c_n]$ are the diversities on each dimension, and $D^c$ is the normalized cognitive diversity of the particles.

Element-Wise Diversity

Like the definition of position diversity, the normalized personal best positions are as follows:

$$p^{nor}_{ij} = \frac{p_{ij}}{\max |p_{ij}|}, \quad \text{or} \quad p^{nor}_{ij} = \frac{p_{ij}}{\|P\|_\infty} = \frac{p_{ij}}{\max_{1 \le i \le m} \sum_{j=1}^{n} |p_{ij}|}, \quad \text{or} \quad p^{nor}_{ij} = \frac{p_{ij}}{X_{max}}$$

The normalized element-wise cognitive diversity is calculated as follows:

$$\bar{p}^{nor} = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} p^{nor}_{ij}, \qquad D^c = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} \left|p^{nor}_{ij} - \bar{p}^{nor}\right|$$

where $D^c$ is the normalized cognitive diversity of the particles at this iterative step.
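Under the assumptions of the earlier sketches (the hypothetical matrices X, V, and P, and the illustrative helpers defined above), the velocity and cognitive diversities reuse the identical computation with a different input matrix:

```python
# Only the input matrix changes between the three diversity definitions.
Dp = dimension_wise_diversity(X)   # position diversity
Dv = dimension_wise_diversity(V)   # velocity diversity
Dc = dimension_wise_diversity(P)   # cognitive diversity
Dp_e, Dv_e, Dc_e = (element_wise_diversity(M) for M in (X, V, P))
```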
3.5.5 Experimental Studies

Benchmark Test Functions

The experiments have been conducted on the benchmark functions listed in Appendix A.1. Without loss of generality, five standard unimodal and six multimodal test functions are selected [152, 249]. All functions are run 50 times to ensure a reasonable statistical result necessary to compare the different approaches. The location of the optimum is randomly shifted in each dimension for each run.
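A minimal sketch of the random shifting, assuming a hypothetical Parabolic (Sphere) benchmark with symmetric bounds; the shift vector o is redrawn for every run:

```python
import numpy as np

def shifted_parabolic(x, o):
    """Parabolic benchmark with its optimum moved from the origin to o."""
    z = x - o
    return float(np.sum(z * z))

rng = np.random.default_rng(3)
o = rng.uniform(-100.0, 100.0, 100)     # random shift in each of 100 dimensions
x = rng.uniform(-100.0, 100.0, 100)     # a candidate solution
value = shifted_parabolic(x, o)         # 0.0 only when x == o
```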
Parameter Setting

The parameter setting is the same as that of the standard PSO. In all experiments, PSO has 50 particles, $c_1 = c_2 = 1.496172$, and the inertia weight $w = 0.72984$ [21, 44]. Each algorithm runs for 5 000 iterations on 100-dimensional problems in every run. There is also a limitation on the velocity to control the search step size; this setting could prevent the particles from crossing the search boundary. The maximum velocity is set as follows:

$$\text{maximum velocity} = 0.2 \times (\text{position upper bound} - \text{position lower bound}) \qquad (3.7)$$
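A small sketch of equation (3.7) and the resulting step-size limit, with hypothetical bounds:

```python
import numpy as np

x_lower, x_upper = -100.0, 100.0          # hypothetical position bounds
v_max = 0.2 * (x_upper - x_lower)         # equation (3.7): v_max = 40.0

rng = np.random.default_rng(4)
V = rng.normal(0.0, 60.0, (50, 100))      # hypothetical velocity matrix
V = np.clip(V, -v_max, v_max)             # limit each component to [-v_max, v_max]
```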
Boundary Constraints Handling

With an improper boundary constraints handling method, particles may get "stuck in" the boundary [31]. The classical boundary constraints handling method is as follows:

$$x_{i,j}(t+1) = \begin{cases} X_{max,j} & \text{if } x_{i,j}(t+1) > X_{max,j} \\ X_{min,j} & \text{if } x_{i,j}(t+1) < X_{min,j} \\ x_{i,j}(t+1) & \text{otherwise} \end{cases} \qquad (3.8)$$

where $t$ is the number of the last iteration, and $t+1$ is the number of the current iteration. This strategy resets particles to a particular point, the boundary, which constrains particles to fly within the search space limited by the boundary.

For PSO with star structure, a stochastic boundary constraints handling method is utilized in this section. Equation (3.9) gives a method by which particles are reset into a special area:

$$x_{i,j}(t+1) = \begin{cases} X_{max,j} \times (\mathrm{rand}() \times c + 1 - c) & \text{if } x_{i,j}(t+1) > X_{max,j} \\ X_{min,j} \times (\mathrm{Rand}() \times c + 1 - c) & \text{if } x_{i,j}(t+1) < X_{min,j} \\ x_{i,j}(t+1) & \text{otherwise} \end{cases} \qquad (3.9)$$

where $c$ is a parameter that controls the resetting scope; in our experiments, $c$ is set to 0.1. A particle will then be reset close to the boundary when its position goes beyond the boundary. This increases the exploitation ability of the algorithm when searching for solutions close to the boundary.

A deterministic method, which resets a boundary-violating position to the middle between the old position and the boundary [264], is utilized for PSO with local ring structure:

$$x_{i,j,G+1} = \begin{cases} \frac{1}{2}(x_{i,j,G} + X_{max,j}) & \text{if } x_{i,j,G+1} > X_{max,j} \\ \frac{1}{2}(x_{i,j,G} + X_{min,j}) & \text{if } x_{i,j,G+1} < X_{min,j} \\ x_{i,j,G+1} & \text{otherwise} \end{cases} \qquad (3.10)$$

The position from the last iteration is used in this strategy. Both the classic strategy and this strategy reset a particle to a deterministic position.
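The three strategies can be sketched in numpy as follows; the function names are illustrative, the bounds are assumed to be scalars shared by all dimensions, and the stochastic variant follows the multiplicative form of equation (3.9), which presumes the symmetric bounds used in this section.

```python
import numpy as np

def classic_reset(x_new, x_min, x_max):
    """Equation (3.8): clamp boundary-violating positions onto the boundary."""
    return np.clip(x_new, x_min, x_max)

def stochastic_reset(x_new, x_min, x_max, c=0.1, rng=None):
    """Equation (3.9): reset violators into a scope of relative width c
    next to the violated boundary (assumes x_min = -x_max)."""
    rng = np.random.default_rng() if rng is None else rng
    out = x_new.copy()
    over, under = x_new > x_max, x_new < x_min
    out[over] = x_max * (rng.random(np.count_nonzero(over)) * c + 1 - c)
    out[under] = x_min * (rng.random(np.count_nonzero(under)) * c + 1 - c)
    return out

def deterministic_reset(x_new, x_old, x_min, x_max):
    """Equation (3.10): reset to the midpoint of the old position and the boundary."""
    out = x_new.copy()
    over, under = x_new > x_max, x_new < x_min
    out[over] = 0.5 * (x_old[over] + x_max)
    out[under] = 0.5 * (x_old[under] + x_min)
    return out
```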
Experimental Results

Table 3.6 shows the results of the PSO variants solving the unimodal and multimodal benchmark functions. The bold numbers indicate the better solutions. Five measures of performance are reported. The first is the best fitness value attained after a fixed number of iterations; in our case, we report the best result found after 5 000 iterations. The following measures are the median value, worst value, mean value, and the standard deviation of the best fitness values over all runs. It is possible for an algorithm to rapidly reach a relatively good result while becoming trapped in a local optimum; these values reflect the algorithm's reliability and robustness.

Table 3.6: Results of PSO with global star and local ring structure for solving benchmark functions. All algorithms are run for 50 times, where "Best", "Median", "Worst", and "Mean" indicate the best, median, worst, and mean of the best fitness values for all runs, respectively; "Std. Dev." is their standard deviation.

PSO with Global Star Structure

| Func. | fmin | Best | Median | Worst | Mean | Std. Dev. |
|---|---|---|---|---|---|---|
| f0 | -450.0 | **-449.9999** | -449.6854 | -442.6770 | -449.1768 | 1.33996 |
| f1 | -330.0 | -329.9877 | -329.2344 | -304.9446 | -326.7997 | 5.72326 |
| f2 | 450.0 | **3818.890** | **8427.7940** | **19023.334** | **9030.511** | **3068.57** |
| f3 | 330.0 | 363 | 583 | 2004 | 754.44 | 385.978 |
| f4 | -450.0 | **-449.9706** | **-449.9444** | -449.7928 | **-449.9363** | 0.03416 |
| f5 | 180.0 | **267.7406** | 462.5463 | 1304.6580 | 485.0325 | 164.280 |
| f6 | -330.0 | **46.2115** | **208.2699** | **363.4809** | **203.8959** | 64.8012 |
| f7 | 450.0 | **642.0642** | **801.0000** | **1038.0011** | **803.5064** | 96.4137 |
| f8 | 180.0 | 183.1818 | 185.8173 | 199.6355 | 187.5995 | 5.10715 |
| f9 | 120.0 | 120.0169 | 120.2609 | 122.0702 | 120.3948 | 0.39427 |
| f10 | 330.0 | 330.6340 | 332.1802 | 334.3695 | 332.2510 | 1.01614 |

PSO with Local Ring Structure

| Func. | fmin | Best | Median | Worst | Mean | Std. Dev. |
|---|---|---|---|---|---|---|
| f0 | -450.0 | **-449.9999** | **-449.9999** | **-449.9999** | **-449.9999** | **4.49E-10** |
| f1 | -330.0 | **-329.9999** | **-329.9999** | **-318.1378** | **-329.5631** | **1.70413** |
| f2 | 450.0 | 26269.130 | 39541.160 | 52074.665 | 38807.785 | 5966.04 |
| f3 | 330.0 | **330** | **331** | **339** | **331.46** | **2.00209** |
| f4 | -450.0 | -449.9061 | -449.8677 | **-449.7951** | -449.8652 | **0.02438** |
| f5 | 180.0 | 308.5254 | **386.3057** | **524.9256** | **396.6181** | **48.9784** |
| f6 | -330.0 | 97.8392 | 226.0655 | 413.9907 | 225.8784 | **61.4603** |
| f7 | 450.0 | 869.4427 | 1051.2509 | 1200.5000 | 1053.3454 | **69.4105** |
| f8 | 180.0 | **180.0000** | **181.9460** | **182.8799** | **181.8619** | **0.54818** |
| f9 | 120.0 | **120.0000** | **120.0000** | **120.0000** | **120.0000** | **6.35E-09** |
| f10 | 330.0 | **330.0017** | **330.5103** | **331.8971** | **330.6488** | **0.48609** |
From the results in Table 3.6, we can conclude that the seven functions f0, f1, f3, f4, f8, f9, and f10 obtain good optimization results, while for the other four functions f2, f5, f6, and f7, the results are not very good. This is due to the properties of the functions: some functions become significantly more difficult when the dimension increases. The PSO with global star structure is better than the PSO with local ring structure on some functions, and on others the PSO with local ring structure is better.
3.5.6 Diversity Analysis and Discussion

Different functions have different properties, which may lead to different population diversity changes during the search. Considering the number of local optima, the benchmark functions can be categorized as unimodal or multimodal. The benchmark functions can also be divided into separable and non-separable functions based on the dependence between dimensions. Four representative benchmark functions in Table 3.7 are chosen to analyze the different normalized population diversities.

Table 3.7: Some representative benchmark functions.

| Function | | Modality | Separability |
|---|---|---|---|
| Parabolic | f0 | unimodal | separable |
| Schwefel's P1.2 | f2 | unimodal | non-separable |
| Rosenbrock | f5 | multimodal | non-separable |
| Ackley | f8 | multimodal | separable |
Normalized Population Diversity

Figures 3.7, 3.8, 3.9, and 3.10 display the population diversity changes while PSO solves the unimodal functions f0 and f2 and the multimodal functions f5 and f8, respectively.

Figure 3.7 displays the population diversity change while PSO solves the unimodal Parabolic function f0. The Parabolic function is a separable problem, i.e., each dimension is independent. The "hardness" of this problem does not increase significantly when the problem has a large scale, and the curves of the population diversity change are very smooth on this problem.

Figure 3.8 displays the population diversity change while PSO solves the unimodal Schwefel's P1.2 function f2. The Schwefel's P1.2 function is a non-separable problem, i.e., each dimension depends on the others. This problem becomes significantly "harder" when it has a large scale. The curves of the population diversity change vibrate considerably, and the normalized diversity decreases only slightly. This indicates that the particles cannot locate good regions, and most of the search is spent in the exploration state. The conclusion can be drawn that the Schwefel's P1.2 function f2 is more difficult than the Parabolic function f0, and the result may be far from the optimum.

Figure 3.9 displays the population diversity change while PSO solves the multimodal Rosenbrock function f5. The Rosenbrock function is also a non-separable problem, and it also becomes significantly "harder" at large scale. The curves of the population diversity change vibrate considerably, and the normalized diversity decreases only slightly. The curves of the population diversities are similar to those for function f2.
Figure 3.7: Population diversity change while PSO solves the unimodal Parabolic function f0: (a), (b) are for PSO with global star structure; (c), (d) are for PSO with local ring structure.
Figure 3.8: Population diversity change while PSO solves the unimodal Schwefel's P1.2 function f2: (a), (b) are for PSO with global star structure; (c), (d) are for PSO with local ring structure.
Figure 3.9: Population diversity change while PSO solves the multimodal Rosenbrock function f5: (a), (b) are for PSO with global star structure; (c), (d) are for PSO with local ring structure.
Figure 3.10: Population diversity change while PSO solves the multimodal Ackley function f8: (a), (b) are for PSO with global star structure; (c), (d) are for PSO with local ring structure.
Figure 3.10 displays the population diversity change while PSO solves the multimodal Ackley function f8. The Ackley function is a separable problem [218]. The particles easily get "stuck in" local optima, and the "hardness" of this problem increases when the problem has a large scale.

From the observation of the population diversities in these figures, some conclusions can be drawn as follows.

• The population diversities defined in this thesis can be divided into three categories: position diversity, velocity diversity, and cognitive diversity. These diversity definitions measure the distributions of particles' positions, velocities, and cognitive positions, respectively.

• For position diversity and cognitive diversity, the value of the diversity can be seen as the "average distance" among particles. A large value indicates that the particles are distributed over a large area; on the contrary, a small value indicates that the particles are clustered in a small region.

• For velocity diversity, the value of the diversity measures the "search potential" of the particles. A large value indicates that the particles are exploring the search space and have a high possibility of "jumping out" of local optima. On the contrary, a small value indicates that the particles are exploiting the potential regions.

• The population diversities decrease during the search process. This indicates that the particles are initially distributed over a large area and converge over the iterations. This reflects the particles' search process: particles spread across the search space at first, then converge into local optima through the propagation of search information [36].

• For dimension-wise population diversity, the changing curves of position, velocity, and cognitive diversity are very similar. These three definitions of population diversity are dependent on each other in the canonical particle swarm optimization algorithm. The cognitive diversity can be seen as a simplified version of the position diversity, and the position diversity vibrates more than the cognitive diversity. The change of velocity diversity affects the value of position diversity: with a large velocity diversity, particles search across a large region and are distributed over that region, which leads to a large position diversity. On the contrary, with a small velocity diversity, each particle's position moves only slightly and all particles stay in a small region, which leads to a small position diversity.

• Without an additional diversity maintenance strategy [32, 33, 120], such as inserting randomly generated individuals, niching, re-initialization, or reformulating the fitness function to consider the age of individuals, the population diversities usually decrease throughout the search process. This indicates that the canonical particle swarm optimization algorithm finds it difficult to "jump out" of local optima. This is a trade-off in the search process: it is difficult to recognize local optima during the search, and an improper additional strategy may make the algorithm ineffective.

• For element-wise population diversity, position diversity and cognitive diversity remain at stable values over the iterations. This definition combines all dimensions together; for a problem with different optimal values in different dimensions, this kind of definition may obscure the differences between dimensions [28].

• The diversity measurements give some useful information about the particles' search process. Position diversity and velocity diversity always vibrate continuously, which shows that particles "fly" from one side of the optimum to the other on each dimension continually [213].

Normalized Population Diversity Comparison

Figures 3.11, 3.12, 3.13, and 3.14 display the curves of the different normalized population diversities when PSO solves the unimodal functions f0 and f2, respectively. Figure 3.11 displays the population diversities of PSO with global star structure, and Figure 3.12 displays the population diversities of PSO with local ring structure. The population diversity of PSO with star structure decreases faster than that of PSO with ring structure at the beginning of the search; however, PSO with ring structure can reach a smaller value than PSO with star structure, which indicates that PSO with ring structure has better exploitation ability. Figure 3.13 displays the population diversities of PSO with global star structure, and Figure 3.14 displays the population diversities of PSO with local ring structure. This problem is difficult to solve, and the result is not good: the population diversity does not decrease significantly during the search process.

Figures 3.15, 3.16, 3.17, and 3.18 display the curves of the different normalized population diversities when PSO solves the multimodal functions f5 and f8, respectively. Figures 3.15 and 3.17 display the population diversities of PSO with global star structure when solving the multimodal functions f5 and f8, respectively, and Figures 3.16 and 3.18 display those of PSO with local ring structure. The population diversities of PSO with global star structure vibrate more than those of PSO with local ring structure.

From the observation of the different normalized population diversities in these figures, some conclusions can be drawn as follows.
Figure 3.11: Comparison of different normalizations of PSO population diversity: PSO with global star structure on the unimodal Parabolic function f0; (a), (c), and (e) are dimension-wise diversity; (b), (d), and (f) are element-wise diversity.
Figure 3.12: Comparison of different normalizations of PSO population diversity: PSO with local ring structure on the unimodal Parabolic function f0; (a), (c), and (e) are dimension-wise diversity; (b), (d), and (f) are element-wise diversity.
Figure 3.13: Comparison of different normalizations of PSO population diversity: PSO with global star structure on the unimodal Schwefel's P1.2 function f2; (a), (c), and (e) are dimension-wise diversity; (b), (d), and (f) are element-wise diversity.
Figure 3.14: Comparison of different normalizations of PSO population diversity: PSO with local ring structure on the unimodal Schwefel's P1.2 function f2; (a), (c), and (e) are dimension-wise diversity; (b), (d), and (f) are element-wise diversity.
Figure 3.15: Comparison of different normalizations of PSO population diversity: PSO with global star structure on the multimodal Rosenbrock function f5; (a), (c), and (e) are dimension-wise diversity; (b), (d), and (f) are element-wise diversity.
Figure 3.16: Comparison of different normalizations of PSO population diversity: PSO with local ring structure on the multimodal Rosenbrock function f5; (a), (c), and (e) are dimension-wise diversity; (b), (d), and (f) are element-wise diversity.
Figure 3.17: Comparison of different normalizations of PSO population diversity: PSO with global star structure on the multimodal Ackley function f8; (a), (c), and (e) are dimension-wise diversity; (b), (d), and (f) are element-wise diversity.
Figure 3.18: Comparison of different normalizations of PSO population diversity: PSO with local ring structure on the multimodal Ackley function f8; (a), (c), and (e) are dimension-wise diversity; (b), (d), and (f) are element-wise diversity.
• In general, the curves of the different normalized population diversities are very similar. Even with different strategies for normalizing the population diversities, the tendency of the population diversity change is the same.

• The value of the vector L1 norm is always larger than the value of the vector L∞ norm, and in most cases the position maximum is larger than the vector L∞ norm and smaller than the vector L1 norm. As a consequence, the diversity normalized by the vector L∞ norm has the largest value, and normally the diversity normalized by the vector L1 norm has the smallest value.

• Particles have large "average" velocities at the beginning of the search, and the velocities decrease to small values after some iterations. The sum of velocity values may be smaller than the velocity maximum on some benchmark functions. For velocity diversity, the velocity maximum is a constant value during the search process, and the velocity diversity normalized by the velocity maximum vibrates more than the diversities normalized by the other strategies. The velocity-maximum-based velocity diversity gives more accurate information on the "average" velocity changing behavior, while the other strategies give a ratio between the "average" velocity and the current largest velocity value (or the sum of the current velocity values).

• There is a significant decrease in the dimension-wise population diversities, and they also vibrate more than the element-wise population diversities. For dimension-wise population diversities, the measurement is based on each single dimension, and particles are observed as "flying" around the center of each dimension. On the contrary, for element-wise population diversities, the measurement is based on all dimensions together, and particles are observed as "flying" around the center of the whole space.

• Detailed information about the particles' movement can be obtained from the dimension-wise population diversities. The element-wise diversities give a "dynamic equilibrium" of the particles, which indicates a dynamic search range during optimization.
3.5.7 Conclusions

This section proposed an analysis of population diversity based on the categories of dimension-wise and element-wise diversity. Each dimension is measured separately in the dimension-wise diversity; on the contrary, the element-wise diversity measures all dimensions together. In other words, each vector of the position matrix is independent in the dimension-wise diversity measurement, while in the element-wise diversity observation the vectors are considered together. Considering whether the vectors depend on each other or not, vector norms are preferred for normalizing dimension-wise population diversity, and the matrix L∞ norm is preferred for element-wise population diversity. Whether the particles are in a state of "expansion" or "convergence" can be determined by this diversity measurement.

From the normalized population diversity, the diversity changes of PSO variants on different types of functions can be compared and analyzed. The particles' dynamical search state, the "hardness" of the function, the number of local optima, and other information can be obtained. With this information, the performance of an optimization algorithm can be improved by adjusting the population diversity dynamically during the PSO search process. Particles with different topology structures also have different vector dependence in the position or velocity matrix. Investigating the influence of the PSO topology structure, together with an analysis of the partial dependence between vectors, is research that needs to be explored further.

The idea of normalized population diversity measurement can also be applied to other evolutionary algorithms, e.g., the genetic algorithm and differential evolution, because evolutionary algorithms share the same concepts of current population solutions and search step. The performance of evolutionary algorithms can be improved by utilizing this measurement of population diversity. Dynamically adjusting the population diversity controls the algorithm's ability of exploration or exploitation; hence, the algorithm has a higher possibility of reaching the optimum.
Chapter 4

Population Diversity Observation

4.1 Overview

In this chapter, population diversity based boundary constraints handling strategies, and population diversity in single- and multi-objective optimization, are observed and discussed.
4.2 Population Diversity based Boundary Constraints Handling Analysis

Most reported optimization methods are designed to avoid premature convergence in solving multimodal problems [16]. However, premature convergence also happens in solving unimodal problems when the algorithm has an improper boundary constraint handling method. For example, even for the simplest benchmark function, the Sphere (also termed the Parabolic problem), which has a convex curve in each dimension, particles may get "stuck in" the boundary, and the applied PSO algorithm therefore cannot find the global optimum during its search process. With regard to this, premature convergence needs to be addressed in both unimodal and multimodal problems. Avoiding premature convergence is important in problem optimization, i.e., an algorithm should keep a balance between fast convergence speed and the ability of "jumping out" of local optima. Particles fly in the search space; if they get clustered together in a short time, they will lose their "search potential." Premature convergence means particles have a low possibility of exploring new search areas. Although many methods have been reported that are designed to avoid premature convergence [26], these methods do not incorporate an effective way to measure the degree of premature convergence; in other words, the measurement of particles' exploration / exploitation still needs to be investigated. There are several definitions of diversity measurement based on particles' positions [207, 208]. Through diversity measurements, useful exploration and/or exploitation search information can be obtained.

PSO is simple in concept and easy to implement; however, there are still many issues that need to be considered [132]. Boundary constraint handling is one of them [238]. In this section, different boundary constraints handling methods and their impacts are discussed. Position diversity will be measured and analyzed for PSO with different boundary constraints handling strategies and different topology structures.
4.2.1 Boundary Constraints Handling

This section presents a brief survey of the main existing methods in the literature that deal with boundary constraints. Even though PSO is simple and easy to implement, there are still some issues that need to be considered [132], and boundary constraints handling is one of them. There are different strategies for handling a particle's position when the particle exceeds its boundary limit.

"Stuck in" the Boundary

Algorithms are generally tested on the standard benchmark functions for the purpose of comparison. These functions have an optimum in the center of the solution space [249]. However, for real problems, we do not know the location of an optimum, and the optimum could be at any place in the solution space. With an improper boundary constraints handling strategy, a phenomenon of particles getting "stuck in" the boundary will occur. A classic boundary constraint handling strategy resets a particle to the boundary in one dimension when the particle's position exceeds the boundary in that dimension. If the fitness value of the particle at the boundary is better than that of the other particles, all particles in its neighborhood will move toward the boundary in this dimension. If the particles cannot find a position with a better fitness value, all of them will get "stuck in" the boundary in this dimension. It is difficult for a particle to "jump out" of the boundary even if we increase the total number of fitness evaluations or the maximum number of iterations, and this phenomenon occurs more frequently for high-dimensional problems.

Classical Strategy

The conventional boundary handling methods try to keep the particles inside the feasible search space S. Search information is obtained while particles fly in the search space; however, if a particle's position exceeds the boundary limit in one dimension at one iteration, that search information will be abandoned, and instead a new position will be assigned to the particle in that dimension. The classic strategy is to set the particle at the boundary when it exceeds the boundary [257]. The equation of this strategy is as follows:

$$x_{i,j,G+1} = \begin{cases} X_{max,j} & \text{if } x_{i,j,G+1} > X_{max,j} \\ X_{min,j} & \text{if } x_{i,j,G+1} < X_{min,j} \\ x_{i,j,G+1} & \text{otherwise} \end{cases} \qquad (4.1)$$

where $G$ is the number of the last iteration, and $G+1$ is the number of the current iteration. This strategy resets particles to a particular point, the boundary, which constrains particles to fly within the search space limited by the boundary.

Deterministic Strategy
in [264], which resets a
boundary-violating position to the middle between old position and the boundary. The equation is as follows: xi,j,G+1
1 2 (xi,j,G 1 2 (xi,j,G
+ Xmax,j ) if xi,j,G+1 > Xmax, j + Xmin,j ) if xi,j,G+1 < Xmin,j = x otherwise i,j,G+1
(4.2)
The position in last iteration is used in this strategy. Both classic strategy and this strategy reset a particle to a deterministic position. Stochastic Strategy Eberhart and Shi utilized a stochastic strategy to reset the particles when particles exceed the position boundary [76].
xi,j,G+1
1 Xmax,j − ( 2 rand()(Xmax,j − Xmin,j )) if xi,j,G+1 > Xmax,j Xmin,j + ( 12 Rand()(Xmax,j − Xmin,j )) if xi,j,G+1 < Xmin,j = x otherwise i,j,G+1
(4.3)
where rand() and Rand() are two random functions to generate uniformly distributed random numbers in the range [0, 1]. By this strategy, particles will be reset within the half search space when particles exceed the boundary limit. This will increases the algorithm’s exploration, that is, particles have higher possibilities to explore new search areas. However, it decreases the algorithm’s ability of exploitation at the same time. A particle exceeding the boundary means the global or local optimum may be close to the boundary region. An algorithm should spend more iteration in this region. With the consideration of keeping the ability of exploitation, the resetting scope should be taken into account. For most benchmark functions, particles “fly in” a symmetric search space. With regards to this, Xmax,j = 12 (X top,j − X bottom,j ) = 12 X scope,j and Xmin,j = −Xmax,j . The equation of resetting particle into a special area is as follows: Xmax,j × (rand() × c + 1 − c) if xi,j,G+1 > Xmax,j Xmin,j × (Rand() × c + 1 − c) if xi,j,G+1 < Xmin,j xi,j,G+1 = x otherwise i,j,G+1 72
(4.4)
where c is a parameter that controls the resetting scope. When c = 1, this strategy is the same as equation (4.3), that is, particles are reset within a half space. On the contrary, when c = 0, this strategy is the same as equation (4.1), i.e., it is the same as the classic strategy. The closer c is to 0, the higher the possibility that particles are reset close to the boundary.
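To make the three resetting rules and the scope parameter c concrete, here is a minimal sketch in Python (assuming NumPy and a symmetric search space, as in equation (4.4)); the function name handle_boundary and its signature are ours, not from the thesis.

    import numpy as np

    def handle_boundary(x_new, x_old, x_min, x_max, strategy="classic", c=1.0,
                        rng=np.random):
        # Apply one boundary constraints handling rule to a single dimension.
        # 'classic'       -> Eq. (4.1): stick to the violated bound
        # 'deterministic' -> Eq. (4.2): midpoint of old position and bound
        # 'stochastic'    -> Eq. (4.4); c = 1 reduces to Eq. (4.3) (half space),
        #                    c -> 0 approaches the classic rule
        if x_min <= x_new <= x_max:
            return x_new  # position is feasible: keep the search information
        upper = x_new > x_max
        if strategy == "classic":
            return x_max if upper else x_min
        if strategy == "deterministic":
            return 0.5 * (x_old + (x_max if upper else x_min))
        if strategy == "stochastic":
            bound = x_max if upper else x_min
            return bound * (rng.rand() * c + 1.0 - c)  # assumes x_min = -x_max
        raise ValueError("unknown strategy")

Note that the stochastic branch multiplies the violated bound by a factor in [1 − c, 1], which is exactly equation (4.4) for a symmetric space.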
4.2.2 Experimental Study
Several performance measurements are utilized in the experiments below. The first is the best fitness value attained after a fixed number of iterations; in our case, we report the mean result found after the pre-determined maximum number of iterations. The second is the count t, which indicates how often particles get stuck in the boundary: at the end of each run, we count, for the particle with the best fitness value, the number of dimensions in which it is stuck in the boundary, so the count is larger if a particle is stuck in the boundary in more dimensions. These counts are summed over the 50 runs, and the summed number indicates the frequency with which particles get stuck in the boundary. Standard deviations of the best fitness values are also reported in this section, which characterize the distribution of solutions. Together, these values give a measurement of the goodness of an algorithm.

Benchmark Test Functions

The experiments have been conducted on the benchmark functions listed in Appendix A.1. Without loss of generality, five standard unimodal and five multimodal test functions are selected [152, 249]. All functions are run 50 times to obtain the statistics necessary to compare the different approaches. Every tested function's optimum in the solution space S is shifted to a randomly generated point, with a different value in each dimension, where S ⊆ R^n and R^n is an n-dimensional Euclidean space.

Velocity Constraints

In the experiments, all benchmark functions have Vmin = −Vmax, that is, Vmin has the same magnitude as Vmax but the opposite direction. The velocity is also constrained to limit a particle's search step:

    if v_ij > V_max:
        v_ij = V_max
    elif v_ij < -V_max:
        v_ij = -V_max
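As an illustration of how the t statistic described above can be computed, the sketch below counts, for the best particle of a single run, the dimensions whose positions sit exactly on a bound; the function name is ours, and an exact-equality test is assumed because the classic strategy resets positions exactly onto the boundary.

    import numpy as np

    def stuck_dimensions(best_position, x_min, x_max):
        # Number of dimensions of the best particle that sit on a bound.
        # Summing this count over the 50 runs gives the 't' statistic in the
        # tables below; dividing by (runs * dimensions) gives the percentage.
        pos = np.asarray(best_position, dtype=float)
        return int(np.sum((pos == x_min) | (pos == x_max)))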
Parameter Setting

In all experiments, each PSO has 32 particles, and the parameters are set as in the standard PSO: w = 0.72984 and c1 = c2 = 1.496172 [21]. Each algorithm is run 50 times.
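For reference, one iteration of the standard inertia-weight PSO update with these parameter values, combined with the velocity clamp above, can be sketched as follows; this is a generic illustration with our own array names (nbest denotes the neighborhood best under the chosen topology: star, ring, four clusters, or Von Neumann), not code taken from the thesis.

    import numpy as np

    W, C1, C2 = 0.72984, 1.496172, 1.496172  # standard PSO parameters [21]

    def pso_step(x, v, pbest, nbest, v_max, rng=np.random):
        # One velocity/position update; rows are particles, columns dimensions.
        r1 = rng.rand(*x.shape)
        r2 = rng.rand(*x.shape)
        v = W * v + C1 * r1 * (pbest - x) + C2 * r2 * (nbest - x)
        v = np.clip(v, -v_max, v_max)  # velocity constraint
        return x + v, v

A boundary constraints handling rule such as handle_boundary above would then be applied dimension-wise to the returned positions.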
4.2.3 Experimental Results
Observation of “stuck in” boundary

By applying the classic boundary handling strategy, a position is reset onto the boundary if the position value exceeds the boundary of the search space. Table 4.1 and Table 4.2 give the experimental results of applying this strategy. Each benchmark function is tested with dimensions 25, 50, and 100 to see whether similar observations can be obtained; the maximum number of iterations is set to 1 000, 2 000, and 4 000 for dimensions 25, 50, and 100, respectively. From the results, we can conclude that each algorithm has a different probability of getting “stuck in” the boundary when it is applied to different problems. Problem dimension does not have a significant impact on this probability, at least for the benchmark functions with dimensions 25, 50, and 100 that we tested. Furthermore, generally speaking, the PSO with star structure is more likely to be attracted to and then “stuck in” the boundary, and the PSO with ring structure is less likely to be “stuck in” the boundary. If particles are “stuck in” the boundary, it is difficult for them to “jump out” of the local optima even if we increase the maximum number of fitness evaluations. The number of fitness evaluations for each function with dimension 100 in Table 4.1 and Table 4.2 is 32 × 4 000 = 128 000; we then increase this number to 32 × 10 000 = 320 000.
The experimental results are given in Table 4.3. From Table 4.3, we can see that there is no significant improvement in either the fitness value or the number of particles “stuck in” the boundary. This means that merely increasing the number of fitness evaluations cannot help particles “jump out” of boundary constraints; some techniques should be utilized to prevent particles from converging to the boundary. Table 4.4 gives the experimental results of the algorithm that ignores the boundary constraints. In Tables 4.4 and 4.5, only the PSO with star topology has the “t” column, while PSOs with other topologies do not, because their “t” values are all zeros; for the same reason, the other tables below do not have the “t” column. With this strategy, particles take no action when they meet the boundary. Some tested functions obtain good fitness values with most of the obtained solutions lying outside the search space. This may be suitable for particles flying in a periodic search space [257]; however, most problems have strict boundary constraints for which this strategy is not suitable.
Table 4.1: Results of the strategy that a particle “sticks in” the boundary when it exceeds the boundary constraints. All algorithms are run 50 times; the maximum number of iterations is 1 000, 2 000, and 4 000 for dimensions 25, 50, and 100, respectively. “Mean” is the average of the best fitness values over the runs; “times” t is the number of dimensions in which the particle with the best fitness value is “stuck in” the boundary, summed over the runs; the percentage (t divided by 50 × n) shows the frequency of particles “stuck in” the boundary of the search space.

Func. (fmin)    n     Star Mean      Star times t       Ring Mean     Ring t
f0 (-450.0)     25    4950.914       301 (24.08%)       -441.0717     2 (0.16%)
                50    18512.653      681 (27.24%)       -391.2514     29 (1.16%)
                100   66154.793      1552 (31.04%)      -269.4694     48 (0.96%)
f1 (-330.0)     25    -312.9603      159 (12.72%)       -329.9257     2 (0.16%)
                50    -265.9373      575 (23%)          -326.8749     80 (3.2%)
                100   -170.1893      1212 (24.24%)      -299.1541     395 (7.9%)
f2 (450.0)      25    6556.133       223 (17.84%)       1418.165      55 (4.4%)
                50    59401.841      838 (33.52%)       20755.672     286 (11.44%)
                100   149100.11      1614 (32.28%)      119188.23     762 (15.24%)
f3 (330.0)      25    6483.32        284 (22.72%)       439.7         8 (0.64%)
                50    19175.24       724 (28.96%)       476.58        29 (1.16%)
                100   70026.22       1449 (28.98%)      859.12        36 (0.72%)
f4 (-450.0)     25    -446.843       245 (19.6%)        -449.9543     24 (1.92%)
                50    -386.6677      696 (27.84%)       -449.8241     44 (1.76%)
                100   49.74071       1606 (32.12%)      -448.7530     67 (1.34%)
f5 (120.0)      25    169.2253       319 (25.52%)       120.4066      7 (0.56%)
                50    308.2321       739 (29.56%)       121.6709      49 (1.96%)
                100   676.8980       1415 (28.3%)       122.7015      66 (1.32%)
f6 (-330.0)     25    102391.65      334 (26.72%)       -236.9387     25 (2%)
                50    1075433.72     870 (34.8%)        215.3156      118 (4.72%)
                100   3500148.06     1614 (32.28%)      2311.019      222 (4.44%)
f7 (450.0)      25    529.9848       210 (16.8%)        503.0551      103 (8.24%)
                50    730.7713       445 (17.8%)        638.3773      196 (7.84%)
                100   1168.5505      664 (13.28%)       968.5158      369 (7.38%)
f8 (180.0)      25    192.2881       239 (19.12%)       181.1893      15 (1.2%)
                50    195.2231       420 (16.8%)        182.6727      69 (2.76%)
                100   199.0853       521 (10.42%)       191.6770      258 (5.16%)
f9 (330.0)      25    7773207.1      250 (20%)          3567.791      15 (1.2%)
                50    1.635E08       749 (29.96%)       331.2763      172 (6.88%)
                100   1.063E09       1765 (35.3%)       5394.753      394 (7.88%)
Table 4.2: Results of the strategy that a particle “sticks in” the boundary when it exceeds the boundary constraints, for the four clusters (FC) and Von Neumann (VN) topologies. All algorithms are run 50 times; the maximum number of iterations is 1 000, 2 000, and 4 000 for dimensions 25, 50, and 100, respectively. “Mean” and “times” t are defined as in Table 4.1.

Func. (fmin)    n     FC Mean        FC times t         VN Mean       VN t
f0 (-450.0)     25    347.1880       72 (5.76%)         -64.9233      36 (2.88%)
                50    1284.5929      249 (9.96%)        467.76781     178 (7.12%)
                100   7744.0162      615 (12.3%)        4053.8707     421 (8.42%)
f1 (-330.0)     25    -328.008       38 (3.04%)         -328.354      30 (2.4%)
                50    -314.1489      242 (9.68%)        -323.7258     158 (6.32%)
                100   -254.34974     791 (15.82%)       -273.9211     630 (12.6%)
f2 (450.0)      25    1551.690       82 (6.56%)         1151.919      68 (5.44%)
                50    18687.756      411 (16.44%)       16730.741     327 (13.08%)
                100   97555.705      1049 (20.98%)      93162.75      778 (15.56%)
f3 (330.0)      25    1105.8         86 (6.88%)         876.56        53 (4.24%)
                50    2697.68        238 (9.52%)        1604.98       166 (6.64%)
                100   7688.02        396 (7.92%)        7021.66       311 (6.22%)
f4 (-450.0)     25    -449.836       61 (4.88%)         -449.901      33 (2.64%)
                50    -447.0686      157 (6.28%)        -448.6615     91 (3.64%)
                100   -418.6275      359 (7.18%)        -433.2899     282 (5.64%)
f5 (120.0)      25    128.1231       88 (7.04%)         123.0943      36 (2.88%)
                50    141.3122       270 (10.8%)        128.9908      187 (7.48%)
                100   194.1575       591 (11.82%)       156.2124      428 (8.56%)
f6 (-330.0)     25    13681.40       107 (8.56%)        1472.597      76 (6.08%)
                50    16784.125      368 (14.72%)       25350.406     287 (11.48%)
                100   167064.11      712 (14.24%)       123572.63     586 (11.72%)
f7 (450.0)      25    500.5270       135 (10.8%)        497.6501      129 (10.32%)
                50    635.7780       281 (11.24%)       612.3373      253 (10.12%)
                100   957.5405       519 (10.38%)       879.3672      479 (9.58%)
f8 (180.0)      25    186.4093       94 (7.52%)         184.0952      64 (5.12%)
                50    188.7666       267 (10.68%)       185.2521      183 (7.32%)
                100   191.9497       490 (9.8%)         189.5658      450 (9%)
f9 (330.0)      25    473424.72      76 (6.08%)         10931.296     54 (4.32%)
                50    5365821.99     345 (13.8%)        2296129.53    291 (11.64%)
                100   26421658.9     789 (15.78%)       6918596.18    646 (12.92%)
Table 4.3: Results of the strategy that a particle stays at the boundary when it exceeds the boundary constraints, with a larger maximum number of iterations, 10 000 (i.e., 32 × 10 000 = 320 000 fitness evaluations). “Best” and “Mean” indicate the best and the average of the best fitness values over the runs; the “times” t indicates the number of dimensions in which the particle with the best fitness value is “stuck in” the boundary. The dimension n is 100. (FC = four clusters, VN = Von Neumann.)

Func. (fmin)   Star Best   Star Mean     Star t           Ring Best   Ring Mean    Ring t
f0 (-450)      14899.01    67822.602     1540 (30.8%)     -450        -256.070     70 (1.4%)
f1 (-330)      -273.65     -176.4286     1225 (24.5%)     -329.748    -297.975     475 (9.5%)
f2 (450)       24164.8     126974.82     1675 (33.5%)     30037.25    68200.2      845 (16.9%)
f3 (330)       28985       71393.22      1528 (30.56%)    332         931.7        45 (0.9%)
f4 (-450)      -338.05     143.0742      1599 (31.98%)    -449.916    -448.580     49 (0.98%)
f5 (120)       317.710     723.8275      1556 (31.12%)    120         123.623      106 (2.12%)
f6 (-330)      508037      3963368.9     1881 (37.62%)    -155.490    1690.53      299 (5.98%)
f7 (450)       960.509     1201.7850     722 (14.44%)     866.29      965.6807     413 (8.26%)
f8 (180)       195.015     199.5501      545 (10.9%)      182.900     193.9069     312 (6.24%)
f9 (330)       7.43E07     9.696E08      1874 (37.48%)    330.015     73541.50     487 (9.74%)

Func. (fmin)   FC Best     FC Mean       FC t             VN Best     VN Mean      VN t
f0 (-450)      577.74      7843.960      604 (12.08%)     -295.86     2944.748     410 (8.2%)
f1 (-330)      -318.57     -237.349      874 (17.48%)     -328.24     -279.874     603 (12.06%)
f2 (450)       16044.7     71934.29      1099 (21.98%)    17995.32    68816.78     1002 (20.04%)
f3 (330)       1428        10083.18      529 (10.58%)     1189        6138.18      371 (7.42%)
f4 (-450)      -449.631    -400.9793     425 (8.5%)       -449.788    -439.6080    284 (5.68%)
f5 (120)       126.806     189.9494      589 (11.96%)     121.857     154.8893     454 (9.08%)
f6 (-330)      1404.81     229710.9      810 (16.2%)      113.27      68857.72     640 (12.8%)
f7 (450)       807.14      971.4570      550 (11%)        808.17      912.055      552 (11.04%)
f8 (180)       186.521     194.2383      526 (10.52%)     183.741     190.306      532 (10.64%)
f9 (330)       364.993     4.265E07      979 (19.58%)     331.475     1.420E07     808 (16.16%)
Table 4.4: Results of the strategy that a particle ignores the boundary when it exceeds the boundary constraints. All algorithms are run 50 times; “Best” and “Mean” indicate the best and the average of the best fitness values over the runs; the “times” t indicates the number of dimensions in which the particle with the best fitness value is “stuck in” the boundary. n is 100, and the maximum iteration number is 4 000.

Func. (fmin)    Star Best   Star Mean    Star t          Ring Best   Ring Mean    Ring t
f0 (-450.0)     -449.999    -449.6162    0 (0%)          -449.999    -449.9999    0 (0%)
f1 (-330.0)     -329.986    -327.3012    1 (0.02%)       -329.999    -329.9900    0 (0%)
f2 (450.0)      20558.33    35458.4      396 (7.92%)     78181.64    116972.62    519 (10.38%)
f3 (330.0)      443         3343.06      39 (0.78%)      330         334.28       2 (0.04%)
f4 (-450.0)     -449.932    -449.0495    191 (3.82%)     -449.783    -449.6915    126 (2.52%)
f5 (120.0)      120.000     120.4391     1 (0.02%)       120.000     120.0007     0 (0%)
f6 (-330.0)     -449.783    -87.3399     216 (4.32%)     -147.687    -15.65356    162 (3.24%)
f7 (450.0)      815.14      1058.2517    412 (8.24%)     827.52      988.66426    265 (5.3%)
f8 (180.0)      192.795     199.47550    395 (7.9%)      182.579     199.26132    350 (7.0%)
f9 (330.0)      330.190     331.5831     117 (2.34%)     332.755     336.1132     201 (4.02%)

Func. (fmin)    FC Best     FC Mean      FC t            VN Best     VN Mean      VN t
f0 (-450.0)     -449.999    -449.999     0 (0%)          -449.999    -449.9999    0 (0%)
f1 (-330.0)     -329.999    -329.9998    0 (0%)          -329.999    -329.999     0 (0%)
f2 (450.0)      36753.95    66411.14     435 (8.7%)      42283.00    71154.16     425 (8.5%)
f3 (330.0)      331         353.92       4 (0.08%)       332         363.56       4 (0.08%)
f4 (-450.0)     -449.899    -449.833     172 (3.44%)     -449.894    -449.835     154 (3.08%)
f5 (120.0)      120.000     120.0176     0 (0%)          120.000     120.0205     0 (0%)
f6 (-330.0)     -230.711    -107.963     164 (3.28%)     -198.206    -92.0036     157 (3.14%)
f7 (450.0)      815.149     945.3926     303 (6.06%)     745.501     892.9344     293 (5.86%)
f8 (180.0)      182.170     194.9949     286 (5.72%)     181.027     184.6948     89 (1.78%)
f9 (330.0)      330.258     332.4632     130 (2.6%)      330.192     332.1072     130 (2.6%)
Comparison of PSOs with different boundary constraint handling techniques

Table 4.5 shows the results of PSOs with the deterministic strategy: a particle takes the middle value between its former position and the boundary limit when it meets the boundary constraint. PSOs with ring, four clusters, and Von Neumann structures can obtain good fitness values with this strategy. However, getting “stuck in” the boundary still happens for the PSO with star structure on most problems, because particles with the star structure progressively move to the boundary; with this tendency, particles get clustered together at the boundary and find it difficult to “jump out.” Therefore, the exploration ability decreases over the iterations.

Table 4.5: Results of PSO with a deterministic boundary constraint strategy. Particles are reset to the middle between the old position and the limit when a particle's position exceeds the boundary. “Best” and “Mean” indicate the best and the average of the best fitness values over the runs; the “times” t indicates the number of dimensions in which the particle with the best fitness value is “stuck in” the boundary. n is 100, and the maximum iteration number is 4 000.

Func. (fmin)    Star Best   Star Mean    Star t           Ring Best   Ring Mean    Ring t
f0 (-450.0)     596.555     18125.68     523 (10.46%)     -449.999    -449.999     0 (0%)
f1 (-330.0)     -311.238    -267.2679    368 (7.36%)      -329.999    -324.0182    2 (0.04%)
f2 (450.0)      24296.1     63749.30     11 (0.22%)       36603.1     95768.66     2 (0.04%)
f3 (330.0)      4575        27628.66     103 (2.06%)      330         343.04       0 (0%)
f4 (-450.0)     -437.838    -300.7705    0 (0%)           -449.862    -449.762     0 (0%)
f5 (120.0)      136.432     271.0433     508 (10.16%)     120.000     120.0399     0 (0%)
f6 (-330.0)     22901.3     627095.6     657 (13.14%)     -205.629    -46.8004     4 (0.08%)
f7 (450.0)      877.643     1104.514     20 (0.4%)        713.002     944.5443     10 (0.2%)
f8 (180.0)      188.842     198.1596     67 (1.34%)       181.561     183.1589     0 (0%)
f9 (330.0)      338.848     8.864E07     333 (6.66%)      330.469     333.7982     5 (0.1%)

Func. (fmin)    FC Best     FC Mean      FC t             VN Best     VN Mean      VN t
f0 (-450.0)     -450        -409.4909    11 (0.22%)       -450        -420.4673    1 (0.02%)
f1 (-330.0)     -329.999    -315.4557    44 (0.88%)       -329.999    -324.081     8 (0.16%)
f2 (450.0)      30997.7     61791.36     7 (0.14%)        30204.5     58886.36     0 (0%)
f3 (330.0)      333         1378.16      0 (0%)           336         719.52       0 (0%)
f4 (-450.0)     -449.903    -449.0363    0 (0%)           -449.911    -449.3194    0 (0%)
f5 (120.0)      120         122.2650     12 (0.24%)       120         120.8317     4 (0.08%)
f6 (-330.0)     -243.193    360.9332     28 (0.56%)       -241.247    66.0810      5 (0.1%)
f7 (450.0)      799.788     944.4611     17 (0.34%)       714.706     839.2663     6 (0.12%)
f8 (180.0)      181.980     185.4077     7 (0.14%)        181.604     183.4269     1 (0.02%)
f9 (330.0)      330.186     339.5574     37 (0.74%)       330.001     331.4686     7 (0.14%)
With a stochastic strategy, a particle is reset to a random position when it meets the boundary. Table 4.6 gives the results of PSOs with the stochastic strategy, that is, a particle is reset within the upper half of the search space when it meets the upper bound and, correspondingly, within the lower half of the search space when it meets the lower bound. Compared with the classic strategy and the deterministic strategy, this strategy improves the result of PSO with the star structure, but it does not obtain better optimization performance for PSOs with the other structures in this thesis.

Table 4.6: Results of the strategy that a particle is randomly re-initialized within the half search space when the particle meets the boundary constraints. n is 100, and the maximum iteration number is 4 000.

Func. (fmin)    Star Best   Star Mean    Star Std.    Ring Best   Ring Mean    Ring Std.
f0 (-450.0)     7428.05     15106.48     5401.91      4678.67     6395.58      1164.00
f1 (-330.0)     -251.373    -204.5540    27.5700      -294.982    -283.7226    5.04729
f2 (450.0)      35505.13    90226.81     38529.5      74693.3     128790.1     28617.9
f3 (330.0)      7900        16437.02     4735.01      5300        7593.56      1181.77
f4 (-450.0)     -447.784    -442.9050    5.45545      -449.000    -447.7014    0.65199
f5 (120.0)      204.206     266.7711     36.4327      163.9248    186.8523     12.4562
f6 (-330.0)     15229.6     68586.66     68055.9      3104.51     16834.99     6821.93
f7 (450.0)      743.511     903.1275     69.4440      824.154     928.1833     51.2752
f8 (180.0)      191.479     193.9104     1.33005      189.054     190.0964     0.48874
f9 (330.0)      340.764     1834.003     8708.16      368.265     2073.147     3587.60

Func. (fmin)    FC Best     FC Mean      FC Std.      VN Best     VN Mean      VN Std.
f0 (-450.0)     3885.38     6456.285     1471.81      3706.33     5777.201     1252.94
f1 (-330.0)     -293.296    -282.5310    7.37408      -296.048    -286.151     5.52508
f2 (450.0)      39649.4     86885.8      31845.2      42464.4     89231.03     33283.2
f3 (330.0)      4526        6954.9       1147.18      4864        6874.76      1237.94
f4 (-450.0)     -448.865    -447.9193    0.61041      -449.337    -448.160     0.49077
f5 (120.0)      165.289     180.9738     10.2183      153.942     178.7420     11.1088
f6 (-330.0)     3729.22     13083.35     6063.86      1909.17     12442.67     4804.74
f7 (450.0)      731.572     870.4047     72.8032      717.645     823.6537     59.7845
f8 (180.0)      188.993     190.1364     0.62768      188.493     189.6701     0.61684
f9 (330.0)      344.433     489.1269     220.710      341.014     406.0229     121.916
In Table 4.6, particles are reset within a half space when they meet the boundary. This increases an algorithm's ability of exploration but decreases its ability of exploitation. A particle being close to the boundary may mean that the optimal area is near the boundary, so the resetting area should be restricted. Table 4.7 gives the results with the resetting area limited to [0.9 Xmax, Xmax] when a particle meets the upper bound and [Xmin, 0.9 Xmin] when a particle meets the lower bound (the lower bound being smaller than zero); this strategy can obtain better results. In both Table 4.6 and Table 4.7, the resetting area does not change during the whole search process. Intuitively, at the beginning of the search process we want a large ability of exploration and a small ability of exploitation, in order to search more areas of the search space [203, 205]; correspondingly, at the end of the search process, the exploitation ability should be favored, to find an optimum in “good” areas. With regard to this, the resetting space should change dynamically during the search. Table 4.8 gives the results of the strategy in which the resetting space linearly decreases during the search process.
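The thesis does not spell out the exact schedule for the linearly decreasing resetting space; one plausible reading, used here only as an illustration, is a linear interpolation of the scope parameter c of equation (4.4) over the run:

    def reset_scope(iteration, max_iteration, c_start=1.0, c_end=0.0):
        # Linearly shrink the resetting scope c of Eq. (4.4):
        # c = c_start at the start (half-space resetting, Eq. (4.3))
        # and c -> c_end at the end (resetting close to the boundary).
        frac = iteration / float(max_iteration)
        return c_start + (c_end - c_start) * frac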
Table 4.7: Results of the strategy that a particle is randomly re-initialized within a limited search space when the particle meets the boundary constraints. n is 100, and the maximum iteration number is 4 000.

Func. (fmin)    Star Best   Star Mean    Star Std.    Ring Best   Ring Mean    Ring Std.
f0 (-450.0)     -449.314    -421.0384    106.757      -449.892    -446.7030    1.30163
f1 (-330.0)     -329.873    -315.9906    14.7339      -329.673    -329.2227    0.15527
f2 (450.0)      16133.02    35799.3      13401.0      52753.1     99311.64     17424.3
f3 (330.0)      464         2592.84      2381.52      340         351.12       6.55634
f4 (-450.0)     -449.928    -449.5926    0.90197      -449.869    -449.7613    0.05515
f5 (120.0)      120.086     121.8713     3.27914      120.619     120.9374     0.09021
f6 (-330.0)     -151.560    271.2483     757.2085     -186.020    111.6548     176.704
f7 (450.0)      898.665     1059.507     86.4277      834.263     971.4921     74.6199
f8 (180.0)      186.825     197.5319     3.14876      181.376     183.0840     3.43442
f9 (330.0)      330.459     331.9493     1.01976      330.883     334.1176     1.36487

Func. (fmin)    FC Best     FC Mean      FC Std.      VN Best     VN Mean      VN Std.
f0 (-450.0)     -449.483    -448.672     0.54416      -449.381    -448.2624    0.80095
f1 (-330.0)     -329.747    -329.1543    2.61051      -329.659    -329.423     0.52019
f2 (450.0)      32535.01    56011.55     11425.0      40768.18    56913.22     10526.6
f3 (330.0)      336         359.58       45.4612      331         368.96       77.6200
f4 (-450.0)     -449.929    -449.869     0.02910      -449.920    -449.8604    0.03410
f5 (120.0)      120.009     120.518      0.12078      120.452     120.6400     0.10262
f6 (-330.0)     -201.233    23.6072      144.996      -196.464    1.65853      120.590
f7 (450.0)      758.107     930.1978     105.099      720.429     838.011      57.7271
f8 (180.0)      181.410     182.9285     2.46308      181.222     182.322      0.54432
f9 (330.0)      330.223     332.1842     1.40831      330.130     331.255      0.91798
Table 4.8: Results of the strategy that particles are randomly re-initialized in a linearly decreased search space when particles meet the boundary constraints. n is 100, and the maximum iteration number is 4 000.

Func. (fmin)    Star Best   Star Mean    Star Std.    Ring Best   Ring Mean    Ring Std.
f0 (-450.0)     -172.846    1702.321     2184.342     -427.502    -408.3616    10.8057
f1 (-330.0)     -314.970    -264.9917    22.4393      -327.863    -327.3860    0.269110
f2 (450.0)      17526.88    53393.69     19574.62     53918.5     90638.595    19706.68
f3 (330.0)      1192        5143.02      3071.211     380         420.66       17.4489
f4 (-450.0)     -449.696    -448.6407    1.94783      -449.733    -449.5890    0.08245
f5 (120.0)      124.718     143.2678     16.3970      121.226     121.3696     0.09049
f6 (-330.0)     450.528     9488.63      18895.48     264.641     695.2724     429.340
f7 (450.0)      781.380     932.3062     74.4018      800.072     929.7192     60.0003
f8 (180.0)      189.392     193.2042     2.50837      182.314     183.1691     0.41722
f9 (330.0)      332.590     335.4259     1.71372      333.175     338.8566     3.03905

Func. (fmin)    FC Best     FC Mean      FC Std.      VN Best     VN Mean      VN Std.
f0 (-450.0)     -433.730    -399.1756    53.2334      -434.935    -404.6038    49.4888
f1 (-330.0)     -328.632    -326.7792    2.1305       -328.653    -327.568     0.72114
f2 (450.0)      30471.06    59016.26     16850.51     27308.29    57652.79     17353.16
f3 (330.0)      368         495.92       152.597      371         484.9        180.2659
f4 (-450.0)     -449.831    -449.7233    0.07594      -449.859    -449.730     0.075108
f5 (120.0)      121.116     121.4365     0.42078      121.132     121.3901     0.284098
f6 (-330.0)     34.5858     461.9876     413.845      -48.239     341.310      246.16785
f7 (450.0)      771.370     894.0441     61.8244      747.492     835.023      47.61768
f8 (180.0)      182.258     184.1428     1.08565      182.485     183.6688     0.97433
f9 (330.0)      331.803     335.0429     1.57895      331.147     334.112      1.282631
By examining the experimental results, it is clear that different boundary constraints handling techniques have different impacts on particles' diversity changing and on optimization performance. The deterministic strategy fits PSOs with ring, four clusters, and Von Neumann structures. Resetting particles randomly in a small area fits all four topologies utilized in this section; using this strategy, the PSO with star structure has a good balance between the abilities of exploration and exploitation, and it obtains the best performance among the strategies considered.
4.2.4 Population Diversity Analysis and Discussion
Without loss of generality, and for the purpose of simplicity and clarity, the results for one function from the five unimodal benchmark functions and one function from the five multimodal functions are displayed, because the others are similar. There are several definitions of the measurement of population diversity [30, 207, 208]; the dimension-wise population diversity based on the L1 norm is utilized in this section.
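A sketch of this measurement is given below, assuming the common dimension-wise L1 definition (mean absolute deviation of the particles from the swarm's mean position in each dimension, averaged over dimensions for the whole diversity); the exact normalization used in the thesis may differ, and the function name is ours.

    import numpy as np

    def position_diversity(positions):
        # positions: (m, n) array, one row per particle.
        # Returns the L1 diversity per dimension and its average over dimensions.
        mean_pos = positions.mean(axis=0)                    # swarm center
        per_dim = np.abs(positions - mean_pos).mean(axis=0)  # L1 diversity per dim
        return per_dim, float(per_dim.mean())

The minimum, middle, and maximum dimensional diversity curves plotted below correspond to the smallest, median, and largest entries of per_dim at each iteration.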
Position Diversity Monitoring

Figure 4.1 and Figure 4.2 display the position diversity changing curves when PSO is applied to solve benchmark functions; Figure 4.1 displays the curves for the unimodal function f0, and Figure 4.2 for the multimodal function f5. In both figures, (a) is for the classic boundary handling technique, (b) for particles ignoring the boundary, (c) for particles moved gradually toward the boundary (the deterministic strategy), (d) for particles reset in the half search space, (e) for a limited resetting space in a small range near the boundary, and (f) for a linearly decreased resetting scope, respectively. Figure 4.1 and Figure 4.2 display the position diversity changing curves of particles with the four kinds of topologies. Some conclusions can be made: PSO with star topology has the most rapidly decreasing position diversity curve, and PSO with ring topology can generally keep its diversity over a large number of iterations. PSO with four clusters structure and with Von Neumann structure also keep their diversity well during the search process, and their diversity changing curves are smooth most of the time. The impact of the different boundary constraint handling strategies on the position diversity can also be seen from these two figures. The values of position diversity become very small when we utilize the classic or deterministic strategies; on the contrary, the position diversity is kept at a “large” value when we utilize a stochastic strategy. The different position diversity changing curves indicate that particles get clustered into a small region when we utilize a classic or deterministic strategy, and that particles are distributed in a large region when we utilize a stochastic strategy. The changing curves of position diversity reveal the algorithm's ability of exploration and/or exploitation: the position diversity of PSO with a stochastic strategy keeps a “large” value, which indicates that with this strategy PSO has a good exploration ability.

Position Diversity Comparison

Different topology structures and boundary constraint handling methods have different impacts on PSO algorithms' convergence. Figure 4.3 and the figures below give some comparisons among PSOs with different structures. There are four curves in each figure: the minimum, middle, and maximum dimensional position diversity, and the position diversity as a whole. It should be noted that the dimension which has the minimum, middle, or maximum value is not fixed and may change over iterations; in other words, if dimension i has the minimum value at iteration k, it may be dimension j that has the minimum value at iteration k + 1. The figures only display position diversity's minimum, middle, and maximum values at each iteration.
Figure 4.1: Position diversity changing curves for PSO solving parabolic function f0 with different strategies: (a) classic, (b) cross, (c) deterministic, (d) stochastic, (e) limit, (f) linear. [Each panel plots log-scale diversity curves over 4 000 iterations for the star, ring, four clusters, and von Neumann topologies.]
Figure 4.2: Position diversity changing curves for PSO solving multimodal function f5 with different strategies: (a) classic, (b) cross, (c) deterministic, (d) stochastic, (e) limit, (f) linear. [Each panel plots log-scale diversity curves over 4 000 iterations for the star, ring, four clusters, and von Neumann topologies.]
Figure 4.3: Comparison of PSO population diversities for solving unimodal function f0 with classic boundary constraints handling techniques. [Panels: (a) Star, (b) Ring, (c) Four Clusters, (d) Von Neumann; each plots the minimum, middle, and maximum dimensional position diversity and the whole diversity over 4 000 iterations, log scale.]
Figure 4.3 and Figure 4.4 display the position diversity changing curves of a PSO with the classic boundary constraints handling method solving unimodal function f0 and multimodal function f5, respectively. The four subfigures display the PSO with star, ring, four clusters, and Von Neumann topology, respectively. As can be seen from the figures, the dimensional minimum value of position diversity quickly goes to zero for PSO with star, four clusters, and Von Neumann topology, while it stays positive during the whole search process for the PSO with ring topology.
Figure 4.4: Comparison of PSO population diversities for solving multimodal function f5 with classic boundary constraints handling techniques. [Panels as in Figure 4.3.]

Compared with other topologies, the position diversity of PSO with star structure reaches the smallest value at the earliest iteration numbers, which means particles have clustered together in a small region and any particle generally has the smallest distance to the other particles. On the contrary, the position diversity of PSO with ring structure has the largest value, which means particles are distributed in a large region and any particle generally has the largest distance to the other particles.

Figure 4.5 and Figure 4.6 display the position diversity curves of PSO with the strategy that a particle ignores the boundary constraints when its position exceeds the limit; Figure 4.5 is for f0, and Figure 4.6 is for f5. Particles can keep their search “potential” with this strategy: the position diversity decreases over the whole search process but does not reach zero at the end of each run.
Figure 4.5: Comparison of PSO population diversities for solving unimodal function f0 with exceeding boundary constraints handling techniques. [Panels as in Figure 4.3.]
Figure 4.6: Comparison of PSO population diversities for solving multimodal function f5 with exceeding boundary constraints handling techniques. [Panels as in Figure 4.3.]

Figure 4.7 and Figure 4.8 display the position diversity changing curves of PSO with a deterministic boundary handling strategy on unimodal function f0 and multimodal function f5. PSO with star topology is easily “stuck in” the boundary with this strategy, and the minimum of position diversity quickly becomes zero in Figure 4.7 (a) and Figure 4.8 (a); the PSOs with the other three topologies have some ability to “jump out” of local optima. Figure 4.7 (c) and Figure 4.8 (c) display the diversity changing curves of PSO with four clusters structure, and Figure 4.7 (d) and Figure 4.8 (d) those of PSO with Von Neumann structure. From Figure 4.7 (c), (d) and Figure 4.8 (c), (d), we can observe dramatic “up and down” changes of the position diversity curve, which may mean that, as a whole, the search process is convergent but there are divergent processes embedded in the convergent process.

Figure 4.9 and Figure 4.10 display the position diversity changing curves of PSO with a stochastic boundary constraints handling technique solving unimodal function f0 and multimodal function f5, respectively. By utilizing a half search space resetting technique, the values of position diversity are larger than those of PSOs with other strategies, which means that particles search in a larger region, i.e., the ability of exploration can be kept with this strategy.
Figure 4.7: Comparison of PSO population diversities for solving unimodal function f0 with deterministic boundary constraints handling techniques. [Panels as in Figure 4.3.]
Figure 4.8: Comparison of PSO population diversities for solving multimodal function f5 with deterministic boundary constraints handling techniques. [Panels as in Figure 4.3.]
Figure 4.9: Comparison of PSO population diversities for solving unimodal function f0 with the stochastic boundary constraints handling technique that randomly resets particles in the half search space. [Panels as in Figure 4.3.]
However, the ability of exploitation is decreased when particles get close to the boundary. In general, the distance between any pair of particles is larger than in PSOs with other boundary constraint handling techniques at the same iterations.
Figure 4.10: Comparison of PSO population diversities for solving multimodal function f5 with the stochastic boundary constraints handling technique that randomly resets particles in the half search space. [Panels as in Figure 4.3.]

Figure 4.11 and Figure 4.12 display the position diversity changing curves of PSO with a small resetting area, solving unimodal function f0 and multimodal function f5, respectively; the four subfigures are for PSO with star, ring, four clusters, and Von Neumann topology, respectively. As in the other figures, the position diversities of PSO with star topology have the smallest values, and those of PSO with ring topology have the largest values.

Figure 4.13 and Figure 4.14 display the position diversity changing curves of PSO with a linearly decreased resetting space, solving unimodal function f0 and multimodal function f5, respectively. PSO with this strategy can have a good balance between exploration and exploitation. From the figures we can see that the minimum position diversity is kept at a small value, but not zero, during the whole search process. This means that particles can exploit some specific areas and, at the same time, will not be clustered together in those areas. Particles can “jump out” of local optima with this strategy, and the experimental results also show that PSO with this strategy can achieve good performance.
Figure 4.11: Comparison of PSO population diversities for solving unimodal function f0 with the stochastic boundary constraints handling technique that randomly resets particles in a small search space close to the boundary. [Panels as in Figure 4.3.]
Figure 4.12: Comparison of PSO population diversities for solving multimodal function f5 with the stochastic boundary constraints handling technique that randomly resets particles in a small search space close to the boundary. [Panels as in Figure 4.3.]
Figure 4.13: Comparison of PSO population diversities for solving unimodal function f0 with the stochastic boundary constraints handling technique that randomly resets particles in a linearly decreased search space. [Panels as in Figure 4.3.]
Figure 4.14: Comparison of PSO population diversities for solving multimodal function f5 with the stochastic boundary constraints handling technique that randomly resets particles in a linearly decreased search space. [Panels as in Figure 4.3.]

From Figures 4.3 to 4.14 we can see that PSO with star topology achieves the smallest value of position diversity, and PSO with ring topology has the largest value, at the same number of iterations; PSO with four clusters and with Von Neumann structure have nearly the same diversity curves in our experiments. In summary, the PSO with star topology has the greatest ability to exploit a small area at the same number of iterations and, in contrast, the PSO with ring topology has the greatest ability to explore new search areas.

The search “potential” of particles is important to an algorithm's performance. Particles “fly” in a limited area; to ensure the performance of the algorithm, not only the center of the search area but also the areas close to the boundary should be searched carefully. Some strategy must be utilized because, if we take no action, particles can easily cross the boundary limit and never return to the “limited” search area. A frequently used method is to reset a particle's position when the particle meets the boundary.
This method also has some drawbacks. Resetting a particle's position at a specific location decreases the particle's search “potential”, and the abilities of exploration and exploitation are also affected; on the other hand, resetting particles over a large area decreases the algorithm's ability of exploitation, and particles have difficulty exploiting the solution areas near the boundary. From the experimental results of applying a deterministic strategy and three variants of the stochastic strategy, we observe that the deterministic strategy usually obtains better optimization performance for PSO with ring, four clusters, or Von Neumann structure than the other strategies, at least for the ten benchmark functions and the boundary constraints handling strategies experimented with in this section. A random re-initialization strategy fits PSO with star, four clusters, and Von Neumann structures, and the space of re-initialization should also be considered. This conclusion is also verified by the population diversity observations. Figures 4.7 and 4.8 display the position diversity changing curves of PSO with the deterministic strategy; it can be seen that particles in PSO with star topology easily get clustered together. Some dimensional position diversities quickly become zero, which may mean that all particles stay at the same position and lose their search “potential” in these dimensions. Particles with the four clusters and Von Neumann topologies also cluster to the same position in some dimensions, i.e., the minimum position diversity becomes zero after several iterations. PSO with a random resetting strategy can avoid this problem: the position diversity curves of PSO with a stochastic strategy for handling boundary constraints are displayed in Figures 4.9 to 4.14, and particles can keep their position diversity with this strategy. Considering the algorithm's ability of exploitation, resetting particles in a small or decreasing region generally obtains better performance. PSOs with different topologies have different convergence speeds: PSO with star structure has the fastest convergence speed, PSO with ring structure the slowest, and PSO with four clusters or Von Neumann structure lies between them. Keeping particles' search “potential” and having a good balance of exploration and exploitation is important in the search process. The boundary constraints handling strategy needs to be considered together with the PSO's topology, because a proper strategy can improve the algorithm's performance.
4.2.5 Conclusions
An algorithm's ability of exploration and exploitation is important in the optimization process. With a good exploration ability, an algorithm can explore more areas of the search space and find some potential regions in which “good enough” solutions may exist; on the other hand, an algorithm with the ability of exploitation can finely search the potentially good regions and ultimately find the optimum. An algorithm should have
a good balance between exploration and exploitation during the search process. In this section, we have reviewed different strategies for handling particles that exceed the position boundary constraint. Position diversity changing curves were utilized to study each algorithm variant's ability of exploration and/or exploitation, and the position diversity changing curves of different variants of PSO were compared. From the position diversity measurement, the impacts of different boundary constraint handling strategies on the optimization performance were studied.

Boundary constraints handling can affect particles' search “potential”. The classic method resets particles onto the boundary when they exceed the boundary limit, which may mislead particles to the wrong search area and cause them to get “stuck in” the boundary. The position diversities of PSO with star, ring, four clusters, and Von Neumann topology were examined in this section. PSOs with different topologies have different convergence behavior; from the diversity measurement, the convergence speed and the ability of “jumping out” of local optima can be observed and/or analyzed. A deterministic boundary handling technique may improve the search results of PSO with ring, four clusters, or Von Neumann topology, but not star topology; premature convergence still occurs in PSO with star topology. A stochastic method can avoid the premature convergence, and resetting particles in a small or decreasing region keeps PSO's ability of exploitation and therefore gives better performance.

Besides the boundary constraints handling techniques discussed in this section, there are many other methods, such as “invisible boundaries”, “damping boundaries”, etc. [113, 238]. These methods will have different impacts on the optimization performance of PSO algorithms. Like the methods discussed in this section, they can also be analyzed through position diversity changing curves during the search process. The proper boundary constraint handling method should be considered together with the topology.

As indicated by the “no free lunch” theorem, there is no algorithm that is better than another on average over all problems [232–235]. Different variants of PSO fit different kinds of problems; the comparison between different variants of PSO and their population diversities should be studied when they are applied to solve different problems. The impact of parameter tuning on population diversity for solving different problems also needs to be researched. There are other questions to be studied as well, for example, observing how other definitions of diversity change under different boundary constraints handling strategies. In addition to the position diversity, velocity diversity and cognitive diversity are defined in PSO algorithms [207, 208], and they are unique to PSO algorithms; experimental studies of boundary constraints handling strategies based on velocity diversity and cognitive diversity should also be conducted to gain a better understanding of PSO algorithms.
4.3 Population Diversity in Single and Multi-Objective Optimization
4.3.1 Introduction
Premature convergence occurs when all individuals in a population-based algorithm are trapped in local optima. In this situation, the exploration of the algorithm is greatly reduced, i.e., the algorithm is searching in a narrow space. For continuous optimization problems, there are infinitely many potential solutions; even taking computational precision into consideration, there is still a great number of feasible solutions. The allocation of computational resources should therefore be optimized, i.e., maintaining an appropriate balance between exploration and exploitation is a primary concern.

Different problems have different properties. Single-objective problems can be divided into unimodal and multimodal problems; a multimodal problem has several or numerous optimum solutions, of which many are local optima. It is difficult for optimization algorithms to find the global optimum because it is generally hard to balance fast convergence speed against the ability of “jumping out” of local optima; in other words, we need to avoid premature convergence in the search process. Different computational resource allocation methods should be taken when dealing with different problems, and a good algorithm should balance its exploration and exploitation abilities during its search process.

The concepts of exploration and exploitation were first introduced in organization science [102, 163]. The exploration ability means that an algorithm can explore more of the search space, increasing the possibility of finding “good enough” solution(s); in contrast, the exploitation ability means that an algorithm focuses on the refinement of found promising areas. The population diversity [51, 59, 165, 179] is a measure of individuals' search information: from the distribution of individuals and the change of this distribution, we can obtain the algorithm's status of exploration or exploitation. The population diversity definitions differ for single- and multiobjective optimization algorithms, because multiobjective optimization algorithms use a different fitness metric for solutions. Pareto domination is frequently used in current research to compare two solutions. Pareto domination has many strengths, such as being easy to understand and computationally efficient; however, it has some drawbacks:

• It can only be used to compare two single solutions; for several groups of solutions, Pareto domination makes it difficult to measure which group is better than the others.

• For multiobjective problems with a large number of objectives, almost every
solution is Pareto nondominated [115], so Pareto domination is not appropriate for multiobjective problems with a large number of objectives.

In this section, we analyse and discuss the population diversity of particle swarm optimizers solving single- and multi-objective problems. For multiobjective problems, the population diversity is observed both for the particles and for the solutions in the archive: the population diversity of particles measures the distribution of the positions, while the population diversity of solutions can be used to measure the goodness of a set of solutions. This metric may guide the search in multiobjective problems with numerous objectives. Adaptive optimization algorithms can be designed by controlling the balance between exploration and exploitation.
4.3.2 Multi-Objective Optimization
One of the main differences between single-objective and multiobjective optimization is that multiobjective optimization constitutes a multidimensional objective space. In addition, a set of solutions representing the trade-off among the different objectives, rather than a unique optimal solution, is sought in multiobjective optimization. How to measure the goodness of solutions and the performance of algorithms is important in multiobjective optimization [14, 65]. Although many articles have discussed metrics for multiobjective optimization [79, 139, 178, 268], there is no single metric that overwhelms all the others, and more analysis and discussion are necessary. In this section, we classify variants of performance metrics into three categories: set based metrics, reference point based metrics, and true Pareto front/set based metrics. The properties and drawbacks of the different metrics are discussed and analyzed; from this metrics analysis, more effective algorithms can be designed to solve multiobjective problems.

In this section, population diversities are also monitored for PSO solving single-objective problems. Multiobjective problems have a different optimization goal: not one single solution but many are sought, and the concept of convergence has different meanings for single- and multiple-objective problems [121]. Population diversity is also important when applying PSO to solve multi-objective problems [4]; defining an appropriate population diversity for a multi-objective particle swarm optimization algorithm (MOPSO) and monitoring its change is helpful for understanding the process of the algorithm.

Evolutionary multiobjective optimization (EMO) is a fast-growing area of computational intelligence (CI). Multiobjective optimization problems typically involve several conflicting objectives, and in the absence of preference information a large number of solutions, called the Pareto-optimal solutions, may be of interest. As swarm intelligence and evolutionary computation algorithms deal with a population of solutions, they have a natural advantage in the search for Pareto-optimal solutions (many solutions are found
at the same time). The traditional multiobjective optimization algorithms have two aspects in common: 1. the non-dominated population members are favored over dominated members, and 2. the population members in less-crowded regions are favored over those in crowded regions. The convergence and other computational aspects of multiobjective optimization algorithms are developed in many papers [3, 4, 116, 117], covering topics such as generating interesting and challenging test problems, handling constraints and large numbers of objectives, addressing multiobjective combinatorial optimization problems, developing indicators to measure the performance of algorithms, and applying multiobjective optimization algorithms to various practical applications. The principles of multiobjective optimization algorithms have also been exploited to aid in solving other optimization problems, such as handling uncertainties [61, 98, 119, 180], the dynamic nature of problems, and computationally expensive objectives and constraints, among other things.

Pareto domination is widely utilized to compare two solutions, and many variants of domination have been proposed to compare two collections of solutions: domination rank [214], domination count [94], and domination strength [265] are usually used to assign rank values. A variety of density estimation methods have also been proposed; widely used methods include the niching and fitness sharing strategy [94], crowding distance [60], the k-nearest-neighbor method [267], fast nondominated sorting, gridding and ε-domination [47], on-line landscape approximation [138], the multiobjective evolutionary algorithm based on decomposition (MOEA/D) with a Gaussian process model [254], and preference-based multiobjective evolutionary algorithms [62, 63], just to name a few.
4.3.3 Performance Metrics in Multi-Objective Optimization
Set based Metrics

In multiobjective optimization, a set of solutions representing the trade-off among all objectives, rather than a unique optimal solution, is sought, so it is straightforward to measure solutions with set based metrics. These metrics are a kind of quality measure, and it is difficult for them to quantify the goodness of solutions by a single number.

Outperformance relations

Three kinds of outperformance relations are introduced in [103] to express the relations between two sets of internally nondominated objective vectors. The relations are as follows:

• Weak outperformance: ND(A ∪ B) = A and A ≠ B. A weakly outperforms B if all solutions in B are contained in A and there is at least one solution in A that is not contained in B, e.g. Figure 4.15 (a).
Figure 4.15: Examples of outperformance relations, • ∈ A and ∗ ∈ B: (a) A weakly outperforms B, (b) A strongly outperforms B, (c) A completely outperforms B.

• Strong outperformance: ND(A ∪ B) = A and B − ND(A ∪ B) ≠ ∅. A strongly outperforms B if all solutions in B are equal to or dominated by solutions in A
and there exists at least one solution in B that is dominated by solutions in A, e.g. Figure 4.15 (b).

• Complete outperformance: B ∩ ND(A ∪ B) = ∅. A completely outperforms B if each solution in B is dominated by solutions in A, e.g. Figure 4.15 (c).

where ND(·) denotes the set of all nondominated solutions, and A and B are two groups of solutions.
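The following sketch illustrates these three relations for a minimization problem; it is our own illustrative Python with a basic Pareto dominance test, not code from the thesis.

    import numpy as np

    def dominates(a, b):
        # True if objective vector a Pareto-dominates b (minimization).
        a, b = np.asarray(a), np.asarray(b)
        return bool(np.all(a <= b) and np.any(a < b))

    def nd(points):
        # ND(points): the nondominated subset, as a list of tuples.
        pts = [tuple(p) for p in points]
        return [p for p in pts
                if not any(dominates(q, p) for q in pts if q != p)]

    def outperformance(A, B):
        # Classify the relation of two internally nondominated sets A and B.
        A, B = [tuple(a) for a in A], [tuple(b) for b in B]
        union_nd = set(nd(A + B))
        if union_nd != set(A):
            return "none"
        if all(any(dominates(a, b) for a in A) for b in B):
            return "complete"   # every solution in B is dominated by A
        if set(B) - union_nd:
            return "strong"     # some solution in B is dominated or duplicated
        return "weak" if set(A) != set(B) else "equal"

Complete outperformance implies strong, which implies weak, so the checks are ordered from the strongest relation to the weakest.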
C Measure

The C measure indicates the coverage of two sets [265]. This measure compares two sets of solutions and calculates the proportion of solutions in the second set for which there are solutions at least as good in every objective in the first set. The definition of the C measure is as follows: let A, B ⊆ X be two sets of decision vectors. The function C maps the ordered pair (A, B) to the interval [0, 1]:

\[
C(A, B) := \frac{\left|\{\, b \in B \mid \exists\, a \in A : a \succeq b \,\}\right|}{|B|}
\]
The value C(A, B) = 1 means that all decision vectors in B are at least weakly dominated by A. The opposite, C(A, B) = 0, represents the situation in which none of the points in B is weakly dominated by A. Note that both directions always have to be considered, since C(A, B) is not necessarily equal to 1 − C(B, A). The C measure has some drawbacks (a sketch of its computation follows the list):
• It cannot measure the subset relation. In Figure 4.16 (a), set A includes set B; however, both values of the C measure are zero.
• If no solution in set A is dominated by a solution in set B, and vice versa, both values of the C measure are zero, e.g., Figure 4.16 (b).
• The magnitude (number of solutions) of each set is not considered. In Figure 4.16 (c), the result of the C measure does not match intuition.
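To make the definition concrete, the following is a minimal sketch of the C measure for a two-objective minimization setting; the helper names and the NumPy array representation are illustrative assumptions, not part of the original text.

```python
import numpy as np

def weakly_dominates(a, b):
    """a weakly dominates b (minimization): a is no worse than b in every objective."""
    return np.all(a <= b)

def c_measure(A, B):
    """C(A, B): fraction of the points in B that are weakly dominated by
    at least one point in A. A and B have shape (num_points, num_objectives)."""
    covered = sum(1 for b in B if any(weakly_dominates(a, b) for a in A))
    return covered / len(B)

# Both directions must be computed, since C(A, B) != 1 - C(B, A) in general.
A = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0]])
B = np.array([[2.0, 5.0], [3.0, 3.0]])
print(c_measure(A, B), c_measure(B, A))  # 1.0 and 0.0 for these sets
```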
Figure 4.16: Drawbacks of the C measure: (a) set B is a subset of set A: C(A, B) = 0, C(B, A) = 0; (b) A ⊀ B and B ⊀ A: C(A, B) = 0, C(B, A) = 0; (c) different numbers of elements in each set: C(A, B) = 0, C(B, A) = 1/5.

The above metrics are a kind of quality measure. They show the relations of two sets; however, in most cases, each of the two sets contains part of the non-dominated solutions, and the set based metrics are difficult to utilize in that situation.

Function M3 The function M3 is a spread metric, which measures the spread of the solution set A in decision space or the spread of the obtained nondominated solutions U in objective space [265]:
$$M_3(A) = \sqrt{\sum_{i=1}^{n} \max\{\|a_i - b_i\| \mid a, b \in A\}}$$
$$M_3^*(U) = \sqrt{\sum_{i=1}^{k} \max\{\|u_i - v_i\| \mid u, v \in U\}}$$
The function M3 ignores the number of solutions in each set.

Reference Point Based Metrics

The reference point based metrics are the most commonly used metrics in multiobjective optimization. Through these metrics, the goodness of solutions is measured by a single scalar. These metrics are conceptually simple and efficient to calculate; however, they are sensitive to the choice of the reference point, and solutions in different parts of the Pareto front play different roles in the scalar calculation.

S Measure (hypervolume) A favored metric is the hypervolume, also known as the S measure [265] or Lebesgue measure. The hypervolume measures how much of the objective space is weakly dominated by a given nondominated set, i.e., the size of the portion of objective space that is dominated by these solutions collectively. Generally, the hypervolume is favored because it captures in a single scalar both the closeness of the solutions to the optimal set and, to some extent, the spread of the solutions across objective space. The hypervolume also has nicer mathematical properties
than many other metrics; although it is difficult to calculate the exact value of the hypervolume, many fast algorithms have been proposed to obtain an approximate value [230, 231]. Also, it has been proved that the hypervolume is maximized if and only if the set of solutions contains only Pareto optima. The hypervolume has some nonideal properties:
• It is sensitive to the choice of reference point. Figure 4.17 displays that the same sets have a different ordering in S caused by a different choice of z_ref [139, 140].
• Extreme points play a more important role than points in the middle of the Pareto front. For example, in Figure 4.17 (a), z3 is more important than z2, and in Figure 4.17 (b), z1′ is worth more than z2′.
• The hypervolume is expensive to calculate; an approach needs to be designed to approximate it within a reasonable error [231].
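For the two-objective case the hypervolume can be computed exactly with a simple sweep. The sketch below is a minimal illustration assuming minimization, a mutually nondominated input set, and a reference point dominated by every solution; the function name is an illustrative choice.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """S measure for two objectives under minimization.

    points: array (n, 2) of mutually nondominated objective vectors.
    ref: reference point assumed worse than every point in both objectives.
    Sweeping the points by increasing f1 accumulates the dominated rectangles.
    """
    pts = points[np.argsort(points[:, 0])]
    volume, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        # rectangle between this point and the reference point, cut off at the
        # height already covered by points with a smaller f1 value
        volume += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return volume

front = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0]])
print(hypervolume_2d(front, ref=np.array([5.0, 5.0])))  # 11.0
```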
D Measure The D measure indicates the coverage difference of two sets [265]. This measure combines the C measure and the hypervolume measure. The definition of the D measure is as follows: Let A, B ⊆ X be two sets of decision vectors. The function D is defined by
$$D(A, B) := S(A + B) - S(B) \qquad (4.5)$$
and gives the size of the space weakly dominated by A but not weakly dominated by B (regarding the objective space). As shown in Figure 4.18, (a) is for the C measure and (b) is for the D measure. There is an area of size α that is covered by front 1 but not by front 2, and an area of size β that is covered by front 2 but not by front 1. The dark-shaded area (of size γ) is covered by both fronts in common. It holds that D(A, B) = α and D(B, A) = β. In this example, D(B, A) > D(A, B), which reflects the quality difference between the two fronts, in contrast to the C metric. In addition, the D measure gives information about whether either set entirely dominates the other set; e.g., D(A, B) = 0 and D(B, A) > 0 means that A is dominated by B.
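Under the same assumptions as the hypervolume sketch above, the D measure can be sketched by combining a nondominated filter with the two-dimensional hypervolume; here A + B is treated as the nondominated subset of the combined set, which leaves the S value unchanged. Names are illustrative.

```python
import numpy as np

def hv2d(points, ref):
    """2-D hypervolume of a nondominated set (minimization), as in the S measure sketch."""
    pts = points[np.argsort(points[:, 0])]
    vol, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        vol += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return vol

def nondominated(points):
    """Keep only the points that are not dominated by any other point."""
    keep = [p for p in points
            if not any(np.all(q <= p) and np.any(q < p) for q in points)]
    return np.array(keep)

def d_measure(A, B, ref):
    """D(A, B) = S(A + B) - S(B): space weakly dominated by A but not by B."""
    combined = nondominated(np.vstack([A, B]))
    return hv2d(combined, ref) - hv2d(nondominated(B), ref)
```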
The D measure is based on the hypervolume calculation. It is sensitive to the choice
of reference point. Figure 4.19 displays that the same sets have a different ordering in D caused by a different choice of z_ref.

True Pareto Front/Set Based Metrics

True Pareto front based metrics compare the Pareto front found by the search algorithm with the true Pareto front. This kind of metric is only utilized on benchmark problems, because the true Pareto front is unknown for real-world problems. However, utilizing these metrics, the search efficiency of different algorithms can be compared.
Figure 4.17: The relative value of the S metric depends upon an arbitrary choice of reference point z_ref. Two nondominated sets, A and B, are shown: in figures (a) and (b), S(A) = S(B); in figures (c) and (d), S(A) > S(B); and in figures (e) and (f), S(A) < S(B). The same sets have a different ordering in S caused by a different choice of z_ref.
Figure 4.18: The comparison between the C measure and the D measure: (a) the C measure: C(A, B) = C(B, A) = 1/2; (b) the D measure: D(A, B) = α, D(B, A) = β, and D(A, B) > D(B, A).
Figure 4.19: The relative value of the D metric depends upon an arbitrary choice of reference point z_ref, • ∈ A and ∗ ∈ B: in figure (a) D(A, B) = D(B, A), (b) D(A, B) < D(B, A), (c) D(A, B) > D(B, A).
Inverted Generational Distance (IGD) One frequently used metric is the inverted generational distance (IGD) [149, 260], also known as the reverse proximity indicator (RPI) [19, 212] or the convergence metric γ [60]. It measures the extent of convergence to a known set of Pareto-optimal solutions. The definition of this metric is as follows: Let P* be a set of uniformly distributed Pareto-optimal points in the PF (or PS), and let P be an approximation to the PF (or the PS). The IGD metric is defined as
$$IGD(P^*, P) = \frac{\sum_{v \in P^*} d(v, P)}{|P^*|}$$
where d(v, P) is the minimum Euclidean distance between v and all of the points in the set P, and |P*| is the cardinality of P*. In this metric, the number of solutions in P should be large enough to obtain an accurate result.
The IGD metric can be utilized both in solution space and in objective space. In objective space, P* is a set of points in the PF and d(v, P) is the minimum distance between the fitness values of solutions and the Pareto front, while in decision space, P* is a set of points in the PS and d(v, P) is the minimum distance between solutions and the Pareto set.

Hypervolume Difference (I_H^-) Metric The hypervolume difference I_H^- metric is defined as
$$I_H^-(P^*, P) = I_H(P^*) - I_H(P)$$
where I_H(P*) is the hypervolume between the true Pareto front P* and a reference point, and I_H(P) is the hypervolume between the obtained Pareto front P and the same reference point. The hypervolume difference measure is also based on the hypervolume calculation; the result may differ depending on the choice of reference point. Both the IGD metric and the I_H^- metric measure convergence and diversity. To have low IGD and I_H^- values, P must be close to the PF (or PS) and cannot miss any part of the whole PF (or PS) [260].

Spacing ∆ Metric The spacing ∆ metric measures the extent of spread achieved among the obtained solutions [60]. The following metric is utilized to calculate the non-uniformity in the distribution:
$$\Delta = \frac{d_f + d_l + \sum_{i=1}^{|P|-1} |d_i - \bar{d}|}{d_f + d_l + (|P|-1)\bar{d}}$$
where d_i is the Euclidean distance between consecutive solutions in the obtained nondominated set of solutions P, d_f and d_l are the distances between the extreme solutions of the true Pareto front and the boundary solutions of P, and d̄ is the average of all the distances d_i, i ∈ [1, |P|−1].
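A minimal sketch of the IGD and spacing ∆ computations follows; it assumes the fronts are given as NumPy arrays of objective vectors, that the approximation is sorted along the front for the spacing metric, and that the function names are illustrative.

```python
import numpy as np

def igd(pf_true, pf_approx):
    """IGD(P*, P): mean distance from each point of the true front P* to its
    nearest neighbour in the approximation P."""
    dists = [np.min(np.linalg.norm(pf_approx - v, axis=1)) for v in pf_true]
    return float(np.mean(dists))

def spacing_delta(pf_approx, extremes_true):
    """Spacing Delta metric: non-uniformity of consecutive gaps along the front.

    pf_approx must be sorted along the front (e.g. by the first objective);
    extremes_true holds the two extreme points of the true Pareto front,
    used for the boundary distances d_f and d_l.
    """
    d = np.linalg.norm(np.diff(pf_approx, axis=0), axis=1)  # consecutive d_i
    d_f = np.linalg.norm(pf_approx[0] - extremes_true[0])
    d_l = np.linalg.norm(pf_approx[-1] - extremes_true[1])
    d_bar = d.mean()
    return (d_f + d_l + np.sum(np.abs(d - d_bar))) / (
        d_f + d_l + (len(pf_approx) - 1) * d_bar)
```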
Conclusions Multiobjective optimization refers to optimization problems that involve two or more objectives, where a set of solutions is sought instead of a single one. How to measure the goodness of solutions and the performance of algorithms is important in multiobjective optimization. In this section, we reviewed variants of performance metrics and classified them into three categories: set based metrics, reference point based metrics, and true Pareto front/set based metrics. The properties and drawbacks of the different metrics were discussed and analyzed. A proper metric should be chosen for each situation, and conversely, an algorithm's ability can be measured by different metrics. An algorithm's properties can be revealed through the analysis of different metrics on different problems, so that different algorithms can be utilized in appropriate situations and more effective algorithms can be designed to solve MOO problems.
4.3.4
Diversity Change in Single Objective Optimization
Premature convergence occurs in population-based algorithms. Holland introduced the well-known phenomenon of “hitchhiking” in population genetics [107]. In population-based algorithms, all individuals search for optima at the same time. If, compared with other individuals, some newly found solution has an extremely good fitness value, all other individuals may converge rapidly toward it. It is difficult to handle premature convergence: once individuals “get stuck” in local optima, the exploration ability of the algorithm is greatly reduced, i.e., solutions lose their diversity in decision space. Another kind of premature convergence occurs due to the improper handling of boundary constraints [31]. A classic boundary constraint handling strategy resets a particle at the boundary in one dimension when this particle's position exceeds the boundary in that dimension. If the fitness value of the particle at the boundary is better than that of other particles, all particles in its neighborhood will move toward the boundary in this dimension. If particles cannot find a position with a better fitness value, all particles will “stick to” the boundary in this dimension. It is difficult for a particle to “jump out” of the boundary even if we increase the total number of fitness evaluations or the maximum number of iterations, and this phenomenon occurs with higher probability for high-dimensional problems. The most important factor affecting an optimization algorithm's performance is its ability of “exploration” and “exploitation”. Exploration means the ability of a search algorithm to explore different areas of the decision space in order to find good optima with high probability. Exploitation, on the other hand, means the ability to concentrate the search around a promising region in order to refine a candidate solution. A good
optimization algorithm optimally balances these conflicting objectives. Within PSO, these objectives are addressed by the velocity update equation. Many strategies have been proposed to adjust an algorithm's exploration and exploitation abilities. The velocity clamp was first used to adjust the balance between exploration and exploitation [77]. As in the equation below, the current velocity is set to the maximum velocity (or its negative) if the velocity is greater than the maximum velocity (or less than its negative), respectively:
$$v_{ij} = \begin{cases} V_{\max} & v_{ij} > V_{\max} \\ v_{ij} & -V_{\max} \le v_{ij} \le V_{\max} \\ -V_{\max} & v_{ij} < -V_{\max} \end{cases} \qquad (4.6)$$
Adding an inertia weight is more effective than the velocity clamp, for it not only increases the probability of the algorithm converging, but also provides a way to control the whole search process of the algorithm [203]. There are many adaptive strategies to tune an algorithm's parameters during the search [33, 35]. The properties of population diversity in single objective optimization have been discussed in many articles [36, 51]. The properties of the population in multiobjective optimization and the difference between population diversity in SOPs and MOPs still need to be analyzed and discussed. The population diversity of PSO is useful for measuring and dynamically adjusting an algorithm's ability of exploration or exploitation accordingly. Shi and Eberhart gave three definitions of population diversity: position diversity, velocity diversity, and cognitive diversity [207, 208]. Position, velocity, and cognitive diversity are used to measure the distributions of particles' current positions, current velocities, and pbests (the best position found so far by each particle), respectively. Cheng and Shi introduced modified definitions of the three diversity measures based on the L1 norm [28, 30]. The definitions of the dimension-wise population diversities are given in Section 3.2.

Change of Position Diversity and Cognitive Diversity The position diversity and cognitive diversity measure the distributions of particles' current positions and previous best positions. From the change of position diversity and cognitive diversity, the algorithm's status of exploration and exploitation can be obtained. The changes of position diversity and cognitive diversity are defined as follows:
$$C^p = D^p(t+1) - D^p(t), \qquad C^c = D^c(t+1) - D^c(t)$$
where C^p is the change of position diversity and C^c is the change of cognitive diversity. From the change of position diversity and cognitive diversity, the speed of swarm convergence or divergence can be observed. The change of position diversity and cognitive diversity can be divided into four cases:
1. position diversity increasing, cognitive diversity increasing, i.e., position diversity and cognitive diversity increase at the same time;
2. position diversity decreasing, cognitive diversity decreasing, i.e., position diversity and cognitive diversity decrease at the same time;
3. position diversity increasing, cognitive diversity decreasing;
4. position diversity decreasing, cognitive diversity increasing.

For the first two cases, if the position diversity and cognitive diversity increase at the same time, the swarm is diverging, i.e., the algorithm is in the exploration state; on the contrary, if the position diversity and cognitive diversity decrease at the same time, the swarm is converging, i.e., the algorithm is in the exploitation state. The inertia weight should be adaptively changed in these two states.

Ratio of Position Diversity to Cognitive Diversity Cognitive diversity represents the distribution of all the current moving targets found by particles. From the relationship between position diversity and cognitive diversity, particles' dynamical movement can be observed. The ratio of position diversity to cognitive diversity is defined as follows:
$$R = \frac{D^p}{D^c} = \frac{\text{Position Diversity}}{\text{Cognitive Diversity}}$$
The ratio of position diversity to cognitive diversity indicates the movement of particles. If the value is larger than 1, the particles are searching in a large space while the previous best solutions are in a small space. In contrast, if the value is less than 1, the particles' search is confined to a small space. Different strategies should be utilized in these different situations to maintain an appropriate balance between exploration and exploitation, as illustrated by the sketch below.
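The signals above can be sketched in a few lines; the function names are illustrative, and the L1 diversity follows the dimension-wise definition of Section 3.2.

```python
import numpy as np

def l1_diversity(points):
    """Dimension-wise L1 population diversity (Section 3.2): the mean absolute
    deviation of the rows of `points` from the population centre, averaged
    over all dimensions."""
    center = points.mean(axis=0)
    return float(np.mean(np.abs(points - center)))

def diversity_signals(pos_t, pos_t1, pbest_t, pbest_t1):
    """Change of position diversity C^p, change of cognitive diversity C^c,
    and the ratio R = D^p / D^c at the newer iteration."""
    c_p = l1_diversity(pos_t1) - l1_diversity(pos_t)      # C^p
    c_c = l1_diversity(pbest_t1) - l1_diversity(pbest_t)  # C^c
    r = l1_diversity(pos_t1) / l1_diversity(pbest_t1)     # R > 1: wide search
    return c_p, c_c, r
```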
4.3.5
Population Diversity in Multiobjective Optimization
Optimization has different meanings in single objective optimization and multiobjective optimization [121]. For MOO, the result of optimization is not a single solution but a set, possibly infinite, of solutions. Population diversity is also important in multiobjective optimization [255, 260]. Unlike in single objective optimization, the diversity in multiobjective optimization concerns not only the convergence in decision space, but also the convergence in objective space. There are many discussions of population diversity in multiobjective optimization, such as convergence acceleration [3], diversity management [4], and diversity improvement [116]. The particle swarm optimizer has been extended to solve multiobjective problems; a particle swarm optimizer with a dynamic neighborhood [109] and with an additional archive or extended memory [45–47, 110] have been utilized in multiobjective optimization.
Population Diversity of Particles In single objective optimization, each individual represents a potential solution, and the individual with the best fitness value is the best solution for the problem being solved. In multiobjective optimization, if an individual corresponds to a nondominated solution, this solution may be chosen for the additional archive; all nondominated solutions are stored in the archive. The population diversity should be measured on both the current solutions and the found nondominated solutions, i.e., the distribution of particles and the distribution of solutions in the archive should be observed. The measurement of the distribution of particles is the same as for PSO solving single objective problems: the position diversity, velocity diversity, and cognitive diversity should be measured, and their definitions and equations are the same as for PSO in single objective optimization.

Population Diversity of Solutions The number of solutions in the archive may be different from the number of particles. For the purpose of generality and clarity, h represents the number of solutions, n represents the number of problem dimensions, and k is the number of objectives; u indexes the uth solution, u = 1, · · · , h, v indexes the vth objective, v = 1, · · · , k, and j indexes the jth dimension, j = 1, · · · , n. Two kinds of population diversity should be measured for MOPSO: diversity in solution space and diversity in objective space. Convergence in objective space is not preferred; all objectives should have a uniform distribution over the Pareto front.

Population Diversity of Pareto Set The population diversity of the Pareto set measures the distribution of nondominated solutions in the search space and therefore can reflect the solutions' dynamics. The definition of the diversity of the Pareto set, which is based on the L1 norm, is as follows:
$$\bar{s}_j = \frac{1}{h}\sum_{u=1}^{h} s_{uj}, \qquad D_j^s = \frac{1}{h}\sum_{u=1}^{h} |s_{uj} - \bar{s}_j|, \qquad D^s = \frac{1}{n}\sum_{j=1}^{n} D_j^s$$
where s̄_j is the center of the solutions on dimension j; we define s̄ = [s̄_1, · · · , s̄_j, · · · , s̄_n], and s̄ represents the average of all solutions on each dimension. The parameter h is the number of solutions in the archive; h and the number of particles m can be different. The vector D^s = [D_1^s, · · · , D_j^s, · · · , D_n^s] represents the diversity of the Pareto set for each dimension based on the L1 norm, and the scalar D^s measures the diversity of solutions in the Pareto set as a whole.
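A minimal sketch of this computation, under the assumption that the archive is stored as an h × n NumPy array; the Pareto front diversity D^f defined next is computed in exactly the same way on the h × k matrix of fitness values.

```python
import numpy as np

def pareto_set_diversity(archive):
    """L1-based diversity of the Pareto set.

    archive: array of shape (h, n), the nondominated solutions in decision space.
    Returns the per-dimension vector D^s_j and the scalar D^s.
    """
    s_bar = archive.mean(axis=0)                    # centre s_bar_j per dimension
    d_j = np.mean(np.abs(archive - s_bar), axis=0)  # D^s_j
    return d_j, float(d_j.mean())                   # D^s: average over dimensions
```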
Population Diversity of Pareto Front The population diversity of the Pareto front measures the distribution of fitness values in the objective space and therefore can reflect the goodness of solutions. The definition of the diversity of the Pareto front, which is based on the L1 norm, is as follows:
$$\bar{f}_v = \frac{1}{h}\sum_{u=1}^{h} f_{uv}, \qquad D_v^f = \frac{1}{h}\sum_{u=1}^{h} |f_{uv} - \bar{f}_v|, \qquad D^f = \frac{1}{k}\sum_{v=1}^{k} D_v^f$$
where f̄_v is the center of the solutions on objective v; we define f̄ = [f̄_1, · · · , f̄_v, · · · , f̄_k], and f̄ represents the average of all fitness values on each objective. The vector D^f = [D_1^f, · · · , D_v^f, · · · , D_k^f] represents the diversity of the Pareto front for each objective based on the L1 norm, and the scalar D^f measures the diversity of fitness values in the Pareto front as a whole.

The diversity measurement can be utilized as a metric for the solutions of multiobjective optimization. One of the main differences between single objective optimization and multiobjective optimization is that multiobjective optimization involves a multidimensional objective space. In addition, a set of solutions representing the trade-off among the different objectives, rather than a unique optimal solution, is sought in multiobjective optimization (MOO). How to measure the goodness of solutions and the performance of algorithms is important in MOO [41]. The defined diversity metrics have several properties [60]:

1. Comparability: For the benchmark functions, the target (or desired) metric value (calculated for an ideally converged and diversified set of points) can be calculated. For real world problems, the metric values can be compared.

2. Monotonicity: The metric should provide a monotonic increase or decrease in its value as the solution gets improved or deteriorated slightly. This will also help in evaluating the extent of superiority of one approximation set over another.

3. Scalability: The metric should be scalable to any number of objectives. Multiobjective optimization involves only two or three objectives, while many objective optimization involves more than four objectives. Pareto domination is utilized in both settings; however, it has been reported that almost every solution is Pareto nondominated in problems with more than ten objectives [115, 116]. For large scale multiobjective problems, especially problems with a large number of objectives, Pareto domination may not be appropriate for measuring the goodness of solutions. In this situation, we need to consider the scalability of metrics. Although scalability has not been discussed much in current research and is not an absolutely necessary property, if followed, it will certainly be convenient for evaluating scalability issues of Multi-Objective Evolutionary Algorithms (MOEAs) in terms of the number of objectives.
4. Computational efficiency: The metric should be computationally inexpensive, although this is not a stringent condition to follow. In swarm intelligence algorithms, many iterations are taken to search for the optima; considering the number of iterations, a fast metric can accelerate the search.
4.3.6
Experimental Study
Benchmark Test Functions Wolpert and Macready have proved that, under certain assumptions, no algorithm is better than another on average over all problems [232–235]. Considering generalization, eleven single objective and six multiobjective benchmark functions were used in our experimental studies [152, 249, 256]. The aim of the experiments is not to compare the ability or the efficacy of PSO algorithms with different parameter settings or structures, e.g., global star or local ring, but to measure the exploration and exploitation information while the PSOs are executed.

Single Objective Problems The experiments have been conducted to test the single objective benchmark functions listed in Appendix A.1. Without loss of generality, five standard unimodal and six multimodal test functions are selected [152, 249]. All functions are run 50 times to give statistical meaning to the comparison among different approaches. A random shift of the location of the optimum is applied in each dimension for each run.

Multiobjective Problems There are six unconstrained (bound constrained) problems [256] in the experimental study; each problem has two objectives to be minimized. The benchmark functions are given in Appendix A.2. The unconstrained problems 1, 2, 3, 4, and 7 have a continuous Pareto front; the unconstrained problem 5 has a discrete Pareto front.

Parameter Setting

Single Objective Optimization In all experiments of PSO solving single objective problems, the parameter setting is the same as that of the standard PSO: the swarm has 50 particles, c1 = c2 = 1.496172, and the inertia weight is w = 0.72984 [21, 44]. All functions are run 50 times to give statistical meaning to the comparison among different approaches. For each algorithm, the maximum number of iterations is 5000 for the 100 dimensional problems in every run. There is also a limitation on the velocity to control the search step size, which can prevent particles from crossing the search boundary. The maximum velocity is set as follows:
maximum velocity = 0.2 × (position upper bound − position lower bound).   (4.7)
Multiobjective Optimization In all experiments of PSO solving multiobjective problems, PSO has 250 particles. The maximum number h of solutions in the archive is 100. The maximum number of iterations is 2000 for the 10 dimensional problems in every run. The other parameters are the same as those in the standard PSO.

Boundary Constraints Handling Many strategies have been proposed to handle the boundary constraints. With an improper boundary constraints handling method, particles may get “stuck in” the boundary [31]. The classical boundary constraints handling method is as follows:
$$x_{i,j}(t+1) = \begin{cases} X_{\max,j} & \text{if } x_{i,j}(t+1) > X_{\max,j} \\ X_{\min,j} & \text{if } x_{i,j}(t+1) < X_{\min,j} \\ x_{i,j}(t+1) & \text{otherwise} \end{cases} \qquad (4.8)$$
where t is the index of the last iteration and t + 1 is the index of the current iteration. This strategy resets particles at a particular point, the boundary, which constrains particles to fly in the search space limited by the boundary. In single objective optimization, for PSO with the star structure, a stochastic boundary constraints handling method was utilized in this experiment. Equation (4.9) gives a method by which particles are reset into a special area:
$$x_{i,j}(t+1) = \begin{cases} X_{\max,j} \times (\mathrm{rand}() \times c + 1 - c) & \text{if } x_{i,j}(t+1) > X_{\max,j} \\ X_{\min,j} \times (\mathrm{Rand}() \times c + 1 - c) & \text{if } x_{i,j}(t+1) < X_{\min,j} \\ x_{i,j}(t+1) & \text{otherwise} \end{cases} \qquad (4.9)$$
where c is a parameter to control the resetting scope. When c = 1, particles are reset within a half search space. On the contrary, when c = 0, this strategy is the same as equation (4.8), i.e., particles are reset onto the boundary. The closer c is to 0, the higher the possibility that particles are reset close to the boundary. In our experiments, c is set to 0.1, so a particle will be close to the boundary when its position goes beyond the boundary. This will increase the exploitation ability of the algorithm in searching for solutions close to the boundary. A deterministic method, which resets a boundary-violating position to the middle between the old position and the boundary [264], was utilized for PSO with the local ring structure. The equation is as follows:
$$x_{i,j,G+1} = \begin{cases} \frac{1}{2}(x_{i,j,G} + X_{\max,j}) & \text{if } x_{i,j,G+1} > X_{\max,j} \\ \frac{1}{2}(x_{i,j,G} + X_{\min,j}) & \text{if } x_{i,j,G+1} < X_{\min,j} \\ x_{i,j,G+1} & \text{otherwise} \end{cases} \qquad (4.10)$$
The position in the last iteration is used in this strategy. Both the classic strategy and this strategy reset a particle to a deterministic position. For the multiobjective particle swarm optimizer, equation (4.8) is utilized to handle the boundary constraints. The three strategies are sketched in code below.
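A minimal sketch of the three strategies, assuming the bounds broadcast to the shape of the position array; the function names are illustrative choices, not part of the original text.

```python
import numpy as np

def clamp_to_boundary(x, lo, hi):
    """Classic strategy of Eq. (4.8): reset violating coordinates onto the boundary."""
    return np.clip(x, lo, hi)

def stochastic_reset(x, lo, hi, c=0.1, rng=None):
    """Stochastic strategy of Eq. (4.9): reset violating coordinates into a region
    near the violated boundary; c controls the width of the resetting scope."""
    rng = rng or np.random.default_rng()
    x = x.copy()
    lo, hi = np.broadcast_to(lo, x.shape), np.broadcast_to(hi, x.shape)
    over, under = x > hi, x < lo
    x[over] = hi[over] * (rng.random(np.count_nonzero(over)) * c + 1 - c)
    x[under] = lo[under] * (rng.random(np.count_nonzero(under)) * c + 1 - c)
    return x

def midpoint_reset(x_new, x_old, lo, hi):
    """Deterministic strategy of Eq. (4.10): reset a violating coordinate to the
    middle between the old position and the violated boundary."""
    x = x_new.copy()
    lo, hi = np.broadcast_to(lo, x.shape), np.broadcast_to(hi, x.shape)
    over, under = x > hi, x < lo
    x[over] = 0.5 * (x_old[over] + hi[over])
    x[under] = 0.5 * (x_old[under] + lo[under])
    return x
```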
Experimental Results The results of PSO solving single objective problems are given in Table 4.9. The bold numbers indicate the better solutions. Five measures of performance are reported. The first is the best fitness value attained after a fixed number of iterations; in our case, we report the best result found after 5000 iterations. The other measures are the median, worst, and mean values, and the standard deviation of the best fitness values over all runs.

Table 4.9: Results of PSO with global star and local ring structure for solving benchmark functions. All algorithms are run for 50 times, where “Best”, “Median”, “Worst”, and “Mean” indicate the best, median, worst, and mean of the best fitness values for all runs, respectively.

PSO with Global Star Structure
Result  fmin    Best       Median     Worst      Mean       Std. Dev.
f0      -450.0  -449.9999  -449.9045  -421.1076  -448.4784  5.01505
f1      -330.0  -329.9999  -329.9993  328.3578   -316.8113  92.1670
f2      450.0   4538.646   8024.012   23007.12   9129.964   3939.06
f3      330.0   359        566        2220       635.6      314.235
f4      -450.0  -449.9691  -449.9432  -449.1378  -449.9251  0.11416
f5      180.0   272.4658   516.5409   1498.878   599.9292   246.224
f6      -330.0  63.6766    320.3105   630.1263   312.8142   103.560
f7      450.0   610.0011   802.0023   1059.000   803.5421   97.1025
f8      180.0   182.4990   197.7353   199.8314   192.0871   7.48450
f9      120.0   120.0809   120.2979   122.4966   120.3902   0.38224
f10     330.0   330.0116   330.8841   333.4218   330.9681   0.67871

PSO with Local Ring Structure
Result  fmin    Best       Median     Worst      Mean       Std. Dev.
f0      -450.0  -449.9999  -449.9999  -449.9999  -449.9999  3.62E-10
f1      -330.0  -329.9999  -329.9999  -329.0500  -329.9771  0.13457
f2      450.0   30211.37   40847.52   54352.03   41051.55   5739.44
f3      330.0   330        331        337        331.3      1.5
f4      -450.0  -449.9165  -449.8565  -449.8119  -449.8604  0.02441
f5      180.0   269.4788   368.9582   442.3104   360.0133   41.4152
f6      -330.0  73.2817    325.6967   441.0884   309.8727   74.7981
f7      450.0   836.0000   1034.000   1182.250   1022.319   86.6742
f8      180.0   180.0000   181.9515   199.5519   183.8464   5.75737
f9      120.0   120.0000   120.0000   120.5880   120.0117   0.08232
f10     330.0   330.0033   330.7748   332.7372   330.9090   0.61122

From the results in Table 4.9, we can conclude that the seven functions f0, f1, f3, f4, f8, f9, and f10 have good optimization results, while for the other four functions, f2, f5, f6, and f7, the results are not very good. This is because of the properties of the functions: some functions become significantly more difficult when the dimension increases. Figure 4.20 shows single run results of PSO solving multiobjective problems. The multiobjective problems become more difficult with increasing numbers of objectives and problem dimensions.
Figure 4.20: The solutions of the particle swarm optimizer solving the multiobjective UCP problems: (a) UCP 1, (b) UCP 2, (c) UCP 3, (d) UCP 4, (e) UCP 5, (f) UCP 7. Each panel plots the obtained fitness values against the true Pareto front in the f1–f2 objective space.
4.3.7
Analysis and Discussion
Diversity in Single Objective Optimization Different functions have different properties. Four representative benchmark functions, namely Parabolic f0, Schwefel's P1.2 f2, Rosenbrock f5, and Ackley f8, are chosen to analyze the different normalized population diversities. The properties of these functions are shown in Table 3.7 in Section 3.5.

Population Diversity Observation
Figure 4.21: Population diversities observation on the particle swarm optimizer with star structure solving single objective problems: (a) Parabolic f0, (b) Schwefel's P1.2 f2, (c) Rosenbrock f5, (d) Ackley f8.

Figure 4.21 and Figure 4.22 show the population diversities of the particle swarm optimizer solving single objective problems; Figure 4.21 is for PSO with the star structure and Figure 4.22 is for PSO with the ring structure. The diversity measures show the distributions of particles' positions, velocities, and cognitive positions. The PSO with the ring structure has smoother curves than the PSO with the star structure.

Change of Position Diversity and Cognitive Diversity Figures 4.23 and 4.24 show the population diversity change of PSO solving the four single objective problems.
Figure 4.22: Population diversities observation on the particle swarm optimizer with ring structure solving single objective problems: (a) Parabolic f0, (b) Schwefel's P1.2 f2, (c) Rosenbrock f5, (d) Ackley f8.
Figure 4.23: Population diversity change observation on PSO with star structure solving single objective problems: (a) Parabolic f0, (b) Schwefel's P1.2 f2, (c) Rosenbrock f5, (d) Ackley f8.
Figure 4.24: Population diversity change observation on PSO with ring structure solving single objective problems: (a) Parabolic f0, (b) Schwefel's P1.2 f2, (c) Rosenbrock f5, (d) Ackley f8.

The diversities measure the distribution of particles, and the convergence or divergence information can be obtained from the change of the diversities. From the figures, it can be concluded that the diversity changes very quickly at the beginning of the search, which indicates that the particle swarm has good global search ability. The particles then get clustered in a small region quickly. The convergence speed should be controlled during the search, since fast convergence may cause premature convergence.

Ratio of Position Diversity to Cognitive Diversity Figure 4.25 shows the ratio of position diversity to cognitive diversity of PSO solving single objective problems. The particle swarm optimizer with the star or the ring structure has different properties: the one with the star structure shows strong oscillation in the ratio of position diversity to cognitive diversity, while the one with the ring structure has a smooth curve.

Diversity in Multiobjective Optimization Figure 4.26 displays the population diversities of PSO solving multiobjective problems. In Figure 4.26 (a), (c), and (e), the diversity curves of particles and solutions are very similar; in contrast, the diversities of positions and solutions have different changing curves in the other figures.
Figure 4.25: The ratio of position diversity to cognitive diversity on PSO solving single objective problems: (a) Parabolic f0, (b) Schwefel's P1.2 f2, (c) Rosenbrock f5, (d) Ackley f8.

The multiobjective particle swarm optimization algorithms (MOPSOs) have a fast convergence when solving MOPs [67]. The particles converge to the solutions very quickly; correspondingly, the population diversities become nearly straight lines after a few iterations.

Discussions The particle swarm shows different diversity changes in single and multiobjective problems. For single objective problems, we only consider the diversity in solution space; for multiobjective problems, the diversity in objective space also needs to be considered. For single objective problems, a diversity that decreases quickly may reflect premature convergence; in contrast, a diversity that decreases slowly may reflect the algorithm searching ineffectively. The algorithm should have a properly decreasing diversity, and the diversity may also need to be enhanced during the search. For multiobjective problems, a diversified target should be kept, i.e., the particles or solutions should not converge into a small region. Maintaining the diversity in the search process is important for multiobjective optimization algorithms.
Figure 4.26: Population diversities observation on PSO solving multiobjective UCP problems: (a) UCP 1, (b) UCP 2, (c) UCP 3, (d) UCP 4, (e) UCP 5, (f) UCP 7.
More specifically, we want the search results to be uniformly distributed and the fitness of the solutions to be close to the real Pareto front. This indicates that the diversity should be maintained at a proper level during the search. This is a problem-dependent setting, since different problems have different shapes of the Pareto front and different numbers of solutions on the Pareto front.
4.3.8
Conclusions
This section discussed an analysis of the population diversity of the particle swarm optimizer solving single and multi-objective problems. The performance of a search algorithm is determined by two kinds of abilities: exploration of new possibilities and exploitation of old certainties. These two abilities should be balanced during the search process to obtain a good performance, i.e., the computational resources should be reallocated at algorithm running time. For single objective optimization, the population diversity measures the distribution of particles, while for multiobjective optimization, the distribution of nondominated solutions should also be measured. From the observation of the distribution and of the diversity change, the degree of exploration and exploitation can be obtained. In this section, we have analyzed the population diversity of the particle swarm optimizer solving single objective and multiobjective problems. Adaptive optimization algorithms can be designed by controlling the balance between exploration and exploitation. For multiobjective optimization, different problems have different kinds of diversity changing curves. The properties of a problem, such as its “hardness”, the number of local minima, and a continuous or discrete Pareto front, all affect the performance of optimization algorithms. Through this search information, the problems can be solved effectively. Whether the particles are in a state of “expansion” or “convergence” can be determined by the diversity measurement. From the population diversity, the diversity changes of PSO variants on different types of functions can be compared and analyzed. The particles' dynamical search state, the “hardness” of the function, the number of local optima, and other information can be obtained. With this information, the performance of an optimization algorithm can be improved by adjusting the population diversity dynamically during the PSO search process. The different topology structures of particles and the dimensional dependence of problems also affect the search process and the performance of search algorithms. Seeking the influence of the PSO topology structure and of dimensional dependence on population diversity is research that needs to be explored further. The idea of population diversity measurement can also be applied to other evolutionary algorithms, e.g., genetic algorithms and differential evolution, because evolutionary algorithms share the same concepts of current population solutions and search step. The performance of evolutionary algorithms can be improved by taking advantage of the measurement of population diversity. Dynamically adjusting the population diversity controls an algorithm's ability of exploration or exploitation; hence, the algorithm could have a higher possibility of reaching the optimum.
Chapter 5
Population Diversity Control
5.1
Overview
Compared with other evolutionary algorithms, e.g., the genetic algorithm, PSO has more search information, which includes not only the solution (position), but also the velocity and the previous best solution (cognitive). Population diversities, which include position diversity, velocity diversity, and cognitive diversity, are utilized to measure this information, respectively. There are several definitions for the measurement of population diversities [30, 207, 208]. The most important factor affecting an optimization algorithm's performance is its ability of “exploration” and “exploitation” [179, 207]. Exploration means the ability of a search algorithm to explore different areas of the search space in order to have a high probability of finding good promising solutions. Exploitation, on the other hand, means the ability to concentrate the search around a promising region in order to refine a candidate solution. A good optimization algorithm should optimally balance these conflicting objectives. This balance can be controlled by setting the algorithm's parameters [81]. Within PSO, these objectives are addressed by the velocity update equation. The velocity clamp was first used to adjust the balance between exploration and exploitation [77]. As in the equation below, the current velocity is set to the maximum velocity (or its negative) if the velocity is greater than the maximum velocity (or less than its negative), respectively:
$$v_{ij} = \begin{cases} V_{\max} & v_{ij} > V_{\max} \\ v_{ij} & -V_{\max} \le v_{ij} \le V_{\max} \\ -V_{\max} & v_{ij} < -V_{\max} \end{cases} \qquad (5.1)$$
However, the velocity clamp only makes the algorithm less likely to diverge; it cannot help the algorithm “jump out” of a local optimum or refine the candidate solution. Shi and Eberhart introduced a new parameter, an inertia weight w, to balance exploration and exploitation [203, 204]. This inertia weight w is added to equation (2.4), and it can be a constant, a linearly decreasing value over time [205], or a fuzzy value [206, 209].
The new velocity update equation is as follows:
$$v_{ij} = w v_{ij} + c_1 \mathrm{rand}()(p_{ij} - x_{ij}) + c_2 \mathrm{Rand}()(p_{nj} - x_{ij}) \qquad (5.2)$$
Adding an inertia weight is more effective than the velocity clamp, for it not only increases the probability of the algorithm converging, but also provides a way to control the whole search process of the algorithm. Generally speaking, a PSO algorithm should have a larger exploration ability and a lower exploitation ability at first, which gives a high probability of finding more local optima. Exploration should then be decreased, and exploitation increased, to refine candidate solutions over time. Accordingly, the inertia weight w should be linearly decreased or even dynamically determined by a fuzzy system. The whole PSO search process can be adjusted by adding an inertia weight; however, it is difficult to change the inertia weight so as to dynamically adjust the ability of exploration or exploitation during the algorithm's search. Diversity, which can be a way to monitor an algorithm's state of exploration or exploitation, is important for helping adjust the abilities of exploration and exploitation. Diversity can reveal the internal characteristics of a search process.
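A minimal sketch of one PSO step with the inertia weight of equation (5.2) and the optional clamp of equation (5.1). The acceleration coefficients reuse the values from Section 4.3.6; the linearly decreasing schedule from 0.9 to 0.4 is a common choice rather than something fixed by the text, and the function names are illustrative.

```python
import numpy as np

def inertia(iteration, max_iter, w_start=0.9, w_end=0.4):
    """Linearly decreasing inertia weight, a common schedule."""
    return w_start - (w_start - w_end) * iteration / max_iter

def pso_step(x, v, pbest, nbest, w, c1=1.496172, c2=1.496172,
             v_max=None, rng=None):
    """One update of Eq. (5.2) plus the optional velocity clamp of Eq. (5.1).

    x, v, pbest: arrays of shape (m, n); nbest: the best position in each
    particle's neighbourhood (a single global best is broadcast here for brevity).
    A fresh random number is drawn per particle and per dimension.
    """
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (nbest - x)
    if v_max is not None:
        v = np.clip(v, -v_max, v_max)  # velocity clamp, Eq. (5.1)
    return x + v, v
```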
5.2
Population Diversity Control
Diversity can be used to measure the ability of exploration and exploitation; however, the goal is not only to observe but also to control the diversity, that is, the state of exploration or exploitation should be dynamically adjusted. Since adding random noise may increase the population diversity, in the first experiment below we add noise to PSO to see its impact. The 12 benchmark functions are given in Appendix A.1. Functions f0 − f4 are unimodal problems, which include the Parabolic (Sphere) function f0, Schwefel's P2.22 function f1, Schwefel's P1.2 function f2, the Step function f3, and the Quartic noise function f4; f4 is a noisy quadric function, where random[0, 1) is a uniformly distributed random variable in [0, 1). Functions f5 − f11 are multimodal problems, which include the generalized Rosenbrock function f5, the Schwefel function f6, the generalized Rastrigin function f7, the Noncontinuous Rastrigin function f8, the Ackley function f9, the Griewank function f10, and the generalized penalized function f11.
5.2.1
Based on Random Noise
Noise is added to equation (2.5), and the new equation is as follows:
$$x_{ij} = x_{ij} + v_{ij} + c_3 \mathrm{RAND}() \qquad (5.3)$$
where c3 can be a positive or negative constant, or adjusted during the algorithm's search process. RAND() is a random function in the range [0, 1], and its value is different for each dimension and each particle.
Some representative results are given in Table 5.1. This method does not perform better than the standard PSO.

Table 5.1: Representative results of PSO with diversity control based on random noise. All algorithms are run over 50 times, where “mean” indicates the mean of the best fitness values found in the last generation. [−0.05, 0.05] indicates that c3 RAND() is in the range −0.05 to 0.05 during the PSO search process; fmin = 0.

PSO with star structure
Result                   f4          f8          f10
Standard PSO     best    0.003306    98.0000     0
                 mean    3.763743    170.2000    18.0558
[0, 0.05]        best    0.027495    151.4569    0.001945
                 mean    3.324122    193.7475    14.4905
[0, 0.1]         best    0.192631    171.7475    0.006434
                 mean    5.108825    243.0230    3.639894
[−0.05, 0]       best    0.030778    125.2904    0.001924
                 mean    2.465845    177.0750    7.250727
[−0.1, 0]        best    0.199754    162.2819    0.007742
                 mean    6.448594    242.4670    5.450799
[−0.05, 0.05]    best    0.092398    187.5481    0.006408
                 mean    0.206316    237.1176    0.022319
[−0.1, 0.1]      best    0.914132    248.5174    0.023818
                 mean    1.488689    328.5418    0.046167

PSO with ring structure
Standard PSO     best    0.021183    139.2511    9.7699E-15
                 mean    0.078473    181.7243    1.7852E-14
[0, 0.05]        best    0.062837    139.5583    0.002694
                 mean    0.092465    189.9442    0.003257
[0, 0.1]         best    0.251453    162.0773    0.008715
                 mean    0.389412    212.5552    0.012889
[−0.05, 0]       best    0.055127    135.7548    0.002388
                 mean    0.089502    188.1026    0.003799
[−0.1, 0]        best    0.320260    193.1383    0.010421
                 mean    0.400614    222.3157    0.012960
[−0.05, 0.05]    best    0.203883    155.0355    0.007811
                 mean    0.283535    197.6311    0.010637
[−0.1, 0.1]      best    1.022184    216.1435    0.027316
                 mean    1.777710    279.9250    0.041168

5.2.2
Based on Average of Current Velocities
Velocity measures the “flying” tendency of particles. The average of the current velocities may affect the population diversity. The next experiment adds the average velocity to equation (2.5), which is modified as follows:
$$x_{ij} = x_{ij} + v_{ij} + c_3 \mathrm{RAND}() \bar{v}_j \qquad (5.4)$$
where c3 can be a positive or negative constant, or adjusted during the algorithm's search process; RAND() is a random function in the range [0, 1] whose value is different for each dimension and each particle; and v̄_j is the average of the current velocities for dimension j.
Some representative results are shown in Table 5.2. This method does not bring significant improvement over the standard PSO.

Table 5.2: Representative results of PSO with diversity control based on the average of current velocities. All algorithms are run over 50 times, where “mean” indicates the mean of the best fitness values found in the last generation. c3 ∼ [−0.05, 0.05] indicates that c3 linearly increases from −0.05 to 0.05 during the PSO search process; fmin = 0.

PSO with star structure
Result                      f4          f8          f10
Standard PSO        best    0.003306    98.0000     0
                    mean    3.763743    170.2000    18.0558
c3 = 0.05           best    0.000300    91.2810     0
                    mean    4.835266    154.0616    21.7101
c3 = 0.1            best    0.000213    51.3985     0
                    mean    3.653916    139.9491    21.7121
c3 = −0.05          best    0.000314    83.1481     0
                    mean    3.976869    144.6972    16.3120
c3 = −0.1           best    0.000209    66.5491     0
                    mean    1.990022    153.8094    18.0800
c3 ∼ [−0.05, 0.05]  best    0.000113    65.2646     0
                    mean    3.707627    153.0410    14.4772
c3 ∼ [−0.1, 0.1]    best    0.000420    90.0021     0
                    mean    2.365738    165.4357    14.4980

PSO with ring structure
Standard PSO        best    0.021183    139.2511    9.7699E-15
                    mean    0.078473    181.7243    1.7852E-14
c3 = 0.05           best    0.010046    120.4208    8.8817E-16
                    mean    0.028780    162.1770    0.000177
c3 = 0.1            best    0.008539    138.2817    7.7715E-16
                    mean    0.028096    181.1114    6.4194E-05
c3 = −0.05          best    0.008869    125.3705    1.4432E-15
                    mean    0.030871    182.1733    3.2111E-05
c3 = −0.1           best    0.007992    144.7393    8.8817E-16
                    mean    0.027247    180.8293    9.6985E-05
c3 ∼ [−0.05, 0.05]  best    0.009003    146.1426    2.3314E-15
                    mean    0.032105    180.4098    0.000148
c3 ∼ [−0.1, 0.1]    best    0.007660    144.0002    9.9920E-16
                    mean    0.034546    174.4129    0.000149

5.2.3
Based on Current Position and Average of Current Velocities
In the third experiment, we utilize the current position in addition to the current average velocity. The new equation is as follows:
$$x_{ij} = x_{ij} + v_{ij} + c_3 \mathrm{RAND}()(x_{ij} - \bar{v}_j) \qquad (5.5)$$
where c3 can be a positive or negative constant, or adjusted during the algorithm's search; RAND() is a random function in the range [0, 1] and is different for each dimension and each particle; and v̄_j is the average of the current velocities for dimension j. Several representative results are shown in Table 5.3 and Table 5.4. Some significant improvements are achieved by this method; the best and mean results are much better than those of the standard particle swarm optimization algorithm.
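A minimal sketch of the position update of equation (5.5); the linearly decreasing c3 schedule mirrors the c3 = 0.4 ∼ −0.4 setting of Tables 5.3 and 5.4, and the function names are illustrative.

```python
import numpy as np

def position_update_with_control(x, v, c3, rng=None):
    """Position update of Eq. (5.5): x = x + v + c3 * RAND() * (x - v_bar)."""
    rng = rng or np.random.default_rng()
    v_bar = v.mean(axis=0)   # average current velocity on each dimension
    r = rng.random(x.shape)  # different for each particle and each dimension
    return x + v + c3 * r * (x - v_bar)

def c3_schedule(iteration, max_iter, start=0.4, end=-0.4):
    """c3 linearly decreasing from a positive to a negative value, mirroring
    the c3 = 0.4 ~ -0.4 setting of Tables 5.3 and 5.4."""
    return start + (end - start) * iteration / max_iter
```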
Figure 5.1: PSO population diversity control based on current position and average of current velocities. Global star structure: (a) f4 (Quartic noise) position diversity, (b) f8 (Noncontinuous Rastrigin) cognitive diversity, (c) f10 (Griewank) velocity diversity; local ring structure: (d) f4 cognitive diversity, (e) f8 velocity diversity, (f) f10 position diversity.
Table 5.3: Representative results of PSO (global star structure) with diversity control based on current position and average of current velocities. All algorithms are run over 50 times, where “mean” indicates the mean of the best fitness values found in the last generation. c3 ∼ [0.05, −0.05] indicates that c3 linearly decreases from 0.05 to −0.05 during the PSO search process; fmin = 0.

Result                      f4            f8          f10
Standard PSO        best    0.003306      98.0000     0
                    mean    3.763743      170.2000    18.0558
c3 = 0.4            best    15.0484       502.3478    156.2566
                    mean    44.2900       533.4721    233.7385
c3 = 0.2            best    0.019737      377.3507    0.000415
                    mean    15.8242       413.9585    54.3862
c3 = 0.1            best    0.001361      210.4438    0
                    mean    7.681045      272.0944    52.3284
c3 = 0.05           best    0.001999      132.0845    0
                    mean    8.593309      195.9888    54.2096
c3 = −0.05          best    9.2408E-05    27.1167     0
                    mean    0.001132      62.5534     0.005293
c3 = −0.1           best    3.8991E-05    36.2792     0
                    mean    0.000707      83.3589     0.004473
c3 = −0.2           best    1.9932E-05    28.0569     0
                    mean    0.000549      59.0719     0.004092
c3 = −0.4           best    5.1783E-05    15.2198     0
                    mean    0.000358      37.4257     0.003220
c3 = −0.8           best    0.070678      347.6730    0.022260
                    mean    2.289617      392.8169    42.3895
c3 = 0.05 ∼ −0.05   best    0.000418      32.5417     0
                    mean    0.002872      82.3480     0.012007
c3 = 0.1 ∼ −0.1     best    0.000453      19.3198     0
                    mean    0.002000      60.7679     0.012391
c3 = 0.2 ∼ −0.2     best    0.000216      21.2966     0
                    mean    0.001909      124.3974    0.009576
c3 = 0.4 ∼ −0.4     best    4.7276E-05    84.0947     0
                    mean    0.001344      187.7145    0.012821
c3 = 0.8 ∼ −0.8     best    1.101E-06     247.4993    0
                    mean    0.001127      307.9561    0.001575
Figure 5.1 shows the difference between the standard PSO and PSO with diversity control: first, for PSO with the global star structure, Figure 5.1 (a) shows the position diversity of the f4 function, Figure 5.1 (b) shows the cognitive diversity of the f8 function, and Figure 5.1 (c) shows the velocity diversity of the f10 function; second, for PSO with the local ring structure, Figure 5.1 (d) shows the cognitive diversity of the f4 function, Figure 5.1 (e) shows the velocity diversity of f8, and Figure 5.1 (f) shows the position diversity of the f10 function. From Table 5.3, Table 5.4, and Figure 5.1, it can be concluded that if c3 is positive, the diversity will be increased and particles search a wider area than the standard PSO does, and the results have a large variance.
Table 5.4: Representative results of PSO (local ring structure) with diversity control based on current position and average of current velocities. All algorithms are run over 50 times, where “mean” indicates the mean of the best fitness values found in the last generation. c3 ∼ [0.05, −0.05] indicates that c3 linearly decreases from 0.05 to −0.05 during the PSO search process; fmin = 0.

Result                      f4            f8          f10
Standard PSO        best    0.021183      139.2511    9.7699E-15
                    mean    0.078473      181.7243    1.7852E-14
c3 = 0.4            best    10.8490       474.6455    105.0195
                    mean    26.8031       550.2269    183.7515
c3 = 0.2            best    0.015253      366.2699    0.166868
                    mean    0.029428      390.5112    0.416948
c3 = 0.1            best    0.006296      247.5167    0
                    mean    0.013490      272.3373    0.000193
c3 = 0.05           best    0.012057      163.8312    0
                    mean    0.020866      194.9040    0.000201
c3 = −0.05          best    0.000468      99.77662    0
                    mean    0.002397      134.5600    2.2764E-06
c3 = −0.1           best    3.0822E-05    90.1938     0
                    mean    0.000527      117.1118    8.8288E-06
c3 = −0.2           best    8.8024E-06    75.1486     0
                    mean    0.000486      102.8416    0
c3 = −0.4           best    4.0767E-06    84.2474     0
                    mean    0.000362      102.2737    0
c3 = −0.8           best    0.006103      324.0427    0
                    mean    0.029982      358.4001    0.213375
c3 = 0.05 ∼ −0.05   best    0.002921      153.0000    0
                    mean    0.012222      185.6194    0.000347
c3 = 0.1 ∼ −0.1     best    0.002940      137.6087    0
                    mean    0.007120      192.0868    1.2374E-06
c3 = 0.2 ∼ −0.2     best    0.001008      159.4919    0
                    mean    0.003633      184.7151    3.0085E-06
c3 = 0.4 ∼ −0.4     best    0.001177      180.9459    0
                    mean    0.001970      255.3435    1.9579E-05
c3 = 0.8 ∼ −0.8     best    7.3074E-05    258.3948    0
                    mean    0.001389      320.8334    0.000225
If c3 is negative, the diversity will be decreased and particles converge faster than in the standard PSO, and the results have a small variance. If c3 is initialized with a positive value and decreases to negative values during the PSO search process, both the best and the mean results will be improved. The relationship between this new parameter and diversity control still requires further research. How to set this parameter c3, and how to adjust it for different problems during different states of the PSO search, is our future work.
5.2.4
Conclusions
Based on the population analysis in Chapter 3, a novel position update equation to modify the diversity during the PSO search process has been presented. The new equation has an extra parameter c3; the diversity can be increased or decreased by setting c3 to a positive or negative value, respectively. The idea of diversity measurement and control can also be applied to other evolutionary algorithms, e.g., genetic algorithms and differential evolution. Because evolutionary algorithms share the same concept of current population solutions, the position diversity can be measured and adjusted there as well. It can be beneficial to dynamically adjust an algorithm's ability of exploration or exploitation, especially when the problem to be solved is a difficult or large-scale problem.
5.3
Dynamical Exploitation Space Reduction
Swarm intelligence, which is based on a population of individuals, is a collection of nature-inspired search techniques, and Particle Swarm Optimization is one of the swarm intelligence algorithms [74, 133]. It is a population-based stochastic algorithm modeled on the social behaviors observed in flocking birds. Each particle represents a solution and flies through the search space with a velocity that is dynamically adjusted according to its own and its companions' historical behaviors. The particles tend to fly toward better search areas over the course of the search process [75]. Optimization, in general, concerns finding the “best available” solution(s) for a given problem within an allowable time, and the problem may have several or numerous optimum solutions, of which many are local optima. Evolutionary optimization algorithms generally find it difficult to obtain the global optimum solutions for multimodal problems due to the possible occurrence of premature convergence. Particles fly in the search space; if particles easily get clustered together in a short time, they will lose their “search potential.” Population premature convergence around a local optimum is a common problem for population-based algorithms. It is a result of individuals hastily congregating within a small region of the search space. An algorithm's exploration ability is decreased when premature convergence occurs, and particles will have a low possibility of exploring new search areas. Normally, diversity, which is lost due to particles getting clustered together, is not easy to recover; an algorithm may lose its search efficacy due to premature convergence. As a population becomes converged, an algorithm will spend most of its iterations searching in a small region. The “No Free Lunch” (NFL) theorem for optimization, which was introduced by Wolpert and Macready [232–235], has proved that under certain assumptions no algorithm is better than another on average over all problems. It means that we cannot find an algorithm suited to all problems without prior knowledge. However, the NFL theorem also implies that we can make a better algorithm if the algorithm knows the information of the problem. Algorithms should have an ability to learn to solve different problems; in other words, algorithms can adaptively change to suit the landscape of problems. Good algorithms should converge over the iterations. The concept of convergence in swarm intelligence should not mean that all individuals get clustered together, but that the fitness value gets better and better during the search process in order to find “good enough” solution(s). In other words, convergence should be considered in objective space. PSO with partial reinitialization, which is an effective way of promoting diversity, can increase the possibility of particles “jumping out” of local optima and keep the algorithm's ability to find “good enough” solutions. Compared with other evolutionary algorithms, e.g., the genetic algorithm, PSO has more search information, which includes not only the solution (position), but also the velocity and the previous solution (cognitive). Population diversities, which include position diversity, velocity diversity, and cognitive diversity, are utilized to measure this information, respectively. There are several definitions of the measurement of population diversities [30, 207, 208].
5.3.1 Diversity Maintenance
Convergence Analysis

An objective function is a mapping from the solution space to the objective space. The optimization (or search) process is to find the optimal objective function value in the objective space, or the best position in the solution space which corresponds to the optimal objective function value. Convergence is a measure of an algorithm's performance. It has a clear definition in mathematics; however, some misunderstanding still exists in optimization. Convergence should not mean that all particles get clustered in a small region, but that the global best, i.e., the particle which has the best fitness value, gets close to a "good enough" solution. The definition of convergence in mathematics is: a sequence in $\mathbb{R}^n$ is the specification of a point $x_k \in \mathbb{R}^n$ for each integer $k \in \{1, 2, \cdots\}$. A sequence of points $\{x_k\}$ in $\mathbb{R}^n$ is said to converge to a limit $x$ (written $x_k \to x$) if the distance $d(x_k, x)$ between $x_k$ and $x$ tends to zero as $k$ goes to infinity, i.e., if for all $\varepsilon > 0$ there exists an integer $k(\varepsilon)$ such that for all $k \geq k(\varepsilon)$ we have $d(x_k, x) < \varepsilon$ [216].
However, evolutionary algorithms are population-based algorithms for optimization and search problems, so many solutions exist at the same time. Unlike the definition in mathematics, an algorithm only has limited computational resources, which means the iteration number $k$ cannot go to infinity. The evolution stops when the algorithm has found "good" solution(s) or has reached the predefined maximum number of iterations. Zhang et al. state that almost all the individual points in the population will move to globally optimal points as time tends to infinity if an algorithm converges globally [255]. Chen et al. give a definition of convergence as follows: if $\lim_{t \to \infty} \bar{F}(t) = g^*$ holds for a given algorithm, where $\bar{F}(t)$ is the average fitness of individuals in the $t$-th generation and $g^*$ is the fitness of the global optimum, then we say that the algorithm converges to the global optimum [27]. The individuals may get clustered together after an algorithm converges globally; however, the two are not correlated. Individuals getting clustered together does not mean that the algorithm has converged; the individuals may be "stuck" in a local optimum. On the contrary, if the best fitness value is getting better and better, and "good enough" solution(s) are found at the end of the search, we can say that the algorithm has converged. Convergence does not require all individuals to find "good enough" solutions; one individual with the best fitness value is enough to show that the algorithm has converged. Convergence does not concern the distribution of individuals; it concerns the fitness values in the objective space. We can then define fast convergence as an algorithm using a small number of iterations to find "good enough" solution(s), and if an algorithm reaches "good enough" solution(s) many times in independent runs, the algorithm has a high potential to converge.

Diversity Promotion

Diversity is important in an algorithm's search; premature convergence may happen due to low diversity. Convergence only concerns the particle with the best fitness value; the other particles' search information may be redundant for the search. When particles get into a small region, the swarm loses its "search potential": most of the particles become useless and stop flying. At that time, the population diversity has a very small value. Therefore, we need to inject energy into the swarm, for example by re-initializing some particles. With the re-initializing strategy, the algorithm's diversity will be promoted.
The idea behind this approach is to increase the possibility of particles "jumping out" of local optima, and to keep the algorithm's ability of finding "good enough" solution(s). Algorithm 2 lists the process of PSO with the re-initialization strategy. After several iterations, part of the particles re-initialize their positions and velocities in the whole search space, which increases the possibility of particles "jumping out" of local optima. According to the way of keeping some particles, this mechanism can be divided into two kinds [32] (a sketch of both strategies is given after Algorithm 2 below):

• Random partial re-initialization: as its name indicates, random partial re-initialization reserves particles at random. This approach obtains a great ability of exploration, as most particles are re-initialized.

• Elitist partial re-initialization: on the contrary, elitist partial re-initialization keeps the particles with better fitness values. The algorithm increases its ability of exploration as part of the particles re-initialize in the whole search space, while at the same time particles are attracted by the particles with better fitness values.

Algorithm 2: Exploitation space dynamical reduction in particle swarm optimization
1  Initialization: initialize velocity and position randomly for each particle in every dimension;
2  while have not found "good enough" solution or have not reached the maximum iteration do
3    Calculate each particle's fitness value;
4    Compare the fitness value of the current position with the best position in history (personal best, termed pbest); for each particle, if the fitness value of the current position is better than pbest, update pbest to the current position;
5    Select the particle which has the best fitness value from the current particle's neighborhood;
6    for each particle do
7      Update the particle's velocity and position according to equations (2.4) and (2.5), respectively;
8      Divide the exploitation space into four parts, and count the number of particles appearing in each part for each dimension;
9    if some conditions are met, or after every specific number of iterations then
10     Compare the numbers of particles appearing on the two sides for each dimension, and remove the side which includes fewer particles, setting the remaining space as the new exploitation space;
11     Keep some particles' positions and velocities; re-initialize the others randomly in the new exploitation space;
Even though the exploitation space is changed in Algorithm 2, the original search boundary is kept to avoid premature convergence.
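A minimal sketch of the two partial re-initialization strategies follows; the keep fraction, the velocity range, and the function names are assumptions of this sketch, not the thesis's exact implementation. It assumes a minimization problem with one particle per row of the position and velocity matrices.

import numpy as np

def partial_reinit(pos, vel, fitness, lo, hi, keep=0.2, elitist=True):
    # Keep a fraction of the swarm and re-initialize the rest in the
    # whole search space [lo, hi]. elitist=True keeps the particles
    # with the best (lowest) fitness; elitist=False keeps a random set.
    n, dim = pos.shape
    n_keep = max(1, int(keep * n))
    if elitist:
        kept = np.argsort(fitness)[:n_keep]   # elitist: best particles survive
    else:
        kept = np.random.choice(n, n_keep, replace=False)  # random survivors
    mask = np.ones(n, dtype=bool)
    mask[kept] = False                        # particles to re-initialize
    pos[mask] = np.random.uniform(lo, hi, (mask.sum(), dim))
    vel[mask] = np.random.uniform(-(hi - lo), hi - lo, (mask.sum(), dim))
    return pos, vel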
Exploitation Enhancement

An algorithm may lose its search efficiency when the problem's dimension increases to a large scale. Partial re-initialization can improve an algorithm's exploration ability, since it can search more regions. The exploitation ability is also important in search; it helps the algorithm to refine solutions. A learning strategy is proposed in this section to enhance the exploitation ability. In particle swarm optimization, particles fly in the search space. If many particles stay in a region, we can conclude that this region may contain good solution(s). On the contrary, if particles rarely stay in a region, or quickly leave it, we can conclude that this region may not contain good solution(s). With this assumption, we propose a strategy to reduce the exploitation space dynamically.
Figure 5.2: For each dimension, the exploitation space is divided into four equal parts.

Figure 5.2 displays the exploitation space, which is divided into four equal parts. We count the number of particles appearing in each part at each iteration. If particles appear in the first part more often than in the last part, we consider that the last part may not contain good solution(s) and abandon the last part. If more particles appear in the last part than in the first part, we abandon the first part. The particles are then partially re-initialized in the new space, and the new space is divided into four parts again in the next iteration. Compared with the whole search space, this space is reduced to a small region. This strategy can be seen as PSO with a learning process: particles are guided to search more promising regions, which enhances the algorithm's ability of exploitation. In this chapter, we partially re-initialize particles every fixed number of iterations, and compare the results of particles re-initializing in the whole search space with re-initializing in the dynamically reduced space.
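The per-dimension counting and shrinking step can be sketched as follows; it is a plain reading of the rule above (abandon whichever outer quarter currently holds fewer particles), with ties resolved arbitrarily, and is not necessarily the thesis's exact code.

import numpy as np

def reduce_space(pos, lo, hi):
    # Shrink the exploitation space by one quarter in each dimension.
    # lo and hi are per-dimension bound arrays; pos holds one particle
    # per row.
    lo, hi = lo.copy(), hi.copy()
    quarter = (hi - lo) / 4.0
    for j in range(pos.shape[1]):
        first = np.sum(pos[:, j] < lo[j] + quarter[j])  # particles in part 1
        last = np.sum(pos[:, j] > hi[j] - quarter[j])   # particles in part 4
        if first >= last:
            hi[j] -= quarter[j]   # abandon the last quarter
        else:
            lo[j] += quarter[j]   # abandon the first quarter
    return lo, hi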
5.3.2 Experimental Study
Benchmark Test Functions and Parameter Setting

The experiments have been conducted on the benchmark functions listed in Appendix A.1. Without loss of generality, five standard unimodal and five multimodal test functions are selected [152, 249]. All functions are run 50 times to obtain the statistics needed to compare the different approaches. Random shifting of the
location of the optimum is utilized in each dimension for each run. In all experiments, the PSO has 50 particles, and the parameters are set as in the standard PSO: w = 0.72984 and c1 = c2 = 1.496172 [21, 44]. Each algorithm runs for 20,000 iterations on the 200-dimensional problems in every run to ensure "enough" search. For large scale problems, many computational hours are needed to find "good enough" solutions. All functions are tested with dimension 200; because of the limitation of time, only functions f4 and f9 are also tested with dimension 500, with each algorithm running for 50,000 iterations in every run.

Experimental Results

As we are interested in finding an optimizer that will not be easily deceived by local optima, we use three measures of performance. The first is the best fitness value attained after a fixed number of iterations; in our case, we report the best result found after 20,000 iterations (50,000 iterations for the 500-dimensional problems). The second and the third are the median and mean of the best fitness values over all runs. Because of space limits, Tables 5.5, 5.6, 5.7, 5.8, and 5.9 give the results of the variants of PSO solving the unimodal and multimodal benchmark functions. The re-initialization strategy is applied after every 1,000 iterations (2,500 iterations for the 500-dimensional problems). In the first column of the tables, "elitism" means keeping the particles whose fitness values are better than the others', and "random" means randomly choosing the particles to be re-initialized. The 'S' in "elitismS", "elitismRS", "randomS", and "randomRS" means stable: 20% of the particles' positions and velocities are kept during re-initialization. In the other experiments, 5% of the particles are kept at first, and this fraction increases by 5% after each re-initialization, so that 95% of the particles are kept at the last re-initialization. The 'R' in "elitismR", "elitismRS", "randomR", and "randomRS" means reduction: the exploitation space is reduced by 1/4 after each re-initialization. Tables 5.5, 5.6, 5.7, and 5.8 give the optimization results for the 200-dimensional problems; Table 5.9 gives the optimization results for functions f4 and f9 with dimension 500. From the results, we can see that the partial re-initialization strategy can improve the algorithm's performance. Re-initialization with the exploitation space reduction strategy achieves the best results most often, both for the best and the mean fitness values. As the dimension increases, the problems get more difficult and the algorithm more easily gets "stuck in" local optima. With the re-initialization strategy, the ability of exploration is improved; at the same time, the strategy of dynamically reducing the exploitation space is a kind of learning ability, so the exploitation ability is enhanced.
5.3.3 Diversity Analysis and Discussion
Compared with other evolutionary algorithms, e.g., the genetic algorithm, PSO has more search information, which includes not only the solution (position), but also the velocity and the previous solution (cognitive).
Table 5.5: Results of variants of PSO with star structure solving unimodal benchmark functions. All algorithms are run for 50 times, where "best", "median", and "mean" indicate the best, median, and mean of the best fitness values for all runs, respectively.

Global Star Structure
Func. (fmin)   PSO         Best         Median       Mean         Std. Dev.
f0 (-450.0)    standard    -396.6512    162.9360     967.0205     2150.29
               elitism     -432.5566    -405.9650    -395.7373    36.0895
               elitismS    -436.8762    -389.6956    -389.7465    23.4878
               elitismR    -448.9060    -435.6894    -436.1783    7.21397
               elitismRS   -444.1402    -436.4156    -435.8136    4.36639
               random      -433.5336    -414.2899    -411.3826    14.1847
               randomS     -420.1116    -396.4617    -392.8778    18.4929
               randomR     -432.5671    -401.3262    -403.2554    13.0089
               randomRS    -433.1057    -406.4670    -406.1264    11.9736
f1 (-330.0)    standard    -324.2892    -240.0715    -244.8487    40.8985
               elitism     -326.7687    -307.5884    -303.3576    21.8268
               elitismS    -324.6042    -297.8940    -290.9013    22.9367
               elitismR    -328.2870    -321.7155    -317.7022    11.0450
               elitismRS   -328.2459    -324.0966    -320.9108    8.37401
               random      -325.7084    -310.1950    -308.2199    15.1650
               randomS     -323.9110    -304.0339    -301.2244    17.4156
               randomR     -307.5741    -234.1052    -232.4679    31.3117
               randomRS    -302.3322    -222.5054    -229.3929    26.6733
f2 (-450.0)    standard    24661.556    104118.94    108033.40    46946.08
               elitism     33507.801    49367.068    55698.004    17750.09
               elitismS    24638.182    43829.082    46409.201    20502.85
               elitismR    16607.916    51591.162    65903.277    41096.28
               elitismRS   24261.573    43995.148    46348.846    14794.26
               random      24398.385    45544.564    46149.681    13514.70
               randomS     22584.004    41793.371    41368.493    9110.691
               randomR     29889.627    43437.023    52605.030    32438.44
               randomRS    24933.168    36941.390    39195.549    9681.160
f3 (330.0)     standard    1156         5602         8793.02      7880.39
               elitism     426          530          537.8        95.797
               elitismS    368          481          518.58       120.347
               elitismR    376          444          463.48       59.863
               elitismRS   372          422          431.1        50.340
               random      375          446          451.92       48.419
               randomS     367          478          487.5        61.811
               randomR     581          643          637.36       14.518
               randomRS    541          641          624.36       29.800
f4 (-450.0)    standard    -449.9200    -449.7666    -449.5825    0.54627
               elitism     -449.9406    -449.9189    -449.9129    0.02617
               elitismS    -449.9606    -449.9308    -449.9269    0.02021
               elitismR    -449.9689    -449.9473    -449.9467    0.01277
               elitismRS   -449.9796    -449.9654    -449.9646    0.00652
               random      -449.9527    -449.9269    -449.9239    0.01675
               randomS     -449.9569    -449.9312    -449.9258    0.02639
               randomR     -449.9568    -449.9235    -449.9260    0.00699
               randomRS    -449.9677    -449.9368    -449.9407    0.01276
Table 5.6: Results of variants of PSO with ring structure solving unimodal benchmark functions. All algorithms are run for 50 times, where "best", "median", and "mean" indicate the best, median, and mean of the best fitness values for all runs, respectively.

Local Ring Structure
Func. (fmin)   PSO         Best         Median       Mean         Std. Dev.
f0 (-450.0)    standard    -450         -450         -450         4.071E-13
               elitism     -450         -450         -450         4.340E-13
               elitismS    -450         -450         -450         4.240E-13
               elitismR    -450         -450         -450         4.226E-13
               elitismRS   -450         -450         -450         4.049E-13
               random      -450         -450         -450         4.079E-13
               randomS     -450         -450         -450         4.366E-13
               randomR     -449.9999    -449.8258    -449.6071    0.550956
               randomRS    -450         -449.9716    -449.8682    0.275698
f1 (-330.0)    standard    -329.9999    -325.6656    -314.3573    19.3804
               elitism     -329.9999    -328.5221    -314.5340    24.6827
               elitismS    -329.9999    -314.3214    -309.7285    21.9638
               elitismR    -329.9999    -329.9999    -329.9999    2.687E-13
               elitismRS   -329.9999    -329.9999    -329.9999    5.414E-07
               random      -329.9999    -324.9412    -309.0613    29.7416
               randomS     -329.9999    -318.5351    -312.8336    21.3848
               randomR     -329.5036    -323.1047    -318.9794    9.54560
               randomRS    -329.9591    -323.9673    -319.2941    13.3619
f2 (-450.0)    standard    145745.89    195993.36    198457.73    29145.70
               elitism     133580.02    184699.67    185370.78    24111.99
               elitismS    127853.46    173393.35    174266.19    21868.74
               elitismR    56002.083    81554.375    105191.15    46426.37
               elitismRS   98671.062    172486.86    170649.22    29881.99
               random      122989.15    167454.21    166450.56    18070.66
               randomS     120146.47    160691.76    162846.23    21937.51
               randomR     54480.255    74050.437    95816.061    43413.27
               randomRS    57954.202    152036.35    151112.29    27352.9808
f3 (330.0)     standard    332          347          350.86       17.716
               elitism     332          349          351.26       12.252
               elitismS    333          343          344.96       7.7561
               elitismR    337          352          365.94       38.937
               elitismRS   335          346          349.54       12.840
               random      330          338          341.92       15.095
               randomS     332          338          343.86       23.823
               randomR     541          632          622.86       23.625
               randomRS    522          575          578.24       29.344
f4 (-450.0)    standard    -449.8026    -449.6859    -449.6842    0.06334
               elitism     -449.8093    -449.6926    -449.6904    0.07247
               elitismS    -449.8098    -449.7020    -449.7009    0.05543
               elitismR    -449.9630    -449.9441    -449.9432    0.01003
               elitismRS   -449.9582    -449.9513    -449.9505    0.00478
               random      -449.7825    -449.7189    -449.7101    0.04943
               randomS     -449.8127    -449.7120    -449.7094    0.05156
               randomR     -449.9514    -449.9228    -449.9250    0.00581
               randomRS    -449.9603    -449.9399    -449.9411    0.00933
Table 5.7: Results of variants of PSO with star structure solving multimodal benchmark functions. All algorithms are run for 50 times, where "best", "median", and "mean" indicate the best, median, and mean of the best fitness values for all runs, respectively.

Global Star Structure
Func. (fmin)   PSO         Best          Median        Mean          Std. Dev.
f5 (-330.0)    standard    530.1117      2567.599      5041.327      7894.732
               elitism     307.7678      832.0563      909.1455      411.6821
               elitismS    467.7053      896.8462      950.6690      383.0467
               elitismR    237.4827      424.5731      428.0895      64.3925
               elitismRS   175.0196      404.1930      387.3405      74.3629
               random      188.4822      712.6283      763.2868      326.1787
               randomS     245.5726      652.8933      727.8075      324.2605
               randomR     286.1139      585.9211      581.4446      95.0018
               randomRS    250.7470      617.2641      565.0439      119.8512
f6 (120.0)     standard    1287.8228     1713.9521     1690.4335     191.9500
               elitism     1178.1983     1606.6760     1596.5654     208.5966
               elitismS    1182.4966     1581.1626     1561.4312     200.3136
               elitismR    1197.7698     1691.2389     1634.9397     171.6530
               elitismRS   1255.9851     1507.3436     1519.4253     122.3195
               random      1287.0368     1530.6305     1555.8450     183.1440
               randomS     1235.3700     1554.1859     1560.7901     179.9893
               randomR     979.4434      1742.5405     1622.6327     259.5250
               randomRS    1156.1161     1627.6344     1581.9139     173.7164
f7 (-330.0)    standard    -310.7626     -310.2155     -310.2250     0.09793
               elitism     -310.5381     -310.2082     -310.2303     0.08587
               elitismS    -310.7134     -310.2478     -310.2541     0.11104
               elitismR    -310.5798     -310.2216     -310.2344     0.08652
               elitismRS   -310.9013     -310.2111     -310.2369     0.12554
               random      -310.8754     -310.2343     -310.2587     0.12494
               randomS     -310.4785     -310.2177     -310.2423     0.09322
               randomR     -311.1126     -310.2257     -310.2679     0.18677
               randomRS    -310.4650     -310.2347     -310.2405     0.07864
f8 (-450.0)    standard    -448.8958     -442.3428     -435.9744     23.4516
               elitism     -448.8545     -448.5345     -448.4655     0.45666
               elitismS    -448.8107     -448.4935     -448.4663     0.18078
               elitismR    -449.7932     -449.0991     -449.2070     0.30399
               elitismRS   -449.7519     -449.0632     -449.1220     0.21779
               random      -448.8595     -448.6393     -448.6336     0.13600
               randomS     -448.7562     -448.5470     -448.4902     0.20138
               randomR     -449.1530     -448.6938     -448.7215     0.14011
               randomRS    -449.0225     -448.7105     -448.7010     0.10913
f9 (180.0)     standard    180.5779      182.2747      182.3334      1.05064
               elitism     180.5490      181.7933      182.0612      1.11163
               elitismS    180.1273      180.8066      180.9525      0.61830
               elitismR    180.2445      180.7614      180.7481      0.24774
               elitismRS   180.0976      180.4592      180.4723      0.19466
               random      180.0179      180.5369      180.6387      0.42691
               randomS     180.0577      180.4841      180.5663      0.39055
               randomR     180.2905      181.0011      181.3749      0.94949
               randomRS    180.1987      180.6563      180.6345      0.30412
Table 5.8: Results of variants of PSO with ring structure solving multimodal benchmark functions. All algorithms are run for 50 times, where "best", "median", and "mean" indicate the best, median, and mean of the best fitness values for all runs, respectively.

Local Ring Structure
Func. (fmin)   PSO         Best          Median        Mean          Std. Dev.
f5 (-330.0)    standard    -85.4149      116.3527      118.0499      70.0740
               elitism     -42.7788      94.9119       107.8782      86.6089
               elitismS    -37.1115      108.8648      120.3515      98.0489
               elitismR    49.0892       54.2064       83.0209       35.6439
               elitismRS   39.4321       107.6642      92.7237       42.8395
               random      -44.3760      106.5194      106.4433      68.9579
               randomS     -53.1758      102.5059      109.5819      87.3244
               randomR     58.5671       169.8279      158.4124      48.0036
               randomRS    46.6260       120.7200      131.0229      51.5596
f6 (120.0)     standard    1066.6763     1332.0078     1334.5447     125.6226
               elitism     1012.0764     1302.1408     1315.0352     139.7578
               elitismS    1061.2819     1318.6356     1301.6603     127.4058
               elitismR    1031.6478     1516.5832     1439.2557     197.6544
               elitismRS   1119.1023     1355.8074     1357.5523     116.7431
               random      935.5948      1206.0604     1217.4671     114.5881
               randomS     881.8618      1222.4484     1218.9190     134.7725
               randomR     961.8243      1561.7298     1415.5927     249.7555
               randomRS    995.0481      1501.1725     1438.9219     203.9936
f7 (-330.0)    standard    -326.8737     -310.2526     -311.2310     3.80208
               elitism     -327.1184     -310.2689     -310.6737     2.36148
               elitismS    -327.8750     -310.2752     -311.0091     3.34922
               elitismR    -327.5319     -310.0083     -311.0497     4.03891
               elitismRS   -326.8232     -310.1363     -311.1175     3.90695
               random      -327.3005     -310.3392     -311.3687     3.94883
               randomS     -310.8845     -310.2874     -310.3525     0.14253
               randomR     -327.3665     -310.1062     -310.5687     2.51200
               randomRS    -327.6128     -310.1606     -310.8133     3.36525
f8 (-450.0)    standard    -450          -450          -449.9998     0.001035
               elitism     -450          -450          -449.9998     0.001035
               elitismS    -450          -450          -449.9908     0.064307
               elitismR    -450          -449.9827     -449.9475     0.072668
               elitismRS   -450          -449.9827     -449.9632     0.077175
               random      -450          -450          -449.9939     0.042642
               randomS     -450          -450          -449.9935     0.042633
               randomR     -449.9999     -449.8118     -449.7889     0.155654
               randomRS    -449.9878     -449.8468     -449.8398     0.119247
f9 (180.0)     standard    180.3176      181.9244      182.2738      1.387543
               elitism     180.1374      182.0000      182.3215      1.285333
               elitismS    180.1579      181.6051      181.6485      0.885879
               elitismR    180.0003      180.0625      180.0551      0.020212
               elitismRS   180.0003      180.0625      180.0391      0.030421
               random      180.1377      181.2355      181.3079      0.704538
               randomS     180.1289      180.8183      181.0150      0.653206
               randomR     180.0003      180.0625      180.0758      0.052563
               randomRS    180.0003      180.0625      180.0488      0.025761
Table 5.9: Results of variants of PSO solving large scale benchmark functions. All algorithms are run for 50 times, where "best", "median", and "mean" indicate the best, median, and mean of the best fitness values for all runs, respectively.

Func. (fmin)   Topology   PSO         Best          Median        Mean          Std. Dev.
f4 (-450.0)    Star       standard    -447.3613     -422.2333     -384.0627     90.2907
                          elitism     -449.7081     -449.5676     -449.5631     0.09106
                          elitismS    -449.7672     -449.6687     -449.6675     0.06249
                          elitismR    -449.8564     -449.7924     -449.7038     0.26527
                          elitismRS   -449.8894     -449.8169     -449.8123     0.03890
                          random      -449.7393     -449.6504     -449.6522     0.06104
                          randomS     -449.7793     -449.6949     -449.6841     0.06609
                          randomR     -448.5790     -446.5560     -446.6972     0.42539
                          randomRS    -449.4186     -446.6553     -447.3046     0.94508
               Ring       standard    -449.0737     -448.5077     -448.3724     0.49739
                          elitism     -449.1327     -448.5463     -448.4714     0.40334
                          elitismS    -449.1194     -448.6034     -448.5844     0.30239
                          elitismR    -449.8186     -449.7270     -449.6757     0.18786
                          elitismRS   -449.7503     -449.6764     -449.6635     0.11839
                          random      -449.0598     -448.6143     -448.6035     0.24851
                          randomS     -449.1447     -448.6534     -448.6045     0.32805
                          randomR     -449.5874     -446.5793     -446.7937     0.63186
                          randomRS    -449.7083     -446.8610     -447.6682     1.16541
f9 (180.0)     Star       standard    182.2265      186.1952      44877.72      183432.1
                          elitism     182.2769      236.3979      1056.631      2583.94
                          elitismS    180.8355      187.5741      3085.429      18298.5
                          elitismR    180.5561      181.7502      181.8445      0.97390
                          elitismRS   180.4983      182.4056      182.6086      1.54583
                          random      181.8013      184.8506      184.9795      1.41452
                          randomS     181.7606      183.9196      183.9366      1.47493
                          randomR     180.7785      186.3966      186.1787      2.83960
                          randomRS    180.7427      183.4781      183.9156      2.77491
               Ring       standard    224.6187      7440.065      48781.82      118797.1
                          elitism     188.5252      2234.821      35987.29      69794.11
                          elitismS    184.2550      5795.195      17085.96      37495.81
                          elitismR    180.0003      180.0378      180.0430      0.037395
                          elitismRS   180.0273      180.3886      180.6344      0.765846
                          random      185.9102      1132.832      18744.79      42600.37
                          randomS     185.0061      3129.990      7511.603      12855.75
                          randomR     180.0016      180.6307      180.6370      0.392614
                          randomRS    180.0603      180.5313      180.9238      1.235349
The population diversities, which include position diversity, velocity diversity, and cognitive diversity, are utilized to measure these pieces of information, respectively. There are several definitions of the measurement of population diversity [30, 207, 208]. The dimension-wise population diversity based on the L1 norm is utilized in this chapter. Without loss of generality, and for the purpose of simplicity and clarity, the results for one function out of the five unimodal benchmark functions and one function out of the five multimodal functions are displayed, because the results for the others are similar. Figures 5.3 and 5.4 display the comparison of population diversities for PSO solving the unimodal function f4 and the multimodal function f9, respectively. In both figures, (a), (b), and (c) are for PSO with elitist partial re-initialization, and (d), (e), and (f) are for PSO with random partial re-initialization. By looking at the shapes of the population diversity curves in Figures 5.3 and 5.4, it is easy to see that the position diversity and velocity diversity of PSO with partial re-initialization fluctuate more than those of the classical PSO. The cognitive diversity can reach a very tiny value in PSO with exploitation space reduction. Particles get clustered together during the search, which may cause premature convergence when PSO is utilized to solve computationally expensive problems. With the aim of keeping the particles' search ability and enlarging the particles' exploration and exploitation abilities, partial re-initialization and exploitation space reduction are utilized. The strong fluctuation of the position and velocity diversities indicates that the particles are spread over a large space; this strategy enlarges the algorithm's exploration ability. Meanwhile, the cognitive diversity is reduced to a small value, which means that the algorithm's "search targets" are getting closer together, so the algorithm's exploitation ability is also improved.
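For concreteness, one common form of the dimension-wise L1 diversity is sketched below: the mean absolute deviation from the population mean, computed per dimension and then averaged over dimensions. The exact normalization used in the thesis may differ.

import numpy as np

def l1_diversity(points):
    # points: one row per particle (positions, velocities, or pbests).
    mean = points.mean(axis=0)                    # population centre
    per_dim = np.abs(points - mean).mean(axis=0)  # L1 spread per dimension
    return per_dim.mean()

# The same measure yields the three diversities:
#   position:  l1_diversity(positions)
#   velocity:  l1_diversity(velocities)
#   cognitive: l1_diversity(pbests)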
Figure 5.3: Comparison of population diversities changing for PSO solving unimodal function f4. PSO with elitist partial re-initializing strategy: (a) position diversity, (b) velocity diversity, (c) cognitive diversity; PSO with random partial re-initializing strategy: (d) position diversity, (e) velocity diversity, (f) cognitive diversity.

Figure 5.4: Comparison of population diversities changing for PSO solving multimodal function f9. PSO with elitist partial re-initializing strategy: (a) position diversity, (b) velocity diversity, (c) cognitive diversity; PSO with random partial re-initializing strategy: (d) position diversity, (e) velocity diversity, (f) cognitive diversity.

5.3.4 Conclusions

In this chapter, we proposed a method to dynamically reduce PSO's exploitation space. For large scale problems, algorithms easily suffer premature convergence due to the high-dimensional search space. An algorithm's abilities of exploration and exploitation should be considered together during the search process. Particles getting clustered together in the solution space do not indicate that an algorithm has converged; an algorithm's convergence relates only to the particles' fitness values. Some strategies can be applied to the particles to maintain the algorithm's abilities of exploration and exploitation without, or only rarely, affecting the algorithm's convergence. Two strategies are utilized in this chapter to promote PSO's abilities of exploration and exploitation.
The partial re-initialization strategy is utilized to enlarge the exploration ability, and the dynamical exploitation space reduction strategy is utilized to enhance the exploitation ability. The experimental study shows that the algorithm's performance is improved by these strategies. In this chapter, the particles are re-initialized after every fixed number of iterations; however, the re-initialization strategy may be more effective if the number of iterations is adapted based on the population diversities.
5.4 Adaptive Inertia Weight
From the population diversity measurements, the exploration or exploitation status can be recognized. An adaptive inertia weight PSO based on this exploration or exploitation status is proposed in this chapter. The particles' inertia weights are initialized as uniformly distributed random numbers in the range [0.4, 0.9]. At the beginning of the search, the minimum inertia weight is increased to enhance the exploration ability; on the contrary, at the end of the search, the maximum inertia weight is decreased to enhance the exploitation ability. The adaptive inertia weight PSO is compared with the standard PSO. The adaptive inertia weight PSO can find solutions as good as those of the standard PSO [21] with the star structure, and for some multimodal functions, the solutions are even better. The exploration or exploitation status can also be roughly recognized by the distance between the gbest and the centre of the swarm. If the gbest is close to the centre of the swarm, the other particles will fly to the gbest, i.e., toward the centre of the swarm; in most cases, the exploitation ability should be enhanced in this situation. On the contrary, if the gbest is far from the centre of the swarm, most particles should leave their own positions, and the exploration ability of the algorithm should be enhanced. An adaptive inertia weight strategy based on the distance between the gbest and the centre of the swarm is also proposed in this chapter. A sigmoid function is utilized to map the distance ratio in [0, 1] to an inertia weight w within the range [0.4, 0.9].
5.4.1 Diversity Analysis
The optimization (or search) process is to find the good enough position in the solution space which corresponds to the optimal objective function value. A good algorithm should have the ability of exploration to find the areas which may contain the good enough solution, and the ability of exploitation is also needed to refine the area which may contain the good solution. In general, the algorithm should focus on exploration at the beginning of the search, and focus on exploitation at the end of the search. However, it is difficult to determine whether the algorithm is in the exploration or the exploitation state. From the changes of position diversity and cognitive diversity, the speed of swarm convergence or divergence can be observed. The changes of position
diversity and cognitive diversity can be divided into four cases:

1. Position diversity increasing while cognitive diversity is increasing, i.e., position diversity and cognitive diversity increase at the same time.
2. Position diversity decreasing while cognitive diversity is decreasing, i.e., position diversity and cognitive diversity decrease at the same time.
3. Position diversity increasing while cognitive diversity is decreasing.
4. Position diversity decreasing while cognitive diversity is increasing.

For the first two cases: if the position diversity and cognitive diversity increase at the same time, the swarm is diverging, i.e., the algorithm is in the exploration state; on the contrary, if the position diversity and cognitive diversity decrease at the same time, the swarm is converging, i.e., the algorithm is in the exploitation state. The inertia weight should be adaptively changed while in these two states.
5.4.2 Particles with Different Inertia Weight
An algorithm should have a good balance between the abilities of exploration and exploitation. In general, an algorithm should explore more space at the beginning of the search to find the regions that may contain good enough solutions; exploring more space increases the possibility of the algorithm "jumping out" of local optima. On the contrary, at the end of the search, the algorithm should focus on the ability of exploitation to refine the found region. The ability of exploration should be enhanced at the beginning of the search, and the ability of exploitation should be enhanced at the end of the search. The particle swarm optimization algorithm's abilities of exploration and exploitation can be adjusted by changing the inertia weights. With a larger inertia weight, particles have a higher possibility to diverge, i.e., more space will be explored. If the inertia weight is small, the velocities of the particles decrease quickly, i.e., the swarm gets converged quickly.

In this chapter, we propose a new adaptive inertia weight in PSO; the pseudocode is listed in Algorithm 3. For each particle, a uniformly distributed random number in the range [0.4, 0.9] is assigned as the inertia weight at the initialization. If the global best fitness improves after one iteration, all inertia weights are kept for the next iteration. On the contrary, if the global best fitness does not improve after one iteration, the inertia weights are adjusted according to the situation: at the beginning of the search, the minimum inertia weight is increased to encourage exploration; at the end of the search, the maximum inertia weight is decreased to encourage exploitation. This strategy can be seen as PSO with a learning process. The algorithm's ability of exploration is enhanced at the beginning of the search, so particles are guided to search more promising regions; the ability of exploitation is enhanced at the end of the search, so particles are guided to refine the promising region found.

Algorithm 3: Adaptive inertia weight in particle swarm optimization
1  Initialization: initialize velocity and position randomly for each particle in every dimension; initialize each particle's inertia weight as a uniformly distributed random number in the range [0.4, 0.9];
2  while have not found "good enough" solution or have not reached the maximum iteration do
3    Calculate each particle's fitness value;
4    Compare the fitness value of the current position with the best position in history (personal best, termed pbest); for each particle, if the fitness value of the current position is better than pbest, update pbest to the current position;
5    Select the particle which has the best fitness value from the current particle's neighborhood;
6    for each particle do
7      Update the particle's velocity and position according to equations (2.4) and (2.5), respectively;
8    if the global best fitness did not improve after one iteration then
9      if position diversity and cognitive diversity increase or decrease at the same time then
10       if at the beginning of search, i.e., the number of iterations is less than a threshold α then
11         Update the range of inertia weight: increase the minimum inertia weight of the particles;
12       else if at the end of search, i.e., the number of iterations is larger than a threshold β then
13         Update the range of inertia weight: decrease the maximum inertia weight of the particles;
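A minimal sketch of the weight-adjustment step of Algorithm 3 follows. The default thresholds, the 0.01 step, and the interpretation of "increase the minimum / decrease the maximum" as clamping the per-particle weights are assumptions of this sketch, not the thesis's exact implementation.

import numpy as np

def adapt_weights(weights, iteration, improved, same_direction,
                  alpha=1500, beta=2500, step=0.01):
    # weights: per-particle inertia weights, initialized uniformly in
    # [0.4, 0.9]. improved: did the global best improve this iteration?
    # same_direction: did position and cognitive diversity move together?
    if improved or not same_direction:
        return weights                       # keep the current weights
    if iteration < alpha:
        # early search: raise the floor to encourage exploration
        weights = np.maximum(weights, weights.min() + step)
    elif iteration > beta:
        # late search: lower the ceiling to encourage exploitation
        weights = np.minimum(weights, weights.max() - step)
    return weights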
5.4.3 Particles with Similar Inertia Weight
The inertia weight can also be adaptively changed based on the distribution of the particles' positions or velocities [251, 261]. Firstly, the distance between each particle and the swarm centre is calculated as follows:

\mathrm{distance}_i = \sum_{j=1}^{n} |x_{ij} - \bar{x}_j|    (5.6)
Particles follow the gbest in the swarm. The exploration or exploitation status can also be roughly recognized by the distance between the gbest and the centre of the swarm. If the gbest is close to the centre, the other particles will search around the gbest, which means good solutions may exist in the region close to the gbest; the ability of exploitation should be enhanced to refine that region. On the contrary, if the gbest is far from the centre, most particles should leave their positions to find good solutions; the ability of exploration should be enhanced to move to new regions quickly. The distance ratio of the gbest can be measured by (5.7) or (5.8):

\mathrm{ratio} = \frac{\mathrm{distance}_{gbest}}{\mathrm{distance}_{max}}    (5.7)

\mathrm{ratio} = \frac{\mathrm{distance}_{gbest} - \mathrm{distance}_{min}}{\mathrm{distance}_{max} - \mathrm{distance}_{min}}    (5.8)

If the gbest is close to the centre, the ratio will be close to 0, and if the gbest is far from the centre, the ratio will be close to 1. A sigmoid function (5.9) is utilized to map the ratio into the range [0.4, 0.9] [251, 261]:

w = \frac{1}{1 + 1.5 \exp(-2.2 \times \mathrm{ratio})} \in [0.4, 0.9]    (5.9)

With this setting, the inertia weight is adaptively changed in the range [0.4, 0.9]: if the gbest is close to the centre, w will be close to 0.4, and the exploitation ability will be enhanced; on the contrary, if the gbest is far from the centre, w will be close to 0.9, and the exploration ability will be enhanced.
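A compact sketch of this shared-weight strategy, implementing equations (5.6), (5.8), and (5.9), is given below; the small epsilon guarding against a zero denominator is an addition of the sketch.

import numpy as np

def sigmoid_weight(pos, gbest_index):
    centre = pos.mean(axis=0)
    dist = np.abs(pos - centre).sum(axis=1)          # equation (5.6)
    ratio = ((dist[gbest_index] - dist.min())
             / (dist.max() - dist.min() + 1e-12))    # equation (5.8)
    return 1.0 / (1.0 + 1.5 * np.exp(-2.2 * ratio))  # equation (5.9)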
5.4.4 Experimental Study
Benchmark Test Functions

The experiments have been conducted on the benchmark functions listed in Table A.1. Without loss of generality, five standard unimodal and six multimodal test functions are selected [152, 249]. All functions are run 50 times to obtain the statistics needed to compare the different approaches. Random shifting of the location of the optimum is utilized in each dimension for each run.
Parameter Setting

The proposed PSOs are compared with the standard PSO. In all experiments, the PSO has 48 particles, and c1 = c2 = 1.496172 [21, 44]. For the standard PSO [21], the inertia weight is w = 0.72984. For particles with different inertia weights (denoted as Adaptive inertia weight PSO I in Table 5.10 and Table 5.11, or simply adaptive I), the weights are initialized as uniformly distributed random numbers in the range [0.4, 0.9]. Each algorithm runs for 4,000 iterations on 100-dimensional problems in every run. The inertia weights adaptively change when the global best is not improved in one iteration. In the first 1,500 iterations, if the particles' position diversity and cognitive diversity increase or decrease at the same time, the minimum inertia weight is increased by 0.01. In the last 1,500 iterations, if the particles' position diversity and cognitive diversity increase or decrease at the same time, the maximum inertia weight is decreased by 0.01. For particles with the same inertia weight, the strategy with equation (5.7) is denoted as Adaptive inertia weight PSO II (or simply adaptive II), with c1 = c2 = 1.496172, and the strategy with equation (5.8) is denoted as Adaptive inertia weight PSO III (or simply adaptive III), with c1 = c2 = 2.0.

Boundary Constraints Handling

With an improper boundary constraints handling method, particles may get "stuck in" the boundary [31]. The classical boundary constraints handling method is as follows:

x_{i,j}(t+1) = \begin{cases} X_{max,j} & \text{if } x_{i,j}(t+1) > X_{max,j} \\ X_{min,j} & \text{if } x_{i,j}(t+1) < X_{min,j} \\ x_{i,j}(t+1) & \text{otherwise} \end{cases}    (5.10)

where t is the number of the last iteration and t + 1 is the number of the current iteration. This strategy resets particles at a particular point, the boundary, which constrains particles to fly in the search space limited by the boundary. A stochastic boundary constraints handling method for PSO with the global star structure is utilized in this chapter. Equation (5.11) gives a method by which particles are reset into a special area:

x_{i,j}(t+1) = \begin{cases} X_{max,j} \times (\mathrm{rand}() \times c + 1 - c) & \text{if } x_{i,j}(t+1) > X_{max,j} \\ X_{min,j} \times (\mathrm{Rand}() \times c + 1 - c) & \text{if } x_{i,j}(t+1) < X_{min,j} \\ x_{i,j}(t+1) & \text{otherwise} \end{cases}    (5.11)

where c is a parameter to control the resetting scope. In our experiments, c is set to 0.1. A particle whose position exceeds the boundary is thus reset close to the boundary, which increases the algorithm's ability to exploit solutions close to the boundary.
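A sketch of the stochastic handler of equation (5.11) is shown below; the vectorized form and the two separate random draws follow the rand()/Rand() notation above.

import numpy as np

def handle_boundary(pos, xmin, xmax, c=0.1):
    # Reset out-of-range coordinates into a narrow band (width set by c)
    # next to the violated bound, per equation (5.11).
    r1 = np.random.rand(*pos.shape)
    r2 = np.random.rand(*pos.shape)
    pos = np.where(pos > xmax, xmax * (r1 * c + 1.0 - c), pos)
    pos = np.where(pos < xmin, xmin * (r2 * c + 1.0 - c), pos)
    return pos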
Experimental Results

As we are interested in finding an optimizer that will not be easily deceived by any local optima, we apply three measures of performance. The first is the best fitness value attained after a fixed number of iterations; in our case, we report the best result found after 4,000 iterations. The second and the third are the median and mean of the best fitness values over all runs. Table 5.10 and Table 5.11 give the results of the variants of PSO solving the unimodal and multimodal benchmark functions, respectively. The bold numbers indicate the better solutions. In general, the adaptive I strategy has better results in the mean fitness metric, and the adaptive III strategy has better results in the best fitness metric. This indicates that the adaptive I strategy has good robustness, and the adaptive III strategy has a good search ability. The strengths of these two strategies should be combined to obtain both a good search ability and robustness.

Table 5.10: Results of variants of PSO solving unimodal benchmark functions. All algorithms are run for 50 times, where "best", "median", and "mean" indicate the best, median, and mean of the best fitness values for all runs, respectively.

Func. (fmin)   PSO            Best         Median       Mean         Std. Dev.
f0 (-450.0)    Classic        -449.9999    -448.9776    -445.3326    9.85626
               Adaptive I     -449.9347    -445.2504    -414.5315    117.704
               Adaptive II    -449.9999    -449.9999    -449.9999    0.00050
               Adaptive III   -449.9999    -449.9999    -371.8409    323.0614
f1 (-330.0)    Classic        -329.5916    -323.5165    -320.2313    15.3871
               Adaptive I     -328.7556    -326.0650    -323.8926    4.88928
               Adaptive II    -323.7784    -292.1504    -278.9231    62.7845
               Adaptive III   -329.9999    -328.5401    -273.9591    118.8989
f2 (450.0)     Classic        7494.5009    13732.397    15298.577    4941.028
               Adaptive I     16983.569    32375.399    33500.722    8074.8034
               Adaptive II    10958.533    49959.394    50931.664    28271.42
               Adaptive III   22906.143    37970.149    38986.947    9908.50
f3 (330.0)     Classic        361          604          664.42       320.4953
               Adaptive I     334          356          405.04       137.7864
               Adaptive II    941          2769         3472.5       2428.06
               Adaptive III   331          341          436.14       348.2269
f4 (-450.0)    Classic        -449.9662    -449.9305    -449.9153    0.04943
               Adaptive I     -449.9571    -449.9207    -449.9193    0.01952
               Adaptive II    -449.8690    -449.5969    -449.4089    0.50349
               Adaptive III   -449.9063    -449.8375    -449.7873    0.16833
Discussions

Compared with other evolutionary algorithms, e.g., the genetic algorithm, PSO has more search information, which includes not only the solution (position), but also the velocity and the previous solution (cognitive).
Table 5.11: Results of variants of PSO solving multimodal benchmark functions. All algorithms are run for 50 times, where "best", "median", and "mean" indicate the best, median, and mean of the best fitness values for all runs, respectively.

Func. (fmin)   PSO            Best         Median       Mean         Std. Dev.
f5 (180.0)     Classic        315.6472     477.7683     486.6120     100.0427
               Adaptive I     299.7977     560.2574     577.2132     124.5268
               Adaptive II    281.5317     502.8287     945.1477     920.6598
               Adaptive III   277.3043     372.9479     479.2423     374.0204
f6 (-330.0)    Classic        56.7879      245.0822     232.6096     84.9479
               Adaptive I     -139.2625    -34.9566     -26.6035     54.8106
               Adaptive II    182.4007     351.5433     345.4072     89.1062
               Adaptive III   -16.5889     176.2200     171.1756     95.7205
f7 (450.0)     Classic        733.0458     878.5282     870.1552     80.9423
               Adaptive I     679.4103     868.0360     864.4472     74.2964
               Adaptive II    1051.25      1278.5       1271.87      122.957
               Adaptive III   865.9549     1113.897     1125.691     109.538
f8 (180.0)     Classic        183.0057     199.2698     193.0924     7.28533
               Adaptive I     181.1432     189.9451     191.2208     8.13577
               Adaptive II    189.5825     199.5070     198.7401     2.22100
               Adaptive III   180.0000     199.2170     191.7271     8.5323
f9 (120.0)     Classic        120.1207     120.3520     120.4754     0.33667
               Adaptive I     120.0895     120.9875     120.9963     0.65497
               Adaptive II    120.0000     120.2110     120.3528     0.45460
               Adaptive III   120.0000     120.0172     120.9151     3.89020
f10 (330.0)    Classic        330.8298     332.3636     332.3815     1.07291
               Adaptive I     330.1263     330.7440     330.8192     0.44340
               Adaptive II    331.2782     335.1152     337.3615     12.9941
               Adaptive III   330.0424     330.4690     333.6002     9.87691
The population diversities, which include position diversity, velocity diversity, and cognitive diversity, are utilized to measure these pieces of information, respectively. There are several definitions of the measurement of population diversity [29, 30, 207, 208]. The dimension-wise population diversity based on the L1 norm is utilized in this chapter. Without loss of generality, and for the purpose of simplicity and clarity, the results for one function (f4) out of the five unimodal benchmark functions and two functions (f6, f8) out of the six multimodal functions are displayed.

Figure 5.5: Comparison of fitness value changing for variants of PSO solving unimodal function f4 and multimodal functions f6 and f8 in a single run: (a) function f4, (b) function f6, (c) function f8.

Figure 5.5 displays the fitness improvement of the variants of PSO solving the unimodal function f4 and the multimodal functions f6 and f8 in a single run. From this figure, the changing curves of the standard PSO are smoother than those of the PSO with the adaptive strategies. This indicates that the PSO with an adaptive strategy has a larger search ability and can more easily jump out of local optima. Figures 5.6, 5.7, and 5.8 display the changing of the population diversities, per their definitions, for PSO solving functions f4, f6, and f8, respectively. Figures 5.9, 5.10, and 5.11 compare the position diversity, velocity diversity, and cognitive diversity for PSO solving functions f4, f6, and f8, respectively.
Figure 5.6: Definition of population diversities changing for variants of PSO solving unimodal function f4: (a) standard, (b) adaptive I, (c) adaptive II, (d) adaptive III.
Figure 5.7: Definition of population diversities changing for variants of PSO solving multimodal function f6: (a) standard, (b) adaptive I, (c) adaptive II, (d) adaptive III.
Figure 5.8: Definition of population diversities changing for variants of PSO solving multimodal function f8: (a) standard, (b) adaptive I, (c) adaptive II, (d) adaptive III.
Figure 5.9: Comparison of population diversities changing for variants of PSO solving unimodal function f4: (a) position, (b) velocity, (c) cognitive.
Figure 5.10: Comparison of population diversities changing for variants of PSO solving multimodal function f6: (a) position, (b) velocity, (c) cognitive.
Figure 5.11: Comparison of population diversities changing for variants of PSO solving multimodal function f8: (a) position, (b) velocity, (c) cognitive.
By observing the shapes of the population diversity changes in Figures 5.6, 5.7, 5.8, 5.9, 5.10, and 5.11, it is easy to see that the adaptive inertia weight PSO I and III show a slower decrease than the standard PSO. The change of population diversity represents the speed at which the particles converge or diverge. Particles should converge into a small region so that the algorithm can find better fitness values over the iterations. From these figures, we can conclude that the particles should be converging to find the solution; however, faster convergence is not always better, because fast convergence may lead to local optima. In general, adaptive II has a fast decrease in population diversity, and the final value of the population diversity reaches a very small number; yet the results of adaptive II are not good in the experiments. The change of population diversity, i.e., the speed at which the particles get converged, should therefore be controlled: slow enough to ensure that particles are not "stuck/trapped" in local optima, but not so slow that the algorithm cannot explore solution(s) efficiently and effectively.
5.4.5 Conclusions
In this section, we proposed two strategies to dynamically adjust PSO's inertia weight. In the first strategy, particles have different inertia weights. The particles' inertia weights are randomly generated within the range [0.4, 0.9] at the initialization. The minimum inertia weight is increased at the beginning of the search to enhance the ability of exploration, and the maximum inertia weight is decreased at the end of the search to enhance the ability of exploitation. In the second strategy, all particles share the same inertia weight. The weight dynamically changes according to the distribution of the particles: if the gbest is close to the centre of the swarm, the inertia weight will be close to 0.4 to enhance the exploitation ability; on the contrary, if the gbest is far from the centre, the inertia weight will be close to 0.9 to enhance the exploration ability. The standard particle swarm optimization [21] defines a set of fixed parameters. These parameters may be suitable for most problems to get a good solution; however, for different problems, an adaptive parameter setting may find better solutions. For multimodal problems, particles may get "stuck in" local optima with fixed parameters. If an algorithm could recognize the current search status, more proper parameters could be assigned to it; with a proper adaptive parameter strategy, the algorithm's performance will be improved. The particles' inertia weights are adjusted based on the changes of position diversity and cognitive diversity, or on the distribution of the particles' positions or velocities. The case where position diversity and cognitive diversity increase or decrease at the same time is utilized to tune the inertia weights; the adaptation can also be based on the distance between the gbest and the centre of the swarm. The particles' weights can be set to different values and updated to become similar after each iteration; or the particles' weights can be set to the same value,
and all particles’ weights adaptively update at the same time. Through the observation of the algorithm being in exploration or exploitation status, the inertia weights can be adaptively changed to enhance the search ability. The acceleration constants c1 and c2 also can be adjusted to improve the performance. More analyses of population diversity are also needed for a more precise control on the particles. According to the population diversities change, adaptively tuning parameters may obtain a better performance.
Chapter 6

Text Categorization

Knowledge discovery in databases (KDD) is the process of converting raw data into useful information. Data mining, the analysis step of KDD, is the process that attempts to discover useful information (or patterns) in large data repositories [86, 217]. Figure 6.1 shows the general process of knowledge discovery in databases (KDD). In this chapter, the particle swarm optimization algorithm is utilized to solve Chinese text categorization problems. Swarm intelligence can be utilized to solve data mining problems, and data clustering methods can also be applied within swarm intelligence [40, 201, 202]. In swarm intelligence algorithms, solutions are spread in the search space; each solution can be seen as a sample from the search space. An individual in the search space is not only a solution to the problem being optimized, but also a data point that reveals the landscape of the problem. The distribution of solutions can be utilized to reveal the landscape of a problem, and from the clustering analysis of the solutions, the search results can be obtained.
6.1 Overview
Text categorization, also termed text classification (TC), is the problem of finding the correct category (or categories) for documents, given a set of categories (subjects, topics) and a collection of text documents. Text categorization can be considered as a mapping f : D → C from the document space D onto the set of classes C. The objective
of a classifier is to obtain accurate categorization results or predictions with high confidence. In the text categorization process, the text is divided into a collection of words. The most commonly used method is the bag-of-words model. All words that can appear in any document are considered as the features of a document, and thus the dimension of the feature space is equal to the number of different words that can appear in all documents. A bag is similar to a set in that it is defined as a collection of elements. However, the difference between a bag and a set is that a bag allows multiple occurrences of
elements, i.e., an element may occur more than once in a bag [173, 239]. This indicates that the data in each document is not words, but a vector. The methods of assigning weights to the features may vary. The simplest is the binary method, in which the feature weight is either one, if the corresponding word is present in the document, or zero otherwise [25, 35] (a small sketch follows Figure 6.1 below). Recently, particle swarm optimization has been utilized for data categorization problems [122, 258]; in these methods, PSO is utilized to optimize the parameters of a classifier. In particle swarm optimization, a particle not only learns from its own experience, it also learns from its companions. This indicates that a particle's 'moving position' is determined by its own experience and its neighbors' information. With this concept, we introduce a particle swarm optimization based semi-supervised learning method to solve the Chinese text categorization problem. Text mining or text retrieval is an efficient approach for analyzing and discovering knowledge from large, complex text of heterogeneous quality, for which a variety of text mining tools have been developed [123].
Figure 6.1: The process of knowledge discovery in databases (KDD): selection (initial data → target data), preprocessing (→ preprocessed data), transformation (→ transformed data), data mining (→ model), and interpretation (→ knowledge).

In this chapter, particle swarm optimization is utilized in semi-supervised learning, in the optimization of the parameter k, and in the optimization of the labeled examples.
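As a small illustration of the binary bag-of-words weighting described above, the sketch below builds a document vector over a given vocabulary; the example tokens are hypothetical.

def binary_bow(doc_tokens, vocabulary):
    # Binary weights: 1 if the word occurs in the document, else 0.
    present = set(doc_tokens)
    return [1 if word in present else 0 for word in vocabulary]

# binary_bow(["swarm", "swarm", "text"], ["pso", "swarm", "text"])
# -> [0, 1, 1]; multiple occurrences collapse to 1 under binary weights.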
6.1.1 Preliminary Processing
Text Vectorization

The size of the dictionary is important for Chinese word segmentation. If the dictionary contains too many terms, some incorrect words will reduce the segmentation accuracy; however, if the number of terms is too small, some words will not be found, which also reduces the accuracy. After comparison, the Modern Chinese Dictionary was chosen as the segmentation dictionary. The Modern Chinese Dictionary contains more than 240,000 Chinese words in total: nearly 210,000 two-character words, more than 2,600 three-character words, more than 5,000 four-character words, and a few words longer than four characters.

Dictionary Process

Apart from its linguistic properties, word length is the most obvious feature of a word. Most words have two or three Chinese characters, and thousands of words have four Chinese characters. Words longer than four characters can be neglected, as such words are very rare. In establishing the dictionary, our dictionary therefore contains only two-, three-, and four-character words. Empirically, a long word is more important than a short word; in other words, different word lengths carry different weights. The whole dictionary is divided into three sub-dictionaries, which contain only the two-, three-, and four-character words, respectively. This division also speeds up segmentation: only the sub-dictionary containing two-character words needs to be queried when the text is segmented by two-character words.

For many large scale learning problems, acquiring a large amount of labeled training data is expensive and time-consuming. Semi-supervised learning is a machine learning paradigm which deals with utilizing unlabeled data to build better classifiers. However, unlabeled data with wrong predictions will mislead the classifier. In this section, we propose a particle swarm optimization based semi-supervised learning classifier to solve the Chinese text categorization problem. This classifier utilizes an iterative strategy, and the result of the classifier is determined by a document's previous prediction and its neighbors' information. The new classifier is tested on a Chinese text corpus, and compared with the k nearest neighbor method, the k weighted nearest neighbor method, and the self-learning classifier. In particle swarm optimization, a particle not only learns from its own experience, it also learns from its companions; this indicates that a particle's 'moving position' is determined by its own experience and its neighbors' information [31].
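A minimal sketch of segmentation with the three length-specific sub-dictionaries is given below. It uses greedy forward maximum matching (prefer the longest dictionary word at each position), which is a common baseline consistent with the higher weight of long words above, not necessarily the thesis's exact procedure.

def segment(text, dictionaries):
    # dictionaries maps word length (2, 3, 4) to a set of known words.
    words, i = [], 0
    while i < len(text):
        for length in (4, 3, 2):                   # try long words first
            cand = text[i:i + length]
            if len(cand) == length and cand in dictionaries.get(length, set()):
                words.append(cand)
                i += length
                break
        else:                                      # no dictionary match
            words.append(text[i])                  # emit a single character
            i += 1
    return words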
6.2 Similarity Metrics
The simple approach to performing text categorization is to compute the similarity between the query text q and all database texts in order to find the k best texts. The k Nearest Neighbor (KNN) method is a very intuitive method that classifies unlabeled examples based on their similarity or distance to examples in the training set. Distances, which are dissimilarities with regard to certain properties, measure how unlike or dissimilar objects are. Similarity is defined as a mapping from two vectors x and y to the interval [0, 1]. The Cosine measure is utilized here to calculate the similarity between labeled examples and a test example [217].
6.2.1 Distance Measure
Distances, which are dissimilarities with certain properties, measure how unlike or dissimilar objects are.

Euclidean Distance

The most popular metric is the usual Euclidean distance:

dis(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}    (6.1)
Manhattan Distance

The Manhattan distance (also called the city block distance) sums the absolute differences of the attributes; for two objects that have only binary attributes, i.e., two binary vectors, it coincides with the Hamming distance, the number of bits that differ between the two objects. The equation is shown in (6.2):

dis(x, y) = \sum_{i=1}^{k} |x_i - y_i|    (6.2)
The Euclidean and Manhattan distance measures given in equations (6.1) and (6.2) are generalized by the Minkowski distance metric shown in Equation (6.3),

dis(x, y) = \left( \sum_{i=1}^{k} |x_i - y_i|^r \right)^{1/r}    (6.3)

where r is a parameter. The following are the three most common examples of the Minkowski distance.

• r = 1. Manhattan (city block, taxicab, L1 norm) distance.
• r = 2. Euclidean distance (L2 norm).
• r = ∞. Supremum (Lmax or L∞ norm) distance. This is the maximum difference between any attribute of the objects. More formally, the L∞ distance is defined by Equation (6.4):

dis(x, y) = \lim_{r \to \infty} \left( \sum_{i=1}^{k} |x_i - y_i|^r \right)^{1/r}    (6.4)
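The following is a minimal sketch of the Minkowski family in equations (6.1) to (6.4); the example vectors are illustrative.

# r = 1 gives the Manhattan distance, r = 2 the Euclidean distance,
# and r -> infinity the supremum (L-infinity) distance.
def minkowski(x, y, r):
    if r == float("inf"):                      # L-infinity: max difference
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

x, y = [2, 1], [1, 2]
print(minkowski(x, y, 1))             # Manhattan: 2
print(minkowski(x, y, 2))             # Euclidean: 1.414...
print(minkowski(x, y, float("inf")))  # Supremum: 1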
Distance Properties

Distances, such as the Euclidean distance and the Manhattan distance, have some well-known properties. If dis(x, y) is the distance between two data objects x and y, then the following properties hold [217].

1. Positivity
   • dis(x, y) ≥ 0 for all x and y,
   • dis(x, y) = 0 only if x = y.
2. Symmetry
   • dis(x, y) = dis(y, x) for all x and y.
3. Triangle Inequality
   • dis(x, z) ≤ dis(x, y) + dis(y, z) for all points x, y, and z.
6.2.2 Similarity Measure
Similarity is defined as a mapping from two vectors x and y to the interval [0, 1] (for the Overlap measure, the interval is [0, +∞)). There are four kinds of similarity measures used in information retrieval: the Dice, Jaccard, Cosine, and Overlap measures.

Dice Measure

For the similarity between two vectors x and y, the Dice measure is as follows:

sim(x, y) = \frac{2 \sum_{i=1}^{k} x_i y_i}{\sum_{i=1}^{k} x_i^2 + \sum_{i=1}^{k} y_i^2}    (6.5)

where 0 ≤ sim(x, y) ≤ 1. The Dice similarity will be 1 if and only if text A and text B have the same contents.

Jaccard Measure
For the similarity between two vectors x and y, the Jaccard measure is shown in (6.6):

sim(x, y) = \frac{\sum_{i=1}^{k} x_i y_i}{\sum_{i=1}^{k} x_i^2 + \sum_{i=1}^{k} y_i^2 - \sum_{i=1}^{k} x_i y_i}    (6.6)

where 0 ≤ sim(x, y) ≤ 1. The Jaccard similarity will be 1 if and only if text A and text B have the same contents.

Cosine Measure
For the similarity between two vectors x and y, the Cosine measure is as follows:

sim(x, y) = \frac{\sum_{i=1}^{k} x_i y_i}{\sqrt{\sum_{i=1}^{k} x_i^2} \sqrt{\sum_{i=1}^{k} y_i^2}}    (6.7)

If x and y are two document vectors, then

sim(x, y) = \frac{x \cdot y}{\|x\| \|y\|}

where · indicates the vector dot product, x · y = \sum_{i=1}^{k} x_i y_i, and ‖x‖ is the length of vector x, ‖x‖ = \sqrt{\sum_{i=1}^{k} x_i^2} = \sqrt{x \cdot x}.

The cosine similarity is a measure of the (cosine of the) angle between x and y. Thus, if the cosine similarity is 1, the angle between x and y is 0°, and x and y are the same except for magnitude (length). If the cosine similarity is 0, then the angle between x and y is 90°, and they do not share any terms (words). Equation (6.7) can be written as Equation (6.8):

sim(x, y) = \frac{x}{\|x\|} \cdot \frac{y}{\|y\|} = x' \cdot y'    (6.8)

where x' = x/‖x‖ and y' = y/‖y‖. Dividing x and y by their lengths normalizes them to have a length of 1. This means that cosine similarity does not take the magnitude of the two data objects into account when computing similarity. For vectors with a length of 1, the cosine measure can be calculated by taking a simple dot product. Consequently, when many cosine similarities between objects are being computed, normalizing the objects to have unit length can reduce the time required.

Overlap Measure

For the similarity between two vectors x and y, the Overlap similarity measure is shown in (6.9):
x i yi
i=1 k P
k P
i=1
i=1
sim(x, y) = min(
x2i ,
(6.9) yi2 )
where 0 ≤ sim(x, y); sim(x, y) can be any value greater than 0.

If text A and text B have the same contents, the Overlap similarity will be 1. If text A contains text B, the Overlap similarity equals 1 or is greater than 1. For example, consider text A as a term vector x = {a, b, c, d} and text B as a term vector y = {b, c, d}, where x_i and y_i are the frequencies of each term. The Overlap similarity is sim(x, y) = 3/min(4, 3) = 1; if x = {a, b, b, c, d}, the Overlap similarity will be sim(x, y) = 4/min(7, 3) ≈ 1.334.

But even if text A does not contain text B, the Overlap similarity can also be 1 or greater than 1. For example, for texts A and B with term vectors x = {a, b, b, c} and y = {b, c, d}, sim(x, y) = 3/min(6, 3) = 1. For x = {a, b, b, b, c} and y = {b, c, d}, sim(x, y) = 4/min(11, 3) ≈ 1.334.
Similarity Characteristics

For similarity, the triangle inequality (or the analogous property) typically does not hold, but symmetry and positivity typically do. To be explicit, if sim(x, y) is the similarity between data objects x and y, then the typical properties of similarities are the following:

• sim(x, y) = 1 only if x = y (0 ≤ sim(x, y) ≤ 1, except for the Overlap similarity).
• sim(x, y) = sim(y, x) for all x and y (Symmetry).
6.2.3 Measure Comparison
Four kinds of similarity measures are used for calculating the similarity of two vectors. The Dice similarity relates the intersection of two vectors to their mean size. The Jaccard similarity considers the relation between the intersection and the union of two vectors. The Cosine similarity measures the relation between the intersection and the geometric mean of two vectors [73].

The Vector Space Model (VSM) (also termed the vector model) is an algebraic model for representing text documents (and objects in general) as vectors of identifiers. Similarity measures based on the vector space model do not take the magnitude of the two data objects into account. A distance measure might be a better choice when magnitude is important, since it considers the magnitude of the vector difference between two document vectors. This measure suffers from a drawback: two documents with very similar contents can have a significant vector difference simply because one is much longer than the other. Thus, the relative distributions of terms may be identical in the two documents, while the absolute term frequencies of one may be far larger. For example, for term vectors x = {a, a, b} and y = {a, b, b}, the Cosine, Dice, and Overlap similarities are 0.8 and the Jaccard similarity is 0.667, while the Euclidean distance is 1.414 and the Manhattan distance is 2. If the vectors are doubled in magnitude, x = {a, a, a, a, b, b} and y = {a, a, b, b, b, b}, the similarity measures do not change, but the Euclidean distance changes to 2.828 and the Manhattan distance changes to 4; distance measures change with magnitude.

Similarity and distance measures do not take the sequence of terms in a text into account; the measures do not change when the sequence is modified. This is a weak point of the vector space model. For example, content spam, which adds a lot of unrelated keywords to text content, uses this to cheat search engines.
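The following minimal sketch reproduces the numbers in this example; the helper functions implement the measures defined earlier in this section, applied to the frequency vectors (2, 1) and (1, 2).

from math import sqrt

def dot(x, y):  return sum(a * b for a, b in zip(x, y))
def norm2(x):   return sum(a * a for a in x)

def dice(x, y):      return 2 * dot(x, y) / (norm2(x) + norm2(y))
def jaccard(x, y):   return dot(x, y) / (norm2(x) + norm2(y) - dot(x, y))
def cosine(x, y):    return dot(x, y) / (sqrt(norm2(x)) * sqrt(norm2(y)))
def overlap(x, y):   return dot(x, y) / min(norm2(x), norm2(y))
def euclidean(x, y): return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
def manhattan(x, y): return sum(abs(a - b) for a, b in zip(x, y))

x, y = (2, 1), (1, 2)              # term vectors {a, a, b} and {a, b, b}
print(cosine(x, y), dice(x, y), overlap(x, y))  # all 0.8
print(jaccard(x, y))                            # 0.666...
print(euclidean(x, y), manhattan(x, y))         # 1.414..., 2
# Doubling the magnitudes leaves the similarities unchanged
# but doubles the distances:
x2, y2 = (4, 2), (2, 4)
print(euclidean(x2, y2), manhattan(x2, y2))     # 2.828..., 4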
6.3 Categorization Methods

6.3.1 Nearest-neighbor classifier
A learning algorithm can be divided into lazy learning and eager learning [7]. Lazy learning, such as instance-based learning and nearest neighbor classifiers, simply stores the training data (with at most minor processing) and waits until it is given a test tuple. Lazy learning does not require model building. However, classifying a test example can be quite expensive because we need to compute the proximity values individually between the test example and the training examples. In contrast, eager learners often spend the bulk of their computing resources on model building; once a model has been built, classifying a test example is extremely fast.

Nearest-neighbor classification is part of a more general technique known as instance-based learning, which uses specific training instances to make predictions without having to maintain an abstraction (or model) derived from the data. Instance-based learning algorithms require a proximity measure to determine the similarity or distance between instances, and a classification function that returns the predicted class of a test instance based on its proximity to other instances.

Nearest-neighbor classifiers make their predictions based on local information, whereas decision tree and rule-based classifiers attempt to find a global model that fits the entire input space. Because the classification decisions are made locally, nearest-neighbor classifiers (with a small value of k) are quite susceptible to noise. Nearest-neighbor classifiers can produce arbitrarily shaped decision boundaries. Such boundaries provide a more flexible model representation compared to decision tree and rule-based classifiers, which are often constrained to rectilinear decision boundaries. The decision boundaries of nearest-neighbor classifiers also have high variability because they depend on the composition of the training examples. Increasing the number of nearest neighbors may reduce such variability.
6.3.2 k Nearest Neighbor
The k Nearest Neighbor (KNN) classifier finds the k training examples that are most similar to the attributes of the test example. These examples, known as the nearest neighbors, are used to determine the class label of the test example. Nearest neighbor classification, also termed the nearest neighbor rule, is a very intuitive method that categorizes unlabeled examples based on their similarities or distances to the labeled examples.

It is important to choose the right value of k. If k is too small, the nearest-neighbor classifier may be susceptible to overfitting because of noise in the training data. On the other hand, if k is too large, the nearest-neighbor classifier may misclassify the test instance because its list of nearest neighbors may include data points that are located far away from its neighborhood.

The nearest neighbor rule is a kind of non-probabilistic classification procedure; it was first formulated by Fix and Hodges [90]. They investigated a rule called the k-nearest-neighbor rule, which assigns to an unclassified sample the class most heavily represented among its k nearest neighbors. Cover and Hart [52] studied the properties of this rule most extensively, including the lower and upper bounds on the probability of error. For a problem with a parameter k and a collection of n correctly classified samples, it has been demonstrated that when k and n tend to infinity in such a manner that k/n → 0, the risk of such a rule approaches the Bayes risk.
A summary of the nearest-neighbor categorization method is given in Algorithm 4. The algorithm computes the distance (or similarity) between each sample z = (x', y') and all the training examples (x, y) ∈ D to determine its nearest-neighbor list, Dz. Such computation can be costly if the number of training examples is large. However, efficient indexing techniques are available to reduce the amount of computation needed to find the nearest neighbors of a test example.

Algorithm 4: The k nearest neighbor categorization algorithm
Input: Let k be the number of nearest neighbors and D be the set of training examples;
1 for each unclassified sample z = (x', y') do
2   Compute d(x', x), the distance between the test example x' and every training example x, (x, y) ∈ D;
3   Select Dz ⊆ D, the set of k closest training examples to x';
4   The prediction y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i);
Once the nearest-neighbor lists are obtained, the test example is classified based on the majority class of its nearest neighbors:

Majority voting:  y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)

where v is a class label, y_i is the class label of one of the nearest neighbors, and I(·) is an indicator function that returns the value 1 if its argument is true and 0 otherwise.
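The following is a minimal sketch of this majority vote using the cosine similarity; the example document vectors and labels are illustrative assumptions.

from collections import Counter
from math import sqrt

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def knn_predict(query, examples, k):
    """examples: list of (vector, label). Returns the majority class among
    the k training examples most similar to the query (the set D_z)."""
    neighbors = sorted(examples, key=lambda e: cosine(query, e[0]),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)   # sum of I(v = y_i)
    return votes.most_common(1)[0][0]                  # argmax over v

examples = [((2, 1, 0), "sport"), ((1, 2, 0), "sport"), ((0, 1, 2), "finance")]
print(knn_predict((2, 2, 0), examples, k=3))           # 'sport'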
6.3.3 k Weighted Nearest Neighbor
The distance-weighted k nearest neighbor rule was proposed by Dudani [72]; it weighs the evidence of a neighbor close to an unclassified observation more heavily than the evidence of another neighbor at a greater distance from the unclassified observation.
In the majority voting approach, every neighbor has the same impact on the categorization. This makes the algorithm sensitive to the choice of k. The k weighted nearest neighbor (KWNN) method applies a weight to each neighbor. Distance-weighted voting is a straightforward way to weight each neighbor: a training example located far away from the unlabeled data point has a weaker impact on the categorization result than one at a close distance. The equation is defined as follows:

y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} w_i \times I(v = y_i)

A summary of the k weighted nearest-neighbor categorization method is given in Algorithm 5. The weight w_i is the similarity between the unclassified sample x' and the example x_i.

Algorithm 5: The k weighted nearest neighbor categorization algorithm
Input: Let k be the number of nearest neighbors and D be the set of training examples;
1 for each unclassified sample z = (x', y') do
2   Compute d(x', x), the distance between the test example x' and every training example x, (x, y) ∈ D;
3   Select Dz ⊆ D, the set of k closest training examples to x';
4   Calculate the weights w_i;
5   The prediction y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} w_i \times I(v = y_i);
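The following is a minimal sketch of the weighted vote in Algorithm 5; the similarity function passed in (for example, the cosine measure from Section 6.2) is an assumption.

from collections import defaultdict

def kwnn_predict(query, examples, k, sim):
    # D_z: the k training examples most similar to the query
    neighbors = sorted(examples, key=lambda e: sim(query, e[0]),
                       reverse=True)[:k]
    scores = defaultdict(float)
    for vec, label in neighbors:
        scores[label] += sim(query, vec)       # accumulates w_i * I(v = y_i)
    return max(scores, key=scores.get)         # argmax over class labels v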
As noted above, nearest neighbor categorization is an instance-based (example-based) learning technique and a form of lazy learning: the classifier stores the training data and defers computation until a test tuple arrives, so classifying a test example requires computing proximity values between the test example and all training examples. For this reason, the number of training examples cannot be too large; otherwise, the categorization process will be inefficient [35]. In contrast, given a training set, eager learning, which includes decision trees, support vector machines, and neural networks, among others, constructs a classification model before receiving new (e.g., test) data to classify. Eager learners often spend the bulk of their computing resources on model building; once a model has been built, classifying a test example is extremely fast.
6.3.4 Semi-Supervised Learning
For many large scale learning problems, acquiring a large amount of labeled training data is difficult and time-consuming. Semi-supervised learning (SSL) is a machine learning paradigm that deals with utilizing unlabeled data to build better classifiers. Traditional classifiers train using only labeled data, i.e., pairs of features and labels. However, labeled instances are often difficult, expensive, or time-consuming to obtain, as they require the efforts of experienced human experts. Meanwhile, unlabeled data may be relatively easy to collect but hard to use. Semi-supervised learning addresses this problem by using a large amount of unlabeled data, together with the labeled data, to build better classifiers. Semi-supervised learning is of great interest both in theory and in practice because it requires less human effort and gives higher accuracy than learning that utilizes only labeled data [262].

Self-training is a simple and easy-to-apply semi-supervised learning technique, characterized by the fact that the learning process uses its own predictions to teach itself [99]. The main idea is to first train with labeled data; then the unlabeled data, with their predicted labels, are utilized to predict other unlabeled data. Self-training assumes that the predictions based on the previous training tend to be correct, so other unlabeled data can benefit from these predictions.
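The following is a minimal sketch of such a self-training loop; the classify function returning a (label, confidence) pair and the confidence threshold are illustrative assumptions.

# Confident predictions on unlabeled documents are promoted into the
# labeled set and reused for later predictions.
def self_train(labeled, unlabeled, classify, threshold=0.8):
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        promoted = []
        for doc in unlabeled:
            label, confidence = classify(doc, labeled)
            if confidence >= threshold:        # trust only confident labels
                promoted.append((doc, label))
        if not promoted:                       # nothing confident: stop
            break
        labeled.extend(promoted)
        promoted_docs = [doc for doc, _ in promoted]
        unlabeled = [d for d in unlabeled if d not in promoted_docs]
    return labeled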
6.4 Particle Swarm Optimization based Semi-Supervised Learning
The "No Free Lunch" (NFL) theorem for optimization, introduced by Wolpert and Macready [232–235], claims that under certain assumptions no algorithm is better than any other on average over all problems. In semi-supervised learning, unlabeled data with predictions are utilized to train on other data. If the previous prediction has high confidence, the learning will benefit from experience; however, if the previous prediction has many errors, the later prediction will be misled.
6.4.1 Particle Swarm Optimization based Semi-Supervised Learning
The particle swarm optimization based semi-supervised learning is shown in Algorithm 6. The algorithm utilizes an iterative strategy, which compares each document's previous prediction with its neighbors' information. The distance or similarity is recorded for each sample to be classified. When an unclassified sample finds a better prediction, i.e., a closer distance or a higher similarity than the record, the sample is assigned to a new category, and the record is updated to the closer distance or higher similarity. After several iterations, the error rate of categorization can be reduced. The fitness function of text categorization differs according to whether similarity measures or distance measures are utilized in the classifier. The functions are as follows:
Algorithm 6: The particle swarm optimization based semi-supervised learning.
Input: Let k be the number of nearest neighbors and D be the set of training examples;
1 for each unclassified sample z = (x', y') do
2   Compute d(x', x), the distance between the unclassified sample x' and every training example x, (x, y) ∈ D;
3   Select Dz ⊆ D, the set of k closest training examples to z;
4   The prediction y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i);
5 Initialize the archive: add each classified text into the archive Da with a random probability;
6 while the maximum number of iterations has not been reached do
7   Compute d(x', x) and d(x', x_a), the distances between x' and every example, (x, y) ∈ D and x_a ∈ Da;
8   if the test example has a closer distance or higher similarity then
9     Update the test example's predicted category: y' = \arg\max_v \sum_{(x_i \cup x_a, y_i) \in D_z} I(v = y_i);
10  Update the archive: add each classified text into the archive Da with a random probability;
• The documents may belong to the same class if they have higher similarity. The objective of categorization is to maximize the similarity:

f(x) = \max \sum_{i=1}^{k} sim(x_i^l, x^u)

• The documents in the same class will have a small distance. The objective of categorization is to minimize the distance:

f(x) = \min \sum_{i=1}^{k} dis(x_i^l, x^u)

where x_i^l is a labeled document and x^u is an unlabeled document.
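The following is a minimal sketch of these two objectives; sim and dis stand for any of the measures defined in Section 6.2, and the neighbor list is assumed to be precomputed.

def similarity_fitness(neighbors, x_u, sim):
    # summed similarity of the unlabeled document x_u to its k labeled
    # neighbors; this value is to be maximized
    return sum(sim(x_l, x_u) for x_l in neighbors)

def distance_fitness(neighbors, x_u, dis):
    # summed distance of x_u to its k labeled neighbors; to be minimized
    return sum(dis(x_l, x_u) for x_l in neighbors)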
6.4.2 Experimental Results and Analysis
Categorization Corpus

The test corpus is given in Table 6.1; it has 10 categories and 950 news articles in total. The documents are distributed unequally among the categories: the class "Computer" has the most elements, containing 210 documents, while the class "Automobile" has only 42.
Table 6.1: The test corpus used in our experimental study. This corpus has 950 texts in total, and different categories have different numbers of texts.

Item  Categories      Numbers
1     Human Resource  43
2     Sport           201
3     Health          100
4     Entertainment   107
5     Real Estate     67
6     Education       58
7     Automobile      42
8     Computer        210
9     Technology      74
10    Finance         48
Performance Metrics

Accuracy is a straightforward way to measure the performance of categorization. The accuracy is defined as follows:

Accuracy = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}    (6.10)

Equivalently, the performance of a model can be expressed in terms of its error rate, which is given by the following formula:

Error Rate = \frac{\text{Number of wrong predictions}}{\text{Total number of predictions}}    (6.11)

From the above equations, it can easily be seen that:

Error Rate = 1 − Accuracy    (6.12)
To simplify categorization problems, the categorization error percentage (CEP), which is equivalent to the error rate, is utilized by some researchers to compare the performance of algorithms [129]. CEP represents the percentage of wrongly classified patterns in the test data sets. It is defined as follows:

CEP = 100 × \frac{\text{number of misclassified examples}}{\text{size of test data set}}    (6.13)
The error rate metric only considers misclassified patterns; recall and precision consider the retrieved documents and the relevant documents together [17, 183]. From the perspective of information retrieval, the definitions are as follows [161]:

Recall (r) measures how well a system retrieves all the relevant documents; it is the fraction (or proportion) of relevant material (documents, texts, articles) that is actually retrieved:

Recall = \frac{\text{relevant items retrieved}}{\text{relevant items}} = P(\text{retrieved} \mid \text{relevant})

Precision (p) measures how well the system retrieves only the relevant documents; it is the fraction (or proportion) of retrieved documents that are actually relevant:

Precision = \frac{\text{relevant items retrieved}}{\text{retrieved items}} = P(\text{relevant} \mid \text{retrieved})

The Fβ metric utilizes a weight β to balance recall and precision. The formula is defined as follows:

F_\beta(r, p) = \frac{(\beta^2 + 1) p r}{\beta^2 p + r}

where β is the parameter allowing differential weighting of p and r. When the value of β is set to one (denoted as F1), recall and precision are weighted equally. The F1 measure, which represents the harmonic mean of recall (r) and precision (p), utilizes an equal weight to combine these two components [247]. The F1 measure is defined as follows:

F_1(r, p) = \frac{2}{\frac{1}{r} + \frac{1}{p}} = \frac{2rp}{r + p}
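The following is a minimal sketch of these retrieval metrics computed from raw counts; the count values are illustrative assumptions.

def precision(relevant_retrieved, retrieved):
    return relevant_retrieved / retrieved

def recall(relevant_retrieved, relevant):
    return relevant_retrieved / relevant

def f_beta(r, p, beta=1.0):
    # (beta^2 + 1) * p * r / (beta^2 * p + r); beta = 1 gives F1
    return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)

p = precision(relevant_retrieved=40, retrieved=50)   # 0.8
r = recall(relevant_retrieved=40, relevant=80)       # 0.5
print(f_beta(r, p))         # F1 = 2rp / (r + p) = 0.615...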
6.4.3 Nearest Neighbor
Nearest neighbor classifiers and the k weighted nearest neighbor classifier are tested first in the experiments.

k Nearest Neighbor

Table 6.2 gives the categorization results of the k = 1 nearest neighbor classifier. The number of wrongly categorized documents and the error rate are given for 10, 20, and 30 training examples per category. In Table 6.2 and the following tables, the number of wrongly categorized documents is given in the '10', '20', and '30' columns, and the corresponding error rate in the 'rate' columns.

Table 6.2: The categorization result of KNN. k is 1 in this experiment, and training examples of each category are from 10 to 30.

Measure     10   rate    20   rate    30   rate
Euclidean   610  0.6421  427  0.4494  334  0.3515
Manhattan   702  0.7389  538  0.5663  451  0.4747
Dice        313  0.3294  226  0.2378  183  0.1926
Jaccard     313  0.3294  226  0.2378  183  0.1926
Overlap     513  0.5400  426  0.4484  314  0.3305
Cosine      292  0.3073  209  0.2200  168  0.1768
Table 6.3 gives the categorization results of the k = 3 nearest neighbor classifier. Among the three nearest neighbors, if more than one neighbor belongs to a specific class, the unlabeled document is predicted to be of that class; otherwise, it is assigned to the nearest neighbor's class.
Table 6.3: The categorization result of KNN. k is 3 in this experiment, and training examples of each category are from 10 to 30.

Measure     10   rate    20   rate    30   rate
Euclidean   672  0.7073  511  0.5378  412  0.4336
Manhattan   780  0.8210  552  0.5810  482  0.5073
Dice        294  0.3094  205  0.2157  162  0.1705
Jaccard     294  0.3094  205  0.2157  162  0.1705
Overlap     468  0.4926  397  0.4178  298  0.3136
Cosine      265  0.2789  184  0.1936  150  0.1578
From the results above, we can conclude that the similarity metrics are better than the distance metrics, and the Cosine metric has the best performance among all metrics. The error rate decreases with more training examples.

k Weighted Nearest Neighbor

Tables 6.4, 6.5, and 6.6 give the results of the k weighted nearest neighbor classifier with k being 3, 7, and 11, respectively. Each of the k nearest neighbors is weighted by the distance or similarity between the unlabeled data and that neighbor. Summing all distances or similarities, the unlabeled document is predicted to be of the neighbors' class that has the closest total distance or the highest total similarity.

Table 6.4: The categorization result of KWNN. k is 3 in this experiment, and training examples of each category are from 10 to 30.

Measure     10   rate    20   rate    30   rate
Euclidean   708  0.7452  566  0.5957  452  0.4757
Manhattan   781  0.8221  586  0.6168  531  0.5589
Dice        294  0.3094  206  0.2168  156  0.1642
Jaccard     294  0.3094  206  0.2168  162  0.1705
Overlap     461  0.4852  374  0.3936  287  0.3021
Cosine      266  0.2800  186  0.1957  151  0.1589
Table 6.5: The categorization result of KWNN. k is 7 in this experiment, and training examples of each category are from 10 to 30.

Measure     10   rate    20   rate    30   rate
Euclidean   595  0.6263  636  0.6694  512  0.5389
Manhattan   683  0.7189  693  0.7294  583  0.6136
Dice        264  0.2778  183  0.1926  156  0.1642
Jaccard     263  0.2768  183  0.1926  156  0.1642
Overlap     406  0.4273  322  0.3389  252  0.2652
Cosine      250  0.2631  176  0.1852  129  0.1357
The results of the classifiers are sensitive to the choice of k. From the results above, we can conclude that the performance generally improves as the value of k increases; however, this improvement is not monotonic, and the performance may worsen once k exceeds a specific value.
Table 6.6: The categorization result of KWNN. k is 11 in this experiment, and training examples of each category are from 10 to 30.

Measure     10   rate    20   rate    30   rate
Euclidean   760  0.8000  588  0.6189  526  0.5536
Manhattan   782  0.8231  659  0.6936  498  0.5242
Dice        280  0.2947  197  0.2073  148  0.1557
Jaccard     280  0.2947  197  0.2073  148  0.1557
Overlap     379  0.3989  342  0.3600  265  0.2789
Cosine      266  0.2800  167  0.1757  117  0.1231
For example, the result of Cosine similarity with 10 training examples in Table 6.5 is better than the corresponding result in Table 6.6.
6.4.4 Semi-Supervised Learning
Self-Training

Tables 6.7, 6.8, 6.9, and 6.10 give the categorization results of self-training, which takes only one iteration over the predictions of the unlabeled samples. In this experiment, the previous prediction of unlabeled data is based on the k weighted nearest neighbor method, with k being 1, 3, 7, and 11, respectively. If an unlabeled document has been predicted to be of a class, it is added to the example documents, and other unlabeled documents learn from the previous prediction. If the previous prediction has high confidence, the later prediction will benefit; otherwise, it will be misled.

Table 6.7: The categorization result of self-training. The prediction is based on the KWNN classifier, and k is 1 in this experiment; training examples of each category are from 10 to 30.

Measure     10   rate    20   rate    30   rate
Euclidean   690  0.7263  473  0.4978  376  0.3957
Manhattan   770  0.8105  550  0.5789  462  0.4863
Dice        356  0.3747  278  0.2926  203  0.2136
Jaccard     356  0.3747  278  0.2926  203  0.2136
Overlap     540  0.5684  387  0.4073  301  0.3168
Cosine      327  0.3442  249  0.2621  174  0.1831
Table 6.8: The categorization result of self-training. The prediction is based on the KWNN classifier, and k is 3 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   702  0.7389  531  0.5589  415  0.4368
Manhattan   737  0.7757  598  0.6294  518  0.5452
Dice        328  0.3452  243  0.2557  172  0.1810
Jaccard     328  0.3452  243  0.2557  172  0.1810
Overlap     578  0.6084  378  0.3978  275  0.2894
Cosine      300  0.3157  263  0.2768  183  0.1926
Table 6.9: The categorization result of self-training. The prediction is based on the KWNN classifier, and k is 7 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   699  0.7357  638  0.6715  502  0.5284
Manhattan   782  0.8231  636  0.6694  559  0.5884
Dice        505  0.5315  275  0.2894  150  0.1578
Jaccard     505  0.5315  275  0.2894  150  0.1578
Overlap     627  0.6600  481  0.5063  371  0.3905
Cosine      409  0.4305  243  0.2557  125  0.1315
Table 6.10: The categorization result of self-training. The prediction is based on the KWNN classifier, and k is 11 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   758  0.7978  630  0.6631  556  0.5852
Manhattan   818  0.8610  682  0.7178  586  0.6168
Dice        535  0.5631  278  0.2926  173  0.1821
Jaccard     535  0.5631  281  0.2957  173  0.1821
Overlap     654  0.6884  503  0.5294  372  0.3915
Cosine      513  0.5400  276  0.2905  162  0.1705
Rotated Self-Training

Tables 6.11, 6.12, 6.13, and 6.14 give the categorization results of rotated self-training, which takes several iterations over the predictions of the unlabeled samples. In this experiment, the previous prediction of unlabeled data is based on the k weighted nearest neighbor method, with k being 1, 3, 7, and 11, respectively. If an unlabeled document has been predicted to be of a class, it is added to the training documents, and other unlabeled documents learn from the previous prediction. After all unlabeled documents have been assigned a class, every document is trained again.

Table 6.11: The categorization result of rotated self-training. The prediction is based on the KWNN classifier, and k is 1 in this experiment; training examples of each category are from 10 to 30.

Measure     10   rate    20   rate    30   rate
Euclidean   740  0.7789  502  0.5284  402  0.4231
Manhattan   772  0.8126  552  0.5810  464  0.4884
Dice        356  0.3747  274  0.2884  203  0.2136
Jaccard     356  0.3747  274  0.2884  203  0.2136
Overlap     537  0.5652  415  0.4368  327  0.3442
Cosine      305  0.3210  233  0.2452  172  0.1810
The self-training didn’t improve the performance of classifier. This is because the first supervised learning has many wrong predictions. The unlabeled documents with wrong categories will mislead the later prediction.
Table 6.12: The categorization result of rotated self-training. The prediction is based on the KWNN classifier, and k is 3 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   792  0.8336  636  0.6694  526  0.5536
Manhattan   774  0.8147  719  0.7568  628  0.6610
Dice        324  0.3410  227  0.2389  159  0.1673
Jaccard     324  0.3410  227  0.2389  159  0.1673
Overlap     604  0.6357  371  0.3905  276  0.2905
Cosine      263  0.2768  271  0.2852  149  0.1568
Table 6.13: The categorization result of rotated self-training. The prediction is based on the KWNN classifier, and k is 7 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   650  0.6842  689  0.7252  512  0.5389
Manhattan   664  0.6989  711  0.7484  631  0.6642
Dice        519  0.5463  252  0.2652  145  0.1526
Jaccard     519  0.5463  252  0.2652  145  0.1526
Overlap     655  0.6894  528  0.5557  432  0.4547
Cosine      397  0.4178  216  0.2273  115  0.1210
Table 6.14: The categorization result of rotated self-training. The prediction is based on the KWNN classifier, and k is 11 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   669  0.7042  586  0.6168  566  0.5957
Manhattan   706  0.7431  710  0.7473  603  0.6347
Dice        537  0.5652  278  0.2926  159  0.1673
Jaccard     537  0.5652  278  0.2926  159  0.1673
Overlap     659  0.6936  559  0.5884  425  0.4473
Cosine      527  0.5547  261  0.2747  154  0.1621
6.4.5 Particle Swarm Optimization based Semi-Supervised Learning
Tables 6.15, 6.16, 6.17, and 6.18 give the categorization results of the particle swarm optimization based semi-supervised learning. The supervised learning method is again the k weighted nearest neighbor method, with k being 1, 3, 7, and 11, respectively. If an unlabeled document has been predicted to be of a class, it has a probability of being added into an additional archive. Other unlabeled documents learn from the training examples and the predictions in this archive. This classifier is an iterative method: if an unlabeled document has a closer distance or a higher similarity to some examples, it is predicted to be of those examples' class. After several iterations, the classifier learns from the unlabeled documents, and the error rate decreases.

Table 6.15: The categorization result of PSO based SSL. The prediction is based on the KWNN classifier, and k is 1 in this experiment. Training examples of each category are 10, 20, and 30, respectively.

Measure     10   rate    20   rate    30   rate
Euclidean   610  0.6421  427  0.4494  334  0.3515
Manhattan   702  0.7389  538  0.5663  451  0.4747
Dice        308  0.3242  231  0.2431  183  0.1926
Jaccard     309  0.3252  228  0.2400  183  0.1926
Overlap     504  0.5305  419  0.4410  308  0.3242
Cosine      278  0.2926  206  0.2168  168  0.1768
Table 6.16: The categorization result of PSO based SSL. The prediction is based on the KWNN classifier, and k is 3 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   708  0.7452  572  0.6021  459  0.4831
Manhattan   781  0.8221  590  0.6210  537  0.5652
Dice        279  0.2936  199  0.2094  160  0.1684
Jaccard     280  0.2947  200  0.2105  160  0.1684
Overlap     458  0.4821  370  0.3894  282  0.2968
Cosine      248  0.2610  178  0.1873  150  0.1578
Table 6.17: The categorization result of PSO based SSL. The prediction is based on the KWNN classifier, and k is 7 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   589  0.6200  634  0.6673  511  0.5378
Manhattan   684  0.7200  693  0.7294  583  0.6136
Dice        253  0.2663  173  0.1821  153  0.1610
Jaccard     254  0.2673  173  0.1821  153  0.1610
Overlap     400  0.4210  317  0.3336  244  0.2568
Cosine      231  0.2431  174  0.1831  124  0.1305
The PSO based semi-supervised learning classifier has the best performance in the experiments. This method utilizes the predictions on unlabeled data to guide other unlabeled data.
Table 6.18: The categorization result of PSO based SSL. The prediction is based on the KWNN classifier, and k is 11 in this experiment.

Measure     10   rate    20   rate    30   rate
Euclidean   756  0.7957  591  0.6221  527  0.5547
Manhattan   782  0.8231  657  0.6915  499  0.5252
Dice        268  0.2821  191  0.2010  144  0.1515
Jaccard     266  0.2800  191  0.2010  144  0.1515
Overlap     380  0.4000  343  0.3610  261  0.2747
Cosine      250  0.2631  162  0.1705  113  0.1189
An iterative strategy is utilized in the method, and different metrics can be used in different iterations.
6.4.6 Conclusions
For many large scale learning problems, acquiring a large amount of labeled training data is expensive and time-consuming. Semi-supervised learning is a machine learning paradigm which deals with utilizing unlabeled data to build better classifiers. However, unlabeled data with wrong predictions will mislead the classifier. In this section, we proposed a particle swarm optimization based semi-supervised learning classifier to solve the Chinese text categorization problem. This classifier utilizes an iterative strategy, and the result of the classifier is determined by a document's previous prediction and its neighbors' information. The new classifier was tested on a Chinese text corpus. In the experiments, the performance of this classifier is better than the k nearest neighbor method, the k weighted nearest neighbor method, and the self-learning classifier.

The error rate is utilized in this section to measure the performance of the different classifiers. Besides the error rate, the precision, recall, and Fβ metrics are often used to measure performance. These metrics all represent text categorization as a single objective optimization problem. However, in real world problems, different tasks have different error risks (or losses), and we may need different solutions in different situations. The Fβ metric utilizes a fixed value β to balance precision and recall at a time, so less information about the classifier can be obtained from this metric. The text categorization problem can instead be solved as a multi-objective problem [121]. In multi-objective optimization, precision and recall are considered at the same time, and a proper classifier can be found to suit different situations.
6.5 Particle Swarm Optimization based Nearest Neighbor
The nearest neighbor method is very effective in categorization [52, 72, 90]. For large scale data, it is very difficult to solve the similarity search problem due to "the curse of dimensionality" [105]. Several approximate search methods have been proposed, such as Locality Sensitive Hashing [56, 142, 198, 211], which is based on hashing functions with strong "local sensitivity" in order to retrieve nearest neighbors in a Euclidean space with a complexity sublinear in the amount of data.

A deficiency of the nearest neighbor method is the setting of its parameters. The parameters of a nearest neighbor method are difficult to determine, and most settings come from experience [151]. The evolutionary algorithm approach can be applied to solve this kind of problem. The genetic algorithm has been utilized to find a compact reference set for nearest neighbor classification [114, 174].

In this section, the nearest neighbor method for Chinese text categorization is formulated as an optimization problem. Particle swarm optimization is utilized to optimize a nearest neighbor classifier to solve the Chinese text categorization problem. The parameter k is first optimized to obtain the minimum error; then the categorization problem is formulated as a single objective, discrete, and constrained problem, where the dimensions of the solution vector are mutually dependent in the solution space. The parameter k and the number of labeled examples for each class are optimized together to reach the minimum categorization error.
6.5.1 k Value Optimization
The k weighted nearest neighbor algorithm is easy to implement. This algorithm only requires an integer k, a set of labeled examples (training data), and a metric to measure "closeness". The number of labeled examples is set to 100 for each category, and the Cosine metric is utilized to measure the similarity of texts.

The performance of the categorization result is affected by the setting of the parameter k. It is important to choose the right value of k: if k is too small, the nearest-neighbor classifier may be susceptible to overfitting because of noise in the training data; on the other hand, if k is too large, the nearest-neighbor classifier may misclassify the test instance because its list of nearest neighbors may include data points that are located far away from its neighborhood.

The fitness function of text categorization is as follows when similarity measures are utilized in the classifier [33]; the documents may belong to the same class if they have higher similarity:

f(x) = \max \sum_{i=1}^{k} sim(x_i^l, x^u)

where x_i^l is a labeled document and x^u is an unlabeled document.
The value of k is set from experience in most experiments [151], and finding the best k by brute force search is inefficient. The particle swarm optimization algorithm is very effective for optimizing the parameter k, and the minimum categorization error can be obtained with the optimized k. The objective of the optimization is to obtain the minimum categorization error:

f(k) = \min \left( \frac{\text{wrong predictions}}{\text{number of predictions}} \right)
Algorithm 7: The k value optimized nearest neighbor categorization algorithm
Input: Let k be the number of nearest neighbors and D be the set of training examples;
1 Initialization: initialize random k values in the swarm; each particle is a classifier;
2 while the "good enough" solution has not been found and the maximum number of iterations has not been reached do
3   for each unclassified sample z = (x', y') do
4     Compute d(x', x), the distance between the test example x' and every training example x, (x, y) ∈ D;
5     Select Dz ⊆ D, the set of k closest training examples to x';
6     The prediction y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i);
7   Calculate the number of wrong predictions; update the k of each classifier;
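To make Algorithm 7 concrete, the following is a minimal sketch in which a small swarm searches for the integer k that minimizes the error rate. The function evaluate_error(k), which runs the classifier and returns its error rate, and the inertia and acceleration coefficients are assumptions, not taken from the thesis; the cache avoids re-evaluating a k that has already been tried, mirroring the evaluation-saving record described in the experiments below.

import random

def pso_optimize_k(evaluate_error, k_min=1, k_max=100,
                   swarm_size=5, iterations=12):
    cache = {}                         # k -> error rate, avoids re-evaluation
    def fitness(k):
        if k not in cache:
            cache[k] = evaluate_error(k)
        return cache[k]

    ks = [random.randint(k_min, k_max) for _ in range(swarm_size)]
    vs = [0.0] * swarm_size
    pbest = ks[:]                      # each particle's personal best k
    gbest = min(ks, key=fitness)       # the swarm's global best k
    for _ in range(iterations):
        for i in range(swarm_size):
            r1, r2 = random.random(), random.random()
            vs[i] = (0.72 * vs[i] + 1.49 * r1 * (pbest[i] - ks[i])
                     + 1.49 * r2 * (gbest - ks[i]))
            ks[i] = max(k_min, min(k_max, round(ks[i] + vs[i])))
            if fitness(ks[i]) < fitness(pbest[i]):
                pbest[i] = ks[i]
        gbest = min(pbest, key=fitness)
    return gbest, fitness(gbest)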
Different categorization errors have different risk or loss functions in some real categorization tasks. In some specific cases, a particular class is required to have the minimum categorization error. In this situation, the objective of categorization is to obtain the minimum categorization error in the specific class. The objective function is as follows:

f(k) = \min \left( \frac{\text{wrong predictions in the specific class}}{\text{number of predictions in the specific class}} \right)

6.5.2 k Value and Labeled Examples Optimization
The nearest neighbor method requires a parameter k and a set of labeled examples. The error rate of the predictions is reduced by increasing the number of labeled examples. However, for many large scale learning problems, acquiring a large amount of labeled training data is difficult and time-consuming. Different classes have different "hardness" of categorization: with a limited number of labeled examples, giving each class an equal number of labeled examples does not reach the minimum categorization error. The number of labeled examples and the parameter k should be optimized together to reach the minimum categorization error.

The categorization process can be formulated as a discrete, constrained, single objective problem, where the dimensions are mutually dependent in the solution space. The objective of the optimization is to obtain the minimum categorization error:

f(k, n_i) = \min \left( \frac{\text{wrong predictions}}{\text{number of predictions}} \right)
subject to: \sum_i n_i \leq N    (6.14)

where n_i is the number of labeled examples for each class, and N is the maximum number of labeled examples.
Algorithm 8: The k value and labeled examples optimized nearest neighbor categorization algorithm
Input: Let k be the number of nearest neighbors and D be the set of training examples;
1 Initialization: initialize random k values and numbers of labeled examples in the swarm; each particle is a classifier;
2 while the "good enough" solution has not been found and the maximum number of iterations has not been reached do
3   for each unclassified sample z = (x', y') do
4     Compute d(x', x), the distance between the test example x' and every training example x, (x, y) ∈ D;
5     Select Dz ⊆ D, the set of k closest training examples to x';
6     The prediction y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i);
7   Calculate the number of wrong predictions; update the k and the numbers of labeled examples;
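As one possible way to keep a particle feasible under the constraint in (6.14), the following sketch encodes a particle as k plus one labeled-example count per class, and scales the counts down whenever their sum exceeds the budget N. This repair strategy is an assumption for illustration, not the thesis's method.

def repair(position, n_max, budget):
    # position: [k, n_1, ..., n_c]; counts are clipped to [1, n_max]
    k, counts = position[0], list(position[1:])
    counts = [max(1, min(n_max, round(n))) for n in counts]
    total = sum(counts)
    if total > budget:                 # enforce sum(n_i) <= N
        scale = budget / total
        counts = [max(1, int(n * scale)) for n in counts]
    return [max(1, round(k))] + counts

# e.g. 10 classes, at most 200 examples per class, budget N = 1000
print(repair([9.4, 120, 130, 110, 90, 80, 150, 140, 70, 60, 95], 200, 1000))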
6.5.3 Experimental Results and Analysis
Categorization Corpus

The test corpus is given in Table 6.19; it has 10 categories and 2816 news articles in total. The documents are distributed unequally among the categories: the class "Politics" has the most elements, containing 505 documents, while the class "Computer" has only 200.

Table 6.19: The test corpus used in our experimental study. This imbalanced test corpus contains 2816 texts in total, and different categories have different numbers of texts.

Item  Categories    Numbers
1     Transport     214
2     Sports        450
3     Military      249
4     Medicine      204
5     Politics      505
6     Education     220
7     Environment   201
8     Economics     325
9     Art           248
10    Computer      200
      Total         2816
In the following, the number of categorization errors and the error rate are utilized to measure the performance of categorization. The error rate is a straightforward way to measure the performance of categorization; it represents the percentage of wrongly classified patterns in the test data sets. The error rate is defined as follows:

Error rate = \frac{\text{wrong predictions}}{\text{total number of predictions}}
k Weighted Nearest Neighbor

Table 6.20 gives the categorization results of the k weighted nearest neighbor classifier. The number of wrongly categorized documents and the error rate are given with 100 training examples for each category. The parameter k is set within the range [1, 100]; Table 6.20 gives the results for k set to 1, 5, 11, 20, 30, 40, 50, and 100, respectively.

Table 6.20: The categorization errors of the nearest neighbor method with different k, given as errors (rate). The number of labeled examples is 100 for each class.

Category (size)    k = 1        k = 5        k = 11       k = 20       k = 30       k = 40       k = 50       k = 100
Transport (214)    23 (0.1074)  41 (0.1915)  47 (0.2196)  71 (0.3317)  77 (0.3598)  79 (0.3691)  81 (0.3785)  98 (0.4579)
Sports (450)       18 (0.0400)  14 (0.0311)  20 (0.0444)  18 (0.0400)  23 (0.0511)  23 (0.0511)  20 (0.0444)  20 (0.0444)
Military (249)     47 (0.1887)  39 (0.1566)  39 (0.1566)  48 (0.1927)  50 (0.2008)  55 (0.2208)  60 (0.2409)  77 (0.3092)
Medicine (204)     50 (0.2450)  49 (0.2401)  51 (0.2500)  56 (0.2745)  58 (0.2843)  58 (0.2843)  59 (0.2892)  68 (0.3333)
Politics (505)     65 (0.1287)  35 (0.0693)  24 (0.0475)  17 (0.0336)  12 (0.0237)  9 (0.0178)   7 (0.0138)   6 (0.0118)
Education (220)    37 (0.1681)  27 (0.1227)  27 (0.1227)  26 (0.1181)  24 (0.1090)  24 (0.1090)  22 (0.1000)  22 (0.1000)
Environment (201)  40 (0.1990)  34 (0.1691)  37 (0.1840)  36 (0.1791)  35 (0.1741)  35 (0.1741)  36 (0.1791)  40 (0.1990)
Economics (325)    99 (0.3046)  80 (0.2461)  81 (0.2492)  80 (0.2461)  84 (0.2584)  89 (0.2738)  89 (0.2738)  98 (0.3015)
Art (248)          21 (0.0846)  9 (0.0362)   13 (0.0524)  18 (0.0725)  19 (0.0766)  21 (0.0846)  19 (0.0766)  29 (0.1169)
Computer (200)     20 (0.1000)  14 (0.0700)  13 (0.0650)  15 (0.0750)  20 (0.1000)  25 (0.1250)  28 (0.1400)  28 (0.1400)
Total (2816)       420 (0.1491) 342 (0.1214) 352 (0.1250) 385 (0.1367) 402 (0.1427) 418 (0.1484) 421 (0.1495) 486 (0.1725)
Figure 6.2 gives a more intuitive view of the categorization results of the weighted nearest neighbor algorithm with different settings of k. From the changing curves of the error rate, some conclusions can be drawn: the error rate does not change linearly with increasing k, and, for a specific class, a different k should be set to reach the minimum error rate.

k Value Optimization

In this experiment, the population size is 5, the number of iterations is 12, and there are 5 independent runs. The variable of this optimization is k; the number of evaluations can be reduced by recording each evaluated k together with its corresponding fitness value, so that the fitness value is returned directly if that k has already been evaluated. The number of evaluations is thereby reduced to 20 ∼ 30 in each run.
Table 6.21 gives the categorization results of the k value optimized nearest neighbor classifier. The number of wrongly categorized documents and the error rate are given with 100 training examples for each category.
Figure 6.2: The error rate of the nearest neighbor method with different k: the black ∗, +, and ∇ represent the 'Transport', 'Sports', and 'Military' classes; the blue ∗, +, and ∇ represent the 'Medicine', 'Politics', and 'Education' classes; the red ∗, +, and ∇ represent the 'Environment', 'Economics', and 'Art' classes; and the green ∗ and △ represent the 'Computer' class and the total metric, respectively. The number of labeled examples is 100 for each class.

Figure 6.3 displays the performance of particle swarm optimization in the nearest neighbor method. The solutions converge very quickly to the minimum error in all five runs. It is very effective for finding an optimized k that reaches the minimum error for the whole categorization process or for a specified class.

k Value and Labeled Examples Optimization

In this experiment, the population size is 10, the number of iterations is 100, and there are 5 independent runs. Table 6.22 gives the categorization results of the k value and labeled examples optimized nearest neighbor method. Figure 6.4 displays the performance of particle swarm optimization utilized to optimize k and the number of labeled examples in the nearest neighbor method. From the figure, it can be concluded that the speed of convergence is very fast; in general, after 30 iterations, the algorithm reaches the local optima.
6.5.4 Conclusions
The nearest neighbor method is a simple and effective algorithm for categorization. However, the parameter k and the number of labeled examples affect the performance of categorization. For large scale categorization problems, labeled training data is difficult to acquire.
Table 6.21: The minimum error rate for each class and for the whole categorization process. With the utilization of particle swarm optimization in the nearest neighbor method, different settings of k can be found to reach the minimum error rate.

Categories   Number  k            error  error rate
Transport    214     1            23     0.107476
Sports       450     3            13     0.028888
Military     249     10           39     0.156626
Medicine     204     2            46     0.225490
Politics     505     49, 99, 100  6      0.011881
Education    220     77           21     0.095454
Environment  201     3, 28        33     0.164179
Economics    325     10           73     0.224615
Art          248     5            9      0.036290
Computer     200     10, 13       11     0.055
Total        2816    10           338    0.120028
Figure 6.3: The performance of particle swarm optimization utilized to optimize k in the nearest neighbor method. Each class has 100 labeled examples, and an optimized k is found to reach the minimum categorization error: (a) minimum error for the whole categorization, (b) minimum error for the class "Computer".
Table 6.22: The minimum error rate for the categorization. The optimized k = 9, and different numbers of labeled examples are found to reach the minimum categorization error.

Categories   Number  Examples  error  error rate
Transport    214     107       32     0.149532
Sports       450     50        30     0.066666
Military     249     106       21     0.084337
Medicine     204     156       11     0.053921
Politics     505     68        44     0.087128
Education    220     114       24     0.109090
Environment  201     80        37     0.184079
Economics    325     95        69     0.212307
Art          248     92        8      0.032258
Computer     200     126       5      0.025
Total        2816    994       281    0.099786
Figure 6.4: The performance of particle swarm optimization utilized to optimize k and the number of labeled examples in the nearest neighbor method.

Different classes have different "hardness" of categorization, so with a limited number of examples, the numbers of labeled examples in each class should differ. Finding the proper k and setting the numbers of labeled examples are important in the nearest neighbor method. In this section, we have utilized particle swarm optimization in a nearest neighbor classifier to solve the Chinese text categorization problem. The parameter k was first optimized to obtain the minimum error; then the categorization problem was formulated as a single objective, discrete, and constrained problem, where the variables in the solution space are mutually dependent. The parameter k and the number of labeled examples for each class are optimized together to reach the minimum categorization error. In the experiments, with the utilization of particle swarm optimization, the performance of the nearest neighbor method can be improved, and the algorithm can obtain the minimum categorization error rate.

The error rate is utilized in this section to measure the performance of the different classifiers. There are many conflicting objectives in the text categorization task [57]. Besides the error rate, for large scale categorization problems, the number of labeled examples is also important in prediction, and the error rate and the number of labeled examples conflict with each other. Reaching the minimum error rate with limited labeled examples, finding the minimum number of labeled examples that reaches an expected error rate, or finding an acceptable number of labeled examples that reaches a satisfactory error rate are common problems in categorization. These problems can be solved as a multi-objective problem [121]. In multi-objective optimization, the number of labeled examples and the error rate of prediction can be considered at the same time, and a proper classifier can be found to suit different situations.

In future work, the new classifier should be tested on different Chinese text corpora and compared with other classification methods, such as support vector machines. Modeling text categorization as a multi-objective problem, minimizing the error rate and the number of labeled examples at the same time, may also improve the algorithm's categorization performance.
Chapter 7
Conclusions

This chapter summarizes the work presented in the previous chapters. Potential future research is also discussed.
7.1 Conclusions
In swarm intelligence, many solutions cooperate to solve an optimization problem. Each fitness evaluation can be seen as a sampling of the search space, and from the distribution of the solutions and their corresponding fitness values, we can simulate the landscape of the search space. A difficulty in swarm intelligence is that the solutions may not easily "jump out" of local optima. From this perspective, premature convergence should be prevented, and the distribution of solutions should be maintained; we need to find a balance between a fast convergence speed and the ability to "jump out" of local optima.

Population diversity is a way to monitor the distribution of solutions. In this thesis, we have defined several kinds of population diversity in the particle swarm optimization algorithm. The population diversities are discussed and analyzed based on the experimental results, and several strategies are proposed to enhance the performance of particle swarm optimization through population diversity maintenance.
7.1.1 Particle Swarm Optimization
Particle swarm optimization is one of the evolutionary computation techniques. It is a population-based stochastic algorithm modeled on the social behaviors observed in flocking birds. The most important factor affecting an optimization algorithm's performance is its ability of exploration and exploitation, and a good optimization algorithm should optimally balance these two conflicting objectives.

In this thesis, we have proposed several variants of particle swarm optimization algorithms to solve different kinds of problems. These new algorithms include the PSO algorithm with population diversity control [28], the PSO algorithm with population diversity promotion [32], the PSO algorithm with dynamic exploitation space reduction [33], and the PSO algorithm with population diversity based inertia weight adaptation [41]. These variants of PSO algorithms can be applied to different problems to achieve good performance.
7.1.2 Population Diversity
The exploration and exploitation status can be obtained by observing the population diversity. In this thesis, the population diversities of particle swarm optimization, which include position diversity, velocity diversity, and cognitive diversity, are defined in [29]. The properties of the population diversities are observed, analyzed, and discussed, and several diversity enhancement and maintenance methods are proposed to control the algorithms' exploration and exploitation abilities.

Different problems have different search ranges. To compare the population diversity changes on different problems, the population diversity needs to be normalized first. In [30, 39], we defined and discussed several kinds of normalized population diversity in both element-wise and dimension-wise forms. The population diversity analysis should be based on the combination of population diversity observations and the properties of the problems. From the comparison of the population diversity changes of particle swarm optimizers solving single and multi-objective problems [37], more effective algorithms can be designed to solve different problems.
7.1.3 Single / Multi-Objective Optimization
Generally, optimization problems can be divided into single objective and multi-objective problems. For single objective problems, we have proposed particle swarm optimization algorithms with population diversity maintenance to solve different kinds of problems, such as multimodal problems [32] and large-scale problems [33].

One of the main differences between single objective optimization and multiobjective optimization is that multiobjective optimization constitutes a multidimensional objective space. The approaches to solving multi-objective problems can be broadly categorized as preference-ordering approaches [115] and objective reduction approaches [22, 23, 195]. In multiobjective optimization, there are many solutions which need to be maintained at the same time. The performance metrics may bias the direction of the search, so a proper performance metric is important in multiobjective optimization; we have analyzed and discussed the performance metrics of multiobjective optimization in [37].

There are several groups of solutions in multiobjective optimization: the group of current solutions, the group of solutions in the archive, and the group of nondominated solutions found so far. A multiobjective optimization algorithm needs to maintain the population diversity in all these groups. We have defined and analyzed the population diversities of multiobjective optimization in [34], which measure the distribution of solutions in these groups. From the population diversity analysis, the population diversity can be used as a metric to guide the search toward better and better areas.
7.1.4 Text Categorization
Text categorization is a field at the intersection of data mining and information retrieval. Its task is to classify documents into a fixed number of predefined classes, with one or more classes per document. Combined with the k weighted nearest neighbor method, particle swarm optimization is utilized to solve Chinese text categorization problems, and the proposed methods can improve the performance of categorization.

In this thesis, we have applied the particle swarm optimization algorithm to the text categorization field. Based on the combination of the particle swarm optimization algorithm and nearest neighbor methods, three algorithms are proposed to solve Chinese text categorization problems: the particle swarm optimization based semi-supervised learning algorithm [35], the particle swarm optimization based nearest neighbor algorithm [42], and the optimization of initialized examples algorithm [38]. The experimental results show that these algorithms can obtain good categorization performance on Chinese documents.

Data mining algorithms can also be utilized in optimization. Every individual in the swarm is not only a solution to the problem being optimized, but also a data point that reveals the landscape of the problem. The clustering information can be utilized to reveal the landscapes of problems and to guide the individuals to move toward better and better areas [40, 202].
7.2
Future Research
Future research will focus on the following aspects:

1. Learning in Optimization. Different problems have different properties, such as the number of local optima or the dependency among dimensions. These properties affect the performance of optimization, and different properties call for different strategies. For some problems, it is difficult to obtain these properties in advance; if they can be recognized during the search, more appropriate strategies can be taken in the optimization. There are many individuals in a swarm intelligence algorithm, and each individual is not only a solution to the problem being optimized, but also a data point that reveals the landscape of the problem. The fitness evaluation process can be seen as sampling from the solution space. By combining machine learning and data mining techniques, we can obtain a better understanding of the properties of problems and design more effective algorithms to achieve better performance.

2. Adaptation. As indicated by the "no free lunch" theorem, no algorithm is better than another on average over all problems. Different optimization algorithms fit different kinds of problems. To solve more problems, an algorithm should adaptively change its behavior on different problems. Adaptation in optimization means that an algorithm can dynamically change its exploration and exploitation abilities, for example through parameter tuning, different population structures, or diversity maintenance strategies, to name a few.

3. Big Data Analytics. Big data is defined as datasets whose size is beyond the processing ability of typical databases or computers; the definition emphasizes four operations: capture, storage, management, and analysis [162]. Big data analytics is the automatic extraction of knowledge from large amounts of data; it can be seen as the mining or processing of massive data, from which "useful" information is retrieved [189]. Big data analysis is required to manage immense amounts of data quickly [189]. The sheer amount of data is attracting more and more attention; however, the dimensionality of the data and the number of objectives also increase the "hardness" of problems. Three kinds of difficulties must be overcome to solve big data problems: (a) immense amounts of data must be processed in a limited time; (b) the high dimensionality of the data may decrease the performance of algorithms; (c) the problems may have many objectives that need to be satisfied at the same time.

4. Text Categorization. Text retrieval is an interesting topic in data mining. In the big data era, traditional information retrieval methods have difficulty handling large amounts of high-dimensional data. Combining swarm intelligence with information retrieval is a promising direction.

5. Applications. Many real-world applications can be modeled as optimization problems, and swarm intelligence has been utilized in many areas. Utilizing swarm intelligence to solve real-world problems is also part of our future work.
To conclude, our work models real-world problems as different kinds of optimization problems, analyzes the optimization algorithms in swarm intelligence, and combines machine learning and data mining techniques to solve these problems.
Appendix A
Benchmark Functions
A.1
Single Objective Optimization
Eleven single-objective benchmark functions are utilized in our experimental study [152, 155, 215, 218, 249], where n is the dimension of each problem. Sections A.1.1 and A.1.2 give the unshifted benchmark functions, while Sections A.1.3 and A.1.4 give the shifted benchmark functions. The traditional benchmark functions are unshifted [249], which has some drawbacks. Firstly, the optimal solution is the same in every dimension and usually lies in the middle of the search space, i.e., the optimal solution is 0 in each dimension (for the Rosenbrock function, it is 1 in each dimension), and the optimal fitness value is zero. Secondly, it is meaningless for an algorithm to search for results with an excessively high degree of accuracy and precision; what matters is an algorithm's generality and its ability to find "good enough" solutions. The shift strategy changes the optimal solutions and the fitness values of the benchmark functions: a shift in the solution space gives each dimension a different optimal solution, while a shift in the objective space changes the optimal fitness value. Another strategy often applied to benchmark functions is coordinate rotation, which converts a separable problem into a non-separable one [194].
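The shift strategy can be implemented as a simple wrapper around a base function; a minimal sketch (names are illustrative) is:

import numpy as np

def make_shifted(f, o, bias):
    # Wrap a base function f so its optimum moves from 0 to o and its
    # optimal fitness value becomes `bias`.
    def shifted(x):
        return f(np.asarray(x) - o) + bias
    return shifted

# Example: a shifted sphere with a random optimum o in [-100, 100]^n.
n = 30
o = np.random.uniform(-100.0, 100.0, n)
f0 = make_shifted(lambda z: np.sum(z ** 2), o, bias=450.0)   # f0(o) == 450.0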
A.1.1
Unimodal Functions
1. Parabolic (Sphere) Function
\[ f_0(x) = \sum_{i=1}^{n} x_i^2 \tag{A.1} \]
• Separable, Scalable
• $x \in [-100, 100]^n$, Global optimum $x^* = \{0\}^n$, $f_0(x^*) = 0$

2. Schwefel's P2.22 Function
\[ f_1(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i| \tag{A.2} \]
• $x \in [-10, 10]^n$, Global optimum $x^* = \{0\}^n$, $f_1(x^*) = 0$

3. Schwefel's P1.2 Function
\[ f_2(x) = \sum_{i=1}^{n} \Big( \sum_{k=1}^{i} x_k \Big)^2 \tag{A.3} \]
• Non-separable, Scalable
• $x \in [-100, 100]^n$, Global optimum $x^* = \{0\}^n$, $f_2(x^*) = 0$

4. Step Function
\[ f_3(x) = \sum_{i=1}^{n} \lfloor x_i + 0.5 \rfloor^2 \tag{A.4} \]
• $x \in [-100, 100]^n$, Global optimum $x^* = \{0\}^n$, $f_3(x^*) = 0$

5. Quartic Noise Function
\[ f_4(x) = \sum_{i=1}^{n} i\,x_i^4 + \mathrm{random}[0, 1) \tag{A.5} \]
• Noise in fitness
• $x \in [-1.28, 1.28]^n$, Global optimum $x^* = \{0\}^n$, $f_4(x^*) = 0$
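For reference, the five unimodal functions translate directly into NumPy. The sketch below assumes a one-dimensional array x; the noise term random[0, 1) is drawn once per evaluation:

import numpy as np

def parabolic(x):       # f0, Eq. (A.1)
    return np.sum(x ** 2)

def schwefel_p222(x):   # f1, Eq. (A.2)
    a = np.abs(x)
    return np.sum(a) + np.prod(a)

def schwefel_p12(x):    # f2, Eq. (A.3): cumsum gives the inner partial sums
    return np.sum(np.cumsum(x) ** 2)

def step(x):            # f3, Eq. (A.4)
    return np.sum(np.floor(x + 0.5) ** 2)

def quartic_noise(x):   # f4, Eq. (A.5)
    i = np.arange(1, x.size + 1)
    return np.sum(i * x ** 4) + np.random.uniform(0.0, 1.0)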
A.1.2
Multimodal Functions

6. Generalized Rosenbrock Function
\[ f_5(x) = \sum_{i=1}^{n-1} \left[ 100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \right] \tag{A.6} \]
• Non-separable, Scalable
• Having a very narrow valley from local optimum to global optimum
• $x \in [-10, 10]^n$, Global optimum $x^* = \{1\}^n$, $f_5(x^*) = 0$

7. Schwefel Function
\[ f_6(x) = \sum_{i=1}^{n} -x_i \sin\big(\sqrt{|x_i|}\big) + 418.9829\,n \tag{A.7} \]
• $x \in [-500, 500]^n$, Global optimum $x_i^* \approx 420.9687$ in each dimension, $f_6(x^*) \approx 0$

8. Generalized Rastrigin Function
\[ f_7(x) = \sum_{i=1}^{n} \left[ x_i^2 - 10 \cos(2\pi x_i) + 10 \right] \tag{A.8} \]
• Separable, Scalable, the number of local optima is huge
• $x \in [-5.12, 5.12]^n$, Global optimum $x^* = \{0\}^n$, $f_7(x^*) = 0$

9. Noncontinuous Rastrigin Function
\[ f_8(x) = \sum_{i=1}^{n} \left[ y_i^2 - 10 \cos(2\pi y_i) + 10 \right] \tag{A.9} \]
where
\[ y_i = \begin{cases} x_i & |x_i| < \frac{1}{2} \\[2pt] \dfrac{\mathrm{round}(2 x_i)}{2} & |x_i| \ge \frac{1}{2} \end{cases} \]
• $x \in [-5.12, 5.12]^n$, Global optimum $x^* = \{0\}^n$, $f_8(x^*) = 0$

10. Ackley Function
\[ f_9(x) = -20 \exp\Big(-0.2 \sqrt{\tfrac{1}{n} \sum_{i=1}^{n} x_i^2}\Big) - \exp\Big(\tfrac{1}{n} \sum_{i=1}^{n} \cos(2\pi x_i)\Big) + 20 + e \tag{A.10} \]
• Separable, Scalable
• $x \in [-32, 32]^n$, Global optimum $x^* = \{0\}^n$, $f_9(x^*) = 0$

11. Griewank Function
\[ f_{10}(x) = \frac{1}{4000} \sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} \cos\Big(\frac{x_i}{\sqrt{i}}\Big) + 1 \tag{A.11} \]
• Non-separable, Scalable
• $x \in [-600, 600]^n$, Global optimum $x^* = \{0\}^n$, $f_{10}(x^*) = 0$

12. Generalized Penalized Function
\[ f_{11}(x) = \frac{\pi}{n} \Big\{ 10 \sin^2(\pi y_1) + \sum_{i=1}^{n-1} (y_i - 1)^2 \left[ 1 + 10 \sin^2(\pi y_{i+1}) \right] + (y_n - 1)^2 \Big\} + \sum_{i=1}^{n} u(x_i, 10, 100, 4) \tag{A.12} \]
where
\[ y_i = 1 + \frac{1}{4}(x_i + 1), \qquad u(x_i, a, k, m) = \begin{cases} k (x_i - a)^m & x_i > a \\ 0 & -a \le x_i \le a \\ k (-x_i - a)^m & x_i < -a \end{cases} \]
• $x \in [-50, 50]^n$, Global optimum $x^* = \{-1\}^n$, $f_{11}(x^*) = 0$
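The two less obvious pieces above, the rounding transform of the noncontinuous Rastrigin function and the boundary penalty u of the generalized penalized function, can be sketched as follows (element-wise NumPy, illustrative names):

import numpy as np

def noncontinuous(x):
    # y_i = x_i where |x_i| < 0.5, otherwise round(2 x_i) / 2 (Eq. A.9).
    return np.where(np.abs(x) < 0.5, x, np.round(2.0 * x) / 2.0)

def u(x, a, k, m):
    # Piecewise penalty of Eq. (A.12): zero inside [-a, a], polynomial outside.
    return np.where(x > a, k * (x - a) ** m,
           np.where(x < -a, k * (-x - a) ** m, 0.0))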
A.1.3
Unimodal Shifted Functions
The following shifted benchmark functions are used in our experimental study, where n is the dimension of each problem, $z = x - o$, $x = [x_1, x_2, \cdots, x_n]$, $o_i$ is a randomly generated number in the problem's search space $S$ that differs in each dimension, the global optimum is $x^* = o$, $f_{\min}$ is the minimum value of the function, and $S \subseteq \mathbb{R}^n$.
1. Parabolic (Sphere) function
\[ f_0(x) = \sum_{i=1}^{n} z_i^2 + \mathrm{bias}_0 \tag{A.13} \]
• Separable, Scalable
• $x \in [-100, 100]^n$, Global optimum $x^* = o$, $f_0(x^*) = \mathrm{bias}_0$

2. Schwefel's function P2.22
\[ f_1(x) = \sum_{i=1}^{n} |z_i| + \prod_{i=1}^{n} |z_i| + \mathrm{bias}_1 \tag{A.14} \]
• $x \in [-10, 10]^n$, Global optimum $x^* = o$, $f_1(x^*) = \mathrm{bias}_1$

3. Schwefel's function P1.2
\[ f_2(x) = \sum_{i=1}^{n} \Big( \sum_{k=1}^{i} z_k \Big)^2 + \mathrm{bias}_2 \tag{A.15} \]
• $x \in [-100, 100]^n$, Global optimum $x^* = o$, $f_2(x^*) = \mathrm{bias}_2$

4. Step function
\[ f_3(x) = \sum_{i=1}^{n} \lfloor z_i + 0.5 \rfloor^2 + \mathrm{bias}_3 \tag{A.16} \]
• $x \in [-100, 100]^n$, Global optimum $x^* = o$, $f_3(x^*) = \mathrm{bias}_3$

5. Quartic Noise function
\[ f_4(x) = \sum_{i=1}^{n} i\,z_i^4 + \mathrm{random}[0, 1) + \mathrm{bias}_4 \tag{A.17} \]
• $x \in [-1.28, 1.28]^n$, Global optimum $x^* = o$, $f_4(x^*) = \mathrm{bias}_4$
A.1.4
Multimodal Shifted Functions

6. Generalized Rosenbrock function
\[ f_5(x) = \sum_{i=1}^{n-1} \left[ 100 (z_{i+1} - z_i^2)^2 + (z_i - 1)^2 \right] + \mathrm{bias}_5 \tag{A.18} \]
• $z = x - o + 1.0$
• $x \in [-10, 10]^n$, Global optimum $x^* = o$, $f_5(x^*) = \mathrm{bias}_5$

7. Rastrigin function
\[ f_6(x) = \sum_{i=1}^{n} \left[ z_i^2 - 10 \cos(2\pi z_i) + 10 \right] + \mathrm{bias}_6 \tag{A.19} \]
• $x \in [-5.12, 5.12]^n$, Global optimum $x^* = o$, $f_6(x^*) = \mathrm{bias}_6$

8. Noncontinuous Rastrigin function
\[ f_7(x) = \sum_{i=1}^{n} \left[ y_i^2 - 10 \cos(2\pi y_i) + 10 \right] + \mathrm{bias}_7 \tag{A.20} \]
where
\[ y_i = \begin{cases} z_i & |z_i| < \frac{1}{2} \\[2pt] \dfrac{\mathrm{round}(2 z_i)}{2} & |z_i| \ge \frac{1}{2} \end{cases} \qquad \text{for } i = 1, 2, \cdots, n. \]
• $x \in [-5.12, 5.12]^n$, Global optimum $x^* = o$, $f_7(x^*) = \mathrm{bias}_7$

9. Ackley function
\[ f_8(x) = -20 \exp\Big(-0.2 \sqrt{\tfrac{1}{n} \sum_{i=1}^{n} z_i^2}\Big) - \exp\Big(\tfrac{1}{n} \sum_{i=1}^{n} \cos(2\pi z_i)\Big) + 20 + e + \mathrm{bias}_8 \tag{A.21} \]
• $x \in [-32, 32]^n$, Global optimum $x^* = o$, $f_8(x^*) = \mathrm{bias}_8$

10. Griewank function
\[ f_9(x) = \frac{1}{4000} \sum_{i=1}^{n} z_i^2 - \prod_{i=1}^{n} \cos\Big(\frac{z_i}{\sqrt{i}}\Big) + 1 + \mathrm{bias}_9 \tag{A.22} \]
• $x \in [-600, 600]^n$, Global optimum $x^* = o$, $f_9(x^*) = \mathrm{bias}_9$

11. Generalized Penalized function
\[ f_{10}(x) = \frac{\pi}{n} \Big\{ 10 \sin^2(\pi y_1) + \sum_{i=1}^{n-1} (y_i - 1)^2 \left[ 1 + 10 \sin^2(\pi y_{i+1}) \right] + (y_n - 1)^2 \Big\} + \sum_{i=1}^{n} u(z_i, 10, 100, 4) + \mathrm{bias}_{10} \tag{A.23} \]
where
\[ y_i = 1 + \frac{1}{4}(z_i + 1), \qquad u(z_i, a, k, m) = \begin{cases} k (z_i - a)^m & z_i > a \\ 0 & -a \le z_i \le a \\ k (-z_i - a)^m & z_i < -a \end{cases} \]
• $x \in [-50, 50]^n$, Global optimum $x^* = o$, $f_{10}(x^*) = \mathrm{bias}_{10}$
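Note the extra +1 in z for the shifted Rosenbrock function; the short sketch below (illustrative names) makes the subtlety explicit, since z = x − o + 1 puts z at {1}^n, and hence f5 at bias5, exactly when x = o:

import numpy as np

def shifted_rosenbrock(x, o, bias):
    # f5 of Eq. (A.18): the "+ 1.0" keeps the optimum at x = o.
    z = np.asarray(x) - o + 1.0
    return np.sum(100.0 * (z[1:] - z[:-1] ** 2) ** 2
                  + (z[:-1] - 1.0) ** 2) + bias

n = 30
o = np.random.uniform(-10.0, 10.0, n)
assert abs(shifted_rosenbrock(o, o, bias=390.0) - 390.0) < 1e-9  # optimum check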
A.2
Multi-Objective Optimization
The multi-objective benchmark functions used in our experimental study are taken from [64, 256, 265, 266], where n is the dimension of each problem.
A.2.1
The CEC 2009 Multiobjective Optimization Test Instances
There are six unconstrained (bound constrained) problems [256] in the experimental study; each problem has two objectives to be minimized. The Pareto fronts of the problems are shown in Figure A.1. Unconstrained problems 1, 2, 3, 4, and 7 have continuous Pareto fronts, while unconstrained problem 5 has a discrete Pareto front. Throughout, $J_1 = \{j \mid j \text{ is odd and } 2 \le j \le n\}$ and $J_2 = \{j \mid j \text{ is even and } 2 \le j \le n\}$.

1. Unconstrained problem 1 (UCP1)
\[ f_1 = x_1 + \frac{2}{|J_1|} \sum_{j \in J_1} \Big( x_j - \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big) \Big)^2 \tag{A.24} \]
\[ f_2 = 1 - \sqrt{x_1} + \frac{2}{|J_2|} \sum_{j \in J_2} \Big( x_j - \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big) \Big)^2 \tag{A.25} \]
The search space is $[0, 1] \times [-1, 1]^{n-1}$. The Pareto front is
\[ f_2 = 1 - \sqrt{f_1}, \qquad 0 \le f_1 \le 1. \]
The Pareto set is
\[ x_j = \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big), \qquad j = 2, \cdots, n, \quad 0 \le x_1 \le 1. \]

2. Unconstrained problem 2 (UCP2)
\[ f_1 = x_1 + \frac{2}{|J_1|} \sum_{j \in J_1} y_j^2 \tag{A.26} \]
\[ f_2 = 1 - \sqrt{x_1} + \frac{2}{|J_2|} \sum_{j \in J_2} y_j^2 \tag{A.27} \]
where
\[ y_j = \begin{cases} x_j - \big[0.3 x_1^2 \cos\big(24\pi x_1 + \tfrac{4j\pi}{n}\big) + 0.6 x_1\big] \cos\big(6\pi x_1 + \tfrac{j\pi}{n}\big) & j \in J_1 \\[4pt] x_j - \big[0.3 x_1^2 \cos\big(24\pi x_1 + \tfrac{4j\pi}{n}\big) + 0.6 x_1\big] \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big) & j \in J_2 \end{cases} \]
The search space is $[0, 1] \times [-1, 1]^{n-1}$. The Pareto front is $f_2 = 1 - \sqrt{f_1}$, $0 \le f_1 \le 1$. The Pareto set is
\[ x_j = \begin{cases} \big[0.3 x_1^2 \cos\big(24\pi x_1 + \tfrac{4j\pi}{n}\big) + 0.6 x_1\big] \cos\big(6\pi x_1 + \tfrac{j\pi}{n}\big) & j \in J_1 \\[4pt] \big[0.3 x_1^2 \cos\big(24\pi x_1 + \tfrac{4j\pi}{n}\big) + 0.6 x_1\big] \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big) & j \in J_2 \end{cases} \qquad 0 \le x_1 \le 1. \]

3. Unconstrained problem 3 (UCP3)
\[ f_1 = x_1 + \frac{2}{|J_1|} \Big( 4 \sum_{j \in J_1} y_j^2 - 2 \prod_{j \in J_1} \cos\Big(\frac{20 y_j \pi}{\sqrt{j}}\Big) + 2 \Big) \tag{A.28} \]
\[ f_2 = 1 - \sqrt{x_1} + \frac{2}{|J_2|} \Big( 4 \sum_{j \in J_2} y_j^2 - 2 \prod_{j \in J_2} \cos\Big(\frac{20 y_j \pi}{\sqrt{j}}\Big) + 2 \Big) \tag{A.29} \]
where
\[ y_j = x_j - x_1^{0.5\big(1.0 + \frac{3(j-2)}{n-2}\big)}, \qquad j = 2, \cdots, n. \]
The search space is $[0, 1]^n$. The Pareto front is $f_2 = 1 - \sqrt{f_1}$, $0 \le f_1 \le 1$. The Pareto set is
\[ x_j = x_1^{0.5\big(1.0 + \frac{3(j-2)}{n-2}\big)}, \qquad j = 2, \cdots, n, \quad 0 \le x_1 \le 1. \]

4. Unconstrained problem 4 (UCP4)
\[ f_1 = x_1 + \frac{2}{|J_1|} \sum_{j \in J_1} h(y_j) \tag{A.30} \]
\[ f_2 = 1 - x_1^2 + \frac{2}{|J_2|} \sum_{j \in J_2} h(y_j) \tag{A.31} \]
where
\[ y_j = x_j - \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big), \qquad j = 2, \cdots, n, \qquad h(t) = \frac{|t|}{1 + e^{2|t|}}. \]
The search space is $[0, 1] \times [-2, 2]^{n-1}$. The Pareto front is
\[ f_2 = 1 - f_1^2, \qquad 0 \le f_1 \le 1. \]
The Pareto set is
\[ x_j = \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big), \qquad j = 2, \cdots, n, \quad 0 \le x_1 \le 1. \]

5. Unconstrained problem 5 (UCP5)
\[ f_1 = x_1 + \Big(\frac{1}{2N} + \varepsilon\Big) |\sin(2N\pi x_1)| + \frac{2}{|J_1|} \sum_{j \in J_1} h(y_j) \tag{A.32} \]
\[ f_2 = 1 - x_1 + \Big(\frac{1}{2N} + \varepsilon\Big) |\sin(2N\pi x_1)| + \frac{2}{|J_2|} \sum_{j \in J_2} h(y_j) \tag{A.33} \]
where $N$ is an integer and $\varepsilon > 0$; in the parameter setting, $N = 10$ and $\varepsilon = 0.1$. Here
\[ y_j = x_j - \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big), \qquad j = 2, \cdots, n, \qquad h(t) = 2t^2 - \cos(4\pi t) + 1. \]
The search space is $[0, 1] \times [-1, 1]^{n-1}$. The Pareto front consists of $2N + 1$ discrete Pareto optimal solutions:
\[ \Big( \frac{i}{2N},\ 1 - \frac{i}{2N} \Big), \qquad i = 0, 1, \cdots, 2N. \]

6. Unconstrained problem 7 (UCP7)
\[ f_1 = \sqrt[5]{x_1} + \frac{2}{|J_1|} \sum_{j \in J_1} y_j^2 \tag{A.34} \]
\[ f_2 = 1 - \sqrt[5]{x_1} + \frac{2}{|J_2|} \sum_{j \in J_2} y_j^2 \tag{A.35} \]
where
\[ y_j = x_j - \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big), \qquad j = 2, \cdots, n. \]
The search space is $[0, 1] \times [-1, 1]^{n-1}$. The Pareto front is the straight line
\[ f_2 = 1 - f_1, \qquad 0 \le f_1 \le 1. \]
The Pareto set is
\[ x_j = \sin\big(6\pi x_1 + \tfrac{j\pi}{n}\big), \qquad j = 2, \cdots, n, \quad 0 \le x_1 \le 1. \]
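As an example of how the index sets J1 and J2 are used, a sketch of the two UCP1 objectives follows. Indices are 1-based in the formulas and 0-based in the code, and n ≥ 3 is assumed so that both index sets are nonempty:

import numpy as np

def ucp1(x):
    # Objectives of unconstrained problem 1, Eqs. (A.24)-(A.25).
    n = x.size
    j = np.arange(2, n + 1)                              # j = 2, ..., n (1-based)
    y = x[1:] - np.sin(6.0 * np.pi * x[0] + j * np.pi / n)
    odd = (j % 2 == 1)                                   # J1: odd indices
    f1 = x[0] + 2.0 * np.mean(y[odd] ** 2)               # mean = sum / |J1|
    f2 = 1.0 - np.sqrt(x[0]) + 2.0 * np.mean(y[~odd] ** 2)
    return f1, f2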
[Figure A.1 appears here: six panels plotting f2 against f1 over [0, 1] × [0, 1], showing the Pareto fronts of UCP 1, UCP 2, UCP 3, UCP 4, UCP 5, and UCP 7.]

Figure A.1: The Pareto front of Unconstrained (bound constrained) Problems: UCP 1 ∼ UCP 5, and UCP 7.
Bibliography [1] Ajith Abraham, Swagatam Das, and Amit Konar. Document clustering using differential evolution. In Proceedings of the 2006 IEEE Congress on Evolutionary Computations (CEC 2006), pages 1784–1791, July 2006. [2] Ajith Abraham, Crina Grosan, and Vitorino Ramos, editors. Swarm Intelligence in Data Mining, volume 34 of Studies in Computational Intelligence. Springer Berlin/Heidelberg, 2006. [3] Salem F. Adra, Tony J. Dodd, Ian A. Griffin, and Peter J. Fleming. Convergence acceleration operator for multiobjective optimization. IEEE Transactions on Evolutionary Computation, 12(4):825–847, August 2009. [4] Salem F. Adra and Peter J. Fleming. Diversity management in evolutionary many-objective optimization. IEEE Transactions on Evolutionary Computation, 15(2):183–195, April 2011. [5] Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim. On the surprising behavior of distance metrics in high dimensional space. In Jan Van den Bussche and Victor Vianu, editors, Database Theory – ICDT 2001, volume 1973 of Lecture Notes in Computer Science, pages 420–434. Springer Berlin / Heidelberg, 2001. [6] Charu C. Aggarwal and ChengXiang Zhai, editors. Mining Text Data. Springer, 2012. [7] David W. Aha. Lazy learning. Kluwer Academic Publishers, 1997. [8] Abbas Ahmadi, Fakhri Karray, and Mohamed Kamel. Multiple cooperating swarms for data clustering. In Proceedings of the 2007 IEEE Swarm Intelligence Symposium (SIS 2007), pages 206 – 212, April 2007. [9] Carmelo J. A. Bastos Filho, Fernando B. de Lima Neto, Anthony J. C. C. Lins, Antˆ onio I. S. Nascimento, and Mar´ılia P. Lima. A novel search algorithm based on fish school behavior. In Proceedings of IEEE International Conference on Systems, Man and Cybernetics, (SMC 2008), pages 2646–2651, 12-15 October 2008. 204
[10] Carmelo J. A. Bastos Filho, Fernando B. de Lima Neto, Anthony J. C. C. Lins, Antˆ onio I. S. Nascimento, and Mar´ılia P. Lima. Fish school search. In Raymond Chiong, editor, Nature-Inspired Algorithms for Optimisation, volume 193 of Studies in Computational Intelligence, pages 261–277. Springer Berlin/Heidelberg, 2009. [11] Richard Bellman. Adaptive Control Processes: A guided Tour. Princeton University Press, Princeton, NJ, 1961. [12] Peter J. Bentley. Evolutionary Design by Computers. Morgan Kaufmann Publishers, June 1999. [13] Hans-Georg Beyer and Hans-Paul Schwefel. Evolution strategies: A comprehensive introduction. Natural Computing, 1:3–52, 2002. [14] Tim Blackwell. Particle swarms and population diversity. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 9:793–802, 2005. [15] Tim Blackwell. A study of collapse in bare bones particle swarm optimization. IEEE Transactions on Evolutionary Computation, 16(3):354–372, June 2012. [16] Tim Blackwell and P. Bentley. Don’t push me! collision-avoiding swarms. In Proceedings of The Fourth Congress on Evolutionary Computation (CEC 2002), pages 1691–1696, May 2002. [17] David C. Blair and M. E. Maron. An evaluation of retrieval effectiveness for a full-text document–retrieval system. Communications of the ACM, 28(3):289– 299, March 1985. [18] Eric Bonabeau, Marco Dorigo, and Guy Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, 1999. [19] Peter A. N. Bosman and Dirk Thierens. The balance between proximity and diversity in multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 7(2):174–188, April 2003. [20] J¨ urgen Branke. Memory enhanced evolutionary algorithms for changing optimization problems. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC 1999), volume 3, pages 1975–1882, July 1999. [21] Daniel Bratton and James Kennedy. Defining a standard for particle swarm optimization. In Proceedings of the 2007 IEEE Swarm Intelligence Symposium (SIS 2007), pages 120–127, April 2007.
[22] Dimo Brockhoff and Eckart Zitzler. Are all objectives necessary? on dimensionality reduction in evolutionary multiobjective optimization. In Thomas Philip Runarsson, Hans-Georg Beyer, Edmund Burke, Juan J. Merelo-Guerv´os, L. Darrell Whitley, and Xin Yao, editors, Parallel Problem Solving from Nature - PPSN IX, volume 4193 of Lecture Notes in Computer Science, pages 533–542. Springer Berlin Heidelberg, 2006. [23] Dimo Brockhoff and Eckart Zitzler. Objective reduction in evolutionary multiobjective optimization: Theory and applications. Evolutionary Computation, 17(2):135–166, Summer 2009. [24] Lam Thu Bui, Zbignew Michalewicz, Eddy Parkinson, and Manuel Blanco Abello. Adaptation in dynamic environments: A case study in mission planning. IEEE Transactions on Evolutionary Computation, 16(2):190–209, April 2012. [25] Alejandro Cervantes, In´es Mar´ıa Galv´an, and Pedro Isasi. AMPSO: A New Particle Swarm Method for Nearest Neighborhood Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(5):1082–1091, October 2009. [26] Stephen Chen and James Montgomery. A simple strategy to maintain diversity and reduce crowding in particle swarm optimization. In Genetic and Evolutionary Computation Conference (GECCO 2011 Companion), pages 811–812, July 2011. [27] Tianshi Chen, Ke Tang, Guoliang Chen, and Xin Yao. Analysis of computational time of simple estimation of distribution algorithms. IEEE Transactions on Evolutionary Computation, 14(1):1–22, February 2010. [28] Shi Cheng and Yuhui Shi. Diversity control in particle swarm optimization. In Proceedings of 2011 IEEE Symposium on Swarm Intelligence (SIS 2011), pages 110–118, Paris, France, April 2011. [29] Shi Cheng and Yuhui Shi. Measurement of PSO Diversity Based on L1 Norm. Chinese Journal of Computer Science, 38(7):190–193, July 2011. [30] Shi Cheng and Yuhui Shi. Normalized population diversity in particle swarm optimization. In Ying Tan, Yuhui Shi, Yi Chai, and Guoyin Wang, editors, Advances in Swarm Intelligence, volume 6728 of Lecture Notes in Computer Science, pages 38–45. Springer Berlin/Heidelberg, 2011. [31] Shi Cheng, Yuhui Shi, and Quande Qin. Experimental Study on Boundary Constraints Handling in Particle Swarm Optimization: From Population Diversity Perspective. International Journal of Swarm Intelligence Research (IJSIR), 2(3):43–69, July-September 2011. 206
[32] Shi Cheng, Yuhui Shi, and Quande Qin. Promoting diversity in particle swarm optimization to solve multimodal problems. In Bao-Liang Lu, Liqing Zhang, and James Kwok, editors, Neural Information Processing, volume 7063 of Lecture Notes in Computer Science, pages 228–237. Springer Berlin/Heidelberg, 2011. [33] Shi Cheng, Yuhui Shi, and Quande Qin. Dynamical exploitation space reduction in particle swarm optimization for solving large scale problems. In Proceedings of 2012 IEEE Congress on Evolutionary Computation, (CEC 2012), pages 3030– 3037, Brisbane, Australia, 2012. IEEE. [34] Shi Cheng, Yuhui Shi, and Quande Qin. On the performance metrics of multiobjective optimization. In Ying Tan, Yuhui Shi, and Zhen Ji, editors, Advances in Swarm Intelligence, (International Conference on Swarm Intelligence, ICSI 2012), volume 7331 of Lecture Notes in Computer Science, pages 504–512. Springer Berlin / Heidelberg, 2012. [35] Shi Cheng, Yuhui Shi, and Quande Qin. Particle swarm optimization based semisupervised learning on chinese text categorization. In Proceedings of 2012 IEEE Congress on Evolutionary Computation, (CEC 2012), pages 3131–3198, Brisbane, Australia, 2012. IEEE. [36] Shi Cheng, Yuhui Shi, and Quande Qin. Population diversity based study on search information propagation in particle swarm optimization. In Proceedings of 2012 IEEE Congress on Evolutionary Computation, (CEC 2012), pages 1272– 1279, Brisbane, Australia, 2012. IEEE. [37] Shi Cheng, Yuhui Shi, and Quande Qin. Population diversity of particle swarm optimizer solving single and multi-objective problems. International Journal of Swarm Intelligence Research (IJSIR), 3(4):23–60, 2012. [38] Shi Cheng, Yuhui Shi, and Quande Qin. Examples initialization in chinese text categorization. In Proceedings of The Third IEEE International Conference on Information Science and Technology (ICIST 2013), pages 967–971, Yangzhou, China, 2013. IEEE. [39] Shi Cheng, Yuhui Shi, and Quande Qin. A study of normalized population diversity in particle swarm optimization. International Journal of Swarm Intelligence Research (IJSIR), 4(1):1–34, 2013. [40] Shi Cheng, Yuhui Shi, Quande Qin, and Shujing Gao. Solution clustering analysis in brain storm optimization algorithm. In Proceedings of The 2013 IEEE Symposium on Swarm Intelligence, (SIS 2013), pages 111–118, Singapore, 2013. IEEE. 207
[41] Shi Cheng, Yuhui Shi, Quande Qin, and Tiew On Ting. Population diversity based inertia weight adaptation in particle swarm optimization. In Proceedings of The Fifth International Conference on Advanced Computational Intelligence, (ICACI 2012), pages 395–403, Nanjing, China, 2012. IEEE. [42] Shi Cheng, Yuhui Shi, Quande Qin, and Tiew On Ting. Particle swarm optimization based nearest neighbor algorithm on chinese text categorization. In Proceedings of The 2013 IEEE Symposium on Swarm Intelligence, (SIS 2013), pages 164–171, Singapore, 2013. IEEE. [43] Maurice Clerc. From theory to practice in particle swarm optimization. In Bijaya Ketan Panigrahi, Yuhui Shi, and Meng-Hiot Lim, editors, Handbook of Swarm Intelligence, volume 8 of Adaptation, Learning, and Optimization, pages 3–36. Springer Berlin Heidelberg, 2010. [44] Maurice Clerc and James Kennedy. The particle swarm–explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6(1):58–73, February 2002. [45] Carlos A. Coello Coello, Gary B. Lamont, and David A. Van Veldhuizen. Evolutionary Algorithms for Solving Multi-Objective Problems. Genetic and Evolutionary Computation Series. Springer, second edition, 2007. [46] Carlos A. Coello Coello and Maximino Salazar Lechuga. MOPSO: A Proposal for Multiple Objective Particle Swarm Optimization. In Proceedings of The Fourth Congress on Evolutionary Computation (CEC 2002), pages 1051–1056, May 2002. [47] Carlos A. Coello Coello, Gregorio Toscano Pulido, and Maximino Salazar Lechuga. Handling multiple objectives with particle swarm optimization. IEEE Transactions on Evolutionary Computation, 8(3):256–279, June 2004. [48] Carlos Artemio Coello Coello, Satchidananda Dehuri, and Susmita Ghosh, editors. Swarm Intelligence for Multi-objective Problems in Data Mining, volume 242 of Studies in Computational Intelligence. Springer Berlin/Heidelberg, 2009. [49] Sandra C. M. Cohen and Leandro N. de Castro. Data clustering with particle swarms. In Proceedings of the 2006 IEEE Congress on Evolutionary Computations (CEC 2006), pages 1792–1798, July 2006. [50] Steven M. Corns, Kenneth M. Bryden, and Daniel A. Ashlock. Solution transfer rates in graph based evolutionary algorithms. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation (CEC 2005), volume 2, pages 1699–1705, 2005.
[51] Guillaume Corriveau, Raynald Guilbault, Antoine Tahan, and Robert Sabourin. Review and Study of Genotypic Diversity Measures for Real-Coded Representations. IEEE Transactions on Evolutionary Computation, 16(5):695–710, October 2012. [52] Thomas M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, January 1967. [53] Xiaohui Cui, Thomas E. Potok, and Paul Palathingal. Document clustering using particle swarm optimization. In Proceedings of 2005 IEEE Swarm Intelligence Symposium (SIS 2005), pages 185–191, June 2005. [54] Carlos Manuel Mira da Fonseca. Multiobjective Genetic Algorithms with Application to Control Engineering Problems. PhD thesis, Department of Automatic Control and Systems Engineering, The University of Sheffield, September 1995. [55] Swagatam Das, P. N. Suganthan, and Carlos Artemio Coello Coello. Guest editorial: Special issue on differential evolution. IEEE Transactions on Evolutionary Computation, 15(1):1–3, February 2011. [56] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. LocalitySensitive Hashing Scheme Based on p-Stable Distributions. In Jack Snoeyink and Jean-Daniel Boissonnat, editors, Proceedings of the 20th ACM Symposium on Computational Geometry, pages 253–262, Brooklyn, New York, USA, June 2004. ACM. [57] Randall Davis, Howard Shrobe, and Peter Szolovits. What Is a Knowledge Representation? AI Magazine, 14(1):17–33, Spring 1993. [58] Leandro Nunes de Castro and Jonathan Timmis. Artificial Immune Systems: A New Computational Intelligence Approach. Springer, November 2002. [59] Kenneth Alan De Jong. An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, Department of Computer and Communication Sciences, University of Michigan, August 1975. [60] Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, April 2002. [61] Kalyanmoy Deb, Shubham Gupta, David A. Daum, J¨ urgen Branke, Abhishek Kumar Mall, and Dhanesh Padmanabhan. Reliability-based optimization using evolutionary algorithms.
IEEE Transactions on Evolutionary Computation,
13(5):1054–1074, October 2009.
[62] Kalyanmoy Deb and Murat Köksalan. Guest editorial: Special issue on
preference-based multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 14(5):669–670, October 2010. [63] Kalyanmoy Deb, Ankur Sinha, Pekka J. Korhonen, and Jyrki Wallenius. An interactive evolutionary multiobjective optimization method based on progressively approximated value functions. IEEE Transactions on Evolutionary Computation, 14(5):723–739, October 2010. [64] Kalyanmoy Deb, Lothar Thiele, Marco Laumanns, and Eckart Zitzler. Scalable Test Problems for Evolutionary Multi-Objective Optimization. Technical Report No. 112, Indian Institute of Technology Kanpur, July 2001. [65] Francesco di Pierro, Soon-Thiam Khu, and Dragan A. Savi´c. An investigation on preference order ranking scheme for multiobjective evolutionary optimization. IEEE Transactions on Evolutionary Computation, 11(1):17–45, 2007. [66] Pedro Domingos. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, October 2012. [67] Jorge S. Hern´ andez Dom´ınguez and Gregorio Toscano Pulido. A Comparison on the Search of Particle Swarm Optimization and Differential Evolution on MultiObjective Optimization. In Proceedings of 2011 IEEE Congress on Evolutionary Computation, (CEC 2011), pages 1978–1985, June 2011. [68] Marco Dorigo and Luca Maria Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, April 1997. [69] Marco Dorigo, Vittorio Maniezzo, and Albert Colorni. Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 26(1):29–41, February 1996. [70] Marco Dorigo and Thomas St¨ utzle. Ant Colony Optimization. MIT Press, June 2004. [71] Bernab´e Dorronsoro and Pascal Bouvry. Improving Classical and Decentralized Differential Evolution With New Mutation Operator and Population Topologies. IEEE Transactions on Evolutionary Computation, 15(1):67–98, February 2011. [72] Sahibsingh A. Dudani. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6(4):325–327, April 1976. [73] Margaret H. Dunham. Data Mining Introductory and Advanced Topics. Pearson Education, 2003. 210
[74] Russell Eberhart and James Kennedy. A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, pages 39–43, 1995. [75] Russell Eberhart and Yuhui Shi. Particle swarm optimization: Developments, applications and resources. In Proceedings of the 2001 Congress on Evolutionary Computation (CEC2001), pages 81–86, 2001. [76] Russell Eberhart and Yuhui Shi. Computational Intelligence: Concepts to Implementations. Morgan Kaufmann Publisher, first edition, 2007. [77] Russell C. Eberhart, Roy W. Dobbins, and Patrick K. Simpson. Computational Intelligence PC Tools. Academic Press Professional, 1996. [78] Russell C. Eberhart and Yuhui Shi. Guest editorial special issue on particle swarm optimization. IEEE Transactions on Evolutionary Computation, 8(3):201–203, June 2004. [79] Matthias Ehrgott. Multiobjective optimization. AI Magazine, 29(4):47–57, December 2008. [80] A. E. Eiben and C. A. Schippers. On evolutionary exploration and exploitation. Fundamenta Informaticae, 35(1-4):35–50, 1998. ´ [81] Agoston Endre Eiben, Robert Hinterding, and Zbigniew Michalewicz. Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 3(2):124–141, July 1999. [82] Mohammed El-Abd and Mohamed Kamel. Information exchange in multiple cooperating swarms. In Proceedings of 2005 IEEE Swarm Intelligence Symposium (SIS 2005), pages 138–142, June 2005. [83] Andries Engelbrecht, Xiaodong Li, Martin Middendorf, and Luca Maria Gambardella. Editorial special issue: Swarm intelligence. IEEE Transactions on Evolutionary Computation, 13(4):677–680, August 2009. [84] Muzaffar Eusuff, Kevin Lansey, and Fayzul Pasha. Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization. Engineering Optimization, 38(2):129–154, March 2006. [85] Muzaffar M. Eusuff and Kevin E. Lansey. Optimization of water distribution network design using the shuffled frog leaping algorithm. Journal of Water Resources Planning and Management, 129(3):210–225, 2003.
211
[86] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17(3):37–54, Fall 1996. [87] Daniel R. Fealko. Evaluating Particle Swarm Intelligence Techniques for Solving University Examination Timetabling Problems. PhD thesis, Nova Southeastern University, August 2005. [88] Ronen Feldman and James Sanger. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, 2006. [89] Sevan G. Ficici. Monotonic solution concepts in coevolution. In Genetic and Evolutionary Computation Conference (GECCO 2005), pages 499–506, June 2005. [90] Evelyn Fix and J. L. Hodges. Discriminatory analysis–nonparametric discrimination: Consistency properties. Technical Report Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas, February 1951. [91] David B. Fogel. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. the IEEE Press Series on Computational Intelligence. IEEE Press, third edition, 2006. [92] Lawrence J. Fogel. Evolutionary Programming in Perspective: The Top-Down View. In Jacek M. Zurada, Robert J. II Marks, and Charles J. Robinson, editors, Computational Intelligence: Imitating Life, pages 135–146. IEEE Press, Piscataway, NJ, July 1994. [93] Lawrence J. Fogel, Alvin J. Owens, and Michael John Walsh. Artificial Intelligence through Simulated Evolution. John Wiley & Sons, October 1966. [94] Carlos M. Fonseca and Peter J. Fleming. Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In Genetic Algorithms: Proceedings of the Fifth International Conference, pages 416–423, July 1993. [95] Carlos M. Fonseca and Peter J. Fleming. Multiobjective Optimization and Multiple Constraint Handling with Evolutionary Algorithms–Part I: A Unified Formulation. IEEE Transactions on Systems, Man, And Cybernetics—Part A: Systems and Humans, 28(1):26–37, January 1998. [96] Carlos M. Fonseca and Peter J. Fleming. Multiobjective Optimization and Multiple Constraint Handling with Evolutionary Algorithms–Part II: Application Example. IEEE Transactions on Systems, Man, And Cybernetics—Part A: Systems and Humans, 28(1):38–47, January 1998. 212
[97] George Forman. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3:1289–1305, 2003. [98] Chi Keong Goh and Kay Chen Tan. An investigation on noisy environments in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, 11(3):354–381, June 2007. [99] Andrew Brian Goldberg. New Directions in Semi-Supervised Learning. PhD thesis, University of Wisconsin-Madison, 2010. [100] David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1989. [101] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, third edition, 1996. [102] Anil K. Gupta, Ken G. Smith, and Christina E. Shalley. The interplay between exploration and exploitation. Academy of Management Journal, 49(4):693–706, August 2006. [103] Michael Pilegaard Hansen and Andrzej Jaszkiewicz. Evaluating the quality of approximations to the non-dominated set. Technical Report MM-REP-1998-7, Technical University of Denmark, March 1998. [104] William Eugene Hart. Adaptive Global Optimization with Local Search. PhD thesis, Department of Computer Science & Engineering, University of California, San Diego, 1994. [105] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, second edition, February 2009. [106] John H. Holland. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. The University of Michigan Press, December 1975. [107] John H. Holland. Building blocks, cohort genetic algorithms, and hyperplanedefined functions. Evolutionary Computation, 8(4):373–391, December 2000. [108] Jeffrey Horn. The Nature of Niching: Genetic Algorithms and The Evolution of Optimal, Cooperative Populations. PhD thesis, University of Illinois at UrbanaChampaign, 1997.
[109] Xiaohui Hu and Russell Eberhart. Multiobjective optimization using dynamic neighborhood particle swarm optimization.
In Proceedings of The Fourth
Congress on Evolutionary Computation (CEC 2002), pages 1677–1681, May 2002. [110] Xiaohui Hu, Russell C. Eberhart, and Yuhui Shi. Particle swarm with extended memory for multiobjective optimization. In Proceedings of 2003 IEEE Swarm Intelligence Symposium (SIS 2003), pages 193–197, April 2003. [111] Xiaohui Hu, Yuhui Shi, and Russell Eberhart. Recent advances in particle swarm. In Proceedings of the 2004 Congress on Evolutionary Computation (CEC2004), pages 90–97, 2004. [112] Han Huang, Hu Qin, Zhifeng Hao, and Andrew Lim. Example-based learning particle swarm optimization for continuous optimization. Information Sciences, 182(1):125–138, January 2012. [113] Tony Huang and Ananda Sanagavarapu Mohan. A hybrid boundary condition for robust particle swarm optimization. IEEE Antennas and Wireless Propagation Letters, 4:112–117, 2005. [114] Hisao Ishibuchi and Tomoharu Nakashima. Evolution of reference sets in nearest neighbor classification. In Bob McKay, Xin Yao, Charles S. Newton, Jong-Hwan Kim, and Takeshi Furuhashi, editors, Simulated Evolution and Learning, volume 1585 of Lecture Notes in Computer Science, pages 82–89. Springer Berlin / Heidelberg, 1999. [115] Hisao Ishibuchi, Noritaka Tsukamoto, and Yusuke Nojima. Evolutionary manyobjective optimization: A short review. In Proceedings of 2008 IEEE Congress on Evolutionary Computation (CEC2008), pages 2424–2431, Hong Kong, June 2008. [116] Hisao Ishibuchi, Noritaka Tsukamoto, and Yusuke Nojima. Diversity improvement by non-geometric binary crossover in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, 14(6):985–998, December 2010. [117] Hisao Ishibuchi, Tadashi Yoshida, and Tadahiko Murata. Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Transactions on Evolutionary Computation, 7(2):204– 223, April 2003. [118] Yaochu Jin. A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing, 9(1):3–12, 2005.
[119] Yaochu Jin and J¨ urgen Branke. Evolutionary Optimization in Uncertain Environments – A Survey. IEEE Transactions on Evolutionary Computation, 9(3):303– 317, June 2005. [120] Yaochu Jin and Bernhard Sendhoff. Constructing dynamic optimization test problems using the multi-objective optimization concept. In G¨ unther R. Raidl, Stefano Cagnoni, J¨ urgen Branke, David Wolfe Corne, Rolf Drechsler, Yaochu Jin, Colin G. Johnson, Penousal Machado, Elena Marchiori, Franz Rothlauf, George D. Smith, and Giovanni Squillero, editors, Applications of Evolutionary Computing, volume 3005 of Lecture Notes in Computer Science, pages 525–536. Springer Berlin / Heidelberg, 2004. [121] Yaochu Jin and Bernhard Sendhoff. A systems approach to evolutionary multiobjective structural optimization and beyond. IEEE Computational Intelligence Magazine, 4(3):62–76, August 2009. [122] Yaohong Jin, Wen Xiong, and Cong Wang. Feature selection for chinese text categorization based on improved particle swarm optimization. In International Conference on Natural Language Processing and Knowledge Engineering (NLPKE), pages 1–6, Beijing, August 2010. [123] Thorsten Joachims and Fabrizio Sebastiani. Guest editors’ introduction to the special issue on automated text categorization. Journal of Intelligent Information Systems, 18:103–105, 2002. [124] Dervis Karaboga. An idea based on honey bee swarm for numerical optimization. Technical report, Erciyes University, Engineering Faculty, Computer Engineering Department, October 2005. [125] Dervis Karaboga and Bahriye Akay. A survey: algorithms simulating bee swarm intelligence. Artificial Intelligence Review, 31(1-4):61–85, 2009. [126] Dervis Karaboga and Bahriye Basturk. Artificial bee colony (abc) optimization algorithm for solving constrained optimization problems. In Patricia Melin, Oscar Castillo, Luis Aguilar, Janusz Kacprzyk, and Witold Pedrycz, editors, Foundations of Fuzzy Logic and Soft Computing, volume 4529 of Lecture Notes in Computer Science, pages 789–798. Springer Berlin/Heidelberg, 2007. [127] Dervis Karaboga and Bahriye Basturk. A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. Journal of Global Optimization, 39(3):459–471, November 2007. [128] Dervis Karaboga and Bahriye Basturk. On the performance of artificial bee colony (ABC) algorithm. Applied Soft Computing, 8(1):687–697, 2008. 215
[129] Dervis Karaboga and Celar Ozturk. A novel clustering approach: Artificial Bee Colony (ABC) algorithm. Applied Soft Computing, 11(1):652–657, January 2011. [130] James Kennedy. Small worlds and mega-minds: Effects of neighborhood topology on particle swarm performance. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC1999), pages 1931–1938, 1999. [131] James Kennedy. Bare bones particle swarms. In Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS 2003), pages 80–87, April 2003. [132] James Kennedy. Some issues and practices for particle swarms. In Proceedings of the 2007 IEEE Swarm Intelligence Symposium (SIS 2007), pages 162–169, April 2007. [133] James Kennedy and Russell Eberhart. Particle swarm optimization. In Proceedings of IEEE International Conference on Neural Networks (ICNN), pages 1942–1948, 1995. [134] James Kennedy, Russell Eberhart, and Yuhui Shi. Swarm Intelligence. Morgan Kaufmann Publisher, 2001. [135] James Kennedy and Rui Mendes. Population structure and particle swarm performance. In Proceedings of The Fourth Congress on Evolutionary Computation (CEC 2002), pages 1671–1676, May 2002. [136] James Kennedy and Rui Mendes. Neighborhood topologies in fully informed and best-of-neighborhood particle swarms. IEEE Transactions on Systems, Man, and Cybernetics–Part C: Applications and Reviews, 36(4):515–519, July 2006. [137] Jeffrey O. Kephart. A biologically inspired immune system for computers. In Proceedings of Artificial Life IV: The Fourth International Workshop on the Synthesis and Simulation of Living Systems, pages 130–139. MIT Press, 1994. [138] Joshua Knowles. ParEGO: A Hybrid Algorithm With On-Line Landscape Approximation for Expensive Multiobjective Optimization Problems. IEEE Transactions on Evolutionary Computation, 1(10):50–66, February 2005. [139] Joshua D. Knowles and David Corne. On metrics for comparing non-dominated sets.
In Proceedings of the 2002 Congress on Evolutionary Computation
(CEC2002), pages 711–716, 2002. [140] Joshua D. Knowles and David Corne. Properties of an adaptive archiving algorithm for storing nondominated vectors. IEEE Transactions on Evolutionary Computation, 7(2):100–116, April 2003.
[141] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. A Bradford Book, December 1992. [142] Brian Kulis and Kristen Grauman. Kernelized locality-sensitive hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(6):1092–1104, June 2012. [143] William B. Langdon, Riccardo Poli, Owen Holland, and Thiemo Krink. Understanding particle swarm optimisation by evolving problem landscapes. In Proceedings of 2005 IEEE Swarm Intelligence Symposium (SIS 2005), pages 30–37, June 2005. [144] Carsten Lanquillon. Enhancing Text Classification to Improve Information Filtering. PhD thesis, DaimlerChrysler AG, Research & Technology, December 2001. [145] John A. Lee and Michel Verleysen. Nonlinear Dimensionality Reduction. Information Science and Statistics. Springer, 2007. [146] Yee Leung, Yong Gao, and Zong-Ben Xu. Degree of population diversity—a perspective on premature convergence in genetic algorithms and its markov chain analysis. IEEE Transactions on Neural Networks, 8(5):1165–1176, September 1997. [147] Changhe Li, Shengxiang Yang, T. T. Nguyen, E. L. Yu, Xin Yao, Yaochu Jin, Hans-Georg Beyer, and P. N. Suganthan. Benchmark generator for cec’2009 competition on dynamic optimization. Technical report, University of Leicester, October 2008. [148] Changhe Li, Shengxiang Yang, and David Alejandro Pelta. Benchmark generator for the ieee wcci-2012 competition on evolutionary computation for dynamic optimization problems. Technical report, School of Computer Science, China University of Geosciences, October 1998. [149] Hui Li and Qingfu Zhang. Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II. IEEE Transactions on Evolutionary Computation, 13(2):284–302, April 2009. [150] Xiaodong Li and Xin Yao. Cooperatively coevolving particle swarms for large scale optimization. IEEE Transactions on Evolutionary Computation, 16(2):210– 224, April 2012. [151] Xiaoming Li, Hongfei Yan, and Jimin Wang. Search Engine: Principle, Technology and Systems. Science Press, 2004.
[152] Jing J. Liang, A. Kai Qin, Ponnuthurai Nagaratnam Suganthan, and S. Baskar. Comprehensive learning particle swarm optimizer for global optimization of multimodal functions. IEEE Transactions on Evolutionary Computation, 10(3):281– 295, June 2006. [153] Jing J. Liang, Thomas Philip Runarsson, Efr´en Mezura-Montes, Maurice Clerc, P. N. Suganthan, Carlos A. Coello Coello, and Kalyanmoy Deb. Problem Definitions and Evaluation Criteria for the CEC 2006 Special Session on Constrained Real-Parameter Optimization. Technical report, Nanyang Technological University, September 2006. [154] Jing J. Liang and Ponnuthurai Nagaratnam Suganthan. Dynamic multi-swarm particle swarm optimizer. In Proceedings of 2005 IEEE Swarm Intelligence Symposium (SIS 2005), pages 124–129, June 2005. [155] Jing J. Liang, Ponnuthurai Nagaratnam Suganthan, and Kalyanmoy Deb. Novel composition test functions for numerical global optimization. In Proceedings of 2005 IEEE Swarm Intelligence Symposium (SIS 2005), pages 68–75, June 2005. [156] Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Data-Centric Systems and Applications. Springer-Verlag Berlin Heidelberg, second edition, 2011. [157] Yiqun Liu, Min Zhang, Liyun Ru, and Shaoping Ma. Data cleansing for web information retrieval using query independent features. Journal of The American Society for Information Science and Technology, 58(12):1884–1898, 2007. [158] Sushil J. Louis. Genetic Algorithms as a Computational Tool for Design. PhD thesis, Department of Computer Science, Indiana University, August 1993. [159] Yanping Lu, Shengrui Wang, Shaozi Li, and Changle Zhou. Particle swarm optimizer for variable weighting in clustering high-dimensional data. Machine Learning, 82(1):43–70, 2011. [160] Rammohan Mallipeddi and Ponnuthurai N. Suganthan.
Ensemble of con-
straint handling techniques. IEEE Transactions on Evolutionary Computation, 14(4):561–579, August 2010. [161] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Sch¨ utze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008. [162] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. Big data: The next frontier for 218
innovation, competition, and productivity. Technical report, McKinsey Global Institute, May 2011. [163] James G. March. Exploration and exploitation in organizational learning. Organization Science, 2(1):71–87, February 1991. [164] David Martens, Bart Baesens, and Tom Fawcett. Editorial survey: swarm intelligence for data mining. Machine Learning, 82(1):1–42, 2011. [165] Michael L. Mauldin. Maintaining diversity in genetic search. In Proceedings of the National Conference on Artificial Intelligence (AAAI 1984), pages 247–250, August 1984. [166] Rui Mendes. Population Topologies and Their Influence in Particle Swarm Performance. PhD thesis, Universidade do Minho, April 2004. [167] Rui Mendes, James Kennedy, and Jos´e Neves. Avoiding the pitfalls of local optima: How topologies can save the day. In Proceedings of the 12th Conference Intelligent Systems Application to Power Systems (ISAP2003). IEEE Computer Society, August 2003. [168] Rui Mendes, James Kennedy, and Jos´e Neves. The fully informed particle swarm: Simpler, maybe better. IEEE Transactions on Evolutionary Computation, 8(3):204–210, June 2004. [169] Efr´en Mezura-Montes and Carlos A. Coello Coello. A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Transactions on Evolutionary Computation, 9(1):1–17, February 2005. [170] Zbigniew Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer, third, revised and extended edition, 1996. [171] David S. Moore, George P. McCabe, and Bruce Craig. Introduction to the Practice of Statistics. W. H. Freeman and Company, New York, seventh edition, April 2011. [172] Ronald W. Morrison and Kenneth A. De Jong. A test problem generator for nonstationary environments. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC 1999), volume 3, pages 2047–2053, July 1999. [173] Un Yong Nahm. Text Mining with Information Extraction. PhD thesis, Department of Computer Sciences, The University of Texas at Austin, August 2004. [174] Tomoharu Nakashima and Hisao Ishibuchi. GA-based approaches for finding the minimum reference set for nearest neighbor classification. In Proceedings of The 219
1998 IEEE International Conference on Evolutionary Computation, (CEC 1998), pages 709–714, 1998. [175] Sunil Nakrani and Craig Tovey. On honey bees and dynamic allocation in an internet server colony. In Proceedings of 2nd International Workshop on The Mathematics and Algorithms of Social Insects, Atlanta, Georgia, USA, December 2003. [176] Sunil Nakrani and Craig Tovey. On honey bees and dynamic server allocation in internet hosting centers. Adaptive Behavior, 12(3–4):223–240, 2004. [177] Ahmad Nickabadi, Mohammad Mehdi Ebadzadeh, and Reza Safabakhsh. A novel particle swarm optimization algorithm with adaptive inertia weight. Applied Soft Computing, 11(4):3658–3670, 2011. [178] Tatsuya Okabe, Yaochu Jin, and Bernhard Sendhoff. A critical survey of performance indices for multi-objective optimisation. In Proceedings of the 2003 Congress on Evolutionary Computation (CEC2003), volume 2, pages 878–885, December 2003. [179] Olusegun Olorunda and Andries P. Engelbrecht. Measuring Exploration / Exploitation in Particle Swarms using Swarm Diversity. In Proceedings of the 2008 Congress on Evolutionary Computation (CEC 2008), pages 1128–1134, 2008. [180] Ingo Paenke, J¨ urgen Branke, and Yaochu Jin. Efficient search for robust solutions by means of evolutionary algorithms and fitness approximation. IEEE Transactions on Evolutionary Computation, 10(4):405–420, August 2006. [181] Sankar K. Pal, Varun Talwar, and Pabitra Mitra. Web mining in soft computing framework: Relevance, state of the art and future directions. IEEE Transactions on Neural Networks, 13(5):1163–1177, September 2002. [182] Bijaya Ketan Panigrahi, Yuhui Shi, and Meng-Hiot Lim. Handbook of Swarm Intelligence:Concepts, Principles and Applications, volume 8 of Adaptation, Learning, and Optimization. Springer, 2011. [183] Taemin Kim Park. The nature of relevance in information retrieval: An empirical study. The Library Quarterly, 63(3):318–351, July 1993. [184] Kevin M. Passino. Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Systems Magazine, pages 52–67, June 2002. [185] Kevin M. Passino.
Biomimicry for Optimization, Control, and Automation.
Springer-Verlag, London, UK, 2005.
[186] Kevin M. Passino. Bacterial foraging optimization. International Journal of Swarm Intelligence Research (IJSIR), 1(1):1–16, January-March 2010. [187] Thanmaya Peram, Kalyan Veeramachaneni, and Chilukuri K. Mohan. Fitnessdistance-ratio based particle swarm optiniization. In Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS 2003), pages 174–181, April 2003. [188] Robin C. Purshouse and Peter J. Fleming. On the evolutionary optimization of many conflicting objectives. IEEE Transactions on Evolutionary Computation, 11(6):770–784, December 2007. [189] Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman. Mining of Massive Datasets. Cambridge University Press, July 2012. [190] Asanga Ratnaweera, Saman K. Halgamuge, and Harry C. Watson. Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients. IEEE Transactions on Evolutionary Computation, 8(3):240–255, June 2004. [191] Ingo Rechenberg. Evolutionsstrategie – Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. PhD thesis, Department of Process Engineering, Technical University of Berlin, 1971. [192] Samrat L. Sabat and Layak Ali. The hyperspherical acceleration effect particle swarm optimizer. Applied Soft Computing, 9(3):906–917, June 2009. [193] Samrat L. Sabat, Layak Ali, and Siba K. Udgatab. Integrated learning particle swarm optimizer for global optimization. Applied Soft Computing, 11(1):574–584, January 2011. [194] Ralf Salomon. Re-evaluating genetic algorithm performance under coordinate rotation of benchmark functions. A survey of some theoretical and practical aspects of genetic algorithms. BioSystems, 39(3):263–278, 1996. [195] Dhish Kumar Saxena, Jo ao A. Duro, Ashutosh Tiwari, Kalyanmoy Deb, and Qingfu Zhang.
Objective reduction in many-objective optimization: Linear
and nonlinear algorithms. IEEE Transactions on Evolutionary Computation, 17(1):77–99, February 2013. [196] David W. Scott and James R. Thompson. Probability density estimation in higher dimensions. In James E. Gentle, editor, Computer Science and Statistics: Proceedings of the Fifteenth Symposium on the Interface, pages 173–179, 1983. [197] Hamed Shah-Hosseini. The intelligent water drops algorithm: a nature-inspired swarm-based optimization algorithm. International Journal of Bio-Inspired Computation (IJBIC), 1(1/2):71–79, 2009. 221
[198] Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk, editors.
Nearest-
Neighbor Methods in Learning and Vision: Theory and Practice. Neural Information Processing series. The MIT Press, March 2006. [199] Yun-Wei Shang and Yu-Huang Qiu. A note on the extended rosenbrock function. Evolutionary Computation, 14(1):119–126, 2006. [200] Yang Shi, Hongcheng Liu, Liang Gao, and Guohui Zhang. Cellular particle swarm optimization. Information Sciences, 181(20):4460–4493, October 2011. [201] Yuhui Shi. Brain storm optimization algorithm. In Ying Tan, Yuhui Shi, Yi Chai, and Guoyin Wang, editors, Advances in Swarm Intelligence, volume 6728 of Lecture Notes in Computer Science, pages 303–309. Springer Berlin/Heidelberg, 2011. [202] Yuhui Shi. An optimization algorithm based on brainstorming process. International Journal of Swarm Intelligence Research (IJSIR), 2(4):35–62, OctoberDecember 2011. [203] Yuhui Shi and Russell Eberhart. A modified particle swarm optimizer. In Proceedings of the 1998 Congress on Evolutionary Computation (CEC1998), pages 69–73, 1998. [204] Yuhui Shi and Russell Eberhart. Parameter selection in particle swarm optimization. In Evolutionary Programming VII, volume 1447 of Lecture Notes in Computer Science, pages 591–600. Springer Berlin/Heidelberg, 1998. [205] Yuhui Shi and Russell Eberhart. Empirical study of particle swarm optimization. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC1999), pages 1945–1950, July 1999. [206] Yuhui Shi and Russell Eberhart. Fuzzy adaptive particle swarm optimization. In Proceedings of the 2001 Congress on Evolutionary Computation (CEC2001), pages 101–106, 2001. [207] Yuhui Shi and Russell Eberhart. Population diversity of particle swarms. In Proceedings of the 2008 Congress on Evolutionary Computation (CEC2008), pages 1063–1067, 2008. [208] Yuhui Shi and Russell Eberhart. Monitoring of particle swarm optimization. Frontiers of Computer Science, 3(1):31–37, March 2009. [209] Yuhui Shi, Russell Eberhart, and Yaobin Chen. Implementation of evolutionary fuzzy system. IEEE Transactions on Fuzzy Systems, 7(2):109–119, 1999.
[210] S. N. Sivanandam and S. N. Deepa. Introduction to Genetic Algorithms. Springer Berlin Heidelberg, 2011. [211] Malcolm Slaney and Michael Casey. Locality-sensitive hashing for finding nearest neighbors. IEEE Signal Processing Magazine, 25(2):128–131, March 2008. [212] Banu Soylu and Murat K¨oksalan. A favorable weight-based evolutionary algorithm for multiple criteria problems. IEEE Transactions on Evolutionary Computation, 14(2):191–205, April 2010. [213] William M. Spears, Derek T. Green, and Diana F. Spears. Biases in particle swarm optimization. International Journal of Swarm Intenlligence Research (IJSIR), 1(2):34–57, April–June 2010. [214] N. Srinivas and Kalyanmoy Deb. Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation, 2(3):221–248, Fall 1994. [215] Ponnuthurai Nagaratnam Suganthan, N. Hansen, Jing J. Liang, Kalyanmoy Deb, Y. P. Chen, A. Auger, and S. Tiwari. Problem Definitions and Evaluation Criteria for the CEC 2005 Special Session on Real-Parameter Optimization. Technical report, Nanyang Technological University, May 2005. [216] Rangarajan K. Sundaram. A First Course in Optimization Theory. Cambridge University Press, 1996. [217] Pang Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Addison Wesley, 2005. [218] Ke Tang, Xiaodong Li, Ponnuthurai Nagaratnam Suganthan, Zhenyu Yang, and Thomas Weise. Benchmark Functions for the CEC’2010 Special Session and Competition on Large-Scale Global Optimization. Technical report, University of Science and Technology of China, January 2010. [219] David Todd. Multiple Criteria Genetic Algorithms in Engineering Design and Operation. PhD thesis, Department of Marine Technology, University of Newcastle, October 1997. [220] Chi-Yang Tsai and I-Wei Kao. Particle swarm optimization with selective particle regeneration for data clustering. Expert Systems with Applications, 38(6):6565– 6576, June 2011. [221] Frans van den Bergh. An Analysis of Particle Swarm Optimizers. PhD thesis, Department of Computer Science, University of Pretoria, November 2001.
[222] Frans van den Bergh and Andries Petrus Engelbrecht. A study of particle swarm optimization particle trajectories. Information Sciences, 176:937–971, 2006.
[223] Matej Črepinšek, Shih-Hsi Liu, and Marjan Mernik. Exploration and exploitation in evolutionary algorithms: A survey. ACM Computing Surveys, 45(3):1–33, June 2013.
[224] Michel Verleysen. Learning high-dimensional data. In S. Ablameyko, M. Gori, L. Goras, and V. Piuri, editors, Limitations and Future Trends in Neural Computation, volume 186 of NATO Science Series, III: Computer and Systems Sciences, pages 141–162. IOS Press, 2003.
[225] Bo Wang, Shuming Wang, and Junzo Watada. Fuzzy-portfolio-selection models with value-at-risk. IEEE Transactions on Fuzzy Systems, 19(4):758–769, August 2011.
[226] Fu Lee Wang and Christopher C. Yang. Mining web data for Chinese segmentation. Journal of the American Society for Information Science and Technology, 58(12):1820–1837, 2007.
[227] Shuming Wang, Junzo Watada, and Witold Pedrycz. Value-at-risk-based two-stage fuzzy facility location problems. IEEE Transactions on Industrial Informatics, 5(4):465–482, November 2009.
[228] Roger Weber, Hans-J. Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the 24th International Conference on Very Large DataBases (VLDB 1998), pages 194–205, New York, USA, 1998.
[229] Thomas Weise, Michael Zapf, Raymond Chiong, and Antonio J. Nebro. Why is optimization difficult? In Nature-Inspired Algorithms for Optimisation, volume 193 of Studies in Computational Intelligence, pages 1–50. Springer Berlin/Heidelberg, 2009.
[230] Lyndon While, Lucas Bradstreet, and Luigi Barone. A fast way of calculating exact hypervolumes. IEEE Transactions on Evolutionary Computation, 16(1):86–95, February 2012.
[231] R. Lyndon While, Philip Hingston, Luigi Barone, and Simon Huband. A faster algorithm for calculating hypervolume. IEEE Transactions on Evolutionary Computation, 10(1):29–38, February 2006.
[232] David H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390, October 1996.
[233] David H. Wolpert and William G. Macready. No free lunch theorems for search. Technical Report SFI-TR-95-02-010, The Santa Fe Institute, February 1996.
[234] David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, April 1997.
[235] David H. Wolpert and William G. Macready. Coevolutionary free lunches. IEEE Transactions on Evolutionary Computation, 9(6):721–735, December 2005.
[236] Zimin Wu and Gwyneth Tseng. Chinese text segmentation for text retrieval: Achievements and problems. Journal of the American Society for Information Science, 44(9):532–542, 1993.
[237] Rui Xu, Jie Xu, and Donald C. Wunsch II. A comparison study of validity indices on swarm-intelligence-based clustering. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 42(4):1243–1256, August 2012.
[238] Shenheng Xu and Yahya Rahmat-Samii. Boundary conditions in particle swarm optimization revisited. IEEE Transactions on Antennas and Propagation, 55(3):760–765, March 2007.
[239] Ronald R. Yager. On the theory of bags. International Journal of General Systems, 13(1):23–37, 1986.
[240] Ang Yang, Yin Shan, and Lam Thu Bui. Success in Evolutionary Computation, volume 92 of Studies in Computational Intelligence. Springer, 2008.
[241] Shengxiang Yang. Genetic algorithms with memory- and elitism-based immigrants in dynamic environments. Evolutionary Computation, 16(3):385–416, 2008.
[242] Xin-She Yang. Nature-Inspired Metaheuristic Algorithms. Luniver Press, February 2008.
[243] Xin-She Yang. Firefly algorithms for multimodal optimization. In Osamu Watanabe and Thomas Zeugmann, editors, Stochastic Algorithms: Foundations and Applications, volume 5792 of Lecture Notes in Computer Science, pages 169–178. Springer Berlin/Heidelberg, 2009.
[244] Xin-She Yang. Firefly algorithm, Lévy flights and global optimization. In Max Bramer, Richard Ellis, and Miltos Petridis, editors, Research and Development in Intelligent Systems XXVI, pages 209–218. Springer London, 2010.
[245] Xin-She Yang. Chaos-enhanced firefly algorithm with automatic parameter tuning. International Journal of Swarm Intelligence Research (IJSIR), 2(4):1–11, October–December 2011.
[246] Xin-She Yang and Suash Deb. Cuckoo search via Lévy flights. In World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), pages 210–214. IEEE Publications, 2009.
[247] Yiming Yang and Xin Liu. A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 42–49, 1999.
[248] Zhenyu Yang, Ke Tang, and Xin Yao. Differential evolution for high-dimensional function optimization. In Proceedings of the 2007 IEEE Congress on Evolutionary Computation (CEC 2007), pages 3523–3530. IEEE, 2007.
[249] Xin Yao, Yong Liu, and Guangming Lin. Evolutionary programming made faster. IEEE Transactions on Evolutionary Computation, 3(2):82–102, July 1999.
[250] Tina Yu, Lawrence Davis, Cem Baydar, and Rajkumar Roy. Evolutionary Computation in Practice, volume 88 of Studies in Computational Intelligence. Springer, 2008.
[251] Zhi-Hui Zhan, Jun Zhang, Yun Li, and Henry Shu-Hung Chung. Adaptive particle swarm optimization. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 39(6):1362–1381, December 2009.
[252] Zhi-Hui Zhan, Jun Zhang, Yun Li, and Yuhui Shi. Orthogonal learning particle swarm optimization. IEEE Transactions on Evolutionary Computation, 15(6):832–847, December 2011.
[253] Zhi-Hui Zhan, Jun Zhang, and Yuhui Shi. Experimental study on PSO diversity. In Third International Workshop on Advanced Computational Intelligence, pages 310–317, August 2010.
[254] Qingfu Zhang, Wudong Liu, Edward Tsang, and Botond Virginas. Expensive multiobjective optimization by MOEA/D with Gaussian process model. IEEE Transactions on Evolutionary Computation, 14(3):456–474, June 2010.
[255] Qingfu Zhang and Heinz Mühlenbein. On the convergence of a class of estimation of distribution algorithms. IEEE Transactions on Evolutionary Computation, 8(2):127–136, April 2004.
[256] Qingfu Zhang, Aimin Zhou, Shizheng Zhao, Ponnuthurai Nagaratnam Suganthan, Wudong Liu, and Santosh Tiwari. Multiobjective Optimization Test Instances for the CEC 2009 Special Session and Competition. Technical Report CES-487, University of Essex, April 2009.
[257] Wenjun Zhang, Xiao-Feng Xie, and De-Chun Bi. Handling boundary constraints for numerical optimization by particle swarm flying in periodic search space. In Proceedings of the 2004 Congress on Evolutionary Computation, pages 2307–2311, 2004.
[258] Yan Zhang, Mingyan Jiang, and Dongfeng Yuan. Chinese text mining based on distributed SMO. In IEEE 3rd International Conference on Communication Software and Networks (ICCSN), pages 175–177, May 2011.
[259] Dongbin Zhao, Yujie Dai, and Zhen Zhang. Computational intelligence in urban traffic signal control: A survey. IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, 42(4):485–494, July 2012.
[260] Aimin Zhou, Qingfu Zhang, and Yaochu Jin. Approximating the set of Pareto-optimal solutions in both the decision and objective spaces by an estimation of distribution algorithm. IEEE Transactions on Evolutionary Computation, 13(5):1167–1189, October 2009.
[261] Zheng Zhou and Yuhui Shi. Inertia weight adaption in particle swarm optimization algorithm. In Ying Tan, Yuhui Shi, Yi Chai, and Guoyin Wang, editors, Advances in Swarm Intelligence, volume 6728 of Lecture Notes in Computer Science, pages 71–79. Springer Berlin/Heidelberg, 2011.
[262] Xiaojin Zhu. Semi-supervised learning literature survey. Technical Report 1530, University of Wisconsin-Madison, July 2008.
[263] Tao Zhuang, Qiqiang Li, Qingqiang Guo, and Xingshan Wang. A two-stage particle swarm optimizer. In IEEE Congress on Evolutionary Computation (CEC 2008), pages 557–563, June 2008.
[264] Karin Zielinski, Petra Weitkemper, Rainer Laur, and Karl-Dirk Kammeyer. Optimization of power allocation for interference cancellation with particle swarm optimization. IEEE Transactions on Evolutionary Computation, 13(1):128–150, February 2009.
[265] Eckart Zitzler. Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. PhD thesis, Swiss Federal Institute of Technology Zurich (ETH), November 1999.
[266] Eckart Zitzler, Kalyanmoy Deb, and Lothar Thiele. Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation, 8(2):173–195, 2000.
[267] Eckart Zitzler, Marco Laumanns, and Lothar Thiele. SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization. In K. Giannakoglou, D. Tsahalis, J. Périaux, K. Papailiou, and T. Fogarty, editors, Evolutionary Methods for Design, Optimisation and Control with Application to Industrial Problems (EUROGEN 2001), pages 95–100. International Center for Numerical Methods in Engineering (CIMNE), 2002.
[268] Eckart Zitzler, Lothar Thiele, Marco Laumanns, Carlos M. Fonseca, and Viviane Grunert da Fonseca. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation, 7(2):117–132, April 2003.
Index

k Nearest Neighbor (KNN), 164
k Weighted Nearest Neighbor (KWNN), 171
Cognitive diversity, 27
Cosine Similarity, 166
Dice Similarity, 166
Euclidean Distance, 165
Evolutionary multiobjective optimization (EMO), 101
Jaccard Similarity, 166
Manhattan Distance, 165
Minkowski Distance, 165
Nearest Neighbor, 169
No free lunch theory, 99
Overlap Similarity, 167
Particle swarm optimization (PSO), 12
Position diversity, 25
Swarm intelligence (SI), 11
Velocity diversity, 26