2012 Brazilian Symposium on Neural Networks
Effect of the PSO Topologies on the Performance of the PSO-ELM

Elliackin M. N. Figueiredo and Teresa B. Ludermir
Center of Informatics, Federal University of Pernambuco
Recife, Brazil
Email: {emnf,tbl}@cin.ufpe.br
Abstract—In recent years, the Extreme Learning Machine (ELM) has been hybridized with Particle Swarm Optimization (PSO). This hybridization is named PSO-ELM. In most of these hybridizations, the PSO uses the global topology. However, other topologies were designed to improve the performance of the PSO. The performance of the PSO depends on its topology, and there is no single best topology for all problems. Thus, in this paper, we investigate the effect of the PSO topology on the performance of the PSO-ELM. In our study, we investigate five topologies: Global, Local, Von Neumann, Wheel and Four Clusters. The results showed that the Global topology can be more promising than all other topologies.

Keywords: Extreme Learning Machine; Particle Swarm Optimization; topology
I. INTRODUCTION

Extreme Learning Machine (ELM) is an efficient learning algorithm for single-hidden layer feedforward neural networks (SLFNs) [1], created to overcome the drawbacks of gradient-based methods. In recent years, Particle Swarm Optimization (PSO) has been combined with ELM in order to improve the generalization capacity of SLFNs. In 2006, Xu and Shu presented an evolutionary ELM based on PSO, named PSO-ELM, and applied this algorithm to a prediction task [2]. In that work, the PSO was used to optimize the input weights and hidden biases of the SLFN. The objective function used to evaluate each particle is the root mean squared error (RMSE) on the validation set. In 2011, Han et al. [3] proposed an improved Extreme Learning Machine named IPSO-ELM, which uses an improved PSO with adaptive inertia to select the input weights and hidden biases of the SLFN. However, unlike the PSO-ELM, the IPSO-ELM optimizes the input weights and hidden biases according to both the RMSE on the validation set and the norm of the output weights. Thus, this algorithm can obtain good performance with more compact and better-conditioned SLFNs than other ELM approaches. In the same year, Saraswathi et al. [4] proposed a combination of the Integer Coded Genetic Algorithm (ICGA), PSO and ELM for cancer classification. In that work, the ICGA is used for feature selection, while the PSO-ELM classifier is used to find the optimal input weights such that the ELM classifier can discriminate the cancer classes efficiently. This PSO-ELM was proposed in order to handle sparse and highly imbalanced data sets.

In all these works, the social topology used in the PSO is the global one. In this topology, all particles are fully connected, that is, each particle can communicate with every other particle. In this case, each particle is attracted towards the best solution found by the entire swarm. Besides the global topology, other topologies were designed to improve the performance of the PSO. As observed in [5], the performance of the PSO depends strongly on its topology, and there is no outright best topology for all problems. Thus, it is important to investigate the effect of the PSO topologies on the performance of the PSO-ELM in the training of SLFNs. Studies on the effect of the PSO topology on the performance of the PSO in the training of neural networks have been carried out in the past [6]. More recently, in 2010, van Wyk and Engelbrecht investigated overfitting in neural networks trained with the PSO based on three topologies: Global, Local and Von Neumann [7]. Thus, the objective of this paper is to carry out a similar study in the context of the PSO-ELM. That is, our objective is to investigate the effect of different PSO topologies on the performance of the PSO-ELM classifier. In this paper, we studied five topologies: Global, Local, Von Neumann, Wheel and Four Clusters. The experimental results showed that the Global topology can be more promising than all other topologies.

The rest of this paper is organized as follows. Section II presents the ELM algorithm. Section III presents the PSO algorithm and the topologies used in this work. The experimental arrangement is detailed in Section IV. Section V discusses the experimental results. The paper is concluded in Section VI along with the future work.
II. EXTREME LEARNING MACHINE

Extreme Learning Machine (ELM) is a simple learning algorithm for single-hidden layer feedforward neural networks (SLFNs) proposed by Huang et al. [1]. This learning algorithm not only learns much faster, with higher generalization performance, than traditional gradient-based learning algorithms, but also avoids many difficulties faced by gradient-based learning methods, such as stopping criteria, learning rate, learning epochs, and local minima [8].

ELM works as follows. For $N$ distinct training samples $(\mathbf{x}_i, \mathbf{t}_i) \in \mathbb{R}^n \times \mathbb{R}^m$, where $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T$ and $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T$, the SLFNs with $K$ hidden nodes and an activation function $g(x)$ are mathematically modeled as

$\mathbf{H} \boldsymbol{\beta} = \mathbf{T}$  (1)

where $\mathbf{H} = \{h_{ij}\}$ ($i = 1, \ldots, N$ and $j = 1, \ldots, K$) is the hidden layer output matrix and $h_{ij} = g(\mathbf{w}_j^T \cdot \mathbf{x}_i + b_j)$ denotes the output of the $j$th hidden node with respect to $\mathbf{x}_i$; $\mathbf{w}_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$ denotes the weight vector connecting the $j$th hidden node and the input nodes, and $b_j$ is the bias (threshold) of the $j$th hidden node; $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \ldots, \boldsymbol{\beta}_K]^T$ is the matrix of output weights, where $\boldsymbol{\beta}_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$ ($j = 1, \ldots, K$) is the weight vector connecting the $j$th hidden node and the output nodes; and $\mathbf{T} = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_N]^T$ denotes the matrix of targets. Thus, the ELM first generates the input weights and hidden biases randomly. Next, the determination of the output weights, linking the hidden layer to the output layer, consists simply in finding the least-squares solution to the linear system given by Eq. 1. The minimum norm least-squares (LS) solution to the linear system given by Eq. 1 is

$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger} \mathbf{T}$  (2)

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose (MP) generalized inverse of the matrix $\mathbf{H}$.
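For illustration, a minimal numpy sketch of this training procedure follows. It is our own illustration rather than the authors' implementation; the function names are ours, and the sigmoid activation is the one reported later in Section IV.

```python
import numpy as np

def train_elm(X, T, n_hidden, rng=np.random.default_rng(0)):
    """Train an SLFN with ELM: random input weights/biases, analytic output weights.

    X: (N, n) input patterns, T: (N, m) targets, n_hidden: number of hidden nodes K.
    """
    n_inputs = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_inputs))   # input weights w_j
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases b_j
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))                 # hidden layer output matrix (sigmoid g)
    beta = np.linalg.pinv(H) @ T                             # Eq. 2: minimum norm LS solution via MP inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network outputs for new patterns X with a trained ELM."""
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta
```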
III. PARTICLE SWARM OPTIMIZATION

This section presents the basic algorithm of Particle Swarm Optimization and the information-sharing topologies.

A. Basic Algorithm

Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique developed by Kennedy and Eberhart in 1995 that attempts to model the flight pattern of a flock of birds [9]. In PSO, each particle in the swarm represents a candidate solution for the objective function $f(\mathbf{x})$. The current position of the $i$th particle at iteration $t$ is denoted by $\mathbf{x}_i(t)$. In each iteration of the PSO, each particle moves through the search space with a velocity $\mathbf{v}_i(t)$ calculated as follows:

$\mathbf{v}_i(t+1) = w \mathbf{v}_i(t) + c_1 \mathbf{r}_1 [\mathbf{y}_i(t) - \mathbf{x}_i(t)] + c_2 \mathbf{r}_2 [\hat{\mathbf{y}}(t) - \mathbf{x}_i(t)]$  (3)

where $w$ is the inertia weight, $\mathbf{y}_i(t)$ is the personal best position of particle $i$ at iteration $t$, and $\hat{\mathbf{y}}(t)$ is the global best position of the swarm at iteration $t$. The personal best position is called pbest, and it represents the best position found by the particle during the search process up to iteration $t$. The global best position is called gbest, and it is the best position found by the entire swarm up to iteration $t$. The quality (fitness) of a solution is measured by the objective function $f(\mathbf{x}_i(t))$, evaluated at the current position of particle $i$. The terms $c_1$ and $c_2$ are acceleration coefficients. The terms $\mathbf{r}_1$ and $\mathbf{r}_2$ are vectors whose components $r_{1j}$ and $r_{2j}$ are random numbers sampled from a uniform distribution $U(0, 1)$. The velocity is limited to the range $[\mathbf{v}_{min}, \mathbf{v}_{max}]$. After updating the velocity, the new position of particle $i$ at iteration $t+1$ is calculated using Eq. 4:

$\mathbf{x}_i(t+1) = \mathbf{x}_i(t) + \mathbf{v}_i(t+1)$  (4)

Algorithm 1 summarizes the PSO.

Algorithm 1 PSO Algorithm
 1: Initialize the swarm S;
 2: while stopping condition is false do
 3:   for all particles i of the swarm do
 4:     Calculate the fitness f(x_i(t));
 5:     Set the personal best position y_i(t);
 6:   end for
 7:   Set the global best position ŷ(t) of the swarm;
 8:   for all particles i of the swarm do
 9:     Update the velocity v_i(t);
10:     Update the position x_i(t);
11:   end for
12: end while
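A compact numpy sketch of the updates in Eqs. 3 and 4 is given below. It assumes the gbest variant described above; all names are illustrative, not from the original work.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, v_min=-1.0, v_max=1.0):
    """One PSO iteration: Eq. 3 (velocity update) followed by Eq. 4 (position update).

    x, v, pbest: arrays of shape (n_particles, n_dims); gbest: array of shape (n_dims,).
    """
    r1 = rng.uniform(0.0, 1.0, size=x.shape)    # components r_1j ~ U(0, 1)
    r2 = rng.uniform(0.0, 1.0, size=x.shape)    # components r_2j ~ U(0, 1)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v = np.clip(v, v_min, v_max)                # limit velocity to [v_min, v_max]
    return x + v, v                             # new position (Eq. 4) and clipped velocity
```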
B. Topologies

The PSO described above is known as the gbest PSO. In this PSO, each particle is connected to every other particle in the swarm, in a type of topology referred to as the Global topology. Different topologies have been developed for the PSO and empirically studied, each affecting the performance of the PSO in a potentially drastic way [10]. In this paper, five topologies well known in the literature were selected for comparison:

∙ Global: In this topology, the swarm is fully connected and each particle can therefore communicate with every other particle, as illustrated in Fig. 1a. Thus, each particle is attracted towards the best solution found by the entire swarm. This topology provides fast information flow between particles and faster convergence than the other topologies, but it is more susceptible to being trapped in local minima.
∙ Local: In this topology, each particle communicates with its $n_N$ immediate neighbors. Thus, instead of a global best particle, each particle selects the best particle in its neighborhood. When $n_N = 2$, each particle in the swarm communicates with its immediately adjacent neighbors, as illustrated in Fig. 1b. In this paper, $n_N = 2$ was used.
∙ Von Neumann: In this topology, each particle is connected to its left, right, top and bottom neighbors on a two-dimensional lattice. This topology has been shown in a number of empirical studies to outperform other topologies on a large number of problems [10]. Figure 1c illustrates this topology.
∙ Wheel: In this topology, the particles in a neighborhood are isolated from one another. One particle serves as the focal point, and all information flows through the focal particle, as illustrated in Fig. 1d. In this work, the focal particle was chosen randomly.
∙ Four Clusters: In this topology, four clusters of particles are formed, with two connections between clusters, and each cluster is a fully connected subswarm, as illustrated in Fig. 1e.

It is worth mentioning that the neighborhood of the particles in all topologies is based on their indexes; a sketch of how such index-based neighborhoods can be built is given after Fig. 1. For a more detailed description of these topologies, the reader may consult [5].

Figure 1. Topologies of the PSO: (a) Global, (b) Local, (c) Von Neumann, (d) Wheel, (e) Four Clusters.
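The sketch below builds index-based neighbor lists for three of the topologies (Global, ring-based Local with n_N = 2, and Von Neumann). It is a plausible reconstruction under our own assumptions; in particular, the 4 x 5 lattice shape chosen for the Von Neumann grid of 20 particles is our choice, not stated in the paper.

```python
import numpy as np

def global_neighbors(n_particles):
    """Global topology: every particle sees every other particle (fully connected swarm)."""
    return [list(range(n_particles)) for _ in range(n_particles)]

def ring_neighbors(n_particles, n_n=2):
    """Local (ring) topology: each particle sees its n_n immediately adjacent indexes."""
    half = n_n // 2
    return [[(i + d) % n_particles for d in range(-half, half + 1)]
            for i in range(n_particles)]

def von_neumann_neighbors(rows, cols):
    """Von Neumann topology: left/right/top/bottom neighbors on a 2-D lattice (indexes wrap around)."""
    neigh = []
    for i in range(rows * cols):
        r, c = divmod(i, cols)
        neigh.append([i,
                      ((r - 1) % rows) * cols + c,    # top
                      ((r + 1) % rows) * cols + c,    # bottom
                      r * cols + (c - 1) % cols,      # left
                      r * cols + (c + 1) % cols])     # right
    return neigh

# Example with 20 particles, as used in the experiments (lattice shape assumed).
print(ring_neighbors(20)[0])           # [19, 0, 1]
print(von_neumann_neighbors(4, 5)[0])  # [0, 15, 5, 4, 1]
```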
IV. EXPERIMENTAL ARRANGEMENT

In this section, we give the details of the experimental arrangement.

A. PSO-ELM Description

In this work, we used the PSO to select the input weights and hidden biases of the SLFN, while the MP generalized inverse is used to analytically calculate the output weights. In the PSO used, the inertia varies linearly with time, as in [3]. The fitness of each particle is the root mean squared error (RMSE) on the validation set. We used the RMSE because it is a continuous metric; thus, the search process can be conducted in a smooth way.
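To make the fitness evaluation concrete, the sketch below decodes a particle into input weights and hidden biases, computes the output weights analytically, and returns the RMSE on the validation set. The particle layout (input weights followed by biases) and all names are our assumptions, not code from the paper.

```python
import numpy as np

def particle_fitness(particle, X_train, T_train, X_val, T_val, n_hidden):
    """Fitness of one PSO-ELM particle: RMSE on the validation set.

    The particle encodes the input weights and hidden biases of the SLFN
    (layout assumed here); the output weights come from the MP inverse.
    """
    n_inputs = X_train.shape[1]
    W = particle[:n_hidden * n_inputs].reshape(n_hidden, n_inputs)   # input weights
    b = particle[n_hidden * n_inputs:]                               # hidden biases
    sigmoid = lambda A: 1.0 / (1.0 + np.exp(-A))
    H_train = sigmoid(X_train @ W.T + b)
    beta = np.linalg.pinv(H_train) @ T_train        # output weights (Eq. 2)
    O_val = sigmoid(X_val @ W.T + b) @ beta         # network outputs on the validation set
    return np.sqrt(np.mean((T_val - O_val) ** 2))   # RMSE_V (Eq. 6)
```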
B. Performance Metrics

In this paper, two metrics are used to evaluate the performance of the PSO-ELM: the root mean squared error over the validation set, referred to as $RMSE_V$, and the test accuracy, denoted by $TA$. The test accuracy is the percentage of correct classifications produced by the trained SLFNs on the testing set. Thus, $TA$ is calculated using Eq. 5:

$TA = 100 \times \frac{n_c}{n_T}$  (5)

where $n_T$ is the size of the testing set and $n_c$ is the number of correct classifications. The RMSE over the validation set of size $P$ is calculated using Eq. 6:

$RMSE_V = \sqrt{\frac{\sum_{p=1}^{P} \sum_{k=1}^{K} (t_{k,p} - o_{k,p})^2}{P \times K}}$  (6)

where $K$ is the number of output units in the SLFN, $t_{k,p}$ is the target of pattern $p$ at output $k$, and $o_{k,p}$ is the output produced by the network for pattern $p$ at output $k$.

Another metric used in this work is the global swarm diversity $D_S$ [11], which is calculated as follows:

$D_S = \frac{1}{|S|} \sum_{i=1}^{|S|} \sqrt{\sum_{k=1}^{I} (x_{ik} - \bar{x}_k)^2}$  (7)

where $|S|$ is the swarm size, $I$ is the dimensionality of the search space, and $\mathbf{x}_i$ and $\bar{\mathbf{x}}$ are the position of particle $i$ and the swarm center, respectively. This metric is used to measure the convergence of the PSO: a small value indicates swarm convergence around the swarm center, while a large value indicates a higher dispersion of the particles away from the center.
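A direct transcription of Eqs. 5, 6 and 7 into numpy might look as follows; the helper names are ours, and the test accuracy helper assumes one output unit per class.

```python
import numpy as np

def test_accuracy(T_test, O_test):
    """TA (Eq. 5): percentage of correct classifications on the testing set."""
    correct = np.argmax(O_test, axis=1) == np.argmax(T_test, axis=1)
    return 100.0 * np.mean(correct)

def rmse_validation(T_val, O_val):
    """RMSE_V (Eq. 6) over P validation patterns and K output units."""
    return np.sqrt(np.sum((T_val - O_val) ** 2) / T_val.size)

def swarm_diversity(positions):
    """Global swarm diversity D_S (Eq. 7): mean distance of the particles to the swarm center."""
    center = positions.mean(axis=0)
    return np.mean(np.sqrt(np.sum((positions - center) ** 2, axis=1)))
```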
C. Problem Sets

In our experiments, 5 benchmark classification problems obtained from the UCI Machine Learning Repository [12] were used. These problems present different degrees of difficulty and different numbers of classes. Missing values were replaced by the most frequent value of each attribute. The problems and the chosen hidden layer sizes are given in Tab. I; the hidden layer sizes are the same as those used in [13]. In our experiments, all attributes were normalized into the range [0,1], while the outputs (targets) were normalized into [-1,1]. The ELM activation function used was the sigmoid function. Each dataset was divided into training, validation and testing sets, as specified in Tab. II. For all topologies, 30 independent executions were done for each dataset. The training, validation and testing sets were randomly generated at each trial, and all topologies were executed on the same generated sets.
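A sketch of this data preparation (normalization, target encoding and random splitting) is shown below. The helper name and the class-wise target encoding with -1/1 values are our reading of the setup, not code from the paper.

```python
import numpy as np

def prepare_dataset(X, y, n_train, n_val, rng=np.random.default_rng(0)):
    """Normalize attributes to [0, 1], encode targets in {-1, 1}, and split the data.

    n_train / n_val follow Table II (e.g., Cancer: 350 training, 175 validation, 174 testing).
    """
    span = X.max(axis=0) - X.min(axis=0)
    X = (X - X.min(axis=0)) / np.where(span == 0.0, 1.0, span)   # attributes into [0, 1]
    classes = np.unique(y)
    T = np.where(y[:, None] == classes[None, :], 1.0, -1.0)      # one output per class, targets in {-1, 1}
    idx = rng.permutation(len(X))                                # new random split at each trial
    tr, va, te = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
    return (X[tr], T[tr]), (X[va], T[va]), (X[te], T[te])
```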
Table I
DATA SETS USED IN THE EXPERIMENTS ALONG WITH THE HIDDEN LAYER SIZES OF THE SLFNS.

Problem                  Attributes   Classes   Hidden Nodes
Cancer                        9          2            5
Pima Indians Diabetes         8          2           10
Glass Identification          9          7           10
Heart                        13          2            5
Iris                          4          3           10

Table II
PARTITION OF THE DATA SETS USED IN THE EXPERIMENTS.

Problem                  Training   Validation   Testing   Total
Cancer                      350         175         174      699
Pima Indians Diabetes       252         258         258      768
Glass Identification        114          50          50      214
Heart                       130          70          70      270
Iris                         70          40          40      150

Figure 2. RMSE_V over time on the Cancer dataset for the five topologies (Global, Local, Von Neumann, Wheel, Four Clusters). The graph shows the mean over 30 samples.
D. PSO Configuration

The following PSO configuration was used for the experiments: the control parameters $c_1$ and $c_2$ were both set to 2.0. We used an adaptive inertia $w$, where the initial inertia is 0.9 and the final inertia is 0.4; the same scheme was used in the IPSO-ELM. The swarm size used was 20. We used few particles because we wanted to observe the effect of the topology on information sharing; if we used a very large number of particles, as in [2] and [3], the effect of the topology on the communication of the particles could be hidden. It is worth noting that we did not try to find the best number of particles for each topology; thus, the same number of particles was used for all of them. The PSO was executed for 1000 iterations. We used many iterations because we wanted to provide enough time for the exchange of information between the particles for all topologies. All components of the particles are randomly initialized within the range [-1,1]. In this work, we used the reflexive approach, as was done in [2]: particles that leave the bounds in a given dimension invert their direction in that dimension and re-enter the search space keeping the same speed. We used $v_{max} = 1$ and $v_{min} = -1$ for all dimensions of the search space.
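The sketch below captures this configuration (linearly decreasing inertia and reflexive boundary handling). The exact form of the reflexive rule is our interpretation of the description in [2]; the constant names are ours.

```python
import numpy as np

N_ITER, SWARM_SIZE = 1000, 20
C1 = C2 = 2.0
W_START, W_END = 0.9, 0.4
X_MIN, X_MAX = -1.0, 1.0
V_MIN, V_MAX = -1.0, 1.0

def inertia(t):
    """Inertia decreasing linearly from 0.9 to 0.4 over the 1000 iterations."""
    return W_START - (W_START - W_END) * t / (N_ITER - 1)

def reflect(x, v):
    """Reflexive boundary handling (our reading of [2]): out-of-bounds components
    are mirrored back into the search space and their velocity direction is inverted."""
    out_low, out_high = x < X_MIN, x > X_MAX
    x = np.where(out_low, 2 * X_MIN - x, x)
    x = np.where(out_high, 2 * X_MAX - x, x)
    v = np.where(out_low | out_high, -v, v)
    return x, np.clip(v, V_MIN, V_MAX)
```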
V. RESULTS AND DISCUSSION

In this section, we present the results obtained. Figure 2 shows the evolution of the $RMSE_V$ along the iterations of the five PSO topologies on the Cancer dataset. The graph corresponds to the mean over 30 samples. As can be seen from the graph, all topologies showed the same behavior until approximately the 400th iteration. After this iteration, the decrease in the $RMSE_V$ occurred differently for each topology. The Wheel topology, for example, presented an empirically slower decrease of the $RMSE_V$ than all other topologies. This result is consistent with the literature, which highlights the slow sharing of information in this topology. The same behavior was also observed, in a similar way, for all data sets evaluated. For the other topologies, the behavior varied across the data sets. However, the following was observed: the topologies with the lowest $RMSE_V$ in the last iteration were the Global and the Von Neumann topologies for all datasets evaluated. For example, for Cancer and Heart, the Global topology presented a better $RMSE_V$ than all other topologies, while for Glass, Iris and Diabetes, the Von Neumann topology showed a better $RMSE_V$ than all other topologies.

Tables III, IV, V, VI and VII show the results for the data sets Cancer, Diabetes, Glass, Heart and Iris, respectively. The lowest $RMSE_V$ and the lowest diversity $D_S$ among the topologies are marked with an asterisk. We performed a one-way analysis of variance (ANOVA) to assess whether the performances of the topologies are statistically different or not¹. We used this technique because the $RMSE_V$ samples came from a normal distribution for all topologies. Figure 3 shows the confidence intervals, with 95% confidence, of the topologies for the $RMSE_V$ on the Cancer dataset². As can be seen, the topologies Global, Local, Von Neumann and Four Clusters have statistically equivalent performance, while the Wheel topology has a significantly worse performance than these topologies. The Diabetes and Glass datasets showed the same result, and their graphs are not shown here due to space limitations. For the Iris dataset, all topologies present the same performance in statistical terms. Figure 4 shows the confidence intervals for the Heart dataset. As can be seen, the topologies Global, Von Neumann and Four Clusters have statistically equivalent performance, while the Wheel topology has a significantly different performance from all other topologies. Thus, in general, the topologies Global, Von Neumann and Four Clusters have statistically the same performance. An explanation for this is that these topologies are more connected than the other topologies and thus they have similar behavior.

In relation to the test accuracy, all topologies showed statistically equivalent performance according to the Kruskal-Wallis test with 5% significance³. In terms of diversity, the Global topology empirically achieved a lower result than all other topologies over all datasets. It is important to note that all topologies had low diversity values with small standard deviations over all datasets, indicating convergence of the PSO for all topologies.
¹ We used Levene's test with a significance level of 5% to test whether the samples of the topologies had equal variances.
² To generate these confidence intervals, we performed a multiple comparison test using the Matlab function multcompare.
³ We used the Kruskal-Wallis test because the test accuracies did not follow a normal distribution on all data sets.
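The paper performed these tests in Matlab; an equivalent check could be written with scipy as sketched below, where the input dictionaries of per-topology samples are hypothetical.

```python
from scipy import stats

def compare_topologies(rmse_by_topology, ta_by_topology):
    """Statistical comparison analogous to the one in the paper.

    rmse_by_topology / ta_by_topology: dicts mapping topology name -> list of 30 values.
    """
    rmse_samples = list(rmse_by_topology.values())
    # Levene's test for equal variances of the RMSE_V samples (footnote 1).
    print("Levene p =", stats.levene(*rmse_samples).pvalue)
    # One-way ANOVA on RMSE_V (samples assumed normally distributed).
    print("ANOVA  p =", stats.f_oneway(*rmse_samples).pvalue)
    # Kruskal-Wallis on the test accuracies (not normally distributed, footnote 3).
    print("K-W    p =", stats.kruskal(*ta_by_topology.values()).pvalue)
```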
Table III
RESULTS FOR CANCER. MEAN AND STANDARD DEVIATION (IN PARENTHESIS) OVER 30 SAMPLES ARE REPORTED.

Topology        RMSE_V             TA (%)          D_S
Global          0.2907 (0.0440)*   96.34 (1.33)    0.8399 (0.1833)*
Local           0.3018 (0.0384)    96.36 (1.44)    3.0889 (0.1709)
Von Neumann     0.2927 (0.0417)    96.32 (1.40)    1.8905 (0.4683)
Wheel           0.3225 (0.0345)    96.46 (1.52)    2.4464 (0.4609)
Four Clusters   0.2929 (0.0461)    96.34 (1.41)    1.6197 (0.4516)

Figure 3. Confidence intervals with 95% confidence for the RMSE_V of the topologies on the Cancer dataset.
Table IV
RESULTS FOR DIABETES. MEAN AND STANDARD DEVIATION (IN PARENTHESIS) OVER 30 SAMPLES ARE REPORTED.

Topology        RMSE_V             TA (%)          D_S
Global          0.7587 (0.0263)    76.87 (2.40)    2.1321 (0.5129)*
Local           0.7605 (0.0235)    76.25 (2.25)    4.0701 (0.1816)
Von Neumann     0.7574 (0.0230)*   77.13 (2.39)    3.6352 (0.3877)
Wheel           0.7716 (0.0227)    76.69 (2.11)    2.8073 (0.5493)
Four Clusters   0.7580 (0.0240)    76.67 (2.18)    3.2000 (0.4128)

Table V
RESULTS FOR GLASS. MEAN AND STANDARD DEVIATION (IN PARENTHESIS) OVER 30 SAMPLES ARE REPORTED.

Topology        RMSE_V             TA (%)          D_S
Global          0.5478 (0.0311)    62.93 (6.80)    2.2693 (0.4405)*
Local           0.5500 (0.0272)    63.07 (6.12)    4.3205 (0.1675)
Von Neumann     0.5442 (0.0330)*   63.40 (7.22)    3.8036 (0.3916)
Wheel           0.5675 (0.0262)    62.00 (7.11)    3.2929 (0.4367)
Four Clusters   0.5494 (0.0330)    62.47 (7.06)    3.4969 (0.5615)

Table VI
RESULTS FOR HEART. MEAN AND STANDARD DEVIATION (IN PARENTHESIS) OVER 30 SAMPLES ARE REPORTED.

Topology        RMSE_V             TA (%)          D_S
Global          0.5773 (0.0491)*   81.05 (4.73)    0.9695 (0.2889)*
Local           0.6024 (0.0457)    82.10 (4.51)    3.6299 (0.1294)
Von Neumann     0.5935 (0.0490)    81.48 (4.74)    2.3723 (0.5204)
Wheel           0.6441 (0.0454)    82.71 (4.97)    2.5188 (0.4675)
Four Clusters   0.5874 (0.0451)    81.14 (4.43)    2.2232 (0.4917)

Table VII
RESULTS FOR IRIS. MEAN AND STANDARD DEVIATION (IN PARENTHESIS) OVER 30 SAMPLES ARE REPORTED.

Topology        RMSE_V             TA (%)          D_S
Global          0.2937 (0.0457)    95.33 (2.69)    1.7396 (0.2877)*
Local           0.2940 (0.0449)    95.67 (3.14)    3.1875 (0.1376)
Von Neumann     0.2911 (0.0485)*   96.08 (2.84)    2.7731 (0.2384)
Wheel           0.3074 (0.0443)    95.75 (2.87)    2.3951 (0.5140)
Four Clusters   0.2952 (0.0405)    96.00 (2.98)    2.5861 (0.3020)

Figure 4. Confidence intervals with 95% confidence for the RMSE_V of the topologies on the Heart dataset.
VI. CONCLUSION

In this paper, we investigated the effect of five topologies on the performance of the PSO-ELM. The results showed that, in all data sets evaluated, the Global and Von Neumann topologies empirically achieved a lower RMSE_V than all other topologies. However, the topologies Global, Von Neumann and Four Clusters were statistically equivalent on all datasets. Analyzing the convergence, we see that the diversity of the Global topology in the last iteration is lower than that of all other topologies on all datasets. This means that the Global topology has a good capacity for exploitation, which can make it promising for finding good solutions for the PSO-ELM. Interestingly, all topologies showed the same test accuracy in statistical terms. A hypothesis that can explain this fact is that all topologies get stuck in the same local minimum. This point needs to be studied further and is left as future work. Another future work is to use dynamic topologies, such as the Dynamic Clan topology [14], which provide a good trade-off between exploration and exploitation.
REFERENCES

[1] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 2, 2004, pp. 985-990.
[2] Y. Xu and Y. Shu, "Evolutionary extreme learning machine based on particle swarm optimization," Lecture Notes in Computer Science, vol. 3971, 2006.
[3] F. Han, H. Yao, and Q. Ling, "An improved extreme learning machine based on particle swarm optimization," Lecture Notes in Computer Science, vol. 6840, 2011.
[4] S. Saraswathi, S. Sundaram, N. Sundararajan, M. Zimmermann, and M. Nilsen-Hamilton, "ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 452-463, 2011.
[5] A. P. Engelbrecht, Computational Intelligence: An Introduction. John Wiley & Sons Ltd, 2007.
[6] R. Mendes, P. Cortez, M. Rocha, and J. Neves, "Particle swarms for feedforward neural network training," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), vol. 2, 2002, pp. 1895-1899.
[7] A. van Wyk and A. Engelbrecht, "Overfitting by PSO trained feedforward neural networks," in IEEE Congress on Evolutionary Computation (CEC), 2010, pp. 1-8.
[8] J. Cao, Z. Lin, and G.-B. Huang, "Composite function wavelet neural networks with extreme learning machine," Neurocomputing, vol. 73, no. 7-9, pp. 1405-1416, 2010.
[9] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, 1995, pp. 1942-1948.
[10] J. Kennedy and R. Mendes, "Population structure and particle swarm performance," in Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02), vol. 2, 2002, pp. 1671-1676.
[11] O. Olorunda and A. Engelbrecht, "Measuring exploration/exploitation in particle swarms using swarm diversity," in IEEE Congress on Evolutionary Computation (CEC 2008), 2008, pp. 1128-1134.
[12] A. Frank and A. Asuncion, "UCI machine learning repository," 2010. [Online]. Available: http://archive.ics.uci.edu/ml (Accessed: 12 July 2012).
[13] D. N. G. Silva, L. D. S. Pacifico, and T. B. Ludermir, "An evolutionary extreme learning machine based on group search optimization," in IEEE Congress on Evolutionary Computation (CEC), 2011.
[14] C. Bastos-Filho, D. Carvalho, E. Figueiredo, and P. de Miranda, "Dynamic clan particle swarm optimization," in Ninth International Conference on Intelligent Systems Design and Applications (ISDA '09), 2009, pp. 249-254.