Training Neural Networks Using Multiobjective Particle Swarm Optimization

John Paul T. Yusiong (1) and Prospero C. Naval Jr. (2)

(1) Division of Natural Sciences and Mathematics, University of the Philippines-Visayas, Tacloban City, Leyte, Philippines, [email protected]
(2) Department of Computer Science, University of the Philippines-Diliman, Diliman, Quezon City, Philippines, [email protected]

Abstract. This paper suggests an approach to neural network training through the simultaneous optimization of architectures and weights with a Particle Swarm Optimization (PSO)-based multiobjective algorithm. Most evolutionary computation-based training methods formulate the problem in a single objective manner by taking a weighted sum of the objectives from which a single neural network model is generated. Our goal is to determine whether Multiobjective Particle Swarm Optimization can train neural networks involving two objectives: accuracy and complexity. We propose rules for automatic deletion of unnecessary nodes from the network based on the following idea: a connection is pruned if its weight is less than the value of the smallest bias of the entire network. Experiments performed on benchmark datasets obtained from the UCI machine learning repository show that this approach provides an effective means for training neural networks that is competitive with other evolutionary computation-based methods.

1 Introduction

Neural networks are computational models that learn by adjusting internal weight parameters, according to a training algorithm, in response to training examples. Yao [20] describes three common approaches to neural network training:

1. for a neural network with a fixed architecture, find a near-optimal set of connection weights;
2. find a near-optimal neural network architecture;
3. simultaneously find both a near-optimal set of connection weights and a near-optimal architecture.

Multiobjective optimization (MOO) deals with the simultaneous optimization of several, possibly conflicting, objectives and generates a Pareto set. Each solution in the Pareto set represents a "trade-off" among the parameters that optimize the given objectives.


In supervised learning, model selection involves finding a good trade-off between at least two objectives: accuracy and complexity. The usual approach is to formulate the problem in a single-objective manner by taking a weighted sum of the objectives, but Abbass [1] presented several reasons why this method is inefficient. The MOO approach is therefore suitable, since the architecture and connection weights can be determined concurrently and a Pareto set can be obtained in a single run, from which a final solution is chosen.

Several multiobjective optimization algorithms [7] are based on Particle Swarm Optimization (PSO) [12], which was originally designed to solve single-objective optimization problems. Among the multiobjective PSO algorithms are Multiobjective Particle Swarm Optimization (MOPSO) [5] and Multiobjective Particle Swarm Optimization-Crowding Distance (MOPSO-CD) [16]. MOPSO extends the PSO algorithm to handle multiobjective optimization problems by using an external repository and a mutation operator. MOPSO-CD is a variant of MOPSO that incorporates the crowding-distance mechanism, which together with the mutation operator maintains the diversity of the external archive.

Some studies used PSO to train neural networks [18], [19], while others [2], [3], [9], [23] compared PSO-based algorithms with the backpropagation algorithm and showed that PSO-based algorithms are faster and obtain better results in most cases. Likewise, several studies deal with finding a near-optimal set of connection weights and a network architecture simultaneously [11], [13], [15], [20]-[22].

The remainder of the paper is organized as follows. Section 2 presents the proposed approach to training neural networks with a PSO-based multiobjective algorithm. The implementation, experiments, and results are discussed in Section 3. The paper closes with the summary and conclusion in Section 4.

2 Neural Network Training Using Multiobjective PSO

The proposed algorithm, called NN-MOPSOCD, is a multiobjective optimization approach to neural network training with MOPSO-CD [16] as the multiobjective optimizer. The algorithm simultaneously determines the set of connection weights and the corresponding architecture by treating training as a multiobjective minimization problem. In this study, a particle represents a one-hidden-layer neural network, and the swarm consists of a population of such networks.

2.1 Parameters and Structure Representation

The neural network shown in Figure 1 is represented as a vector of dimension D containing the connection weights, as illustrated in Figure 2. The dimension of a particle is

    D = (I + 1) * H + (H + 1) * O        (1)


where I, H and O refer to the number of input, hidden and output neurons, respectively. The connection weights of a neural network are initialized with random values drawn from a uniform distribution over the range

    [ -1/sqrt(fan-in), 1/sqrt(fan-in) ]        (2)

where fan-in is the number of connection weights (inputs) to a neuron. The numbers of input and output neurons are problem-specific, and there is no exact way of knowing the best number of hidden neurons, although there are rules of thumb [17] for choosing this value. We set the number of hidden neurons to 10, which is sufficient for the datasets used in the experiments.
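As an illustration of Eqs. (1) and (2), a particle could be built as in the sketch below. This is our own reconstruction (the paper does not publish code); the function names are ours, and whether the bias is counted in the fan-in is our assumption.

```python
import numpy as np

def particle_dimension(n_in: int, n_hidden: int, n_out: int) -> int:
    # Eq. (1): each hidden neuron has n_in weights plus a bias,
    # each output neuron has n_hidden weights plus a bias.
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def init_particle(n_in: int, n_hidden: int = 10, n_out: int = 1, rng=None) -> np.ndarray:
    # Eq. (2): each weight is drawn uniformly from
    # [-1/sqrt(fan-in), 1/sqrt(fan-in)], where fan-in is the number of inputs
    # feeding the neuron that owns the weight (bias included here, one
    # possible reading of the paper).
    rng = rng or np.random.default_rng()
    hidden_fan_in = n_in + 1
    output_fan_in = n_hidden + 1
    hidden_part = rng.uniform(-1.0, 1.0, (n_in + 1) * n_hidden) / np.sqrt(hidden_fan_in)
    output_part = rng.uniform(-1.0, 1.0, (n_hidden + 1) * n_out) / np.sqrt(output_fan_in)
    return np.concatenate([hidden_part, output_part])

# Example: the Iris configuration used later (4 inputs, 10 hidden, 3 outputs)
# gives particles of dimension (4 + 1) * 10 + (10 + 1) * 3 = 83.
print(particle_dimension(4, 10, 3))        # -> 83
print(init_particle(4, 10, 3).shape)       # -> (83,)
```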

Fig. 1. A fully-connected feedforward neural network. The bias weights are a = 0.546, b = 0.345, c = 1.234, j = 4.534 and k = 0.0347; the remaining labels (d through i, l through q) denote ordinary connection weights.

Fig. 2. The vector representation of the fully-connected feedforward neural network of Fig. 1: the connection weights a through q are stored at indices 0 through 16, grouped by the node (3 through 7) into which they feed.

Node Deletion Rules. One advantage of this representation is its simultaneous support for structure (hidden neuron) optimization and weight adaptation. Node deletion is accomplished automatically based on the following idea: A connection (except bias) is pruned if its value is less than the value of the smallest bias of the network. Thus, a neuron is considered deleted if all incoming connection weights are pruned or if all outgoing connection weights are pruned.


A neuron is deleted if either of these conditions is met:

1. all of its incoming connections have weights less than the smallest bias of the network, or
2. all of its outgoing connections have weights less than the smallest bias of the network.

In Figure 1, the neural network has five biases, namely a, b, c, j and k, with values 0.546, 0.345, 1.234, 4.534 and 0.0347, respectively. Among these, k is the smallest bias of the network. Thus, (1) if the weights of incoming connections e and h are smaller than k then neuron 4 is deleted, or (2) if the weights of outgoing connections n and o are smaller than k then neuron 4 is deleted. In short, if all of a neuron's incoming connections or all of its outgoing connections have weights smaller than the smallest bias of the network, the neuron is automatically removed. This simple representation thus enables NN-MOPSOCD to dynamically change both the structure and the connection weights of the network.
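A minimal sketch of this deletion rule is given below. It is our own reading of the rule, not the authors' implementation: the paper compares weight values directly against the smallest bias, which is what the code does, and the helper names are ours.

```python
import numpy as np

def smallest_bias(biases):
    # Threshold used by the pruning rule: the smallest bias of the network.
    return float(np.min(biases))

def hidden_neuron_deleted(incoming, outgoing, threshold):
    # A connection is pruned if its weight is less than the threshold;
    # the neuron is deleted when all incoming or all outgoing connections
    # are pruned.
    incoming, outgoing = np.asarray(incoming), np.asarray(outgoing)
    return bool(np.all(incoming < threshold) or np.all(outgoing < threshold))

# Figure 1 example: biases a, b, c, j, k; k = 0.0347 is the smallest.
t = smallest_bias([0.546, 0.345, 1.234, 4.534, 0.0347])
# If incoming weights e and h are both below k, neuron 4 is deleted.
print(hidden_neuron_deleted([0.01, 0.02], [0.9, 1.1], t))   # -> True
```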

2.2 NN-MOPSOCD Overview

NN-MOPSOCD starts by reading the dataset, setting the desired number of hidden neurons, and setting the maximum number of generations for MOPSO-CD. The next step is determining the dimension of the particles and initializing the population with fully-connected feedforward neural network particles. In each generation, every particle is evaluated on the two objective functions, and once the maximum generation is reached the algorithm outputs the set of non-dominated neural networks held in the external archive. Figure 3 illustrates this procedure.
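The non-dominated archive that the algorithm finally outputs is maintained with a standard Pareto-dominance test; the sketch below is our own minimal version and deliberately leaves out MOPSO-CD's crowding-distance truncation and mutation operator.

```python
def dominates(a, b):
    # Both objectives are minimized: a dominates b if it is no worse in every
    # objective and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    # Keep the candidate only if no archived point dominates it, and drop any
    # archived points that the candidate dominates.
    if any(dominates(a, candidate) for a in archive):
        return archive
    return [a for a in archive if not dominates(candidate, a)] + [candidate]

# Example with (MSE, MDL) objective pairs:
arch = []
for objs in [(0.05, 0.60), (0.04, 0.70), (0.06, 0.40), (0.05, 0.65)]:
    arch = update_archive(arch, objs)
print(arch)   # -> [(0.05, 0.60), (0.04, 0.70), (0.06, 0.40)]
```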

Fig. 3. The procedure for training neural networks with NN-MOPSOCD: read the dataset, set the number of hidden neurons and the maximum generation, determine the particle dimension, initialize the population, run the multi-objective PSO loop with evaluation of the objective functions until the maximum generation is reached, then output the non-dominated solutions.

2.3 Objective Functions

Two objective functions are used to evaluate a neural network particle's performance:

1. the mean-squared error (MSE) on the training set, and
2. the complexity, based on the Minimum Description Length (MDL) principle.

Mean-squared Error. This is the first objective function. It measures the neural network's training error.

Minimum Description Length. NN-MOPSOCD uses MDL as its second objective function to minimize the neural network's complexity, following the MDL principle described in [4], [8]. As stated in [10], the MDL principle asserts that the best model of some data is the one that minimizes the combined cost of describing the model and describing the misfit between the model and the data. MDL is well suited to minimizing the complexity of the neural network structure since it reduces the number of active connections while taking the network's performance on the training set into account. The total description length of the data misfits and the network complexity is minimized by minimizing the sum of two terms:

    MDL = Error + Complexity                                               (3)
    Error = 1.0 - (num of CorrectClassifications / num of Examples)        (4)
    Complexity = num of ActiveConnections / TotalPossibleConnections       (5)

where Error is the classification error and Complexity is the network complexity measured as the ratio of active connections to the total number of possible connections.
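For concreteness, the sketch below evaluates both objectives for one particle. It is our own reconstruction: the paper does not specify the activation function (a logistic sigmoid is assumed), the layout of the weight vector follows Figure 2, and the count of active connections follows the pruning rule of Section 2.1.

```python
import numpy as np

def decode(p, n_in, n_hidden, n_out):
    # Split the particle into hidden- and output-layer matrices; each row holds
    # a neuron's incoming weights followed by its bias (layout of Figure 2).
    h = p[:(n_in + 1) * n_hidden].reshape(n_hidden, n_in + 1)
    o = p[(n_in + 1) * n_hidden:].reshape(n_out, n_hidden + 1)
    return h, o

def forward(p, X, n_hidden, n_out):
    h, o = decode(p, X.shape[1], n_hidden, n_out)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))            # assumed activation
    hidden = sig(X @ h[:, :-1].T + h[:, -1])
    return sig(hidden @ o[:, :-1].T + o[:, -1])

def objectives(p, X, Y, n_hidden, n_out):
    # Y: targets of shape (N, n_out), one-hot encoded for multi-class problems.
    out = forward(p, X, n_hidden, n_out)
    mse = float(np.mean((out - Y) ** 2))                 # objective 1

    h, o = decode(p, X.shape[1], n_hidden, n_out)
    threshold = np.concatenate([h[:, -1], o[:, -1]]).min()     # smallest bias
    weights = np.concatenate([h[:, :-1].ravel(), o[:, :-1].ravel()])
    active = np.sum(weights >= threshold)                # connections not pruned

    if n_out > 1:
        correct = np.sum(out.argmax(axis=1) == Y.argmax(axis=1))
    else:
        correct = np.sum((out.ravel() > 0.5) == (Y.ravel() > 0.5))
    error = 1.0 - correct / len(Y)                       # Eq. (4)
    complexity = active / weights.size                   # Eq. (5)
    return mse, error + complexity                       # (objective 1, objective 2 = MDL)
```

Each particle would then be scored as objectives(p, X_train, Y_train, 10, n_out), and the resulting (MSE, MDL) pair is what the multiobjective optimizer minimizes.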

3 Experiments and Results

The goal is to use a PSO-based multiobjective algorithm to simultaneously optimize the architectures and connection weights of neural networks. To test its effectiveness, NN-MOPSOCD was applied to the six datasets listed in Table 1. For each dataset, the experiments were repeated thirty (30) times to minimize the influence of random effects, and each experiment uses a different randomly generated initial population. Table 1 shows the number of training and test cases, classes, continuous and discrete features, input and output neurons, as well as the maximum generation count used for training.


These datasets are taken from the UCI machine learning repository [14]. For the breast cancer dataset, 16 of the training examples were deleted because of missing attribute values, reducing the total number of examples from 699 to 683. Table 2 shows the parameter settings used in the experiments.

Table 1. Description of the datasets used in the experiments

Domains    Train Set  Test Set  Class  Cont.  Disc.  Input  Output  MaxGen
Monks-1    124        432       2      0      6      6      1       500
Vote       300        135       2      0      16     16     1       300
Breast     457        226       2      9      0      9      1       300
Iris       100        50        3      4      0      4      3       500
Heart      180        90        2      6      7      13     1       500
Thyroid    3772       3428      3      6      15     21     3       200

Table 2. Parameters and their corresponding values of NN-MOPSOCD

Parameters                      NN-MOPSOCD
Optimization Type               Minimization
Population Size                 100
Archive Size                    100
Objective Functions             2
Constraints                     0
Lower Limit of Variable         -100.0
Upper Limit of Variable         100.0
Probability of Mutation (pM)    0.5

3.1 Results and Discussion

For each dataset, the results from each of the 30 independent runs of the NN-MOPSOCD algorithm were recorded and analyzed. Tables 3-6 show the results obtained from the experiments. The average MSE on the training and test sets is small, while the misclassification rates are slightly higher.

Least-Error versus Least-Complex. Among the neural networks in the Pareto set, two are singled out: the least-error and the least-complex neural network. The least-error network is the one with the smallest value on the first objective function in the Pareto set, while the least-complex network is the one with the smallest value on the second objective function. Tables 7-9 compare the least-error and least-complex neural networks. It can be seen that the least-error networks use a higher number of connections than the least-complex networks but perform significantly better.
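Extracting these two representative networks from the final archive amounts to taking the minimum over each objective; the sketch below uses a (network, mse, mdl) tuple layout that is our own convention.

```python
def pick_representatives(archive):
    # archive: iterable of (network, mse, mdl) tuples (assumed layout).
    least_error = min(archive, key=lambda s: s[1])    # smallest objective 1 (MSE)
    least_complex = min(archive, key=lambda s: s[2])  # smallest objective 2 (MDL)
    return least_error, least_complex
```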


Comparison of Performance and Complexity. The purpose of this comparison is to show that the performance of the PSO-based multiobjective approach is comparable to that of existing algorithms which optimize architectures and connection weights concurrently in a single-objective manner. The ability to generalize when presented with a completely new set of data is one of the most important criteria for judging the effectiveness of neural network learning. Table 10 compares the results obtained with those of MGNN [15], while Table 11 compares the results with EPNet [21] in terms of the error on the test set and the number of connections used.

Table 3. Average number of neural networks generated

Datasets   Average   Median   Std. Dev.
Monks-1    22.500    21.500   9.391
Vote       31.633    30.000   15.566
Breast     28.700    28.000   6.444
Iris       20.300    20.000   7.145
Heart      21.600    20.500   10.180
Thyroid    22.167    23.000   5.025

Table 4. Average mean-squared error on the training and test set

Datasets   Training Set                     Test Set
           Average   Median   Std. Dev.    Average   Median   Std. Dev.
Monks-1    3.38%     3.40%    0.027        4.59%     4.80%    0.033
Vote       2.06%     1.81%    0.009        2.28%     2.04%    0.008
Breast     1.32%     1.20%    0.003        1.68%     1.61%    0.003
Iris       2.39%     1.46%    0.020        4.58%     4.39%    0.019
Heart      7.37%     7.40%    0.007        7.62%     7.44%    0.012
Thyroid    5.02%     4.97%    0.007        5.21%     5.14%    0.006

Table 5. Average percentage of misclassification on the training and test set

Datasets   Training Set                     Test Set
           Average   Median   Std. Dev.    Average   Median   Std. Dev.
Monks-1    8.79%     8.28%    0.075        12.40%    11.49%   0.094
Vote       4.53%     3.79%    0.021        5.34%     4.83%    0.021
Breast     2.94%     2.76%    0.007        3.93%     3.92%    0.007
Iris       2.97%     1.92%    0.027        6.24%     5.96%    0.026
Heart      18.19%    18.36%   0.024        19.34%    18.82%   0.041
Thyroid    7.11%     6.91%    0.008        7.42%     7.22%    0.006

Table 6. Average number of connections used

Datasets   Average   Median    Std. Dev.
Monks-1    32.72     32.02     10.32
Vote       67.34     63.65     28.34
Breast     48.13     48.46     13.18
Iris       66.02     65.50     7.16
Heart      65.35     67.05     26.04
Thyroid    149.27    145.41    30.91

Table 7. Average mean-squared error on the training and test set

Datasets   Training Set                    Test Set
           Least-Error   Least-Complex    Least-Error   Least-Complex
Monks-1    2.16%         5.70%            3.74%         6.66%
Vote       1.56%         2.82%            2.07%         2.66%
Breast     0.87%         2.57%            1.45%         2.66%
Iris       1.42%         4.77%            3.81%         6.84%
Heart      6.30%         9.04%            6.87%         8.99%
Thyroid    4.12%         7.02%            4.43%         6.98%

Table 8. Average percentage of misclassification on the training and test set

Datasets   Training Set                    Test Set
           Least-Error   Least-Complex    Least-Error   Least-Complex
Monks-1    5.54%         13.71%           10.08%        16.75%
Vote       3.56%         5.48%            5.11%         5.46%
Breast     1.94%         4.72%            3.39%         5.32%
Iris       2.20%         5.40%            5.20%         9.20%
Heart      16.04%        21.09%           17.70%        21.59%
Thyroid    7.00%         7.27%            7.57%         7.23%

Table 9. Average number of connections used

Datasets   Least-Error   Least-Complex
Monks-1    49.00         17.63
Vote       113.70        35.47
Breast     89.80         17.87
Iris       77.37         55.23
Heart      108.17        28.60
Thyroid    195.77        117.17


Table 10. Performance comparison between MGNN and NN-MOPSOCD

Algorithms    MSE on Test set       Number of Connections
              Breast    Iris        Breast    Iris
MGNN-ep       3.28%     6.17%       80.87     56.38
MGNN-rank     3.33%     7.28%       68.46     47.06
MGNN-roul     3.05%     8.43%       76.40     55.13
NN-MOPSOCD    1.68%     4.58%       48.13     66.02

Table 11. Performance comparison between EPNet and NN-MOPSOCD

Algorithms    MSE on Test set                Number of Connections
              Breast    Heart     Thyroid    Breast    Heart    Thyroid
EPNet         1.42%     12.27%    1.13%      41.00     92.60    208.70
NN-MOPSOCD    1.68%     7.62%     5.21%      48.13     65.35    149.27

4 Conclusion

This work dealt with the neural network training problem through a multiobjective optimization approach, using a PSO-based algorithm to concurrently optimize the architectures and connection weights. The proposed algorithm dynamically generates a set of near-optimal feedforward neural networks together with their connection weights. Experiments on benchmark datasets show that our approach is competitive with, and in several cases outperforms, other evolutionary computation-based methods in the existing literature.

References

1. Abbass, H.: An Evolutionary Artificial Neural Networks Approach to Breast Cancer Diagnosis. Artificial Intelligence in Medicine 25(3) (2002) 265-281
2. Alfassio Grimaldi, E., Grimaccia, F., Mussetta, M. and Zich, R.: PSO as an Effective Learning Algorithm for Neural Network Applications. Proceedings of the International Conference on Computational Electromagnetics and its Applications, Beijing, China (2004) 557-560
3. Al-kazemi, B. and Mohan, C.: Training Feedforward Neural Networks using Multiphase Particle Swarm Optimization. Proceedings of the 9th International Conference on Neural Information Processing (ICONIP 2002), Singapore (2002)
4. Barron, A., Rissanen, J. and Yu, B.: The Minimum Description Length Principle in Coding and Modeling. IEEE Trans. Inform. Theory 44 (1998) 2743-2760
5. Coello, C. and Lechuga, M.: MOPSO: A Proposal for Multiple Objective Particle Swarm Optimization. Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2002), Honolulu, Hawaii, USA (2002)
6. Deb, K., Pratap, A., Agarwal, S. and Meyarivan, T.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2) (2002) 182-197
7. Fieldsend, J.: Multi-Objective Particle Swarm Optimisation Methods. Technical Report No. 419, Department of Computer Science, University of Exeter (2004)


8. Grunwald, P.: A Tutorial Introduction to the Minimum Description Length Principle. In: Advances in Minimum Description Length: Theory and Applications. MIT Press (2004)
9. Gudise, V. and Venayagamoorthy, G.: Comparison of Particle Swarm Optimization and Backpropagation as Training Algorithms for Neural Networks. IEEE Swarm Intelligence Symposium, Indianapolis, IN, USA (2003) 110-117
10. Hinton, G. and van Camp, D.: Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. Proceedings of COLT-93 (1993)
11. Jin, Y., Sendhoff, B. and Körner, E.: Evolutionary Multi-objective Optimization for Simultaneous Generation of Signal-type and Symbol-type Representations. Third International Conference on Evolutionary Multi-Criterion Optimization, LNCS 3410, Springer, Guanajuato, Mexico (2005) 752-766
12. Kennedy, J. and Eberhart, R.: Particle Swarm Optimization. Proceedings of the 1995 IEEE International Conference on Neural Networks, Perth, Australia, vol. 4 (1995) 1942-1948
13. Liu, Y. and Yao, X.: A Population-Based Learning Algorithm Which Learns Both Architectures and Weights of Neural Networks. Chinese J. Advanced Software Res. 3(1) (1996) 54-65
14. Newman, D., Hettich, S., Blake, C. and Merz, C.: UCI Repository of Machine Learning Databases. University of California, Department of Information and Computer Science, Irvine, CA (1998)
15. Palmes, P., Hayasaka, T. and Usui, S.: Mutation-based Genetic Neural Network. IEEE Transactions on Neural Networks 16(3) (2005) 587-600
16. Raquel, C. and Naval, P.: An Effective Use of Crowding Distance in Multiobjective Particle Swarm Optimization. Proceedings of the 2005 Conference on Genetic and Evolutionary Computation (GECCO '05), Washington DC, USA. ACM Press, New York, NY (2005) 257-264
17. Shahin, M., Jaksa, M. and Maier, H.: Application of Neural Networks in Foundation Engineering. Theme paper, International e-Conference on Modern Trends in Foundation Engineering: Geotechnical Challenges and Solutions, Theme No. 5: Numerical Modelling and Analysis, Chennai, India (2004)
18. Sugisaka, M. and Fan, X.: An Effective Search Method for Neural Network Based Face Detection Using Particle Swarm Optimization. IEICE Transactions 88-D(2) (2005) 214-222
19. van den Bergh, F.: Particle Swarm Weight Initialization in Multi-layer Perceptron Artificial Neural Networks. Development and Practice of Artificial Intelligence Techniques, Durban, South Africa (1999) 41-45
20. Yao, X.: Evolving Artificial Neural Networks. Proceedings of the IEEE 87 (1999) 1423-1447
21. Yao, X. and Liu, Y.: Evolving Artificial Neural Networks through Evolutionary Programming. Fifth Annual Conference on Evolutionary Programming, San Diego, CA, USA. MIT Press (1996) 257-266
22. Yao, X. and Liu, Y.: Towards Designing Artificial Neural Networks by Evolution. Applied Mathematics and Computation 91(1) (1998) 83-90
23. Zhao, F., Ren, Z., Yu, D. and Yang, Y.: Application of an Improved Particle Swarm Optimization Algorithm for Neural Network Training. Proceedings of the 2005 International Conference on Neural Networks and Brain, Beijing, China, vol. 3 (2005) 1693-1698
