Improving Adaptive Boosting with k-cross-fold validation

Joaquín Torres-Sospedra, Carlos Hernández-Espinosa and Mercedes Fernández-Redondo
Departamento de Ingenieria y Ciencia de los Computadores, Universitat Jaume I,
Avda. Sos Baynat s/n, C.P. 12071, Castellón, Spain
{jtorres, espinosa, redondo}@icc.uji.es

Abstract. As seen in the bibliography, Adaptive Boosting (Adaboost) is one of the best-known methods to increase the performance of an ensemble of neural networks. We introduce a new method based on Adaboost in which we apply Cross-Validation to increase the diversity of the ensemble. We use Cross-Validation over the whole learning set to generate a specific training set and validation set for each network of the committee. We have tested Adaboost and Crossboost with seven databases from the UCI repository, using the mean percentage of error reduction and the mean increase of performance to compare both methods; the results show that Crossboost performs better.

1 Introduction

Perhaps the most important property of a neural network is its generalization capability: the ability to respond correctly to inputs that were not used in the training set. One technique to increase this capability with respect to a single neural network consists in training an ensemble of neural networks, i.e., training a set of neural networks with different weight initializations or properties and combining their outputs in a suitable manner to give a single output. It is clear from the bibliography that this procedure increases the generalization capability. The error of a neural network can be decomposed into a bias and a variance [1, 2]. The use of an ensemble usually keeps the bias constant and reduces the variance if the errors of the different networks are uncorrelated or negatively correlated; therefore, it increases the generalization performance. The two key factors in designing an ensemble are how to train the individual networks to get uncorrelated errors and how to combine the different outputs of the networks to give a single output. In previous works developed by our research group, [3] and [4], we presented a comparison among methods to build an ensemble; in those works we concluded that k-Cross-Fold Validation (CVC), presented in [5], has a good performance. Among the methods described in the bibliography we have focused on Adaptive Boosting (Adaboost). Adaboost is a well-known method to create an ensemble of neural networks [6] and it has been widely studied by several authors [7–9].

Adaboost could be improved if we increase the diversity of the ensemble. We can generate a specific validation set Vnet and training set Tnet for each network [5] by applying Cross-Validation over the learning set in order to get k disjoint subsets. To test the performance of the method we propose, Cross-Validated Boosting (Crossboost), we have built ensembles of 3, 9, 20 and 40 multilayer feedforward networks with seven databases from the UCI repository. The results obtained on these seven databases are in Subsection 3.1. We have also calculated the global measurements Mean Increase of Performance and Mean Percentage of Error Reduction for Adaboost and Crossboost to compare both methods; these results appear in Subsection 3.2.

2 Theory

2.1 Adaptive Boosting

Adaboost is a method that constructs a sequence of networks. Each network is trained as an individual network, but the training set used to train it depends on the performance of the previous networks on the original training set. The successive networks are trained with a training data set T′ selected at random from the original training data set T, where the probability of selecting a pattern from T is given by the sampling distribution Dist_net associated with the network. The sampling distribution associated with a network is calculated when the learning process of the previous network has finished. Although the Output Average and Voting are the two most widely used combination methods [10], Adaboost uses a specific combination method, the boosting combiner, described in equation (1).

\[
h_{boosting}(x) = \arg\max_{c=1,\dots,classes} \; \sum_{net:\, h_{net}(x) = c} \log\!\left(\frac{1-\epsilon_{net}}{\epsilon_{net}}\right) \qquad (1)
\]
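For illustration (not part of the original paper), a minimal Python sketch of the boosting combiner for a single input; the function and variable names are ours:

```python
import numpy as np

def boosting_combiner(predictions, epsilons, n_classes):
    """Boosting combiner of equation (1): each network votes for its predicted
    class with weight log((1 - eps) / eps), where eps is its weighted error."""
    weights = np.log((1.0 - np.asarray(epsilons)) / np.asarray(epsilons))
    scores = np.zeros(n_classes)
    for pred, w in zip(predictions, weights):
        scores[pred] += w                 # accumulate the weighted vote for that class
    return int(np.argmax(scores))         # class with the largest weighted vote

# Example: three networks predict classes 0, 1, 1 with errors 0.10, 0.30, 0.25
print(boosting_combiner([0, 1, 1], [0.10, 0.30, 0.25], n_classes=2))
# -> 0: the low-error network outweighs the two others
```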

2.2 Cross Validated Boosting

In the method we propose, Crossboost, each network has its own version of the training set Tnet and validation set Vnet. The training process is similar to that of Adaboost. In this case, the learning set L is divided into k disjoint subsets of the same size, L = {L_1, ..., L_k}, with k-cross-fold validation in order to create the specific training and validation sets. The validation set of the net-th network, Vnet, is generated by equation (2) and its training set, Tnet, by equation (3).

\[
V_{net} = L_{net} \qquad (2)
\]

\[
T_{net} = \bigcup_{\substack{j=1 \\ j \neq net}}^{k} L_j \qquad (3)
\]
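As an illustration (our example, not from the paper): with k = 3 and L = {L_1, L_2, L_3}, the second network uses V_2 = L_2 as validation set and T_2 = L_1 ∪ L_3 as training set, so every network is validated on data it never sees during training.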

Algorithm 1 shows the training process of an ensemble of neural networks with Crossboost.

Algorithm 1 CrossBoost (L, k)
  Initialize the sampling distribution: Dist^1_pattern = 1/m, ∀ pattern ∈ L
  Divide the learning set L into k disjoint subsets
  for net = 1 to k do
    Create Tnet and Vnet
  end for
  for net = 1 to k do
    Create T′net by sampling from Tnet using Dist^net
    MF Network Training (T′net, Vnet)
    Calculate the misclassified vector:
      miss^net_pattern = 1 if h_net(x_pattern) ≠ d(x_pattern), 0 otherwise
    Calculate the error:
      ε_net = Σ^m_{pattern=1} Dist^net_pattern · miss^net_pattern
    Update the sampling distribution:
      Dist^{net+1}_pattern = Dist^net_pattern · 1/(2·ε_net)      if miss^net_pattern = 1
      Dist^{net+1}_pattern = Dist^net_pattern · 1/(2·(1−ε_net))  otherwise
  end for
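A minimal runnable sketch of this training loop follows. It is our simplification, not the paper's implementation: scikit-learn's MLPClassifier stands in for the MF network, the data are synthetic, and validation-based stopping on Vnet is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

def crossboost_train(X, y, k, seed=0):
    """Sketch of Algorithm 1: split the learning set into k folds, keep an
    AdaBoost-style sampling distribution over all patterns, and train network
    `net` on a resampling of T_net (all folds except fold `net`)."""
    rng = np.random.default_rng(seed)
    m = len(X)
    dist = np.full(m, 1.0 / m)                         # Dist^1_pattern = 1/m
    nets, epsilons = [], []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True,
                                    random_state=seed).split(X):
        # T'_net: sample from T_net with the (renormalized) distribution
        p = dist[train_idx] / dist[train_idx].sum()
        sample = rng.choice(train_idx, size=len(train_idx), replace=True, p=p)
        net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500,
                            random_state=seed)
        net.fit(X[sample], y[sample])                  # stopping on V_net omitted here
        miss = (net.predict(X) != y).astype(float)     # misclassified vector over L
        eps = float(np.clip(np.sum(dist * miss), 1e-10, 1 - 1e-10))
        # AdaBoost-style resampling update of the distribution
        dist = dist * np.where(miss == 1, 1.0 / (2 * eps), 1.0 / (2 * (1 - eps)))
        dist /= dist.sum()
        nets.append(net)
        epsilons.append(eps)
    return nets, epsilons

# Toy run: the trained networks and their errors feed the combiners of eqs. (1) and (4)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
nets, eps = crossboost_train(X, y, k=3)
print([round(e, 3) for e in eps])
```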

In our experiments we have used the Average combiner (4) and the Boosting combiner (1) to obtain the output/hypothesis of the ensembles generated by Crossboost.

\[
h_{average}(x) = \arg\max_{class=1,\dots,classes} \; \frac{1}{k} \sum_{net=1}^{k} y^{net}_{class}(x) \qquad (4)
\]
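A minimal sketch of the average combiner, assuming each network provides one output per class (e.g., class probabilities); the function name is ours:

```python
import numpy as np

def average_combiner(class_outputs):
    """Average combiner of equation (4): average the per-class outputs of the
    k networks and return the class with the highest mean output."""
    class_outputs = np.asarray(class_outputs)      # shape (k, n_classes) for one input x
    return int(class_outputs.mean(axis=0).argmax())

# Example: three networks, two classes
print(average_combiner([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]))  # -> 1
```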

3 Experimental testing

The experimental setup, the datasets we have used in our experiments and the results we have obtained are described in the present section, and the results of the boosting methods Adaboost and Crossboost are analysed. For this purpose we have trained ensembles of 3, 9, 20 and 40 MF networks with the Adaboost and Crossboost algorithms on seven different classification problems from the UCI repository of machine learning databases [11] to test the performance of both methods. The databases we have used are: Balance Scale Database (bala), Australian Credit Approval (cred), Heart Disease Database (hear), Image Segmentation Database (img), Ionosphere Database (ionos), MONK's Problem 2 (mok2) and Wisconsin Breast Cancer Database (wdbc). In addition, we repeated the whole learning process ten times, using different partitions of the data into training, validation and test sets. With this procedure we obtain a mean performance of the ensemble for each database and an error in the performance calculated by standard error theory.

3.1 Results

The main results we have obtained with the boosting methods are presented in this subsection. Table 1 shows the results obtained with ensembles of 3, 9, 20 and 40 networks trained with Adaboost. Table 2 shows the results obtained with ensembles of 3, 9, 20 and 40 networks trained with Crossboost, combining the outputs with the Output Average and the Boosting combiner.

Table 1. Adaboost results

Database   3 Nets      9 Nets      20 Nets      40 Nets
bala       94.5±0.8    95.3±0.5    96.1±0.4     95.7±0.5
cred       84.9±1.4    84.2±0.9    84.5±0.8     85.1±0.9
hear       80.5±1.8    81.2±1.4    82±1.9       82.2±1.8
img        96.8±0.2    97.3±0.3    97.29±0.19   97.3±0.2
ionos      88.3±1.3    89.4±0.8    91.4±0.8     91.6±0.7
mok2       76.5±2.4    78.8±2.5    81.1±2.4     82.9±2.1
wdbc       95.7±0.6    95.7±0.7    96.3±0.5     96.7±0.9

Table 2. Crossboost results

               Average Combiner                               Boosting Combiner
DB       3 Nets     9 Nets     20 Nets    40 Nets      3 Nets     9 Nets     20 Nets    40 Nets
bala     96.3±0.5   96.3±0.6   96.2±0.6   95.4±0.6     95.1±0.5   95.4±0.5   96.2±0.6   95.8±0.7
cred     84.8±1     85±0.8     86.5±0.6   86.2±0.8     85.2±0.9   86.4±0.7   85.3±0.7   84.9±0.7
hear     81.7±1.3   80.5±1.8   82.4±1.1   78.1±1.2     80.7±1.3   81.2±1.5   82.7±1.6   81.2±1.5
img      96.6±0.3   97.3±0.2   97.4±0.2   97.4±0.3     96.1±0.3   97.5±0.2   97.5±0.2   97.7±0.1
ionos    89.7±0.9   91.3±1     91.6±1.3   91.4±1.8     89.1±0.7   91.7±0.6   90.7±1     92.6±0.6
mok2     77.9±2.3   85.8±1.3   84.1±2     77.3±1.6     74.8±2     86.1±1.4   87.3±0.9   87.5±1.2
wdbc     96.6±0.2   96.5±0.7   96±0.5     96.1±0.5     96.6±0.4   96.7±0.5   96.4±0.6   95.9±0.5

3.2 Interpretations of Results

In order to see whether the method we have proposed is better, we have calculated the increase of performance of the new method with respect to the original Adaboost. A positive value of the increase of performance means that our method performs better than the original Adaboost on the dataset; there can also be negative values, which mean that our method performs worse than the original Adaboost. The increase of performance obtained with Crossboost using the Average combiner and the Boosting combiner is shown in Table 3. Comparing the results shown in Table 3, we can see that the improvement in performance obtained with our method depends on the database and the number of networks used in the ensemble. For instance, the highest increases of performance of Crossboost with respect to Adaboost are obtained on database mok2, whereas the increase of performance on database bala is much smaller.

Table 3. Crossboost increase of performance with respect to Adaboost

               Average Combiner                     Boosting Combiner
DB       3 Nets    9 Nets    20 Nets   40 Nets    3 Nets    9 Nets    20 Nets   40 Nets
bala     1.83      1.03      0.08      -0.25      0.64      0.15      0.16      0.15
cred     -0.16     0.76      2.08      1.07       0.31      2.14      0.85      -0.16
hear     1.17      -0.68     0.34      -4.06      0.17      0         0.67      -1.02
img      -0.18     0.03      0.14      0.1        -0.69     0.18      0.16      0.4
ionos    1.41      1.89      0.16      -0.17      0.84      2.3       -0.7      0.96
mok2     1.38      7         3         -5.63      -1.76     7.38      6.12      4.62
wdbc     0.93      0.74      -0.3      0.39       0.84      1.03      0.06      0.2

However, the increase of performance we have shown is an absolute measure, so it does not reveal how important the increase is with respect to the error. To have information about the error reduction, we have also calculated the percentage of error reduction (PER) (5) of the ensembles with respect to a single network.

\[
PER = 100 \cdot \frac{Error_{singlenetwork} - Error_{ensemble}}{Error_{singlenetwork}} \qquad (5)
\]
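As an illustration with our own numbers (not taken from the paper): if a single network reaches 87% accuracy (13% error) and an ensemble reaches 90% (10% error), the increase of performance is only 3 points, whereas PER = 100 · (13 − 10) / 13 ≈ 23.1%.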

The PER value ranges from 0%, where there is no improvement by the use of a particular ensemble method with respect to a single network, to 100%. There can also be negative values, which mean that the performance of the ensemble is worse than the performance of the single network. This new measurement is relative and can be used to compare the different methods more clearly. Furthermore, to get global measurements we have calculated the mean increase of performance and the mean PER across all databases for each method. Table 4 shows these results.

Table 4. Global measures

                          Mean PER                             Mean Increase of Performance
Method              3 Nets   9 Nets   20 Nets  40 Nets      3 Nets   9 Nets   20 Nets  40 Nets
Adaboost            3.57     8.52     16.95    14.95        2.07     2.73     3.71     3.96
CrossBoost (ave)    13.19    19.34    19.76    12.48        4.27     4.5      2.74     -1.46
CrossBoost (boost)  7.01     23.01    21.25    19.95        2.12     4.62     4.76     4.7

According to these global measurements, Crossboost performs better than the original Adaboost. The highest difference between the original Adaboost and Crossboost is in the 9-network ensemble, where the mean PER increase is 14.49%. We also see that ensembles composed of a high number of networks tend to be less accurate than smaller ones. This is because the partitioning of the learning set we have applied produces a very small validation set when the number of networks is high.

4 Conclusions

In this paper we have presented Crossboost, an algorithm based on Adaboost and cross validation. We have trained ensembles of 3, 9, 20 and 40 networks with Adaboost and Crossboost to cover a wide spectrum of ensemble sizes. Although the results showed that, in general, the improvement obtained with Crossboost depends on the database, in four databases there is a notable increase in performance. Finally, we have obtained the mean percentage of error reduction across all databases; according to this measurement, Crossboost performs better than Adaboost. The most appropriate method to combine the outputs in Crossboost is the Average combiner for ensembles of 3 networks and the Boosting combiner for ensembles of 9, 20 and 40 networks. We can conclude that the Adaboost variation presented in this paper increases the diversity of the classifiers, so the performance of the final ensemble is, in general, better.

Acknowledgements

This research was supported by project P1·1B2004-03 of Universitat Jaume I - Bancaja in Castellón de la Plana, Spain.

References

1. Tumer, K., Ghosh, J.: Error correlation and error reduction in ensemble classifiers. Connection Science 8(3-4) (1996) 385–403
2. Raviv, Y., Intrator, N.: Bootstrapping with noise: An effective regularization technique. Connection Science, Special issue on Combining Estimators 8 (1996) 356–372
3. Hernandez-Espinosa, C., Fernandez-Redondo, M., Torres-Sospedra, J.: Ensembles of multilayer feedforward for classification problems. In: Neural Information Processing, ICONIP 2004. Volume 3316 of Lecture Notes in Computer Science (2005) 744–749
4. Hernandez-Espinosa, C., Torres-Sospedra, J., Fernandez-Redondo, M.: New experiments on ensembles of multilayer feedforward for classification problems. In: Proceedings of the International Conference on Neural Networks, IJCNN 2005, Montreal, Canada (2005) 1120–1124
5. Verikas, A., Lipnickas, A., Malmqvist, K., Bacauskiene, M., Gelzinis, A.: Soft combination of neural classifiers: A comparative study. Pattern Recognition Letters 20(4) (1999) 429–444
6. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: International Conference on Machine Learning (1996) 148–156
7. Kuncheva, L., Whitaker, C.J.: Using diversity with three variants of boosting: Aggressive, conservative and inverse. In: Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy, June 2002. Volume 2364 of Lecture Notes in Computer Science, Springer (2002)
8. Oza, N.C.: Boosting with averaged weight vectors. In Windeatt, T., Roli, F., eds.: Multiple Classifier Systems. Volume 2709 of Lecture Notes in Computer Science, Springer (2003) 15–24
9. Breiman, L.: Arcing classifiers. The Annals of Statistics 26(3) (1998) 801–849
10. Drucker, H., Cortes, C., Jackel, L.D., LeCun, Y., Vapnik, V.: Boosting and other ensemble methods. Neural Computation 6(6) (1994) 1289–1301
11. Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998)
