Proceedings of the 11th IASTED International Conference on Intelligent Systems and Control (ISC 2008), November 16-18, 2008, Orlando, Florida, USA

ON THE WEIGHTED DYNAMIC CLASSIFIER SELECTION WITH LOCAL ACCURACIES

Aydet I. Morales
Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Av. Tecnológico s/n, 52140 Metepec (Mexico). E-mail: [email protected]

Rosa M. Valdovinos
Computo Aplicado Group, Centro Universitario UAEM Valle de Chalco, Av. Hermenegildo Galeana No. 3, 56615 Valle de Chalco (Mexico). E-mail: [email protected]

ABSTRACT
When a multiple classifier system is employed, one of the most popular methods to perform the classifier fusion is simple majority voting. However, when the performance of the ensemble members is not uniform, the efficiency of this type of voting is generally affected negatively. In the present paper, we propose a weighting function based on the distance of the nearest neighbors for the Dynamic Classifier Selection with Local Accuracy (DCS-LA) algorithm. Experimental results with several real-problem data sets taken from the UCI Machine Learning Database Repository demonstrate the advantages of this strategy over simple voting and the plain (unweighted) DCS-LA.

1. Introduction

An ensemble is a learning paradigm in which several classifiers are combined in order to generate a single classification result. Let D = {D1, . . . , DL} be a set of L classifiers, and Ω = {ω1, . . . , ωc} be a set of c classes. Each classifier Di (i = 1, . . . , L) gets as input a feature vector x and assigns it one of the c class labels. The individual decisions can be combined either by fusing the outputs of all members or by selecting the members that will do the final classification. In static selection, the expert classifier is assigned during the learning phase, whereas in dynamic selection the expert is determined in the classification phase. In the present study, a new method for weighting the Dynamic Classifier Selection (DCS) [10] is proposed in order to identify the expert classifier. The effectiveness of the new approach is empirically tested over a number of real-problem data sets.

The rest of the paper is organized as follows. Section 2 reviews related work on classifier selection. Section 3 presents one of the main weighting functions for the k-Nearest Neighbor (k-NN) classifier. The experimental results are discussed in Section 4. Finally, Section 5 gives the main conclusions and points out possible directions for future research.

2. Classifier Selection

In classifier selection, the feature space is divided into r > 1 regions of interest, R = {R1, . . . , Rr}; the number of classifiers L does not necessarily have to be equal to the number of regions r [6]. In addition, each classifier can be an expert for more than one region.
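Before turning to selection, the fusion alternative discussed above, simple majority voting, can be sketched as follows. This is a minimal illustration under our own naming, not the paper's code; the tiny fixed-answer "classifiers" stand in for D1, . . . , DL:

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Fuse an ensemble by simple (unweighted) majority voting:
    every member casts one vote and the most frequent label wins."""
    votes = [clf(x) for clf in classifiers]
    label, _count = Counter(votes).most_common(1)[0]
    return label

# Toy ensemble of three fixed-answer classifiers (illustrative only)
ensemble = [lambda x: "w1", lambda x: "w2", lambda x: "w1"]
print(majority_vote(ensemble, x=None))  # -> w1
```

When member performance is uneven, this rule lets weak voters outvote a locally competent one, which is precisely the drawback that the selection schemes below address.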

2.1 Static Classifier Selection (SCS)

In this method, the regions are established during the training phase, prior to the classification of new patterns. In the operation (classification) phase, the region Rj of the pattern x is first found and then, using the classifier Dj (responsible for region j), the corresponding label is assigned to x. The region-to-classifier allocation can take two forms:

• Begin by specifying the regions and later assign a responsible classifier to each one. The K-Means algorithm is used in the training phase to establish the regions (clusters), following the method designed by Kuncheva [9, 8], called clustering and selection. The cluster centroids are then used to find the classifier that will label the test pattern in the classification phase.

• Find the region (or group of regions) where each classifier has the highest performance. Hartono et al. [13] determine the regions during the learning process of a neural network by incorporating a supervisor neuron with a "confidence" level into the output layer of the network structure. In this way, the selection is made according to the confidence level assigned to the test pattern. Mixture-of-experts systems employ similar schemes [12].
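The clustering-and-selection idea in the first bullet can be sketched as follows. This is our own minimal reconstruction, not Kuncheva's code: it assumes the centroids come from a prior K-Means run, and the helper names (`assign_experts`, `scs_classify`) are hypothetical:

```python
import math

def nearest_centroid(x, centroids):
    """Index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))

def assign_experts(centroids, classifiers, train_X, train_y):
    """For each region (cluster), pick the classifier with the highest
    training accuracy on the points that fall in that region."""
    experts = []
    for j in range(len(centroids)):
        members = [i for i, xi in enumerate(train_X)
                   if nearest_centroid(xi, centroids) == j]
        experts.append(max(
            classifiers,
            key=lambda clf: sum(clf(train_X[i]) == train_y[i] for i in members)))
    return experts

def scs_classify(x, centroids, experts):
    """Static selection: route x to the expert responsible for its region."""
    return experts[nearest_centroid(x, centroids)](x)

# Toy example: two clusters, an "always A" and an "always B" classifier
centroids = [(0.0, 0.0), (10.0, 10.0)]
clfs = [lambda p: "A", lambda p: "B"]
X = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 9.0)]
y = ["A", "A", "B", "B"]
experts = assign_experts(centroids, clfs, X, y)
print(scs_classify((0.5, 0.5), centroids, experts))  # -> A
```

Note that the expert assignment happens entirely in the training phase; classification only looks up the precomputed expert for the test pattern's region, which is what distinguishes this static scheme from the dynamic one below.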

2.2 Dynamic Classifier Selection (DCS)

Unlike static selection, in the dynamic strategy the competence regions are obtained during the operation phase. The division is commonly based on the accuracy of each decision, giving higher preference to the most accurate classifiers. The algorithms that estimate these local accuracies are called Dynamic Classifier Selection with Local Accuracy (DCS-LA). Woods et al. [10] propose two such methods. The first one, called Overall Local Accuracy (DCS-OLA), considers the percentage of test patterns correctly classified by each of the L classifiers. The second, Local Class Accuracy (DCS-LCA), considers the class assigned by a classifier to the test pattern and then calculates, using the k-NN rule, the percentage of training patterns correctly classified into that same class. The main difference between the two methods lies in the information used to compute the "local accuracy": the former has been called a priori selection, whereas the latter is known as a posteriori selection [7]. The general DCS-LA algorithm can be written as follows [10]:

1. Design the individual classifiers D1, . . . , DL. Pick the value of the parameter k.

2. Upon receiving an input x, label it with D1, . . . , DL. If all classifiers agree on the label, then assign this label to x and return.

3. If there is a disagreement, then estimate the local accuracy of each Di, i = 1, . . . , L. To do this, take the class label offered for x by Di, say s ∈ Ω, and find the k points closest to x for which Di has issued that same label. The proportion of those points whose true label is s is the estimate of the local accuracy of Di with respect to class s.

4. If there is a unique winner of the local accuracy contest, label x with its decision and return. Otherwise, check whether the tied winners offer the same label for x; in this case, accept the label and return. If a unique class label can be selected by plurality among the tied classifiers, then assign this label to x and return.

5. Otherwise, there is a class label tie among the most locally competent classifiers. Identify the classifier with the next highest local competence to break the tie. If all classifiers are tied and the class label is still tied, then pick a random class label among the tied labels and return. If there is a unique winner of this (second) local competence contest and it can break the tie, then use the winning label for x and return.

6. If none of the clauses in the previous point apply, then randomly take a label for x.

Two modifications of this method have been proposed: by Kuncheva [8], who randomly selects the winning classifier when ties appear, and by Giacinto [6], who makes the local estimation using class probabilities; unlike Woods, Giacinto only considers the class label assigned to the test pattern. In this paper, we propose a weighted k-NN method for estimating the local accuracy of each individual classifier, in order to increase the accuracy levels and to reduce the number of steps of the original DCS-LA algorithm.

3. Weighted Voting for the k-NN Rule

A voting rule for k-NN in which the votes of the different neighbors are weighted by a function of their distance to the input pattern was first proposed by Dudani [3]: a neighbor with a smaller distance is weighted more heavily than one with a greater distance. In Dudani's scheme, the nearest neighbor gets a weight of 1, the furthest neighbor a weight of 0, and the remaining weights are scaled linearly to the interval in between. Such weighting has been used both in the classification process itself and, in the context of ensembles, for dynamic weighted majority voting [5]. In this work, we use the so-called inverse distance (Equation 1) to weight the k-NN rule in the dynamic classifier selection context:

    w_j = 1 / d_j    if d_j ≠ 0    (1)

where d_j denotes the distance to the j-th nearest neighbor. Applying this measure to DCS-LA, we compute the weights of the k neighbors for each classifier in step 3 of the original algorithm. In other words, to estimate the local accuracy of each individual classifier Di with respect to a given class (say, s), we use the distances obtained with the weighted k-NN rule.
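Under these definitions, step 3 with the inverse-distance weighting of Equation (1) might look as follows. This is our own sketch, not the authors' implementation: the tie-breaking of steps 4-6 is collapsed into a deterministic `max`, and the handling of zero distances is an assumption, since the paper only defines w_j for d_j ≠ 0:

```python
import math

def inverse_distance_weight(d):
    """Eq. (1): w_j = 1/d_j for d_j != 0. Returning a very large weight
    for d_j == 0 is our own convention, not prescribed by the paper."""
    return 1.0 / d if d != 0 else 1e12

def weighted_local_accuracy(clf, x, train_X, train_y, k):
    """Weighted variant of step 3: among the k training points closest
    to x to which `clf` assigns the same label s it assigns to x, each
    point votes with weight 1/d_j instead of counting equally."""
    s = clf(x)
    same = [(math.dist(x, xi), yi)
            for xi, yi in zip(train_X, train_y) if clf(xi) == s]
    same.sort(key=lambda t: t[0])
    neigh = same[:k]
    if not neigh:
        return 0.0
    total = sum(inverse_distance_weight(d) for d, _ in neigh)
    hits = sum(inverse_distance_weight(d) for d, yi in neigh if yi == s)
    return hits / total

def dcs_la_weighted(classifiers, x, train_X, train_y, k=7):
    """Select the classifier with the highest weighted local accuracy
    and use its label for x (k = 7 as in the experiments below)."""
    best = max(classifiers,
               key=lambda c: weighted_local_accuracy(c, x, train_X, train_y, k))
    return best(x)
```

Note that `dcs_la_weighted` breaks ties by classifier order; the plurality and random tie-breaking of steps 4-6 of the original algorithm would need to be layered on top.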

4. Experimental Results

The results reported here correspond to experiments on eight real-problem data sets taken from the UCI Machine Learning Database Repository (http://www.ics.uci.edu/~mlearn). For each data set, the 5-fold cross-validation method was employed to estimate the classification error: 80% of the available patterns were used for training and the remaining 20% as the test set. It has to be noted that, in the present work, all the base classifiers correspond to the 1-NN rule.

The experiments basically consist of computing the classification accuracy when different decision methods are used in an ensemble. The weighting function proposed in this paper for the DCS-LA algorithm is compared to the basic DCS-LA (with k = 7 in both cases) and to the simple majority voting. The ensembles were built using four well-known resampling methods: random selection with no replacement [5], bagging [2], boosting [4], and Arc-x4 [1]; only the result of the best technique on each database is presented. Analogously, for the number of subsamples used to induce the individual classifiers, that is, the number of classifiers in the system, we experimented with 5 and 7 elements, and the best results are included in Figure 1 and Figure 2. Besides, the 1-NN classification accuracy on each original training set (i.e., with no combination) is also reported as the baseline classifier.

[Figure 1. Overall accuracy, ensemble with 5 classifiers. Methods: Single (1-NN), Simple Voting, DCS-LA, DCS-LA Weighted; databases: German, Heart, Wine, Waveform, Liver, Vehicle, Cancer, Satimage.]

[Figure 2. Overall accuracy, ensemble with 7 classifiers. Same methods and databases as Figure 1.]

From the results given in Figures 1 and 2, some comments can be drawn. First, it is clear that in all databases the use of an ensemble leads to better performance than the individual 1-NN classifier. Second, the application of the selection strategy outperforms the combination of classifiers by means of the simple majority voting. In fact, one can observe that the weighted DCS-LA presents a higher classification accuracy than the original (unweighted) DCS-LA on all databases. Except for the Cancer database, the differences between the weighted DCS-LA and the rest of the schemes are very significant.

For the statistical analysis, we use the resampled paired t test [14] to investigate whether there exist significant differences between the methods. Under the null hypothesis, this statistic has a t distribution with n - 1 degrees of freedom; for five trials, the null hypothesis can be rejected if |t| > t(4, 0.975) = 2.776. One of the purposes of the t test is to control the Type I error: a Type I error occurs when the null hypothesis is true (i.e., there is no difference between the two strategies) and the test nevertheless rejects it. Figures 3 and 4 show the comparison between the different combinations studied (Simple Voting vs. Weighted Voting, and DCS-LA vs. Weighted DCS-LA), averaged over the different resampling methods; the vertical axis gives the observed values of the t statistic, on which the null hypothesis can be rejected.

[Figure 3. Test comparisons, ensemble with 5 classifiers. Vertical axis: observed t value; same databases as above.]

[Figure 4. Test comparisons, ensemble with 7 classifiers.]

We can see that with all methods the null hypothesis is rejected, since all classifiers have ranks outside the marked interval and are therefore significantly different from the control.
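The resampled paired t statistic used above can be computed as follows; the per-trial error-rate differences in the example are hypothetical numbers for illustration, not values taken from the paper's experiments:

```python
import math

def resampled_paired_t(diffs):
    """Resampled paired t statistic [14] over n trials, where diffs[i]
    is the difference in error rate between the two compared methods
    on trial i. Under the null hypothesis it follows a t distribution
    with n - 1 degrees of freedom; with n = 5 trials, |t| > 2.776
    rejects the null hypothesis at the 95% confidence level."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean * math.sqrt(n) / math.sqrt(var)

# Hypothetical per-trial error-rate differences (illustrative only)
t = resampled_paired_t([0.031, 0.028, 0.035, 0.025, 0.030])
print(abs(t) > 2.776)  # -> True
```

A consistent positive difference across the five trials yields a large t value and hence a rejection of the null hypothesis, while differences that fluctuate around zero do not.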

5. Conclusion

When an ensemble is employed, one has to implement some procedure for combining the individual decisions of the base classifiers. Due to the difficulties described above, the efficiency of majority voting can become poor when the performance of the ensemble members is not uniform. In this practical situation, selection methods promise to improve the ensemble behavior by overcoming the drawbacks of majority voting.

In this paper, new methods for dynamic weighting in the framework of DCS-LA have been introduced. More specifically, a weighting function from the literature has been adapted for use in a voting system with dynamic selection; in particular, we have explored the inverse distance weight formulated by Dudani [3]. Experimental results with several real-problem data sets have shown the benefits of the dynamic weighting strategy over both the unweighted DCS-LA and the simple majority voting schemes. The results also corroborate that, in general, an ensemble clearly outperforms the individual 1-NN classifier.

Future work is primarily addressed to investigating other weighting functions applied to dynamic selection in an ensemble. Within this context, the use of several well-known data complexity measures [5] could be of interest to conveniently adjust the classifier weights. We are also interested in developing static selection methods. Finally, the results reported in this paper should be viewed as a first step towards a more complete understanding of the behavior of weighted voting procedures in DCS-LA algorithms; consequently, a more exhaustive analysis over a larger number of synthetic and real databases is still necessary.

Acknowledgments

This work has been partially supported by grants GV/2007/105 from the Generalitat Valenciana (Spain), DPI2006-15542 and CSD2007-00018 from the Spanish Ministry of Education and Science, and PROMEP/103.5/08/3016 from the Mexican SEP.

References

[1] L. Breiman: Arcing classifiers. The Annals of Statistics 26(3) (1998) 801-849.

[2] L. Breiman: Bagging predictors. Machine Learning 24(2) (1996) 123-140.

[3] S.A. Dudani: The distance-weighted k-nearest neighbor rule. IEEE Trans. on Systems, Man and Cybernetics 6(4) (1976) 325-327.

[4] Y. Freund, R.E. Schapire: Experiments with a new boosting algorithm. In: Proc. 13th Intl. Conference on Machine Learning, Morgan Kaufmann (1996) 148-156.

[5] R.M. Valdovinos, J.S. Sánchez: Class-dependant resampling for medical applications. In: Proc. 4th Intl. Conference on Machine Learning and Applications, Los Angeles, CA (2005) 351-356.

[6] G. Giacinto, F. Roli, G. Vernazza: Methods for designing multiple classifier systems. In: Proc. 2nd Intl. Workshop on Multiple Classifier Systems (MCS 2001), Lecture Notes in Computer Science, Cambridge, UK (2001) 78-87.

[7] G. Giacinto, F. Roli, G. Fumera: Selection of classifiers based on multiple classifier behaviour. University of Cagliari, Cagliari, Italy (2000).

[8] L.I. Kuncheva: Switching between selection and fusion in combining classifiers: an experiment. IEEE Trans. on Systems, Man and Cybernetics, Part B 32(2) (2002) 146-156.

[9] L.I. Kuncheva: Clustering-and-selection model for classifier combination. In: Proc. 4th Intl. Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies (KES 2000), Brighton, UK (2000).

[10] K. Woods, W.P. Kegelmeyer Jr., K. Bowyer: Combination of multiple classifiers using local accuracy estimates. IEEE Trans. on Pattern Analysis and Machine Intelligence 19(4) (1997).

[11] O. Matan: On voting ensembles of classifiers. In: Proc. 13th National Conference on Artificial Intelligence (AAAI-96), Workshop on Integrating Multiple Learned Models (1996) 84-88.

[12] C. Bauckhage, C. Thurau: Towards a fair 'n square aimbot: using mixtures of experts to learn context aware weapon handling. In: Proc. GAME-ON 2004, Ghent, Belgium (2004) 20-24.

[13] P. Hartono, S. Hashimoto: Ensemble of linear perceptrons with confidence level output. In: Proc. 4th Intl. Conference on Hybrid Intelligent Systems (HIS'04) (2004) 186-191.

[14] J. Demšar: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7 (2006) 1-30.