Boosted Noise Filters for Identifying Mislabeled Data

Shi Zhong, Wei Tang, and Taghi M. Khoshgoftaar
Department of Computer Science and Engineering
Florida Atlantic University
777 Glades Road, Boca Raton, FL 33431, USA
{szhong, wtang}@fau.edu, [email protected]

Abstract

In many practical classification problems, mislabeled data instances (i.e., class noise) exist in the acquired (training) data and often have a detrimental effect on classification performance. Identifying such noisy instances and removing them from the training data can significantly improve the trained classifiers. One such effective noise detector is the so-called ensemble filter, which predicts the instances misclassified by multiple learned classifiers as noise. This paper proposes a novel noise detection method that uses a boosting ensemble of ensemble noise filters. Multiple ensemble noise filters are built sequentially, with each one working on weighted instances. The weighting scheme follows the general boosting idea and reduces the weights of those instances that are confidently predicted as noise in previous runs. This method essentially wraps an existing ensemble filter-based noise detector with a second layer of boosting ensemble. Our experimental results on a range of real datasets from the UCI repository show the superiority of the proposed boosted noise detectors.

1 Introduction

Outliers and various kinds of errors are unavoidable in real data. For example, in medical domains, patients may be misdiagnosed and thus classified into the wrong category. In manufacturing processes, data sensors can be affected by temperature, humidity, and other environmental variables, or may be temporarily malfunctioning, leading to abnormal measurements. In a data acquisition process that is not completely automated, humans can make mistakes and type wrong numbers into databases. All such data records with anomalous or erroneous values can be viewed as noise. That is, we basically take the same view of "noise" as in [12], which defines noise as any data instance that does not follow the "true" model generating the majority of the data records.

Much existing work has shown that proper handling of noise usually leads to better classification performance than ignoring noise in the training data [6, 11, 1, 5, 15, 12, 7]. In general, there have been three different approaches to handling noise in data analysis: (a) designing robust algorithms that are insensitive to noise [9, 6]; (b) filtering out noise [1]; and (c) correcting noisy instances [11]. Robust algorithms are mostly built with a complexity control mechanism so that the resulting models do not overfit the training data and generalize well to future unseen data. Cross validation, Minimum Description Length, and Structural Risk Minimization are some commonly used model selection principles. Noise filtering techniques identify and eliminate potential outliers and mislabeled instances in the dataset. One typical machine learning method in this category is to use an ensemble of multiple classifiers and treat the data instances that are misclassified by a given majority of the classifiers as potential data noise (i.e., mislabeled instances). Noise correction methods are built upon the assumption that each attribute or feature in the data is correlated with the others and can be reliably predicted. The correction process starts by predicting the value of each feature for each data instance from the other features. Heuristics are then used to determine whether one should change ("correct") the original value of a feature for an instance to the predicted value. This approach is usually more computationally expensive than the first two, and runs the risk of "correcting" clean instances. It has been argued [4] that the robust algorithm approach is less effective than the other two approaches, which directly handle noise in the training data before building classification models. The filtering approach seems to be safer than the correcting approach: since noise identification can never be foolproof, removing a clean instance is probably less harmful than relabeling it incorrectly. For example, in [8], it is shown that removing predicted noise instances improves classification models more than "correcting" the labels of those instances.

In this paper, we focus on a typical ensemble approach to noise filtering, which has attracted much research interest lately [1, 5, 15, 12, 7]. The ensemble filter method essentially builds multiple classifiers and predicts the instances misclassified by a majority or all of the classifiers as mislabeled instances. Existing ensemble noise filters differ from one another mainly in the following aspects:

- multiple predictions for an instance could come from different classifiers trained on the same data subset [1], or from the same classifier trained on different data subsets [5, 15, 12];

- data partitioning could use hard n-fold cross validation [1, 5, 15] or resampling (e.g., bagging) [12]; with n-fold cross-validation partitioning, each classifier could be trained on a major set (a union of folds) [1, 5] or on a minor set (a single fold) [15];

- predictions could be made only for held-out instances [1], only for training instances [5], or for all instances [15].

We picked a bagging filter as the baseline ensemble noise filter investigated in this paper for the following reasons. First, as discussed in [15], predicting noise on held-out instances may not be as accurate as predicting on training instances. Thus we want to use the trained classifiers to predict noise mainly on training instances. The bagging filter [12] is one of the noise detectors satisfying this need. Second, the comparative study in [12] showed that the bagging filter algorithm is among the best performing methods for detecting class noise. Finally, the bagging filter is an appropriate choice for building another layer of ensemble on top of it, as seen in Section 3.

Aiming to improve the noise detection performance of existing ensemble filters, we propose a novel boosted noise filter method, which combines multiple runs of a bagging filter. At each run, data instances are weighted using previous filtering results in such a way that confident noise instances receive lower weights. The main difference from traditional boosting for classification problems is that here we do not know the true noise identity of each instance. Consequently, we cannot focus on incorrectly predicted noise (we do not have this information) in each subsequent run. Instead, we focus on instances with uncertain noise identity. In this paper, we compare two weighting strategies for the proposed boosted noise filter approach. The first is to reduce the weights of noisy instances in accord with our confidence: the more likely an instance is noise according to our prediction, the smaller its weight becomes. The second strategy is to reduce the weights of both noise and clean instances in accord with our confidence. If we are very certain that an instance is clean (i.e., correctly labeled), its weight shrinks in the next run, just like that of a certain noise instance. In other words, we try to focus only on uncertain instances in subsequent runs.

In addition to the new boosted noise filter, this paper differs from existing work in that we focus on evaluating noise detection performance instead of inductive learning performance after eliminating noise. Since the benefit of removing noise for classification performance has been shown in many papers, we mainly present noise detection performance in this paper. Furthermore, we use a more comprehensive evaluation criterion, the precision-recall curve, to compare the bagging filter and the boosted bagging filters. Our experimental results over a wide range of UCI datasets show that the boosted bagging filter (with the first weighting strategy above) can significantly outperform the base bagging filter when the noise level is higher than 15%.

Most existing work dealt with class noise, which is usually acknowledged to be an easier problem than attribute noise. It has also been reported that handling attribute noise yields less gain in learning performance than filtering class noise [14]. Although we focus on class noise in this paper, the methodology presented is potentially useful for handling attribute noise as well.

The organization of this paper is as follows. Section 5 discusses related work. Section 2 presents the baseline bagging filter used in this paper. Section 3 proposes two boosted bagging filter algorithms. Section 4 presents a comparative study of the aforementioned algorithms on a range of UCI datasets and demonstrates the effectiveness of the boosted bagging filter. Finally, concluding remarks are given in Section 6.

2 Bagging Filter

The base noise filter used in this paper is shown in Figure 1. At the heart of the bagging filter is the resampling component, which generates multiple classifiers, each trained on a random sample of the original dataset. This is also the core difference from the existing majority or consensus filters [5, 15]. The algorithm simply returns a ranked list of all instances in order of decreasing certainty about the noise identity of each instance. The certainty is indicated by a noise count (NC) value. In the results section, we use this ranked list to draw precision-recall curves (see Section 4.2). Of course, either majority or consensus voting can always be used to cut the list at a certain point to obtain a set of instances predicted as noise. However, such a hard decision makes evaluation more difficult, since different methods may cut at different points and thus have different precision and recall values. Even an averaged F-value is not as intuitive as the precision-recall curves presented in this paper.

Input: corrupted dataset E = {(x_i, y_i)}, i = 1, ..., N; number of bootstrap samples T_b
Output: a set of noise counts {NC(i)}, one for each data instance

begin
    initialize noise counts NC(i) = 0 for i = 1, ..., N;
    initialize instance weights w(i) = 1 for i = 1, ..., N;
    for t = 1 to T_b do
        E_t = ResampleWithWeight(E, w);
        h_t = buildClassifier(E_t);
    end
    for i = 1 to N do
        for t = 1 to T_b do
            if h_t(x_i) != y_i then NC(i)++;
        end
    end
    return sorted instances in the order of decreasing NC
end

Figure 1: Bagging filter algorithm (BF).

The ResampleWithWeight() function returns a bootstrap sample of the data drawn according to the normalized weights (used as probabilities). The number of bootstrap samples T_b is usually set to 10. The buildClassifier() function returns a trained hypothesis function h_t, which is used to predict the class of each instance. The noise count NC(i) records the number of classifiers that misclassify instance x_i.
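As an illustration, the bagging filter could be implemented roughly as follows in Python. We use scikit-learn's DecisionTreeClassifier as a stand-in for the base learner (C4.5 in our experiments, see Section 4.2); the function name, interface, and inputs (numpy arrays) are illustrative rather than part of the original implementation.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_filter(X, y, n_samples=10, weights=None, random_state=0):
        # Return a noise count for each instance: the number of bootstrap-trained
        # classifiers that misclassify it. Higher counts suggest mislabeled data.
        rng = np.random.RandomState(random_state)
        n = len(y)
        if weights is None:
            weights = np.ones(n)
        p = weights / weights.sum()        # normalized weights as sampling probabilities
        noise_counts = np.zeros(n, dtype=int)
        for _ in range(n_samples):
            idx = rng.choice(n, size=n, replace=True, p=p)    # weighted bootstrap sample
            clf = DecisionTreeClassifier().fit(X[idx], y[idx])
            noise_counts += (clf.predict(X) != y).astype(int)  # misclassifications over all instances
        return noise_counts

Sorting instances by decreasing noise count yields the ranked list used for the precision-recall curves; a majority cut, for instance, would flag the instances whose count exceeds n_samples/2.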

Input: corrupted dataset E = {(x_i, y_i)}, i = 1, ..., N; number of bootstrap samples T_b; and number of runs T
Output: a set of total noise counts {TNC(i)}, one for each data instance

begin
    initialize total noise counts TNC(i) = 0 for i = 1, ..., N;
    initialize instance weights w(i) = 1 for i = 1, ..., N;
    for r = 1 to T do
        for t = 1 to T_b do
            E_t = ResampleWithWeight(E, w);
            h_t = buildClassifier(E_t);
        end
        reset local noise counts NC(i) = 0 for i = 1, ..., N;
        for i = 1 to N do
            for t = 1 to T_b do
                if h_t(x_i) != y_i then NC(i)++;
            end
        end
        TNC(i) = TNC(i) + NC(i) for i = 1, ..., N;
        for i = 1 to N do
            w(i) = exp(-TNC(i)/T_b);
        end
        normalize weights so that sum_i w(i) = N;
    end
    return sorted instances in the order of decreasing TNC
end

Figure 2: Boosted bagging filter algorithm - I (BBF-I).

3 Boosted Bagging Filters

In this section, we propose two boosted noise filters. The base noise filter is the bagging filter presented in the previous section. The two boosted bagging filters (BBFs) differ only in instance weighting. The first one (BBF-I) is shown in Figure 2. The algorithm essentially reduces the weights of instances with high total noise counts after each round, since the weight of each instance decreases exponentially with its total noise count, w(i) = exp(-TNC(i)/T_b). The number of runs T is given as an input to the algorithm. It could be determined automatically by a stopping criterion, e.g., when the number of instances with a noise count of T_b drops below a certain threshold. However, in our experiments we observed that the noise filtering results are relatively stable across different values of T, so we simply pick T = 10. The algorithm returns a ranked list of instances in the order of decreasing total noise counts.

The second algorithm (BBF-II) is almost the same as Figure 2 except for the weight update step at the end of each round:

    w(i) = w(i) * exp(-|NC(i) - T_b/2| / (T_b/2)),

where T_b/2 is the middle noise count. A high value of |NC(i) - T_b/2| signifies high confidence in the instance being either noisy or clean. This weighting scheme effectively boosts the weights of uncertain instances. Compared to BBF-I, this algorithm reduces the weights of clean instances in addition to confident noise instances.
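A corresponding sketch of the boosted bagging filters, reusing the bagging_filter function sketched in Section 2, might look as follows. The exponential weight updates mirror the forms described above, but the exact constants and the helper names are illustrative.

    import numpy as np

    def boosted_bagging_filter(X, y, n_runs=10, n_samples=10, scheme="BBF-I"):
        # Run the bagging filter n_runs times, reweighting instances between runs.
        # BBF-I : down-weight instances confidently predicted as noise (high total count).
        # BBF-II: down-weight both confidently noisy and confidently clean instances,
        #         so subsequent runs focus on instances with uncertain noise identity.
        n = len(y)
        weights = np.ones(n)
        total_counts = np.zeros(n)
        for r in range(n_runs):
            counts = bagging_filter(X, y, n_samples=n_samples,
                                    weights=weights, random_state=r)
            total_counts += counts
            if scheme == "BBF-I":
                weights = np.exp(-total_counts / n_samples)
            else:  # BBF-II
                weights = weights * np.exp(-np.abs(counts - n_samples / 2.0)
                                           / (n_samples / 2.0))
            weights *= n / weights.sum()   # renormalize so the weights sum to n
        return total_counts                # rank instances by decreasing total count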

4 Experimental Results

In this section, we first describe the datasets used in our experiments, then the experimental setup, and finally present an analysis of results.

4.1 Datasets

We experimented on the 12 datasets shown in Table 1. They are available from the UCI Machine Learning Repository [10] and represent a wide range of data characteristics, with the number of instances ranging from 432 to 67557, the number of nominal features from 0 to 61, the number of continuous features from 0 to 30, and the number of classes from 2 to 5. In our experiments, we assume all these datasets are relatively clean and inject artificial class noise to test the performance of our proposed noise filters. The assumption is not a bad one given that most of these datasets are easy to classify, as discussed next.

To get an idea about the inherent complexity of each dataset, we can look at the classification performance. Instead of repeating a large number of experiments, we leverage existing published results. For example, classification accuracies are reported for all the datasets in Table 1 at different noise levels in [16]. At the 5% noise level, the accuracies range from a little over 50% to close to 100%. The CMC dataset is the most difficult one (with a best accuracy of 53%), followed by connect-4 (with a best accuracy of 75%) and the other datasets (all with an accuracy of 83% and above). As we shall see in the later discussion, these observations help explain some of the noise detection results.

Table 1. Summary of datasets. Columns two through five are the number of instances, number of nominal features, number of continuous features, and number of classes, respectively.

Data          #instances   #nominal   #continuous   #classes
connect-4     67557        42         0             3
adult         48842        8          6             2
nursery       12960        8          0             5
mushroom      8124         22         0             2
sick          3772         22         7             2
kr-vs-kp      3196         36         0             2
car           1728         6          0             4
CMC           1473         7          2             3
tic-tac-toe   958          9          0             2
credit-a      690          9          6             2
WDBC          569          0          30            2
monks3        432          6          0             2


4.2 Experimental Setting

To add noise, we adopt the same pairwise scheme used in [15]: given a pair of classes (c1, c2) and a noise level x, an instance with label c1 is corrupted and mislabeled as c2 with probability x, as is an instance of class c2 (mislabeled as c1). This corruption method is a reasonable simulation of real scenarios, in which usually only certain types of classes are likely to be mislabeled. In the following experiments, we corrupt only one pair of classes (usually the pair of classes having the highest proportion of instances). This leads to an actual noise percentage that is lower than the specified x, but we still report the x value (not the actual noise level in the dataset) in all results, as in [15]. Both the number of boosting runs T and the number of bootstrap samples T_b are set to 10. The classifier used in the buildClassifier() function in Figures 1 and 2 is the C4.5 algorithm [9] implemented in the Weka tool [13], with its default settings.
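For reference, the pairwise corruption scheme can be sketched as follows; the function name and the ground-truth mask it returns are our own conveniences for the evaluation, not part of the scheme in [15].

    import numpy as np

    def inject_pairwise_noise(y, class_a, class_b, noise_level, random_state=0):
        # Flip labels between one pair of classes: each instance of class_a becomes
        # class_b with probability noise_level, and vice versa; other classes untouched.
        rng = np.random.RandomState(random_state)
        y_noisy = y.copy()
        flip = rng.rand(len(y)) < noise_level
        y_noisy[(y == class_a) & flip] = class_b
        y_noisy[(y == class_b) & flip] = class_a
        is_noise = y_noisy != y            # ground truth used only for evaluation
        return y_noisy, is_noise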



Given a ranked list of all instances from the most likely noise to the least likely noise, we can cut the list at a sequence of points and calculate precision and recall values at those points. At each cut point n, suppose there are m true noise instances among the top n instances (in the ranked list) and k true noise instances not in the top n; we define precision = m/n and recall = m/(m+k).

When we act conservatively and use a small n, the recall value will be small and the precision value will usually be high. If we predict all instances as noise, we get 100% recall but a low precision value. When comparing two curves, we say one is better than the other if it lies mostly above the other. Theoretically, we can have one point on the precision-recall curve for every n from 1 to the dataset size. To get smoother curves, we plot average precision and recall values every 50 instances; for example, the first point on a curve is averaged over the first 50 cut points.
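The precision-recall points, including the 50-instance averaging, can be computed from the ranked list and the corruption mask along the following lines (an illustrative sketch, not the exact plotting code used for the figures).

    import numpy as np

    def precision_recall_points(noise_counts, is_noise, step=50):
        # Precision and recall at every cut point of the ranked list, averaged over
        # windows of `step` consecutive cut points to smooth the curve.
        order = np.argsort(-noise_counts)        # most likely noise first
        hits = np.cumsum(is_noise[order])        # m: true noise among the top n
        n_cut = np.arange(1, len(order) + 1)     # n: size of the predicted-noise set
        precision = hits / n_cut                 # m / n
        recall = hits / is_noise.sum()           # m / (m + k)
        n_windows = len(order) // step
        p_avg = precision[:n_windows * step].reshape(n_windows, step).mean(axis=1)
        r_avg = recall[:n_windows * step].reshape(n_windows, step).mean(axis=1)
        return p_avg, r_avg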

4.3 Results Analysis

Judging from the precision-recall curves, BF and BBF-I are effective for most datasets. For example, at a recall rate of 0.4, the precision is higher than 0.8 for the majority of datasets (9 out of 12 at a noise level of 15%). The one dataset on which BF and BBF-I work very poorly is the CMC dataset, which happens to be the most difficult dataset, as discussed in Section 4.1. When we plot the precision values (at a recall rate of 0.4 and a noise level of 15%) against the accuracy values discussed in Section 4.1 (obtained from [16]), we see a highly linear correlation between them (see Figure 3; the correlation coefficient is 0.93).

Figures 5-16 show the precision-recall curves for the 12 datasets in Table 1. Overall, BBF-I significantly outperforms BBF-II, except at low (below 20%) noise levels for the adult, car, and nursery datasets. The reason BBF-II performs poorly may be that too many clean instances are weighted low. The noise filter constructed in the next round then loses strong support from clean data instances, which are vital to the success of ensemble filtering schemes. This scenario is more common at high noise levels; at low noise levels, BBF-II may not suffer since clean instances are abundant. The performance of both BF and BBF-I decreases as the noise level increases, but BBF-I degrades at a much slower rate and thus provides noise detection performance superior to BF at higher noise levels. More comparisons are summarized in Table 2. The win-tie-loss decisions are based on our subjective judgment. Despite possible human errors, the trend is clear: for high noise scenarios (above 15%), BBF-I is dominantly better than BF.
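The correlation coefficient reported above can be computed directly once the per-dataset accuracies and precision values have been collected; the sketch below is a minimal illustration (the function name and arguments are ours, and no actual values are included).

    import numpy as np

    def accuracy_precision_correlation(accuracies, precisions):
        # Pearson correlation between best classification accuracy per dataset
        # (e.g., taken from [16]) and noise detection precision at a fixed recall,
        # as visualized in Figure 3.
        return np.corrcoef(accuracies, precisions)[0, 1]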



Figure 3. The linear correlation between classification accuracy and noise detection precision.

Figure 4. Noise filtering results on the credit-a dataset at the 25% noise level, for different values of T. BF is essentially BBF-I (T=1).

Table 2. Summary of comparisons between BBF-I and BF for different noise levels. A win-tie-loss of 4-4-4 means BBF-I is better than BF on 4 datasets, comparable to BF on 4 datasets, and worse than BF on 4 datasets.

Noise level    BBF-I vs. BF (win-tie-loss)
5%
10%
15%
25%
35%
40%

Finally, we show an example of the effect of T, the number of runs used in the boosting procedure. Figure 4 shows precision-recall curves on the credit-a dataset for several different T values (1, 2, 5, 25, and 255). The noise level used is 25%. We mentioned above that T is fixed at 10 in our experiments. In this particular example, there is actually no difference for any T greater than 1.

5 Related Work

More complete and detailed reviews of existing work on ensemble filter approaches to noise detection can be found in many papers [15, 12, 14]. Here we focus on work related to the idea of boosting noise filters.

Verbaeten and Van Assche [12] compared n-fold cross validation, bagging, and boosting approaches, and found that the first two work equally well and both are better than boosting. It is worth noting that, in their work, boosting is used to obtain a base noise filter. At each run, each training instance that is incorrectly predicted by the trained classifier gets a higher weight in the next run. At the end of boosting, the instances with large weights are identified as noise. In contrast, we apply the boosting idea on top of a base noise filter. In short, their method applies boosting to classifiers (though the final instance weights are used for noise filtering), whereas we apply boosting to noise filters.

Zhu et al. [15] mentioned a multiple-round noise elimination idea which bears some similarity to our boosted noise filter, but they did not formulate it as a second-level ensemble. They reported improvement in the number of noisy instances removed over multiple rounds. At each round, the noise identified in the previous round is removed, along with some clean instances. This can be viewed as using a weight of 0 for the removed instances and a weight of 1 for the rest. From this viewpoint, our approach effectively uses soft weighting on all instances. Soft weighting has the advantage that instances incorrectly identified as noise at one run have a chance to be corrected in later runs (whereas with hard weighting, eliminated instances can never make their way back).

John [6] proposed to build a robust decision tree classifier by removing misclassified instances from the training data over multiple iterations (until every training instance can be correctly classified by the trained classifier). This can be roughly viewed as a hard-weighting counterpart of our first weighting scheme, but with a single-classifier noise filter.

The instance weighting scheme used in our boosted noise filters essentially follows the formulation in [2], but is based on different uncertainty/confidence measures. An indirect motivation comes from [3], which applies boosting to clustering problems. Thus, our paper also effectively serves as another example of applying boosting to non-classification problems.

6 Conclusion

We have presented a novel boosted noise filter approach to identifying mislabeled data. The proposed approach can be seen as a good example of using boosting for non-classification problems. We have demonstrated the effectiveness of the boosted approach through experimental results on 12 UCI datasets. The superiority of the proposed boosted bagging filter algorithm over the regular bagging filter algorithm is clearly seen for medium to high noise (when the noise level is greater than 15%). In the future, we plan to compare with other base filters, such as the consensus filter (using n-fold cross validation), investigate different instance weighting strategies in the boosting algorithm, and compare other algorithms for the second-level ensemble.

References

[1] C. E. Brodley and M. A. Friedl. Identifying mislabeled training data. Journal of Artificial Intelligence Research, 11:131–167, 1999.
[2] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[3] D. Frossyniotis, A. Likas, and A. Stafylopatis. A clustering method based on boosting. Pattern Recognition Letters, 25:641–654, December 2003.
[4] D. Gamberger, N. Lavrač, and S. Džeroski. Noise detection and elimination in data preprocessing: Experiments in medical domains. Applied Artificial Intelligence, 14(2):205–223, 2000.
[5] D. Gamberger, N. Lavrač, and C. Grošelj. Experiments with noise filtering in a medical domain. In Proc. 16th Int. Conf. Machine Learning, pages 143–151, San Francisco, CA, 1999.
[6] G. H. John. Robust decision trees: Removing outliers from databases. In Proc. 1st ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 174–179, Menlo Park, CA, 1995.
[7] T. M. Khoshgoftaar, S. Zhong, and V. Joshi. Enhancing software quality estimation using ensemble-classifier based noise filtering. Intelligent Data Analysis: An International Journal, 9(1):3–27, 2005.
[8] F. Muhlenbach, S. Lallich, and D. A. Zighed. Identifying and handling mislabelled instances. Intelligent Information Systems, 22(1):89–109, 2004.
[9] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
[10] C. Blake, S. Hettich, and C. Merz. UCI repository of machine learning databases, 1998.
[11] C. M. Teng. Correcting noisy data. In Proc. 16th Int. Conf. Machine Learning, pages 239–248, 1999.
[12] S. Verbaeten and A. Van Assche. Ensemble methods for noise elimination in classification problems. In T. Windeatt and F. Roli, editors, Multiple Classifier Systems: Fourth International Workshop, volume 2709 of Lecture Notes in Computer Science, pages 317–325, 2003.
[13] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, 1999.
[14] X. Zhu and X. Wu. Class noise vs. attribute noise: A quantitative study. Artificial Intelligence Review, 22(3):177–210, 2004.
[15] X. Zhu, X. Wu, and Q. Chen. Eliminating class noise in large datasets. In Proc. 20th Int. Conf. Machine Learning, pages 920–927, Washington, DC, 2003.
[16] X. Zhu, X. Wu, and Q. Chen. Eliminating class noise in large, distributed databases. Technical report, Dept. of Computer Science, University of Vermont, 2003. Available at http://www.cs.uvm.edu/tr/CS-03-12.shtml.

Figure 5. Noise detection results on the connect-4 dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 6. Noise detection results on the adult dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 7. Noise detection results on the nursery dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 8. Noise detection results on the mushroom dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 9. Noise detection results on the sick dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 10. Noise detection results on the kr-vs-kp dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 11. Noise detection results on the car dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 12. Noise detection results on the CMC dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 13. Noise detection results on the tic-tac-toe dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 14. Noise detection results on the credit-a dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 15. Noise detection results on the WDBC dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.

Figure 16. Noise detection results on the monks3 dataset with six different noise levels: (a) 5%; (b) 10%; (c) 15%; (d) 25%; (e) 35%; and (f) 40%.
