Classification in the Presence of Class Noise
Supplementary Material
Yunlei Li
Marcel J. T. Reinders
Lodewyk F. A. Wessels
Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
[email protected]
Keywords: pattern recognition, adaptive image segmentation, class noise, kernel, Fisher Discriminant
1. Estimating the Number of Mixtures
One of the main difficulties in using a mixture of Gaussians (MoG) model to represent the class-conditional density is choosing the number of components K. Many approaches have been proposed to estimate K, e.g. Wolfe (1971), Kass & Raftery (1995), Jain & Moreau (1987), Levine & Domany (2001). One popular method, the resampling technique, assesses the stability of the clustering results under perturbations. The underlying assumption is that the more stable the results are with respect to the perturbations, the more representative they are of the real structure. The resampling method works as follows. The dataset is repeatedly sampled and the sampled datasets are clustered. The clustering results are then compared to find the setting of the cluster parameters that yields the most stable clustering. More precisely, given a training set D of n samples, two subsets D1 and D2 of D are created by randomly selecting f × n (0 < f