
One-class and Multi-class classifier combining for ill-defined problems

Thomas C.W. Landgrebe ∗, David M.J. Tax, Pavel Paclík, Robert P.W. Duin, Colin M. Andrew

Elect. Eng., Maths and Comp. Sc., Delft University of Technology, Delft, The Netherlands

Abstract

Classifier performance can be severely affected when new, unseen classes are present, or when the conditional distribution of one of the classes changes. Both classification and rejection performance should then be considered. The distance-based reject-option is commonly used in this situation, but the model chosen for classification is reused for rejection: a model chosen to emphasise classification performance may come at the expense of rejection performance, and the opposite also holds. In this paper a classification strategy is presented, consisting of the sequential combining of one-class and multi-class classifiers; two variants of this classifier are presented. These strategies have the flexibility to select distinct models for classification and rejection, and operate on local regions of the data to emphasise either classification or rejection. An evaluation methodology is presented, and a number of real-world experiments are carried out that illustrate the potential of this approach, showing that in some situations it can improve over the reject-option.

Key words: Ill-defined classes, Reject-option, Classifier-combining, One-class classification, Multi-stage

∗ Corresponding author. Tel.: +31 (0)15 27 88433; fax: +31 (0)15 27 81843; P.O. Box 5031, 2600 GA Delft; Mekelweg 4, 2628 CD Delft, The Netherlands. Email addresses: [email protected] (Thomas C.W. Landgrebe), [email protected] (David M.J. Tax), [email protected] (Pavel Paclík), [email protected] (Robert P.W. Duin), [email protected] (Colin M. Andrew).

Preprint submitted to Elsevier Science, 14 October 2004

1 Introduction

Typical detection problems require the recognition of examples originating from a known target class ωt, distributed among examples originating from an outlier class ωo. A typical procedure is to gather a representative dataset for both classes, providing enough information to estimate the priors and class-conditional densities. Bayes' rule can then be used to assign class membership to each example in a given test set, i.e. assign the example to the class with the highest posterior probability [5]. However, in some applications there is a difference between the conditions assumed during training and the actual conditions. The problem emphasised here is one in which the target class is well represented during training, but the conditional distribution of a second, outlier class may differ from that assumed during training. This may be due to new, unseen outlier classes or clusters, or to physical phenomena that cause the distribution to vary, e.g. sensor drift [3]. The implication is that classifier performance may be worse than expected. Examples of applications affected by this problem:

• Diagnostic problems in which the objective of the classifier is to identify abnormal operation (outlier class) as opposed to normal operation (target class) [4]. It is often the case that a representative training set can be gathered for the target class, but due to the nature of the problem the outlier class cannot be sampled in a representative manner. For example, in machine fault diagnosis [17] a destructive test for all possible abnormal states may not be feasible.

• Recognition systems, often involving a rejection and a classification stage, for example road sign classification, in which a classifier must not only discriminate between examples of road sign classes, but must also reject non-sign examples [12]. Gathering a representative set of non-signs may not be possible. Similarly face detection [13], where a classifier must deal with well-defined face classes and an ill-defined non-face class, and handwritten digit recognition [10], where non-digit examples are a serious issue.

We argue that in these situations, even though the outlier class is not necessarily well defined, a classifier can still benefit from the partial knowledge of this class. The rationale is to design a classifier that makes a good trade-off between the known classes (which may be overlapping), and to protect the classifier against changing conditions with respect to the outlier class. Previous work in this area has typically been the classifier with reject-option. Dubuisson and Masson proposed the distance reject-option in [4]. This rejection scheme was designed to cope with the condition in which new classes are present that were not represented during training, introducing an additional

reject class. New examples situated beyond a particular distance from the known class centroids are rejected; a distance reject threshold td is chosen on which to base this decision. A similar procedure can be applied to density-based classifiers, except that here the class-conditional density is thresholded. The limitation of this approach is that the same model is used to perform both the classification and the rejection: optimising one of these may be at the expense of the other, as will be shown later. In this paper we present a classification strategy to tackle this problem, consisting of sequential combinations of one-class and multi-class classifiers (called SOCMC). The proposed two-stage schemes allow both rejection and classification performance to be explicitly modified by varying the respective models and representations. One model is chosen locally in the area of known overlap, focused on classification performance, and another model focuses on describing the class boundary such that the probability of accepting new outlier examples is minimal. This was first demonstrated in [9]. In this paper we present, in addition, a variant of the SOCMC classifier involving two one-class classifiers at the first stage (trained on ωt and ωo respectively), and a second stage that handles ambiguous examples only. We call this strategy SOCMC-M. The SOCMC-M selects a small, local region in feature space, focused on the overlap region only, for the second stage. The original SOCMC, however, selects a much larger local region, much of which is redundant in terms of designing a focused model in the region of known overlap. Notation and evaluation are presented in Section 2. The classifier with reject-option is summarised in Section 3. In Section 4 the new combining schemes are presented. In Section 5 a series of experiments is carried out on a number of real-world datasets, comparing classical discriminant-based classifiers, reject-option classifiers, and the new SOCMC and SOCMC-M classifiers. Also included is an evaluation methodology that can be carried out for these ill-defined problems. Conclusions are given in Section 6.
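As a concrete illustration of the sequential idea, the following sketch chains a one-class acceptance test with a two-class discriminant on an artificial 1-D problem. The Gaussian class models, their parameters, and the threshold are our own illustrative assumptions, not values from the paper's experiments:

```python
import math

# A minimal 1-D sketch of the sequential one-class/multi-class idea:
# a one-class model on the target class first decides accept/reject;
# a two-class discriminant then labels the accepted examples.
# All class parameters below are hypothetical.

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def socmc_classify(x, theta=0.01):
    # Stage 1: one-class classifier on the target class (mean 0, std 1).
    # Examples with too low a target density are rejected as outliers,
    # protecting against unseen classes.
    if gaussian_pdf(x, 0.0, 1.0) < theta:
        return "outlier"
    # Stage 2: two-class discriminant between the known target (mean 0)
    # and known outlier (mean 2) classes, applied only inside the
    # accepted region.
    p_target = gaussian_pdf(x, 0.0, 1.0)
    p_outlier = gaussian_pdf(x, 2.0, 1.0)
    return "target" if p_target > p_outlier else "outlier"

print(socmc_classify(0.2))  # deep inside the target class
print(socmc_classify(1.8))  # accepted by stage 1, labelled outlier by stage 2
print(socmc_classify(6.0))  # far from the target domain: rejected at stage 1
```

The point of the construction is that the stage-1 model (and its threshold) can be tuned for rejection while the stage-2 model is tuned for classification, independently of one another.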

2 Notation and Evaluation

Consider a vector of measurements x of dimensionality d representing each example for a given problem, such that an object i can be represented by xi = (xi,1, xi,2, ..., xi,d). Assume that x is drawn from an unconditional density p(x), composed of a target and an outlier class. The target class ωt is well defined, and the outlier class is only partially represented, with the possibility that data can occur that is not distributed as assumed during training 1.

1 We do however assume that the domain remains relatively constant, with the possibility of new examples or clusters occurring outside this domain. If the domain changes completely, a one-class classifier strategy [14] may be the most appropriate.

Define ωo as the

class that is represented in training, and ωr as the new/reject class that is ill-defined and should be classified as outlier. Define the respective class priors as p(ωt), p(ωo), and p(ωr) 2, and the respective class-conditional densities as p(x|ωt), p(x|ωo), and p(x|ωr). p(x) can then be written as in Equation 1.

p(x) = p(ωt)p(x|ωt) + p(ωo)p(x|ωo) + p(ωr)p(x|ωr)    (1)

In training we have access to ωt and ωo; in testing ωr will also appear. Two classifier types are dealt with in this paper. The first are multi-class classifiers (sometimes referred to as MCCs or discriminators), which here are typically two-class discriminant classifiers, denoted DMCC. For example, a classifier trained on ωt and ωo would be defined as in Equation 2, with p̂(ω|x) representing an estimate of the posterior probability of class ω. These classifiers result in an open/unconstrained decision boundary, which is not robust to new classes or to changes in the distribution of one of the classes.

DMCC: target if p̂(ωt|x) > p̂(ωo|x), outlier otherwise    (2)

The second classifier type used is the one-class classifier (sometimes referred to as an OCC), denoted DOCC [14]. These classifiers are trained on only a single class, resulting in a closed description of the class density or domain. No assumptions about other classes are made, and thus these classifiers do not make a trade-off between overlapping classes. The decision boundary is, however, closed, i.e. all objects situated outside the class description are rejected as outliers, providing protection against new, unseen classes. The OCC description/model is trained with some allowance for outliers in the training set, made by adjusting a decision threshold θ. DOCC can be written as in Equation 3, classifying all objects as either target or outlier.

DOCC: target if p̂(x|ωt) > θ, outlier otherwise    (3)
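In practice the threshold θ of Equation 3 is often set on the training targets themselves, e.g. so that a chosen fraction of them falls outside the description. A minimal sketch, assuming a 1-D Gaussian target model and a 5% target-rejection rate (both our own illustrative choices, not the paper's):

```python
import math
import random

random.seed(1)

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Illustrative one-class classifier: fit a Gaussian to synthetic target
# data, then pick theta so that roughly 5% of the training targets are
# rejected, i.e. the allowance for outliers in the training set.

train = [random.gauss(0.0, 1.0) for _ in range(1000)]
mu = sum(train) / len(train)
sigma = math.sqrt(sum((x - mu) ** 2 for x in train) / len(train))

densities = sorted(gaussian_pdf(x, mu, sigma) for x in train)
theta = densities[int(0.05 * len(densities))]  # 5th-percentile density

def d_occ(x):
    # Equation 3: accept as target only where the estimated target
    # density exceeds theta; everything else is an outlier.
    return "target" if gaussian_pdf(x, mu, sigma) > theta else "outlier"

accepted = sum(d_occ(x) == "target" for x in train)
print("fraction of training targets accepted:", accepted / len(train))
print(d_occ(5.0))  # far outside the description
```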

Essential to reasoning and experimentation is the development of a suitable evaluation for this type of problem. A typical evaluation procedure for well-represented problems is to inspect the classifier error rates on independent test data that is representative of the training data. This performance measure is denoted perf(ωt, ωo) here for a two-class problem between a known target and outlier class (ωt and ωo), as discussed above. We loosely refer to this as the seen or classification performance. In the case in which the outlier conditional distribution varies, it is of interest to evaluate the classifier with respect to its robustness to changing conditions. This performance is referred to as the rejection/unseen performance. As discussed, the outlier class is artificially decomposed into ωo and ωr, referring to known and unknown information about the outlier class. The reject performance is denoted perf(ωt, ωr) (a methodology to estimate it is given in Section 5). The best classifier would obtain the highest perf(ωt, ωo) and perf(ωt, ωr) scores. Experiments show that there is in fact a trade-off between these two measures; optimising this trade-off is the topic of current research. In the experiments in this paper we compare reject-option and SOCMC approaches by using the same classification models (which result in a similar classification performance), and then primarily compare approaches on the basis of rejection performance.

2 Unknown or varying class priors can have a significant effect on classifier performance. A discussion of the implications and evaluation strategies in this case can be found in [8].
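The seen/unseen evaluation can be sketched as follows. The three synthetic 1-D classes, the hand-picked decision rules standing in for trained classifiers, and all thresholds are our own illustrative assumptions; only the perf(ωt, ωo) / perf(ωt, ωr) distinction comes from the text:

```python
import random

random.seed(0)

# Hypothetical data: target class around 0, known outlier class around 2,
# and an unseen/reject class around -3 that never appears in training.

target  = [random.gauss(0.0, 1.0) for _ in range(1000)]
outlier = [random.gauss(2.0, 1.0) for _ in range(1000)]
unseen  = [random.gauss(-3.0, 1.0) for _ in range(1000)]

def discriminant(x):
    # Open two-class boundary: everything right of 1.0 is outlier.
    return "outlier" if x > 1.0 else "target"

def socmc(x):
    # Closed target description (|x| <= 2.5) followed by the same
    # discriminant inside the accepted region.
    if abs(x) > 2.5:
        return "outlier"
    return discriminant(x)

def perf(clf, targets, others):
    # Fraction of correct decisions: targets accepted, others rejected.
    correct = sum(clf(x) == "target" for x in targets)
    correct += sum(clf(x) == "outlier" for x in others)
    return correct / (len(targets) + len(others))

# Seen performance perf(t,o) and unseen performance perf(t,r):
print("discriminant:", perf(discriminant, target, outlier),
      perf(discriminant, target, unseen))  # open boundary accepts the unseen class
print("socmc:       ", perf(socmc, target, outlier),
      perf(socmc, target, unseen))         # closed boundary rejects most of it
```

Running this shows the trade-off discussed above: both rules have similar seen performance, but the open boundary scores far worse on the unseen class.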

3 Classifier with reject option

In [2], Chow proposed that when the cost of misclassification is higher than the cost of rejection, the example in question should be rejected, based on thresholding the posterior probability. This reject-option is referred to as the ambiguity reject-option, and consists of two stages. In the first stage an example is assigned to the target or outlier class using Bayes' rule. In the second stage, the relevant posterior of the assigned class (p(ωt|x) or p(ωo|x)) is examined and compared to a reject threshold td. Examples are either assigned to an ACCEPT region
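The two stages of the ambiguity reject-option can be sketched as follows; the Gaussian class models, the equal priors, and the 0.9 reject threshold are illustrative assumptions of ours, not values from the paper:

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def reject_option(x, threshold=0.9):
    # Stage 1: assign by Bayes' rule with equal priors and
    # hypothetical class-conditional densities.
    p_t = 0.5 * gaussian_pdf(x, 0.0, 1.0)  # p(w_t) p(x|w_t)
    p_o = 0.5 * gaussian_pdf(x, 2.0, 1.0)  # p(w_o) p(x|w_o)
    post_t = p_t / (p_t + p_o)             # posterior of the target class
    label = "target" if post_t > 0.5 else "outlier"
    # Stage 2: reject when the posterior of the assigned class is
    # below the threshold, i.e. the example is ambiguous.
    if max(post_t, 1.0 - post_t) < threshold:
        return "reject"
    return label

print(reject_option(-1.0))  # confidently target
print(reject_option(1.0))   # on the decision boundary: rejected
print(reject_option(3.0))   # confidently outlier
```

Note that this ambiguity rejection only handles overlap between the known classes; it is the distance reject-option of Section 1 that addresses examples far from all known classes.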
