One Class Classification using Implicit Polynomial Surface Fitting

Aytül Erçil (1), Burak Büke (2)

(1) Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli, Istanbul 81474, Turkey, [email protected]
(2) Industrial Engineering Department, Boğaziçi University

Abstract

When the number of objects in the training set is too small for the number of features used, most classification procedures cannot find good classification boundaries. In this paper, we introduce a new technique for solving the one class classification problem, based on fitting an implicit polynomial surface to the point cloud of features in order to model the one class that we are trying to separate from the others.

1. Introduction

Many real world classification problems face the difficulty of not having enough data for each class. In defect classification in particular, we often have many samples for the non-defective class but only a few samples spread over various defect classes. One would therefore like to model the observations from the non-defective class so that all observations coming from defective classes are identified as outliers. This problem cannot be solved efficiently by combining all the observations from the various defect classes into one class and applying standard two-class pattern recognition techniques, since the variation of the combined defect class can be very large when its observations come from multiple defect sources. One class classification is often approached through density estimation or a model based approach. Tax et al. have used a method inspired by the support vector classifier where, instead of using a hyperplane to distinguish between two classes, they use a hypersphere around the target class [2,5,6]. In this paper, we describe the observations from the non-defective class by fitting a surface around the point cloud in d-dimensional space, where d is the number of features; points outside this surface are labeled as defective.

We illustrate the results of the developed technique on a data set obtained from Artesis A.Ş. for defect inspection of engines.

2. Description of the Data

In the following analyses, data from M4 washing machine motors have been used. There were 1016 motors reported as non-defective and 74 motors from 4 different defect categories, with 42, 5, 11 and 16 samples in the four defect groups. This data set will be referred to as the MQM data. Eighteen features are measured for each engine in order to detect and separate the defective engines before they go to market. Engines labeled as defective are taken apart to be used for parts, which makes the cost of false positives very high. Hence, to be of practical use, the false alarm rate should not be more than 1-2%.

2.1 Feature Analysis

We carried out a preliminary feature analysis to identify the features with the most discriminatory power [4]. Analyzing the means and variances of each feature using t-tests, we found that for all variables except variables 17 and 18, the null hypothesis of equal means can be rejected with 95% confidence (in many cases, with much higher confidence). These results indicate that features 1-16 have some discriminatory power. Principal component analysis was used to obtain linear combinations of the features; the first 6 principal components contain more than 97% of the variation in the data, indicating that reduction of the feature space dimension is highly feasible. In further analyses, principal components will be used instead of the original features.
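For concreteness, this reduction step can be sketched in a few lines of NumPy. The sketch below is illustrative rather than the exact code used in the analysis; it assumes the raw measurements are collected in an array X of shape (n_samples, 18).

```python
import numpy as np

def pca_reduce(X, n_components=6):
    """Project feature vectors onto the leading principal components.

    X: (n_samples, n_features) array of raw feature measurements.
    Returns the projected scores and the fraction of variance retained.
    """
    # Center the data; the principal axes are the eigenvectors of the
    # sample covariance matrix, obtained here via the SVD of centered X.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance explained by each component is proportional to s**2.
    explained = (s ** 2) / np.sum(s ** 2)
    scores = Xc @ Vt[:n_components].T  # coordinates in the reduced space
    return scores, explained[:n_components].sum()

# Hypothetical usage on the MQM features:
# scores, retained = pca_reduce(X, n_components=6)  # retained ~ 0.97 per the text
```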

3. Classification based on Modeling One Class

3.1 Fitting an Ellipsoid to the Data

Since we have a very limited number of observations from defective motors, many classical pattern recognition methods do not perform well or are not even applicable. Here, we take a new approach: we describe the observations from non-defective motors by fitting a surface around the point cloud in d-dimensional space, where d is the number of features, and points outside this surface are labeled as defective. Since most of the individual features from non-defective motors are observed to follow a normal distribution, it is reasonable to assume that the surface is a hyper-ellipsoid, so we start the analysis by fitting a minimal volume hyper-ellipsoid that contains all the data. Figure 1 shows an ellipse fitted to the first two principal components, where the lengths of the axes are taken as the eigenvalues of the covariance matrix. Before fitting the ellipse, we discard potentially outlying observations by computing the distance from every point to the centroid and removing the k observations with the largest distances (k is taken to be 10 in the results below).
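A minimal sketch of this step, under stated assumptions, is given below: it approximates the minimal-volume hyper-ellipsoid by a covariance-shaped (Mahalanobis) ellipsoid scaled just enough to contain the points retained after the top-k discard. The surrogate shape and all function names are ours, not the exact routine used in the experiments.

```python
import numpy as np

def fit_ellipsoid(X, k=10):
    """Fit an ellipsoid around the non-defective point cloud X.

    Discards the k points farthest from the centroid, then scales a
    covariance-shaped ellipsoid just enough to contain the remainder.
    This is a simple surrogate for the minimal-volume ellipsoid.
    """
    center = X.mean(axis=0)
    # Discard the k potential outliers farthest from the centroid.
    dist = np.linalg.norm(X - center, axis=1)
    kept = X[np.argsort(dist)[: len(X) - k]]

    center = kept.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(kept, rowvar=False))
    # Squared Mahalanobis distance of every kept point to the centroid.
    diff = kept - center
    m2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return center, cov_inv, m2.max()  # m2.max() is the squared "radius"

def is_defective(x, center, cov_inv, radius2):
    """Label a new observation: outside the ellipsoid means defective."""
    diff = x - center
    return diff @ cov_inv @ diff > radius2
```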

Figure 1. Fitting an ellipse to the first 2 principal components

If we now overlay the observations from defective motors onto the previous plot, we see that quite a few of them lie outside the ellipse (Figure 2); the magenta, cyan, black and green observations represent data from the different defect classes.

Figure 2. Overlaying observations from defective class onto the model of non-defective class

The procedure given above generalizes to fitting hyper-ellipsoids in higher dimensions. Figure 3 shows the hyper-ellipsoid fitted to the first 3 principal components, with the observations from defective motors overlaid.

Figure 3. Modeling with 3 principal components

Table 1 gives the results for various parameter combinations of the ellipsoid method. The first parameter is the number of data points discarded before fitting the ellipsoid; the second is the number of principal components used in the analysis.

Table 1. Results of the ellipsoid method




(k, #PC)   Percent correct classification   False alarm rate
1,1        6.76                             0.39
0,2        28.38                            0.89
1,2        28.38                            0.89
5,3        68.92                            1.57
10,3       75.68                            2.76
1,4        86.49                            1.57
2,4        86.49                            1.57
3,4        86.49                            1.87
4,4        86.49                            1.87
5,4        86.49                            1.87
10,4       86.49                            3.15
5,5        86.49                            2.95
10,5       86.49                            4.53
3,6        89.19                            4.72
5,6        89.19                            4.92
10,6       90.54                            6.50
5,7        98.65                            5.51
10,7       98.65                            8.66
5,8        100                              8.37
10,8       100                              11.91
5,9        100                              11.12
10,9       100                              13.88
0,10       100                              9.65
1,10       100                              9.65
2,10       100                              10.83
3,10       100                              11.02
5,10       100                              11.61
1,11       100                              11.42
2,11       100                              13.19
0,12       100                              15.65
2,12       100                              15.65
1,13       100                              14.27
2,13       100                              16.54
1,14       100                              15.55
2,14       100                              17.72

As we can see from the above results, reasonable classification can be carried out with 3 or 4 principal components. As the number of components increases, the correct classification rate increases, but the false alarm rate also increases, which makes the results impractical. Standard classification techniques result in much lower correct classification rates for the same data set [ ].

3.2 Fitting an Implicit Polynomial to the Data

In many real life situations the features are not necessarily normally distributed. There are many techniques available for describing object boundaries, such as B-splines, non-uniform rational B-splines, Fourier descriptors, chain codes, polygonal approximation, the curvature primal sketch, and the medial axis transform. In this work, we use implicit algebraic 2D curves and kD surfaces because of their inside-outside property. In 2D, an implicit curve of degree n is given by the following equation:

\[
f_n(x, y) \;=\; \sum_{0 \le i, j;\; i+j \le n} a_{ij}\, x^i y^j
\;=\; a_{00} + a_{10} x + a_{01} y + a_{20} x^2 + a_{11} x y + a_{02} y^2 + \cdots + a_{n0} x^n + a_{n-1,1} x^{n-1} y + \cdots + a_{0n} y^n \;=\; 0
\]

The implicit equation $f_n(x, y)$ has one constant term, two terms of the first degree, three terms of the second degree, and so on, up to and including $n+1$ terms of the highest degree $n$, for a total of $(n+1)(n+2)/2$ coefficients. An implicit polynomial curve is said to represent an object $\Gamma_0 = \{(x_i, y_i) \mid i = 1, \ldots, K\}$ if every point of the shape $\Gamma_0$ is in the zero set of the implicit polynomial, $Z(f) = \{(x, y) \mid f(x, y) = 0\}$. The zero set of an implicit polynomial fitted to the data will usually be close to the points of $\Gamma_0$ but cannot contain all of them [1]. With this approach, objects in 2D images are described by their silhouettes and then represented by 2D implicit polynomial curves; objects in 3D data are represented by implicit polynomial surfaces.
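The inside-outside property mentioned above is what makes these curves usable for classification: the sign of $f_n(x, y)$ tells on which side of the zero set a point falls. A minimal sketch follows; the sparse coefficient representation and the convention that f < 0 marks the inside are illustrative assumptions.

```python
def eval_implicit(coeffs, x, y, n):
    """Evaluate f_n(x, y) = sum of a_ij * x**i * y**j over 0 <= i + j <= n.

    coeffs: dict mapping (i, j) -> a_ij; any of the (n+1)(n+2)/2 possible
    coefficients that are missing are treated as zero.
    """
    return sum(a * x**i * y**j for (i, j), a in coeffs.items() if i + j <= n)

# Example: the unit circle x**2 + y**2 - 1 = 0 as a degree-2 implicit curve.
circle = {(0, 0): -1.0, (2, 0): 1.0, (0, 2): 1.0}
print(eval_implicit(circle, 0.0, 0.0, 2))  # -1.0 -> inside  (f < 0)
print(eval_implicit(circle, 2.0, 0.0, 2))  #  3.0 -> outside (f > 0)
```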

We now seek the implicit polynomial function f(x, y) that best represents the object. Many good algorithms have been proposed for finding the best fitting polynomial. The implicit polynomial fitting technique used in this work can be outlined as follows. Representing the contour of the object in polar coordinates, with the center of mass of the object as the origin, we can approximate the radius function by a Fourier series up to the desired approximation degree, since the radius function is periodic in the turning angle with period 2π radians. By finding a suitable conversion rule between this parametric approximation and the corresponding implicit polynomial form, a new representation for implicit polynomials is obtained [7]. Since the radius function is measured at discrete points, it is represented by sampled versions of a continuous valued function. One main problem is that the points obtained from the boundary may not be equally sampled, while a Fourier series expansion in the discrete domain requires equally sampled data. To overcome this problem, we interpolate between samples that do not obey the equal sampling rule. We now go through each of these steps on the MQM data set.
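The resampling and truncated Fourier expansion of the radius function might be realized as below. The boundary points are assumed to be already ordered by turning angle, and the grid size and harmonic count are illustrative choices, not the values prescribed in [7].

```python
import numpy as np

def fourier_radius(theta, r, n_harmonics=4, n_samples=256):
    """Approximate the radius function r(theta) by a truncated Fourier series.

    theta: boundary angles sorted in [0, 2*pi); r: corresponding radii.
    Returns a uniform angle grid and the reconstructed radius on it.
    """
    grid = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    # r(theta) is 2*pi-periodic; wrap the first sample so the linear
    # interpolation (the "equal sampling" fix in the text) covers a full turn.
    theta_w = np.append(theta, theta[0] + 2.0 * np.pi)
    r_w = np.append(r, r[0])
    r_uniform = np.interp(grid, theta_w, r_w)

    # Keep the DC term plus the first n_harmonics harmonics only.
    spectrum = np.fft.rfft(r_uniform)
    spectrum[n_harmonics + 1:] = 0.0
    return grid, np.fft.irfft(spectrum, n=n_samples)
```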

3.2.1 Extracting the boundary of the point cloud

Before carrying out the implicit polynomial fit to our motor data, the boundary points of the data must be extracted. The results of applying a boundary extraction algorithm [4] are shown in Figure 4.


Figure 4. Extracted boundary of the feature values
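The boundary extraction algorithm of [4] is not reproduced here. As a crude stand-in for illustration, the convex hull of a 2D point cloud yields a usable boundary, with the caveat that, unlike the algorithm in [4], it cannot follow concavities of the point cloud.

```python
import numpy as np
from scipy.spatial import ConvexHull

def boundary_points(X):
    """Return the points of X lying on its convex hull.

    X: (n, 2) array of 2D points (e.g. the first two principal components).
    For 2D input, hull vertices are returned in counter-clockwise order.
    """
    hull = ConvexHull(X)
    return X[hull.vertices]
```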

A lack of sufficient boundary points can be a problem for fitting, hence we generate additional points on the boundary by linear approximation, repeating the process several times. The boundary points after interpolation are shown in Figure 5.

Figure 5. Increasing the number of boundary points

Since the fitted polynomial will pass through the boundary points, many of the observations would fall outside the fitted curve, leading to a high false alarm rate. Hence, it is desirable to place a confidence band around the boundary points to ensure that all the points fall inside the fitted polynomial. Figure 6 shows the fitted implicit polynomials for different confidence levels.
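The two preprocessing steps just described, densifying the boundary by repeated linear (midpoint) interpolation and pushing it outward by a confidence band, can be sketched as follows. Inflating radially from the centroid by a fixed fraction is our reading of the "percent value" of the band, an assumption rather than the documented procedure.

```python
import numpy as np

def densify(B, passes=3):
    """Double the boundary resolution per pass by inserting midpoints.

    B: (m, 2) array of ordered points of a closed contour.
    """
    for _ in range(passes):
        mids = 0.5 * (B + np.roll(B, -1, axis=0))  # midpoint with next point
        out = np.empty((2 * len(B), 2))
        out[0::2] = B
        out[1::2] = mids
        B = out
    return B

def inflate(B, band=0.3):
    """Push boundary points away from the centroid by a fixed fraction."""
    c = B.mean(axis=0)
    return c + (1.0 + band) * (B - c)
```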

Figure 6. The fitted implicit polynomials for different confidence levels

The next step is to order the boundary points. To do so, we convert the x-y coordinates into polar coordinates (r, φ) and sort the data by φ. The final step is fitting the implicit polynomial; the details of fitting an implicit polynomial to data are given in [7]. Figures 7-8 show the effect of fitting polynomials of various degrees to the data; the percent value of the confidence band is taken to be 0.3.

Figure 7. Fourier fit of 4th order

Figure 8. Fourier fit of 12th order

Higher order fits can memorize the data at hand and may not be useful for prediction, hence we use a low-order polynomial fit to model the data. Figure 9 shows the 6th degree implicit surface fit; the observations from non-defective motors are shown in green and the observations from defective motors in red.
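The figures reported in Tables 2-4 below amount to two-by-two confusion matrices built from the sign test on the fitted polynomial. A sketch, again under the illustrative convention that f < 0 marks the inside (non-defective) region:

```python
def confusion(f, X_good, X_bad):
    """Tabulate inside/outside decisions for both classes.

    f: callable returning the implicit polynomial value at a feature point;
    X_good, X_bad: sequences of non-defective / defective feature points.
    """
    good_in = sum(f(x) < 0 for x in X_good)  # correctly accepted motors
    bad_in = sum(f(x) < 0 for x in X_bad)    # defects missed by the test
    pct_good = 100.0 * good_in / len(X_good)              # non-defective row
    pct_bad = 100.0 * (len(X_bad) - bad_in) / len(X_bad)  # defective row
    return good_in, bad_in, pct_good, pct_bad
```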

Figure 9. 6th degree implicit polynomial fit with defective motors overlaid

Table 2 shows the results of this technique using a confidence band of 30%.

Table 2. Classification results with confidence band of 30%

                 Non-defective   Defective   %correct
Non-defective    1004            12          98.8
Defective        11              63          85.1

If we change the confidence level to 20%, the false alarm rate increases slightly; however, the detection rate increases significantly.

Table 3. Classification results with confidence band of 20%

                 Non-defective   Defective   %correct
Non-defective    1001            15          98.5
Defective        8               66          89.2

Figure 10 shows the result of fitting a second degree implicit polynomial to 3 principal components, and Table 4 gives the classification results using 3 principal components.

Figure 10. 2nd degree implicit polynomial fit to 3 principal components

Table 4. Classification results with confidence band of 20%

                 Non-defective   Defective   %correct
Non-defective    1000            16          98.43
Defective        3               71          95.83

4. Summary and Conclusions

In this paper, we have outlined a new technique for solving the one class classification problem, based on fitting an implicit polynomial surface to the point cloud of features in order to model the one class that we are trying to separate from the others. The technique has been applied to the problem of defect detection in engines. Classical pattern recognition techniques have not been very successful on this problem due to the lack of data from defective motors. The surface fitting technique, in contrast, shows a considerable improvement in the correct classification rate, in addition to having the advantage of requiring data only from non-defective motors in the learning stage.

References

[1] Civi, H., "Implicit Algebraic Curves and Surfaces for Shape Modelling and Recognition," Ph.D. Dissertation, Bogazici University, October 1997.
[2] de Ridder, D., Tax, D.M.J., Duin, R.P.W., "An experimental comparison of one-class classification methods," Proceedings of the Fourth Annual Conference of the Advanced School for Computing and Imaging, ASCI, Delft, June 1998.
[3] Duda, R.O., Hart, P.E., Stork, D.G., Pattern Classification, John Wiley and Sons, 2001.
[4] Ercil, A., "Pattern recognition techniques for motor defect classification," Research Report, Artesis A.Ş.
[5] Tax, D., Ypma, A., Duin, R.P.W., "Support Vector Data Description applied to Machine Vibration Analysis," Proceedings of the Fifth Annual Conference of the Advanced School for Computing and Imaging, ASCI, Delft, June 1999.
[6] Tax, D., Duin, R.P.W., "Data Domain Description using Support Vectors," Proceedings of the European Symposium on Artificial Neural Networks '99, Brugge, 1999.
[7] Ünsalan, C., Erçil, A., "New Robust and Fast Technique for Implicit Polynomial Fitting," Proceedings of M2VIP, pp. 15-20, September 1999.
