Sampling Method for Fast Training of Support Vector Data Description

Arin Chaudhuri, Deovrat Kakde, Maria Jahja, Wei Xiao, Seunghyun Kong, Hansi Jiang, and Sergiy Peredriy

arXiv:1606.05382v3 [cs.LG] 25 Sep 2016

SAS Institute, Cary, NC, USA
Email: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—Support Vector Data Description (SVDD) is a popular outlier detection technique that constructs a flexible description of the input data. SVDD computation time is high for large training datasets, which limits its use in big-data process-monitoring applications. We propose a new iterative sampling-based method for SVDD training. The method incrementally learns the training data description at each iteration by computing SVDD on an independent random sample selected with replacement from the training data set. The experimental results indicate that the proposed method is extremely fast and provides a good data description.
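As a rough illustration of the training loop just described, the following Python sketch (an interpretation, not the paper's exact algorithm) repeatedly draws a random sample with replacement, recomputes SVDD on that sample together with the support vectors retained so far, and stops once the description stabilizes. The names sample_svdd and train_svdd, the sample size, and the stopping rule are illustrative assumptions.

import numpy as np

def sample_svdd(X, train_svdd, sample_size=32, tol=1e-4, max_iter=100, seed=None):
    """Iteratively learn a data description from random samples of X.

    train_svdd: a callable that solves the SVDD dual (4)-(6) on an array of
    points and returns (support_vectors, R_squared); the dual-QP sketch at the
    end of the mathematical formulation below could be wrapped to play this role.
    """
    rng = np.random.default_rng(seed)
    support = np.empty((0, X.shape[1]))   # running description: current support vectors
    prev_r2 = None
    for _ in range(max_iter):
        # independent random sample selected with replacement from the training data
        idx = rng.integers(0, len(X), size=sample_size)
        # incrementally update the description on (current support vectors + new sample)
        support, r2 = train_svdd(np.vstack([support, X[idx]]))
        # assumed stopping rule: the squared radius has stabilized
        if prev_r2 is not None and abs(r2 - prev_r2) <= tol * max(r2, 1.0):
            break
        prev_r2 = r2
    return support, r2

Because each call to train_svdd sees only the small sample plus the current support vectors, the per-iteration cost stays roughly constant as the training set grows, which is the intuition behind the speed-up claimed above.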

I. INTRODUCTION

Support Vector Data Description (SVDD) is a machine learning technique used for single-class classification and outlier detection. The SVDD technique is similar to Support Vector Machines and was first introduced by Tax and Duin [12]. It can be used to build a flexible boundary around single-class data; the data boundary is characterized by observations designated as support vectors. SVDD is used in domains where the majority of the data belongs to a single class. Several researchers have proposed the use of SVDD for multivariate process control [11]. Other applications of SVDD include machine condition monitoring [13], [14] and image classification [10].

A. Mathematical Formulation of SVDD

Normal Data Description: The SVDD model for normal data description builds a minimum-radius hypersphere around the data.

Primal Form:

Objective Function:

\min \; R^2 + C \sum_{i=1}^{n} \xi_i,   (1)

subject to:

\|x_i - a\|^2 \le R^2 + \xi_i, \quad \forall i = 1, \dots, n,   (2)

\xi_i \ge 0, \quad \forall i = 1, \dots, n,   (3)

where:
x_i \in \mathbb{R}^m, i = 1, \dots, n: represents the training data,
R: the radius, a decision variable,
\xi_i: the slack for each variable,
a: the center, a decision variable,
C = \frac{1}{nf}: the penalty constant that controls the trade-off between the volume and the errors, and
f: the expected outlier fraction.

Dual Form: The dual formulation is obtained using the Lagrange multipliers.

Objective Function:

\max \; \sum_{i=1}^{n} \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j),   (4)

subject to:

\sum_{i=1}^{n} \alpha_i = 1,   (5)

0 \le \alpha_i \le C, \quad \forall i = 1, \dots, n,   (6)

where:
\alpha_i \in \mathbb{R}: the Lagrange constants,
C = \frac{1}{nf}: the penalty constant.

Duality Information: Depending upon the position of the observation, the following results hold:

Center Position:

\sum_{i=1}^{n} \alpha_i x_i = a.   (7)

Inside Position:

\|x_i - a\| < R \rightarrow \alpha_i = 0.   (8)

Boundary Position:

\|x_i - a\| = R \rightarrow 0 < \alpha_i < C.   (9)

Outside Position:

\|x_i - a\| > R \rightarrow \alpha_i = C.   (10)

The radius of the hypersphere is calculated as follows:

R^2 = (x_k \cdot x_k) - 2 \sum_{i} \alpha_i (x_i \cdot x_k) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j),   (11)

using any x_k \in SV, i.e., any support vector on the boundary with 0 < \alpha_k < C.
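To illustrate equations (4)-(11) numerically, the following is a minimal Python sketch (not the authors' implementation) that solves the linear-kernel SVDD dual with SciPy's general-purpose SLSQP solver, recovers the center (7) and the radius (11) from a boundary support vector, and flags points outside the hypersphere per (10). The toy data, the outlier fraction f, and the numerical tolerance are assumptions for illustration.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))          # toy single-class training data (n = 60, m = 2)
n = X.shape[0]
f = 0.05                              # assumed expected outlier fraction
C = 1.0 / (n * f)                     # penalty constant C = 1/(nf)

K = X @ X.T                           # linear-kernel Gram matrix, K[i, j] = x_i . x_j
diagK = np.diag(K)

# Dual objective (4), negated because the solver minimizes
def neg_dual(alpha):
    return -(alpha @ diagK - alpha @ K @ alpha)

constraints = [{"type": "eq", "fun": lambda a: a.sum() - 1.0}]   # (5): sum of alpha_i = 1
bounds = [(0.0, C)] * n                                          # (6): 0 <= alpha_i <= C
alpha0 = np.full(n, 1.0 / n)

res = minimize(neg_dual, alpha0, method="SLSQP", bounds=bounds, constraints=constraints)
alpha = res.x

center = alpha @ X                    # (7): a = sum_i alpha_i x_i

# (11): R^2 computed from a boundary support vector x_k with 0 < alpha_k < C
tol = 1e-6
boundary = np.where((alpha > tol) & (alpha < C - tol))[0]
k = boundary[0] if boundary.size else int(np.argmax(alpha))
R2 = K[k, k] - 2 * alpha @ K[:, k] + alpha @ K @ alpha

# (8)-(10): a point z lies outside the description if ||z - a||^2 > R^2
def is_outlier(z):
    return float(np.sum((z - center) ** 2)) > R2

print("R^2 =", R2, "  [3, 3] outlier?", is_outlier(np.array([3.0, 3.0])))

A dedicated QP solver scales much better than SLSQP as n grows; the cost of solving this quadratic program on the full training set is what motivates the sampling-based training method proposed in this paper.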
