The Journal of China Universities of Posts and Telecommunications
October 2013, 20(5): 122–128
www.sciencedirect.com/science/journal/10058885

http://jcupt.xsw.bupt.cn

Smooth support vector machine based on piecewise function

WU Qing1, FAN Jiu-lun2
1. School of Automation, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
2. School of Telecommunication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China

Abstract
Support vector machines (SVMs) have shown remarkable success in many applications. However, the non-smoothness of the objective function limits the practical application of SVMs. To overcome this disadvantage, a twice continuously differentiable piecewise-smooth function is constructed to smooth the objective function of the unconstrained support vector machine (SVM), yielding a piecewise-smooth support vector machine (PWESSVM). Compared with other smooth approximation functions, the approximation precision is markedly improved. Theoretical analysis shows that PWESSVM is globally convergent. Numerical results and comparisons demonstrate that the classification performance of our algorithm is better than that of other competitive baselines.

Keywords  SVM, smooth technique, piecewise function, bound of convergence

Received date: 21-01-2013
Corresponding author: WU Qing, E-mail: [email protected]
DOI: 10.1016/S1005-8885(13)60100-4

1 Introduction

SVM is a machine learning method based on the structural risk minimization principle [1–2], which can find globally optimal solutions to problems. It has gained a great deal of attention due to its generalization performance, and it has arguably surpassed neural networks as the most popular statistical learning method. As a powerful tool for supervised learning, SVM has been successfully applied to a variety of real-world problems such as mechanical fault diagnosis, text categorization, bioinformatics and financial applications [3–7]. SVM has a unique globally optimal solution and avoids the curse of dimensionality, and considerable progress has been made in applying it to classification [8–12].

SVM can be formulated as a non-smooth unconstrained optimization problem [7–11], but the objective function is non-differentiable at zero. To overcome this disadvantage, Lee et al. used the integral of the sigmoid function to obtain a smooth support vector machine (SSVM) model in Ref. [13]. This is an important result for SVM, since many well-known algorithms can then be used to solve it. In 2005, Yuan et al. proposed two polynomial functions, namely a smooth quadratic polynomial function and a smooth fourth-order polynomial function, and obtained a fourth polynomial smooth support vector machine (FPSSVM) model and a quadratic polynomial smooth support vector machine (QPSSVM) model [14–15]. Most recently, they used the QPSSVM to forecast the movement direction of financial time series [6]. Xiong et al. derived an important recursive formula and obtained a class of smooth functions using an interpolation technique in 2007 [16]. In 2009, Purnami et al. proposed a multiple knot spline (MKS) function, replaced the plus function by the MKS function, and obtained MKS-SSVM [17]. In Ref. [18], Yuan et al. used a three-order spline function to smooth the objective function of the unconstrained optimization problem of SVM and obtained a three-order spline smooth support vector machine (TSSVM) model. In 2011, Yuan et al. introduced a first-order continuously differentiable smooth spline function for approximating the plus function with interpolation theory and derived a corresponding smooth SVM [19]. However, the efficiency or the precision of the above algorithms is limited. Whether there exists another, more efficient smooth function to approximate the objective function of the unconstrained optimization problem of SVM remains a challenge at present.

In this paper, we introduce a novel smooth approximation technique in which a piecewise-smooth function is developed for the non-differentiable term. Theoretical analysis shows that the piecewise-smooth function approximates the plus function more accurately than the available smooth functions. Rough set theory is used to prove the global convergence of PWESSVM and to obtain the upper bound of convergence. The fast Newton-Armijo algorithm [20–21] is employed to train PWESSVM. Numerical experiments demonstrate that PWESSVM is more effective than existing methods.

The paper is organized as follows. In Sect. 2, we state the pattern classification problem and describe PWESSVM. The approximation performance of smooth functions with respect to the plus function is compared in Sect. 3. In Sect. 4, the convergence performance of PWESSVM is presented. The fast Newton-Armijo algorithm for training PWESSVM is given in Sect. 5. Sect. 6 reports numerical experiments. A brief conclusion is drawn in the last section.

In this paper, unless otherwise stated, all vectors are column vectors. The scalar (inner) product of two vectors $x, y$ in the $n$-dimensional real space $\mathbb{R}^n$ is denoted by $x^{\mathrm T} y$, and the $p$-norm is denoted by $\|\cdot\|_p$. For an $m \times n$ matrix $A \in \mathbb{R}^{m\times n}$, $A_i$ is the $i$th row of $A$, which is a row vector in $\mathbb{R}^n$. A column vector of ones of appropriate dimension is denoted by $e$.

2 PWESSVM

Now we consider a binary classification problem with $m$ training samples in the $n$-dimensional real space $\mathbb{R}^n$, represented by the $m \times n$ matrix $A$. The membership of each point $A_i$ in class 1 or $-1$ is specified by a given $m \times m$ diagonal matrix $D$ with 1 or $-1$ along its diagonal. For this problem, the standard SVM with a linear kernel is given by the following quadratic program with parameter $\upsilon > 0$:

$$\min_{(w,\gamma,y)\in\mathbb{R}^{n+1+m}} \; \upsilon e^{\mathrm T} y + \frac{1}{2} w^{\mathrm T} w \quad \text{s.t. } D(Aw - e\gamma) + y \geqslant e, \; y \geqslant 0 \tag{1}$$

where $e$ is a vector of ones, $w$ is the normal to the bounding plane and $\gamma$ is the distance of the bounding plane from the origin. The linear separating plane is defined as

$$P = \{ x_i \mid x_i \in \mathbb{R}^n, \; w^{\mathrm T} x_i = \gamma \} \tag{2}$$

The first term in the objective function of Eq. (1) is the 1-norm of the slack variable $y$ with weight $\upsilon$. Replacing $e^{\mathrm T} y$ with the squared 2-norm of $y$ and adding $\gamma^2/2$ to the objective function induces strong convexity and has little or no effect on the problem. The SVM model is then transformed into the following problem:

$$\min_{(w,\gamma,y)\in\mathbb{R}^{n+1+m}} \; \frac{\upsilon}{2} y^{\mathrm T} y + \frac{1}{2}\left(w^{\mathrm T} w + \gamma^2\right) \quad \text{s.t. } D(Aw - e\gamma) + y \geqslant e, \; y \geqslant 0 \tag{3}$$

Let $y = (e - D(Aw - e\gamma))_+$, where $(\cdot)_+$ replaces negative components of a vector by zeros. The plus function $x_+$ is defined as $x_+ = ((x_1)_+, (x_2)_+, \ldots, (x_m)_+)^{\mathrm T}$, where $(x_i)_+ = \max\{0, x_i\}$, $x_i \in \mathbb{R}$, $i = 1, 2, \ldots, m$. Then the SVM Eq. (3) can be converted into the following unconstrained optimization problem:

$$\min_{(w,\gamma)\in\mathbb{R}^{n+1}} \; \frac{\upsilon}{2}\,\|(e - D(Aw - e\gamma))_+\|_2^2 + \frac{1}{2}\left(w^{\mathrm T} w + \gamma^2\right) \tag{4}$$

This is a strongly convex minimization problem and it has a unique solution. However, $(e - D(Aw - e\gamma))_+$ is non-differentiable, which means the objective function of the optimization problem Eq. (4) is non-differentiable, so many derivative- and gradient-based optimization algorithms cannot solve Eq. (4). In 2001, Lee et al. [13] employed the integral of the sigmoid function $p(t,k)$ to approximate the non-differentiable function $t_+$ as follows:

$$p(t,k) = t + \frac{1}{k}\ln\left(1 + \mathrm e^{-kt}\right); \quad t \in \mathbb{R},\; k > 0 \tag{5}$$

where $\ln(\cdot)$ is the natural logarithm, $\mathrm e$ is the base of the natural logarithm and $k$ is a smoothing parameter. With this they obtained the SSVM model. In 2005, Yuan et al. [14–15] presented two polynomial functions in the following manner:

$$q(t,k) = \begin{cases} t; & t \geqslant \dfrac{1}{k} \\[2pt] \dfrac{k}{4}t^2 + \dfrac{1}{2}t + \dfrac{1}{4k}; & -\dfrac{1}{k} < t < \dfrac{1}{k} \\[2pt] 0; & t \leqslant -\dfrac{1}{k} \end{cases} \qquad k > 0 \tag{6}$$

$$h(t,k) = \begin{cases} t; & t \geqslant \dfrac{1}{k} \\[2pt] -\dfrac{1}{16k}(kt+1)^3(kt-3); & -\dfrac{1}{k} < t < \dfrac{1}{k} \\[2pt] 0; & t \leqslant -\dfrac{1}{k} \end{cases} \tag{7}$$
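To make the smoothing target concrete, the following is a minimal NumPy sketch (our illustration, not the authors' Matlab code) of the plus function and the unconstrained objective of Eq. (4); the names `plus` and `svm_objective` and the toy data are our own assumptions.

```python
import numpy as np

def plus(x):
    """Plus function (x)_+ = max{0, x}, applied componentwise."""
    return np.maximum(x, 0.0)

def svm_objective(w, gamma, A, d, upsilon):
    """Unconstrained SVM objective of Eq. (4).

    A: m x n sample matrix; d: length-m vector of +/-1 labels
    (the diagonal of D); upsilon: weight on the slack term.
    """
    r = 1.0 - d * (A @ w - gamma)          # e - D(Aw - e*gamma)
    return 0.5 * upsilon * np.sum(plus(r) ** 2) + 0.5 * (w @ w + gamma ** 2)

# Toy usage: four points in R^2 with labels +/-1
A = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
print(svm_objective(np.array([0.5, 0.5]), 0.0, A, d, upsilon=1.0))
```

Smoothing then amounts to replacing `plus` in this objective by a differentiable surrogate such as Eqs. (5)–(7) or the new function of Eq. (10) below.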

Using the above smooth functions to approximate the plus function $t_+$, they obtained two polynomial smooth support vector machine models, FPSSVM and QPSSVM. Theoretical analysis and numerical results showed that FPSSVM and QPSSVM were more efficient than SSVM. Xiong et al. [16] derived an important recursive formula, Eq. (8), and proposed a class of smoothing functions using an interpolation technique based on FPSSVM and QPSSVM in 2007:

$$I_{d-1} = \frac{t\left(t^2 - \frac{1}{k^2}\right)^{d-1}}{2d-1} - \frac{2(d-1)}{(2d-1)k^2}\, I_{d-2}; \quad d \geqslant 2, \qquad p_d(t,k) = a \int I_{d-1}\,\mathrm dt \tag{8}$$

where the $p_d(t,k)$ are $d$th-order smooth functions, $d = 2, 3, 4, \ldots$ Using the polynomial functions $p_d(t,k)$, Xiong obtained a class of smooth SVMs. In 2007, a three-order spline function [18] was introduced as follows:

$$T(t,k) = \begin{cases} t; & t > \dfrac{1}{k} \\[2pt] -\dfrac{k^2}{6}t^3 + \dfrac{k}{2}t^2 + \dfrac{1}{2}t + \dfrac{1}{6k}; & 0 < t \leqslant \dfrac{1}{k} \\[2pt] \dfrac{k^2}{6}t^3 + \dfrac{k}{2}t^2 + \dfrac{1}{2}t + \dfrac{1}{6k}; & -\dfrac{1}{k} < t \leqslant 0 \\[2pt] 0; & t \leqslant -\dfrac{1}{k} \end{cases} \tag{9}$$

With this function, a smooth SVM model TSSVM was obtained. However, the efficiency or the precision of these algorithms was limited. In this paper, we propose a novel smooth function $\varphi(t,\delta)$ with smoothing parameter $\delta > 0$ to approximate the function $t_+$:

$$\varphi(t,\delta) = \begin{cases} 0; & t < -\dfrac{\delta}{4} \\[2pt] \dfrac{8}{3\delta^2}\left(t + \dfrac{\delta}{4}\right)^3; & -\dfrac{\delta}{4} \leqslant t \leqslant 0 \\[2pt] t + \dfrac{8}{3\delta^2}\left(\dfrac{\delta}{4} - t\right)^3; & 0 < t \leqslant \dfrac{\delta}{4} \\[2pt] t; & t > \dfrac{\delta}{4} \end{cases} \tag{10}$$

The first- and second-order derivatives of $\varphi(t,\delta)$ are

$$\nabla\varphi(t,\delta) = \begin{cases} 0; & t < -\dfrac{\delta}{4} \\[2pt] \dfrac{8}{\delta^2}\left(t + \dfrac{\delta}{4}\right)^2; & -\dfrac{\delta}{4} \leqslant t \leqslant 0 \\[2pt] 1 - \dfrac{8}{\delta^2}\left(\dfrac{\delta}{4} - t\right)^2; & 0 < t \leqslant \dfrac{\delta}{4} \\[2pt] 1; & t > \dfrac{\delta}{4} \end{cases} \tag{11}$$

and

$$\nabla^2\varphi(t,\delta) = \begin{cases} 0; & |t| > \dfrac{\delta}{4} \\[2pt] \dfrac{16}{\delta^2}\left(\dfrac{\delta}{4} - |t|\right); & |t| \leqslant \dfrac{\delta}{4} \end{cases} \tag{12}$$

The solution of problem Eq. (3) can then be obtained by solving the following unconstrained optimization problem, whose solution approaches that of Eq. (4) as the smoothing parameter $\delta$ becomes infinitesimal:

$$\min_{(w,\gamma)\in\mathbb{R}^{n+1}} \Phi_\delta(w,\gamma) = \frac{\upsilon}{2}\,\|\varphi(e - D(Aw - e\gamma), \delta)\|_2^2 + \frac{1}{2}\left(w^{\mathrm T} w + \gamma^2\right) \tag{13}$$

Thus, we obtain a new smooth approximation to problem Eq. (3).
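The piecewise definition in Eqs. (10)–(12) translates directly into code. Below is a minimal NumPy sketch (the names `phi`, `phi_d1` and `phi_d2` are our own, not from the paper) that evaluates $\varphi$ and its two derivatives, and numerically spot-checks the twice-continuous differentiability at the breakpoints $\pm\delta/4$ and $0$.

```python
import numpy as np

def phi(t, delta):
    """Piecewise-smooth approximation of t_+ from Eq. (10)."""
    t = np.asarray(t, dtype=float)
    c = delta / 4.0
    return np.where(t < -c, 0.0,
           np.where(t <= 0.0, 8.0 / (3.0 * delta**2) * (t + c)**3,
           np.where(t <= c, t + 8.0 / (3.0 * delta**2) * (c - t)**3, t)))

def phi_d1(t, delta):
    """First derivative, Eq. (11)."""
    t = np.asarray(t, dtype=float)
    c = delta / 4.0
    return np.where(t < -c, 0.0,
           np.where(t <= 0.0, 8.0 / delta**2 * (t + c)**2,
           np.where(t <= c, 1.0 - 8.0 / delta**2 * (c - t)**2, 1.0)))

def phi_d2(t, delta):
    """Second derivative, Eq. (12)."""
    t = np.asarray(t, dtype=float)
    c = delta / 4.0
    return np.where(np.abs(t) > c, 0.0, 16.0 / delta**2 * (c - np.abs(t)))

delta = 0.1
for t0 in (-delta / 4, 0.0, delta / 4):    # the three breakpoints
    eps = 1e-9
    for f in (phi, phi_d1, phi_d2):
        # value, slope and curvature should match across each breakpoint
        assert abs(f(t0 - eps, delta) - f(t0 + eps, delta)) < 1e-6
print(float(phi(0.0, delta)))  # equals delta/24, the largest gap over t_+
```

The printed value $\varphi(0,\delta) = \delta/24$ is consistent with the bound $\varphi(0,\delta)^2 = \delta^2/576$ used in the proof of Theorem 1 below.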

3 Approximation performance analysis of smooth functions

In this section, we compare the approximation performance of the smooth functions above.

Lemma 1 [13] Let $p(t,k)$ be the integral of the sigmoid function defined in Ref. [13], and let $t_+$ be the plus function. The following conclusions hold:
1) $p(t,k)$ is smooth of arbitrary order in $t$.
2) $p(t,k) \geqslant t_+$.
3) For $\rho > 0$ and $|t| < \rho$, $p(t,k)^2 - t_+^2 \leqslant \left(\dfrac{\ln 2}{k}\right)^2 + \dfrac{2\rho}{k}\ln 2$.

Lemma 2 [14] Let $q(t,k)$ and $h(t,k)$ be defined as in Eqs. (6) and (7), and let $t_+$ be the plus function. The following conclusions hold:
1) $q(t,k)$ is once continuously differentiable in $t$, and $h(t,k)$ is twice continuously differentiable in $t$.
2) $q(t,k) \geqslant t_+$ and $h(t,k) \geqslant t_+$.
3) For any $t \in \mathbb{R}$ and $k > 0$, $q(t,k)^2 - t_+^2 \leqslant \dfrac{1}{11k^2}$ and $h(t,k)^2 - t_+^2 \leqslant \dfrac{1}{19k^2}$.

Lemma 3 [18] Let $T(t,k)$ be defined as in Eq. (9), and let $t_+$ be the plus function. The following results hold:
1) $T(t,k)$ is twice continuously differentiable in $t$.
2) $T(t,k) \geqslant t_+$.
3) For any $t \in \mathbb{R}$ and $k > 0$, $T(t,k)^2 - t_+^2 \leqslant \dfrac{1}{24k^2}$.

Theorem 1 The smooth approximation function $\varphi(t,\delta)$ defined in Eq. (10) has the following properties:
1) $\varphi(t,\delta)$ is twice continuously differentiable in $t$.
2) For any $t \in \mathbb{R}$ and $\delta > 0$, $\varphi(t,\delta) \geqslant t_+$.
3) For any $t \in \mathbb{R}$ and $\delta > 0$, $\varphi(t,\delta)^2 - t_+^2 \leqslant \dfrac{\delta^2}{385}$.

Proof
1) According to Eqs. (11) and (12), the first conclusion of Theorem 1 is immediate.
2) We now prove the conclusion $\varphi(t,\delta) \geqslant t_+$.
a) If $t > \delta/4$ or $t < -\delta/4$, the values of $\varphi(t,\delta)$ and $t_+$ are the same, so $\varphi(t,\delta) = t_+$ holds.
b) If $-\delta/4 \leqslant t \leqslant 0$, then $\varphi(t,\delta) - t_+ = \varphi(t,\delta) \geqslant \varphi(-\delta/4, \delta) = 0$, since $\varphi$ is increasing on this interval.
c) If $0 < t \leqslant \delta/4$, then $\varphi(t,\delta) - t_+ = 8(\delta/4 - t)^3/(3\delta^2) \geqslant 0$.
Hence $\varphi(t,\delta) \geqslant t_+$.
3) If $t > \delta/4$ or $t < -\delta/4$, then $\varphi(t,\delta)^2 - t_+^2 = 0$, so the inequality $\varphi(t,\delta)^2 - t_+^2 \leqslant \delta^2/385$ holds.
If $-\delta/4 \leqslant t \leqslant 0$, then since $t_+ = 0$, $\varphi(t,\delta)^2 - t_+^2 = \varphi(t,\delta)^2$. Because $\varphi(t,\delta)$ is a positive-valued, continuous and increasing function for $-\delta/4 \leqslant t \leqslant 0$, we have $\varphi(t,\delta)^2 \leqslant \varphi(0,\delta)^2 = \delta^2/576 \leqslant \delta^2/385$.
If $0 < t \leqslant \delta/4$, let
$$g(t) = \varphi(t,\delta)^2 - t_+^2 = \left(t + \frac{8(\delta/4 - t)^3}{3\delta^2}\right)^2 - t^2 = \frac{64}{9}\delta^2\left(\frac{t}{\delta} - \frac{1}{4}\right)^6 - \frac{16}{3}\delta^2\,\frac{t}{\delta}\left(\frac{t}{\delta} - \frac{1}{4}\right)^3$$
Making the variable substitution $s = t/\delta$ (obviously $s \in (0, 1/4)$), we have
$$g(s) = \frac{64}{9}\delta^2\left(s - \frac{1}{4}\right)^6 - \frac{16}{3}\delta^2\, s\left(s - \frac{1}{4}\right)^3$$
For $s \in (0, 1/4)$, the maximum point of $g(s)$ is $s = 0.045\,36$. Then $g(s) \leqslant g(0.045\,36) \approx 0.002\,596\,\delta^2 \leqslant \delta^2/385$.
In conclusion, we have $\varphi(t,\delta)^2 - t_+^2 \leqslant \delta^2/385$ for any $t \in \mathbb{R}$, $\delta > 0$.

According to Lemmas 1–3 and Theorem 1, the following results are easily obtained.

Theorem 2 Let $\rho = \delta = 1/k$, $k > 0$. Then:
1) If the smooth function is defined as in Eq. (5), then by Lemma 1,
$$p(t,k)^2 - t_+^2 \leqslant \left(\frac{\ln 2}{k}\right)^2 + \frac{2\rho}{k}\ln 2 = \left(\ln^2 2 + 2\ln 2\right)\frac{1}{k^2} \approx 0.692\,7\,\frac{1}{k^2} \tag{14}$$
2) If the smooth functions are defined as in Eqs. (6) and (7), then by Lemma 2,
$$q(t,k)^2 - t_+^2 \leqslant \frac{1}{11k^2} \approx 0.090\,9\,\frac{1}{k^2} \tag{15}$$
$$h(t,k)^2 - t_+^2 \leqslant \frac{1}{19k^2} \approx 0.052\,6\,\frac{1}{k^2} \tag{16}$$
3) If the smooth function is defined as in Eq. (9), then by Lemma 3,
$$T(t,k)^2 - t_+^2 \leqslant \frac{1}{24k^2} \approx 0.041\,7\,\frac{1}{k^2} \tag{17}$$
4) If the smooth function is defined as in Eq. (10), then by Theorem 1,
$$\varphi(t,\delta)^2 - t_+^2 \leqslant \frac{\delta^2}{385} \approx 0.002\,6\,\frac{1}{k^2} \tag{18}$$

Xiong et al. [16] proposed a class of smooth functions $p_d(t,k)$ ($d = 2, 3, 4, \ldots$) derived from Eq. (8). However, the parameter $d$ must be set large enough for the approximation accuracy of these functions to reach a good level, which generates additional computational cost. The approximation performance of this class of smooth functions is therefore not compared with the smooth functions above.

Theorem 2 shows that the approximation accuracy of the proposed piecewise-smooth function is higher than that of the integral of the sigmoid function by two orders of magnitude, and higher than that of the other three polynomial smooth functions by one order of magnitude. The proposed piecewise-smooth function $\varphi(t,\delta)$ achieves the best degree of approximation to the plus function $t_+$. When $\delta$ is fixed, the different smoothing capabilities of the above smooth functions are easy to observe. The smoothing performance comparison is given in Fig. 1, where the smoothing parameters are set as $k = 10$, $\rho = \delta = 1/10$.

Fig. 1 Comparison of approximation performance of smooth functions ($k = 10$, $\rho = \delta = 1/10$)
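The comparison behind Fig. 1 can be reproduced numerically. The sketch below (our illustration, not the paper's code) evaluates each smooth function on a fine grid with $k = 10$, $\rho = \delta = 1/10$ and prints the observed supremum of the squared gap in units of $1/k^2$, for comparison with Theorem 2; the observed values are grid maxima while Eqs. (14)–(18) give analytical upper bounds, and Lemma 1's bound only covers $|t| < \rho$.

```python
import numpy as np

k = 10.0
rho = delta = 1.0 / k
t = np.linspace(-rho, rho, 200001)[1:-1]   # interior points, |t| < rho
tp = np.maximum(t, 0.0)                    # plus function t_+

p = t + np.log1p(np.exp(-k * t)) / k                             # Eq. (5)
q = np.where(t >= 1/k, t, np.where(t <= -1/k, 0.0,
        k/4 * t**2 + t/2 + 1/(4*k)))                             # Eq. (6)
h = np.where(t >= 1/k, t, np.where(t <= -1/k, 0.0,
        -(k*t + 1)**3 * (k*t - 3) / (16*k)))                     # Eq. (7)
T = np.where(t > 1/k, t, np.where(t <= -1/k, 0.0,
        -np.sign(t) * k**2/6 * t**3 + k/2 * t**2 + t/2 + 1/(6*k)))  # Eq. (9)
c = delta / 4
f = np.where(t > c, t, np.where(t < -c, 0.0,
        np.where(t <= 0, 8/(3*delta**2) * (t + c)**3,
                 t + 8/(3*delta**2) * (c - t)**3)))              # Eq. (10)

for name, s in [("p", p), ("q", q), ("h", h), ("T", T), ("phi", f)]:
    print(name, (s**2 - tp**2).max() * k**2)   # observed sup, in units of 1/k^2
```

On such a grid the gap for $\varphi$ comes out near $0.0026/k^2$, an order of magnitude or more below the other functions, which is the pattern Fig. 1 visualizes.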

4 Convergence performance of PWESSVM

In this section, the convergence of PWESSVM is analyzed. By rough set theory, we prove that the solution of PWESSVM closely approximates the optimal solution of the original model Eq. (4) as $\delta$ goes to zero. Furthermore, a formula for computing the upper bound of convergence is deduced.

Theorem 3 Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^{m\times 1}$. Define the real-valued functions $f(x)$ and $g(x,\delta)$ on the $n$-dimensional real space $\mathbb{R}^n$ as follows:

$$f(x) = \frac{1}{2}\|(Ax - b)_+\|_2^2 + \frac{1}{2}\|x\|_2^2; \qquad g(x,\delta) = \frac{1}{2}\|\varphi(Ax - b, \delta)\|_2^2 + \frac{1}{2}\|x\|_2^2 \tag{19}$$

where $\varphi(\cdot)$ is defined in Eq. (10), $x \in \mathbb{R}^n$ and $\delta > 0$. Then we have the following results:
1) $f(x)$ and $g(x,\delta)$ are strongly convex functions.
2) There exists a unique solution $x^*$ to $\min_{x\in\mathbb{R}^n} f(x)$, and a unique solution $x_\delta^*$ to $\min_{x\in\mathbb{R}^n} g(x,\delta)$.
3) For all $\delta > 0$, $x_\delta^*$ and $x^*$ satisfy
$$\|x_\delta^* - x^*\|_2^2 \leqslant \frac{m\delta^2}{770} \tag{20}$$
4) $\lim_{\delta\to 0} \|x_\delta^* - x^*\| = 0$.

Proof
1) For any $\delta > 0$, $f(x)$ and $g(x,\delta)$ are strongly convex functions because $\|\cdot\|_2^2$ is a strongly convex function.
2) Let $L_v(f(x))$ be the level set of $f(x)$ and $L_v(g(x,\delta))$ be the level set of $g(x,\delta)$. Since $x_+ \leqslant \varphi(x,\delta)$, we have $f(x) \leqslant g(x,\delta)$, and it is easy to obtain $L_v(g(x,\delta)) \subset L_v(f(x)) \subset \{x \mid \|x\|_2^2 \leqslant 2v\}$. Therefore $L_v(f(x))$ and $L_v(g(x,\delta))$ are compact subsets of $\mathbb{R}^n$. Using the strong convexity of $f(x)$ and of $g(x,\delta)$ for $\delta > 0$, there is a unique solution to $\min_{x\in\mathbb{R}^n} f(x)$ and to $\min_{x\in\mathbb{R}^n} g(x,\delta)$, respectively.
3) Using the first-order optimality conditions $\nabla f(x^*) = 0$ and $\nabla g(x_\delta^*, \delta) = 0$ together with the strong convexity of $f(x)$ and $g(x,\delta)$, we have
$$f(x_\delta^*) - f(x^*) \geqslant \nabla f(x^*)(x_\delta^* - x^*) + \frac{1}{2}\|x_\delta^* - x^*\|_2^2 = \frac{1}{2}\|x_\delta^* - x^*\|_2^2$$
$$g(x^*, \delta) - g(x_\delta^*, \delta) \geqslant \nabla g(x_\delta^*, \delta)(x^* - x_\delta^*) + \frac{1}{2}\|x_\delta^* - x^*\|_2^2 = \frac{1}{2}\|x_\delta^* - x^*\|_2^2$$
Adding the two inequalities and noticing that $\varphi(x,\delta) \geqslant x_+$, we obtain
$$\|x_\delta^* - x^*\|_2^2 \leqslant f(x_\delta^*) - f(x^*) + g(x^*,\delta) - g(x_\delta^*,\delta) = \left(g(x^*,\delta) - f(x^*)\right) - \left(g(x_\delta^*,\delta) - f(x_\delta^*)\right) \leqslant g(x^*,\delta) - f(x^*) = \frac{1}{2}\|\varphi(Ax^* - b,\delta)\|_2^2 - \frac{1}{2}\|(Ax^* - b)_+\|_2^2$$
Applying result 3) of Theorem 1 to each of the $m$ components, the conclusion $\|x_\delta^* - x^*\|_2^2 \leqslant m\delta^2/770$ holds.
4) According to $\|x_\delta^* - x^*\|_2^2 \leqslant m\delta^2/770$, we have $\lim_{\delta\to 0} x_\delta^* = x^*$.
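To see Theorem 3 at work numerically, the following sketch (our illustration; it assumes the hypothetical `phi`, `phi_d1` and `phi_d2` helpers from the sketch after Eq. (13), and uses arbitrary random data) minimizes $g(x,\delta)$ by damped Newton for a sequence of $\delta$ values and compares the squared distance to a small-$\delta$ proxy for $x^*$ against the bound $m\delta^2/770$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 5
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

def g_val(x, delta):
    """g(x, delta) of Eq. (19)."""
    return 0.5 * np.sum(phi(A @ x - b, delta) ** 2) + 0.5 * (x @ x)

def solve_g(delta, iters=100):
    """Damped Newton iterations for min_x g(x, delta)."""
    x = np.zeros(n)
    for _ in range(iters):
        r = A @ x - b
        s, s1, s2 = phi(r, delta), phi_d1(r, delta), phi_d2(r, delta)
        grad = A.T @ (s * s1) + x
        if np.linalg.norm(grad) < 1e-10:
            break
        H = (A.T * (s1**2 + s * s2)) @ A + np.eye(n)   # Hessian of g
        d = np.linalg.solve(H, -grad)
        lam, f0 = 1.0, g_val(x, delta)
        while g_val(x + lam * d, delta) > f0 + 1e-4 * lam * (grad @ d) and lam > 1e-12:
            lam *= 0.5                                  # backtrack to sufficient decrease
        x = x + lam * d
    return x

x_star = solve_g(1e-6)          # proxy for the minimizer x* of the non-smooth f
for delta in (0.1, 0.01, 0.001):
    gap = np.sum((solve_g(delta) - x_star) ** 2)
    print(delta, gap, m * delta**2 / 770)   # gap should stay below the bound
```

The printed gaps should shrink on the order of $\delta^2$, in line with Eq. (20).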

5 The Newton-Armijo algorithm for PWESSVM

The objective function of problem Eq. (13) is twice continuously differentiable, so we can use the fast Newton-Armijo algorithm to train PWESSVM. It works as follows.

5.1 Newton-Armijo algorithm

Step 1 Initialization: start with any $(w^0, \gamma^0) \in \mathbb{R}^{n+1}$ and a tolerance $\tau$, and set $i := 0$.
Step 2 Compute $\Phi^i = \Phi_\delta(w^i, \gamma^i)$ and $g^i = \nabla\Phi_\delta(w^i, \gamma^i)$.
Step 3 If $\|g^i\|_2 \leqslant \tau$, stop and accept $(w^i, \gamma^i)$. Otherwise, compute the Newton direction $d^i \in \mathbb{R}^{n+1}$ from the linear system
$$\nabla^2\Phi_\delta(w^i, \gamma^i)\, d^i = -(g^i)^{\mathrm T} \tag{21}$$
where T denotes the transpose.
Step 4 Armijo stepsize: choose the largest stepsize $\lambda_i \in \{1, 1/2, 1/4, \ldots\}$ such that
$$\Phi_\delta(w^i, \gamma^i) - \Phi_\delta\left((w^i, \gamma^i) + \lambda_i d^i\right) \geqslant -\rho\lambda_i g^i d^i$$
where $\rho \in (0, 1/2)$, and let $(w^{i+1}, \gamma^{i+1}) = (w^i, \gamma^i) + \lambda_i d^i$.
Step 5 Replace $i$ by $i + 1$ and go to Step 2.

In our smooth approach, we need only solve the linear system of Eq. (21) instead of a quadratic program. Because the objective function is strongly convex, it is not difficult to prove that our Newton-Armijo algorithm for training PWESSVM converges globally to the unique solution [13,20].

The PWESSVM described above solves linear classification problems. In fact, some of the results in Sect. 2 can be extended to a nonlinear PWESSVM with the kernel technique [13]; hence the Newton-Armijo algorithm can also solve the nonlinear PWESSVM successfully.
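As a concrete illustration of Steps 1–5, here is a compact, self-contained NumPy sketch of Newton-Armijo training for the objective $\Phi_\delta$ of Eq. (13). It is a minimal reading of the algorithm rather than the authors' Matlab implementation; the function names and default parameter values are our own assumptions.

```python
import numpy as np

def _phi(t, d):    # Eq. (10)
    c = d / 4
    return np.where(t < -c, 0.0, np.where(t <= 0, 8/(3*d*d)*(t + c)**3,
           np.where(t <= c, t + 8/(3*d*d)*(c - t)**3, t)))

def _phi1(t, d):   # Eq. (11)
    c = d / 4
    return np.where(t < -c, 0.0, np.where(t <= 0, 8/(d*d)*(t + c)**2,
           np.where(t <= c, 1 - 8/(d*d)*(c - t)**2, 1.0)))

def _phi2(t, d):   # Eq. (12)
    c = d / 4
    return np.where(np.abs(t) > c, 0.0, 16/(d*d)*(c - np.abs(t)))

def train_pwessvm(A, dvec, upsilon=1.0, delta=0.1, tau=1e-8, rho=0.25,
                  max_iter=100):
    """Newton-Armijo minimization of Phi_delta (Eq. (13)); returns (w, gamma).

    A: m x n samples; dvec: length-m vector of +/-1 labels (diagonal of D).
    """
    m, n = A.shape
    x = np.zeros(n + 1)                                   # x = (w, gamma)
    J = np.hstack([-dvec[:, None] * A, dvec[:, None]])    # Jacobian of r w.r.t. x

    def obj(x):
        r = 1.0 - dvec * (A @ x[:n] - x[n])               # r = e - D(Aw - e*gamma)
        return 0.5 * upsilon * np.sum(_phi(r, delta)**2) + 0.5 * (x @ x)

    for _ in range(max_iter):
        r = 1.0 - dvec * (A @ x[:n] - x[n])
        s, s1, s2 = _phi(r, delta), _phi1(r, delta), _phi2(r, delta)
        g = upsilon * (J.T @ (s * s1)) + x                # gradient of Phi_delta
        if np.linalg.norm(g) <= tau:                      # Step 3 stopping test
            break
        H = upsilon * (J.T * (s1**2 + s * s2)) @ J + np.eye(n + 1)  # Hessian
        d = np.linalg.solve(H, -g)                        # Newton direction, Eq. (21)
        lam, f0 = 1.0, obj(x)
        # Step 4: Armijo rule, halving lambda until sufficient decrease
        while f0 - obj(x + lam * d) < -rho * lam * (g @ d) and lam > 1e-12:
            lam *= 0.5
        x = x + lam * d
    return x[:n], x[n]
```

Since the Hessian contains the identity block from the regularization term, it is always positive definite, and the Newton system of Eq. (21) is solvable at every iteration.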

6 Numerical experiments

To demonstrate the effectiveness and speed of PWESSVM, we numerically compare the performance of SSVM, FPSSVM, TSSVM and PWESSVM. The four smooth SVMs are all trained by the fast Newton-Armijo algorithm. Newton-Armijo cannot be applied to the QPSSVM model because it lacks a second-order derivative; moreover, the classification capacity of FPSSVM is slightly better than that of QPSSVM [14–15,18], so we do not compare QPSSVM with the other smooth SVM methods in our experiments.

All experiments are run on a personal computer with a 3.0 GHz processor and a maximum of 1.99 GB of available memory, under Windows XP with Matlab 7.0.1. The programs for PWESSVM, TSSVM and FPSSVM are written in the Matlab language. The source code of SSVM, 'ssvm.m', is obtained from the authors' web site for the linear problem (Musicant D R, Mangasarian O L. LSVM: Lagrangian support vector machine. http://www.cs.wisc.edu/dmi/svm/, 2000), and 'lsvmk.m' for the nonlinear problem. In our experiments, all of the input data and the variables needed by the programs are kept in memory. For the four smooth SVMs, an optimality tolerance of $10^{-8}$ is used to determine when to terminate. The Gaussian kernel is used in all our experiments.

The first experiment demonstrates the capability of PWESSVM in solving larger problems. The results in Table 1 compare the training correctness, the testing correctness and the training time of the four smooth SVMs on massively sized datasets created with Musicant's NDC Data Generator (Musicant D R. NDC: normally distributed clustered datasets. www.cs.wisc.edu/~musicant/data/ndc/, 1998) with different sizes. The test samples are 5% of the training samples.

Table 1 PWESSVM compared with the other three smooth SVMs on NDC-generated datasets of different sizes

Trains/dimension   Algorithm   Train correctness/(%)   Test correctness/(%)   CPU time/s
2 000 000 / 10     SSVM        90.86                   91.25                  278.97
                   FPSSVM      90.86                   91.25                  367.46
                   TSSVM       90.98                   91.33                  342.45
                   PWESSVM     91.34                   91.90                  346.52
2 000 000 / 20     SSVM        87.64                   87.08                  417.64
                   FPSSVM      87.88                   88.05                  449.28
                   TSSVM       87.89                   88.05                  446.45
                   PWESSVM     88.02                   88.59                  451.44
10 000 / 100       SSVM        94.26                   93.60                  11.17
                   FPSSVM      94.78                   93.60                  11.33
                   TSSVM       94.77                   93.77                  6.24
                   PWESSVM     94.91                   95.68                  4.88
10 000 / 1000      SSVM        96.67                   86.20                  56.22
                   FPSSVM      96.69                   86.14                  66.52
                   TSSVM       96.69                   86.14                  26.69
                   PWESSVM     97.94                   86.37                  19.31

The experimental results show that PWESSVM has good training and testing accuracy. Furthermore, PWESSVM solves problems more quickly than the other three algorithms on the smaller (10 000-sample) datasets.

The second experiment is carried out on six randomly generated datasets with normal distributions. In Table 2, Fvalue-min denotes the minimum value of the objective function. The comparison of the experimental results is given in Table 2.

Table 2 Experimental results of the four smooth SVMs on synthetic datasets

Trains/dimension   Algorithm   Test correctness/(%)   CPU time/s   Fvalue-min
100 / 10           SSVM        92.36                  2.48         4.473 2
                   FPSSVM      92.34                  1.36         0.049 7
                   TSSVM       94.01                  1.37         0.049 4
                   PWESSVM     96.83                  1.09         0.048 4
1 000 / 10         SSVM        96.96                  98.50        142.64
                   FPSSVM      99.20                  28.79        9.28
                   TSSVM       99.46                  28.47        8.87
                   PWESSVM     99.83                  27.34        7.99
100 / 40           SSVM        99.36                  4.47         14.17
                   FPSSVM      100                    7.46         0.003 4
                   TSSVM       100                    6.90         0.003 2
                   PWESSVM     100                    6.65         0.003 5
1 000 / 40         SSVM        100                    194          156.22
                   FPSSVM      100                    254          0.006 5
                   TSSVM       100                    243          0.006 1
                   PWESSVM     100                    226          0.005 4
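For readers who want to reproduce a small-scale version of the second experiment, the snippet below (our illustration; it reuses the hypothetical `train_pwessvm` from the sketch in Sect. 5, generates its own synthetic data, and uses a linear rather than Gaussian kernel) trains on normally distributed clusters and reports test correctness.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m_train, m_test = 10, 100, 1000

def make_data(m):
    """Two Gaussian clusters labelled +1 / -1, loosely mimicking the synthetic setup."""
    y = np.where(rng.random(m) < 0.5, 1.0, -1.0)
    X = rng.normal(size=(m, n)) + 1.5 * y[:, None]   # class-dependent mean shift
    return X, y

A, d = make_data(m_train)
w, gamma = train_pwessvm(A, d, upsilon=1.0, delta=0.01)

X_test, y_test = make_data(m_test)
pred = np.sign(X_test @ w - gamma)
print("test correctness: %.2f%%" % (100 * np.mean(pred == y_test)))
```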

7 Conclusions

A novel PWESSVM is proposed in this paper. It only needs to find the unique minimum of an unconstrained, differentiable, strongly convex objective function. The new method has several advantages over the available methods, such as very good classification performance and lower training time cost. The numerical results show that PWESSVM has better generalization ability.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61100165, 61100231, 61105064, 51205309), the Natural Science Foundation of Shaanxi Province (2012JQ8044, 2011JM8003, 2010JQ8004), and the Foundation of the Education Department of Shaanxi Province (2013JK1096).

References

1. Burges C J C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998, 2(2): 121−167
2. Zhang Y, Cai J. A support vector classifier based on vague similarity measure. Mathematical Problems in Engineering, 2013, ID 928054/1−7
3. Baccarini L M R, Silva V V R, Menezes B R D, et al. SVM practical industrial application for mechanical faults diagnostic. Expert Systems with Applications, 2011, 38(6): 6980−6984
4. Li K L, Xie J, Sun X, et al. Multi-class text categorization based on LDA and SVM. Procedia Engineering, 2011, 15: 1963−1967
5. Spinosa E J, Carvalho A C. Support vector machines for novel class detection in bioinformatics. Genetics and Molecular Research, 2005, 4(3): 608−615
6. Yuan Y B. Forecasting the movement direction of exchange rate with polynomial smooth support vector machine. Mathematical and Computer Modelling, 2013, 57(3/4): 932−944
7. Zheng J, Lu B L. A support vector machine classifier with automatic confidence and its application to gender classification. Neurocomputing, 2011, 74(11): 1926−1935
8. Zhou X, Wang Y, Wang D L. Application of kernel methods in signals modulation classification. The Journal of China Universities of Posts and Telecommunications, 2011, 18(1): 84−90
9. Zhao Z C. Combining SVM and CHMM classifiers for porno video recognition. The Journal of China Universities of Posts and Telecommunications, 2012, 19(3): 100−106
10. Lin H J, Yeh J P. Optimal reduction of solutions for support vector machines. Applied Mathematics and Computation, 2009, 214(2): 329−335
11. Christmann A, Hable R. Consistency of support vector machines using additive kernels for additive models. Computational Statistics and Data Analysis, 2012, 56(4): 854−873
12. Shao Y H, Deng N Y. A coordinate descent margin based-twin support vector machine for classification. Neural Networks, 2012, 25(1): 114−121
13. Lee Y J, Mangasarian O L. SSVM: a smooth support vector machine for classification. Computational Optimization and Applications, 2001, 22(1): 5−21
14. Yuan Y B, Yan J, Xu C X. Polynomial smooth support vector machine. Chinese Journal of Computers, 2005, 28(1): 9−17 (in Chinese)
15. Yuan Y B, Huang T Z. A polynomial smooth support vector machine for classification. Proceedings of the 1st International Conference on Advanced Data Mining and Applications (ADMA'05), Jul 22−24, 2005, Wuhan, China. LNCS 3584. Berlin, Germany: Springer-Verlag, 2005: 157−164
16. Xiong J Z, Hu J L, Yuan H Q, et al. Research on a new class of functions for smoothing support vector machines. Acta Electronica Sinica, 2007, 35(2): 366−370 (in Chinese)
17. Purnami S W, Embong A, Zain J M, et al. A new smooth support vector machine and its applications in diabetes disease diagnosis. Journal of Computer Science, 2009, 5(12): 1003−1008
18. Yuan Y B, Fan W G, Pu D M. Spline function smooth support vector machine for classification. Journal of Industrial and Management Optimization, 2007, 3(3): 529−542
19. Yuan B L, Zhang W J, Wu H. New solution method to smoothing support vector machine with one control parameter smoothing function. Proceedings of the 2nd WRI Global Congress on Intelligent Systems (GCIS'10): Vol 2, Dec 16−17, 2010, Wuhan, China. Piscataway, NJ, USA: IEEE, 2010: 153−154
20. Xu C X, Zhang J Z. A survey of quasi-Newton equations and quasi-Newton methods for optimization. Annals of Operations Research, 2001, 103(1/2/3/4): 213−234
21. Bertsekas D P. Nonlinear programming. 2nd ed. Belmont, MA, USA: Athena Scientific Press, 1999

(Editor: WANG Xu-ying)
