Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 91 (2016) 482–491

Information Technology and Quantitative Management (ITQM 2016)

Parameters Optimization for Nonparallel Support Vector Machine by Particle Swarm Optimization

Seyed Mojtaba Hosseini Bamakan a,b,c,*, Huadong Wang a,c, Ahad Zare Ravasan d

a Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing 100190, China
b School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
c Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China
d Department of Industrial Management, Allameh Tabataba'i University, Tehran, Iran

Abstract

Support vector machine is a well-known and computationally powerful machine learning technique for pattern classification and regression problems, which has been successfully applied to solve many practical problems in a wide variety of fields. The Nonparallel Support Vector Machine (NPSVM), an extension of Twin-SVMs, has been shown to be theoretically and practically more flexible than, and superior to, TWSVMs, and it overcomes several drawbacks of the existing typical SVMs so as to be applicable to large-scale data sets. However, one of the difficulties in the successful implementation of NPSVM is its several parameters, which should be well adjusted during the training process. In fact, the generalization power, robustness and sparsity of NPSVM depend strongly on the proper setting of its parameters. In this paper, we propose a hybrid approach for parameter determination of the NPSVM using Particle Swarm Optimization. Furthermore, in order to increase the sparsity of NPSVM and to reduce the training time, we take into account the number of support vectors (SVs) along with the classification accuracy in a weighted objective function. Our experiments on several public datasets show that the proposed method can achieve better classification accuracy than TWSVM and NPSVM with less computational time.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the Organizing Committee of ITQM 2016

Keywords: Nonparallel Support Vector Machine, Particle Swarm Optimization, Twin SVM, Parameter setting

1. Introduction

Support vector machines (SVMs) are outstanding machine learning methods proposed by Vapnik [1] that have been applied to many real-life problems. SVM is based on Statistical Learning Theory (SLT), which tries to minimize the structural risk instead of the empirical risk. This characteristic gives the SVM good robustness and generalization power.

* Corresponding author. E-mail address: [email protected], [email protected]

doi:10.1016/j.procs.2016.07.125


In a binary and linearly separable classification problem, SVM tries to find the separating hyperplane that maximizes the margin between the hyperplane and the closest data points of each class [1, 2]. In cases where the data points are not linearly separable, kernel functions have been introduced into the SVM to map the original data to a high-dimensional feature space in which the problem becomes linearly separable [1-5].

In recent years, several new extensions of SVM such as Bounded SVM [6], v-SVM [7], least squares SVM [8], Twin SVM [9] and NPSVM [10] have been proposed. Among these models, the nonparallel-hyperplane SVMs, which include the generalized eigenvalue proximal support vector machine (GEPSVM) [11], the twin support vector machine (TWSVM) [9] and the Nonparallel Support Vector Machine (NPSVM) [10, 12], have attracted much interest. In the binary case, this group of SVMs has some advantages over conventional SVMs: they seek two nonparallel proximal hyperplanes such that each hyperplane is closer to one of the two classes and at least a unit distance from the other. For example, by solving two smaller quadratic programming problems (QPPs) instead of one larger one, TWSVM increases the training speed approximately fourfold compared to the standard SVM. Although TWSVM has been studied extensively by many researchers (a good review of TWSVM applications from 2007 to 2014 is provided in [13]), this classifier suffers from several problems and shortcomings [14]. Among the extensions of TWSVMs, the nonparallel support vector machine (NPSVM) [10, 12] is theoretically and empirically superior to TWSVM and overcomes several drawbacks of the existing TWSVMs [15].

Because NPSVM involves several parameters whose values must be defined by the user, its generalization power, robustness and sparsity will suffer if they are not set well. The parameters of NPSVM include the penalty constants $c_i \ge 0$, $i = 1, \dots, 4$, the parameter $\varepsilon$ of the $\varepsilon$-insensitive loss function, and the parameters used in the kernel function. The penalty constant $c$ affects the trade-off between model complexity and the proportion of nonseparable samples; the pair $c_{1,3}$ or $c_{2,4}$ is the weighting factor that determines the trade-off between the regularization term and the empirical risk [13]. The value of $\varepsilon$ determines the smoothness of the SVM's decision boundary and the number of support vectors: the bigger $\varepsilon$, the fewer support vectors are selected. Different values of the kernel parameter $\gamma$ directly affect the flexibility of the separating hyperplane and shift the resulting decision boundary. All these parameters have a considerable effect on the generalization power of NPSVM, which shows the significant role of proper model parameter setting in improving NPSVM classification accuracy. Although the most common technique for SVM parameter selection is the grid algorithm, its main drawbacks are that it is time-consuming and prone to local optimality [4].

In this study, we provide a parameter sensitivity analysis of NPSVM and, instead of using the grid search algorithm to find optimal values for the NPSVM parameters, we propose a hybrid approach based on particle swarm optimization (PSO) [16]. PSO is a population-based search algorithm inspired by social behaviour in nature. It is a powerful, easy-to-implement, and computationally efficient optimization technique [17].
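To make the cost of exhaustive grid search concrete, the following Python sketch simply enumerates a plausible grid over the NPSVM parameters; the grid resolutions and the training routine `train_and_score_npsvm` are hypothetical placeholders, not taken from the paper.

```python
import itertools
import numpy as np

# Illustrative candidate values per parameter; the multiplicative growth in
# combinations is what makes grid search expensive for NPSVM.
grid = {
    "c1": np.logspace(-3, 3, 7),      # penalty constant c1 (= c3), assumed grid
    "c2": np.logspace(-3, 3, 7),      # penalty constant c2 (= c4), assumed grid
    "eps": np.linspace(0.0, 0.5, 6),  # epsilon of the eps-insensitive loss
    "gamma": np.logspace(-3, 3, 7),   # RBF kernel width parameter
}

combos = list(itertools.product(*grid.values()))
print(f"{len(combos)} NPSVM trainings needed")  # 7 * 7 * 6 * 7 = 2058 runs

# best = max(combos, key=lambda p: train_and_score_npsvm(*p))  # hypothetical trainer
```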
The remainder of this paper is organized as follows. Section 2 presents the related works. Section 3 gives a brief overview of NPSVM and PSO; Section 4 explains the proposed PSO-NPSVM. The experimental results and discussion are presented in Section 5. Finally, we conclude and outline future work in Section 6.

2. Related Works

In recent years, with the promising results obtained by swarm intelligence techniques, especially Particle Swarm Optimization (PSO), this approach has been widely used to solve complex problems in a variety of domains such as computer science, medicine, finance and engineering [18]. PSO is a population-based intelligent optimization technique that simulates the social behaviour of groups of individuals such as a flock of birds, a school of fish or a colony of ants. As discussed by the authors in [16, 19, 20], compared with the other algorithms in this group, PSO has several advantages such as simplicity of implementation, scalability, robustness, flexibility, and quickness in finding approximately optimal solutions.


Combining the SVM with PSO to improve its performance has attracted the attention of many researchers. In Table 1, we summarize the proposed approaches applied in different domains.

Table 1: Comparison of hybrid PSO and SVM approaches in classification problems

Authors | Context of Study | Method | Contribution
Wu et al [21] | Multidimensional time series | CPSO-g-SVRM | Instead of the ε-insensitive loss function, a Gaussian loss function is proposed for the SVR in order to reduce the effect of noise on the regression estimates. Chaotic PSO is utilized to set the parameters of the proposed g-SVRM.
Liu & Zhou [22] | Chemometrics data | CPSO-LS-SVM | A hybrid methodology based on least squares support vector machine (LS-SVM) optimized by CPSO, named "CPL-SVM", is proposed in order to improve classification accuracy.
Zhai & Jiang [23] | Sense-through-foliage target detection | DEPSO-SVM | A new hybrid differential evolution and self-adaptive particle swarm optimization (DEPSO) algorithm is adopted to optimize the parameters of SVM. The authors used four variants of radar target echo signals to evaluate the proposed method and compared DEPSO-SVM, PSO-SVM, canonical SVM, BPNN and KNN.
Zhai & Jiang [24] | Target detection | ACPSO-SVM | An adaptive chaos particle swarm optimization (ACPSO) is proposed to determine the optimal parameters for SVM.
Dong et al [25] | Hourly solar irradiance forecasting | PSO-SVR | A hybrid forecasting method to predict hourly-resolution solar irradiance data using self-organizing maps, support vector regression and particle swarm optimization. PSO is utilized to set the parameters of the SVR.
Kuo et al [21] | Radio frequency identification | HIP–SVM | The SVM parameters are optimized by a hybrid method consisting of an artificial immune system (AIS) and particle swarm optimization (PSO).
Jian et al [11] | Forecasting large-scale goaf instability | PSO–SVM | A model based on SVM and PSO is developed for the determination of large-scale goaf instability in various underground metal mines.
Chen & Kao [14] | TAIEX | PSO-SVM | The proposed method is based on fuzzy time series, particle swarm optimization techniques and support vector machines for forecasting the TAIEX.
Chen et al [4] | Fault diagnosis | CPSO-MSVM | A hybrid method based on multi-kernel support vector machine (MSVM) with chaotic particle swarm optimization (CPSO) for roller bearing fault diagnosis. Local tangent space alignment (LTSA) is adopted as a feature selection method.

3. Background

3.1. Nonparallel Support Vector Machine (NPSVM)

Consider a binary classification problem with training set $T = \{(x_1, +1), \dots, (x_p, +1), (x_{p+1}, -1), \dots, (x_{p+q}, -1)\}$, where $x_i \in \mathcal{R}^n$, $i = 1, \dots, p+q$. Let $A = (x_1, \dots, x_p)^T \in \mathcal{R}^{p \times n}$ and $B = (x_{p+1}, \dots, x_{p+q})^T \in \mathcal{R}^{q \times n}$. NPSVM seeks two nonparallel hyperplanes as defined in Eq. (1):

$$f_+(x) = (w_+ \cdot x) + b_+ = 0 \quad \text{and} \quad f_-(x) = (w_- \cdot x) + b_- = 0 \tag{1}$$


where $(w \cdot x)$ is the dot product between $w$ and $x$. In NPSVM, the two convex quadratic programming problems (QPPs) are formulated as follows [10, 26]:

$$\begin{aligned}
\min_{w_+, b_+, \eta^{(*)}, \xi_-} \quad & \frac{1}{2}\|w_+\|^2 + C_1 \sum_{i=1}^{p} (\eta_i + \eta_i^*) + C_2 \sum_{j=p+1}^{p+q} \xi_j \\
\text{s.t.} \quad & (w_+ \cdot x_i) + b_+ \le \varepsilon + \eta_i, && i = 1, \dots, p \\
& -((w_+ \cdot x_i) + b_+) \le \varepsilon + \eta_i^*, && i = 1, \dots, p \\
& (w_+ \cdot x_j) + b_+ \le -1 + \xi_j, && j = p+1, \dots, p+q \\
& \eta_i, \eta_i^* \ge 0, && i = 1, \dots, p \\
& \xi_j \ge 0, && j = p+1, \dots, p+q
\end{aligned} \tag{2}$$

and

$$\begin{aligned}
\min_{w_-, b_-, \eta^{(*)}, \xi_+} \quad & \frac{1}{2}\|w_-\|^2 + C_3 \sum_{i=p+1}^{p+q} (\eta_i + \eta_i^*) + C_4 \sum_{j=1}^{p} \xi_j \\
\text{s.t.} \quad & (w_- \cdot x_i) + b_- \le \varepsilon + \eta_i, && i = p+1, \dots, p+q \\
& -((w_- \cdot x_i) + b_-) \le \varepsilon + \eta_i^*, && i = p+1, \dots, p+q \\
& (w_- \cdot x_j) + b_- \ge 1 - \xi_j, && j = 1, \dots, p \\
& \eta_i, \eta_i^* \ge 0, && i = p+1, \dots, p+q \\
& \xi_j \ge 0, && j = 1, \dots, p
\end{aligned} \tag{3}$$

In models (2) and (3), $x_i$, $i = 1, \dots, p$ are the positive data points, $x_i$, $i = p+1, \dots, p+q$ are the negative data points, $C_i \ge 0$, $i = 1, \dots, 4$ are penalty parameters, and $\xi_+ = (\xi_1, \dots, \xi_p)^T$, $\xi_- = (\xi_{p+1}, \dots, \xi_{p+q})^T$, $\eta_+^{(*)} = (\eta_+^T, \eta_+^{*T})^T = (\eta_1, \dots, \eta_p, \eta_1^*, \dots, \eta_p^*)^T$, $\eta_-^{(*)} = (\eta_-^T, \eta_-^{*T})^T = (\eta_{p+1}, \dots, \eta_{p+q}, \eta_{p+1}^*, \dots, \eta_{p+q}^*)^T$ are slack variables. To obtain the solutions of problems (2) and (3), we need to solve their dual problems:

$$\min_{\tilde{\theta}} \ \frac{1}{2} \tilde{\theta}^T \tilde{\Lambda} \tilde{\theta} + \tilde{k}^T \tilde{\theta} \quad \text{s.t.} \ \tilde{e}^T \tilde{\theta} = 0, \quad 0 \le \tilde{\theta} \le \tilde{c} \tag{4}$$

In the above formulation the variables are defined as follows:

$$\tilde{\theta} = (\alpha_+^{*T}, \alpha_+^T, \beta_-^T)^T, \quad \tilde{k} = (\varepsilon e_+^T, \varepsilon e_+^T, -e_-^T)^T, \quad \tilde{e} = (-e_+^T, e_+^T, -e_-^T)^T, \quad \tilde{c} = (c_1 e_+^T, c_1 e_+^T, c_2 e_-^T)^T \tag{5}$$

$$\tilde{\Lambda} = \begin{pmatrix} H_1 & H_2 \\ H_2^T & H_3 \end{pmatrix}, \quad H_1 = \begin{pmatrix} K(A, A^T) & -K(A, A^T) \\ -K(A, A^T) & K(A, A^T) \end{pmatrix}, \quad H_2 = \begin{pmatrix} K(A, B^T) \\ -K(A, B^T) \end{pmatrix}, \quad H_3 = K(B, B^T) \tag{6}$$

and

$$\min_{\hat{\theta}} \ \frac{1}{2} \hat{\theta}^T \hat{\Lambda} \hat{\theta} + \hat{k}^T \hat{\theta} \quad \text{s.t.} \ \hat{e}^T \hat{\theta} = 0, \quad 0 \le \hat{\theta} \le \hat{c} \tag{7}$$

where

$$\hat{\theta} = (\alpha_-^{*T}, \alpha_-^T, \beta_+^T)^T, \quad \hat{k} = (\varepsilon e_-^T, \varepsilon e_-^T, -e_+^T)^T, \quad \hat{e} = (-e_-^T, e_-^T, -e_+^T)^T, \quad \hat{c} = (c_3 e_-^T, c_3 e_-^T, c_4 e_+^T)^T \tag{8}$$

$$\hat{\Lambda} = \begin{pmatrix} Q_1 & Q_2 \\ Q_2^T & Q_3 \end{pmatrix}, \quad Q_1 = \begin{pmatrix} K(B, B^T) & -K(B, B^T) \\ -K(B, B^T) & K(B, B^T) \end{pmatrix}, \quad Q_2 = \begin{pmatrix} K(B, A^T) \\ -K(B, A^T) \end{pmatrix}, \quad Q_3 = K(A, A^T) \tag{9}$$
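As an illustration of Eqs. (5)-(6), the following Python sketch assembles the data of dual problem (4) from kernel blocks; it only builds the QPP and leaves the solver itself (any box-constrained QP solver would do) as an assumption.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gram matrix K(X, Y^T) of the RBF kernel in Eq. (13)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def build_dual_positive(A, B, c1, c2, eps, gamma):
    """Assemble (Lambda~, k~, e~, c~) of dual problem (4) per Eqs. (5)-(6)."""
    p, q = A.shape[0], B.shape[0]
    KAA = rbf_kernel(A, A, gamma)
    KAB = rbf_kernel(A, B, gamma)
    KBB = rbf_kernel(B, B, gamma)
    H1 = np.block([[KAA, -KAA], [-KAA, KAA]])          # (2p x 2p)
    H2 = np.vstack([KAB, -KAB])                        # (2p x q)
    Lam = np.block([[H1, H2], [H2.T, KBB]])            # Lambda~, Eq. (6)
    ep, eq = np.ones(p), np.ones(q)
    k = np.concatenate([eps * ep, eps * ep, -eq])      # k~, Eq. (5)
    e = np.concatenate([-ep, ep, -eq])                 # e~, Eq. (5)
    c = np.concatenate([c1 * ep, c1 * ep, c2 * eq])    # upper bounds c~
    return Lam, k, e, c

# theta = qp_solve(Lam, k, e, c)  # hypothetical solver: min (1/2) th'Lam th + k'th
```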


Denote the optimal solutions of the dual problems (4) and (7) as $\tilde{\theta}^*$ and $\hat{\theta}^*$, respectively. The decision functions of the two models are defined as:

$$f_+(x) = \sum_{i=1}^{p} (\tilde{\theta}_i^* - \tilde{\theta}_{i+p}^*) \, k(x_i, x) - \sum_{j=p+1}^{p+q} \tilde{\theta}_{j+p}^* \, k(x_j, x) + b_+ \tag{10}$$

$$f_-(x) = \sum_{i=1}^{q} (\hat{\theta}_i^* - \hat{\theta}_{i+q}^*) \, k(x_{p+i}, x) + \sum_{j=1}^{p} \hat{\theta}_{2q+j}^* \, k(x_j, x) + b_- \tag{11}$$

where $b_+$ and $b_-$ can be obtained from the complementary slackness conditions. A new point $x \in \mathcal{R}^n$ is then assigned to class $k$ ($k = +, -$) by $\arg\min_{k = \pm} |f_k(x)|$. In nonlinearly separable cases, kernel functions $K(x_i, x_j)$ are used to map the dot product $(x_i^T \cdot x_j)$ in the original input space to $\varphi(x_i)^T \varphi(x_j)$ in a high-dimensional feature space. The most widely used kernel functions are:

Polynomial kernel: $K(x_i, x_j) = (1 + x_i^T \cdot x_j)^p \tag{12}$

Radial basis function (RBF): $K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2} \tag{13}$

where $p$ is the polynomial order and $\gamma$ is a predefined parameter controlling the width of the Gaussian kernel. Having adopted the RBF kernel in this study, the NPSVM parameters that need to be tuned during the training process are:

• $C$ as penalty constants, comprising $C_1$, $C_2$ in model (2) and $C_3$, $C_4$ in model (3), respectively;
• $\varepsilon$ as the insensitivity parameter of the $\varepsilon$-insensitive loss function;
• $\gamma$ as the RBF kernel parameter.

Different values of these parameters directly affect the generalization power, robustness and sparsity of the Nonparallel Support Vector Machine and shift the resulting decision boundary. The effect of different values of the NPSVM parameters on the separating hyperplane and decision boundary is shown in Figure 1.
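A minimal sketch of the prediction rule of Eqs. (10)-(11) follows, assuming the optimal dual vectors `theta_p` (problem (4)) and `theta_m` (problem (7)) and the biases have already been computed; `rbf_kernel` is the helper from the previous sketch.

```python
import numpy as np

def decision_values(x, A, B, theta_p, theta_m, b_pos, b_neg, gamma):
    """Evaluate f_+(x) and f_-(x) from Eqs. (10)-(11)."""
    p, q = A.shape[0], B.shape[0]
    kA = rbf_kernel(A, x[None, :], gamma).ravel()   # k(x_i, x), positive points
    kB = rbf_kernel(B, x[None, :], gamma).ravel()   # k(x_j, x), negative points
    f_pos = (theta_p[:p] - theta_p[p:2*p]) @ kA - theta_p[2*p:] @ kB + b_pos
    f_neg = (theta_m[:q] - theta_m[q:2*q]) @ kB + theta_m[2*q:] @ kA + b_neg
    return f_pos, f_neg

def predict(x, *args):
    """Assign x to the class whose hyperplane is nearer: argmin_k |f_k(x)|."""
    f_pos, f_neg = decision_values(x, *args)
    return +1 if abs(f_pos) <= abs(f_neg) else -1
```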

3.2. Particle swarm optimization

Particle swarm optimization was first introduced by Kennedy and Eberhart in 1995 [16]. It is a population-based meta-heuristic method belonging to the swarm intelligence techniques, which were inspired by collective animal behavior such as bird flocking, fish schooling or ant colonies. Each individual in the search space is called a "particle", and the population of particles is called a "swarm". Compared with the other algorithms in this group, PSO demonstrates higher efficiency and flexibility, easy implementation, and powerful global and local exploration abilities [16, 19, 20].


Fig. 1. The effect of different values of the NPSVM parameters on the separating hyperplane

In a D-dimensional search space, $x_i[t] = \{x_{i1}[t], x_{i2}[t], \dots, x_{iD}[t]\}$ and $v_i[t] = \{v_{i1}[t], v_{i2}[t], \dots, v_{iD}[t]\}$ represent the position and the velocity of the $i$th particle at the $t$th iteration of the algorithm. PSO is an iterative algorithm in which, at the end of each iteration, the solution is evaluated by a predefined fitness function. After the initialization of the population, each particle moves iteratively around the search space, updating its velocity and position based on two factors: its own best previous experience (pbest) and the best experience of all particles (gbest), as shown in Eqs. (14) and (15). The best position that particle $i$ has obtained up to iteration $t$ is represented by the vector $p_{i,best}[t] = \{p_{i1}[t], p_{i2}[t], \dots, p_{iD}[t]\}$, and $p_{g,best}[t] = \{p_{g1}[t], p_{g2}[t], \dots, p_{gD}[t]\}$ represents the global best position of the whole swarm up to iteration $t$. The movement of a particle in each iteration is based on three components: first, the particle moves slightly forward in its previous direction; then it moves slightly toward its own best previous position; and then it moves toward the global best position. These three movements are formulated in Eq. (14):

$$v_{id}[t+1] = w \cdot v_{id}[t] + C_1 r_1 (p_{id,best}[t] - x_{id}[t]) + C_2 r_2 (p_{gd,best}[t] - x_{id}[t]), \quad d = 1, 2, \dots, D \tag{14}$$

$$x_{id}[t+1] = x_{id}[t] + v_{id}[t+1], \quad d = 1, 2, \dots, D \tag{15}$$

where $i = 1, 2, \dots, N$ and $N$ is the size of the swarm population. $v_i[t]$ is the velocity vector at the $t$th iteration and $x_i[t]$ represents the current position of the $i$th particle. $w$ is the nonzero inertia weight factor used to control the balance between local and global search; by decreasing the inertia weight over time, PSO shifts from global exploration to local exploitation [27, 28].


$C_1$ and $C_2$ are positive acceleration coefficients, called the cognitive parameter and the social parameter, respectively. $r_1$ and $r_2$ are random numbers between 0 and 1. Furthermore, in order to address the undesirable dynamical properties of standard PSO, [29] proposed a constriction coefficient to limit the particles' velocities along their trajectories. Considering $\varphi_1$ and $\varphi_2$ as constants with $\varphi_1, \varphi_2 > 0$ and $\varphi \triangleq \varphi_1 + \varphi_2 > 4$, the constriction coefficient $\chi$ is defined as follows:

$$\chi = \frac{2}{\varphi - 2 + \sqrt{\varphi^2 - 4\varphi}}, \qquad w = \chi, \quad C_1 = \chi \varphi_1, \quad C_2 = \chi \varphi_2$$
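A minimal sketch of one constricted PSO update per Eqs. (14)-(15), using the constriction coefficient above; the fitness function is left abstract.

```python
import numpy as np

def constriction(phi1=2.05, phi2=2.05):
    """Constriction coefficient of [29]; requires phi = phi1 + phi2 > 4."""
    phi = phi1 + phi2
    chi = 2.0 / (phi - 2.0 + np.sqrt(phi**2 - 4.0 * phi))
    return chi, chi * phi1, chi * phi2   # w, C1, C2

def pso_step(x, v, p_best, g_best, rng, w, C1, C2):
    """One velocity/position update per Eqs. (14)-(15) for an (N, D) swarm."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + C1 * r1 * (p_best - x) + C2 * r2 * (g_best - x)  # Eq. (14)
    return x + v, v                                              # Eq. (15)

# w, C1, C2 = constriction()  # with phi1 = phi2 = 2.05, w is about 0.73,
#                             # matching the setting w = 0.72 of Table 2
```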

4. The developed PSO-NPSVM

The following steps describe the procedure of our hybrid approach, PSO-NPSVM:

Step 1: Training data preparation: here $X = (x_1, x_2, \dots, x_n)$ represents an n-dimensional input feature vector, and $y_i \in \{-1, +1\}$ denotes the class label.

Step 2: Particle initialization and PSO parameter setting: in this paper we adopted the RBF kernel function for the NPSVM classifier, as it is better able to handle high-dimensional data [17]. The details of the parameter settings for PSO-NPSVM are presented in Table 2.

Table 2: Parameters for initialization of PSO-NPSVM

PSO initialization parameters:
Parameter | Value
Number of iterations | 50
Number of particles | 20
φ1 = φ2 | 2.05
w | 0.72
Objective function weights w_ACC, w_SV | 0.8, 0.2

Nonparallel Support Vector Machine parameters (search ranges):
Parameter | Range
C1 = C3 | [10^-3, 10^3]
C2 = C4 | [10^-3, 10^3]
ε | [0, 0.5]
γ | [10^-3, 10^3]
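A sketch of Step 2's particle initialization under the ranges of Table 2; each particle encodes one candidate setting (C1, C2, ε, γ), and sampling the penalty constants on a log scale is an assumption, since the paper does not state the sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20  # number of particles (Table 2)

# Each particle is a 4-vector (C1, C2, eps, gamma); C3 = C1 and C4 = C2.
log_lo, log_hi = -3, 3                       # [10^-3, 10^3] for C and gamma
C1 = 10.0 ** rng.uniform(log_lo, log_hi, N)  # log-uniform sampling (assumed)
C2 = 10.0 ** rng.uniform(log_lo, log_hi, N)
eps = rng.uniform(0.0, 0.5, N)               # epsilon range [0, 0.5]
gamma = 10.0 ** rng.uniform(log_lo, log_hi, N)

positions = np.column_stack([C1, C2, eps, gamma])  # swarm positions x_i[0]
velocities = np.zeros_like(positions)              # initial velocities v_i[0]
```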

Step 3: The classification accuracy of NPSVM (ACC) and the number of support vectors (SVs) are the two factors used to design the fitness function. Hence, particles with high classification accuracy and a small number of support vectors produce a high fitness value; a small number of SVs increases the sparsity of NPSVM and reduces the model training time. A single weighted objective function is defined to combine these two criteria (a code sketch of this fitness evaluation appears after Step 7):

$$\text{Fitness} = w_{ACC} \cdot ACC + w_{SV} \cdot \left[ 1 - \frac{nSV}{m} \right] \tag{16}$$

Here, the ratio of correctly predicted records to all records is taken as the accuracy, i.e., accuracy $= \frac{TP + TN}{TP + FP + FN + TN}$, and $m$ is defined as the number of training samples. $w_{ACC}$ and $w_{SV}$ are predefined weights for the accuracy and the number of SVs, respectively.

Step 4: Increase the iteration counter.

Step 5: Train the NPSVM model with the parameters set in Step 2.

Step 6: Calculate the fitness function according to Eq. (16).

Step 7: Update the global and personal bests according to the Step 6 results. Evaluate the fitness of each particle according to Eq. (16) and then compare the evaluated fitness value of each particle (its personal optimal fitness, pfit) with its personal best position ($p_{i,best}$):


a) If pfit is better than $p_{i,best}$, update $p_{i,best}$ to the current position; otherwise keep the previous one in memory.
b) If pfit is better than $p_{g,best}$, update $p_{g,best}$ to the current position; otherwise keep the previous $p_{g,best}$.
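The sketch below illustrates the fitness evaluation of Eq. (16) and the best-position update of Step 7; `train_npsvm` is a hypothetical trainer assumed to return the accuracy and the number of support vectors.

```python
import numpy as np

W_ACC, W_SV = 0.8, 0.2  # objective weights from Table 2

def fitness(params, train_data, test_data, m):
    """Weighted fitness of Eq. (16): trade accuracy against sparsity."""
    acc, n_sv = train_npsvm(params, train_data, test_data)  # hypothetical trainer
    return W_ACC * acc + W_SV * (1.0 - n_sv / m)

def update_bests(fit, x, p_best, p_fit, g_best, g_fit):
    """Step 7: keep per-particle and global best positions (maximization)."""
    improved = fit > p_fit                     # (a) personal best update
    p_best[improved] = x[improved]
    p_fit[improved] = fit[improved]
    if p_fit.max() > g_fit:                    # (b) global best update
        g_fit = p_fit.max()
        g_best = p_best[p_fit.argmax()].copy()
    return p_best, p_fit, g_best, g_fit
```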

Step 8: Update the velocity and position of each parameter ($C_i$, $\varepsilon$, $\gamma$, $i = 1, 2, 3, 4$) according to Eqs. (14) and (15).

Step 9: Terminate the PSO iteration if the stopping criterion is satisfied; otherwise go to Step 4. The termination criterion is that the algorithm reaches the predefined maximum number of iterations.

Step 10: Recall the optimized parameters saved at the stopping iteration, and build the well-tuned NPSVM classifier.

5. Experimental setting and results

In order to evaluate the performance of the proposed method, we implemented our hybrid method PSO-NPSVM in MATLAB 2010, on a platform with Windows 7 OS, an Intel® Core™ i5 CPU @ 1.70 GHz and 4.00 GB RAM. Several benchmark datasets from the UCI machine learning repository were chosen to measure the performance of the proposed method; a description of the chosen datasets is provided in Table 3. To prevent features with greater numeric ranges from dominating those with smaller numeric ranges, all the data points were scaled into the range [0, +1]. K-fold cross-validation was used to divide each dataset into training and testing sets; in this research we set k = 5, with four folds used for training and one fold for testing.

Table 3: Description of data sets used in the experiments

No | Data Sets | No. of instances | No. of features | No. of classes
1 | Australian | 690 | 14 | 2
2 | German | 1000 | 24 | 2
3 | Pima Indians (Pima) | 768 | 8 | 2
4 | Hepatitis | 155 | 19 | 2
5 | Statlog heart (Heart) | 270 | 13 | 2
6 | Bupa liver disorders (Bupa) | 345 | 6 | 2
7 | Ionosphere | 351 | 34 | 2
8 | Sonar | 208 | 60 | 2
9 | WPBC | 198 | 34 | 2
10 | SPECT | 267 | 22 | 2
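A sketch of the preprocessing and 5-fold split described above, using scikit-learn's MinMaxScaler and KFold (library choices assumed, since the original experiments were run in MATLAB).

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler

def five_fold_accuracy(X, y, evaluate, seed=0):
    """Average accuracy over 5 folds, scaling features to [0, 1] per fold."""
    accs = []
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=seed).split(X):
        scaler = MinMaxScaler().fit(X[tr])    # fit scaling on training fold only
        acc = evaluate(scaler.transform(X[tr]), y[tr],
                       scaler.transform(X[te]), y[te])  # hypothetical classifier eval
        accs.append(acc)
    return float(np.mean(accs))
```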

In order to validate the performance of the proposed method, we compared PSO-NPSVM with NPSVM and TWSVM whose parameters were determined by the grid search method. It should be noted that we adopted the same experimental setup for NPSVM and TWSVM as used in [10], and we chose some of the benchmark datasets used in [10]. The average accuracy (ACC) over 5-fold cross-validation is presented in Table 4. From the results shown in Table 4, it can be observed that in most cases the accuracy of the proposed PSO-NPSVM is better than that of the methods whose parameters were adjusted by other techniques such as grid search. However, it should be mentioned that the performance of PSO depends on its pre-set parameters, and it often suffers from being trapped in local optima; hence, in some cases this drawback of PSO affects the performance of the proposed hybrid PSO-NPSVM. Moreover, feature selection methods can be used to further improve the PSO-NPSVM method [30].


Table 4: The average accuracy obtained on the benchmark datasets

No | Data Sets | TWSVM (% ACC) | NPSVM (% ACC) | PSO-NPSVM (% ACC)
1 | Australian | 75.47 | 86.84 | 87.14
2 | German | 72.36 | 74.71 | 74.26
3 | Pima-Indians | 75.08 | 79.01 | 78.77
4 | Hepatitis | 83.15 | 85.68 | 86.75
5 | Statlog heart (Heart) | 84.15 | 86.72 | 88.34
6 | BUPA | 74.26 | 77.12 | 75.89
7 | Ionosphere | 87.46 | 90.15 | 90.29
8 | Sonar | 90.09 | 92.62 | 93.47
9 | WPBC | 83.57 | 85.13 | 85.89
10 | SPECT | 78.14 | 79.76 | 80.11

6. Conclusion

Nonparallel support vector machine (NPSVM) is an extension of Twin-SVM that is theoretically and empirically superior to TWSVM and overcomes several drawbacks of the existing TWSVMs. Although NPSVM shows good sparsity and generalization power in most cases, the performance of this classifier depends significantly on the proper setting of its parameters. The parameters of NPSVM include the penalty constants $c_i \ge 0$, $i = 1, \dots, 4$, the $\varepsilon$-insensitivity parameter, and the parameter used in the kernel function. In this research we proposed a hybrid method in which particle swarm optimization is used to find optimal values for the NPSVM parameters during the training process. In order to further improve the performance of the nonparallel support vector machine, we plan to propose a multi-kernel nonparallel SVM. Furthermore, as mentioned in the discussion, in some cases standard PSO may become trapped in a local optimum, a problem that needs to be addressed.

Acknowledgements

This work has been supported by the CAS-TWAS President's Fellowship for International PhD Students.

References

[1] Cortes, C. and V. Vapnik, Support-vector networks. Machine Learning, 1995. 20(3): p. 273-297.
[2] Boser, B.E., I.M. Guyon, and V.N. Vapnik, A training algorithm for optimal margin classifiers. in Proceedings of the Fifth Annual Workshop on Computational Learning Theory. 1992. ACM.
[3] Bamakan, S.M.H., et al., A New Intrusion Detection Approach Using PSO based Multiple Criteria Linear Programming. Procedia Computer Science, 2015. 55: p. 231-237.
[4] Chen, F., et al., Multi-fault diagnosis study on roller bearing based on multi-kernel support vector machine with chaotic particle swarm optimization. Measurement, 2014. 47: p. 576-590.
[5] Bamakan, S.M.H., et al., An effective intrusion detection framework based on MCLP/SVM optimized by time-varying chaos particle swarm optimization. Neurocomputing, 2016. 199: p. 90-102.
[6] Mangasarian, O.L. and D.R. Musicant, Successive overrelaxation for support vector machines. IEEE Transactions on Neural Networks, 1999. 10(5): p. 1032-1037.
[7] Chang, C.-C. and C.-J. Lin, Training v-support vector classifiers: theory and algorithms. Neural Computation, 2001. 13(9): p. 2119-2147.
[8] Suykens, J.A. and J. Vandewalle, Least squares support vector machine classifiers. Neural Processing Letters, 1999. 9(3): p. 293-300.
[9] Khemchandani, R. and S. Chandra, Twin support vector machines for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007. 29(5): p. 905-910.
[10] Tian, Y., et al., Nonparallel support vector machines for pattern classification. IEEE Transactions on Cybernetics, 2014. 44(7): p. 1067-1079.
[11] Zhou, J., et al., Identification of large-scale goaf instability in underground mine using particle swarm optimization and support vector machine. International Journal of Mining Science and Technology, 2013. 23(5): p. 701-707.
[12] Lou, I., et al., Integrating Support Vector Regression with Particle Swarm Optimization for numerical modeling for algal blooms of freshwater. Applied Mathematical Modelling, 2015. 39(19): p. 5907-5916.


[13] Li, B., et al., Slope stability analysis based on quantum-behaved particle swarm optimization and least squares support vector machine. Applied Mathematical Modelling, 2015. 39(17): p. 5253-5264.
[14] Chen, S.-M. and P.-Y. Kao, TAIEX forecasting based on fuzzy time series, particle swarm optimization techniques and support vector machines. Information Sciences, 2013. 247: p. 62-71.
[15] Liu, D., Y. Tian, and Y. Shi, Ramp loss nonparallel support vector machine for pattern classification. Knowledge-Based Systems, 2015.
[16] Kennedy, J. and R. Eberhart, Particle swarm optimization. in Proceedings of the 1995 IEEE International Conference on Neural Networks, Perth, 1995: p. 1942-1948.
[17] Huang, C.-L. and J.-F. Dun, A distributed PSO–SVM hybrid system with feature selection and parameter optimization. Applied Soft Computing, 2008. 8(4): p. 1381-1391.
[18] Kolias, C., G. Kambourakis, and M. Maragoudakis, Swarm intelligence in intrusion detection: A survey. Computers & Security, 2011. 30(8): p. 625-642.
[19] Wu, S.X. and W. Banzhaf, The use of computational intelligence in intrusion detection systems: A review. Applied Soft Computing, 2010. 10(1): p. 1-35.
[20] Olariu, S. and A.Y. Zomaya, Handbook of Bioinspired Algorithms and Applications. 2005: CRC Press.
[21] Kuo, R., et al., Hybrid of artificial immune system and particle swarm optimization-based support vector machine for Radio Frequency Identification-based positioning system. Computers & Industrial Engineering, 2013. 64(1): p. 333-341.
[22] Liu, F. and Z. Zhou, A new data classification method based on chaotic particle swarm optimization and least square-support vector machine. Chemometrics and Intelligent Laboratory Systems, 2015. 147: p. 147-156.
[23] Zhai, S. and T. Jiang, A new sense-through-foliage target recognition method based on hybrid differential evolution and self-adaptive particle swarm optimization-based support vector machine. Neurocomputing, 2015. 149: p. 573-584.
[24] Zhai, S. and T. Jiang, A novel particle swarm optimization trained support vector machine for automatic sense-through-foliage target recognition system. Knowledge-Based Systems, 2014. 65: p. 50-59.
[25] Dong, Z., et al., A novel hybrid approach based on self-organizing maps, support vector regression and particle swarm optimization to forecast solar irradiance. Energy, 2015. 82: p. 570-577.
[26] Zhao, X., et al., Customer Churn Prediction Based on Feature Clustering and Nonparallel Support Vector Machine. International Journal of Information Technology & Decision Making, 2014. 13(05): p. 1013-1027.
[27] Li, N.-J., et al., Enhanced particle swarm optimizer incorporating a weighted particle. Neurocomputing, 2014. 124: p. 218-227.
[28] Chiu, J.-T. and C.-H. Lin, A Modified Particle Swarm Optimization Based on Eagle Foraging Behavior. International Journal of Information Technology & Decision Making, 2016. 15(03): p. 703-727.
[29] Clerc, M. and J. Kennedy, The particle swarm-explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 2002. 6(1): p. 58-73.
[30] Bamakan, S.M.H. and P. Gholami, A Novel Feature Selection Method based on an Integrated Data Envelopment Analysis and Entropy Model. Procedia Computer Science, 2014. 31: p. 632-638.
