Hindawi Publishing Corporation International Journal of Distributed Sensor Networks Volume 2015, Article ID 365869, 9 pages http://dx.doi.org/10.1155/2015/365869
Research Article
Feature Selection and Parameter Optimization of Support Vector Machines Based on Modified Cat Swarm Optimization
Kuan-Cheng Lin,1 Yi-Hung Huang,2 Jason C. Hung,3 and Yung-Tso Lin1
1 Department of Management Information Systems, National Chung Hsing University, Taichung 40227, Taiwan
2 Department of Mathematics Education, National Taichung University of Education, Taichung 40306, Taiwan
3 Department of Information Management, Overseas Chinese University, Taichung 40721, Taiwan
Correspondence should be addressed to Yi-Hung Huang;
[email protected]
Received 14 July 2014; Accepted 16 September 2014
Academic Editor: Neil Y. Yen
Copyright © 2015 Kuan-Cheng Lin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Recently, Internet of Things applications have created enormous volumes of data that are available for classification and prediction. Classifying big data requires an effective and efficient metaheuristic search algorithm to find the optimal feature subset. Cat swarm optimization (CSO) is a novel metaheuristic for evolutionary optimization based on swarm intelligence. CSO imitates the behavior of cats through two submodes: seeking and tracing. Previous studies have indicated that CSO algorithms outperform other well-known metaheuristics, such as genetic algorithms and particle swarm optimization. This study presents a modified version of cat swarm optimization (MCSO), capable of improving search efficiency within the problem space. The basic CSO algorithm was integrated with a local search procedure as well as the feature selection and parameter optimization of support vector machines (SVMs). Experimental results demonstrate that, for the given UCI datasets, MCSO achieves higher classification accuracy using subsets with fewer features than the original CSO algorithm. Moreover, the experiments identify the fittest CSO parameters and show that MCSO requires less training time than the original CSO to obtain results of higher accuracy. Therefore, MCSO is suitable for real-world applications.
1. Introduction

Recently, Internet of Things applications have created enormous volumes of data that are available for classification and prediction. Classifying big data requires an effective and efficient metaheuristic search algorithm to find the optimal feature subset. A wide range of metaheuristic algorithms, such as genetic algorithms (GA) [1], particle swarm optimization (PSO) [2], the artificial fish swarm algorithm (AFSA) [3], and the artificial immune algorithm (AIA) [4], have been developed to solve optimization problems in numerous domains, including project scheduling [5], intrusion detection [6, 7], botnet detection [8], and affective computing [9]. Cat swarm optimization (CSO) [10] is a recent development based on the behavior of cats, comprising two search modes: seeking and tracing. Seeking mode is modeled on a cat's ability to remain alert to its surroundings, even while at rest; tracing mode
emulates the way that cats trace and catch their targets. Experimental results have demonstrated that CSO outperforms PSO in functional optimization problems. Optimization problems can be divided into functional and combinatorial problems. Most functional optimization problems, such as the determination of extreme values, can be solved through calculus; however, combinatorial optimization cannot be dealt with so efficiently. Feature selection is particularly important in domains such as bioinformatics [11] and pattern recognition [12] for its ability to reduce computing time and enhance classification accuracy. Classification is a supervised learning technology, in which input data comprises features classified by support vector machines (SVMs) [13], neural networks [14], or decision trees [15]. The classifier is trained by the input data to build a classification model applicable to unknown class data. However, input data often contains redundant or irrelevant
features, which increase computing time and may even jeopardize classification accuracy. Selecting features prior to classification is an effective means of enhancing the efficiency and classification accuracy of a classifier. Two feature selection models are commonly used: filter and wrapper models [16]. Filter models evaluate feature subsets by calculating a number of defined criteria, whereas wrapper models evaluate feature subsets by building classification models and testing their accuracy. Filter models are faster but provide lower classification accuracy, while wrapper models tend to be slower but provide higher classification accuracy. Advancements in computing speed, however, have largely overcome the disadvantages of wrapper models, leading to their widespread adoption for feature selection. In 2006, Tu et al. proposed PSO-SVM [17], which obtains good results through the integration of feature subset optimization and parameter optimization by PSO. Researchers have demonstrated the superior performance of CSO-SVM over PSO-SVM [18]; however, its searching ability remains weak. In [19], the authors proposed a modified CSO, called MCSO, capable of enhancing the searching ability of CSO through the integration of feature subset optimization and parameter optimization for SVMs. However, the authors of [19] did not consider the fittest values of the CSO parameters, and for real-world big-data applications, the training time of classifiers is a critical factor. Hence, in this study, we extend the experiments to examine the CSO parameters and to verify that MCSO takes less training time than the original CSO to obtain results of higher accuracy.
2. Related Work

2.1. Support Vector Machine (SVM). Vapnik [21] first proposed the SVM method based on structural risk minimization theory in 1995. Since that time, SVMs have been widely applied to problems of classification and regression. The underlying principle of SVM theory involves establishing an optimal hyperplane with which to separate data obtained from different classes. Although more than one separating hyperplane may exist, just one hyperplane maximizes the distance between the two classes. Figure 1 presents an optimal hyperplane for the separation of two classes of data. In many cases, data is not linearly separable. Thus, a kernel function is applied to map data into a Vapnik-Chervonenkis dimensional space, within which a hyperplane is identified for the separation of classes. Common kernel functions include radial basis functions (RBFs), polynomials, and sigmoid kernels, as shown in (1), (2), and (3), respectively. The RBF kernel is

$\Phi(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$. (1)

The polynomial kernel is

$\Phi(x_i, x_j) = (1 + x_i \cdot x_j)^d$. (2)
Figure 1: Optimal hyperplane (labels: margin, hyperplane).
The sigmoid kernel is

$\Phi(x_i, x_j) = \tanh(k\, x_i \cdot x_j - \delta)$. (3)
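As a small illustration of (1), the RBF kernel value for two feature vectors can be computed directly. The following is a minimal Python sketch; the function name and sample vectors are ours, not from the paper:

```python
import math

def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel of (1): exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

# Identical points map to 1; the value decays toward 0 as the distance
# grows, at a rate controlled by gamma (the "radius" of the RBF).
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5))  # exp(-12.5)
```

Larger γ therefore shrinks the region in which two points are considered similar, which is why γ is tuned alongside the penalty coefficient C later in the paper.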
This study employed LIBSVM [22] as the classifier with an RBF as its kernel function, because RBFs can handle nonlinear cases using fewer hyperparameters than the polynomial kernel requires [23]. The SVM is regarded as a black box that receives input data and thereby builds a classifier.

2.2. Cat Swarm Optimization (CSO). CSO is a newly developed algorithm based on the behavior of cats, comprising two search modes: seeking and tracing. Seeking mode is modeled on a cat's ability to remain alert to its surroundings, even while at rest; tracing mode emulates the way that cats trace and catch their targets. These modes are used to solve optimization problems. In CSO, every cat in a given population is given a position, a velocity, and a fitness value. The position represents a point corresponding to a possible solution set. The velocity is a variance of distance in a D-dimensional space. The fitness value refers to the quality of the solution set. In every generation, CSO randomly distributes cats into either seeking mode or tracing mode, by which the position of each cat is then altered. Finally, the best solution set is indicated by the position of the cat with the highest fitness value.

2.2.1. CSO Process. The mixture ratio (MR) is the percentage of cats distributed to tracing mode (e.g., if the total number of cats is 50 and the value of MR is 0.2, the number of cats assigned to tracing mode will be 10). Because cats spend most of their time resting, the value of MR is small. The process is outlined as follows.
(1) Randomly initialize N cats with positions and velocities within a D-dimensional space.
(2) Distribute the cats to tracing or seeking mode; MR determines the number of cats in each mode.
(3) Measure the fitness value of each cat.
Randomly initialize cats.
WHILE (terminal condition not reached)
    Distribute cats to seeking/tracing mode.
    FOR (i = 0; i < NumCat; i++)
        Measure fitness for cat_i.
        IF (cat_i is in seeking mode) THEN
            Search by seeking mode process.
        ELSE
            Search by tracing mode process.
        END IF
    END FOR
END WHILE
Output optimal solution.

Pseudocode 1: Pseudocode of CSO.
(4) Search with each cat according to its mode in each iteration. The processes involved in the two modes are described in the following subsections.
(5) Stop the algorithm if the terminal criteria are satisfied; otherwise, return to step (2) for the following iteration.
Pseudocode 1 presents the pseudocode of the main processes of CSO.

2.2.2. Seeking Mode. Seeking mode has four operators: seeking memory pool (SMP), self-position consideration (SPC), seeking range of the selected dimension (SRD), and counts of dimension to change (CDC). SMP defines the pool size of the seeking memory; for example, an SMP value of 5 indicates that each cat can store 5 solution sets as candidates. SPC is a Boolean value; if SPC is true, one position within the memory retains the current solution set and is not changed. SRD defines the maximum and minimum values of the seeking range, and CDC represents the number of dimensions to be changed in the seeking process. The seeking process is outlined as follows.
(1) Generate SMP copies of the position of the current cat. If SPC is true, one of the copies retains the position of the current cat and immediately becomes a candidate; the other copies must be changed before becoming candidates. Otherwise, all SMP copies perform searching, resulting in changes to their positions.
(2) Each copy to be changed randomly alters its position in CDC percent of its dimensions. First, select CDC percent of the dimensions; each selected dimension then randomly increases or decreases its current value by SRD percent. After being changed, the copies become candidates.
(3) Calculate the fitness value of each candidate via the fitness function.
(4) Calculate the probability of each candidate being selected. If all candidate fitness values are the same,
then the selected probability (P_i) is equal to one; otherwise, P_i is calculated via (4), where i is between 0 and SMP, P_i is the probability that candidate i will be selected, FS_max and FS_min represent the maximum and minimum overall fitness values, and FS_i is the fitness value of candidate i. If the goal is to find the solution set with the maximum fitness value, then FS_b = FS_min; otherwise, FS_b = FS_max. Calculating probability with this function gives a better candidate a higher chance of being selected, and vice versa:

$P_i = \dfrac{|FS_i - FS_b|}{FS_{\max} - FS_{\min}}$. (4)

(5) Randomly select one candidate according to the selected probability (P_i), and move the current cat to the selected candidate's position.

2.2.3. Tracing Mode. Tracing mode represents cats tracing a target, as follows.
(1) Update the velocities using (5).
(2) Update the position of the current cat according to (6):

$v_{k,d}^{t+1} = v_{k,d}^{t} + r_1 \times c_1 \times (x_{\mathrm{best},d}^{t} - x_{k,d}^{t}), \quad d = 1, 2, \ldots, D,$ (5)

$x_{k,d}^{t+1} = x_{k,d}^{t} + v_{k,d}^{t+1}.$ (6)

Here $x_{k,d}^{t}$ and $v_{k,d}^{t}$ are the position and velocity of the current cat_k in dimension d at iteration t; $x_{\mathrm{best},d}^{t}$ denotes the position of the cat with the best solution set in the population; $c_1$ is a constant; and $r_1$ is a random number between 0 and 1.
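The selection rule of (4) and the tracing updates of (5) and (6) can be sketched as follows. This is a Python illustration with list-based positions; the function names are ours, not from the original CSO implementation:

```python
import random

def selection_probabilities(fitness, maximize=True):
    """Selection probability of (4): P_i = |FS_i - FS_b| / (FS_max - FS_min).
    FS_b = FS_min when maximizing fitness, FS_max when minimizing."""
    fs_max, fs_min = max(fitness), min(fitness)
    if fs_max == fs_min:                 # all candidates equal: P_i = 1
        return [1.0] * len(fitness)
    fs_b = fs_min if maximize else fs_max
    return [abs(f - fs_b) / (fs_max - fs_min) for f in fitness]

def tracing_update(x, v, x_best, c1=2.0, rng=random):
    """One tracing-mode step: velocity update (5), then position update (6)."""
    r1 = rng.random()                    # random number in [0, 1)
    v_new = [vd + r1 * c1 * (bd - xd) for vd, xd, bd in zip(v, x, x_best)]
    x_new = [xd + vd for xd, vd in zip(x, v_new)]
    return x_new, v_new

# The best candidate (fitness 3) gets probability 1, the worst gets 0.
print(selection_probabilities([1.0, 2.0, 3.0]))  # [0.0, 0.5, 1.0]
```

The tracing step pulls each cat toward the best cat's position, with the step size randomized by r1, which is what allows the swarm to converge without all cats following identical trajectories.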
3. Proposed MCSO-SVM

Neither seeking mode nor tracing mode is capable of retaining the best cats; however, performing a local search near the best solution sets can help in the selection of better solution sets. This paper proposes a modified cat swarm optimization method (MCSO) to improve searching ability in the vicinity of the best cats. Before presenting the process of the algorithm, we must address a number of relevant issues.

3.1. Classifiers. Classifiers are algorithms that train on data to construct models used to assign unknown data to the categories to which they belong. This paper adopted an SVM as the classifier. SVM theory is derived from statistical learning theory, based on structural risk minimization. SVMs are used to find a hyperplane for the separation of two groups of data. This paper regards the SVM as a black box that receives training data and produces a classification model.

3.1.1. Solution Set Design. Figure 2 presents a solution set containing data in two parts: the SVM parameters (C and γ)
Figure 2: Representation of a solution set (C, γ, F_1, ..., F_i, ..., F_n).
and the feature subset. Parameter C is the penalty coefficient, which controls the tolerance for error, and parameter γ is the RBF kernel parameter, which controls the radius of the RBF. The continuous values of these two parameters are converted into binary coding. The remainder comprises the feature subset variables (F_1 to F_n), where n is the number of features. The value of each feature variable falls between 0 and 1: if F_i is greater than 0.5, its corresponding feature is selected; otherwise, it is not.

3.1.2. Mutation Operation. Mutation operators are used to locate new solution sets in the vicinity of other solution sets. When a solution set mutates, every feature has a chance to change. Figure 3 presents an example of mutation, in which the first, second, and fourth features of the solution set were assigned for mutation; accordingly, the first feature is converted from selected (0.8) to unselected (0), the second from unselected (0.3) to selected (1), and the fourth from selected (0.9) to unselected (0). This operation is used only for the best solution sets; therefore, it is not necessary to maintain the actual position; recording the selected features is sufficient. In addition, C and γ must be converted to binary so that they can undergo the mutation operation in the following search.

3.1.3. Evaluating Fitness. This study employed k-fold cross-validation [24] to test the search ability of the algorithms. k was set to 5, indicating that 80% of the original data was randomly selected as training data and the remainder was used as testing data. The features selected by the solution set under evaluation are then retained in both the training and testing data. The training data is input into an SVM to build a classification model used for the prediction of the testing data. The prediction accuracy of this model represents its fitness. To compare solution sets with the same fitness, we considered the number of selected features.
If prediction accuracy is the same, the solution set with fewer selected features is considered superior.

3.1.4. Proposed MCSO Approach. The steps of the proposed MCSO-SVM are presented as follows.
(1) Randomly generate N solution sets and velocities within a D-dimensional space, represented as cats. Define the following parameters: seeking memory pool (SMP), seeking range of the selected dimension (SRD), counts of dimension to change (CDC), mixture ratio (MR), number of best solution sets (NBS), mutation rate for the best solution sets (MR Best), and number of trying mutations (NTM).
(2) Evaluate the fitness of every solution set using the SVM.
(3) Copy the NBS best cats into the best solution set (BSS).
(4) Assign cats to seeking mode or tracing mode based on MR.
Figure 3: An example of mutation (before mutation: 0.8, 0.3, 0.4, 0.9, 0.7; after mutation: 0, 1, 0.4, 0, 0.7).
Figure 4: Flow chart of MCSO (initialization → distribution → seeking/tracing mode process → update best cats → search by mutation for best cats → terminate check → optimal solution).
(5) Perform the search operation corresponding to the mode (seeking/tracing) assigned to each cat.
(6) Update the BSS: for every cat after the searching process, if it is better than the worst solution set in the BSS, replace that worst solution set with it.
(7) For each solution set in the BSS, search by the mutation operation NTM times. If the result is better than the worst solution set in the BSS, replace that worst solution set with it.
(8) If the terminal criteria are satisfied, output the best subset; otherwise, return to step (4).
Figure 4 presents a flow chart of MCSO. The bold area indicates the processes added in the modification of CSO; the other processes run as in CSO. After searching in accordance with seeking and tracing modes, the cats with higher fitness can be updated into the best solution set (BSS).
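The encoding of Section 3.1.1, the mutation of Section 3.1.2, and the tie-breaking rule of Section 3.1.3 can be sketched as follows. This is a hypothetical Python sketch; names such as decode, mutate, and better are ours, not from the paper's implementation:

```python
import random

def decode(solution):
    """Split a solution set [C, gamma, F1, ..., Fn] into SVM parameters
    and a feature mask: feature i is selected when F_i > 0.5."""
    C, gamma, features = solution[0], solution[1], solution[2:]
    return C, gamma, [f > 0.5 for f in features]

def mutate(mask, rate, rng=random):
    """Flip each selected/unselected bit with probability `rate` (MR Best)."""
    return [(not m) if rng.random() < rate else m for m in mask]

def better(a, b):
    """a and b are (accuracy, number_of_selected_features) pairs; equal
    accuracy is broken in favor of fewer selected features."""
    return a[0] > b[0] or (a[0] == b[0] and a[1] < b[1])

C, gamma, mask = decode([10.0, 0.5, 0.8, 0.3, 0.4, 0.9, 0.7])
print(mask)                            # [True, False, False, True, True]
print(better((91.45, 5), (91.45, 7)))  # True: same accuracy, fewer features
```

In step (7) of MCSO, mutate would be applied NTM times to each mask in the BSS, and better would decide whether a mutated solution set replaces the worst member of the BSS.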
Table 1: Datasets from UCI repository.

Number  Dataset     Classes  Features  Instances
1       Australian  2        14        690
2       Bupa        2        6         345
3       German      2        24        1000
4       Glass       6        9         214
5       Ionosphere  2        34        351
6       Pima        2        8         768
7       Vehicle     4        18        846
8       Vowel       11       10        528
9       Wine        3        13        178
Table 2: Parameters of MCSO.

Parameter       N   SMP  SRD  CDC  MR   c1  C             γ
Value or range  40  5    0.6  80%  20%  2   [0.01, 1024]  [0.00001, 8]
A mutation operation is then applied to the BSS to search for other solution sets. If a solution set shows improvement following mutation, it replaces the original solution set in the BSS.

Table 3: Experimental results comparing the parameters of MCSO.

NBS  MR Best  NTM  Australian  German  Vehicle  Average
10   0.05     5    91.45       81.70   89.60    87.58
20   0.05     5    90.87       82.30   90.19    87.79
30   0.05     5    91.30       82.80   90.07    88.06
10   0.1      5    91.01       82.30   90.31    87.87
10   0.2      5    91.16       81.40   89.37    87.31
10   0.05     10   91.30       81.90   89.84    87.68
10   0.05     20   91.45       82.50   90.55    88.17
5    0.02     5    91.01       82.20   89.13    87.45
30   0.2      20   91.30       82.00   91.02    88.11

4. Experimental Results

This study followed the convention of adopting UCI datasets [20] to measure the classification accuracy of the proposed MCSO-SVM method. Table 1 displays the datasets employed. The experimental environment was as follows: a desktop computer running Windows 7 on an Intel Core i5-2400 CPU (3.10 GHz) with 4 GB RAM. Dev C++ 4.9.9.2 was used in conjunction with LIBSVM for development, with a radial basis function (RBF) as the SVM kernel function. Table 2 lists the parameter values used in the experiment. The terminal condition was the inability to identify a solution set with higher classification accuracy despite continuous iteration. We employed 5-fold cross-validation; that is, all values were verified five times to ensure
the reliability of the experiment. The dataset was randomly separated into 5 segments. In each iteration, one segment was selected as test data (nonrepetitively) and the others were used as training data. The five tests were averaged to obtain a value of classification accuracy.

4.1. Experiment Parameters. Various experiments were performed to test the parameters of MCSO-SVM, covering NBS, MR Best, and NTM in nine groups. Table 3 lists the experimental results comparing the nine groups of parameters. The highest average classification accuracy for the three datasets was obtained using the following parameters: NBS = 10, MR Best = 0.05, and NTM = 20.

4.2. Comparison with CSO-SVM. The best of the nine parameter combinations was applied in the comparison of CSO and MCSO. The results are presented in Table 4, which compares MCSO-SVM and CSO-SVM with regard to classification accuracy and the number of selected
Figure 5: Relationship between time and classification accuracy using CSO and MCSO (Australian, Bupa, German, Glass, Ionosphere, and Pima).
Figure 6: Relationship between time and classification accuracy using CSO and MCSO (Vehicle, Vowel, and Wine).
features. For the Glass and Pima datasets, MCSO-SVM achieved higher classification accuracy with fewer selected features than CSO-SVM. For the Australian, Bupa, German, Ionosphere, and Vehicle datasets, MCSO-SVM achieved higher classification accuracy. Owing to the advantages afforded by searching in the vicinity of the best solution sets, MCSO-SVM clearly outperformed CSO-SVM.

Training time is a crucial issue in many situations, in which the best solution set is sacrificed for one that is good enough. The above experiment was used to illustrate the relationship between time and classification accuracy in CSO and MCSO, using the nine datasets described in Table 1, as shown in Figures 5 and 6. The horizontal axis represents the execution time of the processes (in seconds), while the vertical axis presents classification accuracy. The MCSO curve appears in the upper-left region of Figures 5 and 6, indicating that MCSO takes less time to obtain results of higher accuracy. Thus, MCSO is suitable for real-world big-data applications.
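The 5-fold cross-validation protocol used throughout the experiments can be sketched as follows. This is a Python sketch; the function name and the fixed random seed are ours, not from the paper:

```python
import random

def five_fold_splits(n_samples, k=5, seed=0):
    """Randomly partition sample indices into k nonoverlapping segments;
    each segment serves once as test data while the remaining segments
    form the training data. The k accuracies are then averaged."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# For the Wine dataset (178 instances), each fold holds about 20% of the data.
for train, test in five_fold_splits(178):
    print(len(train), len(test))  # prints 142 36 three times, then 143 35 twice
```

Because every instance appears in exactly one test segment, the averaged accuracy reflects predictions on the whole dataset rather than on a single held-out split.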
5. Conclusions This study developed a modified version of CSO (MCSO) to improve searching ability by concentrating searches in the vicinity of the best solution sets. We then combined this with an SVM to produce the MCSO-SVM method of feature selection and SVM parameter optimization to improve classification accuracy. Evaluation using UCI datasets demonstrated
Table 4: Comparison results for CSO-SVM and MCSO-SVM.

Dataset     Original features  CSO-SVM (selected / accuracy %)  MCSO-SVM (selected / accuracy %)
Australian  14                 5.0 / 90.87                      5.0 / 91.45**
Bupa        6                  3.8 / 79.13                      4.0 / 79.42**
German      24                 6.6 / 80.00                      8.2 / 82.50**
Glass       9                  4.4 / 83.65                      4.2 / 84.60***
Ionosphere  34                 7.2 / 99.72                      7.2 / 100**
Pima        8                  4.6 / 81.25                      3.6 / 81.78***
Vehicle     18                 7.8 / 86.41                      10.2 / 90.55**
Vowel       10                 7.8 / 100                        7.0 / 100*
Wine        13                 3.0 / 100                        2.8 / 100*

* Accuracy equal to the other method, but fewer features selected.
** Higher accuracy.
*** Higher accuracy and fewer features selected.
that the MCSO-SVM method requires less time than CSO-SVM to obtain classification results of superior accuracy.
Conflict of Interests The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] J. H. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, Mich, USA, 1975.
[2] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, pp. 1942–1948, Perth, Australia, December 1995.
[3] X.-L. Li, Z.-J. Shao, and J.-X. Qian, “Optimizing method based on autonomous animats: fish-swarm algorithm,” System Engineering Theory and Practice, vol. 22, no. 11, pp. 32–38, 2002.
[4] A. Ishiguro, T. Kondo, Y. Watanabe, Y. Uchikawa, and Y. Shirai, “Emergent construction of artificial immune networks for autonomous mobile robots,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 2, pp. 1222–1228, October 1997.
[5] K. Kim, M. Gen, and M. Kim, “Adaptive genetic algorithms for multi-resource constrained project scheduling problem with multiple modes,” International Journal of Innovative Computing, Information and Control, vol. 2, no. 1, pp. 41–49, 2006.
[6] Y. Li, L. Guo, Z.-H. Tian, and T.-B. Lu, “A lightweight web server anomaly detection method based on transductive scheme and genetic algorithms,” Computer Communications, vol. 31, no. 17, pp. 4018–4025, 2008.
[7] M.-Y. Su, “Real-time anomaly detection systems for Denial-of-Service attacks by weighted k-nearest-neighbor classifiers,” Expert Systems with Applications, vol. 38, no. 4, pp. 3492–3498, 2011.
[8] K.-C. Lin, S.-Y. Chen, and J. C. Hung, “Botnet detection using support vector machines with artificial fish swarm algorithm,” Journal of Applied Mathematics, vol. 2014, Article ID 986428, 9 pages, 2014.
[9] K. C. Lin, T.-C. Huang, J. C. Hung, N. Y. Yen, and S. J. Chen, “Facial emotion recognition towards affective computing-based learning,” Library Hi Tech, vol. 31, no. 2, pp. 294–307, 2013.
[10] S. C. Chu and P. W. Tsai, “Computational intelligence based on the behavior of cats,” International Journal of Innovative Computing, Information and Control, vol. 3, no. 1, pp. 163–173, 2007.
[11] R. Stevens, C. Goble, P. Baker, and A. Brass, “A classification of tasks in bioinformatics,” Bioinformatics, vol. 17, no. 2, pp. 180–188, 2001.
[12] J. K. Kishore, L. M. Patnaik, V. Mani, and V. K. Agrawal, “Application of genetic programming for multicategory pattern classification,” IEEE Transactions on Evolutionary Computation, vol. 4, no. 3, pp. 242–258, 2000.
[13] T. S. Furey, N. Cristianini, N. Duffy, D. W. Bednarski, M. Schummer, and D. Haussler, “Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, vol. 16, no. 10, pp. 906–914, 2000.
[14] G. P. Zhang, “Neural networks for classification: a survey,” IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451–462, 2000.
[15] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco, Calif, USA, 1993.
[16] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Springer, 1998.
[17] C. J. Tu, L.-Y. Chuang, J.-Y. Chang, and C.-H. Yang, “Feature selection using PSO-SVM,” in Proceedings of the International Multiconference of Engineers and Computer Scientists (IMECS ’06), pp. 138–143, 2006.
[18] K.-C. Lin and H.-Y. Chien, “CSO-based feature selection and parameter optimization for support vector machine,” in Proceedings of the Joint Conferences on Pervasive Computing (JCPC ’09), pp. 783–788, Taipei City, Taiwan, December 2009.
[19] K.-C. Lin, Y.-H. Huang, J. C. Hung, and Y.-T. Lin, “Modified cat swarm optimization algorithm for feature selection of support vector machines,” in Frontier and Innovation in Future Computing and Communications, vol. 301 of Lecture Notes in Electrical Engineering, pp. 329–336, 2014.
[20] S. Hettich, C. L. Blake, and C. J. Merz, “UCI Repository of Machine Learning Databases,” 1998, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[21] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[22] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” 2011, http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[23] C. W. Hsu, C. C. Chang, and C. J. Lin, A Practical Guide to Support Vector Classification, 2003, http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.
[24] S. L. Salzberg, “On comparing classifiers: pitfalls to avoid and a recommended approach,” Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 317–328, 1997.