
2011 3rd Conference on Data Mining and Optimization (DMO) 28-29 June 2011, Selangor, Malaysia

A Genetic Based Wrapper Feature Selection Approach Using Nearest Neighbour Distance Matrix

Mohd Shamrie Sainin
Department of Computer Science, College of Arts and Sciences, Universiti Utara Malaysia, Sintok, Kedah, Malaysia. [email protected]

Rayner Alfred
School of Engineering and Information Technology, Universiti Malaysia Sabah, Locked Bag 2073, Kota Kinabalu, Sabah, Malaysia. [email protected]

Abstract—Feature selection for data mining optimization is in high demand, especially for data with high-dimensional feature vectors. Feature selection is a method used to select the best feature (or combination of features) for the data in order to achieve a similar or better classification rate. Currently, there are three types of feature selection methods: filter, wrapper and embedded. This paper describes a genetic based wrapper approach that optimizes the feature selection process around a classification technique called the supervised Nearest Neighbour Distance Matrix (NNDM). The method is implemented and tested on several datasets obtained from the UCI Machine Learning Repository and other sources. The results demonstrate a significant impact on predictive accuracy when feature selection is combined with the supervised NNDM in classifying new instances. It can therefore be used in other applications that require feature dimension reduction, such as image and bioinformatics classification.

Keywords—machine learning; data mining; data mining optimization; nearest neighbour; distance matrix; classification; feature selection; genetic algorithm

I. INTRODUCTION

The growing use of data with large numbers of attributes or features has become a main concern of data mining and machine learning applications, especially in pattern recognition. The problem of too many features in a dataset, known as the "curse of dimensionality", is one of the best-known challenges in pattern recognition. On the other hand, the question "which attribute is more important?" is always among the first to be asked in data mining. Feature selection has therefore attracted significant attention from researchers. Feature selection is defined as a method to select a subset of features from a list of candidate features such that some classification method achieves better classification accuracy [1]. The objectives of feature selection are to avoid overfitting and improve performance, to reduce the dimensionality of the input data to the classifier, and to gain deeper insight into the underlying processes (e.g. how the data is generated) [2]. Although feature selection comes with certain disadvantages and side effects, as described in [2], it can be considered one of the pre-processing techniques that are important in data mining [3]. This preprocessing step reveals additional information and insight into the general pattern before the actual classifier is applied to the data. Fig. 1 shows the general feature selection process.


Figure 1. General feature selection processes [4].

Feature selection techniques can be divided into three categories, based on how the feature selection is combined with the construction of the classifier model: filter, wrapper and embedded model search [1]. Each method has its own strengths and weaknesses, depending on how the data and the classifier are set up to solve the problem at hand. The key differences between the filter and wrapper methods are depicted in Fig. 2.

Figure 2. The key differences of the filter and wrapper methods, based on how the learning algorithm and the feature subset search algorithm are integrated into the process [5].

In this paper, a genetic algorithm is implemented to find the best set of features for a wrapper method built around a nearest neighbour classifier called the Nearest Neighbour Distance Matrix (NNDM). This paper is organized as follows: Section 2 discusses related work that motivates the implementation of the genetic based wrapper feature selection. Section 3 briefly explains the wrapper feature selection algorithm based on a genetic algorithm (GA) and the classifier algorithm. The experimental setup and results are discussed in Section 4. Finally, the conclusion and perspectives of this work are presented in Section 5.

II. BACKGROUND

A genetic algorithm (GA) is considered a form of inductive learning [7] and belongs to the family of evolutionary algorithms [8]. It is used mainly for optimization problems, owing to its natural selection mechanism: an initial population of random candidate solutions is evolved until a satisfactory solution is reached. In feature selection, a GA is mainly used to optimize the selection of a feature subset that maximizes some fitness function.

A genetic algorithm consists of three important operators, namely selection, crossover and mutation. In the selection phase, the GA retains a set of good strings (as defined by the fitness function) to be reproduced in the next generation. Through the generation cycle, new offspring are created by the genetic operators crossover and mutation. The crossover operator randomly picks two chromosomes as parents and exchanges their values from a selected point onward; the product is two new offspring with largely similar features. While crossover creates two new offspring, the mutation operator alters the content of a string to another value (a binary 0 is changed to 1, or vice versa). Crossover and mutation are each controlled by a small constant that acts as the probability of the operation taking place. The mutation operator prevents any stagnation that might occur during the search process [9].

In each generation, the performance of the population is evaluated, the best fitness is recorded, and the termination criterion is tested. While the termination criterion is not satisfied, the GA operators are applied through the generation cycle and the new generation is re-evaluated. When the criterion is met, the GA stops and presents the final solution.

Various methods have been applied to optimize the feature selection problem, and the GA is one of the most popular. Since a GA can search a large space without much prior knowledge, it is considered a good choice for a robust feature selection method. It also performs a global search, which has demonstrated significant improvement over other search algorithms. The method was first reported in [10], where a GA is utilized in a beam search for the nearest neighbour classifier. The work was then extended in [11] using a neural network classifier, showing improved effectiveness and execution times. [7] and [12] presented work on using a GA in feature selection for texture classification. Another GA and k-nearest neighbour combination was reported in [13] as an extension and modification of the original work in [10]; this modified GA-kNN scales each of the kNN features to obtain optimal class separation. In separate work, [14] presented a similar method of feature selection with a GA to improve a rule induction system, showing that their multi-strategy approach is capable of solving the problem of learning rules for texture image classification.
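The generational cycle described above can be summarized in a short sketch. The following Python fragment is illustrative only: the helper names (init_population, evaluate_fitness, select, crossover, mutate) and the default limits are assumptions rather than the exact implementation used in this paper; the concrete operators adopted here are described in Section III.

def run_ga(init_population, evaluate_fitness, select, crossover, mutate,
           max_generations=100, target_fitness=100.0):
    """Generic generational GA cycle: evaluate, select, recombine, mutate."""
    population = init_population()
    for generation in range(max_generations):
        # Evaluate every chromosome (a bit string marking selected features).
        scores = [evaluate_fitness(chrom) for chrom in population]
        best_score = max(scores)
        best_chrom = population[scores.index(best_score)]
        # Termination criterion: perfect fitness or generation limit.
        if best_score >= target_fitness:
            break
        # Selection keeps good strings for reproduction in the next generation.
        parents = select(population, scores)
        # Crossover and mutation create the offspring of the next generation.
        offspring = []
        for i in range(0, len(parents) - 1, 2):
            child1, child2 = crossover(parents[i], parents[i + 1])
            offspring.extend([mutate(child1), mutate(child2)])
        population = offspring
    return best_chrom, best_score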


There are various other uses of GA in feature selection, such as [1, 9, 15-22]. All of these works have contributed to the knowledge of optimizing the feature selection process. In a wrapper, feature subset selection is done by using the induction algorithm as a black box, so no knowledge of the algorithm's internals is needed [16]. A wrapper method is therefore heavily dependent on the classifier: the induction algorithm is used to evaluate each candidate feature subset as part of the search. Many induction algorithms have been used as part of the GA process for feature selection; one of them is the nearest neighbour classifier, as described in [13, 23]. The recent nearest neighbour classifier called the Nearest Neighbour Distance Matrix (NNDM), described in [6], shows promising performance and is therefore used as the classifier for the feature selection process described in this paper. In other words, inspired by the uses of GA in feature selection and by the NNDM algorithm above, a genetic based wrapper for feature selection is proposed.

III. THE APPROACH

In this section, the proposed method, a GA based wrapper feature selection around the NNDM classifier, is explained. The basic block diagram of the approach is depicted in Fig. 3. Following the diagram in Fig. 1, two portions of the dataset are used to find the feature subset that provides the best estimated accuracy in the final evaluation.

Figure 3. The architecture of the GA based wrapper feature selection using the NNDM algorithm.

A. The NNDM Algorithm

The nearest neighbour distance matrix (NNDM) is a matrix of distances over the training data. The NNDM, an n x n matrix, holds the distance between every item in the dataset and every other item in the dataset. This distance matrix is created before any new query is classified, in order to speed up the classification task. NNDM is similar to k-NN in that a distance measure is used to classify the query instance. However, in NNDM the classification of a new query instance is based on the distance matrix: the distance of the query instance from the training instances is calculated first, and the classification is then voted using the distance matrix obtained from the training phase. Whereas in the traditional k-NN algorithm k refers to the k nearest instances to the query, k in NNDM refers to the k nearest instances taken from the distance matrix.

Two techniques are implemented in the NNDM classifier, called Unsupervised NNDM (UNNDM) and Supervised NNDM (SNNDM). UNNDM does not consider the class label when constructing the distance matrix prior to the classification task; the distance values are calculated purely from the standard Euclidean distance between two instances. SNNDM, on the other hand, considers the class labels of the data through a loss function applied when constructing the distance matrix. Besides storing the training instances, SNNDM groups similar instances together when the distance loss function is applied. The distance loss function, taken from [24], is a very intuitive upgrade to the NNDM algorithm: it consists of two competing terms, where the first term pulls target neighbours closer and the second term pushes differently labelled examples away from the target instance. The basic implementation of NNDM has been discussed in [6]. The pseudo-code for SNNDM with regard to our GA feature selection is shown in Fig. 4, where D1 is the Euclidean distance between xi and xj, modified by the loss function in D2.

Function Training for selected attributes, A
  Define training NNDM array, M
  Iterate Training, xi, xj
    D1 = SQRT(d(xi - xj))
    D2 = D1 * [(1-µ)εpull(L) + µεpush(L)]
    Build matrix M = D2
  End
  Sort and Index M as INDEX(TADM)
End

Figure 4. The pseudo-code for the Supervised NNDM.

Only the selected attributes are considered when calculating the distance measure between instances, as shown in Fig. 4. The selected feature subset is used to create a distance matrix (also known as an affinity matrix). Furthermore, as described in D2 (Fig. 4) for SNNDM, the loss function modifies the distances in the matrix in order to separate the within-class nearest neighbours from non-class members. The distance matrix is used to classify training samples (during the GA process) and to evaluate the final feature subset at the end of the GA process.

B. The GA Approach

The NNDM algorithm alone cannot tell which subset of attributes is most important for discriminating the samples; it only computes the similarity of the samples while ignoring the contribution of each corresponding attribute. The motivation for using a GA in this work is that a GA is well suited to this optimization problem. The GA based wrapper feature selection is adopted to optimize the possible combinations of attributes that best describe the dataset while maintaining high classification rates.

The initial population for the feature selection is generated by one of two methods: randomly, or using Information Gain (IG) as computed via Equation 1. Random population generation simply sets randomly chosen features in the data to '1' (ON) and the remainder to '0' (OFF). The second method uses information gain for the initial population generation: an IG threshold is defined, and only the features whose IG meets the criterion (equal or larger) are set to '1' (ON), while features with low IG values are left as '0' (OFF). The IG of each attribute (feature) is calculated beforehand, when the dataset is loaded. Given an information gain threshold ig, the generated set of attributes A' ⊆ A contains an attribute a only if IG(a) ≥ ig. This threshold is initially set at two thirds of the way through the IG values, so that low IG values are discarded. The latter method is useful because only the initially 'good' attributes, based on their IG, are selected, shortening the GA process while producing a higher classification rate. The pseudo code for this method is shown in Fig. 5.

Function InitGen(ig)
  For i = 0 to A.length
    id = random() * A.length
    gain = getIG(id)
    If (bits[id] != 1 && gain >= ig)
      bits[id] = 1
      count++
    Else
      bits[id] = 0
    End
  End
  If count < 2
    InitGen(ig)
  End
End

Figure 5. The pseudo code for population generation based on the attribute ig threshold.
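A compact version of this initialization is sketched below in Python, assuming the per-attribute IG values have already been computed. The threshold derivation is one plausible reading of the three-way partition described in Section IV (IG1 keeps roughly the top two thirds of attributes by IG, IG2 the top third); it is not necessarily the exact cut used in the experiments.

import random

def ig_threshold(ig_values, keep_portion=2.0 / 3.0):
    """Threshold so that roughly the top 'keep_portion' of attributes (by IG) pass;
    keep_portion=2/3 loosely corresponds to IG1 and keep_portion=1/3 to IG2."""
    ranked = sorted(ig_values, reverse=True)
    cut = max(1, int(len(ranked) * keep_portion)) - 1
    return ranked[cut]

def init_chromosome(ig_values, ig):
    """Turn on a random subset of attributes whose IG meets the threshold."""
    n = len(ig_values)
    bits = [0] * n
    for _ in range(n):
        idx = random.randrange(n)
        if bits[idx] != 1 and ig_values[idx] >= ig:
            bits[idx] = 1
    # Re-try if fewer than two attributes were switched on, as in Fig. 5
    # (this assumes at least two attributes actually meet the threshold).
    if sum(bits) < 2:
        return init_chromosome(ig_values, ig)
    return bits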

The IG used in the second method is computed with the multi-interval discretization of continuous-valued attributes approach, which calculates the information gain of each attribute. This approach was introduced by [25], where a decision criterion called the minimum description length principle (MDLP) criterion is applied: a partition is accepted only if

Gain(A, T; S) > log2(N - 1) / N + Δ(A, T; S) / N        (1)

where T is a partition of the set S into samples S1 and S2, N is the number of instances in S,

Δ(A, T; S) = log2(3^k - 2) - [k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)],

k, k1 and k2 are the numbers of classes represented in S, S1 and S2 respectively, and Ent is the entropy induced by T. The algorithm used for splitting and IG calculation is shown in Fig. 6.


Function MultipleSplitting(Av)
  for i = 1 to Training.length on column Av
    if (f(Avi) == f(Avi-1) || Avi-1 == Avi)
      count++
    else
      if count >= 2
        currentCutPoint = (Avi-1 + Avi) / 2.0
        Calculate currentEntropy
        Select bestEntropy
        numCutPoints++
      end
      count = 1
      f(Avi-1) = f(Avi)
    end
  end
  gain = priorEntropy - bestEntropy
  if gain > MDLP
    merge cut points
  end
  return bestGain for Av
End

Figure 6. Splitting algorithm for determining the IG of each attribute with MDLP.
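As a rough illustration of the gain computation behind Fig. 6, the sketch below evaluates a single candidate cut point on one attribute and applies the MDLP acceptance test of Equation 1. It is a simplified, assumed version: the full routine in Fig. 6 scans all candidate cut points of an attribute and keeps the best one, which is omitted here.

import math
from collections import Counter

def entropy(labels):
    """Class entropy of a list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def mdlp_gain(values, labels, cut):
    """Information gain of splitting one attribute at 'cut', plus the MDLP
    acceptance test of Equation 1 (assumes the cut leaves both sides non-empty)."""
    left = [l for v, l in zip(values, labels) if v <= cut]
    right = [l for v, l in zip(values, labels) if v > cut]
    n = len(labels)
    ent_s = entropy(labels)
    ent_split = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = ent_s - ent_split
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    accepted = gain > (math.log2(n - 1) / n + delta / n)
    return gain, accepted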

In the selection phase, the roulette method is applied: the best chromosome (with maximum fitness) is selected, and the algorithm attempts to reproduce it by replacing the minimum-fitness element of the population. To accomplish the optimization task, the NNDM classifier acts as the induction algorithm (evaluation function) and evaluates the fitness of the new population. As seen in Fig. 3, the GA process interacts with the NNDM to evaluate the fitness of each candidate feature subset throughout the generations. Fig. 7 shows how the NNDM is used during the GA process to calculate the fitness of the currently selected feature subset. The fitness is defined as the classification accuracy over the testing data using the selected features (attributes) of the chromosome.

Classify selected attributes from GA
  Iterate Testing instances, xq
    Get Testing xq with selected attributes
    For xi to xn in Training,
      Get sim(xqi, xi)
    Sort sim(xqi, xi), Index sim as INDEX(Testing)
    Get first item on INDEX(Testing)
    For i = 1,..,n of INDEX(Training)
      Get n most similar instances from INDEX(Training)
    End
    If k = 1
      If training.class = xq.class
        correct++
    Else if k > 1, iterate k
      Count training.class = xq.class
      clsLabel++
      Specify majority class
    End
  Return Classification Rate as Fitness
End

Figure 7. The pseudo-code for calculating the fitness (classification rate) of a selected feature subset using the supervised NNDM.
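A minimal fitness function in the spirit of Fig. 7 is sketched below for the 1-nearest-neighbour case (k = 1). For brevity it computes distances to the training instances directly rather than going through the indexed distance matrix of the SNNDM, so it should be read as an assumption-laden stand-in for the actual evaluation function.

import numpy as np

def fitness(chromosome, X_train, y_train, X_test, y_test):
    """Classification rate (in percent) on the test split using only the
    attributes switched on in the chromosome; a plain 1-NN vote stands in
    for the full SNNDM voting scheme of Fig. 7."""
    selected = np.array(chromosome, dtype=bool)
    if not selected.any():
        return 0.0
    correct = 0
    for xq, yq in zip(X_test[:, selected], y_test):
        dists = np.sqrt(((X_train[:, selected] - xq) ** 2).sum(axis=1))
        if y_train[np.argmin(dists)] == yq:
            correct += 1
    return 100.0 * correct / len(y_test)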

The next phase in the GA process is the crossover operator, where a parameter value determines whether crossover is to be applied. If it is, the algorithm randomly selects two or more parents from the population and applies crossover to each pair, creating new offspring. Finally, the mutation operator takes place by changing the value of a gene in a population chromosome. The stopping condition for the process is that the maximum number of iterations is reached, or that the fitness of any chromosome in the population reaches 100; a fitness of 100 means that the classification rate over the testing data using the selected attributes is 100%. The result of combining GA optimization with NNDM as the induction algorithm is the set of significant attributes that provides a high classification rate.

In the GA based wrapper feature selection using NNDM, no parameters other than the crossover and mutation rates need to be specified, and none are specific to NNDM. This is due to the nature of the NNDM algorithm, which simply uses its distance matrix to classify new instances. The kNN algorithm is considered a lazy learning algorithm and incurs a high computation cost during classification; however, with the loss function, the model (the NNDM, or matrix of distances) is constructed in advance to leverage the full power of kNN classification. Furthermore, feature selection reduces the computation cost by considering only the selected set of features for classification.
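The two genetic operators and the stopping test can be written compactly as follows. Single-point crossover and bit-flip mutation are assumed here, since the paper does not spell out the exact variants; the rates are those reported in Section IV.

import random

def crossover(parent1, parent2, rate=0.6):
    """Single-point crossover, applied with the given probability."""
    if random.random() > rate or len(parent1) < 2:
        return parent1[:], parent2[:]
    point = random.randrange(1, len(parent1))
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(chromosome, rate=0.001):
    """Bit-flip mutation: each gene is flipped with a small probability."""
    return [1 - bit if random.random() < rate else bit for bit in chromosome]

def should_stop(generation, best_fitness, max_generations=100):
    """Stop when the generation limit is hit or a chromosome classifies
    the testing data perfectly (fitness of 100)."""
    return generation >= max_generations or best_fitness >= 100.0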

IV. EXPERIMENTAL DESIGN AND RESULTS

This section demonstrates the performance of the approach on various datasets from the UCI Machine Learning Repository and other sources. The performance of the GA based wrapper feature selection using the NNDM classifier is investigated over the datasets using generally recommended settings [26], [30]: the population size is set to 50, the mutation rate to 0.001 and the crossover rate to 0.6. The datasets for our experiments are the Wisconsin Breast Cancer Database, Image Segmentation, Pen-based Recognition of Handwritten Digits, Wine recognition data and Pima Indians Diabetes Database from the UCI Repository of Machine Learning Databases and Domain Theories [27], the Swedish Leaf and Face (All) databases from the UCR Time Series Classification/Clustering page [28], and the Leaf Image dataset [29]. Table I describes these datasets.

TABLE I. DATASET DESCRIPTION.

Dataset       | Num. of Attributes (incl. Class) | Train Size | Test Size | Num. of Classes | Instances per Class
Swedish       | 129 | 375 | 750  | 15 | 25
Face All      | 132 | 560 | 1690 | 14 | 40
Leaf Data     | 142 | 111 | 50   | 14 | 7.9
Breast Cancer | 31  | 273 | 410  | 2  | 136
Segmentation  | 20  | 924 | 1386 | 7  | 132
Letter        | 17  | 676 | 1046 | 26 | 26
Pen Digit     | 17  | 300 | 700  | 10 | 30
Wine          | 14  | 71  | 107  | 3  | 23.67
Diabetes      | 9   | 576 | 192  | 2  | 288
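Putting the pieces together, one run of an experiment like those reported below could be driven roughly as in the sketch here, reusing the hypothetical helpers from the earlier sketches (run_ga, init_chromosome, ig_threshold, fitness, crossover, mutate). The roulette selection is a plain fitness-proportionate version and simplifies the elitist replacement described in Section III; the dataset loading is left abstract.

import random

def roulette_select(population, scores):
    """Fitness-proportionate (roulette) selection of the next parent pool."""
    total = sum(scores) or 1.0
    chosen = []
    for _ in population:
        r, acc = random.uniform(0, total), 0.0
        for chrom, score in zip(population, scores):
            acc += score
            if acc >= r:
                chosen.append(chrom)
                break
        else:
            chosen.append(population[-1])
    return chosen

# One run with the settings used in the experiments (population 50,
# mutation rate 0.001, crossover rate 0.6); X_train, y_train, X_test,
# y_test and ig_values are assumed to be loaded beforehand.
# best, rate = run_ga(
#     init_population=lambda: [init_chromosome(ig_values, ig_threshold(ig_values))
#                              for _ in range(50)],
#     evaluate_fitness=lambda c: fitness(c, X_train, y_train, X_test, y_test),
#     select=roulette_select,
#     crossover=lambda a, b: crossover(a, b, rate=0.6),
#     mutate=lambda c: mutate(c, rate=0.001))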

Table II compares the performance of Weka, the supervised 1-NNDM (1SNNDM), the k supervised NNDM (kSNNDM) and the proposed GA based wrapper feature selection (GA+NNDM) using random initial population generation. Detailed results of the experiments on NNDM classification using Weka, 1SNNDM, kSNNDM and other algorithms can be found in the previous work [6]. The results suggest that the GA based wrapper feature selection can significantly improve the classification accuracy of the proposed NNDM. However, the performance of NNDM may degrade when the nature of the data sampling requires more data, as seen for certain datasets, e.g., Pen Digit, Letter and Face All.

TABLE II. RESULTS ON SEVERAL BENCHMARK DATASETS USING RANDOM INITIAL POPULATION GENERATION.

Data          | Weka  | 1SNNDM | kSNNDM (k) | GA+NNDM | Att. Selected
Swedish       | 75.87 | 92.67  | 92.67 (1)  | 96.66   | 65
Breast Cancer | 97.32 | 97.32  | 97.32 (1)  | 99.02   | 6
Face All      | 68.76 | 78.28  | 78.28 (1)  | 91.40   | 62
Letter        | 46.46 | 63.57  | 63.57 (1)  | 84.51   | 8
Segmentation  | 96.32 | 95.17  | 95.17 (1)  | 98.70   | 9
Pen Digit     | 89.57 | 89.57  | 89.71 (1)  | 90.86   | 13
Wine          | 97.20 | 69.16  | 71.02 (37) | 100.00  | 7
Diabetes      | 68.23 | 65.10  | 65.10 (1)  | 100.00  | 2
Leaf Data     | 28.00 | 74.00  | 74.00 (1)  | 98.00   | 88

The random population initialization, and consequently the overall GA process, requires a rather long processing time. This is because the random combination of attributes in the initial population may generate attribute subsets that produce low fitness for the NNDM classification, so more time is needed to optimize the feature subset. To overcome this problem, the second approach, described in Fig. 5, is used to generate the initial population. Information gain values for the benchmark datasets are shown in Table III, and Table IV shows the results of the proposed method for the selected Information Gain (IG) thresholds.

TABLE III. INFORMATION GAIN VALUES FOR THE BENCHMARK DATASETS.

Data          | Entropy (S) | Min IG | Max IG | Range, r | IG1  | IG2
Swedish       | 3.91 | 1.06 | 1.96 | 42.00 | 1.50 | 1.64
Breast Cancer | 0.99 | 0.22 | 0.71 | 2.00  | 0.46 | 0.57
Face All      | 3.81 | 1.14 | 2.07 | 43.00 | 1.63 | 1.77
Letter        | 4.70 | 0.19 | 1.64 | 5.00  | 0.84 | 1.49
Segmentation  | 2.81 | 0.00 | 2.08 | 6.00  | 1.54 | 1.98
Pen Digits    | 3.32 | 1.28 | 1.95 | 5.00  | 1.51 | 1.60
Wine          | 1.57 | 0.35 | 1.27 | 4.00  | 0.79 | 1.04
Diabetes      | 0.93 | 0.04 | 0.33 | 2.00  | 0.12 | 0.23
Leaf Data     | 3.81 | 0.00 | 2.13 | 46.00 | 0.11 | 1.26

As outlined in Table III, the range between the minimum IG (Min IG) and maximum IG (Max IG) is divided into three partitions (low, medium and high IG), where each partition contains r IG values. This ensures that each dataset's IG values are assigned to its own low, medium and high partitions for the feature selection threshold. IG1 and IG2 in Table IV correspond to the two-thirds (medium plus high IG) and one-third (high IG only) portions of the IG values respectively. While IG1 is considered an average threshold for the GA to work with, IG2 speeds up the GA process significantly, with fewer attributes in the initial population and high classification rates. The results shown in Table IV were obtained using the threshold values IG1 and IG2.

TABLE IV. RESULTS ON SEVERAL BENCHMARK DATASETS USING THE IG THRESHOLD FOR INITIAL POPULATION GENERATION.

Data          | IG1  | Rate  | Att. Selected | IG2  | Rate  | Att. Selected
Swedish       | 1.50 | 99.47 | 13 | 1.64 | 100   | 5
Breast Cancer | 0.46 | 100   | 2  | 0.57 | 99.51 | 2
Face All      | 1.63 | 93.49 | 37 | 1.77 | 94.02 | 9
Letter        | 0.84 | 86.90 | 2  | 1.49 | 87.19 | 2
Segmentation  | 1.54 | 100   | 3  | 1.98 | 100   | 2
Pen Digit     | 1.51 | 90.86 | 13 | 1.60 | 90.71 | 11
Wine          | 0.79 | 100   | 6  | 1.04 | 100   | 3
Diabetes      | 0.12 | 100   | 2  | 0.23 | 100   | 2
Leaf Data     | 0.11 | 98.00 | 53 | 1.26 | 100   | 23

According to the results in Table IV, the information gain (IG) threshold IG2 gives better results than the IG1 threshold and the random initial population. However, the IG2 threshold is arguably too biased towards attributes with high information gain and may eliminate attributes that are in fact important, resulting in lower rates, as seen for the Breast Cancer and Pen Digit datasets. The Letter and Pen Digit results show that the dataset preparation is problematic for the wrapper GA, the NNDM and the splitting method used to evaluate attribute information gain. This confirms that these two datasets need more training examples to provide a better estimate for the MDLP and to further support the optimization of the feature subset selection.

V. CONCLUSION

This paper introduces a GA based wrapper feature selection using NNDM. Our initial objective was to develop and evaluate a GA based wrapper feature selection method with the NNDM. The nearest neighbour is used in a non-lazy fashion, by building the distance matrix prior to classifying new instances. The outcomes of this work emphasize two important factors related to feature selection and distance based classification. First, the GA based wrapper feature selection method using the NNDM algorithm shows that classification accuracies (with SNNDM) can be improved with a certain number of selected relevant attributes. Second, the NNDM classifier used in this experiment can be considered a new variation of distance based classification algorithms that uses the nearest neighbour to build a classifier model (the nearest neighbour distance matrix) before classifying new, unseen instances.

The experimental results show that the proposed method achieves promising classification accuracy, with some restrictions. The notable restriction at this point is how well the priors (e.g. IG for probability classification, class means and nearest neighbours) are constructed to provide the NNDM for kNN classification. Current work includes improving the proposed approach by investigating other methods to evaluate attributes for the GA process and increasing the stability of the NNDM classifier. A comparison backed by statistical tests will also be carried out once this approach is feasible.

REFERENCES

[1] A. Jain and D. Zongker, "Feature Selection: Evaluation, Application, and Small Sample Performance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 153-158, 1997.
[2] Y. Saeys, et al., "A review of feature selection techniques in bioinformatics," Bioinformatics, vol. 23, pp. 2507-2517, 2007.
[3] A. A. Freitas, "A survey of evolutionary algorithms for data mining and knowledge discovery," in Advances in Evolutionary Computing: Theory and Applications, Springer-Verlag New York, Inc., 2009, pp. 819-845.
[4] H. Liu and L. Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 491-502, 2005.
[5] J. Yang and V. Honavar, "Feature subset selection using a Genetic Algorithm," IEEE Intelligent Systems, vol. 13, pp. 44-49, 1998.
[6] M. S. Sainin and R. Alfred, "Nearest Neighbour Distance Matrix Classification," presented at the International Conference on Advanced Data Mining and Applications (ADMA2010), Chongqing, China, 2010.
[7] H. Vafaie and K. De Jong, "Genetic algorithms as a tool for feature selection in machine learning," in Proceedings of the 4th International Conference on Tools with Artificial Intelligence, 1992.
[8] H. Chouaib, et al., "Feature selection combining genetic algorithm and Adaboost classifiers," in 19th International Conference on Pattern Recognition (ICPR 2008), 2008.
[9] H. Vafaie and I. F. Imam, "Feature Selection Methods: Genetic Algorithms vs. Greedy-like Search," in Proceedings of the International Conference on Fuzzy and Intelligent Control Systems, 1994.
[10] W. Siedlecki and J. Sklansky, "A note on genetic algorithms for large-scale feature selection," Pattern Recognition Letters, vol. 10, pp. 335-347, 1989.
[11] Z. B. I. Frank, "Genetic Algorithms for Feature Selection," Master of Science (Computer Science) thesis, School of Engineering and Applied Science, University of Virginia, Virginia, 1990.
[12] H. Vafaie and K. De Jong, "Robust feature selection algorithms," in Fifth International Conference on Tools with Artificial Intelligence, Boston, MA, USA, 1993, pp. 356-363.
[13] W. F. Punch, et al., "Further Research on Feature Selection and Classification Using Genetic Algorithms," in Proceedings of the 5th International Conference on Genetic Algorithms, 1993.
[14] H. Vafaie and K. De Jong, "Improving the Performance of a Rule Induction System Using Genetic Algorithms," in Machine Learning: A Multistrategy Approach, vol. 4, R. S. Michalski and G. Tecuci, Eds. Morgan Kaufmann, San Mateo, CA, 1993.
[15] J. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," in Feature Extraction, Construction and Selection: A Data Mining Perspective, Kluwer, 1998.
[16] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, pp. 273-324, 1997.
[17] M. J. Martin-Bautista and M.-A. Vila, "Applying Genetic Algorithms to the Feature Selection Problem in Information Retrieval," in Proceedings of the Third International Conference on Flexible Query Answering Systems, 1998, pp. 272-281.
[18] J. Jarmulak and S. Craw, "Genetic Algorithms for Feature Selection and Weighting," in IJCAI'99 Workshop on Automating the Construction of Case Based Reasoners, 1999.
[19] L. Zhou, et al., "A Genetic Algorithm Based Wrapper Feature Selection Method for Classification of Hyperspectral Images Using Support Vector Machine," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XXXVI, pp. 397-402, 2008.
[20] F. Tan, et al., "A genetic algorithm-based method for feature subset selection," Soft Computing, vol. 12, pp. 111-120, 2008.
[21] M. A. Jayaram, et al., "Feature Subset Selection Problem using Wrapper Approach in Supervised Learning," International Journal of Computer Applications, vol. 1, pp. 13-16, 2010.
[22] R. Tiwari and M. P. Singh, "Correlation-based Attribute Selection using Genetic Algorithm," International Journal of Computer Applications, vol. 4, pp. 28-34, 2010.
[23] L. I. Kuncheva and L. C. Jain, "Nearest neighbor classifier: Simultaneous editing and feature selection," Pattern Recognition Letters, vol. 20, pp. 1149-1156, 1999.
[24] K. Q. Weinberger and L. K. Saul, "Distance Metric Learning for Large Margin Nearest Neighbor Classification," Journal of Machine Learning Research, vol. 10, pp. 207-244, 2009.
[25] U. M. Fayyad and K. B. Irani, "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning," in Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1993, pp. 1022-1027.
[26] K. De Jong, "Analysis of the behaviour of a class of genetic adaptive systems," Ph.D. thesis, Department of Computer and Communications Sciences, University of Michigan, Ann Arbor, MI, 1975.
[27] A. Asuncion and D. J. Newman, "UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]," Irvine, CA: University of California, School of Information and Computer Science, 2007.
[28] E. Keogh, X. Xi, L. Wei, and C. A. Ratanamahatana, "The UCR Time Series Classification/Clustering Homepage: www.cs.ucr.edu/~eamonn/time_series_data/," 2006.
[29] G. Agarwal, et al., "First steps toward an electronic field guide for plants," Taxon, vol. 55, pp. 597-610, 2006.
[30] K. De Jong and W. M. Spears, "An Analysis of the Interacting Roles of Population Size and Crossover in Genetic Algorithms," in Proceedings of the First Workshop on Parallel Problem Solving from Nature, Springer-Verlag, Berlin, 1990, pp. 38-47.