
A Combination of Shuffled Frog-Leaping Algorithm and Genetic Algorithm for Gene Selection

Cheng-San Yang, Li-Yeh Chuang, Chao-Hsuan Ke, and Cheng-Hong Yang

Institute of Biomedical Engineering, National Cheng Kung University, Tainan, Taiwan 70101; Department of Chemical Engineering, I-Shou University, Kaohsiung, Taiwan 84001; Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan 80778
E-mail: [email protected]
[Received ; accepted ]

Microarray data referencing gene expression profiles provides valuable answers to a variety of problems and contributes to advances in clinical medicine. The application of microarray data to the classification of cancer types has recently assumed increasing importance. The classification of microarray data samples involves feature selection, whose goal is to identify subsets of differentially expressed genes potentially relevant for distinguishing sample classes, and classifier design. We propose an efficient evolutionary approach for selecting gene subsets from gene expression data that achieves higher accuracy for classification problems. Our proposal combines a shuffled frog-leaping algorithm (SFLA) and a genetic algorithm (GA) to choose the genes (features) related to classification. The K-nearest neighbor (KNN) with leave-one-out cross-validation (LOOCV) is used to evaluate classification accuracy. We apply this novel hybrid approach, based on the SFLA-GA and KNN classification, to 11 classification problems from the literature and compare the results. Experimental results show that the classification accuracy obtained using the selected features was higher than the accuracy obtained on the datasets without feature selection.

Keywords: gene expression data, classification, SFLA, GA, KNN

1. Introduction
DNA microarray technology, which enables thousands of gene expression activation levels to be simultaneously monitored and measured in a single experiment, is widely used in medical diagnosis and gene analysis. Many microarray analysis research projects have focused on clustering analysis, which groups genes showing correlated expression patterns and provides insight into gene interactions and functions [1-3], and on classification, which discriminates between sample classes and predicts the relative importance of individual genes in sample classification [4].

Gene expression data typically has a high dimension and a small sample size, which makes general classification testing and training difficult, and obtaining a correct classification is also difficult. The purpose of classification is to build an efficient model for predicting the class membership of data; the model is expected to produce the correct labels for the training data and to predict the labels of unknown data correctly. Generally, only a relatively small number of genes are strongly correlated with a certain phenotype compared to the total number of genes investigated, meaning that of the thousands of genes investigated, only a small number correlate significantly with a phenotype. To analyze gene expression profiles correctly, feature (gene) selection is crucial for classification. Classifying microarray data samples involves feature selection and classifier design. Reliable selection of genes relevant to sample classification is needed to increase predictive accuracy and to avoid incomprehensibility; such selection should be based on the total number of genes investigated. Methods suggested for feature selection include genetic algorithms [5], branch and bound algorithms [6], sequential search algorithms [7], mutual information [8], tabu search [9], entropy-based methods [10], regularized least squares [11], random forests [12], instance-based methods [13], and least squares support vector machines [14]. We embedded a genetic algorithm (GA) in a shuffled frog-leaping algorithm (SFLA) to serve as a local optimizer for each generation when implementing feature selection. To classify 11 classification problems taken from the literature, we used the K-nearest neighbor (KNN) with leave-one-out cross-validation (LOOCV), based on Euclidean distance, to evaluate the SFLA-GA. The results show that our proposal achieves superior classification accuracy on the 11 datasets compared to previous methods. The number of genes needing to be selected can also be significantly reduced.

2. Method
Our proposal is briefly explained in the sections below.


2.1. Shuffled Frog-Leaping Algorithm
The shuffled frog-leaping algorithm (SFLA) combines the benefits of a gene-based memetic algorithm (MA) and of social behavior-based particle swarm optimization (PSO) [15]. An MA is a gene-based optimization algorithm similar to a GA. In a GA, chromosomes are represented as strings consisting of a set of elements called "genes"; chromosomes in an MA are represented by elements called "memes." MAs and GAs differ in that the MA implements a local search before crossover or mutation to determine offspring. After the local search, new offspring that obtain better results than the original offspring replace them, and the evolutionary process continues. PSO is an evolutionary algorithm [16] in which individual solutions are called "particles" (analogous to GA chromosomes), but PSO does not apply crossover and mutation to construct new particles. Each particle changes its position and velocity based on the individual particle's optimal solution pBest and the population's optimal solution gBest until a global optimal solution is found. The SFLA is derived from a virtual population of frogs in which individual frogs, equivalent to GA chromosomes, represent a set of solutions. Each frog is distributed to a different subset of the whole population called a memeplex. An independent local search is conducted for each memeplex, in what is called memeplex evolution. After a defined number of memetic evolutionary steps, frogs are shuffled among memeplexes [17], enabling frogs to interchange messages among different memeplexes and ensuring that they move toward an optimal position, similar to particles in PSO. Local search and shuffling continue until defined convergence criteria are met. SFLAs have demonstrated effectiveness in a number of global optimization problems that are difficult to solve using other methods, e.g., water distribution and groundwater model calibration problems [18].

(1) Initial population
An initial population of P frogs is created randomly for an S-dimensional problem. A frog i is represented by S variables, i.e., F_i = (f_i1, f_i2, ..., f_iS).

(2) Sorting and distribution
Frogs are sorted in descending order based on their fitness values, then the entire population is divided into m memeplexes, each containing n frogs (i.e., P = m × n). The first frog is distributed to the first memeplex, the second frog to the second, the m-th frog to the m-th memeplex, the (m+1)-th frog back to the first memeplex, and so on.

(3) Memeplex evolution
Within each memeplex, the frogs with the best and the worst fitness are identified as X_b and X_w, and the frog with the globally best fitness is identified separately as X_g. To improve the worst solution, an equation similar to that of PSO is used to update it, as in Eqs. (1) and (2):

Change in frog position:
D_i = rand() × (X_b − X_w)    . . . . . (1)

New position:
X_w(new) = X_w(current) + D_i,  with −D_max ≤ D_i ≤ D_max    . . . . . (2)

where rand() is a random number between 0 and 1, and D_max is the maximum change allowed in a frog's position. If this process produces a better solution, it replaces the worst frog. If Eqs. (1) and (2) do not improve the worst solution, X_b in Eq. (1) is replaced by X_g, giving Eq. (3):

Change in frog position:
D_i = rand() × (X_g − X_w)    . . . . . (3)

If Eqs. (1) and (3) do not improve the worst solution, then a new solution is randomly generated to replace the worst frog.

(4) Shuffling
After a defined number of memeplex evolution steps, all frogs of all memeplexes are collected and sorted in descending order based on their fitness. Step (2) divides the frogs into memeplexes again, and step (3) is then repeated.

(5) Terminal condition
If a global solution or a fixed iteration number is reached, the algorithm stops.
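The memeplex-evolution update of Eqs. (1)-(3) can be summarized in a short sketch. The Python fragment below assumes real-valued frog positions, a caller-supplied fitness function to be maximized, and a helper that generates a fresh random frog; all names are illustrative and not taken from the authors' implementation.

import random

def update_worst_frog(X_w, X_b, X_g, fitness, d_max, make_random_frog):
    # One memeplex-evolution step for the worst frog X_w (Eqs. 1-3).
    # X_w, X_b, X_g are lists of floats (worst, memeplex-best, global-best);
    # fitness(frog) returns a value to maximize; make_random_frog() builds a
    # fresh random frog.
    def leap(target):
        # D_i = rand() * (target - X_w), clamped to [-d_max, d_max] (Eq. 2)
        step = [max(-d_max, min(d_max, random.random() * (t - w)))
                for t, w in zip(target, X_w)]
        return [w + s for w, s in zip(X_w, step)]

    for target in (X_b, X_g):           # try Eq. (1) first, then Eq. (3)
        candidate = leap(target)
        if fitness(candidate) > fitness(X_w):
            return candidate
    return make_random_frog()           # neither leap helped: random replacement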

2.2. Genetic Algorithms
A genetic algorithm (GA) is a general adaptive optimization search method based on a direct analogy to Darwinian natural selection and genetics in biological systems, and is a promising alternative to conventional heuristics. A GA uses chromosomes, in string form, to represent solutions. Each chromosome consists of a set of elements called "genes" that hold the values of the optimization variables. In an S-dimensional problem, for example, each chromosome i is represented by S variables, i.e., C_i = (c_i1, c_i2, ..., c_iS). Before the GA is executed, chromosomes are randomly produced to form the initial population. Different fitness functions must be used for different selection problems, and the fitness of each chromosome is calculated by this fitness function. To simulate natural survival of the fittest, the best chromosomes exchange information to produce offspring chromosomes through crossover and mutation. If an offspring chromosome is superior to the worst chromosome of the population, it replaces the worst chromosome, and the process continues until a global solution or a fixed number of iterations is reached. GAs have been applied to a variety of problems, such as scheduling [19], machine learning [20], multiple objective problems [21], feature selection [22], data mining [23], and the traveling salesman problem [24], as detailed by John Holland [25].
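For illustration, one generation of a simple GA of the kind described above might look as follows. This is a hedged sketch assuming binary chromosomes of at least two genes, one-point crossover, and bit-flip mutation; the parameter names and the replace-the-worst policy are assumptions made for the example, not the authors' exact implementation.

import random

def ga_generation(population, fitness, cx_rate=1.0, mut_rate=0.05):
    # One GA generation over binary chromosomes (lists of 0/1).
    # The two fittest chromosomes act as parents, the offspring is produced by
    # one-point crossover and bit-wise mutation, and it replaces the worst
    # chromosome if it is better.
    ranked = sorted(population, key=fitness, reverse=True)
    p1, p2 = ranked[0], ranked[1]                     # best two as parents
    if random.random() < cx_rate:
        cut = random.randint(1, len(p1) - 1)          # one-point crossover
        child = p1[:cut] + p2[cut:]
    else:
        child = p1[:]
    child = [1 - b if random.random() < mut_rate else b for b in child]
    if fitness(child) > fitness(ranked[-1]):          # survival of the fittest
        ranked[-1] = child
    return ranked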


Fig. 1. Flowchart of the SFLA-GA: the SFLA population (SFLA_p) and the GA population (GA_p = SFLA_p) are initialized and evaluated; the frogs are sorted in descending order and divided into memeplexes; memeplex evolution is carried out by GA crossover/mutation and replacement; the memeplexes are shuffled; and the process repeats until the ending condition is satisfied.

2.3. K-Nearest Neighbor
The K-nearest neighbor (KNN), introduced by Fix and Hodges in 1951 [27], is a popular nonparametric method [26, 27] used to classify a new object based on attributes and training samples. It is a supervised learning algorithm in which a new query instance is classified based on the majority category of its K nearest neighbors. The classifier does not fit a model and is based only on memory; it determines the K nearest neighbors by the minimum distance from the query instance to the training samples. Any ties in the results are resolved by a random procedure. KNN has been applied, e.g., to statistical estimation, pattern recognition, artificial intelligence, categorical problems, and feature selection. Its advantage is that it is simple and easy to implement. KNN is not negatively affected when the training data is large, and it is insensitive to noisy training data. We measured feature subsets using leave-one-out cross-validation with one nearest neighbor (1-NN). Neighbors are determined using their Euclidean distance. The 1-NN classifier requires no user-specified parameters, and classification results are implementation independent.
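A minimal sketch of the 1-NN classifier with leave-one-out cross-validation described here, assuming numeric feature vectors and Euclidean distance; function and variable names are illustrative.

def loocv_1nn_accuracy(samples, labels):
    # Leave-one-out accuracy of a 1-nearest-neighbor classifier.
    # samples: list of equal-length feature lists; labels: class of each sample.
    correct = 0
    for i, query in enumerate(samples):
        best_dist, best_label = float("inf"), None
        for j, train in enumerate(samples):
            if i == j:                   # leave the query sample out
                continue
            dist = sum((a - b) ** 2 for a, b in zip(query, train))
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += (best_label == labels[i])
    return correct / len(samples)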

2.4. Proposed Method (SFLA-GA)
An SFLA is an evolutionary algorithm similar to an MA and PSO, usually used to search for a global solution [28], e.g., in real-number problems. In the solution space, each frog of the SFLA represents a solution, and each frog is distributed to a different memeplex. Each memeplex represents a small part of the solution space. An independent local search is conducted in each memeplex and, after a fixed number of memeplex evolution steps, shuffling is executed, in which frogs interchange messages between memeplexes and congregate toward a global solution. Conventionally, an SFLA features speedy searches but, like other optimization algorithms, it can get trapped in local optima, which may prevent it from finding a global optimal solution.


Table 1. Cancer-related human gene expression data sets.

Dataset Name | Diagnostic task | Samples | Genes | Classes | Reference
9 Tumors GEMS | Nine various human tumor types | 60 | 5726 | 9 | Staunton et al. [4]
11 Tumors GEMS | Eleven various human tumor types | 174 | 12533 | 11 | Su et al. [42]
14 Tumors GEMS | Fourteen various human tumor types and 12 normal tissue types | 308 | 15009 | 26 | Ramaswamy et al. [43]
Brain Tumor1 GEMS | Five human brain tumor types | 90 | 5920 | 5 | Pomeroy et al. [44]
Brain Tumor2 GEMS | Four malignant glioma types | 50 | 10367 | 4 | Nutt et al. [45]
Leukemia1 GEMS | Acute myelogenous leukemia (AML), acute lymphoblastic leukemia (ALL) B-cell, and ALL T-cell | 72 | 5327 | 3 | Golub et al. [46]
Leukemia2 GEMS | AML, ALL, and mixed-lineage leukemia (MLL) | 72 | 11225 | 3 | Armstrong et al. [47]
Lung Cancer GEMS | Four lung cancer types and normal tissues | 203 | 12600 | 5 | Bhattacharjee et al. [48]
SRBCT GEMS | Small, round blue cell tumors of children | 83 | 2308 | 4 | Khan et al. [49]
Prostate Tumor GEMS | Prostate tumor and normal tissue | 102 | 10509 | 2 | Singh et al. [50]
DLBCL GEMS | Diffuse large B-cell lymphomas and follicular lymphomas | 77 | 5469 | 2 | Shipp et al. [51]

To prevent such traps, we combined the SFLA with a GA used as a substitute for the update equation of the SFLA, i.e., the GA conducts the local search, so getting trapped in a local optimum is avoided while conforming to the memeplex evolution of the SFLA (Fig. 1). GA chromosomes, PSO particles, and SFLA frogs are conventionally coded as real numbers. To conduct feature (gene) selection with these optimization algorithms, we modified their coding model. We represent the position of each solution as a binary string [29], with the string length equal to the number of features, i.e., f = (x_1, x_2, ..., x_n), where n is the number of features. The bit value 1 represents a selected feature, whereas the bit value 0 represents a non-selected feature. Because the model of each solution (chromosomes and frogs) is represented as a binary string, the SFLA and the GA can easily be combined into an SFLA-GA, and feature expressions can easily be interchanged during selection. When conducting the memeplex evolution of the SFLA, the frogs' features represent GA chromosomes, and the GA is used to conduct the local search. The pseudo-code for the proposed SFLA-GA procedure is given in Appendix A, the pseudo-code for the GA in Appendix B, and that for 1-NN in Appendix C.
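As an illustration of the binary coding, the fitness of one frog or chromosome can be computed by masking the gene columns before classification. The sketch below reuses the loocv_1nn_accuracy() helper from the 1-NN example in Section 2.3 and is an assumption-laden example, not the authors' code.

def fitness_of_mask(mask, samples, labels):
    # Fitness of one frog/chromosome: LOOCV 1-NN accuracy on the selected genes.
    # mask is a binary list (1 = gene selected); only selected columns of each
    # sample are passed to the classifier.
    selected = [i for i, bit in enumerate(mask) if bit == 1]
    reduced = [[row[i] for i in selected] for row in samples]
    return loocv_1nn_accuracy(reduced, labels)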

2.5. Parameter Setting for Algorithms
When conducting an optimization algorithm, the parameters used affect the results, i.e., the number of chromosomes and the number of iterations.


If the number of chromosomes is large, the GA finds global solutions more easily, but the calculation takes longer, so the parameters must be adjusted appropriately [30]. The SFLA parameters we used were a frog population of 50 (P = m × n), with m = 5 memeplexes each containing n = 10 frogs, and 5 shuffling iterations. The GA parameters were the same number of chromosomes as the number of frogs in a memeplex (n), 10 generations, a crossover rate of 1.0 with one-point crossover, and a mutation rate of 0.05. The K of the K-nearest neighbor was set to 1 (1-NN).
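For convenience, the parameter values just listed can be gathered in a single configuration structure; the key names below are illustrative.

# Parameter values reported in Section 2.5, gathered here for reference.
SFLA_GA_PARAMS = {
    "frogs_total":          50,    # P = m * n
    "memeplexes":            5,    # m
    "frogs_per_memeplex":   10,    # n
    "shuffling_iterations":  5,
    "ga_chromosomes":       10,    # equal to n
    "ga_generations":       10,
    "crossover_rate":       1.0,   # one-point crossover
    "mutation_rate":        0.05,
    "knn_k":                 1,    # 1-NN
}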

3. Experiment

3.1. Problem Description
A microarray usually contains n samples and m dimensions of gene expression data. In a single sample, which contains all genes, usually only a few genes correlate significantly with a disease. This means that not all genes contribute to classification results. Before classification accuracy is calculated, the features (genes) that benefit the prediction results must be selected, since this promotes calculation efficiency and improves classification accuracy.

3.2. Data Sets
Gene expression data was obtained by the oligonucleotide technique. Our datasets consist of 11 gene expression profiles, downloaded from http://www.gems-system.org, including tumor, brain tumor, leukemia, lung cancer, and prostate tumor samples. The data format (Table 1) includes the dataset name, the diagnostic task, and the numbers of samples, genes, and classes.


Table 2. Feature selection method and classification accuracy (Statnikov et al., 2005) [3].

Dataset | Classification | Gene selection | Selected gene number | Accuracy with gene selection (%)
9 Tumors GEMS | CS | BW | 1000 | 74.86
14 Tumors GEMS | OVR-SVM | No | 15009 | 76.60
Brain Tumor1 GEMS | OVR-SVM | KW | 500 | 92.67
Brain Tumor2 GEMS | OVR-SVM | KW | 500 | 85.67

Table 3. Ratio of selected features to original features.

Dataset | Original gene num. (A) | Selected gene num. (B) | Ratio = B/A (%)
9 Tumors GEMS | 5726 | 2550 | 44.53
11 Tumors GEMS | 12533 | 4840 | 38.62
14 Tumors GEMS | 15009 | 4900 | 32.65
Brain Tumor1 GEMS | 5920 | 620 | 10.47
Brain Tumor2 GEMS | 10367 | 1133 | 10.93
Leukemia1 GEMS | 5327 | 1892 | 35.52
Leukemia2 GEMS | 11225 | 599 | 5.34
Lung Cancer GEMS | 12600 | 3185 | 25.28
SRBCT GEMS | 2308 | 244 | 10.57
Prostate Tumor GEMS | 10509 | 598 | 5.69
DLBCL GEMS | 5469 | 985 | 18.01

4. Result and Discussion

4.1. Experimental Results
We selected genes relevant to gene expression data analysis and used the SFLA-GA to calculate classification accuracies, using KNN with LOOCV as an evaluator. Of the 11 datasets tested, only the Prostate Tumor GEMS and DLBCL GEMS datasets had two categories; all the other datasets were multi-category. The minimum number of samples in a dataset was 60 and the maximum 308. Statnikov et al. [3] conducted experiments using SVM and non-SVM learning algorithms. The non-SVM methods included the K-nearest neighbor (KNN) [31], backpropagation neural networks (BNN) [32], and probabilistic neural networks (PNN) [33]. The MC-SVM methods were the one-versus-rest SVM (OVR-SVM) [34], one-versus-one SVM (OVO-SVM), directed acyclic graph SVM (DAGSVM) [35], the method by Weston and Watkins (WW) [36], and the method by Crammer and Singer (CS) [37]. In the study by Statnikov et al. [3], the highest classification accuracy was achieved with OVR-SVM, and the average classification accuracy over the 11 datasets was 89.44%. The study incorporated four different feature selection methods: (1) the ratio of between-category to within-category sums of squares (BW), (2) signal-to-noise (S2N) scores applied in a one-versus-rest fashion (S2N-OVR),

(3) signal-to-noise scores applied in a one-versus-one fashion (S2N-OVO), and (4) the Kruskal-Wallis nonparametric one-way ANOVA (KW). Which of these four methods should be used when the number of selected genes is 50, 100, 150, or 200 depends on the gene weights and on which genes are used to calculate predictive accuracy. The study selected only the 9 Tumors GEMS, 14 Tumors GEMS, Brain Tumor1 GEMS, and Brain Tumor2 GEMS datasets to implement feature selection and determine classification accuracy. The best results achieved for these four datasets were 74.86%, 76.60%, 92.67%, and 85.67% classification accuracy (Table 2). In our study, we used the SFLA-GA to implement feature selection and calculated the classification accuracy achieved with the selected features. We chose a small feature number to represent the original datasets (Table 3). The number of chosen features for 9 Tumors GEMS was 2550, for 11 Tumors GEMS 4840, and for Leukemia1 GEMS 1892. The ratios of chosen features for 9 Tumors GEMS, 11 Tumors GEMS, and Leukemia1 GEMS were 44.53%, 38.62%, and 35.52%, respectively. For the other datasets, the maximum ratio of chosen features was 18.56% and the minimum ratio 5.69%. Table 4 compares the experimental results achieved with our method and those of the Statnikov et al. study [3]. The average classification accuracy achieved with the SFLA-GA was 90.86%, whereas the highest average classification accuracy in Statnikov et al. [3] was 89.44%, achieved with the OVR-SVM.


Table 4. Classification accuracy of SFLA-GA compared to the classification accuracy of methods taken from the literature.

Dataset | KNN | NN | PNN | OVR | OVO | DAG | WW | CS | SFLA-GA (KNN)
9 Tumors GEMS | 43.90 | 19.38 | 34.00 | 65.10 | 58.57 | 60.24 | 62.24 | 65.33 | 76.67
11 Tumors GEMS | 78.51 | 54.14 | 77.21 | 94.68 | 90.36 | 90.36 | 94.68 | 95.30 | 90.80
14 Tumors GEMS | 50.40 | 11.12 | 49.09 | 74.98 | 47.07 | 47.35 | 69.07 | 76.60 | 64.28
Brain Tumor1 GEMS | 87.94 | 84.72 | 79.61 | 91.67 | 90.56 | 90.56 | 90.56 | 90.56 | 93.33
Brain Tumor2 GEMS | 68.67 | 60.33 | 62.83 | 77.00 | 77.83 | 77.83 | 73.33 | 72.83 | 88.00
Leukemia1 GEMS | 83.57 | 76.61 | 85.00 | 97.50 | 91.32 | 96.07 | 97.50 | 97.50 | 100.00
Leukemia2 GEMS | 87.14 | 91.03 | 83.21 | 97.32 | 95.89 | 95.89 | 95.89 | 95.89 | 100.00
Lung Cancer GEMS | 89.64 | 87.80 | 85.66 | 96.05 | 95.59 | 95.59 | 95.55 | 96.55 | 95.57
SRBCT GEMS | 86.90 | 91.03 | 79.50 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Prostate Tumor GEMS | 85.09 | 79.18 | 79.18 | 92.00 | 92.00 | 92.00 | 92.00 | 92.00 | 92.16
DLBCL GEMS | 86.96 | 89.64 | 80.89 | 97.50 | 97.50 | 97.50 | 97.50 | 97.50 | 98.70
Average | 77.16 | 67.73 | 72.38 | 89.44 | 85.15 | 85.76 | 88.03 | 89.10 | 90.86

Legend: (1) KNN: K-nearest neighbors; (2) NN: backpropagation neural networks; (3) PNN: probabilistic neural networks (non-SVM methods); (4) OVR: one-versus-rest; (5) OVO: one-versus-one; (6) DAG: DAGSVM; (7) WW: method by Weston and Watkins; (8) CS: method by Crammer and Singer (MC-SVM methods); (9) SFLA-GA: the proposed method. The highest value in each row is shown in bold type in the original table.

4.2. Discussion
The SFLA-GA combines the advantage of the SFLA, namely the exchange of information between individual members of the group (frogs), with a local search implemented by a GA to evaluate the process results. Using the SFLA-GA, feature selection is conducted and classification accuracy calculated. Feature selection improves calculation efficiency and classification accuracy in classification problems with many features, since not all features necessarily influence classification accuracy [38]. Selecting appropriate features improves predictive accuracy, whereas selecting inappropriate features compromises it [39]. Using appropriate feature selection to select the optimal features interacting with a category results in higher classification accuracy. In a conventional SFLA, the position is updated through X_b and X_w, i.e., the best and worst solutions of each memeplex. When the difference between X_b and X_w is large, X_w changes its position under the influence of X_b. If X_b and X_w are similar in value, however, X_w will not change its position based on X_b [40]. This may result in stagnation of the algorithm at a local optimum and lead to premature convergence. Improving the algorithm to keep it from being trapped early is a major purpose of our study. Conventionally, SFLAs tend to get trapped early on, since the SFLA does not use a mutation operator as a GA does. When a GA gets temporarily trapped, it can still escape the local optimum through mutation, i.e., it can evolve further. A conventional SFLA may not escape a local optimum because it lacks a similar process, meaning that a global solution cannot be reached. Combining the mutation of a GA with the SFLA ensures that feature selection progresses and that a global solution is found.

For the 11 Tumors GEMS, 14 Tumors GEMS, and Lung Cancer GEMS datasets, classification accuracy is not improved when feature selection is conducted, but for the other datasets, accuracy is improved by 0.16%-5.17%. The three datasets that show no improvement in accuracy all have large dimensions (features), resulting in faulty classification when KNN is used as a classifier with the parameter K set to 1, as it was in our study. To rectify this, we suggest using a different distance estimate when calculating classification accuracy for high-dimensional problems (dimension numbers > 10,000) with the KNN, or changing K [41]. KNN is an easy, readily available classification algorithm that needs few parameters, so it is often used in research on classification problems. It results, however, in long calculation times and consumes much memory, especially for problems with a large number of features. Recent research indicates that support vector machines (SVM) have advantages in solving high-dimensional problems [3]. We suggest that an SVM algorithm be used to calculate classification accuracy when solving high-dimensional feature selection problems.
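As one concrete, purely illustrative example of an alternative distance estimate for high-dimensional KNN, cosine distance could be substituted for the Euclidean distance in the 1-NN sketch of Section 2.3; the paper itself does not prescribe a specific metric.

import math

def cosine_distance(a, b):
    # Cosine distance between two feature vectors; one possible alternative
    # to squared Euclidean distance for very high-dimensional data.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0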

5. Conclusions
We have used the SFLA-GA to implement feature selection on gene expression data, with a K-nearest neighbor (KNN) classifier as the evaluator of the SFLA-GA. Experimental results show that our method effectively simplified feature selection and reduced the total number of features. Our proposed method achieved the highest classification accuracy in eight of the 11 gene expression data test problems, and is comparable to the classification ac-


curacy of other methods from the literature for the other three test problems. The average classification accuracy of our proposal is increased 1.42% over the best results in previously published studies. Our proposal serves as an ideal preprocessing tool to optimize feature selection, because it increases classification accuracy while minimizing required calculation resources. It is also applicable to problems in other areas. References: [1] D. S. V. Wong, F. K. Wong, and G. R. Wood, “A multi-stage approach to clustering and imputation of gene expression profiles,” Bioinformatics, Vol.23, No.8, pp. 998-1005, 2007. [2] E. Hartuv, A. Schmitt, J. Lange, et al., “An algorithm for clustering cDNA fingerprints,” Genomics, Vol.66, pp. 249-256, 2000. [3] A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy, “A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis,” Bioinformatics, Vol.21, No.5, pp. 631-643, 2005. [4] J. E. Staunton, D. K. Slonim, H. A. Coller et al., “Chemosensitivity prediction by transcriptional profiling,” PNAS. U.S.A. 98 (19), pp. 10787-10792, 2001. [5] M. L. Raymer, W. F. Punch, E. D Goodman, L. A. Kuhn, and A. K. Jain, “Dimensionality Reduction Using Genetic Algorithms,” IEEE Trans. Evolutionary Computation, Vol.4, No.2, pp. 164-171, 2000. [6] P. M. Narendra and K. Fukunaga, “A Branch and Bound Algorithm for Feature Subset Selection,” IEEE Trans. Computers, Vol.6, No.9, pp. 917-922, 1997. [7] P. Pudil, J. Novovicova, and J. Kittler, “Floating Search Methods in Feature Selection,” Pattern Recognition Letters, Vol.15, pp. 11191125, 1994. [8] B. Roberto, “Using mutual information for selecting features in supervised neural net learning,” IEEE Transactions on Neural Networks, Vol.5, No.4, pp. 537-550, 1994. [9] H. Zhang and G. Sun, “Feature selection using tabu search method,” Pattern Recognition, Vol.35, pp. 701-711, 2002. [10] X. Liu, A. Krishnan, and A. Mondry, “An Entropy-based Gene Selection Method for Cancer Classification using Microarray Data,” BMC Bioinformatics, Vol.6:76, 2005. [11] N. Ancona, R. Maglietta, D. D’Addabbo, S. Liuni, and G. Pesole, “Regularized Least Squares Cancer Classifiers from DNA microarray data,” Bioinformatics, Vol.6 (Suppl 4):S2, 2005. [12] R. D´ıaz-Uriarte and S. A. de Andr´es, “Gene selection and classification of microarray data using random forest,” BMC Bioinformatics, Vol.7:3, 2006. [13] D. Berrar, I. Bradbury, and W. Dubitzky, “Instance-based concept learning from multiclass DNA microarray data,” Bioinformatics, Vol.7:73, 2006. [14] E. K. Tang, P. Suganthan, and X. Yao, “Gene selection algorithms for microarray data based on least squares support vector machine,” Bioinformatics, Vol.7:95, 2006. [15] E. Elbeltagi, T. Hegazy, and D. Grierson, “Comparison among five evolutionary-based optimization algorithms,” Advanced Engineering Informatics, Vol.19, No.1, pp. 43-53, 2005. [16] J. Kennedy and R. Eberhart, “Particle swarm optimization,” Proc. of the IEEE Int. Conf. on neural networks (Perth, Australia), Piscataway, NJ: IEEE Service Center, pp. 1942-1948, 1995. [17] S. Y. Liong and M. Atiquzzaman, “Optimal design of water distribution network using shuffled complex evolution,” Journal of The Institution of Engineers, Vol.44, No.1, Singapore, pp. 93-107, 2004. [18] M. M. Eusuff and K. E. Lansey, “Optimization of water distribution network design using the shuffled frog leaping algorithm,” Journal of Water Resources Plan Management, Vol.129, No.3, pp. 210-225, 2003. 
[19] E. S. Hou, N. Ansari, and H. Ren, “A Genetic Algorithm for multiprocessor Scheduling” IEEE Transactions on Parallel and Distributed Systems, Vol.5, No.2, pp. 113-120, 1994. [20] H. Vafaie and K. De Jong, “Genetic Algorithms as a Tool for Feature Selection in Machine Learning,” Proc. of the 4th Int. Conf. on Tools with Artifical Intelligence, Arlington, VA, pp. 200-204, 1992. [21] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II,” IEEE Trans. Evol. Comput., Vol.6, pp. 182197, 2002. [22] I. S. Oh, J. S. Lee, and B. R. Moon, “Hybrid Genetic Algorithms for Feature Selection,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.26, No.11, 2004.


[23] S. Kim and B. T. Zhang, “Evolutionary learning of web-document structure for information retrieval,” Proc. of the 2001 Congress on Evoluationary Computation,Vol.2, pp. 1253-1260, 2001. [24] W. Pullan, “Adapting the Genetic Algorithm to the Traveling Salesman Problem,” IEEE Congress on Evolutionary Computation, 2003. [25] J. H. Holland, “Adaptation in Nature and Artificial Systems,” MIT Press, 1992. [26] T. M. Cover and P. E. Hart, “Nearest Neighbor Pattern Classification,” In Proc. of the IEEE Transactions Information Theory, pp. 2127, 1967. [27] E. Fix and J. L. Hodges, “Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties,” Report No.4, USAF School of Aviation Medicine, Randolph Field, Texas, pp. 261-279, 1951. [28] M. Eusuff, K. Lansey, and F. Pasha, “Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization,” Engineering Optimization, Vol.38, No.2, pp. 129-154, 2006. [29] H. Laanaya, A. Martin, A. Khenchaf, and D. Aboutajdine, “Feature selection using genetic algorithm for sonar images classification with support vector machines,” European Conf. on Propagation and Systems, Brest, France, 2005. [30] C. L. Huang and C. J. Wang, “A GA-based feature selection and parameters optimization for support vector machines,” Expert System with applications, Vol.31, No.2, pp. 231-240, 2006. [31] T. M. Mitchell, “Machine Learning”. McGraw-Hill, New York, USA, 1997. [32] B. V. Dasarathy, “NN concepts and techniques, Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques,” IEEE Computer Society Press, pp. 1-30, 1991. [33] D. F. Specht, “Probabilistic neural network,” Neural Networks, 3, pp. 109-118, 1990. [34] U. Kreßel, “Pairwise classification and support vector machines,” In Advances in Kernel Methods: Support Vector Learning, Cambridge, MA: MIT Press, pp. 255-268, 1999. [35] J. C. Platt, N. Cristianini, and J. Shawe-Taylor, “Large margin DAGS for multiclass classification,” In Advances in Neural Information Processing Systems, Vol.12, MIT Press, pp. 547-553, 2000. [36] J. Weston and C. Watkins, “Support vector machines for multi-class pattern recognition,” In Proc. of the Seventh European Symposium on Artificial Neural Networks (ESANN 99), Bruges, 1999. [37] K. Crammer and Y. Singer, “On the learnability and design of output codes for multiclass problems,” Proc. of the Thirteen Annual Conf. on Computational Learning Theory (COLT 2000), Stanford University, Palo Alto, CA, 2000. [38] C. H. Yeang, S. Ramaswamy, P. Tamayo, et al., ‘ ‘Molecular classification of multiple tumor types,”Bioinformatics, Vol.17: (Suppl.) S316 -S322, 2001. [39] T. Li, C. Zhang, and M. Ogihara, “A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression,” Bioinformatics Vol.20, No.15, pp. 2429-2437, 2004. [40] E. Elbeltagi, T. Hegazy, and D. Grierson, “A modified shuffled frogleaping optimization algorithm: applications to project management,” Structure & Infrastructure Engineering: Maintenance, Management, Life-Cycl, Vol.3, No.1, pp. 53-60, 2007. [41] A. K. Ghosh, “On optimum choice of k in nearest neighbor classification,” Computational Statistics & Data Analysis Vol.50, No.11, pp. 3113-3123, 2006. [42] A. I. Su, J. B. Welsh, L. M. Sapinoso, et al., “Molecular classification of human carcinomas by use of gene expression signatures,” Cancer Res, Vol.61, pp. 7388-7393, 2001. [43] S. Ramaswamy, P. Tamayo, R. 
Rifkin, et al., “Multiclass cancer diagnosis using tumor gene expression signatures,” PNAS 98 (26), pp. 15149-15154, 2001. [44] S. L. Pomeroy, P. Tamayo, M. Gaasenbeek et al., “Prediction of central nervous system embryonal tumor outcome based on gene expression” Nature 415 6870, pp. 436-442, 2002. [45] C. L. Nutt, D. R. Mani, R. A. Betensky, P. Tamayo, et al., “Gene expression-based classification of malignant gliomas correlates better with survival than histological classification,” Cancer Res. Vol.63, No.7, pp. 1602-1607, 2003. [46] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, and M. Gaasenbeek, et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science 286, pp. 531-537, 1999. [47] S. A. Armstrong, J. E. Staunton, L. B. Silverman, R. Pieters, M. L. den Boer, M. D. Minden, S. E. Sallan, E. S. Lander, T. R. Golub, and S.J. Korsmeyer, “MLL translocations specify a distinct gene expression profile that distinguishes a unique leukaemia,” Nature Genetics, Vol.30, No.1, pp. 41-47, 2002.


[48] A. Bhattacharjee, W. Richards, J. Staunton, C. Li, S. Monti, and P. Vasa et al., “Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses,” PNAS, Vol.98, No.24, pp. 13790-13795, 2001. [49] J. Khan, J. S. Wei, M. Ringner, L. H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. R. Antonescus, C. Peterson, and P. S. Meltzer, “Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks,” Nature Med. Vol.7, pp. 658-659, 2001. [50] D. Singh, P. Febbo, K. Ross, et al., “Gene expression correlates of clinical prostate cancer behavior,” Cancer Cell, Vol.1, pp. 203-209, 2002. [51] M. A. Shipp, K. N. Ross, P. Tamayo, et al., “Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning,” Nature Med, Vol.8 No.1, pp. 68-74, 2002. [52] C. Lee and G. G. Lee, “Information gain and divergence-based feature selection for machine learning-based text categorization,” Information Proc. and management, Vol.42, Issue 1, pp. 155-165, 2006. [53] C. H. Ooi and P. Tan, “Genetic algorithms applied to multi-class prediction for the analysis of gene expression data,” Bioinformatics, Vol.19, pp. 37-44, 2003. [54] L. Davis, “Hybrid genetic algorithms for machine learning,” In IEE Colloquium on Machine Learning, London, Vol.Digest, No.117, 9/1–9/3, 1990. [55] A. H. F. Dias and J. A. de Vasconcelos, “Multiobjective Genetic Algorithms Applied to Solve Optimization Problems,” IEEE Transactions on magnetic, Vol.38, No.2, pp. 1133-1136, 2002. [56] J. H. Holland, “Adaptation in natural and artificial systems,” Ann Arbor, The University of Michigan Press, 1975. [57] K. E. Lee, N. J. Sha, E. R. Dougherty, M. Vannucci, and B. K. Mallick, “Gene selection: a Bayesian variable selection approach,” Bioinformatics, Vol.19, pp. 90-97, 2003. [58] L. Li, C. R. Weinberg, T. A. Darden, and L. G. Pedersen, “Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method,” Bioinformatics, Vol.17, pp. 1131-1142, 2001. [59] M. Sebbana and R. Nockb, “A hybrid filter/wrapper approach of feature selection using information theory,” Pattern Recognition, Vol.35, pp. 835-846, 2002. [60] O. W. Kwon and J. H. Lee, “Text categorization based on k-nearest neighbor approach for web site classification,” Information Processing and Management, Vol.39, No.1, pp. 25-44, 2003. [61] P. Larra˜naga, B. Calvo, R. Santana, C. Bielza, J. Galdiano, I. Inza, J. A. Lozano, R. Arma˜nanzas, G. Santaf´e, A. P´erez, and V. Robles, “Machine learning in Bioinformatics,” Briefings in Bioinformatics, Vol.7, pp. 86-112, 2006. [62] S. K. Fan, Y. C. Liang, and E. Zahara, “A genetic algorithm and a particle swarm optimizer hybridized with Nelder-Mead simplex search,” Computers and Industrial Engineering, Vol.50, No.4, pp. 401-425, 2006. [63] S. Y. Ho, L. S. Shu, and J. H. Chen, “Intelligent evolutionary algorithms for large parameter optimization problems,” IEEE Transaction on Evolutionary Computation, Vol.8, No.6, pp. 522-541, 2004. [64] L. Talavera, “An evaluation of filter and wrapper methods for feature selection in categorical clustering,” Proc 6th Int Symp on Intelligent Data Analysis (IDA05) Madrid, Vol.3646, pp. 440-451, 2005. [65] C. F. Tsai, C. W. Tsai, C. P. Chen, and F. C. Lin, “A multiplesearching approach to genetic algorithms for solving traveling salesman problem,” Joint Conf. on Information Sciences, pp. 362-366, 2002. 
[66] W. Shang, H. Huang, H. Zhu, Y. Lin, Y. Qu, and Z. Wang, “A novel feature selection algorithm for text categorization,” Expert Systems with Applications, Vol.33, No.1, pp. 1-5, 2007.


Appendix A. (1) Pseudo-Code for the SFLA-GA Procedure

begin
  Generate a random population of P solutions (frogs)
  Evaluate the fitness value of each frog by 1-Nearest Neighbor()
  Sort the population P in descending order of fitness values
  Divide P into m memeplexes
  for i = 1 to number of memeplexes
    Update the frogs of memeplex i by GA()
    Reproduce the messages of the chromosomes back to the frogs of memeplex i
  next i
  Combine the evolved memeplexes
  Sort the population P in descending order of fitness values
  Check if termination = true
end
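A compact Python rendering of the Appendix A loop might look as follows. It assumes a fitness function over binary gene masks, a GA local-search routine corresponding to Appendix B, and the parameter dictionary sketched in Section 2.5; it is a sketch under those assumptions, not the authors' implementation.

import random

def sfla_ga(num_genes, fitness, params, ga_memeplex_search):
    # Top-level SFLA-GA loop mirroring the Appendix A pseudo-code.
    # fitness(frog) scores a binary gene mask; ga_memeplex_search(memeplex)
    # runs the GA local search (Appendix B) and returns the evolved frogs.
    m = params["memeplexes"]
    n = params["frogs_per_memeplex"]
    frogs = [[random.randint(0, 1) for _ in range(num_genes)]
             for _ in range(m * n)]
    for _ in range(params["shuffling_iterations"]):
        frogs.sort(key=fitness, reverse=True)          # sort by fitness
        memeplexes = [frogs[i::m] for i in range(m)]   # deal frogs round-robin
        frogs = [f for mem in memeplexes for f in ga_memeplex_search(mem)]
    return max(frogs, key=fitness)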

Appendix B. (2) Pseudo-Code for the GA Procedure

begin
  Reproduce the messages of the frogs in the memeplex to chromosomes and use them as the initial population
  while (the number of generations or the stopping criterion is not met)
    Evaluate the fitness value of each chromosome by 1-Nearest Neighbor()
    Select two parents chrom1 and chrom2 from the population
    offspring = crossover(chrom1, chrom2)
    mutation(offspring)
    replace(population, offspring)
  next generation until the stopping criterion is met
end


Appendix C.

(3) Pseudo-Code for the 1-Nearest Neighbor Procedure

begin
  for i = 1 to number of samples of the classification problem
    nearest = infinity
    for j = 1 to number of samples of the classification problem (j ≠ i)
      dist = 0
      for k = 1 to number of dimensions of the classification problem
        dist = dist + (data_ik − data_jk)^2
      next k
      if dist < nearest then
        class_i = class_j
        nearest = dist
      end if
    next j
  next i
  correct = 0
  for i = 1 to number of samples of the classification problem
    if class_i = real class of sample i then correct = correct + 1
    end if
  next i
  Fitness value = correct / number of testing samples
end

Name: Cheng-San Yang

Name: Li-Yeh Chuang

Name: Chao-Hsuan Ke

Name: Cheng-Hong Yang
Affiliation: Electronic Engineering Department, National Kaohsiung University of Applied Sciences
Address: 415 Jiangong Rd., Sanmin District, Kaohsiung City 80778, Taiwan
Brief Biographical History:
1990 - Received B.S. degree in computer engineering from North Dakota State University
1992 - Received M.S. and Ph.D. degrees in computer engineering from North Dakota State University
Main Works:
- evolutionary computation
- bioinformatics
- assistive tool implementation
Membership in Academic Societies:
- Institute of Electrical and Electronics Engineers (IEEE)

