Simultaneous Feature Selection and Clustering for Categorical Features Using Multi Objective Genetic Algorithm
Dipankar Dutta
Paramartha Dutta
Jaya Sil
Department of Computer Science and Information Technology, University Institute of Technology, The University of Burdwan, Golapbug (North), Burdwan, West Bengal, India PIN-713104 Email:dipankar
[email protected]
Department of Computer and System Sciences, Visva-Bharati University, Santiniketan, West Bengal, India PIN-731235 Email:
[email protected]
Department of Computer Science and Technology, Bengal Engineering and Science University, Shibpur, West Bengal, India PIN-711103 Email:
[email protected]
Abstract—Clustering is unsupervised learning where, ideally, class labels and the number of clusters (K) are not known. K-clustering can be categorized as semi-supervised learning where K is known. Here we consider K-clustering with simultaneous feature selection. Feature subset selection helps to identify relevant features for clustering, increases understandability, improves scalability, and improves accuracy. We use two measures, intra-cluster distance (Homogeneity, H) and inter-cluster distance (Separation, S), for clustering. Both measures use mod distance per feature, which is suitable for categorical features (attributes). Rather than combining H and S to frame the problem as a single objective optimization problem, we use a multi objective genetic algorithm (MOGA) to find diverse solutions close to the Pareto optimal front in the two-dimensional objective space. Each evolved solution represents a set of cluster modes (CMs) built from the selected feature subset. Here, K-modes is hybridized with MOGA; the hybridized GA combines the global searching power of GA with the local searching power of K-modes. Considering context sensitivity, we use a special crossover operator called "pairwise crossover", together with "substitution". The main contribution of this paper is simultaneous dimensionality reduction and optimization of objectives using MOGA. Results on 3 benchmark data sets with categorical features from the UCI Machine Learning Repository show the superiority of the algorithm.
I. INTRODUCTION

Feature selection is the process of identifying a subset of relevant features (attributes) from a data set. Various feature selection techniques [16], [20], [27] are available in the literature. We use MOGA for simultaneous feature selection and clustering (MOGA-FeatureSelection(H, S)). MOGA-FeatureSelection(H, S) shows significant improvement in computing time and clustering quality in comparison with MOGA based clustering without feature selection (MOGA(H, S)) [8]. Although some researchers have done clustering by MOGA or by other multi objective evolutionary algorithms [1], [5], [7], [8], [10], [12], [18], [19], [22], [23],
[24], [26], none of them has tried simultaneous feature selection. Multi objective clustering on categorical data has been done in [5], [8], [10], [22], [23], [24]. In [8] we showed that MOGA(H, S) is better than a non-GA method like K-modes. MOGA-FeatureSelection(H, S) automatically distinguishes between relevant and irrelevant features for clustering, without taking the help of separate feature selection techniques, which is the main contribution of this work.

Clustering is the formation of natural groups within a data set. However, choosing the optimization criterion is one of the fundamental dilemmas in clustering [17]. Objects within the same cluster are similar, therefore intra-cluster distances (H) are low; at the same time, objects in different clusters are dissimilar, so inter-cluster distances (S) are high. As the relative importance of H and S is not known, it is better to optimize Homogeneity (H) and Separation (S) separately rather than combining them into a single measure for optimization. Although GAs have been applied to data clustering problems [6], [11], [21], [25], most of them optimize a single objective, which is not equally applicable to all kinds of data sets. So, to solve a multi objective optimization problem like clustering, MOGA is applied in this paper.

The use of GAs in clustering is an attempt to exploit effectively the large search space usually associated with cluster analysis, and the interactions among features when forming chromosomes. Almost all conventional methods start searching from a single point and, through subsequent iterations, converge to the best solution; hence they are sensitive to the choice of the initial solution. GAs always work on a whole population of points (strings) and perform a nearly global search rather than a local, hill-climbing search. This improves the chance of reaching the global optimum and reduces the risk of being trapped in a local optimum, thereby offering robustness.
This paper is organized as follows. Section II clarifies definitions pertaining to the problem. Section III describes the proposed approach for solving the problem. In section IV, tests are carried out on some popular benchmark data sets. Finally, section V summarizes the work with concluding remarks.

II. DEFINITIONS

A relation R and a list of features A_1, A_2, ..., A_n define a relational schema R(A_1, A_2, ..., A_n), where n = total number of features including the class label. The domain of A_i is dom(A_i), where 1 ≤ i ≤ n. X represents a data set, which is a set of tuples, that is X = {t_1, t_2, ..., t_m}, where m = total number of tuples or records. Each tuple t is an n-dimensional attribute vector, an ordered list of n values, i.e. t = [v_1^t, v_2^t, ..., v_n^t], where v_i^t ∈ dom(A_i), with 1 ≤ i ≤ n. v_i^t is the ith value in tuple t, which corresponds to the feature A_i. Each tuple t belongs to a predefined class, represented by v_n^t, where v_n^t ∈ dom(A_n). For clustering, v_n^t or class labels are not known, so t = [v_1^t, v_2^t, ..., v_{n-1}^t], which implies that (n − 1) = total number of features excluding the class label.

Formally, the problem is stated as follows: every t_i of X (1 ≤ i ≤ m) is to be clustered into K non-overlapping groups {C_1, C_2, ..., C_K}, where C_1 ∪ C_2 ∪ ... ∪ C_K = X, C_i ≠ Ø, and C_i ∩ C_j = Ø for i ≠ j and j ≤ K. A solution is a set of cluster modes (CMs), that is {CM_1, CM_2, ..., CM_K}. An n_va-dimensional feature vector, that is [c_{i1}, c_{i2}, ..., c_{i(n-1)}], represents CM_i, where n_va is the number of valid features.

Calculating the mod distance between two points is the most common way of measuring dissimilarity for categorical data. We modify the mod distance formula by dividing it by n_va to obtain the mod distance per feature. In MOGA-FeatureSelection(H, S), equations 1, 2 and 3 calculate the mod distance per feature between a cluster mode and a tuple, between two cluster modes, and between two tuples, respectively.
d_{mod}(CM_i, t_j) = \left[ \left( \sum_{l=1}^{n-1} MOD(c_{il}, v_l^j) \right) / n_{va} \right]^{1/2}    (1)

d_{mod}(CM_i, CM_j) = \left[ \left( \sum_{l=1}^{n-1} MOD(c_{il}, c_{jl}) \right) / n_{va} \right]^{1/2}    (2)

d_{mod}(t_i, t_j) = \left[ \left( \sum_{l=1}^{n-1} MOD(v_l^i, v_l^j) \right) / n_{va} \right]^{1/2}    (3)
If c_{il} = # (representing the don't-care condition for an invalid feature) then MOD(c_{il}, v_l^j) = 0, MOD(c_{il}, c_{jl}) = 0 and MOD(v_l^i, v_l^j) = 0. MOD(c_{il}, v_l^j), MOD(c_{il}, c_{jl}) and MOD(v_l^i, v_l^j) are equal to 0 if c_{il} = v_l^j, c_{il} = c_{jl} and v_l^i = v_l^j respectively; otherwise they are equal to 1. As all features form chromosomes in MOGA(H, S), n_va = (n − 1) in equations 1, 2 and 3 when calculating d(CM_i, t_j), d(CM_i, CM_j) and d(t_i, t_j) respectively.
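To make the distance definition concrete, the following is a minimal Python sketch of the mod distance per feature of equation 1, assuming a cluster mode is stored as a list of categorical values with '#' marking invalid (don't-care) features; the function name and data representation are illustrative and not taken from the paper.

    def mod_distance_per_feature(mode, tuple_values):
        """Mod distance per feature (equation 1 sketch); '#' marks an
        invalid (don't-care) feature in the cluster mode."""
        n_valid = sum(1 for c in mode if c != '#')           # nva: number of valid features
        if n_valid == 0:
            return 0.0
        mismatches = sum(1 for c, v in zip(mode, tuple_values)
                         if c != '#' and c != v)             # MOD(c, v) = 1 on mismatch, else 0
        return (mismatches / n_valid) ** 0.5                 # [(sum MOD)/nva]^(1/2)

    # Example: mode with features 3 and 5 invalid; one mismatch over 3 valid features
    mode = ['red', 'small', '#', 'round', '#']
    t    = ['red', 'large', 'x', 'round', 'y']
    print(mod_distance_per_feature(mode, t))                 # sqrt(1/3) ~= 0.577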
III. PROPOSED APPROACHES

Original data sets are rearranged so that the last feature is the class label; after this step the class label becomes the nth feature. In clustering, class labels are unknown, so we consider only the first (n − 1) features of the data sets. We discuss MOGA-FeatureSelection(H, S) in detail in this section, highlighting its specialities in comparison to MOGA(H, S).

1) Building initial population: The initial population size is the nearest integer to 10% of the data set size. Although the correlation between the Initial Population Size (IPS) and the number of instances in the data set is not explicitly known, our experience suggests that IPS guides the searching power of the GA and therefore should increase with the size of the data set. K tuples are chosen randomly from X to form the jth chromosome Ch_j (where 1 ≤ j ≤ IPS) in a population, and the process is repeated IPS times. A random number of features builds each chromosome. Algorithm 1 describes the process of building the initial population, and a short code sketch follows it. In MOGA(H, S), all features of the data set build the chromosomes.

Algorithm 1 Building the initial population for MOGA-FeatureSelection(H, S)
  m ← total number of tuples in the data set
  K ← number of clusters
  n ← number of features
  IPS ← (1/10 × m)
  rn ← GenerateRandomNumber(1, (n − 1))   /* generate an integer random number between 1 and (n − 1) */
  valid features (va) ← ChooseAttributes(rn, n − 1)   /* choose rn features from the (n − 1) features */
  invalid features (ia) ← NotChosenAttributes(va, n − 1)   /* the remaining (n − 1 − rn) features */
  for j = 1 to IPS do
    for i = 1 to K do
      rn_i ← GenerateRandomNumber(1, m)   /* generate an integer random number between 1 and m */
    end for
    Ch_j ← v_1^{rn1}/#, v_2^{rn1}/#, ..., v_{n−1}^{rn1}/# | v_1^{rn2}/#, v_2^{rn2}/#, ..., v_{n−1}^{rn2}/# | ... | v_1^{rnK}/#, v_2^{rnK}/#, ..., v_{n−1}^{rnK}/#
    /* v_l^{rni} is the value of the lth feature of the rn_i-th tuple. ',' separates the dimensions of a CM_i and '|' separates CMs. # represents the don't-care condition. For va, the values of the tuples build the chromosome; for ia, # builds the chromosome. */
  end for
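The following is a minimal Python sketch of this initialization, assuming tuples are lists of categorical values; names such as build_chromosome are illustrative and not part of the paper. Here the feature subset is drawn independently for each chromosome, which is one reading of Algorithm 1 and matches the later crossover step's assumption that different chromosomes may carry different feature subsets.

    import random

    def build_chromosome(data, K, n_features):
        """One chromosome: K cluster modes copied from random tuples, with a
        random subset of features kept valid and the rest set to '#'."""
        rn = random.randint(1, n_features)                    # number of valid features
        valid = set(random.sample(range(n_features), rn))     # chosen feature indices (va)
        chromosome = []
        for _ in range(K):
            seed = random.choice(data)                        # a randomly chosen tuple
            mode = [seed[l] if l in valid else '#' for l in range(n_features)]
            chromosome.append(mode)
        return chromosome

    def initial_population(data, K, n_features):
        ips = max(1, round(len(data) / 10))                   # IPS ~ 10% of data set size
        return [build_chromosome(data, K, n_features) for _ in range(ips)]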
2) Reassign cluster modes: Every chromosome delivers a set of cluster modes (CMs), that is {CM_1, CM_2, ..., CM_K}. However, in MOGA-FeatureSelection(H, S), not all features build the cluster modes: the K-modes algorithm [14], [15] forms new cluster modes {CM_1*, CM_2*, ..., CM_K*} from the valid features only. In MOGA(H, S), the K-modes algorithm forms new cluster modes {CM_1*, CM_2*, ..., CM_K*} from all features.

3) Crossover: The crossover probability (P_co) for MOGA(H, S) and MOGA-FeatureSelection(H, S) is 0.9. A high P_co increases the chance of obtaining good chromosomes through a higher rate of information exchange among chromosomes. Parameter fitting of GAs is discussed in detail in [4]. "Pairwise crossover" [9] is used in both algorithms. Parent chromosomes must have the same set of valid and invalid features so that distances between cluster modes can be calculated, so two extra steps are needed for crossover in MOGA-FeatureSelection(H, S). First, one chromosome is selected randomly from the population generated by the reassign cluster modes step (section III-2), and its set of valid and invalid features is taken as the reference. Second, the population for crossover is built by selecting the chromosomes having the same set of valid and invalid features as the randomly selected chromosome. In this way a fraction of the population, sharing the same set of valid and invalid features, undergoes crossover following the procedure described in algorithm 2. Crossover changes some chromosomes; child chromosomes replace their parent chromosomes in the original population formed by the reassign cluster modes step (section III-2), while chromosomes not participating in crossover are copied as they are.

4) Substitution: The substitution probability (P_sb) is 0.1. Introducing random information into good chromosomes may destroy good building blocks, so the P_sb value is kept low. Substitution may shift chromosomes from a local optimum zone towards the global optimum zone. Chromosomes formed by crossover become the parent chromosomes for this step. Substitution in MOGA-FeatureSelection(H, S) is the same as in MOGA(H, S) except in one place: only the valid features of cluster modes are substituted by the feature values of a randomly chosen tuple. Algorithm 3 describes the process, and a short code sketch follows it.
Algorithm 3 Substitution for MOGA(H, S) and MOGA-FeatureSelection(H, S)
  K ← number of clusters
  N_pch ← number of PChs in the population
  P_sb ← 0.1
  for i = 1 to N_pch do
    for j = 1 to K do
      rn1 ← GenerateRandomNumber(0, 1)   /* generate a random number between 0 and 1 */
      if rn1 < P_sb then
        rn2 ← GenerateRandomNumber(1, m)   /* generate an integer random number between 1 and m */
        Substitute CM_j of PCh_i with t_{rn2} or a subset of t_{rn2}
      end if
    end for
  end for
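A minimal Python sketch of the substitution step, under the chromosome and tuple representation assumed in the earlier sketches: with probability P_sb, each cluster mode's valid features are overwritten by the corresponding values of a randomly chosen tuple. The function name is illustrative.

    import random

    def substitute(chromosome, data, p_sb=0.1):
        """Substitution sketch (cf. Algorithm 3): with probability p_sb, replace
        the valid features of a cluster mode with values from a random tuple."""
        for k, mode in enumerate(chromosome):
            if random.random() < p_sb:
                donor = random.choice(data)
                chromosome[k] = [donor[l] if mode[l] != '#' else '#'
                                 for l in range(len(mode))]   # only valid features change
        return chromosome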
5) Fitness Computation: Intra-cluster distance (Homogeneity, H) and inter-cluster distance (Separation, S) are the two measures optimized in MOGA(H, S). Similarly, modified intra-cluster distance (H_mod) and modified inter-cluster distance (S_mod) are the two measures used for calculating fitness values in MOGA-FeatureSelection(H, S). Maximization of 1/H (or 1/H_mod) and S (or S_mod) are the twin objectives of MOGA(H, S) (or MOGA-FeatureSelection(H, S)); this puts the optimization problem into a max-max framework. If d_lowest(CM_i, t_j) is the lowest distance over all i (where 1 ≤ i ≤ K), then t_j is assigned to C_i. Equation 4 defines H_i, the average intra-cluster distance (homogeneity) of the ith cluster:

H_i = \left[ \sum_{j=1}^{m} d_{lowest}(CM_i, t_j) \right] / m_i    (4)

where m_i is the number of tuples belonging to the ith cluster and t_j is assigned to the ith cluster based on the lowest distance. The summation of H_i is H, defined in equation 5 as

H = \sum_{i=1}^{K} H_i    (5)
Say t_j and t_p are two distinct tuples from the data set, where t_j is assigned to C_x and t_p is assigned to C_y using d_lowest(CM_i, t_j). S is defined in equation 6 as

S = \sum_{j,p=1}^{m} d(t_j, t_p)    (6)
where j ≠ p and x ≠ y. Equations 4, 5 and 6 are modified into equations 7, 8 and 9 respectively:

H_{mod_i} = \left[ \sum_{j=1}^{m_i} d_{mod_{lowest}}(CM_i, t_j) \right] / m_i    (7)

H_{mod} = \sum_{i=1}^{K} H_{mod_i}    (8)

S_{mod} = \sum_{j,p=1}^{m} d_{mod}(t_j, t_p)    (9)
As we use mod distance per feature, the 1/H_mod and S_mod values of different chromosomes having different sets of valid features become comparable with one another. Note that when all features take part in chromosome building, equations 7, 8 and 9 become the same as equations 4, 5 and 6 respectively. Algorithm 4 calculates 1/H (or 1/H_mod) and S (or S_mod) for every chromosome in the population.
Algorithm 2 Crossover for MOGA(H, S) and MOGA-FeatureSelection(H, S)
  /* All Ch produced by K-modes as described in section III-2 are PChs for crossover in MOGA(H, S), whereas selected chromosomes are PChs in MOGA-FeatureSelection(H, S) as described in section III-3 */
  P_co ← 0.9; N_pch ← number of PChs in the population for crossover; N_co ← N_pch × P_co; K ← number of clusters; n ← number of features
  for counter = 1 to N_pch do
    PChFlag_counter ← True   /* to prevent repeated choosing of the same PCh */
  end for
  counter ← 1
  while counter < N_co/2 do
    rn1 ← GenerateRandomNumber(1, N_pch); rn2 ← GenerateRandomNumber(1, N_pch)   /* generate integer random numbers between 1 and N_pch */
    if rn1 ≠ rn2 AND PChFlag_rn1 = True AND PChFlag_rn2 = True then
      counter ← counter + 1; PChFlag_rn1 ← False; PChFlag_rn2 ← False   /* to prevent repeated choosing of the same PCh */
      /* Building the distance matrix (M) */
      for i = 1 to K do
        for j = 1 to K do
          CM_i ← CM_i of PCh_rn1; CM_j ← CM_j of PCh_rn2
          a_ij ← d(CM_i, CM_j) or d_mod(CM_i, CM_j)   /* a_ij is the element in the ith row and jth column of M; equation 2 calculates d(CM_i, CM_j) or d_mod(CM_i, CM_j) */
        end for
      end for
      /* Constructing CChs from PChs */
      for counter1 = 1 to K do
        flag1_counter1 ← True; flag2_counter1 ← True
      end for
      for counter1 = 1 to K do
        minDistance ← 1000000; i1 ← −1; j1 ← −1
        for i = 1 to K do
          if flag1_i = True then
            for j = 1 to K do
              if flag2_j = True then
                if a_ij < minDistance then
                  minDistance ← a_ij; i1 ← i; j1 ← j
                end if
              end if
            end for
          end if
        end for
        rn3 ← GenerateRandomNumber(0, 1)   /* generate a random number between 0 and 1 */
        if rn3 < 0.5 then
          CM_i1 of PCh_rn1 ∈ CCh_rn1; CM_j1 of PCh_rn2 ∈ CCh_rn2
        else
          CM_i1 of PCh_rn1 ∈ CCh_rn2; CM_j1 of PCh_rn2 ∈ CCh_rn1
        end if
        flag1_i1 ← False; flag2_j1 ← False   /* to prevent repeated choosing of the same CM of PCh_rn1 or PCh_rn2 */
      end for
    end if
  end while
  /* For PChs not taking part in the crossover operation */
  for counter = 1 to N_pch do
    if PChFlag_counter = True then
      CCh_counter ← PCh_counter
    end if
  end for
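A compact Python sketch of the pairwise-crossover idea in Algorithm 2: the cluster modes of two parents are matched greedily by smallest inter-mode distance, and each matched pair is distributed at random between the two children. The chromosome representation and the dist callback (e.g. the mod distance per feature between two modes) follow the earlier sketches and are assumptions, not the authors' implementation.

    import random

    def pairwise_crossover(parent1, parent2, dist):
        """Greedily pair the closest cluster modes of two parents and split
        each pair between the two children at random."""
        K = len(parent1)
        free1, free2 = set(range(K)), set(range(K))
        child1, child2 = [], []
        for _ in range(K):
            # pick the still-unmatched pair (i, j) of modes with the smallest distance
            i1, j1 = min(((i, j) for i in free1 for j in free2),
                         key=lambda ij: dist(parent1[ij[0]], parent2[ij[1]]))
            if random.random() < 0.5:
                child1.append(parent1[i1]); child2.append(parent2[j1])
            else:
                child1.append(parent2[j1]); child2.append(parent1[i1])
            free1.discard(i1); free2.discard(j1)      # prevent re-using matched modes
        return child1, child2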
Algorithm 4 Calculating H (or H_mod) and S (or S_mod) for MOGA(H, S) or MOGA-FeatureSelection(H, S)
  m ← total number of tuples in the data set
  K ← number of clusters
  n_va ← number of valid features
  /* Assign tuples to clusters */
  for j = 1 to m do
    AssignedClusterNumber_j ← −1
  end for
  for j = 1 to m do
    distance ← 1000000000
    clusterNo ← −1
    for i = 1 to K do
      /* Equation 1 calculates d(CM_i, t_j) or d_mod(CM_i, t_j) */
      if distance > d(CM_i, t_j) or d_mod(CM_i, t_j) then
        distance ← d(CM_i, t_j) or d_mod(CM_i, t_j)
        clusterNo ← i
      end if
      if distance = d(CM_i, t_j) or d_mod(CM_i, t_j) then
        rn ← GenerateRandomNumber(0, 1)   /* generate a random number between 0 and 1 */
        if rn > 0.5 then
          distance ← d(CM_i, t_j) or d_mod(CM_i, t_j)
          clusterNo ← i
        end if
      end if
    end for
    AssignedClusterNumber_j ← clusterNo
  end for
  /* Calculating H or H_mod */
  H or H_mod ← 0
  for i = 1 to K do
    H_i or H_mod_i ← 0
    m_i ← 0
  end for
  for j = 1 to m do
    for i = 1 to K do
      if i = AssignedClusterNumber_j then
        H_i or H_mod_i ← H_i or H_mod_i + d(CM_i, t_j) or d_mod(CM_i, t_j)
        m_i ← m_i + 1
      end if
    end for
  end for
  for i = 1 to K do
    H or H_mod ← H or H_mod + (H_i or H_mod_i)/m_i
  end for
  /* Calculating S or S_mod */
  S or S_mod ← 0
  for j = 1 to m do
    for i = j + 1 to m do
      if AssignedClusterNumber_i ≠ AssignedClusterNumber_j then
        S or S_mod ← S or S_mod + d(t_i, t_j) or d_mod(t_i, t_j)   /* Equation 3 calculates d(t_i, t_j) or d_mod(t_i, t_j) */
      end if
    end for
  end for
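The following Python sketch mirrors Algorithm 4 under the representation assumed earlier: each tuple is assigned to its nearest cluster mode, H_mod sums the per-cluster average distances, and S_mod sums distances between tuples assigned to different clusters. The d_mode_tuple and d_tuple_tuple callbacks are assumed to implement equations 1 and 3 for the chromosome's valid feature subset, and the tie-breaking randomization of Algorithm 4 is omitted for brevity.

    def fitness(chromosome, data, d_mode_tuple, d_tuple_tuple):
        """Compute (H_mod, S_mod) for one chromosome (sketch of Algorithm 4)."""
        K = len(chromosome)
        assign, nearest_dist = [], []
        for t in data:                                    # assign each tuple to its nearest mode
            dists = [d_mode_tuple(cm, t) for cm in chromosome]
            i = min(range(K), key=lambda k: dists[k])
            assign.append(i); nearest_dist.append(dists[i])

        h_mod = 0.0                                       # sum over clusters of average distance
        for i in range(K):
            members = [d for d, a in zip(nearest_dist, assign) if a == i]
            if members:
                h_mod += sum(members) / len(members)

        s_mod = 0.0                                       # distances across different clusters
        for j in range(len(data)):
            for p in range(j + 1, len(data)):
                if assign[j] != assign[p]:
                    s_mod += d_tuple_tuple(data[j], data[p])
        return h_mod, s_mod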
6) Combination: At the end of every generation of the MOGA, some chromosomes lie on the Pareto optimal front. These chromosomes survive and are known as Pareto chromosomes. The chromosomes obtained from the preceding substitution step (algorithm 3 of section III-4), say NC_pre, and the Pareto chromosomes of the previous generation (section III-8), say PC_pre, are carried forward into the next generation. For the first generation of the MOGA, PC_pre is zero because no previous generation exists. In general, for the ith generation of the MOGA, the number of chromosomes is PC_pre + NC_pre.

7) Eliminating duplication: Chromosomes that survived from the previous generation may be generated again in the present generation of the MOGA, resulting in multiple copies of the same chromosome. So in this step the system selects only the unique chromosomes from the combined chromosome set. One may question the need for this step, but without it the population size would grow very large after a number of iterations of the MOGA.

8) Selection of Pareto chromosomes: All chromosomes lying on the front of max(1/H, S) or max(1/H_mod, S_mod) are Pareto chromosomes. As this process is applied after combination, elitism is adhered to.

9) Building intermediate population: From the 2nd generation onwards, for every even generation, the selected Pareto chromosomes of the previous generation build the population. This helps to build a population of chromosomes close to the Pareto optimal front, aiding convergence of the MOGA. For every odd generation, or whenever the previous generation does not produce more than 2 Pareto chromosomes, the population is created using algorithm 1 of section III-1. This introduces new chromosomes and induces diversity in the population. The population size therefore varies from one generation to another.
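As a concrete illustration of the selection step, here is a minimal Python sketch of extracting the Pareto (non-dominated) chromosomes when both objectives are maximized; representing each chromosome's fitness as a (1/H_mod, S_mod) pair is an assumption made for illustration.

    def pareto_front(population, objectives):
        """Keep chromosomes whose (1/H_mod, S_mod) pair is not dominated by
        any other; both objectives are maximized."""
        def dominates(a, b):
            # a dominates b if a is >= in every objective and > in at least one
            return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
        front = []
        for i, ch in enumerate(population):
            if not any(dominates(objectives[j], objectives[i])
                       for j in range(len(population)) if j != i):
                front.append(ch)
        return front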
IV. TESTING AND COMPARISON

Different validity indices are available in the literature to measure the goodness of clusters; the Davies-Bouldin (DB) Index [3] is used here. Equation 10 calculates the DB Index:

DB\;Index = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \left( (H_{mod_i} + H_{mod_j}) / d_{mod}(CM_i, CM_j) \right)    (10)

where K is the number of clusters, H_{mod_i} is the average distance of all patterns in cluster i to their cluster mode CM_i, H_{mod_j} is the average distance of all patterns in cluster j to their cluster mode CM_j, and d_{mod}(CM_i, CM_j) is the distance between cluster modes CM_i and CM_j. Both MOGA(H, S) and MOGA-FeatureSelection(H, S) calculate the DB Index by equation 10. Three popular data sets, Soybean(Small), Tic Tac Toe and Zoo from the UCI Machine Learning Repository [13], are used to compare the performance of the two clustering algorithms. For every data set and every clustering algorithm, average values over 10 individual runs are tabulated in table I. A detailed analysis is not possible here due to space limitations.
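A minimal Python sketch of the DB Index of equation 10, assuming the per-cluster average distances H_mod_i and a mode-to-mode distance function are already available (for example from the fitness sketch above); the names are illustrative, and distinct cluster modes are assumed so the denominator is non-zero.

    def db_index(h_mod, modes, d_mode_mode):
        """Davies-Bouldin index (equation 10 sketch): for each cluster take the
        worst ratio of summed within-cluster scatter to mode separation."""
        K = len(modes)
        total = 0.0
        for i in range(K):
            total += max((h_mod[i] + h_mod[j]) / d_mode_mode(modes[i], modes[j])
                         for j in range(K) if j != i)
        return total / K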
TABLE I
COMPARISON OF PERFORMANCE OF CLUSTERING ALGORITHMS
1: Optimized min. H or H_mod of MOGA population, 2: Optimized max. S or S_mod of MOGA population, 3: DB Index of chromosome giving optimized min. H or H_mod of MOGA population, 4: DB Index of chromosome giving optimized max. S or S_mod of MOGA population, 5: Average DB Index of MOGA population, 6: Best DB Index of MOGA population

                 |                MOGA(H, S)                  |       MOGA-FeatureSelection(H, S)
Data set         |  A1    A2        A3    A4    A5     A6     |  B1    B2         B3    B4    B5    B6
Soybean(Small)   |  0.41  289.43    0.86  0.91  0.87   0.77   |  0.14  403.81     0.49  1.27  0.71  0.38
Tic Tac Toe      |  0.96  157324.8  0.96  2.24  1.43   0.96   |  0.56  203568.80  0.56  0.66  0.62  0.56
Zoo              |  0.52  1957.21   0.83  1.16  1.093  0.77   |  0.07  2410.08    0.10  0.69  0.47  0.10
V. CONCLUSIONS

In this work, we have implemented two versions of a novel real coded hybrid elitist MOGA for K-clustering: MOGA(H, S) and MOGA-FeatureSelection(H, S). It is known that the elitist model of GAs provides the optimal string as the number of iterations increases to infinity [2]. Due to the use of a special population building process, the hybridization of K-modes with the GA, and the special crossover and substitution operators, the GAs produce good clustering solutions in a small number of iterations (100). The results show the effectiveness of simultaneous feature selection. We have achieved one important data mining task, clustering, by finding cluster modes. Dealing with missing feature values and unseen data are other problem areas, and it may be interesting to adapt MOGA based K-clustering to an unknown number of clusters. The authors are working in these directions.

REFERENCES
[1] S. Bandyopadhyay, U. Maulik, and A. Mukhopadhyay, "Multiobjective genetic clustering for pixel classification in remote sensing imagery," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 5, May 2007, pp. 1506-1511, doi:10.1109/TGRS.2007.892604.
[2] D. Bhandari, C. A. Murthy, and S. K. Pal, "Genetic algorithm with elitist model and its convergence," Int. J. Patt. Recogn. Artif. Intell., vol. 10, no. 6, Sep. 1996, pp. 731-747, doi:10.1142/S0218001496000438.
[3] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 2, Apr. 1979, pp. 224-227, doi:10.1109/TPAMI.1979.4766909.
[4] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. New York: John Wiley & Sons, 2001.
[5] S. Deng, Z. He, and X. Xu, "G-ANMI: A mutual information based genetic clustering algorithm for categorical data," Knowledge-Based Systems, vol. 23, no. 2, Mar. 2010, pp. 144-149, doi:10.1016/j.knosys.2009.11.001.
[6] K. Dhiraj and S. K. Rath, "Comparison of SGA and RGA based clustering algorithm for pattern recognition," Int. J. of Recent Trends in Engg., vol. 1, no. 1, May 2009, pp. 269-273.
[7] D. Dutta, P. Dutta, and J. Sil, "Clustering by multi objective genetic algorithm," in Proc. of 1st Int. Conf. on Recent Advances in Information Technology (RAIT 2012), IEEE Press, Dhanbad, India, Mar. 2012, pp. 548-553, doi:10.1109/RAIT.2012.6194619.
[8] D. Dutta, P. Dutta, and J. Sil, "Clustering data set with categorical feature using multi objective genetic algorithm," in Proc. Int'l Conf. on Data Science and Engineering (ICDSE 2012), IEEE Press, Cochin, Kerala, India, Jul. 2012, pp. 103-108, doi:10.1109/ICDSE.2012.6281897.
[9] P. Fränti, J. Kivijärvi, T. Kaukoranta, and O. Nevalainen, "Genetic algorithms for large-scale clustering problems," The Computer Journal, vol. 40, no. 9, Sep. 1997, pp. 547-554, doi:10.1093/comjnl/40.9.547.
[10] G. Gan, J. Wu, and Z. Yang, "A genetic fuzzy K-Modes algorithm for clustering categorical data," Expert Systems with Applications, vol. 36, no. 2, Mar. 2009, pp. 1615-1620, doi:10.1016/j.eswa.2007.11.045.
[11] L. O. Hall, I. B. Özyurt, and J. C. Bezdek, "Clustering with a genetically optimized approach," IEEE Trans. Evol. Comput., vol. 3, no. 2, Jul. 1999, pp. 103-112, doi:10.1109/4235.771164.
[12] J. Handl and J. Knowles, "An evolutionary approach to multiobjective clustering," IEEE Trans. Evol. Computat., vol. 11, no. 1, Feb. 2007, pp. 56-76, doi:10.1109/TEVC.2006.877146.
[13] S. Hettich, C. Blake, and C. Merz, "UCI repository of machine learning databases," 1998. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[14] Z. Huang, "Clustering large data sets with mixed numeric and categorical values," in Proc. 1st Pac. Asia Knowl. Discovery Data Mining Conf., World Scientific, Singapore, 1997, pp. 21-34.
[15] Z. Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values," Data Mining Knowl. Discovery, vol. 2, no. 3, 1998, pp. 283-304.
[16] Y. Kim, W. N. Street, and F. Menczer, "Feature selection in unsupervised learning via evolutionary search," in Proc. 6th ACM SIGKDD Int. Conf. Knowl. Dis. Data Mining, Boston, MA, USA, Aug. 2000, pp. 365-369.
[17] J. Kleinberg, "An impossibility theorem for clustering," in Advances in Neural Information Processing Systems, vol. 15, S. Becker, S. Thrun, and K. Obermayer, Eds., Cambridge, MA: MIT Press, 2002, pp. 446-453.
[18] E. E. Korkmaz, J. Du, R. Alhajj, and K. Barker, "Combining advantages of new chromosome representation scheme and multi-objective genetic algorithms for better clustering," Intell. Data Anal., vol. 10, no. 2, Mar. 2006, pp. 163-182.
[19] M. H. C. Law, A. P. Topchy, and A. K. Jain, "Multiobjective data clustering," in Proc. IEEE Comput. Soc. Conf. on Comput. Vision and Pattern Recognit. (CVPR 2004), vol. 2, IEEE Press, Washington, DC, Jun.-Jul. 2004, pp. 424-430, doi:10.1109/CVPR.2004.1315194.
[20] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, Apr. 2005, pp. 1-12, doi:10.1109/TKDE.2005.56.
[21] U. Maulik and S. Bandyopadhyay, "Genetic algorithm-based clustering technique," Pattern Recognit., vol. 33, no. 9, Sep. 2000, pp. 1455-1465.
[22] U. Maulik, S. Bandyopadhyay, and I. Saha, "Integrating clustering and supervised learning for categorical data analysis," IEEE Trans. Syst. Man Cybern. A, vol. 40, no. 4, Jul. 2010, pp. 664-675, doi:10.1109/TSMCA.2010.2041225.
[23] A. Mukhopadhyay, U. Maulik, and S. Bandyopadhyay, "Multiobjective genetic fuzzy clustering of categorical attributes," in Proc. 10th Int. Conf. on Inf. Technol. (ICIT 2007), IEEE Press, Orissa, India, Dec. 2007, pp. 74-79, doi:10.1109/ICIT.2007.13.
[24] A. Mukhopadhyay, U. Maulik, and S. Bandyopadhyay, "Multiobjective genetic algorithm-based fuzzy clustering of categorical attributes," IEEE Trans. Evol. Computat., vol. 13, no. 5, Oct. 2009, pp. 991-1005, doi:10.1109/TEVC.2009.2012163.
[25] H. Pan, J. Zhu, and D. Han, "Genetic algorithms applied to multi-class clustering for gene expression data," Genomics, Proteomics & Bioinformatics, vol. 1, no. 4, Nov. 2003, pp. 279-287.
[26] K. S. N. Ripon and M. N. H. Siddique, "Evolutionary multi-objective clustering for overlapping clusters detection," in Proc. of the 11th Conf. on Congress on Evolutionary Computation (CEC 2009), IEEE Press, Piscataway, NJ, May 2009, pp. 976-982, doi:10.1109/CEC.2009.4983051.
[27] H. Samet, Foundations of Multidimensional and Metric Data Structures. Amsterdam: Elsevier, 2006.