K. Sarojini et al. / International Journal of Engineering Science and Technology Vol. 2(5), 2010, 2456-2465
Supervised Feature Subset Selection based on Modified Fuzzy Relative Information Measure for Classifier CART

K. SAROJINI†, Dr. K. THANGAVEL†† and D. DEVAKUMARI†††

† Assistant Professor, SNR Sons College, Coimbatore, India - 641 015. [email protected]
†† Professor and Head, Periyar University, Salem, India - 636 011. [email protected]
††† Assistant Professor, Govt. Arts College, Dharmapuri, India. [email protected]
Abstract: Feature subset selection is an essential task in data mining. This paper presents a new method for supervised feature subset selection based on a Modified Fuzzy Relative Information Measure (MFRIM). First, a discretization algorithm is applied to discretize numeric features and to construct the membership functions of the fuzzy sets of each feature. Then the proposed MFRIM is applied to select the feature subset, focusing on boundary samples. The proposed method can select a feature subset with a minimum number of relevant features that yields higher average classification accuracy. Experimental results on UCI datasets show that the proposed algorithm is effective and efficient: it selects subsets with fewer features and obtains higher average classification accuracy than the consistency-based feature subset selection method.

Keywords: Discretization; Feature selection; Fuzzy Entropy; Fuzzy Relative Information Measure.

1. Introduction

Feature subset selection is an essential pre-processing task in data mining. Feature selection refers to choosing a subset [1] of attributes from the set of original attributes. The aim of feature subset selection is to reduce the number of features and to increase the classification accuracy. A data set may contain both relevant and irrelevant features; if the relevant features are selected properly, the classification accuracy rates can be increased [2, 3].

This paper presents supervised feature subset selection based on MFRIM. First, discretization algorithms are used to discretize numeric features, using K-Means, Fuzzy C-Means (FCM) and K-Means with the median as initial centroid, to construct the membership function of each fuzzy set of a feature. Then the feature subset is selected based on the proposed MFRIM, focusing on boundary samples. The proposed method can select a feature subset with a minimum number of relevant features that yields higher average classification accuracy. The method is validated using the CART classifier available in WEKA, and the average classification accuracy rates of the proposed feature subset selection method and the consistency-based method are compared. The experiments are carried out on data sets taken from the UCI Repository of Machine Learning Databases. Compared with the consistency-based feature subset selection method, the proposed method selects feature subsets with fewer features and obtains higher average classification accuracy.

The rest of this paper is organized as follows. Section 2 briefly reviews related work on feature subset selection. Section 3 presents the proposed MFRIM for feature subset selection focusing on the boundary samples. A comparative analysis is performed in Section 4. Section 5 concludes the paper.

2. Literature Review

2.1. Fuzzy Entropy Measure

Fuzzy sets and fuzzy logic are powerful mathematical tools for controlling uncertain systems; they facilitate approximate reasoning in decision making in the absence of complete and precise information. Their role is significant when applied to complex phenomena not easily described by traditional mathematical tools. Entropy measures the impurity of a collection. Some of the existing entropy measures are found in
[4, 5, 6, 7, 8, 9]. In [8], Lee et al. presented a fuzzy entropy measure of an interval, based on Shannon's entropy measure and Luca's axioms. Assume that a set X of samples is divided into a set C of classes. The class degree CDc(Ã) of the samples of class c, where c ∈ C, belonging to the fuzzy set Ã is defined by:
CDc(Ã) = Σ_{x ∈ Xc} μÃ(x) / Σ_{x ∈ X} μÃ(x)    (1)
where Xc denotes the samples of class c, c є C, μà denotes the membership function of the fuzzy set, μÃ(x) denotes the membership grade of x belonging to the fuzzy set Ã, and μÃ(x) є [0, 1]. The fuzzy entropy FEc(Ã) [9] of the samples of class c, where c є C, belonging to the fuzzy set à is defined as follows:
FEc(Ã) = −CDc(Ã) log2(CDc(Ã))    (2)
The fuzzy entropy FE(Ã) [9] of a fuzzy set à is defined by:
FE(Ã) = Σ_{c ∈ C} FEc(Ã)    (3)
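The class degree and fuzzy entropy of Eqs. (1)-(3) can be computed directly from the membership grades and the class labels. The following Python fragment is only a minimal illustrative sketch; the function names and the NumPy usage are ours, not taken from the paper.

    import numpy as np

    def class_degree(mu, labels, c):
        # Class degree CD_c(A): membership mass of class c over total membership (Eq. 1).
        mu = np.asarray(mu, dtype=float)
        labels = np.asarray(labels)
        total = mu.sum()
        return mu[labels == c].sum() / total if total > 0 else 0.0

    def fuzzy_entropy(mu, labels):
        # Fuzzy entropy FE(A): sum over classes of FE_c(A) = -CD_c(A) log2 CD_c(A) (Eqs. 2-3).
        fe = 0.0
        for c in np.unique(labels):
            cd = class_degree(mu, labels, c)
            if cd > 0:
                fe -= cd * np.log2(cd)
        return fe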
2.2. Information Gain Measure (IG)

The information gain measure [10, 11] is a quantitative measure of an attribute. Entropy measures the impurity of a collection, whereas information gain measures the purity of a collection with respect to the class attribute. The amount by which the entropy of X decreases reflects the additional information about X provided by Y and is called the information gain, given by:

IG(X | Y) = H(X) − H(X | Y)    (4)
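As a hedged illustration only (not the authors' implementation), Shannon entropy and the information gain of Eq. (4) for discrete samples can be written as:

    import numpy as np

    def entropy(x):
        # Shannon entropy H(X) of a discrete sample.
        _, counts = np.unique(x, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def information_gain(x, y):
        # IG(X | Y) = H(X) - H(X | Y), where H(X | Y) is the expected entropy of X
        # within each value of Y (Eq. 4).
        x, y = np.asarray(x), np.asarray(y)
        h_x_given_y = sum((y == v).mean() * entropy(x[y == v]) for v in np.unique(y))
        return entropy(x) - h_x_given_y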
2.3. Fuzzy Relative Information Measure (FRIM)

The FRIM [12] can be used to measure the degree of similarity between two fuzzy sets A and B, and so it is useful for further development of the theory of similarity measures. Suppose X is a discrete universe of discourse and A, B ∈ ξ(X), where ξ(X) is the set of all fuzzy subsets of X. The relative difference value R(A, B), called the fuzzy relative information measure of B to A, is defined as

R(A, B) = H(A ∩ B) / H(A) = (H(A) − H(A|B)) / H(A)    (5)

Similarly,

R(B, A) = H(B ∩ A) / H(B) = (H(B) − H(B|A)) / H(B)    (6)
R(B, A) is called the fuzzy relative mutual information of fuzzy set A to fuzzy set B. Here H(A) and H(B) denote the fuzzy entropy of the attribute sets, and H(A|B) and H(B|A) are the corresponding information gain terms. R(A, B) expresses the degree of influence of the fuzzy set A on the fuzzy set B in fuzzy information processing. It is a generalized relative fuzzy entropy measure and is called the fuzzy relative information measure (FRIM).
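Under the simplifying assumption that H is the discrete Shannon entropy defined above, Eq. (5) reduces to an information gain normalized by H(A). A minimal sketch (our naming), reusing the helper functions from Section 2.2:

    def frim(a, b):
        # R(A, B) = (H(A) - H(A|B)) / H(A), i.e. the information gain of Eq. (4)
        # normalized by H(A) (Eq. 5).
        h_a = entropy(a)
        return information_gain(a, b) / h_a if h_a > 0 else 0.0

    # R(B, A) of Eq. (6) is obtained symmetrically as frim(b, a).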
2.4. Consistency based Attribute Reduction

In [13, 14], Dash and Liu introduced the consistency function. The following definitions give the basic notions of consistency.

Definition 1. A pattern is considered inconsistent if there are at least two objects that match on the whole condition attribute set but have different decision labels.

Definition 2. The inconsistency count ξi for a pattern Pi of a feature subset is the number of times the pattern appears in the data minus the largest number of occurrences among the different class labels.

Definition 3. The inconsistency rate of a feature subset is the sum Σξi of the inconsistency counts over all patterns of the feature subset appearing in the data, divided by |U|, the number of samples, namely Σξi / |U|. Correspondingly, consistency is computed as

δ = (|U| − Σξi) / |U|    (7)
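A minimal sketch of the inconsistency rate of Definitions 1-3 (the names and data layout are our assumptions): group the samples by their pattern on the candidate subset and count, for each pattern, the samples outside the majority class.

    from collections import Counter, defaultdict

    def consistency(samples, labels, subset):
        # delta = (|U| - sum of inconsistency counts) / |U|  (Eq. 7).
        # samples: list of feature vectors, labels: class labels,
        # subset: indices of the candidate features.
        groups = defaultdict(list)
        for row, y in zip(samples, labels):
            groups[tuple(row[i] for i in subset)].append(y)
        inconsistency = sum(len(ys) - max(Counter(ys).values()) for ys in groups.values())
        return (len(samples) - inconsistency) / len(samples)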
Based on the above analysis, dependency is the ratio of samples that are certainly correctly classified, and consistency is the ratio of samples that are probably correctly classified. Based on this technique, Qing-Hua Hu and Hui Zhao [13] proposed a consistency method to evaluate the significance of attributes; a forward greedy search algorithm is used with this consistency measure to find reducts. This consistency-based attribute reduction also considers the boundary samples. Compared with dependency-based reduction, the consistency-based method is more efficient because it considers not only the positive region but also the samples of the majority class in the boundary regions. Therefore, even if the positive region is empty, it can still compare the distinguishing power of the features according to the sample distribution in the boundary regions. Experimental analysis with the CART classifier showed that, compared with dependency [13], consistency reflects not only the size of the decision positive region but also the sample distribution in the boundary region. The consistency measure can therefore describe the distinguishing power of an attribute set more finely than the dependency function, and so it gives better feature subset reduction.

2.5. Discretization Process

In data mining, discretization is known to be one of the most important data preprocessing tasks. If the attributes are continuous, the algorithms can be integrated with discretization algorithms which transform them into discrete features. Discretization methods reduce the number of values of a given continuous attribute by dividing its range into intervals, and they make learning more accurate and faster [15, 16, 17, 18]. A typical discretization process consists of the following four steps (a minimal cut-point example follows the list):

o Sort all the continuous values of the feature to be discretized.
o Choose a cut point to split the continuous values into intervals.
o Split or merge the intervals of continuous values.
o Choose the stopping criterion of the discretization process [18].
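As a hedged illustration of the cut-point idea, the sketch below performs equal-width binning, one of the discretization methods used in the experiments of Section 4; the function name and the NumPy usage are ours.

    import numpy as np

    def equal_width_discretize(values, n_bins):
        # Place n_bins - 1 evenly spaced interior cut points over the value range
        # and map each value to the index of its interval.
        values = np.asarray(values, dtype=float)
        cuts = np.linspace(values.min(), values.max(), n_bins + 1)[1:-1]
        return np.digitize(values, cuts)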
3. Proposed Work

The feature subset selection problem can be regarded as a dimension reduction problem. The proposed method uses boundary samples instead of the full set of samples to select the feature subset. However, using the boundary samples to calculate the MFRIM of a feature subset directly is not possible, so an indirect method is used to simplify the feature subset selection process, described as follows. When the FRIM with Shannon's entropy of an interval I1 is equal to that of an interval I2, it cannot distinguish the entropies of the two intervals; the MFRIM with Lee's fuzzy entropy measure, however, can indicate that the interval I2 is more ambiguous than the interval I1. Therefore, by using Lee's fuzzy entropy measure and the information gain measure in FRIM, the proposed MFRIM is obtained, and it is used as a measure of proximity between two fuzzy sets. In this process the Extension Matrix (EM), the Combined Extension Matrix (CEM) and the boundary samples are generated as in [9]. Applying the proposed MFRIM, an extension matrix EMf is generated for each feature f. Then, based on the class
degree, the Fuzzy Feature Relative Information Measure FFRIM(f) is calculated for each feature f. The Combined Extension Matrix function (CEM) is then used to construct the extension matrix of the membership grades of the values of a feature subset. Finally, the MFRIM of each feature is calculated based on the Boundary Sample Fuzzy Feature Relative Information Measure BSFFRIM(f1, f2) of a feature subset {f1, f2}. MFRIM(A, B) is called the Modified Fuzzy Relative Information Measure of B to A. It is defined as:

MFRIM(A, B) = (H(A) − H(A|B)) / H(A) = (FE(A) − FE(A|B)) / FE(A)    (8)

The MFRIM of a feature increases when the number of clusters increases. However, too many clusters could cause over-fitting and reduce the classification accuracy on new instances. This paper uses a threshold value Tc, where Tc ∈ [0, 1], to avoid the over-fitting problem. By using Lee's fuzzy entropy method, fuzzy sets with lower fuzzy entropies can be omitted. A second threshold Tr, where Tr ∈ [0, 1], is used to omit the fuzzy sets of a feature whose maximum class degree is larger than or equal to the user-given threshold Tr during feature subset selection.

Phase I

The first phase uses K-Means, Fuzzy C-Means, or K-Means with the median as initial centroid, to discretize numeric attributes and to construct triangular membership functions that fuzzify all numeric features. The steps are as follows.

Step 1: Initially, set the number K of clusters to 2.
Step 2: Use the clustering algorithm (K-Means, Fuzzy C-Means, or K-Means with the median as initial centroid) to generate K cluster centers based on the values of a feature, where K ≥ 2.
Step 3: Construct the membership functions of the fuzzy sets as triangular membership functions based on these K cluster centers.
Step 4: Calculate the fuzzy entropy of feature f using the class degree.
Step 5: Calculate the information gain of feature f using the class degree.
Step 6: Calculate the MFRIM of feature f using the fuzzy entropy and the information gain.
Step 7: If the decreasing rate of the MFRIM of feature f is larger than the user-given threshold Tc, where Tc ∈ [0, 1], then let K = K + 1 and go to Step 2. Otherwise, let K = K − 1 and stop.

Phase II

Assume that a set R of samples is divided into a set C of classes, where R = {r1, r2, ..., rc}, F denotes the set of candidate features and FS denotes the selected feature subset. The feature subset selection algorithm is as follows.

Step 1: Construct the extension matrix EMf for each feature f; the membership grades belonging to the fuzzy sets are used to calculate the Modified Fuzzy Relative Information Measure MFRIM(f).
Step 2: Put the feature with the maximum MFRIM into the selected feature subset FS and remove it from the set F of candidate features.
Step 3: Repeatedly put into FS the feature that improves the MFRIM of the feature subset, until no such feature exists. Here the Boundary Sample Fuzzy Feature Relative Information Measure BSFFRIM(FS, f) of the feature subset FS ∪ {f} is calculated focusing on boundary samples, where FS is the currently selected feature subset. An illustrative code sketch of both phases is given below.
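The following Python sketch illustrates one way Phase I and Phase II could be realised; it is not the authors' MATLAB implementation. The scikit-learn K-Means call, the handling of the outer triangles, the evaluate callback and all names are our assumptions, and the extension-matrix and boundary-sample machinery of [9] is abstracted into that callback.

    import numpy as np
    from sklearn.cluster import KMeans   # stand-in for the K-Means step of Phase I

    def triangular_memberships(values, k):
        # Phase I, Steps 2-3: K cluster centres -> K triangular fuzzy sets.
        # Returns an (n_samples, k) matrix of membership grades in [0, 1].
        x = np.asarray(values, dtype=float)
        centres = np.sort(KMeans(n_clusters=k, n_init=10, random_state=0)
                          .fit(x.reshape(-1, 1)).cluster_centers_.ravel())
        # Pad with the data range so the outer triangles have feet.
        pts = np.concatenate(([x.min()], centres, [x.max()]))
        mu = np.zeros((len(x), k))
        for j in range(k):
            a, b, c = pts[j], pts[j + 1], pts[j + 2]
            left = (x - a) / max(b - a, 1e-12)
            right = (c - x) / max(c - b, 1e-12)
            mu[:, j] = np.clip(np.minimum(left, right), 0.0, 1.0)
        return mu

    def mfrim(fe_a, fe_a_given_b):
        # MFRIM(A, B) = (FE(A) - FE(A|B)) / FE(A)  (Eq. 8).
        return (fe_a - fe_a_given_b) / fe_a if fe_a > 0 else 0.0

    def greedy_forward_selection(candidates, evaluate):
        # Phase II skeleton: evaluate(subset) is assumed to return the MFRIM-based
        # score of a feature subset computed on the boundary samples (BSFFRIM in the paper).
        selected, remaining = [], list(candidates)
        best = float("-inf")
        while remaining:
            score, f = max((evaluate(selected + [g]), g) for g in remaining)
            if score <= best:          # Step 3: stop when no feature improves the measure
                break
            best = score
            selected.append(f)         # Step 2: add the best feature to FS
            remaining.remove(f)
        return selected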
4. Experimental Analysis and Discussion

The proposed method has been implemented in MATLAB version 7.0, and two experimental analyses are presented. First, the proposed method is compared with the consistency-based feature selection method in terms of the number of selected features. Second, it is compared with the consistency-based method in terms of the classification performance obtained with the attributes selected by the proposed algorithm. The proposed algorithm gives improved results for the datasets taken from the UCI Repository of Machine Learning Databases; the datasets are described in Table 1.
Data set                        Abbreviation    Samples    Features    Class
Wisc. Progno. Breast Cancer     WPBC            198        33          2
Sonar, Mines vs. Rocks          Sonar           208        60          2
Wine Recog.                     Wine            178        13          3
Ecoli                           Ecoli           336        7           7
Ionosphere                      Iono            351        34          2

Table 1: Data Sets
The data sets used here have numerical attributes, so they are discretized to transform the numerical data into categorical data. K-Means, FCM and K-Means with the median as initial centroid are used to discretize the data, and the proposed algorithm is then applied to the discretized datasets to find the subset of features. Similarly, the consistency-based feature selection algorithm is applied to find feature subsets, with equal-width, equal-frequency, FCM and entropy-based discretization used to discretize the datasets. The numbers of features selected by the proposed algorithm and by the consistency-based algorithm are presented in Table 2. From Table 2, the number of features selected by the proposed method is smaller than that selected by the consistency-based method; for almost all datasets, and under every discretization method, the proposed algorithm selects far fewer features than the consistency-based method. The selected features are then used to train the CART classifier available in WEKA to evaluate the performance of the feature subsets selected by the different methods. The classification power of the selected features is evaluated with 10-fold cross-validation: each data set is divided into 10 subsets of approximately equal size; in each fold, one of the 10 subsets is used as the test set and the classifier is trained on the remaining 9 subsets, giving the classification accuracy rate for each selected feature subset. After the 10 runs, the average classification accuracy rate of each feature subset selection algorithm is obtained. The average classification accuracy with CART is presented in Table 3.
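A minimal sketch of this evaluation step, with scikit-learn's DecisionTreeClassifier used as a CART-style stand-in for the WEKA implementation (an assumption on our part):

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    def average_accuracy(X_selected, y):
        # 10-fold cross-validated accuracy on the columns of the selected feature subset.
        clf = DecisionTreeClassifier(random_state=0)
        return cross_val_score(clf, X_selected, y, cv=10, scoring="accuracy").mean()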
Data set    Raw data    Consistency Reduction                              Proposed MFRIM Reduction
                        Equal-width   Equal-Freq   Entropy   FCM           K-Means   FCM   Median as Initial Centroid
WPBC        33          10            6            7         7             2         2     2
SONAR       60          7             6            14        6             2         2     2
WINE        13          4             4            5         4             4         2     2
ECOLI       7           6             7            7         6             3         2     3
IONO        34          7             7             8        9             2         2     7

Table 2: A comparative analysis for feature selection (number of selected features under each discretization method)
Data set    Raw data    Consistency Based Reduction                          Proposed MFRIM Reduction
                        Equal-width   Equal-Freq   Entropy   FCM             K-Means   FCM     Median as Initial Centroid
WPBC        69.63       70.24         71.21        68.55     69.24           76.26     76.26   74.75
SONAR       72.07       70.14         74.45        74.48     69.76           73.08     73.08   73.08
WINE        89.86       90.35         91.53        94.37     89.72           90.45     91.01   90.45
ECOLI       81.97       81.38         81.38        81.68     81.68           81.85     78.87   75.59
IONO        87.55       90.64         90.64        89.22     90.62           86.61     89.17   89.74

Table 3: Average classification accuracy for selected features
Table 3 shows that the proposed algorithm gives high classification accuracy for the WPBC, SONAR and WINE datasets even with a minimum number of selected features, when compared with the raw data and the consistency-based methods. For the ECOLI and IONO datasets, the proposed algorithm gives comparable classification accuracy with a smaller number of selected features than the consistency-based method. From this analysis, the proposed algorithm is efficient for all datasets: it improves the classification accuracy with a minimum number of selected features. The threshold value Tc is used in the algorithm for constructing the membership functions of the fuzzy sets of a numeric feature, and the threshold value Tr is used in the algorithm for feature subset selection. The values of Tc and Tr used for the different datasets are shown in Table 4.
Data set    Threshold value (Tc)    Threshold value (Tr)
WPBC        0.5                     0.8
SONAR       0.2                     0.75
WINE        0.2                     0.5
ECOLI       0.2                     0.3
IONO        0.2                     0.7

Table 4: The threshold values Tc and Tr for different datasets
The numbers of features selected by the proposed method and by the consistency-based method are displayed in Figures 1, 2 and 3. The average classification accuracy rates of the proposed and consistency-based algorithms are compared in Figures 4, 5 and 6, which show that the proposed method gives higher average classification accuracy than the other methods.
Figure 1: Number of features selected by the consistency-based method and the proposed FRIM method for the WPBC dataset (x-axis: discretization algorithm; y-axis: number of features).
Figure 2: Number of features selected by the consistency-based method and the proposed FRIM method for the SONAR dataset (x-axis: discretization algorithm; y-axis: number of features).
Figure 3: Number of features selected by the consistency-based method and the proposed FRIM method for the WINE dataset (x-axis: discretization algorithm; y-axis: number of features).
Figure 4: Classification accuracy of the consistency-based and proposed [FRIM] methods for the WPBC dataset (raw data, consistency-based and proposed; y-axis: classification accuracy rate).
Figure 5: Classification accuracy of the consistency-based and proposed [FRIM] methods for the SONAR dataset (raw data, consistency-based and proposed; y-axis: classification accuracy rate).
Figure 6: Classification accuracy of the consistency-based and proposed [FRIM] methods for the WINE dataset (raw data, consistency-based and proposed; y-axis: classification accuracy rate).
5. Conclusion

This paper presents a new method for feature subset selection based on the proposed MFRIM. The proposed method can deal with numeric features. First, the numerical attributes are discretized using benchmark clustering algorithms such as K-Means, Fuzzy C-Means and K-Means with the median as initial centroid. The FRIM is computed through triangular membership functions, and the feature subset is then obtained by focusing on boundary samples. The performance of the proposed feature subset selection method is evaluated with the CART classifier available in WEKA on UCI data sets. The experimental results in Table 2 and Table 3 show that, compared with consistency-based feature selection, the proposed method selects feature subsets with fewer relevant features and obtains higher average classification accuracy.
References
[1] S.M. Chen, "A new approach to handling fuzzy decision making problems", IEEE Trans. Syst. Man Cybern. 18(6):1012-1016, 1988.
[2] S.M. Chen, C.H. Kao, C.H. Yu, "Generating fuzzy rules from training data containing noise for handling classification problems", Cybern. Syst. 33(7):723-748, 2002.
[3] S.M. Chen, J.D. Shie, "A new method for feature subset selection for handling classification problems", in: Proceedings of the IEEE International Conference on Fuzzy Systems, Reno, NV, pp. 183-188, 2005.
[4] E.C.C. Tsang, D.S. Yeung, X.Z. Wang, "OFFSS: optimal fuzzy valued feature subset selection", IEEE Trans. Fuzzy Syst. 11(2):202-213, 2003.
[5] Nicolaie Popescu-Bodorin, "Fast K-Means image quantization algorithm and its application to iris segmentation", Scientific Bulletin, 2007.
[6] S.M. Chen, C.H. Chang, "A new method to construct membership functions and generate weighted fuzzy rules from training instances", Cybern. Syst. 36(4):397-414, 2005.
[7] A. De Luca, S. Termini, "A definition of non-probabilistic entropy in the setting of fuzzy set theory", Inf. Control 20(4):301-312, 1972.
[8] H.M. Lee, C.M. Chen, J.M. Chen, Y.L. Jou, "An efficient fuzzy classifier with feature selection based on fuzzy entropy", IEEE Trans. Syst. Man Cybern. Part B Cybern. 31(3):426-432, 2001.
[9] Jen-Da Shie, Shyi-Ming Chen, "Feature subset selection based on fuzzy entropy measures for handling classification problems", Appl. Intell. 28:69-82, 2008.
[10] J.D. Shie, S.M. Chen, "A new approach for handling classification problems based on fuzzy information gain measures", in: Proceedings of the 2006 IEEE International Conference on Fuzzy Systems, Vancouver, BC, Canada, pp. 5427-5434, 2006.
[11] Lei Yu, Huan Liu, "Redundancy based feature selection for microarray data", in: Proceedings of the 2004 ACM SIGKDD, pp. 737-742, 2004.
[12] Shi-Fei Ding, Shi-Xiong Xia, Feng-Xiang Jin, Zhong-Zhi Shi, "Novel fuzzy information proximity measures", Journal of Information Science 33(6):678-685, 2007.
[13] Qinghua Hu, Hui Zhao, Zongxia Xie, Daren Yu, "Consistency based attribute reduction", LNCS 4426, pp. 96-107, 2007.
[14] M. Dash, H. Liu, "Consistency-based search in feature selection", Artificial Intelligence 151:155-176, 2003.
[15] Marzuki, F. Ahmad, "Data mining discretization methods and performance", in: Proceedings of the International Conference on Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia, June 2007.
[16] H. Liu et al., "Discretization: an enabling technique", Data Mining and Knowledge Discovery 6:393-423, 2002.
[17] J.A. Hartigan, M.A. Wong, "A k-means clustering algorithm", J. Roy. Stat. Soc. Ser. C 28(1):100-108, 1979.
[18] S.M. Chen, Y.C. Chen, "Automatically constructing membership functions and generating fuzzy rules using genetic algorithms", Cybern. Syst. 33(8):841-862, 2002.
K. Sarojini received her B.Sc. (Computer Science) and MCA from Bharathidasan University, Trichy, in 1993 and 1996 respectively, and her M.Phil. degree in Computer Science in 2003 from Manonmaniam Sundaranar University, Tirunelveli. She is pursuing her Ph.D. at Mother Teresa Women's University. She is currently working as an Assistant Professor in the Department of Computer Applications, SNR Sons College, Coimbatore. She has 12 years of teaching experience, has presented papers in various national and international conferences, is a student member of IEEE and has published a book on Fundamentals of Computers. Her area of specialization is dimensionality reduction in data mining.

Dr. K. Thangavel received his Ph.D. degree in the area of optimization algorithms from The Gandhigram Rural Institute (Deemed University), Gandhigram, Tamil Nadu, India in 1999. Currently he is working as Professor and Head of the Department of Computer Science, Periyar University, Salem, Tamil Nadu, India. He has published more than 125 research papers in various national and international journals and has edited three books published by Narosa Publishers, New Delhi, India. He has successfully guided 5 Ph.D. and 10 M.Phil. scholars, and more than 15 Ph.D. scholars are pursuing research under his supervision. His research
interests include Data Mining, Digital Medical Image Processing, Soft Computing, Mobile Computing and Bioinformatics. He is a life member of the Operational Research Society of India and a member of the research group of the Rough Set Society. He has organized three national conferences, three national seminars, five research workshops and two 21-day UGC refresher programmes. He is a reviewer for leading publishers such as Elsevier, Springer, and Taylor and Francis. He received a state-level Scientist Award from the Government of Tamil Nadu in 2010.

Mrs. D. Devakumari received her M.Phil. degree in the area of Web Server Scheduling from Manonmaniam Sundaranar University in 2003. Currently she is working as an Assistant Professor in the Department of Computer Science, Government Arts College, Dharmapuri, India. She is pursuing her Ph.D. in the area of Data Mining at Mother Teresa Women's University. Two of her research papers have been selected for publication by international journals, and she has presented papers in national and international conferences. Her research interests include Data Preprocessing and Pattern Recognition.