(JPMNT) Journal of Process Management – New Technologies, International Vol. 2, No.2, 2014.
TOWARDS FINDING A NEW KERNELIZED FUZZY C-MEANS CLUSTERING ALGORITHM
Samarjit Das¹ and Hemanta K. Baruah²
¹ Department of Computer Science & IT, Cotton College, Assam, India
² Vice-Chancellor, Bodoland University, Assam, India
[email protected], [email protected]
Abstract- The Kernelized Fuzzy C-Means clustering technique is an attempt to improve the performance of the conventional Fuzzy C-Means clustering technique. This technique, in which a kernel-induced distance function is used as a similarity measure in place of the Euclidean distance of the conventional Fuzzy C-Means clustering technique, has recently earned popularity in the research community. Like the conventional technique, however, it suffers from inconsistency in its performance, because here too the initial centroids are obtained from the randomly initialized membership values of the objects. Our present work proposes a new method in which the Subtractive clustering technique of Chiu is applied as a preprocessor to the Kernelized Fuzzy C-Means clustering technique. With this new method we try not only to remove the inconsistency of the Kernelized Fuzzy C-Means clustering technique but also to deal with situations where the number of clusters is not predetermined. We also provide a comparison of our method with the Subtractive clustering technique of Chiu and the Kernelized Fuzzy C-Means clustering technique using two validity measures, namely Partition Coefficient and Clustering Entropy.

Keywords: Kernelized Fuzzy C-Means clustering technique, Subtractive clustering technique, randomly initialized membership values, Partition Coefficient, Clustering Entropy.
1. Introduction
With the advancement of novel techniques for generating and collecting data, the rate of growth of scientific databases has increased almost exponentially. It is therefore practically impossible to extract useful information from such huge collections of data using conventional database analysis techniques. Data clustering plays an important role in such situations, where the inherent grouping structure of data has to be learnt in an unsupervised manner. The basic purpose of clustering is to partition a large dataset into smaller groups, known as clusters, using some similarity measure, such that the similarity between any two objects in the same cluster is greater than that between any two objects in two different clusters. Conventional hard clustering techniques cannot deal with situations involving non-probabilistic uncertainty. Hard clustering techniques are based on Crisp Set Theory, and therefore there is no possibility of partial belongingness of objects to multiple clusters. In other words, the clusters revealed by a hard clustering technique are disjoint, i.e. after the application of a hard clustering technique an object of a dataset either belongs totally to a particular cluster or does not belong to that cluster at all. The concept of partial belongingness was first introduced by Prof. Lotfi A. Zadeh of the University of California at Berkeley in 1965 in his famous Fuzzy Set Theory. According to Zadeh (1965) a fuzzy subset A of a universe of discourse X is defined by its membership function $\mu_A : X \to [0, 1]$; for any $x \in X$, the value $\mu_A(x)$ specifies the degree to which $x$ belongs to A. In Zadehian Fuzzy Set Theory it is believed that there is no difference between the membership value and the membership function for the complement of a fuzzy set. Baruah (2011a, 2011b) has rightly shown that the membership value of a fuzzy number can be expressed as a
difference between a membership function and a reference function, and that therefore the membership value and the membership function for the complement of a fuzzy set are not the same. With the advent of Fuzzy Set Theory, the conventional hard clustering techniques have unlocked a new way of clustering known as fuzzy clustering. Owing to the concept of degree of belongingness, in fuzzy clustering an object may belong exactly to one cluster or partially to more than one cluster, depending on its membership values. In the literature, among the different available fuzzy clustering techniques, the Fuzzy C-Means (FCM) clustering technique of Bezdek (1981) has been the most widely studied and applied. Derrig and Ostaszewski (1995) applied the FCM of Bezdek in their research work, where they explained a method of pattern recognition for risk and claim classification. Das and Baruah (2013a) have shown the application of Bezdek's (1981) FCM clustering technique to vehicular pollution, through which they discussed the importance of applying a fuzzy clustering technique, instead of a hard clustering technique, to a dataset describing vehicular pollution. Das and Baruah (2013b) applied the FCM clustering technique of Bezdek (1981) and the Gustafson-Kessel (GK) clustering technique of Gustafson and Kessel (1979) to the same dataset to compare the two techniques, and found that the overall performance of the FCM clustering technique was better than that of the GK clustering technique on the dataset they had used. Although in most situations the FCM clustering technique evidently performs better than other fuzzy clustering techniques, due to the random initialization of the membership values its performance varies significantly across different executions.

Yager and Filev (1992) proposed a simple and effective method, called the Mountain Method, for estimating the number and initial location of cluster centers. Although this technique, unlike the FCM clustering technique, did not depend on randomly initialized membership values to estimate the initial locations of cluster centers, the problem with mountain clustering was that its computation grew exponentially with the dimension of the problem. Chiu (1994) developed a new method, called Subtractive Clustering (SC), which solved this problem by using the data points themselves, instead of grid points, as the candidates for cluster centers. Das and Baruah (2014b) proposed a new method, named SUBFCM, in which they first applied the Subtractive clustering technique of Chiu (1994) to find the initial cluster centers and then used these initial cluster centers in the FCM clustering technique of Bezdek (1981) to obtain the final cluster centers along with the membership values of the objects in the different clusters. Das and Baruah (2014b) justified that the SUBFCM method not only removes the effect of randomness from the FCM of Bezdek (1981) but also properly deals with situations where the number of clusters is not predefined; in addition, the performance of SUBFCM is much higher than that of the Subtractive clustering technique of Chiu (1994). Yuan et al. (2004) proposed a systematic method for finding the initial centroids that leaves no scope for randomness, so the centroids obtained by this method are consistent. Das and Baruah (2014c) proposed a method in which they used the method of Yuan et al. (2004) to obtain the initial centroids and then used these initial centroids in the FCM of Bezdek (1981), both to remove the effect of random initialization from FCM and to improve its overall performance. Das and Baruah (2014c) justified that although the average performance level of their proposed method is higher than that of the FCM of Bezdek (1981), it is advisable
to optimize the performance level of their method with the best choice of the multiplication factor.

In the recent past, kernel methods (Carl (1999), Muller et al. (2001)) have earned popularity, especially in the machine learning community, and have been widely applied to pattern recognition and function approximation. A Kernel-based Fuzzy C-Means (KFCM) clustering technique is generally derived from the conventional FCM by using a kernel-induced distance function instead of the Euclidean distance. Hogo (2010) proposed a KFCM algorithm in which he used the Gaussian Radial Basis Function as the kernel-induced distance function. His work presented the use of FCM and KFCM to find learners' categories and to predict their behaviour, which could help the decision makers in an e-learning system. Das and Baruah (2014a) made a comparison of the FCM and KFCM clustering techniques, through which they showed that the performances of both FCM and KFCM vary significantly with randomly initialized membership values. They also found that the performance of KFCM depends on the value of the parameter σ of the Gaussian Radial Basis Function used in it as a distance function. Das and Baruah (2014a) justified that the best choice of σ in KFCM provides better performance than FCM; however, with a random value of σ the performance of KFCM may not always be better than that of FCM. Although KFCM is an attempt to improve the performance of FCM, the effect of randomness is still present in KFCM, and as a consequence its performance is found to be inconsistent. Our present work proposes a new method through which we try to remove this inconsistency of the KFCM clustering technique. KFCM also requires the number of clusters to be predetermined, which is likewise a prerequisite of FCM. In view of the inconsistent behaviour of KFCM due to the effect of randomness and its inability to deal with situations where the number of clusters is not predetermined, we have applied the Subtractive clustering technique of Chiu (1994) as a preprocessor to the KFCM clustering technique, with the intention of taking care of both of these limitations of KFCM. We also provide a comparison of our method with the Subtractive clustering technique of Chiu (1994) and the KFCM clustering technique of Hogo (2010) using two validity measures, namely Partition Coefficient (PC) and Clustering Entropy (CE) (Bezdek (1981) and Bensaid et al. (1996)).

In section 2 we define the problem of our present work. The mathematical calculations and algorithms used in our present work are placed in section 3. We provide the results and analysis in section 4. Finally, we draw the conclusions in section 5.

2. Problem Definition
Although the KFCM clustering technique has earned popularity, the inconsistency due to randomness has been found to be a major drawback of it. This effect of randomness needs to be removed from KFCM to achieve consistent results. Another limitation of KFCM is its inability to deal with situations where the number of clusters is not predetermined. Chiu (1994) used data points as the candidates for cluster centers in his Subtractive clustering technique to find the initial cluster centers. Due to the absence of randomness, the initial cluster centers obtained by this method of Chiu (1994) do not vary across different executions. Moreover, the Subtractive clustering technique of Chiu (1994) can deal with situations where the number of clusters is not predetermined. Therefore we propose a new method in which we apply the Subtractive clustering technique of Chiu (1994) as a preprocessor to KFCM, with the intention of achieving two goals: to remove the effect of randomness from KFCM and to deal with situations
where the number of clusters is not predetermined.
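Since Chiu's method is central to the proposal, a minimal Python sketch of subtractive clustering may help fix ideas. It follows the standard formulation of Chiu (1994): each data point receives a potential from its neighbours, the highest-potential point becomes a centre, and that centre's influence is subtracted before the next selection. The radius values and the simplified stopping rule are illustrative assumptions, not parameters taken from this paper.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, rb_factor=1.5, accept_ratio=0.15):
    """Sketch of Chiu's subtractive clustering on data X (n x p),
    assumed normalized into a unit hypercube.
    ra, rb_factor and accept_ratio are illustrative defaults."""
    alpha = 4.0 / ra ** 2                   # sharpness of the potential
    beta = 4.0 / (rb_factor * ra) ** 2      # sharpness of the subtraction

    # Potential of each point: sum of Gaussian contributions from all points.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    potential = np.exp(-alpha * sq_dists).sum(axis=1)

    centers = []
    first_peak = None
    while True:
        best = int(np.argmax(potential))
        if first_peak is None:
            first_peak = potential[best]
        # Simplified stopping rule: stop once the remaining peak potential
        # drops below a fraction of the first peak (Chiu's paper uses a
        # more elaborate accept/reject criterion).
        if potential[best] < accept_ratio * first_peak:
            break
        centers.append(X[best].copy())
        # Subtract the new centre's influence from every point's potential.
        potential = potential - potential[best] * np.exp(-beta * sq_dists[best])
    return np.array(centers)
```

Because no membership values are randomized, repeated runs on the same data return the same centres, which is exactly the property exploited in our method.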
3. Our present work
Before discussing our proposed algorithm we shall first discuss the algorithms which have been used, directly or indirectly, in our present work. In section 3.1 we describe the FCM of Bezdek (1981), followed by the KFCM of Hogo (2010) and the Subtractive clustering technique of Chiu (1994) in sections 3.2 and 3.3 respectively. Finally, we illustrate our algorithm in section 3.4.

3.1. Bezdek's FCM Algorithm
Step 1: Choose the number of clusters c, 2 ≤ c < n, and the fuzziness exponent m > 1. The distance between the kth feature vector $x_k$ and the centre $v_i$ of the ith cluster is

$$\|x_k - v_i\|^2 = \sum_{j=1}^{p} (x_{kj} - v_{ij})^2 \qquad (1)$$

where $x_{kj}$ is the jth feature of the kth feature vector, for k = 1, 2, …, n; j = 1, 2, …, p; and $v_{ij}$ is the jth component of the centre of the ith cluster, for i = 1, 2, …, c. Here n, p and c denote the total number of feature vectors, the number of features in each feature vector and the total number of clusters respectively. Choose the initial fuzzy partition (by putting some random values)

$$U^{(0)} = [s_i^{(0)}(x_k)]_{1 \le i \le c,\, 1 \le k \le n}.$$

Choose a parameter $\varepsilon > 0$ (this will tell us when to stop the iteration) and set the iteration counter l equal to 0.

Step 2: Calculate the fuzzy cluster centroids $\{v_i^{(l)}\}_{i=1,2,\ldots,c}$ by the following formula:

$$v_i^{(l)} = \frac{\sum_{k=1}^{n} \left(s_i^{(l)}(x_k)\right)^m x_k}{\sum_{k=1}^{n} \left(s_i^{(l)}(x_k)\right)^m} \qquad (2)$$

for i = 1, 2, …, c.

Step 3: Calculate the new partition matrix (i.e. membership matrix) $U^{(l+1)} = [s_i^{(l+1)}(x_k)]_{1 \le i \le c,\, 1 \le k \le n}$, where

$$s_i^{(l+1)}(x_k) = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{\|x_k - v_i^{(l)}\|}{\|x_k - v_j^{(l)}\|} \right)^{\frac{2}{m-1}}} \qquad (3)$$

for i = 1, 2, …, c and k = 1, 2, …, n. If $x_k = v_i^{(l)}$, formula (3) cannot be used. In this case the membership function is

$$s_j^{(l+1)}(x_k) = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{if } j \neq i \end{cases}, \qquad j = 1, 2, \ldots, c.$$

Step 4: If $\|U^{(l+1)} - U^{(l)}\| < \varepsilon$, stop; otherwise set l = l + 1 and return to Step 2.
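For concreteness, the following Python sketch implements Steps 1 to 4 using equations (1), (2) and (3); the variable names are ours, and the random initialization of Step 1 is exactly what the preprocessing of section 3.4 later removes.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=None):
    """Sketch of Bezdek's FCM on data X (n x p) with c clusters.
    Alternates equations (2) and (3) until the partition matrix
    changes by less than eps."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: random initial fuzzy partition; columns sum to 1 over clusters.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        # Step 2, equation (2): membership-weighted centroids.
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Step 3, equation (3): squared Euclidean distances (equation (1)),
        # then new memberships; the small floor guards the x_k == v_i case.
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # shape (c, n)
        t = np.fmax(d2, 1e-12) ** (-1.0 / (m - 1.0))
        U_new = t / t.sum(axis=0, keepdims=True)
        # Step 4: stop when successive partition matrices are close.
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return V, U
```

Running this sketch twice with different seeds generally yields different partitions, which illustrates the inconsistency discussed throughout the paper.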
3.4. Our proposed algorithm (SUBKFCM)
Step 1: Read the normalized dataset.
Step 2: Find the initial centroids by the SC algorithm of Chiu.
Step 3: Take these centroids as the input to KFCM.
Step 4: Calculate the membership values of the objects by KFCM.
Step 5: Calculate the change between successive partition matrices.
Step 6: Update the centroids by KFCM (see equation (6) of section 3.2).
Step 7: Update the membership values by KFCM (see equation (7) of section 3.2).
Step 8: If the change computed in step 5 exceeds ε, repeat steps 6, 7 and 5; else stop the iteration.

The pictorial representation of the above algorithm is provided in Figure 1.

We have applied each of the KFCM clustering technique, the Subtractive clustering technique and our method (SUBKFCM) ten (10) times on the same dataset (see Table 1) and tried to make a comparison of the performances of these three clustering techniques. As a requirement of the Subtractive clustering technique we have normalized the dataset (see Table 1) so that the data points are bounded by a hypercube, and we have applied all three clustering techniques to the normalized dataset only, to maintain uniformity. We have used two validity measures, namely Partition Coefficient (PC) and Clustering Entropy (CE) (Bezdek (1981) and Bensaid et al. (1996)), and also the number of iterations, to compare the performances of these three clustering techniques. The mathematical formulae of these two validity measures are given in the following.

Partition Coefficient (PC): measures the overlapping between clusters:

$$PC(c) = \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^2 \qquad (10)$$

Clustering Entropy (CE): measures the fuzziness of the cluster partition:

$$CE(c) = -\frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} \log(\mu_{ij}) \qquad (11)$$
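Equations (10) and (11) translate directly into code; in this sketch U is the c × n membership matrix produced by any of the three algorithms, and the small floor inside the logarithm is our own guard against memberships that are exactly zero.

```python
import numpy as np

def partition_coefficient(U):
    """PC, equation (10): higher values indicate a crisper partition."""
    n = U.shape[1]
    return float((U ** 2).sum() / n)

def clustering_entropy(U, floor=1e-12):
    """CE, equation (11): lower values indicate a less fuzzy partition."""
    n = U.shape[1]
    return float(-(U * np.log(np.fmax(U, floor))).sum() / n)
```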
Figure 1: The flowchart of our proposed algorithm. [Flowchart: Start → read the normalized dataset → find the initial centroids by the SC algorithm of Chiu → take these centroids as the input to KFCM and calculate the membership values of the objects by KFCM → calculate the stopping criterion → while the criterion exceeds the threshold, update the centroids and the membership values by KFCM → Stop.]
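The loop of Figure 1 can be sketched as below, reusing the subtractive_clustering and validity-measure sketches above. Since equations (6) and (7) of section 3.2 are only referenced here, the update rules in this sketch follow the widely used Gaussian-kernel KFCM formulation (kernel-weighted memberships and centroids); they should be read as a stand-in for the paper's exact equations, and the values of σ and ε are illustrative.

```python
import numpy as np

def gaussian_kernel(X, V, sigma):
    """K(x_k, v_i) = exp(-||x_k - v_i||^2 / sigma^2), shape (c, n)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def subkfcm(X, m=2.0, sigma=0.5, eps=1e-5, max_iter=100):
    """Sketch of the SUBKFCM pipeline of Figure 1: deterministic initial
    centroids from subtractive clustering, then kernelized FCM updates.
    X is assumed min-max normalized into the unit hypercube."""
    V = subtractive_clustering(X)          # steps 1-2: no randomness
    c, n = V.shape[0], X.shape[0]
    U = np.full((c, n), 1.0 / c)           # placeholder partition

    for _ in range(max_iter):
        # Membership update (standard Gaussian-kernel KFCM form):
        # u_ik is proportional to (1 - K(x_k, v_i))^(-1/(m-1)).
        K = gaussian_kernel(X, V, sigma)
        t = np.fmax(1.0 - K, 1e-12) ** (-1.0 / (m - 1.0))
        U_new = t / t.sum(axis=0, keepdims=True)
        # Centroid update: kernel-weighted mean of the data points.
        W = (U_new ** m) * K
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # Steps 5 and 8: stop once the partition change falls below eps.
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return V, U
```

Because the initial centroids are deterministic, every run of this pipeline on the same normalized data produces the same final partition, which is the consistency property reported in section 4.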
Table 1: Dataset of individual differences of fifty (50) feature vectors with dimension (features) three (03).

FV   IQ   AM   SA  |  FV   IQ   AM   SA
 1   91   18   55  |  26  110   18   55
 2   85   16   40  |  27  100   16   40
 3  120   19   74  |  28  100   18   75
 4   90   18   75  |  29   70   14   30
 5   92   17   74  |  30  105   17   55
 6   82   17   55  |  31   79   14   35
 7   95   19   75  |  32   80   15   34
 8   89   18   74  |  33  125   20   75
 9   96   19   75  |  34  100   19   75
10   90   17   55  |  35  125   19   85
11   97   16   54  |  36   80   18   60
12  125   21   74  |  37   85   18   70
13  100   19   75  |  38  145   25   90
14   90   17   54  |  39   80   18   74
15  100   18   84  |  40   92   17   55
16   95   19   75  |  41  120   18   70
17  130   23   85  |  42  145   30   80
18  130   19   75  |  43   95   18   50
19   90   17   55  |  44   80   16   36
20   91   17   56  |  45   90   17   55
21  140   22   82  |  46  115   23   84
22   92   18   75  |  47  100   18   80
23  101   18   55  |  48   80   14   35
24   85   16   54  |  49  105   19   75
25   97   19   54  |  50  120   21   74
FV: Feature Vector, IQ: Intelligence Quotient, AM: Achievement Motivation, SA: Social Adjustment.

4. Results and analysis
In this section we provide the results and analysis of our present work. First we applied the Subtractive clustering algorithm of Chiu to a dataset (see Table 1) which had been normalized so that it is bounded by a hypercube. This algorithm partitioned the given dataset into five (05) clusters. The values of the different validity measures of the clusters revealed by this algorithm in ten (10) different executions are given in Table 2. Next we applied the KFCM algorithm to the same normalized dataset, predetermining five (05) clusters. We applied this algorithm for ten (10) different random initializations of the membership values of the objects in the dataset. For each random initialization we used three (03) different values of the adjustable parameter σ of the Gaussian Radial Basis Function, to find the optimized result out of the three. In Table 3
we provide the optimized values of the different validity measures for the best choice among three (03) different values of σ of the KFCM algorithm, obtained in ten (10) different executions. Finally we applied our algorithm, i.e. SUBKFCM, to the same normalized dataset. We obtained the optimized results for the best choice of σ and performed the same experiment ten (10) times. The optimized values of the different validity measures for the best choice of σ of the SUBKFCM algorithm, obtained in ten (10) different executions, are given in Table 4. Based on the values of the different validity measures (see Tables 2, 3 and 4) obtained by these three different algorithms we have tried to make a comparison of these three algorithms.
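The "best choice among three (03) different values of σ" can be expressed as a small grid search scored by a validity measure; the candidate values below are illustrative, since the three values actually used are not listed in the paper, and the sketch reuses subkfcm and partition_coefficient from above.

```python
def best_sigma(X, candidates=(0.3, 0.5, 0.7)):
    """Run SUBKFCM once per candidate sigma and keep the run with the
    highest Partition Coefficient (candidate values are illustrative)."""
    best = None
    for sigma in candidates:
        V, U = subkfcm(X, sigma=sigma)
        pc = partition_coefficient(U)
        if best is None or pc > best[0]:
            best = (pc, sigma, V, U)
    return best  # (PC, sigma, centroids, memberships)
```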
Table 2: Values of different validity measures of the SC algorithm obtained in ten (10) different executions.

SUBTRACTIVE ALGORITHM
        Itn   PC     CE
RUN1     0    0.409  1.097
RUN2     0    0.409  1.097
RUN3     0    0.409  1.097
RUN4     0    0.409  1.097
RUN5     0    0.409  1.097
RUN6     0    0.409  1.097
RUN7     0    0.409  1.097
RUN8     0    0.409  1.097
RUN9     0    0.409  1.097
RUN10    0    0.409  1.097

Table 3: Optimized values of different validity measures for the best choice among three (03) different values of σ of the KFCM algorithm obtained in ten (10) different executions.

KFCM ALGORITHM
        Itn   PC     CE
RUN1    12    0.725  0.578
RUN2     9    0.726  0.579
RUN3    11    0.724  0.584
RUN4    11    0.726  0.578
RUN5    10    0.682  0.659
RUN6    11    0.725  0.580
RUN7     8    0.683  0.657
RUN8     8    0.683  0.659
RUN9    12    0.726  0.579
RUN10    9    0.682  0.658

Table 4: Optimized values of different validity measures for three (03) different values of σ of the SUBKFCM algorithm obtained in ten (10) different executions.

SUBKFCM ALGORITHM
        Itn   PC     CE
RUN1     5    0.654  0.702
RUN2     5    0.654  0.702
RUN3     5    0.654  0.702
RUN4     5    0.654  0.702
RUN5     5    0.654  0.702
RUN6     5    0.654  0.702
RUN7     5    0.654  0.702
RUN8     5    0.654  0.702
RUN9     5    0.654  0.702
RUN10    5    0.654  0.702

In Figure 2 we see that there is significant variation in the validity measure PC obtained by the KFCM algorithm in ten (10) different executions. This is obviously due to the fact that the membership values of the objects had been initialized randomly to find the initial cluster centers. On the other hand there is no variation in the validity measure PC obtained by the SC and SUBKFCM algorithms in the ten (10) different executions.
Figure 2: Comparison of KFCM, SC and SUBKFCM based on the validity measure PC in ten (10) different executions.

This consistent performance of the SC and SUBKFCM algorithms has become possible only because of the absence of randomness in both algorithms. Moreover, in Figure 2 the line of performance of the SUBKFCM algorithm is seen to lie far above that of SC (higher values of PC), although in most of the cases it is slightly below that of KFCM. Figure 3 reveals that the validity measure CE of KFCM varies significantly in ten (10) different executions, whereas no such variation is present in the case of SC and SUBKFCM. We also see here that SUBKFCM exhibits much better performance (lower values of CE) than
SC, although in most of the cases it performs slightly worse than KFCM. In Figure 4 we see that KFCM needs different numbers of iterations to reach the final clusters in its ten (10) different executions. However, SUBKFCM needs only five iterations to accomplish the same in all ten (10) executions, and SC is a non-iterative algorithm. The above results reveal that there is significant variation in the performance of the KFCM algorithm across its different executions, but no such variation has been seen in the case of SC and SUBKFCM.
Figure 3: Comparison of KFCM, SC and SUBKFCM based on the validity measure CE in ten (10) different executions.
Figure 4: Comparison of KFCM, SC and SUBKFCM based on the number of iterations in ten (10) different executions.
5. Conclusions
Although KFCM exhibits high performance, it has two major limitations: the inconsistency caused by the randomly initialized membership values of the objects in the dataset, and the inability to deal with situations where the number of clusters is not predetermined. These two limitations of KFCM can be overcome by the SC algorithm of Chiu, which relies on the data points themselves to find the initial cluster centers. The problem with the SC algorithm, however, is that its performance is very low. In view of the limitations of the KFCM algorithm and the poor performance of SC, we have introduced our algorithm, namely SUBKFCM, which is capable of handling the limitations of KFCM in a much better way than the SC algorithm.

References
1. Baruah, H. K. (2011a), The Theory of Fuzzy Sets: Beliefs and Realities, International Journal of Energy, Information and Communications, vol. 2, no. 2, 1-22.
2. Baruah, H. K. (2011b), Towards Forming a Field of Fuzzy Sets, International Journal of Energy, Information and Communications, vol. 2, no. 1, 16-20.
3. Bensaid, A. M., L. O. Hall and J. C. Bezdek (1996), Validity-guided (re)clustering with applications to image segmentation, IEEE Transactions on Fuzzy Systems, vol. 4, no. 2, 112-123.
4. Bezdek, J. C. (1981), Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York.
5. Carl, G. A. (1999), Fuzzy clustering and fuzzy merging algorithm, Tech. Rep. CS-UNR-101.
6. Chiu, S. L. (1994), Fuzzy Model Identification Based on Cluster Estimation, Journal of Intelligent and Fuzzy Systems, vol. 2, 267-278.
7. Das, S. and H. K. Baruah (2013a), Application of Fuzzy C-Means Clustering Technique in Vehicular Pollution, Journal of Process Management – New Technologies, vol. 1, no. 3, 96-107.
8. Das, S. and H. K. Baruah (2013b), A Comparison of Two Fuzzy Clustering Techniques, Journal of Process Management – New Technologies, vol. 1, no. 4, 1-15.
9. Das, S. and H. K. Baruah (2014a), Dependence of Two Different Fuzzy Clustering Techniques on Random Initialization and a Comparison, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 4, no. 1, 422-428.
10. Das, S. and H. K. Baruah (2014b), An Approach to Remove the Effect of Random Initialization from Fuzzy C-Means Clustering Technique, Journal of Process Management – New Technologies, vol. 2, no. 1, 23-30.
11. Das, S. and H. K. Baruah (2014c), A New Method to Remove Dependence of Fuzzy C-Means Clustering Technique on Random Initialization, International Journal of Research in Advent Technology, vol. 2, no. 1, 322-330.
12. Derrig, R. A. and K. M. Ostaszewski (1995), Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification, Journal of Risk and Insurance, vol. 62, no. 3, 447-482.
13. Gustafson, D. E. and W. C. Kessel (1979), Fuzzy clustering with a fuzzy covariance matrix, Proc. IEEE CDC, San Diego, CA, USA, 761-766.
14. Hogo, M. A. (2010), Evaluation of E-Learners Behaviour using Different Fuzzy Clustering Models: A Comparative Study, International Journal of Computer Science and Information Security, vol. 7, no. 2, 131-140.
15. Muller, K. R., S. Mika, G. Ratsch, K. Tsuda and B. Scholkopf (2001), An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks, vol. 12, no. 2, 181-202.
16. Yager, R. R. and D. P. Filev (1992), Approximate Clustering Via the Mountain Method, Tech. Report #MII-1305, Machine Intelligence Institute, Iona College, New Rochelle, NY.
17. Yuan, F., Z. H. Meng, H. X. Zhang and C. R. Dong (2004), A new algorithm to get the initial centroids, Proc. of the 3rd International Conference on Machine Learning and Cybernetics, 26-29.
18. Zadeh, L. A. (1965), Fuzzy Sets, Information and Control, vol. 8, no. 3, 338-353.