Song, Y-C, Meng, H-D, O'Grady, M.J., O'Hare, G.M.P. Applications of Attributes Weighting in Data Mining. Proceedings of the IEEE SMC UK&RI 6th Conference on Cybernetic Systems, Dublin, Ireland, September 2007. 41-45. ISSN 1744-9170.

Applications of Attributes Weighting in Data Mining

Yu-Chen Song and Hai-Dong Meng
Institute of Information Management
Inner Mongolia University of Science & Technology
Baotou, China
[email protected], [email protected]

M.J. O'Grady and G.M.P. O'Hare
School of Computer Science & Informatics
University College Dublin, Belfield, Dublin 4, Ireland
{michael.j.ogrady,gregory.ohare}@ucd.ie

Abstract— In this paper, we present some methods of attribute weighting (or feature weighting), and then consider in more detail one particular kind of attribute weighting: distance weighting. Distance weighting can be used to eliminate the correlation among attributes in certain data sets, and it overcomes the problem that the similarity among objects cannot be fully reflected by distance alone. It weights each attribute according to its importance to the data, reflecting the different role each attribute plays during the clustering process. In this way, it reflects the characteristics of the data distribution while improving the effectiveness of the clustering process.

I. INTRODUCTION

Data pre-processing is important for successful data mining, as it makes the data more amenable to the mining process. Often, the raw data must be processed in order to make it suitable for analysis. While one objective may be to improve data quality, other goals focus on modifying the data so that it better fits a specified data mining technique.

Attribute weighting (feature weighting) is one such pre-processing method, and it is an alternative to keeping or eliminating features in applications of data mining techniques such as classification and clustering. More important features are assigned a higher weight, while less important features are given a lower weight. These weights are sometimes assigned based on domain knowledge about the relative importance of features; alternatively, they may be determined automatically. For example, distance weighting is based on a weighted Euclidean distance, where the inverse of the coefficient of multiple correlation is chosen as the weight.

The K-means algorithm [11] is an effective algorithm for clustering data with numerical attributes. It treats the contribution of each attribute of the samples as equal, and does not consider the different effects that different attributes may have on the clustering result. This paper discusses an attribute-weights-based K-means algorithm which takes these different effects into account. The algorithm uses the inverse of the coefficient of multiple correlation as the weight of each attribute, reflecting the contribution of each attribute to the clustering result; this increases the accuracy of the clustering results while simultaneously increasing the efficiency of the algorithm by decreasing the number of iterations.


II. BACKGROUND

Numerous attribute weight setting methods for data mining have been proposed in the literature. Some frequently referenced methods are now summarised.

In [1], the authors address the problem of combining multiple weighted clusters which belong to different subspaces of the input space. They leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since they deal with weighted clusters, their consensus function makes use of the weight vectors associated with the clusters. The experimental results show that their ensemble technique is capable of producing a partition that is as good as or better than the best individual clustering.

In [9], the authors handle clustering as a constrained minimization of a Bregman divergence. Weight modifications rely on the local variations of the expected complete log-likelihoods, and theoretical results yield modified (weighted) versions of clustering algorithms.

In [7], a framework for integrating multiple, heterogeneous feature spaces into the k-means clustering algorithm is presented. The methodology adaptively selects, in an unsupervised fashion, the relative weights assigned to the various feature spaces, with the objective of simultaneously attaining good separation along all of them. Using precision/recall evaluations, the authors empirically demonstrate that optimal feature weighting is extremely effective in locating the best clustering when compared against known ground-truth classifications.

In [14] and other papers on Support Vector Machines (SVM), feature weights are determined automatically. This technique has its roots in statistical learning theory and has shown promising categorization possibilities. SVM produces classification models in which each feature is given a weight. More information concerning feature weights in SVM may be obtained from [15], [10], [5], [3].

Documents are often represented as vectors, where each attribute represents the frequency with which a particular term (word) occurs in the document. Cosine similarity is a measure of the cosine of the angle between two document vectors. Thus, if the cosine similarity is 1, the angle between the two document vectors is 0° and the two vectors are the same except for magnitude (length); if the cosine similarity is 0, the angle between the two document vectors is 90° and they share no terms (words). In this way, cosine similarity can be considered a kind of feature weighting. More details about cosine similarity can be found in [13].

A significant number of other attribute weight setting methods can be found in the literature; the interested reader is referred to [6], [16], [12], [4], [8].
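As a concrete illustration of the cosine measure described above, the following sketch (an illustration, not code from the paper) computes the cosine similarity of two hypothetical term-frequency vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two document vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical term-frequency vectors over a four-word vocabulary.
doc1 = np.array([3.0, 0.0, 2.0, 1.0])
doc2 = np.array([1.0, 1.0, 0.0, 2.0])

print(cosine_similarity(doc1, 2 * doc1))  # 1.0: same direction, different length
print(cosine_similarity(doc1, doc2))      # in (0, 1] for vectors sharing terms
```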

III. WEIGHTS OF THE OBJECTS' ATTRIBUTES

A. Problem analysis

The UCI [2] Iris data set was analysed, which consists of 150 records sampled from three types of Iris flowers. There are 50 records of each type, and each record has four attributes: sepal length, sepal width, petal length and petal width. In order to present the data visually in two-dimensional graphs, the attributes were divided into pairs for analysis: sepal length with sepal width, and petal length with petal width. Figures 1 and 2 show the scatter plots of these two attribute pairs.

Fig. 1. Plot of sepal attributes.
Fig. 2. Plot of petal attributes.

Observing Figures 1 and 2, the data in one cluster (represented by "*") is distinctly separated from the other clusters; thus the two clusters which are not well separated should be used for the experiments. After removing the cluster represented by "*", the other two clusters were kept for discussion, as shown in Figures 3 and 4.

Fig. 3. Plot of sepal attributes.
Fig. 4. Plot of petal attributes.

Figure 3 shows that the two clusters (represented by the symbols "." and "+") are mixed when the sepal attributes are used for clustering. Figure 4 shows that the same two clusters are clearly defined when the petal attributes are used. It can be seen from Figures 3 and 4 that the petal attributes can be used to improve the clustering of the two types. Therefore, if clustering is applied directly to the data set, treating the four attributes equally, the petal attributes cannot be taken advantage of to increase the quality of the clustering result; instead, the sepal attributes interfere with the petal attributes and affect the accuracy of the final result.
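The attribute-pair plots of Figures 1-4 can be reproduced along the following lines; this sketch assumes scikit-learn's copy of the UCI Iris data and matplotlib, not the authors' original plotting code:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # columns: sepal length/width, petal length/width

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for label, marker in [(0, '*'), (1, '.'), (2, '+')]:
    ax1.scatter(X[y == label, 0], X[y == label, 1], marker=marker)
    ax2.scatter(X[y == label, 2], X[y == label, 3], marker=marker)
ax1.set(xlabel='sepal length', ylabel='sepal width')  # cf. Figures 1 and 3
ax2.set(xlabel='petal length', ylabel='petal width')  # cf. Figures 2 and 4
plt.show()
```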

B. The Choice of Weighted Euclidean Distance

To address the above problem, clustering based on a weighted Euclidean distance was adopted. This method weights each attribute according to the role it plays during the clustering process, so that it makes full use of the characteristics of the data distribution and increases the accuracy of the clustering result. As can be seen from Figures 1 and 2 and from Figures 3 and 4, the importance of attributes to clustering rests mainly on the following heuristic: the better separated the attribute subsets used to describe the data set, the more concentrated the same type of data will be when clustered, and the more dispersed the different types of data will be in the resulting plots (that is, the data points will be better distributed and there will be a larger distance between the clusters). In order to reflect this degree of dispersion, the inverse of the coefficient of multiple correlation was selected as the weight, after comparing different weighting methods.

1) Definitions and Properties of the Coefficient of Multiple Correlation: The coefficient of multiple correlation describes the combined effects and correlation of the attributes: it measures the degree of correlation between one attribute and several others, and it can be calculated from the coefficients of single correlation and partial correlation. Let attribute $y$ be a function of the variables $x_1, x_2, \cdots, x_k$; then the coefficient of multiple correlation between $y$ and $x_1, x_2, \cdots, x_k$ is $R_{y,12\cdots k}$:

$$R_{y,12\cdots k} = \sqrt{1 - (1 - r_{y1}^2)(1 - r_{y2.1}^2)\cdots\left[1 - r_{yk.12\cdots(k-1)}^2\right]} \qquad (1)$$

in which $r_{y1}, r_{y2.1}, \cdots, r_{yk.12\cdots(k-1)}$ are coefficients of single correlation and partial correlation.

The properties of the coefficient of multiple correlation are as follows:
1) The coefficient of multiple correlation lies between 0 and 1 inclusive.
2) The greater the coefficient of multiple correlation, the closer the correlation of the attributes: a coefficient of 1 means absolute correlation, and a coefficient of 0 means no correlation at all.
3) The coefficient of multiple correlation is greater than or equal to the absolute value of the coefficient of single correlation.

The importance of each attribute during the clustering process can be derived from these properties.

2) The Calculation of Weights: The inverse-of-multiple-correlation weighting method, modelled on the inverse-variance weighting method, was adopted. For a selected attribute $x_i$, consider the coefficient of multiple correlation between it and all the other attributes, $\rho_{x_1,x_2,\cdots,x_k}$, written $\rho_i$ for short: it reflects the ability of the attributes other than $x_i$ to replace $x_i$. When $\rho_i = 1$, $x_i$ can be removed or its weight decreased; when $\rho_i$ is very small, the other attributes cannot replace it and its weight should be increased. Thus $|\rho_i|^{-1}$ can be used to calculate the weight $w_i$:

$$w_i = \frac{|\rho_i|^{-1}}{\sum_{j=1}^{k}|\rho_j|^{-1}}, \qquad i = 1, 2, \cdots, k \qquad (2)$$

Since $|\rho_i|$ is the absolute value of a coefficient of multiple correlation, the weights satisfy $0 < w_i \le 1$. The weighted Euclidean distance is then:

$$d(x_i, x_j) = \left[\sum_{k=1}^{p} w_k\,|x_{ik} - x_{jk}|^2\right]^{\frac{1}{2}} \qquad (3)$$

in which $w_k$ $(k = 1, 2, \cdots, p)$ represents the weight of each variable.
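A minimal sketch of this weighting scheme follows. It is an illustration under stated assumptions rather than the authors' code: every $\rho_i$ is assumed to lie strictly between 0 and 1, and the coefficient of multiple correlation of a column is obtained as the square root of $R^2$ from a least-squares regression of that column on the remaining columns, an equivalent route to Eq. (1).

```python
import numpy as np

def multiple_correlation(X, i):
    """Coefficient of multiple correlation rho_i between column i and the
    remaining columns, computed as sqrt(R^2) of a least-squares regression
    (equivalent to assembling Eq. (1) from single/partial correlations)."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([others, np.ones(len(X))])  # regressors plus intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1.0 - ((y - A @ coef) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return np.sqrt(max(r2, 0.0))

def attribute_weights(X):
    """Normalised weights w_i = |rho_i|^-1 / sum_j |rho_j|^-1, as in Eq. (2).
    Assumes every rho_i is nonzero."""
    rho = np.array([multiple_correlation(X, i) for i in range(X.shape[1])])
    inv = 1.0 / np.abs(rho)
    return inv / inv.sum()

def weighted_distance(x, y, w):
    """Weighted Euclidean distance of Eq. (3)."""
    return np.sqrt(np.sum(w * (x - y) ** 2))
```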

C. Weighted Attributes based K-means Algorithm

Assume $x = \{x_1, x_2, \cdots, x_n\}$ is an array of data elements to be clustered into $k$ clusters, in which $x_i = [x_{i1}, x_{i2}, \cdots, x_{im}]$ represents a data object with $m$ attributes. The input data are weighted according to the contribution of each attribute to the clustering analysis. Assume the attribute weights are $w_1, w_2, \cdots, w_m$ with $w_j \ge 0$, $j = 1, 2, \cdots, m$; the weighted data objects are then $x_i' = w \cdot x_i$, $i = 1, 2, \cdots, n$, where

$$w = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_m \end{bmatrix} \qquad (4)$$

The weighted Euclidean distance between two data objects is:

$$d(x_i, x_j) = \left[\sum_{k=1}^{m} w_k\,|x_{ik} - x_{jk}|^2\right]^{\frac{1}{2}}, \qquad i, j = 1, 2, \cdots, n \qquad (5)$$

When performing clustering analysis, the elements of the weighting matrix are required to be greater than or equal to zero. The calculation proceeds as follows (a sketch is given after this list):
1) Weighting matrix: calculate the weight $w_i$ of each of the sample's attributes and construct the weighting matrix;
2) Weight the attributes of the data set: $x_i' = w \cdot x_i$, $i = 1, 2, \cdots, n$;
3) Apply the K-means algorithm;
4) Calculate the variance function: if the function meets the requirements, the clustering result is optimized; otherwise, continue iterating from step 3). When the algorithm terminates, the clustering result is optimized.
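These four steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it reuses attribute_weights() from the sketch above together with scikit-learn's KMeans, and it scales each attribute by the square root of $w_k$ so that ordinary Euclidean K-means reproduces the weighted distance of Eq. (5) (step 2 in the text multiplies by $w_k$ directly, which changes only how the weights enter the squared distance).

```python
import numpy as np
from sklearn.cluster import KMeans

def weighted_kmeans(X, n_clusters, weights, random_state=0):
    """Weighted attributes based K-means, steps 1-4.

    Scaling each column by sqrt(w_k) makes ordinary K-means minimise the
    weighted distance of Eq. (5); KMeans iterates until its variance (SSE)
    criterion stops improving, which plays the role of step 4.
    """
    Xw = X * np.sqrt(weights)                    # step 2: weight the attributes
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(Xw)                  # steps 3-4
    return labels, km.inertia_

# Hypothetical usage on the Iris data, with weights from attribute_weights():
# from sklearn.datasets import load_iris
# X = load_iris().data
# labels, sse = weighted_kmeans(X, n_clusters=3, weights=attribute_weights(X))
```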

D. Validating the Weighting Algorithm by Application

In order to test the performance of the weighted attributes based K-means algorithm, the results of the original K-means algorithm and of the weighted attributes based K-means algorithm were compared: a data set with 25 data points was clustered in the first experiment, and a data set with 150 data points was clustered in the second.

Experiment 1: Cluster the 25 data objects into 2 clusters (Figures 5 and 6).

Fig. 5. Clustering without weighted attributes.
Fig. 6. Clustering with weighted attributes.

Figure 5 shows that four objects belonging to Cluster 2 are assigned to Cluster 1. Figure 6 shows that all objects are correctly assigned to Cluster 1 and Cluster 2 respectively.

Experiment 2: Cluster the 150 data objects into 2 clusters (Figures 7 and 8).

Fig. 7. Clustering without weighted attributes.
Fig. 8. Clustering with weighted attributes.

Figure 7 shows that some objects belonging to Cluster 2 are assigned to Cluster 1. Figure 8 shows that the five objects in the middle are correctly assigned to Cluster 2, although the other six objects are wrongly assigned to Cluster 1. It can be seen from the above experiments that the clustering result of the weighted attributes based K-means algorithm basically has the same cluster structure as the data itself, and has a better performance than the original K-means algorithm, which does not use weighted attributes. The clustering experiments show that the improved K-means algorithm produces the correct clustering result more easily.
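A comparison in the spirit of these experiments can be sketched as below. The 25-point and 150-point data sets are not reproduced in the paper, so Iris is used as a stand-in, and the adjusted Rand index is an assumed quality measure (the paper itself reports results visually); attribute_weights() is reused from the earlier sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

# Original K-means: all four attributes contribute equally.
plain = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Weighted attributes based K-means, via sqrt-weight column scaling.
w = attribute_weights(X)
weighted = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X * np.sqrt(w))

print('plain K-means ARI:   ', adjusted_rand_score(y, plain))
print('weighted K-means ARI:', adjusted_rand_score(y, weighted))
```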

IV. CONCLUSION AND FUTURE WORK

In many real-world applications, such as geochemistry, geophysics, and wireless sensor networks, there are correlations among the attributes of the raw data sets. As a result, the clustering results are poor if the raw data sets are used directly by some clustering algorithms; pre-processing the raw data sets can improve the clustering results, and finding a good way to weight attributes is therefore important in many real-world applications. Attributes weighting is an effective and suitable approach for pre-processing raw data sets: it reflects the different roles of the attributes and the different characteristics of the distribution, improving the effectiveness of clustering and optimizing the clustering results in real-world applications.

ACKNOWLEDGMENT

This research is funded in part by the National Funded Project of China (No. 06XTQ011) and Science Foundation Ireland (SFI) under Grant No. 03/IN.3/I361, and by the China Scholarship Council.

REFERENCES

[1] M. Al-Razgan and C. Domeniconi. Weighted Clustering Ensembles. www.siam.org/meetings/sdm06/proceedings/024alrazganm.pdf.
[2] C.L. Blake, D.J. Newman, and C.J. Merz. UCI Repository of Machine Learning Databases. 1998.
[3] C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
[4] S. Cost and S. Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10(1):57–78, January 1993.
[5] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.
[6] J.D. Kelly and L. Davis. A hybrid genetic algorithm for classification. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, 1991.
[7] D.S. Modha and W.S. Spangler. Feature weighting in k-means clustering. Machine Learning, 52(3), 2003.
[8] M. Mohri and H. Tanaka. An optimal weighting criterion of case indexing for both numeric and symbolic attributes. In D. Aha, editor, Case-Based Reasoning: Papers from the 1994 Workshop. AAAI Press, Menlo Park, CA, 1994.
[9] R. Nock and F. Nielsen. On weighting clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(8):1223–1235, August 2006.
[10] B. Schölkopf and A.J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
[11] S.Z. Selim and M.A. Ismail. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):81–87, 1984.
[12] C. Stanfill and D. Waltz. Toward memory-based reasoning. Communications of the ACM, 29(12):1213–1228, 1986.
[13] P.N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison Wesley, April 2005.
[14] V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.
[15] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.
[16] D. Wettschereck. A Description of the Mutual Information Approach and the Variable Similarity Metric. Technical report, Sankt Augustin, Germany, 1995.
