From variable weighting to cluster characterization in topographic unsupervised learning
Nistor Grozavu, Younès Bennani and Mustapha Lebbah
Abstract— We introduce a new learning approach that simultaneously provides a Self-Organizing Map (SOM) and a local weight vector for each cluster. The proposed approach is computationally simple and learns a different feature weight vector (relevance vector) for each cell. Based on the Self-Organizing Map, we present two new simultaneous clustering and weighting algorithms: local weighting observation (lwo-SOM) and local weighting distance (lwd-SOM). Both algorithms achieve the same goal by minimizing different cost functions. After the learning phase, a selection method based on the weight vectors is used to prune the irrelevant variables and thus characterize the clusters. We illustrate the performance of the proposed approach on a number of synthetic and real data sets, which show the benefits of the proposed local weighting in self-organizing models.
I. INTRODUCTION
The problem of weighting and selecting a subset of variables constitutes an important part of the design of good learning algorithms. The generalization performance of these algorithms can be significantly degraded if irrelevant variables are used. This negative effect increases in the case of unsupervised learning, where no class labels are given. In this case, the problem is that not all variables are important: some may be redundant, some may be irrelevant, and some can even degrade clustering quality. In order to find relevant features, we combine weighting with selection techniques. In variable selection, the task is reduced to simply eliminating variables which are completely irrelevant. Variable selection is popular in supervised learning [1; 2; 3; 4], where it is done by maximizing some function of predictive accuracy. Variable weighting is an extension of the selection process in which variables are associated with continuous weights, which can be regarded as degrees of relevance. Continuous weighting provides a richer representation of feature relevance. Hence, the clustering and variable selection/weighting tasks are coupled, and applying them in sequence can degrade the performance of the learning system. Consequently, it is necessary to develop algorithms that perform clustering and variable weighting at the same time. Since research in variable weighting for unsupervised learning is relatively recent, we hope that this paper will serve as a guideline for future researchers.
(Nistor Grozavu, Younès Bennani and Mustapha Lebbah are with LIPN UMR 7030, Université Paris 13, 99 av. J-B Clément, 93430 Villetaneuse, France; e-mail: {firstname.secondname}@lipn.univ-paris13.fr. This work was supported by Cap Digital under the Infom@gic project.)
In this paper we are interested in models which perform dimensionality reduction and clustering at the same time, using
Self-Organizing Maps (SOM, [5]) in order to characterize clusters. SOM models are often used for visualization and for unsupervised topographic clustering, because this technique allows projection into low-dimensional, generally two-dimensional, spaces. Some extensions and reformulations of the SOM model have been described in the literature [6; 7; 8]. Several important research topics in cluster analysis and variable weighting are discussed in [9; 10; 11; 12; 13; 14]. In [14], the authors propose a probabilistic formalism for variable selection in unsupervised learning using Expectation-Maximization (EM). In [9], the authors proposed two local weighting unsupervised clustering algorithms based on the Fuzzy C-Means algorithm (SCAD1 and SCAD2), which categorize the unlabeled data while determining the best variable weights within each cluster. In [10; 12], the authors proposed an approach which minimizes the same cost function as [9], but estimates global variable weights instead. This feature weighting mechanism has been extended to a fuzzy k-means algorithm [15] and to subspace clustering [16]. Similar techniques, based on k-means and weighting, have been developed by other researchers [11; 17]. In contrast to the global weighting approach based on the SOM method, which considers a single weight vector for the whole map [18; 19], our local weighting algorithms characterize each cell of the map by a prototype and a weight vector, where each component indicates the relevance of the corresponding variable. These weight vectors are then used for local variable selection in order to characterize clusters with the best subset of variables. For the variable selection task we use Cattell's Scree Test, a statistical technique originally proposed to select principal components [20].
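To illustrate how a cell's weight vector can drive local variable selection, the sketch below applies a simple elbow (scree) criterion to one weight vector: sort the weights in decreasing order and cut at the largest drop. All names are ours and the cut-off rule is a placeholder; the paper's actual Cattell-based criterion is given in its section IV.

```python
import numpy as np

def scree_select(weights):
    """Select relevant variables from one cell's weight vector using a
    simple elbow (scree) criterion: sort weights in decreasing order and
    keep the variables that come before the largest drop in the curve.
    Illustrative sketch only, not the paper's exact Cattell criterion."""
    order = np.argsort(weights)[::-1]      # variable indices, largest weight first
    sorted_w = weights[order]
    drops = -np.diff(sorted_w)             # decrease between consecutive weights
    elbow = int(np.argmax(drops)) + 1      # cut just before the steepest drop
    return order[:elbow]                   # indices of the retained variables

# Example: two dominant variables, the rest near-irrelevant
w = np.array([0.45, 0.02, 0.40, 0.05, 0.04, 0.04])
print(scree_select(w))  # → [0 2]
```

Applied independently to each cell's weight vector, such a rule yields a possibly different subset of characteristic variables per cluster, which is precisely what distinguishes local from global weighting.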
The rest of this paper is organized as follows: after introducing the classical Self-Organizing Map (SOM) in section II, we present both approaches, lwo-SOM (local weighting observation) and lwd-SOM (local weighting distance), in section III. In section IV, we present the variable selection algorithm and the principle of Cattell's test. In section V, we show experimental results on several data sets, which illustrate the use of these algorithms for topographic clustering and variable weighting. Finally, we draw some conclusions and outline possibilities for further research in this area.
II. TRADITIONAL SELF-ORGANIZING MAP (SOM)
Self-organizing maps are increasingly used as tools for data visualization, as they allow projection in low-dimensional spaces, typically bi-dimensional. The basic model
proposed by Kohonen consists of a discrete set C of cells called the “map”. This map has a discrete topology defined by an undirected graph, which is usually a regular grid in two dimensions. For each pair of cells (j, k) on the map, the distance δ(j, k) is defined as the length of the shortest chain linking cells j and k on the grid. For each cell j, this distance defines a neighborhood; in order to control the neighborhood area, we introduce a positive kernel function K (K ≥ 0 and lim_{|y|→∞} K(y) = 0). We define the mutual influence of two cells j and k by K_{j,k}. In practice, as for traditional topological maps, we use a smooth function to control the size of the neighborhood: K_{j,k} = exp(−δ(j, k)/T). Using this kernel function, T becomes a parameter of the model. As in the Kohonen algorithm, we decrease T from an initial value T_max to a final value T_min. Let
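A minimal sketch of the neighborhood machinery just described, assuming a regular grid with 4-connectivity (so the shortest-chain distance δ reduces to the Manhattan distance) and a geometric cooling schedule for T, which the text does not fix; all function names are ours.

```python
import numpy as np

# Cells of an n_rows x n_cols map are indexed 0 .. n_rows*n_cols - 1.

def delta(j, k, n_cols):
    """Shortest-chain distance delta(j, k) between two cells on a
    regular 2-D grid with 4-connectivity (Manhattan distance)."""
    rj, cj = divmod(j, n_cols)
    rk, ck = divmod(k, n_cols)
    return abs(rj - rk) + abs(cj - ck)

def K(j, k, T, n_cols):
    """Mutual influence K_{j,k} = exp(-delta(j,k) / T)."""
    return np.exp(-delta(j, k, n_cols) / T)

def temperature(t, n_iter, T_max=10.0, T_min=0.1):
    """Decrease T from T_max to T_min over n_iter steps (geometric
    schedule, a common choice; the paper does not specify one here)."""
    return T_max * (T_min / T_max) ** (t / (n_iter - 1))

# The influence of a cell on its grid neighbours shrinks as T decreases:
for t in (0, 9):
    T = temperature(t, 10)
    print(round(T, 2), round(K(0, 1, T, n_cols=4), 3))
```

At T = T_max the kernel is nearly flat, so updates spread widely over the map; as T approaches T_min the influence K_{j,k} of all but the nearest cells vanishes, which is what makes the map converge to a locally refined organization.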