
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 3, MARCH 2008

Automatic Cluster Detection in Kohonen’s SOM

Dominik Brugger, Martin Bogdan, and Wolfgang Rosenstiel

Abstract—Kohonen’s self-organizing map (SOM) is a popular neural network architecture for solving problems in the field of explorative data analysis, clustering, and data visualization. One of the major drawbacks of the SOM algorithm is the difficulty for nonexpert users to interpret the information contained in a trained SOM. In this paper, this problem is addressed by introducing an enhanced version of the Clusot algorithm. This algorithm consists of two main steps: 1) the computation of the Clusot surface utilizing the information contained in a trained SOM and 2) the automatic detection of clusters in this surface. In the Clusot surface, clusters present in the underlying SOM are indicated by the local maxima of the surface. For SOMs with 2-D topology, the Clusot surface can, therefore, be considered as a convenient visualization technique. Yet, the presented approach is not restricted to a certain type of 2-D SOM topology and it is also applicable for SOMs having an $n$-dimensional grid topology.

Index Terms—Clustering methods, exploratory data analysis, neural network architecture, prosthetics, self-organizing feature maps.

I. INTRODUCTION

The self-organizing map (SOM) is a very popular neural network architecture used mainly for unsupervised learning [1]. In the past, it has been successfully applied to a broad range of problems, like large-scale document organization [2], medical engineering [3], and others [4], [5]. When the SOM is used as a tool for exploratory data analysis, it can be viewed as a mapping of $n$-dimensional input space to $m$-dimensional output space, where usually $m < n$ and $m = 2$ or $m = 3$. Other methods which allow the computation of such a mapping are principal component analysis (PCA), curvilinear component analysis (CCA), and Sammon’s projection [6]–[8]. In principle, all of these approaches allow the user to gain insight into the structure of high-dimensional data, yet in many applications this is only possible if the user has certain experience with the projection method and/or expert knowledge concerning the data.

Manuscript received August 31, 2006; revised April 6, 2007 and June 30, 2007; accepted July 11, 2007. This work was supported by the Deutsche Forschungsgemeinschaft (DFG) under Grants 441 (Graduiertenkolleg Chemie in Interphasen) and RO 1030/12. D. Brugger is with the Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Tübingen, 72076 Baden-Württemberg, Germany (e-mail: [email protected]; [email protected]). M. Bogdan is with the Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Tübingen, 72076 Baden-Württemberg, Germany and also with the Technische Informatik, University of Leipzig, 04103 Leipzig, Germany. W. Rosenstiel is with the Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Tübingen, 72076 Baden-Württemberg, Germany and also with the Department for System Design, Microelectronics at the Computer Science Research Centre (FZI), 76131 Karlsruhe, Germany. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNN.2007.909556

Fig. 1. System for controlling a prosthesis using nerve signals recorded by a neurosensor. The heart of the system is an SOM used to group the preprocessed nerve signals. After the SOM training, an essential step is the automatic detection of clusters, which is done by the Clusot algorithm. This algorithm uses the neuron frequencies and the Euclidian distance between codebook vectors of neighboring neurons to detect clusters in the trained SOM. Numbers at each neuron position indicate the neuron’s hit number.

An example of this kind of application is depicted in Fig. 1, where a patient is supplied with a prosthesis controlled by nerve signals, which are recorded via a neurosensor in the stump of his arm. The connection between nerve signals and motor commands of the prosthesis is achieved by preprocessing the nerve signals using independent component analysis (ICA) and training an SOM [3]. By means of the SOM training, nerve signals are grouped into distinct clusters, which have to be detected by the medical personnel in order to assign clusters to motor commands. This step is crucial for the successful application of the system and usually it can only be performed by a user who has some experience with the SOM algorithm. As the medical personnel and the patient generally do not have this experience, it is necessary to devise a method for automatically detecting clusters in the trained SOM. Another aspect in this biomedical application concerns the confidence with which nerve signals are assigned to a cluster. For the sake of noise suppression, it is desirable that not every nerve signal leads to a motor response. To achieve this goal, it is,


Fig. 2. Two main steps of the Clusot algorithm are the computation of the Clusot surface and the subsequent detection of clusters. Numbers at each neuron position indicate the neuron’s hit number in the figure on the left. Note that the Clusot surface is computed in the space of the SOM grid.

therefore, admissible to leave a certain fraction of nerve signals unassigned (cf. Fig. 1).

To solve the problem of automatic cluster detection, the Clusot algorithm has been developed [9]. As shown in Fig. 2, Clusot is a two-step procedure, where the first step involves the computation of the Clusot surface which is used in the second step to detect the clusters. The result of the first step, the Clusot surface, can be regarded as a sophisticated way to visualize the information that is contained in a trained SOM. Thus, it is similar to other visualization approaches like the U-matrix [10] or the method described in [11]. However, in contrast to these methods, Clusot not only uses the Euclidian distance between the codebook vectors of the trained SOM for the computation of the surface, but also the neuron frequency [12] of each neuron in the SOM. Previous approaches to cluster the SOM include the direct application of agglomerative clustering algorithms and $k$-means to the codebook vectors of an SOM [13] and a semiautomatic algorithm, requiring the user to iteratively choose starting points in yet unclustered areas of the SOM [14]. While the latter approach does not fulfill the requirement of minimal user interaction, the former approach again only uses part of the information contained in a trained SOM, that is, the Euclidian distance between the codebook vectors. The method described in [14] has recently been extended to use density information and a heuristic to automatically set its threshold parameter [15].

In the following, an enhanced version of the originally proposed Clusot algorithm is introduced, which can be applied to SOMs having an arbitrary 2-D topology or $n$-dimensional grid topology. Additionally, a better way for detecting clusters in the Clusot surface is derived.

This paper is organized as follows. In Section II, a brief description of the sequential and batch SOM training algorithm is given.
Section III details the computation of the Clusot surface followed by the description of cluster detection given in Section IV. Finally, Section V provides results of our algorithm on some well-known benchmark problems.

II. SOM ALGORITHM

As already mentioned, the SOM algorithm [1] can be regarded as a nonlinear, ordered, and smooth mapping of $n$-dimensional input space to $m$-dimensional output space. In output space, each neuron, or unit, $N_i$ of the SOM is represented by a codebook vector $w_i \in \mathbb{R}^n$. The projection of an input pattern $x \in \mathbb{R}^n$ is achieved by assigning $x$ to the closest codebook vector $w_c$ with respect to (w.r.t.) a general distance measure $d$, where $c = \arg\min_i d(x, w_i)$. A common choice for $d$ is the Euclidian distance $d(x, w_i) = \|x - w_i\|$, and $N_c$ is usually called the winner neuron. During the SOM training, the winner neuron $N_c$ and all the neurons in its neighborhood may adapt their codebook vectors. In the sequential SOM algorithm, the rate of adaption is steered by a monotonically decreasing and time-dependent function $\alpha(t)$, the learning rate. Another function $h_{ci}(t)$ is used to describe the neighborhood of a neuron, a frequent choice being the Gaussian function

$$h_{ci}(t) = \exp\left(-\frac{\|r_c - r_i\|^2}{2\sigma^2(t)}\right),$$

where $r_i$ denotes the position of neuron $N_i$ on the SOM grid. The size of the neighborhood is measured by the neighborhood radius; for the Gaussian function, it is described by $\sigma(t)$, which again is a monotonically decreasing function of time. Thus, the main steps of the sequential SOM training are as follows.

1) Initialize the codebook vectors $w_i$ of all neurons.
2) Determine the winner neuron $N_c$ for input $x$ using the distance measure $d$.
3) Update the codebook vectors

$$w_i(t + 1) = w_i(t) + \alpha(t) h_{ci}(t) [x(t) - w_i(t)].$$

4) Repeat the last two steps until a predefined number of steps is reached.

Note that the sequential SOM training is usually performed in two phases, where the first phase is characterized by the choice of a large learning rate and neighborhood radius and the second phase is used to fine-tune the codebook vectors by setting smaller starting values for $\alpha(t)$ and $\sigma(t)$. If all of the input data is known at the beginning of the SOM training, the batch SOM training algorithm can be used instead of the sequential SOM training. Its main steps are as follows.

1) Initialize the codebook vectors $w_i$ of all neurons.
2) Compute the Voronoi sets $V_j$ and the sums $s_j = \sum_{x \in V_j} x$.
3) Update the codebook vectors using

$$w_i = \frac{\sum_j h_{ji} s_j}{\sum_j h_{ji} |V_j|}.$$

4) Repeat the last two steps until a predefined number of steps is reached.

Compared to the sequential SOM training, the batch SOM training has the advantage that no learning rate has to be specified by the user of the algorithm. Another advantage of batch SOM training is the fast convergence of the codebook vectors towards their final values, if the codebook vectors are partially ordered before training starts. This requirement can be fulfilled if the linear initialization method described in [1] is used. Recently, another SOM training algorithm with constant learning rate has been proposed in [16].

III. CLUSOT SURFACE CALCULATION

The information contained in a trained SOM forms the basis for the calculation of the Clusot surface. One source of information is the Euclidian distance between the codebook vectors of the SOM, and the other one is the neuron frequency $f_i$, which is defined as the hit number $h_i$ of neuron $N_i$ divided by the number $P$ of input patterns

$$f_i = \frac{h_i}{P}. \quad (1)$$

Furthermore, in the following description, $\tilde{d}_{ij}$ denotes the normalized Euclidian distance between the codebook vectors of neurons $N_i$ and $N_j$, where the normalization assures that the maximum distance between two neurons is 0.99 and the minimum distance is zero.

A. The 1-D Grid Topology

For the case of a 1-D grid topology, the neurons of the SOM are embedded into a Cartesian coordinate system. The position of neuron $N_i$ is simply set to $p_i = i$, assuming that the neurons are numbered consecutively along the grid. Next, for each neuron, a modified Gaussian function centered above each neuron position is computed

$$g_i(x) = f_i \exp\left(-\frac{(x - p_i)^2}{2\sigma_i^2}\right). \quad (2)$$

The height at the peak of this function is influenced by the neuron frequency $f_i$, whereas the standard deviation $\sigma_i$ depends on the normalized distances $\tilde{d}_{i,i-1}$ and $\tilde{d}_{i,i+1}$ to the neighboring neurons.
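The two ingredients of the surface, the neuron frequency of (1) and the distance-dependent Gaussians of (2), can be illustrated for the 1-D case. The snippet below is a simplified stand-in rather than the paper's exact construction: the peak heights and the width rule are assumptions chosen so that large neighbor distances produce valleys between peaks.

```python
import numpy as np

def neuron_frequencies(data, w):
    """f_i = hit number of neuron N_i divided by the number of input
    patterns, cf. (1)."""
    hits = np.zeros(len(w))
    for x in data:
        hits[np.argmin(np.linalg.norm(w - x, axis=1))] += 1.0
    return hits / len(data)

def clusot_surface_1d(freq, dist, n_grid=200):
    """Simplified 1-D Clusot-style surface: one Gaussian per neuron, peak
    height given by the neuron frequency, width assumed to shrink where the
    normalized distance to the grid neighbors is large, so that cluster
    borders show up as valleys between the peaks."""
    n = len(freq)
    xs = np.linspace(0.0, n - 1.0, n_grid)
    surface = np.zeros(n_grid)
    for i in range(n):
        # dist[j] = normalized distance between neurons j and j + 1.
        nb = [dist[j] for j in (i - 1, i) if 0 <= j < len(dist)]
        sigma = 0.5 * (1.0 - float(np.mean(nb))) + 0.05
        surface += freq[i] * np.exp(-(xs - i) ** 2 / (2.0 * sigma ** 2))
    return xs, surface

# Five neurons; a large normalized distance around neuron 2, which receives
# no hits, separates two groups of frequently hit neurons.
freq = np.array([0.3, 0.2, 0.0, 0.25, 0.25])
dist = np.array([0.1, 0.9, 0.9, 0.1])
xs, surface = clusot_surface_1d(freq, dist)
```

With these toy values the surface is high above neurons 0 and 4 and forms a valley above neuron 2, which is the behavior the Clusot surface exploits for cluster detection.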

Fig. 3. For neuron $N_i$, the standard deviation of the modified Gaussian function is determined by the normalized Euclidian distance between the codebook vectors of neuron $N_i$ and its neighbors $N_{i-1}$ and $N_{i+1}$. In the picture, the function is stretched in the direction of one of the two neighbors because the normalized distances $\tilde{d}_{i,i-1}$ and $\tilde{d}_{i,i+1}$ differ.
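Looking back at Section II, the sequential training loop can also be sketched in a few lines of Python. The grid dimensions, decay schedules, and toy data below are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def sequential_som(data, grid_w=5, grid_h=5, n_steps=2000,
                   alpha0=0.5, sigma0=3.0, seed=0):
    """Sketch of sequential SOM training with a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    # Grid positions r_i of the neurons (2-D topology assumed).
    pos = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], float)
    # 1) Initialize the codebook vectors w_i (random init; the paper favors
    #    linear initialization for batch training).
    w = rng.uniform(data.min(axis=0), data.max(axis=0),
                    size=(grid_w * grid_h, data.shape[1]))
    for t in range(n_steps):
        frac = t / n_steps
        alpha = alpha0 * (1.0 - frac)        # decreasing learning rate alpha(t)
        sigma = sigma0 * (1.0 - frac) + 0.5  # decreasing neighborhood radius sigma(t)
        x = data[rng.integers(len(data))]
        # 2) Winner neuron: closest codebook vector w.r.t. Euclidian distance.
        c = int(np.argmin(np.linalg.norm(w - x, axis=1)))
        # Gaussian neighborhood h_ci, measured on the SOM grid.
        h = np.exp(-np.sum((pos - pos[c]) ** 2, axis=1) / (2.0 * sigma ** 2))
        # 3) Update rule: w_i <- w_i + alpha * h_ci * (x - w_i).
        w += alpha * h[:, None] * (x - w)
    # 4) The loop above repeats steps 2) and 3) a fixed number of times.
    return w, pos

# Toy data: two well-separated blobs; after training, codebook vectors
# concentrate near both blobs.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(100, 2)),
                  rng.normal([5.0, 5.0], 0.1, size=(100, 2))])
w, pos = sequential_som(data)
```

The batch variant replaces the per-sample update with the Voronoi-set averages given in Section II and needs no learning rate.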


Fig. 27. Special cases that need to be considered when computing the interpolating spline S. (a) The oriented angle between two neighbor neurons of the neuron at position q is larger than π. Cases (b) and (c) arise if the angle equals π and the number of neighbor neurons is l > 2 or l = 2. All cases can be handled by inserting the so-called phantom interpolation points ph.

5) If the oriented angle equals π, insert two phantom neuron positions.

APPENDIX II
ALGORITHMIC DESCRIPTIONS

Algorithm 1: Preprocess
Input: Undirected graph and two weight functions.
Output: MST of the graph, the set of local maxima, and a set of articulation edges.
PREPROCESS
  LOCAL MAXIMA
  MAXIMUM SPANNING TREE
  foreach …
    BFS
  foreach …
    MINIMAL EDGE
  return

Algorithm 2: Minimal Edge
Input: BFS tree and weight function, start node, node.
Output: Edge with the smallest weight on a BFS path from the start node to the given node.
MINIMAL EDGE
  …

Note that inside Algorithm 3 the connected components of the tree are computed by the algorithm given in [27].


Algorithm 3: Recursive Flooding
Input:
• MST of the undirected graph computed by Algorithm 1.
• Set of local maxima and the set of articulation edges.
• Two weight functions.
Output: Set of graphs, where each is a subgraph of the input graph.
RECURSIVE FLOODING
  Sort the edges in the MST in nondecreasing order of weight.
  MAX ← NEXT ELEMENT
  if …
    CONNECTED COMPONENTS
    foreach connected component
      RECURSIVE FLOODING
  return
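The exact pseudocode of Algorithms 1-3 did not survive the extraction, but the overall idea, namely that each cluster is the basin of one local maximum of the Clusot surface, can be illustrated with a deliberately simplified steepest-ascent sketch. The graph representation (adjacency lists plus one surface height per neuron) is an assumption, and this is a stand-in for the flooding procedure, not the paper's algorithm.

```python
def detect_clusters(height, neighbors):
    """Assign every node to the local maximum of the surface that it reaches
    by steepest ascent; each basin of attraction then forms one cluster.
    height: list of surface heights, one per node.
    neighbors: adjacency lists (dict or list), one entry per node."""
    def climb(i):
        node = i
        while True:
            nbrs = neighbors[node]
            best = max(nbrs, key=lambda j: height[j], default=None)
            if best is None or height[best] <= height[node]:
                return node  # local maximum reached
            node = best

    return {i: climb(i) for i in range(len(height))}

# A 1-D chain of 7 neurons whose (toy) surface heights form two peaks,
# at nodes 2 and 5, separated by a valley at node 3.
height = [1, 2, 3, 1, 4, 5, 2]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4],
             4: [3, 5], 5: [4, 6], 6: [5]}
labels = detect_clusters(height, neighbors)
```

On this chain the sketch yields two clusters, one per local maximum, which mirrors how the flooding procedure splits the maximum spanning tree between maxima.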

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for helpful comments and M. Bensch for improving the English of this paper.

REFERENCES

[1] T. Kohonen, Self-Organizing Maps, ser. Information Sciences, 3rd ed. New York: Springer-Verlag, 2001.
[2] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, and A. Saarela, “Self organization of a massive document collection,” IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 574–585, May 2000.
[3] M. Bogdan, “Signalverarbeitung Biologischer Nervensignale zur Steuerung Einer Prothese mit Hilfe künstlicher Neuronaler Netze,” Ph.D. dissertation, Technische Informatik, Eberhard-Karls Universität, Tübingen, Germany, 1998.
[4] C.-H. Chang, P. Xu, R. Xiao, and T. Srikanthan, “New adaptive color quantization method based on self-organizing maps,” IEEE Trans. Neural Netw., vol. 16, no. 1, pp. 237–249, Jan. 2005.
[5] X. Tan, S. Chen, Z.-H. Zhou, and F. Zhang, “Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft k-NN ensemble,” IEEE Trans. Neural Netw., vol. 16, no. 4, pp. 875–886, Jul. 2005.
[6] I. Jolliffe, Principal Component Analysis, 1st ed. New York: Springer-Verlag, 1986.
[7] P. Demartines and J. Hérault, “Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets,” IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 148–154, Jan. 1997.
[8] J. W. Sammon, Jr., “A nonlinear mapping for data structure analysis,” IEEE Trans. Comput., vol. 18, no. 5, pp. 401–409, May 1969.
[9] M. Bogdan and W. Rosenstiel, “Detection of cluster in self-organizing maps for controlling a prostheses using nerve signals,” in Proc. Eur. Symp. Artif. Neural Netw., Apr. 2001, pp. 131–136.
[10] A. Ultsch, “U* Matrix: A tool to visualize clusters in high dimensional data,” Databionics Res. Group, Univ. Marburg, Marburg, Germany, 2003 [Online]. Available: http://www.mathematik.uni-marburg.de/databionics/en//downloads/papers/ultsch03ustar.pdf
[11] M. A. Kraaijveld, J. Mao, and A. K. Jain, “A nonlinear projection method based on Kohonen’s topology preserving maps,” IEEE Trans. Neural Netw., vol. 6, no. 3, pp. 548–559, May 1995.
[12] M. Cottrell and E. de Bodt, “A Kohonen map representation to avoid misleading interpretations,” in Proc. Eur. Symp. Artif. Neural Netw., 1996, pp. 103–110.
[13] J. Vesanto and E. Alhoniemi, “Clustering of the self-organizing map,” IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 586–600, May 2000.
[14] D. Opolon and F. Moutarde, “Fast semi-automatic segmentation algorithm for self-organizing maps,” in Proc. Eur. Symp. Artif. Neural Netw., 2004 [Online]. Available: http://www.ensmp.fr/moutarde/Publis/somsegmentation_ESANN2004.pdf
[15] F. Moutarde and A. Ultsch, “U*f clustering: A new performant “cluster-mining” method based on segmentation of self-organizing maps,” in Proc. Workshop Self-Organizing Maps, 2005, pp. 75–82.
[16] Y.-M. Cheung and L.-T. Law, “Rival-model penalized self-organizing map,” IEEE Trans. Neural Netw., vol. 18, no. 1, pp. 289–295, Jan. 2007.
[17] J. E. Bresenham, “Algorithm for computer control of a digital plotter,” IBM Syst. J., vol. 4, no. 1, pp. 25–30, 1965.
[18] L. Ibáñez, C. Hamitouche, and C. Roux, “A vectorial algorithm for tracing discrete straight lines in N-dimensional generalized grids,” IEEE Trans. Vis. Comput. Graphics, vol. 7, no. 2, pp. 97–108, Apr. 2001.
[19] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Int. J. Comput. Vis., pp. 321–331, 1988.
[20] C. Xu and J. L. Prince, “Snakes, shapes and gradient vector flow,” IEEE Trans. Image Process., vol. 7, no. 3, pp. 359–369, Mar. 1998.
[21] N. Xu, R. Bansal, and N. Ahuja, “Object segmentation using graph cuts based active contours,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2003, pp. 210–224.
[22] T. F. Chan and L. Vese, “Active contours without edges,” IEEE Trans. Image Process., vol. 10, no. 2, pp. 266–277, Feb. 2001.
[23] S. M. van Dongen, “Graph clustering by flow simulation,” Ph.D. dissertation, Cntr. Math. Comp. Sci., Utrecht Univ., Utrecht, The Netherlands, 2000.
[24] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Dept. Phys. Cntr. Study Complex Syst., Univ. Michigan, Ann Arbor, MI, arXiv:cond-mat/0308217, Aug. 2003.
[25] L. Vincent and P. Soille, “Watersheds in digital spaces: An efficient algorithm based on immersion simulations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 6, pp. 583–598, Jun. 1991.
[26] L. Najman and M. Schmitt, “Geodesic saliency of watershed contours and hierarchical segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 12, pp. 1168–1173, Dec. 1996.
[27] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. Cambridge, MA: MIT Press, 2001.
[28] F. Zanoguera, B. Marcotegui, and F. Meyer, “A toolbox for interactive segmentation based on nested partitions,” in Proc. Int. Conf. Image Process., 1999, vol. 1, pp. 21–25.
[29] A. Strehl, J. Ghosh, and R. Mooney, “Impact of similarity measures on web-page clustering,” in Proc. 17th Nat. Conf. Artif. Intell.: Workshop Artif. Intell. Web Search, Austin, TX, Jul. 30–31, 2000, pp. 58–64.
[30] A. Strehl and J. Ghosh, “Cluster ensembles—A knowledge reuse framework for combining multiple partitions,” J. Mach. Learn. Res., vol. 3, pp. 583–617, 2002.
[31] J. Vesanto, J. Himberg, E. Alhoniemi, and J. Parhankangas, “SOM Toolbox for Matlab 5,” Helsinki Univ. Technol., HUT, Finland, Tech. Rep. A57, 2000.
[32] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 1, no. 2, pp. 224–227, Apr. 1979.
[33] T. Hermle, C. Schwarz, and M. Bogdan, “Spike sorting algorithm based on independent components analysis (ICA) and a self-organizing map (SOM) minimizing user interaction—Application to multineuron recordings in CNS,” in Proc. World Congr. Neuroinf., Wien, Austria, 2001, pp. 24–29.
[34] R. H. Bartels, J. C. Beatty, and B. A. Barsky, An Introduction to Splines for Use in Computer Graphics and Geometric Modeling. San Mateo, CA: Morgan Kaufmann, 1987.
[35] E. Anderson et al., Lapack Users’ Guide, 3rd ed. Philadelphia, PA: SIAM, 1999.


Dominik Brugger received the diploma degree in bioinformatics with honors from the Eberhard-Karls Universität, Tübingen, Germany. Currently, he is working towards the Ph.D. degree at the Department of Computer Engineering, University of Tübingen, Tübingen, Germany. He is a member of the NeuroTeam at the Department of Computer Engineering, University of Tübingen. His current research interests include neural networks, unsupervised kernel methods, and machine learning algorithms.

Martin Bogdan received the engineer diploma in signal engineering from the Fachhochschule Offenburg, Offenburg, Germany, in 1993, the engineer diploma in industrial informatics and instrumentation from the Université Joseph Fourier Grenoble, Grenoble, France, in 1993, and the Ph.D. degree in computer science (computer engineering) from the University of Tübingen, Tübingen, Germany, in 1998. In 1994, he joined the Department of Computer Engineering at the University of Tübingen, where he has been the Head of the Research Group NeuroTeam since 2000. This research group deals mainly with signal processing based on artificial neural networks and machine learning, focused on, but not limited to, biomedical applications. Since winter 2005/2006, he has been a Substitute Professor for Computer Engineering at the University of Leipzig, where he received the appointment for a full professorship in March 2007.


Wolfgang Rosenstiel received the Diploma in computer science and the Ph.D. degree in computer science from the University of Karlsruhe, Karlsruhe, Germany, in 1980 and 1984, respectively. Currently, he is a Professor and the Chair of Computer Engineering at the University of Tübingen, Tübingen, Germany. He is also the Managing Director of the Wilhelm Schickard Institute, University of Tübingen and the Director of the Department for System Design in Microelectronics at the Computer Science Research Centre (FZI), Karlsruhe, Germany. He is on the Executive Board of the German Edacentrum.
