Determining Relevant Input Dimensions for the Self Organizing Map

Thorsten Bojer¹, Barbara Hammer¹, Marc Strickert¹, and Thomas Villmann²

¹ Department of Mathematics/Computer Science, University of Osnabrück, D-49069 Osnabrück, Germany, [email protected]
² Clinic for Psychotherapy and Psychosomatic Medicine, University of Leipzig, Karl-Tauchnitz-Straße 25, D-04107 Leipzig, Germany, [email protected]

Abstract. We propose a method to determine the relevance of the different input dimensions for a self organizing map (SOM). First, a growing self organizing map is adapted to the data. Afterwards, the effect of the input dimensions on the clustering or the topology of the SOM, respectively, is computed, and the data dimensions which are ranked low are pruned. The algorithm is applied to real life satellite image data. The results are verified by visualizing the data as RGB images as well as by explicitly computing the classification error.

1 Introduction

Kohonen's self organizing map provides a powerful and biologically motivated tool which computes a topologically adequate clustering of training data [8]. For this purpose, a lattice of neurons is adapted to the data such that the lattice, which is chosen a priori, fits the unknown topology of the data as well as possible. There exist various possibilities of further processing the SOM's output: a classification can be approximated by means of attaching labels or local linear maps to the single neurons; substituting the data by their nearest neurons provides a compact representation of the data; the lattice of the SOM allows data visualization as well as structured data mining [6-8, 11, 13]. Hence SOMs are widely used for both supervised and unsupervised learning tasks in robotics, image processing, data mining, and other areas of application. One crucial property of SOMs is the fact that they compute a topologically adequate representation of the data provided the lattice is chosen appropriately. Hence various adaptations and modifications of the standard SOM exist in order to measure the degree of topology preservation or to adapt the lattice structure to the data [1-3, 9, 14].

Often, data are high dimensional. Satellite remote sensing image data comprise anything from the few spectral bands provided by the Landsat Thematic Mapper up to several hundred dimensions for hyperspectral imagers. The larger the dimensionality, the more storage capacity and processing time are required for data processing. Furthermore, adequate visualization and data mining become difficult for high dimensional data. Hence input pruning, i.e. determining which data dimensions are relevant and which dimensions can be dropped for further processing, is a highly interesting topic [15]. This task is particularly challenging if unsupervised training methods like SOMs are considered: they do not provide a clear objective like the prediction error, so it is necessary to find intrinsic characteristics of the unsupervised network instead. We will deal with pruning methods for SOMs and their application to Landsat TM data in the following. We use a growing SOM (GSOM) for training in order to guarantee an adequate topology. We measure the influence of the input dimensions on the clustering or the topological structure, respectively, and afterwards prune the irrelevant dimensions. The methods are evaluated on labeled satellite data for which the classification error can be quantified.

2 The Self Organizing Map

Assume a finite set of training data $X = \{x^1, \ldots, x^m\} \subset \mathbb{R}^n$ is given. We denote the components of a vector $x$ by $x_i$. A SOM as proposed by Kohonen [8] consists of a set of neurons or codebooks $w^1, \ldots, w^N \in \mathbb{R}^n$ together with a neighborhood structure of the neurons. We write $i \sim j$ iff codebooks $i$ and $j$ are neighbored. Denote by $d_G(i,j)$ the minimum length of a path from $i$ to $j$ in this neighborhood graph. Often, the neighborhood graph has a regular low dimensional grid structure which can be specified by a tuple $(n_1, \ldots, n_d)$, $d$ denoting the grid dimensionality and $n_l$ denoting the number of neurons in the respective grid dimension. Denote by $R_j = \{x \in X \mid |x - w^j| \le |x - w^k| \text{ for all } k\}$ the receptive field of the $j$th codebook. After determining a neighborhood graph, the codebooks are trained with the recursive update
$$
w^j \mapsto w^j + \eta \, \mathrm{e}^{-d_G(i_0, j)/\sigma^2}\,(x - w^j) \qquad \text{for all } j,
$$
where $x$ is chosen at random from $X$, $i_0$ denotes the winner neuron, i.e. $|x - w^{i_0}| \le |x - w^j|$ for all $j$, and $\eta$ and $\sigma$ are positive learning rates which are often decreased during training in order to ensure convergence. This algorithm spreads the codebooks onto the data such that the topology of the data matches the topology of the lattice of neurons. Roughly speaking, topology preservation means that neurons $i$ and $j$ are neighbored in the lattice of neurons, i.e. $i \sim j$, iff the codebooks $w^i$ and $w^j$ are neighbored in the data manifold, i.e. $R_i \cap R_j \neq \emptyset$. There exist various approaches which state an exact definition or compute the degree of neighborhood preservation in an efficient way [1-3, 9, 14]. Obviously, a faithful representation is possible if and only if the lattice of neurons which is chosen a priori fits the unknown topology of the training data. The GSOM generalizes the above learning algorithm such that the lattice is determined during training [2]: starting from a minimal lattice, codebooks are added to an already existing lattice dimension, or a new lattice dimension is attached, depending on the observed deviation of the training data from the codebooks. GSOM has the advantage of preserving a regular grid structure while guaranteeing a maximum degree of topology preservation.
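To make the training procedure concrete, the following is a minimal sketch of the recursive update in Python/NumPy, assuming a rectangular lattice whose graph distance is the Manhattan distance on the grid; the function and parameter names (`som_step`, `eta`, `sigma`) are illustrative and not taken from the cited implementations. A GSOM would additionally monitor the deviation of the data from the codebooks and enlarge the grid shape accordingly, which is omitted here.

```python
import numpy as np

def grid_distance(shape, i, j):
    """Path length between neurons i and j in the neighborhood graph of a
    regular grid: the Manhattan distance between their grid coordinates."""
    ci = np.unravel_index(i, shape)
    cj = np.unravel_index(j, shape)
    return sum(abs(a - b) for a, b in zip(ci, cj))

def som_step(codebooks, shape, x, eta=0.1, sigma=1.0):
    """One recursive update: every codebook w^j moves towards the randomly
    drawn sample x, weighted by exp(-d_G(i0, j) / sigma^2), i0 the winner."""
    winner = int(np.argmin(np.linalg.norm(codebooks - x, axis=1)))
    for j in range(len(codebooks)):
        h = np.exp(-grid_distance(shape, winner, j) / sigma**2)
        codebooks[j] += eta * h * (x - codebooks[j])

# toy usage: adapt a 5x5 lattice to random 2D data
rng = np.random.default_rng(0)
data = rng.random((1000, 2))
codebooks = rng.random((25, 2))
for t in range(5000):
    som_step(codebooks, (5, 5), data[rng.integers(len(data))],
             eta=0.5 * 0.99 ** (t // 100), sigma=1.0)
```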

3 Relevance Measures

The experiments in [5, 7, 10] indicate that a considerable reduction of the input dimensionality is possible without losing much information in several applications. At the same time, theoretical results as proved in [12] restrict the possibility of efficiently finding reduced representations for general data. Hence we have to rely on heuristics adapted to the specific situation we are dealing with. We propose pruning algorithms for the SOM which are related to methods proposed for the neural gas algorithm and successfully applied to artificial data in [5]. We define a relevance function $r: \{1, \ldots, n\} \to \mathbb{R}$ such that a high value $r(i)$ indicates that dimension $i$ is relevant, whereas a low value $r(i)$ indicates that dimension $i$ could be pruned. The dimensions with the lowest values $r(i)$ are pruned. There exist various possibilities of defining $r$ appropriately:

Dispersion: We measure to what extent the variation around the codebook vectors is reduced in the respective dimension. For dimension $i$ we compare the variation within the receptive fields, $\sum_j \sum_{x \in R_j} (x_i - w^j_i)^2$, to the overall variation of the data in this dimension; if the variation is considerably reduced, then the respective input dimension is important. We denote this measure by $r_1$.
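As an illustration, the following sketch computes one possible formalization of this dispersion measure, namely one minus the ratio of the variation around the winning codebooks to the total variation per dimension; the exact normalization used in the experiments may differ, and all names are illustrative.

```python
import numpy as np

def dispersion_relevance(data, codebooks):
    """One possible dispersion measure per input dimension: one minus the
    ratio of the variation around the winning codebooks to the total
    variation. Large values indicate relevant dimensions."""
    # assign every data point to the receptive field of its closest codebook
    winners = np.argmin(
        np.linalg.norm(data[:, None, :] - codebooks[None, :, :], axis=2), axis=1)
    within = np.zeros(data.shape[1])
    for j in range(len(codebooks)):
        members = data[winners == j]                      # receptive field R_j
        if len(members):
            within += ((members - codebooks[j]) ** 2).sum(axis=0)
    total = ((data - data.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - within / np.maximum(total, 1e-12)
```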

Weight function: The SOM is described by the mapping of an input vector to the winning codebook vector; this mapping can be approximated using softmin via
$$
x \mapsto \sum_j w^j \, \frac{\mathrm{e}^{-|x - w^j|^2/\sigma^2}}{\sum_k \mathrm{e}^{-|x - w^k|^2/\sigma^2}},
$$
where $\sigma > 0$. The effect of dimension $i$ on this mapping can be measured via the derivative with respect to the $i$th input dimension; the larger this value, the more important is the respective dimension. We denote this measure by $r_2$.

Topology preservation: The fact that the receptive fields of $w^i$ and $w^j$ intersect is – for reasonably well behaved data – equivalent to the fact that the point in between, $(w^i + w^j)/2$, is closest to $w^i$ and $w^j$. This test can be approximated by the sum
$$
\sum_{k \neq i,j} \mathrm{sgd}\!\left(\left|\frac{w^i + w^j}{2} - w^k\right|^2 - \left|\frac{w^i + w^j}{2} - w^i\right|^2\right),
$$
sgd being the standard sigmoidal function. Those dimensions are not important which change the above value only slightly: in this case, only those receptive fields intersect whose neurons are neighbored in the lattice of neurons. Hence the $i$th component is important if the derivative of the above term with respect to the $i$th input dimension is large. We denote this measure by $r_3$. Note that it only depends on the lattice of the SOM and not on the training data, hence it can be computed very efficiently. Moreover, we can further reduce the above sum to only those neurons $k$ which are close to the neurons $i$ or $j$, since a change in the topology will most likely manifest itself in local changes of the neighborhood structure. We refer to this restricted measure as $r_3^k$, where $k$ denotes the maximum distance of the considered neurons from $i$ and $j$ in the neighborhood graph.
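The following sketch illustrates the weight function measure: it implements the softmin approximation of the winner-takes-all mapping and estimates the relevance of each dimension by a numerical derivative averaged over the data. The choice of `sigma`, the central difference, and the averaging are illustrative assumptions rather than prescriptions from the text.

```python
import numpy as np

def softmin_map(x, codebooks, sigma=1.0):
    """Soft approximation of the mapping x -> winning codebook: a convex
    combination of the codebooks with softmin weights on |x - w^j|^2."""
    d2 = ((codebooks - x) ** 2).sum(axis=1)
    a = np.exp(-d2 / sigma**2)
    return (a[:, None] * codebooks).sum(axis=0) / a.sum()

def weight_relevance(data, codebooks, sigma=1.0, h=1e-4):
    """Relevance of each input dimension as the mean magnitude of the numerical
    derivative of the softmin mapping with respect to that dimension
    (a subsample of the data can be used to keep this cheap)."""
    n = data.shape[1]
    rel = np.zeros(n)
    for x in data:
        for i in range(n):
            e = np.zeros(n)
            e[i] = h
            diff = (softmin_map(x + e, codebooks, sigma)
                    - softmin_map(x - e, codebooks, sigma))
            rel[i] += np.linalg.norm(diff) / (2 * h)
    return rel / len(data)
```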


4 Application to Satellite Data

Note that the above measures for the significance of the different input dimensions do not refer to a supervised learning task but rely on intrinsic characteristics of the SOM. We test these methods on a LANDSAT-TM satellite image from the Colorado area, U.S.A., for which a complete labeling of the inputs into different classes of vegetation is available¹. Hence we can compare the results obtained via unsupervised methods explicitly with the classification error on the data. The data comprise six input dimensions and on the order of a million data points. Since the above methods crucially depend on the fact that the topology of the SOM fits the data topology, we use a GSOM approach which leads to a three-dimensional lattice of neurons.

¹ Thanks to M. Augusteijn (University of Colorado) for providing the data.

Table 1. Ranking of the input dimensions induced by the various significance measures.

dimension | acc  | $r_1$ | $r_2$ | $r_3$ | $r_3^1$ | $r_3^2$
1         | 0.81 | 0.09  | 0.49  | 1.12  | 1.04    | 1.12
2         | 0.82 | 0.15  | 0.46  | 0.69  | 0.64    | 0.69
3         | 0.82 | 0.15  | 0.48  | 0.53  | 0.49    | 0.53
4         | 0.66 | 0.62  | 1.0   | 3.0   | 3.0     | 3.0
5         | 0.81 | 0.12  | 0.54  | 1.79  | 1.62    | 1.79
6         | 0.84 | 0.11  | 0.5   | 1.25  | 1.12    | 1.25
ranking   | 244126 | 622145 | 465123 | 456123 | 456123 | 456123

The topographic product, which quantifies the degree of topology preservation (a value of 0 indicates perfect agreement), yields a value close to 0 and hence a nearly perfect fit [1]. This finding agrees with the results of a Grassberger-Procaccia analysis [4] which yields a low intrinsic dimension of the data. A standard PCA shows that one eigenvalue is dominant and only a few further dimensions contain relevant information, where these intrinsic dimensions do not necessarily coincide with the Euclidean coordinates.

Fig. 1 displays the labeled image and an RGB-representation based on the unsupervised SOM only: each neuron in the SOM represents one color with RGB values according to its position in the three-dimensional lattice, and all data points in the receptive field of a neuron are displayed in the same color as the neuron. Although no label information has been used for the second representation, one can observe a good agreement of the two images. We compute a posterior labeling of the codebooks depending on the data: $w^j$ is equipped with a vector $(p^j_1, \ldots, p^j_L)$, $L$ denoting the number of different labels, where $p^j_l$ denotes the percentage of points in the receptive field of $w^j$ labeled with $l$. The label with maximum $p^j_l$ is attached to the codebook $w^j$. Hence the SOM induces a mapping which maps an input to the label of the closest codebook vector. This function misclassifies a certain percentage of the data; we refer to the corresponding accuracy as acc. Note that a large number of misclassifications is due to the fact that the class boundaries lie within the receptive fields of the codebooks, since no label information is used for training.

In addition to the above relevance functions $r_1$, $r_2$, $r_3$, $r_3^1$, and $r_3^2$, we measure the effect of pruning the $i$th input dimension on the labeling function, i.e. on the percentage of misclassifications according to acc. The results, together with the induced ranking of the input dimensions, are collected in Tab. 1. All methods indicate that dimension 4 is most important. An RGB image based on this dimension only is shown in Fig. 1. Moreover, the induced ranking does not differ much between the various relevance measures. In particular, the very efficient measure $r_3^1$, which depends on neighbored neurons and their immediate neighborhood only, provides an accurate estimation of $r_3$, which depends on the whole neighborhood, as well as of $r_2$, which additionally depends on the data. Iteratively pruning the dimensions ranked low allows us to drop all but two dimensions while retaining most of the classification accuracy, as depicted in Fig. 1. The RGB images obtained after pruning some of the lowest ranked input dimensions are also displayed in Fig. 1.

Since an explicit labeling is usually not available for unsupervised methods, we need an adequate stopping criterion for input pruning. Note that the relevance factors according to $r_3$ form well separated clusters of dimensions; hence stopping after pruning all dimensions of a specific cluster, together with some prior estimation of the intrinsic dimensionality of the data, could be a reasonable and efficient strategy for a pruning method.
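To illustrate how the evaluation described in this section can be carried out, the following sketch computes the posterior majority labeling of the codebooks, the induced classification accuracy, and a simple pruning step that keeps the highest ranked dimensions. Retraining or re-evaluating the map after each pruning step, as well as the cluster-based stopping heuristic, are left out; all names are illustrative.

```python
import numpy as np

def label_codebooks(data, labels, codebooks):
    """Posterior labeling: each codebook receives the majority label of the data
    points in its receptive field (labels are assumed to be coded as 0..L-1)."""
    winners = np.argmin(
        np.linalg.norm(data[:, None, :] - codebooks[None, :, :], axis=2), axis=1)
    cb_labels = np.zeros(len(codebooks), dtype=int)
    for j in range(len(codebooks)):
        member_labels = labels[winners == j]
        if len(member_labels):
            cb_labels[j] = np.bincount(member_labels).argmax()
    return winners, cb_labels

def accuracy(data, labels, codebooks):
    """Fraction of data points whose closest codebook carries the correct label."""
    winners, cb_labels = label_codebooks(data, labels, codebooks)
    return float((cb_labels[winners] == labels).mean())

def prune_to(data, codebooks, relevance, keep):
    """Restrict data and codebooks to the `keep` most relevant input dimensions."""
    order = np.argsort(relevance)[::-1][:keep]
    return data[:, order], codebooks[:, order], order
```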

5 Conclusions

The presented pruning methods provide a robust tool for efficiently determining relevant input dimensions for an unsupervised SOM. They rely on intrinsic characteristics of the SOM and hence do not require explicit label information for the data. Therefore they are particularly suitable for data mining applications or the visualization of data with an unknown structure. An application to real life satellite data, for which an additional labeling is available, showed promising results. In particular, pruning according to topology preservation is a very effective method which, additionally, suggests a natural stopping point in combination with a prior estimation of the intrinsic dimensionality. Further work has to be done on adequate stopping criteria for the other relevance measures.

References

1. H.-U. Bauer and K. R. Pawelzik. Quantifying the neighborhood preservation of Self-Organizing Feature Maps. IEEE Transactions on Neural Networks, 3(4):570-579, 1992.
2. H.-U. Bauer and T. Villmann. Growing a Hypercubical Output Space in a Self-Organizing Feature Map. IEEE Transactions on Neural Networks, 8(2):218-226, 1997.
3. B. Fritzke. Growing grid: a self-organizing network with constant neighborhood range and adaptation strength. Neural Processing Letters, 2(5):9-13, 1995.
4. P. Grassberger and I. Procaccia. Measuring the strangeness of strange attractors. Physica D, 9:189-208, 1983.
5. B. Hammer and T. Villmann. Input pruning for neural gas architectures. To appear at ESANN'01.
6. T. Honkela, S. Kaski, T. Kohonen, and K. Lagus. Self-organizing maps of very large document collections: Justification for the WEBSOM method. In I. Balderjahn, R. Mathar, and M. Schader, editors, Classification, Data Analysis, and Data Highways, pages 245-252. Springer, Berlin, 1998.
7. S. Kaski. Dimensionality reduction by random mapping: fast similarity computation for clustering. In Proceedings of IJCNN'98, pages 413-418, 1998.
8. T. Kohonen. Self-Organizing Maps. Springer, 1997.
9. T. Martinetz and K. Schulten. Topology representing networks. Neural Networks, 7(3):507-522, 1994.
10. U. Matecki. Automatische Merkmalsauswahl für Neuronale Netze mit Anwendung in der pixelbezogenen Klassifikation von Bildern. Shaker, 1999.
11. A. Meyering and H. Ritter. Learning 3D-shape-perception with local linear maps. In Proceedings of IJCNN'92, pages 432-436, 1992.
12. R. Nock and M. Sebban. Sharper bounds for the hardness of prototype and feature selection. In H. Arimura, S. Jain, and A. Sharma, editors, Algorithmic Learning Theory, pages 224-237. Springer, 2000.
13. A. Ultsch. Self-organizing neural networks for visualization and classification. In O. Opitz, B. Lausen, and R. Klar, editors, Information and Classification, pages 307-313, London, UK, 1993. Springer.
14. T. Villmann, R. Der, M. Herrmann, and T. Martinetz. Topology Preservation in Self-Organizing Feature Maps: Exact Definition and Measurement. IEEE Transactions on Neural Networks, 8(2):256-266, 1997.
15. T. Villmann and E. Merényi. Extensions and modifications of SOM and its application in satellite remote sensing processing. In H. Bothe and R. Rojas, editors, Neural Computation 2000, pages 765-771, Zürich, 2000. ICSC Academic Press.

"accuracy" "dispersion" "weight_function" "topology"

1 0.8 0.6 0.4 0.2 0 0

1

2

3

4

5

Fig. 1. First row, left: classification according to the labeling, colors result from a specific color map; first row, right: RGB-visualization according to the SOM; second row, left: RGB-visualization with two dimensions pruned; second row, right: RGB-visualization with four dimensions pruned; third row, left: RGB-visualization based on dimension 4 only; third row, right: decrease of the accuracy if the input dimensions are pruned according to the various measures.
