The Architecture of Emergent Self-Organizing Maps to Reduce Projection Errors

A. Ultsch, L. Herrmann
Technical Report, Department of Computer Science, Philipps University of Marburg
November 22, 2004

Abstract

Emergent self-organizing maps (ESOM) may be regarded as a non-linear projection technique using neurons arranged as a lattice embedded in a low-dimensional map space. The preservation of the topography of the high-dimensional input data on the map is a primary aim of ESOM projections. ESOM can be distinguished according to their structure and their length-to-width ratio. There are mainly two types of structures in use: hexgrid (honeycomb like) and quadgrid (trellis like) maps. In addition, the length-to-width ratio may be chosen as uniform (square maps) or nonuniform (rectangular maps). The performance of these map types is evaluated with Zrehen's measure for backward projection errors and a new measure for forward projection errors (Minimal-U-Ranking). In contrast to others, these two measures are not biased by gaps in the data space. Minimal-U-Ranking, as introduced here, is based on the U-distance, which takes the ordering of neurons on the map as well as the topography of the data space into account. Hexgrids were found to have no convincing advantage over quadgrids. In some cases folding errors even increased for hexgrids. Rectangular maps, however, are distinctly superior to square maps. Most surprisingly, rectangular maps outperform square maps even for data spaces that are isotropic, i.e. data spaces with no particular primary direction.

1 Introduction

Self-organizing feature maps (SOM) as introduced by Teuvo Kohonen [Kohonen 82] are used as a tool for the investigation of spatial properties of high-dimensional data sets. SOM project high-dimensional data onto a grid of neurons: data points are mapped onto a two-dimensional retina or grid. The grid is used as a map, in the geometrical sense, of the high-dimensional data space. The images of the data on the map picture the topographic relationships of the data space. For an authentic picture of the high-dimensional space, the mapping constructed by the SOM should preserve the topographical features of the data space as much as possible. If the dimensionality of the data space is higher than the dimensionality of the grid, the topographical relationships cannot be preserved completely; topographical mapping errors are unavoidable. In this work we study the influence of the shape of the grid on the fidelity of the topography preservation. Two main parameters are studied: first, the influence of a hexgrid, i.e. a hexagonal, honeycomb like structure, vs. a quadgrid, i.e. a trellis grid structure; second, the influence of the ratio of edge lengths, in particular square vs. rectangular maps.

2 Basic Definitions

2.1 Data spaces and topographies

In this context, the data-space D ⊂ R^n denotes a metric subspace where data points of an application can be observed in principle. The distance measure in D is denoted d: D × D → R_0^+. The training set E = {x_1, ..., x_k} ⊂ D consists of the input-samples presented during the SOM training algorithm. The map-space M is a low-dimensional manifold embeddable in R^l, l < n, with a distance measure md: M × M → R_0^+ called the map distance. Each of the above mentioned sets has its own topography, which is the set of all pairwise similarity relations between its entities. According to different definitions of similarity, three kinds of topographies can be distinguished [Bauer et al. 99] [Goodhill et al. 95]: metric topography (similarities are rated as distances between pairs of entities), rank-based topography (similarities are rated according to rankings of distances between pairs of entities), and topology (similarities are rated according to pairwise neighbourhood relations).

2.2 The Self-organizing map

The Self-Organizing Map (SOM) is regarded as a set I of neurons. A neuron i ∈ I is a tuple (w_i, p_i) consisting of a reference vector w_i ∈ W and a position p_i ∈ P. The positions of the neurons are arranged in P such that regular grids are formed (see below). The reference vectors W = {w_i : i ∈ I} ⊂ D are used for vector-quantization purposes, whereas the positions P = {p_i : i ∈ I} ⊂ M are used for vector-projection purposes. The SOM assigns arbitrary data points x ∈ D to corresponding neurons with minimal data-space distance, so-called best-matches (see Equation 1). Therefore, the reference vectors of the SOM are used for vector quantization (see Equation 2): all data points x ∈ D that are assigned to one neuron i make up its Voronoi region V_i. In addition to that, the low-dimensional positions p_i ∈ P of the SOM are used for (non-linear) vector projection (see Equation 3). For an explicit description of the training algorithm see, for example, [Kohonen 97, pp 86] [Ritter et al. 92, pp 62] [Kaski 97].

bm: D → I,  bm(x) = argmin_{i ∈ I} d(x, w_i)    (1)

q: D → W,  q(x) = w_{bm(x)}    (2)

m: D → P,  m(x) = p_{bm(x)}    (3)
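The three mappings above translate directly into code. The following sketch is illustrative only: it assumes the reference vectors and positions are stored as NumPy arrays indexed by neuron and uses the Euclidean distance for d; neither choice is prescribed by the text.

```python
import numpy as np

def best_match(x, W):
    """Best-matching neuron bm(x) for data point x (Equation 1).
    W holds the reference vectors, one row per neuron."""
    # Euclidean data-space distance d(x, w_i); the paper leaves d generic.
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

def quantize(x, W):
    """Vector quantization q(x) = w_bm(x) (Equation 2)."""
    return W[best_match(x, W)]

def project(x, W, P):
    """Vector projection m(x) = p_bm(x) (Equation 3).
    P holds the map-space positions, one row per neuron."""
    return P[best_match(x, W)]
```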

The training algorithm of the SOM leads to some remarkable properties (see [Kaski 97] [Kohonen 97, pp 93ff] [Ritter et al. 92, pp 222ff]). First, the distribution of reference vectors follows the distribution of the input-samples from the training set E. Second, the vector projection of the SOM preserves the training set's neighbourhood relationships sufficiently well in the map-space (a proven law of topography preservation is still missing). The degree of topography preservation depends on the ratio of the intrinsic dimension of the data-space to the dimension of the map-space. Therefore, the SOM can be seen as a low-dimensional flexible grid that follows the density function of the input-samples from the training set E.

For each pair of neurons i, j ∈ I there are two different distances: the data-space distance d(w_i, w_j) and the map-space distance md(p_i, p_j). This results in a different topography for each space. The k ∈ N neurons with smallest data-space distances towards neuron i constitute the data-space neighbourhood N_k^D(i) ⊂ I. The neurons with smallest map-space distances towards i constitute the map-space neighbourhood N_k^M(i) ⊂ I.

2.3 Grid structures

The positions of the neurons P ⊂ M are chosen such that for each neuron there is a set of equidistant neighbours N^M(i) ⊂ I. This gives P the form of a regular grid. The most popular variants of grids are two-dimensional hexgrids and quadgrids. In hexgrids each neuron has six immediate neighbours; in quadgrid maps each neuron has four immediate neighbours (see Figure 1).
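A minimal sketch of how such neighbourhoods can be enumerated, assuming a borderless (toroidal) grid as used later in this paper and an offset-coordinate convention for the hexagonal lattice; the paper does not prescribe a particular indexing scheme.

```python
def grid_neighbours(rows, cols, hexgrid=False):
    """Immediate map-space neighbours N^M(i) on a borderless (toroidal) grid.
    Neurons are indexed i = r * cols + c; quadgrids yield 4 neighbours,
    hexgrids 6 (odd rows are assumed shifted half a cell to the right)."""
    neighbours = {}
    quad = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    hex_even = [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0)]
    hex_odd = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1)]
    for r in range(rows):
        for c in range(cols):
            if hexgrid:
                offsets = hex_even if r % 2 == 0 else hex_odd
            else:
                offsets = quad
            neighbours[r * cols + c] = [
                ((r + dr) % rows) * cols + ((c + dc) % cols) for dr, dc in offsets
            ]
    return neighbours
```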

2.4 U-Matrix for visualization of distance structures

The sum of the data-space distances from a neuron's reference vector w_i to the reference vectors w_j, j ∈ N^M(i), of its immediate neighbours is called its U-height. A visualization of all U-heights for a given map space is called a U-matrix [Ultsch/Siemon 90]. The most popular visualizations of U-matrices of two-dimensional map spaces are grey-level or landscape pictures (see [Ultsch/Siemon 90] [Ultsch 03b] [Kohonen 97, pp 126]).

There are map grids with and without borders. In planar and finite two-dimensional grids, such as shown in Figure 1, border effects occur: some neurons are located at the edges or corners and have fewer neighbours than neurons in more central regions. Neurons in a corner have only two neighbours, whereas central neurons possess four neighbours on a quadgrid and six on a hexgrid. During the training phase of a SOM, neurons located on the border therefore show different properties than central neurons. To avoid such border effects, grids can be embedded in a finite but borderless space, e.g. a torus, a sphere or the PacMan space (see [Ultsch 03b]). In order to eliminate border effects, we concentrate in the following on borderless toroid grids. For this type of SOM, the map-space is a two-dimensional manifold on which the neurons' positions are equally spaced.

Figure 1: Quadgrid and hexgrid arrays of neurons: immediate neighbours in the map-space are connected.

In the published applications of such SOM, two main types can be distinguished. First, SOM in which each neuron represents a cluster of input-samples. These SOM can be thought of as a variant of the k-means clustering algorithm (see [Kaski 97]); the number of neurons corresponds to the number of clusters assumed in the input data and is usually very small (≤ 16). In contrast to that, SOM may be used as tools for the visualization of structural features of the data-space, usually by means of U-Matrix [Ultsch/Siemon 90] or P-Matrix techniques [Ultsch 03b]. A characteristic of this paradigm is the large number of neurons, usually several thousands (≥ 4000). These SOM allow intrinsic structural features of the data-space to emerge on the map. They are called Emergent Self-Organizing Maps (ESOM, [Ultsch 99a]). A single neuron of these SOM represents a local region in data space. In the following, we concentrate on ESOM.
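As a small illustration, the U-heights can be computed from the reference vectors and the immediate neighbourhood structure; the sketch below again assumes Euclidean data-space distances and a neighbour dictionary such as the one produced by the earlier grid sketch.

```python
import numpy as np

def u_heights(W, neighbours):
    """U-height per neuron: sum of data-space distances between its reference
    vector and those of its immediate map-space neighbours.
    W is an array of reference vectors; `neighbours` maps each neuron index
    to the indices of its immediate neighbours."""
    heights = np.zeros(len(W))
    for i, nbrs in neighbours.items():
        heights[i] = sum(np.linalg.norm(W[i] - W[j]) for j in nbrs)
    return heights
```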

3 Measurements of topography preservation

3.1 Projection errors and gaps

ESOM as projections of high-dimensional data spaces onto a low-dimensional grid may show topographical errors. These errors are reviewed in this section. Two kinds of topographical errors have to be considered. First, a pair of similar data points (x, y) may be assigned to a distant pair of positions (m(x), m(y)) = (p_i, p_j) on the matrix. This means that d(x, y) is small and md(m(x), m(y)) is large. This type of error is called a forward projection error (FPE). ESOM with forward projection errors project data points of a common high-dimensional cluster onto distant locations on the matrix. Searching for clusters on the matrix, for example using U-Matrix techniques, may then show two or more clusters instead of one. Second, a pair of close neighbouring positions (p_i, p_j) = (m(x), m(y)) may be the image of a pair of distant data points (x, y) in the data-space. This is called a backward projection error (BPE). The U-matrix usually shows large heights at such neurons i, j; in this sense the U-matrix method resembles a topography preservation measure. Backward projection errors lead to a reduction of trustworthiness (see [Venna/Kaski 01]), which means that neighbouring positions on the grid may reference non-neighbouring data-space regions.

Usually, there are regions in the data-space where the probability density function becomes very small or even vanishes. In this context, these regions will be called gaps. Gaps divide a data set into several classes of coherent elements. In case of metric data-spaces, these classes are referred to as clusters. It has to be mentioned that gaps affect the ESOM in nearly the same way as backward projection errors (BPE): BPE are defined as occurrences of neighbouring neurons in the map-space with huge data-space distances, and gaps usually lead to neighbouring neurons with huge data-space distances, too (see Figure 2). Visual inspection of the U-matrix cannot distinguish between these two effects.

3.2 Measures for projection errors

The most popular published measures of topography preservation of ESOM are: Topographic Product [Bauer/Pawelzik 92], Topographic Function [Villmann et al. 94], C-Measure [Goodhill et al. 95], Minimal Pathlength [Durbin/Mitchison 90], Minimal Wiring [Durbin/Mitchison 90], Zrehen's Measure [Zrehen 93], Topological Index [Bezdek/Pal 93] and the Trustworthiness Measure [Venna/Kaski 01]. For an overview see [Herrmann 03]. All measures (except the Topographic Function and Zrehen's Measure) are based on the evaluation of distances, or rankings of distances, between pairs of neurons in the data-space and in the map-space. Therefore, these measures are sensitive to the location, scaling and variance of the input data. In the following, topography preservation refers to the preservation of topography by the bijective mapping m that connects reference vectors and map-space positions

(m: W → P, m(w_i) = p_i and m^-1(p_i) = w_i). This specialization makes it possible to operate on two finite sets instead of infinite data-spaces. A useful topography measure should fulfil the following requirements (see also [Bauer et al. 99] [Goodhill et al. 95]):

• Individual assignment: Each neuron should get a rating of its own. The assignment of individual projection errors to each neuron makes it possible to pinpoint violations of topography preservation. The rating of the ESOM follows from the neurons' ratings. By rearrangement of the measures' formulae, it can easily be shown that each measure fulfils this requirement.

• Invariance: The topography measure should depend exclusively on the topography of the neurons in the data- and map-space. A comparison of two ratings then indicates which ESOM (or which neuron) preserves its topography in a better way. Since m (and even m^-1) usually cannot preserve metric topographies, any topography measure has to be invariant to properties of the reference vectors, e.g. scaling or location. The measures Topographic Product, C-Measure, Minimal Pathlength and Minimal Wiring cannot be interpreted meaningfully because there are strong dependencies on the neurons' data- or map-space distances.

• Backward projection errors vs. gaps: Gaps in the data-space lead to a remarkable effect: rank-based topographies of neurons usually differ in map-space and data-space because the neighbourhoods N^M(i), N^D(i) of arbitrary neurons i ∈ I usually do not coincide (see Figure 2). Most measures (Topographic Product, C-Measure, Minimal Wiring, Minimal Pathlength, Topological Index, Trustworthiness Measure) are based on comparisons of such rank-based neighbourhoods for each neuron. The non-coincidence of neighbourhoods may have two reasons: gaps and projection errors. Only the latter has to be rated by an appropriate measure. Therefore, topography measures should distinguish between gaps in the data-space and (backward) projection errors of the ESOM.

In contrast to that, the Topographic Function and Zrehen's measure operate on the Voronoi tessellation of the data-space. Voronoi neighbourhoods (so-called Voronoi polyhedra [Martinetz/Schulten 94]) stretch across the empty space between clusters, so these two measures are invariant to gaps. Topography preservation of a given ESOM might thus be rated by the Topographic Function or Zrehen's measure. According to [Bauer et al. 99], however, the Voronoi polyhedra usually cannot be retrieved from a given set of reference vectors. This works only if the training set E is sufficiently dense compared to the set of reference vectors W. Usually, this assumption does not hold for ESOM because the number of reference vectors is rather large. Therefore, the Topographic Function will not be considered any further in this context.

Zrehen's measure rates the local organization of neighbouring neurons i, j ∈ I on the map. Neurons k ∈ I with reference vectors within the sphere with radius d(w_i, w_j)/2 and center (w_i + w_j)/2 are called intruders (this means a violation of Equation 4). Zrehen's measure Z is the sum of the numbers of intruders over all pairs of neighbouring neurons. Obviously, this measure can easily be determined for each neuron, including a normalization scheme for different numbers of neighbours, which leads to a neuron-dependent measure Z(i), i ∈ I.

d(w_i, w_k)² + d(w_k, w_j)² > d(w_i, w_j)²    (4)
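A minimal sketch of the intruder count behind Zrehen's measure, under two assumptions not fixed by the text above: Euclidean distances and a simple attribution of each intruder to both neurons of the violating pair; the additional normalization scheme mentioned above is omitted.

```python
import numpy as np

def zrehen_intruder_counts(W, neighbours):
    """Unnormalized intruder counts per neuron. A neuron k is an intruder for
    a pair of map-space neighbours (i, j) if it violates Equation 4, i.e.
    d(w_i,w_k)^2 + d(w_k,w_j)^2 <= d(w_i,w_j)^2, which places w_k inside the
    sphere with centre (w_i + w_j)/2 and radius d(w_i, w_j)/2."""
    n = len(W)
    counts = np.zeros(n)
    for i in range(n):
        for j in neighbours[i]:
            if j <= i:
                continue  # visit each neighbouring pair only once
            d_ij2 = np.sum((W[i] - W[j]) ** 2)
            for k in range(n):
                if k == i or k == j:
                    continue
                if np.sum((W[i] - W[k]) ** 2) + np.sum((W[k] - W[j]) ** 2) <= d_ij2:
                    counts[i] += 1
                    counts[j] += 1
    return counts
```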

Figure 2: Data-space (grey) with a gap in the middle (white). Reference vectors of neurons are shown as points; immediate map-space neighbourhoods of neurons are shown as lines. The neighbourhoods N_8^M(i) and N_8^D(i) of neuron i, shown as dotted circles in panels (a) and (b), cannot coincide because of the gap.

Still, there is a need for a measure that rates forward projection errors. A new approach is presented in the following section.

4 The Minimal-U-Ranking Measure

Usually, gaps in the data-space confuse measures that rate backward or forward projection errors of the ESOM, because the neighbourhoods of neurons do not coincide in data- and map-space (see Figure 2) even if the topography is preserved as well as possible. In order to avoid this, the so-called U-distance is proposed as a novel distance measure for the map-space. Let PATH_ij be the set of all paths between neurons i, j ∈ I along immediate map-space neighbours, and let the length of such a path be called its pathdistance. The udistance(i, j) for neurons i, j ∈ I is defined as the minimal pathdistance between i and j:

PATH_ij = { (i_1, ..., i_n) : n ∈ N \ {0, 1}, i_1 = i, i_n = j, i_{k+1} ∈ N^M(i_k) }    (5)

pathdistance(i_1, ..., i_n) = Σ_{k=1}^{n-1} d(w_{i_k}, w_{i_{k+1}})    (6)

udistance(i, j) = min_{q ∈ PATH_ij} pathdistance(q)    (7)

Obviously, U-distances correspond to minimal-length paths on the low-dimensional flexible net that is formed by the neurons in the data-space. Therefore, gaps in the data-space can be seen from the map-space. U-distances are used to define a rank-based topography on the set of neurons: let (udistance(i, i_1), ..., udistance(i, i_n)) be the ordered sequence of the U-distances towards neuron i of all neurons {i_1, ..., i_n} = I, with udistance(i, i_k) ≤ udistance(i, i_{k+1}) for k = 1, ..., n-1. Then urank_i(j) = r ∈ {1, ..., n} is the position of udistance(i, j) in this sequence, i.e. i_r = j. The Minimal-U-Ranking measure is defined as follows:

mur(i) = Σ_{j ∈ N^D(i)} urank_i(j)    (8)

Obviously, the Minimal-U-Ranking measures how the vector projection scatters the neighbourhood N^D(i) over the map-space. Usually, the neighbourhood N^D(i) is chosen as a data-space neighbourhood N_k^D(i) of size k ∈ N around neuron i (e.g. let k be ≈ 5% of the size of I). An additional normalization scheme may be used to scale the resulting values between 0 and 1, where 0 indicates perfect neighbourhood preservation (see [Herrmann 03]).
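A minimal sketch of both quantities, assuming Euclidean data-space distances: U-distances are shortest paths on the neuron grid with edges weighted by d(w_i, w_j), computed here with Dijkstra's algorithm, and ties in the ranking are broken arbitrarily. The normalization scheme from [Herrmann 03] is not reproduced.

```python
import heapq
import numpy as np

def u_distances(W, neighbours, source):
    """U-distances from neuron `source` to all neurons (Equation 7):
    shortest paths on the grid, where each edge (i, j) between immediate
    map-space neighbours is weighted with the data-space distance d(w_i, w_j)."""
    dist = {i: np.inf for i in neighbours}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d_i, i = heapq.heappop(heap)
        if d_i > dist[i]:
            continue
        for j in neighbours[i]:
            d_j = d_i + np.linalg.norm(W[i] - W[j])
            if d_j < dist[j]:
                dist[j] = d_j
                heapq.heappush(heap, (d_j, j))
    return dist

def minimal_u_ranking(W, neighbours, i, k):
    """mur(i) (Equation 8): sum of the U-distance ranks of the k nearest
    data-space neighbours N_k^D(i) of neuron i (no normalization applied)."""
    nearest = np.argsort(np.linalg.norm(W - W[i], axis=1))[1:k + 1]
    udist = u_distances(W, neighbours, i)
    order = sorted(udist, key=udist.get)          # neurons ordered by U-distance
    urank = {j: r + 1 for r, j in enumerate(order)}
    return sum(urank[j] for j in nearest)
```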

5 Data Sets

In order to demonstrate the effects on a wide variety of different training set topographies, we have used four synthetic training sets and two real life data sets. The synthetic data sets are called Hexa, Atom, Ball and Chainlink. All data sets may be obtained from the authors at http://www.mathematik.uni-marburg.de/~databionics. All data sets are standardized, i.e. the mean of each variable is 0 and the variance is 1.

Hexa: Three-dimensional points in six well separated clusters of equal size. The six clusters span the axes of the three-dimensional space.

Chainlink: This data set consists of 1000 three-dimensional points that are arranged in two separated clusters of equal size. Each cluster is bounded by a torus; the clusters are located in space like two links of a chain. This data set has been used to demonstrate that ESOM have different clustering properties than k-means clustering algorithms [Ultsch/Vetter 94].

Atom: The Atom data set consists of 800 three-dimensional points that are arranged in two separated classes of equal size: nucleus and shell. The shell class contains data within a spherical layer surrounding the nucleus at a positive distance. The shape of each class is regularly spherical. Therefore, there is no principal component axis with significantly higher variance. The shell may be seen as a principal surface [Ritter et al. 92, pp 233].

Ball: The Ball data set consists of 800 three-dimensional points that are arranged in a single class. This class forms a regular round cluster of equal density. Obviously, there is no principal axis, curve or manifold with significantly higher variance. The Ball data is a truly isotropic set of points.

The application data sets are Iris and Olive oils:

Iris: the well known data set of Fisher [Fisher 36]. The data set consists of analytical data from 150 flowers; four attributes of different flower lengths are used for description. There are three classes predefined in the data set. One class is well separated whereas the two remaining classes overlap on every attribute. A principal component analysis shows that there is one principal component axis with significantly larger variance.

Olive oils: the data set consists of analytical data from 572 Italian olive oils produced in nine different regions. Six concentrations of different fatty acids describe each oil. A principal component analysis shows that there is one principal component axis with significantly larger variance. This data set has also been used in [Zupan et al. 94].
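The exact data sets are available from the authors at the URL above; purely as an illustration of what "isotropic" means here, the following sketch generates rough stand-ins for the Ball and Atom sets (all sizes, radii and noise levels are invented for the example and do not describe the original data).

```python
import numpy as np

rng = np.random.default_rng(0)

def ball_data(n=800, dim=3):
    """Rough stand-in for the Ball set: points of roughly equal density inside
    a unit ball, hence isotropic with no preferred principal direction."""
    directions = rng.normal(size=(n, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(size=(n, 1)) ** (1.0 / dim)   # uniform in volume
    return directions * radii

def atom_data(n=800, dim=3, shell_radius=3.0):
    """Rough stand-in for the Atom set: half of the points form a small central
    nucleus, the other half a thin spherical shell at a positive distance."""
    nucleus = ball_data(n // 2, dim)
    directions = rng.normal(size=(n - n // 2, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    shell = directions * (shell_radius + rng.normal(scale=0.1, size=(n - n // 2, 1)))
    return np.vstack([nucleus, shell])
```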

6 Experiments

For the data sets described above we investigate the effects of the type of grid, i.e. hexgrid vs. quadgrid, and of the shape, i.e. square vs. rectangular grid structures. Borderless ESOM with a toroid topology and nearly the same number of neurons are used for all experiments. Four different kinds of maps are considered:


hexsquare:        64 × 64 neurons with a hexgrid map
hexrectangular:   50 × 82 neurons with a hexgrid map
quadsquare:       64 × 64 neurons with a quadgrid map
quadrectangular:  50 × 82 neurons with a quadgrid map

The learning rate is kept constant at 0.1 and the neighbourhood kernel has a regular bubble shape (see [Herrmann 03]). In order to compare two or more ESOM with different map-spaces, the number of modified neurons should be the same for all maps during the training phase; otherwise one kind of ESOM might gain an advantage. A constant number of modified neurons can be achieved through a normalization of the sizes of the learning environments (the sets of neurons modified in an elementary learning step): let Γ denote the number of learning epochs, u_l the size of the learning environment for l = 1, ..., Γ, and #I the number of neurons. Then (Σ_{l=1}^{Γ} u_l) / (Γ · #I) should be the same for every ESOM. This results in the following learning radii:

map        distance     initial radius   final radius   (Σ u_l) / (Γ · #I)
64 × 64    Euclidean    30.312           3              ≈ 0.262573
64 × 64    hexagonal    31               3              ≈ 0.262561
50 × 82    Euclidean    30.63            3              ≈ 0.262573
50 × 82    hexagonal    31.9             3              ≈ 0.262573

As measure of the forward projection error FPE(i) of each neuron i the Minimal-U-Ranking was used: FPE(i) = mur(i). As measure of the backward projection error BPE(i) of each neuron i Zrehen's measure was used: BPE(i) = Z(i). For each data set and ESOM type the training was repeated 200 times. One may regard the resulting ratings as random values produced by the training algorithm of the ESOM. To show the significance of differences in the outcomes of the different experiments, a Kolmogorov-Smirnov test (KS-test) was used. The KS-test has the advantage of making no assumption about the distribution of the data; it is a non-parametric and distribution-free test. The null hypothesis for the KS-test was that there are no differences in the projection errors of ESOM of the different types. All test results reported below were significant at an α = 5% level; p-values are given in the appendix. The distributions of FPE and BPE were visualized using PDE plots [Ultsch 03a]. PDE plots show an estimate of the probability density distributions using an optimal information density estimation.
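The paper does not name an implementation of the test; as an illustration, the two-sample KS-test from SciPy could be used to compare the 200 per-run error ratings of two ESOM variants.

```python
from scipy.stats import ks_2samp

def compare_error_distributions(errors_a, errors_b, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on the projection-error ratings of
    two ESOM variants (e.g. 200 repeated trainings each). The null hypothesis
    is that both samples stem from the same distribution."""
    result = ks_2samp(errors_a, errors_b)
    return result.pvalue, result.pvalue < alpha   # p-value, significant at alpha?

# Hypothetical usage with two arrays of per-run FPE ratings:
# p, significant = compare_error_distributions(fpe_quadsquare, fpe_hexsquare)
```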

7 Results

7.1 Quadgrid vs. hexgrid

For each data set it has to be judged whether hexgrid or quadgrid maps lead to lower ratings on average. Obviously, this has to be done separately for square and rectangular maps. The evaluations of the resulting tests (at a significance level of α = 5%) can be found in Table 1; an additional visualization of the median changes can be found in Figure 3. If one uses hexgrid maps instead of quadgrid maps, there is no general effect on the number or intensity of backward projection errors: six out of twelve tests show that quadgrid maps lead to bigger error values, whereas four tests show that quadgrid maps lead to smaller error values. In contrast, hexgrid maps lead to fewer or less intense forward foldings in nearly all cases (eleven of twelve tests).

Figure 3: Changes of the median (a) forward and (b) backward projection errors, shown as percentage improvement for the transitions quadgrid to hexgrid and square to rectangular on each data set. Rectangular shaped maps usually lead to superior performance. See also Tables 1 and 2.

It has to be mentioned that the latter effect may be produced by the Minimal-U-Ranking itself: on maps with hexgrid arrays, nodes have more connections to their adjacent nodes than on quadgrid ones. Therefore, U-distances are usually tighter on hexgrid arrays than on quadgrid arrays (see Section 4): U-distances become smaller on hexgrid maps because fewer nodes (and therefore fewer data-space distances) are added along a minimal path. The intensity of this tightening effect is not yet known. Therefore, a final evaluation is postponed until further investigations allow a re-examination of the ratings produced by the Minimal-U-Ranking.

7.2 Square vs. rectangular

For each data set it has to be judged whether square or rectangular maps lead to lower ratings on average. Obviously, this has to be done separately for hexgrid and quadgrid maps. The evaluations of the resulting tests (at a significance level of α = 5%) can be found in Table 2; an additional visualization of the median changes can be found in Figure 3. Compared with square map-spaces, rectangular ones produce fewer or less intense forward foldings in all cases. In contrast to Section 7.1, there is no tightening effect here because maps with the same grid structure are compared against each other. This means that there is a highly significant effect on the forward projection errors that can be attributed to the shape of the ESOM. Seven out of twelve tests show a decreasing number or intensity of backward projection errors if one uses rectangular instead of square maps; five tests show opposite results. This means that there is no significant effect on backward projection errors that can be attributed to the shape of the ESOM.

8 Discussion

Most measures for topography preservation misinterpret gaps of the data-space as backward projection errors of the ESOM. Therefore we used Zrehen's measure for backward projection errors, since it is gap invariant due to its definition based on a Voronoi tessellation of the input space. In order to measure forward projection errors without distortion by gaps in the data space, the Minimal-U-Ranking measure was introduced here. It depends on a rank-based metric defined via minimal-length paths (the U-distance) that is used to compare data- and output-space topographies.

Comparing hexgrid and quadgrid maps, there is no significant difference in backward projection errors. For forward projection errors, hexgrids are slightly better. This may be attributed, however, to effects of the measure (Minimal-U-Ranking) itself: grids with more connections between adjacent nodes usually lead to shorter minimal-length paths between pairs of nodes. Therefore, U-distances are usually per se smaller on hexgrid maps than on quadgrid maps. This leads to an improved coincidence of data- and map-space neighbourhoods (see Section 7.1). Scott compared the effect of using a hexgrid instead of a quadgrid for histograms in two dimensions [Scott 88]. He found that the effect is rather small: quadgrids are 2% less effective than hexgrids. This coincides with our observation that the error reduction is much bigger for rectangular vs. square maps than for hexgrid vs. quadgrid maps. In our opinion, the simplicity of subsequent display and processing with off-the-shelf programs therefore favours the usage of quadgrid maps.

It has been demonstrated here that ESOM built on rectangular map spaces lead to smaller projection errors than ESOM with square map spaces. This oblongation effect may be explainable by the so-called automatic dimension selection [Ritter et al. 92] [Ritter/Schulten 88]: the principal component axis of the data-space with the largest variance is mapped onto the longest axis of the ESOM. For ESOM with square grids, the principal axis of the data-space can be mapped in two orthogonal directions on the grid. During the training of the ESOM this conflict appears and has to be resolved; reorganization of some parts of the map may be necessary. As a consequence, less training time can be spent on the reduction of topographic projection errors, which may lead to less accurate maps.

Still, there is an open question concerning isotropic data. Isotropic data sets, for example the Atom or Ball data shown above, have no principal component axis with prominent variance. In case of the Ball data, there is no principal axis, curve or manifold of biggest variance. Despite this, topographic errors occur less often or less intensely on rectangular shaped ESOM than on square maps. Further research is necessary to understand the reasons behind this effect.

9 Summary

In this work the effects of different architectures of emergent self-organizing maps (ESOM) on the reduction of projection errors were investigated. To avoid unwanted border effects, maps should be unbounded, i.e. fold back onto themselves. Useful unbounded maps are, for example, embedded in a toroid manifold. Repeated experiments with a set of very different artificial and real world data sets were undertaken. The performance of the maps was evaluated with a carefully selected measure for backward projection errors (Zrehen's measure) and a specially designed measure for forward projection errors (Minimal-U-Ranking). In contrast to others, these two measures are not biased by gaps in the data space. Using a hexgrid (honeycomb like) lattice as layout of the neurons on the map space showed no convincing advantage over quadgrid (trellis like) maps. For some data sets, hexgrids even had bigger folding errors than quadgrids. The easy implementation of quadgrids using arrays and the usability of standard plotting routines therefore favour quadgrid ESOM. The comparison of maps with a unit length-to-width ratio (square maps) to maps with a nonuniform ratio (rectangular maps) showed, however, a substantial difference: in all experiments rectangular maps outperformed square maps. This improvement was far more pronounced than any hexgrid-to-quadgrid effect. Even for isotropic data with no preferred primary direction, rectangular maps were found to be superior to square ones. Finding an explanation for this effect and deriving the length-to-width ratio from the structure of the data set are identified as interesting directions for further research.

References

[Bauer/Pawelzik 92] H.-U. Bauer, K. Pawelzik, Quantifying the neighborhood preservation of Self-Organizing Maps, in: IEEE Transactions on Neural Networks, 3(4), 1992
[Bauer et al. 99] H.-U. Bauer, M. Herrmann, Th. Villmann, Neural Maps and Topographic Vector Quantization, in: Neural Networks, 12, Elsevier, 1999
[Bezdek/Pal 93] J. C. Bezdek, N. R. Pal, An Index of Topological Preservation and its Application to Self-Organizing Feature Maps, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN'93), IEEE Service Center, 1993
[Durbin/Mitchison 90] R. Durbin, G. Mitchison, A dimension reduction framework for understanding cortical maps, in: Nature, 343, Nature Publishing Group, London, 1990
[Fisher 36] R. A. Fisher, The use of multiple measurements in taxonomic problems, in: Annals of Eugenics, 7, 1936
[Goodhill et al. 95] G. Goodhill, S. Finch, T. Sejnowski, A unifying measure for neighbourhood preservation in topographic mappings, in: Proceedings of the 2nd Joint Symposium on Neural Computation, University of California, San Diego and California Institute of Technology, 1995
[Herrmann 03] L. Herrmann, Selbstorganisation in Gitterstrukturen, Diploma Thesis, Philipps University of Marburg, 2003
[Kaski 97] S. Kaski, Data Exploration Using Self-Organizing Maps, PhD Thesis, Helsinki University of Technology, 1997
[Kohonen 82] T. Kohonen, Self-organized formation of topologically correct feature maps, in: Biological Cybernetics, 43, Springer, 1982
[Kohonen 97] T. Kohonen, Self-Organizing Maps, Springer, Berlin, 1997
[Martinetz/Schulten 94] T. Martinetz, K. Schulten, Topology Representing Networks, in: Neural Networks, 7(3), Elsevier, 1994
[Ritter/Schulten 88] H. Ritter, K. Schulten, Convergence Properties of Kohonen's Topology Conserving Maps: Fluctuations, Stability and Dimension Selection, in: Biological Cybernetics, 60, Springer, 1988
[Ritter et al. 92] H. Ritter, Th. Martinetz, K. Schulten, Neural Computation and Self-Organizing Maps - An Introduction, Addison-Wesley, New York, 1992
[Scott 88] D. W. Scott, A Note on Choice of Bivariate Histogram Bin Shape, in: Journal of Official Statistics, 4(1), Sweden, 1988
[Ultsch/Siemon 90] A. Ultsch, H. P. Siemon, Kohonen's Self Organizing Feature Maps for Exploratory Data Analysis, in: Proceedings of the International Neural Network Conference (INNC'90), Kluwer, Dordrecht, 1990
[Ultsch/Vetter 94] A. Ultsch, C. Vetter, Selforganizing Feature Maps versus Statistical Clustering: A Benchmark, Research Report No. 9, Department of Mathematics, University of Marburg, 1994
[Ultsch 99a] A. Ultsch, Data Mining and Knowledge Discovery with Emergent Self-Organizing Feature Maps for Multivariate Time Series, in: Kohonen Maps, E. Oja, S. Kaski (eds.), 1999
[Ultsch 03a] A. Ultsch, Optimal density estimation in data containing clusters of unknown structure, Technical Report, Philipps University of Marburg, 2003
[Ultsch 03b] A. Ultsch, Maps for the Visualization of high-dimensional Data Spaces, in: Proceedings of the Workshop on Self-Organizing Maps, pp 225-230, Kyushu, Japan, 2003
[Venna/Kaski 01] J. Venna, S. Kaski, Neighborhood preservation in nonlinear projection methods: An experimental study, in: Artificial Neural Networks - ICANN 2001, Springer, Berlin, 2001
[Villmann et al. 94] Th. Villmann, R. Der, Th. Martinetz, A new quantitative measure for topology preservation in Kohonen's feature maps, in: Proceedings of the IEEE International Conference on Neural Networks (ICNN'94), Orlando, Florida, 1994
[Voronoi 1908] G. Voronoi, Nouvelles applications des paramètres continus à la théorie des formes quadratiques, in: Journal für die reine und angewandte Mathematik, 134, 1908
[Zrehen 93] S. Zrehen, Analyzing Kohonen Maps With Geometry, in: Proceedings of the International Conference on Artificial Neural Networks (ICANN'93), Springer, London, 1993
[Zupan et al. 94] J. Zupan, M. Novic, X. Li, J. Gasteiger, Classification of Multicomponent Analytical Data of Olive Oils using Different Neural Networks, in: Analytica Chimica Acta, 292, pp 219-234, 1994

A Additional plots and tables

Each outcome is marked with a symbol, where < denotes that quadgrid ESOM lead to smaller values, > denotes that quadgrid ESOM lead to bigger values, ≈ denotes an undecided test.

             forward   p-value           backward   p-value
Hexa         >         2.005·10^-37      <          0
             >         2.2120·10^-46     <          2.0386·10^-4
Chainlink    >         4.8013·10^-75     ≈          1.5338·10^-3 / 0.056135
             >         4.0873·10^-34     <          1.7591·10^-27
Atom         <         1.3969·10^-21     >          5.3802·10^-32
             >         0.00014775        <          6.2522·10^-5
Ball         >         5.3802·10^-32     >          6.0193·10^-29
             >         3.1914·10^-64     >          1.3839·10^-87
Iris         >         5.3111·10^-27     ≈          0.83527 / 0.19790
             >         3.3551·10^-35     >          1.2853·10^-12
Olive oils   >         2.0470·10^-16     >          1.59668·10^-5
             >         2.8044·10^-14     >          2.6537·10^-70

Table 1: Ratings of quadgrid maps compared with hexgrid maps (two tests per data set: one for square and one for rectangular maps).

Each outcome is marked with a symbol, where < denotes that square maps lead to smaller values, > denotes that square maps lead to bigger values, ≈ denotes an undecided test.

             forward   p-value           backward   p-value
Hexa         >         1.6384·10^-44     <          4.3093·10^-174
             >         9.4694·10^-36     >          3.6054·10^-174
Chainlink    >         1.1807·10^-72     >          6.0349·10^-30
             >         7.2509·10^-72     >          3.1722·10^-52
Atom         >         1.6124·10^-32     <          1.7591·10^-27
             >         9.1801·10^-24     <          4.7653·10^-16
Ball         >         6.76668·10^-53    <          1.2853·10^-12
             >         7.5117·10^-76     >          5.3374·10^-85
Iris         >         7.4062·10^-86     >          2.6537·10^-70
             >         7.4062·10^-86     >          1.7557·10^-63
Olive oils   >         1.7843·10^-77     ≈          0.32465 / 0.37531
             >         7.5117·10^-76     >          1.5815·10^-69

Table 2: Ratings of square maps compared with rectangular maps (two tests per data set: one for hexgrid and one for quadgrid maps).

Figure 4: Plots on the left: forward projection errors; plots on the right: backward projection errors, for the Atom, Ball and Chainlink data. Each panel shows the estimated error distributions (PDE plots) of the four map types.

Figure 5: Plots on the left: forward projection errors; plots on the right: backward projection errors, for the Hexa, Iris and Olive oils data. Each panel shows the estimated error distributions (PDE plots) of the four map types.