Hierarchical Growing Cell Structures: TreeGCS
Victoria J. Hodge & Jim Austin
Dept. of Computer Science, University of York, Heslington, York, UK, YO10 5DD
Email: [email protected]

Abstract
We propose a hierarchical, unsupervised clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of Fritzke. Our algorithm remedies an instability in the GCS algorithm, whereby the network topology is susceptible to the ordering of the input vectors. We demonstrate the improved stability of the GCS foundation achieved by alternating the input vector order on each presentation. We evaluate our automatically produced cluster hierarchy against that generated by an ascendant hierarchical clustering dendrogram, using a small dataset to illustrate how our approach emulates the hierarchical clustering of the dendrogram regardless of the input vector order.
1 Introduction
The ability to introduce hierarchically structured knowledge into a system using autonomous learning has received much attention. It bypasses the need to extract the hierarchy from human subjects, which is time-consuming and often introduces inconsistencies into the hierarchy; a dynamic, flexible and automatic hierarchical clustering approach is required. The structure may then be used by the system for extracting trends, identifying generalisations, rapid searching or calculating similarities using distances within the structure.
Clustering may be defined as the process of partitioning a multi-dimensional input space into clusters of similar objects. Similarity is determined by a metric over the object attributes, frequently Euclidean distance. Clustering covers a vast range of methodologies; see, for example, Everitt [1] or Song & Lee [2]. It is used in a wide variety of application areas including information retrieval, natural language processing, image processing and pattern recognition. However, many methods suffer from at least one of the following: they assume specific forms (for example, normal distributions) for the input space; they do not scale well; or they require a priori knowledge of the data to specify the cluster structure and parameter settings.

Hierarchical clustering superimposes a structure onto the clusters below; the generality of the clusters increases monotonically from the leaf nodes to the root. Probably the most commonly used statistical hierarchical clustering approach is the dendrogram. In this method there is initially one data point per cluster. The algorithm iteratively determines the smallest distance between any two clusters and merges those two clusters, producing a branch in the cluster tree; merging is repeated until only one cluster is left. However, dendrograms cannot be visualised for large datasets, as the diagram is simply too complex. We therefore feel the dendrogram is ideal for structure and cluster comparison on the small dataset used here, but would not be suitable for the learning system outlined in the first paragraph.

Self-organising maps (SOMs) [3] and hierarchical SOMs (HSOMs) [3] are the commonest connectionist approach. SOMs induce a topographic map of the input data space, mapping each input vector to a best matching node on a lattice structure. HSOMs are multi-layered SOMs in which each node in a lattice points to an entire SOM lattice on the layer below, representing more detailed concepts. HSOMs typically require a priori knowledge of the input distribution to allow a suitable topology to be chosen. Song & Lee [2] have produced the SAINT system, which automatically determines a hierarchy composed of SOM lattices and removes the need to pre-specify the structure. However, the topology within the SOM lattices fixes the number of neighbours attached to each node, losing flexibility. We need a hierarchical methodology that automatically and dynamically determines the cluster hierarchy, does not confine units to a strict lattice structure and allows the network to split into separate clusters.

In this paper we propose and develop such an algorithm, TreeGCS, based on Fritzke's GCS neural network [4]. TreeGCS is an unsupervised, growing, self-organising hierarchy of nodes able to form discrete clusters. High-dimensional inputs are mapped onto a 2-D hierarchy reflecting the topological ordering of the input space. We selected GCS as our foundation rather than Growing Neural Gas (GNG) [5] or Dynamic Cell Structures (DCS) [6] because we can set the dimensionality of the cell structure; GNG and DCS maintain the input data dimensionality in the network, precluding visualisation and inhibiting the superimposition of a cluster hierarchy.

Our algorithm is similar to the HiGS of [7]. HiGS is a top-down, self-organising, growing hierarchy that maps the input vector distribution onto a 2-D hierarchical cluster structure. HiGS inserts nodes to reduce the network error, deletes nodes mapped onto by few input vectors and grows downwards when each layer accurately maps the input space. However, the structure generated by HiGS does not meet our requirements, as its topology is not a tree configuration: each parent node is a member of a cluster of at least three nodes.
2 GCS
We describe GCS, which maps the input vector space onto the 2-D cell structure by mapping each input vector to the best matching cell [4]. The initial topology is a triangle of cells (neurons). Each cell has a neighbourhood, defined as those cells linked to it by a vertex. Each cell has an attached vector denoting the cell's position in the input vector space; topologically close cells have similar attached vectors. Each cell also has a winning counter denoting the number of times it has been the best matching unit (bmu). The GCS learning algorithm is described below for one iteration (one input vector); an epoch is an iteration run for each input vector.

1. A random triangular structure of connected cells is initialised.
2. The next random input vector is selected from the input vector density distribution.
3. The bmu is determined and its winning counter is incremented by 1.
4. The bmu and its topological neighbours are adapted towards the input vector to make them more similar to it.
5. If the number of iterations exceeds a specified threshold, a new cell is inserted (see Figure 1) next to the cell with the highest winning count: a high count indicates that the bmu under-represents the input distribution, and insertion redistributes the winning counts more evenly.
6. After a pre-specified number of iterations, the cell with the greatest mean Euclidean distance between itself and its topological neighbours is deleted, together with any cells within its neighbourhood that would otherwise be left dangling (see Figure 2).
7. The winning counter of every cell is decreased by a user-specified factor to implement temporal decay.

Figure 1. A new cell is inserted at each step.

Figure 2. Cell A is deleted. Cells B and C would be left dangling by the removal of the five connections surrounding A, so B and C are also deleted.

The user-specified parameters are: the maximum number of neighbour connections per cell, the maximum number of cells in the structure, the adaptation step for the winning cell (εb), the adaptation step of the neighbourhood (εn), the temporal decay factor (α), the number of iterations between insertions and the number of iterations between deletions. GCS runs for a user-specified number of epochs.
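As an illustration, the following minimal Python sketch implements steps 2-4 of the iteration above (bmu search and adaptation) for the initial triangle of cells. The data structures, the function name `adapt` and the random inputs are our own illustrative assumptions, not Fritzke's implementation; cell insertion, deletion and edge maintenance (steps 5-7) are omitted.

```python
# Sketch of GCS steps 2-4: bmu search and adaptation towards the input.
import numpy as np

rng = np.random.default_rng(0)

# The initial triangle: three cells, each with an attached 2-D position
# vector, a winning counter and a set of topological neighbours.
positions = rng.random((3, 2))            # cell vectors in input space
win_count = np.zeros(3)                   # times each cell has been the bmu
neighbours = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}

def adapt(x, eps_b=0.02, eps_n=0.002):
    """Present one input vector x: find the bmu, increment its winning
    counter and move it (and its neighbours) towards x."""
    dists = np.linalg.norm(positions - x, axis=1)
    bmu = int(np.argmin(dists))           # best matching unit
    win_count[bmu] += 1
    positions[bmu] += eps_b * (x - positions[bmu])
    for n in neighbours[bmu]:
        positions[n] += eps_n * (x - positions[n])
    return bmu

for x in rng.random((100, 2)):            # one epoch over 100 random inputs
    adapt(x)
print(win_count)
```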
2.1 GCS Evaluation
The run-time complexity of GCS is as follows: for each input vector, the bmu is determined by calculating the distance between the input vector and every cell's attached vector, and this entire calculation is performed for each epoch. The run-time for GCS is thus O(cells × dimension × numberInputs × epochs).

During evaluation of the GCS algorithm, we discovered that it is susceptible to the order of the input vectors: if the order of the vectors is rearranged, the cluster topology alters. The algorithm is consistent but unstable. We define consistency as generating the same cell structure when the algorithm is run repeatedly with the same input vector order, and stability as generating the same cell structure when the algorithm is run repeatedly with the same set of input vectors but in different orders. We improved the stability by cycling through different data orders. In this paper, we use three data orderings to illustrate GCS's susceptibility to order and how cycling improves stability: GCS reads the first data order for one epoch, the second for the next, the third for the next, then returns to the first, and so on until the requisite number of epochs has been completed. This enables the different data orders to counteract each other and introduces more stability, as demonstrated in section 3.
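The cyclic presentation itself is straightforward; the following sketch shows the scheme, assuming a `run_epoch` function that performs one full GCS epoch over a given ordering (the function and its name are our own placeholder, not part of the GCS specification).

```python
# A minimal sketch of cycling input-vector orderings between epochs.
from itertools import cycle

def train_cyclic(orderings, epochs, run_epoch):
    """Alternate between the given orderings of the same input vectors,
    one ordering per epoch, so that order-specific topology changes
    counteract each other."""
    order_cycle = cycle(orderings)        # 1st, 2nd, 3rd, 1st, ...
    for _ in range(epochs):
        run_epoch(next(order_cycle))

# Usage: three orderings of the same vectors, e.g. alphabetical, rotated
# and sorted on the first attribute, cycled for 30,000 epochs:
# train_cyclic([order1, order2, order3], 30000, run_epoch)
```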
2.2 TreeGCS
The tree is constructed from, and superimposed onto, the standard GCS algorithm detailed above. The tree has one leaf node representing each GCS cluster. As the GCS structure splits into clusters, new nodes are added to the tree to mirror the splitting; if a GCS cluster is removed, the corresponding tree node is removed. No constraints are imposed on the tree: it is dynamic, progressively adapting to the underlying cell structure, and requires no prior data knowledge. The tree is updated once per GCS epoch. The running time is O(cells), as we breadth-first search (BFS) through the entire cell structure. The algorithm is given below; a code sketch of the cluster-discovery step follows at the end of this section.

1. Execute the GCS epoch, forming a (possibly disconnected) graph whose components are the disjoint GCS clusters.
2. BFS from a GCS cell to find which other cells are present in that cell's cluster. While some cells in the GCS remain unprocessed, BFS from the next unprocessed cell to find which cells are present in its cluster.
3. If the number of clusters has increased from the previous epoch, a GCS cluster may have split, so find whether any tree nodes point to multiple clusters (see Figure 3) and add child nodes for each new cluster formed. The cluster list of the parent is deleted and cluster lists are updated for the child nodes¹. Alternatively, a new cluster may have been formed from cells inserted during the current epoch, in which case a new child node is added to the root and the cluster's cells are added to the new node's list.
4. Else, if the number of clusters has decreased, a cluster has been deleted; the associated tree node is deleted and the tree is tidied to remove any redundancy (see Figure 4).
5. For each unprocessed cluster, the tree node that points to that cluster is determined, the cluster list is emptied and all cells from the cluster are added.

Figure 3. Illustrating a cluster split and tree update.

Figure 4. Illustrating a cluster deletion.

The GCS cells are then labelled: each input vector is presented to the GCS and the identifier of the bmu is returned; the bmu can then be labelled with the input vector (or another label as appropriate).

¹ Only leaf nodes maintain a cluster list. A parent's cluster list is implicitly the union of its children's cluster lists and is not stored, for efficiency.
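For illustration, a minimal sketch of the cluster-discovery step (step 2, the BFS over the cell structure) follows; the adjacency representation and the function name `find_clusters` are our own choices, not the paper's implementation.

```python
# Discover the disjoint GCS clusters by breadth-first search over the
# cell connectivity graph, in time linear in the cells and their links.
from collections import deque

def find_clusters(neighbours):
    """neighbours maps each cell id to the set of cells it is linked to.
    Returns a list of clusters, each a set of cell ids."""
    unprocessed = set(neighbours)
    clusters = []
    while unprocessed:                    # BFS from the next unprocessed cell
        start = unprocessed.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            for n in neighbours[queue.popleft()]:
                if n in unprocessed:
                    unprocessed.discard(n)
                    cluster.add(n)
                    queue.append(n)
        clusters.append(cluster)
    return clusters

# Two disjoint components: {0, 1, 2} and {3, 4}.
print(find_clusters({0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}))
```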
3 Evaluation
We initially demonstrate our stability improvement and qualitatively evaluate TreeGCS against an ascendant hierarchical dendrogram utilising Ward's algorithm for 41 countries in Europe. In evaluations by Mangiameli [8], Ward's method proved superior to six other hierarchical methods with respect to classification accuracy, so we selected this approach for our structure comparison. In Ward's algorithm, the two clusters to merge at each step are selected to minimise the increase in an objective function (the error sum of squares), producing a branch in the dendrogram tree; thus, information loss is minimised.

Each of the 41 countries was represented by a 47-attribute, real-valued vector with no missing values: $x_i = (x_{i1}, x_{i2}, \ldots, x_{i47}) \in \mathbb{R}^{47}$ and EuroData $= \{x_1, x_2, \ldots, x_{41}\}$. The attributes were geographical, population, economic, communication and transportation factors obtained from the World Factbook [9]. The small dataset allows simple visualisation and comparison. We use the dataset to demonstrate the stability improvements we have made and also to demonstrate TreeGCS's ability to accurately map the dataset onto a 2-D hierarchy emulating the dendrogram similarity clustering.

The parameter settings for TreeGCS were: εb = 0.02, εn = 0.002, α = 0.0002 and up to 10 neighbours per cell, as these values produced the maximal consistency during a brief empirical assessment. The maximum number of cells in the structure was set to 123 to ensure maximal spread with minimal redundancy. We set the number of iterations between insertions to 1 to ensure all input vectors are used for cell insertion in turn, and the number of iterations between each deletion to 5,000 to ensure maximal adaptation before any cells are deleted. The algorithm was set to run for 30,000 epochs for a thorough evaluation of stability; it took on average 440 seconds to run 30,000 epochs on a 180 MHz MIPS R10000 processor for the European data described.

We use three different orderings of the data:
1. Alphabetical order of the country names.
2. The second half of the alphabetical-order file moved to the front.
3. The vectors sorted on the first attribute.

We demonstrate the improvement in stability introduced by cycling through the three data orders. We produced nine hierarchies: three were produced using a single input data order (single-pass), numbered 1, 2 and 3 as above, and six were produced by cycling through the data orders in the sequences 123, 132, 231, 213, 312 and 321.
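For reference, the dendrogram baseline can be reproduced with a standard library; the sketch below uses SciPy's implementation of Ward's method, with random data standing in for the EuroData matrix, which is not reproduced here.

```python
# A sketch of the ascendant (agglomerative) dendrogram baseline using
# Ward's criterion, as provided by SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
data = rng.random((41, 47))               # placeholder for EuroData

# Each merge minimises the increase in the error sum of squares.
Z = linkage(data, method="ward")

# Cutting the tree into three clusters corresponds to c1, c2, c3 in section 4.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```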
4 Results
The dendrogram produced the following three-cluster hierarchy; c1 is the cluster for node 1 in the tree, etc.

c1. {Den, Fr, Ger, It, UK}
c2. {Lux}
c3. {Al, An, Au, Be, Bo, Bu, Cr, Cy, Cz, Ei, Es, Fa, Fi, Gi, Gr, Hu, Ic, La, Li, Lt, Ma, Ml, Mo, NL, No, Po, Ro, SM, Se, Sl, Sn, Sp, Sw, Sz, Ur}
In Table 1 we compare each of the three single-pass hierarchies and each of the six cyclic hierarchies to the dendrogram hierarchy above. The first row indicates whether the hierarchy had the {Den, Fr, Ger, It, UK} cluster as a separate half of the hierarchy and how many additional countries were included in this cluster; an 'X' indicates that the cluster was not present in the hierarchy produced. Row 2 indicates whether Luxembourg formed a separate cluster; an 'X' indicates that this cluster was not formed. Row 3 gives the number of clusters that the 'other' countries (cluster c3 from the dendrogram hierarchy) are separated into, row 4 the total number of clusters and row 5 the overall depth of the hierarchy. The diagrams of the hierarchies and a list of the countries present in each cluster are given in [10].
            Single pass    Cyclic (three input orders)
            1    2    3    123  132  231  213  312  321
DFGIU       X    +1   +6   +2   +4   +1   +2   +2   +3
Lux         X    X    X    X    X    X    X    X    X
Others      X    5    3    1    2    1    3    4    2
Clusters    4    6    4    2    3    2    4    5    3
Depth       3    5    3    1    2    1    3    4    3

Table 1. Comparison of the dendrogram hierarchy and the hierarchies produced by TreeGCS.
5 Analysis
The cyclic approach is more stable than the single-pass approach: the different orderings seem to counteract each other and cancel out the hierarchical variations. Cyclic TreeGCS consistently finds a {Den, Fr, Ger, It, NL, Sp, UK} cluster, with the remaining countries forming the other half of the hierarchy. The single-pass approach does not consistently separate such a cluster (the alphabetical order from the single-pass has an 'X' in the first row of the table, as this cluster was not in a separate half of the hierarchy), and there is little consistency for the single-pass approaches in the clusters formed from the 36 countries not in cluster c1 of the dendrogram hierarchy.

Comparing the cyclic hierarchies with the dendrogram, the dendrogram similarities are emulated. However, cyclic TreeGCS includes Sp and NL in its cluster {Den, Fr, Ger, It, NL, Sp, UK}, unlike the dendrogram's {Den, Fr, Ger, It, UK}; cyclic TreeGCS only omits NL from this cluster for the 231-cyclic hierarchy. From the dendrogram, two of the most similar countries to {Den, Fr, Ger, It, UK} are Sp and NL, so this is neither unexpected nor undesirable. The two anomalies in cyclic TreeGCS are the inclusion of Be for the 321-cyclic hierarchy, and of Be and Sw for the 132-cyclic hierarchy, in {Den, Fr, Ger, It, NL, Sp, UK}; these slight discrepancies are minor in comparison with the inconsistencies of the single-pass approach. The cyclic approach has improved the stability while maintaining consistency: it generally separates {Den, Fr, Ger, It, NL, Sp, UK} and forms similar clusters for the 34 countries in the other half of the hierarchy.
6 Conclusion
We have introduced a hierarchical, dynamically formed clustering neural network, extending and refining GCS and partially overcoming an instability problem inherent in GCS. Our TreeGCS algorithm adaptively determines the depth of the cluster hierarchy; there is no requirement to pre-specify network dimensions, as with SOM-based algorithms. Our superimposed tree adapts to any variations in the cell structure below, and there are no user-specified parameters for the hierarchy. The clustering produced by the cyclic variant is similar to the dendrogram, while being able to handle identical similarities and maintaining the superiority of the SOM mapping approach posited by Mangiameli [8].

In many cases the instability in GCS is due to the innate dynamism of the network and is caused by repeated deletion and reinstatement of the same clusters. This is difficult to detect at run-time unless a cluster history is maintained, and eliminating the instability completely would lose the benefits of GCS, so we aim for a dynamic hierarchy that is as stable as possible while maintaining those benefits. A further advantage of our approach over dendrograms is that the leaf clusters represent groups of input vectors, so the visualisation does not show a branch for each data point; it is this per-point branching that precludes visualising dendrograms for large datasets.
Acknowledgement
This research was supported by an EPSRC studentship.
References
[1] B. S. Everitt. Cluster Analysis. 1993.
[2] H-H. Song & S-W. Lee. A Self-Organizing Neural Tree for Large-Set Pattern Classification. IEEE Transactions on Neural Networks, 9(3), 1998.
[3] T. Kohonen. Self-Organizing Maps. 2nd edition, Springer, 1997.
[4] B. Fritzke. Growing Cell Structures - a Self-Organizing Network for Unsupervised and Supervised Learning. TR-93-026, ICSI, Berkeley, 1993.
[5] B. Fritzke. A Growing Neural Gas Network Learns Topologies. In NIPS*94, MIT Press, 1995.
[6] J. Bruske & G. Sommer. Dynamic Cell Structure Learns Perfectly Topology Preserving Map. Neural Computation, 7(4), 1995.
[7] V. Burzevski & C. K. Mohan. Hierarchical Growing Cell Structures. Technical Report, Syracuse University, 1996.
[8] P. Mangiameli, S. Chen & D. West. A Comparison of SOM Neural Network and Hierarchical Clustering Methods. European Journal of Operational Research, 1996.
[9] World Factbook 1997. http://www.odci.gov/cia/publications/factbook/country-frame.html
[10] V. Hodge & J. Austin. Hierarchical Growing Cell Structures: TreeGCS. Submitted to IEEE TKDE Special Issue on Connectionist Models for Learning in Structured Domains.