Class-Proximity SOM and Its Applications in Classification

Pitoyo Hartono and Aya Saito
Department of Media Architecture, Future University-Hakodate, Hakodate, Japan
[email protected]

Abstract—In this study, we propose a model of Self-Organizing Map (SOM) capable of mapping high dimensional data into a low dimensional space while preserving not only the feature-proximity of the original data but also their class-proximity. A conventional SOM is known to map original high dimensional data with similar features into points located close to each other in a low dimensional map, the so called competitive layer. In addition to this property, the proposed SOM maps high dimensional data belonging to the same class into each other's proximity. These characteristics retain the map's usefulness as a visualization tool for high dimensional data while also supporting high quality pattern classification in the low dimensional map. In the experiments, the classification performance of the proposed SOM is compared with that of MLP on a wide variety of problems.
I. INTRODUCTION

In this study, we propose a model of Self-Organizing Map (SOM) [1][2], namely Class-Proximity SOM (CPSOM), which is capable of preserving not only the topological relationship of high dimensional data but also their class-proximity in a low dimensional map. In the conventional SOM, data are distributed in the map based only on their features, so that original high dimensional data with similar features are mapped into each other's neighborhoods. In this study, the learning process of CPSOM generates a low dimensional map where original data that belong to the same class and at the same time have similar features are aligned in each other's proximity, thus forming a cluster. By introducing class as one of the mapping criteria, the original topological characteristics are not necessarily preserved. However, within a given cluster, the proximity relationship between the data remains true. A simple classification method similar to Learning Vector Quantization (LVQ) [3] or Nearest Neighbor [4] can then be executed in the generated map to classify an unlabeled sample. Thus, the objective of this study is to propose a new topological map that can be used not only for visualization purposes but also for data classification. So far, several studies have utilized SOM for pattern classification. For example, the Self-Organizing Relationship (SOR) Network in [5][6] is a model of SOM that introduced a "learning with a critic" method to achieve function approximation. The proposed CPSOM differs from SOR in its learning objective, in that CPSOM is trained to form class-clusters in the map, while the SOR network is trained to
approximate the input-output relation of the data. Vector Quantized Temporal Associative Memory (VQTAM) [7] is a modified SOM that learns the relationship of temporal data, which differs from our proposed SOM in both learning method and objective. We are also aware of several models of SOM that regularize the alignment of high dimensional original data in the low dimensional map. ViSOM, proposed in [8], regularizes the interneuron distance to control the resolution of the generated map. Probabilistic Regularized SOM (PRSOM) [9], an improvement over ViSOM, is capable of realizing multidimensional scaling (MDS). CPSOM differs from these models in that it utilizes the class of the data as a learning parameter to generate the map. In this paper, the structure and the learning method of the proposed CPSOM are explained in Section II, where the mapping characteristics are also illustrated. In Section III, the classification accuracy of CPSOM is tested with several benchmark problems and compared with other classifiers. The conclusion is given in the final section.

II. CPSOM

The structure of CPSOM, which is identical to that of the Self-Organizing Relationship (SOR) Network proposed in [5][6], is shown in Fig. 1. CPSOM is composed of an input layer that is divided into a feature part and a class part, and a competitive layer where several neurons are arranged in a two dimensional array. These neurons are the reference vectors for the high dimensional data and can be expressed as points in a low dimensional map. Given labeled data, the goal of CPSOM is to generate a topological map where original data with similar features and the same label are distributed in each other's neighborhood, thus forming a cluster of reference vectors of data belonging to the same class. The feature part of the input layer receives a sample vector drawn from the data, while the class part receives the label (also expressed as a vector) of that sample. The feature vector of the drawn sample and its associated class vector are expressed as follows.
X^s = {x_1^s, x_2^s, ..., x_n^s} ∈ R^n
C^s = {c_1^s, ..., c_k^s} ∈ {0, 1}^k    (1)
Fig. 1. Structure of CPSOM
X^s and C^s are the feature vector and the associated class vector of the sample s, respectively. In Eq. 1, n is the dimension of the feature vector while k is the number of classes. A class vector is expressed as a k dimensional binary vector with a single element equal to 1, whose position indicates the class of the sample. For example, when X^s belongs to a class a, the components of the class vector are as follows.
c_a^s = 1
c_i^s = 0  (i ≠ a)    (2)
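As a concrete illustration, such a one-hot class vector can be built as in the following minimal Python sketch (the function name and the use of NumPy are our own illustrative choices, not part of the paper).

```python
import numpy as np

def class_vector(a: int, k: int) -> np.ndarray:
    """Build the k-dimensional binary class vector C^s of Eq. 2,
    where only the component for class a (0-indexed here) is 1."""
    c = np.zeros(k)
    c[a] = 1.0
    return c

# Example: k = 3 classes, sample belongs to class 1 -> [0. 1. 0.]
print(class_vector(1, 3))
```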
The feature part and class part of the input layer are connected to the i-th neuron in the competitive layer with the weight vectors W_i = {w_{1i}, w_{2i}, ..., w_{ni}} and U_i = {u_{1i}, ..., u_{ki}}, respectively.
A. Learning Algorithm of CPSOM

The goal of the learning process of CPSOM is to form a map in the competitive layer where original data with similar features that belong to the same class are mapped within one cluster. This objective differs from that of the conventional SOM, where the classes of the data are not considered. Given the mapping characteristics of CPSOM, a classification method similar to LVQ or Nearest Neighbor can then be executed in the map to decide the class of an unlabeled sample. It should be noted that these methods are not necessarily suitable for the conventional SOM, because there data belonging to different classes are aligned close to each other whenever they have similar features.

In the learning process, given a feature vector X^s with a class vector C^s, the distance between the sample and the i-th neuron in the competitive layer, D(s, i), is defined as follows.

D_T(s, i) = (1 − η(T)) dist_f(s, i) + η(T) dist_c(s, i)    (3)

dist_f(s, i) = (1/n) Σ_{j=1}^{n} |x_j^s − w_{ji}|    (4)

dist_c(s, i) = (1/k) Σ_{j=1}^{k} |c_j^s − u_{ji}|    (5)

η(T) = 0.5/(T + 1) + 0.5    (6)

T in Eq. 3 and Eq. 6 denotes the number of training epochs, where one training epoch contains the presentation of all the samples in the data in a random order. In Eq. 3, dist_f(s, i) and dist_c(s, i) are the feature distance and class distance between the sample s and the i-th neuron, respectively. As opposed to the conventional SOM, the distance in Eq. 3 is a function of time: the weighting coefficients of the feature and class distances change as the learning process progresses. From Eq. 3 and Eq. 6, the distance between a sample s and the i-th neuron in the competitive layer is a weighted sum of the feature distance and the class distance. With this distance, the class vector is prioritized in the early phase of the training process. As training progresses, the coefficient of the class distance gradually decreases while that of the feature distance increases, until eventually the feature vector and class vector are evenly weighted in calculating the distance between a sample and the winner neuron.

After the presentation of the s-th sample, a winner neuron is selected from the total of M neurons in the competitive layer as follows.

win = arg min_{j ∈ {1, ..., M}} D(s, j)    (7)
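The following is a minimal NumPy sketch of the distance calculation and winner selection of Eqs. 3-7. The (M, n) and (M, k) weight-matrix layout and the function names are our assumptions, not the paper's.

```python
import numpy as np

def eta(T: int) -> float:
    # Eq. 6: starts at 1 (class distance dominates) and decays toward 0.5
    return 0.5 / (T + 1) + 0.5

def winner(x, c, W, U, T):
    """Select the winner for sample (x, c) at epoch T.
    W: (M, n) feature weights; U: (M, k) class weights."""
    dist_f = np.mean(np.abs(W - x), axis=1)  # Eq. 4, one value per neuron
    dist_c = np.mean(np.abs(U - c), axis=1)  # Eq. 5
    e = eta(T)
    D = (1.0 - e) * dist_f + e * dist_c      # Eq. 3
    return int(np.argmin(D)), D              # Eq. 7: smallest distance wins
```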
The consequences of the distance function in Eq. 3 and the selection method in Eq. 7 are that, in the early phase of the learning process, the original data are mapped to the competitive layer based only on their classes, implying that data belonging to the same class are mapped onto one particular neuron in the competitive layer regardless of their features. With the progress of the learning process, it is expected that the neighborhood of that particular neuron expands to form a cluster of winner neurons representing original data belonging to the same class. In CPSOM, every neuron in the competitive layer has so called "class counters" which are updated at every presentation of a sample. When the sample presented at time t belongs to class c, the class counter of the i-th neuron with regard to class c ∈ {1, ..., k}, P_i^c, is updated as follows.

P_i^c(t + 1) = P_i^c(t) + η(t)(1 − dist_c(s(t), i))
P_i^m(t + 1) = P_i^m(t)  (m ≠ c)    (8)
In Eq.8, s(t) is the sample drawn at time t.
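A minimal sketch of the counter update of Eq. 8, assuming the counters are held in an (M, k) array P (our layout) and that dist_c holds the class distance of every neuron to the presented sample:

```python
import numpy as np

def update_counters(P, c, dist_c, eta_t):
    """Eq. 8: every neuron's counter for the presented class c grows in
    proportion to how small its class distance is; counters for the
    other classes are left unchanged. P: (M, k), initialized to 1."""
    P[:, c] += eta_t * (1.0 - dist_c)  # dist_c: (M,) class distances
    return P
```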
According to Eq. 8, the counter of a neuron with respect to the class of the presented sample is greatly increased if the class distance between that neuron and the class vector of the sample is small. The increase is small if the class distance is large, and the counters for unrelated classes stay the same. Every class counter is initialized to 1, and the weight vector from the class part of the input layer is initialized to ensure that 0 ≤ dist_c(s, i) ≤ 1. It is easy to see that an initial bias in class distance will be enhanced, so a neuron that has a tendency to become a reference vector for data belonging to a certain class will further become an attractor for that class. The class counter is then used to calculate the likelihood λ_i^c that neuron i in the competitive layer is a reference vector of data belonging to class c, as follows.

λ_i^c(t) = P_i^c(t) / Σ_{j=1}^{k} P_i^j(t)    (9)
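The normalization in Eq. 9 can be done in one line; a sketch under the same (M, k) counter layout as above:

```python
import numpy as np

def likelihood(P):
    """Eq. 9: lambda_i^c, the likelihood that neuron i is a reference
    vector for class c, i.e. its counters normalized per neuron."""
    return P / P.sum(axis=1, keepdims=True)  # each row sums to 1
```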
In the learning phase, for a sample belonging to class c, the connection vectors of the winner neuron i = win(t) are updated as follows, in which α(t) is a decreasing learning coefficient.

w_{ji}(t + 1) = w_{ji}(t) + α(t) λ_i^c(t) (1 − η(t)) (x_j^s − w_{ji}(t))    (10)
u_{li}(t + 1) = u_{li}(t) + α(t) λ_i^c(t) η(t) (c_l^s − u_{li}(t))    (11)
For non-winner neurons, the corrections of the vectors are further weighted with a neighborhood function, as in the conventional SOM. The learning method above ensures that in the initial phase of the learning process the original data are mapped based only on their classes. Hence, there will be a few neurons in the competitive layer representing each class of the data. As the learning process progresses, these neurons become attractors for original data belonging to the same class. λ scales the learning intensity of a winner neuron according to the class-likelihood of that neuron: the intensity is high when the likelihood of the winner with regard to the class of the given sample is large, and low in the opposite case.
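A sketch of the winner update of Eqs. 10-11 in the same vectorized style as the snippets above (the neighborhood weighting applied to non-winner neurons is omitted here for brevity):

```python
import numpy as np

def update_winner(W, U, i, x, c_vec, c, lam, alpha_t, eta_t):
    """Eqs. 10-11: pull the winner's feature and class weights toward
    the sample, scaled by its class likelihood lam[i, c] (Eq. 9)."""
    W[i] += alpha_t * lam[i, c] * (1.0 - eta_t) * (x - W[i])  # Eq. 10
    U[i] += alpha_t * lam[i, c] * eta_t * (c_vec - U[i])      # Eq. 11
```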
Fig. 2. Double Spirals
B. Nearest Neighbor Classification in CPSOM

In the classification phase, given an unlabeled pattern X, we make the assumption that X belongs to class a; the class label of this unknown sample is thus set as follows.

C = {c_1, ..., c_k}
c_i = 1 (i = a),  c_i = 0 otherwise    (12)

A winner neuron in the competitive layer is decided based on Eq. 3 and Eq. 7, in which η is set to 0.5. Let the distance of X (assuming that it belongs to class a) to the winner neuron in the competitive layer be dist_a(X). Considering the characteristic of CPSOM, in which original data with similar features and the same class are mapped into a close cluster, it is logical to decide that the assumed class which generates the minimum distance is the correct class of the unlabeled sample. The classification method is formulated in Eq. 13.

class(X) = arg min_{j ∈ {1, ..., k}} dist_j(X)    (13)
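A minimal sketch of this classification rule, reusing the distance computation from the earlier snippets (function and variable names are ours):

```python
import numpy as np

def classify(x, W, U, k):
    """Eqs. 12-13: try every assumed label a, find the winner with eta
    fixed at 0.5, and return the label whose winner lies closest."""
    dist_f = np.mean(np.abs(W - x), axis=1)  # feature distance per neuron
    dists = []
    for a in range(k):
        c = np.zeros(k)
        c[a] = 1.0                            # Eq. 12: assumed class label
        dist_c = np.mean(np.abs(U - c), axis=1)
        D = 0.5 * dist_f + 0.5 * dist_c       # Eq. 3 with eta = 0.5
        dists.append(D.min())                 # dist_a(X): winner's distance
    return int(np.argmin(dists))              # Eq. 13
```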
C. Mapping Characteristics

To illustrate the mapping characteristics, we trained CPSOM on the two double-spirals problems shown in Fig. 2. In these problems, the regions colored in black and white are labeled differently. For visual comparison, the maps generated by SOM for these two problems are shown on the left sides of Fig. 3 and Fig. 4, while the maps generated by CPSOM are shown on the right sides. It is obvious that CPSOM distributes the data in the competitive layer according not only to the topology of the original data but also to their labels. The maps generated by CPSOM form two close clusters, each representing one class of the original data. The formation of clusters based on the data's classes simplifies the classification process in the generated map.

Fig. 3. Map of Double Spirals (a)
III. EXPERIMENTS

Figure 5 shows snapshots of the formation of a map in the competitive layer during three stages of the learning process for the glass classification problem [10].
Fig. 4. Map of Double Spirals (b)
This problem is a nine dimensional classification problem with six classes (originally seven classes, but class 4 is not represented in the data). From this figure we can see that in the early phase of the learning process, some neurons in the competitive layer became winners, each of them a reference vector for data belonging to a certain class. In the first graph in Fig. 5, the initial winners are marked with dotted circles. As the learning process progresses, clusters of winner neurons form around the initial winners. For comparison, Fig. 6 shows a conventional SOM of the same size as the proposed CPSOM. In the SOM, we cannot see the formation of clear clusters. As a further illustration, Fig. 7 shows the maps generated by CPSOM and SOM for the Iris classification problem [10], which is a four dimensional classification problem with three classes. It is also obvious from this figure that CPSOM preserves not only the feature-proximity but also the class-proximity of the original high dimensional data in the low dimensional map. Figure 8 shows the learning error, which is the average distance between a given sample and the winner neuron over one learning epoch. Figure 9 compares the generalization (classification) accuracy of CPSOM with that of MLP, K-means, and LVQ3 on several benchmark problems taken from the UCI repository [10]. The accuracies are calculated on samples that are not used in the training process. For each problem, the result is an average over 30 runs, in which the classification method in Eq. 13 is executed in the competitive layer of CPSOM. For the map formation in CPSOM, the learning rate α is gradually changed from 0.9 to 0.1 over the learning process, and the size of the map is commonly set to 25 × 25. It is obvious from the graph that CPSOM outperformed K-means and LVQ3 on all the problems. While CPSOM is outperformed by MLP on some of the problems, in general it is capable of reaching high generalization performance on a wide range of benchmark problems while also providing a means for data visualization.
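To make the training procedure concrete, the following hypothetical end-to-end loop combines the earlier snippets under the settings reported above (linear α decay from 0.9 to 0.1, a 25 × 25 map); the Gaussian neighborhood and its width schedule are our assumptions, as the paper does not specify them.

```python
import numpy as np

def train_cpsom(X, y, n_epochs=100, side=25, seed=0):
    """Hypothetical CPSOM training loop; schedules marked 'ours' are
    illustrative assumptions, the rest follows Eqs. 3-11."""
    rng = np.random.default_rng(seed)
    n, k = X.shape[1], int(y.max()) + 1
    M = side * side
    W = rng.random((M, n))                 # feature weights
    U = rng.random((M, k))                 # class weights
    P = np.ones((M, k))                    # class counters, init to 1
    grid = np.array([(r, q) for r in range(side) for q in range(side)])
    for T in range(n_epochs):
        alpha = 0.9 - 0.8 * T / max(n_epochs - 1, 1)   # 0.9 -> 0.1 (paper)
        sigma = 1.0 + (side / 2) * (1 - T / n_epochs)  # shrinking radius (ours)
        e = 0.5 / (T + 1) + 0.5                        # Eq. 6
        for s in rng.permutation(len(X)):
            x, c = X[s], int(y[s])
            c_vec = np.zeros(k); c_vec[c] = 1.0          # Eq. 2
            dist_f = np.mean(np.abs(W - x), axis=1)      # Eq. 4
            dist_c = np.mean(np.abs(U - c_vec), axis=1)  # Eq. 5
            win = np.argmin((1 - e) * dist_f + e * dist_c)  # Eqs. 3, 7
            P[:, c] += e * (1.0 - dist_c)                # Eq. 8
            lam = P[:, c] / P.sum(axis=1)                # Eq. 9
            h = np.exp(-np.sum((grid - grid[win]) ** 2, axis=1)
                       / (2 * sigma ** 2))               # neighborhood (ours)
            W += (alpha * lam * (1 - e) * h)[:, None] * (x - W)  # Eq. 10
            U += (alpha * lam * e * h)[:, None] * (c_vec - U)    # Eq. 11
    return W, U, P
```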
Fig. 5. Map of Glass (CPSOM)
Fig. 6. Map of Glass (SOM)
Fig. 7. Map of Iris
Fig. 8. Learning Error
Fig. 9. Performance Comparison
IV. CONCLUSION

In this research we proposed a modified SOM capable not only of preserving the topological relationship between high dimensional data but also their class-proximity. We are aware that introducing class as one of the criteria for calculating the distance between two samples alters the original topological relationship of the data. However, we also realize that while the preservation of the original topological relationship in a low dimensional map offers a fine method for visualization, it is not necessarily suitable for classification. The new topological relationship in this paper is introduced with a view to executing data classification in the low dimensional map. As opposed to the original SOM, the distance function in this research is not stationary: in the early phase of the learning process the classes
of the data are the main mapping criterion, and in the final phase the classes and the features of the data are evenly weighted. This distance function is the main factor in generating a close-cluster map, where the simple classification method can be implemented. In the experiments, the mapping characteristics of the proposed CPSOM are illustrated. The results of classification tasks on several benchmark problems are also given. While CPSOM does not dominate the other methods, it shows reliable performance over a wide range of benchmark problems. The application of the proposed CPSOM to real world problems such as image processing, mining of large-scale data, and visualization of anomalous data will be considered. In the future, the investigation of the mathematical aspects of CPSOM's mapping characteristics is of interest to us; in particular, the relation between the proposed map, its classification method, and kernel methods is one of our main research interests.

REFERENCES

[1] T. Kohonen, Self-Organized Formation of Topologically Correct Feature Maps, Biological Cybernetics, Vol. 43, pp. 59-69, 1982.
[2] T. Kohonen, Self-Organizing Maps, Springer-Verlag, 1995.
[3] T. Kohonen, G. Barna, and R. Chrisley, Statistical Pattern Recognition with Neural Networks: Benchmarking Studies, Proc. IEEE Int. Conf. on Neural Networks, Vol. 1, pp. 61-68, 1988.
[4] T. Cover and P. Hart, Nearest Neighbor Pattern Classification, IEEE Trans. on Information Theory, Vol. IT-13, No. 1, pp. 21-27, 1967.
[5] T. Koga, K. Horio, and T. Yamakawa, The Self-Organizing Relationship (SOR) Network Employing Fuzzy Inference Based Heuristic Evaluation, Neural Networks, Vol. 19, Nos. 6-7, 2006.
[6] T. Yamakawa and T. Horio, Self-Organizing Relationship (SOR) Network, IEICE Trans. on Fundamentals, Vol. E82-A, No. 8, pp. 1674-1677, 1999.
[7] G.A. Barreto and A.F.R. Araujo, Identification and Control of Dynamical Systems Using the Self-Organizing Map, IEEE Trans. on Neural Networks, Vol. 15, No. 5, pp. 1244-1259, 2004.
[8] H. Yin, ViSOM - A Novel Method for Multivariate Data Projection and Structure Visualization, IEEE Trans. on Neural Networks, Vol. 13, No. 1, pp. 237-243, 2002.
[9] S. Wu and W.S. Chow, PRSOM: A New Visualization Method by Hybridizing Multidimensional Scaling and Self-Organizing Map, IEEE Trans. on Neural Networks, Vol. 16, No. 6, pp. 1362-1380, 2005.
[10] UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html