Neural Information Processing - Letters and Reviews

Vol.9, No.1, October 2005

Modular Network SOM: Self-Organizing Maps in Function Space

Kazuhiro Tokunaga, Tetsuo Furukawa, and Syozo Yasui
Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology
2-4 Hibikino, Wakamatsu-ku, Kitakyushu 808-0196, Japan
E-mail: [email protected], {furukawa,yasui}@brain.kyutech.ac.jp
(Submitted on July 6, 2005)

Abstract - This study presents a new concept that generalizes the self-organizing map (SOM) by adopting the idea of a modular network, which we call the "modular network SOM" (mnSOM). In the mnSOM, each codebook vector of the conventional SOM is replaced by a functional module, which is a neural network. With the mnSOM, the application targets can be widely expanded from fields involving vectorized data to those dealing with more general classes of datasets relevant to functions, systems, time series and so on. In this paper, the idea, the architecture and the algorithms of the mnSOM are described, along with some simulation results.

Keywords - Self-organizing map, modular network, multilayer perceptron, learning, data clustering, data visualization

1. Introduction

Kohonen's self-organizing map (SOM) is an unsupervised learning algorithm that performs topology-preserving transformations from higher-dimensional vector data spaces to lower-dimensional map spaces [1, 2]. The SOM has become a powerful tool in many areas such as data mining, data analysis, data classification and data visualization, and many extensions have been proposed for individual application paradigms. Despite this increasing importance, the conventional SOM can only deal with vectorized data.

This study presents a concept that generalizes the SOM by introducing the modular network SOM (mnSOM). In an mnSOM, each codebook vector of the conventional SOM is replaced by a functional module constructed from a neural network (see Figure 1). The mnSOM therefore has an arrayed structure consisting of an assembly of functional modules, which can be designed at the user's discretion to suit the task, while the backbone algorithms of the SOM, e.g., the winner-take-all principle and the use of a neighborhood function, remain essentially untouched.

As such, mnSOMs have several advantages. First, the application targets are widely expanded from fields involving vectorized data to those dealing with more general classes of datasets relevant to functions, systems, time series and so on. Second, applying an mnSOM to a specific task basically requires only the choice of module type; mnSOMs thus provide a high degree of freedom and design flexibility to users. Finally, mnSOMs give functional data-processing capability to their nodal modules; in other words, a trained mnSOM can be used as an assembly of data processors.

While many types of trainable architectures are available for use as functional modules, in this paper we focus on the case in which the functional modules are multilayer perceptrons (MLPs). Since each MLP module represents a function or a system in terms of its input-output relationship, the datasets given to the mnSOM are the input-output data obtained from a group of systems. The MLP modules of the mnSOM are expected to learn and to identify each of the underlying systems, and the entire mnSOM is then expected to generate a feature map in which the systems are allocated according to their mutual distances in function space. An mnSOM with MLP modules is therefore an SOM in function space rather than vector space [3].


Figure 1. Concept of the mnSOM architecture.

In this paper, the idea, the architecture and the algorithms of the MLP-module mnSOM are presented, along with the results of two example simulations.

2. Theoretical Framework

To explain the algorithm of an mnSOM, let us consider a case in which the user tries to map a set of systems by using an MLP-module mnSOM. Suppose that there are $M$ systems $\{S_1, \ldots, S_M\}$, the input-output relationships of which are described by a set of functions $\{f_1(\cdot), \ldots, f_M(\cdot)\}$. These functions are assumed to be unknown a priori, but the datasets observed from the systems are assumed to be available. Thus, let $D_i = \{(x_{i,1}, y_{i,1}), \ldots, (x_{i,J}, y_{i,J})\}$ $(i = 1, \ldots, M)$ denote the dataset observed from the $i$-th system, each vector pair of which satisfies $y_{i,j} = f_i(x_{i,j})$.

Under these conditions, the aims of the mnSOM are (i) to identify the unknown functions $\{f_i(\cdot)\}_{i=1}^{M}$ from the observed datasets $\{D_i\}_{i=1}^{M}$ by using the MLP modules, and (ii) to generate a feature map of the $M$ systems in such a way as to reflect the mutual distances between the systems in function space. The first aim (i) means that the best matching module (BMM) of each system is expected to learn the input-output relation of that system, while the second aim (ii) means that if the $i$-th and the $i'$-th systems have similar characteristics, then the positions of the two corresponding BMMs are required to be near each other in the map space. Furthermore, the modules between the BMMs of the $i$-th and the $i'$-th systems are expected to become "intermediate systems" by interpolation. These tasks are processed in parallel.

The algorithm of the mnSOM is described below as iterative learning, each step of which consists of four processes: an evaluative process, a competitive process, a cooperative process, and an adaptive process.

In the evaluative process, the outputs of all mnSOM modules are evaluated for each input-output data vector pair. Suppose that an input data vector $x_{i,j}$ is picked up; then the outputs of the MLP modules $\{\hat{y}_{i,j}^{(1)}, \ldots, \hat{y}_{i,j}^{(N)}\}$ are calculated for that input, where $N$ denotes the number of modules. This calculation is repeated for all pairs of the sample datasets. Then the mean square errors $\{E_i^{(k)}\}$ are evaluated as follows:
$$E_i^{(k)} = \frac{1}{J} \sum_{j=1}^{J} \left\| y_{i,j} - \hat{y}_{i,j}^{(k)} \right\|^2 \qquad (1)$$
$$\;\;\, = \frac{1}{J} \sum_{j=1}^{J} \left\| f_i(x_{i,j}) - g^{(k)}(x_{i,j}) \right\|^2. \qquad (2)$$
Here, $g^{(k)}(\cdot)$ denotes the input-output function of the $k$-th MLP module. If $J$ is large enough, then the distance between the $k$-th module and the $i$-th system in function space is approximated by the error $E_i^{(k)}$ as follows:
$$L_2(f_i, g^{(k)}) = \int \left\| f_i(x) - g^{(k)}(x) \right\|^2 p_i(x)\,dx \simeq E_i^{(k)}. \qquad (3)$$
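As a concrete illustration of the evaluative process, the following sketch computes the error matrix of equation (1) in Python/NumPy. It is a minimal rendering under our own assumptions: `modules` is a hypothetical list of callables standing in for the trained MLP modules, and `datasets` holds the observed pairs $(X_i, Y_i)$.

```python
import numpy as np

def evaluate_modules(modules, datasets):
    """Evaluative process (Eq. 1): mean squared error of every
    module k on every dataset D_i.

    modules  : list of N callables, g[k](X) -> Y_hat (stand-ins for MLPs)
    datasets : list of M (X_i, Y_i) array pairs, each with J samples
    returns  : E with E[i, k] = E_i^(k)
    """
    M, N = len(datasets), len(modules)
    E = np.empty((M, N))
    for i, (X, Y) in enumerate(datasets):
        for k, g in enumerate(modules):
            Y_hat = g(X)                                  # module outputs for all J inputs
            E[i, k] = np.mean(np.sum((Y - Y_hat) ** 2, axis=1))
    return E
```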


Here, $p_i(x)$ denotes the probability density function of the input data vectors $\{x_{i,j}\}$ of the $i$-th system. It is assumed in this paper that $\{p_1(x), \ldots, p_M(x)\}$ are all approximately the same as a common density $p(x)$, due to normalization of the data distribution for each class.

In the competitive process, the module which reproduces $D_i$ best is defined as the BMM, i.e., the winner module for the $i$-th system. This is expressed mathematically as follows. Let $k_i^*$ be the module index of the BMM of the $i$-th system; then $k_i^*$ is defined as
$$k_i^* = \arg\min_k E_i^{(k)}. \qquad (4)$$

Thus, the winner module for the $i$-th system is the $k_i^*$-th module.

In the cooperative process, the learning rates of the modules are calculated by using the neighborhood function. Usually, a BMM and its neighboring modules gain larger learning rates than the other modules. Let $\psi_i^{(k)}(T)$ be the learning rate of the $k$-th module for the $i$-th system at the $T$-th iteration step. Then we have
$$\psi_i^{(k)}(T) = \frac{\exp\left[-l^2(k, k_i^*)/2\sigma^2(T)\right]}{\sum_{i'=1}^{M} \exp\left[-l^2(k, k_{i'}^*)/2\sigma^2(T)\right]}. \qquad (5)$$
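In code, the cooperative process amounts to a softmax-like normalization of Gaussian neighborhood values over the systems. The following NumPy sketch of equation (5) assumes the modules sit on a 2-D lattice whose coordinates are stored in a hypothetical array `grid`:

```python
import numpy as np

def learning_rates(bmm_idx, grid, sigma):
    """Cooperative process (Eq. 5): psi[i, k] is the learning rate of
    module k for system i at the current iteration step.

    bmm_idx : length-M array of winner-module indices k_i^*
    grid    : (N, 2) array of module coordinates in the map lattice
    sigma   : current neighborhood radius sigma(T)
    """
    # squared map-space distances l^2(k, k_i^*) for all i, k
    d2 = np.sum((grid[None, :, :] - grid[bmm_idx][:, None, :]) ** 2, axis=2)
    h = np.exp(-d2 / (2.0 * sigma ** 2))      # (M, N) neighborhood values
    return h / h.sum(axis=0, keepdims=True)   # normalize over systems (Eq. 5)
```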

Here, $l(k, k_i^*)$ expresses the distance between the $k$-th module and the BMM of the $i$-th system in the map space, whereas $\sigma(T)$ denotes the size of the neighborhood function, which shrinks with the iteration step $T$.

Finally, in the adaptive process, all MLP modules are trained by the backpropagation learning algorithm, as follows:
$$\Delta w^{(k)} = -\eta \sum_{i=1}^{M} \psi_i^{(k)} \frac{\partial E_i^{(k)}}{\partial w^{(k)}}. \qquad (6)$$
Here, $w^{(k)}$ is the weight vector of the $k$-th MLP module. The input vector $x_{i,j}$ and the corresponding output vector $y_{i,j}$ are presented as the input and the desired output, respectively. While the modules are trained by the backpropagation algorithm on all pairs of input-output data, the output errors are summed up to compute $E_i^{(k)}$ in parallel; thus, the adaptive process at time $T$ and the evaluative process at time $(T+1)$ progress simultaneously. These four processes are iterated until the entire network reaches a steady state.
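Putting the four processes together, one iteration of the mnSOM can be sketched as follows. This is a minimal PyTorch rendering under our own assumptions (plain gradient-descent update, no optimizer); the paper specifies only equations (1)-(6), not an implementation.

```python
import torch

def mnsom_iteration(modules, datasets, grid, sigma, eta):
    """One mnSOM training step: evaluative, competitive, cooperative
    and adaptive processes (Eqs. 1, 4, 5, 6).

    modules  : list of N torch.nn.Module MLPs
    datasets : list of M (X_i, Y_i) tensor pairs
    grid     : (N, 2) tensor of module positions in the map lattice
    """
    M, N = len(datasets), len(modules)

    # Evaluative process (Eq. 1): error of every module on every dataset.
    with torch.no_grad():
        E = torch.zeros(M, N)
        for i, (X, Y) in enumerate(datasets):
            for k, g in enumerate(modules):
                E[i, k] = ((Y - g(X)) ** 2).sum(dim=1).mean()

    # Competitive process (Eq. 4): winner (BMM) per system.
    bmm = E.argmin(dim=1)

    # Cooperative process (Eq. 5): normalized neighborhood learning rates.
    d2 = ((grid[None, :, :] - grid[bmm][:, None, :]) ** 2).sum(dim=2)
    h = torch.exp(-d2 / (2.0 * sigma ** 2))
    psi = h / h.sum(dim=0, keepdim=True)

    # Adaptive process (Eq. 6): each module descends the psi-weighted
    # sum of its per-system errors.
    for k, g in enumerate(modules):
        loss = sum(psi[i, k] * ((Y - g(X)) ** 2).sum(dim=1).mean()
                   for i, (X, Y) in enumerate(datasets))
        g.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in g.parameters():
                p -= eta * p.grad
    return E, bmm
```

A full run would call this step repeatedly while shrinking $\sigma(T)$, until the map reaches a steady state.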

3. Simulations and Results

A simulation study was undertaken for two example cases. The first example concerns a family of cubic functions, and the second involves a meteorological dataset.

3.1 The mnSOM of Cubic Functions

In the first simulation, the datasets were generated artificially by using cubic functions:
$$y_{i,j} = a_i x_{i,j}^3 + b_i x_{i,j}^2 + c_i x_{i,j}. \qquad (7)$$
By varying the values of $a_i$, $b_i$ and $c_i$, a family of six functions was generated ($M = 6$). Correspondingly, six datasets $\{D_1, \ldots, D_6\}$, each consisting of twenty pairs of input-output data, were sampled randomly, as shown in Figure 2; there were thus $M \times J = 6 \times 20$ pairs of data vectors. The probability density function of the input, $p(x)$, was uniform on $[-1, +1]$.

The mnSOM consisted of one hundred MLP modules arranged in a $10 \times 10$ lattice ($N = 100$). Each module was a three-layer MLP whose input, hidden and output layers had one, eight and one units, respectively.

Figure 3 shows the result. Each box corresponds to an MLP module, in which the function $g^{(k)}(x)$ acquired by that module is depicted. The numbered boxes represent the BMMs for the training datasets shown in Figure 2. These BMMs reproduced the original functions well, and all other modules acquired functions intermediate between the trained ones. As a result, the functions acquired by the MLP modules varied continuously in the map space; the mnSOM therefore succeeded in generating a feature map of cubic functions. Again, we emphasize that only randomly sampled data points were given to the mnSOM, and no other information was used.
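For concreteness, training data of this kind can be generated along the following lines. The six coefficient triples below are our own illustrative choice, since the paper does not list the values of $a_i$, $b_i$ and $c_i$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficient triples (a_i, b_i, c_i); the paper does not
# specify the six values it used.
coeffs = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.5, 0.5, 0.0),
          (0.0, 1.0, -0.5), (0.0, -1.0, 0.5), (0.3, -0.3, 0.8)]

J = 20                                        # samples per system
datasets = []
for a, b, c in coeffs:                        # M = 6 systems (Eq. 7)
    x = rng.uniform(-1.0, 1.0, size=(J, 1))   # p(x) uniform on [-1, +1]
    y = a * x**3 + b * x**2 + c * x
    datasets.append((x, y))
```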


Figure 2. Training datasets sampled from six cubic functions.

Figure 3. Map of the cubic function family generated by the mnSOM.

3.2 The mnSOM of Meteorological Dynamics

In the second simulation, the task of the mnSOM was to generate a map of the weather dynamics of the Kyushu Island district of Japan. The dataset, taken from a bulletin published by the Meteorological Agency of Japan, consists of daily records of weather attributes such as atmospheric pressure, temperature, humidity and sunlight hours during January 2000 at twenty cities on Kyushu Island, as shown in Figure 6(a). Figure 4 shows representative examples of the time series of each attribute. The time series of ten of the twenty cities (A-J in Figure 6(a)) were given to the mnSOM for training, whereas the data of the remaining ten cities (a-j in Figure 6(a)) were used for testing.


Figure 4. Examples of time series data of four weather attributes (January 1–31, 2000) for cities A, F, J of Figure 6(a).


Figure 5. Architecture of the individual MLP modules embedded in the mnSOM for learning the meteorological dynamics. The four attributes (P: atmospheric pressure, T: temperature, H: humidity, S: sunlight hours) of days n-2, n-1 and n are the inputs, the prediction for day n+1 is the output, and a bias part represents the average values.

The module architecture of the mnSOM is presented in Figure 5. The task of each MLP module was to predict the weather of the fourth day from input data consisting of the weather recorded at a city over the previous three days. The MLPs were trained by the backpropagation algorithm, with the actual weather of the fourth day at the given city as the teacher signal. Prior to the simulation, each time series was normalized so that its DC component was removed; each MLP module thus learns the AC components that reflect the weather dynamics fluctuating around the average. Furthermore, each module has an additional bias unit trained to represent the monthly average (DC) value of each weather attribute measured at each city. For this reason, this architecture is a hybrid of a conventional SOM and an mnSOM.
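One plausible reading of this module, sketched in PyTorch, is given below; the hidden-layer size, the activation function, and the exact way the DC bias part enters the output are our assumptions, as the paper does not state them.

```python
import torch
import torch.nn as nn

class WeatherModule(nn.Module):
    """One mnSOM module: predicts the AC components of the four weather
    attributes (P, T, H, S) on day n+1 from days n-2, n-1 and n, plus a
    trainable bias part for the monthly average (DC) values."""

    def __init__(self, hidden=16):              # hidden size is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * 4, hidden),            # 3 days x 4 attributes
            nn.Tanh(),
            nn.Linear(hidden, 4),                # AC prediction for day n+1
        )
        self.dc = nn.Parameter(torch.zeros(4))   # bias part: monthly averages

    def forward(self, x):
        return self.mlp(x) + self.dc             # AC prediction plus DC offset
```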


Figure 6. (a) Map of Japan's Kyushu Island. Weather data of cities A-J were used for training, and the data of cities a-j were used for testing. (b) The weather map of Kyushu Island generated by an mnSOM. The labeled boxes A-J and a-j correspond to the cities in Figure 6(a). The gray scale represents the average distance in function space between a module and its neighborhood.

After the training had concluded successfully, the network of the mnSOM was fixed, and the BMM of each test city was identified by presenting the test data to the mnSOM. A simulation result is shown in Figure 6(b); the labeled modules represent the BMMs determined for the training (A-J) and the test (a-j) cities of Figure 6(a). In the feature map of Figure 6(b), all the test cities were allocated at appropriate positions, in such a way as to interpolate between the BMMs of the training cities. The map organized by the mnSOM thus shows a continuous topological order consistent with the actual geography of Kyushu Island (Figure 6(a)); in other words, the mnSOM succeeded in generating a "weather map of Kyushu Island."
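Identifying the BMM of a held-out city reuses only the evaluative and competitive steps on the frozen network, as in the following self-contained sketch (the `modules` list again stands in for the trained MLP modules):

```python
import numpy as np

def test_bmm(modules, test_datasets):
    """Locate the BMM of each test city on the trained, frozen mnSOM
    (Eqs. 1 and 4 only; no weights are updated)."""
    bmms = []
    for X, Y in test_datasets:
        errors = [np.mean(np.sum((Y - g(X)) ** 2, axis=1)) for g in modules]
        bmms.append(int(np.argmin(errors)))   # winner module for this city
    return bmms
```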

4. Discussion and Closing Remarks

The mnSOM described in this study is an SOM in function space rather than in vector space. A conventional SOM might also be made to deal with functions and to produce maps similar to the ones presented here; however, this would require sampling the function values at regular intervals so that the functions could be treated as high-dimensional vectors. By contrast, the sampling can be random in the mnSOM method. The functions underlying the sampled data are not known a priori to the mnSOM network; incidentally, the family of cubic functions used in the simulation of Section 3.1 was known a priori to the present authors, but not to the network.

As in the conventional SOM paradigm, the winner-take-all principle and the neighborhood synergism are coupled to produce topological clustering. Moreover, the modules other than the BMMs serve to interpolate the gaps between the BMMs. This topological continuity underlies the generalization ability of mnSOMs.

We have undertaken several experiments in which the mnSOM module is a different type of neural network, such as a recurrent neural network (RNN) [7], an autoassociative neural network [8] or an SOM [9]. If


an RNN is employed as the functional module of an mnSOM, then the mnSOM acquires the ability to deal with dynamical systems; we have already reported several such works, including cases of linear damped oscillatory systems and nonlinear neuronal systems [6]. On the other hand, if an autoassociative neural network or an SOM is employed, then the mnSOM acquires the ability to deal with manifolds. Many other subarchitectures are thus conceivable as possible functional modules of the mnSOM.

Acknowledgement: This study has been supported in part by a grant (Center #J19) provided from the 21st Century Center of Excellence Program to Kyushu Institute of Technology.

References

[1] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, Vol. 43, pp. 59-69, 1982.
[2] T. Kohonen, Self-Organizing Maps, Springer-Verlag, 1995.
[3] K. Tokunaga, T. Furukawa and S. Yasui, "Modular network SOM: Extension of SOM to the realm of function space," Proc. of 4th Workshop on Self-Organizing Maps, pp. 173-178, 2003.
[4] T. Kohonen, S. Kaski and H. Lappalainen, "Self-organized formation of various invariant-feature filters in the adaptive-subspace SOM," Neural Computation, Vol. 9, pp. 1321-1344, 1997.
[5] T. Kohonen, "Generalizations of the self-organizing map," Proc. of Int. Joint Conference on Neural Networks, Vol. 1, pp. 457-462, 1993.
[6] T. Furukawa, K. Tokunaga, S. Kaneko, K. Kimotsuki and S. Yasui, "Generalized self-organizing maps (mnSOM) for dealing with dynamical systems," Proc. of Int. Symposium on Nonlinear Theory and its Applications, pp. 231-234, 2004.
[7] S. Kaneko, K. Tokunaga and T. Furukawa, "Modular network SOM: The architecture, the algorithm and applications to nonlinear dynamical systems," Proc. of 5th Workshop on Self-Organizing Maps, 2005, in press.
[8] K. Tokunaga and T. Furukawa, "Nonlinear ASSOM constituted of autoassociative neural modules," Proc. of 5th Workshop on Self-Organizing Maps, 2005, in press.
[9] T. Furukawa, "SOM of SOMs: Self-organizing map which maps a group of self-organizing maps," Proc. of Int. Conference on Artificial Neural Networks, 2005, in press.

Kazuhiro Tokunaga was born in 1978. He received the B.E. and M.E. degrees in control engineering and science from Kyushu Institute of Technology (Kyutech) in 2001 and 2003, respectively. He won the Best Master Thesis Award for work that led to the present paper. He is presently a Ph.D. candidate in the 21st Century COE Program at Kyutech. His current research interests include machine learning, evolutionary computing and bioinformatics.

Tetsuo Furukawa was born in Japan in 1963. He received a Ph.D. from Osaka University in 1998. He is currently an associate professor at Kyutech. His research interests concern both pure and applied aspects of artificial as well as biological neural networks; recently he has been especially interested in theoretical and applied areas relevant to self-organizing systems.


Syozo Yasui, currently a professor at Kyutech, received his Ph.D. from MIT in 1973. He served as the president of JNNS during 2003-2004. His recent research interests include neural network pruning, selective binding and internal model formation, with applications to ICA and analogical learning.
