Self-Organizing Maps for Pattern Classification

Ph.D. Confirmation Report
D.A.S. Atukorale
Department of Computer Science and Electrical Engineering
University of Queensland, QLD 4072, Australia
16th March 1999

EXTENDED ABSTRACT

We are investigating novel architectures of self-organizing maps for pattern classification tasks. We started our research by investigating a recently proposed algorithm known as the neural gas (NG) algorithm. In this report, we propose an implicit ranking scheme to speed up the sequential implementation of the original NG algorithm, which used a time-consuming explicit ranking scheme. We looked at the NG algorithm rather than Kohonen's SOFM algorithm because the NG algorithm takes a smaller number of learning steps to converge, does not require any prior knowledge about the structure of the network (i.e. its topology), and has dynamics that can be characterized by a global cost function.

We then developed a hierarchical overlapped NG network architecture (HONG) to obtain a better classification on conflicting data. This is particularly important for totally unconstrained handwritten data, which contain conflicting information within the same class due to the various writing styles and instruments used. The HONG network architecture successively partitions the input space to avoid such situations by projecting the input data onto different upper-level NG networks. Since the training and testing samples are duplicated in the upper layers of the HONG architecture, we obtain multiple classifications for every sample. This allowed us to employ one of several classifier combination schemes to obtain the final classification.

Finally, we trained three different HONG network classifiers based on three different feature extraction methods. This allowed us to obtain multiple classifications again for the same data based on their extracted features. The final classification was made based on the overall confidence vectors generated by each of the three HONG classifiers. The proposed architecture was tested on handwritten numerals extracted from the well known NIST SD3 database and we were able to obtain excellent results which are comparable with the current published results for that database.

Currently we are using the confidence values generated by the different classifiers to combine them. We plan to interpret their outputs as fuzzy membership values or evidence values, which will enable us to use fuzzy rules or Dempster-Shafer theory of evidence techniques for better classifications. We will also investigate topology representation schemes for the new HONG network architecture, which will enable it to function as a feature map. This will allow us to analyse energy function based approaches to generate a topology representation of the HONG network architecture, which we could not do with Kohonen's SOFM algorithm.

This confirmation report is organized into eight sections, which discuss the research work done so far and the literature survey towards my Ph.D. thesis topic. In section 1, an introduction to the Self-Organizing Map (SOM) algorithm is given and the motivations that led to its origin are discussed. Even though two basically different feature mapping models can be found in the literature, here we focus our study only on Kohonen's model, which transforms an input signal pattern of arbitrary dimension into a lower dimensional discrete map in an orderly fashion. We then introduce the variants of the basic SOM algorithm, which are built to overcome some of the problems associated with the basic SOM algorithm. Finally, we introduce the Neural Gas (NG) algorithm as a better candidate in comparison to most of the other variants of the basic SOM algorithm. Despite all its advantages over the SOM network, the NG network algorithm suffers from a high time complexity in its sequential implementation. To overcome this problem, we introduced an implicit ranking scheme instead of the time-consuming explicit ranking scheme. In section 2, we discuss this problem and the modifications made to the basic NG algorithm. In section 3, we discuss the hierarchical overlapped architecture, which we built using the modified NG algorithm. This enabled us to obtain a better classification on conflicting data. The overlapped architecture helped us to obtain multiple classifications, and we employed the idea of confidence values in obtaining the final classification. In section 4, we discuss how multiple classifiers can be combined in order to obtain a better classification. Here we used three classifiers based on three different feature extraction methods and combined their classifications based on the confidence values generated by each of them. The current work enabled us to obtain an excellent recognition rate for the well known NIST SD3 database, and the results are summarized in section 5. Due to the above achievement, we were able to publish the papers listed in section 6. The remaining work towards completion of my candidature will be carried out according to the time plan given in section 7.


Contents

EXTENDED ABSTRACT
1 INTRODUCTION
  1.1 SOM Algorithm
    1.1.1 Competition
    1.1.2 Cooperation
    1.1.3 Synaptic Adaptation
  1.2 Variants of Basic SOM Algorithm
  1.3 Problems Associated with Basic SOM Algorithm
  1.4 The NG Algorithm
    1.4.1 Important Features of the NG Algorithm
2 MODIFIED NG ALGORITHM
  2.1 Problems Associated with the Original NG Algorithm
  2.2 Implicit Ranking Scheme
3 HIERARCHICAL OVERLAPPED ARCHITECTURE
  3.1 HONG Algorithm
4 COMBINATION OF CLASSIFIERS
  4.1 Introduction
    4.1.1 Classifier Fusion Algorithms
    4.1.2 Dynamic Classifier Selection Algorithms
  4.2 Use of Confidence Values in Combining Classifiers
5 EXPERIMENTAL RESULTS
  5.1 Data Sets and Other Parameters used in the Experiments
  5.2 NG and HONG Algorithm Results
  5.3 Feature Extraction Methods
  5.4 Classifier Combination Results
6 PUBLICATIONS
7 TIME PLAN
8 CONCLUSIONS
BIBLIOGRAPHY

1 INTRODUCTION

1.1 SOM Algorithm

In this study, we are investigating one of the special classes of Artificial Neural Networks (ANNs) known as Self-Organizing Maps (SOMs). These networks are based on competitive learning: the output units of the network, known as neurons, compete among themselves to be activated or fired. The output neuron that wins the competition is called the winning neuron (sometimes called the winner-takes-all neuron). In an SOM, the neurons are placed at the nodes of a lattice that is usually one or two dimensional. Higher dimensional maps are also possible but not as common. As the training takes place, the neurons become selectively tuned to various input patterns or classes of input patterns, known as stimuli.

The development of self-organizing maps as a neural model is motivated by a distinct feature of the human brain: the cerebral cortex maps different sensory inputs (motor, somatosensory, visual, auditory, etc.) onto corresponding cortical areas in an orderly fashion. In the literature, we can find two basically different feature mapping models based on this neurobiological motivation. The first model was originally proposed by Willshaw and von der Malsburg [1]. Their model is based on the idea that the geometrical proximity of presynaptic neurons is coded in the form of correlations in their electrical activity. These correlations in the postsynaptic lattice are used to connect neighboring presynaptic neurons to neighboring postsynaptic neurons, hence producing a continuous mapping. The second model was proposed by Kohonen [2-4]. His model belongs to the class of vector coding algorithms: it provides a topological mapping that optimally places a fixed number of vectors (or code words) into a higher dimensional input space, and thereby facilitates data compression. Kohonen's model appears to be more general than the Willshaw-von der Malsburg model in the sense that it is capable of performing data compression. Kohonen's model has also received much more attention in the literature than the Willshaw-von der Malsburg model.

From here onwards we discuss only Kohonen's SOM, which is also known as a Self-Organizing Feature Map (SOFM). The principal goal of the SOFM is to transform an input signal pattern of arbitrary dimension into a lower (possibly two) dimensional discrete map and to perform this transformation adaptively in a topologically ordered fashion. Figure 1 shows a typical setup of a two dimensional SOFM. The SOFM algorithm starts by randomly initializing the synaptic weights in the network; that is, no prior order is imposed on the feature map at initialization. Once the network has been properly initialized, there are three essential processes involved in the formation of the SOM [5, Chapter 9]:

1. Competition
2. Cooperation
3. Synaptic Adaptation

Figure 1: Two dimensional lattice of neurons in a SOFM. Each neuron in the lattice is fully connected to the input layer of the network (i.e. the input features).

1.1.1 Competition

For each input pattern, the neurons in the network compute their respective values of a discriminant function. This discriminant function provides the basis for competition among the neurons. The neuron with the smallest value of the discriminant function (in Euclidean space) is declared the winner of the competition.

Let us assume that m is the dimension of the input features and that v denotes an input pattern vector, i.e. v = [v_1, v_2, ..., v_m]^T. The synaptic weight vector of each neuron in the network has the same dimension as the input space. Let the synaptic weight vector of neuron i be denoted by w_i, i.e. w_i = [w_i1, w_i2, ..., w_im]^T. To find the best match of the input vector v with the synaptic weight vectors w_i, compare the distances between v and w_i for i = 1, 2, ..., N and select the smallest, where N is the total number of neurons in the network. The selection of the best matching (winning) neuron k for the input vector v can be written as

    k = arg min_i ||v - w_i||,   i = 1, 2, ..., N    (1)

1.1.2 Cooperation

The winning neuron k determines the spatial location of a topological neighborhood of excited neurons. According to neurobiological evidence (for lateral interaction), a neuron that is firing tends to excite the neurons in its immediate neighborhood more than those farther away from it. This observation led the SOFM algorithm to define the topological neighborhood around the winning neuron k as explained below.

Let d_ik denote the lateral distance between the winning neuron k and the excited neuron i. We can define the topological neighborhood h_ik as a unimodal function with the following two requirements: (i) it is symmetric about the maximum point defined by d_ik = 0, and (ii) its amplitude decreases monotonically with increasing lateral distance d_ik. A typical choice of h_ik that satisfies these requirements is the Gaussian function

    h_ik = exp(-d_ik^2 / (2σ^2))    (2)

where σ is the width of the topological neighborhood. Another feature of the SOFM algorithm is that the width σ shrinks with time; an exponential decay function is a popular choice for this. The σ reduction scheme ensures that the map actually approaches a neighborhood preserving final structure, provided that such a structure exists. If the topology of the output space does not match that of the data manifold, neighborhood violations are inevitable.
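As a concrete illustration, the following minimal Python sketch evaluates the Gaussian neighborhood of equation (2) together with an exponentially decaying width; the initial and final widths and the decay horizon are illustrative assumptions, not parameters taken from this report.

```python
import numpy as np

def neighborhood(d_ik, sigma):
    """Gaussian topological neighborhood h_ik of equation (2)."""
    return np.exp(-(d_ik ** 2) / (2.0 * sigma ** 2))

def sigma_at(n, sigma_i=5.0, sigma_f=0.5, n_max=10000):
    """Exponentially decaying neighborhood width (illustrative schedule)."""
    return sigma_i * (sigma_f / sigma_i) ** (n / n_max)

# Example: lateral distance 2 on the lattice, early vs. late in training
print(neighborhood(2.0, sigma_at(0)), neighborhood(2.0, sigma_at(9000)))
```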

1.1.3 Synaptic Adaptation

For the network to be self-organizing, the synaptic weight vector w_i of neuron i is required to change in relation to the input vector v. For this type of unsupervised learning, we can use a modified version of Hebbian learning by including a forgetting term g(y_i) w_i, where g(y_i) is some positive scalar function of the response y_i. We can define the change to the weight vector of neuron i in the lattice as

    Δw_i = η y_i v - g(y_i) w_i    (3)

where η is the learning rate of the SOFM algorithm. The first term on the right-hand side of equation (3) is the Hebbian term and the second term is the forgetting term. We can choose a linear function g(y_i) = η y_i, and we can further simplify equation (3) by setting y_i = h_ik. Finally, using a discrete-time formalism, given the synaptic weight vector w_i(n) of neuron i at time n, the updated weight vector w_i(n+1) at time n+1 is defined by

    w_i(n+1) = w_i(n) + η(n) h_ik(n) [v(n) - w_i(n)]    (4)

which is applied to all neurons in the lattice that lie inside the topological neighborhood of the winning neuron k. Equation (4) has the effect of moving the synaptic weight vector w_k of the winning neuron k, and those of the neurons in its vicinity, towards the input vector v. Upon repeated presentations of the training data, the synaptic weight vectors tend to follow the distribution of the input vectors due to the neighborhood updating. The SOFM algorithm therefore leads to a topological ordering of the feature map in the input space, in the sense that neurons that are adjacent in the lattice tend to have similar synaptic weight vectors. The basic SOFM algorithm is summarized in Table 1.


Table 1: Basic SOFM algorithm

SOM1: Choose random values for the initial weight vectors w_i(0). The only restriction is that the w_i(0) be different for each i = 1, 2, ..., N, where N is the number of neurons in the lattice.

SOM2: Select an input sample vector v from the input space.

SOM3: Find the best matching (winning) neuron k at time step n by using the Euclidean minimum-distance criterion

    k = arg min_i ||v(n) - w_i(n)||,   i = 1, 2, ..., N    (5)

SOM4: Update the synaptic weight vectors of all neurons centered around the winning neuron k using

    w_i(n+1) = w_i(n) + η(n) h_ik(n) [v(n) - w_i(n)]    (6)

where both η(n) and h_ik(n) are varied dynamically during learning as explained earlier.

SOM5: Repeat steps SOM2-SOM4 until no noticeable changes in the feature map are observed.
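To make the procedure in Table 1 concrete, here is a minimal Python sketch of the SOM1-SOM5 loop. It assumes a rectangular lattice whose lateral distances d_ik are measured between grid coordinates; the grid size, learning-rate schedule and width schedule are illustrative assumptions rather than values prescribed in this report.

```python
import numpy as np

def train_sofm(data, grid_h=10, grid_w=10, n_iter=10000,
               eta_i=0.5, eta_f=0.01, sigma_i=5.0, sigma_f=0.5, seed=0):
    rng = np.random.default_rng(seed)
    m = data.shape[1]
    # SOM1: random initial weights, one m-dimensional vector per lattice node
    weights = rng.random((grid_h * grid_w, m))
    # Lattice coordinates used for the lateral distances d_ik
    coords = np.array([(r, c) for r in range(grid_h) for c in range(grid_w)], float)
    for n in range(n_iter):
        eta = eta_i * (eta_f / eta_i) ** (n / n_iter)
        sigma = sigma_i * (sigma_f / sigma_i) ** (n / n_iter)
        # SOM2: pick a training sample
        v = data[rng.integers(len(data))]
        # SOM3: winner by the Euclidean minimum-distance criterion, equation (5)
        k = np.argmin(np.linalg.norm(v - weights, axis=1))
        # SOM4: Gaussian neighborhood on the lattice, update rule (6)
        d_ik = np.linalg.norm(coords - coords[k], axis=1)
        h_ik = np.exp(-d_ik ** 2 / (2.0 * sigma ** 2))
        weights += eta * h_ik[:, None] * (v - weights)
    return weights

# Usage: weights = train_sofm(np.random.rand(1000, 5))
```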

1.2 Variants of Basic SOM Algorithm

We have shown in the previous sections that the basic SOFM algorithm defines a nonparametric regression solution to a class of vector quantization problems and in that sense does not need any modifications. Nonetheless, there exist other problems where the SOM philosophy can be applied in various modified ways, for example pattern classification tasks, rather than just using it as a vector quantizer. There are a number of ways to define the matching of an input occurrence with the internal representation (e.g. different metrics [6]), and even the neighborhood of a neuron can be defined in many ways. Regarding the definitions of neighborhoods, several authors [7-17] have suggested that the definition of h_ik could be made dependent on intermediate results. Another idea is that adding new neurons to the network (i.e. making it grow) or deleting them will describe the probability density function of the input data more accurately.

Bauer et al. [17] proposed a growth algorithm called the Growing Self-Organizing Map (GSOM), which can adapt both the output space topology and the weight vectors. The GSOM algorithm starts with a two-neuron configuration, learns using the basic SOFM algorithm, adds neurons to the output space according to a criterion, learns again, and keeps repeating these operations until a pre-specified maximum number of neurons has been distributed. Growth can be achieved either by adding neurons in one of the directions already spanned by the output space or by adding a new dimension, the choice being made on the basis of the fluctuations within the masked Voronoi cells of the neurons. In this model, Bauer et al. decompose the reconstruction error (i.e. v - w_i) along the different directions which result from projecting the output space back into the input space. This reconstruction error is used as the criterion in the growth algorithm to add neurons in the direction which has, on average, the largest error amplitude. The GSOM algorithm restricts the output space structure to the shape of a general hypercube, with the overall dimensionality of the grid and its extensions along the different directions being subject to adaptation.

Another adaptive self-organizing neural tree, called the Structure-Parameter-Adaptive (SPA) neural tree, has been proposed by Li et al. [13]. Tree structured classifiers have been widely used in pattern recognition tasks and have demonstrated excellent results in the literature. The SPA neural tree can adapt to a changing environment both parametrically and structurally. In this architecture no structural constraints are specified for the neurons within the same level; that is, neurons in the same level are not ordered as a one or two dimensional array. The SPA neural tree begins with an empty structure; neurons are added to the tree when the error rate exceeds a threshold, and some neurons are deleted if they remain inactive for a long period. It uses a vigilance factor to control the creation of new neurons and a threshold factor to control the splitting of neurons into more neurons. An operational measure is used to control the deletion of neurons from the tree. In the SPA neural tree architecture, the neurons of a subtree have similar synaptic weight vectors, which reflects that the architecture can be used as a hierarchical classifier.

Fritzke [7-11] has also proposed an alternative SOM structure called the Growing Cell Structures (GCS), which has the ability to automatically find a problem specific network structure through a growth process. Here the structure is modified dynamically by insertion and deletion of neurons. This is done during a self-organizing process which is similar to that in Kohonen's SOFM model. The main advantage of this model is that the network size does not have to be specified in advance; instead, the growth process can be continued until an arbitrary performance criterion is met.

Hierarchical maps, supervised SOMs, adaptive-subspace SOMs and SOMs where the neighborhoods are defined in the signal space are a few other variants of the basic SOM algorithm, which are discussed in more detail in [2]. In this report, we investigate an algorithm called the Neural Gas (NG) algorithm, in which the neighborhoods are defined in the input signal space. When the input vector distribution has a prominent shape, the results of the best match computations tend to be concentrated on a fraction of the neurons in the map, whereas if the input vector distribution is more uniform, the weight vector set adapts neatly to the input data. Because of this, researchers have abandoned the definition of topologies on the basis of spatial adjacency relations of the network and have instead defined the neighborhoods according to the relative magnitudes of the vectorial differences in the input space. Kangas et al. [18] used the minimal spanning tree (MST) architecture to define the neighborhood relationships, and Martinetz et al. [19] used the neural gas (NG) algorithm in this regard, which is discussed in more detail later.

1.3 Problems Associated with Basic SOM Algorithm

Kohonen's feature map is a special method for conserving the topological relationships in input data, but it has some limitations. For example, in Kohonen's model the neighborhood relations between neural units have to be defined in advance. Also, the topology of the output space has to match that of the input space which is to be represented. That is, the property of neighborhood preservation, which distinguishes self-organizing maps from other neural network paradigms, depends on the choice of the output space map topology. However, in real world data sets the proper dimensionality required by the input space is usually not known a priori, yet the output grid of the lattice has to be specified prior to learning. To tackle this problem, one can use an advanced learning scheme which adapts not only the weight vectors of the neurons but also the topology of the output space itself. Some examples of such algorithms include topology representing networks [20, 21], the growing cell structures algorithm [9], the SPA neural tree algorithm [13] and the growing hypercubical output space algorithm [17], which we have discussed in more detail in sub-section 1.2.

In addition, the dynamics of the SOFM algorithm cannot be described as a stochastic gradient descent on any single energy function. The only solution to this problem currently is to describe the dynamics of the algorithm as a set of energy functions, one for each weight vector [22]. This approach was first suggested by Tolat [23], and later Erwin et al. [22, 24, 25] extended it to arbitrary dimensions.

1.4 The NG Algorithm

The neural gas (NG) algorithm solves most of the identified problems associated with the basic SOM algorithm. Martinetz et al. [19, 26] proposed the NG network algorithm for vector quantization, prediction and topology representation a few years ago. The NG network model: (i) converges quickly to low distortion errors, (ii) reaches a distortion error lower than that resulting from K-means clustering, maximum-entropy clustering and Kohonen's SOFM, and (iii) obeys a gradient descent on an energy surface.

Similar to the SOFM algorithm, the NG algorithm uses a soft-max adaptation rule (i.e. it adjusts not only the winning reference vector but all cluster centers, depending on their proximity to the input signal). This is mainly to generate the topographic map and also to avoid confinement to local minima during the adaptation procedure. In the neural gas algorithm, the synaptic weights w_i are adapted without any fixed topological arrangement of the neural units within the network. Instead, it utilizes a neighborhood ranking of the synaptic weights w_i for a given data vector v. The synaptic weight changes Δw_i are not determined by the relative distances between the neural units within a topologically prestructured lattice, but by the relative distances between the neural units within the input space: hence the name neural gas network.

Information about the arrangement of the receptive fields within the input space is implicitly given by a set of distortions, D_v = {||v - w_i||; i = 1, 2, ..., N}, associated with each v, where N is the number of units in the network [19]. Each time an input signal v is presented, an ordering of the elements of the set D_v is necessary to determine the adjustment of the synaptic weights w_i. This ordering has a time complexity of O(N log N) in a sequential implementation. The resulting adaptation rule can be described as a winner-take-most rule instead of a winner-take-all rule. A presented input signal v is received by each neural unit i and induces excitations f_i(D_v) which depend on the set of distortions D_v. Assuming a Hebb-like learning rule as shown in equation (3), coincidence of the presynaptic input vector v with the postsynaptic excitation f_i updates the synaptic weight vector w_i by

    Δw_i = ε · f_i(D_v) · (v - w_i),   i = 1, 2, ..., N    (7)

The step size ε ∈ [0, 1] describes the overall extent of the modification (the learning rate) and f_i(D_v) ∈ [0, 1] accounts for the topological arrangement of the w_i within the input space. Martinetz et al. [19] reported that an exponential function exp(-k_i/λ) gives the best overall result for the excitation function f_i(D_v), compared to other choices such as Gaussians, where λ determines the number of neural units significantly changing their synaptic weights with the adaptation step (7). The rank index k_i = 0, 1, ..., (N-1) describes the neighborhood ranking of the neural units, with k_i = 0 for the closest synaptic weight (w_i0) to the input signal v, k_i = 1 for the second closest synaptic weight (w_i1) to v, and so on. That is, the set {w_i0, w_i1, ..., w_i(N-1)} is the neighborhood ranking of the w_i relative to the given input vector v. The neighborhood-ranking index k_i depends on v and the whole set of synaptic weights W = {w_1, w_2, ..., w_N}, and we denote this as k_i(v, W). The original NG network algorithm is summarized in Table 2.

Table 2: Original NG network algorithm

NG1: Initialize the synaptic weights w_i randomly and the training parameters (ε_i, ε_f, λ_i, λ_f), where ε_i, λ_i are the initial values of ε(t), λ(t) and ε_f, λ_f are the final values of ε(t), λ(t).

NG2: Present an input vector v and compute the distortions D_v.

NG3: Order the distortion set D_v in ascending order.

NG4: Adapt the weight vectors according to

    Δw_i = ε · h_λ(k_i(v, W)) · (v - w_i),   i = 1, 2, ..., N    (8)

where the parameters have the following time dependencies: ε(t) = ε_i (ε_f/ε_i)^(t/t_max), λ(t) = λ_i (λ_f/λ_i)^(t/t_max), and h_λ(k_i) = exp(-k_i/λ(t)).

NG5: Increment the time parameter t by 1.

NG6: Repeat NG2-NG5 until the maximum iteration t_max is reached.
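A minimal Python sketch of the NG1-NG6 loop in Table 2 is given below. It uses a full argsort for the explicit ranking of step NG3 (the O(N log N) operation discussed in section 2), and the network size and parameter values are illustrative assumptions rather than the ones used later in this report.

```python
import numpy as np

def train_ng(data, n_units=100, t_max=40000,
             eps_i=0.5, eps_f=0.005, lam_i=10.0, lam_f=0.01, seed=0):
    rng = np.random.default_rng(seed)
    # NG1: random initial weights inside the bounding box of the data
    lo, hi = data.min(axis=0), data.max(axis=0)
    w = rng.uniform(lo, hi, size=(n_units, data.shape[1]))
    for t in range(t_max):
        eps = eps_i * (eps_f / eps_i) ** (t / t_max)
        lam = lam_i * (lam_f / lam_i) ** (t / t_max)
        # NG2: present an input vector and compute the distortions D_v
        v = data[rng.integers(len(data))]
        d = np.linalg.norm(v - w, axis=1)
        # NG3: explicit ranking, k_i = 0 for the closest unit (O(N log N))
        k = np.empty(n_units, dtype=int)
        k[np.argsort(d)] = np.arange(n_units)
        # NG4: soft-max adaptation rule of equation (8)
        h = np.exp(-k / lam)
        w += eps * h[:, None] * (v - w)
        # NG5/NG6: the loop counter plays the role of t
    return w

# Usage: codebook = train_ng(np.random.rand(5000, 10))
```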

1.4.1 Important Features of the NG Algorithm

The main feature of this algorithm is that the dynamics of the synaptic weight vectors w_i obeys a stochastic gradient descent on the following cost function:

    E(w, λ) = 1/(2C(λ)) Σ_{i=1}^{N} ∫ P(v) h_λ(k_i(v, w)) (v - w_i)^2 d^m v    (9)

with

    C(λ) = Σ_{k=0}^{N-1} h_λ(k)

where P(v) describes the probability distribution of the input data vectors and m is the dimension of the data manifold. This cost function E(w, λ) is related to fuzzy clustering [26]. That is, in contrast to hard clustering, where each data vector v is deterministically assigned to its closest weight vector w_i(v), fuzzy clustering associates a data vector v with a weight vector w_i to a certain degree p_i(v), the so-called fuzzy membership of v to cluster i. If we choose p_i(v) = h_λ(k_i(v, w))/C(λ), then the average distortion error we obtain (and which has to be minimized) is given by E(w, λ), and the corresponding gradient descent is given by the adaptation rule in equation (8). The shape of the cost function E(w, λ) depends on the decay constant λ. To obtain better results for the set of weight vectors, the adaptation process determined by equation (8) must start with a large decay constant λ and decrease it gradually with each adaptation step. In addition, the feature map is achieved without the use of any prior knowledge about the neighborhood relationships of the topology. That is, in contrast to Kohonen's SOFM algorithm, what matters is not the neighborhood ranking of the weight vectors within an external lattice, but the neighborhood ranking within the input space.

2 MODIFIED NG ALGORITHM

2.1 Problems Associated with the Original NG Algorithm

Despite all its advantages over the SOM network, the NG network algorithm suffers from a high time complexity in its sequential implementation [19, 27]. In the original neural gas network, an explicit ordering of all distances between the synaptic weights and the training sample was necessary (see section 1.4, step NG3). This has a time complexity of O(N log N) in a sequential implementation. We started our work by investigating how this time complexity problem of the NG algorithm could be reduced efficiently. To this end, we introduced an implicit ranking scheme instead of the time-consuming explicit ranking scheme.

2.2 Implicit Ranking Scheme

Recently, some work has been done on speeding up the sequential implementation of the NG algorithm. Ancona et al. [28] discussed the questions of sorting accuracy and sorting completeness. With theoretical analysis and experimental evidence, they concluded that partial, exact sorting (i.e. ordering the top few winning units correctly and keeping all other units in the list unaffected) performs better than complete but noisy sorting (i.e. ordering the top few winning units correctly while all remaining units are subjected to inexact sorting). They also concluded that even a few units in partial sorting are sufficient to attain a final distortion equivalent to that attained by the original NG algorithm. Moreover, they concluded that correct identification of the best-matching unit becomes more and more important as training proceeds. This is to be expected, because as training proceeds the adaptation step (8) becomes equivalent to the K-means adaptation rule. Choy et al. [27] applied the partial distance elimination (PDE) method to speed up the NG algorithm in this context.

In our investigations, we eliminate the explicit ordering (step NG3 in the above summary) by employing the following implicit ordering metric:

    m_i = (d_i - d_min) / (d_max - d_min)    (10)

where d_min and d_max are, respectively, the minimum and maximum distances between the training sample and all reference units in the network, and d_i ∈ D_v, i = 1, 2, ..., N. The best matching (winner) unit then has an index of 0, the worst matching unit has an index of 1, and the other units take values between 0 and 1, i.e. m_i ∈ [0, 1]. With this modification to the original NG algorithm discussed earlier, the two entries NG3 and NG4 are modified as shown in Table 3.

Table 3: Modification to the original NG algorithm

NG3: Find d_min and d_max from the distortion set D_v.

NG4: Adapt the weight vectors according to

    Δw_i = ε · h_λ'(m_i(v, W)) · (v - w_i)    (11)

where h_λ'(m_i) = exp(-m_i/λ'(t)) and λ'(t) = λ(t)/(N-1).

In addition, we update only those units with a non-negligible effective learning rate, as in [19, 27]. This is implemented by a further modification to the weight updating phase, using the following truncated exponential function as in [27]:

    h_λ'(m_i) = exp(-m_i/λ'(t))  if m_i < r·λ'(t),  and  0  otherwise    (12)

where r is a constant to be selected. Because of this truncation, the weight update rule (11) only updates those weights with a non-zero value of h_λ'(m_i). These modifications eliminate the ranking mechanism completely and also reduce the number of updates substantially, by almost 80% on average.
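The following Python fragment sketches the modified NG3/NG4 steps of Table 3, replacing the argsort of the original algorithm with the implicit metric of equation (10) and the truncated exponential of equation (12). The function name is ours, and the default r = 12 is simply a mid-range value consistent with the 11-13 range quoted later in section 5.1.

```python
import numpy as np

def modified_ng_step(w, v, eps, lam, r=12.0):
    """One adaptation step of the modified NG algorithm (Table 3)."""
    n_units = len(w)
    d = np.linalg.norm(v - w, axis=1)
    # NG3 (modified): only the minimum and maximum distances are needed
    d_min, d_max = d.min(), d.max()
    m = (d - d_min) / (d_max - d_min)          # implicit ranking, equation (10)
    lam_p = lam / (n_units - 1)                # lambda'(t) = lambda(t) / (N - 1)
    # Truncated exponential of equation (12): skip negligible updates entirely
    h = np.where(m < r * lam_p, np.exp(-m / lam_p), 0.0)
    active = h > 0.0
    # NG4 (modified): update rule (11), applied only to the active units
    w[active] += eps * h[active][:, None] * (v - w[active])
    return w
```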


3 HIERARCHICAL OVERLAPPED ARCHITECTURE

3.1 HONG Algorithm

By retaining the essence of the original NG algorithm and our modification, we developed a hierarchical overlapped neural gas (HONG) network algorithm for labeled pattern recognition. The structure of the HONG network architecture is an adaptation of the hierarchical overlapped architecture developed for SOMs by Suganthan [29].

First, the network is initialized with just one layer, which is called the base layer. The number of neurons in the base layer has to be chosen appropriately; in labeled pattern recognition applications, the number of distinct classes and the number of training samples may be considered when selecting the initial size of the network. As in the SOM architecture, every neuron in this architecture has a synaptic weight vector of the same dimension as the input feature vector. Once we had selected the number of neurons in the base layer, we applied our modified version of the NG algorithm to adapt the synaptic weights of the neurons in the base network. Having completed the unsupervised NG learning, the neurons in the base layer were labeled using a simple voting mechanism. In order to fine tune the labeled network, we applied the supervised learning vector quantization (LVQ) algorithm [4]. We then obtained the overlaps for each neuron in the base layer. That is, if we have 100 neurons in the base layer network, then we have 100 separate NG networks in the second level, one spanned by each neuron in the base layer network.

The overlapping is achieved by duplicating every training sample to train several upper-level NG networks. That is, the winning neuron as well as a number of runner-up neurons make use of the same training sample to train the second-level NG networks grown from those neurons in the base layer NG network. In figure 2, for example, the overlapped NG network grown from neuron A is trained on the samples for which neuron A is the winner or one of the first few runners-up among all the training samples presented to the base layer in its training phase. In this way, we obtain several upper-level NG networks which share the same training samples. For instance, if we have an overlap of 5 (i.e. a winner and 4 runners-up) for the training samples in the base NG network, then each such sample is used to train 5 different NG networks in the second level. Figure 2 also shows, conceptually, the overlap in the feature space of the two overlapped NG networks, assuming that nodes A and B are adjacent to each other in the feature space.

The testing samples are also duplicated, but to a lesser degree. Hence the testing samples fit well inside the feature space spanned by the winner and several runners-up in the training data. In addition, this duplication of the samples allows us to employ one of several classifier combination schemes to obtain the final classification. In combining classifiers, we first employed majority and weighted voting schemes.



Figure 2: Hierarchical overlapped architecture showing two units, A and B, from the base NG network being expanded to the second layer.

We then extended our study to obtain a confidence value for every sample's membership in every class j, using the following:

    c_j = 1 - d_j / d_acc    (13)

where d_j is the minimum distance for class j, d_acc = Σ_{j=0}^{9} d_j, and j = 0, 1, ..., 9 for numeral classification. This defines a confidence value c_j for the input pattern belonging to the j-th class. The class which has the global minimum distance yields a confidence value closer to one (in the case of a perfect match, i.e. d_j = 0, the confidence value for that class becomes one). That is, the higher the confidence value for a class, the more likely a sample belongs to that class. We can also consider the above function as a basic probability assignment, because 0 ≤ c_j ≤ 1. We can consider the collection of all ten confidence values of an overlapped network as a vector, which from here onwards is referred to as the confidence vector of that network.

For example, let us assume that we are considering four overlaps for the testing data. Then we get four confidence vectors, one for each overlapped second-level network. Given the individual confidence vectors, we can calculate the overall confidence vector of the HONG architecture by adding the individual confidence values according to their class labels. We can then assign the class label of the test data according to the overall confidence vector (i.e. select the index of the maximum confidence value in the vector), or we can use this overall confidence vector for further calculations, as explained in sub-section 4.2.
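As an illustration of equation (13), the sketch below converts the per-class minimum distances produced by one second-level NG network into a confidence vector and accumulates the vectors from the networks selected by the overlaps; the function and variable names are our own, not identifiers from the report.

```python
import numpy as np

def confidence_vector(class_min_dists):
    """Equation (13): c_j = 1 - d_j / d_acc for the ten digit classes."""
    d = np.asarray(class_min_dists, dtype=float)   # d_j, j = 0..9
    return 1.0 - d / d.sum()                       # d_acc = sum_j d_j

def overall_confidence(per_network_dists):
    """Sum the confidence vectors of the overlapped second-level networks."""
    total = np.zeros(10)
    for dists in per_network_dists:
        total += confidence_vector(dists)
    return total

# Usage: predicted_class = int(np.argmax(overall_confidence(list_of_distance_vectors)))
```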


4 COMBINATION OF CLASSIFIERS

4.1 Introduction

It has been observed that, for a given data set, classifiers based on different architectures and different feature sets do not necessarily make the same recognition errors; that is, they may be regarded as error independent [30]. In many cases it is beneficial to apply several error independent classifiers to the same recognition task and use their error independence to improve the recognition performance of a combined system, instead of inventing a new architecture or a new feature extractor to achieve the same accuracy. In this way, the combination should take advantage of the strengths of the individual classifiers, avoid their weaknesses, and improve classification accuracy.

The main objectives of combining classifiers are efficiency and accuracy. To increase efficiency, one can adopt multistage combination rules whereby objects are classified by a simple classifier using a small set of cheap features in combination with a rejection option; for more difficult objects, more complex procedures, possibly based on different features, can be used. An interesting issue in research on classifier combination is the way the classifiers are combined [31]. If only labels are available, a majority vote (or label ranking) can be used. If continuous outputs such as a posteriori probabilities are supplied, an average or some other linear combination can be used. If classifier outputs are interpreted as fuzzy membership values, belief values or evidence, then fuzzy rules, belief functions or Dempster-Shafer techniques can be used. In the recent past, various combination techniques have been proposed by different authors [30-42], and these techniques fall into two basic approaches to combining multiple classifiers:

1. Classifier Fusion Algorithms;
2. Dynamic Classifier Selection Algorithms.

4.1.1 Classifier Fusion Algorithms

In classifier fusion algorithms, individual classifiers are applied in parallel and their outputs are combined in some manner to achieve a group consensus. In this method, each classifier uses its own representation of the input sample; that is, the features extracted from the input sample are unique to each classifier. An important feature of combining classifiers in this manner is the possibility of successfully integrating physically different types of features. Majority vote, unanimous consensus, Dempster-Shafer theory of evidence, and methods of multistage classification are some of the classifier fusion algorithms that are widely used.

4.1.2 Dynamic Classifier Selection Algorithms

Compared to classifier fusion algorithms, dynamic classifier selection algorithms attempt to predict which single classifier is most likely to be correct for a given sample. Only the output of the selected classifier is considered in the final classification. That is, dynamic classifier selection requires a method of partitioning the input sample space. The best classifier for each partition is then determined using the training data. For classification, an unknown sample is assigned to a partition and the output of the best classifier for that partition is used to make the final decision. In this method, all the classifiers use the same representation of the input sample. A typical example of this category is a set of neural network classifiers of fixed architecture but with distinct sets of weights obtained by means of different training strategies.

4.2 Use of Confidence Values in Combining Classifiers

In our experiments with the HONG network architecture, we used three different classifiers based on three different feature extraction methods, which are explained in sub-section 5.3. In combining these classifiers, we used the overall confidence vectors generated by each of them, as mentioned in sub-section 3.1. Our method falls into the classifier fusion category discussed earlier. Let C^1, C^2 and C^3 be the overall confidence vectors generated by the three classifiers, i.e. C^1 = [c^1_0, c^1_1, ..., c^1_9], C^2 = [c^2_0, c^2_1, ..., c^2_9] and C^3 = [c^3_0, c^3_1, ..., c^3_9]. To combine the results of the three different classifiers, we sum their overall confidence vectors along each class label and the final classification of the sample is obtained using the max rule [31]. That is, the final confidence value for each class is given by

    c_j = Σ_{i=1}^{3} c^i_j    (14)

and the final confidence vector is C = [c_0, c_1, ..., c_9]. Class k is assigned to a sample S based on

    k = arg max_j {c_j},   j = 0, 1, ..., 9    (15)
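A minimal sketch of the sum-then-max combination of equations (14) and (15), assuming each of the three HONG classifiers has already produced a ten-element overall confidence vector (the array names in the usage line are illustrative):

```python
import numpy as np

def combine_classifiers(conf_vectors):
    """Sum the overall confidence vectors, eq. (14), then apply the max rule, eq. (15)."""
    C = np.sum(conf_vectors, axis=0)   # c_j = sum_i c_j^i over the three classifiers
    return int(np.argmax(C)), C        # predicted digit k and the final vector C

# Usage with three hypothetical confidence vectors C1, C2, C3 (shape (10,) each):
# k, C = combine_classifiers([C1, C2, C3])
```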

5 EXPERIMENTAL RESULTS

5.1 Data Sets and Other Parameters used in the Experiments

We performed experiments on handwritten numerals to test the proposed classifier. These handwritten numeral samples were extracted from the NIST SD3 database (which contains 223,124 isolated handwritten numerals scanned at 300 dots per inch) provided by the American National Institute of Standards and Technology (NIST) [43]. We partitioned the NIST SD3 database into the non-overlapping sets shown in Table 4. The SD3 database contains a total of 223,124 digit samples written by 2100 writers; the test set comprises samples from 600 writers not used in the training and validation sets.

Table 4: Partitions of the SD3 data set used in our experiments.

Partition(s)    Size      Use
hsf {0,1,2}     106,152   Training
hsf {0,1,2}     53,076    Validation
hsf 3           63,896    Testing

We restricted the number of upper-level layers of the overlapped NG networks to two. The base layer consisted of 250 neurons. The number of neurons for each overlapped NG network (second layer) was determined empirically by considering the training samples available to each of them; we found min{300, max{35, (training samples)/8}} to be a good estimate of the number of neurons for the second layer. We used 5 overlaps for the training set and 3 overlaps for the testing set. To truncate the exponential function as described in (12), a value between 11 and 13 was a good approximation for the parameter r. Through trial and error, we empirically found ε_i = 0.7, ε_f = 0.05, λ_i = 0.1 and λ_f = 0.0001 to give the best results for the proposed network.
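For instance, the second-layer sizing rule quoted above can be written as a one-line helper (a sketch; the function name is ours):

```python
def second_layer_size(n_training_samples):
    """min{300, max{35, (training samples)/8}} neurons for a second-level NG network."""
    return int(min(300, max(35, n_training_samples / 8)))

# e.g. a base-layer neuron that collected 1,000 training samples gets 125 second-level neurons
```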

5.2 NG and HONG Algorithm Results

The recognition rates obtained using the above mentioned parameters are shown in Tables 5 and 6. As can be seen, the HONG architecture further improves the high classification rate provided by the base layer NG network.

Table 5: Base NG and HONG network results

                     BASE NG                                HONG
Method    Training   Validation   Testing     Training   Validation   Testing
GF        99.31%     98.60%       98.84%      99.90%     99.30%       99.30%
PF        99.21%     98.82%       98.92%      99.66%     99.06%       99.14%
CF        99.32%     98.68%       98.85%      99.93%     99.12%       99.21%

5.3 Feature Extraction Methods

We used three different feature extraction methods in our experiments. Prior to the feature extraction operation, we performed pre-processing operations on the isolated numerals as in [44]. First, we removed isolated blobs from the binary image based on a ratio test. Then the digit was centered and only the numeral part was extracted from the 128x128 binary image. Once this was done, we passed each digit image through the three feature extraction operations.

In the first method, we extracted global features based on the pixel values. The pre-processed binary digit image was rescaled to an 88x72 pixel resolution and each such image was sub-sampled into 8x8 blocks. The result was an 11x9 grey scale image with pixel values in the range [0, 64]. We refer to these features as global features (GF) in subsequent sections.

In the second method, we extracted structural features based on projections, black-to-white transitions and contour profiles, as in [45]. Initially the binary digit image was normalized to a 32x32 pixel resolution. Then the black pixels were projected in four main directions (horizontal, vertical, left diagonal and right diagonal) and four different histograms were obtained. Similarly, black-to-white transitions were counted and another four histograms were obtained. Finally, eight contour profiles were computed from eight main directions (the above-mentioned four directions traversed left-to-right and right-to-left). A contour profile value was defined as the number of white pixels separating the border and the first black pixel seen from a given direction. This left us with 16 histograms, and we extracted 6 features from each histogram by sub-sampling with a weighted average. As a result, we obtained 96 features from this feature extraction method. We refer to these features as projection-based features (PF) in subsequent sections.

In the third method, we extracted structural features based on contours, as in [45]. First the binary digit image was normalized to an 88x66 pixel resolution. Then both the inner and outer contours of the digit image were extracted. At each contour point, the gradient direction was estimated by computing the weighted average of the coordinates of the neighboring pixels. The calculated direction was quantized into 8 uniform quantization intervals. The image with direction contours was then subdivided into 4x3 regions. In each region and for each of the 8 discrete directions, the total number of contour pixels was counted, with each count weighted according to its position with respect to the center of the corresponding region. Finally, we were left with 96 (i.e. 8x12) direction features from this feature extraction method. We refer to these features as contour-based features (CF) in subsequent sections.
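As an illustration of the first (global feature) method only, the following sketch block-averages an 88x72 binary image into an 11x9 grey-scale feature map; treating the block sums of a 0/1 image as the pixel values in [0, 64] is our assumption about how those grey values arise.

```python
import numpy as np

def global_features(binary_img_88x72):
    """Sub-sample an 88x72 binary digit image into 8x8 blocks -> 11x9 grey values."""
    img = np.asarray(binary_img_88x72, dtype=float)
    assert img.shape == (88, 72)
    blocks = img.reshape(11, 8, 9, 8)        # (block rows, 8, block cols, 8)
    counts = blocks.sum(axis=(1, 3))         # black-pixel count per 8x8 block, 0..64
    return counts.flatten()                  # 99-dimensional GF vector
```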

5.4 Classifier Combination Results

The final step in the whole process was to combine the three classifiers based on global features, projection-based features and contour-based features. By doing this, we were able to obtain an excellent recognition rate for the NIST SD3 database (see Table 6). To the best of our knowledge, the previously most successful results obtained for the NIST SD3 database were by Ha et al. [45]. They used a total of 223,124 samples and obtained a recognition rate of 99.54% on a test set of 173,124 samples. They designed two recognition systems based on two distinct feature extraction methods and used a fully connected feed-forward three-layer perceptron as the classifier for both feature extraction methods. In addition, if the best score of the combined classifier was less than a fixed predefined threshold, they replaced the normalization operation prior to feature extraction with a set of perturbation processes which modeled writing habits and instruments.


Table 6: Multiple classifier combination results

Method                           Test Rate
Global Features (GF)             99.25%
Projection-based Features (PF)   99.14%
Contour-based Features (CF)      99.21%
Combined Result                  99.59%

6 PUBLICATIONS

The following papers have come out of the research so far.

(1) A.S. Atukorale and P.N. Suganthan, An Efficient Neural Gas Network for Classification, in Proceedings of the International Conference on Control, Automation, Robotics and Vision (ICARCV'98), pp. 1152-1156, Singapore, December 1998.

(2) A.S. Atukorale and P.N. Suganthan, Hierarchical Overlapped Neural-Gas Network with Application to Pattern Classification, submitted to the Neurocomputing journal in November 1998.

(3) A.S. Atukorale and P.N. Suganthan, Combination of Multiple HONG Networks for Recognizing Unconstrained Handwritten Numerals, accepted for publication in the IEEE International Joint Conference on Neural Networks (IJCNN'99), Washington DC, July 1999.

(4) A.S. Atukorale and P.N. Suganthan, Combining Classifiers based on Confidence Values, submitted to the Fifth International Conference on Document Analysis and Recognition (ICDAR'99), Bangalore, India, September 1999.

7 TIME PLAN

The remaining work towards completion of my candidature will be carried out using the following as a guide:

(1) Present - Mar 1999: Complete preliminary coding of the HONG algorithm and the multiple classification decision combination methods.

(2) Apr 1999 - Jun 1999: Currently we are using confidence values generated by different classifiers for combining. We plan to interpret their outputs as fuzzy membership values or evidence values, which will enable us to use fuzzy rules or Dempster-Shafer theory of evidence techniques.

(3) Jul 1999 - Dec 1999: (a) Investigate topology representation schemes for the new HONG network architecture which will enable it to function as a feature map. (b) Perform an analysis of energy function based approaches to generate a topology representation of the above network architecture.

(4) Jan 2000 - Jun 2000: Carry out the remainder of the practical and theoretical work to be presented in the thesis based on the previously determined plan. We also plan to carry out a comparative study with other competing methods such as multilayer back-propagation and Kohonen's self-organizing feature maps.

(5) Jul 2000 - Dec 2000: Finalize all experimental work and start writing up the thesis (drafts, proof reading, error correction, etc.).

(6) Jan 2001 - Feb 2001: Complete research and finish the Ph.D. dissertation. Submit the thesis for examination.

8 CONCLUSIONS

In this report, we proposed an implicit ranking scheme to speed up the sequential implementation of the original NG algorithm, replacing its time-consuming explicit ranking scheme. Compared to the number of applications of Kohonen's SOFM, there are relatively few applications of the NG algorithm in the literature [46-55]. We hope that, with the speed-up we have introduced for the sequential implementation, there will be more applications of the NG algorithm in the future.

The HONG network architecture allowed us to obtain a better classification on conflicting data. It successively partitions the input space by projecting the input data onto different upper-level NG networks (see Fig. 2). Since the training and testing samples are duplicated in the upper layers of the HONG architecture, we obtained multiple classifications for every sample. This allowed us to employ one of several classifier combination schemes to obtain the final classification. Each of the HONG network outputs is converted to a confidence vector based on the minimum distance to every class; this can be considered a basic probability assignment describing the confidence of a data vector belonging to a certain output class. The proposed architecture was tested on handwritten numerals extracted from the well known NIST SD3 database, and we were able to obtain excellent results which are comparable with the current published results for that database.


Bibliography

[1] D.J. Willshaw and C. von der Malsburg. How Patterned Neural Connections can be set up by Self-Organization. Proceedings of the Royal Society of London Series B, 194:431-445, 1976.
[2] Teuvo Kohonen. Self-Organizing Maps, Second Edition. Springer-Verlag, Berlin, 1995.
[3] Teuvo Kohonen. Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, 43:59-69, 1982.
[4] Teuvo Kohonen. The Self-Organizing Map. Proceedings of the IEEE, 78(9):1464-1480, September 1990.
[5] Simon Haykin. Neural Networks - A Comprehensive Foundation, Second Edition. Prentice Hall Inc., Upper Saddle River, NJ 07458, 1999.
[6] Pierre Demartines and Francois Blayo. Kohonen Self-Organizing Maps: Is the Normalization Necessary? Complex Systems, 6:105-123, 1992.
[7] B. Fritzke. LET IT GROW - self-organizing feature maps with problem dependent cell structure. In T. Kohonen, K. Makisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 403-408. North-Holland, Amsterdam, 1991.
[8] B. Fritzke. Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison. In L. Giles, S. Hanson, and J. Cowan, editors, Advances in Neural Information Processing Systems 5, pages 123-130. Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[9] Bernd Fritzke. Growing Cell Structures - A Self-Organizing Network for Unsupervised and Supervised Learning. Neural Networks, 7(9):1441-1460, 1994.
[10] Bernd Fritzke. Growing Grid - a self-organizing network with constant neighborhood range and adaptation strength. Neural Processing Letters, 2(5):9-13, 1995.
[11] B. Fritzke. Growing Self-organizing Networks - Why? In ESANN'96: European Symposium on Artificial Neural Networks, pages 61-72, 1996.
[12] Justine Blackmore and Risto Miikkulainen. Incremental grid growing: Encoding high-dimensional structure into a two-dimensional feature map. Technical Report AI92-192, University of Texas at Austin, December 1992.
[13] Tao Li, Yuan Y. Tang, and L.Y. Fang. A Structure-Parameter-Adaptive (SPA) Neural Tree for the Recognition of Large Character Set. Pattern Recognition, 28(3):315-329, 1995.
[14] Tsu-Chang Lee and Allen M. Peterson. Adaptive Vector Quantization Using a Self-Development Neural Network. IEEE Journal on Selected Areas in Communications, 8(8):1458-1471, October 1990.
[15] Doo-Il Choi and Sang-Hui Park. Self-Creating and Organizing Neural Networks. IEEE Transactions on Neural Networks, 5(4):561-575, July 1994.
[16] M. Herrmann, H.-U. Bauer, and R. Der. The Perceptual Magnet Effect: A Model Based on Self-Organizing Feature Maps. In Proc. of the 3rd Neural Computation and Psychology Workshop, Stirling, Scotland, pages 107-116, September 1994.
[17] Hans-Ulrich Bauer and Thomas Villmann. Growing a Hypercubical Output Space in a Self-Organizing Feature Map. IEEE Transactions on Neural Networks, 8(2):218-226, March 1997.
[18] Jari A. Kangas, Teuvo K. Kohonen, and Jorma T. Laaksonen. Variants of Self-Organizing Maps. IEEE Transactions on Neural Networks, 1(1):93-99, March 1990.
[19] Thomas Martinetz and Klaus Schulten. A Neural-Gas Network Learns Topologies. In T. Kohonen, K. Makisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 397-402. North-Holland, Amsterdam, 1991.
[20] Thomas Martinetz and Klaus Schulten. Topology Representing Networks. Neural Networks, 7(3):507-522, 1994.
[21] H.-U. Bauer and K. Pawelzik. Quantifying the Neighborhood Preservation of Self-Organizing Feature Maps. IEEE Transactions on Neural Networks, 3(4):570-579, July 1992.
[22] E. Erwin, K. Obermayer, and K. Schulten. Self-Organizing Maps: Ordering, Convergence Properties and Energy Functions. Biological Cybernetics, 67:47-55, 1992.
[23] V.V. Tolat. An Analysis of Kohonen's Self-Organizing Maps using a System of Energy Functions. Biological Cybernetics, 64:155-164, 1990.
[24] E. Erwin, K. Obermayer, and K. Schulten. Self-Organizing Maps: Stationary States, Metastability and Convergence Rate. Biological Cybernetics, 67:35-45, 1992.
[25] E. Erwin, K. Obermayer, and K. Schulten. Convergence Properties of Self-Organizing Maps. In T. Kohonen, K. Makisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 409-414. North-Holland, Amsterdam, 1991.
[26] Thomas Martinetz, Stanislav G. Berkovich, and Klaus Schulten. Neural Gas Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4(4):218-226, July 1993.
[27] Clifford Sze-Tsan Choy and Wan-Chi Siu. Fast Sequential Implementation of Neural Gas Network for Vector Quantization. IEEE Transactions on Communications, 46(3):301-304, March 1998.
[28] Fabio Ancona, Sandro Ridella, Stefano Rovetta, and Rodolfo Zunino. On the Importance of Sorting in Neural Gas Training of Vector Quantizers. In Proc. of the IEEE International Conference on Neural Networks, pages 1804-1808, 1997.
[29] P.N. Suganthan. Hierarchical Overlapped SOMs for Pattern Classification. IEEE Transactions on Neural Networks, 10(1):193-196, January 1999.
[30] Galina Rogova. Combining the Results of Several Neural Network Classifiers. Neural Networks, 7(5):777-781, 1994.
[31] Josef Kittler, Mohamad Hatef, Robert P.W. Duin, and Jiri Matas. On Combining Classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226-239, March 1998.
[32] Tin Kam Ho, Jonathan J. Hull, and Sargur N. Srihari. Decision Combination in Multiple Classifier Systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(1):66-75, January 1994.
[33] Lei Xu, Adam Krzyżak, and Ching Y. Suen. Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition. IEEE Transactions on Systems, Man and Cybernetics, 22(3):418-435, May 1992.
[34] Eberhard Mandler and Jurgen Schurmann. Combining the Classification Results of Independent Classifiers Based on Dempster-Shafer Theory of Evidence. In E.S. Gelsema and L.N. Kanal, editors, Pattern Recognition and Artificial Intelligence, pages 381-393. Elsevier Science, North-Holland, 1988.
[35] K. Woods, W.P. Kegelmeyer, and K. Bowyer. Combination of Multiple Classifiers using Local Accuracy Estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):405-410, April 1997.
[36] Robert A. Jacobs. Methods for Combining Experts' Probability Assessments. Neural Computation, 7(5), September 1995.
[37] M. Sabourin, A. Mitiche, D. Thomas, and G. Nagy. Classifier Combination for Hand-Printed Digit Recognition. In Proceedings of the International Conference on Document Analysis and Recognition, pages 163-166, Tsukuba Science City, Japan, 1993.
[38] Roberto Battiti and Anna Maria Colla. Democracy in Neural Nets: Voting Schemes for Classification. Neural Networks, 7(4):691-707, 1994.
[39] Lars Kai Hansen and Peter Salamon. Neural Network Ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993-1001, October 1990.
[40] Ke Chen and Huisheng Chi. A Method of Combining Multiple Probabilistic Classifiers through Soft Competition on Different Feature Sets. Neurocomputing, 20:227-252, 1998.
[41] Chuanyi Ji and Sheng Ma. Combination of Weak Classifiers. IEEE Transactions on Neural Networks, 8(1):32-42, January 1997.
[42] J. Kittler. Combining Classifiers: A Theoretical Framework. Pattern Analysis and Applications, 1(1):18-27, 1998.
[43] M.D. Garris. Design, Collection and Analysis of Handwriting Sample Image Databases. The Encyclopedia of Computer Science and Technology, 31(16):189-213, 1994.
[44] P.N. Suganthan. Structure Adaptive Multilayer Overlapped SOMs with Supervision for Handprinted Digit Classification. In Proceedings of the International Joint Conference on Neural Networks (IJCNN'98), pages 1706-1711, Anchorage, Alaska, USA, May 1998.
[45] Thien M. Ha and Horst Bunke. Design, Implementation, and Testing of Perturbation Method for Handwritten Numeral Recognition. Technical Report IAM-96-014, Institute of Computer Science and Applied Mathematics, University of Berne, Switzerland, October 1996. Anonymous ftp: iamftp.unibe.ch/pub/TechReports/1996/.
[46] E. Ardizzone, A. Chella, and R. Rizzo. Color Image Segmentation Based on a Neural Gas Network. In Maria Marinaro et al., editors, International Conference on Artificial Neural Networks (ICANN '94), Sorrento, Italy, pages 1161-1164, May 1994.
[47] M. Fontana, N.A. Borghese, and S. Ferrari. Image Reconstruction using Improved Neural Gas. In Maria Marinaro et al., editors, Italian Workshop on Neural Nets (7th), Vietri sul Mare, Italy, pages 260-265, 1996.
[48] Thomas Hofmann and Joachim M. Buhmann. An Annealed Neural Gas Network for Robust Vector Quantization. In C. von der Malsburg, W. von Seelen, J.C. Vorbruggen, and B. Sendhoff, editors, Artificial Neural Networks (ICANN '96), volume 7, pages 151-156. Springer, Bochum, Germany, July 1996.
[49] Kazuya Kishida, Hiromi Miyajima, and Michiharu Maeda. Destructive Fuzzy Modeling Using Neural Gas Network. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E80-A(9):1578-1584, September 1997.
[50] Bai-ling Zhang, Min-yue Fu, and Hong Yan. Application of Neural Gas Model in Image Compression. In Proceedings of the International Joint Conference on Neural Networks (IJCNN'98), pages 918-921, Anchorage, Alaska, USA, May 1998.
[51] Bai-ling Zhang, Min-yue Fu, and Hong Yan. Handwritten Digit Recognition by Neural Gas Model and Population Decoding. In Proceedings of the International Joint Conference on Neural Networks (IJCNN'98), pages 1727-1731, Anchorage, Alaska, USA, May 1998.
[52] B. Fritzke. A Growing Neural Gas Network Learns Topologies. In G. Tesauro, D.S. Touretzky, and T.K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 625-632. MIT Press, Cambridge, MA, 1995.
[53] Fred Hamker and Dietmar Heinke. Implementation and Comparison of Growing Neural Gas, Growing Cell Structures and Fuzzy ARTMAP. Technical Report 1/97, Technical University of Ilmenau, April 1997.
[54] R. Berlich and M. Kunze. A Comparison between the Performance of Feed Forward Neural Networks and the Supervised Growing Neural Gas Algorithm. Nuclear Instruments and Methods in Physics Research, A 389:274-277, April 1997.
[55] Marcel Kunze and Johannes Steffens. Growing Cell Structures and Neural Gas. In Proceedings of the 4th AIHEP Workshop, Pisa, 1995.
