Comparative Results for Arabic Character Recognition Using Artificial Neural Networks

R.F. Walker, M. Bennamoun, B. Boashash
Signal Processing Research Centre
Queensland University of Technology
GPO Box 2434, Brisbane Q 4001, Australia
1 ABSTRACT

This paper discusses the aims of automatic character recognition and the challenges that Arabic characters pose to the implementation of a recognition system suitable for this script. The role of neural networks used as classifiers is examined, several neural network architectures are investigated, and their classification performance is evaluated when trained and tested with handwritten Arabic characters without preprocessing or feature extraction. Of the networks implemented, the best classification performance was provided by a 2-layer (1 hidden, 1 output layer) network using backpropagation with momentum.
2 INTRODUCTION

Character recognition has been an active area of research for more than two decades. The objective of character recognition is to accurately recognize handwritten or typeset characters to facilitate man-machine interaction. The benefits of this research are numerous, and include automatic mail sorting based on postcode numerals, and machine archival of manuscripts or books. In the case of the ten Arabic numerals much progress has been made. In recent times the use of ANNs has featured prominently in this field, with reports of high recognition rates for handwritten Arabic numerals being more the norm than the exception [5][8]. This success justifies the employment of ANNs as powerful classifiers for recognition systems.

The recognition of Arabic characters, however, has been an area of only limited and recent research with very few published papers, particularly with the use of neural network classifiers. As Arabic is a language in wide use throughout the world¹, and because many other languages also use the Arabic alphabet, the benefits of automatic recognition systems mentioned above also hold for this script, and we feel that neural networks can play an important role in these systems.

The recognition of Arabic characters represents a significant challenge due to the salient features of the Arabic script. Each of the 28 letters has four shapes depending on its position within a word (start, middle, end, or isolated; see Table 1). Also, because words are written cursively from right to left and contain several connected letters, character segmentation is necessary before input to the recognition device. Some words contain broken portions because particular characters cannot be connected to succeeding ones. Vowel diacritics and writing style add another degree of complexity to the recognition task. Figure 1 shows an example of Arabic text, and includes segmentation lines which separate the characters.

¹ The following languages (among others) use Arabic script: Urdu (India, Pakistan), Persian (Iran), Malay, Pashto, Afghan.
Table 1: Example of the four shapes (isolated, start, middle, and end forms) of the letter ('geem'), depending on its position within a word.
In this preliminary study, a database [1] containing the 28 isolated Arabic characters was used to train and test the five network architectures. The input patterns to the networks were the actual raw data consisting of binary images of dimension 32x32 pixels. The large input field size (1024 units) was necessary due to the curved nature of the characters. Smaller dimensions produced noisy, unrecognisable characters. Examples of the 28 character classes are shown in Figure 2.
Figure 1: An example of Arabic text, with segmentation lines separating characters.
Figure 2: Handwritten examples of the 28 single-letter Arabic characters (alif, ba, ta, tha, jeem, hha, kha, dal, thal, ra, za, seen, sheen, sad, dhad, tta, zha, ain, ghain, fa, qaf, kaf, lam, meem, noon, ha, waw, ya).
3 NEURAL NETWORK ARCHITECTURES

A total of five network architectures were chosen as a basis for determining the classification performance of neural networks when used to classify handwritten Arabic script:
- Multi-layer feed-forward network using backpropagation with momentum and flat-spot elimination
- ART1 (unsupervised training) and ARTMAP (supervised training) networks
- Learning Vector Quantiser (LVQ)
- Kohonen Feature Map competitive network.
The above networks were used to classify the 28 single Arabic characters (shown in Figure 2) without pre-processing or feature extraction. Four of the networks were implemented using a software package called SNNS [14], while the last was implemented using Matlab [10]. In the following subsections a functional description of these networks is introduced.
3.1 The Multi-layer Perceptron
The multi-layer perceptron (MLP) is an extension of the single-layer perceptron network [11]. The perceptron neuron (Figure 3) is used to determine which of two classes an N-dimensional input vector x is associated with, by computing the weighted sum of the input vector elements x_n and subtracting from this sum a threshold value (θ). The result y is then passed to the perceptron output function f(y), a hard-limiting non-linearity in the original model, resulting in an output y' of either -1 or 1.
Figure 3: Perceptron neuron (w = weight vector of the neuron, x = input vector, θ = threshold value of the neuron).
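To make the perceptron computation concrete, the following is a minimal sketch in Python/NumPy (not part of the original paper); the weight values and threshold shown are arbitrary illustrative choices.

    import numpy as np

    def perceptron_output(x, w, theta):
        """Single perceptron neuron: weighted sum minus threshold,
        passed through a hard-limiting non-linearity (-1 or 1)."""
        y = np.dot(w, x) - theta          # weighted sum of inputs minus threshold
        return 1.0 if y >= 0.0 else -1.0  # hard-limiting output function f(y)

    # Example: a 4-dimensional input vector with arbitrary weights and threshold
    x = np.array([1.0, 0.0, 1.0, 1.0])
    w = np.array([0.5, -0.2, 0.3, 0.1])
    theta = 0.4
    print(perceptron_output(x, w, theta))   # prints 1.0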
Figure 4: 2-layer MLP network (1st layer: hidden layer; 2nd layer: output layer), with inputs x_0 ... x_{N-1} and outputs y_0 ... y_{M-1}.
Each single unit in the 1st layer of the MLP (see Figure 4) forms two decision regions which are separated by a hyperplane (a line in the 2-D case). Each unit in the 2nd layer takes as its input the outputs of all 1st layer units, and forms in vector space a hypercube embodied by the intersection of the 1st layer hyperplanes, by logically ANDing its inputs. Thus the two-layer perceptron can classify input vectors provided each class is separate and different classes are not meshed (only convex decision boundaries can be formed). To classify vectors belonging to classes which are meshed, a three-layer network is required. In this case, two or more 2nd layer neurons are allocated to a meshed class, forming overlapping hypercubes which define a complex region in vector space. Thus vectors within a class may activate any of these groups of second layer neurons. Each unit in the third layer represents a single class, and its inputs consist of all 2nd layer units whose hyperplanes define the meshed class boundary. The complex class boundary is formed by effectively ORing these inputs. The ability of the three-layer network to form these complex decision regions means that, provided enough units exist to form these regions, no more than three layers will be required [9].

Training of the network can be accomplished by implementation of a number of algorithms based on the backpropagation training algorithm [12]. This algorithm utilises a gradient search method to minimise some cost function, usually the mean-square error (MSE) between the actual network output and the required network output presented during the learning phase. One requirement of the algorithm is that the activation function of the units must be continuously differentiable, thus sigmoidal functions generally replace the hard-limiting non-linearities. A detailed explanation of the operation of this algorithm can be found in [6].
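As an illustration of the two-layer arrangement of Figure 4, the sketch below (Python/NumPy, not the SNNS implementation used in this work) runs a forward pass through a fully connected two-layer network with sigmoidal units; the random weights are placeholders, and the layer sizes are those of the 2-layer network described in Section 4.1.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # continuously differentiable activation

    def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
        """Forward pass of a 2-layer MLP (1 hidden + 1 output layer)."""
        hidden = sigmoid(w_hidden @ x + b_hidden)   # 1st (hidden) layer activations
        output = sigmoid(w_out @ hidden + b_out)    # 2nd (output) layer activations
        return output

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 1024, 84, 28            # 32x32 input field, 84 hidden, 28 output units
    x = rng.integers(0, 2, n_in).astype(float)      # a binary 32x32 character image, flattened
    w_h, b_h = rng.normal(0, 0.05, (n_hidden, n_in)), np.zeros(n_hidden)
    w_o, b_o = rng.normal(0, 0.05, (n_out, n_hidden)), np.zeros(n_out)

    scores = mlp_forward(x, w_h, b_h, w_o, b_o)
    print("predicted class:", int(np.argmax(scores)))  # one output unit per character class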
3.2 Adaptive Resonance Theory 1 (ART1) Network
ART1 and ART2 are two general classes of ART structures which can recognize previously learnt categories or create new categories due to the presentation of new input vectors significantly different from those already learnt. ART2 deals specifically with analogue input vectors, while ART1 deals with binary input vectors. Only the ART1 structure will be discussed, as the images used in this research are binary.
The ART1 topology consists of two layers of neurons, the F1 feature detection layer and the F2 competitive or categorising layer (see Figure 5). Together, these layers comprise the attentional system. Each neuron in F2 takes a weighted sum of features extracted by the F1 layer, and the F2 neurons compete with each other so that at any one time only one F2 neuron will be active. The winning F2 neuron provides top-down attentional priming to F1; in effect, a set of critical features possessed by the category it has previously learnt. This form of positive feedback leads to a 'resonance' effect in the F1 layer if the input vector contains similar features. The amount of resonance produced is compared to a parameter called vigilance (ρ), and is used to determine whether there is enough resonance in F1 to consider the input vector to be of the same category as the winning F2 neuron's class. If sufficient resonance exists, the input is classed as being of the same category, but if there is insufficient resonance, the winning F2 neuron is inhibited and a new winning F2 neuron is made active. This inhibition is performed by the orienting subsystem and the action of inhibition is called a reset wave. If no F2 neuron is found which closely represents the input vector's features, a previously unused F2 neuron is trained to recognize these features and thus forms a new category. When all F2 neurons have been used, the network's memory capacity is exhausted and no new categories can be stored. Good references to the operation of the ART1 network can be found in [2, 3].
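A minimal sketch of the ART1 category choice and vigilance test for binary inputs is given below (Python/NumPy). It follows the standard ART1 formulation (match ratio |x AND w| / |x| compared against ρ, with fast learning) rather than the exact SNNS implementation, and the parameter values rho and beta are illustrative.

    import numpy as np

    def art1_classify(x, weights, rho=0.6, beta=0.5):
        """One presentation of a binary input vector x to an ART1-style network.
        weights: list of binary prototype vectors (one per committed F2 unit).
        Returns the index of the resonating category, or of a newly created one."""
        candidates = list(range(len(weights)))
        while candidates:
            # F2 competition: choose the best-matching committed category
            scores = [np.sum(x * weights[j]) / (beta + np.sum(weights[j])) for j in candidates]
            j = candidates[int(np.argmax(scores))]
            # Vigilance test in F1: is there enough 'resonance' with the prototype?
            match = np.sum(x * weights[j]) / max(np.sum(x), 1)
            if match >= rho:
                weights[j] = x * weights[j]        # fast learning: keep shared critical features
                return j
            candidates.remove(j)                   # reset wave: inhibit this F2 unit, try the next
        weights.append(x.copy())                   # no category resonates: commit a new F2 unit
        return len(weights) - 1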
Figure 5: ART1 network model (F1 feature detection layer, F2 competitive layer, and the orienting subsystem which issues the reset wave).

Figure 6: ARTMAP network model (ARTa and ARTb modules joined by a MAP field; the input pattern is applied to ARTa, and the pattern class is applied to ARTb during training only).
3.3 ARTMAP Network
The ARTMAP network consists of two ART1 models (ARTa and ARTb) and another layer of units designated a MAP field. The map field learns to associate input vectors in R^n with output vectors in R^m during the supervised training phase by the association of training vector pairs (a_i, b_i), where a_i is generally the input vector to be classified, and b_i is the desired output or response. These input vector pairs are applied to the F1a and F1b layers respectively (see Figure 6). During testing, a previously unpresented input vector a is applied to F1a with no corresponding input to F1b. This input vector is then mapped to one of the previously learnt b outputs. Learning is achieved in part by a process of match tracking [4], where the vigilance parameter is automatically increased fractionally in response to a classification error. This process is repeated until a correct classification is made. The vigilance parameter is reset to zero before each subsequent vector input, forcing the network to make a choice from existing classifications before a new category is established. [4] reports 100% classification accuracy of vectors belonging to two classes, and learning that is orders of magnitude faster than in conventional networks.
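The match-tracking idea can be summarised by the simplified Python sketch below. The arta object with its classify method and the map_field dictionary are hypothetical helpers introduced only for illustration; they are not functions from SNNS or from the original ARTMAP papers, and the eps increment is an illustrative value.

    def artmap_train_step(a, target_class, arta, map_field, rho_baseline=0.0, eps=1e-3):
        """One supervised ARTMAP presentation with match tracking (simplified sketch).
        arta      -- an ART module exposing classify(x, rho) -> (category, match_value)
        map_field -- dict mapping ARTa categories to output classes."""
        rho = rho_baseline                       # vigilance starts at its baseline for each input
        while True:
            category, match = arta.classify(a, rho)
            predicted = map_field.get(category)
            if predicted is None:                # uncommitted category: learn the association
                map_field[category] = target_class
                return category
            if predicted == target_class:        # correct prediction: resonance, learning proceeds
                return category
            rho = match + eps                    # match tracking: raise vigilance just above the
                                                 # current match, forcing a reset and a new search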
3.4 Learning Vector Quantiser (LVQ) Network
LVQ networks are very similar in form to feature-map networks, in that they initially use unsupervised training to determine initial reference vector positions for training vector clusters,
based on some type of distance measure (usually Euclidean). This is then followed by supervised training, where decision region boundaries are adjusted by moving these reference vectors either toward (in the case of correct classification) or away from (in the case of incorrect classification) the input training vector. Each output class can be allocated more than one reference vector, allowing complex and disjoint decision boundaries suitable for cases where input classes do not form uniform clusters in vector space. The number of reference vectors per class is usually defined prior to training, but can also be allocated dynamically [14], allowing a more refined network. Classification of a test vector is determined by the class of the reference vector that is closest to this vector, based on a suitable distance measure.
3.5 The Self-Organising Feature Map Network
The self-organising feature map is an unsupervised single-layer competitive network whose neurons learn to recognize groups of similar input vectors in such a way that neighbouring neurons also respond to these vectors (albeit to a lesser degree). The home neuron has a 1-D or 2-D neighbourhood of variable size, and upon winning the right to classify the input vector (during training), both the home neuron and its neighbours are allowed to adjust their weights in response to the input vector (see Figure 7). Before training, weights are initialised to small, random values. During training, all weight vectors move to the area of vector space where input vectors are occurring. As training proceeds, the neighbourhood is gradually reduced, resulting in the weight vectors distributing themselves over this area (in the extreme case the neighbourhood is reduced to 1, the winning unit itself, so only its weights are adjusted closer to the input vector). The self-organising map is a little unusual in that not only does it distribute its weight vectors over the input vector space, but it also distributes weight vectors based on the frequency of occurrence of input vectors in that area. That is, more weight vectors are allocated to areas where input vector density is higher. Typically the learning process is based on Kohonen learning [6, 10], and for the 1-D case is:

    w_m(new) = w_m(old) + k (x - w_m(old)) A_m

where w_m is the weight vector of unit m, 0 <= k <= 1 is the learning rate, x is the input vector, and the vector A defines the neighbourhood area and has coefficients of 1 for the winning unit and less than 1 for neighbouring units. Thus feature map networks self-adjust their weight vectors to the areas covered by the input vectors.
Figure 7: 2-D Feature Map network, with inputs x(0) ... x(N-1) mapped to a 2-D array of units. Concentric circles around the winning neuron indicate neighbourhoods of equal influence defined by A (a matrix in the 2-D case).
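The 1-D Kohonen update quoted above can be written directly in code. The sketch below (Python/NumPy, not from the original Matlab implementation) updates the winner and its neighbours for a single input vector; the learning rate, neighbourhood radius, and neighbourhood coefficients are illustrative choices.

    import numpy as np

    def kohonen_update(weights, x, k=0.1, radius=2):
        """One Kohonen learning step for a 1-D feature map.
        weights: (M, N) array, one weight vector per map unit; x: input vector of length N."""
        m_win = int(np.argmin(np.linalg.norm(weights - x, axis=1)))   # winning (home) neuron
        for m in range(len(weights)):
            d = abs(m - m_win)
            if d <= radius:
                A_m = 1.0 if d == 0 else 0.5 / d   # coefficient: 1 for the winner, <1 for neighbours
                weights[m] += k * (x - weights[m]) * A_m   # w_m(new) = w_m(old) + k (x - w_m(old)) A_m
        return weights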
4 SYSTEM IMPLEMENTATION
4.1 Implementation of the Multi-Layer Perceptron
A multi-layer perceptron network was implemented using the 'BIGNET' feature of the SNNS package [14], a facility which allows rapid implementation of large networks of regular structure.
The architecture of the network was that of a feed-forward network trained using the backpropagation algorithm with momentum and flat-spot elimination [13]. Momentum refers to adding a proportion of the previous weight change to the current weight change, effectively giving the weight 'momentum' to escape false minima in the backpropagation error surface. Flat-spot elimination refers to adding a small constant value to the derivative of the activation function, providing faster convergence when the error surface is very flat. Learning, momentum, and flat-spot terms were set initially based on the SNNS manual's suggested values and then adjusted based on the results obtained after each training run. Two network architectures were trialed: a two-layer network with 84 hidden units/28 output units, and a three-layer net with 112 hidden/56 hidden/28 output units. The layers were fully connected and sigmoidal output functions were used throughout. The input field consisted of a 32x32 pixel matrix fully connected to the 1st (hidden) layer.
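The two modifications described above (momentum and the flat-spot constant) can be sketched as follows in Python/NumPy. This is a generic illustration rather than the SNNS code; the learning-rate, momentum, and flat-spot values are illustrative, not the values used in the experiments.

    import numpy as np

    def sigmoid_derivative(out, flat_spot=0.1):
        """Derivative of the sigmoid written in terms of its output, with a small
        constant added ('flat-spot elimination') so learning does not stall where
        the error surface is very flat."""
        return out * (1.0 - out) + flat_spot

    def update_weights(w, grad, prev_delta, lr=0.2, momentum=0.9):
        """Backpropagation weight change with momentum: a proportion of the previous
        weight change is added to the current one, helping escape false minima."""
        delta = -lr * grad + momentum * prev_delta
        return w + delta, delta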
4.2 Implementation of the ART1 Network
Implementation of an ART1 network in SNNS consisted of defining the F1 and F2 layer dimensions, initialising the weights, choosing the vigilance parameter, and selecting the learning function ART1. The F1 layer dimensions were equal to the input image size (32x32 units), while F2 consisted of 28 units, one unit for each of the 28 Arabic single-character classes. Although the ART1 network self-organises the size of its F2 layer during training, defining the F2 layer size before training effectively sets the maximum number of classes that can be defined. By defining one unit per class, we are hoping that characters of the same class contain predominantly similar features, while characters of different classes contain predominantly different features, i.e. the vectors in each class are well grouped and class boundaries do not overlap. Multiple training runs were performed, with each subsequent training session commencing with a slight increase in the vigilance parameter ρ, until all 28 output units had been allocated.
4.3 Implementation of the ARTMAP Network
The ARTMAP network was implemented in SNNS by using the 'BIGNET' facility. Layer dimensions were set to: F1a 1024 units, F2a 84 units, F1b 28 units, F2b 28 units. The provided ARTMAP initialisation, learning, and update functions were chosen, and the vigilance parameter values ρa, ρb, and ρ were set to 0.3, 1.0, and 1.0 respectively.
4.4 Implementation of the LVQ Network
SNNS provides a unique type of LVQ network called Dynamic LVQ [14], so named because intra-class units (units associated with the same class) are allocated dynamically by the training algorithm when required. Thus well-grouped class vectors may have only one unit allocated, while class vectors forming more complex cluster shapes in vector space may be allocated up to a set maximum of units, allowing the definition of more complex class boundaries. Implementation of the network in SNNS consisted simply of the definition of the input layer (32x32 units), as all other units are allocated dynamically during the training process. The provided DLVQ initialisation, learning, and update functions were chosen, along with the following training parameter values:
- ε+, defined as the step distance a class vector is moved towards the input vector when correctly classified (ε+ = 0.03);
- ε-, defined as the step distance a class vector is moved away from the input vector when misclassified (ε- = 0.03);
- cycles, the number of cycles of training performed before new class reference vectors are allocated (set to 3).
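A minimal sketch of the supervised LVQ update using these step parameters is given below (Python/NumPy). It shows the generic LVQ rule rather than the exact SNNS DLVQ algorithm; the function and variable names are illustrative.

    import numpy as np

    def lvq_update(ref_vectors, ref_classes, x, x_class, eps_plus=0.03, eps_minus=0.03):
        """One supervised LVQ step: move the nearest reference vector towards the
        input if its class is correct, away from it otherwise."""
        distances = np.linalg.norm(ref_vectors - x, axis=1)   # Euclidean distance measure
        nearest = int(np.argmin(distances))
        if ref_classes[nearest] == x_class:
            ref_vectors[nearest] += eps_plus * (x - ref_vectors[nearest])    # move towards input
        else:
            ref_vectors[nearest] -= eps_minus * (x - ref_vectors[nearest])   # move away from input
        return ref_vectors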
The network was trained with a total of 1435 examples from the training set.

Table 2: The 28 single-character classes and the 17 'root character shape' classes.
4.5 Implementation of the Kohonen Feature Map Network
As the feature map network requires inputs in vector (as opposed to matrix) form, all training and test character matrices were transformed to vectors by placing the columns of the image matrix in series. All input vectors were then normalised. Thus the input field was of dimension 1x1024 units. A total of 100 output units were used, allowing the network to allocate more than one unit per character class. This was done to allow weight vectors of each class to form more complex class boundaries in input vector space. The network was then trained for 1000 cycles.
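The vectorisation step described above (stacking the image columns in series and normalising) can be expressed as a short Python/NumPy snippet; this is an illustrative sketch, not the Matlab code used in the experiments.

    import numpy as np

    def image_to_input_vector(image):
        """Convert a 32x32 binary character image to a normalised 1x1024 input vector
        by placing the columns of the image matrix in series."""
        v = image.flatten(order='F').astype(float)   # column-major ('F') stacking of columns
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v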
5 RESULTS

Classification performance for the five networks is now discussed, followed by an analysis of the results obtained.
5.1 Multi-layer perceptron
The multi-layer perceptron converged well for the training set of 1435 patterns in under 30 cycles. As an aside, using standard backpropagation, the network took more than 180 cycles to converge with a much higher mean-square error (we speculate that this was due to local minima). Classification performance for both the 2-layer and 3-layer networks using the training set was 100%, falling to an average of 73% for both the 2-layer and 3-layer networks using the test set. Of the misclassified characters, approximately 30% were classified to classes belonging to the same root character shape. Classification accuracy for the 17 root character classes (see Table 2) was approximately 80% based on the above results. Poor classification was experienced for three characters in particular, each of which was misclassified as a similar character shape.
5.2 ART1 and ARTMAP networks
Performance of the ART1 network was quite poor. Even at low vigilance settings, severe misclassification resulted: both the (expected) classification of input characters of several different classes to the same output class, as well as the unexpected misclassification of one character class to several output classes, indicating that sufficient variation in writing style produced character classes with significant intra-class feature variations. It was impossible to find an adequate compromise vigilance setting which provided even mediocre performance.

The performance of the ARTMAP network was also poor, but from a computational viewpoint. Because of the large size of the net (1.7 million links), one training cycle of 1435 characters took several days to complete (on a multi-user DECstation). Due to limited time, classification performance for this network is unavailable.
5.3 Learning Vector Quantiser
The Dynamic LVQ network implemented in SNNS provided 100% classification of the 1435 training set patterns in an average of 16 cycles. Only 28 units were dynamically generated in each training session, one unit per class. Average classification accuracy was 65% for the 28 character classes, rising to 73% for the 17 root character classes. Poor classification performance was experienced for several characters, which were misclassified as similar character shapes.
5.4 Kohonen Feature Map
Classification performance for the feature-map network implemented in Matlab was 51% for the 28 character classes, rising to 54% for the 17 root classes. A number of characters were poorly classified (less than 30% classification accuracy). We speculate that this result indicates poor clustering of the input vectors (when the image matrix is vectorised in the manner discussed previously), and demonstrates the unsuitability of a feature-map network of this type for this kind of classification problem without some form of pre-processing or feature extraction being performed first.
In summary, classification performance of the networks investigated was mixed, with particularly poor performance shown by the feature map and ART networks. Accurate classification performance for the feature map network was difficult to determine, due to the complications in associating a particular output unit with only one class when severe misclassification has occurred. That is, during training and testing, input vectors from several classes may have been mapped to one output class, with no one class being in the majority (a result of poor clustering of input vector classes and/or an insufficient number of weight vectors to model the input vector distributions accurately). The best classification accuracy was exhibited by the MLP and LVQ networks. Two characters differing only by a single dot were severely misclassified by all networks. This was expected, as feature variations due to writing style are far greater than feature variations due to the addition of one dot. Table 3 shows classification performance as a percentage of correctly classified characters, for those networks with reasonable performance (MLP, LVQ, and Feature Map).
6 CONCLUSIONS

We have presented comparative results of classification performance for five neural network architectures based on input patterns without pre-processing or feature extraction. Results show that the multi-layer perceptron provides the best, albeit still poor, classification performance of the five networks trialed when trained with raw data. We can apportion part of the poor performance of the networks trialed to the use of a training set which contained a limited number of examples per class. However, we speculate that the measured classification error was mainly due to high character deformation resulting from the unconstrained nature of the handwritten characters. This indicates a definite need to provide features invariant to deformation, through pre-processing and feature extraction. The high misclassification of characters within the same root class indicates that special emphasis should be placed on feature extraction which exploits the presence or absence of dots within a character. Future work will focus on this area, and on extending the training set to include all four forms of each character.
Network   28 classes   17 classes
MLP       73%          81%
LVQ       67%          74%
F-map     50%          54%

Table 3: Classification performance as a percentage of correctly classified patterns. The last column shows performance for the 17 root character classes defined in Table 2.
References
[1] Walker R.F., Bennamoun M., "QUT-SPRC Arabic Character Database", 2014 handwritten examples of the 28 isolated Arabic characters (in binary, 32x32 pixels), [ftp ftp.qut.edu.au, in /papers/sprc/OCR/arabic 2014.pat], 1993.
[2] Carpenter G., Grossberg S., "A Massively Parallel Architecture for a Self-Organising Neural Pattern Recognition Machine", Computer Vision, Graphics, and Image Processing, Vol. 37, pp. 54-115, 1987.
[3] Carpenter G., Grossberg S., "The ART of Adaptive Pattern Recognition by a Self-Organising Neural Network", Computer, Vol. 21, pp. 77-88, 1988.
[4] Carpenter G., Grossberg S., Reynolds J., "ARTMAP: Supervised Real-Time Learning and Classification of Nonstationary Data by a Self-Organising Neural Network", Neural Networks, Vol. 4, pp. 565-588, 1991.
[5] Fontaine T., Shastri L., "Recognising Handprinted Digit Strings: a Hybrid Connectionist/Procedural Approach", [ftp linc.cis.upenn.edu:/pub/fontaine/fontaine.cogsci93.ps.Z], University of Pennsylvania, PA, 1993.
[6] Hecht-Nielsen R., "Neurocomputing", Addison-Wesley, 1990.
[7] Kageyu S., Ohnishi N., Sugie N., "Augmented Multi-layer Perceptron for Rotation-and-Scale Invariant Handwritten Numeral Recognition", 1991 IEEE International Joint Conference on Neural Networks, pp. 54-59, Singapore, Nov. 1991.
[8] Le Cun Y., et al., "Handwritten Digit Recognition: Applications of Neural Network Chips and Automatic Learning", IEEE Communications Magazine, IEEE, 1989.
[9] Lippmann R.P., "An Introduction to Computing with Neural Networks", IEEE ASSP Magazine, pp. 4-22, 1987.
[10] Demuth H., Beale M., "Neural Network Toolbox", The MathWorks Inc., 1992.
[11] Rosenblatt F., "Principles of Neurodynamics", Spartan Books, Washington DC, 1961.
[12] Rumelhart D.E., "Parallel Distributed Processing", Plenary Lecture, Int. Conf. on Neural Networks, San Diego, July 1988.
[13] Rumelhart D.E., McClelland J.L., "Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vols. I & II", MIT Press, Cambridge MA, 1986.
[14] Zell A., et al., "SNNS User Manual, Version 3.0", University of Stuttgart, 1993.
[15] Wechsler H., Zimmerman G.L., "2-D Invariant Object Recognition using Distributed Associative Memory", IEEE Trans. PAMI, Vol. 10, No. 6, pp. 811-821, 1988.