ICGST-GVIP Journal, ISSN: 1687-398X, Volume 9, Issue 4, August 2009
Combining Self-Organizing Maps and Radial Basis Function Networks for Tamil Handwritten Character Recognition
S. Santhosh Baboo*, P. Subashini**, M. Krishnaveni**
*P.G. & Research Dept. of Computer Applications, D.G. Vaishnav College, Chennai, India
**Department of Computer Science, Avinashilingam University for Women, Coimbatore, India
[email protected],
[email protected] ,
[email protected] http://www.avinuty.ac.in
Abstract
Considerable effort has been devoted to making computers recognize both typed and handwritten characters automatically. Until quite recently, the characters of the English language have been the main focus of this research, while Tamil, among the Asian languages, has received little or no attention. The challenges posed by Indian languages differ from those of English, and there has been very little research on machine recognition of Indian scripts; exhaustive experimentation is therefore necessary to gain a good insight into the script from a machine-recognition point of view. The methods currently in wide use for these languages mainly involve pattern matching with image-processing techniques, and one limitation of such methods is their inability to respond to variation. In this paper, we report results on the recognition of handwritten Tamil characters using two different approaches. The first is a SOM-based method in which classification over the extracted features is performed by unsupervised learning. In the second approach, a combination of RBF and SOM is used so as to exploit its dynamic training principles in our classification network. The classification ability of the RBF-SOM is compared with that of the SOM network, based on a scanned database of features extracted by the preprocessing techniques. The assessment is in terms of average recognition accuracy and the number of training samples required to obtain acceptable performance. An error analysis is also performed to determine the advisability of combining the classifiers. The conclusions obtained support a better approach to the recognition process.

Keywords: Self-organizing maps, Radial basis function, classifiers, recognition, learning rate, feature selection.

1. Introduction
Every character in a language forms a class; character recognition therefore involves classification of characters into multiple classes. There are 156 distinct symbols/characters in Tamil, of which 12 are pure vowels and 23 are pure consonants [5][11]. This set of 35 characters forms the basic character units of the script, and the remaining character classes are vowel-consonant combinations composed of two parts: the basic character and a modifier symbol corresponding to it. This allows the characters to be organized into hierarchies, simplifying the classification process. Since feature extraction is a prerequisite for classification, it plays a prominent role in the recognition process [9][11]. Feature selection has been a fertile field of research and development since the 1970s in statistical pattern recognition, machine learning, and data mining, and has been widely applied to fields such as text categorization and image retrieval [19][16]. It is one of the most important and frequently used preprocessing techniques in character recognition [9][11]: it reduces the number of features; removes irrelevant, redundant, or noisy data; and brings immediate benefits to applications by speeding up the classifiers and improving recognition performance, such as predictive accuracy and result comprehensibility. Feature selection is a process that selects a subset of the original features. Here we experiment with two different approaches to character recognition. The first is based on a self-organizing map, in which classification is done by unsupervised learning [7][5]. In the second approach, a combination of RBF and SOM is used: classification proceeds through its dynamic training principles, and the outcome is obtained by both supervised and unsupervised training. The paper is organized as follows. Section 2 deals with character localization and segmentation. Section 3 deals with the approaches used for selecting and extracting features. Section 4 presents the classification networks.
Section 5 discusses the comparison of the SOM and RBF-SOM classifiers and the experimental results. The paper wraps up with remarks on possible future work in this area and some conclusions.

2. Character localization & segmentation
The input image is initially processed to improve its quality and to prepare it for the subsequent stages of the method. In a recognition process, segmentation is the problem of identifying or grouping together the various parts of a composite object [11]. The main task of character localization and segmentation is to identify the characters and to cut them out as individual images for further processing [5][10]. A set of elements (objects) in the scanned document, prepared as a binary image, is localized: the picture is first thresholded so that the characters appear in a color different from the background [21]. The elements present in the scanned document are identified in a labeling process and then take part in a set of elimination and grouping operations that yield the set of characters it contains. Character normalization is then performed in a way that does not distort the character shapes [12]: the mean and variance are computed and the character is scaled accordingly [5][10]. Characters written in two strokes are connected to form one-stroke characters by filling in points between the first and second strokes, depending on the distance between the last point of the first stroke and the first point of the second. To achieve independence from stroke width, the next preprocessing step is stroke-width normalization through skeletonization [12]. Character segmentation is done by finding the boundary of the image [10]; after locating the blank regions around each character, cropping is performed to segment the individual characters in the text document [5][4].

3. Feature extraction and selection
The ease with which humans classify and describe patterns often leads to the incorrect assumption that this capability is easy to automate. Sometimes the similarity within a set of patterns is immediately apparent, whereas in other instances it is not [19]. A pattern can be as basic as a set of measurements or observations, perhaps represented in vector or matrix notation [5][4]. The use of measurements already presupposes some preprocessing and instrumentation-system complexity. Here, features are higher-level entities that are geometric descriptors of a character in a scanned image [19]. In this module the key is to choose and extract features that (i) are computationally feasible, (ii) lead to good pattern-recognition system performance, and (iii) reduce the problem data to a manageable amount of information without discarding valuable information. From the extracted features, feature selection is performed so that the selected features are relevant to the recognition task at hand. The features considered for character recognition here are the height (Hbb) and width (Wbb) of a character's bounding box, the area of the connected component (Acc), the area of the bounding box (Abb), and the centroids. Features that assess the scale of the character are combined with more discriminative features [3][4]. Five geometrical features are thus considered. Even though discarding some features means discarding information, feature selection can be used to reduce training time and to increase the classification quality of neural classifiers [7][5]. Figure 1 below gives the complete picture of the system used here.

Figure 1: Complete Tamil handwritten recognition system (pipeline: original word image → thresholding of gray-level image → slant correction of word image → segmentation → normalization of segmented characters → feature extraction of segmented characters → recognition of characters using a trained RBF-SOM → matching of character strings to a Tamil character)

4. Formal definition of RBF-SOM
An RBF-SOM can be seen as a combination of a radial basis function network and a self-organizing map [7]. An RBF network is a three-layered neural network with an input, a hidden, and an output layer [7][5]. Consecutive layers are totally connected; furthermore, the input and output layers can also be totally connected (so-called shortcut connections). The net input of the input layer is the same as the external input (the input pattern). The hidden layer uses the distance between weight vector and input vector as net input [20], while the output layer uses a linear combination as net input. Neurons of the input layer use the identity function as activation function, neurons of the hidden layer use radially symmetric basis functions, and neurons of the output layer use sigmoid functions (or, in general, the identity function) as activation functions, respectively.
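The layer structure just described can be sketched numerically as follows. This is a minimal illustration only, not the authors' implementation: the layer sizes are hypothetical, a Gaussian basis function is assumed for the hidden layer, and the optional shortcut connections between input and output layer are omitted.

```python
import numpy as np

# Hypothetical sizes: 5 input features (e.g. Hbb, Wbb, Acc, Abb, a centroid
# coordinate), 4 map neurons, 3 output classes.
rng = np.random.default_rng(0)
W_H = rng.normal(size=(4, 5))   # prototype (map) vectors, one row per map neuron
W_O = rng.normal(size=(3, 4))   # weights between map and output layer
b_O = np.zeros(3)               # biases of the output neurons
p = np.ones(4)                  # radii p_j of the basis functions

def forward(x):
    # Input layer: identity activation, a^(I) = s^(I) = x.
    a_I = x
    # Map layer: net input is the distance to each prototype vector;
    # the activation applies an (assumed Gaussian) basis function.
    s_H = np.linalg.norm(a_I - W_H, axis=1)
    a_H = np.exp(-(s_H / p) ** 2)
    # Output layer: linear combination of map activations plus bias,
    # passed through a sigmoid activation.
    s_O = W_O @ a_H + b_O
    return 1.0 / (1.0 + np.exp(-s_O))

y = forward(np.array([1.0, 0.5, 0.2, 0.4, 0.3]))
print(y.shape)  # → (3,)
```

The sigmoid output keeps every class score in (0, 1), so the largest component can be read as the predicted character class.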
Formally, an RBF-SOM consists of the following components:

1. A set of neurons μ = μ_I ∪ μ_H ∪ μ_O with μ_I ≠ ∅, μ_H ≠ ∅, μ_O ≠ ∅, and with μ_I, μ_H, μ_O pairwise disjoint. μ_I is further referred to as the input layer, μ_H as the hidden layer or map layer, and μ_O as the output layer.

2. A scalar ω_(i,j)^(n) ∈ ℝ is assigned to each pair of neurons (i, j) ∈ μ_m × μ_n with m ≠ n, m ∈ {I, H} and n ∈ {H, O}. The scalar is also called the weight of the connection between neuron i in layer m and neuron j in layer n. The neurons and the connections together define the structure of the neural network.

3. For every neuron of the input layer j ∈ μ_I, the j-th external input of the network for pattern k is defined by x_j(k) ∈ ℝ. For every neuron of the output layer j ∈ μ_O, the j-th external output of the network for pattern k is defined by y_j(k) ∈ ℝ. The vector x(k) = (x_1(k), ..., x_|μ_I|(k)) is called the input pattern and the vector y(k) = (y_1(k), ..., y_|μ_O|(k)) is called the output pattern of the network.

4. A propagation function s_j^(l) is assigned to every neuron j ∈ μ_l to calculate its net input (with l ∈ {I, H, O}):
- For each neuron j ∈ μ_I: s_j^(I)(k) = x_j(k)
- For each neuron j ∈ μ_H: s_j^(H)(k) = ||x(k) − ω_j^(H)||
- For each neuron j ∈ μ_O: s_j^(O)(k) = Σ_{i∈μ_I} ω_(i,j)^(O) · a_i^(I)(k) + Σ_{i∈μ_H} ω_(i,j)^(O) · a_i^(H)(k) + ω_(B,j)^(O)
Here ||·|| denotes the length of a vector, defined as ||x|| = √(x, x) with (·,·) being the standard scalar product in ℝ^|μ_I|, and ω_(B,j)^(O) is called the bias. The vector ω_j^(H) = (ω_(1,j)^(H), ω_(2,j)^(H), ..., ω_(|μ_I|,j)^(H)) ∈ ℝ^|μ_I| is called the prototype or map vector.

5. An activation function a is assigned to every neuron j ∈ μ:
- For each neuron j ∈ μ_I: a_j^(I)(k) = s_j^(I)(k) = x_j(k)
- For each neuron j ∈ μ_H: a_j^(H)(k) = φ_j(s_j^(H)(k), p_j)
- For each neuron j ∈ μ_O: a_j^(O)(k) = σ_j(s_j^(O)(k)) = y_j(k)
a_j^(l)(k) is called the activation of neuron j in layer l for pattern k. Here σ is a (generally sigmoid) activation function and φ_j a basis function (generally a radially symmetric basis function); φ_j has an additional parameter p_j, further referred to as the radius of the basis function.

Following the SOM, a D-dimensional (generally 1- or 2-dimensional) topology is defined on the hidden layer, and a coordinate vector is assigned to every neuron of the hidden layer. This enables the RBF-SOM to generate a topology-preserving mapping during the training process. The hidden layer of the RBF-SOM is therefore also called the map layer [17][7].

Radius function of the RBF-SOM
A radius function of an RBF-SOM is a function ℝ^D → ℝ≥0, with D generally being 2 (two-dimensional map layer) [17][18]. A radius function measures the distance of a point in ℝ^D to the origin of the coordinate system with the help of a norm.

Distance function of the RBF-SOM
A distance function of an RBF-SOM is a monotonically decreasing function ℝ≥0 → ℝ≥0. A distance function is used to model the decreasing neighborhood of neurons with progressing time [7].

Neighborhood function of the RBF-SOM
A neighborhood function of an RBF-SOM is a function ℝ≥0 × ℝ≥0 → ℝ [7]. A neighborhood function maps a radius r (given by the output of a radius function) and a distance d (given by the output of a distance function) to the degree of neighborhood.

Learning rate function of the RBF-SOM
A learning rate function η: ℝ≥0 → [0, 1] is a monotonically decreasing function with lim_{t→∞} η(t) = 0 [13][4]. A learning rate function is used to model the decrease of the learning rate over time.

Winner neuron
The map neuron j′ ∈ μ_H with the lowest net input is called the winner neuron. The winner neuron is defined by j′ = arg min_{j∈μ_H} s_j^(H)(k).

Training of the RBF-SOM
The training of an RBF-SOM can easily be adapted from the training of RBF networks and SOMs [7][5]. The training is a two-stage process of unsupervised and supervised training.

Stage 1: Unsupervised training of the weights between the input and map layers. This stage is derived from the SOM learning algorithm [17] and creates a topology-preserving mapping from the input space to the (generally two-dimensional) map space.

Stage 2: Supervised training of the weights between the map and output layers (e.g., by singular value decomposition). This stage can be adapted from the learning algorithm of an RBF network [13][4]. RBF networks are usually trained with a three-stage process: in the third stage, the weights between the input layer and the hidden layer and the radii are further optimized by non-linear optimization, e.g., back-propagation. This stage is not used for the RBF-SOM, because further training of those weights could destroy the topology-preserving mapping created in stage one.

First stage of the training process
In the first stage of the learning process, the weights between the input and the map layer are adjusted [2]; the algorithm can be seen as an unsupervised learning algorithm. Using the definitions above, it proceeds as follows:
1. Initialize all weights ω_(i,j)^(l) with l ∈ {H, O} of the RBF-SOM with 0.
2. For each pattern k of the unsupervised learning process, execute the following steps:
(a) Use x(k) as external input.
(b) Propagate the input and determine the net input of each neuron j ∈ μ_H of the map layer.
(c) Determine the winner neuron j′ ∈ μ_H and the coordinate vector h_{j′} of the winner neuron.
(d) Determine for each neuron j ∈ μ_H the degree of neighborship to the winner neuron. The degree of neighborship is determined by the radius, distance, and neighborhood functions in the following way: λ_{k,p}(j) = f(r_norm, D(h_j − h_{j′}), d(p))
(e) Modify each weight vector ω_j^(H) for the pattern k in the following way: Δ_k ω_j^(H) = (x(k) − ω_j^(H)) · η(p) · λ_{k,p}(j)
3. Test whether a termination condition is met. If so, the learning process has ended successfully; otherwise, continue the learning process from step 2.

Second stage of the training process
In the second stage of the learning process, the weights between the map and the output layer are determined [13][4]; the weights between the input and map layers were already adjusted in stage one. To formulate the learning algorithm we need the following definitions.

Target vector: The vector t_j = (t_j(1), t_j(2), ..., t_j(|L|))^T is called the target vector; it contains the j-th component of each target pattern.

Weight vector: The vector ω_j^(O) = (ω_(1,j)^(O), ω_(2,j)^(O), ..., ω_(|μ_H|,j)^(O)) is called the weight vector and contains the weights of the connections between the map layer and the output neuron j ∈ μ_O.

Activation matrix: The activation matrix contains the activations of each map neuron for each input pattern and is defined by
A^(H) = [ a_1^(H)(1) ... a_|μ_H|^(H)(1) ; ... ; a_1^(H)(K) ... a_|μ_H|^(H)(K) ]

For each neuron j ∈ μ_O the following approximation problem can be formulated: A^(H) · ω_j^(O) = t_j. The approximation problem can be solved by ω_j^(O) = (A^(H))^(−1) · t_j. In practical applications, the explicit matrix inversion is replaced by an efficient, numerically robust method, for instance singular value decomposition.

5. Comparison of SOM and RBF-SOM
Character-recognition experiments vary in many factors, such as the sample data, the preprocessing techniques, the feature representation, the classifier structure, and the learning algorithm [15][11]. Here, two different classification/learning methods are applied to the same feature data, since a better scheme for comparing classifiers is to train them on a common feature representation.

The purpose of the empirical study was twofold. The first aspect was to verify whether SOM networks do in fact provide consistently better results than an RBF-SOM network for Tamil character recognition. The second purpose was to investigate the effect of training-set variation on the performance of the two networks. An evolutionary algorithm is used for feature selection and model optimization, where model optimization includes the number of neurons and the learning rate. Independent test data is used for the comparison of the networks; it is divided into two types: validation data sets, used for evaluation during training, and test data sets, used for evaluation of the population.

The RBF-SOM and SOM networks are trained as classifiers on a set of scanned Tamil character documents. Knowledge from recent research on handwritten characters with RBF networks is integrated into the optimization process to determine a good starting point for the search [3]; a complete initialization consisting of all combinations of features is not feasible. The research conducted here indicates that SOMs with few basis functions
are able to achieve only limited classification results for large data sets [17]. The choice of learning rate affects the length of the training period. An RBF network can yield accuracy competitive with the SOM when all parameters are trained by error minimization. The accuracy of the RBF-SOM classifier is more stable with respect to the training sample size, and at the same level of classification accuracy [17][18], the RBF-SOM is less expensive in storage and execution.
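The learning rate mentioned here is the function η defined in Section 4: monotonically decreasing with η(t) → 0 as t → ∞. A common choice satisfying those conditions is exponential decay; the constants below are illustrative assumptions, not values from this study.

```python
import math

def eta(t: float, eta0: float = 0.5, tau: float = 100.0) -> float:
    """Learning-rate function: eta(0) = eta0, monotonically decreasing,
    and eta(t) -> 0 as t -> infinity."""
    return eta0 * math.exp(-t / tau)

print(eta(0))                      # → 0.5
print(eta(10) > eta(20) > eta(1000))  # → True (monotonically decreasing)
```

With a large eta0 the prototype vectors are overwritten aggressively by each new input, while a small eta0 slows convergence; this trade-off is what Figures 2 and 3 later examine empirically.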
Results and findings
After preprocessing the sub-images of the isolated characters, we extract their geometric features (area, centroids, and bounding box) as the elements of the feature vectors, which are fed to the classifier stage. In the classifier stage we use a SOM neural network with a batch unsupervised training algorithm and an RBF-SOM with the RBF and SOM learning algorithms, set up as two different classifiers.

Table 1: Performance Evaluation (1000 characters)

NN Classifier | Classification rate (counts) | Classification rate (%) | Recognition rate (counts) | Recognition rate (%)
SOM | 678 | 79% | 768 | 82.6%
RBF-SOM | 745 | 89% | 850 | 96.9%

Table 1 is read as follows. The system was trained with 1000 characters belonging to all of the classes. The testing data is separate and consists of a set of around 800 characters; a fraction of the training data was also used to test the system, to see how well the system performs in the classification and recognition process. Altogether, 50 text lines were taken through the segmentation and preprocessing stage, and every character in each text line was suitably segmented for the classification method. The segmentation stage was almost 99% accurate, resulting in only 1% preprocessing error. On the test set, a recognition rate of 82.6% was achieved for the SOM and 96.9% for the RBF-SOM classifier. More precisely, the training set produced a much higher recognition rate than the test set. The size of the training file is also responsible for the time taken to train a neural net: the larger the training file, the longer the training takes. Accuracy is the degree to which the output generated by the SOM or RBF-SOM correctly recognizes a character. When accuracy (measured as a percentage) was low after the initial test, the network was re-trained to obtain more accurate output values. The number of errors contained in the output was taken into consideration, and the different sources of error were also investigated. Percentage accuracy was calculated as:

Accuracy = Number of correct classifications / Total number of classifications

Accuracies lower than 90% are often reported for difficult cases such as unconstrained cursive-script recognition. Since improved accuracy is always desired, the combination of RBF and SOM is attractive, as it is more efficient at producing better and more accurate results. A significant factor disregarded in this study was the use of control properties such as learning rate, momentum, and error tolerance. While the learning rate controls the degree of the changes made to the connection weights, momentum causes the errors from previous training patterns to be averaged together over time and added to the current error, and error tolerance controls the closeness of the output value to the desired output.

Figures 2 and 3 show the behavior of the SOM and the RBF-SOM with different learning rates initialized during the training epochs. With a high learning rate, units easily overwrite their prototype vectors with each new kind of input vector. The empirical study shows that with a smaller learning rate the units likewise fail to preserve their prototype vectors and provide less accuracy. Similar data clusters in a particular region of the environment populated with sensor units.

Figure 2: The error rate of the SOM when initializing the learning rate
Figure 3: The error rate of the RBF-SOM for a given learning rate during classification
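The percentage-accuracy formula above can be computed directly from the classification counts. A minimal sketch follows; the counts used here are purely illustrative examples, not the exact figures of Table 1.

```python
def percentage_accuracy(correct: int, total: int) -> float:
    """Accuracy = number of correct classifications / total classifications,
    expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

# Illustrative counts only (a hypothetical test run of 800 characters):
print(percentage_accuracy(661, 800))  # → 82.625
print(percentage_accuracy(775, 800))  # → 96.875
```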
Conclusion
In this research we have evaluated two kinds of classifiers (a SOM neural network and an RBF-SOM) for the recognition of isolated handwritten Tamil characters. In the RBF-SOM we have two stages of classification, with the best parameters obtained by simulation. Experimental results indicate that the RBF-SOM classifier has the better performance in the test phase; however, it has a longer training time than the SOM neural network. The combination of RBF and SOM can automatically adapt to classification results without complete re-training. Combining complementary classifiers can improve the classification accuracy and the trade-off between error rate and reject rate. The preprocessing and feature-extraction techniques, and the option of generating distorted samples, affect the recognition accuracy. Compared to the training of RBF networks, the training of the RBF-SOM is very slow, and a new approach could be adopted to reduce its training time. A hybrid statistical/discriminative classifier may yield higher accuracy than either the pure statistical or the pure discriminative classifier.
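The two-stage RBF-SOM training evaluated above can be sketched as follows. This is a minimal illustration under assumed choices, not the authors' implementation: the data is random, the basis functions are Gaussian, the neighborhood function is Gaussian with a linearly shrinking radius, and the second stage uses a least-squares solve (which relies on SVD internally, in the spirit of the SVD-based solution the paper suggests).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((200, 5))                    # 200 training patterns, 5 features
T = np.eye(3)[rng.integers(0, 3, 200)]      # one-hot targets for 3 classes

side = 4                                    # 4x4 two-dimensional map layer
coords = np.array([(i, j) for i in range(side) for j in range(side)], float)
W_H = rng.random((side * side, 5))          # prototype (map) vectors

# Stage 1: unsupervised SOM training of the input-to-map weights.
epochs = 40
for t in range(epochs):
    eta = 0.5 * (1 - t / epochs)            # decreasing learning rate
    radius = 2.0 * (1 - t / epochs) + 0.5   # shrinking neighborhood radius
    for x in X:
        winner = np.argmin(np.linalg.norm(x - W_H, axis=1))
        d = np.linalg.norm(coords - coords[winner], axis=1)
        lam = np.exp(-(d / radius) ** 2)    # degree of neighborship
        W_H += eta * lam[:, None] * (x - W_H)

# Stage 2: supervised least-squares fit of the map-to-output weights,
# solving A^(H) . w_j = t_j for every output neuron at once.
A_H = np.exp(-np.linalg.norm(X[:, None, :] - W_H[None, :, :], axis=2) ** 2)
W_O, *_ = np.linalg.lstsq(A_H, T, rcond=None)

pred = np.argmax(A_H @ W_O, axis=1)
train_acc = np.mean(pred == np.argmax(T, axis=1))
```

The nested loop over every pattern in every epoch is what makes stage one slow, which is consistent with the long RBF-SOM training times reported above; batch SOM updates are one way to reduce this cost.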
References
[1] J. H. Bae, K. C. Jung, J. W. Kim, H. J. Kim. Segmentation of touching characters using an MLP. Pattern Recognition Letters, 19(8): 701-709, 1998.
[2] R. G. Casey, E. Lecolinet. A survey of methods and strategies in character segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7): 690-706, 1996.
[3] J. X. Dong, A. Krzyzak, C. Y. Suen. High accuracy handwritten Chinese character recognition using support vector machine. Proc. Int. Workshop on Artificial Neural Networks for Pattern Recognition, Florence, Italy, 2003.
[4] T. Evgeniou, C. A. Micchelli, M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6: 615-637, 2005.
[5] M. Ginsberg. Essentials of Artificial Intelligence. Morgan Kaufmann, San Mateo, CA, 1993.
[6] A. Hofmann, B. Sick. Evolutionary optimization of radial basis function networks for intrusion detection. Submitted to the International Joint Conference on Neural Networks, 2003.
[7] D. R. Hush, B. G. Horne. Progress in supervised neural networks: what's new since Lippmann? IEEE Signal Processing Magazine, 10: 8-39, 1993.
[8] Q. Ye, Q. Huang, W. Gao, D. Zhao. Fast and robust text detection in images and video frames. Image and Vision Computing, 23(6): 565-576, June 2005.
[9] M. Klimek. Evolutionäre Architekturoptimierung von RBF-Netzen. Master's thesis, Universität Passau, Lehrstuhl für Rechnerstrukturen, 2003.
[10] R. Kohavi, G. John. The wrapper approach. In: Feature Selection for Knowledge Discovery and Data Mining, pp. 33-50, 1998.
[11] C.-L. Liu, K. Nakashima, H. Sako, H. Fujisawa. Handwritten digit recognition: benchmarking of state-of-the-art techniques. Pattern Recognition, 36(10): 2271-2285, 2003.
[12] C.-L. Liu, K. Nakashima, H. Sako, H. Fujisawa. Handwritten digit recognition: investigation of normalization and feature extraction techniques. Pattern Recognition, 37(2): 265-279, 2004.
[13] C.-L. Liu, H. Sako, H. Fujisawa. Discriminative learning quadratic discriminant function for handwriting recognition. IEEE Transactions on Neural Networks, 15(2): 430-444, 2004.
[14] K. Negishi, M. Iwamura, S. Omachi, H. Aso. Isolated character recognition by searching features in scene images. First International Workshop on Camera-Based Document Analysis and Recognition, pp. 140-147, 2005.
[15] A. F. R. Rahman, M. C. Fairhurst. Multiple classifier decision combination strategies for character recognition: a review. International Journal on Document Analysis and Recognition, 5(4): 166-194, 2003.
[16] R. Raina, Y. Shen, A. Y. Ng, A. McCallum. Classification with hybrid generative/discriminative models. Advances in Neural Information Processing Systems 16, 2003.
[17] M. C. Su, H. T. Chang. Fast self-organizing feature map algorithm. 2000.
[18] B. Sick. Technische Anwendungen von Soft-Computing-Methoden. Lecture notes, University of Passau, Faculty of Mathematics and Computer Science, 2001.
[19] Shicong Feng, Zhigang Zhang, Xiaoming Li. Implementation and applications of a Chinese web page automatic categorization approach. Computer Engineering, 30(5), 2004.
[20] J. Sun, Y. Hotta, K. Fujimoto, Y. Katsuyama, S. Naoi. Grayscale feature combination in recognition-based segmentation for degraded text string recognition. First International Workshop on Camera-Based Document Analysis and Recognition, pp. 39-44, 2005.
[21] S. Veni, K. A. Narayanankutty, M. Kiran Kumar. Design of architecture for skeletonization on hexagonal sampled image grid. ICGST-GVIP Journal, 9(1), February 2009.
Lt. Dr. S. Santhosh Baboo has around seventeen years of postgraduate teaching experience in Computer Science, including six years of administrative experience. He is a member of the board of studies at several autonomous colleges and designs the curricula of undergraduate and postgraduate programmes. He is a consultant for starting new courses, setting up computer labs, and recruiting lecturers for many colleges. Holding a Master's degree and a Doctorate in Computer Science, he is presently working as a Reader. Areas of specialization: data mining, image processing, and software engineering.

Dr. P. Subashini has 15 years of teaching experience and is working as an Associate Professor. Areas of specialization: image processing, pattern recognition, and neural networks. Email: [email protected]

Ms. M. Krishnaveni has 2 years of research experience and is working as a Research Assistant on a Naval Research Board project. Areas of specialization: image processing, pattern recognition, and neural networks. Email: [email protected]