S.R. Kodituwakku et. al. / International Journal of Engineering Science and Technology Vol. 2(11), 2010, 6031-6034

INVESTIGATING A FUZZY APPROACH FOR HANDWRITTEN SINHALA CHARACTER RECOGNITION

¹S. R. Kodituwakku and ²P. S. Nilanthi

¹,² Department of Statistics & Computer Science, University of Peradeniya, Sri Lanka

Abstract

Much effort has been devoted to recognizing both online and off-line characters automatically. Although many approaches have been proposed, most of them focus on characters of the English language. Little attention has been given to Asian languages such as Sinhala and Tamil. This paper reviews the existing approaches and presents a new approach for off-line Sinhala character recognition. To facilitate recognition, a fuzzy logic based segmentation approach is used. A subset of the Sinhala alphabet written by different persons is chosen for the study. The set includes smoothly written characters, similarly shaped characters with slight differences, and characters not written smoothly. On average, the proposed system achieves 65% accuracy.

1. Introduction

The recognition of handwritten characters is a key problem, and many methods have been proposed in recent years [1, 2, 3, 4, 5, 6, 7]. Various segmentation algorithms based on soft computing techniques such as artificial neural networks and fuzzy logic have been proposed for English character recognition [6, 7]. For Sinhala characters, the techniques used so far are artificial neural networks and statistical approaches such as Hidden Markov Models. Off-line Sinhala character recognition is a challenging task because of the special structure of Sinhala characters, and there have been few attempts at it.

A neural network based technique is introduced by Rajapakse et al. [4]. They use neural network techniques to develop a system that can recognize off-line Sinhala characters. It is based on backpropagation, a widely used neural network technique for pattern classification. An input to a neural network consists of a collection of 1’s and 0’s arranged in an array. The network is then trained to produce the output patterns corresponding to these inputs. Each character is split into a number of segments and each segment is handled by a set of neural networks.

A three-component architecture is used for recognition. The components handle the pre-processing, recognition and post-processing tasks. The pre-processing component handles all the manipulation necessary to prepare the characters that are forwarded to the recognition component. The recognition component recognizes segments of the character. This is carried out by nine neural networks, each modeling a particular geometrical segment assigned to it. Modifiers in the Sinhala alphabet go through the recognition process unsplit and are handled by a separate neural network. The post-processing component is the final stage of the entire process. In this component, the segments split earlier are unified with the aid of a lookup table.
According to the authors, performance varies depending on the input characters. For example, the recognition accuracy is i) 88% if the characters are written by the same individual, ii) 64% if the characters are written by two different individuals and are similar looking, and iii) 75% if the characters are written by a chosen set of individuals.

Algorithms for thresholding, noise reduction and skew correction of Sinhala handwritten words are presented in [2]. They are used to improve the accuracy of segmentation and recognition. The authors reported that these algorithms have 97.2% accuracy.

An off-line Sinhala handwriting recognition method is presented by Hewavitharana et al. [1]. That study investigates the use of Hidden Markov Models (HMMs) for off-line Sinhala character recognition. A subset of the Sinhala alphabet, consisting of the 25 most commonly used letters, is used for the study. The recognition process is split into two major stages: preliminary classification and recognition. First, structural properties of the handwritten character line are used to pre-classify an unknown character into a subset of candidate characters. Then the HMM classifier is used for the final recognition. According to the authors, the system achieved 64.3% accuracy.

An approach based on Linear Symmetry is described in [5]. In this method, characters are initially recognized through a multi-level filtering process using the Linear Symmetry (LS) feature. The recognized character is then segmented to identify the associated modifier symbol/s. The use of LS facilitates the recognition of touching characters as well. A method to determine the skew angle of the script is also presented. Experimental results indicated 84% accuracy.

It is clear that different concepts can be used for handwritten Sinhala character recognition, with varying accuracies. Since the applicability of soft computing techniques has already been investigated, we examine a further soft computing technique. A fuzzy logic based recognition technique for handwritten Sinhala characters is described in this paper. The salient feature of this technique is that it divides characters into meaningful segments so that fuzzy characteristics can be applied to the resulting segments for the purpose of recognition.

3. Materials and Methods

A recognition system consists of several components. These components can be described with the block diagram depicted in Figure 1.

Figure 1 - Block diagram of a general character recognition system

Binarization

The binarization process is used to reduce unwanted noise and to obtain the character skeleton. Letter-sized paper is first used to collect sample handwriting. A single document typically includes about 10 lines with 4-5 words in each line. The documents are scanned at a resolution of 100 dpi and are binarized using a thresholding technique: the value of a pixel is set to 1 if its intensity is above a threshold intensity value and to 0 if it is below.

Segmentation and feature extraction

In character recognition, segmentation plays a major role, as the success of recognition depends on the segmentation process. Generally, characters are decomposed into meaningful segments by travelling along the character skeleton. Sinhala characters mainly consist of arcs and varying shapes. A sample of handwritten Sinhala characters is shown in Figure 2.
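The thresholding rule described under Binarization can be illustrated with a minimal Python sketch. The function name `binarize`, the list-of-rows image representation and the default threshold of 128 are assumptions made for illustration; the paper does not state the threshold value it used or how it was chosen.

```python
def binarize(gray, threshold=128):
    """Binarize a grayscale image given as a list of rows of intensities.

    Follows the rule stated in the text: a pixel becomes 1 if its
    intensity is above the threshold and 0 otherwise. The default
    threshold of 128 is an assumed value, not one taken from the paper.
    """
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# Hypothetical 2 x 3 grayscale patch (intensities in 0..255).
patch = [
    [0, 200, 255],
    [130, 128, 10],
]
binary_patch = binarize(patch)
```

Under this convention bright pixels map to 1. Whether the ink of a character ends up as 1 or 0 therefore depends on the polarity of the scan, which the paper does not specify.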

ISSN: 0975-5462


Figure 2 - A sample of handwritten Sinhala characters

These arcs make it difficult to segment a character by travelling along its skeleton. Therefore, the character skeleton area is segmented into regions based on the minimum and maximum coordinate values of each character image. For each character, the row pixel density counts and the column pixel density counts are computed and stored. Figures 3 and 4 depict how these counts are taken.

Figure 3 - Horizontal pixel density counts

Figure 4 - Vertical pixel density counts
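The row and column pixel density counts can be sketched in Python as follows. This is an illustration under assumptions: the helper names are hypothetical, character pixels are assumed to have the value `ink=1`, and the sketch stops at the bounding box given by the minimum and maximum coordinates, because the paper does not say how that box is further divided into regions.

```python
def bounding_box(image, ink=1):
    """Return (top, bottom, left, right) of the ink pixels, or None if empty."""
    rows = [r for r, row in enumerate(image) if ink in row]
    cols = [c for row in image for c, value in enumerate(row) if value == ink]
    if not rows:
        return None
    return min(rows), max(rows), min(cols), max(cols)


def crop(image, box):
    """Cut the image down to the given bounding box (inclusive bounds)."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def row_density(image, ink=1):
    """Number of ink pixels in each row (horizontal counts)."""
    return [sum(1 for value in row if value == ink) for row in image]


def column_density(image, ink=1):
    """Number of ink pixels in each column (vertical counts)."""
    return [sum(1 for value in col if value == ink) for col in zip(*image)]


# Hypothetical 4 x 5 binary character image.
character = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(character)
region = crop(character, box)
row_counts = row_density(region)
column_counts = column_density(region)
```

Cropping to the bounding box first keeps the counts independent of where the character sits on the page, which is one plausible reading of why the minimum and maximum coordinates are used.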

After the pixel counts are stored, the relevant fuzzy membership values are calculated using the fuzzy membership functions shown in Figure 5.

Figure 5 - Fuzzy membership functions for pixel density count
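As an illustration of how a pixel density count can be mapped to fuzzy membership values, the sketch below uses seven evenly spaced triangular membership functions over a density normalized to the range 0 to 1, one for each of the labels VVS, VS, S, M, H, VH and VVH. The triangular shape, the even spacing and the normalization are assumptions; the actual shapes are those of Figure 5, which is not reproduced in this text.

```python
TERMS = ["VVS", "VS", "S", "M", "H", "VH", "VVH"]


def memberships(density):
    """Membership degree of a normalized density (0..1) in each term.

    Assumed model: triangular functions with peaks evenly spaced at
    0, 1/6, ..., 1, each falling to zero at the neighbouring peaks.
    """
    x = min(max(density, 0.0), 1.0)   # clamp to the valid range
    step = 1.0 / (len(TERMS) - 1)     # distance between adjacent peaks
    return {
        term: max(0.0, 1.0 - abs(x - i * step) / step)
        for i, term in enumerate(TERMS)
    }


# A row with 3 ink pixels out of a region width of 10 (hypothetical values).
degrees = memberships(3 / 10)
strongest = max(degrees, key=degrees.get)
```

With this choice every density belongs to at most two adjacent terms, and the two degrees add up to one, which makes the resulting feature values easy to compare between samples.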

The linguistic terms VVS, VS, S, M, H, VH and VVH stand for ‘very very small’, ‘very small’, ‘small’, ‘medium’, ‘high’, ‘very high’ and ‘very very high’. Fuzzy membership values are calculated five times for each character, using differently written samples of it.

Training process

In this process a database is built to represent characters and segments. For each character, the calculated fuzzy characteristic values for all segments are stored in the database.

Recognition process

In this process, the database constructed during the training process is used as an automatic rule base for recognition. To recognize a character, its fuzzy features are computed and compared against those of the characters stored in the database. In this way, individual characters in the document are recognized one by one.
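The training and recognition steps can be sketched as a small lookup structure. The paper does not state how stored and computed fuzzy features are compared, so the sketch assumes a simple nearest-match rule using the sum of absolute differences between equal-length feature vectors. The function names and the labels "ka" and "ga" are hypothetical.

```python
def train(samples):
    """Build the database: character label -> list of stored feature vectors."""
    database = {}
    for label, features in samples:
        database.setdefault(label, []).append(list(features))
    return database


def distance(a, b):
    """Sum of absolute differences between two feature vectors (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(database, features):
    """Return the label whose stored sample is closest to the given features."""
    best_label, best_distance = None, float("inf")
    for label, stored_vectors in database.items():
        for stored in stored_vectors:
            d = distance(stored, features)
            if d < best_distance:
                best_label, best_distance = label, d
    return best_label


# Hypothetical fuzzy feature vectors for two characters.
database = train([
    ("ka", [0.2, 0.8, 0.0]),
    ("ka", [0.3, 0.7, 0.0]),
    ("ga", [0.0, 0.1, 0.9]),
])
result = recognize(database, [0.25, 0.7, 0.05])
```

Storing every training sample, rather than an average per character, mirrors the statement that membership values are calculated several times for differently written versions of a character.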


4. Results and Discussion

The prototype system was tested with a sample of Sinhala characters written by different persons. The sample includes smoothly written characters, similarly shaped characters with slight differences, and characters not written smoothly. The test results indicated that the system identified smoothly written characters with more than 70% accuracy. However, problems occur with characters that have a similar shape and differ only in a small part. On average the system reported 65% accuracy.

5. Conclusions

Fuzzy logic can be used for Sinhala character recognition when combined with a good segmentation technique. Segmentation of Sinhala characters is difficult compared to the segmentation of English characters. The proposed segmentation gave different recognition rates for different characters. It responds well to smoothly written characters and to characters that do not tend to conflict with another character. However, it sometimes fails to identify characters written in different shapes. On average, the proposed system achieves 65% accuracy. Grouping characters according to their shapes before applying fuzzy rules could improve the accuracy.

References

[1] Hewavitharana S., Fernando H. C. and Kodikara N. D. (2002). "Off-line Sinhala Handwriting Recognition using Hidden Markov Models", Proc. of Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP) 2002, Ahmedabad, India, 266-269.
[2] Karunanayaka M. L. M., Marasinghe C. A. and Kodikara N. D. (2005). "Thresholding, Noise Reduction and Skew correction of Sinhala Handwritten Words", MVA2005 IAPR Conference on Machine Vision Applications, May 16-18, Tsukuba Science City, Japan.
[3] Fernando H. C. and Kodikara N. D. (2003). "A Database of handwritten Text Recognition Research in Sinhala Language", Proceedings of the Seventh International Conference on Document Analysis and Recognition, 1262-1264.
[4] Rajapakse R. K., Weerasinghe A. R. and Seneviratne E. K. (2004). "A neural network based character recognition system for Sinhala script", Department of Statistics and Computer Science, University of Colombo.
[5] Premaratne H. L. and Bigun J. (2003). "Recognition of Printed Sinhala Characters Using Linear Symmetry", in the Proceedings of the ACCV2002: The 5th Asian Conference on Computer Vision, Melbourne, Australia, pp 1
[6] Batuwita K. B. M. R. and Bandara G. E. M. D. C. (2005). "An Online Adaptable Fuzzy System for Offline Handwritten Character Recognition", in the Proceedings of Fuzzy Logic, Soft Computing and Computational Intelligence, 11th World Congress of International Fuzzy Systems Association (IFSA 2005), China, Springer Tsinghua, Vol. 2: 1185-1190.
[7] Bandara G. E. M. D. C., Pathirana S. D. and Ranawana R. M. (2002). "Use of Fuzzy Feature Descriptions to Recognize Handwritten Alphanumeric Characters", in the Proceedings of 1st Conference on Fuzzy Systems and Knowledge Discovery, Singapore, 269-274.
