Template of Manuscripts for IREE

Statistical and Template Matching Features for Persian Handwritten Postal Code Recognition

Z. Kamranian 1, S. A. Monadjemi 1, N. Nemat Bakhsh1

Abstract – Postal code recognition can be an important step in post office automation. In this paper, we present a recognition system based on statistical and template-matching approaches for recognizing Persian handwritten postal codes on postal envelopes. The system consists of pre-processing, feature extraction, and classification stages. Pre-processing is fully automated for envelopes with a postcode frame. For feature extraction, we use statistical and template-matching features. An MLP neural network with one hidden layer, trained using the back-propagation algorithm, serves as the classifier. Furthermore, we propose a voting algorithm that combines the results of several neural networks to improve classification performance. Evaluated on approximately 2100 test samples, the proposed system achieves a recognition rate of 97.75%. Copyright © 2011 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Persian OCR, Statistical features, Template matching, MLP Neural Network, Voting Algorithm, Postal Code, Handwritten

I. Introduction

In recent decades, numerous methods have been proposed for handwritten character recognition, especially for languages such as English, Japanese and Chinese. The recognition of Latin handwritten digits in particular has attracted attention because of their widespread use, and many effective recognition methods with high recognition rates have been reported for them [1]-[3]. In contrast, only a few results have been reported for the recognition of Persian/Arabic handwritten digits or characters [4], and some of the reported accuracies are not satisfactory [5]. For instance, Soltanzadeh et al. extracted features from the crossing counts, outer profiles, and projection histograms [6]. Ziaratban et al. used language-specific features, extracted their feature vector using 20 templates selected from average images of each class, and employed an MLP for classification [7]. Cheng-Lin Liu and Ching Y. Suen used the gradient direction histogram feature with several classifiers on three databases: ISI Bangla numerals, CENPARMI Farsi numerals, and IFHCDB Farsi numerals [8]. Monadjemi et al. emphasized pre-processing and segmentation in Persian/Arabic character recognition [9], and Izakian et al. employed a chain code-based method to identify Persian/Arabic characters [10].

On the other hand, postal automation is a topic of research interest, and recognition of postal codes is an application of digit recognition that requires very high accuracy and speed. Most of the available articles on postal automation do not address the Persian language [11], [12]; that is, no work has directly addressed the automation of the Iranian postal system. In this paper, we propose an automated system for Persian handwritten postal code recognition. We extract the Persian handwritten postal codes from scanned envelopes, use statistical and template-matching approaches to extract appropriate features, and propose a voting algorithm for high-performance multi-level classification.

The remainder of the paper is organized as follows: Section II describes the process of extracting postal codes from postal envelopes. Feature extraction techniques and classification are discussed in Sections III and IV, respectively. Section V presents experimental results. Finally, concluding remarks are given in Section VI.

II. Pre-Processing

The recognition of a digit image typically involves image pre-processing, feature extraction, and classification [2]. As shown in Fig. 1, the first step in our approach is pre-processing, whose main goal is to extract Persian digits from the envelope images. The Persian handwritten digits for this work were collected from postal codes written by people of different ages and literacy levels on real Iranian mail envelopes. The images are gray-scale, digitized at 200 dpi using a flatbed scanner, and stored in Bitmap (bmp) format. Afterwards, they are converted into binary format. We carry out the following stages to extract digits from an envelope image and create our database:

Block detection: The postal codes on the envelopes used are located in the postcode frame (see Fig. 2). Therefore, a heuristic method was used to select the frame first. The frame, or block, usually contains a 10-digit postal code and is 25 to 40 pixels wide and 90 to 120 pixels long. Firstly, using a morphological operation, we dilate the horizontal and vertical lines to repair missing parts of the frame. Next, using component labeling, each component was selected and mapped to the original image. A component is selected if its length and width conform to the dimensions mentioned above; it is then cropped from the image (Fig. 2). In this stage, the frames were found successfully in all images (100% accuracy).

Segmentation: After detection of the postcode frame, the digits were segmented using component labeling and saved. Postal code digits were extracted from left to right to preserve the writing order. The segmentation process is illustrated in Fig. 3.

Size normalization: The segmented digits vary in size, so they were normalized. We used linear normalization (LN), which bounds the strokes of the character with a rectangle and linearly maps the rectangle onto a standard size of 40×40 pixels. In addition, for each digit we align the centroid (center of gravity) of the digit with the center of the rectangle. Some samples after pre-processing are shown in Fig. 4. Finally, our database consists of 700 images for each of the ten Persian digits (7000 in total).
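The size normalization step can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper reports MATLAB experiments): the function name `normalize_digit`, the nearest-neighbour resampling, and the clipping of pixels that leave the image after the centroid shift are all assumptions made for illustration.

```python
import numpy as np

def normalize_digit(binary, size=40):
    """Linearly map the bounding rectangle of a binary digit onto a
    size x size image, then shift the digit so that its centroid lies
    at the image centre (illustrative sketch, not the paper's code)."""
    binary = np.asarray(binary) > 0
    out = np.zeros((size, size), dtype=np.uint8)
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    if rows.size == 0:                      # empty image: nothing to normalize
        return out
    # Bounding rectangle around the strokes.
    box = binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = box.shape
    # Linear mapping with nearest-neighbour sampling (assumption).
    src_r = np.arange(size) * h // size
    src_c = np.arange(size) * w // size
    scaled = box[np.ix_(src_r, src_c)]
    ys, xs = np.nonzero(scaled)
    if ys.size == 0:                        # strokes lost by down-sampling
        return out
    # Integer shift that moves the centroid to the image centre.
    centre = (size - 1) / 2
    dy = int(round(centre - ys.mean()))
    dx = int(round(centre - xs.mean()))
    ny, nx = ys + dy, xs + dx
    # Pixels shifted outside the image are dropped (assumption).
    keep = (ny >= 0) & (ny < size) & (nx >= 0) & (nx < size)
    out[ny[keep], nx[keep]] = 1
    return out
```

For example, a 40×10 filled stroke is stretched to fill the whole 40×40 rectangle, and because its centroid is already central no shift is applied.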

Fig. 1. Postal code recognition system (stages: input image; pre-processing, comprising block detection, segmentation, and size normalization; feature extraction; classification)

Fig. 2. Heuristic approach for block detection on a typical Iranian envelope

Fig. 3. Segmentation step results

Fig. 4. Sample images from the database after pre-processing. In each row, a sample of each digit from ‘‘0’’ to ‘‘9’’ is shown from left to right.

III. Feature Extraction Techniques

Typically, the next step in pattern recognition systems is feature extraction, which is the core of the character recognition procedure. In this study, two general feature extraction approaches were considered, which are briefly described here.

III.1. Statistical Features

Six statistical features were extracted in this study:

III.1.1. Number of Pixels Cropped

The first statistical feature compares the bulk of the digit in different images. In Persian handwriting, bulk can be used to distinguish some digits. It is determined by the ratio of the number of digit pixels to the number of pixels in the bounding rectangle (here: 1600).

III.1.2. Ratios

As the second statistical feature, we calculated some ratios as follows:

III.1.2.1. Aspect Ratio

The aspect ratio of each digit is the ratio of the digit's width to its height. It is determined from the width-to-height ratio of the bounding rectangle around the digit in each image.

III.1.2.2. Left Length to Right Length and Top Width to Bottom Width Ratio

The left-length-to-right-length ratio and the top-width-to-bottom-width ratio differ among Persian digits. Therefore, they can separate digits into three classes: longer left length/top width, equal left and right lengths/top and bottom widths, and shorter left length/top width.

III.1.3. Horizontal and Vertical Symmetry

The third feature represents the spatial distribution of pixel values of a binary image by comparing the white pixels in the upper and lower halves, and in the left and right halves, of a digit image. To extract this feature, only the corresponding white pixels in the different areas of the image (top and bottom, and right and left) need to be compared.

III.1.4. Cross Counts

The fourth feature is based on the cross counts. For the horizontal/vertical orientation, a vector is formed by finding the number of image body segments in each column/row of the image that lies between the first and last columns/rows [6]. In this paper, the horizontal transition counts are calculated by scanning the columns of the binary image; each transition from 0 to 1 increments a counter that is initially zero. At the end of the scan, the counter value is associated with the scanned column. Afterwards, the total numbers of one-transition, two-transition, and three-transition columns are counted. A similar procedure is carried out for each row.

III.1.5. Distance to the First Pixel

As the fifth feature, we measure the distances of the horizontal and vertical transition pixels, discussed in Section III.1.4, from the first pixel. We selected the first digit pixel on the left of each image and named it the "First Point" (FP). For horizontal/vertical transition pixels, the row/column differences from FP were considered. This feature is effective for distinguishing digit ‘‘2’’ from digit ‘‘6’’. As shown in Fig. 5, in many cases and styles the only difference between Persian 2 and 6 is the part attached to their dents, called the "Handle", which is located on the left and right of the dent, respectively.

Fig. 5. (a) Digit ‘‘2’’; its handle is located on the left of its dent, (b) Digit ‘‘6’’; its handle is located on the right of its dent

III.1.6. Top Hole

As the last feature, we tried to resolve a particular similarity among Persian handwritten digits. There are two popular styles of digit ‘‘6’’. As shown in Fig. 6-a, the second style of digit 6 is similar to digit ‘‘9’’ (Fig. 6-b). Therefore, we used a feature that we call "Top Hole", obtained by finding a hole in the top of the image.

III.2. Template Matching Features

These features were obtained by searching for the templates in the input images. For each template, the amount of matching is calculated and used as a feature. Thus, the desired templates must first be extracted. To obtain the template image for each of the ten digits, it is sufficient to average images as:

$$\mathrm{imageAv}_i = \mathrm{IDCT}\left(\frac{1}{n}\sum_{k=1}^{n}\mathrm{DCT}(I_{ik})\right) \qquad (1)$$

where $I_{ik}$ is the kth sample image in the ith class of digits and n is the number of sample images in that class. The Discrete Cosine Transform (DCT) decomposes the image into sub-images of different visual quality and importance, and IDCT computes the inverse cosine transform. Because four digits, ‘‘2’’, ‘‘3’’, ‘‘4’’, and ‘‘6’’, each have two styles, each of them should be considered as two classes [7]. Thus, we have 14 average images (Fig. 7).
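Three of the statistical features of Section III.1 (pixel density, aspect ratio, and cross counts) can be sketched in Python as follows. This is an illustrative reading of the descriptions, not the authors' code; the function names are hypothetical, and padding each line with one background pixel so that a stroke touching the border still counts as a 0-to-1 transition is an assumption.

```python
import numpy as np

def pixel_density(img):
    """Digit pixels divided by all pixels of the image
    (1600 for a normalized 40x40 digit)."""
    img = np.asarray(img) > 0
    return img.sum() / img.size

def aspect_ratio(img):
    """Width / height of the bounding rectangle of the digit.
    Presumably measured before size normalization, since a normalized
    digit always fills the 40x40 rectangle (assumption)."""
    img = np.asarray(img) > 0
    rows = np.flatnonzero(img.any(axis=1))
    cols = np.flatnonzero(img.any(axis=0))
    if rows.size == 0:
        return 0.0
    return (cols[-1] - cols[0] + 1) / (rows[-1] - rows[0] + 1)

def transitions_per_line(img, axis):
    """Number of 0->1 transitions met while scanning each column
    (axis=0) or each row (axis=1) of a binary image."""
    img = (np.asarray(img) > 0).astype(np.int8)
    pad = [(0, 0), (0, 0)]
    pad[axis] = (1, 0)            # one background pixel before each line
    padded = np.pad(img, pad)
    return (np.diff(padded, axis=axis) == 1).sum(axis=axis)

def cross_count_feature(img, axis):
    """How many lines contain exactly one, two, and three transitions."""
    counts = transitions_per_line(img, axis)
    return [int((counts == k).sum()) for k in (1, 2, 3)]
```

Calling `cross_count_feature(img, 0)` scans the columns and `cross_count_feature(img, 1)` scans the rows, giving six values per digit under this reading.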


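The template construction of Eq. (1) in Section III.2 can be sketched in Python with a hand-built orthonormal DCT-II matrix, so that only NumPy is needed. The sketch assumes square images (such as the 40×40 normalized digits). The paper does not define the "amount of matching", so the normalized correlation in `matching_score` is a hypothetical stand-in, and all function names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x, C.T inverts it."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def dct2(img, c):
    """2-D DCT of a square image."""
    return c @ img @ c.T

def idct2(coef, c):
    """Inverse 2-D DCT."""
    return c.T @ coef @ c

def average_template(images):
    """Eq. (1): inverse DCT of the mean DCT of the samples of one class."""
    images = np.asarray(images, dtype=float)
    c = dct_matrix(images.shape[1])
    mean_coef = np.mean([dct2(im, c) for im in images], axis=0)
    return idct2(mean_coef, c)

def matching_score(img, template):
    """Hypothetical matching measure: normalized correlation in [-1, 1]."""
    a = np.asarray(img, dtype=float).ravel()
    b = np.asarray(template, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

Because the DCT is linear, the template produced by Eq. (1) coincides, up to rounding error, with the plain pixel-wise average of the class samples. Under this sketch, the template-matching feature vector of a digit would hold one `matching_score` value per average image.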
Fig. 6. (a) Two popular styles of digit ‘‘6’’ in Persian handwriting, (b) similarity of ‘‘6’’ and ‘‘9’’ in Persian handwriting

IV. Classification

The final step of the recognition system is classification. For the classification stage, we used three simple Multi-Layer Perceptron (MLP) neural networks together with a proposed voting algorithm. MLP is one of the most common families of neural networks for pattern classification. In this paper, we used MLPs with one hidden layer. MATLAB [MathWorks] has become the preferred language of computing for researchers [13]. Therefore, we use MATLAB to train our neural networks and evaluate our test samples. In an MLP, the number of hidden neurons (h) needs to be adjusted. Each ANN was therefore trained, on the set {training set} - {validating set}, with 30 neurons in its hidden layer. In this step, three MLPs were trained with statistical features, template-matching features, and the combination of the two as inputs. Hereafter, we call the third network the "Combined NN". The specifications of these three MLP neural networks are shown in Table I. In the next subsection, we introduce a voting algorithm to increase the recognition accuracy.

IV.1. Proposed Voting Algorithm

To extract features, we used two approaches: statistical and template-matching. Therefore, we propose a voting algorithm that uses the results of both neural networks and acts on each sample as follows:
1. If the outputs of both neural networks are equal, the voting algorithm selects that common output.
2. If the outputs of the two neural networks are not equal, their accuracies are compared:
2-a. If the difference between their accuracies is more than 30%, the voting algorithm selects the neural network with the higher accuracy.
2-b. If the difference between their accuracies is less than 30%, the voting algorithm uses the "Combined NN" to make the final decision.

V. Results and Discussion

In this section, the accuracies of our classifiers and feature extraction techniques for Persian handwritten digit recognition are presented. For training and testing our system, we used 7000 handwritten digits written by different persons. The samples were divided into training and test sets by randomly assigning 70% to training and the remaining 30% to testing the neural network. The evaluation strategy is a random iterative hold-out strategy, repeated 10 times. All experiments were performed on an Intel Core™ 2 Duo, 2.5 GHz computer. Table II presents the results of the MATLAB simulation for the three ANNs of Section IV. The result of our proposed voting algorithm is shown in Table III. These tables show that the voting algorithm increases the recognition rate compared with using the ANNs separately. Using the proposed classifier, average recognition rates of 98.6% and 97.5% are obtained for the training and testing sets, respectively. Table IV shows the results on the test set in our best experiment. Most of the recognition errors occurred in discriminating the digits ‘‘2’’ and ‘‘3’’. However, the problem of distinguishing the digits ‘‘6’’ and ‘‘9’’ is nearly solved. Most of the misrecognized samples are poorly written, as shown in Fig. 8.

VI. Conclusion

In this paper, a recognition system for Persian handwritten postal codes was introduced. We extracted postal codes from postal envelope frames and used two feature extraction methods for digit recognition: the statistical approach and template matching. In addition, MLP classifiers with 20, 14 and 34 neurons in their input layers and only a few neurons in the hidden layer were utilized for our proposed voting algorithm in the classification part. By implementing this algorithm, we obtained a high accuracy of 97.75% in offline Persian handwritten postal code recognition.
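The voting rule of Section IV.1 can be sketched in Python as follows. The paper does not say how the per-sample "accuracy" of a network is measured, so this sketch assumes it is the winning output activation of that network (a value in [0, 1] for 'logsig' outputs). A difference of exactly 30% is not covered by the rules and is sent to the Combined NN here, which is also an assumption. The function name `vote` is illustrative.

```python
import numpy as np

def vote(stat_out, templ_out, combined_out, threshold=0.30):
    """Return the voted class for one sample.

    Each argument is the output vector of one network: the statistical
    network, the template-matching network, and the Combined NN.
    """
    stat_out, templ_out, combined_out = (
        np.asarray(v, dtype=float) for v in (stat_out, templ_out, combined_out)
    )
    stat_cls = int(np.argmax(stat_out))
    templ_cls = int(np.argmax(templ_out))

    # Rule 1: both networks agree, so take the common answer.
    if stat_cls == templ_cls:
        return stat_cls

    # Assumed per-sample "accuracy": the winning activation of each network.
    stat_conf = stat_out[stat_cls]
    templ_conf = templ_out[templ_cls]

    # Rule 2-a: one network is clearly more confident than the other.
    if abs(stat_conf - templ_conf) > threshold:
        return stat_cls if stat_conf > templ_conf else templ_cls

    # Rule 2-b: otherwise defer to the Combined NN.
    return int(np.argmax(combined_out))
```

With this structure the Combined NN is consulted only when the two specialised networks disagree and neither is clearly more confident.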

Fig. 7. Average images obtained from the different styles (14 templates, labeled 0, 1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9)



Fig. 8. Test samples incorrectly recognized. The left and right numbers under each digit represent the correct and the classified class, respectively (pairs shown: 2-3, 3-2, 4-2, 2-6, 9-6).

References

[1] C.-L. Liu, K. Nakashima, H. Sako, H. Fujisawa, Handwritten Digit Recognition: Benchmarking of State-of-the-art Techniques, Pattern Recognition, Vol. 36, n. 10, pp. 2271-2285, 2003.
[2] C.-L. Liu, K. Nakashima, H. Sako, H. Fujisawa, Handwritten Digit Recognition: Investigation of Normalization and Feature Extraction Techniques, Pattern Recognition, Vol. 37, n. 2, pp. 265-279, 2004.
[3] P. Zhang, T. D. Bui, C. Y. Suen, A Novel Cascade Ensemble Classifier System with a High Recognition Performance on Handwritten Digits, Pattern Recognition, Vol. 40, n. 12, pp. 3415-3429, 2007.
[4] Mozaffari, S., Faez, K., Ziaratban, M., Structural Decomposition and Statistical Description of Farsi/Arabic Handwritten Numeric Characters, Proceedings of the 8th IEEE International Conference on Document Analysis and Recognition (Page: 297 Year of Publication: 2005 ISBN: 0-7695-2420-6).
[5] Sadri, J., Suen, C. Y., Bui, T. D., Application of Support Vector Machines for Recognition of Handwritten Arabic/Persian Digits, Proceedings of the 2nd Conference on Machine Vision and Image Processing (Page: 300 Year of Publication: 2003).
[6] H. Soltanzadeh, M. Rahmati, Recognition of Persian Handwritten Digits using Image Profiles of Multiple Orientations, Pattern Recognition Letters, Vol. 25, n. 14, pp. 1569-1576, 2004.
[7] Ziaratban, M., Faez, K., Faradji, F., Language-based Feature Extraction using Template-matching in Farsi/Arabic Handwritten Numeral Recognition, Proceedings of the 9th International Conference on Document Analysis and Recognition (Page: 297 Year of Publication: 2007 ISBN: 978-0-7695-2822-9).
[8] C.-L. Liu, C. Y. Suen, A New Benchmark on the Recognition of Handwritten Bangla and Farsi Numeral Characters, Pattern Recognition, Vol. 42, n. 12, pp. 3287-3295, 2009.
[9] Monadjemi, S. A., the OCR Research group, Farsi and Arabic OCR Systems: A Review and Some Proposals, in Persian, Proceedings of the International Conference on Islamic World, Information Technology, and Information Society (Page: Year of Publication: 2006).
[10] Izakian, H., Monadjemi, S. A., TorkLadani, B., Zamanifar, K., Multi-font Farsi/Arabic Isolated Character Recognition using Chain Codes, Proceedings of World Academy of Science, Engineering and Technology (Page: 67 Year of Publication: 2008 ISSN: 2070-3740).
[11] Mahadevan, U., Srihari, S. N., Parsing and Recognition of City, State, and ZIP Codes in Handwritten Addresses, Proceedings of the 5th International Conference on Document Analysis and Recognition (Page: 325 Year of Publication: 1999 ISBN: 0-7695-0318-7).
[12] Srihari, S. N., Keubert, E. J., Integration of Hand-Written Address Interpretation Technology into the United States Postal Service Remote Computer Reader System, Proceedings of the 4th International Conference on Document Analysis and Recognition (Page: 892 Year of Publication: 1997 ISBN: 0-8186-7898-4).
[13] M. Cheriet, N. Kharma, C.-L. Liu, C. Y. Suen, Character Recognition Systems: A Guide for Students and Practitioners (John Wiley & Sons, 2007).

Authors’ information

1Department of Computer Engineering, Faculty of Engineering, University of Isfahan, Isfahan, 81746, Iran

Z. Kamranian was born in 1985 in Isfahan, Iran. She received her B.Sc. in computer software from the University of Isfahan, Isfahan, in 2008. She is currently an M.Sc. student in software engineering at the Department of Computer Engineering, University of Isfahan, Isfahan, Iran. Her research interests include OCR, DIP, and Artificial Intelligence. Email: [email protected]

S. A. Monadjemi was born in 1968 in Isfahan, Iran. He received his B.Sc. in electrical/computer engineering from Isfahan University of Technology, Isfahan, Iran, in 1992, his M.Sc. in computer engineering, machine intelligence and robotics from Shiraz University, Shiraz, Iran, in 1994, and his PhD in computer engineering, image processing and pattern recognition from Bristol University, Bristol, England, in 2004. His research interests are DIP, Machine Vision, Pattern Recognition, Artificial Intelligence, and Training through Computer. Dr. Monadjemi is currently an Asst. Professor in the Department of Computer Engineering, University of Isfahan, Isfahan, Iran. Email: [email protected]

N. Nemat Bakhsh was born in Isfahan, Iran. He received his B.Sc. in Maths. from the University of Isfahan, Isfahan, Iran, in 1973, his M.Sc. in computer science from Worcester Polytech. in 1978, and his PhD in Software Reliability from the University of Bradford in 1998. His research interests are Software Engineering and Performance analysis and measurements. Dr. Nemat Bakhsh is currently an Asst. Professor in the Department of Computer Engineering, University of Isfahan, Isfahan, Iran. Email: [email protected]


TABLE I
THE SPECIFICATIONS OF THREE MLP NEURAL NETWORK CLASSIFIERS

ANNs         Number of neurons   Number of neurons   Number of neurons   Transfer function
             in input layer      in hidden layer     in output layer     for hidden layer
First ANN    20                  30                  10                  'logsig'
Second ANN   14                  30                  10                  'logsig'
Third ANN    34                  30                  10                  'logsig'

TABLE II
THE RESULTS OF MATLAB SIMULATION FOR ANNS

ANNs         Recognition rate of system   Recognition rate of system
             for training data            for testing data
First ANN    96.8%                        95.3%
Second ANN   93.0%                        91.2%
Third ANN    97.5%                        95.7%

TABLE III
THE RESULTS OF MATLAB SIMULATION FOR PROPOSED VOTING ALGORITHM

Recognition rate of system    Recognition rate of system
for training data             for testing data
Max: 98.8%, Average: 98.6%    Max: 97.75%, Average: 97.5%

TABLE IV
CONFUSION AND RECOGNITION RESULTS FOR TEST SET

Input digit     0     1     2     3     4     5     6     7     8     9   Recognition rate
0             213     0     0     0     0     2     1     1     0     0   98.2%
1               0   227     0     0     0     0     1     0     0     0   99.6%
2               0     1   197     6     0     0     1     0     0     0   96.1%
3               0     0     6   200     1     3     0     0     0     0   95.2%
4               0     0     1     3   207     1     2     0     0     0   96.7%
5               3     0     0     0     0   213     1     0     0     0   98.2%
6               0     1     3     0     0     0   186     0     1     2   96.4%
7               0     0     1     1     0     0     2   208     0     0   98.1%
8               0     0     0     0     0     0     0     0   205     0   100%
9               0     0     0     0     0     1     1     0     0   197   99%

Mean recognition rate: 97.75%
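The recognition rates in Table IV can be rechecked from the confusion counts with a short Python sketch. Rows are taken to be the input digits and columns the classified digits, which is an assumption, although it is consistent with the per-digit rates printed in the table.

```python
import numpy as np

# Confusion counts from Table IV (rows: input digit, columns: classified digit).
confusion = np.array([
    [213,   0,   0,   0,   0,   2,   1,   1,   0,   0],
    [  0, 227,   0,   0,   0,   0,   1,   0,   0,   0],
    [  0,   1, 197,   6,   0,   0,   1,   0,   0,   0],
    [  0,   0,   6, 200,   1,   3,   0,   0,   0,   0],
    [  0,   0,   1,   3, 207,   1,   2,   0,   0,   0],
    [  3,   0,   0,   0,   0, 213,   1,   0,   0,   0],
    [  0,   1,   3,   0,   0,   0, 186,   0,   1,   2],
    [  0,   0,   1,   1,   0,   0,   2, 208,   0,   0],
    [  0,   0,   0,   0,   0,   0,   0,   0, 205,   0],
    [  0,   0,   0,   0,   0,   1,   1,   0,   0, 197],
])

# Per-digit recognition rate: correct count divided by the row total.
per_class = 100 * np.diag(confusion) / confusion.sum(axis=1)

# Mean of the per-digit rates after rounding to one decimal, as printed.
mean_of_rounded = np.round(per_class, 1).mean()

# Pooled accuracy over all test samples.
overall = 100 * np.trace(confusion) / confusion.sum()
```

The rows sum to 2100 test samples, of which 2053 lie on the diagonal. The mean of the rounded per-digit rates reproduces the 97.75% reported in the table, and the pooled accuracy 2053/2100 is about 97.76%.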