The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui

Centre for Pattern Recognition and Machine Intelligence, Concordia University, Montreal, Quebec, Canada H3G 1M8
{cl_he, pin_zhan, jdong, suen, bui}@cenparmi.concordia.ca

Abstract

Size normalization is an important pre-processing technique in character recognition. Although various effective learning-based methods have been proposed, the role of the original data in a database is often ignored. In this paper, we conduct experiments with neural networks and support vector machines to investigate this role, and find that the performance of handwritten numeral recognition systems deteriorates dramatically at low size resolution. For the MNIST dataset, this study shows that enlarging the size from 20 * 20 to 26 * 26 by bilinear interpolation improves the performance significantly. After constructing a smaller database of difficult original patterns from NIST, we find that normalizing the original data to a size larger than the 20 * 20 used in MNIST increases the recognition rate further.

1. Introduction

Handwriting recognition has been a subject of research for several decades. Generally, a character recognition system includes three main tasks: pre-processing, feature extraction, and classification. In pre-processing, researchers normally perform noise filtering, binarization, thinning [3], skew correction [2], slant normalization [1], etc. to enhance the quality of images and to correct distortion. In feature extraction, various types of features and extraction techniques are available, such as geometric features, wavelet features [10], etc. In classification, a great number of classification methods are available, including statistical classifiers, artificial neural networks (ANNs) [9], support vector machines (SVMs) [6], and multiple classifier systems (MCSs) [7] [8].

Although correctly selecting learning algorithms helps to improve the recognition rate, one crucial factor affecting it is often ignored. From many observations and experiments, we suspected that the low resolution of the original data could reduce the recognition rates of OCR systems dramatically. In this paper, we analyze the effect of size normalization on the recognition of handwritten numerals. We normalize images to different sizes and apply the same classifier and features to observe the relationship between sizes and recognition rates. In pre-processing, we perform only size normalization; gradient features [5] are then extracted from the normalized images. We choose an ANN and an SVM as classifiers for this analysis because they have good learning ability and have exhibited good performance. We describe pre-processing and feature extraction in Section 2. We then describe the recognition of the MNIST test set under different normalization sizes in Section 3. In Section 4, we construct a new but smaller database with the poorest and most difficult original patterns chosen from NIST, selected on the basis of the error images at eight different normalization sizes. We compare the error rates on this small database at different sizes and from different sources in Section 5. Finally, we draw conclusions on the effect of size normalization on the recognition of handwritten numerals in Section 6.

2. Pre-processing & Feature Extraction In size normalization, we keep the aspect ratio of the images and normalize them to bigger sizes. First, we binarize and cut the original images (Figure 1(a)) of an MNIST numeral into a rectangle with the same height and width of original patterns (Figure 1(b)).

After that, we enlarge the image to a fixed size (e.g., 26 * 26) using a bilinear interpolation algorithm (Figure 1(c)) [11]. Finally, we place the normalized image at the centre of an empty 32 * 32 image (Figure 1(d)), ready for the extraction of gradient features [5]. For each pattern, a feature vector of size 400 (5 horizontal zones * 5 vertical zones * 16 directions) is produced.

Figure 1. Sample images in pre-processing: (a) 28 * 28 padded original, (b) 20 * 20 cut image, (c) 26 * 26 enlarged image, (d) 32 * 32 padded image.
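As a concrete illustration of the pre-processing steps above, the following is a minimal sketch, not the authors' original code: binarize, cut to the bounding box, enlarge with bilinear interpolation while keeping the aspect ratio, and centre the result in a 32 * 32 image. The function name, the fixed threshold, and the use of OpenCV and NumPy are assumptions made for illustration; the gradient feature extraction itself (5 * 5 zones * 16 directions [5]) is not sketched.

```python
# Illustrative sketch of the pre-processing in Section 2; not the authors' code.
import cv2
import numpy as np

def normalize_digit(img, target=26, canvas=32, threshold=128):
    """img: a 28x28 grayscale MNIST image (uint8). Returns a 32x32 image."""
    # 1. Binarize the image (a fixed threshold is assumed here).
    bw = (img >= threshold).astype(np.uint8) * 255

    # 2. Cut the image to the bounding box of its foreground pixels.
    ys, xs = np.nonzero(bw)
    bw = bw[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # 3. Enlarge to the target size with bilinear interpolation,
    #    keeping the aspect ratio (the longer side becomes `target`).
    h, w = bw.shape
    scale = target / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    resized = cv2.resize(bw, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # 4. Centre the normalized image in an empty canvas, ready for
    #    gradient feature extraction as in [5].
    out = np.zeros((canvas, canvas), dtype=np.uint8)
    top, left = (canvas - new_h) // 2, (canvas - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```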

3. Recognizing images in MNIST with different sizes

In this section, we apply two different classifiers, a 3-layer ANN trained with the back-propagation (BP) algorithm and an SVM classifier, to all the patterns in the MNIST test set in order to observe the recognition rate at different sizes. We use two classifiers to ensure that the effect of the normalization size on the recognition rate is not an artifact of a particular classifier. In the ANN, the number of nodes in the first layer is 401 (the number of features plus one bias node), the number of nodes in the hidden layer is 100, and the number of nodes in the output layer is 10, representing the 10 classes. We find that the recognition rates rise with both the ANN and the SVM when we enlarge the images to bigger sizes. The details are shown in Figure 2 and Figure 3. For reference, the MNIST database [4] is a widely known handwritten digit recognition benchmark. It is a subset of a larger set available from NIST. In MNIST, the digits have been size-normalized and centred in a fixed-size image: the original black and white (bi-level) images from NIST were size-normalized to fit in a 20*20 pixel box while preserving their aspect ratio.
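For concreteness, a comparable experimental setup could be built as sketched below. This is a hedged stand-in using scikit-learn rather than the authors' own implementation; the solver, activation, and SVM kernel settings are assumptions not taken from the paper.

```python
# Sketch of the two classifiers compared in Section 3 (scikit-learn stand-ins).
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# 3-layer ANN: 400 gradient features in (the bias node is added internally),
# 100 hidden nodes, 10 output classes; trained with back-propagation (SGD).
ann = MLPClassifier(hidden_layer_sizes=(100,), activation="logistic",
                    solver="sgd", max_iter=300)

# SVM classifier; the kernel and its parameters are assumptions.
svm = SVC(kernel="rbf", C=10.0)

# Usage (X_* are (n, 400) gradient-feature arrays, y_* the digit labels):
# ann.fit(X_train, y_train)
# substitution_rate = 100.0 * (1.0 - ann.score(X_test, y_test))
```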

Figure 2. Substitution rates (%) at different normalization sizes of MNIST with the ANN: 20*20: 4.56, 22*22: 3.20, 24*24: 2.02, 26*26: 1.15, 28*28: 1.13, 30*30: 1.10, 41*41: 0.94.

Figure 3. Substitution rates (%) at different normalization sizes of MNIST with the SVM: 20*20: 1.02, 22*22: 1.00, 24*24: 1.01, 26*26: 0.84, 28*28: 0.81, 30*30: 0.79, 41*41: 0.75.

We find that when we increase the normalization size from 20 * 20 to 26 * 26, the substitution rate decreases from 4.56% to 1.15% with the ANN and from 1.02% to 0.84% with the SVM. When we increase the normalization size from 26 * 26 to 41 * 41, the substitution rate continues to decrease, but the differences are much smaller. Therefore, in our experiments, 26 * 26 appears to be an optimal normalization size. The recognition rate rises when we normalize images to bigger sizes. Since the images in MNIST have already been normalized once, normalizing them to a bigger size is a second distortion of the originals. Even though we distort (normalize) the images twice, the recognition rates still rise. This suggests that if we normalized the images to a size bigger than 20 * 20 directly from the originals, the recognition rate of the entire system would rise further, because the images would then be normalized only once instead of twice.

4. Finding the originals to construct a small database


In all, we found 417 substitution images over the 8 different sizes. In order to create a database with the most difficult cases, we constructed a small database with the 181 images mis-recognized at two or more different sizes. The MNIST database was constructed from NIST's Special Database 3 and Special Database 1, which contain binary images of handwritten digits, and NIST Special Database 19 includes both of them; consequently, we should be able to match every normalized image from MNIST to its original image in NIST SD 19. According to the published statistics, NIST SD 19 contains 814,255 handwritten labelled characters (digits and letters); its training set contains 344,307 isolated digits and its test set 58,646 isolated digits. Since NIST SD 19 is too large to examine image by image manually, we apply template matching with constraints on the number of dissimilar pixels and on the aspect ratio, i.e., the ratio between the height and the width of an image. First, we collect all the images misclassified by the SVM in MNIST at the different sizes and sort them into the 10 classes (0, 1, ..., 9). Second, we load one error image from a class. Third, we remove its left, right, bottom, and top margins and cut the image to its real size. Fourth, we load the images in NIST SD 19 and normalize them to the size of the cut image. After that, we match the two images by template matching in order to choose candidate images. Finally, we verify the candidate images using their local structures.

Suppose that we have a template g[i, j] and we wish to detect its instances in an image f[i, j]. An obvious thing to do is to place the template at a location in the image and to detect its presence by comparing intensity values in the template with the corresponding values in the image. Since it is rare that intensity values match exactly, we require a measure of dissimilarity between the intensity values of the template and the corresponding values of the image. Here, we take the entire error image as a template and compute the dissimilarity between error images and original images with

∑_{[i,j]∈R} (f − g)²    (4.1)

where R is the region of the template. In the case of template matching, this measure can be computed indirectly and the computational cost can be reduced. We can expand:

∑_{[i,j]∈R} (f − g)² = ∑_{[i,j]∈R} f² + ∑_{[i,j]∈R} g² − 2 ∑_{[i,j]∈R} f·g    (4.2)

Our aim is to find patterns with minimum distance in (4.1), or patterns with a distance smaller than a certain threshold. If ∑f² and ∑g² are treated as fixed, then ∑fg gives a measure of match in Equation (4.2); thus, we only need to find patterns with maximum values of ∑fg. In the matching procedure, a candidate image has to satisfy two constraints: (i) the number of dissimilar pixels is small, and (ii) the aspect ratios are similar. If an image satisfies (i) and (ii), it is considered a candidate image; otherwise, if no image satisfies the two conditions, the bound in (i) is enlarged until one or several candidate images are found.

(i) We use K to represent the measure of similarity between two images:

K = max ∑ f·g    (4.3)

Here, we require K to satisfy K ≤ (h_substitution * w_substitution) / c1, where h_substitution is the height of the current substituted image, w_substitution is its width, and c1 is a constant. Experimentally, we set c1 to six.

(ii) The difference between the aspect ratio of the original image and the aspect ratio of the current error image should be small:

| r_original − r_template | ≤ c2    (4.4)

where r_original is the aspect ratio of an original image, r_template is the aspect ratio of an error image, and we experimentally set c2 to 0.1.
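A rough sketch of this candidate search is given below. It is illustrative only: the helper names and the use of NumPy/OpenCV are assumptions, constraint (i) is implemented as "the number of dissimilar pixels is small" (one reading of the paper's bound involving K and c1 = 6), and the ranking simply maximizes the cross-correlation term of Eq. (4.2) subject to the aspect-ratio constraint (4.4).

```python
# Illustrative candidate search for Section 4; not the authors' code.
import cv2
import numpy as np

def crop_to_bbox(img):
    ys, xs = np.nonzero(img)
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def find_candidates(error_img, nist_images, c1=6, c2=0.1):
    """error_img: binary MNIST error image; nist_images: list of binary NIST images."""
    f = crop_to_bbox(error_img).astype(np.float32) / 255.0
    h, w = f.shape
    r_template = h / w
    candidates = []
    for idx, nist in enumerate(nist_images):
        g = crop_to_bbox(nist)
        r_original = g.shape[0] / g.shape[1]
        # Constraint (ii), Eq. (4.4): the aspect ratios must be similar.
        if abs(r_original - r_template) > c2:
            continue
        # Normalize the NIST image to the size of the cut error image.
        g = cv2.resize(g, (w, h), interpolation=cv2.INTER_LINEAR).astype(np.float32) / 255.0
        # Constraint (i): the number of dissimilar pixels should be small
        # (an assumed reading of the paper's bound with c1 = 6).
        n_diff = int(np.count_nonzero((f > 0.5) != (g > 0.5)))
        if n_diff > (h * w) / c1:
            continue
        k = float(np.sum(f * g))            # Eq. (4.3): K = sum over R of f*g
        dist = float(np.sum((f - g) ** 2))  # Eq. (4.1): squared-difference measure
        candidates.append((dist, k, idx))
    # Patterns with the smallest distance (equivalently, the largest K) come first.
    return sorted(candidates)
```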

When verifying the original images among the candidate images, we consider two situations. If the minimum distance in template matching is very small, we endorse the image with the minimum distance as the original. However, if the minimum distance is too big, we need to examine all the candidate images with their local structures in order to find the original image. We consider several candidates instead of one because, during verification, we found two specific situations that occur when the minimum distances are large.

Conditions: a) If d(x, y) = ||D_min(x, y) − D_2nd min(x, y)|| ≤ T, we match all the candidate images; otherwise, we assign the image with D_min(x, y) to the image in MNIST as a matching pair. Here x is the pattern in MNIST, y is a pattern in NIST, D_min(x, y) is the distance between x and y under template matching, d(x, y) is the difference between D_min(x, y) and D_2nd min(x, y), and T is a constant.


b) If d(x, y1) = d(x, y2) and r(x, y1) ≤ r(x, y2), where r(x, yi) = ||R_MNIST(x) − R_NIST(yi)|| (i = 1, 2), we assign the image y1, where R_MNIST(x) is the aspect ratio of pattern x in MNIST and R_NIST(yi) is the aspect ratio of pattern yi in NIST.

If the candidate images satisfy condition a), which means that the first candidate image is very similar to the second one, we need to compare their local geometric structures with those of the image in MNIST. The patterns in Figure 4 serve as an example: although the first candidate (the far-right image in NIST) has the minimum distance, the second candidate (the centre image in NIST) is the real match of the image in MNIST (the left one). Accordingly, considering several candidate images is necessary.

Figure 4. An example where candidate images are considered.

The other case is the situation where condition b) is satisfied, which means that the image in MNIST has two candidate images in NIST with the same minimum distance; the aspect ratio is then considered. In the example of Figure 5, the matching image is the middle one because it has the same aspect ratio as the image in MNIST.

Figure 5. An example where the aspect ratios are considered.
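Taken together, the candidate verification in conditions a) and b) amounts to a small selection rule, sketched below. The function and variable names and the value of the threshold T are assumptions, and the "verify by local structure" step is left as a manual comparison.

```python
# Illustrative selection of the original image among ranked candidates.
def pick_original(cands, r_mnist, T=50.0):
    """cands: list of (distance, aspect_ratio, nist_index), sorted by distance.
    r_mnist: aspect ratio of the MNIST pattern. Returns NIST indices to keep."""
    (d1, r1, i1), (d2, r2, i2) = cands[0], cands[1]
    if d1 == d2:
        # Condition b): equal minimum distances -- keep the candidate whose
        # aspect ratio is closest to that of the MNIST pattern.
        return [i1] if abs(r1 - r_mnist) <= abs(r2 - r_mnist) else [i2]
    if abs(d1 - d2) <= T:
        # Condition a): the best candidates are too close to separate -- keep
        # them all and verify their local structures against the MNIST image.
        return [i for d, r, i in cands if abs(d - d1) <= T]
    # Otherwise, endorse the minimum-distance candidate as the original.
    return [i1]
```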

5. Comparing the substitution rates of the small database at different sizes and from different sources

While keeping their aspect ratios, we normalize the original images to various sizes and recognize the normalized images using the same feature extraction algorithm and classifier. According to Figure 6, enlarging the images again increases the recognition rate, and normalizing images from the originals gives better performance than normalizing images from MNIST when the images are normalized to the same sizes.

Figure 6. Number of errors in the small normalized database at different normalization sizes (20*20 to 30*30), for images normalized from MNIST and for images normalized from the NIST originals.

6. Conclusion

Our experimental results indicate that enlarging the normalization size of MNIST/NIST numeral images from 20 * 20 to bigger sizes (e.g., 26 * 26) improves the recognition rate. Most researchers agree that substitutions are mainly caused by the quality or distortion of the images; this study finds that one of the main causes of substitution is the small size of the images in MNIST. This can help researchers choose a normalization size when pre-processing handwritten numerals in MNIST, or even when constructing a new database from NIST. Since some MNIST data are not noise-free, they are not good enough to be recognized directly at such a small size. Even though we normalize the images twice, the recognition rates still rise with both the ANN and the SVM. This suggests that if we normalize images to a size larger than 20 * 20, the recognition rate of the entire system will rise as well. In other words, normalizing numeral images to 20 * 20 limits the accuracy of recognition.


Moreover, we find that the recognition rate is higher when images are normalized from the originals. As it is impossible to find all the original images of the MNIST test set, we used the most difficult images to construct a small database. After retrieving all the original images of this small database (181 images from NIST), we tested them in exactly the same way as described above and observed that normalizing the NIST images directly to larger sizes further increased the recognition rate on this small database. Although normalizing images to a larger size produces a higher recognition rate, enlarging images incurs a higher computational cost, both in space and in time. In the future, enlarging images selectively rather than across the entire database could be considered as a compromise, e.g., enlarging mainly those images for which the classifier does not have high recognition confidence. Moreover, increasing the spatial resolution of the gradient features will be considered in future studies.

7. References

[1] A. Britto-Jr., R. Sabourin, E. Lethelier, F. Bortolozzi, and C. Y. Suen, "Improvement in handwritten numeral string recognition by slant normalization and contextual information," Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition (IWFHR), Amsterdam, The Netherlands, 2000, pp. 323-332.
[2] E. Kavallieratou, N. Fakotakis, and G. Kokkinakis, "Slant estimation algorithm for OCR systems," Pattern Recognition, vol. 34, no. 12, 2001, pp. 2515-2522.
[3] T. Y. Zhang and C. Y. Suen, "A fast parallel algorithm for thinning digital patterns," Communications of the ACM, vol. 27, no. 3, March 1984, pp. 236-239.
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, November 1998, pp. 2278-2324.
[5] M. Shi, Y. Fujisawa, T. Wakabayashi, and F. Kimura, "Handwritten numeral recognition using gradient and curvature of gray scale image," Pattern Recognition, vol. 35, no. 10, 2002, pp. 2051-2059.
[6] J. X. Dong, A. Krzyzak, and C. Y. Suen, "A fast SVM training algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 17, no. 3, 2003, pp. 367-384.
[7] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 3, 1992, pp. 418-435.
[8] C. L. He and C. Y. Suen, "A hybrid multiple classifier system of unconstrained handwritten numeral recognition," Proceedings of the 7th International Conference on Pattern Recognition and Image Analysis, St. Petersburg, Russia, October 2004, pp. 684-687.
[9] L. Yang, C. Y. Suen, T. D. Bui, and P. Zhang, "Discrimination of similar handwritten numerals based on invariant curvature features," Pattern Recognition, vol. 38, no. 7, 2005, pp. 947-963.
[10] P. Zhang, T. D. Bui, and C. Y. Suen, "Extraction of hybrid complex wavelet features for the verification of handwritten numerals," Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition, Tokyo, Japan, 2004, pp. 347-352.
[11] S. Battiato, G. Gallo, and F. Stanco, "A new edge-adaptive algorithm for zooming of digital images," Proceedings of IASTED Signal Processing and Communications (SPC 2000), Marbella, Spain, 2000, pp. 144-149.
