First International Conference on Industrial and Information Systems, ICIIS 2006, 8 - 11 August 2006, Sri Lanka

Online Character Recognition System using Elastic Matching

Vikas Kumar¹, Rakesh Ahuja²
¹Department of Computer Engineering & Information Technology, Moradabad Institute of Technology (Uttar Pradesh Technical University), Moradabad, Uttar Pradesh, India
²Department of Computer Engineering & Information Technology, Moradabad Institute of Technology (Uttar Pradesh Technical University), Moradabad, Uttar Pradesh, India

Abstract - In this work we present a simple online character recognition system using elastic matching. We collected our test data from twenty different writers for uppercase letters, lowercase letters and digits. We used a nearest-neighbor classifier with elastic matching as the distance measure between patterns. We achieved recognition accuracies of 95.99% for uppercase letters, 92.21% for lowercase letters and 99.11% for digits. We have also shown that applying a rejection threshold may improve the reliability of the recognition system.

I. INTRODUCTION

The field of personal computing has begun to make the transition from desktop to handheld devices, which calls for input paradigms better suited to single-handed entry than a keyboard. The small size of these devices also makes the inclusion of a keyboard difficult. Handwriting and speech input may be attractive alternatives. For many applications and situations [6], handwritten input is preferable to speech because it is relatively insensitive to environmental noise. In many situations, such as note-taking, annotating a document or filling in forms, writing with a pen is more natural than keyboard entry. Handwriting recognition also has great potential for data entry in languages with a large number of symbols, such as Kanji and Chinese.

Handwriting recognition can be broken down into two categories: offline and online. Offline handwriting recognition focuses on recognizing words that were written on paper at some earlier point in time; the information is presented to the system as a scanned image of the paper document. In contrast, online handwriting recognition performs recognition at the time of writing. The information presented to the recognition system is the sequence of (x, y) coordinates of sample points, which record the trace of the pen's movement over time on the surface of a digitizing tablet. Handwriting recognition systems can further be divided into writer-dependent and writer-independent systems. A writer-dependent system achieves higher recognition accuracy because it works on data with smaller variability. A writer-independent system has to discriminate among the large variety of writing styles present in the target user group. Online recognition in a writer-independent setting is even more difficult, because writing that looks similar in its graphical (i.e. offline) representation can have a different sequential (i.e. online) representation.

II. ONLINE HANDWRITING RECOGNITION

Handwriting recognition is the task of transforming a language represented in its spatial form of graphical marks into its symbolic representation [4]. Online handwriting recognition refers to the situation where recognition is performed using dynamic stroke information along with spatial shape information.

A. Online Handwriting Data

Online handwriting data is typically a dynamic, digitized representation of the pen movement, generally describing sequential information about position, velocity, acceleration, pen angle, etc. as a function of time. For our experiment we considered only the (x, y) coordinate positions of the sample points. We represent writing as a sequence of strokes, where a stroke consists of the set of sample points between a consecutive pen-down and pen-up. Duplicate samples are removed at the time the data is recorded. The handwritten data is then segmented with user intervention so that each segmented sequence of points represents a character pattern, and each pattern is assigned the label of the class to which it belongs. Patterns for training and testing were collected from 20 writers in the categories of uppercase letters, lowercase letters and digits. Test and training patterns were collected at different times so as to absorb variation in handwriting due to situational factors.
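The data representation of subsection II-A (strokes bounded by a pen-down and the following pen-up, with duplicate samples removed at recording time) can be sketched as a small Python structure. This is a minimal illustration, not the authors' code: the class name `CharacterPattern` and its methods are hypothetical, "duplicate samples" is assumed to mean consecutive repeats of the same (x, y) position, and `points()` simply concatenates the strokes as one possible way to obtain a single sequence for matching.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


@dataclass
class CharacterPattern:
    """One labelled character: a list of strokes, each holding the (x, y)
    samples recorded between one pen-down and the following pen-up."""
    label: str
    strokes: list[list[Point]] = field(default_factory=list)

    def add_stroke(self, samples: list[Point]) -> None:
        """Store a stroke, dropping repeated samples first.

        Assumption: "duplicate samples" means consecutive repeats of the
        same position (e.g. while the pen rests on the tablet)."""
        stroke: list[Point] = []
        for p in samples:
            if not stroke or stroke[-1] != p:
                stroke.append(p)
        if stroke:  # ignore a pen-down/pen-up pair that produced no samples
            self.strokes.append(stroke)

    def points(self) -> list[Point]:
        """All samples in writing order with strokes concatenated
        (a simplifying assumption, not necessarily the paper's scheme)."""
        return [p for stroke in self.strokes for p in stroke]
```

For example, `CharacterPattern("A")` followed by `add_stroke([(0, 0), (0, 0), (1, 2)])` stores a single stroke `[(0, 0), (1, 2)]`.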

B. Data Preprocessing

Preprocessing [3] usually addresses the problems of data reduction, elimination of imperfections (smoothing), and size normalization. We performed size normalization on the character patterns so as to bound each pattern in a box of standard size, and the aspect ratio was maintained during this size normalization. When patterns are normalized to smaller sizes, duplicate points may appear because the pattern is compressed; these duplicate points are removed from the pattern.

1-4244-0322-7/06/$20.00 ©2006 IEEE 84

III. STRING MATCHING MEASURE

A string matching technique [2] is used to provide a distance measure between pairs of character patterns. A written stroke is represented as a sequence of events corresponding to the sequence of sample points in the stroke; this sequence forms a variable-length string. The distance between two strings, say A and B, involves computing the distance between the corresponding pair of events e_A and e_B:

$$d_E(i,j) = \sqrt{(x_i^A - x_j^B)^2 + (y_i^A - y_j^B)^2} \tag{1}$$

The distance between two patterns is then the sum of the distances between each pair of corresponding points in the strings, given some alignment of the points. An alignment of the events between two strings takes the form of a set of pairings of the events between the strings:

$$\{(e^A_{t_A(1)}, e^B_{t_B(1)}),\ (e^A_{t_A(2)}, e^B_{t_B(2)}),\ \ldots\}$$

where $t_A(1) < t_A(2) < \cdots$ ...

... unknown against a known pattern stored in a database, which we call the model database, with the aim of minimizing the distance between the unknown and the model. Figure 1 illustrates the process of elastic matching as given by formula (3) below. The definition given in the formula ensures that every point of the unknown is compared against a point of the model. Connell [5] used a variant of this technique.

$$D(i,j) = d_E(i,j) + \begin{cases} \min\{D(i-1,j),\ D(i-1,j-1),\ D(i-1,j-2)\}, & j > 2 \\ \min\{D(i-1,j),\ D(i-1,j-1)\}, & j = 2 \end{cases} \tag{3}$$

where $1 < i \le N_A$ and $1 < j \le N_B$, and

$$\mathrm{Dist}(A,B) = D(N_A, N_B)$$

V. ITERATIVE ELASTIC MATCHING

Elastic matching as described above is computationally intensive, especially if the recursion is carried out directly. We used an alternative iterative formula that is not recursive and is nearly identical to the recursive description. Its main advantage is the reduction in computational complexity.
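The preprocessing of subsection II-B, the event distance of formula (1), the elastic-matching recursion of formula (3) and the nearest-neighbor decision can be sketched together in Python. This is a minimal sketch under stated assumptions, and all function names are hypothetical: patterns are snapped to an integer grid inside a `box` × `box` square (the paper does not give the box size); the first points of the unknown and the model are assumed aligned, D(1,1) = d_E(1,1); and the first model point can only be reached from itself, since the paper's boundary cases are not recoverable here. The table is filled row by row, which costs O(N_A·N_B) time and O(N_B) memory. It is not the authors' iterative formula, which the paper reports as linear-order but whose definition does not appear in this text.

```python
import math
from typing import Callable, Sequence

Point = tuple[float, float]


def normalize(points: Sequence[Point], box: int = 100) -> list[tuple[int, int]]:
    """Scale a pattern into a box x box square with one factor for both axes
    (aspect ratio kept), snap to an integer grid, and drop the consecutive
    duplicates that the compression creates."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # a dot has zero extent
    scale = box / extent
    out: list[tuple[int, int]] = []
    for x, y in points:
        q = (round((x - min_x) * scale), round((y - min_y) * scale))
        if not out or out[-1] != q:
            out.append(q)
    return out


def point_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two aligned events, as in formula (1)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def elastic_distance(unknown: Sequence, model: Sequence,
                     dist: Callable = point_distance) -> float:
    """Bottom-up evaluation of a recursion like formula (3): every point of
    the (non-empty) unknown is matched, and the model index advances by
    0, 1 or 2 per step. Returns inf if the model end cannot be reached."""
    n_b = len(model)
    prev = [math.inf] * n_b
    prev[0] = dist(unknown[0], model[0])  # assumed start: first points aligned
    for i in range(1, len(unknown)):
        cur = [math.inf] * n_b
        for j in range(n_b):
            # predecessors D(i-1, j), D(i-1, j-1), D(i-1, j-2) where they exist
            best = min(prev[j - k] for k in (0, 1, 2) if j - k >= 0)
            if best < math.inf:
                cur[j] = dist(unknown[i], model[j]) + best
        prev = cur
    return prev[-1]  # Dist(A, B) = D(N_A, N_B)


def nearest_neighbor(unknown: Sequence, templates, dist: Callable = point_distance):
    """Return (label, distance) of the closest (label, model) template."""
    best_label, best_d = None, math.inf
    for label, model in templates:
        d = elastic_distance(unknown, model, dist)
        if d < best_d:
            best_label, best_d = label, d
    return best_label, best_d
```

Because the local distance is passed in as `dist`, another event distance can be substituted without changing the recursion.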
Fig. 1. Illustration of elastic matching

Fig. 2. Analysis of correct and wrong recognitions in different distance ranges
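The slope-angle distance of formula (5), used in the lowercase-letter experiment later in the paper, can be sketched in Python as follows. The paper does not say how the slope angle Φ is estimated, whether angle differences wrap around, or what value of w was used. This sketch therefore assumes that Φ at a point is the direction of the chord between its neighbors (computed with `math.atan2`), that differences are wrapped into [-π, π] with `math.remainder`, and that the weighted term sits inside the square root. The helper names `slope_angles`, `to_events` and `slope_event_distance` are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]
Event = tuple[float, float, float]  # (x, y, phi)


def slope_angles(points: Sequence[Point]) -> list[float]:
    """Slope angle (radians) at each point, taken as the direction of the
    chord between its previous and next neighbours (assumed estimator)."""
    n = len(points)
    angles = []
    for k in range(n):
        x0, y0 = points[max(k - 1, 0)]
        x1, y1 = points[min(k + 1, n - 1)]
        angles.append(math.atan2(y1 - y0, x1 - x0))
    return angles


def to_events(points: Sequence[Point]) -> list[Event]:
    """Attach a slope angle to every (x, y) sample."""
    return [(x, y, phi) for (x, y), phi in zip(points, slope_angles(points))]


def slope_event_distance(a: Event, b: Event, w: float = 1.0) -> float:
    """Distance in the spirit of formula (5): squared position difference
    plus a w-weighted squared slope-angle difference, under one square root."""
    dphi = math.remainder(a[2] - b[2], 2 * math.pi)  # wrap into [-pi, pi]
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + w * dphi ** 2)
```

With `functools.partial(slope_event_distance, w=...)`, this function can serve as the local distance of any elastic-matching routine that accepts a distance callback and operates on the events produced by `to_events`.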

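The rejection threshold mentioned in the abstract and evaluated in Table 4 can be sketched as a nearest-neighbor decision that refuses to answer when even the best template is too far away. Two rules here are assumptions, since the paper does not state them explicitly: a pattern is rejected when its best distance exceeds the threshold (consistent with Table 4, where lowering the threshold from 80 to 40 raises the reject rate), and accuracy is measured over the patterns that are not rejected. The threshold's units depend on the normalization and distance used. Both function names are hypothetical, and `distance` can be any pattern distance, such as an elastic-matching distance.

```python
import math
from typing import Callable, Hashable, Iterable, Optional, Sequence


def classify_with_reject(unknown, templates: Iterable[tuple[Hashable, object]],
                         distance: Callable[[object, object], float],
                         threshold: float) -> tuple[Optional[Hashable], float]:
    """Nearest-neighbour decision; the label is None (reject) when the
    closest template is farther away than `threshold`."""
    best_label, best_d = None, math.inf
    for label, model in templates:
        d = distance(unknown, model)
        if d < best_d:
            best_label, best_d = label, d
    if best_d > threshold:
        return None, best_d
    return best_label, best_d


def accuracy_and_reject_rate(predicted: Sequence[Optional[Hashable]],
                             truth: Sequence[Hashable]) -> tuple[float, float]:
    """Percent correct among accepted patterns, and percent rejected."""
    accepted = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    reject_rate = 100 * (len(predicted) - len(accepted)) / len(predicted)
    correct = sum(p == t for p, t in accepted)
    accuracy = 100 * correct / len(accepted) if accepted else 0.0
    return accuracy, reject_rate
```

Sweeping `threshold` over a range of values and calling `accuracy_and_reject_rate` on the results produces the kind of accuracy versus reject-rate trade-off reported in Table 4.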

Table 4 - Results of using the reject threshold

Reject threshold   Recognition accuracy, complete set (%)   Reject rate (%)
None               93.46                                    No reject
80                 93.87                                    0.86
70                 93.92                                    1.25
60                 94.12                                    2.18
50                 94.44                                    3.94
40                 94.90                                    8.69

In a further experiment we concentrated our attention on the lowercase characters and observed that most of them have curves as shape constituents. We therefore decided to include a slope-angle feature in our feature set, as this feature represents curved shapes well. In place of formula (1), we used the following formula to compute the Euclidean distance between two aligned events in the patterns to be matched:

$$d_E(i,j) = \sqrt{(x_i^A - x_j^B)^2 + (y_i^A - y_j^B)^2 + w\,(\Phi_i^A - \Phi_j^B)^2} \tag{5}$$

where w is the coefficient that determines the weight of the slope-angle difference and Φ is the slope angle of the respective pattern. We repeated our recognition experiment and found further improved results in all categories (see Table 5).

Table 5 - Recognition results with the slope-angle feature

Test category   Recognition result
Uppercase       95.99%
Lowercase       92.21%
Digits          99.11%
Complete set    94.74%

VII. CONCLUSION

In this work, we have described our writer-independent character recognition system based on template matching. For the matching of patterns we demonstrated the strength of iterative elastic matching. This technique reduces the time complexity of string matching to linear order while giving good recognition accuracy. We have also shown that the use of stroke information leads to better recognition accuracy, especially for uppercase letters. The results reconfirm that the slope-angle feature represents curved shapes well. In addition, an appropriate reject threshold may improve the reliability of the recognition system. Our recognition results are better than the best results we have seen in the literature for mixed-style characters [1].

REFERENCES

[1] C. Bahlmann, H. Burkhardt, "The Writer Independent Online Handwriting Recognition System frog on hand and Cluster Generative Statistical Dynamic Time Warping", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 3, March 2004.

[2] J. B. Kruskal, "An overview of sequence comparison: Time warps, string edits, and macromolecules", SIAM Review, vol. 25, no. 2, pp. 201-237, 1983.

[3] M. Blumenstein, C. K. Cheng, X. Y. Liu, "New preprocessing techniques for handwritten word recognition", Proc. International Conference on Visualization, Imaging, and Image Processing, pp. 480-484, 2002.

[4] R. Plamondon, S. N. Srihari, "On-line and Off-line Handwriting Recognition: A Comprehensive Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, January 2000.

[5] S. D. Connell, "Online Handwriting Recognition Using Multiple Pattern Class Models", Ph.D. Thesis, Michigan State University, 2000.

[6] H. Tanaka, N. Iwayama, K. Akiyama, "Online Handwriting Recognition Technology and Its Applications", FUJITSU Scientific & Technical Journal, vol. 40, no. 1, pp. 170-178, June 2004.
