RULE-BASED ALGORITHMS FOR HANDWRITTEN CHARACTER RECOGNITION

By
Eng. Randa Ibrahim Mohamed El Anwar

A Thesis Submitted to the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE in ELECTRONIC AND COMMUNICATION ENGINEERING

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
February 2007


Under the Supervision of

Prof. Dr. Mohsen Abdul Raziq Rashwan
Professor of Digital Signal Processing
Faculty of Engineering, Cairo University

Prof. Dr. Samia Abdul Raziq Mashaly
Head of Computers and Systems Dept.
Electronic Research Institute


Approved by the Examining Committee

Prof. Dr. Mohsen Abdul Raziq Rashwan, Thesis Main Advisor
Prof. Dr. Samia Abdul Raziq Mashali, Advisor
Prof. Dr. Magdy Fikry Mohamed Ragaee, Member
Prof. Dr. Mohamed Abdul Fattah Saad El Sherif, Member


Table of Contents

List of Tables  iv
List of Figures  v
List of Abbreviations  vii
Acknowledgement  viii
Abstract  ix

Chapter 1: Introduction  1-4
1.1 The motivation of Document Analysis and Recognition field  1
1.2 The evolution of Pen-Computing devices  2
1.3 The Arabic handwriting  2
1.4 Thesis Objective  3
1.5 Thesis Organization  3

Chapter 2: Document Analysis and Recognition  5-36
2.1 The world of documents and Character Recognition  5
2.2 Off-line documents analysis  7
2.2.1 Preprocessing  8
2.2.2 Segmentation  11
2.2.3 Feature Extraction  15
2.3 On-line documents analysis  16
2.3.1 Preprocessing  17
2.3.2 Segmentation  18
2.3.3 Feature Extraction  21
2.4 Learning & Classification  21
2.4.1 Character Learning  21
2.4.1.1 Supervised Learning  21
2.4.1.2 Unsupervised Learning  22
2.4.1.3 Reinforcement Learning  22
2.4.2 Classification Approaches  22
2.4.3 Classification Tools  24
2.4.3.1 Template Matching  24
2.4.3.2 Statistical Methods  25
2.4.3.3 Stochastic Processes  25
2.4.3.4 Structural Matching  26
2.4.3.5 Neural Network  26
2.4.3.6 Rule based methods  27
2.4.4 Multiple classifier decision combination strategies  28
2.4.4.1 Cascading combination scheme  30
2.4.4.2 Parallel combination scheme  30
2.4.4.3 Hybrid combination scheme  30
2.4.4.4 Classifier ensembles  31
2.5 Summary  36

Chapter 3: Handwriting Recognition Systems  37-58
3.1 Types of Handwriting Recognition Systems  37
3.1.1 Styles of Handwriting: Printed vs. Cursive  37
3.1.2 Writer-Dependent vs. Writer-Independent  38
3.1.3 Closed-Vocabulary vs. Open-Vocabulary  39
3.2 Arabic Character Recognition Systems Survey  40
3.2.1 Characteristics and problems of Arabic script  40
3.2.2 Online and Offline systems for recognizing Arabic script  43
3.3 Foreign Languages Recognition Systems Survey  49
3.3.1 English  49
3.3.2 Japanese, Chinese, Thai and other languages  57
3.4 Summary  58

Chapter 4: Rule-based Algorithm for Off-line Isolated Handwritten Character Recognition  59-73
4.1 Off-line Character Recognition system stages  59
4.1.1 Database collection  59
4.1.2 Preprocessing Stage  59
4.1.3 Feature extraction, Training and Recognition Stages  60
4.1.3.1 Stage 1: using classifier ensemble (hierarchical mixture of experts) controlled by gating according to the structural features of Arabic alphabets  62
4.1.3.2 Stage 2: Adding more structural features for gating between different classifiers  63
4.1.3.3 Stage 3: Adding more features and using feature fusion  66
4.1.3.4 Stage 4: Increasing the reliability of gating  70
4.2 Results and Discussion  72
4.3 Summary  73

Chapter 5: Rule-based Algorithm for On-line Cursive Handwriting Segmentation and Recognition  74-103
5.1 On-line Character Recognition system stages  74
5.1.1 Database collection  74
5.1.2 Preprocessing Stage  76
5.1.2.1 Data Filtering  76
5.1.2.2 Text line and word separation  77
5.1.2.3 Classifying strokes types to main and secondary  83
5.1.3 Pattern Definition Stage  88
5.1.4 Feature extraction Stage  88
5.1.5 Training Stage  90
5.1.6 Recognition Stage  94
5.2 Results and Discussion  100
5.3 Summary  103

Chapter 6: Conclusion and Future work  104-106

References  107

Appendix A: Introduction to Tablet PC  112
Appendix B: ICR companies and commercial products for handwritten text recognition  120
Appendix C: Off-line Isolated Arabic alphabet database  128
Appendix D: On-line Cursive Arabic database  130
Appendix E: On-line Arabic Pattern shapes  137
Appendix F: Algorithms  143


List of Tables

Chapter 3:
Table 3.1: The most common ligatures in Arabic words.  42
Table 3.2: ACR research branches.  44

Chapter 4:
Table 4.1: Common confusions of ACR system using single classifier.  61
Table 4.2: Classifier ensemble controlled by gating according to the number of dots.  62
Table 4.3: Common confusions of ACR system using multiple classifiers.  63
Table 4.4: Multiple classifiers controlled by gating according to the number of dots and loops.  63
Table 4.5: Statistics computed from the document used to find the real system accuracy.  70
Table 4.6: Comparing the proposed system results to other researchers' results.  73

Chapter 5:
Table 5.1: The result of the text line and word separation process.  81
Table 5.2: Stroke states in FP groups containing only two strokes.  84
Table 5.3: Stroke states in FP groups containing more than two strokes.  85
Table 5.4: Odd Stroke states in FP groups containing more than two strokes.  86
Table 5.5: Feature vectors of different pattern shapes.  90
Table 5.6: The segmentation and recognition results before dot restoration.  100
Table 5.7: The recognition results after dot restoration.  101
Table 5.8: The number of correct results versus their location in the ranked list.  102
Table 5.9: The list size reduction percentages.  103


List of Figures

Chapter 2:
Figure 2.1: The most common morphological operations: closing, opening, erosion and dilation.  9
Figure 2.2: The cascading combination scheme for multiple classifiers system.  30
Figure 2.3: The parallel combination scheme for multiple classifiers system.  30
Figure 2.4: The hybrid combination scheme for multiple classifiers system.  30
Figure 2.5: Classifier ensembles.  31
Figure 2.6: Boosting by filtering.  34
Figure 2.7: Stacked generalization.  35
Figure 2.8: Mixture of experts.  35
Figure 2.9: Hierarchical mixture of experts.  36

Chapter 3:
Figure 3.1: Types of English writing styles.  38
Figure 3.2: Examples for character positions in Arabic text.  40
Figure 3.3: All critical points of Arabic characters fall near the writing base line.  40
Figure 3.4: Different cases of character overlapping for the same word.  41
Figure 3.5: Different characters of the same Main-Stroke.  41
Figure 3.6: Clockwise loops form most of the Arabic characters.  41
Figure 3.7: Effect of Diacritics on Arabic word meaning.  41
Figure 3.8: The hierarchy of Arabic CR research.  43

Chapter 4:
Figure 4.1: Radial distance Feature.  60
Figure 4.2: Additional features used for recognition.  64
Figure 4.3: Arabic characters classification hierarchy in stage 2.  65
Figure 4.4: New feature used.  66
Figure 4.5: Arabic characters classification hierarchy in stage 3.  68
Figure 4.6: The document used to compute the real system accuracy.  69
Figure 4.7: Arabic characters classification hierarchy in the fifth approach.  71
Figure 4.8: The accuracy progress of the proposed ACR system.  72

Chapter 5:
Figure 5.1: Motion Computing LE1600 tablet PC.  74
Figure 5.2: The Play Ink GUI tool.  75
Figure 5.3: Documents written using the Play Ink GUI tool.  75
Figure 5.4: The results of data filtering processing.  76
Figure 5.5: Some parameters for the handwritten phrase "on the".  78
Figure 5.6: The successive stroke states in Arabic language.  80
Figure 5.7: The steps of the preprocessing stage.  88
Figure 5.8: Three Freeman chain codes used as a feature for on-line handwriting.  89
Figure 5.9: Handwriting representation using direction codes.  89
Figure 5.10: An example of the training transcription file.  92
Figure 5.11: The chain code registry of all pattern shapes built during the training stage.  92
Figure 5.12: The training algorithm.  93
Figure 5.13: The recognition system overview.  96
Figure 5.14: The steps of the first recognition sub-stage.  97
Figure 5.15: The result of the recognition stage.  99
Figure 5.16: Strokes overlap causes character accuracy loss.  101
Figure 5.17: The number of correct results versus their location in the ranked list.  102


List of Abbreviations

ACR   Arabic Character Recognition
ANN   Artificial Neural Network
CAD   Computer Aided Design
CN    Convolutional Network
CR    Character Recognition
DAR   Document Analysis and Recognition
DCP   Digital Curve Partitioning
DP    Dynamic Programming
FLC   Fuzzy Logic Comparator
GUI   Graphical User Interface
HMM   Hidden Markov Model
HNN   Hopfield Neural Network
ICA   Independent Component Analysis
k-NN  k-Nearest-Neighbor
LCD   Liquid Crystal Display
LVQ   Learning Vector Quantization Neural Network
MCS   Multiple Classifiers System
MLP   Multi-Layer Perceptron
NN    Neural Networks
OCR   Optical Character Recognition
PCA   Principal Component Analysis
PDA   Personal Digital Assistant
QNN   Quantum Neural Network
RBF   Radial Basis Function
SDNN  Space Displacement Neural Networks
SOM   Self-Organized Maps
TDNN  Time Delay Neural Networks
VQ    Vector Quantization
WD    Writer Dependent
WI    Writer Independent


Acknowledgement

This work is for the sake of Islam and Arabic, the language of the Quraan. All achievements and innovative ideas in this work are by the grace of God. Alhamdulellah, as it should be, for His almighty and His graces. Special thanks to my supervisors, Dr. Mohsen and Dr. Samia, for guiding me, pushing me to work and enhance the results, reading the thesis comprehensively, and providing me with every helpful piece of advice. Millions of thanks to my mum, my dad and my brothers for their great patience, ultimate understanding, and continuous support. Special thanks to Dr. Ahmed Farag for his very precious advice and for always being ready to help. Thanks to my true friends and my colleagues from ERI for their help and contributions to database collection. Thanks to Eng. Omar Nasr for his continuous and precious help. Thanks to everyone who prayed for me. I hope this will be just the beginning of a long series of research efforts that may some day contribute to solving the character recognition problem completely, inshaallah.


Abstract

Machine simulation of human reading has been the subject of intensive research for the last three decades. The interest devoted to this field is explained not only by the exciting challenges involved, but also by the huge benefits that a system, designed in the context of a commercial application, could bring. Handwriting is a skill that is personal to individuals. It consists of artificial graphical marks on a surface, and its purpose is to communicate something. It has persisted as a means of communicating and recording information in day-to-day life due to the convenience of paper and pen as compared to keyboards in numerous everyday situations. Character Recognition (CR) is the task of transforming language represented in its spatial form of graphical marks into its symbolic representation. The recognition of handwritten characters is quite difficult due to the wide variability of hand printing and cursive script. Most of the effort in the field of character recognition has been dedicated to recognizing Latin, Japanese and Chinese characters. Arabic CR faces technical problems not encountered in any other language, which makes the problem more challenging. CR research falls into four major directions, varying according to the nature of the problem:
1. Recognition of Isolated Characters.
2. Explicit Word Segmentation Before Recognition.
3. Simultaneous recognition and segmentation of cursive writing.
4. Global Whole Word recognition.
The work proposed in this thesis is dedicated to handwritten Arabic character recognition. The problem is investigated from different sides: the isolated character and cursive handwriting problems, the off-line and on-line points of view, the single-writer and multi-writer variability problems, and single character decisions versus multi-decision outputs. The objective is to achieve the best possible recognition accuracy using the most logical rule-based algorithms.


The thesis is divided into two parts. In the first part, we propose a rule-based off-line character recognition system for the isolated Arabic alphabet written by a single writer. The system is a single-output system. We used the off-line features most common among researchers, yet achieved high results, comparable to those reported in the literature by other researchers, by using feature fusion, proposing a multiple classifier system, and employing a classification hierarchy based on the structural features of Arabic characters. In the second part, we propose the basic stages of a system that addresses the problem of recognizing on-line cursive handwriting. A database of handwriting was collected from multiple writers, and a new technique for text line separation was used. Rule-based algorithms perform simultaneous segmentation and recognition of word portions in an unconstrained, cursively handwritten document using dynamic programming. The output of these stages is a ranked list of possible decisions. We were able to correctly segment and recognize most of the test words, and the correct segmentation-recognition results were located in the top choices of the ranked list. In the future, linguistic knowledge can be used to select the best decision from this list.


Chapter 1: Introduction

1.1 The motivation of the Document Analysis and Recognition field

Over the past few decades, the use of computers in the creation of documents has been of great benefit to society. Software tools such as word processors, computer aided design (CAD) packages, drawing programs, and markup languages assist us in the creation of these documents and allow for their storage in a format understood by a computer. In this format a document can be easily edited, high quality hard copies can be created using a printer, and it can be quickly distributed electronically to others across world-wide networks. Additionally, we may want to take advantage of other facilities available to us when the document is in a computer-readable format. Some of these include keyword or pattern searching of what may be very lengthy documents, applying optimization algorithms or simulations to things such as electronic circuit designs, or improving the visual quality of the pages of a book or a photograph by removing noise that could be the result of years of decay [1]. This is not possible, however, when the present form of the document is on paper. Although we are now in the age of desktop publishing, and most recently printed journals and books are originally produced in a computerized format, this is still not the case for trillions of old documents, nor for the handwritten notes, forms and drawings that are still in use by all of us even today. The information contained within these documents must first be extracted from the hard copy and stored in a computerized format if we wish to have the benefits described above. The problem is that the manual process used to enter the data from these documents into computers demands a great deal of time and money. The field of Document Analysis and Recognition (DAR) has played a very important role in the attempt to overcome this problem.
The general objective of DAR research is to fully automate the process of entering printed or handwritten data into the computer and understanding it. The interest devoted to this field is explained not only by the exciting challenges involved, but also by the huge benefits that a system, designed in the context of a commercial application, could bring [2]. Optical Character Recognition (OCR) is the sub-field of document analysis which is mainly concerned with the recognition of machine-printed or handwritten words or characters in a document.

1.2 The evolution of Pen-Computing devices

In spite of the rapid development of new computer technologies every day, handwriting has continued to persist as a means of recording information in day-to-day life due to the convenience of paper and pen as compared to keyboards in numerous everyday situations. As a general rule, it seems that as the length of handwritten messages decreases, the number of people using handwriting increases. This paradigm has led to the concept of pen computing, e.g., the Personal Digital Assistant (PDA), where pen-top computers aim at replacing the mouse and keyboard of the traditional desktop computer with a pen-and-paper-like interface. With the advent of the PDA there is a great need for more robust and high quality techniques for handwriting recognition, especially for recognizing and storing the handwritten input in a digital (or computerized) format. The problem of recognizing writing in this case is referred to as on-line handwriting recognition, whereas the problem of recognizing writing in scanned (or digitized) images of handwritten documents is referred to as off-line handwriting recognition.

1.3 The Arabic handwriting

The Arabic handwriting recognition problem, either off-line or on-line, is very challenging. The Arabic language differs greatly from Latin-based languages not only in the cursiveness of its characters but also in its language structure. Unlike Latin script, Arabic script is always written from right to left, and no upper or lower case exists. Generally an Arabic word consists of one or more portions, and every portion has one or more characters. The discontinuities between portions are due to some characters that are not connectable from the left side with the succeeding characters. Those characters appear only at the tail of connected portions, and the succeeding character forms the head of the next portion [3]. Many characters differ only by the presence and the number of dots above or below the main part of the character shape. Sometimes, the ambiguity of the positions of these dots in handwritten texts brings out many possible readings for one word. Moreover, every character has more than one shape, depending on its position within a connected portion of the word. Because they do not face these special characteristics, Latin character recognition systems have achieved very high accuracy and are well established as market products, while Arabic character recognition systems still need more research to be established commercially.

1.4 Thesis Objective

The work proposed in this thesis is dedicated to the analysis of human-written documents, concentrating on handwritten Arabic character recognition. The problem is investigated from different sides: the isolated character and cursive handwriting problems, the off-line and on-line cases, the single-writer and multi-writer variability problems, and single character decisions versus multi-decision outputs. The objective is to achieve the best possible recognition accuracy using the most logical rule-based algorithms.

1.5 Thesis Organization

This thesis is organized as follows. Chapter 2 is an introduction to the character recognition field as an important branch of document analysis and recognition. We highlight most CR system types, especially those recognizing human handwritten scripts, and give a detailed introduction to the stages of on-line and off-line character recognition systems and the work done by researchers in each stage. Chapter 3 discusses the characteristics of Arabic handwriting and the difficulties facing researchers trying to develop a successful character recognition system, together with a quick review of samples of the work done on recognizing Arabic as well as foreign languages such as English, Chinese and Thai. In Chapter 4, we propose a rule-based system for recognizing off-line isolated handwritten Arabic characters written by a single writer, using structural features and classifier ensembles. In Chapter 5, we propose a rule-based algorithm for the two early stages of an on-line cursive Arabic handwriting recognizer; simultaneous segmentation and recognition of word portions is performed using dynamic programming techniques. In Chapter 6, we present the conclusions obtained from the proposed work and possible directions for future research in this field. The appendices contain the databases and algorithms used.


Chapter 2: Document Analysis and Recognition

In this chapter we define the scope of the Document Analysis and Recognition (DAR) field, presenting the nature of the digital image, the fundamental steps in digital image processing and segmentation, techniques of document image representation and description, and recognition techniques. The chapter also introduces the character recognition (CR) field as an important branch of document analysis and recognition. It highlights most CR system types, especially those recognizing human handwritten scripts, and includes a detailed introduction to the different stages of on-line and off-line character recognition systems and the efforts made by researchers in each stage.

2.1 The world of documents and Character Recognition

The importance of documents in our life comes mainly from the convenience of the pen-and-paper interface for acquiring knowledge, registering day-to-day information, and communicating with others in everyday situations. Paper documents, such as newspapers, magazines, books and even personal notes and sketches, which are an inherently analog medium, can be converted into digital form through a process of scanning and digitization, and can then benefit from the facilities available to us when the document is in a computer-readable format. This approach is distinguished as producing an off-line document. Written documents are not only encountered in the form of handwriting inscribed on paper; they can also be created in the form of handwriting registered on an electronically sensitive surface. In this case, handwriting data is converted to digital form by writing with a special pen on an electronic surface such as a Liquid Crystal Display (LCD). This approach is distinguished as producing an on-line document.


In the on-line case, the two-dimensional coordinates of successive points of the writing are stored in order, so the order of the strokes made by the writer is readily available. In the off-line case, only the full written script is available as an image. The on-line case thus deals with a one-dimensional representation of the input, whereas the off-line case involves analysis of the two-dimensional image, and the raw data storage requirements are widely different. Off-line character recognition systems take as input the digital image of a document and try to automatically interpret the graphical marks it comprises into their corresponding symbols, whereas on-line character recognition systems take as input streams of two-dimensional coordinates of successive points of the writing and try to automatically identify the written symbols from the recorded pen trajectory. Off-line and on-line character recognition systems also differ according to the applications they are devoted to: off-line recognition is dedicated to bank check processing, mail sorting, reading of commercial forms, etc., while on-line recognition is mainly dedicated to the pen computing industry and to security domains such as signature verification and author authentication. The recognition rates reported are much higher for the on-line case in comparison with the off-line case. This may be attributed to the fact that more information can be captured in the on-line case, such as the direction, speed and order of the strokes of the handwriting; this information is not as easy to recover from handwritten words written on an analog medium such as paper. On the other hand, the recognition of off-line handwriting is more complex than the on-line case due to the presence of noise in the image acquisition process and the loss of temporal information such as the writing sequence and the velocity.
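The contrast between the two representations described above can be made concrete with a small sketch (the variable names here are illustrative, not taken from the thesis): an on-line document records an ordered stream of pen coordinates per stroke, while an off-line document is only a two-dimensional grid of pixel values.

```python
# On-line: each stroke is an ordered list of (x, y) samples, so the
# writing order and direction are explicitly preserved.
online_word = [
    [(10, 5), (11, 6), (13, 8), (16, 9)],   # stroke 1: pen-down ... pen-up
    [(14, 2), (14, 3), (14, 4)],            # stroke 2 (e.g. a diacritic)
]

# Off-line: only a grid of pixel intensities survives; stroke order,
# speed and direction are lost.
offline_word = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

n_strokes = len(online_word)                 # temporal structure is explicit
n_ink_pixels = sum(map(sum, offline_word))   # only spatial ink remains
```

This is why the on-line case is described as one-dimensional (an ordered sequence) and the off-line case as two-dimensional (an image).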


2.2 Off-line documents analysis

When the system input device is a still camera or a scanner, which captures the position of digital ink on the page but not the order in which it was laid down, the output is a digital image. In this case, we speak of off-line documents. A digital image may be defined as a two-dimensional function, f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y and the amplitude values of f are all finite, discrete quantities, we call the image a digital image. Note that a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are referred to as picture elements, image elements, pels, or pixels [4]. The analysis and recognition of off-line documents are achieved via several stages. First, the camera or the scanner captures an image of the document. Next, this digital image is preprocessed to simplify subsequent operations without losing relevant information. The processed (or simplified) document is then sent to a feature extractor, whose purpose is to reduce the data by measuring certain "features" or "properties". These features are then passed to a classifier that evaluates the evidence presented and makes a final analysis decision. Optical Character Recognition (OCR) is the task of transforming language represented in its spatial form of graphical marks (or a digitized image of characters) into its symbolic representation. The typical character image classes are usually the upper- and lower-case characters, the ten digits, and special symbols such as the period, exclamation mark, brackets, dollar and pound signs, etc. The recognition of characters from a single font family on a well-printed paper document can be done very accurately.
Difficulties arise when there are decorative fonts, many fonts to be handled, or when the document is of poor quality. In the difficult cases, it becomes necessary to use models to constrain the choices at the character and word levels. Such models are essential in handwriting recognition due to the wide variability of hand printing and cursive script. A pattern recognition algorithm is used to extract shape features and assign the observed character to the appropriate class. These algorithms will be described in detail later, after discussing the earlier OCR system stages.
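The capture, preprocessing, feature extraction and classification stages described above can be sketched as a toy pipeline. All function bodies here are deliberately minimal stand-ins (a thresholding binarizer, an ink-count feature vector, and a nearest-template classifier), assumed for illustration only; they are not the thesis's algorithms.

```python
def preprocess(image):
    # Binarize: map gray levels (0..255) to ink (1) / background (0),
    # using a fixed toy threshold of 128.
    return [[1 if p < 128 else 0 for p in row] for row in image]

def extract_features(binary):
    # Toy feature vector: total ink count followed by ink per row.
    return [sum(map(sum, binary))] + [sum(row) for row in binary]

def classify(features, templates):
    # Nearest template by squared distance between feature vectors.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda label: dist(features, templates[label]))

# Usage with a hypothetical 3x3 scan and two hypothetical class templates.
scan = [[255, 0, 255], [255, 0, 255], [255, 0, 255]]
feats = extract_features(preprocess(scan))
label = classify(feats, {"alef": [3, 1, 1, 1], "dot": [1, 0, 1, 0]})
```

Each stage reduces the data: the raw image becomes a binary map, the map becomes a short feature vector, and the vector becomes a single class decision.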

2.2.1 Preprocessing

Preprocessing is an essential task that precedes the tasks of image representation and recognition. Its importance derives from the fact that discrimination power is directly proportional to digital image quality, in the sense that the higher the image quality, the fewer confusions we have and thus the more powerful the classification we can make. Some of the common operations performed as preprocessing are: binarization, the task of converting a gray-scale image into a binary black-and-white image; noise removal, the extraction of the foreground textual matter by removing textured backgrounds, salt-and-pepper noise or interfering strokes; image enhancement and restoration, the task of converting the image into one more suitable than the original for a specific application; and morphological image processing, the task of extracting image components that are useful in the representation and description of region shape. The basic morphological algorithms are boundary extraction, region filling, extraction of connected components, and thinning and thickening. The most common morphological operations used are: dilation, where the value of the output pixel is the maximum value of all the pixels in the input pixel's neighborhood; erosion, where the value of the output pixel is the minimum value of all the pixels in the input pixel's neighborhood; opening, which smoothes contours and eliminates small islands and sharp peaks; and closing, which smoothes contours, fuses narrow breaks and eliminates small holes.



Figure 2.1: The most common morphological operations: closing, opening, erosion and dilation.
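The neighborhood definitions above translate directly into code: dilation takes the maximum and erosion the minimum over a 3x3 neighborhood of a binary image, and opening/closing are just compositions of the two. This is a minimal sketch (border pixels use only their in-image neighbors), not an optimized implementation.

```python
def _neighborhood(img, r, c):
    # All pixel values in the 3x3 window around (r, c), clipped at borders.
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(r - 1, 0), min(r + 2, rows))
            for j in range(max(c - 1, 0), min(c + 2, cols))]

def dilate(img):
    # Output pixel = maximum over the input pixel's neighborhood.
    return [[max(_neighborhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]

def erode(img):
    # Output pixel = minimum over the input pixel's neighborhood.
    return [[min(_neighborhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img):   # erosion then dilation: removes small islands
    return dilate(erode(img))

def closing(img):   # dilation then erosion: fuses narrow breaks
    return erode(dilate(img))

# A single isolated ink pixel is a "small island": opening removes it.
speck = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Running `opening(speck)` yields an all-zero image, illustrating the island-removal behavior described above.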

In the OCR field, the previous operations have been dealt with extensively for handwritten or machine printed documents. 1. Thresholding: The task of thresholding is to extract the foreground (ink) from the background. Following scanning, character images are initially stored in a grey-level format. This means that the intensity of each pixel in the image may vary between a value of 0 and 255. The value zero indicates a white pixel,

22

whereas a black pixel is represented by the value 255. Various shades of grey are represented between these two values. Many researchers have decided to convert the initial grey-level images into a less storage intensive format i.e. a binary (0 and 1), black and white format (binarization) [5]. Some threshold is usually selected so that pixels with a luminance over the threshold are marked as being background pixels while pixels with a luminance under the threshold are considered to be part of the character image. Selecting an appropriate threshold is achieved using the histogram of gray scale values of the document image. An optimal value is determined in the valley between the two peaks corresponding to the foreground and background. 2. Noise Removal: Digital capture of images can introduce noise from scanning devices and transmission media. Noise is a random error in a pixel value, usually placed under one of the three categories: signal-independent, signaldependent and salt & pepper noise. Noise cannot always be totally eliminated; but smoothing is a widely used procedure for replacing the value of a pixel by the average of the values of the neighboring pixels surrounding the original pixel. 3. Skeletonization: Line images coming from scanners are normally several points thick. Most relevant information in lines is not related to the thickness of the line. Hence, thinning of lines by removing all redundant pixels, until they become just 1-point thick can be very useful procedure. In general a thinning procedure is judged by how well it is able to control lines of the original image without: •

• Fragmenting a previously continuous line by breaking it into a number of isolated lines,
• Clipping the ends of the central line,
• Introducing new features (e.g. a cusp) which were not there originally, or
• Eliminating/replacing a feature (i.e. by replacing a loop with a single line).
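As an illustration of the thinning step, the classical Zhang-Suen algorithm (a standard published procedure, not this thesis's own method) tries to satisfy the criteria above by peeling boundary pixels in two alternating sub-iterations; a minimal sketch on a small binary image:

```python
def zhang_suen(img):
    """Zhang-Suen thinning: iteratively peel boundary pixels of a binary
    image (1 = ink) until only a roughly one-pixel-wide skeleton remains."""
    img = [row[:] for row in img]          # work on a copy
    h, w = len(img), len(img[0])

    def neighbors(y, x):
        # P2..P9: the 8 neighbors, clockwise starting from north.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):                # the two sub-iterations
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    p = neighbors(y, x)
                    b = sum(p)             # number of ink neighbors
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                            for i in range(8))   # 0 -> 1 transitions
                    if step == 0:
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((y, x))
            for y, x in to_clear:          # delete simultaneously
                img[y][x] = 0
                changed = True
    return img

# A 3-pixel-wide vertical bar in a zero-padded 7x5 grid.
bar = [[0] * 5 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 4):
        bar[y][x] = 1
thin = zhang_suen(bar)
```

Note that the simultaneous-deletion rule is what protects a line that is already one pixel thick from being fragmented or clipped further.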


Unfortunately, most thinning algorithms introduce artifacts, such as spurs, which make their use somewhat limited.

4. Normalization: The variability of handwritten patterns poses a major problem for machine recognition of characters. Normalization refers to such operations as estimating and correcting a character's slant, scaling the character to a uniform size, and possibly reducing the character to a skeleton so that the line width is uniform (one unit wide). A normalization operation is often required before any feature extraction is carried out. Normalization routines may be broken down into the following groups:

• Moment invariant techniques.
• Fourier Descriptors.
• Boundary-Based Techniques.
• Vector Analysis.

These routines normalize the character size by dividing whatever size-related feature they use by the total length of the character. They normalize the position of the character by moving the center of coordinates to a point at a fixed position on or about the character, e.g. the centroid or the starting point of the character [6].
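The position and size normalization just described (moving the coordinate origin to the centroid, then dividing by the total length of the character) can be sketched as follows; the point-list representation of the character is an assumption for illustration:

```python
def normalize_points(points):
    """Translate a character's points so the centroid is the origin,
    then scale by the total stroke length (a size-related feature)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    shifted = [(x - cx, y - cy) for x, y in points]
    # Total path length is translation invariant, so it can be
    # computed after the shift.
    length = sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(shifted, shifted[1:])
    )
    if length == 0:
        return shifted
    return [(x / length, y / length) for x, y in shifted]

pts = normalize_points([(0, 0), (4, 0), (4, 3)])
```

After normalization the centroid of the points is the origin and the total path length is exactly 1, so characters of different sizes and positions become directly comparable.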

2.2.2 Segmentation

Segmentation subdivides an image into its constituent regions or objects. The level to which subdivision is carried depends on the problem being solved; that is, segmentation should stop when the objects of interest in an application have been isolated [4]. For example, images of handwritten or typewritten text undergo several types of segmentation: image and graph zoning and isolation, line segmentation, word segmentation, and sometimes also segmentation of a word into sub-words, characters, or even parts of characters (primitives). In the OCR field, these operations have been dealt with extensively for handwritten and machine-printed documents.

1. Line-to-word Segmentation: The task of segmenting lines of text into words is straightforward for machine-printed documents. It can be accomplished by


examining the horizontal histogram profile at a small range of skew angles. The task is more difficult in the handwritten domain: lines of text may undulate up and down, and ascenders and descenders frequently intersect characters of neighboring lines. One method, used by S. N. Srihari in [7], is based on the notion that people write on an imaginary line which forms the core upon which each word of the line resides. This imaginary baseline is approximated by the local minima points of each component, and a clustering technique is used to group the minima of all components in order to identify the different handwritten lines [8].

2. Word-to-Character Segmentation: Line-to-word separation is usually followed by a procedure that separates each text word into characters. Most word segmentation approaches assume that the gaps between words are larger than the gaps between characters; however, characters can be written cursively and may also overlap. There are various methods for tackling the segmentation problem:

• Pre-Segmentation: the characters arrive already separated from each other. This is normally the case when the text is printed, or when the writer is required to write the characters in boxes or without connecting them together.
• Finding Gaps: find the gaps between the letters or, at least, the connecting lines. These techniques function by analyzing the geometric relationships between the various components of the text [6].

There have been various attempts to segment characters reliably, but many have imperfections. Humans can easily segment characters by first recognizing them. While some methods try to combine segmentation with recognition, the two are normally performed as separate stages. This two-stage approach makes it easier for the computer to segment words, but errors are possible in both stages: if two segments which should not be combined happen to look like two parts of one character, a segmentation error is likely to result.
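A minimal sketch of the "finding gaps" idea: runs of empty columns in the vertical projection of a binary word image are taken as candidate segmentation points. The row-of-pixels image representation is assumed for illustration:

```python
def find_gaps(image):
    """Locate candidate segmentation points as runs of empty columns
    in the vertical projection.  `image` is a list of rows of 0/1
    pixels (1 = ink)."""
    width = len(image[0])
    projection = [sum(row[c] for row in image) for c in range(width)]
    gaps, start = [], None
    for c, count in enumerate(projection):
        if count == 0 and start is None:
            start = c                      # a gap begins
        elif count != 0 and start is not None:
            gaps.append((start, c - 1))    # a gap ends
            start = None
    if start is not None:
        gaps.append((start, width - 1))
    return gaps

# Two ink blobs separated by two empty columns.
img = [
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1],
]
gaps = find_gaps(img)
```

This only works when letters do not touch or overlap, which is exactly the limitation noted above for cursive writing.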

The work done on the off-line document segmentation problem can be classified as follows:

a. Segmentation based on contour analysis and baseline location: A font-independent technique introduced by M. Allam [9] for printed text. After tracing the edge contours, the contour analysis yields a set of contours C, as described in equation (2.1), where M is the number of contours:

C = {Ci | i = 1, 2, …, M}    (2.1)

Ci = {(xip, yip), αi, δp | p = 1, 2, …, Ni}    (2.2)

The contour Ci is a list of contour points, as shown in (2.2). Each pair (xip, yip) represents the x-y coordinates of a contour point, Ni is the length of the ith contour in points, αi is 0 for an external contour and 1 for a hole, and δp is the directional code (chain code). The chain code provides the information needed to locate the baseline. Once the baseline location is defined, segmentation is done at the points where the contour makes a transition from the inside to the outside of the baseline zone.

b. Segmentation based on vertical histogram: A font-dependent technique introduced by H. Abdelazim et al. for printed text. After plotting the vertical histogram of the word or sub-word, it is traversed by a predefined threshold. The zones above this threshold are isolated and passed to the recognition stage. The computation of the threshold value depends on the font, and is proportional to the lump of black pixels that joins characters together [9].

c. Stroke Segmentation: As explained in [9], this approach breaks the skeletonized word down into principal strokes and secondary strokes. The algorithm starts by tracing the curve from the rightmost point until it reaches an end point. The stroke is then isolated, and the procedure is repeated for the next curve until no strokes remain.
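The directional code δp in the contour representation above is conventionally the 8-direction Freeman chain code; a small sketch, assuming the contour points are already traced as 8-connected (x, y) pairs:

```python
# Freeman 8-direction chain code: 0 = east, numbered counter-clockwise.
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(contour):
    """Encode a list of 8-connected contour points (x, y) as the
    directional (chain) code sequence used in equation (2.2)."""
    return [DIRECTIONS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(contour, contour[1:])]

codes = chain_code([(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)])
```

Each code records only the direction of the step to the next contour point, so the whole contour can be stored as one start point plus a compact code string.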


d. Post-Segmentation (segmentation by recognition): A font-dependent technique tested by S. ElDabi [3, 9]. The basic idea is to sequentially extract a set of features, accumulating their values while moving along the word. The accumulative invariant moments are used as features: they are calculated column by column of the word image and checked against the feature space of a given font. If the character is not found, another column is added to the character, the moments are recalculated, and the check is repeated. This process continues until the character is recognized or the end of the word is reached. Although this approach solves the cursiveness problem in an elegant way, it is time consuming and quite sensitive to noise.

e. Segmentation by Neural Network: As explained in [3, 9], this is a sophisticated technique introduced by A. Abdel Magued. Contour analysis is performed using neural networks to determine the locations of break points between characters. For every point on the contour, a fixed-length vector of the directional codes around that point is used as input to a multi-layer perceptron (MLP) neural network, which decides whether the point is a break point. The suggested break points are fed to another neural network, a Hopfield network, which selects a subset of break points that minimizes a cost function. The networks are trained on manually marked break points. This technique is of great theoretical value, but is not accurate when tested against a large data set.

f. Segmentation using Dynamic Programming (pre-stroke segmentation): As described by Thomas M. Breuel in his paper [10], cursive handwriting characters are linked regularly and predictably by festoon-like strokes. Minima of the upper contour of a cursive input string (valley points) therefore correspond to segmentation points between characters. Cuts are always hypothesized between different connected components whose vertical projections do not overlap significantly. After this preprocessing step, the main task is to find cuts that divide connected components into their individual characters. The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string that minimizes a certain cost function. The set of cuts and their precise shapes are found simultaneously. The tradeoff between over-segmentation and missed cuts is similar to the tradeoff between positive and negative errors in pattern recognition: by choosing thresholds for the evaluation function, Breuel was able to determine how many cuts, on average, are generated per actual character.

In contrast to all of the above, segmentation-free methods focus on using features of the whole word, which is recognized as a single unit. This depends heavily on a predefined lexicon which acts as a look-up dictionary. Such a lexicon is application dependent, as in bank check processing and mail sorting, and the accuracy of the system depends strongly on the selected features.
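The dynamic-programming search for a globally optimal set of cuts (method f) can be sketched as follows; the candidate cut positions and the per-segment cost function below are hypothetical stand-ins for the valley points and evaluation function described above:

```python
def best_cuts(candidates, segment_cost):
    """Choose the subset of candidate cut positions that minimizes the
    summed cost of the resulting segments.  `candidates` must include
    the start and end of the string; `segment_cost(a, b)` scores the
    piece between cuts a and b."""
    n = len(candidates)
    INF = float("inf")
    cost = [INF] * n       # cost[j]: best total cost ending at cut j
    back = [0] * n         # back[j]: previous cut on the best path
    cost[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            c = cost[i] + segment_cost(candidates[i], candidates[j])
            if c < cost[j]:
                cost[j], back[j] = c, i
    cuts, j = [], n - 1    # recover the chosen cut positions
    while j > 0:
        cuts.append(candidates[j])
        j = back[j]
    cuts.append(candidates[0])
    return cuts[::-1]

# Toy cost: segments close to 10 pixels wide (one 'character') are cheap.
cost_fn = lambda a, b: abs((b - a) - 10)
cuts = best_cuts([0, 5, 10, 20, 30], cost_fn)
```

Because the recurrence considers every earlier cut for each position, the chosen set of cuts is globally optimal for the given cost function, which is the essential point of Breuel's formulation.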

2.2.3 Feature Extraction

After an image has been segmented into regions, the resulting aggregate of segmented pixels is usually represented and described in a form suitable for further computer processing. Representing a region involves two basic choices: (1) we can represent the region in terms of its external characteristics (its boundary), or (2) we can represent it in terms of its internal characteristics (the pixels comprising the region). The next task is to describe the region based on the chosen representation. For example, a region may be represented by its boundary, and the boundary described by features such as its length, the orientation of the straight line joining its extreme points, and the number of concavities in the boundary [4].

The features are a reduced representation of the contents of the image which preserves the characteristics most relevant to the task of recognition. Extracted features are represented as a vector of values that is then passed to the recognizer stage. A good feature set should represent the characteristics of a class that help distinguish it from other classes, should be invariant to characteristic differences within the class, and should avoid redundant information. It should also be limited in number, to permit efficient computation and to limit the amount of training data required.


Three different groups of feature extraction techniques exist: templates, structural decomposition and series expansion.

Template matching is very sensitive to size, rotation and translation variations, as well as to noise, and therefore has limited application. Structural decomposition encodes knowledge about the structure of the object to be recognized: (1) knowledge about the contour of the object, such as height-contour and chain-code features, and (2) knowledge about what sort of components make up the object, such as end points, T-joints and X-joints. Series expansion represents a signal by a linear combination of a series of simpler, well-defined functions, where the coefficients of the combination provide a compact encoding. If these functions are orthogonal to each other, the coefficients provide non-redundant information for reconstructing the signal; examples include moments, the Fourier transform, the Gabor transform and wavelets.

There is an infinite number of potential features that one can extract from a finite 2D pattern. However, only those features of possible relevance to classification need to be considered. During the design stage, the expert therefore focuses on the features which, given a certain classification technique, will produce the most certain and efficient classification results [6]. Once an input image is mapped onto a point in feature space, the next step is to perform character learning and recognition. These stages are the same in on-line and off-line document analysis.
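As a small example from the series-expansion family, the central moments of a binary character image form a compact, translation-invariant feature vector; the image representation here is an assumption for illustration:

```python
def central_moments(image, p_max=2):
    """Central moments mu_pq (p + q <= p_max) of a binary image
    (1 = ink).  Taking moments about the centroid makes them
    translation invariant."""
    pts = [(x, y) for y, row in enumerate(image)
           for x, v in enumerate(row) if v]
    m = len(pts)
    cx = sum(x for x, _ in pts) / m        # centroid
    cy = sum(y for _, y in pts) / m
    feats = {}
    for p in range(p_max + 1):
        for q in range(p_max + 1 - p):
            feats[(p, q)] = sum((x - cx) ** p * (y - cy) ** q
                                for x, y in pts)
    return feats

# A small plus-sign-shaped pattern.
img = [[0, 1, 0],
       [1, 1, 1],
       [0, 1, 0]]
mu = central_moments(img)
```

By construction the first-order central moments are always zero, and higher-order moments describe the spread and skew of ink about the centroid.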

2.3 On-line Document Analysis

On-line handwriting recognition is a key software technology required to realize a computer interface that models paper and pencil. Pen-computing devices have revolutionized human-computer interfacing. Pen computers record handwriting information as a time-ordered sequence of (x, y) points; the problem of recognizing writing in this case is referred to as on-line handwriting recognition.


The pen input device records the trajectory of the pen tip on the paper as a sequence of points sampled over time (xt, yt). The set of points between a pen-down and the next pen-up is called a stroke. The pressure of the pen tip on the paper may also be used during recognition. For a complete introduction to pen-computing devices and tablet PCs, see Appendix A.

Classically, on-line recognizers consist of a preprocessor, a classifier which provides estimates of probabilities for the different categories of characters (or other sub-word units), and a dynamic-programming postprocessor (often a hidden Markov model) which may incorporate a language model. The system usually has adjustable parameters whose values are determined during a training session [11]. Pre-processing of handwriting is done prior to recognition and typically involves noise reduction, normalization and segmentation. Feature vectors are then extracted from the pre-processed handwriting and are used in conjunction with character and language models for recognition.

2.3.1 Preprocessing

1. Noise Removal: Noise in the data typically originates from the limited accuracy of the digitizing process, erratic hand motion, and inaccuracies in pen-down indication. A number of techniques are used for noise reduction:

a. Smoothing: A common technique averages each point with its neighbors. Another well-known approach is to approximate the underlying ink trace by some standard curves.

b. Filtering: Eliminates duplicate data points and reduces the total number of points. The form of filtering depends on the recognition method. One filtering technique forces a minimum (or fixed) distance between consecutive points. When the writing is fast, however, the distance between successive points may far exceed that minimum; in this case interpolation can be used to obtain equally spaced points (this is also called resampling). Another filtering technique produces more points in regions of high curvature.


c. Wild point correction: Replaces or eliminates occasional spurious points, typically caused by hardware problems.

d. De-hooking: Eliminates hooks that occur at the beginning and end of strokes, caused by inaccuracies in pen-down and pen-up positions.

2. Normalization: There is typically great variability in the size of letters. The goal of normalization is to reduce this variability. A number of algorithms are used for normalization:

a. De-skewing: Corrects slant. This can be applied at the letter or word level.

b. Baseline drift correction: Orients the word relative to a baseline.

c. Size normalization: Adjusts the letter size to a standard size.

d. Stroke length normalization: Forces the number of points in a stroke to a specified number for easy alignment [12].
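The resampling step mentioned under filtering (interpolating the ink trace to equally spaced points) can be sketched as:

```python
def resample(points, spacing):
    """Resample an on-line stroke to equally spaced points along the
    trace, interpolating linearly between the recorded samples."""
    out = [points[0]]
    residual = 0.0          # arc length already covered in the segment
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        seg = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        if seg == 0:
            continue        # drop duplicate points
        d = spacing - residual
        while d <= seg:     # emit every multiple of `spacing`
            t = d / seg
            out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
            d += spacing
        residual = (residual + seg) % spacing
    return out

pts = resample([(0.0, 0.0), (10.0, 0.0)], 2.5)
```

Resampling also serves as stroke length normalization: a fast stroke with few samples and a slow stroke with many samples end up with point counts proportional only to their arc lengths.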

2.3.2 Segmentation

Segmentation is the process of breaking up the input handwritten data into smaller logical units. It can be done at various levels, determined by the nature of the recognition algorithm:

1. Stroke level segmentation: Stroke level segmentation can be done trivially using pen-up/pen-down events for characters written as single strokes.

2. Character level segmentation: In printed writing, the user can explicitly indicate character boundaries by writing in pre-defined boxes, or heuristic methods can look for spacing between adjacent strokes. In cursive writing, heuristics or statistics computed from training data are used. In most cases, all plausible segmentations are computed and the one that yields the best recognition is used.

3. Word level segmentation: Word segmentation can be done temporally, spatially, or in combination. Word boundaries are determined temporally when the time between a pen-up and the next pen-down exceeds a threshold, and spatially when the distance between the centroids of adjacent strokes exceeds a certain threshold.

4. Line level segmentation: Line segmentation uses temporal and spatial information to identify clusters of strokes that lie on one line. If real-time recognition is not required, line segmentation is determined first, and word level segmentation is applied to the strokes belonging to each line.

5. Region level segmentation: If a user mixes pictorial information (drawings, tables, etc.) with text, the page is first segmented by information type. Regions marked as text are then segmented into lines by the line level segmentation module [12].

There is some difference between the techniques used for segmenting on-line cursive script and those used for off-line script. In [3], Almuallim and Yamaguchi introduced an algorithm based on following the pen trajectory. Although this technique was applied to off-line cursive script, it was the first step towards an on-line cursive handwriting segmentation algorithm. The algorithm defines feature points (line ends, branch points, and cross points) to perform segmentation: it searches for the start point, then deletes points from the word until an end point is detected. The end point can be a feature point or a point of sudden change in curvature (up or down).

Another algorithm for segmenting words into characters was introduced in a system called Interactive Recognition of Arabic Characters (IRAC II) [13], which recognizes a limited set of words stored in a dictionary. In this system, segment boundaries are placed after an intersection, after a cusp, or after a change in curvature. Due to the low recognition rate of IRAC II, IRAC III was developed based on stroke matching; like IRAC II, it is a limited-vocabulary recognition system.

Abo-Samara [3] used a segmentation-by-recognition technique. He abstracted the word script into simple directional primitives (up, down, left, and right) and developed syntactic rules to generate more meaningful primitives. Size and level normalization is done to enhance the recognition accuracy.
As a result, the script is transformed into a sequence of topological primitives, and a primitive-dependent language groups the primitives into more classified shapes. The major disadvantage of the algorithm is that the abstraction process removes the script details, which affects the recognition accuracy and introduces a new layer to solve the context-dependent problem. Moreover, the abstraction process results in very small primitives, and grouping these primitives into characters requires a huge amount of computation time.

Y. Hifny [3] used an approach similar to Abo-Samara's. He used the key segmentation characteristics of several algorithms (loops, straight-line strokes, handwriting velocity and angular profile, corners, end points, local minima and local maxima) to segment characters into primitive shapes, and developed a description language (syntactic rules) to group these primitives after they are recognized using MLP neural networks. The major advantage of his algorithm is that the resulting primitives are nearly characters, which requires much less computation time. The drawback is that it cannot be considered a general-purpose algorithm, since it cannot handle a wide variety of handwriting styles (different character stroke orders).

P. Neskovic and L. N. Cooper [14] used a multi-layer feedforward network based on a weight-sharing technique to detect characters, an architecture similar to Time Delay Neural Networks (TDNN) and Space Displacement Neural Networks (SDNN). One of the most useful properties of this architecture is its original and easy training procedure: while the segmentation network learns to detect isolated characters, the architecture is trained on entire unsegmented words. The only information it needs is whether a character of a particular class is present within the pattern, not the exact location of the character. The output layer of the network consists of units called character detectors, whose outputs form a detection matrix. The elements of the detection matrix represent recognition probabilities of characters: each row represents one character, and each column corresponds to the position of the character within the pattern.
Units in the same row have restricted, overlapping receptive fields and share the same weights. However, since the section of the input pattern supplied to a character detector often contains insufficient or ambiguous information, the detector outputs are not very reliable and are over-segmented: only some of the elements of the detection matrix represent correct characters, and the goal is to find them.


2.3.3 Feature Extraction

Features are typically extracted at a sub-letter level. The feature set varies greatly between recognizers. Some commonly used features are:

1. Shape descriptors: for example, ascender, descender, concave-down, concave-up, loop, cusp, curliness, lineness, etc.
2. Symbolic representations of the singularities in an ink trace.
3. Tangent and curvature features for a window of points along the ink trace.
4. Writing speed.
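Tangent and curvature features (item 3) can be computed directly from the sampled trace: the tangent direction between consecutive points, and its change, wrapped into (-pi, pi], at each interior point. A minimal sketch:

```python
import math

def tangent_angles(points):
    """Tangent direction (radians) between consecutive trace points."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def curvatures(points):
    """Turning angle at each interior point: the change in tangent
    direction, wrapped into (-pi, pi]."""
    angles = tangent_angles(points)
    out = []
    for a1, a2 in zip(angles, angles[1:]):
        d = a2 - a1
        while d <= -math.pi:
            d += 2 * math.pi
        while d > math.pi:
            d -= 2 * math.pi
        out.append(d)
    return out

# A right-angle turn: east, then north.
curv = curvatures([(0, 0), (1, 0), (1, 1)])
```

These per-point angles are usually computed over a resampled (equally spaced) trace so that writing speed does not distort the feature values.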

2.4 Learning & Classification

2.4.1 Character Learning

Learning refers to some form of algorithm for reducing the error on a set of training data. A range of gradient-descent algorithms that alter a classifier's parameters in order to reduce an error measure now permeate the field of statistical pattern recognition, and these will demand a great deal of our attention [15].

The key issues in printed character learning are essentially the training set and the adaptation of the classifier to new characters and new fonts. The training set can either be given by the user or extracted directly from document samples. In the first case, the user selects the fonts and the samples to represent each character in each font, and then guides the system in creating models. Here, the user must supply a sufficient number of samples of each font, according to the difficulty of its recognition; however, in an omnifont context it is difficult to collect a training set of characters with the expected distribution of noise and pitch size. In the second case, the idea is to generate the training set directly from document images chosen from a wide variety of fonts and image qualities, reflecting the variability expected by the system. The problem here is that one cannot be sure that all valid characters are present.

Learning comes in several general forms:

2.4.1.1 Supervised Learning

In supervised learning, a teacher provides a category label or cost for each pattern in a training set, and we seek to reduce the sum of the costs for these patterns.


2.4.1.2 Unsupervised Learning

In unsupervised learning, or clustering, there is no explicit teacher, and the system forms clusters or "natural groupings" of the input patterns. "Natural" is always defined explicitly or implicitly in the clustering system itself; given a particular set of patterns or cost function, different clustering algorithms lead to different clusters.

2.4.1.3 Reinforcement Learning

The most typical way to train a classifier is to present an input, compute its tentative category label, and use the known target category label to improve the classifier. For instance, in optical character recognition, the input might be an image of a character, the actual output of the classifier the category label "R", and the desired output a "B". In reinforcement learning, or learning with a critic, no desired category signal is given; instead, the only teaching feedback is that the tentative category is right or wrong. This is analogous to a critic who merely states that something is right or wrong, but does not say specifically how it is wrong. Thus only binary feedback is given to the classifier; reinforcement learning also describes the case where a single scalar signal, say some number between 0 and 1, is given by the teacher. In pattern classification, it is most common that such reinforcement is binary: either the tentative decision is correct or it is not. (Of course, if our problem involves just two categories and equal costs for errors, then learning with a critic is equivalent to standard supervised learning.)

2.4.2 Classification Approaches

Two main types of strategies have been applied to the problem of character classification (recognition) since the beginning of research in this field: the holistic approach and the analytical approach.

In the first case, recognition is performed globally on the whole representation of words, and there is no attempt to identify characters individually. The main advantage of holistic methods is that they avoid word segmentation. Their main drawback is that they are tied to a fixed lexicon of word descriptions: as these


methods do not rely on characters, words are directly described by means of features, and adding new words to the lexicon requires human training or the automatic generation of word descriptions from ASCII words. These methods are generally based on dynamic programming (DP) (edit distance, DP-matching, etc.) or hidden Markov models.

Analytical strategies deal with several levels of representation, corresponding to increasing levels of abstraction (usually the feature level, the character or primitive level, and the word level). Words are not considered as a whole, but as sequences of smaller units which must be easily related to characters, in order to make recognition independent of a specific vocabulary. These methods are themselves sub-classed into two categories: analytical methods with explicit (or external) segmentation, where segmentation to the character or even primitive level takes place before classification, and analytical methods with implicit (or internal) segmentation, which perform segmentation and classification simultaneously. In both cases, lexical knowledge is heavily used to help recognition. This lexical knowledge can either be described by means of a lexicon of ASCII words or by statistical information on letter co-occurrence (n-grams, transitional probabilities, etc.). The advantage of character-based classification methods is that the vocabulary can be dynamically defined and modified without the need for word training. Many techniques initially designed for character classification (such as neural networks) have been incorporated into analytical methods for recognizing tentative characters. The contextual phase is generally based on dynamic programming and/or Markov chains (edit distance, Viterbi algorithm, etc.). Fruitful research has been realized in recent years in the field of analytic recognition with implicit segmentation using various kinds of hidden Markov models [11].
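The edit distance mentioned above as a basis for dynamic-programming word matching can be sketched as follows, together with its use for matching a recognizer output against a small hypothetical lexicon:

```python
def edit_distance(a, b):
    """Classic dynamic-programming edit (Levenshtein) distance between
    two symbol sequences: minimum number of insertions, deletions and
    substitutions turning a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

# Pick the lexicon word closest to a noisy recognizer output.
lexicon = ["cairo", "giza", "aswan"]
word = min(lexicon, key=lambda w: edit_distance("caira", w))
```

In a holistic recognizer the "symbols" compared this way are typically features or tentative letters, and the lexicon word with minimum distance is reported as the recognition result.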


2.4.3 Classification Tools

Recognition involves classifying each unknown object into one of a finite number of categories (or classes). There are different methods of classification for implementing the above recognition approaches:

• Template Matching.
• Statistical methods.
• Stochastic processes (Markov chains).
• Structural matching (trees, chains, etc.).
• Neural networks.
• Rule-based methods.

Many recent methods mix several techniques together in order to obtain improved reliability, despite the great variation in handwriting.

2.4.3.1 Template Matching

Template matching operations determine the degree of similarity between two vectors (groups of pixels, shapes, curvatures, etc.) in the feature space. There are three methods of applying this classification approach: direct template matching, string matching and elastic template matching.

Direct Template Matching: A prototype (template) is created for every type of pattern that can be presented to the recognizer, and that template defines the classification of the unknown pattern. Template matching can be performed using some minimum-distance approach, or a set of likelihood functions can be created for the template. This method is very inflexible, since any form of distortion or noise that may exist in the pattern requires storing a new template.

String Matching: The technique of representing the pattern as a string of symbols and comparing this to some class string description is known as string matching [1]. Classification can be achieved by minimum distance or maximum likelihood.
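A minimal sketch of direct template matching with a minimum-distance rule; the 3x3 binary prototypes below are hypothetical:

```python
def classify(pattern, templates):
    """Direct template matching: assign the unknown pattern to the
    class whose prototype has minimum squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda label: dist(pattern, templates[label]))

# Hypothetical 3x3 binary prototypes, flattened row by row.
templates = {
    "I": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "L": [1, 0, 0,
          1, 0, 0,
          1, 1, 1],
}
# A noisy 'I' with one extra ink pixel in the bottom-right corner.
label = classify([0, 1, 0, 0, 1, 0, 0, 1, 1], templates)
```

The example also shows the weakness noted above: a single distorted pixel changes the distance, and any systematically different writing style would need a whole new template.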


Elastic Template Matching: A method of performing elastic matching on two-dimensional patterns. The idea is that an instance of a class differs from its prototype template by a number of local distortions, so a match can be found between a pattern and its prototype template by applying a number of local transformations to the template [1]. The distance between a template and the pattern being tested can be measured by the amount of distortion between the two.

2.4.3.2 Statistical Methods

Statistical techniques are concerned with statistical decision functions and a set of optimality criteria, which determine the probability of the observed pattern belonging to a certain class. Several popular handwriting recognition approaches belong to this domain, as described in [2]:

• The k-Nearest-Neighbor (k-NN) rule is a popular non-parametric recognition method, where the a posteriori probability is estimated from the frequencies of the nearest neighbors of the unknown pattern. Compelling results have been reported for handwriting recognition using this approach.
• Bayesian classifier (maximum likelihood): Bayes' decision theory classifies an incoming pattern by comparing a posteriori probability density functions. Using past examples of features from each class (category), we can estimate the class-conditional densities; the category with the maximum a posteriori probability then defines the classification result.
• The polynomial discriminant classifier assigns a pattern to the class with the maximum discriminant value, which is computed by a polynomial in the components of the feature vector. The class models are implicitly represented by the coefficients of the polynomial.
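The k-NN rule admits a very short sketch; the toy feature vectors and labels below are hypothetical:

```python
from collections import Counter

def knn_classify(x, training, k=3):
    """k-Nearest-Neighbor rule: vote among the k training samples
    closest to x.  `training` is a list of (feature_vector, label)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = sorted(training, key=lambda s: dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [([0.0, 0.0], "o"), ([0.1, 0.2], "o"),
         ([1.0, 1.0], "x"), ([0.9, 1.1], "x"), ([1.2, 0.9], "x")]
label = knn_classify([1.0, 0.9], train, k=3)
```

The fraction of votes for each class among the k neighbors is exactly the frequency-based estimate of the a posteriori probability mentioned above.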

2.4.3.3 Stochastic Processes

A Hidden Markov Model (HMM) is a doubly stochastic process: an underlying stochastic process that is not observable (hence the word hidden), but which can be observed through another stochastic process that produces the sequence of observations. An HMM is called discrete if the observations are naturally discrete or are vectors quantized from a codebook, and continuous if the observations are continuous.
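For a discrete HMM, the probability of an observation sequence given the model can be computed with the standard forward algorithm; a minimal sketch with a hypothetical two-state, two-symbol model:

```python
def forward(obs, pi, A, B):
    """Forward algorithm for a discrete HMM: P(observations | model).
    pi[i] = initial state probability, A[i][j] = transition probability,
    B[i][o] = probability of emitting symbol o in state i."""
    n = len(pi)
    # alpha[i]: probability of the prefix seen so far, ending in state i.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Toy model: two hidden states, two discrete observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
p = forward([0, 1], pi, A, B)
```

In a word recognizer one such model is trained per word (or per character), and the model giving the highest sequence probability determines the classification.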

HMMs have proven to be one of the most powerful tools for modeling speech and, later on, a wide variety of other real-world signals. These probabilistic models offer many desirable properties for modeling characters or words. One of the most important is the existence of efficient algorithms to train the models automatically, without any need for labeled, pre-segmented data. HMMs have been extensively applied to handwritten word recognition, and their application to handwritten digit recognition has been growing.

2.4.3.4 Structural Matching

In structural techniques, characters are represented as unions of structural primitives. It is assumed that the character primitives extracted from handwriting are quantifiable, and that one can find the relationships among them. Basically, structural methods can be categorized into two classes: grammatical methods and graphical methods.

2.4.3.5 Neural Networks

A neural network (NN) is a computing structure consisting of a massively parallel interconnection of adaptive "neural" processors. The main advantages of neural networks lie in their ability to be trained automatically from examples, good performance with noisy data, possible parallel implementation, and efficient tools for learning from large databases. NNs have been widely used in this field and promising results have been achieved, especially in handwritten digit recognition. The most widely studied and used neural network is the Multi-Layer Perceptron (MLP). Such an architecture, trained with back-propagation, is among the most popular and versatile forms of neural network classifiers, and is also among the most frequently used traditional classifiers for handwriting recognition. Other architectures include the Convolutional Network (CN), Self-Organizing Map (SOM), Radial Basis Function (RBF) network, Space Displacement Neural Network (SDNN), Time Delay Neural Network (TDNN), Quantum Neural Network (QNN), and Hopfield Neural Network (HNN).


2.4.3.6 Rule-based methods Rule-based methods use abstract descriptions of handwriting to recognize what was written. For example, a rule for recognizing the letter ‘x’ might be ‘two lines that cross over and lie at approximately 45 and 135 degrees’. The problem with these methods is that it is not possible to design an exhaustive set of rules that models all possible ways of forming a letter. Variants of rule-based methods that include fuzzy search seem to provide superior recognition performance, but are still not as good as statistical methods. They are, however, very useful in disambiguating between certain class pairs using very few parameters. One example is the disambiguation between a ‘v’ and a ‘u’ based on the presence or absence of a hook at the end [12]. Systems using rule-based methods rely on supervised or unsupervised rules generated by training modules to encompass the variability of the dataset and to reflect this variability in the design of heuristic classification frameworks. These systems work well in situations where there are no errors in the measurements and no errors in knowledge elicitation. Unfortunately, such rule-based systems are often too simplistic to be of real use in solving problems involving “live” data. Nevertheless, they have been very successful when the data variability is small and the task domain remains comparatively constant [16]. The above review indicates that many classification techniques are available for handwriting recognition systems. All of them have their own advantages and drawbacks. In recent years, many researchers have combined such techniques in order to improve recognition results, so that the final decision does not rely on a single decision-making scheme. Various classifier combination schemes have been devised, and it has been experimentally demonstrated that some of them consistently outperform a single best classifier.
The complementariness (also called independence or diversity) of the classifiers is important for high combination performance. For character recognition, combining classifiers based on different techniques of preprocessing, feature extraction, and classifier modeling is effective. This will be discussed in detail in the next section [17].

Another strategy that can increase the recognition rate relatively easily and at small additional cost is the use of verification. Such a scheme refines the top few candidates in order to enhance the recognition rate economically. Cheng-Lin Liu and Hiromichi Fujisawa [17] made a very comprehensive comparison between some of the classification techniques discussed above, together with an evaluation of performance. They focused on the classification of isolated (segmented) characters and mainly discussed feature-based classification methods, which have prevailed over structural methods, especially in on-line character recognition. These methods include statistical methods, artificial NNs, SVMs, and multiple classifier combination. To sum up, the current stage in the evolution of handwriting processing results from a combination of several elements, such as: the use of different strategies for feature selection, the evaluation of different classifiers and their combination, the use of complex systems based on contextual information, verifiers and post-processing, the use of synthetic data, the automatic optimization of entire systems, and so forth.

2.4.4 Multiple classifier decision combination strategies It has generally been found that multiple expert (classifier) decision combination strategies can produce more robust, reliable, and efficient recognition performance than the application of single expert classifiers. A single classifier with a single feature set and a single generalized classification strategy often does not comprehensively capture the large degree of variability and complexity encountered in many practical task domains. Multiple expert decision combination can help alleviate many of these problems by acquiring multiple-source information through multiple features extracted by multiple processes, and by introducing different classification criteria and a sense of modularity into system design, which leads to more flexible recognition systems. Although some of these decision combination approaches are task-specific, most are generic, and it is usually easy to apply the same technique to a variety of tasks [16]. The problem of combining multiple experts should be expressed formally before a detailed analysis of the various solutions is considered. If n classifiers (experts) working on the same problem deliver a set of classification responses (outputs), then the decision combination process has to combine the decisions of all these different classifiers in such a way that the final decision improves upon the decisions taken by any of the individual experts. Hence, the decision fusion process has to take into account the individual strengths and weaknesses of the different cooperating classifiers and must build on these to deliver a more robust final decision. There are three distinctly different types of problem for multiple expert classifier combination, based on the type of individual classification response delivered. These can be summarized as:

• Abstract Output Level: The cooperating classifiers' decisions are in the form of absolute output labels. Each classifier identifies the character in question definitely as belonging to a particular class, and no information other than this assigned label is available. The combination method must make its final decision based solely on this information.
• Ranked Output Level: The cooperating classifiers' decisions are in the form of a sorted ranking list. Each classifier gives a preference list based on the likelihood of a particular character belonging to a particular class. The previous category is a special case of this one, the output label being the top choice of the ranking list. Here, however, much more information is available to determine the final response of the combined classifier.
• Measurement Output Level: The cooperating classifiers' decisions are in the form of confidence values. Each classifier gives a preference list based on the likelihood of a particular character belonging to a particular class, together with a set of confidence measurement values generated in the original decision-making process. Both the ranking list and the top choice response are special cases of these measurement outputs, which are the most generalized form. However, these responses are difficult to utilize, as the measurement values need to be converted to a normalized scale before any incorporation of information involving a comparison of the individual cooperating classifiers can take place.
The different decision combination topologies to consider are as follows:
2.4.4.1 Cascading combination scheme:

(Diagram: Expert 1, Expert 2, …, Expert N applied in a chain, each expert's output feeding the next, producing the final output.)
Figure 2.2: The cascading combination scheme for a multiple classifier system.

2.4.4.2 Parallel combination scheme:

(Diagram: the input is fed to Expert 1 through Expert N in parallel; a fusion module combines their outputs into the final output.)
Figure 2.3: The parallel combination scheme for a multiple classifier system.

2.4.4.3 Hybrid combination scheme:

(Diagram: a decision module examines the input and selects which expert, Expert 1 through Expert n, to deploy; the chosen expert produces the decision.)
Figure 2.4: The hybrid combination scheme for a multiple classifier system.

2.4.4.4 Classifier ensembles: Ensemble learning refers to a collection of methods that learn a target function by training a number of individual learners and combining their outputs. Ensemble methods combine a set of redundant classifiers [18].
(Diagram: the input is fed to Classifier 1, Classifier 2, and Classifier 3 in parallel; a voting module combines their outputs.)
Figure 2.5: Classifier ensembles.

Multiple classifier system topologies can be categorized by their functionality and the way the classifiers interact. Thus we have:
1. Conditional topology: once a classifier is unable to classify the input, the following classifier is deployed.
2. Hierarchical topology: classifiers are applied in succession according to their levels of generalization.
3. Hybrid topology: the choice of the classifier to use is based on the input pattern.
4. Multiple (parallel) topology.
The design of classifier ensembles depends on two issues:
1. How do we create the individual classifiers?
2. How do we perform the combination of these classifiers?
Individual classifiers can be obtained by several methods:
1. Varying the set of initializations: a number of distinct classifiers can be built with different learning parameters, such as the initial weights in an MLP.
2. Varying the topology: using different topologies, or architectures, for classification can lead to different generalizations.


3. Varying the algorithm employed: applying different classification algorithms to the same topology may produce diverse classifiers.
4. Varying the data: the most widely used approach to produce classifiers with different generalizations.
5. Sampling data: a common approach is to use some sort of sampling technique, such that different classifiers are trained on different subsets of the data.
6. Disjoint training sets: similar to sampling, but using mutually exclusive, or disjoint, training sets; that is, sampling without replacement to avoid overlap between the training sets.
7. Boosting and adaptive re-sampling: a series of weak learners can be converted into a strong learner using boosting.
8. Different data sources: applicable when data from different input sources are available; this is especially useful when these sources provide different kinds of information.
9. Preprocessing: data may be varied by applying different preprocessing methods to each set; alternatively, datasets may be distorted differently.
The combination between individual classifiers can be obtained by several methods:
1. Average vote:

Q(x) = \arg\max_{j=1,\dots,N} \frac{1}{K} \sum_{i=1}^{K} y_{ij}(x)        (2.3)

where N is the number of classes, x is the input pattern, K is the number of classifiers, and y_{ij}(x) is the output of the i-th classifier for the j-th class given the input x. The summed votes are compared, and the highest wins.
2. Weighted averaging:

Q(x) = \arg\max_{j=1,\dots,N} \frac{1}{K} \sum_{i=1}^{K} w_{ij}\, y_{ij}(x)        (2.4)

with the same notation as in (2.3).
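As a concrete illustration of (2.3) and (2.4), the sketch below (a minimal example with invented scores, not from this thesis) combines the per-class output scores of K classifiers by plain and weighted averaging:

```python
import numpy as np

# y[i, j] = output of classifier i for class j (K = 3 classifiers, N = 4 classes).
# These scores are invented for illustration.
y = np.array([[0.1, 0.6, 0.2, 0.1],
              [0.2, 0.5, 0.2, 0.1],
              [0.3, 0.2, 0.4, 0.1]])

def average_vote(y):
    """Eq. (2.3): class with the highest mean score across classifiers."""
    return int(np.argmax(y.mean(axis=0)))

def weighted_average_vote(y, w):
    """Eq. (2.4): per-classifier weights, e.g. derived from training error."""
    return int(np.argmax((w[:, None] * y).sum(axis=0) / len(w)))

print(average_vote(y))                                    # class 1
print(weighted_average_vote(y, np.array([0.1, 0.1, 0.8])))  # class 2
```

With uniform treatment class 1 wins, but weighting classifier 3 heavily shifts the decision to class 2, showing how the weights encode trust in individual experts.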

The weights w_i, i = 1, 2, …, K, can be derived by minimizing the error of the different classifiers on the training set.
3. Non-linear combining methods:
• Voting methods
⇒ Majority vote: performs badly if some experts are very good or very bad.
⇒ Maximum vote: trust the most confident expert. Bad if some experts are badly trained; sensitive to over-confident base classifiers.
⇒ Product rule:

Q(x) = \arg\max_{j=1,\dots,N} \prod_{i=1}^{K} y_{ij}(x)        (2.5)

In practice, base classifiers are never really independent.
• Rank-based methods
⇒ Borda count:

Q(x) = \arg\max_{j=1,\dots,N} B(j), \quad B(j) = \sum_{i=1}^{K} B_i(j)        (2.6)

where B_i(j) is the rank assigned by classifier i to class j given the input x.
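The voting and rank-based rules above can be sketched as follows (a toy illustration with invented labels and scores, not from this thesis):

```python
import numpy as np
from collections import Counter

# Invented outputs of K = 3 classifiers over N = 3 classes.
labels = [0, 2, 2]                      # abstract-level outputs (one label each)
scores = np.array([[0.5, 0.3, 0.2],    # measurement-level outputs y_ij(x)
                   [0.2, 0.3, 0.5],
                   [0.1, 0.3, 0.6]])

def majority_vote(labels):
    """Most frequent label among the classifiers."""
    return Counter(labels).most_common(1)[0][0]

def product_rule(scores):
    """Eq. (2.5): class maximizing the product of scores."""
    return int(np.argmax(scores.prod(axis=0)))

def borda_count(scores):
    """Eq. (2.6): each classifier ranks the classes and the ranks are summed.
    The worst class gets rank 0, the best gets N-1."""
    ranks = scores.argsort(axis=1).argsort(axis=1)  # per-classifier ranks
    return int(np.argmax(ranks.sum(axis=0)))

print(majority_vote(labels))   # 2
print(product_rule(scores))    # 2
print(borda_count(scores))     # 2
```

Here all three rules agree, but they need not: the product rule is sensitive to a single near-zero score, while the Borda count only uses ordering information.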

• Probabilistic methods
⇒ Bayesian combination: let c^{(i)} be the confusion matrix estimated on a training set for the i-th classifier; element c^{(i)}_{jk} denotes the number of data points that are classified to class k while actually belonging to class j. The conditional probability that a sample x actually belongs to class j, given that classifier i assigns it to class k, can be estimated as

P(x \in q_j \mid \lambda_i(x) = k) = c^{(i)}_{jk} \Big/ \sum_{j=1}^{N} c^{(i)}_{jk}        (2.7)

Assuming that the different classifiers are independent, a belief value that the input x belongs to class j can be approximated by

Bel(j) = \prod_{i=1}^{K} P(x \in q_j \mid \lambda_i(x) = k_i) \Bigg/ \sum_{j=1}^{N} \prod_{i=1}^{K} P(x \in q_j \mid \lambda_i(x) = k_i)        (2.8)

where k_i is the class label assigned to x by classifier i.
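Equations (2.7) and (2.8) can be sketched as follows (a toy example; the confusion matrices are invented for illustration):

```python
import numpy as np

# Invented confusion matrices for K = 2 classifiers over N = 2 classes.
# conf[i][j, k] = number of training samples of true class j
#                 that classifier i assigned to class k.
conf = [np.array([[80, 20],
                  [10, 90]]),
        np.array([[70, 30],
                  [25, 75]])]

def class_posteriors(conf_i, k):
    """Eq. (2.7): P(true class = j | classifier assigned class k)."""
    col = conf_i[:, k].astype(float)
    return col / col.sum()

def belief(conf, assigned):
    """Eq. (2.8): normalized product of per-classifier posteriors."""
    prod = np.ones(conf[0].shape[0])
    for conf_i, k_i in zip(conf, assigned):
        prod *= class_posteriors(conf_i, k_i)
    return prod / prod.sum()

# Both classifiers assign class 0: the belief strongly favors class 0.
print(belief(conf, assigned=[0, 0]))
```

Because each classifier's confusion matrix encodes how often it is wrong, a classifier that frequently mislabels a class contributes a correspondingly weak posterior to the product.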

4. Fuzzy integral methods: the output of each classifier is assigned a fuzzy density based on its own performance.
Combining strategy architectures can be:
1. Boosting: for example, boosting by filtering, in which the first expert is used to classify all types of input patterns. If the output is produced without error and with a confidence level above a certain threshold, the pattern is considered well classified; otherwise the input pattern is passed to the next classifier, and so on, until it is well classified and the final decision is taken.

(Diagram: the input goes to Expert 1; well-classified patterns yield a decision, while misclassified ones are passed to Expert 2, and so on.)
Figure 2.6: Boosting by filtering.
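This filtering cascade can be sketched as follows (a schematic illustration with hypothetical expert functions, not the thesis's implementation): each expert returns a label and a confidence, and the pattern falls through to the next expert when the confidence is below a threshold.

```python
def cascade_classify(x, experts, threshold=0.8):
    """Try each expert in turn; accept the first confident answer.

    `experts` is a list of callables returning (label, confidence).
    Falls back to the last expert's answer if none is confident.
    """
    label = None
    for expert in experts:
        label, confidence = expert(x)
        if confidence >= threshold:
            return label
    return label  # no expert was confident enough

# Hypothetical toy experts: the first is unsure about negative inputs.
expert1 = lambda x: ("positive", 0.9) if x > 0 else ("negative", 0.5)
expert2 = lambda x: ("negative", 0.95) if x <= 0 else ("positive", 0.95)

print(cascade_classify(5, [expert1, expert2]))   # expert1 decides
print(cascade_classify(-3, [expert1, expert2]))  # falls through to expert2
```

The design keeps the cheap, general expert in front and only invokes later (possibly more expensive or specialized) experts on the hard cases.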

2. Stacked generalization: a layered architecture. The classifiers at level 0 receive the original data as input, and each outputs a prediction for its own sub-problem. Each successive layer receives as input the predictions of the layer immediately preceding it, and a single classifier at the top level outputs the final prediction.


(Diagram: level-0 classifiers 1 through N each process the input; a level-1 classifier combines their predictions into the final output.)
Figure 2.7: Stacked generalization.
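A minimal sketch of two-level stacking follows (invented toy data; the least-squares combiner merely stands in for any trainable top-level classifier and is not the thesis's system): the level-1 model is trained on the level-0 predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary problem: the label is 1 when the sum of the two features is positive.
X = rng.normal(size=(200, 2))
t = (X.sum(axis=1) > 0).astype(float)

# Level 0: two weak classifiers, each looking at a single feature.
def level0_outputs(X):
    return np.column_stack([(X[:, 0] > 0).astype(float),
                            (X[:, 1] > 0).astype(float)])

# Level 1: a linear combiner fitted on the level-0 predictions.
Z = np.column_stack([level0_outputs(X), np.ones(len(X))])
w, *_ = np.linalg.lstsq(Z, t, rcond=None)

def stacked_predict(X):
    Z = np.column_stack([level0_outputs(X), np.ones(len(X))])
    return (Z @ w > 0.5).astype(float)

accuracy = (stacked_predict(X) == t).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The stacked model beats either single-feature classifier because the level-1 combiner learns how much to trust each level-0 prediction.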

3. Hierarchical mixture of experts: In a mixture of experts, a gating network is used to generate a partition of feature space into different regions with one expert in the ensemble being responsible for generating the correct output within that region.

(Diagram: classifiers 1 through N produce outputs µ1 … µN; a gating network produces weights g1 … gN, and the combined output is the gate-weighted sum of the expert outputs.)
Figure 2.8: Mixture of experts.
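The gating mechanism of Figure 2.8 can be sketched as follows (a minimal forward-pass illustration with invented parameters; training of the gate and experts is omitted):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Invented parameters: 2 experts, 2-dimensional input, scalar output.
W_experts = np.array([[1.0, 0.0],   # expert 1: responds to feature 0
                      [0.0, 1.0]])  # expert 2: responds to feature 1
W_gate = np.array([[4.0, -4.0],     # gate favors expert 1 when x0 > x1
                   [-4.0, 4.0]])

def mixture_output(x):
    """Gate-weighted combination: sum_i g_i(x) * mu_i(x)."""
    mu = W_experts @ x        # expert outputs mu_i(x)
    g = softmax(W_gate @ x)   # gating weights g_i(x), summing to 1
    return float(g @ mu)

print(mixture_output(np.array([1.0, 0.0])))  # gate selects expert 1
```

The gate softly partitions the feature space: each region of the input activates the expert responsible for it, which is exactly the behavior the hierarchical extension below nests recursively.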

The mixture of experts can be extended to a multi-level hierarchical structure, where each component is itself a mixture of experts. In this case a linear network can be used for the terminal classifiers.


(Diagram: a top-level gating network weights the outputs µ1 and µ2 of two sub-mixtures; each sub-mixture has its own gating network weighting local experts µ11, µ12, µ21, µ22, all of which receive the input x.)
Figure 2.9: Hierarchical mixture of experts.

2.5 Summary In this chapter we introduced the Document Analysis and Recognition field as well as the character recognition problem and its types according to the nature of the input data. Off-line and on-line CR systems were introduced, and each type of system was discussed in detail at every stage, pointing out the most representative work done in this area. Preprocessing operations, feature extraction, and segmentation techniques were fully described. Classification approaches were listed and illustrated, together with the types of classifiers. Multiple classifier systems were also introduced, with the strategies and structures of different classifier combinations.


Chapter 3: Handwriting Recognition Systems In this chapter we present the importance of handwriting and its survival as a means of communication. We explain how its convenience, together with the pen-and-paper interface, has motivated an entire research branch and the development of a new technology. We also discuss the characteristics of Arabic handwriting and explain the difficulties facing researchers in developing a successful recognition system, presenting samples of the work done in this area. We also present work done on recognizing other languages such as English, Chinese, and Thai.

3.1 Types of Handwriting Recognition Systems Writing is an acquired skill and a complex perceptual-motor task. The execution of writing is a voluntary act that follows behavior patterns learned as habits. Owing to natural variation, no two writings of the same material by the same person are identical. Moreover, natural variations in writing diverge with the writer's condition and the writing conditions, and may diverge with the nature of the document. When conditions are controlled, there is less variation between executions. In addition, handwriting changes progressively over the lifetime of the writer. The change is greater during the earlier and later stages of life, but the nature and extent of the change is peculiar to the individual. Thus, the quality of individuality in any human endeavor, and particularly in writing, is its own best defense against simulation, forgery, or counterfeiting [19]. From the above we can conclude that handwriting recognition can be performed based on the features (or habits) of the written patterns and the rules (or principles) derived from the language after analysis and comparison. The degree of difficulty of the recognition task depends greatly upon: (1) the nature of the writing itself, (2) the writer, and (3) the system design. More specifically, handwriting recognition systems can be categorized as addressing one of the following issues:


3.1.1 Styles of Handwriting: Printed vs. Cursive The figure below illustrates different writing styles in English. The writing style of the first three lines is commonly referred to as printed or discrete handwriting, in which the writer is told to write each character within a bounding box or to separate each character. The writing style of the fourth line is commonly referred to as pure cursive or connected handwriting, in which the writers are told to connect all of the lower case characters within a word. Most people write in a mixed style, a combination of printed and cursive styles, similar to the writing on the fifth line.

Figure 3.1: Types of English writing styles.

Both printed and cursive handwriting recognition are difficult tasks because of the great amount of variability present in the on-line handwriting signal. The variability is present both in time and signal space. Variability in time refers to variation in writing speed, while variability in signal space refers to the shape changes of the individual characters. It is rare to find two identically written characters. The difficulty of recognizing handwriting lies in constructing accurate and robust models to accommodate the variability in time and feature space. In addition to these two types of variability in time and signal space, cursive handwriting has another type of variability in time which makes this task even more difficult. This additional type of variability is due to the fact that no clear intercharacter boundaries (where one character starts or ends) exist. In printed handwriting, a pen-lift defines these boundaries between characters. However, in cursive handwriting the pen lift cues simply do not exist. Cursive-style handwriting recognition is more difficult because the recognizer has to perform the error-prone step of character segmentation, either explicitly or implicitly [20].


3.1.2 Writer-Dependent vs. Writer-Independent A writer-independent (WI) system is capable of recognizing handwriting from users whose writing the system has not seen during training. In general, WI systems are much more difficult to construct than writer-dependent (WD) ones. Humans are capable of WI recognition; however, we are better at WD than WI recognition tasks, i.e., generally we can recognize our own handwriting better than a stranger's. WI systems are more difficult to construct because the variability of handwriting across writers is much greater than the variability within a single writer's handwriting. For WD tasks, the system is only required to learn a few handwriting styles; for WI tasks, the system must learn invariant and generalized characteristics of handwriting [20].

3.1.3 Closed-Vocabulary vs. Open-Vocabulary Vocabulary is also a major factor in determining how difficult a handwriting recognition task is. Closed-vocabulary tasks refer to the recognition of words from a predetermined dictionary, whose size may be arbitrary. Open-vocabulary tasks refer to the recognition of any word, without the constraint of being in a dictionary. Closed-vocabulary tasks are easier than open-vocabulary ones because only certain sequences of letters are possible when limited by a dictionary. Closed-vocabulary tasks using a small dictionary are especially easy because: 1) a small vocabulary can mean a smaller number of confusable word pairs; 2) a small vocabulary enables the direct modeling of individual words, whereas a large vocabulary necessitates the modeling of letters, owing to the computational complexity of modeling words directly; 3) when letters are used for large vocabulary tasks, the search space of all possible sentences is usually much larger, due to an increase in the number of nodes in the search graph. When letters are used for modeling instead of words, the number of nodes is m×n instead of n, where n is the number of words and m is the average number of letters per word (generally between three and ten). As the vocabulary size increases, out-of-vocabulary words become less frequent, so the performance on large-vocabulary tasks approaches the performance on open-vocabulary tasks [20].
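The m×n node count above can be made concrete with a small calculation (the vocabulary size and average word length below are illustrative values, not figures from the thesis):

```python
# Size of the search graph for word-level vs. letter-level modeling.
# Illustrative values: a 20,000-word vocabulary with 7 letters per word
# on average (within the 3-10 range mentioned above).
n = 20_000   # number of words
m = 7        # average letters per word

word_level_nodes = n        # one node per word model
letter_level_nodes = m * n  # one node per letter of every word

print(word_level_nodes)    # 20000
print(letter_level_nodes)  # 140000
```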

3.2 Arabic Character Recognition Systems Survey 3.2.1 Characteristics and problems of Arabic script The last two decades have witnessed some advances in the development of Arabic character recognition systems. Arabic CR faces technical problems not encountered in any other language. Arabic script has very special characteristics, which can be summarized in the following points: 1. Arabic script is always written from right to left, and no upper or lower case exists. 2. An Arabic word consists of one or more portions, and every portion has one or more characters. 3. Some characters are not connectable from the left side to the succeeding character. 4. Every character has more than one shape, depending on its position within a connected portion of the word.

Figure 3.2: Examples for character positions in Arabic text.

5. The crossing points, branch points inside characters, and connection points always fall near the writing baseline. This line provides useful context information.

Figure 3.3: All critical points of Arabic characters fall near the writing base line.


6. The horizontal domains covered by characters overlap in handwritten words.

Figure 3.4: Different cases of character overlapping for the same word.

7. Many characters differ only by the presence and the number of dots above or below the main part of the character shape. Sometimes, the ambiguity of the positions of these dots in handwritten text gives rise to several possible readings of one word.

Figure 3.5: Different characters of the same Main-Stroke.

8. Curves and loops form most of the characters; loops are usually written in the clockwise direction.

Figure 3.6: Clockwise loops form most of the Arabic characters.

9. Diacritical signs are used as associated vowels that define the pronunciation and discriminate different words that have the same spelling.

Figure 3.7: Effect of Diacritics on Arabic word meaning.


10. Another characteristic of Arabic text is the existence of ligatures: for example, Lam-Alef, which is a ligature in most Arabic fonts, and Baa-Meem, which is a ligature in some fonts. Some ligatures contain up to three characters, like Lam-Meem-Haah, which is a ligature in some fonts [21].
Table 3.1: The most common ligatures in Arabic words.
‫اﻟﻤﺤـﻤﻮد‬   ‫اﻟﻤﺠـﻤﻮع‬   ‫اﻟﻤـﺨﺼﺺ‬
‫ﻧﺤـﺘﺎج‬   ‫ﻧﺠـﻤﻊ‬   ‫ﻧﺨـﺘﺒﺮ‬
‫ﺑﺤـﺎر‬   ‫ﺑﺠـﻮار‬   ‫ﺑﺨـﻮر‬
‫ﻳﺤـﺐ‬   ‫ﻳﺠـﺐ‬   ‫ﻳﺨـﺘﺎر‬
‫ﺗﺤـﺖ‬   ‫ﺗﺠـﺮﺑﺔ‬   ‫ﺗﺨـﻴﻞ‬
‫اﻟﺤـﻖ‬   ‫اﻟﺠـﺒﺎل‬   ‫اﻟﺨـﻮف‬
‫ﻣﺤـﻤﺪ‬   ‫ﻣﺠـﻤﻮع‬   ‫ﻣﺨـﺮج‬
‫اﻷﺣﻼم‬   ‫اﻵن‬   ‫اﻹﺳﻼم‬

11. To justify an English paragraph, additional spaces are inserted between the words. To justify an Arabic paragraph, however, the common practice in many systems is to elongate the baselines of the words of a line rather than inserting inter-word spaces. This elongation changes the shape of some characters.
For the previously mentioned reasons, Latin CR systems have achieved very high accuracy and are well established as market products, while Arabic CR systems still need more research before becoming commercially established [21]. A short review of the commercial products developed for handwritten text recognition is presented in Appendix B.
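Point 4 above (position-dependent character shapes) can be illustrated with Unicode Arabic presentation forms. The sketch below (an illustration, not part of the thesis) maps the letter Beh to its four contextual glyphs, using code points from the Unicode Arabic Presentation Forms-B block:

```python
# Contextual forms of the Arabic letter Beh (U+0628),
# taken from the Unicode Arabic Presentation Forms-B block.
beh_forms = {
    "isolated": "\uFE8F",  # ﺏ
    "final":    "\uFE90",  # ﺐ
    "initial":  "\uFE91",  # ﺑ
    "medial":   "\uFE92",  # ﺒ
}

for position, glyph in beh_forms.items():
    print(f"{position:>8}: {glyph} (U+{ord(glyph):04X})")
```

A recognizer must treat all four glyphs as the same underlying letter, which is part of what makes Arabic CR harder than Latin CR.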


3.2.2 On-line and Off-line systems for recognizing Arabic script As explained before, in order to build an ACR system, the following phases have to be gone through, dependently or independently:
a. Preprocessing.
b. Line and word separation.
c. Word segmentation.
d. Feature extraction.
e. Learning.
f. Recognition.


CR systems differ from each other in two ways: (1) whether the system is segmentation-free or applies segmentation, and if so which segmentation technique is used, and (2) the recognition technique. The following figure shows the hierarchy of Arabic CR research:

(Diagram: Arabic CR research splits into Machine Printed, which is Class A, and Handwritten, which splits into On-line, Class B, and Off-line, Class C.)
Figure 3.8: The hierarchy of Arabic CR research.

Class A is by nature off-line, as the documents are scanned. Handwritten recognition is classified as either Class B (on-line, using tablets or PDAs) or Class C, which represents the ultimate complexity. It is known among researchers in the field that Class C is the most sophisticated, as it embodies two levels of complexity: the variability of Arabic handwriting, and the difficulty of dealing with noisy scanned images in general. On the other hand, although there is large variability in the handwriting for Class B, the reasonably contained nature of the x-y data generated from the stylus, together with the time sequence of movement, gives more hope of producing practical on-line handwriting recognition systems. After reviewing different research techniques for the three classes, it became evident that there are four major research directions, whether the Arabic text is typewritten, or handwritten on-line or off-line. These major research categories are:
1. Recognition of Isolated Characters (ISR).
2. Explicit Segmentation into characters/primitives Before Recognition (SBR).
3. Simultaneous/Sequential recognition and segmentation (SSR).
4. Global Whole Word recognition (GWR).
This means we have 12 branches of research, shown in the following table.


Table 3.2: ACR research branches.
                                             Class A    Class B    Class C
Isolated Characters (ISR)                    A-ISR      B-ISR      C-ISR
Segmentation Before Recognition (SBR)        A-SBR      B-SBR      C-SBR
Simultaneous Segmentation and
Recognition (SSR)                            A-SSR      B-SSR      C-SSR
Global Whole Word Recognition (GWR)          A-GWR      B-GWR      C-GWR

Since branches 1-4 (the Class A recognition approaches) are outside the objective of this thesis, we will focus on the research done on handwriting recognition for on-line and off-line systems (Classes B and C).

Branch 5: On-line Handwriting Isolated Character Recognition (B-ISR)
It is observed that the majority of researchers in Arabic on-line handwriting recognition, even in recent work, focus on isolated characters rather than tackling the full script recognition problem. In [22], Mezghani et al. presented a study of an on-line system for the recognition of handwritten Arabic characters using a Kohonen network. The input of the neural network is a feature vector of Fourier coefficients for the x and y components of the pen positions. These descriptors can be transformed to provide invariance to rotation, dilation, and translation, and also to the starting point of the character. Experimental results show that the network successfully recognizes both clearly and roughly written characters with good performance. The Kohonen memory runs an unsupervised clustering algorithm; it is easily trained and has attractive properties such as topological ordering and good generalization. Characters were written 24 times by 17 writers to obtain a database of about 7400 samples, divided into a training set of 5000 samples and a testing set of about 2400 samples. After running several experiments, a recognition rate of 83.43% was obtained on characters written without constraints on the writer. The ultimate NN accuracy was 86.5%, but at an impractical speed.
Al-Sheik et al. assumed a reliable segmentation stage, which divided letters into the four positional groups (initial, medial, final, and isolated). The recognition system depended on a hierarchical division by the number of strokes: one-stroke letters were classified separately from two-stroke letters, and so on. Ratios between extremes and the position of dots relative to the primary stroke were defined heuristically on the data set to produce a rule-based classification. Recognition rates for isolated letters were reported at 100%, though it was unclear from the paper whether these results were on the training or the test set. This approach had an excellent recognition rate and a good divide-and-conquer strategy, reducing the classes through hierarchical rules. It also attempted to classify all of the forms of Arabic letters and used a large data set. However, it would be extremely sensitive to noisy data in terms of the number of strokes, since the hierarchy was built on counting the exact number of strokes.

Branch 6: On-line Handwriting Segmentation Before Recognition (B-SBR)
In [22], El-Emami and Usher attempted to recognize postal address words after segmenting them into letters. They used a structural analysis method for selecting features of Arabic characters, and the classification used a decision tree. In preprocessing, they used a segmentation method based on finding points of extreme curvature. Some of the features extracted during this segmentation process were direction codes, slope, and the presence of dot flags. A new input needed to search three decision trees: for the primary stroke, and also for the upper and lower dots. The decision tree was hand-tweaked to find the best parameters to fit the data set, which could possibly have led to overfitting. The system was trained on 10 writers with a set of 120 postal code words with a total of 13 characters. They used one tester, who obtained a recognition rate of 86%; after he was instructed to change his writing style to account for a weakness in the system, 100% accuracy was obtained.

Branch 7: On-line Handwriting Simultaneous Segmentation and Recognition (B-SSR)
In [22], Abdelazim investigated a Digital Curve Partitioning (DCP) algorithm, based on a vertex finder, in two papers.
In this algorithm, significant control points with the highest curvatures are extracted. In the first paper, straight-line segments are fuzzified based on their angles, and a Fuzzy Logic Comparator (FLC) is used to match them against an existing database of prototypes; the extent of the match forms the basis of the segmentation decision. Diacritics and dots are recognized using a Learning Vector Quantization (LVQ) neural network. In the second paper,

a sequential recognition/segmentation technique is adopted. After straight-line segments are generated from the same DCP (common to both techniques), they are accumulated sequentially and fed into an RBF neural network, which is known for its powerful classification and, more importantly, its rejection capabilities. The output of the network is used as a measure for deciding on the character segments. The reported results were 95% in the first paper, for a single writer writing clear Naskh script, and 96% in the second paper, for 10 different writers, also with constrained Naskh writing.

Branch 8: On-line Handwriting Global Whole Word Recognition (B-GWR)
No research is known to be directed at this branch, because the global word recognition task generally depends on the existence of a lexicon and precomputed feature templates for comparison. Since it is very difficult to construct a complete Arabic dictionary of all Arabic words (several million words), and no on-line Arabic handwritten database is known to be available on the internet for researchers, no efforts have been made to address this problem. Most researchers working on the on-line Arabic handwriting recognition problem prefer to build a very small and limited database of their own, and due to the complexity of Arabic script they prefer to segment words and recognize characters, or even primitives, rather than whole words. The only application that may tend to recognize words as a whole is on-line Arabic signature verification, which is also a research point that has not been addressed by researchers so far.

Branch 9: Off-line Handwriting Isolated Character Recognition (C-ISR)
In [22], Boashash et al. examined NNs as classifiers. After scanning and digitization, each input character is normalized and registered inside a 32x32 window.
Five neural network architectures were tested: Multilayer Perceptron (MLP), ART1, ARTMAP, Learning Vector Quantization (LVQ), and Kohonen maps. Of the networks implemented, the best classification performance was provided by a two-layer (one hidden, one output layer) network using back-propagation with momentum. Results showed that ART and ARTMAP are far from effective or usable, in terms of either accuracy or performance. MLP achieved the highest accuracy, as in the table below. One possible reason for the generally low accuracy is the use of raw binary data instead of extracting significant features; the training data was also limited to a single writer.

Network      MLP    LVQ    F-Map
28 classes   73%    67%    50%
17 classes   81%    74%    54%
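A minimal sketch (in Python/NumPy) of the 32x32 registration step described above, assuming nearest-neighbor resampling of the character's bounding box; the paper does not specify the interpolation used, so this is an illustrative choice:

```python
import numpy as np

def register_32x32(char_img):
    """Crop a binary character image to its bounding box, then rescale it
    into a 32x32 window by nearest-neighbor sampling (interpolation method
    is an assumption; the paper does not specify one)."""
    ys, xs = np.nonzero(char_img)
    crop = char_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    rows = np.arange(32) * h // 32   # source row index for each output row
    cols = np.arange(32) * w // 32   # source column index for each output column
    return crop[np.ix_(rows, cols)]
```

The resulting 32x32 binary matrix is what the tested networks received as raw input, which the authors identify as one reason for the low accuracy.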

Branch 10: Off-line Handwriting Segmentation Before Recognition (C-SBR)
In [22], El Sellami et al. attempted to solve the segmentation problem directly. In their paper, a new Arabic character segmentation algorithm (ACSA) is presented. It is based on morphological rules constructed at the feature extraction phase. Finally, ACSA is combined with an existing handwritten Arabic character recognition system (RECAM). The features include: AS (ascenders), DS (descenders), DLM (double local minima), HZ (Hamza), 1D (single dot), 2D (double dot), 3D (triple dot), H (hole), and TP (turning points). Examples:
[3CC: (H-DS), (AS), (TP-TP)]: واحد
[2CC: (DLM-1D-H-H-DS), (1D-DS)]: سبعون
[1CC: (TP-1D-H-DLM-H-2D)]: خمسة
Local minima (LM) in the lower outer contour of the sub-word are identified, and morphological rules are applied to the local minima in order to accept or reject them as valid segmentation points (VSP). These rules are manually generated and may not cover many practical cases of free handwriting recognition. ACSA was tested on an omni-scriptor database of 100 handwritten Arabic words; the rate of good segmentation was 85%.

Branch 11: Off-line Handwriting Simultaneous Segmentation and Recognition (C-SSR)
In [22], Le Courtier et al. proposed an analytical approach based on Hidden Markov Models (HMMs) to manage the defects of the segmentation module. Their approach segments the word into basic graphemes; they also selected an optimal alphabet of graphemes in order to increase the performance of the recognition system. After scanning and digitization, each word is segmented into its basic sub-words, which are then segmented roughly into graphemes (basic character shapes).

Each grapheme is represented by a feature vector of 19 elements. The features include topological features that correspond to human perception: loops, openings, the relative size (width, height) of the grapheme, the relative position with respect to the baseline, etc.; the second sub-vector could contain invariant moments, Fourier descriptors, or wavelet descriptors (or another type of transform). The authors selected the first 10 Fourier descriptors. Conventional HMM modeling was used to model each class name. For each input word, an observation sequence is extracted by matching each segmented grapheme to one of 34 codebook entries using k-NN, and the observation sequence is recognized using a maximum-likelihood classifier. Their database contained 232 different word classes, each written by forty writers. They trained their models on 4720 labeled words and tested on 4560 words. Results are shown in the table below:

Size of the alphabet   Top 1     Top 2     Top 3     Top 4     Top 5
20                     79.19%    86.91%    89.67%    90.92%    92.24%
27                     81.94%    88.43%    90.63%    91.68%    92.40%
34                     82.52%    88.60%    90.66%    91.61%    92.17%
37                     80.11%    86.28%    88.76%    90.05%    90.79%

Branch 12: Off-line Handwriting Global Whole Word Recognition (C-GWR)
In [22], Clocksin et al. adopted a different, holistic approach based on whole-word recognition. Their method transforms each word into a normalized polar map and then applies a two-dimensional Fourier transform to the polar map; the resulting spectrum tolerates variations in size, rotation, and displacement. Each word is represented by a single template that includes only the lower frequencies of the Fourier spectrum magnitude, and recognition is based on the Euclidean distance from those templates. The method has been successfully applied to historical handwritten manuscripts and to modern multi-font type. A lexicon with a very limited vocabulary was used, containing 145 word classes; 3370 word samples were processed, and the average accuracy reached 92%. The approach is excellent when dealing with very low-quality and very sophisticated images, where conventional techniques will definitely fail, but the system is limited in practice to a limited vocabulary and cannot be applied to an open-vocabulary system.


3.3 Foreign Languages Recognition Systems Survey
3.3.1 English:
We can categorize the efforts made by researchers on Latin languages in the same way we categorized the Arabic-language work. This means that we again have 12 branches of research, of which we will consider only 8.

Branch 5: On-line Handwriting Isolated Character Recognition (B-ISR)
J. Lee, J. Kim, and J. H. Kim [23] developed a system for on-line recognition of English numerals. Handwriting is encoded using 16-direction codes, and the imaginary lines between every two visible segments are encoded by another set of 16-direction codes; the resulting sequence of direction codes is called a skeleton pattern. Skeleton patterns within a class are clustered to determine the number of models for the class, and the skeleton pattern that appears most frequently in each cluster is chosen as the representative pattern of that cluster. An HMM is then constructed from each representative pattern. Consequently, the number of representative patterns determines the number of models in a class, and the length of a representative pattern determines the number of states of the corresponding HMM. The multiple models in a class were combined in parallel to form a multiple parallel-path HMM, which then behaves as a single HMM. States with structural similarity were tied, reducing the number of HMM parameters. The proposed design method was evaluated using an on-line Hangul (Korean) recognition system. The experiments showed that the proposed method reduced the error rate by about 19% compared to intuitive design methods.

Branch 6: On-line Handwriting Segmentation Before Recognition (B-SBR)
Gareth Loudon et al. [24] introduced a method for handwriting input recognition and correction on smart phones. The handwriting recognition engine is based on discrete HMMs, and only handwritten examples of isolated characters are used for training.
The features extracted for each data point are the x-y coordinates and delta-x and delta-y. Two ten-state HMMs are used for each character: one based on the features x and y, and one based on delta-x and delta-y. After segmentation, the time sequence of VQ codes for


a test character is matched against all the HMMs using a standard Viterbi search algorithm. The probabilities of match between the input data and all characters are recorded and sorted, and the five characters with the highest matching probability are stored. Once the segmentation and character recognition steps have been completed, a re-scoring step takes place, making use of simple language knowledge to improve recognition performance. The first part of the re-scoring step uses a simple statistical language model based on character unigram and bigram probabilities; the second part uses a word dictionary, in this case of 16,000 words. A test set of handwritten English sentences containing 5559 characters was used to evaluate recognition performance; it contained handwriting samples collected from seven different writers. The training set contained 11984 characters written by thirteen different writers. The average character recognition accuracy was 98.3% over the 5559 characters when the dictionary was not used. E. Gómez Sánchez et al. [25] implemented a recognition system that segments numerals into strokes and then extracts features representing these strokes. One objective of the segmentation phase was to keep the number of strokes per component as low as possible, in order to reduce the computational cost of the classification stage. Several features were studied, mainly length and phase features, but also coefficients of the wavelet transform (WT) of important functions of the stroke trajectories. For clustering and classification they used the neuro-fuzzy architecture FasArt (Fuzzy Adaptive System ART-based), whose structure follows the principles of ARTMAP architectures. The system was tested on datasets produced by the UNIPEN project; the same experiments were also carried out with five volunteers, using five training cycles.
The achieved recognition rate of almost 86% on the digits of the second training set was shown to be close to the rate of 90% achieved by independent human testers.

Branch 7: On-line Handwriting Simultaneous Segmentation and Recognition (B-SSR)
P. Neskovic and L. N. Cooper [14] developed an on-line recognition system in which HMMs, together with a dynamic programming technique (Viterbi), were used to efficiently search the space of possible segmentations. Neural-network-based models were used to represent handwritten patterns as an alternative to HMMs. A dictionary of 1000 words was used. The preprocessor extracts features from the strokes and outputs the feature vector to the segmentation network, a multi-layer feedforward network trained on entire unsegmented words. The output layer of the network consists of units called character detectors, whose outputs form a detection matrix. The elements of the detection matrix represent recognition probabilities of characters: each row represents one character, and each column corresponds to the position of the character within the pattern. A binding network binds a subset of the elements from the detection matrix (characters) such that they represent a dictionary word. The candidate words are passed to the recognition stage, and dynamic programming was used as a post-processing module to select the best sequence of elements from the detection matrix. The system's performance depends on the writer, his style, and the clarity of his writing; the output of the system is a ranked set of words. For good writers, the correct word is in the top 5 words over 97% of the time, and is the top-ranked word over 93% of the time. For bad writers, the correct word is in the top 5 words over 90% of the time, and is the top-ranked word over 70% of the time.

Branch 8: On-line Handwriting Global Whole Word Recognition (B-GWR)
S. Jaeger et al. [26] developed an on-line handwriting recognition system called "NPEN++", which integrates both local and global information about the pen trajectory into the feature vector. The core of NPEN++ is a Multi-State Time Delay Neural Network, a hybrid architecture between neural networks and HMMs. Training consists of a forced-alignment and a free-alignment mode, which requires only a small set of hand-segmented data.
NPEN++ uses an efficient tree search to ensure real-time performance for very large dictionaries. The recognition rates range from 96% for a 5000-word dictionary to 91.2% for a 50,000-word dictionary. Han Shu [20] implemented a large-vocabulary, writer-independent, cursive on-line handwriting recognition system with a word error rate under 10%, built on hidden Markov models and global information-bearing features. He presented a series of feature experiments. A new

vertical height feature was used to characterize vertical extent, and a new space feature was used to represent inter-word space. The new features improved the HMMs' modeling of handwriting and thus significantly improved the recognition performance of the overall system. Although this recognition system was capable of recognizing words from a large vocabulary, it could not recognize out-of-vocabulary words such as new names and abbreviations. However, it would not be difficult to convert it to an open-vocabulary system by adding a word-HMM model for out-of-vocabulary words.

Branch 9: Off-line Handwriting Isolated Character Recognition (C-ISR)
Among the latest work in this area, B. Zhang et al. [27] proposed an off-line system to study the individuality of handwritten characters. Given a handwriting sample, a set of characters is first segmented; then, for each isolated character, so-called micro-features are extracted. Micro-features have been successfully used for recognizing handwritten characters and analyzing handwriting individuality. For an individual character, the micro-features consist of 512 bits corresponding to gradient (192 bits), structural (192 bits), and concavity (128 bits) features. Each of these three feature sets relies on dividing the scanned image of the character into 4 x 4 regions. The gradient features capture the frequency of the direction of the gradient. The structural features capture, in the gradient image, the presence of corners, diagonal lines, and vertical and horizontal lines, as determined by 12 rules. The concavity features capture, in the binary image, major topological and geometrical features including the direction of bays, the presence of holes, and large vertical and horizontal strokes. Two different models, identification and verification, are used to study the individuality of handwritten characters.
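The gradient portion of the micro-feature vector described above can be sketched as follows. This is only an illustrative reconstruction: the 12-bin direction quantization and the bin-count threshold are assumptions, not the exact rules of [27]; the point is the layout (4 x 4 regions x 12 direction bins = 192 bits).

```python
import numpy as np

def gradient_microfeatures(img, threshold=2):
    """Sketch of a 192-bit gradient micro-feature: split the character
    image into a 4x4 grid, histogram the gradient direction into 12 bins
    per cell, and threshold each bin count to one bit (4*4*12 = 192).
    Bin count and threshold are illustrative assumptions."""
    gy, gx = np.gradient(img.astype(float))
    angle = np.arctan2(gy, gx)                        # gradient direction per pixel
    bins = ((angle + np.pi) / (2 * np.pi) * 12).astype(int) % 12
    mag = np.hypot(gx, gy)
    H, W = img.shape
    bits = []
    for i in range(4):
        for j in range(4):
            r = slice(i * H // 4, (i + 1) * H // 4)
            c = slice(j * W // 4, (j + 1) * W // 4)
            cell_bins = bins[r, c][mag[r, c] > 0]     # directions where a gradient exists
            hist = np.bincount(cell_bins, minlength=12)[:12]
            bits.extend((hist >= threshold).astype(int))
    return np.array(bits)
```

The structural (192-bit) and concavity (128-bit) parts would be concatenated with this vector to form the full 512-bit micro-feature.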
A two-stage k-nearest-neighbor search and an artificial neural network were implemented for handwriting identification and verification, respectively. Several general conclusions were drawn from the discussions and observations:
(a) Using a large number of handwritten characters presents higher discriminative power of handwriting individuality than document-level features.
(b) Different handwritten characters have different power in discriminating handwriting individuality.


(c) Handwritten alphabetic characters are much more powerful in discriminating handwriting individuality than handwritten numerals.

Branch 10: Off-line Handwriting Segmentation Before Recognition (C-SBR)
B. Verma et al. [28] proposed a technique to improve current handwriting recognition systems. The segmentation technique contained two components. First, a simple heuristic segmentation algorithm scanned handwritten words for important features to identify valid segmentation points between characters. The algorithm scanned the word looking for minima or arcs between letters, common in handwritten cursive script. It incorporated a "hole seeking" component that attempted to prevent invalid segmentation points from being found: holes occur in letters that are totally or partially closed, such as "a", "c", and so on, and if such a letter was found, segmentation did not occur at that point. Finally, the algorithm performed a final check to see whether one segmentation point was too close to another; if the segmentation point in question was too close to the previous one, it was discarded. The second component incorporated a feedforward artificial neural network trained with the back-propagation algorithm. It was initially trained with segmentation points found through manual segmentation of handwritten words. After training, the ANN was presented with segmentation points obtained by the heuristic segmentation algorithm and verified whether each point was correct or incorrect: correctly identified points were kept, while incorrect segmentation points were rejected. After the segmentation technique created a set of segregated characters, another neural network was used to classify them. After classification, these characters were presented to a neural-network-based dictionary of words.
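The heuristic checks just described (minima scan, hole seeking, minimum gap between cuts) can be sketched roughly as follows; the profile and hole-span representations, and the `min_gap` value, are illustrative assumptions rather than the exact procedure of [28].

```python
def heuristic_segmentation_points(upper_profile, hole_spans, min_gap=4):
    """Candidate cuts at local minima of the word's upper profile,
    skipping points inside letter holes ("hole seeking") and points
    too close to the previously accepted cut."""
    def in_hole(x):
        return any(a <= x <= b for a, b in hole_spans)
    points = []
    for x in range(1, len(upper_profile) - 1):
        is_min = upper_profile[x] < upper_profile[x - 1] and \
                 upper_profile[x] <= upper_profile[x + 1]
        if not is_min or in_hole(x):
            continue                        # skip closed letters like "a", "c"
        if points and x - points[-1] < min_gap:
            continue                        # too close to the previous cut
        points.append(x)
    return points
```

In the full system, the surviving candidates would then be verified by the trained ANN described above.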
The network used was based on the Hamming network, and the identified characters were presented to the neural dictionary. The results were very promising, indicating that a neural-based dictionary can produce recognition rates of up to 100% for handwritten words. Yong Haur Tay et al. [29] described an approach combining neural networks (NN) and Hidden Markov Models (HMM) to solve the handwritten word recognition

problem. The preprocessing involves generating a segmentation graph that describes all possible ways to segment a word into characters. To recognize a word, the NN computes the observation probabilities for each letter hypothesis in the segmentation graph. The HMMs then compute the likelihood for each word in the lexicon by summing the probabilities over all possible paths through the graph. An off-line handwritten word recognizer was developed based on this approach, and its recognition performance on three isolated word image databases, namely IRONOFF, SRTP, and AWS, was found very promising.

Branch 11: Off-line Handwriting Simultaneous Segmentation and Recognition (C-SSR)
Among the recent work published in this branch, M. Morita et al. [30] developed an off-line system for segmentation and recognition of handwritten dates on Brazilian bank checks based on a hybrid HMM-MLP approach. Through the recognition process, it segments a date into subfields. They proposed using MLP neural networks to deal with strings of digits (day and year) and HMMs to recognize and verify words (month). The HMMs were fed by two feature sets: the first is based on global features such as loops, ascenders, and descenders; the second is based on concavity measurements. After segmenting a date image into its constituent parts, the sub-images of the day and year were used as input to the digit-string recognizer. Moreover, the number of digits supplied by the HMMs was used as a priori information in digit-string recognition to determine which classifiers were employed, depending on the subfield (day or year). This strategy aims at reducing the lexicon size in digit-string recognition to improve the results, and the proposed word verification scheme increased the overall recognition rate of the system. Two experiments were performed using a subset of the validation set of the NIST SD19 database; the recognition results were 97.1% and 99.2%, respectively. A. Vinciarelli and S.
Bengio [31] developed an off-line, single-writer cursive word recognition system using continuous-density Hidden Markov Models trained with PCA or ICA features. The system is based on a sliding-window approach: after preprocessing, a window 16 pixels wide (≅ 1.5 mm) shifts column by column across the image and, at each step, isolates a frame. A feature vector is extracted from each frame, and the resulting sequence of frames is modeled with continuous-density HMMs. The sliding-window approach has the important advantage of avoiding the need for an independent segmentation step, a difficult and error-prone process. In order to reduce the number of parameters in the HMMs, they used diagonal covariance matrices in the emission probabilities, which corresponds to the unrealistic assumption of decorrelated feature vectors. For this reason, they applied Principal Component Analysis (PCA) and Independent Component Analysis (ICA) to decorrelate the data, allowing a significant improvement of the recognition rate. Several experiments were performed using a publicly available database, and the accuracy obtained is the highest presented in the literature over the same data. The analysis of recognition as a function of word length shows that the system achieves a 100% recognition rate for samples longer than six characters, suggesting that its performance in tasks involving words of high average length can be very good. Both PCA and ICA had a positive effect on the recognition rate; PCA in particular reduced the error rate by 30.3% with respect to raw data. A further improvement could probably be obtained with nonlinear or kernel PCA, techniques that often work better than the linear PCA transform.

Branch 12: Off-line Handwriting Global Whole Word Recognition (C-GWR)
Victor Lavrenko et al. [32] presented a holistic word recognition approach for single-author historical documents. Their goal is to produce transcriptions that allow successful retrieval of images, which has been shown to be feasible even in such noisy environments.
The experiments showed a recognition accuracy exceeding the performance of other systems that operate on non-degraded input images (non-historical documents). The preprocessing stage consisted of the slant/skew/baseline normalizations commonly used in the literature, and the features are those generally used for the recognition of handwritten characters: simple holistic scalar features (such as the width, height, aspect ratio, area, and the number of ascenders and descenders in the word) and profile-based features (e.g., projection profiles).


The document was described using a Hidden Markov Model in which the words to be recognized represent hidden states; state transition probabilities are estimated from word bigram frequencies, and the observations are the feature representations of the word images in the document to be recognized. In this scenario, the recognition lexicon was constrained to that of a supplied transcript, which was also used to estimate unigram/bigram frequencies. The error rates achieved are comparable to those of multi-writer recognition systems on high-quality input pages. While the accuracy is not yet sufficient to produce automatic transcripts acceptable to human readers, successful retrieval of historical documents and transcription alignment can already be performed with at least 93% accuracy. V. Marti et al. [33] presented a system for reading handwritten sentences and paragraphs. In contrast to other systems, whole lines of text are the basic units for the recognizer; thus the difficult problem of segmenting a line of text into individual words is avoided. Another novel feature of the system is the incorporation of a statistical language model into the recognizer. The original input to the system consists of images of complete pages of handwritten text from the underlying database. A sliding window was used for feature extraction. The first three features are the weight of the window (i.e., the number of black pixels), its center of gravity, and the second-order moment of the window. Features four and five give the positions of the upper and lower contours in the window. Features six and seven give the orientations of the upper and lower contours, via the gradient of the contour at the window's position. Feature eight is the number of black-white transitions in the vertical direction. Finally, feature nine is the number of black pixels between the upper and lower contours.
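The nine window features just listed can be sketched as follows. The exact orientation formula is not spelled out in the text, so features six and seven are approximated here by the contour slope across the window; this is an illustrative assumption.

```python
import numpy as np

def window_features(win):
    """The nine sliding-window features described above, for one binary
    window (rows = y, columns = x; 1 = black)."""
    H, W = win.shape
    ys, xs = np.nonzero(win)
    n_black = ys.size                                   # 1: weight (black pixels)
    cog = ys.mean() if n_black else 0.0                 # 2: center of gravity
    mom2 = ((ys - cog) ** 2).mean() if n_black else 0.0 # 3: second-order moment
    upper, lower, between = [], [], 0
    for x in range(W):
        col = np.flatnonzero(win[:, x])
        if col.size:
            upper.append(int(col[0]))                   # upper contour in this column
            lower.append(int(col[-1]))                  # lower contour in this column
            between += int(win[col[0]:col[-1] + 1, x].sum())  # 9: black between contours
    f4 = upper[0] if upper else 0                       # 4: upper contour position
    f5 = lower[0] if lower else 0                       # 5: lower contour position
    f6 = upper[-1] - upper[0] if len(upper) > 1 else 0  # 6: upper contour slope (approx.)
    f7 = lower[-1] - lower[0] if len(lower) > 1 else 0  # 7: lower contour slope (approx.)
    trans = int(np.abs(np.diff(win, axis=0)).sum())     # 8: black-white transitions
    return [n_black, cog, mom2, f4, f5, f6, f7, trans, between]
```

Sliding such a window column by column across a text line yields the observation sequence fed to the character HMMs.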
An HMM is built for each character. In the recognition phase, the character models are concatenated into words, and the words into sentences, yielding a recognition network in which the best path can be found with the Viterbi algorithm. Unigram and bigram language models were introduced; these models weight each word by its occurrence probability. The linguistic knowledge introduced by the unigram and bigram models improved recognition performance: word-level recognition rates of 79.5% and 60.05% for the small (776-word) and larger (7719-word) vocabularies, respectively, increased to 84.3% and 67.32% when the top ten choices were taken into account.
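The Viterbi best-path search used by this recognizer, and by several of the HMM systems above, is the standard dynamic-programming recursion. A textbook sketch in log-space (not the authors' exact network) looks like this:

```python
import numpy as np

def viterbi(log_A, log_B, log_pi, obs):
    """Best state path through an HMM: log_A[i, j] = log transition
    probability, log_B[i, o] = log emission probability, log_pi[i] = log
    initial probability; obs is a sequence of observation indices."""
    n_states = log_A.shape[0]
    T = len(obs)
    delta = log_pi + log_B[:, obs[0]]            # best log-score ending in each state
    psi = np.zeros((T, n_states), dtype=int)     # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_A          # extend every path by one transition
        psi[t] = scores.argmax(axis=0)           # best predecessor per state
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # backtrack along the back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

In a word recognizer the "states" are the states of the concatenated character models, so the best path simultaneously segments and labels the line.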

3.3.2 Japanese, Chinese, Thai and other languages
It was noticed that all the work done in this area, whether for off-line or on-line recognition, is directed towards single-character recognition only, due to the nature of these languages. H. Shimodaira et al. [34] presented a system for on-line recognition of both Japanese and Chinese. The substroke was chosen as the model unit. Depending on direction, length, and pen-up/down movement, they defined 25 substrokes in eight directions: eight long strokes (A-H), eight short strokes (a-h), eight pen-up movements (1-8), and one pen-up-down movement (0). A three-state HMM is employed for each pen-down substroke to model the changes in substroke velocity, while a one-state HMM without a self-loop transition probability is used for each pen-up substroke to model the displacement vector. Viterbi-based embedded training is employed to train the substroke HMMs. The character recognition system used in this study takes no measures against the different-stroke-order problem; thus, recognition performance deteriorates when users write characters in a stroke order different from the one defined in the hierarchical dictionary. Two sets were used for evaluation: set-A (578 Japanese texts of character length between 2 and 8) and set-B (a database with fixed stroke order). When a language model is used, the proposed method gives text recognition rates of 69.2% on set-A and 88.0% on set-B, with character recognition accuracies of 74.9% and 91.1%, respectively. S. Tang and I. Methaste [35] proposed an on-line handwritten character recognition system for Thai. Thai character recognition is more difficult than that of other languages in several respects, including the similarity of characters and the absence of spaces between words, which complicates segmentation. Thai characters are complex, being composed of circles, zigzags, curves, and heads, to name a few. In the proposed system, the coordinate sequence from the digital tablet is passed through a preprocessing procedure that includes dehooking and resampling. Then local features, such as stroke direction, and global features, such as histogram analysis, are used in the feature extraction process. After each character is extracted, the features are fed to a multi-layer perceptron (MLP) neural network with the backpropagation learning algorithm for training or recognition. The data were divided into three groups: the first consisted of 42 consonants; the second contained 23 vowels and tone marks; the last was the 10 Arabic numerals. The recognition rates of the first and third groups were very close (92.50% and 92.99%), while the second group's rate was lower at 89.76%. The average error rate of this experiment was about 10%.

3.4 Summary
In this chapter we discussed the handwriting recognition problem in more detail and came closer to the difficulties facing this research point. We introduced samples of researchers' efforts at recognizing Arabic, as well as English and Asian languages.


Chapter 4: Rule-based Algorithm for Off-line Isolated Handwritten Character Recognition
In this chapter we propose a rule-based system dedicated to recognizing off-line handwritten Arabic alphabet characters. Using structural features and classifier ensembles, we arrived at a multiple-classifier system that achieved an increase of about 27% in recognition accuracy compared to a traditional single-classifier system.

4.1 Off-line Character Recognition System Stages
4.1.1 Database Collection
We used a single-writer database composed of 30 samples of each of the Arabic alphabetic characters (870 characters in total). The database was divided into two parts: 580 characters (20 samples of each character) were used as training data, and 290 characters (10 samples of each character) were used for testing. The complete database is included in Appendix B.

4.1.2 Preprocessing Stage
In the preprocessing stage, the database image is scanned, then binarized and thresholded. There was no need for noise elimination techniques or morphological operations, because the database was quite clean and we did not want to increase the processing time.

We followed the edge detection procedure described in [36] to separate character samples. We were able to detect the upper and lower boundaries of the text line, and the left and right boundaries of the character, using signal (image) differentiation. The underlying concept is that the boundary of an object is a step change in intensity level, with the edge at the position of the step change, so edge positions can be detected by first-order differentiation. Text lines were separated by differencing vertically adjacent points, which detects horizontal changes in intensity (a vertical edge detector); a vertical operator does not respond to vertical changes in intensity, since there the difference is zero, so the vertical edge detector finds the horizontal edges (the top and bottom of the writing line). Similarly, characters were separated by differencing horizontally adjacent points, which detects vertical changes in intensity (a horizontal edge detector); a horizontal operator does not respond to horizontal changes, so the horizontal edge detector finds the vertical edges (the left and right boundaries of the character).
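This differencing-based boundary search can be sketched with ink-projection profiles. `find_runs` and `segment` are illustrative names, a sketch of the idea rather than the procedure of [36] verbatim:

```python
import numpy as np

def find_runs(profile):
    """Locate [start, end) extents of nonzero runs in a 1-D ink profile via
    first-order differencing of the binary 'has ink' signal."""
    has_ink = (profile > 0).astype(int)
    d = np.diff(np.concatenate(([0], has_ink, [0])))  # +1 at run start, -1 after run end
    starts = np.flatnonzero(d == 1)
    ends = np.flatnonzero(d == -1)
    return list(zip(starts, ends))

def segment(image):
    """Split a binary page image (1 = ink) into per-line, per-character boxes."""
    boxes = []
    for top, bottom in find_runs(image.sum(axis=1)):      # horizontal edges -> text lines
        line = image[top:bottom, :]
        for left, right in find_runs(line.sum(axis=0)):   # vertical edges -> characters
            boxes.append((top, bottom, left, right))
    return boxes
```

Row-sum differencing gives the line boundaries; column-sum differencing within each line gives the character boundaries.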

4.1.3 Feature Extraction, Training and Recognition Stages
The basic feature used was the radial distance, computed as follows: after finding the center of gravity of the character image (C0), the image is segmented into a number of triangular sectors (here we used 8 sectors), and for each sector we compute its center of gravity (C1, C2, C3, C4, C5, C6, C7, C8). The feature vector is composed of the squares of the Euclidean distances between each sector centroid and the main centroid, as shown in the figure below.

Figure 4.1: Radial distance Feature.
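A sketch of the radial-distance feature computation; assigning ink pixels to the 8 triangular sectors by their angle around C0 is an assumption about how the sectors are formed:

```python
import numpy as np

def radial_distance_features(image, n_sectors=8):
    """Radial-distance feature: squared Euclidean distance between the
    character centroid C0 and the centroid Ck of each angular sector."""
    ys, xs = np.nonzero(image)                  # coordinates of ink pixels
    c0 = np.array([ys.mean(), xs.mean()])       # main center of gravity C0
    angles = np.arctan2(ys - c0[0], xs - c0[1]) # angle of each pixel around C0
    sector = ((angles + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    feats = np.zeros(n_sectors)
    for k in range(n_sectors):
        pts = np.flatnonzero(sector == k)
        if pts.size:                            # sector centroid Ck
            ck = np.array([ys[pts].mean(), xs[pts].mean()])
            feats[k] = np.sum((ck - c0) ** 2)   # squared distance |Ck - C0|^2
    return feats
```

The 8 squared distances form the feature vector that is averaged per class during training.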


In the training stage, the feature vectors of all the training samples of a given class (character) were averaged. This means that we had 29 representative feature vectors for the 29 classes (28 alphabet characters and the ال ligature), which are passed to the recognition stage for comparison. A traditional single-classifier recognition stage was used first: the unknown input in the test stage was assigned to one of the 29 classes, the decision being taken in favor of the class with the minimum Euclidean distance between its representative pattern and the test feature vector. To compute the average system accuracy, we have to change the pattern samples used for training and test, to ensure that the accuracy obtained per run is not biased to a certain run. Since there are 30C20 = 30045015 possible selections of 20 training samples out of 30, we cannot apply the algorithm for every selection; thus, we made 30 runs only. For each run, a different group of pattern samples is selected for each class (character) for training and test, and we compute the system accuracy; finally, we obtain the average system accuracy by averaging the 30 outputs. The average system accuracy was 70.06%. We noted the confusions listed in table 4.1.

Table 4.1: Common confusions of ACR system using single classifier.

The confused character : Confuses with
‫ى‬ : ‫ ق‬،‫ ف‬،‫ ش‬،‫س‬
‫ال‬ : ‫ ك‬،‫ن‬
‫ھـ‬ : ‫ظ‬
‫م‬ : ‫ھـ‬
‫ل‬ : ‫ ظ‬،‫ن‬
‫ق‬ : ‫ ى‬،‫ظ‬
‫ع‬ : ‫ ح‬،‫ ك‬،‫ غ‬،‫ظ‬
‫غ‬ : ‫ خ‬،‫ح‬
‫ط‬ : ‫ظ‬
‫ض‬ : ‫ ن‬،‫ ف‬،‫ص‬
‫ص‬ : ‫ ش‬،‫ س‬،‫ال‬
‫ش‬ : ‫ ف‬،‫س‬
‫س‬ : ‫ش‬
‫ز‬ : ‫ر‬
‫ر‬ : ‫و‬
‫ذ‬ : ‫د‬
‫د‬ : ‫ط‬
‫خ‬ : ‫ ق‬،‫ غ‬،‫ح‬
‫ح‬ : ‫خ‬
‫ج‬ : ‫ع‬
‫ث‬ : ‫ت‬
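The nearest-centroid decision rule used in this single-classifier stage can be sketched as follows; the class names and sample vectors are synthetic, not the thesis database:

```python
import numpy as np
from math import comb

# Sanity check of the count of possible train/test sample selections:
assert comb(30, 20) == 30045015

def train(samples_by_class):
    """Average each class's training feature vectors to get one
    representative vector per class (as described in the thesis)."""
    return {c: np.mean(v, axis=0) for c, v in samples_by_class.items()}

def classify(x, centroids):
    """Decide in favor of the class whose representative pattern has
    the minimum Euclidean distance to the test feature vector."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Illustrative two-class example
train_data = {"alef": [np.array([0.0, 1.0]), np.array([0.2, 0.8])],
              "beh":  [np.array([5.0, 5.0]), np.array([4.8, 5.2])]}
centroids = train(train_data)
print(classify(np.array([0.1, 0.9]), centroids))  # -> "alef"
```

Averaging over 30 random splits then just means repeating train/classify with different sample selections and averaging the per-run accuracies.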

We noticed from table 4.1 that most of the confusions lack sense (i.e., they were unexpected), because the feature used was not representative enough. Thus, we tried to find a better way to define classes, to acquire more features to discriminate between the characters in each class, and to find a strategy


to combine them (i.e., feature fusion) such that we can enhance the final average recognition accuracy. This was achieved over 4 steps (or stages) explained in the following sections.

4.1.3.1 Stage 1: using classifier ensemble (hierarchical mixture of experts) controlled by gating according to the structural features of Arabic alphabets

Referring to the work done by M. Rashwan and Hazem Raafat [37], characters were categorized into different classes according to the number of dots they comprise. Each class contains a number of characters having the same number of dots above or below the main body. The input test pattern was compared to the appropriate class whose characters have the same number of dots. A traditional single classifier based on the radial distance feature was used to decide the closest character to the test pattern from this class according to the minimum Euclidean distance measure. We followed the same approach; this topology represents a classifier ensemble in which individual classifiers (based on the radial distance feature) are established by varying the training data (refer to section 2.4.4.4.4 in chapter 2). The structural features of Arabic script, dots and Hamzas (or secondaries), were used for gating between the radial distance classifiers, such that the appropriate classifier is deployed according to the number and location of secondaries of the input test pattern. This is shown in detail in table 4.2.

Table 4.2: Classifier ensemble controlled by gating according to the number of dots.

Class 1: Number of dots = 3: ‫ث‬، ‫ش‬ (consider the case of stuck dots in ‫ ث‬، ‫)ش‬
Class 2: Number of dots = 2: ‫ث‬، ‫ش‬، ‫ت‬، ‫ق‬ (consider the case of stuck dots in ‫ ت‬، ‫)ق‬
Class 3: Number of dots = 1 and dot position up: ‫ض‬، ‫ظ‬، ‫ف‬، ‫خ‬، ‫ن‬، ‫ذ‬، ‫ز‬، ‫غ‬، ‫ق‬، ‫ت‬
Class 4: Number of dots = 1 and dot position down: ‫ب‬، ‫ج‬
Class 5: Number of dots = 0 and has Hamza: ‫أ‬، ‫ك‬ (consider errors while recognizing the Hamza)
Class 6: Rest of characters: ‫ص‬، ‫ط‬، ‫م‬، ‫و‬، ‫ھـ‬، ‫ح‬، ‫د‬، ‫ر‬، ‫س‬، ‫ع‬، ‫ل‬، ‫ال‬، ‫ى‬، ‫أ‬، ‫ك‬
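The gating step of table 4.2 can be sketched as a simple dispatch; the function name, argument names, and returned class numbers are our assumptions, not the thesis code:

```python
def gate(n_dots, dot_position=None, has_hamza=False):
    """Route a test pattern to the classifier of the matching class,
    following the structure of table 4.2 (hedged sketch)."""
    if n_dots == 3:
        return 1
    if n_dots == 2:
        return 2
    if n_dots == 1:
        # dot above vs. below the main body
        return 3 if dot_position == "up" else 4
    if has_hamza:
        return 5
    return 6  # rest of characters

print(gate(1, dot_position="down"))  # -> class 4
```

Each returned class number selects one radial-distance classifier trained only on that class's characters.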

Note that we tended to repeat some characters in more than one class (i.e. overlapping of classes in the feature space) to avoid the effect of: (1) the noise and hazards caused by the writer that were not eliminated during preprocessing in order to


keep the image quality and avoid distortion, which causes accuracy degradation, and (2) the low recognition power of the secondaries. These problems were left aside until the final system topology was found and the maximum possible recognition accuracy was reached; then we tried to solve them to enhance the final accuracy. The average system accuracy was computed in the same way explained before. Its value rose to 78.33%, and the confusions began to make some sense.

Table 4.3: Common confusions of ACR system using multiple classifiers.

The confused character : Confuses with
‫ى‬ : ‫ س‬،‫ ص‬،‫ل‬
‫ال‬ : ‫ ھـ‬،‫ك‬
‫م‬ : ‫ھـ‬
‫ل‬ : ‫ ك‬،‫ص‬
‫ك‬ : ‫ل‬
‫ع‬ : ‫ ك‬،‫ح‬
‫غ‬ : ‫خ‬
‫ظ‬ : ‫ج‬
‫ض‬ : ‫ن‬
‫ص‬ : ‫ ال‬،‫س‬
‫س‬ : ‫ى‬
‫ر‬ : ‫و‬
‫د‬ : ‫ط‬
‫خ‬ : ‫ ق‬،‫ غ‬،‫ح‬
‫ت‬ : ‫ث‬

4.1.3.2 Stage 2: Adding more structural features for gating between different classifiers

Since the idea of gating worked well and raised the recognition accuracy, we tried adding another feature, the loop feature, for gating. The details are shown in table 4.4.

Table 4.4: Multiple classifiers controlled by gating according to the number of dots and loops.

Class 1: Number of dots = 3: ‫ ث‬،‫ش‬
Class 2: Number of dots = 2: ‫ ث‬،‫ ش‬،‫ ت‬،‫ق‬
Class 3: Number of dots = 1, dot position up, loop found: ‫ ق‬،‫ ف‬، ‫ ظ‬، ‫ض‬
Class 4: Number of dots = 1, dot position up, no loop found: ‫ ت‬،‫ غ‬، ‫ ز‬، ‫ ذ‬، ‫ ن‬، ‫خ‬
Class 5: Number of dots = 1, dot position down: ‫ ب‬،‫ج‬
Class 6: Number of dots = 0, loop found: ‫ ھـ‬، ‫ و‬، ‫ م‬، ‫ ط‬، ‫ص‬
Class 7: Number of dots = 0, has Hamza: ‫ ك‬،‫أ‬
Class 8: Rest of characters: ‫ ك‬، ‫ أ‬، ‫ ى‬، ‫ ال‬، ‫ ل‬، ‫ ع‬، ‫ س‬، ‫ ر‬، ‫ د‬، ‫ح‬

Having 8 different classifiers, the unknown input in the test stage is classified using one of them according to the number and position of


dots it comprises, and the existence of a Hamza or a loop. The decision is taken in favor of the character having the minimum Euclidean distance between its feature vector and the test pattern feature vector. The average system accuracy rose to 80.86%. From the above, we conclude that diversifying and increasing the number of classifiers enhanced the system accuracy. We therefore added new classifiers depending on structural features such as the number and position of the character stroke end points (after image thinning) and the number of vertical and horizontal lines cut by the character body, in addition to the radial distance feature described before. These new features helped solve the confusions shown in figure 4.2:

Figure 4.2: Additional features used for recognition.
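Counting line cuts can be sketched as counting background-to-ink transitions along one scan line; the exact counting convention is our assumption, since the thesis does not specify it:

```python
import numpy as np

def line_cuts(img, axis, position):
    """Count how many times a scan line cuts the character body: the
    number of 0->1 transitions along one row (axis="horizontal") or one
    column (axis="vertical"). Hedged sketch, not the thesis code."""
    line = img[position, :] if axis == "horizontal" else img[:, position]
    padded = np.concatenate(([0], line))          # treat the border as background
    return int(np.sum((padded[1:] == 1) & (padded[:-1] == 0)))

# Toy image: row 2 has two separate ink runs, so a horizontal scan
# line through it cuts the body twice.
img = np.zeros((5, 7), dtype=int)
img[2, 1:3] = 1
img[2, 4:6] = 1
print(line_cuts(img, "horizontal", 2))  # -> 2
```

A vertical scan at column 1 crosses a single run, giving 1; collecting such counts over several scan lines yields the cuts feature vector.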

The overall classification hierarchy is shown in figure 4.3.


Figure 4.3: Arabic characters classification hierarchy in stage 2.

At this stage, we had 4 main classes and 6 subclasses:
1. Main Class 1 (3 dots: ‫ ش‬،‫)ث‬: the radial distance feature was enough to discriminate between its characters.
2. Main Class 2 (2 dots: ‫ ش‬،‫ ث‬،‫ ت‬،‫)ق‬: the radial distance feature is used to discriminate between its characters.
3. Main Class 3 (1 dot: ‫ ال‬،‫ أ‬،‫ ك‬،‫ ت‬،‫ ق‬،‫ ن‬،‫ ف‬،‫ غ‬،‫ ظ‬،‫ ض‬،‫ ز‬،‫ ذ‬،‫ ح‬،‫ ج‬،‫)ب‬:
   Subclass 1 (dot position is down: ‫ ج‬،‫)ب‬: the radial distance feature was enough to discriminate between its characters.
   Subclass 2 (dot position is up and no loop: ‫ ال‬،‫ أ‬،‫ ك‬،‫ ت‬،‫ ن‬،‫ غ‬،‫ ز‬،‫ ذ‬،‫)خ‬:
   • Number of endpoints > 2 (‫ غ‬،‫)خ‬: we say ‫ غ‬if all endpoints are in the right half of the image, else we say ‫خ‬.
   • Number of endpoints > 1 (‫ ال‬،‫ أ‬،‫ ك‬،‫ ت‬،‫ ن‬،‫ ز‬،‫)ذ‬: the radial distance feature was used to discriminate between its characters.
   Subclass 3 (dot position is up and has loop: ‫ ق‬، ‫ ف‬، ‫ ظ‬، ‫)ض‬: radial distance and vertical lines cut features were used to discriminate between its characters.
4. Main Class 4 (0 dots: ‫ك‬،‫أ‬،‫ھـ‬،‫و‬،‫م‬،‫ط‬،‫ص‬،‫ى‬،‫ال‬،‫ل‬،‫ع‬،‫س‬،‫ر‬،‫د‬،‫)ح‬:
   Subclass 4 (0 dots and Hamza found: ‫ ك‬،‫)أ‬: the radial distance feature was used to discriminate between its characters.
   Subclass 5 (0 dots and loop found: ‫ھـ‬،‫و‬،‫م‬،‫ط‬،‫)ص‬: the radial distance feature was used to discriminate between its characters.
   Subclass 6 (0 dots and no loop found: ‫ك‬،‫أ‬،‫ى‬،‫ال‬،‫ل‬،‫ع‬،‫س‬،‫ر‬،‫د‬،‫)ح‬: radial distance and horizontal lines cut features are used to discriminate between its characters.
The results of stage 2 were much better than those of stage 1. The average recognition accuracy over the 29 characters was 92.25%. Of the 29 characters, 14 were fully recognized, and the recognition accuracy of the remaining characters ranged between 70% and 90%.

4.1.3.3 Stage 3: Adding more features and using feature fusion

We added a new feature, the 45° inclined lines cuts feature, to solve confusions like the one shown in figure 4.4.

Figure 4.4: New feature used.

We noted from the previous stages that sometimes a single feature was not enough to solve the confusion, which means that we need to combine several features in one feature vector.


The problem was that the scales (units) of these different features differed: some were small integer values (the horizontal, vertical, and inclined lines cuts feature vectors) and some were large floating-point values (the radial distance feature vector). Concatenating features of such different scales would bias the Euclidean distance towards the large-scale features, leading to wrong voting in favor of the feature having the larger units. To avoid this problem, the feature vector scales were unified by normalization, i.e., computing the mean and variance of all vector elements and then computing the normalized values using the following equation:

x_new = (x_old − µ) / σ    (4.1)

Where x is the feature vector element, µ is the mean value of the vector elements, and σ is the standard deviation. After normalization, the features were fused into a single feature vector using a fusion technique, the weighted average. In this technique the weights were computed by trial and error, but in general their values indicate how representative each feature is for a certain class. The hierarchy of classification is shown in figure 4.5. The classes and the features used for discrimination between their characters are as follows:
1. Main Class 1 (3 dots: ‫ ش‬،‫)ث‬: radial distance and vertical line cuts features were used to discriminate between its characters.
2. Main Class 2 (2 dots: ‫ ش‬،‫ ث‬،‫ ت‬،‫)ق‬: the dot level feature (dot level = 2 for ‫ث‬, and 1 for ‫ )ت‬together with the radial distance and vertical line cuts features were used to discriminate between its characters.
3. Main Class 3 (1 dot: ‫ ال‬،‫ أ‬،‫ ك‬،‫ ت‬،‫ ق‬،‫ ن‬،‫ ف‬،‫ غ‬،‫ ظ‬،‫ ض‬،‫ ز‬،‫ ذ‬،‫ ح‬،‫ ج‬،‫)ب‬:
   Subclass 1 (dot position is down: ‫ ج‬،‫)ب‬: the radial distance feature was enough to discriminate between its characters.
   Subclass 2 (dot position is up and no loop: ‫ ال‬،‫ أ‬،‫ ك‬،‫ ت‬،‫ ن‬،‫ غ‬،‫ ز‬،‫ ذ‬،‫)خ‬:
   • Number of endpoints > 2 (‫ غ‬،‫)خ‬: we say ‫ غ‬if all endpoints are in the right half of the image, else we say ‫خ‬.
   • Number of endpoints > 1 (‫ ال‬،‫ أ‬،‫ ك‬،‫ ت‬،‫ ن‬،‫ ز‬،‫)ذ‬: radial distance and inclined lines cut features were used to discriminate between its characters.
   Subclass 3 (dot position is up and has loop: ‫ ق‬، ‫ ف‬، ‫ ظ‬، ‫)ض‬: radial distance and vertical lines cut features were used to discriminate between its characters.
4. Main Class 4 (0 dots: ‫ك‬،‫أ‬،‫ھـ‬،‫و‬،‫م‬،‫ط‬،‫ص‬،‫ى‬،‫ال‬،‫ل‬،‫ع‬،‫س‬،‫ر‬،‫د‬،‫)ح‬:
   Subclass 4 (0 dots and Hamza found: ‫ ك‬،‫)أ‬: the radial distance feature was used to discriminate between its characters.
   Subclass 5 (0 dots and loop found: ‫ھـ‬،‫و‬،‫م‬،‫ط‬،‫)ص‬: the radial distance feature was fused with vertical and horizontal lines cut features to discriminate between its characters.
   Subclass 6 (0 dots and no loop found: ‫ك‬،‫أ‬،‫ى‬،‫ال‬،‫ل‬،‫ع‬،‫س‬،‫ر‬،‫د‬،‫)ح‬: the structure of this subclass was changed completely; all the proposed features were used to discriminate between its characters.
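The normalization of equation (4.1) followed by weighted-average fusion can be sketched as below; the weights and feature values are illustrative (the thesis chose the weights by trial and error), and σ is taken here as the standard deviation:

```python
import numpy as np

def normalize(v):
    """Equation (4.1): x_new = (x_old - mu) / sigma, computed over the
    elements of one feature vector."""
    return (v - v.mean()) / v.std()

def fuse(features, weights):
    """Weighted fusion of normalized feature vectors into one vector.
    The weights below are illustrative placeholders."""
    return np.concatenate([w * normalize(f) for f, w in zip(features, weights)])

radial = np.array([120.5, 98.2, 300.7, 10.1])    # large floats
v_cuts = np.array([1.0, 2.0, 1.0, 3.0])           # small integers
fused = fuse([radial, v_cuts], weights=[0.7, 0.3])
print(fused.shape)  # one feature vector on a common scale
```

After normalization both feature vectors have zero mean and unit variance, so neither dominates the Euclidean distance.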

Figure 4.5: Arabic characters classification hierarchy in stage 3.


Where RD = radial distance feature, VL = vertical lines cuts, HL = horizontal lines cuts, and IL = 45° lines cuts. The results of stage 3 were better than those of stage 2: the average recognition accuracy over the 29 characters is 96%. Of the 29 characters, 20 were fully recognized, and the remaining characters were recognized with 90% accuracy. To verify that these accuracies are realistic, we randomly chose an Arabic document and computed the number of isolated characters in it and the probability of occurrence of each character, as shown in table 4.5.

Figure 4.6: The document used to compute the real system accuracy.


Table 4.5: Statistics computed from the document used to find the real system accuracy.

Character : Occurrence
‫أ‬ : 134
‫ب‬ : 4
‫ت‬ : 3
‫غ‬،‫ظ‬،‫ط‬،‫ص‬،‫ش‬،‫خ‬،‫ح‬،‫ج‬،‫ث‬ : 0
‫د‬ : 8
‫ذ‬ : 3
‫ر‬ : 21
‫ز‬ : 2
‫س‬ : 2
‫ض‬ : 1
‫ع‬ : 1
‫ف‬ : 2
‫ق‬ : 1
‫ك‬ : 2
‫ل‬ : 5
‫م‬ : 5
‫ن‬ : 23
‫ھـ‬ : 2
‫و‬ : 43
‫ال‬ : 14
‫ى‬ : 3
‫ء‬ : 3
‫ة‬ : 16
Total : 300

Accuracy = Σ over the 29 characters of (probability of occurrence per character × recognition accuracy per character)

= 96.87%.
The remaining confusions were: (‫ ال‬، ‫)خ‬, (‫ ن‬،‫)ز‬, (‫ ح‬، ‫)ع‬, (‫ ن‬، ‫)ف‬, (‫ د‬، ‫)ل‬, (‫ م‬، ‫)ھـ‬, (‫ ط‬، ‫)و‬, (‫ ك‬، ‫)ال‬, (‫ س‬، ‫)ى‬.

4.1.3.4 Stage 4: Increasing the reliability of gating

In stage 4 we tried to avoid misclassification of the input test pattern by improving the discriminative power of the secondaries identification procedure. We succeeded in discriminating between secondaries (dot, Hamza, separated Alef) with 99.7% accuracy, using structural features such as the character-body-to-secondary ratio, the secondary black-to-white pixel ratio, and the secondary height-to-width ratio. This enabled us to remove any class overlapping in the feature space. The hierarchy of classification is shown in figure 4.7.
1. Main Class 1 (3 dots: ‫ ش‬،‫)ث‬: radial distance and vertical line cuts features were used to discriminate between its characters.
2. Main Class 2 (2 dots: ‫ ت‬،‫)ق‬: the radial distance feature was used to discriminate between its characters.
3. Main Class 3 (1 dot: ‫ ن‬،‫ ف‬،‫ غ‬،‫ ظ‬،‫ ض‬،‫ ز‬،‫ ذ‬،‫ ح‬،‫ ج‬،‫)ب‬:
   Subclass 1 (dot position is down: ‫ ج‬،‫)ب‬: the radial distance feature was used to discriminate between its characters.
   Subclass 2 (dot position is up and no loop: ‫ ن‬،‫ غ‬،‫ ز‬،‫ ذ‬،‫)خ‬:

• Number of endpoints > 2 (‫ غ‬،‫)خ‬: we say ‫ غ‬if all endpoints are in the right half of the image, else we say ‫خ‬.
• Number of endpoints > 1 (‫ ن‬،‫ ز‬،‫)ذ‬: the radial distance feature was fused with the inclined lines cut feature.


   Subclass 3 (dot position is up and has loop: ‫ ف‬، ‫ ظ‬، ‫)ض‬: radial distance, inclined lines cut, and vertical lines cut features were used to discriminate between its characters.
4. Main Class 4 (0 dots: ‫ك‬،‫أ‬،‫ھـ‬،‫و‬،‫م‬،‫ط‬،‫ص‬،‫ى‬،‫ال‬،‫ل‬،‫ع‬،‫س‬،‫ر‬،‫د‬،‫)ح‬:
   Subclass 4 (0 dots and Hamza found: ‫ ك‬،‫)أ‬: the radial distance feature was used to discriminate between its characters.
   Subclass 5 (0 dots and loop found: ‫ھـ‬،‫و‬،‫م‬،‫ط‬،‫)ص‬: the radial distance feature was fused with vertical and horizontal lines cut features.
   Subclass 6 (0 dots and no loop found: ‫ى‬،‫ال‬،‫ل‬،‫ع‬،‫س‬،‫ر‬،‫د‬،‫)ح‬: the structure of this subclass was changed completely; all the proposed features were used to discriminate between its characters.

Figure 4.7: Arabic characters classification hierarchy in the fifth approach.


Where RD = radial distance feature, VL = vertical lines cuts, HL = horizontal lines cuts, and IL = 45° lines cuts.

4.2 Results and Discussion

We proposed a rule-based multiple-classifier system for recognizing off-line handwritten isolated Arabic characters. We passed through 4 stages until we reached the final topology of the system. The database used was a single-writer database. The final average character recognition accuracy obtained is 97%. Of the 29 characters, 22 were fully recognized, and the remaining characters were recognized with 90% accuracy. We checked the real accuracy as explained before, using the occurrence probabilities of the different isolated characters and applying the rule:

Accuracy = Σ over the 29 characters of (probability of occurrence per character × recognition accuracy per character) = 97.17%
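The occurrence-weighted accuracy rule can be sketched as follows; the occurrence counts and per-character accuracies below are illustrative, not those of table 4.5:

```python
def weighted_accuracy(occurrences, accuracy_per_char):
    """Overall accuracy = sum over characters of
    P(character) * recognition accuracy(character)."""
    total = sum(occurrences.values())
    return sum((occurrences[c] / total) * accuracy_per_char[c]
               for c in occurrences)

# Illustrative three-character example
occ = {"alef": 134, "waw": 43, "noon": 23}
acc = {"alef": 1.00, "waw": 0.90, "noon": 0.95}
print(weighted_accuracy(occ, acc))
```

Frequent characters thus dominate the overall figure, which is why the thesis's real-document accuracy differs from the unweighted per-character average.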

The progress of the results through the whole experiment is shown in figure 4.8 (single classifier: 70.06%, stage 1: 78.33%, stage 2: 92.25%, stage 3: 96%, stage 4: 97%).

Figure 4.8: The accuracy progress of the proposed ACR system.

Single Classifier: Radial distance feature only Stage 1: Using classifier ensemble controlled by gating (dots) + Radial Distance feature only Stage 2: Using classifier ensemble controlled by gating (dots & loops) + Radial distance, vertical, and horizontal lines cuts features + end points individually Stage 3: Using classifier ensemble controlled by gating (dots & loops) + feature fusion Stage 4: Increasing gating reliability and removing classes overlapping


Our results are comparable to results obtained by other researchers working on off-line isolated handwritten character recognition, as shown in the table below. Our system is characterized by an extremely simple classifier (the Euclidean distance measure) based on the most commonly used features.

Table 4.6: Comparing the proposed system results to other researchers' results.

[22] Boashash et al: a 32x32 window of raw binary data is used instead of extracted features; classifier: 5 types of neural networks; 1 writer; average accuracy: MLP (81%), LVQ (74%), F-Map (54%).
[37] M. Rashwan et al: dot feature, radial distance; classifier: feed-forward NN with back propagation; 5 writers; average accuracy: 97.7%.
Our proposed system: structural features, radial distance, vertical, horizontal & inclined line cuts; classifier: rule-based classification tool (Euclidean distance measure); 1 writer; average accuracy: 97%.

4.3 Summary

In this chapter we proposed an off-line character recognition system for isolated Arabic alphabet characters written by a single writer. Although we used the features of Arabic characters best known to researchers in this field, we were able to achieve high accuracy by proposing the idea of a multiple classifier system and using a classification hierarchy based on the structural features of Arabic characters. The proposed system reached its complete form after trying several approaches and analyzing the drawbacks and suggestions for result improvement at each stage. Four stages were followed, ending with an average character recognition accuracy of 97%. Using additional features or another fusion technique may help to achieve up to 100% accuracy in a multi-writer system.


Chapter 5: Rule-based Algorithm for On-line Cursive Handwriting Segmentation and Recognition

Classically, on-line recognizers consist of a preprocessor, a classifier which provides estimates of probabilities for the different categories of characters (or other sub-word units), and a dynamic programming postprocessor, which eventually incorporates a language model [11]. The role of the postprocessor is to choose the character category best matching the context based on linguistics. In this chapter we propose a rule-based algorithm for the two early stages of an on-line recognizer for cursive Arabic handwriting. Rule-based methods were used to perform simultaneous segmentation and recognition of word portions using dynamic programming. The output of these stages is a ranked list of the possible decisions. In the future, linguistics can be used to select the best decision from this list.

5.1 On-line Character Recognition System Stages

5.1.1 Database Collection

The first stage in our system was database collection. Handwritten documents were collected using a slate tablet PC (Motion Computing LE 1600), shown in figure 5.1, through a graphical user interface (GUI) tool available on the internet called Play Ink, shown in figure 5.2.

Figure 5.1: Motion Computing LE1600 tablet PC.


Figure 5.2: The Play Ink GUI tool.

This tool is capable of recording any instantaneous sketching drawn on the screen using a mouse (or any other input device) and re-drawing the recorded samples with approximately the user's sketching velocity at the same location on the screen. In other words, it records both the x-y coordinate values and the time instant of each recorded sample point, which is exactly the information needed for on-line handwriting recognition. A screen shot is shown in figure 5.3.

Figure 5.3: Documents written using the Play Ink GUI tool.

The database collected was unconstrained (open vocabulary) but had to fulfill the following:
1. No digits included.
2. Writing is in Naskh font only.
3. Sticking adjacent dots together is avoided.


Each writer had been asked to write a number of short paragraphs about different subjects (economy, politics, law and sports).

5.1.2 Preprocessing Stage

The second stage in our system was the database preprocessing stage, in which we did the following:
1. Filter the document and clear it from unintended writers' errors.
2. Break down the document into text lines and words or sub-words.
3. Detect the type of each stroke (either main-body or secondary 'dots').

5.1.2.1 Data Filtering

A stroke is defined as all the points detected (samples recorded by the tablet PC) between a single pen-down (the start of a writing action) and the following pen-up (lifting the pen after writing). A stroke can thus represent a dot, a single character, a sub-word containing 2 or more characters, etc. We noticed that writers tend to make mistakes while writing, due to being nervous about using a tablet PC for the first time; some of them were also not used to writing in Naskh font. Thus we had to remove this kind of noise (mistakes) before processing the document. This step was done manually, as there is no efficient practical technique that can discriminate correct data from erroneous data. The result of this step is shown in figure 5.4.

Figure 5.4: The results of data filtering processing.


5.1.2.2 Text line and word separation

In the case of off-line data, the default procedure for this task is the projection histogram in the y-axis and x-axis directions respectively, and the two problems that usually face this technique are:
• Baseline skew, which makes line separation difficult and requires a careful skew detection and correction stage.
• Multi-word overlap, or an inter-word distance smaller than the normal expected threshold for separating words.
Since the nature of on-line data differs greatly from that of off-line data, different techniques have to be used for text line extraction and word-level segmentation. Before explaining the new technique we used, we review some of the previous effort on this issue.
E.H. Ratzlaff [38] implemented a text line extraction technique for unconstrained on-line English handwriting. The goal of text line extraction was to assign each stroke or component correctly to its appropriate text line, so that each isolated line may be passed in turn to the following analysis stage. The task was made difficult by the fact that the data frequently contain undulations and shifts in the baseline, baseline skew, baseline skew variability, character size variability, sparse data, skipped lines, and inter-line distance variability. He took advantage of both the spatial and temporal information. The general approach was a "bottom-up" clustering of discrete strokes into increasingly larger groups that eventually merge into complete text lines. The initial clustering was based on the strong evidence of spatiotemporal proximity. Subsequent merging was based on more sophisticated metrics that include dependencies on estimates of inter-line distance and mean character height. The y-axis projection histogram was generated for each stroke; then the initial bottom-up clustering began by creating Forward Projection (FP) groups. Strokes were merged into FP groups if they were temporally adjacent and had strongly overlapping y-axis projections. A single unmerged stroke became an independent FP group. The intent was to be very conservative by only merging strokes based on very strong spatiotemporal evidence; the process was not iterative. When FP groups became wide, or when significant horizontal gaps existed between strokes, he tended to


split them into smaller groups to avoid misrepresenting character height when baseline skew or undulation is significant. This procedure faces serious drawbacks if applied to Arabic script, due to the small complementary strokes ("secondaries") occurring above and below text lines with null overlapping y-axis projections, which would create a number of independent FP groups. In addition, writers tend to write in a very irregular pattern, causing large baseline skews along the text line and even within one word. Another idea for text line separation was presented by Gareth Loudon et al. [24] for editing English script on smart phones. The authors presented a more convenient methodology to break down the document into words and characters. This worked successfully with English script due to its limited cursive nature, i.e., a stroke (pen-down/up movement) usually represents a single character. Several parameters were calculated for each stroke during the character segmentation step. An example is shown in figure 5.5.

Figure 5.5: Some parameters for the handwritten phrase “on the”.

The key parameters are:
• Parameter "x": describes the width of a stroke.
• Parameter "s": describes the space between adjacent strokes.
• Parameter "y": describes the difference in height between the centers of adjacent strokes.
• Parameter "c": describes the distance between the center of a stroke and the right-hand edge of the preceding stroke.

They assumed that, if (si > max(xi)) or (-si > 2* max(xi) and yi > max(xi)), then stroke i should be a character at the end of a word, else if ( ci > 0) stroke i should be


a character within a word; else stroke i should be merged with the following stroke to form a character. Each character segmented from the input sentence was passed on to the next steps for character recognition. This procedure also faces serious drawbacks if applied to Arabic script, due to the cursive nature of the language: a stroke in an Arabic word usually represents more than one character (3-5 characters on average), which makes it impossible to estimate or even predict the geometry (height, width, etc.) of an Arabic stroke. Note also that English characters usually have one or two delayed (complementary) strokes written immediately after the main stroke (e.g., t, i, j, A), which is not the case for Arabic strokes: most writers were noticed to put the secondaries after writing most or even the whole word. In addition, the variation in character size (scale) and writing sequence (order) among writers, and sometimes for a single writer, makes the problem more difficult. The new technique used in our work is a compromise between the two previous ideas; we did what best suits the nature of Arabic writing. We used the same bottom-up clustering concept as [38], exploiting the spatiotemporal relations between strokes, and parameters similar to those used in [24], to build the smallest possible FP groups instead of separating the whole text line and then splitting it into smaller groups. This helped us overcome the baseline skew problem and obtain a more accurate estimate of the stroke height, which is important for the next step of identifying the stroke type. By examining the states of successive written Arabic strokes (either main-type strokes or secondary-type "dots"), we found them related spatially to each other by one of the following relations:
1. Touching: the two strokes should belong to the same word group.
2. Not touching but overlapping on the x-axis: the two strokes should belong to the same word group.
3. Neither touching nor overlapping on the x-axis: if the inter-stroke distance is less than the average stroke width,


then the two strokes should belong to the same word group; else, the two strokes belong to two different word groups.
Examples are shown in figure 5.6.

Figure 5.6: The successive stroke states in Arabic language (strokes numbered 1-8).

* Strokes 1 and 2 are neither touching nor overlapping, but belong to the same word.
* Strokes 2 and 5 are neither touching nor overlapping, and belong to 2 different words.
* Strokes 1 and 3 are overlapping and belong to the same word.
* Strokes 7 and 8 are touching and belong to the same word.
The delayed stroke problem was also taken into consideration and solved, such that the FP groups contain the main and secondary strokes of the same word regardless of the sequence (order) in which they were written. The full algorithm of the text line separation step is included in Appendix F. The inter-stroke distance threshold is taken to be equal to the average stroke width of the previously written strokes of the same FP group. This estimate works quite well, especially since the very small size of the secondaries compensates for the presence of long strokes (consisting of 3 or more characters), resulting in a suitable estimate. Although this technique does not perform 100% meaningful word segmentation, the results were satisfying, as shown in table 5.1. Some FP groups contain one complete word, others split the word into two or more portions (due to too-large inter-stroke distances), and others include more than one


word (due to too-small inter-stroke distances). In all cases, the amount of baseline skew is no longer significant. Splitting words or concatenating multiple words can simply be overcome if a language model stage (linguistics) is added to correct the separation decisions. Thus, it is obvious that the reason for not always having exactly one complete word per FP group is that writers do not usually keep the inter-stroke distance within one word smaller than the inter-word separation. We expect better performance if more research is done to find a more sophisticated way to adapt or re-estimate the inter-word distance at run time.

Table 5.1: The result of the text line and word separation process.


To sum up the results of this step, we solved several significant problems facing text line extraction in on-line systems:
1. The problem of the small complementary strokes ("secondaries") occurring above and below text lines with null overlapping y-axis projections, which were usually separated as an independent text line: the use of spatial relations and writing sequence information helps include these secondaries together with their main strokes in one FP group (word).
2. The text line separation difficulties due to the presence of baseline skew: replacing the y-axis projection histogram of the whole text line by the y-axis projection histogram of the previously written stroke only helps avoid these difficulties and gives a better estimate of the average character height.
3. The delayed stroke problem was taken into consideration and solved, such that the FP groups contain the main and secondary strokes of the same word regardless of the sequence in which they were written.
4. The estimation of the main parameters used for this purpose, like inter-line distances and inter-word distances, does not need much effort and adapts itself to all sizes of written documents, without using scale normalization or


using ad-hoc techniques to find some fixed threshold, or even adding a restriction in the database collection stage to keep the lines far apart from each other. In addition, there is no need for character height or character width estimation, which would add more difficulty for almost the same accuracy. The two main themes behind these conclusions are:
1. Successful methods used in the off-line case are not necessarily successful in the on-line case as well.
2. Successful methods used with Latin languages are not necessarily successful with the Arabic language.
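The three spatial grouping relations described in this section can be sketched as below; the stroke representation (an x-extent plus a point set) is our simplification of the recorded tablet samples:

```python
def same_word_group(s1, s2, avg_stroke_width):
    """Decide whether two successive strokes belong to the same word
    group, per the three spatial relations above (hedged sketch)."""
    # 1. Touching: the strokes share at least one point
    if s1["points"] & s2["points"]:
        return True
    # 2. Not touching but overlapping on the x-axis
    if s1["x0"] <= s2["x1"] and s2["x0"] <= s1["x1"]:
        return True
    # 3. Neither: compare the inter-stroke gap to the average stroke width
    gap = max(s2["x0"] - s1["x1"], s1["x0"] - s2["x1"])
    return gap < avg_stroke_width

a = {"x0": 0, "x1": 10, "points": {(1, 1)}}
b = {"x0": 12, "x1": 20, "points": {(5, 5)}}
print(same_word_group(a, b, avg_stroke_width=5))  # gap of 2 < 5 -> True
```

In the thesis the width threshold is re-estimated from the previously written strokes of the same FP group, so it adapts to each writer's scale.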

5.1.2.3 Classifying stroke types into main and secondary

Among the interesting characteristics of the Arabic language is that many characters share the same main body and differ from each other only by the existence of complementary strokes like dots and Hamzas. This characteristic enables us to reduce the number of patterns in our system by erasing these dots, resulting in far fewer confusions for the classifier. In other words, the recognition stage can be simplified by splitting it into two sub-stages: in the first, words are segmented and patterns are identified without dots; in the second, the dots are restored and the patterns are re-identified. To achieve this goal we first have to classify the stroke types of each word (FP group) into mains and secondaries, and reject the secondaries before proceeding to the next stage. The target is a rule-based algorithm that identifies the stroke types of the input document. Using geometrical features like height and width only (as most researchers do in the off-line case) may solve the problem of single dots or 2 or 3 stuck dots, but unfortunately it gives no help in discriminating between the delayed Alef

(case of ‫ ط‬and ‫ ال‬for instance) which should be preserved and Hamza which has to be eliminated as they are very close in size and shape (writers usually bend the stroke of Alef instead of making a straight line and draw a careless Hamza with very little zigzag shape). Again here we tended to use the space-time information to find out the type of each stroke. After observing the FP groups obtained from the previous stage we found the following: 1. FP groups usually contain one, two or more than two strokes. 2. Almost all writers tend to write the main stroke before the secondary, the other case "e.g. hamza then Alef in ‫ "أحمد‬is not accepted in our system In case the FP group contains one stroke then it should be main-type. In case the FP group contains two strokes then the first one should be main-type. The second stroke may be a secondary or main depending on its height, shape and location. It usually follows one of 5 cases: Case 1: the second stroke is small (less than 40% of the main stroke height). Case 2: the second stroke lies totally above or below the main stroke with null y-axis histogram overlap. Case 3: Especially for Hamzas, the second stroke may have a small y-axis histogram overlap and large size (may reach in some cases about 70% of the main stroke height) but should lie above the main stroke and should have curvature. Case 4: like case 3 but the second stroke should be straight. This is especially for the kaf hat case where writers tend to write first or middle kaf (‫ ـكـ‬،‫ )كـ‬as two strokes. Case 5: Any stroke state different than the above. Table 5.2: Stroke states in FP groups containing only two strokes.

Case 1: ‫عـن‬
Case 2: ‫ﻣن‬
Case 3: ‫إ‬، ‫أ‬
Case 4: (‫ ـكـ‬،‫)كـ‬
Case 5: ‫ول‬


To sum up, the decision in this part is based upon 3 features:
• The relative height.
• The relative position.
• The shape geometry (straight or zigzag).

The curvature (or zigzag) detection is achieved using a very simple mathematical rule: "the shortest path between any 2 points is a straight line". Imagine that we drew a straight line between the start and end points of a Hamza, a kaf hat or an Alef, and that we did the same with every 2 successive samples (points) on the stroke. We then sum the lengths of all the small line segments between successive samples and compare this sum to the length of the straight line joining the stroke's two ends. For a kaf hat or an Alef (which should be kept), the length of the straight line joining the two ends is almost equal to the sum of the small segments, while for a Hamza (which should be eliminated) it is almost half this sum. Finally, we can claim that the second stroke in cases 1, 2 and 3 is considered a secondary stroke, while in cases 4 and 5 it is considered a main stroke.
For FP groups containing 3 or more strokes, we have different cases relative to the preceding main stroke in time (not the nearest main stroke in space):
Case 1: the current stroke height is very small (< 30% of the previous main stroke height) with zero or small y-axis histogram overlap with the previous stroke.
Case 2: the current stroke height is not large (< 70% of the previous main stroke height) and it lies totally above or below the previous main stroke with null y-axis histogram overlap (i.e., Hamzas).
Case 3: the current stroke height is large (>= 70% of the previous main stroke height) and it lies totally above or below the previous main stroke with null y-axis histogram overlap (a case occurring due to significant baseline skew, or the kaf-hat case).
Case 4: any state different from the above, i.e., the current stroke height is large with a large y-axis histogram overlap with the previous main stroke.
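The straight-vs-zigzag test described above (comparing the summed segment lengths to the chord between the stroke's ends) can be sketched as follows. The function name, the sample strokes and the 1.5 ratio threshold are illustrative assumptions; the text only says a Hamza's path length is roughly double its chord.

```python
import math

def is_zigzag(points, ratio_threshold=1.5):
    """Classify a stroke as zigzag (e.g. Hamza) or straight (e.g. Alef).

    points: list of (x, y) pen samples for one stroke.
    Sums the lengths of the small segments between successive samples
    and compares the sum to the straight-line distance between the
    stroke's two ends.
    """
    if len(points) < 3:
        return False
    path = sum(math.dist(points[i], points[i + 1])
               for i in range(len(points) - 1))
    chord = math.dist(points[0], points[-1])
    if chord == 0:          # closed stroke: treat as curved
        return True
    return path / chord >= ratio_threshold

# A straight Alef-like stroke vs. a bent Hamza-like stroke:
alef = [(0, 0), (0, 5), (0, 10), (0, 15)]
hamza = [(0, 0), (4, 3), (0, 6), (4, 9)]
print(is_zigzag(alef), is_zigzag(hamza))  # prints: False True
```

For the Alef-like stroke the path equals the chord (ratio 1.0), while for the bent stroke the path is roughly 1.5 times the chord, matching the rule that a zigzag stroke's sampled path is much longer than the straight line joining its ends.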


Table 5.3: Stroke states in FP groups containing more than two strokes (columns: Case 1, Case 2, Case 4).

We can claim that the current stroke in cases 1 and 2 is considered a secondary stroke, while in cases 3 and 4 it is considered a main. The odd cases that did not follow the above rules are shown in table 5.4:

Table 5.4: Odd stroke states in FP groups containing more than two strokes.
Type 1 Error (‫)أ‬: Preserving Hamzas, because the current stroke height > 30% of the previous main stroke height with a small y-axis histogram overlap.
Type 2 Error (‫)اإل‬: Preserving Hamzas, because the current stroke height is >= 70% of the previous stroke height and it lies totally below the previous main stroke with null y-axis histogram overlap.
Type 3 Error (‫)المتحدة‬: Small link deletion, because the current stroke height is small, < 30% of the previous stroke height (which is relatively too long), with a small y-axis histogram overlap with the previous main stroke.
Type 4 Error (‫ كز‬-‫)ﻣر‬: Character deletion, because the current stroke height is > 30% and < 70% of the previous main stroke height and it lies totally above the previous main stroke with null y-axis histogram overlap.
Type 5 Error (‫)ستاد‬: Character deletion, because the current stroke height is small, < 30% of the previous main stroke height (which is relatively too long), with a small y-axis histogram overlap with the previous main stroke.
Type 6 Error (‫)نظام‬: Character deletion, because the current stroke height is < 30% of the previous main stroke height (which is relatively too long), with zero or small y-axis histogram overlap with the previous main stroke.

The cases need to be updated relative to the nearest main stroke in space rather than the preceding main stroke in time, because it is more logical to relate the delayed strokes to the nearest spatial candidate rather than the nearest temporal one.
Case 1: the current stroke height is very small (< 30% of the nearest main stroke height) with zero or small y-axis histogram overlap with the nearest main stroke.
Case 2: the current stroke height is large (>= 70% of the nearest main stroke height) and it lies totally above or below the nearest main stroke with null y-axis histogram overlap.
Case 3: the current stroke height is very small (< 30% of the nearest main stroke height) with a small y-axis histogram overlap with the nearest main stroke (which is relatively too long), the y difference between the two centroids is less than half the average main stroke height within the same FP group, and the stroke lies to the left of the nearest stroke (i.e., no overlap on the x-axis projection).
Case 4: the current stroke height is > 30% and < 70% of the nearest main stroke height, it either has a height equal to 4 times its width (separate Alef) or a width >= half the average main stroke width within the same FP group (kaf hat), and it lies totally above or below the nearest stroke.
Case 5: none of the above conditions is true, but the current stroke lies totally above or below the nearest stroke.
Case 6: the current stroke height is large with a large y-axis histogram overlap with the nearest main stroke.
We can claim that the current stroke in cases 1, 3 and 5 is considered a secondary stroke, while in cases 2, 4 and 6 it is considered a main. The new cases worked well in solving Type 4 and 5 errors, but for Type 3 and 6 errors we need to check the eliminated strokes once more, fetch the strokes having relatively large sizes, and return them as main strokes. The consequence of this solution is that successfully eliminated Hamzas come back as main strokes!
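The updated rules above can be sketched as a small decision function. This is a simplified restatement, not the thesis implementation: all boolean and ratio inputs are assumed to be precomputed elsewhere from the stroke geometry, and the function name is illustrative.

```python
def classify_stroke(h_ratio, y_overlap, above_or_below,
                    tall_or_wide, left_no_x_overlap):
    """Decide secondary/main for a stroke in a 3+-stroke FP group,
    relative to the nearest main stroke in space.

    h_ratio           - stroke height / nearest main stroke height
    y_overlap         - 'null', 'small' or 'large' y-histogram overlap
    above_or_below    - lies totally above or below the nearest stroke
    tall_or_wide      - height = 4x width (separate Alef) or width >=
                        half the average main stroke width (kaf hat)
    left_no_x_overlap - lies to the left with no x-projection overlap
    """
    if h_ratio < 0.3 and y_overlap in ('null', 'small'):
        if left_no_x_overlap:
            return 'secondary'          # case 3 (small link to the left)
        return 'secondary'              # case 1 (dots)
    if h_ratio >= 0.7 and above_or_below and y_overlap == 'null':
        return 'main'                   # case 2 (baseline skew, kaf hat)
    if 0.3 <= h_ratio < 0.7 and tall_or_wide and above_or_below:
        return 'main'                   # case 4 (separate Alef, kaf hat)
    if above_or_below:
        return 'secondary'              # case 5 (remaining Hamzas)
    return 'main'                       # case 6 (large overlap)
```

For example, a tiny stroke with small overlap is classified as a secondary (a dot), while a mid-height stroke that is tall and narrow above its neighbour is kept as a main (a separate Alef).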
It is now obvious that removing Hamzas is a very difficult task (due to the variety of writings). Preserving them does not cost as much effort as trying to eliminate them: preserving them adds only a small number of patterns to the recognition stage classifiers, which does not add much complexity. In addition, the Hamza is not really a complementary stroke, since we have a main Hamza pattern written separately (for example: ‫)سماء‬. The full algorithm of the stroke type identification stage is given in Appendix F. From all the above, we can now derive some database restrictions that best suit our system:
1. Writing in Naskh font only.
2. No numerals included.
3. Sticking adjacent dots is not allowed.
4. Keeping the inter-stroke distance within one word less than the inter-word distances.
5. Neat writing, i.e., there should not be too-large main strokes together with too-small main strokes within one word, because the small strokes in this case would be considered secondaries.
The steps of the preprocessing stage are shown in figure 5.7 (input document, error deletion and text line extraction into FP groups, then stroke classification into main and secondary strokes).

Figure 5.7: The steps of the preprocessing stage.

5.1.3 Pattern Definition Stage
The third stage in our system was the pattern shape definition stage, in which we define the pattern shapes for recognition (after secondaries elimination). Pattern shapes were defined by observing the collected handwritings. This means that we had more than one shape for each handwritten character in all its known positions (Start, Middle, End, and Isolated). The pattern shapes are shown in Appendix E.

5.1.4 Feature Extraction Stage
The fourth stage in our system was the feature extraction stage. From the literature survey we found that almost all researchers take the pen trajectory direction as the main feature representing on-line handwriting, and some make use of unsampled pen movements (in air), as in [20], [23] and [34]. In [34], the substroke was chosen as the model unit. Depending on the directions, lengths, and pen-up/down movements of substrokes, 25 substrokes of eight directions are defined as shown in the figure below: eight long strokes (A–H), eight short strokes (a–h), eight pen-up (unsampled movement in air) movements (1–8) and one pen-up-down movement (0).

Figure 5.8: The three sets of Freeman chain codes used as a feature for on-line handwriting.

Apart from the 25 substroke directions, an inter-character movement representing the pen-up movement between the last pen-down stroke of the preceding character and the first pen-down stroke of the succeeding character is defined also. It is named pen-up movement ’9’. In [23], Handwriting was structurally simplified as a sequence of the straightline segments. After noise removal and smoothing operation, adjacent pen movements with similar directions were grouped into a single straight-line segment. An invisible pen-up movement between pen-down strokes was also inserted as an imaginary line. Average direction of the line segment with pen-down movement was encoded as one of the 16-direction codes, and the imaginary line was encoded by another 16-direction

103

codes. The resulting sequence of direction code was named "skeleton pattern". It described a time-sequential and global shape of the pen movements.

Figure 5.9: Handwriting representation using direction codes.
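The direction coding described above can be sketched as follows: pen-down segments are quantized to 8 directions, runs of the same direction are merged, and each run becomes a long code ('A'-'H') or a short code ('a'-'h') by its length. The letter-to-direction mapping (A = direction 0, counter-clockwise) and the length threshold are assumptions, not values from the thesis.

```python
import math

def direction_index(p, q):
    """Quantize the angle of segment p->q to one of 8 directions (0-7)."""
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    return round(angle / (math.pi / 4)) % 8

def chain_code(points, long_threshold=10.0):
    """Encode a pen-down stroke as Freeman-style direction codes:
    'A'-'H' for long runs, 'a'-'h' for short runs (threshold assumed)."""
    codes = []
    run_dir, run_len = None, 0.0
    for p, q in zip(points, points[1:]):
        d = direction_index(p, q)
        if run_dir is not None and d != run_dir:
            codes.append(chr((ord('A') if run_len >= long_threshold
                              else ord('a')) + run_dir))
            run_len = 0.0
        run_dir = d
        run_len += math.dist(p, q)
    if run_dir is not None:
        codes.append(chr((ord('A') if run_len >= long_threshold
                          else ord('a')) + run_dir))
    return codes

print(chain_code([(0, 0), (12, 0), (12, 3)]))  # prints: ['A', 'c']
```

A long rightward movement followed by a short upward one becomes one long code and one short code, mirroring how adjacent pen movements with similar directions are grouped into a single segment in [23].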

A clustering method was used to gather skeleton patterns with similar directional chain codes into a cluster. The skeleton pattern that appears most frequently in each cluster is chosen as the representative pattern of that cluster. We followed the same trend: we used the 3 sets of Freeman chain codes described in [34] to represent the different pattern shapes, and we group (or cluster) the most representative patterns during training, as described in [23], to be used later in the recognition stage. The feature vector is composed of numbers and letters, as shown in the table below. Notice the direction code of the pattern "Dal" in both examples.

Table 5.5: Feature vectors of different pattern shapes.
Feature Vector: [e-D-g-f-e-d-9]
Feature Vector: [D-9-8-f-e-d-9]

5.1.5 Training Stage The fifth stage in our system is the training stage. The details of this stage depend greatly on the methodology that will be used in the recognition stage.


As we discovered from the literature, there are two opposite approaches that researchers usually use to solve the unconstrained (open vocabulary) handwriting recognition problem.
Approach 1: The preprocessing stage must contain character-level segmentation, since working on segmented characters (training and testing with character models) is better than working on whole words (training and testing with whole-word models): the number of needed patterns or models is limited and no large lexicons are needed. In the feature extraction stage, ad-hoc techniques are the only way to find the best feature.
Approach 2: Since there is no efficient segmentation technique for Arabic script to date, performing segmentation before recognition degrades the final recognition accuracy. The segmentation-free technique is considered better, as it gives more relaxation to modeling; thus the HMM is the best modeling technique and Viterbi is the best search technique for classification. Features should be selected in some smart way (e.g. a linear discriminant function).
We followed the first approach, but by performing segmentation-by-recognition rather than explicit segmentation-before-recognition. This approach was followed by many researchers, such as Thomas M. Breuel [10] and S. El-Dabi [3, 9]. El-Dabi's basic idea was to extract a set of features sequentially and accumulate the values while moving along the word (column by column), then check them against the feature space of a given font. If the character was not found, another column was added to the character, and the features were recalculated and checked against the feature space again. This process was repeated until the character was recognized or the end of the word was reached. We used almost the same approach. Thus we aim at building a registry comprising all representative patterns (the feature space) of all pattern shapes during the training stage.
To achieve this we made transcription files of the training data to describe the content of each training file. These files stand for a manual segmentation of the word strokes, because they specify the start and end points of each pattern found in each stroke. For each word (FP group), the transcription file of the training data contains the following:
1. The pattern shape names represented by the main strokes.
2. For each pattern shape, the number of segments composing the pattern (to handle the case of segmented pattern shapes).
3. For each segment composing the pattern shape, the segment number and the start and end stroke samples.
An example of a transcription file is shown in figure 5.10.

Figure 5.10: An example of the training transcription file. Each entry lists the stroke number, the pattern shape name, the number of parts comprising the pattern shape, and the start and end points of each part.

The training procedure goes as follows. For each transcription file, the pattern shape data (start, end, number of parts, etc.) are read and the direction features (chain codes) are extracted. All the feature vectors belonging to the same pattern shape are clustered, as done in [34] and [23]. The most representative patterns (feature vectors) are stored to construct a registry for the recognition stage, as shown in figure 5.11. The flow chart representing the training algorithm is shown in figure 5.12. The full algorithm is in Appendix F.

Figure 5.11: The chain code registry of all pattern shapes built during the training stage. Each pattern shape (‫أ‬, ‫ح‬, ..., ‫)ى‬ maps to its list of representative patterns (representative pattern 1 ... n).
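The registry construction described above can be sketched as follows. Simple frequency counting stands in here for the clustering of [34] and [23]; the sample shape names, the vectors and the `top_n` cutoff are illustrative assumptions.

```python
from collections import defaultdict

def build_registry(training_samples, top_n=3):
    """Build a pattern-shape -> representative-patterns registry.

    training_samples: iterable of (pattern_shape_name, chain_code_list)
    read from the transcription files.  For each shape, the most
    frequent feature vectors are kept as its representative patterns.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for shape, code in training_samples:
        counts[shape][tuple(code)] += 1
    registry = {}
    for shape, vectors in counts.items():
        ranked = sorted(vectors, key=vectors.get, reverse=True)
        registry[shape] = [list(v) for v in ranked[:top_n]]
    return registry

samples = [('Dal', ['e', 'D', 'g']), ('Dal', ['e', 'D', 'g']),
           ('Dal', ['D', '9', '8']), ('Alef', ['C', 'c'])]
reg = build_registry(samples)
print(reg['Dal'][0])  # prints: ['e', 'D', 'g']
```

The most frequently seen vector of each shape heads its list, which matches choosing the most frequent skeleton pattern of each cluster as the representative.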



Figure 5.12: The training algorithm.

5.1.6 Recognition Stage
The last stage in our system was the recognition stage. In this stage, the words (FP groups) of the test document are pre-processed sequentially. After the preprocessing step, the main task is to find cuts that divide connected components into their individual characters. The basic idea is to use a dynamic programming algorithm to find a globally optimal set of cuts through the input string (feature vector) which minimizes a certain cost function. The set of cuts and their precise shapes are found simultaneously. The total stroke direction feature vector of the test stroke is compared against the registry (one direction after the other) until either a character is recognized (i.e., we decide a segmentation point) or the feature vector reaches its end. This comparison is performed using a dynamic programming technique called "Minimum Edit Distance" [39]. The minimum edit distance algorithm goes as follows:

Function Min-Edit-Distance (target, source) returns min-distance
  n <- Length(target)
  m <- Length(source)
  Create a distance matrix distance[n+1, m+1]
  distance[0, 0] <- 0
  for each column i from 1 to n do
    distance[i, 0] <- distance[i-1, 0] + ins-cost(target_i)
  for each row j from 1 to m do
    distance[0, j] <- distance[0, j-1] + del-cost(source_j)
  for each column i from 1 to n do
    for each row j from 1 to m do
      distance[i, j] <- MIN(distance[i-1, j] + ins-cost(target_i),
                            distance[i-1, j-1] + subst-cost(source_j, target_i),
                            distance[i, j-1] + del-cost(source_j))

where ins-cost, subst-cost and del-cost are the insertion, substitution and deletion error penalties respectively. The question now is: what values did we take for the insertion, deletion and substitution error penalties? J. Lee et al. [23] used the angular difference between the two elements of the 2 vectors under comparison (the source and target vectors). For more illustration, we have three groups of directional codes:
Group1 = ['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H']
Group2 = ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h']
Group3 = ['1' '2' '3' '4' '5' '6' '7' '8']
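The pseudocode above can be made runnable as a standard weighted edit distance with pluggable penalty functions. This is a sketch of the general technique, not the thesis code itself; the penalty callables are placeholders for the direction-code costs described next.

```python
def min_edit_distance(source, target, ins_cost, del_cost, subst_cost):
    """Dynamic-programming edit distance with custom penalties.

    distance[i][j] holds the cost of turning source[:j] into target[:i];
    the first row and column are initialized from pure deletions and
    insertions before the main recurrence runs.
    """
    n, m = len(target), len(source)
    distance = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        distance[i][0] = distance[i - 1][0] + ins_cost(target[i - 1])
    for j in range(1, m + 1):
        distance[0][j] = distance[0][j - 1] + del_cost(source[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            distance[i][j] = min(
                distance[i - 1][j] + ins_cost(target[i - 1]),
                distance[i - 1][j - 1] + subst_cost(source[j - 1],
                                                    target[i - 1]),
                distance[i][j - 1] + del_cost(source[j - 1]))
    return distance[n][m]

# Unit costs reduce this to the classic Levenshtein distance:
unit = min_edit_distance('abc', 'adc',
                         lambda t: 1, lambda s: 1,
                         lambda s, t: 0 if s == t else 1)
print(unit)  # prints: 1.0
```

Note that the original pseudocode iterated from 0 and indexed `distance[i-1, j]`; the boundary initialization above supplies the first row and column that the recurrence needs.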


The penalties are decided as follows:

Source element       | Target element           | Substitution penalty
'x' ∈ Group 1        | same 'x' ∈ Group 1       | zero
'x' ∈ Group 2        | same 'x' ∈ Group 2       | zero
'x' ∈ Group 3        | same 'x' ∈ Group 3       | zero
'x' ∈ Group 1        | 'y' ∈ Group 1, y ≠ x     | absolute angle between (x, y)
'x' ∈ Group 2        | 'y' ∈ Group 2, y ≠ x     | absolute angle between (x, y)
'x' ∈ Group 3        | 'y' ∈ Group 3, y ≠ x     | absolute angle between (x, y)
'x' ∈ Group 1        | 'y' ∈ Group 2            | 4 × absolute angle between (x, y)
'x' ∈ Group 2        | 'y' ∈ Group 1            | 4 × absolute angle between (x, y)
'x' ∈ Group 1 or 2   | 'y' ∈ Group 3            | 16 × absolute angle between (x, y)
'x' ∈ Group 3        | 'y' ∈ Group 1 or 2       | 16 × absolute angle between (x, y)

After determining the substitution error penalty, we can determine the insertion and deletion penalties for the current string element (1 substitution = 1 deletion + 1 insertion):

Insertion Penalty = Substitution Penalty / 2    (5.1)
Deletion Penalty = Substitution Penalty / 2     (5.2)

The factors 4 and 16 come from the assumption that short strokes (represented by Group 2 directions) are almost half the length of long strokes (represented by Group 1 directions); thus moving between Group 1 axes and Group 2 axes multiplies the penalty by the square of this factor (2² = 4) to increase the substitution penalty reasonably, while moving from Group 1 or Group 2 axes to Group 3 axes is not logically acceptable, so the substitution penalty is increased significantly by the factor 4² = 16. Other value sets for these factors were tried: {1.5², (1.5²)²}, {2.5², (2.5²)²}, {3², (3²)²}, {3.5², (3.5²)²} and {4², (4²)²}. We found no change in the character identification results; the total edit distances only increase with larger factor values. We chose the {2², (2²)²} value set as it gives the smallest integer values, so the total distances do not get too large.
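Under these definitions, the substitution cost table can be sketched as follows. Mapping 'A', 'a' and '1' to the same base direction (directions 45 degrees apart within each group) is an assumption; the cross-group factors 4 and 16 follow the text.

```python
GROUP1 = 'ABCDEFGH'   # long strokes
GROUP2 = 'abcdefgh'   # short strokes
GROUP3 = '12345678'   # pen-up movements

def dir_index(c):
    """Direction index 0-7 of a chain code within its group."""
    for group in (GROUP1, GROUP2, GROUP3):
        if c in group:
            return group.index(c)
    raise ValueError('unknown direction code: %r' % c)

def group_of(c):
    return 1 if c in GROUP1 else 2 if c in GROUP2 else 3

def angle_between(x, y):
    """Absolute angle (degrees) between two direction codes,
    taking the shorter way around the circle."""
    diff = abs(dir_index(x) - dir_index(y)) % 8
    return 45 * min(diff, 8 - diff)

def subst_cost(x, y):
    angle = angle_between(x, y)
    gx, gy = group_of(x), group_of(y)
    if gx == gy:
        return angle               # zero when x == y
    if {gx, gy} == {1, 2}:
        return 4 * angle           # long <-> short strokes
    return 16 * angle              # pen-down <-> pen-up

# Per equations (5.1) and (5.2), insertion and deletion penalties are
# each half the substitution penalty for the element involved.
print(subst_cost('A', 'B'), subst_cost('A', 'c'), subst_cost('B', '4'))
```

This cost function can be plugged directly into a weighted edit distance: neighbouring directions in the same group cost 45, while a pen-down code replaced by a pen-up code two directions away costs 16 × 90 = 1440.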


After comparing the test feature vector to the different representative patterns of all pattern shapes in the chain code registry, the pattern shapes were ranked in ascending order according to the minimum edit distance multiplied by a factor indicating their degree of resemblance to a specified region of the test pattern. The probable pattern shapes of the first character in the stroke were stored as roots of individual trees. Each tree was completed by comparing the unidentified region of the feature vector to the registry again and again to find the probable pattern shapes of the second, third and fourth characters, until the whole stroke was recognized. Tree construction was achieved by combining first-, second-, third- and fourth-character possibilities such that each begins at the end point of the previous character and the total number of direction codes equals the test feature vector length. The recognition stage overview is given in figure 5.13. An example in figure 5.14 shows the steps in detail.

Figure 5.13: The recognition system overview.


Figure 5.14: The steps of the first recognition sub-stage. The secondaries of the word are eliminated (Stroke 1: Nabra & Dal, Stroke 2: Alef, Stroke 3: Hamza); feature extraction on Stroke 1 yields ['h' 'f' 'E' 'D' 'h' 'f' 'E' '9'], which is compared prefix by prefix (['h'], ['h' 'f'], ['h' 'f' 'E'], ...) against the registry. For all representative patterns in the chain code registry: (1) compare this part of the test vector and find the minimum edit distance; (2) use string matching to find the number of matches between the 2 vectors; (3) Distance = minimum edit distance × length of representative pattern / number of matches; (4) choose the part of the test vector having the minimum Distance to decide the segmentation point for this pattern; (5) list the start and end segmentation points in a file as new character candidates. The end points are then checked; if the segmentation point is not the end of the test vector, the remaining directions are taken one by one, and finally the trees are constructed.

The advantage of the resemblance factor is that it compensates to some extent for the absence of the off-line image of the stroke. The minimum edit distance technique is a good mathematical measure, but it cannot be used solely with the chain code feature: we need either some off-line features or at least template matching information. We used string matching to find the number of matches between the representative patterns from the registry and the test vector. The final distance is given by the following equation:

Distance = minimum-edit-distance × (length of representative pattern / number of matches)    (5.3)

Thus, as the number of matches increases, the resemblance factor tends to 1 and the Distance reduces to the minimum edit distance measure alone, while as the number of matches decreases, the resemblance factor grows and the Distance becomes a multiple of the minimum edit distance. After tree construction, we obtain a ranked list in which each member comprises the characters (without dots) representing the stroke, ranked by their total edit distance (Distance), as shown in figure 5.14. The last step left in this stage is dot restoration (the second recognition sub-stage). Two trials were made for assigning dots to the characters representing the stroke. A new ranked list is obtained after removing inconvenient members. An example is shown in figure 5.15.
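Equation (5.3) can be sketched in code. The thesis counts matches with its own string matching; here `difflib`'s matching blocks stand in for that step, so the counting method is an assumption.

```python
import difflib

def resemblance_distance(edit_distance, representative, test_part):
    """Weight an edit distance by the resemblance factor of (5.3):
    Distance = edit distance * len(representative) / number of matches."""
    matcher = difflib.SequenceMatcher(None, representative, test_part)
    matches = sum(block.size for block in matcher.get_matching_blocks())
    if matches == 0:
        return float('inf')   # nothing in common: reject this pattern
    return edit_distance * len(representative) / matches

# A perfect match leaves the edit distance unchanged:
print(resemblance_distance(90.0, ['h', 'f', 'E'], ['h', 'f', 'E']))
```

With a full match the factor is 1, so the Distance equals the edit distance; halving the matches doubles the Distance, as described above.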


Figure 5.15: The result of the recognition stage.

Dot restoration trials:
Trial 1: Having decided the segmentation points of each stroke, we can decide the start and end points of each character comprising the stroke. In other words, we can take the centroid of the stroke samples between 2 segmentation points to be the centroid of the character starting at one of them and ending at the other. The dot centroids were calculated, as well as the centroid of each character per stroke, and the dots were assigned to the character having the nearest centroid (first, second, third or fourth). When a dot was assigned to some character, any probable candidate of this character that does not usually bear dots (like Lam, Meem or Waw, for example) was removed from the ranked list together with its associates. The same happened to candidate characters that need a dot (like Baa or Faa, for example) when no dots were assigned or when the number of assigned dots or their location was inconvenient. Although dot restoration decreased the ranked list size significantly and moved the correct segmentation-recognition results to the top of the list, we noticed that writers usually drift right or left from the correct location when they place dots. These dot position drifts caused wrong dot assignments to characters, and therefore a lot of losses of correct choices as well.
Trial 2: Another trial was made by trying different distributions of the dots over the stroke characters and checking the validity of their number and location to remove inconvenient list members. This trial was more successful: we were able to preserve almost all correct list members together with a reasonable reduction in the list size.
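Trial 1's centroid-based assignment can be sketched as follows. The character names and sample points are illustrative only; in the system, the character sample ranges come from the decided segmentation points.

```python
def centroid(points):
    """Mean point of a list of (x, y) samples."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def assign_dots(characters, dot_strokes):
    """Assign each dot stroke to the character with the nearest centroid.

    characters: dict name -> sample points between segmentation points
    dot_strokes: list of sample-point lists, one per secondary stroke
    Returns dict name -> number of dot strokes assigned.
    """
    assigned = {name: 0 for name in characters}
    char_centroids = {n: centroid(p) for n, p in characters.items()}
    for dots in dot_strokes:
        cx, cy = centroid(dots)
        nearest = min(char_centroids,
                      key=lambda n: (char_centroids[n][0] - cx) ** 2 +
                                    (char_centroids[n][1] - cy) ** 2)
        assigned[nearest] += 1
    return assigned

chars = {'Nabra': [(0, 0), (2, 2)], 'Dal': [(10, 0), (12, 4)]}
dots = [[(1, 4)]]                      # one dot above the Nabra
print(assign_dots(chars, dots))  # prints: {'Nabra': 1, 'Dal': 0}
```

This also shows why the drift problem arises: a dot placed far enough to the right ends up nearer to the following character's centroid and is assigned wrongly, which motivated Trial 2.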

5.2 Results and Discussion
The database we used for training was composed of 317 words (1814 characters) written by four writers. The test database was composed of 94 words (305 strokes, 435 characters) written by four other writers. Before dot restoration, we were able to successfully segment and recognize 93% of the test strokes (80% of the test words). In other words, the probability of finding the correct segmentation-recognition results within the ranked list is .93 for the test strokes (.96 for the test characters). Or, using the expressions of P. Neskovic and L. N. Cooper [14], we can claim that "the correct results of the test strokes exist within the top list members 93% of the time (96% for the test characters)". The results of stroke and word segmentation and recognition are shown in table 5.6.

Table 5.6: The segmentation and recognition results before dot restoration.
            Total   Correctly   Segmentation   Correctly    Recognition
            Number  Segmented   Probability    Recognized   Probability
Characters  435     --          --             419          .96
Strokes     305     283         .93            283          .93
Words       94      77          .81            76           .80

After dot restoration (Trial 2), we were able to successfully recognize about 92% of the test strokes. In other words, the probability of finding the correct segmentation-recognition results within the ranked list is .92 for the test strokes (.95 for the test characters). Or again we can claim that "the correct results of the test strokes exist within the top list members 92% of the time (95% for the test characters)". The results of stroke and word recognition are shown in table 5.7.

Table 5.7: The recognition results after dot restoration.
            Total Number   Correctly Recognized   Recognition Probability
Characters  435            415                    .95
Strokes     305            279                    .92
Words       94             70                     .74

The 5% loss in the number of recognized characters is the consequence of two problems:
1. Imperfect segmentation: not having a large number of training samples (representative patterns) of each defined pattern shape may cause incorrect segmentation decisions for test patterns with a large degree of variety.
2. Wrong dot assignment: writer drifts and stroke overlaps make it hard to assign dots to the strokes (before distributing them over characters). An example is shown below:

Figure 5.16: Stroke overlap causes character accuracy loss.

Increasing the training samples from multiple writers and avoiding overlaps is expected to give much better results. One may ask about the single-output character recognition accuracy rather than the probability of finding the correct result within a list. The good news about our results is that most correct recognition results lie at the top of the ranked list. This means that, by adding an efficient language model to search a few top list members, we may obtain a single output with a character recognition accuracy almost equal to our probability of finding the correct result within the list (95%). A chart showing the number of correct results versus their location in the ranked list is shown in figure 5.17.


Figure 5.17: The number of correct results versus their location in the ranked list.

Table 5.8: The number of correct results versus their location in the ranked list (per location: number of correct character (c/c) choices and number of correct stroke choices).

Location  c/c  strokes | Location  c/c  strokes | Location  c/c  strokes
1         153  134     | 18        3    1       | 43        1    1
2         45   32      | 19        2    1       | 51        1    1
3         33   20      | 20        5    3       | 53        2    2
4         20   15      | 21        3    2       | 55        1    1
5         18   11      | 22        3    2       | 58        2    1
6         4    3       | 23        0    0       | 67        2    1
7         5    3       | 24        2    1       | 74        2    1
8         16   8       | 25        4    3       | 115       2    1
9         7    7       | 26        5    3       | 116       1    1
10        3    3       | 27        4    4       | 173       2    1
11        4    3       | 28        4    2       | 194       1    1
12        3    2       | 29        2    1       | 328       2    1
13        12   5       | 30        3    2       | 420       3    1
14        3    1       | 31        7    6       | 487       2    1
15        8    5       | 32        2    1       | 521       2    1
16        3    2       | 37        1    1       |
17        0    0       | 40        2    1       | Total     415  304

What makes our results really promising is that the list sizes after dot restoration have been reduced significantly with almost no loss of correct results. The table below shows the list size reduction percentage for strokes having 1, 2, 3 and 4 characters. The full algorithm of the recognition stage is given in Appendix F.


Table 5.9: The list size reduction percentages.

No. of characters in stroke   List size reduction percentage
1                             63.87%
2                             72.39%
3                             71.15%
4                             55.66%

5.3 Summary
In this chapter we proposed a rule-based algorithm for the two early stages of an on-line recognizer of cursive Arabic handwriting. We used the pen trajectory as the feature, coded using 3 sets of Freeman chain codes. Rule-based methods were used to perform simultaneous segmentation and recognition of word portions using dynamic programming. The output of these stages takes the form of a ranked list of possible choices. Before dot restoration, we were able to correctly segment and recognize the test strokes 93% of the time. Restoring dots decreased the list size significantly and moved the correct segmentation-recognition results to the top of the list. The probability of finding the correct choice within the ranked list's top members was .95 for the test characters and .91 for the test strokes. Adding an efficient language model to search a few top list members is expected to yield a single output with remarkable character recognition accuracy.


Chapter 6: Conclusion and Future Work
As noticed throughout the thesis, the proposed work was divided into two parts to cover both branches of the handwritten Arabic character recognition problem, the off-line and the on-line, and to attack the problem from different sides: the isolated and connected character problems, the single-writer and multi-writer variability problems, and single-output versus multi-output decisions. We investigated the advantages and the obstacles facing both the off-line and on-line handwritten character recognition problems using the simplest trend of solution: rule-based algorithms. We proposed an off-line character recognition system for solving a simple, known problem, isolated handwritten Arabic character recognition, to investigate the nature of the off-line problem: features, available information, and solution approaches. The database was a single-writer database. We used the best-known features of Arabic characters used by researchers in this field (structural features). The system was a single-output system. We were able to achieve high results, comparable to those achieved by other researchers in the literature, by proposing the idea of a multiple classifier system besides using a classification hierarchy based on the structural features of Arabic characters. We proposed a rule-based algorithm for the two early stages of an on-line cursive Arabic handwriting recognizer. This branch contains many open issues and is very challenging to researchers. The database was a multi-writer database. We used the best-known feature of Arabic characters used by researchers in this field (the pen trajectory) and added some modifications. The output takes the form of a ranked list of choices. We were able to correctly segment and recognize most of the test words. The correct segmentation-recognition results were located in the top choices of the ranked list.
The approach we used solves the cursiveness problem in an elegant way, but it is time-consuming and quite sensitive to noise, and some limitations remain.


Although the pen trajectory (the on-line feature) is powerful, it is not sufficient on its own. Following the pen trajectory loses the global pattern-shape information that the off-line image provides (e.g., confusions between {و, ر} and {ـھ‡ـ, ـمف‡ـ}). On the other hand, converting the on-line data to bitmaps and solving the problem as an off-line one is a very hard task; it is still under research and does not yet achieve reliable results [40, 41, 42]. Moreover, segmentation is considerably harder in the off-line case than in the on-line case. We therefore conclude, as future work, that the advantages of both systems could be combined in an on-line/off-line classifier ensemble, provided a successful bitmap conversion can be performed. In such a system, the segmentation decision of the off-line recognizer is corrected using the on-line recognizer, and the classification decision of the on-line recognizer is corrected using the off-line recognizer. This can be done as follows. For either the segmentation or the recognition stage, an error threshold (or a confidence measure) is associated with the on-line stage output; when it is exceeded, the pattern is declared mis-segmented or misclassified, and the off-line stage is deployed (Proposal 2). The alternative is to deploy both types of stages simultaneously and take votes for the final segmentation or recognition decision (Proposal 1), as shown in the figures below.

[Figure, Proposal 1: the input is fed to the off-line stage and the on-line stage in parallel, and a classifier ensemble combines their outputs into the stage decision.]

[Figure, Proposal 2: the input goes to the on-line stage first; well-classified patterns pass through directly, while failures or suspects are routed to the off-line stage (boosting by filtering) before the stage decision.]
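The two proposals can be sketched as follows. This is a hypothetical illustration, not the thesis implementation: `online_stage` and `offline_stage` stand for classifiers returning a (label, confidence) pair, and the confidence threshold value is an assumption.

```python
from collections import Counter

def proposal_1(decisions):
    """Proposal 1: run the off-line and on-line stages in parallel and
    let the classifier ensemble vote on the final stage decision."""
    label, _ = Counter(decisions).most_common(1)[0]
    return label

def proposal_2(online_stage, offline_stage, pattern, threshold=0.9):
    """Proposal 2 (boosting by filtering): trust the on-line stage when
    its confidence clears the threshold; otherwise the pattern is a
    failure or suspect and is re-examined by the off-line stage."""
    label, confidence = online_stage(pattern)
    if confidence >= threshold:
        return label                      # well classified
    return offline_stage(pattern)[0]      # fall back to off-line stage

# Hypothetical stage outputs: (label, confidence)
online = lambda p: ("seen", 0.4)          # low confidence -> suspect
offline = lambda p: ("meem", 0.8)
print(proposal_1(["meem", "seen", "meem"]))       # meem
print(proposal_2(online, offline, pattern=None))  # meem
```

In practice the vote in Proposal 1 could be weighted by each stage's confidence rather than counted equally.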


Despite the remarkable improvement in results that we obtained using the classification hierarchy based on structural features, it was unfortunately very hard to apply the same approach to cursive handwriting, because writers drift when placing dots above and below characters. Neat writing and careful dot placement are thus the only way to exploit this advantage after an efficient character segmentation stage. It is also expected that a much larger, neatly written training database collected from cooperative writers would further improve the results.

In addition, a single-output final character recognition decision can be obtained by introducing a language model and resolving ambiguities linguistically. Furthermore, character recognition accuracy can be enhanced, and a large degree of writing variability accommodated, by using a multiple-classifier system: neural networks, HMMs, fuzzy classifiers, and many others can be used, and their outputs fused into a final decision.

The goals of researchers in this field have no end; the problem is that challenging. Working on numerals and on the Naskh and Reqaa fonts is still considered an open issue awaiting solution.
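As an illustration of how a language model could reduce the ranked list to a single output, the sketch below rescores candidates with a character-bigram model. It is hypothetical: the toy alphabet, the bigram probabilities, and the score-combination weight are assumptions, not part of the thesis.

```python
import math

# Hypothetical character-bigram log-probabilities over a toy alphabet;
# a real model would be estimated from an Arabic text corpus.
BIGRAM_LOGP = {('b', 'a'): math.log(0.6), ('a', 'b'): math.log(0.3),
               ('b', 'b'): math.log(0.1), ('a', 'a'): math.log(0.2)}
FLOOR = math.log(1e-4)  # back-off score for unseen bigrams

def lm_score(word):
    """Character-bigram log-probability of a candidate word."""
    return sum(BIGRAM_LOGP.get(pair, FLOOR) for pair in zip(word, word[1:]))

def single_output(ranked_list, lm_weight=1.0):
    """Pick one final answer: recognizer score plus weighted LM score.

    `ranked_list` holds (candidate, recognizer_log_score) pairs.
    """
    return max(ranked_list,
               key=lambda cand: cand[1] + lm_weight * lm_score(cand[0]))[0]

# The recognizer slightly prefers 'bb', but the language model
# prefers the linguistically likelier 'ba', which wins overall.
candidates = [('bb', -1.0), ('ba', -1.1)]
print(single_output(candidates))  # ba
```

Since the thesis reports the correct choice among the top list members 0.91-0.95 of the time, rescoring only those few members keeps the search cheap.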


References:
1. Scott Connell, MSc thesis, A Comparison of Hidden Markov Model Features for the Recognition of Cursive Handwriting, Michigan State University, 1996.
2. Flávio Bortolozzi, Alceu de Souza Britto Jr., Luiz S. Oliveira and Marisa Morita, "Recent Advances in Handwriting Recognition", in Proceedings of the International Workshop on Document Analysis, 2005, pp. 1-30.
3. Yasser Hifny, MSc thesis, On-line Arabic Handwriting Recognition, Cairo University, 2000.
4. Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Pearson Education Asia, 2002.
5. Myer Blumenstein, PhD thesis, Intelligent Techniques for Handwriting Recognition, Griffith University, Gold Coast Campus, December 2000.
6. Nawwaf N. Kharma and Rabab K. Ward, "Character recognition systems for the non-expert", IEEE Canadian Review, Autumn 1999.
7. G. Kim, V. Govindaraju, and S. N. Srihari, "An Architecture for Handwritten Text Recognition Systems", International Journal of Document Analysis and Recognition, vol. 2, pp. 37-44, 1999.
8. Réjean Plamondon and Sargur N. Srihari, "On-line and off-line handwriting recognition: a comprehensive survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, January 2000.
9. May Elsaid Allam, MSc thesis, Hidden Markov Model for Recognizing Arabic Characters, Cairo University, 1995.


10. Thomas M. Breuel, "Segmentation of Handprinted Letter Strings using a Dynamic Programming Algorithm", in Proceedings of the Sixth International Conference on Document Analysis and Recognition, 2001, pp. 821-826.
11. R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, and V. Zue, Survey of the State of the Art in Human Language Technology, Center for Spoken Language Understanding (CSLU), Carnegie Mellon University, Pittsburgh, PA, 1995.
12. Jayashree Subrahmonia and Thomas Zimmerman, "Pen Computing: Challenges and Applications", in Proceedings of ICPR, 2000.
13. A. Amin, "Machine Recognition of Handwritten Arabic Words by the I.R.A.C II System", Workshop on Computer Processing and Transmission of the Arabic Language, Kuwait, 1985, pp. 12-14.
14. P. Neskovic and L. N. Cooper, "Neural Network-based Context Driven Recognition of On-line Cursive Script", in Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, 2000, pp. 353-362.
15. Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley & Sons, second edition.
16. A. F. R. Rahman and M. C. Fairhurst, "Multiple classifier decision combination strategies for character recognition: A review", IJDAR, 2003, pp. 166-194.
17. Cheng-Lin Liu and Hiromichi Fujisawa, "Classification and Learning for Character Recognition: Comparison of Methods and Remaining Problems", First IAPR TC3 NNLDAR Workshop, Seoul, Korea, August 2005, pp. 1-7.
18. M. Kamel and N. Wanas, "Data dependence in combining classifiers", Multiple Classifier Systems, Fourth International Workshop, Surrey, UK, June 11-13, 2003.
19. Roy A. Huber and A. M. Headrick, Handwriting Identification: Facts and Fundamentals, CRC Press, 1999.

20. Han Shu, MSc thesis, On-Line Handwriting Recognition Using Hidden Markov Models, Massachusetts Institute of Technology, 1997.
21. Alaa Mohamed Gouda, PhD thesis, Arabic Handwritten Connected Character Recognition, Cairo University, 2004.
22. Hazem Y. Abdelazim, "Recent Trends in Arabic OCR", Conference on Language Engineering, Ain Shams University, 2005.
23. J. Lee, J. Kim, and J. H. Kim, "Data Driven Design of HMM Topology for On-line Handwriting Recognition", in Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, 2000, pp. 239-248.
24. Gareth Loudon, Olle Pellijeff, and Li Zhong-Wei, "A Method for Handwriting Input and Correction on Smartphones", in Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, 2000, pp. 481-485.
25. E. Gómez Sánchez, J. A. Gago González, Y. A. Dimitriadis, J. M. Cano Izquierdo, and J. López Coronado, "Experimental study of a novel neuro-fuzzy system for on-line handwritten UNIPEN digit recognition", Elsevier, 1998.
26. S. Jaeger, S. Manke and A. Waibel, "NPEN++: An On-line Handwriting Recognition System", in Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, 2000, pp. 249-260.
27. B. Zhang, S. Srihari, and S. Lee, "Individuality of Handwritten Characters", in Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003).
28. B. Verma, M. Blumenstein and S. Kulkarni, "Recent Achievements in Off-line Handwriting Recognition Systems", International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'98), Melbourne, Australia, 1998, pp. 27-33.


29. Yong Haur Tay, Pierre-Michel Lallican, Marzuki Khalid, Christian Viard-Gaudin, and Stefan Knerr, "Offline Handwritten Word Recognition Using a Hybrid Neural Network and Hidden Markov Model", Sixth International Symposium on Signal Processing and its Applications, 2001, pp. 382-385.
30. M. Morita, R. Sabourin, F. Bortolozzi, and C. Suen, "Segmentation and recognition of handwritten dates: an HMM-MLP hybrid approach", IJDAR, 2004, pp. 248-262.
31. A. Vinciarelli and S. Bengio, "Offline Cursive Word Recognition using Continuous Density Hidden Markov Models Trained with PCA or ICA Features", in Proceedings of the 16th International Conference on Pattern Recognition, 2002, pp. 81-84.
32. Victor Lavrenko, Toni M. Rath and R. Manmatha, "Holistic Word Recognition for Handwritten Historical Documents", 2003 or after.
33. U.-V. Marti and H. Bunke, "Handwritten Sentence Recognition", in Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, 2000, pp. 467-470.
34. H. Shimodaira, T. Sudo, M. Nakai, and S. Sagayama, "On-line Overlaid-Handwriting Recognition Based on Substroke HMMs", ICDAR, 2003.
35. Sutat Sae-Tang and Ithipan Methaste, "Thai Online Handwritten Character Recognition Using Windowing Backpropagation Neural Networks", 2000.
36. Mark Nixon and Alberto Aguado, Feature Extraction and Image Processing, Newnes, 2002.
37. Hazem Raafat and Mohsen A. A. Rashwan, "A Tree Structured Neural Network", IEEE Computer Magazine, 1993, pp. 939-942.
38. E. H. Ratzlaff, "Inter-line Distance Estimation and Text Line Extraction for Unconstrained Online Handwriting", in Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, 2000, pp. 33-42.

39. Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Prentice-Hall, 2000.
40. Marcus Liwicki and Horst Bunke, "Handwriting Recognition of Whiteboard Notes", in Proceedings of the 12th Conference of the International Graphonomics Society, 2005, pp. 118-122.
41. Tal Steinherz, Ehud Rivlin, Nathan Intrator, and Predrag Neskovic, "An Integration of Online and Pseudo-Online Information for Cursive Word Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, May 2005.
42. O. Velek, M. Nakagawa and Cheng-Lin Liu, "Vector-To-Image Transformation of Character Patterns for On-line and Off-line Recognition", International Journal of Computer Processing of Oriental Languages, 2002, pp. 187-209.
43. Peter Burrow, MSc thesis, Arabic Handwriting Recognition, University of Edinburgh, 2004.


Appendix A: Introduction to Tablet PC

A.1 What Is the Tablet PC?
The Tablet PC is a notebook computer that can operate without a keyboard. Instead, Tablet PCs receive input via a special pen or via speech. Pen input (called digital ink on the Tablet PC) enables the user to write on, and interact directly with, the computer screen, bypassing the hand-cursor disconnect often encountered with mouse input. Although the Tablet PC can receive input via digital ink, it should not be mistaken for a simple pen computer. Most pen computers are touch-screen devices, which can actually impair productivity for those who need to rest their hand on the tablet to draw or write. The Tablet PC, on the other hand, does not employ a touch screen, which means it can be used more like a real-life notebook. The pen input of the Tablet PC makes it a perfect platform for taking notes, sketching, painting, and marking up documents and drawings. When used as a slate (keyboard removed or hidden), it is also an excellent reading platform for digital e-Books and magazines, as well as Acrobat files and Word documents.

A.2 The Tablet PC Value Proposition The Tablet PC enables or enhances the following: ■ Note taking: Notes can be taken directly on the Tablet PC in the user's own handwriting, which enables him to keep all his notes and files together on his computer. ■ Efficient digital collaboration: The Tablet PC’s direct annotation capability fits well with electronic conferencing. The user can mark directly on a virtual whiteboard or mark up a document with the pen instead of a mouse, making his annotations more direct. ■ Speech recognition and voice control: The user can use his voice to control the Tablet PC and dictate to his favorite word processor. ■ Ultimate portability: The Tablet PC is more mobile than most laptops, yet has a larger screen surface and higher usability than PDAs.


■ e-Book reading: The Tablet PC makes it easy to read books and journals directly on the tablet.
■ Electronic forms: The user can use his Tablet PC to fill out and sign forms and documents directly on the tablet. This is useful for mobile sales forces, databases, and those who deal often with contracts.
■ Built-in wireless networking (on most models): Built-in Wi-Fi allows for anywhere access to corporate data and Internet applications. Many offices and public locations now offer wireless access for their patrons and business partners.
■ Mark up documents without thinking about the software: Instead of thinking about how to use mark-up software, a Tablet PC enables the user to annotate documents with his own handwriting to identify items that need to be changed. This can also be used for marking up drawings done in CAD and other applications.

Thus, the Tablet PC is an exciting new device that will change the way many people interact with their computers. The size of the Tablet PC and the pen input are the features that people notice most, and they tend to be the primary reasons that most people get a Tablet PC. The size makes the Tablet PC less conspicuous, and thus more accepted, in meetings. The pen input changes the basic interactions with the computer. In other pen-based machines, the operating system was not designed to make pen input significantly different from a mouse. With Windows XP Tablet PC Edition, however, the operating system (OS) handles the pen input and allows the user to keep handwritten notes as handwriting; no conversion is required. And because the OS deals with digital ink, a person can search for notes that are still in handwritten form. Windows XP Tablet PC Edition converts handwriting to text in the background so that it can use the handwriting for searches.

A.3 Comparisons to Other Platforms Many look at the Tablet PC and think it’s a cool toy. It is, but it’s also a serious business machine. It’s important to know the differences between the Tablet PC and other machines before you buy one, and it’s helpful to know the differences if you use multiple devices.


Laptops
Laptops are excellent devices for what they're supposed to do: replace the desktop for mobile users. Laptop prices have dropped to within $1,000 of their desktop counterparts. In addition, the performance of many laptops rivals that of the desktop, although laptops will always lag a bit. The Tablet PC extends the benefits of the laptop to include direct pen input, which enables people to easily take notes, draw, edit graphics, and deal with the computer more naturally. In general, Tablet PCs are lighter than laptops in both weight and performance. Most tablets that shipped immediately after the Tablet PC launch had processors in the 800 MHz range, while their laptop cousins were touting speeds exceeding 2 GHz. Although 800 MHz may seem acceptable for many, the Tablet PC is called upon to perform additional tasks that most computers are not asked to perform: speech and handwriting recognition take processor horsepower, as does the digitizer interface itself. Newer Tablet PCs are mostly at or above 1 GHz in processor speed.

Desktops
Unfortunately, the Tablet PC is slower than most new PCs, and with a price tag hovering around two to three times that of a desktop, this can be troubling. If, however, you have been waiting for a more natural input method, you frequently deal with digital photographs and graphics, or you travel often, then the Tablet PC may be a great tool for you. Many people do not realize that write-on digitizer displays are available now for use in graphics applications. Designers, however, have been using them for some time.

Pocket PC and Other PDAs
For most people, even die-hard PDA users, the screen on a PDA is just too small. The Pocket PC interface is excellent compared to other offerings, but it still comes up short when dealing with more than a small checklist. In addition, most PDAs do not have a great way to input large amounts of data. Although external keyboards can be attached, the screen is still too small for much useful work. Even so, the Pocket PC does remain more mobile than the Tablet PC.

Other Pen-Based Computers
One of the key differences between Tablet PCs and other pen-based computers is that the Tablet PC is not a touch-screen device. This is both good and bad, depending on your viewpoint. The Tablet PC is designed to use a pen for input. One cannot use a finger, stick, pencil, or anything else to write on the tablet. This is great for most of us who rest our hands on the tablet. Touch screens, on the other hand, require a hands-off approach; otherwise, the user will select something accidentally with the heel of his hand. In addition, the Tablet PC has the backing of Microsoft's Windows XP Tablet PC Edition. This provides handwriting and speech recognition, as well as numerous other tools and third-party add-ons. Other pen-based computers, many of which use Windows CE or an older version of Windows, do not have that. Any application that runs on Windows XP also runs on Tablet PCs. Certain applications require touch-screen input, and some do not. According to sources, the Tablet PC platform will not support touch-screen functionality in the future.

A.4 Tablet PC Differences
There are three basic hardware designs of Tablet PCs, and several variations of each that add value in particular areas.

A.4.1 Differences in Hardware
As shown in the figure below, Tablet PCs come in three basic designs:
■ The slate: A slate Tablet PC is an excellent choice for those who want the lightest device and who are focused on pen input (handwriting and sketching). If the user types often, he can connect a keyboard and mouse to a slate, and in some cases, a slate can be docked. When traveling with a slate, he will either need to carry a keyboard along or use the pen exclusively.
■ The convertible: A convertible Tablet PC has the clamshell design of a typical laptop, with a screen that can be twisted and folded down to create a pseudo-slate machine. Although convertibles tend to be a bit bulkier than true slates, the benefits of the built-in keyboard often outweigh the extra weight.
■ The hybrid: Hybrid Tablet PCs feature a special keyboard that easily attaches and detaches. When detached from the keyboard, the hybrid becomes a light slate; otherwise, it functions like a convertible. When attached to the slate, the keyboard can be folded over so that both may be carried together. (One downside of this design is


that when keyboard and slate are folded up, the screen is exposed, which could lead to screen damage.)

The three types of Tablet PCs: from left to right, the slate, the convertible, and the hybrid.

A.4.2 Input Options
The Tablet PC gives the user more options for input than any other mainstream computer. In addition to the traditional keyboard and mouse, he can use voice and pen input.

Keyboard
Of course everyone understands how to use a keyboard, but doing so is optional on the Tablet PC. That said, few people can escape using their keyboards entirely, so the user should be prepared to connect and disconnect one unless he has a convertible Tablet PC. If the user does use a keyboard with his Tablet PC, it will be USB-based. USB offers more flexibility for the keyboard, as well as auto-detection of the connection state (that is, detecting whether the keyboard is plugged in or not). Some USB keyboards even act as USB hubs, allowing other USB devices to connect to the computer through the keyboard. This may be a great option if the user often leaves his Tablet PC docked or somewhat distant from his keyboard. Some Tablet PCs have docking stations that can have a keyboard constantly attached, so that as soon as the user docks his Tablet PC, he has access to the keyboard.

The Digitizer Pen
When a user begins using a Tablet PC, he will begin to use a digitizer pen, a special device that communicates the pen movement to the Tablet PC. Most work passively, but some require batteries. A digitizer pen does not have ink, because ink would mar the Tablet PC screen. Instead, the digitizer pen has a plastic tip that senses the pressure exerted on it, and


transfers the location of the pen and the pressure information to the digitizer sandwiched behind the screen surface. This translates into cursor movement, and also into thicker or thinner marks in pressure-sensitive applications such as Windows Journal. A PDA stylus will not work with a Tablet PC because it is not a digitizer pen; a stylus is just a stick that isn't as fat as a finger, allowing the user to touch more specific regions of a PDA or touch-screen device.

A.4.3 Data Communications Options
The Tablet PC is one of the most mobile primary PC platforms available. Most Tablet PCs have built-in data communications capabilities, including Wi-Fi (wireless Ethernet 802.11), infrared, and even Bluetooth.

Wireless Ethernet
Wireless Ethernet lets the user reach the Internet and other network services without wires: the network cable that provides access to servers and the Internet at the office is no longer needed. The most common implementation of wireless Ethernet is 802.11b, also called Wi-Fi. It runs at 11 Mbps, which is slightly faster than traditional 10 Mbps Ethernet connections and quite a bit slower than 100 Mbps "Fast" Ethernet and 1,000 Mbps "Gigabit" Ethernet. However, Wi-Fi does not require wires. As with any radio-based communication device, obstacles such as walls and large objects can reduce transmission quality. If transmission quality is impaired, the data rate can drop to 2 Mbps or less (or the connection could even be dropped entirely), so a good wireless infrastructure is helpful in keeping the Tablet PC connected. Wi-Fi transmission quality (reception) can also be affected by other radio-based devices that transmit at the same frequency: some cordless phones, microwave ovens, and baby monitors operate in the 2.4 GHz frequency range, which conflicts with Wi-Fi and can cause trouble for the Tablet PC's Wi-Fi reception. Even with all these limitations, Wi-Fi is still an enabling technology. With it, one can run a meeting and have up-to-the-second data streaming onto a Tablet PC without wires. Old-style meetings with laptops, cables, a network hub, and tons of power cords will hopefully become a thing of the past. As battery life improves, a full-day meeting with no wires and access to everything will be possible; in fact, a few manufacturers sell Tablet PCs that can run eight hours on a single charge. Wi-Fi is available all around: airports, coffee shops, convention halls, office buildings, homes, and other places are ready for wireless connectivity. Some places require payment for using their network; others don't. With Windows XP Tablet PC Edition, the user has a platform that automatically detects the wireless networks around him and helps him get connected.

Infrared
There are two flavors of infrared: slow and fast. Fast operates at 4 Mbps, and slow is just plain slow. Numerous infrared devices are available, including printers and PDAs. Infrared is a good choice when two devices can be pointed at each other and left that way until all the data is transferred. Infrared requires that the transmitter/receiver of one device be in the line of sight of the other device's transmitter/receiver; to transfer data via infrared, the user needs his other IR device pointed directly at the port. One of the benefits of infrared is that it is one of the oldest and most widespread wireless technologies: it has been in laptops for years.
I would use infrared as a last resort, because other data-transfer methods are often faster. Even so, infrared is good in many situations, such as ad-hoc sharing of contact information with a PDA.

Bluetooth
Bluetooth is an excellent technology for Personal Area Networks (PANs). A PAN is the 5-meter-radius circle around the user that lets him connect everything within his vicinity. Bluetooth-enabled computers can communicate with Bluetooth-enabled phones, mice, keyboards, printers, and the like. The range is not as great as Wi-Fi's, but it is great for short-distance communication. Most Bluetooth devices automatically sense other Bluetooth components. The user will probably need to know the security code of the other device; this prevents others from gaining access to his Bluetooth-enabled device without his consent.

Many more illustrations and examples of Tablet PC capabilities, such as speech and handwriting recognition, voice control, and the importance of the Tablet PC in daily use, can be found on web sites like:
www.TabletGuru.com
www.microsoft.com/tabletpc
http://office.microsoft.com/downloads/2002/oxptp.aspx
http://communities.microsoft.com/Newsgroups/default.asp?ICP=windowsxp&sLCID=US&newsgroup=microsoft.public.windows.tabletpc
http://www.microsoft.com/windowsxp/tabletpc/evaluation/default.asp
www.tabletpcdeveloper.com
www.tabletnews.com
www.tabletpctalk.com
www.tabletpcs.net
and in beginners' guides like the one from which this appendix was prepared: Absolute Beginner's Guide to Tablet PCs by Craig F. Mathews, 2004, ToolKits, Inc., or magazines like "Tablet PC Magazine": www.tabletpcmagazine.com


Appendix B: ICR Companies and Commercial Products for Handwritten Text Recognition

B.1 A2iA - Artificial Intelligence & Image Analysis Company
This company has four useful products:
a. A2iA AddressReader
b. A2iA CheckReader
c. A2iA DocumentReader
d. A2iA FieldReader

B.1.1 A2iA AddressReader
A2iA AddressReader leverages Intelligent Word Recognition (IWR) technology to locate, identify and read handwritten addresses on envelopes. It accurately recognizes all types of handwritten words and characters from a variety of envelope types to maximize mail-sorting accuracy and efficiency within incoming/outgoing, internal and postal mail operations. A2iA AddressReader locates handwritten address blocks on envelopes and flat mail and segments them into lines and words. A2iA's advanced recognition technology then uses a custom dictionary or a national postal database (such as those of the U.S. Postal Service and the German Deutsche Post) to accurately read those words using IWR. It then provides the recognition results with confidence scores. Used for external and internal purposes, its versatility allows for reading department or employee names as well as post office boxes. It can be used for sorting outgoing mail in pre-sort operations that handle a combination of handwritten and machine-printed mail.

Applications
• Sorting of internal mail: A2iA AddressReader may be combined inside an application with a dictionary supplied by the client to recognize the address fields (internal codes, addressee and addressor names, services, departments, etc.).




• Sorting of postal mail: A2iA AddressReader extracts and recognizes the fields used for mail dispatch (postcode, city) and delivery (street number and name). The software interfaces with postal databases (the American USPS and the German Deutsche Post) to enhance recognition capabilities.

B.1.2 A2iA CheckReader
A2iA CheckReader automatically reads natural, freeform cursive handwritten and machine-printed documents, including business and personal checks, deposit slips, and cash-in and cash-out documents. A2iA CheckReader has been successfully employed within check processing, remittance processing, fraud detection, ATM, CRM, and Check 21 Image Quality and Usability applications. On check images, A2iA CheckReader can accurately locate and read:
• Courtesy and Legal Amounts (CAR+LAR)
• Address of the payer
• Date
• Payee name
• MICR codeline
• Presence of the signature

Applications
• A2iA CheckReader automatically recognizes courtesy and legal amounts (CAR+LAR) on all bank payment forms: checks, deposit slips, etc. The software can be used at both centralized and decentralized locations (banking branches or directly at large remitters).



• A2iA CheckReader can automate fraud detection. The software recognizes the names of the check's drawer and payee, and can spot fraudulent and money-laundering operations by validating checks against various blacklists.



• A2iA CheckReader recognizes checks deposited at ATMs (Automated Teller Machine check deposits) and completes the mandatory consistency controls. The software allows the address block printed on the check to be checked


against the data supplied by the bank card in order to verify the remitter's identity.
• A2iA CheckReader scans all of the information in the address block. This data can be used to manage customer knowledge: geomarketing, database building, etc.

B.1.3 A2iA DocumentReader
A2iA DocumentReader provides automated data capture, keyword spotting, and intelligent document management for freeform cursive handwriting. Its capabilities are designed both for real-time business process management and workflow applications, such as spotting keywords to route incoming mail, and for knowledge management applications, where large batches of documents containing cursive handwriting become searchable for the first time.

Features and Benefits
• Low indexing cost and rapid access to information compared to manual transcription.
• The ability to perform ad-hoc searches of handwritten information, previously unusable in electronic form.
• Automated handwritten document classification and analysis streamlines workflow in paper-intensive industries and mailrooms.
• Better compliance as a result of making handwritten documents and their content searchable and by providing more customer information.

Applications
• Mailroom: processing of opened incoming mail.
• Sorting/Indexing: A2iA DocumentReader extracts key words and expressions to automate the sorting and indexing of incoming company mail.


B.1.4 A2iA FieldReader
A2iA FieldReader combines the most advanced OCR, ICR, IWR and proprietary cursive handwriting recognition technology to capture written data within structured documents. A2iA FieldReader incorporates IWR technology, which reduces spelling and character recognition errors. Instead of recognizing words character by character, IWR matches entire words and phrases against a user-defined dictionary, reducing the time-consuming process of proofreading and correcting the spelling and character errors encountered in other, less advanced recognition engines. The result is faster throughput and a more efficient workflow. A2iA FieldReader contains multiple recognition engines that are precision-engineered to take advantage of A2iA's artificial intelligence and neural network technology for the fastest and most accurate results possible. A2iA FieldReader automatically recognizes all types of fields containing cursive or machine-printed words:
• Constrained fields: comb and box fields
• Freehand/cursive and handprint
• Numeric
• Dates
• Check marks/OMR - machine print
• Overwriting detection

Applications
• Structured document processing: A2iA FieldReader recognizes all field types in a structured document. Administrative forms, transport vouchers, questionnaires and contracts are but some of the many documents that can be effortlessly analyzed using A2iA software.
• Archive indexing: specific customized versions of A2iA FieldReader recognize archives and convert the information into digital data. This technology provides new search criteria for the intelligent indexing of documents.




• Address recognition on forms: A2iA AddressPack for A2iA FieldReader locates and identifies the address fields in structured documents. The software interfaces with the main postal databases on the market (USPS, Mediapost, Deutsche Post) to improve the recognition results of the information fields (postcode, city, street name and number).

B.2 CereSoft Company
This company has two useful products:
a. FormAgent
b. DocAgent

B.2.1 FormAgent
Design: For a form to be processed, a template must be created that tells the system about the form's attributes, data locations and data types. Design provides a template wizard to guide users through the design of these templates. Design also automatically performs form identification, form registration, auto-deskew, rotation, and inversion, so that correct and clean data images are extracted for recognition. After the form template is finished, its instructions are stored in an HTML text file. This file can be accessed by other components of the system; for example, FormAgent Verify reads it when in the Page Mode editor.

FreeStyle Recognition Engine FreeStyle employs CerebralNet technology which integrates, in real time, image, context, and linguistic validation rules to ensure the highest degree of accuracy possible. The FreeStyle recognition engine is seamlessly comprehensive. It recognizes both handwritten and machine-printed words simultaneously and automatically. It also recognizes OMR, barcodes, logos, and form identities.


Besides the usual built-in validation rules, such as lexical and geometrical context, used to enhance accuracy, FormAgent uses a real-time higher-level context known as data objects to assist the recognition process. For example, FormAgent cross-checks the street, city, state, and zip code of an address from the beginning of, rather than after, the recognition process. In this way, the system dramatically cuts down on mistakes. An otherwise unreadable address can be correctly read in most cases. Other data objects include dates, phone numbers, and e-mail addresses. FreeStyle also uses a large dictionary to limit candidates to recognized words. The Trueword module is built into the recognition engine from the start, rather than added as a post-processing tool as other vendors often do. Recognition results are stored in an HTML text file. This file contains such information as the multiple choices for each character and word, their confidences, geometric locations, etc. This information can easily be accessed by FormAgent's other components and modules. For example, Verify uses it to improve the efficiency of the editing process.

Verify: FormAgent invokes a carefully designed editor to correct any remaining recognition errors. The editor has two editing modes. In Field Mode, individual fields are presented to users one at a time for manual correction. But if the field alone cannot resolve the ambiguity of a word, Page Mode can be accessed by double-clicking the field image. In Page Mode, the editing field and the original image in the page are highlighted together for easy identification. Alternative choices for each character and field are displayed at all times along with the field being edited.

Export: FormAgent can export the extracted data in either ASCII text format or any ODBC-compliant database format. The Export Manager allows exporting data to multiple databases in multiple formats. It also allows exporting data to spreadsheets, EDI (electronic data interchange) or web-form format.


B.2.2 DocAgent
Until now, forms processing applications have only been successfully used to process "static" or pre-defined forms that have a fixed design. This was due to the need for a "template" that identified the areas or zones from which data was to be extracted. This limitation prevented the use of automated data extraction systems on the majority of business forms. An example of such a business form is an invoice that is presented for payment. Invoices contain the same information (i.e. vendor name, date of invoice, purchase order number, terms, and amount due), but they often look different. Therefore, it is very difficult to use a standard template for extracting data from invoices. Until recently, data extraction from this type of form was performed by manual data entry into an accounts payable application. Other examples are:
o Payment vouchers: every payment received will have a different layout for the remittance advice attached to the check.
o Utility bills: these documents have both fixed and variable locations for the data on each bill. The top is usually the same, but the amount-due section of the document will almost always fluctuate in length and position.
o Business checks: personal checks all look alike for the most part. However, each business will use a check document that is totally different from another.
Thus, the document becomes a "dynamic form," and one needs dynamic capability in order to process it. DocAgent actually reads documents, locates the needed information, and moves that data through a validation and correction procedure for export to an ASCII file or any ODBC-compliant database. It can also automatically sort, index, and capture documents in a production environment.


B.3 Paragraph International Company
This company has many products for on-line handwriting recognition:
a. Smartphone Products
b. Pocket PC Products
c. Tablet PC Products
d. Developer Products

B.4 Parascript Intelligent Recognition
This company has many useful products for:
a. Mailing and shipping
b. Check and remittance processing
c. Forms processing
d. Fraud detection and prevention

B.5 GDI (Graphics Development International) Company
This company has products for:
a. Form filling
b. Form reading
c. Document management
More details about the commercial products of different companies and their applications are available at "http://tev.itc.it/OCR/Products.html".


Appendix C: Off-line Isolated Arabic alphabet database

[Figure pages: sample images of the off-line isolated Arabic alphabet database appeared here.]

Appendix D: On-line Cursive Arabic alphabet database

I. Figures

1. Training Database:
Writer 1:


Writer 2:


Writer 3:



Writer 4:


2. Test Database:
Writer 5:

Writer 6:


Writer 7:

Writer 8:

II. Data
The x-y coordinate details of both the training and test samples are on the attached CD.

Appendix E: Online Arabic Pattern Shapes

[Figure pages: the 91 on-line Arabic pattern-shape images, numbered 1 to 91, appeared here.]

Pattern Shape Names:
Each pattern shape was named using the following fields:
1. Character name: Alef, Baa, Taa, etc.
2. Writing technique number: the technique by which writers tend to write this character.
3. Hamza exists: a flag showing whether the character has an associated complementary hamza (HE), none (NHE), or a madda (ME).
4. Alef exists: a flag showing whether the character has an associated complementary alef (AE) or not (NAE).
5. Kaf hat exists: a flag showing whether the character has an associated complementary hat/cap (CE) or not (NCE).
6. Number of dots: 0, 1, 2 or 3.
7. Position of the complementary stroke: None, Up, Down or Middle.
8. Position of the main stroke in the word: First (F), Middle (M), End (E) or Isolated (I).
9. Complete or segmented: whether written as a whole (C) or as segments (S).
10. Touching other strokes: whether the main stroke touches other word strokes from the left (L), the right (R), or both (LR); None if isolated.

1. M_0Lig2_m1_NHE_NAE_NCE_0dot_None_I_C_None
2. M_0Lig2_m2_NHE_NAE_NCE_0dot_None_I_C_None
3. M_0Alef_m1_NHE_NAE_NCE_0dot_None_F_C_None
4. M_0Alef_m2_NHE_NAE_NCE_0dot_None_F_C_None
5. M_0Alef_m1_NHE_NAE_NCE_0dot_None_I_C_None
6. M_0Alef_m2_NHE_NAE_NCE_0dot_None_I_C_None
7. M_0Alef_m1_NHE_NAE_NCE_0dot_None_E_C_000R
8. M_Madda_m1_NHE_NAE_NCE_0dot_None_I_C_None
9. M_Hamza_m1_NHE_NAE_NCE_0dot_None_I_C_None
10. M_Hamza_m2_NHE_NAE_NCE_0dot_None_I_C_None
11. M_Hamza_m3_NHE_NAE_NCE_0dot_None_I_C_None
12. M_Hamza_m4_NHE_NAE_NCE_0dot_None_I_C_None


13. M_Hamza_m5_NHE_NAE_NCE_0dot_None_I_C_None
14. M_Nabra_m1_NHE_NAE_NCE_0dot_None_F_C_000L
15. M_Nabra_m1_NHE_NAE_NCE_0dot_None_M_C_00LR
16. M_Nabra_m3_NHE_NAE_NCE_0dot_None_M_C_00LR
17. M_Nabra_m1_NHE_NAE_NCE_0dot_None_I_C_None
18. M_Nabra_m2_NHE_NAE_NCE_0dot_None_I_C_None
19. M_Nabra_m1_NHE_NAE_NCE_0dot_None_E_C_000R
20. M_Nabra_m2_NHE_NAE_NCE_0dot_None_E_C_000R
21. M_0Haah_m1_NHE_NAE_NCE_0dot_None_F_C_000L
22. M_0Haah_m3_NHE_NAE_NCE_0dot_None_F_C_000L
23. M_0Haah_m4_NHE_NAE_NCE_0dot_None_F_C_000L
24. M_0Haah_m1_NHE_NAE_NCE_0dot_None_I_C_None
25. M_00Dal_m1_NHE_NAE_NCE_0dot_None_I_C_None
26. M_00Dal_m1_NHE_NAE_NCE_0dot_None_E_C_000R
27. M_00Dal_m1_NHE_NAE_NCE_0dot_None_E_S_000R
28. M_00Raa_m1_NHE_NAE_NCE_0dot_None_I_C_None
29. M_00Raa_m2_NHE_NAE_NCE_0dot_None_I_C_None
30. M_00Raa_m1_NHE_NAE_NCE_0dot_None_E_C_000R
31. M_00Raa_m1_NHE_NAE_NCE_0dot_None_E_S_000R
32. M_0Seen_m1_NHE_NAE_NCE_0dot_None_F_C_000L
33. M_0Seen_m1_NHE_NAE_NCE_0dot_None_M_C_00LR
34. M_0Seen_m1_NHE_NAE_NCE_0dot_None_E_C_000R
35. M_00Sad_m1_NHE_NAE_NCE_0dot_None_F_C_000L
36. M_00Sad_m1_NHE_NAE_NCE_0dot_None_M_C_00LR
37. M_00Sad_m1_NHE_NAE_NCE_0dot_None_I_C_None
38. M_00Sad_m1_NHE_NAE_NCE_0dot_None_E_C_000R
39. M_00Tah_m1_NHE_NAE_NCE_0dot_None_I_C_None
40. M_00Tah_m1_NHE_NAE_NCE_0dot_None_E_C_00LR
41. M_00Tah_m2_NHE_NAE_NCE_0dot_None_F_C_000L
42. M_00Ein_m1_NHE_NAE_NCE_0dot_None_F_C_000L
43. M_00Ein_m1_NHE_NAE_NCE_0dot_None_M_S_00LR
44. M_00Ein_m1_NHE_NAE_NCE_0dot_None_I_C_None
45. M_00Ein_m1_NHE_NAE_NCE_0dot_None_E_S_000R
46. M_00Faa_m1_NHE_NAE_NCE_0dot_None_F_C_000L

47. M_00Faa_m2_NHE_NAE_NCE_0dot_None_F_C_000L
48. M_00Faa_m2_NHE_NAE_NCE_0dot_None_I_C_None
49. M_00Faa_m1_NHE_NAE_NCE_0dot_None_E_C_000R
50. M_00Faa_m1_NHE_NAE_NCE_0dot_None_M_C_00LR
51. M_00Qaf_m1_NHE_NAE_NCE_0dot_None_I_C_None
52. M_00Qaf_m2_NHE_NAE_NCE_0dot_None_I_C_None
53. M_00Kaf_m1_NHE_NAE_NCE_0dot_None_E_C_000R
54. M_00Kaf_m1_NHE_NAE_NCE_0dot_None_F_C_000L
55. M_00Kaf_m2_NHE_NAE_NCE_0dot_None_F_C_000L
56. M_00Kaf_m3_NHE_NAE_NCE_0dot_None_F_C_000L
57. M_00Kaf_m1_NHE_NAE_NCE_0dot_None_M_S_00LR
58. M_00Lam_m1_NHE_NAE_NCE_0dot_None_F_C_000L
59. M_00Lam_m2_NHE_NAE_NCE_0dot_None_F_C_None
60. M_00Lam_m1_NHE_NAE_NCE_0dot_None_M_C_00LR
61. M_00Lam_m2_NHE_NAE_NCE_0dot_None_M_C_000R
62. M_00Lam_m1_NHE_NAE_NCE_0dot_None_F_C_None
63. M_00Lam_m1_NHE_NAE_NCE_0dot_None_E_C_000R
64. M_0Meem_m5_NHE_NAE_NCE_0dot_None_F_C_000L
65. M_0Meem_m6_NHE_NAE_NCE_0dot_None_F_C_000L
66. M_0Meem_m7_NHE_NAE_NCE_0dot_None_F_C_000L
67. M_0Meem_m5_NHE_NAE_NCE_0dot_None_M_C_000R
68. M_0Meem_m8_NHE_NAE_NCE_0dot_None_M_C_000R
69. M_0Meem_m5_NHE_NAE_NCE_0dot_None_E_C_000R
70. M_0Meem_m7_NHE_NAE_NCE_0dot_None_E_C_000R
71. M_0Meem_m5_NHE_NAE_NCE_0dot_None_I_C_None
72. M_0Noon_m1_NHE_NAE_NCE_0dot_None_E_C_000R
73. M_0Noon_m1_NHE_NAE_NCE_0dot_None_I_C_None
74. M_00Haa_m1_NHE_NAE_NCE_0dot_None_F_C_000L
75. M_00Haa_m2_NHE_NAE_NCE_0dot_None_F_C_000L
76. M_00Haa_m1_NHE_NAE_NCE_0dot_None_M_C_00LR
77. M_00Haa_m2_NHE_NAE_NCE_0dot_None_M_C_00LR
78. M_00Haa_m4_NHE_NAE_NCE_0dot_None_M_C_00LR
79. M_00Haa_m1_NHE_NAE_NCE_0dot_None_I_C_None
80. M_00Haa_m2_NHE_NAE_NCE_0dot_None_I_C_None

81. M_00Haa_m3_NHE_NAE_NCE_0dot_None_I_C_None
82. M_00Haa_m1_NHE_NAE_NCE_0dot_None_E_C_000R
83. M_00Haa_m2_NHE_NAE_NCE_0dot_None_E_C_000R
84. M_00Haa_m3_NHE_NAE_NCE_0dot_None_E_C_000R
85. M_00Haa_m3_NHE_NAE_NCE_0dot_None_E_S_000R
86. M_00Waw_m1_NHE_NAE_NCE_0dot_None_I_C_None
87. M_00Yaa_m1_NHE_NAE_NCE_0dot_None_I_C_None
88. M_00Yaa_m1_NHE_NAE_NCE_0dot_None_E_C_000R
89. M_00Yaa_m2_NHE_NAE_NCE_0dot_None_E_C_000R
90. M_0Lig1_m1_NHE_NAE_NCE_0dot_None_F_C_000L
91. M_0Lig1_m1_NHE_NAE_NCE_0dot_None_M_C_00LR
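Because every field occupies a fixed, underscore-delimited position, the names above can be decoded mechanically. The following Python sketch is illustrative only (it is not part of the thesis software), and the dictionary keys are my own labels for the ten fields defined above:

```python
def parse_pattern_name(name: str) -> dict:
    """Split an underscore-delimited pattern-shape name into its fields.

    The field order follows the naming scheme defined in Appendix E:
    character, technique, hamza/alef/cap flags, dots, complementary
    position, word position, completeness, touching.  The key names
    are illustrative labels, not taken from the thesis.
    """
    parts = name.split("_")
    return {
        "character": parts[1],      # e.g. '0Alef', 'Nabra', '0Meem'
        "technique": parts[2],      # e.g. 'm1', 'm2'
        "hamza": parts[3],          # 'HE', 'NHE' or 'ME'
        "alef": parts[4],           # 'AE' or 'NAE'
        "cap": parts[5],            # 'CE' or 'NCE'
        "dots": parts[6],           # '0dot' .. '3dot'
        "secondary_pos": parts[7],  # 'None', 'Up', 'Down' or 'Middle'
        "word_pos": parts[8],       # 'F', 'M', 'E' or 'I'
        "completeness": parts[9],   # 'C' (complete) or 'S' (segmented)
        "touching": parts[10],      # 'None', '000L', '000R' or '00LR'
    }
```

For example, name 3 above decodes to an Alef written with technique m1, no complementary strokes, first in its word, complete, and touching nothing.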


Appendix F: Algorithms

1. Text Lines and Words Extraction Algorithm
1. Define a 2-D array ("Words") where elements in the same row belong to the same word (Forward Projection "FP" group).
2. Define two pointers to the rows and columns of the array "Words", respectively ("X_Pointer") and ("Y_Pointer").
3. Stroke 1 must belong to the first word; add stroke 1 to "Words".
4. Increment the column pointer ("Y_Pointer") for the new stroke.
5. Define the horizontal space between non-touching and non-overlapping strokes ("Deltax") and initialize it to zero.
6. Define a global pointer that points to the left margin of the word ("LeftBorder") and initialize it to the left border of stroke 1.
7. Define a global pointer that points to the right margin of the word ("RightBorder") and initialize it to the right border of stroke 1.
8. Define a flag that indicates whether or not the current word is the first word in a new text line ("LineHead") and initialize it to zero.
9. Loop on all strokes in the document; let the loop counter be ("i"):
   a) Define a flag that tests whether stroke i+1 touches the previous stroke i ("Touching") and initialize it to zero.
   b) Define a flag that tests whether stroke i+1 overlaps the previous stroke i ("OverlapWithPrevWord") and initialize it to zero.
   c) Loop on all previous strokes (1 -> i) to find all document stroke widths.
   d) Find the maximum stroke width in the whole document -> ("MaxWidth").
   e) Find the minimum stroke width in the whole document -> ("MinWidth").
   f) Find the average stroke width in the whole document -> ("AvgWidth").
   g) Define the x-axis displacement between the last sample of stroke i and the first sample of stroke i+1 ("X_Displacement"), used to detect new text lines, and initialize it with the displacement between strokes i and i+1.
   h) Search for a common point (same x-y coordinate) belonging to both strokes i and i+1.
   i) If a common point is found, set the flag ("Touching") to TRUE.
   j) If (Touching = TRUE), then stroke i+1 belongs to the same word as stroke i:
      i- Adjust the position of the word left margin pointer ("LeftBorder") to the left border of stroke i+1.
      ii- Add stroke number i+1 to the current word -> Words(X_Pointer,Y_Pointer).
      iii- Increment the column pointer ("Y_Pointer") for the new stroke.
   k) Elseif the right border of stroke i+1 is >= the word left margin pointer ("LeftBorder"), testing the x-axis overlap for non-touching strokes:
      i- If the left border of stroke i+1 is [...]:
         • Add stroke number i+1 to the current word -> Words(X_Pointer,Y_Pointer).
         • Increment the column pointer ("Y_Pointer") for the new stroke.
      ii- Elseif (the left border of stroke i+1 is > the word left margin pointer ("LeftBorder")) and (X_Displacement > 2*MaxWidth), then strokes i and i+1 are not overlapping and i+1 belongs to a new word on a new text line:
         • Set the ("LineHead") flag to 1 because the current word containing stroke i+1 is the head of the new text line.
         • Adjust the position of the word right margin pointer ("RightBorder") to the right border of stroke i+1.
         • Adjust the position of the word left margin pointer ("LeftBorder") to the left border of stroke i+1.
         • Increment the row pointer ("X_Pointer") for the new word.
         • Set the column pointer ("Y_Pointer") to the first stroke.
         • Add stroke number i+1 to the current word -> Words(X_Pointer,Y_Pointer).
         • Increment the column pointer ("Y_Pointer") for the new stroke.
      iii- Elseif (X_Displacement [...]), with the strokes to be regrouped stored in ("Tmp"):
         ⇒ Delete the current FP group.
         ⇒ Return to the previous FP group; decrement the word pointer ("X_Pointer").
         ⇒ Adjust the column pointer ("Y_Pointer") to the first empty place.
         ⇒ Add the stored strokes ("Tmp") to the current FP group while incrementing the column pointer ("Y_Pointer").
         ⇒ Add stroke i+1 to the current FP group -> Words(X_Pointer,Y_Pointer).
         ⇒ Increment the column pointer ("Y_Pointer") for the new stroke.
         • Elseif there is no overlapping with a previous FP group, then strokes i and i+1 are overlapping and belong to the same word (FP group):
            ⇒ Add stroke number i+1 to the current word -> Words(X_Pointer,Y_Pointer).
            ⇒ Increment the column pointer ("Y_Pointer") for the new stroke.
   l) Elseif the right border of stroke i+1 is < the word left margin pointer ("LeftBorder"), then strokes i and i+1 are not overlapping:
      i- Find the x-axis gap between strokes i and i+1 -> ("Gap").
      ii- Set the x-axis threshold against which the inter-stroke distance is compared, ("Deltax") = ("AvgWidth"), so that it is adjustable for every new FP group.
      iii- If the x-axis gap between strokes i and i+1 is less than or equal to the x-axis threshold, then strokes i and i+1 are not overlapping but belong to the same word:
         • Adjust the position of the word left margin pointer ("LeftBorder") to the left border of stroke i+1.
         • Add stroke number i+1 to the current word -> Words(X_Pointer,Y_Pointer).
         • Increment the column pointer ("Y_Pointer") for the new stroke.
      iv- Elseif the x-axis gap between strokes i and i+1 is greater than the x-axis threshold, then strokes i and i+1 are not overlapping and i+1 must be the beginning of a new word:
         • Adjust the position of the word right margin pointer ("RightBorder") to the right border of stroke i+1.
         • Adjust the position of the word left margin pointer ("LeftBorder") to the left border of stroke i+1.
         • Increment the row pointer ("X_Pointer") for the new word.
         • Set the column pointer ("Y_Pointer") to the first stroke.
         • Add stroke number i+1 to the current word -> Words(X_Pointer,Y_Pointer).
         • Increment the column pointer ("Y_Pointer") for the new stroke.
         • Reset the flag ("LineHead"), indicating that the new word is not the beginning of a text line.
   m) Increment the loop counter ("i") which loops on all strokes in the document.
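The gap test at the heart of the algorithm (steps l.i to l.iv) can be sketched as follows. This is a simplified illustration, not the thesis implementation: each stroke is reduced to its (left, right) x-extent, strokes are assumed to arrive in writing order (right to left, as in Arabic), and the touching and new-text-line tests are omitted:

```python
def group_strokes_into_words(strokes, avg_width):
    """Group strokes into words by the horizontal gap between them.

    `strokes` is a list of (left, right) x-extents in writing order.
    A stroke whose right border still reaches into the current word,
    or whose gap from it is at most `avg_width`, joins that word; a
    larger gap starts a new word.  This mirrors the Deltax = AvgWidth
    test of the algorithm above; touching and text-line detection are
    left out of this sketch.
    """
    words = [[0]]                       # word 0 starts with stroke 0
    left_border = strokes[0][0]         # left margin of current word
    for i in range(1, len(strokes)):
        left, right = strokes[i]
        gap = left_border - right       # horizontal gap to the word
        if right >= left_border or gap <= avg_width:
            words[-1].append(i)         # overlap or small gap: same word
        else:
            words.append([i])           # large gap: new word
        left_border = min(left_border, left)
    return words
```

For instance, `group_strokes_into_words([(8, 10), (6, 8.5), (1, 3)], avg_width=1)` merges the first two strokes into one word and starts a second word at the third stroke.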

2. Stroke Type Identification Algorithm
1. Define a 2-D array ("MainStrokes") where elements in the same row are the main strokes belonging to the same word (Forward Projection "FP" group).
2. Define a pointer to the main strokes array ("Pointer1").
3. Define a 2-D array ("SecondaryStrokes") where elements in the same row are the secondary strokes belonging to the same word.
4. Define a pointer to the secondary strokes array ("Pointer2").
5. Define a 2-D array ("Registery") containing a full description of the secondary strokes (type, position, nearest main stroke, ...).
6. Define a pointer to the registery array ("Pointer3").
7. Find the minimum stroke width in the whole document -> ("min_width").
8. Loop on each word group; let the loop counter be ("s"):
   a) Count all strokes found in the current word group.
   b) If the word has only 1 stroke, then it must be a main stroke:
      i- Add the current stroke to the main strokes belonging to the current word -> MainStrokes(s,1).
      ii- Set the pointers (Pointer1 & Pointer2) to 1 for the next word loop.
   c) Elseif the word has 2 strokes, then the first one is main and the other is either main or secondary:
      i- Add the first stroke to the main strokes belonging to the current word -> MainStrokes(s,1).
      ii- Define a variable ("nearest") representing the nearest main stroke to the secondary within the same FP group (in case the second one is a secondary).
      iii- The first stroke must be the nearest main to the other stroke.
      iv- Define a variable ("Height") representing the height of the first stroke.
      v- Test the y-axis projection overlap.
      vi- Detect the second stroke's curvature (in case the second stroke is a kaf hat or hamza).
      vii- If the second stroke lies in the upper zone and is straight (no curvature) with height greater than a quarter of the first stroke, then it must be main (probably a kaf hat):
         • Add the current stroke to the main strokes belonging to the current word -> MainStrokes(s,2).
      viii- Elseif the second stroke lies in the upper zone, or has a relatively large size with its centroid in the upper zone and has curvatures, then it must be secondary (most probably a hamza):
         • Add the current stroke to the secondary strokes belonging to the current word -> SecondaryStrokes(s,1).
         • Write the secondary stroke information to the registery ("Registery(Pointer3,:)"): FP group number, secondary stroke number, type (dot, hamza, ...), nearest main stroke number, position with respect to the nearest main, left margin, right margin, height and width.
         • Increment the registery counter ("Pointer3") for new strokes.
      ix- Elseif the second stroke lies totally below the first stroke, it must be secondary (most probably a single dot or hamza):
         • Add the current stroke to the secondary strokes belonging to the current word -> SecondaryStrokes(s,1).
         • Write the secondary stroke information to the registery ("Registery(Pointer3,:)"): FP group number, secondary stroke number, type (dot, hamza, ...), nearest main stroke number, position with respect to the nearest main, left margin, right margin, height and width.
         • Increment the registery counter ("Pointer3") for new strokes.
      x- Elseif the secondary height is less than 40 percent of the main stroke height, it must be a single dot; otherwise it must be a hamza:
         • Add the current stroke to the secondary strokes belonging to the current word -> SecondaryStrokes(s,1).
         • Write the secondary stroke information to the registery ("Registery(Pointer3,:)"): FP group number, secondary stroke number, type (dot, hamza, ...), nearest main stroke number, position with respect to the nearest main, left margin, right margin, height and width.
         • Increment the registery counter ("Pointer3") for new strokes.
      xi- Elseif all the above conditions failed, then the second stroke must be a main stroke:
         • Add the current stroke to the main strokes belonging to the current word -> MainStrokes(s,2).
         • Set the pointers (Pointer1 & Pointer2) to 1 for the next word loop.
   d) Elseif the word group contains 3 or more strokes:
      i- The first stroke must be a main stroke -> MainStrokes(s,1).
      ii- Initially the first stroke is the nearest main -> ("nearest").
      iii- Increment the pointer of the main stroke array ("Pointer1") for the new stroke.
      iv- Define a global pointer to the stroke left margin and initialize it to the left margin of the first stroke ("Left_Border").
      v- Define a global pointer to the stroke right margin and initialize it to the right margin of the first stroke ("Right_Border").
      vi- Loop on each stroke in the current FP group; let the loop counter be ("i"):
         • If the previous main stroke was not the first word stroke, search all previous main strokes for the nearest one to the current stroke -> ("nearest").
         • Adjust the global left margin ("Left_Border") to the left margin of the nearest main stroke.
         • Adjust the global right margin ("Right_Border") to the right margin of the nearest main stroke.
         • Find the average height of the main strokes currently detected within the current FP group -> ("Avg_Height").
         • Find the average width of the main strokes currently detected within the current FP group -> ("Avg_Width").
         • Let the minimum acceptable main stroke height within the current FP group be 1/3 of the average main stroke height.
         • Detect the current stroke's curvature.
         • If there is total or partial overlap on the x-axis and minimum Y_Histogram overlap:
            ⇒ If the current stroke lies totally above or below the nearest main and its height is greater than 70% of ("Avg_Height") within the current FP group, then it must be a main stroke:
               o Add the current stroke to the main strokes belonging to the current word.
               o Increment the column pointer for the new stroke ("Pointer1").
               o If the left margin of the current main stroke is less than the global left margin, adjust ("Left_Border") to the left margin of the current main stroke.
               o If the right margin of the current main stroke is larger than the global right margin, adjust ("Right_Border") to the right margin of the current main stroke.
            ⇒ Elseif the current stroke is small compared to the preceding main strokes, it is detected through its location: it must be on the same y level and lie to the left of the previous strokes:
               o Add the current stroke to the main strokes belonging to the current word.
               o Increment the column pointer for the new stroke ("Pointer1").
               o If the left margin of the current main stroke is less than the global left margin, adjust ("Left_Border") to the left margin of the current main stroke.
               o If the right margin of the current main stroke is larger than the global right margin, adjust ("Right_Border") to the right margin of the current main stroke.
            ⇒ Elseif the current stroke's height is greater than 30% and less than 70% of ("Avg_Height") with medium width, it must be a main stroke:
               o Add the current stroke to the main strokes belonging to the current word.
               o Increment the column pointer for the new stroke ("Pointer1").
               o If the left margin of the current main stroke is less than the global left margin, adjust ("Left_Border") to the left margin of the current main stroke.
               o If the right margin of the current main stroke is larger than the global right margin, adjust ("Right_Border") to the right margin of the current main stroke.
            ⇒ Elseif all the above conditions failed, it must be a secondary stroke:
               o Add the current stroke to the secondary strokes belonging to the current word -> SecondaryStrokes.
               o Write the secondary stroke information to the registery ("Registery(Pointer3,:)"): FP group number, secondary stroke number, type (dot, hamza, ...), nearest main stroke number, position with respect to the nearest main, left margin, right margin, height and width.
               o Increment the registery counter ("Pointer3") for new strokes.
         • Elseif the current stroke is large and has a large common y-histogram, it must be main:
            ⇒ Add the current stroke to the main strokes belonging to the current word.
            ⇒ Increment the column pointer for the new stroke ("Pointer1").
            ⇒ If the left margin of the current main stroke is less than the global left margin, adjust ("Left_Border") to the left margin of the current main stroke.
            ⇒ If the right margin of the current main stroke is larger than the global right margin, adjust ("Right_Border") to the right margin of the current main stroke.
      vii- Increment the stroke loop counter ("i").
      viii- Set the pointers (Pointer1 & Pointer2) to 1 for the next word loop.
9. Increment the word loop counter ("s").
10. It is time to restore main strokes that were detected as secondaries.
11. Loop on each word group; let the loop counter be ("s"):
    a) Count all secondary strokes in the current word group.
    b) Find the average secondary stroke area for the whole document -> ("Avg_SA").
    c) If the secondary array of the current word contains 1 stroke with area > 50000, it must be main:
       i- Add it to the main strokes of the same word.
       ii- Delete the secondary stroke from the secondary registery.
       iii- Delete the secondary stroke from the secondary array.
    d) Elseif the secondary array of the current word contains 2 strokes with stroke 1 area > 50000, it must be main:
       i- Add the first stroke to the main strokes of the same word.
       ii- Delete the first stroke from the secondary registery.
       iii- Delete the first stroke from the secondary array.
    e) Elseif the secondary array of the current word contains 2 strokes with stroke 2 area > 50000, it must be main:
       i- Add the second stroke to the main strokes of the same word.
       ii- Delete the second stroke from the secondary registery.
       iii- Delete the second stroke from the secondary array.
    f) Elseif the secondary array of the current word contains 3 or more strokes:
       i- Let Threshold = the sum of the secondary areas below 1.5 of ("Avg_SA").
       ii- If a stroke has an area > 1.5 * the average area and larger than Threshold, it must be main:
          • Add the stroke to the main strokes of the same word.
          • Delete the secondary stroke from the secondary registery.
          • Delete the secondary stroke from the secondary array.
12. Increment the word loop counter ("s").
13. Define a 2-D array ("MainStrokeRegistery") containing a full description of the main strokes (connection direction (left, right, right and left), secondaries attached (number, position)).
14. Define a pointer to the MainStrokeRegistery array ("MSRP").
15. Loop on each word group; let the loop counter be ("s"):
    a) If the ("MainStrokes") array of the current word contains 1 stroke, then the connection direction is zero (not connected to any other stroke) and all secondary strokes of the current word belong to it.
    b) Elseif the ("MainStrokes") array of the current word contains more than 1 stroke:
       i- Find the nearest distance between the current stroke and the previous and next strokes; if it is less than some threshold, consider it connected to that stroke either from the left (next), from the right (previous), or both.
       ii- Loop on the secondary strokes of the current word and assign to the current main stroke the secondary that lies in the same x range.
    c) Increment the main stroke loop counter ("MSRP").
16. Increment the word loop counter ("s").
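The two-stroke case (step 8.c) reduces to a few geometric tests. The sketch below illustrates only the below-the-main and 40-percent-height rules from the algorithm; the field names and the y-grows-downward convention are my own assumptions:

```python
def classify_second_stroke(main, second):
    """Classify the second stroke of a two-stroke word as 'main' or
    'secondary' (dot/hamza), in the spirit of the thesis rules.

    Each stroke is a dict with 'top', 'bottom' and 'height', where y
    grows downward (an assumption of this sketch).  A stroke lying
    entirely below the main stroke, or shorter than 40% of its height,
    is taken to be secondary; everything else is kept as main.
    """
    if second["top"] > main["bottom"]:
        return "secondary"      # lies totally below the main stroke
    if second["height"] < 0.4 * main["height"]:
        return "secondary"      # too short: a single dot or a hamza
    return "main"
```

The full algorithm additionally inspects curvature and upper-zone position to separate a kaf hat from a hamza; those tests are omitted here.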

3. Training Algorithm
1. Define an array ("Training_Files") containing the training file path names.
2. Define a registery of the writing techniques of each pattern shape ("ChainCode_Registery").
3. Read the pattern shape names and store them in ("Models_Names").
4. Define an array ("Model_Repetition") containing the number of times each pattern shape is repeated in the training files.
5. Loop on the transcription training file names; let the loop counter be ("i"):
   a) Read the required training data (stroke samples).
   b) Open the corresponding training file transcription.
   c) Read the current transcription training file.
   d) Loop on all transcription file text lines:
      i- Skip the line containing the word number.
      ii- Read the number of strokes needed to form the pattern shape.
      iii- Loop on the number of parts composing the pattern shape:
         • Read the current stroke part number.
         • Read the start point of the current stroke part.
         • Read the end point of the current stroke part.
         • Extract the chain code.
         • Store all direction distances.
   e) Let the distance threshold below which directions are discarded = 0.005 * max(all direction distances).
   f) Loop on all transcription file text lines; let the loop counter be ("m"):
      i- Skip the line containing the word number.
      ii- Loop on the pattern shape names; let the loop counter be ("j"):
         • Select the current pattern shape name.
         • If the current pattern shape name is found in the current training file name:
         • Read the number of strokes needed to form the pattern shape.
         • Define an array ("Writing_Direction") containing the chain code of the writing direction of the current pattern shape name in the current training file name.
         • Loop on the number of parts composing the stroke; let the loop counter = k:
            ⇒ If k > 1, before finding the chain code of the new segment, store the pen-off movement direction:
               o Read the end point of the previous stroke part.
               o Read the current stroke part number.
               o Read the start point of the current stroke part.
               o Read the end point of the current stroke part.
               o Add the pen-off direction between the start point (current) and the end point (previous) to the 'Writing_Direction' array.
            ⇒ Read the current stroke part number.
            ⇒ Read the start point of the current stroke part.
            ⇒ Read the end point of the current stroke part.
            ⇒ Extract the chain code, discard small-distance directions, and add it to the 'Writing_Direction' array.
            ⇒ Add the pen-off direction.
            ⇒ Increment the loop counter "k".
            ⇒ Increment the number of repetition times of the current pattern shape.
            ⇒ Add the current pattern shape's writing direction to the chain code registery.
         • Increment the loop counter "j".
      iii- Increment the loop counter "m".
6. Increment the name loop counter "i".
7. It is time to select the most common writing directions representing each pattern shape, i.e. to delete redundancies.
8. Loop on all pattern shapes; let the loop counter be ("k"):
   a) Loop on all repetitions of the current pattern shape; let the loop counter be ("x"):
      • Store the current writing directions of the current pattern shape repetition.
      • Loop on the predecessors of the current pattern shape repetition; let the loop counter be ("l"):
         ⇒ Store the current writing directions of the current pattern shape repetition's predecessor.
         ⇒ Find the minimum edit distance between the current pattern shape repetition and its predecessors.
         ⇒ If the current pattern shape repetition coincides with one of its predecessors, raise a flag and break the loop.
      • If no flag was raised, the current writing direction is unique and should be added; otherwise it is repeated and therefore discarded.
      • Increment the number of repetition times of each pattern shape.
      • Add the current pattern shape's writing direction to the chain code registery.
      • Increment the loop counter "l".
   b) Increment the loop counter "x".
9. Increment the loop counter "k".
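The chain codes and the minimum edit distance used in the training (and recognition) algorithms can be sketched as follows. The 8-direction Freeman code with 0 = east, counted counter-clockwise, is one common convention and may differ from the thesis's numbering; the distance-threshold filtering of short directions is omitted:

```python
import math

def chain_code(points):
    """Quantize a pen trajectory into 8 Freeman directions.

    `points` is a list of (x, y) samples; each consecutive pair is
    mapped to the nearest of 8 directions (0 = east, 1 = north-east,
    and so on, counting counter-clockwise).
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0)
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes

def edit_distance(a, b):
    """Standard Levenshtein distance between two chain-code sequences,
    computed row by row with O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]
```

Two repetitions of a pattern shape whose chain codes have edit distance zero are duplicates, which is how step 8 of the training algorithm discards redundant writing directions.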

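The redundancy-deletion pass above keeps a writing-direction string only when its minimum edit distance to every stored predecessor is non-zero. A minimal sketch, under the assumption that each repetition is stored as a list of chain-code symbols (`edit_distance` and `unique_writing_directions` are illustrative names, not the thesis implementation):

```python
def edit_distance(a, b):
    """Classic Levenshtein (minimum edit) distance by dynamic programming."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution / match
        prev = curr
    return prev[n]

def unique_writing_directions(repetitions):
    """Keep one representative per distinct writing-direction string:
    a repetition is discarded when it coincides (distance 0) with a predecessor."""
    registry = []
    for rep in repetitions:
        if not any(edit_distance(rep, kept) == 0 for kept in registry):
            registry.append(rep)
    return registry
```

Exact-coincidence testing could of course be done with plain equality; the edit distance is used here because the same routine serves the recognition stage, where non-zero distances matter.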
4. Recognition Algorithm

1. Read and store the pattern shape names
2. Open output file 1 to store the different character candidates
3. Open output file 2 to store the tree of choices constructed for each stroke before restoring dots
4. Open output file 3 to store the tree of choices constructed for each stroke after restoring dots
5. Loop on each word group, let the loop counter be ('s')
    a. let the variable ('counter') = 1
    b. loop on all strokes in the current word group, let the loop counter be ('m')
        i- define an empty array ('Writing_Direction') to hold the chain code of the writing direction of the current stroke in the current test file
        ii- read the current stroke number
        iii- read the start point of the current stroke part


        iv- read the end point of the current stroke part
        v- extract the chain code
        vi- let L be the length of the writing direction of the current test stroke
        vii- loop on all pattern shape names, let the loop counter be ('k')
            • loop on all representative patterns of the current pattern shape, let the loop counter be ('l')
                ⇒ store the current representative pattern of the current pattern shape --> Training_String
                ⇒ set the variable Threshold to 100000000000 (a very large initial value)
                ⇒ loop on the length of the writing direction of the current test stroke ('L'), let the loop counter be ('f')
                    o find the minimum edit distance between the test writing direction array (1->f) and Training_String
                    o let the number of matches ('Number_Of_Hits') = 0, and the last match index ('Idx') = 0
                    o loop on the test writing direction array (1->f), let the loop counter be ('r')
                        1) find a match between Training_String and the test writing direction array (r) with index > the last match index ('Idx')
                        2) if a match is found, increment the number of matches ('Number_Of_Hits') and set the last match index ('Idx') to the new match index value
                        3) increment the loop counter 'r'
                    o if no matches were found, let the number of matches ('Number_Of_Hits') = 1
                    o if (the minimum edit distance * length of Training_String / Number_Of_Hits) < Threshold
                        1) record the pattern shape name, the representative pattern number, the start location, the loop counter f (the segmentation 'end' location), and (the minimum edit distance * length of Training_String / Number_Of_Hits) in a matrix --> Decision_Matrix
                        2) set Threshold equal to (the minimum edit distance * length of Training_String / Number_Of_Hits)
                    o increment the loop counter 'f'
                ⇒ increment the loop counter 'l'
            • increment the loop counter 'k'
        viii- sort the Decision_Matrix in ascending order of total distance (the minimum edit distance * length of Training_String / Number_Of_Hits)
        ix- loop on the Decision_Matrix size, let the loop counter be ('r')
            • if the pattern shape cannot be connected from the left and the segmentation location < the length of the writing direction of the current test stroke ('L')
                ⇒ set the segmentation location to L
                ⇒ re-compute the total distance between the test writing direction array (1->L) and all representative patterns of the current pattern shape
            • increment the loop counter 'r'
        x- store the segmentation points of the current character that are < L
        xi- loop on the segmentation points, let the loop counter be ('f')
            • let L2 be the length of the writing direction of the current test stroke from the segmentation point (f) to L
            • repeat steps vii to ix
            • store the segmentation points of the current character that are < L2
            • repeat step xi for the third and fourth characters
        xii- increment the loop counter 'f'
        xiii- write the recorded data of the first, second, third, and fourth characters to output file 1
        xiv- loop on the Decision_Matrix of the second, third, and fourth characters and reject patterns that cannot be connected from the right
        xv- construct the tree by adding:
            • first characters of length L
            • first characters of length < L & second characters beginning at the first character's end & ending at L
            • first characters of length < L & second characters ending < L & third characters beginning at the second character's end & ending at L
            • first characters of length < L & second characters ending < L & third characters ending < L & fourth characters beginning at the third character's end & ending at L
            • add info: the start and end of each character and the total distance
        xvi- write the tree of choices constructed for each stroke before restoring dots to output file 2
        xvii- assign dots to all tree choices by trying different distributions of the total number of dots assigned to the stroke over the characters of each tree member, checking the validity of their number and location to remove invalid tree members, and replacing valid character choices by their ASCII codes
        xviii- remove redundant tree members
        xix- sort the tree in ascending order of total distance (the minimum edit distance * length of Training_String / Number_Of_Hits)
        xx- use a simple interface to convert the ASCII codes of the valid tree members to strings
        xxi- write the tree of choices constructed for each stroke after restoring dots to output file 3
        xxii- increment the loop counter 'm'
    c. increment the loop counter 's'
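The total distance used to fill and rank the Decision_Matrix is (minimum edit distance × length of Training_String) / Number_Of_Hits, where the hit count comes from the greedy left-to-right matching described in the inner loop. A hedged sketch of that score — the function names are ours, and a standard Levenshtein distance is assumed for the edit-distance step:

```python
def edit_distance(a, b):
    """Levenshtein (minimum edit) distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def count_hits(test_prefix, training):
    """Greedy in-order matching: each test symbol must match a training
    symbol at an index strictly greater than the last match ('Idx')."""
    hits, idx = 0, 0
    for symbol in test_prefix:
        for j in range(idx, len(training)):
            if training[j] == symbol:
                hits += 1
                idx = j + 1
                break
    return max(hits, 1)  # if no matches were found, Number_Of_Hits = 1

def total_distance(test_prefix, training):
    """(minimum edit distance * length of Training_String) / Number_Of_Hits,
    the quantity recorded in Decision_Matrix and used for ranking."""
    return edit_distance(test_prefix, training) * len(training) / count_hits(test_prefix, training)
```

Dividing by the hit count favours candidates that share many in-order symbols with the test prefix, so a training string that is merely short does not win by edit distance alone.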

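Step xv of the recognition algorithm can be read as enumerating sequences of up to four candidate characters whose segmentation spans tile the writing direction from its start to L. The simplified sketch below assumes each candidate is a (name, start, end, distance) tuple drawn from the Decision_Matrix; `build_tree` is an illustrative name, not the thesis routine:

```python
def build_tree(candidates, L, max_chars=4):
    """Enumerate sequences of up to max_chars candidates whose (start, end)
    spans exactly tile [1, L]; each tree member carries the character names
    and the summed total distance, sorted ascending as in step xix."""
    tree = []

    def extend(path, pos, dist):
        if pos == L:                      # the spans reach the stroke end
            tree.append(([c[0] for c in path], dist))
            return
        if len(path) == max_chars:
            return
        for name, start, end, d in candidates:
            if start == pos + 1 and end <= L:   # next char begins where the last ended
                extend(path + [(name, start, end, d)], end, dist + d)

    extend([], 0, 0.0)
    tree.sort(key=lambda member: member[1])  # ascending total distance
    return tree
```

With candidates covering [1,3] and [4,6] plus a single candidate covering [1,6], the tree contains both the two-character and the one-character reading of the stroke, ranked by their summed distance.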

Thesis Summary

Simulating the human performance of various activities by computer, reading for example, has been a subject of intensive research throughout the past three decades. The attention directed to this field is due not only to the fascinating challenges it poses, but also to the valuable benefits that can be gained by implementing this process and offering it as a product available on the market.

The human skill of writing is a distinguishing characteristic in which every person differs from every other. Writing consists of the marks a person draws on some surface in order to communicate with himself or with others. Despite technological progress and the appearance of advanced kinds of computers, writing has remained a means of communication and of recording information in daily life, because pen and paper are easier to use and handle than a computer keyboard.

Writing recognition is the process of converting language in its drawn form (handwritten or machine-printed) into the language symbols (characters) known to humans.

Recognizing human handwriting is a very demanding task owing to the large variations in writing styles and the connectivity of the characters of the language. Consequently, most of the efforts made by researchers in this field have been directed at the recognition of Latin, Japanese, and Chinese writing, since the characters of these languages are usually written separately. Arabic, on the contrary, faces several technical recognition problems that exist in no other language, which makes the goal all the more challenging.

The research carried out in the field of character recognition may in general be divided into four categories that differ according to the nature of the problem:
1. Recognition of isolated characters.
2. Segmenting the word before character recognition.
3. Segmenting the word and recognizing the characters at the same time.
4. Recognizing the characters of the word as a whole without resorting to segmentation.

The effort made in this thesis has been directed at the recognition of handwritten Arabic characters. This process has been studied from several aspects: recognition of isolated and cursive characters, on-line and off-line recognition, and the use of a single-writer or a multi-writer database. The goal was to achieve the best possible recognition accuracy using the simplest rule-based algorithms.


This thesis is divided into two parts:

In the first part, a system is proposed for the off-line recognition of isolated Arabic characters handwritten by a single writer. Most of the features known and used by researchers were employed. We were able to obtain results close to those reached by other researchers, while keeping the proposed system simple, through feature fusion and through dividing the Arabic characters into groups by their distinctive structural features.

In the second part, the early stages toward an on-line recognition system for cursive Arabic writing are presented. A database of the handwriting of several persons was collected, a new method for separating lines and words was proposed, and rule-based algorithms were used to segment the sub-words of the Arabic word and recognize the characters at the same time using dynamic programming. The output of these stages is an ordered list of possible decisions. This is a novel and distinctive approach that has not been used before in the field of Arabic character recognition, and we reached promising results: cursive characters were segmented and recognized well, with 95% of the correct results appearing at the top of the decision list. In the future, linguistic knowledge can be exploited to choose the best decision from that list.


RULE-BASED ALGORITHMS FOR HANDWRITTEN CHARACTER RECOGNITION
By
Eng. Randa Ibrahim Mohamed El Anwar
A Thesis Submitted to the Faculty of Engineering at Cairo University
In Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE
in
ELECTRONIC AND COMMUNICATION ENGINEERING

Approved by the Examining Committee

________________________________
Prof. Dr. Mohsen Abdul Raziq Rashwan

Thesis Main Advisor

________________________________
Prof. Dr. Samia Abdul Raziq Mashali

Advisor

________________________________
Prof. Dr. Magdy Fikry Mohamed Ragaee

Member

________________________________
Prof. Dr. Mohamed Abdul Fattah Saad El Sherif

Member

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
February 2007


RULE-BASED ALGORITHMS FOR HANDWRITTEN CHARACTER RECOGNITION
By
Eng. Randa Ibrahim Mohamed El Anwar
A Thesis Submitted to the Faculty of Engineering at Cairo University
In Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE
in
ELECTRONIC AND COMMUNICATION ENGINEERING

Under the Supervision of

Prof. Dr. Mohsen Abdul Raziq Rashwan

Prof. Dr. Samia Abdul Raziq Mashali

Professor of Digital Signal Processing

Head of Computers and Systems Dept.

Faculty of Engineering, Cairo University

Electronic Research Institute

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
February 2007


RULE-BASED ALGORITHMS FOR HANDWRITTEN CHARACTER RECOGNITION
By
Eng. Randa Ibrahim Mohamed El Anwar
A Thesis Submitted to the Faculty of Engineering at Cairo University
In Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE
in
ELECTRONIC AND COMMUNICATION ENGINEERING

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
February 2007
