Pattern Recognition 33 (2000) 1975–1988
An expert system for general symbol recognition Maher Ahmed , Rabab Kreidieh Ward* Wilfrid Laurier University, Physics and Computer Department, Waterloo, Ont, N2L 3C5 Canada University of British Columbia, Electrical and Computer Engineering Department, Vancouver, BC, V6T 1Z4 Canada Received 9 July 1998; accepted 27 August 1999
Abstract An expert system for analysis and recognition of general symbols is introduced. The system uses the structural pattern recognition technique for modeling symbols by a set of straight lines referred to as segments. The system rotates, scales and thins the symbol, then extracts the symbol strokes. Each stroke is transferred into segments (straight lines). The system is shown to be able to map similar styles of the symbol to the same representation. When the system had some stored models for each symbol (an average of 97 models/symbol), the rejection rate was 16.1% and the recognition rate was 83.9% of which 95% was recognized correctly. The system is tested by 5726 handwritten characters from the Center of Excellence for Document Analysis and Recognition (CEDAR) database. The system is capable of learning new symbols by simply adding their models to the system knowledge base. 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. Keywords: Expert systems; OCR; Structural; Pattern recognition; Models; Mapping
* Corresponding author. Tel.: +1-604-822-6894; fax: +1-604-822-9013. E-mail addresses: [email protected] (M. Ahmed), [email protected] (R.K. Ward).
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00191-0
1. Introduction
Handwritten symbol recognition has received, and continues to receive, much attention from researchers, and a wealth of papers and studies has been published on the subject. In this paper we introduce a method designed to recognize any handwritten symbol, including numerals and Latin, Arabic, Chinese or electrical symbols. It thus differs from the vast majority of these studies, each of which was meant for a specific application, e.g. Arabic numerals or Latin letters. We therefore do not survey many such studies here, but give a quick overview of the different pattern recognition techniques before introducing our method.
Pattern recognition techniques fall under the following main categories: statistical pattern recognition, structural pattern recognition, and hybrid systems. In most statistical pattern recognition systems, the symbol features are first defined, then the decision boundaries in the feature space are determined. Many such features are described by Trier et al. [1]. Many methods based on statistical pattern recognition techniques have been developed and have proved to be effective. One example is the method suggested by Cao et al. [2], which uses the local histograms in each of the different zones (grids) as features. Another statistical pattern recognition system, described by H. Al-Yosef and S. Udpa [3], uses the normalized ratios of moments that describe the vertical and horizontal projections as symbol features. Artificial neural networks (ANNs) can be used effectively to extract and also to cluster the features of symbols. A successful ANN for handwritten English character recognition, the Neocognitron [4], is an attempt to imitate the human visual system for pattern recognition. Statistical pattern recognition systems (including artificial neural networks) need a large number of training samples to extract or cluster the features. If the features are not selected (or extracted) properly, the regions of the symbols in the feature space will overlap and many symbols in the overlapping regions will be mis-recognized. Generally, each statistical pattern recognition system is
developed for a certain application and cannot learn new symbols easily. As an example, if a system uses an artificial neural network for extracting the features of English characters, and it is later required to add one or more new symbols representing, for example, Arabic numerals, then the system must be retrained with the whole training data (old and new samples). In addition, once the new symbols are added, the system may not function properly, especially if one or more of the new symbol features are close to the features of existing symbols. In this case, the artificial neural network itself may need to be redesigned, e.g. by changing the number of neurons, the number of layers, etc.
In structural pattern recognition, complex patterns are divided into simpler ones, and the relations between the sub-patterns are described. Structural pattern recognition methods generally use rules and grammars to describe symbols, as shown by Fu [5]. Suen et al. [6] developed four expert systems for recognizing unconstrained handwritten numerals, each using different primitives. The first system obtains the skeleton by thinning, then determines the end points and the junction points. Its recognition rate is 86.05% and its substitution rate is 2.25%. The second expert system performs the following steps:
1. thinning,
2. tracing the skeleton,
3. approximating the curves by lines,
4. size normalization and arrangement of the line segments into primitives.
Its recognition rate is 93.1% and its substitution rate is 2.95%. The third expert system uses different structural features. Examples of these features are the locations of the end points and the junction points in each region of the digit, the width of the stroke and the symmetry of the character. Over 14 other features are used. Its recognition rate is 92.25% and its substitution rate is 2.15%. The fourth expert system relies on features extracted from the contours. Its recognition rate is 93.9% and its substitution rate is 1.6%.
Another structural method, also designed for numeral recognition, is described by Jianming and Hong [7]. In this method, four kinds of feature points are defined: points for bends, points for curves, terminal points, and intersections. A primitive is defined as a skeleton segment that starts and ends at feature points. A feature code of 11 elements is used to describe the local information of the numeral, and a five-element vector is used to describe the global information. The recognition rate using prototype matching is 97.86%, the recognition rate based on neural networks is 97.29%, and the substitution rate is 2.71%.
Another system that integrates the use of neural networks and expert systems is reported by Amine et al. [8]. The original image is transformed into a binary image using a parallel thinning algorithm. Then, the skeleton is traced. Primitives such as straight lines, curves and loops are extracted. Finally, a five-layer artificial neural network is used for classification. The neural network is trained with 2000 isolated Arabic characters and tested with another 1000 characters. The recognition rate is 92%.
A technique integrating a neural network and a knowledge-based system for image recognition was introduced by Kim et al. [9]. The system was designed to recognize handwritten digits (numerals). The model is capable of inductive learning from example data and of logical inference from the rule base. The system has the ability to justify its answer (e.g., why a numeral is recognized as 5 and not as 6 or 9). This system is well suited for applications where only partial knowledge is available. For a typical set of 200 handwritten digits, the recognition rate ranges from 69.5 to 81.5% and depends on the number of rules (80 rules).
A system by Burel et al. [10] uses a combination of statistical and morphological features and has proved to be successful in handwritten digit recognition. There are 20 regions; hence, 20 features are defined as ratios of areas. The morphological features include cavities (west, east, north, south and center) and the hole. A five-layer perceptron neural network is used for classification. The neural network is trained with 1414 digits and the evaluation is performed on 1175 digits. The recognition rate is 93.6%.
A structural feature extraction technique for English character recognition is described by Starzyk et al. [11]. It uses an effective thinning method based on 3×3 template windows as well as pixels outside the windows. These windows are applied sequentially and can be implemented by a hardware circuit.
After thinning the image, critical points are marked; then the segments are determined. These segments are scaled, matched, and classified.
Many character recognizers are based on mathematical formalisms that minimize a measure of misclassification. Artificial neural networks employ mathematical minimization techniques and are used in commercial OCR systems. Recognition rates for machine-printed characters can exceed 99%, but handwritten character recognition rates are typically lower because every person writes differently, as reported by S. Lam at the Center of Excellence for Document Analysis and Recognition, State University of New York at Buffalo.
In this paper, we introduce a new structural pattern recognition system. While successful existing pattern recognition methods are each designed for a specific application, e.g. the recognition of Chinese symbols or of Arabic numerals, our system is general in that it is designed to
recognize any symbol, whether it is an English character, an Arabic numeral, or an electrical symbol. Our system has the following characteristics. It does not use features but describes the symbol by different models. It does not use syntactic grammars; instead it uses rules to rotate, scale, and thin symbols, and further rules to model them. One advantage of our system is that it is capable of recognition even when few model samples are used. It measures the similarities as well as the differences between the representation of the symbol to be recognized and those of the stored models. Our system can justify its answer. Another advantage is that if a new symbol is added, our system can be updated simply by adding the models of the new symbol to the system knowledge base. Our system uses different stages for analyzing and recognizing symbols. Each stage reduces the symbol details until the symbol is described by one or more representations that contain the knowledge most necessary for recognizing the symbol. Not many models of a symbol are required to achieve a high recognition rate. These models are stored in the system knowledge base.
2. The developed system
Expert systems are the most suitable tools for implementing structural pattern recognition techniques. Complex patterns are described in terms of simpler ones, and simple patterns are described by sub-patterns. Expert systems help solve difficult pattern recognition problems. More rules and human experience can be added easily using rule-based systems. Hence, the system performance can be improved without rebuilding or redesigning the system. An expert system for 2-D symbol analysis and recognition is developed here. As mentioned above, our system is not application-specific and can be used for the recognition of any bi-level symbol. This includes the recognition of mathematical symbols, electrical circuit symbols, and characters such as English, Chinese, Arabic, etc. The basic idea behind our system is that a symbol can be constructed from smaller components. Here, four basic components are used: the horizontal line '–', the vertical line '|', the 45° diagonal line '/', and the 135° diagonal line '\'. These components are from now on referred to as "segments" or symbol primitives. The system transforms the symbol into a set of these segments. All these segments have the same length. The segments representing a symbol are partitioned into groups or zones, where each non-empty zone belongs to one of 15 different possibilities. Fig. 1 shows the 16 different possible zones (including the empty zone).
A symbol can be handwritten in an enormous number of different ways in the same space. Our system maps this enormous number of different representations of a handwritten symbol to a smaller number of possible representations. We partition the symbol into a number of square zones. There are $2^{L \times W}$ possible symbols for an image that is $W$ pixels wide and $L$ pixels long. We map this large number ($2^{L \times W}$) of symbols to just $16^N = 2^{4N}$ symbols, where $N$ is the (integer) number of zones used to model the symbol. This is achieved by allowing a limited number of sub-symbols in a zone. As an example, consider a symbol which has one zone ($N = 1$) with $L = 10$ and $W = 10$ pixels. For this zone there exist 16 possible symbols, as shown in Fig. 1. As another example, assume that a symbol is written within an area of 30×20 pixels, where each zone is 10×10 pixels. This area has 6 zones. Since each zone has 16 possible sub-symbols, there are $16^6$ (about $1.7 \times 10^7$) allowable models instead of $2^{600}$ possible ones.
As an example, consider the letter 'B'. One possible model for the letter 'B' consists of six zones: the top two zones, the middle zones, and the bottom zones (the zone images are not reproduced here). Our system stores each such model for each symbol as a vector in the system knowledge base. For example, the vector representation of the model of the letter 'B' described above is formed from the shapes of its six zones. When a symbol is presented to the system for recognition, the different steps that the system performs are
shown in Fig. 2. After rotating, scaling, and thinning, the system decomposes the symbol into strokes. Each stroke is decomposed into short straight lines (segments). The segments are grouped into zones. A vector is used to store the zone shapes and hence to represent the symbol. The distance between two vectors enables the system to measure the differences and similarities between the two symbol representations.
Fig. 1. The 16 possible images in a zone of 10×10 pixels.
Fig. 2. The structural pattern recognition steps.
2.1. The rotation stage
The objective of the first stage is to adjust tilted symbols. Symbols may be drawn tilted in different directions. This problem is solved by rotating the symbols. The central point (defined below) is considered as the origin of the symbol. The symbol direction is considered to be zero when its principal axis (defined below) is vertical (or at a multiple of 20° to the vertical direction). Each symbol is rotated (by an angle ≤ 20°) around its central point until the symbol direction, i.e. its principal axis, is vertical or at a multiple of 20° to the vertical axis. In our system, the tilt angle is assumed not to exceed 20°.
(a) The central point. The physical concept of the center of mass refers to that point in an object that has the same amount of matter around it in any direction. If the origin for the entire image is considered to be the pixel at location (0, 0), then the center of mass of an object $C$ is $(C_x, C_y)$, where $C_x = \frac{1}{n}\sum x$ and $C_y = \frac{1}{n}\sum y$, for an $n \times n$ image.
(b) The principal axis. The principal axis of a bi-level object is a line passing through the object's center of mass and having a minimum total distance from all pixels belonging to the object.
2.2. The scaling algorithm
There are two kinds of documents. The first kind includes documents that have symbols of similar sizes (such as documents of English text, typed or printed by a machine). The second kind includes documents that have symbols of different sizes (such as documents containing graphs, mathematical symbols, electrical circuit elements and handwritten text). If the document has symbols of similar sizes, our system scales all these symbols so that they all have the same dimensions. The new dimensions of the symbols may be selected by the user or defaulted to a width of 32 pixels and a length of 48 pixels. The new dimensions are usually smaller than the dimensions of the symbols in the original document. For a document that has symbols of different sizes, the new dimensions of each scaled symbol are also the same for all symbols, but these dimensions are determined by the program. Hence, the system automatically chooses certain values for the symbol length and width. Examples of these values are 32, 48, 64, 72, or 96 pixels for each of the width and the length. Scaling is useful for mapping some of the different handwritten styles of the same symbol to the same
representation. A straight line and lines with some deformation will have the same representation after they are scaled down to a certain length (the example images are not reproduced here). Consequently, this can be useful if these lines represent the same symbol. Scaling down a symbol usually deletes some of its details but keeps the global shape; this tends to increase the opportunity of matching the considered symbol with the stored models. However, when the information in the small details is relevant, scaling down the symbol decreases the opportunity of correct matching. Hence, the choice of the new symbol dimensions is important.
Assume that the document contains symbols of different sizes. After rotation, the system scales the symbols to some predefined size (as mentioned before). The scaling algorithm is described by the following steps:
1. For each symbol, create a new empty (generally smaller) image with the predefined dimension values mentioned above (Fig. 3).
2. Find $(S_r, S_c)$, the row and column scale values, as the ratios of the original and the new symbol dimensions.
3. Use a sliding window of size $(S_r, S_c)$ on the original symbol to determine the gray level of each pixel in the new symbol image. The window used to transform Fig. 3(a) to (b) has $(S_r, S_c) = (2.28, 1.31)$; the old dimensions are (73, 42) and the new ones are (32, 32).
4. The new gray level value of a pixel in the scaled version is equal to the area of all ink (black) inside the window in the original version, normalized by the area of the window; thus it is a value between 0 and 1 inclusive.
5. Since the value obtained so far for the pixel falls between 0 and 1, we use a threshold value Th to determine whether the final value of the pixel will be black or white. If the calculated gray level ≥ Th, the pixel will be black. Possible values of the threshold are in the range [0, 1]. Experiments show that Th = 0.2 is a good choice. Results when using threshold values of 0, 0.2, 0.4, 0.6, 0.8 and 1.0 are shown in Fig. 4. If the original symbol is thin, using a high threshold value may cause discontinuity. However, if the original symbol is not thin, any value in [0, 1] can be used. Increasing the threshold value tends to thin the pattern but preserve its shape.
2.3. The thinning algorithm
Thinning is the process of reducing the thickness of each line of a pattern to just a single pixel. A comprehensive survey of thinning algorithms is given by Lam et al. [12].
Fig. 3. A symbol after scaling to different dimensions, using a scaling threshold = 0.2: (a) the original symbol, (b) 32×32 size, (c) 48×32 size, (d) 24×16 size, (e) 8×8 size, and (f) 8×4 size.
Fig. 4. Scaling a symbol with different scaling threshold values, using 32×32 size: (a) the original symbol, (b) threshold = 0, (c) threshold = 0.2, (d) threshold = 0.4, (e) threshold = 0.6, (f) threshold = 0.8, and (g) threshold = 1.0.
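As an illustration of the scaling algorithm of Section 2.2 (the procedure whose effect is shown in Figs. 3 and 4), the five steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the system: the function name `scale_symbol`, the representation of a bi-level image as a list of rows with 1 for ink and 0 for background, and the handling of fractional window borders by partial pixel areas are all assumptions made for the example.

```python
import math

def scale_symbol(image, new_rows, new_cols, th=0.2):
    """Scale a bi-level image (1 = ink, 0 = background) to new_rows x new_cols.

    Each output pixel is the ink area inside a window of size (s_r, s_c)
    on the original image, normalized by the window area, then thresholded.
    """
    rows, cols = len(image), len(image[0])
    s_r, s_c = rows / new_rows, cols / new_cols      # step 2: scale values
    out = [[0] * new_cols for _ in range(new_rows)]  # step 1: empty image
    for i in range(new_rows):
        r0, r1 = i * s_r, (i + 1) * s_r              # window rows [r0, r1)
        for j in range(new_cols):
            c0, c1 = j * s_c, (j + 1) * s_c          # window columns [c0, c1)
            ink = 0.0
            for r in range(int(math.floor(r0)), min(rows, int(math.ceil(r1)))):
                h = min(r + 1, r1) - max(r, r0)      # covered height of pixel row r
                for c in range(int(math.floor(c0)), min(cols, int(math.ceil(c1)))):
                    w = min(c + 1, c1) - max(c, c0)  # covered width of pixel column c
                    ink += image[r][c] * h * w
            gray = ink / (s_r * s_c)                 # step 4: value in [0, 1]
            out[i][j] = 1 if gray >= th else 0       # step 5: threshold Th
    return out
```

For a 4×4 image with a single ink pixel in the top-left corner, scaling to 2×2 gives a gray level of 0.25 in the top-left output pixel, so that pixel is black with Th = 0.2 and white with Th = 0.6, in line with the remark that a high threshold may break thin symbols.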
A fast parallel algorithm for thinning digital patterns, which decides whether or not a pixel can be eroded, is described by Zhang and Suen [13]. In our system we first apply this algorithm. It is then followed by one sequential pass of our fast knowledge-based system [14] so as to reduce the number of pixels in the final thinned pattern.
2.4. Symbols representation
The objective of this stage is to represent a symbol by a vector. The vector length is proportional to the amount of detail required to describe the symbol. Similar symbols will be closer to each other in the N-dimensional space. After thinning, each symbol is described by its strokes. A stroke is a sequence of pixels that starts and ends at "special" pixels. A "special" pixel is a pixel that has three or more neighbors or exactly one neighbor. Each "special" pixel is marked by the letter 'A', as shown in Fig. 6(e). After marking all the special pixels, the symbol is decomposed into strokes. Then, each of the unmarked pixels in every stroke is marked by a code (an integer between 1 and 8 inclusive), as shown in Fig. 5. The code indicates the direction between the present pixel and the following one. The marked pixels are shown in Fig. 6(f). After isolating the symbol strokes and marking each pixel by a code, a set of rules is applied to each symbol stroke to transform it into segments (straight lines) of fixed, equal length. Depending on the chosen segment length, a stroke may be converted to one or more segments or may vanish. These segments are the symbol primitives. A vector of these primitives is used to represent the symbol. Using many, and thus shorter, segments to represent a stroke preserves all details of the stroke, including unnecessary ones, while using long segments may delete important details. However, if very short segments are used, many more stored models will be needed. The effect of using different segment lengths (2–9 pixels) is shown in Fig. 7. The rules used to represent a stroke by segments are described in the following section.
Fig. 5. The different direction possibilities.
2.5. The system mapping rules
In this section, we will see how different styles of a symbol are mapped to the same representation. For example, several handwritten styles of the letter 'A' are all mapped to one representation, and several other styles of the letter 'A' are all mapped to a second one (the example images are not reproduced here).
Fig. 6. Different processing of a symbol: (a) a symbol, (b) after rotation, (c) after scaling, (d) after thinning, (e) after marking the special pixels, and (f) after isolating the symbol strokes and coding each pixel.
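Before turning to the mapping rules themselves, the special-pixel test and the direction coding of Section 2.4 (illustrated in Fig. 6(e) and (f)) can be sketched in Python. This is only a sketch under assumptions: the image is taken to be a list of rows with 1 for ink, neighbors are counted with 8-connectivity, and the numbering in `ASSUMED_CODES` is hypothetical, since the actual assignment of codes 1 to 8 is defined by Fig. 5 and is not recoverable from the text. The function names are likewise illustrative.

```python
# 8-connected neighbourhood offsets (row, column); 8-connectivity is an assumption.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                    (0, 1), (1, -1), (1, 0), (1, 1)]

# HYPOTHETICAL direction codes 1..8, counter-clockwise starting from east.
# The real numbering is the one shown in Fig. 5.
ASSUMED_CODES = {(0, 1): 1, (-1, 1): 2, (-1, 0): 3, (-1, -1): 4,
                 (0, -1): 5, (1, -1): 6, (1, 0): 7, (1, 1): 8}

def count_neighbors(image, r, c):
    """Number of ink pixels among the 8 neighbours of pixel (r, c)."""
    rows, cols = len(image), len(image[0])
    return sum(1 for dr, dc in NEIGHBOR_OFFSETS
               if 0 <= r + dr < rows and 0 <= c + dc < cols
               and image[r + dr][c + dc])

def special_pixels(image):
    """Ink pixels with exactly one neighbour or with three or more neighbours."""
    found = set()
    for r, row in enumerate(image):
        for c, ink in enumerate(row):
            if ink:
                n = count_neighbors(image, r, c)
                if n == 1 or n >= 3:
                    found.add((r, c))
    return found

def direction_code(pixel, next_pixel):
    """Code (1..8) of the direction from a pixel to the following one."""
    step = (next_pixel[0] - pixel[0], next_pixel[1] - pixel[1])
    return ASSUMED_CODES[step]
```

For a thinned horizontal line, only the two end pixels have exactly one neighbor, so the line forms a single stroke between two special pixels, and every interior pixel receives the same direction code.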
We will show how each stroke is transformed into one or more segments. All of these segments have the same length T. In forming a segment, consecutive pixels are processed one at a time. To form a segment of length T pixels, T or more pixels are processed and converted to a segment (except at the end of the stroke, where T/2 pixels are sufficient). A set of rules that maps a symbol stroke into segments of fixed length T is developed. To find the direction of a line segment, the consecutive pixels of each stroke are analyzed pixel by pixel. For each pixel, a certain probability is assigned to each of the eight possible directions. Then, the sum of the probabilities over the previous pixels and the present one is calculated for each of the eight possible directions. If one of these sums reaches the threshold value T (the segment length), a complete segment is formed, and the segment direction is taken to be the direction with the largest probability sum. Next, the same analysis continues for the remaining pixels in the stroke so as to form new line segments, and so on. For the remaining pixels between the end of the last formed segment and the end of the stroke, if any of the eight sums reaches half the value of the threshold, a segment is formed; otherwise, no segment is formed.
Fig. 7. Different mappings of a symbol using different segment lengths of: (a) 1 pixel, (b) 2 pixels, (c) 3 pixels, (d) 4 pixels, (e) 5 pixels, (f) 6 pixels, (g) 7 pixels, (h) 8 pixels, and (i) 9 pixels.
The following algorithm describes the mapping rules in more detail:
1. Define the (integer) segment length, which is the same as the threshold T.
2. Find the code of every pixel of the stroke, as shown in Fig. 5.
3. Consider the first pixel after the special pixel of the stroke.
4. For i = 1, ..., 8, initialize s[i], the sum for each of the eight directions that the segment can take, to 0.0.
5. Assign a probability of 1 to the direction of the considered pixel, a probability of 0.7 to its two adjacent directions, a probability of 0.49 to the next two adjacent directions, and a probability of 0 to the remaining directions. For example, if the code of the pixel is 5, then p[5] = 1, p[4] = 0.7, p[3] = 0.49, p[6] = 0.7, p[7] = 0.49, and p[2] = p[1] = p[8] = 0. These probability values were determined experimentally.
6. For i = 1, ..., 8, update s[i] by adding the new p[i] to the old s[i].
7. Calculate the maximum of s[1], ..., s[8]. If it is greater than or equal to the threshold T, form a segment of length T pixels in that direction. The length of the segment formed is always equal to T pixels. Then, consider the next pixel of the stroke, and go to step 4.
8. If the end of the stroke is not reached, consider the next pixel of the stroke and go to step 5 (note that each of the s[i] is still less than T).
9. At this point, the end of the stroke is reached. Determine the maximum of s[1] through s[8]. If it is greater than or equal to half of the threshold T, form a segment in that direction.
10. End of the stroke analysis. The result is a new stroke consisting of segments (straight lines), each having one of the eight directions.
11. After processing all the symbol strokes and finding all the segments, the segments are grouped so that each group corresponds to one of the 16 zones shown in Fig. 1.
12. The numbers of horizontal and vertical zones for the symbol are determined, and each zone shape is determined as one of the 16 zones shown in Fig. 1.
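Steps 1 to 10 above can be sketched in Python as follows. This is a minimal sketch, not the system's own code: the names `direction_weights` and `stroke_to_segments` are illustrative, a stroke is assumed to be given as the list of its pixel codes (1 to 8), and the eight directions are assumed to wrap around, so that codes 8 and 2 are the directions adjacent to code 1. The weights 1, 0.7 and 0.49 are the values stated in step 5.

```python
def direction_weights(code):
    """Step 5: weights p[1..8] for one pixel code (index 0 is unused)."""
    p = [0.0] * 9
    for offset, value in ((0, 1.0), (1, 0.7), (-1, 0.7), (2, 0.49), (-2, 0.49)):
        # Assumed circular adjacency of the eight direction codes.
        p[(code - 1 + offset) % 8 + 1] = value
    return p

def stroke_to_segments(codes, T):
    """Map a stroke (list of pixel codes) to a list of segment directions.

    Every returned entry stands for one segment of length T in that direction.
    """
    segments = []
    s = [0.0] * 9                      # step 4: sums s[1..8]
    for code in codes:                 # steps 3 and 8: pixel by pixel
        p = direction_weights(code)
        for i in range(1, 9):          # step 6: accumulate
            s[i] += p[i]
        best = max(range(1, 9), key=lambda i: s[i])
        if s[best] >= T:               # step 7: a full segment is formed
            segments.append(best)
            s = [0.0] * 9              # back to step 4
    best = max(range(1, 9), key=lambda i: s[i])
    if s[best] > 0 and s[best] >= T / 2:   # step 9: leftover pixels
        segments.append(best)
    return segments
```

With T = 6, a stroke of nine pixels that all have code 5 gives two segments in direction 5 (one full segment, and one from the three leftover pixels, since 3 ≥ T/2), whereas a stroke of eight such pixels gives only one. A stroke shorter than T/2 pixels yields no segment at all, which is how a stroke "may vanish".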
2.6. The system models and recognition
Models for each symbol are stored in the system. Each model is represented by a vector, and different models may have different numbers of zones. The number of zones is determined by the prior choice of the segment (straight line) length T. The vector of a model contains the numbers of vertical and horizontal zones followed by a series of integers. Each of these integers (0–15) represents one of the 16 possible segment images (shown in Fig. 1). Our system can use two methods for constructing the models. In the first method, the system extracts models from the symbols and stores them in a database. In the second method, the human designer constructs the different shapes representing a symbol by accessing the database directly, i.e. for each fixed number of vertical and horizontal zones, the different possible representations of the symbol are found. Then, these models are stored as vectors in the system knowledge base. In our present implementation, we used the second method. The numbers of vertical and horizontal zones ranged from 1×1 to 8×8 zones. The size of any zone in the models is irrelevant, as we only need the shape of its content for later comparison with the vector representing the symbol to be recognized. The symbol to be recognized is compared with the stored models that have the same numbers of vertical and horizontal zones. In our present implementation, an average of 97 models for each symbol was used (Fig. 8). Ideally, the system's stored models should include all possible shapes for each symbol.
Fig. 8. (a) Some models used by the system and the number of models used for each symbol, (b) models used by the system for the letter A.
The length of the segment controls the level of detail in the description of the symbol. Using long segments, and hence few zones (such as 2×2 or 2×3), results in fewer models for a symbol, but may not preserve some important symbol details. On the other hand, using short segments (hence many zones, such as 4×5 or 5×5) to model the symbol preserves the symbol details but results in a large number of models. Some system models for different symbols are shown in Fig. 8. These models include Latin letters, the digits 0–9, the electrical symbol for a diode, an Arabic digit, a Greek letter, Arabic letters, and a general symbol. Models that use a large number of zones (i.e. short segments) are suitable for representing symbols that have small strokes, as in the case of most Arabic characters or electrical symbols.
In our present implementation, for each input symbol to be recognized we use three representations (vectors). Each vector corresponds to a segment length T equal to 6, 10, or 14 pixels. Depending on the segment length T used, each tested vector has a certain number of vertical and horizontal zones. Each vector is compared only to the stored model vectors that have the same numbers of vertical and horizontal zones. The distance between the input vector and a stored one is calculated by comparing each zone in the input symbol vector to the corresponding zone in the stored symbol vector. Finally, the stored vector with the minimum distance to the input vector is selected, as long as the minimum distance is less than a certain "rejection" threshold. If the minimum distance is larger than the threshold, the system rejects the symbol as unrecognizable (the symbol does not correspond to any of its stored models). In case of a tie, the model with the larger number of zones (i.e. with more details) is chosen.
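The matching procedure just described can be sketched in Python as follows. Two points in this sketch are assumptions rather than part of the described system, because the text does not specify them: each zone code (0–15) is treated as a 4-bit mask recording which of the four segment types is present in the zone, and the distance between two zones is taken to be the number of differing bits (Hamming distance). The names `zone_distance`, `vector_distance` and `recognize`, and the tuple layouts, are likewise hypothetical.

```python
def zone_distance(a, b):
    """ASSUMED per-zone distance: number of differing bits of two 4-bit codes."""
    return bin(a ^ b).count("1")

def vector_distance(u, v):
    """Sum of the zone distances over corresponding zones."""
    return sum(zone_distance(a, b) for a, b in zip(u, v))

def recognize(representations, models, reject_threshold):
    """Return the label of the closest stored model, or None if rejected.

    representations: list of (rows, cols, zones) for the input symbol,
                     one per segment length T.
    models:          list of (label, rows, cols, zones) from the knowledge base.
    """
    best_key, best_label = None, None
    for rows, cols, zones in representations:
        for label, m_rows, m_cols, m_zones in models:
            if (rows, cols) != (m_rows, m_cols):
                continue                   # compare equal zone counts only
            d = vector_distance(zones, m_zones)
            key = (d, -(rows * cols))      # tie: prefer the model with more zones
            if best_key is None or key < best_key:
                best_key, best_label = key, label
    if best_key is None or best_key[0] >= reject_threshold:
        return None                        # rejected as unrecognizable
    return best_label
```

Because the per-zone distance here is an assumed one, the numerical rejection thresholds reported for the system (15 and 100) do not carry over to this sketch; only the structure of the decision (minimum distance, rejection, and tie-breaking by zone count) follows the text.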
3. Results
As mentioned above, an average of 97 models were stored per symbol. The system was tested with 5726 handwritten English characters and digits. When a rejection threshold value of 15 was used, the rejection rate was 16.1% and the recognition rate was 83.9%, of which 95% was recognized correctly (i.e. about 79.7% of all tested characters were both accepted and recognized correctly). When a higher rejection threshold value of 100 was used, the rejection rate was 0% and the recognition rate was 100%, of which 87.6% was recognized correctly. The test (input) data consisted of 5726 handwritten bi-level English characters from the Center of Excellence for Document Analysis and Recognition (CEDAR) database, as well as another 120 symbols representing Arabic letters, Chinese characters, and mathematical and electrical symbols. A subset (101 symbols) of this database is shown in Fig. 9(a). After rotating the symbols, some similar symbols become closer in shape, as shown in Fig. 9(b). Next, the system scales down the symbols to a predefined size, as shown in Fig. 9(c). As mentioned earlier, the user has the option of selecting the size; here, however, we select the option where the system determines the predefined size (according to the original symbol size). Then the symbols are thinned, as shown in Fig. 9(d). Next, the special pixels are marked and the strokes of the symbols are isolated. At this stage, the system transforms the strokes into segments, where all segments have one fixed length T. The symbol representations using segment lengths equal to 6, 10, and 14 pixels are shown in Fig. 9(e), (f), and (g), respectively. Most symbols are recognized by the system, as shown in Fig. 9(h), when a "rejection" distance threshold of 100 is used. Using a small "rejection" distance value, as shown in Fig. 9(i), increases the proportion of accepted symbols that are recognized correctly (95% versus 87.6%); however, it also increases the number of symbols rejected by the system. Generally, if a symbol is mis-recognized but a high recognition rate is required, then more models with a larger number of zones should be added for that symbol. After studying the mis-recognized symbols, we found that a symbol may be recognized incorrectly for one of the following reasons:
1. The symbol is written in such a way that it closely resembles another symbol. In this case, a human will also be confused about the meaning of the symbol and has to study the syntax or the semantics to understand the symbol and select the closest meaningful symbol.
2. The symbol model is not in the system. This model can be added to the system.
3. The final models for different symbols may be similar if we use long segments (as for some handwritten u's and v's). In this case, we should use short segments to keep as much detail as is required to differentiate between them, so as to increase the recognition rate.
4. The system limitations and future work
Our system has the main advantage of its capability of recognizing any symbol in any language. It can also justify the answer. On the other hand, in our system, the following characters have the same models (c, C), (f, F), (m, M), (u, U), (v, V), (k, K), (p, P), (s, S), (t, T), (w, W), (x, X), (y, Y), and (z, Z). i.e. for these characters it is now not possible to determine whether or not it is written in the lower or upper case. This problem can be solved by comparing the size and location of the symbols to determine whether the symbol is in the lower or upper case. Also, since our system performs the recognition at di!erent stages, for some English characters, the scaling stage can be useful to "nd whether a character is in upper case or lower case. In our system, we also use the same models for the following symbols (1 and l), (2, z), (5, s), (0, o, O), (q, 9), (g, 9), and (8, B). This problem can be overcome by having some prior knowledge about the symbols. For example,
Fig. 9. (a) A document, (b) the symbols after rotation, (c) the symbols after scaling, (d) after thinning, (e) after modeling using 6-pixel segment length, (f) 10-pixel segment length, (g) 14-pixel segment length, (h) recognition with rejection distance = 100, (i) rejection distance = 15.
in the case of the Canadian postal code, the first symbol is a letter, which is followed by a numeral, then a letter, then the symbol '-', then a numeral, then a letter, which is followed by a numeral. The current system is so general that some models for A and R will also be similar. The same applies to O, Q and D. This problem arises because our system may omit small strokes and approximate small changes in the strokes by straight segments. Hence, further checks may be used to reach the final decision. For example, for A and R, after recognition we can examine the direction and straightness of certain strokes. If the left stroke is vertical with respect to the right stroke, then it is R and not A. Similarly, for O, D, and Q, after recognition, if there is a small stroke in the bottom-right part, then it is Q; otherwise, it is O or D. If there is a straight stroke in the left part, then it is D and not O. The final decision is made after these tests.
In future work, basic components that differ from the four basic straight line segments currently used (/, \, |, -) will be considered in order to recognize more complicated symbols. These components will themselves include symbols, so that symbols will be described by sub-symbols. Examples of these complicated symbols are maps, roads, and pictures. For large symbols, the algorithm that measures the distance can be improved by including the possibility of deleting a row or a column (analogous to deleting a character in spell-checking algorithms).
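The use of prior knowledge described above can be illustrated with a small sketch. It is not the authors' implementation: the names `CONFUSABLE`, `POSTAL_PATTERN`, `candidates` and `resolve` are hypothetical, and the only facts taken from the text are the groups of symbols that share models and the letter/numeral layout of the Canadian postal code.

```python
# Groups of symbols that share a model, as listed in the text.
CONFUSABLE = [
    {"1", "l"}, {"2", "z"}, {"5", "s"}, {"0", "o", "O"},
    {"q", "9"}, {"g", "9"}, {"8", "B"},
]

# Assumed encoding of the postal code layout: L = letter, D = numeral.
POSTAL_PATTERN = "LDL-DLD"


def candidates(symbol):
    """Return every symbol that shares a model with `symbol`, itself included."""
    out = {symbol}
    for group in CONFUSABLE:
        if symbol in group:
            out |= group
    return out


def resolve(recognized, pattern=POSTAL_PATTERN):
    """For each position, keep only the candidates the pattern allows."""
    if len(recognized) != len(pattern):
        raise ValueError("recognized string and pattern differ in length")
    result = []
    for sym, kind in zip(recognized, pattern):
        if kind == "L":
            keep = {c for c in candidates(sym) if c.isalpha()}
        elif kind == "D":
            keep = {c for c in candidates(sym) if c.isdigit()}
        else:  # a literal separator such as '-'
            keep = {sym} if sym == kind else set()
        result.append(sorted(keep))
    return result


if __name__ == "__main__":
    # 'l' sits in a numeral position and 'z' in a letter position.
    print(resolve("V6T-lz4"))
```

A position that still lists more than one candidate (for example, a 9 read in a letter position could be q or g) remains ambiguous and would need one of the further stroke checks discussed above.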
5. System characteristics
As mentioned in Section 2, the number of all possible handwritten symbols in a zone of $N \times N$ pixels is $2^{N \times N}$. This includes discontinuous symbols; for connected symbols, the number of possibilities is less than $2^{N \times N}$, but it is still a very large number. Our system maps this large number of possibilities into 16, since only 16 shapes are allowed in a zone. However, to represent a symbol meaningfully, we need more than one zone. Fig. 10 shows the reduction in the number of possible handwritten symbols and the number of allowed symbols versus the number of zones (the selected zone dimensions are 10×10 pixels). We have found that using 3×3 and 4×4 zones is sufficient to achieve a high recognition rate for the English characters and digits. Using the C++ language on a 120 MHz Pentium PC and the 64 English letters shown in the middle of the document in Fig. 9, the system, on average, converts each symbol into its structural feature vector representation (the final representation of the symbol by line segments) in 1.9 s. This includes rotating the symbol, scaling, thinning, decomposing it into strokes, coding each stroke, mapping each stroke into line segments of
Fig. 10. The number of allowed symbols and the number of possible symbols versus the number of zones.
Fig. 11. Symbols with different numbers of strokes.
three different lengths and finding the representation of the line segments by zones. This results in three representations of the symbol, each corresponding to a certain segment length. However, the time required to represent a thinned symbol by a structural feature vector depends on (1) the number of strokes in the symbol and (2) the length of the predefined line segments. An experiment was conducted to find out how the time required to represent the sample of thinned symbols shown in Fig. 12 by their structural feature vectors varies with the number of strokes. The original symbols are shown in Fig. 11, where the first symbol has one stroke, the second symbol has three strokes, and each of the other symbols has a different number of strokes. We used line segments of length 6, 10 and 14 pixels to represent each symbol by three structural feature vectors. The experiment indicates that the time required to represent a symbol by a structural feature vector varies approximately linearly with the number of strokes in the symbol, as shown in Fig. 13. It is not easy to find this relationship mathematically. However, it is expected that symbols with more strokes would require more processing time to be recognized than symbols with fewer strokes. To address the second point, another experiment was conducted to show how the time required to represent a thinned symbol by a structural feature vector varies with the predefined length of the line segment. The eight
Fig. 12. The symbols after thinning.
Fig. 13. The processing time for converting a thinned symbol into a structural feature vector as the number of strokes in the symbol increases.
Fig. 14. The time required to convert the thinned symbols shown in Fig. 12 to structural feature vectors versus the length of the line segments.
thinned symbols shown in Fig. 12 were used simultaneously and only one line segment length was defined at a time. It is shown that using short line segments to represent a thinned symbol by a structural feature vector is faster than using long line segments. The time varies approximately linearly with the length of the line segment as shown in Figs. 13 and 14.
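The zone counting at the start of this section can be made concrete with a short sketch. It assumes, as a simplification not stated in the paper, that zones are independent, so that per-zone counts multiply: a 10×10 zone admits $2^{100}$ bi-level patterns, while the system allows 16 shapes per zone. The function names are hypothetical.

```python
def possible_symbols(zones, zone_side=10):
    """Upper bound on bi-level patterns: every pixel is either on or off."""
    return 2 ** (zone_side * zone_side * zones)


def allowed_symbols(zones, shapes_per_zone=16):
    """Number of representations when each zone holds one of 16 shapes."""
    return shapes_per_zone ** zones


if __name__ == "__main__":
    for zones in (1, 9, 16):  # one zone, 3x3 zones, 4x4 zones
        possible_bits = possible_symbols(zones).bit_length() - 1
        allowed_bits = allowed_symbols(zones).bit_length() - 1
        print(f"{zones:2d} zones: possible = 2^{possible_bits}, allowed = 2^{allowed_bits}")
```

Under these assumptions, a 3×3 layout reduces $2^{900}$ possible patterns to $2^{36}$ allowed representations, and a 4×4 layout reduces $2^{1600}$ to $2^{64}$.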
6. Conclusion
A rule-based system for the recognition of any 2-D bi-level line symbol is introduced. Examples of these symbols are typed or handwritten mathematical and electrical symbols and characters such as Greek, Arabic, English or Chinese. The system performs the recognition in different steps. The first step adjusts the tilted symbols by rotation, then scales them to predefined sizes. This is followed by thinning. At this stage, each symbol is described by its strokes. Then, we apply some rules to transform each symbol stroke into a combination of straight lines (segments). A vector representing these segments is used to model the symbol. The distance between this vector and the vectors of stored models is used to identify the input symbol. The system was tested with different symbols. The test data consisted of all 5726 bi-level English characters available from the Center of Excellence for Document Analysis and Recognition (CEDAR) database. In addition, there were 120 other arbitrary symbols. The rejection rate was 16.1% and the recognition rate was 83.9%, of which 95% were recognized correctly. In order to increase the recognition rate and decrease the rejection rate, more models should be used for each symbol.
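The identification step summarized above (nearest stored model, with rejection beyond a threshold distance) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the feature vector is taken to be a sequence of per-zone shape codes, and the distance is a plain count of differing zones, which is a stand-in for the paper's own distance measure. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[int]  # assumed: one shape code (0..15) per zone


def zone_mismatches(a: Vector, b: Vector) -> int:
    """Stand-in distance: number of zones whose shape codes differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(vector: Vector,
             models: Dict[str, List[Vector]],
             reject_distance: float) -> Tuple[Optional[str], float]:
    """Return (label, distance) of the nearest model, or (None, distance) on rejection."""
    best_label, best_dist = None, float("inf")
    for label, stored in models.items():
        for model in stored:
            if len(model) != len(vector):
                continue  # skip models built with a different number of zones
            d = zone_mismatches(vector, model)
            if d < best_dist:
                best_label, best_dist = label, d
    if best_dist > reject_distance:
        return None, best_dist  # too far from every stored model
    return best_label, best_dist


if __name__ == "__main__":
    models = {"A": [(1, 2, 3, 4)], "R": [(1, 2, 3, 9), (5, 5, 5, 5)]}
    print(classify((1, 2, 7, 9), models, reject_distance=0))  # rejected
    print(classify((1, 2, 7, 9), models, reject_distance=1))  # nearest model accepted
```

Storing more models per symbol adds entries to `models`, which is how the system is described as learning new symbols or new writing styles.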
7. Summary
An expert system for 2-D bi-level symbol analysis and recognition is introduced. The proposed system is general in that it is not designed for a specific application, but can be used for recognition of any symbol. The system uses the structural pattern recognition technique to represent each symbol by a set of short straight lines that we call segments. To obtain a representation of a symbol, the system performs the following basic steps. First, the system adjusts the symbol by rotating it around its central point until its principal axis makes a certain angle with the vertical axis (0° or a multiple of 20°). Secondly, the system scales the symbol to a predefined size. The third step is thinning. After that, the system extracts and describes the thinned symbol in terms of strokes. Finally, each stroke is approximated by segments (short straight lines). The resulting representation of the symbol is compared with the stored models of the different symbols. For each symbol, many models are stored. Results and analysis of the recognition of a document are described. The boundaries (the surface of an N-dimensional sphere) for each symbol are determined by a threshold (the radius of this sphere). Using a low threshold will decrease the space for this symbol, increase the rejection rate, and increase the proportion of recognized symbols that are correct. After storing an average of 97 models/symbol, the system was tested with 5726 bi-level handwritten English letters and digits taken from the Center of Excellence for Document Analysis and Recognition (CEDAR) database and another 120 handwritten symbols of other kinds. For a low threshold (radius = 15), the rejection rate was 16.1% and the recognition rate was 83.9%, of which 95% were correctly recognized. When the threshold was high (radius = 100), the rejection rate was 0% and the recognition rate was 100%, of which 87.6% were recognized correctly. The performance of our system can be improved by simply storing more models. The system is capable of learning new symbols by simply adding models for these symbols to the system knowledge base. The system is implemented in the C++ language and runs on a 120 MHz Pentium PC.
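Because the correct-recognition percentages are quoted relative to the recognized symbols only, it can help to convert them into shares of all inputs. The sketch below does this arithmetic for the two thresholds reported above; the function name is hypothetical and the only inputs are the rates stated in the text.

```python
def outcome_shares(recognition_rate, correct_given_recognized):
    """Split all inputs into (correct, wrong, rejected) fractions."""
    rejected = 1.0 - recognition_rate
    correct = recognition_rate * correct_given_recognized
    wrong = recognition_rate - correct
    return correct, wrong, rejected


if __name__ == "__main__":
    reported = [(15, 0.839, 0.95), (100, 1.0, 0.876)]
    for radius, recognized, accuracy in reported:
        correct, wrong, rejected = outcome_shares(recognized, accuracy)
        print(f"radius={radius}: correct={correct:.1%}, "
              f"wrong={wrong:.1%}, rejected={rejected:.1%}")
```

With radius 15, roughly 79.7% of all inputs are recognized correctly and about 4.2% are mis-recognized; with radius 100, 87.6% are correct and 12.4% are mis-recognized. The low threshold therefore trades some correct answers for a much smaller share of wrong ones.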
References
[1] O. Trier, A. Jain, T. Taxt, Feature extraction methods for character recognition - a survey, Pattern Recognition 29 (4) (1996) 641-662.
[2] J. Cao, M. Ahmadi, M. Shridhar, Recognition of handwritten numerals with multiple feature and multistage classifier, Pattern Recognition 28 (2) (1995) 153-160.
[3] H. Al-Yosef, S. Udpa, Recognition of Arabic characters, IEEE Trans. Pattern Anal. Mach. Intell. 14 (8) (1992) 853-857.
[4] K. Fukushima, Neocognitron: a hierarchical neural network capable of visual pattern recognition, Neural Networks 1 (2) (1988) 119-130.
[5] K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1982.
[6] C. Suen, C. Nadal, R. Legault, T.A. Mai, L. Lam, Computer recognition of unconstrained handwritten numerals, Proceedings IEEE 80 (7) (1992) 1162-1180.
[7] H. Jianming, Y. Hong, Structural primitive extraction and coding for handwritten numeral recognition, Pattern Recognition 31 (5) (1998) 493-509.
[8] A. Amin, Off-line Arabic character recognition: the state of the art, Pattern Recognition 31 (5) (1998) 517-530.
[9] H. Kim, H. Yang, A neural network capable of learning and inference for visual pattern recognition, Pattern Recognition 27 (10) (1994) 1291-1302.
[10] G. Burel, I. Pottier, J. Catros, Recognition of handwritten digits by image processing and neural network, International Conference on Neural Networks, Vol. 3, 1992, pp. 666-671.
[11] J. Starzyk, Y. Jan, Algorithm & architecture for feature extraction in image recognition, Southeastern Symposium on System Theory, 1994, pp. 448-452.
[12] L. Lam, S. Lee, C. Suen, Thinning methodologies - a comprehensive survey, IEEE Trans. Pattern Anal. Mach. Intell. 14 (9) (1992) 869-885.
[13] T.Y. Zhang, C.Y. Suen, A fast parallel algorithm for thinning digital patterns, Commun. ACM 27 (3) (1984) 236-239.
[14] M. Ahmed, R. Ward, A fast one pass knowledge-based system for thinning, Electronic Imaging 7 (1) (1998) 111-116.
About the Author - MAHER AHMED is an assistant professor at Wilfrid Laurier University, Waterloo, Canada. He received his Ph.D. at the University of British Columbia, Vancouver, Canada (1999). He holds two M.Sc. degrees, one in Systems and Control from Queen's University, Kingston, Ontario, Canada (1994), and the other in Computer Science from Cairo University, Egypt (1988). He was with Ontario Hydro, Canada, from 1990 to 1991 and with the National Research Center, Egypt, from 1987 to 1990. His research interests include pattern recognition, artificial neural networks and expert systems.
About the Author - RABAB KREIDIEH WARD was born in Beirut, Lebanon. She received the B.E. degree from the University of Cairo, Egypt (1966), and her Masters and Ph.D. degrees from the University of California, Berkeley (1969 and 1972, respectively). She is the Director of the Centre for Integrated Computer Systems Research and Professor in the Electrical & Computer Engineering Dept. at the University of British Columbia, Vancouver, Canada. Her research interests are mainly in the areas of signal processing and image processing. She has made contributions in the areas of signal detection, image encoding, compression, recognition, restoration and enhancement, and their applications to infant cry signals, cable TV, HDTV, medical images, and astronomical images. She holds five patents related to cable television picture monitoring, measurement and noise reduction. Applications of her work have been transferred to U.S. and Canadian industries. She is a Fellow of the EIC, the IEEE and the Royal Society of Canada.