Pergamon - Deep Blue - University of Michigan

10 downloads 6711 Views 1MB Size Report
template matching after a suitable window was chosen. According to Casey and Nagy, ...... who bells m edm:mon-n:~n~ m~mve c~ed i~e~ Dua,v~, .q~mm • n~r.
Pattern Recognition, Vol. 27, No. 6, pp. 825 840, 1994 Elsevier Science Ltd Copyright © 1994 Pattern Recognition Society Printed in Great Britain. All rights reserved 0031-3203/94 $7.00 + .00

Pergamon

0031-3203(93)E0002-3

S E G M E N T A T I O N OF T O U C H I N G CHARACTERS IN PRINTED DOCUMENT RECOGNITION Su LIANG,~'M. SHRIDHAR3~and M. AHMADIt f Department of Electrical Engineering, University of Windsor, Windsor, Ontario, Canada N9B 3P4 :~Department of Electrical and Computer Engineering, University of Michigan-Dearborn, Dearborn, MI 48128, U.S.A. (Received 2 March 1993; in revised form 28 December 1993; receivedfor publication 7 January 1994) Abstraet--A new discrimination function is presented for segmenting touching characters based on both pixel and profile projections. A dynamic recursive segmentation algorithm is developed for effectivelysegmenting touching characters. Contextual information and spell checking are used to correct errors caused by incorrect recognition and segmentation. Based on 12 real documents, a maximum 99.85~oand a minimum 99.4~o recognition accuracy is achieved. Touching character segmentation Dynamic recursive segmentation Feature extraction Spellingcorrection

l. I N T R O D U C T I O N

Document image processing has been the subject of extensive research since the early 1980s.t~'2) Today, character recognition systems dramatically facilitate the transfer of information into computer systems without intensive manual keying. Kahan et al. suggested that in practical applications, a document recognition system is required to read texts accurately with at least a 99.9~ recognition rate. 13) It is not difficult to design a character recognition system that recognizes well-formed and well-spaced printed characters. However, it is a challenge to develop a system which can maintain such a high recognition rate, regardless of the quality of the input documents and the character fonts.~2-6~ Presently, most recognition errors are due to character segmentation errors. (4-6) Very often, even in printed text, adjacent characters are touching, and may exist in an overlapped field. Therefore, it is essential to segment a given word correctly into its character components. Any failure or error in this segmentation step can lead to a critical loss of information from the document. Several investigators have attempted to develop techniques for properly segmenting words into their character components. Kahan et al. 13) discussed the segmentation of touching characters. Tsujimoto and Asada (4~constructed a decision tree for resolving ambiguities in segmenting touching characters. Casey and Nagy ~5)proposed a recursive segmentation algorithm for touching characters. Bose and Kuo (6) applied the Hidden Markov Model to the touched and degraded text recognition. In this paper, we propose a dynamic recursive segmentation algorithm for segmentation of touching characters based on the method proposed by Casey

Character contextual classes

and Nagy. 15~In this process, several candidate cutting points are determined from the proposed discrimination functions; the algorithm then iteratively implements a forward segmentation or a backward merge procedure based on the outputs of a character classifier which operates on the components generated by segmentation algorithms. Using contextual information including a spell checker, further improvements in word recognition are achieved. Extensive studies with different text documents have attested to the feasibility of the proposed algorithm. The algorithm and test results are discussed in the following sections.

2. SEPARATIONOF TOUCHINGCHARACTERS A typical text recognition system is shown in the flow chart in Fig. 1. Since the work described here deals with segmentation of touching characters, it is assumed that a line of text with the appropriately labelled connected components in the line have already been extracted using appropriate techniques.17-91 Figure 2 shows the connected components and their labels for the word "with". It is interesting to observe (Fig. 2(c)) that the letters "t" and "h" are touching. 2.1. The modified discrimination function for touching characters Among the many articles that have been published concerning the segmentation of touching characters, 13-6~we were especially interested in the algorithm proposed by Kahan et al. ~3~ They detected touching characters by using the ratio of the second-order difference of the vertical pixel projection to the value of the vertical projection as an objective function. The cutting points for touching characters were obtained in the 825

S. LIANGet al.

826

---[baseline extraction

r i t e x t line extraction I t

~ '--Iword extraction I

labelling the word pattern for extracting

connected eom~aints

]

Islant detection I es slant correction [ [

[

I no applying the dynamic recurslve sesmentatlon| l algorithm for segmenting and recognizing [ input character components ] [ I

[[oat identification[ [

classes assisnment

~

feature extraction

'~ level I word ]

and elusification

[merging the broken characters

i

[ character verification ] I substitution character t ' ] r'~ level II word ~spell [inserted character ~ J l

.

~error correction ~

1 .o

ehoosin8 first candidate word [ after correction failure J

_I" =I I

Fig. 1. Recognition system diagram.

horizontal positions where the segmenting objective function was maximized. This method is able to cut most lightly touching characters; however, it was unable to separate heavily touching characters, particularly such as "oo', "oe" and "od", etc., because the pixel projection waveforms for these kinds of touching characters vary gradually due to the lack of the vertical strokes near the touching points. Figure 3 shows an example of touching characters "oo'. In order to improve the segmentation process, the authors propose two discrimination functions based on pixel and profile projections. The pixel projection and profile projection are described as follows: (1) The pixel projection is defined as {PXP(k), k = 1, 2..... LT}. It consists of the total number of black "1" pixels in each vertical column. LTis the length of the touching characters. (2) The profile projection is defined as {PFP(k)= TP(k) - BP(k), k = 1, 2..... LT}. TP(k) is the top profile of the external contour of the touching characters as seen from the top; BP(k) is the bottom profile of the external contour of the touching characters as seen from the bottom.

Figure 3 illustrates these projections for the component "oo". Two segmentation discrimination functions based on the profile and pixel projections are then defined as follows:

F~(k)=[ PFP(k + L1)--2PFP(k) + P F P ( k - L 1 ) ]

(1) F'2(k) = [ PXP(k + L 2 ) - 2PXP(k) + PXP(k - L2)]" PXP(k) L _l (2) where L1 and L2 in equations (1) and (2) denote the distances between the current column and the adjacent columns L1 and L2 apart. L1 and L2 are determined empirically based on the size of the characters in the documents. As an example for text with "times" font of size 11, LI and L2 were chosen as 8 and 4, respectively. ~t which represents the power of F1 and F2 is typically chosen to be an integer larger than 1. Figure 4 illustrates the effect of ~ on the values of the discrimination functions. Figures 4(c) and (d) show the discrimination functions for ~t = 1 and 2, respectively. It is evident that

Segmentation of touching characters

111 11111 11111 1~11 3 3 3332 3333 322113 333332333 3323333 33333 33333 31333 33333 33133 33333 33333 33333 33333 33333 33333 33333

4 6666666 6666666 666666 66666

6686 6686 8866 6||8 185

7777777 7777777 77777

77777

66666

7777

......... 66666

I:1

6666

77777 77777

618

6666,

77 . . . .

183

82 66666 7777777 6666 777777777 616 66666 777 77777777 66666666 7777777 6666666 7777777 6666666 77777 66666 77777 66666 7777 6666 777 666 777 66 7

64646 4444444

4666466 444466 44444

44444 44444

66644 64644 44666 64664 66446 46666

212 222222 222222 122222 22222 22212 12221 2221 22222 SSS$S 22221222222222 2222222 222322 22222 2122 22222 22212 2222 22212 2222

22222

2222 2222 2222

22222 22212 22222

1222 2222

22222 22212

22122 22222 22211 33333 22221 23333 22222 13333 9 122212 333333 9 9 9 9 9 9 9 9 9 9 1122222 2322332233333233333331323313 233223 323 33 33

44464 4446~

44444 44444 444444

466444444 66466646

22222 21222 23231 222222 221222 222221

(a) 111 11111 11111

a 6666666 6666666

6666666 6666666

6666 6666

666666 66666

66666 66666

6666

66666

6666

6666 66666 6666 66666 66666 6666

6666 666

66666 666 66666 66 66666 666 666666 666 6666666 66 666666666 666 666 66666666

66666 66666666

6666666 6666666 66666

66446 6464644 6666444 466444

66666 66666 66466 46466 44466 44446

46646 44464

6666666

66444

6666661 66666 66666

44444 64444

66666

64666

6666

6666 666

666 666

66

6

66446 646644 664666466 66664646

222 122222 222122 222222 22221 2 2 22222 2222 22222 2122 2222 222222 22222 22222 112222221 21221221122222 2222222 2222222 222222 22112 12211 1222 22212 22222 21122 22221 2222 22222 22222 2222 22212 22222 2222 22122 22222 1222 21222 22322 2222 22221 22222 2222 21222 22222 2212 22221 21122 22112 22222 12112 22212 21222 22112 22222 22222 22222 222212 22222 22222 222222 11222 22212 2 222222 221222 222222 2222212222 2121222 2222222222222222222112222221 222221 222 22 22

(b) 111 11111 11111 1111 1

G¢66666

6666666

6666666 666666

G666666 66866

6666G

66666

66666

6666

6666 66666 6G66 66568

G6GG 6666

6666 6666 E66 666

E6666 66665

~6 G66 666

66666 666666

GG6G6 666GGG6 56 6666 666666666 666 G6166 666 66666666 66666666 6666666 6666686 6666666 6666666 66666 66666 66666 6666

666

66666 6666 666 666

66

6

1 11111 1211111 1111111 111111 11111 11111 11111 11111 11111

11111 11111 11111 11111 11111 11111 11111 11111 111111 111111111 11111111

222 222222 212222 221222 ] 11222 2 2 22212 2121 22222 2122 2222 222122 22222 12222 121212222 22222222222222 2212222 1222212 222222 22222 22222 2222 22222 22222 12212 22222 2222 21122 22222 2222 22222 11122 2221 22221 22222 2222 22222 22222 2212 22222 11222 2222 22222 22211 2122 11222 22112 22222 22122 21222 12222 22122 22212 12222 12222 22212 222221 12222 22222 222221 11222 22222 2 212222 222122 122222 2222222122 2222112 2222222122222222222222212222 222122 121 12 12

(c) Fig. 2. (a) First scan of component labelling ("w" has 3 labels, 'r has 2 labels and 'th' has 4 labels). (b) Second scan of component labelling ('w' has 1 label, 'i' has 2 labels and 'th" has 1 label). (c) Final result of the labelled "with" ('w', T and 'th" have 1 label). the values of the discrimination functions are increased significantly with ct = 2, especially when F1 and F2 are larger than 2. Using suitable threshold, candidate cutting points are derived for further processing. Figures 4(c) and (d) show four candidate cutting points for the component "oo" with a threshold of 11. It is noted that in this case, the correct cutting location is determined by the discrimination function F2(k) based

827

on the profile projection. Figure 5 illustrates the process for the component "th". In this case, it is clearly seen that the discrimination function FZz(k) based on the pixel projection yields candidate cutting points that contain the correct cutting location. We therefore define the discrimination function as follows:

F~(k), if FZ(k) > T&F2(k)> F~(k), if F2(k)> T&F2(k)< F ( k ) = , F~(k), if F2(k) < T& F22(k)> if F2~(k)< T& F2(k) < 0,

T T T T.

(3)

Several cutting point candidates, including some false cutting points, were obtained at the locations where the values of the discrimination function were greater than a specific threshold. The optimal cutting points were found after several iterations by applying the dynamic recursive segmentation algorithm. Some broken characters caused by false cutting points can be merged using layout context information. This will be discussed in the following sections. 2.2. The dynamic recursive segmentation algorithm The recursive segmentation algorithm with adaptive windows, developed by Casey and Nagy, (5) performed template matching after a suitable window was chosen. According to Casey and Nagy, if the classification failed, a different partitioning of the input pattern was tried. However, even for a specific font, proportional spacing characters make it difficult to choose a suitable window on the first trial. Tsujimoto and Asada used a break cost to segment touching characters. (4) They constructed a decision tree and a set of additional rules to obtain the character component sequences. However, their algorithm required intensive computation to build a decision tree and search for a correct path; the more touching characters in a word, the larger the decision tree would be. Also their algorithm tended to accept several paths in the tree structure concurrently. In our approach, we proposed a dynamic recursive segmentation algorithm that implements a forward segmentation or a backward merge procedure based on the output of a character classifier operating on the components generated by the cutting points. This procedure terminates when the segments generated by specific points are recognized by the character classifier with high confidence. Figures 6 and 7 illustrate the algorithm for two types touching characters: the double-O "oo" and the serif touching pair "th", and the triplet "thi". Figure 6(a) shows the candidate cutting points determined by the discrimination function F(k). The first cutting point yields segments P1 and P2 which are sent to the character classifier. The classifier recognizes PI as a valid 'c' while P2 is rejected as an invalid character. P2 is then segmented by the second candidate cutting point yielding P3 and P5. P3 is rejected by the classifier, while P5 is recognized as a valid "o'. The segment P3 is then merged with segment P1 and the resulting segment P4 is correctly classified as a valid "o'. At this point both the left and

1

1

L~

-I

....

~i!il. °..°.

"fill "lli.,

"::!i!ii::" (a) "::::::::" ....

0.0

~

~

~

k

rP(k)

BP(k) (b)

PFP(k)

PXP(K)

,!ro

iiiiiiiiiiiiii oo. ,, Iiiiiiiiiiiiii (c)

(a)

Fig. 3. (a) Touching component 'oo'. (b) Top and b o t t o m profiles, (c) The profile projection PFP(k). (d) The pixel projection ..:÷!!÷i!h:.

:i!'" .i!ii !i!i! i!!i! • ill:

"~:

..::::::-

.::::::: . . . .

:iiii ...... ~i~:.

"iiiiiiill iiiiii!i: diiiiiill |iii:':iil,

":ii!iiiii~ii:"

::::::.

.~i:: ...... :iii~: :÷i!i ...... ~i~:.

iiii iiii iiii ill:

fill iiiii "iiih :Jill

":!ili~ii:ii:'

"iiiiiliii iiiiiiii: ,fill÷ill! i|il:':iil.

":iiililii~ii÷:"

PFI~

.-,-

":ii~ii~:~ii:'

PXP(k)

"t

25.0

~0.0, 16.0

PXP(k).

~*'

I0.0 •

~o.~!"!"

5.0"

L

0.0

L"

.

.i

~ili..iil.

ii.

:'"!iiiiii!i.

~.:~:

' o~iiiiiiiZiiJiiiiiiiiiiiiiiiiiii~iiiiiii~

k

0.0

(a)

k

(b)

Fl(k ) = PFP(k+8)-2PFP~k)+PI~k-8)

IP l(k) = P ~ P ( k + 4 ) - 2 P X I ~ k ) + P X P ( k - 4 )

3.0 3.0 1.0 O.C

..'

,~ .~ii.



'7 I1~111kJl~

2.0 II

h

'~k

(c) Ff(k) 14.o f tz01:- ........

raa(k)

jjftf III

L ......

:oT

IIh ,...,o.

i2it

,~lllJlh, -~

12.

" ...............

> t3.o

I0, 8.0

" .............

segmenting

threshold

6.0 4.0 .

i

i

it_

k

(d) Fig. 4. (a) The double-o touching character and its profile projection. (b) The double-o touching character and its pixel projection. (c) The discrimination function values based on the profile projection and the pixel projection with ~ = 1. (d) The discrimination function values based on the profile projection and the pixel projection with ct = 2.

Segmentation of touching characters

:iiiiiii! "!!Fi

..ii!:...

":i!iil:::' il"i:l:i:::ih.

i!i!! !iiii ,,H,

ii'~i I!',~! .+°,,

iiii: !!!!!

,,,,, • o,,, ,,°o,

.,,o, ,,,o, ,,,.,

,,,., ,,oo, • ....

=~iliiii! "i!!!!

..ii!:...

.....

829

.....

"'iiiii"*" iiiii'~""'iii~. iiiii

e,,.,

iiill

iiill

"::iiii:!i!!i!!h::::::i!!ii!:

PFP(k)

PXP(k) 30.0

,o.,°,,,, ,,.,.°,,,,

_%._...: ,,,,,

,, !~itt|tl,q ..,, ,,o~ ,o.,~. .,,oo,,o,o..,o,,°.,,.,,,,,.

~o

,o ,o,,.

!i!!!!!!,n.!!!!!!!~n!!!!!!!!!i i!!i!!!!~!!i!'n'i-n!i!!!i!!l! !!!!!!!!,n.!!!!~!.~!!!!!i!!!! !i!!!i!!~!!!!!!!~!!!!!!!!ii

15.0.

lilillilllllllllllilllillllll

:llllllllllllllllli~.lll:lllll::. ,o,,,,,,,,,,,.,,,,,,,,,,,,,,,°,°,

!!",'! i!i!i !!!!!

o.o, ::-"-

ILk

0,0

!!i..i !i!! !!!!

15.0.

i!'i! !ii!!

,°qo, dill,"

0.0

(a) r~k) = (Prnk*81-~Pr~kl+~r~H~)' ~'~(i

"!:.. °,~. l:lll

(b) __( PXP(k+ 4)- 2P~P(k)+ PXP(k-4)pXP(k) ;

PrP(k)

14.0:

14.0 12.0.

12.0" iO,O"

leiimentlnll threshold

8.0

I0.0.

8.0. 6.0. 4,0. 2.0. O.C

6.0

4.0 2.0

I IIIh,

i

0.0

~- k

:::::IMHI::IIHI ,30 segmenting threshold

~k

(c)

(d)

Fig. 5. The serif touching character 'th'. (a) The profile projection. (b) The pixel projection. (c) The discrimination function based on the profile projection. (d) The discrimination function based on the pixel projection.

iiii

"iiililiii

iiii

BM-- Backlmrd Merge Pattern (FP)--rorward cut

(IP)--initial J"

~

(RP)

leg ~

P4

(RIP)-- Residue Pattern '*' --rejection

-,--P5

:l

(BP)~.~'o'

(RP)

r(k) '

'o'

(b)

'I It II 1~,

(BP) ~Ba©kward merle

-

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

'0 . . . . . . . . . . . . .

I!:0

4.0

g:0

(a)

8egmentlnll

threshold

'*'

'C'

'it'

'it'

'O'

'O'

-

O

(c)

output

Fig. 6. (a) The segmented patterns. (b) Graphic representation. (c) Time sequence of segmentation.

830

S. LIANGet

u

t

]¢~ut

al.

i

I-

3 segmenUng

'

threshold

'"l--t-I......

output

R

(.)

(b)

till

allI°

*r

IIl~'

Di°

t,...

P3

v

!--,411

P4

PS

P0"

i=10

P0

Mk) . . . . . . . .

I --nu-

output

threohold

:- k (c)

(d)

Fig. 7. (a) and (b) The serif touching character 'th' and its time sequence of the segmentation. (c) and (d) The serif touching character 'thi' and its time sequence of the segmentation. right segments are recognized with high confidence as valid characters ("o" and "o" in this example). Figures 6(a)-(c) illustrate the various stages of this technique. Figure 5 shows 12 candidate cutting points for splitting component "th" by using the discrimination function based on the pixel projection. In the segmentation process, even though cutting points sometimes deviate from the correct locations, the optimal cutting points Xc, t are found after several forward and backward cutting iterations. Figure 7 illustrates the proposed algorithm for two sets of touching characters "th" and

"thi". The first case involving "th" with two possible cutting locations is very similar to the case of "oo" discussed earlier. The second case is interesting and brings out the effectiveness of the proposed algorithm. Due to the figure space limitation, only five candidate cutting points (Figs 7(c) and (d)) are generated by the discrimination function. (1) Initially the triplet labelled P0 is segmented into two components P1 and P2 of which only P1 is accepted as a valid character "1" by the classifier.

Segmentation of touching characters (2) P2 is further segmented into P3 and P5, both of which are rejected by the classifier. (3) P3 is merged with P1 yielding P4 which is accepted by the classifier as "t". (4) P5 is further split into P6 and P7 of which only P6 is accepted by the classifier as "1" which P7 is rejected. (5) P7 is split into two components P8 and P10, both of which are rejected by the classifier. (6) P8 is merged with P6 yielding P9 that is accepted by the classifier as a valid "1". (7) P10 is split into P l l and P12, both of which are accepted as valid characters "I" and 'T'. At this stage we have a segmentation that yields four characters "t", 'T', ' T ' and 'T'. It is obvious that this segmentation is erroneous as the correct segments should have been "t", "h" and "i'. For touching characters that contain three or more characters, broken characters may appear in the sequence of the recognized characters. The error can only be resolved using contextual classes and other contextual information. This is discussed in later sections. Italic characters in a word were subject to slant correction prior to the application of recursive segmentation. "°)

3. FEATUREEXTRACTIONAND CLASSIFICATION Even in machine-printed documents, shape discrepancy among characters belonging to the same prototype is sometimes quite large because of the poor quality and low resolution of document images. Particularly, when touching characters arc segmented, the

,/7//

F/

(a)

831

noise blobs near the cutting points overlap both sides of the characters, possibly resulting in a large dissimilarity between the input pattern and the corresponding sample class. Image processing techniques such as border tracing or component labelling are able to remove noise blobs that are not touching the character component; however, it is impossible to remove touching noise blobs by simple techniques. Wc have developed a new feature, the Overlap-Neiohbour-DirectionFeature, that is largely based on the border chain code histogram proposed by Kimura and Shridhar. !1~)They independently calculated a local histogram of the quantized chain codes (0 for "-", 1 for "/", 2 for "V', 4 for "r') in 4 x 4 rectangular zones (Fig. 8) (see referenee (11) for detail). The modified feature is then extracted as illustrated in Fig. 9, where there are six overlap regions across the interface of each sub-rectangle; three vertical strips with width m2, and three horizontal strips with width ml. When a local chain code histogram is calculated, the direction chain codes in the overlap regions contribute their direction values to the histograms of the related local sub-rectangles. Kimura's feature extraction method of Kimura's created a large distance between the two "d"s shown in Figs 8(a) and (b) because the rightmost vertical stroke of two " d ' s occupy the different sub-rectangles. However, the Euclidian distance between the feature vectors of the two input patterns in Figs 9(a) and (b) decreases with the modified feature by properly choosing two parameters, ml and m2. Our experiments indicate that with the Overlap-Neighbour-DirectionFeature, the Euclidian distance between the input patterns and the stored prototypes decreases by approx-

direction

0

4e°(/) 9o* (I) l ~ " (~)

o 2o

0

(b)

direction

(c)

number

O* ( - )

number

0° ( - ) 45°(/) 90 ° ('1)

0 o 40

~:m° ('0

o

(d)

Fig. 8. The direction chain code feature. (a) and (c) Normalized character 'd's and their chain codes in a sub-rectangle. (b) and (d) Direction chain code histogram in a sub-rectangle.

832

S. LIANGet al.

a

r-

-

- -

direction

number 0 0 56 0

(-) 4~ (/9 o*

90' . . . . .

.J

(a)

(b)

";--....

ml-'~"

!1

]--,m,-T

II i-'

,' ~ 1 '

___J'__i

direction

number

o* (-) 45" (/)

0 0

90"/I}

56

1 ~ "° ( \ )

0

(c)

(d)

Fig. 9. Thc Overlap Neighbour Direction feature.(a)and (c)Normalized character'd'sand theirchain codes in an extensivesub-rectangle.(b)and (d)The directionchain code histogram in an extensivesub-rectangle.

imately 23~o on the average when ml = 10 and m2 = 8 in the 64W x 80H pixel unit rectangular frame. The minimum-distance classifiercl 2,13)is an effective technique for classification problems in which the pattern classes exhibit a reasonably limited degree of variability. For a specific and clear machine printed text, the pattern of each class tends to cluster tightly about a typical or representative pattern for that class. Under these conditions, a minimum-distance classifier can be a very effective approach to the classification problems. Consider M pattern classes and assume that they are represented by the mean feature vectors of the training classes, i.e./~1, #2 .... , Pro. The Euclidian distance between an input feature vector X and the ith class is given by D , = I I X - / ~ , ) l = ( X - - / ~ , ) T ( x - - ~,).

(4)

In the experiments, X is a 64-dimensional feature vector, i.e. X = ( x 1 , x 2 , . . . , x 6 4 ) T, where the superscript T denotes the transpose. Each component x~ of X, for i = 1, 2.... ,64, is composed of a local histogram of overlap-neighbour direction chain codes of a character shape boundary based on 4 x 4 subsquares of a normalized binary pattern. A minimum-distance classifier

based on a feature extraction approach computes the distance from an input pattern to the mean vector of each class and assigns the pattern to the category of the closest mean, i.e. X is classified to the class i if D i < D j, for all j # i. 4. WORD ANALYSISBASEDON CONTEXTUAL KNOWLEDGE 4.1. Character contextual classes

Using contextual information to analyze a recognized word can improve the performance of the text recognition system. Characters occurring in a text line are generally related to one of the classes based on their location with respect to the baselines,t14) In Table 1, the classification of character contextual classes is summarized for the fonts used in our application. To improve the flexibility and applicability of our document recognition system, we have made use of size and font information to achieve an accurate contextual class description of the character components in a particular text line. The "t" and "i" are aggregated into a specific contextual class, because they have a smaller number of pixels above the upper baseline than the other Ascender characters in a specific font. This

Segmentation of touching characters

th

It

matched ¢I

833

C I -- C l a s s I ( A m c e n d e r ) C 2 -- C l a s s 2 ( A J c e n d e r ) C 3 -- C l a s s $ ( c e n t r e d )

,hnot matched

m

nt

o,o,o,\ matched class 111 not matched merged 'm' Fig. 10. Merging broken characters with the knowledge of the character contextual classes. eliminates the ambiguities that easily occur between ("i.... I'), ('T"T'), ("t""f") etc. Character contextual classes define character sets as statistically disjoint from the results of classification. The incorrect segmentation achieved earlier for the connected component "thi" can be resolved by using contextual classes. As shown in Fig. 10, the combination "th" which can often be split into three valid characters "t", 'T" and "I" can be corrected by recognizing that "1" and "I" do not belong the same contextual class and merging the two segments "1" and "I" into one segment that is accepted by the classifier as a valid "h". Thus using character contextual classes, the component "thi" is correctly split into "t" and "h" and 'T'. To determine the contextual features of character components, the baselines of words are required. The most straightforward method to determine these baselines is to find the character locations with respect to

Description

1

Ascender

2 3 4 5

Ascender Centred Descender Full-height

Example

an "m" is regarded as "r" and "n" or "r', "r" and 'T'; an "n" is replaced by "r" and "I"; a "B" is segmented into "I" and "3"; an "h" becomes 'T' and "r"; a "U" is separated into ' T ' and "J"; etc.

All capital letters, numerals, and b, d, f, h, k, 1 i, t (not for all fonts) a, c, e, m, n, o, r, s, u, v, w, x, z g, p, q, y j

] ,~aid th~ lx~iic~ i'n-ln~x uhe, ,.Giverxltv i . :i.,~ ,

q

~

I

-

Fig. 11. Baseline detection in a segmented text line.

4.2. Mergin 9 broken characters The dynamic recursive segmentation algorithm described in Section 2 may decompose one character into two or three valid but incorrect character components. Here are some examples:

Table l. Character contextual classes. Class

the maximal value of the first-order difference of the pixel projection of the text line image in the horizontal direction/5) Unfortunately, naturally skewed text lines always appear in practical document images, resulting in incorrect character contextual class assignments, Projection of the segmented text lines can be used to relieve this problem (see Fig. 11). We took advantage of a two-phase detection of baselines. The first phase was based on the baselines extracted from isolated words; failure of this phase prompted the activation of the second phase which extracted the baselines from segmented text lines. These baselines were used to assign a contextual class to an input character component. This two-phase strategy makes the process more accurate and effective.

Therefore, a merge procedure based on the similarity calculation is performed to rejoin the broken characters in the following steps:

I

(1) If the characters such as "1", "I", "J", "r", "n", "t", etc. appear adjacently in one word, they are considered to be candidate broken character components.

834

S. LIANGet al.

(2) An adjustable window with a maximum size of three character units is moved from left to right across the input word to check if two or three pieces of the character components mentioned in step (1) appear adjacently and simultaneously in the window. If any such character components are detected, a classifier with a specific rejection threshold is applied to the image array corresponding to those detected character components. Any rejection indicates that the input array contains two or more distinct characters; otherwise, the composition of the character components probably is one character. (3) The merge procedure is not applied if the character components to be merged are not in correct character contextual classes. For instances, the character "m" is only composed of Class 3 (Centred) broken components. The character "h" may contain a Class 1 (Ascender) broken component in the left side and a Class 3 (Centred) broken component in the right side. The touching character "It", even if accepted by the classifier, can be prevented from being merged to "h", because the touching character"It" is composed of one Class 1 (Ascender) component in the left side and one Class 2 (Ascender) component in the right side, violating the merge rule for the character "h" (Fig. 11). 4.3. Character verification If the output of the classifier do not match to the character contextual classes to which the characters belong, the characters may be corrected according to their contextual classes. For example, both the lower and upper cases of the characters c, o, p, s, u, v, w, x, y, and z can be distinguished by applying contextual class information. Some characters, such as "i" and "j" or "t" and "f", etc. may be misclassified after normalization because of their similarity in shape. The character verification procedure is applied to eliminate these ambiguities. If the classifier outputs for a given pattern does not match the contextual classes to which the pattern should belong, the second character candidate accepted by the classifier with matched contextual class will be considered as a possible substitution. For example, if the input component is recognized as 'T' and its contextual class is 5 (i.e. the number of the pixels above the upper baseline and below the lower baseline is greater than a specific threshold), then the "i" will be replaced by an accepted candidate "j" which is a class 5 character (Table 1). Since the character verification process is not absolutely free of error due to baseline deviation, the final replacement is made only after spell checking.

5. SPELLING ERRORS A N D SPELLING CORRECTION

The dictionary look-up method is the most reliable way to ascertain the character level context, t16-17~ The main problem with dictionary look-up methods is the large size dictionary (at least 50,000 words) and consequent costs in memory size and searching time re-

quired to handle a realistic vocabularyJ ~6} Because speed is essential, the spelling correction program should be simple and efficient. Fortunately, the UNIX operating system provides a powerful spell command and a "system" or "popen" function in UNIX C that allows one to execute standard UNIX commands in C programs that makes "on line" spelling correction possible. We assumed that the dictionary in the UNIX system included most English words. To avoid making the correction procedure too complicated, we also assumed that spelling errors in a word came from one of three types of errors (substitution, inserted, and merged character errors), the total number of substitution and insert errors in one word did not exceed one, and the number of merged character errors did not exceed two. Generally, words with more than two different types of error cannot be corrected without human intervention. The procedure consists of the two following steps. 5.1. Spelling errors The words produced by the recognition system after the character verification phase are spell checked by the UNIX spell program. If the given word passes spelling checker test, it is assumed to be a correct word; otherwise, the error correction procedure is employed until a correct word is obtained. If the correction procedure fails to correct the error in the word, the system chooses the first candidate word as the final output. After merging broken characters and verifying characters with contextual class information, we found very few words to be rejected by the spell checker in our case. Only those words absent from the UNIX dictionary were referred to the error correction procedure. 5.2. Spelling correction Correction of substitution character errors. All characters in a word have corresponding sequential candidates. The number of character candidates corresponding to one character component is equal to the number of classes which have accepted Euclidian distances to the input pattern. These candidates are sorted in the order of their Euclidian distance values. The character candidate with the smallest Euclidian distance to the corresponding input pattern is placed at the top of the rating table. The algorithm outputs the first candidate word accepted by the spell checker. Correction for merged character errors. Touching characters which have a similar shape to some isolated characters are frequently recognized as specific characters. These touching characters are even confusing to human beings if they do not appear in a word level. For example, touching "r" and "n" are regarded as "m" and touching "n" and "n" can be interpreted as "n" and "n" or "r" and "m". It is impossible to eliminate these ambiguities without spelling tools. The correction method proposed here is based on the knowledge of

Segmentation of touching characters

t h o s e t o u c h i n g c h a r a c t e r s . I r a w o r d w a s a g a i n rejected b y t h e spell c h e c k e r after s u b s t i t u t i o n e r r o r c o r r e c t i o n , we a s s u m e d t h a t t h e r e w a s p r o b a b l y o n e o r t w o m e r g e d c h a r a c t e r e r r o r s in t h e w o r d . B y u s i n g p o s s i b l e c h a r a c ter c o m p o s i t i o n s to s u b s t i t u t e t h e s e m e r g e d c h a r a c t e r s , a series o f w o r d c a n d i d a t e s were o b t a i n e d . T h e first c a n d i d a t e to p a s s t h e spell c h e c k e r b e c a m e t h e o u t p u t word. C o r r e c t i o n o f inserted c h a r a c t e r errors. It is very difficult to delete i n s e r t e r r o r s w i t h o u t s o m e c o n t e x t u a l information. Insert errors are always caused by noise c o m p o n e n t s w h i c h c a n n o t be e l i m i n a t e d b y a n o i s e s m o o t h i n g filter. O b v i o u s l y , a r b i t r a r i l y d e l e t i n g

II

835

characters in word strings is impractical. Generally, inserted character errors occur in situations where the character contextual classes of those inserted components do not match the recognition result of these components. For instance, the pixel numbers of inserted characters are always below the specific threshold for a given character height. Hence, using context information is an important clue to delete the insert characters with a least amount of risk. 6. EXPERIMENTAL RESULTS

The documents used in our experiment were based on 12 pages of"NEWS LINE", a University of Windsor

I

Teaching, research do mix, Smith says "GivinI pmf(s~on ~beoRponUm~l Io improve Ibeir zeadmll does B x o41~m~ a umvennw's ~ m s e ~ c ~ . " Smart Seutb mid a plemry semum ot tbe Ph)~actsm (CAP) be~ June 17. S"n'zh U the ambor ol" Uze 1991 Un,vtmty E d ~ a m L Last November. U ~

"rnc a.d~nce oC l~ys~im rmmved S-,,,h'sc e e m m e m m / w U mine skqx~

c u ~ and Smith f o w l bimseJf d~fe~Im8

Im rmommmdmmmmmng m,*q u e m m

pmo~

Pk~m

"Sooew believes the main funcmm or" a umvemty zs Ibe (bsmmmm~ o4" bx)wledlle, but every pzece o4"e v ~ a c e mM(m.~ pm(em~n believe mmvenm~

e.tm foe pmfcsmr~ m ~o msr.arc~" m d Smvh a fecm~bead o f ~ e S ~ C(x~ ~ Ca~ "Bin kCs I~ frmk. T'mm~,, a Im of mm~l-rm~ n~aun~ beiag dine ~ ~ ~ m wouM m ~ r ~ ixXZml¢ ~ ene~Zy imo ~ ¢ l b ~ , . " H¢ uud soc~y bm b o u l ~ into ibe idea o/leamu~ born scaobn evm tlzou~

it ~ l d be ~ess~ t l ~ it mUeles u m p t Ibe fwst two ~e~,s ol uatven~/. Hmvevet, o ~ tbe Ima 20 ~ a n ~ slmnkmll ~-,,,~. for umva~uum, d~en~mb~r oL boom ix~on spat u~ctl~ll aom~Jy went dowu. Proks:~,~ sboukl I~ve the f l e x ~ y

m~

m be m m l u ~ l oa m.,.~ o0mnlm.

¢C.U')~r~ feum Wt. Fn~-me Bnd~ CAP ~ . , N/l~ Hed~eem~ CAP mud W'uwb~rp i ~ pmfener m d Jolm MkblJmm~ CAP pru~eu r u e ~ m r ~or AMm~ EAsrlr~ ~ ~ ( CR&P p ~ ; •d ~ n a ~ m divie~m~~

III

Windsor gives physicists positive charge Over ,I00 phy,*~,-- barn a a x ~ ~---,t•,,,-,~-am e C m m m s o¢ ,t,,.~ - " d m W'mdsar's ~ o(Pbys~cs from j , n . I0 io 14. N a b d Pdm-w~.,~_ ~ K,m e ~ W',k.--,o p m m m e ,,--,l~ss w ~ h a publk: kcm~ m w b ~ be ou~ned h " VIes for

~

~ me..,.,demmmm.

~-~.,a-,S ~ F r m k J ~ labomory ram. al~e~,w m Mndle~ m l p m m a m .

"jump ~ 1 1 - ~ ~ eaac~mn s,can~ ~ m i~1~a cac Whm t b e ~ t , ~ " - " t g a ~ lmu kaa~ a I ~ t ts nat I m l ~ o n ¢ WMI ~ 1 b e ~ tbe

Teaching Awards

l~n m ~ q m r ~ d ~ d m ~ m o m l~wod[ W i m "

eth~amM ma~mhL ~

w~

who bells m edm:mon-n:~n~ m~mve

uom m reseamu er m mining, u.,,~,~.~ m S ~ u ~ who weuld ~ : d a ~ a s m ~ Fe~:mu~e ~ resew~ f - ~ g wlds W'mm for e(*,'~r_~"'aJd~v*:._~_k~,m~L He uud C ~ u ~ M ~ mx

c~ed i ~ e ~ D u a , v ~ , . q ~ m m • n ~ r jeb ime01v~,~-J~ . . . . c~lal~rmm md z ~ l m m | for acnve u ~ h e n md cmv-,,,~-n ~ n n m d ~ - ~ , ~ p foe m,* cducanon

suff¢~d from fundmB mlmnfalb m ] ~ m ~

"The dew'k-~,,,,,,,~ we (m]oy Ioday m cmmmzczmms. ~ and oUz~ fleklsa m .*,,.maua oL a o o a m m a m Imx~ss

man umpum, UqUways am ~ c ~ s . " U m v e n m a do n m have zbe do~ian mey mln~ ~ ~ but who does? W e do em~ M v e a H a m m J but ~ e sysmm u noc m ~rms." S m ~ h whined ~at d"u m v e m u e s do m x warn ,n be it~lsed aztBIranly by M~cMa4"& maSmun¢ mey sbou~ con4u~ and pu~c~-,. ~eur own surveys o~ uudem, IFmmue and empim~-r samU[acuon.

De~wmmt bead P,~nlec:bey S d i l m n l ~ md P n l m o r C,ordm DrakL co-¢mm o ( ~ ~ u,,--,--,,,ored~ ,h,. ~ c3(d z ,,,ml~m ~ tbe fuU

From page one aeve~

d m | n and

ot imavsnve ~

The avnsas w ~ e lmmmmd ,~ ,k, 504h .kdmzvm~ a dze ('b,~ o( '42 din,v.-. Also i m m m l d w m ~,. A l u m m A w l l d l~m~ w Mqm.C~ I ~ pnhm.,r. B A '4& •"a J m a ~ Cad Z a ~ . B A "49. W,h",-- w m C I ~ (~rd Z i m p m y barn 197S Io 19~19.lqe is a p m ~ S c amll~r aml in 1978 w l m~le a ~

o(dae Old~ o( ~ Mere. 7Jdev ~ aRmmeed ,emer )~lle o( ¢Je

shm~l apldy m d " educaB~ ~ l e m m I~1." say~ Wibo~ ~ ' s mine, ~mhm$ sbeuid be cdlalmma~ m u ~, m Jq~m. P I ~ wa~ W m a ~ on r ~ m ~

Disma Co.n ~ ,~" C a u m y ~r E u ~ m |975. has l,e m a m ~ e r a[ Ibe advkmry bared o(tbe clinicallaw prOlgum m d w m

d e ~ k R m * ~ e d u c m ~ m d the m m ~ of e c b n o k ~ born me labor~ry ;o .~-

Annual Ad~xacy '_,--~,~_~,s. l m a m l oL C m ~ m m 8 Lcpl , ~ at U u i ~ -

s,y of MidUgmz Fig. 12. A sample document.

836

S. LIANG et al.

Teaching* researcher do mix* Smith says ''Giving professor the opportur{ii}ty to improve t{iI}eir teachi{rI}g does not obstruct a u{rl}iversity's best researchers.''{s}tuart S{rrl}ith told a plenary session of the {c}o{rI}gress of the Canadian Association of P{iI}ysicists ({c}AP) here June 17, s{rrI}ith is the author of the {1}991 report of the Co{rrI}{rrI}ission of Inquiry on Canadia{rI} University Education. Last November, the {II}{rI}iversityse{rl}ateestablishedco{rrI}{rrI}ittees to respond to that report, The audience of phys{l}cists received S{rn}ith's co{rrI}{rrI}entary with some skepticism and {s}{rrl}ith found himself defending his reco{rrI}{rrl}endations d{ll}ring the question period. ''Society believes the {rrI}ain function of a u{rI}iversity is the disse{rrl}i{rI}ation of knowledge, but every piece of evidence suggests professors believe u{rl}iversi{l}ies exist for professors to do research.'' said S{rrl}ith, a former head of the science Council of Ca{rI}ada.''{13}ut Iet's be frank. There is a lot of second-rate research bei{rI}g done by people who would rather be putting their energy into teaclIing.'' He said society has bought i{rl}to the idea of lea{m}ing from scholars even though it would be less costly if colleges taught t{iI}e f{l}rst two years of university. However, over the past 2{0} years of s{iI}ri{rl}king funding for u{rI}iversities, the n{Ii}{rn}ber of ho{II}rs professors spent teachi{rI}g actually went down. Fig. 13. Results after applying the dynamic recursive segmentation algorithm.

publication which contained 8 pages with 40~o touching characters and 4 pages with about 60% touching characters. A typical document image is shown in Fig. 12. The document images were scanned with a 300dpi scanner connected to an IBM 386. All algorithms in this paper were written in C, and executed on a SUN SPARC II work station. We collected and constructed the character sample classes in five fonts individually drawn from the practical document images. The number of samples in each class ranges from 6 samples per class to 100 samples per class, depending on the characters available in the training documents. The clustering technique "K-Means ''{13) was used to specify and partition the given data sample sets. After clustering our raw sample data, each sample class set was subdivided into at least one subclass based on a selected threshold. Text, pictures, and graphic blocks were identified, and non-text parts were removed with the block segmentation and text discrimination approaches proposed by Wong et alJ s) It is conceivable that both segmentation and recognition may be considerably improved by using contextual information. In our case, layout contextual knowledge dealing with baseline information of text lines and the location of the character components with respect to their neigh-

bours was used to assign a specific class to all the character components in the text lines. We successfully merged the broken character components by using the character contextual classes. A multiple-font system is realized by putting each of the font symbols into a separate class. To obtain high efficiency and minimal computation, font recognition is essential in the recognition system. Different prototypes for each class are constructed for different fonts by tagging corresponding font labels. There is little difficulty in classifying an unknown input pattern and assigning a font label to it at the same time. Since it is not guaranteed that every word contains only one font label, we used the font hypothesis process. The "frequency" of font labels is defined as the ratio of the number of the characters in a word belonging to font i to the total number of the characters in the word. It is used to determine mono-font states. The word pattern is recognized as being composed by font i if the "frequency" of font i in the word is greater than the "frequency" of any other fonts in the word. After the font is identified, the multiple font system is switched to the mono-font state. The mono-font state is not changed until the dissimilarity score, the average Euclidian distance of the characters in one word, is

Segmentation of touching characters

837

Teaching* researcher do mix* Smith says ''Giving professor the opporturnty to improve their teaching does not obstruct a university's best researchers.''{s}tuart Smith told a plenary session of the {c}ongress of the Canadian Association of Physicists ({c}AP) here June 17, smith is the author of the {1}991 report of the Commission of Inquiry on Canadian University Education. Last November, the university senate established committees to respond to that report, The audience of phys{l}cists received Smith's commentary with some skepticism and {s}mith found himself defending his recommendations during the question period. ''Society believes the main function of a university is the dissemination of knowledge, but every piece of evidence suggests professors believe universi{l}ies exist for professors to do research.'' said Smith, a former head of the science Council of Canada.''But Iet's be frank. There is a lot of second-rate research being done by people who would rather be putting their energy into teaclIing.'' He said society has bought into the idea of lea{m}ing from scholars even though it would be less costly if colleges taught the f{l}rst two years of university. However, over the past 2{0} years of shrinking funding for universities, the number of hours professors spent teaching actually went down. Fig. 14. Resultsa~ermergingbrokencharacters.

higher than a specific threshold. The font hypothesis process is then repeated. In this paper, we mainly focus on the segmentation of touching character. Therefore, only five fonts of the text are included in our system. The approaches proposed for segmenting touching characters utilize multiple techniques. To make our interpretation more clear, we use Figs 13-16 to illustrate the application of various steps of each individual technique. Due to space limitation, only a part of results of the original document image Fig. 12 are illustrated. Figure 13 shows the results of the dynamic recursive segmentation algorithm based on Fig. 12. The curly braces were used to indicate the errors made by segmentation and recognition algorithms. Figure 14 shows the results after word contextual analysis including merging broken characters. Figure 15 shows

the results after character verification. Figure 16 shows the results after spelling correction. Most touching characters, including the italicized fonts, were successfully separated. The recognition accuracy was 99.85~o for the 8 page documents with 40~o touching characters and 99.4~o for the 4 pages containing about 60~otouching characters. The recognition errors did not take into account small punctuations such as "," and ".", and those symbols that were not included in our sample sets. Recognition errors in our system stemmed mainly from the following sources: (1) Some substitution errors, merged character errors were not corrected by spelling correction because the correct words were absent from the system dictionary, e.g. names of people or cities.

Table 2. Summary of segmentation and recognition results Total number of characters 54,000

Total number of errors

Substitution errors (40~o)

Broken errors (50~o)

Merged errors (10~o)

185

74

92

19

Inserted Average errors (10~o) recognition rate 0

99.65~o

838

S. LIANOet al.

Teaching* researcher do mix* Smith says ''Giving professor the opporturnty to improve their teaching does not obstruct a university's best researchers.''Stuart Smith told a plenary session of the Congress of the Canadian Association of Physicists (CAP) here June 17, smith is the author of the {1}991 report of the Commission of Inquiry on Canadian University Education. Last November, the university senate established committees to respond to that report, The audience of phys{i}cists received Smith's commentary with some skepticism and Smith found himself defending his recommendations during the question period. ''Society believes the main function of a university is the dissemination of knowledge, but every piece of evidence suggests professors believe universi{t}ies exist for professors to do research.'' said Smith, a former head of the science Council of Canada.''But Iet's be frank. There is a lot of second-rate research being done by people who would rather be putting their energy into teaclIing.'' He said society has bought into the idea of lea{m}ing from scholars even though it would be less costly if colleges taught the f{i}rst two years of university. However, over the past 2{0} years of shrinking funding for universities, the number of hours professors spent teaching actually went down. Fig, 15. Result after character verification.

Segmentation of touching characters

839

Teaching* researcher do mix* smith says ''Giving professor the opportunity to improve their teaching does not obstruct a university's best researchers.''Stuart Smith told a plenary session of the Congress of the Canadian Association of Physicists (CAP) here June 17, smith is the author of the {1}991 report of the Commission of Inquiry on Canadian University Education. Last November, the university senate established committees to respond to that report, The audience of physicists received Smith's commentary with some skepticism and Smith found himself defending his recommendations during the question period. ''Society believes the main function of a university is the dissemination of knowledge, but every piece of evidence suggests professors believe universities exist for professors to do research.'' said Smith, a former head of the science Council of Canada.''But let's be frank. There is a lot of second-rate research being done by people who would rather be putting their energy into teaching.'' He said society has bought into the idea of learning from scholars even though it would be less costly if colleges taught the first two years of university. However, over the past 2{0} years of shrinking funding for universities, the number of hours professors spent teaching actually went down. Fig. 16. Resulta~ers~llingcor~ction. (2) Some incorrect words were accepted by the spell checker before or while the correction procedure was applied. (3) Incorrect words with more than one substitution or two merged character errors were not corrected. In Table 2, we list the number of errors and the error type distribution occurred in our recognition system. About 30~o of the errors were broken errors. They were caused by deviation of the baseline detection and the reject threshold set for merging the broken characters. About 60~o of the errors were substitution errors between 'T' and "1", "0" and "O", etc. About 10~o of the errors came from merged errors that were not corrected by spelling correction because more than one type of error occur in a word. This is a good result for texts with many touching characters. However, there are many factors that govern the recognition rates. These factors include the quality of the image, image resolution, character font types, and the degree of overlap and touching between two characters. 7. CONCLUSION We have developed an effective segmentation strategy for segmenting a word into its character corn-

ponents. With multiple segmentation technique, we have provided robustness to this strategy, and we feel that this technique can be introduced into an overall document recognition system. These segmentation techniques are very efficient and can be applied in real time if necessary.

REFERENCES

1. Y.Y. Tang, C.Y. Suen, C.D. Yan and M. Cheriet, Document analysis and understanding: a brief survey, I st Int. Conf. on Document Analysis and Recognition, SaintMalo, France, pp. 17-31, October (1991) 2. George Nagy, Teaching a computer to read, l l t h I A P R Int. Conf. on Pattern Recognition, Hague, Netherlands, pp. 225-229, 30 August-3 September (1992). 3. S. Kahan, T. Pavlidis and H. S. Baird, On the recognition of printed characters of any font and size, IEEE Trans. Pattern Analysis Mach. lntell. PAMI-9(2), 274-287 (1987). 4. S. Tsujimoto and H. Asada, Resolving ambiguity in segmenting touching characters, 1st Int. Conf. on Document Analysis and Recognition, Saint-Malo, France, pp. 701709, October (1991). 5. R. G. Casey and G. Nagy, Recursive segmentation and classification of composite character patterns, Proc. 6th Int. Conf. Pattern Recognition, Munich, Germany, pp. 1023-1026 (1982). 6. Chinmoy B. Bose and Shyh-shiaw Kuo, Connected and degraded text recognition using Hidden Markov Model,

840

7. 8. 9. 10.

11.

S. LIANGet al. Hague, Netherlands, pp. 116-119, 30 August-3 September (1992). Teruo Akiyama and Norihiro Hagita, Automated entry system for printed documents, Pattern Recognition 23, 1141-1154 (1990). K.Y. Wong, R. G. Casey and F. M. Wahl, Document analysis system, IBM J. Res. Dev. 26(6), 647-656 (1982). A. Rosenfeld and A. C. Kak, Di#ital Image Processing, 2nd Edn, pp. 347-349. Addison-Wesley, London (1982). S. Liang, M. Shridhar and M. Ahmadi, Efficient algorithms for segmentation and recognition of printed characters in document processing, IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing, Victoria, British Columbia, Canada, Vol. 1, pp. 240-243, 19-21 May (1993). F. Kimura and M. Shridhar, Handwritten numerical recognition based on multiple algorithm, Pattern Recognition 24, 969-983 (1991).

12. R.O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, pp. 26--27. Wiley, New York (1973). 13. J.T. Tou and R.C. Gonzales, Pattern Recognition Principles. Addison-Wesley, London (1974). 14. P. G. De Luca and A. Gisotti, Printed character preclassification based on word structure, Pattern Recognition 24, 609-615 (1991). 15. R. M. Bozinovic and S. N. Srihari, Off-line cursive script word recognition, IEEE Trans. Pattern Analysis Mach. lntell. PAMI-11(1), 68-83 (1989). 16. C.J. Wells, L. J. Evett, P. E. Whitby and R. J. Whitrow, Fast dictionary look-up for contextual word recognition, Pattern Recognition 23, 501-508 (1990). 17. H. Takahashi, N. Itoh, T. Amano and A. Yamashita, A spelling correction method and its application to an OCR system, Pattern Recognition 23, 363-377 (1990).

About the Author--Su LIANG received his B.Eng. in electrical engineering from Dalian Marine College, Dalian, China, and M.Sc. from the University of Science and Technology of China, Hefei, China, in 1983 and 1988, respectively. He is currently a Ph.D. candidate in electrical engineering at the University of Windsor, Canada. His research interests are in the area of document image processing and pattern recognition.

About the Autbor--M. SHR1DHARreceived his M.Sc. in electrical engineering from Polytechnic Institute of Brooklyn, New York, in 1968, and Ph.D. in electrical engineering from the University of Aston in Birmingham, U.K., in 1970. From 1969 to 1985, Dr Shridhar was a faculty member in the Electrical Engineering Department of the University of Windsor, Canada. Since 1986, he has been with the University of Michigan-Dearborn where he is currently a Professor and Chairman of the Electrical and Computer Engineering Department. His research interests are in the area of image processing, pattern recognition and computer vision.

About the Author--M. AHMADIreceived the B.Sc.(EE) degree from Arya Mehr University in Tehran, Iran, and Ph.D. degree from Imperial College of London University, London, U.K., in 1971 and 1977, respectively. He has been with the Department of Electrical Engineering, University of Windsor, Canada, since 1980 and holds the rank of professor. His research interests include design, stability and realization of 2D and N-dimensional digital filters; image restoration; pattern recognition and computer vision. He has coauthored the book Dioital Filtering in I-D and 2-Dimensions, Design and Applications, published by Plenum, 1989. He has published over 170 articles in the above area. Dr Ahmadi is a Fellow of the Institution of Electrical Engineers in England (FIEE) and is a Senior of IEEE.