MergeLayouts - a Comprehensive Voting of Commercial OCR Devices

Stefan Klink (author for correspondence), Thorsten Jäger
German Research Center for Artificial Intelligence GmbH (DFKI)
P.O. Box 2080, 67608 Kaiserslautern, Germany
E-mail: {klink, tjaeger}@dfki.de
http://www.dfki.uni-kl.de/~klink/
Phone: (+49 631) 205-3503, Fax: (+49 631) 205-3210
Abstract

Sophisticated voting techniques described in recent literature have focused more or less on isolated voting of classification results. Most often, character or word classification results of different classifiers are combined to achieve more reliable results and to overcome a single classifier's weaknesses. We present a more comprehensive voting approach that takes entire layouts obtained from commercial OCR devices as input. Such a layout comprises segments of three kinds: lines, words, and characters, each of them carrying several attributes, e.g. the recognized text, the identified font height, coordinates within the original image, etc. By combining all those attributes, we attain a "better" layout, representing the original page layout as well as possible. The voting process itself is hierarchically organized, starting with the line segments. For each level, a search tree is spawned and all fellow segments (segments from different layouts that denote the same image area) are established. Crucial to MergeLayouts is a similarity measure defined on segments, which allows for an efficient heuristic search process and guarantees reliable page layout retention.

Keywords: OCR, Voting, Layout Attributes, Page Layout Retention.
1 Introduction

In recent years, sophisticated voting techniques have made their way from research prototypes with a limited application focus to commercially available products like PrimeOCR [1]. In general, voting is widely accepted as a means of improving classification results by combining several distinct classifiers working on the same classification problem. This technique has proven especially useful for combining OCR results. One reason for this might be that OCR is such a complex and challenging task that a single OCR device cannot perform well (or rather: sufficiently
well) on all kinds of input. On the other hand, OCR is one of the most crucial parts of every document analysis system aiming at automatic document processing. This dilemma gave researchers an early motivation to look for efficient methods capable of improving OCR accuracy - leading, among other approaches, to the voting technique.

In this paper, we focus on voting over OCR results obtained by commercial OCR devices. To the best of our knowledge, all published material on this topic has focused on more or less "isolated" voting approaches, mostly combining classification results of individual word segments or character segments [2]-[9]. We present a more comprehensive approach, combining entire layouts. Hereafter, a layout is considered to be a structure of line, word, and character segments with certain attributes. The following are typical attributes of such a text segment:
• coordinates indicating a segment's position within the original image
• font attributes, e.g. bold, italic, underline, font height
• recognition result(s): the recognized text for the segment's image portion.
Most often, when people speak of OCR, they refer to the latter point - the classification of digitized image portions into ASCII character(s). Nevertheless, the other attributes are worth recognizing with high accuracy, too. Thus, why not vote on all attributes and obtain a result that preserves the original page layout as precisely as possible, including a voting of the recognition results? The resulting reliable layout structure offers great benefits for subsequent document analysis steps, e.g. for content analysis (cf. [9]). Additionally, layout attributes might also support the identification of matching segments from different OCR devices, improving a voting algorithm based solely on recognition results. To this end, we introduce a search space in which each state represents a certain set of segments to be combined. To estimate a state's quality, we introduce a similarity measure defined on all segments of the same kind,
e.g. on all line segments. With a quality measure defined on states, standard search algorithms can be applied to obtain an optimal path in the search tree, which denotes all matching segments from the different OCR devices. The matching segments are then combined by voting on all attributes, including the recognition results. To do so, classic voting techniques, e.g. majority voting, might be applied. We will not only discuss some theoretical aspects of this approach but also present results achieved by a prototypical implementation, demonstrating its suitability, efficiency, and flexibility.

In Section 2 we describe our system MergeLayouts from a global point of view. The two essential processing steps, planning and merging, are described in more detail in Section 3 and Section 4, respectively. In Section 5 we present results in comparison with commercial OCR devices. We end with some concluding remarks in Section 6.
2 Global view

MergeLayouts combines results obtained from commercial OCR devices, comprising layout information as well as recognition results. The devices are not restricted in any way; e.g. we do not rely on predefined zones or preset recognition attributes such as language, font, page quality, etc. Thus, the inputs to our voting system are page layouts, which are converted to an internal layout format (see [10], pp. 147ff). The output of a commercial OCR device serves as input for this conversion process, where the OCR output should contain as many layout attributes (coordinates, font attributes, etc.) and classification "attributes" (confidence measures, character alternatives, etc.) as possible. Usually, a commercial OCR API can be configured to produce an enhanced proprietary output, e.g. XDOC by Xerox Imaging Systems (cf. [11]). The internal layout is a strict hierarchy of line segments, word segments, and character segments. Hereafter, when speaking of "segments of one kind", we refer to a certain
level within this hierarchy, e.g. all line segments. Every segment is capable of holding classification results with confidence measures and alternatives as well as layout attributes. Whether a specific segment contains a certain layout attribute, e.g. whether its text is italic, depends on the capabilities of the commercial OCR device from which the segment was originally obtained.

MergeLayouts combines the layouts obtained from different commercial OCR devices. The output is a new layout consisting of the "combined" segments. By combining (or "merging") segments, we combine recognition hypotheses as well as layout attributes. To cope with segmentation errors on the character level (think of the common 'rn'-'m' confusion), the output is no longer a strict hierarchy but rather consists of character segment graphs (see [9]). The combination process itself proceeds top-down, starting with the line segments. Each segment level is processed separately. Within a certain level, we perform two steps. First, the planning module (see Section 3) identifies all fellow segments. Second, the merge module (see Section 4) combines these fellows and produces one or more new segments containing the combined attributes of the original segments. The global processing steps for all line segments obtained from three commercial OCR devices are depicted in figure 1.
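To make the internal layout format more tangible, the following minimal Python sketch models such a hierarchy of segments with typical attributes. It is only an illustration under our own naming assumptions (Segment, Layout, and the listed fields); the actual internal format is the one described in [10], pp. 147ff.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Segment:
    # Illustrative segment of one layout level (line, word, or character);
    # the field names are our own assumption, not the actual MergeLayouts format.
    text: str                                        # recognized text
    top_left: Tuple[int, int]                        # coordinates in the original image
    bottom_right: Tuple[int, int]
    alternatives: List[Tuple[str, float]] = field(default_factory=list)  # (text, confidence)
    bold: Optional[bool] = None                      # None: attribute not delivered by the device
    italic: Optional[bool] = None
    font_height: Optional[int] = None
    children: List["Segment"] = field(default_factory=list)  # words of a line, characters of a word

@dataclass
class Layout:
    # Strict hierarchy: a layout holds its line segments, each line its words,
    # each word its characters (via the children lists above).
    lines: List[Segment] = field(default_factory=list)
```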
3 Planning

The goal of the planning module is to find all segments from the different layouts that belong together (fellow segments) and to construct a plan for the succeeding merge module. Once the fellows of all inspected segments have been established, the resulting plan is executed by the merge module. Intuitively, segments from two or more layouts belong together if they denote the same physical area within the underlying image. Establishing these fellow segments is a non-trivial task for the following reasons:
[Figure 1 illustrates the global processing with intermediate results: three initial line-segment layouts L1 = {r1, ..., r5}, L2 = {s1, ..., s6}, and L3 = {t1, ..., t6} are obtained from three commercial OCR devices, where each segment carries attributes such as the recognized text (txt), top-left (tl) and right-bottom (rb) coordinates, bold, and italic (e.g. r1: txt "helo world", tl (23, 10), rb (1050, 34); s1: txt "hello world", tl (22, 10), rb (1052, 33); t1: txt "hello"; t2: txt "world"). In planning step 1, fellow segments are established while searching for an optimal path through the planning tree, starting from Z^0 = ((∅, ∅, ∅), (L1, L2, L3)), passing states such as Z_3^1 = (({r1}, {s1}, {t1, t2}), (L1 - {r1}, L2 - {s1}, L3 - {t1, t2})), and ending in a final state such as Z_3^k = (({r5}, {s6}, {t6}), (∅, ∅, ∅)). In step 2, the optimal plan is executed by combining all established fellow segments into a merged layout L = {u1, ..., u5}, e.g. u1 (txt "hello world", tl (23, 10), rb (1051, 34)) merged from r1, s1, t1, and t2.]

Figure 1: Global processing steps with intermediate results.
• Even if there is a one-to-one correspondence between two segments, their coordinates might not be exactly the same.
• Due to character classification errors, fellow segments might differ in their recognition results.
• Segmentation errors might cause an n:m correspondence of fellows obtained from two different layouts.
• A mixture of the former problems might occur, which gets even worse and more complicated when considering more than two layouts.

For the planning module, the membership of segments to a layout is essential. Thus, we define a tuple which contains the segments separated by their membership to a certain layout.

Definition 1: Let l1, ..., ln be layouts, where each layout is obtained from a different OCR device, and let L1, ..., Ln be the sets of all segments of one kind belonging to the layouts l1, ..., ln. A tuple of segment sets T = (S1, ..., Sn) is an n-tuple of segment sets with S_i ⊆ L_i, ∀ i = 1, ..., n.
3.1 Searching for a good plan

Unfortunately, the input contains a large number of segments (e.g. the average number of word segments in the layout of a single-page business letter is approx. 300), and the problem of finding segment fellows is very complex. Our solution is to transform the problem into a classical search problem. Thus, the planning module might construct every syntactically possible plan and then choose an optimal one. Now two other problems arise. First, generating every plan might result in a combinatorial explosion. Second, we need a quality estimation function to find the "best" of all generated plans. Before discussing these problems, we would like to introduce the definition of a planning tree
(corresponding to the search tree) and the notion of a plan. Here, every node is a problem state and every edge is a transition to a successor state:

Definition 2: Let T and R be tuples of segment sets. A problem state Z = (T, R) is a pair of tuples of segment sets.

Definition 3: Let Z be a problem state and L1, ..., Ln be the sets of all segments of one kind of the layouts l1, ..., ln. Z = (T, R) is called a start state iff T = (∅, ..., ∅) and R = (R1, ..., Rn) with R_i = L_i, ∀ i = 1, ..., n. In our case, there exists exactly one start state: Z^0 = ((∅, ..., ∅), (L1, ..., Ln)).

Definition 4: Let Z = (T, R) be a problem state. Z is called a final state iff T = (T1, ..., Tn) with T_i ⊆ L_i, ∀ i = 1, ..., n, and R = (∅, ..., ∅).

Definition 5: Let Z^{k-1} = (T^{k-1}, R^{k-1}) = ((T_1^{k-1}, ..., T_n^{k-1}), (R_1^{k-1}, ..., R_n^{k-1})) and Z^k = (T^k, R^k) = ((T_1^k, ..., T_n^k), (R_1^k, ..., R_n^k)), k > 0, be problem states. A transition of a problem state Z^{k-1} → Z^k is defined with:
1. T_i^k ⊆ R_i^{k-1}, ∀ i = 1, ..., n
2. ∃ 1 ≤ j ≤ n: T_j^k ≠ ∅
3. R_i^k = R_i^{k-1} - T_i^k, ∀ i = 1, ..., n

The problem state Z^k is also called a successor state of Z^{k-1}. A problem state Z^{k-1} may have more than one successor state; they are enumerated as Z_1^k, ..., Z_{m_k}^k. In a transition from Z^{k-1} to Z^k the established fellows are collected in T^k, whereas R^k holds the remaining segments, among which new fellows still have to be established. This process continues until all fellows have been established, resulting in an empty R^k. In figure 1 (step 1), the line segments r1, s1, t1, and t2 are established as fellows in the transition Z^0 → Z_3^1.

Next, we describe the search process itself. As mentioned above, the planning can be seen as a traversal through a search tree. While traversing, a plan for the succeeding merge module is constructed, which contains all established fellow segments of the current layouts. For this reason, we also call the search tree a planning tree, defined as follows:

Definition 6: A planning tree is a directed search tree in which every node is a problem state and every edge is a transition of a problem state to a successor state. A node is a start state iff it is the root of the planning tree and a final state iff it is a leaf of the planning tree. A partial planning tree is shown in figure 1 (step 1).

Definition 7: Let Z be a problem state, L1, ..., Ln be the sets of all segments of one kind of the layouts l1, ..., ln, and P be the associated planning tree. A plan is the sequence of problem states Z^k, k = 0, ..., m, along a path in the planning tree from the root (start state Z^0) to a leaf (final state Z^m). A subplan is the sequence of problem states Z^k, k = 0, ..., t, along a path in the planning tree from the root (start state Z^0) to an arbitrary state Z^t.

In figure 1 (step 1), P = (Z^0, Z_3^1, Z_3^2) is a subplan.
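As an illustration of Definitions 2-5, the following Python sketch models a problem state as a pair (T, R) of tuples of frozensets and generates successor states. The enumeration of candidate fellow tuples is left to the caller, since MergeLayouts prunes it heuristically; all names here are our own assumptions, not the actual implementation.

```python
# Illustrative sketch of problem states and transitions (Definitions 2-5).
# A problem state Z = (T, R): T holds the fellows established so far,
# R the remaining segments of each layout; segments must be hashable.

def start_state(layouts):
    """Z^0 = ((empty, ..., empty), (L1, ..., Ln)) -- Definition 3."""
    return (tuple(frozenset() for _ in layouts),
            tuple(frozenset(segments) for segments in layouts))

def is_final(state):
    """A state is final iff every remainder R_i is empty (Definition 4)."""
    return all(not remainder for remainder in state[1])

def successors(state, candidate_tuples):
    """Yield successor states Z^k of Z^{k-1} for the given candidate tuples
    (T_1, ..., T_n) of segment subsets (Definition 5)."""
    _, remaining = state
    for T in candidate_tuples:
        # 1. every T_i must be taken from the remaining segments R_i
        if not all(t <= r for t, r in zip(T, remaining)):
            continue
        # 2. at least one T_j must be non-empty
        if not any(T):
            continue
        # 3. the chosen segments are removed from the remainder
        yield (T, tuple(r - t for r, t in zip(remaining, T)))
```

For the example of figure 1, accepting the candidate tuple ({r1}, {s1}, {t1, t2}) in the start state would produce the state Z_3^1.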
3.2 Quality estimation of a plan

As mentioned above, the planning module searches for an optimal plan. Therefore, the qualities of all plans are estimated, and the "best" plan is taken as the solution of our search problem. We present a quality estimation function of a dynamic and incremental character, which estimates the quality of complete plans as well as the quality of subplans and
isolated problem states. This strategy makes it possible to assess a (sub-)plan during the search progression without completing the whole path to a leaf node.

Definition 8: Let P be a (sub-)plan, Z^0, ..., Z^m, m > 0, the problem states in the plan, and ζ the quality estimation of a problem state. The quality estimation ξ of a (sub-)plan P is defined as a mapping of all (sub-)plans to a positive real number with:

ξ(P) = Σ_{k=0}^{m} ζ(Z^k).
Hence, the quality estimation ξ of a (sub-)plan P is the sum of all quality estimations ζ of all problem states within the planning sequence.
3.2.1 Quality estimation of an isolated problem state

The quality estimation ζ of an isolated problem state Z = (T, R) only regards the tuple of segment sets T; R is not used, in order to keep the computational effort low.

Definition 9: Let Z = (T, R) be a problem state. The quality estimation ζ of an isolated problem state Z is defined as a mapping of all problem states to a positive real number with:

ζ(Z) = ω_1 · Δtext(T) + ω_2 · Δcoords(T)

The feature functions are weighted with factors ω_i, which are constant parameters of MergeLayouts. To reduce the complexity of ζ, we only consider the two most important feature functions, Δtext and Δcoords, here, but more feature functions are possible and are already implemented in MergeLayouts (for further information refer to [10]). An important property of ζ is that lower values represent better problem states. Thus, while searching for an optimal plan, the quality estimation function ζ has to be minimized.
3.2.2 Feature functions of a problem state

In the following, all feature functions enumerated in definition 9 are defined:

Definition 10: Let Z = (T, R) be a problem state with T = (S1, ..., Sn), let s ∈ S_i be a segment, and let t(s) be the recognized text of segment s. Then Δtext is defined as a mapping of all problem states to a positive real number with:

Δtext(Z) = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Σ_{r ∈ S_i, s ∈ S_j} δ(r, s),  where δ(r, s) = 0 if t(r) = t(s) and 1 otherwise.

Definition 11: Let Z = (T, R) be a problem state with T = (S1, ..., Sn) and let S̃ = ∪_{i=1}^{n} S_i be the union of all segment sets in T. Δcoords is defined as a mapping of all problem states to a positive real number with:

Δcoords(Z) = area_∪(S̃) - area_∩(S̃).
Both feature functions, Δtext and Δcoords, address the most relevant parts of a segment: its text and its location within the original image. Of course, more "sophisticated" feature functions are conceivable; e.g. Δtext might depend on the edit distance of two strings. The computational effort for calculating Δtext would then increase considerably, but on the other hand the quality estimator ξ might guide the search process more directly to an optimal plan. For a comparison of various feature functions refer to [10].
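A compact Python sketch of the quality estimation (Definitions 8-11) might look as follows. It assumes segments carry text, top_left, and bottom_right attributes, reads area_∪/area_∩ as the areas of the union bounding box and of the common intersection of the segment boxes, and uses purely illustrative weights; the actual MergeLayouts parameters and feature functions are described in [10].

```python
# Illustrative sketch of Definitions 8-11; lower values indicate better states.

def delta_text(T):
    """Pairwise text mismatches between segments of different layouts (Definition 10)."""
    mismatches = 0
    for i in range(len(T) - 1):
        for j in range(i + 1, len(T)):
            for r in T[i]:
                for s in T[j]:
                    mismatches += 0 if r.text == s.text else 1
    return mismatches

def delta_coords(T):
    """Area of the union bounding box minus area of the common intersection
    of all segment boxes in T (our reading of Definition 11)."""
    segments = [s for S in T for s in S]
    if not segments:
        return 0.0
    xs1 = [s.top_left[0] for s in segments];     ys1 = [s.top_left[1] for s in segments]
    xs2 = [s.bottom_right[0] for s in segments]; ys2 = [s.bottom_right[1] for s in segments]
    union_area = (max(xs2) - min(xs1)) * (max(ys2) - min(ys1))
    inter_area = max(0, min(xs2) - max(xs1)) * max(0, min(ys2) - max(ys1))
    return union_area - inter_area

def zeta(state, w1=1.0, w2=0.01):
    """Quality of an isolated problem state (Definition 9); weights are illustrative only."""
    T, _ = state
    return w1 * delta_text(T) + w2 * delta_coords(T)

def xi(plan, w1=1.0, w2=0.01):
    """Quality of a (sub-)plan: sum of its state qualities (Definition 8)."""
    return sum(zeta(state, w1, w2) for state in plan)
```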
3.3 The heuristic search in the planning tree

We now discuss a reasonable approach for searching for an optimal plan. Many strategies exist to search a planning tree and generate a plan; [15] and [16] give detailed descriptions. The exact expansion of the planning tree depends on the chosen search strategy, ranging from depth-first search to breadth-first search. Both extremes are ruled out for the search for an optimal plan because of their high inefficiency. To find an optimal plan, all
possible plans have to be generated and compared with each other, which entails an enormous expense. To reduce the cost of finding an optimal plan, MergeLayouts utilizes a beam search algorithm (see [15], pp. 73ff), which allows finding a good plan without generating and comparing all plans (cf. [10] for further details). It makes use of a so-called heuristic function, which defines the order of the expansion of the planning tree. Typically, a heuristic function employs a local estimator to assess a state's quality. In MergeLayouts, this estimation is achieved by the aforementioned quality estimation ξ (quality and performance issues can be found in [10], pp. 100ff).
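The following is a simplified beam-search sketch over the planning tree, not the actual MergeLayouts implementation (see [15], pp. 73ff, and [10]); the parameters expand, state_quality, and beam_width are our own assumptions.

```python
import heapq

def beam_search(start, expand, is_final, state_quality, beam_width=5):
    """Return a good plan (list of problem states Z^0, ..., Z^m) found by
    keeping only the beam_width best partial plans at every depth.

    Simplified illustration; the actual MergeLayouts search may differ."""
    beam = [(state_quality(start), [start])]          # (accumulated xi, partial plan)
    while beam:
        finished = [(q, plan) for q, plan in beam if is_final(plan[-1])]
        if finished:
            return min(finished, key=lambda item: item[0])[1]
        candidates = []
        for quality, plan in beam:
            for successor in expand(plan[-1]):
                candidates.append((quality + state_quality(successor),
                                   plan + [successor]))
        # keep only the most promising partial plans (lowest accumulated quality)
        beam = heapq.nsmallest(beam_width, candidates, key=lambda item: item[0])
    return []                                         # no plan found
```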
4 The merge module

The purpose of the merge module is to execute the plan generated by the preceding planning module. In this context, "executing a plan" means merging all segments within a tuple T into a single one. In doing so, all information about the original segments is combined with suitable voting mechanisms. The result of this process is depicted in figure 1 (step 2). The input segments are shown at the top of the figure.
4.1 Voting of all segments in a tuple T

Various methods exist to extract a segment's layout attributes and to achieve reliable classification results. Each one has its pros and cons, and in practice a large number of attributes contribute to text recognition. Effective voting over a large number of attributes is difficult for the following reasons:
• Different kinds of recognizers are best suited for different subsets of the attributes. Thus, every OCR device might contribute information that no other device does. For
example, one OCR device might only contribute the recognized text and its font type, whereas another device delivers the recognized text, the attributes bold and italic, the text height, and a subscript/superscript indicator.
• Different values may be delivered for the same recognition problem, but there might be no similarity function defined on the various attributes that allows all values to be combined; e.g. the parameters describing an attribute could be measured on a nominal, ordinal, interval, or ratio scale. The meanings of the parameters may be so different that they cannot easily be normalized to one scale. For example, for the same bold segment, different OCR devices might contribute values as different as "true", "0.876", or "very bold".
4.1.1 Attributes of a Segment

In the following, some attributes of a segment are enumerated. State-of-the-art OCR devices produce several attributes, which might be used by succeeding processing modules. In MergeLayouts, all attributes are taken into account (for example the recognized text, coordinates, font type, character height, boldness, superscript/subscript indicators, etc.) and are combined (a detailed description of all considered attributes can be found in [10], pp. 149ff).
4.1.2 Representation of individual attributes and their combination into a common vote

Unfortunately, it is not possible to combine every attribute with the same method because of their different representations (cf. [5]). For each representation, we have to define an individual method. In MergeLayouts, we have three of them, which are described below (a short sketch of all three follows the list):
1. Binary decisions ("yes" or "no"; the answer might also be "don't know" if a reliable classification is impossible) are combined by majority voting, obtaining the
most frequent value as the result [2] - see attribute italic in figure 1.
2. Numerical values ("42") are combined by calculating a kind of average (e.g. the arithmetic mean or the median) that best represents all input values (cf. [10], pp. 84ff) - see attributes tl, rb in figure 1.
3. Classification results obtained as rankings, as well as results obtained as a subset of class labels with measurements, are combined on the rank level. For the latter, a ranking is obtained by ordering all classes according to their measurements. The combination itself is done by applying the Borda count method (cf. [4], pp. 53ff).
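The three combination methods could be sketched in Python as follows; the function names are our own, and the Borda count variant shown (each label scores as many points as there are labels ranked below it) is one common formulation, cf. [4].

```python
from collections import Counter
from statistics import median

def vote_binary(values):
    """Method 1: majority vote over binary decisions; None stands for 'don't know'."""
    votes = [v for v in values if v is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None

def vote_numeric(values):
    """Method 2: a representative average of numerical values (here: the median)."""
    return median(values)

def vote_rankings(rankings):
    """Method 3: Borda count over rankings of class labels (best label first)."""
    scores = Counter()
    for ranking in rankings:
        for position, label in enumerate(ranking):
            # a label scores one point per label ranked below it
            scores[label] += len(ranking) - 1 - position
    return scores.most_common(1)[0][0]

# e.g. vote_rankings([["e", "c", "o"], ["c", "e", "o"], ["e", "o", "c"]]) -> "e"
```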
5 Test results
5.1 Comparison of the accuracy with three commercial OCR devices

In this section, we describe a comparison of MergeLayouts with three commercial OCR devices on 22 business documents and on 20 facsimiles. Since the preparation of test data for an automatic evaluation of the page layout retention is a tedious and time-consuming task, we focused on the OCR accuracy as the single measure of comparison.
5.1.1 Comparison of business documents

For this comparison, 22 German business documents were processed in the following way:
• The initial layout structures, including the recognized text, of each document were obtained from the three OCR devices Recore [12], ScanWorX [13] (also referred to as Xis), and Easyreader [14], which received the same scanned 300 dpi document page as input.
• These three layout structures were processed by MergeLayouts, which combined them into
a fourth layout structure.
• The recognized text within each layout structure of each document was extracted and written into a plain text file, which was compared with the ground truth text file (prepared in close cooperation with ISRI, following their guidelines for accurate ground truth data, cf. [17]), and the absolute number of recognition errors and the accuracy of the recognition result were determined.
For this comparison, we only regard zones with textual information and exclude all graphical regions and signatures.

Definition 12: Let n be the number of all characters in the ground truth file and let e be the number of errors, counted in terms of the operations insert, delete, and substitute. The accuracy is defined as acc = (n - e) / n (cf. [17], p. 13).

The results of the comparison of MergeLayouts with the three commercial OCR devices with respect to accuracy are shown in figure 2.
Figure 2: Comparison of MergeLayouts with 3 commercial OCR devices for scanned business documents.
Obviously, the business documents are rather benign, with clean images. The overall accuracy of the recognition results is 98.18% for Easyreader, 97.28% for Recore, and 98.52% for ScanWorX. Nevertheless, with MergeLayouts we achieve considerable improvements: our overall accuracy is 99.45%.

Definition 13: Let e_x be the number of errors of the recognizer x. The relative error reduction is defined as Δ_r e_{x,y} = (e_x - e_y) / e_x.

Our relative error reduction with respect to the best recognizer, ScanWorX, is Δ_r e_{S,M} = 63%. With respect to Recore we achieve Δ_r e_{R,M} = 80%. MergeLayouts is not only considerably better in overall accuracy than any single recognizer; in 19 out of the 22 documents it also outperforms the best recognizer for that specific document. In terms of error reduction, MergeLayouts makes 37% fewer errors than the respective best OCR device and 84.5% fewer errors than the respective worst OCR device.
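As a quick sanity check of Definition 13: since all systems are evaluated against the same ground truth, the error counts are proportional to (1 - acc) and the character count cancels, so the reported reductions follow directly from the overall accuracies (up to rounding). A tiny illustrative computation:

```python
def relative_error_reduction(acc_x, acc_y):
    """Delta_r e_{x,y} = (e_x - e_y) / e_x with e = (1 - acc) * n; n cancels out."""
    return ((1 - acc_x) - (1 - acc_y)) / (1 - acc_x)

print(relative_error_reduction(0.9852, 0.9945))  # ScanWorX vs. MergeLayouts: ~0.63
print(relative_error_reduction(0.9728, 0.9945))  # Recore   vs. MergeLayouts: ~0.80
```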
5.1.2 Comparison of facsimiles

For this comparison, 10 German and 10 English facsimiles of various kinds (offers, orders, invoices, etc.) were processed in the same way as the business documents. The results of the comparison are shown in figure 3. Again, MergeLayouts considerably improves the recognition results. The overall accuracy is 88.7% for Easyreader, 86.8% for Recore, and 90.2% for ScanWorX. With MergeLayouts we achieve 92%. The relative error reduction with respect to the best recognizer, ScanWorX, is Δ_r e_{S,M} = 18.3%. With respect to Recore we achieve Δ_r e_{R,M} = 40%. Even on facsimiles, MergeLayouts outperforms the best recognizer for the specific document in 16 out of the 20 documents. Overall, MergeLayouts makes 18% fewer errors than the respective best OCR device and 48% fewer errors than the respective worst OCR device.
Figure 3: Comparison of MergeLayouts with 3 commercial OCR devices for facsimiles.
Generally, it can be observed that the worse the accuracy of the OCR devices, the less it can be improved by voting on segmentation and classification results. With the following three examples, we want to elucidate some reasons why the facsimiles are so difficult to recognize and what kinds of problems may occur:
1. Due to the low resolution of the facsimiles (204x98 dpi) compared to a scanned image (300x300 dpi), the bit images of the characters are very hard to recognize. The lower resolution results in a considerably lower accuracy of all recognizers. This holds for all facsimiles and is a general problem. In the snippet shown in figure 4, this deficiency can be seen very clearly.

Figure 4: Low quality image of a faxed page.

2. Due to the low image quality, the recognition results on smaller fonts get even worse. If a document contains small fonts, it is scarcely possible for an OCR device to achieve a reliable recognition result (see the original-sized footnote in figure 5).

Figure 5: Unreadable small font appearing in a footnote of a page.

3. Further problems for present OCR devices are graphics or words of different font heights next to the text to be recognized. Often, graphical objects are not recognized as non-text objects, and the text next to them is segmented in a wrong way. The snippet in figure 6 shows the faulty segmentation of a headline: the stamp "10% Discount" on the left leads to two segmentation errors. First, some of the line segments are extended too far to the left, and second, the heights of some lines are completely wrong. Due to these segmentation errors of some commercial OCR devices, it is impossible for the planning module to guarantee a correct establishment of fellows for all segments.

Figure 6: Intermixed text and graphic portions cause segmentation errors for text lines.
6 Conclusion

A comprehensive voting approach has been presented, combining entire layouts obtained from commercial OCR devices. For each OCR device, its proprietary output format (e.g. XDOC) is converted into an internal layout hierarchy consisting of line segments, word segments, and character segments. Every segment comprises recognition result(s), segment coordinates, and font information. As output, we produce a new, more reliable layout structure, preserving the original page layout as well as possible. In doing so, we take into account all of the aforementioned attributes and establish so-called "fellow segments". The general voting process is top-down, processing each segment level separately and starting with the line segments. For each level, we perform two steps. First, a planning tree is constructed, representing a search space in which each state describes a possible establishment of fellow segments. A state's quality relies on a similarity measure for segments of one kind. The similarity itself is based on a segment's recognition result and its coordinates and may easily be modified to consider various other attributes. Within this search space, a "good" path to a final state is found by utilizing the heuristic beam search algorithm. In the second step, the plan, fully described by the states along the optimal path, is executed: all established fellow segments are combined by merging their attributes. To do so, classic voting techniques, e.g. majority voting, are utilized. To evaluate the presented approach, we focus on a single attribute, the recognition hypothesis of each combined segment. The well-known character accuracy is determined for all utilized OCR devices as well as for the presented voting approach. Comparing the results on two test sets (scanned business documents and facsimiles), MergeLayouts achieves a much higher recognition accuracy (scanned documents: 99.45%, facsimiles:
92%) than the best commercial OCR device (scanned documents: 98.52%, facsimiles: 90.2%). In terms of error reduction, we reduce the number of errors by 63% for scanned documents and by 18.3% for facsimiles, compared to the best commercial OCR device.
References

[1] Prime Recognition: PrimeOCR™, Access Kit Guide, Version 2.50; San Carlos, CA, 1996.
[2] Lei Xu, Adam Krzyzak, Ching Y. Suen: Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition; IEEE Transactions on Systems, Man, and Cybernetics, Vol. 22, No. 3, May/June 1992, pp. 418-435.
[3] Jürgen Franke, Eberhard Mandler: A Comparison of Two Approaches for Combining the Votes of Cooperating Classifiers; 11th IAPR '92, The Hague, The Netherlands, pp. 611-614.
[4] Tin Kam Ho: A Theory of Multiple Classifier Systems and Its Application to Visual Word Recognition; Doctoral Dissertation, Department of Computer Science, State University of New York at Buffalo, Buffalo, New York, May 1992.
[5] Tin Kam Ho, Jonathan J. Hull, Sargur N. Srihari: Decision Combination in Multiple Classifier Systems; IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 1, January 1994, pp. 66-75.
[6] Y. S. Huang, Ching Y. Suen: Combination of Multiple Classifiers with Measurement Values; Proceedings of the 2nd ICDAR, Japan, October 1993, pp. 598-601.
[7] Xiaoning Ling, W. G. Rudd: Combining Opinions from Several Experts; Applied Artificial Intelligence, Vol. 3, 1989, pp. 439-452.
[8] Thorsten Jäger, Frank Hönes, Andreas Dengel: An Adaptive Metaclassifier for Word Recognition Based on Multiple Independent Classifiers; 4th SDAIR, April 1995, Las Vegas, Nevada, pp. 399-412.
[9] Thorsten Jäger: OCR and Voting Shell Fulfilling Specific Text Analysis Requirements; 5th SDAIR, April 1996, Las Vegas, Nevada, pp. 287-302.
[10] Stefan Klink: Entwurf, Implementierung und Vergleich von Algorithmen zum Merge von Segmentierungs- und Klassifikationsergebnissen unterschiedlicher OCR-Systeme (Design, Implementation, and Comparison of Algorithms for Merging Segmentation and Classification Results of Different OCR Systems); Master Thesis, DFKI, Kaiserslautern, Germany, October 1997.
[11] Xerox Corporation: XDOC Data Format: Technical Specification, Version 3.0; Peabody, Massachusetts, 1995.
[12] Ocron Inc.: Recore Developer's Guide V 2.0; Santa Clara, 1993.
[13] Xerox Imaging Systems: ScanWorX API, Programmer's Guide; Peabody, Massachusetts, 1993.
[14] Mimétics S.A.: Easy Reader API V 2.1, User Manual & Reference Manual; Chatenay-Malabry Cedex, France, October 1996.
[15] Elaine Rich, Kevin Knight: Artificial Intelligence; 2nd edition, McGraw-Hill, Inc., 1991.
[16] Nils J. Nilsson: Problem-Solving Methods in Artificial Intelligence; McGraw-Hill, Inc., 1974.
[17] Stephen V. Rice, Junichi Kanai, Thomas A. Nartker: The Third Annual Test of OCR Accuracy; Annual Research Report, Information Science Research Institute (ISRI), University of Nevada, Las Vegas, USA, 1994.