An objective way to evaluate and compare binarization algorithms

Ergina Kavallieratou
Department of Information and Communication Systems Engineering
University of the Aegean
Karlovassi 83200, Greece, +30.22730.82263
[email protected]

ABSTRACT
The choice of the best binarization algorithm is critical for any document image processing system, since binarization is one of the first tasks and any mistake it introduces is carried through the whole system. Here, a new technique for the validation of document binarization algorithms is proposed. Our method is simple to implement and can be applied to any binarization algorithm, since it requires nothing more than the binarization stage itself. It is based on the use of synthetic images created from pdf documents: the binarization algorithm is applied to the synthetic image and the result is compared with the original pdf.

Categories and Subject Descriptors
I.7.5 [Document and text processing]: Document Capture – binarization.

General Terms
Algorithms, Performance, Experimentation

Keywords
Document image, binarization, evaluation technique.

1. INTRODUCTION
A document processing task that is very useful to document processing systems is document binarization, that is, the automatic thresholding of document images in such a way that the foreground information is represented by black pixels and the background by white ones. This apparently simple procedure has proved to be a very difficult task, especially in the case of historical documents, where very specialized problems have to be dealt with, such as variation in illumination, smearing, seeping of ink to the other side of the page and general degradation of the paper due to aging. On the other hand, such a task is very important for the later stages of document processing.

The ideal way of evaluation should be able to decide, for each pixel, whether it finally received the right color (black or white) after the binarization. This is an easy task for a human observer but very difficult to do automatically for all the pixels of several images. The proposed method relies on experiments with document archives built by constructing noisy images, using techniques from image mosaicing to combine old blank document pages with noise-free pdf documents. This way, after the application of the binarization algorithms to the synthetic images, it is easy to evaluate the results by comparing the resulting image with the original document. Lins [6] uses a similar technique to assess algorithms that remove back-to-front interference.

Several algorithms have been proposed for the document binarization task [1,4,7,8]. However, the selection of the most appropriate one is not a simple procedure. The evaluation and comparison of these algorithms has proved to be another difficult task, since there is no objective way to compare the results. Leedham [5] tries to compare five binarization algorithms by using precision and recall analysis of the resulting words in the foreground: "The conclusion is that no single algorithm works well for all types of image but some work better than others for particular types of images." He [3] compares six algorithms by evaluating their effect on end-to-end word recognition performance in a complete archive document recognition system utilizing a commercial OCR engine. Among his conclusions, something that our results also confirmed, is that "all the local thresholding algorithms we (they) tested have superior performance to a global thresholding algorithm." Both works that performed comparisons presented some very interesting conclusions. However, the problem is that in both cases they try to use results from ensuing tasks in the document processing hierarchy in order to survey the algorithm performance. Although in many cases this is the objective goal, it is not always possible. In the case of historical documents, whose quality in many cases obstructs the recognition, and sometimes even the word segmentation, this way of evaluation can prove problematic. On the other hand, we need a different evaluation technique, since the processing of historical documents is one of the hardest cases and binarization may be required for removing the noise and facilitating their appropriate presentation.

In this paper, in order to survey the algorithm performance, we use for comparison some classic binarization algorithms: Otsu's, Niblack's, Sauvola's and Bernsen's. The tested binarization algorithms are briefly presented in the next section of this paper. Then, the construction of the experimental data is described in detail in Section 3, while the experimental results and the conclusions are given in Sections 4 and 5, respectively.


2. THE TESTED BINARIZATION METHODS

It is common to distinguish binarization methods into global and local methods. Global methods calculate one threshold for the entire image, while local thresholding methods calculate different threshold values for different regions of the image. Here, we used one global method and three local ones.

2.1 Otsu
Otsu [7] calculates a global threshold by assuming the existence of two classes (C_0, C_1), foreground and background, and choosing the threshold k that maximizes the between-class variance of the thresholded black and white pixels:

\max_k \; \sigma_B^2 = \omega_0 (\mu_0 - \mu_L)^2 + \omega_1 (\mu_1 - \mu_L)^2

where

\omega_0 = \Pr(C_0) = \sum_{i=1}^{k} p_i , \qquad \omega_1 = \Pr(C_1) = \sum_{i=k+1}^{L} p_i

and \mu_0, \mu_1 are the mean values of the two classes and \mu_L is the global mean value.
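As a concrete illustration of this criterion (not part of the original paper), the following minimal NumPy sketch evaluates every candidate threshold k on the gray-level histogram of an 8-bit image and keeps the one that maximizes the between-class variance; the function name and interface are ours.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the global threshold that maximizes the between-class variance.

    gray: 2-D uint8 array (grayscale document image).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                     # gray-level probabilities p_i
    omega = np.cumsum(p)                      # omega_0(k) = sum_{i<=k} p_i
    mu = np.cumsum(p * np.arange(256))        # cumulative mean up to level k
    mu_L = mu[-1]                             # global mean value
    denom = omega * (1.0 - omega)             # omega_0 * omega_1
    denom[denom == 0] = np.nan                # ignore degenerate thresholds
    sigma_b2 = (mu_L * omega - mu) ** 2 / denom   # between-class variance
    return int(np.nanargmax(sigma_b2))

# usage sketch: pixels above the threshold become background (white)
# binary = (gray > otsu_threshold(gray))
```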

2.2 Niblack
Niblack [4] calculates a local threshold for each pixel (x, y) that depends on the local mean value m(x, y) and the local standard deviation s(x, y) in the neighborhood of pixel (x, y):

T(x, y) = m(x, y) + k \cdot s(x, y)

where k is a constant which determines how much of the total print object boundary is taken as a part of the given object. The neighborhood size should be small enough to preserve local detail and large enough to suppress noise. It has been shown that a 15x15 neighborhood and k = -0.2 can be a good choice [9].
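A straightforward, unoptimized sketch of this rule is given below (our illustration, not the paper's code), assuming an 8-bit grayscale image and using SciPy's uniform_filter to obtain the local mean and standard deviation over a square window; the 15x15 window and k = -0.2 follow the values quoted above, everything else is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(gray, window=15, k=-0.2):
    """Binarize with Niblack's rule T = m + k*s over a window x window neighborhood."""
    g = gray.astype(np.float64)
    m = uniform_filter(g, size=window)            # local mean m(x, y)
    m2 = uniform_filter(g * g, size=window)       # local mean of squared values
    s = np.sqrt(np.maximum(m2 - m * m, 0.0))      # local standard deviation s(x, y)
    T = m + k * s
    return (g > T).astype(np.uint8) * 255         # 255 = background, 0 = foreground ink
```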

2.3 Sauvola
Sauvola [8] proposes an improvement of the above technique and calculates the local threshold by using the local mean value m(x, y) and the local standard deviation s(x, y), again in the neighborhood of pixel (x, y):

T(x, y) = m(x, y) \left[ 1 + k \left( \frac{s(x, y)}{R} - 1 \right) \right]

He [3] notes that, in his experiments, the value of R has a very small effect on the quality and chooses R = 128.
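The same windowed statistics give a direct sketch of Sauvola's rule; R = 128 follows the remark above, while k = 0.5 is a typical value from the literature rather than a parameter reported in this paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_binarize(gray, window=15, k=0.5, R=128.0):
    """Binarize with Sauvola's rule T = m * (1 + k * (s / R - 1))."""
    g = gray.astype(np.float64)
    m = uniform_filter(g, size=window)            # local mean m(x, y)
    m2 = uniform_filter(g * g, size=window)
    s = np.sqrt(np.maximum(m2 - m * m, 0.0))      # local standard deviation s(x, y)
    T = m * (1.0 + k * (s / R - 1.0))
    return (g > T).astype(np.uint8) * 255         # 255 = background, 0 = foreground ink
```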

2.4 Bernsen
Bernsen [1] also uses local thresholding, setting the threshold to the mean of the maximum (Z_max) and minimum (Z_min) gray values within a window around the pixel (x, y), whenever the difference of the two values is at least L. Otherwise the pixel is considered background and the threshold takes a default value GT:

T(x, y) = \begin{cases} \dfrac{Z_{\max} + Z_{\min}}{2}, & Z_{\max} - Z_{\min} \ge L \\ GT, & \text{otherwise} \end{cases}
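A sketch of Bernsen's rule follows (our illustration). The window size and the contrast limit L are illustrative values, not taken from the paper, and instead of carrying a separate default threshold GT the sketch classifies low-contrast pixels directly as background, which is the effect the default value is meant to have.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def bernsen_binarize(gray, window=15, L=15):
    """Bernsen's rule: threshold at the local mid-range where contrast >= L."""
    g = gray.astype(np.float64)
    z_max = maximum_filter(g, size=window)        # Zmax within the window
    z_min = minimum_filter(g, size=window)        # Zmin within the window
    contrast = z_max - z_min
    T = (z_max + z_min) / 2.0
    out = np.where(g > T, 255, 0).astype(np.uint8)  # threshold at the mid-range
    out[contrast < L] = 255                         # low-contrast pixels -> background
    return out
```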

3. EXPERIMENTAL SETS
In order to create the synthetic images for testing the binarization methods, we used two sets of documents. The first consists of document images in pdf format, including tables, graphics, different numbers of columns and everything that can be met in a document. The second set consists of old blank document images, taken from a digitized archive of the 18th century. These include all kinds of problems that can be met in old documents: presence of smear, stains, background of big variations and uneven illumination, ink seepage, etc.

Table 1. Relationship ratios between pdf and noise image sizes (size of noise image as a percentage of pdf image size).

           pdf:1    pdf:2    pdf:3    pdf:4    pdf:5    pdf:6    pdf:7    pdf:8    pdf:9    pdf:10
noise:1    116.66   116.66   116.66   116.66   30.87    167.27   30.6     171.95   171.95   171.95
noise:2    113.49   113.49   113.49   113.49   30.03    162.72   29.77    167.28   167.28   167.28
noise:3    113.49   113.49   113.49   113.49   30.03    162.72   29.77    167.28   167.28   167.28
noise:4    28.25    28.25    28.25    28.25    7.48     40.51    7.41     41.64    41.64    41.64
noise:5    45.78    45.78    45.78    45.78    12.11    65.63    12.01    67.47    67.47    67.47
noise:6    103.75   103.75   103.75   103.75   27.45    148.75   27.21    152.92   152.92   152.92
noise:7    106.47   106.47   106.47   106.47   28.17    152.66   27.93    156.93   156.93   156.93
noise:8    229.79   229.79   229.79   229.79   60.8     329.47   60.28    338.7    338.7    338.7
noise:9    239.61   239.61   239.61   239.61   63.4     343.55   62.85    353.17   353.17   353.17
noise:10   239.61   239.61   239.61   239.61   63.4     343.55   62.85    353.17   353.17   353.17
noise:11   235.39   235.39   235.39   235.39   62.28    337.49   61.75    346.95   346.95   346.95
noise:12   211.91   211.91   211.91   211.91   56.07    303.83   55.59    312.34   312.34   312.34
noise:13   239.61   239.61   239.61   239.61   63.4     343.55   62.85    353.17   353.17   353.17
noise:14   238.86   238.86   238.86   238.86   63.2     342.47   62.66    352.07   352.07   352.07
noise:15   15.1     15.1     15.1     15.1     4        21.66    3.96     22.26    22.26    22.26

Figure 1. Samples of pdf images (above) and noise images (below).

The images of the first set are all of size A4. In order to check whether the relation between the sizes of the two images during the synthesis affects the result, we selected noise images of different sizes for the second set. Table 1 shows the relationships between the image sizes, while fig. 1 gives some samples from both sets. A wide range of the ratio size_of_noise/size_of_pdf is covered, from less than 4% to around 350%. A ratio of 4% means that the noise image is only 0.04 times the pdf size (much smaller), while 350% means that the noise image is 3.5 times the pdf size (between A0 and A1).

Combining the two sets, by applying superimposition techniques for blending from image mosaicing [2], we built two different sets of 150 document images each. In more detail, we used the pdf documents as target images and resized the noisy images according to their sizes. Then, we used two different techniques for the blending: maximum intensity and image averaging. In the first case, the maximum intensity technique, the new image was constructed by picking, for each pixel of the new image, the darker of the two corresponding pixels. This means that the foreground of the pdf image has a lead over the noisy one, while in the background we get the pixel from the noisy image, since it is almost always darker than the pdf background, which is absolutely white. This technique gives a good optical result, as can be seen in fig. 2, but it is not very natural, since the foreground is always the darkest and is not affected at all by the noise. This set is suitable for checking how much of the background can be removed by a binarization method. However, in order to have a more natural result, we also used the image averaging technique, where each pixel of the new image is the average of the two corresponding pixels in the original images. In this case the result presents a lighter background than that of the maximum intensity technique, but it is more affected by the noise level in the image. The result in this case for the same images is also shown in fig. 2.

Figure 2. Procedure of construction of synthetic images: (a) pdf image, (b) noise image, (c) image from image averaging, (d) image from maximum intensity.
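The construction of one synthetic page can be sketched as below (our illustration, not the authors' code), assuming the pdf page has been rendered to a grayscale raster file and using Pillow only for loading and resizing; note that "maximum intensity" here means keeping the darker (more inked) of the two pixels, i.e. the minimum gray value.

```python
import numpy as np
from PIL import Image

def build_synthetic(pdf_path, noise_path):
    """Blend a clean rendered pdf page with an old blank page, two ways.

    Returns (maximum-intensity image, averaged image) as uint8 arrays.
    """
    pdf = np.asarray(Image.open(pdf_path).convert("L"), dtype=np.float64)
    noise = Image.open(noise_path).convert("L")
    # resize the noise page to the size of the target pdf page (width, height)
    noise = np.asarray(noise.resize((pdf.shape[1], pdf.shape[0])), dtype=np.float64)
    max_intensity = np.minimum(pdf, noise)   # keep the darker pixel of the two
    averaged = (pdf + noise) / 2.0           # per-pixel average of the two pages
    return max_intensity.astype(np.uint8), averaged.astype(np.uint8)
```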

4. EXPERIMENTAL RESULTS

Our intention here is not to criticize the specific algorithms mentioned above. However, we are going to show some results in order to demonstrate how this evaluation method can be used, either to select the appropriate binarization algorithm for a specific application or just to see the disadvantages of an algorithm and try to fix them. We applied all the methods described in Section 2 to both of the sets described in Section 3. The results for the image of Fig. 2c, for all the methods, are shown in fig. 3. The final validation is made by counting the pixels that changed value, black-to-white or vice versa, comparing the resulting binary image with the original pdf image. The error-rate results for the maximum intensity technique are shown in Tables 2 and 3, by noise image and by pdf image respectively, while the corresponding results for the image averaging technique are shown in Tables 4 and 5.
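The per-pixel validation described above can be sketched as follows, assuming both the binarization result and the original pdf page are pure black-and-white images of the same size; the function and variable names are ours.

```python
import numpy as np

def binarization_errors(result, ground_truth):
    """Compare a binarized synthetic image against the original (clean) pdf image.

    Both inputs are binary uint8 images with 0 = black ink and 255 = white paper.
    Returns (overall error ratio, white-to-black ratio, black-to-white ratio).
    """
    res = result == 0          # True where the algorithm produced black
    gt = ground_truth == 0     # True where the original pdf is black
    n = gt.size
    white_to_black = np.logical_and(res, ~gt).sum() / n   # background turned into ink
    black_to_white = np.logical_and(~res, gt).sum() / n   # ink lost to the background
    return white_to_black + black_to_white, white_to_black, black_to_white
```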


Figure 3. The results for the image of Fig. 2c from (a) Otsu's, (b) Sauvola's, (c) Niblack's and (d) Bernsen's algorithms.

Table 2. Error rates for the maximum intensity technique by noise images.

method     Bern   Nibl   Otsu   Sauv
noise:1    0.11   0.13   0.96   0.05
noise:2    0.03   0.08   0.96   0.03
noise:3    0.03   0.08   0.96   0.03
noise:4    0.03   0.11   0.96   0.04
noise:5    0.03   0.13   0.96   0.06
noise:6    0.04   0.06   0.94   0.04
noise:7    0.1    0.25   0.96   0.14
noise:8    0.03   0.06   0.96   0.04
noise:9    0.17   0.27   0.97   0.18
noise:10   0.34   0.31   0.97   0.21
noise:11   0.31   0.3    0.97   0.21
noise:12   0.07   0.15   0.92   0.09
noise:13   0.04   0.15   0.96   0.06
noise:14   0.05   0.09   0.92   0.05
noise:15   0.2    0.29   0.97   0.19

Table 3. Error rates for the maximum intensity technique by pdf images.

method   Bern   Nibl   Otsu   Sauv
pdf:1    0.09   0.16   0.94   0.1
pdf:2    0.09   0.16   0.94   0.1
pdf:3    0.09   0.16   0.95   0.1
pdf:4    0.09   0.16   0.94   0.1
pdf:5    0.09   0.17   0.96   0.09
pdf:6    0.12   0.18   0.96   0.1
pdf:7    0.14   0.16   0.98   0.08
pdf:8    0.09   0.16   0.95   0.1
pdf:9    0.1    0.17   0.17   0.09
pdf:10   0.14   0.18   0.98   0.1

Table 4. Error rates for the image averaging technique by noise images.

method     Bern   Nibl   Otsu   Sauv
noise:1    0.03   0.04   0.94   0.03
noise:2    0.03   0.03   0.94   0.03
noise:3    0.03   0.03   0.94   0.03
noise:4    0.03   0.04   0.95   0.03
noise:5    0.03   0.04   0.95   0.03
noise:6    0.03   0.04   0.91   0.03
noise:7    0.03   0.08   0.95   0.04
noise:8    0.03   0.03   0.95   0.03
noise:9    0.03   0.09   0.95   0.04
noise:10   0.03   0.1    0.96   0.04
noise:11   0.03   0.1    0.96   0.04
noise:12   0.03   0.07   0.89   0.04
noise:13   0.03   0.05   0.95   0.03
noise:14   0.03   0.04   0.88   0.03
noise:15   0.03   0.1    0.96   0.04

Table 5. Error rates for the image averaging technique by pdf images.

method   Bern   Nibl   Otsu   Sauv
pdf:1    0.03   0.06   0.92   0.04
pdf:2    0.04   0.06   0.92   0.04
pdf:3    0.03   0.06   0.93   0.04
pdf:4    0.03   0.06   0.92   0.04
pdf:5    0.03   0.06   0.94   0.03
pdf:6    0.03   0.07   0.94   0.04
pdf:7    0.02   0.05   0.97   0.02
pdf:8    0.03   0.06   0.94   0.03
pdf:9    0.02   0.05   0.12   0.03
pdf:10   0.02   0.05   0.97   0.02

It should be mentioned that the majority of the errors are white-to-black in both techniques, with fewer than 0.0001‰ black-to-white errors in the maximum intensity technique and fewer than 0.0007‰ in the image averaging technique. The specific values for each binarization algorithm are shown in the diagram of figure 4 for both techniques. We should notice here that Otsu's algorithm, although it presents high error rates in both techniques, has no black-to-white errors in either of them. Information like this can be used for the improvement of the algorithms or for their use in specific applications.


Figure 4. Black-to-white pixel mistakes: mean bl_wh error rate per algorithm (Bern, Nibl, Otsu, Sauv) for the maximum intensity and image averaging techniques.

In the graph of figure 5, the error-rate results are shown by the size of the noise images for the maximum intensity technique. Comparing with Table 1, and keeping in mind that the noise images have been placed on the X axis by size, there does not seem to be any apparent dependency between the binarization performance and the size relationship of the pdf and noise images. The results for the image averaging technique are similar.

Figure 5. Error rate by the size of the noise image, for the maximum intensity technique. The images have been placed on the X axis by size.

An important remark is that all the algorithms present better performance in the case of the image averaging technique.

5. CONCLUSION – FUTURE WORK
We proposed a technique for the evaluation and comparison of binarization algorithms that is appropriate for document images which are difficult to evaluate with techniques based on segmentation or recognition of the text. In order to survey the algorithm performance, we used for comparison several binarization algorithms: a global one (Otsu's) and several local ones (Niblack's, Sauvola's and Bernsen's). We performed experiments on document archives made by using two different techniques of image mosaicing, combining old blank document pages with noise-free pdf documents. This way, after the application of the binarization algorithms to the synthetic images, it is easy to evaluate the results by comparing the resulting image with the original document.

Although it is not our intention to criticize any algorithm, we gave some results in order to demonstrate our method and show what kind of information can be extracted. It is important that our results seem to verify the conclusions of the other evaluation methods that we presented in the introduction, that is: "the conclusion is that no single algorithm works well for all types of image but some work better than others for particular types of images" [5] and "all the local thresholding algorithms we (they) tested have superior performance to a global thresholding algorithm" [3].

Our future plans include performing more experiments, in order to examine the binarization procedure with more algorithms and specific applications.

6. REFERENCES
[1] J. Bernsen, "Dynamic thresholding of grey-level images", 8th ICPR, 1986, pp. 1251-1255.
[2] L. Gottesfeld Brown, "A survey of Image Registration Techniques", ACM Computing Surveys, Vol. 24, No. 4, 1992, pp. 325-376.
[3] J. He, Q. D. M. Do, A. C. Downton, J. H. Kim, "A Comparison of Binarization Methods for Historical Archive Documents", 8th ICDAR, 2005, pp. 538-542.
[4] W. Niblack, "An Introduction to Digital Image Processing", Prentice Hall, 1986, pp. 115-116.
[5] G. Leedham, S. Varma, A. Patankar, V. Govindaraju, "Separating Text and Background in Degraded Document Images", Proceedings of the 8th IWFHR, September 2002, pp. 244-249.
[6] R. D. Lins, J. M. M. da Silva, "A Quantitative Method for Assessing Algorithms to Remove Back-to-Front Interference in Documents", SAC'07, Seoul, Korea, 2007, pp. 610-616.
[7] N. Otsu, "A threshold selection method from gray-level histograms", IEEE Trans. Systems, Man and Cybernetics, 9 (1), 1979, pp. 62-66.
[8] J. Sauvola, M. Pietikainen, "Adaptive Document Image Binarization", Pattern Recognition, 33, 2000, pp. 225-236.
[9] J. S. Valverde, R. R. Grigat, "Optimum binarization of technical document images", Proceedings of the International Conference on Image Processing, vol. 3, 2000, pp. 985-988.

Ergina Kavallieratou received the diploma and the PhD in Electrical and Computer Engineering, in 1996 and 2000 respectively, from the University of Patras, Greece. She has worked as a researcher in the Dept. of Telecommunication Engineering, Polytechnic University of Madrid, as well as in the Dept. of Electrical Engineering, Ruhr University of Bochum, Germany. Since 2004, she has been a lecturer at the University of the Aegean, Greece. Her research interests include document image analysis, image processing and OCR. She is a member of ACM and IEEE.

