
Layer-Based Binarization for Textual Images

Yaakov Navon
IBM Haifa Research Lab
Mount Carmel, Haifa 31905, Israel
[email protected]

Abstract

We developed a binarization approach to handle a large variety of images, from scanned flatbed images to images acquired by mobile phone cameras. The binarization is targeted at creating layers of binary images for processing by OCR engines. The layers are classified spatially and by intensity and color. First, textual pixels are classified by a text operator. The text kernel is then segmented by intensity/color levels and layout analysis techniques to create regions of similar text. Finally, adaptive binarization is applied to each region to obtain superior binary images. Our experimental results show the advantages of our method over local binarization methods.

1. Introduction

The readout and analysis of digital image content by OCR is successfully used in document processing systems, parcel sorting systems [1], and similar applications. Properly handling compound scene images, such as those captured by mobile phone cameras, requires advanced binarization methods.

Binarization methods are roughly classified into global and local processes. In global binarization methods [2, 3], a single threshold is calculated for the entire image. The resulting binary image is acceptable when the input image is acquired under reasonable illumination conditions and the image brightness is uniform. In many applications where color images are processed, however, global binarization creates artifacts. For example, text printed in reddish colors is assigned high intensity values when the image is transformed to grayscale. If the global threshold falls below those intensities (e.g., when dark text dominates the image), the reddish text does not appear in the binary image.
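To make the global approach concrete, the following is a minimal NumPy sketch of one well-known global method, Otsu's threshold (the paper cites [2, 3] without naming specific methods; this code, the synthetic test image, and all names are our illustration, not the paper's implementation):

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the single global threshold that maximizes the
    between-class variance over the 256-bin histogram of an
    8-bit grayscale image (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = float(np.dot(np.arange(256), hist))
    w0 = sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]            # weight of the "dark" class
        sum0 += t * hist[t]      # intensity sum of the "dark" class
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Synthetic example: dark text (20) on a light background (200)
img = np.full((10, 10), 200, dtype=np.uint8)
img[4:6, 2:8] = 20
t = otsu_threshold(img)
binary = (img > t).astype(np.uint8)  # 1 = background, 0 = text
```

On a bimodal image like this one the threshold lands between the two modes; the reddish-text failure described above arises precisely when the text mode sits on the background side of that single threshold.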

978-1-4244-2175-6/08/$25.00 ©2008 IEEE

Binarization can be improved by calculating local thresholds within separate windows or areas, and many such local methods have been proposed. Niblack [4] estimates the local threshold from the mean and standard deviation of the pixel values within a window. Bernsen [5] sets the local threshold from the maximum and minimum pixel values in the window. Global binarization methods can also be applied on a windowed basis as local binarization. In most of these methods, the size and shape of the window, which is normally square, are predefined parameters. Poor binarization results are obtained when a window's boundaries cross characters: the different parts of a character are then binarized with different thresholds, which may cause undesirable artifacts in the binary image.

Sauvola et al. propose a two-stage binarization method for document images [6]. The first stage is a rapid classification of local contents into background, picture, and text. The background and picture areas are binarized by a soft-decision method; for the text areas, a modified version of Niblack's method is used.

This paper presents a new layer-based binarization approach for handling complicated images, designed to extract only the textual areas for processing by OCR engines. The method is based on extracting text kernels of differing intensities and colors, analyzing the layout of the kernels to locate regions of similar text, and performing local adaptive binarization on each region. The resulting binary image is free of graphics, dark areas, and textured backgrounds, which are common causes of poor OCR results.
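Niblack's rule, T(x, y) = mean + k * std over a local window, can be sketched directly. The brute-force NumPy illustration below is ours, not the paper's code; the window size, the value k = -0.2, and the border-clipping behavior are our assumptions:

```python
import numpy as np

def niblack_binarize(gray, window=15, k=-0.2):
    """Niblack local binarization (brute-force sketch): each pixel
    is thresholded at T = mean + k * std of its window x window
    neighborhood, with the window clipped at the image border."""
    g = gray.astype(np.float64)
    h, w = g.shape
    r = window // 2
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            patch = g[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            t = patch.mean() + k * patch.std()
            out[y, x] = 1 if g[y, x] >= t else 0  # 1 = background, 0 = ink
    return out

# Dark two-pixel-high stroke (20) on a light background (200)
img = np.full((20, 20), 200, dtype=np.uint8)
img[9:11, 2:18] = 20
out = niblack_binarize(img, window=15, k=-0.2)
```

Note how the window problem described above shows up here: a window that straddles a character boundary mixes ink and background statistics, so the same character can receive different thresholds in different windows.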

2. Layer-Based Binarization

The areas of interest for our binarization approach are textual. In the proposed method, text is localized before undergoing local adaptive binarization.

Image text must be segmented into "homogeneous" regions prior to binarization. Unfortunately, most segmentation processes, including layout analysis, accept only binary images as input; thus, we encounter a chicken-and-egg paradox. Multi-stage binarization overcomes part of the problem. The following sections present the binarization stages that comprise our new approach:

1. Find strokes in the image
2. Classify strokes by intensity or color
3. Localize text areas in the image
4. Locally binarize the meaningful areas

2.1. Global Stroke Kernels

A text character printed on an image is composed of several strokes. Both printed and handwritten text are characterized by two main parameters: stroke width (or pen width) and contrast. A pixel is classified as "text/stroke" if its intensity or color differs from that of the pixels at a distance of one stroke width in its immediate neighborhood.

Let P(x,y) denote the pixel intensity or color vector at coordinates (x,y), and let w be the dominant stroke width. Most "text" pixels in an image can then be found by applying an operator that emphasizes strokes, checking for contrast along several directions. One such operator, which checks the contrast in four directions (horizontal, vertical, and the two diagonals), classifies P(x,y) as a stroke pixel if:

   (P(x-w, y) - P(x,y) > t  AND  P(x+w, y) - P(x,y) > t)  OR
   (P(x, y-w) - P(x,y) > t  AND  P(x, y+w) - P(x,y) > t)  OR
   (P(x+d, y+d) - P(x,y) > t  AND  P(x-d, y-d) - P(x,y) > t)  OR
   (P(x-d, y+d) - P(x,y) > t  AND  P(x+d, y-d) - P(x,y) > t)

The parameter t is the contrast in a grayscale image, or the color difference in a color image, and d = w/√2. One can easily verify that the accuracy of the stroke width in this operator is not critical, because text strokes are well "surrounded" by background. However, if the stroke width varies widely across an image, it can be estimated locally, as proposed in [7].

One method of estimating the t parameter is presented by Navon et al. [7]. However, when the text contrast varies across the image, for example when images are captured under non-uniform lighting conditions, estimating a single t is meaningless, as explained below in Section 2.2. For many applications, the stroke width can be considered a predefined parameter or estimated as shown in [7].

Pixels positioned on the image's strokes generally come from text, textured backgrounds, and graphics. To emphasize all the stroke pixels, we can set the t parameter to a minimal value (i.e., a minimum contrast or color difference) and apply the above operator for a given stroke width or a set of reasonable stroke widths. The result of this process is a mask image of stroke kernels, i.e., all the pixels that are part of strokes. Text printed in negative, with light text on a dark background, can be treated with the same operator by substituting the ">" symbol with "<".
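The four-direction stroke operator translates directly to array code. The NumPy sketch below is our illustration of it for dark text on a light background; the function names, the defaults w = 3 and t = 30, and the edge-replication behavior at image borders are our choices, not the paper's:

```python
import math
import numpy as np

def shifted(img, dy, dx):
    """Return img translated by (dy, dx), replicating edge pixels
    so that every comparison stays in-bounds."""
    h, w = img.shape
    ys = np.clip(np.arange(h) + dy, 0, h - 1)
    xs = np.clip(np.arange(w) + dx, 0, w - 1)
    return img[np.ix_(ys, xs)]

def stroke_mask(gray, w=3, t=30):
    """Mark a pixel as a stroke pixel if the background is brighter
    by more than t on BOTH sides along at least one of the four
    directions, at distance w (and d = round(w / sqrt(2)) on the
    diagonals, so the diagonal step also spans one stroke width)."""
    g = gray.astype(np.int32)
    d = max(1, int(round(w / math.sqrt(2))))
    pairs = [((0, -w), (0, w)),    # horizontal:    P(x-w,y), P(x+w,y)
             ((-w, 0), (w, 0)),    # vertical:      P(x,y-w), P(x,y+w)
             ((d, d), (-d, -d)),   # diagonal:      P(x+d,y+d), P(x-d,y-d)
             ((d, -d), (-d, d))]   # anti-diagonal: P(x-d,y+d), P(x+d,y-d)
    mask = np.zeros_like(g, dtype=bool)
    for (dy1, dx1), (dy2, dx2) in pairs:
        a = shifted(g, dy1, dx1) - g > t
        b = shifted(g, dy2, dx2) - g > t
        mask |= a & b   # both sides must contrast with the center
    return mask

# Thin dark horizontal stroke (20) across a light background (200)
img = np.full((20, 20), 200, dtype=np.uint8)
img[10, :] = 20
mask = stroke_mask(img, w=3, t=30)
```

The negative-text case mentioned above would flip the comparisons to `g - shifted(...) > t`, i.e., require the center to be brighter than both neighbors.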
