Skew Detection in Binary Image Documents Based on Image Dilation and Region labeling Approach

B. V. Dhandra¹, V. S. Malemath¹*, Mallikarjun H.¹, Ravindra Hegadi¹
¹Post-Graduate Department of Studies & Research in Computer Science, Gulbarga University, Gulbarga-585106, Karnataka, INDIA

Abstract

In this paper we propose a new and efficient method for estimating the skew angle of a binary document image, based on image dilation and region labeling. The input document is dilated using a line structuring element whose length is fixed experimentally, and region labeling is then applied using depth first search. The orientation angle is calculated for every labeled region, and the average of these orientation angles is taken as the skew angle of the document. The experimental results show that better accuracy of estimation can be achieved with this approach, since it is based on the orientation angles of all the text lines of the underlying document and is the minimum variance unbiased estimator of the true skew angle. The novelty of the proposed method is that it is robust for machine printed documents of any size or font, multi-column layouts, and documents containing graphics, pictures, charts, tables, etc.

1. Introduction

The conversion of paper documents to electronic format is routinely done for record management, automated document delivery, document archiving, journal distribution, etc. The stages of document conversion include scanning, displaying, image processing, text recognition, image and text database creation, and quality assurance. During scanning, the whole document or a portion of it is fed through a loose-leaf page scanner. Sometimes pages are not fed properly into the scanner, which skews the resulting bitmapped page images. A significant skew is easily detected by eye and can be corrected by re-scanning the document, whereas a mild skew may go unnoticed. Even a very small skew angle can cause the segmentation of complete characters from words or text lines to fail, as the distance between characters is reduced. Further, most OCR and document retrieval/display systems are very sensitive to skew in document images. Hence it is important to detect and correct skew.

Many methods have been proposed for detecting skew in binary document images. The majority are based on projection profiles, the Fourier transform, cross-correlation, the Hough transform, nearest neighbor connectivity, linear regression analysis and mathematical morphology. Baird [1] proposed a method using projection profiles, in which a number of projections are obtained at different angles close to the expected orientation and the variation of each projection is observed. The angle whose projection has the maximum peak, that is, the best match to the text lines, is taken as the skew angle. This method fails for documents with multi-column layouts, and its accuracy drops if the document contains noise. Postl [2] proposed a method based on the Fourier transform, in which the direction along which the density of Fourier space is largest is taken as the skew angle. The time complexity of this method is considerable, especially for large images. Yan [3] proposed a skew detection method using the cross-correlation between text lines at a fixed distance, based on the fact that the correlation between vertical lines in an image is maximum for a skewed document. This method is computationally expensive and less accurate. Srihari et al. [4] and others [5, 7, 16] have proposed skew detection methods based on the Hough Transform (HT). The HT is computed at all angles θ between 0 and 180 degrees. A heuristic measures the rate of change in accumulator values at each value of θ, and the skew angle is set to the value of θ that maximizes the heuristic. These methods are computationally very expensive. Shivakumara et al.

0-7695-2521-0/06/$20.00 (c) 2006 IEEE

[10] proposed a skew detection method based on nearest neighbor connectivity, fixing the boundary for each character in the text line using contour following. The direction of the text line with respect to the horizontal axis is obtained by growing the boundary until it reaches a pixel of the neighboring character. However, their method fails to give the desired accuracy when dots are present at the end of the text or when the document is line justified. Shivakumara et al. [8] proposed a method to estimate the skew angle based on linear regression analysis. The method considers all the black pixels in the document without segmenting the individual text lines. Linear regression is used to find the slope of the skewed document from all the pixel coordinates. This method gives better accuracy for skews up to ±10° but fails when the document contains non-textual regions, because the pixels in those regions are more scattered. Hiremath et al. [6] and others [14, 15] proposed methods based on nearest neighbor connectivity (NNC). Hiremath et al. [6] obtained the centroid of each character in the document and extracted the text line by searching for the centroid of the nearest character, based on Euclidean distance from the current character. Bounding boxes are fixed for the first and last characters of the text line. Three lines, pertaining to the top-left, centroid and bottom-right points of the bounding boxes, are drawn and their angles are measured. The average angle of these lines is taken as the estimate of skew. The method estimates skew well; however, it has the limitation that the first region must be a text line. Najman [11] proposed a morphology-based method in which the image is dilated with a line structuring element of length 64, followed by erosion by 512, and the skew is estimated using Brent's method of parabolic interpolation. This method has a mean absolute error of 0.2 and a mean square error of 0.25. Das et al. [12] proposed a morphology-based method using the run-length smoothing algorithm (RLSA), that is, a closing of the image with a line structuring element that forms solid black bands corresponding to text lines. They opened and closed the image to remove bumps and registered all transitions, which provide lines. The skew is computed for the biggest line. This method works better on small skew angles, but its time complexity is large. Chen et al. [13] proposed a method that starts with threshold reduction and then applies recursive morphological closings and openings to close up the text lines. Ascenders and descenders are removed and the connected components are determined. A best-fit line is computed for the points in each set of connected components, and the global skew is estimated as the average of these lines after discarding outliers. They reported the skew error to be larger than 0.3°.

The literature reveals that the methods which estimate the skew angle well are computationally expensive, applicable only to particular types of document, or less accurate. Thus there is a need for a method that achieves better accuracy at lower computational cost for any type of document image. Hence the following method is proposed.

2. Proposed Methodology

The proposed methodology is based on image dilation and region labeling. Dilation is an operation that grows or thickens objects in a binary image. The manner and extent of this thickening is controlled by a shape referred to as the structuring element (abbreviated as strel). Computationally, structuring elements are represented by a matrix of 0's and 1's. Mathematically, dilation is defined in terms of set operations. The dilation of A by C is denoted by $A \oplus C$ and is defined as $A \oplus C = \{ z \mid (\hat{C})_z \cap A \neq \emptyset \}$. This equation is based on obtaining the reflection of C about its origin, denoted $\hat{C}$, and shifting this reflection by z, denoted $(\hat{C})_z$; the set C is commonly referred to as the structuring element in dilation. The structuring element can be formed with different shapes such as a square, line, disk, ball, etc.

A text line is a group of characters, symbols and words that are adjacent, relatively close to each other, and through which a straight line can be drawn (usually with horizontal orientation). Hence the proposed algorithm uses a horizontal line structuring element in the dilation process. The image is dilated with the horizontal line structuring element, and the length of the line is fixed experimentally such that it bridges the gap between any two words of a text line. The hollow objects in the image are filled after dilation. The resulting image is then region labeled. Region labeling is performed by depth first search with 8-way connectivity. It labels the connected components in the binary image and assigns label 0 to the background, yielding the total number of regions present in the document.

The orientation angles of all these regions are calculated, and their average is taken as the estimated skew angle of the document. To obtain the orientation, an ellipse is fitted over each region. The angle of the major axis of the ellipse is measured with respect to the horizontal axis, and this angle is taken as the orientation angle. The proposed method is a global method and is better than local methods of skew angle detection, as it uses a large number of samples in determining the skew angle. The algorithm for the above procedure is as follows.
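As a minimal sketch of the dilation definition given above, the following Python code evaluates the set $\{ z \mid (\hat{C})_z \cap A \neq \emptyset \}$ literally over a bounded grid. The function names `horizontal_line` and `dilate`, the representation of images as sets of (row, column) coordinates, and the centred origin of the line are assumptions made for illustration; they are not taken from the original implementation.

```python
def horizontal_line(length):
    """Offsets (row, col) of a 1 x length line with its origin near the centre."""
    return {(0, k - length // 2) for k in range(length)}


def dilate(A, C, height, width):
    """Return A dilated by C, restricted to a height x width grid.

    A and C are sets of (row, col) coordinates of foreground pixels.
    """
    C_hat = {(-r, -c) for r, c in C}  # reflection of C about its origin
    result = set()
    for zr in range(height):
        for zc in range(width):
            # (C_hat)_z: the reflected element shifted by z = (zr, zc)
            shifted = {(r + zr, c + zc) for r, c in C_hat}
            if shifted & A:  # non-empty intersection with A
                result.add((zr, zc))
    return result


# Two isolated pixels on one row, four columns apart.
A = {(0, 0), (0, 4)}
bridged = dilate(A, horizontal_line(5), height=1, width=10)
print(sorted(c for _, c in bridged))  # [0, 1, 2, 3, 4, 5, 6]
```

With a line of length 5 the two pixels merge into one run, which is the effect the method relies on to join the characters and words of a text line into a single region.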


Algorithm:
Input: Binary document image.
Output: Estimated skew angle.
Step 1: Preprocess the document to eliminate noise and small areas such as dots, commas and colons. Invert the document.
Step 2: Carry out horizontal dilation of the preprocessed document with a line as the structuring element. Flood fill the hollow objects in the dilated image.
Step 3: Perform region labeling on the dilated image and calculate the orientation angle of each region.
Step 4: Compute the average of the orientation angles obtained in Step 3; this is the estimated skew angle of the input document.
End.
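The steps above can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: noise removal, inversion and flood filling are omitted, the input is assumed to be a list of rows with text pixels already set to 1, and the orientation of each region is computed from its second central moments, which gives the major axis of the ellipse having the same second moments. All function names and the `min_pixels` parameter are hypothetical.

```python
import math


def dilate_horizontal(image, length):
    """Dilate a 0/1 image (list of rows) with a centred 1 x length line."""
    left = length // 2
    right = length - 1 - left
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if image[r][c]:
                # Stamp the line onto the output around each text pixel.
                for cc in range(max(0, c - left), min(width, c + right + 1)):
                    out[r][cc] = 1
    return out


def label_regions(image):
    """Collect 8-connected regions with an iterative depth-first search."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]  # 0 is the background label
    regions = []
    for r0 in range(height):
        for c0 in range(width):
            if not image[r0][c0] or labels[r0][c0]:
                continue
            label = len(regions) + 1
            labels[r0][c0] = label
            stack, pixels = [(r0, c0)], []
            while stack:
                r, c = stack.pop()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        inside = 0 <= rr < height and 0 <= cc < width
                        if inside and image[rr][cc] and not labels[rr][cc]:
                            labels[rr][cc] = label
                            stack.append((rr, cc))
            regions.append(pixels)
    return labels, regions


def orientation_degrees(pixels):
    """Major-axis angle (degrees, counter-clockwise from horizontal)."""
    n = len(pixels)
    mean_r = sum(r for r, _ in pixels) / n
    mean_c = sum(c for _, c in pixels) / n
    mu_xx = sum((c - mean_c) ** 2 for _, c in pixels) / n
    mu_yy = sum((r - mean_r) ** 2 for r, _ in pixels) / n
    # Rows grow downwards, so the sign is flipped to get a y axis pointing up.
    mu_xy = -sum((c - mean_c) * (r - mean_r) for r, c in pixels) / n
    return math.degrees(0.5 * math.atan2(2 * mu_xy, mu_xx - mu_yy))


def estimate_skew(image, line_length=32, min_pixels=1):
    """Average the orientation of all labeled regions of the dilated image."""
    dilated = dilate_horizontal(image, line_length)
    _, regions = label_regions(dilated)
    angles = [orientation_degrees(p) for p in regions if len(p) >= min_pixels]
    if not angles:
        raise ValueError("no foreground regions found")
    return sum(angles) / len(angles)


if __name__ == "__main__":
    # Synthetic dashed "text line" rising to the right at 5 degrees.
    img = [[0] * 200 for _ in range(60)]
    for c in range(10, 190):
        if (c // 6) % 2 == 0:
            r = round(30 - (c - 100) * math.tan(math.radians(5)))
            for t in (-1, 0, 1):
                img[r + t][c] = 1
    print(estimate_skew(img, line_length=9))
```

The default `line_length=32` mirrors the structuring element length reported in the experiments; the synthetic example uses a shorter line because its gaps are only six pixels wide.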

3. Experimental Results

The experimentation was carried out on 60 A4-size document images (typically 750 × 1050 pixels) in .tif and .bmp formats, scanned from different international journals, magazines, project reports and books. Of these, 26 images contain only text; the remaining 34 contain tables, charts, graphs, figures, pictures, etc. Images with multi-column layouts were also considered: of the 60 images, 26 have one column, 18 have two columns and 16 have three columns. Document images of varied fonts and sizes were also considered. The length of the line structuring element was set to 32 for dilation so as to bridge the space between characters and words. The method was tested with pre-specified skew angles in the range 0° to 15°, since angles greater than 15° are easily detected by human vision. These angles were taken as the known skew angles. The experimental results for the known skew angles are summarized in Table 1.
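The paper does not state how the pre-specified skews were applied to the test images. One plausible way, shown here purely as an assumption, is to rotate an unskewed binary image by the known angle. The sketch below uses inverse mapping with nearest-neighbour sampling; the function name `rotate_binary` and the list-of-rows image representation are hypothetical.

```python
import math


def rotate_binary(image, angle_degrees):
    """Rotate a 0/1 image counter-clockwise about its centre.

    Pixels that would come from outside the source image are set to 0.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    cos_a = math.cos(math.radians(angle_degrees))
    sin_a = math.sin(math.radians(angle_degrees))
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Inverse mapping: find the source pixel that lands on (r, c).
            x, y = c - cx, cy - r  # coordinates with the y axis pointing up
            src_x = cos_a * x + sin_a * y  # rotate (x, y) by -angle
            src_y = -sin_a * x + cos_a * y
            sr, sc = round(cy - src_y), round(cx + src_x)
            if 0 <= sr < height and 0 <= sc < width:
                out[r][c] = image[sr][sc]
    return out


page = [
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(rotate_binary(page, 90))  # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
```

Nearest-neighbour sampling keeps the image strictly binary, at the cost of slightly jagged edges at small angles.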

Fig. 1 shows a single-column input document containing only text; Fig. 2 is the image after dilation of Fig. 1. Fig. 3 and Fig. 4 show a two-column input document and its image after dilation, respectively. Fig. 5 and Fig. 6 similarly present a three-column image containing text along with pictures and its image after dilation. Scatter Plot 1 shows the scatter of the estimated skew angles for the 60 tested images. The method is simple and robust in estimating the skew angle for all types of document. The experimental results indicate that better accuracy can be achieved using this method.

3.1 Sample Outputs

Fig. 1) Input Image

Fig. 2) Image after Dilation

Fig. 3) Input Image

Fig. 4) Image after Dilation

Fig. 5) Input Image

Fig. 6) Image after Dilation

Scatter Plot 1: Showing scatter of results for 60 tested images (axes: Angle, 0 to 16, against No. of Images, 0 to 70; legend: 3°, 5°, 8°, 10°, 13°, 15°)

Table 1. The mean, variance and std. deviation of the angles obtained for tested skewed documents

Known Skew    Mean      Variance    Std. Dev.
3°            2.895     0.008       0.090
5°            4.840     0.012       0.110
8°            7.772     0.018       0.136
10°           9.713     0.025       0.159
13°           12.64     0.040       0.201
15°           14.57     0.076       0.277
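As a small worked check of the figures in Table 1, the following sketch computes the gap between each known skew and the reported mean, and the square root of each reported variance for comparison with the reported standard deviation. The values are copied from Table 1, with the known skews of 3°, 5° and 8° read off the matching means in Table 2; the names `TABLE_1` and `summarize` are illustrative only.

```python
import math

# (known skew in degrees, mean, variance, std. dev.) as reported in Table 1
TABLE_1 = [
    (3, 2.895, 0.008, 0.090),
    (5, 4.840, 0.012, 0.110),
    (8, 7.772, 0.018, 0.136),
    (10, 9.713, 0.025, 0.159),
    (13, 12.64, 0.040, 0.201),
    (15, 14.57, 0.076, 0.277),
]


def summarize(rows):
    """Return (known skew, known minus mean, sqrt(variance)) for each row."""
    summary = []
    for known, mean, variance, _std in rows:
        summary.append((known, round(known - mean, 3), round(math.sqrt(variance), 3)))
    return summary


for known, gap, root_var in summarize(TABLE_1):
    print(f"{known:>2} deg  known - mean = {gap:+.3f}  sqrt(variance) = {root_var:.3f}")
```

The square roots agree with the reported standard deviations to within rounding of the three-decimal variances.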


3.2 Comparison with other methods

Table 2 compares the mean values of the angles obtained for the 60 test images by global skew estimation methods, namely the Hough transform over the total image, the projection profile, and the proposed method.

Table 2. Comparison of mean values for 60 test images with different methods

Known Skew    Hough Transform    Projection Profile    Proposed Method
3°            3.72               2.80                  2.895
5°            4.71               5.24                  4.840
8°            8.39               7.62                  7.772
10°           10.23              9.67                  9.713
13°           13.33              12.57                 12.64
15°           14.72              14.46                 14.57

From Table 2 it can be observed that the mean estimates of the proposed method are closer to the known skew than those of the projection profile method at every tested angle, and closer than those of the Hough transform at 3°, 5° and 8°.

4. Conclusion

The experimental results for the proposed method are encouraging. The approach is robust for machine printed documents of any size and font. Further, the method showed the best accuracy compared to the other methods on multi-column layouts and on documents containing graphics, pictures, tables, charts, figures, mathematical expressions, etc. The method performs better because it considers all the text lines in the document when estimating the skew angle. Further, the method is based on the minimum variance unbiased estimator of the true skew angle. A global skew estimation method is used because a global method considers a larger number of text-line samples than local methods do.

Acknowledgements

The authors are grateful to the authorities of K. L. E. Society, Belgaum, for providing financial assistance to the second author. The authors are grateful to the referees for their valuable comments and suggestions. The authors are also grateful to Dr. P. Nagabhushan, Dr. G. Hemantha Kumar and Dr. D. S. Guru, Dept. of Studies in Computer Science, University of Mysore, for their helpful discussions and encouragement throughout this work.

5. References

[1] Baird H. S., “The skew angle of printed documents”, Proc. of SPSE 40th Symposium on Hybrid Imaging Systems, Rochester, NY, 1987, pp 739-743.
[2] Postl W., “Detection of linear oblique structure and skew scan in digitized documents”, Proc. of Int. Conf. on Pattern Recognition, 1986, pp 687-689.
[3] Yan H., “Skew correction of document images using interline cross-correlation”, Computer Vision, Graphics, and Image Processing 55, 1993, pp 538-543.
[4] Srihari S. N. and Govindraju V., “Analysis of textual images using Hough Transform”, Machine Vision Applications 2, 1989, pp 141-153.
[5] Hindus S. C., “A Document skew detection using runlength Encoding and the Hough Transform”, Proc. of Int. Conf. on Pattern Recognition, Vol. I, 1990, pp 464-468.
[6] Hiremath P. S., B. V. Dhandra, V. S. Malemath, G. G. Rajput, “Skew detection in binary document image based on nearest neighbor connectivity using region centroid approach”, Vigyana Ganga Gulbarga University Research Journal, India, Vol. 4, 2005, pp 73-78.
[7] Amin A. and S. Fischer, “A Document skew detection method using Hough Transform”, Pattern Anal. and Applns., Springer-Verlag London, 3, 2000, pp 243-253.
[8] Shivakumara P., S. Guru, G. Hemantha Kumar, P. Nagabhushan, “Skew detection in Binary document image using Linear Regression Analysis”, Proc. of National Conf. on Advanced Computer Application NCAC-2002, Pollachi, India, 2002, pp 41-46.
[9] Gonzalez R., Woods, Digital Image Processing, Addison-Wesley Publishing Company, 2nd Ed., 2002.
[10] Shivakumara P., D. S. Guru, G. Hemantha Kumar, P. Nagabhushan, “Skew detection in Binary document Images Based on Boundary growing approach”, (G. Hemantha Kumar 2004 pers. comm.) 2004.
[11] Najman L., “Using mathematical morphology for document skew estimation”, SPIE Document Recognition and Retrieval XI, vol. 5296, 2004, pp 182-191.
[12] Das A. and B. Chanda, “A fast algorithm for skew detection of document images using morphology”, International Journal on Document Analysis and Recognition (IJDAR), Vol. 4, 2001, pp 109-114.
[13] Chen S. and R. M. Haralick, “An automatic algorithm for text skew estimation in document images using recursive morphological transforms”, In ICIP-94, Austin, Nov. 1994, pp 139-143.
[14] O’Gorman L., “The document spectrum for page layout analysis”, In IEEE PAMI, Vol. 11, 1993, pp 1162-1173.
[15] Nakano Y., Y. Shima, H. Fujisawa, J. Higashino and M. Fujinawa, “An algorithm for skew normalization of document images”, Proc. of the 10th Int. Conf. on Pattern Recognition, New Jersey, 1990, pp 8-13.
[16] B. Yu and A. K. Jain, “A robust and fast skew detection algorithm for generic documents”, Pattern Recognition, 29, no. 10, 1996, pp 1599-1630.
