Global Skew Detection and Correction using Morphological and Statistical Methods

Sharfuddin Waseem Mohammed, Narasimha Reddy Soora

Department of Computer Science and Engineering, Kakatiya Institute of Technology and Sciences, Warangal, Telangana, India

[email protected], [email protected]

Abstract. In this paper we propose a technique for skew detection and correction in printed documents, and use an existing Optical Character Recognition (OCR) system to recognize the characters. The proposed algorithm has the following steps: a) applying morphological dilations with various structuring elements (SE); b) extracting the longest connected components (CC); c) finding the global skew angle by statistical analysis of the connected components; d) estimating a reference text line and fitting a regression line to rotate the individual line by the estimated angle of rotation. We conducted experiments on printed images in different languages, i.e. English, Devanagari, and Arabic (custom dataset), and achieved significant performance.

Keywords: Morphological dilations, statistical analysis, regression line fit, connected components analysis.

1 Introduction

Most historical documents are preserved in digital format by scanning them at a suitable resolution in dpi (dots per inch). Retrieving information from these digital images requires optical character recognition (OCR) to recognize the characters, with the help of techniques such as [12], [13]. In document recognition systems, the quality of the input image is essential to the output performance. During scanning, adverse effects such as tilting of the document produce noise or skew, which are unavoidable. Many techniques have been proposed in the literature to overcome the noise. Many character recognition systems therefore preprocess the document to improve efficiency; this preprocessing involves noise removal and skew correction.

2 Related Work

Skew detection and correction can be performed on local text regions in an image, which is referred to as local skew detection and is based on the distance between characters, words and lines, as proposed by Saragiotis et al. [2]. If the analysis is performed on the whole document, it is referred to as global skew detection. Most of the related skew estimation algorithms are designed for global skew. Many different approaches have been proposed for skew estimation, such as projection profiles [3-7], the Hough transform [9], nearest neighbor clustering [10], and interline cross-correlation [11].

The most traditional method is the projection profile approach, which is a simple way to detect the skew angle of a document image. It was proposed by Postl [3]: horizontal projection profiles are calculated, and the profile with maximum variation corresponds to the best alignment with the text lines; the document is then rotated by this projection angle to correct the skew. Several algorithms have been proposed to reduce the computational effort. Baird [4] selects the midpoint of the bottom side of the bounding box of each connected component (CC) and projects it; the objective function is the sum of the squares of the profile values. Ciardiello et al. [5] project a selected sub-region, and the objective is to maximize the mean square deviation of the profile. Ishitani [6] uses a different approach: a cluster of parallel lines on the image is selected, and the number of black/white transitions along the lines is stored. Bloomberg et al. [7] extract the projection profile from a sample image and estimate the skew for the sample rather than the whole document, which makes this method faster. All the methods mentioned above use horizontal projection profiles; a method based on vertical projections has also been proposed by A. Papandreou and B. Gatos [8]. Projection profile methods are limited to estimating skew angles within ±10° to ±15° [4].

The Hough transform is generally used to find shapes in binary digital images. In the method of S. N. Srihari and V. Govindaraju [9], the image is transformed into the Hough domain, a peak is computed to detect the skew, and the result is validated in the image domain. If the text is widely scattered, it is difficult to find the maximum peak, which is the major limitation of this approach. The nearest neighbor clustering method is based on page layout analysis. The method of L. Gorman [10] clusters the nearest neighbor CCs; it exhibits poor text line segmentation and is limited to certain languages. In the interline cross-correlation method of H. Yan [11], the cross-correlation between two lines at a fixed distance is calculated, and the correlation functions for all pairs of lines are accumulated to find the shift that determines the skew; this method is suitable for small skew angles up to 10°.

Our approach is a statistical method that estimates the skew by accumulating the 10 longest CCs and considering their mean and standard deviation; a reference text line is estimated from the top-left corners of the selected longest CCs, and the skew is estimated to rotate the document. Experiments were conducted on printed images in different languages, i.e. English, Devanagari, and Arabic (custom dataset). The results support the robustness of the proposed algorithm.

3 Proposed Work

Historical documents are scanned with an electronic scanner to convert them into digital images, either color or grayscale. Such an image is a matrix M of R rows and C columns whose entries are intensity values. If the image is color, the intensity is a combination of RGB (i.e. red, green and blue) values; if it is grayscale, M(i,j) ∈ {0, 1, 2, ..., 255}, where i = 1, ..., R and j = 1, ..., C. After binarization, the image M is converted into a binary image B(i,j) whose values are either 0 or 1. The proposed algorithm consists of the following steps:

Step 1: Preprocessing of the image: remove the components which are < 5 pixels in dimension, apply image binarization, and fix a bounding box.
Step 2: Applying the morphological dilations by defining the structuring elements (SE).
Step 3: Extracting the longest CCs.
Step 4: Finding the global skew angle by statistical method.
Step 5: Reference text line estimation by regression line fit.
Step 6: Document image rotation with the defined skew angle.
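Step 1 can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the fixed threshold of 128, the use of 8-connectivity, and the reading of "< 5 pixels in dimension" as "the longer side of the bounding box is shorter than 5 pixels" are all assumptions, and the function names are hypothetical.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale matrix M(i,j) in 0..255 to a binary matrix B(i,j).
    Dark pixels (text) become 1. The fixed threshold is an assumption."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]

def connected_components(binary):
    """Return the components as lists of (row, col) pixels (8-connectivity assumed)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def bounding_box(pixels):
    """Return (x_min, y_min, x_max, y_max), with x as column and y as row."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(xs), min(ys), max(xs), max(ys)

def remove_small_components(binary, min_size=5):
    """Erase components whose bounding box is shorter than min_size on its longer side."""
    cleaned = [row[:] for row in binary]
    kept_boxes = []
    for pixels in connected_components(binary):
        x0, y0, x1, y1 = bounding_box(pixels)
        width, height = x1 - x0 + 1, y1 - y0 + 1
        if max(width, height) < min_size:
            for y, x in pixels:
                cleaned[y][x] = 0
        else:
            kept_boxes.append((x0, y0, x1, y1))
    return cleaned, kept_boxes
```

The bounding boxes returned for the surviving components are the inputs that the later steps operate on.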

Step 1: Preprocessing of the image: In the preprocessing stage, the components which are < 5 pixels in dimension are removed [...]

    if Angle > 5
        se = [1,0;0,1];
    end
    if Angle < -5
        se = [0,1;1,0];
    end

After the SEs are selected based on the angle, we again apply morphological dilation to the input document using the selected SEs. In Eq. (1), C is the input document and SE is the structuring element selected based on the angle of the longest CC.

$$X_n = C \oplus SE \tag{1}$$

$X_n$ is the resultant matrix after applying dilation.

Step 3: Extracting the longest CCs. From the resultant image $X_n$, we extract all the longest CCs by analyzing the subsequent connectivity of components from the top-left-most bounding box to the next bounding box.

$$LCC = \max\left(\bigcap_{n=1}^{3} X_n(i,j)\right) \tag{2}$$

All the longest CCs, LCCALL, are shown in Figure 1.
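The angle-dependent dilation of Step 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the first condition is read as Angle > 5 (the source text is damaged at that point), the horizontal SE used for angles between -5 and 5 is hypothetical because the text does not give it, and the SE origin is taken to be its top-left element.

```python
def select_se(angle_deg):
    """Choose a structuring element from the coarse angle of the longest CC."""
    if angle_deg > 5:
        return [[1, 0], [0, 1]]   # diagonal SE, as in se = [1,0;0,1]
    if angle_deg < -5:
        return [[0, 1], [1, 0]]   # anti-diagonal SE, as in se = [0,1;1,0]
    return [[1, 1]]               # assumption: horizontal SE, not given in the text

def dilate(binary, se):
    """Binary dilation: union of copies of the image shifted by each active SE offset.
    The SE origin is assumed to be its top-left element."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    offsets = [(a, b) for a, se_row in enumerate(se)
               for b, value in enumerate(se_row) if value]
    for i in range(rows):
        for j in range(cols):
            if binary[i][j]:
                for a, b in offsets:
                    # Shifted pixels that fall outside the image are dropped.
                    if i + a < rows and j + b < cols:
                        out[i + a][j + b] = 1
    return out
```

Dilating with a diagonal SE merges neighboring characters along the direction of the skewed text line, which is what makes the longest CCs of Step 3 follow the line.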


Fig. 1. The extraction of all longest connected components LCCALL

Step 4: Finding the global skew angle by statistical method. Here we consider the 10 longest CCs, i.e. those whose aspect ratio (AR) is largest among all the CCs (LCCALL), to find the average slope and compute the skew angle. We apply the mean and standard deviation to the heights and widths of these 10 longest CCs. Figure 2 shows the centroid of the individual bounding box of a longest CC, where ($X_{min}$, $Y_{min}$) are the bottom-left coordinates of the bounding box, ($X_{min}$, $Y_{max}$) are the top-left coordinates, ($X_{max}$, $Y_{min}$) are the bottom-right coordinates and ($X_{max}$, $Y_{max}$) are the top-right coordinates.
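The selection in Step 4 can be sketched as below, assuming each CC is represented by its bounding box (x_min, y_min, x_max, y_max) and that the aspect ratio is width divided by height. The function names are hypothetical, and the population standard deviation is an assumption, since the text does not say which variant is used.

```python
from statistics import mean, pstdev

def top_k_by_aspect_ratio(boxes, k=10):
    """Return the k bounding boxes with the largest width/height ratio."""
    def aspect_ratio(box):
        x_min, y_min, x_max, y_max = box
        return (x_max - x_min + 1) / (y_max - y_min + 1)
    return sorted(boxes, key=aspect_ratio, reverse=True)[:k]

def size_statistics(boxes):
    """Mean and standard deviation of the widths and heights of the given boxes."""
    widths = [x_max - x_min + 1 for x_min, _, x_max, _ in boxes]
    heights = [y_max - y_min + 1 for _, y_min, _, y_max in boxes]
    return {
        "width_mean": mean(widths),
        "width_std": pstdev(widths),    # population standard deviation (assumption)
        "height_mean": mean(heights),
        "height_std": pstdev(heights),
    }
```

For example, `size_statistics(top_k_by_aspect_ratio(boxes))` summarizes the 10 most elongated components of a page.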

Fig. 2. Centroid of the bounding box

The centroid of a CC is computed as follows:

$$CCC = \left( \frac{X_{min} + X_{max}}{2},\ \frac{Y_{min} + Y_{max}}{2} \right) \tag{3}$$
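As a worked illustration of Eq. (3), the sketch below computes bounding-box centroids and fits a straight line through a set of them. The ordinary least-squares fit is an assumption about the kind of regression intended, and both function names are hypothetical.

```python
def centroid(box):
    """Centroid of a bounding box (x_min, y_min, x_max, y_max), as in Eq. (3)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def fit_line(points):
    """Ordinary least-squares fit of y = m*x + c through (x, y) points.
    Needs at least two points with distinct x values."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c
```

With the boxes (0, 0, 10, 4), (20, 2, 30, 6) and (40, 4, 50, 8), the centroids are (5, 2), (25, 4) and (45, 6), and the fitted line has slope 0.1 and intercept 1.5.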

We use the centroid of each individual CC, i.e. CCC (centroid of connected component), to fit a reference line in the component.

Step 5: Reference text line estimation by regression line fit.


The reference text line of a printed document is almost linear, hence we can use a first-degree polynomial equation for fitting a line:

$$y = mx + c \tag{4}$$

where m is the slope of the line and c is the y-intercept. These can be calculated as follows:

$$m = \frac{Y_{max} - Y_{min}}{X_{max} - X_{min}} \tag{5}$$

The above equation is used to calculate the slope of the line from the corresponding bounding box of the longest CC.

$$c = \frac{Y_{max} - m \cdot X_{max}}{P} \tag{6}$$

where P is the number of bounding boxes in an individual CC. The slope angle of each of the top 10 longest CCs is then calculated as follows:

$$\theta = \arctan(m) \tag{7}$$

where m is the slope of the regression line fitted through the centroids of the CCs.

Step 6: Document image rotation with the defined skew angle. The estimated skew angle θ is computed for each of the top 10 longest CCs, and the skewed document image is rotated by their mean angle θmean. We take θ1, θ2, θ3, ..., θ10 as the angles of the longest CCs with the x-axis and compute θmean = (θ1 + θ2 + ... + θ10) / 10. Figure 3 illustrates the computation of θ for 3 longest CCs; the angles for all 10 longest CCs are computed similarly.

Fig. 3. Computing of θ1, θ2 and θ3 from longest connected components.
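The angle computation of Steps 5 and 6 can be sketched as follows, taking the slope of each longest CC from its bounding box as in Eq. (5) and the angle from Eq. (7). The function names are hypothetical, and the sketch stops at the mean angle; the image rotation itself is left to an image library.

```python
import math

def box_slope(box):
    """Slope from a bounding box (x_min, y_min, x_max, y_max), as in Eq. (5)."""
    x_min, y_min, x_max, y_max = box
    return (y_max - y_min) / (x_max - x_min)

def skew_angle_deg(box):
    """Angle with the x-axis in degrees, theta = arctan(m), as in Eq. (7)."""
    return math.degrees(math.atan(box_slope(box)))

def mean_skew_angle_deg(boxes):
    """Mean of the per-component angles, used as the rotation angle in Step 6."""
    angles = [skew_angle_deg(box) for box in boxes]
    return sum(angles) / len(angles)
```

Because a bounding box always has Ymax ≥ Ymin and Xmax > Xmin, the slope of Eq. (5) as printed is never negative, so this sketch yields the magnitude of the skew but not its direction.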

4 Experiment Results

Tables 1, 2, 3, and 4 demonstrate the results of the proposed algorithm, listing the input images and the output images (without skew). Experiments were conducted on text document images in languages such as English, Telugu, Devanagari and Arabic. We considered both printed and handwritten documents with multilingual scripts and with multi-skewed text lines to test the performance of the proposed algorithm.

Table 1. Results showing English printed and handwritten documents.

a1) Original image with stamps and skew.
a2) a1 image after de-skew.
b1) Original image with English characters.
b2) b1 image after de-skew.
c1) Original image with layout and tables.
c2) c1 image after de-skew.
d1) Original image with rectangular boxes.
d2) d1 image after de-skew.

Table 2. Results showing Devanagari printed and handwritten documents.

e1) Original image with Devanagari.
e2) e1 image after de-skew.
f1) Original image with Devanagari, max skew.
f2) f1 image after de-skew.

Table 3. Results showing Telugu printed and handwritten documents.

g1) Original image with Telugu.
g2) g1 image after de-skew.
h1) Original image with handwritten Telugu.
h2) h1 image after de-skew.
i1) Original image with printed Telugu.
i2) i1 image after de-skew.
j1) Original image with multi-lingual text.
j2) j1 image after de-skew.

Table 4. Results showing Arabic printed and handwritten documents.

k1) Original Arabic handwritten document.
k2) k1 image after de-skew.
l1) Original Arabic document with max skew.
l2) l1 image after de-skew.
m1) Original Arabic printed document.
m2) m1 image after de-skew.
n1) Original Arabic image with blur and skew.
n2) n1 image after de-skew.

5 Conclusion

This paper proposes a technique to detect and correct the global skew in printed and handwritten documents by considering the top 10 longest CCs, obtained by applying morphological operations and statistical methods. We tested the proposed algorithm on documents in multiple languages, namely English, Telugu, Devanagari and Arabic, and observed encouraging results, as shown in Tables 1, 2, 3 and 4.

References

1. Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recognition. 33, 225–236 (2000).
2. Saragiotis, P., Papamarkos, N.: Local skew correction in documents. International Journal of Pattern Recognition and Artificial Intelligence. 22, 691–710 (2008).
3. Postl, W.: Detection of linear oblique structures and skew scan in digitized documents. In: 8th International Conference on Pattern Recognition, pp. 687–689 (1986).
4. Baird, H.S.: The skew angle of printed documents. In: 40th Symposium on Hybrid Imaging Systems, Rochester, NY, pp. 739–743 (1987).
5. Ciardiello, G., Scafuro, G., Degrandi, M.T., Spada, M.R., Roccotelli, M.P.: An experimental system for office document handling and text recognition. In: 9th International Conference on Pattern Recognition, pp. 739–743 (1988).
6. Ishitani, Y.: Document skew detection based on local region complexity. In: 2nd International Conference on Document Analysis and Recognition, Tsukuba, Japan, pp. 49–52 (1993).
7. Bloomberg, D.S., Kopec, G.E., Dasari, L.: Measuring document image skew and orientation. Document Recognition. 2422, 302–316 (1995).
8. Papandreou, A., Gatos, B.: A Novel Skew Detection Technique Based on Vertical Projections. In: International Conference on Document Analysis and Recognition, pp. 384–388 (2011).
9. Srihari, S.N., Govindaraju, V.: Analysis of textual images using the Hough transform. Machine Vision and Applications. 2, 141–153 (1989).
10. Gorman, L.: The document spectrum for page layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. 15, No. 11, 1162–1173 (1993).
11. Yan, H.: Skew Correction of Document Images Using Interline Cross-Correlation. CVGIP: Graphical Models and Image Processing, Vol. 55, No. 6, pp. 538–543 (1993).
12. Narasimha Reddy, Soora., Parag, S. Deshpande.: Novel Geometrical Shape Feature Extraction Techniques for Multi-lingual Characters Recognition. IETE Technical Review, DOI: 10.1080/02564602.2016.1229583 (2016).
13. Narasimha Reddy, Soora., Parag, S. Deshpande.: Robust Feature Extraction Technique for License Plate Characters Recognition. IETE Journal of Research. 61, No. 01, pp. 73–80 (2015).