Robust Text Detection in Images using Morphological Operations and Gabor Wavelet
Nitin Kaushik¹, Divya Sarthi, Ankush Mittal
{nitnkpec, samaypec, ankumfec}@iitr.ernet.in
Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee, Roorkee - 247 667
ABSTRACT
Text detection in images or videos is a vital step towards various objectives, such as image inpainting, content-based retrieval and character recognition. This paper introduces a novel approach for text detection using morphological operators and the Gabor wavelet. The proposed method utilizes the inherent characteristics of text, namely its orientation and its frequency along that orientation. The detection is based on the observation that the font size of the text is uniform throughout the image and that the text area is rich in high-frequency components. The distinction between text and background becomes more difficult as the high-frequency components in the image background increase. The technique works effectively on different types of images, including photographs of signboards and regular text images.
Text extraction, which includes text detection, localization, binarization and recognition, may at first seem to be a trivial application; however, extracting text from real images faces numerous challenges due to low resolution, unknown text color, size and position, and complex backgrounds. In general, text appearing in images can be classified into two groups: scene text and artificial text. Scene text is part of the image and usually does not represent information about the image content (e.g. traffic signs in an outdoor scene), whereas artificial text is laid over the image at a later stage (e.g. the name of a person during an interview). Our approach works well in both cases.
Key Words: Gabor wavelet, Morphological operators, Text detection, Filtering. Other words: Image inpainting, Content based retrieval, Character recognition, Edge detection, Connected component.
I. INTRODUCTION
Advances in science continually strive to meet the needs of the time, and new technologies propose contemporary methods to deal with emerging challenges. Rapidly growing demands for information require applications that can automatically process document and multimedia data and intelligently generate relevant data for storage, retrieval and manipulation. Text appearing in images can provide very useful semantic information and may be a good key to describe the image content. Conversely, some image inpainting applications require removal of text as a preprocessing step.
¹ Corresponding author
In this paper, we present an approach that allows detection and extraction of text from color as well as grayscale images. The approach aims to be robust with respect to different kinds of text appearance, including font size, color and direction of text. To achieve this aim, the proposed algorithm focuses on the inherent characteristics of text, namely its orientation and its frequency along that orientation. To extract text features from the image we choose the Gabor wavelet, since it is potentially capable of simulating the human eye. Morphological operators are used in the preprocessing stage to remove noise and spurious text-like objects from the image.
Several approaches for text detection in images and videos have been proposed in the past. Most of the proposed work deals only with horizontal text in scanned document images [2, 3, 4] and employs edge detection methods [3, 4, 6]. Our approach works in a much simpler manner and exploits the advantages of filter theory.
The rest of the paper is organized as follows. Section II reviews the theory of the Gabor wavelet. Section III discusses the proposed method. Section IV presents results obtained by applying the working code to natural scenes and other images. Conclusions and future work follow in Section V.
II. GABOR WAVELET
In the literature, the Gabor wavelet [1, 7] has proved to be very useful for texture analysis and is widely adopted to extract texture features from images. Gabor filters are a group of wavelets, with each wavelet capturing energy at a specific frequency and in a specific direction [9]. Experimental evidence on human and mammalian vision supports the notion of spatial-frequency (multi-scale) analysis that maximizes the simultaneous localization of energy in both the spatial and frequency domains [4], which is similar to how the Gabor wavelet functions. For a given image I(x, y) of size P×Q, its discrete Gabor wavelet transform is given by the convolution:
$$G_{mn}(x, y) = \sum_{s} \sum_{t} I(x - s,\, y - t)\, \Psi_{mn}^{*}(s, t)$$
where $s$ and $t$ are the filter mask size variables, and $\Psi_{mn}^{*}$ is the complex conjugate of $\Psi_{mn}$, which is a class of self-similar functions generated from dilation and rotation of the following mother wavelet:
$$\Psi(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right] \exp(j 2\pi W x)$$
where W is called the modulation frequency. The self-similar Gabor wavelets are obtained through the generating function:
$$\Psi_{mn}(x, y) = a^{-m}\, \Psi(\tilde{x}, \tilde{y})$$
where $m$ and $n$ specify the scale and orientation of the wavelet respectively, with $m = 0, 1, \ldots, M-1$ and $n = 0, 1, \ldots, N-1$, and
$$\tilde{x} = a^{-m}(x\cos\theta + y\sin\theta), \qquad \tilde{y} = a^{-m}(-x\sin\theta + y\cos\theta)$$
In our implementation we choose $m = 0$ (no scaling). The orientation $\theta$ is calculated by applying Sobel filters in every block, and $W$ is the dominant frequency in that block.
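The definitions in this section can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' code: the function names, the default values of sigma_x, sigma_y, half_size and a, the use of an averaged structure tensor to turn Sobel responses into a single block orientation, and the use of the strongest FFT peak as the dominant frequency are all assumptions that the paper does not specify.

```python
import numpy as np


def gabor_kernel(W, theta, sigma_x=2.0, sigma_y=2.0, half_size=7, a=2.0, m=0):
    """Sample Psi_mn on a square grid; m = 0 means no scaling (hypothetical defaults)."""
    coords = np.arange(-half_size, half_size + 1, dtype=float)
    x, y = np.meshgrid(coords, coords)  # x varies along columns, y along rows
    scale = a ** (-m)
    x_t = scale * (x * np.cos(theta) + y * np.sin(theta))
    y_t = scale * (-x * np.sin(theta) + y * np.cos(theta))
    envelope = np.exp(-0.5 * (x_t ** 2 / sigma_x ** 2 + y_t ** 2 / sigma_y ** 2))
    carrier = np.exp(2j * np.pi * W * x_t)
    return scale * envelope * carrier / (2.0 * np.pi * sigma_x * sigma_y)


def gabor_response(image, kernel):
    """Return |G(x, y)| with G = sum_s sum_t I(x - s, y - t) * conj(Psi(s, t)), zero padded."""
    h = kernel.shape[0] // 2
    rows, cols = image.shape
    padded = np.pad(np.asarray(image, dtype=float), h)
    out = np.zeros((rows, cols), dtype=complex)
    for t in range(-h, h + 1):      # vertical offset (y direction)
        for s in range(-h, h + 1):  # horizontal offset (x direction)
            weight = np.conj(kernel[t + h, s + h])
            out += weight * padded[h - t:h - t + rows, h - s:h - s + cols]
    return np.abs(out)


def estimate_block_parameters(block):
    """Return (theta, W) for one block: Sobel-based orientation, dominant FFT frequency."""
    b = np.asarray(block, dtype=float)
    # 3x3 Sobel responses on the block interior, written with array slices.
    gx = (b[:-2, 2:] + 2 * b[1:-1, 2:] + b[2:, 2:]) - (b[:-2, :-2] + 2 * b[1:-1, :-2] + b[2:, :-2])
    gy = (b[2:, :-2] + 2 * b[2:, 1:-1] + b[2:, 2:]) - (b[:-2, :-2] + 2 * b[:-2, 1:-1] + b[:-2, 2:])
    # Assumption: dominant direction of intensity variation from the averaged structure tensor.
    theta = 0.5 * np.arctan2(2.0 * np.sum(gx * gy), np.sum(gx ** 2 - gy ** 2))
    # Assumption: dominant frequency is the strongest spectral peak after removing the mean.
    spectrum = np.abs(np.fft.fft2(b - b.mean()))
    r, c = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    fy = np.fft.fftfreq(b.shape[0])[r]
    fx = np.fft.fftfreq(b.shape[1])[c]
    return theta, float(np.hypot(fx, fy))  # W in cycles per pixel
```

For a block, one would call `theta, W = estimate_block_parameters(block)` and then `gabor_response(block, gabor_kernel(W, theta))`. Under these assumptions, the response is large where the block contains texture at frequency W varying along direction theta and small elsewhere.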
III. TEXT DETECTION PROCESS
There are broadly two classes of methods for text localization, as described in [9]: 1. morphological operation based methods, and 2. frequency or texture based methods. The first class employs concepts based on the analysis of the geometrical arrangement of edges or of homogeneous color and grayscale components that belong to characters. These methods are simple to implement, but they are not very robust for text localization in images with complex backgrounds. The second class considers text as regions with distinct textural properties. Methods of frequency and texture analysis, such as Gabor filtering and the wavelet transform, are used to analyze text regions.
The technique used for text detection here takes advantage of both of the above methodologies [2, 6]. The image is first thresholded, then eroded and dilated using a suitable structuring element to remove small and useless components. After this, the image is padded uniformly and subsequently divided into N×N blocks, where N is a user-specified parameter that should reflect the size of the text present in the particular image.
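The preprocessing steps above (threshold, erode, dilate, pad, divide into N×N blocks) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the fixed threshold of 128, the assumption of bright text on a dark background, the 3×3 square structuring element, the edge-replication padding on the bottom and right, and all function names are hypothetical choices that the paper does not specify.

```python
import numpy as np


def binary_erode(mask, k=3):
    """Erode a boolean mask with a k x k square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    h = k // 2
    padded = np.pad(mask, h, constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def binary_dilate(mask, k=3):
    """Dilate a boolean mask with a k x k square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    h = k // 2
    padded = np.pad(mask, h, constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def preprocess(gray, threshold=128, k=3):
    """Threshold, then erode and dilate (a morphological opening) to drop small components."""
    mask = np.asarray(gray) > threshold  # assumption: text is brighter than the background
    return binary_dilate(binary_erode(mask, k), k)


def split_into_blocks(image, n):
    """Pad the image to a multiple of n and return an array of shape (rows, cols, n, n)."""
    image = np.asarray(image)
    pad_r = (-image.shape[0]) % n
    pad_c = (-image.shape[1]) % n
    padded = np.pad(image, ((0, pad_r), (0, pad_c)), mode="edge")  # assumed padding scheme
    rows, cols = padded.shape[0] // n, padded.shape[1] // n
    return padded.reshape(rows, n, cols, n).swapaxes(1, 2)
```

With this layout, `split_into_blocks(image, 16)[i, j]` is the 16×16 block in block-row i and block-column j, which can then be processed independently.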
Each block is processed separately to find the dominant frequency along a particular orientation of the text. Gabor filters specific to each block are then applied to extract the dominant texture in that block, which will be text oriented in a particular direction. Based on the assumption that text regions are rich in high-frequency components and will be the dominant component in the block, a mask is created.
However, line-based clutter present in the image shares the same property as text. To remove it, morphological operations are considered, and Connected Component Analysis (CCA) is one such operation. A connected component is a set of foreground pixels in a binary image that are joined to one another through neighboring pixels. Text regions tend to consist of connected components of a particular size and density, whereas clutter may be too small or too large and is usually disproportionate compared to the aspect ratio of the text font. The CCA algorithm [2] is applied to the image, and each connected component is measured for its size. The area $A_i$ of a connected component $C_i$ is measured as the total number of pixels present in $C_i$; $h_i$ and $w_i$ are the height and width of the rectangular bounding box that encloses $C_i$. The average area, average height and average width of all components are computed as
$$T_A = \frac{1}{M}\sum_{i=1}^{M} A_i, \qquad H = \frac{1}{M}\sum_{i=1}^{M} h_i, \qquad \text{and} \qquad W = \frac{1}{M}\sum_{i=1}^{M} w_i$$
where $M$ is the total number of connected components present in the image. A given component is judged to be clutter based on the following considerations:
i. Objects having an area larger than $Th_A$ times the average area
ii. Objects having disproportionate height and width as compared to normal text
iii. Very small connected components
Here, (i) removes bigger objects based on the threshold $Th_A$, (ii) removes lines, and (iii) takes care of isolated points and small regions acting as noise.
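The three clutter criteria can be sketched as follows. This is an illustrative sketch only: the paper does not give numerical values for the area threshold, does not define "disproportionate", and does not state the minimum area. The defaults th_a=4.0, max_aspect=5.0 and min_area=4, the use of a bounding-box aspect ratio for criterion (ii), the choice of 8-connectivity, and the function names are therefore all assumptions.

```python
import numpy as np
from collections import deque


def connected_components(mask):
    """Return the 8-connected foreground components as lists of (row, col) pixels."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            pixels = []
            while queue:  # breadth-first flood fill from the seed pixel
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and mask[rr, cc] and not seen[rr, cc]):
                            seen[rr, cc] = True
                            queue.append((rr, cc))
            components.append(pixels)
    return components


def remove_clutter(mask, th_a=4.0, max_aspect=5.0, min_area=4):
    """Keep only components that pass criteria (i)-(iii); thresholds are hypothetical."""
    mask = np.asarray(mask, dtype=bool)
    comps = connected_components(mask)
    cleaned = np.zeros_like(mask)
    if not comps:
        return cleaned
    areas = np.array([len(p) for p in comps], dtype=float)
    mean_area = areas.mean()  # T_A, the average component area
    for pixels, area in zip(comps, areas):
        rs, cs = zip(*pixels)
        h = max(rs) - min(rs) + 1  # bounding-box height h_i
        w = max(cs) - min(cs) + 1  # bounding-box width w_i
        too_big = area > th_a * mean_area                    # criterion (i)
        disproportionate = max(h, w) / min(h, w) > max_aspect  # criterion (ii), assumed test
        too_small = area < min_area                          # criterion (iii)
        if not (too_big or disproportionate or too_small):
            cleaned[list(rs), list(cs)] = True
    return cleaned
```

A variant closer to the averages defined above would compare each $h_i$ and $w_i$ against $H$ and $W$ instead of using a fixed aspect-ratio cap. The fixed cap is used here only because it needs no further assumptions about how the averages enter the test.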
Fig. 1. (a) Test image; (b) mask.
Fig. 2. (a) Test image; (b) mask.
Fig. 3. (a) Test image; (b) output image.
IV. RESULTS
To demonstrate the potential of the text detection algorithm, test runs of the code were performed. To validate the capability of the method, some pictures are shown in this paper together with the resulting output. These pictures were downloaded from the internet. The suggested method works efficiently for a variety of images and should, in principle, be able to detect text in all orientations. The experiment window size was 16×16. Different kinds of images were tested, including scanned document images, photographs of signboards and regular text images.
Fig. 1(a) shows a text image. The aim is to identify the text in it so that it can later be inpainted. Results are shown in Fig. 1(b). In Fig. 2(b) and Fig. 3(b), text is detected from a scanned image and a natural signboard image respectively, so that character recognition or content-based retrieval (CBR) can be done later. The proposed technique gives visibly good results on these examples. However, a fair comparison with other relevant approaches is difficult since no specific metric is available for this purpose.
V. CONCLUSION
A simple and efficient technique for text detection in images using morphological operators and the Gabor wavelet is presented in this work. One advantage of the algorithm is that it can efficiently detect text in different orientations in a diverse variety of images. Another advantage is that the method is easy to understand and involves no rigorous mathematics, while the underlying concept is well supported in the literature and has been employed in the past to achieve different objectives. The proposed method is fundamentally sound and does not follow the conventional approaches, which detect only horizontal text. Moreover, by taking the scaling factor into account or changing the window size, text of different font sizes can be recovered. Clustering [1] and/or a connected component approach [2] can further improve the results.
REFERENCES
[1] Mittal, N.; Mital, D.P.; Kap Luk Chan; Features for texture segmentation using Gabor filters; Image Processing and Its Applications, 1999. Seventh International Conference on (Conf. Publ. No. 465), Volume 1, 13-15 July 1999, Page(s): 353-357.
[2] Pati, P.B.; Sabari Raju, S.; Pati, N.; Ramakrishnan, A.G.; Gabor filters for document analysis in Indian bilingual documents; Intelligent Sensing and Information Processing, 2004. Proceedings of International Conference on, 2004, Page(s): 123-126.
[3] Gllavata, J.; Ewerth, R.; Freisleben, B.; A robust algorithm for text detection in images; Image and Signal Processing and Analysis, 2003. ISPA 2003. Proceedings of the 3rd International Symposium on, Volume 2, 18-20 Sept. 2003, Page(s): 611-616.
[4] Lyu, M.R.; Jiqiang Song; Min Cai; A comprehensive method for multilingual video text detection, localization, and extraction; Circuits and Systems for Video Technology, IEEE Transactions on, Volume 15, Issue 2, Feb. 2005, Page(s): 243-255.
[5] Zhang, D.; Wong, A.; Indrawan, M.; Lu, G.; Content-based image retrieval using Gabor texture feature; Proc. First IEEE Pacific-Rim Conference on Multimedia, Sydney, Australia, 2000.
[6] Wu, V.; Manmatha, R.; Riseman, E.M.; Finding text in images; Proceedings of the Second ACM International Conference on Digital Libraries, July 23-26, 1997, Page(s): 3-12.
[7] Dunn, D.; Higgins, W.E.; Optimal Gabor filters for texture segmentation; Image Processing, IEEE Transactions on, Volume 4, Issue 7, July 1995, Page(s): 947-964.
[8] Daugman, J.G.; Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters; Journal of the Optical Society of America, 2(7), 1985, Page(s): 1160-1169.
[9] Gllavata, J.; Ewerth, R.; Freisleben, B.; A text detection, localization and segmentation system for OCR in images; Proceedings of the IEEE Sixth International Symposium on Multimedia Software Engineering (ISMSE'04), 2004.