Text extraction from comic books

Adam GHORBEL(a,b), Jean-Marc OGIER(b), Nicole VINCENT(a)
(a) Paris Descartes University, LIPADE-SIP, Paris, France
(b) La Rochelle University, L3I, La Rochelle, France
[email protected], [email protected], [email protected]
Abstract. Comic books are one of the many forms of storytelling and entertainment around the world. Over the years, comic books have become widely spread, which has encouraged document analysis researchers to study this type of graphic document. In this paper, we propose an automatic model for text extraction from comic books. This model works at two different levels. First, generalized Haar-like features are applied in order to define candidate zones. Then, a connected component technique is applied to retain only the text areas. The proposed model is evaluated on the eBDtheque database and compared with existing state-of-the-art approaches.

Keywords: Haar-like features, comic books, comics text extraction, connected-component labeling.
1
Introduction
Entertainment and storytelling may be considered an art form: they represent artistic work full of imagination and creativity. One of the typical media for storytelling is the comic book. Comics have been present, in various representations, in human culture for many years [1]. Nowadays, thanks to technological advances, comics can also be downloaded onto digital devices such as mobile phones [2][3][4]. Comics have thus become widely present in our lives, and especially in our leisure time. This expansion has encouraged the community of document and graphics researchers to analyze the large number of comic albums. Over the years, different applications have been proposed for graphic documents, such as indexing, the search for specific items, and content analysis (e.g. text localization, speech balloon detection, and frame detection).

In this paper, we propose an automatic technique for text extraction in digitized comic pages. The originality of this technique is its ability to detect all text within the processed pages, whether or not it is contained in a speech balloon. Then, to extract only the text areas, a connected component labeling technique is adopted.

The remainder of the paper is organized as follows. Section 2 gives an overview of text segmentation techniques. Section 3 describes the proposed method. Section 4 presents experimental results. Finally, Section 5 concludes the paper.
2
State-of-the-art
In the literature, comic books are classified into three categories. This classification is based on the cultural particularities of each country and its spoken language [5]. Despite this variability, comic books share almost the same structure. Each page contains a sequence of frames or strips separated by white or mono-color gutters. Pages also contain other content, such as text, speech balloons, and drawings, which can either be contained within a frame or overlap two or more frames [6]. More particularly, text may be written within speech balloons or outside of them (see Figure 1).

As a matter of fact, most state-of-the-art techniques for text detection and localization in comic books rely on the assumptions that "text is part of speech balloons" and "text is written in black in a white speech balloon" [2][7][8][9]. Thus, frame detection and speech balloon detection steps must be achieved before the text extraction step. For instance, in [7], after gathering the different speech balloons, the authors propose a sorting rule to extract all text areas within them. Furthermore, in order not to restrict the application to the assumption "text is written in black in a white speech balloon", [10] proposes an application for frame and text extraction from comic pages in which the text background color should be similar to the page background.

One may also think of text detection in graphics, architectural plans, or other technical drawings. The complexity of the graphics there is even higher, but the number of text words is larger, so specific methods are needed. In the context of our work, we propose an automatic text extraction method for digitized comic books that is not based on any of the previous assumptions. We assume only that text is nearly horizontal.
Fig. 1. Examples of the location of text in comics.
3
Proposed approach
The contribution of our proposed method is its ability to detect and extract text areas directly from comic books, without applying frame and speech balloon detection techniques. Indeed, our method does not rely on assumptions about the localization, color, or style of the text in the processed comics. The originality of this work is the application of Haar-like features to detect text areas, followed by a connected component labeling algorithm to extract these areas.

3.1
Text detection technique
To detect text areas in the processed comics, the idea is to apply Haar-like features, which are easily computed from the integral image [11]. As far as we know, this is the first work that applies Haar-like features to digitized comic books. Generalized Haar-like features were recently proposed and applied to heterogeneous document collections to detect queried text [12]. The idea is that text, in a coarse version of the document, introduces well-contrasted horizontal parts, whatever the colors of the text and the background. Thus, looking at the page, text brings some horizontal contours.

The fundamental principle of this technique is to apply a number of viewpoints, namely Haar-like filters, globally to the processed comic images. They correspond to a global transformation of the processed image that highlights different shapes. These viewpoints are treated independently and their findings are gathered to produce the final result. In this work, we define only two filters, shown in Figure 2b, which allow detecting aligned text areas: either the top part of text or the bottom part of text. This process transforms the RGB image (Figure 2a) into a grey-level image (Figure 2c) in which candidate text zones give a high response. Of course, elements other than text can also give a high response to the filters. The final result specifying the candidate zones is obtained by binarizing the transformed image (Figure 2d). The whole process is shown in Figure 2.

3.2
Text extraction technique
The detection stage generates many candidate zones, painted in red in Figure 2; some of them are false positives that correspond only loosely to text areas because they contain horizontal graphics. Thus, a filtering step based on a connected component labeling algorithm, using 8-connectivity, is applied. First, the bounding boxes associated with the text candidates are extracted from the connected components. Two criteria enable the selection of the bounding boxes containing text areas: the bounding boxes must not be too large, and, since they should be associated with lines or words, they must have a high aspect ratio (width/height). We therefore retain only the bounding boxes whose area is less than the median of all bounding box areas, and the bounding boxes that are too small are deleted thanks to the aspect ratio criterion.
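As an illustration, the extraction stage above can be sketched as follows. The labeling is a straightforward breadth-first flood fill with 8-connectivity; the aspect-ratio threshold is an assumed value for illustration, not a parameter of our system.

```python
from collections import deque
from statistics import median

def label_8connected(mask):
    """Label connected components of a binary mask (list of lists of 0/1)
    using 8-connectivity; return one bounding box (y0, x0, y1, x1) per
    component, with y1 and x1 exclusive."""
    H, W = len(mask), len(mask[0])
    seen = [[False] * W for _ in range(H)]
    boxes = []
    for sy in range(H):
        for sx in range(W):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill of one component.
            q = deque([(sy, sx)])
            seen[sy][sx] = True
            y0 = y1 = sy
            x0 = x1 = sx
            while q:
                y, x = q.popleft()
                y0, y1 = min(y0, y), max(y1, y)
                x0, x1 = min(x0, x), max(x1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
            boxes.append((y0, x0, y1 + 1, x1 + 1))
    return boxes

def filter_text_boxes(boxes, min_aspect=2.0):
    """Keep boxes whose area is below the median of all box areas and whose
    width/height ratio is high, as expected for text lines or words.
    min_aspect is an illustrative value."""
    areas = [(y1 - y0) * (x1 - x0) for y0, x0, y1, x1 in boxes]
    med = median(areas)
    return [b for b, a in zip(boxes, areas)
            if a < med and (b[3] - b[1]) / (b[2] - b[0]) >= min_aspect]

# Toy mask: a thin, wide "text line" among graphic blobs of various sizes.
mask = [[0] * 60 for _ in range(40)]
for y in range(2, 5):              # text-line-like component (area 93)
    for x in range(5, 36):
        mask[y][x] = 1
for y in (7, 8):                   # tiny speck (area 4, aspect 1)
    mask[y][50] = mask[y][51] = 1
for y in range(12, 22):            # medium graphic blob (area 200)
    for x in range(5, 25):
        mask[y][x] = 1
for y in range(25, 40):            # large graphic blob (area 750)
    for x in range(5, 55):
        mask[y][x] = 1

boxes = label_8connected(mask)
kept = filter_text_boxes(boxes)    # only the text-line-like box survives
```

On this toy mask, the median-area criterion discards the two large blobs and the aspect-ratio criterion discards the tiny speck, leaving only the thin, wide component, as intended for text lines.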
Fig. 2. Text detection process in digitized comics.
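The detection process illustrated in Figure 2 can be sketched as follows. This is a minimal illustration: the filter geometry (two stacked 4x12 bands) and the binarization threshold (half the maximum response) are assumed values chosen for the example, not the exact parameters of our system.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table: ii[y, x] = sum of gray[:y+1, :x+1]."""
    return np.cumsum(np.cumsum(gray.astype(np.float64), axis=0), axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of gray[y0:y1, x0:x1] in O(1) using the integral image."""
    total = ii[y1 - 1, x1 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1 - 1]
    if x0 > 0:
        total -= ii[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

def haar_horizontal_response(gray, h=4, w=12):
    """Response of a two-band horizontal Haar-like filter at each position.

    The filter is an h x w band stacked above another h x w band; a strong
    absolute response marks a horizontal contrast edge, such as the top or
    bottom of a text line. The band size here is an illustrative choice.
    """
    ii = integral_image(gray)
    H, W = gray.shape
    resp = np.zeros((H, W))
    for y in range(H - 2 * h):
        for x in range(W - w):
            top = box_sum(ii, y, x, y + h, x + w)
            bot = box_sum(ii, y + h, x, y + 2 * h, x + w)
            resp[y, x] = abs(top - bot)
    return resp

# Toy grey image with one bright horizontal stripe mimicking a text line.
gray = np.zeros((40, 40))
gray[10:14, 5:30] = 255

resp = haar_horizontal_response(gray)
# Binarize the response map to obtain candidate text zones.
candidates = resp > 0.5 * resp.max()
```

The response peaks where a filter band boundary coincides with the top edge (row 10) or bottom edge (row 14) of the stripe, so both the upper and lower contours of a text line are detected, as with the two filters of Figure 2b.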
Having described the proposed approach, we now present the experimental results.
4
Experiments
The proposed text detection and localization approach has been evaluated on the eBDtheque database [13], which is composed of European and American comics. It contains 42 pages from 7 different authors, with 435 speech text areas and 79 narrative text areas. Figure 3 presents results obtained by applying our proposed method. In Figure 3b, we note that the Haar-like filters detect both text areas and non-text areas corresponding to textured regions. By applying the connected component algorithm, we extract only the text areas (Figure 3c). We are thus able to extract all text zones within speech balloons without relying on the assumption that "text is written in black in a white speech balloon".
Fig. 3. Text detection and localization in comics.
Furthermore, Figure 4 shows an example of the results obtained on another comic page, in which some text areas are not written within speech balloons. This is a challenge for those comics text detection and extraction approaches that are unable to detect this type of text. Figure 4b shows the obtained bounding boxes; the small bounding boxes, whose areas are less than those of the text ones, are eliminated, so that the extracted zones correspond more faithfully to the text areas (Figure 4c).
Fig. 4. Another example of text detection and localization in comics.
5
Conclusion
In this paper, we have presented an automatic text detection and localization method for comics. This method is based on applying Haar-like filters to detect text areas and a connected component labeling algorithm to extract them. The originality of this work is that it does not rely on assumptions about the localization, color, or style of the text in the processed comics. Indeed, the method operates globally on the processed comic image, without applying frame or speech balloon detection techniques. Experiments show that this method detects text both inside and outside speech balloons and generates results comparable to the state-of-the-art. The main objective of our future work will be to detect and extract text that is not horizontally aligned in comics.
6
References
1. S. McCloud, "Understanding Comics: The Invisible Art," Harper Collins, (1994).
2. K. Arai, H. Tolle, "Method for real time text extraction of digital manga comic," International Journal of Image Processing (IJIP), pp. 669–676, (2011).
3. Cyb, "Bubblegôm," Studio Cyborga, Goven, France, (2009).
4. Y. In, T. Oie, M. Higuchi, S. Kawasaki, A. Koike, H. Murakami, "Fast frame decomposition and sorting by contour tracing for mobile phone comic images," International Journal of Systems Applications, Engineering and Development, pp. 216–223, (2011).
5. C. Ponsard, V. Fries, "An accessible viewer for digital comic books," In: ICCHP, LNCS 5105, Berlin, Heidelberg, pp. 569–577, (2008).
6. A. K. Ngo Ho, J. C. Burie, J. M. Ogier, "Comics page structure analysis based on automatic panel extraction," In: GREC, Ninth IAPR International Workshop on Graphics Recognition, Seoul, Korea, pp. 15–16, (2011).
7. M. Yamada, R. Budiarto, M. Endo, S. Miyazaki, "Comic image decomposition for reading comics on cellular phones," IEICE Transactions, pp. 1370–1376, (2004).
8. K. Arai, H. Tolle, "Method for automatic e-comic scene frame extraction for reading comic on mobile devices," In: Seventh International Conference on Information Technology: New Generations, Washington, DC, USA, pp. 370–375, (2010).
9. K. Arai, H. Tolle, "Automatic e-comic content adaptation," International Journal of Ubiquitous Computing (IJUVC), (May 2010).
10. C. Rigaud, N. Tsopze, J. C. Burie, J. M. Ogier, "Robust frame and text extraction from comic books," In: GREC, Ninth IAPR International Workshop on Graphics Recognition, Seoul, Korea, pp. 129–138, (2011).
11. P. Viola, M. Jones, "Robust real-time object detection," IJCV, (2001).
12. A. Ghorbel, J. M. Ogier, N. Vincent, "A segmentation free word spotting for handwritten documents," ICDAR 2015, Gammarth, Tunisia, (Aug 2015).
13. C. Guerin, C. Rigaud, A. Mercier, F. Ammar-Boudjelal, K. Bertet, A. Bouju, J. C. Burie, G. Louis, J. M. Ogier, A. Revel, "eBDtheque: a representative database of comics," Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 1145–1149, (2013).