A New Text Detection Algorithm for Content-Oriented Line Drawing Image Retrieval

Zhenyu Zhang, Tong Lu, Feng Su, and Ruoyu Yang

State Key Lab for Novel Software Technology, Nanjing University, China

Abstract. Content retrieval of scanned line drawing images is a difficult problem, especially over real-life large-scale databases. Existing algorithms work inefficiently because they must first recognize the various types of graphical primitives and only then the content-oriented texts. This paper proposes a new method for detecting texts directly from line drawing images. We first decompose a drawing image into a set of Local Consecutive Segments (LCSs). An LCS is defined as a minimum meaningful structural unit that imitates a stroke in the human drawing process. Next, we identify candidate character LCSs by statistical analysis and merge them into character LCS blocks by geometrical analysis. Finally, Hough transforms are applied to calculate the orientations of the character LCS blocks and to generate candidate strings. Experimental results show that our algorithm detects strings in any orientation and is robust to text-graphics touching, scanning degradation and drawing noise, providing an efficient approach for content retrieval of document images.

Keywords: text detection, content-oriented, line drawings, image retrieval

1 Introduction

Image content retrieval has long been a focal problem in multimedia and computer vision, and a number of algorithms have been proposed in recent years [1–6]. These algorithms work well for scene, landscape and other natural images; however, they do not work on the binary line drawing images that appear in a large number of real-life industrial applications. Since text plays a fundamental role in automatic drawing interpretation, we build on our previous work [8, 9, 15–18] and propose a novel text detection approach for content-oriented line drawing image retrieval. Because scanned real-life drawing images always carry unpredictable degradations and noise, and existing vectorization algorithms are too inefficient for content retrieval, text detection remains an unsolved problem. Our method first converts a line drawing image into a structural representation composed of a set of Local Consecutive Segments (LCSs). An LCS represents a minimum meaningful structural unit in the drawing image (see Section 3). Next, we generate initial character clusters by identifying character LCSs through statistical and geometrical analysis. Finally, a Hough transform is applied to the clusters to group them into strings and estimate their orientations. Our method has the following advantages over existing methods:

1. Texts can be detected directly from binary images without first recognizing and removing straight line segments or curves.
2. Texts can be detected robustly in complex environments with text-graphics touching, text degradations or unpredictable noise.
3. The method is font and orientation invariant.

The paper is organized as follows. Section 2 reviews existing text detection algorithms, and Section 3 presents our approach. Experimental results on a number of real-life line drawing images, together with discussions, are given in Section 4. Finally, Section 5 concludes the paper.

2 Related Work

Existing text detection methods can be roughly categorized into two types: vector-level approaches and pixel-level approaches.

Vector-level approaches perform text detection by merging short vectors within a limited area into candidate characters [7], or by recognizing characters after removing all the graphical primitives [8, 9]. These methods rely on geometrical analysis after a vectorization process. However, vectorization is time-consuming and not suitable for current content-oriented retrieval tasks.

Pixel-level methods detect strings directly from raster images. Fletcher et al. [10] first use connected component analysis to segment characters from mixed text/graphics images and then apply Hough transforms to group the collinear characters into text strings; their method is font and orientation invariant. Lai [11] extracts dimension text strings based on a set of heuristic rules. Lu [12] proposes a method that first erases consecutive black pixels along specific orientations and then extracts text regions from the remaining areas; it is font invariant and robust to text-graphics touching. Strouthopoulos et al. [13] employ a SOFM (Self-Organizing Feature Map) to classify candidate characters extracted by a contour following algorithm. Recently, Roy et al. [14] used SIFT for text character localization in graphical documents. However, most of these methods share the assumption that characters seldom touch other graphical primitives and that texts lie along a few given orientations. A new approach is therefore needed to detect texts quickly, especially from complex real-life line drawing images.

3 Our Approach

In this section, we first introduce how to generate LCSs from a given drawing, and then show how to extract candidate characters and gradually integrate them into meaningful strings. Content retrieval can then be performed on the separated strings using any existing OCR algorithm [18]. For brevity, we focus on text detection and do not discuss the OCR process in this paper.

3.1 LCS Generation

We first convert a given drawing image into a set of LCSs. An LCS is defined as a curve segment fitted by neighboring pixels having similar first-order derivatives. It is an independent minimum unit that imitates a stroke in the human drawing process. Fig. 1(a) and Fig. 1(b) show the LCSs generated from the string "PCM 2010, Shang Hai" and from two line segments with an arrow symbol, respectively. Each LCS in Fig. 1 is surrounded by a rectangular box. It can be seen that an LCS is actually made up of a set of adjacent black runlengths and represents a linear component, so the LCS representation is at a higher level than the pixel-level connected component representation (a sketch of the underlying runlength extraction is given below).
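As context for the following steps, here is a minimal Python sketch of how black runlengths might be collected row by row from a binary image. The function name and the run representation are our own illustration, not part of the paper.

```python
import numpy as np

def black_runs(image):
    """Collect black runlengths row by row from a binary image.

    `image` is a 2-D numpy array where 1 marks a black (ink) pixel.
    Each run is returned as (row, col_start, col_end) with an inclusive
    end column, so a run's length is col_end - col_start + 1.
    """
    runs = []
    for row, line in enumerate(image):
        col, n = 0, len(line)
        while col < n:
            if line[col]:                      # start of a black run
                start = col
                while col < n and line[col]:
                    col += 1
                runs.append((row, start, col - 1))
            else:
                col += 1
    return runs

# Tiny example: two runs on the first row, one on the second.
img = np.array([[1, 1, 0, 1],
                [0, 1, 1, 0]])
print(black_runs(img))  # [(0, 0, 1), (0, 3, 3), (1, 1, 2)]
```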

Fig. 1. LCS examples. (a) 69 LCSs of the string "PCM 2010, Shang Hai", and (b) 4 LCSs of two straight line segments with an arrow symbol.

We use a one-pass process to generate the LCS representation from a given drawing image. While sequentially scanning each black run $r$ row by row, we decide whether to create a new LCS for $r$ or to add it to an existing LCS as follows:

1. Search all the existing LCSs connected to $r$;
2. Traverse the connected LCSs and add $r$ to an LCS $S$ if $e(r, S)$ equals 1; otherwise create a new LCS $S_{new}$ for $r$:

$$e(r, S) = \begin{cases} 1 & \text{if } D(r, S) < T \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

Here $D(r, S)$ evaluates the relative deviation from $r$ to $S$, and $T$ is a fixed threshold; in our experiments $T$ adopts the constant value 0.6. Note that if more than one connected LCS satisfies this condition, we add $r$ only to the LCS with the minimum $D(r, S)$, where

$$D(r, S) = \sqrt{\delta_w^2(r, S) + \alpha\, \delta_\theta^2(r, S)} \qquad (2)$$

Here $\alpha$ is a penalty value, adopting 2.0 in our experiments, and

$$\delta_w(r, S) = \frac{l(r) - w(S)}{w(S)} \qquad (3)$$

$$\delta_\theta(r, S) = 1 - \frac{1 + U(r, S)\,V(S)}{\sqrt{(1 + U^2(r, S))(1 + V^2(S))}} \qquad (4)$$

In (3), $\delta_w$ describes the length difference between $r$ and $S$, where $l(r)$ is the length of $r$ and $w(S)$ denotes the mean length of the $n$ black runs in $S$. In (4), $\delta_\theta$ measures the orientation deviation from $r$ to the axis of $S$, where

$$U(r, S) = \frac{hc(r) - \frac{1}{n}\sum_{\gamma \in S} hc(\gamma)}{vc(r) - \frac{1}{n}\sum_{\gamma \in S} vc(\gamma)} \qquad (5)$$

$$V(S) = \frac{1}{n} \sum_{\gamma \in S,\ \gamma \notin S',\ S' \subseteq S} U(\gamma, S') \qquad (6)$$

Here $hc(r)$ denotes the central horizontal coordinate of $r$, $vc(r)$ denotes its central vertical coordinate, and $S'$ is the subset of $S$ just before $\gamma$ was added to $S$. As previously mentioned, the LCS provides a stroke-based structural representation, which makes it possible to extract characters directly. A sketch of this merge criterion follows.
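The merge criterion of Eqs. (1)–(6) can be sketched in Python as follows. This is one possible reading of the formulas: the connectivity search between a run and the existing LCSs is omitted, Eq. (6) is read as the mean of the $U$ values recorded as each run joined, and the class names and the handling of near-vertical offsets are our own assumptions.

```python
import math

ALPHA = 2.0  # penalty alpha in Eq. (2); the paper uses 2.0
T = 0.6      # deviation threshold in Eq. (1); the paper uses 0.6

class Run:
    """A black runlength with its centre coordinates and length."""
    def __init__(self, row, col_start, col_end):
        self.length = col_end - col_start + 1
        self.hc = (col_start + col_end) / 2.0  # central horizontal coordinate
        self.vc = float(row)                   # central vertical coordinate

class LCS:
    """A Local Consecutive Segment: the runs merged into it so far."""
    def __init__(self, run):
        self.runs = [run]
        self.u_history = []  # U(gamma, S') recorded each time a run joined

    def mean_len(self):  # w(S): mean length of the runs in S
        return sum(r.length for r in self.runs) / len(self.runs)

    def u(self, run):  # U(r, S), Eq. (5)
        n = len(self.runs)
        dh = run.hc - sum(r.hc for r in self.runs) / n
        dv = run.vc - sum(r.vc for r in self.runs) / n
        return dh / dv if dv else 1e9  # zero vertical offset: huge slope

    def v(self):  # V(S), Eq. (6): mean of the recorded U values
        return sum(self.u_history) / len(self.u_history) if self.u_history else 0.0

    def deviation(self, run):  # D(r, S), Eq. (2)
        w = self.mean_len()
        dw = (run.length - w) / w                       # Eq. (3)
        u, v = self.u(run), self.v()
        dth = 1.0 - (1.0 + u * v) / math.sqrt((1.0 + u * u) * (1.0 + v * v))  # Eq. (4)
        return math.sqrt(dw * dw + ALPHA * dth * dth)

    def add(self, run):
        self.u_history.append(self.u(run))
        self.runs.append(run)

def assign(run, connected_lcss):
    """Eq. (1): merge run into the best connected LCS or start a new one."""
    best = min(connected_lcss, key=lambda s: s.deviation(run), default=None)
    if best is not None and best.deviation(run) < T:
        best.add(run)
        return best
    return LCS(run)
```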

3.2 Character LCS Extraction

We now introduce how to distinguish character LCSs (c-LCSs) from those of graphical primitives (g-LCSs, e.g., straight line segments or curves) and from noise (n-LCSs). By analyzing a number of real-life drawings, we find the following differences among c-LCSs, g-LCSs and n-LCSs helpful:

1. Size: the average size of c-LCSs is usually smaller than that of g-LCSs;
2. Amount: there are often fewer g-LCSs than c-LCSs in the same drawing;
3. Orientation: c-LCSs usually form clusters along a certain orientation to represent one or more strings, since the gaps between characters are relatively small [13];
4. Distribution: c-LCSs occupy relatively limited regions of a line drawing image, whereas g-LCSs are usually distributed all over it.

Motivated by these differences, we first measure the size factor as follows:

$$Size(S) = \begin{cases} \max(h(S), w(S)) & \text{if } R_w / w(S) < T_R \\ \sqrt{R_h^2 + R_w^2} & \text{otherwise} \end{cases} \qquad (7)$$

where $R$ is the bounding box of $S$, and $R_h$ and $R_w$ denote its height and width, respectively. $h(S)$ is the number of distinct rows in $S$, and $T_R$ is used to evaluate $Size(S)$ more accurately since LCSs may lie along any orientation in a drawing. In our experiments, $T_R$ adopts the constant value 2.0.

Then, we identify each c-LCS Ccomponent (S) from the LCSs as follows: { 1 if Tn < Size(S) < Tl Ccomponent (S) = 0 otherwise

(8)

where Tn and Tl are a pair of filter thresholds to extract c-LCSs (Tn is the lower bound and Tl is the upper limit). Tn and Tl can be evaluated from the SizeNumber distribution histogram of the LCSs. Take the distribution in Fig. 2 as an example. We first search for a feature point a, the first valley after a sharply descending distribution curve. Actually, most LCSs on the left of are n-LCSs with small Size(S) values but large numbers. Then we search for another feature point c, after which the curve tends to be relatively plain. Most LCSs on the right of c are g-LCSs with large Size(S) values but small numbers according to previous analysis. Finally, the LCSs between a and c are considered as candidate c-LCSs. Therefore, Tn and Tl are actually determined by a and c, respectively. Note that there may be one or more peaks between a and c, depending on character sizes in the drawing.
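The following sketch illustrates one way the feature points $a$ and $c$ could be located on the Size-Number histogram. The paper describes this step only qualitatively, so the valley and flatness tests and their parameters are our assumptions.

```python
import numpy as np

def size_thresholds(sizes, bin_width=2, flat_eps=2):
    """Pick (Tn, Tl) from the Size-Number histogram, as in Fig. 2.

    Tn is taken at the first valley after the descending head of the
    curve (point a); Tl at the first bin after which the curve stays
    nearly flat (point c). `bin_width` and `flat_eps` are illustrative
    parameters, not values from the paper.
    """
    bins = np.arange(0, max(sizes) + 2 * bin_width, bin_width)
    hist, edges = np.histogram(sizes, bins=bins)
    # point a: first local minimum of the histogram
    a = next((i for i in range(1, len(hist) - 1)
              if hist[i] <= hist[i - 1] and hist[i] <= hist[i + 1]), 1)
    # point c: first bin after a where neighbouring counts barely change
    c = next((i for i in range(a + 1, len(hist) - 1)
              if abs(int(hist[i]) - int(hist[i + 1])) <= flat_eps),
             len(hist) - 1)
    return float(edges[a]), float(edges[c])

# Usage on a list of Size(S) values, one per LCS:
# Tn, Tl = size_thresholds(all_sizes)
```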

Fig. 2. Size-Number distribution of LCSs.

We then merge all adjacently connected c-LCSs into corresponding c-LCS blocks. Note that touching runlengths and long curves are removed together with the n-LCSs and g-LCSs, so separated characters are left intact and touching characters are separated into single c-LCS blocks. For the graphical components still remaining among the c-LCSs, we further apply existing geometrical restrictions such as block aspect ratio or pixel density [10–12] to reject them (a sketch of such a test follows).
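A minimal sketch of such a geometrical rejection test. The threshold values are illustrative, since the paper reuses the restrictions of [10–12] without quoting concrete numbers, and the `Block` type is our own stand-in for a merged c-LCS block.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Bounding-box statistics of a merged c-LCS block (illustrative)."""
    width: int
    height: int
    black_pixels: int

def is_character_block(block, max_aspect=10.0, min_density=0.05, max_density=0.9):
    """Reject blocks unlikely to be characters by aspect ratio and ink density."""
    aspect = max(block.width, block.height) / max(1, min(block.width, block.height))
    density = block.black_pixels / float(block.width * block.height)  # ink coverage
    return aspect <= max_aspect and min_density <= density <= max_density

print(is_character_block(Block(12, 16, 60)))    # compact, ink-filled: True
print(is_character_block(Block(400, 3, 1200)))  # long thin line: False
```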

3.3 Text String Generation

Finally, we group the remaining c-LCS blocks into result strings as follows:

1. Merge all neighboring c-LCS blocks into a single connected cluster. Two blocks are regarded as neighboring only when the distance between their bounding box centers is less than an adaptive threshold $T_d$. In our experiments, $T_d$ is calculated by

$$T_d = 0.5 \times \max(\bar{h}, \bar{w}) \qquad (9)$$

where $\bar{h}$ and $\bar{w}$ denote the average height and width of the block bounding boxes;

2. Recursively apply Hough transforms to each connected cluster, and merge the c-LCS blocks having the same parameters in the Hough domain into the same collinear group. Each collinear group represents a candidate result string on the same line, with all its internal c-LCSs neighboring or connected to each other;

3. Evaluate the orientation of each collinear group by minimizing the standard deviation of the parameters in the Hough domain, and take this orientation as the baseline of the result string.

As a result, we obtain separated candidate strings, each with a single calculated baseline. A sketch of the collinear grouping step is given below.
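The collinear grouping of step 2 can be sketched as a point-based Hough vote over block centers. The accumulator resolution and the greedy assignment of blocks to their largest cell are our assumptions, not details from the paper.

```python
import math
from collections import defaultdict

def collinear_groups(centers, n_angles=180, rho_step=5.0):
    """Group block centres that fall on a common line in the Hough domain.

    `centers` is a list of (x, y) bounding-box centres from one connected
    cluster. Each centre votes for every quantized angle; centres sharing
    a (theta, rho) cell are collinear. Resolution parameters are
    illustrative.
    """
    votes = defaultdict(set)
    for i, (x, y) in enumerate(centers):
        for a in range(n_angles):
            theta = math.pi * a / n_angles
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(a, round(rho / rho_step))].add(i)
    # visit the largest cells first; each centre joins only one group
    groups, used = [], set()
    for (a, _), members in sorted(votes.items(), key=lambda kv: -len(kv[1])):
        fresh = members - used
        if len(fresh) >= 2:
            groups.append((math.pi * a / n_angles, sorted(fresh)))
            used |= fresh
    return groups  # list of (baseline angle in radians, member indices)

# Three collinear centres and one outlier:
print(collinear_groups([(0, 0), (10, 0), (20, 0), (5, 40)]))
```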

4 Experimental Results and Discussions

We test our method on a number of real-life line drawing images, taking Fig. 3 as an example to show the steps involved in our approach. Fig. 3 is part of a real-life drawing with a resolution of 576 × 484.

Fig. 3. Part of an example line drawing image.

We first extract LCSs from Fig. 3. Fig. 4(a) gives the results, with each LCS surrounded by a red bounding box.

Fig. 4. (a) LCS analysis, and (b) c-LCS identification.

Then we identify c-LCS blocks using the method introduced in Section 3; very small noisy black runlengths and long graphical ones are discarded in this step. Fig. 4(b) shows the results, where each bounding box indicates a c-LCS block. Finally, the collinear grouping algorithm is performed on Fig. 4(b). After a rough refinement process, we obtain the final result shown in Fig. 5, where the target strings in arbitrary orientations have been detected. Note that the string "71(T2.B2)", which touches a straight line segment, is also correctly identified.

Fig. 5. Result of text detection from Fig. 3.

Fig. 6(a) shows another, more complicated drawing image with a resolution of 3189 × 1744 and more noise and degradations. Fig. 6(b) illustrates the final detection results, where nearly 92% of the characters have been correctly detected and grouped into strings. Some g-LCSs are not discarded by our method because their sizes are similar to those of c-LCSs; however, OCR algorithms can be applied in the subsequent refinement processes to recognize the target strings. In our experiments, we collected 102 real-life line drawing images from Nanjing and Hong Kong, scanned at different resolutions.

Fig. 6. Another complicated drawing. (a) The original image; (b) text detection results.

Table 1 lists our experimental results, where RRC (Recall Rate of Characters) and FAR (False Alarm Rate) are defined as follows:

$$RRC = \frac{\text{extracted characters}}{\text{ground-truth characters}} \times 100\% \qquad (10)$$

$$FAR = \frac{\text{extracted non-character blocks}}{\text{extracted characters} + \text{extracted non-character blocks}} \times 100\% \qquad (11)$$

A small helper for computing these rates is sketched below.
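For completeness, a direct transcription of Eqs. (10) and (11). The counts in the usage line are hypothetical numbers chosen to be consistent with Group 1 of Table 1, not values from the paper.

```python
def rrc(extracted_chars, ground_truth_chars):
    """Recall Rate of Characters, Eq. (10), in percent."""
    return 100.0 * extracted_chars / ground_truth_chars

def far(extracted_chars, non_char_blocks):
    """False Alarm Rate, Eq. (11), in percent."""
    return 100.0 * non_char_blocks / (extracted_chars + non_char_blocks)

# Hypothetical counts reproducing Group 1 of Table 1 (RRC 89%, FAR 17%):
print(round(rrc(4535, 5096)), round(far(4535, 929)))  # -> 89 17
```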

Table 1. Character extraction results

Image Group   Image Resolution   Ground-truth Characters   RRC   FAR
Group 1       7400 × 11510       5096                      89%   17%
Group 2       5080 × 7930        6547                      95%   20%
Group 3       13020 × 17290      2249                      96%   19%
Group 4       12540 × 17340      2559                      92%   18%
Average                          4112                      93%   19%

Degradations such as touching or crossing with other non-character graphics are the most difficult cases to recognize. We test our algorithm on a number of scanned real-life drawing images; the results are shown in Table 2. On average, 84% of the degraded characters are successfully detected.

Table 2. Detection results for degraded images

Image Group   Image Resolution   Degraded Characters   Successfully Detected (rate %)
Group 1       7400 × 11510       1299                  81%
Group 2       5080 × 7930        1550                  87%
Group 3       13020 × 17290      545                   88%
Group 4       12540 × 17340      778                   81%
Average                          1043                  84%

5 Discussions and Conclusions

We have proposed a new approach for detecting texts directly from line drawing images. Experimental results show that our method deals well with degradations, noise and graphics touching. Detected texts can be in any orientation, and no line detection or removal process is needed, which makes content-oriented retrieval of line drawing images possible. Further work includes: 1) string reconstruction in line-string crossing situations, and 2) character refinement during the OCR process.

Acknowledgments. The work described in this paper was supported by the Natural Science Foundation of China under Grant No. 60603086, the 973 Program of China under Grant No. 2010CB327903, and the Natural Science Foundation of Jiangsu under Grant No. BK2009082.

References

1. Zhang, C., Chai, J.Y., Jin, R.: User term feedback in interactive text-based image retrieval. In: SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 51–58. ACM, New York, NY, USA (2005)
2. Zhao, R., Grosky, W.I.: Narrowing the semantic gap - improved text-based web document retrieval using visual features. IEEE Transactions on Multimedia 4, 189–200 (2002)
3. Stehling, R.O., Falcão, A.X., Nascimento, M.A.: An adaptive and efficient clustering-based approach for content-based image retrieval in image databases. In: International Database Engineering and Applications Symposium, p. 356 (2001)
4. Heczko, M., Hinneburg, A., Keim, D., Wawryniuk, M.: Multiresolution similarity search in image databases. Multimedia Systems 10, 28–40 (2004)
5. Cho, S.B., Lee, J.Y.: A human-oriented image retrieval system using interactive genetic algorithm. IEEE Transactions on Systems, Man and Cybernetics, Part A 32(3), 452–458 (2002)
6. Xu, X., Zhang, L., Yu, Z., Zhou, C.: Image retrieval using multi-granularity color features. In: ICALIP 2008: International Conference on Audio, Language and Image Processing, pp. 1584–1589. Shanghai (2008)
7. Dori, D., Wenyin, L.: Vector-based segmentation of text connected to graphics in engineering drawings. In: SSPR '96: Proceedings of the 6th International Workshop on Advances in Structural and Syntactical Pattern Recognition, pp. 322–331. Springer-Verlag, London, UK (1996)
8. Su, F., Lu, T., Cai, S., Yang, R.: A character segmentation method for engineering drawings based on holistic and contextual constraints. In: GREC '09: Proceedings of the 8th IAPR International Workshop on Graphics Recognition, pp. 280–290. France (2009)
9. Song, J., Su, F., Tai, C.L., Cai, S.: An object-oriented progressive-simplification-based vectorization system for engineering drawings: model, algorithm, and performance. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(8), 1048–1060 (2002)
10. Fletcher, L.A., Kasturi, R.: A robust algorithm for text string separation from mixed text/graphics images. IEEE Transactions on Pattern Analysis and Machine Intelligence 10(6), 910–918 (1988)
11. Lai, C.P., Kasturi, R.: Detection of dimension sets in engineering drawings. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(8), 848–855 (1994)
12. Lu, Z.: Detection of text regions from digital engineering drawings. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(4), 431–439 (1998)
13. Strouthopoulos, C., Nikolaidis, A.: A robust technique for text extraction in mixed-type binary documents. In: ICPR 2008: 19th International Conference on Pattern Recognition, pp. 1–4. Tampa, FL (2008)
14. Roy, P.P., Pal, U., Lladós, J.: Touching text character localization in graphical documents using SIFT. In: GREC '09: Proceedings of the 8th IAPR International Workshop on Graphics Recognition, pp. 271–279. France (2009)
15. Lu, T., Tai, C.L., Yang, H., Cai, S.: A novel knowledge-based system for interpreting complex engineering drawings: theory, representation, and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(8), 1444–1457 (2009)
16. Lu, T., Yang, H., Yang, R., Cai, S.: Automatic analysis and integration of architectural drawings. International Journal on Document Analysis and Recognition 9(1), 31–47 (2007)
17. Lu, T., Tai, C.L., Su, F., Cai, S.: A new recognition model for electronic architectural drawings. Computer-Aided Design 37(10), 1053–1069 (2005)
18. Song, J., Li, Z., Lyu, M.R., Cai, S.: Recognition of merged characters based on forepart prediction, necessity-sufficiency matching, and character-adaptive masking. IEEE Transactions on Systems, Man, and Cybernetics, Part B 35(1), 2–11 (2005)
