Thresholding Video Images for Text Detection

Eliza Yingzi Du and Chein-I Chang
Remote Sensing Signal and Image Processing Laboratory, Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250

Paul D. Thouin
Department of Defense, Fort Meade, MD 20755

Abstract

Thresholding video images is very challenging because video image backgrounds generally have low resolution and are more complicated and highly distorted than those of document images. As a result, thresholding methods that work well for document images may not work effectively for video images in some applications. This paper investigates the issue of thresholding video images for text detection and develops a relative entropy-based thresholding approach that can effectively extract text from complicated video images. To demonstrate its performance, a comparative study is conducted between the proposed thresholding method and several thresholding techniques widely used for document and gray-scale images. The experimental results show that thresholding video images is far more difficult than thresholding document images and that simple histogram-based methods generally do not perform well.

1. Introduction

Information retrieval from video images has become an increasingly important research area in recent years. The rapid growth of digitized video collections is due to the widespread use of digital cameras and video recorders combined with inexpensive disk storage technology. Textual information contained in video images can provide one of the most useful keys for successful information indexing and retrieval. Keyword searches for scene text of interest within video images can provide additional capabilities to search engines. In video images, text characters generally have much lower resolution and dimmer intensity than binary document characters. In addition, video text characters may have various colors, sizes, styles, and orientations within the same image. Furthermore, the video background is generally much more complex than that of document images. The combination of this complex background and a large variety of low-quality characters causes thresholding algorithms designed for document image processing to perform poorly on video images. In this paper, we investigate this issue and develop a relative entropy-based approach, referred to as joint relative entropy (JRE), for thresholding video images. It uses relative entropy, also known as the Kullback-Leibler information distance, to measure the uncertainty resulting from gray-level transitions between background and foreground, where the co-occurrence matrix is used to account for spatial correlation. As demonstrated by the experiments, the proposed JRE performs significantly better on video images than thresholding methods developed for gray-scale document images. In particular, the two commonly used objective criteria for thresholding, the uniformity and shape measures, are shown to be inappropriate for video image thresholding. In fact, the method that yields the best text detection results generally has very low uniformity and shape values, which are supposed to be very high for gray-scale document images.
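The Kullback-Leibler information distance underlying this approach can be illustrated with a small self-contained Python sketch (illustrative only, not the authors' implementation):

```python
import math

def kl_distance(p, q):
    """Kullback-Leibler information distance D(p || q) between two discrete
    probability distributions. Terms with p[i] == 0 contribute nothing;
    q[i] must be positive wherever p[i] is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# The distance is zero when the distributions agree and grows as they diverge.
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
print(kl_distance(p, p))  # 0.0
print(kl_distance(p, q))  # positive
```

In the JRE approach this distance is computed not between gray-level histograms but between distributions of gray-level transitions, which is what captures spatial correlation.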

2. Relative Entropy-based Thresholding Method

Entropy-based and relative entropy-based thresholding have received considerable interest recently [1,2]. Of particular interest is the relative entropy approach developed by Chang et al. in [2]. Let $W = [p_{ij}]_{L\times L}$ be the co-occurrence matrix whose $(i,j)$-th element is the transition probability $p_{ij}$ from gray level $i$ to gray level $j$. Let $t$ be a threshold used to threshold an image. It partitions the co-occurrence matrix into four quadrants, namely A, B, C, and D, with probabilities given by

$$P_A^t = \sum_{i=0}^{t}\sum_{j=0}^{t} p_{ij}, \qquad P_B^t = \sum_{i=0}^{t}\sum_{j=t+1}^{L-1} p_{ij},$$
$$P_C^t = \sum_{i=t+1}^{L-1}\sum_{j=t+1}^{L-1} p_{ij}, \qquad P_D^t = \sum_{i=t+1}^{L-1}\sum_{j=0}^{t} p_{ij}. \qquad (1)$$

The probabilities in each quadrant can be further obtained by normalization, called "cell probabilities":

$$p_{ij|A}^t = p_{ij}/P_A^t, \quad p_{ij|B}^t = p_{ij}/P_B^t, \quad p_{ij|C}^t = p_{ij}/P_C^t, \quad p_{ij|D}^t = p_{ij}/P_D^t. \qquad (2)$$

Now assume that $t$ is a selected threshold value, and let $h_{ij}^t$ be the transition probability of the $t$-thresholded binary image corresponding to $p_{ij}$. The cell probabilities of the thresholded image in each of the four quadrants are given by

1051-4651/02 $17.00 (c) 2002 IEEE

$$h_{ij|A}^t = q_A^t = \frac{P_A^t}{(t+1)(t+1)}, \qquad h_{ij|B}^t = q_B^t = \frac{P_B^t}{(t+1)(L-t-1)},$$
$$h_{ij|C}^t = q_C^t = \frac{P_C^t}{(L-t-1)(L-t-1)}, \qquad h_{ij|D}^t = q_D^t = \frac{P_D^t}{(L-t-1)(t+1)}.$$

The relative entropy between the probability distributions $\{p_{ij}\}_{i=0,j=0}^{L-1,L-1}$ and $\{h_{ij}^t\}$ is then defined by

$$J(\{p_{ij}\};\{h_{ij}^t\}) = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1} p_{ij}\log\bigl(p_{ij}/h_{ij}^t\bigr) = -H(\{p_{ij}\}) - \sum_{i,j} p_{ij}\log h_{ij}^t$$
$$= -H(\{p_{ij}\}) - \bigl(P_A^t\log q_A^t + P_B^t\log q_B^t + P_C^t\log q_C^t + P_D^t\log q_D^t\bigr), \qquad (3)$$

where $H(\{p_{ij}\})$ is the entropy of $\{p_{ij}\}_{i=0,j=0}^{L-1,L-1}$ and is independent of $t$. The thresholding algorithm developed by Chang et al. in [2] finds a threshold value that minimizes

$$H_{RE}(t) = -P_A^t\log q_A^t - P_B^t\log q_B^t - P_C^t\log q_C^t - P_D^t\log q_D^t, \qquad (4)$$

that is, $t_{RE} = \arg\min_{t\in G} H_{RE}(t)$.

In Eq. (4), the optimal threshold minimizes the relative entropy over all four quadrants, A, B, C, and D. However, since we are interested in text detection, a more logical approach is to focus on the gray-level transitions between text and background, which occur in quadrants B and D. In this case, we introduce the normalization $p_{ij|BD}^t = p_{ij}/(P_B^t + P_D^t)$ to normalize the probabilities in quadrants B and D, and similarly $h_{ij|BD}^t = h_{ij}^t/(P_B^t + P_D^t)$. The resulting relative entropy is referred to as the joint relative entropy (JRE), given by

$$J_{JRE}(\{p_{ij|BD}^t\};\{h_{ij|BD}^t\}) = \sum_{(i,j)\in B\cup D} p_{ij|BD}^t \log\bigl(p_{ij|BD}^t/h_{ij|BD}^t\bigr)$$
$$= -H_{BD}(t) - \frac{P_B^t}{P_B^t+P_D^t}\log\frac{q_B^t}{P_B^t+P_D^t} - \frac{P_D^t}{P_B^t+P_D^t}\log\frac{q_D^t}{P_B^t+P_D^t}, \qquad (5)$$

where $H_{BD}(t) = -\sum_{(i,j)\in B\cup D} p_{ij|BD}^t \log p_{ij|BD}^t$ is the entropy of quadrants B and D. So the JRE approach finds a threshold $t_{JRE}$ that minimizes $J_{JRE}(\{p_{ij|BD}^t\};\{h_{ij|BD}^t\})$, that is, $t_{JRE} = \arg\min_{t\in G} J_{JRE}(\{p_{ij|BD}^t\};\{h_{ij|BD}^t\})$, as opposed to Chang et al.'s algorithm, which will be referred to as global relative entropy (GRE) in this paper.

3. Objective Measures

In order to avoid human interpretation, two objective measures, uniformity and shape, suggested in [3], have been used to evaluate thresholding performance.

3.1. Uniformity Measure

The uniformity measure is generally used to describe the homogeneity of regions in an image. For a given threshold $t$, it is defined by

$$U(t) = 1 - \frac{\sigma_B^2 + \sigma_F^2}{C}$$

with $C = (f_{\max} - f_{\min})^2/2$,

$$\mu_B = \frac{1}{N_B}\sum_{(x,y)\in B} f(x,y), \qquad \mu_F = \frac{1}{N_F}\sum_{(x,y)\in F} f(x,y),$$
$$\sigma_B^2 = \frac{1}{N_B}\sum_{(x,y)\in B}\bigl(f(x,y)-\mu_B\bigr)^2, \qquad \sigma_F^2 = \frac{1}{N_F}\sum_{(x,y)\in F}\bigl(f(x,y)-\mu_F\bigr)^2,$$

where $N_B = |B|$ and $N_F = |F|$ are the numbers of pixels in the background and foreground regions, respectively.

3.2. Shape Measure

The shape measure is generally used to measure the shapes of objects in an image. It is defined by

$$S_F(t) = \Bigl[\sum_{(x,y)\in F} \mathrm{sign}\bigl(f(x,y)-\bar{f}_N(x,y)\bigr)\,\Delta(x,y)\,\mathrm{sign}\bigl(f(x,y)-t\bigr)\Bigr]\Big/\, C_F, \qquad (6)$$

where $C_F$ is a normalization constant, $\mathrm{sign}(x)$ is the sign function of $x$, $\bar{f}_N(x,y)$ is the average gray level over a neighborhood of $(x,y)$, and

$$\Delta(x,y) = \Bigl[\sum_{k=1}^{4} D_k^2 + \sqrt{2}\,D_1(D_3+D_4) - \sqrt{2}\,D_2(D_3-D_4)\Bigr]^{1/2}$$

with
$$D_1 = f(x+1,y)-f(x-1,y), \qquad D_2 = f(x,y-1)-f(x,y+1),$$
$$D_3 = f(x+1,y+1)-f(x-1,y-1), \qquad D_4 = f(x+1,y-1)-f(x-1,y+1).$$

4. Experiments

Two sets of data were used in the experiments: synthetic images and real video images. Two commonly used thresholding methods, the maximum entropy (ME) method of Pun and Kapur et al. [4,5] and Otsu's method [6], were used for comparison. Also included for comparison were the joint entropy (JE) and local entropy (LE) methods developed by Pal and Pal in [1], from which Chang et al.'s algorithm and JRE are derived.

4.1. Synthetic Images

Two 256 × 256 synthetic images, shown in Figs. 1(a) and 2(a), were created to simulate two scenarios: a bright square located in the center of a simple background, and a light square located at the bottom of a complicated background. Both synthetic images have the identical histogram shown in Figs. 1(b) and 2(b) but very different spatial correlation, which can be visualized in the 3-D representations in Figs. 1(c) and 2(c), where the (x,y)-axes are the spatial coordinates of the image pixels and the z-axis gives the corresponding gray-level values. In Fig. 1, the foreground and background are two well-separated Gaussian distributions, so any threshold value between 164 and 206 separates them, whereas in Fig. 2 the background was simulated using gray levels from 0 to 229. In the latter case, the background was simulated in such a way that it gradually moves toward the foreground with a smooth transition, as shown in Fig. 2(c). The uniformity and shape values are plotted in Figs. 1(d-e) and 2(d-e). Since ME and Otsu's method are based solely on the image histograms shown in Figs. 1(b) and 2(b), the threshold values produced by these methods for the images in Figs. 1(a) and 2(a) were identical; their respective results are shown in Figs. 1(f-g) and 2(f-g). Figs. 1(h)-1(k) and 2(h)-2(k) show the results produced by LE, JE, GRE, and JRE, respectively. Since the image background in Fig. 1(a) is well separated from the foreground, as shown in Fig. 1(c), we could expect Otsu's method to perform very well, which was indeed the case. According to Fig. 1, the best performance was produced by Otsu's method and JRE. Due to the gradual transition from background to foreground shown in Fig. 2(c), ME, LE, and GRE completely missed the square, while Otsu's method and JE had difficulty with thresholding and could only extract half of the square. JRE was the only method able to segment the full square out of the background. In Table 1, we tabulate the threshold, uniformity, and shape values generated by ME, Otsu's method, LE, JE, GRE, and JRE for both synthetic images, where JRE produced the lowest uniformity and shape values for synthetic image 2. This contradicts the general understanding in image thresholding that a good threshold value should also produce high values of uniformity and shape. This interesting experiment demonstrates that thresholding video images is very different from thresholding gray-scale images such as document images.
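For concreteness, the JRE computation of Section 2 can be sketched in NumPy. This is our illustrative reading of Eqs. (1)-(5), not the authors' code: the co-occurrence matrix here counts horizontal right-neighbor transitions (one common convention), and the function and variable names are ours.

```python
import numpy as np

def cooccurrence(img, L=256):
    """Gray-level co-occurrence matrix p_ij built from horizontal
    right-neighbor transitions, normalized so its entries sum to 1.
    img must hold integer gray levels in [0, L)."""
    W = np.zeros((L, L))
    np.add.at(W, (img[:, :-1].ravel(), img[:, 1:].ravel()), 1)
    return W / W.sum()

def jre_threshold(img, L=256):
    """Exhaustive search for the threshold t minimizing the joint relative
    entropy (Eq. (5)) over quadrants B and D of the co-occurrence matrix."""
    p = cooccurrence(img, L)
    eps = 1e-12
    best_t, best_j = None, np.inf
    for t in range(L - 1):
        PB = p[:t + 1, t + 1:].sum()   # background-to-foreground transitions
        PD = p[t + 1:, :t + 1].sum()   # foreground-to-background transitions
        PBD = PB + PD
        if PBD < eps:
            continue
        qB = PB / ((t + 1) * (L - t - 1))   # uniform cell probability in B
        qD = PD / ((L - t - 1) * (t + 1))   # uniform cell probability in D
        # Entropy H_BD(t) of the normalized quadrant-B/D distribution.
        pBD = np.concatenate([p[:t + 1, t + 1:].ravel(),
                              p[t + 1:, :t + 1].ravel()]) / PBD
        pBD = pBD[pBD > eps]
        H = -np.sum(pBD * np.log(pBD))
        # J_JRE = -H_BD(t) minus the cross terms of Eq. (5).
        J = (-H - (PB / PBD) * np.log(qB / PBD + eps)
                 - (PD / PBD) * np.log(qD / PBD + eps))
        if J < best_j:
            best_j, best_t = J, t
    return best_t

# Demo: a bright square on a flat dark background.
img = np.full((64, 64), 50, dtype=int)
img[16:48, 16:48] = 200
t = jre_threshold(img)
print(50 <= t < 200)  # True: the threshold separates the two gray levels
```

On real video images the co-occurrence matrix would be accumulated over more neighbor directions, but the quadrant bookkeeping is unchanged.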

4.2. Video Images

Fig. 3 shows the experimental results for a video image that has a single column of text in the center of the image. As we can see from Fig. 3(a), the image background is very complicated and contains various low-resolution objects. For text detection, the best performance was produced by JRE, which completely extracted the text in the video image. As in the experiments with synthetic image 2, JRE also produced the lowest uniformity and shape values in Table 1. Interestingly, except for GRE, the other entropy-based thresholding methods performed reasonably well by extracting the text along with some portion of the background.
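The uniformity values reported in Table 1 follow the definition in Section 3.1. A minimal sketch, under our assumptions that the per-region variances are mean squared deviations and that the normalization constant is C = (f_max − f_min)²/2:

```python
import numpy as np

def uniformity(img, t):
    """Uniformity measure U(t) = 1 - (var_B + var_F) / C, where B is the
    background (gray level <= t), F the foreground (> t), the variances are
    mean squared deviations, and C = (f_max - f_min)^2 / 2 keeps U in [0, 1]."""
    img = np.asarray(img, dtype=float)
    B, F = img[img <= t], img[img > t]
    var_B = B.var() if B.size else 0.0
    var_F = F.var() if F.size else 0.0
    C = (img.max() - img.min()) ** 2 / 2
    return 1.0 - (var_B + var_F) / C

# A threshold that splits two constant regions perfectly yields U(t) = 1,
# consistent with Otsu's method scoring 1 on both synthetic images in Table 1.
img = np.array([[10, 10, 200, 200],
                [10, 10, 200, 200]])
print(uniformity(img, 100))  # 1.0
```

The point of Table 1 is precisely that a high U(t) does not imply good text detection on video images: the JRE thresholds score lowest on this measure yet extract the text best.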

5. Conclusion

This paper develops a relative entropy-based thresholding method, joint relative entropy (JRE), for text detection in video images. It shows that thresholding techniques commonly used for document and gray-scale images may not work well due to the complicated backgrounds of video images. It also demonstrates that the uniformity and shape measures are not appropriate criteria for evaluating video image thresholding. Although only one real-image experiment is presented in this paper, experiments were also conducted on many other video images. According to our results, the proposed JRE works effectively in many cases of text detection.

Acknowledgment

The authors would like to thank the Department of Defense for supporting this work through contract number MDA904-00-C2120.

References

1. N. R. Pal and S. K. Pal, "Entropic thresholding," Signal Processing, Vol. 16, pp. 97-108, 1989.
2. C.-I Chang, K. Chen, J. Wang, and M. L. G. Althouse, "A relative entropy-based approach to image thresholding," Pattern Recognition, Vol. 27, No. 9, pp. 1275-1289, 1994.
3. P. K. Sahoo, S. Soltani, and A. K. C. Wong, "A survey of thresholding techniques," Computer Vision, Graphics, and Image Processing, Vol. 41, pp. 233-260, 1988.
4. T. Pun, "Entropic thresholding: A new approach," Computer Graphics and Image Processing, Vol. 16, pp. 210-239, 1981.
5. J. N. Kapur, P. K. Sahoo, and A. K. C. Wong, "A new method for gray-level picture thresholding using the entropy of the histogram," Computer Vision, Graphics, and Image Processing, Vol. 29, pp. 273-285, 1985.
6. N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Systems, Man, and Cybernetics, Vol. SMC-9, No. 1, pp. 62-66, 1979.

Figure 1. Thresholding results produced by ME, Otsu's method, LE, JE, GRE and JRE: (a) synthetic image, (b) histogram, (c) 3-D display, (d) uniformity, (e) shape, (f) ME (t=115), (g) Otsu (t=161), (h) LE (t=171), (i) JE (t=149), (j) GRE (t=108), (k) JRE (t=210).

Figure 2. Thresholding results produced by ME, Otsu's method, LE, JE, GRE and JRE: (a) synthetic image, (b) histogram, (c) 3-D display, (d) uniformity, (e) shape, (f) ME (t=115), (g) Otsu (t=161), (h) LE (t=118), (i) JE (t=154), (j) GRE (t=126), (k) JRE (t=228).

Figure 3. Thresholding results produced by ME, Otsu's method, LE, JE, GRE and JRE: (a) video image, (b) ME (t=172), (c) Otsu (t=99), (d) LE (t=156), (e) JE (t=177), (f) GRE (t=87), (g) JRE (t=205).

         Threshold value            Uniformity                  Shape
         Image 1  Image 2  Video    Image 1  Image 2  Video     Image 1  Image 2  Video
ME       115      115      172      0.719    0.719    0.4564    0.9647   0.6411   0.7307
Otsu     161      161      69       1        1        1         0.5644   0.7375   0.7472
LE       117      118      156      0.7516   0.7676   0.5906    0.9534   0.6255   0.9124
JE       149      154      177      0.9971   0.9994   0.4229    0.9166   0.6966   0.6771
GRE      127      126      87       0.8908   0.8793   0.9866    0.9034   0.5965   0.6831
JRE      212      228      205      0.9944   0.2307   0.2876    0.4588   0.1469   0.3735

Table 1. Threshold, uniformity, and shape values generated by ME, Otsu's method, LE, JE, GRE and JRE.

