Texture Descriptors in MPEG-7

Peng Wu¹, Yong Man Ro², Chee Sun Won³, and Yanglim Choi⁴

¹ Dept. of Electrical and Computer Engineering, UC Santa Barbara, CA 93106-9560, USA
² Multimedia Group, ICU, Yusung-gu, P.O. Box 77, Taejon, S. Korea
³ Dept. of Electronics Eng., Dongguk Univ., Seoul, 100-715, S. Korea
⁴ M/M Lab., SAIT, KiHeung, YongIn, KyungKi-Do 449-712, S. Korea, yanglimc@samsung.com

Abstract. We present three descriptors of the texture of a region: the homogeneous texture descriptor (HTD), the edge histogram descriptor (EHD), and the perceptual browsing descriptor (PBD). They are currently included in the Committee Draft of MPEG-7 Visual (ISO/IEC 15938-3). Each descriptor has its own functionality and application domain. HTD and EHD describe the statistical distribution of the texture and are useful for image retrieval; HTD is intended for homogeneously textured regions, whereas EHD is intended for multi-textured natural images and sketches. PBD is a compact descriptor suited to quick browsing.
Keywords: texture descriptor, MPEG-7

1 Introduction
MPEG-7 establishes a universal multimedia content description interface, and describing content-based features is one of its primary objectives. Image texture is an important visual feature, and applications that use texture include retrieval, browsing, and indexing. Three texture descriptors have been recommended in the MPEG-7 Committee Draft (CD) Part 3: Visual (ISO/IEC 15938-3). We briefly introduce each of them below.
The homogeneous texture descriptor (HTD) describes a precise statistical distribution of the image texture. It is a vector of 62 integers derived from the Gabor filter responses of 30 channels laid out in the frequency domain, and it allows images to be classified with high precision. HTD is intended for similarity retrieval applications.
The edge histogram descriptor (EHD) is an 80-bin histogram representing the local edge distribution of an image. It is intended for image retrieval where the database images are not necessarily homogeneously textured, e.g., natural images, sketches, or clip art. It also supports queries based on sub-image blocks.
The perceptual browsing descriptor (PBD) is designed for applications where features with perceptual meaning are needed to browse the database. It is very compact and describes high-level perceptual semantics of an image texture: regularity, directionality, and coarseness.
W. Skarbek (Ed.): CAIP 2001, LNCS 2124, pp. 21–28, 2001. © Springer-Verlag Berlin Heidelberg 2001


In the following, we present the definition, semantics, extraction, and test results of each descriptor. Readers are referred to the cited references for an in-depth treatment.

2 Homogeneous Texture Descriptor (HTD)
2.1 Definition and Semantics
HTD consists of 62 numbers. The first two are the mean and the standard deviation of the image. The remaining 60 are the energy and the energy deviation of the Gabor-filtered responses of the 30 "channels" in the frequency-domain layout of Fig. 1. This design is motivated by the fact that the response of the visual cortex is band-limited and that the brain decomposes the spectrum into spatial-frequency bands [1,2,3,4,5].

Fig. 1. Frequency Domain Division Layout (channels C_i, numbered i = 1, ..., 30, over angular direction θ and radial frequency ω)

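The center frequencies of the channels in the angular and radial directions are

$$\theta_r = 30^\circ \times r,\; 0 \le r \le 5; \qquad \omega_s = \omega_0 \cdot 2^{-s},\; 0 \le s \le 4; \qquad \omega_0 = 3/4.$$

The Gabor wavelet filter for channel $(s, r)$ is

$$G_{P_{s,r}}(\omega,\theta) = \exp\left[\frac{-(\omega-\omega_s)^2}{2\sigma_{\rho_s}^2}\right] \cdot \exp\left[\frac{-(\theta-\theta_r)^2}{2\sigma_{\theta_r}^2}\right] \qquad (1)$$

where $\sigma_{\rho_s}$ and $\sigma_{\theta_r}$ are the standard deviations in the radial and the angular directions, chosen so that neighboring filters meet at half of their maximum value.

2.2 Extraction
The extraction of the mean and the standard deviation is straightforward. For the remaining values, the Radon transform followed by a 1-D Fourier transform is applied to the image; by the projection-slice theorem, the result is the 2-D Fourier transform of the image in polar coordinates. Let $F(\omega,\theta)$ be the result. Then the energy $e_i$ and the energy deviation $d_i$ of the $i$-th channel, which corresponds to the filter $(s, r)$, are

$$e_i = \log[1 + p_i], \qquad d_i = \log[1 + q_i], \qquad (2)$$

where

$$p_i = \sum_{\omega=0^+}^{1} \sum_{\theta=0^\circ+}^{360^\circ} I_{sr}^2, \qquad q_i = \sqrt{\sum_{\omega=0^+}^{1} \sum_{\theta=0^\circ+}^{360^\circ} \left\{ I_{sr}^2 - p_i \right\}^2}, \qquad I_{sr} = G_{P_{s,r}}(\omega,\theta) \cdot \omega \cdot F(\omega,\theta),$$

and the factor $\omega$ is the Jacobian between Cartesian and polar coordinates. The HTD is then written as

$$\mathrm{HTD} = [f_{DC}, f_{SD}, e_1, e_2, \ldots, e_{30}, d_1, d_2, \ldots, d_{30}] \qquad (3)$$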
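The extraction in (1) to (3) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the MPEG-7 reference implementation. Instead of an explicit Radon transform, it samples the 2-D FFT magnitude on a polar grid (equivalent by the projection-slice theorem, up to nearest-neighbor interpolation), with ω normalized so that 1 is the Nyquist frequency. The angular width follows from the half-maximum rule: centers 30° apart meeting at half maximum give σ_θ = 15°/√(2 ln 2) ≈ 12.74°. The radial widths σ_ρs are an assumed value, because the text only states the crossing rule. Angle differences are wrapped modulo 180° because the magnitude spectrum of a real image is point-symmetric, and the output is left as unquantized floating-point values rather than the 62 integers of the standard. The names `center_frequencies`, `gabor`, and `htd` are hypothetical.

```python
import numpy as np

OMEGA_0 = 0.75              # omega_0 = 3/4 from the text
N_RADIAL, N_ANGULAR = 5, 6  # s = 0..4, r = 0..5 -> 30 channels

# Half-maximum rule: exp(-d^2 / (2 sigma^2)) = 1/2  <=>  sigma = d / sqrt(2 ln 2).
HALF_MAX = np.sqrt(2.0 * np.log(2.0))
SIGMA_THETA = 15.0 / HALF_MAX  # angular centers 30 deg apart cross at +/-15 deg


def center_frequencies():
    """Return (omega_s, theta_r): radial and angular channel centers."""
    omega_s = OMEGA_0 * 2.0 ** -np.arange(N_RADIAL)
    theta_r = 30.0 * np.arange(N_ANGULAR)
    return omega_s, theta_r


def gabor(omega, theta_deg, s, r):
    """Polar Gabor weight G_{P_s,r}(omega, theta) of equation (1)."""
    omega_s = OMEGA_0 * 2.0 ** -s
    # Assumption: radial half maximum at omega_s +/- omega_s / 4.
    sigma_rho = (omega_s / 4.0) / HALF_MAX
    # Assumption: wrap the angle difference to [-90, 90) degrees (|F| is 180-deg symmetric).
    dtheta = (theta_deg - 30.0 * r + 90.0) % 180.0 - 90.0
    return (np.exp(-(omega - omega_s) ** 2 / (2 * sigma_rho ** 2))
            * np.exp(-dtheta ** 2 / (2 * SIGMA_THETA ** 2)))


def htd(image, n_omega=64, n_theta=360):
    """Unquantized 62-value HTD: [f_DC, f_SD, e_1..e_30, d_1..d_30]."""
    img = np.asarray(image, dtype=float)
    f_dc, f_sd = img.mean(), img.std()
    # Centered 2-D FFT magnitude of the zero-mean image.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img - f_dc)))
    h, w = spec.shape
    omega = np.arange(1, n_omega + 1) / n_omega      # omega = 0+ .. 1 (1 = Nyquist)
    theta = np.arange(n_theta) * 360.0 / n_theta     # one full turn of angles
    W, T = np.meshgrid(omega, theta, indexing="ij")
    rad = np.deg2rad(T)
    ys = np.clip(np.rint(h // 2 - W * (h / 2) * np.sin(rad)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(w // 2 + W * (w / 2) * np.cos(rad)).astype(int), 0, w - 1)
    F = spec[ys, xs]                                 # nearest-neighbor F(omega, theta)

    energies, deviations = [], []
    for s in range(N_RADIAL):
        for r in range(N_ANGULAR):
            I2 = (gabor(W, T, s, r) * W * F) ** 2   # I_sr^2; W acts as the Jacobian
            p = I2.sum()
            q = np.sqrt(((I2 - p) ** 2).sum())
            energies.append(np.log1p(p))             # e_i = log(1 + p_i)
            deviations.append(np.log1p(q))           # d_i = log(1 + q_i)
    return np.array([f_dc, f_sd, *energies, *deviations])
```

Calling `htd(img)` on a 2-D grayscale array returns a length-62 vector in the order of (3); a real implementation would additionally quantize each value to an integer.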

2.3 Experimental Results
Let $\mathrm{HTD}_i$ and $\mathrm{HTD}_j$ be the HTDs of images $i$ and $j$. The distance between them is

$$d(i,j) = \mathrm{distance}(\mathrm{HTD}_i, \mathrm{HTD}_j) = \sum_k w(k) \left| \frac{\mathrm{HTD}_i(k) - \mathrm{HTD}_j(k)}{\alpha(k)} \right| \qquad (4)$$

where $w(k)$ is a weight and $\alpha(k)$ is the standard deviation of the $k$-th descriptor value over the database. We performed tests on the MPEG-7 dataset, which consists of the Brodatz images (T1), the ICU images (T2), and their scale and orientation variations [7]. The results for T1 and T2 are shown in Table 1.

Table 1. Average Retrieval Rates (ARR) for the HTD

Data set   ARR (%)
T1         77.32
T2         90.67
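A minimal sketch of the distance (4) and of ranking a database with it. It assumes the absolute-difference (L1) form, unit weights w(k) when none are supplied, and α(k) computed as the per-component standard deviation over the stored descriptors. Replacing a zero α(k) by 1 is an added assumption to avoid division by zero; `database_alpha`, `htd_distance`, and `rank_database` are hypothetical names.

```python
import numpy as np


def database_alpha(descriptors):
    """alpha(k): std of the k-th HTD component over the database (rows = images)."""
    alpha = np.asarray(descriptors, dtype=float).std(axis=0)
    # Assumption: a component that never varies gets alpha = 1 to avoid dividing by 0.
    return np.where(alpha > 0, alpha, 1.0)


def htd_distance(htd_i, htd_j, alpha, weights=None):
    """d(i, j) = sum_k w(k) |HTD_i(k) - HTD_j(k)| / alpha(k), as in (4)."""
    a = np.asarray(htd_i, dtype=float)
    b = np.asarray(htd_j, dtype=float)
    w = np.ones_like(a) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * np.abs(a - b) / np.asarray(alpha, dtype=float)))


def rank_database(query, descriptors, weights=None):
    """Database indices ordered from most to least similar to the query."""
    db = np.asarray(descriptors, dtype=float)
    alpha = database_alpha(db)
    dists = np.array([htd_distance(query, row, alpha, weights) for row in db])
    return np.argsort(dists, kind="stable")
```

Dividing by α(k) puts components with very different numeric ranges (the mean, the energies, the deviations) on a comparable scale before summing.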

3 Edge Histogram Descriptor
3.1 Definition and Semantics
The edge histogram descriptor (EHD) represents the local edge distribution in the image. It describes the edges in each sub-image, obtained by dividing the image with a 4×4 grid as in Fig. 2. Edges in each sub-image are classified into five types: vertical, horizontal, 45-degree, 135-degree, and non-directional. The number of occurrences of each type in each sub-image forms one histogram bin, producing 16 × 5 = 80 histogram bins overall. The order of these bins is shown in Table 2.


Fig. 2. The sub-image and Image-block.

Table 2. Semantics of local edge bins

Bins       Semantics
Bin[0]     Vertical edge at (0,0)
Bin[1]     Horizontal edge at (0,0)
Bin[2]     45-degree edge at (0,0)
Bin[3]     135-degree edge at (0,0)
Bin[4]     Non-directional edge at (0,0)
Bin[5]     Vertical edge at (0,1)
:          :
Bin[78]    135-degree edge at (3,3)
Bin[79]    Non-directional edge at (3,3)
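The bin ordering of Table 2 reduces to an index formula: bin = 5 × (sub-image index) + edge type, with sub-image positions visited so that the second coordinate varies fastest. The sketch below assumes the position (a, b) in Table 2 is read as (row, column); `EDGE_TYPES`, `ehd_bin_index`, and `ehd_bin_semantics` are hypothetical names.

```python
EDGE_TYPES = ("vertical", "horizontal", "45-degree", "135-degree", "non-directional")


def ehd_bin_index(row, col, edge_type):
    """Map a sub-image position on the 4x4 grid and an edge type to its local bin."""
    if not (0 <= row < 4 and 0 <= col < 4):
        raise ValueError("sub-image position must lie on the 4x4 grid")
    return 5 * (4 * row + col) + EDGE_TYPES.index(edge_type)


def ehd_bin_semantics(index):
    """Inverse mapping: bin index -> (row, col, edge_type)."""
    sub, t = divmod(index, 5)
    return divmod(sub, 4) + (EDGE_TYPES[t],)
```

For example, `ehd_bin_semantics(78)` gives `(3, 3, "135-degree")`, the Bin[78] entry of Table 2.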

The histogram bin values are normalized by the total number of image-blocks. The bin values are then non-linearly quantized to keep the histogram as small as possible [9]. We assigned 3 bits per bin, so 80 × 3 = 240 bits are needed in total [8].
3.2 Edge Extraction
Because the EHD must also distinguish non-directional edges from blocks with no edge, edges are extracted per image-block rather than per pixel, as in Fig. 2. For block-based edge extraction schemes, we refer the reader to the methods in [10][11][12].
3.3 Experimental Results
For good retrieval performance, the local histogram is supplemented with the global edge distribution of the whole image and with semi-global edge distributions, such as those of horizontal and vertical groups of sub-images. The global histogram is obtained by accumulating the 5 edge-type bins over all sub-images. For the semi-global histograms, groups of four connected sub-images are clustered as in Fig. 3 [13], giving 13 clusters × 5 bins = 65 bins. In total, we have 150 bins (80 local + 5 global + 65 semi-global). The MPEG-7 dataset of 11,639 images [7] was used for the test. Using an absolute distance measure [12][13], the ANMRR (average normalized modified retrieval rank, where lower is better) was as low as 0.2962. EHD showed good results for both query by example and query by sketch, especially for natural images with non-uniform textures and for clip art images.
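The sketch below illustrates the pipeline of Sects. 3.1 to 3.3: block-based edge classification, the 80-bin local histogram, the 5 global and 65 semi-global bins, and an absolute (L1) distance. Several details are assumptions, not statements of this paper, which defers edge extraction to [10][11][12]: each image-block is split into 2×2 quarters whose mean intensities are combined with five fixed filter-coefficient sets, the strongest response above an assumed threshold of 11 decides the edge type, the block size is fixed at 8 pixels, bins are normalized per sub-image, and the 13 clusters of Fig. 3 are taken to be the 4 columns, 4 rows, and 5 groups of 2×2 sub-images, with global and semi-global bins computed as means. Non-linear quantization to 3 bits per bin is omitted. All function names are hypothetical.

```python
import numpy as np

SQ2 = np.sqrt(2.0)
# Assumed filter coefficients applied to the mean intensities of the four quarters
# of an image-block, ordered (top-left, top-right, bottom-left, bottom-right).
FILTERS = np.array([
    [1.0, -1.0, 1.0, -1.0],   # vertical: left minus right
    [1.0, 1.0, -1.0, -1.0],   # horizontal: top minus bottom
    [SQ2, 0.0, 0.0, -SQ2],    # 45-degree
    [0.0, SQ2, -SQ2, 0.0],    # 135-degree
    [2.0, -2.0, -2.0, 2.0],   # non-directional
])


def classify_block(block, threshold=11.0):
    """Return edge type 0..4 of an image-block, or None if it has no edge."""
    h2, w2 = block.shape[0] // 2, block.shape[1] // 2
    quarters = np.array([block[:h2, :w2].mean(), block[:h2, w2:].mean(),
                         block[h2:, :w2].mean(), block[h2:, w2:].mean()])
    strengths = np.abs(FILTERS @ quarters)
    best = int(np.argmax(strengths))
    return best if strengths[best] > threshold else None


def local_ehd(image, block_size=8, threshold=11.0):
    """80 local bins: 4x4 sub-images x 5 edge types, normalized per sub-image."""
    img = np.asarray(image, dtype=float)
    hist = np.zeros((4, 4, 5))
    sh, sw = img.shape[0] // 4, img.shape[1] // 4
    for i in range(4):
        for j in range(4):
            sub = img[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            n_blocks = 0
            for y in range(0, sh - block_size + 1, block_size):
                for x in range(0, sw - block_size + 1, block_size):
                    n_blocks += 1
                    t = classify_block(sub[y:y + block_size, x:x + block_size], threshold)
                    if t is not None:
                        hist[i, j, t] += 1
            if n_blocks:
                hist[i, j] /= n_blocks
    return hist.reshape(80)  # bin = 5 * (4 * row + col) + edge type


def semi_global_clusters():
    """13 assumed clusters of four sub-images: 4 columns, 4 rows, 5 2x2 groups."""
    cols = [[(r, c) for r in range(4)] for c in range(4)]
    rows = [[(r, c) for c in range(4)] for r in range(4)]
    squares = [[(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
               for r, c in ((0, 0), (0, 2), (2, 0), (2, 2), (1, 1))]
    return cols + rows + squares


def full_ehd(local):
    """150 bins: 80 local + 5 global + 13 x 5 semi-global (means, assumed)."""
    h = np.asarray(local, dtype=float).reshape(4, 4, 5)
    global_bins = h.mean(axis=(0, 1))
    semi = [np.mean([h[r, c] for r, c in cl], axis=0) for cl in semi_global_clusters()]
    return np.concatenate([h.reshape(80), global_bins, np.concatenate(semi)])


def ehd_distance(a, b):
    """Absolute (L1) distance between two descriptors of equal length."""
    return float(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).sum())
```

On an image of vertical stripes whose period matches the block size, every block is classified as vertical, so each sub-image's vertical bin, the global vertical bin, and every semi-global vertical bin equal 1.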

Fig. 3. Clusters of sub-images for semi-global histograms (13 clusters, numbered 1 to 13)

4 Perceptual Browsing Descriptor
4.1 Definition and Semantics
The syntax of the Perceptual Browsing Descriptor (PBD) is

$$\mathrm{PBD} = [v_1\; v_2\; v_3\; v_4\; v_5] \qquad (5)$$

$v_1$ represents the regularity; $v_2$ and $v_3$ represent the two directions that best capture the directionality; $v_4$ and $v_5$ represent the two scales that best capture the coarseness.
4.2 Extraction
As in [17], the image is decomposed into a set of filtered images $W_{mn}(x, y)$, $m = 1, \ldots, S$, $n = 1, \ldots, K$. The overall extraction process is depicted in Fig. 4.

Fig. 4. Extraction of Perceptual Browsing Descriptor

The dominant directions can be estimated in the spatial domain [17] or in the frequency domain [16], [18]. Because of the aperture effect [18], the spatial-domain approach appears more adequate [14]. We use S directional histograms computed from the S × K filtered images:

$$H(s,k) = \frac{N(s,k)}{\sum_{k'=1}^{K} N(s,k')}, \qquad s = 1, \ldots, S, \quad k = 1, \ldots, K \qquad (6)$$

where $N(s,k)$ is the number of pixels in the filtered image at $(s,k)$ whose value is larger than a threshold $t_s$.¹ Among the directions at which $H(s,k)$ has a peak that also appears at a neighboring scale, the two with the sharpest peaks are chosen as PBD[$v_2$] and PBD[$v_3$]. The sharpness of a peak is $C(s,k) = 0.5\,(2H(s,k) - H(s,k-1) - H(s,k+1))$ [14].
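A minimal sketch of the dominant-direction step, assuming the filter-bank outputs are supplied as an array of shape (S, K, height, width) and taking their absolute values as the filtered images. The threshold follows footnote 1 (t_s = μ_s + σ_s over all K images at scale s). The direction index k is treated as circular when finding peaks and computing C(s, k), and a direction that peaks at several scales keeps its largest sharpness. These choices and the function names are illustrative assumptions.

```python
import numpy as np


def directional_histograms(filtered):
    """H(s, k) of equation (6) from filtered images of shape (S, K, height, width)."""
    W = np.abs(np.asarray(filtered, dtype=float))
    S, K = W.shape[:2]
    H = np.zeros((S, K))
    for s in range(S):
        t_s = W[s].mean() + W[s].std()  # t_s = mu_s + sigma_s over the K images at scale s
        N = (W[s] > t_s).reshape(K, -1).sum(axis=1)
        H[s] = N / N.sum() if N.sum() > 0 else 0.0
    return H


def sharpness(H):
    """C(s, k) = 0.5 (2H(s,k) - H(s,k-1) - H(s,k+1)), with circular k (assumed)."""
    return 0.5 * (2 * H - np.roll(H, 1, axis=1) - np.roll(H, -1, axis=1))


def dominant_directions(H, n=2):
    """Indices k of up to n sharpest peaks that recur at a neighboring scale."""
    S, _ = H.shape
    is_peak = (H > np.roll(H, 1, axis=1)) & (H > np.roll(H, -1, axis=1))
    C = sharpness(H)
    best = {}
    for s in range(S):
        for k in np.flatnonzero(is_peak[s]):
            neighbors = [t for t in (s - 1, s + 1) if 0 <= t < S]
            if any(is_peak[t, k] for t in neighbors):
                best[k] = max(best.get(k, -np.inf), C[s, k])
    ranked = sorted(best, key=lambda k: -best[k])
    return [int(k) for k in ranked[:n]]
```

With K = 6 directions, index k would correspond to k × 30°, matching the 30° steps of the detected directions in Fig. 5 (an assumed mapping).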

For the coarseness, two projections, $P_H^{(mn)}$ and $P_V^{(mn)}$, are computed as follows:

$$P_H^{(mn)}(l) = \iint W_{mn}(x,y)\,\delta(x\cos\theta_{DO1} + y\sin\theta_{DO1} - l)\,dx\,dy, \qquad P_V^{(mn)}(l) = \iint W_{mn}(x,y)\,\delta(x\cos\theta_{DO2} + y\sin\theta_{DO2} - l)\,dx\,dy \qquad (7)$$

where $\theta_{DO1}$ and $\theta_{DO2}$ are the two dominant directions. Let $P(l)$ be the projection corresponding to $\theta_{DO1}$ (and similarly for $\theta_{DO2}$). Its normalized auto-correlation function (NAC) and the contrast are computed as

$$NAC(k) = \frac{\sum_{m=k}^{N-1} P(m-k)\,P(m)}{\sqrt{\sum_{m=k}^{N-1} P^2(m-k)\,\sum_{m=k}^{N-1} P^2(m)}}, \qquad \mathrm{contrast} = \frac{1}{M}\sum_{i=1}^{M} \mathrm{p\_magn}(i) - \frac{1}{N}\sum_{j=1}^{N} \mathrm{v\_magn}(j) \qquad (8)$$

where p_posi(i), p_magn(i) and v_posi(j), v_magn(j) are the positions and magnitudes of the peaks and valleys of NAC(k). Let dis and std be the mean and the standard deviation of the distances between successive peaks. The projections with std/dis less than a threshold become candidates. Let $m^*(H)$ and $m^*(V)$ be the scale indices of the candidates $P_H^{(mn)}$ and $P_V^{(mn)}$ with the maximum contrast, as defined in (8). Then PBD[$v_4$] = $m^*(H)$ and PBD[$v_5$] = $m^*(V)$.
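The coarseness computation in (7) and (8) can be sketched as follows. Assumptions: the continuous projection in (7) is discretized by rounding x cos θ + y sin θ to integer bins; the NAC is evaluated only for lags up to half the projection length, because very short overlaps make NAC(k) unreliable; peaks and valleys are strict interior local extrema; and the candidate threshold on std/dis (0.2 here) is a hypothetical value, since the text does not give one. The function names are hypothetical.

```python
import numpy as np


def projection(W, theta_deg):
    """Discrete form of (7): sum W(x, y) over lines x cos(t) + y sin(t) = l."""
    W = np.asarray(W, dtype=float)
    y, x = np.mgrid[0:W.shape[0], 0:W.shape[1]]
    t = np.deg2rad(theta_deg)
    l = np.rint(x * np.cos(t) + y * np.sin(t)).astype(int)
    return np.bincount((l - l.min()).ravel(), weights=W.ravel())


def nac(P, max_lag=None):
    """Normalized auto-correlation NAC(k) of (8) for k = 0..max_lag."""
    P = np.asarray(P, dtype=float)
    N = len(P)
    max_lag = N // 2 if max_lag is None else max_lag  # assumed lag cut-off
    out = np.zeros(max_lag + 1)
    for k in range(max_lag + 1):
        a, b = P[:N - k], P[k:]  # P(m - k) and P(m) for m = k..N-1
        denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
        out[k] = np.sum(a * b) / denom if denom > 0 else 0.0
    return out


def peaks_and_valleys(c):
    """Positions of strict interior local maxima and minima of a 1-D sequence."""
    mid = c[1:-1]
    peaks = np.flatnonzero((mid > c[:-2]) & (mid > c[2:])) + 1
    valleys = np.flatnonzero((mid < c[:-2]) & (mid < c[2:])) + 1
    return peaks, valleys


def analyze_projection(P, ratio_threshold=0.2):
    """Return (contrast, is_candidate) for one projection P."""
    c = nac(P)
    p_posi, v_posi = peaks_and_valleys(c)
    if len(p_posi) < 2 or len(v_posi) == 0:
        return 0.0, False
    contrast = c[p_posi].mean() - c[v_posi].mean()  # mean peak minus mean valley
    gaps = np.diff(p_posi)                           # distances of successive peaks
    dis, std = gaps.mean(), gaps.std()
    return float(contrast), bool(std / dis < ratio_threshold)
```

For a projection that repeats every 8 samples, the NAC peaks at lags 8, 16, and 24 with evenly spaced peaks (std/dis = 0), so the projection becomes a candidate. Choosing PBD[v4] and PBD[v5] would then mean taking, among candidate scales, the one with the largest contrast.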

The regularity is obtained by assigning credits to the candidates; the credits are then summed and quantized to an integer between 1 and 4. For more details, see [19].
4.3 Experiments
From each 512 × 512 image in the Brodatz album [15], 19 images of size 256 × 256 are obtained by partitioning the large image and rotating it in 30° steps. Figure 5(a) shows dominant direction estimation for texture T069. The estimated orientations in Figure 5(b) match reasonably well with the perceived directionality of the textures. Subjective tests were also conducted with 25 texture groups, with the ground truth formed by human observation. The comparison results are shown in Table 3.

¹ At each scale s, the threshold is set to $t_s = \mu_s + \sigma_s$, where $\mu_s$ and $\sigma_s$ are the mean and the standard deviation over all K filtered images at that scale.

Fig. 5. (a) Detection of the gradual directionality change; (b) Detected dominant direction(s). Detected directions (degrees) per image: T069_01 [90 *], T069_02 [60 *], T069_03 [90 *], T069_04 [30 120], T001_05 [60 150], T020_08 [60 150], T036_06 [150 120], T047_01 [30 120], T053_01 [90 *], T050_01 [90], T051_05 [150], T083_18 [60], T106_09 [30], T115_16 [30]

Table 3. Subjective Evaluation Results

Set   % of matches (first / second)   Texture patterns
S1    98 / 88                          T001, T003, T020, T021, T022, T032, T036, T046, T047, T052, T053, T056, T064, T065, T077, T082, T084, T085, T094, T095
S2    100 / –                          T050, T051, T083, T106, T115

The regularity and coarseness were also evaluated using the images in Table 4. Five people were asked to quantify the regularity and coarseness. For regularity, the human and computed values were within ±1 of each other for 29 of the 30 images. For the scales, counting two values as matching if they are within ±1, they agreed for 26 of the 30 images (87%).
Table 4. Images used in subjective evaluation
T001, T002, T006, T007, T009, T012, T014, T018, T020, T021, T023, T025, T026, T037, T039, T053, T055, T056, T064, T065, T067, T071, T075, T088, T094, T096, T097, T103, T107, T111

5 Summary and Conclusion
Texture is one of the salient features in content-based representation of images. Because of the variety of applications, MPEG-7 recommends three texture descriptors. Their efficiency and robustness have been carefully checked by MPEG members through the core experiment (CE) process. The descriptors are largely complementary to each other, so it is reasonable to expect that a combination scheme would give richer functionality. Future research will investigate this aspect.

References
1. Manjunath, B.S., Ma, W.Y.: Texture Features for Browsing and Retrieval of Image Data. IEEE Transactions on PAMI, Vol. 18, No. 8, August (1996)
2. Chellappa, R.: Two-dimensional discrete Gaussian Markov random field models for image processing. Pattern Recognition, Vol. 2 (1985) 79-112
3. Saadane, A., Senane, H., Barba, D.: On the Design of Psychovisual Quantizers for a Visual Subband Image Coding. SPIE, Vol. 2308 (1994) 1446
4. Saadane, A., Senane, H., Barba, D.: An Entirely Psychovisual based Subband Image Coding Scheme. SPIE, Vol. 2501 (1995) 1702
5. Daugman, J.G.: High Confidence Visual Recognition of Persons by a Test of Statistical Independence. IEEE Trans. PAMI, Vol. 15, No. 11, November (1993) 1148-1161
6. Lambrecht, C.J.: A Working Spatio-Temporal Model of Human Vision System for Image Restoration and Quality Assessment Applications. IEEE International Conference on ASSP, Vol. 4. New York, NY, USA (1996) 2291-2294
7. Ro, Y.M., Yoo, K.W., Kim, M., Kim, J., Manjunath, B.S., Sim, D.G., Kim, H.K., Ohm, J.R.: An unified texture descriptor. ISO/IEC JTC1 SC29 WG11 (MPEG), M5490. Maui (1999)
8. ISO/IEC JTC1/SC29/WG11: Core Experiment Results for Edge Histogram Descriptor (CT4). MPEG Document M6174. Beijing (2000)
9. ISO/IEC JTC1/SC29/WG11: CD 15938-3 MPEG-7 Multimedia Content Description Interface - Part 3. MPEG Document W3703. La Baule (2000)
10. Vaisey, J., Gersho, A.: Image compression with variable block size segmentation. IEEE Trans. Signal Process., Vol. 40, No. 8 (1992) 2040-2060
11. Won, C.S., Park, D.K.: Image block classification and variable block size segmentation using a model-fitting criterion. Optical Eng., Vol. 36, No. 8 (1997) 2204-2209
12. ISO/IEC JTC1/SC29/WG11: MPEG-7 Visual XM 8.0. W3673. La Baule (2000)
13. Park, D.K., Jeon, Y.S., Won, C.S., Park, S.-J.: Efficient use of local edge histogram descriptor. Proc. of ACM Multimedia 2000 Workshops. Marina del Rey (2000) 51-54
14. Brodatz, P.: Textures: A photographic album for artists & designers. Dover, NY (1966)
15. Liu, F., Picard, R.W.: Periodicity, directionality, and randomness: Wold features for image modeling and retrieval. MIT Media Lab Technical Report No. 320, March (1995)
16. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Transactions on PAMI 18(8) (1996) 837-842
17. Tamura, H., Mori, S., Yamawaki, T.: Texture features corresponding to visual perception. IEEE Trans. on Sys., Man, and Cyb., SMC-8(6) (1978)
18. Weszka, J.S., Dyer, C.R., Rosenfeld, A.: A comparative study of texture measures for terrain classification. IEEE Trans. Sys., Man, Cyber., SMC-6, April (1976) 269-285
19. Wu, P., Manjunath, B.S., Newsam, S., Shin, H.D.: A texture descriptor for browsing and similarity retrieval. Journal of Signal Processing: Image Communication, Vol. 16, Issue 1-2, September (2000) 33-43