Content-Based Image Retrieval Using Block Discrete Cosine Transform

Te-Wei Chiang 1, Tienwei Tsai 2, Chueh-Chih Liu 3

1 Department of Information Networking Technology, Chihlee Institute of Technology
2 Department of Information Management, Chihlee Institute of Technology
3 Library & Information Center, Chihlee Institute of Technology

E-mail addresses: [email protected] (T.-W. Chiang), [email protected] (T. Tsai), [email protected] (C.-C. Liu).

ABSTRACT

Image retrieval is an important problem in accessing large image databases. In this paper, a content-based image retrieval method based on the block discrete cosine transform (DCT) is proposed. Owing to its energy compaction property and its ability to extract texture information, the DCT is applied to extract low-level features from the images. Based on the observation that block DCT reduces the computational complexity of each DCT operation but may lose the global view of the image being processed, we are motivated to investigate the relation between the precision rate in image retrieval and the block size of the block DCT. Experiments show that block DCT with a larger block size performs better than block DCT with a smaller block size and the ordinary DCT in terms of the precision rate of content-based image retrieval.

Keywords: Content-based image retrieval, query-by-example, discrete cosine transform, color space.

1. INTRODUCTION

The amount of images stored and accessed in electronic form is already very large and is rapidly increasing. The ability to select from a large database those images that are relevant to a user request, specified either in visual or in textual terms, is at the heart of the image retrieval problem. Basically, image retrieval procedures can be roughly divided into two approaches: query-by-text (QbT) and query-by-example (QbE). In QbT, queries are texts and targets are images; in QbE, both queries and targets are images. For practicality, images in QbT retrieval are often annotated by words. When images are sought using these annotations, such retrieval is known as annotation-based image retrieval (ABIR). ABIR has the following drawbacks. Firstly, each image in the database has to be described by keywords, which is extremely time consuming. Secondly, the expressive power of keywords is limited and cannot be exhaustive. Consequently, retrieving images according to their content becomes a significant need. Such retrieval is known as content-based image retrieval (CBIR) [1]. CBIR has become popular as a means of retrieving desired images automatically. Smeulders et al. [2] reviewed more than 200 references in this field.

In QbE, the retrieval of images is basically done via the similarity between the query image and all candidates in the image database. To evaluate the similarity between two images, one of the

simplest ways is to calculate the Euclidean distance between the feature vectors representing the two images. To obtain the feature vector of an image, transform-type feature extraction techniques can be applied, such as the wavelet [3], Walsh, Fourier, 2-D moment, DCT [4-6], and Karhunen-Loeve transforms. In our approach, the DCT is used to extract low-level texture features. Due to the energy compaction property of the DCT, much of the signal energy tends to lie at low frequencies. In other words, most of the signal energy is preserved in a relatively small number of DCT coefficients. For instance, in an OCR application, 84% of the signal energy appears in 2.8% of the DCT coefficients, which corresponds to a saving of 97.2% [7]. Moreover, most current images are stored in JPEG format, which is based on the DCT. To apply other transform-based feature extraction methods, such as the wavelet transform, DCT-coded images must first be decompressed via the inverse DCT. We hope that the feature sets derived here can be generated directly in the DCT domain so as to reduce the processing time.

In this paper, we propose a content-based image retrieval method based on block DCT. In the database-establishing phase, each image is first transformed from the standard RGB color space to the YUV space; the Y component of the image is then further transformed to the DCT domain. In the retrieval phase, the system compares the most significant DCT coefficients of the Y component of the query image with those of the images in the database and finds good matches. Moreover, based on the observation that block DCT can reduce the computational complexity of each DCT operation but may lose the global view of the image being processed, we investigate the relation between the precision rate in image retrieval and the block size of the block DCT. The remainder of this paper is organized as follows. The next section formulates the problem.
Section 3 presents the proposed image retrieval system. Experimental results are shown in Section 4. Finally, conclusions are drawn in Section 5.

2. PROBLEM FORMULATION

In this work we focus on the QbE approach. The user gives an example image similar to the one he/she is looking for. Formally, QbE can be defined as follows. Let I be the image database with I := {Xn | n = 1, ..., N}, where Xn is an image represented by a set of features: Xn := {xn,m | m = 1, ..., M}. N and M are the number of images in the image database and the number of features, respectively. Because the query Q is also an image, we have Q := {qm | m = 1, ..., M}. To query the database, the dissimilarity between Q and Xn, D(Q, Xn), needs to be calculated for each n. The details of the dissimilarity (or distance) measure will be given in subsection 3.3. Finally, the images with the smallest dissimilarity values are retrieved from the image database as the result of the query Q. In the next section we introduce our image retrieval scheme.
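The formulation above is essentially a nearest-neighbor search over feature vectors. The following sketch illustrates it; the toy feature vectors and the squared Euclidean distance here are illustrative stand-ins for the DCT features and the dissimilarity measure defined later, and the function name is our own choice:

```python
import numpy as np

def retrieve(query_features, database_features, top=10):
    """Rank database images by squared Euclidean distance to the query.

    query_features: 1-D feature vector (q_m, m = 1..M)
    database_features: N x M array, one row per image X_n
    Returns the indices of the `top` closest images, most similar first.
    """
    dists = np.sum((database_features - query_features) ** 2, axis=1)
    return np.argsort(dists)[:top]

# Toy example: three 4-dimensional feature vectors
db = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.9, 0.1, 0.0, 0.0],
               [0.0, 1.0, 1.0, 1.0]])
q = np.array([1.0, 0.0, 0.0, 0.0])
print(retrieve(q, db, top=2))  # the first two rows are the closest matches
```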

3. THE PROPOSED IMAGE RETRIEVAL SYSTEM Our system contains two major modules: the feature extraction module and the similarity measure module. The details of each module are introduced in the following subsections.


3.1 Feature Extraction

Features are functions of the measurements performed on a class of objects (or patterns) that enable that class to be distinguished from other classes in the same general category [8]. Basically, we have to extract distinguishable and reliable features from the images. Before the feature extraction process, the images have to be converted to the desired color space. There exist many models for defining the valid colors in image data. Each model is specified by a vector of values, each component of which is valid over a specified range. The major color space definitions include [9]: RGB (Red, Green, and Blue), CMYK (Cyan, Magenta, Yellow, and Key black), CIE (Commission Internationale de l'Eclairage), YUV (luminance and chrominance channels), etc. In our approach, the RGB images are first transformed to the YUV color space.

3.1.1 RGB Color Space

A gray-level digital image can be defined as a function of two variables, f(x, y), where x and y are spatial coordinates, and the amplitude f at a given pair of coordinates is called the intensity of the image at that point. Every digital image is composed of a finite number of elements, called pixels, each with a particular location and a finite value. Similarly, for a color image, each pixel (x, y) consists of three components, R(x, y), G(x, y), and B(x, y), which correspond to the intensities of the red, green, and blue components of the pixel, respectively.

3.1.2 YUV Color Space

Originally used for PAL (the European analog video "standard"), YUV is based on the CIE Y primary together with two chrominance channels. The Y primary was specifically designed to follow the luminous efficiency function of human eyes. Chrominance is the difference between a color and a reference white at the same luminance. The following equations are used to convert from RGB to YUV:

Y(x, y) = 0.299 × R(x, y) + 0.587 × G(x, y) + 0.114 × B(x, y),    (1)

U(x, y) = 0.492 × (B(x, y) − Y(x, y)),    (2)

V(x, y) = 0.877 × (R(x, y) − Y(x, y)).    (3)

Since the Y component is the most significant subspace of the YUV space, the U and V components can be neglected. After converting from RGB to YUV, the features of each image can be extracted from the Y subspace by the DCT.
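As a concrete illustration, Eqs. (1)-(3) can be applied directly to a NumPy image array. This is a minimal sketch; the function name and the H×W×3 array layout are our own choices, not part of the original system:

```python
import numpy as np

def rgb_to_yuv(rgb):
    """Convert an H x W x 3 RGB array to YUV per Eqs. (1)-(3)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # Eq. (1): luminance
    u = 0.492 * (b - y)                      # Eq. (2): blue-difference chroma
    v = 0.877 * (r - y)                      # Eq. (3): red-difference chroma
    return y, u, v

# A pure white pixel maps to maximum luminance and zero chrominance
y, u, v = rgb_to_yuv(np.full((1, 1, 3), 255.0))
```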

3.1.3 Discrete Cosine Transform

Developed by Ahmed et al. [10], the DCT is a technique for separating an image into parts (or spectral sub-bands) of differing importance with respect to the image's visual quality. It uses orthogonal real basis vectors whose components are cosines. The DCT has an excellent energy compaction property and requires only real operations in the transformation process. Assume that the Y component of a K×K image is represented by f(x, y) for x, y = 0, 1, ..., K−1. Applying the 2-D DCT, the frequency spectrum (i.e., the 2-D DCT coefficients) F(u, v) of the Y component of the image can be defined as


F(u, v) = (2/K) α(u) α(v) Σ_{x=0}^{K−1} Σ_{y=0}^{K−1} f(x, y) × cos((2x+1)uπ / 2K) × cos((2y+1)vπ / 2K),    (4)

where α(w) = 1/√2 for w = 0, and α(w) = 1 otherwise.

We note that the DCT is a unitary transform and thus has the energy preservation property, i.e.,

E = Σ_{x=0}^{K−1} Σ_{y=0}^{K−1} (f(x, y))² = Σ_{u=0}^{K−1} Σ_{v=0}^{K−1} (F(u, v))²,    (5)

where E is the signal energy. In Eq. (4), the coefficients with small u and v correspond to low-frequency components; conversely, those with large u or v correspond to high-frequency components. For most images, much of the signal energy lies at low frequencies; the high-frequency coefficients are often small enough to be neglected with little visible distortion. Therefore, the DCT has a superior energy compaction property. Based on the above observations, we were motivated to devise an image retrieval scheme using low-frequency 2-D DCT coefficients as discriminating features. DCT techniques can be applied to extract texture features from images due to the following characteristics [6]:

- the DC coefficient (i.e., F(0, 0)) represents the average energy of the image;
- all the remaining coefficients contain frequency information, each producing a different pattern of image variation; and
- the coefficients of some regions represent directional information.
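The transform of Eq. (4) and the energy preservation of Eq. (5) can be checked numerically. The sketch below is our own NumPy implementation, not code from the original system; it builds the cosine basis matrix once and applies it on both sides of the image:

```python
import numpy as np

def dct2(f):
    """2-D DCT of a K x K array, following Eq. (4)."""
    K = f.shape[0]
    w = np.arange(K)
    alpha = np.ones(K)
    alpha[0] = 1.0 / np.sqrt(2.0)  # alpha(0) = 1/sqrt(2), alpha(w) = 1 otherwise
    # C[u, x] = cos((2x + 1) u pi / (2K)); rows index frequency u
    C = np.cos((2 * w[None, :] + 1) * w[:, None] * np.pi / (2 * K))
    return (2.0 / K) * np.outer(alpha, alpha) * (C @ f @ C.T)

# Energy preservation (Eq. (5)): the total energy is unchanged by the transform
f = np.random.default_rng(0).standard_normal((8, 8))
F = dct2(f)
print(np.allclose((f ** 2).sum(), (F ** 2).sum()))  # True
```

Note that for a constant image the DC coefficient F(0, 0) equals (1/K) times the sum of all pixels, consistent with the "average energy" characterization above.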

3.1.4 Block Discrete Cosine Transform

The block-based DCT first divides an image into non-overlapping S×S blocks of pixels. The 2-D DCT is then performed on each block. Therefore, the computational complexity of each DCT operation is reduced.
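A minimal sketch of this blockwise procedure follows, assuming SciPy's `dctn` is available for the per-block transform; the cropping of borders not divisible by S is our own simplification, not something stated in the paper:

```python
import numpy as np
from scipy.fft import dctn  # orthonormal 2-D DCT, applied per block

def block_dct(image, S):
    """Apply the 2-D DCT to each non-overlapping S x S block of a grayscale image.
    Borders not divisible by S are cropped (an assumption made here)."""
    H = image.shape[0] - image.shape[0] % S
    W = image.shape[1] - image.shape[1] % S
    out = np.empty((H, W))
    for i in range(0, H, S):
        for j in range(0, W, S):
            out[i:i+S, j:j+S] = dctn(image[i:i+S, j:j+S], norm='ortho')
    return out
```

Because each block transform is orthonormal, the total signal energy is preserved across the whole block-DCT representation, just as in Eq. (5).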

3.2 Similarity Measurement

To decide which image in the database is most similar to the query image, we have to define a measure of the degree of similarity (or dissimilarity). Traditional dissimilarity (or distance) measures typically apply the sum of absolute differences (SAD) to avoid multiplications. To exploit the energy preservation property of the DCT (see Eq. (5)), however, we use the sum of squared differences (SSD) instead. Assume that Fq(u, v) and Fxn(u, v) represent the DCT coefficients of the query image Q and image Xn, respectively. Then the distance between Q and Xn over the low-frequency block of size k×k can be defined as

D(Q, Xn) = Σ_{u=0}^{k−1} Σ_{v=0}^{k−1} (Fq(u, v) − Fxn(u, v))².    (6)

For block DCT, where an image is divided into nonoverlapping S×S blocks, the distance between Q and Xn can be defined as the summation of the distances under each block.
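Eq. (6) and its blockwise extension can be sketched as follows. This is a hypothetical illustration; the lists of per-block coefficient arrays stand in for however the system actually stores them:

```python
import numpy as np

def low_freq_distance(Fq, Fx, k):
    """Eq. (6): SSD over the k x k low-frequency corner of the DCT coefficients."""
    d = Fq[:k, :k] - Fx[:k, :k]
    return float((d ** 2).sum())

def block_distance(Fq_blocks, Fx_blocks, k):
    """Block-DCT distance: Eq. (6) summed over corresponding S x S blocks."""
    return sum(low_freq_distance(bq, bx, k)
               for bq, bx in zip(Fq_blocks, Fx_blocks))

Fq = np.arange(16.0).reshape(4, 4)
Fx = np.zeros((4, 4))
print(low_freq_distance(Fq, Fx, 2))  # 0^2 + 1^2 + 4^2 + 5^2 = 42.0
```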

4. EXPERIMENTAL RESULTS

In this preliminary experiment, 1000 images downloaded from the WBIIS [11] and SIMPLIcity [12] databases are used to demonstrate the effectiveness of our system. To evaluate the retrieval efficiency of the proposed method, we use the precision rate, shown in Equation (7), as the performance measure:

Precision rate = Rr / Tr,    (7)

where Rr is the number of relevant retrieved items and Tr is the number of all retrieved items.

In our experiments, a butterfly image of size 85×128 is used as the query. Basically, the low-frequency DCT block size k×k used for image retrieval affects the precision rate. Figures 1 and 2 show the results retrieved using the ordinary DCT with low-frequency DCT coefficients of size 2×2 and 4×4, respectively. The top ten most similar results are shown, ranked in ascending order of distance to the query image, from left to right and then from top to bottom. We find that matching with a DCT block of size 4×4 is better than with 2×2 in terms of the precision rate. Intuitively, one would expect that the more DCT coefficients are considered, the better the precision rate. In fact, the precision rate may decrease as more DCT coefficients are included, which is known as the "peak phenomenon" [13]. To examine the relation between the block size S×S of the block DCT and the precision rate, a series of experiments was conducted. Figures 3-11 show the precision rates of the test image over the low-frequency DCT block size k×k, using the block DCT with different values of S. The best precision rate, 0.9, occurs when S and k are set to 2 and 2, respectively. The corresponding retrieved results are shown in Figure 12. Figure 13 shows the best precision rate obtainable from the block DCT for each value of S. It can be seen that block DCT with a larger block size performs better than block DCT with a smaller block size and the ordinary DCT in terms of the precision rate.
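Eq. (7) amounts to the following plain-Python sketch; ground-truth relevance labels are assumed to be available for the retrieved items:

```python
def precision_rate(retrieved, relevant):
    """Eq. (7): Rr / Tr, the fraction of retrieved items that are relevant."""
    Rr = len(set(retrieved) & set(relevant))  # relevant retrieved items
    return Rr / len(retrieved)                # divided by all retrieved items

# E.g., 9 relevant hits among the top 10 retrieved gives the paper's best rate
print(precision_rate(range(10), range(9)))  # 0.9
```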

5. CONCLUSIONS

In this paper, a content-based image retrieval method based on block DCT is proposed. The DCT is applied to extract low-level features from the images owing to its energy compaction property and its ability to extract texture information. To achieve QbE, the system compares the most significant DCT coefficients of the query image with those of the images in the database and finds good matches. Based on the observation that block DCT can reduce the computational complexity of each DCT operation but may lose the global view of the image being processed, we investigated the relation between the precision rate in image retrieval and the block size of the block DCT. Experiments show that block DCT with a larger block size performs better than block DCT with a smaller block size and the ordinary DCT in terms of the precision rate of content-based image retrieval.

6. REFERENCES

[1] Gudivada, V. and Raghavan, V. (1995), Content-Based Image Retrieval Systems, IEEE Computer, Vol. 28, No. 9, pp. 18-22.
[2] Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., and Jain, R. (2000), Content-Based Image Retrieval at the End of the Early Years, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 12, pp. 1349-1380.
[3] Liang, K.-C. and Kuo, C. C. (1999), WaveGuide: A Joint Wavelet-Based Image Representation and Description System, IEEE Trans. on Image Processing, Vol. 8, No. 11, pp. 1619-1629.
[4] Huang, X.-Y., Zhang, Y.-J., and Hu, D. (2003), Image Retrieval Based on Weighted Texture Features Using DCT Coefficients of JPEG Images, Proc. 4th Pacific Rim Conf. on Multimedia, Information, Communications and Signal Processing, pp. 1571-1575.
[5] Huang, Y.-L. and Chang, R.-F. (1999), Texture Features for DCT-Coded Image Retrieval and Classification, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 3013-3016.
[6] Bae, H.-J. and Jung, S.-H. (1997), Image Retrieval Using Texture Based on DCT, Proc. Int. Conf. on Information, Communications and Signal Processing, Singapore, pp. 1065-1068.
[7] Chiang, T.-W., Tsai, T., and Lin, Y.-C. (2004), Progressive Pattern Matching Approach Using Discrete Cosine Transform, Proc. Int. Computer Symposium, Taipei, Taiwan, pp. 726-730.
[8] Nadler, M. and Smith, E. P. (1993), Pattern Recognition Engineering, New York: Wiley-Interscience.
[9] Dunn, S. (1999), Digital Color, http://davis.wpi.edu/~matt/courses/color/ .
[10] Ahmed, N., Natarajan, T., and Rao, K. R. (1974), Discrete Cosine Transform, IEEE Trans. on Computers, Vol. C-23, pp. 90-93.
[11] Wang, J. Z. (1996), Content Based Image Search Demo Page, http://bergman.stanford.edu/~zwang/project/imsearch/WBIIS.html .
[12] Wang, J. Z., Li, J., and Wiederhold, G. (2001), SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture Libraries, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 23, No. 9, pp. 947-963.
[13] Bishop, C. M. (1995), Neural Networks for Pattern Recognition, Oxford: Clarendon Press.


Figure 1. Retrieved results using ordinary DCT with low frequency DCT coefficients of size 2×2.

Figure 2. Retrieved results using ordinary DCT with low frequency DCT coefficients of size 4×4.

Figure 3. Precision rate of the test image over k using ordinary DCT.

Figure 4. Precision rate of the test image over k using 2x2 block DCT.

Figure 5. Precision rate of the test image over k using 3x3 block DCT.


Figure 6. Precision rate of the test image over k using 4x4 block DCT.

Figure 7. Precision rate of the test image over k using 5x5 block DCT.

Figure 8. Precision rate of the test image over k using 6x6 block DCT.


Figure 9. Precision rate of the test image over k using 7x7 block DCT.

Figure 10. Precision rate of the test image over k using 8x8 block DCT.

Figure 11. Precision rate of the test image over k using 16x16 block DCT.


Figure 12. Retrieved results using 2×2 block DCT with low frequency DCT coefficients of size 2×2.

Figure 13. The best precision rate that can be obtained from the block DCT with differing S value.


