IMAGE CORRELOGRAM IN IMAGE DATABASE INDEXING AND RETRIEVAL

IIVARI KUNTTU, LEENA LEPISTÖ, AND ARI VISA
Tampere University of Technology, Institute of Signal Processing
P. O. Box 553, FIN-33101 Tampere, Finland
E-mail: [email protected]

JUHANI RAUHAMAA
ABB Oy, Paper, Printing, Metals & Minerals
P. O. Box 94, FIN-00381 Helsinki, Finland
E-mail: [email protected]

In this paper we present a new approach to image indexing and retrieval based on the image correlogram. We show that, in comparison with commonly used approaches, i.e. the image histogram and the autocorrelogram, the image correlogram gives significantly better results in image retrieval. These results are achievable without increasing the computational cost of image indexing or retrieval. In the experimental part of this paper, the retrieval performance of the image correlogram is compared to that of the image autocorrelogram and the image histogram.
1. Introduction
Image color (or gray level), in addition to texture and shape, is an essential feature in image retrieval. The image histogram is a computationally light method for image indexing, but it ignores the spatial organization of the pixels in the image. Therefore, second order statistics have been adopted for image retrieval. The most common second order statistical measures used in image retrieval are based on the correlation function between the image pixels. The image correlogram [2] describes the correlation of the image colors as a function of their spatial distance. Because of its computational lightness, Huang et al. [2] preferred the autocorrelogram to the correlogram in image indexing. The autocorrelogram is a subset of the correlogram, and it gives the probability of finding identical colors at a certain distance. For computational reasons, Ojala et al. [3] also chose the autocorrelogram for image indexing.

In this paper, the correlogram-based image indexing problem is revisited. Unlike previous studies in this field, we consider the correlogram instead of the autocorrelogram. This approach gives clearly better retrieval results than the autocorrelogram. We show that these results are achievable without increasing the computational cost of image indexing or retrieval.
2. Statistical Methods in Image Indexing
The image histogram is an example of a first order statistical measure. It can be defined in the following way [3]. Let $I$ be an image that consists of pixels $p(x,y)$. Each pixel has a certain color or gray level (henceforth level). Let $[G]$ be a set of $G$ levels $g_1, \ldots, g_G$ that can occur in the image. For a pixel $p$, let $I(p)$ denote its level $g$, and let $I_g$ denote the set of pixels $p$ for which $I(p) = g$. Then the histogram for level $g_i$ is defined as

$$h_{g_i}(I) \equiv \Pr_{p \in I}\left[\, p \in I_{g_i} \,\right] \tag{1}$$
which corresponds to the probability that any pixel in $I$ is of level $g_i$. Hence, the histogram describes the distribution of the pixel levels in the image while ignoring their spatial organization.

The second order statistical measures used in image retrieval are the correlogram and the autocorrelogram. The correlogram is defined as follows [2],[3]. Let $[D]$ denote a set of $D$ fixed distances $\{d_1, \ldots, d_D\}$. Then the correlogram of the image $I$ is defined for a level pair $(g_i, g_j)$ at a distance $d$ as

$$\gamma^{(d)}_{g_i, g_j}(I) \equiv \Pr_{p_1 \in I_{g_i},\, p_2 \in I}\left[\, p_2 \in I_{g_j} \;\middle|\; |p_1 - p_2| = d \,\right] \tag{2}$$
which gives the probability that, given any pixel $p_1$ of level $g_i$, a pixel $p_2$ at a distance $d$ in a certain direction from $p_1$ is of level $g_j$. The autocorrelogram [2],[3] captures the spatial correlation of identical levels only:

$$\alpha^{(d)}_{g}(I) = \gamma^{(d)}_{g, g}(I) \tag{3}$$

It gives the probability that pixels $p_1$ and $p_2$, a distance $d$ away from each other, are of the same level $g$. The distance measure between histograms, autocorrelograms, and correlograms is the L1 norm, which is computationally light and is used in [2] and [3].

2.1. Computationally Efficient Approach to the Use of Correlograms in Image Retrieval

The computation time required in image retrieval increases with the length of the feature vector used in image database indexing (table 1). In [5], Smith and Chang extract regions of similar color (or gray level) from images by requantizing the color space of the images. In our approach, we use this principle to decrease the number of levels $G$ in the images. This is done by requantizing the images before computing the correlograms. This method gives two important benefits: 1) the decrease of $G$ lightens the computational cost, and 2) quantization of the image into sets of similar levels generalizes the image content, which improves the retrieval results.
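The definitions of this section can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments (those were run in Matlab), and it rests on several assumptions that the text does not specify: images are lists of rows of integer gray levels, the requantization is uniform, and the "certain direction" of the correlogram is taken to be the horizontal direction to the right. All function names are hypothetical.

```python
def requantize(image, levels_in=256, levels_out=16):
    """Uniformly map gray levels in [0, levels_in) to [0, levels_out).

    Assumption: uniform bins; the quantization scheme is not given in the text.
    """
    step = levels_in // levels_out
    return [[min(v // step, levels_out - 1) for v in row] for row in image]


def correlogram(image, num_levels, distances=(1, 3, 5, 7)):
    """Flat feature vector with num_levels**2 * len(distances) entries.

    The entry for (d, gi, gj) estimates the probability that the pixel d
    steps to the right of a pixel of level gi has level gj.
    Assumption: only the rightward horizontal direction is scanned.
    """
    width = len(image[0])
    features = []
    for d in distances:
        # pair_counts[gi][gj] counts pixel pairs (gi, gj) at distance d.
        pair_counts = [[0] * num_levels for _ in range(num_levels)]
        for row in image:
            for x in range(width - d):
                pair_counts[row[x]][row[x + d]] += 1
        for gi in range(num_levels):
            total = sum(pair_counts[gi])
            for gj in range(num_levels):
                # Levels that never occur as p1 get probability 0.
                features.append(pair_counts[gi][gj] / total if total else 0.0)
    return features


def autocorrelogram(image, num_levels, distances=(1, 3, 5, 7)):
    """Diagonal entries (gi == gj) of the correlogram, as in equation (3)."""
    full = correlogram(image, num_levels, distances)
    n = num_levels
    return [full[k * n * n + g * n + g]
            for k in range(len(distances)) for g in range(n)]


def l1_distance(a, b):
    """L1 norm of the difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

With 16 levels and the four distances {1,3,5,7}, `correlogram` returns 16*16*4 = 1024 values and `autocorrelogram` returns 16*4 = 64 values, which matches the vector lengths listed in table 1.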
Figure 1. An example image of each defect class.
3. Case Study: Paper Defect Image Database
We tested our method on a set of 1308 paper defect images, which were taken from a paper web using a paper inspection system [4]. The test set consisted of 14 defect classes (figure 1). Correlograms were computed for the images quantized to 32 and 16 gray levels. Autocorrelograms were calculated for 256 and 16 level images. The set of distances [D] was {1,3,5,7}, the same as in [2],[3]. For comparison, 256-level histograms were also calculated. The retrieval experiments were made using the leave-one-out method, and the retrieval performance was measured by calculating an average precision/recall curve [1] for each of the queries (figure 2). Table 1 shows the computational characteristics of each method.

Table 1. Computational characteristics of the features. The computation was made using Matlab on a PC with an 804 MHz Pentium III CPU and 256 MB of primary memory.

Feature               Vector length     Indexing time   Retrieval time
256-histogram         256               209 s           65 s
256-autocorrelogram   256*4 = 1024      749 s           100 s
16-autocorrelogram    16*4 = 64         748 s           39 s
32-correlogram        32^2*4 = 4096     363 s           251 s
16-correlogram        16^2*4 = 1024     363 s           100 s
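The leave-one-out retrieval protocol described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the evaluation code of the experiments: each image in turn is used as the query, all remaining images are ranked by L1 distance, and images of the same defect class as the query are counted as relevant. The function names are hypothetical, and the way the per-query curves are averaged for figure 2 is not specified in the text, so it is left out.

```python
def l1_distance(a, b):
    """L1 norm of the difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_others(query_index, features):
    """Indices of all images except the query, nearest first (L1 norm)."""
    query = features[query_index]
    others = [i for i in range(len(features)) if i != query_index]
    return sorted(others, key=lambda i: l1_distance(query, features[i]))


def precision_recall(ranked_labels, query_label):
    """List of (recall, precision) pairs, one after each retrieved image.

    Assumption: an image is relevant if it shares the query's class label.
    """
    relevant_total = sum(1 for label in ranked_labels if label == query_label)
    points, hits = [], 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
        recall = hits / relevant_total if relevant_total else 0.0
        points.append((recall, hits / rank))
    return points


def leave_one_out_curves(features, labels):
    """One precision/recall curve per query image."""
    curves = []
    for q in range(len(features)):
        ranked = rank_others(q, features)
        curves.append(precision_recall([labels[i] for i in ranked], labels[q]))
    return curves
```

Here `features` would hold one feature vector per image (histogram, autocorrelogram, or correlogram) and `labels` the corresponding defect classes.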
Figure 2. Average retrieval performance of the correlograms, autocorrelograms and histograms.
4. Results and Discussion
In this paper we combined the calculation of correlograms with the generalization of the image content. We claimed that with this solution the retrieval accuracy can be better than in autocorrelogram-based image retrieval [2],[3] without increasing the computational cost. The results of the experiments presented in figure 2 show that the correlogram gives clearly better results in image retrieval than the histogram or the autocorrelogram. Moreover, the best feature, the 16-level correlogram, has the same feature vector length and retrieval time as the 256-level autocorrelogram, and a shorter indexing time (table 1). In addition, generalization of the image content using image level quantization yields better retrieval accuracy. This can be seen in figure 2 by comparing the 16-level correlogram to the 32-level correlogram or to the 16-, 32-, and 256-level autocorrelograms.

In conclusion, the results show that the correlogram is an effective tool for describing image content in retrieval. In contrast to the histogram, the correlogram also considers the spatial relationships between the image levels. Our results show that the correlogram also gives significantly better results than the autocorrelogram.

Acknowledgments

The authors wish to thank the Technology Development Centre of Finland (TEKES's grant 40397/01) for financial support.

References

1. R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, ACM Press, Addison-Wesley, New York (1999).
2. J. Huang, S. R. Kumar, M. Mitra, W.-J. Zhu, R. Zabih, Image Indexing Using Color Correlograms, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico (1997).
3. T. Ojala, M. Rautiainen, E. Matinmikko, M. Aittola, Semantic Image Retrieval with HSV Correlograms, Proceedings of 12th Scandinavian Conference on Image Analysis, Bergen, Norway (2001).
4. J. Rauhamaa, R. Reinius, Paper Web Imaging with Advanced Defect Classification, Proceedings of the 2002 TAPPI Technology Summit, Atlanta, Georgia (2002).
5. J. R. Smith, S. F. Chang, Tools and Techniques for Color Image Retrieval, SPIE Proceedings, Vol. 2670 (1996).