
Wavelet Packets-based Image Retrieval*

Alexandre H. Paquet (1), Saif Zahir (2), and Rabab K. Ward (1)

(1) Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada
(2) Computer Science and Information Systems, University of Wisconsin-RF, USA
[email protected], [email protected], [email protected]

* This research is partially supported by the Natural Sciences and Engineering Research Council of Canada.

ABSTRACT

The ability to retrieve images from databases is of great importance for a wide range of applications. In this paper, we present a new method for image identification and retrieval that enables the recovery of original images even after size-conserving rotation or flipping operations. Our method uses the correlation of wavelet packet coefficients to create image signatures. It first computes a basis signature for the original image by summing the correlation values along all frequency bands. The possible rotation/flipping cases are then used to generate additional short signatures, which are appended to the basis signature so that geometric transformations can be identified. Simulation results show that our method achieves image retrieval rates between 88.6% and 91.7%, and geometric transformation recognition of 99.57%.

1. INTRODUCTION

The rapid expansion of the Internet and the wide use of digital images have increased the need for efficient image database creation and retrieval procedures. The challenge in image retrieval is to develop methods that capture the important characteristics that make an image unique and allow its accurate identification. The practical objective is to produce a one-dimensional signal that represents an image and permits its efficient retrieval from a database. In the following section, we explain our decision to focus on wavelet transform-based techniques before briefly reviewing other related methods. We then introduce the main elements of the proposed image retrieval method and present simulation results to demonstrate its retrieval performance.

2. WAVELET ANALYSIS

Wavelet transforms are now commonly used to decompose images into frequency bands. Their main advantage over Fourier analysis is that they allow jointly finite resolution in space and frequency. The wavelet transform (WT) decomposes a signal into narrow frequency bands while keeping the basis signals space-limited [1]. This is of great importance when dealing with real signals, especially when spatial localization must be considered. Moreover, the large number of available mother wavelets gives flexibility to the analysis and allows it to be adapted to a particular application; it is also possible to design new basis functions to fulfill specific requirements. These advantages explain why the WT has attracted so much attention for a wide range of applications, including image retrieval.
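To make the joint space-frequency localization concrete, the short sketch below decomposes an image with a two-level 2-D discrete wavelet transform. The PyWavelets library and the db6 filter are illustrative assumptions made here; the paper does not prescribe any implementation in this section.

    import numpy as np
    import pywt

    image = np.random.rand(256, 256)            # stand-in for a real grayscale image

    # Two-level 2-D wavelet decomposition: each level splits the signal into a
    # low-pass approximation band and three spatially localized detail bands.
    coeffs = pywt.wavedec2(image, wavelet="db6", level=2)
    approx = coeffs[0]                           # coarse, low-frequency content
    details_level2, details_level1 = coeffs[1], coeffs[2]
    print(approx.shape)                          # small array: coarse frequencies, coarse space
    print([d.shape for d in details_level1])     # finest scale: sharp, well-localized variations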

3. PREVIOUS WORK

A great deal of research has been devoted to image retrieval, in particular to texture-, color-, shape- and content-based indexing procedures for image signature production [2-7]. Because of the many advantages of the wavelet representation, other systems use characteristics extracted from wavelet analysis. Here we concentrate on that category and review some WT-based feature extraction methods.

In [8], a single-stage technique is presented that addresses the image segmentation/classification problem at the pixel level using an energy density function based on the WT. The instantaneous energy distribution, called the Pseudo Power Signature, is used as the image signature; its effectiveness and its low computational and storage requirements are discussed.

In [9], the hidden Markov tree (HMT) modeling framework is extended to the complex wavelet transform to take advantage of its near shift-invariance and improved angular resolution. By focusing on salient signal features, the model can solve the supervised classification problem more efficiently than methods based on the traditional WT. However, the training of an HMT model for each required sample sets limits on the technique's practical application.

A hierarchical wavelet-based framework for modeling patterns in digital images is introduced in [10]. Scott and Nowak use the marginal pdfs of the significant and insignificant wavelet coefficients (WTC) to specify the joint distribution of the WTC of a linearly transformed pattern template. Using results obtained from real images, their Template Learning from Atomic Representation technique is shown to extract a low-dimensional template that captures the defining structure of the pattern while rejecting noise and background.




In [11], a salient point detector based on the WT is presented that extracts the points where variations occur in the image: large WTC at coarse resolution are found and their largest children coefficients are then tracked up to the finest scale. The authors present a retrieval experiment with Gabor features and demonstrate that their method performs better than other point detectors.

A method for representing texture information in images using the dual-tree complex wavelet transform (DT-CWT) is presented in [12]. The image texture is represented by magnitude quantization of the DT-CWT coefficients, to extract the significance of each subband, together with separate coding of the phase information. In the retrieval process, the similarity of images is defined by the Euclidean distance between the significance values of their subbands. Using images from real databases, the authors demonstrate the efficiency of their method in extracting texture features from encoded data.

All of the techniques described above use the WT to extract image signatures. Below, we present a method based on wavelet packets (WP), which leads to promising results since WP allow better identification of details in the frequency domain and produce symmetric decomposition bands.

4. WAVELET PACKETS IMAGE SIGNATURE

In this paper, we propose a new method that uses wavelet packets for image signature extraction, since they allow better frequency resolution than standard wavelets [13]. This is because, in the decomposition, not only the output of the low-pass filtered image is passed to the subsequent level, but also the outputs of the high-pass filters. This leads to narrower frequency bands at higher frequencies. In addition, WP allow much higher precision and flexibility in the selection of the bands used to extract the image signature.

From the above, one could propose to simply store the WP coefficients as the image signature. This scheme would have the advantage of a one-to-one relation between the image and the signature. However, it is not acceptable, since the number of coefficients stored would be at least as large as the number of pixels in the image. We therefore need to reduce the size of the signature while keeping a faithful image representation and signature uniqueness. Straightforward transformations cannot be used to reduce the size, since this would compromise the exclusivity of the signatures. However, since WP decompose images into symmetric frequency bands, it is possible to compute the correlation between their coefficients and use it as the signature.

This correlation value represents the relation between different frequency bands (FB) but, more importantly, allows the identification of the image's salient points, which correspond to major changes (i.e., large-valued coefficients) cascading along the frequency axis. According to the assumptions listed below and the information-theoretic analysis developed in [14], this correlation value should capture the major image characteristics and thus allow good image identification. This is also in accordance with [11], since large WP coefficients (WPC) at a lower scale lead to a large correlation value only if the matching coefficients at higher resolutions are also large.

4.1. Assumptions

1. If a particular coefficient in a low frequency band has a large magnitude, it is likely that its children coefficients (in higher FB) will be large as well [9]. In fact, large and small wavelet coefficient values cascade along the branches of the wavelet tree.
2. Images with similar mean intensity tend to have similar coefficient distributions in the low FB, since these represent slow intensity variations.
3. Different images can have similar high frequency band decompositions, which represent sharp variations.

4.2. Extraction Procedure

While the continuous wavelet transform (CWT) is rotation invariant, the discrete nature of our WP computation leads to variations in the coefficients' values when the original image is rotated or flipped. One must therefore provide robustness to these kinds of geometric transformations. We could simply imagine a scheme where an image has several signatures, one for each of the specified possible alterations. However, this would quickly increase the number of signatures, expand the required storage space, and lengthen the computation time. Alternative techniques must therefore be found that allow geometric transformation identification within a single signature. This is exactly what we propose to do, by adding short signatures to the signature of the original image so that, in the retrieval process, we can detect whether any flipping or rotation has occurred. The main steps of our image signature technique are as follows (a minimal implementation sketch is given after the list):

1. Resize the image to standard, square dimensions. Since the images are square, we only have to consider the five rotation/flipping operations that preserve the images' square size. Together with the original, this gives six cases: the original, rotated 180°, rotated 90°, flipped around the vertical axis, rotated 270° counterclockwise, and flipped around the horizontal axis. Combinations of rotation and flipping were not considered.


2. Obtain the two-dimensional WP decomposition of the image using three levels of Daubechies' 12-tap filter. We chose Daubechies' filter because, while it has finite (compact) support, it is continuous, yields better frequency resolution than the Haar wavelet, and achieves better spatial resolution than other wavelets [1]. Our choice of three levels of decomposition was motivated by the desire to keep the analysis within a reasonable computational cost. Much work remains to be done on the design or selection of wavelet functions for particular applications; this is a field of research in itself, and the effect of the wavelet function on the final image signature could be the subject of another paper. For 256x256 images, three levels of db12 yield 64 frequency bands (FB) with 2704 coefficients in each band.
3. Produce the basis signature. For each (l,m) coefficient in each of the 64 FB, we compute the correlation with the corresponding (l,m) coefficients in the other 63 bands. All the (l,m) correlation coefficients are then summed. The resulting 2704 values are normalized before an adequate threshold is applied to reject insignificant values.
4. Produce the signatures specific to the five rotated/flipped versions and the original image. For each version, the same principle as for the basis signature is used, except that the summation of the correlation is now done within each of the 64 frequency bands, resulting in 64 values.
5. Append these additional signatures to the basis signature to form the final image signature.
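A minimal sketch of steps 2-5 is given below. It relies on several assumptions not stated in the paper: the PyWavelets library is used; db6 (whose filter has 12 taps) stands in for the Daubechies 12-tap filter; the cross-band correlation is read as a sum of pairwise coefficient products at each spatial position; and 0.1 is only a placeholder threshold. Band sizes depend on the boundary handling, so they will generally differ from the 2704 coefficients quoted above.

    import numpy as np
    import pywt

    def wp_leaf_bands(image, wavelet="db6", level=3):
        # Level-3 2-D wavelet packet decomposition: 4**3 = 64 leaf frequency bands.
        wp = pywt.WaveletPacket2D(data=image, wavelet=wavelet,
                                  mode="periodization", maxlevel=level)
        return np.stack([node.data for node in wp.get_level(level, order="natural")])

    def basis_signature(bands, threshold=0.1):
        # For every spatial position, sum the products of coefficients over all
        # pairs of bands, then normalize and reject insignificant values.
        s = bands.sum(axis=0)
        q = (bands ** 2).sum(axis=0)
        corr = 0.5 * (s ** 2 - q)                    # sum over all band pairs
        corr = corr / (np.abs(corr).max() + 1e-12)   # normalize
        corr[np.abs(corr) < threshold] = 0.0         # assumed threshold
        return corr.ravel()

    def per_band_signature(bands):
        # 64 values: for each band, the summed correlation with the other bands.
        s = bands.sum(axis=0)
        return np.array([(b * (s - b)).sum() for b in bands])

    def image_signature(image):
        # Basis signature of the original plus one short signature for the
        # original and each of the five square-preserving rotations/flips.
        img = np.asarray(image, dtype=float)         # step 1 (resizing) omitted here
        versions = [img, np.rot90(img, 2), np.rot90(img, 1), np.fliplr(img),
                    np.rot90(img, 3), np.flipud(img)]
        basis = basis_signature(wp_leaf_bands(img))
        shorts = [per_band_signature(wp_leaf_bands(v)) for v in versions]
        return np.concatenate([basis] + shorts)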

4.3. Retrieval Process

The image retrieval process is straightforward and similar to others proposed in the literature (see [11]); a minimal matching sketch follows the list:

1. Resize the available image, if necessary, and calculate its signature. Then compute the correlation between its basis part and the corresponding portion of each signature in the image bank.
2. Use the maximum of the resulting correlation values to decide whether the signature belongs to an image in the bank.
3. If it does, compare each portion of the remaining signature with the same portion of the stored signature. This allows us to decide whether the identified image has been geometrically tampered with and, if so, by which transformation.
4. Extract the image from the bank and apply the appropriate geometric transformation to obtain the image we are looking for.
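One possible implementation of this matching is sketched below. The normalized-correlation measure, the acceptance threshold of 0.9, and the rule for reading the transformation off the short signatures are illustrative assumptions; the paper does not fix these details.

    import numpy as np

    TRANSFORMS = ["original", "rotated 180", "rotated 90", "flipped (vertical axis)",
                  "rotated 270", "flipped (horizontal axis)"]

    def normalized_corr(a, b):
        # Normalized correlation between two 1-D signature vectors.
        a = a - a.mean()
        b = b - b.mean()
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def retrieve(query_sig, bank, basis_len, accept=0.9):
        # bank: list of (image_id, signature) pairs; each signature is laid out as
        # [basis part | six short signatures of 64 values each].
        q_basis = query_sig[:basis_len]
        scores = [normalized_corr(q_basis, sig[:basis_len]) for _, sig in bank]
        best = int(np.argmax(scores))
        if scores[best] < accept:                    # assumed acceptance threshold
            return None                              # no matching image in the bank
        image_id, stored = bank[best]
        # Match the query's own short signature against the six stored short
        # signatures to identify which rotation/flip, if any, was applied.
        q_short = query_sig[basis_len:basis_len + 64]
        stored_shorts = stored[basis_len:].reshape(6, 64)
        t = int(np.argmax([normalized_corr(q_short, s) for s in stored_shorts]))
        return image_id, TRANSFORMS[t]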

5. SIMULATION RESULTS

To verify the efficiency of our method, we created three banks of image signatures. Each bank is composed of 1500 signatures, of which 100 belong to real images; the other 1400 signatures were generated to have the same statistical properties (a uniform distribution of the same amplitude) as the signatures of that bank's real images. The real images were taken from different sources: the first set consists of miscellaneous images from the public domain; the second set consists of portrait photographs obtained from the U.K. AT&T Laboratories [15] (see Figure 2); and the third set consists of low-cost capacitive-sensor images from the Fingerprint Verification Competition 2000. Once the banks were created, as they would be for an image archiving application, our image retrieval technique was tested on all images for the different rotation/flipping cases. The results are presented in Table 1.

Figure 1 Cases considered and resultant signatures

Table 1 Retrieval rates obtained

    Image Set        Image retrieval rate    Position retrieval rate
    Miscellaneous    88.60 %                 100 %
    Faces            88.79 %                 100 %
    Fingerprints     91.76 %                 98.72 %

The results clearly demonstrate the ability of our system to retrieve images from a database using image signatures. At first, it is surprising to find a better identification rate for the third set, where the images are more similar to one another. Upon inspection, we find that this is because the inter-FB correlation within those images is generally of higher magnitude. This leads to a more defined signature (more non-zero points), hence more differences between signatures and therefore better identification of images. On average, our system has a recognition rate of 89.72% and a 99.57% ability to determine the applied geometric transform, while keeping the number of false detections close to zero and the computational cost far below what is needed for retrieval by direct image comparison.

Figure 2 Face images and computed signatures

6. CONCLUSIONS

We have introduced a novel approach to image identification and retrieval using wavelet packets. It computes the correlation of WP coefficients along frequency bands to retrieve images from large databases. By storing a single signature per image, our method also identifies rotation or flipping operations performed on copies of stored images. With image retrieval rates close to 90% and transformation retrieval rates approaching 100%, we have demonstrated that our system is suitable for image identification and retrieval in the context of image databases. Our results could be further enhanced by adding a filtering operation (see [2]) prior to the correlation computation, to allow quick pruning of irrelevant images, thereby speeding up the retrieval process and improving the retrieval rates.

7. REFERENCES

[1] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Prentice Hall Signal Processing Series, 1995.
[2] T. Seng Chua, K.L. Tan and B. Chin Ooi, "Fast Signature-based Color-Spatial Image Retrieval", IEEE International Conference on Multimedia Computing and Systems, pp. 362-369, 1997.
[3] K. Schroder and P. Laurent, "Efficient Polygon Approximations for Shape Signatures", International Conference on Image Processing, Vol. 2, pp. 811-814, 1999.
[4] G. Nicchiotti and R. Ottaviani, "A Simple Rotation Invariant Shape Signature", IEEE International Conference on Image Processing and its Applications, Vol. 2, pp. 722-726, 1999.
[5] M. Kliot and E. Rivlin, "Invariant-based Data Model for Image Databases", IEEE International Conference on Image Processing, Vol. 2, pp. 803-807, 1998.
[6] L. Wenyin, T. Wang and H. Zhang, "A Hierarchical Characterization Scheme for Image Retrieval", IEEE International Conference on Image Processing, Vol. 3, pp. 42-45, 2000.
[7] A. Vailaya, M.A.T. Figueiredo, A.K. Jain and H.-J. Zhang, "Image Classification for Content-Based Indexing", IEEE Trans. on Image Processing, Vol. 10, No. 1, pp. 117-129, January 2001.
[8] V. Venkatachalam, "Image Classification using Pseudo Power Signatures", IEEE International Conference on Image Processing, Vol. 1, pp. 796-799, 2000.
[9] J. Romberg, H. Choi, R. Baraniuk and N. Kingsbury, "Multiscale Classification using Complex Wavelets and Hidden Markov Tree Models", IEEE International Conference on Image Processing, Vol. 2, pp. 371-374, 2000.
[10] C. Scott and R. Nowak, "Pattern Extraction and Synthesis using a Hierarchical Wavelet-based Framework", IEEE International Conference on Image Processing, Vol. 2, pp. 383-386, 2000.
[11] E. Loupias, N. Sebe, S. Bres and J.-M. Jolion, "Wavelet-based Salient Points for Image Retrieval", IEEE International Conference on Image Processing, Vol. 2, pp. 518-521, 2000.
[12] S. Hatipoglu, S.K. Mitra and N. Kingsbury, "Image Texture Description using Complex Wavelet Transform", IEEE International Conference on Image Processing, Vol. 2, pp. 530-533, 2000.
[13] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, 1992.
[14] J. Liu and P. Moulin, "Analysis of Interscale and Intrascale Dependencies between Image Wavelet Coefficients", IEEE International Conference on Image Processing, Vol. 1, pp. 669-672, 2000.
[15] F. Samaria and A. Harter, "Parameterization of a Stochastic Model for Human Face Identification", 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, FL, December 1994.
