MULTI-FEATURE CONTENT-BASED IMAGE RETRIEVAL
PUNPITI PIAMSA-NGA, NIKITAS A. ALEXANDRIDIS, SANAN SRAKAEW, and GEORGE BLANKENSHIP
EECS, GWU, Washington DC, 20052 USA
G. PAPAKONSTANTINOU, P. TSANAKAS, and S. TZAFESTAS
EECS, National Tech. Univ. of Athens, Athens Greece

ABSTRACT In this paper, we propose a model for the fast retrieval of image data using multiple features based on multiresolution processing. We demonstrate our performance results by searching an image database of approximately 30,000 24-bit color photographic images. We use identical structures to build the index keys from the two features, using histogram generation and the wavelet transform. The experiments show that the results obtained with the quad-tree structure for the two features are more accurate than those obtained using just one feature; however, the retrieval time almost doubles. The retrieval time can be improved by applying our weighted cascading comparison scheme, which allows the weighting of the features to control a pipeline. We evaluate our multiple-feature similarity-matching algorithm by comparing its accuracy with that of a regular-matching algorithm. The results also show that our weighted cascading scheme retrieves faster than the regular-matching algorithm, with acceptable accuracy.

1. INTRODUCTION Current automatic content-based image retrieval systems use different features of interest as keys for indexing and searching the data [1]. The number of content types that define the identifiable features is extremely diverse, so a type-dependent solution using specific features may not be appropriate in all cases. A generalized model that unifies as many types of features as possible overcomes this problem by applying the same computing algorithms to similar data structures. The k-tree was introduced in [2] as a unified structure for multimedia data. It allows a single data structure and processing algorithm to be used for multiple types of data content. The k-tree is a balanced directed graph in which each node has $2^k$ incoming edges and one outgoing edge. The tree expresses the hierarchical relationships of data between adjacent levels. The k-tree is used to store k-dimensional data [3]. For two-dimensional image data, k is two, and the resulting k-tree, in which each node has four children, is known as a quad tree.

Content-based image retrieval is done by comparing features extracted from the query with features extracted from every record in the database. We use a quad tree to store the features. The algorithm gains efficiency by using the globally summarized information at the root of the tree to exclude obvious non-matches; the more detailed levels of the tree are used only when a more precise comparison is needed. Since all types of data, including their associated data, can be represented with the same quad-tree data structure, data indexing and retrieval are uniform. Image data is composed of many features, such as color and texture, and each node of the quad tree shares the same structure to hold information about many different features. Similarity searching is a two-step algorithm. The first step finds the distance between the query and each record in the database. The second step sorts the distances. The distance computation takes into account all of the features that were used to index the database, and the best results are the database records with the lowest distances to the query. The comparison time grows with the number of features that form the basis of the distance computation. Cascaded processing of the features offers some relief from this explosion of computation time: the cascade uses the ordering produced by one feature as the basis for the next feature, so each feature acts as a filter for the following features. We introduce a weighted cascading scheme as a compromise between retrieval time and search accuracy. The weighted cascade uses the existing distances, which were calculated in the prior steps of the computation, to evaluate the result, rather than using only the weight of the current step. In this paper, we propose a model for the fast retrieval of image data using multiple features based upon multiresolution processing.
We demonstrate our performance results using an image database. We use identical structures to build the two index keys of color and texture. The experiments show that inquiries using one or both features are more accurate with the quad-tree structure, while the retrieval time is only slightly longer. The retrieval time, however, can be improved by applying our weighted cascading scheme, which allows the importance of the multiple features to be weighted. We evaluate the multi-feature similarity matching by comparing its accuracy against a regular-matching scheme. The results show that the retrieval time of our weighted cascading scheme is shorter than that of conventional schemes such as the regular-matching scheme.
2. THE UNIFIED K-TREE MODEL The k-tree was introduced in [2] as a unified structure for multimedia data. It allows the user to apply the same data structures and processing to diverse kinds of data content. A k-tree has three main benefits. First, the k-tree holds the features in the tree structure itself. This reduces the computation time needed to find the distance between two tree nodes; distance computation is the fundamental computational component of a comparison. Second, a k-tree accelerates multiresolution processing by calculating small, global information first and then large, local information when precise resolution is needed. Third, the data of the k-tree is unified, since only the degree of the tree changes while the processing algorithm and data structure remain invariant. Therefore, an algorithm for a particular type of feature can be reused for a feature of another media type. The structure of a k-tree is the basis for multiresolution processing. Multiresolution processing uses global features first; more detailed data is used when more precise identification is required. The reduction in computation time depends on the ability of the global data to identify unsuitable targets. The higher levels of the k-tree contain more global features but lack spatio-temporal information; the data on the lower levels is more local, but its size is comparatively large.
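As an illustration of this coarse-to-fine filtering, the following Python sketch ranks database records one level at a time and discards all but the closest candidates before the next, more detailed level is consulted. It is a minimal sketch under stated assumptions, not the authors' implementation: each record is a hypothetical list of per-level feature vectors (level 0 being the global summary at the root), the per-level distance is an assumed L1 distance (sum of absolute differences), and the number of candidates kept at each level (`keep`) is a hypothetical parameter.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: levels[0] is the global (root) summary and each later
# entry holds the finer, more local data of the next level, flattened to a list.
Levels = List[List[float]]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed per-level distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def coarse_to_fine(query: Levels, database: Dict[str, Levels], keep: Sequence[int]) -> List[str]:
    """Rank candidates one level at a time, keeping only keep[level] of them
    before the next (larger, more local) level is compared."""
    candidates = list(database)
    for level, limit in enumerate(keep):
        # Compare only the surviving candidates at this level of detail.
        candidates.sort(key=lambda rid: l1(query[level], database[rid][level]))
        candidates = candidates[:limit]
    return candidates
```

Only the records that survive the cheap global comparison pay for the larger, more local comparisons, which is where the savings of multiresolution processing come from.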
FIGURE 1: FEATURE EXTRACTION TO A K-TREE (a data stream is divided into blocked data; each block yields a feature at level 0, and features are summarized upward through level n-1 to level n)
The salient features of the multimedia data are extracted to create the leaves of the k-tree. The features are combined to create more generalized content at the next upper level of the tree; the most global information is stored at the root. Figure 1 shows the feature extraction of a one-dimensional data stream into a binary tree structure. The data is divided into blocks, and each block is transformed into a feature, which is stored in a leaf of the tree. The information stored in the leaves is summarized in the next higher level, and so on up to the root.
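The bottom-up construction of Figure 1 can be sketched for the one-dimensional case as follows. For illustration only, the sketch assumes that each block's feature is a histogram of its values (a `collections.Counter`) and that the number of blocks is a power of two, so the binary tree is balanced; a parent's feature is the sum of its two children's histograms. The function name is hypothetical.

```python
from collections import Counter
from typing import List


def build_binary_feature_tree(stream: List[int], block_size: int) -> List[List[Counter]]:
    """Return the tree level by level, leaves first; levels[-1][0] is the root."""
    blocks = [stream[i:i + block_size] for i in range(0, len(stream), block_size)]
    n = len(blocks)
    if n == 0 or n & (n - 1):
        # Assumption: a balanced binary tree needs a power-of-two number of leaves.
        raise ValueError("the number of blocks must be a power of two")
    level = [Counter(block) for block in blocks]  # leaf feature: histogram of one block
    levels = [level]
    while len(level) > 1:
        # Each parent summarizes its two children by adding their histograms.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels
```

Because a parent simply adds its children's counts, the root holds the histogram of the whole stream, which is the most global summary used first during retrieval.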
3. HISTOGRAM-BASED FEATURES In the experiments performed, we use a histogram to define a feature; the histogram values serve as the index of the images. In this paper, we examine two types of histograms: color and texture. The color histogram is constructed by counting the number of pixels of each color in a particular area, and the k-tree is a quad tree of color histograms. The texture histogram is constructed by assigning a texture index to each area of the image. Each index corresponds to a 14-dimensional vector of means and variances generated by a wavelet sub-band decomposition (2 iterations, 7 subbands). The database contains approximately 30,000 24-bit photographic images from the Smithsonian Online Collection of Photographs [3], the Kodak homepage [4], and several commercial CD-ROMs.
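The assignment of texture indices can be illustrated with a nearest-reference lookup. In this minimal sketch, the known texture table is a hypothetical dictionary from texture ID to reference vector, and the comparison uses Euclidean distance (`math.dist`, Python 3.8+); the paper does not state which distance is used, so this choice is an assumption. In the paper's setting each vector would have 14 elements, a mean and a variance for each of seven subbands.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def nearest_texture_id(vector: Sequence[float], table: Dict[int, Sequence[float]]) -> int:
    """Return the ID of the reference texture closest to vector.
    Assumption: Euclidean distance; ties go to the first ID in the table."""
    return min(table, key=lambda tid: math.dist(vector, table[tid]))


def texture_histogram(block_vectors: List[Sequence[float]],
                      table: Dict[int, Sequence[float]]) -> Counter:
    """Label every block with a texture ID and count how often each ID occurs."""
    return Counter(nearest_texture_id(v, table) for v in block_vectors)
```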
FIGURE 2: GENERATING THE HISTOGRAM-BASED FEATURE QUAD-TREE OF AN IMAGE (original image → scaling → normalized 128x128 image; color path: quantization, median filtering, quad-tree construction; texture path: wavelet decomposition, texture identification against the known texture table, texture-ID matrix, quad-tree construction)
Feature extraction from the images is depicted in Figure 2. Before feature extraction begins, all pictures are normalized by scaling them down to 128x128 pixels. Color feature extraction is performed in three steps. The first step maps the colors of the scaled image onto Smith's color set [5]. The second step applies a median filter with a 5x5 window. The third and final step stores the transformed image in a quad-tree structure. Texture feature extraction also requires three steps. The first step transforms the 64 blocks of 16x16 pixels into 64 sets of wavelet data using a Quadrature Mirror Filter (QMF) (2 iterations, 7 subbands) [6]. Each set of wavelet data produces seven subbands, each summarized by a mean and a variance, i.e., a 14-element vector. In the second step, the texture vectors are compared to the reference VisTex textures [7] in the known texture table to generate 64 texture indices representing the textures of the blocks. The third and final step constructs and stores the texture features in a quad-tree structure. The steps of quad-tree generation are the same for both features. The transformed color images and texture-ID matrices are mapped onto the leaves of a quad-tree structure, where each leaf represents a single pixel of the normalized image. The histograms of leaves that share the same parent node are summed, and the results are stored at the parent node. The process continues iteratively, level by level, until the root has been reached.
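The quad-tree generation step shared by the color and texture features can be sketched as follows. The input is a square matrix of labels (quantized color indices or texture IDs) whose side is a power of two; each leaf holds the one-element histogram of a single cell, and each parent holds the sum of its four children's histograms. The function name and the list-of-levels layout are illustrative assumptions, not the authors' data structure.

```python
from collections import Counter
from typing import List


def build_quad_tree(labels: List[List[int]]) -> List[List[List[Counter]]]:
    """Return histogram levels, leaves first; levels[-1][0][0] is the root."""
    size = len(labels)
    if size == 0 or size & (size - 1) or any(len(row) != size for row in labels):
        raise ValueError("labels must be a square matrix with a power-of-two side")
    # Leaves: the histogram of a single cell has one entry with count 1.
    level = [[Counter({label: 1}) for label in row] for row in labels]
    levels = [level]
    while len(level) > 1:
        half = len(level) // 2
        # Each parent sums the histograms of its four children (a 2x2 block).
        level = [[level[2 * r][2 * c] + level[2 * r][2 * c + 1]
                  + level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]
                  for c in range(half)]
                 for r in range(half)]
        levels.append(level)
    return levels
```

For a 128x128 normalized image this produces eight levels (128, 64, 32, 16, 8, 4, 2, and 1 cells per side), with the root holding the color histogram of the whole image.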
4. MULTIPLE-FEATURE RETRIEVAL ALGORITHM COMPARISON The search for an item in a multimedia database uses unique features as the key index. Exact-key-matching database systems are inefficient and inappropriate for multimedia data: the subjective content of the keys makes the exact-matching approach unsuitable, and similarity searching is a more appropriate approach. Just as in an exact-matching approach, the crucial issues are the construction of the index table and the retrieval scheme. A multimedia index entry should contain the salient features that have been extracted from the raw data, together with their spatio-temporal relationships, as a single entry. The usefulness of an index entry is a function of the computation time used to extract the features, the space required to store the entry, and the ease with which the entry identifies the data. The basis of the algorithm is a function that measures the distance between the target and each of the multimedia objects.
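A distance-based similarity search of this kind can be sketched as a regular weighted comparison: the distance of every feature is computed for every record, the distances are combined as a weighted sum whose weights add up to one, and the records are sorted by that total. In this minimal sketch the features are hypothetical flat vectors (for example, flattened histograms) and each per-feature distance is an assumed L1 distance; the paper does not prescribe these choices, and the function names are illustrative.

```python
from typing import Dict, List, Sequence

Features = List[Sequence[float]]  # hypothetical: one vector per feature, e.g. a flattened histogram


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed per-feature distance d_i: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_distance(record: Features, query: Features, weights: Sequence[float]) -> float:
    """d(A, Q) = sum over i of w_i * d_i(a_i, q_i)."""
    return sum(w * l1(a, q) for w, a, q in zip(weights, record, query))


def regular_search(query: Features, database: Dict[str, Features],
                   weights: Sequence[float], k: int) -> List[str]:
    """Step 1: compute every record's total distance. Step 2: sort and keep the k closest."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ranked = sorted(database, key=lambda rid: total_distance(database[rid], query, weights))
    return ranked[:k]
```

Every feature of every record is evaluated, so the cost grows linearly with both the database size and the number of features.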
4.1 REGULAR WEIGHTED COMPARISON We can generalize the distance between two objects as follows. $A = a_1, a_2, \ldots, a_n$ is the feature list that defines an entity in a multimedia database, and $Q = q_1, q_2, \ldots, q_n$ is the list of features defining a query. $d_i(a_i, q_i)$ is the distance between the $i$th feature of $A$ and that of $Q$, and $w_i$ is the weight of distance $d_i(a_i, q_i)$. The weights rank the features and are normalized so that $\sum_{i=1}^{n} w_i = 1$. The total distance is given by
$$d(A, Q) = \sum_{i=1}^{n} w_i \, d_i(a_i, q_i).$$
The distance computation algorithm is depicted in Figure 3. Regular weighted comparison is done by calculating the distances of all features and then weighting and summing them into a total distance each time the query is compared with a record in the database.
FIGURE 3: MULTIFEATURE COMPARISON (stages: feature extraction, finding distance, weighing distances; features 1 to n of the query and of data x give distances d_1(q_1, x_1) to d_n(q_n, x_n), which are multiplied by w_1 to w_n and summed into d(q, x))
4.2 CASCADED WEIGHTED COMPARISON Cascading a weighted comparison is based on the concept that each feature is compared in sequence and that each feature has a relative importance. The ranking created by one feature determines the input for the next feature comparison. A weighting function is used to reflect the importance of the features, and the features are processed in decreasing order of weight. The features with a lower weight are compared only on a subset of the database determined by the higher-weighted features: the portion of the database with the lowest distances to the query according to the features already processed. Figure 4 shows the diagram of the weighted cascading comparison. We also generalize the distance as follows, numbering the features in the order in which they are processed. Suppose $d(A, Q) = D_{1..n}(A, Q)$ is the combined distance over features 1 to $n$. The cascading distance is defined by
$$D_{1..n}(A, Q) = w_n \, d_n(a_n, q_n) + D_{1..(n-1)}(A, Q),$$
where $D_{1..0}(A, Q) = 0$. Unrolling the recursion gives $D_{1..n}(A, Q) = \sum_{i=1}^{n} w_i \, d_i(a_i, q_i)$, the same total distance as in the regular comparison; the cascade saves time because the lower-weighted terms are evaluated only for the records that survive the earlier stages.
FIGURE 4: WEIGHTED CASCADING COMPARISON (list of all data in database → ranking by feature 1 → list of ranked results by feature 1 → ranking by features 1,2 → list of ranked results by features 1,2 → … → ranking by features 1…n → list of first k ranked results, the final result)
5. THE EXPERIMENTAL RESULTS
The experimental research was concerned primarily with the perceptual quality of the results, the retrieval time, and the accuracy of the candidate targets. Table 1 shows the retrieval time and accuracy of the query results using color only, texture only, and color and texture together. The table shows that two-feature searching yields more acceptable results than single-feature searching; with regular matching, however, its retrieval time is nearly double. Figure 5¹ shows examples of the results of a query using quad trees of multiple-feature histograms. The query yields better results than a one-feature query, where better results are those with more similarities in color and texture.
¹ The high-resolution, color version is also available at http://www.seas.gwu.edu/student/punpiti/ref/cgim98-5.tif
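The weighted cascading comparison evaluated in these experiments can be sketched in the same style. Features are processed in decreasing order of weight; at each stage the surviving records accumulate the weighted distance of the current feature, are re-ranked by that cumulative distance, and only the closest fraction of the database is passed to the next stage. The `keep_fraction` parameter plays the role of the percentage of input retained from the first phase in Table 2. Assumptions: L1 per-feature distances, a hypothetical dictionary-of-records layout, and never retaining fewer than `k` records, a safeguard the paper does not mention. The hypothetical helper `acceptable_results` counts how many records of a result list also appear in a baseline list, mirroring the top-twenty comparison against the regular-matching baseline.

```python
import math
from typing import Dict, List, Sequence

Features = List[Sequence[float]]  # hypothetical: one vector per feature


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed per-feature distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cascaded_search(query: Features, database: Dict[str, Features],
                    weights: Sequence[float], keep_fraction: float, k: int) -> List[str]:
    """Weighted cascading comparison: process features in decreasing weight,
    ranking the survivors by the cumulative distance D_{1..j} after each stage."""
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    candidates = list(database)
    cumulative = {rid: 0.0 for rid in candidates}  # D_{1..0} = 0
    for stage, i in enumerate(order):
        for rid in candidates:
            cumulative[rid] += weights[i] * l1(database[rid][i], query[i])
        candidates.sort(key=lambda rid: cumulative[rid])
        if stage < len(order) - 1:
            # Only the closest fraction of the database reaches the next feature;
            # keeping at least k records is an assumption, not from the paper.
            keep = max(k, math.ceil(keep_fraction * len(database)))
            candidates = candidates[:keep]
    return candidates[:k]


def acceptable_results(results: List[str], baseline: List[str]) -> int:
    """Count how many returned records also appear in the baseline's list."""
    return len(set(results) & set(baseline))
```

With `keep_fraction=1.0` every record reaches the last stage and the ranking equals that of the regular comparison; smaller fractions skip second-feature comparisons at the risk of pruning a record that the full weighted sum would have ranked highly.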
Moreover, the weighted cascading method also shows good results in both human perception and retrieval time. Table 2 shows the results when we change the number of inputs to the second-feature comparison, which are the outputs of the first. It can be seen that a large number of second-feature comparisons can be eliminated while the accuracy is maintained.

The regular algorithm searches using both color and texture are considered the baseline with the most accurate results, and all other algorithms were compared to this baseline. The top twenty images from each algorithm, that is, the twenty images with the smallest distance from the query, were compared to the baseline's top twenty. The color-only algorithm found twelve of the top twenty; the texture-only algorithm found only two. While the baseline is considered the most accurate, its retrieval time was the longest. Our objective was to devise an algorithm with a retrieval time close to that of a single-feature query and results close to those of a multi-feature query. Four variations of the cascading weighted search algorithm were used, each with a different subset size as input for the second feature. The variations ranged from the 0.01% of the database with the closest distance to the query to the 10% with the closest distance to the query. Using only 0.01% of the database as input to the second stage, the results were slightly worse than those of the color-only search algorithm in both time and quality. As more of the database was introduced, the time degraded only slightly and the results improved to match the baseline.

FIGURE 5: QUERY USING QUAD TREE STRUCTURE OF COLOR AND TEXTURE HISTOGRAMS (DECREASING ACCURACY OF RESULTS IS LEFT-TO-RIGHT, TOP-TO-BOTTOM.)

TABLE 1: COMPARISON RESULTS AMONG METHODS
Features                                          Retrieval time   Number of acceptable results
Color only                                        818              12
Texture only                                      761              2
Color and texture (regular matching)              1601             20
Color and texture (weighted cascading matching)*  828              19
*Number of inputs to the second-feature comparison = 0.1% of the inputs to the first-feature comparison.

TABLE 2: RESULTS OF WEIGHTED CASCADING COMPARISON
Percentage of input retained from the first phase   Retrieval time   Number of acceptable results
0.01%                                               819              10
0.10%                                               728              19
1.00%                                               890              20
10.00%                                              945              20
100.00% (regular matching)                          1601             20

6. SUMMARY
We have demonstrated an algorithm that performs content-based retrieval from an image database using multiple computable features. The algorithm uses a single structure to store the features and is extensible to an arbitrary number of features. The same structure can be used for other multimedia databases; it is a unified approach to content-based retrieval. The weighted cascading algorithm has been introduced to reduce the retrieval time when multiple features are used to index the database. It achieves the same results as the regular comparison algorithm in much less computation time.
REFERENCES
[1] V. Gudivada and V. Raghavan, Special issue on content-based image retrieval systems, IEEE Computer, Vol. 28, No. 9, September 1995.
[2] P. Piamsa-nga, N. A. Alexandridis, G. Blankenship, G. Papakonstantinou, P. Tsanakas, and S. Tzafestas, A Unified Model for Multimedia Retrieval by Content, International Conference on Computers and Their Applications (CATA98), 1998.
[3] Smithsonian Institution, Online Collection of Pictures, available URL: ftp://photo1.si.edu.
[4] Eastman Kodak, company homepage, available URL: http://www.kodak.com.
[5] J. R. Smith and S.-F. Chang, Tools and Techniques for Color Image Retrieval, IS&T/SPIE Proceedings Vol. 2670, Storage and Retrieval for Image and Video Databases IV, 1996.
[6] J. R. Smith and S.-F. Chang, Automated Binary Texture Feature Sets for Image Retrieval, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1996.
[7] MIT Media Lab, Vision Texture database, available URL: http://www-white.media.mit.edu/vismod/imagery/VisionTexture.