Generic and Fully Automatic Content Based Image Retrieval Architecture

Suresh K. Choubey (1) and Vijay V. Raghavan (2)

(1) Magnetic Resonance Center, General Electric Medical Systems, Waukesha, WI 53188, USA
(2) Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, LA 70504, USA
Abstract. Content-based retrieval requires the choice of distance functions for determining inter-image distances. Distance functions considered desirable for computing inter-image distances are often too expensive, computationally, to be used for on-line retrieval from large image databases. In this paper, we propose a generic and efficient content-based image retrieval architecture in which the original images are mapped onto an abstract feature space such that the desired (or real) inter-image distances correspond to the distances between the vector representations of the images in the feature space. It is shown that it is more efficient to compute distances between these feature vectors and use them as estimates of the real distances. We have conducted experiments using color as the low-level feature. The results show a substantial reduction in the size of the feature space. The experimental results also indicate that high accuracy is achieved for the set of queries tested.
1 Introduction

An enormous amount of image data is being generated at an ever-increasing rate. It is accumulated from satellites, fingerprinting and mug-shot devices in law-enforcement agencies, medical applications, and the entertainment industry. Image databases are becoming common in medicine, law enforcement, art galleries and museum management, geographic information systems, weather forecasting (meteorological science), trademark and copyright database management, picture archiving and communication systems, and multimedia encyclopedias. This creates the need for large image databases.

An image database (IDB) is a large collection of images. The image retrieval (IR) problem is concerned with retrieving images that are relevant to a user's request from an image database. A content-based image retrieval (CBIR) system uses information from the content of images for retrieval and helps the user retrieve images relevant to the contents of a query image. By content, we mean computable low-level features such as color, shape, texture, object centroids, and boundaries. A CBIR system is required to manage the image data efficiently. For a given query, the image content of the query is compared directly with that of the images in the image database, and a desired number of images close to the query image is retrieved. However, CBIR remains a very difficult problem, as the retrieval technology is still maturing. It is very difficult to extract the semantics associated with a given image [1].

Approaches to CBIR can be classified into two broad classes: attribute-based and feature-based. In the attribute-based approach, the image contents are modeled as a set of attributes extracted manually or semi-automatically and managed within the framework of a conventional database management system. In feature-based CBIR, images are represented by their contents, and the comparison is made between the contents of the query and those of the images in the image database [1]. Both classes are discussed in detail in [2]. Feature-based approaches can be further classified by how generic they are and by their degree of automation. In our context, a generic system is defined as one in which the processing steps remain largely the same for different choices of image content. Approaches to feature-based CBIR can thus be semi-automatic non-generic, semi-automatic generic, automatic non-generic, or automatic generic. The most desirable feature-based approach is generic and fully automatic.

An IR system should be automatic, and efficient enough to give an acceptable response time. The use of low-level features in feature-based image retrieval systems makes the approach automatic but not necessarily efficient. The use of the "real" distance in the retrieval process can be computationally very expensive. For example, in the case of retrieval by color, accounting for the effect of color correlations is time consuming. For this reason, processing that can be done a priori should be done off-line, and as little computation as possible should be done on-line. The IR system should also be generic enough that different low-level features can be used with little or no change. We propose a generic and efficient feature-based CBIR architecture.
This architecture is based on the work of Goldfarb [3], which bridges the gap between syntactic and statistical pattern recognition for classification problems. We have reported image retrieval specific to color in [4] and image retrieval by texture in [5]. In this paper, we discuss the general architecture of our approach and provide preliminary experimental results for retrieval by color to show the effectiveness of the proposed approach.

In this approach, a "real" inter-image distance is used to compute high-level (composite) features for images in the database and to generate a training set. The real inter-image distance is defined by specifying a distance function according to which a user wants to perform the retrieval. The training set is used to compute the feature vector for a query image. The real distances of the query image need to be computed only to the images in the training set, not to the whole image database. This makes the approach efficient, because the computation of the real distance is expensive and the training set has far fewer images than the database. Image retrieval is then performed using the estimated distance between the query and the images of the image database. Note that in our approach a query is expected to be in the form of either an image or an intermediate representation specifying certain low-level image properties.
This paper is organized into six sections. Sect. 2 briefly discusses current approaches to the IR problem by other researchers and our motivation. In Sect. 3, we define the problem formally. In Sect. 4, we propose our generic approach for efficient CBIR. Sect. 5 describes the experiments and the results. Sect. 6 provides the conclusion and future work.
2 Related Work and Motivation

The literature shows that low-level features have been used extensively as discriminating features in automatic IR. The use of low-level features may make an information storage and retrieval system automatic, but not necessarily efficient. Chang et al. [6] use color as a discriminating feature for image retrieval. They use the real distance between images to index the images in the image database. For a given query, the images in the database are ranked by the real distance between the query image and each database image. Hence the on-line retrieval is very computation intensive.

Faloutsos et al. [7], while using low-level features such as color, make retrieval more efficient by using an estimated distance instead of the real distance. They reduce the search space by eliminating many images from consideration, using an estimated distance chosen so that a lower-bound lemma is satisfied. This lemma guarantees that no relevant image is eliminated, but many non-relevant images may be retained. The search is then performed on the basis of the real distance in the reduced search space. The estimated distance does not usually preserve the order induced by the real distance among the images, so a nearest-neighbor type search cannot be used. The distance estimation is also non-generic, and the approach may end up computing the real distance for a large proportion of the original images.

In [8], the authors make retrieval efficient at the cost of accuracy by reducing the image resolution (a process called quantization). However, when images are quantized to such a low resolution, there is a strong possibility of losing color information that is important for differentiating one image from another.
We therefore see the need for an efficient and effective on-line image retrieval technique in which the expensive real distance computation is performed with respect to only a few images and the images are represented by low-dimensional feature vectors. When mapping the original images to a feature space, it is desirable that the real inter-image distances are preserved in order to facilitate nearest-neighbor type searches.
3 Problem Definition

Let us assume that an image database is populated with a set (or collection) of images $O_0, O_1, O_2, \ldots, O_{n-2}, O_{n-1}$. Let this collection be denoted by $U$. Let $Q$ be a query image. Let the real inter-image distance between any two images $O_i$ and $O_j$ be denoted by $\Delta(O_i, O_j)$.

The user can specify a query to retrieve a number of relevant images. Let $h$ be the number of images, closest to the query image $Q$, that the user wants to retrieve, assuming that $h < n$. The image retrieval problem can then be defined as the efficient retrieval of the best $h$ images, according to $\Delta$, from a database of $n$ images.
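As a point of reference for the problem statement, the following minimal sketch retrieves the best h images by exhaustively evaluating the real distance against every database image. It is the baseline the paper aims to avoid, not the proposed method. The names `retrieve_top_h` and `real_distance` are hypothetical, and the toy "images" are plain numbers chosen only for illustration.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")


def retrieve_top_h(
    query: Image,
    database: Sequence[Image],
    real_distance: Callable[[Image, Image], float],
    h: int,
) -> List[int]:
    """Return the indices of the h database images closest to the query.

    This evaluates the (possibly expensive) real distance n times,
    once per database image.
    """
    if not 0 < h < len(database):
        raise ValueError("h must satisfy 0 < h < n")
    scored = ((real_distance(query, img), idx) for idx, img in enumerate(database))
    # nsmallest keeps only h candidates in memory; ties are broken by index.
    return [idx for _, idx in heapq.nsmallest(h, scored)]


if __name__ == "__main__":
    # Toy stand-ins: each "image" is a number, the distance is |a - b|.
    toy_db = [5.0, 1.0, 9.0, 2.0]
    print(retrieve_top_h(1.5, toy_db, lambda a, b: abs(a - b), h=2))
```

The cost of this baseline is n evaluations of the real distance per query, which is what motivates replacing it with a cheaper estimated distance.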
4 Proposed Method

The whole image database is divided into an initial image set and an incremental image set. The complete image storage and retrieval process is carried out in two phases: database population and image retrieval.

The database population phase uses low-level image properties such as color to compute the real inter-image distances for the images of the initial image set. The initial image set is expected to be much smaller than the image database itself. The high-level image feature vectors for these images are computed efficiently, and a training set is generated at the same time. These feature vectors have considerably fewer features than the intermediate image representation, and the extent to which they preserve the real inter-image distances among the images can be controlled. The training set has even fewer images than the initial image set. Images from the incremental image set are added using their real distances to only the images in this training set.

In the image retrieval phase, we obtain a low-level (or intermediate) representation of a query image, compute its inter-image distances to the images in the training set, compute the query feature vector by using the training set generated in the database population phase, and rank the database images by the Euclidean distance between this query feature vector and their feature vectors. The architecture is given in Fig. 1.

Our architecture is unique with respect to the generation of the training set during the database population stage. All the low-level image processing work of feature extraction that can be performed in advance is performed a priori, at database population time. Only the query processing is done on-line, when the query image becomes available. This leads to an effective, efficient, and flexible image retrieval system.
4.1 Theoretical Background

Let $P$ be the subset of images that has been sampled from the set $U$ to be representative of all possible image classes. This set of images is also referred to as the initial image set. Let $|P| = k$, where $k \ll n$. The value of $k$ should be as small as possible in order to achieve a shorter training period. According to Goldfarb [3], the dissimilarity measure between two objects can be defined as follows. Let a pseudometric space be a pair $(P, \Delta)$, where $P$ is a set of images and $\Delta$ is a non-negative real-valued mapping

$$\Delta : P \times P \to R^{+} \tag{1}$$
Fig. 1. An Architecture of the Proposed System
satisfying the following two conditions:

(a) $\forall O_1 \in P,\ \forall O_2 \in P:\ \Delta(O_1, O_2) = \Delta(O_2, O_1)$
(b) $\forall O \in P:\ \Delta(O, O) = 0$

The mapping $\Delta$ is called a pseudometric (or inter-image distance) function. This inter-image distance will be used from this point on and is the only information about images that will be required in the rest of the image retrieval process.

From the inter-image distances computed between each pair of images in the initial set, a $k \times k$ interdistance matrix is constructed. This matrix is symmetric about the diagonal and its diagonal elements are zero, by the two properties of the distance measure defined above. If the number of images in the initial image set is $k$ (i.e., $|P| = k$), then the interdistance matrix $D$ is defined as

$$D = (\delta_{ij})_{0 \le i,j \le k-1}, \tag{2}$$

where $\delta_{ij}$ is equal to $\Delta(\rho(O_i), \rho(O_j))$. Here $\rho$, which gives the intermediate representation of low-level image features, is defined as

$$\rho : P \to \Phi, \tag{3}$$

where $\Phi$ is the space of possible representations of low-level image features.

The following discussion leads to a theorem [3] that gives the condition for preserving the inter-image distances of an interdistance matrix in the derived feature space.

Definition 1: A pair of non-negative numbers $(p, q)$ will be called the vector signature of a finite pseudometric space $(P, \Delta)$ if there exists an isometric embedding
$$\alpha : (P, \Delta) \to R^{(p,q)} \tag{4}$$

where $\Delta(\rho(O_1), \rho(O_2)) = \|\alpha(O_1) - \alpha(O_2)\|_2$ for all $O_1, O_2 \in P$, such that for any other similar isometric embedding of $(P, \Delta)$ into $R^{(n_1, n_2)}$ we have $n_1 \ge p$ and $n_2 \ge q$. The mapping $\alpha$ is called a vector representation of $(P, \Delta)$.

In other words, $(p, q)$ is the vector signature of a finite pseudometric space $(P, \Delta)$ if $R^{(p,q)}$ is a minimal pseudo-Euclidean vector space within which $(P, \Delta)$ can be isometrically represented. This idea is illustrated in Fig. 2. Isometric embedding ensures that there exists a distance-preserving mapping [3].

Definition 2: Let $(P, \Delta)$ be the pseudometric space defined earlier, let $V$ be a vector space over $R$ of dimension $k - 1$, and let $\{a_i\}_{1 \le i \le k-1}$ be a basis of this vector space. A quadratic form on this vector space is given by:
$$\Psi(x) = \frac{1}{2} \sum_{i,j=0}^{k-1} \left(\delta_{0i}^2 + \delta_{0j}^2 - \delta_{ij}^2\right) x_i x_j, \tag{5}$$
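for $x = (x_1, \ldots, x_{k-1})$. Here $x_i$ and $x_j$ are the coordinates of $x$ with respect to the basis $\{a_i\}$. Terms with $i = 0$ or $j = 0$ have a zero coefficient, because the distance of $O_0$ to itself is zero, so only the coordinates $x_1, \ldots, x_{k-1}$ contribute.

Theorem 1: A finite pseudometric space $(P, \Delta)$ has the vector signature $(p, q)$ if and only if the quadratic form given in Definition 2 has the signature $(p, q)$.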
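Theorem 1 reduces the question of the vector signature to counting the positive and negative eigenvalues of the symmetric matrix of the quadratic form in Definition 2. The sketch below illustrates this under stated assumptions: the function names `quadratic_form_matrix` and `vector_signature` and the relative tolerance `tol` are hypothetical choices made for the example, and the code is not the procedure of [2, 4, 5].

```python
import numpy as np


def quadratic_form_matrix(D):
    """Matrix B of the quadratic form, for i, j = 1..k-1:
    B[i, j] = (d_0i^2 + d_0j^2 - d_ij^2) / 2, with image O_0 as reference.
    """
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1] or D.shape[0] < 2:
        raise ValueError("D must be a square k x k matrix with k >= 2")
    sq = D ** 2
    d0 = sq[0, 1:]                      # squared distances to the reference image
    return 0.5 * (d0[:, None] + d0[None, :] - sq[1:, 1:])


def vector_signature(D, tol=1e-9):
    """Return (p, q): the numbers of positive and negative eigenvalues of B."""
    eigvals = np.linalg.eigvalsh(quadratic_form_matrix(D))
    scale = max(1.0, float(np.abs(eigvals).max()))
    p = int(np.sum(eigvals > tol * scale))
    q = int(np.sum(eigvals < -tol * scale))
    return p, q


if __name__ == "__main__":
    # Three points on a line at 0, 1, 3: one Euclidean dimension suffices.
    line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
    # A pseudometric that violates the triangle inequality (3 > 1 + 1).
    non_metric = [[0, 1, 1], [1, 0, 3], [1, 3, 0]]
    print(vector_signature(line), vector_signature(non_metric))
```

A non-zero q signals that the distances cannot be represented exactly in a Euclidean space, which is why the pseudo-Euclidean space $R^{(p,q)}$ is needed in the general case.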
Fig. 2. Mapping of $(P, \Delta)$ to $V$ (panels: $(P, \Delta)$ space; $V$ ($R^{p+q}$) space)

Determining whether a quadratic form has a desired signature requires the use of results on symmetric bilinear forms on $(P, \Delta)$ and their connection to quadratic forms. The proof of Theorem 1 is given in [9]. A corollary of the theorem is as follows: a finite pseudometric space $(P, \Delta)$ can be isometrically represented in the Euclidean $m$-dimensional space ($m = p + q$) if and only if the quadratic form given in Definition 2 is positive and of rank less than $m$.
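When the quadratic form is positive, the corollary says the images can be placed in a Euclidean space so that inter-image distances are reproduced. One standard way to construct such coordinates is an eigendecomposition of the matrix of the quadratic form, as in classical scaling. The sketch below uses that construction as an illustration only; the paper's own ComputeFeatureVector() procedure is given in [2, 4, 5] and may differ. The function name `embed` is hypothetical.

```python
import numpy as np


def embed(D, m):
    """Map k images to m-dimensional feature vectors from a k x k distance matrix.

    Image O_0 is placed at the origin. The m largest eigenvalues of the
    quadratic-form matrix are kept; negative ones are clipped to zero, so the
    result is exact only when the form is positive and of rank at most m.
    """
    D = np.asarray(D, dtype=float)
    k = D.shape[0]
    if not 1 <= m <= k - 1:
        raise ValueError("m must satisfy 1 <= m <= k - 1")
    sq = D ** 2
    d0 = sq[0, 1:]
    B = 0.5 * (d0[:, None] + d0[None, :] - sq[1:, 1:])
    eigvals, eigvecs = np.linalg.eigh(B)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m]         # indices of the m largest
    lam = np.clip(eigvals[order], 0.0, None)
    Y = eigvecs[:, order] * np.sqrt(lam)          # rows are images O_1..O_{k-1}
    return np.vstack([np.zeros((1, m)), Y])       # prepend O_0 at the origin


if __name__ == "__main__":
    pts = np.array([[0, 0], [1, 0], [0, 2], [3, 1], [2, 2]], dtype=float)
    D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    F = embed(D, m=2)
    D_est = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    print(np.abs(D - D_est).max())   # reconstruction error for this toy input
```

Keeping fewer eigenvalues than the rank of the form trades distance preservation for a smaller feature vector, which is the kind of control over accuracy that Sect. 4 refers to.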
4.2 Database Population Phase

A procedure called "ComputeFeatureVector()", which computes the feature vectors of images, is given in [2, 4, 5]. This procedure also generates the training sample by calling "GenerateTrainingSet()". The training sample is used to compute the query feature vector and the feature vectors for images in the incremental image set. The procedure is executed only during the database population phase and is omitted here for lack of space. Executing ComputeFeatureVector() gives the feature vectors of all the images in the initial set. Let these feature vectors be denoted by $F_0, F_1, \ldots, F_{k-1}$. The images in the incremental image set, with feature vectors $F_k, F_{k+1}, \ldots, F_{n-2}, F_{n-1}$, are added on-line.
4.3 Estimated Interimage Distance and Image Retrieval
Let $F_i$ and $F_j$ be the feature vectors for images $O_i$ and $O_j$. Then the estimated inter-image distance between $O_i$ and $O_j$ is

$$Z(O_i, O_j) = E(F_i, F_j), \tag{6}$$

where $E$ is the Euclidean distance between the feature vectors of images $O_i$ and $O_j$. Note that the estimated inter-image distance is computed from the feature vectors of the images and not from the image representations. The computation is inexpensive, because the feature vectors have far fewer elements than the image representations. The estimated inter-image distance approximates the real inter-image distance, to an extent that can be controlled through the number of features retained, and can therefore be used in place of the real inter-image distance for efficient image retrieval.
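Equation (6) is an ordinary Euclidean distance between short vectors, which can be sketched as follows. The function names are hypothetical. For scale: with the 64-bin histograms and 5 features reported in Sect. 5, a quadratic-form distance over 64 bins needs on the order of 64 x 64 multiply-adds per image pair, whereas the estimated distance needs 5.

```python
import numpy as np


def estimated_distance(F_i, F_j):
    """Z(O_i, O_j) = E(F_i, F_j): Euclidean distance between two feature vectors."""
    diff = np.asarray(F_i, dtype=float) - np.asarray(F_j, dtype=float)
    return float(np.sqrt(np.dot(diff, diff)))


def estimated_distance_matrix(F):
    """All pairwise estimated distances for feature vectors stored as rows of F."""
    F = np.asarray(F, dtype=float)
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    F = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
    print(estimated_distance(F[0], F[1]))
    print(estimated_distance_matrix(F))
```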
Given the query image $Q$, one can readily determine the orthogonal projection $(w)$ of $Q$ onto the vector representation space $R^{(p,q)}$. An algorithm called "ComputeQueryVector()", which computes the feature vector of the query image, is also given in [2, 4, 5]. Let the query feature vector be denoted by $F_q$. Since the matrix $T$ [2, 4, 5] can be computed during the database population phase, the only on-line computations are those of $b_j$, $1 \le j \le m$. This is feasible because one has control over the dimension $m$ of the representation space.

Once the query feature vector is computed, we compute the estimated inter-image distance between the query and every image of $U$ as the Euclidean distance between the query feature vector and the feature vector of the database image. The estimated distance is given by:
$$Z(O_i, Q) = E(F_i, F_q), \quad \text{for } i = 0, 1, \ldots, n-1, \tag{7}$$

where $E$ is the Euclidean distance between the feature vectors $F_i$ and $F_q$ of image $O_i$ and query image $Q$, respectively.
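The split between off-line and on-line work described above can be sketched as follows. This is an illustration under explicit assumptions, not the ComputeQueryVector() algorithm of [2, 4, 5]: it assumes a Euclidean embedding obtained by eigendecomposition, takes the matrix T to be the pseudo-inverse of the training coordinates, and takes the on-line quantities b_j to be the inner-product terms derived from the query's real distances to the training images. All function names are hypothetical.

```python
import numpy as np


def fit_projection(D_train, m):
    """Off-line step: embed the training images and precompute the matrix T.

    D_train is the k x k real-distance matrix of the training images.
    Returns (F_train, T, sq_d0), where sq_d0 holds the squared distances
    from the reference image O_0 to O_1..O_{k-1}.
    """
    sq = np.asarray(D_train, dtype=float) ** 2
    sq_d0 = sq[0, 1:]
    B = 0.5 * (sq_d0[:, None] + sq_d0[None, :] - sq[1:, 1:])
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:m]
    lam = eigvals[order]
    if np.any(lam <= 0):
        raise ValueError("the m retained eigenvalues must be positive")
    V = eigvecs[:, order]
    F_train = np.vstack([np.zeros((1, m)), V * np.sqrt(lam)])
    T = (V / np.sqrt(lam)).T              # shape m x (k - 1)
    return F_train, T, sq_d0


def query_vector(d_query, T, sq_d0):
    """On-line step: d_query[j] is the real distance from Q to training image O_j."""
    dq = np.asarray(d_query, dtype=float) ** 2
    b = 0.5 * (dq[0] + sq_d0 - dq[1:])    # one term per non-reference training image
    return T @ b


def retrieve(F_db, F_q, h):
    """Rank database feature vectors by Euclidean distance to F_q, as in Eq. (7)."""
    dist = np.linalg.norm(np.asarray(F_db, dtype=float) - F_q, axis=1)
    return np.argsort(dist, kind="stable")[:h]


if __name__ == "__main__":
    pts = np.array([[0, 0], [1, 0], [0, 2], [3, 1], [2, 2]], dtype=float)
    D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    F_train, T, sq_d0 = fit_projection(D, m=2)
    d_q = np.linalg.norm(pts - np.array([1.0, 1.0]), axis=1)
    F_q = query_vector(d_q, T, sq_d0)
    print(retrieve(F_train, F_q, h=3))
```

In this sketch the query needs only k real-distance evaluations (one per training image) plus a small matrix-vector product, regardless of the database size n.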
4.4 Discussion of Proposed Mapping Scheme

The advantages of, and motivation for, the proposed mapping scheme are as follows:

- Compared to multidimensional scaling (where the mapping is to and from vector representations) [10], this approach transforms original images into an intermediate representation to create the inter-image distance matrix. Thus, the intermediate representation does not have to be a vector representation; it may be a string representation (in the case of shape representation) or involve other syntactic or structural representations.
- The inter-image distance does not have to be a metric. It may be a pseudometric. This allows us to apply the mapping technique to a wider range of application areas.
- In neural networks (e.g., Kohonen's feature map [11, 12]), the mapped feature space is restricted to be low (1 to 3) dimensional. In our mapping, there is no such restriction.
- In the Kohonen map [11, 12], distances between all image pairs must be available at the start to construct the map. Hence, that technique cannot support incremental addition of new images.
5 Experiments and Results

A set of experiments using color was designed to test the retrieval effectiveness of the proposed method. Color images from the Department of Water Resources (DWR) repository, maintained at the University of California, Berkeley, were used.
We have used the RGB color space and the uniform quantization method to quantize the images. Most display devices use RGB. Uniform quantization is a simple method and a good choice in the absence of information regarding the color distribution of the images. Color histograms can be used to describe the characteristic color distributions of physical objects, if they are well defined. For example, a sunset can be described with the right percentages of the constituent colors red, green, and blue. We have used an inter-image distance function for retrieval by image color content that is also used by Faloutsos et al., Chang et al., and other researchers [6, 7, 13]. The real inter-image distance $\Delta(X, Y)$ between two images $O_X$ and $O_Y$ is computed as follows:

$$\Delta(X, Y) = (X - Y)^{t} A (X - Y) = \sum_{i,j} (x_i - y_i)\, a_{ij}\, (x_j - y_j), \tag{8}$$

where the elements $a_{ij}$ represent the cross correlation between color $i$ and color $j$, for $0 \le i, j \le L - 1$, and $X = \rho(O_X)$, $Y = \rho(O_Y)$ are the intermediate (histogram) representations of the two images. This weighting takes into account the fact that colors are usually not orthogonal. The commonly used value of $L$ is 64 or 256.

Images numbered 0 through 31, from four distinctive color groups with predominantly green, red, yellow, and blue colors, were used in this experiment. These images were quantized to 64 colors, and the quantized images were used to compute the color histogram of every image in the image set. From the histograms, the color correlation matrix and then the real inter-image distance between each pair of images were computed. From these inter-image distances (the interdistance matrix), the high-level image feature vectors were computed. For ease of presentation, we randomly picked 8 images to be used as query images and ranked the other images in the image set, with the closest image next to the query image, based on the real and the estimated inter-image distances. We have used the $R_{norm}$ performance measure, which was first introduced in the LIVE-Project [14] and is discussed in detail in [2].
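The color pipeline described above (uniform quantization to 64 colors, histogram, quadratic-form distance of Eq. (8)) can be sketched as below. Two points are assumptions made for the example and are not stated in the paper: 64 colors are obtained as 4 uniform levels per RGB channel, and the matrix A is built as a_ij = 1 - d_ij / d_max from the Euclidean distance between bin centers, a form commonly used with this kind of distance. The function names are hypothetical.

```python
import numpy as np


def color_histogram(image, levels=4):
    """Normalized histogram of an H x W x 3 uint8 RGB image.

    Each channel is uniformly quantized to `levels` values, giving
    levels**3 colors (64 for levels=4).
    """
    img = np.asarray(image)
    q = (img.astype(np.int64) * levels) // 256            # per-channel bin, 0..levels-1
    codes = (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=levels ** 3).astype(float)
    return hist / hist.sum()


def similarity_matrix(levels=4):
    """Assumed form of A: a_ij = 1 - d_ij / d_max over quantized color centers."""
    centers = (np.arange(levels) + 0.5) * (256.0 / levels)
    r, g, b = np.meshgrid(centers, centers, centers, indexing="ij")
    colors = np.stack([r.ravel(), g.ravel(), b.ravel()], axis=1)  # same order as codes
    d = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
    return 1.0 - d / d.max()


def real_distance(X, Y, A):
    """Quadratic-form distance (X - Y)^t A (X - Y) between two histograms."""
    z = np.asarray(X, dtype=float) - np.asarray(Y, dtype=float)
    return float(z @ A @ z)


if __name__ == "__main__":
    red = np.zeros((2, 2, 3), dtype=np.uint8)
    red[..., 0] = 255
    green = np.zeros((2, 2, 3), dtype=np.uint8)
    green[..., 1] = 255
    A = similarity_matrix()
    print(real_distance(color_histogram(red), color_histogram(green), A))
```

Evaluating this distance for every pair in the initial image set yields the interdistance matrix from which the high-level feature vectors are derived.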
5.1 Results
The results are tabulated in Table 1. Due to limited space, only the top fifteen retrieved images are shown in this table. For every query image, the first row in the table gives the ranking based on the estimated inter-image distance and the second row gives the ranking based on the real inter-image distance. In essence, this table illustrates the correspondence between the rankings based on estimated and real inter-image distances.

Table 1. Comparison of Ranking (Z = Estimated Distance, Δ = Real Distance)

                    R1  R2  R3  R4  R5  R6  R7  R8  R9 R10 R11 R12 R13 R14 R15
Z(10(0645 0010))    10  24   2   8  15   0  26   9  18  14  31  27  13  11  19
Δ(10(0645 0010))    10  24   0  15   2  14  26   8   9  13  18  27  31   1   3
Z(12(0812 0020))    12  13  14  15   9   8  11   3  28  22  24  10  16   2  20
Δ(12(0812 0020))    12  13  14  15  22   8   9   3   2  20  11  10  28  16  17
Z(14(0812 0040))    14  13  15  12   8   9  10  11  24   2   3  22  28  31  26
Δ(14(0812 0040))    14  13  15  12  10   9   8   2  22  11  24  28   3  16   0
Z(22(1202 0082))    22  28  30  21  16  11  31  29  23  25  26   1   2   8  18
Δ(22(1202 0082))    22  30  21  16  28  29   1  25   2  11  23  18  12  13  31
Z(24(1202 0084))    24  10  26   2  31  18   8   0   9   1  19  11  27  15   3
Δ(24(1202 0084))    24  10  26  31   2   8   9   0   1  18  15  14  11  25  30
Z(25(1202 0085))    25  29  26  30  28  21  22  31  24  19   1  23   2  16  11
Δ(25(1202 0085))    25  30  21  22  28  29  26  10  24  19   1   2  31  27   0
Z(29(1624 0044))    29  25  21  28  30  22  26  16  23  31  11   1  24   4   2
Δ(29(1624 0044))    29  30  21  16  22  26  25  31  28  23   1  24  11   2  17
Z(30(1625 0004))    30  25  22  31  29  28  26   1  21   2  24  23  16   0  18
Δ(30(1625 0004))    30  25  22  29  31  21   2  28   1  26   0  16  18  23   8

After quantization, every image had 64 low-level features. After the high-level features were computed, however, every image was represented by only five high-level features. Hence, we were able to reduce the dimension of the feature space from 64 to 5, which makes the on-line computation very efficient. Table 1 also shows that almost all of the top 15 images by real distance are also in the top 15 when the estimated distance is used to rank the images. For example, for query image 10 the top 13 images by real distance are also in the top 13 positions by estimated distance. For query image 12, the top 14 images by real distance are among the top 15 images by estimated distance, and the top 4 images are at exactly the same ranks. We notice the same for image no. 14. The degree of accuracy in the match can be further improved by adjusting the threshold value that we use to decide the number of high-level image features.

Another significant result that contributes to efficiency is that, during the computation of the query feature vector, we did not have to compute its distance to every image in the database. Instead, we used the training set generated during the initial database population stage.

The $R_{norm}$ values for the above query images are given in Table 2. They are computed from the rankings of all 32 images. The average $R_{norm}$ over all these query images is 0.88. The $R_{norm}$ values also suggest that the ranking of images for a given query image according to estimated distances agrees to a large extent with the ranking based on real distances.
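The paper defers the definition of $R_{norm}$ to [2, 14]. The sketch below assumes the commonly cited form for two strict rankings of the same items, $R_{norm} = \frac{1}{2}\left(1 + \frac{S^{+} - S^{-}}{S^{+}_{max}}\right)$, where $S^{+}$ counts item pairs ordered the same way in both rankings and $S^{-}$ counts pairs ordered oppositely. Under that assumption, 32 images give 496 pairs, and a value such as .875 corresponds to $S^{+} - S^{-} = 372$. The function name `r_norm` is hypothetical.

```python
from itertools import combinations


def r_norm(reference_ranking, estimated_ranking):
    """Agreement between two strict rankings of the same items, in [0, 1].

    1.0 means identical order, 0.0 means exactly reversed order.
    """
    position = {item: rank for rank, item in enumerate(estimated_ranking)}
    if len(position) != len(estimated_ranking):
        raise ValueError("estimated_ranking contains duplicates")
    if len(reference_ranking) < 2 or set(reference_ranking) != set(position) \
            or len(reference_ranking) != len(position):
        raise ValueError("rankings must order the same set of at least two items")
    s_plus = s_minus = 0
    # combinations() yields (a, b) with a ranked ahead of b in the reference.
    for a, b in combinations(reference_ranking, 2):
        if position[a] < position[b]:
            s_plus += 1
        else:
            s_minus += 1
    s_max = s_plus + s_minus              # all pairs, since there are no ties
    return 0.5 * (1.0 + (s_plus - s_minus) / s_max)


if __name__ == "__main__":
    real = [10, 24, 0, 15, 2]
    estimated = [10, 24, 2, 15, 0]
    print(r_norm(real, estimated))
```

Because only the top fifteen positions are listed in Table 1, the tabulated values cannot be recomputed from the table alone; the full rankings of all 32 images would be needed.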
Table 2. Rnorm Values for Query Images

Query Image       Rnorm
10(0645 0010)     .915323
12(0812 0020)     .875000
14(0812 0040)     .883065
22(1202 0082)     .860887
24(1202 0084)     .870968
25(1202 0085)     .862903
29(1624 0044)     .877016
30(1625 0004)     .907258

6 Conclusion

We have proposed a generic and efficient automatic CBIR architecture. Except for the computation of inter-image distances, our method is generic: the same steps apply when image properties such as shape or texture are used instead of color. Our experimental results for retrieval by color show that the dimension of the feature space can be substantially reduced (from 64 to 5) while largely preserving the ranking induced by the real inter-image distances. The average $R_{norm}$ of the estimated inter-image distance with respect to the real inter-image distance over all the query images is 0.88. Our on-line computation for a given query image is substantially reduced, as it needs to compute the real distances only between the query image and the images in the training sample generated during the image database population.

It will be useful to study the indexing of database images using an effective spatial indexing mechanism such as the R-tree, after the image feature vectors have been computed. The rankings obtained by using the proposed method should also be evaluated against rankings provided by an expert.
References

1. V. Gudivada and V. Raghavan, "Content-Based Image Retrieval Systems," IEEE Computer, vol. 28, no. 9, pp. 18-22, 1995.
2. S. K. Choubey, Generic and Fully Automatic Content-Based Image Retrieval Using Color, Shape and Texture. PhD thesis, University of Southwestern Louisiana, Lafayette, LA, 1997.
3. L. Goldfarb, "A Unified Approach to Pattern Recognition," Pattern Recognition, vol. 17, no. 5, pp. 575-582, 1984.
4. S. K. Choubey and V. V. Raghavan, "Generic and fully automatic content-based image retrieval using color," in Pattern Recognition in Practice V, (Vlieland, The Netherlands), June 1997.
5. S. K. Choubey and V. V. Raghavan, "Generic and fully automatic content-based image retrieval using texture," in International Conference on Imaging Science, Systems, and Applications (CISST'97), (Las Vegas, Nevada), pp. 228-237, June-July 1997.
6. S. F. Chang, A. Eleftheriadis, and D. Anastassiou, "Development of Columbia's Video on Demand Testbed," International Journal of Image Communication - Signal Processing, vol. 8, no. 3, pp. 191-207, 1996.
7. C. Faloutsos et al., "Efficient and Effective Querying by Image Content," Journal of Intelligent Information Systems, vol. 3, no. 3, pp. 231-262, 1994.
8. X. Wan and C. C. J. Kuo, "Color Space Quantization for Image Retrieval," in SPIE: Storage and Retrieval for Still Image and Video Database '96, (San Jose, California), February 1996.
9. L. Goldfarb, "A New Approach to Pattern Recognition," in Progress in Machine Intelligence and Pattern Recognition, eds. L. N. Kanal and A. Rosenfeld, vol. 2, North Holland Publishing Company, 1985.
10. M. R. Rosenzweig and L. W. Porter, eds., Multidimensional scaling, vol. 31, 1980.
11. T. Kohonen, "Self-Organized Formation of Topologically Correct Feature Maps," Biological Cybernetics, vol. 43, pp. 59-69, 1982.
12. T. Kohonen, Self-Organization and Associative Memory. Berlin, Germany: Springer-Verlag, 1984.
13. A. K. Jain and A. Vailaya, "Image Retrieval Using Color and Shape," Pattern Recognition, vol. 29, no. 8, pp. 1233-44, 1996.
14. P. Bollmann et al., "The LIVE-Project - Retrieval Experiments Based on Evaluation Viewpoints," in ACM/SIGIR Conference on Research & Development in Information Retrieval, (Montreal, Canada), pp. 213-214, June 1985.