To appear 1996 SPIE Conf. on Visual Comm. and Image Proc.
Image Coding for Content-Based Retrieval

Mitchell D. Swanson, Srinath Hosur, Ahmed H. Tewfik
Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455
email: mswanson, hosur, [email protected]
ABSTRACT

We develop a new coding technique for content-based retrieval of images and text documents which minimizes a weighted sum of the expected compressed file size and query response time. Files are coded into three parts: (1) a header consisting of concatenated query term codewords, (2) locations of the query terms, and (3) the remainder of the file. The coding algorithm specifies the relative position and codeword length of all query terms. Our approach leads to a progressive refinement retrieval by successively reducing the number of searched files as more bits are read. It also supports progressive transmission.

Keywords: content-based retrieval, image databases, image coding, text coding
1 INTRODUCTION

With continued growth in electronic data, information and multimedia systems need to efficiently manage vast collections of text and image files. To accomplish this task, the files must be efficiently stored and readily accessed through content-based searches. Content-based searching allows for retrieval of text documents which contain certain words or phrases and of images composed of certain image objects, textures, colors, etc. Efficient storage and transmission of such files is accomplished using compression techniques. However, traditional compression techniques are oriented towards storage efficiency and do not support content-based access. Currently, a separate indexing mechanism must be constructed. This situation is depicted in Fig. 1(a) and (b). Although compression efficiently represents the data, an indexing structure must be stored. This is inefficient. An index is redundant, as its construction is based on data which is already stored in the compressed files. Furthermore, an index can be very large, e.g., as large as the data it indexes. Since the index must be searched when a query is formulated, the search time increases with the index size.

In this paper, we present a new generation coding algorithm for text and images which is designed for efficient data compression and support of content-based retrieval. In Fig. 1(c), we show the efficient storage requirements of our new compression technique. The coding algorithm we propose implicitly indexes the collection by building query support into the compressed files. Furthermore, the coding procedure is flexible, as it minimizes a weighted sum of the expected file size in bits (compressed file size) and the expected number of bits that need to be read to answer a query (query response time). The file considered here may be either a text document or an image file.

This work was supported by AFOSR under grant AF/F49620-94-1-0461.
Figure 1: Storage requirements for a file collection in terms of (a) compressed files, (b) compressed files and a separate index, (c) our new coding technique.
Figure 2: Diagram of the new coding technique showing the three file sections.
To achieve our goal, we code each file into three sections: 1) a file header consisting of concatenated query term codewords, 2) a set of indices denoting the locations of these terms in the file, and 3) the remainder of the file (cf. Fig. 2). The coding technique determines the lengths of the codewords corresponding to a multiresolution representation of each query term. It also completely specifies their relative positions. Each file header is therefore constructed by concatenating the codewords of a multiresolution representation of each query term which appears in that file. The order of the concatenation is that specified by the coding algorithm. The file header contains all of the information needed to answer a content-based query. As a result, there is no need to read the remaining portion of the compressed file during a query. When a query is posed, the file header of each document or image is searched sequentially for the query term codewords instead of the actual file or a separate index. Since the relative order of the terms is known a priori, a search is terminated early once we read a codeword that corresponds to a query term that should appear after that corresponding to the actual query. To reconstruct a file from its coded version, the main part of the document or image, i.e., the non-query data, is decompressed. Each term in the file header is decoded and inserted into the main part of the document or image according to the coded term locations. Thus, there is no ambiguity during the decoding process. A major feature of our approach is that it implements a progressive refinement retrieval by successively reducing the number of searched files as more bits are read. Based on multiresolution techniques, the file header is constructed in terms of a coarse-to-fine discriminant structure. At the coarsest scale, only a few bits are read and a large number of files not matching the query criteria are quickly rejected.
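The three-section layout can be sketched as follows. This is a toy sketch of our own: the codewords are hypothetical bit strings, and the header is built in first-occurrence order rather than the optimized order derived in Section 3.

```python
def encode_file(tokens, codewords):
    """Split a token stream into the three sections described above.

    `codewords` maps each query term to a (hypothetical) bit string.
    Returns (header, term locations, remaining non-query tokens).
    """
    header = []      # codewords of query terms (first-occurrence order here)
    locations = []   # token positions of query-term occurrences
    rest = []        # non-query tokens, compressed separately in practice
    seen = set()
    for pos, tok in enumerate(tokens):
        if tok in codewords:
            locations.append(pos)
            if tok not in seen:
                header.append(codewords[tok])
                seen.add(tok)
        else:
            rest.append(tok)
    return "".join(header), locations, rest

header, locs, rest = encode_file(
    ["fast", "image", "coder", "for", "image", "retrieval"],
    {"image": "0", "coder": "10"},
)
```

Only the short header must be read to answer a query; the locations allow each decoded term to be re-inserted unambiguously during reconstruction.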
Searching the remaining documents proceeds in terms of the next discriminant scale. This offers a significant speed improvement by quickly rejecting files as potential retrievals and concentrating search resources on fewer files. In addition to content-based progressive refinement retrieval, our coding algorithm features progressive transmission, direct manipulation of coded data, and support for location dependence in query terms. These issues will be discussed further. In the next section, we review current techniques used in compressing and indexing image files. In Section 3, we present our new coding technique in a text database environment. Image coding for content-based retrieval is presented in Section 4, followed by experimental results and conclusions.
2 IMAGE COMPRESSION AND INDEXING

Information systems have to store, retrieve, and manage large numbers of text files and images. Traditionally, storage and retrieval of files is accomplished using compression and indexing, respectively. In this section, we review transform and object-based image coding techniques which are used in our new coding algorithm. Keyword-based and content-based image indexing methods are also reviewed.
2.1 Image Compression

Data compression conserves storage space and reduces the use of channel bandwidth during data transmission. Furthermore, manipulation or modification of data (e.g., a "find and replace" operation) is much more efficient when the data are compressed, as fewer bits need to be processed. Image compression techniques tend to be lossy. Specifically, the reconstructed image is not exactly the same as the original image. Lossy techniques give much better compression results than lossless methods. The image coding algorithm we develop in Section 4 is based on transform and object coding techniques. The block diagram of a transform coder is shown in Fig. 3. The transform block maps an image into a domain different from the intensity domain. The resulting transform coefficients are coded. Transform coders are popular because of their decorrelating effects on image pixels and their energy compaction property. Classical transforms include the Karhunen-Loeve transform and the discrete cosine transform (DCT) [1]. The Joint Photographic Experts Group (JPEG) still-image compression standard [2] is based on the DCT. Recently, wavelet transforms [3-5] have become popular for image compression. A wavelet transform provides a compact multiresolution representation of an image. A wavelet transform can be computed using a 2-band perfect reconstruction filter bank as shown in Fig. 4. The input image is simultaneously passed through lowpass L and highpass H filters and then decimated by 2 to give approximation and detail components of the original signal (analysis section). The two decimated signals may be upsampled, passed through complementary filters, and summed to reconstruct the original signal (synthesis section). A filter bank is cascaded with one or more additional filter banks to provide further resolutions of the input signal. The cascade of filter banks is termed a multiresolution pyramid due to its structure.
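The analysis/synthesis structure above can be sketched in one dimension. This minimal example uses Haar filters, an illustrative choice on our part rather than the filters used in the paper; the 2-D transform applies the same idea along rows and columns.

```python
def haar_analysis(x):
    """Lowpass/highpass filter followed by decimation by 2 (Haar filters)."""
    approx = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Upsample, filter with the complementary filters, and sum."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([a + d, a - d])
    return x

x = [4.0, 2.0, 5.0, 7.0]
a, d = haar_analysis(x)           # approximation [3.0, 6.0], detail [1.0, -1.0]
assert haar_synthesis(a, d) == x  # perfect reconstruction
```

Cascading `haar_analysis` on the approximation channel yields the multiresolution pyramid described above.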
In addition to coding applications, wavelet transforms have been found useful for texture classification and as a pre-processing tool for image recognition [6]. The transformation operation is followed by a quantization step. To represent an image with a finite number of bits, the transform coefficients must be quantized. Quantization may be scalar, i.e., the amplitudes of individual transform coefficients are mapped to values represented by a finite number of bits. Another method, vector quantization (VQ) [7], addresses a block or vector of transform coefficients as a single entity and jointly quantizes the scalars in the vector. The quantization step maps the vector of transform coefficients to the most appropriate vector quantization level in a finite codebook. A perceptual analysis stage may be included in a coding algorithm to take into account the human visual system [8]. These techniques exploit frequency and spatial domain visual masking effects to reduce irrelevancy in the coded image. Such coding algorithms avoid coding features in an image which cannot be seen by observers and hide errors introduced by the coding process. As shown in Fig. 3, the information obtained from the perceptual analysis stage is typically supplied to the quantizer.
Figure 3: Components in transform coder.
Figure 4: 2-band perfect reconstruction filter bank.
Figure 5: Segmentation of an image into objects and regions.

Finally, codewords are assigned to the output of the quantization stage. An entropy coder, e.g., the Huffman algorithm [9], may be used for this purpose. Huffman coding achieves compression by representing common symbols with short codewords and rare symbols with longer codewords. Probability estimates are obtained using static or adaptive models. Given the probability distribution of a set of symbols, the Huffman algorithm generates the best possible prefix code for the symbols. In a prefix code, no codeword is the prefix of another codeword. Thus, there is no ambiguity during the decoding process. Image coding may also be object- or region-based. In object-based coding, images are segmented [10,11] into various objects and regions of textures as shown in Fig. 5. Each segmented object and region is treated as an individual entity and coded separately. The coding techniques applied to each segmented region may differ according to the characteristics of the region. They include the methods described above, as well as 3-D object modeling techniques [12]. Region-based coding is used in the MPEG-4 standard [13] and is considered valuable for content access [14]. Some image coders allow for progressive transmission [15]. Progressive transmission is useful for database browsing applications as it allows the decoded image to be reconstructed in a coarse-to-fine manner. By viewing the coarse approximation to an image, retrieval of an image may be terminated before it is fully decoded. Also, several images may be displayed in a coarse form, allowing a user to select a subset to retrieve fully. Most transform coders support progressive transmission for image browsing with proper quantization and ordering of transform coefficients.
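The Huffman construction can be sketched as follows. This is a standard textbook implementation (not code from the paper) that returns only the codeword lengths, which are what the cost function of Section 3 needs; the symbol probabilities below are invented.

```python
import heapq

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for symbol probabilities `probs`."""
    # Heap entries: (probability, tie-break counter, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    n = len(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                 # merging adds one bit to every member
            lengths[s] += 1
        n += 1
        heapq.heappush(heap, (p1 + p2, n, s1 + s2))
    return lengths

L = huffman_lengths({"the": 0.5, "coder": 0.25, "image": 0.15, "query": 0.10})
```

Common symbols receive short codewords; the resulting lengths always satisfy the Kraft inequality, so a prefix code with these lengths exists.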
2.2 Image Indexing

While compressing the information in a text or image database allows one to store much more information, it does not address the issue of how to retrieve documents based on content. In fact, current compression techniques make it more difficult to retrieve information, as they require the document to be decompressed before being searched. Access to compressed data is provided by an index. An index is usually a separate file or a file header that is stored in addition to the compressed data. It is composed of content-specific information and terms (e.g., words or image features) which occur in the data it indexes. When a query is posed, the index is searched for the terms in the query. Appropriate files may then be retrieved. Images in traditional image databases are retrieved in terms of keywords and text associated with each image [16]. Text indexing techniques, e.g., inverted files, signature files, or cluster files [17,18], are used. The keywords and text, along with the indexing structure, are stored in addition to the compressed images. The image database is accessed using a text query; e.g., "get all images that have cars in them" retrieves images by scanning for the term "car" in the text index. Although useful for some purposes, these techniques suffer from the difficulty of capturing visual or image properties with textual descriptors. In addition, individual interpretations of scenes may vary greatly, leading to differences in keywords and query results.
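The separate-index approach can be illustrated with a minimal inverted file, the classic text-indexing structure mentioned above. The document IDs and keywords are made up; note that the index must be stored in addition to the compressed documents.

```python
def build_inverted_index(docs):
    """Map each keyword to the set of documents containing it."""
    index = {}
    for doc_id, words in docs.items():
        for w in set(words):
            index.setdefault(w, set()).add(doc_id)
    return index

idx = build_inverted_index({
    1: ["car", "road"],
    2: ["car", "tree"],
    3: ["house", "tree"],
})
assert idx["car"] == {1, 2}   # "get all images that have cars in them"
```

Answering a query touches only the index, but the index duplicates information already present in the stored documents, which is exactly the redundancy the proposed coder avoids.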
To avoid the problems associated with textual descriptors, content-based retrieval of images is currently under investigation [19-21]. Retrieval of an image in terms of content is based on image properties such as color, texture, placement of object(s) within the image, and shape of object(s). The numerical features associated with these properties are quantified using techniques in the literature [22,23]. Unlike text searching, the matching criteria for image features are not exact. Rather, the closeness of images is defined in terms of similarity functions based on image features. Images are retrieved and displayed in terms of ranked similarity between the query features and the appropriate image features. Currently, most content-based image indexes are constructed by pre-computing the aforementioned image features (e.g., color histograms) for each image and for each of the user-defined objects in the image. The numerous features are stored along with the compressed image. To invoke a content-based search, a user supplies the system with a sketch, example image, histogram, etc. The features for the query are computed and compared with all of the pre-computed image features in the database. The images are then sorted and displayed by similarity. As mentioned earlier, treating compression and retrieval separately is highly inefficient (cf. Fig. 1). The new image coding technique we introduce in Section 4 minimizes redundancy by using the image data to construct an implicit index.
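Similarity-ranked retrieval can be sketched with one such feature. The choice of a normalized gray-level histogram and an L1 distance is ours, for illustration; real systems combine several features and distance measures.

```python
def histogram(pixels, bins=4, levels=256):
    """Normalized gray-level histogram of a flat pixel list."""
    h = [0.0] * bins
    for p in pixels:
        h[p * bins // levels] += 1
    return [v / len(pixels) for v in h]

def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

def rank_by_similarity(query_pixels, database):
    """Return image names ordered from most to least similar to the query."""
    q = histogram(query_pixels)
    scored = [(l1_distance(q, histogram(img)), name)
              for name, img in database.items()]
    return [name for _, name in sorted(scored)]

db = {"a": [0, 255, 0, 255], "b": [128, 128, 128, 128]}
ranking = rank_by_similarity([0, 0, 255, 255], db)   # "a" matches best
```

Matching is inexact by design: images are ranked by distance rather than accepted or rejected on an exact term match.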
3 CODING FOR CONTENT-BASED RETRIEVAL

We begin by describing our new coding technique in a text database environment where each query term is a word, e.g., "computer" or "coder". The coding procedure is designed to minimize a weighted sum of the expected file size in bits (compressed file size) and the expected number of bits that need to be read to answer a query (query response time). Recall that each text file is coded into three sections: a file header consisting of query terms, a set of indices denoting the locations of these terms in the file, and the remainder of the file (cf. Fig. 2). Each file header is constructed by concatenating the codeword for each query term (i.e., word) which appears in that file. The order of the concatenation and the codeword lengths are based on the probability and query probability distributions of the query terms. As we shall see below, they are chosen to minimize a weighted sum of expected file size and expected search length. The file headers contain all of the information needed to answer a content-based query. As a result, they can then be searched sequentially for the query terms instead of the actual files or a separate index.

Let us assume that we have one or more text files from which a set of n query terms {t_1, t_2, ..., t_n} has been extracted. Note that the subscripts do not imply any particular ordering of the terms in the file header (e.g., t_2 could come before t_1 in the header of a file which contains both terms). To accomplish our goal, we need to: 1) define an ordering π(t) for the terms in the file header, and 2) assign a codeword of length L_{t_i} to each term t_i. For notational convenience, we introduce an ordered set of terms σ_1, σ_2, ..., σ_n. The subscripts on these symbols are used to denote the relative ordering of the terms when they occur in the file header, e.g., σ_1 comes before σ_2 when they both exist in a header. The ordering π(t) associates each t_i with a σ_j.
Let L_i and p_i be the codeword length and term probability of the term π(t_j) = σ_i, respectively. The probability estimates p_i are the same estimates used by entropy coders. Furthermore, let q_i be the nonzero query probability (probability of being queried) of term σ_i. Query probabilities may be estimated by how frequently each term appears in a query. For n terms, the weighted sum we wish to minimize can be expressed as
C = E[\text{Search Length}] + \lambda \, E[\text{File Length}] = \sum_{i=1}^{n} L_i \hat{p}_i + \lambda \sum_{i=1}^{n} L_i p_i,    (1)

where the weights

\hat{p}_i = p_i - \sum_{j=1}^{i-1} q_j p_{ij}    (2)

are functions of the ordering π(t). Note that the second summand in Eq. 1 is independent of the term ordering and query probabilities. The user is free to choose λ ≥ 0 to control the tradeoff between compression and query response. For λ < 1, the emphasis of the coder is on better query response time. For λ > 1, emphasis is placed on minimizing the expected file length.
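For concreteness, the cost function can be evaluated numerically for a fixed ordering. All values below are toy numbers of our own choosing (codeword lengths in bits, term probabilities, query probabilities, and joint occurrence probabilities):

```python
def cost(L, p, q, p_joint, lam):
    """Weighted cost C = E[search length] + lam * E[file length].

    Terms are given in header order: L[i], p[i], q[i] for position i,
    and p_joint[i][j] = Prob[terms at positions i and j both occur].
    """
    n = len(L)
    search = 0.0
    for i in range(n):
        p_hat = p[i] - sum(q[j] * p_joint[i][j] for j in range(i))  # weights
        search += L[i] * p_hat
    file_len = sum(L[i] * p[i] for i in range(n))  # ordering-independent part
    return search + lam * file_len

# Two terms: 1-bit and 2-bit codewords, equal term and query probabilities.
C = cost([1, 2], [0.5, 0.5], [0.5, 0.5], {0: {}, 1: {0: 0.25}}, 1.0)
```

Here the expected search length is 1·0.5 + 2·(0.5 − 0.5·0.25) = 1.25 and the expected file length is 1.5, so C = 2.75 with λ = 1.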
3.1 Minimization of Expected Search Length

Let us assume that, by using a Huffman coder, we have the codewords and corresponding lengths L_i that minimize the file header size. This corresponds to minimizing C for λ ≫ 1. In this case, the Huffman algorithm generates the best possible prefix code solution for our cost function.
The first term in Eq. 1 represents the expected number of bits that need to be read to answer a query. We now describe how to obtain the ordering π(t) which minimizes the expected search length given the query and term probability distributions and codeword lengths (i.e., minimizing C with λ = 0). It can be shown that the expected search length for the ordered elements σ_1, σ_2, ..., σ_n can be expressed as

E[\text{Search Length}] = \sum_{i=1}^{n} L_i \Big( p_i - \sum_{j=1}^{i-1} q_j p_{ij} \Big) = \sum_{i=1}^{n} L_i \hat{p}_i,    (3)

where p_{ij} = Prob[both σ_i and σ_j occur in the file].
The ordering π(t) of the term set {t_1, t_2, ..., t_n} which minimizes the expected search length is obtained using the following result. Suppose we have two term orderings π(t) and π'(t) which are identical except for two adjacent elements, i.e.,

π(t_j) = σ_i,   π(t_k) = σ_{i+1},
π'(t_j) = σ_{i+1},   π'(t_k) = σ_i,    (4)

for j, k ∈ [1, n] and i ∈ [1, n−1]. Let E[SL] and E[SL]' denote their expected search lengths. Then E[SL] < E[SL]' if and only if

L_i / q_i < L_{i+1} / q_{i+1}.    (5)

The inequality can be derived from Eq. 3. This result tells us that, given the codewords, the criterion for minimizing search length is completely independent of the probabilities p_i. It generalizes the result of [24], which dealt with merging files on a storage device to minimize seek time. In that work, the files were known to exist, i.e., p_i = 1 for all i. In our case, p_i is arbitrary.
We can obtain the ordering π(t) which minimizes Eq. 3 by applying a bubble sort procedure using Eq. 5 as the sorting criterion. Simply define π(t) in an arbitrary manner to initialize the ordering. Compare σ_{n−1} with σ_n using Eq. 5, swapping the order of the two terms if necessary. Then compare σ_{n−2} with σ_{n−1}, and so on. After this pass of adjacent comparisons, the correct t_i will be assigned to σ_1. The procedure is repeated by comparing σ_{n−1} with σ_n and comparing adjacent terms until we obtain σ_2, etc. The resulting algorithm relies on simple comparisons and can be performed in O(n²) time.

The result in Eq. 5 is intuitively pleasing. For example, if we assume all queries are equiprobable (which is justified in cases where the user preferences are not known), then the file header is obtained simply by concatenating the codewords of all the terms which appear in that file, shortest codeword first. Thus, to search for a particular indexed term, we search the file header sequentially for the corresponding term codeword and stop the search as soon as we find a valid codeword whose length is greater than the length of the codeword we are searching for. On the other hand, if all the codewords are equal in length, the file header is constructed by concatenating the codewords of all the terms which appear in that file, highest query probability first. In this case, we stop the search as soon as we find a codeword with lower query probability. Generally, for a query term t_i, we pre-compute the ratio L_{t_i}/q_{t_i}. Then the file header is searched sequentially. For each term t_j read, the ratio L_{t_j}/q_{t_j} is obtained and compared with L_{t_i}/q_{t_i}. If L_{t_j}/q_{t_j} > L_{t_i}/q_{t_i} at any point in the file header, we terminate the search early, as the query term does not exist in that file.
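The ordering and the ratio-based early-termination test can be sketched as follows. The term names, lengths, and query probabilities are invented; a built-in sort stands in for the bubble sort described above, since any sort on L_i/q_i yields the same ordering.

```python
def order_header(terms):
    """terms: dict name -> (codeword_length, query_prob).

    Returns the header order: ascending ratio L/q (the Eq. 5 criterion)."""
    return sorted(terms, key=lambda t: terms[t][0] / terms[t][1])

def header_contains(header, terms, query):
    """Scan an ordered header; stop as soon as a larger L/q ratio is seen."""
    target = terms[query][0] / terms[query][1]
    for t in header:
        if t == query:
            return True
        if terms[t][0] / terms[t][1] > target:
            return False          # query term cannot appear later in header
    return False

terms = {"car": (2, 0.5), "tree": (3, 0.3), "house": (4, 0.2)}
order = order_header(terms)       # ratios 4, 10, 20 -> car, tree, house
```

A header that lacks "tree" but contains "house" is rejected as soon as the 20 > 10 ratio is read, without scanning the rest of the header.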
3.2 Algorithm to Minimize Expected Query Length and File Size

The previous results rely on knowledge of the codeword lengths L_i. We have developed a procedure to simultaneously minimize both the expected query length and the header file size as expressed in Eq. 1. Using Eq. 3, we rewrite Eq. 1 as

C = \sum_{i=1}^{n} L_i w_i = \sum_{i=1}^{n} L_i \Big( (1 + \lambda) p_i - \sum_{j=1}^{i-1} q_j p_{ij} \Big).    (6)
The algorithm is defined as follows:

Step 1: Initialize the ordering π(t) arbitrarily. Initialize the codeword lengths L_i using a Huffman coder on the p_i.
Step 2: Using Eq. 5, obtain a new π(t) to minimize the expected search length.
Step 3: Compute the weights w_i (Eq. 6).
Step 4: Using the Huffman coder, obtain a new set of codewords with lengths L_i, using the w_i as probabilities.
Step 5: If the codeword lengths did not change, the procedure is complete. Otherwise, re-apply Eq. 5 to obtain a new π(t); the procedure is then complete.
In numerous simulations, the algorithm has been found to converge to the global optimum. Convergence to the global optimum was checked using brute-force search (O(n!)) with up to ten query terms.
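One way to realize Steps 1-5 is sketched below with two simplifications of our own: Shannon code lengths (⌈log2(1/w)⌉, which always satisfy the Kraft inequality) stand in for a true Huffman coder, and the reorder/reassign cycle is iterated until the lengths stabilize rather than performing the single final reordering of Step 5. All probabilities are toy values and must keep the weights positive.

```python
import math

def shannon_lengths(weights):
    """Stand-in for the Huffman coder: lengths ceil(log2(total / w))."""
    total = sum(weights.values())
    return {t: max(1, math.ceil(math.log2(total / w)))
            for t, w in weights.items()}

def optimize(p, q, p_joint, lam, max_iter=50):
    """p, q: per-term probabilities; p_joint[a][b] = Prob[both a and b occur].

    Returns (header order, codeword lengths)."""
    L = shannon_lengths(p)                                 # Step 1
    order = list(p)
    for _ in range(max_iter):
        order = sorted(p, key=lambda t: L[t] / q[t])       # Step 2 (Eq. 5)
        w = {}
        for i, t in enumerate(order):                      # Step 3 (Eq. 6)
            w[t] = (1 + lam) * p[t] - sum(
                q[order[j]] * p_joint[t][order[j]] for j in range(i))
        new_L = shannon_lengths(w)                         # Step 4
        if new_L == L:                                     # Step 5: stable
            break
        L = new_L
    return order, L
```

On small examples the fixed point is reached in one or two passes, consistent with the convergence behavior reported above.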
3.3 Text Coder Implementation

An important consideration in a text retrieval scenario is random access to the compressed text [17]. Specifically, there is often a need to retrieve a portion of an encoded file and decode it. Adaptive algorithms, e.g., arithmetic and Ziv-Lempel coders, make this very difficult, since the model used by the decoder (which is identical to the encoder model) needs all of the preceding text to construct its probability estimates. As a result, a Huffman coder based on a static (or semi-static) model proves to be a good choice for text databases [17]. After Eq. 1 is minimized, it is reasonable to assume that the term probability estimates p_i remain fixed for a given document or collection of documents. As a result, the codeword and codeword length assigned to each word remain fixed. With codeword lengths fixed, file headers remain fixed in size. The file headers can be regularly updated to minimize the expected search length by simply reordering the terms in the header according to a new set of query probabilities. This could occur, for example, once a fixed number of queries had been posed. This method is suboptimal, as the codewords do not change. However, the resulting system remains close to optimal if the term probability estimates p_i do not change drastically. In addition to reducing storage overhead and increasing retrieval speed, our coding method has two distinct advantages. First, it has built-in support for a nearness constraint between two or more query terms [18], as the location of each query term is saved in the coded file. For example, two terms in a query may be required to occur adjacent to or within N words of each other. Second, the new coding technique directly supports modification or manipulation of the coded data. For example, we can perform an efficient "find and replace" operation on the compressed data.
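Because the second file section stores term locations, a nearness constraint can be checked directly on the coded file, without decompressing its body. A minimal sketch (the location lists are hypothetical word positions):

```python
def within_n_words(locs_a, locs_b, n):
    """True if some occurrence of term a is within n words of term b."""
    return any(abs(a - b) <= n for a in locs_a for b in locs_b)

# Term "data" at word positions 3 and 40, term "coding" at 5 and 90:
assert within_n_words([3, 40], [5, 90], 2)       # 3 and 5 are 2 words apart
assert not within_n_words([3, 40], [50, 90], 2)  # no pair within 2 words
```

With sorted location lists this can be done in linear time with a two-pointer scan; the quadratic version above keeps the sketch short.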
4 IMAGE CODING FOR CONTENT-BASED RETRIEVAL

While described in terms of text documents and query words, the coding technique developed in the previous section is readily applicable to image coding. Our image coding algorithm incorporates image segmentation, object coding, and multiresolution transform methods for progressively refined content-based retrieval. Image retrieval is based on properties of image regions and objects, e.g., geometrical shapes, faces, clouds, etc. In Fig. 6, we show the block diagram of the image coder. The first step consists of blocking the image into 8×8 blocks. The block size plays the role of setting a lower limit on the granularity of a retrieval. Blocks of size 8×8 were chosen for perceptual reasons. At a normal viewing distance approximately equal to the size of the screen, and with a typical screen resolution of 1024×1280 pixels, this block size is approximately the smallest size we could realistically use to retrieve images. The 8×8 block size is, in effect, a perceptually motivated granularity for our implicit index. Next, a segmentation algorithm is applied to the image to define image regions and objects (cf. Fig. 5). Image regions and objects may denote any entities of interest in the given database application. Examples include shape-based objects like faces, eyes, cars, buildings, windows, etc. Other examples include items more texture- or color-based than shape-based, e.g., regions of grass, sand, clouds, trees, etc. For simplicity, we use the term object to denote any type of segmented region or object. The resolution of the segmentation algorithm may be defined according to the application. In addition, the segmentation step may benefit from human interaction. The amount of human interaction can be reduced by implementing segmentation algorithms which learn from previous segmentations [25]. Each segmented region in the image is then covered with the smallest possible rectangular collection of the previously defined 8×8 blocks. The collection of blocks is denoted a superblock. An example is shown in Fig. 7(a). Thus, each superblock in the image consists of an object and possibly background pixels and other objects or parts of objects which overlap the superblock.
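The superblock construction reduces to snapping an object's bounding box outward to the 8×8 block grid. A minimal sketch with (row, col) pixel coordinates of our own choosing:

```python
def superblock(bbox, block=8):
    """Smallest rectangle of block-aligned blocks covering a bounding box.

    bbox = (top, left, bottom, right), inclusive pixel bounds.
    """
    top, left, bottom, right = bbox
    return (top // block * block,            # snap top/left down to grid
            left // block * block,
            (bottom // block + 1) * block - 1,  # snap bottom/right up
            (right // block + 1) * block - 1)

# An object spanning pixel rows 5..20 and columns 10..30:
assert superblock((5, 10, 20, 30)) == (0, 8, 23, 31)
```

Because superblock corners always fall on the 8×8 grid, their positions can be stored cheaply in block units rather than pixel units, which is why coding the superblock positions is inexpensive.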
For example, two objects in an image may be a face and a nose on the face. In this case, the face superblock contains the second object. To ensure efficiency, we avoid coding pixels more than once. If more than one superblock exists, this can be readily accomplished by defining "Don't Care" (DC) regions within each superblock. Suppose we have n superblocks, SB_i, i = 1, ..., n, covering n objects, ob_i, in an image. Without loss of generality, define the superblock symbols in an ordered fashion from the top-left to the lower-right of the image, i.e., SB_1 is the superblock nearest the top-left corner of the image, SB_2 is the next closest, etc. DC regions are defined as follows. If any object ob_i (not the object's superblock, SB_i), i ≠ 1, overlaps with SB_1, mark the corresponding pixels as DC. Then proceed to SB_2. If any objects ob_i, i ≠ 2, overlap with SB_2, the corresponding pixels are marked as DC. Furthermore, mark as DC every non-object pixel in SB_2 which overlaps SB_1. Generally, for each i, mark as DC every pixel of SB_i that overlaps with another object ob_j, j ≠ i. Also mark as DC every non-object pixel in SB_i which overlaps SB_j, 1 ≤ j ≤ i−1. When complete, every pixel in the image is marked for accurate coding only once. For the objects in our previous example, we show the DC region for the second object in Fig. 7(b). One may avoid representing DC pixels using transform coding techniques for arbitrarily-shaped image segments [26]. In our implementation, we simply fill these regions with background (non-object) pixels.

Figure 6: Block diagram of the image coding algorithm.

Figure 7: Example showing (a) superblocks for two objects, and (b) the resulting Don't Care region.

The next step consists of applying a wavelet transform to each superblock. In this way, the powerful multiresolution characteristic of a wavelet decomposition is exploited for recognition. The number of levels in the wavelet decomposition depends on the original size of the superblock. Specifically, the wavelet transform is applied until the lowest frequency subband is 8×8. Once objects have been defined and transformed as above, our new coding technique is applied. The coder constructs the file header in terms of the wavelet transform subbands of the segmented objects to minimize the size-search cost function (Eq. 1). To obtain the probability and query probability estimates used by our coding algorithm, we use vector quantization (VQ) to map the subbands to finite dictionaries. The mapping is based on a perceptual distance measure. Specifically, we construct a frequency domain masking model of each object [27]. When computing distances to obtain the appropriate VQ dictionary entry, we set all masked components of the error image to zero, i.e., only non-masked components contribute to the norm of the error. Probability estimates for VQ dictionary elements are obtained by counting the number of subbands mapped to the corresponding vector quantization levels.
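The DC marking pass can be sketched with pixel-coordinate sets. The index-aligned lists `superblocks` and `objects` are our own toy representation, not the paper's data structures; superblocks are assumed already ordered top-left to lower-right.

```python
def mark_dc(superblocks, objects):
    """Return, for each superblock SB_i, the set of Don't Care pixels.

    superblocks, objects: index-aligned lists of pixel-coordinate sets.
    In SB_i, pixels of any other object, and non-object pixels already
    covered by an earlier superblock, are marked DC.
    """
    dc = [set() for _ in superblocks]
    for i, sb in enumerate(superblocks):
        for j, ob in enumerate(objects):
            if j != i:
                dc[i] |= sb & ob                        # other objects in SB_i
        for j in range(i):                              # earlier superblocks
            dc[i] |= (sb - objects[i]) & superblocks[j]
    return dc

sb = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(1, 0), (1, 1), (1, 2)}]
ob = [{(0, 0), (0, 1)}, {(1, 1)}]
dc = mark_dc(sb, ob)
```

In this toy example, object 2's pixel (1, 1) becomes DC inside the first superblock, and the background pixel (1, 0), already covered by the first superblock, becomes DC inside the second, so each pixel is coded exactly once.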
Using the techniques developed in Section 3, the probabilities associated with each term in the dictionary determine the proper location and codeword length of the subband codeword in the compressed file header. Furthermore, the header is ordered in a coarse-to-fine manner. As shown in Fig. 8, the first layer of the file header consists of the lowest resolution (coarsest) subband. Within this layer, the 8×8 subband components of all the objects in the image are ordered according to the probability estimates obtained from the VQ mapping and the codeword lengths. The next layers in the file header consist of all the 8×16, 16×8, 16×16, etc., subbands of the objects in the image. Mapping subbands larger than 8×8 to a finite dictionary is simplified using previous subband mappings. Specifically, the mapping exploits the correlation which exists between subbands to greatly reduce the VQ search space for larger subbands. User-defined queries, in the form of sketches or sample images, are applied as an input to the wavelet transform. The resulting query term subbands are mapped to the finite VQ dictionaries to obtain the appropriate codewords for the search. As the layers in the file header are ordered from lowest resolution to highest resolution subbands, the subband sizes are also ordered from smallest to largest, i.e., 8×8, then 8×16, etc. In this way, our new coding technique implements a progressive refinement retrieval by successively reducing the number of searched files as more bits are read. At the coarsest scale, only a few bits are read and a large number of files not matching the query criteria are quickly rejected. Searching the remaining documents proceeds in terms of the next discriminant scale. This offers a significant speed improvement by quickly rejecting files as potential retrievals and concentrating search resources on fewer files.
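The coarse-to-fine filtering can be sketched as follows. For brevity, layers are tested for exact codeword equality, whereas the actual system maps query subbands into the VQ dictionaries and ranks by a similarity measure; the header tokens below are invented.

```python
def progressive_search(headers, query_layers):
    """Progressively narrow the candidate set, one header layer at a time.

    headers: {file: [layer0_code, layer1_code, ...]}, coarsest layer first.
    Files are rejected at the first layer that fails to match, so only a
    few bits are read for most files.
    """
    candidates = set(headers)
    for level, q in enumerate(query_layers):
        candidates = {f for f in candidates if headers[f][level] == q}
        if not candidates:
            break
    return candidates
```

Most files are eliminated at the coarsest (8×8) layer; only the survivors are examined at the finer subband layers.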
Figure 8: Ordering of le header in coded image. The superblocks are removed from the original image. The remaining image regions, i.e., the 8 8 blocks which are not elements of a superblock, are coded using a simple block coding algorithm (e.g., JPEG). The JPEG codewords of the remaining image blocks are stored in the third part of the encoded le (c.f. Fig. 6). In addition, the position of each superblock is stored for decoding purposes. Coding position is inexpensive as each superblock is constructed in terms of 8 8 blocks. Decoding is accomplished by rst decoding the non-header information in the compressed le. The header is then decoded. Each superblock is passed through an inverse wavelet transform and positioned in the decoded image according to encoded location. Superblocks are positioned in reverse order of the aforementioned DC region marking procedure, i.e., from lower-right to upper-left. In this way, DC regions are written over by correctly encoded pixels. Our image coding algorithm has several nice properties in addition to ecient storage, First, our image coding algorithm supports a special kind of progressive image transmission. Speci cally, the image headers may be decoded and placed in a blank image according to their locations. In this way, browsing of objects in an image is supported. Conventional progressive image transmission decodes a coarse-to- ne version of the entire image. Ours decodes a coarse-to- ne version of the objects in an image. For the same number of decoded bits, our decoder presents a much sharper version of important image objects. Second, manipulation and modi cation of image objects regions is supported as the objects are encoded in the le header. This is not possible with a separate index. Third, matching a query object with image objects may be made location dependent. Speci cally, the location of each object in the image is recorded in the second part of the compressed image. 
The query specification may include a location in the image where the object should appear. As a result, the similarity measure may include a distance factor.
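The reverse-order superblock placement used by the decoder can be sketched as follows. This is a minimal illustration, not the paper's implementation: the superblock pixel arrays are assumed to have already passed through the inverse wavelet transform, and positions are given in units of 8x8 blocks as described above.

```python
import numpy as np

def place_superblocks(image, superblocks, positions, block=8):
    # Sort superblock indices so placement runs from lower-right to
    # upper-left (reverse of the DC-region marking order), letting
    # correctly decoded pixels overwrite any DC-marker regions.
    order = sorted(range(len(positions)),
                   key=lambda i: positions[i], reverse=True)
    for i in order:
        r, c = positions[i]              # location in units of 8x8 blocks
        sb = np.asarray(superblocks[i])  # pixels after inverse wavelet transform
        h, w = sb.shape
        image[r * block:r * block + h, c * block:c * block + w] = sb
    return image
```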
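One way to fold such a distance factor into the similarity measure is a Gaussian penalty on the gap between the object's recorded location and the location requested in the query. The weighting below is a hedged sketch; the function name and the `sigma` tolerance parameter are illustrative assumptions, not part of the paper.

```python
import math

def location_weighted_similarity(feature_sim, obj_pos, query_pos, sigma=64.0):
    # If the query does not constrain location, fall back to the plain
    # feature similarity.
    if query_pos is None:
        return feature_sim
    # Gaussian distance factor: objects far from the requested location
    # are penalized (sigma is a hypothetical spatial tolerance, in pixels).
    d = math.dist(obj_pos, query_pos)
    return feature_sim * math.exp(-(d * d) / (2.0 * sigma * sigma))
```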
5 RESULTS
To illustrate our coding procedure, we encoded a collection of 20 grayscale images of size 256x256 using the image coding algorithm of the preceding section. The collection includes a wide variety of images, e.g., faces, buildings, aerial photos, trees, etc. Each image contained 2 to 8 objects of varying sizes. The total number of objects in the database is 70. Queries are posed using sample objects and image regions.

In Fig. 9 we show an example of a query and the retrieved images. The query object is the woman's face, which is outlined in a black box. The box around the woman's hand denotes a second object in the image and was not used for query purposes. The procedure retrieved the only two images in the database that contain faces. Note that the faces are of dissimilar size. No other images were retrieved as they failed to meet a similarity threshold. Initially, the 8x8 subbands of each object in the entire 20-image collection were read. Ten files not meeting the matching criterion were quickly rejected. Searching proceeded in terms of the 8x16, 16x8, etc., subbands of the remaining images.

Figure 9: Original image and retrieved images. Query term is the woman's face.

A larger image database employing our new coding algorithm is currently under construction.
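The coarse-to-fine rejection used in this experiment can be sketched as follows. The resolution labels mirror the header ordering of Fig. 8; the normalized-correlation matching criterion is an assumption for illustration, since the paper's actual similarity measure on subband coefficients is not reproduced here.

```python
def normalized_correlation(a, b):
    # Placeholder matching criterion (an assumption; the paper's actual
    # similarity measure is defined on the wavelet subband coefficients).
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den if den else 0.0

def progressive_retrieve(query_features, database, threshold):
    # query_features and each database entry map resolution labels,
    # coarsest first (e.g. '8x8', '8x16', '16x8', '16x16'), to feature
    # vectors, mirroring the header ordering.
    candidates = set(database)
    for level, q in query_features.items():
        # Drop files failing the criterion before reading finer subbands.
        candidates = {f for f in candidates
                      if normalized_correlation(database[f][level], q) >= threshold}
        if not candidates:
            break
    return candidates
```

As more bits (finer subbands) are read, the candidate set only shrinks, so search effort concentrates on the few files that survive the coarse levels.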
6 CONCLUSIONS
Information systems have to store, retrieve, and manage large numbers of text files and images. To address this task, we introduce a new coding technique for content-based retrieval of images and text documents. The new coding procedure minimizes a weighted sum of the expected compressed file size and the expected query response time. For each file, the coding algorithm constructs a file header by concatenating the codewords of a multiresolution representation of each query term which appears in that file. When a query is posed, each file header is searched sequentially for the query terms. The search may be terminated early by a simple ratio test which determines when a query term does not exist anywhere in the header. Our new coding technique implicitly indexes files rather than creating a separate, redundant index.

Several other advantages also exist. First, our new image coding technique implements a progressive refinement retrieval by successively reducing the number of searched files as more bits are read. This offers a significant speed improvement by quickly rejecting files as potential retrievals and concentrating search resources on fewer files. Second, our coder includes support for a nearness constraint between query terms. Finally, direct manipulation and modification of query terms and objects in the compressed files is easily accomplished.
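The early-terminating header scan can be sketched as follows. The paper's ratio test is not fully specified in this section, so the codeword-length test below is a stand-in assumption: it supposes header codewords are concatenated in order of non-decreasing length, so that once a header codeword is longer than the query's, the query term cannot occur later.

```python
def header_contains(header_codewords, query_codeword):
    # Sequential scan of the concatenated header codewords.
    # Assumption: codewords appear in non-decreasing length order, so a
    # longer codeword proves the query term is absent from the rest.
    for cw in header_codewords:
        if len(cw) > len(query_codeword):
            return False   # early termination: term cannot appear later
        if cw == query_codeword:
            return True
    return False           # exhausted the header without a match
```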