A Binary Color Vision Framework for Content-based Image Indexing

G. Qiu and S. Sudirman
School of Computer Science, The University of Nottingham
{qiu, sxs}@cs.nott.ac.uk

Abstract. We have developed an elegant and effective method for content-based color image indexing and retrieval. A color image is first represented as a sequence of binary images, each of which captures the presence or absence of a predefined visual feature, such as a color. Binary vision algorithms are then used to analyze the geometric properties of the bit planes. The size, shape, or geometric moments of each connected binary region on the visual feature planes can then be computed to characterize the image content. In this paper, we introduce the color blob size table (Cbst) as an image content descriptor. Cbst is a 2-D array that captures the co-occurrence statistics of connected region sizes and their colors. Unlike other similar methods in the literature, Cbst enables the use of simple numerical metrics to compare image similarity based on the properties of region segments. We demonstrate the effectiveness of the method through its application to content-based retrieval from an image database.
1 Introduction

Image indexing and retrieval is an important area of visual information management. It has received extensive research interest from various communities, including image processing, computer vision, and databases [1]. However, since the problem is complex, researchers from each community tend to tackle it from their own perspective, and the solutions developed so far mostly reflect this tendency. It is generally agreed that developing an effective and comprehensive solution will require expertise from many disciplines. While many researchers have been trying to develop new and advanced computer vision techniques to tackle the problem, there is a general consensus that state-of-the-art vision technologies are still "not there yet": many of them either work only under very restricted conditions or can be unstable. We believe a practical solution that is stable, reliable, and works well under broad conditions will probably be best built around established and tried methods.

In this paper, we seek inspiration from a well-established computer vision area which seems to have been neglected or overlooked by researchers developing solutions to image indexing and retrieval problems. Binary vision, the body of vision techniques developed to deal with binary images, has been well developed for several decades [5, 7]. Many useful techniques, such as connected component labeling and region property measurement, have been routinely used in machine vision for a long time. The motivation of this paper is to seek solutions for color image indexing and retrieval using well-established binary vision technology.

The organization of the paper is as follows. In Section 2, we present a framework for representing a color image as a sequence of binary images, from which image content descriptions can be derived using binary vision technology. Section 3 presents an implementation of the framework. Section 4 presents experimental results. In Section 5, we discuss related methods in the literature, and Section 6 concludes the paper.
2 The Framework

Decomposing an image into a sequence of binary images can be very convenient in many image-processing problems. For example, by using the gray values of the image as thresholds and representing the image as a sequence of binary images, each of which records the absence or presence of a gray value at a pixel position, an important class of image filters, statistical filters, can be analyzed [2, 3]. Another application of bit-plane decomposition is image compression/coding [4].

An attractive feature of binary images is their simplicity, and there are many well-established techniques for dealing with them. One of our motivations is to develop an elegant, reliable, and yet effective solution to content-based image indexing problems, and we would like to seek such a solution using binary vision techniques [5, 7]. The starting point, of course, is how to meaningfully represent a given color image in binary form so that content descriptors can be derived from it using binary vision analysis. The general principle is to place image pixels that share a similar property, such as color, texture appearance, or other visual significance, on the same bit plane. Once the image has been represented as binary images, the properties of these binary images can be measured using routines such as connected component labeling and connected region size and shape analysis.

The general framework of our approach is illustrated in Fig. 1 [6]. An image is first processed by a pixel classifier. The bit planes (one for each of the classes) are then constructed, and binary vision routines are used to compute content descriptors for the image. To implement the framework, one first has to consider how to implement the pixel classifier; the guiding principle is that pixels classified as belonging to the same class should have similar visual significance. Secondly, one must decide how and what to measure on each of the feature planes such that the measurements are discriminative and easily usable for indexing and retrieval purposes. We introduce one possible solution in the next section.
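In outline, the framework amounts to three stages. A minimal sketch is given below; the function names and interfaces are our own illustration, not definitions from the paper:

```python
import numpy as np

def describe_image(image, pixel_classifier, measure):
    """Skeleton of the framework in Fig. 1.

    pixel_classifier: maps an H x W x 3 image to an H x W array of
                      class indices 0..N-1 (e.g., nearest palette color).
    measure: a binary vision routine applied to each bit plane
             (e.g., connected region size statistics).
    """
    classes = pixel_classifier(image)
    n_classes = int(classes.max()) + 1
    # One binary plane per class: plane n marks the pixels of class n.
    planes = [classes == n for n in range(n_classes)]
    # The image content descriptor: one measurement per bit plane.
    return [measure(plane) for plane in planes]
```

Any pixel classifier and any per-plane measurement can be plugged into this skeleton; Section 3 instantiates it with color quantization and blob size measurement.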
3 An Implementation of the Framework for Color Features

There are obviously various ways to classify the pixels. The criterion should be that pixels classified into the same class have similar visual properties. This is, of course, a form of image segmentation [7], which is a key step in many vision systems. Although tremendous effort has been put into developing accurate and meaningful image segmentation methods by many very capable researchers, and significant progress has been made, a fool-proof segmentation algorithm, one that works well in all circumstances, has yet to be developed. What we want is something reliable, whose implementation will not fall apart in the vast majority of situations. In addition, the classes into which the pixels are classified should have meanings that relate to the visual content of the images. Viewing a pixel in isolation, its color is the obvious property to choose; this is the color histogram approach [8]. Viewing a pixel and its neighbors together, texture properties can be exploited [9]. Color is by far the most popular feature used in content-based image indexing, and it can be a very effective content descriptor if used properly. In this section, we present a method for constructing the binary image planes based on color classification (quantization), as shown in Fig. 2.
Fig. 1. A framework using binary vision routines for content-based color image indexing. (Diagram: input color image → pixel classifier → bit-planes #1, #2, #3, …, #N → binary vision routines → image content descriptors.)
Fig. 2. An implementation based on color classification. (Diagram: input color image → color palette C1, C2, …, CN → one bit plane per color Cn → connected component labeling and blob size measurement → color blob size table.)

Color quantization is used in many areas, and any color-based content-based image indexing method uses color quantization of one form or another.
Finding the color codebook, or palette, is realized by a form of vector quantization [10], and there are many established color quantization methods [11]. The palette consists of N representative colors, C1, C2, …, CN, found in one image or an ensemble of images through some statistical means. Each pixel is compared with the N colors in the palette and is quantized to the color that is closest to it. Let F(x, y) be a pixel vector of the original color image at co-ordinate location (x, y), and let CPn(x, y) be the binary pixel value of the bit plane for color n at co-ordinate location (x, y). The color index ci(x, y) for the pixel is found as

$$ci(x, y) = n, \quad \text{if } \left\| F(x, y) - C_n \right\| < \left\| F(x, y) - C_m \right\|, \ \forall m \neq n \tag{1}$$

The binary planes are then defined as

$$CP_n(x, y) = \begin{cases} 1, & \text{if } ci(x, y) = n \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
That is, there are as many binary planes as there are colors in the palette. When a pixel in the original image is quantized to the nth color, the value of the corresponding binary pixel on that bit plane is 1; otherwise it is 0. The union of all the bit planes therefore forms the quantized original image. Fig. 3 shows an illustration of an image and its seven color bit planes. In this ideal case, all visually distinctive regions (including the background) are clearly separated on their respective bit planes, which enables the application of binary vision routines to analyze their geometric properties.

Fig. 3. An image (top-left) and its color bit planes. By measuring the 1-valued regions, i.e., their size, shape, and other properties, we can tell a lot about the image's content.
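As an illustration of Eqs. (1) and (2), a minimal NumPy sketch of the quantizer and bit-plane construction might look as follows (the function and variable names are our own, not part of the paper):

```python
import numpy as np

def color_bit_planes(image, palette):
    """Quantize each pixel to its nearest palette color (Eq. 1) and
    build one binary plane per palette color (Eq. 2).

    image:   H x W x 3 array in the palette's color space
    palette: N x 3 array of representative colors C_1, ..., C_N
    returns: N x H x W boolean array of bit planes CP_n
    """
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    palette = np.asarray(palette, dtype=float)
    # Squared Euclidean distance from every pixel to every palette color.
    dists = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    ci = dists.argmin(axis=1).reshape(h, w)   # color index ci(x, y)
    return np.stack([ci == n for n in range(len(palette))])
```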
There are many useful binary vision routines. The two we use in the current paper are connected component labeling and region measurement. Both are very well-established vision techniques, and details can be found in any computer vision textbook, e.g., [5, 7].
3.1 The Color Blob Size Table

Once the connected pixels are grouped together, they form "color blobs". The sizes, shapes, locations, and other properties of these blobs should be indicative of the content of the scene. Whilst many parameters of these blobs can be easily and conveniently measured to give information about the content of the original image, we present one method that simply indexes the sizes of the blobs. We first quantize the size of the blobs into discrete sizes S1, S2, …, SM. In order to make this feature scale invariant, these discrete sizes are relative to the image size. Assuming that on bit plane n, for all n, the blobs are labeled Blobj(n), j = 1, 2, …, a color blob size table, Cbst(m, n), m = 1, 2, …, M, n = 1, 2, …, N, is formed as
$$Cbst(m, n) = \sum_{j \,:\, Q\left(size\left(Blob_j(n)\right)\right) = S_m} size\left(Blob_j(n)\right) \tag{3}$$

where Q(·) denotes the blob size quantizer.
In words, Cbst(m, n) accumulates the number of pixels of those blobs on bit plane n whose sizes are quantized to Sm.
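A minimal sketch of Eq. (3) using off-the-shelf connected component labeling (scipy.ndimage.label) is given below. The exact quantization rule Q is not spelled out in the paper, so here we assume each blob maps to the smallest table size S_m that its relative size does not exceed; the names are our own:

```python
import numpy as np
from scipy import ndimage

# Relative size thresholds S_1, ..., S_9 of Table 1 (fractions of image size).
SIZE_BINS = np.array([0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0])

def color_blob_size_table(planes):
    """Compute Cbst (Eq. 3) from the N binary color planes.

    planes: N x H x W boolean array (one plane per palette color).
    returns: M x N array whose entry (m, n) is the total number of pixels
             in blobs on plane n with quantized relative size S_m.
    """
    n_colors, h, w = planes.shape
    image_size = h * w
    cbst = np.zeros((len(SIZE_BINS), n_colors))
    for n in range(n_colors):
        labels, _ = ndimage.label(planes[n])         # connected components
        blob_sizes = np.bincount(labels.ravel())[1:]  # pixels per blob
        for size in blob_sizes:
            # Assumed quantizer Q: smallest S_m with size/image_size <= S_m.
            m = np.searchsorted(SIZE_BINS, size / image_size)
            cbst[min(m, len(SIZE_BINS) - 1), n] += size
    return cbst
```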
4 Experimental Results

To evaluate the performance of the new method, we tested it on a database consisting of over 7000 color photo images. For comparison, we also implemented the MPEG-7 color structure descriptor (CSD) method [12]. We used the color quantization scheme of the MPEG-7 standard (in HMMD space) to create the color palette; for both the new method and the MPEG-7 CSD, exactly the same color quantization scheme was used. In the new method, the blob sizes were quantized into 9 discrete values relative to the image size, as shown in Table 1. The blob size quantization steps were non-uniform: smaller blob sizes were quantized more finely than larger ones. For each image in the database, we calculated its color blob size table. Image similarity was measured according to the difference of the color blob size tables. Let CAbst(m, n) and CBbst(m, n) be the color blob size tables of images A and B, respectively; the similarity of A and B is measured by the following L1 norm:
$$D(A, B) = \frac{1}{M \times N} \sum_{\forall m, n} \left| CA_{bst}(m, n) - CB_{bst}(m, n) \right| \tag{4}$$
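In code, Eq. (4) reduces to a one-liner over the two tables (continuing the earlier sketches, with our own naming):

```python
def table_distance(cbst_a, cbst_b):
    """L1 distance between two color blob size tables (Eq. 4).
    cbst_a, cbst_b: M x N arrays over M size bins and N palette colors."""
    m, n = cbst_a.shape
    return np.abs(cbst_a - cbst_b).sum() / (m * n)
```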
Image similarity based on the MPEG-7 color structure descriptors was also calculated using the same L1 norm measure. Fig. 5(a) shows a query to retrieve flags using the MPEG-7 CSD method, and Fig. 5(b) shows the result of the same query image using the new method (there are 100 flag images in the database). As can be seen clearly, the new method returns far more relevant images: the MPEG-7 CSD returned 13 flags in the first 50 positions, whilst the new method returned 38. Fig. 6 shows an example of retrieving poker cards from the database; here the MPEG-7 CSD returned 40 cards (black and white) in the first 50 positions, while the new method returned 49. Yet another example is shown in Fig. 7, which used a fruit image as the query. Although this was not a clear-cut case in terms of quantitative retrieval performance, the new method performed extremely well subjectively.
Quantized blob size:    S1      S2      S3     S4     S5    S6    S7     S8     S9
% of image size (IS):   0.01%   0.05%   0.1%   0.5%   1%    5%    10%    50%    100%

Table 1. Blob size quantization table.
5 Related Methods

Many content descriptors have been published in recent years; see the recent survey paper [1] for a comprehensive review. The ones most similar to ours are the "Blobworld" system of UC Berkeley [13] and the MPEG-7 color structure descriptor [12]. Here we briefly discuss how Cbst relates to and differs from Blobworld and the MPEG-7 CSD.

Our method is related to Blobworld. However, whilst Blobworld relies on sophisticated image segmentation algorithms, we do not place our emphasis on the segmentation step, for two reasons. Firstly, segmentation is difficult and can be unreliable. Secondly, pixels segmented into the same region (based on a variety of parameters) may not admit simple and meaningful numerical measures describing the visual properties of the segments, which makes it difficult to develop simple image matching methods like the one used here. The introduction of Cbst is what makes our method differ from Blobworld: whilst Blobworld is complicated and not very easy for novices in the field to implement, our method is simple and can be implemented by anyone who knows how to write simple programs. It is worth mentioning that Cbst can also be used in conjunction with the segmentation method of Blobworld, i.e., to summarize the segmented regions.

We believe the idea of placing pixels with similar visual properties on separate bit planes is an important and useful concept; it provides a cognitive model that is conducive to bringing binary vision routines to bear on the development of simple yet effective content descriptors. For example, one can easily measure projections [5, 7] of each bit plane and thus analyze the shape of the visual feature distribution. The introduction of the color blob size table has also enabled the development of simple yet effective image similarity measures. Based on the same idea, i.e., viewing the visual property as one dimension and a region geometric measure as another, other simple and useful 2-D tables can be constructed as well.

In a way, our method is also related to the MPEG-7 CSD. The MPEG-7 CSD, described in detail in the standard, tries to incorporate the spatial structure of the color distribution into the content descriptor. It uses an 8 x 8 mask as the structuring element and counts the number of times a particular color is contained within the structuring element as it scans the image. Our method, which uses connected region labeling, takes the MPEG-7 CSD a step further, and in some circumstances it will be more advantageous. Fig. 4 illustrates a situation where the MPEG-7 CSD will fail but our method will succeed in distinguishing two different patterns. In general, the MPEG-7 CSD cannot distinguish a solid region from a region of the same dimensions and color but with holes in the middle that are smaller than the structuring element.
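To make this failure mode concrete, the following self-contained sketch (our own illustration with a simplified presence count, not code from the paper or the MPEG-7 reference software) compares a windowed color-presence count against connected-component blob sizes for a solid square and the same square perforated with holes smaller than the 8 x 8 element:

```python
import numpy as np
from scipy import ndimage

def csd_style_count(plane, k=8):
    """Simplified CSD-style statistic: the number of k x k window
    positions containing at least one 1-pixel of the plane."""
    h, w = plane.shape
    return sum(plane[y:y + k, x:x + k].any()
               for y in range(h - k + 1)
               for x in range(w - k + 1))

def blob_sizes(plane):
    """Pixel count of each connected 1-region on the plane."""
    labels, _ = ndimage.label(plane)
    return np.bincount(labels.ravel())[1:]

solid = np.zeros((32, 32), dtype=bool)
solid[8:24, 8:24] = True                 # one solid 16 x 16 region

holed = solid.copy()
holed[10:24:4, 10:24:4] = False          # punch 16 isolated 1-pixel holes

# The windowed counts are identical: every 8 x 8 window that touched the
# solid region still contains at least one pixel of the holed region.
print(csd_style_count(solid) == csd_style_count(holed))   # True

# The connected-component blob sizes differ, so Cbst tells them apart.
print(blob_sizes(solid), blob_sizes(holed))               # [256] vs [240]
```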
Fig. 4. MPEG-7 CSD will have the same bin count for the pixels in all these different patterns (a)-(d); our new method will distinguish them (each dot represents a pixel).
6 Summary

In this paper, we have presented an elegant content-based image indexing framework, and an implementation of the framework has been shown to be highly effective. Within the framework (Fig. 1), the pixel classifier can be implemented with a variety of features: as well as color, other features such as texture can be included, and the pixel classifier can even be made semantically meaningful, for example by using skin color [14]. Only one of many possible region measures was presented in this paper; many other region parameters, such as a region's shape, moments, etc., can easily be used. Such a representation also lays the foundation for building higher-level, more intelligent image retrieval models. Different implementations of the framework are currently being actively pursued, and we will publish the results in the future.
References

1. A. W. M. Smeulders et al., "Content-based image retrieval at the end of the early years", IEEE Trans. PAMI, vol. 22, pp. 1349-1380, 2000
2. J. Fitch et al., "Median filtering by threshold decomposition", IEEE Trans. Acoustics, Speech and Signal Processing, vol. 32, pp. 1183-1188, 1984
3. G. Qiu, "Functional optimization properties of median filtering", IEEE Signal Processing Letters, vol. 1, pp. 64-65, 1994
4. S. Kamata et al., "Depth-first coding for multivalued pictures using bit-plane decomposition", IEEE Trans. on Communications, vol. 43, pp. 1961-1969, 1995
5. R. Jain, R. Kasturi and B. Schunck, Machine Vision, McGraw-Hill, 1995
6. G. Qiu, "Image and image content processing, representation and analysis for image matching, indexing or retrieval and database management", UK Patent Application No. GB0103965.0, 17 February 2001
7. M. Sonka, V. Hlavac and R. Boyle, Image Processing, Analysis and Machine Vision, 2nd Edition, PWS Publishing, 1999
8. M. J. Swain et al., "Color Indexing", Int. J. Computer Vision, vol. 7, no. 1, pp. 11-32, 1991
9. J. Huang et al., "Image indexing using color correlogram", Proc. CVPR, pp. 762-768, 1997
10. A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992
11. J. Arvo, Editor, Graphics Gems II, Academic Press, 1991
12. MPEG-7 FCD, ISO/IEC JTC1/SC29/WG11, March 2001, Singapore
13. C. Carson et al., "Blobworld: A system for region-based image indexing and retrieval", Proc. International Conference on Visual Information Systems, 1999
14. M. Jones and J. Rehg, "Statistical color models with application to skin detection", Technical Report CRL 98/11, Cambridge Research Laboratory, Compaq, 1998
Fig. 5. (a) Retrieval result of the MPEG-7 CSD method. (b) Retrieval result of the new method. The top-left image was the query example.

Fig. 6. (a) Retrieval result of the MPEG-7 CSD method. (b) Retrieval result of the new method. The top-left image was the query example.

Fig. 7. (a) Retrieval result of the MPEG-7 CSD method. (b) Retrieval result of the new method. The top-left image was the query example.