MCEBC – A Blob Coloring Algorithm for Content-Based Image Retrieval System

Sue J. Cho and Suk I. Yoo
Department of Computer Science, Seoul National University, Shillim-dong, Kwanak-gu, Seoul, Korea
Phone: +82-2-880-6577 Fax: +82-2-874-2884

Abstract. In most content-based image retrieval systems, color is used as a very significant feature for indexing and retrieval. Many of these systems use quantized colors for various reasons. In some systems, especially those accepting user-drawn queries, color quantization improves the overall performance as well as the execution time. In this paper, a new color quantization method, which uses heuristics to minimize color matching errors, is proposed. The proposed MCEBC (Main Color Emphasized Blob Coloring) algorithm quantizes colors into predefined color classes, using the heuristic that humans tend to emphasize the dominant color component when they perceive and memorize colors. The experimental results indicate that the proposed method approximates the user's classification better than other methods that use mathematical color-difference formulas. The proposed MCEBC algorithm is implemented in a content-based image retrieval system called the QBM system. The retrieval results of the system using the MCEBC algorithm were quite satisfactory and showed higher success rates than those using other methods.

Keywords: color quantization, content-based retrieval, image database

1 Introduction

In a content-based image retrieval system that accepts queries drawn by human users, color quantization is performed for three major reasons. First, color images can be segmented through color quantization: since every pixel is classified into one of a fixed set of representative colors, an image can be segmented easily. Second, it helps the user select colors while drawing a query image. Given a global palette with a fixed number of colors, the user only has to select the color he thinks is nearest to the intended color; because the colors of the images in the database are quantized to the same palette, queries can be drawn this way. Third, it simplifies the similarity measure used in image retrieval, since the complicated calculation of color differences is no longer needed.

Color quantization is the process of reducing the number of colors in a digital image and mapping them to the nearest representative colors [7]. Since the submitted query image is compared with every image in the database, an image-dependent palette cannot be used in an image retrieval system. The color quantization problem is therefore narrowed to mapping image pixels to the palette colors. The simplest approach to this problem is to map each pixel to its nearest neighbor in the palette [6]. The definition of the "nearest color" varies with the difference function used by the color quantization algorithm.

Generally, colors are represented based on Young's three-color theory, which states that any color can be reproduced by mixing an appropriate set of three primary colors [5]. Several color coordinate systems have come into existence for a variety of reasons [2,3,4,5,6]. Although several color-difference formulas have been proposed to yield perceptually uniform spacing of colors, the existing color spaces do not provide a difference function identical to human perception. In color quantization, the palette color that a human thinks is nearest to a given color does not always have the smallest difference-function value. It is observed that humans tend to emphasize the dominant color component when they perceive colors.

In this paper, a heuristic color quantization method that uses this observation to approximate human color quantization is proposed. The proposed algorithm is implemented in a content-based image retrieval system, the QBM (Query by Blob Map) system [1]. QBM accepts a user's query as a rough map of colored blobs and considers an image as a set of colored blobs. The pixels in a blob have the same quantized color although they may have different colors in the original image.
Experiments show that the proposed simple algorithm approximates human color mapping better than other methods that use mathematical color differences. The retrieval results of the system using the proposed algorithm were quite satisfactory and showed higher success rates than those using other methods.
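For reference, the nearest-neighbor baseline mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes RGB components scaled to [0, 1], plain Euclidean distance in RGB, and the 27-color palette described in Section 3; `PALETTE`, `nearest_palette_color`, and `quantize_image` are hypothetical names.

```python
import math
from itertools import product

# Hypothetical 27-color palette: every (R, G, B) with components in {0, 0.5, 1}.
PALETTE = list(product((0.0, 0.5, 1.0), repeat=3))


def nearest_palette_color(pixel, palette=PALETTE):
    """Map an RGB pixel (components in [0, 1]) to the palette color with the
    smallest Euclidean distance in RGB space (nearest-neighbor quantization)."""
    return min(palette, key=lambda p: math.dist(pixel, p))


def quantize_image(pixels, palette=PALETTE):
    """Apply nearest-neighbor quantization to an image given as rows of RGB tuples."""
    return [[nearest_palette_color(px, palette) for px in row] for row in pixels]


# The color of Fig. 2(a) lands on mid-gray, the color of Fig. 2(b).
print(nearest_palette_color((0.746, 0.598, 0.313)))  # (0.5, 0.5, 0.5)
```

Because this palette is a regular grid, the Euclidean nearest neighbor coincides with rounding each component to the closest of {0, 0.5, 1} independently.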

2 QBM System Overview

QBM (Query by Blob Map) is a content-based image retrieval system that provides a convenient and powerful query-specification scheme. The user of the QBM system submits a query by roughly laying out colored blobs on the screen so that the layout resembles all or part of the target image. The position of each blob need not match its position in the target image, because position information is not included in the feature set used by QBM. In QBM, the following three features are used in image indexing and retrieval: (a) the color of each object in the image, (b) the relative size of each object, and (c) the topological structure of the objects. Therefore, the QBM system is robust to imperfect queries such as shifted and partial queries.

Fig. 1. A screen dump of the running application

The system consists of an indexing module, a querying module, and a retrieving module. In the indexing module, an image is segmented into a collection of objects. The term "object" does not refer to a thing that has semantic meaning in real life; an "object" is defined as a set of pixels that are connected and have the same quantized color. Each object is represented by a set of attributes such as color, size, and position. Whenever an image is inserted into the database, it is converted into a collection of objects and these attributes are extracted automatically. The querying module accepts user-drawn queries. Since a user draws a query on an object basis, the query image can be recognized as a collection of objects without any segmentation process. The query image is converted into a graph called a prime edge graph, which represents the topological structure of the objects in the image. In a prime edge graph, each node represents an object and an edge between two nodes represents the positional relationship of the corresponding objects. The retrieving module computes the similarity score of each image in the database through two-step matching: object matching and prime edge matching. Finally, the accepted images are displayed in order of similarity. A screen dump of the running application is shown in figure 1. The user draws a query image in the rectangular area on the left side of the application window. The system converts the query image to a prime edge graph and tests it against all the images in the database.
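Defining an object as connected pixels with one quantized color amounts to the classic blob-coloring (connected-component labeling) step. A minimal sketch follows; the paper does not specify the connectivity or the attribute encoding, so 4-connectivity, size as a fraction of the image area, and position as the pixel centroid are assumptions, and `extract_objects` is a hypothetical name.

```python
from collections import deque


def extract_objects(labels):
    """Group pixels into objects: 4-connected regions sharing one quantized color.

    `labels` is a list of rows; each entry is a quantized color (any hashable value).
    Returns one dict per object with its color, relative size, and centroid.
    """
    height, width = len(labels), len(labels[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for y in range(height):
        for x in range(width):
            if seen[y][x]:
                continue
            color = labels[y][x]
            # Breadth-first flood fill over 4-neighbors that have the same color.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and labels[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            objects.append({
                "color": color,
                "size": n / (height * width),  # relative size of the object
                "position": (sum(p[0] for p in pixels) / n,  # centroid (row, col)
                             sum(p[1] for p in pixels) / n),
            })
    return objects


image = [
    ["red", "red", "blue"],
    ["red", "blue", "blue"],
    ["green", "green", "red"],
]
print([(o["color"], round(o["size"], 3)) for o in extract_objects(image)])
# [('red', 0.333), ('blue', 0.333), ('green', 0.222), ('red', 0.111)]
```

The two red regions become separate objects because they are not connected, which matches the definition of an object given above.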

3 MCEBC Algorithm

When a human memorizes the color of a certain object, he quantizes it into one of several classes, such as apples being red and cucumbers being green. The same is true when a human reproduces an image from memory on a computer monitor. He does not require thousands of colors as he would in creating an artistic picture; in practice, he generally uses only about 20 colors in reproducing an image. In indexing an image database, the process of labeling the colors of the stored images is similar to the process of memorizing a color. Since the query image is generated by a human, the color quantization of database images must approximate the process of human quantization.

Fig. 2. Color blocks with RGB components (a) (0.746, 0.598, 0.313), (b) (0.500, 0.500, 0.500), (c) (0.500, 0.500, 0.000), (d) (1.000, 0.500, 0.500), and (e) (1.000, 0.500, 0.000)

Color quantization is the process of reducing the number of colors used in an image and mapping each pixel to the nearest representative color class. The first problem is deciding how many and which colors to use as representative colors. Second, the meaning of the "nearest" class must be defined. In this paper, "nearest" does not simply mean "having the shortest mathematical distance"; it means "judged most similar by a human". In most cases, a human considers the mathematically closest color to be the nearest color. Near gray, however, he becomes more sensitive to one of the components that constitute the color. For example, although the color of the block in figure 2(a) is closest to the color in figure 2(b) in both RGB and L*a*b* spaces, humans tend to select (c) or (e) as the nearest color. In fact, 18 out of 20 subjects selected (c) rather than (b), and the other two selected (e).

Fig. 3. The global palette used in the QBM system.

Generally, color quantization is performed in two steps: the design of the palette and the mapping of image pixels. Since the proposed algorithm is designed for an image retrieval system, in which a submitted query image is compared with every image in the database, an image-dependent palette cannot be used. MCEBC therefore uses a simple global palette consisting of the 27 RGB colors whose R, G, and B components each take one of the three values {0, 0.5, 1}. The constructed global palette is shown in figure 3. Next, each pixel is classified into one of the 27 color classes defined in the global palette. Basically, each of the Red, Green, and Blue components of a pixel is quantized independently into the three bands {0, 0.5, 1} according to one-dimensional distance. For example, a pixel with RGB coordinates (0.9, 0.1, 0.1) is quantized to (1, 0, 0). However, if all components have values between 0.25 and 0.75, they are not always quantized to 0.5, even though 0.5 is closer than 0 or 1. In this case, the value of one component may be emphasized or suppressed according to its dominance over the other components. If a color component c has the greatest value and is dominant over the other components, it is emphasized and replaced with 1. On the other hand, if c has the least value and is dominant over the other components, it is suppressed and replaced with 0. A component is said to be dominant if its degree of dominance is greater than that of every other component by some threshold t. The degree of dominance (DoD) of each component of the color (r, g, b) is calculated as follows:

$$\mathrm{DoD}(R) = |r - g| + |r - b|, \qquad \mathrm{DoD}(G) = |g - r| + |g - b|, \qquad \mathrm{DoD}(B) = |b - r| + |b - g|. \tag{1}$$
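The palette and Eq. (1) translate directly into code. The sketch below assumes RGB components in [0, 1] and uses hypothetical names (`PALETTE`, `degree_of_dominance`); it builds the 27-color palette and evaluates the degree of dominance for the color of Fig. 2(a).

```python
from itertools import product

# Global palette: the 27 colors whose R, G, B components are each 0, 0.5, or 1.
PALETTE = list(product((0.0, 0.5, 1.0), repeat=3))


def degree_of_dominance(r, g, b):
    """Return (DoD(R), DoD(G), DoD(B)) as defined in Eq. (1)."""
    return (
        abs(r - g) + abs(r - b),
        abs(g - r) + abs(g - b),
        abs(b - r) + abs(b - g),
    )


print(len(PALETTE))  # 27
print([round(d, 3) for d in degree_of_dominance(0.746, 0.598, 0.313)])
# [0.581, 0.433, 0.718]
```

For this color the least component, B, has the largest degree of dominance, which is the situation the suppression rule looks for.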

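Fig. 4. The block diagram of the MCEBC algorithm. For each component c ∈ {R, G, B}: if c > 0.75, then c' = 1; if c < 0.25, then c' = 0; if 0.25 ≤ c ≤ 0.75, then c' = 1 when c is greatest dominant, c' = 0 when c is least dominant, and c' = 0.5 otherwise.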

A dominant component is one whose value is much larger or much smaller than those of the other components. A threshold t is used to decide whether a component is dominant. Given a color C = (r, g, b), one of the components, say R, is said to be greatest (least) dominant if (a) r, g, and b all lie between 0.25 and 0.75, (b) r > g and r > b (r < g and r < b), and (c) DoD(R) > DoD(G) + t and DoD(R) > DoD(B) + t. A component that is greatest dominant or least dominant is said to be dominant. If t takes a large value, the color quantization becomes less sensitive, i.e., most values between 0.25 and 0.75 are replaced with 0.5. On the other hand, if t takes a small value, many gray-tone colors are shifted toward their dominant colors. Experimental results showed that the value of t that best approximated human perception was 0.4.
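Combining the band thresholds, Eq. (1), and conditions (a) to (c) gives the per-pixel mapping summarized in Fig. 4. The following is a minimal sketch under one reading of those rules, not the authors' implementation: components are assumed to lie in [0, 1], a component at exactly 0.25 or 0.75 counts as mid-band (as in the flowchart), and `mcebc_quantize` is a hypothetical name.

```python
def mcebc_quantize(rgb, t=0.4):
    """Map an (r, g, b) color with components in [0, 1] to one of the 27
    palette colors, emphasizing or suppressing a dominant mid-band component."""
    comps = list(rgb)
    # Degree of dominance, Eq. (1): sum of |c - other| over the two other components.
    dod = [sum(abs(c - o) for j, o in enumerate(comps) if j != i)
           for i, c in enumerate(comps)]
    # Condition (a): dominance is only considered when every component is mid-band.
    all_mid = all(0.25 <= c <= 0.75 for c in comps)

    result = []
    for i, c in enumerate(comps):
        if c > 0.75:
            result.append(1.0)
        elif c < 0.25:
            result.append(0.0)
        else:
            others = [j for j in range(3) if j != i]
            # Condition (c): DoD exceeds each other component's DoD by more than t.
            dominant = all_mid and all(dod[i] > dod[j] + t for j in others)
            if dominant and all(c > comps[j] for j in others):
                result.append(1.0)  # greatest dominant: emphasize
            elif dominant and all(c < comps[j] for j in others):
                result.append(0.0)  # least dominant: suppress
            else:
                result.append(0.5)
    return tuple(result)


print(mcebc_quantize((0.9, 0.1, 0.1)))               # (1.0, 0.0, 0.0)
print(mcebc_quantize((0.746, 0.598, 0.313), t=0.4))  # (0.5, 0.5, 0.5)
print(mcebc_quantize((0.746, 0.598, 0.313), t=0.1))  # (0.5, 0.5, 0.0)
```

Under this reading, the Fig. 2(a) color is moved to (c) only for small thresholds such as t = 0.1. At t = 0.4, no DoD gap exceeds the threshold, so the color stays at mid-gray (b).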

4 Experimental Results

4.1 Evaluation measures

The objective of the MCEBC algorithm is to map each color pixel to the same color class as a human user does. To set evaluation criteria for the proposed color quantization algorithm, two types of human perception data, in the form of pairs of colors, were collected from 20 subjects. The first set, called "similarity data", was obtained by asking each subject to select, from the global palette displayed on the same screen, the color nearest to each presented color. An evaluation function Fs for this criterion is defined as

$$F_s = \frac{1}{N} \sum_{i=1}^{N} H_s(\mathrm{Result}(D_i)) \tag{2}$$

where N is the number of collected data items, Result(Di) is the color produced by the quantization algorithm for Di, and Hs is the percentage of the subjects who answered that the nearest color of Di is Result(Di). The larger the value of Fs, the better the quantization algorithm approximates the similarity measure that a human uses in comparing colors. The second set, called "memory quantization data", was obtained in a similar manner, except that each subject was asked to observe a color on one screen and select the nearest color from the palette displayed on another screen. An evaluation function Fm for this criterion is defined as

$$F_m = \frac{1}{M} \sum_{i=1}^{M} H_m(\mathrm{Result}(D_i)) \tag{3}$$

where M is the number of collected data items, Result(Di) is the color produced by the quantization algorithm for Di, and Hm is the percentage of the subjects who answered that the nearest color of Di is Result(Di). The larger the value of Fm, the better the quantization algorithm approximates the quantization process that occurs when a human memorizes a color, as well as the similarity measure that a human uses.

4.2 Determination of the dominant component

The choice of the threshold t used to determine whether a component is dominant influences the performance of the algorithm. We found an appropriate value for t using the two types of gathered data as a training set. The result is shown in figure 5. If t takes the value 0, every greatest or least component is determined to be dominant.

Fig. 5. Selecting the threshold used in determination of the dominant component (Fs and Fm, from 0 to 100, plotted against t from 0 to 0.6).
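Both the threshold curves of Fig. 5 and the comparison in Table 1 score a quantizer with Eqs. (2) and (3): the average, over the test colors, of the share of subjects who chose the color the algorithm returned. A minimal sketch, assuming the human answers are stored as vote counts per palette color and the quantizer is any function from an RGB tuple to a palette color; `evaluate` and this data layout are hypothetical. The single data item uses the Fig. 2(a) answers reported in Section 3 (18 of 20 subjects chose (c), 2 chose (e)).

```python
from collections import Counter


def evaluate(quantize, data):
    """Compute F = (1/N) * sum_i H(Result(D_i)), as in Eqs. (2) and (3).

    `data` is a list of (color, votes) pairs, where `votes` counts how many
    subjects picked each palette color as the nearest one to `color`.
    H is the percentage of subjects whose answer equals the quantizer's result.
    """
    total = 0.0
    for color, votes in data:
        result = quantize(color)
        total += 100.0 * votes[result] / sum(votes.values())  # missing key counts as 0
    return total / len(data)


# Fig. 2(a): 18 of 20 subjects chose (c) = (0.5, 0.5, 0), 2 chose (e) = (1, 0.5, 0).
data = [((0.746, 0.598, 0.313), Counter({(0.5, 0.5, 0.0): 18, (1.0, 0.5, 0.0): 2}))]

print(evaluate(lambda c: (0.5, 0.5, 0.0), data))  # 90.0
print(evaluate(lambda c: (0.5, 0.5, 0.5), data))  # 0.0
```

A quantizer that maps this color to mid-gray, as nearest-neighbor matching in RGB does, scores 0 on this item. One that maps it to (c) scores 90.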

4.3 Performance of the heuristic approach

To compare the MCEBC algorithm with algorithms that use mathematical distance as a similarity metric, we implemented two other color quantization algorithms: one uses RGB distance and the other uses L*a*b* distance as the color similarity metric. Table 1 compares the heuristic approach employed in the MCEBC algorithm, with t = 0.3 and t = 0.4, to RGB distance and L*a*b* distance using the evaluation functions Fs and Fm.

Table 1. Comparison of the color similarity metrics with human judgments.

Metric             Fs (%)   Fm (%)
MCEBC (t = 0.3)      90       94
MCEBC (t = 0.4)      96       85
RGB Distance         47       36
L*a*b* Distance      50       38

4.4 Performance on image retrieval

Fig. 6. Comparison of success rates (percentage of successful queries versus rank, from 1 to 15, for MCEBC and L*a*b*).

To evaluate retrieval performance on image queries, two image databases of 1000 images each were constructed: one with the MCEBC algorithm and the other with the L*a*b* distance algorithm. The success rates over 100 user-drawn queries are plotted in figure 6. For most queries, the similarity score of the intended image was higher in the MCEBC database than in the L*a*b* database. However, the success rate of ranking the target image within the top 15 was the same for both, because color is not the only factor that determines the similarity score. Database indexing is much faster with the MCEBC algorithm because the color-conversion and distance-calculation steps are not required.
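Fig. 6 appears to plot, for each rank k, the share of queries whose intended image appears within the top k results. Assuming that reading (the paper does not define the curve formally), the computation is a cumulative count over the target image's rank for each query. `success_rates` and the sample ranks below are hypothetical illustration data, not the experiment's results.

```python
def success_rates(target_ranks, max_rank=15):
    """Return {k: percentage of queries whose target image ranked within the top k}.

    `target_ranks` holds, for each query, the 1-based rank of the intended image.
    """
    n = len(target_ranks)
    return {k: 100.0 * sum(1 for r in target_ranks if r <= k) / n
            for k in range(1, max_rank + 1)}


rates = success_rates([1, 2, 5, 20])  # hypothetical ranks for four queries
print(rates[1], rates[3], rates[15])  # 25.0 50.0 75.0
```

Because each curve is cumulative, two systems can differ at small k yet reach the same value at k = 15, as reported for the MCEBC and L*a*b* databases.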

5 Conclusion

The proposed color quantization algorithm approximates the human quantization process better than other algorithms that use mathematical color differences. In addition, quantization is much faster because it performs no color conversion or distance calculation. These advantages are most apparent when the algorithm is used in an image retrieval system that accepts user-drawn queries. Our future work includes adopting additional heuristics that reflect the effect of neighboring colors on human-perceived colors.

References

1. Cho, S.J., Yoo, S.I.: Image Retrieval Using Topological Structure of User Sketch. Proc. IEEE SMC98 (1998)
2. Foley, J.D., van Dam, A., Feiner, S.K., Hughes, J.F.: Computer Graphics: Principles and Practice, 2nd Edition. Addison-Wesley (1990)
3. Gong, Y., Proietti, G., Faloutsos, C.: Image Indexing and Retrieval Based on Human Perceptual Color Clustering. Proc. CVPR98 (1998) 578-583
4. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Addison-Wesley (1992)
5. Jain, A.K.: Fundamentals of Digital Image Processing. Prentice Hall, Englewood Cliffs (1989)
6. Sharma, G., Trussell, H.J.: Digital Color Imaging. IEEE Trans. on Image Processing, Vol. 6, No. 7 (1997)
7. Uysal, M., Yarman-Vural, F.T.: A Fast Color Quantization Algorithm Using a Set of One Dimensional Color Intervals. Proc. ICIP98 (1998) 191-195
