ICICS-PCM 2003 15-18 December 2003 Singapore

3B2.7

Fuzzy Relevance Feedback in Content-based Image Retrieval
Kim-Hui Yap and Kui Wu
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Email: {ekhyap, pg01537831}@ntu.edu.sg

Abstract In this paper, a new notion called fuzzy relevance feedback is introduced for interactive content-based image retrieval (CBIR) systems. The conventional binary labeling scheme requires a crisp decision to be made on the relevance of the retrieved images. However, user interpretation varies with different information needs and perceptual subjectivity. In addition, users tend to learn from the retrieval results and further refine their information requests. It is, therefore, inadequate to describe the user’s fuzzy perception of image similarity with crisp logic. In view of this, we propose a fuzzy relevance feedback approach which enables the user to make a fuzzy judgment for relevance ranking. A radial basis function (RBF) network with a local modeling structure is used for similarity learning. Experimental results show that our system is more user-adaptive, and that it achieves better performance than conventional retrieval systems based on hard decisions and global modeling.
Keywords: Content-based image retrieval (CBIR), fuzzy perception, relevance feedback.

1. Introduction The need for efficient access to image data is growing rapidly in many applications, ranging from art galleries, digital libraries and biomedicine to military and education. Content-based image retrieval (CBIR) has been developed as an effective way of accessing image data [1-4]. It interprets the user’s information need based on a set of low-level visual features (color, texture, shape) extracted from the images. However, these features may not correspond to the user’s interpretation and understanding of image content. Thus, a semantic gap exists between the high-level concepts and the low-level features in CBIR. Furthermore, user interpretation depends on the individual’s subjectivity, and may change progressively throughout the searching process. Relevance feedback (RF), as an interactive mechanism, has been introduced to facilitate image retrieval [5-8]. The user is incorporated into the retrieval system to provide his evaluation of the retrieval results. The system then learns from the feedback to

0-7803-8185-8/03/$17.00 © 2003 IEEE

retrieve a new set of images that better satisfy the user’s information need. Traditionally, the user is restricted to a binary classification of whether an image is “fully relevant” or “totally irrelevant” [6] [9] [10]. This process is also known as binary labeling. However, binary labeling forces a hard decision on whether the retrieval results satisfy the user’s information need. It does not reflect the nature of the user’s interpretation and understanding of images, which tends to be uncertain or imprecise owing to perceptual subjectivity and to information needs that change under different circumstances. For example, when a user marks an image as relevant, what he really means is that this image is relevant to his requirement up to an extent. In other words, the user’s relevance ranking is approximate or fuzzy in nature. It is unsuitable to describe user perception by binary crisp logic without considering the degree of relevance. On the other hand, multi-level labeling [5] offers detailed ratings which vary from “highly relevant” to “highly non-relevant”. Nevertheless, it is both inconvenient and imprecise for the user to classify an image into one of the multiple levels. This reflects the uncertainty embedded in the decision-making process. A user is more inclined towards using linguistic expressions such as “this image is more or less relevant” or “this image is more relevant than that one”. This paper presents a fuzzy relevance feedback method to model the user’s fuzzy perception of image similarity in interactive image retrieval, thus aiding the understanding and expression of the user’s information need. It allows soft decisions to be made with respect to the retrieval results. A good trade-off is achieved between convenience and accuracy for the user to interact with the system. The relevance of fuzzy feedbacks is evaluated by a properly chosen fuzzy membership function. A local modeling network called the Fuzzy Radial Basis Function (FRBF) network is developed to learn the user’s relevance ranking.

2. Fuzzy Relevance Feedback Often users have multiple levels of information need, which can be characterized as the primary, secondary or tertiary focuses. The primary focus is the general object or topic that a user is looking for. The secondary focus is defined as the important attributes that a user emphasizes. It provides a more detailed description of the object or topic. The tertiary focus in turn refers to less prominent attributes. Thus we get a hierarchical structure of the user’s priority at multiple levels, as illustrated in Fig. 1. Focuses at different levels have different degrees of importance, and therefore do not contribute equally to the user’s information need. In this work, we restrict ourselves to two levels of focus, namely, the primary and the secondary focuses.

Fig. 1 Hierarchical tree structure of multiple levels of information need (a primary focus at the root, secondary focuses beneath it, and tertiary focuses as leaves)

The user’s perception is, at times, uncertain or ambiguous, as it tends to vary with respect to different users under various circumstances. Users usually respond to the available information (retrieval results) and refine their requests accordingly. For example, suppose a user initially has the primary focus “flowers” and the secondary focus “red”. When only binary labeling is allowed, the user faces a dilemma over whether to mark a non-red flower as “relevant” or “irrelevant”, since the flower satisfies the primary focus but not the secondary focus. If the user has to mark the non-red flower as relevant, what he really means is that it satisfies his information need at a certain level, in this case, at the primary focus level. Uncertainty arises naturally and cannot be handled adequately by binary labeling or multi-level labeling. The interpretation of visual content by users is inherently nonstationary or fuzzy. Fuzzy logic can address the ambiguity which is often reflected in user perception of image similarity. In addition, it provides a natural and flexible way of expressing the user’s preferences. It also models the linguistic interpretation in a gradual and qualitative manner. Hence, we propose fuzzy relevance feedback in the form of a flexible three-level labeling, which incorporates a fuzzy option between “relevant” and “irrelevant” to better simulate the user’s decision-making process in image retrieval. The user is allowed to provide a vague or natural description of the retrieval results in the form of fuzzy feedbacks. We utilize a continuous fuzzy membership function to model these fuzzy feedbacks. Different images under the fuzzy label are scaled by different weights to reflect the degree of relevancy embedded in the user’s perception. A radial basis function (RBF) neural network with a local modeling structure is employed to infer the user’s interest levels and progressively present more desirable images. The combination of fuzzy interpretation and the RBF network gives rise to the FRBF network.

3. Fuzzy Radial Basis Function (FRBF) Network In interactive CBIR, some works [5] [10] try to find the ideal query, which is associated with a particular location in the feature space. In [11], a single radial basis function (SRBF)-based relevance feedback method is used for similarity learning. All these works are concerned with global characterization of image similarity using a single model. However, global modeling may not precisely or specifically address the local data information defined by the current query. To exploit the multiple local properties of image relevancy, it is more desirable to adopt a local modeling strategy. A Gaussian-Mixture Model using the RBF network has been proposed for interactive retrieval in [9]. It characterizes the query by multiple-class models, an inherent strategy of local modeling, and associates the relevant (positive) samples as the models. The irrelevant (negative) samples are used to modify the models such that the models are moved slightly away from the irrelevant samples. However, this method employs binary labeling for relevance judgment. It does not take into account the user’s interpretation of image similarity, and therefore fails to cater to different users’ requirements. Taking the above-mentioned problem into account, we propose an FRBF network to model and adaptively learn the user’s interpretation of visual content. The architecture of the FRBF network is given in Fig. 2. It has a topological structure including an input layer, an output layer and three internal modular subnetworks. The input data to the FRBF network is a P-dimensional feature vector connected to all three modular networks. Each modular network has a Gaussian kernel layer and an intermediate output layer, referred to as the relevance contribution layer. The three modules are RBF networks associated with the positive, negative and fuzzy samples, respectively. They are designed to represent the relevance contributions of the corresponding positive, negative and fuzzy samples, and are referred to as the positive-feedback modular RBF network, the negative-feedback modular RBF network and the fuzzy-feedback modular RBF network. Together they form a local model of image similarity. The outputs of all three modular networks are connected to the FRBF network output layer, which consists of a single unit whose output value is the linear combination of the responses from the modular subnetworks.

Fig. 2 Architecture of FRBF network (inputs $x_1, \ldots, x_P$; modular-network weights $w_r^1, \ldots, w_r^R$, $w_{ir}^1, \ldots, w_{ir}^{IR}$ and $w_f^1, \ldots, w_f^F$; output $F(x)$)

4. FRBF Interactive Learning Strategy

We associate all the training samples as the local models since any training sample, whether desired, undesired or fuzzy, contains some information provided by a user. The relevance degree of each training sample is modeled through the assignment of corresponding weights $w^i = \{w_r^i, w_{ir}^i, w_f^i\}$, where $w_r^i$, $w_{ir}^i$ and $w_f^i$ are called the positive-feedback modular network weight, negative-feedback modular network weight and fuzzy-feedback modular network weight, respectively. They are the connection weights of the ith RBF unit between the Gaussian kernel layer and the relevance contribution layer of the respective subnetworks. The detailed FRBF interactive learning algorithm is described as follows:

(i) Creation of three modular RBF networks

During each iteration, we create the modular RBF networks with three sets of P-dimensional training samples, $V_r = \{v_1, \ldots, v_i, \ldots, v_R\}$, $V_{ir} = \{v_1, \ldots, v_i, \ldots, v_{IR}\}$ and $V_f = \{v_1, \ldots, v_i, \ldots, v_F\}$, as the centers of the multi-dimensional Gaussian-shaped RBF units, where $V_r$, $V_{ir}$ and $V_f$ are the sets of positive, negative and fuzzy samples, and R, IR and F are their respective sizes. Let $V = \{v_1, \ldots, v_i, \ldots, v_M\} \subset \Re^P$ be the set of all the training samples, where M is its size. The Gaussian function is defined by:

$$f_i(x, v_i) = \exp\left( -\frac{(x - v_i)^T \Lambda (x - v_i)}{2\sigma_i^2} \right), \quad i = 1, \ldots, M \qquad (5)$$

where $v_i$ and $\sigma_i$ are the center of the ith RBF unit and its corresponding width, respectively, and $x$ is the input vector of a particular image. The RBF unit width $\sigma_i$ is given by:

$$\sigma_i = 0.5 \min_j \| v_i - v_j \|, \quad j = 1, \ldots, M, \; j \neq i \qquad (6)$$

$\Lambda = \mathrm{diag}[\alpha_1, \ldots, \alpha_P]$ is a diagonal matrix whose elements $\alpha_p$, $p = 1, \ldots, P$, denote the relative importance of the different feature components, represented by the standard deviation of the positive samples $V_r$.

(ii) Relevance weight assignment

For a relevant sample, we assign a positive weight $w_r^i$ to it. In the experiment, $w_r^i = 1$, that is, all the relevant samples receive equal importance. The relevance contribution of all the positive feedbacks is computed as

$$F_r(x) = \sum_{v_i \in V_r} w_r^i f_r(x, v_i)$$

where $f_r(x, v_i)$, defined in (5), is the Gaussian response of the ith RBF unit associated with a positive sample. For an irrelevant sample, a negative weight $w_{ir}^i$ is assigned. If an irrelevant sample is assigned a negative weight, the contribution of the irrelevant sample can be greatly reduced. However, a negative weight may also result in failing to retrieve other relevant samples (which have not yet been retrieved) that are close to the irrelevant samples. We therefore assign a negative weight of $w_{ir}^i = -0.5$ to each irrelevant sample.

An image marked as fuzzy by the user is called a fuzzy sample, and is associated with a fuzzy weight through a fuzzy membership function. Intuitively, the closer a fuzzy sample is to the relevant cluster center, the higher its degree of relevance. The Cauchy function is selected as the membership function, given by:

$$w_f^i(v_i) = \frac{1}{1 + \left( \dfrac{\| v_i - c_r \|}{\tau} \right)^{\gamma}} \qquad (7)$$

where $v_i \in \Re^P$ is the fuzzy sample, $c_r$ is the cluster center of all the relevant samples, $\tau$ represents the width of the function (here we let $\tau$ be the compactness of the relevant class), and $\gamma \geq 0$ determines the shape (smoothness) of the function. We set $\gamma = 1$ in the experiment. Thus, different fuzzy samples are assigned different weights reflecting their degree of relevance.

(iii) Similarity evaluation

The overall output of the network $F(x)$ for an input $x$ is computed as the linear combination of the responses from the modular RBF networks, given by:

$$F(x) = F_r(x) + F_{ir}(x) + F_f(x) = \sum_{v_i \in V_r} w_r^i f_r(x, v_i) + \sum_{v_i \in V_{ir}} w_{ir}^i f_{ir}(x, v_i) + \sum_{v_i \in V_f} w_f^i f_f(x, v_i) \qquad (8)$$

For an RBF unit center $v_i$ associated with a relevant or fuzzy sample, if $x$ is close to $v_i$ in the Euclidean sense, the corresponding output of the RBF unit will be large, thus contributing greatly to the network output $F(x)$, which is the weighted summation of all the RBF unit outputs. On the other hand, if $x$ is far away from $v_i$, the output of the RBF unit will be small due to the exponentially decaying discriminant function. Similarly, if $v_i$ is associated with an irrelevant sample and $x$ is close to $v_i$ in the Euclidean sense, the corresponding output of the RBF unit $f_{ir}(x, v_i)$ will be large. Because of the negative weight $w_{ir}^i$, the relevance contribution $w_{ir}^i f_{ir}(x, v_i)$ will be a negative number with large amplitude, leading to a small value of $F(x)$. If $x$ is far away from $v_i$, $w_{ir}^i f_{ir}(x, v_i)$ will be a negative number with relatively small amplitude, and $F(x)$ will accordingly be larger. So the larger $F(x)$ is, the more relevant the image $x$ is judged to be.
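The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the feature weights `alpha` (the diagonal of Λ) and the width `tau` are passed in by the caller because the text does not give formulas for computing them, and the small `floor` on the widths is an added safeguard for coincident samples.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Plain Euclidean distance ||a - b||."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def rbf_widths(centers: List[Vector], floor: float = 1e-9) -> List[float]:
    """Eq. (6): sigma_i = 0.5 * min over j != i of ||v_i - v_j||.

    `floor` is an assumption (not in the paper) that avoids a zero width
    when two training samples coincide. Needs at least two centers.
    """
    widths = []
    for i, v_i in enumerate(centers):
        nearest = min(euclidean(v_i, v_j)
                      for j, v_j in enumerate(centers) if j != i)
        widths.append(max(0.5 * nearest, floor))
    return widths


def gaussian(x: Vector, v: Vector, sigma: float, alpha: Vector) -> float:
    """Eq. (5) with Lambda = diag(alpha)."""
    quad = sum(a * (xp - vp) ** 2 for a, xp, vp in zip(alpha, x, v))
    return math.exp(-quad / (2.0 * sigma ** 2))


def cauchy_weight(v: Vector, c_r: Vector, tau: float, gamma: float = 1.0) -> float:
    """Eq. (7): fuzzy weight of sample v given the relevant-cluster center c_r."""
    return 1.0 / (1.0 + (euclidean(v, c_r) / tau) ** gamma)


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean, used here as the relevant-cluster center c_r."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def frbf_score(x: Vector, relevant: List[Vector], irrelevant: List[Vector],
               fuzzy: List[Vector], alpha: Vector, tau: float) -> float:
    """Eq. (8): F(x) = F_r(x) + F_ir(x) + F_f(x)."""
    centers = list(relevant) + list(irrelevant) + list(fuzzy)
    sigmas = rbf_widths(centers)  # widths use all M training samples
    c_r = centroid(relevant)
    weights = ([1.0] * len(relevant)            # w_r = 1
               + [-0.5] * len(irrelevant)       # w_ir = -0.5
               + [cauchy_weight(v, c_r, tau) for v in fuzzy])
    return sum(w * gaussian(x, v, s, alpha)
               for w, v, s in zip(weights, centers, sigmas))


# Toy 2-D feature vectors (hypothetical data, for illustration only).
relevant = [(0.0, 0.0), (1.0, 0.0)]
irrelevant = [(5.0, 5.0)]
fuzzy = [(2.0, 0.0)]
alpha = (1.0, 1.0)

near = frbf_score((0.0, 0.0), relevant, irrelevant, fuzzy, alpha, tau=1.0)
far = frbf_score((5.0, 5.0), relevant, irrelevant, fuzzy, alpha, tau=1.0)
print(near > far)
```

In this toy setting the fuzzy sample lies 1.5 from the relevant-cluster center, so with τ = 1 and γ = 1 its weight from (7) is 1/(1 + 1.5) = 0.4, between the relevant weight 1 and the irrelevant weight −0.5. A candidate at a relevant sample scores positively, while a candidate at the irrelevant sample scores close to −0.5.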

5. Experimental Results The image database used in the experiment contains 1000 color images in JPEG format from 50 different categories, as shown in Fig. 3. It is obtained from the Corel Gallery product (Corel, 1999) [12]. The visual features used are color and texture. Color histogram and color auto-correlogram [6] are used as the representations for color feature. Gabor wavelet [13] and wavelet moments are used as the texture feature representations.

To evaluate the effectiveness of our proposed method, we perform a subjective test, since it best reflects the user’s perception. The users had no prior knowledge of the image database, and each was asked to provide four iterations of feedback on the displayed images according to his information need. For our proposed fuzzy relevance feedback method, we assume that the fuzzy images and the relevant images are both useful to the user. The following performance measure, Average Precision (APR), is adopted:

$$P_r = \frac{1}{N} \sum_{i=1}^{N} P_i$$

where N is the number of selected queries (N = 50 in this experiment). $P_i$ is defined by:

$$P_i = \frac{N_r}{N_{RT}} \times 100\%$$

where $N_r$ is the number of relevant images retrieved and $N_{RT}$ is the number of retrieved images, with $N_{RT} = 16$. A comparison of the retrieval performance of our FRBF method with the Query Refinement (QR) [2] and Single-RBF (SRBF) [11] methods is given in Table 1 and Fig. 4.

Table 1 Average Precision (%) of 50 queries

Method   Number of iterations
         0       1       2       3       4
FRBF     47.25   67.00   71.88   73.13   74.13
SRBF     47.25   62.63   68.50   70.25   71.00
QR       47.25   64.00   67.75   68.63   69.50

Fig. 3 Selected examples of images from 50 categories

Fig. 4 Comparison of Average Precision (APR) Curve
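The Average Precision measure defined above reduces to a per-query ratio followed by a mean over queries. The following is a minimal sketch with hypothetical function names and made-up judgments; following the stated assumption that fuzzy and relevant images are both useful, a `True` flag here stands for an image the user marked as either relevant or fuzzy.

```python
from typing import Sequence


def query_precision(is_relevant: Sequence[bool], n_rt: int = 16) -> float:
    """P_i = N_r / N_RT * 100%, computed over the top n_rt retrieved images."""
    if len(is_relevant) < n_rt:
        raise ValueError("need at least n_rt retrieved images")
    n_r = sum(1 for flag in is_relevant[:n_rt] if flag)
    return 100.0 * n_r / n_rt


def average_precision(per_query: Sequence[float]) -> float:
    """APR = (1/N) * sum of P_i over the N queries."""
    return sum(per_query) / len(per_query)


# Hypothetical judgments for two queries (True = relevant or fuzzy).
q1 = [True] * 12 + [False] * 4   # 12 of 16 useful -> 75.0 %
q2 = [True] * 8 + [False] * 8    #  8 of 16 useful -> 50.0 %
apr = average_precision([query_precision(q1), query_precision(q2)])
print(apr)  # 62.5
```

With 16 retrieved images per query, each per-query precision is a multiple of 6.25%, and an average over 50 queries is a multiple of 0.125%, which is consistent with the two-decimal values reported in Table 1.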

Based on Table 1 and Fig. 4, we can see that FRBF gives the best retrieval performance in comparison with SRBF and QR. After 4 iterations, the retrieval precision obtained is 74.13% (FRBF), 71.00% (SRBF) and 69.50% (QR), respectively. FRBF consistently achieves a higher APR than the other two methods. The APR of the FRBF method increases quickly in the initial stage. This is a desirable property, since it provides significant improvement in the retrieval results quickly. It is observed that, to achieve a specific APR, FRBF requires the fewest iterations when compared with the SRBF and QR methods. FRBF utilizes local modeling of image similarity to make full use of local information distributed in multiple classes. SRBF and QR are based on global modeling of image similarity. As a result, FRBF can exploit local data properties to achieve higher retrieval performance. Coupled with fuzzy relevance feedback, FRBF is more adaptive towards the user’s information need. For instance, in one of the image retrieval sessions, the user is interested in finding some animals. The primary focus covers both canines and felines, while the secondary focus is dogs. Given the query of a dog, the initial retrieval result consists of some dog images, lion images and several images containing other objects. The user will mark the dog images as relevant and the images containing other objects as irrelevant. The categorization of lion images as relevant or irrelevant depends on individual subjectivity. This is because the lion images satisfy the user’s primary focus but not the secondary focus. Therefore, the user faces a dilemma if a hard decision has to be made on the lion images. With our proposed approach, the user has the flexibility of classifying an image as fuzzy in addition to relevant and irrelevant. Using our approach, the retrieval result is the most satisfactory both in terms of the retrieval performance and relevance ranking. This is in contrast with the conventional binary labeling approach, which adopts classical crisp logic to model the human decision-making process.
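The claim that FRBF needs the fewest iterations to reach a given APR can be checked directly against the values reported in Table 1. The short sketch below does this for two arbitrarily chosen target levels (65% and 70%); the helper name and the targets are illustrative assumptions, and only the APR values come from the table.

```python
from typing import Dict, List, Optional

# APR (%) after 0..4 feedback iterations, as reported in Table 1.
APR: Dict[str, List[float]] = {
    "FRBF": [47.25, 67.00, 71.88, 73.13, 74.13],
    "SRBF": [47.25, 62.63, 68.50, 70.25, 71.00],
    "QR":   [47.25, 64.00, 67.75, 68.63, 69.50],
}


def iterations_to_reach(curve: List[float], target: float) -> Optional[int]:
    """First iteration whose APR meets the target, or None if never reached."""
    for iteration, value in enumerate(curve):
        if value >= target:
            return iteration
    return None


for target in (65.0, 70.0):
    reached = {m: iterations_to_reach(c, target) for m, c in APR.items()}
    print(target, reached)
```

For a 65% target, FRBF needs one iteration while SRBF and QR need two; for a 70% target, FRBF needs two iterations, SRBF three, and QR does not reach it within four iterations.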

6. Conclusion This paper presents a fuzzy relevance feedback framework for interpreting the user’s fuzzy perception of image similarity in interactive CBIR systems. A three-level labeling scheme with fuzzy judgment is incorporated into the retrieval system and presented to the user for relevance ranking. In contrast to conventional relevance ranking, which is based on binary labeling or multi-level labeling, our method permits the user to provide only a vague or coarse judgment of the retrieved images. It also continuously mines the uncertain knowledge embedded in the user’s fuzzy interpretation. Fuzzy logic, as a natural way of managing vagueness or ambiguity, offers flexible and convenient tools to express users’ preferences and represent uncertain knowledge. Experimental results demonstrate that our FRBF approach achieves higher retrieval precision than the SRBF and QR methods, and is more effective in addressing different users’ information needs.

References
[1] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, “Query by image and video content: The QBIC system,” IEEE Computer, vol. 28, no. 9, pp. 23-32, Sept. 1995.
[2] Y. Rui, T. S. Huang, and S. Mehrotra, “Content-based image retrieval with relevance feedback in MARS,” Proc. IEEE Int. Conf. on Image Processing, Washington D.C., USA, pp. 815-818, 1997.
[3] A. Gupta and R. Jain, “Visual information retrieval,” Communications of ACM, vol. 40, no. 5, pp. 70-79, May 1997.
[4] J. R. Smith and S.-F. Chang, “VisualSEEk: a fully automated content based image query system,” Proc. ACM Multimedia, November 1996.
[5] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: A power tool for interactive content-based image retrieval,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 644-655, 1998.
[6] J. Huang, S. R. Kumar, and M. Mitra, “Combining supervised learning with color correlograms for content-based image retrieval,” Proc. of ACM Multimedia, pp. 325-334, Nov. 1997.
[7] N. Vasconcelos and A. Lippman, “Learning from user feedback in image retrieval systems,” in Proc. of NIPS’99, Denver, Colorado, 1999.
[8] Y. Ishikawa and R. Subramanya, “MindReader: Query database through multiple examples,” Proc. of Int. Conf. on Very Large Data Bases, New York, USA, 1998.
[9] P. Muneesawang and L. Guan, “Automatic machine interactions for content-based image retrieval using a self-organizing tree map architecture,” IEEE Trans. on Neural Networks, vol. 13, no. 4, pp. 821-834, July 2002.
[10] H. Müller, W. Müller, S. Marchand-Maillet, and D. McG Squire, “Strategies for positive and negative relevance feedback in image retrieval,” Proc. Int. Conf. Pattern Recognition, Barcelona, Spain, 2000.
[11] P. Muneesawang and L. Guan, “Interactive CBIR using RBF-based relevance feedback for WT/VQ coded images,” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Utah, USA, May 2001.
[12] Corel Gallery Magic 65000, “www.corel.com”, 1999.
[13] B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 837-842, Aug. 1996.
