2006 International Joint Conference on Neural Networks Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-21, 2006

Reducing User Log size in an Inter-Query Learning Content Based Image Retrieval (CBIR) System with a Cluster Merging approach Kien-Ping Chung, Student Member, IEEE, Kok Wai Wong, Senior Member, IEEE and Chun-Che Fung, Member, IEEE

Abstract—The use of relevance feedback (RF) with the feature vector model has been one of the most popular approaches to fine-tuning queries in content-based image retrieval (CBIR) systems. This paper proposes a framework that extends the RF approach to capture the inter-query relationship between the current and previous queries. By using the feature vector model, this approach avoids the need to "memorize" the actual retrieval relationship between image indexes and previous queries. This implies that the approach is better suited to image database applications where images are frequently added and removed. The proposed inter-query relationship is represented by a data cluster that is defined by a transformation matrix, a centroid point and a reference boundary value. These parameters are captured in a file commonly known as the user log. This file, however, grows rapidly over successive retrieval sessions. In order to reduce the size of the user log, this paper introduces a merging approach that combines clusters which are close to one another and similar in their characteristics. Experiments have shown that the proposed framework outperforms the short-term learning approach without the burden of the complex database maintenance strategies required by long-term learning approaches.

I. INTRODUCTION

In the last decade, query tuning using relevance feedback (RF) has gained much attention in the research and development of content-based image retrieval (CBIR) systems. This is largely due to RF's ability to refine the user query through a sequence of interactive sessions. Various approaches [1] have been introduced for RF in CBIR systems, and all have yielded a certain degree of success. However, most of the research has focused on query tuning within a single retrieval session, commonly known as intra-query learning. In contrast, inter-query learning, also known as long-term learning, is a strategy that attempts to analyze the relationship between the current and past retrieval sessions. By accumulating knowledge learned from previous sessions, inter-query learning aims to further improve the retrieval performance of the current and future sessions. One may view inter-query learning as an extension of intra-query learning. Although intra-query learning in CBIR has been a topic of research for the last decade, inter-query learning has only begun to attract interest in the last few years and is yet to be fully explored. Most studies on inter-query learning for CBIR systems focus on establishing relationships between previous queries by analyzing the image retrieval patterns among the queries.


It is commonly assumed that if the retrieval patterns for two queries are similar, then the users must be searching for similar images. A weakness of this approach is that it assumes the database is static: the query is based on a comparison between the query images and the indices of the database. This approach is not well suited to applications where images are frequently added or removed. On the other hand, the feature vector model may be a more suitable approach for inter-query learning. In the feature vector model approach, the size of the feature vector and the number of clusters used for representing the captured information become the important factors in determining the size of the user log. Building on the short-term learning method proposed by Zhou and Huang [2], this paper extends their idea by incorporating inter-query learning into the framework.

This paper begins with a brief review of some of the current approaches used in inter-query learning for content-based image retrieval systems. The paper then provides a description of the proposed framework. Results from experiments conducted on the implemented method are then reported. Lastly, conclusions are drawn from the experimental findings.

II. BACKGROUND

From the reported literature, inter-query learning can be classified into two approaches. The first approach is based on traditional text search and retrieval systems. In this approach, the goal is to establish the relationship between the current and previous query sessions by analyzing the image retrieval pattern between the sessions. It is assumed that if two retrieval sessions have similar image retrieval patterns, then the users must be searching for similar images. Taking this further, the assumption can be extended by saying that if two images have similar retrieval patterns, then these images must be semantically similar. The second approach is based on the feature vector model. The concept of this approach is to change the scale of the feature vector coordinates in order to bring similar images close to one another. The transformation of the feature vector coordinates may be done through a simple weighting scheme or a kernel matrix transformation technique. The following sub-sections provide a more detailed description of the two approaches.

A. Query Session Retrieval Pattern

The query session retrieval pattern approach is by far the most popular approach in inter-query learning.


While some have taken the approach of analyzing the relationship between the current query and the previous user queries stored in the user log, others have taken this further by establishing the "hidden" relationship between the current session and the previous retrieval information. In the extended approach, the retrieval information is stored in an abstract format mirroring the semantic relationship between the query sessions. The following provides a more detailed description of these approaches.

1) Basic Approach

Fournier and Cord [3] provide one of the earliest reports on the use of long-term learning in CBIR systems. In their approach, a matrix is generated from the image database listing the semantic relationship between the images in the database. One may view it as an N * N table, where N is the number of images in the database. The table stores the semantic score of each image against every other image. The semantic score is given manually by the user through the retrieval feedback cycles. Under this approach, the semantic relationship between an image and all the retrieved images is simply the average semantic score of this image over all retrieved images.

Latent semantic indexing (LSI) is a technique for analyzing the relationship between terms and documents in text, and it has long been a popular method for analyzing text documents. The technique uses singular value decomposition (SVD) to reduce the dimensionality of the term-document space in order to highlight the important relationships between the terms and the documents. For CBIR inter-query learning, Heisterkamp [4] adopted the LSI method to provide a generalization of the relationship between the current query and the search history. In this approach, images in the database are viewed as the "terms" while each query session becomes a "document" in the LSI terminology. The relevance feedback result from each query is considered a document composed of many terms (images). This approach needs no modification of the existing LSI technique. One of its weaknesses is that it requires all terms in the system to be normalized to unit length. The number of terms grows as the system gathers more user sessions, which may lead to a scalability problem in system storage. A similar approach has also been adopted in [5, 6] for long-term learning. These approaches are all based on the assumption that if two images have similar retrieval and labeling patterns (i.e. both receive the same label in a search session), then the semantic content of the two images must be similar. The same assumption has also been adopted in [7, 8], where statistical correlation is used to analyze the relationship between the current retrieval session and the previous retrieval user log. Similarly, in [9, 10], semantic clustering was performed using a support vector machine (SVM) and an active learning strategy. Scalability is again an issue for Gosselin and Cord [9, 10] in that the kernel matrix for the SVM grows non-linearly with the number of images stored in the database. To resolve this problem, Gosselin and Cord decomposed the kernel matrix by calculating its eigenvalues and eigenvectors, which reduces the size of the user log.
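As a rough illustration of this LSI-style formulation (a minimal sketch only, not the implementation used in [4]; the toy session matrix, the number of retained dimensions and all variable names are hypothetical), the images-by-sessions matrix can be factorized with SVD and a new session compared with past sessions in the reduced space:

```python
import numpy as np

# Toy "term-document" matrix: rows = images, columns = past query sessions.
# Entry (i, j) = 1 if image i was labeled relevant in session j (hypothetical data).
A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

# Truncated SVD: keep k latent dimensions to expose the main image/session relationships.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Project a new session (its relevant-image indicator vector) into the latent space
# and rank past sessions by cosine similarity.
q = np.array([1, 1, 0, 0, 0], dtype=float)   # hypothetical new feedback vector
q_latent = (q @ U_k) / s_k                   # standard LSI fold-in of the new "document"
sessions_latent = Vt_k.T                     # past sessions represented in the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

scores = [cosine(q_latent, v) for v in sessions_latent]
print("similarity to past sessions:", np.round(scores, 3))
```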

2) Intermediate Approach

The approaches discussed in 1) all capture the query session retrieval pattern in a "raw" format: a two-dimensional table, with images and retrieval sessions as the rows and columns, is generally used to record the query session retrieval pattern. These approaches do not take full advantage of the information available by learning the hidden semantics between the query sessions. The main advantage of presenting the query information at a more abstract level is that it provides an elegant way of identifying relationships between images which are not directly linked in the previous retrieval sessions. In addition, such an approach also provides the system with a more logical way of organizing data into different semantic groups. Using an abstract approach, [11-13] process the query session retrieval pattern before the data is saved in the database. In [13], a system numbering scheme, called virtual features (VF), is used for calculating the semantic similarity between images. The basic idea is to encode the retrieved images with a number created by the system during each retrieval session. At the initial stage, all images in the database begin with an empty set of VFs. As the retrieval session begins, the system starts assigning VFs to the retrieved relevant images. During the feedback process, if the user selects an image that has an existing VF set, the system appends the newly assigned VF to the image's VF set. In this way, similarity can simply be calculated by analyzing the numbering pattern between the images' VF sets. Han et al. [11, 12] analyze the relationship between query sessions through the ratio of the co-positive-feedback frequency to the co-feedback frequency. The co-positive-feedback frequency is the number of times that two images are jointly identified as positive images and retrieved during the same query session. The co-feedback frequency is the number of times when both are retrieved but not necessarily positively labeled. The papers refer to the ratio between the two factors as the link intensity. In this approach, the centre of a cluster of similar images is determined by the link intensity of each image over the entire retrieved image set; the images with high link intensity become the centres of the clusters. The K-means algorithm is then used to cluster the rest of the retrieved images into different semantic groups based on the link intensity between the images. The relationship strength between two semantic groups is based on the sum of the link intensities of the images that belong to the two groups.

3) Hierarchical Approach

Instead of simply analyzing the user retrieval pattern and converting the information into an intermediate feature, Jiang, Er and Dai [14] take the approach further by introducing a multi-layer semantic structure into the system. In their approach, each user retrieval session is first analyzed and converted into a concept. The creation of a new concept, and the identification of the relationship between concepts, depends on whether the query image set is a subset of an existing concept.


Such an approach provides the system with a more informative way of capturing information and possibly also improves retrieval efficiency. However, the paper does not attempt to make any connection between the different levels of concepts. Without this connection, the system is unable to establish links between different levels of abstract concepts, or relationships between concepts at the same concept level.

B. Feature Vector Model Approach

The query session retrieval pattern approach provides an excellent way of analyzing the semantic relationship between images without focusing on the images' visual features. However, such an approach is not suitable for applications where images are frequently added or removed. A more suitable approach is to use the feature vector model to analyze the inter-query relationship. The image feature driven approach has been dominated by the SVM approach [15-18]. Gondra, Heisterkamp and Peng [18] use a one-class support vector machine (1-SVM) to calculate the query point for the search session. The system only stores the positive centroid of each query session. The inter-query similarity measure is performed by calculating the distances between the positive centroid points of the present and past query sessions. Test results reported in [18] show an improvement over the user driven pattern approach discussed in the previous section [7, 13]. In [17], Gondra and Heisterkamp further expanded their work by including a query point merging policy into the system based on the K-means clustering approach. This is designed to control the size of the user log by constraining the number of clusters used in the log. Gondra and Heisterkamp also reported an improvement in retrieval accuracy over their previous work. Instead of SVM, [15, 16] use a self-organizing map (SOM) in an attempt to cluster images into different groups according to the user driven pattern. The aim of the SOM is to modify the weights of the features so that images selected during the same retrieval session stay close together, while images selected from different retrieval sessions move further apart. It is unclear how such a framework can accommodate the scenario of an image being positively labeled during several different retrieval sessions.

III. PROPOSED FRAMEWORK

A. Overall Framework

Figure 1 shows the architectural model for the proposed framework. At the beginning, an image query is submitted to the system. Based on the visual features, the Euclidean distance is used to quantify the similarity between the query image and the images within the database, and the values are ranked accordingly. By presenting the top ranked images to the user, the system then invites the user to select the relevant images from the displayed images. The selected images are labeled as positive by the system, while the rest are labeled as negative.

The system then uses the feedback information, together with both the short-term and long-term learning strategies, to create a new query point for the next retrieval cycle. At the end of each retrieval session, the retrieved data is saved and updated in the user log. In the user log, each retrieval session is captured by a data cluster that consists of a centroid point, a boundary value and a transformation matrix. The transformation matrix is used to transform the extracted image features from the original feature space to a new feature space. If an image falls within the boundary, the image is said to be part of this group of images. In the present approach, the boundary distance is defined as the distance of the furthest positive sample from the centroid. One may view the inter-query learning analysis as identifying the data cluster that is able to include the largest number of the current positive samples. The searching threshold value, $T_p$, is defined as:

$$T_p = \frac{N_{ni}}{N_p} \qquad (1)$$

where $N_p$ denotes the total number of positive samples gathered during the feedback cycle, and $N_{ni}$ is the number of positive samples that fall within the boundary of cluster $i$.
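A minimal sketch of this cluster search is given below. It assumes a hypothetical cluster record holding the transformation matrix, centroid and boundary distance described above; the class and function names are illustrative only, not the authors' implementation:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class LogCluster:
    """One user-log entry: transformation matrix W, centroid and boundary (assumed layout)."""
    W: np.ndarray          # transformation matrix (original space -> cluster feature space)
    centroid: np.ndarray   # positive centroid in the cluster's feature space
    boundary: float        # distance of the furthest positive sample from the centroid

    def contains(self, x: np.ndarray) -> bool:
        """An image belongs to the cluster if its transformed feature lies inside the boundary."""
        z = self.W.T @ x
        return np.linalg.norm(z - self.centroid) <= self.boundary

def search_threshold(cluster: LogCluster, positives: np.ndarray) -> float:
    """Expression (1): fraction of current positive samples falling inside cluster i."""
    n_inside = sum(cluster.contains(x) for x in positives)
    return n_inside / len(positives)

def best_cluster(log: list[LogCluster], positives: np.ndarray):
    """Pick the stored cluster that covers the most current positive samples."""
    scored = [(search_threshold(c, positives), c) for c in log]
    return max(scored, key=lambda t: t[0]) if scored else (0.0, None)
```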

Fig 1. Abstract Architectural Model for the Proposed Learning Framework

Once the cluster within the user log is identified, the system has to decide how the cluster is to be utilized. The system can either merge the feedback samples with the identified cluster to form a new query point, or it can create multiple query points. The multiple query points are based on the short-term learning strategy, using the feedback samples and the query points of the previously identified groups in the user log. The decision on whether to create a merged group or separate query points is based on the searching threshold value given in expression (1): if the value is above a certain threshold, a new merged group is created; otherwise, two separate queries are created. This is designed to allow the user to further explore other related visual groups that have similar semantic content. At the end of the retrieval session, the user log is updated.
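The decision between a merged group and separate query points can be sketched as below (illustrative only; the threshold name and value are assumptions not fixed by the paper at this point):

```python
import numpy as np

MERGE_THRESHOLD = 0.5  # hypothetical value; the paper does not state this constant here

def expand_query(t_p: float, cluster, positives: np.ndarray):
    """Decide between one merged group and multiple query points (illustrative sketch).

    t_p       -- searching threshold value from expression (1) for the best stored cluster
    cluster   -- the identified user-log cluster (or None if the log is empty)
    positives -- current positive feedback samples, shape (n, d)
    """
    short_term_query = positives.mean(axis=0)   # short-term query point from the feedback samples
    if cluster is not None and t_p >= MERGE_THRESHOLD:
        # Enough current positives fall inside the stored cluster: merge the feedback
        # samples with that cluster (the statistics update is described in Section III.C).
        return {"mode": "merge", "cluster": cluster, "samples": positives}
    # Otherwise issue separate query points: the short-term query plus the stored
    # group's query point, letting the user explore related visual groups.
    queries = [short_term_query] + ([cluster.centroid] if cluster is not None else [])
    return {"mode": "multi", "queries": queries}
```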


The user log updating policy is similar to the query expansion policy discussed in the previous paragraph. However, the searching threshold value is different from the criterion shown in expression (1): negative feedback samples are also used to determine the most suitable data cluster for updating. Negative feedback samples are needed because they provide additional information on the discriminating ability of the selected data cluster with respect to the feedback samples. This threshold value, $T_t$, is defined as:

$$T_t = \frac{N_{si}}{N_t} \qquad (2)$$

where $N_t$ denotes the total number of positive samples gathered during the retrieval session, and $N_{si}$ is the number of positive samples that fall within the boundary of cluster $i$. Using the feedback samples, the system first searches through the groups to find the most appropriate group. A new group is created if the searching criterion is less than the threshold value; otherwise the current group is updated with the samples.

B. Support Vector Machine vs Discriminant Analysis

The relationship between images in the visual feature space is often found to be non-linear. In the statistical machine learning approach, the support vector machine (SVM) [19] and kernel discriminant analysis [2, 20, 21] are the two most popular approaches used to cluster non-linear data. Both approaches are essentially based on the kernel method to handle non-linear data. In addition, as shown in [2, 18], mathematical expressions can easily be derived in both approaches to favor the small number of positively labeled data against the large amount of non-relevant data. This is commonly known as the biased approach, and it is important to the field of CBIR as most of the sample images are non-relevant. One of the weaknesses of SVM for data clustering is that the cluster cannot be directly updated with newly acquired data once the cluster is formed. This is due to the non-linear data transformation from the original input space to the transformed feature space. Furthermore, the non-linear transformation may result in information loss due to compression. This implies that the exact original input data cannot be recovered once the data is transformed; one can only settle for an approximate solution through pre-image calculation [22, 23]. This is the approach used in [17] for merging the data clusters that are stored in the user log. The pre-image estimation often requires a certain number of training samples to calculate the approximation; Kwok and Tsang [23] used 60 to 300 training samples for the estimation. This may not be feasible for a CBIR system. Lastly, the pre-image calculation often introduces extra parameters into the system, which results in additional complexity. As an alternative to SVM, one may consider discriminant analysis. The following section provides a way for the system to update the kernel matrix for the discriminant analysis without applying pre-image calculation.

C. Kernel Biased Discriminant Analysis (KBDA)

The idea of kernel biased discriminant analysis (KBDA) [2] is to transform data from the original space to a new feature space that best discriminates the positive images from the given negative samples. This is done by applying a set of weight vectors $W$ that maximizes the ratio between the biased matrix $S_y^{\phi}$ and the positive covariance matrix $S_x^{\phi}$:

$$W_{opt} = \arg\max_{W} \frac{W^T S_y^{\phi} W}{W^T S_x^{\phi} W} \qquad (3)$$

One can view the above expression as a generalized eigen-analysis problem, where the optimal eigenvectors associated with the eigenvalues form the transformation matrix for the new feature space. In KBDA, the positive covariance matrix $S_x^{\phi}$ and the biased matrix $S_y^{\phi}$ are defined as:

$$S_y^{\phi} = \sum_{i=1}^{N_y} \left( \phi(y_i) - m_x^{\phi} \right) \left( \phi(y_i) - m_x^{\phi} \right)^T \qquad (4)$$

$$S_x^{\phi} = \sum_{i=1}^{N_x} \left( \phi(x_i) - m_x^{\phi} \right) \left( \phi(x_i) - m_x^{\phi} \right)^T \qquad (5)$$

where $\{x_i\} = \{x_1, \ldots, x_{N_x}\}$ and $\{y_i\} = \{y_1, \ldots, y_{N_y}\}$ denote the positive and negative examples respectively. The variables $N_x$ and $N_y$ are the respective total numbers of given positive and negative samples, the mean vector of the positive transformed samples is denoted $m_x^{\phi}$, and $\phi$ is the kernel mapping function.

By using KBDA as the data clustering tool, each cluster group contains a centroid point, a reference distance and the transformation matrix. The transformation matrix of a cluster group cannot be directly updated by the feedback result. However, since the transformation matrix is derived from the covariance matrices shown in expressions (4) and (5), the update can be done through these two equations. Through matrix manipulation, the two equations can be re-written as:

$$S_x = \frac{1}{N_x} \left( \sum_{i=1}^{N_x} x_i x_i^T \right) - m_x^{\phi} \, m_x^{\phi\,T} \qquad (6)$$

$$S_y = \frac{1}{N_y} \left( \sum_{i=1}^{N_y} y_i y_i^T \right) - m_y^{\phi} \, m_x^{\phi\,T} - m_x^{\phi} \, m_y^{\phi\,T} + m_x^{\phi} \, m_x^{\phi\,T} \qquad (7)$$

where $m_y^{\phi}$ denotes the mean vector of the negative samples, also known as the negative centroid. By examining equations (6) and (7), it can be seen that the covariance matrices $S_x$ and $S_y$ are defined by the expressions $\sum_{i=1}^{N_y} y_i y_i^T$ and $\sum_{i=1}^{N_x} x_i x_i^T$, together with the number of positive samples, the number of negative samples, and the centroids of the positive and negative samples. All of these variables can be updated from the given sample data. Hence, unlike SVM, no pre-image calculation is needed. The parameters can easily be updated with the following equations:

$$\sum_{i=1}^{N_y} y_i y_i^T = \sum_{i=1}^{O\_N_y} O\_y_i \, O\_y_i^T + \sum_{j=1}^{A\_N_y} A\_y_j \, A\_y_j^T \qquad (8)$$

$$\sum_{i=1}^{N_x} x_i x_i^T = \sum_{i=1}^{O\_N_x} O\_x_i \, O\_x_i^T + \sum_{j=1}^{A\_N_x} A\_x_j \, A\_x_j^T \qquad (9)$$

$$new\_m_x = \frac{O\_m_x^{\phi} \cdot O\_N_x + A\_m_x^{\phi} \cdot A\_N_x}{O\_N_x + A\_N_x} \qquad (10)$$

$$new\_m_y = \frac{O\_m_y^{\phi} \cdot O\_N_y + A\_m_y^{\phi} \cdot A\_N_y}{O\_N_y + A\_N_y} \qquad (11)$$

where variables with the "O_" prefix denote the original stored values and variables with the "A_" prefix denote the newly acquired data. One can see that the matrix parameters $\sum_{i=1}^{N_x} x_i x_i^T$ and $\sum_{i=1}^{N_y} y_i y_i^T$ are symmetric matrices. Hence, one only has to save the matrices in triangular format to represent the entire matrix. This reduces the storage size of each matrix in the user log from $N * N$ to $\sum_{i=1}^{N} i = N(N+1)/2$ entries, which is a significant reduction in terms of data storage for a large matrix. Furthermore, since the two matrix parameters are always positive definite, the eigenvalues of the two matrices will always be real and positive. Hence, one can further reduce the size of the matrices by decomposing them via the singular value decomposition (SVD) method.
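The incremental update and the re-estimation of the transformation matrix described by expressions (3) and (6)-(11) can be sketched as follows. This is a simplified illustration that works directly in a feature space (the kernel mapping, the regularization constant and all names are assumptions, not the authors' code):

```python
import numpy as np
from scipy.linalg import eigh

class ClusterStats:
    """Sufficient statistics of one user-log cluster (assumed storage layout)."""

    def __init__(self, dim: int):
        self.sum_xx = np.zeros((dim, dim))   # running sum of x_i x_i^T  (positives)
        self.sum_yy = np.zeros((dim, dim))   # running sum of y_i y_i^T  (negatives)
        self.m_x = np.zeros(dim); self.n_x = 0
        self.m_y = np.zeros(dim); self.n_y = 0

    def update(self, pos: np.ndarray, neg: np.ndarray):
        """Expressions (8)-(11): fold newly acquired samples into the stored statistics."""
        self.sum_xx += pos.T @ pos                       # (9): add sum of A_x_j A_x_j^T
        self.sum_yy += neg.T @ neg                       # (8): add sum of A_y_j A_y_j^T
        a_nx, a_ny = len(pos), len(neg)
        self.m_x = (self.m_x * self.n_x + pos.mean(axis=0) * a_nx) / (self.n_x + a_nx)  # (10)
        self.m_y = (self.m_y * self.n_y + neg.mean(axis=0) * a_ny) / (self.n_y + a_ny)  # (11)
        self.n_x += a_nx; self.n_y += a_ny

    def transform_matrix(self, n_components: int = 2, reg: float = 1e-6) -> np.ndarray:
        """Expressions (6), (7) and (3): rebuild S_x, S_y and solve the generalized
        eigenproblem S_y w = lambda S_x w, keeping the leading eigenvectors as W."""
        s_x = self.sum_xx / self.n_x - np.outer(self.m_x, self.m_x)                      # (6)
        s_y = (self.sum_yy / self.n_y - np.outer(self.m_y, self.m_x)
               - np.outer(self.m_x, self.m_y) + np.outer(self.m_x, self.m_x))            # (7)
        s_x += reg * np.eye(len(s_x))                    # keep S_x well conditioned
        vals, vecs = eigh(s_y, s_x)                      # generalized symmetric eigenproblem
        return vecs[:, np.argsort(vals)[::-1][:n_components]]   # columns = leading eigenvectors
```

In use, `update` would be called with the newly labeled positive and negative samples after each retrieval session, and `transform_matrix` re-derives $W$ from the stored statistics without any pre-image calculation.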

D. Cluster Merging

As stated in Section III.A, each cluster group in the user log is updated with the feedback samples using the method discussed in the previous section. Since some of the clusters have been updated with the feedback samples, it is possible that these clusters now capture similar data information. If this is the case, the updated clusters should be merged together. The similarity between clusters can be determined by the Euclidean distance between the two positive centroid points. However, merging the defined clusters is not as straightforward. While the transformation matrix of a cluster can be updated via the method discussed in the previous section, there is no obvious way of determining the reference distance of the newly merged cluster. However, one can estimate the new reference distance from the provided user feedback samples. Since the two chosen clusters can discriminate the user feedback samples with a certain degree of accuracy, one can assume that the newly merged cluster also inherits this accuracy. Based on this assumption, the new reference distance can be derived from the maximum distance of the provided positive samples to the new positive centroid. If the newly merged cluster cannot discriminate the feedback samples with 100% accuracy, then one may use the average of the maximum distance of the positive samples and the minimum distance of the negative samples as the reference distance value. This is an intuitive approach adopted in this study; more elaborate or sophisticated approaches are currently being examined by the authors. At present, the cluster comparison and merging steps adopted are as follows (a sketch of these steps is given after the list):

1) Compare two clusters by:
a) Transforming both positive centroid points into the feature spaces defined by the two clusters via their respective transformation matrices.
b) Calculating the Euclidean distances between the transformed positive centroid points in the respective feature spaces.
c) Merging the two clusters if either of the resulting distances is smaller than or equal to the respective threshold value. The threshold value for a cluster is set to the reference distance of the cluster multiplied by a constant. In this paper, the value of the constant is set to 0.1; this value was determined by trial and error. The application of fuzzy logic to this problem is part of ongoing research.

2) Merge the two clusters by:
a) Performing the calculations listed in expressions (8) to (11).
b) Using the results to calculate the transformation matrix W as given in expression (3).
c) Using the transformation matrix to calculate the discrimination accuracy on the user feedback samples.
d) If the discrimination accuracy is 100%, setting the new reference distance to the maximum distance of the positive samples to the positive centroid.
e) Otherwise, setting the reference distance to the average of the maximum distance of the positive samples and the minimum distance of the negative samples.
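A minimal sketch of the comparison-and-merge steps is given below. It carries over the hypothetical `ClusterStats` helper from the previous sketch; the cluster field names (`W`, `centroid`, `centroid_orig`, `boundary`, `stats`) are assumptions, and the 100% accuracy check is simplified to full separation of the feedback samples:

```python
import numpy as np

MERGE_CONSTANT = 0.1  # threshold scale used in this paper (determined by trial and error)

def should_merge(c1, c2) -> bool:
    """Step 1): compare two clusters through their transformed positive centroids."""
    # Transform each cluster's original-space centroid into the other's feature space.
    d_in_1 = np.linalg.norm(c1.W.T @ c2.centroid_orig - c1.centroid)
    d_in_2 = np.linalg.norm(c2.W.T @ c1.centroid_orig - c2.centroid)
    return (d_in_1 <= MERGE_CONSTANT * c1.boundary) or (d_in_2 <= MERGE_CONSTANT * c2.boundary)

def merge(c1, c2, positives: np.ndarray, negatives: np.ndarray):
    """Step 2): combine statistics, rebuild W, and re-estimate the reference distance."""
    stats = c1.stats                       # expressions (8)-(11): fold c2 into c1's statistics
    stats.sum_xx += c2.stats.sum_xx
    stats.sum_yy += c2.stats.sum_yy
    stats.m_x = (stats.m_x * stats.n_x + c2.stats.m_x * c2.stats.n_x) / (stats.n_x + c2.stats.n_x)
    stats.m_y = (stats.m_y * stats.n_y + c2.stats.m_y * c2.stats.n_y) / (stats.n_y + c2.stats.n_y)
    stats.n_x += c2.stats.n_x
    stats.n_y += c2.stats.n_y

    W = stats.transform_matrix()           # expression (3) on the merged statistics
    centroid = W.T @ stats.m_x             # new positive centroid in the merged feature space
    d_pos = np.array([np.linalg.norm(W.T @ x - centroid) for x in positives])
    d_neg = np.array([np.linalg.norm(W.T @ y - centroid) for y in negatives])

    if d_pos.max() < d_neg.min():          # feedback samples fully separated (100% accuracy)
        boundary = d_pos.max()
    else:                                  # otherwise average the two extreme distances
        boundary = 0.5 * (d_pos.max() + d_neg.min())
    return W, centroid, boundary
```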


IV. EXPERIMENT

A. Prototype System

To evaluate the performance of the proposed approach, the authors have designed and implemented a prototype CBIR system using the Matlab simulation software toolbox. The user interface of the prototype system is shown in figure 2. The prototype system provides the user with the ability to query the image database with an image sample. After the first retrieval iteration, users can select the relevant images while ignoring the non-relevant images. The system labels the selected images as positive and treats the ignored images as negative. The retrieval procedure of the prototype system is as follows (a sketch of the main loop is given after the list):

1) The user inputs a query image.
2) The visual features of the query are extracted by the system.
3) All images in the database are sorted in ascending order of their dissimilarity distances.
4) The top ranked 20 images are displayed.
5) The user selects the positive images; the rest are automatically labeled as negative images.
6) The labeled images are processed by the proposed framework as described in Section III.A, and the appropriate data cluster is selected from the user log.
7) New queries are created for image retrieval.
8) The top 20 images that have not been labeled by the user are ranked and displayed.
9) Go back to step 4 for the next retrieval cycle.
10) The user log is updated at the end of the retrieval session.
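A condensed sketch of this loop is shown below (Euclidean ranking and a simple positive-mean query update only; the feature extraction, the user-feedback callback and the log update are stubs or assumptions, not the prototype's code):

```python
import numpy as np

def rank_by_distance(query: np.ndarray, features: np.ndarray, exclude: set[int]) -> list[int]:
    """Steps 3, 4 and 8: sort unlabeled database images by Euclidean distance to the query."""
    dists = np.linalg.norm(features - query, axis=1)
    order = [int(i) for i in np.argsort(dists) if int(i) not in exclude]
    return order[:20]

def retrieval_session(query_feat: np.ndarray, features: np.ndarray, get_user_feedback, n_cycles: int = 4):
    """One retrieval session with n_cycles relevance-feedback iterations (sketch)."""
    labeled, positives = set(), []
    query = query_feat.copy()
    for _ in range(n_cycles):
        shown = rank_by_distance(query, features, exclude=labeled)
        pos_ids = get_user_feedback(shown)               # step 5: user marks the relevant images
        labeled.update(shown)
        positives.extend(features[i] for i in pos_ids)
        if positives:                                     # steps 6-7 (simplified): new query point
            query = np.mean(positives, axis=0)
    return positives, labeled                             # step 10 would update the user log here
```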

Fig 2. The user interface of the prototype system.

B. Experiment Environment

In order to test the performance of the proposed approach, three systems have been implemented: (i) the proposed framework, (ii) the same framework but without the merging mechanism, and (iii) a short-term learning framework based on KBDA as reported in [2]. To ensure the validity of the experiment, the environment and parameters used by all three systems are identical: the image features and the generalized eigenvector calculation method are the same, and the same parameters are used in the kernel transformation algorithm for all three systems. In this experiment, the visual features selected for analyzing the shape, color and texture of the images are the water-filling edge histogram algorithm [24], the HSV color coherence vector [25], the HSV histogram, the global edge detection algorithm [26], the HSV color moments [27] and the color intensity histogram. Each feature comprises a number of elements; a total of sixty-five feature elements have been used. Lastly, the Gaussian radial basis function (RBF) is selected as the kernel for the KBDA approach, as suggested in [2, 28], where the RBF is reported to yield the best accuracy among the kernel transformation approaches considered.

C. Experiment Procedure and Data

In this experiment, 1500 images from the Corel image database were used. Within these images, 150 images are classified under three different themes, with each theme containing 50 images. Each semantic theme may contain two or more groups of images which are visually different. This is done specifically to capture the nature of a user search process, where the targeted group of images may often be visually different and yet semantically similar. The test was carried out by randomly selecting ten positively labeled images from each theme as entry points to the system. Each retrieval session is triggered by a different image, and each session goes through four RF cycles before the retrieval session is ended. The retrieval accuracy, speed and number of clusters used by the system are the factors used for comparing the performance of the systems.
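The evaluation protocol above can be expressed compactly as below (a sketch only; the accuracy metric is assumed to be the fraction of displayed images belonging to the query's theme, and the session driver is a hypothetical callable such as the `retrieval_session` sketch in Section IV.A):

```python
import numpy as np

def session_accuracy(retrieved_ids, relevant_ids) -> float:
    """Assumed metric: fraction of displayed images that belong to the query's theme."""
    retrieved_ids = list(retrieved_ids)
    hits = sum(1 for i in retrieved_ids if i in relevant_ids)
    return hits / max(len(retrieved_ids), 1)

def run_theme(theme_images, all_features, run_session, n_queries: int = 10, seed: int = 0):
    """Randomly pick query images from a theme, run sessions of four RF cycles each,
    and average the per-session accuracy (illustrative protocol, not the authors' harness)."""
    rng = np.random.default_rng(seed)
    queries = rng.choice(list(theme_images), size=n_queries, replace=False)
    scores = []
    for q in queries:
        retrieved = run_session(all_features[q], n_cycles=4)   # hypothetical session driver
        scores.append(session_accuracy(retrieved, set(theme_images)))
    return float(np.mean(scores))
```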

Fig 3. Retrieval results from the twenty retrieval sessions for each semantic group.

D. Test Result and Discussion

Figure 3 shows the average accuracy of the first thirty retrieval sessions for the three semantic groups. The results show that, with the exception of the first retrieval session, the short-term learning approach has the worst retrieval performance. The average retrieval accuracy of the first session is more or less as expected: at the beginning of the sessions, without any information stored in the user log, the long-term learning framework effectively acts like short-term learning. The retrieval accuracy of the long-term learning framework improves after the first retrieval session.


This is due to its ability to capture knowledge from the previously learned queries. In addition, the multiple query-points strategy also improves the retrieval accuracy. In terms of retrieval accuracy, the results of the two long-term learning frameworks are similar, which indicates that no valuable information is lost during the merging exercise. By examining figure 4, it can be seen that the long-term learning framework with cluster merging uses significantly fewer clusters to achieve the same retrieval performance as the approach that does not use merging. The reduction in the number of clusters also makes the user log of the merging framework much smaller than that of the non-merging framework. Since there are fewer clusters to search in the cluster merging framework, its retrieval speed is faster than that of the non-merging framework. The improvement in retrieval time can be observed in figure 5, which shows the actual time spent in the retrieval sessions. In addition, figures 4 and 5 show that there is a strong correlation between the number of clusters used and the retrieval speed of the system.

Fig 4. Number of clusters generated from the retrieval sessions.

V. CONCLUSION

A statistical discriminant analysis framework for inter-query learning in CBIR systems has been introduced. The proposed framework is primarily designed to capture the relationship between images and previous queries without the need to "memorize" the actual retrieval relationship between the actual images and the previous queries. The experimental results support this goal. This paper also proposed an approach to reduce the user log size through a cluster merging scheme based on discriminant analysis and relevance feedback. The test results demonstrate that the proposed cluster merging framework can significantly reduce the size of the user log and improve the retrieval speed while maintaining the retrieval accuracy of the system. Currently, only a flat structure is employed in grouping the images. A hierarchical structure could be used to organize image groups that are semantically similar and yet visually different, which should lead to more accurate retrieval results. In addition, the criteria for determining whether two clusters are to be merged are currently based on the reference distance of the cluster and a threshold value that is determined experimentally. The use of fuzzy logic in determining the merging criteria will be a part of future research.

Fig 5. Time spent in the retrieval sessions.

REFERENCES
[1] X. S. Zhou and T. S. Huang, "Relevance Feedback in Image Retrieval: A Comprehensive Review," ACM Multimedia Systems Journal, vol. 8, pp. 536-544, 2003.
[2] X. S. Zhou and T. S. Huang, "Small Sample Learning During Multimedia Retrieval using BiasMap," IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, United States, December 2001.
[3] J. Fournier and M. Cord, "Long-Term Similarity Learning in Content-Based Image Retrieval," IEEE International Conference on Image Processing, Rochester, New York, USA, September 2002.
[4] D. R. Heisterkamp, "Building a Latent Semantic Index of an Image Database from Patterns of Relevance Feedback," Proceedings of the International Conference on Pattern Recognition, Quebec, Canada, August 11-15, 2002.
[5] X. He, O. King, W.-Y. Ma, M. Li, and H. J. Zhang, "Learning a Semantic Space from User's Relevance Feedback for Image Retrieval," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 39-48, 2003.
[6] M. Koskela and J. Laaksonen, "Using Long-Term Learning to Improve Efficiency of Content-Based Image Retrieval," Third International Workshop on Pattern Recognition in Information Systems, Angers, France, April 2003.
[7] M. Li, Z. Chen, and H. J. Zhang, "Statistical Correlation Model for Image Retrieval," Pattern Recognition, vol. 35, pp. 2687-2693, 2002.
[8] C.-H. Hoi and M. R. Lyu, "A Novel Log-based Relevance Feedback Technique in Content-based Image Retrieval," ACM Multimedia Conference, New York, USA, October 14-16, 2004.
[9] P. H. Gosselin and M. Cord, "Semantic Kernel Updating for Content-Based Image Retrieval," IEEE International Conference on Image Processing, Genova, Italy, September 11-14, 2005.
[10] P. H. Gosselin and M. Cord, "Semantic Kernel Learning for Interactive Image Retrieval," IEEE International Conference on Image Processing, Genova, Italy, September 11-14, 2005.
[11] J. Han, M. Li, and L. Guo, "A Memorization Learning Model for Image Retrieval," IEEE International Conference on Image Processing, Barcelona, Spain, September 14-17, 2003.
[12] J. Han, K. N. Ngan, M. Li, and H. J. Zhang, "A Memory Learning Framework for Effective Image Retrieval," IEEE Transactions on Image Processing, vol. 14, pp. 511-524, 2005.
[13] P.-Y. Yin, B. Bhanu, K.-C. Chang, and A. Dong, "Improving Retrieval Performance by Long-term Relevance Information," Proceedings of the 16th International Conference on Pattern Recognition, Quebec, Canada, August 11-15, 2002.
[14] W. Jiang, G. Er, and Q. Dai, "Multi-Layer Semantic Representation Learning for Image Retrieval," IEEE International Conference on Image Processing, Singapore, October 24-27, 2004.
[15] C. C. Hang, "Using Biased Support Vector Machine in Image Retrieval with Self-Organizing Map," in Computer Science and Engineering. Hong Kong: The Chinese University of Hong Kong, 2004, p. 104.
[16] C.-H. Chan and I. King, "Using Biased Support Vector Machine to Improve Retrieval Result in Image Retrieval with Self-Organizing Map," International Conference on Neural Information Processing, Calcutta, India, November 22-25, 2004.
[17] I. Gondra and D. R. Heisterkamp, "Summarizing Inter-Query Learning in Content-Based Image Retrieval via Incremental Semantic Clustering," Proceedings of the International Conference on Information Technology: Coding and Computing, Las Vegas, United States, April 5-7, 2004.
[18] I. Gondra, D. R. Heisterkamp, and J. Peng, "Improving the Initial Image Retrieval Set by Inter-Query Learning with One-Class SVMs," Proceedings of the 3rd International Conference on Intelligent Systems Design and Applications, Tulsa, Oklahoma, United States, August 10-13, 2004.
[19] S. Tong and E. Chang, "Support Vector Machine Active Learning for Image Retrieval," Proceedings of the 9th ACM International Multimedia Conference, Ottawa, Canada, September 30 - October 5, 2001.
[20] K.-P. Chung, C. C. Fung, and K. W. Wong, "A Feature Selection Framework for Small Sampling Data in Content-based Image Retrieval System," 5th International Conference on Information, Communications and Signal Processing, Bangkok, Thailand, December 6-9, 2005.
[21] X. S. Zhou and T. S. Huang, "Comparing Discriminating Transformations and SVM for Learning during Multimedia Retrieval," Proceedings of the 9th ACM International Multimedia Conference, Ottawa, Ontario, Canada, September 30 - October 5, 2001.
[22] C. Burges, "Simplified Support Vector Decision Rules," Proceedings of the 13th International Conference on Machine Learning, July 1996.
[23] J. T. Kwok and I. W. Tsang, "The Pre-Image Problem in Kernel Methods," Proceedings of the 20th International Conference on Machine Learning, Washington DC, USA, 2003.
[24] X. S. Zhou and T. S. Huang, "Edge-based Structural Features for Content-based Image Retrieval," Pattern Recognition Letters, vol. 22, pp. 457-468, 2001.
[25] G. Pass, R. Zabih, and J. Miller, "Comparing Images Using Color Coherence Vectors," The 4th ACM International Conference on Multimedia, Boston, Massachusetts, United States, November 18-22, 1997.
[26] D. K. Park, Y. S. Jeon, and C. S. Won, "Efficient Use of Local Edge Histogram Descriptor," Proceedings of the 2000 ACM Workshops on Multimedia, Los Angeles, California, United States, 2000.
[27] M. Stricker and M. Orengo, "Similarity of Color Images," Storage and Retrieval for Image and Video Databases III, San Diego/La Jolla, CA, USA, February 5-10, 1995.
[28] L. Wang, K. L. Chan, and P. Xue, "A Criterion for Optimizing Kernel Parameters in KBDA for Image Retrieval," IEEE Transactions on Systems, Man, and Cybernetics, vol. 35, pp. 556-562, 2005.
