ARIRS: ASSOCIATION RULE BASED IMAGE RETRIEVAL SYSTEM

Haoran Yi, Deepu Rajan and Liang-Tien Chia

Center for Multimedia and Network Technology
School of Computer Engineering, Nanyang Technological University, Singapore 639798
{pg03763623, asdrajan, asltchia}@ntu.edu.sg

ABSTRACT
A new image retrieval system based on association rules (ARIRS) is described in this paper. Association rules have been used in applications such as market basket analysis to capture relationships among items in large data sets; they are able to find frequently co-occurring item pairs. In an image retrieval system, user interaction is very important, but it is not easy for users to specify their query. Relevance feedback is a commonly used technique that lets an image retrieval system interactively interpret the user's intent. Here, we present a new scheme for mining relevance feedback using association rules. The proposed scheme automatically establishes semantic associations among images by mining previous users' browsing and relevance feedback, and images are then retrieved based on these semantic associations. Experimental results show promising performance of the proposed scheme.

1. INTRODUCTION

Today, with the rapid growth of the number of digital images, the need for intelligent tools to index, sort and organize digital images is urgent. Content-based image retrieval (CBIR) systems have attracted widespread research interest during the last few years [10, 5, 11, 15, 3]. CBIR supports image searches based on low-level perceptual features such as color, texture and shape. In traditional databases, the stored data is searched through alphanumeric matching: each entry in the database has several key fields against which a query is matched. Images and videos, however, cannot be characterized by alphanumeric strings. Images are usually described by low-level features such as color, texture, shape, spatial constraints and annotation. In an image database, each image is stored with its corresponding features, which are chosen in the hope of capturing salient semantic information about the image. Content-based image retrieval then measures the similarity of two images by calculating the distance between their low-level features using an appropriate distance function. There has been a lot of research effort to improve performance by using different features and different distance functions [12, 8].

The most often used strategy in content-based image retrieval is query-by-example, in which the user provides a query image and requests all images in the database that look similar. The CBIR system extracts low-level features from the query image and searches the database for images whose corresponding features are close in the feature-vector space according to some distance measure. Exact matches are not necessary.

While there is much effort addressing content-based image retrieval issues [14, 13], the retrieval accuracy of content-based methods is still limited. The limited accuracy is due to the big gap between semantic concepts and low-level image features: low-level features can at best capture pre-attentive similarity, not semantic similarity [6].

Since humans are an indispensable part of the retrieval system, human feedback should be considered in the retrieval stage. Recently, much has been written about relevance feedback in content-based image retrieval from the perspective of machine learning [14, 13, 6, 16, 17, 18, 19, 20]. Retrieval is not a one-pass process but several interactive rounds, in which the system adapts itself to the user's feedback. Generally, relevance feedback algorithms fall into three categories: (1) query point movement [14], (2) reweighting of different feature dimensions [7], and (3) updating of probability parameters [4].

However, most of the relevance feedback techniques mentioned above use only the feedback of the current query session; the feedback from previous sessions is ignored. In this paper we provide an image retrieval scheme based on association rules that takes the feedback of both the current and previous query sessions into consideration. The proposed scheme learns the semantic associations among images by mining previous users' feedback, and the learned associations are then used to search for similar images during retrieval. The rest of the paper is organized as follows. Section 2 describes association rule mining of relevance feedback. Section 3 describes image retrieval using association rules. Experimental results and concluding remarks are given in Sections 4 and 5, respectively.

2. ASSOCIATION RULE MINING FOR RELEVANCE FEEDBACK

Association rules capture the semantic association among images. They have traditionally been used in business applications such as market basket analysis, where they find frequent item pairs and the associations among items. This section describes our scheme to establish semantic associations among images from users' feedback using association rules.

2.1. Association rule overview

The problem of association rule mining was introduced in [1] for mining basket data. An example of such a rule might be that 85% of the customers who purchase bread also purchase butter. Finding all such rules is valuable for market management. Association rules capture information about items that are frequently associated with each other. The formal statement is as follows. Let I = {i1, i2, ..., im} be a set of distinct items. Let D be a set of transactions, where each transaction T is a subset of I, i.e., T ⊆ I. A transaction T contains X if and only if X ⊆ T. An association rule is an implication of the form A =⇒ B, where A ⊆ T, B ⊆ T and A ∩ B = ∅. A and B are called the antecedent and consequent of the rule. Each rule A =⇒ B has a support value and a confidence value. The support of a rule is the percentage of transactions in which all the items in both the antecedent (A) and the consequent (B) are present: if a rule A =⇒ B has support s%, then s% of the transactions in D contain A ∪ B. The confidence is the ratio of the number of transactions in which both the antecedent (A) and the consequent (B) appear together to the number of transactions in which the antecedent (A) appears.
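As a minimal illustrative sketch (not the paper's code), the support and confidence of a rule A =⇒ B can be computed directly from transactions represented as Python sets; the item names below are hypothetical:

```python
# Illustrative sketch: support and confidence of a rule A => B,
# computed over a list of transactions (each transaction is a set of items).

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk"},
]
# "bread => butter": A ∪ B appears in 2 of 4 transactions, A in 3 of 4.
print(support({"bread", "butter"}, transactions))        # 0.5
print(confidence({"bread"}, {"butter"}, transactions))   # 2/3
```

These two quantities are exactly the estimates of P(A ∪ B) and P(B|A) described in the text.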
If a rule A =⇒ B has confidence c%, it means that c% of the transactions in D that contain A also contain B. In other words, the support estimates the probability P(A ∪ B) and the confidence estimates the conditional probability P(B|A).

2.2. Association rules for image feedback

The choice of items and transactions is problem dependent. In market basket analysis, the items are all the merchandise and a transaction is the set of items that are bought together (a basket). For image association rules, the goal is to find semantically associated images. In order to use association rules to capture this information, it is necessary to model the images in terms of items and
transactions. In the association rules for image feedback, the terms describing the rules are defined as follows:

• Items: all the images in the database.

• Transactions (baskets): one complete session of user retrieval/browsing, consisting of the images that are retrieved from the beginning to the end of the session.

• Association rules: semantically associated images.

During one complete session of retrieval, the user tends to browse semantically similar images. For example, if the user is interested in wolf images, the images he or she browses all belong to the wolf category. Thus, a transaction (basket) is actually a group of semantically similar images, and the frequent items (images) found by association rule mining have strong semantic association.

2.3. Association extraction

The objective of mining a set of images for association rules is to identify the rules that satisfy user-specified constraints. In the mining process, the user needs to specify two constraints on the rules: the support and the confidence. Rules with high support and confidence describe images that frequently occur together during users' browsing/retrieval, and thus represent the semantic association among the images. Most algorithms developed for association rule mining are designed to operate on transaction data stored in a database or in a series of files. The problem of discovering association rules can be decomposed into two phases [2]. The first phase determines all of the frequent itemsets and the number of times each of them appears in the transactions. A frequent itemset is a set of items that occurs in the transactions with frequency greater than or equal to the user-specified minimal support value. The second phase generates the association rules and their support and confidence values from the frequent itemsets.

2.3.1. Find frequent itemsets

Since the number of possible itemsets is very large, it is not practical to evaluate every possible combination of items. The number of itemsets to be considered can be limited by using the minimum support level specified by the user. An important observation is that an itemset can be frequent only if all of its subsets are also frequent [9]. This observation leads to an efficient iterative approach, in which the frequent k-itemsets are found from the frequent (k−1)-itemsets. The algorithm to identify the frequent itemsets is as follows:

1. Find all frequent 1-itemsets, which are the basis for the following processing.

2. Construct candidate k-itemsets from the frequent (k−1)-itemsets. Frequent (k−1)-itemsets A and B are combined into a candidate k-itemset F if and only if A matches B in all but one item.

3. Compute the support of each candidate k-itemset. If the candidate itemset's support is greater than or equal to the minimum support value, it is accepted; otherwise it is discarded.

4. Steps 2 and 3 are iterated until all the frequent itemsets have been generated.

2.3.2. Find association rules

The procedure described above generates a list of all itemsets that satisfy the minimal support requirement, together with the frequency of occurrence of each frequent itemset. With this information it is possible to generate a set of association rules of the form A =⇒ B. Consider an itemset F, and let A and B be any two subsets of F such that A ∪ B = F. The association rule A =⇒ B is valid if the confidence of the rule, which is the ratio of support(F) to support(A), is greater than or equal to the minimal confidence value specified by the user. Another observation is that an itemset F containing k elements can generate on the order of 2^k rules. Without any restriction on the form of the rules, the number of rules will be very large, and many of them may be redundant or contain irrelevant information. The number of rules can be greatly reduced by constraining the form of the rules, such as the number of items in the antecedent (A) or consequent (B). In our image retrieval application, the user gives only one query image. Therefore, we constrain the rules to have exactly one item in the antecedent (A). This reduces the number of rules significantly, from about 2^k to k for a frequent k-itemset. The algorithm for finding the association rules is as follows:

1. Create k candidate rules from each frequent k-itemset F = {f1, f2, ..., fk}: fi =⇒ F − {fi} (i = 1, 2, ..., k).

2. If the confidence of a rule, defined as the ratio of support(F) to support(fi), is greater than or equal to the user-specified minimal confidence value, the rule is accepted; otherwise it is discarded.

3. Steps 1 and 2 are repeated for every frequent itemset generated by the frequent itemset finding algorithm.
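The frequent-itemset and rule-finding steps above can be sketched as follows. This is a minimal Apriori-style illustration under the one-item-antecedent constraint, not the authors' implementation; the function names and toy transactions are our own:

```python
# Sketch of Sections 2.3.1-2.3.2: iterative frequent-itemset mining,
# then rule generation with a single-item antecedent.
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Iteratively build frequent k-itemsets from frequent (k-1)-itemsets."""
    n = len(transactions)
    support = {}                         # frozenset -> support value
    # Step 1: frequent 1-itemsets.
    items = {i for t in transactions for i in t}
    current = []
    for i in items:
        s = sum(1 for t in transactions if i in t) / n
        if s >= min_support:
            fs = frozenset([i])
            support[fs] = s
            current.append(fs)
    # Steps 2-4: combine (k-1)-itemsets that differ in one item, prune by support.
    k = 2
    while current:
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        current = []
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                support[c] = s
                current.append(c)
        k += 1
    return support

def single_antecedent_rules(support, min_confidence):
    """Generate rules f_i => F - {f_i}, i.e. one item in the antecedent."""
    rules = []
    for F, s in support.items():
        if len(F) < 2:
            continue
        for f in F:
            conf = s / support[frozenset([f])]
            if conf >= min_confidence:
                rules.append((f, F - {f}, conf))
    return rules

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
sup = frequent_itemsets(transactions, min_support=0.4)
for ante, cons, conf in single_antecedent_rules(sup, min_confidence=0.4):
    print(ante, "=>", set(cons), round(conf, 2))
```

Note how the one-item-antecedent constraint yields only k rules per frequent k-itemset, as stated above.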
3. USING ASSOCIATION RULES FOR IMAGE RETRIEVAL

Section 2 described how to mine association rules from users' previous feedback. In this section we describe image retrieval using the mined association rules. Since the association rules capture the semantic association among images, retrieval based on them can be expected to perform well. For each association rule A =⇒ B, the support and confidence values indicate how frequently the images occur together during one session of retrieval and browsing; the more frequently images occur together, the more semantically similar they are. We use the confidence value of the association rules for image retrieval. The retrieval procedure is as follows: for each query image, we find all the association rules whose antecedent (A) is the query image. The consequents (B) are the candidate images for retrieval. We rank the candidate images by their confidence values in the association rules: the higher the confidence value, the higher the rank. Note that our association rules have only one image in the antecedent (A). Another observation is that the support of a rule A =⇒ B is at least as large as that of a rule A =⇒ C if B is a subset of C. Therefore, considering the frequent two-item pairs is enough to find all the candidate images; that is, both the antecedent and the consequent are 1-itemsets. During retrieval, if the candidate image set found by the association rules is empty or smaller than the number of images to be presented, the system randomly picks additional images from the database, which gives every image a chance to enter the association rules.

4. EXPERIMENTAL RESULTS

We have conducted experiments on a data set of 800 images belonging to 8 categories of 100 images each: lion, wolf, horse, canyon, flower, race car, wave and penguin. All the images come from the Corel image database. The user interface, implemented in Matlab, is shown in Figure 1. The user selects the query image from the listbox on the left.
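The ranking-and-padding retrieval procedure described above can be sketched as below. This is a hypothetical sketch: the rule-list format (antecedent image, consequent image, confidence) and all file names are our own assumptions, not the system's actual interface:

```python
# Sketch of Section 3: rank candidates by rule confidence, pad with random picks.
import random

def retrieve(query_image, rules, all_images, num_results=16):
    """Rank candidate images by the confidence of rules query => candidate;
    pad with random images so every image can enter future association rules."""
    candidates = [(cons, conf) for ante, cons, conf in rules if ante == query_image]
    candidates.sort(key=lambda x: x[1], reverse=True)
    ranked = [img for img, _ in candidates][:num_results]
    if len(ranked) < num_results:
        pool = [img for img in all_images if img not in ranked and img != query_image]
        ranked += random.sample(pool, min(num_results - len(ranked), len(pool)))
    return ranked

rules = [("q.jpg", "x.jpg", 0.9), ("q.jpg", "y.jpg", 0.4), ("z.jpg", "x.jpg", 0.8)]
images = ["q.jpg", "x.jpg", "y.jpg", "z.jpg", "w.jpg"]
print(retrieve("q.jpg", rules, images, num_results=3))
```

The random padding mirrors the behavior described above: when too few candidates are found, random images are shown so that they can later appear in mined transactions.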
The system then presents the top 16 retrieved images, and the user can use the pop-up menu under each retrieved image to mark it as either 'good' or 'bad'. Initially, we let users browse randomly selected images and provide feedback: the user first chooses a query image, the system randomly selects 16 images from the database, and the user labels whether these images are relevant to the query image. The user may perform several rounds of browsing before stopping. All the images that the user considers relevant are stored as one transaction. We generated 1000 such browsing transactions and ran the association mining algorithm described in Section 2 on them. The support and confidence thresholds for the association rules are set to 0.01 and 0.1, respectively. These relatively low values of support and confidence are chosen
to capture the sparse transactions for the association mining. The mined association rules are used for image retrieval as described in Section 3: all the candidate images are ranked according to the confidence value of the rule associating them with the query image. The hit rates within the first 16 and the first 100 retrieved images are used as the performance measure. Figure 2 shows the average hit rates over 80 queries (10 per category) for different numbers of transactions used for association mining. The figure shows that the hit rate increases gradually with the number of transactions: the more the system is used, the better it performs.

Fig. 1. ARIRS retrieval example.

Fig. 2. Averaged hit rate (top 16 and top 100) for image retrieval using association rules, for 200 to 1000 transactions used for mining.

In the second experiment, we perform a task called "picture finding": finding one specific image in the database. Finding a specific image in a database is a very difficult task, like finding a needle in a haystack. Theoretically, if we select images from the database at random, the expectation of the number of images that need to be visited before the desired image is found is (n − m + 1)/m, where n is the number of images in the database and m is the number of images presented during each round. The derivation is as follows. Let P denote the probability that the desired image is selected during one round:

P = C_n^{m−1} / C_n^m = [n! / ((m−1)!(n−m+1)!)] / [n! / (m!(n−m)!)] = m / (n−m+1)    (1)

The expectation of the number of images to be visited before the desired image is found is

S = P + (1−P)P·2 + (1−P)^2 P·3 + ... + (1−P)^{n−1} P·n + ...    (2)

(1−P)·S = (1−P)P + (1−P)^2 P·2 + ... + (1−P)^{n−1} P·(n−1) + ...    (3)

Subtracting (3) from (2) gives

P·S = P·(1 + (1−P) + (1−P)^2 + ... + (1−P)^n + ...)    (4)

S = 1/P = (n−m+1)/m    (5)

In our experiment, m = 16 and n = 800, so the expectation of the number of images to be visited is 49.06. We conducted 28 image finding tasks using association rule based image retrieval. During each round the user chose one image from those presented by the system, and the query was then replaced by the chosen image. In our experiments, the average number of images visited was 2.48.

5. CONCLUSION

In this work, a new image retrieval system based on association rules (ARIRS) is developed and evaluated. Association rules are shown to be able to capture the semantic association among images, forming a semantic space, and retrieval based on these associations shows promising performance. The association rule based learning scheme can easily be incorporated into other content-based image retrieval systems as a relevance feedback learner: content-based retrieval on low-level features provides the initial result, and association rule based retrieval provides the subsequent results based on the user-selected images.

6. REFERENCES

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207–216, Washington, D.C., May 1993.

[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 1994.

[3] S. Chang, W. Chen, H. Meng, H. Sundaram, and D. Zhong. VideoQ: an automated content-based video search system using visual cues. In Proc. ACM Multimedia, Seattle, WA, November 1997.

[4] I. Cox, T. Minka, T. Papathomas, and P. Yianilos. The Bayesian image retrieval system. IEEE Trans. Image Processing, 9:20–37, January 2000.

[5] W. Niblack et al. The QBIC project: querying images by content using color, texture and shape. In Proc. IS&T/SPIE Storage and Retrieval for Image and Video Databases I, San Jose, CA, February 1993.

[6] X. He, O. King, W. Ma, M. Li, and H. Zhang. Learning a semantic space from user's relevance feedback for image retrieval. IEEE Trans. Circuits Syst. Video Technol., 13(1), January 2003.

[7] Y. Ishikawa, R. Subramanya, and C. Faloutsos. MindReader: querying databases through multiple examples. In Proc. 24th Int. Conf. Very Large Data Bases, pages 218–227, New York, 1998.

[8] B. Li, E. Chang, and C.-T. Wu. DPF — a perceptual distance function for image retrieval. In International Conference on Image Processing, New York, USA, September 2002.

[9] M. Houtsma and A. Swami. Set-oriented mining for association rules in relational databases. In Proceedings of the 11th International Conference on Data Engineering, pages 25–33, Taipei, Taiwan, R.O.C., 1995.

[10] M. Ortega, Y. Rui, K. Chakrabarti, S. Mehrotra, and T. Huang. Supporting similarity queries in MARS. In Proc. ACM Multimedia, Seattle, WA, November 1997.

[11] A. Pentland, R. Picard, and S. Sclaroff. Photobook: tools for content-based manipulation of image databases. In Proc. IS&T/SPIE Storage and Retrieval for Image and Video Databases II, pages 34–47, San Jose, CA, February 1994.

[12] Y. Rubner, C. Tomasi, and L. Guibas. A metric for distributions with application to image databases. In Proceedings of the 1998 IEEE International Conference on Computer Vision, January 1998.

[13] Y. Rui and T. Huang. A novel relevance feedback technique in image retrieval. In Proc. ACM Multimedia, 1999.

[14] Y. Rui and T. Huang. Relevance feedback: a power tool for interactive content-based image retrieval. IEEE Trans. Circuits Syst. Video Technol., 8(5), 1999.

[15] J. Smith and S.-F. Chang. VisualSEEk: a fully automated content-based image query system. In Proc. ACM Multimedia, Boston, MA, November 1996.

[16] K. Tieu and P. Viola. Boosting image retrieval. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 228–235, Hilton Head Island, SC, June 2000.

[17] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proc. ACM Multimedia, Ottawa, ON, Canada, September 2001.

[18] Y. Wu, Q. Tian, and T. Huang. Discriminant-EM algorithm with application to image retrieval. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, Hilton Head Island, SC, June 2000.

[19] X. Zhou and T. Huang. Comparing discriminating transformations and SVM for learning during multimedia retrieval. In Proc. ACM Multimedia, Ottawa, ON, Canada, September 2001.

[20] X. S. Zhou and T. Huang. BiasMap for small sample learning during multimedia retrieval. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, Kauai, HI, December 2001.