A Review of Content-Based Image Retrieval

G. Rafiee, S. S. Dlay, and W. L. Woo
School of Electrical, Electronic and Computer Engineering, Newcastle University, England, United Kingdom
{g.rafiee, s.s.dlay, w.l.woo}@newcastle.ac.uk

Abstract—A comprehensive survey on patch recognition, which is a crucial part of content-based image retrieval (CBIR), is presented. CBIR can be viewed as a methodology in which three correlated modules, namely patch sampling, patch characterizing, and patch recognizing, are employed. This paper aims to evaluate meaningful models for one of the most challenging problems in image understanding, specifically the effective and efficient mapping between image visual features and high-level semantic concepts. To achieve this, the latest classification, clustering, and interactive methods are discussed in detail. Finally, several recommendations for future research are suggested based on the weaknesses of recent technologies.

Index Terms—Review, content-based image retrieval (CBIR), semantic concept, effective mapping, patch sampling, patch characterizing, patch recognizing.

I. INTRODUCTION

In recent years, a drastic increase in the size of image databases has taken place. As an illustration, Flickr1, an image hosting website and online community platform, claims to have hosted more than two billion images within the four years from 2004 to 2007 [1]. Owing to these developments, effective cataloguing, annotation, and access to these data are in high demand in various applications such as image search engines, grouping and filtering of Web images, biomedical information management, computer forensics, and security [2]. Therefore, many researchers from different fields of science have focused their attention on image retrieval methods. Fig. 1 illustrates the number of published items and citations pertaining to the topic of CBIR. CBIR is a set of challenging techniques that aims to retrieve semantically relevant image concepts from large-scale image databases. One of the most crucial challenges in the CBIR literature is to intelligently map between low-level visual features (i.e., color, texture, shape, and salient points) and high-level semantic concepts [3] (e.g., animal, building, flower, river, and so forth). In recent years, many methods and algorithms have been presented to reduce the semantic gap between visual data and the richness of human semantics [4, 5]. CBIR methodology principally encompasses three correlated modules: patch sampling, patch characterizing, and patch recognizing [6].

A patch can be defined as any informative area of an image, which can be obtained by region-based approaches (e.g., segmentation-based) [7], [8], uniform sampling (e.g., dense sampling or spatial pyramids) [9, 10], or salient point extraction techniques [11, 12]. Describing a patch mathematically using feature vectors or distributions is defined as patch characterizing. The meaningful interpretation of a characterized patch or a set of patches is referred to as patch recognizing. Many novel techniques have been proposed to tackle the challenges of patch sampling, characterizing, and recognizing. However, all current CBIR systems suffer from insufficient generalization performance and accuracy, as they are not able to establish a robust link between image features and high-level concepts. In this paper, we focus on state-of-the-art techniques in patch recognizing and investigate their difficulties from a learning point of view. Furthermore, some technical and critical issues, which can be incorporated into current techniques, are proposed.
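For concreteness, the three-module view introduced above can be illustrated by a minimal sketch; the dense grid sampling, the toy descriptor, and the two concept prototypes are illustrative assumptions and do not correspond to any specific system surveyed in this paper.

```python
import numpy as np

def sample_patches(image, size=16, stride=16):
    """Patch sampling: a dense, uniform grid of square patches."""
    h, w, _ = image.shape
    return [image[y:y + size, x:x + size]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

def characterize(patch):
    """Patch characterizing: a toy descriptor (mean colour + overall variance)."""
    return np.concatenate([patch.reshape(-1, 3).mean(axis=0), [patch.std()]])

def recognize(descriptor, prototypes):
    """Patch recognizing: assign the label of the nearest concept prototype."""
    labels, centroids = zip(*prototypes.items())
    dists = [np.linalg.norm(descriptor - c) for c in centroids]
    return labels[int(np.argmin(dists))]

# Usage with a synthetic image and two made-up concept prototypes.
image = np.random.rand(128, 128, 3)
prototypes = {"sky": np.array([0.5, 0.7, 0.9, 0.05]),
              "grass": np.array([0.2, 0.6, 0.2, 0.10])}
labels = [recognize(characterize(p), prototypes) for p in sample_patches(image)]
```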

Figure 1: The number of published items and citations within 15 years with the topic of CBIR (or Image Retrieval) obtained by ISI Web of Knowledge2

The remainder of this paper is organized as follows. Section II deals with the state-of-the-art methods employed in the recent decade for semantic-based recognition. Section III presents some critical issues regarding current techniques. Finally, Section IV concludes the paper.

II. SEMANTIC-BASED IMAGE RETRIEVAL

An image is worth more than ten thousand words. Human beings are able to construct a narrative from an image on the basis of observations and, specifically, their background knowledge. One important question that arises is whether we can develop an intelligent model that learns image concepts as humans do. There is no doubt that ambitious efforts have been made to address this question in the past decade. However, there is no complete and perfect solution to this issue from either the user or the system point of view [13].

1 Photo management and sharing application: http://www.flickr.com/

2 Database: Science Citation Index Expanded (SCI-EXPANDED) and Conference Proceedings Citation Index – Science (CPCI-S), http://isiwebofknowledge.com/

The comprehensive survey of Datta et al. [5] shows that using similarity measure functions alone for concept recognition often results in failure, whereas machine learning techniques together with similarity measure functions are able to create a robust link between visual features and the meaningful regions of an image. There are three common frameworks for establishing a link between image features and a concept: supervised learning, unsupervised learning, and interactive models. The first, employed in an off-line manner, aims to predict semantic category labels based on a set of image descriptors and corresponding labels. In contrast, unsupervised learning attempts to group image concepts into discriminative categories according to similarity and dissimilarity measures between image features. Finally, interactive models (i.e., relevance feedback techniques) can be utilized to provide cumulative learning capabilities alongside supervised and unsupervised learning methods.

A. Supervised Learning Techniques

Supervised image learning is an important process for accelerating image retrieval, improving retrieval accuracy, and performing annotation automatically [5]. In this off-line process, a collection of category labels and related visual features (i.e., a training set) is used. Image classification is more useful when the image training sets (i.e., characterized patches) are well identified. Datta et al. [5] categorized image classification into discriminative and generative frameworks. In the discriminative model, the classification boundaries are determined directly; support vector machines (SVM) and decision trees (DT) belong to this category. In contrast, the generative approach aims to estimate the data density within each class, and the Bayesian formula can then be employed to optimize the boundaries. The latter approach is more convenient when there is a considerable number of classes. Vailaya et al. [14] presented a hierarchical algorithm based on binary Bayesian classification. They used a hierarchical structure in which natural scene images are first categorized into indoor and outdoor; outdoor images are then divided into city and landscape; and, at the lowest level, subsets of landscape images are classified into sunset, forest, and mountain. They reported high classification accuracy on a specific database of 6931 images. They also noted that the accuracy of the algorithm depends on the selected features, the number of training samples, and the ability of the classifier to learn the true decision boundary. The limitation of this approach is that classification is constrained: a test image must belong to one of the predefined classes. In [15], the authors presented a bootstrapping approach to tackle some of the problems of neural networks, aiming to learn from a small set of labeled training samples. To label a new sample, two independent classifiers are employed to co-train and co-annotate, providing cumulative annotation. The results on 6000 mid-size images from CorelCD1, PhotoCD2, and the Web demonstrated a 10% improvement in retrieval accuracy in comparison to previous methods.

1 http://www.Corel.com/
2 http://www.Kodak.com/
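As an illustration of the hierarchical, generative strategy described above, the following sketch cascades two binary classifiers (indoor vs. outdoor, then city vs. landscape). Gaussian naive Bayes is used here only as a simple stand-in for the binary Bayesian classifiers of [14], and the eight-dimensional features are synthetic.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic global image descriptors (stand-ins for colour/edge statistics).
def fake_features(mean, n=50):
    return rng.normal(mean, 0.3, size=(n, 8))

indoor, city, landscape = fake_features(0.0), fake_features(1.0), fake_features(2.0)

# Level 1: indoor vs. outdoor (generative binary classifier).
X1 = np.vstack([indoor, city, landscape])
y1 = np.array([0] * len(indoor) + [1] * (len(city) + len(landscape)))
level1 = GaussianNB().fit(X1, y1)

# Level 2: among outdoor images, city vs. landscape.
X2 = np.vstack([city, landscape])
y2 = np.array([0] * len(city) + [1] * len(landscape))
level2 = GaussianNB().fit(X2, y2)

def classify(x):
    """Hierarchical decision: the test image must fall in one of the classes."""
    if level1.predict([x])[0] == 0:
        return "indoor"
    return "city" if level2.predict([x])[0] == 0 else "landscape"

print(classify(rng.normal(2.0, 0.3, size=8)))   # expected: landscape
```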

SVM is another machine learning tool used for learning multiple concepts in image retrieval. As pointed out by Shi et al. [16], the binary SVM is a proper candidate for learning image regions because of its generalization ability. They used 23 important category concepts for image annotation: animals, vehicles, beaches, mountains, meadows, buildings, transportation, facilities, office equipment, food, clouds, snow, sunrises/sunsets, grasses, trees, plants, flowers, rocks, clothing, people, water, none, and unknown. Their experiments with a set of 800 training and testing images showed strong results in classifying image regions. It should be noted that SVM methods have often been used when the user provides relevant or irrelevant feedback [17]. Tao et al. [17] claim that a small number of positive feedback samples affects the effectiveness of SVM classification. Their approach, based on a combination of classifiers, aims to overcome some problems of SVM classification combined with relevance feedback (RF). Moosmann et al. [6] presented an effective and computationally efficient method for image classification and search. Its efficiency is achieved by considering two common forms of knowledge about semantic categories: a collection of correspondence constraints between pairs of images, and common visual concepts. According to the most popular methods in image classification, four important phases have recently been used [6]. The first is selecting patches from an image; these patches can be local regions, segments, or even randomly selected parts of an image. Describing the local visual parts of an image mathematically is the next crucial stage, called patch characterizing. Creating a visual vocabulary (dictionary) is the critical phase for coding (quantizing) the descriptor vectors, in which distinct labels are assigned to descriptors. In [6], this is carried out by creating a balanced tree (BT), in which each leaf of the tree is used as a distinct label; fast computation and high discrimination in creating the visual vocabulary are the advantages of this technique. In the final phase, the number of occurrences of each label is counted to build a global histogram from which the image category label is predicted. The Extremely Randomized Clustering Forests (ERC-Forests) method [6] has several advantages over k-means-based coding, including fast training, low memory usage, fast testing, and high classification accuracy. Another significant characteristic of the method is its robustness to noisy backgrounds, by which clean class segmentations can be provided. Moreover, the technique can be embedded within other classification methods. Recent evidence [18] suggests that, among machine learning techniques for feature vector coding, decision tree (DT) learning methods can provide a more efficient interpretation of high-level semantic concepts in semantic-based image retrieval (SBIR).
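The four phases above correspond to the standard bag-of-visual-words pipeline, sketched below with synthetic patch descriptors. The vocabulary is learned here with k-means purely for brevity; [6] replaces this coding step with ERC-Forests, and the descriptors, categories, and parameters are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def bow_histogram(patch_descriptors, codebook):
    """Phases 3-4: quantize each descriptor to its nearest visual word,
    then count word occurrences into a normalized global histogram."""
    words = codebook.predict(patch_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters)
    return hist / hist.sum()

# Synthetic stand-ins for phases 1-2: each image yields 100 patch descriptors.
def fake_image_descriptors(offset):
    return rng.normal(offset, 1.0, size=(100, 32))

train_images = [fake_image_descriptors(o) for o in (0, 0, 0, 3, 3, 3)]
train_labels = [0, 0, 0, 1, 1, 1]                     # two toy categories

# Phase 3: learn a visual vocabulary (k-means here; ERC-Forests in [6]).
codebook = KMeans(n_clusters=20, n_init=10, random_state=0).fit(
    np.vstack(train_images))

# Phase 4: global histograms fed to a discriminative classifier.
X = np.array([bow_histogram(d, codebook) for d in train_images])
clf = SVC(kernel="linear").fit(X, train_labels)
print(clf.predict([bow_histogram(fake_image_descriptors(3), codebook)]))  # expected: [1]
```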

Decision tree learning algorithms, including ID3, ASSISTANT, C4.5 (an enhanced version of ID3), and CART, are widely used for data classification. This technique aims to partition the feature space repeatedly into a group of non-overlapping regions [19]. Liu's comparative study [18] found that decision tree induction is an effective candidate for mapping between visual features and the high-level semantic concepts of an image; furthermore, user feedback is not necessary in this regard. They argued that the SVM method, which is often used for semantic concept learning, is likely to succeed only in the presence of supplementary information such as relevant/irrelevant user feedback. They used 19 categories of natural scenery images: grass, forest, blue sky, sea, flower, sunset, beach, firework, tiger, ape fur, eagle, building, snow, rock, bear, night sky, crowd, butterfly, and mountain. The state-of-the-art contribution of that paper is the use of semantic templates (ST) [20] along with decision tree learning. They constructed a collection of semantic templates for the concept categories. A semantic template is an appropriate description of a region concept, which can be calculated from the centroid of the color and texture features of a set of sample regions belonging to one category [18]. The learning scenario includes three important phases: first, an image is partitioned into regions using the JSEG segmentation technique [7]; second, the color and texture features of the desired regions are extracted and normalized; finally, these features are converted to discrete labels through a discretization process (DT-ST). The system supports query by keyword or by region, as the method is able to create a distinctive mapping between the color/texture features of a region and its equivalent high-level semantic concept. Their experimental results in precision (Pr) and recall (Re) over 40 queries with various numbers of images demonstrate an improvement of 10% in retrieval accuracy compared with other CBIR systems. They also compared the performance of the DT-ST classification method with other decision tree methods such as ID3 and C4.5 for natural image semantic learning.
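The general idea of combining semantic templates with decision tree learning can be sketched as follows. This is a heavily simplified illustration, not the DT-ST algorithm of [18]: the region features are synthetic, only three hypothetical concepts are used, and the discretization is reduced to nearest-template assignment on a "colour" half and a "texture" half of each feature vector.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

# Synthetic colour/texture feature vectors for segmented regions of three
# hypothetical concepts ([18] uses 19 natural-scene categories).
concepts = ["grass", "sky", "sunset"]
regions = {c: rng.normal(i, 0.4, size=(30, 6)) for i, c in enumerate(concepts)}

# Semantic templates: centroid of the features of each concept's sample regions.
templates = {c: feats.mean(axis=0) for c, feats in regions.items()}

def discretize(x):
    """Map continuous features to discrete attributes: the index of the nearest
    template for the colour half and for the texture half of the vector."""
    col = min(concepts, key=lambda c: np.linalg.norm(x[:3] - templates[c][:3]))
    tex = min(concepts, key=lambda c: np.linalg.norm(x[3:] - templates[c][3:]))
    return [concepts.index(col), concepts.index(tex)]

# Train a decision tree on the discretized region attributes.
X = np.vstack([regions[c] for c in concepts])
y = np.repeat(concepts, 30)
X_disc = np.array([discretize(x) for x in X])
tree = DecisionTreeClassifier().fit(X_disc, y)
print(tree.predict([discretize(rng.normal(2.0, 0.4, 6))]))   # expected: ['sunset']
```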

B. Unsupervised Learning Techniques

Unsupervised clustering is another important technique used in content-based image retrieval. The aim of this approach is to categorize a collection of image data so as to maximize the similarity within clusters (high intra-cluster similarity) and minimize the similarity between clusters (low inter-cluster similarity) [21]. Datta et al. [5] divided clustering techniques into three different types in terms of image signatures: pair-wise distance-based methods, optimization of an overall clustering quality measure, and statistical modeling. One of the interesting techniques used in image retrieval, where image signatures often have complex formulations, is pair-wise clustering; spectral partitioning and linkage clustering are examples of this group [5]. The authors in [5] pointed out that high computational cost is one disadvantage of this sort of clustering method.

In [22], the authors presented a powerful locality preserving clustering (LPC) algorithm for image databases, which is a modified version of locality preserving projections (LPP) [23]. Basically, spectral clustering encompasses two steps: dimension reduction and clustering. Zheng's comparative study [22] found the cluster representation and computational efficiency of the LPC method very useful; furthermore, the technique provides an explicit mapping function, unlike the normalized cut method (spectral clustering) [24]. In [25], the authors developed a new approach named cluster-based retrieval of images by unsupervised learning (CLUE), in which the system aims to retrieve images by incorporating similarity knowledge between target images through user interaction. They claim that this degree of user involvement with the CBIR system can help reduce the semantic gap. The SemQuery approach [26] clusters a query image into various sets of classes based on heterogeneous image features. Li et al. [27] presented an annotation system named automatic linguistic indexing of pictures (ALIP), in which each semantic category is characterized by a statistical model, namely a 2D multi-resolution hidden Markov model. The k-means algorithm is one of the most popular clustering methods based on optimizing cluster quality; the centre vector of each cluster (in the mass cluster representation) is employed to minimize the sum of intra-cluster distances. Li et al. [28] developed a new algorithm based on statistical modeling and optimization (D2-clustering), in which data points (region-based image signatures) are characterized by sets of probability-weighted vectors. D2-clustering generalizes the k-means algorithm by operating on sets of weighted vectors instead of single vectors. Their method for real-time automatic image annotation attempts to establish probabilistic relationships between images and relevant labels.
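The two-step structure mentioned above (dimension reduction followed by clustering) is sketched below on synthetic image signatures. PCA is used here only as a simple stand-in for the dimension-reduction step; LPP/LPC [22], [23] instead learn a locality-preserving, graph-based projection with an explicit mapping function, and the data and parameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Synthetic high-dimensional image signatures drawn from three latent groups.
signatures = np.vstack([rng.normal(i * 4, 1.0, size=(40, 128)) for i in range(3)])

# Step 1: dimension reduction (PCA here as a simplified stand-in).
embedded = PCA(n_components=10).fit_transform(signatures)

# Step 2: cluster in the reduced space so that intra-cluster similarity is
# high and inter-cluster similarity is low.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embedded)
print(np.bincount(km.labels_))   # roughly 40 signatures per cluster
```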

C. Relevance Feedback Approaches

The subjectivity of human perception is one of the key motivations for using interaction models, and specifically relevance feedback, in CBIR systems. Rui et al. [29] discussed how these techniques can help the user pose high-level subjective queries. As pointed out by the authors, the subjectivity of human perception can appear at different levels; for instance, people under different circumstances may interpret the same image content in different ways. Continuous (cumulative) learning is another motivating factor for employing these kinds of techniques [21]. The main goal of interactive techniques is to develop an efficient algorithm that reveals the user's preferences. In a typical RF approach [30], the retrieval system provides initial image results in response to the user's query (by image or keyword); the user's judgment (relevant/irrelevant) on the retrieved images is then employed by a specific algorithm to tune the system parameters. These steps are carried out iteratively until the user is satisfied with the image results.

As pointed out by Liu et al. [21], three sorts of algorithms can be employed to adjust these parameters: query re-weighting, query point movement (QPM), and learning techniques. In re-weighting algorithms, the weights of the various types of image features and components are dynamically updated [29]. The query-point-movement method aims to improve the estimate of the query point by emphasizing the positive samples and reducing the impact of the negative ones through Rocchio's formula [21], [31]; a minimal sketch of this update is given at the end of this subsection. These techniques make use of the nearest-neighbor strategy and return the top-ranked images to the user. In contrast, learning techniques aim to separate relevant images from irrelevant ones in a hyper-plane space [32-37]. Chen et al. [33] developed a new kernel framework to tackle the problem of small (positive/relevant) training samples in user feedback using a one-class SVM. In [34], the authors employed an SVM active learning algorithm to select a proper subset of user feedback instead of a random subset; their studies showed that after three or four iterations the algorithm is able to learn a discriminative boundary with higher accuracy than traditional query refinement. He et al. [35] proposed a new active learning method, termed mean version space, in which the algorithm tries to choose an optimal image for relevance feedback in each round. In [36], the authors presented a method based on dimensionality reduction, named augmented relation embedding (ARE), used to map the image space into a semantic manifold. The technique considers multiple relations among images and user feedback to extract the relevancy or irrelevancy of images; compared with other methods, it can incorporate the intrinsic structure of unlabeled images in the user loop. Yin et al. [32] proposed a new approach, the image relevance reinforcement learning (IRRL) model, in which multiple relevance feedback approaches are integrated to improve retrieval performance using reinforcement learning. Reinforcement learning [38] can be utilized in situations where an agent (i.e., a CBIR system) needs to learn a specific task through a series of trial-and-error interactions with its environment. The aim of the IRRL model is to exploit the advantages of the individual RF techniques and to make use of prior relevance information and long-term learning for a given query image. Their experimental results show a significant improvement in average precision (from 31.65% to 68.12%) compared with traditional RF techniques. The paper recently published by Gosselin et al. [37] provides a fast and efficient strategy focused on interactive methods. Different active learning strategies are employed within a statistical framework to improve active learning methods for online CBIR. The method rests on three related components, boundary correction, average precision maximization, and batch selection, which are integrated to create an online retrieval system named RETIN. They compared statistical learning techniques such as Bayesian classification with Parzen density estimation, k-nearest neighbors, SVM, and the kernel Fisher discriminant, as well as a query-reweighting strategy. The comparative results in terms of mean average precision show the power of the statistical methods over the other methods mentioned.
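The query-point-movement sketch promised above is given here: one round of retrieval, user judgment, and a Rocchio-style update of the query point on synthetic feature vectors. The weights alpha, beta, and gamma are illustrative choices, not values prescribed by any of the cited systems.

```python
import numpy as np

def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Query-point movement: move the query toward the centroid of the
    positively judged images and away from the negatively judged ones.
    (Illustrative weights; practical systems tune them.)"""
    q = alpha * query
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(irrelevant):
        q -= gamma * np.mean(irrelevant, axis=0)
    return q

def retrieve(query, database, top_k=5):
    """Nearest-neighbour retrieval of the top-ranked images."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:top_k]

# One feedback round on synthetic feature vectors.
rng = np.random.default_rng(4)
database = rng.random((200, 16))
query = rng.random(16)
ranked = retrieve(query, database)
relevant = database[ranked[:2]]          # user marks two results as relevant
irrelevant = database[ranked[2:]]        # and the remaining ones as irrelevant
query = rocchio_update(query, relevant, irrelevant)
ranked = retrieve(query, database)       # re-rank with the moved query point
```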


III. DISCUSSION

Three major categories of current techniques for reducing the semantic gap between visual features and high-level semantic concepts have been presented. As outlined, most classification techniques are able to learn a limited number of image concepts with a reasonable degree of accuracy. Moreover, the majority of the results reported for these systems have been assessed on a limited number of image categories; in this way, supervised learning has been treated as a tractable learning problem with a small number of parameters. Consequently, the level of generalization and accuracy expected from a CBIR system over a wide range of image concepts has not been reasonably achieved. In this regard, the use of multiple classifiers for improving classification accuracy at four levels of fusion (data, feature, classifier, and aggregation) is recommended [39, 40]. The main drawback of the clustering methods employed in the CBIR field is the lack of a mechanism to reduce the uncertainty involved in meaningful image concepts. This uncertainty may originate from patch selection (e.g., the segmentation process), which can lead to significant errors in the patch characterizing and recognizing phases. To address this, care must be taken to ensure that the different sources of error are modeled in the process of patch sampling and characterizing. Compared to supervised learning, little attention has been paid to semi-supervised learning or constrained clustering techniques. Constrained clustering aims to exploit extra (domain) knowledge about the types of clusters that may be sought in the data [41]. We suggest constrained clustering along with an uncertainty-based model of image concepts for further research; such an approach can tackle the challenges of imprecision, vagueness, and inconsistency of high-level semantic concepts and their equivalent descriptors. Relevance feedback, as a real-time classification technique [17], can be integrated with other supervised and unsupervised learning techniques to provide meaningful image retrieval. The past decade has witnessed the rapid development of relevance feedback techniques for the interpretation of image concepts; however, more research is needed to make them feasible for real-time operation.

IV. CONCLUSION

This paper presents a comprehensive survey on patch recognition, which is an important part of content-based image retrieval. Three major categories of state-of-the-art techniques, namely supervised learning, unsupervised learning, and relevance feedback approaches, for reducing the gap between visual features and image concepts have been investigated. Moreover, several recommendations have been suggested based on the weaknesses of current technologies.

REFERENCES

[1] E. Auchard, "Flickr to map the world's latest photo hotspots," Reuters, Nov. 2007.
[2] J. Z. Wang, D. Geman, J. Luo, and R. M. Gray, "Real-world image annotation and retrieval: an introduction to the special section," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, pp. 1873-1876, 2008.
[3] A. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1349-1380, 2000.
[4] J. Z. Wang, J. Li, and G. Wiederhold, "SIMPLIcity: semantics-sensitive integrated matching for picture libraries," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, pp. 947-963, 2001.
[5] R. Datta, D. Joshi, J. Li, and J. Z. Wang, "Image retrieval: ideas, influences, and trends of the new age," ACM Computing Surveys, vol. 40, pp. 5:1-60, 2008.
[6] F. Moosmann, E. Nowak, and F. Jurie, "Randomized clustering forests for image classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, pp. 1632-1646, 2008.
[7] Y. Deng and B. S. Manjunath, "Unsupervised segmentation of color-texture regions in images and video," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, pp. 800-810, 2001.
[8] C. Carson, S. Belongie, H. Greenspan, and J. Malik, "Blobworld: image segmentation using expectation-maximization and its application to image querying," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 1026-1038, 2002.
[9] F. Jurie and B. Triggs, "Creating efficient codebooks for visual recognition," in Tenth IEEE International Conference on Computer Vision, vol. 1, pp. 604-610, 2005.
[10] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 2169-2178.
[11] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91-110, 2004.
[12] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 506-513, 2004.
[13] Y.-J. Zhang, Semantic-Based Visual Information Retrieval. IRM Press, 2007.
[14] A. Vailaya, M. A. T. Figueiredo, A. K. Jain, and H.-J. Zhang, "Image classification for content-based indexing," IEEE Transactions on Image Processing, vol. 10, pp. 117-130, 2001.
[15] H. Feng and T.-S. Chua, "A bootstrapping approach to annotating large image collection," in Workshop on Multimedia Information Retrieval, ACM Multimedia, pp. 55-62, 2003.
[16] R. Shi, H. Feng, T.-S. Chua, and C.-H. Lee, "An adaptive image content representation and segmentation approach to automatic image annotation," in International Conference on Image and Video Retrieval (CIVR), pp. 545-554, 2004.
[17] D. Tao, X. Tang, X. Li, and X. Wu, "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1088-1099, 2006.
[18] Y. Liu, D. Zhang, and G. Lu, "Region-based image retrieval with high-level semantics using decision tree learning," The Journal of the Pattern Recognition Society, vol. 41, pp. 2554-2570, 2008.
[19] T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[20] S.-F. Chang, W. Chen, and H. Sundaram, "Semantic visual templates: linking visual features to semantics," in International Conference on Image Processing (ICIP), vol. 3, pp. 531-535, 1998.
[21] Y. Liu, D. Zhang, G. Lu, and W. Ma, "A survey of content-based image retrieval with high-level semantics," The Journal of the Pattern Recognition Society, vol. 40, pp. 262-282, 2007.
[22] X. Zheng, D. Cai, X. He, W.-Y. Ma, and X. Lin, "Locality preserving clustering for image database," in Proceedings of the ACM International Conference on Multimedia, pp. 885-891, 2004.
[23] X. He and P. Niyogi, "Locality preserving projections," in Advances in Neural Information Processing Systems, vol. 16. Cambridge, MA: MIT Press, pp. 153-160, 2004.
[24] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 888-905, 2000.
[25] Y. Chen, J. Z. Wang, and R. Krovetz, "CLUE: cluster-based retrieval of images by unsupervised learning," IEEE Transactions on Image Processing, vol. 14, pp. 1187-1201, 2005.
[26] G. Sheikholeslami, W. Chang, and A. Zhang, "SemQuery: semantic clustering and querying on heterogeneous features for visual data," IEEE Transactions on Knowledge and Data Engineering, vol. 14, pp. 988-1002, 2002.
[27] J. Li and J. Z. Wang, "Automatic linguistic indexing of pictures by a statistical modeling approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 1075-1088, 2003.
[28] J. Li and J. Z. Wang, "Real-time computerized annotation of pictures," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, pp. 985-1002, 2008.
[29] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, "Relevance feedback: a power tool for interactive content-based image retrieval," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, pp. 644-655, 1998.
[30] X. S. Zhou and T. S. Huang, "Relevance feedback in image retrieval: a comprehensive review," Multimedia Systems, vol. 8, pp. 536-544, 2003.
[31] F. Jing, M. Li, H. Zhang, and B. Zhang, "Relevance feedback in region-based image retrieval," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 672-681, 2004.
[32] P.-Y. Yin, B. Bhanu, K.-C. Chang, and A. Dong, "Integrating relevance feedback techniques for image retrieval using reinforcement learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1536-1551, 2005.
[33] Y. Chen, X. S. Zhou, and T. S. Huang, "One-class SVM for learning in image retrieval," in Proceedings of the International Conference on Image Processing, vol. 1, pp. 34-37, 2001.
[34] S. Tong and E. Chang, "Support vector machine active learning for image retrieval," in Proceedings of the Ninth ACM International Conference on Multimedia, pp. 107-118, 2001.
[35] J. He, M. Li, H.-J. Zhang, H. Tong, and C. Zhang, "Mean version space: a new active learning method for content-based image retrieval," in Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 15-22, 2004.
[36] Y. Lin, T. Liu, and H. Chen, "Semantic manifold learning for image retrieval," in Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 249-258, 2005.
[37] P. H. Gosselin and M. Cord, "Active learning methods for interactive image retrieval," IEEE Transactions on Image Processing, vol. 17, pp. 1200-1211, 2008.
[38] L. P. Kaelbling and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[39] L. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. John Wiley and Sons, 2004.
[40] R. Dara, M. Makrehchi, and M. Kamel, "Filter-based data partitioning for training multiple classifier systems," IEEE Transactions on Knowledge and Data Engineering, vol. 22, pp. 508-522, 2010.
[41] S. Basu, I. Davidson, and K. L. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications. CRC Press, 2009.