Multimedia Tools and Applications, 24, 215–232, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Merging Results for Distributed Content Based Image Retrieval∗

STEFANO BERRETTI [email protected]
ALBERTO DEL BIMBO [email protected]
PIETRO PALA [email protected]

Dipartimento Sistemi e Informatica, Università di Firenze, via S. Marta 3, 50139 Firenze, Italy

Abstract. Searching for information on the Internet often requires users to separately contact several digital libraries, use each library's interface to author the query, analyze retrieval results, and merge them with results returned by other libraries. This task can be simplified by a centralized server that acts as a gateway between the user and several distributed repositories: the centralized server receives the user query, forwards it to federated repositories (possibly translating the query into the specific format required by each repository), and fuses the retrieved documents for presentation to the user. To accomplish these tasks efficiently, the centralized server must perform some major operations, such as resource selection, query transformation and data fusion. In this paper we report on some aspects of MIND, a system for managing distributed, heterogeneous multimedia libraries (MIND, 2001, http://www.mind-project.org). In particular, this paper focuses on the issue of fusing results returned by different image repositories. The proposed approach is based on the normalization of the matching scores assigned to retrieved images by individual libraries. Experimental results on a prototype system show the potential of the proposed approach with respect to traditional solutions.

Keywords: digital libraries, collection fusion, distributed retrieval

1. Introduction

Nowadays, many different document repositories are accessible through the Internet. A generic user looking for a particular piece of information must contact each repository and verify the presence of the information of interest. This typically takes place by contacting the repository server through a client web browsing application. The user is then presented with a web page that guides her/him through the process of retrieving information. In a multimedia scenario, this information can be in the form of text documents, images, videos, audio and so on. This framework has two major drawbacks. First of all, in order to find some information of interest, the user should know in advance which repository stores it. Internet search engines such as Altavista, Google and Lycos, to name a few, can support the user in this task. However, if the user does not know that a repository exists, s/he will never contact it, even if it is the only resource storing the desired information.

∗This work is supported by the EU Commission under grant IST-2000-26061. Detailed information about this project is available at http://www.mind-project.org.


A second drawback is related to the fact that, in order to find the information of interest, the user typically contacts several repositories. Each one is queried and, even for a limited number of repositories, the user is soon overwhelmed by a huge and unmanageable amount of (probably irrelevant) retrieved documents. A much more efficient and convenient solution, especially for the user, is to federate all repositories within a network featuring one central server. A generic user looking for some information sends the query to the server, which is in charge of forwarding the query to all, or a subset of, the networked resources. All retrieved documents, that is, the set of all documents returned by each resource, are gathered by the central server and conveniently arranged for presentation to the user. The implementation of this solution develops upon the availability of three main functions: resource buffering, resource selection and data fusion.

Resource buffering. In general, each resource has its own ways of describing, accessing and presenting stored information. These correspond to different choices in terms of descriptors of content, query paradigms and visualization metaphors. In order to allow the user to seamlessly query several different resources and receive their results, a module is required that acts as a buffer, translating information back and forth between the resource and the central server. In its simplest form, this module is just a wrapper that accomplishes simple format translation of data, commands and content descriptors. However, in a more realistic scenario, this module should also be responsible for the management of heterogeneous content representations. For instance, the user query may be represented through a 64-bin color histogram, whilst the digital library processes queries only if they are in the form of a 32-bin color histogram.

Resource selection. In general, it is not always convenient to send the user query to all federated resources.
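To illustrate the resource-buffering translation mentioned above (the mismatch between a 64-bin user query and a library that only accepts 32-bin histograms), a wrapper could sum adjacent bins. This sketch is ours, not MIND's; the function name and bin counts are illustrative:

```python
def downsample_histogram(hist, target_bins):
    """Reduce a histogram to target_bins by summing groups of adjacent bins.

    Hypothetical proxy-side format translation; assumes len(hist) is a
    multiple of target_bins (e.g. 64 -> 32).
    """
    group = len(hist) // target_bins
    return [sum(hist[i * group:(i + 1) * group]) for i in range(target_bins)]

# Example: a 64-bin query histogram becomes a 32-bin one with the same mass.
query_64 = [1] * 64
query_32 = downsample_histogram(query_64, 32)
assert len(query_32) == 32 and sum(query_32) == sum(query_64)
```

Translating in the opposite direction (32 to 64 bins) is lossy, which is one reason heterogeneous representations introduce the approximation errors discussed later in the paper.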
Several considerations can be brought forward to support this decision: sending the query to all resources contributes to network congestion; some resources provide documents for free whereas others don't; some resources may have much more relevant documents than others. More precisely, resource selection is the process that, given a user query, estimates the optimum number of documents that should be retrieved from each resource (the query is not sent to resources that should provide no document) [5, 10, 11]. Efficient resource selection relies on several pieces of information, including cost factors as well as descriptors of content for each resource (resource descriptions [2]).

Data fusion. Since each resource uses its own way to represent image content and compute the similarity between images, retrieval lists returned by different libraries may not be directly comparable. Furthermore, even if matching scores are associated with retrieved images, they are not normalized with each other. This prevents the possibility of inferring the relative relevance of documents returned by different resources from the analysis of their matching scores. However, comparing the relative relevance of all retrieved documents is necessary in order to present them in a suitable form to the user. Data fusion is the process by which retrieval lists returned by different resources are normalized with each other so as to enable comparison of the relative relevance of all retrieved documents [22].

In this paper, a novel approach is presented for the fusion of results returned by distributed digital image libraries. In particular, we focus on the case in which document scores are available in the result lists. In the case in which scores cannot be obtained by every


library, an additional module is required to map document ranks into document scores, possibly relying on duplicates and cross matching between results returned by different libraries. The proposed approach relies on learning normalization coefficients. These are used to normalize, with respect to a reference scale, the document scores assigned by different libraries. This is accomplished through two separate steps. In the first step, each library is processed separately through a set of sample queries. For each sample query, the value of the normalization coefficients is computed. In the second step, known values of the normalization coefficients are used to estimate their values for new queries. In addition, in order to make the discussion more concrete, we present data fusion as the final stage of the wider process carried out by a distributed retrieval system. In particular, we will refer to the architecture developed in the context of the MIND project (Resource Selection and Data Fusion for Multimedia INternational Digital Libraries) [15], focusing on the problem of resource description extraction. In fact, the way resource descriptors are extracted largely affects the applicability and quality of data fusion approaches.

This paper is organized as follows. In Section 2, the overall architecture of the MIND system is described, focusing on the resource description process. In Section 3, previous work related to data fusion for distributed libraries is reviewed. In Section 4, the proposed model for data fusion is described. Finally, experimental results are reported in Section 5, and conclusions are drawn in Section 6.

2. Architecture

Several mediator/federated systems have been developed since the early 1990s. Among these are the TSIMMIS project [6], which is primarily focused on the semi-automatic generation of wrappers, translators and mediators, and the DISCO [20] and HERMES [19] projects, which address related issues. Among others, the GARLIC project [8] focused on the specific case of federating multiple multimedia sources. Figure 1 shows the MIND architecture for the overall distributed retrieval system. It can be noticed that each archive is linked to the network through a proxy module. Each proxy stores all relevant information about the resource it gives access to. This information includes summaries of resource content (resource descriptions), the specification of supported media types, the database schema and query predicates for each schema attribute, rules to format a query so that it can be processed by the resource, and rules to format retrieval results so that they conform to a common, predefined syntax. A new proxy is built each time a new resource wants to join the network. At that time, the resource is sampled following either a cooperative protocol or an uncooperative one (the latter only for text libraries). In the former case, it is assumed that the digital library adheres to the project, joining the network of federated libraries and providing content descriptors for all its documents. The latter case occurs when the digital library does not explicitly join the federated network, and its content can only be estimated through some form of sampling. Sampling the resource aims at the extraction of the resource description, that is, a summary of resource content. Throughout the paper, it is assumed that image digital libraries adhere to a cooperative protocol.
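The information a proxy stores per resource can be pictured as a simple record. The sketch below is purely illustrative (the field names are ours, not MIND's):

```python
from dataclasses import dataclass

@dataclass
class ProxyDescription:
    """Illustrative record of what a MIND-style proxy might store per resource."""
    resource_url: str
    media_types: list           # e.g. ["image", "text"]
    resource_description: dict  # summary of resource content (cluster centers, etc.)
    query_format_rules: dict    # how to translate a query for this resource
    result_format_rules: dict   # how to map results to the common syntax
    cooperative: bool = True    # cooperative vs. sampled (uncooperative) protocol

proxy = ProxyDescription(
    resource_url="http://example.org/library",
    media_types=["image"],
    resource_description={},
    query_format_rules={},
    result_format_rules={},
)
assert proxy.cooperative
```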


Figure 1. Simplified overview of the MIND system architecture for distributed retrieval.

Resource descriptions play a central role in the operational cycle of the distributed retrieval system, and are used both for resource selection and for data fusion. Resource selection is the process that, given a user query, selects which resources are the best candidates to contain relevant items. Of course, selection is not performed by actually issuing the query to the resource. Rather, the number of relevant items contained in a generic resource is estimated by comparing the query against the resource description. Resource descriptions are also used for data fusion, as described in the next sections. In general, since resource descriptions are a summary of resource content, they help define a model representing how items are distributed in the collection (resource).

2.1. Extraction of resource descriptions

Regardless of the media characterization of database items (text, audio, images, videos, etc.), previous approaches to the computation of resource descriptors rely on some form of clustering [5]: database items are grouped (clustered) according to some homogeneity criteria, which correspond to the proximity of items in some suitable representation space. Then, each cluster is represented through its distinguishing features: typically, the value of the feature for which cluster items are homogeneous. In the following, we assume that (i) the content of each image is represented through a multidimensional feature vector and (ii) the similarity of content between two images is measured through a metric distance between their feature vectors. Extraction of resource descriptors relies on the organization of feature vectors into a metric access structure [7]. Computation of this structure induces a hierarchical clustering of feature vectors: at each level of the hierarchy, similar vectors are grouped together, while dissimilar ones are located in separate clusters. The resulting multi-level structure represents the database content at different levels of resolution, through a tree of nodes. Specifically, each node is associated with a fixed maximum number m of entries. Each entry is the root of a


Figure 2. (a) Multiresolution clustering of database items. The hierarchy of cluster centers is also shown. (b) Resource descriptor extraction for different values of granularity. The composition and cardinality of resource descriptors change with granularity.

sub-tree and is representative of all the n_V feature vectors included in its sub-tree sub_V. According to this, each node entry V (routing vector) can be seen as the center of a cluster with radius µ_V, which comprises all the feature vectors in the sub-tree of V (i.e., µ_V is the maximum distance between the routing vector and a generic vector belonging to its sub-tree). According to this organization, each layer of the tree (composed of nodes at the same height) can be considered as an abstraction of database content at a certain level of resolution. In fact, routing vectors representative of a generic resolution level (level-i) are derived by promoting routing vectors from the immediately lower level of the tree. In this way, proceeding towards the upper levels, representative routing vectors abstract the content of the database using a reduced number of vectors. An example is shown in figure 2(a), where a sample set of items is clustered at three different resolution levels.

During access to the index, the triangular inequality can be used to support efficient processing of range queries, i.e., queries seeking all the images in the archive which are within a given range of distance from a query image Q. To this end, the distance between Q and any image in the covering region of a routing vector V can be lower-bounded using the radius µ_V and the distance between V and Q. Specifically, if µ_max is the range of the query (that is, the query returns all images whose distance from the query image is within the threshold µ_max), the following condition can be employed to check whether all the feature vectors in the covering region of V can be discarded, based on the sole evaluation of the distance µ(Q, V):

    µ(Q, V) ≥ µ_max + µ_V  →  no vector in sub_V is acceptable    (1)

In a similar manner, an opposite condition checks whether all the feature vectors in the covering region of V fall within the range of the query (in this case, all the feature vectors in the region can be accepted). In case neither of the two conditions holds, the covering region of V may contain both acceptable and non-acceptable feature vectors, and the search must be repeated on the sub-index sub_V. K-nearest neighbor queries [16], seeking the K feature vectors that are most similar to the query, can also be managed in a similar manner. This is obtained by regarding the query as a particular case of range query in which the range µ_max is determined during the search.
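The pruning test of Eq. (1) and its opposite "accept all" condition can be sketched as follows. This is an illustrative re-implementation, not the MIND code; the node layout is a hypothetical simplification of a metric tree:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def range_query(node, query, mu_max, results):
    """Collect all vectors within distance mu_max of query, pruning whole
    sub-trees via the triangle inequality (Eq. (1))."""
    for entry in node["entries"]:
        d = euclidean(query, entry["routing_vector"])
        if d >= mu_max + entry["radius"]:
            continue                       # Eq. (1): no vector in sub_V can qualify
        if d + entry["radius"] <= mu_max:  # opposite condition: accept the whole region
            results.extend(entry["vectors"])
        elif entry.get("child"):
            range_query(entry["child"], query, mu_max, results)
        else:                              # leaf: check vectors individually
            results.extend(v for v in entry["vectors"]
                           if euclidean(query, v) <= mu_max)

# Tiny example: one node, two leaf entries.
tree = {"entries": [
    {"routing_vector": (0.0, 0.0), "radius": 1.0, "vectors": [(0.5, 0.0)], "child": None},
    {"routing_vector": (10.0, 0.0), "radius": 1.0, "vectors": [(10.5, 0.0)], "child": None},
]}
found = []
range_query(tree, (0.0, 0.0), 2.0, found)
assert found == [(0.5, 0.0)]  # far entry pruned without inspecting its vectors
```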

2.2. Resource descriptions at different granularities

The hierarchical organization of database items induced by multi-level clustering is exploited to derive resource descriptors at different levels of granularity. In this way, the estimation of the database population relevant to a query relies on resource descriptors selected from a specific resolution, according to some trade-off between the accuracy of the estimate and the efficiency of the estimation process. In principle, the multi-level clustering approach of Section 2.1 is applicable to both cooperative and un-cooperative image libraries. For cooperative libraries, a separate index is constructed for every individual resource, using the entire set of content descriptors provided by the library for all its images. In the case of un-cooperative libraries, image descriptors are not directly accessible to the federated network; in this case, the library content can only be estimated through some form of sampling. A possible approach is that of randomly querying the library and considering the returned result sets as representative of the library content. During this sampling process, images are gathered into a local database; after that, content descriptors are extracted from all these images. Locally extracted descriptors can be used to construct the index of resource descriptors in the same way as for cooperative libraries. It is worth noting that the effective use of such descriptors in the resource selection phase largely depends on how well the local descriptors replicate the actual behaviour of the unknown descriptors used by the particular library. In the following, a cooperative protocol is assumed.

As evidenced in Section 2.1, database items are organized in clusters, each identified by a routing object and a radius. Accordingly, clusters are subsequently decomposed into more coherent clusters (i.e., clusters with a smaller radius) from the top to the bottom of the hierarchy. In so doing, the cluster radius does not increase when descending the hierarchy towards the leaf level. The resource descriptor for a particular granularity is thus determined by imposing a maximum value for the cluster radius and using it to traverse the hierarchy. Along each path, starting from the root node, the search is stopped when a cluster with radius less than the given maximum is encountered, and the corresponding routing object is retained as one of the components of the resource descriptor at that granularity (i.e., the set of routing objects collected in this process constitutes the resource descriptor). The value of the maximum is used to identify a specific granularity; that is, under the hypothesis of a distance normalized in the [0, 1] interval, the resource descriptor for granularity 0.1 comprises the routing objects with radius less than 0.1, selected according to the above process. A schematic representation of resource descriptor extraction is reported in figure 2(b).
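The granularity-driven traversal just described can be sketched as follows. The code is illustrative (the node layout is our own simplification); each collected component carries the cluster's routing vector, radius and population:

```python
def extract_descriptor(node, max_radius, descriptor):
    """Collect routing objects whose cluster radius is below max_radius,
    stopping each root-to-leaf path at the first such cluster."""
    for entry in node["entries"]:
        if entry["radius"] < max_radius:
            # Cluster is coherent enough: it becomes one descriptor component.
            descriptor.append((entry["routing_vector"], entry["radius"], entry["population"]))
        elif entry.get("child"):
            extract_descriptor(entry["child"], max_radius, descriptor)

# Example: the root entry is too coarse (radius 0.4) for granularity 0.3,
# but its two children qualify.
tree = {"entries": [
    {"routing_vector": (0.0,), "radius": 0.4, "population": 10,
     "child": {"entries": [
         {"routing_vector": (-0.2,), "radius": 0.1, "population": 4, "child": None},
         {"routing_vector": (0.2,), "radius": 0.2, "population": 6, "child": None},
     ]}},
]}
descriptor = []
extract_descriptor(tree, 0.3, descriptor)
assert len(descriptor) == 2  # the two finer clusters, not the coarse parent
```

Repeating the call with different `max_radius` values yields the independent per-granularity descriptors described below.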


By repeating the resource descriptor selection process for different values of granularity, independent resource descriptors are extracted. In this way, for each granularity, the entire set of database items collected in the hierarchy is reduced to a resource descriptor constituted by a set of components, each associated with the feature vector of the cluster it represents, the cluster radius, and the population of the cluster.

3. Related work on data fusion

In general, the main problem with merging multiple ranked result lists is the use of many different, mutually incomparable similarity metrics. Outside the specific context of data fusion, closely related works are those in [1, 12] and [9]. Adali et al. [1] proposed a similarity algebra that brings together relational operators and the results of multiple similarity implementations in a uniform language. This algebra can be used to specify complex queries that combine different interpretations of similarity values and multiple algorithms for computing these values. Ilyas et al. [12] introduced a pipelined query operator that produces a global rank from input ranked streams based on a score function. Considering the problem from a different perspective, Fagin et al. [9] introduced various distance measures between the top k documents in different retrieval lists, and provided approximation algorithms for the rank aggregation problem with respect to a large class of distance measures.

Most of the approaches proposed for merging results apply to text libraries and, for some of them, the extension to multimedia libraries is not straightforward. A major difficulty in extending approaches developed for text to images is the difference in computational complexity between comparing two text documents and comparing two images. Comparison of text documents can be accomplished very quickly. In contrast, comparison of two images typically takes much more time, due to the processing required to extract the content descriptors used during distance computation. This rules out re-ranking retrieval results when these are in the form of images. One of the best-known approaches is called Round-Robin [21].
In this approach, it is assumed that each collection contains approximately the same number of relevant items, equally distributed at the top of the result lists provided by each collection. Therefore, result merging can be accomplished by picking items from the top of the result lists in a round-robin fashion: the first item from the first list, the first from the second list, . . . , the second from the first list, and so on. Unfortunately, the assumption of a uniform distribution of relevant retrieved items is rarely observed in practice, especially if the libraries are generic collections available on the Web. As a result, the effectiveness of the Round-Robin approach is very limited. A different solution develops on the hypothesis that the same search model is used to retrieve items from different collections. In this case, item matching scores are comparable across different collections and can be used to drive the fusion strategy. This approach is known as Raw Score Merging [14], but it is rarely used due to the large number of different search models used to index documents, even in text libraries.
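Both baselines are easy to state in code. The sketch below is ours, not from the paper; result lists are `(document, score)` pairs already sorted by rank:

```python
def round_robin_merge(result_lists):
    """Interleave ranked lists: first of each list, then second of each, and so on."""
    merged = []
    longest = max(len(lst) for lst in result_lists)
    for rank in range(longest):
        for lst in result_lists:
            if rank < len(lst):
                merged.append(lst[rank])
    return merged

def raw_score_merge(result_lists):
    """Sort by raw scores; only sound if all collections share one search model."""
    return sorted((item for lst in result_lists for item in lst),
                  key=lambda pair: pair[1], reverse=True)

a = [("a1", 0.9), ("a2", 0.5)]
b = [("b1", 0.8), ("b2", 0.7), ("b3", 0.1)]
assert round_robin_merge([a, b]) == [("a1", 0.9), ("b1", 0.8),
                                     ("a2", 0.5), ("b2", 0.7), ("b3", 0.1)]
assert [d for d, s in raw_score_merge([a, b])] == ["a1", "b1", "b2", "a2", "b3"]
```

Note how the two strategies already disagree on the third result: round-robin trusts rank positions, raw-score merging trusts the (possibly incomparable) scores.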


To overcome the limitations of approaches based on raw score merging, more effective solutions have been proposed, developing on the idea of score normalization. These solutions assume that each library returns a list of retrieved items with matching scores. Then, some technique is used to normalize the matching scores provided by different libraries. Score normalization can also be accomplished by looking for duplicate documents in different lists. The presence of one document in two different lists is exploited to normalize the scores in the two lists. Unfortunately, this approach cannot be used if one or more lists are composed only of unique documents, that is, documents that are not included in any other list. A more sophisticated approach based on the cross similarity of documents is presented in [13]. For each list, each document is used as a new query. Retrieval results for these new queries, as well as their matching scores, are used to evaluate cross similarities between documents retrieved by different libraries. Then, cross similarities are used to normalize the matching scores. The main limitations of this approach are its computational complexity (each document in each retrieved list must be used as a new query for all the libraries) and the fact that it requires the extraction of content descriptors from each retrieved document. This latter requirement, which can easily be met for text documents, can be quite challenging for image documents. In order to lessen these requirements, a new approach is presented in [17] that is based on the use of a local archive of documents to accomplish score normalization. Periodically, the libraries are sampled in order to extract a set of representative documents. Representative documents extracted from all the libraries are gathered into a central archive. When a new query is issued to the libraries, it is also issued to the central search engine, which evaluates the similarity of the query to all documents in the central archive.

Then, under the assumption that each retrieved list contains at least two documents that are also included in the central archive, linear regression is exploited to compute normalization coefficients for each retrieved list and normalize its scores. This approach has been successfully tested for text libraries that were described through the Query-based Sampling approach. Briefly, Query-based Sampling [3] is a method to extract a description of the content of a given resource by iteratively running queries against the resource and analyzing the retrieved results. At the end of the iterative process, all retrieved documents are used to compute a description of the resource content. Unfortunately, when the approach presented in [17] is not combined with a description of each resource extracted using Query-based Sampling, the assumption that each retrieved list contains two elements of the central archive is rarely verified. In these cases, linear regression cannot be applied and matching scores are left un-normalized. As an example, figure 3 shows the ratio of queries that are successfully processed using the method described above for a database of images. In this experiment, the central archive is constructed using the approach described in Section 2.1. In the figure, the horizontal axis reports the number of retrieved images in the result set. The vertical axis reports the ratio of queries (averaged over 1000 random queries) that return at least two elements also included in the central archive. Several curves are reported, corresponding to different cardinalities of the central archive, measured as a percentage of the library size. For instance, using a central archive that contains only 10% of the images in the library, the probability that the first 40 retrieved images include at least two images from the central archive is, on average, only 80%.
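The regression step of [17] can be sketched as follows (our illustration, not the original code): given at least two documents shared between a library's result list and the central archive, fit the line σ = a·s + b on the shared (raw, normalized) score pairs and apply it to the whole list:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def normalize_by_overlap(library_list, central_scores):
    """library_list: [(doc_id, raw_score)]; central_scores: {doc_id: normalized_score}.
    Requires at least two overlapping documents, as assumed in [17]."""
    shared = [(s, central_scores[d]) for d, s in library_list if d in central_scores]
    if len(shared) < 2:
        raise ValueError("fewer than two overlapping documents: cannot regress")
    a, b = fit_line([s for s, _ in shared], [o for _, o in shared])
    return [(d, a * s + b) for d, s in library_list]

lib = [("d1", 10.0), ("d2", 8.0), ("d3", 6.0)]
central = {"d1": 0.9, "d3": 0.5}   # two duplicates with known normalized scores
normalized = dict(normalize_by_overlap(lib, central))
assert abs(normalized["d2"] - 0.7) < 1e-9   # interpolated between the two anchors
```

The `ValueError` branch is exactly the failure mode figure 3 quantifies: when the overlap assumption does not hold, the list's scores must be left un-normalized.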


Figure 3. Applicability of the data fusion approach based on sampling and regression to a library of images. Different curves represent the ratio of manageable queries at different granularities. For example, the curve at 16% indicates the ratio of manageable queries using a central archive which comprises about 16% of the images in the source.

4. Data fusion model

The proposed solution develops on the method presented in [17, 18]. Improvements are introduced to address two major limitations of the original method: the applicability of the solution to the general case, regardless of the particular solution adopted for resource description extraction; and the reduction of computational costs at retrieval time. These goals are achieved by decomposing the fusion process into two separate steps: model learning and normalization. Model learning is carried out once, separately for each library: it is based on running a set of sampling queries against the library and processing the retrieval results. In particular, for each sampling query, the library returns a list of retrieved images with associated matching scores. These images are reordered using a fusion search engine that associates a normalized score with each image. Then, the function mapping original scores onto normalized scores is learned. Normalization, instead, takes place only at retrieval time. Given a query and the results provided by a library for that specific query, it normalizes the matching scores associated with the retrieved results. This is achieved by using the normalization model learned for that library during the model learning phase. In doing so, data fusion is achieved through a model learning approach: during the first phase, the parameters of the model are learned; during the second, the learned model is used to accomplish score normalization and enable the fusion of results based on normalized score merging. The selection of the fusion search engine plays a central role in the data fusion process. The type of search engine, and its way of describing image content and computing the similarity between images, affects the accuracy of score normalization. In general, the higher the similarity between the fusion engine and a specific digital library engine (in terms of content representation and similarity measure), the higher the accuracy of score normalization.

4.1. Score modelling

Let L^(i)(q) = {(I_k, s_k)}_{k=1}^n be the set of image/score pairs retrieved by the i-th digital library as a result of query q. We assume that the relationship between the un-normalized scores s_k and


their normalized counterpart σk can be approximated by a linear relationship. In particular, we assume σk = a ∗ sk + b + k , where k is an approximation error (residue), a and b two parameters that depend on the digital library (i.e. they may not be the same for two different digital libraries, even if the query is the same) and on the query (i.e. for one digital library they may not be the same for two different queries). The approximation error k accounts for the use of heterogeneous content descriptors (the digital library and the data fusion server may represent image content in different ways) as well as for the use of different similarity measures (even if the digital library and the data fusion server use the same image content descriptors—e.g. 64 bins color histogram—they may use two different similarity measures—e.g. one may compare histograms through a quadratic form distance function, the other may use the histogram intersection). Values of parameters a and b are unknown at the beginning of the learning process. However, during learning, values of these parameters are computed, separately for each library. For each sample query, the value of parameters a and b is estimated. Computing the value of parameters a and b for different queries is equivalent to sampling the value distribution of these parameters on a grid of points in the query space. Knowledge of parameter values on the grid points is exploited to approximate the value of parameters for new queries. This is accomplished by considering the position of the new query with respect to grid points. Approximation of parameters a and b for a new query q is carried out as follows. If the new query matches exactly one of the sample queries (say qk ) used during the learning process, then the value of a and b is set exactly to the values ak and bk that were computed for the sample query qk . Otherwise, the three grid points q j , qk , ql that are closest to the new query q are considered. 
The value of a and b for the new query is approximated by considering values of a and b as they were computed for queries q j , qk , ql . In particular, if these values were a j , b j , ak , bk , al and bl , then values of a and b for the query qk are estimated as: aˆ =

dj dk dl a j + ak + al , D D D

dj dk dl bˆ = b j + bk + bl D D D

(2)

where D = d j + dk + dl , and d j , dk and dl are the Euclidean distances between the new query and grid points q j , qk , ql , respectively (see figure 4(a)). Using Eq. (2) is equivalent to estimate values of parameters a and b as if the new query lain to the hyperplane passing by q j , qk and ql . For this assumption to be valid, the new query should be close to the hyperplane. The closer it is, the more precise the approximation. As a consequence of this approximation, normalized score is estimated as: σˆk = aˆ ∗ sk + bˆ so that the error k = σk − σˆk is a function of the query q. In the proposed approach, linear regression is the mathematical tool used for the estimation of normalization coefficients, given a sample query and the corresponding retrieved images. In our case, data points are pairs (σi , si ) being si the original matching score of the i-th image and σi its normalized score. The application of this method to a collection of pairs (σi , si ) results in the identification of the two parameters a and b such that σi = a ∗si +b+i . Values of parameters a and b are determined so as to minimize the sum of square distances


Figure 4. (a) Position of a new query q with respect to the approximating points q j , qk , ql , on the grid. (b) Normalized scores σk with respect to un-normalized ones sk for a sample query. The continuous straight line approximates the data model using linear regression.

of data points to the model line:

    min_{a,b} Σk (a·sk + b − σk)² / (a² + 1)        (3)
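Eq. (3) is the orthogonal (total least squares) regression objective, for which a closed-form solution exists in terms of the centered second moments of the data. A sketch, under the assumption that at least two distinct score values are present:

```python
import math

def tls_fit(s, sigma):
    """Total least squares fit of sigma ~ a*s + b, minimizing the sum of
    squared perpendicular distances to the line (Eq. (3)), rather than
    the vertical distances of ordinary least squares."""
    n = len(s)
    sm = sum(s) / n
    gm = sum(sigma) / n
    sxx = sum((x - sm) ** 2 for x in s)
    syy = sum((y - gm) ** 2 for y in sigma)
    sxy = sum((x - sm) * (y - gm) for x, y in zip(s, sigma))
    if sxy == 0:
        a = 0.0
    else:
        # standard closed form for the orthogonal-regression slope
        a = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    b = gm - a * sm  # the TLS line always passes through the centroid
    return a, b
```

For data that are exactly collinear, the fit recovers the line exactly; for noisy data it returns the line with minimal total perpendicular error.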

Figure 4(b) plots, for a representative sample query, the values of normalized scores σk with respect to un-normalized ones sk. The straight line is the one derived during the learning phase by computing parameters a and b of the linear regression between scores σk and sk. This shows that the linear model assumption is quite well fulfilled, at least for the first few retrieved images. In this experiment, we used the fusion engine with the features described in Section 4 and a digital library which supports color search based on a 64 bins histogram and the Euclidean distance.

4.2. Selection of sample queries

During the learning phase, the distribution of parameters a and b is sampled for a set of queries. The selection of these sample queries affects the quality of the approximation of a and b for a generic query in the query space during normalization. Basically, two main approaches can be distinguished to select sample queries. The first approach completely disregards any information available about the distribution of images in the library. In this case, the only viable solution is to use sample queries uniformly distributed in the query space.

A different approach relies on the availability of information about the distribution of images in the library. This should not be considered a severe limitation, since this kind of information is certainly available to accomplish the resource selection task. In particular, this information is available in the form of a resource descriptor capturing the content of the entire library. Typically, resource descriptors are obtained by clustering library images and retaining information about cluster centers, cluster population and cluster radius. Clustering is performed at several resolution levels so as to obtain multiple


cluster sets, each one representing the content of the library at a specific granularity (see Section 2.2). Resource descriptors can be used to guide the selection of sample queries by using the cluster centers themselves as sample queries. In this way, the distribution of sample queries conforms to the distribution of images in the library. Once the resource description process is completed, each cluster center ck is used as a query to feed the library search engine. This returns a set of retrieved images and associated matching scores si. Then, ck is used as a query to feed the reference search engine (i.e. the search engine used by the data fuser), which associates with each retrieved image a normalized (reference) matching score σi. Linear regression is used to compute the regression coefficients (ak, bk) for the set of pairs (σi, si). In doing so, each cluster center ck is associated with a pair (ak, bk) approximating the values of the regression coefficients for query points close to ck. The set of points (ck, ak, bk) can be used to approximate the value of the regression coefficients (a, b) for a generic query q by interpolating the values of (ak, bk) according to Eq. (2).

It is worth noting that, in general, feature vectors capturing image content are defined in a high dimensional space, which makes a uniform coverage of the whole feature space with a limited set of sample queries difficult. However, the use of resource descriptors as sampling points reduces this complexity by adaptively locating sample queries in those parts of the space where feature vectors are denser. In fact, the points of resource descriptors correspond to cluster centers in the feature space. This guarantees an accurate coverage of the feature space with a limited number of sample queries. In addition, the use of resource descriptors at different granularities as sample queries yields an adaptive coverage of the feature space at different levels of resolution.
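Putting these pieces together, the learning loop over cluster centers can be sketched as below; `library_search`, `reference_search` and `fit` are hypothetical stand-ins for the library engine, the data fusion engine and the regression routine, since the paper does not fix these interfaces:

```python
def learn_coefficients(cluster_centers, library_search, reference_search, fit):
    """For each cluster center ck, query both engines, pair the raw and
    normalized scores of the images retrieved by both, and fit the linear
    model sigma ~ a*s + b (hypothetical engine interfaces).

    Each search function maps a query to a list of (image_id, score) pairs;
    `fit` maps (scores, normalized_scores) to regression coefficients (a, b).
    Returns the sampled grid as a list of (ck, (ak, bk)) entries."""
    grid = []
    for ck in cluster_centers:
        lib = dict(library_search(ck))      # image_id -> raw score s_i
        ref = dict(reference_search(ck))    # image_id -> normalized score sigma_i
        common = [i for i in lib if i in ref]
        s = [lib[i] for i in common]
        sigma = [ref[i] for i in common]
        a, b = fit(s, sigma)
        grid.append((ck, (a, b)))
    return grid
```

The returned grid is exactly the set of points (ck, ak, bk) used above to interpolate coefficients for new queries.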
Finally, we note that the computational complexity of the learning process is determined by the combination of several factors. In particular, the type of content descriptor used by the data fusion engine and its similarity measure are the two factors that most affect the computational load. Furthermore, since the query process takes place in a distributed environment, the connection speed over the network becomes the main bottleneck of the learning phase. In fact, the query time sums up the time to deliver the query to the digital library, the search time on the source side, and the time to transfer all the result images to the data fusion engine; only then can the re-processing and learning phases be carried out by the data fusion module.

5. Experimental results

The proposed solution to data fusion for distributed digital libraries has been implemented and tested on a benchmark of three different libraries, each one including about 1000 images. Two libraries represent image content through 64 bins color histograms. However, they use two different similarity measures, namely the L1 (city-block) and L2 (Euclidean) norms, to compare image histograms. The third library computes local histograms on the cells of a 3 × 3 grid which partitions the image. The CIE Lab color space is used for the computation of these histograms, while the Euclidean distance is used to compare them.
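For illustration, the similarity measures used by these libraries (and by the fusion engine) can be sketched as follows; the normalization term in the histogram intersection is an assumption, as the paper does not spell out the exact variant:

```python
import math

def l1_dist(h, g):
    """City-block (L1) distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h, g))

def l2_dist(h, g):
    """Euclidean (L2) distance between two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, g)))

def hist_intersection(h, g):
    """Histogram intersection similarity: higher means more similar.
    Normalized by the smaller total mass (Swain-Ballard style)."""
    return sum(min(a, b) for a, b in zip(h, g)) / min(sum(h), sum(g))
```

The first two return distances (0 for identical histograms), while the last returns a similarity in [0, 1]; any fusion across these libraries must therefore reconcile scores with opposite orientations and different ranges, which is precisely what the normalization model addresses.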


For the experiments described in this section, the fusion engine represents image content through a 32 bins color histogram and uses histogram intersection in the RGB space to compute image similarity. Experimental results are presented to report both on the quality of the normalization coefficients approximation and on the overall quality of the data fusion process. Each library was processed separately. Libraries were first subject to the extraction of resource descriptors according to the procedure presented in Section 2.1. Cluster centers extracted through the resource description process were used as sample queries. For each sample query, values of the normalization coefficients were computed, as explained in Section 4.1. Evaluation of the proposed solution addresses two different accuracy measures. The first one describes the accuracy of the approximation of the normalization coefficients. The second one describes the accuracy of the ranking of merged results. Values of the normalization coefficients for sample queries were used to approximate the value of the normalization coefficients for new queries. In particular, the accuracy of the approximation was measured on a random set of test queries, not used during model learning. For each test query, values of the normalization coefficients were approximated according to Eq. (2). Actual values of the normalization coefficients were also computed by running the query and following the same procedure used for model learning (Section 4.1). Comparison of actual and approximated values gives a measure of the approximation accuracy. Let â and b̂ be the estimated values of the normalization parameters and a and b their actual values. The quality of the approximation is measured by considering the expected value of the normalized errors:

    εa = E[ |â − a| / (1 + |a|) ],    εb = E[ |b̂ − b| / (1 + |b|) ]        (4)
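Over a set of test queries, the error measures of Eq. (4) reduce to simple averages; a minimal sketch:

```python
def expected_normalized_error(estimated, actual):
    """Mean normalized error E[|p_hat - p| / (1 + |p|)] between estimated
    and actual parameter values over a set of test queries (Eq. (4)).
    Applied once to the a values and once to the b values."""
    pairs = list(zip(estimated, actual))
    return sum(abs(e, ) if False else abs(e - a) / (1 + abs(a)) for e, a in pairs) / len(pairs)
```

The 1 in the denominator keeps the measure well defined when the actual parameter value is close to zero.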

Plots in figure 5 report values of the expected normalized errors for different values of the granularity of the resource descriptions. Granularity is measured as the percentage of the resource description cardinality with respect to the database size. Accordingly, a smaller granularity corresponds to a lower number of sample images used at learning time. It can be noticed that, even with coarse descriptors corresponding to low granularity levels, the value of the approximation error is always quite small. This experimentally confirms that it is possible to use estimated parameters for the data fusion process instead of computing their actual values on the fly at retrieval time.

The second measure of accuracy aims to estimate the quality of the data fusion and of the ordering of retrieved images that it yields. This is obtained by considering how images are ordered when the data fusion engine is used to process all retrieved images, which results in the comparison of two ordered lists of images. The first list (data fusion list) is the output of the proposed data fusion module (called Learning and Regression, LR in the following). The fusion module receives the N best ranked retrieved images from each library and transforms their original scores into


Figure 5. Normalization error for approximated parameters of the three digital libraries: (a) error on parameter a and (b) error on parameter b.

normalized scores. This yields a list of N·L images (L being the number of available digital libraries) ordered by normalized scores. A second ranked list (reference list) is obtained by ranking the set of all N·L retrieved images using the data fusion search engine. To build this second list, each one of the N·L images is processed in order to extract its content descriptor. Then, these content descriptors are compared against the user query and all images are sorted in decreasing order of similarity. For this experimentation, the content of each image is described locally in terms of a color histogram (according to the representation used by the data fusion engine, see Section 4), while the similarity between two image histograms is evaluated through the histogram intersection distance. The three digital libraries used in the experiment are those described at the beginning of this Section.

Let L(r)(q) = {(dk(r), σk(r))} for k = 1, ..., M and L(df)(q) = {(dk(df), σk(df))} for k = 1, ..., M be the sets of image/normalized-score pairs for the reference and the data fusion list, respectively, obtained by combining results returned for a query q. The size of both lists is M = N·L. Let f : {1, ..., M} → {1, ..., M} be the function that maps the index i of a generic image in the reference list to the index f(i) of the same image in the data fusion list. We assume that the best result for the data fusion process corresponds to the case in which every image in the data fusion list has the same rank it has in the reference list, that is: rank(r)(i) = rank(df)(f(i)). Thus, a measure of the overall quality of the data fusion process can be expressed through the following function:

    F = (1/N) Σ_{i=1..N} | rank(r)(i) − rank(df)(f(i)) |        (5)

that is, the average value of the rank disparity computed on the first N images of the two lists. This corresponds to the case in which a user asks for N images and the resource selection module queries more than one library, retrieving N images from each of them. Then, only the N most relevant images in the fused list are shown to the user.
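The quality measure F of Eq. (5) can be computed directly from the two ranked lists of image identifiers; a minimal sketch:

```python
def rank_disparity(reference, fused, N):
    """Average rank disparity F (Eq. (5)) over the first N images of the
    reference list; both lists hold image ids, best match first, and every
    image in the reference prefix is assumed to appear in the fused list."""
    fused_rank = {img: r for r, img in enumerate(fused)}
    return sum(abs(r - fused_rank[img])
               for r, img in enumerate(reference[:N])) / N
```

A value of 0 means the fused list reproduces the reference ordering exactly on the first N images.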


Figure 6. (a) Quality of the data fusion for a sample query. The straight diagonal line is the ideal mapping, which preserves the image rankings between the reference and the data fusion lists. (b) Quality of the data fusion for the Learning and Regression method (LR curve) and for the Round-Robin approach (RR curve). Values of F are reported for different granularities.
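For comparison, the Round-Robin baseline shown in figure 6(b) can be sketched as a simple interleaving of the per-library result lists; how duplicates are handled is an assumption, as [21] leaves it open:

```python
def round_robin_merge(result_lists, N):
    """Interleave the ranked per-library result lists, taking one image per
    library per round and skipping duplicates, until N images are collected
    (or the lists are exhausted). No score information is used at all."""
    merged, seen = [], set()
    for rank in range(max(len(lst) for lst in result_lists)):
        for lst in result_lists:
            if rank < len(lst) and lst[rank] not in seen:
                seen.add(lst[rank])
                merged.append(lst[rank])
                if len(merged) == N:
                    return merged
    return merged
```

Because RR ignores scores entirely, a library returning mostly irrelevant images still contributes one image per round, which is why score-based normalization can outperform it.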

The lower the value of F, the higher the quality of the data fusion (i.e. 0 corresponds to an optimal data fusion). As an example, figure 6(a) plots values of F(i) = |rank(r)(i) − rank(df)(f(i))| for a sample query and a resource description granularity equal to 4. In this experiment the three libraries are considered (L = 3), each returning sixteen images (N = 16), so that the maximum size of the fused and reference lists is M = 16 × 3 = 48. Resulting values are reported only for the first 32 images ranked according to the reference list. The horizontal axis represents the ranking produced according to the reference list, while the vertical axis is the ranking for the list originated by the data fuser. In this plot, the straight diagonal line represents the behavior of an ideal fuser, which yields the same ranking given by the reference list. Each displacement with respect to this line indicates a difference between the two lists in the ranking of the corresponding image. It can be observed that the error is quite small for all ranks. In particular, the error is small for the first and most relevant images of the ranking. In addition, only a few of the images comprised in the first 32 of the reference list are ranked outside this limit in the LR list. A similar behavior has been observed in the average case. Finally, the proposed LR solution for data fusion has been compared to the general Round-Robin (RR) approach [21]. Figure 6(b) compares values of F for LR and RR, averaged over 100 random queries and for different granularities. The average ranking error for LR is quite small and decreases with the granularity. It can also be observed that the Learning and Regression solution yields better results than the RR approach, with the gain of LR increasing with the granularity.

6. Conclusion and future work

In this paper, an approach is presented for merging results returned by distributed image libraries. The proposed solution is based on learning and approximating normalization


coefficients for each library. Since the major computational effort is moved to the learning phase, the approach has a low processing time at retrieval time, making it applicable to archives of images. In addition, unlike other fusion methods which rely on the presence of common documents in different result lists, the proposed fusion strategy is completely general and does not impose any particular constraint on the results. Experimental results are reported to demonstrate the effectiveness of the proposed solution. A comparative evaluation has also shown that the proposed Learning and Regression solution outperforms the general Round-Robin approach to the data fusion problem.

Future work will address experimentation and comparison of the effect of the learning technique on fusion effectiveness. In particular, we will investigate the possibility of adapting the learning model to the specific digital library, in order to use more appropriate models than the linear one. Issues related to the relationships between the granularity of sample queries and the accuracy of fusion will also be investigated, together with a detailed analysis of the relationships between the choice of the fusion search engine and the accuracy of model learning.

References

1. S. Adali, P.A. Bonatti, M.L. Sapino, and V.S. Subrahmanian, “A multi-similarity algebra,” in Proc. of ACM SIGMOD Conf. on Management of Data, Seattle, WA, June 1998, pp. 402–413.
2. S. Berretti, A. Del Bimbo, and P. Pala, “Using indexing structures for resource descriptors extraction from distributed image repositories,” in Proc. IEEE Int. Conf. on Multimedia and Expo, Lausanne, Switzerland, Aug. 2002, Vol. 2, pp. 197–200.
3. J. Callan and M. Connell, “Query-based sampling of text databases,” ACM Transactions on Information Systems, Vol. 19, No. 2, pp. 97–130, 2001.
4. W. Chang, D. Murthy, A. Zhang, and T. Syeda-Mahmood, “Metadatabase and search agent for multimedia database access over internet,” in Int. Conf. on Multimedia Computing and Systems (ICMCS’97), Ottawa, Canada, June 1997.
5. W. Chang, G. Sheikholaslami, A. Zhang, and T. Syeda-Mahmood, “Efficient resource selection in distributed visual information retrieval,” in Proc. of ACM Multimedia’97, Seattle, 1997.
6. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom, “The TSIMMIS project: Integration of heterogeneous information sources,” in Proceedings of IPSJ Conference, Tokyo, Japan, Oct. 1994, pp. 7–18.
7. P. Ciaccia, M. Patella, and P. Zezula, “M-tree: An efficient access method for similarity search in metric spaces,” in Proc. of the Int. Conf. on Very Large Databases, Athens, Greece, 1997.
8. W.F. Cody, L.M. Haas, W. Niblack, M. Arya, M.J. Carey, R. Fagin, M. Flickner, D. Lee, D. Petkovic, P.M. Schwarz, J. Thomas, M. Tork Roth, J.H. Williams, and E.L. Wimmers, “Querying multimedia data from multiple repositories by content: The Garlic project,” in IFIP 2.6 Third Working Conference on Visual Database Systems (VDB-3), Lausanne, Switzerland, March 1995, pp. 17–35.
9. R. Fagin, R. Kumar, and D. Sivakumar, “Comparing top k lists,” in Proc. of the ACM-SIAM Symposium on Discrete Algorithms (SODA’03), Baltimore, MD, Jan. 2003, pp. 28–36.
10. N. Fuhr, “Optimum database selection in networked IR,” in Proc. of the SIGIR’96 Workshop on Networked Information Retrieval, Zurich, Switzerland, Aug. 1996.
11. L. Gravano and H. Garcia-Molina, “Generalizing GlOSS to vector-space databases and broker hierarchies,” in Proc. of the 21st Int. Conf. on Very Large Data Bases, 1995, pp. 78–89.
12. I.F. Ilyas, W.G. Aref, and A.K. Elmagarmid, “Joining ranked inputs in practice,” in Proc. of the VLDB Conference, Hong Kong, China, 2002, pp. 950–961.
13. S.T. Kirsch, “Document retrieval over networks wherein ranking and relevance scores are computed at the client for multiple database documents,” US Patent 5659732, 1999.


14. K.L. Kwok, L. Grunfeld, and D.D. Lewis, “TREC-3 ad-hoc, routing retrieval and thresholding experiment using PIRCS,” in Proc. of TREC-3, 1995, pp. 247–255.
15. MIND: Resource Selection and Data Fusion for Multimedia International Digital Libraries. EU project IST2000-26061. http://www.mind-project.org.
16. N. Roussopoulos, S. Kelley, and F. Vincent, “Nearest neighbor queries,” in Proc. of the 1995 ACM SIGMOD Int. Conf. on Management of Data, San Jose, CA, May 1995, pp. 71–79.
17. L. Si and J. Callan, “Using sampled data and regression to merge search engine results,” in Proc. of Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Tampere, Finland, 2002, pp. 19–26.
18. L. Si and J. Callan, “A semisupervised learning method to merge search engine results,” ACM Transactions on Information Systems, Vol. 21, No. 4, pp. 457–491, 2003.
19. V.S. Subrahmanian, S. Adali, A. Brink, R. Emery, J. Lu, A. Rajput, T.J. Rogers, R. Ross, and C. Ward, HERMES: Heterogeneous Reasoning and Mediator System. http://www.cs.umd.edu/projects/hermes/.
20. A. Tomasic, L. Raschid, and P. Valduriez, “Scaling heterogeneous databases and the design of Disco,” in Proc. of the Int. Conf. on Distributed Computer Systems, Hong Kong, 1996.
21. E.M. Voorhees, N.K. Gupta, and B. Johnson-Laird, “Learning collection fusion strategies,” in Proc. ACM SIGIR’95, 1995, pp. 172–179.
22. E.M. Voorhees, N.K. Gupta, and B. Johnson-Laird, “The collection fusion problem,” in The Third Text REtrieval Conference (TREC-3).

Stefano Berretti received the doctoral degree in Electronics Engineering and the Ph.D. in Information Technology and Communications from the Universit`a di Firenze, in 1997 and 2001, respectively. Since 2002, he has been Assistant Professor of Computer Engineering at the University of Firenze. His current research interests are mainly focused on content modeling, retrieval, and indexing for image and video databases.

Alberto Del Bimbo is Full Professor of Computer Engineering at the Universit`a di Firenze, Italy. Since 1998 he has been the Director of the Master in Multimedia of the Universit`a di Firenze. At present, he is Deputy Rector of the Universit`a di Firenze, in charge of Research and Innovation Transfer. His scientific interests are Pattern Recognition, Image Databases, Multimedia and Human Computer Interaction. Prof. Del Bimbo is the author of over 170 publications in the most distinguished international journals and conference proceedings. He is the author of “Visual Information Retrieval,” a monograph on content-based retrieval from image and video databases published by Morgan Kaufmann.


He is a Member of the IEEE (Institute of Electrical and Electronic Engineers) and a Fellow of the IAPR (International Association for Pattern Recognition). He is presently Associate Editor of Pattern Recognition, Journal of Visual Languages and Computing, Multimedia Tools and Applications, Pattern Analysis and Applications, IEEE Transactions on Multimedia, and IEEE Transactions on Pattern Analysis and Machine Intelligence. He was Guest Editor of several special issues on image databases in highly respected journals.

Pietro Pala graduated in Electronics Engineering at the University of Firenze in 1994. He received the Ph.D. in Information and Telecommunications Engineering in 1997. Presently, he is Assistant Professor of Computer Engineering at the University of Firenze, Italy, where he teaches Database Management Systems. His main scientific interests include Pattern Recognition, Image and Video Databases and Multimedia Information Retrieval. He is the author of over 40 publications in the most distinguished international journals and conference proceedings and serves as a reviewer for a number of international journals, including IEEE Transactions on Multimedia, IEEE Transactions on Pattern Analysis and Machine Intelligence and IEEE Transactions on Image Processing.
