distributed PeerâtoâPeer (P2P) search engine for similarity search based ... the advantage of ranking the retrieved results according to their proximity to a query ...
MRoute: A Peer-to-Peer Routing Index for Similarity Search in Metric Spaces? Claudio Gennaro‡ , Matteo Mordacchini‡ , Salvatore Orlando∗‡ , Fausto Rabitti‡ ‡
ISTI - CNR, Area della Ricerca di Pisa Via Giuseppe Moruzzi, 1 56124 Pisa, Italy
∗
Dipartimento di Informatica Universit` a Ca’ Foscari Via Torino, 155 30172 Venezia, Italy
Abstract. Similarity search for content-based retrieval (where content can be any combination of text, image, audio/video, etc.) has gained importance in recent years, also because of the advantage of ranking the retrieved results according to their proximity to a query. However, to use similarity search in real world applications, we need to tackle the problem of huge volumes of such mixed multimedia data (e.g., coming from Web sites) and the problem of their distribution on multiple cooperating nodes. This is the situation of the NeP4B project (Networked Peers for Business), where the distributed nodes (i.e., peers) represent aggregations of SME’s with similar activities and the multimedia objects are descriptions/presentations of their products/services extracted from the companies’ Web sites. In this paper we approach this problem by considering a scenario of a network of autonomous peers maintaining a local collection of metric objects (i.e., mixed mode multimedia content). This network forms a distributed Peer–to–Peer (P2P) search engine for similarity search based on the paradigm of Routing Index. Each peer in the network thus maintains both an index of its local resources and a table for every neighbor, summarizing the objects that are reachable from it. The paper presents techniques that aim to make our P2P similarity-based search system viable, trading approximate results for scalable solutions. Results of simulations that use real collections of images are discussed.
1
Introduction
Nowadays the search for non-textual data (e.g., images, audio, or videos) is mostly done by exploiting the metadata annotations or extracting the text close to the data. For instance, web search engines in performing image searching simply index the text near the image and the image tags, used to provide a description of an image. Images indexing methods based on content-based analysis ?
This work was partially supported by the NeP4B project (Networked Peers for Business), funded by the Italian government, and by the SAPIR (Search In Audio Visual Content Using Peer-to-Peer IR) project, funded by the European Commission under IST FP6 (Sixth Framework Programme)
and resulting image features, such as colors and shapes), are not exploited at all. These limitations can be overcome if a similarity based search approach is adopted: content-based retrieval is supported also when the content can be any combination of text, image, audio/video, etc.. Moreover, this approach has also the advantage of ranking the retrieved results according to their proximity to a query, and supports relevance feedback [11]. The similarity search approach has been extensively used in content-based image retrieval systems, that retrieve images whose visual appearance is globally similar to a selected example image. Ihis approach, initially proposed by Swain and Ballard [10], was then adopted by a vast majority of content-based image retrieval systems (e.g., IBM QBIC, VisualSEEk, Virage’s VIR Image Engine, and Excalibur’s Image RetrievalWare). However, to use similarity search in real world applications, we need to tackle the problem of huge volumes of multimedia data (e.g., coming from Web sites), the problem that such multimedia data are of mixed nature (i.e., text image, audio/video, etc.) and the user wants to query the combination of such data (i.e., complex query support) and the problem of their distribution on multiple cooperating nodes. This is the situation of the NeP4B project (Networked Peers for Business), a P2P project aiming at innovative ICTs solutions for SMEs, by developing an advanced technological infrastructure to enable companies of any nature, size and geographic location to search for partners, negotiate and collaborate without limitations and constraints. The infrastructure will base on independent and interoperable semantic peers who behave as nodes of a virtual network. The project vision is towards an Internet-based structured marketplace where companies can access the huge amount of information already present in vertical portals and corporate databases and use it for dynamic, value-adding collaboration purposes. In the NeP4B P2P infrastructure the semantic peers (i.e., the distributed nodes) represent aggregations of SME’s (Small and Medium size Enterprises) with similar activities and the multimedia objects are descriptions/presentations of their products/services extracted from the companies’ Web sites. An example of such multimedia object, with complex content (in this case, a textual description and an image) is given in Figure 1. The proposed NeP4B similarity search infrastructure should support complex queries on this kind of content on the numerous connected peers. In this paper, to achieve the goal of being able to cope with a huge number of indexed objects and queries, we propose a scalable P2P indexing structure, based on Routing Index (RI) [4]. According to this approach, the global indexing phase is no longer required, because each peer node is responsible for storing and indexing its own objects. Also adopting this approach, the similarity-based search exploits objects characterized by a metric space, while the multi-feature queries are easily supported. Our proposal, called MRoute, intend to explore the opportunity of combing the advantage of the Routing Index paradigm with the Similarity Search on Metric Spaces approach.
Fig. 1. Example mixed multimedia object in NeP4B.
The rest of the paper is organized as follows. In sec. 2 we review some of the most prominent works present in the literature, while in sec. 3 we present our proposed approach. More precisely, in sec. 3.1 we show how our network is structured and formed and give some definition about our system. In sec. 3.2 we define our indexing system and show how it is used to spread objects feature in the network. Sec. 3.4 explain how indices are exploited for routing queries toward potential matchings. Results of simulation experiments with our network are presented in sec. 4.
2
Related Works
The problem of routing queries in P2P systems is well known. The first solutions that have appeared used to send a query all over the network (flooding). The most prominent example of such network is Gnutella [1]. In order to avoid flooding the network with query messages, many data indexing methods have been proposed. The most promising ones are the so-called Distributed Hash Table (DHT) based systems. They are based on a global index, partitioned between the peers that take part in the system. Every data item is associated with a key obtained by hashing an attribute of the object (e.g. its name) and every node in the network is responsible for maintaining information about a set of keys and the associated items. They also maintain a list of adjacent or neighboring nodes. A query becomes the search of a key in the DHT. When a peer receives a query, if it does not have the requested items, it forwards the query to the neighbor having keys which are closer to the requested one. Data placement ensures that queries eventually reach a matching data item. In order to further enhance the search performance, many DHT-based protocols organize the peers into an overlay structure. So, in Chord [9] nodes are organized into a virtual circle, while in CAN [7] the identifier space is seen as a d-dimensional Cartesian space. This space is partitioned into zones of equal size and every peer is responsible of one of these zones. Other relevant examples of this kind of systems are Pastry [8] and Tapestry [12].
Although these networks show good performances and scalability characteristics, they only support exact queries, i.e., requests for data items matching a given key. Moreover, these networks index objects using only one attribute. Similarity searches always involve the execution of range queries with possibly more than one attribute per query. Thus, classical DHT systems need to be extended and adapted in order to deal with such queries. Very recently four scalable and distributed similarity access methods, able to execute similarity queries for any metric dataset, have been proposed [2]. Two different strategies have been adopted: techniques based on partitioning principles and transformation methods. The former group comprises the VPT∗ which uses ball partitioning techniques, and the GHT∗ which exploits generalized hyperplane partitioning principles. The principle adopted by the latter group exploits a mapping from a metric space into a coordinate space of given dimensionality. This enables us to apply the well-known solution named distributed hash tables (DHT): the CAN and the Chord. The metric indexes based on them are called the MCAN and the M-Chord, respectively.
3
MRoute - Multimedia Routing Index
In this section, we describe our proposal of Multimedia Routing Index that supports similarity search for content-based retrieval and aims at solving the two related problems of huge volumes of mixed multimedia data and their distribution on multiple cooperating peers. We will show also a running example inspired to the context of the NeP4B project. 3.1
Network Organization
The network topology presented in this paper is a tree. We choose this topology because the network uses Routing Indices to share knowledge about objects properties and to route queries toward potential matchings. As outlined in [4], Routing Indices work well when no cycles are present in the network. If they are present, the system needs mechanism that remove or avoid them when creating the indices or when queries are spread in the network. This work is the extension for similarity searches of a previous one [6]. In that work, we present two types of routing indices, one uses a loop-free network (i.e. a tree) and another one which consider a less structured network. In this paper we use the first type of network organization. The purpose is to test the indexing structure performances when dealing with similarity searches in metric spaces. Extensions with less structured network topologies are foreseen in the future. In this paper we concentrate ourselves on similarity-based Range Queries over metric objects, defined as follows: given an object q ∈ D and a range r, a query Q = (q, r) has to retrieve {x ∈ D | d(x, q) ≤ r}. On the basis of range queries, we can also realize nearest neighbors queries. According to Ch´ avez, et al. [3], the similarity-based search methods for metric spaces relies on an index that allows determining a set of candidate subsets where
the objects relevant to the query can appear. In this paper we use the well known technique of reference objects (or pivots) to transform the indexed object and the query objects into a vector of distances from the m reference objects {p1 , . . . , pm }. In particular, given x ∈ D, function T : D → Rm transforms x as follows: T (x) = (d(x, p1 ), . . . , d(x, pm )) Due to the triangular inequality property of metric spaces, using L∞ we obtain a contractive mapping in the new vector space, such that L∞ (T (x), T (y)) ≤ d(x, y). Therefore, given Q = (q, r), we have to check regions of our mapped space that include objects x such that L∞ (T (q), T (x)) ≤ r, i.e., such that maxm i=1 (|d(x, pi ) − d(q, pi )|) ≤ r. In other words, given an object x, we have that d(x, q) > r surely holds if there exists a reference object pi such that |d(x, pi ) − d(q, pi )| > r. In conclusion, we need only to check regions of our mapped space Rm centered in (d(q, p1 ), . . . , d(q, pm )), such that [d(q, pi ) − r, d(q, pi ) + r] for each reference object pi (i.e., for each dimension i of Rm ). For multimedia objects, we can compute the distance d on the basis of single or multiple features. For each feature (or combination of features) we can select a set of reference objects, map the objects using a function T : D → Rm , and build an index based on L∞ and the triangle inequality property. In the following, we denote with Nb (P ) the set of neighbors of a peer P . With T (P → P 0 ) we denote the network subtree rooted on P 0 , where P 0 ∈ Nb (P ). The set of data objects owned by a peer P is called local repository and we denote it with Data (P ). We use the qualifier “data” in order to distinguish it from metric object. We suppose that data in the system is in some way clusterized, i.e. each peer collects objects from homogenous data sources. Each data object O is characterized by a set of features. A feature is a metric object extracted from the data object. For instance a data object could be a jpeg image, from which we can extract two features, e.g. the texture and the color-histogram. O.F indicates the value of feature F for O. Throughout the paper, we will simply use the term object to mean data objects. We suppose that the system starts with n initial nodes. As we see in the next section, searches in metric spaces are done with the help of some reference objects. We assume that when the system is started, for every feature F , m reference data objects are chosen, with m ≤ n. They are selected as m random objects from the local repositories of the first n nodes, one per peer, starting from the peers with the largest data collections. Given a set of reference objects T = {T1 , . . . , Tm }, the corresponding set of features is denoted as T.F = {T.F1 , . . . , T.Fm }. Different features could have different sets of reference objects, each set with different cardinalities. 3.2
Data Indexing
The data indexing process is intended to produce a summarized description of the objects owned by each peer in order to produce efficient Routing Indices.
The aim of these indices is to provide a concise but yet sufficiently detailed description of the resources available in a given network area. This information is then used at query-resolution time to prune entire system zones from searching, thus easing the search process. As for classical methods for searching in metric spaces, we make use of reference objets in order to prune non-matching objects from searching when solving queries. In particular, let O be an object, Q a query object and Q.F its corresponding feature, and T.Fj the feature of a j-th reference object. Given a range query of radius RF centered in Q.F , an object O that belongs to the results set must satisfy the triangle inequality, as in the following: |dF (O.F, T.Fj ) − dF (Q.F, T.Fj )| ≤ RF
(1)
where dF (O.F, T.Fj ) and dF (Q.F, T.Fj ) indicate the distances of O.F and Q.F from T.Fj , respectively. This process allow to discover potential matching for the query. Actual distances of the feature of the object from the feature of the query must be computed, in order to find the objects that really satisfy the query conditions. Since distance computation is an expensive process, using precomputed distances from some reference objects allow to discard uninteresting objects and reduce the computational cost of the whole process. Moreover, using a larger number of reference objects poses a larger number of conditions like Eq. (1), thus further reducing the search space. For these reasons, we index objects with respect to their distances from the reference objects selected during the network creation process. Since objects have possibly more than one feature, we maintain a set of indices for each feature considered in the network. Each set is built with respect to the reference objects of the feature the indices refer to. Objects individual indices are then used to construct Routing Indices. Thus, the indexing process is twofold. The first phase takes place inside each peer. During this step, each data object is sorted according to its distance to the reference objects. As said above, an object has an index for each reference object associated with a given feature. So, if there are m reference objects for a feature, then the object associates m different indices with respect that feature. In our structure, an index simply consists of a vector with k elements. Each element of the vector represents a distance interval from the reference object, as illustrated in Fig. 2. As can be seen, the vector is filled in with 0s, except the entry corresponding to the distance interval where the real distance of the object from the reference object lays. Formally, if we consider distances from a reference object T.Fi in the interval [a, b], we select k + 1 division points a = a0 < a1 < . . . < ak = b such that [a, b] is partitioned into k disjoint intervals [ai , ai+1 ), i = 0, 1, . . . , k −1. For each feature F and its set of associated reference objects T.F = {T.F1 , . . . , T.Fm }, we encode the value d(O.F, T.Fj ), j = 1, . . . , m, with a k bit binary vector DataIdx (O.F )T.Fj = (b0 , b1 , . . . , bk−1 )
(2)
Fig. 2. Creation of a single local index
such that bi = 1 i and only if d(O.F, T.Fj ) ∈ [ai , ai+1 ). Both the parameter k (number of elements of the vector) and the division points a0 , a1 , . . . , ak may be different for each reference object of each feature type. Individual object indices are then used to construct a concise description of all the data owned by a peer. The peer index, wrt a reference object Tj , is obtained by simply summing the indices of all the objects of the local repository of P : X (3) DataIdx (O.F )T.Fj NodeIdx (P, F )T.Fj ≡ O∈Data(P )
The final result is a vector that represent how the objects of the peer are distributed with respect to the feature of the reference object Tj . Such an index could be regarded as an histogram the object distances from Tj . The creation of the peer local indices concludes the first phase of the indexing process. The next step allows the construction of Routing Indices and thus the spreading of object information along the network. In order to show how RI are built, let us consider a generic peer Pi . For each neighbor Pj ∈ Nb (Pi ), Pi keeps information on the data items which can be found by following the link {Pi , Pj } on the overlay network. For this purpose, Pi maintains an index called LinkBitIdx (Pi → Pj , F )T.Fk for each reference object Tk of each feature F of the data items in T (Pi → Pj ). Since Pi can receive information only from its neighborhood, the index is calculated recursively in the following way: X LinkBitIdx (Pi → Pj , F )T.Fk ≡ NodeIdx (Pj , F )p +( LinkBitIdx (Pj → P, F )T.Fk ) P ∈Nb(Pj )−Pi
(4) The final result is that the index contains the sum of all the vector indices NodeIdx (P, F )T.Fk associated with every peer in T (Pi → Pj ).
3.3
Query Processing
The query process starts when a peer receives a query from a user. The query could specify conditions over more than one feature. In this section we show how a query is solved for a single feature. When many features are involved, neighbors are selected by simply combining with AND and OR operations (as defined by the user) the results of every single feature matching.
We assume that queries could be originated from any node P in the system. We concentrate our attention on range queries. The first step consists in the transformation of a query range specification into index vectors. Since we consider the case of single feature, for simplicity we drop the symbol F from the following formulations. Let Q be a query, {T1 , . . . , Tm } the reference objects and R the query radius. The values d(Q, Tj ) − R and d(Q, Tj ) + R are computed, for each j = 1, . . . , m. The index vector representation, denoted as QueryIdx (Q)Tj , is then built by setting to 1 all the entries that correspond to intervals that are covered (even partially) by the requested range. All the other entries are set to 0. This index has a length of k, as the resources corresponding one. As said in the previous section, P knows the routing index LinkBitIdx (P → P 0 , A)Tj , for each P 0 ∈ Nb (P ), where LinkBitIdx (·)Tj is defined according to Eq. 4. Instead of flooding the network by forwarding Q to all its neighbors, P matches the indices of Q with the corresponding routing indices of its neighborhood. The matching process is performed in the following way: given a neighbor P 0 , for each Tj j = 1, . . . , m, the results of logical AND between all the entries of QueryIdx (Q)Tj and LinkBitIdx (P → P 0 )Tj . For each entry i of the indices, the result is 1 if both QueryIdx (Q)Tj [i] > 0 and LinkBitIdx (P → P 0 )Tj [i] > 0, and 0 otherwise. If the final result is the vector (0, . . . , 0), then the two indices do not match. It means that in the whole sub-network rooted on P 0 there are no matchings for Q. Otherwise, Q is propagated along the connection from P to P 0 because a match is likely to be present in T (P → P 0 ). In this way, a query is forwarded only toward zones where potential matching objects could be found. Since indices are an approximate representation of the objects values, a positive index matching could not mean that the corresponding resource will really match the query. The process described above is designed to discover all the matchings for a given query. A way to further decrease the number of messages flooded in the network is to performed an approximate search. In this case, the system try to find only a fraction of all the matching objects. This kind of search is the one we adopted in the experiments illustrated in the next section. The adopted strategy works as follow. Each routing index is a histogram of the objects value distribution in the subnetwork rooted on a neighbor P 0 . Thus, when this index matches a query, it is possible to know the number of the potentially matching objects present in T (P → P 0 ). The peer P that has to forward a query can then prunes its neighbors on the basis of the number of potential matchings they have in their subnetworks. The selection is made using what we call the requested coverage, i.e. the neighbors the query is forwarded to have a number of matchings that is equal or above a given percentage of all the possible matchings of P ’s neighbors. The aim of this process is to further reduce the number of query messages, while trying to collect the largest number of results. 3.4
Query Processing Example
Suppose that we have a network like the one depicted in fig. 3. In this network, each peer has a local repository formed by objects like the one in fig. 1. The
indices shown in fig. 3 are the routing indices has they are seen by the parents of each node.
Fig. 3. Example of a network
Each peer has its own local index for each reference object like the one in 4. Suppose that a user poses the query shown in 5 and that the requested coverage is setted to 50%. Fig. 6 describes the query forwarding phase. For the
Fig. 4. Peer local index
Fig. 5. Index of a range query
sake of simplicity, only matching indices with their relative number of potential matchings are shown. As can be seen, P1 and P2 have matching indices. Thus, P3 is excluded from the forwarding process. Moreover, P1 is the neighbor with the highest number of matchings. It covers 3/4 of all potential matching objects. Since this percentage is above the requested coverage, the query is forwarded only to P1 . receives the query from, since it is the neighbor with the most relevant set of potential matchings. P1 , in turn, search in its local repository and send the query to P4 . Finally, both P1 and P4 give their answers back to P .
4
Simulation Results
In this section we present the results obtained by a simulation of our system. The aim of this simulation experiments is to study the behavior of the system with
Fig. 6. Index of a range query
respect to thrre indicators: Query span, i.e. the number of nodes that receives a query and check if the they matchings in their own local repositories; Recall, i.e. the percentage of the total number of relevant results retrieved by the network; Distance score, i.e. a measure that indicates how similar are the retrieved results to the best possible answers that can be found among the network objects. Simulation results were computed as intervals with 90% confidence level and each measurement was repeated multiple times in order to get confidence intervals having width of less than 5% of the central value. The simulation model has been implemented using the C++ library described in [5]. The simulation settings are as follows: We use a network with 100 nodes. The network topology is a tree generated randomly, with at most five children per node. The object used in the network are the one described in the zurich databes. The databes has been partitioned in 100 clusters using an implementation of the k-mean algorithm. Each peer has one of this clusters has its local repository. Each object is characterized by only one feature, since this the information available for the data of the zurich database. We used 10 different reference objects for constructing the indices. As described in sec. 3.1, they are chosen as 10 random objects taken from the 10 peers with the largest local repositories. In order to measure the system performances, four different types of range queries were submitted to the network. Each type of query is characterized by a different range. We chose as query ranges the values of 1250, 1500, 1750, 2000. For each query type, we generated 50 queries with the given range. Each query starts from a random node in the network. For each query the performances were measured by varying the requested coverage of available resources from 10% to 100%. Fig. 7, 8, 9 and 10 show the results for query span and recall. Each figure shows two results. The x axis reports the requested resource coverage, while the y shows both the recall of the query (percentage of total number of relevant results
retrieved) and the query span (percentage of the system peers that examined the query).
1
1 Queried nodes Retrieved Results
Queried nodes Retrieved Results 0.9 perc. of nodes involved / perc. of retrieved resources
perc. of nodes involved / perc. of retrieved resources
0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
0
0 0.1
0.2
0.3
0.4
0.5 0.6 requested percentage
0.7
0.8
0.9
1
0.1
0.2
0.3
0.4
0.5 0.6 requested percentage
0.7
0.8
0.9
1
Fig. 7. Recall and query span for radius =Fig. 8. Recall and query span for radius = 1250 1500
1
1 Queried nodes Retrieved Results
Queried nodes Retrieved Results 0.9 perc. of nodes involved / perc. of retrieved resources
perc. of nodes involved / perc. of retrieved resources
0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
0
0 0.1
0.2
0.3
0.4
0.5 0.6 requested percentage
0.7
0.8
0.9
1
0.1
0.2
0.3
0.4
0.5 0.6 requested percentage
0.7
0.8
0.9
Fig. 9. Recall and query span for radius =Fig. 10. Recall and query span for radius 1750 = 2000
Another measure of the system performances is given in ??. The metric used in this figure gives a numeric values of the distance of the retrieved result from the optimal ones that can be found in the network, wrt the query. The measure is obtained in the following way. Let O = {O1 , . . . , On } be the set of retrieved results for a range query Q, B = {B1 , . . . , Bn } the set of the n objects that the most similar to Q and W = {W1 , . . . , Wn } the set of the n objects with the highest distance from Q.
1
Let consider the following quantity: P P | Oi ∈O d(Oi , Q) − Wi ∈W d(Wi , Q)| P P | Bi ∈B d(Bi , Q) − Wi ∈W d(Wi , Q)|
(5)
This quantity is calculated with respect to the distance of the objects from the query. It is always less or equal to 1 and gives a measure of how close the retrieved results are to the best possible results.
1 Radius = 1250 Radius = 1500 Radius = 1750 Radius = 2000
0.9 0.8 0.7
Score
0.6 0.5 0.4 0.3 0.2 0.1 0 0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
requested percentage
Fig. 11. Creation of a single local index
The results show that the indexing system is effective in limiting the diffusion of a query among non-matching nodes. In particular, we can observe that approximate searches with values of requested coverage around 90% limit considerably the number of spanned nodes, while achieving a recall closer to one and very good scores, with respect to the best possible results.
5
Conclusions and Future Work
In this paper we presented an approach for distributed similarity search on a network of autonomous peers maintaining a local collection of metric objects with mixed mode multimedia content (i.e, any combination of text, image, audio/video, etc.). This network forms a distributed Peer–to–Peer (P2P) search engine for similarity search based on the paradigm of Routing Index (MRoute – Multimedia Routing Index). These are the operative assumptions of the NeP4B project (Networked Peers for Business), where the distributed nodes (i.e., peers) represent aggregations of SME’s with similar activities and the multimedia objects are descriptions/presentations of their products/services extracted from the companies’ Web sites.
We have presented some simulation results showing the characteristics of MRoute, that lead to the conclusion that it fulfill the operation requirements of the NeP4B projects, and in general the requirements of an index to support large scale multimedia similarity based searching applications. In the context of the NeP4B project, we intend to pursue this research by integrating our Multimedia Routing Index with a Semantic Routing Index. In fact, a Semantic Routing Index is based on the semantic mappings between peers, exploiting the mapping between the single elements of the domain ontologies of each peer. A query on a NeP4B semantic peer can be propagated to the other peers through these semantic mappings. Integrating the two mechanisms (Multimedia Routing Index and Semantic Routing Index) would allow the P2P system to try to answer user queries also in these cases, quite common in the real world, where the sought complex objects are poorly described in the peer ontologies or the the semantic mapping are imprecise or incomplete.
Bibliography
[1] 2006. Gnutella protocol development. http://rfc-gnutella.sourceforge.net/. [2] M. Batko, D. Novak, F. Falchi, and P. Zezula. On scalability of the similarity search in the world of peers. In InfoScale ’06 Proceedings of the 1st international conference on Scalable information systems, page 20, New York, NY, USA, 2006. ACM Press. [3] E. Ch´ avez, G. Navarro, R. Baeza-Yates, and J. L. Marroqu´ın. Searching in metric spaces. ACM Comput. Surv., 33(3):273–321, 2001. [4] A. Crespo and H. Garcia-Molina. Routing indices for peer-to-peer systems. In Proc. of the 28 tn Conference on Distributed Computing Systems, July 2002. [5] M. Marzolla. libcppsim: a simula-like, portable process-oriented simulation library in C++. In G. Horton, editor, Proc. of ESM04, the 18th European Simulation Multiconference, Magdeburg, DE, 2004. SCS–European Publishing House. [6] M. Marzolla, M. Mordacchini, and S. Orlando. Peer-to-peer systems for discovering resources in a dynamic grid. Parallel Computing, 33(4-5):339– 358, 2007. [7] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content addressable network. In Proceedings of ACM SIGCOMM 2001, 2001. [8] A. I. T. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Middleware ’01: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg, pages 329–350, London, UK, 2001. SpringerVerlag. [9] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, F. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw., 11(1):17–32, February 2003. [10] M. Swain and D. Ballard. Color indexing. International Journal of Computer Vision (IJCV), 7(1):11–32, 1991. [11] P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach, volume 32 of Advances in Database Systems. Springer, 2006. 220 p. [12] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. Kubiatowicz. Tapestry: A resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications, 22(1):41–53, January 2004.