Efficient and scalable query routing for unstructured peer-to-peer networks

Abhishek Kumar, Jun (Jim) Xu, Ellen W. Zegura
College of Computing, Georgia Institute of Technology
{akumar, jx, ewz}@cc.gatech.edu

Abstract— Searching for content in peer-to-peer networks is an interesting and challenging problem. Queries in Gnutella-like unstructured systems that use flooding or random walk to search must visit O(n) nodes, thus consuming significant amounts of bandwidth. In this paper, we propose a query routing protocol that achieves low bandwidth consumption during query forwarding, using a low-cost mechanism to create and maintain information about nearby objects. To achieve this, our protocol maintains a lightweight probabilistic routing table at each node that suggests the location of each object in the network. Following the corresponding routing table entries, a query can reach the destination in a small number of hops with high probability. However, maintaining routing tables in a large and highly dynamic network requires non-traditional mechanisms. We invent a novel data structure called the Exponentially Decaying Bloom Filter (EDBF) that encodes such probabilistic routing tables in a highly compressed manner and allows for efficient aggregation and propagation. The search primitives provided by our system can be used to search for single keys or multiple keywords with equal ease. Analytical modeling of our design predicts significant improvements in search efficiency, verified through extensive simulations in which we observed an order-of-magnitude reduction in query path length.

I. INTRODUCTION

Searching in the world-wide web has benefited from the sophisticated mechanisms of centralized search engines like Google [1]. In contrast, the mechanisms for search in peer-to-peer (p2p) networks are severely constrained by the distributed nature of content indices and the highly dynamic membership of hosts in the network. In the context of peer-to-peer networks, searching is equivalent to routing a query to a node in the network that hosts content matching the query. Routing in traditional networks has endeavored to achieve deterministic and complete routing in the face of (relatively infrequent) network changes. On the other hand, content-based routing for peer-to-peer networks has to grapple with the large size of routable content and frequent changes in network membership. This rules out the conventional solution of building and maintaining a per-destination routing table through an exchange of routing updates. However, relaxing the requirements of completeness and determinism in routing enables some interesting solutions that we explore in this paper. The existing solutions to achieve content-based routing can be divided into two broad families. The first family of structured p2p networks consists of solutions that impose a particular structure on the overlay network (e.g., [2], [3], [4], [5], [6]).

The regularities in this structure are then exploited to efficiently maintain and query a global data structure such as a Distributed Hash Table (DHT). Earlier proposals associated a unique key with each data item using hashing, and then stored the keys at nodes responsible for the corresponding portions of the DHT. Proposals like pSearch [7] associate each piece of content with a vector in a Cartesian space. Queries, too, are vectors in this space. A search is implemented by routing the query to a node responsible for the corresponding portion of the Cartesian space. The second family of unstructured p2p networks comprises Gnutella-like networks that do not impose any structure on the overlay network [8]. The default search mechanism in Gnutella is to blindly forward queries to all neighbors within a certain number of hops. Although this mechanism handles network dynamics very well, search through blind flooding is quite inefficient. This has motivated a host of studies proposing various enhancements to search in unstructured networks. Major improvements include replacing blind flooding with a random walk [9] or an expanding-ring search, tailoring the network construction to achieve the properties of small-world graphs [10], reflecting the capacities of heterogeneous nodes in topology construction [11], and caching pointers to content located one hop away [11]. All of these proposals (except for caching) retain the “blind” nature of query forwarding in Gnutella. In other words, the forwarding of queries is independent of the query string and does not exploit the information contained in the query itself. The keywords in the query are used only for searching the local content index. We propose to build probabilistic routing tables at nodes, constructed and maintained through an exchange of updates among immediate neighbors in the overlay.
These routing tables use a novel data structure — the Exponential Decay Bloom Filter (EDBF) — to efficiently store and propagate probabilistic information about content hosted in the neighborhood of a node. The amount of information in an EDBF (and the number of bits used to store this information) decreases exponentially with distance. Such exponential decrease in information with distance restricts the impact of network dynamics to the neighborhood of any departing or newly arriving node. The Scalable Query Routing (SQR) mechanism we design uses hints obtained from these probabilistic routing tables to forward queries. The use of probabilistic hints provides a significant advantage over the completely blind nature of existing mechanisms, translating

[Figure 1: a chain of nodes with arrows of decreasing size, labeled "Strong information", "Medium information", "Low information", and "Noise", leading away from the node marked "Content Host (Origin of information)".]

Fig. 1. Exponentially decaying information in the neighborhood of a node. Arrow sizes represent the amount of information or awareness about content hosted at the node on the right extreme of the figure. Notice how the noise, depicted as small arrows, is present everywhere but is dominated by the information as we get closer to the host.

into large reductions in the average number of hops over which a query is forwarded before it is answered. We evaluate our design through both analytical modeling and simulations. Our analysis provides good insight into the complex process of probabilistic information dissemination in random graphs and its use for efficient routing. Using the analytical model, we derive expressions for the query success probability and the expected query path length. Predictions from this model closely match our observations of the system in simulation. Our simulations cover a number of proposed systems and demonstrate the performance advantages of using a probabilistic routing mechanism like SQR. SQR achieves a one to three orders of magnitude reduction in query path length over existing and proposed systems. These improvements are observed to be consistent over a range of content replication models. The rest of this paper is organized as follows. An overview of SQR is presented in the next section. This is followed by a detailed description of the design in section III. Our model of SQR and its analysis are presented in section IV, with some of the proofs deferred to the appendix. In section V, the predictions from our analysis are verified and the performance of SQR is studied through extensive simulations. We also study the benefits of coupling SQR with various topology adaptation mechanisms. A discussion of routing issues such as hysteresis, the cost of routing updates, and its interaction with network dynamics is presented in section VI. We also discuss possible extensions, such as allowing larger routing tables at high-capacity nodes, in this section. We review related work in section VII and conclude in section VIII with pointers to future work.

II. OVERVIEW

In this section we present a brief overview of our solution, deferring a detailed description to the next section.
As mentioned earlier, our system for Scalable Query Routing (SQR) propagates awareness about content hosted at a node in the neighborhood of the node, with the amount of information decreasing exponentially with the distance from the hosting node. Figure 1 shows an intuitive interpretation of this effect. The node on the extreme right of the figure is hosting a piece of content and originating information about it. As a consequence, other nodes

close to the origin have strong information about the content at the origin. The strength of this information decreases with distance until it becomes indistinguishable from noise towards the extreme left of the figure. The goal of our design is to spread awareness about content hosted at a node in its neighborhood. However, due to frequent changes in the network, it is desirable to minimize the amount of effort required to spread this awareness. Exponentially decreasing the awareness with distance restricts the impact of dynamics, since only a few nearby nodes have a lot of information, while spreading partial awareness over a larger neighborhood. The challenge here is to design a way to efficiently propagate and maintain such information. We address this challenge through the design of the Exponential Decay Bloom Filter (EDBF), a building block crucial to our system. The EDBF is a data structure that succinctly represents this exponentially decaying information and allows for efficient communication of such information. Intuitively, the EDBF allows us to vary the number of bits used to store probabilistic information. As the distance from the node hosting the object (the source of information) increases, fewer bits are used to represent information about the direction in which the object is located. At sufficiently large distances the number of bits used becomes much smaller than 1 and is dominated by random noise due to collisions in hashing, thus effectively restricting the size of the “aware” neighborhood. As in any system for routing, SQR has two major components, the routing component and the forwarding component. The routing component is concerned with the construction and maintenance of routing tables. In SQR the routing table is a set of EDBFs, one corresponding to each link. Nodes send route advertisements during the setup of (overlay) links. Incremental updates are used to maintain the consistency of advertised information.
The use of EDBF allows for efficient communication of both the initial advertisement and incremental updates. The cost of constructing and maintaining these routing tables increases with the size of the aware neighborhood. The forwarding component uses these routing tables to forward queries. While forwarding a query, the EDBF corresponding to each of the neighbors is examined. The query is forwarded to the neighbor advertising the maximum amount of information about the queried object. Within the aware neighborhood, this action propels the query towards the node hosting the queried object. Since the EDBF is a “noisy” data structure, at nodes outside the aware neighborhood the maximum information is just a random accumulation of noise, and following this random noise corresponds to a random walk. Thus, from a global view, each query follows a random walk until it hits a node within the aware neighborhood, and then quickly gravitates towards the node that is originating information about the target object. The cost of each query decreases as the size of the aware neighborhood increases. Through our design of SQR in this paper, we make a case for discarding the maintenance of the precise routing information required by previously proposed query routing mechanisms, in favor of approximate, but considerably more efficient, probabilistic routing information maintained through EDBFs. Our evaluation through analysis and simulation establishes the scalability

of SQR and paves the way for wide-area deployment of query routing with propagation of routing updates. The next section gives a complete description of the various components of our solution and their synthesis to provide scalable query routing.

III. DESIGN

In this section we describe the details of our query routing mechanism. We begin with the design of the Exponential Decay Bloom Filter (EDBF), a data structure for space-efficient representation of probabilistic information. We then describe the routing protocol that builds and maintains routing tables using EDBFs, and the use of these probabilistic routing tables for forwarding queries. Finally, we describe how the semantics of searches for content labels or keywords are implemented using SQR.

A. Design of the Exponential Decay Bloom Filter

The EDBF is closely related to the traditional Bloom filter (BF) [12] that uses a fixed number (k) of hash functions to answer set membership questions. While inserting an element x in the BF, all k hash functions h1(·), ..., hk(·) are evaluated over x and the locations h1(x), ..., hk(x) in an array of bits are set to 1. A query for the element y in a BF looks at all the bits at locations h1(y), ..., hk(y) and returns a yes if and only if all these bits are 1. There is a small probability of false positives that can be made arbitrarily small by increasing the number of hash functions and the size of the bit array. An EDBF also uses a fixed number (k) of hash functions, but returns a probabilistic answer instead of a binary yes/no answer. This probabilistic answer is a value between 0 and 1, computed as an increasing function of the number of locations hi(y), out of k total, that are set to 1. A simple choice for such a function is F(θ) = θ/k, where θ is the number of locations where the corresponding bits are set to 1.
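To make the data structure concrete, a minimal EDBF sketch in Python might look like the following. The class name, the use of salted SHA-256 for the k hash functions, and the default parameters are our illustrative choices, not part of the paper:

```python
import hashlib

class EDBF:
    """Minimal sketch of an Exponential Decay Bloom Filter (EDBF).

    Insertion sets k hashed bit positions in an m-bit array; a query
    returns F(theta) = theta / k, the fraction of the k probed
    positions that are set.
    """

    def __init__(self, m=1024, k=8):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, key):
        # k pseudo-independent positions derived from salted SHA-256.
        return [int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16) % self.m
                for i in range(self.k)]

    def insert(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def query(self, key):
        # F(theta) = theta / k: probabilistic membership estimate in [0, 1].
        theta = sum(self.bits[pos] for pos in self._positions(key))
        return theta / self.k

f = EDBF()
f.insert("some-content-label")
print(f.query("some-content-label"))  # 1.0: every probed bit is set
```

A query for a key that was never inserted returns a nonzero value only when other keys' bits collide with its probed positions, which is exactly the hashing noise analyzed below.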
Due to collisions in hashing, some of the probed locations can be set to 1 by other labels hashing to the same locations as the one being queried. This causes a false increase in the count θ of the total number of bits out of k that are 1. In section IV, we study the impact of such false positives and incorporate this behavior in an analytical model of SQR. The chief reason for our choice of a modified Bloom filter for representing the routing information is the extraordinary succinctness of representation achieved by Bloom filters. Recent work [13] on compressed Bloom filters has shown that if the objective for optimization is not storage space, but rather the communication overhead involved in transferring the bit array, then using a larger bit array that results in a smaller fraction of bits being set to 1 is preferable. We discuss some theoretical issues that motivate the design of EDBF and their implications in appendix A.

B. Routing using EDBFs

In this subsection, we present the mechanisms for creating and maintaining the routing tables using EDBFs (see figure 2). The next subsection describes how queries are forwarded using these routing tables. At the end of this section (section III-D) we describe how this architecture can be used to implement searches for unique keys or for keywords with equal ease.

Create Local EDBF (given local content X):
  // Populate local EDBF A.
  1. ∀x ∈ X
  2.   Set bits A[h1(x)], ..., A[hk(x)] to 1;

Create Update (for neighbor j):
  // Copy all the bits from the local EDBF A into
  // the update Uj.
  1. Uj ← A;
  // Decay the information received from all neighbors
  // other than j by a factor of d, and add the
  // surviving bits to Uj.
  2. ∀i ∈ neighbor_list, i ≠ j
  3.   ∀r ∈ {1, ..., m}
  4.     if (Ai[r] == 1)
  5.       with probability 1/d, Uj[r] ← 1;
  6. Return Uj;

Reliably Send Update (to neighbor j):
  // Bit array U′j is the cumulative advertisement sent to j.
  // Send an incremental update, the diff of Uj and U′j.
  1. if ∃U′j
  2.   ∀r ∈ {1, ..., m}
  3.     U[r] ← U′j[r] ⊕ Uj[r];   // ⊕ represents XOR.
  4. else
  5.   ∀r ∈ {1, ..., m}
  6.     U[r] ← Uj[r];
  7. U′j ← Uj;
  8. Compress(U);
  9. Reliably_Deliver_Update(U to neighbor j);

Recv Update (update U from neighbor j):
  1. if ∃Aj
  2.   ∀r ∈ {1, ..., m}
  3.     Aj[r] ← Aj[r] ⊕ U[r];
  4. else
  5.   ∀r ∈ {1, ..., m}
  6.     Aj[r] ← U[r];

Fig. 2. Algorithms for creating, sending and receiving updates in SQR.

As in traditional routing protocols, every node in our mechanism maintains one local routing table (stored in the form of EDBFs) and sends triggered updates to its neighbors. Nodes have local policies to construct their own routing table from the updates they receive. Similarly, nodes have local policies that determine the updates to be sent to neighbors. We specify the default local policy for creating and receiving updates, while acknowledging the scope for extending these policies to reflect notions of capacity, trust, etc. In SQR, each node advertises its own content by inserting the identifiers of all its content in an EDBF and forwarding this EDBF to all its neighbors. Advertisements received from neighbors are processed by randomly resetting a fraction of the bits to zero and forwarding the result to all other neighbors. A node maintains its routing table by keeping one EDBF for each link. These EDBFs are updated by computing the XOR of the current EDBF for a link with an advertisement received on that link, and saving the result as the routing EDBF for that link. The algorithms for creating and updating routing tables in SQR are shown in Figure 2. The first algorithm, Create Local EDBF, takes the list X of keywords (or content labels) corresponding to locally hosted content and creates an EDBF by

computing the k hash functions h1(x), ..., hk(x) for each keyword (label) x ∈ X and setting the corresponding bits in the array A to 1. Note that the size of the array of bits is m, and the range of all the hash functions is [1, ..., m]. The local EDBF is created when the node first joins the network and modified suitably whenever the set of locally hosted objects changes. Algorithm Create Update creates an update for a neighbor j by first inspecting the bits in the local EDBF A and setting all the corresponding bits in update Uj to 1. Then the updates Ai received from all neighbors other than j are observed¹, and the bit Uj[r] is set to 1 with probability 1/d if Ai[r] is 1. In other words, the union of updates received from all neighbors other than j is computed and “attenuated” by a factor of d before propagation. The local EDBF is propagated without attenuation. Algorithm Send Update first checks whether an earlier update U′j was previously sent to the neighbor j. If yes, an incremental update is sent by computing the XOR of the current update Uj with the previously sent update U′j. Otherwise the complete update Uj is sent. Before transmission, the updates are compressed using arithmetic coding for efficiency. The above design achieves our primary objective of efficiently maintaining probabilistic information about the content stored in the neighborhood. Due to the random resetting of bits during forwarding of advertisements, the impact of a node's advertisements on another node's routing table decreases exponentially with the distance between the two. An initial update is sent whenever a node acquires a new neighbor. Incremental updates for each neighbor j are created periodically but sent asynchronously, only when the current update Uj differs from the previously sent update U′j by more than δ bits, where δ is a parameter to be tuned.
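The update-creation and incremental-diff steps described above can be sketched in Python roughly as follows. The function names, the bit-list representation, and the default decay factor d = 2 are our illustrative choices:

```python
import random

def create_update(local_bits, neighbor_bits, j, d=2, rng=random.Random(0)):
    """Sketch of Create Update: start from the local EDBF (no attenuation),
    then add each bit learned from neighbors other than j with survival
    probability 1/d."""
    m = len(local_bits)
    update = list(local_bits)
    for i, bits in neighbor_bits.items():
        if i == j:
            continue  # split horizon: never echo j's own bits back to j
        for r in range(m):
            if bits[r] == 1 and rng.random() < 1.0 / d:
                update[r] = 1
    return update

def incremental_diff(new_update, last_sent):
    """Sketch of the incremental update: bitwise XOR of the new update with
    the previously sent one (send everything if nothing was sent before)."""
    if last_sent is None:
        return list(new_update)
    return [a ^ b for a, b in zip(new_update, last_sent)]

local = [1, 0, 1, 0, 0, 0]
neigh = {"n1": [0, 1, 0, 1, 0, 0], "n2": [0, 0, 0, 0, 1, 1]}
u = create_update(local, neigh, j="n2")
print(incremental_diff(u, None))
```

The receiver applies the same XOR to its stored copy, so sender and receiver converge on the same bit array while exchanging only the changed positions.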
At the other end, algorithm Recv Update, on receiving an update U from neighbor j, first checks whether an earlier update Aj was received from the same neighbor. If yes, the update is treated as an incremental one and Aj is modified by computing its bitwise XOR with the new update U. Otherwise, U is the first update from neighbor j and is simply stored as Aj, the latest update from j.

C. Forwarding in SQR

The algorithms for forwarding queries in SQR are summarized in figure 3. On receiving a query, a node first checks its cache of recently seen queries to see if the query was seen previously. If yes, the query is forwarded to a randomly chosen neighbor, provided its TTL has not expired. We discuss the justification for this later in section III-C.3. New queries are processed as follows. The recipient node first looks up its local EDBF A to determine if matching content is hosted locally. If yes, a response is created and sent to the originator of the query. Successfully answered queries can either be discarded to save network resources, or forwarded in the hope of generating further

¹ If the information learned from neighbor j is advertised back to neighbor j, it would create a manifestation of the count-to-infinity problem. Our solution of excluding the advertisements received from j implements the semantics of split horizon with poisoned reverse [14]. However, split horizon does not protect against all forms of loops. Fortunately, the exponentially decaying nature of information in SQR implies that the count-to-infinity problem manifests itself as a “decay to an infinitely small amount of information”, thus significantly restricting the impact of routing loops.

Forward Query (given query Y):
  // Forward previously seen queries to neighbor i,
  // chosen randomly from neighbor_list.
  1. if (Seen_Query(Y))
  2.   Deliver_Query(Y, i);
  3. else
  //   Forward previously unseen queries to the neighbor
  //   with the maximum information about this query.
  4.   ζ ← Lookup(Y);
  5.   Pick i such that ζi = max(ζ);
  6.   Deliver_Query(Y, i);

Lookup (given query Y):
  7. ∀i ∈ neighbor_list
  8.   ∀q ∈ {1, ..., k}
  9.     ζi ← ζi + Ai[hq(y)];
  10. Return ζ;   /* ζ = {ζi} */

Fig. 3. Algorithms for forwarding queries in SQR.

responses. This decision should be made by the flow control component of the system, which is better aware of the load on the system and the available resources in the neighborhood. Unanswered queries, or answered ones that are to be forwarded, are sent, in a probabilistic sense, towards the neighbor advertising the maximum information about the corresponding content. Forwarding a query in SQR involves looking up the EDBF Ai from each neighbor i. The result of this lookup is an “indicator” value ζi for each neighbor i. The likelihood of finding matching content via the neighbor i is roughly proportional to ζi. Looking up the advertisements received from different neighbors to obtain the corresponding indicator value ζi proceeds as follows. While looking for content labeled x, we compute the k hash functions over x and inspect the bits at the corresponding locations in the advertisement Ai; the indicator value ζi is simply the total number of these bits that are 1. Similarly, while searching for multiple keywords in content labels, for each keyword y in the list of keywords Y in the query, we compute the k hash functions and inspect the bits at the corresponding locations in the advertisement Ai; the indicator value ζi is the total number of bits in these locations that are 1. This algorithm is specified in Figure 3. Once the neighbor with the highest awareness about the query (i.e., the neighbor advertising the highest number of bits corresponding to the query) is identified, the query is forwarded to this neighbor after decrementing the TTL by 1. In an ideal situation, with the aware neighborhood extending throughout the network, and an exponentially increasing number of bits pointing towards the origin of the information about any object, queries will follow the shortest path to their destination.
However, due to limited aware neighborhood size (to save on the cost of creating and maintaining the routing tables), and false positives in the EDBF, a number of issues need to be addressed before our description of the forwarding algorithm is complete.

1) Noise in the EDBF: The first issue we need to consider is that of “noise” in the indicator values as computed from the EDBF data structure. Due to collisions in hashing, some of the bits that contribute to a total of ζi in advertisement Ai could be caused by other unrelated keywords hashing to the same locations as hq(y). Assuming good hash functions, if the proportion of bits that are set to 1 in the array is ρ, then out of the total k bits that are inspected, ρk have a value of 1 due to such random noise, on average. The distribution of the number of noise bits is binomial with parameters ρ and k. When the query is far away from the origin of information about an object, the information will have decayed to much less than one bit, and almost all bits corresponding to the query are likely to be such false positives. The neighbor with the maximum bits corresponding to the query is then likely to be a local maximum of such random noise. Thus, outside the aware neighborhood, forwarding to the neighbor with the highest indicator value is equivalent to forwarding to a randomly chosen neighbor. Even inside the aware neighborhood, where the number of bits due to the source of the information is non-negligible, the noise due to false positives might sometimes overwhelm the correct information, thus leading to a wrong forwarding decision. We develop an analytical model in section IV that captures this behavior and derive expressions for the probability of making such a mistake at various distances from the node hosting the target object.

2) Conflict resolution: Due to noise caused by false positives, sometimes more than one neighbor can advertise the maximum number of bits corresponding to a query. Such a conflict is resolved by randomly picking one of the neighbors with the maximum number of bits and forwarding the query to that neighbor.

3) Duplicate queries: We assume that queries contain unique identifiers (e.g., a string composed of the IP-port pair of the source node and a unique query id assigned by the source node) that are cached² at intermediate nodes when seen for the first time. Thus, duplicate queries can be easily identified. As discussed above, the behavior of SQR outside the aware neighborhood is equivalent to a random walk.
It is possible that nodes are visited multiple times in such a random walk. To correctly implement the behavior of a random walk, such duplicate queries should be forwarded to a randomly chosen neighbor. Another way to understand the intuition behind this behavior is to look at it from the perspective of an intermediate node that sees a duplicate query. The first time this query was seen at this node, it must have been forwarded to the neighbor with the maximum number of bits corresponding to the query. Now, since the query has somehow propagated back to this node, the neighbor with the maximum bits is quite likely just a local maximum of random noise. In such a case, the amount of information in the routing tables is too low to make any inference about the location of the queried object, and a randomly chosen neighbor is as good a candidate as any to receive this query. In our experiments, we tried more sophisticated alternative mechanisms, such as choosing the neighbor with the maximum bits among the untried neighbors, but did not observe any performance improvement over the much simpler protocol of randomly forwarding all duplicate queries.

² The cost of storing the IDs of recently seen queries is negligible, even at nodes with a high query throughput.
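The forwarding logic of section III-C (argmax over indicator values for new queries, random choice for duplicates and ties) can be sketched in Python as follows; the function names and the dictionary-of-bit-lists layout are our illustrative choices:

```python
import random

def lookup(neighbor_edbfs, positions):
    """Sketch of Lookup: for each neighbor i, the indicator value is the
    number of set bits among the query's k hash positions in that
    neighbor's advertised EDBF."""
    return {i: sum(bits[p] for p in positions)
            for i, bits in neighbor_edbfs.items()}

def forward_query(neighbor_edbfs, positions, seen_before,
                  rng=random.Random(1)):
    """Sketch of Forward Query: duplicate queries go to a random neighbor;
    new queries go to the neighbor with the maximum indicator value,
    breaking ties at random."""
    if seen_before:
        return rng.choice(list(neighbor_edbfs))
    scores = lookup(neighbor_edbfs, positions)
    best = max(scores.values())
    return rng.choice([i for i, s in scores.items() if s == best])

edbfs = {"n1": [1, 1, 0, 1], "n2": [0, 1, 0, 0]}
print(forward_query(edbfs, positions=[0, 1, 3], seen_before=False))  # prints n1
```

Outside the aware neighborhood all scores are comparable noise, so the argmax-with-random-tie-break rule degenerates gracefully towards the random walk described above.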

D. Implementing search semantics over EDBF-based routing

In this section we demonstrate how the EDBF can be used to implement the specific search semantics of various mechanisms. The examples used here are: 1) searching for unique keys associated with items using universal hashing, and 2) keyword searches.

1) Searching for Unique Keys: The most straightforward application of EDBF-based routing is in searching for unique keys. This is similar to the semantics implemented by DHT-based structured p2p networks, like Chord [2], SCAN [3], etc. The identifier for each content item is the key generated using a well-known universal hash function. A lookup in the EDBF consists of first extracting the identifier from the query string using this well-known hash function and then looking up the EDBFs for this identifier-key. Each EDBF returns a fraction between 0 and 1, corresponding to the fraction of bits for this key that were set to 1. This fraction can be directly interpreted as the probability of finding a node with content matching the query if it is forwarded along the respective link.

2) Keyword Searches: Allowing keyword searches over content labels is an important feature that is sometimes overlooked, or difficult to implement over mechanisms that use hash functions to propagate information about content. In this section, we show how SQR can be used to implement keyword searches in a straightforward manner. The semantics of keyword search can be implemented using EDBF-based routing as follows. Content labels are broken into a list of keywords using whitespace, punctuation and special characters as delimiters. This list of keywords then represents the set of identifiers used to advertise the content in SQR. Some words, such as common file extensions, words shorter than three characters, and extremely common words, should be excluded from this list.
To implement longest-prefix match over keywords, multiple prefixes of words longer than three characters should be advertised. Forwarding of queries proceeds as follows. Each query string is broken into a list of keywords using whitespace, punctuation and special characters as delimiters. This list of keywords then represents the set of identifiers used to look up the routing table. During the lookup operation, each of the EDBFs is queried once for each of the keywords (identifiers). The normalized sum of the probabilities obtained from each EDBF is then used as the final probability of finding matching content along the corresponding link in the overlay. The normalization itself can either be a plain average of the probabilities for the individual keywords in the query, or a weighted mean of the probabilities of finding the individual keywords, where the weight of a keyword is inversely proportional to its frequency of occurrence in content labels. Such weighted normalization brings keyword search close to the Vector Space Model (VSM) used for text-document searching in traditional Information Retrieval (IR). Appendix C contains a short discussion of choosing EDBF parameters for implementing keyword searches, based on a measurement study of the actual content labels and query keywords used in current p2p systems.
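As an illustration of the weighted normalization, the following Python sketch combines per-keyword probabilities with inverse-frequency weights; the function name and the toy keywords and frequencies are hypothetical:

```python
def weighted_keyword_score(per_keyword_probs, keyword_freq):
    """Sketch of weighted normalization: combine per-keyword EDBF
    probabilities into one score, weighting each keyword inversely to
    its frequency of occurrence in content labels."""
    weights = {w: 1.0 / keyword_freq[w] for w in per_keyword_probs}
    total = sum(weights.values())
    return sum(per_keyword_probs[w] * weights[w]
               for w in per_keyword_probs) / total

# The rare keyword dominates the common one, as in IR-style weighting.
probs = {"zeppelin": 1.0, "live": 0.25}
freq = {"zeppelin": 10, "live": 1000}
print(round(weighted_keyword_score(probs, freq), 3))
```

With equal frequencies this reduces to the plain average; skewed frequencies make the score track the informative, rare keywords, which is the VSM-like behavior described above.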

IV. ANALYSIS

In this section we develop an analytical model of SQR. The purpose of this analysis is twofold: to provide an analytical verification of the system design, and to evaluate the performance of the system. We begin with an information-theoretic model of the creation and propagation of EDBFs in section IV-A. We then present a Markovian model of routing in SQR, with justifications for using this particular modeling technique, in section IV-B. Finally, section IV-C presents techniques to compute the cost of forwarding using SQR. In our evaluations (section V-D), we compare the analytical predictions of the performance of SQR with simulations of the system.

A. EDBF

We model the network as a graph of n vertices, with degree δ

at each node, and make the further simplifying assumption that for small i, the number of nodes exactly i hops away from an arbitrary node is n_i = δ(δ−1)^(i−1). These assumptions are not precisely accurate unless the graph is a tree rooted at the arbitrarily chosen node, but they are a good approximation in the region where i is chosen so that Σ_{j=1}^{i} n_j ≪ n.

Theorem 1: The probability β_i that a query at a node in state i (i.e., at distance i from the origin) is forwarded to a node in state i−1 satisfies

    β_i ≥ P[X > Y]^(δ−1)    (3)

where δ ∈ [a, b], with [a, b] the range of possible node degrees, and X and Y are binomial random variables picked from the binomial distributions B(k, 1/d^(i−1) + ρ(1 − 1/d^(i−1))) and B(k, ρ) respectively.

Proof: See appendix B.

Corollary 1: The transition probability from state i to states j ≥ i is given by γ_ij = (n_j / n_tail) γ_i, where γ_i = 1 − β_i and n_tail = n_i + n_{i+1} + ... + n_5.

C. Computing the cost of forwarding using SQR

Using the model developed above, we can compute the number of steps required to route a query to its destination (the origin). Let B = (b_{i,j}) be a 6 × 6 matrix representing the transition probabilities among the various states. In this matrix, b_{i,i−1} = β_i, ∀i ≥ 1, and b_{ij} = α_{ij}, ∀i ≥ 1, j ≥ i. Since nodes are associated with states corresponding to their shortest distance from the origin, there can be no transition from state i to state i−2 and below. Thus b_{ij} = 0, ∀j ≤ i−2. Finally, queries reaching state 0 are answered correctly and stay in this state thereafter^4. Thus b_{0j} = b_{j0} = 0, ∀j ≠ 0, and b_{00} = 1. The entries of matrix B are summarized in figure 5. Let T_{i,0}, i = 1, 2, ..., 5, denote the number of steps it takes for a query that starts at a node in state i to first encounter the target (i.e., state 0). Let T denote the number of steps for a query that starts at a node chosen uniformly at random to

^4 Our model considers a query successfully answered from the point where it reaches the first node that has matching content.

    B = [  1     0     0     0     0     0
          β_1  α_11  α_12  α_13  α_14  α_15
           0    β_2  α_22  α_23  α_24  α_25
           0     0    β_3  α_33  α_34  α_35
           0     0     0    β_4  α_44  α_45
           0     0     0     0    β_5  α_55 ]

Fig. 5. The matrix B.

first encounter the destination. Since the probability of a query originating in state i is n_i/n, it is easy to verify that

    E[T] = Σ_{i=1}^{5} (n_i/n) E[T_{i,0}]    (4)
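Equation (4), together with the transition structure above, can be checked numerically. The sketch below builds B for Δ = 5 with illustrative (assumed) values of β_i, solves the standard first-passage equations for E[T_{i,0}], and weights them by n_i/n as in equation (4).

```python
import numpy as np

DELTA = 5                                 # node degree in the model
n = [DELTA * (DELTA - 1) ** (i - 1) for i in range(1, 6)]  # n_i = Δ(Δ-1)^(i-1)
beta = [0.9, 0.6, 0.3, 0.1, 0.02]         # illustrative β_i, not from the paper

# Build the 6x6 transition matrix B of figure 5.
B = np.zeros((6, 6))
B[0, 0] = 1.0                             # state 0 is absorbing
for i in range(1, 6):
    B[i, i - 1] = beta[i - 1]             # b_{i,i-1} = β_i
    n_tail = sum(n[i - 1:])               # n_tail = n_i + ... + n_5
    for j in range(i, 6):
        B[i, j] = (1 - beta[i - 1]) * n[j - 1] / n_tail  # α_{ij} = γ_i n_j / n_tail
assert np.allclose(B.sum(axis=1), 1.0)    # rows are probability distributions

# First-passage times over the transient states 1..5: t = (I - Q)^{-1} 1.
Q = B[1:, 1:]
t = np.linalg.solve(np.eye(5) - Q, np.ones(5))  # t[i-1] = E[T_{i,0}]

# Equation (4): average over the starting state of a random query.
n_total = 1 + sum(n)
ET = sum(n[i - 1] / n_total * t[i - 1] for i in range(1, 6))
print(ET)
```

The same t is obtained from the matrix expression of Theorem 2, so the linear solve doubles as a sanity check of that formula.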

Now it remains to compute E[T_{i,0}]. Let C = (c_{i,j}) be a 6 × 6 matrix whose entries are identical to those of B, except that c_{0,0} = 0 (note that b_{0,0} = 1). The values of E[T_{i,0}] are characterized by the following theorem:

Theorem 2: Let the matrix D = (d_{i,j}) be defined as D = C(I − C)^{−1}(I − C)^{−1}. Then E[T_{i,0}] = d_{i,0}.

The proof is omitted due to lack of space.

V. SIMULATIONS

In this section we present a simulation-based evaluation of SQR and compare its performance with other mechanisms for query routing in unstructured peer-to-peer systems. We begin with a description of the system model in section V-A. In the subsequent subsections, we present a simulation study of the query success rate and of the impact of replication on different search mechanisms. Finally, we compare our simulation results against the theoretical predictions of the analytical model presented in the previous section.

A. System Model

The various systems used in our simulations are summarized in table I. We broadly classify the systems as having either a flat topology or a hierarchical topology, constructed explicitly or implicitly. Within these broad classes, different mechanisms for query forwarding can be used, giving rise to a large number of combinations. The set of systems we study is by no means complete, but our choices are representative of the body of prior work in this area and cover both existing systems and new ones proposed in the research literature.

The original Gnutella protocol (version 0.4) was not designed to exploit node heterogeneity. It constructs a topology with relatively similar node degrees and assigns a uniform search behavior of scoped flooding to all nodes. Both versions of Gnutella (V0.4 and V0.6) use well-known “... permanent Gnutella hosts, called hostcaches (or gwebcaches), whose purpose is to provide a list of Gnutella hosts to any Gnutella servant connecting to them ...” [16].
This list is usually a subset of the set of hosts seen by the hostcache recently. After obtaining such a list from the hostcache, the client (or servant) attempts to connect to one or more of the hosts listed therein. In Gnutella V0.4, clients actively seek neighbors till they have at least min_neigh (= 3 in our simulations) neighbors, and accept neighbors till they have at most max_neigh (= 8 in our simulations) neighbors.
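In code, this bootstrap behavior amounts to something like the following sketch (the function names and the `connect` callback are hypothetical, not from any Gnutella implementation):

```python
MIN_NEIGH, MAX_NEIGH = 3, 8   # values used in our simulations

def bootstrap(hostcache_hosts, connect):
    """Dial hosts from the hostcache's list until at least MIN_NEIGH
    connections are up; `connect` returns True on success."""
    neighbors = []
    for host in hostcache_hosts:
        if len(neighbors) >= MIN_NEIGH:
            break
        if connect(host):
            neighbors.append(host)
    return neighbors

def can_accept(neighbors):
    """Incoming connections are accepted only below the MAX_NEIGH cap."""
    return len(neighbors) < MAX_NEIGH
```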

System    | Topology                           | Search mechanism                                  | # neighbors per node [min,max]
----------|------------------------------------|---------------------------------------------------|-------------------------------
Flood     | flat (Gnutella V0.4)               | Scoped flooding                                   | [3,8]
Random    | flat (Gnutella V0.4)               | Random walk                                       | [3,8]
OHR       | flat (Gnutella V0.4)               | Random walk with one hop replication of pointers  | [3,8]
SQR       | flat (Gnutella V0.4)               | This paper                                        | [3,8]
Ultrapeer | explicit hierarchy (Gnutella V0.6) | Scoped flooding among Ultrapeers                  | Leaf-Ultrapeer [1,3]; Ultrapeer-Ultrapeer [3,8]; Ultrapeer-Leaf [1,100]
GIA       | implicit hierarchy (GIA)           | Random walk with biased forwarding                | [3,128]
SQR+GIA   | implicit hierarchy (GIA)           | SQR with heterogeneous nodes                      | [3,128]

TABLE I
SYSTEMS USED IN OUR EVALUATION.

We use the flat topology constructed by Gnutella V0.4 to evaluate a number of systems. The first system, Flood, uses flooding to propagate queries. We also simulate the system Random, which replaces flooding with random walk. Random walks have been proposed as a more efficient replacement for scoped flooding [9], under the assumption that the Gnutella V0.4 topology resembles a random graph. Replication of pointers to content hosted one hop away is another enhancement to Gnutella V0.4. The system OHR in our simulations uses random walk with one hop replication of pointers. SQR is also simulated over flat topologies for a fair comparison with these systems.

The second class of systems we simulate is those designed with the explicitly stated goal of exploiting the heterogeneity in node capacities, in terms of processing capabilities and/or network bandwidth. GIA [11] and Gnutella with Ultrapeers [16], [17] are examples of such systems. These systems achieve this goal by trying to build a topology that reflects the capacity of a node in the degree of connectivity of the node, with higher-capacity nodes corresponding to high-degree vertices in the overlay graph. Gnutella V0.6 does this explicitly by classifying nodes as either leaf nodes (low capacity) or Ultrapeers (high capacity). Leaf nodes connect to one or more (up to 3) Ultrapeers and flood their own queries to all neighbors, but do not propagate queries received from neighbors. Ultrapeers connect to up to 100 leaves and up to 8 other Ultrapeers, and flood queries to Ultrapeer neighbors only. Ultrapeers also maintain pointers to the content located at directly connected leaves and forward queries to a leaf only when the leaf is expected to host matching content. The topology of Gnutella V0.6 can thus be seen as a two-layer hierarchy with a strongly connected layer of Ultrapeers to which a large number of leaf nodes are attached.
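The forwarding rules of this two-level hierarchy can be summarized in code. The sketch below is a simplification for illustration (the data-structure names are assumed; the real protocol also handles TTLs, duplicate suppression, and more):

```python
from dataclasses import dataclass, field

@dataclass
class Query:
    origin: object        # node that issued the query
    keywords: frozenset   # keywords being searched for

@dataclass(eq=False)      # identity-based hashing so nodes can key dicts
class Node:
    is_leaf: bool
    ultrapeers: list = field(default_factory=list)           # a leaf's Ultrapeers (up to 3)
    ultrapeer_neighbors: list = field(default_factory=list)  # an Ultrapeer's peers (up to 8)
    leaves: list = field(default_factory=list)               # an Ultrapeer's leaves (up to 100)
    leaf_index: dict = field(default_factory=dict)           # leaf -> set of advertised keywords

def forward_targets(node, query):
    """Where a Gnutella V0.6 node sends a query (simplified sketch)."""
    if node.is_leaf:
        # Leaves flood their own queries to their Ultrapeers but never relay.
        return list(node.ultrapeers) if query.origin is node else []
    # Ultrapeers flood to Ultrapeer neighbors, and hand the query to a leaf
    # only when the pointers kept for that leaf suggest matching content.
    targets = list(node.ultrapeer_neighbors)
    targets += [lf for lf in node.leaves
                if node.leaf_index.get(lf, set()) & query.keywords]
    return targets
```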
The topology adaptation algorithm of GIA also achieves a similar situation: a small number of well-connected, high-capacity nodes to which a large number of low-capacity nodes attach themselves. We summarize its features here, referring the reader to [11] for a complete description. The GIA system has a network-wide limit on the minimum and maximum number of neighbors a node can have (3 and 128, respectively). A node i, with capacity C_i and num_nbrs_i neighbors, accepts new connections till C_i/num_nbrs_i ≥ 4. Within these constraints, nodes try to maximize their satisfaction^5 level, given by:

    (1/C_i) Σ_{N ∈ neighbors(i)} C_N / num_nbrs_N
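A toy rendering of this metric (attribute names are assumed, not from the GIA code):

```python
from dataclasses import dataclass

@dataclass
class Peer:
    capacity: float
    num_nbrs: int

def satisfaction(my_capacity, neighbors):
    """GIA-style satisfaction of a node with capacity `my_capacity`:
    each neighbor N contributes its capacity C_N split across N's own
    num_nbrs_N links, normalized by this node's capacity. Nodes adapt
    the topology to drive this value up."""
    return sum(p.capacity / p.num_nbrs for p in neighbors) / my_capacity
```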

B. Query response rate

In this subsection, we discuss the impact of various query routing mechanisms on the amount of resources used to provide a certain query response rate. We use the initial resource limit to specify the maximum number of forwarding events caused by a query in the network. To obtain a common metric that captures the resource usage of both flooding- and random-walk-based approaches, we use a generalization of TTL called Resource Limit (RL). For random-walk-based systems, which do not create multiple copies of a message, RL is the same as TTL. However, if scoped flooding is being used and a query with resource limit r is received at a node with d neighbors, it is forwarded to the remaining d−1 neighbors with a resource limit of (r−1)/(d−1), appropriately^6 rounded up or down so that the resource limits of all the flooded messages add up to r−1. The sets of object labels and queries are picked at random from a uniform distribution. We only simulate queries for content actually hosted in the network, with the understanding that queries for non-available content will fail after using up the maximum allowable resources, irrespective of the search mechanism in use. While all query routing mechanisms achieve a 100% success rate with sufficiently high resource limits, only the superior query routing mechanisms can achieve successful delivery of queries with low resource limits. Varying the resource limit allows us to study the profile of resource usage by various systems at different query success rates.

^5 On receiving a join request, a node X that needs to drop some other neighbor to accommodate the incoming neighbor (Y) makes its choice using a pick_neighbor_to_drop algorithm that examines all the existing neighbors that have capacity less than the incoming neighbor and picks the one with the highest capacity (say Z). If Z has more neighbors than Y, or it is the highest-capacity neighbor among all neighbors of X, then it is dropped in favor of Y.
Otherwise Z is retained and Y is refused the connection.

^6 More precisely, the resource limit of each of the messages forwarded to neighbor i, where i ∈ {1, ..., d−1}, is set to ⌊(r−1)/(d−1)⌋ + 1 if i ≤ (r−1) mod (d−1), and to ⌊(r−1)/(d−1)⌋ otherwise.
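The rounding rule of footnote 6 can be written out directly; a small sketch:

```python
def split_resource_limit(r, d):
    """Split a query's resource limit r at a node with d neighbors:
    the remaining d-1 neighbors get roughly (r-1)/(d-1) each, rounded so
    that the pieces sum to exactly r-1 (one unit of resource is consumed
    by the current forwarding event)."""
    base, extra = divmod(r - 1, d - 1)
    # The first `extra` neighbors get one unit more than the rest.
    return [base + 1] * extra + [base] * (d - 1 - extra)
```

For example, `split_resource_limit(12, 4)` yields `[4, 4, 3]`, which sums to 11 = r − 1.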

Fig. 6. Number of queries answered vs. resource limit, with one query per node. (a) Flat topologies (SQR, OHR, Random walk, flooding). (b) Flat topologies, log scale on x-axis. (c) Hierarchical topologies (SQR+GIA, GIA, Ultrapeer), log scale on x-axis.

The query success rates vary significantly from the systems with flat topologies to those designed for heterogeneous nodes and having hierarchical topologies, either implicit or explicit. The systems with hierarchical topologies usually have a much lower resource usage. Figure 6 shows the plot of the number of queries answered vs. the initial resource limit. Figures 6(a) and 6(b) are for flat topologies, and figure 6(c) shows the results for hierarchical topologies. A log scale is used on the x-axis in figures 6(b) and 6(c) to capture the difference in the scale of the amount of resources consumed by different mechanisms. The simulated network has 2500 nodes and the topology is constructed using the mechanisms listed in table I. Among the systems with a flat topology, flooding is the most inefficient, followed by plain random walks. One hop replication of pointers improves the efficiency by an order of magnitude, and using SQR improves the efficiency by a further order of magnitude.

The topology construction and adaptation mechanisms of Gnutella V0.6 and GIA exploit the heterogeneity in node capacities to construct topologies that associate a high degree with high-capacity nodes. One hop replication of pointers further improves the advantage of such topologies over the flat Gnutella V0.4 topology. Figure 6(c) shows the query response rate of the systems that use advanced topology adaptation mechanisms to construct hierarchical topologies. Recall that SQR+GIA is SQR with GIA's topology adaptation mechanism. The simulations in [11], verified independently by our own simulations, demonstrate the superiority of GIA's topology construction^7 over that of Gnutella V0.6. The superior performance of SQR+GIA verifies our claim that query routing is somewhat orthogonal to topology construction and that the efficiencies of SQR carry over to any network irrespective of the topology.

C. Impact of Replication

Having multiple instances of the same object at different locations in the network improves the efficiency of all query routing mechanisms by cutting short the average query path. We

^7 The gap between the curves for GIA and Ultrapeer increases for larger sizes of the simulated network. However, routing tables in SQR make the simulations very memory intensive and do not scale to large sizes easily. For the sake of uniformity, we simulate a network of 2500 nodes for all systems.

Fig. 7. Resource limit for 90% query success vs. replication rate, for OHR, GIA, Ultrapeer, and SQR+GIA. Notice the log scale on both axes.

study two models of content replication, corresponding to uniform replication of all objects and a replication rate picked from the Zipf distribution [18]. Our aim in this section is to study the effects of various kinds of replication on the query routing systems. Queries originate at nodes chosen uniformly at random, and the query target distribution is the same as the content replication model. This is a simplifying assumption, but a natural one in the absence of better models. The first model of content replication we use assumes a uniform (constant) rate of replication for all content in the system. While this is quite unlikely to occur in a real system, choosing a uniform replication rate allows us to isolate the effect of specific rates of replication on different systems. Figure 7 demonstrates the effect of replication on different query routing systems. The x-axis represents the number of copies of each object, or the corresponding uniform replication rate, in a 2500 node network. The y-axis represents the resource usage needed to attain a 90% query success rate in such a network. The curves for all systems have an almost constant negative slope in this log-log plot. The relative advantages of the different systems are roughly maintained across different replication rates, with the exception of Ultrapeer, which shows a faster performance

Fig. 8. Success rate of queries vs. resource limit in a network with replication according to the Zipf distribution (SQR+GIA, GIA, Ultrapeer, OHR). Notice the contrast with figure 6.

Fig. 9. Comparison of analytical predictions with simulations of SQR: (a) normal scale; (b) log scale on x-axis.

Metric | Analysis | Simulation
-------|----------|-----------
Mean   | 232      | 221
Median | 161      | 130

TABLE II
COMPARISON OF ANALYTICAL PREDICTIONS WITH SIMULATION RESULTS.

improvement with increasing replication rate. This is an effect of the relatively small size of the simulated topology (2500 nodes, 5% of which are Ultrapeers), where the explicitly hierarchical topology of Gnutella V0.6 starts looking like a client-server system at high replication rates. These advantages are absent at network sizes above 10,000 nodes, and we observed Ultrapeer to perform worse than GIA in simulated networks of this size.

The second model we use for replication of content is based on the Zipf distribution. The Zipf distribution is reported to model the replication of objects on the web and has also been used to model the replication of content in peer-to-peer systems [18]. In a set of objects replicated according to the Zipf distribution, the ith most replicated object has 1/i times as many replicas as the most replicated object. In our experiments, we use a replication rate of 10% for the most replicated object. The success rate of queries is plotted against the initial resource limit in figure 8, with content replicated according to the Zipf distribution as described above. Notice how the lower half of the curves differs from those in figure 6. The faster initial growth in success rate can be attributed to the large number of replicas of the most replicated items. However, the top end of the curves looks the same as in figure 6. This portion corresponds to the queries that take many steps to get answered, most of them being queries that target rare or unreplicated objects with low rank in the Zipf distribution.

D. Comparing analytical predictions with simulation results

The behavior of SQR predicted by our model in the previous section is compared with the simulations of SQR reported earlier in this section in figure 9. The analytical curve corresponds to numerical computations based on theorems 1 and 2. The simulation curve is the same as in figure 6(b), described in section V-B.
Recall that the analysis in section IV models the performance of SQR on random graphs. This is an approximation of the topology constructed by Gnutella V0.4, used in our simulations. In spite of this approximation, and the other simplifying assumptions made during modeling, the two curves show a close fit.


Table II compares the predicted values of the mean and median of the query path length distribution with the actual values obtained from our simulations. The prediction errors are below 20%, which is quite reasonable in light of the various simplifying assumptions made during modeling.

VI. DISCUSSION

The simulations in the previous section demonstrate the superior performance of SQR for query routing. However, some issues, such as the routing overheads of SQR, the impact of network dynamics on these overheads, and the interworking of SQR with other components of a peer-to-peer system, merit further discussion.

A. Routing overheads

The routing overhead of SQR is the cost of building and maintaining the EDBF-based routing tables. An upper bound on the number of bits propagated across all links in the network by one object, defined as S, was derived in equation 1. Assume that the fraction of bits that are one in the EDBFs, defined as θ, is 0.1. If the average degree of nodes (Δ) is 5, the total number of hash functions (k) is 32, and the decay factor (d) is 8, the value of S from equation 1 is 320 bits. Since θ, the fraction of one bits in the EDBFs, is 0.1, for every bit that is 1, nine others have to be zero, bringing the cost up by a factor of 10 to 3200 bits. However, since updates are compressed using arithmetic compression, the actual cost is about 3200·H(θ) bits, where H is the binary entropy function. With θ = 0.1 and H(θ) = 0.469, this cost evaluates to about 1500 bits, or 187 bytes, per object. We emphasize that this is the total cost of the routing updates caused by one object over the entire network, and not the cost of a single update.

Contrast this with the cost of one hop replication of pointers to content hosted at each node. An empirical analysis of 1.3 million file names traced from query responses on a Gnutella

network shows that the average file name is 52.4 bytes long^8. Replicating these filenames at each neighbor in a network with an average degree of 5 will cost 5 × 52.4 = 262 bytes. Thus the update cost, or routing overhead, of SQR compares favorably with existing and proposed mechanisms that replicate the content index at all immediate neighbors. The mechanism proposed in [19] uses Bloom filters to propagate probabilistic information; we review it as related work in the next section. The evaluation in [19] reports a cumulative update cost of several kilobytes per document, clearly exceeding the overheads of SQR.

B. Network dynamics

The high dynamics of peer-to-peer networks, in terms of membership churn (node arrivals and departures) as well as changes in the content hosted at various nodes, is the principal motivation for minimizing routing overheads. The overheads in any routing system are amortized over the forwarding events that gain from the low-cost paths discovered through routing. If the network changes frequently, the routing cost is amortized over a smaller number of forwarding events, reducing the relative benefit of running the routing protocol. For this reason, routing protocols need to become more and more lightweight as the amount of dynamics in the network increases. Peer-to-peer networks lie close to the extremes of this routing-cost vs. forwarding-benefit tradeoff, so much so that the mechanisms of flooding and random walk do no routing at all, but settle for a high cost for every query. This is a sub-optimal design choice for all but exceptionally dynamic networks. As shown in section VI-A, the overhead of advertising each piece of hosted content is below 200 bytes in SQR. This is a small price to pay for the significant reduction in query path length, and the corresponding reduction in network traffic, achieved by SQR.
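The sub-200-byte figure can be reproduced from the numbers in section VI-A (S = 320 bits, θ = 0.1, arithmetic coding at the entropy rate):

```python
import math

def binary_entropy(p):
    """H(p) in bits per symbol."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

theta = 0.1                         # fraction of 1 bits in the EDBFs
S = 320                             # bits per object (equation 1 with Δ=5, k=32, d=8)
raw_bits = S / theta                # every 1 bit drags along nine 0 bits -> 3200
compressed_bits = raw_bits * binary_entropy(theta)  # arithmetic coding
sqr_cost_bytes = compressed_bits / 8

# One hop replication: average filename replicated at 5 neighbors.
ohr_cost_bytes = 5 * 52.4
```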
Recent studies have shown that the distribution of node and connection lifetimes in Gnutella-like networks is highly skewed, with many connections surviving for less than a minute. On the other hand, connections (overlay links) that survive the first five minutes have a much larger mean lifetime. This motivates the addition of a small amount of hysteresis to the routing protocol, so that updates are exchanged only after a small time has elapsed since the establishment of the peering relationship. Based on the existing measurement studies, we propose a hysteresis interval of 5 minutes before the first updates are exchanged. Routing in SQR will function correctly with this added hysteresis, but will be slightly inefficient, as queries during the hysteresis interval will not be able to exploit any information about the newly arriving content.

C. Exploiting Node Heterogeneity

The population of nodes in Gnutella-like networks has been observed to be quite heterogeneous, both in terms of processing and storage capacity, and in the access bandwidth nodes have to the rest of the Internet. This is exploited by systems such as Gnutella V0.6 and GIA in their topology construction,

^8 This is closely linked to the fact that searches in Gnutella are keyword searches over content filenames.

with high-capacity nodes getting a higher degree, which provides the network with a well-connected core of high-capacity nodes that routes queries more efficiently. Through our simulations of SQR+GIA we have demonstrated that SQR works quite well with such topology adaptation mechanisms. However, there is still scope for improvement. We briefly describe here an extension of SQR that allows variable-size routing tables, so that nodes with high capacity can accumulate and exchange a larger quantity of information while low-capacity nodes are protected from getting swamped. The basic idea is to allow multiple sizes for the bitmaps (bit arrays) used to store the EDBFs, such that the information can be translated consistently across bitmaps of different sizes. Using powers of 2 for the bitmap sizes is a natural option. The same set of hash functions can be used across different bitmap sizes by using hash functions with a large range but picking only the i least significant bits of the result when using a bitmap of size 2^i. A large bitmap can be translated to one half its size by splitting it into two halves and taking the bitwise OR of the first half with the second. For example, a bitmap of size 1024, with bits numbered from 0 to 1023, can be “folded” by taking the bitwise OR of the bits at locations 0 and 512, 1 and 513, and so on, and storing the result in a 512-bit array. Bitmaps can also be “expanded” to twice their size by making two copies of the same bitmap and concatenating them back to back. However, this operation creates some noise: for most bits set to 1 in the original bitmap, only one of the two copies in the expanded bitmap is a correct translation, while the other copy is noise. Instead, we recommend that a high-capacity node maintaining a large bitmap also maintain all smaller bitmaps. This is not an expensive solution, because if the largest bitmap has size 2^i, all the smaller bitmaps together add up to a size strictly smaller than 2^i.
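The folding operation and the consistent addressing across sizes are easy to express over integer bitmaps; a sketch (the helper names are ours, not the paper's):

```python
def fold(bitmap, size):
    """Halve a bitmap of `size` bits (a power of two) by OR-ing its two
    halves: bit j of the result is the OR of bits j and j + size/2, so any
    key hashed into either half still tests positive in the smaller table
    (at the price of some extra noise)."""
    half = size // 2
    mask = (1 << half) - 1
    return (bitmap & mask) | (bitmap >> half)

def cell(key_hash, size):
    """Consistent addressing across sizes: keep only the low bits of a
    large hash, so the same hash indexes tables of any power-of-two size."""
    return key_hash & (size - 1)
```

Because `cell(h, size // 2)` equals `cell(h, size)` modulo the smaller size, every key set in a large table is still found after folding.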
For all practical purposes, allowing three different bitmap sizes (say 64 Kbits, 512 Kbits and 4 Mbits), with a factor of 8 difference between consecutive sizes, should be adequate to cover several orders of magnitude of difference in node capacities. For interworking with Gnutella V0.6, with its two-level hierarchy of Ultrapeers and leaf nodes, two sizes differing by a factor of 2^6 or 2^7 should be chosen. Such a choice corresponds roughly to the two orders of magnitude difference between the node degrees of leaf nodes (up to 3) and Ultrapeers (up to 100).

D. Interaction with other system components

While we have focused on query routing in this paper, two other components (topology construction and adaptation, and flow control) have a significant impact on search performance. An implementation of SQR needs to operate alongside such components of a p2p system. The dynamics of node lifetimes and the variability in node processing capabilities and access bandwidth raise some other issues that need to be addressed in a full implementation. We discuss such issues briefly in this subsection. As observed in the simulations in the previous section, topology adaptation mechanisms such as those of GIA and Gnutella V0.6 reduce the average query path length by almost an order of magnitude. The overlay topologies constructed by these mechanisms assign a high degree to high-capacity nodes, ensuring that they are more likely to be visited during random walks.

However, this raises the possibility of congestion at high-capacity nodes, which is handled by the flow control component of the system. As demonstrated by our simulations, the design of SQR allows easy interoperation with different topology construction mechanisms. GIA's topology adaptation mechanism uses an abstract notion of capacity, capturing the processing capacity of a node and its access bandwidth to the network in a single scalar. Other approaches to topology construction might take a deeper look into the notion of capacity, investigating factors such as end-to-end delay, overlay link capacity, or the available bandwidth between candidate neighbors. Similarly, there are approaches that try to build topologies with globally optimal characteristics, such as the small-world property^9, in the constructed topology. Since we make no assumptions about the type of topology in the design of SQR, it should interoperate equally well with any such topology construction mechanism. The issue of flow control is again related but somewhat orthogonal to the operation of query routing mechanisms, and we expect SQR to work well with any mechanism for flow control. Other issues, such as local policies to create and process routing updates, are more intimately related to the performance of query routing. While we describe a simple default mechanism in our design, there are sophisticated alternatives, such as reflecting trust in the decay factor associated with each neighbor. SQR provides an efficient and scalable routing mechanism; issues related to policies that implement richer semantics merit further exploration.

VII. RELATED WORK

The use of Bloom filters to store and propagate information about distributed content in overlay networks has been proposed earlier in various contexts [20], [21], [19], [22].
The work in the Gnutella development community [20], [21] has focused on specifying and implementing a protocol for constructing and maintaining a Bloom-filter-based routing table for keyword queries. Each location in the array storing the Bloom filter contains an integer representing the distance of the content associated with the corresponding key. The current Limewire distribution already has a rudimentary implementation of the query routing protocol specified in [21]. The existing work on query routing, although pioneering in its use of Bloom filters to represent routing information, suffers from a lack of understanding of the effects of propagating such routing information. Indeed, the authors of [21], while specifying the Query Routing protocol for Gnutella, conclude by highlighting the need for a careful study before the propagation of query routing tables is switched on. We argue that the Gnutella Query Routing protocol in its current form is not amenable to the propagation of routing information, as the overheads would exceed the cost of even flooding-based search in moderately dynamic networks.

The ideas in [19] come closer to our design, although the target application there is locating nearby replicas of content. Assuming a default search algorithm that uses a deterministic location and routing mechanism, such as an idealized directory service or a DHT-based search, the authors use Bloom filters to enhance the search performance when the object being queried for is located nearby. The routing protocol in [19] creates an attenuated Bloom filter, which is simply a collection of d Bloom filters, with the ith Bloom filter at a node being associated with content published by nodes i hops away from the node. While forwarding a query, the ith filter associated with each link is examined. If a match is found in an attenuated Bloom filter, the query is forwarded on the corresponding link. Otherwise, the (i+1)th Bloom filters at each link are examined. On finding no match in any of the d Bloom filters, the default search algorithm is used. Routing updates are flooded to nodes up to d hops away from each source of content. However, as discussed in section VI-A, the cost of propagating this information is quite high. While the architecture presented in [19] addresses its target application quite well, extending it to a global query routing mechanism is not feasible.

^9 The small-world property requires that while most links are to nearby nodes, some links should be to nodes far away in the network. Such topologies have a low diameter yet minimize the total traffic in the network, as most overlay links correspond to short end-to-end paths.

GIA is a comprehensive framework for increasing the scalability of Gnutella-like systems [11]. Starting with the observation that current p2p file-sharing systems exhibit remarkable heterogeneity in node capacities, in terms of query processing capabilities and connectivity bandwidth, the authors design a system that has components for dynamic topology adaptation, active flow control, biased random walk for search, and one hop replication of pointers to content. Together these four components exploit the heterogeneity in node capacities to build an implicit hierarchy in the system, with high-capacity nodes attaining a high degree of connectivity due to topology adaptation. Higher-degree nodes have pointers to a lot of content due to one hop replication of pointers, and are visited preferentially due to the biased random walk.
The danger of congestion at such high-capacity nodes is mitigated by the active flow control mechanism. The mechanisms in GIA are designed with the objective of reflecting node heterogeneity in the topology of the network and then exploiting the higher capacity of high-degree nodes to service most of the queries. Our efforts, on the other hand, have focused on disseminating information about content as efficiently as possible, and then exploiting the large size of the region that is partially aware of the location of the content to route queries efficiently. The mechanisms for topology adaptation and flow control in GIA are orthogonal to our design and can easily be assimilated into an SQR-based system. The routing tables in SQR hold a significantly larger amount of information than one hop replication of pointers in GIA, yet cost about the same to construct and maintain. Our evaluation shows that replacing the replication of pointers and biased random walk in GIA with SQR produces a system that retains all the benefits of GIA while improving the efficiency of query routing.

VIII. CONCLUSIONS

Routing in networks is a complex problem. Were it not for the success of hierarchical routing, it would be impossible for the Internet to reach its present scale. Unfortunately, the prerequisites for hierarchical routing, such as prefix aggregation and a hierarchy in the network topology, are absent from the domain

of query routing in peer-to-peer networks. Recent work has made progress in identifying some structure in these networks and then exploiting it to simplify the problem of query routing. Recognizing heterogeneity in node capacities as a source of implicit hierarchy in p2p networks, identifying the small-world property as a way to create topologies with low stretch, and spreading information in the neighborhood of hosted content are all steps in this direction.

In this work, we have explored the last approach: spreading information about the location of hosted content in its neighborhood. We design a framework that provides Scalable Query Routing by spreading probabilistic information about the location of content through a lightweight routing protocol, and then using this information to forward queries. In the process, we designed the EDBF, a data structure that efficiently represents probabilistic information. We expect the EDBF to find application in a variety of situations where some accuracy can be traded off for the gains in space efficiency it provides. The simple design of SQR makes it possible to analytically model the propagation of information through routing and its use in forwarding. Our analysis provides a better understanding of the design and allows us to predict the performance of SQR. Simulation-based comparison with various query routing mechanisms, under a variety of topology construction mechanisms and different content replication scenarios, establishes the performance advantages of SQR. These simulations also demonstrate the ease with which SQR can interwork with other system components such as topology construction. We expect a similar ease of integration with components such as flow control. While providing a set of mechanisms for query routing, SQR imposes little restriction on the policies that can be implemented.
A case in point is the ease with which the semantics of keyword searches can be composed using SQR. The framework provided by SQR presents interesting possibilities for implementing high-level semantics of trust, reliability, etc., through routing and forwarding policies. These are interesting issues that merit further exploration.

REFERENCES

[1] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," Computer Networks, vol. 30, pp. 107–117, 1998.
[2] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," in Proc. of ACM SIGCOMM '01, 2001.
[3] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A Scalable Content-Addressable Network," in Proc. of ACM SIGCOMM, 2001.
[4] A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems," in IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), 2001.
[5] B. Y. Zhao, J. Kubiatowicz, and A. Joseph, "Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing," U.C. Berkeley Tech. Report UCB/CSD-01-1141, 2001.
[6] A. Kumar, S. Merugu, J. Xu, and X. Yu, "Ulysses: A Robust, Low-Diameter, Low-Latency Peer-to-Peer Network," in Proc. of IEEE ICNP, 2003.
[7] C. Tang, Z. Xu, and M. Mahalingam, "pSearch: Information Retrieval in Structured Overlays," in ACM HotNets-I, October 2002. [Online]. Available: citeseer.nj.nec.com/tang02psearch.html
[8] "http://gnutella.wego.com."
[9] C. Gkantsidis, M. Mihail, and A. Saberi, "Random walks in peer-to-peer networks," in Proceedings of IEEE Infocom, 2004.

[10] S. Merugu, S. Srinivasan, and E. W. Zegura, "Adding structure to unstructured peer-to-peer networks: the role of overlay topology," in Proceedings of Networked Group Communication (NGC), 2003.
[11] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker, "Making gnutella-like p2p systems scalable," in Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications. ACM Press, 2003, pp. 407–418.
[12] B. Bloom, "Space/time trade-offs in hash coding with allowable errors," CACM, vol. 13, no. 7, pp. 422–426, 1970.
[13] M. Mitzenmacher, "Compressed bloom filters," IEEE/ACM Trans. Netw., vol. 10, no. 5, pp. 604–612, 2002.
[14] C. Huitema, Routing in the Internet. Prentice Hall PTR, 1999.
[15] K. Whang, B. Vander-Zanden, and H. Taylor, "A linear-time probabilistic counting algorithm for database applications," ACM Transactions on Database Systems, 1990.
[16] T. Clingberg and R. Manfredi, "Gnutella 0.6," June 2002.
[17] A. Singla and C. Rohrs, "Ultrapeers: Another Step Towards Gnutella Scalability," version 1.0, 2002.
[18] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web caching and zipf-like distributions: Evidence and implications," in Proceedings of IEEE Infocom, Mar. 1999.
[19] S. C. Rhea and J. Kubiatowicz, "Probabilistic location and routing," in Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM '02), 2002.
[20] M. T. Prinkey, "An efficient scheme for query processing on peer-to-peer networks," Aeolus Research, Inc., http://aeolusres.homestead.com/files/index.html.
[21] C. Rohrs, "Query routing for the gnutella network," Lime Wire LLC, http://www.limewire.com/developer/query_routing/keyword_routing.htm.
[22] J. Byers, J. Considine, M. Mitzenmacher, and S. Rost, "Informed content delivery across adaptive overlay networks," in Proceedings of the ACM SIGCOMM '02 Conference, August 2002. [Online]. Available: citeseer.nj.nec.com/byers02informed.html
[23] A. Broder and M. Mitzenmacher, "Network Applications of Bloom Filters: A Survey," in Fortieth Annual Allerton Conference on Communication, Control, and Computing, 2002.
[24] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Commun. ACM, vol. 30, no. 6, pp. 520–540, 1987.

APPENDIX

A. Information-theoretic issues in the design of EDBF

Bloom filters have been used to exchange summary information about objects (e.g., web objects) cached or hosted at nodes distributed in a networked environment, consuming much less bandwidth than the alternative of exchanging full descriptions of such objects (e.g., URLs). An EDBF can similarly be used to store and exchange probabilistic information where a small loss in accuracy is acceptable in exchange for the gains in space efficiency. Previous research on Bloom filters [23] has shown that the most efficient configuration for a Bloom filter that stores n items in a bitmap of size m is to use k = (m/n) ln 2 hash functions, which fills up half the bitmap. In other words, this choice of k minimizes the probability of false positives, and with this choice the fraction of bits in the bitmap set to 1 is 1/2. However, a key insight in [13] is that if we reduce the number of hash functions used for each object (say, to k′) and start with a larger bitmap (say, of size m′), the probability of false positives can be reduced further, while retaining the same communication cost for the Bloom filter, by using arithmetic compression [24]. Updates that represent incremental changes to the filter can be constructed by taking the bitwise XOR of the latest version of the bit array with the previously transmitted version. This delta array can also be compressed with arithmetic compression, yielding significant savings in communication cost.
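As an illustration of the configuration discussed above, the following sketch (our own, not part of the paper) builds a plain Bloom filter with k ≈ (m/n) ln 2 hash functions and constructs an incremental update as the XOR of two versions of the bit array; the salted-SHA-1 hashing is an assumption made purely for the example, standing in for any family of k independent hash functions.

```python
import hashlib
import math

class BloomFilter:
    """Plain Bloom filter; k chosen as (m/n) ln 2 for an expected n items."""

    def __init__(self, m, n_expected):
        self.m = m
        self.k = max(1, round((m / n_expected) * math.log(2)))
        self.bits = [0] * m

    def _positions(self, item):
        # Salted hashes simulate k independent hash functions (an assumption
        # for this sketch; any k independent hashes would do).
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def insert(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))

    def delta(self, old_bits):
        # Incremental update: bitwise XOR of the current bitmap with a
        # previously transmitted version; the sparse result is what the
        # appendix suggests compressing with arithmetic coding.
        return [a ^ b for a, b in zip(self.bits, old_bits)]


bf = BloomFilter(m=1024, n_expected=100)   # k = round(10.24 * ln 2) = 7
snapshot = list(bf.bits)
bf.insert("ledzeppelin")
print(bf.query("ledzeppelin"))             # True
print(sum(bf.delta(snapshot)) <= bf.k)     # at most k bits changed
```

With fewer hash functions per item (k′ < k), each update touches fewer bits, so the XOR delta contains mostly zeros and compresses well — the property that [13] exploits.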

[Figure 10: Statistics of keywords in Gnutella queries and file names: (a) CDF of string lengths in characters, (b) distribution of string lengths in words (% of total), and (c) distribution of keyword lengths (% of total keywords), shown separately for queries and file names.]
The signal from the upstream neighbor is strictly the strongest with probability p1 = Pr[X > Y]^(r−1). Alternatively, the signal from the upstream neighbor can be the strongest but equal to the noise from t of the other neighbors (1 ≤ t ≤ r−1). This event has probability p2 = Pr[X = Y]^t Pr[X > Y]^(r−t−1), and in this case the query is forwarded to the upstream neighbor only with probability 1/(t+1). Assuming the number of neighbors to be distributed uniformly in the interval [a, b], the total probability of the query being forwarded to the upstream neighbor is (1/(b−a+1)) Σ_{r=a}^{b} ( p1 + Σ_{t=1}^{r−1} (1/(t+1)) p2 ). Substituting for p1 and p2 completes the proof.

C. Keywords in content labels and queries

Recall that EDBF uses k hash functions for advertising each unique label. The existence of multiple keywords in content labels, and of multiple prefixes of each keyword, allows us to reduce the number of hash functions per item. A study of actual content labels and query keywords used in current p2p systems, summarized in figure 10, indicates that almost 85% of content labels (file names) consist of 5 or more words and 70% of queries use two or more keywords. About 70% of the keywords used in searches and 65% of the keywords occurring in file names are three or more characters long.

We use traces collected at Ultrapeers in a Gnutella network for our study. To obtain samples of content labels and query strings, we extract the file names from query responses in a trace containing 1.3 million query responses, and use a trace of 17.8 million queries on the Gnutella network to obtain query samples. To split a file name or query into words, we split the corresponding string at commonly used delimiters such as whitespace, underscores, parentheses, dots, and dashes. The CDF of string lengths, in characters, is shown in figure 10(a), and the distribution of string lengths in words is shown in figure 10(b). The average file name consisted of 7.67 words and was 52.4 bytes long, while the average query was 17.7 bytes long and contained 2.5 keywords. The distribution of keyword lengths, for the sets of words occurring in file names and queries respectively, is shown in figure 10(c). Based on these observations, we recommend that for advertising keywords from file names, the four most unique keywords from each file name be chosen and all prefixes longer than three characters be inserted into the EDBF using k′ hash functions. The value of k′ should be chosen such that the total number of bits inserted into the EDBF for each file name equals k, the total number of hash functions in the EDBF.
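The keyword-selection rule above can be sketched as follows. This is our own illustration; the exact delimiter set and the interpretation of "most unique" as lowest document frequency are assumptions, since the text does not pin down either.

```python
import re

DELIMITERS = r"[\s_().\-]+"   # whitespace, underscores, parentheses, dots, dashes
MIN_PREFIX = 3                # advertise prefixes of 3 or more characters
MAX_KEYWORDS = 4              # the four "most unique" keywords per file name

def keywords(label):
    """Split a file name or query into lowercase words at common delimiters."""
    return [w.lower() for w in re.split(DELIMITERS, label) if w]

def advertised_prefixes(filename, doc_freq=None):
    """Pick up to MAX_KEYWORDS keywords (rarest first when document
    frequencies are available) and expand each into its prefixes of
    length MIN_PREFIX and longer."""
    words = [w for w in keywords(filename) if len(w) >= MIN_PREFIX]
    if doc_freq is not None:
        # "Most unique" approximated as lowest document frequency (assumption).
        words.sort(key=lambda w: doc_freq.get(w, 0))
    prefixes = []
    for w in words[:MAX_KEYWORDS]:
        prefixes.extend(w[:i] for i in range(MIN_PREFIX, len(w) + 1))
    return prefixes

def hashes_per_prefix(k, filename):
    """Choose k' so that the total bits inserted per file name is about k,
    the number of hash functions the EDBF uses per item."""
    n = len(advertised_prefixes(filename))
    return max(1, k // n) if n else k

name = "Led_Zeppelin-Stairway.mp3"
print(keywords(name))                  # ['led', 'zeppelin', 'stairway', 'mp3']
print(len(advertised_prefixes(name)))  # 14 prefixes in total
```

For this example file name, 14 prefixes are advertised, so an EDBF configured with k = 28 would use k′ = 2 hash functions per prefix.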
