Enhanced Distributed Hash Tables for Complex Queries

Pankaj Garg, Amit Kumar, Huzur Saran
[email protected], [email protected], [email protected]

Dept. of Computer Science and Engg., IIT Delhi, New Delhi – 110016

October 14, 2005

Abstract


Peer-to-peer file sharing systems have become a very popular way of sharing a large number of files over a distributed environment. One of the principal ingredients of such systems is a lookup service which maps a key denoting a file to a location storing the file. Distributed hash tables (DHTs) were recently proposed as a means of supporting such a lookup service in a completely distributed manner. They have many desirable properties, but suffer from one serious drawback: in order to locate a file, we must have precise knowledge of the key representing it. In this paper, we propose a lookup service which supports complex queries and retains all the advantages of DHTs. We also compare our proposed method with PIER [8], another recently proposed peer-to-peer system for answering complex queries. Our experiments show that our method results in better utilization of the network than PIER.


1 Introduction

Peer-to-peer file sharing systems have become a very popular way of sharing a large number of files over a distributed environment. By some estimates, as much as two-thirds of the traffic that ISPs carry today is from peer-to-peer sites [16]. Some of the reasons for the success of such systems are a completely distributed architecture without any centralized control, replication of data leading to high availability at all times, fault tolerance, and efficient searching of required data. As bandwidth gets cheaper, the throughput of such systems is bound to improve, and so it is expected that the size of such networks is going to increase rapidly.

The participating nodes in a peer-to-peer file system are responsible for storing multiple copies of the files and providing a lookup service for searching for a required file. As is standard in this community, we shall assume that each file has a unique "key" associated with it. Several nodes may store a file with a specified key and they publish the key in the system. The lookup service takes a key as input and outputs the address of one of the nodes storing the corresponding file. Clearly, the desired properties of a lookup service are fast search, and resilience to new nodes joining or current nodes leaving the network.

In all such systems, some index mapping keys to locations is stored in a distributed manner. When a query for a key appears, the system needs to locate the index which contains this key. The wide range of choices available in designing such a service is apparent from the fact that a large number of peer-to-peer architectures have been proposed, for example, Napster [3], Gnutella [2], Chord [13], and CAN [11]. Distributed hash tables (DHTs) were proposed by [13, 11, 17, 5] as a means of supporting the lookup service in a completely distributed manner. It has been shown by a large body of work [13, 11, 17, 5] that DHTs are a highly efficient and scalable way of providing this service. Some of the key features are even distribution of load to the various nodes (here load means the overhead associated with storing index tables), fast lookup, resilience to faults in the network, and a scalable architecture. Chord [13] is one of the systems which uses DHTs and has been shown to have many desirable properties.

In DHTs, a hash function (which is known to all participating nodes) maps each node to a unique point in a logical space. Let V be the set of such points to which a node can be mapped. When a node x wants to share a file with key k, it uses the hash function to map the key k to one of the locations in V. This location stores the pair (k, x). When we need to look up the key k, we use the same hash function, which directs us to a node storing a pair (k, x′) for some node x′. Here x′ is a node which stores the file with key k (x′ may not be the same as x if multiple copies of the file are present in the system). From this node, we obtain the location of a node x′ storing the file with key k.
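To make the mechanism concrete, the following sketch (a toy single-process model with names of our choosing, not the authors' implementation) publishes and looks up a key by hashing it to a point in the logical space and storing the (key, address) pair at the node responsible for that point; it also shows the exact-key limitation discussed below.

```python
import hashlib

RING_SIZE = 2**16  # size of the logical space (illustrative value)

def h(value: str) -> int:
    """Map a string (node address or key) to a point in the logical space."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % RING_SIZE

class ToyDHT:
    """Minimal single-process model of a DHT index: the node whose point is
    the first one at or after a key's point is responsible for that key."""
    def __init__(self, node_addresses):
        self.nodes = sorted((h(a), a) for a in node_addresses)
        self.index = {a: {} for a in node_addresses}   # per-node index: key -> publishers

    def responsible_node(self, point: int) -> str:
        for node_point, addr in self.nodes:            # first node at or after the point
            if node_point >= point:
                return addr
        return self.nodes[0][1]                        # wrap around the ring

    def publish(self, key: str, publisher: str):
        owner = self.responsible_node(h(key))
        self.index[owner].setdefault(key, set()).add(publisher)

    def lookup(self, key: str):
        owner = self.responsible_node(h(key))
        return self.index[owner].get(key, set())

dht = ToyDHT(["10.0.0.%d" % i for i in range(1, 6)])
dht.publish("some-song.mp3", publisher="10.0.0.2")
print(dht.lookup("some-song.mp3"))   # {'10.0.0.2'}
print(dht.lookup("some song.mp3"))   # set(): a slightly different key misses entirely
```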


However, DHTs suffer from one serious drawback. In order to search for a file, we need to know the corresponding key exactly. Indeed, the hash function needs the exact key; otherwise it maps what is effectively a different key to an entirely different location. To see why this is a drawback, we need to look at the applications where such systems are popular. Many popular peer-to-peer systems are used for sharing songs, movies, and books. Typically, the key used here is the name of the song, movie, or book. But suppose a user remembers only some of the words in a song's name. It is not possible for such a simplistic DHT based scheme to handle such queries.


The goal of this paper is to design a peer-to-peer file sharing system which has all the advantages of a DHT based system and which allows a more complex lookup service than merely searching for the exact key. Let us briefly explain what we mean by a complex query. Each key has certain attributes; e.g., a key corresponding to a song file can have singer, album, etc. as its attributes. A complex query allows for searching on any of the attributes of the keys. Thus we aim at achieving the best of both worlds: a rudimentary database allowing complex queries on the attributes of the key, and a distributed, scalable file sharing system. Although ours is not the first attempt at such a system, we believe our design achieves scalability, fast lookup and low network traffic. We demonstrate this by comparing our system to PIER [8], another system proposed to address the problem of complex queries over DHT based systems, and we show that our system leads to less overhead in terms of network traffic.

The paper is organized as follows. In section 2, we describe related work on peer-to-peer systems; in section 3, we describe our overall system architecture; in section 4, we give a brief analysis of our design; in section 5, we describe the results obtained from an actual implementation of this system; and we end with comments on future directions in section 6.

2 Previous Work

Peer-to-peer systems are a large and rapidly growing field of research. We do not intend to give a complete survey of this area, but shall refer to the key works which are directly related to ours.

Napster [3] was the first popular peer-to-peer system. It consists of a central server which maintains a complete list of filenames and the locations where they reside. When we need to look for a file (the filename is the key in Napster), we send the query string to the central server and the server replies with a list of matched results. Note that this model can easily deal with complex queries because all the information resides at a single place. But it suffers from problems of scalability and fault tolerance.

Another application which tried to get rid of this problem is Gnutella [2]. In this system, each node just maintains the keys of the files it wants to share. Whenever a query arrives, it is flooded to all the nodes in the network and the nodes containing the file reply to the query. Although this system is distributed and allows for complex queries, it leads to a huge amount of bandwidth consumption in the network [2]. To reduce the network traffic, a modification is made in the querying procedure: all queries carry a time-to-live parameter, so they can reach nodes only up to a certain depth. However, with this constraint Gnutella cannot ensure that we will always be able to locate a file even if it exists in the system.

In order to take care of these problems, systems based on distributed hash tables (DHTs) were proposed. In these systems the nodes are assigned some sub-space of a logical space. They also maintain a routing table so that they can communicate among themselves. The keys are also hashed into the same space. We shall briefly survey Chord [13] and CAN [11]. Our system uses the underlying routing scheme suggested by Chord.

Figure 1: (a) Logical Space. (b) Routing Table.

In Chord [13], the logical space considered is a ring of large size. All the nodes are placed on this (virtual) ring and are identified by the location on the ring to which they are mapped (typically, the hash function maps the IP address of the node to a location on the ring). Each key is also mapped onto the same ring using the same hash function. The node which lies immediately next to the location where a key gets mapped on the ring stores the location information for this key. Queries are also hashed into the same space using the same hash function and hence are routed to the nodes which store the data corresponding to the queried key. These nodes reply with the location where the data corresponding to the key is stored.

Suppose a node wants to search for a key k. The hash function maps k to a location x on the ring, so the node needs to know the identity of the node which lies immediately to the right of x on the ring. A brute force way of doing this is for each node to store information about all other nodes, but this would require a huge amount of data storage at each node. A more scalable solution was proposed in [13]: each node maintains a routing table (Figure 1(b)) with direct links to the nodes at exponentially increasing distances on the ring. This table ensures that a node can locate the node responsible for x in O(log n) time while maintaining O(log n) direct links with other nodes, where n is the size of the ring.
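As a rough sketch of this routing-table idea (our own simplification in the spirit of Chord's finger tables, not the authors' code), each node links to the successor of the point at distance 2^i for every i, and a lookup repeatedly jumps to the closest preceding finger:

```python
import random

M = 16                       # ring of size 2**M (illustrative)
RING = 2**M

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # interval wraps around zero

class ChordNode:
    def __init__(self, ident, all_ids):
        self.id = ident
        ids = sorted(all_ids)
        # finger[i] = first node whose id follows (self.id + 2**i) on the ring
        self.finger = [self._successor((ident + 2**i) % RING, ids) for i in range(M)]

    @staticmethod
    def _successor(point, ids):
        for i in ids:
            if i >= point:
                return i
        return ids[0]

def lookup(nodes, start_id, key_point):
    """Route from start_id towards the node responsible for key_point, counting hops."""
    current, hops = nodes[start_id], 0
    while not in_interval(key_point, current.id, current.finger[0]):
        # jump to the largest finger that still precedes the key on the ring
        nxt = current.id
        for f in reversed(current.finger):
            if in_interval(f, current.id, key_point):
                nxt = f
                break
        current, hops = nodes[nxt], hops + 1
    return current.finger[0], hops   # finger[0] is the immediate successor

ids = random.sample(range(RING), 200)
nodes = {i: ChordNode(i, ids) for i in ids}
dest, hops = lookup(nodes, ids[0], random.randrange(RING))
print(dest, hops)   # hop count stays around log2(200) or below
```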

CAN [11] is another system based on distributed hash tables. In CAN, the space considered is a D-dimensional toroid. Each node is allocated a D-dimensional volume in the space, and maintains direct links to its neighbors, which are defined as the nodes whose corresponding volumes share a (D−1)-dimensional face with the volume corresponding to this node. Messages are routed in the system such that they take the shortest path in the space considered, which minimizes the number of messages exchanged to route a message. Routing takes O(n^{1/D}) messages, where n is the number of nodes in the system.

We see that because a hash function is used to index and locate a key, we need precise knowledge of the key in order to locate the corresponding file. A partial key will not lead us in any way to the desired location. So, first of all, we consider keys which have several attributes, and our aim is to have a distributed DHT based lookup service which can resolve queries made on partial keys, i.e. queries made by specifying only a few of the attributes of a key. For example, if the key is a string, a partial key can be a sub-string of any size, and we consider the attributes of the key to be the n-grams of the string, i.e. all sub-strings of size n for some suitable value of n.

A brute force way to achieve this goal is to hash all the attributes of the keys into the system (Figure 2). This approach is followed in PIER [6], which considers a key as a string and its sub-strings as its attributes. Suppose a node wants to share a file with a given key. The node publishes the key and all the attributes of the key into the system. To resolve a query, the query string is also converted to n-grams and separate queries are made for all the n-grams. The results can be combined in several ways [15] to produce the locations of nodes containing the desired data. For example, a simple intersection of the results gives the set of nodes sharing keys with a sub-string matching the query string. We will show in section 4 that this scheme generates huge network traffic. We suggest a publishing scheme which reduces the network traffic generated by removing the need for the redundant messages sent in the PIER system by several nodes publishing the same data.
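To illustrate the n-gram idea (a sketch under naming of our own; PIER's actual interfaces differ), a key string is decomposed into its substrings of length n, and in the brute-force scheme every node sharing the file publishes one entry per gram:

```python
def ngrams(key: str, n: int = 3):
    """All substrings of length n of the key (the key's attributes)."""
    return {key[i:i + n] for i in range(len(key) - n + 1)}

def publish_brute_force(index, key, location, n=3):
    """Brute-force (PIER-style) publishing: each replica of the key sends
    one (gram -> key, location) entry per attribute."""
    messages = 0
    for gram in ngrams(key, n):
        index.setdefault(gram, set()).add((key, location))
        messages += 1
    return messages

index = {}
total = 0
for loc in ["nodeA", "nodeB", "nodeC"]:          # three replicas of the same file
    total += publish_brute_force(index, "blue_suede_shoes.mp3", loc)
print(total)   # 3 * (number of grams) messages for a single key
```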

3 System Architecture

In this section, we describe the overall architecture of our system. We need an indexing scheme based on the notion of DHTs such that we are able to publish all the attributes of a key while keeping the network traffic under control. In addition, we need a lookup service which can take a partial key, with only a few of its attributes specified, as input and return the locations where the corresponding data can be found. We use hashing to publish the data, since hashing significantly reduces the query resolution time, as described in section 2. We use an underlying DHT scheme based on Chord (Section 2). We now describe the key features of our design.

Figure 3: Query Result Size vs Average Replication Factor [10].

3.1 Indexing Scheme

We first describe the indexing scheme used by our lookup service. The design criteria for this were motivated by the following observation: most user queries are for files which have many replicas in the system. This key observation has been validated by several experiments; we describe one such result. Experiments were performed in [10] to explore the nature of searches done in Gnutella. The study concluded that the common queries made by users have a huge number of replicas available in the system. To analyze the results, Gnutella query traces were logged and 700 random queries from among them were replayed at 30 Gnutella clients working in Ultrapeer mode. Figure 3 plots the result: the Y axis is the query result set size (the number of items with the same filename in the set of results obtained by making the same query on the 30 nodes), and the X axis is the replication factor averaged across all queries for each result set size. The plot shows a strong correlation between the number of copies of a file and the number of results obtained for a query.

This observation led us to the following design choice: we would like to save on the effort it takes to publish all the attributes of a key. Otherwise this process can happen many times in the system and can lead to a deterioration of network utilization. Keeping this goal in mind, we allocate a unique node for each key (corresponding to a unique file) in the system. This unique node republishes all the attributes of the corresponding key into the system. This is done by introducing two levels of hashing, as shown in Figure 4.

In order to motivate this indexing scheme, let us consider the same scenario in a centralized system. We have a central server which keeps a look-up table maintaining the mapping between keys and the nodes storing the corresponding file, as in Napster. In order to answer queries on partial keys, either we maintain an index table and match the query key against all the table entries as in Gnutella, which is not scalable, or we maintain an inverted index table for the keys and index the attributes of the keys in it. The benefit of using an inverted index table is that we can store it in some convenient form in order to make the search procedure fast; e.g., in the case of strings, if we lexicographically sort the inverted table entries, we can perform binary search and answer the query quickly. In this centralized approach there are two models by which we can index the attributes of the key. The first is to ask the participating nodes to index the attributes of the keys which they want to share; this is similar to the approach followed in PIER [6].

Figure 2: (a) Same key shared by several nodes. (b) PIER indexing scheme.

This approach requires the nodes to send a lot of redundant messages, because the files shared by the users are replicated on a large scale, as the experiments mentioned at the beginning of this section showed. The second approach is that the central server on its own indexes the attributes of the complete keys indexed by the nodes. This reduces the network traffic, owing to the decrease in the number of messages exchanged.
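The inverted-index idea above can be sketched in a few lines (our illustration with hypothetical names, not Napster's or PIER's code): entries are kept lexicographically sorted so a partial-key attribute can be located by binary search.

```python
import bisect

class InvertedIndex:
    """Centralized inverted index: a sorted list of (attribute, key) pairs,
    queried by binary search on the attribute."""
    def __init__(self):
        self.entries = []                        # kept sorted lexicographically

    def add(self, attribute, key):
        bisect.insort(self.entries, (attribute, key))

    def query(self, attribute):
        lo = bisect.bisect_left(self.entries, (attribute, ""))
        keys = set()
        for attr, key in self.entries[lo:]:      # scan the run of equal attributes
            if attr != attribute:
                break
            keys.add(key)
        return keys

idx = InvertedIndex()
for gram in ("blu", "lue", "ue_"):               # a few 3-grams of one key
    idx.add(gram, "blue_suede_shoes.mp3")
print(idx.query("lue"))                          # {'blue_suede_shoes.mp3'}
```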

We now describe our indexing scheme in the distributed environment. We employ two levels of hashing. In the first level of hashing, a node chooses the node which will index the attributes of the keys it wants to share. This step is similar to the indexing scheme of the existing DHT models (see Section 2): the node indexes the complete key in the system using the hash function. This step also collects identical keys at one node. In the second step, this node publishes the attributes of the key into the system using the same underlying indexing system. This approach significantly reduces the network traffic, since the need for a huge number of redundant messages publishing the same data is removed.
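The two-level scheme can be sketched as follows (a simplified model with hypothetical helper names, not the authors' implementation): every replica sends one level-1 message carrying the complete key, and the single node that collects that key sends one level-2 message per attribute.

```python
def ngrams(key: str, n: int = 3):
    return {key[i:i + n] for i in range(len(key) - n + 1)}

def publish_two_level(key_index, attr_index, key, locations, n=3):
    """Two-level publishing sketch (assumed interfaces, not the authors' code)."""
    messages = 0
    for loc in locations:                        # level 1: one message per replica,
        key_index.setdefault(key, set()).add(loc)   # carrying only the complete key
        messages += 1
    for gram in ngrams(key, n):                  # level 2: one message per attribute,
        attr_index.setdefault(gram, set()).add(key) # sent once, not once per replica
        messages += 1
    return messages

key_index, attr_index = {}, {}
m = publish_two_level(key_index, attr_index, "blue_suede_shoes.mp3",
                      ["nodeA", "nodeB", "nodeC"])
print(m)   # replicas + grams messages, instead of replicas * grams in the brute-force scheme
```

With r replicas and g attributes this sends r + g messages rather than r · g, which is the saving quantified in section 4.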

Figure 4: Suggested Indexing Scheme.

3.2 Query Processing

A DHT based system enhanced with the indexing scheme suggested above can resolve queries which provide only a few of a key's attributes. Such a query can be resolved with a small change to the resolution process of the underlying lookup system. When a query is made, a separate query is made for each of the attributes provided, using the underlying lookup function. The results of these queries are collected and can be combined in several ways [15] to produce the final result. For example, in case the key is a string, the intersection of the results for the partial strings gives us the set of locations sharing a key with a sub-string matching the queried string. By making some modifications to this intersection scheme, we can also tolerate errors in the query.
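A sketch of this query path under the same assumed indexes (hypothetical names; the combination rules in [15] are richer): the query is split into the same attributes, each attribute is looked up separately, and the per-attribute results are combined, with a match threshold standing in for the error-tolerant variant.

```python
def ngrams(s: str, n: int = 3):
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def resolve(query, attr_index, key_index, n=3, min_fraction=1.0):
    """Look up every attribute of the partial query separately, then keep keys
    matching at least min_fraction of the attributes (1.0 = strict intersection)."""
    grams = ngrams(query, n)
    hits = {}                                    # key -> number of matching attributes
    for gram in grams:
        for key in attr_index.get(gram, set()):  # one DHT lookup per attribute
            hits[key] = hits.get(key, 0) + 1
    matched = [k for k, c in hits.items() if c >= min_fraction * len(grams)]
    # second round: map each matched key to the locations sharing it
    return {k: key_index.get(k, set()) for k in matched}

# Example with indexes like those built by the publishing sketch above:
attr_index = {"sue": {"blue_suede_shoes.mp3"},
              "ued": {"blue_suede_shoes.mp3"},
              "ede": {"blue_suede_shoes.mp3"}}
key_index = {"blue_suede_shoes.mp3": {"nodeA", "nodeB", "nodeC"}}
print(resolve("suede", attr_index, key_index))   # the partial query still finds the key
```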

Multicast [7] tables can be introduced to store the partial results in the system so that further queries can be made on them. This helps in supporting various complex query models, e.g. joins as discussed in the PIER model [6], and reduces the inbound traffic for the node making the query. In addition to the indexing scheme, we assign a time-to-live (TTL) field to each indexed key or key attribute. In the original DHT model, a key is removed from the index table in the update cycle following the cycle in which the node indexing the key dies. This would cause a problem here if the node which is sharing the attributes of a key on behalf of some other node dies. The TTL field ensures that the key remains indexed in the system for TTL update cycles. In the meantime, the node sharing the complete key will index the key at some other node, which will update the location information for the attributes of the key.

4 Analysis

We compare our indexing scheme with other ideas supporting search on partial keys.

Comparison with PIER. The analysis of the amount of traffic generated by PIER and by our model, along with the plot in Figure 3, shows that our model will perform significantly better than the PIER model; our model performs the same as PIER only in its worst case, i.e. when no key in the system is replicated, which is quite unlikely as can be inferred from Figure 3. Say there are k keys to be indexed in the system and key K_i is replicated R_i times. In the PIER network the number of messages required to publish the keys is given by the expression

    Σ_{i=0}^{k} #K_i · R_i

where #K_i is the number of attributes of the key K_i. The traffic generated by our model will be

    Σ_{i=0}^{k} (#K_i + R_i)

Along with this, some traffic is generated to maintain the topology of the network, depending on the underlying routing protocol used. Since both schemes are based on DHTs, we can assume this traffic to be the same for both. Hence we can expect our model to perform better.

Comparison with Gnutella. We implemented a hybrid model of our enhanced DHT system and a Gnutella client. This was required in order to make the data more realistic. A hybrid node implemented the Chord protocol enhanced with the suggested indexing scheme and was also a Gnutella ultrapeer [2]. The node could log the queries made by users on the Gnutella network, inject new queries into the system, and collect the results for the queries. A number of predefined queries were injected into the system and the results for the queries were published into the DHT system. So rather than sharing some hypothetical filenames, we injected a few predefined queries into the Gnutella network and, as the results were returned, they were published in the DHT system.

Our experiments with the hybrid model, as described in section 5.4, show that a pure DHT based peer-to-peer model generates a significantly smaller amount of traffic than the flooding based systems. The hybrid model is essentially a flooding based model in which some improvements are made by building a DHT based peer-to-peer network out of several Gnutella nodes [9]. All the hybrid nodes are also Gnutella clients, so hybrid nodes will generate more traffic than our model, which is a pure DHT based model.
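As a quick numerical illustration of the two expressions (sample values of #K_i and R_i chosen by us, not measured data):

```python
# Illustrative comparison of the two publish-traffic expressions.
# attrs[i] = #K_i, the number of attributes (n-grams) of key K_i
# reps[i]  = R_i,  the number of replicas of key K_i
attrs = [20, 35, 15, 40]
reps  = [50,  5, 120, 1]

pier_traffic = sum(a * r for a, r in zip(attrs, reps))   # sum of #K_i * R_i
our_traffic  = sum(a + r for a, r in zip(attrs, reps))   # sum of #K_i + R_i

print(pier_traffic, our_traffic)   # 3015 vs 286 for these sample values
```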

5 Experimental Results

In this section, we describe the performance results for an implementation of our peer-to-peer system. The primary performance parameter of our system is the amount of network traffic generated to maintain the topology and the index tables in the distributed environment. We measure the network traffic as the total size of the messages sent and received by the nodes in the system in a real world scenario. This is done by running the hybrid model of a DHT based system and Gnutella client on several Planet Lab [4] nodes. Planet Lab is an open platform for deploying and accessing planetary-scale services; it serves as a test bed for overlay networks. Later we report the experimental results obtained from the operational hybrid system on the Planet Lab nodes in a real world scenario.

5.1 Chord Simulator

The Chord routing scheme can be implemented in both recursive and iterative styles. In the recursive style, the origin node, which wants to send a message, transfers the work to the node in its routing table which lies closest to the destination node in the logical space. This process continues until the message is delivered to the destination node. In the iterative style, rather than transferring the message, the origin node asks a node for the entry in its routing table which lies closest to the destination node in the logical space. In this way the origin node queries a series of nodes for information from their routing tables in an iterative manner. Once the address of the destination node is found, the message is sent directly to it. Each step in both styles reduces the distance to the desired node by half. We implemented the underlying Chord protocol in the iterative style. Our indexing scheme is built on top of this protocol, i.e. the partial keys are also published into the system using the underlying indexing scheme.

5.2 Load Balancing

We would like each node in the distributed environment to contribute equally, with no overloaded node. So if there are N nodes in the system and K complete and partial keys, then each node should publish the data for K/N keys. We consider a network of 10^4 nodes. The total number of keys in the system is varied from 2·10^4 to 16·10^4 and the gram size is set to 10. Figure 5(a) plots the mean and the number of keys indexed at each node; the top and bottom 1 percent of the nodes have been removed before plotting. We can observe a large variation in the number of keys indexed per node. To show the same, Figure 5(b) plots the probability density function (PDF) of the number of keys per node when there are 10^4 nodes in the system, 10·10^4 keys, and the gram size is set to 10.

The reason for this large variation is that the nodes and the keys are not distributed uniformly in the logical space. Some nodes are therefore responsible for a larger portion of the space and have a large number of keys indexed at them, while others have a small portion of the space and so have fewer keys indexed at them.

5.3 Path Length

From our simulator's point of view, the path length is the number of iteration steps required to reach the required node. This metric indicates how good a routing protocol is. Since every iteration step reduces the distance to the required node by half, the path length has to be less than log N in an N node network. To observe this, a random set of 10·10^4 queries is made on networks of varying size and the path length required to resolve each query is measured. Figure 6(a) plots the average path length. As expected, the mean path length increases logarithmically with the number of nodes. In Figure 6(b) we plot the PDF of the path length for 10·10^4 queries on 10^4 nodes placed on a ring of size 2^14. The PDF plot shows that the average path length is (log N)/2. The reason for the factor 1/2 is as follows. The distance between two nodes can be represented as a binary number, and the number of iterations required to resolve a query is the number of 1s in this binary representation. Since the distance is random, we expect half of the bits to be 1.
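This bit-counting argument is easy to check with a small self-contained simulation (our own sketch; the ring size matches the 2^14 ring used above):

```python
import random

m = 14                      # ring of size 2**m
samples = 100_000

# hops to cover a random distance = number of 1 bits in that distance
total_ones = sum(bin(random.randrange(2**m)).count("1") for _ in range(samples))
print(total_ones / samples)   # close to m/2 = 7, i.e. roughly (log2 N)/2 hops on average
```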

5.4 Network Traffic

Next we measure the network traffic caused by our enhanced client to maintain the topology and publish the shared data. The traffic caused by a Gnutella client was also measured. Traffic is measured both in a LAN environment and in a real world scenario, on the Planet Lab test bed.


Figure 5: (a) The mean and the number of keys stored per node in a 10^4 node network. (b) The probability density function (PDF) of the number of keys per node. The total number of keys is 10^5.


Figure 6: (a) Path length as a function of network size. (b) PDF of the path length in a 10^4 node network.

To perform the experiment, the hybrid model described in section 4 was used. Hybrid nodes were deployed on 10 Planet Lab nodes located geographically apart, each node sharing the results of the queries made on the Gnutella network. The parameters were set to take a smaller number of participating nodes into account. The traffic generated was measured every minute. The same experiment was performed in the LAN environment. Figure 7 compares the traffic generated by the pure DHT based client and the Gnutella client. The plots show that a pure DHT based node generates a significantly smaller amount of network traffic than the flooding based Gnutella client.

Figure 7: Traffic caused by the Chord client enhanced with the suggested indexing scheme and by the Gnutella client. (a) In a LAN scenario. (b) In a real world scenario on Planet Lab nodes.

6 Conclusion

Distributed Hash Table based systems could previously be used only as an exact-match lookup service, because of their hashing based indexing and querying. In this paper we suggest an indexing scheme which utilizes the routing scheme provided by Distributed Hash Tables, together with a way of processing queries, so that complex queries can be handled efficiently while generating a small amount of network traffic compared to other similar distributed models.

References

[1] Mutella. "http://www.mutella.sourceforge.net/"

[2] Gnutella. "http://gnutella.wego.com/"


[3] Napster. "http://napster.com/"

[4] Planet-Lab. "http://planet-lab.org/"

[5] P. Druschel, A. Rowstron. "Past: Persistent and Anonymous Storage in Peer-to-Peer Networking Environment." HotOS, 2001.

[6] M. Harren, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker and I. Stoica. "Querying the Internet with PIER." VLDB, 2003.

[7] R. Huebsch. "Content Based Multicast: Comparison of Implementation Options." Feb 2003.

[8] R. Huebsch, B. Chun, J. M. Hellerstein, B. T. Loo, S. Shenker and I. Stoica. "The Architecture of PIER: an Internet Scale Query Processor." In the Proceedings of CIDR, 2005.

[9] B. T. Loo, R. Huebsch, I. Stoica and J. M. Hellerstein. "The Case for a Hybrid P2P Search Infrastructure." In the Proceedings of CIDR, 2005.

[10] J. Li, B. T. Loo, J. Hellerstein, F. Kaashoek, D. Karger and R. Morris. "On the Feasibility of Peer-to-Peer Web Indexing and Searching." IPTPS, 2003.

[11] S. Ratnasamy, P. Francis, M. Handley, R. Karp and S. Shenker. "A Scalable Content Addressable Network." ACM SIGCOMM, 2001.

[12] S. Ratnasamy, S. Shenker and I. Stoica. "Routing Algorithms for DHTs: Some Open Questions."

[13] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek and H. Balakrishnan. "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications." ACM SIGCOMM, 2001.

[14] P. Valduriez, Y. Viemont. "A Multikey Hashing Scheme using Predicate Keys." ACM, 1984.

[15] I. H. Witten, A. Moffat, T. C. Bell. "Managing Gigabytes: Compressing and Indexing Documents and Images, Second ed." Morgan Kaufmann, 1999.

[16] "Network World ISP News Report Newsletter." 31st March, 2004.

[17] B. Zhao, J. Kubiatowicz, A. Joseph. "Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing." Tech. Rep. UCB/CSD-01-1141, University of California, Berkeley, Computer Science Department, 2001.
