A DHT-based Distributed Location Service for Internet Applications
Simone Cirani and Luca Veltri
Department of Information Engineering, University of Parma
Viale G. P. Usberti 181/A, 43100 Parma - Italy
Email:
[email protected],
[email protected]
Abstract—Distributed Hash Tables (DHTs) are structured peer-to-peer systems that provide a storage and retrieval service for key/value pairs among a number of hosts. Although DHTs are usually used to store P2P-application-specific information, they could be used in a transparent and application-independent way. In this paper, we describe how DHTs can be used to obtain a general purpose Distributed Location Service (DLS). This approach has a twofold benefit: a single service is deployed for multiple P2P applications, and a more robust DHT is obtained thanks to the increased number of collaborating nodes. The DLS provides a generic and dynamic service for mapping application resource URIs to the actual list of contact URIs, expiration dates, and other related information. Such a DLS can be exploited by any new P2P application or by legacy applications that would otherwise rely on a centralized location server. First, we discuss the role of URIs and URLs for service/resource addressing, the requirements for architecture components and protocols for DHT maintenance and DLS service access, and the possibility of reusing current RPC-based or other existing protocols. Finally, we provide implementation details and demonstrative applications to show the feasibility of the proposed solution.
I. INTRODUCTION

Distributed Hash Tables (DHTs) are structured peer-to-peer (P2P) systems that provide a storage and retrieval service for key/value pairs among a number of nodes. DHTs feature desirable properties such as scalability, self-organization, robustness, and fault-tolerance. Several DHT algorithms, such as Chord [1] and Kademlia [2], have already been defined and successfully implemented. DHTs rely on the cooperation of a number of nodes (peers) which collectively provide the information storage and retrieval service. Nodes are arranged on an overlay network, built upon an existing network, whose topology depends on the particular DHT algorithm. For instance, Chord organizes nodes on a circle, while Kademlia arranges them as leaves of a binary tree. The structure of the DHT topology affects message routing inside the DHT. Each DHT algorithm typically also defines a protocol (usually a set of RPCs) to be used for communication and cooperation among the DHT nodes. Applications interact with the DHT through two basic RPCs:
• put(key, value)
• get(key)
The key of a resource is in general the hash of the resource name, computed through some hashing function defined by the DHT algorithm.

The IETF P2PSIP Working Group has proposed using DHTs to realize a P2P SIP architecture, specifically to implement a Distributed Location Service (DLS). A DLS would allow a direct and distributed resolution of SIP contacts, thus eliminating the need for intermediate SIP nodes, such as SIP Proxy servers and SIP Registrar servers, as the registration and resolution of SIP contacts would be managed by the DHT itself. Many proposals have been made within the P2PSIP WG for a P2P signalling protocol to maintain and manage such a DHT. Some proposed protocols are binary-based, such as RELOAD [3] and XPP [4]. Another major protocol is dSIP [5], which is a text-based SIP extension. This paper focuses on how to implement a DLS to be used not only for SIP resources, such as SIP contact addresses, but for generic resources. The DLS is therefore intended for any new P2P application or legacy application that would otherwise rely on centralized location servers, such as DNS servers, to perform address resolution.

II. MOTIVATIONS FOR A DISTRIBUTED LOCATION SERVICE
Some application-layer protocols, such as SIP, SMTP, and HTTP, use URIs or URLs (RFC 3986 [6]) to identify and address their resources, the end points (peers) of their communications, or simply the recipients of their messages. A URI-based addressing mechanism may be preferred to plain IP addresses and port numbers since:
• IP addresses have a geographical distribution and are tied to the specific (and sometimes temporary) network point of attachment of hosts;
• a URI may contain more information on the addressed resource (user name, resource path, parameters, etc.).
The method used for mapping the URI to the actual next hop toward the recipient depends on the specific URI scheme, protocol type, or service logic. When no specific mechanism applies, a common approach is to use the DNS to resolve the part of the given URI that represents a fully qualified domain name (FQDN), if present. For example, an HTTP request to http://www.wonderland.net:8080/download/ is processed by an HTTP client agent (e.g. a web browser) by taking the hostport part (www.wonderland.net:8080) and resolving the hostname www.wonderland.net through a standard DNS query to the
default DNS server. Similarly, when calling a user identified by the URI sip:[email protected];user=phone, SIP UACs often use DNS to resolve the FQDN wonderland.net in order to determine the next hop SIP node through which the INVITE message has to be routed. However, such a DNS-based approach generally suffers from some problems:
i) it is server centric; by using standard DNS servers it relies on an architecture that is intrinsically distributed but not completely reliable, since an entire domain is normally managed by one or a few servers;
ii) it does not allow dynamic name resolution; the association between the FQDN and the host address (IPv4 or IPv6) is statically configured on DNS servers; although updates of such bindings are possible, their frequency is strongly limited by the caching mechanisms implemented within the DNS architecture;
iii) it applies only to the FQDN part of the URI (when present), although some protocol-dependent information can also be provided within the URI.
Note that, although some protocols such as SIP provide their own location service in order to handle issue ii, the dependence on DNS still remains, resulting in issues i and iii. A possible solution to these issues could be to partially or completely replace the DNS dependence with a completely distributed and independent P2P location service (LS), for example one based on a DHT.
In such a scenario, whenever an agent wants to send a request to a resource or agent identified by a given URI, the following steps may be followed:
1) the agent processes the URI and decides whether it already contains all the information needed to route the message (for example because it includes a valid IP address and port number, or because it contains some explicit reference that can be locally and uniquely resolved to the proper IP address and port number), or whether it requires an explicit LS lookup; in the former case the procedure ends here, while in the latter case step 2 is executed;
2) in order to obtain a fully routable address (URL), the agent performs an LS lookup by issuing a DHT get() query;
3) the returned address (URL) is used as the next hop or recipient for the resource request.
This procedure can also be extended to the case in which step 2 returns a new URI (or URL) that is not routable in the sense of step 1. In this case, steps 1, 2, and 3 are repeated until step 2 returns a fully routable URL. This simple LS procedure can be further extended in order to provide dynamic name resolution. If an agent wants to update the location information of a given resource (or of the agent itself):
4) the agent performs an LS insertion or update by issuing a put() query with the proper new URI-to-routable-URL binding.
Since each binding has a time-to-live (expiration time), it must be refreshed through successive put() queries. Step 4 can be used for inserting, refreshing, modifying, or deleting binding information.
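The lookup and update steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LocationService` class is a single-process stand-in for the DHT (no overlay, no hashing), a URI is treated as routable only when its host part is a literal IPv4 address, and the names `host_of`, `is_routable`, `resolve`, and the `alice` URIs are hypothetical.

```python
import ipaddress
import time


class LocationService:
    """Toy in-memory stand-in for the DHT-backed LS (assumption: no real overlay)."""

    def __init__(self):
        self._bindings = {}  # uri -> (target, absolute expiry time)

    def put(self, uri, target, ttl, now=None):
        """Step 4: insert, refresh, or modify a binding; ttl <= 0 deletes it."""
        now = time.time() if now is None else now
        if ttl <= 0:
            self._bindings.pop(uri, None)
        else:
            self._bindings[uri] = (target, now + ttl)

    def get(self, uri, now=None):
        """Step 2: return the bound target, or None if missing or expired."""
        now = time.time() if now is None else now
        entry = self._bindings.get(uri)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]


def host_of(uri):
    """Extract the host part of a URI (simplified: IPv4 or names, no IPv6 literals)."""
    rest = uri.split(":", 1)[1].lstrip("/")   # drop the scheme and any leading "//"
    for sep in ("/", ";", "?"):
        rest = rest.split(sep, 1)[0]          # drop path, parameters, query
    rest = rest.rsplit("@", 1)[-1]            # drop the user part
    return rest.rsplit(":", 1)[0] if ":" in rest else rest  # drop the port


def is_routable(uri):
    """Step 1 (assumed criterion): the host part is a literal IP address."""
    try:
        ipaddress.ip_address(host_of(uri))
        return True
    except ValueError:
        return False


def resolve(ls, uri, max_hops=8, now=None):
    """Repeat steps 1-3 until a routable URL is found or the lookup fails."""
    hops = 0
    while not is_routable(uri):
        if hops == max_hops:
            return None
        uri = ls.get(uri, now)
        if uri is None:
            return None
        hops += 1
    return uri


ls = LocationService()
ls.put("sip:alice@wonderland.net", "sip:alice@home.wonderland.net", ttl=3600, now=0)
ls.put("sip:alice@home.wonderland.net", "sip:alice@192.0.2.10:5060", ttl=60, now=0)
```

With these two bindings, a lookup shortly after registration follows the chain to the IP-based URL, while a lookup after the second binding has expired fails until the agent refreshes it with another `put()`.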
A. Distributed SIP Location Service
SIP is an application-level signalling protocol defined by the IETF in RFC 3261 [7] and used to establish multimedia sessions. SIP is defined as a P2P protocol, in the sense that, once a session has been established, the multimedia stream flows among the participants directly. Moreover, some SIP scenarios, such as a “SIP P2P call”, require nothing but User Agents: in this case, the caller initiates a session by knowing the callee’s location (i.e. its IP address and port number). However, since this information is usually not known in advance, in order to be useful and practical for a public service, SIP relies on some network elements, such as Registrar servers or Proxy servers, that introduce some degree of centralization and possible points of failure in the architecture. SIP has been investigated for P2P capabilities in order to propose a version of SIP that does not require central servers. One possible solution to this problem is to realize a DLS, in order to remove the need for central servers to resolve the contacts’ addresses and route requests between User Agents. Because of their nature, DHTs have proved to be well suited to this goal, as they can be used to store the user registration information (name address and location). A protocol for DHT maintenance and management in SIP is then required. A mature proposal for a SIP-based P2P protocol was dSIP [5], which is a very simple extension of the SIP protocol with a few new headers added in order to maintain and manage the DHT. dSIP messages are based on the SIP REGISTER method; this choice was due to the fact that SIP REGISTER messages are intended to be processed only by those network elements, such as SIP Proxies, that provide a centralization point and that can decide whether to process the message or not by checking if they support the P2P capability. Depending on the implementation, peers can act either as proxy servers or redirect servers.
Clients that are not aware of the P2P substrate can interact with their peer through the SIP protocol, thus allowing for backward compatibility with legacy SIP applications. In this case, the peer would analyze the request from the client (e.g. a normal SIP INVITE request for another client) and retrieve the necessary information from the DHT; then the peer would forward the request to the appropriate endpoint or send a response back to the client. This role of the peer is called “adapter”. An adapter peer therefore provides a sort of gateway between SIP and P2PSIP. The peer thus has two main communication interfaces. The first one is the DHT communication interface, which is used to communicate with other peers in the DHT. Communication inside the DHT occurs using the dSIP protocol, so dSIP messages are received and sent at this interface. The second interface is what we call the SIP adapter, which is responsible for providing the adapter functionality of the peer. The SIP adapter receives and sends regular SIP messages from and to clients. The P2PSIP WG later moved from the text-based dSIP proposal towards binary-based protocols such as RELOAD [3] and XPP [4].
TABLE I
REPRESENTATION OF THE DHT’S CONTENTS FOR A DLS.

Key: resourceURI-1, resourceURI-2, resourceURI-3, ...

Value (one or more entries per key):
contactURI-1; displayname-1; priority=P1; expires=T1
contactURI-2; displayname-2; priority=P2; expires=T2
contactURI-3; displayname-3; priority=P3; expires=T3
contactURI-4; displayname-4; priority=P4; expires=T4
contactURI-5; displayname-5; priority=P5; expires=T5
contactURI-6; displayname-6; priority=P6; expires=T6
...
B. Merging Location Services

The distributed SIP LS described above is one example of how a DHT can be used to create an LS for a given application. Other applications might exploit a DLS as well. We propose that these location services be merged into a single DHT in order to obtain one DLS that several applications can use. This approach is preferable to creating separate location services for each application since:
• only one registry would be used as a single distributed access point;
• the increased number of collaborating nodes would result in a more robust DHT.

III. GENERAL PURPOSE DLS ARCHITECTURE

The P2P DLS system should provide a storage and retrieval service for the binding between a URI, identifying the target resource, and one or more mapped contact URIs, identifying the place where or through which the resource can be accessed. Together with each contact URI, some other information should be stored, such as the expiration time, an access priority value, and, optionally, a displayable text (for example a description of the contact or a readable name). The distributed LS can be abstractly represented as in Table I. The proposed P2P distributed LS thus stores and retrieves mappings between a URI (identifying the resource) and one or more URIs in a distributed and reliable manner. RFC 2397 [8] defines a method for embedding any (short) data within a standard URI. Our LS system in conjunction with RFC 2397 may therefore also be seen as a system for storing any kind of short data in a distributed P2P manner, providing a sort of distributed database. It is important to point out that, although a real distributed database would require the DHT to store the actual data (such as files), the data stored in the DHT should be short, as they are moved often from node to node when the DHT reorganizes as members join and leave.
This is why we prefer using the DHT as a location service, that is, a registry, where the stored information is not the actual data but rather indicates how to access a particular resource. This approach is also preferable because it is up to the application that queries the DHT to decide what to do with the location information and which protocol to use to actually access the resource. Therefore, applications may treat the DHT as an external registry to consult whenever they need location information.
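As a small illustration of how short data could be carried inside a URI value, the following Python sketch encodes and decodes a payload using the "data" URL scheme of RFC 2397 (`data:[<mediatype>][;base64],<data>`). The function names are hypothetical and the sketch makes no claim about how the authors' implementation handles such values.

```python
import base64
from urllib.parse import quote, unquote_to_bytes


def to_data_url(payload: bytes,
                mediatype: str = "text/plain;charset=US-ASCII",
                use_base64: bool = True) -> str:
    """Wrap a short payload in an RFC 2397 data URL."""
    if use_base64:
        encoded = base64.b64encode(payload).decode("ascii")
        return f"data:{mediatype};base64,{encoded}"
    # Without base64, the payload is percent-encoded.
    return f"data:{mediatype},{quote(payload)}"


def from_data_url(url: str) -> bytes:
    """Recover the payload from a data URL produced by to_data_url()."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, _, data = url[len("data:"):].partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(data)
    return unquote_to_bytes(data)
```

A value built this way could be stored through the same put() operation as any other contact URI; keeping the payload short matters because stored values move between nodes as the DHT reorganizes.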
In order to implement a general purpose DLS we do not specify a particular DHT algorithm: different implementations may use different DHT algorithms (such as Kademlia, Chord, etc.). The main components of such a P2P distributed LS are:
1) a DHT algorithm;
2) a P2P protocol used for managing the DHT (inserting a new peer, updating the DHT, etc.);
3) a protocol used to perform basic LS queries such as put() and get() on the distributed LS, used by DHT peers and possibly by non-DHT peers; since DHT peers use 2 for maintaining the DHT, protocol 3 is intended for pure DHT access at the border of the P2P system.
As pointed out, for component 1 we do not make any particular assumption, since different DHT algorithms should be supported. For component 2 we propose to use SIP, as it is a candidate standard protocol for maintaining P2P systems within the P2PSIP WG [5], [9], [11]. Finally, for component 3 we do not specify a new protocol. Any implementation can use its own mechanism, according to the other systems it has to interact with. Some examples are described in the next sections. Note that protocol 2 is also a possible candidate for 3. In particular, we have already implemented an LS system in which SIP (actually with some extensions) is used for both 2 and 3 [12]. Note also that, if the application includes the peer, 3 is not needed, as the communication between the application and the peer occurs through basic API calls. In our realization both the Kademlia and Chord DHT algorithms have been implemented and used for 1.

A. DHT protocols

Several DHT algorithms have been proposed in the literature, each with its own procedures for allowing a node to join the DHT, insert a new resource, query for a resource, and leave the DHT. Generally, the DHT algorithm describes all the mechanisms required for the maintenance of the DHT.
However, in order to implement such mechanisms among several nodes distributed on an underlying network (usually an IP network or, more precisely, the Internet), a proper communication protocol is needed. Current DHT implementations often use ad-hoc communication protocols specified directly with their implementations. In other cases, a more general RPC protocol is used. The use of RPC protocols is motivated by the typical interaction between peers required by various DHT algorithms. In this section we discuss instead the use of SIP for the maintenance of the LS DHT. Although the primary function of SIP is to initiate, modify, and terminate multimedia sessions, its use has already been proposed for other communication scenarios (e.g. presence or instant messaging). Some advantages of using SIP are:
• it is the reference protocol for different P2P-oriented applications such as VoIP, IM, presence, and conferencing, which eases the integration of the proposed mechanisms with these applications;
• it is a well-tested protocol;
• it runs everywhere: over high speed fixed or mobile networks, over a light transport protocol like UDP, or over reliable and secure protocols such as TLS over TCP or SCTP;
• it has already solved (at least partially) some well-known issues such as NAT or firewall traversal;
• it already contemplates the presence of intermediate nodes;
• it is already implemented within several devices (PCs, PDAs, smartphones, hardphones, etc.);
• several implementations of SIP intermediate nodes are available;
• several SIP stack implementations are available for different OSs and programming languages;
• it is easy to implement (text-based);
• it is easy to extend while keeping backward compatibility.
We envisaged two different approaches for using SIP as the DHT communication protocol:
A) SIP is used transparently to carry opaque DHT messages. This is similar to the usage of HTTP for Web Services. DHT messages, for example XML encoded, are encapsulated within the SIP message body. Standard SIP methods such as REGISTER, MESSAGE, UPDATE, SUBSCRIBE, and NOTIFY can be used, or a new specific method (e.g. DHTQUERY) can be defined.
B) SIP is extended in order to provide support for DHT maintenance, and proper mechanisms are used to map the specified DHT operations (such as join, leave, put, get, etc.). This is the approach followed by some proposals that have arisen within the P2PSIP WG. In particular, dSIP [5] tried to completely reuse the semantics of the standard SIP registration service. The REGISTER message (as in dSIP) or new dedicated messages can be used for this purpose.
Approach A substantially uses SIP as an RPC protocol, although this is in contrast with what is stated in RFC 4485 (Informational). However, it is consistent with other SIP usages, such as UA registration and service subscription. If approach A is followed, the SIP protocol may be replaced at any time by another RPC protocol. Solution B is more compact than A and is the approach that we followed in our work.
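To make approach A more concrete, the sketch below wraps an XML-encoded DHT operation in the body of a SIP MESSAGE request. The request line and header names follow the general SIP message syntax of RFC 3261, but the XML vocabulary (`dht-request`, `key`, `value`), the function names, and the example URIs are purely hypothetical; they do not correspond to dSIP, RELOAD, XPP, or the authors' implementation, which follows approach B.

```python
import uuid
import xml.etree.ElementTree as ET


def encode_dht_put(key: str, value: str) -> bytes:
    """Encode a put(key, value) operation as XML (hypothetical vocabulary)."""
    root = ET.Element("dht-request", {"op": "put"})
    ET.SubElement(root, "key").text = key
    ET.SubElement(root, "value").text = value
    return ET.tostring(root, encoding="utf-8")


def wrap_in_sip_message(body: bytes, from_uri: str, to_uri: str, via_host: str) -> bytes:
    """Carry an opaque body inside a SIP MESSAGE request (approach A)."""
    headers = [
        f"MESSAGE {to_uri} SIP/2.0",
        # RFC 3261 branch identifiers start with the magic cookie "z9hG4bK".
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK{uuid.uuid4().hex}",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{to_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@{via_host}",
        "CSeq: 1 MESSAGE",
        "Content-Type: application/xml",
        f"Content-Length: {len(body)}",
    ]
    # Headers are separated by CRLF; an empty line precedes the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body


payload = encode_dht_put("sip:alice@wonderland.net", "sip:alice@192.0.2.10:5060")
message = wrap_in_sip_message(
    payload,
    from_uri="sip:peer1@example.org",
    to_uri="sip:peer2@example.org",
    via_host="peer1.example.org",
)
```

Because the SIP layer never inspects the body, the same payload could be carried by any other RPC-style transport, which is the replaceability property noted for approach A.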
B. Information stored into the DHT

We propose that the information stored into the DHT not be limited to the contact of the resource (or service), that is, its routable URI. We also store additional information such as:
• an optional display name, to be used as a description or a readable name;
• an expiration time, referring to the time for which the resource is to be considered fresh (this information can also be used to delete a resource from the DHT if set to 0);
• an access priority value.
This information can be included directly in a single URI, just as SIP specifies for contact information. Another approach could be to represent the resource information in XML format, which would also allow additional parameters that might be considered useful for the resource.
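One way to picture such a value is as a record that serializes to a single string in the style of a SIP Contact header, with the display name in quotes, the contact URI in angle brackets, and the remaining fields as parameters. The sketch below assumes this layout; the class name, the `priority` and `expires` parameter spellings, and the default values are illustrative choices, not the format used by the authors.

```python
import re
from dataclasses import dataclass

# "display name" <contact-uri>;param=value;param=value
_ENTRY = re.compile(r'^\s*(?:"(?P<name>[^"]*)"\s*)?<(?P<uri>[^>]+)>(?P<params>.*)$')


@dataclass
class ContactEntry:
    contact_uri: str
    display_name: str = ""
    priority: float = 1.0
    expires: int = 3600   # seconds; 0 means "delete this entry"

    def serialize(self) -> str:
        """Render the entry as one string, SIP Contact style."""
        name = f'"{self.display_name}" ' if self.display_name else ""
        return f"{name}<{self.contact_uri}>;priority={self.priority};expires={self.expires}"

    @classmethod
    def parse(cls, text: str) -> "ContactEntry":
        """Inverse of serialize(); the angle brackets protect ';' inside the URI."""
        match = _ENTRY.match(text)
        if match is None:
            raise ValueError(f"malformed entry: {text!r}")
        params = dict(
            part.strip().split("=", 1)
            for part in match.group("params").split(";")
            if part.strip()
        )
        return cls(
            contact_uri=match.group("uri"),
            display_name=match.group("name") or "",
            priority=float(params.get("priority", 1.0)),
            expires=int(params.get("expires", 3600)),
        )

    def is_deletion(self) -> bool:
        """An expiration time of 0 requests removal of the entry."""
        return self.expires == 0
```

Enclosing the URI in angle brackets keeps URI parameters such as `;user=phone` separate from the entry's own parameters, which is the same reason SIP uses that form.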
Fig. 1. DHT Peer API and DHT client API.
IV. IMPLEMENTATION DETAILS

An implementation of the DLS described in the previous section has already been realized in the Java programming language [12]. In this section we describe some of the architectural and implementation choices that have been made. For compatibility, two different DHT algorithms (1) have been implemented: Chord and Kademlia. The protocol used for maintaining and managing the DHT (2) and the resources is dSIP. For the client protocol (3) no assumption has been made. Our implementation currently supports SIP and HTTP requests from SIP UAs and HTTP clients. Compatibility with client protocols is realized by Protocol Adapter interfaces that reside within the peer. It is possible to extend the adapter capabilities of the peer by implementing new adapter interfaces that support new client protocols. The communication between the client and the peer may also occur through API calls. This is the case when the application includes both one or more clients and a peer.

A. The DHT API

In our investigation, we have identified the basic operations that are common to all DHT algorithms:
• join: this operation allows a peer to join an overlay;
• leave: this operation allows a peer to leave the overlay it is currently enrolled in;
• put: this operation stores a key/value pair in the DHT;
• get: this operation retrieves the information associated with the given key.
All these operations can be performed by peers, but only the put and get operations can be performed by clients. Hence, the DHT client API is a logical subset of the DHT peer API, since a peer can do anything a client can do, but not vice versa (see figure 1).

B. A DHT-independent implementation

Since no assumption has been made on the particular DHT algorithm (1) to be used, we have decided to design our implementation with pluggable DHT algorithms.
This means that the peer can negotiate and learn at the time of joining which DHT algorithm is in use and act accordingly. The implementation therefore exploits the DHT API in order to be as neutral as possible. The DHT logic for the specific DHT algorithm is part of the peer’s core, and must be implemented for any DHT algorithm that the peer is willing to support.

C. A protocol-independent implementation

Our implementation is not tied to a specific protocol for managing the DHT (2). This choice was made because the different approaches for a DHT protocol are still under investigation. Therefore, the DHT communication interface can be implemented using any suitable protocol. The communication API is based on two basic methods (request() and respond()) which are independent of the DHT protocol in use. Our DLS implementation currently supports the dSIP protocol, but it is easy to add support for additional protocols by simply implementing a DHT communication interface for the specific protocol. dSIP support is based upon the open source mjSIP project [10]. Since no assumption is made on the DHT protocol, we manage requests and responses internally to the peer with neutral objects, which are constructed from the actual protocol requests and responses received by the DHT communication interface.

D. Protocol adapters

As described above, communication between clients and peers may occur through any client protocol the peer is able to support. Client requests are received by a protocol adapter interface, which is part of the peer, whose task is to parse the request, construct the internal request representation, execute the request using the peer’s DHT logic, and finally respond to the client. Currently supported client protocols are SIP and HTTP, but other client protocols may be supported by simply implementing the corresponding protocol adapter (figure 2). Note that when the application includes both one or more clients and a peer, it is possible to directly use the DHT API to perform the desired operations, without passing through the intermediate protocol adapter.

Fig. 2. Client protocol adapters.

Fig. 3. Session establishment using the DLS.

Fig. 4. Peer-to-Peer SIP user registration.
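The layering described in Section IV (a client API that is a subset of the peer API, neutral internal request objects, and protocol adapters in front of the DHT logic) can be sketched as follows. This is a hedged illustration in Python rather than a rendering of the authors' Java code: all class names are hypothetical, the `LocalPeer` replaces Chord or Kademlia with a plain dictionary, and the adapter speaks a made-up one-line text protocol instead of SIP or HTTP.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


class DHTClientAPI(ABC):
    """Operations available to clients: put and get only."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...


class DHTPeerAPI(DHTClientAPI):
    """Peers can do everything a client can, plus join and leave."""

    @abstractmethod
    def join(self, overlay: str) -> None: ...

    @abstractmethod
    def leave(self) -> None: ...


class LocalPeer(DHTPeerAPI):
    """Single-process stand-in: a dict plays the role of the overlay."""

    def __init__(self) -> None:
        self.overlay: Optional[str] = None
        self._store: dict = {}

    def join(self, overlay: str) -> None:
        self.overlay = overlay

    def leave(self) -> None:
        self.overlay = None
        self._store.clear()

    def put(self, key: str, value: str) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)


@dataclass
class NeutralRequest:
    """Protocol-neutral request built by an adapter from a client message."""
    op: str
    key: str
    value: Optional[str] = None


class LineProtocolAdapter:
    """Adapter for a made-up protocol: 'PUT <key> <value>' or 'GET <key>'."""

    def __init__(self, dht: DHTClientAPI) -> None:
        self.dht = dht

    def parse(self, raw: str) -> NeutralRequest:
        parts = raw.strip().split(" ", 2)
        op = parts[0].lower()
        if op == "put" and len(parts) == 3:
            return NeutralRequest("put", parts[1], parts[2])
        if op == "get" and len(parts) == 2:
            return NeutralRequest("get", parts[1])
        raise ValueError(f"unsupported request: {raw!r}")

    def handle(self, raw: str) -> str:
        """Parse, execute through the DHT API, and build the client response."""
        try:
            request = self.parse(raw)
        except ValueError:
            return "400 Bad Request"
        if request.op == "put":
            self.dht.put(request.key, request.value)
            return "200 OK"
        value = self.dht.get(request.key)
        return f"200 OK {value}" if value is not None else "404 Not Found"
```

The adapter only depends on `DHTClientAPI`, so supporting another client protocol means writing another adapter class, while supporting another DHT algorithm means providing another `DHTPeerAPI` implementation.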
E. Session establishment

The DLS that we have realized can be used by any application to establish sessions among users (e.g. SIP UAs). Agents can register their contact into the DHT through a put() request, and can initiate sessions with other users by retrieving the contact information from the DHT and then using this information to establish the session, as shown in figure 3.

V. DEMONSTRATIVE APPLICATIONS

Together with the DLS implementation, we have realized some demonstrative applications to show usage possibilities for our DLS. These applications are only examples of how to exploit the DLS and are not exclusive, that is, they can co-exist since they can be based upon the same DLS.

A. Peer-to-Peer SIP

Pure P2P SIP calls can be performed by exploiting the DLS as a SIP LS. Legacy SIP User Agents would register themselves using a peer enrolled into the DHT as a Proxy server. The peer would receive the registration request at its SIP protocol adapter interface and store the UA’s contact into the DHT. When a UA wants to perform a SIP call, it sends an INVITE request for some user to the peer. The peer performs a lookup to resolve the target user’s address and retrieve its location, and then forwards the INVITE request to the target UA, at which point the SIP session is initiated. The user registration scenario is sketched in figure 4. Session establishment is shown in figure 5.

B. HTTP distributed virtual server

Another DLS-based application is an HTTP distributed virtual server. In such an application, a virtual web server is deployed. This means that the files are not stored on the same host but are published by a number of collaborating nodes. Data replication can be achieved by storing the same information on different hosts, and the website contents can be partitioned among several nodes.
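As a rough illustration of how such a virtual server might track replicated and partitioned content, the Python sketch below keeps, for each resource name, the list of hosts that publish it, and picks one of them at lookup time. The registry is a local stand-in for the DLS, the random choice is just one possible selection policy, and the host and file labels are placeholders.

```python
import random


class VirtualServerRegistry:
    """Local stand-in for the DLS: resource name -> hosts publishing it."""

    def __init__(self):
        self._hosts = {}

    def register(self, resource, host):
        """Record that a host publishes a resource (the put() side)."""
        hosts = self._hosts.setdefault(resource, [])
        if host not in hosts:
            hosts.append(host)

    def lookup(self, resource):
        """Return every host known to publish the resource (the get() side)."""
        return list(self._hosts.get(resource, []))


def select_host(hosts, rng=random):
    """Assumed policy: pick any replica at random; None if there is none."""
    return rng.choice(hosts) if hosts else None


registry = VirtualServerRegistry()
published = {
    "H1": ["FA", "FB", "FC"],
    "H2": ["FA", "FD", "FE"],
    "H3": ["FB", "FD", "FF"],
}
for host, files in published.items():
    for name in files:
        registry.register(name, host)

replicas = registry.lookup("FA")   # hosts holding a copy of FA
target = select_host(replicas)     # host the request would be forwarded to
```

Replication shows up as several hosts under one resource name, while partitioning shows up as different resource names mapping to different hosts; a peer acting as proxy would forward the client's request to the selected host.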
Fig. 5. Peer-to-Peer SIP call.

Resource registration can be performed with an HTTP PUT request as shown in figure 6. A resource can be accessed by an HTTP User Agent as shown in figure 7. Let us assume that there are three hosts H1, H2, and H3 that collaborate in deploying a virtual web server as follows:
• H1 publishes files FA, FB, and FC;
• H2 publishes files FA, FD, and FE;
• H3 publishes files FB, FD, and FF.
An HTTP User Agent (e.g., a web browser) might want to access the resource FA. The HTTP UA sends an HTTP GET request for the resource FA to a DHT peer it has set as proxy. The peer performs a lookup into the DHT for the targeted resource (thus avoiding a DNS lookup) and discovers that it is available at hosts H1 and H2. According to some policy it can select either of these two hosts, forward the HTTP request, and retrieve the resource. Access to resources occurs in a transparent way, as the resources would all be registered with resource names that refer to the same virtual domain. The end user would not be aware of the fact that resources are distributed among a number of hosts.

Fig. 6. HTTP resource registration.

Fig. 7. HTTP resource access.

VI. CONCLUSIONS

In this paper, we have presented a proposal for a general purpose Distributed Location Service to be used by new P2P applications or by legacy applications that would otherwise rely on a centralized location server to gather location information. The motivation for a DLS arises from the need to remove centralization points for name resolution, which occur in many applications that use URIs and URLs to refer to the endpoints of communication. The DLS architecture is based on distributed hash tables and consists of a distributed registry that applications can consult to perform address resolution. The main components of a DLS are therefore a DHT algorithm, a peer protocol for maintaining the DHT, and a client protocol to interact with the DLS. After describing the possible approaches for a DLS, we have presented a Java-based implementation, which uses a SIP-based peer protocol and supports both Chord and Kademlia DHTs. Finally, we have also presented some sample applications that show how the DLS can be exploited to create P2P applications.

ACKNOWLEDGMENTS

This work was partially supported by the Italian Ministry for University and Research (MIUR) within the project PROFILES under the PRIN 2006 research program.

REFERENCES
[1] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, F. Dabek, and H. Balakrishnan. “Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications”. IEEE/ACM Transactions on Networking, 11(1):17–32, Feb. 2003.
[2] P. Maymounkov and D. Mazières. “Kademlia: A Peer-to-Peer Information System Based on the XOR metric”. In 1st International Workshop on Peer-to-peer Systems, 2002.
[3] D. Bryan. REsource LOcation And Discovery (RELOAD). Internet-Draft draft-bryan-p2psip-reload-01, IETF, July 2007.
[4] E. Marocco and E. Ivov. Extensible Peer Protocol (XPP). Internet-Draft draft-marocco-p2psip-xpp-00, IETF, June 2007.
[5] D. Bryan. dSIP: A P2P Approach to SIP Registration and Resource Location. Internet-Draft draft-bryan-p2psip-dsip-00, IETF, February 2007.
[6] T. Berners-Lee, R. Fielding, and L. Masinter, “RFC 3986: Uniform Resource Identifier (URI): Generic Syntax,” January 2005, status: IETF Standard Track.
[7] J. Rosenberg, H. Schulzrinne, G. Camarillo, J. Peterson, A. Johnston, and E. Schooler, “RFC 3261: SIP: Session Initiation Protocol,” June 2002, status: IETF Standard Track.
[8] L. Masinter, “RFC 2397: The “data” URL scheme,” August 1998, status: IETF Standard Track.
[9] M. Zangrilli and D. Bryan. A Chord-based DHT for Resource Lookup in P2PSIP. Internet-Draft draft-zangrilli-p2psip-dsip-dhtchord-00, IETF, February 2007.
[10] L. Veltri. mjSIP project, 2007. http://www.mjsip.org
[11] S. Cirani and L. Veltri. A Kademlia-based DHT for Resource Lookup in P2PSIP. Internet-Draft draft-cirani-p2psip-dsip-dhtkademlia-00, IETF, October 2007.
[12] S. Cirani. Kademlia implementation, 2007. http://www.mjsip.org/projects/p2psip/p2psip dsip 071025.zip