Searching in variably connected P2P networks

Njål T. Borch

Lars Kristian Vognild

Norut Informasjonsteknologi AS Tromsø, Norway Email: [email protected]

Norut Informasjonsteknologi AS Tromsø, Norway Email: [email protected]

Abstract: Peer-to-Peer networks are gaining popularity through file-sharing communities. Most P2P networks demand a certain stability from their nodes in order to function satisfactorily. A variably connected P2P network, however, is a network where the connectivity of nodes might vary greatly over time. The nodes can be in different connection states, such as connected to the Internet, moving between networks, operating in ad-hoc mode, or offline. Creating a search infrastructure for variably connected P2P networks is challenging, as it is very hard to keep shared state (such as distributed indexes) up to date within the network. Both scalability and stability must be addressed: scalability for large-scale overlay networks, and stability in order to handle the large fluctuations in the network. Routing queries in Peer-to-Peer networks based on semantic information looks very promising for solving scalability issues. This paper discusses how a Semantic Query Routing protocol can be extended to solve stability issues and thus better support both mobility and ad-hoc operation. We will implement the new protocol in the Seers experimental P2P search infrastructure in order to verify it.¹

Keywords: Variable connectivity, ad-hoc, mobility, search.

¹ This work was funded by the Norwegian Research Council under contract no 145249/223.

INTRODUCTION

The usage of computers is rapidly changing, moving away from stationary computers to smaller, mobile devices. These small devices are typically equipped with networking capabilities, enabling them to communicate with neighboring devices or with the Internet. The usage patterns of the small devices differ from those of the static PC, in that they often both produce data (such as photographs or sound recordings) and consume data (such as playing music or displaying a bus timetable). These devices are also personal to a larger degree, making it possible to exploit feedback from a user to provide a better service[1].

The Server/Client paradigm is not well suited when clients create content, nor when nodes are disconnected or in ad-hoc networks. While resource-providing clients could be supported, doing so adds considerable overhead in server management. The overhead can take the form of moderation systems, server costs, bandwidth costs, et cetera.

The Peer-to-Peer paradigm allows members of the P2P network to both supply and consume resources. P2P makes it possible to create communities of individual nodes with little or no extra cost for the individual user. The P2P paradigm can also be applied to a large variety of network infrastructures, such as ad-hoc networks or an overlay network on top of the Internet.

When devices are highly mobile, they are likely to have very variable connectivity. For example, a device can have a wired cradle at home, a stable WLAN service at work, and only ad-hoc WLAN capabilities when on a bus. An important observation is that while on the bus, the device will be regarded as offline by anyone who is not part of that specific ad-hoc network.

Designing a search protocol for such unstable network infrastructures is a great challenge, as most of the popular P2P search protocols demand a certain degree of stability in order to work properly. In this paper an extendible search protocol for unstable network infrastructures will be presented.

SEARCHING IN P2P NETWORKS

There are currently many different approaches to searching in Peer-to-Peer networks. Early “pure” P2P networks, such as Gnutella[2], would typically flood the network with queries. While flooding provides very robust search networks, it does not scale well. Searching can take a very long time, as a very large number of nodes may have to be queried before a hit is found. Gnutella has improved this by introducing ultra-peers[3],[4]. These can act on behalf of a set of nodes, in effect indexing sub-trees of the network. The available bandwidth of nodes also has a certain influence on query routing.

In order to speed up searching, hybrid search infrastructures were created. They include a set of indexing nodes that handle the searching. File transfers are then done using P2P techniques. Examples of such hybrid networks are Napster[5] and Kazaa[6]. While these hybrids work rather well, they introduce servers. Servers add a dependency on the Internet, they add running costs, and they possibly also introduce legal risks.
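To make the scaling problem of flooding concrete, the following minimal sketch counts how many nodes receive a flooded query as the hop limit grows. The hop limit (`ttl`), the function names and the binary-tree overlay are assumptions chosen for illustration; they are not taken from any of the systems cited above.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Return the set of nodes that receive a query flooded from origin.

    graph maps each node to a list of its neighbors; ttl is an assumed
    hop limit after which the query is no longer forwarded.
    """
    seen = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return seen - {origin}


def binary_tree(depth):
    """Hypothetical overlay: node i is linked to nodes 2i+1 and 2i+2."""
    size = 2 ** (depth + 1) - 1
    graph = {i: [] for i in range(size)}
    for i in range(size):
        for child in (2 * i + 1, 2 * i + 2):
            if child < size:
                graph[i].append(child)
                graph[child].append(i)
    return graph


overlay = binary_tree(depth=10)
for ttl in (1, 2, 4, 8):
    # The number of contacted nodes roughly doubles with every extra hop.
    print(ttl, len(flood(overlay, 0, ttl)))
```

In this toy overlay every additional hop doubles the number of nodes that must handle the query, which is the growth that makes unrestricted flooding expensive in large networks.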

Distributed Hash Tables (DHTs) are another approach, in which the nodes are typically organized in a logical circle. A hash function is then used to map a key (such as a filename) to a subset of nodes in charge of that key. While this makes lookups of keys very quick, it is not necessarily well adapted to searching. For example, “word” will hash to a different key than “words”, making fuzzy searching a complex operation. There is also some overhead in arranging the key-space between the member nodes. Examples of DHT-based networks are Chord[7] and Overnet[8],[9].

Most of the P2P networks mentioned above add dependencies on either servers (typically on the Internet) or a structured group of nodes. These networks are typically unable to function in ad-hoc networks, as moving to a (new) ad-hoc network will effectively reset the entire group of peers. All neighbors must then be discovered from scratch. Flooding systems can, however, be adapted to work in ad-hoc networks, as long as the ad-hoc networks are rather small.

Pure ad-hoc search infrastructures, such as Proem[10], are generally not well adapted for large-scale overlay network operation. The Internet is a fairly stable infrastructure, and this stability should be exploited. Protocols designed to search for local services, such as Rendezvous[11] or UPnP[12], are typically based on a local broadcast. This makes them able to find local resources, but their reach is very limited.

Semantic Query Routing is an emerging technique to increase the efficiency of P2P searches[13],[14]. Specifically, the decision of which nodes to include in a query is based on the semantics of the query. Examples of implementations that use this technique are Neurogrid[15], Alpine[16] and Limewire[17]. Neurogrid is dependent on a fairly static network configuration due to its learning process. Alpine uses a protocol called “DTCP” that allows suspend/resume operations on a connection. It is thus possible for an Alpine node to move to another location and resume its operation. However, this requires a certain stability. A moved node must be able to inform its neighbors before the neighbors move themselves. If the nodes are personal, portable devices, it might very well happen that many of them move at the same time (say, at office closing time). This can be solved in Alpine by introducing “home-agents”, or some other location where a node can be contacted (such as in Mobile IP[18]), but this adds a large degree of complexity to the system.
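The DHT limitation noted earlier in this section (that “word” and “words” hash to unrelated keys) can be illustrated with a short sketch. It assumes a Chord-like ring of 2^160 positions with SHA-1 as the hash function and a simple successor rule; the function names and the tiny node list are hypothetical and only serve the illustration.

```python
import bisect
import hashlib

RING_BITS = 160  # assumed ring size, matching the SHA-1 digest length


def ring_position(key):
    """Map a key (for example a filename) to a position on the ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


def successor(position, node_positions):
    """Return the first node at or after position, wrapping around the ring."""
    ordered = sorted(node_positions)
    index = bisect.bisect_left(ordered, position)
    return ordered[index % len(ordered)]


def responsible_node(key, node_positions):
    """Return the node in charge of key under the successor rule."""
    return successor(ring_position(key), node_positions)


# Two nearly identical search terms land on unrelated ring positions,
# so a lookup for one says nothing about where the other is stored.
print(hex(ring_position("word")))
print(hex(ring_position("words")))
```

An exact-key lookup is a single call to `responsible_node`, but a fuzzy search would have to enumerate and hash every variant of the search term separately.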

THE SEERS SEARCH PROTOCOL

Seers[19] is a search infrastructure designed to work seamlessly in both ad-hoc and overlay (Internet) modes. Seers is built around a concept we call Meta-Documents. Meta-documents are self-contained resource descriptions, such as a URL with additional descriptive meta-data. A query is represented by a meta-document that describes a wanted resource. A reply is represented by a meta-document describing an existing resource. A meta-document can also be either active or passive. Active meta-documents will be sent to neighbor nodes, while passive meta-documents are stored locally until a query requests them. This provides both pull and push style searching and filtering.

Meta-documents are represented in XML. Applications can extend the meta-documents to describe the resources they provide or request. For example, a video player requires different descriptive meta-data for a resource than a word processor does. Both the tags used to represent the meta-data and the meta-document policies can be changed. Specifically, there are policies for matching, transmission and life cycle management of meta-documents.

The matching policy defines how tags in the meta-document should be matched against others. The transmission policy describes how a meta-document should be transmitted. It can be limited by scope (host-only, local, global) or to groups of known nodes, or it can be public. It also contains parameters for the Semantic Query Routing (described later). The life cycle policy governs the storing and forwarding of meta-documents, such as validity limitations and cacheability. Thus a meta-document can, for example, be specified to expire at a certain time, or it can request not to be cached in the network (say, for quickly changing resources).

Seers is designed to provide a service to applications. The meta-documents allow suppliers to specialize the meta-data used to describe their resources and the caching rules in the network. All applications can further specialize how meta-documents are ranked and searched through. Figure 1 is an example of a meta-document with only a few tags shown.

In the next section the Seers search protocol is described. How it can provide a search infrastructure for variably connected Peer-to-Peer networks is then discussed. Then the first Seers prototype is described and the preliminary results are presented. Finally, the conclusions and the future work on Seers are summarized.

Fig. 1. An extended Seers meta-document
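Since the figure itself is not reproduced here, the following sketch shows what a meta-document of this kind might look like when built programmatically. The element and attribute names (`metadocument`, `meta`, `policies`, `transmission`, `lifecycle` and so on) are hypothetical and are not taken from the Seers message format; only the general ingredients (a resource URL, descriptive tags, a transmission scope, expiry and cacheability) follow the description above.

```python
import xml.etree.ElementTree as ET


def build_meta_document(kind, active, url, tags,
                        scope="local", expires=None, cacheable=True):
    """Build an illustrative meta-document as an XML element.

    kind is "query" or "reply"; active selects push (active) or
    pull (passive) behaviour. All tag names are assumptions.
    """
    doc = ET.Element("metadocument", {
        "kind": kind,
        "mode": "active" if active else "passive",
    })
    if url is not None:
        ET.SubElement(doc, "url").text = url

    # Application-defined descriptive meta-data.
    meta = ET.SubElement(doc, "meta")
    for name, value in tags.items():
        ET.SubElement(meta, name).text = value

    # Policies: transmission scope and life cycle rules.
    policies = ET.SubElement(doc, "policies")
    ET.SubElement(policies, "transmission", {"scope": scope})
    lifecycle = {"cacheable": "true" if cacheable else "false"}
    if expires is not None:
        lifecycle["expires"] = expires
    ET.SubElement(policies, "lifecycle", lifecycle)
    return doc


reply = build_meta_document(
    kind="reply",
    active=False,
    url="http://example.org/timetable.html",
    tags={"title": "Bus timetable", "type": "text/html"},
    scope="global",
    expires="2004-06-01T00:00:00Z",
    cacheable=False,
)
print(ET.tostring(reply, encoding="unicode"))
```

A query would be built the same way with `kind="query"`, no URL, and tags describing the wanted resource.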

The Seers search protocol matches a query and a reply meta-document by giving each reply a score using the given matching policy. The documents will then be ranked according to their score. In order to limit a sudden burst of replies to the query originator, all replies will be routed back, typically following the path of the query.
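The scoring and ranking step can be sketched as follows. The matching rule used here (the weighted fraction of query words found in the corresponding reply tag) and the representation of a matching policy as a per-tag weight table are assumptions made for the illustration; the actual matching policies of Seers are application-defined.

```python
def score(query_tags, reply_tags, weights):
    """Score one reply against a query.

    query_tags and reply_tags map tag names to text values; weights is a
    hypothetical matching policy giving the importance of each tag.
    """
    total = 0.0
    for tag, wanted in query_tags.items():
        have = reply_tags.get(tag)
        wanted_words = set(wanted.lower().split())
        if have is None or not wanted_words:
            continue
        have_words = set(have.lower().split())
        overlap = len(wanted_words & have_words) / len(wanted_words)
        total += weights.get(tag, 1.0) * overlap
    return total


def rank(query_tags, replies, weights):
    """Return (score, reply) pairs with a positive score, best first."""
    scored = [(score(query_tags, reply, weights), reply) for reply in replies]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


query = {"title": "bus timetable", "type": "text/html"}
policy = {"title": 2.0, "type": 1.0}
replies = [
    {"title": "Bus photos", "type": "image/jpeg"},
    {"title": "Lecture notes"},
    {"title": "Tromsø bus timetable", "type": "text/html"},
]
for points, reply in rank(query, replies, policy):
    print(points, reply["title"])
```

Replies that match nothing are dropped, and the remaining ones are returned in descending order of score, which mirrors the ranking described above.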

A Seers node learns about other nodes’ interests by passively observing the meta-documents it routes. Local IP multicast is also used to update information about local neighbors. Neighbor information will then be updated with additional addresses as well as activity and connectivity points. Seers allows multiple addresses for a node, in much the same way that humans keep multiple phone numbers for each other. For example, we might call a friend at work, or if that fails, try the home phone, and after that the cell phone. If all fail, the friend will be regarded as (temporarily) unavailable. By keeping multiple addresses for nodes, Seers can “snap” into different configurations. As long as a neighbor is at some known place (say at home or at work), the node will be able to contact it.

Fig. 2. Address selection for neighbor nodes

In figure 2 the basic algorithm for address selection is shown. Each address of a neighbor can be in one of three different states: active, inactive or stale. While traffic is seen from a neighbor, it will stay in the active state. When a neighbor is in the inactive state, it can still be used, but active probing of the neighbor might be triggered. If the neighbor fails to show liveness, it will be set to the stale state. A stale address can still be used, but ranks lowest. An address will also receive points when it is seen in use. Communication directly from the node gives more points than routed messages. This allows the shortest known path to be used. If all addresses of a neighbor are in the stale state, the addresses will be cycled by using the least recently used address. If addresses are not stale, but otherwise equal, the last seen address will be used.

The number of neighbors is limited by ranking them according to the local node’s observed interests.² The highest scoring neighbors are kept. Only addresses that have been active during a given period of time will be kept.
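The address selection rules described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of figure 2 itself: the point values, the field names, and the `demote` helper (which steps an address one state down after a silent period or a failed probe) are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class State(IntEnum):
    STALE = 0      # failed to show liveness; usable, but ranks lowest
    INACTIVE = 1   # no recent traffic; usable, probing may be triggered
    ACTIVE = 2     # traffic is currently being seen


@dataclass
class Address:
    endpoint: str
    state: State = State.INACTIVE
    points: int = 0
    last_seen: float = 0.0   # when traffic was last seen from this address
    last_used: float = 0.0   # when this address was last selected


DIRECT_POINTS = 2   # assumed value: direct traffic earns more points
ROUTED_POINTS = 1   # assumed value: routed messages earn fewer points


def observe(address, now, direct):
    """Record traffic seen from an address."""
    address.state = State.ACTIVE
    address.points += DIRECT_POINTS if direct else ROUTED_POINTS
    address.last_seen = now


def demote(address):
    """Step one state down (assumed trigger: silence or a failed probe)."""
    if address.state > State.STALE:
        address.state = State(address.state - 1)


def select_address(addresses, now):
    """Pick the address to use for a neighbor with at least one address."""
    if all(a.state == State.STALE for a in addresses):
        # Everything is stale: cycle through, least recently used first.
        chosen = min(addresses, key=lambda a: a.last_used)
    else:
        # Prefer better state, then more points, then the last seen address.
        chosen = max(addresses,
                     key=lambda a: (a.state, a.points, a.last_seen))
    chosen.last_used = now
    return chosen
```

With addresses for “work”, “home” and “cell”, repeated calls to `select_address` while all three are stale walk through them in turn, much like trying a friend's phone numbers one after another.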

Fig. 3. The basic Semantic Query Routing function
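As a rough stand-in for the routing function named in this caption, the sketch below follows the description given in the text after it: each neighbor is scored on known interests, connectivity and the local user's opinion, the list is sorted, and neighbors are selected under a budget of send points. The exact formulas are not given in the paper, so the cost rule used here (one point, plus the gap to the best score) and the way default and minimum points are applied are assumptions, as are all class and field names.

```python
from dataclasses import dataclass, field


@dataclass
class Neighbor:
    name: str
    interests: dict = field(default_factory=dict)  # keyword -> observed count
    connectivity_points: float = 0.0   # activity, stability, connectivity
    user_points: float = 0.0           # bonus, or a large negative "ban"


@dataclass
class TransmissionPolicy:
    send_points: float      # budget available for sending this query
    default_points: float   # interest score when nothing relevant is known
    minimum_points: float   # neighbors scoring below this are never used


def neighbor_score(neighbor, keywords, policy):
    """Score one neighbor for a query described by a list of keywords."""
    interest = sum(neighbor.interests.get(k, 0) for k in keywords)
    if interest == 0:
        interest = policy.default_points
    return interest + neighbor.connectivity_points + neighbor.user_points


def route(neighbors, keywords, policy):
    """Return the names of the neighbors the query is sent to, best first."""
    scored = sorted(
        ((neighbor_score(n, keywords, policy), n) for n in neighbors),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored:
        return []
    best = scored[0][0]
    budget = policy.send_points
    chosen = []
    for points, neighbor in scored:
        if points < policy.minimum_points:
            break
        # Assumed cost rule: less certain neighbors are more expensive.
        cost = 1.0 + (best - points)
        if cost > budget:
            break
        budget -= cost
        chosen.append(neighbor.name)
    return chosen


neighbors = [
    Neighbor("A1", {"jazz": 3}),
    Neighbor("I1", {"football": 4}, connectivity_points=1.0),
    Neighbor("I4", {"jazz": 5}, connectivity_points=2.0),
    Neighbor("banned", {"jazz": 9}, user_points=-100.0),
]
policy = TransmissionPolicy(send_points=8.0, default_points=1.0,
                            minimum_points=2.0)
print(route(neighbors, ["jazz"], policy))
```

In this example the two neighbors with matching interests are selected, the unrelated neighbor is too expensive for the remaining budget, and the banned neighbor falls below the minimum regardless of its interests.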

The Semantic Query Routing in Seers works by calculating a score for each known neighbor, as shown in figure 3. Points are not only given based on knowledge of the neighbor’s interests. Very active, stable and well connected nodes can receive many points for their connectivity. This naturally allows for a kind of “super-node” that attracts queries when no neighbor is known to have relevant interests for a query. Lastly, the local user might have an opinion about certain nodes, either because of some node’s excellence or as a “ban” of disliked nodes.

The list of neighbors is then sorted, and the highest scoring nodes are selected. The selection is based on negative send points, a construction in which it is more expensive to send to less certain nodes. This maps to the human feeling of embarrassment when calling a friend who is not likely to be able to answer your question. The send points, default points and minimum points are specified in the transmission policy of the query.

In figure 4 an example of a search is shown. It is a setup with 3 nodes (L1-L3) in a local network. These are also connected to the Internet, represented by nodes I1-I6. There are also two pure ad-hoc nodes (A1 and A2). Node L1 can only reach A2 through A1. In this example, node L1 searches for meta-documents available at A2, I5 and I6. Some state is shown for three nodes (A1, L1 and I4). The ’Address’ is the best current address. All nodes that see (route) the meta-documents will update their semantic knowledge of the senders. Node statistics, such as addresses and activity levels, will also be updated.

The search starts with node L1 creating a meta-document describing the wanted resource. The meta-document (the query) is then sent on local multicast to the LAN.³ L1 then sends the request to nodes A1 and I4. A1 forwards the request to A2, which sends a reply via A1 to L1. I4 will forward the query to I5, which will reply and forward it to I6. I6 will send a reply back to L1. The nodes I1-I3 are not included in the search, as they are not seen as likely to contain the wanted resource.

As seen in figure 4, L1 has included I5 and I6 as neighbors. Other nodes on the search path (I4) have also learned from the search. Also notice that L1 has seen A2, but at A1’s address. This is because L1 cannot reach A2 directly.

Fig. 4. An example of Seers with state before and after a search operation. Thicker lines show the search tree. The dashed lines show the new neighbor connections after the search. State changes are emphasized.

² Seers nodes will also monitor their own queries and replies, in order to determine their own interests.

CURRENT STATE

The first prototype of Seers, code named “Mêlée”, implements the majority of the semantic query routing aspects of the protocol described above. The extendible message format is not yet implemented. Mêlée limits the maximum points for each class (keyword points, activity points, etc.) in order to balance them. It has been run in a setup on two locations with 3 stationary nodes each, plus two mobile nodes, as shown in figure 5. All nodes are running Slackware[20] Linux (versions 8.1, 9.0 and 9.1).

One mobile node alternates between a local wireless connection at one location and pure ad-hoc mode (disconnected from the Internet). The other mobile node is connected to either of the locations through Ethernet or WLAN, and also operates in ad-hoc mode. When connected to Ethernet, it will keep the WLAN in ad-hoc mode, enabling it to bridge the ad-hoc network to the Internet. The stationary nodes are all connected to the Internet: one location has routable IP addresses, while the other is behind NAT. All stationary nodes are permanently online.

The data in the setup was mixed, with music, video, web-links and documents (html/pdf/postscript). Mêlée also includes a small http server in order to serve local files. Multiple test applications were made to test the service. A generic search application was made to search for any file, while a plugin for the XMMS multimedia player[21] was modified to support searching for music directly from XMMS. A “smart bookmark” was added to the Galeon web browser[22] in order to integrate searching for documents and web links.

The early observations of the prototype indicate that it operates as expected. It is able to function in ad-hoc settings with no intervention, and it is able to contact known hosts if they are available, be it locally or through the Internet. The local neighbor updates are done via IPv6 link-local multicast, which allows all nodes on a link to know about each other with little overhead.

A new prototype, code named “LeChuck”, with full support for application-extendible XML messages, policies and semantic query routing, is currently being implemented.

³ In this example nodes L1-L3 do not have any matching meta-documents. As the cost of a local multicast message is low while possibly providing the best replies (closest resources), it will normally be the first step in a Seers search.

CONCLUSIONS AND FUTURE WORK

The early observations of the first prototype indicate that it is possible to create a Peer-to-Peer network based on Semantic Query Routing that allows mobility and ad-hoc operation. For the upcoming prototype “LeChuck” a set of test applications will be implemented. We expect these applications to be of great value in an environment in which stationary, portable and PDA style devices are all present. Specifically, the resource limited PDA style devices stand to benefit from the ability to navigate and use both local and remote services.

Fig. 5. Test setup for the Mêlée prototype

REFERENCES

[1] Schilit, Adams, Want: Context-Aware Computing Applications. IEEE Workshop on Mobile Computing Systems and Applications, 1994.
[2] Kan: Gnutella. Peer-to-peer: Harnessing the benefits of disruptive technologies, ed. Oram, O’Reilly & Associates: 94-122, 2001.
[3] Singla and Rohrs: Ultrapeers: Another step towards gnutella scalability. Whitepaper, 2002.
[4] Gnutella protocol v0.4, http://www9.limewire.com/developer/gnutella protocol 0.4.pdf
[5] Napster, www.napster.com
[6] Kazaa, www.kazaa.com
[7] Stoica, Morris, Karger, Kaashoek and Balakrishnan: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. Proceedings of the 2001 conference on applications, technologies, architectures, and protocols for computer communications, 2001. ISBN 1-58113-411-8.
[8] Overnet, www.overnet.com
[9] Maymounkov and Mazieres: Kademlia: A peer-to-peer information system based on the XOR metric. Proceedings of IPTPS02, Cambridge, USA, March 2002.
[10] Kortuem, G.: Proem: A Peer-to-Peer Computing Platform for Mobile Ad-hoc Networks. Advanced Topic Workshop Middleware for Mobile Computing, November 16, 2001, Heidelberg, Germany.
[11] Rendezvous Technology Brief, http://www.apple.com/macosx/pdfs/Rendezvous TB.pdf
[12] UPnP forum: UPnP Device Architecture v1.0.1 Draft, 2003.
[13] Joseph, S.: P2P MetaData Search Layers. Second International Workshop on Agents and Peer-to-Peer Computing, AP2PC 2003.
[14] Krishna Ramanathan, Kalogeraki, Pruyne: Finding good peers in peer-to-peer networks. Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS 2002, Abstracts and CD-ROM, 2002, pages 24-31.
[15] Joseph, S.: NeuroGrid: Semantically Routing Queries in Peer-to-Peer Networks. Proceedings of the International Workshop on Peer-to-Peer Computing, May 2002, Pisa, Italy.
[16] The Alpine Network, http://peertech.org/alpine/
[17] Christopher Rohrs: Query Routing for the Gnutella Network. 2002, http://www.limewire.com/developer/query routing/keyword routing.htm
[18] IETF: IP Mobility Support. RFC 2002, 1996.
[19] The Socialized.Net, http://www.socialized.net
[20] Slackware Linux, http://www.slackware.com
[21] XMMS, http://www.xmms.org
[22] Galeon web browser, http://galeon.sourceforge.net