Peer-to-Peer over Ad-hoc Networks: (Re)Configuration Algorithms

Fernanda P. Franciscani, Marisa A. Vasconcelos, Rainer P. Couto, Antonio A.F. Loureiro
Computer Science Department, Federal University of Minas Gerais
Av Antonio Carlos 6627, Belo Horizonte MG, Brazil 31270-010
ffepaixao, isa, rainerpc, [email protected]

Abstract
A Peer-to-Peer network over an ad-hoc infrastructure is a powerful combination that gives users the means to access different kinds of information anytime and anywhere. In this paper we study the (re)configuration issue in this highly dynamic scenario. We propose three (re)configuration algorithms designed around the constraints of this environment. The algorithms aim to use the scarce resources of the network efficiently, improving both performance and network lifetime. The algorithms were simulated and compared against a simple Gnutella-like algorithm. The results show that the algorithms achieved their goals, presenting a good cost-benefit relation.
1. Introduction
Mobile computing aims to let people access different kinds of information anytime and anywhere. In general, the client-server architecture cannot satisfy this demand for several reasons: the server can be down (low fault-tolerance) or overloaded (scalability problems), or there may be no infrastructure to reach the server and other entities.

Some of these problems, such as scalability and fault-tolerance, can be addressed by Peer-to-Peer (p2p) applications, which build a virtual network over the physical one. The peers can act as servers or as clients, and are called servents. Servents can exchange data among themselves in a completely decentralized manner or with the support of some central entity, which usually helps servents get in touch with one another. The success and popularity of p2p networks reveal how important it is to satisfy the anytime demand.

The anywhere issue, however, is mainly related to the network itself, that is, it depends on the infrastructure provided. Since the lack of infrastructure is the problem, networks that do not need it are a solution. This is the case of ad-hoc networks, which, once endowed with mobility, are called MANETs (Mobile ad-hoc networks). In these networks, communication between nodes is based on radio coverage and can take place directly or using other nodes as routers. MANETs can thus be easily formed by a group of people using current devices such as cellular phones, PDAs and notebooks.

From the arguments above, it is reasonable to think that p2p networks over ad-hoc networks would be a very good solution to both the anytime and the anywhere problems. However, this combination leads to a highly dynamic scenario in which references between nodes are constantly changing. The frequent reconfiguration may have a great impact on the scarce resources of the network, such as energy and bandwidth. Aiming to control and diminish this impact, we designed algorithms to (re)configure a p2p network over an ad-hoc network and analyzed their behavior.

Although there are some studies on p2p applications over ad-hoc networks, they are usually not concerned with (re)configuration issues. Often the references between nodes simply show up or are created through the indiscriminate use of broadcasts. The relevance of the current work is therefore to study adequate ways of (re)configuring a p2p network over an ad-hoc network, taking into account the serious constraints this scenario presents.

The rest of this paper is organized as follows. Section 2 briefly describes p2p networks, whereas Section 3 discusses p2p over mobile ad-hoc networks. Section 4 describes the main characteristics of a mobile ad-hoc network and the routing protocol used. Section 5 presents the related work. The four (re)configuration algorithms are described in Section 6 and analyzed through simulations in Section 7. Finally, Section 8 presents our conclusions and future work.

This work is partially supported by CNPq-Brazil.
0-7695-1926-1/03/$17.00 (C) 2003 IEEE

2. Peer-to-Peer Networks
P2p networks are application-level virtual networks. They have their own routing protocols that permit computing devices to share information and resources directly, without dedicated servers. P2p members act as clients and as servers, and are thus called servents [17]. File sharing applications such as Napster, Gnutella and Morpheus have made p2p networks very popular. The great contribution of this kind of system is scalability, which allows millions of users to be online concurrently even at peak periods [7]. This is obtained thanks to the hybrid behavior of the users (client + server), which yields greater decentralization of computing.

P2p systems are classified in three main categories: centralized, decentralized and hybrid [4, 9]. In a centralized p2p system, coordination between peers is managed by a central server. However, after receiving the information from the central server, peers communicate directly. Advantages of this kind of system are ease of management and security. The disadvantages are low fault-tolerance, since some data are held only by the central server, and scalability limited by the capacity of the server. Scalability can still be achieved, however, through the increasing processing power of computers, which makes it possible for one machine to serve a great number of users. Examples are the search architecture of Napster [11] and the SETI@Home system [20], which has a central job dispatcher.

In decentralized topologies, e.g. Freenet [2] and Gnutella [3], all peers have equal roles. Communication takes place through multiple multicasts, in which peers forward messages on behalf of other peers. Important advantages are extensibility and fault-tolerance. Scalability in these systems is difficult to measure: on the one hand, adding more hosts makes the system more capable; on the other hand, the overhead of keeping the data consistent increases with the size of the system.
In hybrid (centralized + decentralized) topologies, peers forward their queries to "super-peers", which communicate with each other in a decentralized manner. The advantages of this topology resemble those of decentralized systems, with improved data consistency, since part of the data is kept only by the super-peers. Examples of this topology are KaZaA [5] and Morpheus [10]. Table 1, derived from [9], lists the advantages and disadvantages of these distributed topologies.
                Manageable  Extensible  Fault-Tolerant  Secure  Lawsuit-proof  Scalable
Centralized     yes         no          no              yes     no             depends
Decentralized   no          yes         yes             no      yes            maybe
Hybrid          no          yes         yes             no      yes            apparently

Table 1. Topologies and their characteristics.
In this work, only hybrid and decentralized configurations were adopted, for two reasons. First, we consider these topologies closer to reality. Second, as will be shown, the environment considered is highly dynamic, making extensibility an important issue. The algorithms developed, including the Hybrid one, were also inspired by the decentralized Gnutella protocol, especially in their query mechanism.

Gnutella is a public domain p2p protocol used mainly for sharing, searching and retrieving files and content. To join the network, a servent must connect to neighbors that already belong to it. After that, the servent sends messages by broadcasting them to its neighbors and acts as a router for messages transmitted by others. The types of messages exchanged in this network are: ping messages, used to discover other member nodes; query messages, which carry information about the searched content; and the file itself, which is transferred directly between the peers. When a servent wishes to search for a file, it sends a query message to its neighbors, which, if possible, return the desired file, besides forwarding the message to their own neighbors. To prevent indefinite propagation, all messages carry a TTL (time-to-live) field and the number of hops already traveled [17].
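The query mechanism described above can be illustrated with a minimal sketch. It is not the Gnutella wire protocol: the function name `flood_query`, the dictionary-based overlay, and the use of a `seen` set to drop duplicate copies of a query are assumptions made for illustration only.

```python
from collections import deque

def flood_query(neighbors, files, origin, wanted, ttl):
    """Flood a query from `origin`; return {node: hops} for nodes holding `wanted`."""
    hits = {}
    seen = {origin}                      # nodes that already handled this query
    queue = deque([(origin, ttl, 0)])    # (node, remaining TTL, hops traveled)
    while queue:
        node, remaining, hops = queue.popleft()
        if node != origin and wanted in files.get(node, ()):
            hits[node] = hops            # this servent can answer the query
        if remaining == 0:
            continue                     # TTL exhausted: do not forward further
        for peer in neighbors.get(node, ()):
            if peer not in seen:         # assumption: duplicates are dropped
                seen.add(peer)
                queue.append((peer, remaining - 1, hops + 1))
    return hits

# Hypothetical overlay: a chain a - b - c - d, with the file held by c and d.
overlay = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
shared = {"c": {"song.mp3"}, "d": {"song.mp3"}}
print(flood_query(overlay, shared, "a", "song.mp3", ttl=2))
```

With a TTL of 2 the query stops at node c, so node d is never asked even though it holds the file; raising the TTL widens the search at the cost of more messages.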
3. P2p over Mobile Ad-Hoc Networks
Wireless devices have, in general, a restricted transmission range due to their limited power supply. Thus, the search for data should be carried out within a short distance. This search can be made in two distinct ways. The first uses a fixed infrastructure, which generally involves a high cost and provides continuous access to an information network such as the Internet or a private intranet. The second does not need fixed infrastructure: the set of mobile devices itself, acting as routers and information servers, forms the network. This kind of network is called an ad-hoc network. In this case, the nearest devices become important sources of data for each other, which resembles the p2p paradigm, in which the network elements act, at the same time, as clients, servers and routers.

One of the main advantages of p2p over an ad-hoc network is the ease of forming the network, since it requires neither infrastructure nor a central server. Examples of possible uses of p2p over ad-hoc networks include applications that alert us to the presence of friends in a crowded public space or identify people we want to meet, taking into account our preferences and interests; and systems that spread rumors, facilitate the exchange of personal information, or support us in more complex tasks [6]. On the other hand, p2p over ad-hoc networks is a very dynamic combination that demands, among other things, special attention regarding (re)configuration issues.
4. Mobile Ad-Hoc Networks
Mobile ad-hoc networks are composed of wireless mobile devices whose communication is based on radio coverage and can take place directly (point-to-point) or using other nodes as routers. This kind of network is used mainly in scenarios without a fixed network infrastructure. Some examples are conventions or meetings, where people wish to exchange information quickly and conveniently [18], and emergency operations. Besides the restrictions common to wireless networks, mobile ad-hoc networks pose the additional challenge of dealing with a very dynamic topology.

The limited energy of the devices demands smart routing protocols. In our simulations, the chosen routing protocol was AODV. A comparison of the performance of several routing protocols shows that each of them performed well in some scenarios for some metrics yet had drawbacks in others [13]. We therefore adopted AODV, which exhibited the best performance in high mobility scenarios. AODV is an on-demand routing algorithm for mobile ad-hoc networks in which each node holds the information about the next hop of a route. It is on-demand in the sense that routes are only maintained while they are being used. When a link breaks, AODV notifies the nodes that have the affected route saved. Advantages of this protocol are quick adaptation to dynamic link conditions, self-starting operation, multi-hop routing and loop freedom [16, 19].
5. Related Work
A comparison between routing in ad-hoc networks and in p2p networks is made in [19]. Besides presenting a taxonomy of both, that work proposes the joint use of these networks, aiming at a synergetic effect. The theory is put into practice in [14, 15], which present 7DS (Seven Degrees of Separation), an application for data dissemination among hosts in an ad-hoc network. Besides the implementation, these works evaluate, through simulation, the effects of power conservation, radio coverage control and strategies for cooperation among hosts on data dissemination. A platform for developing peer-to-peer applications in short-range ad-hoc networks, more precisely PANs (Personal Area Networks), is presented in [6]. A study on sensor networks regarding cooperation and network formation is made in [21]. The performance of several routing protocols in a p2p over ad-hoc network is evaluated in [13].

The works above all relate to p2p over ad-hoc networks, and none of them addresses p2p (re)configuration issues. Our work focuses exactly on this problem, analyzing its impact on the performance of both the p2p and the ad-hoc network.

Some metrics for performance evaluation in a p2p network are described in [4], using Gnutella and Freenet as case studies and based on four criteria: efficiency, speed, worst-case performance and scalability. Gnutella was not considered scalable, since the required bandwidth increases linearly as the network grows. Conversely, Freenet scales logarithmically, using pathlength as the metric.
6. (Re)Configuration Algorithms
For (re)configuration of the network we propose two types of algorithms: decentralized and hybrid. There are three decentralized algorithms, called Basic, Regular and Random. The hybrid type has only one representative, the so-called Hybrid algorithm.

The descriptions of all these algorithms speak of nodes being connected, trying to connect, maintaining a connection and so on. It is important to notice, however, that we are dealing with wireless networks, so there are no real connections, e.g. TCP connections, between nodes. The so-called connections are actually references, that is, they represent the knowledge of the addresses of some reachable nodes. A symmetrical connection is thus one in which a node A keeps a reference to node B while B also references A. Asymmetrical connections also exist and are used in the Basic algorithm.
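The distinction between symmetrical and asymmetrical connections can be made concrete with a small sketch. The `references` dictionary and the function name `is_symmetric` are hypothetical and serve only to illustrate the definition above.

```python
def is_symmetric(references, a, b):
    """True when a keeps a reference to b and b also references a."""
    return b in references.get(a, set()) and a in references.get(b, set())

# Hypothetical overlay state: each node maps to the addresses it references.
references = {"A": {"B", "C"}, "B": {"A"}, "C": set()}
print(is_symmetric(references, "A", "B"), is_symmetric(references, "A", "C"))
```

Here A and B reference each other, while A references C without being referenced back, which is the asymmetrical case.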
6.1. Decentralized Algorithms
Our system model is based on messages that are forwarded over many hops from one peer to the next in order to establish connections and to search for data. Despite sharing this basis, the three decentralized algorithms behave differently, as will be seen below. The Basic algorithm is presented in the next section. After that, a very important concept is briefly described: the small-world effect [8], which was the basis for the changes that turned the Regular algorithm into the Random algorithm. Finally, these last two algorithms are described.

6.1.1 The Basic Algorithm
The Basic algorithm was meant to represent a simple (re)configuration algorithm and therefore to serve as a basis for comparison. Its main characteristic, simplicity, makes it easy to implement, but it partially ignores the dynamic nature of the network. This algorithm, shown in Figure 1, uses three constants named MAXNCONN, NHOPS and TIMER. The first is the maximum number of connections per node. The second is the number of hops a message travels, and the third is the time interval between two attempts to establish connections. The algorithm works as described below.

A node, when starting its participation in the p2p network, broadcasts a message to discover other nodes within NHOPS hops of it. Every node that hears this message answers it. As soon as a response arrives, the node establishes a connection to the neighbor that sent it, up to the limit of MAXNCONN connections. If the number of responses is less than MAXNCONN, and whenever the node has fewer than MAXNCONN connections, it keeps trying to create the remaining connections. Between attempts, the node waits for a time interval, TIMER, in order to avoid overloading the network with traffic.

Once a reference is created, its validity is checked frequently by sending pings. Whenever a node receives a ping it answers with a pong. Receiving a pong thus signals that the connection still exists, while its absence means that the neighbor is no longer reachable and the connection is over.
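The ping/pong validity check can be sketched as follows. This is only an illustration under stated assumptions: `prune_references` is a hypothetical name, and the `reachable` callback stands in for sending a ping and waiting for the corresponding pong.

```python
def prune_references(references, reachable):
    """Ping every referenced neighbor; keep only those that answer with a pong.

    `reachable(node)` is a hypothetical stand-in for sending a ping and
    waiting for the pong; it returns True when the pong arrives.
    """
    alive = {node for node in references if reachable(node)}
    dropped = references - alive         # no pong: the connection is over
    return alive, dropped

in_range = {"n2", "n5"}                  # nodes currently answering pings
alive, dropped = prune_references({"n2", "n3", "n5"}, lambda n: n in in_range)
print(sorted(alive), sorted(dropped))
```

A node that drops references this way falls below MAXNCONN again and resumes trying to establish connections.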
6.1.2 The Small-World Model
Let real_conn be the number of connections that actually exist between the neighbors of a vertex, and let possible_conn be the number of all connections that could exist between these neighbors. The clustering coefficient is given by real_conn / possible_conn.

Besides the clustering coefficient, regular and random graphs also have very distinct characteristic pathlengths. In large regular graphs with n much larger than k, for k much larger than 1, the pathlength is approximately n/(2k). In large random graphs this value decreases substantially and is given by ln(n)/ln(k) [4].

Interestingly, small changes in the connections of a regular graph are sufficient to achieve short global pathlengths, as in random graphs. Rewiring some connections from neighbors to randomly chosen vertices creates bridges between clusters that are a great distance apart. These bridges shorten the pathlength without any considerable change in the clustering coefficient. Graphs that have high clustering coefficients and, at the same time, short global pathlengths are called small-world graphs. Our Random (re)configuration algorithm, presented next, aims to construct the peer-to-peer network as a small-world graph.

Before presenting the Regular and Random algorithms, we list their variables and constants, most of which are present in both algorithms. There are three variables: nhops, randhops and timer. The first is the number of hops a message looking for a regular connection can travel. It is initialized with the value NHOPS_INITIAL, which is greater than 1, and has MAXNHOPS as an upper limit. The second has a similar meaning but applies only to random connections; it does not need to be initialized. The third is the time interval a node waits between two attempts to establish connections. It is initialized with TIMER_INITIAL and can increase up to MAXTIMER. Finally, there are two remaining constants not yet explained: MAXNCONN, the maximum number of connections per node, and MAXDIST, the maximum distance allowed between two connected neighbors (measured in number of hops).
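The two quantities of the small-world model, clustering coefficient and characteristic pathlength, can be illustrated with a short sketch. The function names and the example graph are hypothetical. The pathlength estimates assume the standard Watts-Strogatz approximations, n/(2k) for a large regular graph and ln(n)/ln(k) for a large random graph, where n is the number of vertices and k the number of connections per vertex.

```python
import math
from itertools import combinations

def clustering_coefficient(graph, vertex):
    """real_conn / possible_conn for the neighbors of `vertex`."""
    neighbors = graph[vertex]
    possible_conn = len(neighbors) * (len(neighbors) - 1) // 2
    if possible_conn == 0:
        return 0.0                       # assumption: fewer than two neighbors gives 0
    real_conn = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return real_conn / possible_conn

def regular_pathlength(n, k):
    """Approximate pathlength of a large regular graph (assumed n/(2k))."""
    return n / (2 * k)

def random_pathlength(n, k):
    """Approximate pathlength of a large random graph (assumed ln n / ln k)."""
    return math.log(n) / math.log(k)

# Hypothetical undirected graph stored as symmetric adjacency sets.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient(graph, "a"))
print(regular_pathlength(1000, 10), random_pathlength(1000, 10))
```

For 1000 vertices with 10 connections each, the regular estimate is 50 hops against roughly 3 hops for the random graph, which is the gap the rewired bridges are meant to close.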
6.1.3
Establishing connections
while the node belongs to the p2p network
    if number of connections < MAXNCONN
        try to establish new connections to nodes within NHOPS hops,
            up to the limit of MAXNCONN connections;
        wait TIMER before next try;
    endif
endwhile
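One pass of the loop above can be sketched in runnable form. This is a simplified illustration, not the simulated implementation: the physical topology is a hypothetical `links` dictionary, discovery is modeled as a breadth-first search limited to NHOPS hops, nearer nodes are assumed to answer first, and the TIMER wait between passes is left to the caller.

```python
from collections import deque

MAXNCONN = 2   # maximum number of connections per node
NHOPS = 2      # how many hops a discovery message travels

def nodes_within(links, start, nhops):
    """Nodes reachable from `start` in at most `nhops` hops, nearest first."""
    found, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == nhops:
            continue                     # the message is not forwarded further
        for peer in sorted(links.get(node, ())):
            if peer not in seen:
                seen.add(peer)
                found.append(peer)
                queue.append((peer, hops + 1))
    return found

def try_connect(links, node, connections):
    """One pass of the loop body: add references until MAXNCONN is reached."""
    for peer in nodes_within(links, node, NHOPS):
        if len(connections) >= MAXNCONN:
            break
        connections.add(peer)
    return connections

# Hypothetical radio links between five nodes.
links = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "e"], "d": ["b"], "e": ["c"]}
print(try_connect(links, "a", set()))
```

Node a hears answers from b, c and d but stops at MAXNCONN references; node e is three hops away and is never discovered with NHOPS set to 2.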