Exploiting the TTL rule in Unstructured Peer-to-Peer Networks

Georgios Pitsilis, Panayiotis Periorellis
School of Computing Science, University of Newcastle, UK
{Georgios.Pitsilis,Panayiotis.Periorellis}@ncl.ac.uk

Abstract

Peer-to-Peer networks exist through the volunteering cooperation of various entities on the Internet. Their self-organizing nature has the important characteristic that they make no use of central entities acting as coordinators, and the benefits of this cooperation can be enjoyed equally by all members of the community, under the assumption that they all make proper use of the protocol. In this study we examine the consequences for the community of the existence of misbehaving nodes that abuse the network resources for their personal benefit, and we analyze the cost and the benefit of a proposed solution that could be used to address this problem.

Keywords: P2P systems, TTL, Optimization, Horizon.

1. Introduction

P2P systems are nowadays mainly used for the exchange of content over the Internet. By content we refer to simple forms, such as files or exchanged messages, but in more sophisticated solutions even live media streams can be exchanged. The main characteristic of unstructured systems is their independence from central entities, which is both an advantage and a disadvantage. Even though a central entity can be a single point of failure and bring down the whole system, it can at the same time work as a centre of control for the adoption of the protocol principles (e.g. Napster). Otherwise, to our knowledge, there is no way but to assume that the participants have some serious reason to adopt the protocol policies if there is no entity to enforce them. In this work we focus on the Gnutella protocol, which was built on the idea of using no central entities for its operation and belongs to the category of protocols that are exposed to misbehaving users.

As will be seen below, quite a few studies have been performed with the purpose of improving the search capabilities of the protocol. All of them assume that the nodes participating in the schema faithfully adopt the rules of the protocol and do not act at the expense of the others. The case of misbehaviour that our research deals with is the abuse of the time-to-live factor (TTL), which is used for the propagation of messages. In a few words, this factor defines how many times a message can be transmitted from one hop to the neighbouring one before it dies. Initiating messages with a standard TTL value is a kind of "rule" that all participants in a P2P topology must abide by, and the value that all protocol implementations have agreed upon is considered optimal. The lack of entities that would monitor the traffic and take measures upon their observations has left this duty to the honesty of the software implementers, who are trusted to build P2P software that preserves the protocol rules. In this study we examine the consequences of a misbehaving population on the rest of the P2P network and also the cost of policing the network for the right application of the TTL rule. The rest of the paper is organized as follows: in section 2 there is an analysis of the problem; section 3 describes the experiment that took place, the parameters used, etc.; in section 4 there is an analysis of the results; and section 5 presents related work in the field. Future issues and concluding remarks are given in section 6.

2. Motivation

The concept of using a TTL in transmitted messages is as old as network technology itself. It was introduced in the packet-switching protocols (such as IP) to prevent packets from circulating in the network forever. The same concept is used in the layers above, such as in Peer-to-Peer applications, to restrict message flooding to a specified hop distance. In unstructured Peer-to-Peer networks the TTL, together with the outer degree of a node, also determines the horizon that can be seen by a single peer in the network. For instance, the following formula gives the theoretical horizon size when searching up to a TTL hop distance:

$$r(n) = \sum_{t=1}^{TTL} (n-1)^{(t-1)} \cdot n \qquad (1)$$

where n is the number of connections to other peers in the network (outer degree) and t is the Time To Live of the sent packets. By a searching operation we refer either to a resource-searching query for finding content or to a resource-discovery query for finding available peers on the net. In reality, with the evolution of hybrid P2P networks, more peers can actually be seen from a single point of view than the formula indicates. Each user's request (either resource/peer discovery or search for an item), which carries a TTL value, is sent with the purpose of being replied to by the recipients of the message along the travelled path. Because the result depends on various factors, such as the point of view and the capabilities of the neighbouring nodes, each peer may be aware of the existence of a different set of peers than some other peer. The horizons can also vary in size, since resource discovery is based on uncertain factors.

A possible misuse of the protocol is for a peer to deliberately increase the TTL value of the queries it sends in order to increase the size of its horizon and thus get access to more resources. Such an action has the consequence of creating more traffic in the network, which in any case is undesirable. In the Gnutella protocol [10] resource discovery and the search for files are done by the exchange of descriptors between the peers. The TTL is included in the descriptor header together with another field called 'Hops', which shows the number of times the descriptor has been forwarded in the network. From these two pieces of information it can be determined whether the TTL rule has been violated. Even though the protocol specification [10] does not mention which maximum TTL value should be set, the protocol implementations (Gnucleus, Swapper, Bearshare, Limewire), to the best of our knowledge, use an agreed TTL value of 7, which we suppose has emerged from experience or was chosen because the diameter of the real Gnutella network was found to be 7. Applying formula (1) with this TTL value and a standard outer degree of 5 gives a horizon of around 3000 nodes. This value has also been adopted in version 0.6 of the protocol, which supports the use of Superpeers. The open architecture of the Gnutella protocol, though, permits anyone to create their own servent and embed their own searching rules in it.

Another way for a peer to increase its horizon would be to add more connections to its local neighbours (i.e. increase its outer degree). Such a choice can be harmful only for the servent that adopts it, because of the congestion it can cause in the servent's connection to the Internet. On the other hand, any attempt to increase the TTL can affect a large number of nodes; especially those encountered on a search path initiated by a misbehaving servent would suffer severe consequences in terms of congestion. The solution might seem as easy as instructing all servents to reject any requests they receive with a TTL higher than the adopted standard of 7. Since the existing servents have not been built by their makers to act this way, the solution should be sought in other directions. In this study we hypothesize that, apart from the misbehaving ones, there is always a percentage of users who have adopted the role of policing the protocol for the right application of the rule. For example, they might have the TTL-preserving rule embedded in their servents, so that when asked to pass on a request with an abnormal TTL value they react by rejecting it or, in the best case, by turning it back to within the normal limits. In the experiments we ran, the action taken by the policy protectors is the alteration of the TTL value back to the maximum allowed, so that illegitimate requests are transformed into legitimate ones before being retransmitted to the neighbouring nodes.
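As a minimal illustration (not taken from any servent's source code), the following Python sketch evaluates formula (1) and shows how a hypothetical policy-protector servent could clamp an inflated TTL back within the agreed limit using the TTL and Hops fields of a descriptor header; the function names and the clamping strategy are our own illustrative assumptions.

```python
# Sketch: theoretical horizon of formula (1) and a hypothetical
# policy-protector rule that clamps inflated TTLs.

MAX_TTL = 7  # value adopted by mainstream Gnutella servents

def horizon(n: int, ttl: int) -> int:
    """Formula (1): number of peers theoretically reachable within
    `ttl` hops when every node has outer degree `n`."""
    return sum((n - 1) ** (t - 1) * n for t in range(1, ttl + 1))

def clamp_ttl(ttl: int, hops: int, max_ttl: int = MAX_TTL) -> int:
    """If TTL + Hops exceeds the agreed maximum, the descriptor was
    initiated with an inflated TTL; reduce the remaining TTL so the
    request becomes legitimate before it is forwarded."""
    if ttl + hops > max_ttl:
        return max(max_ttl - hops, 0)
    return ttl

# A descriptor arriving with TTL=9 after 2 hops is illegitimate; a
# policy protector would forward it with TTL = 7 - 2 = 5.
assert clamp_ttl(9, 2) == 5
print(horizon(3, 4))  # -> 45 for outer degree 3, TTL 4
```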

3. The experiment

We can split the tests we ran into two parts. In the first part there is an attempt to measure the consequences of having illegitimate packets communicated in the network, and in the second part we study the effectiveness, along with the benefits, of measures against them. Due to limitations in computing resources we ran the tests on a small community first, before applying them to a network as big as the real one. The purpose was to identify any special properties and the behaviour of the system for each set of simulation parameters. We ran simulations on a network of 200 nodes and assumed that the agreed value for TTL in the experiment would be 4, which means a resource-search query may travel a maximum distance of 4 hops. This adjustment of the maximum TTL from 7 down to 4 was necessary due to the smaller number of nodes in the testing network. Even though it might look quite low, such a TTL is quite reasonable for demonstrating the proposed solution. As regards the outer degree, we assumed that it follows a power-law distribution with parameter a = -1.4, as has been measured in the real Gnutella network [3].

In total, we used 3 variables in the test. The first one is the special population ratio (denoted p): the percentage of peers, out of the whole population, that either act illegitimately by sending packets with a higher TTL than the agreed value, or belong to those that have adopted the duty of protecting the network from requests with an illegal TTL, whom we call Policy Protectors. For each test we ran experiments for special population ratios ranging from 10% to 30%. The second variable is the ratio of illegitimate servents to policy protectors (denoted r). We tried ratios starting from infinite and going to 1, 5, 1/5 and 1/2. A ratio of 5 denotes that there was one policy protector for every five illegitimate servents, and a ratio of 1/2 means that there are 2 policy protectors for each illegitimate servent. We also examined different levels of illegitimacy, which we used as a third variable. An illegitimacy level of 1 means that a malicious servent wishing to take advantage of the network distributes request packets whose TTL value is increased by one compared to the agreed standard (4 in our experiment).

In the first part of the experiment we assumed that r is infinite, and in this way we saw what happens when there are no policy protectors in the network to defend against the illegitimate servents. In the second part of the experiment the special population is composed of various mixes of illegitimate servents and policy protectors, and we observe how the traffic and the horizon of the servents are affected by that ratio. A measurement was also taken for the case of having no illegitimate servents at all in the network; it is used as a point of reference for all the other combinations we tried. As regards the measured quantities, we were interested in the traffic encountered in the network and the gain in the horizon seen by the average peer. As can be seen from the quantities we chose to measure, we aimed to find the best- and worst-case scenarios that can occur and how successful the measures against them can be. Therefore, we introduced a factor which shows how successful each combination was. The numbers presented in the diagrams are averaged values from 10 measurements.
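To make the set-up concrete, the following is a minimal Python sketch of how such a simulation could be configured; the degree bounds, helper names and the bare experiment loop are our own illustrative assumptions, not the authors' simulator.

```python
# Sketch of the experiment set-up as we understand it.
import random

POPULATION = 200   # nodes in the simulated network
AGREED_TTL = 4     # agreed maximum TTL in the experiment
ALPHA = 1.4        # power-law exponent: P(k) ~ k ** -1.4

def sample_outer_degrees(n_nodes: int, k_min: int = 1, k_max: int = 20):
    """Draw an outer degree for every node from a discrete power law
    (the degree bounds k_min/k_max are illustrative assumptions)."""
    ks = list(range(k_min, k_max + 1))
    weights = [k ** -ALPHA for k in ks]
    return random.choices(ks, weights=weights, k=n_nodes)

# The three experiment variables: special-population ratio p,
# illegitimate-to-protector ratio r, and illegitimacy level i.
P_VALUES = [0.10, 0.20, 0.30]
R_VALUES = [float("inf"), 1, 5, 1/5, 1/2]
ILLEGITIMACY_LEVELS = [1, 2, 3, 4]  # extra TTL added by misbehaving peers

for p in P_VALUES:
    for r in R_VALUES:
        for i in ILLEGITIMACY_LEVELS:
            pass  # run one simulation here, averaging over 10 runs
```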

4. Results – Discussion

The aim of this series of measurements is to find the best ratio of illegitimate servents to policy protector servents (r) for each combination of population ratios (p). In figure 1 we present the size of the horizon for various levels of misbehaviour and various sizes of special populations. These measurements were taken for r = infinite. As can be seen, the horizon for misbehaviour level 0 is around 27, which is much less than the 45 we would get if we applied formula (1) for n=3 and TTL=4. This can be justified as follows: the peers' outer degree in our simulations was not chosen to be fixed; its value followed a power-law distribution, as in a real network.

[Figure 1: horizon size (number of nodes) versus illegitimacy level, for p = 10%, 20% and 30%.]

Fig. 1 The horizon seen by the average user.

In figure 2 we present the number of messages communicated in the network (traffic, in millions of messages) for various levels of misbehaviour and various sizes of special populations, with r = infinite.

[Figure 2: traffic (millions of messages) versus illegitimacy level, for p = 10%, 20% and 30%.]

Fig. 2 The traffic congestion.

As expected, the traffic increases with the level of illegitimacy (i.e. with the increase in TTL). The interesting observation, though, is that illegitimacy brings a more general benefit to the community by increasing the horizon of the average user. That is quite reasonable, since the replies sent back from a longer distance due to the increased TTL are received not only by the node that initiated the request but also by the intermediate nodes on the traversed path.

It is rather interesting to express the increase in the horizon and in the traffic in relative units, so as to be able to make a comparison against the normal state (p=0%) and thus find out which combination of parameters (r, i) can help the network reach its best performance. In figure 3 we present the relative increase in traffic for 4 different levels of illegitimacy (i.e. TTL higher than normal) and five different illegitimate/protector ratios, for the case where the special population is 10% of the whole population. Set No 1 represents the case where there are no policy protectors at all, and therefore the traffic levels are quite high. In set No 2 there is an equal number of policy protectors and illegitimate servents. In set No 3 the illegitimate servents are 5 times as many as the policy protectors, which turns out to be nearly identical to having no policy protectors at all. In sets 4 and 5 the policy protectors are 5 and 2 times more numerous than the illegitimate peers respectively. Having 5 policy protectors for each illegitimate servent (r=1/5) seems adequate to bring the traffic back to the normal levels of the case with no illegitimate peers.

[Figure 3: relative increase in traffic (%) versus the policy used (sets 1-5), for illegitimacy levels 1-4; special population 10%.]

Fig. 3 Relative traffic.

As can be seen in the figure, as the level of illegitimacy (i) increases, both traffic and horizon increase too. Therefore, we need to express the benefit of applying each policy in such a way that we can choose the best one. We assume that the target of a policy is to:
• Increase the horizon to a size as close as possible to the whole set of peers in the community.
• Keep the traffic as close as possible to the traffic that a community with no illegitimate peers would have.

We call Benefit the sum:

$$B = \sum_{i=1}^{I} \frac{1 - \frac{t(i=0)}{t(i,r,p)}}{\frac{h(i,r,p)}{Population}} \qquad (2)$$

where t stands for the traffic and h for the horizon of a given situation. The denominator represents the relative increase in the horizon, and t(i=0) is the network traffic when there are no illegitimate users at all. We were interested in the benefit over the whole range of illegitimacy (i = 1..4), therefore we set I = 4 in our experiment. Population equals 200, the size of the total population in the experiment. The B factor relates the horizon accomplished to the traffic consumed for achieving that horizon.

We applied this formula for every different combination of r and p, and the results are presented pictorially in figure 4. As can be seen from the diagram, as the ratio of illegitimate servents to policy protectors approaches 0, the benefit for the community increases. So the best policy is the one for which r=1/5 or, put otherwise, where there are 5 policy protectors per illegitimate user. It is also shown that under this policy the benefit is maximized when p=20% (4 out of 5 peers in the community are normal peers). When p=30% the benefit is almost the same for any policy used. Assuming that the policy protectors behave legitimately, the illegitimate peers in a network with p=20% correspond to 5% of the total population. That percentage seems to be the optimum for helping the network reach its best performance.
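Reading formula (2) literally, the following sketch computes the Benefit for one (r, p) combination; the measured traffic and horizon values below are hypothetical placeholders, not the paper's data.

```python
# Sketch of the Benefit factor of formula (2).

POPULATION = 200
I_MAX = 4  # benefit is summed over illegitimacy levels 1..4

def benefit(t_norm: float, traffic: dict, horizon: dict) -> float:
    """Relative traffic excess over relative horizon, summed over all
    illegitimacy levels for a fixed (r, p) combination."""
    return sum(
        (1 - t_norm / traffic[i]) / (horizon[i] / POPULATION)
        for i in range(1, I_MAX + 1)
    )

# Hypothetical measurements for one (r, p) combination:
t = {1: 1.2e6, 2: 1.9e6, 3: 2.8e6, 4: 4.0e6}  # messages per level
h = {1: 35, 2: 48, 3: 61, 4: 74}              # horizon (nodes) per level
print(benefit(t_norm=1.0e6, traffic=t, horizon=h))
```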

[Figure 4: total benefit (log scale, 1-1000) versus the policy used (sets 1-5), for p = 10%, 20% and 30%.]

Fig. 4 The total benefit for various r and p.

5. Related Work

The purpose of this section is to identify and present some of the related work in this area. Let us start with Ripeanu and Foster [1], who looked at the topological characteristics of Gnutella and found that, although the network did not follow a power-law connectivity distribution, it behaves like one (in the sense that it carries the benefits and drawbacks of such networks). Their analysis shows that any 2 nodes are, in 95% of the cases, at most 7 hops away; hence a TTL of 7 would achieve almost 100% horizon capability. They do not directly address the flooding problem that TTL creates, although they do acknowledge that almost 1.7% of all traffic over the US backbone results from P2P networks. Their work is useful for understanding the flooding problem that P2P overlays create.

Acosta and Chandra [2] argue that the topological characteristics of P2P networks affect the way searching is carried out. There is a tendency to connect to nodes depending on their topological proximity, and that has a great impact on the horizon of each node. They analysed the impact of various searching methods based on proximity and biased associative searches according to their flooding potential. Compared against their hybrid search algorithm, this revealed a significant swing in the ratio of all messages to replicated messages.

Feldman et al. [4] address the interesting problem of cooperation asymmetry in P2P networks, exposing the problems created by users' failure to cooperate. Lack of cooperation (nodes unwilling to share), asymmetry of interest (lack of mutual interest in one another's resources) and zero-cost identity (peers change identities often and without cost) impact the performance of the network as a whole. Many users, in an attempt to economize on their bandwidth usage, degrade the performance of the entire network. Drawing from game theory, the authors devised and tested a set of incentive techniques that support cooperative behaviour and improve overall performance. The incentives involve maintaining historical records of past transactions and using them to intelligently select the nodes to cooperate with. They do not, however, discuss the cost of such an operation (i.e. the cost of maintaining historical records), or the implementation details of the 'discriminating server', as they describe it.

Another issue of great importance that is also related to our work is that of Kalogeraki et al. [5], who address the issue of identifying the right information. Given the vast amount of resources and the potential for replication, it is important to deploy searching algorithms that enable the users to identify the right resource inexpensively. They acknowledge that high TTL values provide a way of identifying potential resources by brute force: a peer receives a message and is consequently forced to propagate it to all its connected peers, creating the flooding problem that we and our colleagues have also discussed. Their analysis and work are based on measuring the search results for various degrees of distribution probability. Putting it into perspective, rather than using the TTL to force the distribution of messages, they suggest that a message be propagated only to a percentage of a node's connected peers. Their paper identifies only slight variations in search potential versus flooding. Although it does not carry the problems of [4], it does not provide us with the optimum distribution figure.

Lv et al. [6] achieve a significant reduction in flooding by implementing what they call a multiple-walks algorithm. Simulations of the method show that it reduces flooding by 2 orders of magnitude, based on a statistical analysis similar to the one [5] proposes. In a nutshell, any query is initially sent with a small TTL; if the search is unsuccessful, it is performed again with an increased TTL, and it continues until it has yielded the appropriate results or the TTL has reached its maximum. The reduction of flooding in the network is obviously proportional to the results returned. Although the simulated examples show that the method is successful, it does not take into account the fact that the right resource is rapidly becoming difficult to discover, as reported by [5].

Chang and Liu [7] address the same problem of search optimization by applying the same technique as [6], where the TTL is not a standard or constant value but rather a variable that increases depending on the success rate of the search. In addition, one of the interesting conclusions of this paper is that when the probability distribution of the location of a resource is not known (as is the most common scenario), the best search strategies are random ones consisting of sequences of random variables. Chang and Liu argue that "successive TTL values are drawn from certain probability distributions rather than being deterministic values." In a similar manner to [6], the conclusions and results were observed in simulation, and there have been no attempts to apply these techniques to real P2P networks.

Sun and Garcia-Molina [8] directly address the same problem as we do in this paper, i.e. that of users lurking without contributing anything, or much, to the network. Their suggestion is to develop reputation databases that maintain statistics at the site of their potential user. The data kept concern only the neighbours of that user, and they are used to make intelligent decisions about whom to serve and where to forward query messages. The idea of reputation has indeed been widely addressed. The issue is not simply a technological overhead but also a statistical one: the dynamics of P2P networks change rapidly, making it unlikely that neighbours will remain constant, and a significant statistical sample would need to be observed before making an informed decision about someone's performance. Finally, as [4] observed, identities change at zero cost; hence associating a set of metadata with a peer's performance may not actually reflect the performance of the users (the real identities) behind that peer's IP address.

Although we should stress here that flooding has its merits, such as low latency, large coverage and high reliability, Jiang, Guo and Zhang [9] (along with almost everyone who works in the area) acknowledge that 70% of the messages within a P2P network are in fact redundant, and recognise that TTL values are responsible for this amount of message overhead. They propose a scheme that decreases flooding by about 60-70 percent by constructing a tree-structured sub-overlay within an existing P2P network. The tree is constructed from observed peers (i.e. peers identified during earlier queries). Query messages are still sent with a reduced TTL of 3-4, while at the same time they are also sent down the tree with the same TTL. The authors argue that the success rate they observed in simulation makes a case for the application of their work.

An interesting final point here is that, collectively, all authors agree that the TTL should not be left to be valued at the discretion of the users. To the best of our knowledge, no study has been done to assess the consequences that misbehaving users have on the scalability of the P2P community.
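As an aside, the expanding-TTL search described in [6] and [7] can be summarized in a few lines; this sketch assumes a hypothetical flood_query primitive that floods a query up to a given TTL and returns any hits, and the TTL schedule shown is our own illustrative choice.

```python
# Sketch of the expanding-TTL ("expanding ring") search of [6, 7]:
# retry the query with a growing TTL until results arrive or the
# maximum TTL is reached.

from typing import Callable, List

def expanding_ring_search(query: str,
                          flood_query: Callable[[str, int], List[str]],
                          ttl_schedule=(1, 2, 4, 7)) -> List[str]:
    """Issue `query` with successively larger TTLs; stop as soon as
    any results are returned, saving the traffic that a full
    maximum-TTL flood would have generated."""
    for ttl in ttl_schedule:
        results = flood_query(query, ttl)
        if results:
            return results
    return []
```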

6. Conclusion – Future work

We ran simulation experiments on Peer-to-Peer communities in which a sub-population of the participants behaves illegitimately with regard to the TTL value used in the queries they issue in the community. We assumed that a subset of the users has undertaken the task of protecting the system from the malicious ones. In the experiment we used mixed populations of illegitimate and policy-protector servents, and our preliminary results show that the existence of illegitimate servents is to the benefit of the community. Furthermore, we defined the Benefit in terms of the horizon and the caused traffic, and we studied the summation of the benefit over all levels of illegitimacy. The observation we made from the experiments is that, of all the different ratios of illegitimate peers per policy protector (r), the community benefits most when that ratio takes low values. We also found that there is a relationship between the number of malicious servents in the system and the Benefit they offer, and that the optimum number of misbehaving servents (with regard to the TTL value used in the queries they submit) that a Peer-to-Peer network should have in order to maximize its performance is around 5% of the total population. Our future plans are to simulate a bigger Peer-to-Peer network of 10,000 nodes and find out whether the rules we discovered for the small community of 200 nodes reflect the way real communities operate.

References

[1] Matei Ripeanu and Ian Foster, "Mapping the Gnutella Network: Macroscopic Properties of Large-Scale Peer-to-Peer Systems", 1st International Workshop on Peer-to-Peer Systems (IPTPS '02), 7-8 March 2002, MIT Faculty Club, Cambridge, MA, USA.
[2] William Acosta and Surendar Chandra, "Unstructured Peer-to-Peer Networks - Next Generation of Performance and Reliability", refereed poster, IEEE INFOCOM 2005.
[3] V. Cholvi, P. Felber, and E. Biersack, "Efficient search in unstructured peer-to-peer networks", European Transactions on Telecommunications: Special Issue on P2P Networking and P2P Services, 15(6):535-548, November 2004.
[4] M. Feldman, K. Lai, I. Stoica, and J. Chuang, "Robust Incentive Techniques for Peer-to-Peer Networks", in ACM Conference on Electronic Commerce (EC '04), 2004.
[5] V. Kalogeraki, D. Gunopulos, and D. Zeinalipour-Yazti, "A Local Search Mechanism for Peer-to-Peer Networks", Eleventh International Conference on Information and Knowledge Management (CIKM), McLean, VA, November 2002.
[6] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, "Search and replication in unstructured peer-to-peer networks", Proceedings of the 16th Annual ACM International Conference on Supercomputing (ICS), 2002.
[7] Nicholas Chang and Mingyan Liu, "Revisiting the TTL-based controlled flooding search: optimality and randomization", in Proc. of the Tenth Annual International Conference on Mobile Computing and Networking (MOBICOM), Sep 2004, pp. 85-99, Philadelphia, USA.
[8] Qixiang Sun and Hector Garcia-Molina, "SLIC: A Selfish Link-Based Incentive Mechanism for Unstructured Peer-to-Peer Networks", Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS '04), pp. 506-515.
[9] Song Jiang, Lei Guo, and Xiaodong Zhang, "LightFlood: an Efficient Flooding Scheme for File Search in Unstructured Peer-to-Peer Systems", 2003 International Conference on Parallel Processing (ICPP '03), p. 627, 2003.
[10] The Gnutella protocol specification v0.4: http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf