Two-Level SPF-Based System to Interconnect Partially Decentralized P2P File Sharing Networks
J. Lloret, F. Boronat, C. Palau, M. Esteve
Department of Communications, Polytechnic University of Valencia
[email protected],
[email protected],
[email protected],
[email protected]

Abstract

Partially decentralized P2P (PDP2P) networks are a subset of P2P networks in which nodes play different roles according to their functionality in the network. Nowadays, many PDP2P networks with different characteristics coexist, and a way to join these autonomous networks is needed. Since we have not found any existing interconnection system, in this article we present a new hierarchical system for interconnecting peers from different PDP2P networks. It allows data, content and resources to be shared between networks, and it can be employed between sensor networks, P2P networks and, in general, overlay networks. The oldest superpeers or brokers with the highest bandwidth in each PDP2P network belong to the higher layers. The topology changes as a function of the available processing capacity and the available number of connections with neighbor nodes. We also describe the interconnection system and its management. Simulation results for some topologies are shown.
1. Introduction

Peer-to-peer (P2P) systems are sets of distributed, autonomous nodes with equal roles or capabilities. One of the benefits of P2P networks is large-scale data sharing and content distribution. Peers in a P2P network collaborate directly with each other, sharing their resources, in order to obtain services, exchange information or jointly tackle large computing problems. Currently, many P2P networks coexist, each with different architectural features such as structure, degree of decentralization, routing algorithms, routing metrics, discovery techniques, search algorithms and load balancing. PDP2P networks are a subset of P2P networks in which the roles of the nodes differ according to their functionality; relatively powerful computers with large bandwidth act as superpeers or brokers. We consider two subclasses of PDP2P file sharing networks: (1) Hybrid
peer-to-peer model. This model uses a farm of servers (brokers); each broker maintains the indexes of its local peers and, in some cases, the indexes of some files from neighbor brokers. (2) Superpeer model. Superpeers perform query processing on behalf of their leaf nodes. In P2P file sharing networks, when a user searches for a file, that file may not be available in the P2P network where the user is searching. In this case, a system which allows searching and downloading from every file sharing network is needed. Some desktop file sharing applications support more than one P2P protocol and can join several P2P file sharing networks (e.g., giFT [1]), but these solutions require the user to be connected to all P2P file sharing networks simultaneously. Moreover, a computer running a desktop file sharing application developed to join all P2P file sharing networks needs a lot of processing capacity. On the other hand, if a new P2P file sharing network is developed, a new client tool is required to support it and all users have to update their desktop file sharing application to join the new one. In several communities, such as SETI@home [2], users donate volunteer computer time to research projects. In these cases, when a user wants to collaborate in several research projects, the user has to install all the corresponding applications, with their possible incompatibilities and the processing capacity demanded by several processes running in parallel. The interconnection system proposed has the following features: (1) Each PDP2P network works independently. (2) The interconnection system is scalable. (3) The content searching process does not depend on a particular desktop file sharing application. (4) Every peer should be able to download from, or share resources with, every PDP2P network. (5) The overload introduced by the interconnection system has to be kept under control. An interconnection system provides benefits such as a higher probability of finding the desired files, file replication and higher file availability. P2P desktop file sharing applications could search in and download from every PDP2P network
using a single service, and when a new PDP2P network is added to the system, existing desktop file sharing applications do not necessarily have to be changed. This paper is structured as follows. In section 2, the architecture design, the components, how the information is routed and the routing tables are presented. Joining or leaving the system and node failures are described in section 3. Finally, in section 4, we present the conclusions and possible future work.
2. Design

Our system uses superpeers or brokers (from now on, supernodes) from the PDP2P networks to interconnect them. When a peer in a PDP2P network sends a search to its PDP2P network and obtains no or few results, the search is forwarded to other PDP2P networks using the proposed system. From now on, we will not consider leaf peers, because the system works without them. Every supernode can send searches to its own PDP2P network or to other PDP2P networks; it acts as the representative of its leaf peers. We have divided the interconnection system into three layers: (1) Organization layer. It is composed of a set of supernodes from every PDP2P network and its topology is unstructured. This layer is used to organize the adjacencies between distribution layer supernodes. Every supernode in the distribution layer has adjacencies with supernodes from other PDP2P networks in a hub-and-spoke fashion. (2) Distribution layer. It is used to send searches and transfer data between different PDP2P networks. (3) Access layer. The rest of the supernodes and the peers of the PDP2P network belong to this layer. Every PDP2P network has these three layers. All nodes in the upper layers (organization and distribution layers) are superpeers or brokers in their PDP2P networks with additional features that allow the interconnections. The system is shown in figure 1. Let us consider n ∈ [2, 2^k] different types of PDP2P networks with a common type of sharing (e.g., files, resources, computing or mixed ones). We estimate that 2^32 PDP2P networks are enough for our purposes (so k = 32), but this value could be increased. The i-th PDP2P network (n_i) has M supernodes (M ≥ 3), plus its normal peers. The n_i network has three kinds of supernodes: supernodes in the organization layer (O_i), supernodes in the distribution layer (D_i) and supernodes in the access layer (A_i), where M = O_i + D_i + A_i. We define the δ parameter, which depends on the supernode's bandwidth (line speed) and its age in the system. Supernodes with higher bandwidth and greater age are preferred (in case of equal bandwidth, the oldest ones are preferred).
To avoid system overload, it is better to prefer higher-bandwidth supernodes over older ones. Moreover, hosts with high bandwidth tend to have long online times, whereas hosts with low bandwidth and short online times communicate with few other hosts [3]. When a new PDP2P network joins the system, a 32-bit network identifier (networkID) is assigned to it. A new PDP2P network needs at least three supernodes, one per layer. If there are more than three nodes, for every β supernodes in the access layer, the one with the highest δ parameter belongs to the distribution layer, and for every α supernodes in the distribution layer, the one with the highest δ belongs to the organization layer. An identifier indicating the node's age is assigned sequentially to each node by upper-layer nodes. A 32-bit node identifier (nodeID) is assigned to each new supernode at the access layer, ranging from 0 to 2^32 - 1. A nodeID is assigned to each new supernode at the distribution layer, ranging from 0 to ⌊(2^32 - 1)/β⌋, and to each new supernode at the organization layer, ranging from 0 to ⌊(2^32 - 1)/(α·β)⌋. Assuming m normal supernodes in the access layer of a PDP2P network, the number of supernodes in that PDP2P network is M (only limited by the PDP2P network characteristics):

M = m/(α·β) + m/β + m      (1)
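As an illustration of the layer assignment just described, the following Python sketch promotes, for every β access-layer supernodes, the one with the highest δ to the distribution layer, and, for every α distribution-layer supernodes, the one with the highest δ to the organization layer. The δ ordering shown (bandwidth first, age as tie-breaker) and the field names are our own assumptions for the sketch, not part of the original design.

    # Minimal sketch of the delta-based promotion rule (assumed field names).
    from dataclasses import dataclass

    @dataclass
    class Supernode:
        node_id: int
        bandwidth_kbps: int   # line speed
        age: int              # sequentially assigned; lower value = older node

    def delta_key(node):
        # Higher bandwidth preferred; in case of a tie, the oldest node wins.
        return (node.bandwidth_kbps, -node.age)

    def promote(nodes, group_size):
        """For every group of `group_size` nodes, pick the one with the highest delta."""
        promoted = []
        for i in range(0, len(nodes), group_size):
            group = nodes[i:i + group_size]
            if len(group) == group_size:
                promoted.append(max(group, key=delta_key))
        return promoted

    # Usage (illustrative): dnodes = promote(access_nodes, beta)
    #                       onodes = promote(dnodes, alpha)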
To provide realistic values for α and β, we have considered some real measurements. (1) In a hybrid network, a broker/server supports more than 8,192 users for all the models measured in [4] (except the full replication architecture). Taking into account the overall performance versus the number of servers, we make the following assumptions: (a) all brokers/servers in hybrid P2P networks can belong to the distribution layer, so β = 1; (b) one supernode in the organization layer for every 64 supernodes in the distribution layer is enough to organize the adjacencies between supernodes from different networks, so α = 64. (2) In a superpeer network, a study reveals that most superpeers today support either 30 or 75 leaf nodes [5], and the number of neighbor superpeers ranges from 6 to 32; all these values are limited by the desktop P2P application. To estimate how many superpeers are involved in a search with a TTL of 6 (some PDP2P networks use a TTL of 7; we use the seventh hop to send the search to other PDP2P networks), we have taken a mean value of 24 neighbor superpeers per hop. Hence, one supernode in the distribution layer is needed for every 96 supernodes (β = 96), and, to organize the distribution layer, one supernode in the organization layer is needed for every 96 supernodes (α = 96).
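As a quick sanity check of equation (1) with these parameter choices, the short sketch below evaluates M for both network classes; the values of m used here are arbitrary and only illustrative.

    # Evaluate equation (1): M = m/(alpha*beta) + m/beta + m
    def total_supernodes(m, alpha, beta):
        o_nodes = m // (alpha * beta)   # organization layer
        d_nodes = m // beta             # distribution layer
        return o_nodes + d_nodes + m    # plus the m access-layer supernodes

    # Hybrid model: beta = 1, alpha = 64
    print(total_supernodes(6400, alpha=64, beta=1))     # 100 + 6400 + 6400 = 12900
    # Superpeer model: beta = 96, alpha = 96
    print(total_supernodes(921600, alpha=96, beta=96))  # 100 + 9600 + 921600 = 931300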
Figure 1: System organization (organization, distribution and access layers)
2.1. System components and roles

When a supernode joins the proposed system, it must initially provide the following information: (1) the identifier of its own PDP2P network; (2) the maximum number of adjacencies supported from supernodes of other PDP2P networks; (3) the maximum number of adjacencies supported from supernodes in its own PDP2P network; (4) the maximum load it can support; (5) the maximum number of simultaneous connections supported from peers of its own PDP2P network. The system could be improved by adding more information (e.g., in the case of PDP2P file sharing networks, the kinds of files available in the PDP2P network, the file downloading system, the download queue system, etc.). All supernodes at the organization layer (from now on, Onodes) are known by the distribution layer supernodes (from now on, Dnodes). Every new Onode must be authenticated by at least one Onode in its PDP2P network, and every new Dnode must be authenticated by one Onode. Dnodes use these connections to request adjacencies with Dnodes from other PDP2P networks.
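A compact way to picture the information exchanged at join time is a record with one field per item in the list above. The sketch below is only illustrative; the field names and the optional file-sharing extras are our own naming, not defined by the system specification.

    # Hypothetical join record carrying the five mandatory items (and optional extras).
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class JoinInfo:
        network_id: int              # (1) identifier of its own PDP2P network
        max_adj_other_networks: int  # (2) adjacencies from other networks' supernodes
        max_adj_own_network: int     # (3) adjacencies from its own network's supernodes
        max_load: float              # (4) maximum load supported
        max_leaf_connections: int    # (5) simultaneous connections from its own peers
        # Optional extras for file sharing networks:
        file_types: Optional[list] = None
        download_system: Optional[str] = None
        queue_system: Optional[str] = None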
2.2. Routing

The topology of the nodes differs according to the layer where they are placed. An example of topology is shown in figure 2. Onodes have adjacencies with Onodes in their own PDP2P network and with Onodes from other PDP2P networks (black dotted lines). Dnodes have adjacencies with an Onode in their PDP2P network (red dotted lines) and with the selected Dnodes from other PDP2P networks (solid black lines; only the adjacencies of D1 and D2 are shown). There are also connections from supernodes to their leaf peers (solid red lines).

2.2.1. Organization layer. This is the layer with the fewest supernodes. Our measurements indicate that around 100 Onodes are needed for every 10^6 supernodes in superpeer topologies, and only 9 Onodes are needed for every 600 servers/brokers in hybrid topologies. Onodes help to establish adjacencies between Dnodes from different PDP2P networks. Because there are few Onodes in the organization layer of every PDP2P network and currently there are few public domain P2P file sharing networks (around 50), we have chosen SPF as the routing algorithm, to obtain fast routing and searches when providing Dnode adjacencies. Experiments reported in [4] show that a database holding 10^4 external updates from other supernodes consumes 640 Kbytes of memory. SPF suffers when nodes leave or fail, because the topological database has to be updated [6]. Although we have chosen the most stable supernodes of the PDP2P network, given by the δ parameter, to be in the organization layer, we have divided the organization layer into two levels to avoid rerunning SPF on supernodes from all PDP2P networks: (1) Level-1 Onodes must know the existence of all Onodes in their PDP2P network. (2) Level-2 Onodes must know the existence of all Onodes in their PDP2P network and of all Level-2 Onodes from other PDP2P networks. Every Level-1 Onode that receives queries from one of its Dnodes forwards them to a Level-2 Onode, which routes each query to the Level-2 Onodes of the other PDP2P networks. Figure 3 shows an example. We consider four kinds of tables: (1) Onodes' neighbor table (table 1): every Onode has its own neighbor table formed by all adjacent Onodes; it is used by both Level-1 and Level-2 Onodes. (2) Onodes' network table (table 2): it is used to interconnect all Onodes in the same PDP2P network and is used by both Onode levels. (3) Dnodes' network table (table 3): it is used by Level-1 Onodes to forward information from the Dnodes that use that Level-1 Onode as a gateway. (4) Level-2 Onodes' distribution table (table 4): it is used only by Level-2 Onodes to interconnect all PDP2P networks.

2.2.2. Level-1 Onodes. Onodes try to become adjacent with at least one Onode in their PDP2P network. A new Onode can establish its first connection with any previously known Onode or by bootstrapping. The information gathered from adjacent Onodes is not a complete Onodes' network table; instead, Onodes exchange the changes in their network tables. In other words, when an Onode advertises its information to another Onode, the latter replies with whatever is lacking in its own network table. This process allows Onodes to share routing information with adjacent Onodes and build their network databases. Then, each Onode independently runs the SPF algorithm over the PDP2P network database to find the best routes to a destination. The SPF algorithm adds up the cost (a value based on the hops to the destination), the available number of simultaneously supported adjacencies to other Onodes and their available load. Finally, the Onode chooses the lowest-cost path and adds this entry to its network table.
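The following sketch illustrates how an Onode could run SPF over its network database using the composite cost just described (hop-based cost, penalized when a neighbor has few free adjacencies or little spare load). The exact weighting is not specified here, so the cost formula below is only an assumption for illustration.

    # Minimal SPF (Dijkstra) sketch over the Onodes' network database.
    # The edge weighting below is an assumption, not the system's exact formula.
    import heapq

    def edge_cost(hop_cost, free_adjacencies, free_load):
        # Penalize neighbors with few free adjacencies or little spare load.
        return hop_cost + 1.0 / max(free_adjacencies, 1) + 1.0 / max(free_load, 1)

    def spf(graph, source):
        """graph: {onode: [(neighbor, hop_cost, free_adjacencies, free_load), ...]}
        Returns the lowest total cost from `source` to every reachable Onode."""
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue
            for v, hops, adj, load in graph.get(u, []):
                nd = d + edge_cost(hops, adj, load)
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist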
Figure 2: Topology example (adjacencies among Onodes, Dnodes and leaf peers in networks A-D)

Figure 3: Levels in the organization layer (Level-1 and Level-2 Onodes in networks A-D, Dnodes D1-D7)

Table 1: Onodes' neighbour table
1st: IP of the neighbour Onode. 2nd: Keepalive Interval and Dead Interval for that Onode. 3rd: Level of the adjacent Onode.

Table 2: Onodes' network table
1st: IP of the Onode in its PDP2P network. 2nd: IP of the adjacent Onode needed to reach that Onode. 3rd: Cost to reach that Onode. 4th: Level of that Onode.

Table 3: Dnodes' network table
1st: IP of the Level-1 Onode. 2nd: Network identifier. 3rd: Keepalive Interval and Dead Interval of every Dnode. 4th: Available number of connections to Dnodes from other PDP2P networks. 5th: Dnode's available load. 6th: Dnode's available number of connections for its leaf peers.

Table 4: Level-2 Onodes' distribution table
1st: IP of the Onode from the other PDP2P network. 2nd: Network identifier. 3rd: IP of the adjacent Onode needed to reach that Onode. 4th: Keepalive Interval and Dead Interval for that Onode. 5th: Cost to reach that Onode. 6th: Types of files that can be searched in that PDP2P network.
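To make the table contents concrete, the sketch below models one entry of the Onodes' network table (table 2) and one entry of the Level-2 Onodes' distribution table (table 4) as plain records; the field names are our own shorthand for the columns listed above.

    # Illustrative record types mirroring tables 2 and 4 (field names are ours).
    from dataclasses import dataclass

    @dataclass
    class OnodeNetworkEntry:          # Table 2: Onodes' network table
        onode_ip: str                 # IP of the Onode in its PDP2P network
        next_hop_ip: str              # adjacent Onode needed to reach it
        cost: float                   # cost to reach that Onode
        level: int                    # 1 or 2

    @dataclass
    class Level2DistributionEntry:    # Table 4: Level-2 Onodes' distribution table
        onode_ip: str                 # Onode from the other PDP2P network
        network_id: int
        next_hop_ip: str
        keepalive_s: int
        dead_interval_s: int
        cost: float
        file_types: list              # types of files searchable in that network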
We have simulated an interconnection of 30 Onodes with random topologies, one of them shown in figure 4 (37 connections, diameter = 8). We selected 5 nodes to become new Onodes in order to measure the number of queries generated in the PDP2P network. Results are shown in figure 5; measurements were taken in time slots. The worst convergence time occurs when the new Onode is at the edge of the topology, but this also implies a lower number of simultaneous queries through the system. It is better to limit the number of connections to 4 for fast convergence: more connections increase the number of simultaneous queries in the system, while fewer connections imply a longer time to converge.
2.2.3. Level-2 Onodes. Using values taken from the IS-IS protocol [7], we have decided that one Level-2 Onode is needed for every 50 Level-1 Onodes to maintain the performance of the organization layer. Level-2 Onodes have connections with the Level-1 Onodes in their PDP2P network and are designated by the Level-1 Onodes as a function of the δ parameter. A new Level-2 Onode must be authenticated by one or several Level-2 Onodes in its PDP2P network (using the Onodes' network table) and from other PDP2P networks. These adjacencies are used to forward Dnode requests between PDP2P networks. To simulate our system, we have chosen a random topology with 56 Level-2 Onodes and 27 interconnected PDP2P networks. Figure 6 thus represents an interconnection of 2856 supernodes in the whole organization layer. In the case of hybrid P2P networks, this corresponds to 182784 interconnected brokers/servers; in the case of superpeer P2P networks, to 26320896 interconnected superpeers. The topology has 80 connections and a diameter of 12. Five nodes in the topology were elected to be new Level-2 Onodes in the system; they were chosen with different numbers of connections in order to measure the number of queries generated in the network. Results are shown in figure 7. The maximum number of replies occurs between time slots 6 and 12, but the routing performance differs when the overlay service network takes a different topology [8]. We have selected four networks (1, 8, 12 and 13 in figure 6) to measure the number of queries generated through the system per PDP2P network reached in every time slot. Figure 8 shows that PDP2P networks with a higher number of Level-2 Onodes with connections to other PDP2P networks provide higher performance to the network.
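As a rough check of the sizes quoted above, the snippet below reproduces them from the layer ratios (one Level-2 Onode per 50 Level-1 Onodes, plus the α and β values of section 2); it is only an arithmetic illustration, not part of the simulation.

    # Arithmetic check of the topology-2 sizes (illustrative only).
    level2 = 56
    level1_per_level2 = 50
    onodes = level2 * (1 + level1_per_level2)   # 2856 Onodes in the organization layer

    dnodes_hybrid = onodes * 64                 # alpha = 64
    brokers_hybrid = dnodes_hybrid * 1          # beta = 1 -> 182784 brokers/servers

    dnodes_superpeer = onodes * 96              # alpha = 96
    superpeers = dnodes_superpeer * 96          # beta = 96 -> 26320896 superpeers

    print(onodes, brokers_hybrid, superpeers)   # 2856 182784 26320896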
2.2.4. Distribution layer. Dnodes forward data between the peers of the PDP2P networks. Dnodes directly establish adjacencies with Dnodes from other PDP2P networks in a hub-and-spoke fashion. Every new Dnode sends a query to its Level-1 Onode to request those adjacencies. The Level-1 Onode forwards the query to a Level-2 Onode in order to reach the other PDP2P networks, and that Level-2 Onode selects a Level-2 Onode from every PDP2P network and forwards the query to them. The Level-2 Onodes of the other PDP2P networks send the query to the Level-1 Onodes in their PDP2P network, which select their best Dnode based on its available adjacencies and available load and return this entry to their Level-2 Onode. Each Level-2 Onode selects the best 2 entries and forwards them to the Level-2 Onode of the originating PDP2P network, which forwards this information to the Level-1 Onode, which in turn sends the selected Dnodes to the requesting Dnode so that it can establish adjacencies. The requesting Dnode thus obtains two candidate Dnodes from every PDP2P network, and with this information it tries to establish an adjacency with one Dnode from every PDP2P network. When Dnodes establish adjacencies, they exchange information, which is added to the Dnodes' distribution table, shown in table 5. Several Dnodes of a PDP2P network may have adjacencies with the same Dnode of another PDP2P network. When an Onode leaves or fails, the Dnode with the highest δ parameter is elected to become an Onode. The total time needed for a Dnode to converge (i.e., to obtain a table with a Dnode from every PDP2P network) is (4 + 2·d1 + 2·d2 + 2·d3)·tp, where d1 is the number of hops from the Level-1 to the Level-2 Onode in the first PDP2P network, d2 is the number of hops from that Level-2 Onode to the Level-2 Onode of the last PDP2P network, d3 is the number of hops from the Level-2 to the Level-1 Onode of the last PDP2P network, and tp is the average propagation time. As measured in [9], the PDP2P file sharing network with the greatest number of peers has about 4 million peers. Considering 56 public domain PDP2P file sharing networks with 4 million peers each, every PDP2P file sharing network needs over 2000 Dnodes, each with around 2000 connections to peers in its PDP2P network (less than the worst case for the maximum number of users supported in the different types of architectures [4]). The number of adjacencies between Dnodes is related to the number of Dnodes in the whole distribution layer. To model the distribution layer, we suppose n PDP2P networks, each with k Dnodes in its distribution layer. Every Dnode has a distribution table with n entries, and the number of adjacencies in the system is k·n·(n-1)/2. Figure 9 shows the number of adjacencies when 30, 40 and 50 PDP2P networks are interconnected. As measured in [4], there are around 0.000833 searches per second, which implies one search every 1200 seconds in a population of 1350 users. We can scale these values to our system: considering 4 million users in a PDP2P network, a quarter of them send searches to other PDP2P networks.
The system then has to handle around 6 searches per second per PDP2P network, so the interconnection system is feasible. Search results are ranked as a function of the ∂ parameter, which depends on the number of users holding the file, the peers' bandwidth, the size of the file to download and the download queue system of the PDP2P network; it is used to select the best file to download.
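The two quantities used in this analysis, the Dnode convergence time and the number of Dnode adjacencies, are easy to compute directly; the sketch below does so for illustrative values of the hop counts and of n and k (the numbers chosen here are examples, not measurements from this work).

    # Dnode convergence time and distribution-layer adjacency count (illustrative values).
    def dnode_convergence_time(d1, d2, d3, tp):
        # (4 + 2*d1 + 2*d2 + 2*d3) * tp, as defined in section 2.2.4
        return (4 + 2 * d1 + 2 * d2 + 2 * d3) * tp

    def dnode_adjacencies(n, k):
        # n PDP2P networks, k Dnodes per network: k*n*(n-1)/2 adjacencies in the system
        return k * n * (n - 1) // 2

    print(dnode_convergence_time(d1=2, d2=4, d3=2, tp=0.05))  # e.g. 1.0 second
    print(dnode_adjacencies(n=50, k=1000))                    # 1225000 adjacencies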
3. Joining or leaving the system and node failures

When a supernode joins, fails or leaves the interconnection system, its behavior differs according to the layer where it is placed. Level-1 and Level-2 Onodes are the stable supernodes of the interconnection system, but when an Onode fails or leaves the architecture, a query is sent to the distribution layer to obtain a substitute; a Dnode is then elected, as a function of the δ parameter, to become the new Onode. Because the organization layer uses the SPF algorithm, Onode joinings or leavings are detected by adjacent Onodes. Every time an Onode sends a query to an adjacent Onode and gets back a reply, the counter of both Onodes is reset to 0. If no search query has been sent to an adjacent Onode for ten minutes, the Onode sends it a keepalive message to verify that it is still alive. If there is no reply within 60 seconds, the failure or disconnection is confirmed. When there is a failure or disconnection, the SPF algorithm must run again, and the change is distributed over the Onode-level network until the system has converged. When the old Onode returns to the system, it becomes a Dnode. When a new node joins the distribution layer as a Dnode, it must request adjacent Dnodes from other PDP2P networks. Dnode adjacencies can be refused by remote Dnodes when they have reached the maximum number of allowed adjacencies; in this case, the connection establishment is redirected to the successor Dnode. Every time a Dnode sends a search query to another Dnode and receives a reply, the counter of both Dnodes is reset to 0. If no search query has been sent to a Dnode for ten minutes, a keepalive message is sent to that adjacent Dnode to verify that it is still alive. If there is no reply within 60 seconds, due to an exit or a failure, the Dnode is substituted by the backup Dnode recorded in the distribution table. If the backup Dnode does not respond either, a query is sent to the organization layer in order to obtain a new adjacent Dnode for that PDP2P network.
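A compact way to express this liveness check is a small timer per adjacency: any answered query resets it, a keepalive is sent after ten minutes of silence, and the neighbor is declared dead if it does not answer within 60 seconds. The sketch below captures only that logic; the class and method names are our own.

    # Sketch of the per-adjacency liveness logic described above (names are ours).
    import time

    KEEPALIVE_INTERVAL = 600   # send a keepalive after 10 minutes without traffic
    DEAD_INTERVAL = 60         # declare the neighbor dead 60 s after an unanswered keepalive

    class Adjacency:
        def __init__(self, neighbor_ip):
            self.neighbor_ip = neighbor_ip
            self.last_reply = time.time()
            self.keepalive_sent_at = None

        def on_reply(self):
            # Any answered query or keepalive resets the counter.
            self.last_reply = time.time()
            self.keepalive_sent_at = None

        def tick(self, send_keepalive):
            now = time.time()
            if self.keepalive_sent_at is not None:
                if now - self.keepalive_sent_at > DEAD_INTERVAL:
                    return "dead"      # trigger backup Dnode or SPF re-run
            elif now - self.last_reply > KEEPALIVE_INTERVAL:
                send_keepalive(self.neighbor_ip)
                self.keepalive_sent_at = now
            return "alive"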
Figure 4: Random topology-1

Figure 5: Queries for topology-1 (queries per time slot for 1 to 5 connections versus propagation time)

Figure 6: Random topology-2 with 56 Level-2 Onodes in 27 networks

Figure 7: Time of convergence for Level-2 Onodes in topology-2 (queries per time slot for 1 to 5 connections versus propagation time)

Figure 8: Queries generated per network reached in topology-2 (networks 1, 8, 12 and 13 versus propagation time)

Figure 9: Number of adjacencies between Dnodes (n = 30, 40 and 50 PDP2P networks)

Table 5: Dnodes' distribution table
1st: IP of the Dnode from the other PDP2P network. 2nd: Keepalive Interval and Dead Interval of that Dnode. 3rd: Network identifier for that Dnode. 4th: Available number of adjacencies for that Dnode. 5th: Available load that Dnode can carry. 6th: Types of files that can be searched in its PDP2P network. 7th: File downloading system of its PDP2P network. 8th: Download queue system of its PDP2P network. 9th: Backup Dnode for that Dnode.
4. Conclusions

An interconnection system that allows searching and downloading from every PDP2P network connected to it has been proposed. It could also be used to reach more nodes inside a single PDP2P network. It is based on three layers that distribute the organization, data transfer and search tasks. All supernodes have two roles: a role in their PDP2P network and a role in the interconnection system. This design allows changing node adjacencies based on the available adjacencies and load of the other Onodes or Dnodes. We have chosen the SPF algorithm to reduce the latency when requesting new Dnodes after Dnode failures or leavings. The simulations shown in this paper demonstrate that interconnecting PDP2P networks is feasible. System bandwidth overload could be avoided by using load balancing between Dnodes. The system is currently being developed for some existing public domain PDP2P file sharing networks. As future work, we will carry out experiments to adjust the δ and ∂ parameters, and the load of supernodes will be tested so that they can simultaneously belong to the organization layer and the distribution layer in addition to their role in their PDP2P network.

5. References
[1] The giFT Project, at http://gift.sourceforge.net/
[2] E. Korpela, D. Werthimer, D. Anderson, J. Cobb, M. Leboisky, "SETI@home: massively distributed computing for SETI". Computing in Science & Engineering, Volume 3, Issue 1, pp. 78-83. January 2001.
[3] Subhabrata Sen and Jia Wang, "Analyzing peer-to-peer traffic across large networks". IEEE/ACM Transactions on Networking, Volume 12, Issue 2, pp. 219-232. April 2004.
[4] B. Yang and H. Garcia-Molina, "Comparing hybrid peer-to-peer systems". Proceedings of the 27th International Conference on Very Large Databases. September 2001.
[5] B. T. Loo, R. Huebsch, I. Stoica, and J. Hellerstein, "The case for a hybrid P2P search infrastructure". Proc. 3rd International Workshop on Peer-to-Peer Systems. February 2004.
[6] Anindya Basu and Jon Riecke, "Stability Issues in OSPF Routing". Proc. of ACM SIGCOMM 2001. August 2001.
[7] C. Alaettinoglu, V. Jacobson, and H. Yu, "Towards Millisecond IGP Convergence". NANOG 20, October 2000.
[8] Zhi Li and Prasant Mohapatra, "The Impact of Topology on Overlay Routing Service". Proc. 23rd Annual Conference of the IEEE Communications Society. March 2004.
[9] J. Lloret, B. Molina, C. Palau and M. Esteve, "Public Peer-To-Peer Filesharing Networks Evaluation". The 2nd IASTED International Conference on Communication and Computer Networks. November 2004.