IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,
VOL. 19,
NO. 8,
AUGUST 2008
A Tree-Based Peer-to-Peer Network with Quality Guarantees

Hung-Chang Hsiao and Chih-Peng He

Abstract—Peer-to-peer (P2P) networks often demand scalability, low communication latency among nodes, and low systemwide overhead. For scalability, a node maintains partial states of a P2P network and connects to a few nodes. For fast communication, a P2P network intends to reduce the communication latency between any two nodes as much as possible. With regard to a low systemwide overhead, a P2P network minimizes its traffic in maintaining its performance efficiency and functional correctness. In this paper, we present a novel tree-based P2P network with low communication delay and low systemwide overhead. The merits of our tree-based network include 1) a tree-shaped P2P network, which guarantees that the degree of a node is constant in probability, regardless of the system size (the network diameter in our tree-based network increases logarithmically with an increase in the system size, and in particular, given a physical network with a power-law latency expansion property, we show that the diameter of our tree network is constant), and 2) provable performance guarantees. We evaluate our proposal by a rigorous performance analysis, and we validate this by extensive simulations.

Index Terms—Peer-to-peer systems, tree-based networks, multicast, performance analysis.
1 INTRODUCTION

Peer-to-peer (P2P) networks (or overlays) have recently become an active research area. Applications over P2P networks include information retrieval, content distribution, and processor cycle sharing. These applications often demand that their underlying P2P network infrastructures be scalable and have low diameter and overhead. For example, an Internet-scale file sharing system, namely, Oceanstore [1], is designed and deployed on top of the P2P network Tapestry [2]. Tapestry is scalable in that each node participates in the network by using $O(\log X)$ connections. Its overlay diameter is $O(\log X)$, where $X$ is the total number of nodes in the system.

In this study, we concentrate on addressing the abovementioned fundamental requirements, that is, scalability, low diameter, and low overhead, for the overlay network infrastructures. By scalability, we mean that each node only has partial knowledge regarding the entire network structure. This implies that each node in the network maintains very few overlay links. For diameter, consider the shortest routing path $v_1, v_2, \ldots, v_n$ of any message in an overlay network $G = (V, E)$, where $v_1, v_2, \ldots, v_n \in V$ are distinct. The diameter of $G$ is the maximal path length $n$ among all such paths in $G$. An overlay with a low diameter is desirable, since a route between any two nodes visits fewer intermediates and is thus less sensitive to faults of these intermediates [3]. In contrast to the diameter, the "weighted" diameter is the maximal weighted path length $\sum_{i=1}^{n-1} c_{v_i v_{i+1}}$ among all possible shortest paths, where the edge cost (that is, the delay in this study) of two adjacent nodes $v_i$ and $v_j$ on the path is denoted by $c_{v_i v_j}$. Notably, we explicitly differentiate between the "diameter" and the "weighted diameter" in this paper.

By a low overhead, we mean that an overlay has a low systemwide operational traffic in maintaining the performance efficiency and functional correctness of the overlay. More precisely, we estimate the overhead of an overlay as $\sum_{e \in E} c_e f$, where $c_e$ is the delay of sending a control message through the overlay link $e$, and $f$ is a predefined maximum bandwidth required for sending a control message (see footnote 1). We assume that the control messages used to construct and maintain an overlay have the same message length.

In this work, we are particularly interested in studying tree-based overlay networks. We aim at designing a scalable tree-based overlay with low (weighted) diameter and overhead. Tree-based overlays are often the core infrastructures adopted by P2P applications that demand collective communication services (for example, message multicasting and reduction [5]). For example, consider a tree-based live media multicasting system in which a root peer in a tree-shaped overlay acts as a source that stores a complete media stream and offers the stream to nonroot peers. Meanwhile, each nonroot peer downloads the stream from its upstream peer and relays the downloaded content to its downstream peers, if any.

The authors are with the Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan 701, Taiwan, R.O.C. E-mail: [email protected].
Manuscript received 15 June 2007; revised 21 Sept. 2007; accepted 16 Oct. 2007; published online 24 Oct. 2007. Recommended for acceptance by K. Hwang.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-2007-06-0193.
Digital Object Identifier no. 10.1109/TPDS.2007.70798.
1045-9219/08/$25.00 © 2008 IEEE
1.1 Previous Studies

Perhaps the studies most relevant to our work are [6], [7], [8], [9], [10], and [11], considering the P2P setting. The earlier work relies on tree-shaped overlays to facilitate streaming media contents. Chu et al. [6] suggest constructing a mesh overlay network. Given a mesh network, a tree subnetwork, Narada, with good-quality links is then built and maintained. Narada implements a shortest path spanning tree algorithm and thus intends to minimize the weighted diameter and the overall traffic generated. The shortest path spanning tree, however, may not guarantee the network diameter to be minimal. In addition, Narada targets small-scale environments, in which it acquires global knowledge to construct its tree network.

Banerjee et al. [7] designed a tree overlay network, NICE, in which a node takes $O(\log X)$ connections to join the tree. In NICE, geographically nearby nodes form a cluster, and the cluster is the basic building block for the tree. NICE guarantees the diameter of the tree to be $2 \cdot O(\log X)$. It is, however, unclear whether NICE has the minimal overhead, although the tree network exploiting the physical network topology operates toward the minimization of the overhead. It is also unclear whether the weighted diameter of NICE has a guaranteed bound.

Tran et al. [8] present the construction of a tree network, namely, ZIGZAG, similar to that in NICE, and offer an in-depth performance analysis for ZIGZAG. Overall, if the size of a cluster is in $[k, 3k]$, ZIGZAG guarantees that each node takes $O(k^2)$ connections to join the tree and that the diameter of the tree is $2 \cdot O(\log_k X)$. However, Tran et al. do not investigate the bounds for the weighted diameter and systemwide overhead.

In contrast to [6], [7], and [8], Hefeeda et al. [9] present a tree-based overlay network that exploits the physical Internet topology. Liao et al. [10] present how one can utilize the tree links that are used to bridge different tree overlays. Both studies aim at minimizing the delay of receiving a message for any nonroot peer. Clearly, such a design principle works toward the minimization of the weighted diameter and the systemwide overhead. However, the designs presented in [9] and [10] have no performance guarantees.

Footnote 1: Li et al. in [4] suggest that the overhead required for constructing and maintaining an overlay is parameterized by the bandwidth metric.
Published by the IEEE Computer Society
Chunkyspread [11] shows that tree-based overlays are viable solutions to live media broadcasting in the face of peers joining, departure, and failure. In Chunkyspread, participating nodes balance their loads in streaming data. A node is forced to connect to a new parent node if its present parent is overloaded. If the load of a node is under a targeted lower bound, the node may accommodate more children nodes. In Chunkyspread, instead of optimizing the network diameter, nodes minimize the latency of receiving media contents sent by the root. England et al. [3] investigate the design trade-off of the data loss rate and performance-oriented metrics (for example, the delay from the source to a destination) for tree-based overlay networks. They present a tree-based overlay to achieve a desirable trade-off. Banik et al. [12] propose a tree-based network to satisfy the given constraints of the delay bound and the delay variation bound from the source to any destination. Structured overlay networks are general-purpose communication infrastructures for P2P applications like file sharing, multicasting, information retrieval, and processor cycle sharing. Structured P2P systems, for example, Chord [13], Pastry [14], and Tapestry [2], which are all based on distributed hash tables (DHTs), may include tree structures into their designs [15], [16], [17]. A possible tree structure embedded in a DHT overlay, say, Chord, is
formed by having an internal tree node with hash ID $x$ pick its fingers with IDs in $[x, y)$ as its child nodes in the tree, where the nodes with IDs $x$ and $y$ are immediate siblings in the tree [18], [19]. Clearly, such a tree network can serve as a collective communication substrate, where each node in the tree network maintains $O(\log X)$ child nodes, and the overall diameter of the tree is $2 \cdot O(\log X)$. However, the performance of the tree network embedded in a DHT may not be optimal in terms of the weighted diameter and overhead. For example, Bharambe et al. [20] conclude that the trees embedded in a DHT network may not have a low weighted diameter and a low overhead due to the deterministic structure of a DHT network and the mismatch between the ID space and the physical network topology. Instead of relying on trees embedded in a general-purpose DHT network, in this paper, we are interested in designing a tree-shaped overlay to provide collective communication. In particular, we intend to design a tree network with good performance guarantees.
1.2 Our Idea and Contribution

In this study, we present a scalable tree network $\mathbb{T} = (V, E)$ with low (weighted) diameter and overhead. To build our tree network $\mathbb{T}$, we first design a tree-based overlay $T$ with a low diameter. We denote $T$ as $T^1$, and $T^2$ is formed by structuring $d$ disjoint $T^1$ trees, where $d$ is a given positive integer. In general, $T^{k-1}$ consists of $d$ $T^{k-2}$ trees. In our proposal, the height $H(T^i)$ of a tree $T^i$ ($1 \le i \le k = \log_d X$) is guaranteed to have a bound of $O(\ln d)$ if we treat each subtree $T^{i-1}$ as a single node in $T^i$. Let $\mathbb{T} = T^k$. With the recurrence equation [21], this results in the height of $\mathbb{T}$ being $H(\mathbb{T}) = H(T^k) = H(T^{k-1}) + O(\ln d) = O(\ln d \cdot \log_d X) = O(\ln X)$ and, thus, the diameter of $\mathbb{T}$ being $2 \cdot O(\ln X)$. Since, in $T^i$, our design allows each $T^{i-1}$ to freely pick a geographically nearby node as its parent, this flexibility in picking a parent node reduces the weighted diameter of the resulting $\mathbb{T}$ to a constant.

We summarize our major contributions as follows:

1. We propose a decentralized algorithm that constructs and maintains a tree network with low (weighted) diameter and overhead. To the best of our knowledge, our design is the first attempt to address these design issues simultaneously.
2. Our tree-shaped overlay has provable performance guarantees and is efficient in that, with a constant probability, the degree of each node in the network is constant. The expected diameter of our tree network is $O(\ln X)$. Given a physical network with the power-law latency expansion [22], the expected weighted diameter is $O(c_{\max})$, where $c_{\max}$ denotes the maximal delay between any two peers in the physical network.
3. We offer a thorough and rigorous theoretical analysis for our tree-shaped overlay protocol. Our analytical results have tight performance bounds. We also validate our analytical results in simulations.
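The height bound quoted in Section 1.2 follows from unrolling the recurrence level by level. The short derivation below is only an illustrative expansion of that step, assuming the base tree $T^1$ also has height $O(\ln d)$, as the text states for every level:

```latex
\begin{align*}
H(T^{k}) &= H(T^{k-1}) + O(\ln d) \\
         &= H(T^{k-2}) + 2\,O(\ln d) \\
         &\;\;\vdots \\
         &= k \cdot O(\ln d), \qquad k = \log_d X = \frac{\ln X}{\ln d}, \\
         &= O\!\left(\ln d \cdot \frac{\ln X}{\ln d}\right) = O(\ln X).
\end{align*}
```

The factor $\ln d$ cancels, which is why the height does not depend on the choice of $d$.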
1.3 Roadmap

The remainder of this paper is organized as follows: Section 2 gives the definitions, notations, and assumptions. The design of our constant-degree low-diameter tree-shaped overlay is described in Section 3. We discuss how our overlay exploits the underlay network locality such that the resulting weighted diameter becomes constant in Section 4. Section 5 presents the performance analysis for our tree overlay protocol. We also perform a simulation study, and the simulation results are given in Section 6. We summarize our study in Section 7, with possible future research directions.

TABLE 1 Notations Frequently Used
2 DEFINITIONS, NOTATIONS, AND ASSUMPTIONS
We model a P2P network as an undirected graph $G = (V, E)$, where $V$ includes the nodes participating in the system, and $E$ represents the overlay links among the nodes. An overlay edge $e \in E$ between two nodes $u$ and $v$ in $V$ is denoted by $e = uv$. In this paper, the delay of an edge $uv$ in an overlay network is denoted by $c_{uv}$. We assume in this study that nodes in $V$ may come and go. Some terminologies that are frequently used are defined as follows:

Definition 1. The simple path (or path) from $u \in V$ to $v \in V$ ($u \ne v$), denoted by $u \to v$, is a connected subgraph $(V', E') \subseteq G$ such that the cardinality $|V'| = |E'| + 1$, where $V' \subseteq V$, and $E' \subseteq E$. We let the path length $|u \to v| = |V'|$.

Definition 2. The shortest path length from $u \in V$ to $v \in V$ ($u \ne v$), denoted by $l_{u,v}$, is $l_{u,v} = \min\{|u \to v| : \forall\, u \to v \subseteq G\}$. The shortest path length is the length of the shortest simple path from $u$ to $v$.

Definition 3. The diameter of a graph $G$, denoted by $D_G$, is the maximal shortest path length $|u \to v|$ from any node $u \in V$ to any $v \in V$. That is, $D_G = \max\{l_{u,v} \mid \forall\, u \ne v \in V\}$.

Definition 4. The degree of a node $v \in V$, denoted by $d_v^G$, is $|U|$, where any node $u \in U \subseteq V - \{v\}$ has $uv \in E$.

We assume in this study that there exists at least one robust bootstrap node to help a node join/rejoin the network. Table 1 lists the notations frequently used in this paper.
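Definitions 1 through 4 can be made concrete with a small sketch. The code below is only an illustration, not part of the paper: it assumes a connected undirected graph stored as an adjacency dictionary, and the function names are hypothetical. Following Definition 1, the length of a path is the number of nodes on it, so two adjacent nodes are at length 2.

```python
from collections import deque


def shortest_path_lengths(adj, src):
    """Breadth-first search from src.

    Returns a dict mapping each reachable node to its shortest path
    length from src, counted in nodes as in Definition 1.
    """
    dist = {src: 1}  # the trivial path containing only src has one node
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Definition 3: the maximal shortest path length over all pairs u != v."""
    return max(
        length
        for u in adj
        for v, length in shortest_path_lengths(adj, u).items()
        if v != u
    )


def degree(adj, v):
    """Definition 4: the number of distinct neighbors of v."""
    return len(set(adj[v]) - {v})
```

For a three-node chain `{1: [2], 2: [1, 3], 3: [2]}`, the longest shortest path runs through all three nodes, and the middle node has two neighbors.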
3 CONSTANT-DEGREE SMALL-DIAMETER OVERLAY NETWORK PROTOCOL
In this section, we first give an overview of the idea regarding our tree-based overlay formation protocol in Section 3.1. The details of the tree protocol are then given in Sections 3.2 and 3.3. We defer the discussion of how our tree network exploits the physical network locality to Section 4.

Fig. 1. An example of a $T^k$ tree consisting of six $T^{k-1}$ trees, in which each node $v$ in each of the $T^{k-1}$ and $T^k$ trees has a maximum degree $\hat{d}_v^T = 6$.
3.1 Overview

Fig. 1 shows our idea for constructing a constant-degree low-diameter tree. Basically, our tree is recursively formed in a hierarchical fashion. The basic element of our tree is a $T^1$ tree. A $T^i$ tree is built from at most $\hat{d}_v^T$ $T^{i-1}$ trees, where $1 \le i \le k$. The resulting tree that our tree protocol constructs is $\mathbb{T} = T^k$. We note that $(\hat{d}_v^T)^k$ is the maximum number of nodes in $\mathbb{T}$.

When forming a $T^i$ tree, nodes self-organize, and the maximal path length from the root to any leaf is bounded. We note that in each $T^i$ tree, the root node, denoted by $r$, is associated with an only child node r.chd[1]. This allows us to minimize the degree of the root. In contrast, nonroot nodes can use a degree of up to $\hat{d}_v^T$ to participate in a $T^i$ tree. Once a $T^i$ tree is constructed, its root node proceeds to join a $T^{i+1}$ tree. Possibly, the root remains a root node of a $T^{i+1}$ tree. Otherwise, it can connect to not more than $\hat{d}_v^T$ nodes in $T^{i+1}$.

For example, in Fig. 1, the root node A is associated with an only child node C in a $T^{k-1}$ tree. The root node A then participates in a $T^k$ tree and maintains another child node D for the $T^k$ tree. That is, a node may participate in several $T^k$ trees (where $k = 1, 2, 3, \ldots$), serving as the root node in each $T^k$ and maintaining an only child node for each $T^k$. In contrast, the root node B of another $T^{k-1}$ tree in Fig. 1 joins the $T^k$ tree as a nonroot node. The resulting $T^k$ tree in Fig. 1 consists of nodes A, B, D, E, F, and G, which are the roots of the corresponding $T^{k-1}$ trees.

We call the tree formation protocol that forms a $T^i$ tree for any $1 \le i \le k$ the T protocol in the following discussions.

3.2 T Protocol for Constant Peers

We consider forming and maintaining a tree network $T^i = T = (V, E)$, where $1 \le i \le k$.
3.2.1 Network Construction

We first define the following notation:

Definition 5. The numerical difference of a node $v$ with respect to the root node $r$, that is, $\mathrm{diff}(v)$, is defined as

$$\mathrm{diff}(v) \stackrel{\mathrm{def}}{=} \begin{cases} \Delta, & F(r.chd[1]) \le F(v), \\ R + \Delta, & \text{otherwise,} \end{cases}$$

where $F$ can be an arbitrary collision-free hash function that provides a unique ID ($\ge 1$) to a node, $R$ is the maximum value that $F$ can return, and $\Delta = F(v) - F(r.chd[1])$.

When a node A intends to join the overlay, it first connects to the bootstrap node (see footnote 2) that provides an entry point, that is, the root node $r$, of the overlay (see Algorithm 1). The root node $r$ then helps A join by picking a node in the tree uniformly at random. Notably, $r$ has the knowledge of the tree topology (discussed later) such that $r$ picks a node in the tree uniformly at random without consuming any network traffic.

When the random node, say, B, is determined, the process is immediately performed as follows:

1. If diff(A) < diff(B), B reports its parent node B.prt to A. Upon receiving the network address of B.prt, A then iteratively performs the joining by sending the joining request to B.prt. The joining process proceeds until the joining request is forwarded to an ancestor node Q of B.prt with diff(Q) < diff(A). A then connects to Q.
2. Otherwise, if diff(A) > diff(B), A simply connects to B as B's child node.

We note the following in our tree formation protocol:

1. The bootstrap node picks the first node that joins the network as the root node $r$. $r$ is then registered with the bootstrap node.
2. When nodes are forming a tree network, the root node $r$ in the tree always maintains only one child node r.chd[1]. The second node that joins the network simply becomes the only child of $r$. That is, $d_r^T = 1$.
3. Any node $v$, except $r$, can accept any number of nodes as its children subject to its degree constraint. That is, $d_v^T \le \hat{d}_v^T$.
4. The total number of nodes in a tree is up to $d_v^T$. That is, $d_v^T = \hat{d}_v^T$.
5. Each leaf node $v$ of the tree is required to send a live message to its parent v.prt such that v.prt can keep track of the number of its child nodes and its subtree topology. v.prt performs similarly so that the parent node (v.prt).prt of v.prt can add up the size of the "subtree" rooted at (v.prt).prt and maintain the knowledge of its subtree topology. $r$ can then have the topology of the tree and calculate the total number of nodes in the tree.
6. If the tree rooted at $r$ contains up to $\hat{d}_v^T$ nodes, then $r$ will not include any newly coming node into its tree, and $r$ deregisters from the bootstrap node. $r$ may reregister with the bootstrap node if $r$ maintains fewer than $\hat{d}_v^T$ nodes.

Footnote 2: We adopt a mechanism similar to that of Gnutella [23], which provides a bootstrap node for a joining node. Possibly, there are several bootstrap nodes to help nodes join the overlay.
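The joining rule of Section 3.2.1 can be sketched in a few lines. This is a minimal illustration reconstructed from the prose only, not a reproduction of the paper's Algorithm 1: the names `Node`, `diff`, `join`, and the constant `R` are hypothetical, the hash value is passed in directly as `ident`, and the degree constraints and the bootstrap interaction are deliberately left out.

```python
import random

R = 2 ** 16  # assumed maximum value that the hash function F can return


def diff(f_v, f_first_child, r_max=R):
    """Definition 5, with f_v = F(v) and f_first_child = F(r.chd[1])."""
    delta = f_v - f_first_child
    return delta if f_first_child <= f_v else r_max + delta


class Node:
    def __init__(self, ident):
        self.ident = ident    # stands in for F(node)
        self.parent = None
        self.children = []


def join(new, entry, first_child):
    """Attach `new` below the first node on the path from `entry` toward
    r.chd[1] whose diff is smaller than diff(new).

    r.chd[1] itself has diff 0, so the climb always stops there at the latest.
    """
    d_new = diff(new.ident, first_child.ident)
    q = entry
    while diff(q.ident, first_child.ident) > d_new:
        q = q.parent          # the request is forwarded to the parent
    new.parent = q
    q.children.append(new)
    return q
```

One consequence of this rule is that diff values strictly increase along every path leading away from r.chd[1], which is the property used in the cycle-freeness argument of Section 5.1.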
3.2.2 Network Maintenance

Our tree-shaped overlay network may be fragmented due to node failure or departure. To handle the dynamics of the overlay, each node $v$ periodically pings its parent node v.prt. If v.prt fails to respond to $v$, $v$ assumes the failure of v.prt and then rejoins the network with the help of another node in the network, found by consulting its local cache.

Algorithm 2 details the overlay maintenance. A node A first checks whether its parent A.prt is active by sending a ping message periodically. Upon receiving a ping message, A.prt replies to A with a pong message. If A does not receive any pong message from A.prt, A then sends another ping message to A.prt. If a number of ping messages have been sent and A has not received any pong from A.prt, A then performs the rejoining operation by sending a joining request to a node U picked uniformly at random from the cache that A locally maintains (denoted as Cache(A)). The rejoining operation simply lets a rejoining node join the network by using the joining algorithm discussed in Section 3.2.1. In our design, before A performs its rejoining, A needs to notify all nodes in its subtree (that is, the subtree rooted at A). Upon receiving the notification sent from A, the nodes in A's subtree leave and rejoin the overlay. The notification can be implemented simply by sending the notification message down the subtree.
Consider the example given in Fig. 1. A is the root of the tree $T^k$, where $T^k$ is formed by the root nodes A, B, D, E, F, and G of the $T^{k-1}$ trees. Assume that D fails. E then detects the failure of D by not receiving any pong messages from D. E informs the nodes (that is, B and F) in $T^k$ in the subtree rooted at E to rejoin. G performs the same operations if it has offspring nodes in $T^k$. E and G then rejoin as well.

We will provide a theoretical analysis for the rejoining cost (Definition 8) in Section 5.1. Our analysis result (Theorem 4) shows that our network maintenance protocol is efficient. The expected rejoining cost of the maintenance protocol is $O(Nl^2)$, where $l$ is the diameter of a $T^i$ tree, and $N$ is the number of nodes participating in $T^i$. We emphasize that it is critical to let a node pick a node uniformly at random for its joining or rejoining the network. This enables our protocol to guarantee that $l = O(\ln N)$ in expectation. Section 6.5 verifies these analytical results.

We finally note the following for our network maintenance protocol:

1. A rejoins by first invoking JOIN(U) shown in Algorithm 1, where U is the node ID maintained in A's local cache.
2. If A cannot find any live node U from its cache for its rejoining, A needs to locate an entry point B from the bootstrap node (see footnote 3) for its rejoining with JOIN(B).
3. Similarly, any node in the subtree rooted at A rejoins by randomly picking an entry node from its cache. If the entry node is unavailable, the node rejoins via the bootstrap node.
4. If a node is performing its rejoining via an entry point that is also performing the rejoining, then the node selects another node in its cache as a new entry point. Similarly, if a node cannot find any node in its cache to help its rejoining, the node requests the bootstrap node to pick one.
5. Possibly, more than one of the nodes in the subtree rooted at r.chd[1] select $r$ from their local caches as their entry points for their rejoining with JOIN(r). If r.chd[1] leaves or fails, $r$ will pick one of these nodes as its r.chd[1]. The nodes that fail to become r.chd[1] then rejoin by selecting other entries from their caches (or by consulting the bootstrap node in case no live cached node can be found).

Each node A in the network maintains a cache, denoted by Cache(A), by using a PULL algorithm. Basically, in our PULL algorithm, each node A has to periodically send a live message to its parent A.prt. Upon receiving a live message, A.prt collects the IDs in the subtree rooted at A.prt. A.prt then performs similarly by sending the received IDs to its parent. This process continues so that $r$ collects the IDs of all nodes in the tree network rooted at $r$. $r$ then disseminates these IDs to all nodes in the network along the tree structure. Consequently, each node in the network can construct and maintain its cache that contains the IDs of the nodes in the network.

We finally note that our network maintenance protocol can handle the failure of r.chd[1] well. Consider a level-$i$ tree $T^i$ with a root node $r$, $r$'s only child node r.chd[1], and two other nonroot nodes C and D. Assume that the child nodes of r.chd[1] are C and D. If r.chd[1] fails, C and D detect the failure of r.chd[1] and then need to rejoin the network. If the local cache maintained by C (or D) is up to date and has $r$'s location, then C (or D) connects to $r$ and becomes the only child node of $r$. Otherwise, C (or D) consults the bootstrap node to seek a root node of a level-$i$ tree as its entry point for its rejoining.

Footnote 3: In this study, we assume that the bootstrap node is always alive and keeps some random nodes participating in the tree overlay.
3.3 $\mathbb{T}$: Scaling with the T Protocol

We have presented the basic algorithm for forming a tree network $T$ that can consist of up to $\hat{d}_v^T$ nodes, where $\hat{d}_v^T$ is also the maximum degree of a node. For constructing a level-2 tree $T^2$, the root nodes $r$ in distinct $T^1$ trees query the bootstrap node for their entry points. This process is identical to that of the joining of a node into a $T^1$ tree, except that the candidate entry points that can help these root nodes form their $T^2$ trees are the root nodes of $T^1$ trees. Therefore, in our tree formation protocol, we require the bootstrap node to additionally label each registry node with its level ID. The bootstrap node depends on the level ID to identify the "root level" of a registry node. That is, the root node of a $T^k$ tree will be labeled with the level ID $k$ in the bootstrap node. For example, if a node is a root node of a level-3 tree, then it will have level ID 3 in the bootstrap node.

If $\mathbb{T}$ is a level-$k$ tree $T^k$, then the above-mentioned process proceeds until multiple $T^{k-1}$ trees self-organize into a $T^k$ tree. Similarly, the root nodes of these $T^{k-1}$ trees form a $T^k$ by consulting the bootstrap node for the locations of roots, with those of level ID $k - 1$ being the entry points. We note the following:

1. If a node is the root of $T^i$, then it must also be the root of $T^{i-1}, T^{i-2}, \ldots, T^1$. Consider the example shown in Fig. 1. The node A is not only a root node of $T^{k-1}$ but also a root of $T^k$. It is easy to verify that A is also a root node of $T^j$, where $j = 1, 2, \ldots, k - 2$.
2. Nodes can use a degree of up to $d_v^T$ to form a level-$i$ tree, where $1 \le i \le k$. Therefore, in a level-$i$ tree, the root has a "total" degree equal to $\sum_{j=1}^{i} 1 = i$, since the root node (for example, the node A in Fig. 1) maintains an only child node in each level-$k$ tree, where $k = 1, 2, \ldots, i$. In contrast, the "maximum" total degree of a nonroot node in a level-$i$ tree is $d_v^T + i - 1$. This is because a nonroot node (for example, the node B in Fig. 1) in a level-$i$ tree must be a root node of a level-$k$ tree (where $k = 1, 2, \ldots, i - 1$). The nonroot node then participates in the level-$i$ tree by using a degree of up to $d_v^T$.
3. Consider a $T^i$ tree. If a node $v$ detects the failure (or departure) of its parent v.prt in $T^i$, then similar to what we have discussed in Section 3.2.2, $v$ notifies its offspring root nodes of the subtrees $T^{i-1}$ in $T^i$ regarding the failure/departure. Upon receiving the notification message, a node $u$ in $T^i$ rejoins the network by using Algorithm 2.
4. A node in $T^i$ maintains $i$ caches, and each cache is constructed and maintained as mentioned in Section 3.2.2.
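The level-labeled registry and the degree accounting described in Section 3.3 could be sketched as follows. This is only a hypothetical illustration: the class `Bootstrap`, its methods, and the two helper functions are assumed names, and the paper does not prescribe how the bootstrap node stores its registry.

```python
import random
from collections import defaultdict


class Bootstrap:
    """Hypothetical registry mapping a level ID to the registered roots of that level."""

    def __init__(self, seed=None):
        self._roots = defaultdict(set)
        self._rng = random.Random(seed)

    def register(self, node_id, level):
        self._roots[level].add(node_id)

    def deregister(self, node_id, level):
        self._roots[level].discard(node_id)

    def entry_point(self, level):
        """Return a random registered root with the given level ID, or None."""
        candidates = sorted(self._roots[level])
        return self._rng.choice(candidates) if candidates else None


def root_total_degree(i):
    """A root keeps one child per level 1..i, so its total degree is i."""
    return sum(1 for _ in range(1, i + 1))


def nonroot_max_total_degree(d, i):
    """A nonroot node of a level-i tree is a root at levels 1..i-1
    (degree i - 1) and uses a degree of up to d at level i."""
    return d + i - 1
```

With the degree of 6 used in Fig. 1, a nonroot node of a level-3 tree would have a total degree of at most 8, while the root of that tree would have a total degree of 3.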
Definition 6. A graph follows the -power-law latency expansion if for each node v in the graph Nv ðxÞ ¼ x ; where Nv ðxÞ denotes the number of nodes that have latencies not more than x to v, and and are two given positive constants.
Lemma 1. Let X1 ; X2 ; ; Xn be independent random variables over [0, 1], where X1 ; X2 ; ; Xn follow the probability distribution with the -power-law latency expansion P ðX < xÞ ¼ x . Let Y ¼ minfX1 ; X2 ; ; Xn g. Then, there 1 exists Y < c such that E½Y n , where c is a positive number. Proof. Since X1 ; X2 ; ; Xn Y ¼ minfX1 ; X2 ; ; Xn g
are
where u is any node in the graph, and cvu denotes the latency from v to u. Without loss of generality, we let ¼ 1.
independent
and
P ðY yÞ ¼ P ðminfX1 ; X2 ; ; Xn g yÞ ! n \ ¼P ðXi yÞ i¼1
¼
n Y
P ðXi yÞ
i¼1
¼ ð1 y Þn : Since 1 a ea (when 0 < a < 1, and a is sufficiently small), it follows that Z 1 E½Y ¼ P ðY yÞdy 0 Z 1 y n e dy ¼
Z
1
e
1
n y
dy
0
1 s Rs
¼ 1
where s ¼ n . Since proof follows.
0
Z
s
ex dx;
0
ex dx 1, E½Y 1s , and the u t
Lemma 1 indicates that if a node v picks n nodes uniformly at random and among these n nodes, v maintains the one u that has the smallest delay to v, then the delay 1 from v to u will be n . This suggests that our tree protocol works in the following way to exploit the physical network locality: 1.
Assume that the maximal distance between any two nodes in the graph (with the power-law latency expansion) in which our tree network T T overlays is . Then, the probability distribution for Nv ðxÞ is x P ðcvu < xÞ ¼ ;
AUGUST 2008
Clearly, without exploiting the physical network locality, T T can have the “weighted diameter” Oð logd^ X Þ ¼ Oðlogd^ X Þ, where T i ð0 i kÞ has d^ nodes, and the total nodes in T T is X .
EXPLOITATION OF PHYSICAL NETWORK LOCALITY
Studies in [22], [24], and [25] present that the latency distribution between Internet end hosts likely follow the power-law latency expansion. In this study, we thus concentrate on the network graphs with the power-law latency expansion.
NO. 8,
Definition 7. The weighted diameter in a graph G is P maxf adjacent nodes v;u in p cvu j8 shortest paths p Gg, where p denotes the simple shortest path.
0
4
VOL. 19,
2. 3.
A node v in T i samples the nodes in T i . Since T i maintains up to d^Tv nodes (that is, the root nodes of T i1 subtrees), each node can sample d^Tv nodes at most. Assume that v performs n samples, where n d^Tv . v maintains a node u that is closest in terms of the network delay among the n samples. v then rejoins T i via u if diffðuÞ < diffðvÞ. v thus becomes a child node of u. Otherwise, v still connects to its original parent.
Authorized licensed use limited to: National Taiwan Univ of Science and Technology. Downloaded on September 4, 2009 at 06:24 from IEEE Xplore. Restrictions apply.
HSIAO AND HE: A TREE-BASED PEER-TO-PEER NETWORK WITH QUALITY GUARANTEES
The details are shown in the following algorithm:

1. We note the following: Any node $v \in V - \{r, r.chd[1]\}$ can simply sample nodes in $v$'s cache.
2. Since $r$ has the global knowledge of $T^i$, $r$ replaces $r.chd[1]$ by the closest offspring node among $V - \{r\}$. If so, all offspring nodes, except the newly selected $r.chd[1]$, rejoin $T^i$ by using Algorithm 2.
3. Measuring the latency between two nodes is out of the scope of our study. Such measurements may rely on public network services such as GNP [26].
4. Nodes in the subtree rooted at $v$ need to rejoin, since $v$ successfully connects to a closer parent compared to its previous one. These nodes, except for $v$, perform rejoining by Algorithm 2.

We will show later in Section 5.3 that if $n \ge 2^{\alpha+1} (\ln X)^{\alpha}$, then the weighted diameter of our tree network $TT$ becomes the constant $\Theta(1)$.

5 PERFORMANCE ANALYSIS

This section provides a rigorous and thorough performance analysis for our proposal given in Sections 3 and 4. Section 5.4 concludes this section and presents the implications of our performance results by illustrating an example.

5.1 Performance of T

It is sufficient to consider the subtree $(V, E)$ rooted at $r.chd[1]$ in $T$; in the following, "the overlay" refers to this subtree. Recent measurement studies [27], [28] of real P2P systems (that is, Gnutella [23] and Napster [29]) provide evidence that the lifetimes of peers are approximated reasonably well by the exponential distribution [30]. In the following analysis, we assume that the system follows the $M/M/\infty$ queuing model, in which peers arrive according to a Poisson process with rate $\lambda$. The lifetimes of peers are independent and exponentially distributed with parameter $\mu$. The number of peers in the system at time $t$ is denoted by $M(t)$. We also assess the load of the bootstrap node in this section.

Theorem 1. The number of peers in the system at time $t$ is $O(E[M(t)])$, with high probability (see footnote 4).

Corollary 1. Let $\lambda/\mu = N$. If $t \ge N$, then $O(E[M(t)]) = N$.

Due to space limitations, we give the detailed proofs for Theorem 1 and Corollary 1 in [31]. Theorem 1 states that the number of nodes in the system at any time $t$ is $O(E[M(t)])$ with high probability. Corollary 1 states that if the system time $t \ge O(N)$, then the number of nodes in the system is $O(E[M(t)]) = N$. Therefore, in the following, we discuss the system operating at $t > cN$ for some constant $c$ and denote the number of peers in the overlay at time $t$ by $N$.

Lemma 2. If an overlay is constructed using JOIN, then it is cycle free.

Proof. Consider a cycle in the overlay, denoted by $p = a_0 a_1 a_2 \cdots a_{n-1} a_0$, where $a_0 = r.chd[1]$, and $\{a_1, a_2, \ldots, a_{n-1}\} \subseteq V - \{r.chd[1]\}$. We consider the following cases:

1. $p$ is a cycle because two paths $p_1$ and $p_2$ that share the same end point $a_0$ join at a node, say, $a_i$ ($1 \le i \le n-1$). However, this is impossible, since by definition, each nonroot node in the overlay can have only one parent node.
2. $p$ is a circular path that does not arise from two paths, with $r.chd[1]$ being their end point, crossing each other. If so, it can be easily shown that $\mathrm{diff}(a_0) < \mathrm{diff}(a_1) < \mathrm{diff}(a_2) < \cdots < \mathrm{diff}(a_0)$. This is a contradiction, and the proof follows. □
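As a small numerical companion to the queuing model assumed in Section 5.1, the sketch below evaluates the expected population of an $M/M/\infty$ system that starts empty, $E[M(t)] = (\lambda/\mu)(1 - e^{-\mu t})$. This closed form is the standard transient result for that queue and is quoted here as background rather than taken from the paper; the function name `expected_peers` is hypothetical.

```python
import math


def expected_peers(lam, mu, t):
    """E[M(t)] for an M/M/infinity system that is empty at time 0.

    lam: Poisson arrival rate of peers
    mu : parameter of the exponential lifetime distribution (mean lifetime 1/mu)
    t  : elapsed system time
    """
    return (lam / mu) * (1.0 - math.exp(-mu * t))


# With lam/mu = N, the expectation approaches N once t is a few mean lifetimes.
N = 100.0
early = expected_peers(1.0, 1.0 / N, t=N)        # about 63 percent of N
late = expected_peers(1.0, 1.0 / N, t=10 * N)    # within a fraction of a peer of N
```

Under this reading, the population is within a constant factor of $N = \lambda/\mu$ as soon as $t$ is on the order of $N$ (for unit arrival rate), which is the regime the analysis then restricts itself to.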
Remark 1. If an overlay is constructed using JOIN, then a joining node will visit nodes on not more than one path, with the root node being the end point.

Theorem 2. If the overlay implements JOIN, $|V| = N$, and $\hat{d}_v = |V|$ for any $v \in V$, then the maximal path length in the overlay from $r.chd[1]$ to any leaf is $\ln N + O(1)$ in expectation.

Theorem 3. Assume that the overlay has $N$ nodes. Denote the maximal path length in the overlay rooted at $r.chd[1]$ by the random variable $S_N$. Then, $S_N \le 6 \ln N$, with the probability not less than $1 - N^{-1}$.

Since the proofs for Theorems 2 and 3 are lengthy, we refer the readers to our technical report in [31] for the details. Lemma 2 and Remark 1 state that any node $a$ takes a finite number of hops to join the overlay and that the nodes helping $a$ join appear on only one path, with $r.chd[1]$ being the end point. Theorems 2 and 3 show that any path with $r.chd[1]$ being the end point has $O(\ln N)$ hops with high probability. We thus conclude as follows:

Corollary 2. If the overlay rooted at $r.chd[1]$ with $N$ nodes is constructed with JOIN, then a newly joining node takes $O(\ln N)$ hops with high probability to join the overlay.

Clearly, the tree $T$ associated with the overlay has the maximal path length $O(\ln N) + 1 = O(\ln N)$ from $r$ to any leaf in $T$.
4. "With high probability" in this paper denotes a probability not less than $1 - O(N^{-\Omega(1)})$.
Our tree construction protocol (that is, JOIN) strongly depends on the uniformity in selecting a node to join the overlay. In addition to $r$ randomly picking a node through which a newly coming node joins, each node also relies on randomly and uniformly selecting a node in its cache to help its rejoining if the node detects the failure of its parent node.

Remark 2. If the overlay with $N$ nodes is maintained with HEAL and PULL, then any node $u \in V$ has the identical probability to be picked as an entry point by a rejoining node $v$, where $u \ne v$.

Remark 2 shows that if a node rejoins its tree overlay, then it will pick a node as its entry point from the $N - 1$ participating peers in $V$ with the probability of $\frac{1}{N-1}$. If so, the JOIN algorithm shown in Algorithm 1 guarantees that the maximal path length from $r.chd[1]$ to a leaf remains $O(\ln N)$ (see Corollary 2) with high probability, even if nodes operating the JOIN algorithm are in an environment in which nodes may come and go.

In our HEAL algorithm, if the root node $p$ of a "subtree" in the overlay detects the failure of its parent $p.prt$, it needs to rejoin the overlay and notify the nodes in the subtree regarding the failure. That is, each of the nodes in the subtree is required to perform the JOIN algorithm. We are thus interested in knowing the cost associated with these rejoining operations.

Definition 8. The rejoining cost due to the failure of a node $v$ in the overlay is defined as the number of nodes in $S_1, S_2, \ldots, S_k$ performing their rejoining operations plus the number of nodes that help nodes in $S_1, S_2, \ldots, S_k$ perform their rejoining, where $S_1, S_2, \ldots, S_k$ are the subtrees rooted at $v$.

Theorem 4. If the overlay with $N$ nodes is maintained with HEAL and PULL, then in the time interval $\left[t - \frac{N-1}{\mu}, t\right]$, the rejoining cost introduced by any node is not more than $l^2 + O(1)$, on the average, where $l$ is the maximal path length from $r.chd[1]$ to any leaf, and any $t \ge \frac{N-1}{\mu}$.

Proof.
Assume that $l$ is the maximal path length from $r.chd[1]$ to any leaf and that $n_i$ is the number of nodes with the $i$-hop distance from the root. Clearly, $1 \le i \le l-1$, and $\sum_{i=1}^{l-1} n_i = N - 1$ (excluding $r.chd[1]$). If a node $v$ having the $k$-hop ($1 \le k \le l-1$) distance from $r.chd[1]$ detects the failure of its parent, then $v$ will rejoin the overlay, and those nodes in the subtree rooted at $v$ will also rejoin. We denote the number of nodes that perform the rejoining operations by $s_{k,v}$. It can be easily shown that if any node $v \in \{v_1, v_2, \ldots, v_{n_k}\}$ that has the $k$-hop distance from $r.chd[1]$ detects the failure of its parent, then the total number of nodes that need to rejoin will be

$$s_k = s_{k,v_1} + s_{k,v_2} + \cdots + s_{k,v_{n_k}} = n_k + n_{k+1} + \cdots + n_{l-1} = \sum_{i=k}^{l-1} n_i.$$

The total rejoining cost $c_k$ is thus not more than $s_k + l \times s_k$. Consider the time interval $\left[t - \frac{N-1}{\mu}, t\right]$ for any $t \ge \frac{N-1}{\mu}$. Thus, $\mu \times \frac{N-1}{\mu} = N - 1$ nodes depart the system in the interval. In addition, each of these $N - 1$ nodes leaves the system with high probability. This is because the probability distribution of the lifetime $T_l$ of a peer is $P(T_l > t) = e^{-\mu t}$, and $P\left(T_l > \frac{N-1}{\mu}\right) = e^{-N+1} \approx 0$.

Therefore, in the time interval $\left[t - \frac{N-1}{\mu}, t\right]$, the average rejoining cost $C_{avg}$ is

$$C_{avg} = \frac{c_1 + c_2 + \cdots + c_{l-1}}{N-1} \le (l+1) \times \frac{s_1 + s_2 + \cdots + s_{l-1}}{N-1}.$$

Since $s_i \le N - 1$ ($1 \le i \le l-1$),

$$C_{avg} \le (l+1) \times \frac{\sum_{i=1}^{l-1} (N-1)}{N-1} = (l+1) \times (l-1) = l^2 - 1.$$

The proof follows. □
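The counting argument in the proof of Theorem 4 can be checked numerically for a concrete tree shape. The sketch below is illustrative only: it assumes the per-level node counts $n_1, \ldots, n_{l-1}$ are supplied as a list (the name `rejoin_cost_bound` and the example counts are hypothetical), computes $s_k = \sum_{i \ge k} n_i$, uses the worst case $c_k = (l+1) s_k$, and compares the resulting average with $l^2 - 1$.

```python
def rejoin_cost_bound(level_counts):
    """Average rejoining cost in the worst case of the proof, and the l^2 - 1 bound.

    level_counts[i - 1] is n_i, the number of nodes i hops below r.chd[1],
    for i = 1 .. l - 1, so the list has l - 1 entries.
    """
    l = len(level_counts) + 1
    total = sum(level_counts)                       # N - 1 nodes below r.chd[1]
    # s_k = n_k + n_{k+1} + ... + n_{l-1}
    s = [sum(level_counts[k:]) for k in range(len(level_counts))]
    # c_k <= s_k + l * s_k = (l + 1) * s_k
    c = [(l + 1) * s_k for s_k in s]
    c_avg = sum(c) / total
    return c_avg, l * l - 1


# Example: a three-level subtree below r.chd[1] with 2, 4, and 8 nodes per level.
c_avg, bound = rejoin_cost_bound([2, 4, 8])
```

Because every $s_k$ is at most $N - 1$, the computed average can never exceed the bound, whatever level counts are supplied.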
Corollary 3. If the overlay with $N$ nodes is maintained with HEAL and PULL, then in the time interval $\left[t - \frac{N-1}{\mu}, t\right]$, the rejoining cost per unit time is not more than $\frac{\mu l^2}{N-1} = O\left(\frac{\mu l^2}{N}\right)$, on the average, where $l \le 6 \ln N$, with the probability not smaller than $1 - N^{-1}$, and any $t \ge \frac{N-1}{\mu}$.
5.2 Performance of TT

As we have discussed earlier, $TT = T^k$. In this section, we report the performance analysis for $TT$ regarding the degree $d_v^{TT}$ for any $v$ and the maximal path length from the root node $r$ to any leaf.

Theorem 5. Assume that each node $v$ in $TT$ initially has the degree $\hat{d}$ to form $T^i$, where $1 \le i \le k$. Then, $E[d_v^{TT}] = \hat{d} + O(1)$, and $d_v^{TT} \le 2\hat{d}$, with the probability not less than $1 - \hat{d}^{-3}$.

Theorem 6. Assume that constructing a $\hat{d}$-node $T^i$ tree takes $t(\hat{d})$ time units, where $1 \le i \le k$. If $E[t(\hat{d})] \le \frac{\hat{d}}{2(\lambda + \mu)}$ ($E[t(\hat{d})] \le \frac{\hat{d}}{2\lambda}$ when $\lambda \gg \mu$), then the number of registry nodes in the bootstrap node is less than $k$ in expectation and not more than $k^2 + O(k)$ with the probability $1 - k^{-4} + o(1)$.

Due to space limitations, we omit the proofs for Theorems 5 and 6; the details of the proofs can be found in [31].

Corollary 4. Let $X = \hat{d}^k$ be the total number of nodes in $TT$. Then, the diameter of $TT$ is $D_{TT} = 2 \ln X$ in expectation, and with the probability not less than $1 - O(\hat{d}^{-1})$, $D_{TT}$ is not more than $12 \ln X$.

Proof. The diameter $D_{TT}$ is the length of the path of $TT$ crossing through the root node $r$ in $T^i$ ($i = 1, 2, \ldots, k$) from a leaf node $a$ in $T^1$ to a leaf $b$ in another $T^1$. Therefore, $D_{TT} = 2k \ln \hat{d} = 2 \log_{\hat{d}} X \ln \hat{d} = 2 \ln X$. By the proof in [31, Theorem 3], with the probability not less than $1 - O(\hat{d}^{-1})$, the diameter is not more than $12 \ln X$. We can also rely on the recurrence [21] to prove this result. Let $H(T^k)$ be the maximal path length from the root of $T^k$ to any node in $T^1$. Then, we have the recurrence equation $H(T^k) = H(T^{k-1}) + \ln \hat{d}$. Since $k = \log_{\hat{d}} X$, we have $H(T^k) = \log_{\hat{d}} X \ln \hat{d} = \ln X$. We thus have the diameter $D_{TT}$ not more than $2 \ln X$, and the proof follows. □

We have shown in Theorem 5 that with high probability, the degree of any node in $TT$ is not more than $2\hat{d}$. We are, however, also interested in knowing the maximal degree of a node in $TT$.

Remark 3. The root node in $TT = T^k$ has the degree $k = \log_{\hat{d}} X$. A nonroot node in the $T^k$ tree has the degree not more than $\hat{d} + k - 1$. Thus, the degree of any node in the system is not more than $\hat{d} + k - 1$.
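To make the quantities in Corollary 4 and Remark 3 concrete, the following sketch evaluates them for given $\hat{d}$ and $X$. It is a plain calculator for the stated formulas, assuming $X$ is an exact power of $\hat{d}$; the function name `tt_parameters` and the dictionary keys are hypothetical.

```python
import math


def tt_parameters(d_hat, X):
    """Evaluate k, the diameter figures of Corollary 4, and the degree bound of Remark 3."""
    k = round(math.log(X) / math.log(d_hat))        # k = log_{d_hat} X
    if d_hat ** k != X:
        raise ValueError("X is assumed to be an exact power of d_hat")
    return {
        "levels": k,
        "expected_diameter": 2.0 * math.log(X),      # 2 ln X
        "whp_diameter": 12.0 * math.log(X),          # 12 ln X
        "root_degree": k,                            # Remark 3
        "max_degree": d_hat + k - 1,                 # Remark 3
    }


params = tt_parameters(d_hat=10, X=100_000)
```

For $\hat{d} = 10$ and $X = 10^5$ this gives $k = 5$ levels and a worst-case degree of 14, which is below the $2\hat{d} = 20$ figure that Theorem 5 provides with high probability.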
Fig. 2. The maximal path length from $r.chd[1]$ to any leaf.

5.3 Weighted Diameter

Corollary 5. In $T^i = (V, E)$, any node $v \in V$ that implements JOIN_VICINITY and samples $n$ nodes in $V - \{r\}$ will connect to a parent node $u \in V - \{v\}$ such that $E[c_{vu}] \le \left(\frac{n}{2}\right)^{-1/\alpha}$.

Proof. Since $P(\mathrm{diff}(u) < \mathrm{diff}(v)) = \frac{1}{2}$, we have

$$P(c_{vu} \le y) \ge P\left((c_{vu} \le y) \cap (\mathrm{diff}(u) < \mathrm{diff}(v))\right) = 1 - \left(1 - \frac{1}{2} y^{\alpha}\right)^{n}.$$

Similar to the proof in Lemma 1, we have

$$E[c_{vu}] \le \int_0^1 e^{-\frac{n}{2} y^{\alpha}} \, dy = \int_0^1 e^{-\left(\left(\frac{n}{2}\right)^{1/\alpha} y\right)^{\alpha}} \, dy = \frac{1}{s} \int_0^s e^{-x^{\alpha}} \, dx,$$

where $s = \left(\frac{n}{2}\right)^{1/\alpha}$. Therefore, we have $E[c_{vu}] \le \left(\frac{n}{2}\right)^{-1/\alpha}$, and the proof thus follows. □

Theorem 7. Let each $T^i$ ($0 \le i \le k$) have $\hat{d}$ nodes at most. Let $X = \hat{d}^k$ be the maximum number of nodes in $TT$. Assume that each node in $T^i$ samples $n = c\hat{d}$ nodes, where $0 < c \le 1$. If $n \ge 2^{\alpha+1} (\ln X)^{\alpha}$, then the weighted diameter of $TT$ is $\Theta(1)$ in expectation.

Proof. Without loss of generality, let $N = d^k$ be the maximum number of nodes in $T$, where $d > 1$, and $k > 1$. Assume that each node in $V$ samples $n = cd$ nodes, where $0 < c \le 1$. Applying Corollary 4 yields the expected weighted diameter equal to $L = 2 \ln N \times \left(\frac{cd}{2}\right)^{-1/\alpha}$. Thus, if $d \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$, then $L \le 1$, and the proof follows. □

Corollary 6. If the weighted diameter of $TT$ is $\Theta(1)$, then $\hat{d} \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$. That is, $d_v^{T^i} = \hat{d} \ge \frac{2^{\alpha+1} (\ln N)^{\alpha}}{c}$.
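The sampling threshold of Theorem 7 can be explored numerically. The sketch below assumes that the bound reads $n \ge 2^{\alpha+1} (\ln N)^{\alpha}$ and that the expected weighted diameter is $L = 2 \ln N \, (2/n)^{1/\alpha}$, where $\alpha$ denotes the exponent of the power-law latency expansion and $n = c\hat{d}$ is the number of samples; the symbol name `alpha` and both function names are assumptions made for illustration.

```python
import math


def expected_weighted_diameter(N, n, alpha):
    """L = 2 ln N * (2 / n)^(1 / alpha), with n samples per node (assumed form)."""
    return 2.0 * math.log(N) * (2.0 / n) ** (1.0 / alpha)


def min_samples(N, alpha):
    """Smallest n for which L <= 1 under the assumed form: 2^(alpha + 1) * (ln N)^alpha."""
    return 2.0 ** (alpha + 1) * math.log(N) ** alpha


N = 100_000
threshold = min_samples(N, alpha=1)                          # 4 ln N, roughly 46
at_threshold = expected_weighted_diameter(N, threshold, 1)   # equals 1 up to rounding
with_100 = expected_weighted_diameter(N, 100, 1)             # below 1
```

Under these assumptions, sampling $n = 100$ nodes with exponent 1 is comfortably above the threshold for $N = 10^5$, which matches the parameter choice reported for the locality experiments.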
Fig. 3. The diameter.

5.4 An Example

Previous studies have shown that in popular P2P networks (for example, KaZaA [32]), peers have the mean lifetime $1/\mu = 2.5$ hours [28], [33]. That is, if $TT$ has $X = 10^5$ nodes, then $\lambda = 10^5 / 9{,}000 \approx 11.11$ newly coming peers per second (Corollary 1). We let $\hat{d} = 10$, and $k$ is thus 5. By Theorem 5, the degree of any node is not more than 20, with the probability not less than $1 - 10^{-3} = 0.999$. If the mean network delay between two Internet end hosts is 10 ms [34], then the mean number of registry nodes in the bootstrap will be 5, and the number of registry nodes is not more than $5^2 + 5 = 30$, with the probability not less than $1 - \frac{1}{5^4} = 0.998$ (Theorem 6). This is because by Corollary 2, constructing a $\hat{d}$-node tree takes $E[t(\hat{d})] \approx 0.01 \times \hat{d} \times \lceil \ln \hat{d} \rceil = 0.3$ second, and by Theorem 6, $E[t(\hat{d})] < \frac{\hat{d}}{2\lambda} \approx 0.45$ second.
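The arithmetic of this example can be reproduced directly. The sketch below recomputes each figure from the stated inputs (mean lifetime of 2.5 hours, $X = 10^5$, $\hat{d} = 10$, 10 ms mean delay), reading the two dropped rate symbols as the arrival rate $\lambda$ and the lifetime parameter $\mu$; the function name and the dictionary keys are hypothetical.

```python
import math


def section_5_4_example():
    """Recompute the numbers quoted in the worked example."""
    mean_lifetime_s = 2.5 * 3600                 # 1/mu = 2.5 hours = 9,000 seconds
    X = 10 ** 5
    d_hat = 10
    delay_s = 0.010                              # 10 ms mean end-to-end delay
    lam = X / mean_lifetime_s                    # arrival rate, lam = X * mu
    k = round(math.log(X) / math.log(d_hat))     # number of levels, 5
    return {
        "lam": lam,
        "k": k,
        "degree_bound": 2 * d_hat,               # Theorem 5
        "degree_prob": 1 - d_hat ** -3,
        "registry_bound": k ** 2 + k,            # Theorem 6
        "registry_prob": 1 - k ** -4,
        "build_time": delay_s * d_hat * math.ceil(math.log(d_hat)),
        "time_limit": d_hat / (2 * lam),         # d_hat / (2 * lam)
    }


example = section_5_4_example()
```

The construction time of about 0.3 second stays below the 0.45 second limit, so the precondition of Theorem 6 holds for these parameters.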
6 SIMULATION RESULTS
We have developed an event-driven simulator that allows the study of the performance of tree-based networks. The performance metrics that we are interested in include the degree distribution of participating nodes and the (weighted) diameter of the network, given the number of nodes participating in the system, the mean lifetime of the joining peers, and the initial maximal degree $\hat{d}$ of a node. In our simulations, the number of nodes participating in the system is up to $X = 100{,}000$. The initial maximal degree $\hat{d}$ of a simulated node ranges from 5 to 100. Each participating peer has a lifetime with a mean of 150 minutes [28], [33]. The lifetime follows the exponential distribution. We have also investigated the effect of a mean lifetime of 30 minutes. We omit those results in this paper, however, because we do not observe any significant difference between the simulation results for the two lifetime values (that is, 30 and 150 minutes). We average the performance metrics collected from 1,000 runs. Each run takes 1,600 minutes.
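For readers who want to experiment with the churn model alone, a minimal event-driven sketch is given below. It is not the authors' simulator and builds no tree: it only generates Poisson arrivals and exponential lifetimes and counts the peers alive at the end of a run. The function name `simulate_population` and its parameters are hypothetical.

```python
import heapq
import random


def simulate_population(target_n, mean_lifetime, duration, seed=0):
    """Count the peers alive at time `duration` under Poisson arrivals and exponential lifetimes.

    target_n     : desired steady-state population (sets the arrival rate)
    mean_lifetime: mean peer lifetime, in the same time unit as `duration`
    """
    rng = random.Random(seed)
    arrival_rate = target_n / mean_lifetime      # so that arrival_rate * mean_lifetime = target_n
    departures = []                              # min-heap of scheduled departure times
    now = 0.0
    while True:
        now += rng.expovariate(arrival_rate)     # time of the next arrival
        if now > duration:
            break
        # Remove peers whose lifetime ended before this arrival.
        while departures and departures[0] <= now:
            heapq.heappop(departures)
        heapq.heappush(departures, now + rng.expovariate(1.0 / mean_lifetime))
    # Remove peers that left before the end of the run.
    while departures and departures[0] <= duration:
        heapq.heappop(departures)
    return len(departures)


alive = simulate_population(target_n=200, mean_lifetime=150.0, duration=1600.0, seed=1)
```

With a 1,600-minute run and a 150-minute mean lifetime, the run lasts more than ten mean lifetimes, so the final count should fluctuate around the target population.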
6.1 Height of T

Fig. 2 depicts the simulation results for the $T$ protocol, where the performance metric, that is, the maximal path length from $r.chd[1]$ to any leaf, is shown for the number of participating peers from 10 to 100,000. We note that the x-axis is in logarithmic scale. The results show that the maximal path length from $r.chd[1]$ to any leaf is nearly identical to $\ln X$. This conforms to our theoretical analysis, as discussed in Section 5 (see Theorem 2).

6.2 Diameter of TT

We illustrate the diameter of the tree implementing the $TT$ protocol in Fig. 3. In the simulations, $\hat{d}$ is 10, 100, and $\infty$. Fig. 3 shows that if $\hat{d}$ is enlarged (for example, $\hat{d} = 100$), then the diameter measured from simulations closely matches the result presented in Corollary 4. However, if $\hat{d}$ becomes relatively smaller (for example, $\hat{d} = 10$), then our $TT$ networks have diameters larger than the expected value (that is, $2 \ln X$). This is because in the proof of Corollary 4, we estimate the diameter of a tree with $\hat{d}$ nodes as $\ln \hat{d}$ instead of $\lceil \ln \hat{d} \rceil$. Thus, the diameters measured for our $TT$ trees are not less than $2 \ln X$.

Fig. 4. (a) The BRITE topology models Waxman and Barabasi. (b) The real topology, with 500 PlanetLab nodes.

Fig. 5. The cumulative distribution function of degrees.
6.3 Weighted Diameter of TT

Fig. 4a illustrates the simulation results for the weighted diameters (that is, the maximal weighted path length) of our tree-shaped overlay networks over the Waxman and Barabasi topology models [35]. The performance results for exploiting and not exploiting the physical network topology (W/ Exploiting Network Locality and W/O Exploiting Network Locality, respectively) are both shown in Fig. 4a. We note that the delay between nodes in the Waxman and Barabasi topologies does not follow the distribution of the power-law latency expansion. However, our algorithms still work well, and with our algorithms, the increase in the delay between nodes is less sensitive to the number of nodes participating in the system.

We also investigate the effectiveness of our algorithms in exploiting the physical network using the real network topology of an experimental testbed, namely, PlanetLab [36]. Fig. 4b depicts the simulation results for the topology with 500 PlanetLab nodes. As the results in Fig. 4b show, our algorithms for exploiting the physical network locality are particularly effective (see Theorem 7), since the PlanetLab network topology exhibits the power-law latency expansion [25]. Note that in the experiments with the Waxman, Barabasi, and PlanetLab topologies, we let $\hat{d} = 100$ and $\alpha = 1$ (see Theorem 7 and Corollary 6).

Fig. 6. The averaged rejoining cost per node.
6.4 Degree Distribution

Fig. 5 shows the cumulative distribution function of degrees, where $(x, y)$ represents the number $y$ of nodes having degrees not more than $x$. In this experiment, we let $\hat{d} = 8, 20, 40$, and $X = 100{,}000$. The simulation results are similar for different system sizes and are thus omitted in this paper. The results in Fig. 5 conform to our analytical result presented in Theorem 5. That is, the degree of a node is unlikely to be more than the expected $\hat{d}$.

6.5 Rejoining Cost and Overhead
6.5.1 The Rejoining Cost

Theorem 4 states that a node takes at most $O(\ln^2 N)$, on the average, to rejoin a $T^i$ network if the node detects the departure or failure of its parent, where $N$ is the number of nodes (that is, the roots of $T^{i-1}$) in $T^i$. Our simulation results show that the averaged rejoining cost of a node is very small, given $\hat{d}$ up to 100. This indicates that our healing protocol is efficient. To further understand the rejoining cost of our protocol, we instead investigate $\hat{d}$ up to 100,000 (that is, $N = 100{,}000$). Fig. 6 depicts the simulation results, which show that a node incurs a cost far less than $O(\ln^2 N)$ to repair the network.

Fig. 7. The overheads. (a) Waxman. (b) Barabasi. (c) PlanetLab.
6.5.2 The Overhead

As we have discussed in Section 1.1, earlier proposals such as Narada [6], PROMISE [9], and AnySee [10] intend to minimize the delay with which any nonroot node receives messages sent by the root. That is, these proposals work toward the construction of the shortest path spanning tree. We thus also investigate the overhead of the shortest path spanning tree. We note that the shortest path spanning tree is a minimal-cost network flow problem, which can be formally formulated as follows:

$$
\begin{aligned}
\min \quad & \sum_{ij \in E} c_{ij} f_{ij} \\
\text{s.t.} \quad & \sum_{i \in I(j)} f_{ij} - \sum_{k \in O(j)} f_{jk} = 1, \quad \forall j \in V - \{r\}, \\
& \sum_{k \in O(r)} f_{rk} = |V| - 1, \\
& f_{ij} \ge 0, \quad \forall i, j \in V,
\end{aligned}
$$

where $I(j)$ and $O(j)$ respectively represent the incoming and outgoing flows to/from the node $j$. Since one of our design objectives is to minimize the overhead $\sum_{ij \in E} c_{ij} f$ as much as possible, if we normalize $f$ such that $f = 1$, then we can simply estimate the overhead as $\sum_{ij \in E} c_{ij}$. Clearly, the cost (that is, $\sum_{ij \in E} c_{ij} f_{ij}$) of the shortest path spanning tree, as defined above, is at least $\sum_{ij \in E} c_{ij}$, because the resulting flow $f_{ij}$ must be at least 1 for any link $ij$ in the tree. The shortest path spanning tree works toward the minimization of the overhead of the network.

Fig. 7 shows the overheads of our tree-based overlay with/without the exploitation of the physical network locality (denoted by W/ and W/O, respectively). In Fig. 7, SPT represents the shortest path spanning tree. The physical network topologies that we study in this experiment are Waxman, Barabasi, and PlanetLab, with $|V| = 2{,}000$, $|V| = 2{,}000$, and $|V| = 500$ nodes, respectively. In the experiment, we estimate the overhead by using the total delay (that is, $\sum_{ij \in E} c_{ij}$), assuming that $f$ is normalized to 1. Fig. 7 shows that our tree-based overlay with the exploitation of network locality (that is, W/) clearly outperforms SPT. This indicates that our tree-based overlay performs better toward the minimization of the overhead.
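The relation between the flow-weighted cost and the plain sum of link delays can be illustrated on a small tree. In the sketch below, the root sends one unit of flow to every other node, so the flow on a tree link equals the number of nodes in the subtree below it. The function name `tree_flow_cost`, the `parent` mapping, and the example delays are hypothetical and only serve the illustration.

```python
def tree_flow_cost(parent, cost):
    """Return (flow-weighted cost, plain sum of link costs) for a rooted tree.

    parent[v]   : parent of node v, with the root mapped to None
    cost[(u, v)]: delay of the tree link from parent u to child v
    """
    # flow[v] is the flow on the link (parent[v], v).
    flow = {v: 0 for v in parent}
    for v in parent:
        # The unit destined for v crosses every link on the root-to-v path.
        node = v
        while parent[node] is not None:
            flow[node] += 1
            node = parent[node]
    links = [v for v in parent if parent[v] is not None]
    weighted = sum(cost[(parent[v], v)] * flow[v] for v in links)
    unweighted = sum(cost[(parent[v], v)] for v in links)
    return weighted, unweighted


parent = {"r": None, "a": "r", "b": "a", "c": "a"}
cost = {("r", "a"): 2, ("a", "b"): 3, ("a", "c"): 1}
weighted, unweighted = tree_flow_cost(parent, cost)
```

Every tree link carries at least one unit, so the flow-weighted cost is never below the plain sum of link delays; normalizing each flow to 1 yields exactly that plain sum, which is the overhead estimate used in the experiment.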
7 SUMMARY AND FUTURE WORK
We have presented a tree-shaped P2P network infrastructure. Our tree-based overlay is lightweight and implements a simple $T$ protocol (see Section 3). We thoroughly and rigorously analyzed the performance of our proposal, and we have shown that our tree-based network has provable performance guarantees in terms of 1) the degree of a peer, 2) the diameter, 3) the weighted diameter, 4) the cost of joining a new peer, 5) the protocol maintenance overhead, and 6) the queue length in the bootstrap node. We also validated our analytical results in extensive simulations. We believe that our tree-based overlay could serve as an infrastructure for P2P applications that demand scalability, fast communication, and low overhead.

Our next work will study how a P2P application such as live media broadcasting takes advantage of our tree-based overlay for minimizing the communication latencies among nodes and the systemwide overhead. In particular, recent studies (for example, [37] and [38]) have shown that the heterogeneity of peers is inherent in a P2P environment and that taking advantage of the heterogeneity can improve the performance quality of tree-based streaming overlays. In our future study, we will optimize our tree-based overlay by exploiting the heterogeneity of peers.
ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their valuable feedback and Dr. Yingwu Zhu for his helpful comments on this paper. This work was partially supported by the National Science Council, Taiwan, under Grant 95-2221-E-006-095.
REFERENCES

[1] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, “OceanStore: An Architecture for Global-Scale Persistent Storage,” Proc. Ninth ACM Int’l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS ’00), pp. 190-201, Nov. 2000.
[2] B.Y. Zhao, L. Huang, J. Stribling, S.C. Rhea, A.D. Joseph, and J.D. Kubiatowicz, “Tapestry: A Resilient Global-Scale Overlay for Service Deployment,” IEEE J. Selected Areas in Comm., vol. 22, no. 1, pp. 41-53, Jan. 2004.
[3] D. England, B. Veeravalli, and J.B. Weissman, “A Robust Spanning Tree Topology for Data Collection and Dissemination in Distributed Environments,” IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 5, pp. 608-620, May 2007.
[4] J. Li, J. Stribling, T.M. Gil, R. Morris, and M.F. Kaashoek, “Comparing the Performance of Distributed Hash Tables under Churn,” LNCS 3279, pp. 87-99, Jan. 2005.
[5] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.
[6] Y. Chu, S. Rao, and H. Zhang, “A Case for End System Multicast,” Proc. ACM SIGMETRICS ’00, pp. 1-12, 2000.
[7] S. Banerjee, B. Bhattacharjee, and C. Kommareddy, “Scalable Application Layer Multicast,” Proc. ACM SIGCOMM ’02, pp. 205-217, Aug. 2002.
[8] D.A. Tran, K.A. Hua, and T. Do, “ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming,” Proc. IEEE INFOCOM ’03, pp. 1283-1292, Mar. 2003.
[9] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava, “PROMISE: Peer-to-Peer Media Streaming Using CollectCast,” Proc. 11th ACM Int’l Conf. Multimedia (Multimedia ’03), pp. 45-54, Nov. 2003.
[10] X. Liao, H. Jin, Y. Liu, L.M. Ni, and D. Deng, “AnySee: Peer-to-Peer Live Streaming,” Proc. IEEE INFOCOM ’06, pp. 1-10, Mar. 2006.
[11] V. Venkataraman, K. Yoshida, and P. Francis, “Chunkyspread: Heterogeneous Unstructured Tree-Based Peer-to-Peer Multicast,” Proc. 14th IEEE Int’l Conf. Network Protocols (ICNP ’06), pp. 2-11, Nov. 2006.
[12] S.M. Banik, S. Radhakrishnan, and C.N. Sekharan, “Multicast Routing with Delay and Delay Variation Constraints for Collaborative Applications on Overlay Networks,” IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 3, pp. 421-431, Mar. 2007.
[13] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proc. ACM SIGCOMM ’01, pp. 149-160, Aug. 2001.
[14] A. Rowstron and P. Druschel, “Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems,” LNCS 2218, pp. 161-172, Nov. 2001.
[15] M. Castro, P. Druschel, A. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, “SplitStream: High-Bandwidth Content Multicast in a Cooperative Environment,” Proc. 19th ACM Symp. Operating Systems Principles (SOSP ’03), pp. 298-313, Oct. 2003.
[16] C. Chou, T.-Y. Huang, K.-L. Huang, and T.-Y. Chen, “SCALLOP: A Scalable and Load-Balanced Peer-to-Peer Lookup Protocol,” IEEE Trans. Parallel and Distributed Systems, vol. 17, no. 5, pp. 419-433, May 2006.
[17] C.G. Plaxton, R. Rajaraman, and A.W. Richa, “Accessing Nearby Copies of Replicated Objects in a Distributed Environment,” Proc. Ninth ACM Symp. Parallel Algorithms and Architectures (SPAA ’97), pp. 311-320, June 1997.
[18] M. Castro, M.B. Jones, A.-M. Kermarrec, A. Rowstron, M. Theimer, H. Wang, and A. Wolman, “An Evaluation of Scalable Application-Level Multicast Built Using Peer-to-Peer Overlays,” Proc. IEEE INFOCOM ’03, pp. 1510-1520, Mar. 2003.
[19] S. El-Ansary, L.O. Alima, P. Brand, and S. Haridi, “Efficient Broadcast in Structured P2P Networks,” LNCS 2735, pp. 304-314, Oct. 2003.
[20] A. Bharambe, S. Rao, V. Padmanabhan, S. Seshan, and H. Zhang, “The Impact of Heterogeneous Bandwidth Constraints on DHT-Based Multicast Protocols,” LNCS 3640, pp. 115-126, Feb. 2005.
[21] T. Cormen, C. Leiserson, and R. Rivest, “Recurrences,” Introduction to Algorithms, second ed. MIT and McGraw-Hill, 2001.
[22] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On Power-Law Relationships of the Internet Topology,” Proc. ACM SIGCOMM ’99, pp. 251-262, Aug. 1999.
[23] Gnutella, http://rfc-gnutella.sourceforge.net/, 2007.
[24] D.R. Karger and M. Ruhl, “Finding Nearest Neighbors in Growth-Restricted Metrics,” Proc. 34th ACM Ann. Symp. Theory of Computing (STOC ’02), pp. 741-750, May 2002.
[25] H. Zhang, A. Goel, and R. Govindan, “Improving Lookup Latency in Distributed Hash Table Systems Using Random Sampling,” IEEE/ACM Trans. Networking, vol. 13, no. 5, pp. 1121-1134, Oct. 2005.
[26] T.S.E. Ng and H. Zhang, “Predicting Internet Network Distance with Coordinates-Based Approaches,” Proc. IEEE INFOCOM ’02, pp. 170-179, June 2002.
[27] J.C. Chu, K.S. Labonte, and B.N. Levine, “Availability and Locality Measurements of Peer-to-Peer File Systems,” Proc. SPIE ITCom: Scalability and Traffic Control in IP Networks, pp. 310-321, July 2002.
[28] S. Saroiu, P.K. Gummadi, and S.D. Gribble, “Measurement Study of Peer-to-Peer File Sharing Systems,” Proc. Multimedia Computing and Networking (MCN ’02), pp. 18-25, Jan. 2002.
[29] Napster, http://www.napster.com/, 2007.
[30] G. Pandurangan, P. Raghavan, and E. Upfal, “Building Low-Diameter Peer-to-Peer Networks,” IEEE J. Selected Areas in Comm., vol. 21, no. 6, pp. 995-1002, Aug. 2003.
[31] H.-C. Hsiao and C.-P. He, “A Tree-Based Peer-to-Peer Network with Quality Guarantees,” technical report (available upon request), Dept. of Computer Science and Information Eng., Nat’l Cheng-Kung Univ., June 2007.
[32] KaZaA, http://www.kazaa.com/, 2007.
[33] K.P. Gummadi, R.J. Dunn, S. Saroiu, S.D. Gribble, H.M. Levy, and J. Zahorjan, “Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload,” Proc. 19th ACM Symp. Operating Systems Principles (SOSP ’03), pp. 314-329, Oct. 2003.
[34] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, “Handling Churn in a DHT,” Proc. Usenix Ann. Technical Conf., 2004.
[35] A. Medina, A. Lakhina, I. Matta, and J. Byers, “BRITE: An Approach to Universal Topology Generation,” Proc. Ninth Int’l Symp. Modeling, Analysis, and Simulation of Computer and Telecomm. Systems (MASCOTS ’01), pp. 346-353, Aug. 2001.
[36] PlanetLab, http://www.planet-lab.org/, 2007.
[37] M. Bishop, S. Rao, and K. Sripanidkulchai, “Considering Priority in Overlay Multicast Protocols under Heterogeneous Environments,” Proc. IEEE INFOCOM ’06, pp. 1-13, Mar. 2006.
[38] Y.-W. Sung, M. Bishop, and S. Rao, “Enabling Contribution Awareness in an Overlay Broadcasting System,” Proc. ACM SIGCOMM ’06, pp. 411-422, Sept. 2006.

Hung-Chang Hsiao received the PhD degree in computer science from the National Tsing-Hua University, Hsinchu, Taiwan, in 2000. From October 2000 to July 2005, he was a postdoctoral researcher in the Department of Computer Science, National Tsing-Hua University. Since August 2005, he has been an assistant professor in the Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan, Taiwan. His research interests include peer-to-peer computing, overlay networking, and grid computing.

Chih-Peng He received the BS degree in computer science and information engineering from Fu-Jen Catholic University, Taipei, in 2004 and the MS degree in computer science and information engineering from the National Cheng-Kung University, Tainan, Taiwan, in 2007. He is currently with the Department of Computer Science and Information Engineering, National Cheng-Kung University. His research interests include peer-to-peer computing and overlay networking.