This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE ICC 2009 proceedings

ACNS: Adaptive Complementary Neighbor Selection in BitTorrent-like Applications

Zhenbao Zhou 1,2, Zhenyu Li 1,2, Gaogang Xie 1

1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2 Graduate School of Chinese Academy of Sciences, Beijing 100190, China
{zhouzhenbao, zyli, xie}@ict.ac.cn

Abstract: BitTorrent, one of the most popular Peer-to-Peer file sharing applications, accounts for a large proportion of total Internet traffic. While it benefits content distributors and users, the traffic it injects into the network backbone has become a great challenge for ISPs. In this paper, we study traffic shaping in BitTorrent-like applications to improve traffic locality and enable fast data delivery. To this end, a piece complementary index is introduced based on the piece demand between peer nodes. We then propose an efficient and adaptive neighbor selection scheme (ACNS). In ACNS, each node adaptively chooses the most complementary peers to connect with and download file pieces from, rather than keeping a fixed number of outside neighbors. Our scheme can be integrated with the BitTorrent protocol through slight modifications and requires no additional infrastructure from ISPs. Extensive simulations show the effectiveness of ACNS: compared with the fixed biased neighbor selection scheme, ACNS cuts cross-ISP traffic by more than 31% and improves the download rate by about 15%.

I. INTRODUCTION

The past few years have witnessed the continuous growth of P2P (Peer-to-Peer) file sharing applications such as Gnutella, eMule and BitTorrent [3]. A recent report [13] by Ipoque shows that P2P file sharing applications account for about 49% to 83% of Internet traffic worldwide, and that BitTorrent is the most popular of them, representing as much as 73% of P2P traffic in Australia and more than 60% in Germany, China and some other countries. Due to space limitations, we omit the BitTorrent protocol specification, since it is well known [10].

A. Problem Statement and Related Works

BitTorrent has gained great success and is considered one of the most popular and efficient applications for file content distribution.
However, it poses a great challenge for Internet Service Providers (ISPs) because of the large amount of cross-ISP traffic, which not only significantly increases the ISPs’ costs but also consumes a large share of backbone bandwidth, adversely affecting the QoS of other applications such as WWW and Email. In BitTorrent, the neighbors of a node are chosen randomly. Randomized neighbor selection enables an even distribution of file pieces during downloading. But it also greatly increases the probability that a node downloads a piece from a remote neighbor (e.g. one belonging to a different ISP) although the piece is also owned by a physically nearby neighbor (e.g. one belonging to the same ISP). Thus, the cross-ISP traffic is considerable. In fact, in an N-node system, for an ISP that has n nodes in the system, a file piece traverses into the ISP n(1 − n/N) ≈ n times on average [11].

Currently, ISPs commonly resort to rate throttling to control the traffic generated by BitTorrent. The two major concerns with this approach are the complaints raised by subscribers and the difficulty of accurately identifying BitTorrent traffic. Deploying caches at the ISPs’ gateways is another choice [6][8]. However, it requires a large storage space and raises legal issues regarding the cached content. Bindal et al. [10] present a fixed biased neighbor selection algorithm to improve the traffic locality of BitTorrent. Each node builds only a small number (e.g. 1) of links with external peers located at other ISPs. Limiting and fixing the number of links with external peers adversely affects the even distribution of file pieces, which in turn prolongs the download time. Moreover, the external neighbors of a node are fixed all the time, which makes it very likely that file pieces that are already common within the local network are still fetched from other ISPs. Xie et al. propose P4P [6], an architecture for cooperative traffic control. Each ISP provides a server named “itracker” that keeps track of the network state, topology and policy of the ISP. The tracker then calculates which neighbors should be chosen according to the information collected by the itrackers. In [11], nodes are grouped into clusters, and several nodes with higher upload speed are selected as the core nodes of each cluster. Only the core nodes get file pieces from other clusters. Therefore, the core nodes are naturally the bottlenecks of the clusters and are more vulnerable.
Other related works include measurement and modeling of BitTorrent systems. Interested readers may refer to [2], [4] for details.

B. Our Contribution

In this paper, we study traffic shaping in BitTorrent-like applications through adaptive biased neighbor selection. Our goal is to improve traffic locality and enable fast data delivery. To this end, we propose an efficient and adaptive neighbor selection scheme (ACNS). ACNS is motivated by the fact that the number of random neighbors of a node should vary adaptively with the file download process. The file download process is split into three phases [12]: the bootstrap phase, the efficient download phase and the last download phase. At the bootstrap phase, since only very few nodes have file pieces, a node

978-1-4244-3435-0/09/$25.00 ©2009 IEEE


should establish more links to random nodes to fetch file pieces into the local network. At the efficient download phase, since there are already many file pieces in the local network, the number of random neighbors of a node should be reduced correspondingly. At the last download phase, since the number of nodes in the local network is relatively small, a node should build more random links to get the last pieces.

We introduce a piece complementary index (PCI) to evaluate the complement of two nodes, based on their demand on each other. The PCIs of neighbors vary as the file download process evolves and indicate how many and which random neighbors a node should have. Each peer periodically evaluates its PCIs with random nodes and adaptively chooses the most complementary peers to connect with and download file pieces from. To the best of our knowledge, this is the first work that uses adaptive neighbor selection to shape BitTorrent traffic.

We perform extensive system simulations based on an event-driven simulator to evaluate the performance of ACNS. The results show that our scheme improves traffic locality and file download speed effectively. Specifically, compared with the fixed biased neighbor selection scheme [10], ACNS cuts cross-ISP traffic by more than 31% and improves the download rate by about 15%.

The rest of the paper is organized as follows. Section II details the ACNS algorithm, including the definition of PCI and adaptive neighbor selection. Section III presents the experiment setup, performance metrics and results. Finally, we conclude our work in Section IV and describe possible future work.

II. ADAPTIVE COMPLEMENTARY NEIGHBOR SELECTION

In this section, we detail our approach, ACNS. To ease the description, we define some notations. A node y is called node x’s outside neighbor if x and y are located at different ISPs. Otherwise, y is x’s inside neighbor.
We first introduce the Piece Complementary Index (PCI) to reflect the complement of two peers. If an outside peer y of node x has several file pieces that are rare among x’s inside neighbors, we say that x has a demand on y. If y also has a demand on x, we say that x and y are complementary. A larger complement of x and y indicates a greater need to build a neighbor link between them. Obviously, the complement of two peers varies during the file download process. Thus, each node periodically evaluates its demands on the outside peers and, in turn, calculates the PCIs. Based on the PCIs, a node adaptively adjusts its neighbor relationships by selecting several outside peers that are more complementary to it to connect with and download file pieces from. Table I lists all important parameters and notations used in this paper.

TABLE I NOTATIONS

Notation      Definition
m             the number of overall pieces
ISP-x         the ISP that node x belongs to
D(x, y)       the demand of node x on node y
P(x)          the set of pieces peer x has downloaded
M(x)          the set of pieces peer x misses
S(x)          the set of inside neighbors of peer x
R(x)          the set of outside neighbors of peer x
Kx            the number of inside neighbors of peer x, equal to the cardinality of S(x)
L(i, x)       the number of nodes in S(x) that have piece i
PCI(x, y)     the piece complementary index of node x and node y

A. Piece Complementary Index

We first describe how to evaluate the demand of a node x on an outside peer y, D(x, y). Intuitively, x has a demand on y if y has some file pieces that x misses. However, our goal is to limit the traffic injected into the ISPs. Thus, we say that x has a demand on y if and only if y has several file pieces that are rare among x’s inside neighbors. We define D(x, y) to reflect the extent of x’s demand on y as follows.

$$D(x, y) = \frac{\sum_{i=0}^{m} e_{xy}(i)\,\bigl(K_x - L(i, x)\bigr)}{\sum_{i=0}^{m} e_{xy}(i)\,K_x} \qquad (1)$$

where e_xy(i) is an indicator of whether file piece i is owned by y but missed by x, calculated as follows. Other notations are listed in Table I.

$$e_{xy}(i) = \begin{cases} 1 & \text{if } i \in P(y) \text{ and } i \in M(x) \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

Obviously, 0 ≤ D(x, y) ≤ 1. Note that D(x, y) is not zero even if x’s inside neighbors have the pieces that x misses. Recall that another goal is to enable fast file distribution. If only very few (e.g. 1) inside neighbors have the file piece that x misses, then, since a node supports only 5 concurrent uploads, x may wait a long time before it gets that piece. Thus, if the file pieces that x misses are rarely replicated on its inside neighbors, x still has a demand on an outside neighbor y that has those pieces. However, the extent of x’s demand on an outside peer decreases as the number of inside neighbors holding the pieces that x misses grows. It is also worth pointing out that the demand of x on outside peer y reflects the demand of ISP-x on peer y.

The demand of node x on an outside peer can be used to guide the establishment of a neighbor relationship between them: the higher the demand, the higher the probability that a link is built between them. However, recall that BitTorrent uses the incentive scheme “Tit-for-Tat” (TFT). Thus, if x’s outside peer y has little interest in the pieces that x has downloaded, it would take a long time before x gets a file piece from y, even if x has a large demand on y. On the other hand, if y also has a large demand on x, there is a good chance that piece transfer starts quickly between them and lasts for a long time. Note that, since each peer uses the “rarest-first” strategy to select which piece to download, the file pieces exchanged between x and y must be the ones that ISP-x and ISP-y need most. Motivated by this fact, we define the Piece Complementary Index (PCI) to reflect the complement of x and y as follows.

$$PCI(x, y) = D(x, y) \times D(y, x) \qquad (3)$$

We directly use PCI(x, y) to guide the establishment of neighbor relationships. A larger complement of x and y indicates a greater need to build a neighbor link between them. Fig. 1 gives an example of how to compute PCI between


two peers. The shared file is divided into 3 pieces. Since node y has only piece c, D(x, y) = (0 + 0 + (3 − 2)) / 3 = 1/3. Since node y and all its inside neighbors miss piece a, D(y, x) = ((3 − 0) + 0 + 0) / 3 = 1. Finally, PCI(x, y) = 1/3 × 1 = 1/3.

Fig. 1 Example: calculation of PCI ( x, y )
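As an illustration, formulas (1) to (3) can be written as a short Python sketch. The function names `demand` and `pci`, the representation of piece sets as Python sets, and the concrete piece holdings below are assumptions chosen only to be consistent with the numbers in the example above (the figure itself is not reproduced here). Returning 0 when the denominator of formula (1) is zero is also an assumption, since the formula is undefined in that case.

```python
from fractions import Fraction


def demand(pieces_x, pieces_y, inside_neighbors_x, all_pieces):
    """D(x, y) as in formula (1): the demand of node x on outside peer y.

    pieces_x, pieces_y: the sets P(x) and P(y).
    inside_neighbors_x: one piece set per inside neighbor of x, i.e. S(x).
    """
    k_x = len(inside_neighbors_x)  # K_x
    numerator = 0
    denominator = 0
    for i in all_pieces:
        # e_xy(i) = 1 iff y owns piece i and x misses it (formula (2)).
        if i in pieces_y and i not in pieces_x:
            # L(i, x): number of inside neighbors of x holding piece i.
            l_ix = sum(1 for held in inside_neighbors_x if i in held)
            numerator += k_x - l_ix
            denominator += k_x
    if denominator == 0:
        # Assumption: y offers nothing x misses (or x has no inside
        # neighbors), so the demand is taken to be zero.
        return Fraction(0)
    return Fraction(numerator, denominator)


def pci(pieces_x, pieces_y, inside_x, inside_y, all_pieces):
    """PCI(x, y) = D(x, y) * D(y, x), as in formula (3)."""
    return (demand(pieces_x, pieces_y, inside_x, all_pieces)
            * demand(pieces_y, pieces_x, inside_y, all_pieces))


# Hypothetical holdings that reproduce the numbers of the worked example.
ALL_PIECES = {"a", "b", "c"}
P_X, P_Y = {"a"}, {"c"}
INSIDE_X = [{"c"}, {"c"}, {"b"}]   # K_x = 3, L(c, x) = 2
INSIDE_Y = [{"b"}, {"c"}, set()]   # K_y = 3, L(a, y) = 0

D_XY = demand(P_X, P_Y, INSIDE_X, ALL_PIECES)
D_YX = demand(P_Y, P_X, INSIDE_Y, ALL_PIECES)
PCI_XY = pci(P_X, P_Y, INSIDE_X, INSIDE_Y, ALL_PIECES)
print(D_XY, D_YX, PCI_XY)
```

Exact fractions are used so that the values can be compared directly with the hand calculation.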

B. Adaptive Neighbor Management

Each node x periodically requests a peer list from the tracker. In ACNS, the tracker records which ISP each peer belongs to. On receiving a request from x, the tracker responds with a list of k peers, of which k/2 are randomly selected from x’s inside peers and the other k/2 from x’s outside peers. Node x maintains two peer lists, one for inside peers and one for outside peers, which record the peer information received from the tracker. Node x computes its PCI with each peer in the outside peer list according to formula (3). To save time and space, we limit the length of each list to 35; a Least Recently Used (LRU) policy is applied when a list is full.

For each node x, suppose that q outside peers are kept in the outside peer list. We sort them by their PCIs with x, which yields PCI(x, y1) ≥ PCI(x, y2) ≥ ... ≥ PCI(x, yq), and select the first several peers, those with the highest PCIs, as x’s outside neighbors. The question is how many peers to select. A simple answer is that each node selects a fixed number k of peers; the question then becomes how to determine the value of k. A larger k causes too much cross-ISP traffic, whereas a smaller k slows down the download. Instead, ACNS adopts an adaptive selection based on the PCIs. After sorting the PCIs of all available outside peers of x, we solve the following problem:

min k, subject to PCI(x, y1) + PCI(x, y2) + ... + PCI(x, yk) ≥ RT,

where RT (Required Threshold) reflects how large a complement we want to have. Given the value of RT, we obtain the k outside peers for peer x to connect with. The default value of RT is 1; we evaluate its effect on the performance in the simulations (Section III). Let T(x) denote the set of the k selected outside peers of node x. Node x runs the routine adjust_neighbor as described in Fig. 2.
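The threshold rule above amounts to taking peers in decreasing PCI order until the running sum reaches RT. The following Python sketch shows one way to express it. The function name `select_outside_peers` and the dictionary input are hypothetical, and the behavior when the threshold cannot be reached (all candidates are returned) is an assumption, since the text does not specify that case.

```python
def select_outside_peers(pci_by_peer, required_threshold=1.0):
    """Return the smallest prefix of peers, sorted by decreasing PCI,
    whose PCIs sum to at least required_threshold (RT).

    pci_by_peer: mapping from outside peer id to PCI(x, peer).
    Assumption: if the sum of all PCIs stays below RT, every candidate
    is returned.
    """
    ranked = sorted(pci_by_peer.items(), key=lambda item: item[1], reverse=True)
    selected = []
    total = 0.0
    for peer, value in ranked:
        if total >= required_threshold:
            break  # the threshold is already met; k is minimal
        selected.append(peer)
        total += value
    return selected


# Hypothetical PCI values for four outside peers of some node x.
PCIS = {"y1": 0.6, "y2": 0.1, "y3": 0.5, "y4": 0.3}
print(select_outside_peers(PCIS))        # default RT = 1
print(select_outside_peers(PCIS, 0.5))   # smaller RT, fewer outside peers
```

Because the peers are visited in decreasing PCI order, the first prefix that reaches RT is also the shortest one, which is what the minimization asks for.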
x.adjust_neighbor(T(x))
// The notations used here are all listed in Table I, if not otherwise specified.
for each node y ∈ R(x) do
    if y ∉ T(x) then
        x disconnects the connection to y
        T(x) = T(x) − {y}
    end if
end for
t = 35 − |S(x)| − |R(x)|
/* A node can actively initialize at most 35 connections; |S(x)| and |R(x)| are the cardinalities of the sets S(x) and R(x), respectively. */
while t > 0 and T(x) is not empty do
    x randomly selects a node z in T(x) and establishes a connection to it
    t = t − 1
    T(x) = T(x) − {z}
end while
if t > 0 then
    x randomly selects t peers from the inside peer list it stores to connect with
end if

Fig. 2 Algorithm for neighbor adjustment

After the execution of the routine adjust_neighbor, if the total number of connections that node x initializes is below 20, then, as in regular BitTorrent, x contacts the tracker to obtain a new peer list and connects with inside peers.

Peers in ACNS periodically adjust their neighbor relationships. The period, which is referred to as the Rechoose Neighbor Interval (RNI), should be chosen carefully. Since the TFT strategy is used, a node only unchokes the neighbors from which its download rate is higher. Therefore, new neighbors can only get unchoked through the “optimistic unchoke” strategy. This means that the RNI should be longer than the “optimistic unchoke” interval, which is 30 seconds. If we considered only the control overhead, the RNI should be larger. However, after pieces have been exchanged for a while, the complement of two nodes shrinks quickly. When the complement of two nodes is very low, we should disconnect the link and build new links between more complementary nodes. Thus, the interval should not be too long. In the current design, the default value of RNI is 120 seconds. We also evaluate the impact of RNI on the performance in simulations (Section III).

ACNS is not used under two special circumstances. First, since seeds do not need any file piece, they choose their neighbors randomly, as in the regular BitTorrent protocol, to distribute the file pieces evenly. Second, ACNS is not applicable to a new peer that has just joined and does not have any blocks; new peers also select their initial neighbors randomly. Note that we do not employ ACNS when selecting inside neighbors. This is because, after being in the torrent for a while, a node always has more links to inside neighbors, and the bandwidths of these links are always very large. Thus, the file pieces are distributed very quickly within the ISPs even without ACNS.

C. The Implementation of ACNS

ACNS can be implemented easily by slightly modifying the tracker and the client. A function is added to the tracker to record the ISPs that the peers belong to. The challenge for the tracker is how to obtain the ISP of each client. Many tools such as iPlane [14] periodically update their databases for IP-to-AS mapping. Based on that information, the tracker can easily determine which ISP each peer belongs to. As for the client, the function of periodic neighbor relationship


management based on PCIs should be added. We believe that these slight modifications to the tracker and the client are acceptable and that they greatly benefit both service providers and end users.

III. EVALUATIONS

A. Experiment Setup

We developed a discrete-event simulator to evaluate the performance of the BitTorrent protocol with ACNS. The simulator models peer activities (e.g. joins and departures) as well as other BitTorrent mechanisms (e.g. TFT) in detail. We implement ACNS, the original BitTorrent with no changes (called Native BitTorrent) and the biased neighbor selection scheme [10], in which a node has only 1 fixed outside neighbor.

All nodes in the network have asymmetric upload/download bandwidths. The network (i.e. torrent) consists of 1,000 peers. Since most current users are ADSL subscribers, the upload bandwidth of each peer is set to 512 Kbps and the download bandwidth to 2,048 Kbps. The file to distribute is assumed to be a very popular one, so all the peers join the system as a “flash crowd” [7]. A peer leaves the network as soon as it completes downloading. There is one original seed in the system, which does not depart until all the nodes finish downloading. The default upload bandwidth of the seed is 5 Mbps.

The 1,000 nodes are assigned to 10 ISPs, each with roughly 100 nodes. All the ISPs are fully connected with each other. The downlink bandwidth of each ISP is 200 Mbps, while the uplink bandwidth available to BitTorrent is limited, reflecting the traffic-shaping devices deployed by ISPs. The default uplink bandwidth of each ISP is 30 Mbps; we also explore the impact of bandwidth throttling. The default file size is 100 MB. The file is divided into pieces of 256 KB each.

We mainly focus on three performance metrics.

M1. Proportion of cross-ISP traffic. Let T_intra-ISP denote the traffic generated within ISPs to distribute the file to all the nodes, and T_cross-ISP denote the traffic between ISPs.
Then the proportion of cross-ISP traffic (P_Cross) is defined as follows, where n is the number of nodes.

$$P\_Cross = \frac{T_{cross\text{-}ISP}}{T_{cross\text{-}ISP} + T_{intra\text{-}ISP}} = \frac{T_{cross\text{-}ISP}}{n \times sizeof(file)} \times 100\%$$

M2.

Maximum download time: the maximum of the nodes’ download completion times.

M3. Mean download time: the mean of the nodes’ download completion times.

The first metric demonstrates the efficacy of our scheme in terms of traffic shaping, while the last two reflect the download rate performance.

B. Performance Results

Experimental results are reported below in terms of the 3 performance metrics. We compare ACNS-enabled BitTorrent with native BitTorrent and with biased neighbor selection based BitTorrent. The simulations were run multiple times, and the differences between runs are negligible. Hence, we present the result of only one run.

1) Relative ACNS Performance

First, we evaluate the performance of BitTorrent under the different schemes. Fig. 3 depicts the results. Fig. 3(a) compares the proportion of cross-ISP traffic. Cross-ISP traffic contributes only 15.19% in ACNS, while it contributes 55.18% and 22.18% in native and biased BitTorrent, respectively. Thus, compared with native and biased BitTorrent, ACNS-enabled BitTorrent reduces the cross-ISP traffic by about 72.5% and 31.5%, respectively. Fig. 3(b) shows the cumulative distribution of download completion time. In ACNS-enabled BitTorrent, 95.4% of the nodes complete the download within 2,000 seconds, while only 17.0% and 59.1% do so in native BitTorrent and biased neighbor selection based BitTorrent, respectively. We also compute the mean download time, which is 2,376 seconds for native BitTorrent, 1,945 seconds for biased neighbor selection based BitTorrent, and 1,660 seconds for ACNS-enabled BitTorrent. Hence, compared with native and biased BitTorrent, ACNS-enabled BitTorrent improves the download rate by about 30.1% and 14.6%, respectively.

Fig. 3 Relative performance of ACNS: (a) proportion of the cross-ISP traffic; (b) CDF of complete time

2) Impact of Rechoose Neighbor Interval (RNI)

Next, we explore the impact of RNI on the performance of ACNS. Recall that RNI should be longer than 30 seconds. Fig. 4 plots the results when RNI varies from 60 seconds to 600 seconds. The download time is relatively large when RNI is 60 seconds. This is because the interval is so short that an outside neighbor may have no chance to be optimistically unchoked before the connection is closed. However, once the RNI grows to 120 seconds, the download time drops sharply. As for the cross-ISP traffic, it grows from 13.9% to 28.3% as RNI increases from 60 seconds to 600 seconds. This stems from the fact that, with a larger interval, it takes longer for the number of outside neighbors of a node to drop to a small value, which in turn favors piece exchanges among ISPs.
We choose 120 seconds for the default value of the interval.
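The relative figures quoted in 1) above can be re-derived from the reported absolute values. The short Python sketch below does this, assuming that each percentage is a relative reduction with respect to the baseline scheme, and that the "download rate" improvement is read as the relative reduction in mean download time. The helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, value):
    """Percentage by which value is lower than baseline."""
    return (baseline - value) / baseline * 100.0


# Values reported in the text for the three schemes.
CROSS_ISP_PERCENT = {"native": 55.18, "biased": 22.18, "acns": 15.19}
MEAN_DOWNLOAD_SECONDS = {"native": 2376, "biased": 1945, "acns": 1660}

for scheme in ("native", "biased"):
    traffic_cut = relative_reduction(CROSS_ISP_PERCENT[scheme],
                                     CROSS_ISP_PERCENT["acns"])
    time_cut = relative_reduction(MEAN_DOWNLOAD_SECONDS[scheme],
                                  MEAN_DOWNLOAD_SECONDS["acns"])
    print(f"ACNS vs {scheme}: cross-ISP traffic reduced by {traffic_cut:.2f}%, "
          f"mean download time reduced by {time_cut:.2f}%")
```

Under this reading the computed values agree with the quoted 72.5%, 31.5%, 30.1% and 14.6% to within rounding.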


Fig. 4 Impact of Rechoose Neighbor Interval (panel shown: proportion of cross-ISP traffic)

Fig. 5 Effect of Required Threshold

3) Effect of Required Threshold (RT)

In this set of experiments, we evaluate the effect of the Required Threshold. Fig. 5 reports the results. Two observations are notable. First, when RT is as small as 0.5, the download time is relatively large. This is because, in this case, a node always has fewer outside neighbors, so the file pieces that the ISPs really need are not fetched into the ISPs in time. However, once the threshold grows to 1, the download time decreases sharply. Second, the proportion of cross-ISP traffic grows with the threshold, because a node always has more outside neighbors when the threshold is larger. We choose 1 as the default value of the threshold.

4) Impact of bandwidth throttling

We evaluate the impact of bandwidth throttling on the performance by varying the ISP uplink bandwidth. Fig. 6 reports the results. In native BitTorrent, neighbors are randomly selected and most file pieces are exchanged among ISPs. Thus, the uplink bandwidth is the bottleneck, which is why the download time in native BitTorrent decreases as the uplink bandwidth grows. On the other hand, when the bandwidth is limited, a node can get only a few file pieces from outside neighbors, while a considerable number of file pieces are obtained from inside neighbors. This is why the cross-ISP traffic in native BitTorrent is smaller when the uplink bandwidth is only 10 Mbps. We also find that the uplink bandwidth has a relatively small impact on the performance of ACNS-enabled and biased neighbor selection based BitTorrent, and that ACNS outperforms the other schemes.

Fig. 6 Impact of bandwidth throttling

IV. CONCLUSION AND FUTURE WORK

In this paper, we propose ACNS, an adaptive neighbor selection algorithm for BitTorrent applications that reduces cross-ISP traffic and enables fast content distribution. Rather than having a fixed number of outside neighbors, a node in ACNS adaptively selects the most complementary peers in other ISPs to connect with. To this end, we introduce the piece complementary index to evaluate the complement of two nodes located in different ISPs, and then describe how to use the index to guide the adjustment of neighbor relationships. The experimental results show the efficacy of our scheme in terms of two performance metrics, namely download time and cross-ISP traffic. Specifically, compared with the fixed biased neighbor selection scheme, ACNS cuts cross-ISP traffic by more than 31% and improves the download rate by about 15%. As future work, we will study the piece selection strategy in BitTorrent to improve traffic locality and integrate the proposed scheme into the open-source BitTorrent.

ACKNOWLEDGMENT

This work is supported by the National Basic Research Program of China under grant No. 2007CB310702 and by the National Natural Science Foundation of China under grants No. 90604015 and No. 60873242.

REFERENCES

[1] A. Legout, G. Urvoy-Keller, and P. Michiardi, “Rarest first and choke algorithms are enough,” Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, Rio de Janeiro, Brazil, 2006.
[2] A. R. Bharambe, C. Herley, and V. N. Padmanabhan, “Analyzing and improving a bittorrent networks performance mechanisms,” INFOCOM, 2006.
[3] B. Cohen, “Incentives build robustness in BitTorrent,” P2PEcon, 2003.
[4] D. Qiu and R. Srikant, “Modeling and Performance Analysis of BitTorrent-like Peer-to-Peer Networks,” SIGCOMM, Sep. 2004.
[5] G. Shen, Y. Wang, Y. Xiong, B. Y. Zhao, and Z. Zhang, “HPTP: Relieving the tension between ISPs and P2P,” Proceedings of IPTPS, 2007.
[6] H. Xie, A. Krishnamurthy, A. Silberschatz, and Y. R. Yang, “P4P: Provider Portal for Applications,” SIGCOMM, 2008.
[7] J. A. Pouwelse, P. Garbacki, D. H. J. Epema, and H. J. Sips, “A Measurement Study of the BitTorrent Peer-to-Peer File-Sharing System,” technical report PDS-2004-003, May 2004.
[8] K. Gummadi, R. Dunn, S. Saroiu, S. Gribble, H. Levy, and J. Zahorjan, “Measurement, modeling, and analysis of a peer-to-peer file-sharing workload,” in Proc. of SOSP ’03, Bolton Landing, Oct. 2003.
[9] L. Guo, S. Chen, Z. Xiao, and E. Tan, “Measurements, analysis and modeling of bittorrent-like systems,” IMC, 2005.
[10] R. Bindal, P. Cao, W. Chan, J. Medved, G. Suwala, T. Bates, and A. Zhang, “Improving traffic locality in bittorrent via biased neighbor selection,” ICDCS, 2006.
[11] T. Wang, L. Wen, W. Li, J. Tao, A. Wang, and F. Baker, “Traffic Shaping in BitTorrent Systems by Centralized Hierarchical Peer-node Assignment,” ICC, 2008.
[12] V. Rai, S. Sivasubramanian, and S. Bhulai, “A multi Phased approach for modeling and analysis of the BitTorrent protocol,” Proc. of IEEE ICDCS 2007, Toronto, Canada, 2007.
[13] “Internet Study 2007.” Available: http://www.ipoque.com/userfiles/file/internet_study_2007.pdf
[14] “iPlane.” Available: http://iplane.cs.washington.edu/data.html
