Peer-to-Peer Netw. Appl. DOI 10.1007/s12083-010-0098-7

A study on key strategies in P2P file sharing systems and ISPs’ P2P traffic management Jessie Hui Wang · Chungang Wang · Jiahai Yang · Changqing An

Received: 15 January 2010 / Accepted: 21 December 2010 © Springer Science+Business Media, LLC 2011

Abstract The flourishing of P2P systems has drawn a lot of attention from networking researchers. Some research efforts focus on P2P systems themselves, trying to understand the mechanisms of various implementations and the behavior patterns of P2P users, and then to improve the systems' performance. Others look at the issue from the angle of ISPs, trying to help ISPs solve the various issues brought by P2P applications. In this article, we conduct a review of recent research efforts in these two areas. The first part of the article focuses on several key strategies that have significant influence on the performance of P2P systems. In the second part, we review some important techniques for ISPs to manage P2P traffic, i.e., blocking, caching and localization, and compare their advantages and disadvantages.

Keywords P2P · ISP · Traffic engineering · Network management

J. H. Wang (B) · C. Wang · J. Yang · C. An
Network Research Center, Tsinghua National Laboratory of Information Science and Technology, Tsinghua University, Beijing, People's Republic of China
e-mail: [email protected]

1 Introduction

The traffic generated by peer-to-peer (P2P) systems has become the dominant traffic in many networks today, especially those with broadband access. IPOQUE reports that in 2008 and 2009 the average share of P2P traffic varied regionally between 43% in Northern Africa and 70% in Eastern Europe (Internet Study, http://www.ipoque.com/resources/internet-studies/internetstudy-2008_2009).

In P2P systems, each peer (node) gets service from other peers and at the same time provides service to other peers. As a result, the whole system achieves better scalability and robustness than traditional client-server communication systems. The concept of P2P has been exploited to implement various systems for different goals, such as file sharing (e.g., Gnutella, http://rfc-gnutella.sourceforge.net/index.html; BitTorrent, http://www.bittorrent.com; eMule [1]), video on demand (e.g., PPLive, http://www.pplive.com/; PPStream, http://www.ppstream.com/), and instant messaging (e.g., QQ, http://www.qq.com/; Skype, http://www.skype.com/).

The flourishing of P2P systems has drawn a lot of attention from networking researchers. Most of their research efforts can be classified into two categories. One category focuses on P2P systems, trying to understand the mechanisms of various implementations and the behavior patterns of P2P users, and then to improve the systems' performance. The other category looks at the issue from the angle of ISPs, trying to help ISPs solve the various issues brought by P2P applications. It is well known that ISPs are facing many challenges. For example, the application-layer routing of P2P overlays can be in conflict with the network-layer routing policies of ISPs, which upsets the basic assumptions of ISPs' traditional business model [2]. Although P2P fuels the demand of end users for broadband connections, ISPs may not be able to benefit from this trend due to the fixed monthly fees paid by end users and the volume-based charges of upstream providers [3].

In this article, we summarize some recent research results in these two areas. The first part of our article focuses on several key strategies in P2P systems,


including the file splitting strategy, on how to split files into pieces or chunks; the piece selection strategy, on how to determine the next piece to download; and the peer selection strategy, on how to determine the order of providing service to other peers. As we will discuss in Section 2, these strategies have significant influence on the performance of P2P systems. In Section 3 we present some important and promising techniques to manage P2P traffic, i.e., blocking, caching and localization, and compare their advantages and disadvantages. Section 4 concludes the article.

2 Key strategies in P2P systems

The implementations of P2P systems may differ from each other in many details; however, most of them share the same basic elements. For example, most P2P implementations split the content into multiple pieces to enable simultaneous downloading, and most of them need to make decisions on the order of requesting wanted pieces. Androutsellis-Theotokis and Spinellis [4], as well as Lua et al. [5], have published two representative survey papers on P2P systems. In this section, we do not repeat their surveys of all aspects of P2P systems. Instead, we emphasize several key strategies that are essential for ISPs' traffic management, introduce the updated details of different implementations, and summarize recent research efforts on evaluating the performance of these implementations.

2.1 File splitting strategy

In order to speed up the distribution of a resource file, most P2P systems split the file into fixed-size pieces (except the last piece), which enables P2P clients to download data from multiple peers simultaneously. Here, a "piece" is the smallest unit of data sharing, which means that a peer must have at least one complete piece before it can provide uploading service to other peers.
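As a toy illustration of this strategy (the function name and the example file size are ours, not taken from any particular client), splitting a file into fixed-size pieces with a possibly shorter final piece can be sketched in Python as:

```python
def split_into_pieces(file_size: int, piece_size: int) -> list:
    """Return the sizes of the pieces a file of file_size bytes is split
    into; every piece is piece_size bytes except possibly the last one."""
    if file_size <= 0 or piece_size <= 0:
        raise ValueError("sizes must be positive")
    full, rest = divmod(file_size, piece_size)
    return [piece_size] * full + ([rest] if rest else [])

# Example: a 1 MB file with BitTorrent's default 256 KB piece size
pieces = split_into_pieces(1_000_000, 256 * 1024)
# Three full 256 KB pieces plus one shorter final piece.
```

Downloading can then proceed piece by piece from different peers, which is exactly what makes parallel distribution possible.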

Different P2P systems use different terminologies for "piece", and they also set different piece sizes. In BitTorrent, the default piece size is 256 KB. BitTorrent accepts other piece sizes, but the piece size must be a positive integer power of two. eMule uses the terminology "part". The part size is fixed at about 9.28 MB, and other part sizes are not allowed. In PPLive, the unit is called a "chunk", and the chunk size is set to 2 MB. For efficient scheduling of transmission, each piece is further divided into multiple subpieces, and a "subpiece" is the smallest unit of data transfer between peers. In other words, when a piece is available from multiple remote peers, the local peer can decide from which remote peer to download the piece; once it decides, it must download all subpieces of this piece from the same remote peer. The terminologies and default parameters of some representative P2P systems are summarized in Table 1.

Piece size is related to the degree of parallelism available in the system, so it is potentially critical for the performance of traffic distribution. Marciniak et al. study the influence of piece size on system performance by varying piece sizes on a controlled BitTorrent testbed [6]. Their experiments are conducted on PlanetLab with private torrents sharing files of different sizes. The experiments show that smaller piece sizes enable shorter download times and higher upload utilization for small content (e.g., 1 MB, 5 MB and 10 MB), while for large files the optimal piece size increases with content size. The authors give two possible reasons for the drawbacks of small piece sizes: one is that they reduce opportunities for subpiece request pipelining; the other is that they may incur slowdown due to TCP effects.

Table 1 Some terminologies and parameters in P2P systems

System      Terminology   Design for                                      Default size   Fixed?
eMule       Part          Unit for advertisement and validity checkout    9500 KB        Yes
eMule       Block         Unit for transmission and validity checkout     180 KB         Yes
BitTorrent  Chunk         Unit for advertisement and validity checkout    256 KB         No
BitTorrent  Block         Unit for transmission                           16 KB          No
PPLive      Chunk         Unit for storage and advertisement              2 MB           Yes
PPLive      Piece         Unit for playback                               16 KB          Yes
PPLive      Sub-piece     Unit for transmission                           1 KB           Yes

2.2 Piece selection strategy

When a local peer needs multiple file pieces and these pieces are available from remote peers in its peer set, the piece selection strategy determines which piece should be requested first.


The most well-known piece selection strategy is the rarest first piece selection algorithm. With this algorithm, the local peer maintains, for each file piece, the number of owners in its peer set, and it always requests the piece with the fewest owners first. Another popular strategy is random piece selection, in which the local peer always randomly selects an available piece to request. Most P2P systems employ the rarest first algorithm with various minor modifications to adapt to different situations. For example, in BitTorrent, each peer downloads its first four pieces randomly so that it has some pieces to reciprocate with under the choke algorithm; after four pieces, it switches to the rarest first strategy [7]. eMule's basic principle is also rarest first, but it gives priority to the pieces used for preview and file checking (the first and last pieces). PPLive also deploys a mixed strategy, giving first priority to a sequential strategy, by which the piece closest to what is needed for video playback is selected, and then rarest first [8]. For P2P streaming systems, two factors must be balanced in the piece selection algorithm: greedy piece selection for playback urgency and rarest piece selection for distribution efficiency. Therefore, some mixed strategies have been proposed [9, 10]. In [11], the authors demonstrate that the optimal piece selection policy can change with the number of peers; they therefore propose an Adaptive Piece Selection algorithm and also design a protocol to propagate the optimal chunk selection policy to all peers. In [12], experiments show that the rarest first strategy performs better than the random strategy in terms of multiple metrics. The authors conclude that the rarest first strategy is critical in eliminating the last piece problem1 and in ensuring that new peers quickly have something to offer to other peers.
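The rarest first rule described above can be sketched as follows (the names and the random tie-breaking detail are our assumptions, not code from any client):

```python
import random

def rarest_first(wanted, availability):
    """Pick the wanted piece owned by the fewest peers in the peer set.

    wanted: set of piece indices the local peer still needs.
    availability: dict mapping piece index -> number of owners in the
    peer set. Ties are broken at random to avoid all peers herding onto
    the same piece.
    """
    candidates = [p for p in wanted if availability.get(p, 0) > 0]
    if not candidates:
        return None  # no wanted piece is currently available
    fewest = min(availability[p] for p in candidates)
    return random.choice([p for p in candidates if availability[p] == fewest])
```

Random selection, by contrast, would simply be `random.choice(candidates)`; the difference is only the `min` over owner counts, yet it is exactly this bias toward scarce pieces that keeps a swarm's pieces well replicated.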
In [7], the authors argue that a piece selection strategy is efficient if each peer can always find an interesting piece at any other peer. Based on this argument, they define an efficiency metric called the entropy of a torrent. Their experiments demonstrate that the rarest first algorithm achieves close-to-ideal entropy, and that its replacement by more complex solutions cannot be justified. However, in [13], the authors claim that the random strategy performs better than rarest first for the slowest nodes. The authors suspect that this is because, for the slowest nodes, the rarest first strategy operates on potentially stale information due to lossy links. Simulations presented in [14] also show that the rarest first algorithm may result in scarcity of some pieces, which finally brings long finish times, and the authors propose a solution based on network coding. But this solution faces several complex deployment issues such as security and computational cost [7]. Some of the conclusions mentioned above are inconsistent. We believe that the performance of piece selection algorithms should be further investigated with justified metrics and diverse network conditions.

1 The last piece problem means a node has difficulty finding a peer that possesses the last piece, increasing the overall download time significantly in distribution systems.

2.3 Peer selection strategy

After a local peer pi sends an "interest" message to a remote peer, say pj, pj can decide whether it will provide upload service to pi. The strategy a peer uses to make this decision is called the peer selection strategy. In [7], the authors argue that the goals of a peer selection algorithm should be fairness and system capacity maximization. The choke algorithm, also called tit-for-tat, is the peer selection strategy used in BitTorrent. With this algorithm, pj sorts all peers who are interested in pj according to their uploading rates to pj, and only the first three peers are unchoked, which means they get the chance to connect to pj for data downloading. In order to give a chance to new peers, one additional peer is unchoked at random. After pj obtains all file pieces, its peer selection strategy changes slightly: peers are ordered by their downloading rates from pj, so as to maximize the utilization of seed peers. In eMule, pj orders the peers who are interested in pj by two metrics: the amount (instead of the rate, as in BitTorrent) of data uploaded to pj by the peer, and the time the peer has spent in pj's upload queue. eMule also allows pj to define some peers as "friends" and give those peers the highest priority. As a media streaming system, PPLive does not have any built-in controls that allow users to adjust their contribution levels [8]. The client software must provide uploading service to continue its own playback. If neighboring peers cannot supply a sufficient downloading rate, the PPLive content server can always be used to supplement the need. The fairness of the choke algorithm is disputable. In [12], the authors claim that the current rate-based tit-for-tat policy is not effective in preventing unfairness in terms of the volume of content served, and they propose to employ a block-based tit-for-tat algorithm to improve fairness.
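A minimal sketch of the unchoking rule described above (a simplification: real BitTorrent clients recompute this on a timer and rotate the optimistic unchoke; all names and the default of three regular slots are taken from the description above):

```python
import random

def select_unchoked(upload_rates, n_regular=3):
    """Choose which interested peers to unchoke.

    upload_rates: dict peer_id -> rate at which that peer uploads to us.
    The n_regular fastest uploaders are unchoked (tit-for-tat reciprocity),
    plus one extra peer unchoked at random (the optimistic unchoke) so that
    newcomers with no history get a chance to bootstrap.
    """
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = ranked[:n_regular]
    others = ranked[n_regular:]
    if others:
        unchoked.append(random.choice(others))
    return unchoked
```

A seed would instead rank peers by its download... rather, by its upload rate to them (its own sending rate), matching the post-completion behavior described above.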
Jun and Ahamad study this issue in a game-theoretic framework and conclude that the current algorithm is "susceptible to free riding" [15]. They also provide a more "robust" mechanism. However, a different conclusion is drawn in [7]: rate-based tit-for-tat is enough for fairness. This is partly because the two works understand the context of P2P file replication in different ways and thus propose different fairness criteria. Many researchers have proposed new peer selection algorithms to improve the fairness of P2P systems. Sherman et al. state that all rate-based approaches, such as tit-for-tat and the proportional response algorithm, suffer from a fundamental flaw, i.e., the requirement to estimate neighboring peers' rates. They present a deficit-based distributed P2P algorithm, FairTorrent, which runs locally at each peer and maintains a deficit counter for each neighbor representing the difference between bytes sent to and bytes received from that neighbor [16]. The authors of [17] study peer selection strategies (also including piece selection strategies) from a different perspective. They formulate collaborative file distribution as a scheduling problem in a simplified context, and develop several algorithms, i.e., rarest piece first, most demanding node first, and maximum-flow algorithms, to solve the scheduling problem. Their simulation results show that the graph-based dynamically weighted maximum-flow algorithm, which dynamically takes into account the rarity of file pieces, the demands of nodes and the number of concurrent transmissions, outperforms all other algorithms. The authors argue that their algorithm is a promising solution to be employed as the core scheduling module in P2P file sharing applications.
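The deficit counter at the heart of FairTorrent can be sketched as follows (our simplification of the idea in [16], not the authors' implementation): each neighbor's counter is bytes sent minus bytes received, and the next block goes to the neighbor with the lowest counter, i.e., the one to whom the local peer "owes" the most.

```python
class DeficitSelector:
    """Per-neighbor deficit counters: bytes_sent - bytes_received.

    Serving the neighbor with the smallest deficit drives every pairwise
    exchange toward balance without ever estimating the neighbor's rate.
    """

    def __init__(self):
        self.deficit = {}  # neighbor -> bytes sent minus bytes received

    def on_sent(self, peer, nbytes):
        self.deficit[peer] = self.deficit.get(peer, 0) + nbytes

    def on_received(self, peer, nbytes):
        self.deficit[peer] = self.deficit.get(peer, 0) - nbytes

    def next_peer(self, interested):
        """Among interested neighbors, pick the one with the lowest deficit;
        unknown neighbors start at a deficit of zero."""
        return min(interested, key=lambda p: self.deficit.get(p, 0))
```

Because the counters are purely local byte counts, the scheme sidesteps the rate-estimation flaw that Sherman et al. identify in tit-for-tat and proportional response.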

3 ISPs' P2P traffic management

In many provider networks, the traffic generated by P2P applications has overtaken the previously dominant traffic such as Web and email. In order to ensure the quality of other services, ISPs have to increase their bandwidth provisioning or take measures to reduce P2P traffic, especially inter-domain P2P traffic. In [18], Halme summarizes ISPs' strategies into four categories: tolerate, limit, hamper and control. In this section, we review three P2P traffic management techniques, i.e., blocking, caching, and localization, where blocking limits P2P usage, and caching and localization control it.

3.1 P2P blocking

The simplest way for ISPs to reduce P2P traffic is to identify P2P traffic and then block it. The key issue here is how to identify P2P traffic. In the beginning, P2P systems used well-known port numbers for their communications, so ISPs could easily block P2P traffic by filtering traffic coming from or going to those port numbers. In order to evade ISPs' detection, most P2P systems then started to use random or user-designated port numbers. It is reported that the accuracy rate of port-based detection has dropped to less than 50%. In [19], the authors also present several limitations of this method in a more general context. Therefore, many researchers now try to develop methodologies based on other traffic features. In [20], the algorithm is based on topological features such as a large network diameter and the presence of many hosts acting both as servers and as clients. In [21], the authors propose to classify traffic flows based on three simple properties of the captured IP packets: their size, inter-arrival time and arrival order. In [22], the authors focus on Skype-relayed traffic, proposing to detect these flows based on thresholds of start and end time differences, byte size ratio, and the maximum cross-correlation between two relayed bursts of packets. There are also research efforts that exploit neural networks, data mining or machine learning techniques [23, 24]. However, we do not see any deployment of these methods in the real Internet. Today, the systems deployed by ISPs are often based on a technique called deep packet inspection (DPI), which looks into the packet payload to find application-level signatures of P2P traffic [25]. Well-known DPI products include L7-filter, PDML from Cisco, netscreen-IDP from Juniper, Engage from P-Cube, PPTM from ARA, NetEnforcer from Allot, etc. DPI can only identify P2P applications with known signatures; it cannot detect emerging P2P applications or encrypted P2P traffic flows. The other disadvantage is that these systems often consume a lot of computation resources and storage space, which makes them unsuitable for deployment on backbone links. In [26], the authors study the phenomenon of BitTorrent traffic blocking in the current Internet. Their results show that most of the ISPs doing traffic blocking appear to be using DPI technology, and that most blocking is in the upstream direction, while downloading traffic is rarely interfered with. As a summary, in Table 2 we present the advantages and disadvantages of the three identification methods mentioned above.

Table 2 Detecting and blocking P2P traffic

Port number. Advantages: easy to implement and deploy. Disadvantages: cannot detect applications with random or user-designated ports.
Flow features. Advantages: generality; can deal with encrypted applications or new applications. Disadvantages: low accuracy rate; immature to deploy.
Deep packet inspection. Advantages: mature technique; high accuracy rate. Disadvantages: high computational complexity; cannot apply to encrypted applications or new applications with unknown signatures.

3.2 P2P caching

Obviously, ISPs' traffic blocking degrades users' experience and results in battles between ISPs and network users. Some researchers instead try to apply the traditional caching technology of Web traffic to P2P traffic, to alleviate the load on the Internet backbone. Generally speaking, a cache system for P2P traffic should implement the following blocks: flow capture, protocol analysis and classification, a cache matching algorithm, traffic forwarding and a cache replacement policy [27]. Based on trace analysis, the authors of [28] state that P2P caching has a theoretical potential of a 67% byte hit rate, which even exceeds the high end of HTTP caching systems, and that 200 GB of disk space would suffice to achieve considerable caching results. They conclude that P2P traffic over inter-domain links is highly repetitive and consequently responds well to caching. Simulation results in [29] also show that more than 30% of inter-ISP traffic could be saved with a relatively small cache size. In [30], the authors develop a caching algorithm for P2P traffic. Their trace-based simulations show a byte hit rate of up to 35%, which is 40–300% of the byte hit rate of common Web caching algorithms. They also present a measurement result that the popularity of P2P objects follows a Mandelbrot-Zipf distribution regardless of the AS, which has a negative effect on the hit rates of caches that use LRU or LFU policies. Most research efforts focus on deploying a cache server on a single link, while in [31], researchers study how to deploy cache devices on multiple backbone links to maximize ISPs' revenue. The authors define a link benefit utility function to evaluate the benefits of different deployments. Based on this utility function, the issue of how to place cache servers is modeled as an optimization problem. The authors also propose a greedy algorithm and a branch-and-bound algorithm (for small networks) to solve the optimization problem and find the best deployment.

In [32], the authors propose to use a "passive peer" to provide resource-cache-equivalent functions on protocol-closed P2P networks. The passive peer is realized by executing the corresponding P2P application without any change, but it does not perform any active operations such as resource creation or resource request origination. As a result, the passive peer behaves as a resource cache. Since it is aware of the physical network topology, inter-domain P2P traffic is expected to decrease. An experiment in an ISP network with 2 million IP addresses demonstrates that the proposed method can decrease inter-domain traffic by about 2–45%. Similarly, Papafili et al. propose to insert high-bandwidth ISP-owned peers as an optimization approach to improve end users' performance and reduce inter-domain traffic [33]. They also show that the insertion of an ISP-owned peer can effectively complement the use of locality awareness.

Although many research efforts claim that P2P caches can reach a very high byte hit rate, there are many difficulties in implementing and deploying such caches. Firstly, traditional P2P cache systems face the same challenge as P2P blocking: how to identify various P2P applications, especially emerging and encrypted ones. Secondly, although various P2P systems are based on similar basic peer-to-peer principles, they may implement communication among peers in different ways; as a result, it is not easy to design a general platform to cache the traffic of different P2P applications. Thirdly, P2P cache servers need to perform more functions than traditional Web cache servers, so these servers must have more computation power, network bandwidth, etc. In fact, P2P caching is contradictory to the basic idea of P2P applications, namely making systems distributed to avoid performance bottlenecks. Last but not least, ISPs may run into legal issues, since they may engage in caching illegal content.
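To make the byte hit rate metric used in these studies concrete, the following toy simulation replays a request trace through an LRU object cache (illustrative only; a real P2P cache operates on pieces and must first classify the traffic, and all names here are ours):

```python
from collections import OrderedDict

def lru_byte_hit_rate(trace, capacity):
    """Simulate an LRU object cache over a trace of (object_id, size)
    requests and return the byte hit rate: bytes served from cache
    divided by total bytes requested."""
    cache = OrderedDict()  # object_id -> size, oldest first
    used = 0
    hit_bytes = total_bytes = 0
    for obj, size in trace:
        total_bytes += size
        if obj in cache:
            hit_bytes += size
            cache.move_to_end(obj)  # refresh recency
            continue
        # Evict least-recently-used objects until the new one fits
        while used + size > capacity and cache:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        if size <= capacity:
            cache[obj] = size
            used += size
    return hit_bytes / total_bytes if total_bytes else 0.0
```

Under a Mandelbrot-Zipf popularity distribution, as measured in [30], the long flat head of the curve means many requests go to objects an LRU/LFU cache has already evicted, which is why such traces depress the hit rates this function would report.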

3.3 P2P localization

In recent years, researchers have proposed to improve P2P algorithms by exploiting traffic localization to reduce inter-domain P2P traffic. The basic idea is to direct peers to download pieces from peers in the same ISP, i.e., biased peer selection. This not only reduces the transmission cost of ISPs, but may also improve users' P2P experience, since the available capacity of intra-domain links is often larger than that of inter-domain links. Traffic localization is thus beneficial for both ISPs and network users, and it is a good example of "cooperation" between the two parties. Some researchers conduct theoretical analyses or simulations to quantify the possible savings of locality-aware P2P implementations. In [34], trace-based simulations show that an ideal locality-aware scheme can obtain impressive external bandwidth savings: about a 68% byte saving for large objects and a 37% byte saving for small objects. In [35], based on a P2P traffic simulator called J-Sim, the authors evaluate three different P2P traffic management policies, i.e., no peer selection, preferred ISP and preferred metro area, showing that traffic localization using a peer selection policy at super peers can contain as much as 40% of P2P traffic within the local metropolitan network. In [36], packet analysis on an edge network shows that, in the current Internet, 50–90% of the pieces already present locally at active users are downloaded externally. Their simulations demonstrate that the locality scheme is beneficial for both end users and ISPs. About 70% of the peers show an increased mean download rate in the locality scenario, and 24% of the peers experience a more than 50% faster download rate. For ISPs, the locality scheme can reduce the ISP's ingress link utilization by a factor of 2, and the traffic uploaded externally is reduced by more than a factor of 6. However, the effectiveness of biased peer selection may vary with the situation. In [37], the authors present three findings on the conditions that ensure the effectiveness of locality schemes. First, the original seed should have moderately high upload bandwidth to ensure no degradation in download times. Second, the rarest first algorithm is key to the success of biased neighbor selection, while the random piece selection algorithm does not work well. Third, higher-bandwidth external peers reduce the effectiveness of biased neighbor selection. How to find the nearest or most suitable peers is the key problem in all locality schemes.
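The biased neighbor selection idea evaluated in [37] can be sketched as follows (a simplification; the 35-neighbor/5-external split and all names are illustrative assumptions, not values from the paper): fill most neighbor slots from the local ISP, but keep a few external peers so the local swarm stays connected to the global one.

```python
import random

def biased_neighbors(candidates, my_isp, n_total=35, n_external=5):
    """Pick a neighbor set biased toward the local ISP.

    candidates: list of (peer_id, isp) pairs, e.g. from a tracker reply.
    Most slots go to same-ISP peers; a small quota of external peers
    preserves connectivity to the rest of the swarm. If there are too few
    internal peers, the shortfall is filled with more external ones.
    """
    internal = [p for p, isp in candidates if isp == my_isp]
    external = [p for p, isp in candidates if isp != my_isp]
    chosen_ext = random.sample(external, min(n_external, len(external)))
    n_int = n_total - len(chosen_ext)
    chosen_int = random.sample(internal, min(n_int, len(internal)))
    shortfall = n_total - len(chosen_int) - len(chosen_ext)
    leftover = [p for p in external if p not in chosen_ext]
    chosen_ext += random.sample(leftover, min(shortfall, len(leftover)))
    return chosen_int + chosen_ext
```

The finding in [37] that rarest first is essential makes sense in this sketch: with mostly same-ISP neighbors, random piece selection quickly makes the local peers' piece sets redundant, while rarest first keeps them complementary.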
The difficulty of finding the nearest peer in terms of latency in P2P systems is studied in [38]. Based on different ways of selecting suitable peers, researchers have presented different locality-aware P2P solutions, e.g., UTAPS [39], Ono [40], Oracle [41], P4P [42], etc. UTAPS [39] selects peers based on RTT and hop count information collected by traceroute running on all trackers and peers. This scheme requires modifications on both servers and clients to enable active measurements, and these active measurements introduce overhead on both nodes and links. In order to avoid such overhead, the authors of [40] design their biased peer selection algorithm based on information collected by content distribution networks (CDNs). CDNs attempt to improve Web performance by delivering content to end users from multiple, geographically dispersed servers located at the edge of the network. Clients' requests can be dynamically forwarded to topologically proximate replicas. The authors posit that if two clients are sent to a similar set of replica servers by the CDN, they are likely to be close to these servers and, more importantly, to each other. They develop a plugin called Ono, which performs periodic DNS lookups on popular CDN names. When Ono determines that a peer has similar redirection behavior, it attempts to bias traffic toward that peer by ensuring there is always a connection to it, which minimizes the time that the peer is choked. Ono has been installed by over 120,000 subscriber peers distributed worldwide and has demonstrated good performance. In [41], the authors propose a solution where ISPs help P2P systems by offering an oracle service, which ranks the potential neighbors of a peer according to certain metrics such as AS hops, geographical information or traffic engineering concerns. Since ISPs have direct access to a lot of information about their physical networks, they do not need to do extra measurement or inference. Xie et al. propose a similar solution called P4P [42], where each ISP deploys an "iTracker" as a portal operated by the network provider. The iTracker allows the trackers of P2P applications (appTrackers) to query the costs and distances (p-distances) between peers. There is a difference between [39, 40] and [41, 42]: in [39, 40], peers launch measurements and make connection decisions according to the measurement results, while in the algorithms of [41, 42] (in settings with trackers), ISPs rank remote peers according to their own information and interests. P2P localization is the most promising technique to reduce inter-domain P2P traffic. We summarize the organization and operation procedure of these solutions in Fig. 1.
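Ono's closeness test can be sketched as follows (our simplification: each peer keeps a ratio map recording how often each CDN replica appears in its DNS redirections, and two peers are treated as close when the cosine similarity of their maps exceeds a threshold; the 0.15 threshold and all names are illustrative assumptions):

```python
import math

def cosine_similarity(ratio_a, ratio_b):
    """Cosine similarity between two CDN redirection ratio maps
    (replica server -> fraction of DNS lookups resolving to it)."""
    common = set(ratio_a) & set(ratio_b)
    dot = sum(ratio_a[s] * ratio_b[s] for s in common)
    norm_a = math.sqrt(sum(v * v for v in ratio_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratio_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_nearby(ratio_a, ratio_b, threshold=0.15):
    """Bias traffic toward a peer whose redirection behavior is similar:
    peers steered to the same replicas are likely close to each other."""
    return cosine_similarity(ratio_a, ratio_b) > threshold
```

The appeal of this design is that the measurement has already been paid for: the CDN's own redirection machinery encodes network proximity, and the peers merely observe it through ordinary DNS lookups.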
[Fig. 1 Locality-aware P2P solutions. The figure compares the organization and operation of four schemes, each showing how a peer obtains rank information before establishing connections: UTAPS (modification needed on client software; information collected by peers), Ono (modification needed on client software; information collected by peers via DNS lookups), Oracle (oracle deployed by the ISP; modification needed on client software; information provided by ISPs), and P4P (iTracker deployed by the ISP; modification needed on tracker software; information provided by ISPs). The key step in each scheme is getting the rank information.]

Different from the biased peer selection systems mentioned above, the authors of [43] propose Biased Unchoking, which is motivated by the fact that the choke algorithm has a major impact on which peers exchange data and how much. Their comparison study shows that Biased Unchoking works best in scenarios with high load on the swarm, and that the combination of Biased Neighbor Selection with Biased Unchoking leads to the best performance. A new IETF working group called ALTO was established in 2009 to design and specify an Application-Layer Traffic Optimization (ALTO) service that will provide applications with information to perform better-than-random initial peer selection (ALTO, http://www.ietf.org/dyn/wg/charter/alto-charter.html). The Working Group will consider the needs of BitTorrent, tracker-less P2P, and other applications, such as content delivery networks (CDNs) and mirror selection. Recent developments in the IETF ALTO Working Group are summarized in [44]. Gurbani, Hilt et al. also conduct a survey of research on the application-layer traffic optimization problem and the need for layer cooperation in [45], where the authors state that the ALTO problem will be best addressed by enabling communication between the P2P application layer and the network layer. As a summary of Section 3, we compare the P2P traffic management techniques discussed above in Table 3.

Table 3 ISPs' P2P traffic management schemes

P2P blocking. Advantages: ISPs have full control. Disadvantages: legal issues; based on identification; degrades user experience (ISPs may lose customers). References: [19–26].
P2P caching. Advantages: can be beneficial for both sides. Disadvantages: legal issues; based on identification; application-specific; computational complexity; possible performance bottleneck. References: [27–33].
Localization (peers). Advantages: can be beneficial for both sides; does not need extra infrastructure. Disadvantages: measurement overhead; computation power consumption of peers. References: [36, 39, 40].
Localization (ISPs). Advantages: can be beneficial for both sides; gets reliable information without measurement overhead. Disadvantages: must have mutual trust; needs extra infrastructure. References: [41, 42].

4 Conclusion

P2P systems generate a major portion of Internet traffic. As broadband deployment and flat-rate pricing for residential users continue, P2P applications will become more and more popular. The system performance of P2P applications and the challenges they pose for ISPs have attracted a lot of attention. In this article, we review recent research results on the key strategies and algorithms that have a significant influence on system performance, and on the network management techniques ISPs use to control P2P flows. As we point out in the article, there are still many open issues for researchers to investigate further, and P2P will continue to be an important area of networking research in the next few years.

References

1. Kulbak Y, Bickson D (2005) The emule protocol specification. Tech. Rep., http://www.cs.huji.ac.il/labs/danss/presentations/emule.pdf
2. Wang JH, Chiu DM, Lui JC (2008) A game-theoretic analysis of the implications of overlay network traffic on isp peering. Comput Networks 52(15):2961–2974
3. Karagiannis T, Broido A, Brownlee N, Claffy KC, Faloutsos M (2004) Is p2p dying or just hiding? In: GLOBECOM 2004, Dallas, Texas, USA. IEEE Computer Society Press
4. Androutsellis-Theotokis S, Spinellis D (2004) A survey of peer-to-peer content distribution technologies. ACM Comput Surv 36(4):335–371
5. Lua EK, Crowcroft J, Pias M, Sharma R, Lim S (2005) A survey and comparison of peer-to-peer overlay network schemes. IEEE Commun Surv Tutor 7:72–93
6. Marciniak P, Liogkas N, Legout A, Kohler E (2008) Small is not always beautiful. In: IPTPS '08, the 7th international workshop on peer-to-peer systems, Tampa Bay, Florida, USA
7. Legout A, Urvoy-Keller G, Michiardi P (2006) Rarest first and choke algorithms are enough. In: IMC '06, ACM SIGCOMM/USENIX conference, Rio de Janeiro, Brazil
8. Huang Y, Fu TZJ, Chiu D-M, Lui JCS, Huang C (2008) Challenges, design and analysis of a large-scale p2p-vod system. In: SIGCOMM '08: proceedings of the ACM SIGCOMM 2008 conference on data communication, Seattle, WA, USA, pp 375–388
9. Zhou Y, Chiu DM, Lui JCS (2007) A simple model for analyzing p2p streaming protocols. In: IEEE international conference on network protocols, ICNP 2007, pp 226–235. doi:10.1109/ICNP.2007.4375853
10. Vlavianos A, Iliofotou M, Faloutsos M (2006) Bitos: enhancing bittorrent for supporting streaming applications. In: IEEE global internet, pp 1–6
11. Zhao B, Lui J, Chiu D-M (2009) Exploring the optimal chunk selection policy for data-driven p2p streaming systems. In: IEEE ninth international conference on peer-to-peer computing, P2P '09, pp 271–280
12. Bharambe AR, Herley C, Padmanabhan VN (2006) Analyzing and improving a bittorrent network's performance mechanisms. In: IEEE Infocom 2006, Barcelona, Spain
13. Kostic D, Braud R, Killian CE, Vandekieft E, Anderson JW, Snoeren AC, Vahdat A (2005) Maintaining high bandwidth under dynamic network conditions. In: ATEC '05: proceedings of the annual conference on USENIX annual technical conference
14. Gkantsidis C, Rodriguez P (2005) Network coding for large scale content distribution. In: IEEE Infocom 2005, Miami, USA
15. Jun S, Ahamad M (2005) Incentives in bittorrent induce free riding. In: P2PECON '05: proceedings of the 2005 ACM SIGCOMM workshop on economics of peer-to-peer systems, pp 116–121
16. Sherman A, Nieh J, Stein C (2009) Fairtorrent: bringing fairness to peer-to-peer systems. In: CoNEXT '09: proceedings of the 5th international conference on emerging networking experiments and technologies. ACM, New York, NY, USA, pp 133–144
17. Chan JS, Li VO, Lui K-S (2007) Performance comparison of scheduling algorithms for peer-to-peer collaborative file distribution. IEEE J Sel Areas Commun 25(1):146–154
18. Halme A (2005) Peer-to-peer traffic: impact on isps and evaluation of traffic management tools. In: HUT T-110.551 seminar on internetworking
19. Roughan M, Sen S, Spatscheck O, Duffield N (2004) Class-of-service mapping for qos: a statistical signature-based approach to ip traffic classification. In: IMC '04: proceedings of the 4th ACM SIGCOMM conference on internet measurement, New York, USA, pp 135–148
20. Constantinou F, Mavrommatis P (2006) Identifying known and unknown peer-to-peer traffic. In: IEEE international symposium on network computing and applications, pp 93–102
21. Crotti M, Dusi M, Gringoli F, Salgarelli L (2007) Traffic classification through simple statistical fingerprinting. Comput Commun Rev 37(1):5–16
22. Suh K, Figueiredo DR, Kurose J, Towsley D (2006) Characterizing and detecting skype-relayed traffic. In: INFOCOM 2006, 25th IEEE international conference on computer communications, Barcelona, Spain, pp 1–12
23. Moore AW, Zuev D (2005) Internet traffic classification using bayesian analysis techniques. In: ACM SIGMETRICS 2005, Banff, Alberta, Canada
24. Couto A, Nogueira A, Salvador P, Valadas R (2008) Identification of peer-to-peer applications' flow patterns. In: Next generation internet networks, NGI 2008, pp 292–299
25. Sen S, Spatscheck O, Wang D (2004) Accurate, scalable in-network identification of p2p traffic using application signatures. In: WWW '04: proceedings of the 13th international conference on world wide web, pp 512–521
26. Dischinger M, Mislove A, Haeberlen A, Gummadi KP (2008) Detecting bittorrent blocking. In: Proceedings of the 8th ACM SIGCOMM conference on internet measurement (IMC '08), Vouliagmeni, Greece
27. Wierzbicki A, Leibowitz N, Ripeanu M, Wozniak R (2004) Cache replacement policies revisited: the case of p2p traffic. Los Alamitos, CA, USA, pp 182–189
28. Leibowitz N, Bergman A, Ben-Shaul R, Shavit A (2002) Are file swapping networks cacheable? Characterizing p2p traffic. In: The 7th international workshop on web content caching and distribution (WCW '02), Boulder, CO, USA
29. Huang N-F, Chu Y-M, Tsai C-H, Huang W-Z, Tzeng W-J (2009) A resource-efficient traffic localization scheme for multiple bittorrents. In: IEEE international conference on communications, ICC '09, pp 1–5
30. Saleh O, Hefeeda M (2006) Modeling and caching of peer-to-peer traffic. In: IEEE international conference on network protocols, pp 249–258
31. Ye M, Wu J, Xu K (2008) Caching the p2p traffic in isp network. In: IEEE international conference on communications, ICC '08, pp 5876–5880
32. Tagami A, Hasegawa T, Hasegawa T (2004) Analysis and application of passive peer influence on peer-to-peer inter-domain traffic. In: P2P '04: proceedings of the fourth international conference on peer-to-peer computing. IEEE Computer Society, Washington, DC, USA, pp 142–150
33. Papafili I, Soursos S, Stamoulis GD (2009) Improvement of bittorrent performance and inter-domain traffic by inserting isp-owned peers. In: ICQT '09: proceedings of the 6th international workshop on internet charging and QoS technologies. Springer-Verlag, Berlin, Heidelberg, pp 97–108
34. Gummadi KP, Dunn RJ, Saroiu S, Gribble SD, Levy HM, Zahorjan J (2003) Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. SIGOPS Oper Syst Rev 37(5):314–329
35. Hamada T, Chujo K, Chujo T, Yang X (2004) Peer-to-peer traffic in metro networks: analysis, modeling, and policies. In: Network operations and management symposium, NOMS 2004, IEEE/IFIP, vol 1, pp 425–438
36. Karagiannis T, Rodriguez P, Papagiannaki K (2005) Should internet service providers fear peer-assisted content distribution? In: IMC '05: proceedings of the 5th ACM SIGCOMM conference on internet measurement, pp 63–76
37. Bindal R, Cao P, Chan W, Medved J, Suwala G, Bates T, Zhang A (2006) Improving traffic locality in bittorrent via biased neighbor selection. In: 26th IEEE international conference on distributed computing systems, ICDCS 2006, p 66
38. Vishnumurthy V, Francis P (2008) On the difficulty of finding the nearest peer in p2p systems. In: IMC '08: proceedings of the 8th ACM SIGCOMM conference on internet measurement. ACM, New York, NY, USA, pp 9–14
39. Li W, Chen S, Yu T (2008) Utaps: an underlying topology-aware peer selection algorithm in bittorrent. In: International conference on advanced information networking and applications, pp 539–545
40. Choffnes DR, Bustamante FE (2008) Taming the torrent: a practical approach to reducing cross-isp traffic in peer-to-peer systems. SIGCOMM Comput Commun Rev 38(4):363–374
41. Aggarwal V, Feldmann A, Scheideler C (2007) Can isps and p2p users cooperate for improved performance? SIGCOMM Comput Commun Rev 37(3):29–40
42. Xie H, Yang YR, Krishnamurthy A, Liu Y, Silberschatz A (2008) P4P: provider portal for applications. In: Proceedings of ACM SIGCOMM, Seattle, WA
43. Oechsner S, Lehrieder F, Hossfeld T, Metzger F, Staehle D, Pussep K (2009) Pushing the performance of biased neighbor selection through biased unchoking. In: IEEE ninth international conference on peer-to-peer computing, P2P '09, pp 301–310
44. Seedorf J, Kiesel S, Stiemerling M (2009) Traffic localization for p2p-applications: the alto approach. In: IEEE ninth international conference on peer-to-peer computing, P2P '09, pp 171–177
45. Gurbani V, Hilt V, Rimac I, Tomsu M, Marocco E (2009) A survey of research on the application-layer traffic optimization problem and the need for layer cooperation. IEEE Commun Mag 47(8):107–112

Jessie Hui Wang received her Ph.D. degree from the Department of Information Engineering at the Chinese University of Hong Kong. She is now an assistant research professor at Tsinghua University. Her research interests include traffic engineering, routing protocols and economic analysis of data networks.

Chungang Wang received his master's degree from the Department of Computer Science at Tsinghua University in 2009. His research interests include measurement and analysis of P2P applications and other overlay networks.


Jiahai Yang received his Ph.D. degree in computer science from Tsinghua University, Beijing, P.R. China. He is now a professor at Tsinghua University. His research interests include Internet architecture and its protocols, IP routing technology, network measurement, and network management.

Changqing An received her master's degree from the Department of Computer Science and Technology at Tsinghua University. She is now an associate professor at Tsinghua University. Her research focuses on theory and technology related to network operation and management.