Delft University of Technology Parallel and Distributed Systems Report Series

On Assessing Measurement Accuracy in BitTorrent Peer-to-Peer File-Sharing Networks

Boxun Zhang, Alexandru Iosup, Johan Pouwelse, Dick Epema, and Henk Sips
{B.Zhang,A.Iosup,J.Pouwelse,D.H.J.Epema,H.J.Sips}@tudelft.nl

Report number PDS-2009-005

ISSN 1387-2109

Published and produced by:
Parallel and Distributed Systems Section
Faculty of Information Technology and Systems
Department of Technical Mathematics and Informatics
Delft University of Technology
Zuidplantsoen 4
2628 BZ Delft
The Netherlands

Information about Parallel and Distributed Systems Report Series: [email protected]
Information about Parallel and Distributed Systems Section: http://pds.twi.tudelft.nl/

© 2009 Parallel and Distributed Systems Section, Faculty of Information Technology and Systems, Department of Technical Mathematics and Informatics, Delft University of Technology. All rights reserved. No part of this series may be reproduced in any form or by any means without prior written permission of the publisher.

PDS B. Zhang et al.

Abstract

The BitTorrent peer-to-peer file-sharing network is currently one of the dominant Internet applications. Understanding the characteristics of BitTorrent through real-world measurements is key to improving the quality of service for tens of millions of BitTorrent users, but the complexity and scale of BitTorrent make a single, complete measurement impractical. Thus, an increasing number of real measurements have employed diverse sampling techniques to study the BitTorrent network. However, no study has investigated the accuracy of the findings of the different measurement techniques used in practice. To address this gap, in this work we propose a thorough investigation of the accuracy of BitTorrent measurement techniques. To this end, we first introduce a taxonomy of inaccuracy sources. We then investigate the effect of these sources using 15 long-term BitTorrent datasets collected from 9 BitTorrent communities between 2004 and 2009. We find that most reported measurements are based on techniques that can lead to inaccurate characterization of system properties.


http://www.pds.ewi.tudelft.nl/∼iosup/


Contents

1 Introduction
2 Background
  2.1 Terminology
  2.2 BitTorrent
3 Estimating Measurement Inaccuracy
  3.1 Estimating Accuracy
  3.2 Data Source Selection
  3.3 Data Volume Reduction
4 The Collected Traces
  4.1 T1: BT-TUD-1, SuprNova
  4.2 T2: BT-TUD-2, PirateBay
  4.3 T3: LegalTorrents.com
  4.4 T4: etree.org
  4.5 T5: tlm-project.org
  4.6 T6: transamrit.net
  4.7 T7: unix-ag.uni-kl.de
  4.8 T8: idsoftware.com
  4.9 T9: boenielsen.dk
5 The Results
  5.1 Inaccuracy Due to Data Source Selection
    5.1.1 Measurement Level
    5.1.2 Community Type
    5.1.3 Passive vs. Active
  5.2 Inaccuracy Due to Data Volume Reduction
    5.2.1 Sampling rate and Duration
    5.2.2 Number of Communities and Number of Torrents
    5.2.3 Catching long-term dynamics
6 Related Work
7 Conclusion and Future Work
8 Acknowledgements


List of Figures

1 Swarm dynamics at swarm and peer level
2 Comparison of cumulative community throughput resulting from community-level measurement and swarm-level measurement
3 Comparison of file size distributions 2005
4 Comparison of file size distributions 2009
5 Download speed 2009
6 Upload speed 2009
7 Session length 2009
8 Sampling bias for one swarm
9 Sampling bias on swarm size, multi-swarm
10 Sampling bias stability
11 Comparison of overall peer coverage
12 Comparison of session length distributions
13 Comparison of download speed distributions
14 Comparison of download amount per peer distributions
15 Comparison of upload speed distributions
16 Comparison of download speed distributions
17 Comparison of main characteristics resulting from datasets acquired from a specific number of torrents coming from one community
18 Comparison of monthly file size distribution from T1
19 Comparison of monthly swarm size distributions from T1
20 Comparison of monthly throughput
21 Download speed of different continents
22 Download speed comparison
23 Torrent size comparison

List of Tables

1 Summary of the datasets

1 Introduction

Peer-to-peer (P2P) file-sharing networks such as BitTorrent serve millions of users daily and are responsible for a significant percentage of the total Internet traffic. Since BitTorrent is still relatively new, ongoing research still attempts to improve core functionality such as performance and tolerance to abuse. A deep understanding of the resources and of the usage patterns is the basis of system design and optimization; for BitTorrent, such understanding results from measuring real BitTorrent deployments. Despite a large number of BitTorrent measurements [18, 10, 16, 2], the lack of a thorough evaluation of the accuracy of these measurements prevents the comparison of measurement results. Our objective is to understand the accuracy of BitTorrent measurements.

The complexity, scale, and dynamics of P2P file-sharing networks make complete measurements impractical. In practice, measurements have to sample data from a part of the network, within resource and cost limits. Thus, the design of these measurements includes a hidden trade-off between cost and data accuracy. Much work [17, 18, 3, 5, 10, 16, 7] has been put into empirical measurements of P2P file-sharing systems, including BitTorrent, but only a few studies [8, 20, 2] consider the inaccuracy introduced by the measurements.

Understanding the accuracy of BitTorrent measurements facilitates the design, validation, and comparison of BitTorrent models and algorithms. Ultimately, using accurate data leads to improved user experience. In contrast, inaccurate data may highlight false problems and lead to impractical solutions.

Our work is further motivated by two real and immediate applications. First, we are continuing [22] our work to establish a publicly accessible P2P Workloads Archive. In a first phase, this archive will include the tens of P2P measurement datasets we have acquired since 2003, and in particular the 15 datasets we use in this work. Second, within the QLectives project (see footnote 1) we are currently taking new measurements of the BitTorrent network, and further measurements are planned. It is therefore important to develop a method and the tools to assess the measurement accuracy for each of these traces.

To address this situation, in this work we investigate the accuracy of various BitTorrent measurement techniques used in practice, and show that the existing measurement techniques need to reconsider the data sources and the volume of acquired data; otherwise, they produce inaccurate or even meaningless results. Our main contribution is twofold:

1. We propose a method for estimating the accuracy of BitTorrent measurements that focuses on two main axes and six main sources of measurement inaccuracy (Section 3);

2. We evaluate the effect of these sources using 15 long-term BitTorrent datasets collected from 9 BitTorrent communities between 2004 and 2009, and show evidence that the techniques used in practice today lead to inaccurate results (Section 5).

2 Background

In this section we introduce the background needed to understand the remainder of this work. Much of the P2P-related terminology and BitTorrent description in this section is adapted from our previous work on BitTorrent [16, 8, 22].

2.1 Terminology

A P2P system is a system that uses P2P technology to provide a set of services; together, these services form an application such as file sharing. We call peers the participants in a P2P system that contribute to or use the system’s resources and services. A peer is completely disconnected until it joins the system, and is active until it leaves the system. A real user may run several peer sessions; the sessions do not overlap in time. We call a swarm the group of peers, from all the peers in a P2P system, that interact with each other for a specific goal, such as transferring a file. A swarm becomes active when the first peer joins it, and ends its activity when its last peer leaves. The lifetime of a swarm is the period between the start and the end of the swarm. A community is the group of peers who are or can easily become aware of the existence of each other’s swarms.

Our view on P2P systems considers three levels of operation. A P2P system includes at least a peer level, but may also include any of the community and swarm levels. The definitions of community, swarm, and peers presented here are general for peer-to-peer systems, though their implementation may differ with the P2P protocol. For example, BitTorrent and eDonkey have different implementations and uses of the swarm concept.

Footnote 1: QLectives (http://www.qlectives.eu/) is a 7-million-euro, four-year project that started in March 2009, funded by the EU under the FP7 FET programme.

2.2 BitTorrent

In this work we focus on BitTorrent, a popular P2P file-sharing application. BitTorrent is currently the largest P2P file-sharing network, with an estimated Internet traffic share of over 50% in 2008 [1], up from 30% in 2004 [14]. BitTorrent includes all three levels of operation defined in Section 2.1, that is, community, swarm, and peer.

The files (torrents) transferred in BitTorrent contain two parts: the raw file data and a metadata (directory and information) part. Peers interested in a file obtain the file’s metadata from a web site (the community level of BitTorrent) and use the peer location services offered by a tracker (the swarm level of BitTorrent) to find other peers interested in sharing the file. The raw file is then exchanged between peers (the peer level of BitTorrent). To facilitate this exchange, the raw data are split into smaller parts, called chunks. Thus, to obtain a complete file a user has to obtain all the chunks of the file through the use of the three application levels.

In BitTorrent, a leecher is a peer who still needs chunks, and a seeder is a peer who has the complete torrent and shares it with other peers. A delicate part of the file-sharing process of BitTorrent is the contribution of bandwidth. In this context, freeriding is defined as the activity of a peer that does not contribute any bandwidth to the system, and hit-and-run is defined as the activity of a peer that does not contribute to the network after obtaining the complete file (that is, after becoming eligible to act as a seeder).
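The peer roles defined above can be expressed as simple predicates over per-peer counters. The following is a minimal sketch, not part of any BitTorrent client or of our measurement tools; the `PeerSnapshot` record and its field names are hypothetical and only assume that chunk counts and uploaded bytes are known for a peer at the time it leaves the swarm.

```python
from dataclasses import dataclass


@dataclass
class PeerSnapshot:
    """Hypothetical summary of one peer in one swarm, taken when the peer leaves."""
    chunks_total: int                       # number of chunks in the torrent
    chunks_held: int                        # chunks the peer has obtained
    bytes_uploaded: int                     # all bytes contributed to other peers
    bytes_uploaded_after_completion: int    # bytes contributed after holding all chunks


def is_leecher(peer: PeerSnapshot) -> bool:
    """A leecher still needs at least one chunk."""
    return peer.chunks_held < peer.chunks_total


def is_freerider(peer: PeerSnapshot) -> bool:
    """Freeriding: the peer contributes no bandwidth at all."""
    return peer.bytes_uploaded == 0


def is_hit_and_run(peer: PeerSnapshot) -> bool:
    """Hit-and-run: the peer holds the complete file but contributes nothing afterwards."""
    return (not is_leecher(peer)) and peer.bytes_uploaded_after_completion == 0
```

Under these assumptions, a peer that uploaded while downloading but left immediately after completion is a hit-and-run peer without being a freerider.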

3 Estimating Measurement Inaccuracy

In this section we introduce a method for estimating the inaccuracy of BitTorrent measurements. Our method focuses on two main questions that define a measurement process: What to measure? and How much to measure? The first question results from the complexity of BitTorrent. For example, there currently exist tens of BitTorrent communities to choose from, each with its own web sites and trackers, and possibly its own usage characteristics. The second question expresses the trade-off between accuracy and volume of measurement data. Since the data are often collected from sources within different administrative domains (i.e., users), BitTorrent measurements are inherently limited in size. Corresponding to the two main questions, two main aspects influence the measurement accuracy: the source of the data to be collected, and the volume of the data to be collected. We first introduce the characteristics of the system that we want to observe, and for which we want to understand the sources of inaccuracy. We then present in turn the sources of inaccuracy stemming from the data source selection and from the data volume reduction.

3.1 Estimating Accuracy

Following traditional work on modeling Internet traffic [11, 9], system designers can gain much by understanding, at the community level, the sizes of the files that are shared; at the swarm level, the arrival and departure processes; and at the peer level, the application-level bandwidth. We distinguish between the complete and the transient swarm population: we define the swarm population as the set of peers present in the swarm at any time during the measurement, and the swarm (dynamic) size as the transient set of peers present during a specified interval of the measurement.

We estimate the accuracy of a measurement with the following metrics:

• Coverage is the percentage of the peers or events in the complete dataset that are included in the dataset resulting from a real measurement. When the complete dataset is not available, a model of the system or the most complete dataset available can be used as a first approximation.

• Error/deviation of values is a metric that mimics traditional statistical approaches for comparing probability distributions for random variables. The Kolmogorov-Smirnov test [12] uses the D characteristic to estimate the maximum deviation between the cumulative distribution functions (CDFs) of two random variables. We use a graphical representation of the CDFs of the measured and the real characteristic under study, which gives us a visual estimation of the D characteristic.
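Both metrics can also be computed numerically. The sketch below is illustrative only: the function names are our own, peers are assumed to be identified by hashable values, and the D characteristic is computed directly from the empirical CDFs of two samples rather than read off a plot as done in this report.

```python
from bisect import bisect_right


def coverage(observed_peers, reference_peers):
    """Percentage of reference peers that also appear in the measurement."""
    reference = set(reference_peers)
    if not reference:
        raise ValueError("reference set is empty")
    return 100.0 * len(reference & set(observed_peers)) / len(reference)


def ks_distance(sample_a, sample_b):
    """Maximum vertical gap D between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical example: a measurement that saw 2 of 4 known peers,
# and two small samples of session lengths (minutes).
print(coverage(["p1", "p2"], ["p1", "p2", "p3", "p4"]))   # 50.0
print(ks_distance([1, 2, 3, 4], [3, 4, 5, 6]))            # 0.5
```

A value of D close to 0 indicates that the measured and the reference distributions nearly coincide; a value close to 1 indicates that they barely overlap.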

3.2 Data Source Selection

Depending on the selection of the data source, we distinguish three main sources of inaccuracy:

Measurement level. In Section 2 we have defined three levels for a P2P application: community, swarm, and peer. Measuring at any single level may result in measurement inaccuracy. For example, since peers have limited uptime (presence) in the system, measurements taken at the peer level may not be able to contact all peers. Moreover, accurately estimating the time when a swarm becomes active is not possible when using only peer-level measurements.

Community level: Community type. There are many types of communities in BitTorrent. We define three characteristics that discriminate between BitTorrent communities: content coverage, legality, and openness. Communities may cover either general or specific content. The specific content may be further divided into traditional content sub-types such as video, audio, games, operating systems, etc.; Garbacki et al. [4] identify up to 200 content sub-types for the SuprNova BitTorrent community. The correlation between content type popularity and individual file popularity has been observed and exploited for several P2P file-sharing networks, including BitTorrent [7, 4]. Independent of content type, communities may share only content that has been certified as legal, or content with any legal status. Legal communities are usually built around companies that aim to reduce the cost of distributing their content. Such companies install and maintain well-connected seeders with high availability; such communities may exhibit very different sharing behavior from communities that rely on voluntary seeding. Last, communities may be open to any user or closed. Closed communities require the registration of their users; this allows unique identification of peers and permits the enforcement of seeding quotas or minimal seeding-to-leeching ratios.

Peer level: Passive vs. Active Measurements. Following the terminology in our previous work [8], peer-level measurements are active if the measurement probes initiate contact with BitTorrent peers, and passive if the measurement probes wait for externally initiated contacts. In contrast to passive measurements, active measurements require that peers are accessible, that is, that they are not behind a firewall. The 2007 measurement by Xie et al. [21] shows that firewalls may affect up to 90% of the peers in a large live streaming application, and that less than 20% of the peers can bypass firewalling through user-initiated configuration (UPnP). Thus, active measurements may lead to significantly reduced peer coverage.

3.3 Data Volume Reduction

The data volume is another major discriminant for peer-to-peer measurements. We define the complete data as the dataset comprising the complete state of the system at the time when the measurement was taken, and all the events that changed the system state during the measurement. Complete data ensures the maximum accuracy possible for a specific data source, but raises the requirements of the measurement. Reducing the data volume, for example by sampling events and/or peers, ensures that the measurement is feasible and sometimes that the measurement is resource-efficient. Measuring complete data for large BitTorrent communities even for one week would require the use of thousands of measurement machines and petabytes of storage. In our previous work [16, 8] we have used various data volume reduction techniques to be able to track such large communities using "only" hundreds of nodes and terabytes of storage. We distinguish three main types of techniques for data volume reduction:

Sampling rate and Duration. Since peer-to-peer systems have properties that evolve over time, measurements have to observe the same property over time. The data volume is then the product of the sampling rate and the duration of the measurement. Reducing the sampling rate and/or the duration leads to data volume reduction and possibly to lower accuracy. In practice, sampling rates of one sample every 2.5 minutes [16, 8] or even every 30 minutes [2], and durations of a few days [8] to a few months [10], are common.

Number of communities and Number of swarms. Complete data on BitTorrent comprise all the swarms from all the BitTorrent communities. Acquiring such data is impractical; moreover, many communities may share properties, and within a community the most populated swarms account for most of the BitTorrent traffic. Thus, measurements may reduce the volume of acquired data by reducing one or both of the number of communities and the number of swarms. In practice, measurements have often focused on one community [16, 8], or even on only one swarm [10]. Recently, Andrade et al. have measured four communities [2], but their commendable approach is so far unique in the BitTorrent research community.

Long-term dynamics. In the past decade, the evolution of BitTorrent has proven surprising. Overall, BitTorrent has emerged as a dominant Internet traffic generator. However, many of the BitTorrent communities have changed over time; some have disappeared due to lack of interest or corporate pressure. Thus, to prevent reported characteristics from becoming stale, measurements should make efforts to catch long-term system dynamics, including monthly, seasonal, yearly, and multi-year patterns; studying time patterns is a well-established topic for the Internet community [11, 9]. In practice, the only long-term studies related to BitTorrent are the five-month study of Izal et al. [10] and our own year-long measurement of SuprNova [16].

ID    | Trace description                                        | Period                     | Sampling | Torrents | Sessions   | Traffic
T1’04 | BT-TUD-1, SuprNova (General, Any Legal Status)           | Dec 2003 to Dec 2004       | Hourly   | 32,452   | n/a        | n/a
      |                                                          | 06 Dec 2003 to 17 Jan 2004 | 2.5 min  | 120      | 28,423,470 | n/a
T2’05 | BT-TUD-2, PirateBay (General/Movies, Any Legal Status)   | 05-11 May 2005             | 2.5 min  | 2,000    | 35,881,338 | 12 PB/year
T3’05 | LegalTorrents.com (General, Legal Content)               | 22 Mar to 19 Jul 2005      | 5 min    | 41       | n/a        | 698 GB/day
T3’09 |                                                          | 24 Sep 2009 onwards        | 5 min    | 183      | n/a        | 1.1 TB/day
T4’05 | etree.org (Recorded events, Only Legal Content)          | 22 Mar to 19 Jul 2005      | 15 min   | 52       | 165,168    | 9 GB/day
T4’09 |                                                          | 24 Sep 2009 onwards        | 15 min   | 45       | 169,768    | 143 GB/day
T5’05 | tlm-project.org (Linux, Only Legal Content)              | 22 Mar to 30 Apr 2005      | 10 min   | 264      | 149,071    | 735 GB/day
T5’09 |                                                          | 24 Sep 2009 onwards        | 10 min   | 74       | 21,529     | 15 GB/day
T6’05 | transamrit.net (Slackware Unix, Only Legal Content)      | 22 Mar to 19 Jul 2005      | 5 min    | 14       | 130,253    | 258 GB/day
T6’09 |                                                          | 24 Sep 2009 onwards        | 5 min    | 60       | 61,011     | 840 GB/day
T7’05 | unix-ag.uni-kl.de (Knoppix, Only Legal Content)          | 22 Mar to 19 Jul 2005      | 5 min    | 11       | 279,323    | 493 GB/day
T7’09 |                                                          | 24 Sep 2009 onwards        | 5 min    | 12       | 160,522    | 348 GB/day
T8’05 | idsoftware.com (Game Demos, Only Legal Content)          | 22 Mar to 19 Jul 2005      | 5 min    | 13       | 48,271     | 19 GB/day
T8’09 |                                                          | 24 Sep 2009 onwards        | 5 min    | 37       | 14,697     | 12 GB/day
T9’05 | boegenielsen.net (Knoppix, Only Legal Content)           | 22 Mar to 19 Jul 2005      | 5 min    | 15       | 36,391     | 308 GB/day

Table 1: Summary of the datasets used in this work. Only the datasets for traces T1’04 and T2’05 have been previously analyzed [16, 8].
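The relation between sampling rate, duration, and data volume described in Section 3.3 can be illustrated with a short calculation. This is a sketch under stated assumptions: the helper names are our own, the T2’05 period (05-11 May 2005) is taken as seven full days, and the swarm-size series is invented purely to show how a lower sampling rate can miss short-lived peaks.

```python
def samples_per_swarm(duration_days: float, interval_minutes: float) -> int:
    """Number of samples taken for one swarm at a fixed sampling interval."""
    return int(duration_days * 24 * 60 / interval_minutes)


def downsample(series, factor: int):
    """Keep every `factor`-th sample, emulating a lower sampling rate."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return series[::factor]


# One week at one sample every 2.5 minutes (assumed to match the T2'05 dataset).
per_swarm = samples_per_swarm(duration_days=7, interval_minutes=2.5)
print(per_swarm)           # 4032
print(per_swarm * 2000)    # 8064000 swarm samples for 2,000 torrents

# Hypothetical swarm sizes sampled every 2.5 minutes, then every 10 minutes.
swarm_size = [10, 12, 40, 11, 9, 8, 35, 7]
print(downsample(swarm_size, 4))   # [10, 9]: both short peaks are missed
```

Comparing a characteristic computed on the full series with the same characteristic computed on the downsampled series is one simple way to quantify the accuracy lost to a reduced sampling rate.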

4 The Collected Traces

To understand and characterize the accuracy of BitTorrent measurements we acquired long-term traces from nine BitTorrent communities. Overall, the collected traces describe communities of hundreds of thousands of peers that are responsible yearly for over 13 petabytes of data, as summarized in Table 1.

To ensure heterogeneity among the limited number of traces, we have taken the following controllable factors into account during trace collection. From the community perspective, the traces focus on communities with either general or very specific types of content; for example, the id Software community focuses on sharing demos of games commercialized by id Software. From the community size perspective, the traces range from the largest communities in the world at the time of the data collection down to small communities, both in terms of number of users and number of shared files. In terms of duration, all the traces are long-term, with the notable exception of the BitTorrent trace collected during just a few days in mid-2005, which nevertheless represents the largest collection of BitTorrent information relative to the unit of time. With respect to the biases introduced by the very long-term community evolution, several of the collected traces include two datasets, one acquired in 2005 and one acquired in 2009. In the remainder of this section we describe each of the traces in turn.
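The order of magnitude of the yearly traffic quoted above can be cross-checked against the Traffic column of Table 1. The report does not state how the aggregate figure was derived, so the following is only one plausible reading: it assumes decimal units (1 PB = 1,000,000 GB), a 365-day year, 1.1 TB/day taken as 1,100 GB/day, and a simple sum over all datasets with a reported traffic value.

```python
# Daily traffic in GB/day, copied from Table 1 (traces T3 to T9).
DAILY_GB = {
    "T3'05": 698, "T3'09": 1100, "T4'05": 9, "T4'09": 143,
    "T5'05": 735, "T5'09": 15, "T6'05": 258, "T6'09": 840,
    "T7'05": 493, "T7'09": 348, "T8'05": 19, "T8'09": 12, "T9'05": 308,
}
T2_PB_PER_YEAR = 12.0  # T2'05 is already reported per year in Table 1


def yearly_pb(daily_gb: float) -> float:
    """Extrapolate GB/day to PB/year, assuming 365 days and decimal units."""
    return daily_gb * 365 / 1_000_000


total_pb = T2_PB_PER_YEAR + sum(yearly_pb(v) for v in DAILY_GB.values())
print(round(total_pb, 2))   # 13.82
```

Under these assumptions the sum is dominated by the T2’05 trace; the remaining thirteen datasets together contribute under 2 PB per year.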

4.1 T1: BT-TUD-1, SuprNova

Datasets (1): T1’04

This trace was collected from the SuprNova community between 2003 and 2004. This community distributes various types of content with any legal status. The trace contains data at both the swarm level and the peer level. The swarm-level data was collected from 32,452 swarms with an hourly sampling interval; it contains the number of seeders and leechers of each measured swarm over time, and descriptive information about the torrents, including file name, info hash, added time, and file size. The peer-level data was collected from 120 swarms between 06 Dec 2003 and 17 Jan 2004 with a sampling interval of 2.5 minutes; in total 28,423,470 sessions were captured, each containing the peer’s IP address, port number, download progress (number of downloaded chunks), and error messages.
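Peer-level data sampled at a fixed interval has to be turned into sessions before session counts or session lengths can be reported. This report does not describe how that step was performed; the sketch below shows one common gap-based approach, with a hypothetical function name and an assumed tolerance of 1.5 sampling intervals before a peer is considered to have left.

```python
def sessions_from_sightings(timestamps, interval_s=150.0, tolerance=1.5):
    """Group the sample times (seconds) at which a peer was seen into sessions.

    A new session starts whenever the gap to the previous sighting exceeds
    `tolerance * interval_s`; both parameters are illustrative assumptions.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][1] <= tolerance * interval_s:
            sessions[-1][1] = t          # extend the current session
        else:
            sessions.append([t, t])      # open a new session
    return [(start, end) for start, end in sessions]


# A peer seen in three consecutive 2.5-minute samples, then again 10 minutes later.
print(sessions_from_sightings([0, 150, 300, 900, 1050]))
# [(0, 300), (900, 1050)]
```

With this kind of reconstruction, the sampling interval bounds the resolution of any session-length estimate: sessions shorter than one interval may be missed entirely.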

4.2 T2: BT-TUD-2, PirateBay

Datasets (1): T2’05

This trace was collected from the ThePirateBay community between 05 May 2005 and 11 May 2005. This community distributes various types of content with any legal status. The trace contains data at both the swarm level and the peer level. The swarm-level data was collected from 2,000 swarms; it contains the number of seeders and leechers of each measured swarm over time, and descriptive information about the torrents, including file name, info hash, added time, and file size. The peer-level data was collected from 2,000 swarms with a sampling interval of 2.5 minutes; in total 35,881,338 sessions were captured, each containing the peer’s IP address, port number, client ID, download progress (number of downloaded chunks), and error messages. The estimated annual throughput of this community during that period is 12 PB.

4.3 T3: LegalTorrents.com

Datasets (2): T3’05, T3’09

T3’05 was collected from LegalTorrents.com between 22 Mar 2005 and 17 Jul 2005, and T3’09 has been collected from this community since 24 Sep 2009, with a 5-minute sampling interval. This community mainly distributes general types of content and only provides legal content. Both datasets contain only community-level data: the number of leechers and seeders, the total number of completed downloads, and the traffic of each swarm. Both datasets also contain descriptive information about the measured torrents, including file name, added time, file size, number of files in each torrent, and description. In 2005, 41 swarms were measured and the daily throughput of this community was 698 GB. In 2009, 183 swarms have been measured so far and the daily throughput of this community is 1.1 TB.

4.4 T4: etree.org

Datasets (2): T4’05, T4’09

T4’05 was collected from etree.org between 22 Mar 2005 and 17 Jul 2005, and T4’09 has been collected from this community since 24 Sep 2009, with a 15-minute sampling interval. This community mainly distributes recorded events and only provides legal content. Both datasets contain only swarm-level data: for each swarm, the peer’s IP address with the last byte blinded, client type, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed, and upload speed. Both datasets also contain descriptive information about the measured torrents, including file name, info hash, added time, file size, number of files in each torrent, and torrent description. In 2005, 165,168 sessions in 52 swarms were measured and the daily throughput of this community was 9 GB. In 2009, 169,768 sessions in 45 swarms have been measured so far and the daily throughput of this community is 143 GB.
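Several traces report IP addresses with the last byte blinded. The traces do not specify how the blinding was applied at the source; the sketch below assumes the simplest form, zeroing the last octet of an IPv4 address, and uses a hypothetical helper name together with Python's standard `ipaddress` module.

```python
import ipaddress


def blind_last_byte(ip: str) -> str:
    """Return the IPv4 address with its last byte set to zero."""
    addr = ipaddress.IPv4Address(ip)
    return str(ipaddress.IPv4Address(int(addr) & 0xFFFFFF00))


print(blind_last_byte("192.0.2.77"))    # 192.0.2.0
print(blind_last_byte("192.0.2.200"))   # 192.0.2.0
```

As the two calls show, peers within the same /24 range become indistinguishable by address alone, so any peer identification on such data has to rely on additional fields such as the port number.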

4.5 T5: tlm-project.org

Datasets (2): T5’05, T5’09

T5’05 was collected from tlm-project.org between 22 Mar 2005 and 30 Apr 2005, and T5’09 has been collected from this community since 24 Sep 2009, with a 10-minute sampling interval. This community mainly distributes various Linux distributions and only provides legal content. Both datasets contain community-level and swarm-level data. The community-level data contains the number of leechers and seeders, the total number of completed downloads, and the traffic of each measured swarm. The peer-level data contains, for each swarm, the peer’s IP address with the last byte blinded, port number, download amount, upload amount, download progress, connected time, and sharing ratio; T5’09 also includes the peer’s download and upload speed. Both datasets also contain descriptive information about the torrents, including file name, info hash, added time, file size, and number of files in each torrent. In 2005, 149,071 sessions in 264 swarms were measured and the daily throughput of this community was 735 GB. In 2009, 21,529 sessions in 74 torrents have been measured so far and the daily throughput of this community is 15 GB.

4.6 T6: transamrit.net

Datasets (2): T6’05, T6’09 T6’05 was collected from the transamrit.net during the period between 22 Mar 2005 and 19 Jul 2005, and T6’09 has been collected from this community since 24 Sep 2009 with 5 minute sampling interval. This community mainly distributes Slackware linux distributions and only provides legal contents. Both datasets contain community level and swarm level data: community level data contains the number of leechers and seeders, total number of completed downloads and traffic of each measured swarm; peer level data contains ip address with last byte blinded, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed and upload speed in each measured swarm. And both datasets contain descriptive information of torrents including file name, infohash, added time, file size and number of files in each torrent.

In 2005, 130,253 sessions in 14 swarms were measured and the daily throughput of this community was 258 GB. In 2009, 61,011 sessions in 60 swarms have been measured so far, and the daily throughput of this community is 840 GB.

4.7

T7: unix-ag.uni-kl.de

Datasets (2): T7’05, T7’09. T7’05 was collected from unix-ag.uni-kl.de during the period between 22 Mar 2005 and 19 Jul 2005, and T7’09 has been collected from this community since 24 Sep 2009 with a 5-minute sampling interval. This community mainly distributes Knoppix Linux distributions and only provides legal content. Both datasets contain community level and swarm level data: community level data contains the number of leechers and seeders, the total number of completed downloads, the total traffic, and the average download progress of all participating peers of each swarm; peer level data contains the peer’s IP address with the last byte blinded, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed, and upload speed in each measured swarm. Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent. In 2005, 279,323 sessions in 11 swarms were measured and the daily throughput of this community was 493 GB. In 2009, 160,522 sessions in 12 swarms have been measured so far, and the daily throughput of this community is 348 GB.

4.8

T8: idsoftware.com

Datasets (2): T8’05, T8’09. T8’05 was collected from idsoftware.com during the period between 22 Mar 2005 and 19 Jul 2005, and T8’09 has been collected from this community since 24 Sep 2009 with a 5-minute sampling interval. This community distributes demos of games from id Software and only provides legal content. Both datasets contain community level and swarm level data: community level data contains the number of leechers and seeders in each swarm; peer level data contains the peer’s IP address with the last byte blinded, port number, download amount, upload amount, connected time, download progress, and sharing ratio in each measured swarm. Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent. In 2005, 48,271 sessions in 13 swarms were measured and the daily throughput of this community was 19 GB. In 2009, 14,697 sessions in 37 swarms have been measured so far, and the daily throughput of this community is 12 GB.

4.9

T9: boenielsen.dk

Datasets (1): T9’05. T9’05 was collected from boenielsen.dk during the period between 22 Mar 2005 and 19 Jul 2005 with a 5-minute sampling interval. This community mainly distributed Knoppix Linux distributions and only provided legal content. The dataset contains community level and swarm level data: community level data contains the number of leechers and seeders, the total number of completed downloads, the total traffic, and the average download progress of all peers of each swarm; peer level data contains the peer’s IP address with the last byte blinded, port number, download amount, upload amount, connected time, download progress, and sharing ratio in the measured swarms. The dataset also contains descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent. In 2005, 36,391 sessions in 15 swarms were measured and the daily throughput of this community was 308 GB.
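The peer-level records described in this section can be pictured with a small sketch. The field names, the `blind_last_byte` helper, and the choice of zero as the blinded value are assumptions made for illustration only; the report does not specify how the traces are stored. The sharing ratio is taken here to be the upload amount divided by the download amount.

```python
from dataclasses import dataclass
import ipaddress


def blind_last_byte(ip: str) -> str:
    """Return the IPv4 address with its last byte set to zero (assumed blinding)."""
    addr = ipaddress.IPv4Address(ip)
    return str(ipaddress.IPv4Address(int(addr) & 0xFFFFFF00))


@dataclass
class PeerSample:
    """One peer-level observation in one swarm (hypothetical field names)."""
    ip_blinded: str
    port: int
    downloaded_bytes: int
    uploaded_bytes: int
    progress: float          # fraction of the content downloaded, 0.0 to 1.0
    connected_seconds: int

    @property
    def sharing_ratio(self) -> float:
        # The ratio is undefined when nothing was downloaded; this sketch
        # reports infinity if anything was uploaded and 0.0 otherwise.
        if self.downloaded_bytes == 0:
            return float("inf") if self.uploaded_bytes else 0.0
        return self.uploaded_bytes / self.downloaded_bytes


sample = PeerSample(
    ip_blinded=blind_last_byte("192.0.2.77"),
    port=6881,
    downloaded_bytes=700_000_000,
    uploaded_bytes=350_000_000,
    progress=1.0,
    connected_seconds=5400,
)
print(sample.ip_blinded, sample.sharing_ratio)
```

Blinding before storage means that two peers in the same /24 network can only be told apart by their port number and session data, which is one reason the records keep both.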


Figure 1: Comparison of swarm dynamics resulting from swarm-level measurement and peer-level measurement. (Plot: number of peers, 0 to 4000, over time, 20/12/03 to 17/01/04; curves: Swarm Level, Peer Level; annotations: flashcrowd, 30% difference, peer level measurement failure.)

Figure 2: Comparison of cumulative community throughput resulting from community-level measurement and swarm-level measurement. (Plot: community throughput in GB, 0 to 12000, over time, 16/04 to 21/05; curves: T7 ’05 Community Level, T7 ’05 Swarm Level; annotation: 55% difference.)

5

The Results

In this section we investigate the impact of the techniques for data source selection and data volume reduction on the accuracy of BitTorrent measurements.

5.1

Inaccuracy Due to Data Source Selection

Method: Throughout the evaluation of inaccuracy due to data source selection we compare characteristics extracted from measured datasets.

5.1.1

Measurement Level

Finding: Measurements focusing on a single operational level of BitTorrent may lead to very low accuracy. For example, swarm 003 of T1’04 was tracked at both swarm and peer level. During the flashcrowd exhibited at the beginning of the swarm activity (see Figure 1), the peer-level coverage drops below 70% of the swarm-level data. Later during the measurements, the infrastructure becomes overloaded and the resulting failure leads to a 50% peer coverage for about half the duration of the flashcrowd. A similar effect can be observed when taking measurements at both community and swarm level. For T7’05, different levels of information aggregation (community and swarm) lead to errors of over 50% and thus very high inaccuracy (see Figure 2).

Recommendation: Measure swarms or communities at multiple levels simultaneously.

Figure 3: Comparison of file size distributions in six BitTorrent communities in 2005. (Plot: CDF vs. file size in MB, 0 to 1200; curves: T1 Dec ’04, T2 ’05, T3 ’05, T4 ’05, T5 ’05, T8 ’05.)

Figure 4: Comparison of file size distributions in four BitTorrent communities in 2009. (Plot: CDF vs. torrent size in MB, 0 to 1200; curves: T3 ’09, T4 ’09, T5 ’09, T8 ’09.)

5.1.2

Community Type

Finding: Measuring different BitTorrent communities may lead to very different results. For several BitTorrent communities, we show the cumulative distribution functions (CDFs) of file sizes (Figures 3 and 4), download speed (Figure 5), upload speed (Figure 6), and session length (Figure 7). For example, the statistical properties of file sizes (Figures 3 and 4) differ significantly between communities. For these communities we do not see a correlation of the characteristics with the community focus on general vs. specific content (T1.Dec’04, T2’05, and T3’05 vs. T4’05, T5’05, and T8’05), or on content legality concerns (T1.Dec’04 and T2’05 vs. T3’05, T4’05, T5’05, and T8’05).

Recommendation: Include BitTorrent communities of different types in the same measurement.

Figure 5: Comparison of download speed in four BitTorrent communities. (Plot: CDF vs. download speed in KB/s, 0 to 300; curves: T5 ’09, T6 ’09, T7 ’09, T8 ’09.)

Figure 6: Comparison of upload speed in four BitTorrent communities. (Plot: CDF vs. upload speed in KB/s, 0 to 50; curves: T5 ’09, T6 ’09, T7 ’09, T8 ’09.)

5.1.3

Passive vs. Active

Finding: Passive and active measurement results differ significantly. The presence of firewalled peers is significant in BitTorrent. For example, less than 60% of the peers are non-firewalled in the T1’04 (SuprNova) trace [16]. An in-depth analysis of the presence and behavior of firewalled peers for four communities was presented by Mol et al. [13]; their analysis also covers the data of T2’05 (The Pirate Bay), which were collected using both active and passive measurements. It turns out that only 34% of the peers discovered using the active measurements are non-firewalled and that 96% of the swarms have over 50% of their peers firewalled. Most importantly, the same study shows that the characteristics of firewalled and non-firewalled peers differ significantly. Notably, as BitTorrent rewards peers with (good) connectivity, non-firewalled peers exhibit 80% less uptime than firewalled peers. To reach good coverage, passive measurements require the deployment of an impractically large number of measurement points, which act as peers and wait to be contacted.
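The two population statistics quoted above (the share of non-firewalled peers, and the share of swarms in which more than half of the peers are firewalled) can be derived from per-peer reachability flags. The following is a minimal sketch under stated assumptions: the `observations` list and the `firewall_summary` function are hypothetical, and the data are made up rather than taken from T1’04 or T2’05.

```python
from collections import defaultdict

# Hypothetical peer observations: (swarm_id, peer_id, is_firewalled).
observations = [
    ("s1", "a", True), ("s1", "b", True), ("s1", "c", False),
    ("s2", "d", True), ("s2", "e", False), ("s2", "f", False),
    ("s3", "g", True), ("s3", "h", True), ("s3", "i", True), ("s3", "j", False),
]


def firewall_summary(obs):
    """Return (share of non-firewalled peers, share of swarms with a firewalled majority)."""
    per_swarm = defaultdict(lambda: [0, 0])   # swarm -> [firewalled, total]
    for swarm, _peer, firewalled in obs:
        per_swarm[swarm][0] += int(firewalled)
        per_swarm[swarm][1] += 1
    total = sum(t for _, t in per_swarm.values())
    firewalled = sum(f for f, _ in per_swarm.values())
    majority = sum(1 for f, t in per_swarm.values() if f / t > 0.5)
    return (total - firewalled) / total, majority / len(per_swarm)


open_share, majority_share = firewall_summary(observations)
print(f"non-firewalled peers: {open_share:.0%}")
print(f"swarms with over 50% firewalled peers: {majority_share:.0%}")
```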

Recommendation: Use both passive and active measurements for peer-level measurements.

Figure 7: Comparison of session length in four BitTorrent communities. (Plot: CDF vs. session length in minutes, 0 to 1400; curves: T5 ’09, T6 ’09, T7 ’09, T8 ’09.)

5.2

Inaccuracy Due to Data Volume Reduction

5.2.1

Sampling rate and Duration

Method: To understand the effect of the sampling rate, as a first step we select a dataset measured with a short sampling interval (that is, a high sampling rate), from which we extract a contiguous block with the complete data for one month. We consider this block the complete data (in the sense of perfectly accurate) for the remainder of the analysis; we call this block the original dataset, and the sampling interval of this dataset the original sampling interval. As a second step, we sample from the selected dataset at various intervals, which are multiples of the original sampling interval, and report the accuracy of the datasets obtained with the new sampling intervals relative to the original dataset.

We use a similar approach to understand the effect of the measurement duration, with the following changes. In the first step we select a one-month dataset; we call the measurement duration of the selected dataset the original duration. In the second step, we consider the blocks of data from the beginning of the selected dataset and with various durations; the considered durations are power-of-two divisions of the original duration, such as 1/2, 1/4, 1/8, etc. By applying both the sampling interval enlargement and the duration reduction on the same dataset we can understand which of these two data volume reduction techniques has a bigger impact on the measurement accuracy.

Finding: When measuring at peer level, a larger sampling interval leads to lower accuracy. In particular, a sampling interval above 15 minutes may lead to very low accuracy. Figure 8 shows for an exemplary swarm, swarm 003 of T1’04, that the hourly peer coverage drops below 80% when the sampling interval is increased from 2.5 minutes (the original sampling interval) to 15 minutes, and below 60% when the sampling interval is increased to 30 minutes. Figure 9 confirms these results for more swarms in T1’04.
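The two reduction steps of the Method above can be sketched in a few lines. This is an illustration under stated assumptions, not the tooling used for the report: a trace is modelled as a list of `(timestamp, peer_id)` observations, peer coverage is assumed to be the share of the original dataset's distinct peers that the reduced dataset still observes, and the synthetic trace (2.5-minute snapshots, each peer present for three consecutive snapshots) is made up.

```python
from collections import defaultdict

# A trace is a list of (timestamp_seconds, peer_id) observations; each
# distinct timestamp is one sample (snapshot) of the swarm.
Trace = list[tuple[int, str]]


def subsample(trace: Trace, factor: int) -> Trace:
    """Keep every `factor`-th snapshot, i.e. multiply the sampling interval."""
    times = sorted({t for t, _ in trace})
    kept = set(times[::factor])
    return [(t, p) for t, p in trace if t in kept]


def truncate(trace: Trace, fraction: float) -> Trace:
    """Keep only the leading `fraction` of the measurement duration."""
    start = min(t for t, _ in trace)
    end = max(t for t, _ in trace)
    cutoff = start + (end - start) * fraction
    return [(t, p) for t, p in trace if t <= cutoff]


def overall_coverage(original: Trace, reduced: Trace) -> float:
    """Share of the original's distinct peers that the reduced trace still sees."""
    all_peers = {p for _, p in original}
    return len({p for _, p in reduced} & all_peers) / len(all_peers)


def hourly_coverage(original: Trace, reduced: Trace) -> dict[int, float]:
    """Per-hour share of distinct peers seen, relative to the original trace."""
    def by_hour(trace: Trace) -> dict[int, set[str]]:
        hours: dict[int, set[str]] = defaultdict(set)
        for t, p in trace:
            hours[t // 3600].add(p)
        return hours

    orig, red = by_hour(original), by_hour(reduced)
    return {h: len(red[h] & peers) / len(peers) for h, peers in orig.items()}


# Synthetic trace: snapshots every 150 s; peer i is present in snapshots i..i+2.
trace = [(s * 150, f"peer{i}") for i in range(42) for s in (i, i + 1, i + 2)]

for factor in (2, 6, 12):
    reduced = subsample(trace, factor)
    print(f"interval x{factor}: overall coverage {overall_coverage(trace, reduced):.2f}")

for fraction in (0.5, 0.25):
    reduced = truncate(trace, fraction)
    print(f"duration x{fraction}: overall coverage {overall_coverage(trace, reduced):.2f}")
```

In this toy trace a peer is missed whenever its whole session falls between two retained snapshots, which is the mechanism behind the coverage loss at larger sampling intervals.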
Finding: When measuring at peer level, a larger sampling interval also leads to higher accuracy variance. Figure 10 depicts as a boxplot the basic statistical properties (i.e., median, Q1, Q3) of the accuracy values observed for the whole dataset, for various sampling intervals. Only measurements taken with a sampling interval of 7.5 (12.5) minutes or lower have over 90% (80%) median accuracy. The expected variance, defined as the inter-quartile range and indicated in Figure 10 by boxes, indicates that only measurements with a sampling interval of at most 10 minutes result in over 80% accuracy. These results confirm the variance that can be visually observed in Figure 8.

Recommendation: Measurements at peer level must use a sampling interval shorter than 15 minutes, and should target a sampling interval of at most 10 minutes.

Figure 8: Comparison of transient peer coverage resulting from measuring at different sampling intervals. Only the first 300 hours are displayed. (Data from T1’04.003.) (Plot: percentage, 0 to 100, vs. time in hours since 2003.12.20, 15:00; curves: 2.5 minutes, 5 minutes, 15 minutes, 30 minutes.)

Figure 9: Average transient peer coverage resulting from measuring at different sampling intervals. (Data for four swarms of T1’04.) (Plot: average peer coverage in %, 0 to 100, vs. normalized sampling interval, unit = 2.5 min, 0 to 25; curves: T1 ’04 swarms 003, 005, 009, and 010; annotations: original measurement, 15 min.)

Figure 10: Measurement results become more unstable with large sampling intervals. The accuracy levels of 90%, 80%, and 60% are emphasized. (Data from T1.003.)

Finding: Reducing the measurement duration quickly reduces the coverage of the measurements. A doubling of the sampling interval leads to lower accuracy loss than a halving of the measurement duration. Figure 11 depicts for several datasets the average peer coverage resulting from various measurement durations, including the original duration. The different datasets exhibit different losses of accuracy for initial reductions of the measurement duration, but quickly converge to over 80% loss of coverage. After the initial duration halving (to 1/2 of the original duration), swarm 003 from the trace T1’04 is the least affected at over 80% coverage, but the coverage of the complete community in T5’05 would already be below 40%. The large difference is the result of the system state: swarm 003 exhibits a large flashcrowd [16] in which the peers are caught for at least a week until obtaining the content they want, while in the tlm-project.org community the peers obtain results quickly and then can leave the swarm without returning.

Recommendation: Measurements should be taken over a period of at least one month. Avoid reducing the volume of the data by reducing the measurement duration. Doubling the sampling interval is preferable to halving the measurement duration.

Figure 11: Comparison of overall peer coverage resulting from measuring for different sampling durations. (Plot: overall peer coverage in %, 0 to 100, vs. measurement duration: 4 weeks, 2 weeks, 1 week, 1/2 week, 1/4 week, 1/8 week, 1/16 week; curves: T1 ’04 swarm 003, T1 ’04 swarm 009, T5 ’05; annotation: original measurement.)

5.2.2

Number of Communities and Number of Torrents

Method: To understand the effect of the number of communities included in the measurement, as a first step we select from six communities, T4 through T9, the first month of data from their 2005 datasets; the first month is the same for each of these datasets. In Step 2, we order the six communities by the total amount of traffic generated by that community during the selected month. In Step 3, we compute all the investigated characteristics from all the selected datasets. In Step 4, we iteratively remove from the considered datasets the dataset corresponding to the community with the lowest rank (total traffic) that was considered at the previous iteration, and repeat from Step 3 until only one community is left for analysis.

We apply a similar four-step approach to understand the impact of the number of torrents included in the measurement. We conservatively assume that there exists some way to order the swarms a priori by the amount of traffic they generate; this is often possible, for example for highly anticipated torrents such as blockbuster movies, and this approach has been taken by many reported measurements [16, 8]. In Step 1, we select each of the swarms in a community as an independent dataset. In Step 2, we rank all the swarms in a community according to the total amount of traffic generated by each swarm. In Step 3, we compute all the investigated characteristics for all the considered datasets. In Step 4, we iteratively remove from the considered datasets the lowest ranked swarms that were considered at the previous iteration, and repeat from Step 3 until only one swarm is left for analysis. We are thus able to analyze both absolute (e.g., from 100 to 50) and relative (e.g., from 100% to 50%) reductions in the number of swarms included in the measurement.

Finding: Measuring only one community is insufficient to obtain representative BitTorrent results. Figures 12, 13, and 14 depict the CDFs of session length, download speed, and download amount, respectively, for a varying number of communities. The session length CDF stabilizes only after four or more communities are considered together. The download speed and download amount CDFs stabilize after more than one community is considered. The upload speed distribution (Figure 15) is the only investigated characteristic that does not require more than one community to be measured.

Recommendation: Measure at least four communities simultaneously.

Finding: Measuring a single swarm is insufficient to obtain representative results for the swarm’s community. Figure 16 depicts the download speed characteristic for various numbers of swarms from the Unix-AG community (trace T7’09): the characteristics of the top-ranked swarm and of the top-12 swarms are significantly different (over 8%). To better compare the impact of the number of selected swarms we display a Kiviat diagram with six axes, one for each characteristic (Figure 17). Six datasets are considered, each corresponding to a number of selected swarms, from 1 to 100. For each dataset, the displayed value on each axis is the average value obtained from the dataset normalized by the largest average value found for all datasets. The figure shows the difference between the top-ranked swarm and the other datasets comprising multiple swarms, and characterizes the complex differences between the multiple-swarm datasets.

Recommendation: Include more than one swarm in the measurements. A different number of swarms leads to different results. Depending on the measurement scenario, including 20 to 50 swarms in the measurements provides a good trade-off between accuracy of the results and data volume.

Figure 12: Comparison of session length distributions resulting from measuring different numbers of communities. (Plot: CDF vs. session length in minutes, 0 to 2000; curves: 6 communities down to 1 community; annotation: 8% difference.)

Figure 13: Comparison of download speed distributions resulting from measuring different numbers of communities. (Plot: CDF vs. download speed in KB/s, 0 to 200; curves: 6 communities down to 1 community; annotation: 8% difference.)

Figure 14: Comparison of download amount per peer distributions resulting from measuring different numbers of communities. (Plot: CDF vs. download amount in MB, 0 to 1000; curves: 6 communities down to 1 community.)

Figure 15: Comparison of upload speed distributions resulting from measuring different numbers of communities. (Plot: CDF vs. upload speed in KB/s, 0 to 60; curves: 6 communities down to 1 community.)
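The ranking-and-removal procedure for swarms and the normalization used for the Kiviat diagram can be sketched as follows. This is an illustration only: the swarm summaries are made up, the two characteristics are a small stand-in for the six axes of Figure 17, and the function names are hypothetical.

```python
# Hypothetical per-swarm summaries for one community:
# (swarm name, total traffic in GB, session lengths in minutes).
swarms = [
    ("A", 900.0, [30, 45, 60, 600]),
    ("B", 400.0, [20, 25, 40]),
    ("C", 150.0, [10, 15]),
    ("D", 50.0, [5, 300]),
]


def top_k_datasets(swarms, ks):
    """Rank swarms by traffic (Step 2) and build one dataset per k (Steps 3 and 4)."""
    ranked = sorted(swarms, key=lambda s: s[1], reverse=True)
    return {k: ranked[:k] for k in ks}


def characteristics(dataset):
    """Average characteristics of a dataset made of several swarms."""
    sessions = [x for _, _, lengths in dataset for x in lengths]
    return {
        "avg_session_length": sum(sessions) / len(sessions),
        "avg_sessions_per_swarm": len(sessions) / len(dataset),
    }


def kiviat_normalize(values_by_dataset):
    """Scale each characteristic by its largest average over all datasets."""
    axes = list(next(iter(values_by_dataset.values())).keys())
    peak = {a: max(v[a] for v in values_by_dataset.values()) for a in axes}
    return {k: {a: v[a] / peak[a] for a in axes} for k, v in values_by_dataset.items()}


datasets = top_k_datasets(swarms, ks=[1, 2, 4])
raw = {k: characteristics(d) for k, d in datasets.items()}
normalized = kiviat_normalize(raw)
for k, axes in normalized.items():
    print(f"top {k}:", {a: round(x, 2) for a, x in axes.items()})
```

Because every axis is divided by its own maximum, the dataset that holds the largest average on an axis plots at 1.0 there, so the diagram compares datasets by shape rather than by absolute units.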


Figure 16: Comparison of download speed distributions resulting from measuring different numbers of torrents in one community. (Plot: CDF vs. download speed in KB/s, 0 to 500; curves: T7 ’09 with 12 swarms, 6 swarms, 3 swarms, and 1 swarm; annotation: 8% difference.)

Figure 17: Comparison of main characteristics resulting from datasets acquired from a specific number of torrents coming from one community. (Kiviat diagram; axes: Average Swarm Size, Average No. Sessions, Average Download Speed, Average Completed Downloads, Average Arrival Rate, Average Session Length; datasets: top swarm (by no. peers), top 5, top 10, top 20, top 50, and top 100 swarms.)

5.2.3

Catching long-term dynamics

Method: To show evidence of the long-term evolution of BitTorrent we first extract from our long-term traces blocks of contiguous data and then compare them.

Finding: Yearly and seasonal patterns exist in BitTorrent, but different communities exhibit different yearly and seasonal evolution trends. In Figure 3, the file size CDFs for T1’04 and T2’05 indicate that a significant file size decrease may occur over the course of a single year; we now investigate this effect. Figure 18 depicts the evolution of the file sizes from Dec 2003 to Nov 2004; for clarity, the figure only shows curves corresponding to every second month. The measurements taken in Dec 2003 reveal very different values of this characteristic compared with the other measurements (8%, or a D metric value of 0.08). Smaller differences appear between consecutive months, and overall the file sizes decrease slowly over time (the curves have similar shape and "move" towards the right side of the graph). The changes in the swarm size distribution for T1’04 are depicted in Figure 19. There exist some indications of a seasonal pattern: high swarm sizes occur in April, June, and December, typical vacation months, and low swarm sizes occur in August and October, typical work months. We conclude that yearly and seasonal patterns exist in BitTorrent. Figure 20 depicts the evolution of the total monthly throughput for three communities, T3’05, T7’05, and T9’05, over a period of three months. The total monthly throughput evolves differently for the three communities; for example, for T7’05 the total monthly throughput first increases then decreases. We conclude that the evolution trends are not community-independent.

Recommendation: Measurements that follow several system characteristics should be longer than three months, and preferably at least year-long.

Finding: Multi-year evolution is present in BitTorrent, but it is difficult to characterize. In our previous work [8] we have observed that the average application-level download speed (the main QoS indicator for BitTorrent) has doubled between T1’04 and T2’05. We now show that the evolution is not consistent across all users. Figure 21 depicts the download speed distributions for T1’04 and T2’05 with users grouped by continent. The top-left sub-graph confirms our previous remark; the doubling holds for the whole distribution of users, including the median. The median for EU users more than doubled, which compensates for the lower increase in download speed for the other continents. North American and Asian users show similar median download speed growth. There is a steep increase in the number of users with very high download bandwidth in both the EU and North America, but not in the remaining continents. We observe a similar effect for traces T3-T8: the download speed increases from 2005 to 2009, but the increase varies greatly by community, as shown in Figure 22. Similarly, we show in Figure 23 evidence that the file size distribution changed from 2005 to 2009, but the actual "direction" of the change varies greatly by community.

Recommendation: Repeat the measurements yearly.

Figure 18: Comparison of monthly file size distribution from T1 (SuprNova). From 12 months of data only every second month is depicted. (Plot: CDF vs. file size in MB, 0 to 4000; curves: T1 ’03 Dec, T1 ’04 Feb, Apr, Jun, Aug, Oct; annotation: 8% difference.)

Figure 19: Comparison of monthly swarm size distributions from T1 (SuprNova). From 12 months of data only every second month is depicted. (Plot: CDF vs. swarm size, 0 to 1000; curves: T1 ’03 Dec, T1 ’04 Feb, Apr, Jun, Aug, Oct.)

Figure 20: Comparison of monthly throughput of 3 communities in 2005.

Figure 21: Change of download speed in different continents and in different years. (Data from trace BT-TUD-1, BT-TUD-2.) (Six panels: World, Europe, North America, Asia, South America, Africa; each shows the ’04 and ’05 CDFs vs. download speed in KB/s, 0 to 80.)

Figure 22: Change of download speed from 2005 to 2009, by community. (Four panels: T5, T6, T7, T8; each shows the ’05 and ’09 CDFs vs. download speed in KB/s, 0 to 300.)

Figure 23: Change of file size from 2005 to 2009, by community. (Four panels: T3, T4, T5, T8; each shows the ’05 and ’09 CDFs vs. torrent size in MB, 0 to 1200.)
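The differences between distributions quoted in this section (for example "8%, or a D metric value of 0.08") can be illustrated with a short sketch. It assumes that the D metric is the largest vertical distance between two empirical CDFs, in the style of the Kolmogorov-Smirnov statistic; this section does not restate the definition, so that reading, the function names, and the sample values below are all assumptions.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of the sample that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def d_metric(sample_a, sample_b):
    """Largest vertical distance between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # Both CDFs are step functions, so the largest gap occurs at a sample value.
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Made-up file sizes (MB) for two monthly blocks of one trace.
month_1 = [100, 350, 700, 700, 1400]
month_2 = [100, 200, 350, 700, 700]

print(f"D = {d_metric(month_1, month_2):.2f}")
```

Applied to consecutive monthly or yearly blocks of the same trace, a value like this gives a single number per pair of blocks, which makes it easier to say when two measurement periods can be treated as equivalent.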

6

Related Work

Much previous work was dedicated to measurements of real P2P file-sharing networks [17, 18, 3, 5, 10, 16, 7], but only a few reported results include references to the bias of the measurements [8, 20, 2]. Overall, prior to this work there exists no comprehensive study of the sources of bias, and only limited solutions for reducing bias have been proposed.

The studies that did not focus on bias include many of the bias sources analyzed in our work. Most of these studies span only a few days [3] or weeks. Similarly, many of these studies collect information from a particular location, such as a university [5] or a router in the Internet backbone [18]; thus, their results include a vantage point effect [15]. Other studies cover only a few files [10] or just one community [6].

The studies that did recognize the importance of addressing measurement biases focused on a limited subset of the sources analyzed in our work. In general, under the assumption that "more is better", these studies obtained data over long periods of time [20, 2], from more peers and from peers located all over the world [7, 8, 2], for more files [8] and more communities [2], and filtered the raw data before analysis to eliminate potential biases [7, 2]. However, these studies did not eliminate all the sources of bias investigated in our work, and did not quantify the extent of the bias for the sources they consider.

Close to our work, Stutzbach et al. [19] assess the bias incurred by sampling data from unstructured P2P file-sharing networks, and propose the MRWB technique for collecting nearly unbiased samples for unstructured networks. Their technique is designed specifically for unstructured networks such as Gnutella, in that it relies on the ability of the measurement tools to select nodes based on their different connectivity degree in the network graph. Thus, MRWB cannot be applied to traditional BitTorrent networks, in which all nodes have a connectivity degree of 1 to the measurement point (the tracker). Also, this body of related work does not consider the case of disjoint networks, which are common in BitTorrent either as independent communities or as independent swarms within the same community.

Closest to our work, Stutzbach et al. [20] investigate the trade-off between sampling rate and the bias incurred by sampling data from unstructured P2P file-sharing networks. However, their analysis results are specific to unstructured networks in that the maximum crawling rate can exceed the rate of meaningful events such as peer arrival and departure. Thus, their results do not hold for BitTorrent networks, where the default tracker settings limit the "crawl" to less than 500 peers per hour.

7

Conclusion and Future Work

Accurate measurements of real BitTorrent deployments are required to improve quality of service for millions of BitTorrent users. In this work we have presented the first thorough investigation of the factors that influence BitTorrent measurement accuracy. Towards this end, we have first proposed a method for evaluating accuracy. Our method includes a taxonomy of sources of inaccurate results comprising two axes (data source selection and data volume reduction), totaling six inaccuracy sources, and two categories of metrics for quantifying accuracy. Then, we have evaluated the effects of the different sources of inaccuracy using 15 real traces taken from 9 BitTorrent communities. Our results indicate that current BitTorrent measurement techniques fail to consider most of the six sources of inaccuracy, and thus introduce high and often difficult to characterize data inaccuracies. We plan to extend our work towards a complete method for accurate and low-volume BitTorrent measurements.

8

Acknowledgements

The research leading to this contribution has received funding from the European Community’s Seventh Framework Programme in the P2P-Next project under grant no 216217.


References

[1] ipoque Internet studies, 2006-2009. [Online] Available: www.ipoque.com/resources/internet-studies/.
[2] N. Andrade, E. Santos-Neto, F. V. Brasileiro, and M. Ripeanu. Resource demand and supply in BitTorrent content-sharing communities. Computer Networks, 53(4):515–527, 2009.
[3] R. Bhagwan, S. Savage, and G. M. Voelker. Understanding availability. In IPTPS, pages 256–267, 2003.
[4] P. Garbacki, D. H. J. Epema, and M. van Steen. Optimizing peer relationships in a super-peer network. In ICDCS, page 31, 2007.
[5] K. Gummadi, R. Dunn, S. Saroiu, S. Gribble, H. Levy, and J. Zahorjan. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In ACM Symp. on Operating Systems Principles (SOSP), 2003.
[6] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, and X. Zhang. Measurements, analysis, and modeling of BitTorrent-like systems. In Internet Measurement Conference, pages 35–48, 2005.
[7] S. B. Handurukande, A.-M. Kermarrec, F. L. Fessant, L. Massoulié, and S. Patarin. Peer sharing behaviour in the eDonkey network, and implications for the design of server-less file sharing systems. In EuroSys, pages 359–371, 2006.
[8] A. Iosup, P. Garbacki, J. A. Pouwelse, and D. H. J. Epema. Correlating topology and path characteristics of overlay networks and the Internet. In IEEE/ACM Int’l. Symp. on Cluster Computing and the Grid (CCGrid) Workshops, GP2PC, page 10, 2006.
[9] A. Iyengar, M. S. Squillante, and L. Zhang. Analysis and characterization of large-scale web server access patterns and performance. World Wide Web, 2(1-2):85–100, 1999.
[10] M. Izal et al. Dissecting BitTorrent: Five Months in a Torrent’s Lifetime. In Proc. of PAM, pages 1–11, Antibes Juan-les-Pins, France, Apr 2004.
[11] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Trans. Netw., 2(1):1–15, 1994.
[12] H. W. Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62:399–402, 1967.
[13] J. Mol, J. Pouwelse, D. Epema, and H. Sips. Free-riding, fairness, and firewalls in P2P file-sharing. In IEEE Int’l. Conf. on Peer-to-Peer Computing (P2P), pages 301–310, 2008.
[14] A. Parker. The True Picture of Peer-To-Peer File-Sharing, 2005. Panel Presentation, IEEE Int’l. Workshop on Web Content Caching and Distribution.
[15] V. Paxson. Strategies for Sound Internet Measurement. In Proc. of ACM/USENIX IMC, pages 263–271, Oct 2004.
[16] J. A. Pouwelse, P. Garbacki, D. H. J. Epema, and H. J. Sips. The BitTorrent P2P file-sharing system: Measurements and analysis. In IPTPS, volume 3640 of LNCS, pages 205–216. Springer, 2005.
[17] S. Saroiu, P. K. Gummadi, and S. Gribble. A measurement study of peer-to-peer file sharing systems. In Multimedia Computing and Networking (MMCN ’02), January 2002.
[18] S. Sen and J. Wang. Analyzing peer-to-peer traffic across large networks. In Proc. of ACM SIGCOMM IMW, pages 137–150, 2002.
[19] D. Stutzbach, R. Rejaie, N. G. Duffield, S. Sen, and W. Willinger. On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans. Netw., 17(2):377–390, 2009.
[20] D. Stutzbach, R. Rejaie, and S. Sen. Characterizing unstructured overlay topologies in modern P2P file-sharing systems. IEEE/ACM Trans. Netw., 16(2):267–280, 2008.
[21] S. Xie, G. Y. Keung, and B. Li. A measurement of a large-scale peer-to-peer live video streaming system. In ICPPW ’07: Proceedings of the 2007 International Conference on Parallel Processing Workshops, page 57, Washington, DC, USA, 2007. IEEE Computer Society.
[22] B. Zhang, A. Iosup, P. Garbacki, and J. Pouwelse. A unified format for traces of peer-to-peer systems. In LSAP ’09: Proceedings of the 1st ACM workshop on Large-Scale system and application performance, pages 27–34, New York, NY, USA, 2009. ACM.


http://www.pds.ewi.tudelft.nl/~iosup/