BitTorrent Traffic Measurements and Models David Erman

October 2005 Department of Telecommunication Systems, School of Engineering, Blekinge Institute of Technology

© October 2005, David Erman. All rights reserved.
Copyright Blekinge Institute of Technology
Licentiate Dissertation Series No. 2005:13
ISSN 1650-2140
ISBN 91-7295-071-4

Published 2005. Printed by Kaserntryckeriet AB, Karlskrona 2005, Sweden. This publication was typeset using LaTeX.

For my Family past, present and future

Abstract

The Internet has experienced two major revolutions. The first was the emergence of the World Wide Web, which catapulted the Internet from being a scientific and academic network to becoming part of the societal infrastructure. The second revolution was the appearance of Peer-to-Peer (P2P) applications, spearheaded by Napster. The popularity of P2P networking has led to a dramatic increase in the volume and complexity of the traffic generated by P2P applications. P2P traffic has recently been shown to amount to almost 80 % of the total traffic on a high-speed IP backbone link. One of the major contributors to this massive volume of traffic is BitTorrent, a P2P replication system. Studies have shown that BitTorrent traffic more than doubled during the first quarter of 2004, and still amounts to 60 % of all P2P traffic in 2005. This thesis reports on the measurement, modelling and analysis of BitTorrent traffic collected at Blekinge Institute of Technology (BIT) as well as at a local ISP. An application layer measurement infrastructure for P2P measurements developed at BIT is presented. Furthermore, a dedicated fitness assessment method that avoids issues with large sample spaces is described. New results regarding BitTorrent session and message characteristics are reported, and models for several important characteristics are provided. Results show that several BitTorrent metrics, such as session durations and sizes, exhibit heavy-tail behaviour. Additionally, previously reported results on peer reactivity to new content are corroborated.


Acknowledgements

Several people have contributed in various ways, directly or indirectly, to the work culminating in this thesis. I extend my gratitude to them all. However, I would like to thank a few people in particular.

• My advisor, Docent Adrian Popescu, for his attention to detail and correctness, motivation and encouragement.
• My fellow graduate students at BIT, in particular Dragos Ilie and Doru Constantinescu, for valuable criticism and encouragement.
• Dr. Markus Fiedler. His enthusiasm and tenacity are an inspiration to any PhD student.
• Prof. Arne Nilsson, for accepting me as a PhD student.
• My parents, for performing above and beyond the call of duty and teaching me to question the unquestionable.
• My immediate family, Maria, for putting up with me during the writing of this thesis.

David Erman Karlskrona, October 2005


Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Main Contributions
  1.4 Thesis Outline
2 Peer-to-peer Protocols
  2.1 Evolution
  2.2 Definitions
  2.3 P2P and File Sharing
  2.4 Summary
3 The BitTorrent Protocol
  3.1 BitTorrent Encoding
  3.2 Resource Meta-data
  3.3 Network Entities and Protocols
  3.4 Peer States
  3.5 Sharing Fairness and Bootstrapping
  3.6 Data Transfer
  3.7 BitTorrent Performance Issues
  3.8 Summary
4 Traffic Measurements
  4.1 Measurement Approaches
  4.2 Application Level Traffic Analysis
  4.3 Measurement Infrastructure
  4.4 Measurement Software
  4.5 Summary
5 Traffic Modelling
  5.1 Heavy-tailed Traffic Models
  5.2 Hypothesising Distributions
  5.3 Mixture Distributions
  5.4 Parameter Estimation
  5.5 Summary
6 Fitness Assessment
  6.1 Graphical Methods
  6.2 Hypothesis Testing
  6.3 The Case of Large Sample Spaces
  6.4 Relative and Absolute Fitness
  6.5 Summary
7 Modelling Methodology
  7.1 Distribution Selection
  7.2 Parameter Estimation
  7.3 Fitness Assessment
  7.4 Summary
8 BitTorrent Measurements
  8.1 Traffic Metrics
  8.2 Traffic Measurements
  8.3 Summary Statistics
  8.4 Swarm Size
  8.5 Session Sizes
  8.6 Summary
9 BitTorrent Models
  9.1 Session Characteristics
  9.2 Message Characteristics
  9.3 Summary
10 Conclusions and Future Work
  10.1 Future Work
A BitTorrent Protocol Details
  A.1 Bencoding Types
  A.2 Peer Wire Protocol Messages
  A.3 Tracker Request Parameters
  A.4 Scrape Response Keys
B BitTorrent XML Log File
  B.1 BitTorrent Application Log DTD
Bibliography


List of Figures

3.1 BitTorrent handshake procedure
3.2 Example tracker announce GET request
3.3 Compact tracker response
3.4 Example tracker scrape GET request
3.5 BitTorrent protocol exchange
4.1 BIT measurement setup
4.2 Measurement procedures
4.3 Sample BitTorrent log file
5.1 Pareto, Weibull, Log-normal and Exponential Hill plots
5.2 Pareto, Weibull, Log-normal and Exponential α-estimator plots
5.3 Pareto, Weibull, Log-normal and Exponential CCDF
5.4 Skewness
5.5 A finite mixture distribution
6.1 AD weighting function for a uniform distribution
8.1 Temporal structure of measurements 1–12
8.2 Connected peers during seed phase for measurements 4 and 6
8.3 Connected peers during leech phase for measurements 4 and 6
8.4 Swarm reaction to new content
9.1 Fitness assessment plots
9.2 Session size-duration scatter plot
9.3 α-estimates and CCDF for measurement 3
9.4 Upstream request rate during leech phase
9.5 Modelling results for request rate during leech phase
9.6 Modelling results for request inter-departure times during leech phase
9.7 Modelling results for downstream piece rate during leech phase
9.8 Modelling results for downstream piece inter-arrival times during leech phase
9.9 Dual Weibull modelling results for downstream request rate during seed phase
9.10 Modelling results for request inter-arrival times during seed phase
9.11 Dual Weibull modelling results for upstream piece rates during seed phase
9.12 Modelling results for piece inter-departure times during seed phase
B.1 Extract from BitTorrent XML log file


List of Tables

2.1 P2P and CS content models
6.1 EDF statistic percentage points
7.1 Fitness quality boundaries
8.1 Measurement summary
8.2 Content summary
8.3 Download time and average download rate summary
8.4 Session and peer summary
8.5 Downstream protocol message summary
8.6 Upstream protocol message summary
8.7 Share ratio during leech phase
8.8 Correlation coefficients for session sizes
9.1 Fitted hyper-exponential parameters
9.2 Correlation coefficients for session duration and sizes
9.3 Percentages of session sizes exceeding 0 bytes and 1 piece size
9.4 Session α-estimates
9.5 Log-normal parameter estimates and errors for upstream session sizes during seed phase
9.6 Log-normal parameter estimates and errors for upstream session durations during seed phase
9.7 Gaussian parameter estimates and errors for upstream request rate during leech phase
9.8 Exponential parameter estimates and errors for request inter-departure times during leech phase
9.9 Exponential and Uniform parameter estimates and errors using alternative model for request inter-departure times during leech phase
9.10 Weibull parameter estimates and errors for downstream piece rate during leech phase
9.11 Exponential parameter estimates and errors for piece inter-arrival times during leech phase
9.12 Weibull parameter estimates and errors for downstream request rate during seed phase
9.13 Dual Weibull parameter estimates and errors for downstream request rate during seed phase
9.14 Exponential parameter estimates and errors for request inter-arrival times during seed phase
9.15 Weibull parameter estimates and errors for upstream piece rate during seed phase
9.16 Dual Weibull parameter estimates and errors for upstream piece rate during seed phase
9.17 Exponential parameter estimates and errors for piece inter-departure times during seed phase


Acronyms

AD      Anderson-Darling
BIT     Blekinge Institute of Technology
CCDF    Complementary Cumulative Distribution Function
CDN     Content Delivery Network
CS      Client-Server
CVM     Cramér-von Mises
DHT     Distributed Hash Table
DNS     Domain Name System
DTD     Document Type Definition
DOM     Document Object Model
EDF     Empirical Distribution Function
EPDF    Experimental Probability Density Function
IID     Independent and Identically Distributed
ISP     Internet Service Provider
KS      Kolmogorov-Smirnov
LRD     Long-Range Dependence
MLE     Maximum Likelihood Estimation
ML      Maximum-Likelihood
MPAA    Motion Picture Association of America
NAT     Network Address Translation
NFS     Network Filesystem
NNTP    Network News Transfer Protocol
P2P     Peer-to-Peer
PDF     Probability Density Function
PIT     Probability Integral Transform
PSTN    Public Switched Telephone Network
QQ      Quantile-Quantile
QoS     Quality of Service
RIAA    Recording Industry Association of America
RMON    Remote Monitoring
SAX     Simple API for XML
SHA-1   Secure Hash Algorithm One
SMTP    Simple Mail Transfer Protocol
SNMP    Simple Network Management Protocol
SRD     Short-Range Dependence
URI     Uniform Resource Identifier
UUCP    Unix to Unix Copy Protocol
VoIP    Voice over IP


Chapter 1

Introduction

The prisoner falls in love with his chains.
– Edsger W. Dijkstra

The global Internet has become an integral part of everyday life. It is now as fundamental a part of the infrastructure as the telephone system or the road network. The initial factor driving the acceptance and widespread usage of the Internet was the introduction of the World Wide Web (WWW) by Tim Berners-Lee in 1989. The WWW offered a novel and intuitive way of accessing information, and quickly became the Internet “killer application” [6]. In May 1999, ten years after the advent of the WWW, Shawn Fanning introduced Napster, arguably the first modern Peer-to-Peer (P2P) application [53]. The Napster application and protocols were the first to allow users to share files with each other without the need for a central storage server. Napster very quickly became immensely popular, and the P2P revolution had begun. Since the advent of Napster, P2P systems have become widespread with the emergence of file-sharing applications such as Gnutella [46], KaZaA [71] and eDonkey [27]. These systems generated headlines across the globe when the Recording Industry Association of America (RIAA) [56] and the Motion Picture Association of America (MPAA) [55] started filing lawsuits against file-sharing users suspected of copyright infringement. The lawsuits are partly responsible for the term P2P being adopted as a euphemism for illegal file sharing. Fortunately, the concept of P2P networking is broader than that, and P2P systems have many useful legitimate applications.

The P2P paradigm is the logical and functional antithesis of the Client-Server (CS) paradigm that has been predominant in IP-based networks since their inception. This is, however, only true to a certain degree, as the idea of sharing among equals has been part of the Internet since the early days of the network. Two examples are the Internet e-mail system and the Domain Name System (DNS). Both are so tightly connected to the inner workings of the Internet that it is impossible to imagine today's degree of Internet usage without them. Once an e-mail has left the user's mail software, it is routed among mail transfer agents (MTAs), all acting as equally valued message forwarders. The DNS is the first distributed information database, and implements a hierarchical mapping scheme which is comparable to the multi-layered P2P systems. The fundamental difference between “legacy” P2P systems such as DNS and e-mail and the new P2P systems such as Gnutella, Napster and eDonkey is that the older systems work as part of the network core, while the new applications are typically application-layer protocols run by edge-node applications. The shift of the edge nodes from acting purely as service users to additionally taking the role of service providers has significantly changed the characteristics of the network traffic.

1.1 Motivation

Measurement studies and analyses of P2P traffic have so far been rather limited. This is due to the complexity of the task, which involves answering hard questions related to data retrieval and content location, storage, data analysis and modelling of traffic and topological characteristics, as well as privacy and copyright issues. Two major points motivate the work performed for this thesis:

• BitTorrent has become extremely popular over the last few years. According to Cachelogic, the BitTorrent share of the total P2P traffic volume increased from 26 % to 52 % during the first half of 2004 [9]. This growth suggests that understanding the characteristics of BitTorrent would also help in understanding the overall behaviour of the Internet.
• There are few measurement studies of BitTorrent [24, 38, 39]. This is partly because the protocol is quite new, only a few years old, and partly because of the general complexity of the task. In the few existing studies, traffic has been collected from “trackers” as well as with the help of modified clients. However, no dedicated message-level measurement studies have been performed so far.

The main goals of this thesis are to understand the characteristics of the BitTorrent system and, based on that, to develop models suitable for a P2P simulation environment. To that end, a dedicated measurement system for P2P traffic measurements [36] has been designed and implemented.

1.2 Related Work

In general, measurement studies of P2P systems are few in number. Saroiu et al. performed a measurement study of Napster and Gnutella in 2002 [69], using both active and passive measurements. Their results show the lack of cooperation among peers in these systems, as well as several other characteristics such as estimated peer bandwidths, number of shared files and resilience.

The present work is one of a few investigating the properties of the BitTorrent system. For instance, in [38], the authors use tracker and client logs to evaluate performance on both global and session scales. They note the efficiency of the tit-for-tat policy employed in BitTorrent, and the flexibility and scalability of the protocol.

Qiu and Srikant present a fluid flow model for BitTorrent-like P2P file-sharing networks. They assume a Poisson peer arrival process and exponentially distributed download times. Additionally, they assume that seeds remain in the network for an exponentially distributed time, and that all peers have identical download rates. Their results state that, in steady state, the numbers of seeds and leechers are Gaussian random variables [24].

Nicoll et al. have analysed tracker log files with regard to session sizes (denoted file sizes in their paper), peer bandwidth and share ratios. They note that up to 20 % of peers do not download any data at all, and that 20-25 % of peers connect but do not upload any data. Additionally, the authors point out that 80 % of peers have a share ratio less than 1, i.e., they download more than they upload.

The measurement infrastructure developed at Blekinge Institute of Technology (BIT) is capable of detecting and measuring application layer messages with link layer accuracy. One drawback of the infrastructure is that it cannot yet do so in real time.

In [44], Karagiannis et al. present a novel method for identifying P2P traffic without resorting to application payload decoding. Their method is based on observing connection patterns of source and destination IP addresses. To verify the method, the authors also present a payload identification method using protocol-specific bit strings.
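Payload identification with protocol-specific bit strings, and message-level decoding of the kind the BIT infrastructure performs, can be illustrated with a small sketch. The Python code below is a hypothetical example under stated assumptions. It is not the BIT measurement software and not the method of [44]. It assumes that one direction of a TCP connection has already been reassembled into a byte string. A BitTorrent connection starts with a handshake whose first bytes are the value 19 followed by the string “BitTorrent protocol”, which serves as the protocol-specific bit string. Each subsequent peer wire message consists of a 4-byte big-endian length prefix followed by a 1-byte message identifier and its payload. The function and constant names are illustrative only; the handshake layout and message identifiers follow the publicly documented BitTorrent peer wire protocol.

```python
import struct

# Handshake: <pstrlen=19><"BitTorrent protocol"><8 reserved><20 info_hash><20 peer_id>
HANDSHAKE_PREFIX = bytes([19]) + b"BitTorrent protocol"
HANDSHAKE_LEN = 1 + 19 + 8 + 20 + 20  # 68 bytes

# Message identifiers of the peer wire protocol (a zero length prefix is a keep-alive).
MESSAGE_NAMES = {
    0: "choke", 1: "unchoke", 2: "interested", 3: "not interested",
    4: "have", 5: "bitfield", 6: "request", 7: "piece", 8: "cancel",
}


def is_bittorrent(payload: bytes) -> bool:
    """Signature check: does the payload begin with the BitTorrent handshake?"""
    return payload.startswith(HANDSHAKE_PREFIX)


def decode_messages(stream: bytes) -> list[str]:
    """Return the names of the complete messages that follow the handshake.

    `stream` is assumed to be one reassembled direction of a TCP connection.
    A trailing, incomplete message is ignored.
    """
    if not is_bittorrent(stream) or len(stream) < HANDSHAKE_LEN:
        raise ValueError("stream does not start with a BitTorrent handshake")
    names = []
    offset = HANDSHAKE_LEN
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        if offset + 4 + length > len(stream):
            break  # message continues beyond the captured data
        if length == 0:
            names.append("keep-alive")
        else:
            msg_id = stream[offset + 4]
            names.append(MESSAGE_NAMES.get(msg_id, f"unknown({msg_id})"))
        offset += 4 + length
    return names


if __name__ == "__main__":
    handshake = HANDSHAKE_PREFIX + bytes(8) + bytes(20) + bytes(20)
    interested = struct.pack(">IB", 1, 2)
    request = struct.pack(">IBIII", 13, 6, 0, 0, 16384)  # index, begin, length
    keep_alive = struct.pack(">I", 0)
    print(decode_messages(handshake + interested + request + keep_alive))
```

Run as a script, the example prints ['interested', 'request', 'keep-alive']. A real measurement tool would additionally need TCP reassembly, per-message timestamps and handling of the remote peer's handshake, all of which are omitted here.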

1.3 Main Contributions

The main contributions of this thesis relate to providing accurate models of several key BitTorrent characteristics. To the best of our knowledge, this is the first study of this kind. The reported models include session durations, sizes and inter-arrival times, as well as rates and inter-arrival times for the two most relevant BitTorrent application messages. From a traffic engineering and control viewpoint, the session models reported in Section 9.1 provide an incentive for controlling the number of concurrent BitTorrent flows. The message characteristics reported in Section 9.2 indicate that some form of per-message control may also be beneficial in decreasing the burstiness of the network traffic.

Other contributions include the development of a modular P2P measurement infrastructure. This infrastructure is currently used to measure Gnutella and BitTorrent traffic with high accuracy on the link layer. Additionally, the method for assessing model fitness in the case of large sample spaces (Section 7.3) may prove useful for other modelling scenarios as well. It has performed well during the current work, as well as in other published work [21]. Parts of the work presented in this thesis have previously been published in [29–31, 36, 37].

1.4 Thesis Outline

This thesis contains ten chapters and two appendices. The current chapter has presented the motivation for and main contributions of the thesis, along with a brief presentation of the state of the art in P2P research. Chapter two contains a short history and description of P2P systems, with special focus on the most popular application, i.e., file sharing. This is followed by a detailed description of the BitTorrent system and the associated protocols in Chapter three. Chapter four gives an introduction to traffic measurements and a brief description of the measurement infrastructure used for this work. Chapter five discusses traffic modelling in general, and heavy-tailed modelling in particular; tools for describing empirical distributions are also presented. Chapter six summarises some of the most common methods of determining the fitness of specific distributions. Chapter seven builds on the two preceding chapters to present the modelling methodology used for the work performed for this thesis. In Chapter eight, the actual measurements performed are presented, together with some of the more salient results of these measurements. Chapter nine reports on the models for BitTorrent session and message characteristics. Chapter ten concludes the thesis with the conclusions and implications of the presented work, and also presents potential future work. The appendices contain implementation details for the BitTorrent protocols and a description of the XML log format used for the application measurements presented in Chapter eight.


Chapter 2

Peer-to-peer Protocols

Tvertimot! (On the contrary!)
– Henrik Ibsen

The concept of P2P protocols, systems and applications is quite broad. The term P2P commonly refers to applications and systems that share resources in a distributed and decentralised manner. Participants in these systems are viewed as logical and functional equals. This is in contrast to pure Client-Server (CS) protocols, where participants either serve resources or are served resources. A more formal definition is provided in Section 2.2.

2.1 Evolution

The earliest recorded use of the term “peer-to-peer” was in 1984, in connection with the IBM Advanced Peer to Peer Networking (APPN) Architecture [28], which was the result of multiple enhancements to the Systems Network Architecture (SNA). Although early networking protocols such as the Unix to Unix Copy Protocol (UUCP) [35], Network News Transfer Protocol (NNTP) [43] and Simple Mail Transfer Protocol (SMTP) [45] worked in a P2P fashion (indeed, the original ARPANET was designed as a P2P system), the term P2P did not become mainstream until the appearance of Napster in the fall of 1999.

Napster was the first popular P2P file-sharing service. The main goal of the service was to provide users with an easy means of finding music files encoded in the MP3 format. The architecture of Napster was built around a central server that was used to index music files shared by client nodes. This approach is called a centralised directory. The centralised directory allowed Napster to reply very rapidly with the hosts that stored a particular file. The actual file transfer occurred directly between the node looking for the file and the node storing it. The success of Napster quickly became a source of serious concern for major record companies, which rapidly filed a lawsuit against Napster on grounds of copyright infringement. The lawsuit made Napster immensely popular, attracting millions of additional users to the system. However, Napster could not withstand the pressure of the lawsuit, and in July 2001 it was forced to shut down the central server. Without the central server, the client nodes could no longer search for files. Thus, the fragility of a centralised directory system became apparent. Napster is one of the first-generation P2P applications as defined by [28].

Following the advent of Napster, several other P2P applications appeared. These applications were similar in appearance, but altogether different beasts in detail. Gnutella [18], which was released by Justin Frankel of Winamp fame in early 2000, opted to implement a fully distributed system with no central authority. The same year saw the emergence of the Freenet system, the brainchild of Ian Clarke. Clarke wrote his Master's thesis on a distributed, anonymous and decentralised information storage and retrieval system.
This system later became Freenet [17, 65]. The major difference between Freenet and previous P2P systems was the complete anonymity it offered to users. The fully distributed architecture was resilient to node failures and was also immune to service disruptions of the type experienced by Napster. However, experience with Gnutella has shown that fully distributed P2P systems may lead to scalability problems due to the massive amounts of signalling traffic they generate [68].

By late 2000 and early 2001, the P2P boom had started and applications such as KaZaA [71], DirectConnect [54], SoulSeek [73] and eDonkey [27] appeared. In addition to the file-sharing services provided by previous systems, these new systems usually provided some form of community features such as chat rooms and forums. KaZaA, which uses the FastTrack protocol, introduced the concept of supernodes in order to solve scalability problems similar to those experienced by Gnutella. Each supernode manages a number of regular nodes and exchanges information about them with other supernodes. Regular nodes upload file lists and search requests to their supernode, and the search requests are processed solely among the supernodes. Regular peers establish direct HTTP connections to download files. Gnutella resolved the scalability problem in a similar way; in Gnutella, supernodes are called ultrapeers.

During the last few years, the older P2P systems have evolved to better utilise network resources. New systems have emerged with a specific focus on efficient bandwidth utilisation, the most significant example being the BitTorrent system. Furthermore, new systems tend to focus on using Distributed Hash Tables (DHTs). DHTs force network topology and data storage to follow specific mathematical structures in order to optimise various parameters (e.g., to minimise delay or the number of hops). They are considered a promising alternative to the flooding algorithms required for routing in unstructured P2P networks.
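The two indexing approaches described in this section, Napster's centralised directory and the supernodes introduced by FastTrack, can be contrasted with a toy model. The Python classes below are hypothetical illustrations under simplifying assumptions, not the Napster or FastTrack protocols: an index maps file names to the peers sharing them, file transfers between peers are not modelled, and the hop limit on supernode search (max_hops) is an assumption added to keep the flood bounded. Taking the central server offline disables every search, whereas taking one supernode offline only hides the files indexed by that supernode.

```python
from collections import defaultdict


class Index:
    """File-name index: file name -> set of peers that share the file."""

    def __init__(self) -> None:
        self.files: dict[str, set[str]] = defaultdict(set)
        self.online = True

    def upload_file_list(self, peer: str, names: list[str]) -> None:
        # A peer (Napster client or FastTrack regular node) registers its shared files.
        for name in names:
            self.files[name].add(peer)

    def lookup(self, name: str) -> set[str]:
        # Return a copy so callers cannot modify the index.
        return set(self.files.get(name, set()))


class CentralDirectory(Index):
    """Napster-style: a single server indexes the files of every client."""

    def search(self, name: str) -> set[str]:
        if not self.online:
            raise ConnectionError("central server unavailable")
        return self.lookup(name)


class Supernode(Index):
    """FastTrack-style: regular nodes register with one supernode, and
    searches travel only over the overlay of supernodes."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.neighbours: list["Supernode"] = []

    def connect(self, other: "Supernode") -> None:
        self.neighbours.append(other)
        other.neighbours.append(self)

    def search(self, name: str, max_hops: int = 2) -> set[str]:
        # Breadth-first flood limited to max_hops supernode-to-supernode hops.
        # An offline supernode neither answers nor forwards the query.
        hits: set[str] = set()
        frontier, seen = [self], {self.name}
        for _ in range(max_hops + 1):
            next_frontier = []
            for sn in frontier:
                if not sn.online:
                    continue
                hits |= sn.lookup(name)
                for nb in sn.neighbours:
                    if nb.name not in seen:
                        seen.add(nb.name)
                        next_frontier.append(nb)
            frontier = next_frontier
        return hits
```

For instance, in a chain of three supernodes where the last one is offline, a search from the first supernode still returns the files indexed by the first two, whereas a CentralDirectory raises ConnectionError as soon as it is taken offline.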

2.2 Definitions

There is no clear consensus regarding an exact definition of a P2P system. Schollmeier makes an attempt to define a P2P network in [70]. In general, the notion of a P2P network seems to lean towards some form of utilisation of edge node resources by other edge nodes. The resource in question is commonly taken to be files, and much research focuses on the efficient localisation and placement of files. There also seems to be some consensus regarding the idea of pure and hybrid systems.

A P2P network is defined in [70] as a network in which the service provided by the system is provided by the participating nodes. The participating nodes contribute part of their local resources, such as disk space, files and CPU processing time, to the common resource pool. A pure P2P network is one in which any given participant may be removed without the system experiencing loss of service. Examples of this type of network are Gnutella, FastTrack and Freenet. A hybrid P2P network is one in which a central authority of some sort is necessary for the system to function. Note that, in contrast to the CS model, the central authority in a hybrid network rarely shares resources; it is still the participating peers that share resources. The central authority is commonly an indexing server for files or provides a peer localisation service. Examples of this type of network are Napster, eDonkey and DirectConnect.

It is also possible to take a resource view of the two types of P2P networks described above. Consider the three functions of content insertion, distribution and control, and how they are performed in P2P and CS networks (Table 2.1).

Insertion

Insertion is the function of adding content to the resource pool of a network. Insertion is here used in the sense of providing the content, so that in both pure and hybrid P2P networks content is inserted by the participating peers. This is analogous to the peers sharing content. In a CS system, however, content is always provided by the server, and thus also “shared” by the server.

Distribution

This is the function of retrieving content from a network resource pool. Again, P2P systems lack central content localisation; thus content is disseminated in a decentralised fashion. This does not necessarily mean that parts of the same content are retrieved from different sources, i.e., swarming, but rather that the parts (e.g., files) of the total resource pool are retrieved from different sources. Hybrid CS systems refer to redundant server systems and content delivery networks (CDNs), such as Akamai [1]. Redundant server systems are systems in which several servers provide the same content, but are accessed by the requesting client through a single Uniform Resource Identifier (URI). This is a common model for WWW servers in the Internet today.

Control

Control is the function of managing the resource pool of a network, such as admission control and resource localisation. This is the function that separates the two types of P2P networks. The peers participating in fully decentralised networks are required to assist in the control mechanisms of the network, while hybrid systems may rely on a central authority for this. Of course, the clients in CS systems have no responsibility for network control.

Table 2.1: P2P and CS content models. C denotes centralised and D denotes decentralised.

                Pure P2P   Hybrid P2P   Hybrid CS   Pure CS
Insertion       D          D            C           C
Distribution    D          D            C/D         C
Control         D          C            C           C

In addition to the definitions provided above, P2P systems can also be classified according to their “generation” [28]. In this classification scheme, hybrid systems such as Napster are considered to be first generation systems, while fully decentralised systems such as FastTrack and Gnutella are second generation systems. A third generation is discussed as improving upon the first two with respect to features such as redundancy, reliability or anonymity.

2.3 P2P and File Sharing

File sharing is almost as old as operating systems themselves. Early methods for sharing files include the UNIX remote copy (rcp) command and the File Transfer Protocol (FTP) [64]. They were quickly followed by full-fledged network file systems such as NFS [52,72] and CIFS [2]. A common trait of these protocols (with the exception of rcp) is that they were designed around the CS paradigm, with the servers being the entities storing and serving files. A client that wants to share files must upload them to the server to make them available to other clients. Instant messaging systems such as ICQ [3], Yahoo! Messenger [7] and MSN Messenger [4] attempted to provide a file sharing service by implementing a mechanism similar to rcp. Users could thus share files with each other without having to store them on a central server. In fact, this was the first form of P2P file sharing. Napster further extended this idea by implementing efficient file search facilities. In the public eye, P2P is synonymous with file sharing. However, other applications that may be termed P2P have become fairly popular as well, such as the SETI@home project [75], distributed.net [26] and ZetaGrid [84]. These applications have been fairly successful in attracting a user base, but none of them comes close to the number of users of the file sharing services. These services are examples of altruistic systems: the participating peers contribute CPU processing power and time to a common resource pool without deriving personal benefit from it. The pooled CPU resources are then used to perform various complex calculations, such as fast Fourier transforms of galactic radio data, code-breaking or finding the zeros of the Riemann zeta function. A possible reason for the difference in number of users is that the incentive to share resources altruistically, gaining nothing but some virtual fame or the satisfaction of having contributed to the greater good of humanity, seems to be low.

Most file sharing P2P systems employ some form of admission scheme in which peers are not allowed to join the system or download from it unless they share an adequate number of files. This provides a dual incentive: first, a peer wanting to join the network must¹ provide a kind of entry token in the form of shared files, and second, peers joining the system know that a certain amount of content will be available to them once they join. The BitTorrent P2P system is one of the most prominent networks in enforcing incentives to share. As not all files are equally desirable in every system, files not belonging to the general category of files handled in a specific P2P network should not be allowed in. For instance, users of a network such as Napster, which only manages digital music files, might not be interested in peers sharing text files. For systems that require a large amount of file data to be shared as an admission criterion, this becomes a problem: peers may share “junk files” just to gain access to the network. Junk files are files that are not really requested or desired in the network. These practices are usually frowned upon, but are hard to counter. Some systems, such as eDonkey, have implemented a rating system in which peers are punished for sharing junk files. Similar to junk files, there are also “fakes” or “decoys”. Fakes are files inserted in the network under a filename that does not represent the actual content, or files that contain modified versions of the same content. Adding fakes to the network makes the real content more difficult to find. This problem is alleviated by identifying content with various hashing techniques instead of relying only on filenames.
An example of this is the insertion of a faked Madonna single, in which the artist had overlaid the phrase “What the hell do you think you’re doing?” on top of her newly released single. Often, fakes are not as immediately apparent as this, and some form of user feedback is useful; for instance, the eDonkey system implements a reputation system for files. Decoys are often generated automatically from incoming queries to pollute P2P networks. While decoys do not pollute the actual resource pool of the network, they can cause valid queries to be ignored or de-emphasised.

¹ Not in all systems, but in most hybrid systems.

While file sharing is not in itself an illegal technology and has several non-infringing uses, the ease with which peers may share copyrighted material has drawn the attention of the Motion Picture Association of America (MPAA) and the Recording Industry Association of America (RIAA). These organisations consider the sharing of material under their members’ copyrights to be seriously harming their revenue streams by decreasing sales. By 2004, the MPAA and RIAA had begun suing individuals for sharing copyrighted material. However, not all copyright holders and artists agree with this course of action, nor do they agree on the detrimental effect of file sharing on sales or artistic expression. Several smaller record labels have embraced the online distribution of samples of their artists’ music, and artists have formed coalitions against what they feel is the oppressive behaviour of the larger record labels. More recently, corporations have used P2P systems to distribute large files such as Linux distributions, game demos and patches. Many companies use BitTorrent for this, as it provides substantial savings in bandwidth costs.

2.4 Summary

This chapter has discussed the history and evolution of P2P systems. The first P2P systems were classic Internet services such as the DNS or e-mail. More modern systems include Gnutella and eDonkey. Currently, P2P systems are usually categorised as either pure or hybrid systems. Additionally, the most popular P2P service, file sharing, has been discussed. File sharing is the major application of P2P protocols, and is used in both commercial and personal settings.


Chapter 3

The BitTorrent Protocol

Anyone who considers protocol unimportant has never dealt with a cat.
– Robert A. Heinlein

BitTorrent is a P2P protocol for content distribution and replication, designed to replicate data quickly, efficiently and fairly [15,19]. The BitTorrent system may be viewed as comprising two protocols and a set of resource meta-data. The two protocols are used for communication among peers and for communication with a central network entity called the tracker. The meta-data provides all the information a peer needs to join a BitTorrent distribution swarm and to verify correct reception of the resource. The following terminology is used in this thesis: a BitTorrent swarm refers to all network entities partaking in the distribution of a specific resource. The terms BitTorrent protocol, or protocol in the singular, refer to the peer–peer protocol, while the peer–tracker communication is always explicitly referred to as the tracker protocol. The collection of protocols (peer, tracker and meta-data) is referred to as the BitTorrent protocol suite or protocol suite. In contrast to many other P2P protocols, such as eDonkey, DirectConnect and KaZaA, the BitTorrent protocol suite provides neither resource query or lookup functionality, nor chat, messaging or topology formation facilities. The protocols instead focus on fair and effective distribution of data, and the signalling is geared solely towards efficient dissemination of data. Fairness in the BitTorrent system is implemented by enforcing tit-for-tat exchange of content between peers. Non-uploading peers are only allowed to download very small amounts of data, making the download of a complete resource very time-consuming if a peer does not share downloaded parts of the resource. With one exception (Section 3.3.2), the protocols operate over TCP and use swarming, i.e., peers simultaneously download parts of the content, called pieces, from several peers. The rationale for this is efficiency in terms of network load, as the load is shared across the links between peers. This results in a more evenly distributed network utilisation than in conventional CS distribution systems. The size of the pieces is fixed on a per-resource basis and cannot be changed without generating a new meta-data file. The default piece size is 2^18 bytes, i.e., 256 kB. The selection of an appropriate piece size is a fairly important issue. If the piece size is small, re-downloading a failed piece is fast, but the amount of meta-data needed to describe the resource grows. Conversely, larger piece sizes mean less meta-data but longer re-download times.
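The piece-size trade-off can be made concrete with a little arithmetic. The sketch below assumes only that each piece contributes one 20-byte SHA-1 hash to the meta-data and counts the pieces and hash bytes for a resource; the function name piece_overhead is hypothetical, and the example size is simply the left value of the request in Figure 3.2.

```python
SHA1_DIGEST_SIZE = 20  # bytes of meta-data per piece (one SHA-1 hash)


def piece_overhead(resource_size: int, piece_size: int = 2**18):
    """Hypothetical helper: return (number of pieces, bytes of piece hashes)."""
    num_pieces = -(-resource_size // piece_size)  # ceiling division; last piece may be short
    return num_pieces, num_pieces * SHA1_DIGEST_SIZE


if __name__ == "__main__":
    size = 663_459_840  # example size only: the "left" value in Figure 3.2
    for exponent in range(16, 23):
        pieces, hash_bytes = piece_overhead(size, 2**exponent)
        print(f"piece size 2^{exponent} B: {pieces} pieces, {hash_bytes} B of hashes")
```

Quadrupling the piece size cuts the hash overhead roughly fourfold, at the cost of having to re-download up to four times as much data when a single piece fails verification.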

3.1 BitTorrent Encoding

BitTorrent uses a simple encoding scheme for most protocol messages and associated data. This encoding scheme is known as bencoding. The scheme allows for data structuring and type definition, and currently supports four data types: strings, integers, lists and dictionaries. These are detailed in Section A.1 of the Appendix.
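A minimal sketch of bencoding for the four data types listed above may help. It assumes the wire format of the BitTorrent specification (integers as i<n>e, strings as <length>:<bytes>, lists as l…e and dictionaries as d…e with keys in sorted raw-byte order); the function names bencode and bdecode are illustrative and not taken from any particular client.

```python
def bencode(value) -> bytes:
    """Hypothetical encoder for int, str/bytes, list and dict values."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])  # keys must appear in sorted (raw byte) order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data: bytes):
    """Hypothetical decoder; strings are returned as bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    kind = data[i:i + 1]
    if kind == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if kind in (b"l", b"d"):  # list or dictionary: items until a closing e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            if i >= len(data):
                raise ValueError("unterminated list or dictionary")
            item, i = _decode(data, i)
            items.append(item)
        i += 1
        if kind == b"l":
            return items, i
        if len(items) % 2:
            raise ValueError("dictionary key without value")
        return dict(zip(items[0::2], items[1::2])), i
    if kind.isdigit():  # string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("string runs past end of data")
        return data[start:end], end
    raise ValueError(f"invalid bencoding at offset {i}")
```

For example, bencode({"spam": ["a", 1]}) yields b"d4:spaml1:ai1eee". Returning strings as bytes keeps binary values, such as SHA-1 hashes, intact.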


3.2 Resource Meta-data

A peer that wants to download some content using BitTorrent must first obtain a set of meta-data, the so-called torrent file, to be able to join the set of peers engaged in distributing that content. The meta-data needed to join a BitTorrent swarm consists of the network address information of the tracker (in BitTorrent terminology called the announce URL) and resource information such as file and piece sizes. The torrent file itself is a bencoded version of this meta information. An important part of the resource information is a set of Secure Hash Algorithm One (SHA-1) [8, 57] hash values¹, each value corresponding to a specific piece of the resource. The hash values are used to verify the correct reception of a piece. When rejoining a swarm, the client must recalculate the hash for each downloaded piece. This is a very intensive operation with regard to both CPU usage and disk I/O, which has led certain alternative BitTorrent clients to store information about which pieces have been successfully downloaded in a specific field of the torrent file. A separate SHA-1 hash value, the info hash, identifies the swarm and appears in both the tracker and peer protocols. It is not stored in the torrent file; it is computed by hashing the bencoded info part of the meta-data, which holds the resource information. Consequently, extra fields that a third-party client adds outside this part and that may change intermittently (such as resume data or cached peer addresses) do not affect the info hash. The meta-data as defined by the original BitTorrent design does not contain any information about the peers participating in a swarm, though some alternative clients add this information to lessen the strain on trackers when rejoining a swarm. This feature also allows a peer to continue the download in case of tracker failure.
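Piece verification, and the re-check a client performs when rejoining a swarm, can be sketched with Python's hashlib. The sketch assumes, as in the BitTorrent specification, that the piece hashes are stored back-to-back in one byte string (the pieces field), 20 bytes per piece; the helper names are hypothetical.

```python
import hashlib
from typing import Dict, List, Set

DIGEST_SIZE = 20  # length of a SHA-1 digest in bytes


def split_piece_hashes(pieces: bytes) -> List[bytes]:
    """Hypothetical helper: split concatenated hashes into one digest per piece."""
    if len(pieces) % DIGEST_SIZE:
        raise ValueError("piece hash string must be a multiple of 20 bytes")
    return [pieces[i:i + DIGEST_SIZE] for i in range(0, len(pieces), DIGEST_SIZE)]


def verify_piece(index: int, data: bytes, piece_hashes: List[bytes]) -> bool:
    """True if the received piece matches its hash in the meta-data."""
    return hashlib.sha1(data).digest() == piece_hashes[index]


def recheck(stored: Dict[int, bytes], piece_hashes: List[bytes]) -> Set[int]:
    """Re-hash every stored piece when rejoining; returns the indices that are valid."""
    return {i for i, data in stored.items() if verify_piece(i, data, piece_hashes)}
```

The recheck function reads and hashes every stored piece, which is why the text describes rejoining a swarm as CPU- and disk-intensive.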

¹ These are also known as message digests.


3.3 Network Entities and Protocols

A BitTorrent swarm is composed of peers and at least one tracker. The peers are responsible for distributing content among themselves. Peers locate other peers by communicating with the tracker, which keeps peer lists for each swarm. A swarm may continue to function even after the loss of the tracker, but no new peers are able to join. To be functional, the swarm initially needs at least one connected peer that has the entire content. Such peers are called seeds, while peers that do not have the entire content, i.e., downloading peers, are called leechers. Apart from the meta-data distribution, the BitTorrent protocols are the tracker protocol and the peer protocol. The tracker protocol is either an HTTP-based protocol or a UDP-based compact protocol, while the peer protocol is a BitTorrent-specific binary protocol. Peer-to-tracker communication usually takes place over HTTP, with peers issuing HTTP GET requests and the tracker returning the results of the query in the HTTP response. The purpose of the peer’s request to the tracker is to locate other peers in the distribution swarm and to allow the tracker to record simple statistics of the swarm. The peer sends a request containing information about itself and some basic statistics to the tracker, which responds with a randomly selected subset of all peers engaged in the swarm.
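The tracker's role of keeping a peer list per swarm and answering with a random subset can be sketched as a small in-memory registry. This is a hypothetical illustration: the class name, the max_peers parameter and its default of 50 are assumptions, not part of the tracker protocol as described here.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SwarmRegistry:
    """Hypothetical in-memory tracker state: info hash -> set of (ip, port) peers."""
    swarms: dict = field(default_factory=dict)
    rng: random.Random = field(default_factory=random.Random)

    def announce(self, info_hash: bytes, peer: tuple, max_peers: int = 50) -> list:
        """Register the peer and return a random subset of the other peers in its swarm."""
        members = self.swarms.setdefault(info_hash, set())
        others = sorted(p for p in members if p != peer)  # sorted so a seeded rng is reproducible
        members.add(peer)
        return self.rng.sample(others, min(max_peers, len(others)))

    def stop(self, info_hash: bytes, peer: tuple) -> None:
        """Forget a peer that leaves the swarm."""
        self.swarms.get(info_hash, set()).discard(peer)
```

Because the tracker only stores addresses and returns a sample, the peers themselves carry all content traffic; the tracker is needed only for peer discovery.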

3.3.1 The Peer Protocol

The peer protocol, also known as the peer wire protocol, operates over TCP and uses in-band signalling. Signalling and data transfer take the form of a continuous bi-directional stream of length-prefixed protocol messages over a common TCP byte stream. A BitTorrent session is equivalent to a TCP session, and there are no protocol entities for tearing down a BitTorrent session beyond the TCP teardown itself. Connections between peers are single TCP sessions, carrying both data and signalling traffic. Once a TCP connection between two peers is established, the initiating peer (Peer A in Figure 3.1) sends a handshake message containing its peer id and the info hash. If the receiving peer (Peer B) replies with the corresponding information, the BitTorrent session is considered open and the peers start exchanging messages across the TCP stream. Otherwise, the TCP connection is closed. Immediately following the handshake procedure, each peer sends information about the pieces of the resource it possesses. This is done only once, and only in the first message after the handshake. The information is sent in a bitfield message, consisting of a stream of bits, with each bit index corresponding to a piece index.

Figure 3.1: BitTorrent handshake procedure. Peer A and Peer B exchange the info hash and their peer ids, followed by a bitfield exchange and then the ordinary message exchange.
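The handshake and bitfield described above can be sketched as follows. The byte layout is taken from the BitTorrent specification rather than from this text: a length byte, the string BitTorrent protocol, eight reserved bytes, the 20-byte info hash and the 20-byte peer id (68 bytes in total), and a bitfield in which the most significant bit of the first byte stands for piece 0. The function names are hypothetical.

```python
PSTR = b"BitTorrent protocol"  # 19-byte protocol identifier


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Hypothetical helper: pstrlen, pstr, 8 reserved bytes, info hash, peer id."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def accept_handshake(msg: bytes, expected_info_hash: bytes) -> bytes:
    """Return the remote peer id, or raise if the TCP connection should be closed."""
    if len(msg) != 68 or msg[0] != len(PSTR) or msg[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    if msg[28:48] != expected_info_hash:
        raise ValueError("info hash mismatch")
    return msg[48:68]


def encode_bitfield(have: set, num_pieces: int) -> bytes:
    """Set bit i when piece i is held; spare bits in the last byte stay zero."""
    field = bytearray((num_pieces + 7) // 8)
    for index in have:
        field[index // 8] |= 0x80 >> (index % 8)
    return bytes(field)


def has_piece(bitfield: bytes, index: int) -> bool:
    return bool(bitfield[index // 8] & (0x80 >> (index % 8)))
```

A peer holding pieces 0 and 9 of a ten-piece resource would thus send the two bytes 0x80 0x40 as its bitfield.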

The BitTorrent peer protocol messages are described in Section A.2 of the Appendix.
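The length-prefixed message stream mentioned at the start of this section can be illustrated with a small framing sketch. The 4-byte big-endian length prefix, the one-byte message ids and the zero-length keep-alive are assumptions taken from the BitTorrent specification; the function names are hypothetical.

```python
import struct

# Message ids as assigned in the BitTorrent specification (an assumption here).
MESSAGE_IDS = {"choke": 0, "unchoke": 1, "interested": 2, "not interested": 3,
               "have": 4, "bitfield": 5, "request": 6, "piece": 7, "cancel": 8}
MESSAGE_NAMES = {v: k for k, v in MESSAGE_IDS.items()}


def frame(name: str, payload: bytes = b"") -> bytes:
    """Hypothetical helper: prefix message id and payload with a 4-byte length."""
    body = bytes([MESSAGE_IDS[name]]) + payload
    return struct.pack(">I", len(body)) + body


def split_messages(buffer: bytes):
    """Split buffered TCP data into (name, payload) pairs plus any incomplete tail."""
    messages, offset = [], 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # message not fully received yet; wait for more TCP data
        body = buffer[offset + 4:offset + 4 + length]
        offset += 4 + length
        if length == 0:
            messages.append(("keep-alive", b""))  # zero-length message
        else:
            messages.append((MESSAGE_NAMES.get(body[0], "unknown"), body[1:]))
    return messages, buffer[offset:]
```

Since TCP delivers a byte stream rather than messages, a receiver must keep the returned tail and prepend it to the next chunk of data it reads.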

3.3.2 The Tracker Protocol

The tracker is accessed by HTTP or HTTPS GET requests. The default listening port is 6969. The tracker address, port and top-level directory are specified in the announce URL field of the torrent file for a specific swarm.

Tracker Queries

Tracker queries are encoded as part of the GET URL, in which binary data such as the info_hash and peer_id fields are escaped as described in RFC1738 [14]. The query is appended to the base URL after a question mark, ?, as described in RFC2396 [13]. The query itself is a sequence of parameter=value pairs, separated by ampersands, &, and possibly escaped. An example of a tracker request is given in Figure 3.2. The \-characters indicate that the line continues on the following line.

GET /announce?info_hash=n%05hV%A9%BA%20%FC%29%12%1Ap%D4%12%5D%E6U%0A%85%E1&\
peer_id=M3-4-2--d0241ecc3a07&port=6881&key=0fcca260&uploaded=0&downloaded=0&\
left=663459840&compact=1&event=started HTTP/1.0

Figure 3.2: Example tracker announce GET request

A complete list of parameters is given in Section A.3 of the Appendix.
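The escaping in Figure 3.2 can be reproduced with Python's urllib.parse, as a sketch. The tracker host name below is a placeholder, and build_announce_url is a hypothetical helper; the parameter values are copied from Figure 3.2. Passing quote as quote_via matters, because the default quote_plus would encode the space byte 0x20 as + instead of %20.

```python
from urllib.parse import quote, unquote_to_bytes, urlencode


def build_announce_url(announce_url: str, params: dict) -> str:
    """Hypothetical helper: append percent-escaped query parameters to the announce URL."""
    separator = "&" if "?" in announce_url else "?"  # keep any query already in the URL
    return announce_url + separator + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    info_hash = unquote_to_bytes("n%05hV%A9%BA%20%FC%29%12%1Ap%D4%12%5D%E6U%0A%85%E1")
    params = {"info_hash": info_hash, "peer_id": b"M3-4-2--d0241ecc3a07", "port": 6881,
              "key": "0fcca260", "uploaded": 0, "downloaded": 0, "left": 663459840,
              "compact": 1, "event": "started"}
    print(build_announce_url("http://tracker.example.com:6969/announce", params))
```

Bytes values such as the 20-byte info hash are escaped byte by byte, leaving letters, digits and _.-~ unescaped, which gives exactly the query string of Figure 3.2.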

Tracker Replies

The tracker HTTP response is a bencoded dictionary, whose contents are listed in Section A.3.3 of the Appendix. If the compact parameter is set to 1, the peer list in the reply is given as a binary string of peer addresses and ports. This string encodes each peer as a six-byte datum, in which the first four bytes are the IP address of the peer and the last two bytes are the peer’s listening port (Figure 3.3). This saves bandwidth, but is only usable in an IPv4 environment, as there is no equivalent compact format for IPv6. If the request fails for some reason, the dictionary contains only a single key, failure reason, indicating the reason for the failed request.

Figure 3.3: Compact tracker response. Each peer entry occupies 48 bits: the IPv4 address in bits 0 to 31 and the port in bits 32 to 47; the entries for Peer1, Peer2, ..., Peern follow one another.
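Decoding the compact format of Figure 3.3 takes only a few lines. This sketch assumes that the port is stored in network (big-endian) byte order, as in the BitTorrent specification; parse_compact_peers is a hypothetical name.

```python
import ipaddress
import struct


def parse_compact_peers(data: bytes) -> list:
    """Hypothetical helper: decode 6-byte entries (IPv4 address + port) into (ip, port)."""
    if len(data) % 6:
        raise ValueError("compact peer data must be a multiple of 6 bytes")
    return [(str(ipaddress.IPv4Address(ip)), port)
            for ip, port in struct.iter_unpack("!4sH", data)]
```

The fixed six-byte layout is why the format cannot carry IPv6 addresses, which need 16 bytes.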

Tracker UDP Protocol Extension

To lower the bandwidth usage of heavily loaded trackers, a UDP-based tracker protocol has been proposed [77]. The UDP tracker protocol is not part of the official BitTorrent specification, but has been implemented in some third-party clients and trackers. Compared to the standard HTTP-based protocol, the UDP protocol uses about 50 % less bandwidth. It also has the advantage of being stateless, as opposed to the stateful TCP connections required by the HTTP scheme. This means that a tracker is less likely to run out of resources due to, for instance, half-open TCP connections.

The Scrape Convention

BitTorrent trackers commonly include simple HTTP servers to provide information on the swarms they track. Web scraping denotes the procedure of parsing a Web page to extract information from it. It is a fall-back method of obtaining information when other methods fail or are not available. The BitTorrent variant is a bit different, as it is a way for peers to obtain information on a specific swarm without actually joining it. Trackers can implement functionality that allows peers to request information about a specific swarm without resorting to error-prone Web-scraping techniques. If the last name in the announce URL, i.e., the name after the last /-character, is announce, then the tracker supports scraping by using the announce URL with the name announce replaced by scrape. The scrape request may contain an info_hash parameter, as shown in Figure 3.4, or be completely without parameters.

GET /scrape?info_hash=n%05hV%A9%BA%20%FC)%12%1Ap%D4%12%5D%E6U%0A%85%E1 HTTP/1.0

Figure 3.4: Example tracker scrape GET request

The tracker responds with a bencoded dictionary containing information about the requested swarm or, if no info_hash parameter is given, about all the swarms that the tracker is currently tracking. The dictionary has a single key, named files. This key contains another dictionary whose keys are the 20-byte binary info_hash values of the torrents on the specific tracker. The value of each of these keys is another dictionary with information about the specific swarm. The contents of this dictionary are given in Section A.4 of the Appendix.
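The scrape convention described above amounts to a simple URL rewrite, sketched here with urllib.parse. The function name scrape_url is hypothetical, and the sketch follows the text literally: it only accepts a last path component that is exactly announce, and it keeps any query string already in the announce URL.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def scrape_url(announce_url: str) -> Optional[str]:
    """Hypothetical helper: derive the scrape URL, or None if the convention does not apply."""
    parts = urlsplit(announce_url)
    head, sep, last = parts.path.rpartition("/")
    if not sep or last != "announce":
        return None  # scrape support cannot be inferred from this URL
    return urlunsplit(parts._replace(path=head + "/scrape"))
```

Returning None rather than guessing reflects the text: a tracker whose announce URL does not end in announce gives no indication that it supports scraping.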

3.4 Peer States

A peer maintains two states for each peer relationship. These states are known as the interested and choked states. The interested state is imposed by the requesting peer on the serving peer, while for the choked state the opposite is true. If a peer is being choked, it will not be sent any data by the serving peer until unchoking occurs. Thus, unchoking is usually equivalent to uploading. The interested state indicates whether other peers have parts of the sought content. Interest should be expressed explicitly, as should lack of interest. That means that a peer wishing to download notifies the sending peer (which holds the sought data) by sending an interested message, and as soon as the peer no longer needs any other data, a not interested message is issued. Similarly, for a peer to be allowed to download, it must have received an unchoke message from the sending peer. Once a peer receives a choke message, it is no longer allowed to download. This allows the sending peer to keep track of the peers that are likely to start downloading immediately when unchoked. A new connection starts out choked and not interested, and a peer with all data, i.e., a seed, is never interested.

In addition to the two states described above, some clients add a third state, the snubbed state. A peer relationship enters this state when a peer that is expected to send a specific requested sub-piece fails to do so before a timeout occurs (typically 60 seconds). The local peer then considers itself snubbed by the non-cooperating peer, and will not consider sub-pieces requested from this peer to be requested at all. The snubbed state is reconsidered from time to time.
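The per-relationship states can be sketched as four flags plus a snub timer. The flag names below follow common usage in client implementations but are assumptions here, as is the PeerLink class itself; the 60-second timeout is the typical value given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerLink:
    """Hypothetical state of one peer relationship, seen from the local peer."""
    am_choking: bool = True        # local peer chokes the remote peer (initial state)
    am_interested: bool = False    # local peer wants data from the remote peer
    peer_choking: bool = True      # remote peer chokes the local peer (initial state)
    peer_interested: bool = False  # remote peer wants data from the local peer
    oldest_request: Optional[float] = None  # time of the oldest unanswered request
    snub_timeout: float = 60.0     # seconds, the typical value mentioned in the text

    def on_message(self, name: str) -> None:
        """Update the remote peer's states from a received message."""
        if name in ("choke", "unchoke"):
            self.peer_choking = name == "choke"
        elif name in ("interested", "not interested"):
            self.peer_interested = name == "interested"

    def can_download(self) -> bool:
        return self.am_interested and not self.peer_choking

    def may_upload(self) -> bool:
        return self.peer_interested and not self.am_choking

    def is_snubbed(self, now: float) -> bool:
        return (self.oldest_request is not None
                and now - self.oldest_request > self.snub_timeout)
```

Data flows in a given direction only when the receiving side is interested and the sending side is not choking, which is why both states must be tracked separately for each direction.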

3.5 Sharing Fairness and Bootstrapping

The choke/unchoke and interested/not interested mechanisms provide fairness in the BitTorrent protocol. Since it is the transmitting peer that decides whether to allow a download, peers that do not share content tend to be treated the same way in return. To allow peers that have no content to join the swarm and start sharing, a mechanism called optimistic unchoking is employed. Optimistic unchoking means that, from time to time, a peer with content will allow even a non-sharing peer to download. This allows that peer to share the small portions of data received so far and thus enter into data exchange with other peers. Sharing resources is therefore not strictly enforced, but strongly encouraged. It also means that peers that have not been able to configure their firewalls and/or Network Address Translation (NAT) routers properly will only be able to download the pieces altruistically shared by peers through the optimistic unchoking scheme.
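A leecher's unchoking decision can be sketched as "reciprocate with the best senders, plus one optimistic unchoke". The three regular slots, the ranking by the rate at which each peer sends data, and the uniform random choice are assumptions loosely modelled on common client behaviour, not details given in this text; choose_unchoked is a hypothetical name.

```python
import random
from typing import Dict, Optional, Set


def choose_unchoked(rates: Dict[str, float], interested: Set[str],
                    regular_slots: int = 3,
                    rng: Optional[random.Random] = None) -> Set[str]:
    """Hypothetical helper: unchoke the fastest interested peers plus one optimistic unchoke."""
    rng = rng or random.Random()
    # Tit-for-tat: rank interested peers by how fast they send us data (ties broken by name).
    ranked = sorted(interested, key=lambda p: (-rates.get(p, 0.0), p))
    unchoked = set(ranked[:regular_slots])
    # Optimistic unchoke: one remaining interested peer gets a chance, even if it sends nothing.
    remaining = ranked[regular_slots:]
    if remaining:
        unchoked.add(rng.choice(remaining))
    return unchoked
```

A real client would re-run such a decision periodically and rotate the optimistic slot more slowly, so that a newly unchoked peer has time to receive and then share a piece.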


3.6 Data Transfer

Data transfer is performed one part of a piece (called a subpiece, block or chunk) at a time, by issuing a request message. Subpieces are typically 16384 or 32768 bytes in size. The subpiece size is not part of the protocol, and may be chosen at the discretion of the requesting peer. To allow TCP to increase throughput, several requests are usually sent back-to-back. Each request should result in the corresponding subpiece being transmitted. If the subpiece is not received within a certain time (typically one minute), the non-transmitting peer is snubbed, i.e., is punished by not being allowed to download, even if unchoked. Data transfer is performed by sending a piece message, which contains the requested subpiece (Figure 3.5). Once the entire piece, i.e., all its subpieces, has been received, and the SHA-1 hash of the piece has been verified against the corresponding hash value in the meta-data, a have message is sent to all connected peers. The have message allows other peers in the swarm to update their internal information on which pieces are shared by specific peers in the swarm.
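Splitting a piece into subpiece requests and pipelining them can be sketched as below. The request payload layout (piece index, offset within the piece and length as three 4-byte big-endian integers) comes from the BitTorrent specification, and the pipeline depth of five is an illustrative assumption; the function names are hypothetical.

```python
import struct
from itertools import islice


def subpiece_requests(piece_index: int, piece_length: int, block_size: int = 16384):
    """Hypothetical helper: yield (index, begin, length) triples covering one piece."""
    for begin in range(0, piece_length, block_size):
        yield piece_index, begin, min(block_size, piece_length - begin)  # last one may be short


def request_payload(index: int, begin: int, length: int) -> bytes:
    """Payload of a request message: three 4-byte big-endian integers."""
    return struct.pack(">III", index, begin, length)


def first_batch(piece_index: int, piece_length: int, pipeline_depth: int = 5) -> list:
    """Requests sent back-to-back before any reply arrives (depth is an assumption)."""
    return list(islice(subpiece_requests(piece_index, piece_length), pipeline_depth))
```

With the default 2^18-byte pieces and 16384-byte subpieces, each piece is fetched with 16 requests, and keeping several outstanding at once lets TCP keep the link busy.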

3.6.1 End-game Mode

When a peer is approaching completion of the download, it sends requests for the remaining data to all currently connected peers to finish the download quickly. This is known as the end-game mode. Once a requested subpiece is received, the peer sends cancel messages to all peers that have not yet sent the requested data. Without the end-game mode, there is a tendency for peers to download the final pieces from the same peer, which may be on a slow link [20].
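The end-game bookkeeping described above can be sketched as a map from each outstanding subpiece to the peers it was requested from. This hypothetical sketch assumes that every connected peer holds the remaining pieces; a real client would only ask peers that have advertised them.

```python
from typing import Dict, Iterable, List, Set, Tuple

Block = Tuple[int, int]  # (piece index, offset within the piece)


class EndGame:
    """Hypothetical end-game bookkeeping for the last outstanding subpieces."""

    def __init__(self, remaining: Iterable[Block], peers: Iterable[str]):
        peer_set = set(peers)
        # Every remaining subpiece is requested from every connected peer.
        self.requested_from: Dict[Block, Set[str]] = {b: set(peer_set) for b in remaining}

    def on_subpiece(self, block: Block, sender: str) -> List[str]:
        """Record an arriving subpiece; return the peers that should receive a cancel."""
        asked = self.requested_from.pop(block, set())
        return sorted(asked - {sender})

    def finished(self) -> bool:
        return not self.requested_from
```

A duplicate copy that arrives before its cancel takes effect simply finds nothing left to cancel, which is the bandwidth the end-game mode trades for a faster finish.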

Figure 3.5: BitTorrent protocol exchange. Peer A exchanges interested, unchoke, request(piece,subpiece), piece(subpiece) and have messages with Peer B and Peer C.

3.7 BitTorrent Performance Issues

Even though BitTorrent has become very popular among home users and is widely deployed in corporate environments, some issues are still being addressed for the next version of BitTorrent. The most pressing issue is the load on the central tracker authority. There are two main problems related to the tracker: peak load and redundancy. Many trackers handle more than a single swarm; the most popular trackers handle several hundred swarms simultaneously. It is not uncommon for popular swarms to contain hundreds or even thousands of peers. By default, each of these peers connects to the tracker every 30 minutes to request new peers and provide transfer statistics. An initial peer request to the tracker results in about 2–3 kilobytes of response data. If these requests are evenly spread out over time, the tracker can usually handle the load. However, if a particularly desired resource is made available, the tracker may be severely strained, as it becomes subject to a mass accumulation of connections from requesting peers, akin to a distributed denial of service attack. This is also known as the flash-crowd effect [39]. It is imperative for a swarm to have a functioning tracker if the swarm is to gain new peers, since without the tracker new peers have nowhere to obtain peer addresses from. Tracker redundancy is currently being explored, and two alternatives are being studied: backup trackers and distributing the tracking functionality in the swarm itself. An extension to the current protocol adds a field, announce-list, to the meta-data, which contains URLs of alternate trackers. No good way of distributing the tracking in the swarm has yet been found, but a network of distributed trackers has been proposed. Proposals for peers to exchange lists of their currently connected peers have also been made, but again, no consensus has been reached. Additionally, Distributed Hash Table (DHT) functionality has been implemented in third-party clients to address this problem [11]. A beta version of the reference client also supports DHT functionality. Another important problem is the initial sharing delay. If a torrent has large piece sizes, e.g., larger than 2 MB, the time before a peer has downloaded an entire piece and can start sharing it might be quite substantial. It would be preferable to be able to verify the data in the swarm at varying granularities, so that a downloading peer does not have to wait for an entire piece before it can verify, and hence share, the data. One way to do this would be to use Merkle trees [16], which allow for varying granularity. With this mechanism, a peer may start sharing after having downloaded only a small amount of data (on the same order as the subpiece sizes).
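To illustrate why Merkle trees allow finer verification granularity, the sketch below builds a binary SHA-1 hash tree over small blocks and checks a single block against the root using only its sibling hashes. Pairing the last node with itself on odd-sized levels is an assumption of this sketch; published Merkle-hash proposals for BitTorrent may pad differently. The sketch assumes at least one block, and the function names are hypothetical.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """Hypothetical helper: tree levels from block hashes (leaves) up to the root."""
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # odd count: pair the last node with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes needed to check block `index` against the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return path


def verify(block: bytes, index: int, path: List[bytes], root: bytes) -> bool:
    """Recompute the root from one block and its sibling path."""
    node = _h(block)
    for sibling in path:
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root
```

Only the root needs to be in the meta-data; a peer can verify each subpiece-sized block as soon as it arrives, given a logarithmic number of sibling hashes.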

3.7.1 Super Seeding

When a swarm is fairly new, i.e., there are few seeds in the swarm and peers have little of the shared resource, it makes sense to try to distribute the pieces of the content evenly among the downloading peers. This speeds up the dissemination of the entire content in the swarm. A normal seed announces itself as having all pieces during the initial handshaking procedure, thus leaving piece selection up to the downloading peer. Seeds have usually been in the swarm longer, which means that they are likely to have a better view of which pieces are the rarest in the swarm, and thus most suitable to be inserted first. As soon as peers start receiving the rare pieces, other peers can download them from those peers instead of from seeds. This further balances the load in the swarm and increases its performance. A seed that employs super seeding does not advertise having any pieces at all during the handshake. Instead, as peers connect to the effectively hidden seed, it sends have messages on a per-peer basis to entice specific peers to download a particular piece. This mechanism is most effective in new swarms, or when there is a high peer-to-seed ratio and the peers have little data. It is not recommended for everyday use. As certain peers might use heuristics to decide which swarms to join, a swarm containing only super seeders might be discarded, because peers cannot detect a super seeder as a seed and therefore assume that the swarm is unseeded. This decreases the overall performance of the swarm.
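The per-peer advertising of a super seeder can be sketched as "offer each peer the piece that has been offered least so far". This is a simplified, hypothetical strategy: real super-seeding implementations also watch whether an offered piece shows up at other peers before offering more, and the class and method names are illustrative.

```python
from typing import Dict, List, Optional


class SuperSeeder:
    """Hypothetical super-seeding sketch: advertise one least-offered piece per peer."""

    def __init__(self, num_pieces: int):
        self.times_offered: List[int] = [0] * num_pieces
        self.offered_to: Dict[str, int] = {}

    def _least_offered(self) -> int:
        return min(range(len(self.times_offered)),
                   key=lambda i: (self.times_offered[i], i))

    def on_connect(self, peer: str) -> int:
        """Choose the piece to advertise with a single have message (no full bitfield)."""
        piece = self._least_offered()
        self.times_offered[piece] += 1
        self.offered_to[peer] = piece
        return piece

    def on_have(self, peer: str, piece: int) -> Optional[int]:
        """When the peer reports the offered piece, offer it the next least-offered one."""
        if self.offered_to.get(peer) == piece:
            return self.on_connect(peer)
        return None
```

Spreading distinct pieces over different peers lets them trade with each other early, so the seed's upload capacity is not spent sending the same piece many times.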

3.8 Summary

This chapter has discussed the BitTorrent system in detail. It is a swarming content replication and distribution system. The BitTorrent system consists of peers and trackers. Peers are either leechers, i.e., downloading peers, or seeds, i.e., peers that have the entire content and therefore only upload. Content is described by a torrent file, using a specific binary encoding known as bencoding. Data transfer occurs among peers in a swarming fashion, i.e., peers share parts of the content among themselves. Each content part has an associated SHA-1 hash to enable verification of downloaded data. Peers that do not share data are punished by not being allowed to download. Furthermore, protocol performance issues and current developments have been discussed. Primarily, the load on the tracker has been identified as a potential bottleneck.


Chapter 4

Traffic Measurements

Science is the observation of things possible, whether present or past; prescience is the knowledge of things which may come to pass, though but slowly.
– Leonardo da Vinci

Depending upon the domain, traffic measurements may serve different purposes. For example, an Internet Service Provider (ISP) may benefit from measuring the amount of outgoing traffic to estimate pricing and the services provided. A company providing Voice over IP (VoIP) services may want to measure latencies with high accuracy to ensure a certain degree of Quality of Service (QoS), while a Web hosting provider may be more interested in metrics such as the number of requests per time unit. Manufacturers of network hardware such as routers and switches, on the other hand, use real-world measurements to test the behaviour of their hardware under realistic conditions without deploying it. Network traffic measurements are useful for four main purposes: network troubleshooting, protocol debugging, workload characterisation and performance evaluation [82]. For the present work, only the last two are considered, with a strong emphasis on workload characterisation.


4.1 Measurement Approaches

There are two main approaches to traffic measurements: active and passive measurements. Active measurements entail actively probing a network, either with artificially generated traffic or by having a node join the network as an active participant. Probing with artificial traffic is somewhat analogous to system identification using impulses in, for instance, vibration experiments or acoustic environments. A passive measurement is one where the network is silently monitored without any intrusion.

4.1.1 Passive Measurements

Passive measurements are commonly used when data on “real” networks is desired, for instance for use in trace-driven simulations, model validation or bottleneck identification. Essentially, this technique is used to observe a live network without interacting with it. Depending on the level of accuracy desired, different measurement options are available. For coarse-grained measurements, on time-scales of the order of seconds, the Simple Network Management Protocol (SNMP) and Remote Monitoring (RMON) can be used to gather information from networking hardware. This provides only very coarse information granularity and no packet inspection capabilities. It is usually used as part of normal network operations, and is not very useful for protocol evaluations and per-flow performance evaluation. Per-flow information is available, for instance, in Cisco’s NetFlow, but still without packet inspection.

Application Logging

Application logging is commonly used in server software to enable traceability of errors and client requests. In certain server applications, such as critical business systems and other high-security systems, server logs are very important for detecting intrusion attempts and for estimating the severity of security breaches.

In other applications, logs are a useful tool for performance analysis. However, client applications do not usually provide much in terms of logging. If logging is available, it usually provides rather coarse-grained information, such as application start and other very high-level application events. It is unusual for an application to provide the amount of log detail needed to analyse its network performance. To provide adequate detail in application logs, it is necessary to modify the application so that it both provides the detailed event information needed and stores this information in a log file or database. In applications that are based on an event loop with a central managing component, obtaining the relevant information is a fairly easy task, as the events being handled contain all information relevant to the specific event. By adding a timestamp, these may then be ejected to a log file or database. On the other hand, in a threaded and less centralised application, this becomes a more difficult task, as events may not be handled through a single component. An additional issue with client-side logging is deployment of the modified clients. It is important to have a large enough number of users to provide representative data, and not all users may agree to run a modified client. One of the most difficult problems relates to the non-availability of client source code. For example, most proprietary software is distributed without source code, making modification impossible without substantial reverse engineering. Log storage may become an issue if, for instance, the application is running on an embedded system where no storage is available except for internal memory. Also, if measurements are performed over a long period of time and/or there is a large number of events, the application logs may grow prohibitively large.


Packet Capture

Application logs rarely provide network-specific information, such as IP packet arrival times. Dedicated packet capture or packet monitoring units can capture every packet on a physical link. The packets may then be associated with application-level events and messages, resulting in higher granularity of the captured messages. Packet capture may be performed using only software or by employing specialised hardware. Software configurations provide measurement accuracies of the order of tens of microseconds, while dedicated measurement hardware provides nanosecond accuracy. Arguably the most commonly used passive software tool for capturing live network traffic is tcpdump [40]. tcpdump is based on the pcap library, which includes the Berkeley Packet Filter (BPF) [51]. This allows tools using the library to set up complex filter rules specifying which packets to capture. The library is the basis for many other measurement and general network tools. Two important issues regarding passive measurements are related to storage and computing power requirements [44]. In the case of complete flow reconstruction, the entire data payload of the captured packets must be retained. For each captured packet, there is an additional capture header containing a timestamp and other metadata. To capture large flows such as P2P downloads, the storage requirements are correspondingly large. In the case of traffic measurements on backbone networks, the amount of off-line storage needed is prohibitively large. For instance, large BitTorrent trackers measure the amount of data downloaded among their peers in petabytes, and the number of messages to a single peer often runs into the millions. Computing statistical measures on this amount of data is a challenging, if not daunting, task, and it is often necessary to create specialised software to calculate the statistics of interest.
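To make the per-packet capture header concrete, the sketch below reads records from a file in the classic libpcap format, where a 24-byte global header is followed by a 16-byte record header (seconds, microseconds, captured length, original length) in front of each stored packet. It handles only the classic microsecond-resolution format (not pcapng or the nanosecond variant), and the function names are illustrative.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple

GLOBAL_HEADER_LEN = 24  # classic libpcap file header
RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len (4 bytes each)

def iter_pcap_records(f: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp, original_length, captured_bytes) for each record."""
    header = f.read(GLOBAL_HEADER_LEN)
    if len(header) < GLOBAL_HEADER_LEN:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order used by the capturing host.
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    while True:
        rec = f.read(RECORD_HEADER_LEN)
        if len(rec) < RECORD_HEADER_LEN:
            return  # end of file (a partial trailing header is ignored)
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        yield ts_sec + ts_usec / 1e6, orig_len, f.read(incl_len)

def read_pcap_bytes(raw: bytes) -> List[Tuple[float, int, bytes]]:
    """Convenience wrapper for a capture already held in memory."""
    return list(iter_pcap_records(io.BytesIO(raw)))

def capture_file_size(packet_count: int, bytes_per_packet: int) -> int:
    """Bytes needed to store full captures, including per-packet headers."""
    return GLOBAL_HEADER_LEN + packet_count * (RECORD_HEADER_LEN + bytes_per_packet)
```

For example, one million full 1514-byte Ethernet frames need capture_file_size(10**6, 1514) = 1530000024 bytes, of which the capture headers account for roughly 1 %.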
Other important issues regarding capturing live traffic relate to privacy, deployment and cost. Since a capturing unit has the potential of capturing all traffic, including user data, security and privacy implications must be considered. This may create problems when choosing suitable measurement locations, as the network owner may not allow full packet captures to be collected. Even if the network owner allows the recording of full packets, data that allows for user identification, such as IP addresses, must be scrambled if the packet traces are to be made public. Finding a suitable location for the measurement unit may also prove challenging. Ideally, the measurements should be made at a location that provides data representative of the metrics to be studied. For certain types of measurements, it is not possible to use only software running on off-the-shelf hardware. For example, high-accuracy measurements on high-speed links such as optical links require both special hardware to split the physical line, known as wire taps, and special hardware to capture the actual packets. This hardware is expensive, and large-scale deployment of such units may not be economically feasible.
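One simple way to scramble addresses, sketched here as an assumption rather than the procedure used in this work, is to replace each IPv4 address with a keyed hash (HMAC-SHA-256) truncated to 32 bits. The mapping is consistent within a trace, so per-peer statistics are preserved, but unlike prefix-preserving schemes it destroys subnet structure. The function name and key handling are illustrative.

```python
import hashlib
import hmac
import ipaddress

def pseudonymise_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a pseudonymous IPv4 address using HMAC-SHA-256.

    The same (addr, key) pair always yields the same output, so flows can
    still be grouped per host. The key must be kept secret and discarded
    before publication. Distinct inputs may collide in the 32-bit output.
    """
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Truncate the 256-bit digest to 32 bits and format it as an address.
    return str(ipaddress.IPv4Address(digest[:4]))
```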

4.1.2 Active Measurements

Active traffic measurements are used for assessing performance metrics and network parameters that are not readily available from passive measurements, such as topological information and end-to-end latency. This type of measurement is also useful for determining the system response to varying workloads. The workloads can be either synthetic, i.e., traffic or packets generated from some analytic model, or trace-driven. A trace-driven workload re-injects previously captured traffic into the network. This has the benefit of using real, non-synthetic traffic, thus avoiding any model discrepancies. It is, however, more time-consuming to perform and less flexible. Certain protocols, such as TCP, are more difficult to replay properly, since their closed-loop behaviour depends on feedback from the network under study rather than only on the recorded packets. Synthetic traffic generators are a more flexible way of placing load on a network than trace-driven traffic insertion. For instance, if it is desired to inject similar, but not identical, traffic from several hosts on the network under study, a slight change in model parameters would suffice, whereas a trace-driven load would require both moving large packet traces around and modifying them in real time on every traffic-generating node. A synthetic generator also has the advantage of being portable between different systems [41].

As actively probing the network entails injecting traffic or packets, there is a risk of disturbing the network being measured. Thus, the amount of injected traffic must be carefully chosen so as not to change the behaviour of the network to such a degree that the measurements are no longer relevant for the network under study. The most commonly known active measurement tools are ping and traceroute. The ping command measures the round-trip latency between two hosts by sending an ICMP echo request and measuring the time until the echo reply arrives. traceroute uses the TTL field in the IP header to estimate the IP network path to a given host. Novel techniques have been developed to circumvent problems associated with ICMP probes in NAT environments [33]. Depending on the metrics needed, other programs may be used as well. For example, simple Web clients may be used to measure server response times. For P2P networks, programs known as crawlers are occasionally used to create a topological snapshot of the P2P network. However, crawlers generate large amounts of signalling traffic, and are usually viewed as disruptive to the network.
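The parameter-driven flexibility of synthetic generators can be sketched as follows: a generator that draws exponential inter-arrival times (a Poisson arrival process) and Pareto-distributed message sizes. The choice of distributions and the parameter names are assumptions made for illustration; producing a slightly different load for another host only requires changing the rate, shape or seed.

```python
import random
from typing import List, Tuple

def synthetic_workload(rate_per_s: float, pareto_alpha: float, min_size: int,
                       duration_s: float, seed: int = 0) -> List[Tuple[float, int]]:
    """Return (send_time, size_in_bytes) pairs for one synthetic source.

    Inter-arrival times are exponential with mean 1/rate_per_s, and sizes are
    Pareto distributed with shape pareto_alpha and minimum min_size bytes.
    """
    rng = random.Random(seed)  # a per-host seed gives similar but not identical loads
    events = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        # paretovariate() has minimum 1, so scale it to the chosen minimum size.
        size = int(min_size * rng.paretovariate(pareto_alpha))
        events.append((t, size))
        t += rng.expovariate(rate_per_s)
    return events
```

For example, synthetic_workload(10.0, 1.5, 500, 60.0, seed=host_id) would give each host its own stream of roughly 600 messages over one minute.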

4.2 Application Level Traffic Analysis

Conventional measurement methodologies, such as passive network monitors using either software or specialised hardware, are adequate for diagnosing most directly network-related problems. There are, however, a few issues that are difficult, if not impossible, to solve without resorting to application layer information. For instance, if the network traffic is observed at the network or link layer, the information gained regarding the higher-layer protocols is filtered through the mechanisms of the IP stack. Only second-hand information about the application protocols is obtained in this case, which makes protocol debugging more difficult.

Lower-layer captures also cannot directly observe application state. The internal application states have to be inferred from the protocol messages that are sent. Implicit states, such as the BitTorrent snubbed state, have to be inferred heuristically, as there is no explicit message denoting the state. This adds complexity and processing time to the parsing software. Additionally, application logging allows only the specific messages and states of interest to be recorded, whereas packet capture requires full payload capture and decoding before uninteresting messages can be discarded. On the other hand, analysing traffic at the application layer has drawbacks as well. As mentioned above, incoming protocol messages are filtered through the IP stack before reaching the application. This means that application-level timestamps are affected by buffering, and thus queueing, in the kernel. Timestamps are also affected by the current system load, primarily I/O load. Thus, inaccuracies may occur if the system is heavily loaded with network traffic and logging to disk.
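As an example of such heuristic inference, the sketch below flags a remote peer as snubbing the local client when it has delivered piece messages before but none within a timeout. The 60 s timeout is an arbitrary assumed value, not one taken from the measurements or the client, and the (timestamp, peer) event layout is likewise an assumption.

```python
from typing import Dict, Iterable, Set, Tuple

SNUB_TIMEOUT_S = 60.0  # assumed threshold for this sketch

def snubbed_peers(piece_events: Iterable[Tuple[float, str]], now: float,
                  timeout_s: float = SNUB_TIMEOUT_S) -> Set[str]:
    """Infer which peers appear to be snubbing us at time `now`.

    piece_events: (timestamp, peer_id) for every piece message received,
    e.g. taken from a parsed trace. A peer counts as snubbing if it sent
    pieces earlier but none within the last `timeout_s` seconds.
    """
    last_piece: Dict[str, float] = {}
    for ts, peer in piece_events:
        if ts <= now:  # ignore events after the evaluation time
            last_piece[peer] = max(ts, last_piece.get(peer, ts))
    return {peer for peer, ts in last_piece.items() if now - ts > timeout_s}
```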

4.2.1 TCP/IP Stack Performance

By combining application layer logging with application stream reassembly at the link and/or IP layers, it is possible to evaluate the performance of the IP stack of the host on which the logging takes place¹. This makes it possible to assess the amount of buffering taking place in the stack as well as performance issues associated with specific network loads. For instance, P2P applications often use a large number of TCP connections, which might not only compete for the available network bandwidth but also cause contention in the host computer’s IP stack. Problems in the stack might then be misdiagnosed as network or protocol issues when the actual cause lies elsewhere.

¹ Given that both logging and tracing are performed on the same host.
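A minimal sketch of this comparison, assuming that both the reassembled packet trace and the application log are written in the common log format of Section 4.4.3 and that messages appear in the same order in both: pairing the n-th occurrence of each identical message in the two logs gives a per-message delay between arrival on the wire and delivery to the application. The order-based matching rule and the function name are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

def stack_delays(wire_log: List[str], app_log: List[str]) -> List[float]:
    """Per-message delay (s) from wire timestamp to application timestamp.

    Each line is 'timestamp type [fields...]'. Messages are matched on their
    type and fields, pairing occurrences in order of appearance.
    """
    pending: Dict[Tuple[str, ...], Deque[float]] = defaultdict(deque)
    for line in wire_log:
        ts, *key = line.split()
        pending[tuple(key)].append(float(ts))
    delays = []
    for line in app_log:
        ts, *key = line.split()
        waiting = pending.get(tuple(key))
        if waiting:  # skip application events without a matching wire message
            delays.append(float(ts) - waiting.popleft())
    return delays
```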



4.3 Measurement Infrastructure

A dedicated P2P measurement infrastructure has been developed at BIT to perform measurements on P2P networks (Figure 4.1) [30]. It is composed of standard personal computers participating in various P2P networks (currently BitTorrent and Gnutella) running both instrumented and non-instrumented clients. Traffic collection and decoding are based on tcpdump [40] for packet capture and tcptrace [58] for protocol decoding. The measurement nodes run the Gentoo Linux 1.4 operating system, with kernel version 2.6.5. Each node is equipped with an Intel Celeron 2.4 GHz processor, 1 GB RAM, a 120 GB hard drive, and a 10/100 Fast Ethernet network interface. As shown in Figure 4.1, the network interface is connected to a 100 Mbit switch in the lab at the Telecommunications department, which is in turn connected through a router to the GigaSUNET backbone.

Figure 4.1: BIT measurement setup (a BitTorrent node and a Gnutella node connected through a 10/100 Mbit switch to the BIT router and the Internet)

4.4 Measurement Software

The software of the measurement infrastructure consists of two major components [30]. The first component is a generic TCP reassembly and post-processing framework, which is used to parse and analyse link layer traces captured by tcpdump. The second component is a common logging format and a set of analysis programs written in a variety of languages, such as perl, python, awk, R and Octave. Additionally, application logging is performed, with logs adhering to the same common logging format as the parsed packet traces. A high-level abstraction of the measurement process is presented in Figure 4.2.

Figure 4.2: Measurement procedures (data collection with tcpdump, TCP reassembly, application message flow reassembly, log parsing, log data reduction, and post-processing and analysis)

4.4.1 Generic TCP Reassembly Framework

The TCP reassembly framework is essentially a three-stage capture and reassembly engine. The first stage is the packet capture stage, and the two following stages are TCP and application flow reassembly. The last two stages operate in parallel, but are logically separate. Packet capture traces are collected using tcpdump, version 3.8.3. tcpdump is started before the measured P2P application, and filters are applied to the capture process to avoid capturing packets belonging to applications listening on well-known ports, such as those of HTTP, FTP, SSH and SMTP. During capture, the packet traces are saved to files of 600 MB to facilitate backups to recordable CDs. This also avoids problems related to file size limitations on the measurement nodes, since some common file systems, such as FAT32, cannot handle files larger than 4 GB.
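The capture set-up described above can be sketched as a small Python helper that only assembles a tcpdump command line: it builds a BPF filter excluding the well-known ports and asks tcpdump to start a new output file roughly every 600 MB. The option letters are those documented for current tcpdump releases (-i interface, -s snapshot length, -C file size in millions of bytes, -w output file); whether version 3.8.3 accepted exactly these options, the port list, and the interface and file names are assumptions.

```python
from typing import Iterable, List

# Well-known ports of services that should not be captured (assumed list).
EXCLUDED_PORTS = {"ftp-data": 20, "ftp": 21, "ssh": 22, "smtp": 25, "http": 80}

def bpf_exclude_ports(ports: Iterable[int]) -> str:
    """Build a BPF expression that drops packets to or from the given ports."""
    clauses = " or ".join(f"port {p}" for p in sorted(ports))
    return f"not ({clauses})"

def tcpdump_command(interface: str, outfile: str, ports: Iterable[int],
                    file_size_mb: int = 600) -> List[str]:
    """Return an argv list for a full-payload capture split into chunks."""
    return ["tcpdump", "-i", interface,
            "-s", "0",                 # capture whole packets, not just headers
            "-C", str(file_size_mb),   # start a new file after N million bytes
            "-w", outfile,
            bpf_exclude_ports(ports)]
```

The list can be passed to subprocess.run(); for example, tcpdump_command("eth0", "bt.pcap", EXCLUDED_PORTS.values()) ends with the filter not (port 20 or port 21 or port 22 or port 25 or port 80).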

TCP Reassembly

The TCP reassembly module is based on the TCP engine from tcptrace, and works in a fashion similar to the BSD TCP/IP stack as described in [66]. The engine is highly modular, and extensions are provided to facilitate reassembly of any type of data stream, not only TCP. Capabilities include detection and handling of out-of-order segments as well as forward and backward segment overlaps.
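The core of such reassembly can be sketched as a small buffer keyed on sequence numbers: in-order bytes are delivered immediately, out-of-order segments are held until the gap is filled, and overlapping bytes that have already been delivered are trimmed. This is an illustrative simplification, not the tcptrace engine: it ignores 32-bit sequence number wrap-around and assumes that retransmitted bytes carry identical data.

```python
from typing import Dict, List

class StreamReassembler:
    """Reassemble one direction of a TCP byte stream from segments."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq = initial_seq          # first byte not yet delivered
        self.pending: Dict[int, bytes] = {}  # out-of-order segments by seq
        self.delivered: List[bytes] = []     # in-order payload chunks

    def add_segment(self, seq: int, payload: bytes) -> bytes:
        """Insert a segment and return any newly contiguous bytes."""
        if seq + len(payload) <= self.next_seq:
            return b""                       # pure retransmission, drop it
        if seq < self.next_seq:              # backward overlap: trim the front
            payload = payload[self.next_seq - seq:]
            seq = self.next_seq
        # Keep the longer segment if two start at the same sequence number.
        if len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload
        return self._flush()

    def _flush(self) -> bytes:
        out = bytearray()
        while True:
            # Find a buffered segment starting at or before next_seq.
            start = next((s for s in sorted(self.pending) if s <= self.next_seq), None)
            if start is None:
                break
            data = self.pending.pop(start)
            fresh = data[self.next_seq - start:]  # forward overlap: drop old bytes
            out += fresh
            self.next_seq += len(fresh)
        if out:
            self.delivered.append(bytes(out))
        return bytes(out)
```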

Application Data Flow Reassembly

As soon as a new TCP segment has been read and inserted into the segment list by the TCP reassembly engine, an application-specific hook function is called. This hook is used to notify an application data reassembly module. The application decoder is responsible for parsing and decoding the data stream provided by the TCP reassembly engine. Once a message (which may span several TCP segments) is fully decoded, a log entry is ejected to the log file. Decoders have been implemented for Gnutella and BitTorrent.
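For BitTorrent, a decoder of this kind can be sketched as below. After the initial handshake, each peer wire message consists of a 4-byte big-endian length prefix, a 1-byte message identifier and a payload, and a zero length denotes a keep-alive. The sketch assumes the handshake has already been consumed, decodes the fields of only the have and request messages, and emits lines in the common log format of Section 4.4.3; the class and method names are hypothetical.

```python
import struct
from typing import List

MESSAGE_NAMES = {0: "choke", 1: "unchoke", 2: "interested", 3: "not_interested",
                 4: "have", 5: "bitfield", 6: "request", 7: "piece", 8: "cancel"}

class BitTorrentDecoder:
    """Incrementally decode peer wire messages from reassembled stream data."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, timestamp: float, data: bytes) -> List[str]:
        """Hook called with new in-order bytes; returns completed log lines."""
        self.buffer += data
        lines = []
        # A message may span several segments, so wait until it is complete.
        while len(self.buffer) >= 4:
            (length,) = struct.unpack(">I", self.buffer[:4])
            if len(self.buffer) < 4 + length:
                break
            body = bytes(self.buffer[4:4 + length])
            del self.buffer[:4 + length]
            if length == 0:
                lines.append(f"{timestamp:.6f} keepalive")
                continue
            name = MESSAGE_NAMES.get(body[0], f"unknown_{body[0]}")
            fields = ""
            if name == "have":                 # payload: piece index
                fields = " %d" % struct.unpack(">I", body[1:5])
            elif name == "request":            # payload: index, begin, length
                fields = " %d %d %d" % struct.unpack(">III", body[1:13])
            lines.append(f"{timestamp:.6f} {name}{fields}")
        return lines
```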

4.4.2 BitTorrent Application Logging

The reference BitTorrent client² is written in the python programming language [78]. Python is an interpreted and interactive language with object-oriented features that combines syntactical clarity with powerful components and system-level functionality. This makes extending software written in the language less complicated than in a compiled and syntactically more demanding language such as C or Java.

² Version 3.4.1, released on March 11, 2004.

Software Modifications

The client is written as an event-based program, reacting to incoming protocol messages and internal timers. The internal timers trigger actions such as sending tracker requests, unchoking peers and handling network timeouts. For the purpose of the present work, the incoming network message handling routines are the important part. These are mainly located in a single software component, which handles all incoming events. This component consists of a function containing the main loop (which receives the network messages), and several message-specific functions, invoked from the main loop, that handle the incoming messages. While it is possible to intercept the messages in the main loop, it is much easier to do so in the specific message handling routines. There are two major reasons for this:

• The message type is already implicitly given by the call of the function.
• Message-specific information is provided automatically, without the need to write extra parsing code. For instance, if a piece request message were intercepted in the main loop, it would have been necessary to parse the incoming message to find information such as the piece number and subpiece index.
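The interception approach can be sketched with a decorator that wraps each message-specific handler, timestamps the call and records the message type together with the handler's arguments. The handler names (got_have, got_request) and the Connection class are hypothetical stand-ins, not the actual names used in the reference client, and the in-memory list stands in for the log file.

```python
import functools
import time
from typing import Callable, List

LOG_LINES: List[str] = []  # stand-in for the (compressed) log file

def logged(message_type: str) -> Callable:
    """Decorator recording '<timestamp> <message_type> <args...>' per call."""
    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(self, *args):
            # The message type is implied by which handler is called, and the
            # decoded arguments arrive ready to use, so no parsing is needed.
            fields = " ".join(str(a) for a in args)
            LOG_LINES.append(f"{time.time():.6f} {message_type} {fields}".rstrip())
            return handler(self, *args)
        return wrapper
    return decorator

class Connection:
    """Hypothetical connection object with per-message handlers."""

    @logged("have")
    def got_have(self, index: int) -> None:
        pass  # original client logic would run here

    @logged("request")
    def got_request(self, index: int, begin: int, length: int) -> None:
        pass
```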

Before the ejected log messages are saved to disk, they are compressed using the zlib library [32]. This is beneficial with regard both to disk storage and to the amount of disk I/O performed. The CPU overhead of the compression is practically negligible on the measurement computers. Finally, extra parameters have been added to the application to allow the log file name to be changed, together with code that automatically generates a date- and time-stamped file name if none is given.
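A minimal sketch of such a compressed log writer, using Python's standard zlib module: lines are passed through a streaming compressor so that only compressed blocks reach the disk, and a date- and time-stamped file name is generated when none is supplied. The class name, file name pattern and compression level are assumptions.

```python
import time
import zlib
from typing import Optional

class CompressedLog:
    """Append text lines to a zlib-compressed log file."""

    def __init__(self, filename: Optional[str] = None, level: int = 6) -> None:
        if filename is None:
            # Illustrative pattern: btlog-YYYYmmdd-HHMMSS.log.z
            filename = time.strftime("btlog-%Y%m%d-%H%M%S.log.z")
        self.filename = filename
        self._file = open(filename, "wb")
        self._compressor = zlib.compressobj(level)

    def write(self, line: str) -> None:
        # compress() may buffer internally and return b"" until enough data arrives.
        self._file.write(self._compressor.compress((line + "\n").encode("ascii")))

    def close(self) -> None:
        self._file.write(self._compressor.flush())  # emit remaining compressed data
        self._file.close()
```

Reading a log back is a single call: zlib.decompress(open(name, "rb").read()).decode("ascii").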


4.4.3 Log Formats

Selecting a log format that provides a suitable amount of information is a tricky issue. It is important to capture enough information to make relevant statistical analysis possible, while at the same time keeping the sizes of the log files at a manageable level. This problem is most noticeable when designing a log format for application logs, as it is not possible to re-run a specific measurement if too small a subset of metrics has been chosen for logging. Packet captures are less affected by this, but are not impervious to similar effects, for instance when the capture size for the recorded packets is too small and parts of the payload data are lost. In both cases, information is irretrievably lost. Complete packet captures that contain all data transmitted on a link may be used to re-generate log files as needed. This is, however, often a very time-consuming process, and it is preferable to avoid it whenever possible.

BitTorrent XML Log Format

The eXtensible Markup Language (XML) [80] has a number of attractive features that make it a good choice as a log format. XML is by concept and design made to be easily parsable by a computer, while at the same time being at least semi-readable by humans. Some of the salient advantages of using XML as a log format are:

• Parsability: There are several XML parsing libraries available for a plethora of languages, including, but not limited to, perl, C, C++, python and Matlab. This makes writing log parsers much easier, since it is not necessary to write an application-specific parser for the log format.
• Extensibility: It is easy to add new log fields, and new log fields do not necessitate changing the parser. This is very useful when deciding what information goes into the log, as fields may be added and removed easily.
• Validation: The number and types of fields are easily verifiable, and this verification is usually performed as part of the XML validation process provided by the parsing library.

Two drawbacks of using XML as a log format are that parsing is slightly slower than with an application-specific parser, and that the memory requirements of some generic XML parsers are substantially higher. In particular, it is rarely possible to use Document Object Model (DOM) parsers to parse the log files. These parsers maintain a representation of the entire XML file in memory and, with log files in the gigabyte range, the amount of memory required is substantial. Simpler parsers, such as Simple API for XML (SAX) parsers, are therefore used. These parse the document element by element, removing the need to keep the entire document in memory. Unfortunately, this solution also means that the transformation capabilities provided by eXtensible Stylesheet Language Transformations (XSLT) cannot be used, and dedicated software making use of the available SAX parsers must be written. XML documents are text documents composed of elements and attributes. Attributes are contained within the elements, and usually carry element-specific information and modifiers. A detailed description of the BitTorrent XML log format is provided in Section B of the Appendix.
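The SAX approach can be sketched with Python's standard xml.sax module: a content handler receives one element at a time, so memory use stays constant regardless of the size of the log. The element and attribute names used here (a message element with a type attribute) are hypothetical and do not reproduce the format described in the Appendix.

```python
import xml.sax
from collections import Counter

class MessageCounter(xml.sax.ContentHandler):
    """Count messages per type while streaming through an XML log."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def startElement(self, name, attrs):
        # Called once per start tag; earlier elements are not kept in memory.
        if name == "message":  # hypothetical element name
            self.counts[attrs.get("type", "unknown")] += 1

def count_messages(xml_bytes: bytes) -> Counter:
    """Parse an in-memory XML log; xml.sax.parse() does the same for files."""
    handler = MessageCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts
```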

Common Log Formats

To facilitate the re-use of parsing software, it was decided, as part of the measurement infrastructure, that parsed traces should be written to log files that adhere to a common format. This format is defined as follows:

• Fields are separated by spaces (ASCII code 0x20, i.e., 32 decimal).
• Field definitions are:
  1. The first field is the UNIX timestamp for the event.
  2. The second field contains the message type, if any.
  3. Any following field may contain arbitrary information.

This is a simple and flexible logging scheme that allows the use of standard UNIX tools such as sed, awk and perl to parse the files without resorting to writing specialised parsers for each log file. In fact, the parsing software only assumes that the first field is a UNIX timestamp and that the rest of the fields are arbitrary. It is, however, recommended that the second field is a message type. Figure 4.3 shows an example of a log file generated from a BitTorrent application log.

    1088079499.265901 piece
    1088079499.267193 request 1311 196608
    1088079499.269075 have 593
    1088079499.282697 have 6690

Figure 4.3: Sample BitTorrent log file
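Because only the first field is fixed, a generic parser for the format is short. The sketch below reads lines in the common log format and collects, per message type, the inter-arrival times between consecutive messages of that type. The function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def interarrival_times(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Group inter-arrival times (s) by message type from common-format lines."""
    last_seen: Dict[str, float] = {}
    gaps: Dict[str, List[float]] = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(" ")  # fields are separated by spaces
        if not fields[0]:
            continue                            # skip empty lines
        ts = float(fields[0])                   # first field: UNIX timestamp
        msg_type = fields[1] if len(fields) > 1 else ""  # second field, if any
        if msg_type in last_seen:
            gaps[msg_type].append(ts - last_seen[msg_type])
        last_seen[msg_type] = ts
    return dict(gaps)
```

Applied to the four lines of Figure 4.3, only the two have messages form a pair, giving a single inter-arrival time of about 0.0136 s.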

4.5 Summary

This chapter has discussed methods and reasons for traffic measurements. Traffic measurements can be either passive or active. Passive measurements imply observing a network without interfering with the traffic, while active measurements involve actively probing the network. Additionally, the P2P measurement infrastructure used for the measurements in this thesis has been presented. It is based on passive measurements using application logs and packet capture. Specific parsing and analysis software for BitTorrent and Gnutella has been written as part of the infrastructure. The measurements performed using this infrastructure have the advantage of capturing application layer messages with link-layer accuracy.


Chapter 5

Traffic Modelling

So long, and thanks for all the fish.
– Douglas Adams

Traffic modelling is a form of workload characterisation, with the aim of providing tractable and parsimonious models for the traffic loads placed on a network by various applications. Ideally, these models should be invariant, i.e., should hold true irrespective of operating conditions. Historically, teletraffic modelling was concerned with characterising the relationship between traffic load and performance in the Public Switched Telephone Network (PSTN). Researchers such as Conny Palm created models that worked very well for modelling telephone call arrivals, number of blocked calls and busy hour load [59]. The major tool for these models was the Poisson process. The fundamental idea behind the Poisson process is that times between events (e.g., incoming telephone calls) are drawn from an exponential distribution, and that these times are independent. Furthermore, as the number of sources increases, the aggregate traffic becomes less bursty. This makes the models analytically simple and attractive.

For the remainder of the thesis, the following notation is used. A theoretical distribution function with the associated estimated parameters is denoted by $\hat{F}$, which is occasionally referred to as the estimated distribution. The Empirical Distribution Function (EDF) of a given data set is denoted by $F_n$. The associated theoretical Probability Density Function (PDF) and histogram are denoted by $\hat{f}$ and $f_n$, respectively. Also, $x_1, x_2, \ldots, x_n$ denote observations of the random variable $X$, and the total number of observations is denoted by $n$.
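For concreteness, the EDF is the fraction of observations not exceeding x, i.e., $F_n(x) = \frac{1}{n}\,\#\{i : x_i \le x\}$. A minimal Python sketch using a sorted copy of the data and binary search (the function name is illustrative):

```python
import bisect
from typing import Callable, Sequence

def empirical_distribution(observations: Sequence[float]) -> Callable[[float], float]:
    """Return F_n, the empirical distribution function of the observations."""
    data = sorted(observations)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")

    def F_n(x: float) -> float:
        # bisect_right counts the observations that are <= x.
        return bisect.bisect_right(data, x) / n

    return F_n
```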

5.1 Heavy-tailed Traffic Models

With the appearance of packet-switched networks in the late 1960s and early 1970s, the assumption of exponentiality and independence remained, as much for its analytic tractability as for the lack of any apparent reason to suspect otherwise. However, this changed when Willinger et al. showed that Ethernet traffic does not display the expected decrease in burstiness. Rather, the results showed that Ethernet traffic has self-similar qualities, with traffic “spikes” riding on longer-term “ripples” that in turn ride on still longer-term “swells” [49]. This seminal paper instigated a flurry of work on self-similarity in network traffic, and several sources of self-similar and long-range dependent traffic have been reported. Examples of these are Web and FTP file sizes [23,62]. The self-similar traffic characteristics appear over several time scales and aggregation levels, making self-similarity a traffic invariant [60]. Additionally, the control and recovery mechanisms of TCP have been shown both to be a possible source of Long-Range Dependence (LRD) and to propagate LRD tendencies [34, 79]. A complete discussion of self-similarity and LRD is not within the scope of this thesis. The current section therefore gives only a very concise summary of the fundamental concepts.
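The classical expectation that these measurements contradicted can be illustrated numerically. The sketch below superposes N independent Poisson sources, counts arrivals in fixed time bins, and returns the coefficient of variation (standard deviation divided by mean) of the counts. For Poisson traffic it falls roughly as $1/\sqrt{N}$, which is the decrease in burstiness that measured Ethernet traffic did not show. The rate, bin width and number of bins are arbitrary illustrative choices.

```python
import random
import statistics

def count_cv(num_sources: int, rate: float = 1.0, bin_s: float = 1.0,
             bins: int = 2000, seed: int = 0) -> float:
    """Coefficient of variation of per-bin arrival counts for N Poisson sources."""
    rng = random.Random(seed)
    counts = [0] * bins
    horizon = bins * bin_s
    for _ in range(num_sources):
        t = rng.expovariate(rate)        # exponential inter-arrival times
        while t < horizon:
            counts[int(t // bin_s)] += 1
            t += rng.expovariate(rate)
    return statistics.pstdev(counts) / statistics.mean(counts)
```

With the default parameters the expected values are about 1.0 for a single source and about 0.1 for one hundred sources.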

5.1.1 Self-similarity and Long-range Dependence

Self-similarity refers to invariance with respect to scale. In the case of stochastic processes, this means that process statistics such as the mean and variance do not change, after appropriate rescaling, when the time scale changes. That is, the process “looks” the same whether it is observed on a scale of 1 s or a scale of 6 s. In particular, the process exhibits the same amount of burstiness on all scales. Statistical self-similarity is usually defined as follows [74]:

A process $x(t)$ is statistically self-similar with parameter $H$ ($\tfrac{1}{2} \le H \le 1$) if, for any real $a > 0$, the process $a^{-H} x(at)$ has the same statistical properties as $x(t)$.

The parameter $H$ is called the Hurst parameter, and denotes the amount of LRD of the process. For $H = \tfrac{1}{2}$, the process has independent increments, and thus no dependence on previous values. For $H > \tfrac{1}{2}$, however, the process exhibits persistence, i.e., past trends imply that future trends will be the same. The closer $H$ is to 1, the higher the amount of LRD. The Brownian motion (BM) process is an example of a stochastic process that exhibits self-similarity, but not LRD, i.e., it has a Hurst parameter $H = \tfrac{1}{2}$. The BM process can be generalised to a fractional BM (FBM) process, which may exhibit LRD. LRD and Short-Range Dependence (SRD) are often defined in terms of the process auto-covariance. An SRD process has a summable auto-covariance, i.e., the sum of the auto-covariances over all lags converges. For an LRD process, this sum diverges, i.e., the auto-covariance is non-summable. More formally, the auto-covariance of a short-range dependent process typically decays exponentially, while that of a long-range dependent process decays hyperbolically. It is of interest to be able to estimate the amount of self-similarity in a particular data set. This may be done using either graphical or statistical methods. Two examples of graphical methods are discussed here: the Hill plot and the α scaling estimator plot.
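As a short worked illustration, take fractional Gaussian noise (the increment process of FBM) with variance $\sigma^2$ as the example of hyperbolic decay. The contrast between the two cases then follows from the behaviour of p-series and geometric series, with $\gamma(k)$ denoting the auto-covariance at lag $k$:

$$
\gamma(k) \sim H(2H-1)\,\sigma^2\,k^{2H-2}, \quad \tfrac{1}{2} < H < 1
\;\Longrightarrow\;
\sum_{k=1}^{\infty} \gamma(k) = \infty, \quad \text{since } 0 < 2 - 2H < 1,
$$

$$
|\gamma(k)| \le C\rho^{k}, \quad 0 < \rho < 1
\;\Longrightarrow\;
\sum_{k=1}^{\infty} |\gamma(k)| \le \frac{C\rho}{1-\rho} < \infty .
$$

For $H = 0.9$, for instance, $\gamma(k)$ decays only as $k^{-0.2}$, so correlations remain noticeable over very many lags.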

Hill Plots

The Hill plot uses the order statistics of X to form an estimate of the tail index α (Section 5.1.2). It is defined as

$$
H_{k,n} = \frac{1}{k} \sum_{i=0}^{k-1} \left( \log x_{(n-i)} - \log x_{(n-k)} \right) \qquad (5.1)
$$


where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ are the order statistics of the observations and k is the number of upper order statistics used in the estimate. $H_{k,n}$ estimates $1/\alpha$, so its reciprocal is plotted versus k, and a region where the plot levels off into a horizontal line indicates the estimate of α. Figure 5.1 shows an example of a Hill plot for the Exponential, Pareto, Weibull and Log-normal distributions. The parameters have been chosen so as to give the distributions quite long tails.

Figure 5.1: Pareto, Weibull, Log-normal and Exponential Hill plots (Hill α-estimate versus number of order statistics)

Often, empirical distributions show power-law behaviour only in the upper tail, while the body of the distribution appears not to be heavy-tailed. The Hill estimator suffers from the problem of choosing the correct number of order statistics to give an acceptable estimate of α, i.e., of locating the cutoff point where power-law behaviour begins. Furthermore, the Hill estimator only performs well for distributions close to a Pareto [67]. It is, for instance, difficult to assess the heavy-tail index for the Log-normal and Weibull distributions presented in Figure 5.1.
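Eq. (5.1) translates directly into code. The sketch below returns the Hill α-estimates $1/H_{k,n}$ for k = 1, ..., k_max, i.e., the values that would be drawn in a Hill plot. The pareto_sample helper generates synthetic test data and is an illustrative addition.

```python
import math
import random
from typing import List, Sequence

def hill_alpha_estimates(data: Sequence[float], k_max: int) -> List[float]:
    """Return 1/H_{k,n} for k = 1..k_max, following Eq. (5.1).

    Requires positive data with distinct upper order statistics and k_max < n.
    """
    xs = sorted(data)                      # x_(1) <= ... <= x_(n)
    n = len(xs)
    if not 0 < k_max < n:
        raise ValueError("k_max must satisfy 0 < k_max < n")
    logs = [math.log(x) for x in xs]
    estimates = []
    top_sum = 0.0                          # running sum of log x_(n-i), i < k
    for k in range(1, k_max + 1):
        top_sum += logs[n - k]             # log x_(n-k+1) sits at 0-based index n-k
        h = top_sum / k - logs[n - k - 1]  # subtract log x_(n-k)
        estimates.append(1.0 / h)
    return estimates

def pareto_sample(alpha: float, n: int, seed: int = 1) -> List[float]:
    """Synthetic Pareto(alpha) data with minimum 1, for trying the estimator."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]
```

For Pareto data with α = 1.5, the estimates settle close to 1.5 as k grows; for Log-normal or Weibull data they keep drifting, which is the difficulty visible in Figure 5.1.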

The α Scaling Estimator

The α or scaling estimator uses a property of random variables with infinite variance that is analogous to the central limit theorem for finite-variance random variables [22]. This property is the scaling property of sums of infinite-variance random variables. The estimate is formed by calculating aggregated versions of the original data at increasing levels of aggregation and comparing the log-log CCDFs of these aggregates.

Figure 5.2: Pareto, Weibull, Log-normal and Exponential α-estimator plots. Each panel shows Log10(P[X > x]) versus the log10 of the shifted size for 1 000 000 points, with curves for the raw data and its 2- to 512-aggregated versions: (a) Pareto, α estimate 0.986; (b) Weibull, α estimate 1.427; (c) Log-normal, α estimate 1.483; (d) Exponential, α estimate 2.264.

Figure 5.2 shows four examples of outputs from the scaling estimator. The points in these plots are the points used for comparing the aggregated CCDFs, i.e., the points for which the scaling property holds. The most salient advantage of the scaling method is that it takes into account both the power-law shape (i.e., a straight line in the log-log CCDF) and the regions for which the distribution is scale invariant. As Figure 5.2 shows, the scaling estimator performs well in detecting the scaling behaviour of the Log-normal and Weibull distributions as well as that of the Pareto.
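The aggregation idea can be illustrated with a deliberately simplified sketch, which is not the full estimator of [22]. For heavy-tailed data, the upper tail of sums of m observations satisfies $P[X_1 + \cdots + X_m > x] \approx m\,P[X > x]$, so at a fixed small tail probability p the quantile of the m-aggregated data is about $m^{1/\alpha}$ times the raw quantile, giving $\hat{\alpha} = \log m / \log(q_m / q_1)$. The values of m and p are arbitrary, the mean shift of the sums biases the estimate slightly, and the helper names are illustrative.

```python
import math
import random
from typing import List, Sequence

def upper_quantile(data: Sequence[float], p: float) -> float:
    """Value exceeded by approximately a fraction p of the data."""
    xs = sorted(data)
    return xs[min(len(xs) - 1, int((1.0 - p) * len(xs)))]

def aggregate(data: Sequence[float], m: int) -> List[float]:
    """Sums of non-overlapping blocks of m consecutive observations."""
    return [sum(data[i:i + m]) for i in range(0, len(data) - m + 1, m)]

def scaling_alpha(data: Sequence[float], m: int = 8, p: float = 0.002) -> float:
    """Simplified scaling estimate of alpha from raw vs m-aggregated tails."""
    ratio = upper_quantile(aggregate(data, m), p) / upper_quantile(data, p)
    return math.log(m) / math.log(ratio)

def pareto_sample(alpha: float, n: int, seed: int = 1) -> List[float]:
    """Synthetic Pareto(alpha) data with minimum 1, for trying the estimator."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]
```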

5.1.2 Heavy-tailed Distributions

Formally, a heavy-tailed distribution is a distribution whose Complementary Cumulative Distribution Function (CCDF) decays as a power law, i.e.,

$$
P[X \ge x] \sim c\,x^{-\alpha}, \quad x \to \infty, \quad 0 < \alpha < 2 \qquad (5.2)
$$

where c is a positive constant, α is the tail index, and ∼ denotes asymptotic equivalence, i.e.,

$$
\lim_{x \to \infty} \frac{P[X \ge x]}{c\,x^{-\alpha}} = 1 .
$$

Heavy-tailed distributions have infinite variance and, for α ≤ 1, also infinite means. The α parameter is related to the Hurst parameter by $H = (3 - \alpha)/2$. In network modelling, the range 1 < α < 2, corresponding to $\tfrac{1}{2} < H < 1$, is of primary interest [60], that is, processes with infinite variance but finite mean. When the CCDF is plotted on a log-log scale, a heavy-tailed distribution appears as a straight line with slope −α; this is also known as power-law behaviour. The typical example of such a distribution is the Pareto distribution (Eq. (5.3)). A Pareto distribution with α = 1, i.e., with slope −1, is shown in Figure 5.3.

A larger class of distributions are the long-tailed distributions, also known as sub-exponential distributions. A long-tailed distribution is one whose tail decays more slowly than that of an exponential distribution. Examples of this class are the Weibull (Eq. (5.5)) and the Log-normal (Eq. (5.7)) distributions, as shown in Figure 5.3. The parameters for the distributions are the same as in Figure 5.1. These distributions have finite means. For simplicity, the rest of the thesis denotes both long- and heavy-tailed distributions as heavy-tailed.

Figure 5.3: Pareto, Weibull, Log-normal and Exponential CCDF (log P[X ≥ x] versus log x)

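The Pareto Distribution
The density and distribution functions for the Pareto distribution are

$$ f(x) = \begin{cases} \dfrac{\alpha k^{\alpha}}{x^{\alpha+1}} & x \ge k;\ k, \alpha > 0 \\ 0 & \text{otherwise} \end{cases} \tag{5.3} $$

and

$$ F(x) = \begin{cases} 1 - \left(\dfrac{k}{x}\right)^{\alpha} & x \ge k;\ k, \alpha > 0 \\ 0 & \text{otherwise} \end{cases} \tag{5.4} $$

where k is the smallest value the distribution may take.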

The Weibull Distribution
The density and distribution functions for the Weibull distribution are

$$ f(x) = \begin{cases} \alpha \beta^{-\alpha} x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & x > 0;\ \alpha, \beta > 0 \\ 0 & \text{otherwise} \end{cases} \tag{5.5} $$

and

$$ F(x) = \begin{cases} 1 - e^{-(x/\beta)^{\alpha}} & x > 0;\ \alpha, \beta > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{5.6} $$

For α = 1, the Weibull distribution reduces to the exponential distribution with mean β.

The Log-normal Distribution
Like the Gaussian distribution, the Log-normal distribution lacks a closed-form expression for its distribution function. The density function is given by

$$ f(x) = \begin{cases} \dfrac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^{2}/(2\sigma^{2})} & x > 0,\ \sigma > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{5.7} $$
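The CCDF shapes in Figure 5.3 can be checked numerically. The following minimal Python sketch evaluates the closed-form CCDFs implied by Eqs. (5.3) to (5.7) and the local slope of log P[X ≥ x] against log x. The parameter values in `models` are hypothetical and chosen only for illustration; they are not the parameters behind Figure 5.3. Only the Pareto slope stays constant at −α, while the other slopes steepen as x grows.

```python
import math

# Closed-form CCDFs P[X >= x]; parameter names follow Eqs. (5.3)-(5.7).
def pareto_ccdf(x, alpha, k):
    return 1.0 if x < k else (k / x) ** alpha

def weibull_ccdf(x, alpha, beta):
    return math.exp(-((x / beta) ** alpha)) if x > 0 else 1.0

def lognormal_ccdf(x, mu, sigma):
    # 1 - Phi((ln x - mu) / sigma), written with the complementary error function
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))

def exponential_ccdf(x, rate):
    return math.exp(-rate * x)

def loglog_slope(ccdf, x, h=1e-4):
    """Local slope d log P[X >= x] / d log x, by a central difference in log x."""
    lo, hi = x * math.exp(-h), x * math.exp(h)
    return (math.log(ccdf(hi)) - math.log(ccdf(lo))) / (2.0 * h)

# Hypothetical parameters, for illustration only
models = {
    "Pareto": lambda x: pareto_ccdf(x, alpha=1.5, k=1.0),
    "Weibull": lambda x: weibull_ccdf(x, alpha=0.5, beta=1.0),
    "Log-normal": lambda x: lognormal_ccdf(x, mu=0.0, sigma=1.0),
    "Exponential": lambda x: exponential_ccdf(x, rate=0.05),
}
for name, ccdf in models.items():
    print(name, [round(loglog_slope(ccdf, x), 2) for x in (10.0, 100.0)])
```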

5.1.3

Implications of Heavy-tail Behaviour

An important implication of heavy-tailed distributions is that the probability of "very" large values is non-negligible. This affects many fundamental tools and results in profound ways. For instance, heavy-tailed flow durations tend to persist once they have been active for some time [60]. That is, the longer a flow has existed, the longer it is likely to persist. Another issue arising from the high variability of heavy-tailed distributions is that the sample mean converges slowly to the population mean. It can be shown that the convergence error behaves as [60]

$$ \left|\bar{X}(n) - \mu\right| \sim n^{1/\alpha - 1}. \tag{5.8} $$

Clearly, the closer α is to 1, the slower the convergence becomes. Consequently, the number of samples needed for the sample mean to settle is extremely high, for example when heavy-tailed random variates are generated in a simulation. For example, 10^12 samples are needed to achieve two-digit accuracy for α = 1.2 [60]. A further implication is that classical traffic models based on assumptions of exponentiality and independence underestimate performance measures and requirements such as buffer occupancy and loss rates [62]. The high variability of self-similar processes is poorly handled by these models, and the impact on networks and systems is significant. Furthermore, Web and FTP servers may face depletion of resources such as open sockets and files when flow durations are heavy-tailed. The common way to handle these situations is to over-dimension networks and server hardware.
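The 10^12 figure follows from Eq. (5.8) by simple arithmetic. The following sketch assumes that the constant hidden in "∼" equals 1, so the result is only an order-of-magnitude estimate. It solves n^(1/α − 1) = 10^(−digits) for n.

```python
import math

def samples_for_accuracy(alpha, digits):
    """Sample size at which the error scale n**(1/alpha - 1) of Eq. (5.8) reaches 10**-digits.

    Assumes the constant hidden in the '~' of Eq. (5.8) equals 1 (order of magnitude only).
    """
    if not 1.0 < alpha < 2.0:
        raise ValueError("used here only for the range 1 < alpha < 2")
    decay = 1.0 - 1.0 / alpha            # the error shrinks as n ** (-decay)
    return 10.0 ** (digits / decay)

for alpha in (1.2, 1.5, 1.8):
    print(f"alpha = {alpha}: about 10^{math.log10(samples_for_accuracy(alpha, 2)):.1f} samples")
```

For two-digit accuracy this prints roughly 10^12.0, 10^6.0 and 10^4.5 samples, which reproduces the figure quoted for α = 1.2.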

5.2

Hypothesising Distributions

The initial process of distribution selection is usually a combination of using visual inspection and summary statistics. Regardless of the method used, the process is of an exploratory nature and both qualitative and quantitative measures are frequently used.

5.2.1

Summary Statistics

Summary statistics are primarily useful as general indicators of the shape of the distribution of X. Besides the sample mean

$$ \bar{X}(n) = \frac{1}{n}\sum_{i=1}^{n} x_i $$

and the sample variance

$$ S^2(n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{X}(n)\right)^2, $$

a few other statistics can prove useful². The coefficient of variation,

$$ c_v = \frac{\sqrt{S^2}}{\bar{X}}, $$

is especially useful for indicating exponential behaviour, since $c_v = 1$ for the exponential distribution. To measure symmetry, the skewness

$$ v = \frac{1}{n\,(S^2)^{3/2}}\sum_{i=1}^{n}\left(x_i - \bar{X}\right)^3 $$

can be used. v = 0 indicates a symmetric distribution, v < 0 indicates a distribution skewed to the left (i.e., with a heavier left tail), while v > 0 indicates a distribution skewed to the right (i.e., with a heavier right tail). Examples of distributions with different values of v are shown in Figure 5.4. The solid line is the normal density with mean 2 and variance 0.5² (v = 0.0). The dashed and dotted lines are Weibull densities with parameters (α = 1.5, β = 1) and (α = 7, β = 3.5) (v = 1.1 and v = −0.5, respectively). The dash-dotted line is an exponential density with rate 0.75 (v = 2). Many observed distributions are skewed to the right [48].

Figure 5.4: Skewness (densities f(x) with v = 0.0, 1.1, −0.5 and 2.0)

² For brevity, the variable n is omitted for the rest of the statistics in this section, so $\bar{X}(n)$ becomes $\bar{X}$, etc.

An additional statistic, the kurtosis

$$ \kappa = \frac{1}{n\,(S^2)^{2}}\sum_{i=1}^{n}\left(x_i - \bar{X}\right)^4, $$

can be used as a measure of peakedness. A distribution with large kurtosis tends to have a peak near the mean and exhibit heavy tail behaviour, while distributions with low kurtosis have flat peaks (e.g., the uniform distribution) with rapidly decreasing tails.
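The summary statistics above translate directly into code. The following minimal Python sketch assumes the 1/(n − 1) variance normalisation and the n·(S²)^(3/2) and n·(S²)² denominators shown above. The kurtosis is the plain (not excess) kurtosis. An idealised exponential sample, built from exact quantiles, should give $c_v$ close to 1 and skewness close to 2.

```python
import math

def summary_statistics(xs):
    """Sample mean, variance, coefficient of variation, skewness and kurtosis.

    S^2 uses 1/(n-1); skewness divides by n*(S^2)^(3/2) and kurtosis by n*(S^2)^2
    (plain kurtosis, about 3 for normal data, not excess kurtosis).
    """
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    var = sum(d * d for d in dev) / (n - 1)
    return {
        "mean": mean,
        "var": var,
        "cv": math.sqrt(var) / mean,
        "skewness": sum(d ** 3 for d in dev) / (n * var ** 1.5),
        "kurtosis": sum(d ** 4 for d in dev) / (n * var ** 2),
    }

# Idealised exponential sample (quantiles at p = (i - 0.5)/n)
n = 100_000
expo = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
print(summary_statistics(expo))
```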

5.2.2

Graphical Methods

Visual inspection makes use of various plots such as the histogram (or the Experimental Probability Density Function (EPDF) if the $x_i$'s are normalised to $\sum_i x_i$), EDF, CCDF, Hill and α-estimation plots [22]. Experienced modellers are often able to identify distributions or mixtures of distributions by inspection alone. The PDF is useful for observing the lower quantiles of the data, and the CCDF serves the same purpose for the upper tail. The CCDF is useful for discerning potential heavy-tail behaviour in the distribution, such as for file sizes and session durations [47]. The histogram is more suitable for observing metrics in situations where higher-frequency behaviour is to be modelled, such as for inter-arrival times. Hill plots give an indication of the degree of heavy-tail behaviour, and also of potential cutoff points in the censored mixture model case. The α-estimation plot indicates the degree of self-similarity in the data. Visual inspection helps in eliminating many candidate distributions, and indicates whether a single distribution will suffice or a mixture model is required. In certain cases, one may also identify rough estimates of distribution parameters (e.g., means and variances). For this thesis, single distributions and mixtures of two distributions are primarily considered, as the number of measurements makes the heuristics involved in calculating more cutoff points prohibitively complex.
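Two of the plots mentioned above, the CCDF and the Hill plot, reduce to short computations. The following Python sketch shows both. The Hill estimator formula used here is the standard one, supplied as an assumption because it is not given in the text. The test data are exact Pareto quantiles with a hypothetical α = 1.5. A roughly flat Hill plot near 1.5 is what one would look for.

```python
import math

def empirical_ccdf(xs):
    """Points (x_(i), P_hat[X >= x_(i)]) for a log-log CCDF plot; ties are not merged."""
    s = sorted(xs)
    n = len(s)
    return [(x, (n - i) / n) for i, x in enumerate(s)]    # i is 0-based here

def hill_estimates(xs, k_max):
    """Hill estimates of the tail index alpha from the k largest values, k = 1..k_max.

    alpha_hat(k) = 1 / ((1/k) * sum_{j=1..k} ln x_(n-j+1) - ln x_(n-k)).
    Plotting alpha_hat(k) against k gives a Hill plot. Requires positive data and k_max < n.
    """
    s = sorted(xs, reverse=True)          # s[0] is the largest observation
    logs = [math.log(x) for x in s]
    estimates, running = [], 0.0
    for k in range(1, k_max + 1):
        running += logs[k - 1]
        estimates.append(1.0 / (running / k - logs[k]))
    return estimates

# Idealised Pareto(k = 1, alpha = 1.5) sample built from exact quantiles
n, alpha = 10_000, 1.5
pareto = [(1.0 - i / (n + 1)) ** (-1.0 / alpha) for i in range(1, n + 1)]
hill = hill_estimates(pareto, 2000)
print(hill[99], hill[999], hill[1999])    # alpha_hat for k = 100, 1000, 2000
```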


5.3

Mixture Distributions

A finite mixture distribution is a distribution composed of a weighted sum of distributions [76, 83]. The PDF of such a distribution is given by

$$ p(x) = \sum_{i=1}^{n} \pi_i f_i(x) \tag{5.9} $$

where the $\pi_i$'s are known as the mixing weights or mixing probabilities and the $f_i$'s as the component densities with their associated parameters. For p(x) to form a proper PDF, all $f_i$'s must be proper PDFs, the $\pi_i$'s must be non-negative, and $\sum_{i=1}^{n} \pi_i = 1$. Figure 5.5 shows an example of a three-component mixture distribution in which all the components are normal distributions. The dashed lines depict the component densities, while the solid line is the resulting mixture distribution.

Figure 5.5: The mixture distribution f(x) = 0.2φ(2.5, 0.5) + 0.4φ(6, 1) + 0.4φ(1, 2)
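Eq. (5.9) can be evaluated directly. The following sketch reproduces the Figure 5.5 mixture under one assumption: φ(m, s) is read as a normal density with mean m and standard deviation s, since the caption does not state whether the second argument is a standard deviation or a variance. A trapezoidal integral over a wide grid checks that the weighted sum is a proper PDF.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, components):
    """Eq. (5.9): p(x) = sum_i pi_i * f_i(x), where each component f_i is a callable PDF."""
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("mixing weights must be non-negative and sum to one")
    return sum(w * f(x) for w, f in zip(weights, components))

# Figure 5.5, assuming phi(m, s) is a normal density with mean m and standard deviation s
weights = [0.2, 0.4, 0.4]
components = [lambda x: normal_pdf(x, 2.5, 0.5),
              lambda x: normal_pdf(x, 6.0, 1.0),
              lambda x: normal_pdf(x, 1.0, 2.0)]

# Sanity check: a trapezoidal integral of p(x) over [-20, 30] should be close to 1
step = 0.01
grid = [-20.0 + step * j for j in range(5001)]
values = [mixture_pdf(x, weights, components) for x in grid]
area = step * (sum(values) - 0.5 * (values[0] + values[-1]))
print(round(area, 6))
```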

A classic example of a mixture distribution is the hyper-exponential distribution, with density

$$ H_r(x) = \sum_{i=1}^{r} \pi_i \lambda_i e^{-\lambda_i x} \tag{5.10} $$

which, for the case of r = 2, is an example of a binary mixture distribution (Eq. (9.1)). This important special case is of the form

$$ p(x) = \pi f_1(x) + (1 - \pi) f_2(x). \tag{5.11} $$

For the purposes of this thesis, a binary mixture in which $f_1(x)$ and $f_2(x)$ are of the same distributional family is denoted a dual distribution, e.g., a dual Gaussian for $g(x) = \pi\phi(x; \mu_1, \sigma_1) + (1 - \pi)\phi(x; \mu_2, \sigma_2)$. Probabilistically, a mixture model can be interpreted as modelling an r-stage parallel system in which a customer enters state i with probability $\pi_i$. The customer remains in this state for a time distributed according to distribution $f_i$. The theory for parameter estimation and fitness testing of mixture distributions is quite similar to that of the single-distribution case, which makes mixture distributions an attractive tool for model construction. For instance, in certain cases the body and tail of a given empirical distribution may not both be adequately matched by a single distribution. A mixture of distributions provides more flexibility in the fitting process, for example by adding a high-variance Gaussian component to approximate heavy-tail behaviour [76]. However, this flexibility often makes analytic estimators difficult to construct, and it is frequently necessary to resort to numerical methods. If the number of component densities is large, it may be difficult to estimate the parameters properly even with numerical methods. Additionally, the data may have to be censored at specific cutoff points to remove biases in the estimation procedure. Locating these cutoff points is often a trial-and-error process, and application-specific heuristics may have to be employed.
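The parallel-stage interpretation also gives a direct way to generate mixture variates: first pick stage i with probability $\pi_i$, then draw from $f_i$. The following minimal sketch applies this to a hypothetical two-stage hyper-exponential with π = (0.3, 0.7) and rates (2.0, 0.1). It compares the sample mean with the theoretical mean $\sum_i \pi_i / \lambda_i$ of Eq. (5.10).

```python
import random

def sample_mixture(weights, samplers, rng):
    """One variate from a finite mixture: enter stage i with probability pi_i, then draw from f_i."""
    u = rng.random()
    acc = 0.0
    for w, draw in zip(weights, samplers):
        acc += w
        if u < acc:
            return draw(rng)
    return samplers[-1](rng)      # guards against rounding when the weights sum to almost 1

# Hypothetical two-stage hyper-exponential H_2
weights = [0.3, 0.7]
rates = [2.0, 0.1]
samplers = [lambda r, lam=lam: r.expovariate(lam) for lam in rates]

rng = random.Random(1)
xs = [sample_mixture(weights, samplers, rng) for _ in range(200_000)]
sample_mean = sum(xs) / len(xs)
theory_mean = sum(w / lam for w, lam in zip(weights, rates))   # 0.3/2.0 + 0.7/0.1 = 7.15
print(sample_mean, theory_mean)
```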

5.3.1

Censored Mixture Distributions

Though finite mixture models are a useful and intuitive tool for model construction, the method does not work well with all processes. For example, the hypothesised distribution may lack analytic parameter estimators, while numerical methods fail or provide very poor estimates. In this case, a censored mixture distribution may yield better results. By a censored mixture distribution we refer to a set of distributions, each describing a different range of the distribution of the observations. Formally, there is a set of distributions $F = \{F_1, \ldots, F_k\}$ describing the ranges $R = \{r_1, \ldots, r_k\}$. These ranges are delimited by the cutoff points $C = \{c_1, \ldots, c_{k+1}\}$ such that $R = \{r_1 = [c_1 \le X < c_2], \ldots, r_k = [c_k \le X < c_{k+1}]\}$. Usually k = 2, with $c_1 = \min X$ and $c_3 = \max X + \varepsilon$, where ε is a small value ensuring that max X is included in the last range. However, the added flexibility comes at the price of additional problems that need to be addressed:

• The first problem involves locating the cutoff points between distributions. It is rarely apparent where the cutoff points are, and heuristic methods are often useful in locating them.
• Secondly, the parameter estimation methods described in Section 5.4 assume that the observations make up the complete set of observations from the process. Splitting the data into ranges breaks this assumption, so the estimation methods need modification.
• A third problem concerns fitness assessment. The common fitness assessment methods described in Chapter 6 assume that the observations form a complete sample from the process. These methods therefore need to be modified to work as intended.

Furthermore, in practice a censored mixture of more than one or two distributions is of little use beyond purely descriptive purposes. For example, generating random variates from such a censored mixture distribution for use in, e.g., simulations is cumbersome. Fortunately, for many problems it is sufficient to locate a single cutoff point between the body and the tail of the distribution.

Cutoff Localisation
The general idea for locating the cutoff points between distributions is that the best set of cutoff points is the one which gives the smallest total estimation error. This may be achieved either by employing successive censoring as described in [42], or by expressing the problem as a minimisation problem. An error function that takes the cutoff points into account can be described as follows:

$$ \varepsilon(C, \Theta, x) = \sum_{i=1}^{k} \left[\delta(x - c_i) - \delta(x - c_{i+1})\right] \xi\!\left(\hat{F}_i(x), F_{n_i}(x)\right) \tag{5.12} $$

where δ(x) is the Heaviside step function, so that the bracketed factor equals 1 for $c_i \le x < c_{i+1}$ and 0 otherwise, and ξ(X, Y) is a function calculating a distance between X and Y, for example ξ(X, Y) = |X − Y| or ξ(X, Y) = (X − Y)². This function "picks off" each portion of the errors between the estimated CDF and the EDF. Here, the variable x does not refer to the observations but rather to the x-axis for the distribution combinations. It then becomes a matter of minimising the sum of these errors, i.e., $\min_{\Theta, C} \varepsilon$. However, if k is large, the optimisation can be difficult due to the number of parameters involved. If the parameters are known, the problem becomes more tractable, as the only variables left are the locations of the cutoff points.

Censoring
The "picking off" in the previous section is formally called censoring. Censoring may be performed in the time domain, i.e., only observations up to a specific time or number of observations are used. This is known as time censoring or type 1 censoring. Type 2 or failure censoring involves using only certain quantiles of the data. In the case of type 2 censoring with a single cutoff point, removing data starting from the lower quantiles (starting at 0) is called left censoring, while removing data from the upper quantiles (starting at 1) is called right censoring. Censoring removes part of the observations, and many fitness assessment methods (e.g., the Anderson-Darling (AD) test, Section 6.2.2) assume that all observations are available. The remaining observations therefore need to be transformed before such methods are applied. Fortunately, the most powerful fitness assessment statistics make use of the Probability Integral Transform (PIT) method. This reduces the problem of adapting an arbitrary censored distribution to adapting a censored uniform distribution, using the following transformation [25]. Assume that $U_s \le \ldots \le U_r$, s < r, are a set of ordered observations uniformly distributed between $U_s$ and $U_r$. After the transformation

$$ V_i = \frac{U_{s+i} - U_s}{U_r - U_s}, \qquad i = 1, \ldots, r - s - 1, \tag{5.13} $$

the variables $V_i$ form an ordered sample from U(0, 1). The $V_i$ can then be used to compute various test statistics or be tested for uniformity.
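Eq. (5.13) is a single rescaling step. The following minimal sketch treats the first and last retained values as the censoring limits $U_s$ and $U_r$ and returns the interior $V_i$. The example applies type 2 censoring at the 20 % and 90 % points of an idealised, evenly spaced uniform sample. The censoring levels are hypothetical and chosen only for illustration.

```python
def rescale_censored(u_sorted):
    """Eq. (5.13): map ordered uniforms U_s <= ... <= U_r onto (0, 1).

    The endpoints U_s and U_r act as censoring limits; the interior values
    V_i = (U_{s+i} - U_s) / (U_r - U_s), i = 1..r-s-1, are returned.
    """
    lo, hi = u_sorted[0], u_sorted[-1]
    if hi <= lo:
        raise ValueError("need U_r > U_s")
    return [(u - lo) / (hi - lo) for u in u_sorted[1:-1]]

# Keep only the values between the 20 % and 90 % points (hypothetical censoring levels)
n = 1000
u = [(i - 0.5) / n for i in range(1, n + 1)]      # idealised ordered U(0,1) sample
kept = [x for x in u if 0.2 <= x <= 0.9]
v = rescale_censored(kept)
print(len(v), v[:3])
```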

The Probability Integral Transformation (PIT) Theorem
In short, the PIT theorem states [25]: "If X is a real-valued random variable with continuous CDF F, then U = F(X) is a uniformly distributed random variable on the interval (0, 1)." The inverse relation, $Y = F^{-1}(U)$, holds for both continuous and discrete random variables when $F^{-1}$ is taken as the generalised inverse. It is commonly used for generating random variates from U(0, 1) random numbers [48]: if U is a U(0, 1) random variable, then Y is a random variable distributed according to F. With the PIT, testing for a specific continuous distribution F is converted to testing for uniformity over the range (0, 1). Figure 9.1(c) shows an example of BitTorrent session inter-arrival times transformed using the PIT.
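Both directions of the PIT can be shown with an exponential model. In the following sketch, the rate `lam = 0.5` and the seed are hypothetical choices. Uniform random numbers are turned into exponential variates via $F^{-1}$, and mapping them back through F yields values that should look uniform on (0, 1).

```python
import math
import random

def exp_cdf(x, lam):
    return 1.0 - math.exp(-lam * x)

def exp_inv_cdf(u, lam):
    return -math.log(1.0 - u) / lam

lam = 0.5                      # hypothetical rate
rng = random.Random(7)

# Inverse transform: Y = F^{-1}(U) turns U(0,1) numbers into exponential variates
ys = [exp_inv_cdf(rng.random(), lam) for _ in range(50_000)]

# PIT: U = F(Y) maps the variates back; under the correct model they are U(0,1)
us = sorted(exp_cdf(y, lam) for y in ys)
print(sum(ys) / len(ys), sum(us) / len(us))    # close to 1/lam = 2 and to 0.5
```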

5.4

Parameter Estimation

Parameter estimation, also known as point estimation, is in essence a minimisation problem. The principal difference among the various estimation methods lies in the function being minimised. The theory of point estimation is extensive, and only a cursory description of some of the most popular methods is provided here. These are the method of moments, and the minimum χ², minimum distance and Maximum-Likelihood (ML) estimation methods.

Though much of the discussion concerns the estimation of a single parameter θ, the methodology and theory are very similar in the case of multiple parameters. The parameter could thus equally well be a set of parameters $\Theta = \{\theta_1, \ldots, \theta_n\}$. The set of parameters is denoted with a capital Θ, and a single parameter with a lower-case θ. An estimator ε of the parameter θ is a real-valued function of X, so that $\theta_X = \varepsilon(X)$. The estimation error is then $\theta_X - \theta$.

5.4.1

Method of Moments

The idea behind the method of moments is to construct estimators by equating the sample moments with the distribution moments. This results in a system of equations as follows:

$$ \begin{aligned} \mu_1 &= m_1 \\ \mu_2 &= m_2 \\ &\;\;\vdots \\ \mu_N &= m_N \end{aligned} \tag{5.14} $$

where $\mu_j = E[X^j]$ and $m_j = \frac{1}{n}\sum_{i=1}^{n} x_i^j$. Solving this system for the unknown distribution parameters yields the moment estimators. Often, only the first two moments are used, which amounts to matching the sample mean and sample variance. This method is rarely used, other than as a starting point for other estimators.
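As an illustration of Eq. (5.14) with N = 2, the Log-normal distribution has the closed-form raw moments E[X] = exp(μ + σ²/2) and E[X²] = exp(2μ + 2σ²). Solving for μ and σ gives the moment estimators used in the following sketch. The choice of the Log-normal is illustrative only; the data must not be constant, so that m₂ > m₁².

```python
import math

def lognormal_moments_fit(xs):
    """Method-of-moments fit of a Log-normal(mu, sigma) via Eq. (5.14) with N = 2.

    Solves exp(mu + sigma^2/2) = m1 and exp(2 mu + 2 sigma^2) = m2 for mu and sigma,
    where m1 and m2 are the first two raw sample moments.
    """
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    sigma2 = math.log(m2 / (m1 * m1))
    mu = math.log(m1) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

xs = [1.0, 2.0, 4.0, 8.0]
mu, sigma = lognormal_moments_fit(xs)
print(mu, sigma)
```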

5.4.2

Minimum χ²

The minimum χ² estimation method involves aggregating the input data into discrete classes or bins. It uses the χ² statistic (also known as the Pearson statistic), which is a formal way of comparing a histogram of measured data with the PDF of a hypothesised distribution. If the range of the measured data is partitioned into k bins, the χ² statistic is given by

$$ \chi^2 = \sum_{i=1}^{k} \frac{\left[n_i - n p_i(\theta)\right]^2}{n p_i(\theta)} \tag{5.15} $$

where $n_i$ is the number of samples from the data that fall into bin i and n is the total number of samples in the data. The term $n p_i(\theta)$ denotes the number of samples expected to fall into bin i if the samples were drawn from the hypothesised distribution with parameter θ, where $p_i(\theta)$ is the probability of bin i under that distribution. Minimising (5.15) over θ, i.e., solving $\frac{d}{d\theta}\chi^2(\theta) = 0$, yields the minimum-χ² estimate. Note that the minimum χ² method requires neither equally sized nor equiprobable bins. However, for this thesis the bins have always been chosen to be of the same size. This means that specifying the number of bins implicitly specifies the width of each bin. A critical matter when using the χ² method is choosing the proper number of bins. If the number of bins is too small, information is lost due to excessive aggregation. On the other hand, if the number of bins is too large, the histogram becomes jagged and erratic. There is no single "correct" method for choosing the "proper" number of bins, but rather a large number of rules of thumb. Common rules for choosing the number of bins are "enough bins so that each bin contains at least 5 samples" [48] or "make sure that each bin has at least one sample". A more formal rule is Sturges' rule, which gives the number of bins, k, as $k = 1 + \log_2 n$. To choose the width of the bins, Scott's rule (5.16) or the Freedman-Diaconis rule (5.17) is often used:

$$ w = \frac{3.5\,s}{\sqrt[3]{n}} \tag{5.16} $$

$$ w = \frac{2\,\mathrm{IQR}}{\sqrt[3]{n}} \tag{5.17} $$

In (5.16) and (5.17), s is the sample standard deviation, n is the number of samples, and IQR is the sample interquartile range. In [81], the author presents a method based on Scott's rule for calculating bin widths that is asymptotically $L_2$-optimal. This method has proven to provide good bin widths for most of the data in this thesis, and has been the method of choice for most histograms presented herein.

A modification of the χ² statistic is the λ² statistic [61]. The λ² statistic takes into account the discrepancy in the χ² estimate for each bin. This makes it possible to compare the λ² statistic among estimates using different numbers of bins, which is not possible with the χ² statistic.
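The χ² statistic and the bin rules above are short formulas. The following Python sketch implements Eq. (5.15), Sturges' rule (rounded up to an integer here, which is a convention assumed for this sketch), Scott's rule (5.16) and the Freedman-Diaconis rule (5.17). The IQR is computed with Python's default quartile method. Other quartile conventions, such as the one in R, give slightly different widths.

```python
import math
import statistics

def chi2_statistic(counts, probs):
    """Pearson statistic (5.15): counts[i] = n_i and probs[i] = p_i(theta) for bin i."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def sturges_bins(n):
    """Sturges' rule k = 1 + log2(n), rounded up here to an integer number of bins."""
    return math.ceil(1 + math.log2(n))

def scott_width(xs):
    """Scott's rule (5.16): w = 3.5 s / n^(1/3)."""
    return 3.5 * statistics.stdev(xs) / len(xs) ** (1 / 3)

def freedman_diaconis_width(xs):
    """Freedman-Diaconis rule (5.17): w = 2 IQR / n^(1/3), quartiles by the 'exclusive' method."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return 2 * (q3 - q1) / len(xs) ** (1 / 3)

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(sturges_bins(len(data)), scott_width(data), freedman_diaconis_width(data))
print(chi2_statistic([30, 20, 30, 20], [0.25] * 4))
```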

5.4.3

Minimum Distance

The minimum distance estimation method uses a measure of difference or distance between $F_n$ and $\hat{F}$³. Compared to the χ² method, it has the advantage of requiring no data aggregation beyond forming the EDF. The EDF is defined as $F_n(x) = k/n$, with k the number of observations less than or equal to some value x, and n the total number of observations. The aggregation used in this method is thus data-dependent, and not arbitrary as in the χ² case. Common distance measures are the supremum distance

$$ D(\theta) = \sup_x \left|F_n(x) - \hat{F}(x)\right| $$

and the $l_2$-norm

$$ D(\theta) = \sqrt{\sum_x \left[F_n(x) - \hat{F}(x)\right]^2}. $$

The distance measure is minimised to give the minimum-distance estimate of the parameters, thus

$$ \hat{\Theta} = \arg\min_{\Theta} D(\Theta). \tag{5.18} $$
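A minimum-distance estimate can be obtained by direct search. The following minimal sketch uses the supremum distance, evaluated at the jumps of the EDF. It picks the exponential mean from a hypothetical candidate grid, which stands in for a proper numerical optimiser. On idealised data generated from a mean of 2.0, that candidate gives the smallest distance.

```python
import math

def sup_distance(xs_sorted, cdf):
    """Supremum distance sup_x |F_n(x) - F_hat(x)|, evaluated at the jumps of the EDF."""
    n = len(xs_sorted)
    d = 0.0
    for i, x in enumerate(xs_sorted, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def min_distance_exponential(xs, candidate_means):
    """Minimum-distance estimate (5.18) of an exponential mean, by search over a candidate grid."""
    s = sorted(xs)
    return min(candidate_means,
               key=lambda m: sup_distance(s, lambda x: 1.0 - math.exp(-x / m)))

# Idealised exponential data with mean 2.0 (exact quantiles)
n = 500
xs = [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
print(min_distance_exponential(xs, [1.5, 1.75, 2.0, 2.25, 2.5]))
```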

5.4.4

Maximum Likelihood

The central idea of Maximum Likelihood Estimation (MLE) is to answer the question "For what set of parameters Θ of the distribution F is it most likely that we would end up with the data $x_1, \ldots, x_n$?" The answer is obtained by forming the likelihood function, L(Θ),

$$ L(\Theta) = \prod_{i=1}^{n} f(x_i; \Theta). \tag{5.19} $$

³ Recall that $F(x; \hat{\Theta})$ is denoted $\hat{F}(x)$.

If the $x_i$'s are independent, L(Θ) is the probability (for continuous data, the joint density) of obtaining the $x_i$'s if Θ is the parameter set of the distribution f. To avoid terminological confusion, this quantity, regarded as a function of Θ for fixed data, is called the likelihood. The likelihood function is maximised to obtain the ML parameter estimate

$$ \hat{\Theta} = \arg\max_{\Theta} L(\Theta). \tag{5.20} $$

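Alternatively, the logarithm of the likelihood can be used. Since the logarithm is strictly increasing, the value of Θ that maximises l(Θ) = ln L(Θ) also maximises L(Θ). l(Θ) is known as the log-likelihood function. For example, the likelihood for the exponential distribution with mean μ is

$$ L(\mu) = \mu^{-n} e^{-\frac{1}{\mu}\sum_{i=1}^{n} x_i} \tag{5.21} $$

while the log-likelihood function is

$$ l(\mu) = -n \ln \mu - \frac{1}{\mu}\sum_{i=1}^{n} x_i \tag{5.22} $$

which is much easier to manage. For instance, setting $dl(\mu)/d\mu = 0$ immediately gives $\hat{\mu} = \bar{X}$.

5.4.5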

Notes on Parameter Estimation

The theory of parameter estimation is often based on deriving analytic expressions for a particular estimator. In certain cases, such as for moment estimators, which do not take the specific distribution into account, this is not a problem. However, in the case of mixture distributions with several components, obtaining closed-form expressions for the estimators may prove to be very difficult, if not impossible. It is often necessary to resort to numerical methods. The parameter estimates presented in this work have been obtained by optimisation methods available in the software package R [5]. For the single-distribution cases, the closed-form ML estimators have been used as initial estimates where possible. The parameter estimates are further improved by minimising the error percentage as described in Section 7.3. For mixture distributions and single distributions without closed-form estimators, numeric ML estimates have been used as starting points for further error percentage minimisation.
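The contrast between closed-form and numeric ML estimates can be illustrated with the exponential and Weibull distributions. This Python sketch is not the R code used for the thesis. The exponential mean has the closed form $\hat{\mu} = \bar{X}$. For the Weibull, the scale β is profiled out: for fixed α, $\hat{\beta}(\alpha) = (\sum x^{\alpha}/n)^{1/\alpha}$. The remaining profile log-likelihood in α is then maximised by a golden-section search. The search interval [0.05, 20] is an assumption.

```python
import math

def exponential_mle(xs):
    """Closed-form ML estimate of the exponential mean: the sample mean (from Eq. (5.22))."""
    return sum(xs) / len(xs)

def weibull_profile_loglik(alpha, xs, sum_log):
    """Weibull log-likelihood with beta replaced by its closed-form ML value for this alpha."""
    n = len(xs)
    s_alpha = sum(x ** alpha for x in xs)
    # beta_hat(alpha) = (s_alpha / n) ** (1 / alpha), which makes sum((x / beta)**alpha) = n
    return n * math.log(alpha) - n * math.log(s_alpha / n) + (alpha - 1.0) * sum_log - n

def weibull_mle(xs, lo=0.05, hi=20.0, tol=1e-8):
    """Numeric ML fit of Weibull (alpha, beta) by golden-section search over the shape alpha."""
    sum_log = sum(math.log(x) for x in xs)
    f = lambda a: weibull_profile_loglik(a, xs, sum_log)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                        # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                              # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    alpha = 0.5 * (a + b)
    beta = (sum(x ** alpha for x in xs) / len(xs)) ** (1.0 / alpha)
    return alpha, beta

# Idealised Weibull(alpha = 2, beta = 1.5) sample from exact quantiles
n = 2000
data = [1.5 * (-math.log(1.0 - (i - 0.5) / n)) ** 0.5 for i in range(1, n + 1)]
alpha_hat, beta_hat = weibull_mle(data)
print(round(alpha_hat, 3), round(beta_hat, 3))
```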

5.5

Summary

This chapter has discussed two basic tools of traffic modelling: model selection and parameter estimation. Special focus has been placed on heavy-tailed models. These models challenge fundamental assumptions of classical network traffic modelling, and accurate modelling of traffic displaying heavy-tail behaviour is important. Several methods for selecting model distributions and estimating their parameters have been presented.


Chapter 6

Fitness Assessment

No matter how many instances of white swans we may have observed, this does not justify the conclusion that all swans are white. – Karl Popper

Performance modelling tends to be fairly subjective, at least in the sense that whether a specific model is "good enough" or not is application specific. Just as there are several ways of approaching a specific modelling problem, there are several methods for determining whether the selected models are fit for the specific modelling activity. In this chapter, some classic methods for determining goodness-of-fit as well as an alternative method for assessing model suitability are discussed. The same notation for $F_n$, $\hat{F}$, etc. as in the previous chapter is used.


6.1

Graphical Methods

For the purpose of performance modelling, graphical methods are intuitive and appealing ways of assessing the general fitness of a parametrised distribution. Visual procedures such as histogram and CCDF overplots, Quantile-Quantile (QQ) plots and difference plots all provide information on the quality of the fit. For instance, histogram and CCDF overplots give insight into the fit of the lower and upper tails of a single distribution, respectively. The QQ plot is a useful visual aid for assessing how representative the chosen model is of several measurements simultaneously. Figure 9.1 shows examples of the graphical assessment tools. A humorously worded but still useful complement to both the visual and the quantitative tools is the inter-ocular trauma test [10], which basically states that if the data looks significant, the data is significant. The point is that goodness-of-fit should not rest solely on statistical measures; the data must also be examined carefully.
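A QQ plot pairs model quantiles with sample quantiles. The following minimal sketch computes these pairs. It uses the plotting positions $p_i = (i - 0.5)/n$, which is one common convention and is assumed here rather than taken from the text. It also uses a hypothetical fitted exponential with mean 2.0. Points on the line y = x indicate a good fit, while a systematic slope such as 2 reveals a scale mismatch.

```python
import math

def qq_points(xs, inv_cdf):
    """Pairs (model quantile, sample quantile) for a QQ plot, with p_i = (i - 0.5) / n."""
    s = sorted(xs)
    n = len(s)
    return [(inv_cdf((i - 0.5) / n), x) for i, x in enumerate(s, start=1)]

# Hypothetical fitted exponential model with mean 2.0
mean = 2.0
exp_quantile = lambda p: -mean * math.log(1.0 - p)

data = [exp_quantile((i - 0.5) / 100) for i in range(1, 101)]   # idealised data from the model
pts = qq_points(data, exp_quantile)                             # lies on y = x
doubled = qq_points([2 * x for x in data], exp_quantile)        # slope 2: scale mismatch
print(pts[:2], doubled[:2])
```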

6.2

Hypothesis Testing

Classical statistical theory provides formal methods for testing the goodness of fit [25]. These tests are known as hypothesis tests and are formal ways of testing whether or not a given set of observations are Independent and Identically Distributed (IID) samples from a distribution $\hat{F}$. The tests are performed by forming a statement regarding the nature of the observations and a related distribution F. This statement is known as the null hypothesis and is often denoted by $H_0$, e.g.,

$H_0$: The observations $x_1, x_2, \ldots, x_n$ are drawn from the distribution F with parameters Θ.

The hypothesis testing procedure entails calculating a specific test statistic and comparing it to tables of critical values. These tables contain values for the test statistic at specific significance levels. If the test statistic exceeds the value given at a certain level α, the null hypothesis is said to be rejected at significance level α. In this thesis, the term pass at significance level α will occasionally be used to mean not rejecting the null hypothesis.

The most fundamental form of most test statistics assumes that the parameters of the distribution are not estimated from the data. This is known as the all parameters known case or Case 0. The hypothesis is then called a simple hypothesis. If the distribution is known but the parameters are unknown, it is called a composite hypothesis. If the parameters are estimated from the data in any way, both the test statistic and the critical values need modification for the specific distribution (Section 6.3). However, if the parameter estimates $\hat{\Theta}$ are "good enough", the all parameters known case can be used to calculate the test statistics [25]. There are two major types of hypothesis tests: χ² tests and EDF tests. EDF tests are more powerful than χ² tests [25].

6.2.1

The χ² Test

The χ² test is based on the fact that, if the null hypothesis is true, the distribution of the χ² statistic (Eq. (5.15)) can be shown to converge to a χ² distribution with k − 1 degrees of freedom as n → ∞. Thus, the obtained value of χ² is compared to $\chi^2_{k-1,1-\alpha}$, and if $\chi^2 > \chi^2_{k-1,1-\alpha}$, the null hypothesis is rejected at the approximate level α.

The χ² test suffers from the same problems as the χ² estimators, such as the selection of appropriate bin sizes and the associated risk of misrepresentation. Despite these problems, the χ² test is still used, since any distribution can be tested with it. Other tests, such as the EDF tests, may require modification depending on the specific distribution being tested [48].
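The test decision can be sketched in a few lines. The critical value $\chi^2_{k-1,1-\alpha}$ normally comes from a table. To keep this Python sketch dependency-free, the Wilson-Hilferty approximation, which is not mentioned in the text, is substituted for it. It is accurate to within a few hundredths for the degrees of freedom used below, but exact tables should be used in practice. The sketch assumes the all parameters known case, so k − 1 degrees of freedom apply.

```python
import statistics

def chi2_critical_wh(df, level):
    """Approximate upper critical value chi^2_{df, 1-level} (Wilson-Hilferty approximation)."""
    z = statistics.NormalDist().inv_cdf(1.0 - level)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3

def chi2_test(counts, probs, level=0.05):
    """Chi-square goodness-of-fit test with k - 1 degrees of freedom (all parameters known).

    Returns (statistic, approximate critical value, reject H0?).
    """
    n = sum(counts)
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
    crit = chi2_critical_wh(len(counts) - 1, level)
    return stat, crit, stat > crit

print(chi2_test([30, 20, 30, 20], [0.25] * 4))   # statistic 4.0 with df = 3: not rejected
print(chi2_test([60, 10, 20, 10], [0.25] * 4))   # statistic 68.0: rejected
```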


6.2.2

EDF Tests

EDF tests are a class of tests based on the differences between the EDF Fn (x) and the estimated distribution function Fˆ (x). There are two major types of EDF tests: the supremum tests and the quadrature tests.

The Supremum Tests
The best-known supremum test is based on the Kolmogorov-Smirnov (KS) statistic. The KS test statistic measures the largest vertical distance between $F_n$ and $\hat{F}$ [25]. It is defined as

$$ D_n = \sup_x \left|F_n(x) - \hat{F}(x)\right| \tag{6.1} $$

or, alternatively,

$$ D_n = \max\left\{D^+, D^-\right\} \tag{6.2} $$

where $D^+$ and $D^-$ are the largest positive and negative vertical differences, respectively, i.e., $D^+ = \sup_x\{F_n(x) - \hat{F}(x)\}$ and $D^- = \sup_x\{\hat{F}(x) - F_n(x)\}$. A related statistic is the Kuiper statistic, which is defined as $V = D^+ + D^-$.

The Quadrature Tests
This class of tests, also known as the Cramér-von Mises (CVM) family of tests, is based on the squared differences between $F_n$ and $\hat{F}$ [25]. The general form of the test statistics in this class is

$$ Q = n \int_{-\infty}^{\infty} \left\{F_n(x) - F(x)\right\}^2 \Psi(x)\, dF(x) \tag{6.3} $$

where Ψ(x) is an error weighting function and n is the number of samples. This class of tests is more powerful, as it takes into account all discrepancies between $F_n$ and $\hat{F}$, not only the largest.

The most common test statistics in the CVM family are the Cramér-von Mises statistic (denoted by $W^2$) and the Anderson-Darling (AD) statistic (denoted by $A^2$). For $W^2$, the weighting function is Ψ(x) = 1, which essentially makes $W^2$ an n-scaled mean square error between the estimated distribution and the data. The CVM statistic thus weights all errors equally, which is not always desirable, as distributions often differ mainly in the tails [25]. The AD statistic is more powerful in detecting deviations in the tails of a distribution. The associated weighting function in this case is

$$ \Psi(x) = \frac{1}{\hat{F}(x)\left[1 - \hat{F}(x)\right]} \tag{6.4} $$

and a visual example of Ψ(x) for a uniform distribution over (0, 1) is depicted in Figure 6.1.

Figure 6.1: AD weighting function for a uniform distribution (weights versus x)

Notes on EDF Tests
The EDF test statistics are usually calculated by using the PIT method. When the hypothesised model is correct, this method transforms observations from any continuous distribution F into values that are (approximately) uniformly distributed on (0, 1), denoted by U(0, 1). It can be shown that the vertical differences between the EDF of the transformed values and the true U(0, 1) distribution are the same as those between $F_n$ and $\hat{F}$ [25].

The statistics are then calculated from the transformed values. For example, the D and $A^2$ statistics are calculated with (6.5) and (6.6):

$$ D_n = \max\left\{ \max_i\left(\frac{i}{n} - Z_i\right),\; \max_i\left(Z_i - \frac{i-1}{n}\right) \right\} \tag{6.5} $$

$$ A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\left(\ln Z_i + \ln\left[1 - Z_{n-i+1}\right]\right) \tag{6.6} $$

where $Z_1 \le \ldots \le Z_n$ are the order statistics of $\hat{F}(X; \Theta)$. Even if the all parameters known case is assumed, the distribution of most test statistics differs from the true (Case 0) distribution if the parameters are estimated from the data. With the exception of the AD statistic, all EDF test statistics discussed above need modification to maintain the stated significance [25]. For example, in the case of the D statistic, the modification is

$$ D_{mod} = D_n \left(\sqrt{n} + 0.12 + \frac{0.11}{\sqrt{n}}\right). \tag{6.7} $$

In the case of the $D_{mod}$ statistic, the value of $D_{mod}$ clearly grows with the square root of n for a given $D_n$. Thus, as n grows very large, so does $D_{mod}$. The modifications for other supremum statistics are similarly dependent on $\sqrt{n}$. For the quadrature statistics, the modifications are similar to Eq. (6.8), except for the AD statistic, which needs no modification if n ≥ 5.

$$ W^2_{mod} = \left(W^2 - 0.4\,n^{-1} + 0.6\,n^{-2}\right)\left(1 + n^{-1}\right). \tag{6.8} $$

The modified test statistics are then compared to percentage points such as those shown in Table 6.1. For the composite case, i.e., when the parameters are estimated from the data, other modifications and tables must be employed.

Table 6.1: EDF statistic percentage points

Test statistic    Significance level α
                  0.15    0.10    0.05    0.025   0.01    0.005   0.001
Dmod              1.138   1.224   1.358   1.480   1.628   1.731   1.950
W²                0.284   0.347   0.461   0.581   0.743   0.869   1.167
A²                1.610   1.933   2.492   3.070   3.880   4.500   6.000
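The following Python sketch carries out the whole Case 0 procedure, starting from the PIT values $Z_i$. It computes D (6.5), the Kuiper V, $W^2$ and $A^2$ (6.6), applies the modifications (6.7) and (6.8), and compares the results with Table 6.1. The $W^2$ computing formula, $\sum_i (Z_i - (2i-1)/(2n))^2 + 1/(12n)$, is the standard one from the EDF literature and is assumed here because the text does not show it.

```python
import math

# Case 0 upper-tail percentage points from Table 6.1
CRITICAL = {
    "D_mod": {0.15: 1.138, 0.10: 1.224, 0.05: 1.358, 0.025: 1.480, 0.01: 1.628, 0.005: 1.731, 0.001: 1.950},
    "W2_mod": {0.15: 0.284, 0.10: 0.347, 0.05: 0.461, 0.025: 0.581, 0.01: 0.743, 0.005: 0.869, 0.001: 1.167},
    "A2": {0.15: 1.610, 0.10: 1.933, 0.05: 2.492, 0.025: 3.070, 0.01: 3.880, 0.005: 4.500, 0.001: 6.000},
}

def edf_statistics(z):
    """D+, D-, D (6.5), Kuiper V, W^2 and A^2 (6.6) from PIT values z_i = F_hat(x_i) in (0, 1)."""
    z = sorted(z)
    n = len(z)
    d_plus = max(i / n - zi for i, zi in enumerate(z, start=1))
    d_minus = max(zi - (i - 1) / n for i, zi in enumerate(z, start=1))
    w2 = sum((zi - (2 * i - 1) / (2 * n)) ** 2 for i, zi in enumerate(z, start=1)) + 1 / (12 * n)
    a2 = -n - sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
                  for i in range(1, n + 1)) / n
    return {"D+": d_plus, "D-": d_minus, "D": max(d_plus, d_minus),
            "V": d_plus + d_minus, "W2": w2, "A2": a2}

def modified_case0(stats, n):
    """Modifications (6.7) and (6.8); A^2 is used unmodified (valid for n >= 5)."""
    root = math.sqrt(n)
    return {"D_mod": stats["D"] * (root + 0.12 + 0.11 / root),
            "W2_mod": (stats["W2"] - 0.4 / n + 0.6 / n ** 2) * (1 + 1 / n),
            "A2": stats["A2"]}

def rejections(modified, level=0.05):
    """True where the modified statistic exceeds the Table 6.1 point at the given level."""
    return {name: value > CRITICAL[name][level] for name, value in modified.items()}

n = 100
z = [(i - 0.5) / n for i in range(1, n + 1)]      # idealised, perfectly uniform PIT values
st = edf_statistics(z)
print(st, rejections(modified_case0(st, n)))
```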

6.3

The Case of Large Sample Spaces

The largest problem with hypothesis tests is that, for a very large number of observations, they tend to reject the null hypothesis [12, 25, 48]. As noted in the previous section, the modified supremum statistics grow as √n. Unless $D_n$ is very small, $H_0$ will be rejected even for fairly small n. There is no equally clear connection for the quadrature modifiers, though the same effect can be intuited from Figure 6.1. A possible explanation for the rejection of the null hypothesis for a large number of observations is that it is impossible to create a perfect model. Each additional observation then adds an error term to the integrated statistic, and even if each error term is very small, the total error increases with n. However, Beran has shown that the rejection of the null hypothesis may also be caused by LRD in the data [12]. The effect of LRD is especially pronounced in the case of the simple hypothesis.
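The √n growth can be made concrete. The following sketch takes a model whose discrepancy $D_n$ stays fixed, for example through a small but systematic misfit. This is an idealisation, since $D_n$ itself fluctuates in practice. The sketch finds the smallest n at which Eq. (6.7) exceeds the 0.05 point of Table 6.1.

```python
import math

def d_mod(d_n, n):
    """Eq. (6.7): Case 0 modification of the KS statistic D_n for sample size n."""
    root = math.sqrt(n)
    return d_n * (root + 0.12 + 0.11 / root)

def smallest_rejecting_n(d_n, critical=1.358):
    """Smallest n at which a fixed discrepancy d_n is rejected (1.358 = 0.05 point, Table 6.1)."""
    n = 1
    while d_mod(d_n, n) <= critical:
        n += 1
    return n

for d in (0.05, 0.02, 0.01):
    print(f"D_n = {d}: rejected at the 0.05 level from n = {smallest_rejecting_n(d)}")
```

Even a discrepancy of only 0.01 is rejected once n reaches a little over 18,000 observations.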

6.4

Relative and Absolute Fitness

The test statistics discussed above may not work as expected for large sample spaces, but they remain useful as goodness-of-fit measures. For instance, assume that the hypothesised distributions F̂1 and F̂2 yield A² values of 23.2 and 43.5, respectively. In this case, choosing F̂1 as the more representative model would be warranted. This type of relative measure can be useful when comparing a large number of hypothesised distributions. However, different statistics emphasise different parts of the distribution, so it is beneficial to compare using several statistics.

If a maximum value for the test statistic exists, it is possible to give an absolute measure of fitness. For instance, the largest possible value of Dn is 1, so an error percentage for Dn can be defined as D% = 100Dn. Absolute fitness measures have a more intuitive appeal than statistical measures. However, a maximum error value is not always easy to determine. For example, a maximum error for the AD statistic

$$A^2 = n\int_{-\infty}^{\infty} \frac{\left[F_n(x) - F(x)\right]^2}{F(x)\left[1 - F(x)\right]}\,dF(x) \tag{6.9}$$

is substantially more difficult to calculate. Results on evaluating the distribution of the AD statistic exist, which may provide some assistance [50]. This is, however, beyond the scope of this thesis.
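The sketch below illustrates the relative and absolute measures. It computes A² with the standard computational formula, A² = −n − (1/n) Σ (2i − 1)[ln Û(i) + ln(1 − Û(n+1−i))], and D% = 100Dn for two candidate models. The data and both candidates (exponential distributions with rates 1.0 and 1.3) are hypothetical. They were chosen only to make the comparison visible and do not reproduce the A² values quoted above.

```python
import math

def pit(sample, cdf):
    """Ordered PIT values U_(1) <= ... <= U_(n) under the hypothesised CDF."""
    return [cdf(x) for x in sorted(sample)]

def anderson_darling(u):
    """A^2 = -n - (1/n) * sum_i (2i - 1) [ln U_(i) + ln(1 - U_(n+1-i))]."""
    n = len(u)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

def d_percent(u):
    """Absolute KS error D% = 100 * D_n; D_n can never exceed 1."""
    n = len(u)
    d_n = max(max(i / n - ui, ui - (i - 1) / n) for i, ui in enumerate(u, start=1))
    return 100.0 * d_n

n = 1000
data = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]  # noise-free Exp(1) sample

for name, rate in (("F1 = Exp(1.0)", 1.0), ("F2 = Exp(1.3)", 1.3)):  # hypothetical candidates
    u = pit(data, lambda x, r=rate: 1.0 - math.exp(-r * x))
    print(name, "A2 =", round(anderson_darling(u), 3), "D% =", round(d_percent(u), 2))
```

A² only ranks the candidates relative to each other. D% can be read on its own, since it is scaled by the known maximum of Dn.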

6.5 Summary

A number of fitness assessment methods have been presented, including the χ² and EDF hypothesis tests. Furthermore, the problems with large sample spaces have been highlighted: classical hypothesis tests tend to reject the null hypothesis for large sample spaces. Additionally, the benefits of using an absolute fitness measure have been put forth.


Chapter 7

Modelling Methodology

I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.
– Rudyard Kipling

Previous chapters have discussed several methods for model selection, parameter estimation and fitness assessment. This chapter presents the general methodology used in obtaining the models.

7.1 Distribution Selection

For each parameter to be modelled, distribution selection is an exploratory process of inspecting EPDF and CCDF plots. For inter-arrival and inter-departure times, the EPDF is primarily used, while for message rates the CCDF is preferred. The CCDF, particularly on log-log axes, is better suited for detecting potential heavy-tailed behaviour. This is important to detect for rates, sizes and durations, as underestimating the tails of these distributions may have an adverse effect on the network.
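In a log-log CCDF plot, a heavy (power-law) tail appears as a straight line whose slope is minus the tail index, whereas an exponential tail bends downwards. The sketch below fits least-squares slopes to the log-log empirical CCDF over a chosen range of CCDF values. It uses noise-free Pareto (xm = 1, α = 1.5) and Exp(1) samples, both chosen purely for illustration.

```python
import math

def empirical_ccdf(sample):
    """(x_(i), P[X > x_(i)]) pairs; the last point (CCDF = 0) is dropped for log scales."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs, start=1) if i < n]

def loglog_slope(points, lo=0.0, hi=1.0):
    """Least-squares slope of log10 CCDF versus log10 x for points with lo < CCDF <= hi."""
    sel = [(math.log10(x), math.log10(c)) for x, c in points if lo < c <= hi]
    mx = sum(a for a, _ in sel) / len(sel)
    my = sum(b for _, b in sel) / len(sel)
    num = sum((a - mx) * (b - my) for a, b in sel)
    den = sum((a - mx) ** 2 for a, _ in sel)
    return num / den

n, alpha = 10_000, 1.5
probs = [(i - 0.5) / n for i in range(1, n + 1)]
pareto = [(1.0 - p) ** (-1.0 / alpha) for p in probs]  # exact Pareto quantiles
expo = [-math.log(1.0 - p) for p in probs]              # exact Exp(1) quantiles

print("Pareto, whole range:", round(loglog_slope(empirical_ccdf(pareto)), 3))
print("Exp, body (CCDF > 0.5):", round(loglog_slope(empirical_ccdf(expo), lo=0.5), 3))
print("Exp, tail (CCDF <= 0.1):", round(loglog_slope(empirical_ccdf(expo), hi=0.1), 3))
```

The Pareto slope stays close to −α over the whole range. The exponential slope is shallow in the body and much steeper in the tail, which is the curvature visible in a CCDF plot.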

7.2 Parameter Estimation

MLE is used to obtain parameter estimates for the candidate distributions selected for modelling. Given the large number of observations in the measurements, the estimates are assumed to be accurate enough for the associated distribution to be considered fully specified. This holds provided that the confidence intervals for the estimated parameters are within acceptable bounds. For single and mixture distributions, parameter estimation is straightforward, and estimates are obtained from the complete data set. For the censored mixture model, the cutoff points are found using successive right censoring, as employed in [42], together with an error percentage assessment described in the following section. The censored models are, however, deprecated in favour of more tractable mixture models, as censored models are less convenient to use in a simulation environment. Once ML estimates are available, the error percentage presented in the following section is further used to optimise the parameters. This has been found to often give better results than accepting the ML estimates as they are.

7.3 Fitness Assessment

Three tools are used to determine whether a distribution is representative of the observed data: visual procedures, formal hypothesis tests (to a certain extent) and an error percentage assessment. For a more quantitative assessment of the estimated distributions, a method similar to the EDF test is employed, which is less affected by an increasing number of observations.

The method is based on the EDF test for a fully specified distribution, as described in [25]:

1. Obtain the order statistics X1 < X2 < · · · < Xn from the measured data.
2. Transform the original data with the PIT, using the selected distribution and the estimated parameters Θ̂. If X1, …, Xn are IID samples from some distribution F, then Ûi = F(Xi; Θ̂), i = 1, 2, …, n, are IID uniform on [0, 1].
3. Obtain the error percentage using the expression

$$E_{\%} = \frac{100}{n E_{\max}} \sum_{i=1}^{n} \left|U_i - \hat{U}_i\right| \tag{7.1}$$

where, with U(x) = x denoting the U[0, 1] distribution function, Emax is defined as

$$E_{\max} = \int_0^1 \sup\{U(x), 1 - U(x)\}\,dx = \int_0^{1/2} \left(1 - U(x)\right) dx + \int_{1/2}^{1} U(x)\,dx = \frac{3}{4} \tag{7.2}$$

In plain terms, Emax is the maximum discrepancy from a true U[0, 1] distribution that may occur.
4. Accept or discard the estimated distribution as "good enough" according to some predefined criteria.

For the purposes of this thesis, E% ≈ 5 is chosen as the upper limit for not discarding the estimated distribution. Note that this is not a statistical significance level, but rather an acceptable margin of error. Fuzzy classification or rough set theory could also be used to quantify the goodness-of-fit more formally. Here, the informal degrees of fitness quality presented in Table 7.1 are used. More formally defined measures, e.g., proper membership functions, are the subject of future research.
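The four steps can be sketched as follows for a fully specified model. The text does not spell out how Ui is chosen, so the sketch assumes Ui = i/n, the EDF level at the i-th order statistic. The data (noise-free Exp(1) quantiles) and the candidate rates are illustrative only. The 5 % cut-off is the criterion stated above.

```python
import math

E_MAX = 0.75  # Eq. (7.2)

def error_percentage(sample, cdf):
    """E% of Eq. (7.1) for a fully specified model with CDF `cdf`.

    Assumes U_i = i/n, the EDF level at the i-th order statistic.
    """
    xs = sorted(sample)                                  # step 1: order statistics
    n = len(xs)
    u_hat = [cdf(x) for x in xs]                         # step 2: PIT with the fitted model
    total = sum(abs(i / n - uh) for i, uh in enumerate(u_hat, start=1))
    return 100.0 * total / (n * E_MAX)                   # step 3: Eq. (7.1)

def e_max_numeric(steps=100_000):
    """Midpoint-rule check of Eq. (7.2) with U(x) = x."""
    h = 1.0 / steps
    return h * sum(max((j + 0.5) * h, 1.0 - (j + 0.5) * h) for j in range(steps))

n = 1000
data = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]  # noise-free Exp(1) sample
for rate in (1.0, 1.05, 2.0):                                     # hypothetical fitted rates
    e = error_percentage(data, lambda x, r=rate: 1.0 - math.exp(-r * x))
    print(rate, round(e, 2), "keep" if e <= 5.0 else "discard")    # step 4: E% <= 5
```

Unlike Dn in the large-sample experiment, E% is an average over all observations. Its value therefore does not grow with n for a fixed model error.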


Table 7.1: Fitness quality boundaries

| E% ≈ | Degree |
|---|---|
| 0 | excellent |
| 1 | very good |
| 2 | good |
| 3 | fair |
| 4 | poor |

7.3.1 Notes on the Error Percentage

The error percentage presented above suffers from the same problem as the CVM and KS statistics, i.e., it weights all errors equally. To address this, a weighting similar to the AD weight function may be used. Using the error percentage to optimise parameter estimates for heavy-tailed distributions therefore requires two changes: a weighting function for the term |Ui − Ûi| and an adaptation of Emax. A weight function Ri = (1 + Ui)^k gives increasing weight to the upper tail as k increases. A general modification of Emax for any strictly increasing weight function R(x) is

$$\begin{aligned} E_{\max}(k) &= \int_0^{R_x} \left(R_{\max} - R(x)\right) dx + \int_{R_x}^{1} R(x)\,dx \\ &= R_x R_{\max} - \left[\tilde{R}(x)\right]_0^{R_x} + \left[\tilde{R}(x)\right]_{R_x}^{1} \\ &= R_x R_{\max} - \tilde{R}(R_x) + \tilde{R}(0) + \tilde{R}(1) - \tilde{R}(R_x) \\ &= R_x R_{\max} - 2\tilde{R}(R_x) + \tilde{R}(0) + \tilde{R}(1) \end{aligned} \tag{7.3}$$

Here R̃ is an antiderivative of R, and Rx is the point at which R(x) = Rmax − R(x). For Ri = (1 + Ui)^k,

$$\begin{aligned} R(x) &= (1 + x)^k, \qquad R_{\max} = R(1) = 2^k, \\ \tilde{R}(x) &= \int R(x)\,dx = \frac{(1 + x)^{k+1}}{k + 1}, \\ R^{-1}(x) &= \sqrt[k]{x} - 1, \\ R_x &= R^{-1}\!\left(\tfrac{1}{2}R_{\max}\right) = \sqrt[k]{2^{k-1}} - 1 = 2^{\frac{k-1}{k}} - 1. \end{aligned} \tag{7.4}$$

A suitable value for k depends on the shape of the distribution, and some experimentation is needed for each fitting problem. Adapting the error percentage to other weights, such as those similar to the AD weight function, may be investigated as part of future work.
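The sketch below is a numerical cross-check of Eqs. (7.3) and (7.4). It evaluates the closed form of Emax(k) for the weight R(x) = (1 + x)^k. It then compares the result with a direct midpoint-rule integration of sup{R(x), Rmax − R(x)} over [0, 1], which is the integrand implied by splitting the integral at Rx. For k = 2 the closed form reduces to 8√2/3 − 1 ≈ 2.771.

```python
import math

def e_max_closed_form(k):
    """Emax(k) from Eqs. (7.3)-(7.4) for the weight R(x) = (1 + x)**k, with k > 0."""
    r_max = 2.0 ** k                                # R(1)
    r_x = 2.0 ** ((k - 1.0) / k) - 1.0              # R^{-1}(R_max / 2)

    def r_tilde(x):                                 # antiderivative of R
        return (1.0 + x) ** (k + 1) / (k + 1)

    return r_x * r_max - 2.0 * r_tilde(r_x) + r_tilde(0.0) + r_tilde(1.0)

def e_max_numeric(k, steps=100_000):
    """Midpoint-rule integral of sup{R(x), R_max - R(x)} over [0, 1]."""
    r_max = 2.0 ** k
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        r = (1.0 + (j + 0.5) * h) ** k
        total += max(r, r_max - r)
    return total * h

for k in (1, 2, 4):
    print(k, round(e_max_closed_form(k), 6), round(e_max_numeric(k), 6))
```

For k = 1, Rx = 0 and the integral reduces to ∫(1 + x)dx = 1.5. Larger k produce a rapidly growing Emax(k), reflecting the extra weight on the upper tail.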

7.4 Summary

This chapter has presented the modelling methodology used in the thesis. Distribution selection, parameter estimation and fitness assessment have been discussed. Furthermore, a dedicated fitness assessment method has been presented, and a parametrisable weighting function for increased tail accuracy has been suggested.


Chapter 8

BitTorrent Measurements

I like to think that the moon is there even if I am not looking at it.
– Albert Einstein

The measurements reported in this chapter were performed by having instances of the BitTorrent client software join several distribution swarms. An instrumented version of the reference BitTorrent client was used to avoid potentially injecting non-standard protocol messages into the swarm. The client was instrumented to log all incoming and outgoing protocol messages together with a UNIX timestamp. The BitTorrent client is implemented in Python, an interpreted programming language. The drawback of this is that the timestamps are less accurate than the actual arrival times of the carrying IP datagrams. The accuracy is estimated at approximately 10 ms, based on comparing the application-level timestamps of back-to-back messages with those of the corresponding TCP segments. Most of the traffic reported here was collected over a three-week period at two measurement points in Blekinge, Sweden. The first measurement point was the networking lab at BIT, Karlskrona, which is connected to the Internet through a 100 Mbps Ethernet network. The second measurement point was placed at a local ISP with a 5 Mbps link. Both measurement points ran the Gentoo Linux operating system on standard PC hardware. In the initial set, twelve measurements were performed, each lasting two to seven days (Table 8.1). This first set of measurements was performed with the instrumented client only. An additional measurement, with application logging and packet capturing running simultaneously, was also performed at BIT, for a total of thirteen measurements. At the first measurement point, no significant amount of other software was running simultaneously with the BitTorrent client. At the second measurement point, the BitTorrent client was running as a normal application, together with other software such as Web browsers and mail software. The first measurement point can thus be viewed as a dedicated BitTorrent client, while the second corresponds to normal desktop PC usage patterns.

8.1 Traffic Metrics

The BitTorrent client application logs are essentially timestamped protocol events. This means that metrics like inter-arrival and inter-departure times are readily available by simple calculations. Detailed statistics can also be computed on several levels of aggregation. Most notably, this makes it possible to look into potential burstiness on timescales down to the timestamp accuracy. Dedicated software has been written to extract several important statistics and metrics [30]. These characterise the behaviour of the measuring peer only, not that of the entire swarm. Measuring the true size of the swarm requires active probing of the tracker, which is, however, a subject of future work. The goal is to use accurate characterisation and modelling of the behaviour of a single peer in modelling entire swarms. A number of metrics have been used for the characterisation of the BitTorrent signalling traffic [30]. The most important ones are as follows:

• Download time: the time it takes for the modified client to complete a download. This metric also shows when the peer changes from being both a downloading and uploading peer to being a seed, which makes it possible to collect statistics about the seed and leecher states.
• Session duration and size: a BitTorrent session is equivalent to a TCP session, given that the BitTorrent handshake is completed. The length of each BitTorrent protocol message is determined by its type and contents. The session size therefore follows from the messages sent and received during the session. A BitTorrent session duration is the same as the TCP session duration, whereas the session size is the amount of data transmitted during the TCP session.
• Number and type of messages: the number of messages of each type is counted in both the upstream and downstream directions. Together with the session duration and size, this gives valuable insight into the behaviour of a peer.
• Host persistence: the numbers of unique host IP addresses and peer client IDs are also counted. A peer is considered persistent if its host IP address has a one-to-one mapping to a peer ID and a long session time. Persistent peers indicate a healthy swarm, in the sense that new peers are more likely to find a larger number of seeds in a swarm with many persistent peers than in one with fewer persistent peers.
• Peer swarm size: the number of peers observed by the measuring client at any given time. This is not the size of the entire swarm, i.e., the total number of collaborating peers, but the number of peers to which the measuring peer is connected. Information about the total swarm size is only available at the tracker, and it is therefore not considered in the reported measurements.
• Piece response times: the time elapsed from the initial request for any subpiece belonging to a given piece to the transmission of the associated have message. This parameter makes it possible to estimate the downstream bandwidth usage.
• Piece popularity: the number of requests for any subpiece of a given piece. This gives an indication of the effectiveness of the piece selection algorithms of the requesting peers.
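The piece-level metrics can be derived from the timestamped message log. The sketch below assumes a hypothetical, already parsed log format of (timestamp, direction, message type, piece index) tuples, where "out" marks messages sent by the measuring peer. The actual log format of the instrumented client is not described here.

```python
from collections import Counter

def piece_response_times(events):
    """Seconds from the first outgoing request for any subpiece of a piece
    to the outgoing have message for that piece."""
    first_request, response = {}, {}
    for ts, direction, msg, piece in sorted(events):
        if direction != "out":
            continue
        if msg == "request":
            first_request.setdefault(piece, ts)
        elif msg == "have" and piece in first_request and piece not in response:
            response[piece] = ts - first_request[piece]
    return response

def piece_popularity(events):
    """Incoming requests for any subpiece of each piece."""
    return Counter(piece for _, direction, msg, piece in events
                   if direction == "in" and msg == "request")

# Hypothetical parsed log: (timestamp_s, direction, message, piece_index).
log = [
    (10.00, "out", "request", 7), (10.02, "out", "request", 7),
    (10.05, "out", "request", 3), (11.50, "out", "have", 7),
    (12.00, "in", "request", 7), (12.10, "in", "request", 7),
    (12.20, "in", "request", 3), (13.25, "out", "have", 3),
]
print(piece_response_times(log))
print(piece_popularity(log))
```

Only the first request per piece starts the clock, so repeated requests for other subpieces of the same piece do not shorten the measured response time.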

8.2 Traffic Measurements

Measurements 1 through 3 (Table 8.1) were performed with a single instance of the instrumented BitTorrent client running. TCP is known to be very aggressive in using the network. Running a single client therefore minimised the effects of several clients competing for the available bandwidth, and established a point of reference for the remaining measurements. Measurements 4 through 8 were started simultaneously, as were measurements 11 and 12. The remaining measurements were performed with some temporal overlap, as shown in Figure 8.1. An important issue regarding traffic measurements in P2P networks is copyright, as the most popular content in these networks is usually copyrighted material. To circumvent this problem, BitTorrent swarms distributing several popular Linux operating system distributions were joined. Notably, both the RedHat Fedora Core 2 (FC2) test and release version swarms were joined. The FC2 'Tettnang' version was released on May 18, 2004, while the rest of the content was available at the start of the measurements. This provided a unique opportunity to study the dynamic nature of the FC2 swarms. The contents of the measured swarms are reported in Table 8.2. Two of the swarms have been measured from both measurement points to allow for comparisons, one with temporal overlap and one without.


Table 8.1: Measurement summary

| Number | Records | Start | Duration | Location |
|---|---|---|---|---|
| 1 | 10770695 | 2004-05-03 | 2 days, 20 hours | BIT |
| 2 | 10653466 | 2004-05-06 | 3 days, 19 hours | BIT |
| 3 | 10990569 | 2004-05-12 | 4 days, 4 hours | BIT |
| 4 | 12567283 | 2004-05-17 | 7 days | BIT |
| 5 | 13691459 | 2004-05-17 | 7 days | BIT |
| 6 | 11754838 | 2004-05-17 | 7 days | BIT |
| 7 | 1943636 | 2004-05-17 | 7 days | BIT |
| 8 | 7321166 | 2004-05-17 | 7 days | BIT |
| 9ᵃ | 687046 | 2004-05-13 | 3 days, 7 hours | ISP |
| 10 | 2881803 | 2004-05-18 | 5 days, 23 hours | ISP |
| 11 | 9252170 | 2004-05-22 | 7 days | ISP |
| 12 | 5599997 | 2004-05-22 | 7 days | ISP |
| 13 | 14803678 | 2004-06-26 | 7 days | BIT |

ᵃ Unfortunately, the original data for this measurement was lost due to hardware failure. Thus, most analysis is not performed on this data, and only summary statistics are provided.

Figure 8.1: Temporal structure of measurements 1–12 (timeline from May 1 to May 29, 2004)


Table 8.2: Content summary

| Content | Pieces | Size | Measurement |
|---|---|---|---|
| RedHat FC 2 test3 CD Images | 8465 | 2.2 GB | 1–3 |
| RedHat FC 2 test3 DVD Image | 16708 | 4.3 GB | 6, 10 |
| Slackware Linux Install Disk 1 | 2501 | 650 MB | 4 |
| Slackware Linux Install Disk 2 | 2627 | 670 MB | 5 |
| Dynebolic Linux 1.3 | 2522 | 650 MB | 7, 9 |
| Knoppix Linux 3.4 | 2753 | 700 MB | 8 |
| RedHat FC 2 'Tettnang' CD Images | 8719 | 2.2 GB | 12, 13 |
| RedHat FC 2 'Tettnang' DVD Image | 16673 | 4.3 GB | 11 |

8.3 Summary Statistics

In this section, some of the more salient results obtained from the measurements are reported. Download times and rates are summarised in Table 8.3. The time before the measurement peer enters seeding mode varies from roughly 20 minutes up to 6.5 hours. As the content sizes vary between measurements, the average download rates for the entire content, i.e., the size/time ratio, are also provided. The download rates also show a large disparity, ranging from just over 120 kB/s to over 1.3 MB/s, with the first three measurements clearly being the most demanding in terms of bandwidth. A summary of session sizes and durations is reported in Table 8.4, which also includes the number of sessions and the numbers of unique peer IPs and peer client IDs. Measurement 6 clearly stands out here, with regard to both mean session size and mean session length. Apart from measurement 13, the maximum session size for this measurement is also more than twice that of any other measurement. The mean session size is about twice that of the other measurement of the same content (measurement 10). Among measurements 1–12, measurements 6 and 10 have the two largest maximum session sizes. It is therefore probable that the session size is related to the total content size (4.3 GB).


Table 8.3: Download time and average download rate summary

| Measurement | Time (s) | Rate (bytes/s) |
|---|---|---|
| 1 | 1930 | 1149520 |
| 2 | 1932 | 1147908 |
| 3 | 1681 | 1319445 |
| 4 | 2607 | 251424 |
| 5 | 3397 | 202644 |
| 6 | 23000 | 190416 |
| 7 | 1237 | 534282 |
| 8 | 6005 | 120153 |
| 9 | 2723 | 242776 |
| 10 | 23475 | 186570 |
| 11 | 19431 | 224927 |
| 12 | 9106 | 250989 |
| 13 | 2951 | 774420 |

The minimum session lengths are all 0, indicating that these sessions are shorter than the timestamp accuracy of the application logs. These very short sessions are also reflected in the minimum session sizes, and correspond to sessions containing only a handshake or an interrupted handshake. Another pertinent feature is the ratio of the number of unique IPs to the number of unique peer IDs for measurement 8. The IP-to-ID ratio for this measurement is approximately 0.25, while none of the other measurements are below 0.5. This might indicate either users stopping and restarting their clients several times, or users sharing IPs, such as peers behind NAT. Table 8.5 summarises the number of messages received on a per-message basis. In addition, column 5 (new conn.) shows the number of incoming connection requests collected.


Table 8.4: Session and peer summary

| # | Sessions | Length mean (s) | Length max (s) | Length min (s) | Length std (s) | Size mean (MB) | Size max (MB) | Size minᵇ | Size std (MB) | Peers IPᵃ | Peers IDᵃ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 29712 | 343 | 98991 | 0 | 2741 | 27.49 | 647.26 | 73 | 70.65 | 1314 | 2024 |
| 2 | 46022 | 233 | 117605 | 0 | 2316 | 27.15 | 646.03 | 73 | 64.05 | 1394 | 1876 |
| 3 | 28687 | 465 | 171074 | 0 | 3614 | 28.54 | 539.20 | 73 | 61.70 | 1319 | 1913 |
| 4 | 13493 | 750 | 143707 | 0 | 3942 | 49.88 | 671.99 | 73 | 100.65 | 1143 | 1813 |
| 5 | 12354 | 910 | 180298 | 0 | 4504 | 57.08 | 668.53 | 73 | 116.10 | 962 | 1747 |
| 6 | 10685 | 1207 | 223235 | 0 | 7016 | 74.25 | 3117.79 | 73 | 247.74 | 619 | 1033 |
| 7 | 4444 | 218 | 46478 | 0 | 1642 | 49.96 | 431.13 | 78 | 76.48 | 184 | 279 |
| 8 | 17287 | 231 | 87026 | 0 | 1972 | 33.11 | 695.94 | 73 | 109.31 | 406 | 1656 |
| 9 | 3043 | 294 | 29163 | 0 | 1719 | 21.62 | 408.05 | 78 | 42.27 | 166 | 193 |
| 10 | 9701 | 652 | 267497 | 0 | 5907 | 37.78 | 1499.85 | 73 | 109.08 | 305 | 444 |
| 11 | 43939 | 448 | 141509 | 0 | 3791 | 17.22 | 475.86 | 73 | 52.73 | 1067 | 1841 |
| 12 | 68288 | 197 | 292241 | 0 | 2580 | 8.31 | 987.89 | 73 | 30.63 | 1152 | 2177 |
| 13 | 52833 | 465 | 483996 | 0 | 4036 | 32.2 | 1652.83 | 73 | 99.4 | 2440 | 3930 |

ᵃ Unique peer client IDs and IP addresses.
ᵇ This column is measured in bytes.

The request and have messages clearly dominate in terms of the number of messages, while the interested and not interested messages are the least common. This holds for all measurements except measurement 2. Among measurements 1–12, measurement 2 has almost 5 times more incoming interested messages than the measurement with the second highest number. The high number of request and have messages is expected, as the peer acts as a seed for most of the time spent in the swarm. A seeding peer never receives piece messages. Instead, the downloading peers must request data with request messages, which explains their high number. Have messages are numerous because every completed piece download results in such a message being transmitted. The summary of the outgoing messages in Table 8.6 again shows the very low number of interested and not interested messages. The bulk of the outgoing messages is, however, accounted for by the piece messages. This is again an expected result, as each request message generates a piece message in response. The absence of transmitted choke messages for measurement 7 indicates a continuous exchange of data between peers. The numbers of outgoing request and have messages are tightly coupled to the number of pieces in the content. Request messages are more numerous because each one covers only a single subpiece.
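A little arithmetic makes the coupling between pieces, subpieces and message counts concrete. The sketch assumes 256 KiB pieces split into 16 KiB subpieces. Neither size is stated in the text, but 8465 pieces of 256 KiB amount to about 2.2 GB, which agrees with Table 8.2. Under these assumptions, fetching every subpiece of a piece once takes 16 request messages but produces only one have message.

```python
PIECE_SIZE = 256 * 1024  # assumed piece length in bytes
BLOCK_SIZE = 16 * 1024   # assumed subpiece (block) length in bytes

def blocks_per_piece(piece_size=PIECE_SIZE, block_size=BLOCK_SIZE):
    """Subpieces per piece (ceiling division)."""
    return -(-piece_size // block_size)

def minimum_requests(pieces):
    """Request messages needed to fetch every subpiece exactly once
    (treats the last piece as full-sized)."""
    return pieces * blocks_per_piece()

def have_messages(pieces):
    """One have message is sent for each completed piece."""
    return pieces

for name, pieces in [("FC2 test3 CD images", 8465), ("FC2 test3 DVD image", 16708)]:
    print(f"{name}: {have_messages(pieces)} have, {minimum_requests(pieces)} request, "
          f"content ~ {pieces * PIECE_SIZE / 1e9:.2f} GB")
```

Re-requests, for example after a choke, would push the request count above this minimum. The have count stays equal to the number of pieces.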


Table 8.5: Downstream protocol message summary

| # | request | not int. | piece | new conn. | bitfield | unchoke | have | int. | choke | cancel |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3316470 | 504 | 135615 | 29746 | 28024 | 27120 | 3651835 | 2905 | 26314 | 6500 |
| 2 | 3044768 | 489 | 135797 | 46047 | 45054 | 19117 | 3984881 | 14602 | 18061 | 9059 |
| 3 | 3276644 | 493 | 135682 | 28714 | 27092 | 40705 | 3941658 | 2430 | 39955 | 7628 |
| 4 | 5596270 | 406 | 40167 | 13502 | 12935 | 29628 | 1206000 | 2041 | 28640 | 14643 |
| 5 | 6163605 | 401 | 42176 | 12364 | 11827 | 32325 | 1197813 | 2059 | 31452 | 11508 |
| 6 | 4501907 | 191 | 277261 | 10688 | 9659 | 24239 | 2090892 | 2147 | 23639 | 6244 |
| 7 | 810019 | 52 | 40371 | 4445 | 4370 | 290 | 198885 | 230 | 122 | 1255 |
| 8 | 3347256 | 766 | 44328 | 17292 | 16623 | 9270 | 404038 | 2012 | 8579 | 18999 |
| 9 | 217336 | 37 | 40426 | 3045 | 2996 | 1114 | 139472 | 259 | 956 | 3061 |
| 10 | 838379 | 79 | 268429 | 9703 | 9181 | 13015 | 570367 | 692 | 11936 | 9085 |
| 11 | 1835910 | 470 | 268575 | 43957 | 42848 | 54090 | 4713440 | 2573 | 52458 | 17313 |
| 12 | 1118110 | 348 | 139943 | 68297 | 67373 | 37925 | 2619333 | 3242 | 36872 | 25047 |
| 13 | 8113100 | 711 | 139702 | 52865 | 50304 | 60524 | 6293438 | 9477 | 58925 | 24632 |

Table 8.6: Upstream protocol message summary

| # | request | not int. | piece | bitfield | unchoke | have | int. | choke | cancel |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 137007 | 63 | 3251948 | 29714 | 11792 | 8465 | 68 | 9553 | 970 |
| 2 | 137271 | 63 | 2964836 | 46020 | 17471 | 8465 | 70 | 13301 | 894 |
| 3 | 136738 | 62 | 3189175 | 28682 | 16545 | 8465 | 64 | 14085 | 1011 |
| 4 | 42709 | 76 | 5468908 | 13489 | 25476 | 2501 | 86 | 22740 | 855 |
| 5 | 44862 | 146 | 6032599 | 12353 | 25759 | 2627 | 157 | 23749 | 725 |
| 6 | 291200 | 91 | 4394389 | 10661 | 23166 | 16708 | 197 | 18943 | 555 |
| 7 | 40497 | 18 | 808844 | 4444 | 4445 | 2522 | 18 | 0 | 140 |
| 8 | 47413 | 100 | 3296616 | 17281 | 19380 | 2753 | 136 | 8672 | 423 |
| 9 | 40906 | 16 | 213693 | 3042 | 3192 | 2522 | 19 | 193 | 220 |
| 10 | 285650 | 71 | 753074 | 9673 | 21304 | 16708 | 214 | 15222 | 611 |
| 11 | 281921 | 67 | 1660868 | 43927 | 35698 | 16673 | 157 | 31279 | 812 |
| 12 | 145517 | 76 | 960802 | 68271 | 49093 | 8719 | 125 | 34570 | 701 |
| 13 | 141316 | 80 | 7940342 | 52830 | 27332 | 8719 | 97 | 23527 | 807 |


8.4 Swarm Size

The number of locally connected peers at any given time is an indicator of the popularity of the content of the swarm. The solid line in Figure 8.2 (measurement 6) shows the typical evolution of the number of connected peers in a popular swarm. The measurement peer rapidly connects to, and stays connected to, the preconfigured maximum of 55 peers, which is the default. Measurements 1–3, 6 and 11–13 show similar behaviour. The dashed line (measurement 4) shows a swarm that is less popular, at least in the sense that it takes longer to find enough peers to download from. The maximum number of peers is not reached until the end of the leech phase. The amount of data is substantially smaller and the average download rate higher (Tables 8.2 and 8.3), which means that the leech phase ends fairly quickly.
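The number of connected peers over time, as plotted in the swarm size figures, can be reconstructed from session start and end times. The sketch below is a simple sweep over open and close events. The (start, end) input format and the convention that a session is open on [start, end) are assumptions made for illustration.

```python
def connected_peers(sessions):
    """Number of concurrently open sessions after each change.

    sessions: iterable of (start, end) times; a session counts as open on [start, end).
    Returns a list of (time, count) pairs, one per distinct event time.
    """
    events = []
    for start, end in sessions:
        events.append((start, +1))
        events.append((end, -1))
    events.sort()                   # at equal times, -1 (close) sorts before +1 (open)
    steps, count = [], 0
    for t, delta in events:
        count += delta
        if steps and steps[-1][0] == t:
            steps[-1] = (t, count)  # collapse simultaneous events into one point
        else:
            steps.append((t, count))
    return steps

# Hypothetical session log (seconds since the start of the measurement).
print(connected_peers([(0, 10), (5, 15), (12, 20), (15, 18)]))
```

Processing closes before opens at the same instant avoids a spurious spike when one peer disconnects exactly as another connects.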

Figure 8.2: Connected peers during seed phase for measurements 4 and 6

This is further reinforced by the accompanying seed phase graphs. During the seed phase, none of the measurement peers have the maximum number of peers continuously connected, though the number of connected peers is close to the maximum. Measurements 4, 5, 7 and 9 show much less activity during the leech and seed phases. This can be explained by the content of these swarms (the Slackware and Dynebolic Linux distributions) being less well known or less widely used than the RedHat distribution. In [39], the authors show the influx of new users into a BitTorrent swarm when popular new content arrives. They denote this sudden increase in swarm size as the flash-crowd effect.

Figure 8.3: Connected peers during leech phase for measurements 4 and 6 (number of connections versus time of day, 09:00–16:00)

With this in mind, it is interesting to compare the seed phases of measurements 6, 10 and 11. The data in the first two measurements was the test release of the RedHat Fedora Core 2 Linux distribution, while the data in the last swarm was the final release of the same version. The final version was released on May 18, 2004. This event can be clearly observed at around 12:00 in Figure 8.4, when peers start disconnecting. This is likely due to the release of the new version of the distribution, indicating that BitTorrent users react quickly to the availability of new content.

Figure 8.4: Swarm reaction to new content

It is also interesting to note the similarity of measurements 6 and 10 between May 19 and 24, as shown in Figure 8.4. The numbers of connected peers are quite similar, except for a slight shift in time. Since the measurements were made at different locations, this shift may be explained by a slight difference in system clocks.


8.5 Session Sizes

One of the innovations of the BitTorrent protocol is the tit-for-tat notion of enforced reciprocation. One way to assess how well this is enforced in practice is to observe the amounts of data sent and received in each session. Of particular interest are the session sizes during the leech phase. The average share ratios for the measurement peer during the leech phase are presented in Table 8.7. The share ratio is the upstream session size divided by the downstream session size. Preferably, a peer should have a share ratio of at least 1, i.e., it should upload at least as much data as it downloads.

Table 8.7: Share ratio during leech phase

| Measurement | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Share ratio | 0.01 | 0.19 | 0.08 | 3.45 | 3.63 | 0.53 | 0.60 | 5.55 | 0.73 | 0.05 | 0.09 | 0.07 |

The most apparent result is the very low share ratio for measurements 1–3 and 11–13. This is more likely due to the peer connecting to a high number of seeds than to it acting unfairly towards leech peers. If the latter were the case, the download times would likely be substantially longer than in the other measurements, since the peer would be punished for not reciprocating.

Table 8.8: Correlation coefficients for session sizes

| Measurement | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Leech phase | −0.10 | −0.06 | −0.09 | 0.14 | 0.11 | 0.14 | −0.14 | 0.48 | 0.01 | −0.08 | 0.90 | −0.10 |
| Seed phase | 0.98 | 0.96 | 0.97 | 1 | 1 | 1 | 1 | 1 | 0.98 | 0.91 | 0.92 | 0.99 |

Table 8.8 shows the correlation coefficients for upstream and downstream session sizes during the leech and seed phases. The high correlation between upstream and downstream session sizes during the seed phase is not surprising. During this phase, the measurement peer only responds to activity from other peers: it does not initiate any new connections and does not request any data. The slight deviations from a correlation of 1 are probably due to peers that request data but then disconnect, either because they are snubbed or because of network problems. Further investigation of the share ratios and session size correlations is planned as future work.
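Message sizes illustrate why the seed-phase correlation is so close to 1. For a remote leecher served by a seed, downstream bytes consist of the handshake plus 17-byte request messages. Upstream bytes consist of the handshake plus piece messages, each carrying a 13-byte header and one block. Upstream and downstream sizes are therefore linearly related as long as every request is answered. The sketch assumes 16 KiB blocks, ignores all other message types and uses hypothetical sessions. A single session cut short, for instance by a disconnect, is enough to pull the coefficient below 1.

```python
import math

HANDSHAKE = 68        # BitTorrent handshake: 1 + 19 + 8 + 20 + 20 bytes
REQUEST = 17          # 4-byte length prefix, id, index, begin, length
PIECE_HEADER = 13     # 4-byte length prefix, id, index, begin
BLOCK = 16 * 1024     # assumed block (subpiece) size

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def seed_session(requests, served):
    """(upstream, downstream) bytes seen by a seed serving one remote leecher.

    Only the handshake, request and piece messages are counted.
    """
    downstream = HANDSHAKE + requests * REQUEST
    upstream = HANDSHAKE + served * (PIECE_HEADER + BLOCK)
    return upstream, downstream

# Hypothetical sessions in which every request is answered ...
answered = [seed_session(m, m) for m in (1, 5, 20, 80, 300)]
# ... and the same sessions with the largest one cut short halfway.
cut_short = answered[:-1] + [seed_session(300, 150)]

for label, sessions in (("all answered", answered), ("one cut short", cut_short)):
    up, down = zip(*sessions)
    print(label, round(pearson(up, down), 4))
```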

8.6 Summary

This chapter has given a detailed account of the measurements used for modelling in this thesis. Relevant traffic metrics have been discussed and presented. Furthermore, summary statistics have been put forth, showing high variability in bandwidth utilisation and number of active peers. Additionally, peers are observed to react quickly to new content, which corroborates earlier results.


Chapter 9

BitTorrent Models

Μή μου τοὺς κύκλους τάραττε! (Do not disturb my circles!)
– Archimedes

Traffic modelling is an important activity in the context of predicting future network behaviour. Accurate models for various networked applications are a decisive step towards QoS in the Internet. With the rise in popularity of P2P protocols, the importance of tractable and practically usable models is further increased. This chapter presents models for BitTorrent session and message characteristics. The primary goal of the models is to be useful in a simulation environment.

9.1 Session Characteristics

In this section the modelling results for the distributions of session inter-arrival times, upstream session sizes and durations are reported.


9.1.1 Session Inter-arrival Times

The distributions reported in this section refer to inter-arrival times for remotely initiated sessions during the seeding phase of the measurement peer. The leech phase is not considered for two reasons. First, it is short compared to the seed phase, and the number of non-locally initiated sessions is fairly low. Second, the peer is more active during this phase than during the seed phase. Together, the active peer behaviour and the low number of samples (e.g., only 10–20 sessions) make the leech phase more difficult to analyse. Session inter-arrival times have been observed to be well modelled by a two-stage hyper-exponential distribution, denoted by H2. The associated probability density function is

$$H_2(x) = p\lambda_1 e^{-\lambda_1 x} + (1 - p)\lambda_2 e^{-\lambda_2 x} \tag{9.1}$$

where λ1 and λ2 are the rates of the two exponential terms, and p is the mixing weight. Figure 9.1 shows examples of visual assessment tools. Figures 9.1(a) and 9.1(b) show PDF and CCDF overlay plots for measurement 3. Both indicate a very good match for up to 99 % of the probability mass, with most of the errors in the tail of the distribution. Figure 9.1(c) shows a QQ plot for all measurements. Parameter estimates for all measurements have been obtained using MLE. Table 9.1 reports the parameter estimates and the associated standard deviations obtained in the fitting procedure, together with the E% value and the resulting fitness decision and degree. In summary, all measurements of session inter-arrival times during the seeding phase pass according to the selected error criterion. Furthermore, measurements 2 and 3 have low E% values, and they pass the Anderson-Darling test at significance levels of ≈ 0.005. This indicates that the selected H2 distribution is a good candidate for the underlying true distributions.
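The text states that the H2 parameters were obtained by MLE but does not describe the optimisation. One common way to compute ML estimates for a two-phase hyper-exponential is the EM algorithm, sketched below. It is not necessarily the procedure used in this thesis. The synthetic data (p = 0.3, λ1 = 2.0, λ2 = 0.1) are hypothetical. The sampler also shows how a fitted H2 model could generate session inter-arrival times in a simulation.

```python
import math
import random

def sample_h2(n, p, lam1, lam2, rng):
    """Draw n H2 variates: choose phase 1 with probability p, then draw an exponential."""
    return [rng.expovariate(lam1 if rng.random() < p else lam2) for _ in range(n)]

def fit_h2_em(xs, max_iter=500, tol=1e-8):
    """ML estimates (p, lam1, lam2) of Eq. (9.1) via expectation-maximisation."""
    n = len(xs)
    mean = sum(xs) / n
    p, lam1, lam2 = 0.5, 10.0 / mean, 0.5 / mean  # crude start: one fast, one slow phase
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of phase 1 for each observation (log space for stability).
        resp, ll = [], 0.0
        for x in xs:
            a = math.log(p * lam1) - lam1 * x
            b = math.log((1.0 - p) * lam2) - lam2 * x
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            ll += m + math.log(ea + eb)
            resp.append(ea / (ea + eb))
        # M-step: closed-form updates of the mixing weight and both rates.
        s1 = sum(resp)
        p = s1 / n
        lam1 = s1 / sum(r * x for r, x in zip(resp, xs))
        lam2 = (n - s1) / sum((1.0 - r) * x for r, x in zip(resp, xs))
        if abs(ll - prev_ll) <= tol * abs(ll):
            break
        prev_ll = ll
    if lam1 < lam2:  # report the faster phase as lam1
        p, lam1, lam2 = 1.0 - p, lam2, lam1
    return p, lam1, lam2

rng = random.Random(2005)
data = sample_h2(5000, 0.3, 2.0, 0.1, rng)  # hypothetical "true" parameters
p_hat, lam1_hat, lam2_hat = fit_h2_em(data)
print(round(p_hat, 3), round(lam1_hat, 3), round(lam2_hat, 3))
```

EM only guarantees a local maximum of the likelihood, so in practice several starting points are worth trying. The resulting estimates could then be refined with the error percentage, as described in Chapter 7.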


Figure 9.1: Fitness assessment plots. (a) Empirical PDF for measurement 3 with estimate overlaid (density versus inter-arrival time). (b) CCDF for measurement 3 with estimate overlaid, on log-log axes. (c) QQ-plot of all measurements subject to H2(λ̂1, λ̂2, p̂).


Table 9.1: Fitted hyper-exponential parameters

| Measurement | λ̂1 ± σ̂λ1 | λ̂2 ± σ̂λ2 | p̂ ± σ̂p | E% | Comment |
|---|---|---|---|---|---|
| 1 | 0.0593 ± 0.0046 | 0.1696 ± 0.0085 | 0.2215 ± 0.0467 | 2.07367 | Pass, fair |
| 2 | 0.1158 ± 0.0009 | 0.7556 ± 0.0279 | 0.7936 ± 0.0066 | 0.41535 | Pass, excellent |
| 3 | 0.0566 ± 0.0006 | 0.3653 ± 0.0099 | 0.6575 ± 0.0077 | 0.49009 | Pass, excellent |
| 4 | 0.5372 ± 0.0178 | 0.0168 ± 0.0002 | 0.2533 ± 0.0052 | 2.79455 | Pass, fair |
| 5 | 0.5538 ± 0.0212 | 0.0162 ± 0.0002 | 0.2156 ± 0.0052 | 2.79722 | Pass, fair |
| 6 | 0.4798 ± 0.0174 | 0.0127 ± 0.0002 | 0.2879 ± 0.0060 | 3.93588 | Pass, poor |
| 7 | 0.4188 ± 0.0143 | 0.0052 ± 0.0001 | 0.3014 ± 0.0076 | 2.05430 | Pass, good |
| 8 | 0.5142 ± 0.0113 | 0.0168 ± 0.0002 | 0.4252 ± 0.0050 | 2.79291 | Pass, fair |
| 10 | 0.5581 ± 0.0205 | 0.0128 ± 0.0002 | 0.3276 ± 0.0064 | 3.76412 | Pass, poor |
| 11 | 0.0140 ± 0.0009 | 0.0802 ± 0.0005 | 0.0219 ± 0.0024 | 2.20763 | Pass, good |
| 12 | 0.0935 ± 0.0004 | 5.8224 ± 0.1380 | 0.8252 ± 0.0021 | 3.84606 | Pass, poor |
| 13 | 0.0563 ± 0.0004 | 0.4175 ± 0.0065 | 0.5897 ± 0.0048 | 1.87389 | Pass, good |

9.1.2 Session Duration and Size

In this section, the modelling results for the size and duration of remotely initiated peer sessions are reported. The two quantities are related and show a moderate positive correlation, with correlation coefficients between 0.25 and 0.67 (Table 9.2). This is an expected result, though somewhat higher correlations were anticipated. The reported correlations indicate either that there are long sessions that request little or no data, or that short sessions transmit large amounts of data (the mice and elephants effect [63]). Table 9.3 indicates that it is primarily the former that affects the size-duration correlations.

Table 9.2: Correlation coefficients for session duration and sizes

Measurement | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12 | 13
ρxy | 0.32 | 0.36 | 0.29 | 0.30 | 0.30 | 0.34 | 0.47 | 0.40 | 0.67 | 0.43 | 0.38 | 0.25
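The ρxy values above are presumably Pearson product-moment coefficients on the raw durations and sizes; the text does not say, so this is an assumption. The minimal sketch below computes that coefficient and, with hypothetical toy sessions, shows how a few long sessions that transfer no data pull the coefficient down, which is the effect Table 9.3 points to.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sessions: durations in seconds, upstream sizes in bytes.
durations = [60, 120, 300, 600, 1200]
sizes = [16384 * d for d in durations]  # size grows linearly with duration
rho_linear = pearson(durations, sizes)

# Add two long sessions that never transfer any data ("long but idle").
durations_idle = durations + [5000, 8000]
sizes_idle = sizes + [0, 0]
rho_idle = pearson(durations_idle, sizes_idle)
print(f"rho without idle sessions: {rho_linear:.2f}, with idle sessions: {rho_idle:.2f}")
```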

Figure 9.2 further illustrates the mice and elephants effect. The dotted lines show the average session duration and size, respectively. The circles represent the sessions initiated during the leech phase, and the dashed line marks the length of the leech phase. It is interesting to note that there are two clearly discernible clusters for the sessions during the seed phase. These clusters are present to some extent in all measurements and represent the elephants (the upper cluster) and the mice (the lower cluster plus the linear shape at the bottom). Note that both axes are log-scaled. For reasons similar to those for session inter-arrival times, only the following are considered for modelling:

• Measurements with more than 20000 sessions
• Sessions initiated after the start of the seeding phase
• Sessions that actually request and receive at least one piece

The reason for this is threefold:

Figure 9.2: Session size-duration scatter plot. (a) Measurement 3; (b) Measurement 13. Upstream session size versus session duration, both on log-scaled axes.

1. As observed in Table 9.3, most sessions do not transfer any data after the initial TCP handshake. Consequently, only a fairly small number of samples (3–6 % of the total number of sessions) is left for parameter estimation. If the measurements with fewer sessions were included, the number of remaining sessions would be inadequate for proper parameter estimation.

Table 9.3: Percentages of session sizes exceeding 0 bytes and 1 piece size

Measurement | > 0 bytes: Sessions | > 0 bytes: % of sessions | ≥ 1 piece: Sessions | ≥ 1 piece: % of sessions
1 | 1558 | 5 | 1392 | 5
2 | 1619 | 4 | 1356 | 3
3 | 1795 | 6 | 1564 | 5
11 | 3092 | 7 | 1769 | 4
12 | 3793 | 6 | 2612 | 4
13 | 3438 | 7 | 3017 | 6

2. The α-estimates (Table 9.4) indicate that some heavy-tail behaviour could be present in the distributions, as also observed in the CCDF plots. The shape in Figure 9.3(a) is representative of the session duration CCDFs of all measurements. Modelling this behaviour requires enough samples in the tail. Considering point 1 above together with the need for a sufficient number of samples in the tail (e.g., in the upper 5 % quantile), a large number of observations in the original data is necessary.

Table 9.4: Session α-estimates

Measurement | Duration α̂ | Duration 2σ̂α | Size α̂ | Size 2σ̂α
1 | 1.335 | 0.149 | 1.176 | 0.353
2 | 1.264 | 0.163 | 1.147 | 0.339
3 | 1.523 | 0.116 | 1.233 | 0.320
11 | 1.379 | 0.134 | 0.961 | 0.222
12 | 1.272 | 0.060 | 0.902 | 0.147
13 | 1.435 | 0.176 | 1.289 | 0.207

3. Both session sizes and durations appear to be drawn from a single, similar distribution when only sessions that have transmitted at least one piece are inspected (Figure 9.3(b)). A single distribution makes the model more tractable than a multi-distribution model. This is especially true for models that cannot be expressed as a mixture distribution (Section 5.3), i.e., as a linear combination of distributions. For such models, cutoff points between the distributions must be located, and the results are more cumbersome to use in, for instance, a simulation environment.

Figure 9.3: α-estimates and CCDF for measurement 3. (a) Duration CCDF for all sessions; (b) Duration CCDF for sessions with ≥ 1 piece; (c) α-estimate plot for session duration (raw data and 2- to 512-aggregated series; 28687 points; α estimate 1.523).

The models for session sizes and durations are reported in Tables 9.5 and 9.6, respectively. Only the sessions that actually receive data have been modelled, using Log-normal distributions with parameters µ and σ. The second and third columns show the estimated parameters, together with the associated estimated standard deviations, for which the best value of E% was obtained. The fourth column indicates the tail probability mass for which the estimated distribution passed the 5 % fitness limit of E%, while the fifth column shows the tail probability mass for which the best value of E% was obtained. The value of E% is given in column 6. It should also be noted that the single Log-normal distribution fitted to durations and sizes tends to overestimate them. Since the number of samples is substantially smaller than for the hyper-exponential models in Section 9.1.1, the Anderson-Darling (AD) statistic is also calculated for the estimated distribution. Column 7 shows the significance levels obtained in the AD test, under the assumption that the parameter estimates are good enough to treat the distribution as fully specified. The last column shows the fitness decision, together with whether the AD test passes at the critical level.

Table 9.5: Log-normal parameter estimates and errors for upstream session sizes during seed phase

Measurement | µ̂ ± σ̂ | σ̂LN ± σ̂ | Pass mass | Tail mass | E% | AD sign. | Comment
1 | 18.7 ± 0.04 | 0.62 ± 0.02 | 0.45 | 0.21 | 2.1 | > 0.25 | Pass, good; AD: Pass
2 | 17.8 ± 0.04 | 0.99 ± 0.03 | 1 | 0.4 | 2.9 | > 0.025 | Pass, fair; AD: Fail
3 | 18.4 ± 0.04 | 0.60 ± 0.02 | 1 | 0.24 | 3.3 | > 0.05 | Pass, fair; AD: Pass
11 | 14.1 ± 0.06 | 2.44 ± 0.04 | 1 | 0.99 | 2.4 | ≈ 0.001 | Pass, good; AD: Fail
12 | 13.6 ± 0.05 | 2.36 ± 0.04 | 0.86 | 0.74 | 3.4 | < 0.001 | Pass, fair; AD: Fail
13 | 19.0 ± 0.03 | 0.69 ± 0.02 | 1 | 0.17 | 3.0 | > 0.025 | Pass, fair; AD: Fail

Although a true heavy-tail model, such as the Pareto distribution or a mixture of the Pareto and Log-normal distributions, was expected to provide a better fit, this was found not to be the case. This is most likely due to the limited amount of data available in a BitTorrent swarm, which places an upper bound on the amount of data that a peer is interested in downloading: there is no point in a peer downloading more data once the entire content has been obtained. It may be conjectured that if a swarming BitTorrent-like model were applied to streaming data such as VoIP or video, the distribution would tend more toward a Paretian tail.

Table 9.6: Log-normal parameter estimates and errors for upstream session durations during seed phase

Measurement | µ̂ ± σ̂ | σ̂LN ± σ̂ | Pass mass | Tail mass | E% | AD sign. | Comment
1 | 8.55 ± 0.03 | 1.08 ± 0.02 | 1 | 0.74 | 2.2 | ≈ 0.01 | Pass, good; AD: Fail
2 | 8.16 ± 0.04 | 1.33 ± 0.03 | 1 | 0.99 | 1.5 | > 0.15 | Pass, good; AD: Pass
3 | 8.17 ± 0.04 | 1.38 ± 0.02 | 1 | 0.98 | 1.6 | > 0.05 | Pass, good; AD: Pass
11 | 8.09 ± 0.04 | 1.56 ± 0.03 | 1 | 1 | 2.4 | > 0.001 | Pass, good; AD: Fail
12 | 7.2 ± 0.03 | 1.57 ± 0.02 | 1 | 1 | 3.9 | ≪ 0.001 | Pass, poor; AD: Fail
13 | 7.94 ± 0.03 | 1.52 ± 0.02 | 1 | 1 | 2.3 | < 0.001 | Pass, good; AD: Fail
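A minimal sketch of the Log-normal fitting step and of the Anderson-Darling statistic used in Tables 9.5 and 9.6, treating the fitted parameters as a fully specified distribution as described above. It uses the closed-form MLE (mean and standard deviation of ln x) with large-sample standard errors; it is not necessarily the exact estimator used in the thesis, it does not compute E%, and it does not look up significance levels. The data are synthetic, drawn with the measurement 3 duration estimates from Table 9.6, and all function names are hypothetical.

```python
import math
import random


def fit_lognormal(samples):
    """Log-normal MLE: mu and sigma are the mean and standard deviation of ln(x)."""
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)  # MLE uses 1/n
    # Large-sample standard errors of the two estimates.
    return mu, sigma, sigma / math.sqrt(n), sigma / math.sqrt(2 * n)


def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def anderson_darling(samples, cdf, eps=1e-12):
    """A^2 against a fully specified CDF (fitted parameters treated as known)."""
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        lower = min(max(cdf(xs[i - 1]), eps), 1.0 - eps)  # F(x_(i))
        upper = min(max(cdf(xs[n - i]), eps), 1.0 - eps)  # F(x_(n+1-i))
        total += (2 * i - 1) * (math.log(lower) + math.log(1.0 - upper))
    return -n - total / n


rng = random.Random(1)
durations = [rng.lognormvariate(8.17, 1.38) for _ in range(2000)]  # synthetic data
mu, sigma, se_mu, se_sigma = fit_lognormal(durations)
a2 = anderson_darling(durations, lambda x: lognormal_cdf(x, mu, sigma))
print(f"mu = {mu:.2f} ± {se_mu:.2f}, sigma = {sigma:.2f} ± {se_sigma:.2f}, A^2 = {a2:.3f}")
```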

9.2 Message Characteristics

While the models presented in the previous section refer to traffic collected from application layer logs from several measurements, the following models refer to traffic collected from the link layer trace of measurement 13. The link layer packet captures were processed using the modified tcptrace described in [30]. The BitTorrent-specific parser module created the time-stamped message logs used for the models. The resolution used for modelling the message rates is one second. This resolution has been chosen partly to reduce the amount of data in larger sample sets and partly due to the difficulty of properly calculating instantaneous rates on short timescales. Also, the number of back-to-back messages has not been modelled, i.e., inter-arrival and inter-departure times of 0 s are excluded from the models.
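The one-second rate binning and the exclusion of back-to-back messages described above can be sketched as follows. This is an illustration under assumptions, not the modified tcptrace parser from [30]: timestamps are taken to be seconds as floats, intervals with no messages are kept as zero counts, and the function names are hypothetical.

```python
import math
from collections import Counter


def per_second_counts(timestamps, start, end):
    """Number of messages in each 1-second interval [start + i, start + i + 1)."""
    counts = Counter(int(math.floor(t - start)) for t in timestamps if start <= t < end)
    n_bins = int(math.ceil(end - start))
    return [counts.get(i, 0) for i in range(n_bins)]


def nonzero_gaps(timestamps):
    """Inter-arrival (or inter-departure) times, dropping back-to-back (0 s) gaps."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:]) if b > a]


# Hypothetical message log: two messages share a timestamp (back-to-back).
log = [0.10, 0.20, 0.20, 1.50, 2.90]
rates = per_second_counts(log, start=0.0, end=3.0)
gaps = nonzero_gaps(log)
print(rates, [round(g, 2) for g in gaps])
```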

9.2.1 Upstream request-messages During Leech Phase

The request-messages and their responses are the major bandwidth contributors in a BitTorrent session. Modelling the request behaviour of the measurement peer provides valuable information for describing the overall behaviour of the entire swarm. The leech phase of a BitTorrent peer may be partitioned into three sub-phases. During the first sub-phase, the peer tries to connect to a predefined maximum number of other peers. The number of connected peers therefore increases during this sub-phase, which also increases the number of outgoing piece requests. On entering the second sub-phase, the peer has connected to enough peers, and the number of connected peers and outgoing messages fluctuates around some average value. The final sub-phase is the end-game mode (Section 3.6.1), during which the peer sends a large number of requests. Figure 9.4 shows the upstream request rate, measured in requests per second, during the leech phase of the measurement peer. The sub-phases are delimited by dashed lines.

Figure 9.4: Upstream request rate during leech phase (number of requests per one-second interval versus interval number).

The behaviour during the sub-phases is clearly observable in this figure, and a single distributional model for the entire leech phase would certainly not be able to capture it. Therefore, the models in this section only relate to the longest of the sub-phases, i.e., the second one. Models are provided for the instantaneous upstream request rate and for the upstream inter-departure times of the messages.

Request Rates

Upstream request rates have been observed to be accurately modelled by a Gaussian distribution. The results of the modelling are given in Table 9.7, and the associated EPDF and CCDF overlay plots in Figure 9.5. It is conjectured that the Gaussian shape arises because the requests to each connected peer can be viewed as being drawn from a separate distribution. The sum of these distributions would then approach a Gaussian distribution according to the central limit theorem, provided that the per-peer request processes are approximately independent.
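The central-limit argument above can be illustrated with a small simulation. The sketch assumes independent per-peer request counts drawn from a deliberately skewed, hypothetical distribution and compares the skewness of a single peer's per-second count with that of the sum over 50 peers; the peer count and per-peer distribution are illustrative assumptions, not measured values.

```python
import random
import statistics


def skewness(values):
    """Sample skewness m3 / m2^1.5 (0 for a symmetric distribution such as the Gaussian)."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


def per_peer_requests(rng):
    """Hypothetical, strongly skewed per-peer request count in one second."""
    return int(rng.expovariate(1.0))  # floor of an exponential: geometric counts


rng = random.Random(3)
N_PEERS = 50  # hypothetical number of connected peers
SECONDS = 5000

single = [per_peer_requests(rng) for _ in range(SECONDS)]
aggregate = [sum(per_peer_requests(rng) for _ in range(N_PEERS)) for _ in range(SECONDS)]
print(f"skewness, one peer: {skewness(single):.2f}, {N_PEERS} peers: {skewness(aggregate):.2f}")
```

The aggregate's skewness shrinks roughly as 1/√N for independent peers, which is the behaviour the conjecture relies on; correlated peers (e.g., through a shared bottleneck) would weaken the effect.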

9.2. MESSAGE CHARACTERISTICS

Table 9.7: Gaussian parameter estimates and errors for upstream request rate during leech phase

µ̂ ± σ̂ | σ̂N ± σ̂ | E% | AD sign. | Comment
39.4 ± 0.04 | 6.88 ± 0.04 | 0.52 | ≈ 0.05 | Pass, very good; AD: Pass

Figure 9.5: Modelling results for request rate during leech phase. (a) Request rate CCDF; (b) Request rate EPDF (density versus messages/s).

Request Inter-departure Times

The exponential distribution has been used to model upstream request inter-departure times. The results of this modelling are given in Table 9.8, and the associated EPDF and CCDF overlay plots in Figure 9.6.

Table 9.8: Exponential parameter estimates and errors for request inter-departure times during leech phase

λ̂ ± σ̂ | E% | Comment
32.8 ± 0.8 | 1.98 | Pass, good
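For reference, the maximum-likelihood estimate of an exponential rate is the number of samples divided by their sum, with an approximate standard error of λ̂/√n. The sketch below applies this to synthetic inter-departure times generated at the Table 9.8 rate; it shows the estimator only and does not reproduce the E% computation. The function names are hypothetical.

```python
import math
import random


def fit_exponential(gaps):
    """MLE of the rate: lambda_hat = n / sum(x), standard error lambda_hat / sqrt(n)."""
    n = len(gaps)
    lam = n / sum(gaps)
    return lam, lam / math.sqrt(n)


def exp_ccdf(x, lam):
    """P[X > x] for an exponential with rate lam."""
    return math.exp(-lam * x)


rng = random.Random(7)
gaps = [rng.expovariate(32.8) for _ in range(5000)]  # synthetic, rate from Table 9.8
lam_hat, se = fit_exponential(gaps)
print(f"lambda_hat = {lam_hat:.1f} ± {se:.1f} per second")
```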

Most of the error stems from inter-departure times shorter than 0.001 s. Fitting an exponential distribution only to the times longer than this results in an error of about 0.4 %. This motivates an additional, alternative model in which inter-departure times shorter than 0.001 s are modelled by a uniform distribution, while, as before, the longer times are modelled by an exponential distribution. This censored mixture model gives a lower error percentage. The cutoff point between the uniform and exponential distributions is approximately 0.142, which corresponds to 0.001 s.

Table 9.9: Exponential and Uniform parameter estimates and errors using alternative model for request inter-departure times during leech phase

Uni. range | Uni. E% | λ̂ ± σ̂ | Exp. E% | Total E% | Comment
[0, ≈ 10^−3] | 3.52 | 30.8 ± 0.4 | 0.39 | 0.55 | Exp.: Pass, excellent; Unif.: Pass, fair; Total: Pass, very good

Figure 9.6: Modelling results for request inter-departure times during leech phase. (a) Request inter-departure time CCDF; (b) Request inter-departure time EPDF.

It is clear that the uniform part of the model does not provide as good a fit as the exponential part. However, the improvement in the exponential part still makes the model a viable alternative to the pure exponential model. Also, while the linear portion of the CCDF in the upper 1 % quantile can be fitted to a Pareto distribution, the exponential provides a good enough fit for the purpose of simulation.
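A simulator could draw inter-departure times from the alternative model roughly as follows. The sketch rests on two assumptions that the text does not spell out: the value 0.142 is read as the probability mass below the 0.001 s cutoff, and the exponential part (rate from Table 9.9) is shifted to start at the cutoff, which by memorylessness is the same as an exponential conditioned on exceeding it. The constant and function names are hypothetical.

```python
import random

P_SHORT = 0.142  # assumed probability mass below the cutoff
CUTOFF = 1e-3    # 0.001 s
LAM = 30.8       # exponential rate from Table 9.9


def sample_interdeparture(rng):
    """Uniform on [0, CUTOFF) with probability P_SHORT, else a shifted exponential."""
    if rng.random() < P_SHORT:
        return rng.uniform(0.0, CUTOFF)
    # Memorylessness: an exponential conditioned on X > CUTOFF is CUTOFF + Exp(LAM).
    return CUTOFF + rng.expovariate(LAM)


def model_mean():
    """Mean implied by the assumed mixture."""
    return P_SHORT * CUTOFF / 2 + (1 - P_SHORT) * (CUTOFF + 1 / LAM)


rng = random.Random(11)
draws = [sample_interdeparture(rng) for _ in range(100_000)]
short_fraction = sum(d < CUTOFF for d in draws) / len(draws)
print(f"fraction < 1 ms: {short_fraction:.3f}, mean: {sum(draws) / len(draws):.4f} s")
```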

9.2.2 Downstream piece-messages During Leech Phase

By modelling the behaviour of downstream piece-messages, it is possible to understand the response characteristics of uploading peers.

Piece Rates

Contrary to what might be expected, the downstream piece rates do not conform to a Gaussian distribution, but rather to a Weibull distribution. The results presented in Table 9.10 and Figure 9.7 clearly show the applicability of the model. In particular, the AD statistic has been calculated as A² = 0.325, which corresponds approximately to a significance level of 0.9 (Table 6.1). This gives a quantitative confirmation of the qualitative fitness clearly visible in Figure 9.7.

Table 9.10: Weibull parameter estimates and errors for downstream piece rate during leech phase

α̂ ± σ̂ | β̂ ± σ̂ | E% | AD sign. | Comment
6.83 ± 0.06 | 53.6 ± 0.05 | 0.52 | ≈ 0.9 | Pass, very good; AD: Pass
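The Weibull parameters in Table 9.10 can be estimated by maximum likelihood. One standard approach, sketched below under the assumption that α is the shape and β the scale parameter (consistent with the EPDF peaking near 50 messages/s), solves the profile-likelihood equation for the shape by bisection and then computes the scale in closed form. The thesis does not state which numerical method it used, so this is an illustration only; the samples are synthetic, must be strictly positive, and the function name is hypothetical.

```python
import math
import random


def fit_weibull(samples, lo=0.05, hi=50.0, iters=60):
    """MLE of Weibull shape (alpha) and scale (beta) for strictly positive samples.

    The shape solves  sum(x^a ln x) / sum(x^a) - 1/a - mean(ln x) = 0,
    whose left-hand side increases with a, so bisection on [lo, hi] finds the root.
    """
    m = max(samples)
    ys = [x / m for x in samples]  # rescale so that y**a cannot overflow
    logs = [math.log(x) for x in samples]
    mean_log = sum(logs) / len(logs)

    def score(a):
        w = [y ** a for y in ys]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / a - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    alpha = 0.5 * (lo + hi)
    # beta^alpha = mean(x^alpha), rewritten in terms of the rescaled values.
    beta = m * (sum(y ** alpha for y in ys) / len(ys)) ** (1.0 / alpha)
    return alpha, beta


rng = random.Random(5)
# Python's weibullvariate takes (scale, shape), the reverse of the alpha/beta order here.
rates = [rng.weibullvariate(53.6, 6.83) for _ in range(5000)]
alpha_hat, beta_hat = fit_weibull(rates)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.1f}")
```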

It is interesting to note the skewness of the EPDF in Figure 9.7. Compared with the corresponding request-messages, the piece rate distribution has a lighter upper tail, a heavier lower tail and a higher mean. This is probably due to back-to-back requests being sent. Since a sub-piece (16384 bytes) is larger than an Ethernet frame, it cannot fit in a single frame and would thus be taken into account in the rate calculations.

Figure 9.7: Modelling results for downstream piece rate during leech phase. (a) Piece rate CCDF; (b) Piece rate EPDF (density versus messages/s).
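The frame-size argument can be made concrete with a little arithmetic. Assuming a standard 1500-byte Ethernet MTU (a 1460-byte TCP maximum segment size without options) and the piece-message layout of the BitTorrent specification (4-byte length prefix, 1-byte message ID, 4-byte piece index and 4-byte offset in front of the block), one 16384-byte sub-piece needs at least twelve TCP segments. The function name is hypothetical.

```python
import math

# Piece-message layout: 4-byte length prefix, 1-byte ID, 4-byte index, 4-byte begin, block.
PIECE_HEADER_BYTES = 4 + 1 + 4 + 4


def segments_per_piece(block_bytes=16384, mss=1460):
    """Minimum number of full-size TCP segments needed to carry one piece-message."""
    return math.ceil((block_bytes + PIECE_HEADER_BYTES) / mss)


print(segments_per_piece())          # MSS of 1460 bytes (1500-byte MTU, no TCP options)
print(segments_per_piece(mss=1448))  # with 12 bytes of TCP timestamp options
```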

Piece Inter-arrival Times

The inter-arrival times show a similar improvement in fitting quality as the downstream rates. The same model as for the upstream request-messages, i.e., the exponential distribution, is used, and the results are presented in Table 9.11.

Table 9.11: Exponential parameter estimates and errors for piece inter-arrival times during leech phase

λ̂ ± σ̂ | E% | Comment
48.4 ± 0.5 | 0.53 | Pass, very good

The absence of the lower-tail discrepancies that were evident for the upstream request-messages leads us to suspect that those discrepancies are caused by some form of local interference, such as kernel or user-space queueing on the measurement computer. The fact that the exponential distribution is still valid here is, however, an indicator of the validity of the first model for request inter-departure times, despite the discrepancies.

Figure 9.8: Modelling results for downstream piece inter-arrival times during leech phase. (a) Inter-arrival time CCDF; (b) Inter-arrival time EPDF.

Figure 9.8 shows the EPDF and CCDF for piece-message inter-arrival times during the leech phase. The improvement in tail fitness compared to the request inter-departure times is clearly observable in Figure 9.8(a).

9.2.3 Downstream request-messages During Seed Phase

In this section, models for the aggregate, i.e., the rate and inter-arrival times of all downstream request-messages, are presented. While these models do not provide any direct information on the behaviour of any particular peer, they do provide information on the expected load placed on a participating peer. The models also give an indication of the downstream network load at a seeding peer.

Request Rates

As with the upstream request-message inter-departure times, two models are presented for the downstream messages. However, as opposed to the upstream request rates, the downstream equivalent is not Gaussian but displays a heavier tail. The models therefore require at least a long-tailed distribution. A single Weibull and a dual Weibull mixture have been fitted with good results. The results of the single Weibull model are presented in Table 9.12, while Table 9.13 shows the results for the dual model.

Table 9.12: Weibull parameter estimates and errors for downstream request rate during seed phase

α̂ ± σ̂ | β̂ ± σ̂ | E% | Comment
1.99 ± 0.14 | 12.8 ± 0.29 | 0.99 | Pass, very good

Though the single Weibull provides good results, adding a second Weibull component reduces the error percentage by more than 75 %. The decrease occurs primarily in the lower tail, but the upper tail also shows a significant improvement. The mixture model gives an excellent match up to at least the 99 % quantile, as shown in Figure 9.9.

Table 9.13: Dual Weibull parameter estimates and errors for downstream request rate during seed phase

α̂1 | β̂1 | α̂2 | β̂2 | p̂ | E% | Comment
2.24 | 10.8 | 1.86 | 15.8 | 0.52 | 0.21 | Pass, excellent
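The dual Weibull model can be evaluated and sampled as a two-component mixture, as sketched below with the Table 9.13 estimates. The value 0.52 in that table is read as the weight of the first component, which is an assumption, and the function names are hypothetical. Note that Python's random.weibullvariate takes the scale first and the shape second, the reverse of the α, β order used in the tables.

```python
import math
import random


def weibull_ccdf(x, shape, scale):
    """P[X > x] for a Weibull distribution."""
    return math.exp(-((x / scale) ** shape))


def dual_weibull_ccdf(x, p, a1, b1, a2, b2):
    """Mixture CCDF: p * W(a1, b1) + (1 - p) * W(a2, b2)."""
    return p * weibull_ccdf(x, a1, b1) + (1.0 - p) * weibull_ccdf(x, a2, b2)


def sample_dual_weibull(rng, p, a1, b1, a2, b2):
    """Choose a component with probability p / 1 - p, then draw from it."""
    if rng.random() < p:
        return rng.weibullvariate(b1, a1)  # (scale, shape) argument order
    return rng.weibullvariate(b2, a2)


PARAMS = dict(p=0.52, a1=2.24, b1=10.8, a2=1.86, b2=15.8)  # Table 9.13, p assumed

rng = random.Random(13)
draws = [sample_dual_weibull(rng, **PARAMS) for _ in range(100_000)]
x = 20.0
empirical = sum(d > x for d in draws) / len(draws)
print(f"P[X > {x}] model {dual_weibull_ccdf(x, **PARAMS):.4f}, sampled {empirical:.4f}")
```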

Again, as for the upstream request inter-departure times, a linear region can be observed in the CCDF (Figure 9.9) for quantiles above 99 %, which can be matched to a true heavy-tail distribution. The blue line shows a Pareto fit to the upper 1 % quantile. While underestimating a rate would also underestimate the network load, the tail discrepancy of the dual Weibull model is considered acceptable. This is partly because request-messages are small, and partly because the tail does not seem to propagate to the corresponding piece-messages (see Figure 9.12).

Figure 9.9: Dual Weibull modelling results for downstream request rate during seed phase. (a) Request rate CCDF; (b) Request rate EPDF (density versus messages/s).

Since a Gaussian distribution was fitted to the upstream request rates in Section 9.2.1, it might be expected that this distribution would also appear in the downstream rates. There are a few reasons why this is not the case. First, recall that the Gaussian was only used to model the middle sub-phase of the leech phase, i.e., excluding the start-up and end-game mode sub-phases. This is not the case for the downstream messages. In particular, the end-game mode adds a certain amount to the tail of the distribution, although this is not as pronounced as for the upstream messages, since the measurement peer only receives one request per remaining sub-piece from each requesting peer. Second, heavy-tail behaviour may be induced by TCP under certain circumstances [34]. The ON-OFF behaviour induced by losses can be further amplified by the way the tit-for-tat mechanism works. Assuming a per-flow view, a requesting peer may be forced to wait for an unchoke-message before being allowed to download. This state would then be equivalent to an OFF state, with the peer entering the ON state after receiving the unchoke-message. For frequently choked peers with high download rates, heavy-tail behaviour may occur. Third, TCP has a tendency to propagate LRD behaviour [79]. This means that if a TCP connection shares a link with LRD traffic, the connection will “catch” the LRD behaviour. Considering all these factors, the appearance of Weibull distributions is not surprising, but may rather be expected.

Request Inter-arrival Times

Compared to the upstream inter-departure times, the equivalent downstream inter-arrival times are better behaved. An exponential distribution provides a good model for the entire range of data. The results are presented in Table 9.14. It is interesting to note that the inter-arrival rate is about 1/3 of the equivalent inter-departure rate (9.03 versus 32.8 per second). This is not unexpected: even if every requesting peer has a total request rate on the order of 30 requests per second, its requests are spread out across the swarm, which decreases the load on each uploading peer.

Table 9.14: Exponential parameter estimates and errors for request inter-arrival times during seed phase

λ̂ ± σ̂ | E% | Comment
9.03 ± 0.18 | 0.54 | Pass, very good

The tail of the CCDF in Figure 9.10 shows the same tendency towards heavy-tail behaviour. This corroborates, to a certain extent, the work reported in [79].
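A Pareto fit to the upper 1 % of the data, like the one mentioned for Figure 9.9, can be approximated with the Hill estimator sketched below. This is a swapped-in technique for illustration: the α-estimates in Table 9.4 appear to come from an aggregation-based scaling estimator (Figure 9.3(c)), and the text does not say how the Pareto line was fitted. The data are synthetic Pareto samples and the function name is hypothetical.

```python
import math
import random


def hill_alpha(samples, tail_fraction=0.01):
    """Hill estimate of the Pareto tail index from the largest tail_fraction of samples.

    Needs at least k + 1 samples, where k = max(2, int(n * tail_fraction)).
    """
    xs = sorted(samples, reverse=True)
    k = max(2, int(len(xs) * tail_fraction))
    threshold = xs[k]  # the (k+1)-th largest value
    h = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / h


rng = random.Random(17)
data = [rng.paretovariate(1.5) for _ in range(200_000)]  # synthetic Pareto, alpha = 1.5
print(f"Hill alpha on the upper 1 %: {hill_alpha(data):.2f}")
```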

9.2.4 Upstream piece-messages During Seed Phase

While the previous section presented models for the incoming requests to a seeding peer, this section provides models for the corresponding piece-messages. Though these are expected to be highly related (see Table 9.2), it is still of interest to observe how and to what degree the models differ. The models used for the piece-messages are the same as for the corresponding request-messages, i.e., the single and dual Weibull distributions.

Figure 9.10: Modelling results for request inter-arrival times during seed phase. (a) Request inter-arrival time CCDF; (b) Request inter-arrival time EPDF.

Piece Rates

For the single Weibull model, the parameters for upstream piece-messages are very similar to those of the request model: α̂ is slightly higher (2.05 versus 1.99) and β̂ slightly lower (12.5 versus 12.8), but for simulation purposes the models can be viewed as equivalent. It should also be noted that the fit is better in the case of upstream piece rates.

Table 9.15: Weibull parameter estimates and errors for upstream piece rate during seed phase

α̂ ± σ̂ | β̂ ± σ̂ | E% | Comment
2.05 ± 0.1 | 12.5 ± 0.23 | 0.76 | Pass, very good

While the single Weibull fit of upstream piece-messages outperforms the single Weibull request-message fit, the opposite is true for the dual Weibull case. However, the results shown in Table 9.16 are still very good, and there is no reason to explore other models. Additionally, the tail matching appears to be better in the dual Weibull piece case than in the corresponding request case, as can be observed in the CCDF plot in Figure 9.11.

Table 9.16: Dual Weibull parameter estimates and errors for upstream piece rate during seed phase

α̂1 ± σ̂ | β̂1 ± σ̂ | α̂2 ± σ̂ | β̂2 ± σ̂ | p̂ | E% | Comment
2.33 ± 0.24 | 11.2 ± 0.76 | 1.78 ± 0.38 | 15.1 ± 0.92 | 0.58 | 0.31 | Pass, excellent

Figure 9.11: Dual Weibull modelling results for upstream piece rates during seed phase. (a) CCDF; (b) EPDF (density versus messages/s).

Piece Inter-departure Times

The exponential distribution has been selected for modelling inter-departure times. The results reported in Table 9.17 indicate the validity of the model. The linear tail is still apparent in the CCDF in Figure 9.12, and the fit is not quite as good as that of the alternative model in Section 9.2.1. However, since the present model does not require special handling of short inter-departure times, it is deemed accurate enough for simulation purposes.

Table 9.17: Exponential parameter estimates and errors for piece inter-departure times during seed phase

λ̂ ± σ̂ | E% | Comment
11.7 ± 0.47 | 0.62 | Pass, very good

Figure 9.12: Modelling results for piece inter-departure times during seed phase. (a) Piece inter-departure time CCDF; (b) Piece inter-departure time EPDF.

9.3 Summary

In this chapter, accurate models for BitTorrent session and message characteristics have been reported. Seed-phase rates have been shown to have long tails, while inter-arrival and inter-departure times are exponentially distributed. Furthermore, upstream and downstream rates and times are observed to be distributionally similar.


Chapter 10

Conclusions and Future Work

It only takes 20 years for a liberal to become a conservative without changing a single idea. – Robert Anton Wilson

The main goal of this thesis was to develop tractable models for key BitTorrent characteristics that are suitable for simulation environments. The work leading up to and culminating in this thesis started with the design and implementation of a dedicated P2P measurement infrastructure, which makes it possible to measure P2P application traffic with high accuracy. A large number of measurements have been performed to provide the experimental data used for modelling. The measurements have been reported, and salient characteristics of the BitTorrent system have been identified. The reported models are divided into two categories: session models and message models. Session inter-arrival times have been shown to be hyper-exponentially distributed, and session durations and sizes exhibit long-tail behaviour if the session involves data transfer. It is important to note that no true heavy-tail behaviour is present, which indicates that the BitTorrent system is fairly well-behaved with respect to session sizes and durations. Also, the long-tail models presented tend to slightly overestimate durations and sizes, which makes them suitable for prediction purposes.

Furthermore, accurate models have been obtained for the most bandwidth-consuming BitTorrent messages. The inter-arrival and inter-departure times of these messages have been shown to be well modelled as exponentially distributed. Upstream data request rates have been shown to be Gaussian under certain circumstances. The corresponding downstream rates are, however, long-tailed, partly due to the end-game mode of the BitTorrent protocol suite.

In addition to the models for the BitTorrent protocol, a fitness assessment method has been presented. The method circumvents problems that classical fitness tests exhibit with large numbers of observations.

10.1 Future Work

A slight drawback of the message models reported in Section 9.2 is that they primarily model aggregate characteristics. A natural continuation of the work is to extend the aggregate models to per-flow models. Extending the message models to other BitTorrent messages is also an interesting prospect. This would allow very detailed simulation models to be built. It would also make it possible to evaluate the behaviour of specific BitTorrent client types, to see whether there are any inherent protocol invariants.

Tracker information is still missing from the measurements. Ideally, a tracker and a collaborating peer should be used for measurements. This would make it possible to assess, e.g., to what degree the flash-crowd effect influences a specific peer (in both seed and leech phases). The measurements presented herein were performed during mid 2004. The Internet has not remained static since then, and further measurements to verify the results presented would be of interest. For example, some of the latest developments (summer of 2005) in the BitTorrent community aim at removing the dependency on the tracker. This would add signalling load on the network, something that BitTorrent networks have previously been more or less devoid of.

The fitness assessment method reported in Section 7.3 suffers from insensitivity to tail discrepancies. Further work on suitable weighting and normalisation for the fitness measure is needed to increase the applicability of the method. It has been successfully used for parameter optimisation in other work, but modification for use as a general error percentage is still needed.


Appendix A

BitTorrent Protocol Details

This appendix provides a fairly complete description of the BitTorrent protocol as defined in [20]. Where applicable, notes have been added to expound on the specification.

A.1 Bencoding Types

strings

Strings are length-prefixed. The length is given in base ten, in ASCII character encoding, and is followed by a colon, immediately followed by the specified number of characters as string data. Note that the string encoding does not imply that the string data are human-readable, i.e., in the printable ASCII range. Strings can carry any 8-bit value and are commonly used to carry binary data. Example: 3:BIT encodes the string “BIT”.

integers

Integers are encoded by enclosing a base ten ASCII-coded numerical string between i and e. Negative numbers are accepted, but leading zeroes are not, except in the case of the value 0 itself. Example: i23e encodes the integer 23.

lists

Lists are encoded by enclosing any valid bencoding types, including other lists, between l and e. Elements of different types may be mixed in one list. Example: l3:agei30ee encodes a list containing the string “age” and the integer 30.

dictionaries

Dictionaries are encoded by enclosing (key, value) pairs between d and e. The keys must be bencoded strings and appear in sorted order, and the values may be any valid bencoding type, including other dictionaries. Example: d3:agei30e5:likesl4:food5:drinke4:name5:jamese

encodes the structure:

age: 30
likes: {food, drink}
name: james
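The encoding rules above are compact enough to implement directly. The following is a minimal sketch of an encoder and a lenient decoder (it does not reject leading zeroes or unsorted keys); the function names are hypothetical, and Python str values are assumed to be encoded as UTF-8 before length-prefixing.

```python
def bencode(value):
    """Encode int, str/bytes, list or dict as bencoding (dict keys sorted as raw bytes)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data):
    """Decode one complete bencoded value; strings come back as bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data, i):
    tag = data[i:i + 1]
    if tag == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if tag in (b"l", b"d"):
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            if i >= len(data):
                raise ValueError("unterminated list or dictionary")
            item, i = _decode(data, i)
            items.append(item)
        if tag == b"l":
            return items, i + 1
        return dict(zip(items[0::2], items[1::2])), i + 1  # alternating key, value
    if tag.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencoding at offset {i}")


encoded = bencode({"age": 30, "name": "james", "likes": ["food", "drink"]})
print(encoded)
print(bdecode(encoded))
```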

A.2 Peer Wire Protocol Messages

piece

The only payload-related protocol message. The message contains one subpiece.

request

A peer wishing to download uses the request-message to notify the sending peer which subpiece it wants.

cancel

If a peer has previously sent a request message, this message may be used to withdraw the request before it has been serviced. Mostly used during end-game mode (Section 3.6.1).

interested

This message is sent by a peer to another peer to notify it that the first peer intends to download some data. See Section 3.4 for a description of this and the following three messages.

not interested

This is the negation of the previous message. It is sent when a peer no longer wants to download.

choke

This message is sent by a data-transmitting peer to notify the receiving peer that it will no longer be allowed to download.

unchoke

The negation of the previous message. Sent by a transmitting peer to a peer that has previously sent it an interested message.

have

After completing and verifying the download of a piece, the peer sends this message to all its connected peers to notify them that the piece is now available from it.

bitfield

Only sent immediately after the initial BitTorrent handshake, as the first message exchanged between the connecting peers. Contains a bitfield indicating which pieces the sending peer has.

keepalive

Empty message, to keep a connection alive.
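On the wire, each of the messages above is sent as a four-byte big-endian length prefix followed by a one-byte message ID and any payload, while keepalive is a bare zero length prefix. The Python sketch below assumes the message IDs of the published protocol specification [20] (choke 0, unchoke 1, interested 2, not interested 3, have 4, bitfield 5, request 6, piece 7, cancel 8); all helper names are hypothetical.

```python
import struct

# Message IDs as assigned in the published BitTorrent protocol specification.
CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED = 0, 1, 2, 3
HAVE, BITFIELD, REQUEST, PIECE, CANCEL = 4, 5, 6, 7, 8


def frame(msg_id=None, payload=b""):
    """Length-prefix a message; calling frame() with no ID yields a keepalive."""
    if msg_id is None:
        return struct.pack(">I", 0)
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def request(index, begin, length):
    """Ask for the subpiece at byte offset begin of piece index."""
    return frame(REQUEST, struct.pack(">III", index, begin, length))


def cancel(index, begin, length):
    """Withdraw an earlier request; same payload layout as request."""
    return frame(CANCEL, struct.pack(">III", index, begin, length))


def piece(index, begin, block):
    """Carry one subpiece of data."""
    return frame(PIECE, struct.pack(">II", index, begin) + block)


def have(index):
    """Announce a newly completed piece."""
    return frame(HAVE, struct.pack(">I", index))


def bitfield(pieces_held, num_pieces):
    """Piece 0 maps to the high bit of the first byte; spare bits stay zero."""
    bits = bytearray((num_pieces + 7) // 8)
    for i in pieces_held:
        bits[i // 8] |= 0x80 >> (i % 8)
    return frame(BITFIELD, bytes(bits))


def split_messages(buffer):
    """Split received bytes into (msg_id, payload) pairs plus any unparsed tail.

    Keepalives are returned as (None, b"").
    """
    messages, offset = [], 0
    while offset + 4 <= len(buffer):
        (length,) = struct.unpack_from(">I", buffer, offset)
        if offset + 4 + length > len(buffer):
            break  # message not fully received yet
        body = buffer[offset + 4:offset + 4 + length]
        messages.append((body[0], body[1:]) if length else (None, b""))
        offset += 4 + length
    return messages, buffer[offset:]
```

choke, unchoke, interested and not interested carry no payload, so frame(CHOKE) and its siblings produce five-byte messages.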

A.3

Tracker Request Parameters

A.3.1

Mandatory

Each announce request must include the following parameters:

info hash

The SHA1 hash of the bencoded value of the info field in the torrent file.

peer id

A 20-byte string that uniquely identifies the requesting peer. There is no consensus on how this value should be generated, but several distinct ID-generation schemes have appeared, and these may be used to identify which client a peer is running. The official protocol description [20] and the Wiki [15] also disagree on its encoding: the original specification states that this field most likely will have to be URL-escaped, while the Wiki claims that it must not be escaped.

port

The listening port of the client. The default port range for the reference client is 6881–6889. Each active swarm needs a separate port in the default client, but third-party clients have implemented single-port functionality.

uploaded

The total number of bytes uploaded to all peers in the swarm, encoded in base ten ASCII. The specification does not state whether this count includes retransmissions.

downloaded

The total number of bytes downloaded from all peers in the swarm, encoded in base ten ASCII. The specification does not state whether this count includes retransmissions.

left

The total number of bytes left to download, also encoded in base ten ASCII.

A.3.2

Optional Parameters

The following parameters may optionally be included:

compact


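If set to 1, the peers field of the tracker response described below is not a list of dictionaries but a single binary string of peer addresses and ports, six bytes per peer: a four-byte IPv4 address followed by a two-byte port, both in network byte order.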

numwant

Specifies the number of peers that the requesting peer wants to receive from the tracker.

event

May be one of:

started: The first request to the tracker must include this parameter–value pair.
stopped: Should be specified when shutting down, to indicate a graceful shutdown.
completed: Included to notify the tracker once a download is complete. It should not be included when joining a swarm with the full content already available.

key

Used as a session identifier.

A.3.3

Tracker Replies

interval

Indicates the number of seconds between subsequent requests to the tracker.

complete

Number of seeds in the swarm.

incomplete

Number of leechers in the swarm.

peers

Contains a list of dictionaries. Each dictionary in this list has the following keys:

peer id

The peer id parameter that the peer has reported to the tracker.

ip

IP address or DNS name of the peer.

port

Listening port of the peer.
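The request parameters of Section A.3 and the reply keys above can be tied together in a short Python sketch. On the wire the parameters are spelled info_hash and peer_id. The sketch takes the bencoded info dictionary as input, percent-escapes the 20-byte peer id as the original specification suggests (the Wiki disagrees, as noted above), and parses an already-decoded reply dictionary with bytes keys. It assumes the compact peers value is a string of six bytes per peer (IPv4 address, then port, in network byte order); the function names and any tracker URL are hypothetical.

```python
import hashlib
import socket
import struct
from urllib.parse import urlencode


def announce_url(tracker, bencoded_info, peer_id, port,
                 uploaded, downloaded, left, event=None, compact=True):
    """Build an announce URL from the parameters of Section A.3 (sketch)."""
    params = {
        # info_hash: SHA1 of the bencoded info dictionary from the torrent file.
        "info_hash": hashlib.sha1(bencoded_info).digest(),
        "peer_id": peer_id,  # 20 raw bytes, percent-escaped by urlencode
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
    }
    if compact:
        params["compact"] = 1
    if event is not None:  # "started", "stopped" or "completed"
        params["event"] = event
    separator = "&" if "?" in tracker else "?"
    return tracker + separator + urlencode(params)


def reply_peers(reply):
    """Return (ip, port) pairs from a decoded tracker reply with bytes keys."""
    peers = reply[b"peers"]
    if isinstance(peers, bytes):
        # Compact form: 4-byte IPv4 address + 2-byte port per peer,
        # both in network byte order.
        if len(peers) % 6:
            raise ValueError("compact peer string is not a multiple of 6 bytes")
        return [(socket.inet_ntoa(peers[i:i + 4]),
                 struct.unpack(">H", peers[i + 4:i + 6])[0])
                for i in range(0, len(peers), 6)]
    # Dictionary form: one dictionary per peer with peer id, ip and port keys.
    return [(p[b"ip"].decode("ascii"), p[b"port"]) for p in peers]
```

Calling announce_url with event="started" produces the first announce of a session; later periodic announces simply omit the event parameter.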


A.4

Scrape Response Keys

complete

Number of seeds for the specific swarm.

downloaded

Number of registered completed events for the specific swarm.

incomplete

Number of leechers for the specific swarm.

name


This optional field contains the name of the file as defined in the name-field in the torrent file.

Appendix B

BitTorrent XML Log File

The XML document type used for the BitTorrent log files comprises only two elements: EVENTLIST and EVENT. The EVENTLIST element carries information regarding the torrent file used for the measurement and the settings that were used for the BitTorrent client during the measurement session. Figure B.1 shows two excerpts from such an XML document. Section B.1 gives the Document Type Definition (DTD) for the BitTorrent XML log.

Every EVENT element contains the attributes type and timestamp. The timestamp attribute signifies the time at which the event was ejected to the log file, expressed as a UNIX timestamp, i.e., the number of seconds elapsed since 00:00:00 UTC, January 1, 1970. The type attribute denotes the event type. The possible values of the type attribute are:

announce

The only tracker-related event type. It is ejected into the log file when the peer communicates with the tracker to request more peers. This element carries the following attributes:

uploaded

Denotes the number of subpiece bytes this peer has sent to other peers since it was launched.

downloaded

Denotes the number of subpiece bytes this peer has received from other peers since the client was launched.

Figure B.1: Extract from BitTorrent XML log file

left

Denotes the number of bytes of the resource that remain to be downloaded.

last

This parameter is undocumented in both the official protocol specification and the Wiki.

trackerid

Used by the tracker for maintaining state.

event

Is one of started, none or completed. The value started should be used when sending the initial tracker announce message, and only then. The None value is used when transmitting the periodic updates to the tracker, while the value completed is sent exactly once to the tracker when the download is complete.

numwant

Denotes the number of new peer addresses the peer is requesting from the tracker.

start_dl

This element is ejected for every newly initiated TCP connection to a peer. Note that it does not necessarily imply that the BitTorrent handshake will be completed.

connect

This element is ejected after every completed BitTorrent handshake.

unchoke, choke, interested, not interested, request, piece, have, cancel

These event types are ejected for each corresponding BitTorrent protocol message sent or received.

send

The send-element is the equivalent of the piece-message, but for the subpieces the local peer transmits.

done

This element is ejected once a download completes fully, and should only appear once per log file.
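The type and timestamp attributes are already enough for a first overview of a measurement. The Python sketch below parses a log held in memory with the standard ElementTree module, counts EVENT elements per type and reports the time of the first event and the logged time span. The function name and the toy document used to exercise it are hypothetical; real logs carry the additional attributes described in this appendix.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timezone


def summarize_log(xml_text):
    """Count EVENT elements per type attribute and measure the logged time span.

    Returns (counts, first_event_time_utc, span_in_seconds).
    """
    root = ET.fromstring(xml_text)  # the EVENTLIST element
    counts = Counter()
    stamps = []
    for event in root.iter("EVENT"):
        counts[event.get("type")] += 1
        stamps.append(float(event.get("timestamp")))
    if not stamps:
        return counts, None, 0.0
    first = datetime.fromtimestamp(min(stamps), tz=timezone.utc)
    return counts, first, max(stamps) - min(stamps)
```

For large log files, xml.etree.ElementTree.iterparse can process events incrementally instead of building the whole tree in memory.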

The various peer-related event types carry event-specific information in additional attributes. These additional attributes are:

src, dst

These attributes indicate the IP addresses of the sending and receiving peer, respectively. Valid for all event types.

srcid, dstid

These attributes indicate the peer IDs of the sending and receiving peer, respectively. Their contents are encoded using the Python functions repr and xml.sax.saxutils.escape. The function repr returns an unambiguous, printable string representation of its argument, and the escape function returns an XML-escaped version of its input. Recall that the peer ID is a binary 20-byte value. The peer ID is first processed by the repr function, which converts any non-printable character to its Python hexadecimal representation, i.e., the characters \x followed by the hexadecimal value. This string is then made into a valid XML attribute value by the xml.sax.saxutils.escape function, which converts XML special characters such as & to &amp;, with the exception of the quotation character, ", which is encoded using the Python hexadecimal encoding (\x22). For a complete list of XML entity encodings see [80]. Valid for all event types except start_dl.

piece

Denotes which piece a specific message refers to. Valid for types piece, cancel, send, have and request.

begin

Starting byte of a subpiece reference. Used together with the length parameter to denote a specific subpiece. Valid for types piece, cancel, send and request.


length

Number of content data bytes received or sent in a single piece message. Valid for types piece, send and cancel.

down

Denotes the number of downloaded and SHA1-verified bytes. Only valid for type have.

nconns

This attribute denotes the number of currently connected peers at the time of event ejection. This includes both locally and remotely initiated connections. Valid for all event types.

port

Indicates the TCP port of the remote peer. Valid for type start_dl only.

direction

Only valid for types have and bitfield. Used for differentiating between sent and received messages of these types. If the attribute is present and contains the value out, the message was sent by the measurement peer, otherwise it was received.

txtime

The difference in time between the sending of the first subpiece of a piece and the reception of the last subpiece of the piece.

rxtime

The difference in time between the first request of a piece and the reception of the last subpiece of the piece.
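The srcid and dstid encoding described above can be approximated in a few lines. This is a hedged Python 3 sketch rather than the logging code used for the measurements: that code predates Python 3, so its repr output lacks the b prefix Python 3 adds to bytes, and the handling of the quotation character is assumed here to be a plain substitution passed through the optional entities argument of xml.sax.saxutils.escape. The function name encode_peer_id is hypothetical.

```python
from xml.sax.saxutils import escape


def encode_peer_id(peer_id: bytes) -> str:
    """Render a binary peer ID as an XML-safe srcid/dstid attribute value."""
    if len(peer_id) != 20:
        raise ValueError("a peer ID is 20 bytes long")
    # repr() shows printable bytes as-is and the rest as escape sequences
    # such as \x00; escape() then replaces &, < and > with XML entities and,
    # through the extra mapping, the quotation character with \x22.
    return escape(repr(peer_id), {'"': "\\x22"})
```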


B.1
134

BitTorrent Application Log DTD EVENTLIST (#PCDATA | EVENT)*> EVENTLIST start_timestamp CDATA #IMPLIED> EVENTLIST peertype CDATA #IMPLIED> EVENTLIST version CDATA #IMPLIED> EVENTLIST bound_ip CDATA #IMPLIED> EVENTLIST bound_port CDATA #IMPLIED> EVENTLIST tracker_ip CDATA #IMPLIED> EVENTLIST tracker_port CDATA #IMPLIED> EVENTLIST peer_id CDATA #IMPLIED> EVENTLIST pieces CDATA #IMPLIED> EVENTLIST piecesize CDATA #IMPLIED> EVENTLIST nfiles CDATA #IMPLIED> EVENTLIST totlen CDATA #IMPLIED> EVENTLIST max_slice_length CDATA #IMPLIED> EVENTLIST rarest_first_cutoff CDATA #IMPLIED> EVENTLIST ip CDATA #IMPLIED> EVENTLIST download_slice_size CDATA #IMPLIED> EVENTLIST snub_time CDATA #IMPLIED> EVENTLIST rerequest_interval CDATA #IMPLIED> EVENTLIST max_uploads CDATA #IMPLIED> EVENTLIST saveas CDATA #IMPLIED> EVENTLIST min_uploads CDATA #IMPLIED> EVENTLIST spew CDATA #IMPLIED> EVENTLIST max_upload_rate CDATA #IMPLIED> EVENTLIST minport CDATA #IMPLIED> EVENTLIST http_timeout CDATA #IMPLIED> EVENTLIST timeout_check_interval CDATA #IMPLIED> EVENTLIST display_interval CDATA #IMPLIED> EVENTLIST max_initiate CDATA #IMPLIED> EVENTLIST max_message_length CDATA #IMPLIED> EVENTLIST upload_rate_fudge CDATA #IMPLIED> EVENTLIST check_hashes CDATA #IMPLIED> EVENTLIST min_peers CDATA #IMPLIED> EVENTLIST keepalive_interval CDATA #IMPLIED> EVENTLIST maxport CDATA #IMPLIED> EVENTLIST request_backlog CDATA #IMPLIED> EVENTLIST bind CDATA #IMPLIED> EVENTLIST max_rate_period CDATA #IMPLIED> EVENTLIST url CDATA #IMPLIED> EVENTLIST statfile CDATA #IMPLIED> EVENTLIST report_hash_failures CDATA #IMPLIED> EVENTLIST timeout CDATA #IMPLIED> EVENTLIST responsefile CDATA #IMPLIED> EVENTLIST max_allow_in CDATA #IMPLIED> EVENT (#PCDATA)> EVENT uploaded CDATA #IMPLIED> EVENT downloaded CDATA #IMPLIED> EVENT left CDATA #IMPLIED> EVENT last CDATA #IMPLIED> EVENT trackerid CDATA #IMPLIED> EVENT event CDATA #IMPLIED> EVENT numwant CDATA #IMPLIED> EVENT port CDATA #IMPLIED> EVENT txtime CDATA #IMPLIED> EVENT rxtime 
CDATA #IMPLIED> EVENT rxstart CDATA #IMPLIED> EVENT direction CDATA #IMPLIED> EVENT down CDATA #IMPLIED> EVENT dst CDATA #IMPLIED> EVENT dstid CDATA #IMPLIED> EVENT nconns CDATA #IMPLIED> EVENT type CDATA #IMPLIED> EVENT timestamp CDATA #IMPLIED>

B.1. BITTORRENT APPLICATION LOG DTD


EVENT EVENT EVENT EVENT EVENT

src CDATA #IMPLIED> srcid CDATA #IMPLIED> piece CDATA #IMPLIED> begin CDATA #IMPLIED> length CDATA #IMPLIED>

135

APPENDIX B. BITTORRENT XML LOG FILE

136

Bibliography

[1] Akamai. http://www.akamai.com, August 2005.
[2] CIFS: A common internet file system. http://www.microsoft.com/mind/1196/cifs.asp, August 2005.
[3] ICQ. http://www.icq.com, August 2005.
[4] MSN Messenger. http://messenger.msn.com, August 2005.
[5] The R project. http://www.r-project.org, August 2005.
[6] The World Wide Web Consortium. http://www.w3.org/, September 2005.
[7] Yahoo! Messenger. http://messenger.yahoo.com, August 2005.
[8] D. Eastlake 3rd and P. Jones. US Secure Hash Algorithm 1 (SHA1), September 2001. RFC 3174.
[9] A. Parker, CacheLogic. The true picture of peer-to-peer file sharing. http://www.cachelogic.com/research/slide9.php, May 2005.

[10] Charles Annis. InterOcular trauma test. http://www.statisticalengineering.com/interocular.htm, August 2005.
[11] Azureus. http://azureus.sourceforge.net/, August 2005.
[12] Jan Beran. Statistics for Long-Memory Processes. Chapman & Hall, 1994.
[13] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax, August 1998. RFC 2396.
[14] T. Berners-Lee, L. Masinter, and M. McCahill. Uniform Resource Locators (URL), December 1994. RFC 1738.
[15] BitTorrent specification. http://wiki.theory.org/BitTorrentSpecification, February 2005.
[16] J. Chapweske. Tree hash exchange format. http://open-content.net/specs/draft-jchapweske-thex-02.html, February 2005.
[17] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A distributed anonymous information storage and retrieval system. White paper.
[18] Clip2. The Annotated Gnutella Protocol Specification v0.4. The Gnutella Developer Forum (GDF), 1.8th edition, July 2003. http://groups.yahoo.com/group/the_gdf/files/Development/.
[19] Bram Cohen. BitTorrent. http://bitconjurer.org/BitTorrent/, August 2005.
[20] Bram Cohen. BitTorrent protocol specification. http://www.bitconjurer.org/BitTorrent/protocol.html, February 2005.
[21] Doru Constantinescu. Measurements and models of one-way transit time in IP routers, 2005. Licentiate thesis, Blekinge Institute of Technology.

[22] Mark E. Crovella and Murad S. Taqqu. Estimating the heavy tail index from scaling properties. Methodology and Computing in Applied Probability, 1(1), 1999.
[23] Mark E. Crovella, Murad S. Taqqu, and Azer Bestavros. A Practical Guide to Heavy Tails, chapter Heavy-Tailed Probability Distributions in the World Wide Web, pages 3–27. Birkhäuser, 1998. ISBN 0-8176-3951-9.
[24] D. Qiu and R.J. Srikant. Modeling and performance analysis of BitTorrent-like peer-to-peer networks. Technical report, University of Illinois at Urbana-Champaign, USA, 2004.
[25] Ralph B. D’Agostino and Michael A. Stephens, editors. Goodness-of-fit Techniques. Dekker, 1986.
[26] distributed.net. distributed.net. http://distributed.net, February 2005.
[27] eDonkey. http://www.edonkey.com, February 2005.
[28] Wikipedia Encyclopedia. Peer-to-peer. http://en.wikipedia.org/wiki/P2p, August 2005.
[29] David Erman, Dragos Ilie, and Adrian Popescu. BitTorrent session characteristics and models. In Demetres Kouvatsos, editor, Technical Proceedings. HET-NETs ’05 - 3rd International Working Conference on Performance Modelling and Evaluation of Heterogeneous Networks, 2005.
[30] David Erman, Dragos Ilie, and Adrian Popescu. Peer-to-peer traffic measurements. Technical report, Blekinge Institute of Technology, Karlskrona, Sweden, 2005.
[31] David Erman, Dragos Ilie, Adrian Popescu, and Arne A. Nilsson. Measurement and analysis of BitTorrent traffic. In NTS 17, August 2004.
[32] Jean-loup Gailly and Mark Adler. zlib. http://www.gzip.org/zlib, August 2005.

[33] Krishna P. Gummadi, Stefan Saroiu, and Steven D. Gribble. King: estimating latency between arbitrary internet end hosts. In Proceedings of the Second ACM SIGCOMM Workshop on Internet Measurement, pages 5–18. ACM Press, 2002.
[34] Liang Guo, Mark Crovella, and Ibrahim Matta. TCP congestion control and heavy tails. Technical Report 2000-017, March 2000.
[35] M.R. Horton. UUCP mail interchange format standard, February 1986. RFC 976.
[36] Dragos Ilie, David Erman, and Adrian Popescu. Traffic measurements of P2P systems. Swedish National Computer Networking Workshop (SNCNW04), November 2004.
[37] Dragos Ilie, David Erman, Adrian Popescu, and Arne A. Nilsson. Measurement and analysis of Gnutella signaling traffic. In IPSI 2004, September 2004.
[38] M. Izal, G. Urvoy-Keller, E.W. Biersack, P.A. Felber, A. Al Hamra, and L. Garcés-Erice. Dissecting BitTorrent: Five months in a torrent’s lifetime. In PAM2004, 2004.
[39] J.A. Pouwelse, P. Garbacki, D.H.J. Epema, and H.J. Sips. The BitTorrent P2P file-sharing system: Measurements and analysis. 4th International Workshop on Peer-to-Peer Systems (IPTPS’05), February 2005.
[40] Van Jacobson, C. Leres, and S. McCanne. Tcpdump. http://www.tcpdump.org, August 2005.
[41] Raj Jain. The Art of Computer Systems Performance Analysis. John Wiley & Sons, 1991. ISBN 0-471-50336-3.
[42] Ajit K. Jena, Adrian Popescu, and Arne A. Nilsson. Modeling and evaluation of internet applications. In International Teletraffic Conference, Berlin, Germany, August 2003. ITC18.
[43] B. Kantor and P. Lapsley. Network News Transfer Protocol, February 1986. RFC 977.

[44] Thomas Karagiannis, Andre Broido, Michalis Faloutsos, and kc claffy. Transport layer identification of P2P traffic. IMC’04, 2004.
[45] J. Klensin, Ed. Simple Mail Transfer Protocol, April 2001. RFC 2821.
[46] Tor Klingberg and Raphael Manfredi. Gnutella 0.6. The Gnutella Developer Forum (GDF), 200206-draft edition, June 2002. http://groups.yahoo.com/group/the_gdf/files/Development/.
[47] Balachander Krishnamurthy and Jennifer Rexford. Web Protocols and Practice. Addison Wesley, 2001. ISBN 0-201-71088-9.
[48] Averill M. Law and W. David Kelton. Simulation Modeling and Analysis. McGraw-Hill, 2000. ISBN 0-07-059292-6.
[49] Will E. Leland, Murad S. Taqqu, Walter Willinger, and Daniel V. Wilson. On the self-similar nature of Ethernet traffic. In Deepinder P. Sidhu, editor, ACM SIGCOMM, pages 183–193, San Francisco, California, 1993.
[50] George Marsaglia and John Marsaglia. Evaluating the Anderson-Darling distribution. Journal of Statistical Software, 9(2):1–5, February 2004.
[51] Steven McCanne and Van Jacobson. The BSD packet filter: A new architecture for user-level packet capture. In USENIX Winter, pages 259–270, 1993.
[52] Sun Microsystems. NFS: Network File System Protocol specification, March 1989. RFC 1094.
[53] Napster. Napster. http://www.napster.com, August 2005.
[54] NeoModus. DirectConnect. http://www.neo-modus.com, February 2005.
[55] Motion Picture Association of America, August 2005. http://www.mpaa.com.
[56] Recording Industry Association of America, August 2005. http://www.riaa.com.

[57] National Institute of Standards and Technology. Specifications for secure hash standard. http://www.itl.nist.gov/fipspubs/fip180-1.htm, April 1995. FIPS PUB 180-1.
[58] Shawn Ostermann. Tcptrace. http://www.tcptrace.org, August 2005.
[59] Conny Palm. Intensitätsschwankungen im Fernsprechverkehr. PhD thesis, Royal Institute of Technology, 1943.
[60] Kihong Park and Walter Willinger, editors. Self-Similar Network Traffic and Performance Evaluation. Wiley Interscience, 2000. ISBN 0-471-31974-0.
[61] Vern Paxson. Empirically derived analytic models of wide-area TCP connections. IEEE/ACM Transactions on Networking, 1994.
[62] Vern Paxson and Sally Floyd. Wide area traffic: the failure of Poisson modeling. IEEE/ACM Transactions on Networking, 3(3):226–244, 1995.
[63] Vern Paxson and Sally Floyd. Why we don’t know how to simulate the internet. In Winter Simulation Conference, pages 1037–1044, 1997.
[64] J. Postel and J.K. Reynolds. File Transfer Protocol, October 1985. RFC 959.
[65] The Free Network Project. The free network project. http://freenet.sourceforge.net.
[66] Gary R. Wright and W. Richard Stevens. TCP/IP Illustrated: The Implementation, volume 2. Addison-Wesley, 1995. ISBN 0-201-63354-X.
[67] S. Resnick. Heavy tail modeling and teletraffic data.
[68] Jordan Ritter. Why Gnutella can’t scale. No, really. February 2001. http://www.darkridge.com/~jpr5/doc/gnutella.html.

[69] Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble. A measurement study of peer-to-peer file sharing systems. In Proceedings of the Multimedia Computing and Networking (MMCN), January 2002.
[70] Rüdiger Schollmeier. A definition of peer-to-peer networking for the classification of peer-to-peer architectures and applications. In Proceedings of the First International Conference on Peer-to-Peer Computing. IEEE, 2001.
[71] Sharman Networks. KaZaA. http://www.kazaa.com, February 2005.
[72] S. Shepler, B. Callaghan, D. Robinson, R. Thurlow, C. Beame, M. Eisler, and D. Noveck. Network File System (NFS) version 4 Protocol, April 2003. RFC 3530.
[73] Soulseek. http://www.slsknet.org/, August 2005.
[74] William Stallings. High-Speed Networks and Internets. Prentice-Hall, Inc., second edition, 2002. ISBN 0-13-032221-0.
[75] The SETI@Home Project. SETI@Home – the search for extraterrestrial intelligence. http://setiathome.ssl.berkeley.edu/, February 2005.
[76] D.M. Titterington, A.F.M. Smith, and U.E. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons, 1985. ISBN 0-471-90763-4.
[77] Olaf van der Spek. BitTorrent udp-tracker protocol extension. http://libtorrent.sourceforge.net/udp_tracker_protocol.html, February 2005.
[78] Guido van Rossum et al. Python. Online at http://www.python.org, August 2005.
[79] A. Veres, Z. Kenesi, S. Molnar, and G. Vattay. On the propagation of long-range dependence in the internet. In Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, Aug.–Sep. 2000.

[80] W3C. Extensible Markup Language (XML) 1.0, 2004.
[81] M.P. Wand. Data-based choice of histogram bin width.
[82] Carey Williamson. Internet traffic measurement, 2001.
[83] Stephen Yantis, David E. Meyer, and J.E. Keith Smith. Analyses of multinomial mixture distributions: New tests for stochastic models of cognition and action. Psychological Bulletin, 110(2):350–374, 1991.
[84] ZetaGrid. ZetaGrid. http://www.zetagrid.net/, February 2005.