Christian Spielvogel, Dipl.-Ing.
A PROXY-TO-PROXY(X2X) FRAMEWORK FOR MULTIMEDIA ADAPTATION AND DELIVERY
DISSERTATION
A dissertation submitted for the academic degree of Doktor der technischen Wissenschaften (Doctor of Technical Sciences)
Alpen-Adria-Universität Klagenfurt, Fakultät für Technische Wissenschaften
1st Reviewer:
Dr. László Böszörményi (Full Professor)
Institute:
Institute of Information Technology, University of Klagenfurt, Austria
2nd Reviewer:
Dr. Frank Eliassen (Full Professor)
Institute:
Department of Informatics, University of Oslo, Norway
Contents

List of Tables
List of Figures
Abstract

1 Introduction

I Content Distribution Topologies

2 Content Delivery Networks
   2.1 An overview of Content Delivery Networks
   2.2 Content Distribution
       2.2.1 Push-based systems
       2.2.2 Pull-based systems
   2.3 Request Routing
   2.4 Media Streaming in Content Delivery Networks
       2.4.1 Selecting the Streaming Server
       2.4.2 Failure Handling in the Akamai Network
       2.4.3 QoS aware Media Delivery

3 Peer-to-Peer Networks
   3.1 Centralized Architecture
       3.1.1 File Sharing Example: Napster
       3.1.2 Media Streaming Example: DirectStream
   3.2 Decentralized Architectures
       3.2.1 File Sharing Example: Gnutella
       3.2.2 Media Streaming Example: GnuStream
   3.3 Hybrid Architectures
       3.3.1 File Sharing Example: KaZaA
       3.3.2 File Sharing Example: BitTorrent
       3.3.3 Streaming Example: BiTos

4 Proxy Affinity based Topology Generation
   4.1 Basic characteristics of Peer-to-Peer and Content Delivery Networks
       4.1.1 Peer-to-Peer Network Scenario
       4.1.2 Content Delivery Network Scenario
   4.2 The ProxyAffinity Model
       4.2.1 Semantical Closeness (SC)
       4.2.2 Network Closeness (NC)
       4.2.3 Proxy-to-Proxy Scenario

5 Proxy-to-Proxy: Implementation
   5.1 Proxy Groups
   5.2 Measuring Network Characteristics
       5.2.1 The Discovery Layer
       5.2.2 The Data Collection Layer
       5.2.3 The Forecast Layer

II Error Concealment and Error Avoidance

6 Forward Error Correction
   6.1 Media-Independent Forward Error Correction
       6.1.1 Duplicating Original Packets
       6.1.2 Generating Parity Packets
       6.1.3 Comparing Packet Arrival Probabilities

7 Layered Video Coding
   7.1 Scalable Coding
       7.1.1 Temporal Scalability
       7.1.2 Spatial Scalability
       7.1.3 Scalable Coding: Signal-to-Noise-Ratio Scalability
   7.2 Multiple Description Coding
       7.2.1 Temporal Polyphase Down-Sampling
       7.2.2 Spatial Polyphase Down-Sampling

8 Stream Affinity based Error Treatment
   8.1 Multiple Source Streaming
       8.1.1 Scenario 1 - Round Robin Scheduling
       8.1.2 Scenario 2 - Coding Aware Fair Scheduling
       8.1.3 Scenario 3 - Coding Aware Weighted Fair Scheduling
       8.1.4 Scenario 4 - Coding Aware Weighted Fair Scheduling with Forward Error Correction
       8.1.5 Conclusion from the Streaming Scenarios
   8.2 The Stream-Affinity Metrics
       8.2.1 Network Closeness
       8.2.2 Quality Closeness
   8.3 Error Treatment Selection based on the A* Algorithm
   8.4 Evaluation
       8.4.1 Scenario 1
       8.4.2 Scenario 2

III Constraint based Data Replication

9 Caching
   9.1 Standalone Caching
   9.2 Hierarchical Caching
   9.3 Distributed Caching
   9.4 Hybrid Caching
   9.5 Cache Replacement Strategies

10 Replication Affinity
   10.1 The Replication Affinity Metrics
       10.1.1 Placement Affinity (PA)
       10.1.2 Reallocation Affinity (RA)
   10.2 Search based Replication and Reallocation Decisions
   10.3 Scenarios and Evaluation
       10.3.1 Scenario 1
       10.3.2 Scenario 2
       10.3.3 Scenario 3
       10.3.4 Reallocation Scenarios

11 Future Work
   11.1 Eventually Consistent Indexing

A Source Code

Bibliography
List of Tables

4.1 Emulation result scenario 1
4.2 Emulation result scenario 2
4.3 Group formation
4.4 Throughput
4.5 Proxy-Groups
4.6 Proxy-Group Throughput
6.1 Arrival Probabilities
7.1 Temporal MDC downsampling results
7.2 Summary of temporal MDC results
7.3 Temporal MDC downsampling results
7.4 Quality Difference
7.5 Spatial MDC downsampling results
7.6 Summary of spatial MDC results
7.7 Spatial MDC downsampling results
7.8 Quality Difference
8.1 Stream and Network Characteristics
8.2 Network Closeness
8.3 Result scenario 1
8.4 Possible decisions scenario 2
8.5 Result scenario 2
10.1 State Proxy P1
10.2 State Proxy P3
11.1 Probabilities
List of Figures

2.1 Content Delivery Network use-case
2.2 Content Delivery Network Layers
2.3 Centralized cooperative CDN Architecture
2.4 Decentralized cooperative CDN Architecture
2.5 Akamai Streaming Architecture
3.1 Centralized P2P architecture
3.2 DirectStream P2P Scenario
3.3 Decentralized P2P architecture
3.4 Decentralized P2P architecture
3.5 GnuStream Scenario
3.6 GnuStream Layers
3.7 Hybrid P2P architecture
4.1 Logical Proxy-Client View
4.2 A schematic view of a peer-to-peer network
4.3 A schematic view of a content delivery network
4.4 A peer-to-peer scenario
4.5 A content delivery network scenario
4.6 Logical proxy group view
4.7 Logical PlanetLab view
4.8 Semantically based group formation
4.9 Network throughput based group joining
4.10 Network Closeness based Group Formation
5.1 Logical Proxy View
5.2 The naming manager view
5.3 Proxy join operation
5.4 Proxy Management Schema
6.1 Simple FEC Scheme
6.2 A parity packet FEC scheme
6.3 Parity-packet based forward error correction
7.1 Scalable Coding - Bit Stream Hierarchy
7.2 Block diagram of a temporal scalable encoder
7.3 Block diagram of a temporal scalable decoder
7.4 Block diagram of a spatial scalable encoder
7.5 Block diagram of a spatial scalable decoder
7.6 Block diagram of a signal-to-noise scalable encoder
7.7 Signal-to-Noise Scalable Decoder Scheme
7.8 Temporal Polyphase Down-sampling
7.9 Spatial Down-sampling
8.1 Logical view of the testbed
8.2 Logical View Scenario 1 and Scenario 2
8.3 Logical View Scenario 3
8.4 Scenario 5
8.5 GOP structure
8.6 Graph based error treatment selection
8.7 Logical view scenario 1
8.8 Logical view scenario 2
9.1 Standalone Proxy Architecture
10.1 Replication Affinity (logical view)
10.2 Up- and Downlink
10.3 Graph Search Scenario using A*
10.4 Replication Scenario
10.5 Emulation Environment
11.1 Centralized Proxy-to-Proxy Group
11.2 Decentralized Proxy-to-Proxy Group
11.3 P2P View
11.4 P2P View 2
Abstract

This thesis strives to develop methods for delivering visual content while maximizing quality of service (QoS). To this end, an innovative architecture called proxy-to-proxy is presented. The proposed system can behave like a kind of peer-to-peer system at one extreme, like a kind of content delivery network at the other extreme, or as a combination of both approaches. Tuning the system to behave like a peer-to-peer system or like a content delivery network in the same situation makes it possible to compare the two extremes against each other. By combining content delivery and peer-to-peer network characteristics it is possible to (a) manage content replication on the one hand and (b) enable QoS-based content delivery from "nearby" sources on the other hand.

The tuning between different system behaviors is managed by three affinity functions called (I) proxy affinity, (II) stream affinity and (III) replication affinity. Proxy affinity is required to form the group-based overlay network. Stream affinity is used to find the proper tradeoff between error correction and error avoidance when delivering media streams. Replication affinity is required to make replication and reallocation decisions.

The architecture has been designed, implemented and tested and can be used to stream MPEG-1, MPEG-2 and MPEG-4 multimedia content over IP-based networks. Tests have been performed under real Internet conditions using PlanetLab. In order to emulate system behavior under controlled conditions, the network simulator NS-2 [NS201] has been extended.
CHAPTER 1

Introduction
Streaming video data over best-effort networks is a challenging task with respect to the quality of the received content. The quality decreases with every frame that is (I) corrupted, (II) lost or (III) received after its playback time. The main reasons for lost, delayed or corrupted frames are (a) overloaded streaming servers and (b) crowded network paths. Several solutions already exist to increase server scalability and balance network load between alternative paths; the most popular ones are peer-to-peer systems (Chapter 3) and content delivery networks (Chapter 2). Both approaches have been developed with different intentions, but from an abstract point of view they serve the same purpose of distributing content to users. In this work I analyze characteristics of both approaches and propose an innovative architecture called proxy-to-proxy (Chapter 4). The core components of the architecture are workstations operated by end-users. The workstations are dedicated to storing, processing and sharing multimedia content. In order to respect quality of service constraints, the proxies adapt their behavior to (a) the state of the network, (b) the coding structure of the media files and (c) the end-user requirements. The adaptive behavior is based on three metrics called (1) proxy affinity (Chapter 4), (2) stream affinity (Chapter 8) and (3) replication affinity (Chapter 10).

Proxy affinity is used to build an overlay network of proxy groups that combine characteristics of peer-to-peer systems and content delivery networks. These characteristics are represented by two utility values known as (a) network closeness and (b) semantical closeness. Network closeness and semantical closeness can be weighed against each other to enforce different system behaviors. At one extreme the system can behave like (1) a pure peer-to-peer system, at the other extreme like (2) a content delivery network, or (3) as a combination of both approaches.

The second metric is stream affinity. Stream affinity is used to select the proper treatment to avoid the effects of network errors on video streams. Based on (a) the state of the network and (b) the structure of the video stream, the selection is between (I) error avoidance, (II) error concealment or (III) a combination of both approaches. Error avoidance is achieved by streaming video fragments from multiple senders. The fragments are encoded using polyphase multiple description coding (MDC); if some fragments are lost during transmission, the other fragments are not affected. Error concealment is based on media-independent forward error correction, in which redundant packets can be used to reconstruct lost ones.

The third metric is replication affinity. Replication affinity is used to (I) select the best destination for replicated content and (II) decide which content to replace in case of insufficient storage space. The destination selection depends on content and network parameters. Content parameters are used to calculate the similarity to other videos that are already shared by a certain group. Network parameters are used to estimate the expected (a) replication time and (b) video quality received by later clients. The second aspect of replication affinity is the replacement decision. Replacement decisions are based on (a) the popularity, (b) the availability within the proxy-to-proxy network and (c) the required storage space.

The system is evaluated using (1) emulations and (2) real Internet traces. The emulations are based on an extended version of the network simulator NS-2 [NS201]. I have extended NS-2 to deliver multiple media streams to one receiver and to apply forward error correction. The quality of the received media stream is determined using a plug-in called EvalVid [Jir03]. The evaluation under real network conditions was performed using PlanetLab nodes.

This thesis is logically divided into three parts. Part I first gives a general overview of Content Delivery and Peer-to-Peer Networks and then describes a new architecture called Proxy-to-Proxy. Part II first gives an overview of error concealment and error avoidance techniques for content delivery and then describes the stream affinity model. Part III gives an overview of alternative cache architectures and cache replacement strategies and then presents the replication affinity model.
Part I Content Distribution Topologies
CHAPTER 2

Content Delivery Networks
This chapter gives an overview of the current state of the art of content delivery networks. Content delivery networks are used to replicate content from origin servers to surrogate (cache) servers. Section 2.1 presents a general introduction to content delivery networks, describing the layered architecture with an exemplary use-case. Section 2.2 outlines push- and pull-based content distribution mechanisms. Push-based mechanisms replicate content pro-actively (before any user sends a request), whereas pull-based mechanisms fetch it from the origin server or a nearby surrogate once a request has arrived. Section 2.3 discusses different request routing mechanisms and algorithms for assigning a client request to a surrogate server; making the best assignment is a difficult task because server load and network conditions change continuously. Finally, Section 2.4 gives an overview of the architecture of the Akamai content delivery network, which streams multimedia content with the aim of achieving high end-user satisfaction.
2.1 An overview of Content Delivery Networks
Content Delivery Networks (CDNs) were developed to solve performance problems caused by congested network links and overloaded servers. The three key components in the content delivery process are (1) the content providers, (2) a content delivery network provider and (3) the end-users. A content provider offers content to other users, typically end users, but also professionals from companies and academia. Content providers store the data on the origin server. A content delivery network provider is an organization or company that provides the infrastructure for delivering content in a timely and reliable manner from the origin server to the surrogate servers and from there to the end users. The infrastructure facilities include (1) content replication, (2) request routing, (3) delivery to end-users and (4) accounting mechanisms. End-users send queries to origin servers. When the origin server cooperates with a content delivery network, the request is redirected to a replica server, enabling the content to be delivered in the desired quality. The redirection to a nearby server is typically transparent to the end-user. A scenario for using a content delivery network can be found in Figure 2.1:
Figure 2.1: Content Delivery Network use-case
1. The client sends a request to the origin server.
2. The client receives an HTML page containing references to high-bandwidth-demanding or frequently used content.
3. The request for replicated objects is redirected to the content delivery network provider.
4. The CDN provider's request redirection algorithm selects the surrogate server with the closest proximity to the client.
5. The surrogate server fetches the objects marked for replication from the origin server, forwards them to the client and caches them for further requests.
The tasks described in the scenario are usually performed by four layers:
Figure 2.2: Content Delivery Network Layers

As can be seen in Figure 2.2, the Basic Fabric Layer (Layer 1) is the lowest layer. This layer provides information about infrastructural resources such as (1) origin as well as (2) surrogate servers and (3) the network infrastructure. Information about the network infrastructure is required to (a) replicate the content from origin to surrogate servers and (b) deliver it to end-users. Network infrastructure information can be exchanged either purely at the application level (overlay approach) or by cooperation between the application and the network level (network approach). In the overlay approach, used by most commercial CDN providers, management information is only exchanged between application-level components; network elements (routers, switches, etc.) do not make any content delivery network specific decisions. The advantage of the overlay approach is that content delivery network providers do not need to control the underlying network elements, which simplifies management and the establishment of new services. When the network approach is used, routers and switches are augmented with CDN software, enabling them to redirect requests to surrogate servers according to the current server load and network state.

The Communication and Connectivity Layer (Layer 2) provides the core communication protocols, enabling communication between network elements on the one hand and the origin and surrogate servers on the other hand. An example of network element communication is the TCP-based Network Element Control Protocol (NECP) [M. 00]. NECP is a lightweight protocol for the communication between surrogate servers and content-aware switches or load-balancing routers. The protocol is used to inform network elements about surrogate server capabilities and content availability, serving as a basis for request routing decisions.

The Content Distribution Network Layer (Layer 3) enables fast delivery of (1) static content, (2) streaming media or (3) the provision of the infrastructure for customer-defined services. Examples of static content are images, text documents and HTML pages. Streaming media can be based on (a) live or (b) pre-recorded input, including movie files and music clips for consumption on demand. Customer-defined services are dynamically generated HTML pages (e.g. e-commerce applications). Dynamically generated pages cannot be cached and therefore need special handling.

The End User Layer (Layer 4) defines the interaction between (1) end-users, (2) surrogate servers and (3) origin servers. When the content is available on a surrogate server, the client's request is directed there. Otherwise an additional step is necessary, in which the content is fetched from the origin server and forwarded to the client.
2.2 Content Distribution
Once the surrogate servers have been placed, an efficient content outsourcing technique has to be chosen. The highest cache-hit rates would be achieved by replicating entire origin servers; however, replica placement decisions are heavily influenced by replication costs and space limitations on the surrogate servers. In order to find a tradeoff between full replication and an empty cache, objects similar to the requested ones are replicated. The three commonly used strategies [Geo06] are (1) push-based, (2) uncooperative pull-based and (3) cooperative pull-based replication. Push-based systems are an academic approach; all commercial content delivery network providers use a pull-based one.
2.2.1 Push-based systems
The push-based approach distributes content according to client interests, reducing cache misses. The two most commonly used mechanisms for finding similarities between objects are (1) the coarse-grained user-session-based and (2) the fine-grained URL-based approach. In the user-session-based approach, valuable information about user interests is collected by extracting demographics and historical access patterns from server log files. The collected information includes, for example, (a) IP addresses of end-user devices, (b) the order of accessed web pages, (c) lists of selected links and (d) the elapsed time between successive requests from the same user. This information is used for grouping similar content [Jit01] accessed by different users in the same geographic region. In order to differentiate users that access the same web site from the same computer from those that merely belong to the same network, the concept of a session has been introduced. A session is the unit of interaction between the user and the web server, limited by a timeout value. The second mechanism for selecting content to replicate is the URL-based approach. The mechanism described in [Geo06] uses a directed graph where single web objects are connected to similar objects on the same server. Once the most popular data is identified, the arcs in the graph are followed and the content within a predefined range is clustered. Selecting the cluster size is a tradeoff between storage space utilization and management effort. The analysis in [Yan03a] suggests that replicating 10% of the most popular data is sufficient for serving more than 80% of the requests. As object popularity changes within about one week, the selection must be periodically redefined.
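To make the URL-based mechanism concrete, here is a minimal sketch (hypothetical data structures, not the algorithm from [Geo06]) that seeds the selection with the most popular 10% of objects and then clusters everything within a given hop range of the similarity graph:

```python
from collections import deque

def replication_set(popularity, graph, top_fraction=0.1, hop_range=2):
    """Select the top objects by popularity, then follow the arcs of the
    similarity graph for up to `hop_range` hops to form replication clusters.

    popularity: dict url -> request count
    graph:      dict url -> list of similar urls on the same server
    """
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    seeds = ranked[:max(1, int(len(ranked) * top_fraction))]

    selected = set(seeds)
    for seed in seeds:                       # BFS within the hop range
        frontier = deque([(seed, 0)])
        while frontier:
            url, dist = frontier.popleft()
            if dist == hop_range:
                continue
            for neighbor in graph.get(url, []):
                if neighbor not in selected:
                    selected.add(neighbor)
                    frontier.append((neighbor, dist + 1))
    return selected
```

A larger `hop_range` trades storage space for fewer cache misses, mirroring the cluster-size tradeoff described above.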
2.2.2 Pull-based systems
Pull-based replication represents the state of the art in all commercial content delivery networks [Yan03a]. Client requests are redirected to surrogate servers based solely on network connectivity and server load, ignoring the cache status of the surrogate server. So when a cache miss occurs (which happens very often under this policy), the content has to be fetched from the origin or another surrogate server. When the system is able to exchange content between surrogate servers, it is said to be cooperative. When the origin server has to be contacted in case of a cache miss, the system is said to be uncooperative.

In cooperative systems the cooperation can take various forms, ranging from centralized (tightly-coupled) infrastructures (Figure 2.3) to completely decentralized (loosely-coupled) systems (Figure 2.4).

Figure 2.3: Centralized cooperative CDN Architecture

Figure 2.4: Decentralized cooperative CDN Architecture

In a centralized cooperative architecture, a query message is sent to the central server that stores the meta information about the content available on all surrogates. The advantage is that the query message only has to be sent to one destination. The drawback of this approach is that the central directory server is a single point of failure and likely to become a bottleneck. Using a decentralized cooperative CDN architecture, a query message is broadcast to all cooperating surrogate servers [WC97]. Broadcasting query messages to all cooperating surrogates is a network-intensive and time-consuming task: the querying surrogate has to wait for all replies before it is definitely known that the requested content is not available on any other surrogate. A tradeoff between the centralized and the decentralized approach is the digest-based one, where each participating surrogate maintains references to the content of all other participants. The price for avoiding a flood of queries is that all other surrogates have to be informed in case of an update. The amount of traffic is the same for the query-based and the digest-based approach, but content can be found much faster using the latter.
2.3 Request Routing
Request routing is required to direct a client request to a surrogate server. The main distinction for request routing mechanisms is between (1) transparent and (2) non-transparent approaches and (3) a combination of both. Transparent request redirection is usually performed using modified DNS servers (DNS-based redirection) that return an IP address from the pool of candidate replica servers. The returned address is selected using either (I) adaptive or (II) non-adaptive policies. Adaptive policies consider current system conditions like host and network characteristics or request frequencies. Non-adaptive policies ignore current conditions and use mechanisms like round-robin assignment, which balances the load and treats all requests equally. Adaptive mechanisms are more complex and time-consuming but achieve much better network throughput. DNS-based redirection is popular because of its simplicity and its independence from any replication policy. Its general drawbacks are increased network latency [Bal01, Abb03, Ane01, Zhu02] and the fact that information about the client's location is ignored. DNS-based redirection is only possible when (1) the requested URL is not stored in the client cache and (2) services are referred to by means of DNS names and not by IP addresses.

The second mechanism is non-transparent redirection. Non-transparent redirection enables a much finer granularity than the transparent approach. For example, it is possible to treat each object in an HTML page (e.g. embedded pictures) individually instead of accessing the entire page. Non-transparent redirection can be performed (1) manually or (2) automatically. Manual replica selection is typical for peer-to-peer applications: in case of a successful query the user receives a list that contains all alternative sources and then selects the source from which to download the content. Automatic replica selection is commonly implemented using HTTP redirection. HTTP redirection is based on sending a special message that provides a list of replica servers to the client. The client must process the message and select an alternative server to re-submit the request. HTTP redirection requires modifications to the server as well as to the client.

The third alternative is a combination of transparent and non-transparent redirection, using DNS and HTTP. URLs inside the HTML pages are rewritten so that each DNS name represents a group of replica servers storing the given object. This is beneficial when the objects within the same document are of different types, like video and text. Videos are then served from dedicated streaming servers, possibly using a different transport protocol such as the Real Time Streaming Protocol (RTSP).
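A minimal sketch contrasting the two transparent selection policies (hypothetical server names and an assumed cost weighting; real DNS redirectors are considerably more involved):

```python
import itertools

SERVERS = ["replica-a.example.net", "replica-b.example.net", "replica-c.example.net"]
_round_robin = itertools.cycle(SERVERS)

def select_non_adaptive():
    # Round-robin: ignores current conditions and treats all requests equally.
    return next(_round_robin)

def select_adaptive(metrics):
    """metrics: dict server -> (load in [0, 1], rtt_ms toward the client).
    Adaptive policy: prefer lightly loaded servers on fast paths."""
    def cost(server):
        load, rtt_ms = metrics[server]
        return 0.5 * load + 0.5 * (rtt_ms / 100.0)   # assumed weighting
    return min(metrics, key=cost)
```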
2.4 Media Streaming in Content Delivery Networks
Content delivery networks were originally designed for delivering static content (text documents, pictures, etc.). Streaming a media file typically requires much more bandwidth than delivering static content, and in addition the streams are more sensitive to delay, jitter and packet loss. In order to cope with these effects and enhance the end-user experience, Akamai, the largest content delivery network provider, uses transport and streaming servers that support (1) lost packet recovery, (2) redundant stream delivery and (3) pre-bursting techniques [Kon04].

The Akamai streaming architecture (Figure 2.5) consists of two building blocks: (1) the transportation and (2) the delivery network. The transportation network connects entry points and set reflectors. Entry points act as replication proxies for the origin stream, whereas set reflectors form an overlay network that propagates (delivers in real time) data from the entry points to the streaming servers close to the end users. The streaming servers are located in networks operated by network providers at the edge of the Internet.

Figure 2.5: Akamai Streaming Architecture
2.4.1 Selecting the Streaming Server
User requests are transparently mapped to the streaming server in the nearest edge region using adaptive two-level DNS redirection. Akamai's DNS hierarchy consists of high-level (HLDNS) and low-level (LLDNS) servers. Client requests are first received by the high-level servers, which know all available edge-server regions close to the end-client. The high-level server selects the edge region and forwards the request to the low-level DNS server. The low-level DNS server in the edge region considers all current conditions, like server health, server load and network condition, to select the server for the request. Server health and load are determined by analyzing (1) CPU and (2) disk utilization to decide whether a server is able to handle an additional request. (I) Client location and (II) network load are measured to decide whether a network path between the server and the client has sufficient bandwidth, minimal packet loss and low delay.
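As an illustration, a low-level selection step might filter out unhealthy servers first and then rank the remainder by path quality. The thresholds and field names below are assumptions; Akamai's exact policy is unpublished at this level of detail:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    cpu_util: float      # fraction of CPU in use
    disk_util: float     # fraction of disk I/O capacity in use
    bandwidth_kbps: int  # measured bandwidth toward the client's region
    loss_rate: float     # measured packet loss on the path
    delay_ms: float      # measured path delay

def pick_streaming_server(candidates, required_kbps):
    # Health check: drop servers unable to handle an additional request.
    healthy = [c for c in candidates if c.cpu_util < 0.8 and c.disk_util < 0.8]
    # Path check: require sufficient bandwidth for the stream.
    feasible = [c for c in healthy if c.bandwidth_kbps >= required_kbps]
    if not feasible:
        return None
    # Rank by packet loss first, then delay (assumed ordering).
    return min(feasible, key=lambda c: (c.loss_rate, c.delay_ms))
```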
2.4.2 Failure Handling in the Akamai Network
Entry point failures happen either due to (1) a crash of the entry point itself or (2) a breakdown of the connection between the origin server and the entry point (proxy). In order to ensure that a stream remains available in case of such a failure, multiple entry points are connected to the same origin server, and an ordered list of candidate entry points is created for every streaming server. The first entry in the list is the default entry point; the others can be used in case of failure. The health of the default entry points is checked by sending periodic query messages. A failure is detected when a default entry point does not respond to the query messages within a predefined interval. Once a crash of an entry point is detected, a distributed election algorithm is executed among all backup entry points. The election is necessary to ensure that only one backup server takes over the functionality of the crashed entry point. Once a new entry point has been elected, the information is sent to all others, ceasing their efforts to become the default entry point. The node that wins the election becomes the new default entry point. In case of an ongoing streaming session, the new default entry point has to resume the operation of the crashed entry point and continue pulling the stream from the origin server. When the crashed entry point reconnects, it starts to send heartbeat and announcement information. As soon as this information is received by the set reflectors, the recovered entry point takes over again.

The second type of failure is the loss of the connection between the origin server and the default entry point. As soon as the origin server is unreachable, the entry point stops sending announcements for its streams. In this case a new default entry point is selected as well.
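A minimal sketch of the detection-and-election step (hypothetical interface and timeout; the actual Akamai protocol is not public):

```python
import time

class EntryPointMonitor:
    """Detects a failed default entry point by missing heartbeat replies."""

    def __init__(self, timeout_s=5.0):
        self.timeout_s = timeout_s
        self.last_reply = {}          # entry point id -> time of last reply

    def record_reply(self, entry_point):
        self.last_reply[entry_point] = time.monotonic()

    def failed(self, entry_point):
        last = self.last_reply.get(entry_point)
        return last is None or (time.monotonic() - last) > self.timeout_s

def maybe_elect(monitor, default_ep, backups):
    # If the default entry point is silent, pick a deterministic winner so
    # that every backup arrives at the same result -- a stand-in for the
    # distributed election described above.
    if monitor.failed(default_ep) and backups:
        return min(backups)           # e.g. lowest node id wins
    return default_ep
```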
2.4.3 QoS aware Media Delivery
One possibility to enable error-free delivery is retransmitting lost data using the Transmission Control Protocol (TCP). The drawback of this mechanism is that the real-time constraints of the media stream are not considered, and back-off times and bandwidth throttling harm the end user's experience. As alternatives to TCP, Akamai uses three mechanisms: (1) packet loss recovery, (2) adaptive multipath transmission and (3) frame pre-bursting.

Packet loss recovery

Packet loss recovery is implemented in Akamai using a technique called plain retransmits, which has been shown to perform better than the insertion of parity packets. If the streaming server receives packet i+k from the entry point without having received packet i, the i-th packet is requested again. Requesting the packet only makes sense when the time to retransmit it is shorter than the scheduled streaming time. An important parameter is the tolerance factor k: setting the factor too low wastes network bandwidth because delayed packets are considered lost; setting it too high retransmits lost packets too late to be of use to the streaming server. Even though packet retransmission is able to recover about 99% of the lost packets, it is not able to improve the quality in case a link goes down. Therefore Akamai combines it with adaptive multipath transmission.
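The gap-detection rule behind plain retransmits can be sketched as follows (assumed names and a simplified window; not Akamai's implementation, which would track the scan position incrementally):

```python
def packets_to_retransmit(received, newest_seq, k, retransmit_rtt_s, deadline_s):
    """Plain-retransmit gap detection: packet i counts as lost once packet
    i+k has arrived without i (tolerance factor k), and is re-requested
    only if the retransmission can beat its scheduled streaming time.

    received:    set of sequence numbers received so far
    newest_seq:  highest sequence number received (the packet 'i+k')
    deadline_s:  dict seq -> seconds until the packet is due for streaming
    """
    requests = []
    for i in range(0, newest_seq - k + 1):     # at least k packets behind
        if i not in received and retransmit_rtt_s < deadline_s.get(i, 0.0):
            requests.append(i)                 # still worth asking for
    return requests
```

The two failure modes of k are visible directly: a small k flags merely delayed packets as lost (wasted requests), while a large k defers the request until the deadline check rejects it.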
Adaptive multipath transmission

Adaptive multipath transmission reduces the loss rate between the entry point and the streaming server by sending data over multiple paths. Akamai sends the data from the entry point to different set reflectors, which forward it to the same streaming server. The streaming server merges the packets transparently and delivers one single stream to the receiver. In order to save resources, the number of paths can be adapted dynamically according to the quality information provided by the streaming servers: bad quality increases, good quality decreases the number of adjacent paths. In the best case one path is sufficient, but if a link goes down, rapid detection is required to ensure that the user experience remains unaffected.

Pre-bursting

The third technique to reduce packet loss is called pre-bursting: the frames between the entry point and the streaming server are transferred at a higher rate than between the streaming server and the client. This technique reduces initial buffering times and the risk of buffer underruns during the transmission. The main limiting factor for this approach is the speed of the encoder in case of live streams.
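Returning to the multipath mechanism above, the path-count adaptation can be pictured as a tiny controller (the loss-rate thresholds are assumptions, not Akamai's published values):

```python
def adapt_path_count(current_paths, loss_rate, max_paths=4):
    """Add a redundant path when reported quality is bad, drop one when
    quality has been good, never going below a single path."""
    if loss_rate > 0.02 and current_paths < max_paths:
        return current_paths + 1      # bad quality: widen the redundancy
    if loss_rate < 0.001 and current_paths > 1:
        return current_paths - 1      # good quality: save resources
    return current_paths
```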
CHAPTER 3

Peer-to-Peer Networks
This chapter gives an overview of peer-to-peer networks. Peer-to-peer networks consist of a set of interconnected nodes [Ste04] that are organized into different network topologies with the aim of sharing (1) content, (2) CPU cycles, (3) storage space or (4) network bandwidth. The focus of this chapter is on (a) centralized, (b) decentralized and (c) hybrid peer-to-peer architectures.
3.1 Centralized Architecture
A logical view of a centralized peer-to-peer architecture can be found in Figure 3.1. The main components are the dedicated central server (S) and the peers (P).

Figure 3.1: Centralized P2P architecture

Each peer shares content with all other peers. The central server manages a database containing meta information about the shared content. Queries for content are always sent to the central server; the content itself is always stored on the peers. The main advantage of centralized architectures is that a central database server is simple to implement. All information is available in one place, so either all or no information is available, allowing errors to be detected easily. The main disadvantage of centralized architectures is that they have a central point of failure. Perhaps the best-known example of a centralized peer-to-peer system is Napster.
3.1.1 File Sharing Example: Napster
Napster, a centralized peer-to-peer file sharing system, was developed in 2000 by Shawn Fanning [Hoo04]. Napster differs only slightly from the general centralized architecture described in Section 3.1. An interesting aspect of the Napster implementation is that the central server is replaced by a cluster of multiple nodes. The goal of using multiple nodes is to maximize the number of content queries that can be handled in parallel. The load balancing mechanisms used within the Napster cluster are proprietary and unpublished.
3.1.2 Media Streaming Example: DirectStream
DirectStream [Yan03b] is an on-demand media streaming system that is based on a centralized peer-to-peer network. The central component is a directory server that stores references to the media streams stored on the peers. The peers operate in three different modes: (1) server mode, (2) client mode or (3) double mode (a combination of both). When a peer operates in server mode, it streams content to one or more other peers. In client mode, the peer caches and renders a video stream. When operated in double mode, the received content is rendered, cached and forwarded. The most challenging aspect of DirectStream is the double mode mechanism, which is used to balance host and network load. An example scenario can be found in Figure 3.2. Peer A sends a request for some content to the original server O.
Figure 3.2: DirectStream P2P Scenario

Sometime later, peer B sends the same request to the directory server. Instead of the original server's address, peer B receives the address of peer A. Peer B contacts peer A and consumes the cached content. At this stage peer A is switched from client to double mode. In the rest of this section the double mode functionality is examined in more detail.

While playing back the content from the original server, a peer in double mode caches the content using a window of w seconds. If another client sends a request for the same content to the directory server within s (s < w) seconds, the server knows that the requested content is being played back and cached by another peer. The server advises the two peers to form a so-called cluster (dotted line in Figure 3.2). Let us assume that, additionally, another client (peer C) sends the same request after s2 (w < s2) seconds. The start of the movie has already been replaced on peer A but is still available on peer B, so the new peer is connected to peer B. Peers A, B and C thus form a cluster.

In order to manage cluster formation, the central server uses a so-called QoS parent selection algorithm to form an overlay network of forwarding peers. The main goal of the algorithm is to balance two factors: (1) the selection of a forwarding peer with high available bandwidth over long paths, which helps to balance the workload, and (2) the selection of a forwarding peer with a short path to the receiver, which minimizes the traffic placed on the network. The peer selection is based on the formula

$$\mathit{DistanceBandwidth} = \frac{n_i^r}{x_i} \qquad (3.1)$$

where $n_i$ is the distance between the forwarding and the receiving peer in number of hops, $x_i$ is the available bandwidth between the two peers and $r$, $0 \le r < 8$, is a weighting factor that can be set dynamically.

A discussion of the DirectStream approach

In principle, forwarding the cached content in order to (1) balance the network load and (2) relieve single peers is a good idea. However, the correctness of the formula (Equation 3.1) used for QoS-based parent selection is doubtful to me. According to [Yan03b], $x_i$ only represents the available bandwidth between the sending and the receiving peer, but not the bit rate of the stream. It would however be essential to explicitly consider the required bit rate in this formula, as the following short example underlines. Let us assume that the stream can be requested from a peer that is connected to the receiver by a 500 Kbit/s network link and that the number of hops between the sending and the receiving peer is 2. Using the weighting parameter r = 1, according to Equation 3.1 the distance-bandwidth ratio is 0.004. A network connection to an alternative server provides an average bandwidth of 800 Kbit/s over only 1 hop, resulting in a distance-bandwidth ratio of 0.00125. To achieve the goal of best balancing the network load, the algorithm chooses the first alternative (having the higher distance-bandwidth ratio of 0.004). Now let us assume that the bit rate of the transmitted stream is 700 Kbit/s (a fact that is not considered in the formula). Using the first alternative, a lossless transmission would not be possible: the bandwidth of the link is 500 Kbit/s, meaning that when a stream with an average bit rate of 700 Kbit/s is transmitted, 200 Kbit/s must be discarded. Selecting the second alternative, a lossless transmission would be possible.
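The critique can be made concrete in a few lines: with the numbers from the example, the original rule picks the infeasible parent, while an added bit-rate check (my addition, not part of DirectStream) picks the second one.

```python
def distance_bandwidth(hops, bandwidth_kbps, r=1):
    # Equation 3.1: n_i^r / x_i
    return hops ** r / bandwidth_kbps

candidates = [
    {"name": "peer1", "hops": 2, "bandwidth_kbps": 500},
    {"name": "peer2", "hops": 1, "bandwidth_kbps": 800},
]
stream_kbps = 700

# Original rule: pick the candidate with the highest distance-bandwidth ratio.
best = max(candidates, key=lambda c: distance_bandwidth(c["hops"], c["bandwidth_kbps"]))
print(best["name"])   # peer1 (ratio 0.004) -- but its 500 Kbit/s link is too slow

# With the bit rate considered, infeasible parents are filtered out first.
feasible = [c for c in candidates if c["bandwidth_kbps"] >= stream_kbps]
best = max(feasible, key=lambda c: distance_bandwidth(c["hops"], c["bandwidth_kbps"]))
print(best["name"])   # peer2: lossless transmission is possible
```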
3.2 Decentralized Architectures
Decentralized architectures differ from centralized ones in that each peer exchanges messages only with its direct neighbors; messages to other peers are forwarded through the network (Figure 3.3).

Figure 3.3: Decentralized P2P architecture

As all peers are equal and provide client as well as server functionality, they are known as servents [Ste04]. In order to join the network, a new peer has to contact a bootstrap server and subsequently receives the IP address(es) of N (N ≥ 1) node(s) that are already connected to the system. For redundancy reasons a new peer typically connects to multiple other peers, ensuring that the peer is not disconnected from the system in the event that one connection breaks. Direct communication is only possible between neighboring peers, and thus messages to other peers are forwarded in a recursive multistep process: in each step the peer forwards the message to all neighboring peers. To limit the spread of the messages, the message header contains a time-to-live (TTL) value which is decremented each time the query is forwarded. When the TTL value reaches zero, the message is discarded. The advantage of decentralized architectures is that single node or network failures can typically be compensated for by using connections to other nodes, providing no single point of failure.
Beside the positive aspects, decentralized architectures have three main disadvantages:

1. Forwarding messages is a host- and network-intensive task.

2. The TTL value imposes a virtual horizon for the messages. When the TTL reaches 0, the message is discarded, possibly before the destination is reached. Sometimes content that is shared by some peer is therefore not found and has to be downloaded again from the original server.

3. A massive time difference exists between searching for popular and unpopular content. Popular content is located much faster because it is typically available in many more places, and closer to the client.
The most well-known example of a decentralized peer-to-peer system is Gnutella.
3.2.1 File Sharing Example: Gnutella
The Gnutella network is based on the Gnutella protocol, which was developed by Justin Frankel and Tom Pepper in 2000. The peers are organized in a flat distributed architecture where each peer offers (1) client as well as (2) server functionality. The bootstrap servers in the Gnutella architecture are called "GnuCaches"; they are well-known nodes with static IP addresses, keeping records of all peers available on the network. When a new peer wants to join the network, it queries a GnuCache for a list of available peers. In order to connect to one of the available peers, the new node sends a ping message. Upon receipt of a ping message, an already connected peer returns a pong message to the originator with the information the new peer needs to connect. In a recursive process each receiver forwards the message to all neighbors, and after some time a new peer has typically received a pong message from all active nodes. Once a peer is connected to other peers, the typical activity is searching for content.
Query messages are forwarded recursively, on a step-by-step basis, to all connected peers. The time required for receiving a query response message mainly depends on the popularity of the searched content: the more popular the content is, the more replicas are available, and typically the messages have to travel over shorter paths. An example can be found in Figure 3.4.

Figure 3.4: Decentralized P2P architecture

Let us assume that a request is sent from peer A, which has only one neighbor, peer B. In the first step the query message is sent from peer A to B. In order to avoid redundant message processing, peer B checks the message ID. Peer B determines that it has not seen the message yet, so it searches the local database for the requested content. In parallel it decrements the TTL value of the message by one and forwards it to its neighbors (peers C, D, F, H). Peer A is excluded because it is the peer from which the message was originally received. The process of forwarding the message is repeated until either all nodes have seen it or the TTL value is 0. Let us assume that peer F is the only one that shares the requested content and wants to send a query hit message to A. As peer F has no direct connection to A, the message is sent step-by-step along the inverse path (F-B-A). Peer A recognizes the query hit message and takes it from the network.
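The forwarding rules from this example, message-ID deduplication, TTL decrement and exclusion of the sending neighbor, can be sketched as follows (a simplified model with assumed peer and message structures, not the actual Gnutella implementation):

```python
def handle_query(peer, msg, from_neighbor):
    """Gnutella-style query forwarding on one peer.

    peer: object with .seen (set of message ids), .neighbors (list),
          .shared (local content index) and .send(neighbor, msg).
    msg:  dict with 'id', 'ttl' and 'keywords'.
    """
    if msg["id"] in peer.seen:          # duplicate: drop silently
        return
    peer.seen.add(msg["id"])

    if peer.shared.matches(msg["keywords"]):
        # A query hit travels back along the inverse path (e.g. F-B-A).
        peer.send(from_neighbor, {"id": msg["id"], "type": "query_hit",
                                  "source": peer})

    msg = dict(msg, ttl=msg["ttl"] - 1)
    if msg["ttl"] <= 0:                 # virtual horizon reached
        return
    for neighbor in peer.neighbors:
        if neighbor is not from_neighbor:
            peer.send(neighbor, msg)
```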
3.2.2 Media Streaming Example: GnuStream
This section introduces GnuStream, a media streaming system that is built on top of the decentralized Gnutella peer-to-peer network [Xux03]. GnuStream is able to (1) aggregate the upload bandwidth of multiple senders and (2) react to host and network capacity changes during the transmission. The functionality of GnuStream is described with the help of an example (Figure 3.5).

Figure 3.5: GnuStream Scenario

Let us assume that peer 1 sends a request for some content (details about searching content in the Gnutella network can be found in Section 3.2.1), and that the content is shared by four peers (peers 2-5). Next, the bandwidth required for streaming the content is calculated using the following formula:

$$\mathit{ArrivalMediaBandwidth} = \frac{\mathit{FrameSize} \times \mathit{Bits/Pixel} \times \mathit{YUVSampling} \times \mathit{PlaybackRate}}{\mathit{CompressionRate} \times \mathit{Bits/Byte}}$$

In the example, the frame size is 352*240 pixels, the frame rate is 30 frames/second, the YUV sampling is 4:2:0, the number of bits per pixel is 8 and the compression rate is 26:1. The bit rate of the requested stream is thus 1144 Kbit/s. Each of the peers only has an upload bandwidth of 480 Kbit/s to the receiver. In order to transmit the video without loss, peers P2, P3 and P4 cooperate. Peer P5 is used as a standby sender in case one of the streaming servers fails or the available network bandwidth decreases.
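Plugging the example numbers into the formula reproduces the stated rate; the 4:2:0 sampling enters as a factor of 1.5 (a quick check, with rounding accounting for the small difference to 1144 Kbit/s):

```python
def arrival_media_bandwidth_kbits(width, height, bits_per_pixel,
                                  yuv_factor, fps, compression):
    raw_bits_per_frame = width * height * bits_per_pixel * yuv_factor
    return raw_bits_per_frame * fps / compression / 1024   # Kbit/s

# 352x240, 8 bits/pixel, 4:2:0 sampling (factor 1.5), 30 fps, 26:1 compression
rate = arrival_media_bandwidth_kbits(352, 240, 8, 1.5, 30, 26)
print(round(rate))   # ~1142 Kbit/s, matching the ~1144 Kbit/s in the text
# Three senders at 480 Kbit/s each (1440 Kbit/s) cover the stream; the
# fourth peer stays on standby, as in the scenario above.
```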
In order to enable the behavior described in the introductory scenario of this section, GnuStream is organized in three layers: (1) the Network Abstraction Layer, (2) the Streaming Control Layer and (3) the Media Playback Layer (Figure 3.6).

Figure 3.6: GnuStream Layers

The Network Abstraction Layer provides a uniform interface to the underlying peer-to-peer system, so that GnuStream is not restricted to Gnutella; other networks such as Chord [Ion01] or CAN [Syl01] can also be used. The Streaming Control Layer provides the functionality to deal with the heterogeneity and the dynamic behavior of best-effort networks, including (1) bandwidth aggregation and (2) host status monitoring. Bandwidth aggregation makes a distinction between even and proportional allocation. Even allocation balances the load evenly among the senders; this is only possible when all senders have the same network throughput to the receiver. For example, when two peers are used to deliver a video with a bit rate of 600 Kbit/s, each peer transmits 300 Kbit/s. Proportional allocation is used if the paths have different capacities; the transmission rate is then calculated considering the available bandwidth between each sender and the receiver. For example, peer 1 might send 400 Kbit/s and peer 2 only 200 Kbit/s. The Streaming Control Layer also detects changes in the host status using periodic probing. The third layer is the Media Playback Layer, which is used to smooth out data arrival jitter from multiple senders. The buffer size in seconds is calculated using the following formula:
$$\mathit{BufferDelay} = \begin{cases} 0 & \text{if } \mathit{AFR} \ge \mathit{PFR} \\[4pt] \dfrac{\mathit{NumberOfFrames}}{\mathit{AFR}} - \dfrac{\mathit{NumberOfFrames}}{\mathit{PFR}} & \text{otherwise} \end{cases} \qquad (3.2)$$

where AFR is the arrival frame rate and PFR is the playback frame rate.
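Read as code, my reading of the reconstructed Equation 3.2 is: no extra buffering is needed when frames arrive at least as fast as they are played; otherwise the buffer must absorb the difference between receive time and play-out time.

```python
def buffer_delay_s(number_of_frames, afr, pfr):
    """Equation 3.2: buffering delay in seconds.
    afr: arrival frame rate, pfr: playback frame rate (frames/s)."""
    if afr >= pfr:
        return 0.0
    return number_of_frames / afr - number_of_frames / pfr

# 900 frames (30 s of 30 fps video) arriving at only 25 fps:
print(buffer_delay_s(900, 25, 30))   # 6.0 seconds of start-up buffering
```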
3.3 Hybrid Architectures
The third peer-to-peer approach described in this thesis is the hybrid architecture, which combines the centralized with the decentralized approach. Hybrid architectures are characterized by multiple groups, each of which has a central coordinator called a super node [KR00]. All peers belonging to one group have the same super node. The super nodes are in turn connected to other super nodes in a decentralized manner (Figure 3.7).

Figure 3.7: Hybrid P2P architecture

A super node is a pure management component that is not used for file sharing but rather maintains a database containing the information about the files shared in the group. A new peer has to select one of the super nodes to register with. The exact selection mechanism differs between implementations and can be based on the distance between the peers, the number of participants in the groups or any other metric. When searching for content, each peer sends the query to the super node it is connected to and receives references to all peers in the group that share the content. For efficient downloading, the requesting peer establishes a direct connection to the one holding the content.

The main advantages of a hybrid architecture compared to a decentralized one are (1) the reduced query time and (2) the error robustness. Queries only have to be sent to one dedicated super node, and in the event that a super node goes down, the connected nodes elect a new one and continue their operation. Another advantage is that super nodes perform a large portion of the administrative tasks, reducing the demand on the file-sharing peers. The most popular peer-to-peer file sharing system based on a hybrid architecture is KaZaA.
3.3.1 File Sharing Example: KaZaA
KaZaA [Jun05] is one of the most popular file sharing networks, with more than 3 million active users sharing over 5000 terabytes of content. The hybrid KaZaA architecture is based on two types of peers: (1) ordinary nodes (ON) and (2) super nodes (SN). An ordinary node is used for file sharing and downloading, whereas a super node is a group management node that stores the meta data of the files shared in the group. On startup, each ordinary node connects to a super node and transmits the meta information about its shared files. The meta information includes the file name, the file size, file descriptors (e.g. artist name, album name and text provided by the user) and a hash value. Searching for content is performed in multiple steps. In the first step the query is sent to the peer's super node. Then the peer disconnects from the current super node, connects to a neighboring one and re-transmits the query. Subsequently connecting to different super nodes is called super node hopping. During the search process the peer temporarily belongs to several groups. Belonging to a group means that the files shared by the peer belong to the group it is connected to, so the content shared by a group changes every time a node performs a query. As soon as the node disconnects from the super node, its content is no longer available in the group. Usually a searching peer maintains the connection with the last super node until a new search is performed.
3.3.2 File Sharing Example: BitTorrent
Another hybrid file sharing architecture besides KaZaA is BitTorrent. BitTorrent was developed in 2001 with the goal of providing a simple way of distributing Linux software. Two basic components in the BitTorrent concept are (1) seeds and (2) leechers. A seed is a peer that provides some content for download; an important characteristic of a seed is that it provides the complete content. A leecher is a peer that downloads some content from a seed. In order to exchange data between the seed and the leecher, some meta information about the exchanged content is required, known as a torrent. A torrent file can combine information about content pieces available on different seeds. Each piece is typically about 256-512 KB in size, and the pieces can be downloaded in parallel from multiple peers. The performance of BitTorrent is heavily influenced by the way peers are selected for downloading the pieces. The main policies used by BitTorrent are the (1) strict policy and (2) rarest first policy. When the strict policy is used, pieces are requested in sequential order according to the structure of the file. When the rarest first policy is used, the pieces that are shared by the smallest number of peers are selected first. Using the rarest first policy has 3 main advantages: (1) the number of providing sources increases, and the more sources are available the lower is the risk of bottlenecks; (2) the higher the number of replicas, the higher is the degree of parallelism that can be achieved; and (3) downloading rare pieces first increases the probability of a successful file download.
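The rarest first policy can be sketched as follows (hypothetical Python; the data structures are illustrative, not BitTorrent's actual wire format).

from collections import Counter

def rarest_first_order(needed_pieces, peer_piece_maps):
    """Order the still-needed pieces so that those shared by the fewest
    peers are requested first (rarest first policy)."""
    availability = Counter()
    for pieces in peer_piece_maps.values():
        availability.update(pieces)
    # Pieces with availability 0 cannot be requested at all.
    return sorted((p for p in needed_pieces if availability[p] > 0),
                  key=lambda p: availability[p])

peers = {"peer1": {0, 1, 2}, "peer2": {1, 2, 3}, "peer3": {2, 3}}
print(rarest_first_order({0, 1, 2, 3}, peers))  # [0, 1, 3, 2]; piece 0 is rarest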
3.3.3 Streaming Example: BiTos
An example of a media streaming system based on BitTorrent is BiTos. BiTos was developed because of the inability of the original BitTorrent protocol to handle time-sensitive traffic. The problem with time-sensitive traffic is that pieces arriving after their playback time are useless and have to be discarded. In the case of video, content pieces correspond to frames. The BiTos protocol tries to find a tradeoff between requesting pieces according to their rareness and according to their deadline. To achieve this goal BiTos differentiates between three sets of pieces: (1) received, (2) high priority and (3) remaining pieces. The set of received pieces has a logical entry for every piece belonging to the stream. The state of each piece is either (a) downloaded, (b) not downloaded or (c) missed. Pieces with the state downloaded have been successfully received; pieces with the state not downloaded or missed were either not available from any peer or arrived after their playback time. The high priority set contains references to pieces that have neither been received nor missed but will soon be required by the player; these pieces have a higher priority for download than other pieces. The remaining pieces are (1) not downloaded, (2) not missed and (3) not in the high priority set. These are effectively the pieces that (a) are currently being downloaded or (b) have not been requested yet. BiTos assigns each piece a probability which represents the immediate need for that part of the media file, as sketched below. The probability can be set dynamically in order to adapt to different conditions. For example, it can be used for setting the tradeoff between (a) the early downloading of pieces that are shared by only a small number of peers and (b) enforcing the download of pieces that will be required next.
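A minimal sketch of this piece selection (hypothetical Python; the probability semantics are simplified here to a single parameter p, which is an assumption of the sketch, not the exact BiTos mechanism).

import random

def select_next_piece(high_priority, remaining, availability, p=0.8):
    """BiTos-style selection: with probability p a piece from the high
    priority set (needed soon by the player) is requested, otherwise one
    from the remaining set; within a set the rarest piece comes first."""
    if high_priority and (not remaining or random.random() < p):
        candidates = high_priority
    else:
        candidates = remaining or high_priority
    if not candidates:
        return None  # nothing left to request
    return min(candidates, key=lambda piece: availability.get(piece, 0))

avail = {5: 1, 6: 3, 7: 2}  # piece -> number of peers sharing it
print(select_next_piece({5, 6}, {7}, avail, p=1.0))  # 5, the rarest urgent piece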
CHAPTER 4
Proxy Affinity based Topology Generation
To combine peer-to-peer and content delivery network behavior, the Proxy Affinity model has been developed and is presented in this thesis. To evaluate the model a prototype implementation, called the proxy-to-proxy framework, is used. The basic idea of proxy-to-proxy is that content, typically but not necessarily originating from large, high-quality servers, is dynamically replicated onto logical networks of proxies. Proxies form dynamic groups to support small geographic regions and certain areas of interest. For example, one group is automatically formed by proxies of soccer fans in Klagenfurt (Austria), another one by proxies of ice-hockey fans in Munich (Germany). The group building process is based on a model called the Proxy Affinity Model (Section 4.2). By using different parameter settings, the model can adapt group size and group membership to the state of the network and the preferences of the end-users. Depending on the parameters, the system is able to behave like (1) a pure peer-to-peer system in one extreme, (2) a content delivery network in the other extreme or (3) a combination of both approaches. The main innovative aspects of the architecture are:
• the combination of the high availability of content delivery network clusters with the flexibility of peer-to-peer networks
• a single system that allows nearby proxies to be operated in cooperation as well as more distant ones to be selected alternatively to avoid bottlenecks, based on an affinity metric
• the adaptability of cooperating resources: the number of proxies is automatically adapted according to the state of the network and the number of requests
• the possibility to combine peer-to-peer and content delivery network technology: CDN technology covers the network area between the origin source and the replica server, while peer-to-peer technology enables QoS delivery from the replica server to the end-client

The difficulty in forming proxy groups is that the locations of future clients are not known in advance. For this reason the proxies serve as logical placeholders for later clients. Using placeholders is possible because each end-device has to be connected to one proxy (the entrance proxy) in the local network. By knowing the absolute position of the entrance proxy, the positions of the "future" clients are also known. A logical view of multiple proxy groups can be found in Figure 4.1. The (entrance) proxies are labeled green and the end-clients are labeled yellow.
Figure 4.1: Logical Proxy-Client View
4.1 Basic characteristics of Peer-to-Peer and Content Delivery Networks
Two well known approaches for delivering content are peer-to-peer and content delivery networks, as described in Chapters 2 and 3. In general, peer-to-peer networks are characterized by a high number of nodes operated by end-users. End-users utilize the peers to exchange content with other end-users. A logical view of a peer-to-peer network can be found in Figure 4.2.

Figure 4.2: A schematic view of a peer-to-peer network

The blue node in the figure is a peer of a user who wants to download some content; the red ones are peers that can provide it. The user is free to select any of the red peers to download the data. The advantage of multiple download locations is that they can be (1) selected alternatively or (2) used to work in cooperation. Selecting between alternative locations allows an end-user to choose the one that provides the content at the best quality. Cooperative work between the peers is necessary when none of the nodes is able to deliver the content at the desired quality on its own. An example of a peer-to-peer network that supports cooperation between peers can be found in [Xux03]. The disadvantages of peer-to-peer networks are that (1) peer behavior is very dynamic and (2) the storage space of an individual peer is limited. Dynamic peer behavior is characterized by the fact that on- and offline times depend on user interests. Storage space may be limited because (a) peers do not cooperate concerning storage decisions and (b) home users set a boundary on the amount of space they are willing to provide.

The second approach is content delivery networks, which are characterized by clusters of high performance computers placed at strategic Internet locations (Figure 4.3). The number of clusters is far below the number of peers in a peer-to-peer network; in the extreme case there is only one cluster, depending on the interest and the policy of the delivery network provider.

Figure 4.3: A schematic view of a content delivery network

The advantages of a content delivery network are that the servers in the clusters (1) are always on, (2) have sufficient upload capacities and (3) have sufficient storage space. The disadvantages are that (1) the number of clusters is limited and (2) the clusters are statically placed, making it hard to circumvent bottlenecks on the path to the end-user.
4.1.1 Peer-to-Peer Network Scenario
Figure 4.4 shows a simplified scenario of a peer-to-peer network. It is assumed that a video with a bit rate of 600 Kbit/s is provided by three peers (A, B, C) and requested by peer R. Each of the providing peers has an upload capacity of 250 Kbit/s. In order to serve the request without quality loss the three peers cooperate, achieving an aggregated bandwidth of 750 Kbit/s. The playback starts at 3:45 and is scheduled to end one hour later at 4:45. The problem is that peer A is switched off after 30 minutes, so from 4:15 onwards the aggregated bandwidth from peers B and C is 500 Kbit/s (100 Kbit/s below the required bandwidth).

Figure 4.4: A peer-to-peer scenario

There are other peers with sufficient resources that are available for the full playback time (peers D and E); however, they cannot take over the operation from peer A, as they do not share the required content and cannot be forced to. As a result the receiver is only able to view the video at full quality for the first 30 minutes.

In order to quantify the effect of peer availability on the resulting media quality, emulations were performed with two tools: the Gnutella-Simulator (GnuSim) [Qi 03] and EvalVid [Jir03]. Both tools interact with the network simulator NS-2 [NS201]. GnuSim requires NS-2 to simulate the data exchange between the peers. EvalVid, also based on NS-2, has been combined with GnuSim to calculate the quality of received video streams using a metric known as the Mean Opinion Score [Jir03]. The Mean Opinion Score is calculated by mapping the PSNR value, which represents the quality difference between the original and the received video, to a range between 1 and 5. A value of 5 represents the best quality, meaning that the video quality has not been affected by network packet loss.

The emulation results listed in Table 4.1 cover the behavior of the system over 10 traces. Each trace has a duration of 24 hours, with a total of 100 peers. 50 peers are free-riders (peers that only download content); the other half share videos. Free-riding peers are always online, while the peers that share video files change their status between online and offline. As can be seen, the quality difference of the video streams between the emulations with the highest (100%) and lowest (10%) peer availability is 48.8% (MOS 2.65 vs. 5.00).

Availability    Mean Opinion Score
10%             2.65
20%             2.95
30%             3.24
40%             3.10
50%             3.28
60%             3.23
70%             3.12
80%             3.43
90%             4.30
100%            5.00

Table 4.1: Emulation result scenario 1

The goal of this experiment is to show that merely by changing the online behavior of the sources the quality of the received test stream can be changed by 48.8%. According to [Eyt00] Gnutella peers have a daily availability of 2 hours, which corresponds to an online level of less than 10% in the emulations. According to the emulation results the average mean opinion score of the test stream would thus be decreased to 2.65 solely because of the fragile peer-to-peer behavior.
4.1.2 Content Delivery Network Scenario
Figure 4.5 shows a simplified topology of a content delivery network [Mol06].

Figure 4.5: A content delivery network scenario

The scenario includes one origin server (S), four surrogate server clusters with high speed network connections and one client (C). One would assume that once there are enough surrogate server resources to serve the requests, performance bottlenecks can be avoided. In this scenario the content is available from the origin server and replicated on cluster 1. The problem is that there is a bottleneck between cluster 1 and the client. In order to quantify the influence of network packet loss between surrogate servers and end-clients on the resulting media quality, 10 subsequent emulations have been performed. In these experiments the bottleneck leads to packet losses between 0 and 90%. The corresponding mean opinion score values for the test video can be found in Table 4.2. The difference between the best and the worst media quality is 69.6% (MOS 1.52 vs. 5.00).

Loss Rate    Mean Opinion Score
90%          1.52
80%          1.49
70%          1.65
60%          2.16
50%          2.17
40%          2.23
30%          3.01
20%          3.44
10%          4.65
0%           5.00

Table 4.2: Emulation Result Scenario 2

To avoid the bottleneck, the content could be replicated to one of the three other clusters and delivered from there. The problem with this alternative solution is that the network throughput from each of the three clusters is worse than the connection between the origin server and the client. So the best result could be achieved by streaming the content from the origin server. Streaming the content from the origin server would, however, (1) render the content delivery network useless and (2) not scale in the case of multiple requests. A solution to this problem would be to provide more alternative surrogate server locations that are located "close" to the end-user.
4.2 The ProxyAffinity Model
The proxy affinity model combines two utility values called Network Closeness and Semantical Closeness, described later in this section. By changing model parameters the architecture behavior can either be (1) peer-to-peer network like, (2) content delivery network like or (3) a combination of both (= proxy-to-proxy). When peer-to-peer behavior is enforced, the number of groups is high, the number of proxies in the groups is low and the proxies have fast network connections among each other. In the extreme case N proxy groups are formed and each group contains one proxy. When content delivery network behavior is enforced, the number of groups is small but the number of proxies in each group is high. In the extreme case only one group is formed, to which all proxies belong. When proxy-to-proxy behavior is enforced, a tradeoff between the two extremes of peer-to-peer and content delivery network behavior has to be found.
The system behavior is set either (1) implicitly by the current host and network characteristics or (2) explicitly by a weighting factor. Setting the system behavior explicitly is used to compare peer-to-peer, content delivery network and proxy-to-proxy behavior against each other in the same situation. In any case a new proxy that enters the system joins the group with the highest affinity value. Proxy Affinity is the weighted sum of Network Closeness (NC) and Semantical Closeness (SC):

\[ ProxyAffinity = \alpha \cdot NC + (1 - \alpha) \cdot SC \tag{4.1} \]
where α is a weighting factor that can take values between 0 and 1. When implicit topology formation is used, α is set to 0.5; in this case the group formation purely depends on the characteristics of the hosts, the network and the content. Otherwise, in the case of explicit topology generation, the system behavior is predetermined by the weighting factor. Examples for enforcing content delivery and peer-to-peer behavior are presented in Sections 4.2.1 and 4.2.2. Section 4.2.3 presents an example where α is set to 0.5, so that both utility values have the same influence and proxy-to-proxy behavior is evaluated.
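As an illustration of Equation 4.1, the following minimal Python sketch shows how a joining proxy could pick the group with the highest affinity; the function and group names are illustrative and not taken from the prototype.

def proxy_affinity(nc, sc, alpha=0.5):
    """Equation 4.1: weighted sum of Network Closeness (NC) and
    Semantical Closeness (SC); alpha steers the behavior between
    P2P-like and CDN-like group formation."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * nc + (1 - alpha) * sc

def best_group(groups, alpha=0.5):
    """A joining proxy selects the group with the highest affinity.
    'groups' maps a group name to its (NC, SC) pair."""
    return max(groups, key=lambda g: proxy_affinity(*groups[g], alpha))

groups = {"group1": (0.9, 0.2), "group2": (0.4, 0.8)}
print(best_group(groups, alpha=0.5))  # group2 (affinity 0.6 vs. 0.55)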
4.2.1 Semantical Closeness (SC)
Semantical closeness is used to form homogeneous proxy groups with respect to the (1) genre and (2) playback duration of the shared videos. The number of groups therefore depends on the number of genres and the variance in playback durations. In the extreme case, when all proxies share only videos of the same genre with identical playback times, only one group is formed. The more different genres and playback durations are available, the higher the number of groups. A logical view can be found in Figure 4.6; the different colors represent the different genres and playback durations.

Figure 4.6: Logical proxy group view

The genre of a video is required to distinguish between different topics. So it is possible to form one group where only scientific documentary films are shared, another one where entertaining movies are available, etc. The playback duration (in seconds) is required to distinguish between a trailer and a full-length movie. The prototype implementation assumes that the genre of each video is known. The playback duration is calculated as the ratio between the total number of frames and the frames displayed per second:

\[ PlaybackTime = \frac{Frames}{Frames/Second} \]
The total number of frames as well as the frames per second can be found in the header of the media file. The semantical closeness between a new proxy and an existing group is calculated as follows. All movies with the same genre are mapped to one category c; then the average playback duration of all movies in the category is calculated:

\[ AvgDuration_c = \frac{1}{M} \sum_{i=1}^{M} PlaybackTime_i \tag{4.2} \]
where M is the number of movies belonging to category c and PlaybackTime_i is the duration of movie i (in seconds). In the next step the semantical closeness values for the categories that are available on the new proxy (P) and in the examined group (G) are calculated. The semantical closeness SC_c for each category is calculated as:

\[
SC_c = \begin{cases} \dfrac{avgPb(G)}{avgPb(P)} & \text{if } avgPb(G) < avgPb(P) \\[6pt] \dfrac{avgPb(P)}{avgPb(G)} & \text{if } avgPb(G) \geq avgPb(P) \\[6pt] 0 & \text{if } avgPb(P) = 0 \text{ or } avgPb(G) = 0 \end{cases}
\]

The average semantical closeness SC over all categories is calculated as:

\[ SC = \frac{1}{CA} \sum_{c=1}^{CA} SC_c \tag{4.3} \]
where CA is the number of categories and SC_c is the semantical closeness for category c. The semantical closeness SC between a new proxy and an examined group can take values between 0 and 1: the more videos of the same genre and with the same playback duration are available both on the proxy and within the group, the higher the semantical closeness. A scenario evaluating the formation of one single group can be found in the following section. The scenario is based on real Internet conditions using PlanetLab [Mat04]. At the time the measurements were performed, PlanetLab consisted of 612 available nodes distributed in networks all over the world (see Figure 4.7).
Figure 4.7: Logical PlanetLab view
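Before turning to the scenario, the semantical closeness computation of Equations 4.2 and 4.3 can be sketched as follows (hypothetical Python; the representation of categories as genre-to-duration mappings is an assumption of the sketch).

def avg_duration(playback_times):
    """Equation 4.2: average playback duration of a category (seconds)."""
    return sum(playback_times) / len(playback_times) if playback_times else 0.0

def semantical_closeness(proxy_videos, group_videos):
    """Per-category closeness (smaller average duration divided by the
    larger one, 0 if a category is missing on either side), averaged
    over all categories (Equation 4.3). Inputs map a genre to a list
    of playback durations."""
    categories = set(proxy_videos) | set(group_videos)
    if not categories:
        return 0.0
    total = 0.0
    for c in categories:
        p = avg_duration(proxy_videos.get(c, []))
        g = avg_duration(group_videos.get(c, []))
        if p > 0 and g > 0:
            total += min(p, g) / max(p, g)
    return total / len(categories)

proxy = {"sports": [5400, 6000]}            # two full-length matches
group = {"sports": [5700], "news": [900]}   # similar sports content, plus news
print(semantical_closeness(proxy, group))   # 0.5 (full match in one of two categories)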
Semantical Closeness Scenario

In order to make the scenario comprehensible, only 12 out of the 612 available PlanetLab nodes are used. Of the 12 nodes, 3 are located in California (US), 3 in New York (US), 3 in Great Britain, 2 in Germany and 1 in Austria. For simplicity it is assumed that only 2 types of content with identical playback times are shared by the proxies, so in this scenario only 2 large groups are formed. A schematic view can be found in Figure 4.8.

Figure 4.8: Semantically based group formation

Proxies that share sport videos are labeled yellow, proxies that share entertainment movies are labeled blue. The blue and the yellow lines connect all proxies that belong to the same group; they visualize the large distance between proxies belonging to the same group. To put the characteristics of the groups into numbers, network measurements between all proxies have been made. The measurements have been repeated every 10 minutes for 24 hours. The average throughput between the proxies of the yellow group was 639 Kbit/s; the average throughput within the blue group was 849 Kbit/s. The bottleneck in the yellow group was measured between East Irvine (California) and Essen (Germany); the bottleneck in the blue group was measured between Harbor City (California) and Innsbruck (Austria).
4.2.2 Network Closeness (NC)
Network closeness is a metric used to maximize the throughput between the proxies and future clients by forming multiple small groups. In the most extreme case each group consists of only 1 proxy (each proxy has the best connection to itself); in this extreme case no cooperation with other proxies exists. Apart from the extreme case, each new proxy can be tuned to join the group with the highest average bandwidth (network closeness) to all members. The NetworkCloseness to each group is calculated as:

\[ NetworkCloseness = \min\!\left(1,\; \frac{\frac{1}{N}\sum_{i=1}^{N} AvailBW(i)}{AvailUploadBW}\right) \]
where N is the number of connected group members, AvailBW(i) is the measured available bandwidth (in bit/s) between the new proxy and group member i (1 ≤ i ≤ N), and AvailUploadBW is the upload bandwidth (in bit/s) of the new proxy. The higher the available bandwidth to the already connected group members, the better is the network closeness value. In case the average available bandwidth equals AvailUploadBW, the network closeness takes the value 1; otherwise it lies between 0 and 1. A schematic view can be found in Figure 4.9. In the example a new proxy measures the throughput to two alternative groups and joins the one with the higher throughput to all proxies.

Figure 4.9: Network throughput based group joining
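A direct transcription of the network closeness formula (a Python sketch under the assumption that the bandwidth measurements are already available):

def network_closeness(avail_bw_to_members, avail_upload_bw):
    """Average measured available bandwidth (bit/s) from the new proxy
    to the N group members, normalized by the proxy's upload bandwidth
    and capped at 1."""
    if not avail_bw_to_members or avail_upload_bw <= 0:
        return 0.0
    avg = sum(avail_bw_to_members) / len(avail_bw_to_members)
    return min(1.0, avg / avail_upload_bw)

# A proxy with 1 Mbit/s upload capacity measuring two candidate groups:
print(network_closeness([800_000, 600_000], 1_000_000))  # 0.7
print(network_closeness([400_000, 300_000], 1_000_000))  # 0.35 -> join the first group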
Network Closeness Scenario

This scenario is based on the configuration presented in Section 4.2.1. Here a new proxy decides to connect to another proxy based on the measured throughput. So when proxy A has the highest throughput to proxy B (A → B) and proxy C has the highest throughput to proxy A (C → A), the three proxies (C → A → B) form a group. The group formation in this example is represented in Figure 4.10. Contrary to the scenario in Section 4.2.1, four instead of two groups are formed. The color of the nodes indicates the group membership. It can be seen that all nodes join the local group. Full information about the group formation can be found in Table 4.3, and the average throughput values between the proxies of each group are listed in Table 4.4. The name of each group corresponds to the host name of its leader.

Figure 4.10: Network Closeness based Group Formation

Proxy/Group       los angeles   new york   london   essen
cambridge                                  X
eastirvine        X
harborcity        X
innsbruck                                           X
kaiserslautern                                      X
los angeles       X
manchester                                 X
baltimore                       X
princeton                       X

Table 4.3: Group formation
Group Name     Throughput Kbit/s
los angeles    23730.48
new york       40244.24
london         19711.33
essen          6243.758

Table 4.4: Throughput

The measurements between the proxies are the same as in the previous scenario. The difference is that the groups are much smaller and have a much higher density than in the first scenario. For example, the average throughput within the group los angeles is 37% higher than in the global yellow group. The disadvantage of only considering network throughput for group formation is that the content in the groups is heterogeneous. For example, in the first scenario the nodes East Irvine and Harbor City belonged to different groups (one sharing sport, the other sharing entertainment videos); in the second scenario they belong to the same group. Building heterogeneous groups is expected to result in a lower hit rate than homogeneous groups. Due to time constraints a detailed evaluation of this aspect is not within the scope of this thesis.
4.2.3 Proxy-to-Proxy Scenario
In this section a tradeoff between two homogeneous groups with slow connections (content delivery network behavior) and multiple heterogeneous groups with very good connections (peer-to-peer network behavior) is presented. The tradeoff is achieved by splitting large homogeneous groups into subgroups. The criterion for splitting is the bandwidth between the proxies. So, for example, when proxy A can decide either to connect to proxy B or to proxy C (both share the same content), it will connect to the one with the better bandwidth connection. The condition for establishing the connection is that the relative bandwidth value is higher than the relative semantical closeness value; otherwise the proxy refuses to form a group and waits for a better connection. The example in this section is based on the first scenario presented in Section 4.2.1, where two types of videos with identical playback durations are shared. A visual representation can be found in Figure 4.2.3. All proxies that belong to the same group are within the same circle. Details about the group formation can be found in Table 4.5.
The yellow and the blue group from the first scenario are split into 3 yellow and 4 blue groups. Three of the groups consist of 2 or 3 members; 1 group consists of one member.

Proxy/Group       los angeles   new york   london   innsbruck
cambridge                                  X
eastirvine        X
kaiserslautern                                      X
los angeles       X
manchester                                 X
baltimore                       X
princeton                       X
new york                        X

Table 4.5: Proxy-Groups

The average throughput values for the groups consisting of more than one member can be found in Table 4.6.

Group Name     Throughput Kbit/s
los angeles    8161.23
new york       1972.52
london         4235.70
innsbruck      639.03

Table 4.6: Proxy-Group Throughput
CHAPTER 5
Proxy-to-Proxy: Implementation
The core components of the proxy-to-proxy network are the multimedia proxies. Multimedia proxies are workstations dedicated to storing, processing and delivering multimedia data. For the end-clients the proxies serve as entrance elements to the proxy-to-proxy network. The structure of a proxy in UML notation can be seen in Figure 5.1.

Figure 5.1: Logical Proxy View

Each proxy is identified by an IP address and a unique name assigned by a naming manager, as described in Section 5.1. Each proxy stores multiple media files. The content is either (1) provided by the proxy owner or (2) cached from previous requests where the proxy served as an intermediate component. Each media file is identified by its name and the number of elementary streams. The media files are encoded in the MPEG-4 format using the multiple description approach (see Section 7.2) and can therefore consist of multiple streams.
5.1 Proxy Groups
In this section the software components required for the group formation process are described. The central component is the naming manager (Figure 5.2), which offers (1) a query and (2) a registration service. The query service is based on a standard domain name server used for resolving proxy names to IP addresses. The registration service is used for updating the database of the domain name server when proxies join or leave the system.

Figure 5.2: The naming manager view

Figure 5.3 shows an example scenario. (1) In the scenario a new proxy with the IP address 143.205.122.117 is started by a home user. In this state the proxy is in stand-alone mode: it has no information about other proxies that are already part of the proxy-to-proxy network. (2) The proxy contacts the registration service that is part of the naming manager. The naming manager (Figure 5.2) is a dedicated node somewhere in the Internet that has a static IP address. Once contacted, the registration service creates a new entry in the database using the addProxy method (Figure 5.4). The entry includes the IP address as well as an alias name for the proxy; in this scenario the name "proxy8" is selected. (3) In the next step a group to join has to be selected. For the join operation the proxy affinity is calculated. A shortcut (not implemented in the prototype system) would be to introduce the concept of group leaders: instead of contacting each proxy it would be sufficient to contact each leader. (4) In this example the proxy joins group 2 because it has the highest affinity value.

Figure 5.3: Proxy join operation

Figure 5.4: Proxy Management Schema
5.2 Measuring Network Characteristics
A basic precondition for calculating the network closeness is the available bandwidth between the proxies. The actual bandwidth is determined by a network measurement and forecasting tool which is organized in a three-layer architecture consisting of a discovery layer (Section 5.2.1), a data collection layer (Section 5.2.2) and a forecast layer (Section 5.2.3).
5.2.1 The Discovery Layer
The discovery layer has the task of discovering the path from the source to the destination proxy [Chr04b]. The source proxy is the one that wants to join the network; the destination proxy is a member of the group currently examined. The discovery layer identifies each network element on the path between the source and the destination proxy in order to detect route changes [Chr04a]. The path is rediscovered every time before a new network measurement is started. The discovery process is based on iteratively sending ICMP (Internet Control Message Protocol) [Pos97] packets from the source to the destination proxy. The first packet has a TTL value of 1; it expires at the first network element and an error message is generated. The ICMP error message [Pos97] includes the IP address of the last network element that was reached. After each iteration the TTL is incremented by 1. The iteration stops as soon as the received message contains the IP address of the destination proxy.
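The iterative TTL probing can be sketched as follows, assuming the Scapy packet library and sufficient privileges for raw sockets; this illustrates the mechanism rather than the tool's actual implementation.

from scapy.all import IP, ICMP, sr1  # requires root privileges

def discover_path(destination, max_hops=30, timeout=2):
    """Iteratively send ICMP probes with a growing TTL; every expired
    probe reveals one network element on the path (cf. traceroute)."""
    hops = []
    for ttl in range(1, max_hops + 1):
        reply = sr1(IP(dst=destination, ttl=ttl) / ICMP(),
                    timeout=timeout, verbose=False)
        if reply is None:
            hops.append(None)            # this element did not answer
            continue
        hops.append(reply.src)           # IP of the element that replied
        if reply.src == destination:     # destination proxy reached
            break
    return hops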
5.2.2 The Data Collection Layer
The data collection layer is used for measuring the bandwidth characteristics of the path provided by the discovery layer. The bandwidth measurement is based on a technique called packet bunch mode [Ver97]. The available bandwidth BW_A is calculated as the ratio of transmitted bits to the time required for the transmission:

\[ BW_A = \frac{Bits_{transmitted}}{TransmissionTime} \]
It is important to note that when using active network measurements, the results can be adversely affected by (1) the processing time of the network elements and (2) the influence of network packets from other streams. If the gap between two measurement packets is enlarged, the network bandwidth is underestimated; if the gap is shortened, the bandwidth is overestimated. In order to produce precise and accurate measurements it is necessary to exclude both types of erroneous gaps. Shortened gaps are detected when the inter-arrival gaps are shorter than the inter-delivery gaps. This is obviously an error, since the available bandwidth between the sender and the receiver cannot be higher than the upload bandwidth of the sender. After removing packets that lead to overestimations, the ones that lead to underestimation have to be excluded. Detecting enlarged gaps is much harder than detecting shortened ones, as inter-arrival gaps are always larger than inter-sending gaps whenever the download capacity of the receiver is below the upload capacity of the sender. In order to find a representative gap size, the packet pair with the median gap size is selected, because the median is assumed to be least influenced by any cross traffic.
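A sketch of the gap filtering and the median-based estimate (hypothetical Python; the packet capture itself is assumed to have happened elsewhere):

from statistics import median

def estimate_available_bandwidth(packet_size_bits, send_gaps_s, arrival_gaps_s):
    """Packet-bunch estimate: drop pairs whose arrival gap is shorter
    than the sending gap (impossible, would overestimate), then use the
    median of the remaining gaps as the sample least affected by cross
    traffic (enlarged gaps would underestimate)."""
    plausible = [a for s, a in zip(send_gaps_s, arrival_gaps_s) if a >= s]
    if not plausible:
        return None
    return packet_size_bits / median(plausible)  # bit/s

gaps_sent = [0.001, 0.001, 0.001, 0.001]
gaps_seen = [0.0008, 0.0012, 0.0015, 0.0040]   # one shortened, one enlarged
print(estimate_available_bandwidth(12_000, gaps_sent, gaps_seen))  # 8,000,000 bit/s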
5.2.3 The Forecast Layer
The forecast layer is used to produce precise forecasts for specific time periods. The forecasts are based on measurements captured by the data collection layer. Producing forecasts is always a tradeoff between (1) the required time, (2) the size of the data set and (3) the required accuracy. The models implemented in the measurement tool are simple and similar to those used by the Network Weather Service (NWS) [Ric98]. The implementation differs from NWS in that an active error correction mechanism for the forecasts has been added [Chr04b]. Every forecast depends on a set of initialization values known as the hold-out set. Different hold-out sets show different patterns, so every time a forecast has to be performed, the best fitting model needs to be selected. The proper forecasting model is selected by applying all implemented models to a small subset of the hold-out set; the model that leads to the smallest forecast error within the hold-out set is used to make the predictions. In contrast to NWS the result of the forecast is not a single value but a so-called confidence interval [Chr04b]. The higher the required precision, the larger is the resulting interval.
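The model selection over the hold-out set can be illustrated as follows (a minimal Python sketch; the two example models and the validation fraction are assumptions, not the tool's actual model set).

def mean_absolute_error(forecast, values):
    """One-step-ahead error of a forecaster over a measurement series."""
    pairs = list(zip(values, values[1:]))
    return sum(abs(forecast(prev) - actual) for prev, actual in pairs) / len(pairs)

def select_model(models, hold_out_set, validation_fraction=0.3):
    """Apply every candidate one-step forecaster to a small subset of
    the hold-out set and keep the model with the smallest error."""
    n = max(2, int(len(hold_out_set) * validation_fraction))
    subset = hold_out_set[-n:]
    return min(models, key=lambda m: mean_absolute_error(m, subset))

# Two simple NWS-style models: last value and a damped mean.
last_value = lambda prev: prev
damped     = lambda prev: 0.7 * prev + 0.3 * 500.0  # assumes a long-run mean of 500

best = select_model([last_value, damped], [480.0, 510.0, 495.0, 505.0, 500.0])
print(best(500.0))  # forecast of the selected model for the next period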
Part II Error Concealment and Error Avoidance
CHAPTER 6
Forward Error Correction
Forward error correction is based on the principle of reconstructing data that has been lost during network transmission [Col01]. To enable data reconstruction, redundant data has to be transmitted. The price for being able to reconstruct lost data is higher network load and the associated risk of even more packet loss. Finding a balance between being able to recover from errors and additional network congestion is a challenging task. The focus of this chapter is on media-independent [Jar06] forward error correction.
6.1 Media-Independent Forward Error Correction
Media-independent forward error correction requires no knowledge about the delivered content. One example of media-independent forward error correction is the parity coding approach, which is based on the exclusive-or (XOR) operation: parity packets are generated for groups of data packets and transmitted together with the original packets. Section 6.1.1 describes the simple approach of sending each original packet twice, Section 6.1.2 describes the approach of generating parity packets, and Section 6.1.3 compares the arrival probabilities of the two approaches.
6.1.1 Duplicating Original Packets
The simplest media-independent error-correction approach is to produce a forward error correction stream by copying packets (send every packet twice) (Figure 6.1).
Figure 6.1: Simple FEC Scheme

The original packets are labeled a to d, the copied packets are labeled f(a) to f(d); the function f represents the copying. Copying packets has two disadvantages: (1) it produces a network overhead of 100% and (2) when both the original and the copy of a packet are lost, the data cannot be reconstructed. Assuming that the original stream consists of n packets, the number of packets transmitted over the network is 2n. The probability of receiving at least n packets that are different from each other (no duplicates) is calculated as:

\[ p_{success}(n, p) = (1 - p^2)^n \tag{6.1} \]

where (1 − p²) is the probability that either the original packet or its duplicate is received, p is the loss probability of the network link, and n is the number of original packets.
6.1.2 Generating Parity Packets
An alternative to sending all original packets twice is calculating parity packets. The basic idea is that n original packets are protected by k parity packets, where 0 < k < n; instead of 2n packets only n + k packets have to be transmitted. Figure 6.2 shows an example where pairs of subsequent packets are used to calculate the parity packet. The binary XOR operation is represented by f(P1, P2), where P1 and P2 are the input packets.

Figure 6.2: A parity packet FEC scheme

In order to successfully reconstruct the original data it is sufficient to receive any n out of the n + k packets, as demonstrated by a simple example in Figure 6.3. The original packets are on the left side, labeled P1 to P3; the parity packets are on the right side, labeled FP1 and FP2.

Figure 6.3: Parity-packet based forward error correction

As can be seen in Figure 6.3, the information contained in P1 and P2 is combined in FP1, and the information contained in P2 and P3 is combined in FP2. Assuming that packets P1 and P2 are lost, P2 can be reconstructed using P3 and FP2; P1 can then be reconstructed using the reconstructed P2 in combination with FP1. The probability of receiving at least n error-free packets using this approach can again be calculated in two steps. First, the probability of losing exactly i out of n + k packets is calculated as:

\[ p_{loss}(n, k, i, p) = \binom{n+k}{i} \, p^i \, (1-p)^{n+k-i} \tag{6.2} \]
where n is the number of original packets, k is the number of FEC packets, i is the number of lost packets and p is the loss probability of the network link. Based on Equation 6.2, the probability of receiving at least n out of the n + k packets, i.e. of losing at most k packets, is calculated as:

\[ p_{success}(n, k, p) = 1 - \sum_{i=k+1}^{n+k} p_{loss}(n, k, i, p) \tag{6.3} \]
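The arrival probabilities can be evaluated directly from Equations 6.1 to 6.3 (Python sketch; note that the exact mapping of parity packet counts to the overhead values of Table 6.1 is not spelled out in the text, so the parity figure below is indicative).

from math import comb

def p_success_duplication(n, p):
    """Equation 6.1: every packet is sent twice; a packet survives
    unless both copies are lost."""
    return (1 - p * p) ** n

def p_loss_exactly(n, k, i, p):
    """Equation 6.2: probability of losing exactly i of n+k packets."""
    return comb(n + k, i) * p**i * (1 - p)**(n + k - i)

def p_success_parity(n, k, p):
    """Equation 6.3: at least n of n+k packets arrive, i.e. at most
    k packets are lost."""
    return 1 - sum(p_loss_exactly(n, k, i, p) for i in range(k + 1, n + k + 1))

n, p = 10, 0.2
print(f"no FEC:       {(1 - p) ** n:.3f}")               # ~0.107
print(f"duplication:  {p_success_duplication(n, p):.3f}")  # ~0.665
print(f"parity, k=9:  {p_success_parity(n, 9, p):.3f}")    # close to 1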
6.1.3 Comparing Packet Arrival Probabilities
The formulas for calculating the probability of successfully receiving n packets by (1) duplicating packets and (2) adding k FEC packets have been presented in Sections 6.1.1 and 6.1.2. This section compares the two approaches with each other and with the alternative of using no forward error correction at all. Assuming that 10 data packets are sent over a link with a loss probability of 20% and no forward error correction is used, the probability of successfully receiving all packets is 0.8^10 ≈ 0.1, i.e. 10%. When each packet is sent twice, the probability of receiving all data packets is 66.4%, with an associated network overhead of 100%. According to Equation 6.3, adding 9 parity error correction packets leads to an arrival probability of 99.3%. Protecting pairs of original packets with parity packets thus leads to a massive improvement of the arrival probability. The negative aspect of protecting pairs of packets is the high network overhead. For this reason the experiment has been repeated with a decreasing number of forward error correction packets. The results can be found in Table 6.1.
FEC overhead   Arrival Probability
90%            99.3%
80%            98.9%
70%            97.3%
60%            93.8%
50%            87.0%
40%            74.7%
30%            55.8%
20%            32.2%
10%            10.7%

Table 6.1: Arrival Probabilities

Analyzing the results in Table 6.1 it can be seen that sending only 4 error correction packets already leads to an 8.3% higher arrival probability than sending every packet twice (74.7% vs. 66.4%), with a network overhead of 40% instead of 100%. For the Proxy-to-Proxy network the parity-packet-based error correction approach has therefore been selected. A model for determining the proper granularity can be found in Chapter 8.
CHAPTER 7
Layered Video Coding
This chapter gives an overview of layered video coding, a principle for encoding raw video into several bit streams. Two well known approaches are (1) scalable and (2) multiple description video coding, described in Sections 7.1 and 7.2.
7.1 Scalable Coding
In scalable coding an elementary video stream consists of one base layer and N (N ≥ 1) enhancement layers, which are encoded in a hierarchical fashion. That is to say, layer N directly depends on layer N−1 and indirectly on all lower layers (Figure 7.1). The base layer provides the base quality, and each enhancement layer augments the base quality either in the temporal, spatial or frequency domain.

Figure 7.1: Scalable Coding - Bit Stream Hierarchy

The advantage of scalable coding is the possibility of adapting the quality of the video content to the state of the network without transcoding. This allows graceful degradation in case the network bandwidth is not sufficient to transmit the content without quality loss. The clear disadvantage of using enhancement layers is the resulting chain of dependencies: an error that occurs at layer l (1 ≤ l ≤ N) is propagated to all higher layers, and if layer l is lost, all higher layers are rendered useless. Sections 7.1.1 to 7.1.3 give an overview of scalable video coding in the temporal, spatial and quality domain.
7.1.1 Temporal Scalability
Temporal scaling is used to encode a raw video sequence into multiple layers, each having the same spatial resolution but a different frame rate. Decoding only the base layer results in a lower frame rate than that of the original stream. Each enhancement layer stores the missing frames, thus increasing the frame rate with every enhancement layer that is decoded. A block diagram of an encoder producing one base and one enhancement layer can be found in Figure 7.2.

Figure 7.2: Block diagram of a temporal scalable encoder

According to [Dap01] the base layer and the enhancement layer are created in 6 steps. Steps 1 and 6 are identical for both layers, whereas steps 2 to 5 are only required to produce the enhancement layer.
1. The raw video is temporally down-sampled, transformed using the discrete cosine transform (DCT) and quantized. Temporal down-sampling is achieved by skipping frames; for example, a down-sampling ratio of 2:1 is achieved by skipping every second frame.

2. In order to produce the enhancement layer, each frame is reconstructed by inverse quantization and the inverse discrete cosine transform.

3. The input for the enhancement layer is temporally up-sampled to the original frame rate. An up-sampling ratio of 1:2 can be achieved by duplicating every frame.

4. The difference between the reconstructed and the original frames, known as the residual, is calculated.

5. The residual is transformed using the discrete cosine transform and quantized.

6. The quantized coefficients of the (1) base and (2) enhancement layers are encoded using variable length coding.

To decode the base and enhancement layers, (1) variable length decoding, (2) inverse quantization and (3) the inverse cosine transform have to be applied. The base layer frames are then temporally up-sampled and combined with the difference information stored in the enhancement layer. A block diagram of a decoder can be found in Figure 7.3.
Figure 7.3: Block diagram of a temporal scalable decoder
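The layering structure of the six steps can be illustrated with a small sketch (hypothetical Python/NumPy; the DCT, quantization and variable length coding steps are omitted, so only the down-sampling/residual skeleton is shown).

import numpy as np

def temporal_split(frames):
    """Simplified temporal scalability: the base layer keeps every
    second frame; the enhancement layer stores residuals against a
    frame-duplicating up-sampled base."""
    base = frames[::2]                                    # 2:1 down-sampling
    upsampled = np.repeat(base, 2, axis=0)[:len(frames)]  # 1:2 up-sampling
    residual = frames - upsampled                         # the residual
    return base, residual

def temporal_merge(base, residual):
    """Decoder side: up-sample the base layer and add the residuals."""
    upsampled = np.repeat(base, 2, axis=0)[:len(residual)]
    return upsampled + residual

frames = np.arange(6 * 4, dtype=float).reshape(6, 2, 2)   # 6 tiny frames
base, enh = temporal_split(frames)
assert np.array_equal(temporal_merge(base, enh), frames)  # lossless here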
7.1.2 Spatial Scalability
Spatial scaling is used to encode a video sequence into multiple layers having the same frame rate but different spatial resolutions. When only the base layer is decoded, the spatial resolution of the resulting video is below that of the original stream. Decoding the enhancement layer increases the spatial resolution to the original size. A block diagram of the encoder can be found in Figure 7.4.

Figure 7.4: Block diagram of a spatial scalable encoder

According to [Dap01] the base layer and the enhancement layer are created in 6 steps. Once again, steps 1 and 6 are identical for both layers, whereas steps 2 to 5 are only applied to the enhancement layer.
1. The raw video is spatially down-sampled, transformed using the DCT and quantized to get the input for both layers.

2. To produce the enhancement layer, each frame is reconstructed using inverse quantization and the inverse discrete cosine transform.

3. Each frame is spatially up-sampled to the original size using interpolation.

4. For the enhancement layer the up-sampled frame is subtracted from the original image; this difference is known as the residual.

5. The residual is transformed using the discrete cosine transform and quantized.

6. The coefficients of the (1) base and (2) enhancement layers are encoded using variable length coding.

In order to decode the (1) base and (2) enhancement layers, variable length decoding, inverse quantization and the inverse cosine transform have to be applied. Each base layer frame is spatially up-sampled and combined with the residual stored in the enhancement layer. A block diagram of the decoding process can be found in Figure 7.5.
Figure 7.5: Block diagram of a spatial scalable decoder
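The same skeleton can be sketched in the spatial domain, again omitting transform and entropy coding and using nearest-neighbour interpolation as a simplification.

import numpy as np

def spatial_split(frame):
    """Simplified spatial scalability: the base layer is a 2:1
    down-sampled image; the enhancement layer stores the residual
    against a nearest-neighbour interpolated base."""
    base = frame[::2, ::2]                               # down-sample rows/cols
    up = np.repeat(np.repeat(base, 2, axis=0), 2, axis=1)
    up = up[:frame.shape[0], :frame.shape[1]]            # up-sample to original size
    return base, frame - up                              # residual

def spatial_merge(base, residual):
    up = np.repeat(np.repeat(base, 2, axis=0), 2, axis=1)
    up = up[:residual.shape[0], :residual.shape[1]]
    return up + residual

frame = np.arange(16, dtype=float).reshape(4, 4)
base, enh = spatial_split(frame)
assert np.array_equal(spatial_merge(base, enh), frame)   # lossless here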
7.1.3 Scalable Coding: Signal-to-Noise-Ratio Scalability
Signal-to-noise ratio (SNR) scaling is used to encode a video sequence into multiple layers at the same frame rate and spatial resolution but with differing quantization accuracies. The DCT coefficients in the base layer are quantized with a coarse quantizer, and the subsequent differences to the original quality are stored in the enhancement layers. A block diagram of an SNR-scalable encoder can be found in Figure 7.6; the encoder produces one base layer and one enhancement layer.

Figure 7.6: Block diagram of a signal-to-noise scalable encoder

According to [Dap01] the base layer is created by transforming the raw video using the discrete cosine transform, quantizing the coefficients and applying variable length coding. The differences between the coarsely quantized base layer and the original stream are stored in the enhancement layer using the following steps:

1. The input for producing the enhancement layer is the quantized base layer.

2. The DCT coefficients are reconstructed by inverse quantization.

3. The reconstructed DCT coefficients are subtracted from the original DCT coefficients; the difference is called the residual.

4. The residual is quantized with a quantization parameter smaller than that of the base layer.
5. The quantized bits of each layer are coded using variable length coding.

The steps for decoding an SNR scalable video can be found in Figure 7.7. Both the base and the enhancement layers must be decoded using variable length decoding and inverse quantization. Following this, the base layer is modified by the differences stored in the enhancement layer.

Figure 7.7: Signal-to-Noise Scalable Decoder Scheme
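The quantization structure can be sketched directly on (already transformed) coefficients (hypothetical Python/NumPy; the quantizer step sizes are illustrative).

import numpy as np

def snr_split(coefficients, q_base=16.0, q_enh=2.0):
    """Simplified SNR scalability: coarse quantization for the base
    layer, the residual re-quantized with a finer step for the
    enhancement layer."""
    base_q = np.round(coefficients / q_base)     # coarse quantizer
    reconstructed = base_q * q_base              # inverse quantization
    residual = coefficients - reconstructed      # difference to the original
    enh_q = np.round(residual / q_enh)           # finer quantizer
    return base_q, enh_q

def snr_merge(base_q, enh_q, q_base=16.0, q_enh=2.0):
    return base_q * q_base + enh_q * q_enh

coeffs = np.array([100.3, -42.7, 7.9, 0.5])
b, e = snr_split(coeffs)
print(snr_merge(b, e))  # [100. -42.   8.   0.], within +-1 of the input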
7.2 Multiple Description Coding
The second well known approach for layered coding is multiple description coding [Yao05]. Multiple description coding (MDC) was established as an approach to enhance the error resilience of video delivery systems in the early 1980s [Joh02]. Here the signal is encoded into a number of separate bit streams called descriptions, each of which has roughly the same (1) size and (2) influence on media quality, and each of which provides the same base quality. The more descriptions are used together, the better the quality of the stream. Multiple description coding differs from layered coding in the sense that every description can be used to provide the base quality. However, the price for the redundancy is a lower compression efficiency: more bits are required to reach the same quality as a conventional single description coder. In this thesis the overview of MDC is restricted to the polyphase down-sampling approach. Polyphase down-sampling has the advantage of being completely independent of any underlying video codec [Vit05]. The approach is characterized by a pre-processing stage at the encoder and a post-processing stage at the decoder. Sections 7.2.1 and 7.2.2 describe polyphase down-sampling in the temporal and spatial domain: applied in the temporal domain, the separation is between subsequent frames, whereas applied in the spatial domain it separates neighboring pixels.
7.2.1 Temporal Polyphase Down-Sampling
Polyphase down-sampling in the temporal domain divides the frames of the raw video stream between multiple descriptions [Ric04]. In the case that two descriptions are produced, even and odd frames are demultiplexed (see Figure 7.8) [Yao05].

Figure 7.8: Temporal Polyphase Down-Sampling

Demultiplexing is performed before the encoding starts; for the encoding itself any standard conformant video encoder can be used. In the scope of this thesis the FFMPEG API was used to produce the descriptions, each of which conforms to the MPEG-4 baseline format. Each stream has a separate prediction loop where motion estimation and motion compensation are performed. The negative aspect of this approach is that the prediction gain decreases, as the distance between neighboring frames is larger than in conventional single description coding. The more descriptions are produced, the higher the bit rate of each individual description. The bit rates from encoding 18 different video sequences using multiple description coding in the temporal domain are presented in Table 7.1.
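The demultiplexing step itself is straightforward, as the following sketch shows (hypothetical Python; the encoder stage behind it is omitted).

def temporal_mdc_split(frames):
    """Temporal polyphase down-sampling into two descriptions: even and
    odd frames are demultiplexed before encoding, so each description
    can be decoded on its own at half the frame rate."""
    return frames[0::2], frames[1::2]

def temporal_mdc_merge(even, odd):
    """Re-interleave both descriptions; if one is missing, the other
    still plays back at half the frame rate (graceful degradation)."""
    merged = []
    for pair in zip(even, odd):
        merged.extend(pair)
    merged.extend(even[len(odd):] or odd[len(even):])  # leftover frames
    return merged

frames = ["f0", "f1", "f2", "f3", "f4"]
d1, d2 = temporal_mdc_split(frames)        # ['f0','f2','f4'], ['f1','f3']
assert temporal_mdc_merge(d1, d2) == frames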
The table presents the additional storage/bandwidth requirements caused by the missing temporal dependencies. The min, mean and max values of the additional size can be found in Table 7.2.

FileName     Size    Size1   Size2   Additional Size
bridge       1.9MB   998KB   998KB   5.05 %
carphone     382KB   213KB   210KB   10.7 %
clair        226KB   125KB   123KB   0.97 %
coastguard   424KB   265KB   255KB   22.6 %
container    208KB   117KB   109KB   8.65 %
foreman      477KB   283KB   267KB   15.3 %
grandma      511KB   276KB   276KB   8.02 %
highway      1.5MB   802KB   783KB   5.6 %
lotrings     2.4MB   1.3MB   1.3MB   8.3 %
mother       168KB   93KB    87KB    7.14 %
news         274KB   157KB   147KB   10.94 %
salesman     352KB   197KB   198KB   12.21 %
silent       284KB   156KB   148KB   7.04 %
superman     14MB    8.7MB   8.6MB   23.57 %
f1-canada    7.8MB   4.8MB   4.7MB   21.79 %
davinci      5.8MB   3.9MB   3.9MB   34.48 %
estonia      154MB   88MB    88MB    14.28 %
orf-news     51MB    30MB    30MB    17.64 %

Table 7.1: Temporal MDC downsampling results
Additional Size
min    0.97 %
mean   13.02 %
max    34.48 %

Table 7.2: Summary of temporal MDC results

Analyzing the results in Table 7.2 it can be seen that for the 18 test streams the average overhead for producing two independent descriptions is 13.02%. In the best case the overhead is only 0.97%, in the worst case 34.48%.
In the next experiment the receipt of one of two temporal descriptions is compared against losing N% of the original stream, where the N% of the original stream are selected randomly. Due to the graceful degradation, MDC based streaming achieves much better results, as can be seen in Table 7.3.

FileName     Throughput Kbit/s   Original Stream (MOS)   1 Description (MOS)
bridge       431                 1.08                    3.0
carphone     372                 1.47                    3.0
clair        111                 2.57                    4.0
coastguard   476                 1.34                    2.85
container    182                 1.82                    3.0
foreman      449                 1.28                    3.07
grandma      310                 2.13                    4.0
highway      375                 1.13                    3.86
lotrings     1003                1.19                    4.32
mother       161                 1.83                    4.0
news         210                 1.98                    4.34
salesman     210                 1.98                    4.79
silent       210                 1.66                    4.26

Table 7.3: Temporal MDC downsampling results

Analyzing the results in Table 7.3 it can be seen that receiving 1 description always yields a better result than sending the original stream in full quality. The quality improvement in % can be found in Table 7.4.

Quality Difference
min    23.6 %
mean   41.58 %
max    62.6 %

Table 7.4: Quality Difference

Summarizing the experiment it can be said that the quality of the 13 test streams that were generated using the prototype temporal multiple description coding approach was on average 41.58% better than the quality of the corresponding single video streams that were transmitted under the same conditions.
7.2.2 Spatial Polyphase Down-Sampling
Similarly to temporal polyphase down-sampling (Section 7.2.1), multiple descriptions can be created by demultiplexing neighboring pixels. Figure 7.9 shows a scenario where four descriptions are produced in two subsequent phases: down-sampling is first performed along the rows and then along the columns. Due to strong inter-pixel correlations, each of the four descriptions contains the main features of the original image.

Figure 7.9: Spatial Down-Sampling

The additionally required storage space / bandwidth from encoding 16 different video sequences, compared to using a standard MPEG-4 encoder, is presented in Table 7.5.
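The two-phase polyphase split can be expressed compactly (hypothetical Python/NumPy sketch).

import numpy as np

def spatial_mdc_split(frame):
    """Polyphase down-sampling into four descriptions: first along the
    rows, then along the columns. Neighboring pixels end up in different
    descriptions, so each one keeps the main image features."""
    return [frame[r::2, c::2] for r in (0, 1) for c in (0, 1)]

def spatial_mdc_merge(descriptions, shape):
    """Re-interleave the four descriptions into the original frame."""
    frame = np.empty(shape, dtype=descriptions[0].dtype)
    for d, (r, c) in zip(descriptions, [(0, 0), (0, 1), (1, 0), (1, 1)]):
        frame[r::2, c::2] = d
    return frame

frame = np.arange(16).reshape(4, 4)
parts = spatial_mdc_split(frame)
assert np.array_equal(spatial_mdc_merge(parts, frame.shape), frame)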
The min, mean and max overheads can be found in Table 7.6. Comparing these results to the ones presented in Table 7.2, it can be seen that the overhead of the prototype temporal multiple description encoder is in the average case 5.24% and in the max case 15.78% lower than that of the prototype spatial multiple description encoder. Only in the case of the minimal overhead did the spatial encoder perform 7% better.

FileName     Size    Size1   Size2    Additional Size
bridge       1.9MB   1.2MB   1.3MB    31.58 %
carphone     382KB   292KB   311KB    57.58 %
clair        226KB   216KB   230KB    95.13 %
coastguard   424KB   291KB   298KB    41.51 %
container    208KB   131KB   137KB    29.81 %
foreman      477KB   402KB   437KB    75.68 %
grandma      511KB   339KB   376KB    39.33 %
highway      1.5MB   989KB   1100KB   39.27 %
lotrings     2.4MB   697KB   713KB    16.67 %
mother       168KB   143KB   151KB    76.79 %
news         274KB   193KB   201KB    43.8 %
salesman     352KB   223KB   235KB    30.11 %
silent       284KB   210KB   216KB    50.7 %
superman     14MB    9.3MB   11MB     21.43 %
f1-canada    7.8MB   5.9MB   6.5MB    61.54 %
davinci      5.8MB   3.7MB   4.3MB    37.93 %

Table 7.5: Spatial MDC downsampling results

Additional Size
min    16.67 %
mean   46.82 %
max    95.13 %

Table 7.6: Summary of spatial MDC results
In the next experiment the receipt of one of two spatial descriptions is compared against losing N% of the original stream, where N% of the original stream corresponds to the size of the description that is assumed to be lost.

FileName     Throughput Kbit/s   Original Stream (MOS)   Description 1 (MOS)
bridge       431                 1.08                    3.0
carphone     372                 1.47                    2.26
clair        111                 2.57                    3.29
coastguard   476                 1.34                    1.98
container    182                 1.82                    2.76
foreman      449                 1.28                    1.76
grandma      310                 2.13                    3.28
highway      375                 1.13                    2.35
lotrings     1003                1.19                    4.71
mother       161                 1.83                    3.28
news         210                 1.98                    2.32
salesman     210                 1.98                    3.01
silent       210                 1.66                    2.63

Table 7.7: Spatial MDC downsampling results

Analyzing the results in Table 7.7 it can be seen that receiving 1 description always yields a better result than sending the original stream in full quality. The quality improvement in % can be found in Table 7.8.

Quality Difference
min    6.8 %
mean   23.34 %
max    70.4 %

Table 7.8: Quality Difference

Summarizing the experiment it can be said that the quality of the 13 test streams that were generated using the prototype spatial multiple description coding approach was on average 23.34% better than the quality of the corresponding single video streams that were transmitted under the same conditions. Due to the much lower storage space and bandwidth overhead as well as the better quality under loss, the streams in the proxy-to-proxy approach are encoded using multiple description coding in the temporal domain.
CHAPTER 8
Stream Affinity based Error Treatment
Chapters 6 and 7 presented two approaches for concealing and avoiding errors (forward error correction and layered coding). The limitation of these approaches is that the ability to deliver the content in the desired quality depends on the characteristics of the host, the network and the content. Error avoidance can only be applied when a media stream (1) consists of multiple descriptions and (2) is available, at least partially, on multiple proxies. The effectiveness of forward error correction is heavily influenced by the loss patterns; in the case of burst losses, parity based forward error correction is only of limited benefit. Therefore, in the context of this thesis, a metric called Stream Affinity was developed. Stream Affinity can be used to decide between (1) error concealment, (2) error avoidance and (3) a combination of both approaches, depending on the current host, network and content characteristics. If none of the three treatments is able to achieve error-free delivery, requests are rejected. In order to avoid comparing all possible numbers of streaming servers and amounts of redundant network packets, the well known heuristic A* algorithm is used.
8.1 Multiple Source Streaming
Multiple source streaming is an approach that enables the delivery of (1) multiple descriptions or (2) a combination of data- and forward error correction streams to one receiver. Multiple source streaming has been developed together with the author of [Ste06], who has examined • Round Robin Scheduling • Fair Scheduling • Weighted Fair Scheduling for regular MPEG-1/2/4 streams. In this thesis the work of [Ste06] has been extended by examining the delivery of multiple description coded streams. An important aspect about the delivery of MDC streams is to preserve the independency during the network transmission. In order to outline the influence of the scheduling strategy on the quality of the media stream 4 scenarios are presented. For all experiments a media stream called Akio, available from [YUV], has been used. The stream was encoded using multiple description coding in the temporal domain and consists of two descriptions. The full stream has a rate of 1081 Kbit/s, description 1 has a rate of 539 Kbit/s and description 2 has a rate of 542 Kbit/s. The experiments have been performed using the network simulator NS-2 [NS201] and a plug-in called EvalVid [Jir03]. A logical view of the topology used within NS-2 can be found in Figure 8.1. The figure shows the streaming servers, the player (in fact a quality analyzing tool) as
Figure 8.1: Logical view of the testbed
The figure shows the streaming servers, the player (in fact a quality analyzing tool) as well as the RTP mixer. The RTP mixer is an intermediary bidirectional component located between the server(s) and the player. RTSP messages from the player are duplicated and forwarded to each streaming server. Data streams from multiple servers are merged and forwarded as one single stream to the player. The merging process is transparent, so that any standard conformant media player or analyzing tool can be used.
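The merging step can be illustrated by a minimal Python sketch. All names (merge_streams, the packet tuples) are illustrative and not taken from the prototype; a real RTP mixer would additionally handle sequence number wrap-around and timing:

    # Minimal sketch of the merging step performed by the RTP mixer:
    # packets from several servers are combined into one stream that is
    # ordered by RTP sequence number before being forwarded to the player.
    def merge_streams(server_queues):
        """server_queues: one list of (seq_no, payload) tuples per server."""
        merged = []
        for queue in server_queues:
            merged.extend(queue)
        merged.sort(key=lambda packet: packet[0])  # order by sequence number
        return merged

    # Example: two servers each deliver one description of the same stream.
    server1 = [(0, "frame d1/1"), (2, "frame d1/2")]
    server2 = [(1, "frame d2/1"), (3, "frame d2/2")]
    print([seq for seq, _ in merge_streams([server1, server2])])  # [0, 1, 2, 3]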
8.1.1 Scenario 1 - Round Robin Scheduling
In the first scenario (Figure 8.2) two streaming servers are used. The paths from the servers to the mixer have the same upload bandwidth (550 Kbit/s), so description 1 and description 2 can be transmitted without loss.

Figure 8.2: Logical View Scenario 1 and Scenario 2

The scheduling decision is to ignore the coding format and use coding-independent round robin scheduling, in which the content is delivered from both servers on a per-byte basis. During the first run the network state does not change and the stream can be rendered in full quality, with a mean opinion score of 5.0. The experiment is then repeated. During the second run a network element on path 2 fails immediately after the start of the streaming session. Although the stream was encoded in two independent descriptions, the data received over path 1 is rendered useless by the coding-unaware round robin scheduling.
The resulting mean opinion score is 1.0. A solution to this problem can be found in Scenario 2.
8.1.2 Scenario 2 - Coding Aware Fair Scheduling
The second scenario is similar to the first one, with the difference that coding-aware fair scheduling is used instead of coding-unaware round robin scheduling. The fragmentation between the servers is based on coding-dependent units called chunks. In the case of a regular MPEG-1/2/4 video a chunk can be a slice, a frame or a group of pictures; when multiple description coding is used, a chunk is a description. Coding-aware fair scheduling preserves the independence of the two descriptions during the transmission. In this scenario, server 1 sends description 1 (539 Kbit/s) and server 2 sends description 2 (542 Kbit/s). Similarly to the second run of scenario 1, a failure occurs on path 2. Contrary to scenario 1, the data received over path 1 can still be decoded successfully. In total only 542 packets are lost due to the failure on path 2. The measured mean opinion score is 4.34. Compared to the second run of scenario 1, the quality is 434 % better under the same conditions.
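The difference between the two strategies can be made concrete with a small Python sketch (hypothetical names; a chunk is represented simply by a string). Coding-aware fair scheduling maps every chunk of a description to the same server, so the failure of one path never contaminates the other description:

    # Sketch of coding-aware fair scheduling: chunks belonging to the same
    # description are never split between servers, so the independence of
    # the MDC descriptions is preserved during transmission.
    def assign_descriptions(descriptions, servers):
        """descriptions: list of chunk lists, one per description.
        Returns a mapping {server: [chunks]}."""
        schedule = {server: [] for server in servers}
        for index, chunks in enumerate(descriptions):
            schedule[servers[index % len(servers)]].extend(chunks)
        return schedule

    description1 = ["d1-gop1", "d1-gop2"]
    description2 = ["d2-gop1", "d2-gop2"]
    print(assign_descriptions([description1, description2],
                              ["server1", "server2"]))
    # {'server1': ['d1-gop1', 'd1-gop2'], 'server2': ['d2-gop1', 'd2-gop2']}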
8.1.3 Scenario 3 - Coding Aware Weighted Fair Scheduling
In the third scenario path 1 and path 2 have different available upload bandwidths of 725 Kbit/s and 240 Kbit/s respectively. The bandwidth of path 1 is thus 26.4 % above the bit rate of description 1, and the bandwidth of path 2 is 32.47 % below the bit rate of description 2. Coding-dependent weighted fair scheduling is used to balance the differences between the paths: different path capacities are balanced by assigning weights to the senders. The principle of weighted fair source scheduling can be explained with the help of a byte counter, whose actual value is calculated by weighting the number of transmitted bytes. In this scenario, server 1 sends description 1 plus 34 % of description 2, and server 2 sends the remaining 66 % of description 2.
Figure 8.3: Logical View Scenario 3

The fragmentation of description 2 between server 1 and server 2 is coding aware and done on a per-GOP basis within the description, so for description 2 there are no dependencies between the fragments sent by server 1 and server 2. During the first run the network state remains unchanged and the stream can be delivered without quality loss; the measured mean opinion score is 5.0. During the second run path 1 fails immediately after the start of the session. Description 1 plus 34 % of description 2, i.e. 1699 (1248 + 451) packets and thus the major part of the stream, is lost. The measured mean opinion score is 3.02 (39.6 % below the original quality). A solution to this problem is presented in the fourth scenario (Section 8.1.4).
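The byte counter principle can be sketched as follows; the function names and chunk sizes are hypothetical. Each chunk is handed to the sender whose weighted byte counter is currently smallest, so a path with three times the weight carries roughly three times the data:

    # Sketch of weighted fair source scheduling via weighted byte counters:
    # each chunk is assigned to the sender whose counter (sent bytes divided
    # by its weight) is currently smallest, so faster paths carry more data.
    def weighted_fair_schedule(chunks, weights):
        """chunks: list of (chunk_id, size_bytes); weights: {sender: weight}.
        Returns {sender: [chunk_ids]}."""
        counters = {sender: 0.0 for sender in weights}
        schedule = {sender: [] for sender in weights}
        for chunk_id, size in chunks:
            sender = min(counters, key=lambda s: counters[s] / weights[s])
            schedule[sender].append(chunk_id)
            counters[sender] += size
        return schedule

    # Path 1 (725 Kbit/s) gets roughly three times the weight of path 2 (240 Kbit/s).
    gops = [(f"gop{i}", 50_000) for i in range(8)]
    print(weighted_fair_schedule(gops, {"server1": 3.0, "server2": 1.0}))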
8.1.4 Scenario 4 - Coding Aware Weighted Fair Scheduling with Forward Error Correction
In this scenario a combination of coding-aware weighted fair scheduling and forward error correction is presented. The media data is sent from servers 1 and 2; the network conditions between servers 1 and 2 and the receiver are identical to the third scenario. The forward error correction packets are sent from server 3. Path 1 fails immediately after the start of the streaming session and again 1699 packets are lost. The mixer detects the loss and triggers the transmission of the forward error correction packets, allowing 924 packets from description 1 to be reconstructed. The total number of lost packets is then 775, caused by burst losses. The mean opinion score of the merged stream is 4.08 (18.4 % below the original quality). The result is 21.2 % better than the result of the second run in scenario 3.
Figure 8.4: Scenario 4
8.1.5 Conclusion from the Streaming Scenarios
Based on the results of the streaming scenarios presented in Sections 8.1.1 to 8.1.4, the scheduling strategy used in the proxy-to-proxy framework is always weighted fair scheduling, with or without forward error correction. In order to calculate the weights assigned to the different paths and to decide about the degree of forward error correction, a metric called stream affinity (Section 8.2) is used.
8.2 The Stream Affinity Metric
The stream affinity metric is used to find the best error treatment for media stream delivery over best-effort networks. The metric combines two measures called Network Closeness and Quality Closeness:

\[
\mathrm{StreamAffinity} = \frac{1}{N} \cdot \mathrm{QualityCloseness} \cdot \prod_{i=1}^{N} \mathrm{NetworkCloseness}_i
\]
where N is the number of streaming servers, NetworkCloseness_i represents the network capabilities between proxy i and the end-client, and QualityCloseness represents the probability of receiving the stream in the desired quality. One important aspect is the factor 1/N: the more streaming servers are required to achieve a certain quality, the lower the affinity value. Stream Affinity takes values between 0 and 1.
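Under this definition the metric is straightforward to evaluate; a minimal Python sketch:

    import math

    def stream_affinity(quality_closeness, network_closeness):
        """StreamAffinity = (1/N) * QualityCloseness * prod(NetworkCloseness_i),
        with one network closeness value per streaming server."""
        n = len(network_closeness)
        return (1.0 / n) * quality_closeness * math.prod(network_closeness)

    # Every additional server lowers the 1/N factor, so one server with a
    # perfect path is preferred over two servers with perfect paths.
    print(stream_affinity(0.9, [1.0]))       # 0.9
    print(stream_affinity(0.9, [1.0, 1.0]))  # 0.45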
8.2.1 Network Closeness
Network closeness is used to select between alternative streaming servers based on two factors: (1) the available bandwidth to the receiver and (2) the bit rate of the video stream. In order not to harm other network streams, "TCP-friendliness" [Jit98] is considered when calculating the available bandwidth:

\[
\mathrm{AvailableBandwidth} = \frac{s}{t_{RTT}\sqrt{\frac{2p}{3}} + t_{RTO}\left(3\sqrt{\frac{3p}{8}}\right)p\left(1+32p^{2}\right)} \tag{8.1}
\]
where s is the packet size, t_RTT is the round-trip time, p is the packet loss probability and t_RTO is the TCP retransmit timeout value. Network Closeness is calculated as the ratio between the available bandwidth and the required bit rate:

\[
\mathrm{NetworkCloseness} = \min\left(1,\ \frac{\mathrm{AvailableBandwidth}}{\mathrm{Bitrate}}\right)
\]
where AvailableBandwidth is the TCP-friendly available bandwidth (Equation 8.1) between the sender and the receiver, measured in bits per second, and Bitrate is the bit rate of the (partial) video stream. Network Closeness takes values between 0 and 1. If the available bandwidth from the sender to the receiver is sufficient to deliver the content without loss, Network Closeness has the value 1.
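Equation 8.1 and the closeness ratio can be evaluated as in the following sketch (SI units are assumed: packet size in bits, times in seconds, bandwidths in bits per second):

    import math

    def available_bandwidth(s, rtt, p, rto):
        """TCP-friendly bandwidth (Equation 8.1) after [Jit98]:
        s = packet size in bits, rtt = round-trip time in seconds,
        p = loss probability, rto = TCP retransmit timeout in seconds."""
        denom = rtt * math.sqrt(2.0 * p / 3.0) \
              + rto * (3.0 * math.sqrt(3.0 * p / 8.0)) * p * (1.0 + 32.0 * p * p)
        return s / denom

    def network_closeness(avail_bw, bitrate):
        """NetworkCloseness = min(1, AvailableBandwidth / Bitrate)."""
        return min(1.0, avail_bw / bitrate)

    # Example: 1500-byte packets, 100 ms RTT, 1 % loss, 400 ms RTO.
    bw = available_bandwidth(1500 * 8, 0.1, 0.01, 0.4)
    print(network_closeness(bw, 539_000))  # closeness for description 1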
8.2.2 Quality Closeness
Quality closeness expresses the probability that a stream arrives at the receiver with the desired quality. This probability depends (1) on the packet loss of the network
path and (2) on the types of frames in the stream. MPEG coded video streams [Mar03] consist of three main frame types: I, P and B [Jil98]. I-frames (intra-coded frames) have the advantage of being self-contained, allowing random access, with the disadvantage that their compression rate is much lower than that of P- or B-frames. P-frames (predictive-coded frames) have a better compression ratio than I-frames, but encoding and decoding require information from the previous I- or P-frames. The third type of frames are bi-directionally predictive-coded frames (B-frames). B-frames have the highest compression ratio compared to I- and P-frames, but they additionally depend on the I-frame and all P-frames in the GOP. The dependency relationships are shown in Figure 8.5. Errors in I- or P-frames are propagated to all dependent frames. For example, with a typical playback rate of 30 frames per second and a GOP size of 15 frames, the loss of one single I-frame leads to an outage of 500 msec. So quality closeness is used to consider the structure of the stream in addition to the loss rate of the network.
Figure 8.5: GOP structure

For example, a stream encoded using only I-frames and losing more frames sometimes achieves a better quality result than the same content with a lower bit rate, encoded using I-, P- and B-frames, sent over the same link. Using the network model presented in [Hua03] it is possible to map lost network packets to the quality decrease of the video stream. Different packet losses have different effects on the media quality, and thus the protection mechanism has to be adapted to the importance of the frames. The model requires the number of network packets belonging to each video frame as well as the loss probability of the network path. The loss probability can be expressed
as the ratio of received to transmitted packets:

\[
\mathrm{LossProbability} = \frac{\mathrm{Packets}_{Received}}{\mathrm{Packets}_{Transmitted}} \tag{8.2}
\]
The number of received packets (Packets_Received) is determined by sending test packets from the sender to the receiver; the number of transmitted packets can be calculated by parsing the structure of the stream. The LossProbability takes values between 0 and 1: 0 means that all packets are lost, 1 means that all packets are received. Knowing the loss probability of the path and the number of packets belonging to a frame, the probability of successfully receiving one single frame can be calculated using the binomial distribution as follows:

\[
ap(T, p) = 1 - \binom{T}{0}\, p^{0}\, (1-p)^{T-1} \tag{8.3}
\]
T is the number of network packets and p is the loss probability of the path (defined in Equation 8.2). Calculating the arrival probability (ap) for one single frame is not sufficient to select between streams on alternative proxies. Videos have playback times ranging from several seconds to hours, so analyzing the complete structure would take too long. However, the fact that video streams are organized in subsequent groups of pictures (GOPs) can be used to simplify the calculations. In the test streams each GOP follows the same frame pattern (e.g. IBBP...), providing sufficient information to make predictions about the complete video. In order to model packet loss for a group of pictures, I-, P- and B-frames must be analyzed separately, as they have different sizes and dependencies:

\[
ap_I = ap(N_I, F_I, p), \qquad ap_P = ap(N_P, F_P, p), \qquad ap_B = ap(N_B, F_B, p) \tag{8.4}
\]

ap_I, ap_P and ap_B are the probabilities that I-, P- and B-frames are not lost. N_I, N_P, N_B are the numbers of packets for each type of frame, F_I, F_P, F_B are the
numbers of forward error correction packets used, and p is the network loss probability (defined in Equation 8.2). Calculating the probability R_I that all packets belonging to the I-frame arrive successfully is simple, because no dependencies need to be considered:

\[
R_I = ap_I \tag{8.5}
\]
P- and B-frame dependencies are considered in the rest of this section. When calculating the probability for P-frames, the dependency on the previous I-frame and all previous P-frames has to be considered:

\[
R_P = R_I \times ap_P \times ap_P^{\,N_P-1} \tag{8.6}
\]
where R_I × ap_P expresses the probability of successfully decoding the I-frame referenced by the first P-frame of the GOP, and ap_P^(N_P−1) expresses the probability of decoding the (N_P − 1) P-frames depending on other P-frames. As B-frames depend on one preceding and one succeeding I- or P-frame, the probability for the arrival of all B-frames is calculated as:

\[
R_B = R_I \times ap_P \times ap_B \times \left(R_P \times ap_B\right)^{N_B-1} \tag{8.7}
\]
where the first B-frame of the GOP depends on one I-frame and one P-frame, and all (N_B − 1) others depend on one preceding and one succeeding P-frame. The total probability for the arrival of the packets belonging to the frames of a GOP is thus:

\[
\mathrm{ArrivalProbability} = R_I \times R_P \times R_B \tag{8.8}
\]
Quality Closeness is calculated as:

\[
\mathrm{QualityCloseness} = \frac{\mathrm{ArrivalProbability}}{r(F_I) + r(F_P) + r(F_B)}
\]

where r(F_I) + r(F_P) + r(F_B) is the percentage of forward error correction packets required. The higher the amount of required forward error correction packets, the lower the quality closeness value.
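Taken together, Equations 8.3 to 8.8 can be evaluated as in the following sketch. It follows the equations as reconstructed above, omits the forward error correction counts F_I, F_P, F_B for brevity, and uses p as defined in Equation 8.2 (1 means that all packets are received):

    from math import comb

    def ap(T, p):
        """Arrival probability of one frame of T packets (Eq. 8.3)."""
        return 1.0 - comb(T, 0) * p ** 0 * (1.0 - p) ** (T - 1)

    def gop_arrival_probability(n_i, n_p, n_b, p):
        """Combine Equations 8.4 to 8.8 for one GOP."""
        ap_i, ap_p, ap_b = ap(n_i, p), ap(n_p, p), ap(n_b, p)
        r_i = ap_i                                           # Eq. 8.5
        r_p = r_i * ap_p * ap_p ** (n_p - 1)                 # Eq. 8.6
        r_b = r_i * ap_p * ap_b * (r_p * ap_b) ** (n_b - 1)  # Eq. 8.7
        return r_i * r_p * r_b                               # Eq. 8.8

    # Example GOP: 8 I-frame packets, 4 P-frame packets, 2 B-frame packets.
    print(round(gop_arrival_probability(8, 4, 2, 0.99), 4))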
8.3 Error Treatment Selection based on the A∗ Algorithm
This section demonstrates how the stream affinity metric is used to find the best error treatment using a graph-based search. The nodes in the graph are states in the decision process; the arcs represent the dependencies between the states. Each state is characterized by a set of possible error treatment methods. When starting at node S, all possible error treatments can be selected; after each decision the remaining possibilities are limited by the constraints. Figure 8.6 shows how the best error treatment for an example video V1 is found.

Figure 8.6: Graph based error treatment selection

The first decision is to select the server with the best network connection to the client. The quality of the network connection is determined by considering the bit rate of the stream and the available bandwidth of the network (see Network Closeness, Section 8.2.1). Assuming that video V1 is available on two servers, three options exist: (1) sending it from server 1, (2) sending it from server 2 or (3) sending fragments from both servers (multiple source streaming). In this scenario it is assumed that none of the paths has sufficient bandwidth to stream the content from a single server. As can be seen in Figure 8.6, the network closeness values of the single-server paths are 0.3, so sending fragments from
both servers is the best alternative, with a network closeness value of 0.6. The network closeness value of 0.6 (< 1.0) indicates that even two servers are not sufficient to deliver the content without quality loss. To further improve the stream quality, error avoidance has to be combined with error correction; accordingly, the next three selection steps determine the degree of I-frame, P-frame and B-frame protection. This selection is especially important when the network bandwidth is not sufficient to send all error correction frames. Using QualityCloseness allows the frames with the highest loss probability to be protected by redundant packets.
Heuristics based error treatment selection

The problem with a pure graph-based search, as presented so far, is that all alternative selections have to be calculated and compared against each other before a decision can be made. In order to avoid an exhaustive search, the A∗ algorithm [Win93] is used to find the best solution by comparing only a minimal set of alternatives. The algorithm uses heuristic values in combination with real ones (those already known from past decisions). For example, when the decision process is in stage 2 of 5, two values are already known and three have to be estimated; these estimated values are used for the further decision making. Heuristic values are good estimates of the current state of the system, with the advantage that they can be calculated much faster than the real ones. Assume that a media stream with a bit rate of 1000 Kbit/s has to be delivered, and that it can be streamed either from server S1 with an upload bandwidth of 500 Kbit/s or from server S2 with an upload bandwidth of 1500 Kbit/s. In order to really judge the resulting quality, both streams would have to be sent from both servers, allowing the measurement of the PSNR values. Sending the stream from both servers is not practicable in a real system, so heuristic values are used. The heuristic value for S1 would be 1 and the heuristic value for S2 would be 3, meaning that the probability of delivering the stream with the expected quality is three times higher from S2, even though the
current network conditions are not known. When using heuristic values in combination with the A∗ algorithm it is important to obey the admissibility theorem, which states that the heuristic values may be better but must never be worse than the real ones. Consider for example that the network node with the best up-link capacity is required. The selection is between two alternative servers: the up-link capacity of server 1 is 10 Mbit/s and that of server 2 is 100 Mbit/s. An exhaustive evaluation requires two network measurements, which costs time and is a network-intensive task. So in the first step only the capacity of the link that is expected to yield the better result is measured. Assuming that the measured throughput is 50 Mbit/s, the throughput measurement for the second up-link can be skipped, because a link with a maximal capacity of 10 Mbit/s can never achieve a higher throughput than one with a measured throughput of 50 Mbit/s.
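The pruning idea can be illustrated with the up-link example above. The sketch below is hypothetical: link capacities serve as admissible (optimistic) estimates, and a candidate link is only measured if its estimate can still beat the best measured result:

    # Sketch of admissible pruning when searching for the best up-link:
    # candidates are examined in order of their optimistic capacity bound,
    # and a link is only measured if its bound can still beat the best
    # throughput measured so far.
    def select_best_uplink(links, measure):
        """links: {name: capacity bound, never below the real throughput};
        measure(name) returns the real throughput."""
        best_name, best_throughput = None, 0.0
        for name in sorted(links, key=links.get, reverse=True):
            if links[name] <= best_throughput:
                continue  # bound already worse than a measured result: skip
            throughput = measure(name)
            if throughput > best_throughput:
                best_name, best_throughput = name, throughput
        return best_name, best_throughput

    # Example from the text: the 100 Mbit/s link measures 50 Mbit/s, so the
    # 10 Mbit/s link is never measured at all.
    measurements = {"server2": 50.0}  # hypothetical measurement results, Mbit/s
    print(select_best_uplink({"server1": 10.0, "server2": 100.0},
                             lambda name: measurements[name]))
    # ('server2', 50.0)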
8.4 Evaluation

8.4.1 Scenario 1
The following example illustrates the necessity of combining Network Closeness and Quality Closeness. Content C is provided in different qualities by two alternative proxies (Figure 8.7). Stream A is encoded using only I-frames; stream B is encoded using I-, P- and B-frames.

Figure 8.7: Logical view scenario 1

Calculating the network closeness value for both streams yields a better result for stream B (Table 8.1).
Stream   Bitrate   Avail. BW   NW-Closeness
A        1462      1257        0.86
B        1098       989        0.90

Table 8.1: Stream and Network Characteristics

Based on this calculation alone, one would decide to request stream B. When stream affinity is calculated (Table 8.2), it can be seen that stream A is expected to yield the better result.

Stream   Stream Affinity
A        0.66
B        0.50

Table 8.2: Stream Affinity

In order to verify the stream affinity calculation, both decisions are simulated. By sending both streams and comparing the qualities, it can be seen that considering Network Closeness alone is not sufficient and that selecting stream B was the wrong decision. The MOS values of streams A and B are 4.02 and 3.50 respectively (Table 8.3). Stream B scores poorly because of the temporal dependencies on lost frames, which cause additional quality degradation. So the stream affinity metric can be used to select the better alternative.

Decision   Mean Opinion Score
Server A   4.02
Server B   3.50

Table 8.3: Result scenario 1

This small example shows that calculating the ratio between the available bandwidth and the bit rate of the stream is not sufficient to decide between alternative streaming sources: the structure of the stream has a strong influence on the resulting quality.
8.4.2 Scenario 2
In the second scenario it is assumed that two proxies provide the requested content. The quality of the content is identical on both proxies, and the network paths to the receiver have an average bandwidth of 300 Kbit/s.
Figure 8.8: Logical view scenario 2

Without calculating stream affinity, a likely decision would be to send one description from each of the proxies, accepting that some data is lost. This decision would be based on the fact that no additional streaming server is available to send forward error correction packets. When stream affinity is calculated, it can be seen that sending two descriptions and no forward error correction stream would be the wrong decision (Table 8.4):

Decision                      Stream Affinity
2 Descriptions                0.44
1 Description + FEC Stream    0.62

Table 8.4: Possible decisions scenario 2

In order to verify the stream affinity calculation, both decisions are simulated. The results are listed in Table 8.5.

Decision                          Mean Opinion Score
Send 2 Descriptions               2.25
Send 1 Description + FEC Stream   3.16

Table 8.5: Result scenario 2

As can be seen in Table 8.5, the result of sending one description and one forward error correction stream is 18.2 % better than sending two descriptions. In this scenario sending two descriptions performs poorly because neither of the two descriptions can be fully received. In contrast, sending the error correction stream allows one description to be received with a minimal loss of 0.5 %.
Part III

Constraint based Data Replication
CHAPTER 9
Caching
Replicating data from primary sources to surrogate servers close to the clients is generally used to (1) reduce the load on the primary server, (2) decrease network traffic and (3) shorten the response time (known as start-up latency for continuous data). Proxies are placed at strategic locations; for example, placing a proxy at the border of a local network keeps part of the traffic within the network. A proxy can receive a request (a) through interception or (b) through redirection. Interception requires that the user knows about the proxy and configures the application so that the proxy receives every request the client sends to a server. Upon a request the proxy checks the local cache for the availability of the content; when the content is available it serves the request, otherwise it forwards the request to the server. The alternative to interception is redirection, whose best known mechanisms are DNS and HTTP redirection [Jus01]. Several architectures are possible for proxy caching. This work discusses (1) standalone proxies, (2) hierarchical proxy architectures and (3) distributed (flat) proxy architectures.
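The basic decision made by a proxy on every request can be sketched as follows (hypothetical names, a dictionary standing in for the cache):

    # Sketch of the basic proxy decision: serve from the cache if possible,
    # otherwise fetch from the origin server and cache the result.
    def handle_request(name, cache, fetch_from_origin):
        """cache: dict mapping content name to data;
        fetch_from_origin: callable used on a cache miss."""
        if name in cache:
            return cache[name]            # cache hit: served locally
        data = fetch_from_origin(name)    # cache miss: forward to the server
        cache[name] = data                # keep a replica for later requests
        return data

    cache = {}
    origin = {"video1": b"..."}
    handle_request("video1", cache, origin.get)  # miss: fetched and cached
    handle_request("video1", cache, origin.get)  # hit: served from the cache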
9.1 Standalone Caching
A logical view of a possible standalone proxy architecture can be found in Figure 9.1. Here the proxy is located close to the clients.
Figure 9.1: Standalone Proxy Architecture

If the clients send N requests for the same content within a certain time, N − 1 of these requests can be served from the cache. This type of architecture is typically not transparent for the end-users, as they must configure their applications to use the proxy.
9.2 Hierarchical Caching
The basic idea of hierarchical caching is to place caches at 4 levels in the network [Wan99]. These levels are known as (1) the bottom (client/application caches), (2) the institutional, (3) the regional and (4) the national level; the lowest level is the bottom level, the highest the national level. This architecture tries to serve each request at the lowest possible proxy level. When a request cannot be served from there, it is forwarded to the next higher level; the process is repeated recursively until the national level (level 4) is reached. Requests that cannot be served at any of the 4 levels are forwarded to the original server. According to [Wan99] the disadvantages of hierarchical caching are (1) difficult proxy relocation, (2) high delay, (3) bottlenecks at high-level caches and (4) wasted storage space. Relocating proxies is difficult because all dependent proxies at the lower levels have to be updated in order to forward requests correctly. High delay is caused by checking content availability at each level. Bottlenecks mainly occur at the higher levels, because requests from the lower levels are aggregated there. Finally, a lot of
storage space is wasted by caching the same content at different levels.
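The recursive lookup can be sketched as follows; the four levels are modeled as a simple list of dictionaries, and all names are illustrative:

    # Sketch of hierarchical lookup: each level serves the request if it can,
    # otherwise the request moves up one level; the origin server is the
    # last resort.
    def hierarchical_lookup(name, levels, origin):
        """levels: caches ordered bottom (client) to top (national)."""
        for level, cache in enumerate(levels, start=1):
            if name in cache:
                return level, cache[name]     # served at this level
        return "origin", origin[name]         # not cached at any level

    levels = [{}, {"video1": b"..."}, {}, {}]  # institutional cache holds it
    print(hierarchical_lookup("video1", levels, {"video1": b"..."})[0])  # 2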
9.3 Distributed Caching
Distributed caching is based on the approach of hierarchical caching, but differs in the sense that caches are only available at one level (e.g. the institutional level), so requests are only forwarded to proxies at the same level and never to parents. Each cache maintains meta information about the content that is available in the other caches at the same level. The challenge of this approach is the exchange of the information about cached objects. In order to provide the caches with accurate information, hierarchical distribution mechanisms are used, where meta information is not broadcast to all members but only exchanged between child and parent nodes.
9.4 Hybrid Caching
Hybrid caching combines hierarchical caching with distributed caching. The goal of this approach is to fetch the content from a cache with a sufficiently good network connection. For example, when the content is available both on a higher-level cache reachable only over a congested network connection and on an original server with a good network connection, the second alternative is preferred.
9.5 Cache Replacement Strategies
According to [Ste03] the best known cache replacement strategies are (1) recency-, (2) frequency-, (3) size-, (4) cost- and (5) modification-time based. In this chapter only recency- and frequency-based strategies are discussed. Recency-based algorithms use the time since the last object access for making replacement decisions. The best
known recency-based replacement algorithm is the least recently used (LRU) algorithm. The least recently used principle can be understood by considering a linked list of items together with a set of items: the linked list represents all objects that are cached by the proxy, whereas the set contains the objects that are available on the original server. The items in the list are sorted according to their last access time. The most recently
used item is the first element and the least recently used item is the last element of the list. When the cache is full but a new object has to be cached, the least recently used object (the one with the oldest access timestamp) is replaced. The main disadvantage of the basic LRU replacement algorithm is that the size of the cached objects is not considered. Assume that the new object to cache requires a storage space of 1 MB, while the size of element N (the least recently used one) in the list is only 500 kB. Removing only the least recently used element is not sufficient to cache the new object, so object N−1 (1 MB) is removed as well. In this case removing only object N−1 would have been sufficient, and object N could have been kept. The second type of replacement algorithms discussed in this chapter are the frequency-based ones. Frequency-based algorithms consider object popularity, represented by the number of requests. At the implementation level the mechanism differs from the recency-based one in the sense that the linked list is sorted according to the number of requests: the object with the highest number of requests is at the head, the one with the lowest number at the tail of the list. The main problem of the least frequently used (LFU) replacement algorithm is that the history has a strong influence on the replacement decisions. For example, an object that has been very popular in the past but is not requested any more might still have a higher popularity value than objects that are just becoming popular. In order to get rid of
influences from the past, the least frequently used algorithm has been extended; these extensions are basically similar to the LRU principle [Ste03].
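A size-aware variant of the basic LRU strategy can be sketched as follows. Note that eviction still proceeds strictly in recency order, which is exactly the weakness described above: object N is evicted before object N−1, even when removing N−1 alone would free enough space.

    # Sketch of basic LRU replacement with object sizes: victims are taken
    # strictly from the least recently used end until enough space is free.
    from collections import OrderedDict

    class LruCache:
        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.items = OrderedDict()  # oldest access first

        def access(self, name, size):
            if name in self.items:
                self.items.move_to_end(name)  # refresh recency on a hit
                return
            while self.used + size > self.capacity and self.items:
                _, victim_size = self.items.popitem(last=False)
                self.used -= victim_size      # evict the least recently used
            self.items[name] = size
            self.used += size

    cache = LruCache(capacity_bytes=1_500_000)
    cache.access("object_n", 500_000)            # becomes least recently used
    cache.access("object_n_minus_1", 1_000_000)
    cache.access("new_object", 1_000_000)        # evicts both N and N-1,
                                                 # although N-1 alone would do
    print(list(cache.items))                     # ['new_object']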
CHAPTER 10
Replication Affinity
Based on the insights about least recently used [Ath00] and least frequently used [Don99] cache replacement algorithms, a metric called replication affinity has been developed. This chapter shows that by using replication affinity an optimum among the following constraints can be found:

(1) Video data may be very large, so replication must not cost more than it gains.
(2) A surrogate with good network connections to later clients is an attractive target for replicas; however, it is important that content on popular targets is not replaced too frequently.
(3) In the case of asymmetric network connections (i.e. high download and modest upload bandwidth), multiple-source streaming can be the best solution for good streaming performance. In such cases a certain number of properly placed replicas is a necessity.
(4) A group of surrogates sharing videos of the same genre has to reach a high level of reuse of these videos.
(5) Full content replacement decreases the hit rate; partial replacement of multiple videos is a more efficient strategy to make the best possible use of the available storage space.
(6) Wrong replacement decisions have to be smoothed out by other surrogate servers that are able to stream the content at a similar quality.

Moreover, video data can usually be regarded as immutable, so the question of replication consistency does not need to be handled. The six listed constraints are combined into the replication affinity metric presented in the next section. In order to consider the constraints between the closeness values, a dependency graph is generated for each decision. With the help of a heuristics-based
A∗ algorithm, the best replication decision can be found with a minimum number of comparisons.
10.1 The Replication Affinity Metric
Replication Affinity is the weighted sum of two measures called Placement Affinity (PA) (Section 10.1.1) and Reallocation Affinity (RA) (Section 10.1.2). Placement Affinity combines the three measures (1) genre closeness, (2) replication-time closeness and (3) delivery closeness (Section 10.1.1). Reallocation Affinity combines the three measures (1) replacement closeness, (2) distribution closeness and (3) granularity closeness. A logical view can be found in Figure 10.1.

Figure 10.1: Replication Affinity (logical view)
10.1.1 Placement Affinity (PA)
Placement Affinity is used for selecting between potential replica destinations. The destinations are ranked according to their ability to deliver the content at the desired quality. The ranking considers (1) the existence of identical or similar content in the group, (2) the download bandwidth (the link from the original server to the
media proxy, see Figure 10.2) and (3) the upload bandwidth (the link from the media proxy to the end-client, see Figure 10.2) of the destination proxy.

Figure 10.2: Up- and DownLink

Replication closeness is calculated as the weighted sum of genre closeness (GC), time closeness (TC) and delivery closeness (DC):

\[
\mathrm{ReplicationCloseness} = \frac{1}{3}\,GC + \frac{1}{3}\,TC + \frac{1}{3}\,DC
\]

Genre Closeness (GC)

Genre closeness is used for ranking alternative destination groups according to their fraction of (1) similar or (2) identical content. Two videos are considered similar if they belong to the same genre; if they additionally have the same content, resolution, bit rate and playback time, they are considered identical. What is the gain of replicating to a group with similar or identical content? Replicating to a group with similar content increases the chance that proxies and end-clients are located nearby, and content delivery between nearby proxies has a good probability of resulting in low delay, jitter and loss. The closeness to a group with similar content is calculated as:

\[
GC = \frac{count(\mathrm{SimilarStreams}_{Group})}{count(\mathrm{Streams}_{Group})} \tag{10.1}
\]
where count(SimilarStreams_Group) is the number of similar streams and count(Streams_Group) is the total number of streams in the examined group. Having identical content on multiple proxies allows multiple source streaming, which can be used to balance the network load between alternative paths. The closeness to a group with identical content is calculated as:

\[
GC = \frac{count(\mathrm{IdenticalStreams}_{Group})}{count(\mathrm{Streams}_{Group})} \tag{10.2}
\]
where count(IdenticalStreams_Group) is the number of identical streams and count(Streams_Group) is the total number of streams in the examined group. In both cases Genre Closeness (GC) takes values between 0 and 1; a group that shares neither identical nor similar content has a Genre Closeness value of 0.
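Equations 10.1 and 10.2 can be evaluated as in the following sketch; representing a stream as a (genre, content identifier) tuple is a simplification, since identity additionally requires equal resolution, bit rate and playback time:

    # Sketch of genre closeness for a candidate destination group
    # (Equations 10.1 and 10.2).
    def genre_closeness(streams, reference, identical=False):
        """streams: list of (genre, content_id) tuples in the group;
        reference: (genre, content_id) of the video to replicate."""
        if identical:
            matches = sum(1 for s in streams if s == reference)
        else:
            matches = sum(1 for s in streams if s[0] == reference[0])
        return matches / len(streams) if streams else 0.0

    group = [("news", 1), ("news", 2), ("sports", 3), ("news", 2)]
    print(genre_closeness(group, ("news", 2)))                  # similar: 0.75
    print(genre_closeness(group, ("news", 2), identical=True))  # identical: 0.5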
Replication-time Closeness (RTC)

Replication-time closeness is used to rank potential destination proxies according to the time required to replicate the content: the shorter the replication time, the sooner the content can be served from the proxy cache. The replication time depends on the content size and the network state; the network state is determined by measuring the download bandwidth from the original server to the proxy (see Figure 10.2). Replication-time closeness (RTC) is calculated as:

\[
RTC = \min\left(1,\ \frac{1}{\frac{\mathrm{ContentSize}}{\mathrm{DownloadBW}} \cdot \frac{1}{\mathrm{PlaybackTime}}}\right) \tag{10.3}
\]
where ContentSize is the number of bytes to replicate, DownloadBW is the estimated bandwidth in bits per second between the original server and the proxy, and PlaybackTime is the playback duration of the video measured in seconds. The ratio ContentSize/DownloadBW is the ReplicationTime. As long as ReplicationTime ...

APPENDIX A

Sourcecode

/** ... toDo: add later **/
void newFile( in string fileName, in streamType type )
    raises (FileAlreadyExists);
/**
 * return a reference to a file stored on this proxy
 * @fileName name of the file
 * @return reference to the file
 **/
MediaFile getFile( in string fileName ) raises (NoSuchFile);

/**
 * moves a file from this proxy to a destination proxy
 * @fileName name of the file to move
 * @proxyName identity of the destination proxy
 **/
void moveFile( in string fileName, in string proxyName )
    raises (NoSuchFile);

/**
 * removes a file, identified by its filename, from this proxy
 * @fileName name of the file that is removed
 **/
void deleteFile( in string fileName ) raises (NoSuchFile);

/**
 * returns a list with the filenames stored on this proxy
 * @return list with filenames stored on this proxy
 **/
strings listFileNames() raises (NoFilesAvailable);

/**
 * returns the number of media category entries
 * @category media category a file can belong to
 * @return number of entries on this proxy
 **/
long getCategoryEntries( in MediaCategory category );

/**
 * returns a list containing the media categories available on this proxy
 * @return media categories
 **/
MediaCategories getMediaCategories();

/**
 * returns the proxy's upload bandwidth
 * @return upload bandwidth
 **/
long getUploadBandwidth();

/**
 * returns a reference to the ConnectionManager
 * @return ConnectionManager
 **/
ConnectionManager getConnectionManager();
}; //Proxy

interface MediaFile
{
/**
 * set the filename
 * @fileName name
 **/
void setFilename( in string fileName );

/**
 * return the filename
 * @return fileName
 **/
string getFilename();

/**
 * demultiplex the file, check that the file is multiplexed
 **/
void demultiplex() raises (FileAlreadyExists, NoSuchFile);

/**
 * return true if the file contains audio and video streams,
 * false otherwise
 * @return true/false
 **/
boolean isDemultiplexed();

/**
 * add a new ElementaryStream to this file
 * @streamType type of the stream (audioStream, videoStream)
 **/
void newStream( in streamType type ) raises (StreamTypeExists);

/**
 * moves a stream from this proxy to a destination proxy
 * @streamType identity of the stream to move
 * @proxyName identity of the destination proxy
 **/
void moveStream( in streamType type, in string proxyName )
    raises (StreamTypeNotExists, ProxyNameNotExists, FileAlreadyExists);

/**
 * returns the reference to an elementary stream
 * @return reference to the stream
 **/
Stream getStream( in streamType type ) raises (StreamTypeNotExists);

/**
 * removes a stream, identified by its stream type, from this file
 * @streamType type of the stream that is removed
 **/
void deleteStream( in streamType type ) raises (StreamTypeNotExists);

/**
 * returns a list containing all stream types of the current file
 * @return stream types
 **/
streamTypes listStreamTypes();
}; //MediaFile
interface Stream
{
/**
 * set the stream type
 * @type audioStream/videoStream
 **/
void setStreamType( in streamType type );

/**
 * return the stream type
 * @return audioStream/videoStream
 **/
streamType getStreamType();

/**
 * add additional frames to a stream
 * @dataBytes stream data
 * @size number of transmitted bytes
 **/
void addData( in bytes dataBytes, in long size );

/**
 * return the data bytes from the stream
 * @dataBytes dataBytes read from the stream
 * @size maximal buffer size
 * @return number of bytes returned
 **/
long getData( out bytes dataBytes, in long size );

/**
 * moves a group of frames from this proxy to another proxy
 * @sourceFileName name of the source file
 * @sourceStreamType type of the source stream
 * @startIndex start index of the frame group
 * @endIndex end index of the frame group
 * @destProxyName name of the destination proxy
 * @destFileName name of the destination file on the destination proxy
 **/
void moveFrames( in string sourceFileName, in streamType sourceStreamType,
                 in long startIndex, in long endIndex,
                 in string destProxyName, in string destFileName );

/**
 * removes a group of frames from a stream of a file
 * @fileName identity of the file
 * @type type of the stream from which the frames are removed
 * @startIndex start of the frame group
 * @endIndex end of the frame group
 **/
void deleteFrames( in string fileName, in streamType type,
                   in long startIndex, in long endIndex );
}; //Stream
/** Connection between the proxies **/
interface Connection
{
/** bandwidth value of the connection **/
double getBandwidth();
/** jitter value of the connection **/
double getJitter();
/** loss value of the connection **/
double getLoss();
};
typedef sequence<Connection> Connections;
interface ConnectionManager
{
/**
 * initialize the bandwidth measurement and start the timer
 * @sourceProxy proxy that started the bandwidth measurement
 **/
void startMeasurement( in string sourceProxy );

/**
 * dummy data that is received to calculate the bandwidth between the
 * source proxy and this proxy; the dummy data is not stored
 * @sourceProxy proxy that started the bandwidth measurement
 **/
void receiveMeasurementData( in string sourceProxy, in bytes dataBytes,
                             in long size );

/**
 * terminate the measurement and stop the timer
 * @sourceProxy proxy that started the bandwidth measurement
 **/
long endMeasurement( in string sourceProxy );

/**
 * creates a new connection between two proxies belonging to the same group
 * @proxyName1 unique name of proxy 1
 * @proxyName2 unique name of proxy 2
 **/
void newConnection( in string proxyName1, in string proxyName2 )
    raises (ConnectionExists);

/**
 * deletes a connection in case a proxy is removed
 * @proxyName1 unique name of proxy 1
 * @proxyName2 unique name of proxy 2
 **/
void deleteConnection( in string proxyName1, in string proxyName2 )
    raises (NoSuchConnection);

/**
 * lists all connections
 **/
strings listConnections() raises (NoConnectionsAvailable);

/**
 * get the connection between two proxies identified by their names
 * @proxyName1 unique name of proxy 1
 * @proxyName2 unique name of proxy 2
 **/
Connection getProxyConnection( in string proxyName1, in string proxyName2 )
    raises (NoSuchConnection);
}; //ConnectionManager
};
Bibliography

[Abb03] Abbie Barbir, Brad Cain, Raj Nair and Oliver Spatscheck. Known Content Network Request-Routing Mechanisms. RFC 3568, 2003.

[Ane01] Anees Shaikh, Renu Tewari and Mukesh Agrawal. On the effectiveness of DNS-based server selection. In INFOCOM 2001: Twentieth Annual Joint Conference of the IEEE Computer and Communications Societies, 3:1801–1810, 2001.

[Ath00] Athena Vakali. LRU-based algorithms for web cache replacement. In Proceedings of the First International Conference on Electronic Commerce and Web Technologies, pages 409–418, 2000.

[Bal01] Balachander Krishnamurthy, Craig Wills and Yin Zhang. On the use and performance of content distribution networks. In IMW '01: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, pages 169–182, New York, NY, USA, 2001. ACM Press.

[Chr04a] Christian Spielvogel and Laszlo Böszörmenyi. An alternative way of providing QoS without support from the network. Technical Report TR/ITEC/04/2.07, University Klagenfurt, http://www-itec.uni-klu.ac.at/~laszlo/publications/index.html, 2004.

[Chr04b] Christian Spielvogel, Laszlo Böszörmenyi and Roland Tusch. Good enough Predictive QoS. Technical Report TR/ITEC/04/2.14, University Klagenfurt, http://www-itec.uni-klu.ac.at/~laszlo/publications/index.html, 2004.

[Col01] Colin Perkins, Orion Hodson and Vicky Hardman. A survey of packet loss recovery techniques for streaming audio. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001.

[Dap01] Dapeng Wu, Yiwei Hou and Ya-Qin Zhang. Scalable video coding and transport over broadband wireless networks. Proceedings of the IEEE, 89:6–20, January 2001.

[Don99] Donghee Lee, Jongmoo Choi, Jong-Hun Kim, Sam H. Noh, Sang Lyul Min, Yookun Cho and Chong Sang Kim. On the existence of a spectrum of policies that subsumes the least recently used (LRU) and least frequently used (LFU) policies. In SIGMETRICS '99: Proceedings of the 1999 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 134–143, New York, NY, USA, 1999. ACM Press.

[Eyt00] Eytan Adar and Bernardo A. Huberman. Free riding on Gnutella. First Monday, 5(10), 2000.

[Geo06] George Pallis and Athena Vakali. Insight and perspectives for content delivery networks. Communications of the ACM, 49(1):101–106, 2006.

[Hoo04] Choon Hoong Ding, Sarana Nutanong and Rajkumar Buyya. P2P Networks for Content Sharing. ArXiv Computer Science e-prints, February 2004.

[Hua03] Huahui Wu, Mark Claypool and Robert Kinicki. A model for MPEG with forward error correction and TCP-friendly bandwidth. In NOSSDAV '03: Proceedings of the 13th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 122–130, 2003.

[Ion01] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 ACM SIGCOMM Conference, pages 149–160, 2001.

[Jar06] Jari Korhonen, Yicheng Huang and Ye Wang. Generic forward error correction of short frames for IP streaming applications. Multimedia Tools and Applications, 29(3):305–323, June 2006.

[Jil98] Jill M. Boyce and Robert D. Gaglianello. Packet loss effects on MPEG video sent over the public Internet. In MULTIMEDIA '98: Proceedings of the Sixth ACM International Conference on Multimedia, pages 181–190, New York, NY, USA, 1998. ACM Press.

[Jir03] Jirka Klaue, Berthold Rathke and Adam Wolisz. EvalVid - a framework for video transmission and quality evaluation. In Computer Performance Evaluation/Tools, pages 255–272, 2003.

[Jit98] Jitendra Padhye, Victor Firoiu, Don Towsley and Jim Kurose. Modeling TCP throughput: A simple model and its empirical validation. In SIGCOMM '98: Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures and Protocols for Computer Communication, pages 303–314, New York, NY, USA, 1998. ACM Press.

[Jit01] Jitian Xiao and Yanchun Zhang. Clustering of web users using session-based similarity measures. In ICCNMC '01: Proceedings of the 2001 International Conference on Computer Networks and Mobile Computing, page 223, Washington, DC, USA, 2001. IEEE Computer Society.

[Joh02] John Apostolopoulos, Tina Wong, Wai-tian Tan and Susie Wee. On multiple description streaming with content delivery networks. In INFOCOM 2002: Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, 3:1736–1745, 2002.

[Jun05] Jun Shi, Jian Liang and Jinyuan You. Measurements and Understanding of the KaZaA P2P Network. Springer Berlin Heidelberg, 2005.

[Jus01] Jussi Kangasharju, Keith Ross and James Roberts. Performance evaluation of redirection schemes in content distribution networks. Computer Communications, 24(2):207–214, 2001.

[Kon04] L. Kontothanassis, R. Sitaraman, J. Wein, D. Hong, R. Kleinberg, B. Mancuso, D. Shaw and D. Stodolsky. A transport layer for live streaming in a content delivery network. Proceedings of the IEEE, 92(9):1408–1419, September 2004.

[KR00] James Kurose and Keith Ross. Computer Networking: A Top-Down Approach Featuring the Internet. Addison-Wesley, July 2000.

[M. 00] M. Cieslak, D. Foster, G. Tiwana and R. Wilson. Web Cache Coordination Protocol v2.0, 2000.

[Mar03] Mark Claypool and Yali Zhu. Using interleaving to ameliorate the effects of packet loss in a video stream. In ICDCSW '03: Proceedings of the 23rd International Conference on Distributed Computing Systems Workshops, page 508, Washington, DC, USA, 2003. IEEE Computer Society.

[Mat04] Matei Ripeanu, Mic Bowman, Jeffrey Chase, Ian Foster and Milan Milenkovic. Globus and PlanetLab resource management solutions compared. In HPDC '04: Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing, pages 246–255, Washington, DC, USA, 2004. IEEE Computer Society.

[Mol06] Molina Moreno, Palau Salvador, Esteve Domingo, Alonso Pena and Ruiz Extremera. On content delivery network implementation. Computer Communications, 29(12):2396–2412, 2006. Elsevier Science.

[NS201] The Network Simulator ns-2 (v2.1b8a). http://www.ns-2.com, October 2001.

[Pos97] J. Postel. Internet Control Message Protocol. IETF Request For Comments (RFC) 792, September 1981. URL: http://www.ietf.org/rfc/rfc792.txt.

[Qi 03] Qi He, Mostafa Ammar, George Riley, Himanshu Raj and Richard Fujimoto. Mapping peer behavior to packet-level details: A framework for packet-level simulation of peer-to-peer systems. In MASCOTS 2003: 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pages 71–78, 2003.

[Ric98] Rich Wolski. Dynamically forecasting network performance using the network weather service. Cluster Computing, 1(1):119–132, 1998.

[Ric04] Riccardo Bernardini, Marco Durigon, Roberto Rinaldo, Luca Celetto and Andrea Vitali. Polyphase spatial subsampling multiple description coding of video streams with H.264. In ICIP '04: International Conference on Image Processing, 5:3213–3216, October 2004.

[Ste03] Stefan Podlipnig and Laszlo Böszörmenyi. A survey of web cache replacement strategies. ACM Computing Surveys, 35(4):374–398, December 2003.

[Ste04] Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A survey of peer-to-peer content distribution technologies. ACM Computing Surveys, 36(4):335–371, December 2004.

[Ste06] Stefan Perauer. Request routing and multiple-source streaming in a proxy-to-proxy network. Master's thesis, University Klagenfurt, November 2006.

[Syl01] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp and Scott Schenker. A scalable content-addressable network. In SIGCOMM '01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 161–172, New York, NY, USA, 2001. ACM Press.

[Ver97] Vern Paxson. End-to-end Internet packet dynamics. In Proceedings of the ACM SIGCOMM '97 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, volume 27(4) of Computer Communication Review, pages 139–154, Cannes, France, September 1997. ACM Press.

[Vit05] Andrea Vitali. Multiple description coding. ST Journal of Research - Networked Multimedia, 2(1):83–92, October 2005.

[Wan99] Jia Wang. A survey of web caching schemes for the Internet. ACM SIGCOMM Computer Communication Review, 29(5):36–46, 1999.

[WC97] D. Wessels and K. Claffy. Internet Cache Protocol (ICP), version 2, 1997.

[Win93] P. H. Winston. Artificial Intelligence. Addison-Wesley, 1993.

[Xux03] Xuxian Jiang, Yu Dong, Dongyan Xu and Bharat Bhargava. GnuStream: A P2P media streaming system prototype. In Proceedings of the International Conference on Multimedia and Expo (ICME), 2:325–328, 2003.

[Yan03a] Yan Chen, Lili Qiu, Weiyu Chen, Luan Nguyen and Randy H. Katz. Efficient and adaptive web replication using content clustering. IEEE Journal on Selected Areas in Communications, 21(6), August 2003.

[Yan03b] Yang Guo, Kyoungwon Suh, Jim Kurose and Don Towsley. A peer-to-peer on-demand streaming service and its performance evaluation. In ICME '03: Proceedings of the 2003 International Conference on Multimedia and Expo, pages 649–652, Washington, DC, USA, 2003. IEEE Computer Society.

[Yao05] Yao Wang, Amy Reibman and Shunan Lin. Multiple description coding for video delivery. Proceedings of the IEEE, 93(1):57–70, 2005.

[YUV] Akio - YUV test stream. http://www.tkn.tu-berlin.de/research/evalvid/cif.html.

[Zhu02] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck and Jia Wang. A precise and efficient evaluation of the proximity between web clients and their local DNS servers, pages 229–242, 2002.