Optimizing the Throughput of Data-Driven based Streaming in Heterogeneous Overlay Network

Meng Zhang (1), Chunxiao Chen (1), Yongqiang Xiong (2), Qian Zhang (3), and Shiqiang Yang (1)

(1) Dept. of Computer Sci. & Tech., Tsinghua Univ., Beijing 100084, China, {zhangmeng00,chencx05}@mails.tsinghua.edu.cn, [email protected]
(2) Microsoft Research Asia, Beijing 100080, China, [email protected]
(3) Dept. of Computer Sci., Hong Kong Univ. of Sci. and Tech., Hong Kong, China, [email protected]

Abstract. Recently, much attention has been paid to data-driven (or swarm-like) live streaming systems due to their rapid growth in deployment over the Internet. In such systems, nodes randomly select their neighbors to form an unstructured overlay mesh (gossip-style overlay construction), and then each node requests desired data blocks from its neighbors (block scheduling). To improve performance, most existing works focus on the gossip-style overlay construction issue; few concentrate on optimizing the block scheduling to improve the throughput of a constructed overlay, especially in a heterogeneous environment. In this paper, we propose a scheme to optimize the throughput of data-driven streaming systems in heterogeneous overlay networks. We first model the block scheduling problem as a classical min-cost flow problem and thereby derive a globally optimal solution. Based on this idea, we then propose DONLE, a fully distributed asynchronous scheduling algorithm. Simulation results verify that DONLE is superior to a number of conventional strategies.

1 Introduction

As the most promising alternative to IP multicast, overlay multicast, especially multicast through peer-to-peer (P2P) networks, has attracted a lot of attention during the past decade. One of the most important applications of overlay multicast is streaming live media content to a huge population of end users through the Internet, also known as peer-to-peer streaming. Many measurement studies of P2P overlay networks reveal that the bottleneck bandwidth between end hosts is extremely heterogeneous. To deal with heterogeneity in streaming multicast applications, numerous solutions have been proposed for both IP multicast [1] and overlay multicast [2, 3]. Their basic approach is to encode the source video into multiple layers, with each receiver subscribing to an appropriate number of layers according to its bandwidth capacity.


Meng ZHANG et al.

Recently, a new category of overlay streaming multicast protocols called data-driven protocols (or swarm-like protocols) [4–7], targeting non-interactive streaming multicast applications, has been proposed. Unlike conventional tree-based approaches, in a data-driven protocol each node randomly selects some nodes as its neighbors so that an unstructured network is formed. This step is usually called gossip-style overlay construction (or membership management). The next step, named block scheduling, is also intuitive: the live media content is divided into blocks (or segments, packets), and every node announces to its neighbors which blocks it holds. Each node then explicitly requests the blocks of interest from its neighbors according to their announcements. In this respect it is similar to the BitTorrent protocol [8]. Systematic studies such as [9] show that the data-driven approach outperforms the tree-based approach under many conditions, especially under a high client churn rate. Meanwhile, data-driven streaming systems have emerged and been rapidly deployed over the Internet [4, 10, 11] in the past two years.

Given the significance of data-driven streaming protocols, it is important to study how to improve their throughput, especially in heterogeneous networks. Most existing works on P2P streaming with layered coding [2, 3] use stream-level scheduling. However, scheduling in a data-driven protocol is more fine-grained because it operates at the block level. This leads to the challenge: how should a node decide which block of which layer to fetch from which neighbor under heterogeneous bandwidth constraints? In our previous work [12], we studied optimal scheduling in a homogeneous environment. In this paper, we propose DONLE, a Data-driven Overlay Network algorithm using LayEred coding, to handle the heterogeneity.

We first state the basic block scheduling problem, then model it as a classical min-cost flow problem and give a globally optimal block scheduling solution. After that, we propose DONLE, a fully distributed algorithm that performs locally optimal block scheduling at each node. Simulation results show that DONLE is superior to a number of recently proposed scheduling strategies under the same overlay topology.

The remainder of this paper is organized as follows. In Section 2 we briefly review related work. In Section 3, we state the block scheduling problem in detail and formulate it. Next, in Section 4, we model this scheduling problem as an equivalent min-cost flow problem and derive the global optimal scheduling algorithm. Section 5 presents the proposed distributed asynchronous algorithm DONLE. The performance of DONLE is evaluated in Section 6. We conclude this paper in Section 7.

2 Related Work

There is a wealth of research on improving overlay multicast throughput. Early work in this area mainly focused on how to construct single or multiple application-layer trees. LION [3] employs a stream-level multi-path method to improve the throughput of the overlay network using network coding. Recently, a new category of overlay multicast protocols, data-driven (or swarm-like) protocols, has been proposed [4–7]. Among these protocols, PALS [7] is an adaptive streaming mechanism from multiple senders to a single receiver using layered coding, which is in effect a swarm-like (or data-driven) protocol. PALS mainly focuses on coping with network dynamics such as bandwidth variations and sender participation, and it evaluates its performance in great detail for the scenario of streaming from multiple senders to a single receiver. Yet it does not examine the performance of the data-driven protocol over an overlay mesh and does not aim to improve the throughput of data-driven streaming. Besides, much recent work has been done to improve gossip-style overlay construction for various purposes [13, 14]. However, few works address how to maximize the throughput of data-driven streaming in a constructed heterogeneous overlay mesh.

Lecture Notes in Computer Science

3 Block Scheduling: Problem Statement And Formulation

In this section, we first explain intuitively what we optimize in data-driven streaming and then formulate the problem. Our basic approach is comprehensive. We define a priority for every desired block of each node according to the block's importance, such as its layer and its rarity. Our goal is to maximize the average priority sum of all streaming blocks that are delivered to each node in one request period under heterogeneous bandwidth constraints.

3.1 Block Scheduling Problem

The idea of a DON-based streaming system is similar to the BitTorrent protocol [8]. In our protocol, each node independently finds its neighbors in the overlay so that an unstructured random overlay mesh is formed. The media stream is encoded with layered coding, and every layer is divided into blocks of the same size, each of which has a unique sequence number. Every node has a sliding window that contains all the up-to-date blocks on the node and moves forward continuously at the streaming rate. We call the front part of the sliding window the exchanging window. The blocks in the exchanging window are those that have not yet reached their playback deadline, and only these blocks are requested if they have not been received. Unavailable blocks past their playback deadline are no longer requested. Every node periodically pushes to all its neighbors a bit vector called the buffer map, in which each bit represents the availability of a block in its sliding window, to announce which blocks it holds. Based on the neighbors' announcements, each node periodically sends requests to its neighbors for the desired blocks in its exchanging window. We call the time between two requests a request period (or period for short, typically 1∼6 sec). At the beginning of each request period, each node decides which blocks to ask for from which neighbor. If a requested block has not arrived after a while and is still in the exchanging window, it is requested again in the following period.

In layered video coding, the video is encoded into a base layer and several enhancement layers, and a higher layer can only be decoded if all lower layers are available; this is the block dependency. So in our algorithm, blocks in a lower layer always have higher priority than those in an upper layer.

Table 1. Notations

Notation                              Description
$N$                                   Set of all nodes in the overlay except the source node 0
$L$                                   Number of encoded layers
$r_l$, $l = 1, \dots, L$              The cumulative rate from layer 1 to layer $l$, in blocks per second
$I_i$, $O_i$, $i = 0, \dots, |N|$     The inbound and outbound bandwidth capacity of node $i$
$E_{ik}$, $i, k = 0, \dots, |N|$      The maximum end-to-end bandwidth from node $i$ to $k$
$h_{ij} \in \{0, 1\}$                 $h_{ij} = 1$ denotes that node $i$ holds block $j$; $h_{ij} = 0$ otherwise
$NBR_i$                               Set of neighbors of node $i$
$\tau$                                The request period
$\pi_j^i$                             The priority of block $j$ for node $i$
$W_T$                                 The exchanging window size, scaled by time
$C_i$                                 The current clock time at node $i$
$d_j^i$                               Playout deadline of block $j$ at node $i$
$D_i$                                 Set of all desired blocks in the exchanging window of node $i$
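The buffer-map exchange described in Section 3.1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it covers a single layer, the class and function names (`NodeBuffer`, `holders_of`) are hypothetical, and it assumes that the "front part" of the sliding window means the highest sequence numbers.

```python
from dataclasses import dataclass, field


@dataclass
class NodeBuffer:
    """Hypothetical per-node buffer state for one layer (names are illustrative)."""
    window_start: int      # sequence number of the oldest block in the sliding window
    window_size: int       # sliding-window length, in blocks
    exchange_size: int     # exchanging-window length, in blocks
    held: set = field(default_factory=set)   # sequence numbers currently held

    def buffer_map(self):
        """One bit per block of the sliding window: 1 if held, 0 otherwise."""
        return [1 if self.window_start + off in self.held else 0
                for off in range(self.window_size)]

    def exchanging_window(self):
        """Assumption: the exchanging window is the newest part of the sliding window."""
        end = self.window_start + self.window_size
        return range(end - self.exchange_size, end)

    def desired_blocks(self):
        """Blocks in the exchanging window that are still missing (the set D_i)."""
        return [seq for seq in self.exchanging_window() if seq not in self.held]


def holders_of(desired, neighbor_maps):
    """Map each desired block to the neighbors whose buffer map announces it.

    neighbor_maps: neighbor id -> (window_start, list of bits).
    """
    result = {}
    for seq in desired:
        result[seq] = [
            nid for nid, (start, bits) in neighbor_maps.items()
            if 0 <= seq - start < len(bits) and bits[seq - start] == 1
        ]
    return result
```

A node would call `desired_blocks()` at the start of each request period and pass the result, together with the most recent buffer maps of its neighbors, to `holders_of` to learn which neighbor can supply which block.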

3.2 Problem Formulation

In this section, we give the formulation of the block scheduling problem (BSP for short). As mentioned above, we try to maximize the average priority sum of all streaming blocks that are delivered to each node in one request period under heterogeneous bandwidth constraints. We consider two types of bandwidth constraints, namely access bottlenecks (inbound and outbound bandwidth capacity) and non-access bottlenecks (maximum end-to-end available bandwidth). As all blocks have the same size, we use blocks per second to express the inbound, outbound, and end-to-end available bandwidth. Table 1 summarizes the notation used in the rest of this paper.

Block Priority Definition. We give each block a priority according to its importance for a specified node. Two key factors that affect the block importance are considered here: the layer factor and the rarity factor. Since in layered coding an upper layer can be decoded only if the lower layers are available, we should ensure that blocks of a lower layer have higher priority. Besides, many previous works such as [15] demonstrate that requesting the block with the fewest holders first brings more diversity to the system and helps the block spread more rapidly. The priority value of block $j$ for node $i$ is:

$$\pi_j^i = \beta\,\Pi_R\Big(\sum_{k \in NBR_i} h_{kj}\Big) + (1-\beta)\,\theta\,\Pi_L(\lambda_j), \quad \text{where } \beta = (d_j^i - C_i)/W_T \tag{1}$$
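As a rough illustration of Eq. (1), the sketch below computes a block priority from the number of holders, the layer, and the position in the exchanging window. The concrete choices here are assumptions made only for the example: the paper requires the rarity function and the layer function to be decreasing but does not give their form, so `1 / (1 + holders)` and `base ** (-layer)` are placeholders, as are the default values of `theta` and `base` and the function name `block_priority`.

```python
def block_priority(num_holders, layer, deadline, now, window,
                   theta=1e6, base=100.0):
    """Priority of one block for one node, following the shape of Eq. (1).

    num_holders : number of neighbors that hold the block
    layer       : layer index of the block (1 = base layer)
    deadline    : playback deadline of the block at this node
    now         : current clock time at this node
    window      : exchanging-window size, scaled by time
    """
    # beta is the relative position in the exchanging window (0 = at the deadline).
    beta = (deadline - now) / window
    beta = min(1.0, max(0.0, beta))

    # Assumed rarity term: decreases as more neighbors hold the block.
    rarity = 1.0 / (1 + num_holders)

    # Assumed layer term: drops sharply with each additional layer.
    layer_term = base ** (-layer)

    return beta * rarity + (1.0 - beta) * theta * layer_term
```

With these placeholder functions, a base-layer block at its deadline outranks any second-layer block, and of two blocks in the same layer and position the one with fewer holders ranks higher.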

We let both functions $\Pi_R$ and $\Pi_L$ be monotonically decreasing. Function $\Pi_L$ satisfies $\Pi_L(\lambda_j) \gg \Pi_L(\lambda_k)$ when $\lambda_j < \lambda_k$ for any blocks $j$ and $k$, where $\lambda_j$ is the layer of block $j$, so as to guarantee the layer dependency requirement. Parameter $0 \le \beta \le 1$ represents the current position of block $j$ in the exchanging window. We let $\theta$ have a relatively large value. Although our block priority definition is a simple linear combination of the two factors, it guarantees the following key requirements: a) when a lower-layer block is near its playback deadline ($\beta = 0$), it has much higher priority than any upper-layer block (large value of $\theta$); b) a block with fewer holders has higher priority than one with more holders in the same layer and at the same position in the exchanging window.

Formulation. We now formulate the block scheduling problem. We define the decision variable $x_{kj}^i$ to denote whether node $i \in N$ should request block $j \in D_i$ from its neighbor $k \in NBR_i$:

$$x_{kj}^i = \begin{cases} 1, & \text{node } i \text{ should request block } j \text{ from neighbor } k \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

Our target is to maximize the average priority sum of the blocks that each node can receive under heterogeneous bandwidth constraints:

$$
\begin{aligned}
\max\quad & \frac{1}{|N|} \sum_{i \in N} \sum_{j \in D_i} \sum_{k \in NBR_i} \pi_j^i\, h_{kj}\, x_{kj}^i \\
\text{s.t.}\quad
& \text{(a)}\ \sum_{k \in NBR_i} x_{kj}^i \le 1, && \forall i \in N,\ j \in D_i \\
& \text{(b)}\ \sum_{j \in D_i} \sum_{k \in NBR_i} x_{kj}^i \le \tau I_i, && \forall i \in N \\
& \text{(c)}\ \sum_{i \in NBR_k} \sum_{j \in D_i} x_{kj}^i \le \tau O_k, && \forall k \in N \\
& \text{(d)}\ \sum_{j \in D_i} x_{kj}^i \le \tau E_{ki}, && \forall i \in N,\ k \in NBR_i \\
& \text{(e)}\ x_{kj}^i \in \{0, 1\}, && \forall i \in N,\ k \in NBR_i,\ j \in D_i
\end{aligned}
\tag{3}
$$

The formulation is an integer linear program, and we call this optimization problem the global block scheduling problem (or global BSP for short). Constraint (a) ensures that no duplicate blocks are requested. Constraints (b) and (c) guarantee that the number of blocks downloaded by node $i$ and the number of blocks uploaded by node $k$ do not exceed the inbound and outbound bandwidth limits, respectively. Furthermore, constraint (d) ensures that the number of blocks transmitted from node $k$ to node $i$ stays within the end-to-end available bandwidth. Finally, constraint (e) states that the variables are binary.
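To make the constraints of problem (3) concrete, the following sketch checks a candidate schedule against constraints (a) to (d) and evaluates the objective. It is only a checker for small hand-built instances, not a solver, and the data layout (dictionaries keyed by node ids, a set of `(i, k, j)` triples for the nonzero decision variables) and the function name `evaluate_schedule` are assumptions chosen for the example.

```python
from collections import Counter


def evaluate_schedule(x, pi, holds, nbr, desired, tau, I, O, E):
    """Return the objective of problem (3), or None if x is infeasible.

    x       : set of (i, k, j) triples, "node i requests block j from neighbor k"
    pi      : (i, j) -> priority of block j for node i
    holds   : k -> set of blocks held by node k
    nbr     : i -> set of neighbors of node i
    desired : i -> set of desired blocks D_i (its keys are the nodes in N)
    I, O    : node -> inbound / outbound capacity in blocks per second
    E       : (k, i) -> end-to-end capacity from k to i in blocks per second
    """
    # Requests must be addressed to a neighbor and concern a desired block.
    for i, k, j in x:
        if k not in nbr[i] or j not in desired[i]:
            return None

    per_block = Counter((i, j) for i, k, j in x)
    per_receiver = Counter(i for i, k, j in x)
    per_sender = Counter(k for i, k, j in x)
    per_link = Counter((k, i) for i, k, j in x)

    if any(c > 1 for c in per_block.values()):                    # (a)
        return None
    if any(c > tau * I[i] for i, c in per_receiver.items()):      # (b)
        return None
    if any(c > tau * O[k] for k, c in per_sender.items()):        # (c)
        return None
    if any(c > tau * E[link] for link, c in per_link.items()):    # (d)
        return None

    # Only requests sent to an actual holder contribute (the h_kj factor).
    total = sum(pi[(i, j)] for i, k, j in x if j in holds[k])
    return total / len(desired)
```

Enumerating all subsets of candidate triples with this checker would solve tiny instances by brute force, which is useful mainly as a sanity check against a flow-based solver.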

4 Modeling and Global Optimal Solution

In this section, we show that the global BSP (3) can be transformed into an equivalent minimum cost flow problem that can be solved in polynomial time. We call the solution of this min-cost flow problem the global optimal solution. The min-cost flow problem is introduced in [16]. With the double scaling algorithm [16], the time complexity of the min-cost flow problem is bounded by $O(nm(\log\log U)\log(nC))$, where $n$ and $m$ are the numbers of vertices and arcs, while $U$ and $C$ are the largest magnitudes of arc capacity and cost, respectively.

Fig. 1. An example of the equivalent MCFP: (a) a global block scheduling problem; (b) its model as a min-cost flow problem.

Fig. 1(a) and Fig. 1(b) show a sample global BSP with four nodes and its min-cost flow model, respectively. In Fig. 1(b), the two numbers close to an arc represent the capacity and per-unit flow cost of the arc. Rather than describe the general model formally, we merely describe the model ingredients for these figures. In data-driven streaming, we decompose a node into its three roles: a sender, a receiver and a neighbor. We model each sender $k$ as a vertex $s_k$, each receiver $i$ as a vertex $r_i$, and each neighbor $k$ of node $i$ as a vertex $n_{ik}$. Further, we model a desired block $j$ of node $i$ as a vertex $b_{ij}$. Besides, we add two virtual vertices: a source vertex $s$ and a sink vertex $t$. The decision variables for this problem are whether to request block $j$ from neighbor $k$ of node $i$, which we represent by an arc from vertex $n_{ik}$ to vertex $b_{ij}$ whenever block $j$ is desired by node $i$ and neighbor $k$ of node $i$ holds block $j$. These arcs have capacity 1 and per-unit flow cost 0. To avoid duplicate blocks, we add an arc of capacity 1 from $b_{ij}$ to $r_i$ and set its per-unit flow cost to the priority of block $j$ for node $i$ multiplied by the constant $-1/|N|$. To satisfy the outbound bandwidth constraint of node $k$, we add an arc from vertex $s$ to vertex $s_k$ with capacity $\tau O_k$. For the maximum end-to-end available bandwidth from neighbor $k$ to node $i$, we insert an arc from vertex $s_k$ to $n_{ik}$ with capacity $\tau E_{ki}$. Finally, to incorporate the inbound bandwidth constraint of node $i$, we introduce an arc from $r_i$ to $t$ with capacity $\tau I_i$. To guarantee that the maximum number of blocks is delivered, we insert an uncapacitated arc from vertex $t$ to $s$ that has a negative per-unit flow cost with large absolute value. We thus reach the conclusion that solving this min-cost flow problem yields the optimal solution of the global BSP. We omit the proof here.
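The construction above can be written down as a routine that emits the arcs of the flow network. This is a sketch under stated assumptions: the vertex labels (`("snd", k)`, `("nbr", i, k)`, `("blk", i, j)`, `("rcv", i)`), the input layout, and the function name `build_flow_arcs` are hypothetical, and the magnitude chosen for the cost of the return arc (one plus the sum of all priorities) is an arbitrary stand-in for the "large absolute value" mentioned in the text.

```python
def build_flow_arcs(nbr, desired, holds, pi, tau, I, O, E):
    """Return the arcs (tail, head, capacity, unit_cost) of the flow model.

    nbr     : i -> set of neighbors of node i
    desired : i -> set of desired blocks D_i (its keys are the nodes in N)
    holds   : k -> set of blocks held by node k
    pi      : (i, j) -> priority of block j for node i
    I, O    : node -> inbound / outbound capacity in blocks per second
    E       : (k, i) -> end-to-end capacity from k to i in blocks per second
    """
    n_nodes = len(desired)
    arcs = []

    # s -> s_k: outbound capacity of every node that acts as a sender.
    senders = {k for i in nbr for k in nbr[i]}
    for k in sorted(senders):
        arcs.append(("s", ("snd", k), tau * O[k], 0.0))

    for i in sorted(desired):
        # r_i -> t: inbound capacity of receiver i.
        arcs.append((("rcv", i), "t", tau * I[i], 0.0))

        # b_ij -> r_i: at most one copy of each block, cost = -priority / |N|.
        for j in sorted(desired[i]):
            arcs.append((("blk", i, j), ("rcv", i), 1, -pi[(i, j)] / n_nodes))

        for k in sorted(nbr[i]):
            # s_k -> n_ik: end-to-end capacity from neighbor k to node i.
            arcs.append((("snd", k), ("nbr", i, k), tau * E[(k, i)], 0.0))
            # n_ik -> b_ij: only when neighbor k actually holds block j.
            for j in sorted(desired[i]):
                if j in holds[k]:
                    arcs.append((("nbr", i, k), ("blk", i, j), 1, 0.0))

    # t -> s: uncapacitated return arc with a large negative cost (assumed value).
    big = 1.0 + sum(pi.values())
    arcs.append(("t", "s", float("inf"), -big))
    return arcs
```

The resulting arc list can be handed to any min-cost circulation solver; a unit of flow on an arc from `("nbr", i, k)` to `("blk", i, j)` then corresponds to setting the decision variable for that request to 1.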

5 Heuristic Distributed Algorithm - DONLE

In this section, based on the basic idea of the global optimal solution, we present a practical heuristic algorithm, DONLE, which is fully distributed and asynchronous. In DONLE, each node decides which blocks to fetch from which neighbor at the beginning of its request period. As the request period is relatively short (such as 2 seconds), the scheduling algorithm should make its decision as rapidly as possible. So in our heuristic distributed algorithm, we just perform a locally optimal block scheduling on each node based on its current knowledge of block availability among its neighbors. The local optimal block scheduling can also be modeled as a min-cost flow problem: as shown in Fig. 1(b), the sub-problem in each rectangle is exactly the local optimal block scheduling of one node.

However, one problem with local scheduling is that a node does not know the optimal flow amount on the arcs $(s_k, n_{ik})$ ($\le O_k$). In other words, we must estimate a proper upper bound on the bandwidth from each neighbor. For simplicity, we use a purely heuristic way for each node to estimate the maximum rate at which each neighbor can send blocks, based on the historical traffic from that neighbor. More formally, let $Q_{ki}$ denote the estimated maximum rate at which neighbor $k \in NBR_i$ can deliver to node $i$. Of course, $Q_{ki}$ should not exceed $O_k$. Let $g_{ki}^{(p)}$ denote the total number of blocks received by node $i$ from neighbor $k$ in the $p$th period. In each request interval, we use the average traffic received by node $i$ in the previous $P$ periods to estimate $Q_{ki}$ in the $(p+1)$th period:

$$Q_{ki} = \gamma \cdot \Big(\sum_{\omega = p-P+1}^{p} g_{ki}^{(\omega)}\Big) \Big/ (P\tau)$$

Parameter $\gamma$ ($> 1$) is a constant called the aggressive coefficient. We can then perform a local optimal block scheduling, formulated below, and solve it through its equivalent min-cost flow problem in polynomial time. We call it the local BSP:

$$
\begin{aligned}
\max\quad & \sum_{j \in D_i} \sum_{k \in NBR_i} \pi_j^i\, h_{kj}\, x_{kj}^i \\
\text{s.t.}\quad
& \text{(a)}\ \sum_{k \in NBR_i} x_{kj}^i \le 1, && \forall j \in D_i \\
& \text{(b)}\ \sum_{j \in D_i} \sum_{k \in NBR_i} x_{kj}^i \le \tau I_i, && \forall i \in N \\
& \text{(c)}\ \sum_{j \in D_i} x_{kj}^i \le \tau Q_{ki}, && \forall i \in N,\ k \in NBR_i \\
& \text{(d)}\ x_{kj}^i \in \{0, 1\}, && \forall k \in NBR_i,\ j \in D_i
\end{aligned}
\tag{4}
$$

Our distributed algorithm is heuristic; we examine its performance and the gap between DONLE and the global optimal solution by simulation in Section 6.
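A minimal, self-contained sketch of one node's local scheduling step follows: it estimates the per-neighbor rate from recent history and then solves the local BSP as a min-cost flow. Several points are assumptions rather than the authors' method: the solver is a plain successive-shortest-path routine with Bellman-Ford (the paper cites the double scaling algorithm instead), augmenting until no path remains plays the role of the large negative return arc, capacities are rounded down to integers, and all function names and the input layout are hypothetical.

```python
def estimate_rate(history, gamma, tau):
    """Estimated rate Q_ki: gamma times the average rate over the last P periods."""
    return gamma * sum(history) / (len(history) * tau)


def min_cost_flow(n, edges, s, t):
    """Successive shortest paths on integer capacities.

    edges is a list of (tail, head, capacity, unit_cost); the result is the
    flow routed on each input edge, in the same order.
    """
    adj = [[] for _ in range(n)]
    head, cap, cost = [], [], []
    for u, v, c, w in edges:          # arc 2a is forward, arc 2a + 1 its reverse
        adj[u].append(len(head)); head.append(v); cap.append(c); cost.append(w)
        adj[v].append(len(head)); head.append(u); cap.append(0); cost.append(-w)
    while True:
        dist = [float("inf")] * n
        pred = [-1] * n               # residual arc used to reach each vertex
        dist[s] = 0.0
        for _ in range(n - 1):        # Bellman-Ford handles the negative costs
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]] - 1e-12:
                        dist[head[e]] = dist[u] + cost[e]
                        pred[head[e]] = e
                        changed = True
            if not changed:
                break
        if pred[t] == -1:             # no augmenting path is left
            break
        v = t
        while v != s:                 # push one unit of flow along the path
            e = pred[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = head[e ^ 1]
    return [cap[2 * a + 1] for a in range(len(edges))]


def local_schedule(pi, holders, q, inbound, tau):
    """Solve one node's local BSP; returns block -> neighbor to request it from.

    pi      : block -> priority for this node
    holders : block -> list of neighbors that hold it
    q       : neighbor -> estimated rate Q_ki (blocks per second)
    inbound : inbound capacity I_i (blocks per second)
    """
    blocks, nbrs = sorted(pi), sorted(q)
    S, R, T = 0, 1, 2
    nid = {k: 3 + a for a, k in enumerate(nbrs)}
    bid = {j: 3 + len(nbrs) + b for b, j in enumerate(blocks)}
    edges = [(R, T, int(tau * inbound), 0.0)]                     # constraint (b)
    edges += [(S, nid[k], int(tau * q[k]), 0.0) for k in nbrs]    # constraint (c)
    edges += [(bid[j], R, 1, -pi[j]) for j in blocks]             # constraint (a)
    pairs = [(k, j) for j in blocks for k in holders.get(j, []) if k in nid]
    edges += [(nid[k], bid[j], 1, 0.0) for k, j in pairs]
    flows = min_cost_flow(3 + len(nbrs) + len(blocks), edges, S, T)
    first = len(edges) - len(pairs)
    return {j: k for idx, (k, j) in enumerate(pairs) if flows[first + idx] == 1}
```

In use, a node would refresh `q` with `estimate_rate` from the block counts of the last few periods and then call `local_schedule` once per request period.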

6 Performance Evaluation

In this section, we compare DONLE to other existing block scheduling strategies and also examine the gap between DONLE and the global optimal solution. Three conventional strategies are compared here:

– Random Strategy: each node assigns each desired block randomly to a neighbor that holds that block. Chainsaw [5] uses this simple strategy. We examine how this method works in layered data-driven streaming.
– Local Rarest First (LRF) Strategy: as described in Section 3, the block that has the fewest holders among the neighbors is requested first. DONet [4] adopts this strategy. We also introduce this method into layered data-driven streaming and compare it with ours.
– Round Robin (RR) Strategy: all the desired blocks are assigned to neighbors in a prescribed order in a round-robin way. If a block is available at only one sender, it is assigned to that sender. Otherwise, it is assigned to the sender that has the maximum surplus available bandwidth.

Fig. 2. Three round robin strategies: (a) Conservative; (b) Aggressive; (c) Tradeoff.

In Fig. 2, we introduce three conventional block ordering schemes used in the literature. Fig. 2(a) shows the conservative block ordering, which always requests blocks of lower layers first. In contrast, the aggressive block ordering scheme preemptively requests the blocks of all layers with the lowest sequence number (or time stamp), as illustrated in Fig. 2(b). Fig. 2(c) uses a zigzag ordering (slope = 1), which is a tradeoff between the two extreme schemes.
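The block orderings used by the baseline strategies can be sketched as simple sort keys. This is one reading of the textual description, not code from any of the cited systems: blocks are represented as hypothetical `(layer, sequence_number)` pairs, the tie-breaking rules are assumptions, and the zigzag (tradeoff) ordering is left out because its exact definition depends on the figure.

```python
def conservative_order(blocks):
    """Lower layers first; within a layer, lower sequence numbers first."""
    return sorted(blocks, key=lambda b: (b[0], b[1]))


def aggressive_order(blocks):
    """Lowest sequence number first across all layers; ties go to the lower layer."""
    return sorted(blocks, key=lambda b: (b[1], b[0]))


def local_rarest_first(blocks, holders):
    """Blocks with the fewest holders among the neighbors are requested first.

    holders maps a block to the list of neighbors that hold it; ties are
    broken by (layer, sequence number), which is an assumption.
    """
    return sorted(blocks, key=lambda b: (len(holders.get(b, [])), b))
```

For example, with blocks `(2, 7)`, `(1, 8)`, `(1, 7)`, `(2, 8)`, the conservative order requests both layer-1 blocks before any layer-2 block, whereas the aggressive order requests both blocks with sequence number 7 before any block with sequence number 8.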

To evaluate performance, we define the delivery ratio of a layer as the number of distinct blocks that arrive at a node before the playback deadline over the total number of blocks encoded in that layer. Since the total number of blocks in a layer is a constant that depends on the encoding and packetization, the average delivery ratio among all nodes can represent the throughput of the overlay. We compare DONLE and the global optimal solution to the following five strategies: random, LRF, RR-conservative, RR-aggressive, and RR-tradeoff. To ensure a fair comparison, all approaches use the same physical network and end-host participants in each scenario. Each curve in the plots is an average over 10 simulation runs. We encode the video into 10 layers, and each layer has a rate of 50 Kbps. To evaluate the quality of a specified layer, we average the delivery ratio of that layer over all nodes whose inbound bandwidth is sufficient to receive that layer. We use 500 nodes in the overlay and set the request period to 2 seconds. Node access bandwidth is asymmetric: the inbound bandwidth is evenly distributed between 15 Kbps and 1 Mbps, while the outbound bandwidth of each node is randomly selected between half of and the full value of its inbound bandwidth. We set the outbound bandwidth of the source node to 2 Mbps. A previous study [17] has shown that there is a sweet range of neighbor count or peer degree (roughly between 6 and 14) where the quality delivered to the majority of peers is high. Therefore, in our simulation, each node randomly selects 14 other nodes as its neighbors. We set the exchanging window to 10 seconds so as to avoid large delay and set the sliding window to 1 minute, aiming to increase the opportunity of serving more neighbors.

Fig. 3. Average delivery ratio at each layer: (a) the bandwidth bottleneck is only at the last mile; (b) the bottleneck is not only at the last mile.

In Fig. 3(a), we compare the global optimal solution and DONLE to the five other strategies. In this figure the bottlenecks are configured to be only at the last mile. The global optimal solution has the best performance, with a delivery ratio of nearly 1 in all layers. This demonstrates that the generated topologies have sufficient capacity for all nodes to receive every layer that their inbound bandwidth allows. The performance of DONLE is also fairly good: the delivery ratio is nearly 1 in most lower layers and above 0.9 in most higher layers. In contrast, although the RR-conservative method has a perfect delivery ratio in layers 1 to 4, its quality drops off sharply from layer 5. This means that all users can enjoy 4 layers of video very smoothly, yet few nodes can receive data beyond the 4th layer even if their inbound bandwidth is sufficient to support higher quality. The reason is that requesting lower layers first leads to poor block diversity among nodes. The curve of the RR-aggressive method, on the other hand, is flat. Most nodes cannot watch even the base layer, although more blocks of higher layers are propagated, because this method does not consider the layer dependency. The RR-tradeoff method is a compromise between the previous two methods; here we use a zigzag ordering with a slope of 1/10 in RR-tradeoff. We found that the LRF strategy achieves a higher delivery ratio than the round-robin schemes, while the random strategy has the poorest performance. As shown in Fig. 3(a), our distributed method DONLE outperforms the other strategies by a considerable margin, with a gain of 10%∼80%. Nevertheless, there is still a gap of about 12% between the global optimal solution and DONLE.

In Fig. 3(b), we investigate the performance of these methods when the bottleneck is not only at the last mile. In this figure, we let the maximum end-to-end available bandwidth be distributed between 10 Kbps and 150 Kbps. All other configurations are unchanged. The delivery ratio of all methods degrades compared with the results obtained when the bottleneck is only at the last mile. The ranking from best to poorest is still DONLE, LRF, the round-robin schemes, and random. We observe that the rarity factor has a significant impact on throughput improvement in data-driven streaming, which is why the LRF strategy performs better than the round-robin and random strategies. Further, DONLE not only considers the rarity factor but also performs a local optimal scheduling that utilizes the local bandwidth capacity as fully as possible, as explained intuitively in Section 3. Hence DONLE outperforms the other strategies.
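The per-layer delivery ratio used in this section can be computed as in the sketch below. It assumes, as in the simulation setup, that all layers have the same rate, so a node is taken to be able to receive layer l when its inbound bandwidth is at least l times the layer rate; the function name and the input layout are hypothetical.

```python
def layer_delivery_ratio(received, total_blocks, inbound_kbps, layer,
                         layer_rate_kbps=50):
    """Average delivery ratio of one layer over the nodes able to receive it.

    received     : node -> {layer -> number of distinct blocks that arrived
                   before their playback deadline}
    total_blocks : number of blocks encoded in the layer
    inbound_kbps : node -> inbound bandwidth in Kbps
    layer        : layer index, starting at 1 for the base layer
    """
    # A node is eligible if its inbound bandwidth covers layers 1..layer.
    eligible = [node for node, bw in inbound_kbps.items()
                if bw >= layer * layer_rate_kbps]
    if not eligible:
        return None   # no node can receive this layer at all

    ratios = [received[node].get(layer, 0) / total_blocks for node in eligible]
    return sum(ratios) / len(ratios)
```

Averaging only over eligible nodes keeps low-bandwidth nodes, which could never decode a high layer, from dragging down that layer's ratio.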

7 Conclusion and Future Work

To improve the throughput of data-driven streaming in heterogeneous networks, we propose a global optimal solution and a distributed algorithm, DONLE. Our simulation results show that DONLE is superior to a number of conventional strategies. For future work, we will study how to maximize the number of blocks delivered over a horizon of several periods, taking into account the inter-dependence between periods. We also plan to run more experiments to examine the parameter sensitivities of our algorithm.

References

1. McCanne, S., Jacobson, V., Vetterli, M.: Receiver-driven layered multicast. In: ACM SIGCOMM 1996. (1996)
2. Cui, Y., Nahrstedt, K.: Layered peer-to-peer streaming. In: NOSSDAV. (2003)
3. Zhao, J., Yang, F., Zhang, Q., Zhang, Z., Zhang, F.: Lion: Layered overlay multicast with network coding. IEEE Trans. on Multimedia (2007) Accepted to be published.
4. Zhang, X., Liu, J., Li, B., Yum, T.S.P.: Coolstreaming/donet: A data-driven overlay network for efficient media streaming. In: IEEE INFOCOM 2005. (2005)
5. Pai, V., Kumar, K., et al: Chainsaw: Eliminating trees from overlay multicast. In: IEEE INFOCOM 2005, Cornell, US (2005)
6. Zhang, M., Zhao, L., Tang, Y., Luo, J., Yang, S.: Large-scale live media streaming over peer-to-peer networks through global internet. In: ACM workshop on Advances in peer-to-peer multimedia streaming (P2PMMS), Singapore (2005) 21–28
7. Agarwal, V., Rejaie, R.: Adaptive multi-source streaming in heterogeneous peer-to-peer networks. In: SPIE/ACM MMCN 2005, San Jose, CA, USA (2005)
8. Cohen, B.: Bittorrent website: http://bitconjuer.com. (2006)
9. Silverston, T., Fourmaux, O.: Source vs data-driven approach for live p2p streaming. In: IEEE International Conference on Networking 2006, Mauritius (2006)
10. GridMedia: http://www.gridmedia.com.cn/. (2006)
11. PPLive: http://www.pplive.com/. (2006)
12. Zhang, M., Xiong, Y., Zhang, Q., Yang, S.: On the optimal scheduling for media streaming in data-driven overlay networks. In: IEEE GLOBECOM. (2006)
13. Venkataraman, V., Francis, P.: On heterogeneous overlay construction and random node selection in unstructured p2p networks. In: IEEE INFOCOM 2006
14. Jiang, J., Nahrstedt, K.: Randpeer: Membership management for qos sensitive peer-to-peer applications. In: IEEE INFOCOM 2006
15. Bharambe, A.R., Herley, C., Padmanabhan, V.N.: Analyzing and improving a bittorrent network's performance mechanisms. In: IEEE INFOCOM 2006
16. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and Applications. Prentice Hall
17. Magharei, N., Rejaie, R.: Understanding mesh based peer-to-peer streaming. In: ACM NOSSDAV 2006, Newport, Rhode Island, USA (2006)
