Computer Networks 57 (2013) 2856–2868
An efficient scheduling algorithm for scalable video streaming over P2P networks
Kai-Lung Hua a,*, Ge-Ming Chiu a, Hsing-Kuo Pao a, Yi-Chi Cheng b
a Dept. of Computer Science & Information Engineering, National Taiwan University of Science & Technology, Taipei 106, Taiwan
b Apexx Technology Corp, Taoyuan 325, Taiwan
Article info
Article history: Received 29 February 2012; received in revised form 15 June 2013; accepted 18 June 2013; available online 6 July 2013.
Keywords: P2P streaming; Scalable video coding; Block scheduling algorithm
Abstract
In recent years, the Internet has witnessed rapid advancement in peer-to-peer (P2P) media streaming. An important issue in these applications is the block scheduling problem, which deals with how each node requests media data blocks from its neighbors. In most streaming systems, peers are likely to have heterogeneous upload/download bandwidths, so different peers are likely to perceive different streaming quality. Layered (or scalable) streaming in P2P networks has recently been proposed to address the heterogeneity of the network environment. In this paper, we propose a novel block scheduling scheme for P2P layered video streaming. We define a soft priority function for each block to be requested by a node in accordance with the block's significance for video playback. The priority function is unique in that it strikes a good balance between different factors, so that the priority of a block represents its relative importance well even when block sizes vary widely between layers. The block scheduling problem is then transformed into an optimization problem that maximizes the priority sum of the delivered video blocks. We develop both centralized and distributed scheduling algorithms for the problem. Simulations of two popular scalability types have been conducted to evaluate the performance of the algorithms. The simulation results show that the proposed algorithm is effective in terms of bandwidth utilization and video quality.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
With the widespread deployment of broadband technology, multimedia services have attracted a large number of Internet users. Many multimedia platforms over the Internet, such as YouTube and NetTV, have been presented to the public in recent years. One of the major challenges faced by multimedia streaming services is serving a massive number of concurrent users online. In this respect,
This work was supported by National Science Council of Taiwan via 101-2221-E-011-138. * Corresponding author. Tel.: +886 2 2730 3664. E-mail addresses: [email protected] (K.-L. Hua), [email protected] (G.-M. Chiu), [email protected] (H.-K. Pao), poca.cheng@apexx.com.tw (Y.-C. Cheng).
1389-1286/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.comnet.2013.06.014
IP multicast seems to be one of the most promising approaches. The deployment of IP multicast, however, turns out to be limited for a variety of technical and market reasons, such as the difficulty of deploying inter-ISP multicast and the unwillingness of users to install multicast-capable routers. Recently, application-layer techniques in a peer-to-peer (P2P) environment have been proposed as an alternative solution. In a P2P streaming system, an unstructured overlay network is first constructed for a collection of nodes (or users). Through the overlay network, the participant nodes cooperate to exchange multimedia contents to meet their respective streaming demands. Several P2P streaming systems, such as CoolStreaming [1], Bittos [2], PPTV [3], and Mol's system [4], have already been deployed in practice and are considered promising solutions for serving large-scale
K.-L. Hua et al. / Computer Networks 57 (2013) 2856–2868
users owing to the self-scalability property of the P2P technique. These P2P streaming systems operate in a way similar to BitTorrent [5], with each node being assigned a set of neighboring peers over an overlay network. The streaming media contents are divided into blocks of data that are exchanged among the peer nodes. At any given instant of time, each peer may possess some subset of the data blocks, with different peers having different content subsets. A node periodically asks each of its neighboring peers for the list of blocks in its cache. With this information, the node then issues requests for blocks of interest to its neighbors. On the other hand, each responding node in the network typically allocates only limited upstream bandwidth for answering data requests that are issued independently by several peer nodes and arrive at the same time. Therefore, it remains a challenge for a node to determine the specific data blocks to retrieve from each of the neighboring nodes under the constraint of the allocated bandwidth. This decision directly affects the performance of the system and is generally known as the block scheduling problem. Most existing P2P schemes adopt ad hoc strategies such as random selection [6–9], rarest first [5,1], or round-robin [10] for block scheduling. These ad hoc scheduling methods are not effective for video streaming because a node that requests media data blocks from its neighbors using these methods does not consider the differing importance of its desired blocks.
Furthermore, to handle the heterogeneity of users' network environments, the layered encoding technique [11] has been employed to encode the video contents into multiple non-overlapping layers. The fundamental data of the video contents is contained in the base layer. The rest of the layers, called enhancement layers, are used to enhance the quality of video playback in a progressive manner.
To decode a high layer of data, all lower layers, including the base layer, must be available to support the decoding function. These layers offer the flexibility of meeting distinct requirements for the streaming quality demanded by different users. But the design also raises new challenges for P2P streaming applications. In particular, it complicates the block scheduling problem. Recently, a number of works have employed the layered coding technique in their P2P streaming systems. Rejaie and Ortega [12] presented a framework for layered P2P streaming, where the round-robin mechanism is employed for requesting video layers for each peer. Lan et al. [13] proposed a scheduling algorithm, which enables scalable streaming for low- to high-bandwidth contents, for peers to request data from the senders. Zhang et al. [14,15] specifically addressed the block scheduling problem from the theoretical perspective of optimizing network performance. The work modeled the problem as a min-cost network flow problem and proposed a centralized optimal solution accompanied by a distributed heuristic version. All of the mentioned works confirm the viability of layered streaming in different aspects. With layered encoding, the layers can represent temporal scalability (frame rate), spatial scalability (picture size), or quality scalability for video [11]. Spatial scalability modality generally makes block sizes of different layers non-uniform,
and thus requires a distinct amount of bandwidth to transport a block of data in each layer. In addition, some existing techniques, such as Scalable Video Coding (SVC), combine different scalability modalities to enhance video quality. These coding techniques often yield multiple layers with non-uniform block sizes. The non-uniformity of block size introduced by the layered coding techniques leads to further challenges for the block scheduling problem, and renders existing block scheduling algorithms either inapplicable or inadequate in offering satisfactory performance for users. Hence, there is a need to investigate block scheduling for P2P streaming from the perspective of non-uniform block size, in order to cope with a more realistic operating paradigm.
Block scheduling algorithms are generally classified into two categories, push-based [16,17] and pull-based [14], according to whether it is the sender or the receiver, respectively, that decides which blocks to transmit. In this paper, we present a novel solution to the pull-based block scheduling problem. A priority-based model is introduced to determine the weight of a desired block. The priority of a block is designed as a function of block rarity, playback urgency, the layer of the block, and the block size. The priority function is unique in that it strikes a good balance between different factors, so that the priority of a block represents its relative importance well even when block sizes vary widely between layers. More importantly, the priority function employs no weighting parameters and hence requires no time-consuming trial-and-error or training to obtain a proper set of such parameters.
In general, the weighting parameter settings (if any) are essential to system performance; however, it is troublesome to re-adjust the parameters whenever a system condition changes, and it is difficult to tune them to optimize performance even for a given system setting. The system can significantly enhance the quality of video playback by maximizing the sum of priorities of the blocks that are exchanged between the peers. To this end, we first study the block scheduling problem from a centralized perspective, and then extend it to a distributed version of the block scheduling scheme for practical use. Simulation-based evaluation has been conducted to assess the performance of the proposed algorithm with respect to various performance metrics, including layer delivery ratio, network goodput, useless throughput, and peak signal-to-noise ratio (PSNR). The simulation results show that the proposed algorithm successfully avoids delivering undecodable blocks, and hence better utilizes the bandwidth to improve the visual quality.
The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 describes the network model used in this paper, followed by the proposed block scheduling algorithm. In Section 4, we conduct simulations to evaluate the performance of our algorithm and compare the results with existing solutions. Finally, we conclude the paper in Section 5.
2. Related work
P2P video streaming systems comprise two major operations: overlay construction and block scheduling. For
overlay construction, there are two basic architectures: tree-based and mesh-based. A tree-based overlay architecture constructs a tree topology in which the root node is the server and the original source of the video stream. Every other node in the network requests video data from its parent. In a single tree-based overlay network, the leaf peers do not forward data to others, which is considered unfair to other peers. To remedy this limitation, some works, such as Chunkyspread [18] and SplitStream [19], introduced the idea of a multi-tree overlay topology in which each peer receives the stream data through different routes. Another way to improve on the single tree-based overlay is the mesh-based approach. A mesh-based overlay organizes peer nodes into a less structured overlay network in which each peer exchanges streaming media with a subset of the peers. Examples of mesh-based overlays include Coolstreaming [1], PROMISE [20], and PRIME [21].
Another core issue for the P2P streaming service is the block scheduling problem. Conventional block scheduling approaches include the pure random strategy [6–9] and the local rarest first (LRF) strategy [5,1]. With the random strategy, a node requests a video block of interest from a sender that is randomly selected from those neighbors that are known to cache the block. In contrast, the LRF approach first calculates the number of potential suppliers for each of the desired blocks. It then requests the blocks sequentially in ascending order of these supplier numbers. Note that these strategies only consider the single-layer streaming environment. Recently, the scalable video coding technique [11] has been employed to encode video contents into multiple non-overlapping layers, with each layer consisting of a sequence of data blocks. This technique provides flexibility for managing the heterogeneity of the network environment.
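The two conventional strategies described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not code from any of the cited systems: the names `lrf_order`, `random_supplier`, and the `holdings` map (neighbor ID to the set of block IDs it caches) are hypothetical, and ties between equally rare blocks are broken by block ID purely for determinism.

```python
import random

def lrf_order(desired, holdings):
    """Order desired blocks by ascending number of potential suppliers (LRF).

    `holdings` maps a neighbor ID to the set of block IDs it caches
    (a hypothetical representation of the exchanged block lists).
    """
    suppliers = {b: [n for n, held in holdings.items() if b in held]
                 for b in desired}
    # Blocks that no neighbor caches cannot be requested in this round.
    requestable = [b for b in desired if suppliers[b]]
    # Ties are broken by block ID only to make the order deterministic.
    order = sorted(requestable, key=lambda b: (len(suppliers[b]), b))
    return order, suppliers

def random_supplier(block, suppliers, rng):
    """Pure random strategy: pick any neighbor known to cache the block."""
    return rng.choice(suppliers[block])

holdings = {"n1": {1, 2, 3}, "n2": {2, 3}, "n3": {3}}
order, suppliers = lrf_order([1, 2, 3, 4], holdings)
print(order)                                         # rarest block first
print(random_supplier(3, suppliers, random.Random(0)))
```

Neither strategy looks at the layer or the size of a block, which is the limitation the layered schemes discussed next try to address.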
A number of recent works have employed the layered coding technique in their P2P streaming systems. Rejaie and Ortega [12] presented a framework for layered P2P streaming. In their system, once a set of senders is selected, a receiver peer keeps track of the overall throughput and periodically determines the total number of coding layers that can be delivered from all senders. The receiver node then employs the round-robin mechanism for requesting video layers from the senders. In [22], the authors proposed and prototyped a system called LayerP2P, which applies layered video to P2P live video streaming. In this system, a node allocates its upload bandwidth to its neighbors probabilistically, based on the ratios of the data rates it receives from these neighbors. This promotes the incentive principle that peer nodes that upload more get better video quality. On the receiver end, a user node uses a simple prioritized random scheduling algorithm in which requested blocks are categorized into two types (or priorities), namely regular requests and probing requests. A threshold layer l is determined by the expected available download rate at the node. Blocks belonging to layers lower than or equal to l are designated as regular requests, while blocks of layers higher than l are designated as probing requests. The regular requests are sent to the suppliers based on a random scheduling algorithm without further prioritization among different layers (layer 1 to layer l).
However, probing requests are sent to suppliers layer by layer, beginning with layer l + 1, also based on the same random scheduling algorithm. Regular requests are expected to be served first, while probing requests are served only when the suppliers have surplus upload bandwidth to allocate to the user. In [13], Lan et al. proposed a network architecture called SVCP2P for live scalable media streaming. Every node in SVCP2P exchanges data availability information with the central server on a regular basis. In this framework, the block scheduling algorithm determines the supplier of the blocks according to a scheme that is similar to LRF. Recently, Zhang et al. [14,15] specifically addressed the block scheduling problem from the theoretical perspective of optimizing network performance. The work modeled the problem as a min-cost network flow problem and proposed a centralized optimal solution accompanied by a distributed version. Eberhard et al. [23] mapped the layered content to pieces of fixed size and integrated several piece-picking techniques into BitTorrent-based P2P systems. Most of these works employ certain weighting parameters to reflect the relative importance of different factors in their schemes. These weighting parameters are critical to the overall system performance; however, their values are not easy to determine. In this paper, we present a novel block scheduling algorithm based on a soft priority function.
3. Non-uniform layered model and scheduling
In this section, we first describe the system model used in this paper, followed by the presentation of the proposed block scheduling algorithms.
3.1. Network model and preliminaries
In this paper, we assume that there is an overlay network formed among a set of user nodes N. Since the backbone network typically has a tremendous amount of bandwidth, this paper assumes that the network bottleneck occurs at the last-mile access network only.
Furthermore, we assume that a receiver's download bandwidth is not a bottleneck. At any instant of time, each node has a set of neighboring peers from which it can request data blocks of interest. In the following treatment, $N_{T_{Nbr_i}}$ represents the number of neighbors of node $n_i$, and $N_{T_{NBR}}$ is the maximum number of neighbors that any node in the network is allowed to have.
In this paper, we assume that the layered coding technique is used to encode the content of a video. The raw video content is encoded into L layers, with each layer consisting of a series of data blocks. All blocks across different layers have equal time length, i.e. block duration is identical for all layers; however, the average block sizes of different layers, denoted by $r_\ell$, $\ell = 1, \ldots, L$, may be different. A layer with larger block size demands a higher streaming rate and greater network bandwidth. In addition, we define the minimum block size among all L layers as a block unit; all block sizes are represented in terms of the block unit in the rest of the paper. Moreover, upload bandwidth and
streaming rate are all represented in terms of block units per second in the following discussion. The full streaming rate required for viewing the video at the highest quality with all L layers is R per second, where $R = \sum_{\ell=1}^{L} r_\ell$.
Since block duration is identical for all layers, we consider the time axis to be divided into units of blocks. At any moment, each node maintains a buffering window of a certain size, as illustrated in Fig. 1. In the figure, the buffering window consists of a sequence of consecutive data blocks around the present time. The raw video is encoded into four layers (i.e. L = 4) in the figure, with layer 1 at the bottom being the base layer. Filled blocks represent the blocks that the node has already obtained from its neighbors, while empty blocks represent the ones not currently available in the buffer. Note that the left part of the buffering window contains those blocks that have just been played or will be played soon; this part is categorized as read-only, since it would be too late for playout to retrieve any of these blocks. Each node keeps the blocks that have been played in the buffering window for a while in order to serve requests from other nodes. We define the window immediately after the read-only segment as the exchanging window, which is illustrated in Fig. 1. The blocks in the exchanging window are the ones before the playback deadline, and only these blocks are requested if they have not yet been received. A node sends requests for desired video blocks to its neighbors periodically, with the buffering window sliding forward by the length of a request period each time. A request period is the time between the issuances of two consecutive block requests. In the following, we denote the length of a request period by s, which is specified in units of block duration and ranges between 1 and 6 s in practice. A higher-layer block cannot be decoded until all blocks of lower layers in the corresponding time frame are available.
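The bookkeeping implied by this model can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names, the boolean-matrix representation of the window (`window[t][l]` is true when the block of layer l + 1 in time slot t is buffered), and the byte sizes used in the example are all hypothetical.

```python
def to_block_units(avg_sizes):
    """Express per-layer average block sizes in units of the smallest one."""
    unit = min(avg_sizes)
    return [size / unit for size in avg_sizes]

def decodable_layers(received):
    """Count consecutive layers, from the base layer up, present in one slot.

    A higher-layer block is useless unless every lower layer of the same
    time slot is available, so counting stops at the first gap.
    """
    count = 0
    for have in received:
        if not have:
            break
        count += 1
    return count

def desired_blocks(window, first_slot):
    """List missing (slot, layer) pairs in the exchanging window.

    Slots before `first_slot` are treated as the read-only segment.
    """
    return [(t, l + 1)
            for t in range(first_slot, len(window))
            for l, have in enumerate(window[t])
            if not have]

# Hypothetical average block sizes in bytes for L = 4 layers.
rates = to_block_units([400, 800, 1200, 1600])
full_rate = sum(rates)            # R, the sum of r_l over all layers

window = [
    [True, True, True, True],     # slot 0: read-only segment
    [True, True, False, True],    # slot 1: layer 4 waits for layer 3
    [True, False, False, False],  # slot 2: only the base layer is present
]
print(rates, full_rate)
print(decodable_layers(window[1]))
print(desired_blocks(window, first_slot=1))
```

In slot 1 the layer-4 block is buffered but not yet decodable, which is exactly the situation the layer dependency factor of the priority function is meant to capture.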
This basic framework provides us with a foundation to study the generic block scheduling problem. The following study will focus on the scheduling problem in which a user has to determine what data blocks to request from each of its peers under the constraint of upload access bandwidth. A node’s upload bandwidth is shared by those peers from which it receives data requests.
The main goal of a P2P block scheduling algorithm is to maximize the average network throughput, that is, to deliver as many useful and decodable data blocks as possible in each request cycle so as to improve video streaming quality, given the constraint of upload bandwidth. To achieve this goal, we first define a priority for every desired block as a function of its urgency, rarity, layer dependency, and block size. This priority essentially captures the criticality of the block. The block scheduling problem is then converted to maximizing the priority sum of all delivered blocks. Table 1 summarizes the primary notations used in this work to describe the network model and the block scheduling algorithms.
3.2. Block priority function
Different blocks in a video have different significance for video streaming at a user. For example, base-layer blocks guarantee fundamental continuity of video playback, while upper-layer data offers better viewing quality. Constrained by the upload bandwidth of the peers, it is difficult for a node to receive all the blocks it desires. Hence, in our scheme, a node first prioritizes the blocks it needs and sends block requests to the peers based on the priority information. Essentially, the priority of a data block reflects the significance of the block. To this end, four factors are considered in our scheme to define the priority of a given block: playout urgency, layer dependency, rarity, and data size of the block.
Playout Urgency: A block must be received before its due playback time. Hence, receiving a block whose playback time is close is more important than receiving one that will not be played until a later time. In other words, a block with higher playout urgency should be prioritized over one just entering the buffer window.
Layer Dependency: With the layered coding technique, a high-layer block can only be decoded successfully with the presence of all corresponding
Fig. 1. The x-axis is time and the y-axis represents layer. The raw video is divided into four layers and each layer is further partitioned into blocks. The grey and the white blocks represent the blocks that have been obtained from its neighbors and the blocks not currently available, respectively. Layer 1 is the base layer, providing the fundamental quality of the video, while higher layers provide incremental enhanced quality for users.
Table 1. Listing of primary notations.
Notation | Description
$N$ | Set of all nodes in the overlay network, that is, $N = \{n_1, n_2, \ldots, n_{|N|}\}$
$R$ | The full streaming rate, i.e. $\sum_{\ell=1}^{L} r_\ell$
$O_i$ | The outbound (or upload) bandwidth of node $n_i$
$D_i$ | Set of desired blocks in the current buffering window of node $n_i$
$size_i^k$ | The size of block k at node $n_i$
$T_{Nbr_i}$ | Set of neighbors of node $n_i$
$N_{T_{NBR}}$ | The maximum number of neighbors that any node is allowed to have
$h_j^k \in \{0, 1\}$ | $h_j^k = 1$ if node $n_j$ holds block k; otherwise $h_j^k = 0$
$d_i^k$ | The remaining time before block k will be played at node $n_i$
$s$ | Length of a request period
$L$, $L_k$ | L: number of encoded layers; $L_k$: the layer of block k
$r_\ell$ ($\ell = 1, \ldots, L$) | The average streaming rate of layer $\ell$
$x_{i,j}^k \in \{0, 1\}$ | $x_{i,j}^k = 1$ if node $n_i$ sends block k to node $n_j$; $x_{i,j}^k = 0$ otherwise
$P_i^k$ | The priority of block k at node $n_i$
$k_{bwr_i}$ | Ratio of total upload bandwidth allotted to node $n_i$ by its neighbors to the full streaming rate R
$@_i$ | The effective number of blocks that sender $n_i$ can provide for its neighboring peers
lower-layer blocks it depends on. Therefore, low-layer blocks tend to be more important than high-layer blocks.
Rarity: Rarity of data blocks has often been considered one of the key factors for data dissemination in a P2P network. Experience shows that balancing the number of copies among different blocks is generally good practice for network performance. Therefore, a block that is rare among the neighbors should have higher priority than blocks with more existing copies.
Block Size: The introduction of this parameter marks the key difference between this study and previous works. The block size normally reflects the video quality carried by the block. A large-sized block represents good video quality, and thus is considered more important than a small-sized block. However, block size dictates the bandwidth required for transporting the block. The same upload bandwidth can be used to send more blocks of small size than of large size. Therefore, the quality-bandwidth tradeoff should be considered in the block scheduling problem.
Blending these factors in such a way that the resulting priority captures the relative criticality of the desired blocks well is key to the success of the block scheduling function. Suppose that node $n_i$ needs block k. In the following treatment, we use $P_i^k$ to denote the priority of block k at node $n_i$ given the current reception status. We define $P_i^k$ as
$$P_i^k = \frac{1}{P_T^k}\left(k_{bwr_i} P_R^k + P_L^k\right) size_i^k, \qquad (1)$$
where $P_T^k$, $P_R^k$, and $P_L^k$ denote the playback urgency factor, the rarity factor, and the layer dependency factor, respectively, $k_{bwr_i}$ is an index (see footnote 1) representing the upload bandwidth allotted to node $n_i$ by the neighbors, and $size_i^k$ represents the size of block k at node $n_i$. Block priority is inversely proportional to $P_T^k$ and increases with both $P_R^k$ and $P_L^k$, with the significance of $P_R^k$ throttled by $k_{bwr_i}$.
The term $P_T^k$ is given by $P_T^k = \lfloor d_i^k / s \rfloor$, where $d_i^k$ denotes the remaining time until block k will be played at node $n_i$. Note that $d_i^k$ is always greater than zero, since the user can only request blocks after the read-only buffer, as illustrated in Fig. 1. In comparison with $P_R^k$ and $P_L^k$, $P_T^k$ plays a more crucial role in determining the priority of a block. In other words, the priority function gives preference to blocks that are urgent for playback.
We define $P_R^k$ in such a way as to encourage rare blocks to be disseminated more rapidly:
$$P_R^k = \max\left\{1 - \frac{\sum_{n_j \in T_{Nbr_i}} \left(a_j^k + h_j^k\right)}{N_{T_{NBR}}},\; 0\right\}. \qquad (2)$$
Here $h_j^k$ indicates whether or not node $n_j$ holds block k; it takes on value one if it does and zero otherwise. To account for those copies of block k that have already been sent by a neighboring node $n_j$ to its own neighbors, which are invisible to node $n_i$, we use $a_j^k$ to record the number of such copies at node $n_j$. In other words, we also consider copies two hops away from node $n_i$. By taking this factor into consideration, the rarity of a block is better reflected when computing the block priority. The constant $N_{T_{NBR}}$ is treated as a threshold; if $\sum_{n_j \in T_{Nbr_i}} (a_j^k + h_j^k)$ is equal to or larger than $N_{T_{NBR}}$, we consider the block not rare at all and set $P_R^k$ to zero.
We now discuss the definition of $P_L^k$, which is composed of two components. The first component depends strictly on the layer of block k. Basically, we consider a lower layer to have higher priority than an upper layer, because without the lower layers we cannot decode upper-layer contents. The second component depends on the current reception status and accounts for the aggregate contribution made by all available blocks in the same time frame as block k, provided that block k is obtained successfully by node $n_i$. Let $L_k$ denote the layer to which block k belongs. In addition, let $I_q$ indicate whether the block in layer q in the same time frame as block k has been received or not; $I_q$ takes on value one if the block has been received and zero otherwise. $P_L^k$ is then given as
$$P_L^k = \left[1 - \frac{1}{R}\sum_{\ell=1}^{L_k} r_\ell\right] + \left[\frac{1}{R}\sum_{\ell=1,\,\ell \neq L_k}^{L}\left(r_\ell \prod_{q=1,\,q \neq L_k}^{\ell} I_q\right)\right]. \qquad (3)$$
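To make the three factors concrete, the sketch below evaluates the urgency term, Eq. (2), and the two components of Eq. (3) with exact fractions. It is a minimal sketch under stated assumptions: the function names and argument layout are hypothetical, and the layer rates (1, 2, 3, 4 block units, so R = 10) are the values implied by the Fig. 2 example in the text. The reception patterns passed in are illustrative and are not read off the figure.

```python
import math
from fractions import Fraction

def urgency(remaining_time, request_period):
    """Urgency term: floor of remaining playback time over request period."""
    return math.floor(remaining_time / request_period)

def rarity(copies, max_neighbors):
    """Eq. (2): `copies` is the sum over neighbors of (a_j^k + h_j^k)."""
    return max(Fraction(0), 1 - Fraction(copies, max_neighbors))

def layer_terms(layer, rates, received):
    """Return the two components of Eq. (3) for a block in `layer` (1-based).

    `received[q - 1]` plays the role of I_q for the same time frame.
    """
    total = sum(rates)
    first = 1 - Fraction(sum(rates[:layer]), total)
    gained = 0
    for l in range(1, len(rates) + 1):
        if l == layer:
            continue
        # Product of I_q for q = 1..l with q != layer: layer l only counts
        # if every other layer up to l is already in the buffer.
        if all(received[q - 1] for q in range(1, l + 1) if q != layer):
            gained += rates[l - 1]
    return first, Fraction(gained, total)

rates = [1, 2, 3, 4]  # assumed layer rates, R = 10 as in the Fig. 2 example

# Layers 1 and 2 buffered, layer 4 missing: second component 3/10.
print(layer_terms(3, rates, [True, True, False, False]))
# Layers 1, 2 and 4 buffered: layer 4 becomes useful too, giving 7/10.
print(layer_terms(3, rates, [True, True, False, True]))
# Base layer missing: nothing becomes decodable, second component 0.
print(layer_terms(3, rates, [False, True, False, True]))
print(rarity(3, 5), rarity(6, 5), urgency(7, 2))
```

Note that the urgency term evaluates to zero whenever the remaining time is shorter than one request period, so an implementation that divides by it would need to guard against that case; how the original system handles it is not stated in the text.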
The first term is self-explanatory. We now use the example in Fig. 2 to illustrate the idea behind the second component. In the figure, blocks b1, b2, b3, and b4 are not available in the buffer and they all lie in layer 3. Note that we have $R = \sum_{\ell=1}^{4} r_\ell = 10$ in this example. According to our definition, the second component will be zero for both b1 and b3. This is because both b1 and b3 have some lower-layer block missing in the buffer. The video quality would not improve by obtaining b1 or b3 until its corresponding lower-layer blocks are all received. However, the arrival of b2 or b4 can immediately enhance video quality. Hence the second components assigned to b2 and b4 are (1 + 2)/10 = 3/10 and (1 + 2 + 4)/10 = 7/10, respectively. Note that, unlike b2, obtaining b4 allows the existing block in layer 4 to become useful as well. Hence, the associated contribution is also included for b4.
Footnote 1: The notation $k_{bwr_i}$ will be explained in detail shortly.
Fig. 2. An example to illustrate how we compute $P_L^k$ for various blocks b1, b2, b3 and b4.
Note that $P_R^k$ and $P_L^k$ are basically treated on an equal footing in the priority function, with the influence of $P_R^k$ throttled by the parameter $k_{bwr_i}$. This is explained as follows. One of the policies adopted by our scheduling algorithm is to give priority to meeting fundamental demand. That is, a node is more likely to fetch lower-layer blocks before it considers upper-layer ones. Consequently, block rarity often occurs at high layers. Hence, the priority function encourages requests for rare blocks only when the bandwidth allotted to the node is adequate. The parameter $k_{bwr_i}$ denotes the ratio of the total upload bandwidth allotted to node $n_i$ by the neighbors to the full streaming rate R. This parameter is dictated by the bandwidth allocation policy adopted by a system and is normally obtained through proper estimation. For example, in [22], the total downloading bandwidth at a node is estimated by its entire upload rate. When there is sufficient upload bandwidth, $k_{bwr_i}$ increases and the significance of the rarity factor grows.
With the above definition of the block priority function, Fig. 3 illustrates an example of the block scheduling model used in the paper. Each node in our P2P streaming system plays two roles, a sender and a receiver. The notation $O_i$ beside the pipe of each sender represents the upload bandwidth of the sender. The arrows indicate the receivers for each of the senders. In the figure, the blocks buffered by a sender are listed beside the sender, with block IDs indicated in the corresponding boxes.
The box listed by the side of a receiver indicates the blocks desired by the receiver. To facilitate the exposition, the same block ID at different nodes indicates the same video block. If a sender owns the blocks that a receiver desires, the sender can offer its upload bandwidth for the receiver. For example, sender S2 holds blocks 1, 3, 4, and 5. It can send block 4 to R2 which desires blocks 4 and 6. In addition, S2 can send blocks 1, 3, and 5 to R3 if bandwidth is adequate. On the right side of each receiver, there is a table recording the priority of the
blocks desired by the receiver. Note that the priority for the same block may be different at different receivers. The goal of the block scheduling problem is to satisfy demand from the users by exchanging as many useful blocks between peer nodes as possible under the upload bandwidth constraint. To this end, we approach the block scheduling problem by maximizing the priority sum of the blocks that can be delivered in a scheduling cycle under the constraint of upload bandwidth.
3.3. Block scheduling algorithms
3.3.1. The centralized algorithm
To gain insight into the problem, we first study the problem from a centralized perspective. Let $D_j$ represent the set of blocks desired by node $n_j$. The optimization problem is formulated as follows:
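$$\max \sum_{n_i \in N} \sum_{n_j \in T_{Nbr_i}} \sum_{k \in D_j} P_j^k \, h_i^k \, x_{i,j}^k, \qquad (4)$$
$$\text{s.t.} \quad \sum_{n_j \in T_{Nbr_i}} \sum_{k \in D_j} x_{i,j}^k \, size_j^k \le s\,O_i, \quad \forall n_i \in N,$$
$$\sum_{n_i \in N} x_{i,j}^k \le 1, \quad \forall n_j \in N,\ \forall k \in D_j,$$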
where $x_{i,j}^k \in \{0, 1\}$ takes on value 1 if node $n_i$ sends block k to node $n_j$ and 0 otherwise. This problem is a variant of the parallel machine scheduling problem, which is classified as NP-hard [24]. Furthermore, the algorithm must adapt quickly to the highly dynamic network environment. Hence, we propose a heuristic algorithm to solve the problem.
A P2P streaming system should fully utilize the upload bandwidth of all the nodes in order to maximize the sum of delivered priorities. A block desired by a receiver may well be available from more than one neighboring peer. In this case, choosing the right source to send a desired block is crucial for bandwidth utilization. Consider, for example, that node $n_i$ needs a block that can be provided by either node $n_j$ or node $n_k$. Suppose that node $n_j$ has small upload bandwidth such that this block is the sole block that node $n_j$ can offer to its neighboring peers (including node $n_i$), while node $n_k$ has larger upload
Fig. 3. Non-uniform block size scheduling model.
bandwidth and consequently has other blocks than this one to offer to its peers. If node $n_k$ is chosen to send this block to node $n_i$, the upload bandwidth of node $n_j$ is completely wasted, as $n_j$ can serve no other request. Basically, the diversity of blocks and receivers that a sender can support is dictated by the amount of upload bandwidth available at the sender. High upload bandwidth tends to imply a wide choice of peers and data blocks that the sender can serve, and such high diversity gives greater flexibility with respect to block scheduling for the sender. This observation suggests that a sender with lower diversity should be scheduled ahead of one with higher diversity. The proposed heuristic algorithm takes this element into account and schedules block delivery for the senders one by one in ascending order of their serving capabilities. Essentially, the heuristic algorithm consists of two steps, sender ranking and block allocation, described as follows.

Sender Ranking Step. In this step, we rank the senders' serving capabilities. The serving capability of a sender is essentially the amount of data blocks the sender node can offer to meet the needs of its peers. Such serving capability is represented by a 2-tuple (upload bandwidth, $\aleph$), where $\aleph$ is the effective number of blocks (regardless of the positions, layers, and sizes of the blocks) the sender can provide for its peers under the assumption that upload bandwidth is unlimited. In other words, we do not consider the constraint of upload bandwidth when we compute $\aleph$. For example, in Fig. 3, S2 holds block 1 desired by R3, block
3 desired by R3 and R4, block 4 desired by R2, and block 5 desired by R3. Therefore, S2 can support five block deliveries as long as its upload bandwidth is adequate, and thus the $\aleph$ associated with node S2 is five. In the serving capability, the upload bandwidth component is more significant than $\aleph$, i.e., senders are compared by upload bandwidth first and by $\aleph$ second. All senders in the network are ranked in ascending order of their serving capabilities. If more than one sender has the same serving capability, we rank the nodes according to node ID, so that all nodes are eventually totally ordered. This order is used by the algorithm to perform block scheduling in the subsequent block allocation step.

Block Allocation Step. In this step, our algorithm takes one sender at a time and determines the peers and the blocks the sender shall serve in the current scheduling cycle. The senders are treated in the ascending order resulting from the previous step, a best-fit-like idea. In other words, the algorithm performs scheduling for the least competent sender first, followed by the second least competent sender, and so on. Note that the scheduling operation is cumulative in the sense that, to perform block scheduling for the $i$th sender, the outcome of all previous $i - 1$ scheduling steps must be committed first. In the following discussion, we describe how the block scheduling is done for a given sender. Assume that node $n_{\hat{i}} \in N$ is the sender of interest and the corresponding upload bandwidth is $O_{\hat{i}}$. Again, let $x_{\hat{i},j}^k$ denote whether node $n_{\hat{i}}$ sends block $k$ to neighbor $n_j \in \mathit{T\_Nbr}_{\hat{i}}$ with $k \in D_j$. We proceed to maximize the priority sum for the sender $n_{\hat{i}}$ as indicated by the following optimization problem:
$$
\max \sum_{n_j \in \mathit{T\_Nbr}_{\hat{i}}} \; \sum_{k \in D_j} P_j^k \, h_{\hat{i}}^k \, x_{\hat{i},j}^k \tag{5}
$$

s.t.

$$
\sum_{n_j \in \mathit{T\_Nbr}_{\hat{i}}} \; \sum_{k \in D_j} x_{\hat{i},j}^k \, size_j^k \le \tau O_{\hat{i}}.
$$

The constraint in (5) ensures that the total size of the blocks requested from $n_{\hat{i}}$ by its neighbors stays within the upload bandwidth capacity of node $n_{\hat{i}}$. Assume that there are a total of $\aleph_{\hat{i}}$ effective blocks that $n_{\hat{i}}$ can send to its neighboring peers. Let $\widetilde{size}_i$ and $\tilde{p}_i$, $1 \le i \le \aleph_{\hat{i}}$, represent the block size and the priority value of the $i$th block. Let $M(s, t)$ denote the maximum priority sum that can be achieved with the candidate set of blocks being the first $s$ blocks, i.e., blocks 1 to $s$, under the upload bandwidth constraint of $t$. We can then use the dynamic programming technique [25] to solve the problem, for which the recurrence equation is given as follows:

$$
M(s,t) =
\begin{cases}
0, & \text{if } s = 0 \text{ or } t = 0,\\
M(s-1,t), & \text{if } \widetilde{size}_s > t,\\
\max\bigl(M(s-1,t),\; M(s-1,\,t-\widetilde{size}_s) + \tilde{p}_s\bigr), & \text{otherwise.}
\end{cases}
$$

Accordingly, the optimal solution for the single-sender problem (5) is $M(\aleph_{\hat{i}}, \tau O_{\hat{i}})$. We choose the greedy approach because the theoretical optimum of the overall problem cannot be achieved easily, as this problem is NP-hard. The theoretical bound is not easy to compute because we have to consider the scheduling in parallel. Nevertheless, when we employ the greedy algorithm and focus on one sender at a time, we can obtain a priority sum not worse than one half of the optimal priority sum that is achieved by brute-force search [25].

3.3.2. The distributed algorithm

The centralized algorithm requires all relevant information to be collected beforehand in each scheduling cycle. Such a requirement is not realistic for many P2P streaming service systems. In this section, we present a fully distributed algorithm to deal with the block scheduling problem. The distributed algorithm is initiated by each receiver at the beginning of the scheduling period. Basically, each node computes a list of blocks to be retrieved from each of its neighbors. The main difference from the centralized mode is that a receiver has to estimate, in a distributed manner, the amount of upload bandwidth that each of its neighbors may allot to it. The estimation outcome is then used to compute the above-mentioned list of blocks to be requested from the neighbors. Details of the algorithm are described below.

The upload bandwidth of a node is shared by all of its neighbors. Hence, in a distributed P2P system, it would be difficult for a node to know the exact amount of upload bandwidth that it will be allotted by each of its neighbors. In our algorithm, we resort to an estimation method to determine such upload bandwidth for each of the senders. To this end, our approach takes the average upload rate for the most recent $p$ cycles, regulated by a coefficient that adapts to the data delivery condition at the receiver. Specifically, consider a receiver node $n_j$. Let $U_i^j$ denote the estimated upload rate that neighbor $n_i \in \mathit{T\_Nbr}_j$ allots to node $n_j$. Also, let $g_{i,j}^k$ denote the total amount of data received from neighbor $n_i$ in the $k$th cycle. Assume that the current cycle is the $m$th one. Then, $U_i^j$ is given by

$$
U_i^j = c_m^i \, \frac{\sum_{k=m-p}^{m-1} g_{i,j}^k}{\tau p}, \tag{6}
$$
where $c_m^i$ is a coefficient between 1 and 2 that is adapted from $c_{m-1}^i$. A small value of $c_m^i$ may lead to an underestimation of $U_i^j$, causing underutilization of the upload bandwidth at node $n_i$. However, a large value may cause the requests to exceed the bandwidth that node $n_i$ allots to node $n_j$, causing some block requests to be turned down. Therefore, a good estimation is critical for system performance. In our scheme, two cases are considered in determining the value of $c_m^i$. The first case is that some block requested from node $n_i$ in the $(m-1)$th cycle was not delivered. A reasonable conclusion is that the requests have exceeded the capacity of node $n_i$. In this case, $c_m^i$ is determined as follows:
$$
c_m^i =
\begin{cases}
c_{m-1}^i - 0.1, & \text{if } c_{m-1}^i \ge 1.1,\\
1, & \text{otherwise.}
\end{cases} \tag{7}
$$
The second case refers to the situation in which all requested blocks have been received from node $n_i$. In this case, a reasonable speculation is that the upload bandwidth at node $n_i$ was underutilized. Thus, we become more aggressive, as shown by the following formula:
$$
c_m^i =
\begin{cases}
c_{m-1}^i + 0.1, & \text{if } c_{m-1}^i \le 1.9,\\
2, & \text{otherwise.}
\end{cases} \tag{8}
$$
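The estimation rule of Eqs. (6)-(8) can be illustrated with a minimal Python sketch. The class and method names (`UploadRateEstimator`, `end_of_cycle`, `estimate`) are hypothetical and not part of the original system. The sketch further assumes an initial coefficient of 1.0, averages over the cycles seen so far when fewer than p cycles have elapsed, and rounds the coefficient to one decimal to avoid floating-point drift; none of these choices is specified in the text.

```python
from collections import deque


class UploadRateEstimator:
    """Receiver-side estimate of the rate one neighbor allots to this node."""

    def __init__(self, period, window, coeff=1.0):
        self.period = period                 # tau: length of one scheduling cycle (s)
        self.history = deque(maxlen=window)  # data received in the last p cycles
        self.coeff = coeff                   # c, kept within [1, 2] (initial value assumed)

    def end_of_cycle(self, received, all_delivered):
        """Record one finished cycle and adapt the coefficient.

        received:      amount of data obtained from the neighbor in the cycle.
        all_delivered: True if every block requested from it arrived.
        """
        self.history.append(received)
        if all_delivered:
            # Eq. (8): be more aggressive, capped at 2.
            self.coeff = min(2.0, round(self.coeff + 0.1, 1))
        else:
            # Eq. (7): back off, floored at 1.
            self.coeff = max(1.0, round(self.coeff - 0.1, 1))

    def estimate(self):
        """Eq. (6): coefficient times the average rate over the recorded cycles."""
        if not self.history:
            return 0.0
        return self.coeff * sum(self.history) / (self.period * len(self.history))


est = UploadRateEstimator(period=4, window=3)
est.end_of_cycle(received=400, all_delivered=True)
print(est.coeff, est.estimate())
```

With data measured in kilobits and the period in seconds, `estimate()` returns a rate in Kbps. Using `min` and `max` is equivalent to the two-branch form of Eqs. (7) and (8) as long as the coefficient moves in steps of 0.1.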
Based on the estimated upload bandwidth, a receiver $n_j$ will then set $\lambda_{bwr_j}$ to $\frac{\sum_{n_i \in \mathit{T\_Nbr}_j} U_i^j}{R}$ and then compute the priority for the blocks desired by the receiver $n_j$ accordingly. Subsequently, the algorithm generates a list of blocks to be requested from each of its neighbors. The process is similar to the centralized algorithm and also consists of two steps:
In the sender ranking step, node $n_j$ ranks the senders as described previously based on the estimated upload bandwidth $U_i^j$. The second step performs the block allocation operation, in which senders are scheduled one by one according to the order determined in the previous step. However, in this case, only a single receiver, i.e., $n_j$, is considered for each sender, and thus the optimization problem is
$$
\max \sum_{k \in D_j} P_j^k \, h_i^k \, x_{i,j}^k \tag{9}
$$

s.t.

$$
\sum_{k \in D_j} x_{i,j}^k \, size_j^k \le \tau U_i^j.
$$
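Problem (9) has the form of a 0/1 knapsack over the blocks a sender holds, with the data volume the sender is expected to deliver in one cycle as the capacity. The following Python sketch illustrates the two receiver-side steps under several assumptions that are not in the text: block sizes and the budget are integers in the same unit, the `priority` value passed in stands for the whole per-block weight in the objective of (9), and a block granted to an earlier sender is removed from the candidates of later ones. All function and parameter names are hypothetical.

```python
def knapsack(blocks, capacity):
    """0/1 knapsack by dynamic programming.

    blocks:   list of (block_id, size, priority) with integer sizes.
    capacity: integer budget in the same unit as size.
    Returns (best priority sum, ids of the chosen blocks).
    """
    n = len(blocks)
    # M[s][t]: best priority sum using the first s blocks within budget t.
    M = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for s in range(1, n + 1):
        _, size, prio = blocks[s - 1]
        for t in range(capacity + 1):
            M[s][t] = M[s - 1][t]
            if size <= t:
                M[s][t] = max(M[s][t], M[s - 1][t - size] + prio)
    # Walk the table backwards to recover which blocks were chosen.
    chosen, t = [], capacity
    for s in range(n, 0, -1):
        if M[s][t] != M[s - 1][t]:
            block_id, size, _ = blocks[s - 1]
            chosen.append(block_id)
            t -= size
    return M[n][capacity], chosen[::-1]


def schedule_requests(desired, holdings, est_rate, period):
    """Receiver-side sender ranking followed by block allocation.

    desired:  {block_id: (size, priority)} for blocks this node still wants.
    holdings: {sender_id: set of block ids held by that sender}.
    est_rate: {sender_id: estimated upload rate toward this node}.
    Returns {sender_id: [block ids to request from that sender]}.
    """
    def capability(sender):
        # (estimated bandwidth, number of desired blocks on offer, id as tie-break)
        offer = sum(1 for b in holdings[sender] if b in desired)
        return (est_rate[sender], offer, sender)

    remaining = dict(desired)
    plan = {}
    for sender in sorted(holdings, key=capability):  # least capable sender first
        budget = int(period * est_rate[sender])
        candidates = [(b,) + remaining[b]
                      for b in sorted(holdings[sender]) if b in remaining]
        _, chosen = knapsack(candidates, budget)
        plan[sender] = chosen
        for b in chosen:
            del remaining[b]
    return plan


desired = {1: (2, 5.0), 3: (3, 4.0), 4: (2, 3.0), 6: (4, 1.0)}
holdings = {"A": {1, 3}, "B": {1, 3, 4, 6}}
print(schedule_requests(desired, holdings, {"A": 1, "B": 2}, period=4))
# {'A': [1], 'B': [3, 4]}
```

In this toy run sender A, which has the smaller estimated bandwidth, is scheduled first and takes block 1, leaving the more capable sender B free to serve blocks 3 and 4. The table has (number of blocks + 1) × (capacity + 1) entries, so a coarse size unit keeps the computation small.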
The dynamic programming technique described earlier can also be used to solve the problem.

4. Performance evaluation

4.1. Simulation configuration

To evaluate the performance of the proposed block scheduling algorithm, we have implemented the algorithm
on a P2P streaming simulator called OPSS [26]. In the simulator, we assume that there is a streaming server that serves as the original video source. The maximum number of user nodes, $|N|$, in the overlay network is 1200. During the simulation, participating nodes join the overlay network according to a Poisson process with an average rate of one node per second. When a new node joins the system, the system selects seven nodes with close play times as the node's neighbors. In return, this node can also be designated as a neighbor of other nodes when they join the overlay network afterwards. In addition, any user in the system is restricted to have at most fourteen neighbors at any time,² that is, $N_{\mathit{T\_NBR}} = 14$. The outbound bandwidth of the server is set to 6 Mbps throughout the simulation. However, we assume that there are three different classes of upload bandwidth that may be allotted by the users. As shown in Table 2, any user allots 120 Kbps, 520 Kbps, or 1200 Kbps of upload bandwidth for the streaming service with probability 0.2, 0.5, and 0.3, respectively. When a node joins the overlay network, it randomly selects its class of upload bandwidth accordingly. This setup is mainly used to simulate the variation of upload bandwidth that may be allotted by different users in real P2P streaming applications. In contrast, the download bandwidth is always set to 10 Mbps for all users, ensuring that download links are not a bottleneck for the service. In the simulation, real videos [28] are encoded by the Joint Scalable Video Model (JSVM) [29] beforehand in order to collect the actual block sizes used in the simulation. The fundamental scalability types of JSVM are SNR (quality) scalability, temporal (frame rate) scalability, and spatial (resolution) scalability. The input videos are all 30 min³ in length and are encoded by JSVM with the SNR and spatial scalability types.
The number of layers of the encoded video content is five for SNR scalability and three for spatial scalability. The exchanging and buffering windows are set to 10 s and 30 s in length, respectively. Each node slides the buffering window and schedules block requests once every four seconds as suggested by [12,27], i.e., the request period $\tau$ is set to four seconds. The playback delay is set to the same value as the exchanging window.⁴ Recall that we have used the parameter $\lambda_{bwr_i}$, the ratio of the total upload bandwidth allotted to node $n_i$ by its neighbors to the full streaming rate $R$, in the block priority function. In the simulation, we assume that a node $n_i$ computes this parameter by estimation based on the assumption that the upload bandwidth of a node is equally shared by all of its neighbors. Here we use Fig. 4 to illustrate the estimation process. In the figure, node $n_i$ has two sources, $n_1$ and $n_2$, from which it can request desired blocks. There are respectively three and two neighbors sharing the upload bandwidth of $n_1$ and $n_2$. Hence, the total upload bandwidth allotted to node $n_i$ is estimated as $\frac{O_1}{3} + \frac{O_2}{2}$, where $O_1$ and $O_2$ are the upload bandwidths of $n_1$ and $n_2$, respectively.

Table 2. Distribution of upload bandwidth allotted by users.

Upload bandwidth (Kbps)    Fraction of the users (%)
120                        20
520                        50
1200                       30

² A previous study [27] demonstrated that the delivery quality to the majority of peers is high when the number of neighbors falls in the range of 6–14.
³ We construct this 30-min video by wrapping around if the end of the original coded stream is reached.
⁴ To help readers reproduce our simulation results, all tools and test data can be obtained through the following website: http://faculty.csie.ntust.edu.tw/hua/research/P2P.html.

In the simulation, we use the following four metrics to capture the performance of the simulated algorithms:

Layer delivery ratio: This metric is used to reflect the delivery outcome for each layer. More specifically, the delivery ratio for layer $\ell$ is defined as:
$$
\frac{\text{useful data size received in layer } \ell}{\text{total data size in layer } \ell}.
$$

Goodput (useful throughput): This metric is defined as the average number of useful bits received by a user per second.

Useless throughput: If a block is received after its playback deadline, this block is of no help for video quality and thus is considered useless. Besides, a received block cannot be decoded if any of the corresponding lower-layered blocks is not available. Hence, such a block is also treated as a useless block. This metric reflects the amount of network bandwidth that is wasted in transmitting useless blocks.

Peak signal-to-noise ratio (PSNR): This metric is employed to evaluate the average video quality decoded by all the users. The PSNR is defined as:

$$
\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}},
$$

where MSE is the sum over all squared value differences of the original video and the reconstructed video divided by image size.

We also compare the performance of the proposed algorithm with those obtained using the existing solutions,
Fig. 4. An example for evaluating $\lambda_{bwr_i}$. Nodes $n_1$ and $n_2$ are sending blocks to node $n_i$ as indicated by the arrows. Nodes $n_1$ and $n_2$ have three and two neighbors (including $n_i$) to share the upload bandwidth, respectively. The upload bandwidth allotted to node $n_i$ is estimated as $\frac{O_1}{3} + \frac{O_2}{2}$, where $O_1$ and $O_2$ are the upload bandwidths of $n_1$ and $n_2$, respectively.
Fig. 5. Performance comparison of the proposed ‘‘NBSA-G’’ and ‘‘NBSA’’ method against ‘‘Min-Cost [14]’’, ‘‘RR [10]’’, and ‘‘RR-T [10]’’ when encoded with SNR scalability for different block scheduling schemes: (a) Layer delivery ratio; (b) Goodput; (c) Useless throughput; and (d) PSNR.
namely round-robin and min-cost. In the following treatment, the proposed distributed algorithm is called the Nonuniform Block Scheduling Algorithm (NBSA for short). For the scenario of layered coding, the round-robin algorithm is usually classified into three classes with respect to the block ordering sequence: conservative, aggressive, and trade-off [14,10]. Since the aggressive method leads to a lot of useless blocks when the upload bandwidth is limited, we only simulate the conservative and the trade-off round-robin schemes, which are denoted as RR and RR-T, respectively. The min-cost method [14] transforms the block scheduling problem into a min-cost flow problem and introduces several weighting parameters. To obtain a fair comparison, we have used the parameter values that produced the best performance in repeated trials.⁵ Furthermore, to gain more insight into the performance of NBSA, we have simulated the globally centralized version of our algorithm, which is denoted by NBSA-G in the following discussion. The performance of NBSA-G essentially serves as an upper bound for NBSA.

⁵ The weighting parameters used for the three factors (rarity, deadline, layer) in the min-cost method are (0.2, 0.7, 0.1), respectively, in our simulation. We derived these values by trying all combinations of (a, b, c), where a = x × 0.1, b = y × 0.1, and c = z × 0.1, with x, y, and z being positive integers and x + y + z = 10, under the condition of uniform block size.
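The search space described in footnote 5 is small enough to enumerate exhaustively. The short Python sketch below lists the candidate weight triples; the function name `weight_grid` is hypothetical, and the evaluation of each triple (which would require running the min-cost simulation) is deliberately left out.

```python
from itertools import product


def weight_grid(steps=10):
    """All (a, b, c) = (x/steps, y/steps, z/steps) with positive integers x + y + z = steps."""
    grid = []
    for x, y in product(range(1, steps), repeat=2):
        z = steps - x - y
        if z >= 1:
            grid.append((x / steps, y / steps, z / steps))
    return grid


grid = weight_grid()
print(len(grid))                # 36 candidate triples
print((0.2, 0.7, 0.1) in grid)  # True: the setting reported in footnote 5
```

The count follows from choosing two cut points among the nine gaps of ten unit steps, which gives 36 combinations.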
4.2. Simulation results

4.2.1. SNR scalability

We first encode the “Foreman” sequence in CIF (352 × 288) resolution by JSVM [29] with five quality levels (i.e., five layers). The output bit rates vary, with the average bit rates for layer 1 to layer 5 being about 80 Kbps, 97 Kbps, 102 Kbps, 125 Kbps, and 175 Kbps, respectively. The total streaming rate is thus around 579 Kbps. Fig. 5(a) plots the average delivery ratio versus layer for the simulated scheduling algorithms. We note that the delivery ratio steadily declines from layer 1 to layer 5 for all methods. In particular, we observe a significant drop of the delivery ratio at layer 5. This mainly results from the fact that block sizes of high layers are relatively large, so that fewer senders have enough upload bandwidth to support the transmission of high-layer blocks. The layer dependency factor also affects the delivery ratio for high-layer blocks. The centralized algorithm NBSA-G evidently offers the best performance among all simulated methods. More importantly, the performance of NBSA is fairly close to that of its centralized counterpart and clearly better than that of Min-Cost, RR, and RR-T, which is important from the pragmatic perspective.
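How layer dependency enters the layer delivery ratio, goodput, and useless throughput metrics can be made concrete with a small accounting sketch in Python. The block representation (a tuple of segment index, layer, size, arrival time, and deadline) and all function names are assumptions made for illustration; the simulator's actual bookkeeping is not described in the text.

```python
def split_useful(blocks):
    """Separate useful from useless received blocks.

    Each block is a tuple (segment, layer, size_bits, arrival, deadline);
    layers are numbered from 1 (the base layer).
    """
    on_time = {(seg, layer) for seg, layer, _, arrival, deadline in blocks
               if arrival <= deadline}
    useful, useless = [], []
    for block in blocks:
        seg, layer = block[0], block[1]
        # Useful only if the block and all lower layers of its segment arrived in time.
        decodable = all((seg, lower) in on_time for lower in range(1, layer + 1))
        (useful if decodable else useless).append(block)
    return useful, useless


def layer_delivery_ratio(useful, total_bits, layer):
    """Useful bits received in a layer over the total bits of that layer."""
    got = sum(size for _, lay, size, _, _ in useful if lay == layer)
    return got / total_bits[layer]


def throughput(blocks, duration):
    """Average bits per second carried by the given blocks."""
    return sum(size for _, _, size, _, _ in blocks) / duration


received = [
    (0, 1, 100, 1.0, 5.0),  # base layer, on time
    (0, 2, 200, 2.0, 5.0),  # layer 2, on time, base layer present
    (0, 3, 300, 6.0, 5.0),  # layer 3, past its deadline
    (1, 2, 200, 7.0, 9.0),  # layer 2 of a segment whose base layer is missing
]
useful, useless = split_useful(received)
print(throughput(useful, 10), throughput(useless, 10))  # 30.0 50.0
```

In this toy trace more than half of the received bits are useless, either because a block is late or because a lower layer it depends on never arrived.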
Fig. 6. Delivery ratio for different exchanging window sizes (horizontal axis: exchanging window size in seconds; vertical axis: delivery ratio; curves: NBSA-G, NBSA, Min-Cost, RR, and RR-T).
Fig. 5(b) depicts average goodput of the network for the algorithms. Again, NBSA performs very close to NBSA-G with respect to this metric. NBSA has a gain of 8%, 20%, and 13% when compared with Min-Cost, RR, and RR-T, respectively, in terms of goodput. A good block scheduling method should not only offer high throughput but also keep the number of useless blocks small. In Fig. 5(c), we examine the average useless
throughput for the block scheduling algorithms. Here the useless throughput refers to the amount of bandwidth, in Kbps, consumed by useless blocks. Evidently, NBSA-G has the lowest useless throughput since it has the exact bandwidth information and is thus able to avoid useless blocks caused by being overdue. The useless throughput for NBSA is lower than those produced by Min-Cost and RR-T. This is mainly due to the fact that the proposed priority function effectively captures the criticality of a block given the reception status at a node. The nonuniform block size across different layers makes the Min-Cost method suffer a relatively high useless throughput. In contrast, RR-T yields the highest useless throughput because it tends to give comparatively high precedence to high-layer blocks. In addition to the evaluation metrics of goodput, average delivery ratio, and useless throughput, we also evaluate the performance in terms of video quality through the PSNR metric. As shown in Fig. 5(c), the proposed method successfully avoids delivering undecodable blocks since it has the lowest useless throughput. In other words, the proposed method better utilizes the bandwidth to improve the visual quality, which is confirmed by the PSNR results shown in Fig. 5(d). In Fig. 6, we examine the impact of the exchanging window size based on SNR scalability. One can readily observe that a larger exchanging window size results in a higher
Fig. 7. Performance comparison of the proposed ‘‘NBSA-G’’ and ‘‘NBSA’’ method against ‘‘Min-Cost [14]’’, ‘‘RR [10]’’, and ‘‘RR-T [10]’’ when encoded with spatial scalability for different block scheduling schemes: (a) Layer delivery ratio; (b) Goodput; (c) Useless throughput; and (d) PSNR.
delivery ratio. This is because when the exchanging window size grows large, there will be more opportunity for a node to remedy the transmission failure, such as data loss or bandwidth restriction. Nevertheless, with a larger exchanging window size, the peers will experience longer playback delay.
4.2.2. Spatial scalability

To study the effect of spatial scalability on the performance of the proposed algorithms, we have encoded the “City” sequence with three spatial layers, that is, QCIF (176 × 144), CIF (352 × 288), and 4CIF (704 × 576). The output bit rates vary, with the average bit rates for layer 1 to layer 3 being about 52 Kbps, 159 Kbps, and 453 Kbps, respectively. The total streaming rate is then around 664 Kbps. Notice that in this case the block size grows dramatically across layers, so that the ratio of streaming rates between consecutive layers is considerably larger than in the previous simulation setting. Fig. 7 plots the simulation results with this setting. Evidently, our algorithm still outperforms the others. In fact, NBSA scores a delivery ratio near 100% for all layers except layer 3. This is because the two lower layers consume relatively small bandwidth, leaving more bandwidth for layer 3. Although the RR scheme delivers decent delivery ratios for the lowest two layers, it has a poor delivery ratio for layer-3 blocks, leading to the worst average goodput among all simulated algorithms. Note that when a node receives only the video blocks of the lower layer(s), but not of the higher layer(s), we first interpolate the decoded lower-resolution image to 4CIF (the original resolution) and then compute the PSNR between the interpolated video and the original video.

In summary, we believe that our soft priority function is able to incorporate different factors in a more comprehensive manner for block dissemination under bandwidth constraint. In addition, the proposed soft priority function can be successfully applied to various scalability types of scalable video coding. The proposed distributed NBSA algorithm is robust in steadily offering the best performance among all simulated distributed algorithms across various simulation settings.
Moreover, NBSA is highly effective in terms of bandwidth utilization and video quality as indicated by its low useless throughput and high PSNR values.
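For reference, the PSNR values reported in this section follow the definition given in Section 4.1, and for spatial scalability a lower-resolution frame is first brought to the original resolution. The Python sketch below shows one way to compute this for 8-bit frames stored as lists of pixel rows. The function names are hypothetical, and pixel replication is used as the interpolation method purely as an assumption, since the text does not state which interpolation filter was applied.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized frames (lists of pixel rows)."""
    total, count = 0, 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(original, reconstructed, peak=255):
    """PSNR in dB; identical frames give infinity."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


def upsample2x(frame):
    """Double width and height by pixel replication (assumed interpolation)."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


original = [[10, 10, 20, 20], [10, 10, 20, 25]]
low_res = [[10, 20]]  # e.g. only the lower spatial layer was received
print(round(psnr(original, upsample2x(low_res)), 2))
```

Applying `upsample2x` twice would take a QCIF-sized frame to 4CIF size before the comparison with the original frame.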
5. Conclusion

In this work, we have proposed a novel block scheduling approach for P2P layered video streaming systems. Our approach first assigns a priority to each block desired by any node. We then perform block scheduling by maximizing the priority sum of all delivered blocks. The proposed soft priority function is unique in that it effectively reflects the relative importance of a block. Based on the priority function, we have designed centralized and distributed algorithms for the problem. The simulation results show that the proposed scheme successfully avoids delivering undecodable blocks, and hence better utilizes the
bandwidth to improve the visual quality by up to 2.5 dB when compared with the existing solutions.

Acknowledgment

This work was supported by the National Science Council of Taiwan (NSCT) under Grant NSC 101-2221-E-011-138. We would like to thank Profs. Tai-Lin Chin and Tien-Ruey Hsiang for stimulating and helpful discussion. We would like to thank the anonymous reviewers for many useful and insightful comments.

References

[1] X. Zhang, J. Liu, B. Li, Y.S.P. Yum, Coolstreaming/DONet: a data-driven overlay network for peer-to-peer live media streaming, in: Proceedings of the 24th Annual Joint Conference of the IEEE Computer and Communications Societies, INFOCOM 2005, vol. 3, 2005, pp. 2102–2111.
[2] A. Vlavianos, M. Iliofotou, M. Bitos, Bittos: enhancing bittorrent for supporting streaming applications, in: Proceedings IEEE INFOCOM, 25th IEEE International Conference on Computer Communications, 2006.
[3] PPTV, 2012 (accessed August 2012).
[4] J. Mol, A. Bakker, J. Pouwelse, D. Epema, H. Sips, The design and deployment of a bittorrent live video streaming solution, in: Proceedings of the 2009 11th IEEE International Symposium on Multimedia, 2009, pp. 342–349.
[5] B. Cohen, Incentives build robustness in bittorrent, in: Workshop on Economics of Peer-to-Peer Systems, 2003.
[6] T. Bonald, L. Massoulié, F. Mathieu, D. Perino, A. Twigg, Epidemic live streaming: optimal performance trade-offs, in: Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2008, pp. 325–336.
[7] L. Massoulié, A. Twigg, C. Gkantsidis, P. Rodriguez, Randomized decentralized broadcasting algorithm, in: Proceedings of IEEE INFOCOM, 2007, pp. 1073–1081.
[8] L. Massoulié, A. Twigg, Rate-optimal schemes for peer-to-peer live streaming, Perform. Eval. 65 (11–12) (2008) 804–822. http://dx.doi.org/10.1016/j.peva.2008.03.003.
[9] V. Pai, K. Kumar, K. Tamilmani, V. Sambamurthy, A.E. Mohr, Chainsaw: eliminating trees from overlay multicast, in: Proceedings of the 4th International Workshop on Peer-to-Peer Systems, IPTPS 2005, LNCS, vol. 3640, 2005, pp. 127–140.
[10] V. Agarwal, R. Rejaie, Adaptive multi-source streaming in heterogeneous peer-to-peer networks, in: SPIE Conference on Multimedia Computing and Networking, 2005.
[11] H. Schwarz, D. Marpe, T. Wiegand, Overview of the scalable video coding extension of the H.264/AVC standard, IEEE Transactions on Circuits and Systems for Video Technology, 2007, pp. 1103–1120.
[12] R. Rejaie, A. Ortega, PALS: peer-to-peer adaptive layered streaming, in: Proceedings of the 13th International Workshop on Network and Operating Systems Support for Digital Audio and Video, ACM, 2003, pp. 153–161.
[13] X. Lan, N. Zheng, J. Xue, X. Wu, B. Gao, A peer-to-peer architecture for efficient live scalable media streaming on internet, in: Proceedings of ACM Multimedia07, ACM, 2007, pp. 783–786.
[14] M. Zhang, Y. Xiong, Q. Zhang, L. Sun, S. Yang, Optimizing the throughput of data-driven peer-to-peer streaming, IEEE Transactions on Parallel and Distributed Systems 20 (1) (2009) 97–110.
[15] X. Xiao, Y. Shi, Y. Gao, Q. Zhang, LayerP2P: a new data scheduling approach for layered streaming in heterogeneous networks, in: Proceedings of the 28th IEEE International Conference on Computer Communications, INFOCOM 2009, 2009, pp. 603–611.
[16] L. Abeni, C. Kiraly, R.L. Cigno, On the optimal scheduling of streaming applications in unstructured meshes, in: Proceedings of the 8th International IFIP-TC 6 Networking Conference, 2009.
[17] I. Chatzidrossos, G. Dán, V. Fodor, Delay and playout probability trade-off in mesh-based peer-to-peer streaming with delayed buffer map updates, Peer-to-Peer Networking and Applications 3 (3) (2010) 208–221.
[18] V. Venkataraman, K. Yoshida, P. Francis, Chunkyspread: heterogeneous unstructured tree-based peer-to-peer multicast, in: Proceedings of the 14th IEEE International Conference on Network Protocols, ICNP 2006, 2006, pp. 2–11.
[19] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, A. Singh, SplitStream: high-bandwidth multicast in cooperative environments, SIGOPS Oper. Syst. Rev. 37 (2003) 298–313. http://doi.acm.org/10.1145/1165389.945474.
[20] M. Hefeeda, A. Habib, B. Botev, D. Xu, B. Bhargava, PROMISE: peer-to-peer media streaming using collectcast, in: Proceedings of the 11th ACM International Conference on Multimedia, ACM, 2003, pp. 45–54.
[21] N. Magharei, R. Rejaie, PRIME: peer-to-peer receiver-driven mesh-based streaming, in: Proceedings of the 26th IEEE International Conference on Computer Communications, INFOCOM 2007, 2007, pp. 1415–1423.
[22] Z. Liu, Y. Shen, K. Ross, S. Panwar, Y. Wang, Layerp2p: using layered video chunks in p2p live streaming, IEEE Transactions on Multimedia 11 (7) (2009) 1340–1352.
[23] M. Eberhard, T. Szkaliczki, H. Hellwagner, L. Szobonya, C. Timmerer, Knapsack problem-based piece-picking algorithms for layered content in peer-to-peer networks, in: Proceedings of the 2010 ACM Workshop on Advanced Video Streaming Techniques for Peer-to-Peer Networks and Social Networking, AVSTP2P '10, ACM, New York, NY, USA, 2010, pp. 71–76. URL http://doi.acm.org/10.1145/1877891.1877908.
[24] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, second ed., The MIT Press, 2002.
[25] G. Brassard, P. Bratley, Fundamentals of Algorithmics, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.
[26] L. Bracciale, F.L. Piccolo, D. Luzzi, S. Salsano, OPSS: an overlay peer-to-peer streaming simulator for large-scale networks, ACM SIGMETRICS Performance Evaluation Review 35 (3) (2007) 25–27.
[27] N. Magharei, R. Rejaie, Understanding mesh-based peer-to-peer streaming, in: Proceedings of the 2006 International Workshop on Network and Operating Systems Support for Digital Audio and Video, ACM, 2006, pp. 1–6.
[28] Xiph.org, Xiph.org video test media, 2012 (accessed August 2012).
[29] JVT, H.264/svc reference software (jsvm 9.19.14) and manual, 2011 (accessed August 2011).
Kai-Lung Hua received the B.S. degree in electrical engineering from National Tsing Hua University in 2000, and the M.S. degree in communication engineering from National Chiao Tung University in 2002, both in Hsinchu, Taiwan. He received the Ph.D. degree from the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, in 2010. Since 2009, Dr. Hua has been with National Taiwan University of Science and Technology, where he is currently an assistant professor in the Department of Computer Science and Information Engineering. He is a member of Eta Kappa Nu and Phi Tau Phi. His current research interests include digital
image and video processing, computer vision, and multimedia networking.
Ge-Ming Chiu received the B.S. degree from National Cheng-Kung University, Taiwan, in 1976, the M.S. degree from the Texas Tech University in 1981, and the Ph.D. degree from the University of Southern California in 1991, all in electrical engineering. He is currently a professor in the Department of Computer Science and Information Engineering at National Taiwan University of Science and Technology, Taipei, Taiwan. His research interests include mobile computing, data engineering, distributed computing, fault-tolerant computing, and parallel processing. Dr. Chiu is a member of the IEEE Computer Society.
Hsing-Kuo Pao (Kenneth) received the bachelor degree in mathematics from National Taiwan University, and the M.S. and Ph.D. degrees in computer science from New York University. From 2001 to 2003, he was a post-doctoral research fellow at the University of Delaware, and later he joined Vita Genomics as a research scientist. In 2003, he joined the Department of Computer Science and Information Engineering at National Taiwan University of Science and Technology, where he is now an associate professor. His current research interests include machine learning, computer vision and bioinformatics.
Yi-Chi Cheng received the B.S. degree from National Taipei University, Taiwan in 2008, and the M.S. degree from National Taiwan University of Science and Technology, Taiwan in 2010, both in computer science and information engineering. Since 2010, she has been with Apexx Technology Corporation as a software engineer. Her research interests include networking, corresponding algorithm, and protocol design.