IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 25, NO. 3, MARCH 2015


QoE-Driven Cross-Layer Optimization for Wireless Dynamic Adaptive Streaming of Scalable Videos Over HTTP

Mincheng Zhao, Student Member, IEEE, Xiangyang Gong, Jie Liang, Senior Member, IEEE, Wendong Wang, Xirong Que, and Shiduan Cheng

Abstract— Recently, Dynamic Adaptive Streaming over HTTP (DASH) has attracted significant attention. In this paper, we consider DASH-based transmission of scalable videos in wireless broadband access networks (e.g., long-term evolution and WiMAX), and propose three methods to enhance the quality of experience of wireless DASH users. First, we design an improved mapping scheme from scalable video coding layers to DASH layers that can provide the desired bitrates, enhance the video end-to-end throughput, and reduce the HTTP communication overhead. Second, we develop a DASH-friendly scheduling and resource allocation algorithm by integrating the DASH-based media delivery and the radio-level adaptation via a cross-layer approach. It utilizes the characteristics of video content and scalable video coding, and greatly reduces the possibility of video playback interruption by considering the client buffer status. The optimization problem is formulated as a mixed binary integer programming problem, and is solved by a subgradient method. Finally, a DASH proxy-based bitrate stabilization algorithm is proposed to improve the video playback smoothness that can achieve the desired tradeoff between playback quality and stability. Simulations with the Qualnet tool demonstrate that our schemes achieve better performances than other methods in the literature. Index Terms— Cross-layer optimization, Dynamic Adaptive Streaming over HTTP (DASH), quality of experience (QoE), Scalable Video Coding (SVC), scheduling and resource allocation, video streaming.

I. INTRODUCTION

Mobile video services are increasingly popular. It has been predicted that mobile traffic will grow by a factor of 26 and that almost 66% of it will be video by the year 2015 [1]. However, sending videos over wireless access networks is still

Manuscript received February 7, 2014; revised June 8, 2014, July 17, 2014, July 29, 2014, and August 17, 2014; accepted September 4, 2014. Date of publication September 12, 2014; date of current version March 3, 2015. This work was supported in part by the National High-Tech Research and Development Program (863 Program) of China under Grants 2013AA013301 and 2013AA013303, in part by the Natural Sciences and Engineering Research Council of Canada under Grants RGPIN312262 and STPGP447223, and in part by the National Natural Science Foundation of China under Grant 61370197. This paper was recommended by Associate Editor R. Hamzaoui. M. Zhao, X. Gong, W. Wang, X. Que, and S. Cheng are with State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). J. Liang is with the School of Engineering Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2014.2357094

very challenging due to the constraints of wireless communications as well as the high-rate and low-latency demands of video applications.

MPEG-Dynamic Adaptive Streaming over HTTP (DASH) is an emerging international standard on video delivery [2], and many similar industrial solutions have also been developed, including Microsoft Silverlight Smooth Streaming, Apple HTTP Live Streaming, and Adobe HTTP Dynamic Streaming. DASH provides several mechanisms that facilitate efficient and high-quality delivery of streaming media over the Internet. To use DASH, each video is divided into small segments. Each segment is encoded into multiple bitrate streams or quality levels, which are stored in the HTTP web server. An XML media presentation description (MPD) file includes information about all segments. At the beginning, the MPD file is downloaded to the client. The client then adaptively chooses which version of a segment to download based on its estimated throughput or other constraints. Using HTTP-based delivery, DASH can easily penetrate firewalls and provides reliability and deployment simplicity. In DASH, the data are downloaded using the Transmission Control Protocol (TCP), so there are fewer packet losses than with traditional User Datagram Protocol (UDP) video streaming, because lost packets will be retransmitted according to the TCP retransmission mechanism, unless the playback deadline of a packet has passed.

There are many papers on the optimization of video transmission over wireless networks. Many existing methods can improve the performance of RTP/UDP video streaming (e.g., [3]–[5]). However, quality of experience (QoE)-driven optimization for DASH-based video delivery has not been fully investigated. To improve the QoE, the bitrate switch or adaptation strategy in the DASH client is critical. In [6], an algorithm is proposed based on the measured segment fetch time.
The bitrate switch algorithm in [7] takes advantage of the buffer to reduce the risk of sudden drops of video quality. In [8] and [9], the problem of bitrate adaptation over heterogeneous multiple wireless access networks is studied using Markov decision process and dynamic programming. A control-theoretic rate adaptive method is developed in [10] to enhance the DASH performance over multiple content distribution servers. The DASH framework makes it very convenient to use Scalable Video Coding (SVC) [11], where the video is encoded

1051-8215 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


once, but can be decoded in different ways, based on factors, such as the network condition and the decoder’s capability. Using SVC in DASH can reduce the storage requirement at the server [12], and can allow the client to download enhancement layers of an already received segment when upgrading to a higher quality, thereby improving the flexibility and reducing the chance of wrong decision in selecting an appropriate version of a segment, especially in dynamic wireless conditions. In [13], SVC is integrated into DASH and evaluated under the vehicular mobility environment. The results show that SVC achieves better bandwidth utilization than H.264/Advanced Video Coding (AVC) [14]. However, there are several potential problems when adopting SVC in DASH [15]. First, separating a segment into several layers leads to multiple HTTP requests that increase the HTTP message overhead. Second, the HTTP request-response cycles increase the waiting time between the receptions of two adjacent SVC layers, and decrease the end-to-end throughput if a single TCP connection is used, although parallel TCP and HTTP pipelining could efficiently decrease this negative effect [16]. Finally, although SVC is generally cache friendly and can save cache storage, sometimes it could cause extra bits and cache storage, if the video program is not popular and is only requested once by one user during the entire cache timeout interval. The penalty is more significant as the number of layers increases. Several methods of mapping from SVC to DASH layers are proposed in [17] and [18]. The former focuses on SNR scalability, and the latter focuses on hybrid scalability, but they do not consider the communication cost. The client’s bitrate adaptation algorithm could be improved by considering the structural features of SVC. 
In [19], a diagonal policy is proposed that decides to prefetch (downloading for future segments) or to backfill (downloading for the current segment) SVC-coded videos when the download rate varies. Sieber et al. [20] propose an adaptation policy that considers users’ QoE. In [17], it is shown that SVC can improve the hit-rate of the cache and reduce the server storage space. A priority-based media delivery (PMD) adaptation strategy is also proposed in it. These methods can only improve the QoE of one client. In [21], it is demonstrated that there could be fairness issues with DASH when multiple clients share restricted network resources. In [22], three performance problems, namely, instability, unfairness, and bandwidth under-utilization, are discussed when many DASH players compete for the same bandwidth. In [23], an intelligent bitrate switch method is developed to provide fair access, and its effectiveness in WiFi networks is demonstrated. In [24], a proxy-based adaptation algorithm is designed to improve the stability and fairness for multiple DASH services. Another video adaptation proxy method is proposed in [25] to optimize multiple concurrent DASH flows in 3G networks. In these methods, the video quality is adapted according to the resources allocated by the base station (BS). However, the resource allocation in the BS is only based on the wireless channel conditions, but does not consider the characteristics of the video. Li et al. [3] propose the

joint optimization of multiuser packet scheduling and wireless resource allocation for video communications over orthogonal frequency division multiple access (OFDMA) systems. An end-to-end framework that considers cross-layer adaptations in source coding, queue prioritization, flow queuing, and resource management is developed in [5]. However, they focus on conventional RTP/UDP videos, and the characteristics of DASH are not considered. Oyman and Singh [26] suggest that better QoE for DASH can be achieved by tightly integrating DASH-specific media delivery and network-level and radio-level adaptations.

Moreover, the DASH client buffer is also a major factor to consider. It is reported that 60% of YouTube video sessions are aborted after users have watched no more than 20% of the video duration, and these abortions cause 25%–39% of data to be unnecessarily transferred [27]. It is, therefore, necessary to download DASH data in a just-in-time manner. If the allocated rate is too low, playback interruptions will occur. In [28], a long-term evolution (LTE) downlink radio resource allocation method is proposed to avoid playback interruptions. A QoE-proxy resource allocation approach is developed in [29] to allocate the resource, where a simple linear mapping between the mean opinion score (MOS) and the peak signal-to-noise ratio (PSNR) is assumed, and the MOS is maximized based on the relationship between the MOS and the data rate R. A greedy resource allocation method is also proposed. However, the method does not consider the client buffer condition. That is, if the segment requested by a client is closer to its playback deadline, more resources should be allocated to the client to avoid deadline violations. Moreover, the layered structure of SVC is not utilized well in this framework.

In this paper, we focus on the SVC-based DASH optimization in wireless broadband access networks.
First, an improved mapping from SVC to DASH layers is proposed, which merges small-sized layers to increase the end-to-end throughput and reduce the communication overhead. Second, we propose a DASH-friendly scheduling and resource allocation scheme (DFSRA) that is cross-layer and QoE-aware, in which a DASH information module collects the video information and transmits it to the scheduling and resource allocation module, where a subgradient-based algorithm is used to allocate the wireless resource for each user. Finally, to improve the playback smoothness, we propose a DASH proxy-based bitrate stabilization (DPBS) method that works transparently with existing client-based quality selection policies. Simulation results show that our schemes achieve superior performances over other methods. Some preliminary results of our DFSRA algorithm are reported in [30]. This paper provides the improved mapping from SVC to DASH layers, the DPBS method, and more comprehensive experimental results. The rest of this paper is organized as follows. Section II introduces the proposed system framework. The improved mapping method from SVC to DASH layers is depicted in Section III. Section IV describes the DFSRA and DPBS algorithms. Simulation results are given and analyzed in Section V, followed by concluding remarks and future works in Section VI.

ZHAO et al.: QoE-DRIVEN CROSS-LAYER OPTIMIZATION FOR WIRELESS DASH

Fig. 1. Cross-layer system model for DASH-based SVC streaming.

II. SYSTEM DESCRIPTION

Fig. 1 shows the proposed cross-layer and QoE-aware DASH-based SVC streaming system model. We consider a typical centrally controlled cellular OFDMA downlink communication system, which is used in LTE and WiMAX. To support diverse quality-of-service requirements, services could be divided into the following classes: 1) guaranteed bitrate (GBR) services that define the minimum required rate, such as VoIP or videos and 2) nonguaranteed bitrate (non-GBR) services that do not have minimum required rates, such as FTP services. In one cell, a subset of users $\mathcal{K} = \{1, \ldots, K\}$ persistently requests DASH services and some other users request non-GBR services, such as best-effort (BE) traffic. The BS receives video streams from DASH servers and BE traffic through the Internet and other networks, such as the LTE core network (CN). We assume that the wired parts have high bandwidth and low latency.

The service priority policy module decides the priority levels of different types of traffic. According to different priority strategies generated by the network manager, a portion $\alpha \in [0, 1]$ of the wireless resources could be allocated to DASH services, and the rest is allocated to BE services. Generally speaking, the priority policy module serves DASH traffic first. The non-GBR BE traffic is then served by the rest of the available resources [31]. Thus, when the resources are very limited, $\alpha$ could be 1, which means that all the resources will be allocated to DASH services. The priority strategy could also be generated dynamically as in [25]. The study of the priority level policy is beyond the scope of this paper.

In Fig. 1, the DASH proxy detects all the DASH interaction information, such as MPD information, HTTP requests, and responses. The DASH proxy could parse the information and take certain actions. It could modify the HTTP requests based on network conditions and transfer related information to the DASH information module to enhance the stability of each DASH stream. The DASH proxy is a functional entity that could be implemented in an independent server or be integrated into other physical entities, such as the packet data network gateway (P-GW) or the BS in LTE networks.


The DASH information module is located in the BS and only transfers the related information to the resource scheduler. Our proposed DFSRA algorithm dynamically allocates wireless resources for DASH services, and it can also work in parallel with non-GBR schedulers.

The wireless system has a total available bandwidth of $B$, which is equally divided into $N$ subchannels indexed by $\mathcal{N} = \{1, \ldots, N\}$, each with several consecutive subcarriers. In LTE, a resource block (RB) consists of 12 consecutive subcarriers [32]. In this paper, a subchannel consists of one RB. All channels are assumed to exhibit block fading characteristics. Each subchannel can be allocated to only one user. For each user $k \in \mathcal{K}$ and each subchannel $n \in \mathcal{N}$, the transmit power and the channel gain for user $k$ in subchannel $n$ are denoted as $p_{k,n}$ and $h_{k,n}$, respectively, and $\sigma^2$ denotes the ambient noise variance. M-QAM is assumed to be employed in the communication system. Therefore, user $k$'s feasible channel rate on subchannel $n$ can be expressed as

$$ r_{k,n} = \frac{B}{N} \log_2\!\left(1 + \frac{\beta\, p_{k,n} |h_{k,n}|^2}{\sigma^2}\right) \tag{1} $$

where $\beta$ is a constant, which is related to a targeted bit error rate (BER) by $\beta = 1.5 / (-\ln(5\,\mathrm{BER}))$ [33].

At the DASH server, the SVC video is stored in a layered segment structure. To use SVC in DASH, for each DASH segment, the network abstraction layer units in the same SVC layer are stored together as a chunk [17], which is denoted as $(i, j)$ in this paper, where $i \in \{1, \ldots, I\}$ and $j \in \{1, \ldots, J\}$ are the segment index and the layer index, respectively. In our framework, the content quality information of each chunk, which is expressed by the quality contribution made by the successful transmission of the chunk, is precalculated before streaming and is added to the MPD file by extending the attribute of each chunk. At first, the DASH client will request the MPD file and obtain the content quality information. Next, the client will request chunks according to the MPD file.
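As a numerical illustration of the channel-rate expression in (1), the following minimal Python sketch evaluates the rate of one user on one subchannel. The function names and the sample values are illustrative assumptions and do not come from the paper; the BER target is assumed to be below 0.2 so that the constant β stays positive.

```python
import math


def beta_from_ber(ber: float) -> float:
    """Constant beta = 1.5 / (-ln(5 * BER)); assumes 0 < BER < 0.2."""
    return 1.5 / (-math.log(5.0 * ber))


def channel_rate(p: float, h: complex, bandwidth: float,
                 n_subchannels: int, noise_var: float, ber: float) -> float:
    """Feasible rate (bit/s) of one user on one subchannel, following (1).

    p: transmit power, h: channel gain, bandwidth: total bandwidth B,
    n_subchannels: N, noise_var: sigma^2, ber: targeted bit error rate.
    """
    effective_snr = beta_from_ber(ber) * p * abs(h) ** 2 / noise_var
    return bandwidth / n_subchannels * math.log2(1.0 + effective_snr)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the formula.
    print(channel_rate(p=1.0, h=1.0, bandwidth=1e6,
                       n_subchannels=10, noise_var=1.0, ber=1e-4))
```

The rate is zero at zero power and grows logarithmically with the transmit power, which is the concavity that the later power-allocation step relies on.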
During the transmission, the DASH proxy could obtain the MPD file (e.g., by the deep packet inspection method [34]). During the playing process, the $k$th DASH client gets the currently played segment index $i_{k,p}$. When it requests a new chunk, it could insert the requested chunk index (including the segment index $i_{k,r}$ and the layer index $j_{k,r}$) and the currently played segment index $i_{k,p}$ into the HTTP request header. The HTTP request will be detected by the DASH proxy, and the requested chunk index may be modified based on the network condition and other criteria. By checking the previously received MPD files, the DASH proxy could get the content quality information. The DASH proxy will transfer the following information to the DASH information module: 1) the requested chunk index or the modified requested chunk index; 2) the currently played segment index $i_{k,p}$; and 3) the content quality information. The information will then be transferred to our DFSRA scheduler. The DASH requests or the modified requests are transferred to the media server. The corresponding data will be transmitted from the media server over a backbone network to the corresponding user's queue in the BS. In our framework,


the first two pieces of information can be used to calculate the requested chunk's location in the client's buffer, which is used as the buffer state information (BSI). The method of calculating the weight and BSI is discussed in Section IV-A2. Meanwhile, the channel state information (CSI) of each mobile station (MS) is submitted to the BS. Based on the CSI and video information, our proposed DFSRA is performed to allocate wireless resources (subchannels and power). Each MS will then receive the video data and insert them into its DASH buffer before decoding.

III. MAPPING FROM SVC LAYERS TO DASH LAYERS

The SVC bitstream consists of a base layer and several enhancement layers. The base layer provides a lower resolution or quality. As more enhancement layers are received, the decoded video resolution or quality is improved. The SVC extension of the H.264/AVC standard supports temporal scalability, spatial scalability, and quality scalability [11]. Temporal scalability can provide videos with different temporal resolutions or frame rates. Spatial scalability represents the video in different spatial resolutions or sizes. Quality scalability refers to the capability of representing the same picture in different PSNR or quality levels.

The quality scalability in H.264 SVC includes coarse-grain scalability (CGS) and medium-grain scalability (MGS) [35], [36]. The CGS employs inter-layer prediction and encodes the prediction residual into a new enhancement layer. In MGS, the quantized coefficients of each block can be grouped into up to 16 MGS layers. This is more flexible than CGS. Therefore, in this paper, we consider H.264 SVC with MGS-based SNR scalability. Each MGS layer can be specified by its temporal level $t$ and its quality level $q$, where $t = 0, 1, 2, \ldots, T$ and $q = 0, 1, 2, \ldots, Q$.

Given an SVC bitstream and a target bitrate, how to extract the various layers from the bitstream to achieve the best rate–distortion (R–D) performance is not trivial [35], [36]. The optimal result can be found by measuring the R–D contribution (the ratio of the PSNR improvement to the bits) of each MGS layer $(t, q)$ and selecting the layer with the highest R–D contribution. However, the complexity of this method is relatively high. It is shown in [35] and [36] that an MGS-temporal-layer-based extraction method can achieve near-optimal performance with much lower complexity. It extracts the bits from the most important temporal layer to the least important temporal layer. Within the same temporal layer, it extracts from the most important MGS layer to the least important one.

In this paper, the significance of each MGS layer is represented by its priority index $S_l$. $S_l = 0$ includes the base layers of all frames, which have the highest priority. For any other MGS layer $(t', q')$ with $q' > 0$, its priority index is given by $S_l = t' \cdot Q + q'$. An example is given in Fig. 2(a), where each frame is encoded into one base layer and 2 MGS layers. The group of picture (GoP) size is 16, so there are 5 temporal layers, and the priority index ranges from 0 (highest) to 10 (lowest). If the bandwidth decreases, the MGS layers will be dropped from the lowest priority to the highest.

The priority index defined above allows us to extract the desired bitstream segments from the SVC output according

Fig. 2. (a) Original SVC structure. (b) MGS-temporal layer configuration approach. (c) Frame level adjustment.

to the MGS-temporal layer method. However, this is still not enough to be applied in DASH. This is because most SVC video streams are generated by fixing various quantization step sizes and the numbers of coefficients of each block that are included in different MGS layers, i.e., the MGS weight configuration. However, the sizes of different MGS layers could vary dramatically, depending on the video contents. Therefore, to fully utilize the bitrate adaptation capability of DASH, it is necessary to decide how to partition or map the extracted bitstream into DASH layers, or how to determine the suitable operating points (OP) of the DASH transmission on the R–D curves. This problem was not addressed in [17].

In this paper, we design the DASH layers such that they have some desired properties, such as more uniformly distributed bitrates, or uniformly distributed video quality (PSNR). An example is given in Fig. 2(b), where the base layers of all frames form the first DASH layer, and all MGS layers are grouped into four additional DASH layers, leading to five possible DASH bitrates for the users to select. To get more accurate rate control in the DASH layers, it is possible to use frame-level mapping, as shown in Fig. 2(c), where the frames in the same temporal layer and MGS layer could be allocated to different DASH layers.

In addition to creating desired bitrate distributions among different DASH layers, grouping small MGS layers together can also improve the DASH throughput and reduce the HTTP overhead, where the throughput is defined as the average bitrate of DASH streaming that the client can receive successfully. Otherwise, if the sizes of many DASH chunks are less than the TCP sender window size $W_{snd} = \min\{rwnd, cwnd\}$,


the TCP throughput will be reduced, where cwnd is the TCP congestion window size, and rwnd is the TCP receiver advertised window size. This can be proved as follows.

Lemma 1: When merging some small chunks into a big chunk, the HTTP message overhead $\Theta'$ for transmitting the merged chunk $c'$ is lower than the overhead $\Theta$ for transmitting the original chunks separately. Similarly, the average end-to-end throughput $Th'$ for transmitting the merged chunk $c'$ is higher than the throughput $Th$ for transmitting the original chunks separately.

Proof: Assume that the sizes of chunks $c_i$ and $c'$ are $l_{c_i}$ and $l_{c'}$, respectively. The sizes of the HTTP request and HTTP response headers for chunk $c_i$ are $l_{rq_i}$ and $l_{rp_i}$. The total HTTP message overhead for transmitting the original chunks separately is $\Theta = \sum_{i=1}^{C} (l_{rp_i} + l_{rq_i})$, but the HTTP message overhead for the merged chunk is only $\Theta' = l_{rp'} + l_{rq'}$. Because $l_{rp_i}$ and $l_{rp'}$ have similar sizes, and $l_{rq_i}$ and $l_{rq'}$ are similar too, the HTTP message overhead is reduced by

$$ \sum_{i=1}^{C} (l_{rp_i} + l_{rq_i}) - (l_{rp'} + l_{rq'}). \tag{2} $$

As the DASH chunks are transmitted using the TCP traffic model, the chunks are transmitted over some server–client interactive rounds. We focus on modeling the steady-state TCP congestion avoidance behavior in terms of rounds as in [37]. For simplicity, we assume $b$ is the number of packets that are acknowledged by an ACK, and all the TCP packets have the same size, which is limited by the maximum transmission unit. The server's TCP sending window size increases with a slope of $1/b$ packets per round-trip time (RTT), which can be expressed as $w_\tau = w_{\tau-1} + 1/b$, where $\tau$ is the index of the round.

We first consider the scenario of transmitting the original chunks separately. The transmission of the $i$th chunk is completed in round $t_i$, which satisfies

$$ \sum_{\tau=t_{i-1}+1}^{t_i-1} w_\tau \le l_{c_i} \le \sum_{\tau=t_{i-1}+1}^{t_i} w_\tau. \tag{3} $$

Moreover, it should wait a half RTT for each HTTP request. Thus the total time to download all the small chunks is $T_d = \mathrm{RTT}(t_C + 1/2)$.

Next, we consider the scenario of transmitting the merged chunk. Let $t'_C$ be the total number of RTTs to download the merged chunk. The size of the data transmitted until round $t'_C$ is $\sum_{\tau=1}^{t'_C} w_\tau$. If $t'_C > t_C$, it can be observed from (3) that

$$ \sum_{\tau=1}^{t'_C} w_\tau = \sum_{\tau=1}^{t_C} w_\tau + \sum_{\tau=t_C+1}^{t'_C} w_\tau = \sum_{i=1}^{C} \sum_{\tau=t_{i-1}+1}^{t_i} w_\tau + \sum_{\tau=t_C+1}^{t'_C} w_\tau \ge \sum_{i=1}^{C} l_{c_i} + \sum_{\tau=t_C+1}^{t'_C} w_\tau > \sum_{i=1}^{C} l_{c_i}. \tag{4} $$

However, the total size of the merged chunk is only $\sum_{i=1}^{C} l_{c_i}$, which contradicts (4). Thus, it can be deduced that $t'_C \le t_C$. It also waits a half RTT for the HTTP request. The total time to download the merged chunk is thus $T'_d = \mathrm{RTT}(t'_C + 1/2) < T_d$. The average throughput ($Th' = \sum_{i=1}^{C} l_{c_i} / T'_d$) for transmitting the merged chunk $c'$ is therefore higher than the value ($Th = \sum_{i=1}^{C} l_{c_i} / T_d$) for transmitting the original chunks separately.

IV. RESOURCE ALLOCATION AND BITRATE STABILIZATION ALGORITHMS

In this section, we describe the proposed resource allocation and bitrate stabilization algorithms.

A. DFSRA Algorithm

1) Problem Formulation: Our objective in this part is to design a scheduling and resource allocation algorithm that optimizes the long-term viewing experience for DASH users under dynamic wireless channel conditions. The QoE metrics that we are interested in achieving are: 1) playback continuity, i.e., avoiding playback interruption due to buffer underflow and 2) efficiency, i.e., improving the decoded video quality of all users. Moreover, we aim to improve the fairness of the resource allocation among all users. In each time slot, the objective is to allocate the channel and power that maximize the following sum of weighted rates:

$$ \sum_{k=1}^{K} \omega_k R_k \tag{5} $$

where $\omega_k$ is a dynamically changed priority weight assigned to user $k$ at time $t$. It is determined by the importance of the chunk that the user is requesting. The method of calculating $\omega_k$ is detailed in Section IV-A2. User $k$'s feasible channel rate could be calculated as $R_k = \sum_{n=1}^{N} \mu_{k,n} r_{k,n}$, where $\mu_{k,n} = 1$ if subchannel $n$ is allocated to user $k$ and $\mu_{k,n} = 0$ otherwise. Thus, the following conditions on $\mu_{k,n}$ ensure that each subchannel can be allocated to only one user:

$$ \mu_{k,n} \in \{0, 1\}, \qquad \sum_{k=1}^{K} \mu_{k,n} \le 1 \quad \forall n. \tag{6} $$

Moreover, the total power allocation should not exceed the maximum transmitting power of the BS, that is

$$ \sum_{k=1}^{K} \sum_{n=1}^{N} p_{k,n} \le P_{\max}. \tag{7} $$

Therefore, the optimization problem is formulated as

$$ \max_{\mu, p} \ \sum_{k=1}^{K} \omega_k R_k \qquad \text{s.t. (6) and (7)}. \tag{8} $$
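To make the formulation in (5)-(8) concrete, the following minimal Python sketch evaluates the weighted sum rate for a candidate allocation and checks the constraints (6) and (7). It is only an illustration under stated assumptions: the function name, the list-of-lists data layout, and the sample numbers are hypothetical, and the per-subchannel rates are taken as already computed for the given powers.

```python
def weighted_sum_rate(weights, rates, assign, power, p_max):
    """Evaluate objective (5) for a candidate allocation.

    weights[k]   : priority weight omega_k of user k
    rates[k][n]  : rate r_{k,n} of user k on subchannel n (precomputed)
    assign[k][n] : binary indicator mu_{k,n}
    power[k][n]  : transmit power p_{k,n}
    Raises ValueError if constraint (6) or (7) is violated.
    """
    n_users, n_sub = len(weights), len(rates[0])
    for n in range(n_sub):
        column = [assign[k][n] for k in range(n_users)]
        if any(mu not in (0, 1) for mu in column) or sum(column) > 1:
            raise ValueError(f"subchannel {n} violates constraint (6)")
    total_power = sum(power[k][n] for k in range(n_users) for n in range(n_sub))
    if total_power > p_max + 1e-12:
        raise ValueError("total power violates constraint (7)")
    return sum(weights[k] * assign[k][n] * rates[k][n]
               for k in range(n_users) for n in range(n_sub))


if __name__ == "__main__":
    # Two users, two subchannels; the values are arbitrary.
    print(weighted_sum_rate(weights=[2.0, 1.0],
                            rates=[[1.0, 3.0], [4.0, 2.0]],
                            assign=[[0, 1], [1, 0]],
                            power=[[0.0, 0.5], [0.5, 0.0]],
                            p_max=1.0))
```

Enumerating all assignments with such a function quickly becomes infeasible as the number of users and subchannels grows, which is one way to see why a decomposition approach is attractive.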

2) Priority Weights Calculation: The objective function in (5) is a special case of the utility-based scheduling problem [38], [39]. A utility function quantifies the benefit or satisfaction of the usage of certain resources, and utility-based optimization aims to maximize the utility. Utility-based resource allocation algorithms can be divided into two classes: rate-based utility optimizations for BE traffic (such as the maximal system capacity (MSC) [40] and proportional fairness (PF) rules [41]), and delay-based utility optimizations for delay-sensitive traffic (such as the max-delay-utility rule [38]). Since video traffic is sensitive to delay, our scheduling and resource allocation use delay-based utility optimization. According to [38] and [39], we should maximize the projection of the achievable rate vector onto the gradient of a utility function. Therefore, $\omega_k$ should have the form of [38], [39]

$$ \omega_k = \frac{|\Phi'(W_k)|}{\lambda_k} \tag{9} $$

where for user $k$, $W_k$ is the average waiting time, $\Phi(W_k)$ is the utility function, $\Phi'(W_k)$ is the derivative of $\Phi(W_k)$, and $\lambda_k$ is the average arrival bitrate. In our system, the quality of the decoded video is used as the utility $\Phi(W_k)$. $\Phi'(W_k)$ is thus related to the quality contribution $C_{(\tilde{i}_k, \tilde{j}_k)}$ of the transmitted chunk $(\tilde{i}_k, \tilde{j}_k)$ for user $k$, which can be calculated as follows, similar to that in [3]:

$$ C_{(\tilde{i}_k, \tilde{j}_k)} = D_{(\tilde{i}_k, \tilde{j}_k)} - D_{(\tilde{i}_k, J)} \tag{10} $$

where $D_{(\tilde{i}, \tilde{j})}$ is the distortion of video segment $\tilde{i}$ when up to $\tilde{j}$ layers are received. The enhancement layer chunks will not be transmitted if their deadlines have passed. In the calculation of the base layer chunk's quality contribution $D_{(\tilde{i}, 0)}$, the last decodable frame is used to substitute the desired frame in the requested segment to conceal the error. From (10), we could see that the quality contribution index of the lower layer chunk is larger for the same segment. We calculate the speed of distortion reduction of the requested chunk before its deadline as $C_{(\tilde{i}_k, \tilde{j}_k)} / \Delta_k$, where $\Delta_k$ is the duration before the chunk's playback deadline, which can be calculated as

$$ \Delta_k = T_{seg} (\tilde{i}_{k,r} - \tilde{i}_{k,p}) \tag{11} $$

where $T_{seg}$ is the duration of a segment. Since we assume that the backbone link has high bandwidth and low latency, we let $\lambda_k = L_k$, where $L_k$ is the length (in bits) of the chunk that the client is requesting. Thus, the priority weight for user $k$ is formulated as

$$ \omega_k = \frac{C_{(\tilde{i}_k, \tilde{j}_k)}}{\Delta_k L_k}. \tag{12} $$

By this definition, if the deadline of the chunk requested by user $k$ is approaching, $\Delta_k$ becomes smaller, and the priority weight will increase. For users with the same weight, the user with the better channel rate is favored by (5). If a user's priority is higher than others, the user will be allocated more wireless resources. After receiving the current chunk, the user will request a higher layer chunk or the same layer chunk in a future segment. However, the priority of the new request tends to decrease by (12), leading to a fair allocation scheme for all users.

3) Proposed DFSRA Algorithm: Our goal is to find the optimal binary variables $\mu_{k,n}$ and the optimal power control parameters $p_{k,n}$. Thus, (8) is a mixed binary integer programming problem. According to [42], it can be solved via convex optimization techniques. In this paper, we use the low-complexity Lagrangian dual decomposition method to efficiently solve the optimization problem. The Lagrangian function of the problem is defined as

$$ L(\mu, p, \theta) = \sum_{k=1}^{K} \sum_{n=1}^{N} \omega_k \mu_{k,n} r_{k,n}(p_{k,n}) + \theta \left( P_{\max} - \sum_{k=1}^{K} \sum_{n=1}^{N} p_{k,n} \right) = \sum_{k=1}^{K} \sum_{n=1}^{N} \left[ \omega_k \mu_{k,n} r_{k,n}(p_{k,n}) - \theta p_{k,n} \right] + \theta P_{\max} \tag{13} $$

where $\theta$ is the Lagrangian multiplier associated with the constraint of (7). The Lagrangian dual function is

$$ g(\theta) = \max_{\mu, p} \ L(\mu, p, \theta) \qquad \text{s.t.} \quad \mu_{k,n} \in \{0, 1\}, \quad 0 \le p_{k,n} \le P_{\max}, \quad \sum_{k=1}^{K} \mu_{k,n} \le 1 \quad \forall n. \tag{14} $$

Given a $\theta \ge 0$, the optimal value of the problem above gives an upper bound on the objective function in (8). Thus, the tightest upper bound can be found by searching for the best $\theta$ in the following dual problem:

$$\min_{\theta \ge 0}\ g(\theta). \tag{15}$$
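To see why $g(\theta)$ bounds the objective of (8) from above, the following short derivation may help. It is only a sketch, and it assumes that the total power constraint of (8) has the form $\sum_k \sum_n p_{k,n} \le P_{\max}$, as the Lagrangian in (13) suggests.

```latex
% For any (\mu, p) that is feasible for (8) and any \theta \ge 0,
% the term \theta (P_{\max} - \sum_k \sum_n p_{k,n}) is nonnegative, so
\begin{align*}
\sum_{k=1}^{K}\sum_{n=1}^{N} \omega_k \mu_{k,n} r_{k,n}(p_{k,n})
  &\le \sum_{k=1}^{K}\sum_{n=1}^{N} \omega_k \mu_{k,n} r_{k,n}(p_{k,n})
     + \theta\Big(P_{\max} - \sum_{k=1}^{K}\sum_{n=1}^{N} p_{k,n}\Big) \\
  &= L(\mu, p, \theta)
   \;\le\; \max_{\mu',\, p'} L(\mu', p', \theta) \;=\; g(\theta).
\end{align*}
% Taking the maximum over feasible (\mu, p) on the left and the minimum
% over \theta \ge 0 on the right gives the bound that (15) tightens.
```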

When the time sharing condition is satisfied, the duality gap tends to zero as $N$ goes to infinity [42]. Since our problem satisfies the time sharing condition (the number of subchannels is sufficiently large), the duality gap for (15) is negligible. Hence, $g(\theta)$ can be decomposed into $N$ subproblems, each of which finds the optimal user for one subchannel. This allows us to find the optimal $\mu^*_{k,n}$ and $p^*_{k,n}$ at the given $\theta$ for each subchannel $\hat{n}$ by solving the subproblem

$$\max_{\mu,\, p}\ L(\mu, p) = \sum_{k=1}^{K} \big[\omega_k \mu_{k,\hat{n}} r_{k,\hat{n}}(p_{k,\hat{n}}) - \theta p_{k,\hat{n}}\big] \quad \text{s.t. } \mu_{k,\hat{n}} \in \{0, 1\},\ \ 0 \le p_{k,\hat{n}} \le P_{\max},\ \ \sum_{k=1}^{K} \mu_{k,\hat{n}} \le 1. \tag{16}$$

Given an $\hat{n}$, there is only one $\mu_{m,\hat{n}} = 1$. Thus, the solution of (16) can be written as

$$\mu_{k,\hat{n}} = \begin{cases} 1, & \text{if } k = \arg\max_{m} A_{\hat{n}}(m) \\ 0, & \text{otherwise} \end{cases} \tag{17}$$

where

$$A_{\hat{n}}(m) = \max_{p_{m,\hat{n}}} \big(\omega_m r_{m,\hat{n}}(p_{m,\hat{n}}) - \theta p_{m,\hat{n}}\big) \quad \text{s.t. } 0 \le p_{m,\hat{n}} \le P_{\max}. \tag{18}$$
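The per-subchannel decision in (17) and (18) can be illustrated with a small sketch. It evaluates $A_{\hat{n}}(m)$ by a plain grid search over the power instead of a closed form, so it works for any rate function. The function name `subchannel_winner`, the grid resolution, and the log-shaped toy rate functions are assumptions made for illustration and are not taken from the paper.

```python
import math

def subchannel_winner(weights, rate_fns, theta, p_max, steps=200):
    """Pick the user for one subchannel, following Eqs. (17)-(18).

    weights[m]  : priority weight omega_m of user m
    rate_fns[m] : callable r(p) giving user m's rate on this subchannel
    Returns (user, power, score) with score = omega_m * r(p) - theta * p.
    """
    best = (None, 0.0, -math.inf)
    for m, (w, r) in enumerate(zip(weights, rate_fns)):
        # Grid search over 0 <= p <= p_max approximates A(m) in Eq. (18).
        for s in range(steps + 1):
            p = p_max * s / steps
            score = w * r(p) - theta * p
            if score > best[2]:
                best = (m, p, score)
    return best

# Toy example: user 0 has the better channel on this subchannel.
rate_fns = [lambda p: math.log2(1 + 4.0 * p), lambda p: math.log2(1 + 1.0 * p)]
equal = subchannel_winner([1.0, 1.0], rate_fns, theta=0.5, p_max=1.0)
urgent = subchannel_winner([1.0, 5.0], rate_fns, theta=0.5, p_max=1.0)
print(equal[0], urgent[0])
```

With equal weights the user with the better channel wins the subchannel. Raising the weight of the second user, for example because its chunk deadline is close, hands the subchannel to that user instead.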

To solve the problem in (18), note that the channel rate in (1) is a concave function of the transmitting power. Substituting (1) into (18), and setting the derivative with respect to $p_{k,\hat{n}}$ to 0, the optimal power allocation can be found to be

$$p^*_{k,\hat{n}} = \begin{cases} \left[\dfrac{B\omega_k}{N\theta \ln 2} + \dfrac{\ln(5\,\mathrm{BER})\,\sigma^2}{1.5\,|h_{k,\hat{n}}|^2}\right]^+, & p_{k,\hat{n}} < P_{\max} \\ P_{\max}, & p_{k,\hat{n}} \ge P_{\max} \end{cases} \tag{19}$$

where $[\cdot]^+ = \max(\cdot, 0)$. Next, the subgradient method is used to solve the dual problem in (15) [43]. Subgradient methods are iterative methods for solving convex minimization problems and have been used in developing distributed cross-layer resource allocation mechanisms [44]. The updating rule for the subgradient $\Delta\theta$ is

$$\Delta\theta = P_{\max} - \sum_{\hat{n}=1}^{N} p^*_{k,\hat{n}}. \tag{20}$$

In the $l$th iteration, the dual variable $\theta^l$ is updated by

$$\theta^l = \big[\theta^{l-1} - \xi^l \Delta\theta^{l-1}\big]^+ \tag{21}$$

where $\xi^l$ is the step size, which can be chosen according to the diminishing step policy [43]. The iterative procedure adjusts $\xi^l$ and $\theta^l$ until the subgradient $\Delta\theta$ converges to 0.

For each video chunk, the quality contribution can be precalculated before streaming, and the complexity of calculating the priority weight is negligible. The next step is to solve the resource allocation problem by the Lagrangian dual decomposition method with a complexity of $O(KN)$. Thus, the overall worst case complexity of the DFSRA algorithm is $O(KN)$. The entire DFSRA algorithm is summarized in Algorithm 1.

B. DPBS Algorithm

The previous method does not consider an important issue, the bitrate stability: frequent bitrate switches can be annoying to users [24], [45]. In this part, we propose a DASH proxy-based bitrate stabilization (DPBS) method to enhance the bitrate stability of each user, where the DASH requests from each user are examined by the DASH proxy, which will modify a request if it could lead to an unstable bitrate (too frequent bitrate switches). Efficiency and stability can be two conflicting objectives, because higher efficiency requires a higher bitrate, which could hurt the stability. Thus, the DPBS method needs to make a tradeoff between them, and recommends a suitable reference bitrate level for each segment. We define a penalty function $PS_k$ for bitrate switching, and a penalty function $PB_k$ for over-utilization and under-utilization of the bandwidth.

After receiving a new request for chunk $(\tilde{i}, \tilde{j})$, the proxy calculates the penalty function $PS_k(j)$ for each level $j$ to decide the reference level for segment $\tilde{i}$ as follows. The DASH proxy records the highest levels for several previously requested segments by user $k$ and finds the number of bitrate switches $LS_k$ for several previous segments. Let $j'$ represent the highest level the user received for the previous segment $\tilde{i} - 1$. We define the penalty function $PS_k(j)$ as

$$PS_k(j) = LS_k\,|j - j'|. \tag{22}$$

Algorithm 1 DFSRA Algorithm

Input: The chunk index (including the segment index $i_{k,r}$ and the layer index $j_{k,r}$) that each client $k \in K$ is requesting; the segment index $i_{k,p}$ that each client $k \in K$ is playing; the content quality information $C(\tilde{i}_k, \tilde{j}_k)$ of the chunk that each client $k \in K$ is requesting.
Output: The optimal subchannel allocation strategy $\mu^*_{k,n}$ and the optimal power allocation strategy $p^*_{k,n}$ for all clients $k \in K$ and all subchannels $n \in N$.

initialization: $l \leftarrow 0$, $\theta^l > 0$.
repeat
    $l \leftarrow l + 1$;
    forall users $k \in K$ do
        if the deadline of the requested chunk has passed and the level is not 0 then
            Notify the client to request next chunk.
        end
        Calculate weight $\omega_k$ according to Eq. (12).
    end
    Solve the following scheduling and resource allocation problem.
    forall sub-channels $n \in N$ do
        forall users $k \in K$ do
            Get $p^*_{k,n}$ with Eq. (19) and the current $\theta^l$.
            Get $\mu^*_{k,n}$ with Eq. (17), (18) and the current $\theta^l$.
        end
    end
    Update $\Delta\theta^l$ and $\theta^l$ according to Eq. (20) and Eq. (21).
until $\Delta\theta \approx 0$;
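The loop in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that the rate model in (1) has the form $r(p) = (B/N)\log_2(1 + cp)$ with $c = 1.5|h|^2 / (-\ln(5\,\mathrm{BER})\,\sigma^2)$, which is the form consistent with (19). The function names, the step size $\xi^l = \text{step0}/l$, the small floor on $\theta$, and the toy numbers are all assumptions.

```python
import math

def priority_weight(contribution, time_to_deadline, chunk_bits):
    """Eq. (12): quality contribution per second of slack and per bit."""
    return contribution / (time_to_deadline * chunk_bits)

def snr_factor(gain, ber, noise):
    # Assumed rate model for Eq. (1): r(p) = bw_sub * log2(1 + c * p).
    return 1.5 * gain / (-math.log(5 * ber) * noise)

def optimal_power(w, gain, theta, p_max, bw_sub, ber, noise):
    """Eq. (19): stationary point of w*r(p) - theta*p, clipped to [0, p_max]."""
    p = bw_sub * w / (theta * math.log(2)) - 1.0 / snr_factor(gain, ber, noise)
    return min(max(p, 0.0), p_max)

def dfsra(weights, gains, p_max, bw_sub, ber, noise,
          theta=1.0, step0=1.0, max_iter=2000, tol=1e-3):
    """Dual subgradient loop; gains[k][n] is |h_{k,n}|^2, bw_sub is B/N."""
    n_sub = len(gains[0])
    alloc = []
    for l in range(1, max_iter + 1):
        alloc = []
        for n in range(n_sub):
            best_user, best_p, best_score = None, 0.0, 0.0
            for k, w in enumerate(weights):
                p = optimal_power(w, gains[k][n], theta, p_max, bw_sub, ber, noise)
                c = snr_factor(gains[k][n], ber, noise)
                score = w * bw_sub * math.log2(1 + c * p) - theta * p  # Eq. (18)
                if score > best_score:                                  # Eq. (17)
                    best_user, best_p, best_score = k, p, score
            alloc.append((best_user, best_p))
        delta = p_max - sum(p for _, p in alloc)            # Eq. (20)
        if abs(delta) <= tol * p_max:
            break
        # Eq. (21) with a diminishing step; the floor keeps theta positive.
        theta = max(theta - (step0 / l) * delta, 1e-9)
    return alloc, theta

# Toy run: two users, two subchannels, each user strong on one subchannel.
w = priority_weight(contribution=2.0, time_to_deadline=2.0, chunk_bits=1.0)
alloc, theta = dfsra([w, w], [[10.0, 0.5], [0.5, 10.0]], p_max=1.0,
                     bw_sub=1.0, ber=1e-3, noise=1.0)
print(alloc, theta)
```

In this toy setting each user keeps the subchannel on which its channel is strong, and $\theta$ settles where the total allocated power meets $P_{\max}$. When two users score almost equally on a subchannel, the allocation can flip between iterations, so a practical implementation would also need a stopping rule for that case.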

For each segment $\tilde{i}$, the DPBS method needs to calculate the penalty function $PS_k(j)$ many times, and the value could be different each time because the client could upgrade the bitrate levels of the previous segments. $PS_k(j) = 0$ when $j = j'$. In this case, the bitrate has the highest stability, no matter how the previous segments are upgraded in the future. When $j \ne j'$, the bitrate will have some variations. The penalty function is further amplified by $LS_k$ to discourage frequent switches within a short period of time.

The DASH proxy could measure the available bandwidth as in [7], but that approach needs to adjust the TCP packet sending rate many times. In the DPBS method, the bandwidth is only used to control the tradeoff between stability and efficiency. Thus, we use a simple estimation method. The proxy first obtains the chunk download time $T_{CD}$. The bandwidth is then estimated as $BW_C = L/T_{CD}$. The final weighted estimate of the bandwidth is $BW = BW \cdot \delta + BW_C \cdot (1 - \delta)$ for a constant $\delta$. Let $r_j$ be the bitrate of the chunk up to level $j$. The penalty function $PB_k(j)$ for layer $j$ of the chunk is defined as

$$PB_k(j) = \frac{|BW - r_j|}{\min(BW, r_j)}. \tag{23}$$

The minimum value of $PB_k(j)$ is 0 if $BW = r_j$, because this makes the best use of the bandwidth. The penalty $PB_k(j)$ will be $BW/r_j - 1$ if $BW > r_j$, where the available bandwidth is underutilized, and the penalty increases with the available bandwidth. Similarly, $PB_k(j) = r_j/BW - 1$ if $BW < r_j$, where the available bandwidth is overutilized, and the penalty decreases as the available bandwidth increases.
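The bandwidth estimate and the penalty in (23) are simple enough to sketch directly. The function names and the default smoothing constant `delta=0.8` below are assumptions; the text only states that $\delta$ is a constant.

```python
def update_bandwidth(bw_prev, chunk_bits, download_time_s, delta=0.8):
    """Weighted bandwidth estimate BW = BW*delta + BW_C*(1 - delta)."""
    bw_chunk = chunk_bits / download_time_s  # BW_C = L / T_CD
    return bw_prev * delta + bw_chunk * (1 - delta)

def bandwidth_penalty(bw, rate):
    """Eq. (23): relative mismatch between bandwidth and layer bitrate."""
    return abs(bw - rate) / min(bw, rate)

bw = update_bandwidth(1000.0, chunk_bits=500.0, download_time_s=1.0, delta=0.5)
under = bandwidth_penalty(1000.0, 500.0)   # bandwidth underutilized
over = bandwidth_penalty(500.0, 1000.0)    # bandwidth overutilized
exact = bandwidth_penalty(800.0, 800.0)    # perfect match
print(bw, under, over, exact)
```

Dividing by the smaller of the two quantities makes the penalty symmetric: a bitrate at half the bandwidth and a bitrate at twice the bandwidth are penalized equally.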


Algorithm 2 DPBS Algorithm

Input: The chunk index (including the segment index $\tilde{i}_k$ and the layer index $\tilde{j}_k$) that each client $k \in K$ is requesting; the estimated available bandwidth $BW_k$ for the client $k \in K$.
Output: The chunk index (including the segment index $\bar{i}_k$ and the layer index $\bar{j}_k$) that is modified by the DASH proxy.

initialization: isStable = FALSE, $\bar{i}_k = \tilde{i}_k$, and $\bar{j}_k = \tilde{j}_k$.
while (!isStable) do
    forall level $j \in (1, 2, \ldots, J)$ of segment $\tilde{i}_k$ do
        Calculate the penalty function $PS_k(j)$ according to Eq. (22).
        Calculate the penalty function $PB_k(j)$ according to Eq. (23).
        Calculate the total penalty $(PB_k(j) + \pi PS_k(j))$.
    end
    Determine the reference level $j^*$ according to Eq. (24)
    if ($\bar{j}_k \le j^*$) then
        isStable = TRUE.
    end
    if ($\bar{j}_k > j^*$) then
        if ($\bar{i}_k + 1 > I$) then
            Discard the request and notify the client to end the service;
            Don't need to output $\bar{i}_k$, $\bar{j}_k$;
            isStable = TRUE.
        end
        else if ($\bar{i}_k + 1 \le I$) then
            $\bar{i}_k = \bar{i}_k + 1$;
            $\bar{j}_k = j_{LU}(\bar{i}_k)$, which represents the lowest unrequested chunk layer index in segment $\bar{i}_k$
        end
    end
end

Once these two penalty functions are known, the optimal reference level $j^*$ for segment $\tilde{i}$ is found by solving the following optimization problem:

$$j^* = \arg\min_{j \in (1, 2, \ldots, J)} PB_k(j) + \pi PS_k(j) \tag{24}$$

where the parameter $\pi$ controls the tradeoff between stability and efficiency. When $\pi$ is small, the algorithm favors bandwidth utilization and leads to higher playback quality. When $\pi$ is large, it focuses more on reducing bitrate switches and yields higher playback stability. As discussed before, using SVC, DASH can efficiently upgrade the bitrate level for each segment many times. If the layer index $\tilde{j}$ of the currently requested chunk does not exceed the recommended reference level $j^*$, the chunk request is considered reasonable. Otherwise, the request will be modified. In this case, instead of sending more layers of the requested segment, we send the lowest unrequested chunk from a future segment. The DASH proxy will modify the HTTP request to get the modified chunks. At the end of the video, if all the lowest unrequested layers of future segments are greater than their corresponding reference layers, it is not necessary to transmit more chunks. The DPBS algorithm will discard the request and notify the client to end the service. The entire DPBS algorithm is summarized in Algorithm 2.
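The reference level in (24) and the request rewriting of Algorithm 2 can be sketched as follows. The function names and the callables `ref_level_of` and `lowest_unrequested` are hypothetical stand-ins for state that the proxy would keep per user, and the bitrates in the example are made-up numbers.

```python
def reference_level(rates, bw, prev_level, n_switches, pi=0.5):
    """Eq. (24): level j in 1..J minimizing PB(j) + pi * PS(j).

    rates[j-1] is the cumulative bitrate of the chunk up to level j.
    """
    def total_penalty(j):
        r = rates[j - 1]
        pb = abs(bw - r) / min(bw, r)           # Eq. (23)
        ps = n_switches * abs(j - prev_level)   # Eq. (22)
        return pb + pi * ps
    return min(range(1, len(rates) + 1), key=total_penalty)

def dpbs(seg, layer, n_segments, ref_level_of, lowest_unrequested):
    """Request rewriting loop of Algorithm 2.

    Returns the (segment, layer) to fetch, or None if the request is dropped.
    """
    while True:
        if layer <= ref_level_of(seg):
            return seg, layer           # request is considered stable
        if seg + 1 > n_segments:
            return None                 # end of video: discard the request
        seg += 1                        # move on to the next segment
        layer = lowest_unrequested(seg)

rates = [200.0, 400.0, 800.0, 1600.0]
balanced = reference_level(rates, bw=900.0, prev_level=2, n_switches=2, pi=0.5)
stable = reference_level(rates, bw=900.0, prev_level=2, n_switches=2, pi=1.0)
moved = dpbs(1, 3, 3, lambda s: 2, {2: 1, 3: 3}.get)
dropped = dpbs(3, 3, 3, lambda s: 2, {2: 1, 3: 3}.get)
print(balanced, stable, moved, dropped)
```

In this example a larger $\pi$ keeps the reference level at the previously received level, while a smaller $\pi$ lets it follow the bandwidth, which mirrors the tradeoff described above.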

TABLE I SIMULATION PARAMETERS

V. SIMULATION RESULTS

A. Simulation Setup

In this section, we evaluate the performance of the proposed algorithms in the simulator Qualnet [46], and consider a single LTE cell with several DASH clients requesting SVC-coded videos. Two configurations are tested for low-rate CIF video sequences (Cfg. 1) and 720p HD sequences (Cfg. 2), respectively. The LTE parameters are summarized in Table I. The CIF sequences News, Hall, Foreman, Paris, and Coastguard, and the 720p sequences Mobcal, Stockholm, and Bigbuckbunny [47] are encoded using the H.264 SVC reference software JSVM (version 9.19.14) [48]. The macroblock adaptive inter-layer prediction and the CABAC entropy coding scheme are employed. Most CIF sequences and the 720p sequences Mobcal and Stockholm have about 300 frames. We use the first 240 frames and repeat them 10 times. Sequence Bigbuckbunny is a long 720p sequence and only the first 2400 frames are used. The frame rate is 25 frames/s, and the GoP size is 16 pictures. Each DASH segment is chosen to have 3 GoPs, i.e., 48 frames or 1.92 s. The QP values of the base layer and enhancement layers are 40 and 28, respectively. The default MGS weight vector is W = [1, 2, 2, 11], i.e., the enhancement layer is encoded into four MGS layers for each frame and 20 MGS layers in five temporal layers. We use the MGS-temporal layer extraction method to extract the SVC layers. We then map the SVC layers to four DASH layers for CIF sequences and five DASH layers for 720p sequences. The bitrates of the resulting DASH OPs are roughly uniformly distributed from the lowest rate to the highest rate to facilitate DASH bitrate adaptation. The video statistics are summarized in Table II.

All clients use the PMD method in [17]. The buffer of each user is 20 s and the prebuffer's target value is 6 s. In the simulation, all DASH users begin to request services simultaneously. In LTE, the RTT can be as low as 50–60 ms in the radio access network and the CN transmissions [49]. Considering the latency of the Internet, the end-to-end RTT is set to be 100 ms in our simulations. The TCP version we use is TCP SACK since it can effectively recover from frequent losses and has good performance in wireless networks [50], [51].

TABLE II DASH LAYER STATISTICS OF DIFFERENT VIDEOS

Fig. 3. PSNR-rate sample plots and regression model for some (a) CIF video sequences and (b) 720p sequences.

We compare the DFSRA algorithm with the well-known MSC [40], round robin (RR), and PF scheduling rules [41]. We also implement a greedy approach to maximize the total utility, which is similar to [29] and is denoted as the utility-rate greedy (UG) method. El Essaili et al. [29] assume a simple linear mapping between MOS and PSNR and use the utility-rate function. In this paper, we adopt the following PSNR-rate function [52]:

$$U(r) = a_1/(r + a_2) + a_3 \tag{25}$$

which is a concave function of the rate, where $a_1$, $a_2$, and $a_3$ are constants that can be calculated for different video sequences and encoding parameters. Some examples are shown in Fig. 3, which is similar to [29, Fig. 2]. When the proposed DPBS algorithm is turned ON, it works in parallel with DFSRA and is denoted as DFSRA + DPBS or simply D + D in the following.

B. Evaluation Metrics

We use the following QoE metrics to evaluate the transmission performance.

Efficiency: The PSNR is used to evaluate the video quality and streaming efficiency [53].

Playback Stability: We use the following metric to measure stability, which is similar to the metric in [22]:

$$S = 1 - \frac{\sum_{d=0}^{I-1} |j_{i-d} - j_{i-d-1}| \cdot \Psi(d)}{\sum_{d=1}^{I} j_{i-d} \cdot \Psi(d)} \tag{26}$$

where the second term is the instability index, which is the weighted sum of all switch steps in the previous $I = 10$ segments divided by the weighted sum of the highest received levels in the same duration. $j_i$ is the highest received level for segment $i$. The weight function $\Psi(d) = I - d$ gives more penalties to more recent bitrate switches.

Buffer Level: DASH clients should download sufficient data to avoid playback interruption. However, storing more video data than necessary could waste the network bandwidth.

Playback Continuity: We focus on the total number of segments that miss the playback deadlines and the waiting time for each interruption. The interruption ratio (IR) is defined as the total interruption time over the total playback time.

Fairness: We use the PSNR variance of the same video across different users to measure the fairness.

C. Performance of the SVC to DASH Mapping Method

In this part, we show the performance of the proposed SVC to DASH mapping method. We compare our method with the simple method that maps each SVC layer extracted by the MGS-temporal layer method directly to a separate DASH layer, i.e., without grouping SVC layers together. We denote this as the one-to-one mapping method, which can be considered as a special case of the proposed mapping method. In the simulations, all users are 250–350 m away from the BS with random locations. The default PF resource allocation method is used in this part. Fig. 4(a) and (b) shows the cumulative distribution function (CDF) of the DASH chunk sizes for different video sequences. As expected, the chunk size is much larger using our proposed approach. Fig. 4(c) and (d) compares the throughput and overhead of the two methods. The overhead percentage can be reduced by about 50% using our method, although the overhead only accounts for a small portion of all data, especially for HD sequences.
On the other hand, the proposed method leads to a significant throughput gain: at least 50 kb/s for CIF and at least 240 kb/s for HD sequences when the number of users is large, or about 20%. More gains can be achieved when the number of users is small.

Fig. 4. Comparison of DASH layer mapping methods. CDF of the chunk sizes for (a) Cfg. 1, (b) Cfg. 2; average throughput and average HTTP message overhead percentage for (c) Cfg. 1, (d) Cfg. 2.

D. Performance With Still Users

In this scenario, we assume that half of the users, called good users, are close to the BS, with random locations at distances between 100 and 200 m. The rest, called bad users, are 300–500 m away. To compare the fairness, the same video sequence is requested by a good user and a bad user simultaneously. No user moves in this test.

First, we demonstrate the influence of the weighting parameter $\pi$ in (24). Fig. 5 shows the relationship between the minimum stability index of all clients and the average PSNR, as well as the relationship between the minimum stability index and the average buffer level, when $\pi$ is chosen as 0, 0.3, 0.5, and 1, respectively. 12 users in Cfg. 1 and 22 users in Cfg. 2 are tested. It can be seen that higher playback quality can be achieved when $\pi$ is smaller, but the stability is lower. It also shows that a larger $\pi$ leads to a higher buffer level, because DPBS will request chunks from future segments if the current request is considered to be unstable. In the rest of this paper, $\pi$ is chosen as 0.5, which yields a good tradeoff between efficiency and stability.

Fig. 5. Tradeoff between the efficiency and stability for different $\pi$. (a) Relationship between the PSNR and stability. (b) Relationship between the stability and client buffer level.

Fig. 6(a) and (b) shows the average PSNRs of all users with different methods. Fig. 6(c) and (d) shows the average PSNRs of good users and bad users that request the same sequences. DFSRA has the best quality. The next is DFSRA + DPBS, because it trades off efficiency for stability. In UG, many enhancement layer chunks miss the playback deadlines, thus the average PSNR decreases seriously, especially when the number of users is large. The PSNR is the lowest using the RR and MSC methods, because MSC does not consider the video characteristics and only relies on the CSI, and RR allocates wireless resources to all users in turns without considering the video characteristics and the CSI. The average PSNR of good users using MSC is slightly higher than with the PF, UG, and DFSRA methods. However, its PSNRs for bad users are much lower [more than 4 dB in sequences 6 and 10 in Fig. 6(c)]. UG could achieve similar performance to DFSRA in most cases, but many enhancement layer chunks miss their deadlines, thus the average PSNR is lower than that of DFSRA.

Fig. 6. Video quality: average PSNRs with different numbers of users. (a) Cfg. 1. (b) Cfg. 2. (c) Cfg. 1: average PSNRs of individual sequences among good and bad users with 10 users. Sequence: 1. News (good), 2. Coastguard (good), 3. Foreman (good), 4. Hall (good), 5. Paris (good), 6. News (bad), 7. Coastguard (bad), 8. Foreman (bad), 9. Hall (bad), 10. Paris (bad). (d) Cfg. 2: average PSNRs of individual sequences when the number of users is 18. Sequence: 1. Bigbuckbunny (good), 2. Mobcal (good), 3. Stockholm (good), 4. Bigbuckbunny (bad), 5. Mobcal (bad), 6. Stockholm (bad).

Fig. 7 presents the average PSNR difference for the same sequence with different numbers of users, where a good user and a bad user are designed to request the same video. It is clear that DFSRA and DFSRA + DPBS have the best performance. UG performs better than PF and RR, and MSC has the worst performance.

Fig. 7. Average PSNR difference for the same sequence with different numbers of users. (a) Cfg. 1. (b) Cfg. 2.


TABLE III INTERRUPTION RATIO (%)

Fig. 8. Minimum stability with different numbers of users. (a) Cfg. 1. (b) Cfg. 2.

Fig. 9. The average buffer level of different methods. (a) Cfg. 1. (b) Cfg. 2.

Fig. 10. Times of different interruption durations when the number of users is (a) 12 in Cfg. 1 and (b) 24 in Cfg. 2.

Fig. 11. Time-varying SNRs of the three mobile users.

Fig. 12. Performances of the three mobile users. (a) PSNR. (b) Stability index. (c) Playback interruption distribution.

Fig. 8 shows the minimum stability index with different numbers of users and methods. Clearly, the DFSRA + DPBS method has the highest stability, which is almost independent of the number of users. Further examinations show that RR and MSC have good average stability, but their worst case performance is quite poor. Other methods also could not guarantee the stability.

Fig. 9 reports the average buffer level with different numbers of users. It can be seen that for all methods, Cfg. 1 has larger differences than Cfg. 2 between the buffer levels of good users and bad users, especially for MSC and PF. This is because the highest bitrates of some sequences, such as News and Hall, are quite low, and the allocated bandwidth using MSC and PF can be higher than the highest bitrates. In DFSRA, the average buffer remains above 4 s in most cases for both good and bad users. Thus, HTTP data could be downloaded in a just-in-time manner. DFSRA + DPBS has a longer buffer than DFSRA, because it can request future chunks. In Cfg. 1, the average buffer of bad users in MSC and RR is below 1.92 s (the duration of a segment) when the number of users is large, thus playback interruption can easily happen. In Cfg. 2, bad users in MSC also have low buffer levels.

Table III shows the IR of different methods with different numbers of users. UG, DFSRA, and DFSRA + DPBS can effectively avoid buffer underflow, as they can allocate enough wireless resources to bad users. PF is slightly worse, as it does not consider the characteristics of videos. MSC and RR have much worse performance even for small numbers of users, especially MSC. This is because MSC is a greedy method that only relies on the CSI; hence, the wireless resource is mainly allocated to good users when the resource is constrained, leading to serious underflow. The RR method equally allocates wireless resources to all users. Therefore, when the resource is limited as in Cfg. 1, it could not effectively transmit the base layer chunks of some difficult sequences, such as Coastguard and Paris. In Cfg. 2, the system has more resources. The underflow problem is not severe using RR when the number of users is less than 24. Fig. 10 shows the interruption durations of different methods, which follow a pattern similar to Table III.

Fig. 13. Throughputs, bit-rates, and buffer levels of the three mobile users. Each column is for one user. Six methods are tested for each user.


E. Performance With Mobile Users

In this simulation, we set up a BS with 16 active users, including three vehicular users with a speed of 30 km/h, who are requesting three different 720p DASH sequences. Fig. 11 shows the average time-varying SNR of all subchannels of each user. Fig. 12 presents the PSNR, stability index, and distribution of playback interruptions for the three users. DFSRA has the best PSNR, followed by DFSRA + DPBS. In terms of stability, DFSRA + DPBS is the most stable in both the minimum and the average stability. In this test, playback interruptions only happen in MSC, because it only uses the CSI, and the allocated bitrate could be too low to transmit the base layer.

Fig. 13 presents the real-time throughputs, bitrates, and buffer levels of the three users with different methods. MSC only uses the CSI in resource allocation; hence, the average throughput changes frequently and the user could only request the lowest bitrate level when the channel quality is bad. RR allocates wireless resources equally. Its throughput variation is smaller. UG and DFSRA can also consider the R–D property of the video. From Table II, it can be seen that the PSNR gains of Layers 1 and 2 are the largest. Thus, UG and DFSRA will allocate more resources to users who request Layers 1 and 2. Using UG and DFSRA, the throughputs of the Mobcal and Stockholm sequences are improved even when the channel quality is bad. However, DFSRA has better performance than UG. We can also see that DFSRA + DPBS is the most stable among the six methods. Moreover, the buffer is below 1 s using MSC if the channel quality is not good, and interruptions will happen. For other methods, the buffer is usually between 2 and 5 s. Using DFSRA + DPBS, the buffer will be more than 5 s. For example, for the sequence Bigbuckbunny, from 40 s onward DPBS only allows requests for the second layer in future segments to avoid frequent bitrate switches, and the buffer increases up to 10 s. From 60 s, the new requests to upgrade the previous segments will be considered stable. Thus, the client could request the third layer. After that, the buffer will decrease, because the allocated bandwidth is only slightly higher than the third layer's bitrate.

VI. CONCLUSION

In this paper, the optimization of DASH-based transmission of scalable videos over OFDMA systems is studied, and we solve the joint optimization problem by employing the Lagrangian dual decomposition method. By considering the end-to-end distortion, the buffer information at the application layer, and the wireless channel information at the physical layer, the optimal cross-layer resource allocation is determined and the layered feature of SVC is well utilized. We also propose a method to enhance the stability of each DASH stream. Experimental results show that the proposed schemes outperform existing resource allocation methods.

There are several issues that deserve further investigation. First, a quality selection policy should be developed for DASH transmission of scalable videos to reduce the quality variation and to estimate the bandwidth more accurately compared to existing approaches. Second, some characteristics of the TCP, such as the congestion control mechanism, are not considered in this paper, and should be considered in the future. Finally, the users may adopt different adaptation approaches in real LTE networks. It would be useful to test the proposed schemes in real systems and compare with the simulation results in this paper.

ACKNOWLEDGMENT

The authors would like to thank Dr. M. Reisslein and Dr. P. Seeling for their help on SVC configurations, and the reviewers and the Associate Editor for their suggestions, which have significantly enhanced the quality and presentation of this paper.

REFERENCES

[1] (2011). Global Mobile Data Traffic Forecast Update, 2010-2015. [Online]. Available: http://bit.ly/bwGY7L
[2] Information Technology – Dynamic Adaptive Streaming Over HTTP (DASH) – Part 1: Media Presentation Description and Segment Formats, document ISO/IEC DIS 23009-1, Aug. 2014.
[3] F. Li, P. Ren, and Q. Du, “Joint packet scheduling and subcarrier assignment for video communications over downlink OFDMA systems,” IEEE Trans. Veh. Technol., vol. 61, no. 6, pp. 2753–2767, Jul. 2012.
[4] S. Thakolsri, S. Khan, E. Steinbach, and W. Kellerer, “QoE-driven cross-layer optimization for high speed downlink packet access,” J. Commun., vol. 4, no. 9, pp. 669–680, 2009.
[5] H. Du, J. Liu, and J. Liang, “Downlink scheduling for multimedia multicast/broadcast over mobile WiMAX: Connection-oriented multistate adaptation,” IEEE Wireless Commun., vol. 16, no. 4, pp. 72–79, Aug. 2009.
[6] C. Liu, I. Bouazizi, and M. Gabbouj, “Rate adaptation for adaptive HTTP streaming,” in Proc. ACM Multimedia Syst. Conf. (MMsys), 2011, pp. 169–174.
[7] R. K. P. Mok, X. Luo, E. W. W. Chan, and R. K. C. Chang, “QDASH: A QoE-aware DASH system,” in Proc. ACM Multimedia Syst. Conf. (MMsys), 2012, pp. 11–22.
[8] M. Xing, S. Xiang, and L. Cai, “Rate adaptation strategy for video streaming over multiple wireless access networks,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Anaheim, CA, USA, Dec. 2012, pp. 5745–5750.
[9] M. Xing, S. Xiang, and L. Cai, “A real-time adaptive algorithm for video streaming over multiple wireless access networks,” IEEE J. Sel. Areas Commun., vol. 32, no. 4, pp. 795–805, Apr. 2014.
[10] C. Zhou, C.-W. Lin, X. Zhang, and Z. Guo, “A control-theoretic approach to rate adaption for DASH over multiple content distribution servers,” IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 4, pp. 681–694, Apr. 2014.
[11] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103–1120, Sep. 2007.
[12] H. Kalva, V. Adzic, and B. Furht, “Comparing MPEG AVC and SVC for adaptive HTTP streaming,” in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, Jan. 2012, pp. 158–159.
[13] C. Muller, D. Renzi, S. Lederer, S. Battista, and C. Timmerer, “Using scalable video coding for dynamic adaptive streaming over HTTP in mobile environments,” in Proc. IEEE Eur. Signal Process. Conf. (EUSIPCO), Bucharest, Romania, Aug. 2012, pp. 2208–2212.
[14] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[15] R. Huysegems, B. De Vleeschauwer, T. Wu, and W. Van Leekwijck, “SVC-based HTTP adaptive streaming,” Bell Labs Tech. J., vol. 16, no. 4, pp. 25–41, 2012.
[16] N. Bouten, S. Latré, J. Famaey, F. De Turck, and W. Van Leekwijck, “Minimizing the impact of delay on live SVC-based HTTP adaptive streaming services,” in Proc. IEEE Int. Symp. Integr. Netw. Manage. (IM), Ghent, Belgium, May 2013, pp. 1399–1404.


[17] Y. Sanchez et al., “Efficient HTTP-based streaming using scalable video coding,” Signal Process., Image Commun., vol. 27, no. 4, pp. 329–342, 2012.
[18] M. Grafl, C. Timmerer, H. Hellwagner, W. Cherif, and A. Ksentini, “Evaluation of hybrid scalable video coding for HTTP-based adaptive media streaming with high-definition content,” in Proc. IEEE 14th Int. Symp. Workshops World Wireless, Mobile Multimedia Netw. (WoWMoM), Madrid, Spain, Jun. 2013, pp. 1–7.
[19] T. Andelin, V. Chetty, D. Harbaugh, S. Warnick, and D. Zappala, “Quality selection for dynamic adaptive streaming over HTTP with scalable video coding,” in Proc. ACM Multimedia Syst. Conf. (MMsys), 2012, pp. 149–154.
[20] C. Sieber, T. Hosfeld, T. Zinner, P. Tran-Gia, and C. Timmerer, “Implementation and user-centric comparison of a novel adaptation logic for DASH with SVC,” in Proc. IEEE Int. Symp. Integr. Netw. Manage. (IM), Ghent, Belgium, May 2013, pp. 1318–1323.
[21] S. Akhshabi, L. Anantakrishnan, A. C. Begen, and C. Dovrolis, “What happens when HTTP adaptive streaming players compete for bandwidth?” in Proc. ACM Int. Workshop Netw. Operat. Syst. Support Digital Audio Video (NOSSDAV), 2012, pp. 9–14.
[22] J. Jiang, V. Sekar, and H. Zhang, “Improving fairness, efficiency, and stability in HTTP-based adaptive video streaming with FESTIVE,” in Proc. ACM Int. Conf. Emerg. Netw. Experim. Technol. (CoNEXT), 2012, pp. 97–108.
[23] K. J. Ma and R. Bartos, “HTTP live streaming bandwidth management using intelligent segment selection,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Houston, TX, USA, Dec. 2011, pp. 1–5.
[24] C. Mueller, S. Lederer, and C. Timmerer, “A proxy effect analyis and fair adatpation algorithm for multiple competing dynamic adaptive streaming over HTTP clients,” in Proc. IEEE Int. Conf. Vis. Commun. Image Process. (VCIP), San Diego, CA, USA, Nov. 2012, pp. 1–6.
[25] W. Pu, Z. Zou, and C. W. Chen, “Video adaptation proxy for wireless dynamic adaptive streaming over HTTP,” in Proc. 19th Int. Packet Video Workshop (PV), Munich, Germany, 2012, pp. 65–70.
[26] O. Oyman and S. Singh, “Quality of experience for HTTP adaptive streaming services,” IEEE Commun. Mag., vol. 50, no. 4, pp. 20–27, Apr. 2012.
[27] A. Finamore, M. Mellia, M. M. Munafò, R. Torres, and S. G. Rao, “YouTube everywhere: Impact of device and infrastructure synergies on user experience,” in Proc. ACM SIGCOMM Conf. Internet Meas. Conf. (IMC), 2011, pp. 345–360.
[28] T. Wirth, Y. Sánchez, B. Holfeld, and T. Schierl, “Advanced downlink LTE radio resource management for HTTP-streaming,” in Proc. 20th ACM Int. Conf. Multimedia (MM), 2012, pp. 1037–1040.
[29] A. El Essaili, D. Schroeder, D. Staehle, M. Shehada, W. Kellerer, and E. Steinbach, “Quality-of-experience driven adaptive HTTP media delivery,” in Proc. IEEE Int. Conf. Commun. (ICC), Budapest, Hungary, Jun. 2013, pp. 2480–2485.
[30] M. Zhao, X. Gong, J. Liang, W. Wang, X. Que, and S. Cheng, “Scheduling and resource allocation for wireless dynamic adaptive streaming of scalable videos over HTTP,” in Proc. IEEE Int. Conf. Commun. (ICC), Sydney, Australia, Jun. 2014, pp. 1681–1686.
[31] S. V. Tran and A. M. Eltawil, “Optimized scheduling algorithm for LTE downlink system,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Shanghai, China, Apr. 2012, pp. 1462–1466.
[32] Technical Specification Group Radio Access Network; Physical Layer Aspects for Evolved Universal Terrestrial Radio Access (UTRA), document 3GPP TS 25.814 v7.1.0, Sep. 2006.
[33] C. Suh and J. Mo, “Resource allocation for multicast services in multicarrier wireless communications,” IEEE Trans. Wireless Commun., vol. 7, no. 1, pp. 27–31, Jan. 2008.
[34] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, and J. Turner, “Algorithms to accelerate multiple regular expressions matching for deep packet inspection,” ACM SIGCOMM Comput. Commun. Rev., vol. 36, no. 4, pp. 339–350, 2006.
[35] R. Gupta, A. Pulipaka, P. Seeling, L. J. Karam, and M. Reisslein, “H.264 coarse grain scalable (CGS) and medium grain scalable (MGS) encoded video: A trace based traffic and quality evaluation,” IEEE Trans. Broadcast., vol. 58, no. 3, pp. 428–439, Sep. 2012.
[36] P. Seeling and M. Reisslein, “Video transport evaluation with H.264 video traces,” IEEE Commun. Surveys Tuts., vol. 14, no. 4, pp. 1142–1165, 2012.
[37] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP throughput: A simple model and its empirical validation,” ACM SIGCOMM Comput. Commun. Rev., vol. 28, no. 4, pp. 303–314, 1998.

[38] G. Song, Y. Li, L. J. Cimini, Jr., and H. Zheng, “Joint channel-aware and queue-aware data scheduling in multiple shared wireless channels,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), vol. 3. Atlanta, GA, USA, Mar. 2004, pp. 1939–1944.
[39] G. Song and Y. Li, “Utility-based resource allocation and scheduling in OFDM-based wireless broadband networks,” IEEE Commun. Mag., vol. 43, no. 12, pp. 127–134, Dec. 2005.
[40] R. Knopp and P. A. Humblet, “Information capacity and power control in single-cell multiuser communications,” in Proc. IEEE Int. Conf. Commun. (ICC), Seattle, WA, USA, 1995, pp. 331–335.
[41] G. Song and Y. Li, “Adaptive resource allocation based on utility optimization in OFDM,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2003, pp. 586–590.
[42] W. Yu and R. Lui, “Dual methods for nonconvex spectrum optimization of multicarrier systems,” IEEE Trans. Commun., vol. 54, no. 7, pp. 1310–1322, Jul. 2006.
[43] D. P. Bertsekas, A. Nedić, and A. E. Ozdaglar, Convex Analysis and Optimization. Belmont, MA, USA: Athena Scientific, 2003.
[44] X. Fang, D. Yang, and G. Xue, “Resource allocation in load-constrained multihop wireless networks,” in Proc. IEEE Int. Conf. Comput. Commun. (INFOCOM), 2012, pp. 280–288.
[45] P. Ni, R. Eg, A. Eichhorn, C. Griwodz, and P. Halvorsen, “Spatial flicker effect in video scaling,” in Proc. 3rd Int. Workshop Qual. Multimedia Exper. (QoMEX), Mechelen, Belgium, Sep. 2011, pp. 55–60.
[46] Scalable Networks Technologies. [Online]. Available: http://www.scalable-networks.com, accessed May 2011.
[47] YUV Video Sequences (CIF). [Online]. Available: https://media.xiph.org/video/derf/, accessed Nov. 2013.
[48] J. Reichel, H. Schwarz, and M. Wien, Joint Scalable Video Model 11 (JSVM 11), document JVT-X202, Joint Video Team, 2007.
[49] E. Halepovic, J. Pang, and O. Spatscheck, “Can you GET me now?: Estimating the time-to-first-byte of HTTP transactions with passive measurements,” in Proc. ACM Conf. Internet Meas. Conf. (IMC), 2012, pp. 115–122.
[50] M. C. Chan and R. Ramjee, “TCP/IP performance over 3G wireless links with rate and delay variation,” in Proc. ACM 8th Annu. Int. Conf. Mobile Comput. Netw. (MobiCom), 2002, pp. 71–82.
[51] M. C. Chan and R. Ramjee, “Improving TCP/IP performance over third-generation wireless networks,” IEEE Trans. Mobile Comput., vol. 7, no. 4, pp. 430–443, Apr. 2008.
[52] D. Wang, P. C. Cosman, and L. B. Milstein, “Cross layer resource allocation design for uplink video OFDMA wireless systems,” in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Houston, TX, USA, Dec. 2011, pp. 1–6.
[53] S. Latré et al., “An autonomic architecture for optimizing QoE in multimedia access networks,” Comput. Netw., vol. 53, no. 10, pp. 1587–1602, 2009.

Mincheng Zhao (S’14) received the B.E. degree from the Xi’an University of Posts and Telecommunications, Xi’an, China, in 2008. He is currently working toward the Ph.D. degree with the State Key Laboratory of Networking and Switching, Beijing University of Posts and Telecommunications, Beijing, China. His research interests include video communications, wireless network optimization, and QoS of Internet.

Xiangyang Gong received the B.E. and M.E. degrees from Xi’an Jiaotong University, Xi’an, China, in 1992 and 1995, respectively, and the Ph.D. degree from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2012. He is a Professor with BUPT. His research interests include IP QoS, video communications, novel network architecture, and mobile Internet.


Jie Liang (S’99–M’04–SM’11) received the B.E. and M.E. degrees from Xi’an Jiaotong University, Xi’an, China, in 1992 and 1995, respectively; the M.E. degree from National University of Singapore (NUS), Singapore, in 1998; and the Ph.D. degree from The Johns Hopkins University, Baltimore, MD, USA, in 2003. He was with Hewlett-Packard Singapore, Singapore, and the Center for Wireless Communications, NUS, from 1997 to 1999. He was with the Video Codec Group, Microsoft Digital Media Division, from 2003 to 2004. He has been with the School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada, since 2004, where he is currently an Associate Professor. In 2012 he visited University of Erlangen-Nuremberg, Erlangen, Germany, as an Alexander von Humboldt Research Fellow. His research interests include image/video coding and processing, multirate and sparse signal processing, and wireless communications. Dr. Liang is an Associate Editor of IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SIGNAL PROCESSING LETTERS, Signal Processing: Image Communication, and EURASIP Journal on Image and Video Processing. He is a member of the IEEE Multimedia Systems and Applications Technical Committee and Multimedia Signal Processing Technical Committee, and is a Professional Engineer in British Columbia.

Wendong Wang received the B.E. and M.E. degrees from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 1985 and 1991, respectively. He is a Full Professor with BUPT. He has authored hundreds of papers in various journals and conference proceedings. His research interests include the next-generation network architecture, innovation applications, and mobile Internet. Prof. Wang is a member of the Association for Computing Machinery.


Xirong Que received the B.E. and M.E. degrees from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 1993 and 1998, respectively. She is an Assistant Professor with BUPT. Her research interests include innovation applications, next-generation network architecture, and mobile Internet.

Shiduan Cheng received the Degree from Beijing University of Posts and Telecommunications (BUPT), Beijing, China. She twice joined Alcatel Bell, Antwerp, Belgium, in the 1980s and 1990s, as a Visiting Scholar, where she was involved in ISDN and ATM research. From 1992 to 1999 she was the Director of the National Laboratory of Switching Technology and Telecommunication Networks at BUPT. She is currently a Professor with BUPT. Her research interests include next-generation Internet, QoS of Internet, and data center networks. Ms. Cheng was a member of the Steering Committee of Communications in the National 863 Program.
