Cloud-Assisted Streaming for Low-Latency ... - Semantic Scholar

International Conference on Computing, Networking and Communications Invited Position Paper Track

Cloud-Assisted Streaming for Low-Latency Applications

Xiaoqing Zhu, Jiang Zhu, Rong Pan, Mythili Suryanarayana Prabhu, and Flavio Bonomi
Advanced Architecture & Research, Cisco Systems Inc., San Jose, CA 95134, U.S.A.
{xiaoqzhu,jiangzhu,ropan,mysuryan,flavio}@cisco.com

Invited Paper

Abstract: Media cloud services offer a unique opportunity for alleviating many of the technical challenges faced by mobile media streaming, especially for applications with stringent latency requirements. In this paper, we propose a novel cloud-assisted architecture for supporting low-latency mobile media streaming applications such as online gaming and video conferencing. A media proxy at the cloud is envisioned to calculate the optimal media adaptation decisions on behalf of the mobile sender, based on past observations of packet delivery delays of each stream. The proxy-based intelligent frame skipping problem is formulated within the Markov Decision Process (MDP) framework, which captures both the time-varying nature of video contents and the bursty fluctuations in wireless channel conditions. The optimal frame skipping policy is calculated using the stochastic dynamic programming (SDP) approach, and is shown to consistently outperform greedy heuristic schemes. Our simulation studies further characterize how system performance is influenced by various key factors, such as application playout latency, network round-trip time, and wireless link throughput.

Index Terms: cloud computing, low latency media streaming, mobile video conferencing, video adaptation

I. INTRODUCTION

Recent years have seen a proliferation of smart mobile devices, which, in turn, has fueled the rapid growth of mobile media traffic. Many of the applications supported by today's mobile devices have stringent latency requirements. Examples include media sharing of live events, online gaming, mobile video conferencing, and media-rich virtual desktops. According to [1], the latency threshold for first-person avatar games such as racing and combat is in the range of 50-100 ms. For video conferencing, the recommended one-way latency is below 150 ms [2]. Packet delivery latency, in addition to bandwidth, is therefore a key performance metric for such applications.

On the other hand, mobile media streaming remains a daunting challenge due to the inherently time-varying wireless communication channel, unpredictable user demand in the media cloud, and the fluctuating source rate of media contents generated on the fly. Moreover, the mobile sender typically has limited battery power and computational resources, and hence may not be able to afford sophisticated adaptation algorithms for matching the rate of streaming media to the available wireless network throughput.

Mobile media streaming could benefit from the assistance of media cloud servers or proxies in several ways. For instance, the relatively abundant and low-cost computational power of cloud servers could be leveraged for analyzing and estimating network conditions based on past packet measurements. Accordingly, the servers can make intelligent media adaptation decisions on behalf of the mobile devices. The cloud media proxy can also fuse network measurement reports from many mobile users in the same coverage area, and derive robust statistical models for the wireless communication channel. In addition, as the cloud media proxy is typically situated halfway along the path between sender and receiver, it can prompt the sender to take more agile adaptation actions in the face of sudden changes, which is of particular importance for low-latency streaming applications.

978-1-4673-0009-4/12/$26.00 ©2012 IEEE

In this work, we showcase the potential benefits of cloud-assisted media streaming in the application scenario of mobile video conferencing. For simplicity, we consider the option of frame rate adaptation at the sender. The proposed cloud media proxy takes into consideration past observations such as measured round-trip times and recent packet delivery delays, and instructs the mobile sender whether to encode or skip the next video frame captured by the camera. The intelligent frame skipping problem is formulated within the Markov decision process (MDP) framework. Our formulation captures the impact of several contributing factors to end-to-end system latency: the time-varying nature of wireless communication channels, traffic shaping delay at the sender, and video content fluctuation. It is shown that the optimal frame skipping policy can be calculated using the stochastic dynamic programming (SDP) approach. Alternatively, the optimal solution can be closely approximated using a greedy heuristic scheme that only takes into account the most recently observed packet delivery delay. Simulation results show how system performance is influenced by various key factors, including application playout deadlines, network round-trip times, and wireless link throughputs.

The rest of the paper is organized as follows. The next section reviews related work. Section III provides an overview of the proposed cloud-assisted media streaming architecture. Section IV explains how we model the various components of end-to-end media streaming delay. Section V presents our MDP-based formulation of the intelligent frame skipping problem, together with the SDP-based optimal policy. In Section VI, we study fundamental performance tradeoffs of the system via simulation results under various network conditions. Section VII concludes the paper.

II. RELATED WORK

There exists a rich body of literature on low-latency video streaming in the conventional server-client architecture. For instance, the work in [3] applies linear quadratic optimal control theory to the design of an optimal streaming rate controller, which achieves low startup delay, continuous playback, and efficient bandwidth utilization. In [4], the rate-distortion optimized packet scheduling problem is cast within the MDP framework, which accounts for random packet losses and delay over the best-effort network, as well as the interdependencies between media packets. The MDP approach is also used for video encoder rate control [5] and for joint packet pruning and adaptive playout [6] over time-varying wireless channels. Our work follows a similar mathematical framework, but differs from the above by leveraging a cloud media proxy for calculating the media adaptation decisions on behalf of the mobile senders.

Recent research has recognized the potential benefits of leveraging proxy servers at the cloud to relieve the computational and power constraints of mobile devices [7], [8]. It is shown in [9] that joint adaptation of rendering and encoded video qualities by the cloud gaming proxy can effectively improve the user experience. We, instead, consider generic low-latency media streaming applications in this work, and study their fundamental performance tradeoffs.


III. ARCHITECTURE OVERVIEW

As illustrated in Fig. 1, we envision the presence of a media proxy within the cloud, which helps to relay the video packets from the mobile sender to a (potentially mobile) receiver.1 The cloud media servers act as a proxy between the mobile sending and receiving devices. The cloud media proxy is in charge of collecting packet delivery statistics based on sender and receiver reports. It also calculates the optimal media adaptation decision on behalf of the mobile sender.

In our design, both the sender and the receiver periodically report the observed per-packet delivery delay to the cloud media proxy. Such information can either be embedded as metadata in the video packet header, e.g., as part of the extension header in the Real-time Transport Protocol (RTP) [10], or be conveyed separately in the form of control messages. Upon forwarding a media packet from the sender to the receiver, the cloud media proxy extracts from the sender's report the encoded frame size, transmission delay, and traffic shaping delay of the forwarded frame. Upon forwarding an acknowledgement packet from the receiver to the sender, the proxy records from the receiver's report the one-way delay over the rest of the network. The cloud media proxy can also track the network round-trip time (RTT) for each stream. It then calculates the optimal frame skipping strategy accordingly, and indicates the updated frame skipping decision for the next available frame to the mobile sender via a separate control message. Figure 2 summarizes how information flows between the sender, the proxy, and the receiver.

1 Note that the applications supported by such a system can involve bidirectional traffic, as in the case of video conferencing. Without loss of generality, we limit our discussion in this work to the media stream along one of the directions.

Fig. 1. System overview for cloud-assisted media streaming. The cloud media servers act as a proxy between the mobile sending and receiving devices. The proxy is in charge of collecting packet delivery statistics based on sender and receiver reports, as well as calculating the optimal media adaptation decision on behalf of the mobile sender.

Fig. 2. Time diagram of the cloud-assisted media streaming protocol.

IV. SYSTEM MODEL

Consider a live media stream generated at the original frame rate of $v_f$, with a corresponding frame interval of $\tau_f = 1/v_f$. The time instant at which the $n$-th frame becomes available at the sender (e.g., captured by the built-in camera, or rendered by the game server) is denoted as $t_s(n) = n\tau_f$. The total number of frames is $N$. In the following, we explain how we model the uncertainties in various components of the media streaming system.

A. Video Content Variation

Denoting the encoded frame size for the $n$-th frame as $B(n)$, we model its evolution as a first-order Gauss-Markov process. Introducing $X(n) = B(n) - \mu_B$, we have:

$$
\begin{aligned}
X(n) &= \rho X(n-1) + Y(n), \\
X(n) &\sim \mathcal{N}(0, \sigma_B^2), \\
Y(n) &\sim \mathcal{N}\big(0, (1-\rho^2)\sigma_B^2\big).
\end{aligned}
\tag{1}
$$

Here, the correlation coefficient $\rho$ reflects similarities across contents in adjacent frames. The mean and standard deviation of the frame sizes are captured by $\mu_B$ and $\sigma_B$, respectively.

B. Wireless Channel Fluctuation

Random variations in the wireless link upload throughput $R(n)$ are modeled as a two-state hidden Markov process. The two states correspond to good (G) or fading (F) channel conditions, and capture the bursty nature of the wireless communication channel. During the good channel state, the value of $R(n)$ follows the normal distribution $R(n) \sim \mathcal{N}(\mu_G, \sigma_G^2)$. During the fading state, $R(n) \sim \mathcal{N}(\mu_F, \sigma_F^2)$. The expected duration of each state can be calculated from the state transition probabilities $p_{GF}$ and $p_{FG}$ as:

$$
T_G = \frac{p_{FG}}{p_{GF} + p_{FG}}\,\tau_f, \qquad
T_F = \frac{p_{GF}}{p_{GF} + p_{FG}}\,\tau_f.
$$

Here, $p_{GF}$ := Pr{frame $n$ is in fading state | frame $n-1$ is in good state}, and $p_{FG}$ := Pr{frame $n$ is in good state | frame $n-1$ is in fading state}.

C. End-to-end Frame Delivery Delay

We assume in this work that the bottleneck along the path is the upload wireless link at the mobile sender. Consequently, the end-to-end delivery delay $d(n)$ for the $n$-th frame consists of three components: queuing delay at the sending buffer $d_q(n)$, transmission delay of that frame over the wireless channel $d_w(n)$, and one-way delay over the rest of the network $d_{OWD}$. This can be expressed as:

$$
d_w(n) = \frac{B(n)}{R(n)} \tag{2}
$$

$$
d_q(n) = \max\big[0,\; d_q(n-1) + d_w(n-1) - \tau_f\big] \tag{3}
$$

$$
d(n) = d_w(n) + d_q(n) + d_{OWD}. \tag{4}
$$
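The three model components of Section IV can be combined into a short simulation. The following is a minimal sketch, not the authors' simulator: the function name `simulate_delays`, the transition probabilities `p_gf` and `p_fg`, and the clamping of frame sizes and channel rates to positive values are all assumptions introduced here for illustration. The remaining defaults follow the values quoted later in the simulation setup.

```python
import math
import random


def simulate_delays(n_frames, fps=30.0, mu_b=3000.0, sigma_b=300.0, rho=0.8,
                    mu_g=1.0e6, p_gf=0.05, p_fg=0.20, rtt=0.060, seed=0):
    """Return per-frame end-to-end delays d(n) in seconds, with no frame skipping.

    p_gf and p_fg are hypothetical values; the paper does not state them.
    """
    rng = random.Random(seed)
    tau_f = 1.0 / fps                       # frame interval
    mu_f = 0.3 * mu_g                       # mean rate in the fading state (bits/s)
    sigma_g, sigma_f = 0.1 * mu_g, 0.1 * mu_f
    d_owd = rtt / 2.0                       # one-way delay over the rest of the network
    x = rng.gauss(0.0, sigma_b)             # X(0) drawn from N(0, sigma_B^2)
    good = True                             # channel starts in the good state
    d_q = d_w = 0.0                         # delays of the (non-existent) previous frame
    delays = []
    for _ in range(n_frames):
        # Eq. (1): first-order Gauss-Markov frame size, in bytes.
        x = rho * x + rng.gauss(0.0, math.sqrt(1.0 - rho ** 2) * sigma_b)
        frame_bits = 8.0 * max(1.0, mu_b + x)       # clamp: assumption
        # Two-state channel: leave G with prob. p_gf, leave F with prob. p_fg.
        if good:
            good = rng.random() >= p_gf
        else:
            good = rng.random() < p_fg
        mean, std = (mu_g, sigma_g) if good else (mu_f, sigma_f)
        rate = max(1.0e3, rng.gauss(mean, std))     # clamp: assumption
        # Eq. (3) uses the previous frame's d_q and d_w, so update d_q first.
        d_q = max(0.0, d_q + d_w - tau_f)
        d_w = frame_bits / rate                     # Eq. (2)
        delays.append(d_w + d_q + d_owd)            # Eq. (4)
    return delays
```

Calling `simulate_delays(300)` yields ten seconds of delay samples at 30 fps; comparing each sample against a playout deadline gives the fraction of late frames for the no-skipping case.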

In this work, the one-way delay is assumed to be half of the round-trip time across the network: $d_{OWD} = RTT/2$. The value of $RTT$, in turn, can be estimated at the cloud media proxy by periodically probing the sender and receivers using small control messages. Correspondingly, the time at which the frame arrives at the receiver is $t_r(n) = t_s(n) + d(n)$. Given a playout deadline of $T_o$, the system should strive to deliver all transmitted frames before their playout deadline, such that $t_r(n) < t_s(n) + T_o$. Frames arriving after the playout deadline are discarded at the receiver. Figure 3 illustrates the evolution of the timeline both at the sender and at the receiver.

Fig. 3. Illustration of media delivery timelines at the sender and the receiver. The frames circled in red are voluntarily skipped at the sender.

V. MDP-BASED PROBLEM FORMULATION

We now explain how the intelligent frame skipping problem can be formulated within the MDP framework. The following subsections describe the cloud-assisted media streaming system in terms of system states, actions, cost functions, and optimal policies.

A. System States

Following the timeline of the mobile sender, the system state of the $n$-th frame is defined as:

$$
s_n = \{B(n-k),\, R(n-k),\, d_w(n-k),\, d_q(n-k)\}, \qquad s_n \in \mathcal{S}. \tag{5}
$$

In (5), $k = RTT/\tau_f$ corresponds to the lag of observation introduced by the round-trip time $RTT$ in the system. The values of the observations are discretized as $B(n-k) \in \mathcal{B}$, $R(n-k) \in \mathcal{R}$, $d_w(n-k) \in \mathcal{D}$, and $d_q(n-k) \in \mathcal{D}$:

$$
\mathcal{B} = \{b_1, b_2, \cdots, b_{L_B}\}, \qquad b_l = \frac{l}{L_B} B_{max}, \qquad l = 1, \cdots, L_B. \tag{6}
$$

$$
\mathcal{R} = \{r_1, r_2, \cdots, r_{L_R}\}, \qquad r_l = \frac{l}{L_R} R_{max}, \qquad l = 1, \cdots, L_R. \tag{7}
$$

$$
\mathcal{D} = \{d_1, d_2, \cdots, d_{L_D}\}, \qquad d_l = \frac{l}{L_D} D_{max}, \qquad l = 1, \cdots, L_D. \tag{8}
$$

In (6)-(8), the maximum value of each variable is determined by $B_{max}$, $R_{max}$, and $D_{max}$, respectively. The number of units for each variable is $L_B$, $L_R$, and $L_D$, respectively. Correspondingly, the granularity of the discretization is influenced by both the range and the number of units for each variable. The compound system state space is represented as $\mathcal{S} = \mathcal{B} \times \mathcal{R} \times \mathcal{D} \times \mathcal{D}$. The overall size of the state space is $|\mathcal{S}| = |\mathcal{B}||\mathcal{R}||\mathcal{D}|^2$.

As illustrated in Fig. 2, the cloud media proxy can collect the values of $B(n)$, $d_w(n)$, and $d_q(n)$ based on the sender's report. It can then derive the value of $R(n)$ according to (2).

B. Actions

Each frame is associated with two available actions:

$$
a_n = \begin{cases} 0, & \text{skip frame } n \\ 1, & \text{transmit frame } n \end{cases}, \qquad a_n \in \mathcal{A}.
$$

Hence the action space is binary, $\mathcal{A} = \{0, 1\}$. To accommodate the fact that a skipped frame neither enters the sending buffer nor incurs transmission delay, we modify the expression in (2) as:

$$
d_w(n) = a_n \frac{B(n)}{R(n)}. \tag{9}
$$

Note that the corresponding end-to-end delay for frame $n$ is affected not only by past system states $s_{n'}$, $n' < n$, but also by past actions $a_{n'}$, $n' < n$. Combining (9), (3), and the statistical models for $B(n)$ and $R(n)$ in Section IV, one can derive the state transition probability $P_a(s', s) = \Pr\{s_{n+1} = s' \mid s_n = s, a_n = a\}$ accordingly.

C. Cost Function

The final video quality at the receiver is influenced by the following factors:2
• frame skipping: percentage of voluntarily skipped frames at the sender, as dictated by the cloud media proxy.
• frame dropping: percentage of late frames being dropped at the receiver due to missed playout deadlines.

The impact of skipped and dropped frames is captured by the following cost function:

$$
g(s_n, a_n) = \alpha (1 - a_n) + \beta a_n \mathbf{1}\{d(n) > T_o\}, \tag{10}
$$

where the coefficients $0 < \alpha < \beta$ denote the relative penalties introduced by skipped and dropped frames. The binary indicator function $\mathbf{1}\{\cdot\}$ takes on the value of 1 if its argument is true, and 0 if it is false. The overall expected cost for all $N$ frames is expressed as:

$$
\min_{\{a_n\}_{1}^{N}} \; E\left[\sum_{n=1}^{N} g(s_n, a_n)\right]. \tag{11}
$$

The expectation is taken over all possible realizations of $d(n)$, $n = 1, \cdots, N$, given the chosen actions $a_n$ for all frames.

2 In modern systems, wireless transmission errors are mostly alleviated by physical-layer and MAC-layer techniques such as channel coding and persistent retransmission. We therefore assume no frame losses in the proposed system.

D. SDP-based Policy

Given the state transition probability $P_a(s', s)$ and the cost function $g(s_n, a_n)$, it is possible to solve the optimization problem in (11) using various standard algorithms [11]. In this work, we follow the stochastic dynamic programming (SDP) approach, and recursively minimize the expected cost-to-go function as:

$$
\begin{aligned}
J_n(s) &= \min_{\{a_{n'}\}_{n}^{N}} E\left[\sum_{n'=n}^{N} g(s_{n'}, a_{n'})\right] \\
       &= \min_{a_n} E\left[g(s_n, a_n) + \sum_{s' \in \mathcal{S}} P_{a_n}(s', s)\, J_{n+1}(s')\right].
\end{aligned}
\tag{12}
$$

Note that the optimal choice of frame skipping policy $\pi(n) = a_n$ at each step is determined not only by the outcome of skipping or dropping the current frame, but also by the influence of that decision on the queuing delays of future frames.
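The backward recursion of the cost-to-go function can be illustrated on a deliberately reduced version of the problem. The sketch below is not the formulation solved in the paper: it assumes no observation lag ($k = 0$), a fixed frame size, a channel state that is observed directly, and a deterministic transmission time per channel state, so that the state collapses to (channel state, queue backlog). The function name `solve_sdp` and all numerical defaults (time unit, transmission times `w`, transition matrix `p`, deadline, $\alpha$, $\beta$) are hypothetical values chosen for illustration.

```python
def solve_sdp(horizon, w=(2, 7), frame_units=3, q_max=30, unit=1.0 / 90.0,
              d_owd=0.030, deadline=0.150, alpha=1.0, beta=2.0,
              p=((0.95, 0.05), (0.20, 0.80))):
    """Finite-horizon backward induction on a toy frame-skipping MDP.

    State: (c, q), with c = 0 (good) or 1 (fading) and q the queue backlog
    in multiples of `unit` seconds. w[c] is the transmission time of one
    frame in state c, and frame_units is the frame interval, both in units.
    Returns (J, policy): J[c][q] is the expected cost-to-go at the first
    frame and policy[n][c][q] is the chosen action (0 = skip, 1 = transmit).
    """
    n_ch = len(w)
    J = [[0.0] * (q_max + 1) for _ in range(n_ch)]   # zero cost after the last frame
    policy = []
    for _ in range(horizon):                         # iterate from the last frame back
        J_new = [[0.0] * (q_max + 1) for _ in range(n_ch)]
        act = [[0] * (q_max + 1) for _ in range(n_ch)]
        for c in range(n_ch):
            for q in range(q_max + 1):
                best_cost, best_a = None, 0
                for a in (0, 1):
                    # End-to-end delay: queuing + (action-gated) transmission + one-way.
                    d = (q + a * w[c]) * unit + d_owd
                    # Immediate cost: alpha per skipped frame, beta per late frame.
                    g = alpha * (1 - a) + beta * a * (d > deadline)
                    # Queue recursion, capped at q_max to keep the state space finite.
                    q_next = min(q_max, max(0, q + a * w[c] - frame_units))
                    future = sum(p[c][c2] * J[c2][q_next] for c2 in range(n_ch))
                    total = g + future
                    if best_cost is None or total < best_cost:
                        best_cost, best_a = total, a
                J_new[c][q] = best_cost
                act[c][q] = best_a
        J = J_new
        policy.append(act)
    policy.reverse()                                 # policy[0] now refers to the first frame
    return J, policy
```

Even in this reduced setting the policy looks ahead: transmitting a frame that would still arrive on time may be rejected if the extra backlog is expected to make several later frames late. Extending the sketch to the full state in (5) would mainly enlarge the state tuple and replace the deterministic transitions with the discretized distributions of Section IV.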


While the SDP-based policy can yield the optimal expected performance, it is fairly complex. For each media stream it supports, the media cloud proxy needs to pre-compute the cost-to-go function with $O(M|\mathcal{S}|)$ entries, where $M$ is the optimization horizon and $|\mathcal{S}|$ is the size of the state space. It also needs to pre-compute the state transition probability matrix with $2|\mathcal{S}|^2$ entries. The online computational complexity is on the order of $O(|\mathcal{S}|)$ for calculating the expected future cost-to-go based on the state transition probabilities. While such a computational burden can easily overwhelm a mobile client, it is still acceptable in the media cloud, where computational resources are relatively abundant and low-cost.

VI. SIMULATION RESULTS

A. Setup

In this section, we evaluate the proposed frame rate adaptation schemes using numerical simulations. The original frame rate of the video sequence is 30 frames per second (fps). The correlation coefficient in the Gauss-Markov model for the video sequences is chosen as $\rho = 0.8$. The average frame size is $\mu_B = 3000$ bytes, corresponding to an average video streaming rate of 720 Kbps if no frames are skipped at the sender. The standard deviation of the frame size is set to $\sigma_B = 300$ bytes. The application playout deadline varies between 60 ms and 600 ms. The wireless channel rate fluctuates between a good and a fading state. The average throughput of the good channel state varies between 600 Kbps and 2 Mbps, whereas the average throughput of the fading state is chosen to be $\mu_F = 0.3\mu_G$. We further fix the standard deviation of the channel rates to be 10% of the average throughputs, i.e., $\sigma_G = 0.1\mu_G$ and $\sigma_F = 0.1\mu_F$. The network round-trip time varies between 20 ms and 100 ms.
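The quoted source rate and the table sizes of the SDP-based policy follow from simple arithmetic, sketched below. The function names are introduced here for illustration, and the discretization levels and horizon used in the usage note are hypothetical, since the paper does not state the values it used.

```python
def source_rate_kbps(fps, mean_frame_bytes):
    """Average video rate when no frames are skipped, in Kbps."""
    return fps * mean_frame_bytes * 8 / 1000


def state_space_size(l_b, l_r, l_d):
    """|S| = |B| |R| |D|^2 for the compound state of Section V-A."""
    return l_b * l_r * l_d ** 2


def table_entries(horizon, n_states):
    """Return (cost-to-go entries, transition-matrix entries): M|S| and 2|S|^2."""
    return horizon * n_states, 2 * n_states ** 2
```

With 30 fps and 3000-byte frames, `source_rate_kbps(30, 3000)` gives 720.0, matching the setup. With hypothetical levels $L_B = L_R = 8$ and $L_D = 16$, the state space has 16384 states, and a horizon of $M = 30$ frames requires 491520 cost-to-go entries and 536870912 transition-probability entries, which shows why the transition matrix dominates the memory footprint.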

Fig. 4. Percentage of skipped and dropped frames as achieved by the NFS, RFS, DFS, and SDP-based policies. The playout deadline varies between 60 ms and 600 ms. The average wireless link throughput is $\mu_G = 1$ Mbps during the good channel state and $\mu_F = 0.3\mu_G$ during the fading state. The network round-trip time is 60 ms.

B. Competing Schemes

As a basis for comparison, we also consider the system performance without frame skipping, referred to as No Frame Skipping (NFS), together with two heuristic schemes:
• In the Random Frame Skipping (RFS) scheme, the mobile sender randomly chooses to skip a fixed percentage $\eta$ of the content frames, without adapting to any observed packet delivery statistics.
• In the Delay-based Frame Skipping (DFS) scheme, the cloud media proxy instructs the sender to skip frame $n + k$ if the observed delay $d(n)$ for the $n$-th frame exceeds a prescribed portion of the playout deadline: $d(n) > \gamma T_o$. The scaling factor $\gamma < 1$ can be tuned to adjust how cautious the sender is in reacting to observed long delivery delays for past frames.

C. Varying Playout Deadline

Figure 4 compares the heuristic schemes and the SDP policy in terms of the percentage of skipped and dropped frames, as a function of the playout deadline of the application. The resulting normalized video quality is shown in Fig. 5. As the playout deadline becomes more relaxed, the percentage of frames missing their deadlines decreases for all schemes. The two adaptive schemes, DFS and SDP, also tend to reduce the percentage of voluntarily skipped frames as the playout deadline increases. Consequently, the normalized video quality at the receiver increases with more relaxed playout deadlines. It is also worth noting that, while the overall performance of the DFS heuristic closely tracks that of the SDP policy, the SDP scheme significantly outperforms DFS for more latency-sensitive applications with a playout deadline of 60 ms, mainly by avoiding unnecessary voluntary frame skips.
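The two heuristic baselines described in Section VI-B reduce to one-line decision rules, sketched below. The function names and the default $\gamma = 0.8$ are assumptions made here for illustration; the paper does not report the value of $\gamma$ or $\eta$ used in its experiments.

```python
import random


def dfs_decision(observed_delay, playout_deadline, gamma=0.8):
    """Delay-based Frame Skipping: action for frame n + k given d(n).

    Returns 0 (skip) if d(n) > gamma * T_o, otherwise 1 (transmit).
    """
    return 0 if observed_delay > gamma * playout_deadline else 1


def rfs_decision(eta, rng):
    """Random Frame Skipping: skip with fixed probability eta, ignoring delays."""
    return 0 if rng.random() < eta else 1
```

For example, with a 60 ms deadline and $\gamma = 0.8$, an observed delay of 55 ms exceeds the 48 ms threshold and triggers a skip, whereas an observed delay of 40 ms does not. Unlike the SDP-based policy, neither rule accounts for the effect of the current decision on the queuing delay of later frames.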

Fig. 5. Normalized video quality as achieved by NFS, RFS, DFS, and the SDP-based policy. The playout deadline varies between 60 ms and 600 ms. The average wireless link throughput is $\mu_G = 1000$ Kbps during the good channel state and $\mu_F = 300$ Kbps during the fading state. The network round-trip time is 60 ms.

D. Varying Network RTTs

Next, we examine how network latency affects system performance. We vary the round-trip time (RTT) between 20 ms and 100 ms, while keeping the application playout deadline at 60 ms. Figure 6 compares the performance of NFS, RFS, DFS, and SDP in terms of normalized video quality at the receiver. The SDP-based scheme significantly outperforms the others when the network round-trip time is moderate, i.e., below 60 ms. As the RTT increases, the performance of all four schemes degrades due to the increasing lag between past observations at the media cloud proxy and its frame skipping decisions for future frames at the mobile sender. At the very extreme, when the network one-way delay approaches the playout deadline, it is no longer feasible to support the streaming application. In our experiments, this corresponds to the data point where the round-trip time is 80 ms, as a one-way delay of 40 ms accounts for 67% of the end-to-end delay budget set by the 60 ms playout deadline.
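The effect of the RTT on the observation lag and on the remaining delay budget can be checked with a few lines of arithmetic. This is a sketch under one assumption: the lag $k = RTT/\tau_f$ is rounded up to a whole number of frames, which the paper does not state explicitly. The function names are introduced here for illustration.

```python
def observation_lag_frames(rtt_ms, fps):
    """Lag k = RTT / tau_f, rounded up to whole frames (rounding is an assumption)."""
    return -(-rtt_ms * fps // 1000)   # integer ceiling of rtt_ms * fps / 1000


def one_way_share(rtt_ms, deadline_ms):
    """Fraction of the playout deadline consumed by the one-way delay RTT / 2."""
    return (rtt_ms / 2) / deadline_ms
```

At 30 fps, an RTT of 20 ms corresponds to a lag of one frame, 60 ms to two frames, and 100 ms to three frames. At an RTT of 80 ms and a 60 ms deadline, `one_way_share(80, 60)` is about 0.67, leaving only 20 ms for queuing and transmission.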


Fig. 6. Normalized video quality as achieved by NFS, RFS, DFS, and the SDP-based policy. The network round-trip time varies between 20 ms and 100 ms. The average wireless link throughput is $\mu_G = 1$ Mbps during the good channel state and $\mu_F = 0.3\mu_G$ during the fading state. The playout deadline is 60 ms.

E. Varying Wireless Uplink Bandwidth

Finally, we study how bandwidth over-provisioning can help with accommodating low-latency streaming. We fix the playout deadline at 60 ms and the network round-trip time at 60 ms. The average wireless uplink bandwidth $\mu_G$ varies between 600 Kbps and 2 Mbps during the good channel state. Accordingly, the average throughput of the wireless link during the fading state is kept at $\mu_F = 0.3\mu_G$, varying between 180 Kbps and 600 Kbps. Figure 7 shows the normalized video quality achieved by NFS, RFS, DFS, and the SDP-based policy as a function of the wireless uplink bandwidth during the good channel state ($\mu_G$). When the available bandwidth is lower than the full video source rate, both the SDP-based policy and DFS can gracefully downgrade the quality of the received video within a reasonable range, by directing the sender to voluntarily skip video frames. It can also be noted that the SDP-based scheme consistently outperforms DFS, especially around the region where $\mu_G = 1$ Mbps, corresponding to 80% bandwidth utilization. On the other hand, even when there is sufficient overall bandwidth over-provisioning, the time-varying nature of the video contents and the link qualities still prevents the system from achieving full received video quality, as frames may still occasionally need to be skipped or may miss their rather stringent latency deadlines.

Fig. 7. Normalized video quality as achieved by NFS, RFS, DFS, and the SDP-based policy. The average wireless uplink bandwidth $\mu_G$ varies between 600 Kbps and 2 Mbps during the good channel state, with $\mu_F = 0.3\mu_G$ during the fading state. The playout deadline is 60 ms. The network round-trip time is 60 ms.

VII. CONCLUSIONS

In this paper, we propose a novel cloud-assisted architecture for supporting mobile media streaming applications with stringent latency constraints. The media proxy at the cloud is envisioned to take on the burden of calculating the optimal media adaptation decisions on behalf of the mobile sender, based on past observations of packet delivery delays for each stream. The actual adaptation actions of skipping a frame for encoding and transcoding are still performed at the mobile sender. The proxy-based intelligent frame skipping problem is formulated within the Markov Decision Process (MDP) framework, which captures the time-varying nature and uncertainty both in encoded video frame sizes and in wireless channel conditions. It is shown that the optimal frame skipping policy can be calculated using the stochastic dynamic programming (SDP) approach. Alternatively, the optimal solution can be closely approximated using a greedy heuristic scheme that only takes into account the delivery delay of the most recently observed frame. Simulation results over varying application playout deadlines, wireless link throughputs, and network round-trip times confirm that the SDP approach consistently outperforms the heuristic schemes. It is also shown that, while relaxing the application playout deadline tends to gradually improve the received video quality, the quality degradation introduced by increasing the network round-trip time or reducing the network bandwidth tends to be more drastic once the system resources approach a certain limit.

REFERENCES

[1] M. Claypool and K. Claypool, “Latency and player actions in online games,” Communications of the ACM, vol. 49, no. 11, pp. 40–45, Nov. 2006.
[2] “ITU-T Recommendation G.114 - One-way Transmission Time,” ITU-T (Standard), Feb. 2003.
[3] C. Huang, P. A. Chou, and A. Klemets, “Optimal coding rate control for scalable streaming media,” in Proc. 14th International Packet Video Workshop (PV’04), Irvine, CA, USA, Dec. 2004.
[4] P. A. Chou and Z. Miao, “Rate-distortion optimized streaming of packetized media,” IEEE Trans. Multimedia, vol. 8, no. 2, Apr. 2006.
[5] J. Cabrera and A. Ortega, “Stochastic rate control of video coders for wireless channels,” IEEE Trans. Circuits and Systems for Video Technology, vol. 12, no. 6, pp. 496–510, June 2002.
[6] Y. Li, A. Markopoulou, N. Bambos, and J. Apostolopoulos, “Joint power-playout control for media streaming over wireless links,” IEEE Trans. Multimedia, vol. 8, no. 4, pp. 830–843, Aug. 2006.
[7] B.-G. Chun and P. Maniatis, “Augmented smart phone applications through clone cloud execution,” in Proc. HotOS XII, 2009.
[8] B. Zhao, B. C. Tak, and G. Cao, “Reducing the delay and power consumption of web browsing on smartphones in 3G networks,” in Proc. IEEE 31st International Conference on Distributed Computing Systems (ICDCS’11), Minneapolis, MN, USA, July 2011, pp. 413–422.
[9] S. Wang and S. Dey, “Rendering adaptation to address communication and computation constraints in cloud mobile gaming,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM’10), 2010.
[10] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A transport protocol for real-time applications,” RFC 3550 (Standard), 2003.
[11] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, 1994.
