On the buffer dynamics of scalable video streaming over wireless network Hsiao-Chiang Chuang, ChingYao Huang, and Tihao Chiang Department of Electronics Engineering, National Chiao Tung University, 1001, Ta Hsueh Rd., Hsinchu, 30050, Taiwan, R.O.C. Email Address:
[email protected] Abstract – 3G mobile standard offers unprecedented high speed digital service that can carry multimedia traffics such as video streaming. The distributed system buffers in the conventional client-server architecture absorb the fluctuations of channel conditions such as bit rate variations, delays and jitters. To enable easy control for the 3G system, MPEG committee is developing scalable video coding technology that provides flexibility for transmission over wireless networks. This paper will address issues in controlling these buffers that can impact and benefit the end-to-end system performance using the novel scalable video technology. Specifically, we will explore the effects of diverse channel characteristics on the buffers performance tolerance, and its impact on the quality of service. A novel model-based Adaptive Media Playout (AMP) buffer control is proposed to mitigate the risks of buffer outage. Keywords: buffer, cdma2000, 1x-RTT, video streaming, scalable video, adaptive media playout (AMP), rate control
I.
Introduction
Applications such as rich web content delivery, games, e-mail/file downloading, and multimedia streaming can be enabled with the 3G cellular systems. Delivery of video is the most challenging task because it has severe timing constraints and needs sophisticated controls for the delay, jitter, bit error rates, transmission rates, and synchronization. For example, proper allocation of the limited system resources, such as transmission power, and supplementary channels (e.g., Supplemental CHannel in cdma2000TM system [1]) is critical. The net effects of the resource allocation influence the visual quality of video streaming. To alleviate such effects, the system designers often put buffers in the transmission path. The overall effect of the resource allocation issue is demonstrated as the variation of channel throughput. To resolve this problem, we need to solve the problem for both the source coding and network control optimization. P
model-based AMP control that could effectively eliminate the event of buffer underflow. In general, a streaming server is located in one of node of wire-line Internet network, and the client could be any user terminal accessing the streaming service. The transmission path from the streaming server to any mobile station can be divided into two delivery segments. The first segment is the route between the streaming server and the associated BTS. The channel defined in this segment could be the heterogeneous environment of the Internet. The second segment should be the air-link between the BTS and the MS, while the definition of channel in this section is the air link between these two components. The scenario for a generic video streaming over a mobile cellular network is depicted in Figure 1. This paper mainly addresses the issue of buffering for video streaming in the last-mile transmission section shown in Figure 1. The paper is organized as follows. Section II provides some fundamental observations of the internal dynamics of MSB. Section III discusses how to exploit the observations to build the buffer control algorithm so that the probability for the buffer outage could be arbitrarily reduced. Section IV provides the experimental results, while Section V gives some concluding remarks for this paper.
Content Server
Handheld
Internet PDA
Streaming Server
Packet Data Switch
P
At the source coding side, the MPEG committee has standardized the MPEG-4 Fine-Granularity Scalability (FGS) [2] scalable video coding technology to utilize the available channel throughput efficiently. The major feature of the scalable video coding is to deliver a video stream such that it can adapt to the fluctuation of the channel throughput on-the-fly. As for the network control side, generally there are two buffers in the last-mile transmission path. One is located in the base transceiver station (BTS), namely, base-station queue (BSQ), and the other one is positioned in the mobile station (MS) called mobile-station buffer (MSB). In this paper, we will study the MSB to understand the effect of buffer dynamics for conventional client-server architecture, while assuming that the size of BSQ is large enough for low-latency video streaming application. Adaptive Media Playout (AMP) is a receiver-based buffer control technique that adjusts the frame rate of the playback process, such that the probability for the event of buffer outage is minimized. Take buffer underflow for an example, the frozen picture commonly seen causes serious damage to the perceptual quality, and the AMP scheme could prevent this situation by reasonably modulating the frame rate. Informal subjective tests have shown the reduction of playback rate up to 25% is unnoticeable [3]. This paper proposes a
BSC
BTS
Laptop
BSQ
MSB
Last-mile Transmission
Fig. 1. A generic scenario for video streaming over mobile cellular network
II.
Buffer Dynamics
This section describes the behavior of the buffering process for video streaming. We explore fundamentals from the link-layer simulation of transmitting scalable video bitstream over the cdma2000 1x-RTT system. The test system combines the use of MPEG-21 Testbed [4] (C++ based) and a link-layer transmission platform (MATLAB based) to realistically simulate video streaming over the cdma2000 1x-RTT system. The IP-packet profile is acquired from the MPEG-21 Testbed and fed into the link-layer transmission platform. Observation results could give designers some suggestions while designing the buffer control mechanism so that the buffer is free from outage. A.
Assumptions
Source Rate Controls There are typically two rate controllers in the transmission path for scalable video streaming, one is the source rate controller, and the other is the air-link rate control (rate assignment), as illustrated in Figure 2. The source rate controller should send out
data based on the channel condition of the subsequent network, including the Internet, BSQ, and available data throughput provided by the Base Station Controller (BSC). Typically, these two rate controllers are located within different devices in the network. If there are no negotiation protocols between source and air-link rate controllers to communicate with each other, or the round-trip delay is relatively large compared to the required response time, the behavior of source rate controller could be assumed to be independent from that of the air-link rate controller. The determination of source transmission rate would therefore be according to the result of probing the channel condition of Internet in the streaming server. We assume that the probed available channel bandwidth follows a uniform process, which is the upper bound of the highest data throughput that cdma2000 1x-RTT system could support. This assumption may cause some inconsistency from practical streaming case; however, this simple assumption would give us fundamental observations for an end-to-end system perspective.
because it degenerates to the case of offline file downloading which is insensitive to the channel variation. In addition, for real-time content such as live news, large pre-roll time incurs undesirable long latency for video playback. Initially, we assume that the effective buffer size is 200kB for video streaming application in the heterogeneous multitasking environment of the Real-Time Operation System (RTOS) in a mobile station.
Air-link Rate Assignment For the radio rate assignment, the data throughput that could be supplied by the system depends on two major factors: the first one is the amount of backlogged data in the BSQ, and the other is the available radio resource. The interaction among techniques of radio resource management such as call admission control, burst allocation, burst admission control, and rate assignment, would be rather complex. To simplify this problem, we assume that the rate assignment follows a Markov chain probability model with static transition probabilities. The assigned rate arises periodically and the corresponding (RLP) frame error rate is monitored. Once the current RLP frame error rate exceeds the target threshold, the assigned rate drops to lower bit rate levels instantly to ensure the transmission quality. Once steady state is reached, the probability of rate assigned to each level should approach a Gaussian distribution, where its mean should be close to the average source rate.
Table 1. Factors that influence the probabilities of buffer outage Departure rate P(underflow) ↑ P(underflow) ↓ μ(t) ↑ μ(t) ↓ P(overflow) ↓ P(overflow) ↑ Arrival rate P(underflow) ↓ P(underflow) ↑ λ(t) ↑ λ(t) ↓ P(overflow) ↑ P(overflow) ↓ MSB size P(underflow) P(underflow) B↑ B↓ P(overflow) ↓ P(overflow) ↑ Pre-roll time P(underflow) ↓ P(underflow) ↑ Pr ↑ Pr ↓ P(overflow) ↑ P(overflow) ↓
Streaming Server
BSQ
Internet
Transmitter BTS
probing Source Rate Control
B.
Fundamental Observations Based on the assumptions for rate assignment made in the previous subsection, most cases of buffer outage are buffer underflow due to the limited radio resource allocation, i.e., the assigned data throughput for a single user. To observe the probability of underflow, we consider parameters that would introduce buffer underflow initially. Table 1 shows 4 parameters that would cause buffer outage. In the following text, we observe how these parameters including departure and arrival rates influence the dynamics of buffer fullness.
We first let the other two parameters fixed, with buffer size B = 200kB and Pre-roll time = 1 seconds. The source rate controller transcodes the FGS-coded bitstream into a truncated bitstream with average bit rate of 30kbps, and the rate assignment follows the Markov chain probability model defined in Figure 3. P (R 1 |R 5 )
P (R 1 |R 4 )
P (R 1 |R 3 )
P (R 1 |R 2 ) P (R 1 |R 1 )
9 .6 k b p s
Air-link Rate control
Signaling
P (R 2 |R 1 )
BSC
P (R 2 |R 5 )
P (R 2 |R 4 )
P (R 2 |R 3 )
P (R 3 |R 5 )
P (R 3 |R 4 )
P (R 3 |R 3 )
Data
P (R 2 |R 2 )
2 8 .8 k b p s
Control Signaling
P (R 3 |R 2 )
Fig. 2. Two rate controllers in the transmission path for video streaming application
48kbps P (R 4 |R 3 ) P (R 4 |R 5 )
Buffer sizes There are mainly two buffers in the transmission path for video streaming. In this paper, we assume that the BSQ assigned for each video streaming user is large enough such that there is no overflow case for BSQ. As to the MSB, the internal application memories in the state-of-the-art 3G handsets could be several megabytes. For example, there are 3.4MB internal dynamic memories in the Nokia 3650 handset [5]. However, this memory is shared for all applications in the handsets, such as phone numbers, messages, and photos. Therefore, the buffer size for video streaming application may vary. Allocation of larger dynamic memory for streaming buffer could reduce outages since it lowers the probability of buffer overflow. Similarly, increase of the buffer size causes longer time for bitstream pre-rolling and reduces the probability of buffer underflow. However, a large pre-roll time makes video streaming less relevant
P (R 4 |R 4 )
8 6 .4 k b p s P (R 5 |R 4 )
P (R 5 |R 5 )
1 6 3 .2 k b p s
Fig. 3. State diagram for air-link rate assignment To point out the relationship between buffer fullness and probability of underflow, we study the probability of underflow in this streaming case as shown in Figure 4. Intuitively, if the buffer fullness is lower, there is higher probability that the buffer is underflow. Usually, the handling procedure of buffer underflow would halt the playback and wait for next data segment, with duration of pre-roll time. Here we assume that the buffer underflow occurs when the virtual decoder could not retrieve a complete video frame from the MSB at the specific time instant.
, where dγ is an infinitesimal interval of the arrival or departure rate, and γi is the boundary value for segmenting the arrival or departure curve into finite intervals. The approximation may lose some accuracy on the estimation for probability of underflow. However, the loss would be negligible when the number of intervals, N, is large enough. We could calculate the distribution for probability of underflow each time the fullness is changed. Nevertheless, the evaluation of distribution curve would cost a lot in the computational complexity since there is infinite number of levels for the buffer fullness. Moreover, the duration between two consecutive changes of buffer fullness could be too frequent. As a result, the frequency of distribution curve updating should be made infrequently.
0.25
B
Prob(Underflow,Fullness)
0.2
0.15
0.1
0.05
0
0
2000
4000
6000 8000 10000 MSB Fullness (kbits)
12000
14000
16000
Fig. 4. A probability distribution of underflow, with mean channel throughput = 40kbps, and mean source rate = 40kbps Based on the observation of various underflow cases, we conclude that the exponential distribution adequately describes the behavior of underflow event. The distribution including two parameters, α and β, is formulated as
P (underflow,fullness = F) ≅ α ⋅ exp(− β ⋅ α ⋅ F )
α = P(underflow, fullness = 0)
(2)
Note that for Eq. (1), the resulting exponential function is not be a valid probability density function because the integration of Eq. (1) may not equal to 1. Additional boundary condition should be introduced to adjust this probability distribution function. C.
Mathematical Analysis Consider a buffering structure with size B, arrival data rate λ(t), and departure data rate μ(t). We assume a random process ΔR(t) = λ(t) –μ(t) and observe its influence on the buffer underflow. Note that the value of distribution boundaries of these bounded processes could be either estimated from the stored record of traffic statistics, or from the results of negotiation protocol such as MPEG-21 Digital Item Adaptation (DIA) [6] to make them as a priori knowledge for these random processes. For example, the maximum, minimum, and average arrival rate could be contained in the conteXt Digital Item (XDI) description [6], and this information could be sent to the mobile station before streaming.
To find out those parameters used in the estimation of probability distribution of underflow defined in Eq. (1), we could start derivation from the following formula: P(underflow,fullness = F(t)) = P(ΔR(t) < -F(t))
(3)
= P(λ (t ) < μ (t ) − F (t ))
If the pre-roll time is long enough to de-correlate the arrival and departure processes we can assume the arrival and departure processes as independent. Hence, the probability defined in Eq. (3) could be re-written as: P(λ(t) < μ(t) − F(t)) =
∞
∑P(λ(t) < γ
i−1
∞
∑P(λ(t) < (i −1) ⋅ dγ , (i −1) ⋅ dγ < μ(t) − F(t) < i ⋅ dγ ) i =1
≈
∑P(λ(t) < γ N
i =1
i −1
, γ i−1 < μ(t) − F(t) < γ i )
α = P(underflow, fullness = 0) = P(λ (t ) < μ (t )) ≅
∑ P(λ (t ) < γ N
i −1
, γ i−1 < μ (t ) < γ i )
i −1
) ⋅ P(γ i−1 < μ (t ) < γ i )
(5)
i =1
=
∑ P(λ (t ) < γ N
i =1
∞
∫α ⋅ e ∫α ⋅ e
−α ⋅ β ⋅ f
μ max ∞
0
df
= e −α ⋅ β ⋅ μ = δ max
−α ⋅ β ⋅ f
(6)
df
, where δ represents the probability of underflow while the buffer fullness exceeds the maximum departure rate μmax, and it is reasonable to set it as a small number. As long as the Eq. (5) and Eq. (6) are both solved, we could obtain the exponential function to approximate the probability distribution curve of buffer underflow. Note that the update for the approximation function should only be done when some of the boundary conditions are altered. For example, once the maximum/minimum departure rate μmax/μmin is changed, the departure distribution function would be modified accordingly. Therefore, we should re-compute the parameters for approximation function of underflow probability. Moreover, the number of segments N would also affects the accuracy of the buffer outage modeling, and which results in a tradeoff between the computational complexity and the precision of the modeling. B
B
B
III.
B
B
B
Buffer Controls
In this section, we propose a model-based AMP control algorithm based on the fundamentals described in section II. The model-based buffer control mainly utilizes the parameter estimation using Eq. (5) and Eq. (6) to update the threshold dynamically. Let Eq. (1) equal to Pu(F), the corresponding probability density function for underflow would be: B
pdf u ( F ) =
B
Pu ( F )
∫
∞
0
Pu ( f ) df
= β ⋅ Pu ( F )
(7)
The approximation function with these two parameters is depicted in Figure 5, and the value of δ is typically set to 1%.
, μ(t) − F(t) = γ i−1 )
i =1
=
To reduce the computational complexity of estimating distribution curve for buffer underflow, we could utilize the approximation function defined in Eq. (1). Comparing Eq. (1) with Eq. (3), we find two boundary conditions for evaluating the parameters α and β in Eq. (1), as following two formulas:
(1)
, where the scale factor β in the exponent increases decay rate. The value of α represents that the probability of underflow when the buffer fullness is equal to zero, that is,
B
(4)
Based on the similar assumptions made in previous subsection, the steady-state arrival process should be analogous to a Gaussian process, which is bounded by the minimum and maximum air-link channel throughput during some period of time,
Prob(Underflow)
Observation Time = 100ms
0.1 0.05
Prob(Underflow)
0
0.5
1
1.5
2 2.5 3 Observation Time = 200ms
3.5
2 2.5 3 Observation Time = 300ms
3.5
2 2.5 MSB Fullness(bits)
3.5
4
4.5 4
x 10
0.1 0.05 0
0
0.5
1
1.5
4
4.5 4
x 10
0.1 0.05 0
Pdfu(F)
0
0.15 Prob(Underflow)
in the cdma2000 1x-RTT system. Meanwhile we assume that the departure process is uniformly distributed, which is the worst-case assumption due to its high uncertainty. However, the departure process resembles triangular distribution more than a uniform distribution. Since the source rate controller support the rate adaptation from the application layer to the network layer, the protocol stack headers occupy partial available channel bandwidth. Besides, the base-layer bitstream has higher priority for rate adaptation. Thus, it is more probable for the receiver to obtain low-rate video content, and hence the departure process is more likely to become a triangular distribution. Note that we could estimate the true departure process instead of the blind assumption of uniform distribution, such that the parameters of approximation function could be calculated more precisely.
0
0.5
1
1.5
3
4
4.5 4
x 10
Fig. 6. Illustration of different observation time
α ⋅β
PT
δ
Fullness Fth
μmax
IV.
Fig. 5. Approximation function for probability of buffer underflow To perform the buffer control, we have to determine the activation threshold of buffer fullness so that power consumed by the control process is reduced. A quality factor PT could be chosen in advance to ensure that this control decreases the probability of underflow below PT. Once this quality parameter has been determined, the threshold of fullness, Fth, could be computed from the approximation CDF function by table look-up. As shown in Figure 5, the value of Fth does not settle down until the associated covering area is equal to (or larger than) PT. B
B
B
B
B
B
B
B
B
B
Once the threshold is computed, the playout frame rate could be determined using any existing algorithms. To reduce the complexity of the control mechanism, we adopt a simple linear frame rate adjustment that could effectively modify the frame rate with low complexity, which is defined as follows: AdjustedFr ameRate =
To evaluate the performance of control algorithm, we study two metrics here. The first one is the mean discrepancy between original and controlled frame rate. This metric indicates the smoothness of the playback, while implying that the occupancy time of the associated user to the system. The second one is the standard deviation of frame rate, and this metric implies the degree of variation of playback speed. Large variation would often cause more annoyance to the human perceptual quality.
Fullness ⋅ OriginalFr ameRate Fth
(8)
To apply the control algorithm for the buffer controller, we determine the frequency of control activation in advance. There are in general two states for the control mechanism to operate. The first one the normal state, in which the control algorithm checks the current buffer fullness periodically and the threshold is updated according to the statistics. The second state is the potential outage state, where the control process is activated more frequently to prevent the buffer from underflow. Figure 6 shows the impact of the frequency of the control mechanism. If the frequency of the control mechanism is too low, the probability of underflow would not follow an exponential function, but a uniform distribution. To keep the accuracy of the proposed control scheme, we set the period of the control mechanism equal to the duration of two consecutive video frames. Note that the observation time which is less than the original video frame duration could be important as well when buffer overflow situation appears. Besides, the adjusted frame rate need not be integer since the control mechanism determines only the playback time of the next video frame.
Experimental Results
In this section, we explore the performance of the proposed model-based control scheme. Some associated system parameters are defined in Table 2. For higher data throughput, we also assume that some header compression techniques are employed, to reach a compression ratio of 5. Note that originally the overhead of each IP packet would be 16 (RTP) + 8 (UDP) + 20(IP) + 6(PPP) = 50bytes, and this may occupies a large portion of channel throughput for low-rate transmission. For low-rate transmission, one video frame of the FGS-coded bitstream is typically comprised of two IP packets (one for base layer, another for enhancement layer). If the frame rate of video clip is 10fps, the overhead would be 2x10x50x8 = 8kbps. Consequently, header compression becomes relatively significant to the end-to-end performance. The experiments employ Monte-Carlo method with 100 times of simulation. Furthermore, we assume that the rate assignment for each RLP frame could be done every 20ms, i.e., the transition between two states could take place every two consecutive RLP frames. Note that we support an air-link channel with average throughput of 40kbps instead of 30kbps. The reason of this higher-rate assignment can be twofold. One is to compensate for the cause of stuffing bits resulted from the mismatch between SDU [1] boundary and RLP boundary, and which effectively increases the required rate for delivering the video content. The other reason is to clearly observe the impact of channel variation. Table 2. System parameters for the streaming scenario Parameters Value RLP frame period 20ms Header compression ratio 5 Buffer size 200 Kbytes Avg. Source Rate 30 kbps Length of video clip 216 sec. Source frame rate 10 fps Avg. Air-Link Channel 40 kbps Throughput As we can see from Figure 7, higher pre-roll time and lower variation of channel throughput incur less buffer underflow. There
are two ways to reduce the event of buffer underflow. One is to increase the pre-roll time, but this is not suitable for the live streaming. Another is to reduce the variation of channel rate, which requires a good radio resource management strategy provided by the system. The stability of channel rate assignment of a video streaming user is also an important factor to the QoS of the streaming experience. It often takes more time to reload data of specific playback duration, and hence the count of underflow does not exactly reflect the total time of frozen playback.
0.9 No control Controlled 0.8
Mean discrepancy of frame rate (fps)
0.7
Pr=1 Pr=2 Pr=3 Pr=4 Pr=5
16
0.4
0.3
0.1
0 25
14
12 Underflow Count
0.5
0.2
18
30
35 40 45 Standard deviation of channel throughput (kbps)
50
55
Fig. 10. Influence on mean discrepancy of frame rate between controlled and uncontrolled streaming
10
8
6
4
2
0 25
30
35 40 45 Channel Standard Deviation (kbps)
50
55
Fig. 7. The impact of pre-roll time on the underflow 0.8 Pr=1 Pr=2 Pr=3 Pr=4 Pr=5
0.7
Next, we explore the effect of the proposed control scheme, while setting the pre-roll time equal to one second, and the standard deviation of channel throughput is set to 40 kbps. As shown in Figure 9, the controlled playback suffers less variation in frame rate, and hence the user could obtain better visual quality. The controlled playback undergoes a smaller variation in frame rate, since there is no instantaneous frame rate transition. In addition, the control scheme could keep the total playback time nearly unchanged, as depicted in Figure 10. It demonstrates that the proposed control scheme has limited impairment from the perspective of system resource management.
V.
0.6 Mean dsicrepancy of frame rate (fps)
0.6
0.5
0.4
0.3
0.2
0.1
0 25
30
35 40 45 Channel Standard Deviation (kbps)
50
55
Fig. 8. The mean discrepancy of frame rate among pre-roll times Figure 8 shows the mean discrepancy between the original frame rate and the effective frame rate, in the presence of buffer underflow. Since the total number of video frame is fixed, the larger discrepancy indicates that the longer playback time of the overall streaming process. Longer playback time may lengthen the occupancy time of an user for the system resource. This will reduce the multiplexing gain and data throughput of a system, and which is undesirable from the system perspective. 2.4 No Control Proposed 2.2
Standard Deviation of Frame Rate (fps)
2
1.8
1.6
1.4
1.2
1
0.8
0.6
This paper integrates the whole architecture of video streaming over cdma2000 1x-RTT system. The use of MPEG-21 Testbed makes the experimental result more reliable for practical system design. Some fundamentals are discovered and these fundamentals, such as the impact of pre-roll time and channel variation, could give the designers some suggestion for designing future or QoS-provisioning video streaming system. Simulation results of the link-layer transmission behavior could be further extended to the cdma2000-based systems such as 1xEV-DV or 1x-EV-DO, for designing and optimizing the QoS protocols. The proposed control algorithm considers both control performance and computational complexity, and it is shown from the experimental results that the control mechanism provides better visual quality, while keeping the multiplexing gain for the cellular system. The underflow case considered in this paper is better due to the assumption of header compression and full-slot utilization for information bits. The practical situation would be worse since the header compression is not used and some of the information bits should be allocated for signaling.
References [1] TIA TR 45.5, “The cdma2000 ITU-RTT Candidate Submission (0.18),” July 27, 1998 [2] Coding of Audio-Visual Objects ― Part-2 Visual ― Amendment 2: Streaming Video Profile, ISO/IEC 14496-2:2001 / Amd 2 : 2002. [3] M. Kalman, E. Steinbach, and B. Girod, “Adaptive playout for real-time media streaming,” ISCAS 2002, Scottsdale Arizona, May 2002. [4] C.-N. Wang et al., “Scalable Multimedia Streaming Test Bed for Media Coding and Testing in Streaming Environments,” ISO/IEC SC29/WG11 M10298, Dec. 2003 [5] http://www.nokia.com/nokia/0,8764,2275,00.html [6] Anthony Vetro and Christian Timmerer, “ISO/IEC 21000-7 FCD, Digital Item Adaptation,” MPEG03/N5845, July 2003. HTU
0.4 25
30
35 40 45 Standard Deviation of Channel Throughput (kbps)
50
55
Fig. 9. Comparison of standard deviation of frame rate
Conclusion
UTH