Radio Link Buffer Management and Scheduling for Video Streaming over Wireless Shared Channels

Günther Liebl, Hrvoje Jenkac, Thomas Stockhammer, Christian Buchner
Institute for Communications Engineering (LNT)
Munich University of Technology (TUM)
D-80290 Munich, Germany

Abstract— In this work we investigate strategies for joint radio link buffer management and scheduling for video streaming over wireless shared channels, with focus on High–Speed Downlink Packet Access (HSDPA). The simulations have been carried out with the virtual mode of our wireless system emulator WiNe2. We compare different end–to–end streaming options, including variations in the initial delay and timestamp–based streaming versus ahead–of–time streaming. It turns out that buffer management at the entrance to the wireless system has a similar effect as server–based rate control schemes, but avoids the need for frequent end-to-end link probing. In case of an overloaded system, packets with the longest waiting time in the radio link buffer should be dropped, since most likely their presentation deadline has already expired anyway. We also conclude that hybrid scheduling strategies do not yield large gains for timestamp-based streaming, since the inherent coarse "rate control" is sufficient to avoid extreme unfairness in the system. In this case the use of a simple maximum–throughput scheduling policy provides the best results. However, if the streaming application does not behave fairly, as in the case of ahead–of–time streaming, the maximum–throughput policy degrades the overall system, as users with bad channel conditions are blocked. Hence, a fair scheduling algorithm provides significantly better performance. Finally, it is shown that the exploitation of simple priority information in packet headers in the dropping strategy can only increase the quality for high initial delays.

I. INTRODUCTION

With the introduction of 2.5G and 3G wireless systems, packet–based video streaming to mobile devices has become reality. Subscribers are already able to access millions of streams stored on the public Internet and to display these on their hand-held devices. Content providers expect that streaming applications will gain enormous interest over the next years and that packet–based traffic will exceed circuit–switched traffic, like voice communication, in the near future. Hence, the optimization and adaptation of streaming strategies to wireless networks, like Enhanced GPRS (EGPRS) or High–Speed Downlink Packet Access (HSDPA), has become a challenging task, since in general, streaming content is foreseen to be supplied to both wired clients and clients which use a wireless connection as the last hop in the overall transmission chain. This heterogeneous network structure results in a number of conflicting issues: On the one hand, significant performance gains for video transmission over wireless channels can be achieved by appropriate adaptation. On the other hand, optimization of the media encoding parameters or streaming server transmission strategies exclusively for wireless links will result

Axel Klein
Siemens AG - Information and Communication Mobile (ICM)
D-81541 Munich, Germany

in suboptimal performance for a wired transmission, and vice versa. Furthermore, the traditional approach is to consider each layer in the end–to–end connection independently in order to reduce the complexity of the integration process significantly. However, as already pointed out in [1], this strategy completely neglects the interactions between different system components by making "worst-case" assumptions for the message passing between layers. In order to increase system performance significantly, so–called "cross-layer" design is proposed to exploit the inherent correlations in the transmission path. For this type of advanced system design, a multitude of components which are part of a wireless streaming system have to be considered: The latter starts with the streaming server, which is located either on the Internet or in the operator's core network, and ends with the wireless streaming client. Further entities which require appropriate optimization are, for example, media coding, intermediate buffering, channel resource allocation and scheduling, receiver buffering, admission control, media playout, error concealment, etc. The number of parameters and adaptation possibilities within these components is enormous, and the search for an optimal joint set of parameters and strategies is usually not feasible. In this work we focus our attention on finding suitable buffer management strategies for incoming IP–based multimedia streams at the radio link layer in a wireless system. We consider a wireless shared channel scenario, where several streaming users share the common physical resources, bandwidth and transmission power, at the same time. The scheduler is located at the base station and, at each time instant, assigns channel resources to individual user flows based on a particular scheduling policy.
The applied scheduling metric depends on one or more of the following measurements: the users' channel quality, the queue lengths, the head-of-line (HOL) waiting times, or the packet deadlines. The overall goal in a streaming setup is to deliver each packet before its display deadline expires. Depending on the data rates of the competing streams and the actual scheduling history, the fullness of the traffic queues of the streams will vary over time. Moreover, if the system operates at maximum capacity, some buffers will temporarily overflow. Therefore, appropriate packet dropping strategies at the buffers are required to handle this problem. The question of which media packets to schedule and drop over wireless links has been addressed, for example, in [2]. However, the authors do not consider multiuser issues

or dependencies among frames in a GOP in their analysis. In this work we develop and compare several buffer management strategies, which are either based on cross-layer information or do not consider any additional side information. We will show that significant performance gains can be achieved for wireless video streaming by applying low-complexity modifications to state-of-the-art solutions. The rest of this paper is organized as follows: Section II will provide in–depth information on the considered video streaming application. Next, a general overview of a wireless multiuser streaming system and the specific HSDPA system model is presented in section III. The various scheduling algorithms used in this work, as well as our proposed buffer management algorithms, will be explained in section IV. Detailed results for different radio link buffer management strategies, scheduling policies, and end–to–end streaming modes for a typical test case are given in section V. The paper concludes with a summary of the major issues.

II. PRELIMINARIES AND DEFINITIONS FOR VIDEO STREAMING APPLICATIONS

A. End–to–End Streaming System

Streaming applications are usually set up as an end–to–end connection between a media streaming server and a client requesting pre–encoded data to be streamed to the end user. The receiver buffers the incoming data and starts playback after some initial delay. Once playback has started, a continuous presentation of the sequence should be guaranteed. For CBR channels with constant delay, successful playout can be guaranteed by encoding and streaming the video sequence such that the resulting bit–stream satisfies a leaky bucket constraint. However, in our investigated system neither the bit–rate nor the delay is constant, and, in fact, some data units are even lost. Therefore, let us investigate an end–to–end streaming scenario where packets are delayed and dropped.
Assume that the media server stores a packet–stream, defined by a sequence of packets called data units in the following, i.e., P = P1 , P2 , . . . . Each data unit Pn has a certain size rn in bits and assigned timing information in the form of a Decoding Time Stamp (DTS) tDTS,n , indicating when this data unit must be decoded relative to tDTS,1 . After the server has received a request from a client, it starts transmitting the first data unit P1 at time instant ts,1 . The following data units Pn are equivalently transmitted at time instants ts,n . Data unit Pn is completely received at the far–end at tr,n , and the interval between receiving time and sending time is specified as δn ≜ tr,n − ts,n . The channel delay is the most critical quantity for a real–time application, as excessive end–to–end delays result in significant performance degradation. For the sake of completeness we model the loss of a data unit by an infinite channel delay, i.e., δn = ∞. We assume in the following that if data units are received, they are correct, and otherwise, their loss or delayed arrival is detected by the use of appropriate sequence numbering. The received data unit Pn is kept in the receiver buffer until it is forwarded to the video decoder at decoding time td,n .

Without loss of generality we assume that the arbitrary value of the DTS of the first data unit is equivalent to the decoding time of the first data unit, i.e., tDTS,1 = td,1 . An important performance criterion of a video streaming system is the time between the request of a receiver for a certain stream and the time the first data unit is presented. Neglecting the delays for conveying the request from the receiver to the streaming server and the time for the streaming server to set up the streaming session, as well as assuming that the first frame is presented immediately after it is decoded, this delay can be computed as the difference between the sending time of the first data unit, ts,1 , and the decoding time of the first data unit, td,1 , and is in the following defined as the initial delay δinit ≜ td,1 − ts,1 . Then, data units which fulfill ts,n + δn ≤ tDTS,n can be decoded in time. The decoder buffer is used for de–jittering, and data unit Pn is stored in it for some time tDTS,n − tr,n . In the remainder of this work we assume that the decoder buffer is sufficiently large such that no restrictions apply to the transmission time instant ts,n of data unit Pn . Small variations of the channel delay can be compensated for by this receiver buffer, but long–term variations result in loss of data units, and therefore, we need new techniques to support streaming in these environments. An insightful summary of these techniques has been presented in [3] by dividing them into three different categories: Adaptive media playout [4] allows a streaming media client, without the involvement of the server, to control the rate at which data is consumed by the playout process. A second class of techniques makes decisions that govern how to allocate transmission resources among packets. Recent work [5] provides a flexible framework to allow rate-distortion optimized packet scheduling.
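As a small numerical illustration of these timing relations, the following Python sketch checks which data units of a timestamp-based stream arrive before their decoding deadline. The DTS values, delays, and initial delay are invented for illustration, not taken from the paper's experiments:

```python
import math

# Sketch: given DTS values, channel delays, and an initial delay, decide
# which data units arrive in time. Timestamp-based sending is assumed
# (the sending time ts_n follows the DTS); all numbers are illustrative.

def decodable(dts, delays, init_delay):
    """Return True per data unit if it can be decoded in time.

    dts[n]    -- DTS of unit n in seconds, relative to dts[0] = 0
    delays[n] -- channel delay of unit n (math.inf models a lost unit)
    """
    ok = []
    for n, delta in enumerate(delays):
        ts_n = dts[n]                    # sending time with ts_1 = 0
        td_n = init_delay + dts[n]       # decoding time: td_1 = ts_1 + init_delay
        ok.append(ts_n + delta <= td_n)  # reduces to delta <= init_delay here
    return ok

print(decodable([0.0, 1/30, 2/30], [0.10, 0.50, math.inf], 0.25))
```

For timestamp-based streaming the condition collapses to δn ≤ δinit, which is exactly the indicator used later in Eq. (6).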
This can be supported if media streams are pre-encoded with appropriate packet dependencies, possibly adapted to the channel (channel-adaptive packet dependency control) [6]. However, state–of–the–art streaming systems (which represent the third category) in general do not apply any of these techniques. Therefore, we restrict ourselves in the following to an important practical scenario, but we note that the generalization to more advanced technologies is straightforward and currently under investigation. We dispense with adaptive playout technologies, as well as any rate–distortion optimization at the streaming server. To support some amount of scalability for the source data, we apply the simplest and practically the only successful rate adaptation scheme for pre–encoded video, namely temporal scalability: We use a group–of–pictures (GOP) structure which includes I-frames, P-frames, and disposable B-frames.

B. Source Abstraction for Streaming

In the following we will briefly introduce a formalized description of the encoded video, basically following the definitions in [5]. We restrict ourselves to video where each video frame is transported in a separate packet referred to as a data unit. However, the framework can be generalized to any source

structure being represented as an acyclic directed graph [5]. The video encoder Qe maps the video signal s = {s1 , . . . , sN } onto a packet–stream P ≜ Qe (s). The packet–stream is defined by the sequence of data units, P = P1 , P2 , . . . . After encoding, each data unit has a certain size rn in bits. For convenience, we define the sampling curve or encoding schedule, Bn , as the overall amount of data produced by the video encoder up to data unit n, i.e.,

Bn ≜ Σ_{j=1}^{n} rj .

We assume a one–to–one mapping between source units sn , n = 1, . . . , N (representing video frames in our case) and data units. Therefore, each video frame sn generates exactly one data unit Pn which can be transported separately over the network. Encoding and decoding of sn with a specific video coder Q results in a reconstructed source which generally differs from the original source. To be more specific, let us define the reconstruction quality for source unit sn as Qn ≜ q(sn , Q(sn )), where q(s, ŝ) measures the rewards/costs when representing s by ŝ. We restrict ourselves in the following exclusively to the Peak Signal–to–Noise Ratio (PSNR), as it is accepted as a good measure to estimate the video performance. However, we are also aware of the flaws and drawbacks of this measure when it comes to the evaluation of lossy video transmission. The total average quality up to source unit n for a sequence of size N is therefore defined as

Qn (N) ≜ (1/N) Σ_{i=1}^{n} Qi ,    (1)

and the total quality is defined as Q ≜ QN (N). According to [5], regardless of how many media objects there are in a multimedia presentation, and regardless of what algorithms are used for encoding and packetizing those media objects, the result is a set of data units for the presentation which can be represented as a directed acyclic graph. If such a set of data units is received by the client, only those data units whose ancestors have all been received can be decoded. We will use this structure in the definition of a straightforward concealment algorithm: In case of a lost data unit, the corresponding source unit is represented by the timely–nearest received and reconstructed source unit instead of the lost source unit. In addition, only direct or indirect ancestors are taken as concealing source units. Note that the Presentation Time Stamp (PTS) tPTS,n of the concealing source unit can be in the future, e.g., for B–frames in video. If there is no preceding source unit, e.g., for I–frames, the lost source unit is concealed with a standard representation, e.g., a grey image. In case of consecutive data unit loss, the concealment is applied recursively. Assume that c(n) = i. If data unit Pi is also lost, the algorithm has to find a concealing source unit for si , i.e., a source unit sj such that c(i) = j. By recursive substitution we can write c(i) = c(c(n)) = j, i.e., source unit sn is not concealed by si , but with source unit sj . To avoid the notation of long concealment chains we write the concealment dependency as j ≺ n ≜ ∃i s.t. (c(i) = j) ∧ ((i ≺ n) ∨ (i = n)). This allows us to define the concealment quality Q̃n (i), if source unit sn is represented with source unit si , as Q̃n (i) ≜ q(sn , Q(si )). Therefore, we express the importance of each data unit Pn as the amount by which the quality at the receiver increases if the data unit is correctly decoded, i.e.,

In ≜ (1/N) ( Qn − Q̃n (c(n)) + Σ_{i=n+1, n≺i}^{N} ( Q̃i (n) − Q̃i (c(n)) ) ) ,    (2)

with Q̃n (i) the concealment quality, c(n) the concealing source unit for source unit sn , and n ≺ i indicating that i depends on n due to the concealment strategy. Additionally, the concealment quality Q̃n (0) means that source unit sn is concealed with a standard representation. In this way, the overall quality can alternatively be computed as

Q = (1/N) Σ_{n=1}^{N} Qn = Q0 + Σ_{n=1}^{N} In ,    (3)

with Qn the single frame quality and the minimum quality, if all frames are presented as grey, defined as

Q0 ≜ (1/N) Σ_{n=1}^{N} Q̃n (0) .    (4)
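The recursive concealment rule can be sketched in a few lines of Python. The tiny I-P-P dependency chain below is an invented example, not the paper's test sequence:

```python
# Sketch of the recursive concealment rule: a lost source unit is
# represented by its timely-nearest received ancestor; if that ancestor is
# also lost, the substitution c(i) = c(c(n)) is applied again, and a unit
# with no received ancestor falls back to a standard (grey) representation.

def conceal(n, received, ancestor):
    """Return the index of the concealing unit for n, or None (grey).

    received -- set of indices of correctly received units
    ancestor -- maps each unit to its timely-nearest ancestor (or None)
    """
    a = ancestor.get(n)
    while a is not None and a not in received:
        a = ancestor.get(a)   # walk up the concealment chain
    return a

# Invented I-P-P chain: unit 0 is an I-frame, units 1 and 2 each
# reference their predecessor.
ancestor = {0: None, 1: 0, 2: 1}
print(conceal(2, {0}, ancestor))     # P-frame 2, direct ancestor lost
print(conceal(1, set(), ancestor))   # nothing received: grey fallback
```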

It is said that quality is incrementally additive with respect to the partial order given by the dependency graph. An important limitation of this incrementally additive model is that the amount by which the quality increases when a data unit is decoded does not depend on whether its sibling or cousin data units are decoded. Thus, for example, in this model, the increase in quality when a B–frame is decoded does not depend on whether or not any other B-frame is decoded. This rules out a number of error concealment techniques. Fortunately, incremental additivity provides a good approximation to reality even in those cases where it is not exact. Obviously, one can think of better concealment algorithms; however, for our purposes this strategy is sufficient. For this work we have encoded a QCIF sequence of length N = 2698 with alternating speakers and sport scenes using the H.264/AVC [7] test model software JM4.0, applying a single quantization parameter of 28 at a frame rate of 30 frames per second without using any rate control algorithm. A typical Group–of–Pictures (GOP) structure IBBPBBP...I has been applied with an I–frame distance of 1 second. The I–frames have the Instantaneous Decoder Refresh (IDR) property; the B–frames are not referenced and are therefore disposable. The PSNR results in Q(N) = 36.98 dB, and the average bit–rate becomes 178.5 kbit/s. Figure 1 shows the normalized sampling curve Bn /BN and the normalized cumulative importance Qn (N)/Q over the DTS tDTS,n . Due to the Variable Bit–Rate (VBR) encoding it is obvious that the PSNR increases almost linearly, whereas the sampling curve has periods with slow increase corresponding to the low–motion speaker scenes and periods with fast increase for the sport scenes. In addition, especially

for low–motion parts, a staircase behavior at I–frame positions is obvious.

Fig. 1. Normalized sampling curve Bn /BN and normalized cumulative importance Qn (N)/Q for 90 seconds test sequence with N = 2698, Q(N) = 36.9833 dB, and BN = 2008271 bytes resulting in an average bit–rate of about 178.5 kbit/s.

Each encoded video frame is packetized into a single Network Abstraction Layer (NAL) unit, which itself is encapsulated into a Real–Time Transport Protocol (RTP) packet according to [8]. The overhead for the RTP header and the NAL header is accounted for in the sampling curve. Therefore, each RTP packet with inherent timing information DTS tDTS,n and packet size rn corresponds to a data unit Pn . In addition, the two–bit NAL Reference Identification (NRI) field in the NAL unit header is set such that disposable B–frames obtain value 0, IDR frames value 3, half of the P–frames value 1, and half of the P–frames value 2, according to their importance In .

C. Streaming Parameters and Performance Criteria

In the following we present a set of parameters relevant when streaming video over lossy and variable–delay channels. In addition, we will provide means to evaluate the performance of such a system. Note that the performance estimation can in no way be complete due to the high variability of the system. Still, we consider the results as representative, and we consider the measures taken as relevant enough to compare different strategies. It is important to note that for an end–to–end system the performance strongly depends on the bit–rate chosen for the multimedia stream and on the amount and importance of the packets not available at the decoder. To be more specific, we define the observed channel behavior at the receiver for data unit Pn as Cn ≜ 1{data unit Pn available}, where 1{A} denotes the indicator function being 1 if A is true and 0 otherwise. In case of a certain observed channel sequence C = {C1 , . . . , CN } and with the definition in (2), the received quality can be expressed as

Q(C) = Q0 + Σ_{n=1}^{N} In Cn Π_{m=1, m≺n}^{n−1} Cm .    (5)
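The NRI labelling rule above can be written down directly. The median threshold used to split the P-frames in half is an assumed realization of the 50/50 split by importance, not a detail given in the paper:

```python
# Sketch of the NRI assignment: disposable B-frames get 0, IDR frames 3,
# and P-frames are split between values 1 and 2 by their importance I_n.
# Using the median importance as the split point is an assumption.

def nri_value(frame_type, importance, p_importance_median):
    if frame_type == "B":
        return 0          # disposable, lowest priority
    if frame_type == "IDR":
        return 3          # highest priority
    # P-frame: the more important half gets 2, the rest gets 1
    return 2 if importance >= p_importance_median else 1

print(nri_value("IDR", 0.9, 0.5))
print(nri_value("P", 0.7, 0.5))
print(nri_value("P", 0.2, 0.5))
print(nri_value("B", 0.1, 0.5))
```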

The video decoder might experience the absence of certain data units Pn in the decoding process due to one of three different reasons: 1) The data unit can be lost due to impairments on the mobile radio channel or buffer overflow in the network, i.e., δn = ∞; 2) it arrives after its decoding time has expired such that it is no longer useful ("late loss"), i.e., δn > tDTS,n − ts,n ; 3) the server has not even attempted to transmit the data unit. Whereas the former two reasons mainly depend on the channel, the latter can be viewed as temporal scalability and a simple means of offline rate control. In the remainder of this work we assume that the server transmits all data units of the stream. Each data unit is assigned a nominal sending time ts,n , which is computed from the DTS tDTS,n and the sending time of the first data unit, ts,1 . In the timestamp–based streaming (TBS) case the server forwards data unit Pn exactly at time ts,n to the network. In ahead–of–time streaming (ATS) the server can possibly transmit data unit Pn ahead of time. Note that this requires the decoder buffer to be sufficiently large, and that the network and intermediate buffers can support higher instantaneous data rates. More details on streaming server strategies are discussed in section IV, where the interaction of network and streaming server is considered. The second parameter of choice in our end–to–end streaming system is the initial delay δinit at the decoder. On the one hand, this value should be kept as low as possible: significant startup delay is annoying to the end user, and if the video does not play back after a certain time, the end user might even stop the playback, assuming that the network is starving. This is even more obvious for interactive applications where the user wants to react and modify the stream. For example, if switching to a different channel takes too long, the user might miss important information. This is especially undesirable for time–critical surveillance applications. On the other hand, a longer initial delay can compensate for larger channel delays, and therefore the loss due to cause 2 in the above enumeration can be reduced. We defer more advanced transmit and receiver strategies such as stream–switching, rate–distortion optimized streaming, out–of–order transmission, rebuffering at the client, or adaptive playout for later, as we want to exclude too many different and possibly counteracting influences on our results. Obviously, any of these techniques can enhance the overall system performance. However, one cannot expect that most state-of-the-art streaming servers and clients make use of these features. Under these premises we can reduce the evaluation of the streaming performance to the availability of a sequence of channel delays δ = {δ1 , . . . , δN } for each data unit Pn with n = 1, . . . , N and a predefined initial delay δinit , yielding

Q(δ, δinit ) = Q0 + Σ_{n=1}^{N} In 1{δn ≤ δinit } Π_{m=1, m≺n}^{n−1} 1{δm ≤ δinit } .    (6)

As a second measure of interest we introduce the percentage of lost data units P(δ, δinit ), which is computed as

P(δ, δinit ) = (1/N) Σ_{n=1}^{N} 1{δn > δinit } .    (7)
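A numerical sketch of Eqs. (6) and (7), with invented delays, importances, and dependencies standing in for a measured channel:

```python
# Sketch: evaluate received quality per Eq. (6) and the lost-unit
# percentage per Eq. (7). depends_on[n] lists the indices m "preceding"
# n in the dependency order; all numbers are illustrative.

def evaluate(delays, importance, q0, depends_on, init_delay):
    ok = [d <= init_delay for d in delays]          # on-time indicator per unit
    quality = q0 + sum(
        imp for n, imp in enumerate(importance)
        if ok[n] and all(ok[m] for m in depends_on[n])
    )
    loss = sum(1 for o in ok if not o) / len(delays)  # Eq. (7)
    return quality, loss

# Invented three-unit example: unit 1 misses the deadline, so its
# importance is lost, but units 0 and 2 (which depend only on 0) count.
q, p = evaluate([0.1, 2.0, 0.2], [3.0, 2.0, 1.0], 10.0, [[], [0], [0]], 0.5)
print(q, p)
```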

In general, the channel delay sequence δ is not deterministic, but varies significantly. Therefore, to evaluate the performance of video streaming applications it is essential to obtain a reasonable distribution of the channel delay sequence δ. However, the problem is that for complex systems this value is neither deterministic nor can it be determined by a simple statistical description as, for example, suggested in [5] for the modeling of Internet packet delays. The approach taken to simulate the channel delay sequences for a wireless multiuser system will be discussed in further detail in the following.

III. WIRELESS MULTIUSER STREAMING SYSTEM

A. General System Overview

Figure 2 shows a simplified model of our investigated system. We assume that several users in the serving area of a base station in a mobile system have requested to stream multimedia data from one or more streaming servers. In the following we are almost exclusively interested in the downlink of this system. We assume that the core network is over–provisioned such that congestion is not an issue on the backbone. The streaming server forwards the packets directly into the radio link buffers, where packets are kept until they are transmitted over a shared wireless link and finally arrive at the media client. Each single end–to–end streaming connection behaves as discussed in section II. The number of users is denoted as M in the following. At the entrance of the wireless system a scheduler decides which users can access the wireless system: Rather than sharing a common bit–rate as in the case of wired transmission, the mobile users in this serving area share common physical resources, namely the total available bandwidth and the transmit power. A resource allocation unit integrated in the scheduler assigns these resources appropriately. It is obvious that for the same available resources different users can transmit a different amount of data.
For example, for the same available resources a user close to the base station can use a coding and modulation scheme which allows it to transmit at a higher bit–rate than a user at the boundary of the serving area. In general, the performance of the streaming system depends significantly on many parameters, such as the buffer management, the scheduling algorithm, the resource allocation, the bandwidth and power share, the number of users, etc. In the following we will concentrate on a specific multiuser system which includes many advanced radio link features. We will briefly discuss the variability of this system with focus on streaming applications.

B. System Example – HSDPA

The system under investigation in this work is HSDPA [9], which is part of the Universal Mobile Telecommunications System (UMTS) release 5 specification and has been introduced to increase the packet data throughput to mobile

Fig. 2. Simplified multiuser streaming system.

terminals significantly, while optimizing the resource allocation efficiency at the same time. The key new features compared to standard UMTS packet transmission modes are the use of adaptive modulation and coding to perform link adaptation instead of fast power control, fast Layer-1 hybrid ARQ with transmission combining, as well as fast scheduling directly at the Node-B on a very short time-scale of 2 ms. Thus, much of the signal processing previously performed in the radio network controller has been moved as close to the air interface as possible to allow immediate reaction to varying channel characteristics. We have chosen to investigate our proposed buffer management strategies in an HSDPA environment, since this type of cellular link seems to be very well-suited to accommodate the high data rate and low latency requirements typical of streaming applications. For this reason, we have extended our existing WiNe2 emulation platform to support HSDPA features at the data link and physical layer. The complete simulation and modeling will be described in section V.

C. Multi–user Streaming – Problem Formulation and Performance Measures

In case multiple users attempt to stream data, in addition to the parameters of the single user end–to–end streaming system, the overall performance is also influenced by the algorithms and methods applied in the wireless network, as well as by the variance and dynamics of both the wireless system and the applications. The channel delay sequences δm for each individual user depend, among others, on
• the activity and mobility of users in the system,
• the variability of the channel for individual mobile users,
• the traffic characteristics of the individual users,

• the performance of the physical layer signal processing, such as adaptive modulation and coding schemes, retransmissions, etc.,
• the applied scheduling and resource allocation algorithms,
• the radio link buffer management at the entrance to the wireless shared channel.
In the following we will focus on the latter two aspects, as they are of major interest in the design of the system. For this purpose we will attempt to set up a reasonable scenario and fix all other parameters appropriately. In addition, to address the aforementioned problems and to find solutions for different combinations, it is necessary to define suitable performance measures which allow us to assess and compare different algorithms and parameter settings. For this purpose we extend the performance criteria of the single user system by taking the mean of the values defined in (6) and (7), averaging over all users, i.e.,

Q(M, δinit ) = (1/M) Σ_{m=1}^{M} Q(δm , δinit )    (8)

P(M, δinit ) = (1/M) Σ_{m=1}^{M} P(δm , δinit ) .    (9)

Although these measures may not include all relevant information, they are still very helpful in an initial performance estimation of the system and will be used to present our experiments. We would also like to note that we focus on VBR encoded video, as we hope that due to statistical multiplexing the performance of the overall system can be increased. We have provided theoretical justification for this conjecture in [10]. This leaves the modeling and simulation of appropriate channel delay sequences δm for a specific system. We will address this problem by introducing the extended version of our simulation tool WiNe2 in section V.

IV. SCHEDULING AND BUFFER MANAGEMENT STRATEGIES

A. The Scheduling and Resource Allocation Unit

An important component of an HSDPA system is the scheduling unit located at the Medium Access Control (MAC) layer. The scheduling in HSDPA is not standardized and therefore allows for optimization. For this purpose we have decided to implement different scheduling algorithms for performance comparison. The scheduling unit consists of two functional elements, the resource allocator and the scheduler itself. Given a certain resource budget, e.g., transmission power and a number of spreading codes, the task of the resource allocator is to determine an allocation for each active flow that best utilizes this budget. The allocation consists of the selection of an Adaptive Modulation and Coding (AMC) scheme, the number of spreading codes, and the assigned transmit power, if a certain user or one of its corresponding flows were selected by the scheduler. The latter then selects one or several flows for each Transmission Time Interval (TTI) according to a certain scheduling policy, possibly taking into account information
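The link-adaptation step of the resource allocator can be sketched as a table lookup over AMC schemes. The SINR thresholds and per-TTI payloads below are invented placeholders, not HSDPA specification values:

```python
# Sketch: pick the highest-rate modulation-and-coding scheme whose SINR
# requirement the user currently meets. The table entries are
# illustrative assumptions, not standardized values.

MCS_TABLE = [
    (0.0, 1200),    # (minimum SINR in dB, payload bits per 2 ms TTI)
    (5.0, 2400),
    (10.0, 4800),
    (15.0, 9600),
]

def select_mcs(sinr_db):
    feasible = [mcs for mcs in MCS_TABLE if sinr_db >= mcs[0]]
    return max(feasible, key=lambda mcs: mcs[1]) if feasible else None

print(select_mcs(12.0))
print(select_mcs(-3.0))
```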

available from the channel and the application. However, note that regardless of the scheduling policy in case of wireless shared channels (in contrast to dedicated channels) only flows are scheduled which actually do have data to be transmitted in the corresponding queue. Some basic and enhanced scheduling algorithms have been proposed in the literature, eg in [13]– [15]. We will briefly present and characterize those strategies integrated into our HSDPA emulation environment. For more details on the algorithms we refer the interested reader to the provided references. 1) Basic scheduling strategies: Well–known wireless and fixed network scheduling algorithms include, for example, the Round Robin scheduler, which serves users and flows cyclically without taking into account any information from the channel or the traffic characteristics. 2) Channel–State Dependent Schedulers: The simplest, but also most appealing idea for wireless shared channels - in contrast to fixed network schedulers - is the exploitation of the channel state of individual users when selecting the next flow to transmit. Obviously, if the flow of the user with the highest signal–to–noise–and–interference ratio is mapped to an appropriate AMC scheme at any time instant, the overall system throughput is maximized. This scheduler is therefore referred to as Maximum Throughput (MT) scheduler and may be the most appropriate if throughput is the measure of interest. However, as flows of users with bad receiving conditions are blocked, some basic fairness is often required in the system. For example, the Proportional–Fair policy schedules the user which has currently the highest ratio of actual to the user’s average throughput. 3) Queue–Dependent Schedulers: The previously presented algorithms do not take into account the buffer fullness at the entrance to the wireless system except that flows without any data to be transmitted are excluded from the scheduling process. 
Queue-dependent schedulers take into account exactly this information. For example, the Maximum Queue (MQ) scheduler selects the flow whose Head-of-Line (HOL) packet currently has the largest waiting time in the queue.

4) Hybrid Scheduling Policies: It has been recognized that it can be beneficial to combine both criteria, the channel state information and the queue information, in the scheduling algorithm. In [16] hybrid algorithms have been proposed under the acronyms Modified Largest Weighted Delay First (MLWDF) and Exponential Rule, which yield the most promising results among the standard solutions, but require a careful choice of gain factors and thresholds. Although all of these policies tend to neglect the special properties of streaming media flows, we have decided to evaluate their performance when transmitting real-time multimedia data. In addition, we are currently experimenting with advanced schedulers that take into account, for example, side information from upper layers about the structure of the stream [17].
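All of the above policies can be viewed as maximizing a different per-flow metric in each TTI. The following sketch makes this concrete; the `Flow` fields, the numeric values, and the gain factor are invented for illustration, and the MLWDF rule is shown in a simplified form (HOL waiting time weighted by the achievable rate) without its delay thresholds.

```python
# Hypothetical per-flow state; names are illustrative, not from the paper.
class Flow:
    def __init__(self, name, rate, avg_tput, hol_wait, backlog):
        self.name = name            # flow identifier
        self.rate = rate            # currently achievable rate (from channel state)
        self.avg_tput = avg_tput    # averaged past throughput
        self.hol_wait = hol_wait    # waiting time of the Head-of-Line packet (s)
        self.backlog = backlog      # number of queued data units

def schedule(flows, policy, gamma=1.0):
    """Select one backlogged flow for the next TTI under the given policy."""
    eligible = [f for f in flows if f.backlog > 0]  # only flows with data
    if not eligible:
        return None
    metrics = {
        "MT":    lambda f: f.rate,                       # maximum throughput
        "PF":    lambda f: f.rate / f.avg_tput,          # proportional fair
        "MQ":    lambda f: f.hol_wait,                   # maximum queue (HOL wait)
        "MLWDF": lambda f: gamma * f.hol_wait * f.rate,  # simplified hybrid rule
    }
    return max(eligible, key=metrics[policy])

flows = [Flow("A", rate=8.0, avg_tput=4.0, hol_wait=0.2, backlog=5),
         Flow("B", rate=2.0, avg_tput=0.5, hol_wait=1.5, backlog=9)]
print(schedule(flows, "MT").name)     # A: best channel wins
print(schedule(flows, "MLWDF").name)  # B: long HOL wait outweighs worse channel
```

The example illustrates the trade-off discussed above: MT always favors the user with the best channel, while the hybrid rule can prefer a user with a worse channel once its HOL packet has waited long enough.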

B. Radio Link Buffer Management

Even before the data units are segmented into appropriate radio link packets and forwarded to the scheduler in the MAC layer, they are stored within the Radio Link Control (RLC) layer in a structure referred to as the radio link buffer in the following. We assume that any buffers in the lower layers of the wireless system, e.g., in the MAC layer, are only supplied with "just enough" segmented data units and never overflow. Hence, the primary control of the input flows is performed at the entry point to the air interface, the radio link buffer. The radio link buffer management controls the fill process of the buffer of each flow at the RLC layer. Note that we assume that this buffer stores entire IP packets, which in our case are abstracted to data units. For better insight into the problem, we assume that the radio link buffer can store N data units, independent of their size. While in practical systems the physical memory of this buffer should not be a limiting factor, we will see from the experiments that a limited buffer size is sufficient or even preferable for real-time applications. In our scenario the data units Pn released by the streaming server arrive instantaneously at the radio link buffer. These buffers are emptied by the underlying transmission processes according to the scheduling and resource allocation policy. If the radio link buffers are not emptied fast enough, because the channel is too bad and/or too many streams are competing for the common resources, the wireless system approaches or even exceeds its capacity, and the buffer fullness of individual streams will quickly increase. When the buffer fullness approaches the buffer size N, data units in the queue have to be dropped.
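The basic mechanism, a bounded per-flow queue that must evict some data unit once its fullness reaches N, can be sketched as follows. The function and variable names are our own, and the strategy acronyms anticipate those defined in the remainder of this section (DPB would additionally inspect priority information, which is omitted here).

```python
import random
from collections import deque

def enqueue(buffer, packet, N, policy="DHP"):
    """Admit a data unit to a radio link buffer holding at most N units.
    On overflow, the drop policy decides which unit is sacrificed.
    Illustrative sketch, not the emulator's actual implementation."""
    if len(buffer) < N:
        buffer.append(packet)
        return None                      # nothing dropped
    if policy == "DNA":                  # drop the new arrival itself
        return packet
    if policy == "DRP":                  # drop a randomly chosen queued unit
        idx = random.randrange(len(buffer))
        victim = buffer[idx]
        del buffer[idx]
    else:                                # "DHP": drop the Head-of-Line unit,
        victim = buffer.popleft()        # i.e. the longest-waiting one
    buffer.append(packet)                # new arrival enqueued at last position
    return victim

buf = deque()
drops = [enqueue(buf, n, N=3, policy="DHP") for n in range(5)]
print(list(buf))                            # [2, 3, 4]: oldest units dropped
print([d for d in drops if d is not None])  # [0, 1]
```

Under DHP the surviving queue always holds the most recent N units, which matches the motivation given below: the longest-waiting unit is the one most likely to have missed its deadline already.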
We will present and discuss several possible buffer management strategies in the following:

1) Infinite Buffer Size (IBS): Each radio link buffer has infinite size, N = ∞, which guarantees that the entire stream can be stored. No packets are dropped; packets can only arrive delayed at the streaming client. This is the standard procedure for a system with sufficient physical memory for a practical number of parallel streams.

2) Drop New Arrivals (DNA): Only N packets are stored in the radio link buffer. In case of a full queue, newly arriving packets are dropped and therefore lost. Note that this is the standard procedure applied in a variety of elements in a wired network, e.g., routers.

3) Drop Random Packet (DRP): Same as DNA, but instead of dropping the newly arrived packet we randomly pick a packet in the queue to be dropped. The incoming packet is enqueued at the last position. This strategy is somewhat uncommon, but we have included it here since all other possibilities are only specific deterministic variants of it.

4) Drop HOL Packet (DHP): Same as DRP, but instead of a random pick we drop the Head-of-Line (HOL) packet, i.e., the packet that has resided longest in the buffer. The incoming packet is enqueued at the last position. Our motivation for this counterpart to DNA is the fact that streaming media packets usually have a deadline associated with them. Hence, in order to avoid inefficient use of channel resources for packets that are subject to late loss at the media client anyway, we drop the packet with the highest probability of deadline violation.

5) Drop Priority Based (DPB): Similar to DHP, but priority information is exploited. Assuming that each data unit has priority information assigned to it, we drop the lowest-priority data unit that has resided longest in the buffer. The incoming packet is again enqueued at the last position. Our motivation for this strategy is the fact that sophisticated media codecs, like H.264/AVC, provide options to indicate the importance of certain elements of a media stream on a very coarse scale. Hence, this side information should be used to first remove those packets whose absence affects the end-to-end quality least.

C. Streaming Server Rate Control

Finally, we briefly discuss the two basic streaming server rate control modes, since they have a non-negligible influence on the system performance. In the following we assume that no dynamic end-to-end rate control is performed between server and client (which might be too pessimistic for more advanced streaming systems). However, radio link buffer management can also be viewed as a simple dynamic means to adjust the data rate, at the cost of lower quality, before the packets enter the wireless system. Thus, dropping at the radio link buffer is almost equivalent to dropping at the server (provided that the wired links are over-provisioned).

1) Timestamp-Based Streaming: In case of TBS the data units Pn are transmitted exactly at their sending times ts,n. If the radio link buffer is emptied faster than data units arrive according to the transmission schedule, it may underrun. In this case the flow is excluded from scheduling even if it would otherwise have been selected by the scheduler.

2) Ahead-of-Time Streaming: In contrast, in case of ATS we assume that the streaming server is notified that the radio link buffer can accept packets.
In this case, the streaming server forwards data units to the radio link buffer even before the nominal sending time ts,n of data unit Pn, such that the radio link buffer never underruns and all flows are always considered by the scheduler. However, the streaming server still has to forward each data unit no later than at ts,n, regardless of the fill notification. Thus, a drop strategy at the radio link buffer remains necessary. Note that this server mode only applies to pre-recorded streams and requires a large decoder buffer at the client.

V. SELECTED EXPERIMENTAL RESULTS

A. The Simulation Platform

The WiNe2 demonstration platform [11] has been developed to provide a common environment for both objective and subjective performance evaluation of real-time multimedia solutions over wireless networks. It consists of a system-level simulator for the wireless protocol stack at packet level, which is capable of operating in two modes: In the virtual-time mode (which we have used in this work), detailed performance studies of different system modifications can be conducted under reproducible simulation conditions. The traffic running

Fig. 3. a),b) Average PSNR Q(M = 10, δinit) versus initial delay δinit (0 to 7000 ms) for MT scheduling and simple radio link buffer management strategies (IBS, DNA, DRP, DHP); a) N = 30, b) N = 150, TBS.

over the simulator is either produced by stochastic generators or read from trace files, and the output statistics computed by the respective traffic sinks are usually given in terms of objective performance criteria. In the real-time mode, however, additional live IP-based multimedia traffic can be injected from outside into the system-level simulator, which either delays or drops the IP packets at the output according to the characteristics of the dynamic wireless link model. This allows for subjective evaluation of the perceived quality of audio and video applications over existing, emerging, and future mobile networks, and can be used to test, evaluate, and assess new multimedia services by subjective observation, as well as for demonstration purposes. The existing WiNe2 platform is capable of emulating General Packet Radio Service (GPRS), EGPRS, and UMTS dedicated channels. Recently, we have extended WiNe2 by an HSDPA simulation stack including a full version of the layer-2 functionality in the user data plane, i.e., Packet Data Convergence Protocol (PDCP), RLC, and MAC layer, as well as AMC and hybrid Automatic Repeat reQuest (ARQ) features. The physical layer, however, is modeled on an abstract level to enable real-time operation, i.e., the entire coding, modulation, and transmission process is simulated via error statistics generated in extensive offline link-level simulations. The usage of these patterns is nevertheless dynamic, i.e., for each TTI a new operating point of the system, specified by a certain signal-to-noise-and-interference value, is chosen for each user based on standard cellular propagation models and the effect of self- and intracell interference due to fast fading [12].
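This abstract physical-layer model can be pictured as a lookup from the per-TTI operating point to an AMC scheme and its residual error statistic. The sketch below uses invented SINR thresholds and error values, not the actual offline link-level tables of the emulator.

```python
import random

# Illustrative offline link-level table: per AMC scheme, the minimum SINR (dB)
# at which it is selected and its residual block error rate at that point.
# All values are made up for the sketch, not taken from the paper.
AMC_TABLE = [
    ("QPSK-1/3",  -2.0, 0.10),
    ("QPSK-3/4",   4.0, 0.09),
    ("16QAM-1/2",  9.0, 0.08),
    ("16QAM-3/4", 14.0, 0.07),
]

def transmit_tti(sinr_db, rng):
    """Abstracted physical layer: pick the highest AMC scheme supported at the
    current SINR operating point and decide success from its error statistic."""
    scheme, bler = "none", 1.0           # below range: nothing decodable
    for name, threshold, err in AMC_TABLE:
        if sinr_db >= threshold:
            scheme, bler = name, err     # keep the highest supported scheme
    ok = rng.random() >= bler            # one Bernoulli draw per TTI
    return scheme, ok

rng = random.Random(0)
print(transmit_tti(10.5, rng))           # 16QAM-1/2 is selected at this SINR
```

The per-user SINR fed into such a function would come from the propagation and interference models mentioned above; a hybrid-ARQ retransmission would simply be another call with combined-SNR-adjusted statistics.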
The channel estimation procedure at the mobile stations follows the principal guidelines stated in the 3GPP specifications for HSDPA, but can be influenced by adding a dynamic offset, thus taking into account the effect of advanced receiver structures using, for example, interference cancellation instead of a standard Rake receiver. For more details on the HSDPA physical layer features we refer the interested reader to [9].

B. Definition of Test Scenarios

We consider a hexagonal cellular layout with one serving base station (called Node-B) and 8 tiers of interfering base

stations. The cell radius is 1 km, and a Node-B serves three sectors of 120 degrees each. The propagation environment is of type Pedestrian-B as specified in [18]. The Node-B transmit power is limited to 20 W per sector, of which 90% is available for HSDPA. Together with the 15 available spreading codes, this represents the resource budget to be assigned by the scheduler per TTI. All user terminals apply HARQ with chase combining at the receiver, and the maximum number of transmissions of one radio link packet is 4. A total of M = 10 users are attached to the serving Node-B; they are placed randomly in the serving area and move at a speed of 3 km/h. We assume that users do not enter or exit the serving area during a simulation run by using appropriate mobility models reflecting a random walk with strongly correlated angle of direction. Each user has requested a streaming service at the same point in time for the same test sequence of Figure 1. For each user the sequence starts at a random initial frame and is looped six times, such that nine minutes of streaming service, corresponding to about 18,000 data units, are simulated.

C. Buffer Management Performance for MT Scheduling

In a first set of experiments we fix the scheduling policy to maximum-throughput scheduling, as a key motivation of HSDPA is the exploitation of the channel state to maximize the overall throughput. In Figures 3a,b we compare different simple buffer management strategies for this scheduling policy by evaluating the average PSNR Q(M = 10, δinit) over the initial delay δinit selected at the receiver, which is varied from 0 to 7 seconds. For the radio link buffer size we have selected N = 30 (Figure 3a) and N = 150 (Figure 3b) data units, corresponding to 1 and 5 seconds of video data, respectively.
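The monotone dependence of quality on the initial delay discussed next can be made concrete with a toy late-loss check: a data unit is decodable only if its channel delay does not exceed δinit. The delays below are invented, and the simple in-time ratio stands in for the full quality measure, which additionally maps decodable frames to PSNR.

```python
def in_time_ratio(channel_delays_ms, delta_init_ms):
    """Fraction of data units whose channel delay meets the playout deadline
    implied by the initial delay delta_init (toy stand-in for Q(M, delta_init))."""
    on_time = sum(1 for d in channel_delays_ms if d <= delta_init_ms)
    return on_time / len(channel_delays_ms)

delays = [120, 900, 2400, 350, 5100, 700]   # per-data-unit channel delay (ms)
for delta_init in (1000, 3000, 7000):
    # ratio grows monotonically with the initial delay, as in Figs. 3-5
    print(delta_init, in_time_ratio(delays, delta_init))
```

A larger initial delay never hurts this ratio, which is exactly the trend of the PSNR curves; what the toy model omits is that in an overloaded system the delay distribution itself depends on the buffer management and scheduling choices.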
Regardless of the buffer size and drop strategy, the system performance increases with larger initial delay, as the probability of in-time packets increases. Since the system is overloaded by about 20%, in case of IBS the fullness of the radio link buffers grows over the length of the streams. As no dropping is performed, an excessive initial delay is required for sufficient performance. Still, it is worth noting that due to the MT policy at least

Fig. 4. a),b) Average PSNR Q(M = 10, δinit) versus initial delay δinit for MT scheduling and advanced streaming and radio link buffer management strategies (IBS/TBS, DHP/TBS, DPB/TBS, DHP/ATS); a) N = 30, b) N = 150.

some users, namely those close to the base station, are served with good quality, while users with worse channel conditions experience excessively high channel delays in this setup. Hence, for improving the overall system performance it is beneficial to drop data units at the radio link buffers (irrespective of the strategy) to reduce the excess load at the air interface: By keeping the buffer size within the expected range of tolerable initial delay at the client, at least an in-time delivery of a temporally scaled version of the video stream can be guaranteed. The most obvious strategy, DNA, i.e., dropping the newly arrived packet, performs worse than dropping packets randomly. By far the best performance is obtained by dropping the HOL packet, as it is likely that in case of a full buffer the HOL packet has expired anyway and thus should not be transmitted. It is also apparent that the buffer size N should be chosen to match the selected initial delay δinit. However, in general the radio link buffer management is unaware of the initial delay selected by the receiver. Therefore, conservative HOL dropping with a smaller buffer size might be favorable. Due to its superior performance, we will concentrate on the DHP strategy in the following.

D. Performance of Advanced Strategies

In this section we evaluate the influence of advanced streaming strategies for maximum-throughput scheduling. For comparison, the performance curves for IBS (lower bound) and DHP (current upper bound) from Figures 3a,b are repeated in Figures 4a,b. The first interesting observation is that the additional side information for DPB does not yield the initially expected gains. Moreover, in the region of practical values of the initial delay, DPB is worse than simple DHP. This is due to the fact that for the chosen priority scheme DPB leads to queues full of large I-frames for the worst users.
However, although its priority is high, the HOL packet is very often no longer useful to the receiver, as its decoding timestamp has expired. The queue virtually blocks itself with a pile of large I-frames. However, we also observe that for a sufficiently large initial delay DPB will outperform DHP, since then the higher-priority frames are likely to arrive in time. We also want to mention that with the help of priority

information the media client can dynamically select its initial delay δinit appropriately, an option that is not contained in our analysis. Therefore, we conjecture that in combination with rebuffering, adaptive initial delay determination, or adaptive playout strategies, priority-based dropping can be beneficial. This aspect is currently under investigation. Figures 4a,b also show the performance of ahead-of-time streaming (ATS), which obviously performs much worse than timestamp-based streaming except for very low δinit. The reason is that due to the applied scheduling strategy only the best users are served. Data units for good users therefore arrive early at the client, at the expense of users with worse channels, whose data units are significantly delayed and miss their deadlines. This unfair behavior results because the buffers of the good users never underrun, and thus they are always taken into account in the scheduling process. These users can be viewed as behaving unfairly, as they take away radio resources that are not immediately essential for in-time delivery. The optimized exploitation of the system throughput does not translate into quality for real-time services, and therefore we recommend that ATS be used with care in shared wireless environments.

E. Scheduler Comparison and Fairness

Finally, we investigate the influence of the scheduler on the performance of the streaming application. To this end, Figure 5 shows the average PSNR for TBS and ATS with DHP and DPB under the MT and MLWDF scheduling policies. The radio link buffer size has been fixed to N = 30.
For initial delays that are very short with respect to the buffer size (N = 30 packets equals 1 s of video), the MT scheduler with DHP performs best for both TBS and ATS: The MLWDF scheduler then only manages to transmit packets of users with bad channels, which have resided too long in the buffer and are subject to late loss anyway. The suboptimal performance of DPB compared to DHP holds for both MT and MLWDF. As soon as the initial delay matches the buffer size, the fairness criterion becomes important. This is especially true for ATS, for which the MLWDF scheduler

Fig. 5. Average PSNR Q(M = 10, δinit) versus initial delay δinit for different streaming strategies (DHP and DPB with TBS and ATS, N = 30). Schedulers are MT and MLWDF.

exhibits a significant performance gain compared to the MT policy for medium-range initial delay values: Since all users compete for resources during each scheduling cycle, a policy that includes both the channel state and the waiting times in the queue improves fairness among users and increases the overall quality. In case of TBS, the MLWDF performance in the upper initial delay region shows only small gains compared to the unfair MT scheduling due to the inherent coarse "rate control". Hence, fairness is not a major issue when timestamp-based streaming is considered. Finally, we want to add that more advanced priority-based scheduling policies are expected to improve the performance of DPB, since self-blocking by high-priority frames then becomes less likely. Furthermore, a deadline-based scheduling policy may allow ATS to outperform TBS by always operating at the maximum capacity of the channel. Issues regarding performance gain versus complexity are currently under investigation.

VI. CONCLUSION

In this paper we have investigated strategies for joint radio link buffer management and scheduling for video streaming over wireless shared channels. We have shown that in case of timestamp-based streaming (TBS) and a fixed initial delay at the streaming client, limited buffer sizes together with sophisticated dropping strategies significantly improve the overall achievable reception quality under system overload. Thus, buffer management at the air interface has a similar effect to dynamic server-based rate control schemes, but avoids the need for frequent end-to-end link probing. The optimal dropping strategy for incoming IP packets at the radio link buffer is to drop the packet with the longest waiting time in the buffer (DHP), since most likely its deadline has already expired anyway. Furthermore, fairness among users is not an important issue in this streaming setup. We thus propose to use the simple maximum-throughput policy to exploit the full overall system throughput as much as possible.

In a second investigation we have demonstrated that ahead-of-time streaming (ATS) is mostly inferior to TBS for the given combinations of buffer management and scheduling strategies. In addition, this type of streaming may result in severe unfairness among users with different channel characteristics; it thus requires a TCP-like rate control and is therefore more complex than simple TBS. Hence, we propose to take great care when using this strategy unless more sophisticated schedulers using, for example, deadline-based policies are available. The latter are the subject of ongoing research at our institute, and results will hopefully be available in the near future. In addition, more results that support our conclusions, but could not be included in this paper due to the limited length, can be found on our webpage (http://www.wine2.org).

REFERENCES

[1] H. Zheng, "Optimizing wireless multimedia transmission through cross layer design," in Proc. ICME 2003, Baltimore, MD, USA, July 2003.
[2] J. Zhimei and L. Kleinrock, "A packet selection algorithm for adaptive transmission of smoothed video over a wireless channel," Journal of Parallel & Distributed Computing, vol. 60, no. 4, April 2000.
[3] B. Girod, M. Kalman, Y. J. Liang, and R. Zhang, "Advances in video channel-adaptive streaming," in Proc. IEEE International Conference on Image Processing, Rochester, NY, USA, Sept. 2002.
[4] M. Kalman, E. Steinbach, and B. Girod, "Adaptive media playout for low-delay video streaming over error-prone channels," IEEE Trans. Circuits Syst. Video Technol., 2004.
[5] P. A. Chou and Z. Miao, "Rate-distortion optimized streaming of packetized media," 2001, submitted. http://research.microsoft.com/~pachou.
[6] Y. Liang and B. Girod, "Rate-distortion optimized low-latency video streaming using channel-adaptive bitstream assembly," in Proc. IEEE ICME, Lausanne, Switzerland, Aug. 2002.
[7] A. Luthra, G. Sullivan, and T. Wiegand, Eds., Special Issue on the H.264/AVC Video Coding Standard, vol. 13, no. 7, July 2003.
[8] S. Wenger, M. Hannuksela, T. Stockhammer, M. Westerlund, and D. Singer, "RTP payload format for H.264 video," Internet Engineering Task Force (IETF), Internet Draft draft-ietf-avt-rtp-h264-04.txt (work in progress), Feb. 2004.
[9] H. Holma and A. Toskala, WCDMA for UMTS. New York, NY, USA: John Wiley & Sons, 2002.
[10] M. Mecking and T. Stockhammer, "Source-controlled resource allocation," in Proc. ITG Conference on Source and Channel Coding, Berlin, Germany, Jan. 2002.
[11] T. Stockhammer, G. Liebl, H. Jenkac, P. Strasser, D. Pfeifer, and J. Hagenauer, "WiNe2 - wireless network demonstration platform for IP-based real-time multimedia transmission," in Proc. Packet Video Workshop 2003, Nantes, France, Apr. 2003.
[12] A. Seeger, M. Sikora, and A. Klein, "Variable orthogonality factor: a simple interface between link and system level simulation for high speed downlink packet access," in Proc. VTC Fall 2003, Orlando, FL, USA, Oct. 2003.
[13] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, and P. Whiting, "Providing quality of service over a shared wireless link," IEEE Communications Magazine, vol. 39, pp. 150-154, Feb. 2001.
[14] H. Fattah and C. Leung, "An overview of scheduling algorithms in wireless multimedia networks," IEEE Transactions on Wireless Communications, Oct. 2002.
[15] S. H. Kang and A. Zakhor, "Packet scheduling algorithm for wireless video streaming," in Proc. International Packet Video Workshop 2002, Pittsburgh, PA, USA, Apr. 2002.
[16] S. Shakkottai and A. L. Stolyar, "Scheduling algorithms for a mixture of real-time and non-real-time data in HDR," in Proc. 17th International Teletraffic Congress (ITC-17), Salvador, Brazil, Sept. 2001.
[17] R. S. Tupelly, J. Zhang, and E. K. Chong, "Opportunistic scheduling for streaming video in wireless networks," in Proc. 37th Annual Conference on Information Sciences and Systems, Baltimore, MD, USA, Mar. 2003.
[18] ETSI, TR 101.112 v3.2.0, "Selection procedures for the choice of radio transmission technologies of the UMTS," 1998.
