Video Streaming with Constant-Quality Rate Adaptation, Prioritized ...

FGS MPEG-4 Video Streaming with Constant-Quality Rate Adaptation, Prioritized Packetization and Differentiated Forwarding

Lifeng Zhao, Jitae Shin, JongWon Kim and C.-C. Jay Kuo
Integrated Media Systems Center and Department of Electrical Engineering-Systems
University of Southern California, Los Angeles, CA 90089-2564
E-mail: {lifengzh, jitaeshi, jongwon, cckuo}@sipi.usc.edu

ABSTRACT

A video streaming solution with the fine granular scalable (FGS) MPEG-4 stream delivered over prioritized networks is investigated in this work. An optimal truncation strategy for constant-quality rate adaptation is first presented by embedding the rate-distortion (R-D) information based on a piecewise linear R-D model. Then, the video stream is prioritized for differentiated dropping and forwarding, where rate adaptation is dynamically performed to meet the time-varying available bandwidth. It is shown that, although the prioritized stream benefits from the prioritized network, its gain is heavily dependent on how well video source and network priorities match each other. All key components, including FGS encoding, rate adaptation and packetization, error resiliency decoding and differentiated forwarding, are seamlessly integrated into one framework. By focusing on the end-to-end quality, we set both source and network parameters properly to achieve a superior performance of FGS video streaming.

Keywords: MPEG-4 FGS, video streaming, rate adaptation, differentiated service (DiffServ), unequal error protection (UEP), prioritized packetization, and priority networks.

1. INTRODUCTION

Streaming media applications over IP (Internet protocol) networks have attracted increasing attention as an enabling technology for future media distribution. Streaming offers a significant improvement over the download-and-play approach for both on-line and off-line media delivery from a media server. Users may have different preferences, processing capabilities and diverse network access mechanisms to the streaming server. The heterogeneity of users and networks demands a highly scalable video coding solution and a flexible delivery technique to overcome the challenges imposed by the best-effort Internet. The unpredictable channel variation requires finer granularity than the layered representation offered by conventional MPEG-2 and H.263+ video. The complex dependency geared for coding efficiency poses another bottleneck since it affects the robustness of video transmission in an error-prone environment. The fine grain scalability (FGS) of MPEG-4 [1] is one big step towards a scalable video solution, where the base layer aims to provide the basic visual quality at the minimal user bandwidth while the scalable enhancement layer can be arbitrarily truncated to meet heterogeneous network conditions. With the help of MPEG-4 FGS, video streaming is much simplified since all the transcoding overhead required by non-scalable codecs is bypassed. However, scalable coding addresses only one part of the problem. To deal with the whole problem, a flexible delivery technique is also critical. For media delivery applications today, people start with the current best-effort network model and find innovative streaming solutions to mitigate the effect of unpredictable packet loss [2, 3]. That is, the application-layer QoS (Quality of Service) is provided to end users via rate adaptation and error control. Although rate adaptation and error control are traditionally investigated independently, they actually affect each other, and recent work attempts to integrate them.
Rate adaptation was performed in [5] to smoothly adjust the sending rate of MPEG-4 FGS coded video based on the estimated network bandwidth, with each packet then protected unequally. However, this sender-oriented rate adaptation is mainly suitable for unicast rather than multicast video. Moreover, the FEC (forward error correction) level for unequal error protection is decided in a heuristic manner (i.e. without optimization for end-to-end quality). One fully receiver-driven approach for joint rate adaptation and error control was proposed in [7] with pseudo-ARQ (automatic repeat request) layers. The sender injects multiple source/channel layers into the network, where delayed channel layers (relative to the corresponding source layers) serve the packet recovery role. Each receiver performs rate adaptation and error control by subscribing to a selected number of source/channel layers according to the receiver's available bandwidth and channel condition. However, as discussed in [6], the receiver-driven approach is subject to several drawbacks, such as persistent instability in video quality, arbitrary unfairness with other sessions, and difficult receiver synchronization.

On the other hand, from the network infrastructure viewpoint, the trend is to promote more QoS support in network nodes (i.e. boundary or internal routers). Two representative approaches in the Internet Engineering Task Force (IETF) are the integrated services (IntServ) with the resource reservation protocol (RSVP) and the differentiated services (DiffServ or DS) [9, 11]. They are more suitable for accommodating the QoS requirements of different applications than the best-effort IP network. Between these two approaches, the DiffServ scheme provides a less complicated and more scalable solution, since IntServ requires maintaining the per-flow state across the whole path for resource reservation. In the DiffServ model, resources are allocated differently for various aggregated traffic flows based on a set of priority bits (i.e. the DS byte). Thus, the DiffServ approach allows different QoS levels for different classes of aggregated traffic flows. Some work was performed on non-scalable (or coarse granular) video streaming over QoS-provisioning networks, and a significant amount of gain obtained from unequal error protection (UEP) was usually claimed. However, the gain is often overstated, and this is especially true when the source and network parameters do not match well. By applying error-resilient scalable source coding, constant quality rate adaptation and packet prioritization to the DiffServ-based network, we tackle this problem from both the application and the network viewpoints. The scalable codec is based on the standardized MPEG-4 FGS coding, whose scalable stream can be truncated arbitrarily according to the rate budget.
The main contribution of this paper is a framework that integrates error-resilient MPEG-4 FGS coding and constant quality rate adaptation into the DiffServ framework through a rate-distortion (R-D) oriented fine granular packet priority. The contribution can be divided as follows. First, an optimal truncation strategy for constant quality rate adaptation of the MPEG-4 FGS enhancement layer (EL) is realized by embedding the minimal R-D information and relying on a piecewise-linear R-D model. During the encoding process, R-D sample points are generated and embedded for each bitplane of EL. They are then piecewise-linearly interpolated to serve as the R-D model of EL. With this R-D model, the rate adaptation can easily control the distortion or its variation. The MPEG-4 FGS base layer (BL) is known to be less flexible and more error sensitive than EL [5]. At times of severe network loading or provisioning mismatch, even BL packets get lost, rendering the quality unacceptable. To protect the BL, error-resilient coding, source/channel-level UEP, or optimal packetization [8,13] may be employed. In our work, following the MPEG-4 packetization principle, fixed-size packets are generated for both BL and EL. The loss impact of each BL packet on the end-to-end video quality is measured and the priority is assigned accordingly. Similarly, for EL packets, the loss impact (i.e. distortion increase) is calculated from the piecewise-linear R-D model. With the fine granular packet priority, more graceful quality degradation is achieved than in [5], which does not differentiate the packets. The proposed framework also takes full advantage of the integration of fine granular scalable video coding into the QoS-enabled DiffServ network. After careful examination from both the source and the network angles, an appropriate DiffServ service model is selected to efficiently handle the MPEG-4 FGS stream. To avoid an unrealistic performance bias in favor of UEP over equal error protection (EEP), we attempt a fair comparison between UEP and EEP for both BL and EL. From the UEP/EEP comparison, we explore and evaluate several deployment scenarios. As a result, the differentiated forwarding of FGS video shows sufficient efficiency and flexibility to overcome short-term network variation and low-to-middle range packet loss. By leveraging this, the rate adaptation at the sender is required to match only longer-term network variation (i.e. less frequent rate adaptation).

The rest of this paper is organized as follows. In Section 2, we present an overview of the whole framework, where key components such as scalable video coding and the DiffServ network model are identified. In Section 3, we discuss the constant quality rate adaptation and the prioritized packetization for the MPEG-4 FGS codec, which provide minimal distortion variation and prioritized delivery, respectively. Section 4 covers the error-resilient MPEG-4 FGS codec, followed by a discussion of its interaction with the DiffServ model. In Section 5, we demonstrate that video streaming can benefit from rate adaptation, prioritized packetization and differentiated forwarding. We also analyze the effect of both source and network operations for BL/EL UEP/EEP. Finally, we conclude our work and point out some future directions in Section 6.

2. OVERVIEW OF THE PROPOSED FRAMEWORK

2.1. DiffServ for Video Applications

Media delivery that takes different application QoS requirements into account allows better link utilization and user satisfaction. Since the premium service corresponds to an expensive guaranteed service, only two QoS levels, i.e. best effort (BE) and assured service (AS), are considered in this work. Research efforts in service differentiation can be divided into two types: absolute [9, 10] and relative [11, 12] service differentiation. Absolute service differentiation has a higher complexity, and trades flexibility for more guarantees. In contrast, the relative service differentiation scheme promoted by Dovrolis et al. [11] provides a proportional service gap. That is, a higher DS level should provide a better (or at least not worse) performance in terms of queuing delay and packet loss. We will focus on AS of DiffServ with relative service differentiation in streaming video applications. There are several ways to realize the proportional DiffServ scheme [4,12]. A detailed description of the implementation is out of the scope of this work. However, the model given in [12] is adopted in the network simulation. In the case of packet drop, the proportional DiffServ model demands that the loss rates of different DS levels are spaced as

$$\frac{l_i}{l_j} = \frac{\sigma_i}{\sigma_j}, \qquad 1 \le i, j \le N, \tag{1}$$

where $l_i$ is the average loss rate for DS level $i$ and $\sigma_i$ is the loss differentiation parameter with $\sigma_1 > \sigma_2 > \cdots > \sigma_N > 0$.
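As an illustration of Eq. (1), the following sketch computes per-level loss rates that keep the prescribed ratios for a given overall drop fraction. The function name, the per-level arrival shares and the aggregate loss input are assumptions made for this example only; they are not taken from the model in [12].

```python
def proportional_loss_rates(sigmas, arrival_shares, aggregate_loss):
    """Return per-level loss rates l_i with l_i / l_j == sigma_i / sigma_j.

    sigmas: loss differentiation parameters, strictly decreasing and positive.
    arrival_shares: fraction of arriving packets in each DS level (sums to 1).
    aggregate_loss: overall fraction of packets that has to be dropped.
    All inputs are illustrative assumptions, not values from the paper.
    """
    if any(a <= b for a, b in zip(sigmas, sigmas[1:])) or sigmas[-1] <= 0:
        raise ValueError("sigmas must satisfy sigma_1 > ... > sigma_N > 0")
    # Choose l_i = c * sigma_i so that sum_i share_i * l_i equals aggregate_loss.
    weighted = sum(p * s for p, s in zip(arrival_shares, sigmas))
    return [s * aggregate_loss / weighted for s in sigmas]


shares = [0.5, 0.25, 0.25]
rates = proportional_loss_rates([4.0, 2.0, 1.0], shares, 0.11)
```

With these hypothetical numbers, each DS level loses half as many packets (in relative terms) as the level below it, while the share-weighted average loss stays at the requested aggregate value. The sketch does not clamp rates to 1, so it is only meaningful for moderate aggregate loss.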

2.2. Differentiated Forwarding

Figure 1 presents one general framework for the end-to-end system, which includes scalable source encoding, constant quality rate adaptation, prioritized packetization, and differentiated forwarding. In this framework, the source encoding is based on the MPEG-4 FGS codec, where the estimated minimal bandwidth serves as the bandwidth constraint for BL. During encoding, the rate and distortion information is also generated for each bitplane and embedded in the VOP user data part. For non-real-time streaming applications, the over-coded bitstream is pre-stored in the streaming server. Upon a streaming request, the rate adaptation module scales the EL bitstream based on the available bandwidth feedback, preserving constant quality by referring to the embedded R-D information. After the rate adaptation, the scaled bitstream including BL and EL is packetized. Fixed-length packetization is applied to both EL and BL as recommended in the MPEG-4 standard with the following rules: 1) for both BL and EL, bits from two frames cannot be packed into the same packet; and 2) for EL, bits from different bitplanes cannot be packed into the same packet either. Subsequently, by evaluating the anticipated loss impact of each packet on the end-to-end video quality (covering the loss impact of the packet itself and of the packets that depend on it), we assign a priority to each packet within the priority range of its layer (the resulting priority is called the relative priority index, RPI). Note that the priority assignment for a packet should adapt to the output of the media adaptation module, since the assigned priority depends not only on the packet itself but also on its child packets, whose number and distortions vary with the output of the media adaptation module. With the assigned priorities, the packets are sent out to the DiffServ network, anticipating different forwarding treatment [4].
By mapping these packet priorities to different QoS queues and scheduling, packets with different priorities experience different packet loss behavior; hence, unequal error protection (UEP) is realized at the transport level by this differentiated forwarding mechanism. In fact, this transport-level UEP (TUEP) can be accompanied by an application-level UEP (AUEP) such as FEC and ARQ. Besides this implicit priority packet dropping by the DiffServ router, explicit traffic policing can also be performed at an intermediate video gateway/filter (inside an active network version of DiffServ routers or other special network devices) using packet filtering. In this manner, based on the assigned RPI, rate adaptation and error control with TUEP are jointly performed in the proposed framework. Another interesting point is that both rate adaptation and DiffServ priority dropping can handle network congestion in the proposed framework. Priority dropping does not need to estimate the available bandwidth (ABW); however, it adds complexity to the DiffServ network node and slightly lowers link utilization when packets are dropped in the middle of transmission. Rate adaptation, in contrast, is performed at the end system and adds the complexity of estimating the ABW, but it significantly reduces network congestion when the estimated ABW is accurate. As a compromise, we perform rate adaptation only when there is a significant bandwidth change. By combining rate adaptation at both the source and an intermediate node (i.e. the edge router) with the content-based priority dropping of the DiffServ mechanism, we can obtain the maximum end-to-end quality and increase network link utilization.

Footnote: The `DS level' can be interpreted as the level of quality provided to a group of packets with the identical DS codepoint in the IP header.

Figure 1. Framework of proposed differentiated forwarding for scalable video. [Block diagram: encoding server (video preprocessing, MPEG-4 FGS codec, R-D embedding, BL prioritization); transmission server (rate adaptation, EL packetization and prioritization); network node with bandwidth estimator; DiffServ model; end user.]

Figure 2. Impact of different bit allocation of EL on sequence PSNR. (a) Base layer only. (b) Allocate bits evenly. (c) Constant quality bit allocation. [Plots of PSNR and EL bits versus frame number.]

3. CONSTANT QUALITY RATE ADAPTATION AND PRIORITIZED PACKETIZATION

FGS has been adopted by MPEG-4 as a standard coding tool for video streaming applications. As illustrated in Fig. 3(a), it consists of one BL that is coded with an MPEG-4 compliant non-scalable coder, as well as a single EL coded progressively (bitplane by bitplane) with the embedded DCT coding scheme [1]. Compared with non-scalable video codecs, where rate adaptation is performed in the form of transcoding, the MPEG-4 FGS codec greatly simplifies the rate adaptation task since the EL bitstream can be arbitrarily truncated. However, the standard defines only the decoder structure without specifying how to perform optimal truncation. One default approach is to distribute the bandwidth remaining after BL coding evenly among the EL of all frames. However, this trivial bit allocation does not give the best visual quality for the following reason. The BL is normally coded as constant bit rate (CBR) video; hence there is a significant amount of quality variation among frames, as shown in Fig. 2(a). When the same amount of bits is allocated to the EL of each frame, the shape of the quality curve will most probably be preserved, as in Fig. 2(b). Ideally, it is desirable to have constant visual quality as shown in Fig. 2(c), which indicates that unequal amounts of bits should be assigned among frames. In the following, we present our approach to rate allocation among frames to achieve constant quality.

Figure 3. (a) FGS scalability structure: a fine-granular scalable enhancement layer on top of an MPEG-4 base layer of I and P/B frames. (b) Comparison of interpolated and real distortion for frames 1, 15 and 30 (curves F1, F15 and F30; Real and Interp).

3.1. Rate-Distortion Information Embedding

To obtain the distortion information, there are two major approaches in traditional rate control schemes. One approach, as in [15], relies fully on a closed-form model; however, it is found to be inaccurate at low bit rates. To overcome the inaccuracy of closed-form models, Lin et al. utilize a set of R-D samples to approximate the complete R-D relationship by cubic interpolation [?]. Besides the accuracy of the model itself, the keys to R-D sampling based approximation are low complexity and low overhead. Here we propose a low-overhead, low-computation and relatively accurate R-D information embedding for the MPEG-4 FGS EL as follows. Since the EL is coded bitplane by bitplane, the R-D characteristic should intuitively be uniform within the same bitplane, because the distortion reduction is approximately determined by the quantization parameter to which the bitplane corresponds. That is, only the R-D points at the beginning of each bitplane need to be calculated and embedded. Typically, R-D information is necessary for only a few bitplanes (i.e., 6 to 7 bitplanes, corresponding to a wide QP range from 1 to $2^6$ or $2^7$ times QP). Moreover, since the DCT is a unitary transform, which preserves the signal variance, the distortion associated with each bitplane R-D point can be calculated directly in the coefficient domain as $\sum_i (c_i - \hat{c}_i)^2$, where $\hat{c}_i$ represents the reconstructed coefficient when the bitstream is decoded up to the current bitplane. This bitplane-associated R-D sample generation incurs negligible overhead and computational complexity without affecting the original encoding process. The generated R-D information can be stored either in the user data part of each VOP or in a separate file. In Fig. 3(b), the interpolated and actual R-D curves are plotted for the 1st, 15th and 30th frames of the "Foreman" CIF sequence. From these figures, it is found that the piecewise linear model fits the R-D curves very well.
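The piecewise linear R-D model can be sketched in a few lines. The example below assumes that the embedded samples of one frame are available as a list of (rate, distortion) pairs, one per bitplane boundary, sorted by increasing rate with non-increasing distortion; the function names and the sample values are hypothetical.

```python
from bisect import bisect_right


def distortion_at_rate(rd_points, rate):
    """Linearly interpolate the distortion at a given EL rate.

    rd_points: [(rate, distortion), ...] sorted by increasing rate.
    Rates outside the sampled range are clamped to the end points.
    """
    rates = [r for r, _ in rd_points]
    if rate <= rates[0]:
        return rd_points[0][1]
    if rate >= rates[-1]:
        return rd_points[-1][1]
    k = bisect_right(rates, rate)  # rates[k - 1] <= rate < rates[k]
    (r0, d0), (r1, d1) = rd_points[k - 1], rd_points[k]
    return d0 + (d1 - d0) * (rate - r0) / (r1 - r0)


def rate_for_distortion(rd_points, target):
    """Return the smallest EL rate whose interpolated distortion is <= target."""
    if target >= rd_points[0][1]:
        return rd_points[0][0]
    for (r0, d0), (r1, d1) in zip(rd_points, rd_points[1:]):
        if d1 <= target < d0:
            return r0 + (r1 - r0) * (d0 - target) / (d0 - d1)
    return rd_points[-1][0]  # target is below the lowest reachable distortion


# Hypothetical samples: BL only, end of bitplane 1, end of bitplane 2.
samples = [(0, 100.0), (1000, 40.0), (3000, 10.0)]
```

Only the inverse lookup (`rate_for_distortion`) is needed to drive a constant-quality allocation; the forward lookup is useful for checking the model against measured distortion, as in Fig. 3(b).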

3.2. Constant Quality Rate Adaptation

The embedded R-D samples and the piecewise linear model described in the previous subsection provide sufficient R-D information for constant quality rate adaptation. We propose a sliding-window based rate adaptation scheme as follows. Suppose that the window $W_i$ starts from frame $i$ and includes $M_i$ frames. The rate allocation within the window is then formulated as

$$\min_{D_i} \sum_{j \in W_i} \| D_i - D_j \| \quad \text{subject to} \quad \sum_{j \in W_i} R_{E_j} \le R_{w_i} \frac{M_i}{FR_{w_i}} - B_{BL}. \tag{2}$$

The above optimization is performed within one sliding window across several frames. $R_{w_i}$ represents the available bandwidth at the window start time $t_i$, and $B_{BL}$ is the total bit budget consumed by the base layer within this window. To find the optimal solution, we use a bisection search for $D_i$ while minimizing the distortion variation among frames. That is,

Step 1. Take the minimal BL distortion of all frames within one sliding window as the initial $D_i$.
Step 2. Calculate $R_i$ based on the given $D_i$ using linear interpolation of the embedded bitplane-associated R-D information.
Step 3. If $R_i < R_{w_i} M_i / FR_{w_i} - B_{BL} - \delta$, then set $D_{high} = D_i$ and $D_i = (D_i + D_{low})/2$, and return to Step 2. Else if $R_i > R_{w_i} M_i / FR_{w_i} - B_{BL} + \delta$, then set $D_{low} = D_i$ and $D_i = (D_{high} + D_i)/2$, and return to Step 2. Otherwise, stop.

Here $\delta$ is a small tolerance that controls the rate adaptation accuracy. With the above bisection search, normally only a few iterations are needed to find the bit allocation for each frame within the window. Since each iteration requires only linear interpolation to calculate $R_i$ and $D_i$, the complexity of this algorithm is very low and it can be performed in real time. This approach works best for slowly varying channel conditions. However, it will be shown in Section 4 that even for fast varying channel conditions, the approach remains tenable when followed by differentiated forwarding, which mitigates most of the packet loss impact due to inaccurate rate adaptation. Another interesting point is the window size $M_i$: increasing $M_i$ further improves the quality smoothness, but it requires more buffer and incurs more delay as well. Our final remark is that the optimization window should not span multiple scenes, since minimizing the distortion variation between frames from different scenes has little meaning for the final visual quality. More generally, constant quality rate control can be performed in a hybrid spatiotemporal quality sense as in our recent paper [?], where the source is first segmented into several temporally uniform regions, and then both the temporal frame rate and the SNR quality are adjusted to obtain smooth visual quality.
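Steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it reads $R_i$ as the total EL rate of the window needed to reach the common target distortion, takes the EL budget as the right-hand side of the constraint in Eq. (2), and assumes initial bounds for the search (the largest BL distortion and the smallest fully decoded distortion in the window), which the text does not specify. All names and sample values are hypothetical.

```python
def rate_for_distortion(rd_points, target):
    """Smallest EL rate reaching `target` on a piecewise linear R-D curve.

    rd_points: [(rate, distortion), ...] per frame, increasing rate,
    non-increasing distortion; the first point is the BL-only distortion.
    """
    if target >= rd_points[0][1]:
        return rd_points[0][0]
    for (r0, d0), (r1, d1) in zip(rd_points, rd_points[1:]):
        if d1 <= target < d0:
            return r0 + (r1 - r0) * (d0 - target) / (d0 - d1)
    return rd_points[-1][0]


def constant_quality_allocation(window_rd, el_budget, delta=1.0, max_iter=50):
    """Bisection search for a common target distortion within one window.

    window_rd: list of per-frame R-D point lists.
    el_budget: EL bits available in the window (bandwidth times window
               duration minus the BL bits), in the same unit as the rates.
    Returns (target_distortion, per_frame_el_rates).
    """
    d_low = min(points[-1][1] for points in window_rd)   # assumed lower bound
    d_high = max(points[0][1] for points in window_rd)   # assumed upper bound
    d = min(points[0][1] for points in window_rd)        # Step 1
    for _ in range(max_iter):
        total = sum(rate_for_distortion(p, d) for p in window_rd)  # Step 2
        if total < el_budget - delta:      # bits left over: lower the target
            d_high, d = d, (d + d_low) / 2
        elif total > el_budget + delta:    # over budget: raise the target
            d_low, d = d, (d_high + d) / 2
        else:
            break
    return d, [rate_for_distortion(p, d) for p in window_rd]


# Two hypothetical frames with different BL quality.
window = [
    [(0, 100.0), (1000, 40.0), (3000, 10.0)],
    [(0, 60.0), (1000, 30.0), (3000, 10.0)],
]
target, rates = constant_quality_allocation(window, el_budget=2000)
```

In this toy window the frame with the worse base layer receives more EL bits, and both frames end up at the same distortion. If the budget exceeds the total EL size, the loop simply runs to `max_iter` and returns the full enhancement layer.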

3.3. Fixed-length Packetization and Priority Assignment

The packetization scheme has a significant effect on streaming efficiency. Traditionally, there are two major packetization modes for video bitstreams. One is GOB (group of blocks) based packetization as proposed in H.263+, which fully complies with the application level framing (ALF) concept. The other is fixed-length packetization (FLP), where video packets of similar length are formed and each packet is confined to a single frame. Besides, the packet size is another important factor affecting efficiency: a smaller packet size incurs more transport overhead. Recently, an optimal packetization scheme was proposed for scalable bitstreams to improve error resilience by formulating the packetization of the embedded bitstream as a discrete optimization problem for minimal distortion [8]. In this paper, we adopt fixed-length packetization based on the following considerations. First, FLP avoids the inefficiency that small GOBs may cause. More importantly, compared with GOB-based packetization, FLP automatically separates a relatively large motion region into several packets and groups a relatively static region into a single packet. Typically, a region with large motion has more effect on the end-to-end quality than a low-motion region; hence the FLP scheme inherently provides UEP for the packetized bitstream in terms of the packet loss ratio. In Section 5, we generalize the effect of these two packetization modes on the end-to-end quality under EEP and UEP through their effect on the priority distribution of packets. According to its importance to the end-to-end visual quality, the packetization module grades each packet with a certain priority. In terms of different service preferences such as loss and delay, the priority can be further divided into a relative loss priority index (RLI) and a relative delay priority index (RDI) as in [4].
If the priority accurately reflects the effect of each packet on the end-to-end quality, the most graceful quality degradation can be achieved by ordered dropping according to the assigned priority index. Currently, in our scheme, we measure the base layer packet priority according to the real effect of the loss of the concerned packet on the final end-to-end quality. That is, we calculate the overall sequence MSE (mean squared error) when the concerned packet is lost. The base layer priority is then determined by normalizing the MSE increase caused by the concerned packet. With this approach, the priority of each packet reflects its actual effect on the final video quality. Although this is the most accurate way to determine the priority, it involves high computational complexity due to multiple decodings under the supposed packet losses. How to determine a relatively accurate packet priority that reflects the effect on the final end-to-end quality with low complexity is itself an open research problem. Recently, in [4] and [18], an MB-based corruption model was developed to approximate the actual loss impact in the MSE sense by considering several video feature parameters, such as the number of Intra MBs, the packet initial error (i.e., the MSE caused by losing the concerned packet under the default error concealment technique), the magnitude of motion vectors, and the spatial filtering effect.

In this paper, we adopt the accurate priority instead of the approximated ones for BL based on the following considerations. First, we want to clearly analyze the gains from TUEP by differentiated forwarding; if an approximated priority were adopted, the approximation error would certainly have an impact on the final gains. Second, since the BL is normally determined by the minimal bandwidth and is typically fixed, the priority of BL packets can be calculated off-line for pre-stored video. A good approximation to the accurate MSE increase from a single packet loss is proposed in [?], and can also be used here to reduce the complexity. For EL packets, the priority assignment is much simpler due to the strict temporal separation between frames. A packet loss within the EL affects only a single frame and does not propagate to later frames. Therefore, the distortion incurred by the loss of an EL packet can be accurately calculated from only one frame. The packet priority itself is calculated as

$$\rho_i = \frac{\Delta D_i}{\Delta R_i}, \tag{3}$$

where $\Delta D_i$ represents the distortion incurred by the loss of the specified packet, and $\Delta R_i$ is the size of the concerned packet. In addition, the packet dependency needs to be taken into consideration: when a packet in a more significant bitplane is lost, the dependent packets in the less significant bitplanes have to be discarded anyway. Hence, the final packet priority is calculated as

$$P_i = \sum_{s \in SD_i} \rho_s + \rho_i, \tag{4}$$

where $SD_i$ is the set of descendants of packet $i$, i.e. the packets corresponding to the same region in the less significant bitplanes. By relying on the piecewise linear R-D model for each bitplane, the EL packet priority can be easily calculated on-line during the packetization procedure.
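The EL priority computation (the distortion-per-bit slope of each packet plus the slopes of its descendants) can be illustrated with a short sketch. The dictionary layout, the field names and the numbers are assumptions made for this example; in practice the distortion increase would come from the piecewise linear R-D model of the bitplane.

```python
def el_packet_priorities(packets):
    """Compute the final priority of each EL packet.

    packets: {packet_id: {"delta_d": distortion increase if the packet is lost,
                          "delta_r": packet size,
                          "descendants": ids of dependent packets in less
                                         significant bitplanes}}
    Returns {packet_id: own slope plus the slopes of all descendants}.
    """
    slope = {pid: p["delta_d"] / p["delta_r"] for pid, p in packets.items()}
    return {
        pid: slope[pid] + sum(slope[s] for s in p["descendants"])
        for pid, p in packets.items()
    }


# Hypothetical packets covering the same region in three bitplanes.
example = {
    "bp1": {"delta_d": 600.0, "delta_r": 200.0, "descendants": ["bp2", "bp3"]},
    "bp2": {"delta_d": 200.0, "delta_r": 200.0, "descendants": ["bp3"]},
    "bp3": {"delta_d": 50.0, "delta_r": 200.0, "descendants": []},
}
priorities = el_packet_priorities(example)
```

Because a packet inherits the slopes of everything that depends on it, packets in more significant bitplanes always rank at least as high as their descendants, which is what ordered dropping requires.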

4. DIFFERENTIATED FORWARDING OF ERROR RESILIENT FGS VIDEO

As mentioned in Section 2, our packet priority is fine granular in two senses. First, each priority is associated with an individual packet instead of with a layer (i.e., the BL or EL) as in previous work. Second, even within the enhancement layer, a fine granular priority is employed for each individual EL packet. With these fine granular priorities, differentiated forwarding can be employed accordingly. In this section, we first describe the error-resilient coding for both BL and EL. Then we propose our DiffServ model, which performs the differentiated forwarding of this error-resilient bitstream. Finally, we generalize the comparison of UEP and EEP and provide a guideline for UEP in terms of the packet priority distribution.

4.1. Error Resilient MPEG-4 FGS Coding

The base layer of MPEG-4 FGS is fully compatible with the MPEG-4 non-scalable coding scheme. Many error resilience (ER) tools are standardized in MPEG-4, including the video packet (VP), data partitioning (DP) and reversible variable length codes (RVLC), as well as some informative ER options such as cyclic Intra refresh (CIR), adaptive Intra refresh (AIR) and NEWPRED. Among these options, DP and RVLC mainly serve the partial decoding of video packets with bit errors; hence they are seldom applicable to the Internet environment, where packet loss is the dominant factor. CIR and AIR are encoding choices: CIR cyclically refreshes a number of consecutive macroblocks (MBs) as Intra MBs, while AIR is more intelligent in that it considers the motion area, so that only MBs with motion are refreshed as Intra MBs. In our paper, both AIR and CIR simulations are performed for the UEP and EEP scenarios, and the performance gains are analyzed in terms of their different impact on the packet priority distribution. In Fig. 4(b), we illustrate our error concealment (EC) scheme for reference, where for lost MBs in a P frame, the motion vector is interpolated from those of the upper and lower macroblocks. The following rules are applied to handle special cases: 1) If the missing macroblock is at the upper boundary of an image, its v1 and v2 vectors are set to the zero vector. 2) If the lower macroblock is corrupted or the missing macroblock is at the lower boundary of the image, its v3 and v4 vectors are set equal to its v1 and v2, respectively. We employ fixed-length packetization for the MPEG-4 FGS EL bitstream according to the following rules: 1) no packet can span two frames, and 2) no EL packet can span two bitplanes either. When packets belonging to a more significant bitplane are lost, all packets in the less significant bitplanes are discarded as well.
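The two EL packetization rules and the discard rule can be sketched as follows. The input layout (one byte string per bitplane per frame), the packet dictionary and both function names are assumptions for illustration; real FGS packets also carry headers and resynchronization information that are omitted here.

```python
def packetize_el(frames, packet_size):
    """Split an EL bitstream into packets of at most `packet_size` bytes.

    frames: list of frames, each a list of bitplane byte strings ordered from
            the most to the least significant bitplane.
    A packet never spans two frames or two bitplanes, so the last packet of
    each bitplane may be shorter than `packet_size`.
    """
    packets = []
    for frame_no, bitplanes in enumerate(frames):
        for plane_no, data in enumerate(bitplanes):
            for offset in range(0, len(data), packet_size):
                packets.append({
                    "frame": frame_no,
                    "bitplane": plane_no,
                    "payload": data[offset:offset + packet_size],
                })
    return packets


def usable_packets(packets, lost):
    """Return indices of received packets that the decoder can still use.

    lost: set of indices of lost packets. Within a frame, every packet in a
    bitplane less significant than the most significant plane hit by a loss
    is discarded.
    """
    first_hit = {}
    for idx in lost:
        frame, plane = packets[idx]["frame"], packets[idx]["bitplane"]
        first_hit[frame] = min(first_hit.get(frame, plane), plane)
    kept = []
    for idx, packet in enumerate(packets):
        if idx in lost:
            continue
        hit = first_hit.get(packet["frame"])
        if hit is not None and packet["bitplane"] > hit:
            continue
        kept.append(idx)
    return kept


# Hypothetical stream: frame 0 has two bitplanes, frame 1 has one.
stream = [[b"a" * 250, b"b" * 100], [b"c" * 90]]
el_packets = packetize_el(stream, packet_size=100)
```

Losing the second packet of frame 0 in this toy stream leaves the other packets of the first bitplane usable but discards the whole second bitplane of that frame, while frame 1 is unaffected.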

Figure 4. (a) Utilized DiffServ node based on multiple queues: BE and AF1 to AFn queues served by a scheduler, with relatively proportional loss rates. (b) Error concealment scheme used in BL: motion vectors v1 to v4 of the upper, missing and lower MBs.

4.2. Differentiated Forwarding of MPEG-4 FGS Video

Differentiated forwarding of the non-scalable H.263+ codec is discussed in [ref], where a DiffServ architecture based on three proportional DS levels is employed. However, when the MPEG-4 FGS codec is employed, the ABW is typically sufficient to transmit the BL bitstream due to its relatively small size. Compared with EL packets, BL packets are more critical and should be protected strongly. Therefore, the BL bitstream is mapped to a higher-priority class queue, which is more secure than that of the EL bitstream. There are two possible DiffServ service models for the BL. Although we cannot assume that the ABW is always larger than the BL bit rate, if some minimal bandwidth that is much smaller than the BL bit rate can be allocated, then the highest-priority BL category, which corresponds to this minimal data rate, is fully secured (i.e., there is no packet loss in that portion), and some minimal quality can always be preserved. The other, simpler model maps all three BL categories to the relative proportional DiffServ. As for the EL bitstream, another lower-priority class queue with two different drop preferences is assigned (i.e., by including the best-effort service as the worst one, there are also three different drop preferences for EL in total). In Fig. 4(a), a multiple-queue DiffServ node is illustrated, which is utilized as the differentiated forwarding entity. In addition, rate adaptation (RA) can significantly reduce network congestion when an ABW estimate is available. RA can be performed either at the server side or at the DiffServ edge router, where packets are dropped in strict order of priority instead of by the coarse priority categories of the DiffServ model.
RA can be performed implicitly or explicitly: 1) the EL portion within the adapted rate is marked as IN and the exceeding EL portion is marked as OUT (i.e., OUT is assigned the best-effort service); or 2) the OUT portion is simply cut off at the sender side in order to avoid network congestion in advance, which may result in worse network link utilization when the ABW estimate is not accurate. Compared with DiffServ forwarding, RA can provide more graceful quality degradation when the ABW becomes smaller. However, estimating the ABW adds complexity, and there is also a time delay between the ABW measurement and the rate adaptation based on it. In this paper, a compromise is adopted: rate adaptation is performed when there is a big change in bandwidth; otherwise, only DiffServ forwarding is employed.
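The two rate adaptation options and the compromise trigger can be sketched as below. The 20% change threshold, the function names and the (id, size) packet representation are assumptions chosen for illustration; the paper does not quantify what counts as a big change in bandwidth.

```python
def needs_rate_adaptation(previous_abw, current_abw, threshold=0.2):
    """Trigger rate adaptation only on a large relative ABW change.

    threshold is a hypothetical value, not one taken from the paper.
    """
    return abs(current_abw - previous_abw) > threshold * previous_abw


def mark_el_packets(packets, rate_budget, cut_off=False):
    """Mark EL packets as IN or OUT against an EL rate budget.

    packets: [(packet_id, size), ...] in bitstream order.
    cut_off=False: implicit RA, the exceeding portion is kept but marked OUT.
    cut_off=True:  explicit RA, the exceeding portion is dropped at the sender.
    Once one packet exceeds the budget, all later packets are treated as OUT,
    which mirrors truncation of the embedded EL bitstream.
    """
    marked, used, within = [], 0, True
    for packet_id, size in packets:
        if within and used + size <= rate_budget:
            used += size
            marked.append((packet_id, "IN"))
        else:
            within = False
            if not cut_off:
                marked.append((packet_id, "OUT"))
    return marked


el = [("p0", 100), ("p1", 100), ("p2", 100), ("p3", 50)]
implicit = mark_el_packets(el, rate_budget=250)
explicit = mark_el_packets(el, rate_budget=250, cut_off=True)
```

With implicit marking the network decides whether OUT packets survive, so an underestimated ABW costs nothing; with explicit cut-off the sender avoids congestion in advance but may leave bandwidth unused.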

5. SIMULATION RESULTS

In this section, we demonstrate the performance of the proposed system with rate adaptation enabled or disabled and with prioritized transmission enabled or disabled. Several typical scenarios that appear in video streaming applications are identified. The gains of prioritized transmission over non-prioritized transmission are examined in detail. The overall simulation setup is illustrated in Fig. 5.

5.1. Scenario 1: Differentiated Forwarding of BL Packets

The MPEG-4 FGS codec is built on the assumption that the ABW is sufficient to transmit the BL bitstream. During periods of severe congestion, however, this assumption no longer holds. It is very desirable to retain graceful quality degradation even when the ABW is insufficient for the BL. In this section, we first show the simulation results obtained with prioritized and non-prioritized transmission of the MPEG-4 BL. The prioritized transmission is based on the proportional DiffServ model, where three DS levels are applied and the packet loss ratios of the DS levels are proportional to each other. Another interesting issue is how to map the prioritized packets to these three levels, since packet priorities are continuous while the network node provides only a few DS levels. In [4], the packets are first grouped into categories by uniform (or nonuniform) quantization of the RLI, and these categories are then mapped to the limited DS fields by a QoS mapping guided by a price mechanism. Since our major goal is to investigate the possible gains that DiffServ can provide, we adopt, for simplicity, a simple QoS mapping from RLI to DS levels that assigns a similar number of packets to each DS level. Under the simulation setup of Fig. 5, the encoding and packetization modes can still vary. The typical simulation scenarios are listed in Table 1.

Figure 5. Diagram of the proposed simulation setup. Blocks shown: video, MPEG-4 FGS encoder, RPI generation, packetizer, prioritizer and rate adaptation (with estimated available BW), DiffServ network model (network delay and packet loss), de-packetizer, MPEG-4 FGS ER decoder, loss/error concealment, reconstructed video, and evaluation of end-to-end video quality by PSNR.

Table 1. Simulation setup for the BL of the MPEG-4 FGS codec.

Case  Frame rate  Rate control  GOP mode  ER options  Bitrate   Packet size
1     10 fps      TM5           IPPPP...  VP, AIR     128 kbps  400 bytes
2     10 fps      TM5           IPPPP...  VP, CIR     128 kbps  400 bytes
3     10 fps      TM5           IPPPP...  VP, AIR     128 kbps  GOB based
4     10 fps      TM5           IPPPP...  VP, CIR     128 kbps  GOB based
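The simple QoS mapping used in this scenario, which assigns a similar number of packets to each DS level, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name map_rli_to_ds_levels is hypothetical, and the conventions that a larger RLI means a more important packet and that DS level 0 is the best protected level are assumptions.

```python
from typing import List, Sequence


def map_rli_to_ds_levels(rli: Sequence[float], num_levels: int = 3) -> List[int]:
    """Assign each packet a DS level so that all levels receive a similar
    number of packets. Level 0 (assumed best protected) gets the packets
    with the highest RLI, the last level gets those with the lowest RLI."""
    n = len(rli)
    # Packet indices ordered from most to least important.
    order = sorted(range(n), key=lambda i: rli[i], reverse=True)
    levels = [0] * n
    for rank, idx in enumerate(order):
        # Integer arithmetic splits the ranking into num_levels near-equal groups.
        levels[idx] = rank * num_levels // n
    return levels
```

For the RLI values 0.5, 3.0, 1.2, 8.0, 0.1 and 2.2, the sketch returns the levels [2, 0, 1, 0, 2, 1], i.e., two packets per DS level. Unlike the uniform RLI quantization of [4], this rank-based split ignores how far apart the RLI values are, which is why the conclusion calls the mapping heuristic.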

The BL is encoded with the MPEG-4 FGS codec using TM5 rate control at 128 kbps for a CIF-size sequence at 10 fps, where the first frame is an I frame and all following frames are P frames. If packets of the I frame are lost, the error propagates to all the following P frames and may have a disastrous impact. Hence, we restrict packet loss to P frames here. As mentioned above, the error-resilient encoding options besides VP are AIR and CIR, and simulations with both are performed in the current framework. The performance gains for these two modes are shown in Fig. 6 (a) and (b). These figures show that DiffServ transmission achieves a clear PSNR gain over transmission without service differentiation under the same bit budget and overall packet loss ratio in all four cases. Interestingly, the gain varies significantly across the four cases: the smallest gain is obtained in case 1 and the largest in case 4. As a complement, the distribution of the packet RLI for each case is shown in Fig. 2. We discuss the underlying reason in a later subsection.

Figure 6. (a) PSNR comparison of EEP and 3-level DS-UEP in case 1 (curves AIR_EEP and AIR_3DS_UEP); (b) PSNR comparison of EEP and 3-level DS-UEP in case 2 (curves CIR_EEP and CIR_3DS_UEP). Axes: PSNR versus packet loss ratio (9% down to 0%).

Figure 7. (a) RLI of BL packets for the "Foreman" sequence in case 1; (b) RLI of EL packets for the "Foreman" sequence with a 384 kbps EL.

5.2. Scenario 2: Differentiated Forwarding of EL Stream

In [5], only two different levels of error protection are applied, one to the BL and one to the EL. Our work differs in that we apply the TUEP (i.e., differentiated forwarding) even within the EL. In [5], different bitplanes within the EL are unequally protected. However, the same bitplane level in different frames contributes differently to the end-to-end visual quality and should therefore also be protected unequally. For example, the EL packets of a frame with a low-quality BL typically have a more significant impact than the EL packets of a frame with a high-quality BL. The UEP in [ref] does not exploit this difference. In our work, by evaluating the real impact of each packet on the end-to-end quality with the help of the embedded R-D samples and the piecewise linear model, each EL packet is differentiated and assigned its own RLI. Fig. 7(b) illustrates the RLI of the packets of a 384 kbps EL. Fig. 8 (a), (b), (c) and (d) show the performance advantage of DiffServ priority dropping (UEP) over uniform dropping (EEP) for EL bit rates of 512 kbps, 384 kbps, 256 kbps and 160 kbps, respectively. These figures show that DiffServ transmission achieves a clear PSNR gain over transmission without service differentiation under the same bit budget and overall packet loss ratio in all four cases. Moreover, Fig. 8 shows the performance of rate adaptation when an accurate ABW is available. Compared with UEP, RA achieves more graceful quality degradation because packets are dropped in strict order of packet priority. Another interesting observation from Fig. 8 is that the gain of RA over UEP and of UEP over EEP varies with the packet loss ratio and the EL bit rate. In the high packet loss range (i.e., 25 percent or more), RA has a significant gain over UEP, while under low packet loss the gain of RA over UEP is minor.
This indicates that, when there is a big bandwidth change (i.e., corresponding to a high packet loss ratio), more gain can be achieved from RA at the cost of the additional complexity of estimating the ABW. Under a small or even medium bandwidth change, however, UEP without RA works fine without losing too much efficiency. The EL bit rate also affects the gain of UEP over EEP: the smaller the EL is, the smaller the gain of UEP over EEP.

Figure 8. PSNR comparison of RA, 3DS-UEP and EEP for the "Foreman" sequence under different EL bit rates, plotted against the packet loss ratio: (a) EL at 512 kbps; (b) EL at 384 kbps; (c) EL at 256 kbps; (d) EL at 160 kbps. Without packet loss, the PSNR is 34.99 dB, 34.19 dB, 32.97 dB and 32.07 dB, respectively.
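The qualitative ordering of RA, UEP and EEP discussed in this subsection can be reproduced with a small Python sketch. It is a simplified model, not the simulation used in the paper: the function name expected_lost_importance is hypothetical, the importance of lost packets is assumed to be additive in their RLI, a larger RLI is assumed to mean a more important packet, and the proportional drop weights (1, 2, 3) for the three DS levels are example values only.

```python
from typing import Dict, List, Sequence


def expected_lost_importance(
    rli: Sequence[float],
    loss_ratio: float,
    drop_weights: Sequence[float] = (1.0, 2.0, 3.0),
) -> Dict[str, float]:
    """Expected sum of RLI lost under three dropping policies at the same
    overall packet loss ratio (smaller is better)."""
    n = len(rli)
    if n == 0:
        return {"EEP": 0.0, "3DS-UEP": 0.0, "RA": 0.0}
    ranked = sorted(rli, reverse=True)  # most important packet first

    # EEP: every packet is lost with the same probability.
    eep = loss_ratio * sum(ranked)

    # 3DS-UEP: near-equal packet counts per DS level, loss ratios proportional
    # to drop_weights and scaled so the overall loss ratio equals loss_ratio.
    # Clipping at 1.0 makes the overall ratio fall short for extreme settings.
    k = len(drop_weights)
    groups: List[List[float]] = [[] for _ in range(k)]
    for rank, value in enumerate(ranked):
        groups[rank * k // n].append(value)
    weighted_count = sum(w * len(g) for w, g in zip(drop_weights, groups))
    scale = loss_ratio * n / weighted_count
    uep = sum(min(1.0, scale * w) * sum(g) for w, g in zip(drop_weights, groups))

    # RA: drop exactly the least important packets, in strict priority order.
    num_dropped = round(loss_ratio * n)
    ra = sum(ranked[n - num_dropped:])

    return {"EEP": eep, "3DS-UEP": uep, "RA": ra}
```

For RLI values 6, 5, 4, 3, 2, 1 and a 50% loss ratio, the sketch gives a lost importance of 10.5 for EEP, 8.5 for 3DS-UEP and 6.0 for RA, which matches the ordering observed in Fig. 8. When all RLI values are equal, the three policies lose the same amount.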

5.3. Impact of Packet Priority Distribution on the UEP Gain

As found above, several coding and packetization parameters affect the gain of UEP over EEP in both the BL and the EL. When certain parameters are chosen, such as AIR instead of CIR or fixed-length packetization instead of GOB packetization, the performance gap between UEP and EEP can be relatively small. The underlying reason is the difference in packet priority distribution, as shown in Table 2. In the extreme case where the packet priority is constant, the gain of UEP over EEP is zero. Modifying the coding and packetization modes actually modifies the priority distribution and, with it, the performance gap. Most work in the UEP area attempts to spread the priorities over a wide range, where the gain of UEP is most visible. In conclusion, for the UEP approach, the wider the spread of the priority index, the more graceful the quality degradation that UEP achieves on the prioritized packets. In contrast, for EEP, the more similar the priority indices are, the more graceful the quality degradation. Therefore, the performance gap between UEP and EEP is largest when the priorities spread across a wide range and negligible when the packet priorities cluster in a small region. This is verified in both the BL and EL simulations. For example, when the fixed-length packetization scheme is applied, the packet priorities are more similar to each other than with the GOB-based packetization scheme. Similarly, the choice between CIR and AIR affects the packet priority distribution. For the EL, the same observation holds: when the EL bit rate is small, the gap between EEP and UEP is also very small. The reason is that, at a low EL bit rate, all the EL packets come from the first few bitplanes, so there is no big difference among the EL packets.
Therefore, when the network node provides no service differentiation (i.e., the EEP case), it is desirable to generate packets with similar priorities. If QoS-enabled service differentiation exists, it is beneficial to generate packets whose priorities cover a wide range. This conclusion is consistent with the approach in [17], which forms packets of equal importance for Internet video streaming applications.
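The dependence of the UEP gain on the priority spread can be made explicit with a short derivation. It is an idealized sketch under assumptions that are not stated in the paper: packet importance is additive, priority dropping is ideal (the least important packets are lost first), and the notation N, P_i, \rho, m, \bar{P} and \bar{P}_{\mathrm{low}} is introduced here for illustration only.

```latex
% Assumptions: N packets with priorities P_1 \le P_2 \le \dots \le P_N,
% overall loss ratio \rho, and m = \rho N an integer with 1 \le m \le N.
\begin{align*}
L_{\mathrm{EEP}}  &= \rho \sum_{i=1}^{N} P_i = m\,\bar{P},
  & \bar{P} &= \frac{1}{N}\sum_{i=1}^{N} P_i, \\
L_{\mathrm{prio}} &= \sum_{i=1}^{m} P_i = m\,\bar{P}_{\mathrm{low}},
  & \bar{P}_{\mathrm{low}} &= \frac{1}{m}\sum_{i=1}^{m} P_i, \\
G &= L_{\mathrm{EEP}} - L_{\mathrm{prio}}
   = m\left(\bar{P} - \bar{P}_{\mathrm{low}}\right) \ge 0 .
\end{align*}
```

Here L_EEP is the expected importance lost under uniform dropping and L_prio the importance lost under ideal priority dropping. If all priorities are equal, the mean of the m lowest priorities equals the overall mean and the gain G vanishes; the wider the priorities spread, the further the low-end mean falls below the overall mean and the larger the gain becomes.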

6. CONCLUSION AND FUTURE WORK

A sliding-window-based constant-quality rate adaptation was proposed for the MPEG-4 FGS codec. It embeds the R-D information within each bitplane and relies on a piecewise linear model to obtain the real distortion.

Table 2. Priority distribution of BL and EL packets under different encoding and packetization parameters.

BL Case 1  BL Case 2  BL Case 3  BL Case 4
P          P          P          P
P~avg      P~avg      P~avg      P~avg
3.04       2.45       3.35       3.09

160k EL        256k EL        384k EL        512k EL
P~avg  P       P~avg  P       P~avg  P       P~avg  P
0.023  0.026   0.024  0.031   0.026  0.035   0.027  0.0038

Then, a differentiated forwarding framework for error-resilient MPEG-4 FGS video was investigated with fine-granular BL and EL packet priorities. Starting from the real distortion of each packet, we showed the gains of priority dropping over uniform dropping under different encoding and packetization parameters. We conclude more generally that the gain of UEP over EEP can be explained by the distribution of the packet priorities. By integrating rate adaptation with the proposed DiffServ framework, even more gain can be achieved. Several issues need further study. First, the mapping of both BL and EL packets to DS levels is very heuristic. We believe that both the distribution of the packet priorities and the price mechanism associated with the DS levels should play a role in this mapping. Second, it remains open how to obtain the maximal gain by mapping packets from different streams to different DS levels when multiple MPEG-4 FGS streams are multiplexed. Third, the current service model should be refined to cover rate adaptation, packet filtering and differentiated forwarding in more realistic scenarios.

REFERENCES

1. W. Li, "Overview of Fine Granularity Scalability in MPEG-4 Video Standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301-317, Mar. 2001.
2. D. Wu, Y. T. Hou, W. Zhu, Y.-Q. Zhang, and J. M. Peha, "Streaming Video over the Internet: Approaches and Directions," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 1, pp. 1-20, Feb. 2001.
3. J. Kim, Y.-G. Kim, H. Song, T.-Y. Kuo, Y. J. Chung, and C.-C. J. Kuo, "TCP-friendly Internet video streaming employing variable frame-rate encoding and interpolation," IEEE Trans. Circuits and Systems for Video Technology, vol. 10, no. 7, pp. 1164-1177, Oct. 2000.
4. J. Shin, J. Kim, and C.-C. J. Kuo, "Quality-of-Service Mapping Mechanism for Packet Video in Differentiated Services Network," IEEE Trans. Multimedia, vol. 3, no. 2, pp. 219-231, June 2001.
5. Q. Zhang, G. Wang, W. Zhu, and Y.-Q. Zhang, "Robust Scalable Video Streaming over Internet with Network-Adaptive Congestion Control and Unequal Loss Protection," in Proc. Packet Video, May 2000.
6. R. Gopalakrishnan and J. Griffioen, "A simple loss differentiation approach to layered multicast," in Proc. InfoComm 2000.
7. P. A. Chou, A. E. Mohr, A. Wang, and S. Mehrotra, "Error Control for Receiver-Driven Layered Multicast of Audio and Video," IEEE Trans. Multimedia, vol. 3, no. 1, pp. 108-122, Mar. 2001.
8. X. Wu, S. Cheng, and Z. Xiong, "On packetization of embedded multimedia bitstreams," IEEE Trans. Multimedia, vol. 3, no. 1, pp. 132-140, Mar. 2001.
9. I. Stoica, S. Shenker, and H. Zhang, "Core-stateless fair queuing: Achieving approximately fair bandwidth allocations in high speed networks," in Proc. SIGCOMM, Vancouver, BC, Canada, Sept. 1998.
10. I. Stoica and H. Zhang, "Providing guaranteed services without per flow management," in Proc. SIGCOMM, Boston, MA, Sept. 1999, pp. 81-94.
11. C. Dovrolis, D. Stiliadis, and P. Ramanathan, "Proportional differential services: Delay differentiation and packet scheduling," in Proc. SIGCOMM, Boston, MA, Sept. 1999.
12. C. Dovrolis and P. Ramanathan, "Proportional differential services-Part II: Loss rate differentiation and packet dropping," in Proc. International Workshop on Quality of Service, Pittsburgh, PA, June 2000.
13. Y. Wang and Q. Zhu, "Error control and concealment for video communication: a review," Proc. of IEEE, pp. 974-997, May 1998.
14. G. Cheung and A. Zakhor, "Bit allocation for joint-source channel coding of scalable video," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 340-356, Mar. 2000.
15. H. Lee, T. Chiang, and Y.-Q. Zhang, "Scalable rate control for MPEG-4 video," IEEE Trans. Circuits and Systems for Video Technology, vol. 10, no. 6, pp. 878-894, Sept. 2000.
16. L. Zhao, J. Kim, and C.-C. Jay Kuo, "Scalable Internet video streaming with content aware rate control," in Proc. of VCIP, San Jose, CA, Jan. 2001.
17. W. Tan and A. Zakhor, "Video Multicast using Layered FEC and Scalable Compression," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 373-386, Mar. 2001.
18. J.-G. Kim, J. Kim, J. Shin, and C.-C. J. Kuo, "Coordinated packet level protection employing corruption model for robust video transmission," in Proc. of VCIP, San Jose, CA, Jan. 2001.