MPEG-4 FGS Video Streaming with Constant-Quality Rate Control and Differentiated Forwarding
Lifeng Zhao †, JongWon Kim ‡ and C.-C. Jay Kuo †
† Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2564
‡ Department of Information and Communications, Kwang-Ju Institute of Science and Technology, Kwang-Ju, Korea
E-mail: {lifengzh, cckuo}@sipi.usc.edu,
[email protected] ABSTRACT Streaming video over IP is studied by integrating the error resilient scalable source codec with constant quality rate adaptation and prioritized network transmission. The scalable codec is based on the standardized ISO/IEC MPEG-4 fine granular scalable (FGS) coding scheme, which allows arbitrary truncation according to the given rate budget. To maximize the end-to-end visual quality, constant quality video rate control (CQVRC) is proposed to preserve smooth video quality when there is a significant amount of variation in the video source, the available bandwidth or both. The proposed CQVRC attempts to provide smooth video quality for both the base layer (BL) and the enhancement layer (EL). The video stream is then prioritized for priority dropping, where rate adaptation is dynamically performed to meet the time-varying available bandwidth. This kind of prioritized stream can benefit from QoS networks such as the IP differentiated service (DiffServ). All key components, including FGS encoding, CQVRC based rate adaptation and packetization, error resilient decoding, and differentiated forwarding, are seamlessly integrated into one system. By focusing on the end-to-end quality, we set both source and network parameters properly to achieve a superior performance of FGS video streaming. Keywords: MPEG-4 fine grain scalability (FGS), video streaming, constant quality, rate adaptation, differentiated service (DiffServ), unequal error protection (UEP), prioritized packetization, and priority networks.
1. INTRODUCTION
For streaming video, user and network heterogeneity demand highly scalable video coding methods and flexible delivery techniques to overcome the challenges imposed by the best-effort Internet. The transmission of real-time video typically has bandwidth, delay, and loss requirements. However, the current best-effort Internet does not offer any quality of service (QoS) guarantees to streaming video in these three aspects. Consequently, Internet video streaming faces numerous challenges. To provide application-layer QoS control, both rate control and error control should be employed to maximize the end-to-end visual quality [1, 2]. By matching the rate of the video stream to the available network bandwidth, rate control techniques reduce the possibility of network congestion. In addition, rate control is needed to preserve smooth video quality when there is a significant amount of variation in the video source, the available bandwidth, or both.

From the network infrastructure viewpoint, the trend is to promote more QoS support in network nodes (i.e. boundary or internal routers). Two representative approaches in the Internet Engineering Task Force (IETF) are the integrated services (IntServ) with the resource reservation protocol (RSVP) and the differentiated services (DiffServ or DS). Of these two approaches, DiffServ provides the more scalable solution with lower complexity, since IntServ must maintain per-flow state along the whole path for resource reservation. In the DiffServ model, resources are allocated differently for various aggregated traffic flows based on a set of priority bits (i.e. the DS byte). Thus, the DiffServ approach allows different QoS levels for different classes of aggregated traffic flows.

ISO/IEC MPEG-4 fine grain scalability (FGS) video has been proposed as a standard coding tool for video streaming applications [3].
It consists of one non-scalable coded base layer (BL) and one bitplane-encoded scalable enhancement layer (EL). Typically, the BL is targeted at the minimal user bandwidth and encoded by constant-bit-rate (CBR) rate control such as MPEG-2 TM5 or MPEG-4 VM8. For the EL, a rate control that evenly distributes the bandwidth remaining after BL coding is a straightforward choice. However, this trivial rate control does not provide the best quality for the following reasons. Since the characteristics of the video source are non-stationary, the sequence reconstructed from a CBR-encoded BL often exhibits significant quality fluctuation in areas with large object motion or scene changes. This is annoying to human perception. When a similar number of bits is evenly allocated to the EL of each frame, the degree of quality variation stays about the same. Therefore, quality variation will be present in the FGS decoded frames if the MPEG-4 default rate control is adopted.

In this paper, we tackle the IP video streaming problem from both the application and the network viewpoints. In particular, to address the above quality variation problem of MPEG-4 FGS video streaming, a constant-quality video rate control (CQVRC) scheme is employed to minimize the quality variation in both BL and EL as well as to maximize the end-to-end visual quality. Considering the relaxed delay requirement (supported by a large buffer) of most streaming applications, we can perform rate control on a larger time scale. The CBR constraint can thus be relaxed, and even off-line encoding is possible with non-real-time streaming. The proposed CQVRC for BL uses a two-pass approach that exploits a larger decoder buffer, future frame information, and temporal scene segmentation to achieve much smoother video quality than previous rate control approaches, which primarily target low-delay visual communication. The CQVRC in EL is achieved by embedding a small amount of rate-distortion (R-D) information for each bitplane and then using the embedded R-D samples to linearly interpolate the R-D curve. With the estimated R-D curve, adaptive rate allocation can be performed to achieve constant video quality. The video stream is then prioritized for priority dropping, where the CQVRC-based rate adaptation is dynamically performed to meet the time-varying available bandwidth. This kind of prioritized stream can benefit from a QoS network such as IP DiffServ. 
All key components, including FGS encoding, CQVRC-based rate adaptation and packetization, error resilient decoding, and differentiated forwarding, are seamlessly integrated into the proposed system.

The rest of this paper is organized as follows. In Section 2, we present an overview of the whole system, where key components such as scalable video coding and the DiffServ network model are identified. In Section 3, CQVRC for BL is discussed first; then the R-D embedding and the proposed CQVRC for EL are described. Subsequently, Section 4 covers prioritized packetization and the error-resilient MPEG-4 FGS codec, followed by a discussion of its mapping to the DiffServ model. In Section 5, we demonstrate that video streaming can benefit from rate adaptation, prioritized packetization, and differentiated forwarding.
2. OVERVIEW OF THE PROPOSED SYSTEM
The emerging DiffServ network provides different levels of services in a scalable manner. Current research efforts along this direction can be classified into two types: the absolute [4] and the relative [5, 6] service differentiation schemes. The absolute service differentiation scheme has a higher complexity due to the overhead of QoS provisioning. It trades flexibility for stricter guarantees. Dovrolis et al. [5] promote the concept of relative differentiation, which provides a proportional service gap with their own proprietary scheduling. That is, a higher DS level ∗ provides a better (or at least not worse) performance in terms of queuing delay and packet loss.

∗ The ‘DS level’ can be interpreted as the grade of quality provided to a group of packets with an identical DS codepoint in the IP header.

Two DiffServ services are supported by IETF [7], i.e. the premium service (PS), which supports low loss and low delay/jitter, and the assured service (AS), which provides QoS better than best-effort yet without guarantee. For streaming video applications, where encoding/decoding is more resilient to packet loss and delay fluctuations, DiffServ AS seems to be a better match. In this paper, we focus on the DiffServ AS, especially the relative service differentiation, for streaming video applications. Note that MPEG-4 FGS originally assumes guaranteed delivery (e.g. DiffServ PS) of the BL and leaves the EL to the mercy of the best-effort Internet. Here, we relax this requirement, partly to allow a broader range of applicability and partly to avoid the high cost of DiffServ PS.

There exist several ways to realize DiffServ AS, especially the proportional (i.e. relative) DiffServ. For a more detailed description, we refer to [6, 8]. This research adopts the proportional DiffServ model described in [6] for network simulation. In terms of packet loss, the proportional DiffServ model demands that the loss rates of different DS levels are spaced as

$$\frac{\bar{l}_i}{\bar{l}_j} \equiv \frac{\sigma_i}{\sigma_j}, \quad 1 \le i, j \le N, \tag{1}$$

where $\bar{l}_i$ is the average loss rate for DS level $i$, and $\sigma_i$, $i = 1, \ldots, N$, are loss differentiation parameters ordered as $\sigma_1 > \sigma_2 > \cdots > \sigma_N > 0$.

The overview of the proposed scalable video streaming system with DiffServ network QoS provisioning is shown in Fig. 1. The system consists of four key components: (1) scalable source encoding, (2) constant-quality rate adaptation, (3) prioritized packetization, and (4) differentiated forwarding. They are briefly described below:
[Figure 1 block labels: Encoding server, Video Preprocessing, MPEG4 FGS Codec, RD Embedding, Transmission server, Rate Adaptation, BL Prioritization, EL Packetization, Prioritization, Network Node, DiffServ Model, Bandwidth Estimator, End User]
Figure 1. The overview of the proposed scalable video streaming system with network QoS provisioning

• Scalable coding. The source is first encoded by the MPEG-4 FGS codec, where the estimated minimal bandwidth gives the bandwidth constraint for BL. In the coding process, the BL is encoded with a two-pass process, where the first pass focuses on analyzing the sequence, and the second pass performs the real encoding based on the gathered statistics. For the EL, R-D samples are generated for each bitplane and embedded as the VOP user data.
• Rate adaptation. For non-real-time applications, the over-coded bitstream is prestored in the streaming server. Upon a streaming request, the rate adaptation module scales the EL stream based on the feedback of the available bandwidth, referring to the embedded R-D samples to preserve constant quality.
• Prioritized packetization. The rate-adapted stream that contains both BL and EL is then packetized. Fixed-length packetization is adopted for both EL and BL streams as recommended by MPEG-4 [3]. By evaluating the anticipated loss impact of each packet on the end-to-end video quality (i.e. considering the loss impact on the packet itself and on dependent packets), we assign a priority index, called the relative priority index (RPI), to each packet within the priority range of each layer. The priority assignment of a packet is dynamically determined in the media adaptation module, since not only the packet concerned but also its child packets (which vary dynamically) affect the assignment.
• Differentiated forwarding. With the assigned priority, these packets are sent to the DiffServ network to receive different forwarding treatment [8]. By mapping these prioritized packets to the different QoS DS levels, packets will experience different packet loss rates with this differentiated forwarding mechanism. 
This transport-side prioritization may be accompanied by application-side prioritization such as FEC (forward error correction) and ARQ (automatic repeat request). Besides prioritized dropping performed by DiffServ routers, traffic policing can be explicitly carried out at intermediate video gateways/filters (e.g. inside the active DiffServ routers or other special network devices) by using packet filtering. Thus, based on the assigned RPI, rate adaptation and error control can be jointly performed in the proposed system.
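As a small numerical illustration of the proportional loss model in Eq. (1), the sketch below derives the loss rate of every DS level once the loss rate of one level is known. The function name and the sample values of the differentiation parameters are illustrative assumptions and are not taken from the simulations in this paper.

```python
def proportional_loss_rates(sigmas, ref_level, ref_loss):
    """Loss rate of each DS level implied by Eq. (1).

    sigmas    : loss differentiation parameters, sigma_1 > ... > sigma_N > 0
    ref_level : 1-based index of the level whose average loss rate is known
    ref_loss  : average loss rate observed at that level
    """
    if any(s <= 0 for s in sigmas):
        raise ValueError("all sigma_i must be positive")
    if any(a <= b for a, b in zip(sigmas, sigmas[1:])):
        raise ValueError("sigma_i must be strictly decreasing")
    sigma_ref = sigmas[ref_level - 1]
    # Eq. (1): l_i / l_j = sigma_i / sigma_j  =>  l_i = l_j * sigma_i / sigma_j
    return [ref_loss * s / sigma_ref for s in sigmas]


# Hypothetical setting: three DS levels, level 1 loses 8% of its packets.
rates = proportional_loss_rates([4.0, 2.0, 1.0], ref_level=1, ref_loss=0.08)
print(rates)
```

With these assumed parameters, each higher DS level sees half the loss rate of the level below it, which is the "proportional service gap" the model is meant to maintain.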
3. CONSTANT QUALITY RATE ADAPTATION In this section, the CQVRC approach used to achieve the minimal quality variation in both BL and EL is described.
3.1. Proposed Rate Control for BL
To enhance the received video quality, the statistical information of not only the previous and current frames but also future frames must be considered. As done in [12, 13], we exploit a two-pass coding procedure, where the first pass focuses on analyzing the underlying video and collecting the necessary information for the second pass. Then, the second pass performs the actual encoding task using a three-layer rate control.
3.1.1. Video Analysis
We analyze the underlying image sequence so that the sequence can be temporally segmented into relatively homogeneous intervals (i.e. scenes). The sequence is first divided into variable-size segments (i.e. groups of pictures or GOPs) by using the mean of absolute differences (MAD) feature. Scene changes and global motion activity can be captured from the MAD curve between two consecutive frames. Scene changes are located at local maxima of $MAD(i)$. For each frame $i$, the second-order differential can be approximated as

$$MAD''(i) = \frac{d^2 (MAD(i))}{(di)^2} = MAD(i+1) + MAD(i-1) - 2 \cdot MAD(i). \tag{2}$$

Thus, a scene change can be located by detecting the zero-crossing locations of Eq. (2). A new GOP is started at every scene change instance. The proposed rate control aims at providing uniform quality for each temporally segmented GOP so that quality variation occurs only at GOP boundaries. This is because minimizing quality variation (in the PSNR value) among different scenes has little meaning for human perception.

Moreover, we attempt to characterize the relative frame complexity as in [13]. A good complexity measure for a frame should be invariant to the quantization parameter (QP) used. It is observed that the bit count for non-texture information, such as frame headers and motion vectors (denoted by $H(i)$), remains almost constant regardless of QP change. Mostly, only the bit count of the texture part varies significantly with the QP. Based on the MPEG-4 VM R-Q model, if the total number of bits used for coding the current frame $i$ is $R_i$, then the texture information $T(i) = R_i - H_i$ can be represented as

$$\frac{R_i - H_i}{M_i} = \frac{a_1}{Q_i} + \frac{a_2}{Q_i^2}, \tag{3}$$

where $M_i$ is the MAD computed from the motion-compensated residual, which is invariant to the QP. By some straightforward manipulation, we can define the complexity measure of the $i$th frame of GOP $g$, which covers both the motion and the texture bit counts, as follows:

$$c_{g,i} = \frac{R_{g,i} - H_{g,i}}{\overline{R_g - H_g}} + \frac{MV_{(g,i)}}{\overline{MV_g}}, \tag{4}$$

where $MV_{(g,i)}$ is the motion vector bit count for frame $(g, i)$, $\overline{MV_g}$ is the average motion vector bit count of GOP $g$, and $\overline{R_g - H_g}$ is the average bit count of texture information for GOP $g$. Since the obtained complexity measure $c_{g,i}$ is QP invariant, a single run (for one QP) is sufficient to generate the frame complexity. It is also observed that the obtained $c_{g,i}$ is a good measure of the frame complexity. The frame complexity is thus calculated and will be utilized in the second encoding pass, described as follows.
3.1.2.
Three-Layer Constant-Quality Rate Control
The constant-quality rate control works in a three-level hierarchy, i.e. the GOP level, the frame level, and the macroblock (MB) level.

A. Complexity-guided GOP-Level Bit Allocation
To assign bits to each GOP, we first define the complexity measure for each GOP. It is calculated by averaging the frame complexity as $C_g = \sum_i c_{g,i} / N_g$, where $N_g$ is the number of frames in the GOP. Then, the following recursive GOP-level bit allocation is applied.
• Step 1: Initialization. We set $\lambda = 0$, the bit budget $B_r = B$, and the initial buffer fullness $\beta_1 = T_d \cdot R$, where $T_d$ is the target preloading time and $R$ is the channel rate. Then start from the GOP of index 1.
• Step 2: Assign bits to the $g$th GOP by using the formula

$$B_t(g) = \lambda \cdot (R/F) \cdot N_g + (1 - \lambda) \cdot \frac{C_g \cdot N_g}{\sum_g C_g \cdot N_g} \cdot B_r, \tag{5}$$

where $F$ stands for the selected frame rate.
• Step 3: Check the buffer status with the tentatively assigned bit budget $B_t(g)$. If $(\beta_{g-1} + B_t(g)) - (R/F) \cdot N_g < 0.8 \cdot \beta_{max}$, we accept it and go to Step 4. Otherwise, we increase $\lambda$ by 0.1 (moving from the initial $\lambda = 0$ toward the even allocation at $\lambda = 1$) and go back to Step 2.
• Step 4: Update the buffer status via $\beta_g = (\beta_{g-1} + B_t(g)) - (R/F) \cdot N_g$, and set the remaining bit budget $B_r$ to $B_r - B_t(g)$. Then, go to Step 2 for the next GOP of index $g + 1$.

In the above, $\lambda$ represents the weighting factor between the buffer variation and the complexity demands. The case of $\lambda = 0$ means that the bit allocation follows the frame complexity completely. If the buffer constraint can be met, this is the most desirable scenario. On the other hand, $\lambda = 1.0$ represents the case where the bit budget is evenly distributed without considering the frame complexity. In this case, minimal preloading and a minimal decoder buffer size are required, such that only the first frame has to be pre-fetched. The case with $0 < \lambda < 1.0$ represents a bit-allocation tradeoff between the buffer and the quality constraints.

B. Frame-level Bit Allocation
The GOP-level bit allocation solves the problem of allocating the bit budget to each GOP while meeting both the buffer and the quality constraints. However, the problem of allocating the bit budget to each frame within one GOP is still unsolved. To obtain constant quality within each GOP, it is preferable to allocate the bit budget according to the frame complexity while still meeting the buffer constraint. The detailed algorithm is similar to the one for the GOP-level bit allocation and is omitted here.

C. MB-Level Rate Control
In the GOP-level and frame-level bit allocation schemes, we preserve a safe margin (0.8 of the maximal buffer) for buffer regulation. Therefore, one simple mode is to quantize all MBs with the same QP. The QP value is determined in the frame-level rate control with the following trial-and-error strategy, where $B_{actual}(i)$ is the number of bits actually spent on frame $i$.
1. If $B_{actual}(i) > 1.15 \cdot B_t(i)$, then $QP_{i+1} = QP_i + 1$.
2. If $B_{actual}(i) \le 1.15 \cdot B_t(i)$, then we check the condition $B_{actual}(i) < 0.85 \cdot B_t(i)$. If this is true, then $QP_{i+1} = QP_i - 1$; otherwise $QP_{i+1} = QP_i$.
3. We set $QP_{i+1} = \max(QP_{i+1}, 1)$ and $QP_{i+1} = \min(QP_{i+1}, 31)$. 
However, if the allowed buffer is relatively small, then MB level rate control similar to that of MPEG-4 VM [10] can be adopted.
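To make the second-pass bookkeeping of Section 3.1 concrete, the following sketch strings together the frame complexity of Eq. (4), the GOP-level Steps 1 to 4, and the QP update rule. It is a minimal sketch under stated assumptions, not the encoder used in the experiments: all function names are hypothetical, the sum in Eq. (5) is assumed to run over the GOPs that have not yet been coded (because $B_r$ is the remaining budget), and λ is assumed to be raised toward 1 in steps of 0.1 when the buffer check fails and to be capped at 1.

```python
def frame_complexities(texture_bits, mv_bits):
    """Eq. (4): c_{g,i} for every frame of one GOP.

    texture_bits[i] = R_{g,i} - H_{g,i}, mv_bits[i] = MV_{(g,i)}.
    Assumes both GOP averages are non-zero.
    """
    mean_tex = sum(texture_bits) / len(texture_bits)
    mean_mv = sum(mv_bits) / len(mv_bits)
    return [t / mean_tex + m / mean_mv for t, m in zip(texture_bits, mv_bits)]


def allocate_gop_bits(C, N, total_bits, R, F, T_d, beta_max, step=0.1):
    """GOP-level allocation (Steps 1-4).

    C, N       : per-GOP complexity C_g and frame count N_g
    total_bits : bit budget B for the whole sequence
    R, F       : channel rate (bits/s) and frame rate (frames/s)
    T_d        : preloading time (s); beta_max: buffer size (bits)
    """
    lam = 0.0                    # Step 1: purely complexity-driven at first
    B_r = float(total_bits)
    beta = T_d * R               # initial buffer fullness
    budgets = []
    for g in range(len(C)):
        # Assumption: normalise over the GOPs that are still to be coded.
        remaining = sum(c * n for c, n in zip(C[g:], N[g:]))
        drained = (R / F) * N[g]  # bits carried by the channel during GOP g
        while True:
            # Step 2, Eq. (5)
            B_t = lam * drained + (1.0 - lam) * (C[g] * N[g] / remaining) * B_r
            # Step 3: accept if the 0.8 safety margin holds (or lam is capped)
            if beta + B_t - drained < 0.8 * beta_max or lam >= 1.0:
                break
            lam = min(1.0, lam + step)
        beta = beta + B_t - drained   # Step 4: buffer and budget update
        B_r -= B_t
        budgets.append(B_t)
    return budgets, lam


def next_qp(qp, actual_bits, target_bits):
    """Trial-and-error QP update, clipped to the range [1, 31]."""
    if actual_bits > 1.15 * target_bits:
        qp += 1
    elif actual_bits < 0.85 * target_bits:
        qp -= 1
    return max(1, min(31, qp))


# Hypothetical two-GOP sequence: the second GOP is twice as complex.
budgets, lam = allocate_gop_bits(C=[1.0, 2.0], N=[10, 10], total_bits=200_000,
                                 R=100_000, F=10, T_d=0.5, beta_max=200_000)
print([round(b) for b in budgets], lam)
```

In this toy setting the buffer is large enough that λ stays at 0, so the second GOP receives twice the bits of the first; shrinking `beta_max` forces λ upward and pulls the allocation toward an even split.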
3.2. Proposed Rate Control for EL
In order to achieve constant quality, it is necessary to model the real R-D curve. To avoid the modeling inaccuracy of existing closed-form models at low bit rates [10], Lin et al. utilize a set of R-D samples to approximate the complete R-D relationship via cubic-spline interpolation [15]. Besides higher accuracy, the R-D approximation based on samples demands a low complexity and a small overhead. Thus, in our approach, R-D samples for MPEG-4 FGS EL are embedded for each coded bitplane. The R-D characteristic is assumed to be uniform within each bitplane since the distortion reduction is roughly determined by the QP. Consequently, only R-D samples at the beginning of each bitplane have to be calculated and embedded. Typically, there are only a few bitplanes, i.e. 6 to 7 bitplanes, corresponding to a wide range of QP (e.g. from 1 to $2^6$ or $2^7$ times QP). Since the DCT is a unitary transform, the distortion associated with the R-D samples in each bitplane can be directly calculated in the coefficient domain. The R-D samples of each bitplane introduce a negligible amount of overhead and demand a very low computational complexity. The generated R-D samples can be stored either in the user data of each VOP or in a separate file. Given a bit allocation $R_{EL}$, suppose $R_i \le R_{EL} \le R_{i+1}$, where $R_i$ and $R_{i+1}$ are the rates at bitplanes $i$ and $i + 1$, and $D_i$ and $D_{i+1}$ are the corresponding distortions. The distortion at $R_{EL}$ is then equal to

$$D(R_{EL}) = \frac{R_{i+1} - R_{EL}}{R_{i+1} - R_i} \times (D_i - D_{i+1}) + D_{i+1}. \tag{6}$$
In Fig. 2, the piecewise-linear R-D curve obtained via interpolation is compared to the empirical curve for the 1st, 15th, and 30th frames of the “Foreman” CIF sequence. It is verified that the piecewise-linear R-D model can approximate the empirical R-D curve quite well.
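The interpolation of Eq. (6) can be sketched in a few lines. The function name and the sample R-D points below are hypothetical; the only assumption about the embedded data is that one (rate, distortion) pair is available at each bitplane boundary, with rates strictly increasing.

```python
from bisect import bisect_right


def interpolate_distortion(rates, dists, r_el):
    """Piecewise-linear distortion estimate of Eq. (6).

    rates : strictly increasing cumulative EL rates at bitplane boundaries
    dists : distortions (e.g. MSE) at those rates
    r_el  : EL bit allocation, rates[0] <= r_el <= rates[-1]
    """
    if not rates[0] <= r_el <= rates[-1]:
        raise ValueError("r_el lies outside the sampled rate range")
    # Index i of the segment [R_i, R_{i+1}] that contains r_el.
    i = min(bisect_right(rates, r_el) - 1, len(rates) - 2)
    r0, r1 = rates[i], rates[i + 1]
    d0, d1 = dists[i], dists[i + 1]
    return (r1 - r_el) / (r1 - r0) * (d0 - d1) + d1


# Hypothetical samples for one frame (bits per frame, MSE).
rates = [0, 2000, 6000, 14000]
dists = [60.0, 40.0, 20.0, 5.0]
print(interpolate_distortion(rates, dists, 4000))
```

Because only the segment end points are stored, the per-frame overhead is one pair of numbers per bitplane, and each lookup costs a single binary search.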
[Figure 2 plot: MSE versus bits/frame (0 to 14000); real and interpolated curves for frames F1, F15, and F30.]
Figure 2. The comparison of interpolated and real (i.e. empirical) distortions.
[Figure 3 diagram labels: BE, AF1, ..., AFn queues; Scheduler; Relatively proportional Loss Rate.]
Figure 3. The DiffServ node with multiple queues.
With this interpolated R-D model, we adopt a sliding-window approach to perform rate adaptation. Suppose that window $W_i$ includes $M_{W_i}$ frames. Then, the rate allocation to achieve constant quality can be performed within each window independently. In terms of mathematics, the problem can be written as

$$\min \sum_{j \in W_i} \| D_j - D_{j-1} \|, \tag{7}$$

subject to

$$\sum_{j \in W_i} B_j \le \frac{M_{W_i} \cdot R_{W_i}}{FR_{W_i}} - B_{BL},$$

where $FR_{W_i}$ is the encoding frame rate inside the window $W_i$, $R_{W_i}$ represents the available bit rate (bits/sec) at the start time of window $i$, and $B_{BL}$ is the total bit budget for BL within this window. To reach the optimal solution, binary search is utilized to find the best $D_{W_i}$, which minimizes the distortion variation among frames. That is,
• Step 1: Take the minimal frame distortion of all BL frames within one sliding window as the initial value of $D_{W_i}$.
• Step 2: Calculate $\sum_j B_j$ based on the given $D_{W_i}$ by using the piecewise-linear R-D model.
• Step 3: If $\sum_j B_j < \left( \frac{M_{W_i} \cdot R_{W_i}}{FR_{W_i}} - B_{BL} \right) - \Delta B - \delta$, then set $D_{W_i} = \frac{D_{W_i} + D_{low}}{2}$, $D_{high} = D_{W_i}$ and return to Step 2. Else, if $\sum_j B_j > \left( \frac{M_{W_i} \cdot R_{W_i}}{FR_{W_i}} - B_{BL} \right) - \Delta B + \delta$, then set $D_{W_i} = \frac{D_{W_i} + D_{high}}{2}$, $D_{low} = D_{W_i}$ and return to Step 2. Note that $\delta$ is a negligible factor that controls the rate adaptation accuracy. Else, continue with Step 4.
• Step 4: Move to the next window, update by $\Delta B = B_{real} - \frac{M_{W_i} \cdot R_{W_i}}{FR_{W_i}}$, $R_{W_{i+1}} = R_{W_i} + M_{W_i}$. If we are not at the end of the sequence, go back to Step 1.
Only a few iterations are needed in the above binary search of RWi and DWi . Since a simple interpolation scheme is needed to calculate these values, its complexity is low enough to be performed in real-time. This approach works best for slowly varying channel conditions. For a fast varying channel as shown in Section 4, the proposed approach is still acceptable when differentiated forwarding can mitigate packet losses due to inaccurate rate adaptation.
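The binary search of Steps 1 to 3 can be sketched as follows for a single window. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, each frame is assumed to carry its embedded R-D samples as a pair of lists (cumulative EL rate, distortion) whose first entry describes the BL alone, the search bounds $D_{low}$ and $D_{high}$ are assumed to be the smallest and largest sampled distortions in the window, and the carry-over term $\Delta B$ is folded into the budget by the caller.

```python
def rate_for_distortion(rates, dists, d_target):
    """Invert the piecewise-linear R-D curve: EL bits needed to reach d_target."""
    if d_target >= dists[0]:
        return float(rates[0])    # the BL alone already meets the target
    if d_target <= dists[-1]:
        return float(rates[-1])   # all bitplanes are needed
    for i in range(len(rates) - 1):
        d0, d1 = dists[i], dists[i + 1]
        if d1 <= d_target <= d0 and d0 > d1:
            frac = (d0 - d_target) / (d0 - d1)
            return rates[i] + frac * (rates[i + 1] - rates[i])
    return float(rates[-1])


def window_el_budget(num_frames, rate_bps, frame_rate, bl_bits, carry_over=0.0):
    """EL budget of one window: M*R/FR - B_BL - Delta_B."""
    return num_frames * rate_bps / frame_rate - bl_bits - carry_over


def constant_quality_allocation(frames, el_budget, delta=1.0, max_iter=60):
    """Find a common distortion D_W whose total EL rate matches el_budget.

    frames : list of (rates, dists) R-D sample lists, one pair per frame
    Returns (D_W, per-frame EL bit allocation).
    """
    d_low = min(d[-1] for _, d in frames)
    d_high = max(d[0] for _, d in frames)
    d_w = min(d[0] for _, d in frames)   # Step 1: minimal BL distortion
    for _ in range(max_iter):
        total = sum(rate_for_distortion(r, d, d_w) for r, d in frames)  # Step 2
        if total < el_budget - delta:    # Step 3: bits left, aim for lower D
            d_high, d_w = d_w, (d_w + d_low) / 2
        elif total > el_budget + delta:  # over budget, accept higher D
            d_low, d_w = d_w, (d_w + d_high) / 2
        else:
            break
    return d_w, [rate_for_distortion(r, d, d_w) for r, d in frames]


# Hypothetical window with two frames of different complexity.
frames = [([0, 1000, 3000], [50.0, 30.0, 10.0]),
          ([0, 2000, 6000], [80.0, 40.0, 20.0])]
d_w, alloc = constant_quality_allocation(frames, el_budget=5000)
print(d_w, alloc)
```

Since the total rate is non-increasing in the target distortion, the interval between the two bounds halves at each iteration, which is why only a few iterations are needed in practice.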
4. FIXED-LENGTH PACKETIZATION AND DIFFERENTIATED FORWARDING
4.1. Fixed Length Packetization and Priority assignment
The packetization scheme has a significant effect on the efficiency and error resiliency of video streaming. Two packetization schemes are used in the context of video streaming, i.e. the variable-length packetization (e.g. GOB packets of H.263+) and the fixed-length packetization (e.g. MPEG-4 video packets), where video packets of a similar length are formed. The packet size is related to efficiency and error resiliency, since a smaller packet size demands a higher overhead but is more resilient to errors. Recently, to improve error resiliency, a discrete optimization problem to minimize the distortion was formulated in the packetization of an embedded stream [9]. In this work, we follow the fixed-length packetization (FLP) due to the following considerations. First, FLP can avoid the inefficiency of GOBs of a very small size. More importantly, compared with GOB-based packetization, FLP automatically separates a larger motion region into several packets while grouping several static regions into one single packet. The coding of a motion region into multiple packets is able to spread the loss impact, thus increasing error resiliency.

Each packet is then assigned a certain priority according to its impact on the end-to-end visual quality. For different service preferences in terms of loss and delay, the priority can be further divided into the relative loss index (RLI) and the relative delay index (RDI) as given in [8]. If the assigned priority reflects the impact of each packet on the end-to-end quality well, graceful quality degradation can be achieved by dropping packets with respect to the priority index. Determining the packet priority with a low complexity is an active research area today. Several features such as the initial error strength (i.e. in MSE, assuming that the packet loss is concealed), its propagation via motion vectors, and the spatial filtering effect were used to develop a corruption model in [11] to determine the packet priority in terms of the loss impact. For BL packets, we adopt the accurate priority rather than its approximation. That is, we empirically measure the overall MSE when the packet of concern gets lost. By doing so, we can analyze the gain from differentiated forwarding. Besides, since BL is normally determined by the minimal bandwidth (which is typically fixed), the priority of BL packets can be calculated off-line. For EL packets, the priority assignment is simplified due to the strict separation of frames along the temporal direction. 
The packet loss within EL only affects a single frame, and it does not propagate. The incurred distortion from each EL packet can be accurately calculated within each frame. The packet priority can be calculated as

$$\rho_i = \frac{\Delta D_i}{\Delta R_i}, \tag{8}$$

where $\Delta D_i$ represents the incurred distortion due to the specified loss, and $\Delta R_i$ is the rate of the packet of concern. In addition, the packet dependency has to be taken into consideration: if packets containing a more significant bitplane get lost, packets containing the less significant bitplanes in the same region should be discarded anyway. Hence, the final packet loss index can be calculated as

$$RLI_i^{(EL)} = \sum_{s \in SD_i} \rho_s + \rho_i,$$

where $SD_i$ is the descendant set of packet $i$. By using the piecewise-linear R-D model for each bitplane, the priority of EL packets can be easily calculated on-line during the packetization procedure.
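A short sketch of the EL priority computation follows. The packet records, field names, and numbers are hypothetical; the descendant set is assumed to be the transitive closure of a per-packet list of direct children (the packets of less significant bitplanes in the same region).

```python
def relative_loss_indices(packets):
    """RLI of every EL packet: its own rho plus the rho of all descendants.

    packets : dict mapping packet id -> {"delta_d": distortion incurred by
              losing the packet, "delta_r": packet size in bits,
              "children": ids of directly dependent packets}
    """
    # Eq. (8): distortion incurred per bit of the packet.
    rho = {pid: p["delta_d"] / p["delta_r"] for pid, p in packets.items()}

    def descendants(pid, seen):
        for child in packets[pid]["children"]:
            if child not in seen:
                seen.add(child)
                descendants(child, seen)
        return seen

    return {pid: rho[pid] + sum(rho[s] for s in descendants(pid, set()))
            for pid in packets}


# Hypothetical region coded in three bitplanes, most significant first.
packets = {
    "bp1": {"delta_d": 40.0, "delta_r": 1000, "children": ["bp2"]},
    "bp2": {"delta_d": 20.0, "delta_r": 2000, "children": ["bp3"]},
    "bp3": {"delta_d": 8.0, "delta_r": 4000, "children": []},
}
rli = relative_loss_indices(packets)
print(rli)
```

Because every term is non-negative, a packet never ranks below any of its descendants, so dropping packets in increasing RLI order automatically discards the less significant bitplanes first.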
4.2. Differentiated Forwarding
With each packet assigned a certain priority, differentiated forwarding can be employed accordingly. We first describe the error resilient coding for both BL and EL in MPEG-4 FGS, and then examine the DiffServ model that applies the differentiated forwarding mechanism to the error-resilient video streams.

For the non-scalable BL of MPEG-4 FGS, several error resilient tools have been recommended by MPEG-4. They include video packets, data partitioning (DP), reversible variable length codes (RVLC), cyclic/adaptive intra refresh (CIR/AIR), and NewPred (also known as reference picture selection or RPS). DP and RVLC are mainly for the partial decoding of video packets with bit errors, and they are less relevant to the Internet, where packet loss is dominant. CIR and AIR are encoding choices, and AIR improves on CIR by using a more intelligent refresh based on the motion and so on. In our approach, both AIR and CIR are applied to the BL and their performance is compared for the UEP as well as the EEP scenarios. Also, an intelligent error concealment (EC) scheme is adopted, where lost MBs in a P-frame are interpolated from the upper and lower MBs based on the motion. Finally, as mentioned before, fixed-length packetization is applied to both BL and EL streams.

Differentiated forwarding of the non-scalable ITU-T H.263+ stream was discussed in [8], where proportional DS levels were used in prioritized video streaming. For scalable MPEG-4 FGS, it is usually assumed that the available bandwidth is sufficient to cover the BL stream because a relatively small bandwidth is required for the BL. Compared to EL packets, the BL packets are very important and should be protected effectively. That is, to secure reliable transmission, the BL stream should be mapped to higher priority DS levels. Even if the available bandwidth incidentally goes below the rate demanded by BL, we may assume that some minimal bandwidth (which could be smaller than the BL rate) is still sustained. By prioritizing the BL stream and protecting it accordingly, we can preserve its minimal quality by delivering at least the highest priority portion of BL packets. Thus, as a simple test case, we consider three BL categories for relative proportional DiffServ. For EL packets, another lower priority class queue (i.e. DS level) with two different drop preferences is assigned †. Thus, both BL and EL have three different drop preferences. The multiple queue DiffServ node illustrated in Fig. 3 performs the differentiated forwarding policy.

In addition to priority dropping, rate adaptation can significantly reduce network congestion when a good estimate of the available bandwidth is achieved. Rate adaptation can be performed at either the server side or the edge router in the DiffServ network. Packets can be dropped in advance by rate adaptation strictly following the priority order. Compared to priority dropping in differentiated forwarding, rate adaptation provides more graceful quality degradation when the available bandwidth becomes small. However, we have to pay some price to achieve this goal, e.g. the complexity required to estimate the available bandwidth, and the inevitable time delay between bandwidth measurement and rate adaptation. Thus, in the proposed system, a compromise solution is suggested. When there is a big change in the available bandwidth, rate adaptation is performed. Otherwise, only DiffServ forwarding is employed.
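The compromise policy can be illustrated with the sketch below. Both the function and the 20% threshold that decides what counts as "a big change" are hypothetical choices made for this example; the text does not specify a threshold. Packets are represented as (RLI, size in bits) pairs.

```python
def adapt_stream(packets, prev_bw, new_bw, window_s, change_threshold=0.2):
    """Decide which packets of one window are handed to the network.

    packets          : list of (rli, size_bits) pairs
    prev_bw, new_bw  : previous and newly estimated bandwidth (bits/s)
    window_s         : duration covered by the packets (s)
    change_threshold : assumed relative change that triggers rate adaptation
    """
    if abs(new_bw - prev_bw) <= change_threshold * prev_bw:
        # Small change: send everything and rely on DiffServ priority dropping.
        return list(packets)
    budget = new_bw * window_s
    kept, used = [], 0
    # Big change: pre-drop strictly in priority order, highest RLI kept first.
    for rli, size in sorted(packets, key=lambda p: p[0], reverse=True):
        if used + size > budget:
            break  # this packet and every lower-priority one are dropped
        kept.append((rli, size))
        used += size
    return kept


# Hypothetical one-second window whose bandwidth estimate halves.
window = [(0.052, 1000), (0.012, 2000), (0.002, 4000)]
print(adapt_stream(window, prev_bw=7000, new_bw=3500, window_s=1.0))
```

Stopping at the first packet that does not fit, instead of skipping it and trying smaller ones, keeps the drop order strictly aligned with the priority order described above.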
5. EXPERIMENTAL RESULTS In this section, we first show the performance of the FGS rate control scheme with three different options, i.e. constant quality in BL only, in EL only, and in both. Then the performance of the proposed system is studied through extensive simulations.
5.1. Constant-Quality Rate Control for FGS codec 5.1.1. CQVRC for BL only
We plot performance comparison results obtained for the “Foreman” sequence encoded at a frame rate of 10 fps and a bit rate of 128 kbps in Fig. 4. The average PSNR values for CQVRC, MPEG4-VM and MPEG-2 TM5 rate control are 30.99 dB, 30.43 dB and 30.34 dB, respectively. Thus, the proposed CQVRC scheme achieves 0.55 ∼ 0.65 dB performance improvement in the average PSNR value over MPEG-2 TM5 and MPEG4-VM rate control schemes. Besides, CQVRC has a much smaller PSNR variation.
[Figure 4 plot: PSNR (dB) versus frame number (1 to 91); curves CQVRC-2s, TM5, and MPEG4-VM.]
Figure 4. Comparison of different rate controls for BL.
[Figure 5 plot: PSNR (dB) versus frame number (1 to 91); curves CQVRC, FGSRC, and BL.]
Figure 5. Comparison of different rate controls for EL coded at 160 kbps.
5.1.2. CQVRC for EL only
We study the performance of CQVRC applied to EL under CBR and time-varying CBR channels. First, for the CBR channel, we perform CQVRC on the 10-second “Foreman” sequence whose BL is encoded at 10 fps and 128 kbps with MPEG-2 TM5 rate control and whose EL is encoded at 160 kbps. These 100 frames can be temporally segmented into two scenes such that the first scene is mainly about the worker (i.e. frames 1 to 50) while the second scene is about the building (i.e. frames 70 to 100). Frames 51 to 70 are transition frames between the two scenes. The optimization window adopted is 50 frames long, and the two scenes lie in two separate optimization windows. We show the PSNR comparison of the proposed CQVRC and MPEG4-FGS rate control (FGSRC) in Fig. 5. The proposed CQVRC achieves an almost constant PSNR for every frame in the first scene and a much smaller variation in the second scene. The maximal PSNR difference of CQVRC is 0.38 dB for the first scene and 2.2 dB for the second scene. In comparison, the maximal PSNR difference of FGSRC is 2.8 dB and 6.4 dB for the first and second scenes, respectively.

† If we include the best-effort service as the worst one, then there are three drop preferences for the EL in total.

Next, when a channel is associated with bandwidth estimation (e.g. bandwidth feedback), it can be better modelled by re-negotiated CBR (R-CBR), since the channel bandwidth is assumed to be constant between two consecutive estimation points. We consider one simple bandwidth trace with a binomial distribution and a feedback period of 5 seconds. The total simulation duration is 10 seconds, using the 300-frame “Foreman” CIF sequence, whose BL is encoded at 30 fps and 240 kbps. Now, the optimization window size in CQVRC is adapted to the period of the bandwidth change (i.e. 150 frames). The bit rates of the two bandwidth levels are 960 kbps (i.e. the first 150 frames) and 600 kbps (i.e. the last 150 frames), respectively. The comparison of the PSNR value and its variation for CQVRC and FGSRC is given in Fig. 6. Compared with FGSRC, a much smaller quality variation is achieved by the proposed CQVRC. In fact, almost constant PSNR values are obtained within the optimization window by the CQVRC. Visible quality variation happens only at the boundary of the optimization window. An important application of CQVRC is in QoS-enabled networks such as IntServ, where bandwidth re-negotiation is supported and CQVRC can be applied between two renegotiation points.
Figure 6. The PSNR comparison of CQVRC and FGSRC under the R-CBR channel. (Axes: PSNR in dB versus frame number, 1 to 271; curves: CQVRC, FGSRC, BL.)

Figure 7. The PSNR comparison of different rate control options of BL and EL. (Axes: PSNR versus frame number, 1 to 91; curves: TM5+FGSRC_EL, TM5+CQVRC_EL, CQVRC_BL+CQVRC_EL.)
5.1.3. CQVRC for both BL and EL

Finally, we apply CQVRC to both BL and EL. Fig. 7 compares CQVRC with the MPEG-4 default approach, which uses MPEG-2 TM5 for the BL and uniform bit allocation for the EL. Once again, the proposed approach significantly improves both the average PSNR and the PSNR variation.
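The two criteria used in these comparisons, the average PSNR and the PSNR variation, can be computed from a per-frame PSNR trace as in the following sketch. The function name `psnr_summary` is hypothetical. The maximal PSNR difference corresponds to the measure quoted for Fig. 5, while the standard deviation is an additional, assumed way of expressing variation.

```python
from statistics import mean, pstdev


def psnr_summary(psnr_db):
    """Summarize a per-frame PSNR trace (in dB) over one scene or window."""
    return {
        "mean": mean(psnr_db),                   # average quality
        "std": pstdev(psnr_db),                  # spread around the average
        "max_diff": max(psnr_db) - min(psnr_db)  # maximal PSNR difference
    }


# Illustrative values only, not measurements from the paper.
print(psnr_summary([30.0, 32.0, 31.0]))
```

A rate control scheme that targets constant quality should reduce `std` and `max_diff` within a window without lowering `mean`.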
5.2. Differentiated Forwarding of MPEG-4 FGS

The simulation setup is given in Fig. 8. Several typical scenarios in video streaming applications are identified and evaluated. The gain of prioritized transmission over non-prioritized transmission is measured, where the proportional DiffServ model is adopted in the network simulation.

5.2.1. Scenario 1: Differentiated Forwarding of BL Packets

As discussed, the MPEG-4 FGS codec is built on the assumption that the available bandwidth is sufficient to transmit BL packets. However, in case of severe network congestion, it is desirable to degrade the quality of the BL gracefully. Here, we first compare the performance of prioritized transmission of BL packets with that of non-prioritized transmission. The prioritized transmission adopts the proportional DiffServ model with three DS levels for proportional loss rates. One interesting issue here is how to map the continuously prioritized packets to three discrete DS levels. One possible solution is the mapping strategy proposed in [8], where packets are mapped to a limited number of DS levels with the goal of minimizing quality degradation under a pricing mechanism. However, for simplicity, we adopt a simpler QoS mapping policy in this simulation: a direct mapping from RLI to DS levels. All packets are clustered
into three groups, each of which has a similar number of packets, and each group of packets is mapped to one DS level. For the simulation setup in Fig. 8, we have to set various parameters in the encoding and the packetization modules. Selected parameters are shown in Table 1. The BL packets are encoded by using the MPEG-4 FGS codec with MPEG-2 TM5 rate control at 128 kbps. The CIF sequence is encoded at 10 fps with a leading I-frame followed by all P-frames. Since packet loss in the initial I-frame is too catastrophic, we limit packet loss to P-frames only. Both AIR and CIR simulations are performed and compared. The performance gains of those two modes are shown in Fig. 9(a) and (b). The DiffServ transmission has a clear gain in terms of PSNR under the same bit budget and the same overall packet loss ratio. It is interesting to note that the gains differ significantly between the two cases listed in Table 1. The smaller gain is obtained in Case 1. The RLI distribution of BL packets under Case 1 is illustrated in Fig. 10(a).

Figure 8. The diagram of the proposed simulation setup. (Blocks: video source, MPEG-4 FGS encoder, RPI generation, packetizer, prioritizer and rate adaptation with estimated available bandwidth, DiffServ network model introducing network delay and packet loss, de-packetizer, MPEG-4 FGS ER decoder, loss/error concealment, reconstructed video, and end-to-end video quality evaluation by PSNR.)

Figure 9. The PSNR comparison of EEP/UEP for BL packets: (a) EEP and 3-level DS-UEP of Case 1 and (b) EEP and 3-level DS-UEP of Case 2. (Axes: PSNR versus packet loss ratio, 0% to 9%; curves: AIR_EEP, AIR_3DS_UEP, CIR_EEP, CIR_3DS_UEP.)

Table 1. Parameters for differentiated forwarding of BL packets of MPEG-4 FGS

Case  Bitrate   Frame rate  Rate control  GOP mode  ER options  Packet size
1     128 kbps  10 fps      TM5           IPPPP...  VP, AIR     400 bytes
2     128 kbps  10 fps      TM5           IPPPP...  VP, CIR     400 bytes

5.2.2. Scenario 2: Differentiated Forwarding of EL Packets

BL and EL packets are usually protected by two error protection levels. In [14], different bitplanes within the EL are unequally protected. However, even the same bitplanes in different frames may contribute differently to the end-to-end visual quality, so they may also be protected unequally. For example, EL packets of a low-quality BL frame typically have a higher impact than those of a high-quality BL frame. Thus, we propose to apply differentiated forwarding to EL packets.
By utilizing the priority derived from R-D samples, we prioritize each EL packet and perform differentiated forwarding accordingly. The RLI distribution of EL packets at a rate of 384 kbps is given in Fig. 10(b).
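The direct mapping from RLI to DS levels described in Scenario 1, where packets are clustered into three groups of similar size and each group is mapped to one DS level, can be sketched as follows. This is a minimal illustration, not the simulation code: the function name `map_to_ds_levels` is hypothetical, a larger RLI is assumed to mean a more important packet, and DS level 0 is assumed to be the class with the lowest loss rate.

```python
def map_to_ds_levels(rli, n_levels=3):
    """Assign each packet a DS level by rank of its RLI.

    Packets are sorted by decreasing RLI and split into n_levels groups of
    (nearly) equal size. Level 0 is assumed to be the best-protected class.
    """
    n = len(rli)
    # Indices of packets, most important first.
    order = sorted(range(n), key=lambda i: rli[i], reverse=True)
    levels = [0] * n
    for rank, idx in enumerate(order):
        levels[idx] = rank * n_levels // n
    return levels


# Illustrative RLI values only.
print(map_to_ds_levels([0.5, 3.0, 1.2, 0.1, 2.2, 0.9]))
# prints [2, 0, 1, 2, 0, 1]
```

Because the grouping depends only on rank, it ignores how far apart the RLI values are. A mapping such as the one in [8] takes additional factors into account.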
Figure 10. The RLI distribution for packets in the “Foreman” sequence: (a) BL packets under Case 1 and (b) EL packets under 384 kbps. (Legends: BL-priority; EL 384 kbps, avg = 0.026, stddev = 0.035.)

(Figure 11 panels, PSNR versus packet loss ratio for RA, 3DS-UEP and EEP: (a) 512 kbps EL, gain 1.1 dB; (b) 384 kbps EL, gain 0.8 dB; (c) 256 kbps EL, gain 0.57 dB; (d) 128 kbps EL, gain 0.15 dB.)
Figure 11. The PSNR comparison of rate adaptation, UEP, and EEP for the “Foreman” sequence under different EL rates: (a) EL at 512 kbps, (b) EL at 384 kbps, (c) EL at 256 kbps, and (d) EL at 160 kbps, where the respective no-loss PSNR is 34.99 dB, 34.19 dB, 32.97 dB and 32.07 dB (Gain is for UEP over EEP).

We show the performance advantage of priority dropping in DiffServ over uniform dropping for EL packets at 512 kbps, 384 kbps, 256 kbps, and 160 kbps in Figs. 11(a)-(d), respectively. As shown in these figures, prioritized transmission has a clear gain in PSNR under the same bit budget and the same packet loss ratio. We also present the ideal performance of rate adaptation when an accurate estimate of the bandwidth is available. More graceful quality degradation can be achieved since rate adaptation drops packets in a strict order of priority. Another observation is that the advantage of rate adaptation over UEP and the advantage of UEP over EEP vary with the packet loss ratio and the EL bit rate. Under a high packet loss ratio (i.e. 25 percent or more), rate adaptation has a significant gain over UEP. On the other hand, the gain of rate adaptation over UEP becomes smaller under a low packet loss ratio. If there is a substantial amount of bandwidth variation (i.e. corresponding to a high packet loss ratio), a higher gain can be achieved from rate adaptation at the cost of additional complexity to estimate the available bandwidth. Under a small or medium range of bandwidth fluctuation, UEP without rate adaptation works well without losing much efficiency. Finally, the EL bit rate also affects the gain of UEP over EEP. The smaller the EL bit rate, the smaller the gain of UEP over EEP.

5.2.3. Impact of Packet Priority Distribution on the UEP Gain

Coding and packetization parameters have an impact on the gain of UEP over EEP in both BL and EL packets.
The AIR coding choice and fixed-length packetization narrow the performance gap between UEP and EEP (relative to CIR and GOB packetization). This phenomenon can be explained by the packet priority distribution shown in Table 2. As an extreme case, when all packets have the same priority, the gain of UEP over EEP is zero. By modifying the coding and packetization modes, we modify the priority distribution as well as the performance gap. Most work on UEP attempts to spread the priority distribution over a wide region so that the gain from UEP is highlighted. In conclusion, for the UEP approach, a more widely spread priority index provides more graceful quality degradation: the performance gap between UEP and EEP is largest when the priorities are spread across a wide range, and it narrows when packet priorities are clustered in a small region.

Table 2. Priority distribution of BL and EL packets under different encoding and packetization parameters.

Stream     P̃avg   σP      Gain
BL Case 1  3.04   2.45    0.97 dB
BL Case 2  3.35   3.09    2.13 dB
160k EL    0.023  0.026   0.15 dB
256k EL    0.024  0.031   0.57 dB
384k EL    0.026  0.035   0.8 dB
512k EL    0.027  0.0038  1.1 dB
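The link between the spread of packet priorities and the benefit of priority-aware dropping can be illustrated with a toy model. The sketch assumes that the quality loss caused by dropping a packet is proportional to its priority value. This is a simplification, not the distortion measure used in the simulations. It compares uniform dropping (as in EEP) with dropping in strict priority order (the ideal attributed to rate adaptation above), and a 3-level UEP scheme would be expected to fall between the two. All function names are hypothetical.

```python
def lost_priority_uniform(priorities, loss_ratio):
    """Expected lost priority when every packet is dropped with equal probability."""
    return loss_ratio * sum(priorities)


def lost_priority_strict(priorities, loss_ratio):
    """Lost priority when the least important packets are dropped first."""
    n_drop = round(loss_ratio * len(priorities))
    return sum(sorted(priorities)[:n_drop])


def priority_gap(priorities, loss_ratio):
    """Reduction in lost priority of strict-order over uniform dropping."""
    return (lost_priority_uniform(priorities, loss_ratio)
            - lost_priority_strict(priorities, loss_ratio))


# Illustrative priorities only: identical versus widely spread values.
print(priority_gap([1] * 10, 0.2))           # identical priorities
print(priority_gap(list(range(1, 11)), 0.2)) # spread priorities
```

With identical priorities the gap vanishes, which mirrors the extreme case mentioned in the text, and the gap grows as the priorities spread out. The model reproduces the trend only and says nothing about the PSNR gains in Table 2.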
REFERENCES

1. D. Wu, Y. T. Hou, W. Zhu, Y.-Q. Zhang, and J. M. Peha, “Streaming video over the Internet: approaches and directions,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 1, pp. 1-20, Feb. 2001.
2. J. Kim, Y.-G. Kim, H. Song, T.-Y. Kuo, Y. J. Chung, and C.-C. J. Kuo, “TCP-friendly Internet video streaming employing variable frame-rate encoding and interpolation,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 10, no. 7, pp. 1164-1177, Oct. 2000.
3. W. Li, “Overview of fine granularity scalability in MPEG4 video standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301-317, Mar. 2001.
4. I. Stoica, S. Shenker, and H. Zhang, “Core-stateless fair queuing: Achieving approximately fair bandwidth allocations in high speed networks,” in Proc. SIGCOMM, Vancouver, BC, Canada, Sept. 1998.
5. C. Dovrolis, D. Stiliadis, and P. Ramanathan, “Proportional differential services: Delay differentiation and packet scheduling,” in Proc. SIGCOMM, Boston, MA, Sept. 1999.
6. C. Dovrolis and P. Ramanathan, “Proportional differential services - Part II: Loss rate differentiation and packet dropping,” in Proc. International Workshop on Quality of Service, Pittsburgh, PA, June 2000.
7. S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, “An architecture for differentiated services,” RFC 2475, IETF, Dec. 1998.
8. J. Shin, J. Kim, and C.-C. J. Kuo, “Quality-of-Service mapping mechanism for packet video in differentiated services network,” IEEE Trans. on Multimedia, vol. 3, no. 2, pp. 219-231, June 2001.
9. X. Wu, S. Cheng, and Z. Xiong, “On packetization of embedded multimedia bitstreams,” IEEE Trans. on Multimedia, vol. 3, no. 1, pp. 132-140, March 2001.
10. H. Lee, T. Chiang, and Y.-Q. Zhang, “Scalable rate control for MPEG-4 video,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 10, no. 6, pp. 878-894, Sept. 2000.
11. J.-G. Kim, J. Kim, J. Shin, and C.-C. J. Kuo, “Coordinated packet level protection employing corruption model for robust video transmission,” in SPIE Proc. VCIP, San Jose, CA, Jan. 2001.
12. I.-M. Pao and M.-T. Sun, “Encoding stored video for streaming applications,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 199-209, Feb. 2001.
13. V. Varsa and M. Karczewicz, “Long window rate control for video streaming,” in Proc. Packet Video Workshop 2001, Kyungju, Korea, May 2001.
14. Q. Zhang, G. Wang, W. Zhu, and Y.-Q. Zhang, “Robust scalable video streaming over Internet with network-adaptive congestion control and unequal loss protection,” in Proc. Packet Video, May 2001.
15. J. Lin and A. Ortega, “Bit-rate control using piecewise approximated rate-distortion characteristics,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, no. 4, pp. 446-459, Aug. 1998.