OPTIMAL BIT ALLOCATION FOR LOW BIT RATE VIDEO STREAMING APPLICATIONS
Zhihai He, Chang Wen Chen
Jianfei Cai
Sarnoff Corporation 201 Washington Road Princeton, NJ 08543
Univ. of Missouri-Columbia Dept. of Electrical Engineering Columbia, MO 65211
ABSTRACT
Current rate control schemes in video coding standards lack efficient frame-level bit allocation because of the inherent constraints of real-time encoding. In this paper, we assume an offline video encoding environment and propose a rate control scheme based on optimal bit allocation for low bit rate streaming applications. Specifically, at the encoder, we code each video sequence in two passes. In the first pass, we generate the characteristic information of the video sequence, including its rate-distortion (R-D) functions. In the second pass, based on the available channel bandwidth and the characteristic information extracted in the first pass, we perform optimal frame-level bit allocation so that the video sequence can be coded at a low bit rate with improved quality. Experimental results demonstrate that the proposed scheme achieves not only a noticeable reduction in average distortion but also a more consistent and smoother visual quality.
1. INTRODUCTION
With the rapidly growing demand for streaming services, including video-on-demand, digital libraries and non-interactive distance learning, video streaming has received much attention over the last few years. Unlike text or images, video sequences are typically very large: digital video data in its original format is too voluminous for transmission or storage. A certain degree of compression is therefore required, and it is often constrained by the target bit rate, that is, the bandwidth of the channel. In a typical video streaming system, video sequences are encoded in advance and stored in a server, usually located in a backbone network. Users access the server through a network access point. Since access points usually have much lower bandwidth than backbone networks, the target bit rate for such a video streaming application depends on the bandwidth of the access points. For low bit rate video streaming, optimal rate control is very important for achieving better video quality.
Current rate control schemes in video coding standards, such as TM5 in MPEG-2, VM8 in MPEG-4 and TMN8 in H.263, are designed for real-time encoding applications. For encoding video stored in a server, more computational power is usually available at the encoder. Therefore, it is possible to design a scheme that requires more processing and a larger buffer but achieves better video quality. In this paper, we propose a rate control scheme based on optimal frame-level bit allocation for offline video encoding at low bit rates. Specifically, at the encoder, we code each video sequence twice. In the first pass, we generate the characteristic information of the video sequence, such as its R-D functions. In the second pass, based on the available channel bandwidth and the characteristic information extracted in the first pass, we perform optimal frame-level bit allocation so that the video sequence can be coded at a low bit rate with improved quality. Note that although the proposed video encoding scheme is targeted at low bit rate streaming, the same strategy can be applied to any storage or offline video coding application.
2. PROBLEM STATEMENT
As mentioned in Section 1, current rate control schemes in video coding standards are designed for real-time encoding applications. These rate control schemes usually consist of two major steps. In the first step, the target bit rate for each video frame is obtained by a frame-level bit allocation scheme. In the second step, the quantization parameter (QP), denoted as $Q$, is determined based on a rate model $R(Q)$. The actual quantization parameter for each macroblock (MB) can be adjusted according to the buffer status and the spatial activity of the MBs. Since, for real-time applications, information about future video frames is not available, the frame-level bit allocation is usually very simple. For example, in TMN8 [1], the target number of bits for each frame is determined by
$$B = \frac{R_s}{F} - \Delta \tag{1}$$
where $B$ is the target number of bits for the frame, $R_s$ denotes the available bit rate in bits per second (bps), $F$ denotes the target frame rate in frames per second (fps), and $\Delta$ is a small feedback value. Since the actual number of bits of a frame usually differs from the target number of bits, $\Delta$ is needed to reflect this difference and keep the actual coding rate as close as possible to the available bandwidth. From Eqn. (1), we can see that the frame-level bit allocation of TMN8 basically assigns an equal number of bits to each frame. Such a bit allocation scheme cannot achieve optimal performance because it does not match the nonstationary characteristics of video signals. For offline encoding or video storage, however, it is possible to optimally allocate bits among video frames by a multi-pass encoding strategy [2].
To achieve an optimal bit allocation, it is desirable to generate the R-D functions of all the video frames within an entire video sequence. For standard video coding, rate and distortion largely depend on the quantization parameter $Q$. Therefore, R-D functions can be expressed as $R(Q)$ and $D(Q)$ functions. If the R-D function of each frame were independent, we could easily generate the R-D functions by coding the video sequence multiple times. However, for standard video coding, the R-D function of each frame is not independent because of motion compensation. This makes the generation of the R-D functions for all the video frames nearly impossible.
Recently, two video coding schemes for storage applications [3, 4] have been developed. The basic idea of these schemes is to make use of information from future frames. In [4], the authors used original frames for motion estimation (ME) and motion compensation (MC). This way, the R-D function of one frame can be considered independent of the other frames. However, directly applying this scheme to low bit rate coding may cause significant quality degradation. This is because, in low bit rate coding, there are significant errors between reconstructed frames and original frames, and these errors accumulate and propagate along the motion compensation path if original frames are used for MC. In [3], the authors propose a sliding-window frame-level bit allocation scheme for low bit rate streaming applications. Specifically, bits are allocated among the frames within a sliding window, and the number of bits allocated to each frame is proportional to the weight of that frame's standard deviation among the frames in the window. The experimental results demonstrate an improved performance. However, since two frames with equal variance may have quite different rate-distortion performance, bit allocation based only on the frame variances is inadequate. This is particularly evident when a sliding window contains different types of video scenes.
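The two baseline allocations discussed in this section can be sketched in a few lines. This is only an illustrative sketch: the function names and argument names are hypothetical, the first function follows the form of Eqn. (1), and the second is a simplified reading of the standard-deviation weighting described for [3], not the implementation used in that work.

```python
def tmn8_frame_target(bit_rate_bps, frame_rate_fps, feedback_bits=0.0):
    """Per-frame target in the form of Eqn. (1): equal share minus a feedback term."""
    return bit_rate_bps / frame_rate_fps - feedback_bits


def std_weighted_targets(window_budget_bits, frame_stds):
    """Split a window budget in proportion to each frame's standard deviation.

    Assumption: the weight of a frame is its standard deviation divided by
    the sum of the standard deviations in the window.
    """
    total = sum(frame_stds)
    if total == 0:
        # Degenerate window: fall back to an equal split.
        return [window_budget_bits / len(frame_stds)] * len(frame_stds)
    return [window_budget_bits * s / total for s in frame_stds]
```

Both baselines use at most one statistic per frame, which is why two frames with the same variance always receive the same number of bits regardless of their actual R-D behavior.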
3. OPTIMAL BIT ALLOCATION ANALYSIS
Recently, a novel R-D model has been developed [5] for DCT-based video coding. In this model, the source coding rate and the distortion of a frame are considered as functions of $\rho$, the percentage of zeros among the quantized DCT coefficients of the frame. Specifically, for frame $i$ the rate model can be written as
$$R_i = \theta_i M (1 - \rho_i) + H_i \tag{2}$$
where $\theta_i$ is a constant, $M$ is the number of pixels in a frame, and $H_i$ refers to the number of bits for the header information and the motion vectors of frame $i$. The distortion model can be written as
$$D_i = \sigma_i^2 \, e^{-\alpha_i (1 - \rho_i)} \tag{3}$$
where $\sigma_i$ is the standard deviation of frame $i$, and $\alpha_i$ is a constant. The rate and distortion models have been shown to offer accurate rate control for various DCT-based video coding applications. We adopt these R-D models for the frame-level bit allocation. Suppose we are given $N$ frames. Based on Eqns. (2) and (3), the optimum frame-level bit allocation can be formulated as
$$\min_{\{\rho_i\}} \; \sum_{i=1}^{N} \sigma_i^2 \, e^{-\alpha_i (1 - \rho_i)} \tag{4}$$
$$\text{subject to} \quad \sum_{i=1}^{N} \left[ \theta_i M (1 - \rho_i) + H_i \right] = R_T \tag{5}$$
where $R_T$ is the total number of bits available for the $N$ frames. With a Lagrange multiplier $\lambda$, we can convert this constrained minimization problem into an unconstrained problem. Setting the derivative with respect to each $\rho_i$ to zero gives $\alpha_i \sigma_i^2 e^{-\alpha_i (1 - \rho_i)} = \lambda \theta_i M$, and eliminating $\lambda$ through the constraint (5) yields the optimal number of bits for frame $i$:
$$R_i = H_i + \frac{\xi_i}{\sum_{j=1}^{N} \xi_j} \left( R_T - \sum_{j=1}^{N} H_j \right) + M \xi_i \left[ \ln \frac{\sigma_i^2}{\xi_i} - \frac{\sum_{j=1}^{N} \xi_j \ln \left( \sigma_j^2 / \xi_j \right)}{\sum_{j=1}^{N} \xi_j} \right] \tag{6}$$
where $\xi_i = \theta_i / \alpha_i$.
Notice that these optimal bit allocation results are obtained under the total bandwidth constraint. For practical applications, there will also be buffer constraints. In a typical video decoder, a decoding buffer of limited size is used to smooth out the difference between the channel transmission rate, that is, the input data rate of the buffer, and the decoding rate, that is, the output data rate of the buffer. However, if the accumulated difference exceeds the capacity of the buffer, buffer underflow or overflow will occur. Buffer overflow can be resolved by increasing the buffer size or increasing the output data rate. Buffer underflow can be resolved by increasing the pre-loading time or reducing the output data rate. The buffer constraints have been studied in many research works [3, 4]. In this research, we assume sufficient buffer size and pre-loading time. Therefore, only the total bandwidth constraint is considered.
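The closed-form allocation of this section can be checked numerically. The sketch below assumes the rate model $R_i = \theta_i M (1 - \rho_i) + H_i$ and the distortion model $D_i = \sigma_i^2 e^{-\alpha_i (1 - \rho_i)}$, and evaluates the allocation that follows from the stationarity condition $\alpha_i \sigma_i^2 e^{-\alpha_i (1 - \rho_i)} = \lambda \theta_i M$ together with the total-bits constraint. The function name, argument names and sample values are illustrative assumptions, not taken from the experiments.

```python
import math


def optimal_frame_bits(theta, alpha, sigma, header_bits, num_pixels, total_bits):
    """Closed-form frame-level allocation under the total-bits constraint.

    theta, alpha, sigma and header_bits are per-frame lists; num_pixels is M
    and total_bits is R_T. The result is not clipped, so a frame can receive
    a negative value when the budget is too small for the variance spread.
    """
    xi = [t / a for t, a in zip(theta, alpha)]              # xi_i = theta_i / alpha_i
    xi_sum = sum(xi)
    logs = [math.log(s * s / x) for s, x in zip(sigma, xi)]  # ln(sigma_i^2 / xi_i)
    mean_log = sum(x * l for x, l in zip(xi, logs)) / xi_sum
    texture_budget = total_bits - sum(header_bits)           # bits left after headers
    return [
        h + x / xi_sum * texture_budget + num_pixels * x * (l - mean_log)
        for h, x, l in zip(header_bits, xi, logs)
    ]


# Two QCIF frames (176 x 144 = 25344 pixels) that differ only in variance.
bits = optimal_frame_bits([6.0, 6.0], [3.0, 3.0], [10.0, 11.0],
                          [100.0, 100.0], 25344, 20000.0)
```

The allocations always sum to the budget, and with identical model constants the frame with the larger variance receives more bits.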
4. RATE CONTROL ALGORITHM
Based on the optimal frame-level bit allocation, we propose a rate control algorithm for offline video encoding. As shown in Eqn. (6), in order to optimally allocate bits among frames, we need to collect the characteristics of each frame, including $\theta_i$, $\sigma_i$, $\alpha_i$ and $H_i$. Since the entire video sequence is available in advance in the case of offline video coding, we can pre-encode the video sequence once by using a fixed quantization parameter $Q$. For frame $i$ in a video sequence, after this pre-encoding, we are able to obtain $R_i$, the number of bits for the frame, $\rho_i$, the percentage of zeros among the quantized DCT coefficients, $D_i$, the average distortion, $\sigma_i^2$, the average variance, and $H_i$, the number of bits for the header information and the motion vectors. According to Eqns. (2) and (3), we can compute $\theta_i$ and $\alpha_i$. With this pre-generated characteristic information, $\theta_i$, $\sigma_i$, $\alpha_i$ and $H_i$, we propose a rate control algorithm as follows:
Step 0: Initialize $i = 1$ and $\Delta = 0$, where $\Delta$ is a feedback value which reflects the accumulated difference between the target number of bits and the actual number of bits used in encoding the previous frames.
Step 1: Compute the target number of bits for frame $i$ as
$$R_i = H_i + \frac{\xi_i}{\sum_{j=1}^{N} \xi_j} \left( R_T - \Delta - \sum_{j=1}^{N} H_j \right) + M \xi_i \left[ \ln \frac{\sigma_i^2}{\xi_i} - \frac{\sum_{j=1}^{N} \xi_j \ln \left( \sigma_j^2 / \xi_j \right)}{\sum_{j=1}^{N} \xi_j} \right] \tag{7}$$
where $\xi_i = \theta_i / \alpha_i$, $M$ is the number of pixels in a frame, and $R_T$ is the total number of bits available for the $N$ frames.
Step 2: Use the macroblock-layer rate control algorithm proposed in [5] to distribute $R_i$ to the macroblocks in the $i$th frame. Record the actual number of bits for the $i$th frame as $R_i'$.
Step 3: After encoding the $i$th frame, update $\Delta$ by $\Delta = \Delta + R_i' - R_i$. If there are no more frames, the encoding is finished. Otherwise, set $i = i + 1$ and go to Step 1.
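The steps above can be sketched as a short second-pass loop. This is a minimal sketch under stated assumptions, not the encoder used in the experiments: the first-pass statistics are passed as dictionaries with hypothetical keys, `encode_frame` is a hypothetical callback that stands in for the macroblock-layer rate control of [5] and returns the number of bits actually spent, and the feedback value is assumed to reduce the total budget and to accumulate the overspend of each frame.

```python
import math


def estimate_params(bits, rho, distortion, variance, header_bits, num_pixels):
    """Invert the rate and distortion models for one pre-encoded frame."""
    nonzero_fraction = 1.0 - rho
    theta = (bits - header_bits) / (num_pixels * nonzero_fraction)
    alpha = math.log(variance / distortion) / nonzero_fraction
    return theta, alpha


def target_bits(i, xi, variance, header_bits, num_pixels, total_bits, delta):
    """Step 1: closed-form target for frame i with the budget reduced by delta."""
    xi_sum = sum(xi)
    logs = [math.log(v / x) for v, x in zip(variance, xi)]
    mean_log = sum(x * l for x, l in zip(xi, logs)) / xi_sum
    budget = total_bits - delta - sum(header_bits)
    return (header_bits[i] + xi[i] / xi_sum * budget
            + num_pixels * xi[i] * (logs[i] - mean_log))


def second_pass(stats, num_pixels, total_bits, encode_frame):
    """Steps 0 to 3: allocate, encode, and feed the bit mismatch back."""
    xi, variance, header_bits = [], [], []
    for s in stats:
        theta, alpha = estimate_params(s["bits"], s["rho"], s["distortion"],
                                       s["variance"], s["header_bits"], num_pixels)
        xi.append(theta / alpha)
        variance.append(s["variance"])
        header_bits.append(s["header_bits"])

    delta = 0.0                      # Step 0
    actual = []
    for i in range(len(stats)):
        target = target_bits(i, xi, variance, header_bits,
                             num_pixels, total_bits, delta)   # Step 1
        used = encode_frame(i, target)                        # Step 2 (hypothetical callback)
        actual.append(used)
        delta += used - target                                # Step 3
    return actual
```

With an idealized encoder that spends exactly its target, for example `second_pass(stats, M, R_T, lambda i, target: target)`, the feedback stays at zero and the per-frame bits sum to the total budget.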
5. EXPERIMENTAL RESULTS
In this section, we perform experiments based on the standard H.263 codec to illustrate the effectiveness of the proposed algorithm. Similar results can be obtained by using other standard video codecs. The experiments are performed on two QCIF format video sequences. The first video sequence is 300 frames of “Foreman”, which contains large facial movements and camera panning at the end. The second one is composed of three video scenes: 100 frames of “Foreman” (fast motion), 100 frames of “Mother & Daughter” (slow motion), and 100 frames of “Coastguard” (fast motion).
We compare our proposed rate control scheme with the rate control scheme presented in [5], termed the histogram scheme (HIST). As shown in [5], HIST achieves better rate control performance than TMN8 in H.263. HIST is an MB-level rate control scheme, and the frame-level bit allocation method adopted in HIST is the same as that in TMN8. Since we also adopt HIST for the MB-level rate control in our proposed scheme, the comparison between our proposed rate control scheme and HIST can be considered a comparison between the proposed optimal frame-level bit allocation method and the frame-level bit allocation method in TMN8. Tables 1 and 2 show the performance of encoding the test video sequences at a frame rate of 10 fps under different bit rates. The performance parameters include “Average Distortion”, “Distortion STD”, “Buffer Size” and “Pre-loading Time”. “Average Distortion” denotes the average MSE of a frame. “Distortion STD” denotes the standard deviation of the distortion over all frames. “Buffer Size” and “Pre-loading Time” denote the buffer size and the pre-loading time required to guarantee no buffer underflow or overflow under a constant channel transmission rate. Although we did not consider buffer constraints in this work, we list these buffer parameters for a fair comparison.
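As a rough illustration of how such performance parameters can be derived from per-frame results, the sketch below computes the mean and standard deviation of the per-frame distortion, and the smallest pre-loading time and buffer size for a constant-rate channel. The decoder model is an assumption made for this sketch (frame k is removed from the buffer at once at time pre-load + k/frame rate, and the population standard deviation is used); the function names are hypothetical and the measurement procedure behind Tables 1 and 2 may differ.

```python
import statistics


def distortion_stats(mse_per_frame):
    """Mean and population standard deviation of the per-frame distortion."""
    return statistics.mean(mse_per_frame), statistics.pstdev(mse_per_frame)


def buffer_requirements(frame_bits, channel_bps, frame_rate_fps):
    """Return (pre-loading time in seconds, required buffer size in bits)."""
    period = 1.0 / frame_rate_fps

    # Smallest start-up delay such that every frame has fully arrived
    # by the time it is due to be decoded (no underflow).
    preload = 0.0
    cumulative = 0.0
    for k, bits in enumerate(frame_bits):
        cumulative += bits
        preload = max(preload, cumulative / channel_bps - k * period)
    total = cumulative

    # Peak occupancy, sampled just before each frame is removed; the buffer
    # only fills between removals, so these are the only candidate maxima.
    peak = 0.0
    consumed = 0.0
    for k, bits in enumerate(frame_bits):
        received = min(total, channel_bps * (preload + k * period))
        peak = max(peak, received - consumed)
        consumed += bits
    return preload, peak
```

Under this model, a sequence whose frame sizes vary more around the channel average needs a longer pre-loading time and a larger buffer, which matches the trend reported for the proposed scheme.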
Table 1. The performance of encoding Video Sequence 1.
Channel Rate   RC Scheme   Average Distortion   Distortion STD   Buffer Size   Pre-loading Time
(kbps)                     (dB)                 (dB)             (kbits)       (second)
32             HIST        20.91                22.27            9.56          0.30
32             Proposed    19.66                11.05            30.98         0.82
64             HIST        16.75                12.48            9.56          0.15
64             Proposed    16.49                8.54             50.07         0.53
128            HIST        13.41                9.04             12.99         0.10
128            Proposed    13.20                5.58             64.90         0.16
Table 2. The performance of encoding Video Sequence 2.
Channel Rate   RC Scheme   Average Distortion   Distortion STD   Buffer Size   Pre-loading Time
(kbps)                     (dB)                 (dB)             (kbits)       (second)
32             HIST        19.61                17.90            9.56          0.30
32             Proposed    19.07                14.31            62.23         0.91
64             HIST        16.69                15.12            9.56          0.15
64             Proposed    16.14                11.52            130.04        0.59
128            HIST        13.78                12.66            12.99         0.10
128            Proposed    12.95                8.57             237.34        0.35
As shown in Tables 1 and 2, the proposed rate control scheme outperforms HIST at all channel rates for both test video sequences. The reduction in average distortion ranges from 0.21 dB to 1.25 dB, and the reduction in the standard deviation of the distortion is even larger, at least 3 dB. This indicates that the proposed rate control scheme achieves a much smoother visual quality. In the case of encoding Video Sequence 1 at 32 kbps, the reduction in Distortion STD is as large as 11.22 dB. Such a significant improvement is due not only to the optimal bit allocation but also to a frame dropping control adopted in our proposed rate control scheme, which significantly reduces the number of skipped frames. Because of the page limitation, the frame dropping control is not described in this paper. Also shown in these tables are the buffer requirements of the proposed video coding scheme. Although the proposed scheme requires a larger buffer and a longer pre-loading time, the required buffer size is at most a few hundred kilobits and the required pre-loading time is under one second in all the tested cases. We believe this is acceptable for video streaming applications. Fig. 1 shows the bit allocation results of encoding Video Sequence 2 at 128 kbps. It is clear that more bits are allocated to the high activity scenes while fewer bits are allocated to the low activity scene. Fig. 2 shows the PSNR comparison between the HIST rate control scheme and the proposed rate control scheme for encoding Video Sequence 2 at 128 kbps. These experimental results demonstrate that the proposed rate control scheme is able to efficiently allocate bits among frames to achieve a better overall visual quality.
Fig. 1. The bit allocation results of encoding Video Sequence 2 at 128 kbps. (Plot of bits per frame, in units of 10^4, versus frame number for the Histogram Method and the Proposed Method.)
Fig. 2. The PSNR comparison between the HIST rate control scheme and the proposed rate control scheme for encoding Video Sequence 2 at 128 kbps. (Plot of PSNR in dB versus frame number for the Histogram Method and the Proposed Method.)
6. CONCLUSION
In this work, we have proposed a rate control scheme based on optimal frame-level bit allocation for low bit rate offline video coding. Specifically, we pre-generate the characteristic information of all the frames within a video sequence. Based on this characteristic information, bits are optimally allocated among video frames. Experimental results show that the proposed rate control scheme achieves a 0.21 dB to 1.25 dB reduction in average distortion and, more notably, at least a 3 dB reduction in the standard deviation of the distortion. We demonstrate that the proposed rate control scheme based on optimal frame-level bit allocation achieves not only a noticeable reduction in average distortion but also a more consistent visual quality.
7. REFERENCES
[1] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low-delay video communications,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 172–185, Feb. 1999.
[2] W. Ding and B. Liu, “Rate control of MPEG video coding and recording by rate-quantization modeling,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 12–20, Feb. 1996.
[3] I.-M. Pao and M.-T. Sun, “Encoding stored video for streaming applications,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 199–209, Feb. 2001.
[4] Y. Yue, J. Zhou, Y. Wang, and C. W. Chen, “A novel two-pass VBR coding algorithm for fixed size storage applications,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 345–356, March 2001.
[5] Z. He, Y. Kim, and S. K. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 928–940, Aug. 2001.