Single-Pass Rate-Smoothed Video Encoding with Quality Constraint

Jianhua Wu (a), Jianfei Cai (a,∗), Chang Wen Chen (b)
(a) School of Computer Engineering, Nanyang Technological University, Singapore 639798
(b) Dept. of Electrical and Computer Engineering, Florida Institute of Technology, FL 32901
Abstract: In this paper, we study the rate smoothing problem in single-pass video encoding, i.e., given a certain video quality constraint, how to smooth out the traffic rate so that the complexity of video delivery can be reduced. We apply the low-pass filtering idea, originally proposed for single-pass quality-smoothed video encoding, to the problem of rate-smoothed video encoding. In particular, we use the arithmetic averaging filter to smooth out the rate during single-pass video encoding. Both theoretical analysis and experimental results show that our proposed scheme not only smooths out the bit rate but also automatically achieves the targeted average video quality.
This research is partially supported by Singapore A*STAR SERC Grant (032 101 0006).
∗ Contact author. Phone: (65) 6790-6150. Fax: (65) 6792-6559. Email: [email protected].

I. INTRODUCTION

With the advance of video coding and communication technologies, video streaming applications have become more and more popular in consumer electronics devices. Generally, there are two types of video streaming applications: real-time video streaming and non-real-time video streaming [1]. In real-time video streaming applications, video sequences are encoded on the spot and streamed straightaway. From the coding point of view, real-time video streaming essentially requires single-pass video encoding.

For video streaming applications, it is highly desirable that video signals be encoded with not only good average quality but also smooth quality, i.e., small quality fluctuations among adjacent frames. This is known as variable bit rate (VBR) video encoding. Given a total bandwidth constraint, two-pass encoding schemes are typically employed to achieve smooth video quality. However, for real-time video streaming applications, it is impossible to pre-encode a video sequence, and thus global statistical information for the entire video sequence is not available.

Recently, we have seen some interesting research work [2], [3] on quality-smoothed single-pass encoding
for real-time video streaming applications. In [2], a novel low-pass filtering scheme was proposed, and the authors theoretically proved that using geometric averaging filtering to smooth video distortion automatically satisfies the average bandwidth constraint. In [3], a single-pass constant-distortion bit allocation scheme was proposed, which is based on the ρ-domain rate control algorithm [4] and takes the average distortion of all previously coded frames as the target distortion for encoding the current frame. The experimental results show that the scheme proposed in [3] can achieve smooth video quality at the cost of high computational complexity.

On the other hand, with increased network bandwidth and improved quality-of-service (QoS) support, users can specify the video streaming quality they want. For this type of application, the constraint is not the network bandwidth but the user-specified quality. Without the bandwidth constraint, single-pass VBR video encoding becomes very easy: we can simply encode each video frame with a constant distortion. However, such VBR video encoding is notorious for its high peak rate and large bit-rate variation, which can substantially increase the bandwidth required for continuous playback at the client site. Thus, in order to reduce the end-to-end network resource requirements, a smoothed traffic bit rate is desired [5].

In this paper, we investigate the problem of smoothing the rate during single-pass video encoding given a certain quality constraint. In particular, we apply the low-pass filtering idea in [2] to this problem. We use the arithmetic averaging filter to smooth out the rate during single-pass video encoding. Both theoretical analysis and experimental results show that our proposed scheme not only smooths out the bit rate but also automatically achieves the targeted average video quality.

The rest of the paper is organized as follows. Section II describes our proposed single-pass rate-smoothed video encoding scheme. Section III presents the experimental results. Finally, concluding remarks are given in Section IV.
II. PROPOSED RATE SMOOTHED VIDEO ENCODING
A. Theoretical Analysis

Assume the target average PSNR of the entire video sequence is $U_T$. The main task is to determine the number of bits $R(n)$ to encode the current $n$-th frame. To smooth out the traffic bit rate, $R(n)$ is obtained through arithmetic averaging filtering as

\[ R(n) = \frac{1}{M} \sum_{i=1}^{M} R_T(n-i), \tag{1} \]

where $M$ is the filter length and $R_T(i)$ is the corresponding bit rate when the $i$-th frame is coded with the PSNR value of $U_T$. Similar to [2], we can prove that the target average PSNR of the entire video sequence can be automatically achieved by using this arithmetic averaging filter. In particular, the average PSNR of the entire video sequence is calculated as

\[ \bar{U}[N] = \frac{1}{N} \sum_{n=1}^{N} U(n), \tag{2} \]

where $N$ is the total number of encoded video frames and $U(n)$ is the PSNR value of the $n$-th frame. Assume a Gaussian source with the rate-distortion model

\[ R(D) = \frac{1}{2} \log_2 \frac{\sigma^2}{D}, \tag{3} \]

where $\sigma^2$ is the picture variance. Let $\sigma^2(n)$ be the variance of the $n$-th frame, where $1 \le n \le N$, and let $D_T$ be the mean-squared error (MSE) corresponding to the PSNR value of $U_T$. To encode the video with the PSNR value of $U_T$, the number of bits required can be derived as

\[ R_T(n) = \frac{1}{2} \log_2 \frac{\sigma^2(n)}{D_T}. \tag{4} \]

According to (1) and (4), $R(n)$ can be expressed as

\[ R(n) = \frac{1}{2} \log_2 \frac{\bigl[ \prod_{i=1}^{M} \sigma^2(n-i) \bigr]^{1/M}}{D_T}. \tag{5} \]

Combining (5) and (3), the MSE value of the $n$-th frame becomes

\[ D(n) = \sigma^2(n) \, 2^{-2R(n)} = \frac{\sigma^2(n) \, D_T}{\bigl[ \prod_{i=1}^{M} \sigma^2(n-i) \bigr]^{1/M}}. \tag{6} \]

From (6), the corresponding PSNR value $U(n)$ can be calculated as

\[ U(n) = 10 \log_{10} \frac{255^2}{D(n)} = U_T - 10 \Bigl[ \log_{10} \sigma^2(n) - \frac{1}{M} \sum_{i=1}^{M} \log_{10} \sigma^2(n-i) \Bigr]. \tag{7} \]

Thus, we have

\[ \bar{U}[N] = U_T - \frac{10}{N} \sum_{n=1}^{N} \Bigl[ \log_{10} \sigma^2(n) - \frac{1}{M} \sum_{i=1}^{M} \log_{10} \sigma^2(n-i) \Bigr]. \tag{8} \]

Since $\log_{10} \sigma^2(n)$ is bounded, we confirm that

\[ \lim_{N \to +\infty} \bar{U}[N] = U_T. \tag{9} \]
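The limit in (9) can be checked numerically. The sum in (8) telescopes, so only on the order of M bounded boundary terms survive, and dividing by N drives the deviation from the target to zero. The sketch below is only an illustration under stated assumptions: the frame variances are synthetic random values (not data from the experiments), the Gaussian model (3) is taken as exact, and the helper name `simulate_average_psnr` is hypothetical. For the first frames it averages over however many earlier frames exist, as the algorithm in Section II-B does.

```python
import math
import random


def simulate_average_psnr(variances, target_psnr, filter_len):
    """Average PSNR when each frame's rate is the arithmetic mean of the
    previous `filter_len` constant-quality rates, under the model of Eq. (3)."""
    d_target = 255.0 ** 2 / 10.0 ** (target_psnr / 10.0)  # MSE matching U_T
    # Eq. (4): bits that would give exactly the target PSNR for each frame.
    rate_t = [0.5 * math.log2(v / d_target) for v in variances]
    psnr = []
    for n, var in enumerate(variances):
        if n == 0:
            rate = rate_t[0]  # no history yet: code the first frame at U_T
        else:
            window = rate_t[max(0, n - filter_len):n]
            rate = sum(window) / len(window)  # Eq. (1), shortened at the start
        dist = var * 2.0 ** (-2.0 * rate)  # Eq. (6)
        psnr.append(10.0 * math.log10(255.0 ** 2 / dist))  # Eq. (7)
    return sum(psnr) / len(psnr)  # Eq. (2)


# Synthetic, bounded frame variances (an assumption made for illustration).
random.seed(0)
variances = [random.uniform(100.0, 2000.0) for _ in range(50000)]
avg = simulate_average_psnr(variances, target_psnr=38.0, filter_len=30)
print(f"average PSNR over {len(variances)} frames: {avg:.3f} dB")
```

With bounded variances the printed average stays within a small fraction of a dB of the 38 dB target, and the gap shrinks further as the number of frames grows.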
B. Proposed Algorithm

In this paper, we adopt the ρ-domain R-D model [4] for rate control, where the source coding rate $R_i$ and the distortion $D_i$ of frame $i$ are considered as functions of $\rho_i$, the percentage of zeros among the quantized DCT coefficients of frame $i$. Specifically, the rate model can be written as

\[ R(\rho) = \theta (1 - \rho) + C_h, \tag{10} \]

where $\theta$ is a constant and $C_h$ refers to the number of bits for header information and motion vectors. In addition, we consider encoding video sequences with a pattern of one I-frame followed by all P-frames, and the first I-frame is coded with a fixed quantization parameter. The detailed rate-smoothing algorithm is described as follows.

Step 1: Initialization. The first P-frame of the video sequence is encoded with the target PSNR $U_T$, and the corresponding bit rate $R_T(1)$ is stored. The rate-smoothing procedure starts from the second P-frame. In order to encode the current frame with PSNR $U_T$, we first estimate the percentage of zeros $\rho_T$ corresponding to $U_T$ using the linear interpolation method in [3] and then compute $R_T$ according to Eq. (10), where the $\theta$ of the previous frame is used initially and is then updated adaptively during encoding, as in Step 3.

Step 2: Determine the target bit rate. The target number of bits $R(n)$ for the current $n$-th frame is calculated according to Eq. (1). Note that if fewer than $M$ P-frames have been encoded, the filter length $M$ is changed to the number of available P-frames.
In addition to the constraint of maintaining an average PSNR value, we can further limit the PSNR of each individual video frame to a certain range $[U_{T\min}, U_{T\max}]$, where $U_{T\min} < U_T < U_{T\max}$. The corresponding numbers of bits $R_{T\min}(n)$ and $R_{T\max}(n)$ can be obtained using the same method as that for estimating $R_T$ in Step 1. After obtaining $R_{T\min}(n)$ and $R_{T\max}(n)$, the target bit rate $R(n)$ is further updated as $R(n) = \min(R_{T\max}(n), \max(R_{T\min}(n), R(n)))$ in order to guarantee that the PSNR of the current frame lies in the desired range.

Step 3: Perform video encoding. Use the ρ-domain rate control algorithm to encode the current frame. If there is a PSNR constraint $[U_{T\min}, U_{T\max}]$ and $N_m \ge 10$, where $N_m$ is the number of coded macroblocks in the current frame, update the values of $R_{T\min}(n)$ and $R_{T\max}(n)$ with the new value of $\theta$. The condition $N_m \ge 10$ is selected in accordance with the ρ-domain rate control algorithm. $R(n)$ then also needs to be updated again as $R(n) = \min(R_{T\max}(n), \max(R_{T\min}(n), R(n)))$.

Step 4: Estimate $R_T(n)$. After encoding the current frame, calculate $R_T(n)$ using the same method as that for estimating $R_T$ in Step 1, but with the actual value of $\theta$, which is known once the current frame has been encoded.

Step 5: Loop. If there is any frame left, go to Step 2; otherwise stop.

Note that the rate-smoothed encoding process needs to be stopped and re-initialized when a scene change occurs or a new I-frame is encoded.

III. EXPERIMENTAL RESULTS

In this section, we conduct experiments to test the performance of our proposed single-pass rate-smoothed video encoding scheme. We implemented our proposed scheme in the UBC H.263+ video codec [6] with unrestricted motion vector mode and advanced prediction mode. The experiments are performed on three QCIF-format video sequences (300 frames and 30 fps): Foreman, News and Coastguard. The first I-frame is coded with a fixed QP of 13 and the filter length $M$ is set to 30.
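The rate-smoothing steps of Section II-B can be sketched as a short loop. This is a simplified illustration, not the authors' implementation: the encoder itself is not modelled, the per-frame values R_T(n), R_Tmin(n) and R_Tmax(n) are assumed to be available as plain lists (in the paper they are estimated with the ρ-domain model and refined during encoding), and the function names `rho_domain_rate` and `smoothed_targets` are hypothetical.

```python
from collections import deque


def rho_domain_rate(theta, rho, header_bits):
    """Eq. (10): bits for a frame whose fraction of zero coefficients is rho."""
    return theta * (1.0 - rho) + header_bits


def smoothed_targets(rate_t, filter_len=30, rate_min=None, rate_max=None):
    """Per-frame bit targets R(n) for a run of P-frames.

    rate_t[n]   -- bits needed to code frame n at the target PSNR, R_T(n)
    rate_min[n] -- optional bits for the lower PSNR bound, R_Tmin(n)
    rate_max[n] -- optional bits for the upper PSNR bound, R_Tmax(n)
    """
    history = deque(maxlen=filter_len)  # the last M stored values of R_T
    targets = []
    for n, r_t in enumerate(rate_t):
        if not history:
            target = r_t  # Step 1: the first P-frame is coded at the target PSNR
        else:
            # Step 2: Eq. (1); the window is shorter while fewer than M frames exist.
            target = sum(history) / len(history)
            if rate_min is not None and rate_max is not None:
                # Constrained variant: keep the frame inside the PSNR range.
                target = min(rate_max[n], max(rate_min[n], target))
        targets.append(target)
        history.append(r_t)  # Step 4: store R_T(n) for later frames
    return targets


# Toy numbers chosen only to make the arithmetic easy to follow.
rate_t = [10, 20, 30, 40]
plain = smoothed_targets(rate_t, filter_len=2)
constrained = smoothed_targets(rate_t, filter_len=2,
                               rate_min=[8, 18, 27, 36],
                               rate_max=[12, 22, 33, 44])
print(plain)
print(constrained)
```

In the toy run the unconstrained targets lag behind the rising R_T values, while the constrained targets are pulled up to the lower bound of each frame, which is the rate-versus-quality tradeoff that the constrained scheme exposes. A real encoder would also re-initialize `history` at a scene change or a new I-frame.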
We compare the performance of three bit allocation schemes: the constant distortion scheme, the proposed rate-smoothing scheme (termed R-Smoothing) and the constrained rate-smoothing scheme (termed Constrained R-Smoothing). For the constant distortion scheme, each frame is encoded with the constant PSNR value $U_T$. The constrained rate-smoothing scheme denotes the proposed rate-smoothing scheme with the additional constraint that the target PSNR value of each frame is limited to the range $[U_{T\min}, U_{T\max}]$. We choose $U_{T\min} = U_T - \delta$ and $U_{T\max} = U_T + \delta$, and set $\delta$, which describes the allowable PSNR fluctuation, to 1 dB.
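Because the constraint is stated in PSNR while distortion is reported as MSE, it can help to convert the allowed range explicitly. The helpers below are a minimal sketch using the 8-bit PSNR definition that appears in Eq. (7); the function names are hypothetical.

```python
import math


def psnr_to_mse(psnr_db, peak=255.0):
    """MSE that corresponds to a given PSNR in dB for 8-bit video."""
    return peak ** 2 / 10.0 ** (psnr_db / 10.0)


def mse_to_psnr(mse, peak=255.0):
    """PSNR in dB that corresponds to a given MSE."""
    return 10.0 * math.log10(peak ** 2 / mse)


target, delta = 38.0, 1.0  # U_T and the allowed fluctuation, in dB
u_min, u_max = target - delta, target + delta
# A higher PSNR bound maps to a lower MSE bound, and vice versa.
mse_low, mse_high = psnr_to_mse(u_max), psnr_to_mse(u_min)
print(f"PSNR in [{u_min}, {u_max}] dB <=> MSE in [{mse_low:.2f}, {mse_high:.2f}]")
```

For a 38 dB target with δ = 1 dB, the per-frame MSE is allowed to range from about 8.2 to about 13.0.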
Table I shows the performance comparison among different bit allocation schemes, where $D$ denotes the average MSE, $PSNR$ denotes the average PSNR and $R$ denotes the average number of bits per frame. The standard deviation of the number of bits per frame, $\sigma_R$, is calculated as

\[ \sigma_R = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \Bigl( R(i) - \frac{1}{n} \sum_{j=1}^{n} R(j) \Bigr)^2 }, \]

and $P_R$ is the maximal number of bits used by one frame. The video quality variation is measured by $\Delta D$, which is calculated as

\[ \Delta D = \frac{1}{n-1} \sum_{i=2}^{n} \bigl| D(i) - D(i-1) \bigr|, \]

where $n$ is the number of frames included in the statistics. To perform a fair comparison, the I-frame and the first 30 P-frames are not included in the calculations of the results.

From Table I, we can see that the average PSNR results of all three schemes match the target values very well. For the Foreman sequence, our proposed rate-smoothing schemes reduce the rate variation $\sigma_R$ only slightly, and the resulting peak rates are more or less the same as that of the constant distortion scheme. This is because the Foreman sequence contains varying amounts of motion, with a large camera panning motion at the end, where the large number of intra-coded MBs dominates the peak rate. For the slow-motion News sequence, compared with the constant distortion scheme, our proposed R-Smoothing scheme significantly reduces the rate variation $\sigma_R$ (by up to 37%) and the peak rate $P_R$ (by up to 61%), at the cost of increased quality fluctuation. On the other hand, although the constrained R-Smoothing scheme cannot reduce the peak rate as much as the R-Smoothing scheme, it reduces the quality fluctuation $\Delta D$ of the R-Smoothing scheme by up to 43%. For the smooth-motion Coastguard sequence, a similar reduction in rate variation can be observed, although it is not as significant as that for slow-motion sequences.

Table II shows the effect of using different filter lengths for our proposed schemes. From the table, we can see that a larger filter length provides a smoother traffic rate.
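To make the statistics of Table I and the filter-length trend of Table II concrete, the sketch below computes σ_R, P_R and ∆D as defined above and applies the averaging filter of Eq. (1) with several lengths. The input is a synthetic random trace standing in for R_T(n), which is an assumption for illustration only; it is not the data behind the tables, and the function names are hypothetical.

```python
import math
import random


def rate_std(bits):
    """sigma_R: population standard deviation of the bits per frame."""
    mean = sum(bits) / len(bits)
    return math.sqrt(sum((b - mean) ** 2 for b in bits) / len(bits))


def peak_rate(bits):
    """P_R: the largest number of bits used by any single frame."""
    return max(bits)


def quality_fluctuation(mse):
    """Delta D: mean absolute MSE change between consecutive frames."""
    steps = [abs(mse[i] - mse[i - 1]) for i in range(1, len(mse))]
    return sum(steps) / len(steps)


def moving_average(values, filter_len):
    """Eq. (1) for frames 1..N-1: mean of up to `filter_len` earlier values."""
    out = []
    for n in range(1, len(values)):
        window = values[max(0, n - filter_len):n]
        out.append(sum(window) / len(window))
    return out


# Synthetic stand-in for R_T(n) in kbits (assumed, not measured data).
random.seed(1)
trace = [20.0 + random.gauss(0.0, 5.0) for _ in range(300)]
print(f"unfiltered: sigma_R={rate_std(trace):.2f}  P_R={peak_rate(trace):.2f}")
for m in (10, 20, 30):
    smoothed = moving_average(trace, m)[30:]  # skip the start-up frames
    print(f"M={m:2d}: sigma_R={rate_std(smoothed):.2f}  P_R={peak_rate(smoothed):.2f}")
```

For an uncorrelated trace like this one, averaging over M values shrinks the standard deviation roughly in proportion to 1/sqrt(M), so each additional increase in M buys less smoothing than the previous one.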
However, the performance gain achieved by further increasing the filter length is not significant when the filter length is sufficiently large. To illustrate the tradeoff between quality fluctuation and rate fluctuation more clearly, we depict the rate-distortion results of individual video frames for the QCIF Coastguard sequence under the three bit allocation schemes with a target average PSNR of 38 dB in Fig. 1.

TABLE I
THE PERFORMANCE COMPARISON OF DIFFERENT BIT ALLOCATION SCHEMES.

Video Sequence  Target PSNR (dB)  Bit Allocation Scheme     PSNR (dB)  D      ∆D    R (kbits)  σR (kbits)  PR (kbits)
Foreman         38                Constant Distortion       37.90      10.55  0.05  13.58      3.79        22.33
                                  R-Smoothing               37.81      11.08  0.82  13.41      3.13        22.41
                                  Constrained R-Smoothing   37.83      10.87  0.57  13.32      3.00        21.21
News            35                Constant Distortion       35.01      20.54  0.05  3.51       1.71        13.85
                                  R-Smoothing               35.10      20.84  1.76  2.91       1.08        5.37
                                  Constrained R-Smoothing   35.01      20.75  1.00  3.14       1.16        9.56
Coastguard      38                Constant Distortion       37.89      10.58  0.07  21.39      5.19        40.57
                                  R-Smoothing               37.86      10.84  0.99  21.67      4.09        29.75
                                  Constrained R-Smoothing   37.85      10.78  0.77  21.57      4.26        35.94

TABLE II
THE R-D PERFORMANCE UNDER DIFFERENT FILTER LENGTHS.

Bit Allocation Scheme     Filter Length (M)  PSNR (dB)  D      ∆D    R (kbits)  σR (kbits)  PR (kbits)
R-Smoothing               10                 37.82      10.95  1.07  21.51      4.71        34.86
                          20                 37.84      10.88  1.00  21.54      4.28        30.92
                          30                 37.86      10.84  0.99  21.67      4.09        29.75
Constrained R-Smoothing   10                 37.80      10.90  0.82  21.38      4.61        36.29
                          20                 37.82      10.86  0.83  21.45      4.41        35.97
                          30                 37.85      10.78  0.77  21.57      4.26        35.94

IV. CONCLUSION

In this paper, we have proposed the R-Smoothing scheme and the constrained R-Smoothing scheme for single-pass VBR video encoding. For slow-motion and smooth-motion sequences, the R-Smoothing scheme can significantly reduce the peak rate and the rate variation while still achieving the target average video quality, and the constrained R-Smoothing scheme can trade off between the quality fluctuation and the rate fluctuation. We would like to point out that our proposed rate-smoothed video encoding schemes can only reduce rate fluctuations among video frames of the same type. For large-scale rate fluctuation due to scene changes and different types of video frames, buffer management or bandwidth smoothing techniques [5] need to be used.
REFERENCES
[1] G. J. Conklin, G. S. Greenbaum, K. O. Lillevold, A. F. Lippman, and Y. A. Reznik, “Video coding for streaming media delivery on the Internet,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 269–281, Mar. 2001.
[2] Z. He, W. Zeng, and C. W. Chen, “Low-pass filtering of rate-distortion functions for quality smoothing in real-time video communication,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 973–981, Aug. 2005.
[3] J. Lan, W. Zeng, and X. Zhuang, “Operational distortion-quantization curve-based bit allocation for smooth video quality,” Journal of Visual Communication and Image Representation, pp. 527–543, Aug. 2005.
[4] Z. He and S. Mitra, “Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 840–849, Oct. 2002.
[5] J. Rexford and D. Towsley, “Smoothing variable-bit-rate video in an internetwork,” IEEE/ACM Transactions on Networking, pp. 202–215, Apr. 1999.
[6] G. Cote, B. Erol, M. Gallant, and F. Kossentini, “H.263+: Video coding at low bit rates,” IEEE Trans. on Circuits and Systems for Video Technology, pp. 849–866, Nov. 1998.
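[Figure 1: two panels plotted against frame number (0 to 300) for the Constant Distortion, R-Smoothing and Constrained R-Smoothing schemes; one panel shows the bits per frame (Kbits) and the other the PSNR (dB).]
Fig. 1. The R-D performance for encoding QCIF Coastguard under different bit allocation schemes with a target average PSNR of 38 dB.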