Jun 19, 2003 - Yoon Kim, Jae-Young Pyun, Hye-Soo Kim, Sang-Hee Park, and Sung-Jea Ko, Senior Member, IEEE. Abstract â A real-time frame-layer rate ...
Y. Kim et al.: Efficient Real-Time Frame Layer Rate Control Technique for Low Bit Rate Video over WLAN
621
Efficient Real-Time Frame Layer Rate Control Technique for Low Bit Rate Video over WLAN Yoon Kim, Jae-Young Pyun, Hye-Soo Kim, Sang-Hee Park, and Sung-Jea Ko, Senior Member, IEEE Abstract — A real-time frame-layer rate control algorithm is proposed for low bit rate video coding over IEEE 802.11 wireless local area network (WLAN). The proposed rate control method performs bit allocation at the frame level to minimize the average distortion over an entire sequence as well as variations in distortion between frames. A new framelayer rate-distortion model is derived, and a non-iterative optimization method is used for low computational complexity. The proposed algorithm produces low encoding time delay, and is suitable for real-time low-complexity video encoder. Experimental results indicate that the proposed control method provides better visual and PSNR performance than the existing TMN8 rate control method. Index Terms — Frame-layer rate control, R-D model, Lagrangian multiplier. I.
T
INTRODUCTION
he rapid growth of wireless communication and access, together with the success of the Internet, has brought a new era of mobile/wireless multimedia applications and services. The convergence of Internet, wireless, and multimedia has created a new paradigm of research and development that enables multimedia content to move seamlessly between the Internet and mobile wireless networks. With the benefit of the increase in bandwidth in wireless networks, new access capabilities including mobile visual phones and video streaming with various video coding standards are becoming popular. In block-based video coders such as MPEG and H.26x, the number of bits and distortion for each image block encoding are controlled by the quantization parameter of the block. The objective of the rate control is to select the quantization parameters so that the encoder produces bits at transmission bandwidth and the overall distortion is minimized. Therefore, the rate control not only regulates the output bit-stream to meet certain given conditions, but also enhances the quality of coded video. However, the rate control algorithms are not standardized since they are independent of the decoder structure. Many rate control schemes have been proposed in [1]-[10]. In general, these schemes can be thought of as having a frame or a macroblock layer. The frame-layer rate control assigns a
This work was supported by a grant from Samsung Advanced Institute of Technology. Yoon Kim, Jae-Young Pyun, Hye-Soo Kim, Sang-Hee Park, and Sung-Jea Ko are with the Department of Electronics Engineering, Korea University, Seoul, Korea (e-mail: {yooni, jypyun, hyesoo, jerry, sjko}@dali.korea.ac.kr). Manuscript received June 19, 2003
target number of bits to each video frame and, at a given frame, the block-layer rate control selects the blockquantization parameters to achieve the frame target. Some frame-layer rate control approaches use simple formulas, but these simple methods generally do not achieve the target number of bits accurately [5]. Other approaches use various rate-distortion strategies to assign a target number of bits to each frame [6]-[8]. However, since they usually use either an iteration method for optimal bit allocation or a pre-analysis method on a group of frames before encoding, they produce time delay or high computational complexity. In this paper, we propose a real-time frame-layer control method for H.263+ [11] over IEEE 802.11 WLAN [12]. First, we estimate the target bandwidth for video transmission over WLAN using the received signal strength indication (RSSI) and the channel utilization obtained from medium access control (MAC) layer. Next, in order to achieve accurate frame rate control, a new frame-layer rate-distortion (R-D) model is derived. The proposed R-D model minimizes the average distortion over an entire sequence as well as variations in distortion between frames. Furthermore, we use a non-iterative method with a low computational complexity for the real-time rate control. It is seen that the proposed rate control algorithm produces low encoding time delay. The paper is organized as follows. In the next section, the conventional TMN8 rate control algorithm is briefly introduced. The proposed frame-layer rate control scheme is presented in Section III. Section IV presents and discusses the experimental results. Finally, our conclusions are given in Section V. II. REVIEW ON CONVENTIONAL TMN8 RATE CONTROL In H.263, the current video frame to be encoded is decomposed into macroblocks of 16×16 pixels per block, and the pixel values for each of the four 8×8 blocks in a macroblock are transformed into a set of coefficients using the DCT. These coefficients are then quantized and encoded with some type of variable-length coding. The number of bits and distortion for a given macroblock depend on the macroblock's quantization parameter used for quantizing the transformed coefficients. For example, in a test model TMN8 [13] for the H.263 standard, the quantization parameter is denoted by QP whose value corresponds to half the quantization step size. The TMN8 rate control uses a frame-layer rate control to select a target number of bits for the current frame and a macroblocklayer rate control to select the values of the quantization stepsizes for the macroblocks. In the following discussions, the following definitions are used:
0098 3063/00 $10.00 © 2003 IEEE
622
IEEE Transactions on Consumer Electronics, Vol. 49, No. 3, AUGUST 2003
Ni : number of macroblocks that remain to be encoded in the frame;
B : target number of bits for a frame; R : channel rate in bits per second; F : frame rate in frames per second;
σi
: standard deviation of the ith macroblock;
αi
: distortion weight of the ith macroblock;
W : number of bits in the encoder buffer; C : overhead rate; M : some maximum value indicating buffer fullness, by default, set R/F;
β i : number of bits left for encoding the frame, where β1 = B at the initialization stage.
Wprev : previous number of bits in the buffer; B' : actual number of bits used of encoding the previous frame.
III. PROPOSED FRAME-LAYER RATE CONTROL
In the frame-layer rate control, a target number of bits for the current frame is determined by
B=
R − ∆, F
(1)
W W > Z ⋅M, , ∆=F W − Z ⋅ M , otherwise,
(2)
W = max (W prev + B ' − R F , 0 ) ,
(3)
where Z=0.1 by default. The frame target varies depending on the nature of the video frame, the buffer fullness, and the channel throughput. To achieve low delay, the algorithm tries to maintain the buffer fullness at about 10% of the maximum M. If W is larger than 10% of the maximum M, the frame target B is slightly decreased. Otherwise, B is slightly increased. The macroblock-layer rate control selects the values of the quantization step-sizes for all the macroblocks in the frame, so that the sum of the bits used in all macroblocks is close to the frame target B in (1). The optimized quantization step size
* i
Q
First, we estimate the target bandwidth for video transmission over WLAN. The WLAN card of the video server for our experiments can periodically measure the link status information of the WLAN channel that are the RSSI and the channel utilization. The RSSI, the indication of up to 8 bits (256 levels), is a measure of the RF energy received by the physical layer. We estimate the target bandwidth for the period that is the time interval between two successive measurements of the link status. Next, the target bit budget is optimally allocated to each frame using the frame-layer rate control method. Fig. 1 shows the basic concept, where the bundle of frames during the time interval is referred to as the temporal frame segment. ^ R 1
^ R2
^ R N-1
^ RN
q
-q
q
q
1
2
N-1
N
kth temporal segment WLAN network status obtained from MAC Layer
WLAN network status obtained from MAC Layer Target bandwi dthBW k
S EG
(bits/sec)
Opti mal target bit (bits/frame) by frame layer rate control
Constant target bit (bits/frame) = BW k SEG / frame rate
for macroblock i in a frame can be determined by
Qi* =
σi AK β i − AN i C α i
Fig. 1. Bandwidth estimation using network status information and bit allocation for a frame
N
∑α σ k
k =i
where K : model parameter; A : number of pixels in a macroblock;
k
,
(4)
For the frame-layer rate control, we employ an empirical data-based frame-layer R-D model using the quadratic rate model and the affine distortion model [1] with respect to the average QP in a frame, which is given by
(
)
Rˆ ( qi ) = ( a ⋅ qi −1 + b ⋅ qi −2 ) ⋅ MAD fˆref , f cur ,
(5)
Y. Kim et al.: Efficient Real-Time Frame Layer Rate Control Technique for Low Bit Rate Video over WLAN
Dˆ ( qi ) = a′ ⋅ qi + b′,
N kSEG
∑ Dˆ ( q ) ⋅ ( Dˆ ( q ) − D ) ,
(6)
i
where a, b, a', and b' are the model coefficients,
fˆref is the
qi is the average QP of all macroblocks in the ith
frame, Rˆ ( qi ) and Dˆ ( qi ) are the rate and distortion models of the ith frame, respectively. The model coefficients are determined by using the linear regression analysis and the formula consisting of the previous encoding results as follows:
Ri ⋅ qi −1 − ⋅ b q ∑ i ˆ i =1 MAD f i −1 , f i , a= N N Ri N ⋅∑ i =1 MAD fˆ , f i −1 i b= 2 N N −2 −1 N ⋅ ∑ qi − ∑ qi i =1 i =1 , N N Ri ⋅ qi ⋅ q −1 ∑ i i =1 MAD fˆ , f ∑ i =1 i −1 i − 2 N N N ⋅ ∑ qi−2 − ∑ qi−1 i =1 i =1 N
(
)
(
a′ =
i =1
i
i =1
i
i =1
2
N N q − N ⋅ qi 2 ∑ ∑ i i =1 i =1
b′ =
N
N
i =1
i =1
∑ Di − a′ ⋅ ∑ qi N
,
∑ R ≤ BW
SEG k
i
(11)
i −1
⋅ TkSEG ,
(12)
i =1
where Dˆ i is the estimated distortion of the current frame, Di-1 is the actual distortion of the previous frame, NkSEG is the number of encoding frames in the kth temporal segment, BWkSEG and TkSEG are the bandwidth and the time intervals of kth temporal segment, respectively. In (11), we introduce a formulation minimizing the average distortion over an entire sequences as well as variations in distortion between frames. The optimization task in (11) and (12) can be elegantly solved using Lagrangian optimization where a distortion term is weighted against a rate term. The Lagrangian formulation of the minimization problem is given by
(
(
)
)
(13)
+ λi ⋅ max Bˆires , 0 , (8)
i −1
Bˆires = ∑ R j + Rˆi ( qi ) j =1
MADkj BWkSEG ⋅ TkSEG −∑ ⋅ , N kSEG j =1 Ave _ MADk −1 i
where
(14)
J i ( qi ) and λi is the cost function and the Lagrange
multiplier for the ith frame, Rj is the used bit-rate for the jth
i
,
(9)
(10)
where N is the number frames observed in the past, Di and Ri are the actual distortion and bit rate of the encoded ith frame, respectively. We consider a new formulation of frame-layer rate control based on the R-D model as follows: Determine
qi , i = 1, 2,L , N kSEG to minimize
i
J i ( qi ) = Dˆ i ( qi ) ⋅ Dˆ i ( qi ) − Di −1
N
∑ D ⋅∑ q − N ⋅∑(D ⋅q ) i
N kSEG
)
N
i
subject to
)
(
N
(7)
i
i =1
reconstructed reference frame at the previous time instant, fcur is the uncompressed image at the current time instant, MAD(⋅,⋅) is the mean of absolute difference between two frames,
623
frame,
MADkj is the MAD between (j-1)th and jth frames of
the kth temporal frame segment, Ave_MADk-1 is the average of MADs of the (k-1)th temporal frame segment. Since it exists the correlation of the temporal frame segments, for real-time encoding we use Ave_MADk-1 as a substitute for Ave_MADk. Therefore, the proposed algorithm does not require preanalysis process. Note that
Bˆires denotes the estimated bit
based on the R-D model. It was shown in [14] that
J i ( qi ) is
a convex function generally. Thus, we can get its optimal solution by using the gradient method given by
qˆi* = arg min J i ( qi ) . qi
Note that what we finally need is not
(15)
qˆi* but Rˆ ( qi* ) which is
the target bit budget for the ith frame. The proposed frame-layer rate control algorithm consists of
624
IEEE Transactions on Consumer Electronics, Vol. 49, No. 3, AUGUST 2003
two steps. The first step is to find the optimal bit-rates with the current Lagrange multiplier, and the second step is to adjust the Lagrange multiplier based on residual bit-rates. The properties of the Lagrange multiplier method are very appealing in terms of computation. Finding the best quantizer for a given λ is easy and can be done independently for each coding unit. In order to achieve the optimal solution at the required rate, an optimal λ must be found. Several approaches including the bisection search algorithm [15] are proposed to find a correct λ. However, the number of iterations required in searching for λ can be kept low as long as we do not seek to have an exact match of the budget rate. Moreover, since we may be performing allocations on successive frames having similar characteristics in video coding, it is possible to adjust λ for a frame using the value achieved for the previous frame. Thus, we employ the adaptive adjustment rule [17] given by
λi +1 = λi + ∆λ , ∆λ =
Bi Btarget ,i
− 1,
macroblock layer rate control algorithm allocates the bit budget to each macroblock with the solution Rˆ qi* .
( )
IV. EXPERIMENTAL RESULTS AND DISCUSSIONS Extensive experimental testing and comparison were performed on several sequences with different characteristics: “BREAM”, “CARPHONE”, “MOTHER & DAUGHTER”, “NEWS”, and “TABLE TENNIS”. These sequences are in QCIF format (176×144), and the frame rate F is 30 fps. The channel rate R is set to 64 kbps. Fig. 2 shows the captured server and client program.
(16)
where λi is the Lagrange multiplier for the ith frame and (a) i
Bi = ∑ R j ,
(17)
j =1
i
Btarget ,i = ∑ j =1
MADkj BWkSEG ⋅ TkSEG . (18) ⋅ Ave _ MADk −1 N kSEG
Therefore, the proposed rate control algorithm produces low encoding time delay. However, a negligible performance loss due to its intrinsic sub-optimality is inevitable in our design. Once the bit rate is allocated to the frame using the aforementioned frame-layer rate control, the TMN8
(b) Fig. 2. Snapshot of the video streaming system.
Calculated Measured
50000
Rate(bits)
40000
30000
20000
10000
0 0
5
10
15
20
25
30
Average QP
(a)
(b)
Fig. 3. Frame layer R-D modeling for the QCIF CARPHONE sequence: (a) the rate model (b) the distortion model as a function of the average QP of macroblocks.
Y. Kim et al.: Efficient Real-Time Frame Layer Rate Control Technique for Low Bit Rate Video over WLAN
625
TABLE I PERFORMANCE COMPARISON OF THE PROPOSED ALGORITHM WITH TMN8. Test sequences BREAM
CARPHONE
MOTHER & DAUGHTER
NEWS
TABLE TENNIS
Rate control method
Average PSNR
TMN8
31.633
1.683
Proposed technique
32.136
1.482
TMN8
31.395
2.402
Proposed technique
32.174
2.032
TMN8
36.324
1.058
Proposed technique
36.987
0.873
TMN8
32.867
0.847
Proposed technique
33.362
0.622
TMN8
29.062
1.257
Proposed technique
30.015
0.906
38
34
37
33
36 35
32 31
34
30
33
29 20
40
60
80
TMN8 Proposed
35
TMN8 Proposed
PSNR[dB]
PSNR[dB]
39
0
σ of PSNR
0
100
50
100
150
200
250
300
Frame
Frame
(a)
(b)
Fig. 4. PSNR comparison with a target average rate at 64 kbps: (a) QCIF MOTHER & DAUGHTER. (b) QCIF NEWS.
Fig. 3 (a) and (b) show the rate and distortion models, respectively, for the “CARPHONE” sequence, where the dots are the measured data points while the solid curve is the plot of the R-D model obtained by (5) and (6). As shown in these two figures, the R-D modeling works reasonably well. The R-D modeling method in fact provides a very good approximation for all test sequences in our experiment. Performance was mainly evaluated by visual judgment since there is no standard measure currently available to evaluate subjective quality. And, as an objective measure of the distance between an original image f(x,y) and its reconstructed image g(x,y), peak signal-to-noise ratio (PSNR) is used for m×n images with [0, 255] gray-level range, PSNR can be defined as
2 m × n × 255 . PSNR = 10 log10 m −1 n −1 2 ∑∑ ( f ( x, y ) − g ( x, y ) ) x =0 y =0
(19)
The performance of the proposed frame-layer rate control scheme is compared with that of TMN8. For the performance comparison for five test sequences, we show the average PSNR value and the standard deviation σ of PSNR in Table I. It is clearly seen that the proposed frame rate control algorithm can not only improve the average PSNR value, but also reduce the standard deviation of PSNR when compared with TMN8. The PSNR plots associated with the “MOTHER & DAUGHTER” and “NEWS” sequences as a function of the frame number are shown in Fig. 4 (a) and (b), respectively. It is seen that the proposed frame rate control can reduce the quality degradation better than TMN8. Visual comparisons of the proposed algorithm with the TMN8 are also provided in Figs. 5 and 6. To make a comparison of the subjective quality more clear, zoomed images are also presented. In Figs. 5 and 6, edges are well preserved by the proposed technique. Note that the proposed algorithm outperforms TMN8. Similar results were also observed on the other test sequences. Experimental results indicate that the proposed algorithm is a useful alternative to
626
IEEE Transactions on Consumer Electronics, Vol. 49, No. 3, AUGUST 2003
(a)
(b)
(c)
(d)
Fig. 5. Visual quality performance between the proposed algorithm and TMN8 on TABLE TENNIS. (a) TMN8. (b) Zoomed image of (a). (c) Proposed algorithm. (d) Zoomed image of (c).
TMN8 in terms of both the PSNR and subjective quality.
V. CONCLUSIONS In this paper, we presented new real-time frame-layer rate control for H.263+ over WLAN. We introduced a frame-layer rate control to minimize the average distortion over an entire sequences as well as variations in distortion between frames. Since the proposed technique uses the fast convergence method and does not require the pre-analysis process, it is suitable to real-time low-complexity video coding. The proposed algorithm has been tested on several sequences and found to provide better visual and PSNR performance than the existing TMN8 rate control algorithm.
REFERENCES [1]
T. Chiang and Y. Zhang, “A new rate control scheme using quadratic rate distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 246-250, Feb. 1997.
[2]
J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for lowdelay communications,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 172-185, Feb. 1999. [3] ITU-T SG16/Q15, Video Test Model Number 10 (TMN10), T.Gardos, Ed., Apr. 1998. [4] W. Ding and B. Liu, “Rate control of MPEG video coding and recording by rate-quantization modeling,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 12-19, Feb. 1996. [5] MPEG-2 Test Model 5, Doc. ISO/IEC JTC1/SC29 WG11/93-400, Apr. 1993. [6] K. Ramchandran, A. Ortega, and M. Vetterli, “Bit allocation for dependent quantization with applications to multiresolution and MPEG video coders,” IEEE Trans. Image Processing, vol. 3, pp. 533-545, Sept. 1994. [7] L. J. Lin and A. Ortega, “Bit rate control control using piecewise approximated rate-distortion characteristics,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp. 446-459, Aug. 1998. [8] P. Y. Cheng, J. Li, and C. C. Kuo, “Rate control for an embedded wavelet video coder,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 696-702, Aug. 1997. [9] ITU-T/SG15, Video codec test model, TMN7, Nice, Feb. 1997. [10] J. Ribas-Corbera and S. Lei, “Extension of TMN8 rate control to B frames and enhance layer,” ITU-T SG16/Q15, Doc. Q15-C-9, Dec. 1997.
Y. Kim et al.: Efficient Real-Time Frame Layer Rate Control Technique for Low Bit Rate Video over WLAN
(a)
(c)
627
(b)
(d)
Fig. 6. Visual quality performance between the proposed algorithm and TMN8 on NEWS. (a) TMN8. (b) Zoomed image of (a). (c) Proposed algorithm. (d) Zoomed image of (c).
[11] ITU-T, “Video coding for low bit-rate communication,” ITU-T recommendation H.263 Version2, January 1998. [12] IEEE standard for Wireless LAN-Medium Access Control and Physical Layer Specification, p802.11, Nov. 1997 [13] ITU-T, Video codec test model, near-term, version 8 (TMN8), H.263 AdHoc Group, Portland, June 1997. [14] L. J. Lin, A. Ortega, and C. C. J. Kuo, “Rate control using spline interpolated R-D characteristics,” in Proc. of SPIE Visual Communication Image Processing, pp. 111-122, Mar. 1996. [15] K. Ramchandran and M. Vetterli, “Best wavelet packet bases in a ratedistortion sense,” IEEE Trans. Image Processing, vol. 2, pp. 160-175, Apr. 1993. [16] G. M. Schuster and A. K. Katsaggelos, “An optimal quadtree-based motion estimation and motion based interpolation scheme for video compression,” IEEE Trans. Image Processing, vol. 3, pp. 327-331, May. 1994. [17] T. Wiegand, M. Lightstone, D. Mukherjee, T.G. Campbell, and S.K. Mitra, “Rate-distortion optimized mode for very low bit rate video coding and emerging H.263 standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 182-190, Apr. 1996. [18] D. Wu, Y. T. Hou, W. Zhu, H. J. Lee, T. Chiang, Y. Q. Zhang, and H. J. Chao, “On end-to-end architecture for transporting MPEG-4 video over the Internet,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 6, pp. 923-941, Sep. 2000.
Yoon Kim was born in Korea, on February 16, 1969. He received the B.S. and M.S. degrees in electronic engineering with the Department of Electronic Engineering from Korea University, in 1993 and 1995, respectively. From 1995 to 1999, he was with the LGPhilips LCD Co. where he was involved in research and development on digital image equipments. He is now a Ph.D. candidate in electronic engineering with the Department of Electronic Engineering at Korea University. His research interests are in the areas of video signal processing and multimedia communications. Jae-Young Pyun received the B.S. degree from Chosun University, and M.S. degree from Chonnam University, Kwang-ju, Korea, both in Electronics Engineering, in 1997 and 1999, respectively. He is now a Ph.D. candidate in electronics engineering with the Department of Electronics Engineering at Korea University. In 1999, he joined the Research Institute for Information and Communication Technology, where he is currently a research engineer. His research interests include Mobile QoS, IP QoS, and video compression and communication.
628
IEEE Transactions on Consumer Electronics, Vol. 49, No. 3, AUGUST 2003 Hye-Soo Kim received the B.S. degrees in electronics engineering with the Department of Electronics Engineering from Korea University, in 2002. He is currently working toward the M.S. degree in electronics engineering with the Department of Electronics Engineering at Korea University. His research interests are in the areas of Mobile QoS, IP QoS, Handoff, video signal processing and multimedia communications. Sang-Hee Park received the B.S. degree from Korea University, in Electronics Engineering, in 2002. He is now a M.S. candidate in electronic engineering with the Department of Electronic Engineering at Korea University. In 2002, he joined the Research Institute for Information and Communication Technology, where he is currently a research engineer. His research interests include Mobile QoS, IP QoS, and video compression and communication.
Sung-Jea Ko (M'88-SM'97) received the Ph.D. degree in 1988 and the M.S. degree in 1986, both in Electrical and Computer Engineering, from State University of New York at Buffalo, and the B.S. degree in Electronic Engineering at Korea University in 1980. In 1992, he joined the Department of Electronic Engineering at Korea University where he is currently a Professor. From 1988 to 1992, he was an Assistant Professor of the Department of Electrical and Computer Engineering at the University of Michigan-Dearborn. From 1986 to 1988, he was a Research Assistant at State University of New York at Buffalo. From 1981 to 1983, he was with Daewoo Telecom where he was involved in research and development on data communication systems. He has published more than 200 papers in journals and conference proceedings. He also holds over 10 patents on data communication and video signal processing. He is currently a Senior Member in the IEEE, a Fellow in the IEE and a chairman of the Consumer Electronics chapter of IEEE Seoul Section. He has been the Special Sessions chair for the IEEE Asia Pacific Conference on Circuits and Systems (1996). He has served as an Associate Editor for Journal of the Institute of Electronics Engineers of Korea (IEEK) (1996), Journal of Broadcast Engineering (1996 - 1999), the Journal of the Korean Institute of Communication Sciences (KICS) (1997 - 2000). He has been an editor of Journal of Communications and Networks (JCN) (1998 - 2000). He is the 1999 Recipient of the LG Research Award given to the Outstanding Information and Communication Researcher. He received the Hae-Dong best paper award from the IEEK (1997) and the best paper award from the IEEE Asia Pacific Conference on Circuits and Systems (1996).