HYBRID DCT/WAVELET I-FRAME CODING FOR EFFICIENT H.263+ RATE CONTROL AT LOW BIT RATES

Hwangjun Song, Jongwon Kim and C.-C. Jay Kuo
Integrated Media Systems Center and Department of Electrical Engineering-Systems
University of Southern California, Los Angeles, California 90089-2564

ABSTRACT
At low bit rates, the bit budget for I-frame coding in H.263+ can be too high to be practical. A hybrid DCT/wavelet transform based I-frame coding is proposed in this work as a solution to the rate control problem. This new coder is compatible with the H.263+ bit stream syntax, and aims at an R-D optimized performance with a reasonable amount of computational complexity. By employing fast estimation of the coding efficiency with a rate-distortion model and performing an R-D based rate allocation, the hybrid coding scheme achieves higher coding gain at low bit rates.
1. INTRODUCTION

Robust low bit rate visual communication has recently become an active research area. Among various video compression schemes, H.263+ is an emerging low bit rate video coding standard, especially for video telephony and video conferencing applications. Like many other existing standards, the core ingredients of H.263+ are block-based motion compensation and block DCT coding. Among the three frame types of H.263+, i.e. intra (I), predictive (P), and bidirectional (B) frames, the I frame plays a major role in the quality of the coded bitstream. Its coding requires a much higher bit rate than that of P/B frames since motion compensation is not employed. As the transmission bandwidth becomes narrower, the bit budget for I frames becomes a growing burden for the whole coded bitstream. Furthermore, unlike MPEG-1/2, there is no fixed group of pictures (GOP) structure in H.263+. Only one intra frame refresh is required for 132 consecutive P-frames to control the accumulation of inverse transform mismatch error.

A new H.263+ rate control scheme, which enables variable frame rate coding, was developed from a global bit allocation viewpoint in our previous work [1], where a GOP is decomposed into one I frame and many P frames, and where the P frames are divided into several sub-GOPs. Each sub-GOP was used as a basic unit for adjusting the frame rate. The issue of I frame coding was not considered in [1]. Instead, it was assumed that the bit budget for the I-frame could be determined by the allowed time delay, the maximum buffer size and the required video quality. The emphasis was on keeping the quality of subsequent P frames as close as possible to that of the I-frame via rate control.

An efficient I-frame coding scheme at low bit rates is proposed here to complement our previous work in [1] and make the solution to H.263+ rate control complete. We consider a hybrid DCT/wavelet I-frame coding scheme, which provides enhanced performance over DCT-based H.263+. The compressed bit stream is compatible with the syntax of H.263+. Even though several wavelet-based I-frame coding methods have been proposed in MPEG-4, they are not compatible with the syntax of H.263+. In addition to coding efficiency, we also consider the tradeoff between quality improvement and increased computational complexity in this work.
2. REVIEW OF I-FRAME CODING SCHEMES

Generally speaking, there are two major approaches to I-frame (or still image) coding: block DCT and wavelet coding schemes. JPEG is a popular image coding standard using the block DCT. Recently, coders based on the multiresolution wavelet transform, such as EZW, LZC [8] and SPIHT [9], were proposed. These schemes perform better than the DCT-based schemes. However, it is difficult to apply them directly to the I-frame coding of H.263+, since they are totally different from DCT-based schemes.

Besides the multiresolution wavelet approach, the model-based UTQ (uniform threshold quantizer) is another popular approach. UTQ is employed for subband signals since it approximates the optimum entropy-constrained scalar quantizer (ECSQ) for the generalized Gaussian distribution without a complicated ECSQ design. The characteristics of a UTQ are fully determined by the fixed step size and the deadzone size. A fast wavelet-based compression technique with model-based UTQ was proposed by LoPresto et al. [6] for still image compression. In their work, each wavelet coefficient is treated as a random variable with a Laplacian distribution. The standard deviation of the wavelet coefficient to be encoded is estimated from wavelet coefficients coded earlier. Then, the optimal UTQ for each wavelet coefficient is determined based on the estimated standard deviation. This scheme reduces the side information for quantizers. However, iterative calculation is required to estimate the standard deviation of each wavelet coefficient. Thus, it is not efficient for I-frame coding.

Silva and Ghanbari [2] used a hybrid DCT/wavelet approach to encode the I frame for video coding. Even though their scheme satisfies the bit stream structure of H.261, it is not efficient to use block DCT for high frequency subbands. Various wavelet-based I-frame coding schemes were proposed to replace the DCT-based I-frame coding in MPEG-4.
A hybrid DCT/wavelet transform [3] was also proposed in MPEG-4. However, these wavelet coders are not compatible with the H.263+ bitstream syntax.

Figure 1: Proposed hybrid DCT/wavelet I-frame coding for H.263+. [Block diagram: low-pass (LPF) and high-pass (HPF) filters, each followed by downsampling by 2; H.263 coding with quantizer Q; the high frequency subbands coding algorithm; the H.263+ bit stream generator.]
3. PROPOSED H.263+ I-FRAME CODING SCHEME

In this section, we propose a new efficient hybrid DCT/wavelet coding scheme that is compatible with H.263+. Its block diagram is depicted in Fig. 1. Each I-frame is decomposed into four subbands. The DCT-based H.263+ I-frame coding can still be applied to the LL subband without loss of efficiency. However, images in the three high frequency subbands look like ghost images, since only high frequency residual signals exist. We adopt a hybrid DCT/wavelet transform, and apply a fast R-D optimized bit allocation scheme to each subband. The high frequency subband information can be transmitted with the supplementary option of H.263+.
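The four-subband split described above can be illustrated with a minimal sketch. It assumes an orthonormal Haar filter pair purely for brevity (the experiments in Section 4 use a 7-9 biorthogonal filter), and the function names and the LH/HL naming convention are illustrative choices, not part of the original scheme.

```python
import numpy as np

def haar_analysis_1d(x, axis):
    """Split x along `axis` into low-pass and high-pass halves (orthonormal Haar)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def decompose_one_level(frame):
    """Return the LL, LH, HL and HH subbands of a frame with even dimensions."""
    frame = np.asarray(frame, dtype=np.float64)
    low, high = haar_analysis_1d(frame, axis=1)   # filter and downsample horizontally
    ll, lh = haar_analysis_1d(low, axis=0)        # then vertically on the low band
    hl, hh = haar_analysis_1d(high, axis=0)       # and on the high band
    return ll, lh, hl, hh

# A random stand-in for a CIF luminance frame (352 x 288 pixels).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(288, 352)).astype(np.float64)
ll, lh, hl, hh = decompose_one_level(frame)
print(ll.shape, lh.shape, hl.shape, hh.shape)
```

Each subband has a quarter of the original samples, so the LL subband can be passed to a DCT-based coder as a smaller image while the other three carry the high frequency residual.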
3.1. Bit budget for the I frame

The H.263 rate control problem can be formulated as

$$\text{minimize } D_i + \sum_{k=1}^{N-1} D_{p,k} \quad \text{subject to } R_i + \sum_{k=1}^{N-1} R_{p,k} = R, \tag{1}$$
where $D_i$ is the distortion of the I frame, $D_{p,k}$ is the distortion of the $k$th P frame, $R_i$ is the bit budget for the I frame, $R_{p,k}$ is the bit rate for the $k$th P frame, $N$ is the number of frames in a GOP, and $R$ is the given bit rate for a GOP. The GOP is used as the basic rate control unit in MPEG, where bit rates for the I frame and P frames are allocated simultaneously. In H.263+, a GOP is too long to be used as a basic rate control unit. One solution is to treat the I frame and P frames separately. Most existing H.263 rate control algorithms handle only P frames, not the I frame. A rate control scheme for P frames was examined in detail in [1]. Effective I frame coding is crucial to the rate control of H.263, since I frame coding requires a much higher bit rate and its quality affects the quality of the whole GOP. For simplicity, we assume that the bit budget for the I frame ($R_i$) is determined by the maximum buffer size, the time delay and the image quality requirements, and focus on the I frame coding scheme.
3.2. Coding in the LL subband
The image in the LL subband is very similar to the original I-frame image, at a quarter of its size. Hence, we apply the conventional DCT-based transform, which is the standard I-frame coding scheme of H.263+, without loss of efficiency. In H.263+, the I frame is divided into GOBs, each GOB is divided into macroblocks ($16 \times 16$), and each macroblock is divided into blocks ($8 \times 8$). An $8 \times 8$ DCT is applied to each block. Within a macroblock, the same quantizer is used for all coefficients except the first one of INTRA blocks, unless the optional advanced intra coding mode or modified quantization mode is used. The quantization parameter ranges from 1 to 31. The DCT-transformed and quantized coefficients are entropy encoded.

The distortion in the LL subband is caused by the quantizer and the lossy DCT-based coding. For the rate, we use the quadratic rate model, which is an extended version of the MPEG-2 test R-D model. With respect to the quantization parameter, it is

$$R_1(q_1) = a\,q_1^{-1} + b\,q_1^{-2}, \tag{2}$$

where $q_1$ is the quantization parameter for a frame. As proposed in [5], we use linear regression to determine the coefficients $a$ and $b$:

$$b = \frac{n \sum_{i=1}^{n} R_i - \left(\sum_{i=1}^{n} q_i^{-1}\right)\left(\sum_{i=1}^{n} q_i R_i\right)}{n \sum_{i=1}^{n} q_i^{-2} - \left(\sum_{i=1}^{n} q_i^{-1}\right)^2}, \tag{3}$$

$$a = \frac{\sum_{i=1}^{n} \left(q_i R_i - b\,q_i^{-1}\right)}{n}, \tag{4}$$

where $n$ is the number of previously observed frames, and $q_i$ and $R_i$ are the actual average quantization scale and bit count of the $i$th encoded frame. In [5], the distortion measure is represented by the average quantization scale of a frame for computational simplicity. In our work, an affine distortion model with respect to the quantization parameter is used:

$$D_1(q_1) = c\,q_1 + d, \quad 1 \le q_1 \le 31, \tag{5}$$

where $c$ and $d$ are determined by minimizing the MSE between the observed data and the estimated data.
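The model fitting in Eqs. (2)-(5) can be sketched as follows. The function names and the synthetic observations are illustrative assumptions; the closed-form expressions follow Eqs. (3) and (4), and the affine distortion fit uses an ordinary least-squares line via NumPy's `polyfit`.

```python
import numpy as np

def fit_quadratic_rate_model(q, r):
    """Closed-form (a, b) for R(q) = a/q + b/q**2, following Eqs. (3)-(4)."""
    q = np.asarray(q, dtype=np.float64)
    r = np.asarray(r, dtype=np.float64)
    n = q.size
    inv_q = 1.0 / q
    qr = q * r
    denom = n * np.sum(inv_q ** 2) - np.sum(inv_q) ** 2
    b = (n * np.sum(r) - np.sum(inv_q) * np.sum(qr)) / denom
    a = np.sum(qr - b * inv_q) / n
    return float(a), float(b)

def fit_affine_distortion_model(q, dist):
    """Least-squares (c, d) for D(q) = c*q + d, as in Eq. (5)."""
    c, d = np.polyfit(np.asarray(q, dtype=np.float64),
                      np.asarray(dist, dtype=np.float64), deg=1)
    return float(c), float(d)

# Synthetic observations (made-up numbers, not measured data).
q_obs = np.array([3, 5, 8, 13, 21, 31], dtype=np.float64)
r_obs = 40000.0 / q_obs + 60000.0 / q_obs ** 2   # bits per frame
d_obs = 3.5 * q_obs + 12.0                       # MSE per frame

a, b = fit_quadratic_rate_model(q_obs, r_obs)
c, d = fit_affine_distortion_model(q_obs, d_obs)
print(a, b, c, d)
```

Multiplying Eq. (2) by $q_1$ gives $q_1 R_1 = a + b\,q_1^{-1}$, which is linear in $q_1^{-1}$; Eqs. (3) and (4) are the slope and intercept of that regression line.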
3.3. Coding in high frequency subband
Usually, it is assumed that the high frequency subbands of the wavelet transform are Laplacian distributed with zero mean. The probability density function of the generalized Gaussian model is given by

$$p(x) = \frac{\nu\,\eta(\nu,\sigma)}{2\,\Gamma(1/\nu)} \exp\left(-\left[\eta(\nu,\sigma)\,|x|\right]^{\nu}\right), \tag{6}$$

where $\sigma$ is the standard deviation and $\nu$ is a shape parameter,

$$\eta(\nu,\sigma) = \sigma^{-1} \left[\frac{\Gamma(3/\nu)}{\Gamma(1/\nu)}\right]^{1/2}, \qquad \Gamma(z) = \int_0^{\infty} \exp(-t)\,t^{z-1}\,dt.$$

The Laplacian distribution corresponds to $\nu = 1$. Hence, the probability density function of the Laplacian distribution is determined completely by the standard deviation, which is defined in Eq. (7). A more practical variance estimation algorithm was considered in [4]. The variance can be computed as

$$\sigma^2 = \frac{1}{N_{pixel}} \sum_{i=1}^{N_{pixel}} c_i^2, \tag{7}$$
where $c_i$ is a wavelet coefficient and $N_{pixel}$ is the number of wavelet coefficients in a subband.

Figure 2: (a) The rate model and (b) the distortion model in the LH subband for the Salesman sequence with observed data and the model (solid line). [Plots of rate and distortion versus step size (w.r.t. variance).]

UTQ (uniform threshold quantizer) is employed in this work since it approximates the optimum entropy-constrained scalar quantizer (ECSQ) for the generalized Gaussian distribution without computing the optimum ECSQ explicitly. In particular, we use a UTQ with a deadzone equal to 1.5 times the step size to reduce the computational complexity and the side information. Reconstruction levels are optimized by using the centroid condition. For example, let us consider a UTQ with $2L+1$ levels and step size $\Delta$. With the zero-mean assumption, the reconstruction levels $q_i$ and their probabilities $p_i$ are symmetric. They are summarized in Eqs. (8)-(11) for $i = 1, \ldots, L$. Only the luminance (Y) components of the high frequency subbands are encoded.

$$p_0(\Delta) = 2 \int_0^{1.5\Delta} \frac{\lambda}{2}\,e^{-\lambda x}\,dx, \tag{8}$$

$$p_i(\Delta) = \int_{(0.5+i)\Delta}^{(1.5+i)\Delta} \frac{\lambda}{2}\,e^{-\lambda x}\,dx = \frac{1}{2}\left(e^{-\lambda(0.5+i)\Delta} - e^{-\lambda(1.5+i)\Delta}\right), \tag{9}$$

$$q_0(\Delta) = 0, \tag{10}$$

$$q_i(\Delta) = \frac{1}{p_i} \int_{(0.5+i)\Delta}^{(1.5+i)\Delta} x\,\frac{\lambda}{2}\,e^{-\lambda x}\,dx = \frac{1}{2p_i}(0.5+i)\Delta\,e^{-\lambda(0.5+i)\Delta} - \frac{1}{2p_i}(1.5+i)\Delta\,e^{-\lambda(1.5+i)\Delta} + \frac{1}{\lambda}, \tag{11}$$
where $\lambda = \sqrt{2}/\sigma$ and $\sigma$ is the standard deviation. If the step size is known, the quantized data can be reconstructed. Therefore, the side information for the quantizer is reduced. Both UTQ and an adaptive arithmetic coder are used for the high frequency subbands.

For a generalized Gaussian distributed source with $\nu = 2$, we can calculate the rate and distortion with respect to the control parameter theoretically by estimating the variances of the high frequency subbands. However, it is observed that the accuracy of this model is not good enough, even though the required computational complexity is small. Hence, we use the linear (affine) distortion model and the quadratic rate model with respect to $\Delta$, as in the case of the LL subband. We use the entropy of the quantizer output,

$$R(\Delta) = -p_0(\Delta) \log p_0(\Delta) - 2 \sum_{i=1}^{\infty} p_i(\Delta) \log p_i(\Delta), \tag{12}$$

to find efficient control points in the range of interest, and then use these control points to determine the model coefficients. The resulting R-D models for the high frequency subbands are plotted in Fig. 2. Based on this R-D model, we can exploit existing fast algorithms for bit allocation.
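As a numerical illustration of Eqs. (7)-(12), the sketch below evaluates the bin probabilities, centroids, entropy and mean squared error of the dead-zone UTQ for a zero-mean Laplacian source. The function names, the truncation to 200 levels per side, and the use of base-2 logarithms (bits per coefficient) are assumptions made for this example.

```python
import numpy as np

def estimate_sigma(coeffs):
    """Eq. (7): standard deviation of a subband under the zero-mean assumption."""
    c = np.asarray(coeffs, dtype=np.float64)
    return float(np.sqrt(np.mean(c ** 2)))

def utq_laplacian(delta, sigma, num_levels=200):
    """One-sided bin probabilities p_i and centroids q_i for i = 1..num_levels.

    The zero bin [-1.5*delta, 1.5*delta] has probability p0 and centroid 0.
    """
    lam = np.sqrt(2.0) / sigma                   # Laplacian parameter
    i = np.arange(1, num_levels + 1)
    lo = (0.5 + i) * delta                       # lower edge of bin i
    hi = (1.5 + i) * delta                       # upper edge of bin i
    p0 = 1.0 - np.exp(-1.5 * lam * delta)        # closed form of Eq. (8)
    p = 0.5 * (np.exp(-lam * lo) - np.exp(-lam * hi))   # Eq. (9)
    # Eq. (11) with the common factor exp(-lam*lo) cancelled, which avoids 0/0
    # in far-out bins whose probability underflows to zero.
    r = np.exp(-lam * delta)
    q = (lo - hi * r) / (1.0 - r) + 1.0 / lam
    return p0, p, q

def utq_rate_distortion(delta, sigma, num_levels=200):
    """Entropy in bits per coefficient (Eq. (12)) and MSE of the centroid UTQ."""
    p0, p, q = utq_laplacian(delta, sigma, num_levels)
    nz = p > 0                                   # skip underflowed bins in the log
    rate = -p0 * np.log2(p0) - 2.0 * np.sum(p[nz] * np.log2(p[nz]))
    # With centroid reconstruction, MSE = E[x^2] - sum_j p_j q_j^2, and q_0 = 0.
    # Bins beyond num_levels are ignored; their contribution is negligible here.
    dist = sigma ** 2 - 2.0 * np.sum(p * q ** 2)
    return float(rate), float(dist)

sigma = estimate_sigma([3.0, -4.0, 0.5, -0.5, 1.0, 0.0])
for step in (1.3, 1.6, 2.0, 2.5, 3.0):           # step size in units of sigma
    rate, dist = utq_rate_distortion(step * sigma, sigma)
    print(f"step={step:.1f}*sigma rate={rate:.3f} bit mse/var={dist / sigma ** 2:.3f}")
```

Evaluating these functions at a handful of step sizes yields (rate, distortion) pairs that can serve as control points for fitting the quadratic rate model and the affine distortion model.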
4. EXPERIMENTAL RESULTS

In the experiment, the 7-9 biorthogonal wavelet filter [4] was used for the wavelet transform. Two CIF format image sequences, Salesman and Akiyo, were tested. Control points were required to model the R-D curves of the subbands. For the LL subband, the set $\{3, 5, 8, 13, 21, 31\}$ was used for $q_1$, while the set $\{1.3, 1.6, 2, 2.5, 3\}$ was employed as control points for the high frequency subbands, since we were interested in high frequency subbands for the very low bit rate cases. These control points were determined by using Eq. (12).
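Once both models are fitted from such control points, an operating point can be chosen under a given I-frame bit budget. The text does not spell out the allocation algorithm, so the exhaustive search below is only a simple stand-in. All model coefficients are made-up values, and summing the LL and high frequency distortions into one objective is an assumption of this sketch.

```python
import numpy as np

def rate_model(x, a, b):
    """Quadratic rate model R(x) = a/x + b/x**2."""
    return a / x + b / x ** 2

def distortion_model(x, c, d):
    """Affine distortion model D(x) = c*x + d."""
    return c * x + d

def choose_operating_point(budget, ll, hf, q_grid, step_grid):
    """Return (total_distortion, q1, step, total_rate), or None if infeasible."""
    best = None
    for q1 in q_grid:
        r_ll = rate_model(q1, ll["a"], ll["b"])
        d_ll = distortion_model(q1, ll["c"], ll["d"])
        for step in step_grid:
            rate = r_ll + rate_model(step, hf["a"], hf["b"])
            dist = d_ll + distortion_model(step, hf["c"], hf["d"])
            if rate <= budget and (best is None or dist < best[0]):
                best = (float(dist), int(q1), float(step), float(rate))
    return best

# Hypothetical model coefficients (rates in bits, step size in units of sigma).
ll_model = {"a": 60000.0, "b": 90000.0, "c": 4.0, "d": 2.0}
hf_model = {"a": 9000.0, "b": 3000.0, "c": 30.0, "d": 20.0}

q_grid = range(1, 32)                    # H.263+ quantization parameter 1..31
step_grid = np.linspace(1.3, 3.0, 18)    # range spanned by the control points
best = choose_operating_point(32000.0, ll_model, hf_model, q_grid, step_grid)
print(best)
```

Because both models are cheap closed-form expressions, even this brute-force search over 31 x 18 candidates costs very little compared with encoding the frame once.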
The PSNR plots of the I frame for the CIF Salesman and Akiyo sequences with the proposed hybrid scheme and the DCT-based scheme are shown in Figs. 3 (a) and (b), respectively. In these figures, one can see that the PSNR value is greatly improved at low bit rates in both cases with the proposed hybrid scheme. In addition, the PSNR values of the following P frames also increase with a fixed I-frame bit budget (encoded at about 32 kbits in the experiment). The result is shown in Fig. 4. It is clear from the figure that the proposed hybrid I-frame coding scheme improves the quality of all frames in a GOP in terms of the PSNR objective measure. Since the $i$th P-frame is used as a reference frame for the $(i+1)$th P-frame, the proposed hybrid I-frame coding scheme actually improves all frames of a GOP. In terms of subjective quality, the blocking artifact is very obvious at low bit rates in the DCT-based coding scheme. The proposed coding scheme also reduces the blocking artifact significantly.

Figure 3: Comparison of the PSNR plot of the I frame with the proposed hybrid scheme and the DCT-based scheme: (a) Salesman and (b) Akiyo. [Plots of PSNR (dB) versus rate (kbit) for the hybrid approach and the DCT-based approach.]

Figure 4: Performance comparison with the same bit budget for the I frame: PSNR plot of a GOP (Salesman) when the same rate control algorithm is used.

5. CONCLUSION

Due to the important role of the I frame at low bit rates, I-frame coding must be included as an integral part of effective rate control. In this work, a hybrid DCT/wavelet I-frame coding scheme was proposed as a building block to the solution of the general rate control problem for H.263+. It was shown that the proposed I-frame coding scheme reduced the blocking artifact at low bit rates significantly. Post-processing techniques can also be incorporated at the decoder end to further enhance the coded I-frame quality.

6. REFERENCES
[1] Hwangjun Song and C.-C. Jay Kuo, "H.263+ rate control via variable frame rates and global bit allocation," In Proc. VCIP98, San Jose, Jan., 1998.
[2] E. A. B. Da Silva and M. Ghanbari, "A Hybrid subband-DCT codec for Transmission of high resolution still pictures at 64 kbits," Journal of Visual Communication and Image Representation, Vol. 6, No. 2, pp. 164-177, June, 1995.
[3] MPEG Video group, "Results of core experiment T13: Block based DCT and wavelet selective coding," ISO/IEC JTC1/SC29/WG11 MPEG96/M1470, Nov., 1996.
[4] Jin Li, Po-Yuen Cheng and C.-C. Jay Kuo, "Image compression using fast rate-distortion optimized wavelet packet transform," submitted to IEEE Trans. on CAS for Video Technology, 1996.
[5] Tihao Chiang and Ya-Qin Zhang, "A new rate control scheme using quadratic rate distortion model," IEEE Trans. on CAS for Video Technology, Vol. 7, No. 1, pp. 246-250, Feb., 1997.
[6] Scott M. LoPresto, Kannan Ramchandran and Michael T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework," In Proc. of VCIP'97.
[7] Youngjun Yoo, A. Ortega and B. Yu, "Image subband coding using progressive classification and adaptive quantization," submitted to IEEE Trans. on Image Processing, Jun., 1997.
[8] D. Taubman and A. Zakhor, "Multirate 3-D subband coding of video," IEEE Trans. on Image Processing, Vol. 3, No. 5, Sep., 1994.
[9] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. on CAS for Video Technology, Vol. 6, June, 1996.