Efficient Intra-4×4 Mode Decision Based on Bit-rate Estimation in H.264/AVC

Jiaying Liu and Zongming Guo
Institute of Computer Science and Technology, Peking University, Beijing, China, 100871
Email: {liujiaying, guozongming}@icst.pku.edu.cn

Abstract: Rate-distortion optimization (RDO) is widely employed in H.264/AVC to determine the best coding mode. However, it dramatically increases the computational complexity of the encoder. In this paper, we address this problem with an efficient intra-4×4 mode decision algorithm that approximates the bit-rate in order to reduce the cost of RDO. The main idea is as follows. First, we estimate the bit-rate quickly from the number of DCT coefficients that will be quantized to 0 and the number that will be quantized to ±1. The parameters of the estimation function are obtained adaptively by Least Squares Fitting, using the above and left blocks in the current frame and the co-located block in the previously encoded frame as feedback. This closed-loop bit-rate estimation skips quantization, inverse transform, entropy coding and reconstruction. We then use the estimated bit-rate together with the Sum of Absolute Differences (SAD) to simplify the optimization of the R-D cost function. Experimental results show that our scheme reduces the intra coding time by about 50% with negligible loss of PSNR and that, compared with fast mode decision algorithms based on local edge direction, the prediction mode it selects is statistically closer to the one chosen by the original RDO.
I. INTRODUCTION

H.264/AVC is the latest video coding standard, developed by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC MPEG Video Group. It has proven to be the most efficient video coding solution to date. An H.264/AVC encoder can offer up to 50% bit-rate reduction compared with MPEG-4 Advanced Simple Profile (ASP) [1] at the same PSNR quality. Rate-distortion optimization is used in the intra and inter prediction of H.264/AVC to select the optimal mode from all candidate mode combinations. The mode selected by RDO yields the best visual quality under a given bit-rate, rather than just minimizing the bit-rate or maximizing the visual quality. However, this full search greatly increases the computational complexity, which makes H.264 unsuitable for real-time applications with low computational capability.

Considerable effort has gone into fast mode decision algorithms for intra prediction in H.264/AVC in order to reduce this complexity. Pan et al. [2] proposed a fast mode decision algorithm for intra prediction based on local edge information; the algorithm computes a local edge map for a given block using the Sobel operator and establishes the
978-1-4244-1684-4/08/$25.00 ©2008 IEEE
corresponding edge histogram; it then selects candidate modes for RDO based on the edge direction. Cheng and Chang [3] presented a fast three-step algorithm for 4×4 intra prediction. In step 1, three highly probable modes are compared. In step 2, two neighboring modes are examined to determine the refined direction. In the last step, the R-D cost of the refined mode is calculated to make the final decision. Kim et al. [4] proposed a scheme that uses spatial and transform domain features of the target block jointly to filter out the majority of candidate modes, by analyzing the relation between the posterior error probability and the average rate-distortion loss. These fast algorithms mainly focus on filtering out unlikely modes by pre-processing, without simplifying the RDO process itself; little work has been done on the latter.

In this paper, we present an efficient bit-rate estimation function which, together with the SAD distortion measure, simplifies the optimization of the R-D cost function. The resulting R-D cost function adequately accounts for the trade-off between bit-rate and distortion, so the selected mode stays close to the mode chosen by the original RDO, while the computational complexity is comparable to that of the SAD-based cost.

The remainder of the paper is organized as follows. Section II introduces intra-4×4 mode decision in H.264/AVC, and Section III presents our proposed scheme in detail. Experimental results are presented in Section IV. Finally, a conclusion is given in Section V.

II. INTRA 4×4 MODE DECISION IN H.264/AVC

H.264/AVC uses directional intra prediction in the spatial domain. Using the previously encoded and reconstructed samples above and to the left (labeled A, B, ..., M), the current block is predicted along the direction of maximum correlation. Fig. 1 shows the nine intra prediction modes for 4×4 luma blocks. The samples a, b, ..., p of the prediction block P are calculated from the samples A, B, ..., M. For intra-4×4 prediction, the mode decision for each of the sixteen 4×4 blocks is performed by minimizing

$$RDCost_{init} = SSD(s, c \mid Q_p) + \lambda_{MODE} \cdot R(s, c \mid Q_p) \qquad (1)$$

where $Q_p$ is the macroblock quantization parameter, $\lambda_{MODE}$ is the Lagrangian multiplier, R(·) is the number of bits associated with the chosen mode and SSD(·) is the sum of the squared differences between the original
Authorized licensed use limited to: Peking University. Downloaded on March 4, 2009 at 21:30 from IEEE Xplore. Restrictions apply.
4×4 luma block, denoted by s, and its reconstruction c under each candidate mode:

$$SSD(s, c) = \sum_{i=0}^{3} \sum_{j=0}^{3} (s_{ij} - c_{ij})^2$$

In the calculation of SSD, $c_{ij}$ is the reconstructed pixel value obtained after transform, quantization, inverse quantization and inverse transform. Thus, for each candidate mode, every block must be encoded and decoded to obtain the rate and the distortion. The number of mode combinations for the luma and chroma components of a macroblock (MB) is M8 × (M4 × 16 + M16), where M8, M4 and M16 are the numbers of modes for chroma prediction, I4MB prediction and I16MB prediction, respectively. The encoder therefore has to perform 4 × (9 × 16 + 4) = 592 different RDO calculations for an MB before the best mode is determined, which makes its complexity extremely high. To address this problem, JM6.1d provides an alternative cost function to speed up encoding [5]:

$$RDCost_{SAD} = SAD(s, p) + 4P \cdot \lambda_{MODE} \qquad (2)$$

where SAD is given by

$$SAD(s, p) = \sum_{i=0}^{3} \sum_{j=0}^{3} |s_{ij} - p_{ij}|$$

and $p_{ij}$ is the (i, j)th element of the prediction block. The factor P equals 0 for the most probable mode and 1 for the other modes. This cost function reduces the amount of computation significantly; however, there is a noticeable difference between the R-D cost based on SSD and that based on SAD, since the latter neglects the effect of the bit-rate. In particular, when the nine intra modes are compared, if the second-smallest SAD value is very close to the smallest one and the minimum-SAD mode incurs a higher bit-rate, then once the rate is weighted by $\lambda_{MODE}$ the decision will differ from the one made by the original RDO.

For most conventional fast intra mode decision algorithms [2]–[4], the main idea is to choose the mode whose direction is most correlated with the local edge information. This is essentially the same as finding, among the nine predicted blocks, the one with the minimum SAD relative to the block being coded, without considering the effect of the bit-rate in the RDO function. To limit the error caused by SAD, algorithms based on the local edge direction often add one or two more candidate modes for the RDO calculation, such as the modes of adjacent directions and the DC mode. It is therefore desirable to have an R-D cost estimate that is as simple to compute as SAD but also reflects the impact of the bit-rate, so that its decisions are closer to those of the original RDO.

Fig. 1. 4×4 luma prediction modes.

III. EFFICIENT INTRA-4×4 MODE DECISION SCHEME

Before elaborating on our solution, we introduce two notations: $n_{zero}$ denotes the number of DCT coefficients that will be quantized to 0, and $n_{one}$ denotes the number of DCT coefficients that will be quantized to ±1; the latter are also called trailing one coefficients. We propose a scheme that obtains a simplified R-D cost estimate so that the intra mode can be determined quickly. First, we predict $n_{zero}$ and $n_{one}$ of a quantized 4×4 DCT block from the residual values. Second, after studying the features of CAVLC, we estimate the bit-rate needed to encode the residual, using the $n_{zero}$ and $n_{one}$ predicted in the first step. Finally, we use the resulting R-D cost function to select the intra prediction mode quickly. The estimated R-D cost combines the bit-rate and the intra prediction distortion without encoding every mode for every block, and thus avoids the complexity of transform, quantization and entropy coding.

A. Prediction of Zero and Trailing One DCT Coefficients

To predict $n_{zero}$ and $n_{one}$, the numbers of zero and ±1 coefficients among the quantized DCT coefficients, we examine the integer DCT and quantization procedure of H.264/AVC. The 4×4 integer DCT is given by [6]

$$Y(u, v) = A \cdot X(x, y) \cdot A^T \quad \text{and} \quad A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix} \qquad (3)$$
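As a minimal sketch of Eq. (3), the following Python code computes Y = A · X · A^T for a 4×4 residual block with plain integer arithmetic, using the entries a = 1, b = 2, c = 1. The helper names and the sample block are illustrative assumptions and are not taken from the reference software.

```python
# Transform matrix A of Eq. (3) with a = 1, b = 2, c = 1.
A = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(P, Q):
    """Product of two 4x4 integer matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(P):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*P)]


def forward_transform(X):
    """Return Y = A * X * A^T for a 4x4 residual block X."""
    return matmul(matmul(A, X), transpose(A))


# Hypothetical flat residual block: only the DC coefficient Y(0, 0) is non-zero.
flat = [[5] * 4 for _ in range(4)]
for row in forward_transform(flat):
    print(row)
```

A flat block is a convenient sanity check because every basis function other than the DC one sums to zero over the block.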
In Eq. (3), A is the transform matrix with entries a = 1, b = 2 and c = 1. Given an integer DCT coefficient Y(u, v) and the quantization parameter $Q_p$, the quantized coefficient Z(u, v) is

$$|Z(u, v)| = (|Y(u, v)| \cdot M(u, v) + f) \gg qbits, \quad 0 \le u, v \le 3 \qquad (4)$$

where $qbits = 15 + \lfloor Q_p / 6 \rfloor$ and $Q_p$ ranges from 0 to 51. For intra blocks, $f = 2^{qbits}/3$. M(u, v) is the multiplication factor, which depends on $Q_p \bmod 6$ and falls into one of three classes according to the position (u, v) in the quantization table [1]. Wang et al. [7] proposed a sufficient condition for DCT coefficients to be quantized to zero. In order to obtain the exact number of zeros and reduce the error of the bit-rate prediction,
we give an exact condition for quickly predicting $n_{zero}$ and $n_{one}$. Further, instead of using ordinary matrix multiplication, we fully expand Eq. (3) and rewrite the DCT in terms of the matrices $A_a(u, v) = [A(u, x) \cdot A^T(y, v)]_{4 \times 4}$:

$$Y(u, v) = \sum_{x=0}^{3} \sum_{y=0}^{3} X_a \otimes A_a(u, v) \qquad (5)$$

where $X_a = [X(x, y)]_{4 \times 4}$ and ⊗ denotes multiplying each element of $X_a$ by the element at the same position of $A_a(u, v)$. To decide whether an element of the integer DCT coefficient matrix Y will be quantized to 0, we consider the quantization threshold T(u, v), which follows from Eq. (4):

$$T(u, v) \approx \frac{2^{qbits} - f}{M(u, v)} \qquad (6)$$
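The step from Eq. (4) to Eq. (6) can be spelled out as below. This is a short derivation sketch that assumes the right shift in Eq. (4) acts as integer division by $2^{qbits}$ on a non-negative value.

```latex
\begin{align*}
|Z(u,v)| = 0
  &\iff \left\lfloor \frac{|Y(u,v)| \cdot M(u,v) + f}{2^{qbits}} \right\rfloor = 0 \\
  &\iff |Y(u,v)| \cdot M(u,v) + f < 2^{qbits} \\
  &\iff |Y(u,v)| < \frac{2^{qbits} - f}{M(u,v)}
\end{align*}
```

The right-hand side of the last line is the threshold T(u, v) of Eq. (6). For intra blocks with $f = 2^{qbits}/3$ it becomes $2^{qbits+1} / (3 \cdot M(u, v))$.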
The condition for detecting a DCT coefficient that is quantized to zero, and hence for counting $n_{zero}$, is therefore

$$|Y(u, v)| < T(u, v) \qquad (7)$$

Take u = 0, v = 1 as an example:

$$A_a(0, 1) = \begin{bmatrix} 2 & 1 & -1 & -2 \\ 2 & 1 & -1 & -2 \\ 2 & 1 & -1 & -2 \\ 2 & 1 & -1 & -2 \end{bmatrix}$$

The transform matrix $A_a(u, v)$ is regular and symmetric, so we divide the 4×4 block into four parts (here, its four columns):

$$AV_i = \sum_{x=0}^{3} X(x, i), \quad i = 0, 1, 2, 3 \qquad (8)$$

Using Eq. (8), we obtain

$$Y(0, 1) = 2AV_0 + AV_1 - AV_2 - 2AV_3 \qquad (9)$$
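The expansion in Eqs. (5), (8) and (9) and the zero test of Eq. (7) can be checked numerically with a sketch like the following. It assumes a = 1, b = 2, c = 1 and an intra block with f = 2^qbits / 3. The multiplication factor M(u, v) is passed in as `m_uv` rather than looked up, and both the value 8066 and the residual block are assumed example inputs; all function names are hypothetical.

```python
# Transform matrix A of Eq. (3) with a = 1, b = 2, c = 1.
A = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def coefficient(X, u, v):
    """Y(u, v) from the expanded form of Eq. (5): sum of X(x, y) * A(u, x) * A(v, y)."""
    return sum(X[x][y] * A[u][x] * A[v][y] for x in range(4) for y in range(4))


def y01_from_column_sums(X):
    """Y(0, 1) from Eqs. (8)-(9), using the four column sums AV_0..AV_3."""
    av = [sum(X[x][i] for x in range(4)) for i in range(4)]
    return 2 * av[0] + av[1] - av[2] - 2 * av[3]


def zero_threshold(qp, m_uv):
    """T(u, v) of Eq. (6) for an intra block, with f = 2**qbits / 3."""
    qbits = 15 + qp // 6
    f = (1 << qbits) / 3
    return ((1 << qbits) - f) / m_uv


def is_quantized_to_zero(y_uv, qp, m_uv):
    """Zero test of Eq. (7): |Y(u, v)| < T(u, v)."""
    return abs(y_uv) < zero_threshold(qp, m_uv)


# Hypothetical 4x4 residual block (original minus prediction).
X = [
    [3, 1, -2, 0],
    [2, 0, -1, 1],
    [4, -1, 0, 2],
    [1, 2, -3, -1],
]

y01 = coefficient(X, 0, 1)
print(y01, y01_from_column_sums(X))                 # both routes give the same value
print(is_quantized_to_zero(y01, qp=24, m_uv=8066))  # m_uv is an assumed input
```

The column-sum route needs only the four sums AV_0..AV_3 plus a handful of additions and shifts, which is where the saving over the full matrix product comes from.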
So the final condition for the DCT coefficient at (0, 1) to be zero after quantization is

$$|Y(0, 1)| = |2AV_0 + AV_1 - AV_2 - 2AV_3| < T(0, 1) \qquad (10)$$

The other positions are calculated in the same way with their corresponding transform matrices $A_a(u, v)$; for convenience, the block is divided into different parts for each of them. Similarly, the condition for counting $n_{one}$ is

$$T(u, v) \le |Y(u, v)| < 2T(u, v) \qquad (11)$$

The values of $n_{zero}$ and $n_{one}$ predicted by the above fast algorithm are denoted by $N_{pre\_Zero}$ and $N_{pre\_One}$.

B. Bit-rate Estimation in CAVLC Entropy Encoder

Context-based adaptive variable length coding (CAVLC) is the method used to encode residual, zig-zag ordered 4×4 blocks of transform coefficients. It is designed to take advantage of several features [8]:

1) If there are non-zero coefficients, the coefficients at the highest frequencies typically form a string of ±1 values. For these trailing ones, only the sign has to be coded.
2) For coefficients other than trailing ones, level information is coded. Since this is a one-dimensional parameter, coding is simplified and well-structured VLCs are used.

Let $N_{Coeff}$ denote the total number of non-zero coefficients and $N_{One}$ the number of ±1 coefficients. From the features above, we conclude that two important CAVLC coding parameters, $N_{Coeff}$ and $N_{One}$, affect the length of the coded bits. Obviously $N_{Coeff} \ge N_{One}$ and, according to the VLC tables, a larger $N_{Coeff}$ produces more coded bits. We therefore propose a linear function to estimate the number of bits output by CAVLC:

$$B = a \times N_{Coeff} + b \times N_{One} + \varepsilon \qquad (12)$$

where a and b are the parameters to be fitted, and ε equals 1 for the most probable mode and 4 for the other modes.

We collect the actual feedback ($N_{Coeff}$, $N_{One}$, actual bits) from the above and left encoded 4×4 blocks Ref.M, Ref.A and Ref.B in the current frame and from the co-located block Ref.N in the previously encoded frame, as shown in Fig. 2.

Fig. 2. Adaptive parameters prediction.

The values of a and b are then calculated by Least Squares Fitting:

$$\begin{bmatrix} B^M \\ B^A \\ B^B \\ B^N \end{bmatrix} = \begin{bmatrix} N^M_{Coeff} & N^M_{One} \\ N^A_{Coeff} & N^A_{One} \\ N^B_{Coeff} & N^B_{One} \\ N^N_{Coeff} & N^N_{One} \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} \varepsilon^M \\ \varepsilon^A \\ \varepsilon^B \\ \varepsilon^N \end{bmatrix} \qquad (13)$$

In pure matrix notation this becomes

$$\tilde{B} = \tilde{N} \, [a \;\; b]^T + \tilde{\varepsilon}$$

and the least-squares estimator for [a b] is

$$[a \;\; b]^T = (\tilde{N}^T \tilde{N})^{-1} \tilde{N}^T \tilde{B} \qquad (14)$$
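The fit of Eqs. (12) to (14) reduces to a 2×2 system of normal equations, which the following sketch solves in closed form. It makes one assumption beyond the text: the known offset ε of each reference block is subtracted from its actual bit count before fitting, whereas Eq. (14) is written directly in terms of the bit vector. The function names and the four feedback tuples are hypothetical.

```python
def fit_rate_model(samples):
    """Least-squares fit of (a, b) in B = a * N_coeff + b * N_one + eps.

    samples: iterable of (n_coeff, n_one, actual_bits, eps) tuples, one per
    reference block (for example Ref.M, Ref.A, Ref.B and Ref.N).
    Assumption: eps is known per block and removed before fitting.
    """
    s_cc = s_co = s_oo = s_cb = s_ob = 0.0
    for n_coeff, n_one, bits, eps in samples:
        target = bits - eps
        s_cc += n_coeff * n_coeff
        s_co += n_coeff * n_one
        s_oo += n_one * n_one
        s_cb += n_coeff * target
        s_ob += n_one * target
    # Solve [[s_cc, s_co], [s_co, s_oo]] @ [a, b] = [s_cb, s_ob].
    det = s_cc * s_oo - s_co * s_co
    if abs(det) < 1e-12:
        raise ValueError("reference blocks do not determine a and b uniquely")
    a = (s_oo * s_cb - s_co * s_ob) / det
    b = (s_cc * s_ob - s_co * s_cb) / det
    return a, b


def estimate_bits(n_coeff, n_one, a, b, eps):
    """Bit estimate of Eq. (12)."""
    return a * n_coeff + b * n_one + eps


# Hypothetical feedback from four reference blocks: (N_coeff, N_one, bits, eps).
feedback = [(6, 2, 30, 4), (4, 3, 15, 1), (8, 1, 42, 4), (3, 3, 13, 4)]
a, b = fit_rate_model(feedback)
print(a, b, estimate_bits(5, 2, a, b, eps=1))
```

With only four reference blocks the system can be singular, for instance when all of them have the same ratio of $N_{Coeff}$ to $N_{One}$, so the sketch raises an error in that case; an encoder would more likely fall back to the previous values of a and b.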
TABLE II
PERFORMANCE OF OUR PROPOSED SCHEME RELATIVE TO JM8.6

| Format | Sequence | ΔPSNR (dB), Proposed | ΔPSNR (dB), Pan's | ΔBits (%), Proposed | ΔBits (%), Pan's |
|--------|-----------|--------|--------|-------|-------|
| QCIF | Foreman | -0.240 | -0.285 | 0.099 | 4.437 |
| QCIF | News | -0.262 | -0.294 | 0.320 | 3.902 |
| QCIF | Container | -0.216 | -0.234 | 0.076 | 3.695 |
| QCIF | Mobile | -0.224 | -0.255 | 0.400 | 3.168 |
| CIF | Paris | -0.283 | -0.230 | 0.590 | 3.210 |
| CIF | Tempete | -0.240 | -0.229 | 0.450 | 3.514 |
| CIF | Bike | -0.262 | -0.262 | 0.690 | 3.216 |
| | Average | -0.246 | -0.256 | 0.375 | 3.592 |

C. Decision Based on Modified R-D Cost

With the estimation algorithms above, we first predict $N_{pre\_Zero}$ and $N_{pre\_One}$ using the fast algorithm of subsection III-A. Combining them with the fitted coefficients obtained in subsection III-B for CAVLC coding, we estimate the number of bits needed to encode the coefficients. With SAD as the distortion measure, the modified R-D cost function for selecting the best mode becomes

$$RDCost_{modified} = SAD + \lambda_{MODE} \cdot [a \times (16 - N_{pre\_Zero}) - b \times N_{pre\_One} + rate\_mode] \qquad (15)$$

This simple R-D cost function, based on the bit-rate estimate, is evaluated for each of the nine intra modes, and the mode with the minimum R-D cost is selected as the best mode.
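The decision rule can be sketched as follows. The sign on the b term follows Eq. (15) as printed. The sketch assumes that rate_mode takes the same values as ε in Eq. (12), namely 1 for the most probable mode and 4 otherwise, which the text does not state explicitly. The function names, the candidate statistics and the values of a, b and λ are hypothetical.

```python
def modified_rd_cost(sad, n_pre_zero, n_pre_one, a, b, lam, rate_mode):
    """Eq. (15): SAD plus lambda times the estimated bits of a 4x4 block."""
    est_bits = a * (16 - n_pre_zero) - b * n_pre_one + rate_mode
    return sad + lam * est_bits


def select_intra4x4_mode(candidates, a, b, lam, most_probable_mode):
    """Return (best_mode, best_cost) over the candidate modes.

    candidates: dict mapping mode index -> (sad, n_pre_zero, n_pre_one).
    Assumption: rate_mode is 1 for the most probable mode and 4 otherwise.
    """
    best_mode, best_cost = None, float("inf")
    for mode, (sad, n_zero, n_one) in sorted(candidates.items()):
        rate_mode = 1 if mode == most_probable_mode else 4
        cost = modified_rd_cost(sad, n_zero, n_one, a, b, lam, rate_mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# Hypothetical statistics for three of the nine modes: (SAD, N_pre_Zero, N_pre_One).
candidates = {0: (120, 10, 3), 1: (118, 6, 2), 2: (150, 13, 2)}
print(select_intra4x4_mode(candidates, a=5.0, b=2.0, lam=4.0, most_probable_mode=0))
```

In this example mode 1 has the smallest SAD but the largest estimated rate, so the modified cost selects a different mode. This is the situation in which a pure SAD decision departs from the RDO decision.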
In symbols, the selected mode is

$$Mode_{best} = Mode(\min RDCost_{modified})$$

IV. EXPERIMENTAL RESULTS

The intra-4×4 mode decision scheme described in Section III has been evaluated with the H.264/AVC reference software JM8.6. The configuration file is based on the JVT Main Profile, with the number of reference frames set to 0 and the GOP structure set to all I frames. The test platform is an Intel Pentium D processor running at 2.80 GHz with 1 GB of DDR RAM and Microsoft Windows 2003 Server. Experiments are carried out with the quantization parameters 20, 22, ..., 40. Seven test sequences are selected from the recommended sequences, four in QCIF format and three in CIF format, and 300 frames of each sequence are encoded. We compare computation time, PSNR and bit-rate against the original JM8.6 full-search algorithm and against Pan's scheme, respectively.

Table I shows that the proposed intra-4×4 mode decision algorithm saves about 50% of the time compared with JM8.6. The average loss of PSNR is 0.24 dB, with only a slight increment in bit-rate for most sequences (for some sequences, such as Claire and Akiyo, the bit-rate decreases). Table II shows that our proposal outperforms Pan's method, whose average loss of PSNR is 0.256 dB and whose increment in bit-rate is 3.6%. Moreover, Fig. 3 shows the R-D curves of two sequences, "Foreman" in QCIF format and "Akiyo" in CIF format. These curves also show that our intra mode decision algorithm achieves R-D performance similar to that of JM8.6.
Fig. 3. R-D curves comparison with JM8.6: (a) Foreman, QCIF; (b) Akiyo, CIF.

TABLE I
PERFORMANCE OF OUR PROPOSED SCHEME RELATIVE TO JM8.6

| Format | Sequence | ΔTime (%) | ΔPSNR (dB) | ΔBits (%) |
|--------|-----------|--------|--------|--------|
| QCIF | Container | -51.00 | -0.216 | 0.076 |
| QCIF | Foreman | -48.65 | -0.240 | 0.099 |
| QCIF | Claire | -46.41 | -0.267 | -1.090 |
| QCIF | Mobile | -52.40 | -0.224 | 0.400 |
| CIF | Bike | -55.50 | -0.262 | 0.690 |
| CIF | Tempete | -55.01 | -0.240 | 0.450 |
| CIF | Akiyo | -45.09 | -0.260 | -1.160 |
| | Average | -50.58 | -0.244 | -0.076 |

V. CONCLUSION

In this paper, we propose an efficient intra-4×4 mode decision algorithm for H.264/AVC. Based on a precise prediction of how the integer DCT coefficients will be quantized, the algorithm efficiently estimates the bit-rate of encoding a 4×4 block. The experimental results show that, compared with JM8.6, our scheme reduces the encoding complexity significantly, while on average maintaining a similar PSNR and increasing the bit-rate only slightly.

ACKNOWLEDGMENT

This work was supported by the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2006AA01Z320.
REFERENCES

[1] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions", in SPIE Conference on Application of Digital Image Processing XXVII, Vol. 5558, part 1, 454–474, Aug. 2004.
[2] Feng Pan, Xiao Lin, et al., "Fast mode decision for intra prediction", ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT Pattaya II, Thailand, March 2003.
[3] Chao-Chung Cheng and Tian-Sheuan Chang, "Fast Three Step Intra Prediction Algorithm for 4×4 blocks in H.264", in International Symposium on Circuits and Systems, 1509–1512, 2005.
[4] Changsung Kim, Hsuan-Huei Shih, and C.-C. Jay Kuo, "Fast H.264 Intra-prediction mode selection using joint spatial and transform domain features", Journal of Visual Communication and Image Representation, Vol. 17, 291–310, 2006.
[5] Joint Video Team (JVT) Reference Software, ver. 6.1d [Online]. Available: http://bs.hhi.de/~suehring/tml/download/Unofficial/
[6] A. Hallapuro, M. Karczewicz, and H. Malvar, "Low complexity transform and quantization", in Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Jan. 2002, Docs. JVT-B038 and JVT-B039.
[7] Hanli Wang, Sam Kwong, and Chi-Wah Kok, "Efficient Prediction Algorithm of Integer DCT Coefficients for H.264/AVC Optimization", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, No. 4, 547–552, Apr. 2006.
[8] I. E. G. Richardson, "H.264/MPEG-4 Part 10: Variable length coding", H.264/MPEG-4 Part 10 White Paper, 2003.