A New Gradient-Based Mode Selection of Intra Prediction for 4x4 Block in H.264/AVC
Kyeong-Yuk Min
Jong-Wha Chong
Dept. of Electronic Engineering, Hanyang University 17 San, Haengdang-Dong, Sungdong-Gu Seoul, 133-791 KOREA
[email protected]
Dept. of Electronic Engineering, Hanyang University 17 San, Haengdang-Dong, Sungdong-Gu Seoul, 133-791 KOREA
[email protected]
Abstract – In this paper, we propose a new gradient-based mode selection method for intra prediction of 4x4 blocks in the H.264/AVC video coding standard. To achieve rate distortion optimization, an H.264/AVC encoder has to code the video by exhaustively trying all mode combinations, including the different intra prediction modes. As a result, the computational complexity of the H.264/AVC encoder increases drastically. Based on the local edge direction, the proposed method considerably reduces the overall complexity and encoding time by evaluating a few candidate modes instead of all nine modes for intra mode selection. The proposed method, implemented in JM9.4 provided by the Joint Video Team (JVT), reduces computational complexity by about 58.4% and encoding time by about 51.9% compared with the full search method while maintaining similar PSNR and bit rate.
I. INTRODUCTION
Some advanced features of H.264/AVC include the 4x4 integer DCT, intra prediction, quarter-pixel motion estimation, multiple reference frames and variable block size motion compensation for inter frames. Relative to prior video coding standards, these features improve coding efficiency by more than 50% at similar PSNR.[1] Traditionally, I-pictures are encoded by directly applying the transform to all macroblocks in a picture. Intra-coded I-pictures generate a much larger number of data bits than inter-coded P- or B-pictures.[2] To increase the efficiency of intra coding, the H.264/AVC standard uses intra prediction, which achieves higher compression efficiency by removing the correlation between adjacent samples in the spatial domain.[3]
Intra prediction means that the samples a-p of the prediction block P are predicted from the samples A-M of previously transmitted and reconstructed blocks of the same image, as shown in Figure 1 (b). The encoder typically selects the prediction mode that minimizes the difference between the predicted block and the original block to be encoded. In the H.264/AVC standard, two types of intra prediction are possible for the luminance component: INTRA_4x4 and INTRA_16x16. In INTRA_4x4, the macroblock is divided into sixteen 4x4 sub-blocks, and the luminance component of each sub-block is predicted individually. Nine different prediction modes are supported. In DC mode, all samples of the current 4x4 sub-block are predicted by the mean value of the already reconstructed samples neighboring the current block to the left and at the top. Besides DC mode, eight prediction modes with specific prediction directions are supported. All possible directions are shown in Figure 1 (a). Figure 1 (b) illustrates intra prediction for a 4x4 luminance block.
1-4244-0136-4/06/$20.00 ©2006 IEEE
(a) Eight directions for nine prediction modes in H.264
(b) A 4x4 prediction block P and its neighboring samples Fig. 1. Nine prediction modes and intra prediction for a 4x4 block
Note that in Figure 1 (b), a to p are the samples to be predicted, and A to M are the neighboring samples that are assumed to be already decoded. For example, if the vertical prediction mode is applied, all samples below sample A are predicted by sample A, all samples below sample B are predicted by sample B, and so on. Four prediction modes are supported for INTRA_16x16: vertical prediction, horizontal prediction, DC prediction and plane prediction. Plane prediction uses a linear function between the neighboring samples to the left and at the top to predict the current samples. This mode is suitable for smoothly changing luminance blocks. The other three modes are the same as in INTRA_4x4, except that they are applied to the whole macroblock instead of a 4x4 sub-block. The efficiency of the INTRA_16x16 prediction modes is high if the signal is very smooth within the macroblock. Intra prediction of the chrominance signals Cb and Cr of a macroblock is similar to INTRA_16x16 luminance prediction, because the chrominance signals are very smooth in most cases. It is always performed on 8x8 sub-blocks using vertical prediction, horizontal prediction, DC prediction or plane prediction. Figure 2 illustrates the nine prediction modes of intra prediction for 4x4 blocks.
Fig. 2. Intra prediction with nine modes for 4x4 block.
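As a minimal sketch of three of the modes described above (vertical, horizontal and DC), the following Python functions build a 4x4 predicted block from the neighboring samples. The function names, the list-based block layout, and the integer rounding of the DC mean are assumptions made for illustration; they are not taken from the JM reference software.

```python
def predict_vertical(top):
    """Mode 0: each column repeats the neighboring sample directly above it."""
    return [list(top) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: each row repeats the neighboring sample directly to its left."""
    return [[left[row]] * 4 for row in range(4)]


def predict_dc(top, left):
    """Mode 2: every sample is the mean of the four top and four left neighbors.

    The rounding (sum + 4) >> 3 is an assumption for the case where both
    neighbor groups are available.
    """
    mean = (sum(top) + sum(left) + 4) >> 3
    return [[mean] * 4 for _ in range(4)]
```

For example, `predict_vertical([10, 20, 30, 40])` returns four identical rows `[10, 20, 30, 40]`, which matches the description that all samples below a top neighbor take its value.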
To achieve the highest coding efficiency, H.264 uses a rate distortion optimization technique to obtain the best coding result in terms of maximizing coding quality and minimizing the resulting data bits.[4] This means that the encoder has to encode the video with all available mode combinations for intra blocks. As a result, the computational complexity of video coding in H.264/AVC increases drastically, which makes it very difficult to use in practical applications such as real-time video communication.[5] We therefore need to reduce the computational complexity and the encoding time to avoid a bottleneck in intra frame coding for practical real-time hand-held applications.[4][5] We propose a new fast intra prediction mode selection algorithm that addresses these problems. The proposed algorithm is presented in Section II. In Section III, experimental results are reported. Finally, conclusions are drawn in Section IV.
II. GRADIENT-BASED METHOD SELECTION
In the reference software JM9.4 by the Joint Video Team, the full search algorithm examines all nine modes for each 4x4 block in a macroblock.[6] The full search algorithm for intra prediction of a 4x4 block proceeds as follows. In the first step, a predicted 4x4 block is produced with one of the nine intra prediction modes. In the second step, SAD16 between the original block and the predicted block is calculated, and the rate distortion optimization cost is computed from Eq. (1):
$$Cost_{16} = SAD_{16} + 4P\,\lambda(QP)\,R \tag{1}$$
where λ(QP) is an exponential function of the quantization factor QP, and P equals 0 for the most probable mode and 1 for the other modes. These two steps are repeated for the remaining eight modes. After that, the mode with the minimum cost is chosen for the 4x4 block.
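The full search loop around Eq. (1) can be sketched as follows. This is an illustrative sketch, not the JM9.4 code: the function names and the dictionary of precomputed predictions are hypothetical, and because the text does not define R, it is exposed here as a parameter `r` that defaults to 1 (an assumption).

```python
def sad16(original, predicted):
    """Sum of absolute differences over the 16 samples of a 4x4 block."""
    return sum(
        abs(o - p)
        for row_o, row_p in zip(original, predicted)
        for o, p in zip(row_o, row_p)
    )


def full_search(original, predictions, most_probable_mode, lam, r=1):
    """Evaluate Eq. (1) for every available mode and keep the cheapest one.

    predictions maps a mode number to its predicted 4x4 block.
    lam stands for lambda(QP); r stands for R (assumed 1, not defined in the text).
    """
    best_mode, best_cost = None, None
    for mode in sorted(predictions):
        p = 0 if mode == most_probable_mode else 1
        cost = sad16(original, predictions[mode]) + 4 * p * lam * r
        if best_cost is None or cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

The penalty term only applies to modes other than the most probable one, so two modes with equal SAD are separated in favor of the most probable mode.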
Although the full search algorithm can achieve the optimal selection of the prediction mode, it is computationally expensive. In this section, we describe the proposed algorithm, a novel fast mode decision algorithm for INTRA_4x4 prediction in H.264/AVC. It reduces the amount of calculation for intra prediction by evaluating only a few candidate modes chosen from local edge direction information. Figure 3 shows the process of the proposed algorithm.
Fig. 3. The process of proposed algorithm for intra prediction
Prior to the process of the proposed algorithm, we divide a 4x4 block into center and external parts to improve the precision of the gradient values for the four modes (0, 1, 3 and 4) depicted in Figure 4.
Fig. 4. Candidate modes and their neighboring modes
Figure 5 shows the two parts, which together include all samples of a 4x4 block. In the first step, the gradient values for the four directional prediction modes (0, 1, 3 and 4) are computed from the four samples of the center part of the 4x4 block, using Eq. (2):
$$\begin{aligned} G_{Step1\_0} &= (|f - g| + |j - k|)/2 \\ G_{Step1\_1} &= (|f - j| + |g - k|)/2 \\ G_{Step1\_3} &= |f - k| \\ G_{Step1\_4} &= |g - k| \end{aligned} \tag{2}$$
In Eq. (2), G stands for gradient value, and the subscript Step1_0 denotes mode 0 (vertical) in step 1, so G_Step1_0 is the gradient value of mode 0. The gradient values of the remaining modes, which correspond to the horizontal and diagonal directions, are obtained from Eq. (2) in the same way.
Part        Pixels
Center      f, g, j, k
External    a, b, c, d, e, h, i, l, m, n, o, p
Fig. 5. The samples of center and external part in a 4x4 block
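The step 1 computation of Eq. (2) can be sketched in a few lines of Python. The sketch assumes that the samples a to p are laid out row by row in the 4x4 block (a-d in the first row, e-h in the second, and so on), so that the center part f, g, j, k sits at rows 1-2 and columns 1-2; the function name is hypothetical, and the terms are transcribed as printed in Eq. (2).

```python
def step1_gradients(block):
    """Gradient values of Eq. (2), keyed by mode number (0, 1, 3, 4).

    block is a 4x4 list of rows; samples are assumed to be stored row-major
    as a..p, so f, g, j, k are the four center samples.
    """
    f, g = block[1][1], block[1][2]
    j, k = block[2][1], block[2][2]
    return {
        0: (abs(f - g) + abs(j - k)) / 2,  # mode 0 (vertical)
        1: (abs(f - j) + abs(g - k)) / 2,  # mode 1 (horizontal)
        3: abs(f - k),                     # mode 3 (diagonal down-left)
        4: abs(g - k),                     # mode 4 (diagonal down-right)
    }
```

For a block whose left half is dark and right half is bright, the mode 0 value is large while the mode 1 value is zero, which is the behavior the gradient test relies on.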
In step 2, the gradient values for the same four modes are computed from the external samples of the 4x4 block, in a similar way to step 1:
$$\begin{aligned} G_{Step2\_0} &= (|c - h| + |i - l|)/2 \\ G_{Step2\_1} &= (|b - n| + |c - o|)/2 \end{aligned} \tag{3}$$
$$\begin{aligned} G_{Step2\_3} &= (|a - p| + |b - l| + |c - o|)/3 \\ G_{Step2\_4} &= (|c - i| + |d - m| + |h - n|)/3 \end{aligned} \tag{4}$$
In Eq. (3), G_Step2_0 is the gradient value for the vertical direction (mode 0), and G_Step2_1 is the gradient value for the horizontal direction (mode 1). In Eq. (4), G_Step2_3 and G_Step2_4 are the gradient values for diagonal down-left (mode 3) and diagonal down-right (mode 4), respectively.
The sets of candidate modes are selected in step 3 and step 4. In step 3, the eight gradient values from steps 1 and 2 are ordered from the maximum to the minimum,
$$Gm_1 > Gm_2 > Gm_3 > \cdots > Gm_8 \tag{5}$$
and form the set Gm18:
$$Gm_{18} = \{Gm_1, Gm_2, Gm_3, Gm_4, Gm_5, Gm_6, Gm_7, Gm_8\} \tag{6}$$
where Gm1 and Gm8 denote the maximum and the minimum gradient value from steps 1 and 2, respectively. To select the set of candidate modes, the Gm13 set, which consists of Gm1 to Gm3, is chosen. The difference between Gm1 and Gm8 is computed to decide whether DC mode is included in the set of candidate modes: if the difference is smaller than a threshold, DC mode is included. In step 4, the set of candidate modes is determined from Gm13. In step 5, the rate distortion optimization costs are calculated for the candidate modes. Finally, the mode with the minimum cost is selected for intra prediction.
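Steps 2 and 3 can be sketched as follows, under the same row-major layout assumption for the samples a to p. The terms of Eqs. (3) and (4) are transcribed as printed, the thresholds are those of Table I, and all function names are hypothetical. How ties between equal gradient values are ordered is not specified in the text; the sketch simply keeps the order produced by Python's stable sort.

```python
def step2_gradients(block):
    """Gradient values of Eqs. (3) and (4) from the external samples."""
    # Center samples f, g, j, k are unpacked only to name the layout.
    (a, b, c, d), (e, f, g, h), (i, j, k, l), (m, n, o, p) = block
    return {
        0: (abs(c - h) + abs(i - l)) / 2,
        1: (abs(b - n) + abs(c - o)) / 2,
        3: (abs(a - p) + abs(b - l) + abs(c - o)) / 3,
        4: (abs(c - i) + abs(d - m) + abs(h - n)) / 3,
    }


def rank_gradients(step1, step2):
    """Return the eight (value, mode) pairs ordered from Gm1 down to Gm8."""
    pairs = [(value, mode) for grads in (step1, step2) for mode, value in grads.items()]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


def dc_threshold(qp):
    """Threshold of Table I for the given quantization parameter."""
    if qp < 20:
        return 80
    if qp < 35:
        return 160
    return 255


def include_dc(ranked, qp):
    """DC mode joins the candidates when Gm1 - Gm8 is below the threshold."""
    return ranked[0][0] - ranked[-1][0] < dc_threshold(qp)
```

A completely flat block yields eight zero gradients, so the difference is zero and DC mode is always included, which agrees with the intuition that DC prediction suits blocks without a dominant edge.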
Table I lists the threshold against which the difference between Gm1 and Gm8 is compared when deciding whether DC mode joins the set of candidate modes, for each range of the quantization parameter (Qp).
TABLE I THRESHOLD VALUES TO THE RANGE OF PARAMETER (Qp)
Range of parameter (Qp)    Threshold
Qp < 20                    80
20 ≤ Qp < 35               160
Qp ≥ 35                    255
III. EXPERIMENTAL RESULTS
The proposed algorithm was implemented in JM9.4 and simulated on 100-frame QCIF and CIF sequences with various QP values. The test conditions are listed in the table of experimental conditions below. The encoding time was measured for both the proposed method and the full search algorithm on the test sequences, and the PSNR and the computational complexity of the proposed algorithm were compared with those of the full search algorithm.
TABLE IV EXPERIMENTAL CONDITIONS FOR PROPOSED ALGORITHM
Test condition
Profile            Baseline
Intra period       1
Transform          Use Hadamard
Symbol mode        UVLC
RD Optimization    Enable
Rate control       Disable
The computational complexity is defined as the average number of prediction mode searches in a frame.[7] In our experiments, we regard the computational load Lcomp as one fourth of one search. The total complexity is therefore defined as
$$\text{Total computational complexity} = (L_{comp} + n_s) \times N_{blk} \tag{7}$$
where Lcomp is the computational load, ns is the average search number of prediction modes per frame, and Nblk is the number of 4x4 blocks in one frame.
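Eq. (7) is simple enough to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, the frame sizes are the standard QCIF (176x144) and CIF (352x288) dimensions, and the value of ns used in the usage note is a made-up number, not a measured result.

```python
def blocks_per_frame(width, height):
    """Number of 4x4 luminance blocks in a frame of the given size."""
    return (width // 4) * (height // 4)


def total_complexity(l_comp, n_s, n_blk):
    """Eq. (7): (Lcomp + ns) * Nblk."""
    return (l_comp + n_s) * n_blk
```

With Lcomp = 0.25, a hypothetical ns = 4 and the 1584 blocks of a QCIF frame, `total_complexity(0.25, 4, blocks_per_frame(176, 144))` evaluates to 6732.0.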
Fig. 7. Intra predicted results of Container (CIF)
Figure 7 shows the intra predicted images of the Container sequence for the proposed method and the full search algorithm. The images predicted by the full search algorithm and by the proposed method have similar visual quality. The proposed method reduces computational complexity by about 58.4% and encoding time by about 51.9% compared with the full search method while maintaining similar PSNR and bit rate. Tables IV to IX show the experimental results for the test sequences.
TABLE II PSEUDO CODE FOR CANDIDATE MODES
If (Gm1 == 0 && Gm2 == 3 || Gm1 == 3 && Gm2 == 0)
    Candidate Modes = {0, 7, 3};
elseif (Gm1 == 0 && Gm2 == 4 || Gm1 == 4 && Gm2 == 0)
    Candidate Modes = {0, 5, 4};
elseif (Gm1 == 1 && Gm2 == 3 || Gm1 == 3 && Gm2 == 1)
    Candidate Modes = {1, 8, 3};
elseif (Gm1 == 1 && Gm2 == 4 || Gm1 == 4 && Gm2 == 1)
    Candidate Modes = {1, 6, 4};
elseif (Gm1 == 3 && Gm2 == 4 || Gm1 == 4 && Gm2 == 3)
    If (Gm3 == 1)
        Candidate Modes = {3, 8, 1, 6, 4};
    Else /* (Gm3 == 0) */
        Candidate Modes = {3, 7, 0, 5, 4};
    End if;
End if;
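The selection rules of Table II and Table III can be sketched as a lookup keyed by the unordered pair of modes (Gm1, Gm2) and, where needed, by Gm3. This is a hedged illustration: the names are hypothetical, the rows for the pair {0, 1} follow one reading of Table III since Table II does not list them, and appending DC as mode 2 when it is included is an assumption based on the H.264 mode numbering.

```python
DC_MODE = 2  # H.264 INTRA_4x4 mode number of DC prediction

# Key: (unordered pair of the two strongest modes, Gm3 or None when Gm3 is unused)
CANDIDATE_SETS = {
    (frozenset({0, 3}), None): [0, 7, 3],
    (frozenset({0, 4}), None): [0, 5, 4],
    (frozenset({1, 3}), None): [1, 8, 3],
    (frozenset({1, 4}), None): [1, 6, 4],
    (frozenset({0, 1}), 3): [0, 1, 7, 8, 3],
    (frozenset({0, 1}), 4): [0, 1, 4, 5, 6],
    (frozenset({3, 4}), 0): [3, 7, 0, 5, 4],
    (frozenset({3, 4}), 1): [3, 8, 1, 6, 4],
}


def candidate_modes(gm1, gm2, gm3, include_dc=False):
    """Return the candidate mode list, or None if the combination is not tabulated."""
    pair = frozenset({gm1, gm2})
    modes = CANDIDATE_SETS.get((pair, None)) or CANDIDATE_SETS.get((pair, gm3))
    if modes is None:
        return None
    modes = list(modes)  # copy so the table itself is never modified
    if include_dc:
        modes.append(DC_MODE)
    return modes
```

Because the pair is stored as a `frozenset`, the two symmetric conditions in each line of the pseudo code (for example Gm1 == 0, Gm2 == 3 and Gm1 == 3, Gm2 == 0) collapse into a single table entry.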
TABLE III THE TABLE FOR CANDIDATE MODE SETS WITHOUT DC MODE
Gm12 Set    Gm3 Set    Set of the candidate modes in step 4
{0, 3}      X          {0, 7, 3}
{0, 4}      X          {0, 5, 4}
{1, 3}      X          {1, 8, 3}
{1, 4}      X          {1, 6, 4}
{0, 1}      {3}        {0, 1, 7, 8, 3}
{0, 1}      {4}        {0, 1, 4, 5, 6}
{3, 4}      {0}        {3, 7, 0, 5, 4}
{3, 4}      {1}        {3, 8, 1, 6, 4}
In Table III, the two sets Gm12 and Gm3 determine the set of candidate modes for intra prediction mode selection. The rate distortion optimization costs are then calculated for these candidate modes in step 5.
A. Experimental results of QCIF test sequences
The proposed method and the full search are simulated on three QCIF sequences: “Akiyo”, “Coastguard” and “Foreman”. Note that positive values mean increases, and negative values mean decreases in these tables.
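The "Improvement Rate (%)" entries for complexity, encoding time and bits appear to be the relative reduction with respect to the full search (FS). This reading is an assumption, but it matches the Akiyo (QCIF) figures at QP 8; the helper below, with a hypothetical name, reproduces them.

```python
def improvement_rate(full_search, proposed):
    """Relative reduction of a measurement with respect to full search, in percent."""
    return round((full_search - proposed) / full_search * 100, 2)
```

For the Akiyo (QCIF) sequence at QP 8, the complexity values 13815 (FS) and 6548 (proposed) give 52.6, the encoding times 124.456 and 67.667 give 45.63, and the bit counts 10534728 and 10217633 give 3.01. The PSNR entries in the same row are plain differences in dB rather than percentages.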
TABLE IV EXPERIMENTAL RESULTS OF AKIYO(QCIF)
QP   Method                 Complexity   Encoding Time(Second)   Bits        PSNR(dB)
8    FS                     13815        124.456                 10534728    57.76
     DBGM                   6548         67.667                  10217633    57.53
     Improvement Rate(%)    52.6         45.63                   3.01        0.23
20   FS                     13815        209.579                 4648864     48.14
     DBGM                   6880         114.074                 4513582     48.06
     Improvement Rate(%)    50.2         45.57                   2.91        0.08
35   FS                     13815        172.186                 1234184     36.35
     DBGM                   5236         81.909                  1218140     36.29
     Improvement Rate(%)    62.1         52.43                   1.3         0.06
45   FS                     13815        144.935                 401816      28.27
     DBGM                   4918         65.453                  398320      28.27
     Improvement Rate(%)    64.4         54.84                   0.87        0
TABLE V EXPERIMENTAL RESULTS OF COASTGUARD(QCIF)
QP   Method                 Complexity   Encoding Time(Second)   Bits        PSNR(dB)
8    FS                     13815        281.008                 11408247    57.93
     DBGM                   6189         150.620                 11151561    57.84
     Improvement Rate(%)    55.2         46.4                    2.25        0.09
20   FS                     13815        220.117                 4956248     47.62
     DBGM                   5982         117.542                 4844237     47.57
     Improvement Rate(%)    56.7         46.6                    2.26        0.05
35   FS                     13815        137.705                 1131282     35.49
     DBGM                   6037         64.171                  1111937     35.47
     Improvement Rate(%)    56.3         53.4                    1.71        0.02
45   FS                     13815        113.356                 255287      27.43
     DBGM                   5885         44.980                  252581      27.43
     Improvement Rate(%)    57.4         60.32                   1.06        0
TABLE VI EXPERIMENTAL RESULTS OF FOREMAN(QCIF)
QP   Method                 Complexity   Encoding Time(Second)   Bits        PSNR(dB)
8    FS                     13815        292.304                 13442081    57.71
     DBGM                   6590         156.675                 13104685    57.58
     Improvement Rate(%)    52.3         46.4                    2.51        0.13
20   FS                     13815        233.084                 6424586     47.63
     DBGM                   6161         124.467                 6295452     47.61
     Improvement Rate(%)    55.4         46.6                    2.01        0.02
35   FS                     13815        149.028                 1647295     34.33
     DBGM                   5719         71.116                  1591287     34.32
     Improvement Rate(%)    58.6         52.28                   3.4         0.01
45   FS                     13815        116.887                 491527      26.04
     DBGM                   4794         52.786                  476290      26.04
     Improvement Rate(%)    65.3         54.84                   3.1         0
B. Experimental results of CIF test sequences
Tables VII-IX show the experimental results for three CIF test sequences, “News”, “Hall” and “Container”.
TABLE VII EXPERIMENTAL RESULTS OF AKIYO(CIF)
QP   Method                 Complexity   Encoding Time(Second)   Bits        PSNR(dB)
8    FS                     56139        1447.787                47356407    58
     DBGM                   23915        668.733                 45196955    57.89
     Improvement Rate(%)    57.4         53.81                   4.56        0.11
20   FS                     56139        935.754                 19311686    47.23
     DBGM                   23129        465.818                 18823100    47.2
     Improvement Rate(%)    58.8         50.22                   2.53        0.03
35   FS                     56139        654.502                 4986803     36.06
     DBGM                   21838        341.912                 4854653     36.04
     Improvement Rate(%)    61.1         47.76                   2.65        0.02
45   FS                     56139        515.833                 1538967     27.29
     DBGM                   22444        252.242                 1512651     27.29
     Improvement Rate(%)    60.02        51.1                    1.71        0
TABLE VIII EXPERIMENTAL RESULTS OF HALL(CIF)
QP   Method                 Complexity   Encoding Time(Second)   Bits        PSNR(dB)
8    FS                     56139        1317.208                50668809    58.1
     DBGM                   26891        687.187                 49549028    58.02
     Improvement Rate(%)    52.1         47.83                   2.21        0.08
20   FS                     56139        1005.244                20945602    46.68
     DBGM                   25656        489.554                 20746619    46.65
     Improvement Rate(%)    54.3         51.3                    0.95        0.03
35   FS                     56139        603.163                 4038964     36.36
     DBGM                   20131        338.917                 3936374     36.35
     Improvement Rate(%)    64.14        43.81                   2.54        0.01
45   FS                     56139        509.142                 1308409     27.44
     DBGM                   19256        241.231                 1278446     27.44
     Improvement Rate(%)    65.7         52.62                   2.29        0
TABLE IX EXPERIMENTAL RESULTS OF CONTAINER(CIF)
QP   Method                 Complexity   Encoding Time(Second)   Bits        PSNR(dB)
8    FS                     56139        1296.958                53328261    58.33
     DBGM                   26891        671.305                 51883065    58.29
     Improvement Rate(%)    52.1         48.24                   2.71        0.04
20   FS                     56139        1071.043                25093589    46.76
     DBGM                   24813        534.665                 23921718    46.67
     Improvement Rate(%)    55.8         50.08                   4.67        0.09
35   FS                     56139        621.196                 5694884     37.47
     DBGM                   22624        355.262                 5583264     36.4
     Improvement Rate(%)    59.7         42.81                   1.96        1.07
45   FS                     56139        529.064                 1487761     26.54
     DBGM                   21221        202.949                 1457411     26.54
     Improvement Rate(%)    62.2         61.64                   2.04        0
IV. CONCLUSION
This paper presented a fast mode decision algorithm for INTRA_4x4 prediction in H.264 video coding. To decrease the computational complexity of the full search algorithm, the proposed gradient-based method uses a candidate mode set that consists of a minimum of 3 and a maximum of 5 modes for 4x4 block intra prediction. The experimental results confirm that the proposed method reduces the complexity more than prior FIPMS (Fast Intra Prediction Mode Selection) methods.[2][3] If the proposed algorithm is combined with a fast inter mode decision algorithm, the computational complexity of video coding in H.264/AVC will decrease remarkably.[8] It is also possible to apply the proposed method to practical applications such as real-time video communication.
V. REFERENCES
[1] Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Transactions on Circuits and Systems for Video Technology, July 2003.
[2] Feng Pan, Xiao Lin, Rahardja Susanto, Keng Pang Lim, Zheng Guo Li, Ge Nan Feng, Da Jun Wu, and Si Wu, “Fast Mode Decision for Intra Prediction,” JVT-G013, 7th Meeting: Pattaya II, Thailand, 7-14 March 2003.
[3] ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC, “Joint Final Committee Draft (JFCD) of Joint Video Specification,” Klagenfurt, Austria, July 22-26, 2002.
[4] Gary J. Sullivan and Thomas Wiegand, “Rate-Distortion Optimization for Video Compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74-90, Nov. 1998.
[5] Jörn Ostermann, Jan Bormans, Peter List, Detlev Marpe, Matthias Narroschke, Fernando Pereira, Thomas Stockhammer, and Thomas Wedi, “Video coding with H.264/AVC: Tools, Performance, and Complexity,” IEEE Circuits and Systems Magazine, pp. 7-28, 2004.
[6] JVT Reference Software 9.4, http://bs.hhi.de/~suehring/tml download/jm94.zip
[7] Zhang Young-dong, Dai Feng, and Lin Shou-xun, “Fast 4x4 Intra-Prediction Mode Selection for H.264,” IEEE International Conference on Multimedia and Expo (ICME), 2004.
[8] K. P. Lim, S. Wu, D. J. Wu, S. Rahardja, X. Lin, F. Pan, and Z. G. Li, “Fast INTER Mode Selection,” Doc. 1020, Sep. 2003.