IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.3 MARCH 2005
LETTER
Fast Macroblock Mode Determination to Reduce H.264 Complexity
Ki-Hun HAN†, Nonmember and Yung-Lyul LEE†a), Member
SUMMARY The rate-distortion optimization (RDO) method is an informative technology that improves the coding efficiency, but increases the computational complexity, of the H.264 encoder. In this letter, a fast macroblock mode determination algorithm is proposed to reduce the computational complexity of the H.264 encoder. The proposed method reduces the encoder complexity by 55%, while maintaining the same level of coding efficiency.
key words: H.264, rate-distortion optimization (RDO), variable block size motion estimation (ME), discrete cosine transform (DCT)
1. Introduction
H.264 [1], [2], a joint standard of the ITU-T and the ISO/IEC (MPEG-4 AVC, Advanced Video Coding), offers better coding performance than the previous coding standards. Compared with previous coding standards such as H.263 [3], MPEG-2 [4] and MPEG-4 [5], H.264 has certain distinct features such as variable block size ME (Motion Estimation), as shown in Fig. 1, quarter-pixel interpolation, multiple reference frames, and so on. The H.264 encoder performs ME on blocks of variable size, such as 16 × 16, 16 × 8, 8 × 16 and P8 × 8, for each 16 × 16 MB (Macroblock). In the case of the P8 × 8 mode, the MB is divided into four 8 × 8 blocks, and each of these 8 × 8 blocks can be encoded as 8 × 8, 8 × 4, 4 × 8 or 4 × 4 blocks. Since an inappropriate selection of the motion vector, the reference frame and the MB type degrades the coding efficiency, the MB mode decision in the H.264 encoder uses the informative RDO technology [6], [7]. The RDO mechanism is used to select the best MB mode among all of the block modes. The informative RDO technology provides a bit-rate reduction of up to approximately 10% and a PSNR (Peak Signal to Noise Ratio) improvement of up to 0.35 dB, but it increases the computational complexity of the H.264 encoder. In this letter, a fast MB mode determination algorithm is proposed to reduce the complexity of the H.264 encoder.

2. Proposed Fast MB Mode Decision
Fig. 1 The variable blocks used for motion estimation in the H.264 encoder.

Manuscript received September 10, 2004.
Manuscript revised October 29, 2004.
Final manuscript received November 29, 2004.
† The authors are with the Department of Internet Engineering, Sejong University, 98 Kunja-Dong, Kwangjin-Gu, Seoul 143-747, Korea.
a) E-mail: [email protected]
DOI: 10.1093/ietfec/e88–a.3.800

The H.264 encoder uses the informative RDO technology to select the best MB mode among all possible MB modes. The H.264 encoder has seven possible MB modes that it can use: SKIP, INTER16 × 16, INTER16 × 8, INTER8 × 16, P8 × 8, INTRA16 × 16 and INTRA4 × 4. The SKIP mode only includes the (0,0) motion vector or the PMV (median Predicted Motion Vector). To choose the best MB mode that has the minimum rate-distortion value, RDcost, the RDcost values are computed for all MB modes, using the number of encoded bits and the distortion, as follows:

$$
\begin{aligned}
RDcost &= Distortion + \lambda_{Mode} \times Rates \\
\text{where}\quad Distortion &= \sum_{k=0}^{15} \sum_{l=0}^{15} \bigl(B(k,l) - B'(k,l)\bigr)^{2} \\
\lambda_{Mode} &= 0.85 \times 2^{(QP-12)/3}
\end{aligned}
\tag{1}
$$
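The cost in Eq. (1) can be illustrated with a minimal Python sketch. The function names (`lagrangian_multiplier`, `distortion`, `rd_cost`, `best_mode`) are hypothetical and are not taken from the JM codec. The distortion follows Eq. (1) as printed, i.e., an unnormalized sum of squared differences over the 16 × 16 MB, and the number of bits is assumed to be supplied by the caller because entropy coding is outside the scope of the sketch.

```python
def lagrangian_multiplier(qp):
    """lambda_Mode = 0.85 * 2^((QP - 12) / 3), the last line of Eq. (1)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is expected to lie in the range 0..51")
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def distortion(original, reconstructed):
    """Sum over the MB of (B(k, l) - B'(k, l))^2, as printed in Eq. (1).

    Both arguments are 16x16 lists of lists of pixel values.
    """
    return sum(
        (o - r) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for o, r in zip(row_o, row_r)
    )


def rd_cost(original, reconstructed, rate_bits, qp):
    """RDcost = Distortion + lambda_Mode * Rates."""
    return distortion(original, reconstructed) + lagrangian_multiplier(qp) * rate_bits


def best_mode(rd_costs):
    """Return the mode name with the minimum RDcost (rd_costs: name -> RDcost)."""
    return min(rd_costs, key=rd_costs.get)


if __name__ == "__main__":
    # Toy MB: every reconstructed pixel differs from the original by 2.
    original = [[10] * 16 for _ in range(16)]
    reconstructed = [[12] * 16 for _ in range(16)]
    print(rd_cost(original, reconstructed, rate_bits=100, qp=12))
```

Because the exponent is (QP − 12)/3, the multiplier doubles every time QP increases by 3, so the rate term weighs more heavily at coarse quantization.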
In Eq. (1), Distortion represents the mean square error (MSE) between the original pixel values, B(k, l), of the current MB and the reconstructed pixel values, B′(k, l), of the reconstructed MB, while λ_Mode, the Lagrangian parameter [8], is a function of the H.264 quantization parameter (QP), whose value ranges from 0 to 51. Rates represents the total number of bits associated with the encoding of the motion vector, reference frames, residual signals, and so on. In order to compute the value of RDcost, (4 × 4 DCT+Q), (4 × 4 DCT+Q)⁻¹ and the entropy coding need to be calculated several times for each MB mode. Therefore, the computation of the RDcost value for the seven MB modes induces a huge computational load.

Copyright © 2005 The Institute of Electronics, Information and Communication Engineers

2.1 Algorithm for P8 × 8 Mode Skipping

When the P8 × 8 mode is encoded, the RDO operation is performed independently for each of the four 8 × 8 blocks in the MB, and one mode is selected for each 8 × 8 block among the INTER8 × 8, INTER8 × 4, INTER4 × 8 and INTER4 × 4 block modes. Compared per MB, the RDcost computation for the P8 × 8 mode requires four times as many (4 × 4 DCT+Q) and (4 × 4 DCT+Q)⁻¹ operations as that for the INTER16 × 16 mode. In order to demonstrate the influence of the P8 × 8 mode in the H.264 encoder, Table 1 shows the ratio of the number of MBs encoded in the P8 × 8 mode to the number of MBs encoded in all seven MB modes in the P (Predictive) frames. Table 1 also shows the increase in bitrate, ∆Bitrates, and the decrease in PSNR, ∆PSNR, when the P8 × 8 mode is not used. ∆Bitrates and ∆PSNR are calculated as follows:

$$
\Delta Bitrates = \frac{A - B}{A} \times 100\ (\%) \tag{2}
$$

$$
\Delta PSNR = PSNR\ \text{of}\ A - PSNR\ \text{of}\ B \tag{3}
$$

where A represents the bitrates obtained using all seven MB modes and B is the bitrates obtained using the six modes without the P8 × 8 mode. Experiments were performed using common test sequences on the H.264 JM (Joint Model) 80 codec [9] with QP values of 28, 32, 36 and 40. The experimental results in Table 1 show the average performance over these QP values. As shown in Table 1, the P8 × 8 MB mode accounts for 10.22% of the MBs on average. When the P8 × 8 mode is not used, ∆Bitrates is about −2.70%, i.e., the bitrate increases by about 2.70%, and the PSNR decreases by about 0.1 dB. Therefore, a fast MB mode decision algorithm that maintains the coding efficiency is proposed in this letter.

Table 1 Ratio of the P8 × 8 mode in the P-frames, and the bitrate increases and PSNR decreases when the P8 × 8 mode is not used.

Table 2 Ratio of INTRA MBs in the P-frames, and the bitrate increases and PSNR decreases when the INTRA modes are not used.

From the viewpoint of the RDcost calculation in Eq. (1), when an MB is divided into smaller blocks, the distortion usually decreases, because smaller blocks can express detailed motion, but the bitrate increases due to the larger number of bits spent on the block modes and the motion vectors. When an MB is divided into smaller blocks for ME, if the resulting decrease in distortion is greater than the increase in the bitrate, the MB should be further divided into smaller blocks. However, if the increase in the bitrate is greater than the decrease in distortion, it is preferable to continue using bigger blocks. To decide at an early stage whether or not to skip the RDcost calculation for the P8 × 8 mode, ME is performed and the RDcost value is computed for the INTER16 × 16, INTER16 × 8 and INTER8 × 16 block modes. If the RDcost value of the INTER16 × 16 mode is the smallest of the three modes, it is preferable to select the INTER16 × 16 MB mode. Since the P8 × 8 mode uses smaller blocks than the INTER16 × 8 and INTER8 × 16 modes, the RDcost value of the INTER16 × 16 mode is then expected to be smaller than that of the P8 × 8 mode as well. In this case, we can skip the ME and the RDcost computation in Eq. (1) for the P8 × 8 mode.
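The P8 × 8 skipping rule of this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `evaluate_p8x8` stands for whatever routine performs ME and the RDcost computation for the P8 × 8 mode, and counting a tie as "smallest" is a choice of the sketch rather than something the letter specifies.

```python
def should_skip_p8x8(rd_16x16, rd_16x8, rd_8x16):
    """True when INTER16x16 has the smallest RDcost of the three large modes.

    Treating a tie as "smallest" is an assumption of this sketch.
    """
    return rd_16x16 <= min(rd_16x8, rd_8x16)


def best_inter_mode(rd_large, evaluate_p8x8):
    """Pick the best INTER mode, calling evaluate_p8x8() only when needed.

    rd_large: dict with the RDcosts of 'INTER16x16', 'INTER16x8', 'INTER8x16'.
    evaluate_p8x8: hypothetical callback that runs ME and the RDcost
    computation for the P8x8 mode and returns its RDcost.
    """
    candidates = dict(rd_large)
    if not should_skip_p8x8(
        rd_large["INTER16x16"], rd_large["INTER16x8"], rd_large["INTER8x16"]
    ):
        # Only in this branch is the expensive P8x8 evaluation paid for.
        candidates["P8x8"] = evaluate_p8x8()
    mode = min(candidates, key=candidates.get)
    return mode, candidates[mode]
```

Passing the P8 × 8 evaluation as a callback makes the saving explicit: when the INTER16 × 16 mode already wins among the large block modes, the callback is never invoked.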
2.2 Algorithm for Spatial-Predictive Coding Skipping

To obtain high coding efficiency, the H.264 encoder evaluates INTRA coding for each MB. Even if the current MB belongs to a P frame, the H.264 encoder examines the INTRA4 × 4 and INTRA16 × 16 modes. However, the spatial-predictive coding mechanism induces a huge computational load. Table 2 shows the ratio of the number of MBs encoded in the INTRA modes in the P frames, together with the bitrate increases, ∆Bitrates, and the PSNR decreases, ∆PSNR, for various sequences when the H.264 encoder does not perform the two INTRA modes. Although the ratio of the two INTRA modes in Table 2 is small, the INTRA modes affect the coding efficiency of the H.264 encoder. Therefore, we propose a method which permits the spatial-predictive coding to be skipped with negligible loss of coding efficiency. The proposed method allows the INTRA4 × 4 and INTRA16 × 16 modes to be skipped separately.

For the INTRA16 × 16 mode decision in the H.264 standard, the vertical prediction predicts along the vertical (column) direction from the sixteen pixels above the current MB, and the horizontal prediction predicts along the horizontal (row) direction from the sixteen pixels to the left of the current MB. In order to develop the fast INTRA16 × 16 mode decision, the first-column pixels and the first-row pixels of the current MB are investigated for the vertical prediction and the horizontal prediction, respectively.

First, to decide whether or not to skip the INTRA16 × 16 mode calculation, the spatial prediction error (SPE) is defined as the minimum of the vertical and horizontal spatial prediction errors (VSPE, HSPE) in Eq. (4), and the temporal prediction error (TPE) is defined in Eq. (5), as follows:
$$
\begin{aligned}
VSPE &= \frac{1}{16} \sum_{i=0}^{15} \bigl| Pel(x, y-1) - Pel(x, y+i) \bigr| \\
HSPE &= \frac{1}{16} \sum_{i=0}^{15} \bigl| Pel(x-1, y) - Pel(x+i, y) \bigr| \\
SPE &= \min(HSPE, VSPE)
\end{aligned}
\tag{4}
$$

$$
TPE = \frac{\text{BestInterMode'sBlockCost}}{\text{TheNumberOfPixelsInMB}} \tag{5}
$$

where the VSPE is the normalized sum of absolute differences (SAD) between the reconstructed pixel, Pel(x, y − 1), and the current pixels in the vertical direction, Pel(x, y + i), i = 0, 1, ..., 15, inside the current MB, as shown in Fig. 2(a), and the HSPE is defined in a similar manner, as shown in Fig. 2(b). The best INTER mode's BlockCost used to calculate the TPE is the sum of the SAD between the current MB and the predicted MB for the best INTER MB mode and the motion vector bits. When the SPE is greater than or equal to 20.0, which is an experimentally obtained value, and the TPE is less than the SPE, the INTRA16 × 16 mode calculation is skipped. Otherwise, the INTRA16 × 16 mode is calculated and its rate-distortion values are compared with those of the INTER modes.

Fig. 2 Vertical and horizontal spatial prediction errors.

Second, to decide whether or not to skip the INTRA4 × 4 mode calculation, the INTRA_RD value is defined as the average RDcost of all MBs encoded in the INTRA4 × 4 mode in all frames. It is updated whenever an MB is encoded in the INTRA4 × 4 mode, according to Eq. (6):

$$
INTRA\_RD = \frac{CurRD + n \times INTRA\_RD}{n + 1} \tag{6}
$$

where CurRD is the RDcost value when the INTRA4 × 4 mode is selected for the current MB and n denotes the number of MBs encoded so far in the INTRA4 × 4 mode. When the RDcost of the best INTER mode is less than INTRA_RD, the INTRA4 × 4 mode calculation is omitted. Otherwise, the INTRA4 × 4 mode calculation is performed, and when the INTRA4 × 4 mode is selected as the best block mode, INTRA_RD is updated as shown in Eq. (6).
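The two INTRA skipping rules of this subsection, Eqs. (4) to (6), can be sketched in Python as below. The sketch rests on several assumptions that are not stated in the letter: frames are indexed as `frame[y][x]`, `recon` holds reconstructed pixels and `cur` holds current pixels, (x, y) is the top-left pixel of an MB that has both an upper and a left neighbor, and INTRA_RD starts at 0 with n = 0 so that the INTRA4 × 4 mode is never skipped before any INTRA4 × 4 MB has been observed. All names are hypothetical.

```python
MB_SIZE = 16
SPE_THRESHOLD = 20.0  # experimentally obtained value quoted in the text


def vspe(recon, cur, x, y):
    """Eq. (4): mean |Pel(x, y-1) - Pel(x, y+i)| over the first MB column."""
    ref = recon[y - 1][x]
    return sum(abs(ref - cur[y + i][x]) for i in range(MB_SIZE)) / MB_SIZE


def hspe(recon, cur, x, y):
    """Eq. (4): mean |Pel(x-1, y) - Pel(x+i, y)| over the first MB row."""
    ref = recon[y][x - 1]
    return sum(abs(ref - cur[y][x + i]) for i in range(MB_SIZE)) / MB_SIZE


def spe(recon, cur, x, y):
    """Eq. (4): SPE = min(HSPE, VSPE)."""
    if x < 1 or y < 1:
        # Border MBs lack a neighbor; how to handle them is left open here.
        raise ValueError("the MB needs both an upper and a left neighbor")
    return min(hspe(recon, cur, x, y), vspe(recon, cur, x, y))


def tpe(best_inter_block_cost):
    """Eq. (5): best INTER mode's BlockCost divided by the pixels in the MB."""
    return best_inter_block_cost / (MB_SIZE * MB_SIZE)


def skip_intra16x16(spe_value, tpe_value):
    """Skip INTRA16x16 when SPE >= 20.0 and TPE < SPE."""
    return spe_value >= SPE_THRESHOLD and tpe_value < spe_value


class IntraRdAverage:
    """Running average of the RDcost of INTRA4x4 MBs, updated as in Eq. (6)."""

    def __init__(self):
        self.n = 0           # number of INTRA4x4 MBs seen so far
        self.intra_rd = 0.0  # assumed initial value; never triggers a skip

    def should_skip_intra4x4(self, best_inter_rd):
        """Skip INTRA4x4 when the best INTER RDcost is below INTRA_RD."""
        return best_inter_rd < self.intra_rd

    def update(self, cur_rd):
        """Apply Eq. (6) after an MB has been encoded in the INTRA4x4 mode."""
        self.intra_rd = (cur_rd + self.n * self.intra_rd) / (self.n + 1)
        self.n += 1
```

Following Eq. (4) as printed, each error compares one reference pixel with sixteen pixels of the current MB, so the check costs 32 absolute differences per MB, which is small next to a full INTRA16 × 16 evaluation.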
2.3 Flow Chart of the Proposed Fast Mode Determination Algorithm

Figure 3 shows the flow chart of the proposed fast mode determination algorithm. From the viewpoint of the RDO, the INTRA coding can be skipped if the INTER coding is performed in advance. First of all, the INTER coding of the INTER16 × 16, INTER16 × 8 and INTER8 × 16 block modes is performed, and then the INTRA coding is performed, as shown in Fig. 3. When the RDcost of the INTER16 × 16 mode is the minimum among the three modes, the P8 × 8 mode is skipped. For the determination of the INTRA16 × 16 and INTRA4 × 4 modes, the spatial prediction error (SPE) is compared to the temporal prediction error (TPE), as shown in the flow chart. If the SPE is greater than or equal to 20.0 and the TPE is less than the SPE, we skip the INTRA16 × 16 mode. Otherwise, the INTRA16 × 16 mode computation is performed. If the RDcost of the best INTER mode is less than INTRA_RD, the average RDcost of the INTRA4 × 4 mode computed so far, the INTRA4 × 4 mode is skipped. Otherwise, the INTRA4 × 4 mode is computed. Finally, the block mode is selected. When the INTRA4 × 4 mode is selected as the best block mode, INTRA_RD is updated as shown in Eq. (6).

Fig. 3 Flow chart of the proposed fast mode determination algorithm.

3. Experimental Results
The experiments were performed on a Pentium IV 1.8 GHz computer, in order to demonstrate the effect of the proposed MB mode skipping method. The proposed MB mode determination algorithm is applied to the JM (Joint Model) 80 encoder [9] with the Exp-Golomb code, variable block-size ME/MC with 16 × 16, 16 × 8, 8 × 16 and P8 × 8 blocks, a motion search range of −16 to +16, quarter-pixel interpolation, and the 4 × 4 DCT, with QP values of 28, 32, 36 and 40. Several image sequences, as shown in Table 3, each of which has 300 frames, were used for this experiment. Each sequence is compressed with the I, P, P, P, P, ... scheme, i.e., only the first frame is an INTRA frame and all the others are INTER frames, with no B frames. Under the above test conditions, the efficiency of the P8 × 8 mode was shown in Table 1 of Section 2.1. In Table 4, ∆I16(%), ∆I4(%) and ∆P8(%) indicate the computation time saved for the INTRA16 × 16, INTRA4 × 4 and P8 × 8 modes, respectively, as compared to the JM80 codec. ∆Bitrates and ∆PSNR were calculated using Eqs. (2) and (3), respectively. ETime(%) denotes the total reduction in computation time in the encoding of the various test sequences. Table 4 shows that, when the proposed method for MB mode determination is used, the computational load is decreased on average by 26.59% and 86.03% for the INTRA16 × 16 and INTRA4 × 4 modes, respectively, while that for the P8 × 8 mode is decreased by an average of 76.99%. Overall, the total encoding time of the JM80 encoder is decreased by an average of 55.05%. Nevertheless, the average losses in bitrate and PSNR are only −1.09% and 0.04 dB, respectively, over the H.264 test sequences.

Table 3 Common experimental conditions recommended by the H.264 standard group.

4. Conclusion
The proposed MB mode determination method can significantly reduce the computational complexity by skipping the P8 × 8, INTRA16 × 16 and INTRA4 × 4 modes in the H.264 encoder. When the proposed method is used in conjunction with the non-normative informative RDO technology in the H.264 encoder, the total encoding time can be reduced by 55.05% on average, while the bitrate and PSNR are maintained at almost the same level (−1.09% and 0.04 dB on average, respectively). In conclusion, the proposed method can be usefully applied to the H.264 encoder, together with various code optimization techniques, in order to implement a high-speed encoder.
Table 4 Experimental results.
Acknowledgments

This research was partly supported by the Ubiquitous Autonomic Computing and Network Project, the Ministry of Science and Technology (MOST) 21st Century Frontier R&D Program in Korea.

References

[1] T. Wiegand, Joint Final Committee Draft (JFCD) of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), JVT-G050, March 2003.
[2] T. Wiegand, G.J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol.13, no.7, pp.560–576, July 2003.
[3] ITU Telecom Standardization Sector, “Video codec test model near-term, version 10 (TMN10) draft 1,” H.263 Ad Hoc Group, April 1998.
[4] “Information technology - Generic coding of moving pictures and associated audio information,” ISO/IEC JTC IS 13818-2 (MPEG-2), 1999.
[5] “Information technology - Coding of audio-visual objects, Part 2: Visual amendment 1: Visual extensions,” ISO/IEC JTC1/SC29/WG11 N3056, Dec. 1999.
[6] A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression,” IEEE Signal Process. Magazine, pp.23–50, Nov. 1998.
[7] G.J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Process. Magazine, Nov. 1998.
[8] K. Takagi, “Lagrange multiplier and RD-characteristics,” JVT-C084, May 2002.
[9] http://bs.hhi.de/~suehring/tml/download/old_jm/jm80.zip