The 2004 International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC2004) Hotel Taikanso, Sendai/Matsushima, Miyagi-Pref., JAPAN July 6-8, 2004
Complexity Reduction in Block Mode Determination for H.264 Peng Liu1, Tomonori Aikawa1, Satoshi Miyaji2, Liang Zhao1 and Hideo Yamamoto1 1 Faculty of Engineering, Utsunomiya University, 7-1-2 Yoto, Utsunomiya-shi, Tochigi, 321-8585, Japan E-mail:
[email protected] 2 KDDI R&D Laboratories, 2-1-15 Ohara, Kamifukuoka-shi, Saitama, 356-8502, Japan E-mail:
[email protected] Abstract: This paper considers the “variable block size” feature in H.264. Based on experimental observations, we give a simple algorithm to reduce the coding complexity. To get the coding efficiency and reduce the complexity at the same time, it is applied only to blocks that have small estimated cost value. We show that the proposed algorithm can reduce the coding time by up to 40% with the same picture quality and only 10% increase of the bitrate.
1. Introduction H.264, also known as MPEG-4 part 10 (Advanced Video Coding), is a new video codec standard released in 2003 by the Joint Video Team (JVT) of ITU-T and ISO. To obtain a high coding efficiency, a number of new features are adopted in H.264 [1]. Among all of them, variable block size motion compensation (MC) is considered one of the most effective one, which can significantly increase the efficiency than fixed block size adopted in former codecs. On the other hand, however, variable block size MC is also time expensive. Actually, in our test with the reference software JM 7.3 [2], it averagely takes 10 seconds to code one CIF-size frame (or 25 days for a two hours video) with a PC equipped with a 2.4GHz Pentium 4 processor and 256MB memory. This is because that the JM tries to find the best mode from all alternatives by exhausted computations. In this paper, we consider the “variable block size” feature. Based on experimental observations, we show an algorithm to reduce the coding complexity in block mode determination. Our simple scheme is to apply the same mode as used by a reference block. To get the coding efficiency and reduce the complexity at the same time, we employ a threshold T and apply the scheme only to blocks that have costs (with respect to a pre-defined function) T or less. We show that it can reduce the coding time by up to 40% with the same picture quality and only 10% increase of the bitrate. The rest of this paper is organized as follows. In Section 2, we discuss the block mode and its relationship with the ratedistortion (R-D) cost function used in JM 7.3. In Section 3, we present the proposed algorithm and show experiment results in Section 4. Finally we conclude in Section 5.
divide the macroblock equally into sixteen 4x4 blocks and code them individually by intra prediction. Similarly, in inter mode, there is a so-called copy mode, which is to copy the whole macroblock from the reference frame without MC. On the other hand, sub-(intra)modes with MC in H.264 can be performed by using various block sizes and shapes, which are illustrated in Figure1. Hence there are a total of 262 modes for coding a 16x16 macroblock: 2 intra modes, 1 copy mode, 3 inter modes with size larger than 8x8, and 256 inter modes with size 8x8 or smaller (since each 8x8 block can be coded by one of the four modes). They are called the block modes (also called as macroblock partition and subpartition in [3]).
16x16
16x8
8x16
8x8
8x4 4x8 4x4 Figure 1. Variable block size in H.264. Hence a smaller block size can result a better-matched block, so it can reduce the prediction residual. On the other hand, it also increases the number of motion vectors (MVs). To get the best coding efficiency, the H.264 reference software JM tries to find a best block mode from all modes by calculating their R-D costs. In H.264, bitrate is denoted by rate (R), while picture quality is evaluated by distortion (D) [4]. Thus the smaller R and D are the better. A function (called the rate-distortion function, or R-D function simply) showing the relation between R and D is used in JM, and the R-D optimization is performed to find a best mode. In coding a P frame, the R-D cost is first calculated to determinate the reference frame, then is calculated to choose the best block mode. For choosing the best block mode, the cost is defined as D + λR , where λ is defined by (1) λ = 0.85 × 2 QP / 3 , where QP is the quantization parameter.
3. The Proposed Algorithm
2. Block Mode and R-D Cost When considering a P-frame, there are two modes, intra and inter, to code a 16x16 macroblock in H.264 basically. In intra mode, there are two sub-modes: I16x16 that is to code the whole macroblock in intra mode, and I4x4 that is to
From experiment results, we observe that, usually there is a lot of blocks use the same modes as in the previous frame (see Section 4). This is the basic idea of proposed algorithm.
7F3P-2-1
3.1. Algorithm in a glance The flow of our algorithm is shown in Figure2. Speaking simply, it is to apply the same mode (called the reference mode in this paper) used in the block (called the reference block) that is at the same position in the previous frame, if the estimated cost (will be explained in the next subsection) is less than a pre-defined threshold. Calculate the “cost” in reference mode.
Smaller than T
No
Compute cost for all modes
Save Mc Rc Dc
For the frame Pn (n = 2, 3, …), we first calculate the modified cost e by (2) in reference mode. Next, we compare e and the threshold T. If e is smaller than T, the reference mode will be applied. See the details in Figure 4. On the other hand, of course, intra prediction and the MV are computed newly except the copy mode. Encode block in reference mode Mr, and compute Rc, Dc
Find the best mode by comparing R-D costs for all block modes
Calculate e by (2).
Yes Encode the block in reference mode.
Encode in by mode of the minimum cost Figure 3. Processing P1
Encode the block in the mode above
No
T