a novel approach to fast multi-block selection for h.264 video coding

2 downloads 1459 Views 40KB Size Report
Email: [email protected], [email protected], [email protected]. ABSTRACT. The upcoming video coding standard, H.264, uses motion estimation with multiple block sizes ...


➡ A NOVEL APPROACH TO FAST MULTI-BLOCK MOTION ESTIMATION FOR H.264 VIDEO CODING Andy Chang, Oscar C. Au, Y. M. Yeung Department of Electrical and Electronic Engineering The Hong Kong University of Science and Technology Email: [email protected], [email protected], [email protected]

ABSTRACT The upcoming video coding standard, H.264, uses motion estimation with multiple block sizes to improve the rate-distortion performance. However, full exhaustive search of all block sizes is computational intensive with complexity increasing linearly with the number of allowed block size. In this paper, a novel fast multi-block motion estimation (FMBME) is proposed for H.264 video coding. Experimental results show that the proposed FMBME can efficiently reduce the computational cost by 40.73% with similar visual quality and bit rate.

1. INTRODUCTION The H.264 is an upcoming international video coding standard with superior objective and subjective image quality. The basic encoding algorithm of H.264 [1] is similar to H.263 or MPEG except that integer 4x4 discrete cosine transform (DCT) is used instead of the traditional 8x8 DCT. Additional features include intra prediction mode for I-frames, multiple block sizes for motion estimation, multiple reference picture selection for higher coding efficiency. In particular, the encoder can use seven different block types as shown in Fig.1 for motion estimation (ME) and motion compensation (MC). However, the processing time increases linearly with the number of block type used. This is because motion estimation needs to be performed for each block type. This full searching process (the examination of all seven block modes) provides the best coding result but the increase in computation is very high. Simulation shows that using seven different block sizes can save more than 15% of bit rate when compared to using only a 16x16 block size. In this paper, we propose a novel fast multi-block selection scheme focusing on 16x16 (mode 1), 16x8 (mode 2) and 8x16 (mode 3) that can efficiently reduce the computational cost while achieving similar visual quality

0-7803-7965-9/03/$17.00 ©2003 IEEE

and bit-rate. Instead of searching through all the possible block type, the proposed scheme tries to skip the block types that tend to give no performance gain. This is very useful for real-time applications. The paper will be organized as follows. Some observation in multi-block motion estimation will be shown in Section 2. Section 3 describes the proposed fast multi-block selection algorithm and experimental result will be presented in Section 4. A conclusion will be given in Section 5. 2. OBSERVATIONS ON MULTI-BLOCK MOTION ESTIMATION Accurate prediction can efficiently reduce the degree of error between the original image and the predicted image. For this reason, multiple block sizes motion estimation/compensation is adopted in H.264 for better compression performance. For a macroblock, it is possible to contain more than one object and the objects may not move in the same direction. Therefore using only one motion vector may not be enough to completely describe the motion of all objects in one macroblock. With only one motion vector, only part of the macroblock can have good motion compensation and the resulting residue energy can be large due to the mismatch in the remaining part of the macroblock. If multi-block motion estimation is allowed, the macroblock will be segmented into smaller zones. Each of them will get a motion vector pointing to the best matched zone in the preceding pictures. In H.264, seven type of block size with different shapes are supported. The best match is found by minimizing the cost function:

J (m, λMOTION ) = SAD( s, c(m )) + λ MOTION ⋅ R(m − p) with

m = (m x , m y )T

p = ( px , p y )

T

being

the

motion

(1) vector,

being the prediction for the motion vector

and λMOTION being the Lagrange multiplier. The term

I - 105

ICME 2003



➡ R(m − p) represents the bits used to encode the motion information and are obtained by table-lookup. The SAD (Sum of Absolute Differences) is computed as:

SAD( s, c(m)) =

B, B x =1, y =1

s[ x, y ] − c[ x − mx , y − my ]

(2)

with B = 16, 8 or 4 and s being the original video signal and c being the coded video signal. We observe there are two situations in which multi-block motion estimation is especially useful. 2.1. Object movement As mentioned before, this situation means more than one object is contained in a macroblock and are moving in different directions. This included objects moving over a background with different velocity. For example, in Fig. 2 the object is moving against a static background. In this case, the current block should be divided into two 8x16 sub-blocks whereas sub-block 0 should have a zero motion vector and sub-block 1 should have a motion vector such that the cost function can be minimized. 2.2. Texture regions A digital video sequence is a discrete representation of the continuous changing world. An image pixel value is the integration of the light intensity from some objects that falls into the detection range of some discrete detector like a CCD sensor. Suppose the edge of texture aligned perfectly with the sensor boundaries at a particular time instant such that the texture edge is clear and sharp as shown in Fig. 3a. We will describe this texture as having “integer-pixel location”. When the texture undergoes an integer-pixel translational motion, the texture will look exactly the same in the two consecutive frames except that one is a translation to another. And the moved texture can be predicted perfectly by integer-pixel motion estimation. If the edges of texture have a half-pixel offset relative to the senor, the edges may be blurred as shown in Fig. 3b and said to have “half-pixel location”. The original zero-pixel-wide (sharp) edge now becomes one pixel-wide (blurred). The pixel at the blurred edges may have only half the intensity of the original one, which can lead to difficulty in motion estimation. Similarly, if the edges have a quarter-pixel offset it may be blurred as shown in Fig. 3c. We will describe this texture as having “quarter-pixel location”. The zero-pixel-wide (sharp) object edge becomes one pixel-wide (blurred). The pixels at the blurred edges may have 3/4 or 1/4 of the intensity. The use of sub-pixel motion estimation algorithm,

like half-pixel or quarter-pixel estimation, uses interpolation to predict the sub-pixel shift of texture relative to the sampling grid. Different type of texture (integer, half or quarter) has a different response to fractional motion estimation. For example, the texture in Fig. 3b (half) can be predicted perfectly by the texture in Fig. 3a (integer) using half-pixel motion estimation but not vice verse. Since it is possible for a macroblock to contain more than one kind of texture, using only one integer, half or quarter pixel motion vector will not be sufficient to describe the texture content. For example, a macroblock may contain two 8x16 sub-blocks where sub-block 0 contains “half-pixel” texture and sub-block 1 contains “integer-pixel” texture. In this case, the current macroblock should be divided into two 8x16 sub-blocks in which half-pixel motion vector should be used for sub-block 0 and integer-pixel motion vector for sub-block 1. 3. PROPOSED FAST MULTI-BLOCK MOTION ESTIMATION (FMBME) The proposed FMBME composes of two basic elements – early termination and block segmentation. Simulation results show that, often, about 70% of the macroblocks will choose mode 1 (16x16) as their final block type. The use of early termination is to skip the searching for mode 2 and mode 3 if the performance of mode 1 is “good enough”. If early termination is not successful, block segmentation will be performed. The proposed scheme is as follows: 1. Initialization Define three variables as T: threshold for early termination. SAD1: accumulated SAD of mode 1. N1: accumulated number of macroblock used mode 1. Set T = 0, SAD1 = 0 and N1 = 0. Visit each macroblock and perform the following two steps. 2. Early termination Find the best integer pixel position (using full search or some kind of fast search) and calculate the best SAD S1min of mode 1. If S1 min < T, choose mode 1 as final block type and update T by following three equations: SAD1 = SAD1 + S1min, (3) N1 = N1 + 1, (4) T = SAD1 / N1. (5) Else go to step 3.

I - 106



➡ 3. Block segmentation From the best integer pixel position I obtained in step 2, there are eight neighboring ½-pel positions as shown in Fig. 4. For those nine positions, divide the macroblock into four regions and calculate their SADs as shown in Fig. 5. Using the letter representation in Fig. 4, the maximum SAD difference for each region is calculated as mSAD(r ) = max {SAD( I , r ) − SAD( p, r )} (6) p∈a − h

where r is the region, and p is the eight ½-pel positions. Let mSAD _ H = mSAD ( H 1) + mSAD ( H 2) (7) and mSAD _ V = mSAD (V 1) + mSAD (V 2) (8) Three conditions C1, C2 and C3 will be checked. C1: If mSAD_H > mSAD_V, select mode 2 for motion estimation and choose the best mode between mode 1 and mode 2. Else check condition C2. C2: If mSAD_H < mSAD_V, select mode 3 for motion estimation and choose the best mode between mode 1 and mode 3. Else check condition C3. C3: If mSAD_H = mSAD_V=0, select mode 1 as final block type, no further motion estimation is needed. Else perform motion estimation for both mode 2 and 3, and select the best mode as final block type. Fig.6 shows the flowchart of proposed algorithm. 4. EXPERIMENTAL RESULTS The proposed FMBME was tested on four QCIF (176x144) sequences, “Akiyo”, “Coastguard”, “Foreman”, and “Stefan”, with constant quantization factor, Qp=16. The proposed scheme was implemented in the H.264 reference software TML9.0 [3]. Spiral Full Search was used for motion estimation. Table 1 gives the results of PSNR, bit rate and complexity reduction of the proposed FMBME in the testing sequences. The computational cost using a powerful performance analyzer1 in term of number of clock cycles used. Compared with the full multi-block selection scheme using 3 modes in H.264, the proposed FMBME can reduce computational cost up to 40.73% (equivalent complexity of performing motion estimation on 1.78 block types instead of 3 block types) with negligibly small PSNR degradation (0.04dB) and slight increase in bit rate (0.92%). 5. CONCLUSION In this paper, a novel fast multi-block motion estimation called FMBME is proposed for H.264 video coding. Experiment results suggest that FMBME can reduce the computation cost significantly with little change to PSNR and bit rate. 1

6. ACKNOWLEDGEMENT This work was supported by Research Grants Council of Hong Kong SAR (HKUST6203/02E). REFERENCES [1] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, “Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC) – Joint Committee Draft”, Document JVT-E022d3.doc, Sept. 2002. [2] B. Girod, “Efficiency Analysis of Multihypothesis Motion-Compensated Prediction for Video Coding”, IEEE Trans. on Image Processing, vol. 9, no. 4, Feb 2000. [3] JVT Reference Software unofficial version TML9.0, http://bs.hhi.de/~suehring/tml/download/tml90.zip [4] P. Topiwala, G. Sullivan, “Performance Evaluation of H.26L, TML 8 vs. H.263++ and MPEG-4”, Document VCEG-N18, Sept. 2001. [5] P. Topiwala, G. Sullivan, A. Joch, F. Kossentini, “Overview and Performance Evaluation of the ITU-T Draft H.26L Video Coding Standard,” Proc. SPIE Conf. Appl. Of Digital Image Processing, Aug. 2001. [6] A. Joch, F. Kossentini, “Performance analysis of H.26L coding features,” Document VCEG-O42, Pattaya, Thailand, 4-6 Dec. 2001. [7] D. Hepper, “Efficiency Analysis and Application of Uncovered Background Prediction in a Low Bit Rate Image Coder”, IEEE Trans. On Communications, vol. 38, no. 9, Sept. 1990.

Coastguard Complexity PSNR(dB) Bitrate H.264 417x109 28.44 524856 FMBME 270x109 28.40 531848 Difference 35.3% -0.04 -1.3% Akiyo Complexity PSNR(dB) Bitrate H.264 204.6x10 9 34.30 78984 FMBME 100.1x10 9 34.29 78792 Difference 51.1% -0.01 0.24% Stefan Complexity PSNR(dB) Bitrate H.264 369.5x10 9 27.49 1363536 FMBME 229.6x10 9 27.45 1383944 Difference 37.8% -0.04 -1.5% Foreman Complexity PSNR(dB) Bitrate H.264 342.9x10 9 30.40 497072 FMBME 210.8x10 9 30.34 502672 Saved 38.5% -0.06 -1.1% Table 1. Comparison of PSNR, bit rate and complexity for H.264 and proposed FMBME algorithm

Inter® Vtune™ Performance Analyzer 6.1

I - 107



➠ 16*8

16*16

8*8

8*16

0 Macroblock types

0

0

0

1

2

3

H1

1

1

V1 8*8

4*8

8*4

4*4

0 Sub macroblock types

0

0

V2

H2

0

1

2

3

1

1

Fig. 5 – Four Regions for a macroblock

Fig.1 – Different Mode in dividing a macroblock in H.264

Initialization of parameters

New MB 0

1

0

1

Motion Estimation on mode 1(16x16) Fig.2 – Example of an object moving on a static background

Yes Update Threshold

Early Termination

No Fig 3a – Integer-pixel texture

Fig 3b – Half-pixel texture

Fig 3c – Quarter-pixel texture

Texture analysis, segmentation and motion estimation

a

b

c

d

I

e

f

g

h

Yes

Mode 1?

No

Lower case letters (a,b,c…) : ½-pel positions

Fig 4 – Fractional Motion Estimation Position

Fig. 6 – Flow Chart of the proposed algorithm

I - 108

Suggest Documents