Fast sub-pixel motion estimation and mode decision for H.264

Weiyao Lin*1, David Baylon**, Krit Panusopone** and Ming-Ting Sun*

*Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA
{wylin,mts}@u.washington.edu

**Advanced Technology Department, CTO Office, Home & Networks Mobility, Motorola, Inc., San Diego, CA 92121, USA

Abstract—Motion Estimation (ME) is one of the most time-consuming parts of video coding, and the use of multiple partition sizes in H.264 makes ME even more complex. Fast sub-pixel ME algorithms are important because (1) the computational overhead of sub-pixel ME has become relatively significant now that fast algorithms have greatly reduced the complexity of integer-pel search, and (2) reducing the number of sub-pel search points also saves the computation needed to interpolate sub-pixel values. In this paper, a new fast sub-pixel ME algorithm is proposed that performs a ‘rough’ sub-pel search before partition selection and a ‘precise’ sub-pel search only for the best partition. Experimental results show that our method can reduce the number of sub-pel search points by more than 50% compared with existing fast sub-pel ME methods, with little quality degradation.

I. INTRODUCTION

H.264/AVC is the state-of-the-art video coding standard established by ITU-T and ISO/IEC. Compared with previous video coding standards, H.264 uses many new techniques and can save more than 50% in bit rate at the same video quality as the MPEG-2 video coding standard [1]. Motion Estimation (ME) is one of the most time-consuming parts of video coding. Since H.264 uses 7 partition sizes for inter-frame prediction (16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4), the complexity of multi-partition ME is even larger [2]. In the H.264 JM reference software [5], the ME process contains two stages: an integer-pixel search over a large area and a sub-pixel search around the best integer-pixel position. Developing fast sub-pixel ME algorithms is becoming increasingly important, for two reasons. First, the computational overhead of sub-pixel ME has become relatively significant now that fast algorithms have greatly reduced the complexity of integer-pixel search. For example, there are integer-pel ME algorithms ([3], [4]) that need only 3 to 5 integer search points to obtain the final integer Motion Vector (MV); compared with this, the 16-point sub-pel search used in the JM is relatively expensive. Second, typical sub-pixel searches require interpolating sub-pixel values to compute the Sum of Absolute Differences (SAD), so reducing the number of sub-pel search points also reduces the interpolation computation.

In this paper, a fast sub-pixel ME algorithm is proposed for H.264 that performs a ‘rough’ sub-pel search before partition selection and a ‘precise’ sub-pel search only for the best partition. Experimental results show that the proposed algorithm can further reduce the number of sub-pixel search points compared with other fast sub-pel ME algorithms ([6]-[9]), with little quality degradation.

The paper is organized as follows. Section II reviews previous research on sub-pel motion estimation. Section III discusses ideas for further reducing the number of sub-pel search points with multiple partitions. The proposed algorithm is described in detail in Section IV, and experimental results are given in Section V. Section VI concludes the paper.

This work was performed while employed at Motorola.

978-1-4244-1684-4/08/$25.00 ©2008 IEEE
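The 16 sub-pel search points attributed to the JM in the introduction can be pictured as a two-level search: the 8 half-pel positions around the best integer-pel MV, then the 8 quarter-pel positions around the best half-pel point. The Python sketch below illustrates that count under stated assumptions: the function name `two_level_subpel_search` and the `cost` callback are hypothetical, MVs are in quarter-pel units, and details of the actual JM code (search order, interpolation, early termination) are not modeled.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (x, y) in quarter-pel units

# The 8 neighbors of a point on a square grid (unit step).
NEIGHBORS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def two_level_subpel_search(best_int_mv: MV, int_cost: float,
                            cost: Callable[[MV], float]) -> Tuple[MV, int]:
    """Hypothetical half-pel then quarter-pel refinement around an integer-pel MV.

    best_int_mv: best integer-pel MV in quarter-pel units (components are multiples of 4).
    int_cost:    its COST, already known from the integer-pel search.
    Returns the best MV found and the number of sub-pel points evaluated.
    """
    center, center_cost = best_int_mv, int_cost
    points = 0
    for step in (2, 1):  # step 2 = half-pel, step 1 = quarter-pel (in quarter-pel units)
        best, best_cost = center, center_cost
        for dx, dy in NEIGHBORS:
            cand = (center[0] + dx * step, center[1] + dy * step)
            c = cost(cand)  # each call stands for one interpolated sub-pel SAD/COST
            points += 1
            if c < best_cost:
                best, best_cost = cand, c
        center, center_cost = best, best_cost
    return center, points


if __name__ == "__main__":
    # Toy separable cost with its minimum at (5, -3) in quarter-pel units.
    toy = lambda mv: (mv[0] - 5) ** 2 + (mv[1] + 3) ** 2
    print(two_level_subpel_search((4, 0), toy((4, 0)), toy))  # ((5, -3), 16)
```

Every partition that goes through this search pays all 16 points, which is the cost the fast methods below try to avoid.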

II. RELATED WORK

Chen et al. [6] analyzed the difference between the integer-pel matching error (SAD) surface and the sub-pel matching error surface. Because of the complexity of video content, the integer-pel matching error surface is far from uni-modal inside the search window. For the sub-pel matching error surface, however, the uni-modal assumption holds in most cases, because the sub-pel search range is smaller and interpolation makes neighboring sub-pixels highly correlated. There has been much research on fast sub-pixel ME ([6]-[9]). Most of these methods rely on the uni-modal surface assumption and perform the sub-pel search in two steps: (i) predict a sub-pel MV (SPMV); (ii) search a small area around the SPMV to obtain the final sub-pel MV. The sub-pel predictive MV is obtained in one of two ways: from spatial-temporal information or by modeling the SAD surface. Chen et al. [6] and Yang et al. [7] use spatial-temporal information to obtain the SPMVs. In [6], a Center Biased Fractional Pel Search (CBFPS) fast sub-pel ME method is studied, in which the MVs of neighboring MBs are used to obtain the SPMV as in equation (1):

\[ frac\_pred\_mv = (pred\_mv - MV) \,\%\, \beta \tag{1} \]

where pred_mv is the prediction MV of the current partition (in sub-pel resolution), MV is the best integer-pel MV of the current partition, β = 4 in the 1/4-pel case and β = 8 in the 1/8-pel case, and % represents the modulo operation. In [7], the MV of a larger partition (e.g., the 16×8 inter mode takes the 16×16 MV as a reference) or the MV from the previous frame is used to obtain the SPMV. Combined with the SPMV from CBFPS, this can greatly increase the accuracy of the SPMV. A more popular way to obtain the SPMV is to model the SAD surface with a function (in most cases a second-order function) ([8], [9]). If the matching errors of the best integer-pel MV and its neighboring positions are known, the coefficients of the function can be solved for. The position that corresponds to the smallest value of the modeled SAD surface is then chosen as the SPMV.
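Equation (1) can be sketched in Python as below. This is a minimal sketch under two assumptions that the text does not spell out: MV is expressed in the same 1/β-pel units as pred_mv (the integer-pel MV multiplied by β), and % takes the sign of the dividend, as C's remainder operator does (Python's own % does not). The helper names and the way the SPMV is formed (integer-pel MV plus frac_pred_mv) are our illustrative choices, not necessarily those of [6].

```python
from typing import Tuple

MV = Tuple[int, int]  # (x, y) in 1/beta-pel units


def c_remainder(a: int, b: int) -> int:
    """Remainder with the sign of the dividend (C99 semantics), for b > 0."""
    r = abs(a) % b
    return r if a >= 0 else -r


def cbfps_spmv(pred_mv: MV, int_mv: MV, beta: int = 4) -> MV:
    """SPMV from equation (1): the integer-pel MV offset by frac_pred_mv.

    pred_mv: prediction MV of the partition, in 1/beta-pel units.
    int_mv:  best integer-pel MV, assumed already scaled to 1/beta-pel units.
    """
    frac_x = c_remainder(pred_mv[0] - int_mv[0], beta)
    frac_y = c_remainder(pred_mv[1] - int_mv[1], beta)
    return (int_mv[0] + frac_x, int_mv[1] + frac_y)


if __name__ == "__main__":
    # pred_mv = (2.25, -1.75) pel, best integer MV = (1, 0) pel, quarter-pel case.
    print(cbfps_spmv((9, -7), (4, 0)))  # (5, -3)
```

With Python's floor-based %, (-7) % 4 would be 1, which would place the candidate on the opposite side of the integer-pel MV from the prediction; that is why the sketch uses a C-style remainder.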

Many functions can be used to model the SAD surface. Example second-order functions are listed in equations (2) and (3):

\[ f(x,y) = c_1 x^2 + c_2 xy + c_3 y^2 + c_4 x + c_5 y + c_6 \tag{2} \]

or

\[ f(x,y) = c_1 x^2 + c_2 x + c_3 y^2 + c_4 y + c_5 \tag{3} \]

where x and y are coordinates on the surface and f(x,y) is the matching error (SAD) value. Normally, the best integer-pel position is placed at (0,0) and its neighboring integer-pel positions at (1,0), (-1,0), (0,1), (0,-1), etc. As the number of model coefficients increases, more neighboring integer-pel SADs are generally needed. In the proposed method, equation (3) is used to determine one of the SPMVs; it uses the best integer-pel SAD and the SADs of its four diamond integer neighbors. Given these SAD values, the coefficients of equation (3) can be computed as in [9], and the SPMV is calculated as:

\[ pmv = (x_p, y_p) = \arg\min_{x,y} f(x,y) = \left( \frac{-B}{2A},\ \frac{-D}{2C} \right) \tag{4} \]

where

\[ A = \frac{I+J}{2},\quad B = \frac{I-J}{2},\quad C = \frac{K+L}{2},\quad D = \frac{K-L}{2} \]
\[ I = f(1,0) - f(0,0),\quad J = f(-1,0) - f(0,0),\quad K = f(0,1) - f(0,0),\quad L = f(0,-1) - f(0,0) \]
\[ f(0,0) = SAD(0,0) = c_5,\quad f(\pm 1,0) = c_1 \pm c_2 + c_5,\quad f(0,\pm 1) = c_3 \pm c_4 + c_5 \]

If (x_p, y_p) is a fractional vector, its components are quantized to quarter-pel units.

Figure 1. Fast sub-pel ME approaches: (a) process used by previous fast sub-pel ME methods; (b) proposed fast sub-pel ME process.

III. FURTHER REDUCING SUB-PEL SEARCH POINTS WITH MULTIPLE PARTITIONS

As shown in Section II, most previous fast sub-pel ME methods reduce the number of search points by searching only a reduced area around the SPMV. For the multiple partition sizes of H.264, most previous algorithms try to find the best sub-pel MV (the one with the smallest SAD) for each partition before partition selection, as shown in Figure 1(a). In practice, however, only the best partition of the MB needs precise sub-pel MVs. The MVs of the other, non-best partitions are used only for inter-mode selection and are no longer useful once the best partition has been selected. If a ‘rough’ sub-pel SAD is good enough to select the best partition, there is no need to search for more precise sub-pel points in the first stage. Therefore, if only a ‘rough’ sub-pel motion search is performed for each partition (the resulting MV does not necessarily have the smallest SAD) and a ‘precise’ sub-pel MV is determined only for the selected best partition, the number of search points for the non-best partitions can be reduced greatly. As shown in Figure 1(b), the purpose of the first-stage ME is to obtain a rough sub-pel SAD that is close to the best SAD. The integer-pel SAD surface information can be used to decide whether a sub-pel SAD is close to the best one. Based on this discussion, we propose a new fast sub-pel motion estimation algorithm, described in detail in the next section.
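The quadratic-model SPMV of equations (3) and (4) in Section II reduces to a few arithmetic operations on the five integer-pel costs. The sketch below assumes the inputs are the costs at (0,0) and its four diamond neighbors, returns 0 along an axis whose fitted curvature is not positive (a degenerate case the text does not discuss), and quantizes by rounding half up to the nearest quarter-pel; the rounding rule and the function name `quadratic_spmv` are our assumptions.

```python
import math
from typing import Tuple


def quadratic_spmv(f00: float, f10: float, fm10: float,
                   f01: float, f0m1: float) -> Tuple[int, int]:
    """SPMV from the separable model of equation (3), via equation (4).

    Arguments are costs at (0,0), (1,0), (-1,0), (0,1), (0,-1) in integer-pel
    coordinates. Returns (x_p, y_p) quantized to quarter-pel units, relative to (0,0).
    """
    I, J = f10 - f00, fm10 - f00
    K, L = f01 - f00, f0m1 - f00
    A, B = (I + J) / 2, (I - J) / 2  # A = c1, B = c2
    C, D = (K + L) / 2, (K - L) / 2  # C = c3, D = c4

    def vertex(curv: float, lin: float) -> float:
        # Minimum of curv*t^2 + lin*t; assumption: stay at 0 if the fit is not convex.
        return -lin / (2 * curv) if curv > 0 else 0.0

    xp, yp = vertex(A, B), vertex(C, D)
    # Quantize to quarter-pel units (round half up; assumed rule).
    return math.floor(xp * 4 + 0.5), math.floor(yp * 4 + 0.5)


if __name__ == "__main__":
    # Model coefficients c1=3, c2=1, c3=5, c4=-4, c5=10 give these five samples.
    print(quadratic_spmv(10, 14, 12, 11, 19))  # (-1, 2)
```

When (0,0) really is the best integer position, I and J are non-negative, so |B| ≤ A and x_p lies in [-1/2, 1/2]; the same bound holds for y_p.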

IV. FAST SUB-PEL ME ALGORITHM

The entire process of the proposed fast sub-pel ME algorithm is shown in Figure 2. Instead of using only the SAD to model the surface, our algorithm uses COST = SAD + λR(MV), where R(MV) is the number of bits needed to code the MV and λ is a constant.

In Step 1, the differences between the best integer-position COST and two averaged COSTs of its four neighboring integer positions (the average COST of the two vertical neighbors and the average COST of the two horizontal neighbors) are checked. If the difference is small, the COST surface is quite flat, so the best integer COST is close to the optimal sub-pel COST and is a good enough estimate of it. In this case, sub-pel motion estimation is skipped for the current partition, and the best integer-position COST is used for partition selection in Step 4. The rule for deciding COST surface flatness is given in equation (5):

\[ COST\_Surface = \begin{cases} Not\_Flat, & \text{if any of (a), (b), (c) is true} \\ Flat, & \text{otherwise} \end{cases} \tag{5} \]

where

\[ \text{(a)}\quad avg\_COST_{vertical} > \tfrac{5}{4}\,COST_{Full} \ \text{ or } \ avg\_COST_{horizontal} > \tfrac{5}{4}\,COST_{Full} \]
\[ \text{(b)}\quad \text{if } blocktype(i):\ \min\left(|COST_{Full} - avg\_COST_{vertical}|,\ |COST_{Full} - avg\_COST_{horizontal}|\right) > 10 \]
\[ \text{(c)}\quad \text{if } blocktype(ii):\ \min\left(|COST_{Full} - avg\_COST_{vertical}|,\ |COST_{Full} - avg\_COST_{horizontal}|\right) > 20 \]

COSTFull is the best COST after full-pel ME, avg_COSTvertical is the average COST of its two vertical full-pel neighbors, and avg_COSThorizontal is the average COST of its two horizontal full-pel neighbors. Blocktype(i) denotes the 8×8, 8×4, 4×8 and 4×4 partitions, and blocktype(ii) denotes the 16×16, 16×8 and 8×16 partitions.

If the COST surface is not flat in Step 1, the two sub-pel points pointed to by two SPMVs are searched in Step 2. Two sub-pel PMV calculation methods are used to obtain these SPMVs: the first is calculated by the CBFPS method discussed in Section II, i.e., equation (1), and the second by the second-order surface model discussed in Section II. After these two points are searched, the one with the smaller COST is selected; its COST is denoted COSTStep2 and the corresponding motion vector MVStep2.

Table 1: The distribution of absolute distance between the best sub-pel MV and MVStep2 (d in quarter-pel units)
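The Step 1 flatness test of equation (5) and the Step 2 choice between the two SPMV candidates can be sketched as follows. The sketch assumes blocktype(i) can be identified as partitions of at most 64 pixels (8×8 and smaller), and that the caller supplies the COST of the best full-pel position and of its four full-pel neighbors; `cost` is a hypothetical callback that evaluates COST = SAD + λR(MV) at a sub-pel MV.

```python
from typing import Callable, Sequence, Tuple

MV = Tuple[int, int]


def is_blocktype_i(width: int, height: int) -> bool:
    """8x8, 8x4, 4x8 and 4x4 partitions; 16x16, 16x8 and 8x16 are blocktype(ii)."""
    return width * height <= 64


def cost_surface_is_flat(cost_full: float, up: float, down: float,
                         left: float, right: float, width: int, height: int) -> bool:
    """Equation (5): True means Flat, so sub-pel ME is skipped (Step 1)."""
    avg_v = (up + down) / 2     # avg_COST_vertical
    avg_h = (left + right) / 2  # avg_COST_horizontal
    cond_a = avg_v > 1.25 * cost_full or avg_h > 1.25 * cost_full
    gap = min(abs(cost_full - avg_v), abs(cost_full - avg_h))
    small = is_blocktype_i(width, height)
    cond_b = small and gap > 10
    cond_c = (not small) and gap > 20
    return not (cond_a or cond_b or cond_c)


def step2_best(spmvs: Sequence[MV], cost: Callable[[MV], float]) -> Tuple[MV, float]:
    """Step 2: search the SPMV points and keep the one with the smallest COST."""
    scored = [(cost(mv), mv) for mv in spmvs]
    best_cost, best_mv = min(scored)
    return best_mv, best_cost
```

Condition (a) fires when either neighbor average exceeds the full-pel COST by more than 25%, while (b) and (c) look at the smaller of the two gaps, with a looser threshold for the larger partitions.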

Figure 2. The proposed fast sub-pel ME algorithm

If D is small, the COST does not decrease much between the best integer-pel COST and COSTStep2, so COSTStep2 is already close to the best sub-pel COST and is good enough for mode selection. In this case, COSTStep2 is used for partition selection in Step 4. The rule for deciding whether D is small is given in equation (6):

\[ D\ \text{is}\ \begin{cases} Large, & \text{if any of (a), (b), (c) is true} \\ Small, & \text{otherwise} \end{cases} \tag{6} \]

where …, (c) if blocktype(ii), D > 10, and COSTmin_step2 = min(COSTStep2, COSTbest_full_pel); avg_COSTvertical and avg_COSThorizontal are the same as in equation (5).

If D is large, COSTStep2 may not be close to the best sub-pel COST (as shown in Figure 3(a)). In this case, two more points next to MVStep2 in quarter-pel resolution, one vertical and one horizontal, are checked. As shown in Figure 3(b), the black point is MVStep2 and the grey points are its quarter-pel neighbors. In Step 3, the two search points are chosen as one point out of V1 and V2 and one point out of H1 and H2. A bilinear model is used to choose each neighboring point: as shown in Figure 4, slopes are computed (equation (8)) between each of the two horizontal (or vertical) neighboring integer points and the best sub-pel point from Step 2 (the point given by MVStep2). The quarter-pel neighbor on the side with the smaller slope is then selected, as in equation (7):

\[ P^{Horizontal}_{Step3} = \begin{cases} H_1, & \text{if } S_{H1} < S_{H2} \\ H_2, & \text{if } S_{H1} > S_{H2} \end{cases} \qquad P^{Vertical}_{Step3} = \begin{cases} V_1, & \text{if } S_{V1} < S_{V2} \\ V_2, & \text{if } S_{V1} > S_{V2} \end{cases} \tag{7} \]

where

\[ S_i = \frac{COST_{Integer\_i} - COST_{min\_Step2}}{Cord_{Integer\_i} - Cord_{min\_Step2}}, \quad i = V1, V2, H1, H2 \tag{8} \]

Integer_i is the closest integer-pel point in i's direction (V1 and V2 for the vertical direction, H1 and H2 for the horizontal direction), min_Step2 is the best sub-pel point after Step 2, and Cord is the coordinate of a point in quarter-pel resolution: the x-coordinate (horizontal direction) for H1 and H2 and the y-coordinate (vertical direction) for V1 and V2. After Steps 1, 2 and 3, a COST value (COSTRough) that is close or equal to the best COST is obtained for each partition. The sub-pel MV that corresponds to COSTRough is denoted MVRough.
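Equations (7) and (8) select one quarter-pel neighbor of MVStep2 per axis. A minimal one-axis sketch follows. Several points are our assumptions, not stated in the text: "the closest integer-pel point in i's direction" is taken as the nearest full-pel coordinate strictly below or above MVStep2's coordinate, c - 1 is taken to be H1/V1 and c + 1 to be H2/V2, and the two slopes are compared by magnitude, since the denominators in equation (8) have opposite signs on the two sides. Ties, which equation (7) leaves undefined, go to the upper neighbor here. The function names are hypothetical.

```python
from typing import Dict, Tuple


def bracketing_integers(c: int) -> Tuple[int, int]:
    """Nearest full-pel coordinates strictly below and above c (quarter-pel units)."""
    lower = ((c - 1) // 4) * 4
    upper = (c // 4 + 1) * 4
    return lower, upper


def step3_neighbor(c: int, cost_min_step2: float, int_costs: Dict[int, float]) -> int:
    """Pick c - 1 (H1/V1) or c + 1 (H2/V2) along one axis, per equations (7)-(8).

    c:              MVStep2's coordinate on this axis, in quarter-pel units.
    cost_min_step2: COST at MVStep2.
    int_costs:      COST of full-pel points on this axis, keyed by quarter-pel coordinate.
    """
    lo, hi = bracketing_integers(c)
    s_lo = (int_costs[lo] - cost_min_step2) / (lo - c)  # equation (8), lower side
    s_hi = (int_costs[hi] - cost_min_step2) / (hi - c)  # equation (8), upper side
    # Equation (7): the side whose slope is smaller (compared by magnitude here).
    return c - 1 if abs(s_lo) < abs(s_hi) else c + 1


if __name__ == "__main__":
    # MVStep2 at x = 2 (half-pel); full-pel COSTs at x = 0 and x = 4.
    print(step3_neighbor(2, 90.0, {0: 100.0, 4: 130.0}))  # 1
```

Call it once with the x-coordinate to get the horizontal point and once with the y-coordinate to get the vertical point; the COST rises more slowly toward the selected side, so the minimum is more likely to lie there.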

Figure 3. (a) An example COST surface for which COSTStep2 is not close to the best sub-pel COST; (b) MVStep2 and its quarter-pel neighboring points.

In Step 4, COSTRough is used to select the best partition. In Step 5, a small-area sub-pel refinement is performed around MVRough: in the proposed algorithm, the 8 quarter-pel neighbors of MVRough are searched. Since Step 5 is performed only for the selected best partition, the average number of search points per partition is lower than with previous fast sub-pel search algorithms.
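Steps 4 and 5 can be sketched as below: choose the partition mode with the smallest COSTRough, then evaluate only the 8 quarter-pel neighbors of its MVRough. This is a simplified, hypothetical interface: each mode is represented by a single MVRough and COSTRough, whereas a mode such as 16×8 actually carries one MV per sub-partition, and `cost_fns` stands in for the COST evaluation of each mode.

```python
from typing import Callable, Dict, Tuple

MV = Tuple[int, int]  # quarter-pel units


def refine_8_neighbors(mv_rough: MV, cost_rough: float,
                       cost: Callable[[MV], float]) -> Tuple[MV, float, int]:
    """Step 5: search the 8 quarter-pel neighbors of MVRough."""
    best, best_cost, points = mv_rough, cost_rough, 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # MVRough itself was already evaluated in Steps 1-3
            cand = (mv_rough[0] + dx, mv_rough[1] + dy)
            c = cost(cand)
            points += 1
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost, points


def select_and_refine(rough: Dict[str, Tuple[MV, float]],
                      cost_fns: Dict[str, Callable[[MV], float]]) -> Tuple[str, MV, float, int]:
    """Step 4 picks the mode by COSTRough; Step 5 refines only that mode."""
    mode = min(rough, key=lambda m: rough[m][1])
    mv, c = rough[mode]
    best_mv, best_cost, points = refine_8_neighbors(mv, c, cost_fns[mode])
    return mode, best_mv, best_cost, points
```

Only the selected mode pays for the 8 extra points; the non-selected modes stop at their rough search, which is where the per-partition average drops.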


Figure 4. Using a bilinear model to select neighboring search points (white end points: integer pixels; black point: MVStep2; grey point: selected neighboring point). Note: in (a), the left slope is smaller than the right slope; therefore, in (b), the neighboring sub-pel point on the left is selected.

V. EXPERIMENTAL RESULTS

In the experiments, 100 frames of each test sequence are coded. The frame type structure is IPPP…, the frame rate is 30 frames/s, the search range is 16 for QCIF and 32 for CIF, and one reference frame is used. Due to limited space, we only show results for QP = 28. Six methods are compared for each sequence:
(i) the JM reference method [5] (Full Search);
(ii) using the best integer COST directly to select the partition, then using the JM's method to perform sub-pel ME for the best partition (IC+SME);
(iii) the method in [6] (CBFPS);
(iv) the method in [9] (FPME);
(v) the method in [7] (PDFPS);
(vi) the proposed method (Proposed).
Table 2 compares the PSNR, bit rate and average number of search points per partition size (SP/PS) for each method. Using the best integer-pel COST (IC+SME) reduces the number of search points the most, but it also causes the largest performance loss. (Since QP is fixed in our experiments, the performance loss appears mainly as an increase in bit rate.) The previous methods (CBFPS, FPME and PDFPS) reduce the SP by reducing the search area around the predicted PMV. Our proposed method further reduces the SP of these previous methods by more than half by performing the ‘precise’ sub-pel search only on the best partition.

Table 2: Comparison of the six ME methods

Sequence       Method        PSNR (dB)   Bit-rate (kbps)   SP/PS
Foreman QCIF   Full Search   35.26       117.62            16
               IC+SME        35.21       130.25            0.51
               CBFPS         35.25       117.13            6.59
               FPME          35.24       120.95            5.25
               PDFPS         35.26       119.68            5.01
               Proposed      35.25       119.95            2.28
Akiyo QCIF     Full Search   38.03       31.64             16
               IC+SME        37.98       34.54             0.40
               CBFPS         38.03       31.90             5.98
               FPME          38.04       32.26             2.71
               PDFPS         38.01       34.01             3.13
               Proposed      38.02       32.32             0.92
Mobile QCIF    Full Search   32.95       453.39            16
               IC+SME        32.92       484.43            0.51
               CBFPS         32.95       453.90            7.02
               FPME          32.95       455.82            5.81
               PDFPS         32.95       457.17            5.72
               Proposed      32.95       456.61            3.1
Container CIF  Full Search   35.66       204.65            16
               IC+SME        35.63       213.91            0.39
               CBFPS         35.66       204.38            5.80
               FPME          35.65       205.40            3.77
               PDFPS         35.65       204.69            3.29
               Proposed      35.66       204.68            0.94
Football CIF   Full Search   36.03       1440.84           16
               IC+SME        36.01       1473.46           1.13
               CBFPS         36.01       1448.87           7.63
               FPME          36.00       1455.60           6.21
               PDFPS         36.01       1452.18           6.85
               Proposed      36.01       1451.55           3.13

VI. CONCLUSION

In this paper, a fast sub-pel ME algorithm is proposed that reduces the average number of search points per partition to less than half that of previous fast sub-pel ME algorithms, with a relatively small performance loss.

VII. REFERENCES

[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits and Systems for Video Technology, vol. 13, pp. 560-576, 2003.
[2] J. Zhang and Y. He, “Performance and complexity joint optimization for H.264 video coding,” ISCAS, vol. 2, pp. 888-891, 2003.
[3] A. Chang, H. W. Wong, Y. M. Yeung and C. Au, “Fast multi-block selection for H.264 video coding,” ISCAS, vol. 3, pp. 817-820, 2004.
[4] Z. Zhou and M. T. Sun, “Fast macroblock inter mode decision and motion estimation for H.264/MPEG-4 AVC,” Int’l Conf. Image Processing, vol. 2, pp. 789-792, 2004.
[5] JVT Reference Software version JM 6.1, http://iphome.hhi.de/suehring/tml/download/old_jm/
[6] Z. Chen, P. Zhou and Y. He, “Fast Integer Pel and Fractional Pel Motion Estimation for JVT,” JVT-F017, 6th Meeting: Awaji Island, JP, 2002.
[7] L. Yang, K. Yu, J. Li and S. Li, “Prediction-Based Directional Fractional Pixel Motion Estimation for H.264 Video Coding,” IEEE Int’l Conf. Acoustics, Speech and Signal Processing, vol. 2, pp. 901-904, 2005.
[8] J. W. Suh and J. Jechang, “Fast sub-pixel motion estimation techniques having lower computational complexity,” IEEE Trans. Consumer Electronics, vol. 50, pp. 968-973, 2004.
[9] J. F. Chang and J. J. Leou, “A Quadratic Prediction Based Fractional-Pixel Motion Estimation Algorithm for H.264,” IEEE Int’l Symposium on Multimedia, pp. 491-498, 2005.
