
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 5, NO. 4, AUGUST 1995

A Fast Hierarchical Motion Vector Estimation Algorithm Using Mean Pyramid

Kwon Moon Nam, Joon-Seek Kim, Rae-Hong Park, Member, IEEE, and Young Serk Shim, Member, IEEE

Abstract-In this paper, a hierarchical motion vector estimation algorithm using mean pyramid is proposed. Using the same measurement window at each level of a pyramid, the proposed algorithm, based on the tree pruning, reduces the computational complexity greatly with its performance comparable to that of the full search (FS). By varying the number of candidate motion vectors which are to be used as the initial search points for motion vector estimation at the next level, the mean squared error of the proposed algorithm varies, ranging between those of the FS and three step search (TSS) methods. Also, depending on the number of candidate motion vectors, the computational complexity of the proposed hierarchical algorithm ranges from 1/8-1/2 of that of the FS. The computer simulation results of the proposed technique compared with the conventional methods are given for various test sequences.

I. INTRODUCTION

IN TRANSMITTING moving pictures, interframe coding has been shown to be effective for compressing video data [1]-[2]. In particular, motion compensated coding (MCC), motion compensation interpolation (MCI), and motion compensation prediction (MCP), which consider the motion information between successive frames, further increase the coding efficiency [3]. One of these motion compensated methods is the block matching algorithm (BMA). The BMA, which compresses the data by removing the temporal redundancy on a block basis, shows good data compression performance; thus, it has been widely used for commercial applications because of its low computational requirement and simplicity of hardware implementation. The commercial approach in BMA is toward the full search (FS), which estimates motion vectors by searching all the points in the search area. It is optimal in terms of the performance measure specified, but its computational complexity is high. Thus, to reduce the computational complexity, a great number of fast algorithms have been proposed, such as the direction of minimum distortion (DMD) [4] and the three step search (TSS) [5]. The TSS is widely used because of its simplicity of hardware implementation with reasonable performance.

Hierarchical motion detection techniques have also been proposed, which include the method proposed by Biering [6], the method based on the hierarchical approximation of an error function [7], and the hierarchical motion vector estimation method using the geometrical relation between neighboring motion vectors [8]. But they have complex computational requirements or depend on the input images. Generally, fast BMAs that reduce the computational complexity by trying fewer search points than the FS show degraded performance compared to that of the FS. Thus, a fast algorithm that not only reduces the computational requirement but also achieves greater compression while maintaining the same quality has been sought for real-time video coding systems.

In this paper, we propose a fast hierarchical motion vector estimation method which can provide good performance with reduced computational complexity. The proposed method can trade the computational complexity against the quality of the reconstructed images, if necessary. In other words, its computational complexity can be controlled, ranging between half of that of the TSS and half of that of the FS, while its peak signal to noise ratio (PSNR) lies between those of the TSS and FS.

In Section II, the proposed hierarchical motion estimation method using mean pyramid is presented, and the modification of the proposed method for enhancing its performance is introduced in Section III. A comparison of the computational complexity of the proposed method and the conventional ones is given in Section IV. Computer simulation results and their discussions are presented in Section V, and finally conclusions are given in Section VI.

Manuscript received April 18, 1994; revised June 16, 1995. This work was supported in part by the Korea Academy of Industrial Technology for the development of HDTV receivers. This paper was recommended by Associate Editor K.-H. Tzou.
K. M. Nam was with the Department of Electronic Engineering, Sogang University, Seoul, Korea. He is now with Samsung Electronics Co. Ltd., Seoul, Korea.
J.-S. Kim was with the Department of Electronic Engineering, Sogang University, Seoul, Korea. He is now with the Department of Electronic Engineering, Hoseo University, Chungnam, Korea.
R.-H. Park is with the Department of Electronic Engineering, Sogang University, Seoul, Korea.
Y. S. Shim was with the HDTV R&D Group, Korea Academy of Industrial Technology, Seoul, Korea. He is now with the Institute for Advanced Engineering, Seoul, Korea.
IEEE Log Number 9413977.

II. PROPOSED HIERARCHICAL BMA USING MEAN PYRAMID

A large number of image processing methods using pyramidal structure have been presented. Hierarchical motion vector estimation proceeding from the higher level to lower ones reduces the computational complexity due to the reduced image size at the higher level. The proposed hierarchical method for finding motion vectors, based on the tree pruning, can greatly reduce the computational complexity by using mean pyramid, maintaining its PSNR performance comparable to that of the TSS.

1051-8215/95$04.00 © 1995 IEEE

A. Constructing Mean Pyramid

In the conventional TSS [5] for detecting the optimal motion vector, error calculation is based on all pixels in the I x J subblock, and error calculations at 9 points are needed at each step. To find the optimal motion vector for each subblock, 25 error calculations are required in total, neglecting the repeated calculations at the centers of the second and third steps. By using the mean pyramid in the matching process, especially at levels 1 and 2 of the pyramid, it is possible to alleviate the computational complexity because of the reduced image size. If the pyramidal images are constructed by subsampling, the presence of noise leads to incorrect motion estimation, which makes the performance of the proposed algorithm worse than that of the TSS. Thus, to find the coarse motion vector correctly at the higher level, image pyramids are constructed by using a lowpass filter. If the Gaussian lowpass filter [9] is employed for pyramid construction, the prediction errors are reduced a little, but the computational cost is heavy. To satisfy our goal of implementing a fast hierarchical BMA, therefore, the 3-level pyramidal images are constructed by simple averaging:

$$ g_L(p, q) = \left\lfloor \frac{1}{4} \sum_{u=0}^{1} \sum_{v=0}^{1} g_{L-1}(2p + u,\, 2q + v) \right\rfloor, \quad 1 \le L \le 2 \tag{1} $$

where g_L(p, q) represents the gray level at the position (p, q) of the Lth level and g_0(p, q) denotes the original image. Note that ⌊ ⌋ represents the truncation. The construction of the mean pyramid by simple nonoverlapping lowpass filtering is done by assigning the mean gray level of the pixels in a lowpass window to a single pixel at the next level. The truncated mean value of four pixels at the lower level is recursively used in generating the mean pyramid. Therefore, at the highest level, which has the coarsest resolution, the remaining lowpass components make it easy to find the global motions; thus, the influences of noise are greatly reduced. Also the computational complexity is reduced because of the reduced image size.
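As an illustration of (1), the following minimal Python sketch builds the pyramid by truncated 2 x 2 averaging. The function name mean_pyramid, the list-of-lists image representation, and the small test image are assumptions made for this example and are not taken from the original work; non-negative integer gray levels are assumed, so that integer division implements the truncation.

```python
def mean_pyramid(image, levels=3):
    """Return [g_0, g_1, ..., g_{levels-1}], where g_0 is the original image.

    Each pixel of g_L is the truncated mean of the corresponding
    non-overlapping 2 x 2 block of g_{L-1}, as in (1).
    Assumes non-negative integers and even dimensions at every level.
    """
    pyramid = [image]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        rows, cols = len(prev) // 2, len(prev[0]) // 2
        pyramid.append([
            [
                (prev[2 * p][2 * q] + prev[2 * p][2 * q + 1]
                 + prev[2 * p + 1][2 * q] + prev[2 * p + 1][2 * q + 1]) // 4
                for q in range(cols)
            ]
            for p in range(rows)
        ])
    return pyramid


# Hypothetical 4 x 4 image: it reduces to 2 x 2 at level 1 and 1 x 1 at level 2.
img = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 9, 0, 1],
    [9, 9, 2, 4],
]
pyr = mean_pyramid(img)
print(pyr[1])  # [[2, 6], [9, 1]]
print(pyr[2])  # [[4]]
```

The lower-right block (0, 1, 2, 4) has mean 1.75 and is stored as 1, which shows the effect of the truncation.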

B. Motion Vector Estimation

The general matching criteria used in BMA include the mean square error (MSE), normalized cross correlation function (NCCF), and mean absolute difference (MAD). Among them, the MAD is widely used because it has low computational complexity and gives relatively good matching results. With an I x J subblock assumed, the MAD is defined by

$$ \mathrm{MAD}(m, n) = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} \left| g_k(i, j) - g_{k-1}(i + m,\, j + n) \right| \tag{2} $$

where arguments m and n denote the relative displacement in the search area, and g_k(i, j) represents the gray level at the position (i, j) in a subblock on the kth frame. The proposed method also uses the MAD. On the pyramidal images generated by (1), one pixel at level 2 corresponds to a 4 x 4 block and a 2 x 2 block at levels 0 and 1, respectively. Therefore, at level L, I and J in (2) are replaced by I/2^L and J/2^L (L = 0, 1, 2), respectively, and thus, (2) for the proposed hierarchical BMA can be rewritten as

$$ \mathrm{MAD}_L(m, n) = \frac{1}{I'J'} \sum_{i=1}^{I'} \sum_{j=1}^{J'} \left| g_{L,k}(i, j) - g_{L,k-1}(i + m,\, j + n) \right|, \quad L = 0, 1, 2, \quad m, n = 0, \pm 1 \tag{3} $$

where the subscripts L and k denote the Lth level and the kth frame, respectively. Also note that I' = I/2^L and J' = J/2^L. After construction of the mean pyramid, the motion vectors are first searched at level 2 with (3), and the motion vector having the smallest MAD_L(m, n) is selected as the coarse motion vector at that level. The motion vector detected at this level is propagated to the next lower level to be used as an initial vector for the motion search at that level. That is, the detected motion vector at the higher level is transmitted to the lower level and it guides the refinement step at that level. This motion estimation process is repeated once more down to level 0, as illustrated in Fig. 1(a). If the motion vector at level L is represented by MV_L(r, s), the detected motion vector at level L can be written as

$$ MV_L(r, s) = 2 \times MV_{L+1}(r, s) + \Delta MV_L(r, s), \quad L = 1, 0 \tag{4} $$

where (r, s) denotes the position of the search block and ΔMV_L(r, s) is the updated increment of motion displacement at level L. In this case, a one-pixel interval at levels 2 and 1 corresponds to 4-pixel and 2-pixel intervals at level 0, respectively. Thus, the maximum displacements in the motion vector refinement steps of the proposed algorithm are equal to those of the modified TSS adopted by CCITT H.261 RM8, resulting in a maximum displacement of 7 pixels at level 0. Fig. 1(b) shows the one-dimensional representation of the search process of the proposed hierarchical method. At each level, the search is performed with the possible displacements (m and n in (2)) of -1, 0, 1; thus, there exist 9 possibilities. The number L in Fig. 1(a) and (b) denotes the detected motion vector and the possible motion vectors at level L, respectively, and note that the detected motion vector is propagated to the next lower level to guide the further refinement of the motion vector.

In order to detect large or abrupt motions effectively, the proposed method is extended to the 4-level hierarchy in a similar way. In the 4-level hierarchy case, a one-pixel interval at level 3 corresponds to an 8-pixel interval at level 0; thus, a maximum displacement of 15 pixels is assumed for motion estimation. So the proposed 4-level algorithm can effectively detect large motions at the expense of a slight increase of the computational requirement.

C. Half-Pel Search

The sub-pel search is performed to reduce further the prediction error, which is defined by the difference between the original image and the motion-compensated image reconstructed by the detected motion vector. With this technique, a motion vector is refined further by ±1/2 or ±1/4 pixel relative to the integer-pel motion vector detected. The sub-pel search increases the PSNR. Here the half-pel search is implemented in the proposed hierarchical BMA. For the half-pel search, the image with half-pel resolution is generated by using the interpolation formula (5), recommended by MPEG SM3:

$$ \begin{aligned} S(p + .5,\, q) &= \bigl(S(p, q) + S(p + 1, q)\bigr) \,//\, 2 \\ S(p,\, q + .5) &= \bigl(S(p, q) + S(p, q + 1)\bigr) \,//\, 2 \\ S(p + .5,\, q + .5) &= \bigl(S(p, q) + S(p + 1, q) + S(p, q + 1) + S(p + 1, q + 1)\bigr) \,//\, 4 \end{aligned} \tag{5} $$

where // signifies the roundoff operation. The resulting motion vector having half-pel resolution gives the minimum MAD among motion vectors around and including the integer one.

III. MODIFICATION OF THE PROPOSED METHOD FOR PERFORMANCE IMPROVEMENT

The proposed fast method described above occasionally performs better than the TSS, since the mean pyramid constructed by lowpass filtering is less sensitive to noise. But, like that of the TSS, its performance is still worse than that of the FS. The performance difference between the FS and TSS exists because the TSS does not search all the points in the search area and thus may find a local minimum rather than the global one. Similarly, the proposed hierarchical technique may fall into a local minimum at the higher level and stay near it in the refinement steps of the motion vectors. Consequently, the performance of the proposed method cannot approach that of the FS. Thus, it is expected that if the local minima problem is overcome, the performance of the proposed method becomes better.

The idea behind the modification of the proposed algorithm is to try more candidate motion vectors to overcome the local minima problem. Since the 9 MADs at the highest level of the pyramid are calculated based on relatively small blocks, almost the same values are likely to appear at several points. Thus, to solve the local minima problem, the proposed method is modified so that more than one candidate motion vector is assumed. The originally proposed method discussed in Section II starts with a single initial motion vector at each level, but the modified hierarchical technique deals with several initial motion vectors at each level. A number of motion vectors at the highest level are independently propagated to the next level, and the optimal number of candidate motion vectors is to be determined. For each candidate motion vector, the motion vector having the minimum MAD value is selected independently; then these motion vectors are propagated to the next lower level as initial motion vectors. The same search process is repeated once more, and among all the candidate motion vectors at the lowest level, the motion vector having the smallest MAD is selected as the final motion vector for the subblock. Since the maximum displacement of the motion vector at each level is assumed to be 1, the maximum number of candidate motion vectors is limited to 9. Although this modification increases the computational requirement linearly as the number of candidate motion vectors n_c increases, it makes the PSNR performance approach that of the FS.

This modified search process of the proposed algorithm is illustrated in Fig. 1(c) and (d). Fig. 1(c) shows the search process with three candidate motion vectors. Three candidate motion vectors are searched independently at each level, and then at the lowest level a single motion vector having the smallest error is selected. Fig. 1(d) shows the one-dimensional representation of the motion vector search starting with two candidate motion vectors at level 2.

Fig. 1. (a) Hierarchical motion vector estimation with n_c = 1. (b) One-dimensional representation of (a). (c) Hierarchical motion vector estimation with n_c = 3. (d) One-dimensional representation with n_c = 2.
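The search of Sections II and III can be sketched in Python as follows; with n_c = 1 it reduces to the single-candidate method of Section II. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the list-of-lists frame representation, and the policy of giving infinite cost to displaced blocks that leave the frame are all hypothetical choices made for the example.

```python
def build_pyramid(frame, levels):
    """Mean pyramid by truncated 2 x 2 averaging; index 0 is full resolution."""
    pyr = [frame]
    for _ in range(levels - 1):
        g = pyr[-1]
        pyr.append([
            [(g[2 * p][2 * q] + g[2 * p][2 * q + 1]
              + g[2 * p + 1][2 * q] + g[2 * p + 1][2 * q + 1]) // 4
             for q in range(len(g[0]) // 2)]
            for p in range(len(g) // 2)
        ])
    return pyr


def mad(cur, ref, top, left, size, dy, dx):
    """MAD between a block of `cur` and the block of `ref` displaced by (dy, dx).

    Assumption of this sketch: a displaced block that leaves the frame
    gets infinite cost.
    """
    y0, x0 = top + dy, left + dx
    if y0 < 0 or x0 < 0 or y0 + size > len(ref) or x0 + size > len(ref[0]):
        return float("inf")
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[top + i][left + j] - ref[y0 + i][x0 + j])
    return total / (size * size)


def hierarchical_search(cur, ref, top, left, size=16, levels=3, n_c=1):
    """Return ((dy, dx), MAD) for the block whose top-left corner is (top, left)."""
    cur_pyr = build_pyramid(cur, levels)
    ref_pyr = build_pyramid(ref, levels)
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

    def cost(level, mv):
        return mad(cur_pyr[level], ref_pyr[level],
                   top >> level, left >> level, size >> level, mv[0], mv[1])

    # Coarsest level: rank the 9 displacements and keep the n_c best.
    coarsest = levels - 1
    results = sorted((cost(coarsest, mv), mv) for mv in offsets)[:n_c]

    # Finer levels: each candidate is doubled, as in (4), and refined
    # independently over its own 9 neighbouring displacements.
    for level in range(coarsest - 1, -1, -1):
        refined = []
        for _, (vy, vx) in results:
            trials = [(2 * vy + dy, 2 * vx + dx) for dy, dx in offsets]
            refined.append(min((cost(level, mv), mv) for mv in trials))
        results = refined

    # Level 0: the candidate with the smallest MAD is the final vector.
    best_cost, best_mv = min(results)
    return best_mv, best_cost


# Hypothetical test frames: a smooth pattern and a copy displaced by (4, -4).
ref = [[y * y + x * x for x in range(48)] for y in range(48)]
cur = [[(y + 4) ** 2 + (x - 4) ** 2 for x in range(48)] for y in range(48)]
print(hierarchical_search(cur, ref, 16, 16, n_c=1))  # ((4, -4), 0.0)
```

The displacement in the usage example is a multiple of 4, so the match is exact at every pyramid level; for general content a single candidate may settle at a local minimum, which is the case the multiple-candidate modification is meant to address.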

IV. COMPARISON OF COMPUTATIONAL COMPLEXITY

The computational requirement for a 16 x 16 search block of the TSS and the proposed method is as follows. For the TSS, 256 subtractions, 256 absolute operations, and 255 additions are required for a single search point; thus, with 25 search points, 6400 subtractions, 6400 absolute operations, and 6375 additions are needed in total. For the proposed algorithm with n_c = 1, the following operations are required for each search point: at level 2, 16 subtractions, 16 absolute operations, and 15 additions; at level 1, 64 subtractions, 64 absolute operations, and 63 additions; and at level 0, 256 subtractions, 256 absolute operations, and 255 additions. Thus, with 9 search points at each level, 3024 subtractions, 3024 absolute operations, and 2997 additions are needed in total to detect the motion vector for each subblock. In addition, for mean pyramid generation 80 x (3 additions, 1 division) operations are needed. Note that for each pixel 3 additions and 1 division are required and 80 (= 8 x 8 + 4 x 4) averaging operations are needed, with a 16 x 16 subblock assumed (a 16 x 16 subblock at level 0 generates 8 x 8 and 4 x 4 subblocks at levels 1 and 2, respectively). Since the division by 4 for mean pyramid generation is easily implemented with shift right operations of registers and the number of division operations is far less than that of the addition/subtraction operations, the computation time for the division operations is neglected in estimating the computational requirement. Then, the computational requirement for the proposed algorithm with n_c = 1 is about 55% of that of the TSS.

Note that if we consider the half-pel search, the corresponding overhead is equally added to both search algorithms. That is, for the half-pel resolution image, the computational requirement in (6), for interpolation by (5), is needed:

16 x 17 x (1 addition, 1 division) x 2 + 17 x 17 x (3 additions, 1 division) = 1411 additions, 833 divisions    (6)

and for the MAD calculation for the 8 half-pels around the integer motion vector, the computational requirement in (7) is also required:

8 x (256 subtractions, 256 absolute operations, and 255 additions) = 2048 subtractions, 2048 absolute operations, and 2040 additions.    (7)

The computational requirements for the FS, TSS, and the proposed method, along with the half-pel search, are summarized in Table I. For the modified hierarchical BMA presented in Section III, n_c x (the computational complexity required for a single candidate motion vector search) is needed. Thus, the computational complexity increases linearly with n_c. Note that the computation required for mean pyramid generation is needed only once; thus, it is not a function of n_c. Computer simulations of the various methods on a workstation show a trend similar to the above analysis.

TABLE I
COMPARISON OF THE COMPUTATIONAL COMPLEXITY OF THE PROPOSED METHOD (16 x 16)

Method     Subtraction  Absolute Operation  Addition  For Pyramid Construction
FS         57,600       57,600              57,375
TSS        6,400        6,400               6,375
Proposed   3,024        3,024               2,997     80 x (3 additions, 1 division)
Half-pel   2,048        2,048               2,040     1,411 additions, 833 divisions
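The operation counts of Table I can be reproduced with a few lines of arithmetic. The Python sketch below is only a check of the figures quoted in this section; the helper names are hypothetical, and the 15 x 15 = 225 search points used for the FS are inferred from the maximum displacement of 7 and from the FS entries of Table I rather than stated explicitly in the text.

```python
def ops_per_point(pixels):
    """(subtractions, absolute operations, additions) for one MAD evaluation."""
    return (pixels, pixels, pixels - 1)


def scale(ops, points):
    """Multiply every operation count by the number of search points."""
    return tuple(points * n for n in ops)


BLOCK = 16 * 16  # pixels in a 16 x 16 subblock at level 0

tss = scale(ops_per_point(BLOCK), 25)      # 25 search points
fs = scale(ops_per_point(BLOCK), 15 * 15)  # displacements -7..7 (inferred)

# Proposed method, n_c = 1: 9 search points at each of levels 2, 1 and 0,
# where the block holds 16, 64 and 256 pixels, respectively.
per_level = [scale(ops_per_point(BLOCK >> (2 * level)), 9) for level in (2, 1, 0)]
proposed = tuple(sum(column) for column in zip(*per_level))

pyramid_additions = (8 * 8 + 4 * 4) * 3    # 80 averaging operations, 3 additions each
halfpel_mad = scale(ops_per_point(BLOCK), 8)
interp_additions = 16 * 17 * 1 * 2 + 17 * 17 * 3
interp_divisions = 16 * 17 * 1 * 2 + 17 * 17 * 1

print(tss)       # (6400, 6400, 6375)
print(fs)        # (57600, 57600, 57375)
print(proposed)  # (3024, 3024, 2997)
print(halfpel_mad, interp_additions, interp_divisions)  # (2048, 2048, 2040) 1411 833
```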

V. EXPERIMENTAL RESULTS AND DISCUSSIONS

We compare the performance of the proposed method with that of the conventional BMAs and MPEG SM3. For the experiment, commonly used 720 x 480 CCIR 601 format images are used as test sequences: the “Table Tennis” and “Susie” sequences consisting of 29 frames each, the “Flower Garden” sequence consisting of 31 frames, the “Football” and “Popple” sequences consisting of 150 frames each, and the “Mobile” sequence consisting of 136 frames. For the quantitative evaluation, the PSNR defined by

$$ \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\dfrac{1}{PQ} \sum_{p} \sum_{q} \left[ g(p, q) - g_r(p, q) \right]^2} \quad \text{(dB)} $$

is used, where P x Q denotes the image size, and g(p, q) and g_r(p, q) are the pixel values at a position (p, q) of the original and reconstructed images, respectively. In addition, a subjective comparison of the difference images between the original and reconstructed images is also made. Since the proposed method has the same displacement refinement steps as the TSS, the TSS and FS are also simulated for the performance comparison.

In our experiments, 8 x 8 and 16 x 16 subblocks are employed. The maximum search displacement is assumed to be 7 and only the forward motion search is incorporated. The half-pel search is also simulated and the resulting improvement of the PSNR performance, compared to the integer-pel search, is demonstrated. Also, to analyze the performance of the proposed method having the 4-level hierarchy, it is compared to that of the FS with a 16 x 16 subblock. Furthermore, by combining the proposed method with MPEG SM3, we show the efficiency of the proposed hierarchical technique.
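The PSNR measure can be computed as in the following Python sketch. It assumes 8-bit images with a peak value of 255 and the usual mean-squared-error form of the PSNR; the function name psnr and the list-of-lists image representation are hypothetical choices for the example.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized gray-level images.

    Returns infinity when the images are identical (zero error).
    """
    rows, cols = len(original), len(original[0])
    squared_error = sum(
        (original[p][q] - reconstructed[p][q]) ** 2
        for p in range(rows)
        for q in range(cols)
    )
    if squared_error == 0:
        return math.inf
    mse = squared_error / (rows * cols)
    return 10 * math.log10(peak * peak / mse)


# Hypothetical 2 x 2 example: every pixel is off by 5, so the MSE is 25.
a = [[100, 100], [100, 100]]
b = [[105, 95], [95, 105]]
print(round(psnr(a, b), 2))  # 34.15
```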

A. Experimental Results with 3-Level Hierarchy (8 x 8 Subblock)

The average PSNR for each sequence is summarized in Table II, in which Prop. n_c indicates the proposed method with n_c candidate motion vectors. As shown in Table II(a), Prop. 1 shows performance comparable to that of the TSS (with a difference of -0.5 to +0.5 dB depending on the test sequence), but its computational complexity is about half of that of the TSS. The proposed method with n_c = 1, whose refinement step corresponds to that of the TSS, shows better results for the “Mobile” and “Popple” sequences, in which a relatively regular motion and a rotational motion exist, respectively. Fig. 2 shows difference images for each method between the original and reconstructed images, which are magnified by a factor of 8 for easy comparison. Difference images by the various fast algorithms look similar, and in the reconstructed image by the TSS, the letters in the calendar are not satisfactorily reconstructed because of the local minima effect. On the other hand, the proposed method determines the global motion vector well based on the lowpass-filtered images at the higher level; thus, it effectively overcomes the local minima problem.

The larger n_c is, the higher the possibility of settling at the global minimum rather than a local one, and the better the quality of the reconstructed images. We observe experimentally that the PSNR of the proposed method approaches that of the FS as n_c becomes large. This effect is more remarkable when the performance difference between the TSS and FS is large. Fig. 3 shows the average PSNR of the proposed method as a function of n_c, where HFS, HTSS, and Hprop. denote the FS, TSS, and the proposed hierarchical method, with the half-pel search, respectively. The numbers on the vertical axis are the maximum and minimum PSNR values resulting from the six methods. Note that the FS and TSS are independent of n_c.

TABLE II
AVERAGE PSNR (dB) OF THE PROPOSED METHOD WITH 3-LEVEL HIERARCHY. (a) 8 x 8 SUBBLOCK; (b) 16 x 16 SUBBLOCK

(a)
Accuracy     Method   Table T.  Flower G.  Susie  Popple  Football  Mobile
Integer-pel  TSS      26.91     26.45      37.71  26.86   25.73     23.35
Integer-pel  Prop. 1  26.57     25.90      37.61  27.10   25.53     24.01
Integer-pel  Prop. 2  26.96     27.21      38.33  27.91   26.16     25.65
Integer-pel  Prop. 5  27.26     27.92      38.64  28.63   26.63     26.60
Integer-pel  Prop. 9  27.31     28.08      38.79  28.87   26.77     26.75
Integer-pel  FS       27.39     28.13      38.94  29.12   26.83     27.04
Half-pel     TSS      27.71     27.30      39.03  27.55   26.34     24.40
Half-pel     Prop. 1  27.50     26.80      38.95  28.02   26.14     25.50
Half-pel     Prop. 2  27.90     28.25      39.75  28.86   26.83     27.45
Half-pel     Prop. 5  28.07     29.07      40.04  29.62   27.37     28.59
Half-pel     Prop. 9  28.10     29.31      40.10  29.88   27.54     28.74
Half-pel     FS       28.12     29.31      40.12  29.98   27.59     28.86

(b)
Accuracy     Method   Table T.  Flower G.  Susie  Popple  Football  Mobile
Integer-pel  TSS      26.29     26.13      37.41  26.03   24.22     23.89
Integer-pel  Prop. 1  26.42     26.25      37.56  26.46   24.26     25.05
Integer-pel  Prop. 2  26.56     26.63      37.87  26.92   24.49     25.50
Integer-pel  Prop. 5  26.63     26.83      38.01  27.24   24.63     25.64
Integer-pel  Prop. 9  26.64     26.85      38.05  27.33   24.67     25.67
Integer-pel  FS       26.69     26.87      38.16  27.48   24.70     25.73
Half-pel     TSS      27.08     27.07      38.97  26.90   24.76     25.22
Half-pel     Prop. 1  27.35     27.21      39.13  27.47   24.82     26.72
Half-pel     Prop. 2  27.44     27.60      39.40  27.89   25.06     27.14
Half-pel     Prop. 5  27.48     27.82      39.45  20.22   25.22     27.28
Half-pel     Prop. 9  27.48     27.86      39.47  28.32   25.26     27.30
Half-pel     FS       27.50     27.88      39.50  28.38   25.28     27.32

Fig. 2. Difference image between the original and reconstructed images (“Mobile,” Y component). (a) FS. (b) TSS. (c) Proposed method (n_c = 1). (d) Proposed method (n_c = 9).

Fig. 3. Average PSNR of the proposed method as a function of the number of candidate motion vectors (8 x 8). (a) “Table Tennis.” (b) “Flower Garden.” (c) “Susie.” (d) “Popple.” (e) “Football.” (f) “Mobile.” [Plots of PSNR (dB) versus n_c = 1, ..., 9 for FS, HFS, TSS, HTSS, Prop., and Hprop.]

From Fig. 3, it is observed that the PSNR of the proposed method increases from that of the TSS to that of the FS as n_c increases. When n_c is equal to the maximum value of 9, for all test sequences, the PSNR of the proposed method approaches that of the FS. In this case, the computational complexity is nearly 9 times as high as that of the proposed method with a single candidate, or equivalently about half of that of the FS. With a proper n_c, better PSNR performance than that of the TSS is obtained with the computational complexity greatly reduced compared to that of the FS. For example, with two candidate motion vectors, for all test sequences, the PSNR performance of the proposed algorithm is midway between those of the TSS and FS, with its computational complexity comparable to that of the TSS.

The results of the half-pel search are also listed in Table II(a). Compared to the performance with the integer-pel search, on the average, the PSNR performance with the half-pel search increases by about 1 dB. The PSNR increases nonlinearly toward that of the FS as n_c becomes large. With 9 candidate motion vectors, the PSNR of the proposed technique approaches that of the FS (note that the difference is 0.001 to 0.1 dB).
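As a rough check of the complexity figures quoted above, the subtraction counts of Section IV can be used (3,024 per subblock for the proposed method with a single candidate, 6,400 for the TSS, and 57,600 for the FS), ignoring the one-time cost of pyramid construction. This is only a back-of-the-envelope estimate under those assumptions:

```latex
\begin{aligned}
n_c = 9:&\quad 9 \times 3024 = 27216, \qquad \frac{27216}{57600} \approx 0.47 \quad \text{(about half of the FS)} \\
n_c = 2:&\quad 2 \times 3024 = 6048, \qquad \frac{6048}{6400} \approx 0.95 \quad \text{(comparable to the TSS)}
\end{aligned}
```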

B. Experimental Results with 3-Level Hierarchy (16 x 16 Subblock)

The results with a 16 x 16 subblock listed in Table II(b) show a tendency similar to that observed in Table II(a). The PSNR of the proposed method with n_c = 2 is greater than that of the TSS by half of the PSNR difference between those of the TSS and FS, with its computational complexity comparable to that of the TSS. Contrary to the 8 x 8 subblock case, note that the PSNR of the proposed method with a single candidate is greater than that of the TSS. With a 16 x 16 subblock, 16 pixels are incorporated to calculate the MAD for the coarse motion detection at level 2, while with an 8 x 8 subblock only 4 pixels are used. Thus, with a 16 x 16 subblock, the possibility of detecting a local minimum at the highest level is decreased, which results in the improved PSNR of the proposed algorithm compared to the TSS. Note that the PSNR of the proposed method with a 16 x 16 subblock is less than that with an 8 x 8 subblock by 0.5 to 2 dB, depending on the test sequence. Of course, the subblock size provides a tradeoff between the PSNR performance measure and the data amount required for motion vector transmission. Table II(b) also shows the result of the half-pel search with a 16 x 16 subblock. The overall performance has a tendency similar to that with an 8 x 8 subblock.

C. Experimental Results with 4-Level Hierarchy (16 x 16 Subblock)

We extend the proposed algorithm to the 4-level hierarchy; thus, the maximum displacement is 15. The performance of the proposed 4-level algorithm is compared to that of the FS with a 16 x 16 subblock, as shown in Table III. Table III shows the average PSNR of the integer-pel search for several test sequences. In Table III(a), Prop. n_c represents the proposed 4-level algorithm with n_c candidate motion vectors at level 3. At levels 2, 1, and 0, for each candidate motion vector, a single motion vector having the minimum error is selected among 9 vectors as a candidate motion vector, and it is propagated to the next lower level. In Table III(b), Prop. n_c1 n_c2 denotes the proposed 4-level algorithm in which there are n_c1 and n_c2 candidate motion vectors at levels 3 and 2, respectively. At levels 1 and 0, each candidate motion vector is propagated to the next level as previously described for the case in Table III(a). Note that, as shown in Table III, the proposed 4-level algorithm gives better performance than the FS for the “Football” sequence, which contains large and abrupt motions. With the other test sequences, it shows a tendency similar to that observed with the proposed 3-level algorithm (see Table II). The technique of increasing the number of candidate vectors can also be applied to the TSS, resulting in performance improvement. But its PSNR is smaller than that of the proposed method, since the proposed method can search the global minimum well.
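The maximum displacements quoted for the 3-level and 4-level hierarchies follow from the ±1 refinement at each level, since a one-pixel step at level L corresponds to 2^L pixels at level 0. Writing N for the number of levels (a symbol introduced here only for this worked check):

```latex
d_{\max} = \sum_{L=0}^{N-1} 2^{L} = 2^{N} - 1,
\qquad N = 3:\; 1 + 2 + 4 = 7,
\qquad N = 4:\; 1 + 2 + 4 + 8 = 15 .
```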

TABLE III
AVERAGE PSNR (dB) OF THE PROPOSED METHOD WITH 4-LEVEL HIERARCHY (8 x 8). (a) CASE WHICH HAS MULTIPLE CANDIDATE MOTION VECTORS ONLY AT LEVEL 3, (b) CASE WHICH HAS MULTIPLE CANDIDATE MOTION VECTORS AT LEVELS 3 AND 2

(a)
Method    Table T.  Flower G.  Susie  Popple  Football  Mobile
FS        27.19     28.33      38.05  28.31   26.21     25.79
Prop. 1   26.48     24.62      37.16  26.36   24.97     24.00
Prop. 2   26.85     27.08      37.57  27.16   25.69     24.97
Prop. 3   26.95     27.74      37.69  27.47   25.97     25.21
Prop. 4   27.01     27.93      37.72  27.64   26.08     25.32
Prop. 5   27.03     28.02      37.77  27.78   26.14     25.38
Prop. 6   27.05     28.06      37.80  27.77   26.17     25.42
Prop. 7   27.06     28.08      37.83  27.81   26.20     25.46
Prop. 8   27.07     28.10      37.86  27.83   26.21     25.48
Prop. 9   27.07     28.11      37.88  27.85   26.22     25.49

(b)
Method    Table T.  Flower G.  Susie  Popple  Football  Mobile
FS        27.19     28.33      38.05  28.31   26.21     25.79
Prop. 12  26.61     25.02      37.43  26.68   25.02     24.38
Prop. 15  26.71     25.46      37.64  27.01   25.19     24.66
Prop. 19  26.75     25.69      37.74  27.15   25.24     24.76
Prop. 22  26.94     27.46      37.84  27.45   25.84     25.28
Prop. 25  27.01     27.68      37.98  27.69   25.94     25.45
Prop. 29  27.03     27.76      38.04  27.79   25.96     25.49
Prop. 52  27.11     28.26      37.96  27.94   26.19     25.57
Prop. 55  27.14     28.31      38.03  28.06   26.23     25.65
Prop. 59  27.15     28.31      38.06  28.11   26.24     25.67
Prop. 92  27.14     28.30      38.03  28.03   26.24     25.63
Prop. 95  27.15     28.32      38.06  28.12   26.27     25.67
Prop. 99  27.16     28.32      38.06  28.14   26.27     25.67

D. Application Experiments to MPEG SM3

For the performance analysis of the proposed hierarchical BMA, the proposed method is combined with the MPEG SM3 [10]-[12]. Four test sequences with the CCIR 601 format (luminance resolution: 720 x 480, chrominance resolution: 360 x 240) are converted to an MPEG SM3 source input format (SIF) (luminance resolution: 352 x 240, chrominance resolution: 176 x 120) by using a subsampling filter [10]. The “Edited” and “Football” sequences consisting of 145 frames each, and the “Popple” and “Mobile” sequences consisting of 133 frames each, are used. They have relatively large and complex motions compared to the common head-and-shoulder images. The “Edited” sequence has scene changes at every 30th frame; that is, it consists of 5 different subsequences. In the experiments, it is assumed that one group of frames (GOF) consists of 15 frames: 1 intraframe, 4 predicted frames, and 10 interpolated frames. The motion search has half-pel accuracy. The fixed channel transmission rate is set to 1.15 Mbps, the transmission buffer capacity is limited to 120 Kbits, and the initial buffer state is assumed to be 30 Kbits. Table IV shows the average PSNR of the MPEG SM3 with various motion search methods, where Prop. n_c represents the proposed method with n_c candidate motion vectors.

TABLE IV
AVERAGE PSNR (dB) OF THE MPEG SM3 COMBINED WITH VARIOUS SEARCH METHODS

Image Sequence  TSS    Prop. 1  Prop. 2  Prop. 5  Prop. 9  FS
Edited          28.87  28.96    28.98    29.03    28.99    29.10
Football        26.35  26.50    26.49    26.60    26.52    26.70
Popple          29.89  30.01    30.07    30.10    30.12    30.35
Mobile          23.56  23.63    23.65    23.67    23.68    23.74

As seen in Table IV, the average PSNR of the TSS is smaller than that of the FS. On the other hand, the average PSNR of the proposed method approaches that of the FS as n_c becomes large. Thus, implemented in the MPEG SM3, the proposed algorithm with a proper n_c shows performance comparable to that of the FS with its computational requirement greatly reduced.


VI. CONCLUSION

In this paper, a hierarchical motion vector search algorithm based on the mean pyramid is proposed, which has motion refinement steps similar to those of the TSS. Based on the mean pyramid, the proposed hierarchical motion estimation technique greatly reduces the computational complexity by introducing hierarchical refinement steps for the motion vectors. The performance degradation, relative to the FS, of the TSS and of the proposed method with a single candidate motion vector is due to finding a local minimum rather than the global one. The hierarchical structure and several candidate motion vectors reduce the possibility of finding a local minimum of the error measure, resulting in performance improvement. Thus, the proposed hierarchical BMA closely approaches the FS in terms of the PSNR with its computational complexity greatly reduced. With a single candidate motion vector, the proposed hierarchical BMA shows a PSNR comparable to that of the TSS. With two candidate motion vectors, the proposed hierarchical BMA gives a larger PSNR than the TSS, by one half of the difference between those of the FS and TSS. With 9 candidate motion vectors, the proposed technique approaches the FS in terms of the PSNR measure, with its computational requirement being approximately one half of that of the FS.

With the half-pel search, the performance of the proposed method approaches that of the FS more rapidly, and the performance improvement over the integer-pel search is nearly the same as for the other search methods. Also, the proposed 4-level algorithm shows performance similar to that of the FS, but its computational complexity is much lower than that of the FS. Furthermore, it is observed experimentally that the MPEG SM3 combined with the proposed hierarchical search algorithm shows performance comparable to the MPEG SM3 combined with the FS, with greatly reduced computational requirement.

Furthermore, by repeatedly using the same search unit, the proposed algorithm can be easily implemented in hardware. The proposed method is a motion search algorithm which achieves better performance with reduced computational complexity; thus, it can be used as a fast BMA which effectively reduces the local minima effect.

REFERENCES

[1] A. K. Jain, “Image data compression: A review,” Proc. IEEE, vol. 69, pp. 349-389, Mar. 1981.
[2] A. N. Netravali and J. O. Limb, “Picture coding: A review,” Proc. IEEE, vol. 68, pp. 366-406, Mar. 1980.
[3] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984, pp. 252-338.
[4] J. R. Jain and A. K. Jain, “Displacement measurement and its application in interframe image coding,” IEEE Trans. Commun., vol. COM-29, pp. 1799-1808, Dec. 1981.
[5] T. Koga et al., “Motion-compensated interframe coding for video conferencing,” in Proc. Nat. Telecom. Conf., Nov./Dec. 1981, pp. G5.3.1-G5.3.5.
[6] M. Bierling, “Displacement estimation by hierarchical blockmatching,” in Proc. SPIE Conf. Visual Commun. Image Process. ’88, Cambridge, MA, Nov. 1988, vol. 1001, pp. 942-951.
[7] K. W. Chun and J. B. Ra, “Fast block-matching algorithm by successive refinement of matching criterion,” in Proc. SPIE Conf. Visual Commun. Image Process. ’92, Boston, MA, Nov. 1992, vol. 1818, pp. 552-560.
[8] K.3. Seo and J.-K. Kim, “Hierarchical motion vector estimation using the geometric relation of adjacent motion vectors,” in Proc. 1991 Korean Signal Process. Conf., Taegu, Korea, Sept. 1991, vol. 4, pp. 11-14.
[9] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. COM-31, pp. 337-345, Apr. 1983.
[10] ISO CD 11172-2 rev 1, “Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s,” Nov. 1991.
[11] ISO/IEC JTC1/SC2/WG11, “MPEG video simulation model three,” MPEG90/041, July 1990.
[12] CCITT SGXV Working Party XV/4, Description of Ref. Model 7 (RM7), CCITT SGXV Doc. #446, 1988.

Kwon Moon Nam was born in Kangnung, Korea, on July 24, 1968. He received the B.S. and M.S. degrees in electronic engineering from Sogang University, Seoul, Korea, in 1991 and 1993, respectively. Currently, he is with Samsung Electronics Co. Ltd. His research interests include image coding and computer vision.

Joon-Seek Kim was born in Seoul, Korea, in 1963. He received the B.S., M.S., and Ph.D. degrees in electronic engineering from Sogang University, Seoul, Korea, in 1987, 1989, and 1993, respectively. He joined the Department of Electronic Engineering of Hoseo University, Chungnam, Korea, in 1994, where he is currently an Assistant Professor. His major research interests are image coding, computer vision, and pattern recognition.

Rae-Hong Park (S’7&M’84) was born in Seoul, Korea, in 1954. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1976 and 1979, respectively, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1981 and 1984, respectively. He joined the Department of Electronic Engineering of Sogang University, Seoul, Korea, in 1984, where he is currently a Professor. In 1990, he spent his sabbatical year at the Computer Vision Laboratory of the Center for Automation Research, University of Maryland, College Park, MD, as a Visiting Associate Professor. His current research interests are image communication, computer vision, and pattern recognition.

Young Serk Shim (S’79-M’82) received the B.S.E.E. degree in 1976 from Seoul National University, and the M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, in 1978 and 1982, respectively. From 1983 to 1989, he was with Kyungpook National University, Taegu, Korea, as an Assistant Professor and Associate Professor, where he worked in the field of signal compression and transmission. From 1990 to 1994, he was with the Korea Academy of Industrial Technology, where he was involved in the design and development of high definition television systems. Since 1995, he has been with the Institute for Advanced Engineering. His current interests and responsibilities include the design and evaluation of digital video compression and transmission systems. Dr. Shim is a member of the Korea Institute of Telematics and Electronics and the Korea Institute of Communication Sciences.
