A High Performance Algorithm for Fast Block Based Motion Estimation

Alexis M. Tourapis, Oscar C. Au*, Ming L. Liou**
Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
Email: [email protected], [email protected]*, [email protected]**
Tel.: +852 2358-7053*

Abstract

In this paper we present a new fast motion estimation algorithm for video coding. We show that the new algorithm is not only much faster than traditional algorithms, but in some cases can also achieve better visual quality, even compared with the “optimal” but computationally intensive “full search” algorithm.

Keywords: fast motion estimation, circular, zonal

1. Introduction

Coding of video sequences has been the focus of a great deal of research. Motion estimation and compensation techniques exploit the temporal correlation that exists between frames of a video sequence. In most video coding standards, such as MPEG-1/2 and ITU-T H.261/H.263, motion estimation reduces temporal redundancy, which leads to high compression. Block matching motion estimation is the most widely used technique because of its simplicity. The current frame is divided into square blocks of pixels, and for each block we search a previous frame for the block that is closest to it according to a predetermined criterion. This block is then used as a predictor for the current one, and the displacement between the two blocks defines the motion vector associated with the current block. The distortion measure is typically the mean absolute error (MAE), also called the mean absolute difference (MAD), because it requires no multiplications and performs similarly to the mean square error (MSE). The MAD for a block of size M×M in the current frame, compared to a block in the previous frame displaced by (i, j), is given by:

$$\mathrm{MAD}(i, j) = \frac{1}{M^2} \sum_{x, y = 1}^{M} \left| I_t(x, y) - I_{t-1}(x + i, y + j) \right| \qquad (1)$$
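As an illustration, the MAD criterion can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `mad`, the row-major `frame[y][x]` list layout, and the top-left block coordinates `(bx, by)` are assumptions made for the example.

```python
def mad(cur, ref, bx, by, i, j, M):
    """Mean absolute difference between the MxM block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (i, j).

    Frames are plain row-major lists (frame[y][x]); this layout is an
    assumption of the sketch, not something the paper specifies.
    """
    total = 0
    for y in range(M):
        for x in range(M):
            total += abs(cur[by + y][bx + x] - ref[by + y + j][bx + x + i])
    return total / (M * M)


# Tiny 4x4 example: `ref` holds the content of `cur` shifted one pixel right.
cur = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15]]
ref = [[0] + row[:-1] for row in cur]

same_position = mad(cur, ref, 0, 0, 0, 0, 2)   # no displacement
shifted_right = mad(cur, ref, 0, 0, 1, 0, 2)   # displacement (1, 0)
```

With these toy frames the displaced candidate matches the block exactly, while the co-located candidate does not, which is the behavior the criterion is meant to capture.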

If a maximum displacement of p pixels/frame is allowed, there are (2p+1)^2 locations to search for the best match of the current block. The algorithm that examines all of these locations is called the brute force exhaustive search, or full search (FS), and it is very computationally intensive. On average it can use up to 80% of the computational power of the encoder. Even though many hardware systems can currently implement FS in real time, these systems are expensive and are mainly used for video production, not for other real-time applications such as video conferencing. In addition, even though it is considered optimal by many, FS is not optimal in all senses. It does not consider other factors that could affect coding performance, such as the rate required by the motion vectors.

Many algorithms have been developed to overcome the computational cost of FS, but they usually make the search faster at the cost of a significant loss in visual quality. Examples include the 3-step search [1], the new 3-step search [3], and the 2D-log search [2]. Most of these fast algorithms, if not all, do not take into account the bits required to encode the motion vectors, and thus are not necessarily the best alternatives. An additional problem of FS, and of most fast algorithms, is that the motion vectors of adjacent blocks are not considered during the search, so the estimated motion vectors can follow a rather chaotic pattern. This can create problems when transmitting in noisy environments, since it is then hard to predict the motion vectors and make the necessary corrections.

Recently, some new algorithms were proposed that take into account the bit requirement of the motion vectors during motion estimation, and that achieve high computational speedup ratios with essentially the same, and in some cases better, visual quality than FS [4,5]. These algorithms can still be significantly improved by combining them with other methods. In this paper, we propose a new algorithm called Half Stop Circular Zonal Search (HSCZS), which significantly improves the Circular Zonal Search (CZS) [5] by introducing one more criterion when moving from one zone to the next.
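As a baseline for the speedups discussed in this section, the full search can be sketched as below. This is a hedged illustration only: the function names, the list-based frame layout, the synthetic `scene` pattern, and the choice to skip candidates that fall outside the reference frame are all assumptions of the example rather than details taken from the paper.

```python
def mad(cur, ref, bx, by, i, j, M):
    """MAD between the MxM block of `cur` at (bx, by) and the block of
    `ref` displaced by (i, j). Frames are row-major lists: frame[y][x]."""
    total = 0
    for y in range(M):
        for x in range(M):
            total += abs(cur[by + y][bx + x] - ref[by + y + j][bx + x + i])
    return total / (M * M)


def full_search(cur, ref, bx, by, M, p):
    """Check every displacement in [-p, p] x [-p, p] that stays inside `ref`.

    Returns (best motion vector, its MAD, number of positions checked).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, checked = None, float("inf"), 0
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            inside = (0 <= bx + i and bx + i + M <= width and
                      0 <= by + j and by + j + M <= height)
            if not inside:
                continue  # assumption: out-of-frame candidates are skipped
            checked += 1
            cost = mad(cur, ref, bx, by, i, j, M)
            if cost < best_cost:
                best_mv, best_cost = (i, j), cost
    return best_mv, best_cost, checked


# Synthetic 8x8 frames (hypothetical test pattern): the content of `cur`
# appears in `ref` shifted by (1, 2).
def scene(x, y):
    return 3 * x * x + 5 * y * y + x * y

cur = [[scene(x, y) for x in range(8)] for y in range(8)]
ref = [[scene(x - 1, y - 2) for x in range(8)] for y in range(8)]

mv, cost, checked = full_search(cur, ref, bx=3, by=3, M=2, p=2)
```

In this example every candidate lies inside the frame, so `checked` equals (2p+1)^2 = 25 for p = 2, and the search recovers the displacement (1, 2) with zero MAD.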
The proposed algorithm is faster, if not much faster, than most traditional motion estimation algorithms, and the PSNR of HSCZS can be as good as, and in some cases better than, that of the “optimal” full search. Due to the structure of the algorithm, the motion vectors produced by HSCZS tend to be more regular than those of FS.

This work was funded by grants HKTIIT92/93.001 and CRC98/01.EG05.

2. Circular Zonal Search (CZS) [5]

In most cases, especially in video conferencing sequences, the center of the search area is the most likely location of the optimal match, due to the center-biased property. The remaining search points have decreasing likelihood of being optimal as they move farther away from the center. It is also very likely for a block to have a motion vector that is close, if not equal, to the motion vector of the adjacent block. Since motion vectors are differentially encoded in most common standards, such as MPEG-1/2 and ITU-T H.261/H.263, the closer the motion vectors of adjacent blocks are, the fewer bits they need. This allows us to spend more bits on coding the residue signal, thus improving the overall quality. CZS tries to take advantage of these facts. As shown in CZS [5], probably the best way to search for the best matching block is a circular approach, starting from the center and moving outwards. We can consider all search positions at a given distance from the center to be equiprobable, so they should be scanned together.

However, one problem of CZS is that not all blocks possess the center-biased property. In those cases there is no real speedup over FS, and the method needs to be improved. By including one more criterion in CZS it is possible to solve this problem and increase the motion estimation speed significantly, with little loss in visual quality.

[Figure: grid of search positions from -7 to +7 in each direction, each labelled with its zone number, from 1 at the center to B at the corners.]

Fig. 1. Definition of circular zones in a ±7 search window.
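The zone definition can be sketched in code. The rule used here, zone number = Euclidean distance from the center rounded to the nearest integer, plus one, is an inference: it appears consistent with the zone labels of Fig. 1 (1 at the center, B = 11 at the corners of a ±7 window), but the text does not state it explicitly. The function name `circular_zones` is hypothetical.

```python
import math


def circular_zones(p):
    """Group all displacements in a +/-p window into circular zones.

    Assumed rule (inferred, not stated in the paper):
        zone = round(sqrt(i^2 + j^2)) + 1
    Returns a dict mapping zone number -> list of (i, j) displacements.
    """
    zones = {}
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            # int(d + 0.5) rounds to the nearest integer for non-negative d.
            zone = int(math.sqrt(i * i + j * j) + 0.5) + 1
            zones.setdefault(zone, []).append((i, j))
    return zones


zones = circular_zones(7)
scan_order = [mv for z in sorted(zones) for mv in zones[z]]
```

Scanning `scan_order` from the front visits the center first and then each ring of roughly equidistant positions in turn, which is the center-outwards order the section describes.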

In this method, instead of selecting the optimal block according to the search criterion, we select one that is close to optimal (Fig. 1) by using a thresholding criterion. This method always favors the center and the locations closest to it. When the encoder finds a block that satisfies the threshold inside a search zone, the search stops without examining any of the remaining zones. In many cases the optimal block might be far from the center of the search, which leads to a large motion vector. Selecting a suboptimal block can therefore mean that the motion vectors are smaller, which may leave more bits for encoding the actual difference block. It is also possible to define multiple thresholds and/or different thresholding criteria for the different types of frames, which leads to several tradeoffs between speed and quality.

As mentioned before, motion vectors are differentially encoded. Thus, to increase performance, CZS also performs a small circular search around a predicted motion vector, taken as the motion vector of the best block found for the previous adjacent block (Fig. 2).

[Figure: the ±7 search window with two sets of circular zones: numbered zones around the center, and zones a to c around the motion vector predicted from the previous block. A label marks the minimum block with MAD < Thres.]

Fig. 2. Circular zones after considering the predicted MV

3. Half Stop Circular Zonal Search (HSCZS)

Most conventional fast algorithms are based on the assumption that the block distortion measure increases monotonically as the checking point moves away from the global minimum [2]. Even though this is not always the case [3], the assumption lets us define one more stopping criterion for the circular zonal search. Suppose a minimum has been found in zone i that does not yet satisfy our thresholding criterion. If this minimum has not been updated after examining the next n zones, then it is, with high probability, the real minimum. It is therefore a good choice to stop the search at that point and select that block, without waiting for the search to examine the rest of the zones. This criterion can be applied both to the zones around the predicted MV and to those around the center, without affecting the thresholding criteria. In this way the new algorithm can significantly improve on CZS, since the distortion is very unlikely to keep decreasing as we move farther away from the center or from the real motion vector.
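The two stopping rules can be combined in a short sketch. It is a simplified illustration under stated assumptions, not the authors' algorithm: it uses a single set of zones around the center only, zones are indexed by rounded Euclidean distance (an assumed rule), the distortion is supplied as a callable `cost(i, j)`, and the names `threshold` and `max_stale_zones` are hypothetical stand-ins for the thresholding and half-stop parameters.

```python
import math


def half_stop_zonal_search(cost, p, threshold, max_stale_zones):
    """Scan circular zones outwards from the center with two stop rules.

    1. Threshold rule: stop once the best cost so far is below `threshold`.
    2. Half-stop rule: stop if the best cost has not improved for more than
       `max_stale_zones` consecutive zones.
    Returns (best motion vector, its cost, number of positions checked).
    """
    # Assumed zone rule: index = rounded distance from the center.
    zones = {}
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            zone = int(math.sqrt(i * i + j * j) + 0.5)
            zones.setdefault(zone, []).append((i, j))

    best_mv, best_cost, best_zone = (0, 0), float("inf"), None
    checked = 0
    for z in sorted(zones):
        # Half-stop rule, tested before the zone is examined.
        if best_zone is not None and z - best_zone > max_stale_zones:
            break
        for (i, j) in zones[z]:
            c = cost(i, j)
            checked += 1
            if c < best_cost:
                best_mv, best_cost, best_zone = (i, j), c, z
        # Threshold rule, tested after the whole zone has been examined.
        if best_cost < threshold:
            break
    return best_mv, best_cost, checked


# Hypothetical distortion surface with its minimum (value 5) at (2, 1).
def toy_cost(i, j):
    return abs(i - 2) + abs(j - 1) + 5

half_stop_result = half_stop_zonal_search(toy_cost, p=7, threshold=1,
                                          max_stale_zones=2)
threshold_result = half_stop_zonal_search(toy_cost, p=7, threshold=7,
                                          max_stale_zones=2)
```

In the first call the threshold is never met, so the half-stop rule ends the search after the minimum has gone unimproved for two zones, well short of the 225 positions a full search would check. In the second call the looser threshold stops the search earlier at a near-optimal position closer to the center, which illustrates the speed and quality tradeoff.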

4. Algorithm for HSCZS

The proposed HSCZS algorithm estimates the motion vector MV of the current block as follows:

Step 1: If the current block is the leftmost block, set MVpredicted to (0,0). Otherwise, set MVpredicted to the motion vector of the previous block. If MVpredicted = (0,0), go to step 6. Set MinZone = -1.

(Circular search around predicted motion vector)

Step 2: Construct M circular zones around MVpredicted in the search window. Set i = 1.

Step 3: (Thresholding criterion) If (i - MinZone) > TZ1, go to step 12.

Step 4: Compute MAD for each search point in zone i. Let MinMAD be the smallest MAD found so far. Let MinZone be the zone where the smallest MAD has been found so far.

Step 5: If MinMAD < T1, go to step 12. Else if i [...]

[...] TZ2, go to step 12.

Step 8: Compute MAD for each search point in zone i. Let MinMAD be the smallest MAD found so far. Let MinZone be the zone where the smallest MAD has been found so far.

Step 9: If MinMAD < T2 or LAST = true, go to step 12.

Step 10: If T2 < MinMAD < T3, set LAST = true.

Step 11: If i [...]

[...] center biased. It can be seen that for the tennis sequence we can achieve a speedup of 10 times compared to FS with a very small degradation in quality, whereas CZS performs rather poorly. As in CZS, the gain in PSNR is possible because the motion vectors of adjacent blocks are closer to each other and, since they are differentially encoded, fewer bits are needed to code the motion vectors, leaving more bits to code the residue. With the new criterion, however, even more blocks are found to have motion vectors close to each other, and the search is much faster (about 3-4 times faster than CZS). We can also see that adjusting the thresholds gives different tradeoffs between speedup factor and quality. HSCZS can also perform even better at lower bit rates, because of the reduced motion vector overhead, which matters most at lower bit rates.

In summary, the proposed half stop circular zonal search (HSCZS) has the following characteristics:
1. it can achieve various speedup and quality tradeoffs by adjusting the thresholds,
2. a small gain in PSNR is possible at lower and in some cases higher speedup factors,
3. the loss in PSNR at higher speedup factors is negligibly small,
4. it performs better at lower bit rates, which are the bit rates of interest for video conferencing,
5. it is moderately fast even for non video conferencing sequences, without much degradation in quality.

6. References
