Trends in Block-matching Motion Estimation Algorithms

Jozef Huska and Peter Kulla

Department of Radioelectronics, Faculty of Electrical Engineering and Information Technology, Slovak University of Technology, Bratislava
[email protected],
[email protected]

Abstract

Motion estimation (ME) and motion compensation (MC) are widely used in interframe video coding, but real-time, high-quality motion estimation remains difficult because of its enormous computational load. Despite extensive research, the balance between the accuracy of motion estimation techniques and their computational complexity is still an open question. The ME algorithms presented here are block-matching algorithms that can be implemented in an MPEG-2 encoder, which is used mainly in the DVB and DVD standards. Many computationally efficient block-matching ME algorithms exist, but each trades algorithm accuracy against algorithm speed.

Introduction

The primary design objective in video compression is to minimize the average number of bits used to represent a video sequence in digital form while maintaining sufficient video quality. The high degree of redundancy that exists between successive frames of a video sequence makes it possible to achieve high compression ratios. In order to exploit the spatial and temporal redundancy in an image sequence, a number of techniques have been developed to minimize the amount of information required to faithfully reproduce the image sequence at the decoder. A high compression ratio is achieved using a lossy step at the expense of image quality. Video compression techniques rely on three principles: the reduction of spatial, temporal and statistical redundancies. Coding techniques that reduce spatial correlation are referred to as intraframe coding, whereas those that tackle temporal correlation are called interframe techniques.

All these techniques form the basis of the video encoder in the MPEG-2 standard. It is not the newest compression standard, but today it is the most suitable one in many applications. The choice of MPEG-2 for the DVD-Video and DVB-S, DVB-C and DVB-T standards contributed to its popularity. The secret of its success lies in a well-chosen balance between computational complexity and compression ratio.
Because the ISO standard specifies only the syntax, it is possible to improve the encoding efficiency by using new motion estimation algorithms. One way to increase the compression ratio is to lengthen the GOP sequence by inserting more B-frames. This requires a motion estimation algorithm that is as accurate as possible. The purpose of this paper is to guide the reader through today's motion estimation algorithms.

Motion estimation and compensation

Motion estimation techniques are applied to reduce temporal redundancy. A natural way to exploit the redundancy between frames is to determine, for the current frame t, a predicted frame from the frame t-∆t or from the frame t+∆t. Motion estimation and compensation are used to predict the frame t to be coded from its neighboring frames. Motion compensation relies on an estimate of the motion between two image frames. The motion is described by a motion field of motion vectors. Consequently, the prediction error is transmitted instead of the frame itself, as shown in Fig. 1. Along with the prediction error, the motion information is also transmitted to the decoder so that it can form the same prediction. Block-based motion representation offers a very good balance between motion overhead and prediction error. It uses one motion vector per macroblock.
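The prediction loop described above can be sketched in a few lines. This is a minimal illustration and not the MPEG-2 reference procedure: it assumes grayscale frames stored as lists of rows, frame dimensions that are multiples of the block size, and one (dy, dx) vector per block. The names `compensate`, `residual` and `reconstruct` are hypothetical, and the transform and quantization of the prediction error are left out.

```python
from typing import List, Tuple

Frame = List[List[int]]                      # grayscale frame as rows of pixels
MotionField = List[List[Tuple[int, int]]]    # one (dy, dx) vector per block


def compensate(reference: Frame, motion: MotionField, block: int) -> Frame:
    """Build the predicted frame by copying displaced blocks from the reference."""
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for by, row in enumerate(motion):
        for bx, (dy, dx) in enumerate(row):
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    # Clamp so vectors pointing outside the frame reuse edge pixels
                    # (an assumption of this sketch).
                    ry = min(max(y + dy, 0), height - 1)
                    rx = min(max(x + dx, 0), width - 1)
                    predicted[y][x] = reference[ry][rx]
    return predicted


def residual(current: Frame, predicted: Frame) -> Frame:
    """Prediction error that is sent instead of the frame itself."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]


def reconstruct(predicted: Frame, error: Frame) -> Frame:
    """Decoder side: prediction plus received error restores the frame."""
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(predicted, error)]
```

Without quantization, `reconstruct(compensate(ref, mv, n), residual(cur, compensate(ref, mv, n)))` returns `cur` exactly. In a real coder the error is transformed and quantized first, so the reconstruction is only approximate.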
Fig. 1. Predictive source coding with motion compensation

Motion estimation is the estimation of motion between frames, while motion compensation is the exploitation of motion information for efficient interframe coding. Motion estimation has to provide accurate prediction while keeping the overhead information low. The designer has to trade off prediction error against motion information. For this reason, block-based motion estimation techniques, which provide a good compromise between prediction error and motion overhead, are still in use. The motion estimator is a critical part of the coder. It is shown in [6] that motion estimation consumes most of the processing time (more than 50%) in video compression. Image quality and bit rate are controlled at this step. The trade-off between spatial and temporal resolution is a function of the amount of motion within the image sequence, and this affects the bit rate.

Block-based motion representation

To carry out motion compensation, the motion of the moving objects has to be estimated first. The motion estimation technique commonly used in all standard video codecs is the block matching algorithm (BMA). It relies on the high correlation between neighboring pixels, so it is not necessary to assign a motion vector to each pixel; it is sufficient to determine one motion vector per block of pixels. In a typical BMA, a frame is divided into blocks of M × N pixels or, more usually, square blocks of N² pixels [4]. Then, for a maximum motion displacement of R pixels per frame, the current block of pixels is matched against the corresponding block at the same coordinates in the previous frame, displaced within a square window of width and height N+2R. The best match according to a matching criterion yields the displacement. Several parameters of the BMA affect performance and accuracy in motion estimation and compensation.
The first one is the matching function, which measures the distortion, or the match, between the block in the current frame and the displaced candidate block in the reference frame. The choice of a suitable criterion is very important because it affects both the prediction quality and the computational complexity of the algorithm. Possible choices include the normalized cross-correlation function (NCCF), the mean squared error (MSE) and the mean absolute difference (MAD) [6].

Another important parameter of the BMA is the block size. Two sizes are commonly used, 8x8 and 16x16. It can be seen in [6] that a smaller block size achieves better prediction quality at the expense of larger motion overhead.

The maximum allowed motion displacement, also known as the search range, has a direct impact on both the computational complexity and the prediction quality of the BMA. A small range results in poor compensation for fast-moving areas and consequently poor prediction quality. A large range, on the other hand, results in better prediction quality but leads to an increase in the computational complexity. A larger search range can also result in longer motion vectors and consequently a slight increase in motion overhead.

Search accuracy has a major influence on the prediction quality. The BMA was designed to estimate motion displacements with full-pixel accuracy, but half-pixel accuracy yields a prediction gain of 2 dB [6]. Further increases in accuracy do not bring a substantial prediction improvement but raise the computational complexity very quickly.

Searching procedures

The search strategy is an important issue in block matching. The first approach is full search. In searching for the best match, the correlation window is moved to each candidate position within the search window. In total, (2d+1)x(2d+1) positions need to be examined, where d is the maximum displacement (denoted R above). The minimum dissimilarity gives the best match. Full search is brute force in nature and delivers good accuracy in searching for the best match, but because of the large amount of computation involved, it is impractical for real-time encoding. Hence many fast algorithms have been developed. The best known are the 2-D logarithmic search, the coarse-fine three-step search, the conjugate direction search, and the diamond search and its combination with a multiresolution structure [6], [11]. Research on these "classical" algorithms is still active.
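The full search described above, using MAD as the matching criterion, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than an encoder implementation: frames are grayscale lists of rows, candidates that fall outside the reference frame are skipped, ties keep the first candidate found, and the names `mad` and `full_search` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def mad(current: Frame, reference: Frame, top: int, left: int,
        dy: int, dx: int, n: int) -> float:
    """Mean absolute difference between the n x n block of `current` at
    (top, left) and the block of `reference` displaced by (dy, dx)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(current[top + y][left + x]
                         - reference[top + dy + y][left + dx + x])
    return total / (n * n)


def full_search(current: Frame, reference: Frame, top: int, left: int,
                n: int, d: int) -> Tuple[Tuple[int, int], float]:
    """Examine every displacement in [-d, d] x [-d, d] and return the
    motion vector (dy, dx) with the smallest MAD, together with that MAD."""
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = (0, 0), float("inf")
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # Skip candidate blocks that are not fully inside the reference.
            if not (0 <= top + dy and top + dy + n <= height
                    and 0 <= left + dx and left + dx + n <= width):
                continue
            cost = mad(current, reference, top, left, dy, dx, n)
            if cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost
```

For an interior block, the two nested loops visit exactly (2d+1)x(2d+1) candidates, each costing N² absolute differences, which is why the exhaustive approach becomes expensive as the search range grows.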
The latest improvements to these algorithms are represented by threshold multiresolution block matching [4], the octagonal search procedure [9] and optimization of the inner diamond search [10]. These "classical" search procedures rely on quantitative difference measures such as MAD and MSE when comparing block candidates. While the minimum MSE or minimum MAD criterion is meaningful and mathematically tractable, it fails to consider the element that is most important in many signal compression applications, namely the human perception of the signal [8].
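To illustrate how the fast "classical" searches reduce the number of examined positions, the following is a minimal sketch of a three-step search in its common textbook form. It is not necessarily the exact variant described in [6] or [11]. It assumes grayscale frames stored as lists of rows, a MAD criterion, an initial step of 4 pixels (covering displacements up to ±7), and a block whose zero-displacement position lies inside the frame. The names `block_mad` and `three_step_search` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def block_mad(current: Frame, reference: Frame, top: int, left: int,
              dy: int, dx: int, n: int) -> float:
    """Mean absolute difference for the n x n block displaced by (dy, dx)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(current[top + y][left + x]
                         - reference[top + dy + y][left + dx + x])
    return total / (n * n)


def three_step_search(current: Frame, reference: Frame, top: int, left: int,
                      n: int, first_step: int = 4
                      ) -> Tuple[Tuple[int, int], float, int]:
    """Return (motion vector, its MAD, number of positions examined)."""
    height, width = len(reference), len(reference[0])
    center = (0, 0)
    best_cost = block_mad(current, reference, top, left, 0, 0, n)
    examined = 1
    step = first_step
    while step >= 1:
        best = center
        # Test the eight neighbors of the current center at distance `step`.
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if oy == 0 and ox == 0:
                    continue  # the center was already evaluated
                dy, dx = center[0] + oy, center[1] + ox
                if not (0 <= top + dy and top + dy + n <= height
                        and 0 <= left + dx and left + dx + n <= width):
                    continue  # candidate block leaves the reference frame
                cost = block_mad(current, reference, top, left, dy, dx, n)
                examined += 1
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
        center = best   # move to the best position found so far
        step //= 2      # halve the step: 4, 2, 1
    return center, best_cost, examined
```

With an initial step of 4 the search examines at most 1 + 3·8 = 25 positions, compared with (2·7+1)² = 225 for a full search over the same ±7 range. The price is that the search may settle on a local minimum of the matching function, so the returned vector is not guaranteed to be the one a full search would find.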
Fig. 2. Compression block diagram: a) conventional hybrid motion-compensated DCT video coder, b) fully DCT-based motion-compensated video coder [12]

Better results in motion estimation can be achieved by using DCT-domain methods [1]. Motion estimation in the transform (DCT) domain has another benefit: the decoding part that is needed for motion estimation in the spatial domain is no longer required, as shown in Fig. 2. The first approach applies the MAD criterion in the DCT frequency domain by formulating the dctmad cost function [8]. Applying the MAD criterion in the DCT domain respects the properties of the human visual system, which is less sensitive to higher spatial frequencies.

Another approach to frequency-domain motion estimation is the DCT pseudophase technique [5], [7], [8]. The DCT pseudophase technique employs sinusoidal orthogonal principles to extract shift information from the pseudophases hidden in the DCT coefficients of images. It has a certain advantage over the full search block matching approach (FSBMA). The FSBMA searches for the best match between the current block and the reference blocks found within a search area. In DCT ME, by contrast, the search area is the same as the candidate block [3]. The computational complexity of DCT ME depends only on the block size [7].

Another very similar method is DCT-based phase correlation motion estimation [2]. Compared with the DCT pseudophase algorithm, it works on the original image signals without preprocessing. It uses only 4 of the 8 transforms required by DCT pseudophase ME.

Conclusions

This paper deals with trends in block-based motion estimation algorithms. There is a visible effort to replace spatial-domain ME with DCT-domain ME techniques. Motion estimation in the DCT domain requires fewer computations within video compression because the transform back into the spatial domain for motion estimation is not needed.

References

[1] Nikola Božinović, Janusz Konrad: Motion analysis in 3D DCT domain and its application to video coding, Signal Processing: Image Communication, Vol. 20, Issue 6, July 2005, p. 510-528
[2] Li, M., Biswas, M.: DCT-based phase correlation motion estimation, ICIP 2004, 24-27 Oct. 2004, p. 445-448
[3] Kwang-deok Seo, Jae-kyoon Kim: Fast motion vector re-estimation for transcoding MPEG-1 into MPEG-4 with lower spatial resolution in DCT-domain, Signal Processing: Image Communication, Vol. 19, Issue 4, April 2004, p. 299-312
[4] Mohammed Ghanbari: Standard Codecs: Image Compression to Advanced Video Coding, Institution of Electrical Engineers, 2003, p. 407
[5] J. Chen, U.-V. Koc: Design of digital video coding systems: a complete compressed domain approach, Marcel Dekker, New York, 2002
[6] Mohammed Ebrahim Al-Mualla, C. Nishan Canagarajah: Video Coding for Mobile Communications, Academic Press (an Elsevier Science imprint), 2002, p. 293
[7] Miia Viitanen, Pasi Kolinummi: Scalable DSP implementation of DCT-based motion estimation algorithm, Eurasip 2000
[8] Vinod Menezes, S.K. Nandy: Signal compression through spatial frequency-based motion estimation, Integration, the VLSI Journal, Vol. 22, Issues 1-2, August 1997, p. 115-135
[9] Lap-Pui Chau, Ce Zhu: A fast octagon-based search algorithm for motion estimation, Signal Processing, Vol. 83, Issue 3, March 2003, p. 671-675
[10] Ce Zhu, Xiao Lin, Lap-Pui Chau: Efficient inner search for faster diamond search, Signal Processing, Vol. 84, Issue 3, March 2004, p. 527-533
[11] Yun Q. Shi: Image and video compression for multimedia engineering: fundamentals, algorithms and standards, CRC Press LLC, 2000
[12] Ut-Va Koc, K. J. Ray Liu: Motion Compensation on DCT Domain, EURASIP Journal on Applied Signal Processing, 2001, Issue 3, p. 147–162

Acknowledgement

This contribution has been supported by the Slovak Ministry of Education under VEGA Grant No. G-1/3107/06.