An 8x8-Block Based Motion Estimation Using Kalman Filter V. Ruiz1,2, V. Fotopoulos2, A.N. Skodras1,2 and A.G. Constantinides3 1
Electronics Laboratory, University of Patras, Patras GR-26110, Greece Computer Technology Institute, P.O. Box 1122, Patras GR-26110, Greece 3 Signal Processing Section, DEEE, Imperial College, London SW7 2BT, UK
[email protected] 2
Abstract It is now quite common in the pel-recursive approaches for motion estimation, to find applications of the Kalman filtering technique both in time and frequency domains. In the block-based approach, very few approaches are available of this technique to refine the estimation of motion vectors resulting from fast algorithms such as the three step on a 16x16-block basis. This paper proposes an 8x8-block based motion estimation which uses the Kalman filtering technique to improve the motion estimates resulting from both the three step algorithm and the 16x16-block based Kalman application of [22]. The statespace representation uses a first order auto-regressive model. Comparative results obtained for different classes of video sequences are presented.
1. Introduction In the field of motion estimation for video coding many techniques have been applied [1-5]. It is now quite common to see the Kalman filtering technique and some of its extensions to be used for the estimation of motion within image sequences. Particularly in the pixel-recursive approaches, which suit very much the Kalman formulation, one finds various ways of applying this estimation technique both in the time and frequency domains. On a very general perspective, we find use of Kalman filter (KF), which implies linear state-space representations and the extended Kalman filter (EKF), which uses the linearised expressions of non-linear statespace formulations [6-7]. Moreover, the parallel extended Kalman filter (PEKF) which consists of a parallel bank of EKF’s [8], is often encountered in practice. In the block-based motion-compensated prediction approaches, the most common procedure is the block-
matching technique [3-5]. Given a macroblock in the current frame, the objective is to determine the block in a reference frame (one of the past for forward motion estimation or one of the future for backward motion estimation) in a certain search area that “best” matches the current macroblock according to a specific criterion. In most coding systems, the “best” match is found using the mean absolute error (MAE) criterion. There are several well known algorithms that perform the block matching motion estimation, among them being the full search algorithm (FSA) [3-5] that determines the motion vector of a macroblock by computing the MAE at each location in the search area. This is the simplest method, it provides the best performance, but at a very high computational cost. To reduce this computational requirements, several heuristic search strategies have been developed, as for example the two-dimensional logarithmic search, the parallel one-dimensional search, etc [3-5]. These are often referred to as fast search algorithms. In fast algorithms the procedures applied for motion estimation are of significantly lower complexity, but yield a suboptimal solution in the sense that they may not avoid the convergence of the MAE cost function to a local minimum instead of the global one. Lately, some new fast strategies [9-13] as well as motion estimation [14-20] have been proposed. But, very few applications are available of Kalman filtering for the estimation of motion vectors resulting from a 16x16-block based approach (16x16-KF), [21-22]. This paper proposes an 8x8-block based motion estimation using Kalman filtering (8x8-KF) to improve the motion vector estimates resulting from both the conventional three step algorithm (TSA) [3-5] and the 16x16-block based Kalman filter proposed in [22]. Section 2 introduces the state-space representation for the motion vector. The corresponding Kalman equations are
given in section 3. The comparative results obtained for different classes of video sequences are presented in section 4.
2. State-space representation The scanning in a frame is from the top left to the bottom right. The motion vector of a macroblock can be predicted from that of its left spatial neighbour. The measured motion vectors are obtained through a conventional three step procedure. In the same manner as in [22], the intraframe motion estimation process is modelled through an auto-regressive model which produces the state-space equations. From this macroblock-based representation how do we define the 8x8-block based representation? Each 16x16-block yields a zig-zag sequence of four 8x8-blocks.
1
2
3
4
As the motion vector components are independent b1=a2=0 and a1 equals b2 for our particular application. The Kalman filter updates, i.e. essentially corrects the predicted motion estimate V(k/k-1) given the previous measurement at time index k-1, according to the new measured motion vector Y(k)=y(k) and the measurement equation: Y( k) =V( k) +N( k)
(2)
where N(k) is the measurement noise vector with covariance matrix R. The noise components nx, ny are assumed independent and Gaussian distributed with zeromean and same variance r. It is observed that the measured motion vectors are actually obtained from the TSA run on a 16x16-block basis that yields the zig-zag sequence of four measured motion vectors Y(k) with same values on the 8x8-block basis. This means that the Kalman filter has four measurements instead of one to adjust the motion estimate V(k/k) when the assumption of smooth changes is not strictly valid. As a result, it is expected that we have a better motion estimate for the 8x8-motion compensation procedure.
3. Kalman filter
16x16-pixel block
resulting 8x8-block zig-zag sequence
This corresponds to a conventional pixel decimation for block matching in an 8x8-bock. The motion vector of these sub-blocks is defined as Vt = (vx, vy), where vx and vy denote the horizontal and vertical components, respectively. These two components are assumed independent. The motion vector of an 8x8-block can be predicted (through KF) from the one of the previous 8x8-block according to the time index k of the zig-zag order using the state equation: V( k+ 1 )=F ( k) V ( k) +W( k)
(1)
where W(k) is the state noise vector, i.e. the prediction error with covariance matrix Q. The state noise components wx, wy are assumed independent and Gaussian distributed with zero-mean and same variance q. The matrix F(k) is the transition matrix which describes the dynamic behaviour of the motion vector from one 8x8-block to the next: a F (k ) = 1 a2
b1 . b2
From the above state-space model (eq. 1, 2) the consecutive motion vectors V(k/k-1) and V(k/k), with error covariance matrix P, are recursively estimated given the past measurement Y(k-1) and present measurement Y(k) through the Kalman filter. Based upon this very basic state-space representation for the motion, the conventional Kalman equations are implemented as follows: Update with the new measurement Y(k): • Compute the Kalman gain K( k) =P( k/k-1 ) .{P ( k/k-1 ) +R( k) } - 1
(3)
• Update the motion estimate with the measurement V( k/k) =V( k/k-1 ) +K( k) {Y( k) -V( k/k-1 ) } (4) • Update the error covariance matrix of the motion P( k/k)= {I -K( k)}.P ( k/k-1 ) (5) where I the unitary matrix. Prediction: • Motion vector prediction V ( k+ 1 /k) =F( k) V ( k/k)
(6)
• Prediction error P ( k+ 1 /k) = F( k) P( k/k) F t ( k) +Q( k)
(7)
t
where F (k) denotes the transpose matrix of F(k).
4. Results For comparison purposes, the Full Search Algorithm (FSA), the Three Step Algorithm (TSA), the 16x16-block based Kalman Filter (16x16-KF) presented in [22] and the proposed 8x8-block based Kalman Filter (8x8-KF) are run for different classes of video sequences. We use 50 frames of the following sequences: ‘Alkistis’, ‘Carphone’, ‘Foreman’ and a sub-sampled sequence of ‘News’. It is observed that all the above listed techniques can be qualified as sub-optimal techniques. Even the full search algorithm does not give the real motion. Hence, in any case the results expected are very close in terms of average PSNR (see Table 1). As in [22], the algorithm using the Kalman filtering on a 16x16-block basis provides a greater average peak signal to noise ratio than the one resulting from the conventional three step algorithm. As expected the 8x8block based proposed approach results in an even greater PSNR better than both the TSA and 16x16-KF. For the particular state-space representation used, when the real motion corroborate the assumptions made regarding, only translational motion with very smooth changes, the 8x8-block based motion estimation using Kalman filter is even better than the one resulting from the full search algorithm, as one can see from the average PSNR values and the curves in Figure 1. The computational complexity of each of the above stated algorithms is given in Table 2. The complexity is evaluated in terms of number of operations per block (NOPB). The formulation of the complexity uses N=16 and p=7, where N defines the size of a NxN-block and p the maximum displacement within the search area. The full search requires (2p+1)2 search locations and the three step 25. The Kalman filter is usually of high complexity. But for our particular application, that uses an intentionally simple state-space model for the motion, it is observed from Table 2 that the Kalman filter implementation for refining the motion estimates resulting from the TSA, does not significantly increase the computational complexity. Indeed, due to the basic formulation of the motion, the Kalman filter equations can be largely simplified and thus reduced to their scalar form. Subsequently, any expensive matrix calculation is avoided.
5. Conclusion The above presented results are encouraging, in the sense that with the appropriate state model and a priori assumptions close to the real motion vector behaviour, we are able through Kalman filtering to have a greater PSNR than the other techniques for any frame of the sequence. It is evident that a Kalman filter using such simple
formulation as in this paper is more suitable for class A image sequences. At this point, there are few questions / alternatives that may be of interest: - Shall we consider sub-pixel accuracy? - Shall we leave the motion components independency assumption and consider the real correlation that exists between the displacement in x- and y-direction? - Shall we envisage more elaborate state-space representation for the motion that will imply implementation of extended Kalman filters and parallel Kalman filters? Any of the above consideration is feasible. Regarding the usefulness of implementing such approaches, this will be a matter of trade-off among the resulting motion estimates, the computational complexity and the real visual quality of the images.
References [1] H. G. Musmann, P. Pirsch and H.-J. Grallert, ‘Advances in picture coding’, Proc. IEEE, Vol. 73, No. 4, pp. 523-548, 1985. [2] I. Aksu, F. Ildiz and J. B. Burl, ‘A comparison of the performance of image motion operating on low signal to noise ratio images’, Proc. of the 34th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 1124-1128, NY 1992. [3] A. Murat Tekalp, ‘Digital video processing’, Prentice-Hall, 1995. [4] V. Bashkaran and K. Konstantinides, ‘Image and video compression standards: algorithms and architectures’, Kluwer Academic Publishers, 1995. [5] K. R. Rao and J. J. Hwang, ‘Techniques and standards for image, video and audio Coding’, Prentice-Hall, 1996. [6] G. Tziritas, ‘Motion analysis for image sequence coding’, Elsevier Science 1994. [7] N. M. Namazi, P. Penafiel and C. M. Fan, ‘Nonuniform image motion estimation using Kalman filtering’, IEEE Transactions on Image Processing, Vol. 3, No. 5, pp. 678683, 1994. [8] J. B. Burl, ‘A reduced order extended Kalman filter for sequential images containing a moving object’, IEEE Transactions on Image Processing, Vol. 2, No. 3, pp. 285295, 1993. [9] A. Kundu, ‘Motion estimation by image content matching and application to video processing’, Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 96, Vol. 4, pp. 1903-1906, 1996. [10] K. W. Cheng and S. C. Chan, ‘Fast block matching algorithms for motion estimation’, Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 96, Vol. 4, pp. 2313-2316, 1996. [11] S. Kim, J. Chalidabhongse and C.-C. Jay Kuo, ‘A new stochastic block matching algorithm (SBMA) for video coding based on modified 3-step search’, Proc. of the Int. Conf. on ASSP, ICASSP 96, Vol. 4, pp. 2301-2304, 1996. [12] G. Sinevriotis and T. Stouraitis, ‘A computationally efficient gradient search block matching algorithm for the motion estimation of image sequences’, Proc. of the 13th
13th Int. Conf. on Digital Signal Processing, DSP 97, Vol. 2, pp. 1131-1134, 1997. [20] V. Ruiz and A.N. Skodras, ‘Motion estimation through approximated densities’, Proc. of the 13th Int. Conf. on Digital Signal Processing, DSP 97, Vol. 2, pp. 805-808, 1997. [21] C.-M Kuo, C.-H Hsieh, H.-C. Lin and P.-C Lu, ‘Motion estimation algorithm with Kalman filter’, Electronics Letters, Vol. 30, No. 15, pp. 1204-1205, 1994. [22] C.-M. Kuo, C.-H. Hsieh, Y.-D. Jou, H.-C. Lin and P.-C. Lu, ‘Motion estimation for video compression using Kalman filtering’, IEEE Transactions on Broadcasting, Vol. 42, No. 2, pp. 110-116, 1996.
Int. Conf. on Digital Signal Processing, DSP 97, Vol. 2, pp. 1127-1130, 1997. [13] F. Yu and A. N. Willson, ‘A flexible hardware-oriented fast algorithm for motion estimation’, Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 97, Vol. 4, pp. 2681-2684, 1997. [14] K. Panusopone and K. R. Rao, ‘Efficient motion estimation for block-based video compression’, Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 97, Vol. 4, pp. 2677-2680, 1997. [15] V. Bhaskaran, K. Konstantinides and B. Natarajan, ‘Motion estimation using a computation-constrained criterion’, Proc. of the 13th Int. Conf. on Digital Signal Processing, DSP 97, Vol. 1, pp. 229-232, 1997. [16] J. C. Brailean and A. K. Katsaggelos, ‘Simultaneous recursive displacement estimation and restoration of noisyblurred image sequences’, IEEE Transactions on Image Processing, Vol. 4, No. 9, pp. 1236-1251, September 1995. [17] J. C. Brailean and A. K. Katsaggelos, ‘A recursive nonstationary MAP displacement vector field estimation algorithm’, IEEE Transactions on Image Processing, Vol. 4, p. 416, April 1995. [18] C. Cafforio, E. Di Sciascio and C. Guaranella, ‘Motion estimation and region segmentation via functional optimization’, Proc. of the 13th Int. Conf. on Digital Signal Processing, DSP 97, Vol. 2, pp. 1123-1126, 1997. [19] R. Carpentieri, ‘A bridge between block matching and semantic (model based) motion estimation’, Proc. of the
Table 1: Comparative results on the average PSNR
Sequences Alkistis Carphone Foreman News
Average PSNR dB TSA 16x16-KF 30.664 30.670 33.578 33.585 32.235 32.240 27.246 27.257
FSA 30.769 33.793 32.649 27.686
8x8-KF 30.821 33.672 32.342 27.414
Table 2: Computational complexity of the algorithms
Sequences FSA TSA 16x16-KF 8x8-KF
Addition Multiplication 2 (2p+1) (2N -1) = 114975 0 2 25(2N -1) = 12775 0 TSA+10add = 12785 TSA+10mult = 10 TSA+4x10add = 12815 TSA+4x10mult = 40 2
Square 0 0 TSA+0 = 0 TSA+0 = 0
Absolute (2p+1)2N2 = 57600 25N2 = 6400 TSA+0 = 6400 TSA+0 = 6400
NOPB 172575 19175 19195 19255
Figure 1: PSNR plots for Alkistis’ sequence
PSNR dB
32
30
FSA TSA~16x16-KF 8x8-KF
28
26 25
30
35
40 Frame No
45
50