A New Optimal Frequency Motion Estimation Algorithm
F. Essannouni, Y. Hadi, R. Oulad Haj Thami, D. Aboutajdine and A. Salam
Abstract— Motion estimation techniques are widely used in today's video processing systems. The most frequently used techniques are spatial block matching methods and differential methods. In this paper, we study the problem from a different viewpoint. We show that a new and simple frequency-domain algorithm with optimal accuracy exists. Our method is based on the Fourier correlation theorem and the fast Fourier transform. The proposed method greatly outperforms previous frequency-domain motion estimation methods and achieves a significant speed-up over the direct full search block-matching algorithm.
F. Essannouni, Y. Hadi and D. Aboutajdine are with GSCM, UFR IT, Faculté des Sciences, Université Mohamed V-Agdal, B.P. 1014 Rabat, Maroc.
R. Oulad Haj Thami is with Laboratoire SI2M, Equipe WiM ENSIAS, Université Mohamed V-Souissi, Rabat, Maroc.
A. Salam is with Laboratoire de Mathématiques Pures et Appliquées, Université du Littoral-Côte d'Opale, C.U. de la Mi-Voix, 50 rue F. Buisson, B.P. 699, 62228 Calais Cedex, France.

I. INTRODUCTION
Motion estimation plays a key role in many video applications, such as frame-rate video conversion, video retrieval, video surveillance, and video compression. The key issue in these applications is to define appropriate representations that can efficiently support motion estimation with the required accuracy. Block-matching motion estimation (BME), an efficient and popular method, has been widely adopted in many video processing and analysis applications. The simplest BME is the full search (FS) algorithm, which gives the globally optimal motion solution, i.e. the minimum matching error point, by evaluating all the candidates within the search window. However, because of its substantial computational load, it is not practical for many applications, especially those that operate in real time or under power constraints. To overcome this drawback, many fast spatial block matching algorithms have been proposed [1]–[4]. Another category, and the subject of our focus, consists of algorithms that use Fourier techniques for motion vector estimation. Recently there has been a lot of interest in frequency motion estimation techniques. These techniques offer well-documented advantages in terms of computational efficiency due to the use of fast Fourier transform algorithms. Perhaps the best-known method in this class is the phase correlation method [5]. This method has become one of the global motion estimation methods of choice for a wide range of broadcasting applications [6], [7]. In local motion estimation, however, the phase correlation method has not achieved the same level of success. It has been shown to give spurious solutions, and it is far less powerful
than the full search block matching algorithm in terms of motion estimation accuracy [8], [9]. Many other frequency techniques that outperform the standard phase correlation method were recently proposed [10]–[13]. Although these algorithms are computationally more efficient than the full search block matching algorithm, they do not match its quality. Thus the application of frequency methods to local motion estimation has shown limitations in comparison with fast block matching algorithms. In our previous work [14], we addressed optimal motion estimation using only the frequency domain. There we showed that the sum of squared differences (SSD) can be written as a new surface that can be computed using two real cross-correlation operations. In this paper, we show the existence of a new and faster frequency technique that yields exactly the same optimal result as the direct full search (FS) under the SSD metric, using only one real cross-correlation per block. The key idea is a new method for computing the sums of squares of all blocks using the fast Fourier transform in one operation for the whole frame, so that the computation of the SSD function amounts to only one cross-correlation per block. The rest of the paper is structured as follows. In Section II, the proposed frequency motion estimation technique is detailed. The principles of three state-of-the-art frequency motion estimation algorithms are outlined in Section III. In Section IV we present experimental results showing that the suggested approach is a promising solution. Conclusions are drawn in Section V.

II. THE PROPOSED FREQUENCY MOTION ESTIMATION
A. Mathematical notations
The full search technique was originally described by Jain and Jain [15]. Each image frame is divided into a fixed number of blocks, usually square, of size B × B.
For each block $g$ in the current frame $I_c$, a search is made in the reference frame $I_r$ over a search area $f$ defined by a fixed-size search window $\pm w$. The search looks for the best matching block, i.e. the one giving the least prediction error, usually by minimizing either the sum of absolute differences (SAD) or the sum of squared differences (SSD). The latter can be expressed as follows:

\[ SSD(d_x, d_y) = \sum_{l=0}^{B-1} \sum_{k=0}^{B-1} \left( f(k + d_x, l + d_y) - g(k, l) \right)^2, \tag{1} \]

where $(d_x, d_y)$ is the motion vector candidate. For simplicity of notation, we assume that $d_x$ and $d_y$ are in $[0..2w]$. Note that $d_x$ between $0$ and $w$ indicates a negative displacement, while $d_x$ between $w$ and $2w$ indicates a positive one, and the same holds for $d_y$. Let:

\[ T_1(d_x, d_y) = \sum_{l=0}^{B-1} \sum_{k=0}^{B-1} f^2(k + d_x, l + d_y), \tag{2} \]

\[ T_2(d_x, d_y) = -2 \sum_{l=0}^{B-1} \sum_{k=0}^{B-1} f(k + d_x, l + d_y)\, g(k, l), \tag{3} \]

\[ T_3 = \sum_{l=0}^{B-1} \sum_{k=0}^{B-1} g^2(k, l). \tag{4} \]

Then the SSD metric can be written as:

\[ SSD(d_x, d_y) = T_1(d_x, d_y) + T_2(d_x, d_y) + T_3. \tag{5} \]

Let $(i, j)$ be the position of the left corner of the search window $f$ in the reference image $I_r$, therefore:

\[ f(k, l) = I_r(i + k, j + l). \tag{6} \]

Define the function $S(x, y)$ as:

\[ S(x, y) = \sum_{l=0}^{B-1} \sum_{k=0}^{B-1} I_r^2(k + x, l + y). \tag{7} \]

Fig. 1. Relative computational complexity of the proposed method versus direct full search (improvement = FS/OPcorr, plotted against the search window w for B = 8, 16, 24 and 32).
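The decomposition of the SSD into the three terms of (2)-(4) can be checked numerically. The following is only a minimal sketch under stated assumptions, not the authors' code: the helper names `ssd_direct` and `ssd_terms`, the random test data, and the array layout (rows index l, columns index k) are all choices made for this illustration.

```python
import numpy as np

def ssd_direct(f, g, dx, dy):
    """Equation (1): SSD between block g and the patch of f at offset (dx, dy)."""
    B = g.shape[0]
    patch = f[dy:dy + B, dx:dx + B]  # assumed layout: rows = l, columns = k
    return float(np.sum((patch - g) ** 2))

def ssd_terms(f, g, dx, dy):
    """Equations (2)-(4): the three terms T1, T2 and T3."""
    B = g.shape[0]
    patch = f[dy:dy + B, dx:dx + B]
    t1 = float(np.sum(patch ** 2))        # energy of the candidate patch
    t2 = float(-2.0 * np.sum(patch * g))  # cross term (a correlation)
    t3 = float(np.sum(g ** 2))            # energy of the block, independent of (dx, dy)
    return t1, t2, t3

# Hypothetical test data: one B x B block and its (B + 2w) x (B + 2w) search area.
rng = np.random.default_rng(0)
B, w = 8, 4
f = rng.integers(0, 256, size=(B + 2 * w, B + 2 * w)).astype(np.float64)
g = rng.integers(0, 256, size=(B, B)).astype(np.float64)

# Equation (5) holds for every candidate displacement in [0..2w] x [0..2w].
for dy in range(2 * w + 1):
    for dx in range(2 * w + 1):
        assert np.isclose(ssd_direct(f, g, dx, dy), sum(ssd_terms(f, g, dx, dy)))
```

Only T1 and T2 depend on the candidate displacement, which is why the rest of the derivation concentrates on computing these two terms efficiently.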
It then follows that:

\[ T_1(d_x, d_y) = S(d_x + i, d_y + j). \tag{8} \]

Thus the SSD metric can be expressed as below:

\[ SSD(d_x, d_y) = S(d_x + i, d_y + j) + T_2(d_x, d_y) + T_3. \tag{9} \]

The last term $T_3$ is independent of the motion vector $(d_x, d_y)$. Therefore, minimizing the SSD metric amounts to minimizing:

\[ S(d_x + i, d_y + j) + T_2(d_x, d_y). \tag{10} \]

In the next section we show an optimal approach for computing the function $S(x, y)$ and the term $T_2$ using only the fast Fourier transform algorithm.

B. The proposed approach
The term $T_2(d_x, d_y)$ can be viewed as a spatial correlation, which can be efficiently computed using the fast Fourier transform (FFT). Let $G$ be the FFT of $g$, $F$ the FFT of $f$, and IFFT() the inverse fast Fourier transform. The term $T_2(d_x, d_y)$ can be computed as below:

\[ T_2(d_x, d_y) = -2\, \Re\left( IFFT\left( F(u, v)\, G^*(u, v) \right) \right), \tag{11} \]

where $\Re$ denotes the real part of a complex number and the asterisk denotes complex conjugation. Note that $f$ and $g$ are correlated with FFTs by zero-padding $g$ to the size of $f$ prior to taking the forward FFTs. Since the cross-correlation performed this way is cyclic, the last $B - 1$ rows and $B - 1$ columns of the result contain wrap-around data that should be discarded. This result is not new and has been widely used to speed up the computation of the SSD metric; see for instance [16]. The method proposed there combines the spatial and the frequency domains in its computation of the SSD metric. It uses the fast Fourier transform (FFT) for calculating the term $T_2(d_x, d_y)$ and a novel data structure called the Windowed-Sum-Squared-Table for computing the sum square blocks. The approach here is different. In this paper, we propose to use the FFT algorithm again for computing the sum square blocks, which can be derived from the function $S(x, y)$ (see (8)). We remark that the function $S(x, y)$ can be written as:

\[ S(x, y) = \sum_{l=0}^{H-1} \sum_{k=0}^{W-1} I_r^2(k + x, l + y)\, m(k, l), \tag{12} \]

where:

\[ m(k, l) = \begin{cases} 1 & \text{for } 0 \le k < B \text{ and } 0 \le l < B, \\ 0 & \text{for } B \le k < W \text{ or } B \le l < H, \end{cases} \]

and $H$ and $W$ are the numbers of pixels in the vertical and horizontal directions of the reference image $I_r$ respectively. This remark is crucial in our paper. Indeed, from (12), the function $S(x, y)$ can also be considered as a spatial correlation. Therefore, using the Fourier correlation theorem, the function $S(x, y)$ can again be computed using the FFT algorithm:

\[ S(x, y) = \Re\left\{ IFFT\left( R(u, v)\, M^*(u, v) \right) \right\}, \tag{13} \]

where $R(u, v)$ and $M(u, v)$ denote respectively the FFT of $I_r^2(x, y)$ and the FFT of $m(x, y)$. Finally, the motion vector $(d_x, d_y)$ can be determined by minimizing (10), which can be computed using (11) and (13).

C. Running time analysis
The frequency block matching algorithm that we have presented consists of two steps. In the first step we compute the function $S(x, y)$ using the FFT algorithm. This step takes only $O(WH \log_2(WH))$ per frame. In the second step, we perform a per-block correlation. The latter has a running time close to $O((2w + B)^2 \log_2(2w + B))$ per block. Therefore the total running time for this step is:

\[ O\left( (2w + B)^2 \log_2(2w + B)\, \frac{WH}{B^2} \right). \]

A precise statement of the total running time of the proposed algorithm is then:
\[ O\left( WH \left( \frac{(2w + B)^2 \log_2(2w + B)}{B^2} + \log_2(WH) \right) \right). \tag{14} \]

Note that this computation time can be greatly reduced by using machine-specific optimized FFT implementations, which are widely available. Figure 1 shows the improvement obtained by using the proposed method over the FS for different values of B and w.

III. PREVIOUS FREQUENCY MOTION ESTIMATION
The existing frequency motion estimation techniques are mainly used for global motion estimation. They can also be applied to local motion estimation as block-based approaches. The best-known and most popular frequency technique is the phase correlation method. In this section we review the principle of this method and of two other correlation techniques that have recently appeared in the literature. The methods considered are based on the correlation methodology in the frequency domain and as such have a computational complexity similar to that of our scheme.

A. Phase correlation
The phase correlation method [5] capitalizes on the well-known Fourier shift theorem, which states that shifts in the spatial domain correspond to linear phase changes in the Fourier domain. Its formula between two images $f_1$ and $f_2$ is given in [5] by:

\[ P(x, y) = IFFT\left\{ \frac{F_2(u, v)\, F_1^*(u, v)}{\left| F_2(u, v)\, F_1^*(u, v) \right|} \right\}, \tag{15} \]

where:
• IFFT is the inverse Fourier transform.
• $F_1$ and $F_2$ denote the Fourier transforms of the two images $f_1$ and $f_2$.
• $F_1^*$ denotes the conjugate of $F_1$.
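The phase correlation surface of (15) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the function name `phase_correlation`, the small constant `eps` that guards against division by zero, and the synthetic cyclic shift used as test data are assumptions introduced here.

```python
import numpy as np

def phase_correlation(f1, f2, eps=1e-12):
    """Equation (15): inverse FFT of the normalized cross-power spectrum."""
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)
    cross = F2 * np.conj(F1)
    # eps is an assumption of this sketch; it avoids dividing by zero
    # at frequencies where the cross-power spectrum vanishes.
    return np.real(np.fft.ifft2(cross / (np.abs(cross) + eps)))

# Hypothetical test: f2 is f1 shifted cyclically by 3 rows and 5 columns,
# i.e. the ideal pure translation assumed by the method.
rng = np.random.default_rng(1)
f1 = rng.random((32, 32))
f2 = np.roll(f1, shift=(3, 5), axis=(0, 1))

P = phase_correlation(f1, f2)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(P), P.shape))
print(peak)  # (3, 5): the peak location gives the (row, column) shift
```

The example uses an exact cyclic translation, for which the surface is essentially a single impulse. Blocks taken from real frames do not satisfy this assumption.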
For block-based motion estimation, the phase correlation technique works by computing the phase correlation function (15) between two regions of image data, usually in the form of co-sited rectangular blocks in the current and the next frame of a video sequence [9], [17]–[19]. However, in local motion estimation, the phase correlation method fails in several cases. Indeed, the mathematical analysis of this technique assumes an ideal translation, which does not hold between two compared blocks taken from real frames.

B. Robust correlations
1) Orientation correlation: Orientation correlation estimates the motion between two images f1 and f2 by correlating their orientation images [10]. Each pixel in an orientation image f1d is a complex number that represents the orientation of the intensity gradient. This method is based on Andrews' wave M-estimator [20], which makes it statistically robust.
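The idea of correlating orientation images can be illustrated with a short sketch. Several details below are assumptions made for this illustration and should be checked against [10]: the orientation image is taken to be the complex gradient normalized to unit magnitude (and set to zero where the gradient vanishes), the gradient is approximated with cyclic central differences, and the names `orientation_image` and `orientation_correlation` are hypothetical.

```python
import numpy as np

def orientation_image(img):
    """Complex image whose phase encodes the intensity gradient orientation."""
    img = img.astype(np.float64)
    # Assumed gradient operator: cyclic central differences.
    gx = (np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)) / 2.0
    gy = (np.roll(img, -1, axis=0) - np.roll(img, 1, axis=0)) / 2.0
    z = gx + 1j * gy
    mag = np.abs(z)
    out = np.zeros_like(z)
    nonzero = mag > 0
    out[nonzero] = z[nonzero] / mag[nonzero]  # unit magnitude, zero where flat
    return out

def orientation_correlation(f1, f2):
    """Correlate the two orientation images through the FFT."""
    D1 = np.fft.fft2(orientation_image(f1))
    D2 = np.fft.fft2(orientation_image(f2))
    return np.real(np.fft.ifft2(D2 * np.conj(D1)))

# Hypothetical test: f2 is f1 shifted cyclically by 2 rows and 7 columns.
rng = np.random.default_rng(2)
f1 = rng.random((32, 32))
f2 = np.roll(f1, shift=(2, 7), axis=(0, 1))

surface = orientation_correlation(f1, f2)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(surface), surface.shape))
```

Because every pixel contributes a unit-magnitude term regardless of gradient strength, the surface depends on gradient direction only, which is the property the robust formulation relies on.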
2) Fast robust correlations: A generalization of robust correlation using Andrews' wave M-estimator has been proposed in [13]. This approach works by expressing the matching surface in terms of many cross-correlations, which can be computed using the fast Fourier transform. In the simulations shown in [13], the authors used only one cross-correlation operation, with a scale factor a1 taken as the inverse of the median distribution of the error data. However, even if this method can outperform the standard cross and phase correlation methods, it gives a suboptimal result when compared with the full search block matching algorithm (FSBMA) (for more details about this method and its implementation, see the video coding section in [13]).

IV. SIMULATION RESULTS
Experiments are undertaken on three CIF (352 × 288) video sequences, Bus, Carphone, and Tempete, and three QCIF (176 × 144) video sequences, Mobile, Mother, and Foreman. Only the luminance channel is considered. The frames are subdivided into equal blocks of size B, and each current block is searched for in the reference image within a search range ±w using the proposed frequency method as detailed in Section II. For all test sequences and for different values of B and w, the proposed frequency method gives exactly the same motion vectors as the direct full search block matching algorithm under the SSD metric. Consequently, the proposed frequency method gives an optimal solution, unlike the existing frequency methods. In Figures 2 and 3, we compare our proposed frequency method (FM), the phase correlation method (PC) [9], the orientation correlation method (OC) [10], and the fast robust correlation (FRcorr) as implemented in [13] (video coding section). The local motion estimation performance was assessed by applying motion compensation between the current frame and the previous frame using the estimated motion parameters and computing the corresponding Peak Signal-to-Noise Ratio (PSNR).
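The agreement with direct full search can be reproduced on synthetic data with the sketch below. It is an illustration under stated assumptions, not the implementation used in the experiments: the function names, the random frames, the array layout (rows vertical, columns horizontal), and the restriction to blocks whose search window lies entirely inside the frame are all choices made here.

```python
import numpy as np

def block_sums_of_squares(ref, B):
    """Eqs. (12)-(13): S[y, x] = sum of ref**2 over the B x B block at (y, x),
    obtained for the whole frame with one FFT correlation against the mask m.
    Entries closer than B to the bottom or right edge contain wrap-around."""
    m = np.zeros(ref.shape)
    m[:B, :B] = 1.0
    R = np.fft.fft2(ref ** 2)
    M = np.fft.fft2(m)
    return np.real(np.fft.ifft2(R * np.conj(M)))

def fft_block_match(ref, cur, by, bx, B, w, S):
    """Minimize S + T2 (eq. 10) for the block at row by, column bx."""
    j, i = by - w, bx - w                    # top-left corner of the search area
    f = ref[j:j + B + 2 * w, i:i + B + 2 * w]
    g = np.zeros_like(f)
    g[:B, :B] = cur[by:by + B, bx:bx + B]    # zero-pad the block to the size of f
    corr = np.real(np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(g))))
    n = 2 * w + 1
    t2 = -2.0 * corr[:n, :n]                 # eq. (11), wrap-around part discarded
    cost = S[j:j + n, i:i + n] + t2          # indexed [dy, dx], both in [0..2w]
    dy, dx = np.unravel_index(np.argmin(cost), cost.shape)
    return int(dx) - w, int(dy) - w          # signed displacement

def full_search(ref, cur, by, bx, B, w):
    """Direct full search under the SSD metric (eq. 1)."""
    g = cur[by:by + B, bx:bx + B]
    best, best_mv = None, None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            patch = ref[by + dy:by + dy + B, bx + dx:bx + dx + B]
            ssd = np.sum((patch - g) ** 2)
            if best is None or ssd < best:
                best, best_mv = ssd, (dx, dy)
    return best_mv

# Hypothetical frames: cur[y, x] = ref[y - 2, x + 3], so the true motion
# vector of every interior block is (dx, dy) = (3, -2).
rng = np.random.default_rng(0)
H, W, B, w = 32, 40, 8, 4
ref = rng.random((H, W))
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))

S = block_sums_of_squares(ref, B)            # one pass for the whole frame
for by in range(B, H - B - w + 1, B):
    for bx in range(B, W - B - w + 1, B):
        mv_fft = fft_block_match(ref, cur, by, bx, B, w, S)
        assert mv_fft == full_search(ref, cur, by, bx, B, w) == (3, -2)
```

The function S is computed once per frame and reused by every block, so each block costs a single cross-correlation. With floating-point FFTs, candidates whose SSD values are nearly tied could in principle be ordered differently from the direct search, so a production implementation may want to round or compare with a tolerance.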
The block size used in these experiments is 16 × 16 pixels and the maximum displacement is ±8. From Figures 2 and 3, the PSNR comparison results highlight the fact that the proposed frequency motion estimation algorithm greatly outperforms the standard phase correlation method, the orientation correlation, and the fast robust correlation in terms of local motion estimation accuracy. (Note that for the Mother QCIF sequence (Figure 3(c)), the FM algorithm and the FRcorr have produced the same PSNR results.)

Fig. 2. PSNR results for a) Mobile, b) Foreman and c) Mother QCIF video sequences. (Plots of PSNR versus frame number for FM, FRcorr, OC and PC.)

Fig. 3. PSNR results for a) Carphone, b) Bus and c) Tempete CIF video sequences. (Plots of PSNR versus frame number for FM, FRcorr, OC and PC.)

V. CONCLUSION
In this paper we have presented an optimal block-based motion estimation method which, although computationally simple, can provide a solution to the problem of the overwhelming complexity of the direct full search. Note that the proposed algorithm can be made even faster by using machine-specific optimized FFT implementations, which are widely available. On the other hand, the proposed frequency method greatly outperforms the existing frequency motion estimation techniques in terms of accuracy while still keeping the same order of complexity.

REFERENCES
[1] R. Li, B. Zeng, and M.L. Liou, “A new three-step search algorithm for block motion estimation,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 4, pp. 438–442, August 1994.
[2] Y.C. Lin and S.C. Tai, “Fast full-search block-matching algorithm for motion-compensated video compression,” IEEE Transactions on Communications, vol. 45, no. 5, pp. 527–531, May 1997.
[3] Y.S. Chen, Y.P. Hung, and C.S. Fuh, “Fast block matching algorithm based on the winner-update strategy,” IEEE Transactions on Image Processing, vol. 10, no. 8, pp. 1212–1222, August 2001.
[4] T. Ahn, Y.H. Moon, and J.H. Kim, “An improved multilevel successive elimination algorithm for fast full-search motion estimation,” in ICIP03, 2003, pp. II: 351–354.
[5] C.D. Kuglin and D.C. Hines, “The phase correlation image alignment method,” in IEEE 1975 International Conference on Systems, Man and Cybernetics, September 1975, pp. 163–165.
[6] L. Hill and T. Vlachos, “On the estimation of global motion using phase correlation for broadcast applications,” in IEEE International Conference on Image Processing and its Applications (IPA 99), 1999, pp. 721–725.
[7] T. Vlachos, “Cut detection in video sequences using phase correlation,” SPLetters, vol. 7, no. 7, pp. 173–175, July 2000.
[8] L. Hill and T. Vlachos, “Global and local motion estimation using a higher order search,” in Meeting on Image Recognition and Understanding (MIRU), 2000, pp. 131–135.
[9] Y. Liang, “Phase-correlation motion estimation,” EE392J Project report, 2000.
[10] A.J. Fitch, A. Kadyrov, W.J. Christmas, and J.V. Kittler, “Orientation correlation,” in BMVC02, 2002, p. Matching/Recognition.
[11] V. Argyriou and T. Vlachos, “Estimation of sub-pixel motion using gradient cross-correlation,” IEEE Electronic Letters, vol. 39, pp. 980–982, 2003.
[12] V. Argyriou and T. Vlachos, “Using gradient correlation for sub-pixel motion estimation of video sequences,” in ICASSP04, 2004, pp. 329–331.
[13] A.J. Fitch, A. Kadyrov, W.J. Christmas, and J.V. Kittler, “Fast robust correlation,” IEEE Transactions on Image Processing, vol. 14, no. 8, pp. 1063–1073, August 2005.
[14] F. Essannouni, R. Oulad Haj Thami, A. Salam, and D. Aboutajdine, “A new fast full search block matching algorithm using frequency domain,” in IEEE ISSPA, Sydney, Australia, 2005.
[15] J.R. Jain and A.K. Jain, “Displacement measurement and its application in interframe image coding,” IEEE Transactions on Communications, vol. COM-29(12), pp. 1799–1806, 1981.
[16] S.L. Kilthau, M.S. Drew, and T. Moller, “Full search content independent block matching based on the fast Fourier transform,” in ICIP, 2002, pp. I-669–I-672.
[17] G.A. Thomas, “Television motion measurement for DATV and other applications,” Research Report, 1987.
[18] Y.M. Chou and H.M. Hang, “A new motion estimation method using frequency components,” JVCIR, vol. 8, no. 1, pp. 83–96, March 1997.
[19] T. Vlachos and G. Thomas, “Motion estimation for the correction of twin-lens telecine flicker,” in ICIP96, 1996, p. 16A4.
[20] P. Huber, Robust Statistics, John Wiley, New York, 1981.