A TEMPORAL REGULARIZER FOR LARGE OPTICAL FLOW ESTIMATION

A. Salgado and J. Sánchez

Computer Science Department, University of Las Palmas de G. C., 35017 Las Palmas de G.C., Spain
{asalgado, jsanchez}@dis.ulpgc.es

ABSTRACT

The aim of this work is to propose a model for computing the optical flow in a sequence of images with a spatio–temporal regularizer explicitly designed for large displacements. We study the introduction of a temporal regularizer that extends the information beyond two consecutive frames. We use the large optical flow constraint equation in the data term, the Nagel–Enkelmann operator for the spatial smoothness term and a newly designed temporal regularization. Our model is based on an energy functional that yields a partial differential equation (PDE). This PDE is embedded into a multipyramidal strategy to recover large displacements. The numerical experiments show that, thanks to this regularizer, the results are more stable and accurate.

Index Terms: Motion measurement, Optical velocity measurement, Variational methods

1. INTRODUCTION

In this paper we consider the problem of estimating the optical flow assuming that the objects may undergo large displacements. The difference with respect to other related approaches is that we exploit the temporal dimension of the sequence. Most of the better known methods only deal with the problem of estimating the optical flow between two frames, ignoring that the sequence comprises several images which are all related. We show in this paper that it is possible to integrate the temporal information to improve the results and provide more stable solutions.

We propose a variational technique in which an energy functional is minimized, yielding a diffusion–reaction PDE. This kind of energy–based approach has been widely used in optical flow estimation.
Horn and Schunck [7] propose to minimize the so–called optical flow constraint equation (OFC) together with a smoothing term depending on the optical flow gradient. Later, several authors proposed improvements to overcome the shortcomings of this method, as in [2], [5] and [6].

In order to compute large displacements, a common strategy is to use a multi–pyramidal decomposition, as in [2] and [8], in which each scale is represented by images of decreasing size. A different approach is that of Alvarez et al. [1], which uses a linear scale–space approach in the focusing strategy. The idea behind all these methods is the same: use the coarsest scales to compute rough estimates of the optical flow and use these as an approximation for finer scales. In a recent paper, Brox et al. [4] state that the latter and former approaches are equivalent.

The first works on spatio–temporal methods are due to Nagel [9] and Black and Anandan [3]. The former proposes an extension of his oriented smoothness operator that includes the temporal dimension. The latter propose to compute the motion incrementally, assuming acceleration in time. In order to make this method more robust to noise, the acceleration is averaged in time. More recently, Weickert and Schnörr [12] proposed a continuous model with a nonlinear convex spatio–temporal smoothness constraint. The spatial and temporal derivatives are formulated in a homogeneous way. Another related paper is due to Brox et al. [4]. In this case a Total Variation functional is used in the data and smoothing terms. The data term includes the large optical flow constraint equation and a gradient constancy term. The smoothness term is formulated continuously, with spatial and temporal derivatives treated in the same manner.

In this work we propose a novel temporal regularizing term which is explicitly designed to support large displacements. The previously mentioned spatio–temporal methods treat the spatial and temporal dimensions in the same way. We show here that, when large displacements are present, it is not suitable to use temporal derivatives. It is more convenient to separate the temporal smoothing term and formulate it in a different way.

In Sect. 2 we explain our method. In Sect. 3 we minimize this energy, and in Sect. 4 we show several experimental results with synthetic sequences, together with numerical comparisons that emphasize the performance of this new method. Finally, Sect. 5 presents the conclusions.

Thanks to Joachim Weickert for valuable comments on this work. This work has been partly supported by the Spanish Ministry of Science and Technology and FEDER through the research project TIC2003-08957.

1424404819/06/$20.00 ©2006 IEEE
ICIP 2006
2. THE METHOD
The optical flow, $h(x) = (u(x), v(x))^T$, is the apparent motion of pixels in a sequence of images. Horn and Schunck [7] were among the first to introduce a variational formulation for the computation of the optical flow. They proposed the so–called optical flow constraint equation (OFC), $\frac{dI(x,t)}{dt} = \nabla I \cdot h + I_t = 0$, which states that the image intensity remains constant through the sequence.

When large displacements are present in the scene, the OFC is not valid, and we need to formulate it in a different way. A typical approach is to use $I_1(x) - I_2(x + h(x)) = 0$. Since we have a set of images, a natural approach is

$$\sum_{i=1}^{N-1} \left( I_i(x) - I_{i+1}(x + h_i(x)) \right)^2 \tag{1}$$

where $N$ is the number of frames and $i$ stands for the temporal dimension.

Other approaches different from the quadratic form are commonly used. For instance, in [3] and [8] the authors propose robust functionals in order to decrease the effect of outliers. The data term may also be replaced by non–Lambertian functionals in order to deal with more complex image sequences, for instance [11], which is based on multimodal image matching.

Typically, the data term is accompanied by a smoothness term. Horn and Schunck [7] proposed to minimize $\|\nabla h\|^2 = \|\nabla u\|^2 + \|\nabla v\|^2$, which provides smooth solutions and yields an isotropic diffusion equation in the PDE. This model has been improved in recent years, and several authors have introduced different approaches in order to respect the image or flow discontinuities. A well known approach is that of Nagel–Enkelmann [9], which introduces an operator depending on the image gradient that enables anisotropic diffusion in order to respect the object contours. Other improvements are given by the methods explained in [1], [5] and [6]. We choose the Nagel–Enkelmann operator [9]:

$$\sum_{i=1}^{N-1} \mathrm{trace}\left( \nabla h_i^T D(\nabla I_i) \nabla h_i \right) \tag{2}$$

This operator diffuses anisotropically at the object contours and isotropically in the homogeneous regions. Our spatial energy is then as follows:

$$E_S(\{h_i\}) = \sum_{i=1}^{N-1} \int_\Omega \left( I_i - I_{i+1}(x + h_i) \right)^2 d\omega + \alpha \sum_{i=1}^{N-1} \int_\Omega \mathrm{trace}\left( \nabla h_i^T D(\nabla I_i) \nabla h_i \right) d\omega \tag{3}$$

If we minimize the previous energy, the estimation of the optical flows is equivalent to computing the optical flows independently between every two consecutive frames. There is no relation between the $h_i$ functions; therefore, we expect to obtain independent solutions.

Variational temporal regularizers have been introduced in some works, as in [3] and [9]. More recently, Weickert and Schnörr [12] introduced a method in which they used the OFC as data term and a regularizing term of the form $\psi(|\nabla_3 u|^2 + |\nabla_3 v|^2)$. In this case the temporal derivative is treated in the same manner as the spatial derivatives. This has the advantage of being a homogeneous formulation that makes no distinction between the spatial and temporal dimensions. The continuous formulation of this method restricts its scope to images with small displacements of the objects.

In [4] a similar method is proposed. In this case they use a data term based on the large optical flow constraint equation and, as regularizing term, a similar $\psi(|\nabla_3 u|^2 + |\nabla_3 v|^2)$. The former is thus designed to support large displacements, whereas the latter is formulated continuously, which means that it only supports short displacements.

This proposal has two major drawbacks. The first one is due to the fact that the temporal and spatial components of the gradient are coupled. Thus, when large displacements are present, the temporal estimate may inhibit regularization in the spatial domain. The second drawback has already been pointed out in the previous paragraph and has to do with the fact that the model uses a temporal derivative in the smoothness term. It assumes continuity of the flow in time, but the data term allows for large discontinuities.

A simple way to overcome the first drawback is to replace $\psi(|\nabla_3 u|^2 + |\nabla_3 v|^2)$ with separate spatial and temporal regularizers, as $\psi(|\nabla u|^2 + |\nabla v|^2) + \psi(|u_t|^2 + |v_t|^2)$.

Our energy is then composed of two parts

$$E(\{h_i\}) = E_S(\{h_i\}) + E_T(\{h_i\}) \tag{4}$$

where $E_S$ is the spatial energy given by (3) and $E_T$ our temporal energy.

If we also want to deal with large displacements, then it is necessary to replace the temporal derivative with a different estimate. We need to replace $\psi(|u_t|^2 + |v_t|^2)$ by a more suitable temporal regularizer for large displacements. If we suppose that the object velocities are smooth in the sequence, then, given two consecutive frames, the value of the optical flow $h_i(x)$ should be similar to $h_{i+1}(x + h_i(x))$ in the following frame. For the same reason, $h_i(x)$ should also be similar to $h_{i-1}(x + h^*_{i-1})$ in the previous frame (with $h^*_{i-1}$ the backward optical flow). Our temporal energy is then formulated as

$$E_T(\{h_i\}) = \beta \sum_{i=1}^{N-2} \int_\Omega \Phi\left( \|h_i - h_{i+1}(x + h_i)\|^2 \right) d\omega + \beta \sum_{i=2}^{N-1} \int_\Omega \Phi\left( \|h_i - h_{i-1}(x + h^*_{i-1})\|^2 \right) d\omega \tag{5}$$
where $\Phi(x^2) = 1 - \gamma e^{-\frac{x^2}{\gamma}}$. In this case, we have two temporal terms: the first one relates the current frame to the following one, and the second relates it to the previous one. This kind of estimate favours translational displacements with smooth velocities.

3. MINIMIZING THE ENERGY

In order to obtain a solution to our energy we derive the Euler–Lagrange equations:

$$
\begin{aligned}
0 = {} & -\left( I_i(x) - I_{i+1}(x + h_i) \right) \nabla I_{i+1}(x + h_i) \\
& - \alpha \left( \mathrm{div}(D(\nabla I_i) \nabla u_i), \mathrm{div}(D(\nabla I_i) \nabla v_i) \right)^T \\
& + \beta \Phi'\left( \|h_i - h_{i+1}(x + h_i)\|^2 \right) \cdot \left( h_i - h_{i+1}(x + h_i) \right)^T \left( Id - \nabla h_{i+1}^T(x + h_i) \right) \\
& + \beta \Phi'\left( \|h_i - h_{i-1}(x + h^*_{i-1})\|^2 \right) \cdot \left( h_i - h_{i-1}(x + h^*_{i-1}) \right)
\end{aligned} \tag{6}
$$

We apply a gradient descent technique to reach the solution of the previous system of equations. We embed these gradient descent equations into a multi–pyramidal approach to deal with large displacements. This approach is commonly used in works on large optical flow estimation, as in [2], [4] and [8]. We create a pyramid of scales for the whole sequence with images of different sizes and solve the previous system of equations at each level. Once we obtain a stable solution for a scale, we use it as a first estimate for a finer scale.

Therefore we have a number of scales $s_1, s_2, \ldots, s_n$, each one representing a different image size. Normally the pyramid is formed by half–sized images (the ratio between two consecutive images is 0.5), but other ratios may be considered, as in [4], where a reduction factor $\eta$ between 0.5 and 0.95 is proposed. At each scale we solve the previous system of equations for the whole set of unknowns $\{u_i^s, v_i^s\}$ and then use this as a first approximation for the following scale: $\{u_i^{s_1}, v_i^{s_1}\} \to \{u_i^{s_2}, v_i^{s_2}\} \to \ldots \to \{u_i^{s_n}, v_i^{s_n}\}$.

4. EXPERIMENTAL RESULTS

We analyze three methods: the spatial method given by (3), the temporal method (5) without the last term of the backward flow (we call this "Temporal"), and the temporal method including the backward flow ("Bi–Temporal").

The first sequence consists of 10 frames with a black square that moves ten pixels horizontally forward with constant velocity over a white background. In Fig. 1 we show the angular errors for all the frames using the three methods. In Table 1 we show the angular and Euclidean errors for the square sequence.

Fig. 1. Angular error for the square sequence.

Method        AEµ       AEσ        EEµ      EEσ
Spatial       0.029°    3.6E-3°    0.0189   0.013
Temporal      0.025°    3.7E-4°    0.0246   3.8E-4
Bi-Temporal   0.022°    3.0E-5°    0.0132   1.9E-4

Table 1. Mean angular and Euclidean errors for the square sequence: AEµ is the mean angular error and AEσ its standard deviation; EEµ is the mean Euclidean error and EEσ its standard deviation.

From Table 1 and Fig. 1 we can see that the temporal methods are very stable: the angular and Euclidean errors for all the frames are very similar, whereas the spatial method is far from being stable. The improvement of the "Temporal" method with respect to its spatial counterpart is about 13.79% of the angular error, and in the case of the "Bi–Temporal" method it is about 24.14%. The Euclidean error for the "Temporal" method is bigger than that of the spatial method. This is explained by the fact that the last optical flow has a large error with respect to the rest of the frames. The simple temporal method does not improve the last solution because it does not receive information from other frames. Moreover, when this last estimate is bad, it propagates the bad guesses to the rest of the sequence. In the case of the "Bi–Temporal" method, the improvement with respect to the spatial one is about 31.58%. This method is not affected by the last optical flow as in the previous case.

The second experiment is the Marble Block sequence (Fig. 2), composed of 30 frames. This sequence is copyright by H.H. Nagel (KOGS/IAKS, University of Karlsruhe, Germany) and was first used in [10]. In Fig. 3 we show the angular error and in Table 2 the mean angular and Euclidean errors for all the frames.

Method        AEµ       AEσ       EEµ      EEσ
Spatial       6.695°    2.698°    0.2480   0.0963
Temporal      5.402°    1.327°    0.2081   0.0638
Bi-Temporal   4.731°    1.330°    0.1848   0.0661
Table 2. Mean angular and Euclidean errors for the Marble Block sequence.

Fig. 2. Marble Block sequence: top–left is the 10th frame of the sequence; top–right its true optical flow; bottom–left the solution for the spatial method; and bottom–right the solution for the "Bi–Temporal" method.

Fig. 3. Angular error for the Marble Block sequence.

We can see that the temporal methods are more stable than the spatial one and that the accuracy improves by a similar magnitude as in the previous experiment.

5. CONCLUSIONS

We have proposed a new energy model that includes a temporal term that minimizes the difference of the optical flows in time, assuming that the object displacements are large. We claim that, in this case, it makes no sense to use temporal derivatives, which is the standard approach in other related works. To our knowledge, this is the first work that deals with this problem in the scope of variational optical flow estimation. We have shown that temporal regularization in general provides much more stable and accurate results. We have also studied the effect of including the backward flow in the temporal regularization. This provides better results than using only the forward term.

6. REFERENCES

[1] L. Alvarez, J. Weickert, and J. Sánchez, "Reliable Estimation of Dense Optical Flow Fields with Large Displacements", International Journal of Computer Vision, Springer, Vol. 39, 1, pp. 41–56, 2000.
[2] P. Anandan, "A Computational Framework and an Algorithm for the Measurement of Visual Motion", International Journal of Computer Vision, Springer, 2, pp. 283–310, 1989.
[3] M. J. Black and P. Anandan, "Robust dynamic motion estimation over time", Computer Society Conference on Computer Vision and Pattern Recognition, pp. 292–302, June 1991.
[4] T. Brox, A. Bruhn, N. Papenberg and J. Weickert, "High Accuracy Optical Flow Estimation Based on a Theory for Warping", ECCV 8, Lecture Notes in Computer Science, Springer, 3024, Vol. 4, pp. 25–36, 2004.
[5] I. Cohen, "Nonlinear Variational Method for Optical Flow Computation", SCIA 8, Norway, 1993.
[6] R. Deriche, P. Kornprobst and G. Aubert, "Optical flow estimation while preserving its discontinuities: a variational approach", ACCV 2, Vol. 2, pp. 290–295, 1995.
[7] B. Horn and B. Schunck, "Determining Optical Flow", Artificial Intelligence, Vol. 17, pp. 185–203, 1981.
[8] E. Mémin and P. Pérez, "Hierarchical estimation and segmentation of dense motion fields", International Journal of Computer Vision, Springer, Vol. 46, 2, pp. 129–155, 2002.
[9] H. H. Nagel, "Extending the 'oriented smoothness constraint' into the temporal domain and the estimation of derivatives of optical flow", ECCV 90, Lecture Notes in Computer Science, Vol. 427, pp. 139–148, 1990.
[10] M. Otte and H.-H. Nagel, "Estimation of Optical Flow Based on Higher-Order Spatiotemporal Derivatives in Interlaced and Non-Interlaced Image Sequences", Artificial Intelligence Journal, Vol. 78, pp. 5–43, 1995.
[11] P. Viola and W. M. Wells, "Alignment by Maximization of Mutual Information", International Journal of Computer Vision, Springer, Vol. 24, pp. 137–154, 1997.
[12] J. Weickert and C. Schnörr, "Variational Optic Flow Computation with a Spatio–Temporal Smoothness Constraint", Journal of Mathematical Imaging and Vision, Springer, Vol. 14, pp. 245–255, 2001.
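
As a supplementary illustration of the temporal penalizer used in (5), Φ(x²) = 1 − γ e^(−x²/γ), the following minimal Python sketch evaluates Φ and its derivative with respect to its argument, Φ'(x²) = e^(−x²/γ), which is the factor that weights the flow differences in (6). The function names and the default γ = 1 are hypothetical choices made for illustration only; the value of γ used in the experiments is not reported here, and γ > 0 is assumed.

```python
import math


def phi(s2, gamma=1.0):
    """Penalizer Phi(s2) = 1 - gamma * exp(-s2 / gamma).

    s2 is the squared norm of a temporal flow difference; gamma > 0 is assumed.
    """
    return 1.0 - gamma * math.exp(-s2 / gamma)


def phi_prime(s2, gamma=1.0):
    """Derivative of Phi with respect to s2, i.e. exp(-s2 / gamma)."""
    return math.exp(-s2 / gamma)


def temporal_residual(h_i, h_next_warped):
    """Squared norm of h_i(x) - h_{i+1}(x + h_i(x)) at a single pixel.

    Both arguments are (u, v) pairs; the second one is assumed to have been
    sampled already at the displaced position x + h_i(x).
    """
    du = h_i[0] - h_next_warped[0]
    dv = h_i[1] - h_next_warped[1]
    return du * du + dv * dv
```

Under these assumptions the weight `phi_prime` equals 1 when two consecutive flows agree and decays towards 0 as their difference grows, so pixels whose flow changes abruptly in time contribute little to the temporal smoothing.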
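
As a supplementary sketch of the multi–pyramidal strategy of Sect. 3, the Python code below lists the image sizes of a pyramid built with a reduction factor η and then visits the scales from coarsest to finest, passing each solution on as the first estimate for the next scale. The names `pyramid_sizes`, `coarse_to_fine`, `solve_level` and the stopping size `min_size` are hypothetical; how a flow field is rescaled between levels and how (6) is solved at each level are left to the caller-supplied `solve_level` and are not specified here.

```python
def pyramid_sizes(width, height, eta=0.5, min_size=16):
    """Image sizes from the finest to the coarsest scale.

    Each level shrinks the previous one by the reduction factor eta
    (0 < eta < 1). The pyramid stops before a side would fall below min_size.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    sizes = [(width, height)]
    while True:
        w, h = sizes[-1]
        # Truncation guarantees that every level is strictly smaller.
        nw, nh = int(w * eta), int(h * eta)
        if min(nw, nh) < min_size:
            break
        sizes.append((nw, nh))
    return sizes


def coarse_to_fine(sizes, solve_level):
    """Run solve_level(size, previous_estimate) from coarsest to finest scale.

    The result of one scale is handed to the next finer scale as its
    first estimate; the coarsest scale starts from None.
    """
    estimate = None
    for size in reversed(sizes):
        estimate = solve_level(size, estimate)
    return estimate
```

For a 256 x 256 sequence with η = 0.5 and `min_size=16`, this sketch produces five scales, from 256 x 256 down to 16 x 16. A factor closer to 0.95 yields many more, finer-grained scales.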
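
The relative improvements quoted in Sect. 4 can be checked against the means in Table 1. The short Python check below is a sketch, assuming the improvement is computed as (spatial − temporal) / spatial; the helper name `relative_improvement` is hypothetical. With this formula the angular figures (13.79% and 24.14%) follow from the AEµ column, while the 31.58% figure appears to correspond to the EEµ values rounded to three decimals (0.019 and 0.013); the unrounded EEµ values give about 30.16%.

```python
def relative_improvement(baseline, improved):
    """Percentage reduction of `improved` with respect to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Mean angular errors (AE mu, in degrees) for the square sequence, Table 1.
ae_spatial, ae_temporal, ae_bitemporal = 0.029, 0.025, 0.022
# Mean Euclidean errors (EE mu) for the square sequence, Table 1.
ee_spatial, ee_bitemporal = 0.0189, 0.0132

angular_temporal = round(relative_improvement(ae_spatial, ae_temporal), 2)
angular_bitemporal = round(relative_improvement(ae_spatial, ae_bitemporal), 2)

# Euclidean improvement from the tabulated values ...
euclid_bitemporal = round(relative_improvement(ee_spatial, ee_bitemporal), 2)
# ... and from the same values rounded to three decimals first.
euclid_bitemporal_rounded = round(
    relative_improvement(round(ee_spatial, 3), round(ee_bitemporal, 3)), 2
)
```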