IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 5, MAY 2013
Constrained Optical Flow Estimation as a Matching Problem
Mikhail G. Mozerov, Member, IEEE
Abstract—In general, discretization in the motion vector domain yields an intractable number of labels. In this paper, we propose an approach that reduces general optical flow to a constrained matching problem by pre-estimating a 2D disparity labeling map of the desired discrete motion vector function. One of the goals of this paper is to estimate the coarse distribution of motion vectors and then utilize this distribution as a global constraint for discrete optical flow estimation. This pre-estimation is done with a simple frame-to-frame correlation technique also known as the digital symmetric-phase-only-filter (SPOF). We discover a strong correlation between the output of the SPOF and the motion vector distribution of the related optical flow. A two-step matching paradigm for optical flow estimation is applied: pixel accuracy (integer flow) estimation and subpixel accuracy estimation. The matching problem is solved by global optimization. Experiments on the Middlebury optical flow datasets confirm our intuitive assumption about the strong correlation between the motion vector distribution of optical flow and the maximal peaks of SPOF outputs. The overall performance of the proposed method is promising and achieves state-of-the-art results on the Middlebury benchmark.

Index Terms—Digital symmetric-phase-only-filter (SPOF), discrete energy minimization, optical flow estimation.
I. INTRODUCTION

RECENTLY proposed optical flow (OF) estimation methods [1]–[11] show an impressive level of accuracy in terms of the criteria proposed in [12]. However, this progress has been achieved at the cost of a significant complication of the optical flow estimation paradigm, which is now far from the original formulations of Horn and Schunck [13] or Lucas and Kanade [14]. For example, the general concept behind the method presented in [8], which ranks at the top of the Middlebury optical flow benchmark, is to mix the best OF estimation techniques in such a manner that the final estimation scheme is a tradeoff between the advantages and drawbacks of each individual approach. It seems that this trend is the only way to achieve state-of-the-art results in OF estimation. Results on the epipolar-constrained stereo matching problem, in contrast, seem to have reached their natural limits since global optimization methods were proposed. One of the papers [15]–[17] that rank at the top of the Middlebury stereo benchmark was published in 2006 [15].
Manuscript received July 10, 2012; revised January 22, 2013; accepted January 24, 2013. Date of publication January 30, 2013; date of current version March 19, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Anthony Vetro. The author is with the Computer Vision Center, Universitat Autonoma de Barcelona, Barcelona 08193, Spain (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2013.2244221
Most of the top methods are somehow derivatives of the MRF energy minimization approach proposed in [18] and effectively solved in [19]. Stereo matching, as well as optical flow estimation, aims to recover a dense map of displacement vectors that establishes the global correspondence between every pixel of the two considered images. Stereo matching techniques assume the epipolar constraint, which makes the problem feasible for discrete MRF optimization. In contrast, even integer discretization of the motion vector domain for optical flow yields considerably more labels due to the 2D nature of the domain, thereby significantly increasing the computational complexity of the estimation. These observations motivate us to search for a mechanism that can sufficiently restrict the initial search space for the integer OF (pixel accuracy level). If we are able to obtain a reasonable number of dominant motion vectors of a sequence by pre-estimating a motion vector distribution, the OF estimation problem can be reduced to a constrained matching task similar to stereo matching. Our research is motivated by work in the field of pattern localization with the symmetric-phase-only-filter (SPOF) [20], [21]. If we convolve two consecutive frames of a sequence with the SPOF, we can obtain the global motion (a single shift vector for all pixels in the image plane). This observation implies that if there are several regions in a scene, each possessing a different shift vector relative to the reference frame, there will be the same number of peaks in the output domain of the SPOF between two consecutive frames. It is important to understand the difference between our technique and conventional correlation-based methods [22], [23], which were popular in stereo matching.
The core of these methods is sliding-window matching: the desired displacement vector value is obtained at each point (or at a sparse set of points) of the image plane by choosing one maximal peak in the output of a cross-correlation filter. The problem with this technique is over-smoothing and the possible loss of some values in the integer displacement space. Application of the SPOF can improve the robustness of OF estimation due to the sharpness of the filter output peaks [24]. However, the main concept remains within the general framework of correlation-based methods. In contrast, our constraint pre-estimation method obtains all non-zero values of the distribution in the integer OF vector space directly, by taking a reasonable number of maximal peaks in the output of the SPOF. In other words, one can obtain the desired distribution without estimating OF in the image plane. Most modern OF algorithms use an image pyramid and thus cannot deal with large motion
differences that exceed the size of the object. Thus, one of the advantages of the SPOF application is that the method has no limits on the range of the estimated displacement vectors: in principle, any value that does not exceed the size of a frame can be detected. The initial idea is therefore to find the constraint parameters globally by convolving two consecutive frames of a video sequence. The number of maximal peaks to be chosen is an open problem, and our final constraint pre-estimation method is a tradeoff between global and local SPOF application. Fig. 1 gives an overview of the optical flow method proposed in this paper. The main idea of our paper is that we reduce the OF estimation problem to a labeling problem, which can be solved accurately by global optimization during each step of the estimation. This is achieved as follows. First, the 2D disparity labeling map is pre-estimated. Then we estimate a dense integer OF field using the stereo disparity matching paradigm; we also introduce a multidirectional matching approach to improve occlusion handling. Subpixel accuracy is then achieved by the same global optimization approach with different levels of accuracy tuning: during each step of the coarse-to-fine procedure our algorithm chooses one of nine corrective vectors (labels) by optimizing an MRF under the global smoothness constraint. Finally, a cascade of post-processing filters is applied to achieve higher accuracy in OF estimation. Note that the general scheme of the constrained optical flow matching (COFM) in Fig. 1 is a combination of three successive problems that can be solved in separate modules: constraints pre-estimation, integer OF estimation, and subpixel accuracy OF estimation.
The paper is organized as follows: in Section II the related work is discussed; in Section III the method of disparity labeling map estimation with the SPOF is explained; in Section IV the COFM problem is formulated and the general scheme of the algorithm is described; computer experiments are discussed in Section V; and Section VI summarizes our conclusions.

II. RELATED WORK

Several methods that use discrete optimization algorithms for OF estimation have been reported in recent years [2], [25]–[27]. In [25] the quantization problem is solved by computing candidate flow fields using the Horn and Schunck [13] and Lucas and Kanade [14] approaches, while in [27] SIFT matching is used for the same purpose. Note that in both papers constraint estimation is done by known optical flow estimation techniques, and these techniques involve the pyramidal multi-resolution approach (with a decreasing number of image grid nodes at each level), which potentially leads to the loss of some significant motion vectors. In [26] the multi-resolution approach is combined with OF estimation at a sparse set of points, which allows the algorithm to start from the full range of the search space and then successively restrict the initial search space to a reasonable number of dominant motion vectors through the multi-resolution levels. A similar approach is proposed in [28], but with a more flexible region-tree scaling technique for the different accuracy levels. Both approaches may
suffer from the loss of significant dominant motion vectors. For example, the method of [26] fails to detect important values of the integer OF for some test sequences of the Middlebury benchmark (e.g., Urban). In contrast, our constraint estimation algorithm does not use the multi-resolution approach for the integer flow estimation. Also, our algorithm estimates the constraints globally (previous works use local constraints or a non-constrained search space). Global constraints make the problem definition clearer, stricter, and more flexible, as in stereo matching. The application of the SPOF for global constraint estimation can be considered a novelty: previous works use the SPOF for local OF estimation, which severely increases the computational complexity and decreases accuracy. Another problem of discrete optical flow estimation is the sub-modularity constraint on the regularization term. This difficulty arises especially in coarse-to-fine optical flow estimation, where the same neighbor labels map to different motion vectors. For example, the method of [26] achieves subpixel accuracy by using the discrete optical flow estimation approach. However, the global energy minimization algorithm used in [26] does not allow non-sub-modular prior matrices. To overcome this difficulty the authors propose a morphing algorithm [29]: at each accuracy level the algorithm transforms the initial image, using a temporary flow, into a motion-interpolated image and then matches it with the target frame. To obtain the final OF the method performs multiple flow unwrappings with multiple interpolations. Thus, this approach complicates the calculation scheme and limits the achievable accuracy. In contrast, our subpixel accuracy module uses the sequential tree-reweighted max-product message passing (TRW-S) optimization [30], which allows non-sub-modular matrices (formally, those belonging to a part of the general sub-modular matrix). Thus all corrections are directly accumulated at every pixel of the initial grid.
Consequently, this module can work with accuracy up to 0.01 pixel, and it works so well that the final post-processing step only polishes the solution and has much less impact than in many other methods; see, for example, [8] and Table II.

III. DISPARITY LABELING MAP PRE-ESTIMATION

The SPOF is a method for rigid image registration that exploits the Fourier transform. We chose this approach because the SPOF algorithm is fast and accurate [21]. Consider two X_1 x X_2 images, f_1(x_1, x_2) and f_2(x_1, x_2). Let F_1(k_1, k_2) and F_2(k_1, k_2) be the 2D discrete Fourier transforms of the two images. In the definition of the SPOF, which uses the spectral phase of f_2(x_1, x_2) as the filter's transfer function, the output r_SPOF(x_1, x_2) is the 2D inverse discrete Fourier transform of R_SPOF(k_1, k_2), which is given by

R_SPOF(k_1, k_2) = F_1(k_1, k_2) F_2*(k_1, k_2) / ( |F_1(k_1, k_2)| |F_2(k_1, k_2)| ),   (1)
where F_2*(k_1, k_2) is the complex conjugate of F_2(k_1, k_2). Let X = {X_1/2, X_2/2} be the image half-size vector and x = {x_1, x_2} be the image coordinate vector. Also suppose that the two compared images are shifted relative to one another by a constant motion vector v, such that

f_1(x) = f_2(x + v).   (2)
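The filter (1) and the peak-to-shift mapping of (3) can be sketched in NumPy. This is a minimal sketch, not the paper's implementation: the function names are mine, and the sign of the recovered vector depends on the DFT sign convention, so the argument order below is chosen such that the second argument plays the role of the reference frame under NumPy's `fft2`/`ifft2` conventions.

```python
import numpy as np

def spof_output(f1, f2, eps=1e-12):
    """Inverse DFT of the normalized cross-power spectrum, as in eq. (1)."""
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    R = F1 * np.conj(F2) / (np.abs(F1) * np.abs(F2) + eps)
    return np.real(np.fft.ifft2(R))

def shift_from_peak(r):
    """Recover a shift vector from the maximal peak, as in eq. (3):
    v = mod(x_max + X, 2X) - X, with X the half-size vector
    (even image dimensions assumed)."""
    x_max = np.array(np.unravel_index(np.argmax(r), r.shape))
    X = np.array(r.shape) // 2
    return (x_max + X) % (2 * X) - X
```

With this convention, if the second image is a circular translate of the first, the peak coordinate wraps around the frame and the modulo operation of (3) maps it back to a signed shift in (-X, X].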
Fig. 1. General scheme of the COFM method, illustrated by resultant flow images of the Teddy motion sequence (modules: constraints pre-estimation, integer OF estimation via MRF energy minimization, and subpixel accuracy refinement).
Then a sharp peak appears at the coordinate x_max, and the shift vector v is equal to

x_max = arg max_x ( r_SPOF(x) ),  v = mod(x_max + X, 2X) - X,   (3)

where the operator mod() in (3) is the modulo operation with the divisor 2X. Suppose that the first image f_1(x) is composed of L regions f_1^l(x^l), each of which has its own shift vector v^l, such that

f_1^l(x^l) = f_2^l(x^l + v^l).   (4)

In this case the output function of the SPOF includes L peaks, which can be detected as follows:

x_max^l = arg max_x^l ( r_SPOF(x) ),  v^l = mod(x_max^l + X, 2X) - X,   (5)

where the arg max^l function returns the coordinate x_max^l of the l-th maximum of the function r_SPOF(x), and the operator mod() in (5) is again the modulo operation with the divisor 2X. Note that our coarse heuristic model, which is formalized in (5), does not take occlusion into account; the impact of occlusion is discussed in more detail later in this section. The set of vectors {v^l} in (5) is considered as a constraint for the matching problem. For the SPOF application, usually the sum of the color values is taken to encode the input image function f(x):

f(x) = ( SUM_{c in {R,G,B}} I_c(x) ) exp(i phi),   (6)

where I_c(x) are the pixel-wise values of the image color channels and the phase variable phi is assumed to be zero. Thus the color information is lost. We propose a heuristic encoding that allows preserving the color information without increasing the computational complexity of the approach. In our algorithm the phase depends on the color as follows:

phi = arg(I_G - I_R, I_G - I_B),   (7)
In other words, the phase is equal to the argument of the complex number z = (I_G - I_R) + i (I_G - I_B). Thereby an arbitrary color vector can be mapped onto the complex plane. To avoid ambiguity we define arg(0, 0) = 0; consequently, for a grayscale image the phase value automatically becomes zero. The proposed encoding decreases the number of false positives for images with rich color texture, and thus potentially decreases the computational time of the OF estimation process. The experimental difference between the color and grayscale encodings is shown in Table I; compare the columns "false detections" and "false detections for the grayscale version of images" (FDGS). For accuracy reasons, the output function r_SPOF(x) is averaged by a Gaussian filter with an averaging radius of one pixel. The number L is an open problem: we did not find a strong correlation between the number of ground truth (GT) labels and statistical characteristics of the SPOF output function r_SPOF(x). However, we found a solution that makes the distribution map estimation adaptive: the image plane is partitioned into S overlapping regions (of size 128 x 128 with an overlapping period of 32), and the first m = 65 maxima of every local SPOF output are taken. Then the general distribution for the full frame is accumulated, and the 10% least repeated values are truncated. In this case any set of the GT discrete vectors belongs to the set of pre-estimated motion vectors, {v^l}_GT is a subset of {v^l}_PE, for the Middlebury datasets. For example, the resultant disparity labeling maps for the two test sequences Urban-2 and Grove-3, together with the GT, are shown in Fig. 2. These are two problematic sequences with a large range of disparity vectors: L = 65 for the Urban-2 sequence and L = 106 for the Grove-3 sequence. In Fig. 2 the colored bins represent the detected GT vectors, and the gray bins represent
Fig. 2. Disparity labeling maps and the GT. (a) GT of the Urban-2 motion sequence and its mutual disparity labeling map. (b) GT of the Grove-3 motion sequence and its mutual disparity labeling map. The colored bins represent the detected GT vectors, the gray bins represent the false detected vectors. See Fig. 3 for the color coding of the flow.
the false detected vectors. More detailed statistics are given in Table I. To illustrate our experimental results we use the HSV color wheel proposed in [12]; the color coding is depicted in Fig. 3. Our algorithm considerably reduces the initial search space. However, the new space is still redundant due to false detections, especially for sequences with a small range of motion vectors, which means that some labels are useless. On the other hand, the proposed multidirectional matching approach assumes that the pre-defined label space structure is used at least three times: to estimate the forward and backward flows and for the final integer flow merging procedure. The forward integer flow estimation is able to detect unused labels; as a result, a further reduction of the label space is possible. Thus the variable L in our method takes two values: L_0, the number of labels after the SPOF pre-estimation, and L_1, the reduced number of labels after the forward integer flow estimation. The results of this secondary reduction of the label space are given in Table I as the numbers in brackets, and the symbols L_0, L_1 are also used in Figs. 7-8. The presence of large occlusions in the considered scene is a real problem for any matching technique, because several regions have no visual correspondence. Relatively large occluded patches can produce false detections in the output of the SPOF, raising the redundancy of the solution search space and consequently increasing the computational complexity of the estimation process. Nevertheless, our experiments with the Middlebury OF datasets show that there are no lost bins even in the presence of large occlusions (the Urban sequences); thus we can conclude that, at least for the considered datasets, the constraints pre-estimation module does not increase OF estimation errors due to occlusion.
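The color encoding of (6)-(7) and the adaptive, tile-based accumulation of candidate labels described above can be sketched as follows. This is a hedged sketch under assumptions: the function names are mine, the Gaussian averaging of the SPOF output (radius one pixel) is omitted, and the vote-based truncation of the 10% least repeated vectors is one plausible reading of the accumulation step.

```python
import numpy as np
from collections import Counter

def encode_color(I):
    """Complex input encoding of eqs. (6)-(7): magnitude is the sum of the
    channels, phase is arg(z) with z = (G - R) + i(G - B); arg(0,0) = 0."""
    R, G, B = (I[..., c].astype(float) for c in range(3))
    phi = np.arctan2(G - B, G - R)          # arg of z = (G - R) + i(G - B)
    return (R + G + B) * np.exp(1j * phi)

def _spof(f1, f2, eps=1e-12):
    # Inverse DFT of the normalized cross-power spectrum, eq. (1).
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    return np.real(np.fft.ifft2(F1 * np.conj(F2) / (np.abs(F1) * np.abs(F2) + eps)))

def pre_estimate_labels(f1, f2, tile=128, stride=32, m=65, trunc=0.10):
    """Accumulate candidate motion vectors from the top-m peaks of local
    SPOF outputs over overlapping tiles, then drop the least repeated 10%."""
    H, W = f1.shape
    X = np.array([tile // 2, tile // 2])
    votes = Counter()
    for y in range(0, H - tile + 1, stride):
        for x in range(0, W - tile + 1, stride):
            # Argument order chosen so that f2(x) = f1(x - v) votes for v.
            r = _spof(f2[y:y+tile, x:x+tile], f1[y:y+tile, x:x+tile])
            for idx in np.argsort(r, axis=None)[-m:]:
                p = np.array(np.unravel_index(idx, r.shape))
                votes[tuple((p + X) % (2 * X) - X)] += 1
    labels = [v for v, _ in votes.most_common()]
    return labels[:max(1, int(np.ceil(len(labels) * (1.0 - trunc))))]
```

For a real sequence the frames would first be mapped through `encode_color`; the sketch also works directly on grayscale arrays, for which the encoding reduces to a real-valued image.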
It is interesting to see how the SPOF pre-estimation reacts to large patches with multiple or even continuously changing motions. The simplest model of such a motion field is an image zoom. Several experiments were designed to test the algorithm under this model, and they show reasonable pre-estimation results for up to 15% zoom; that is, the algorithm does not lose the GT vectors in the pre-estimation process. Fig. 3 illustrates the result of the labeling map pre-estimation for a synthetic sequence with 10% zoom. This particular sequence is generated on the basis of the RubberWhale image
and possesses a large range of GT disparity vectors: L = 513. In Fig. 3 the colored bins represent the detected GT vectors, and the gray bins represent the false detected vectors. Note that in this experiment we get only 198 false positives, and the relative detection redundancy is even less than for the Middlebury OF datasets; see Table I. In this paper we aim to solve the problem of constrained OF estimation. The experiments with the Middlebury datasets confirm our intuitive assumption that the integer OF space can be restricted to a reasonable number of non-zero bins: for all 24 sets the number of labels does not exceed 300. An open question is whether the same approach is applicable in the case where all vectors are equally distributed or there is no restriction of the search space. The image frame domain might then be partitioned into several overlapping regions, each with a reasonable number of non-zero possible vectors, and OF could be estimated for each region independently.

IV. PROBLEM DEFINITION AND SOLUTION

In this paper the general OF estimation problem is split into two levels: the integer OF estimation level and the subpixel adjustment level. Let us define the symbols v_bar(x), v(x), and v_tilde(x) as the desired solution vector, the solution of the integer OF estimation problem, and the corrective motion vector of the subpixel accuracy level, respectively. Then the mentioned two-step strategy can be formalized by the following expression:

v_bar(x) = v(x) + eta(x) v_tilde(x),  v_tilde(x) = SUM_{k=1}^{K} delta_k v_hat(x) + o(delta_k),   (8)
where eta(x) is a confidence map and the term o(delta_k) defines the approximation accuracy level. The confidence map factor eta(x) masks the occlusion regions. The meaning of the intrinsic parameters of the subpixel adjustment level, delta_k, K, and v_hat, will be explained later.

A. Integer OF Estimation

The general stereo and motion matching approach aims to find the correspondence between pixels of the images I_t(x) and I_{t+1}(x), where x is the coordinate of a pixel in the image plane and t is the index of the considered image in the sequence. A vector
TABLE I
PRE-ESTIMATION MISSED AND FALSE DETECTION STATISTICS BASED ON THE GT ANALYSIS OF THE MIDDLEBURY DATASETS. "FALSE DETECTIONS FOR THE GRAYSCALE VERSION OF IMAGES" IS ABBREVIATED AS FDGS. NUMBERS IN BRACKETS GIVE THE SECONDARY REDUCTION OF THE LABEL SPACE.

Datasets    | GT labels | Pre-estimated labels | Missed detections | False detections | FDGS
Dimetrodon  | 21        | 111                  | 0                 | 90 (16)          | 90 (37)
Grove 2     | 9         | 124                  | 0                 | 115 (21)         | 116 (32)
Grove 3     | 104       | 225                  | 0                 | 111 (33)         | 113 (137)
Hydrangea   | 91        | 180                  | 0                 | 89 (29)          | 88 (120)
RubberWhale | 39        | 127                  | 0                 | 88 (18)          | 95 (57)
Urban 2     | 61        | 269                  | 0                 | 208 (106)        | 229 (167)
Urban 3     | 44        | 257                  | 0                 | 213 (63)         | 218 (107)
Venus       | 17        | 172                  | 0                 | 155 (38)         | 156 (55)
Fig. 3. GT of the synthetic 10% zoom sequence and its mutual disparity labeling map. The colored bins represent the detected GT vectors; the gray bins represent the false detected vectors.

v(x_t) in Fig. 4(a) denotes the disparity of two corresponding pixels x_t and x_{t+1}. The disparity vector v usually has the same dimensionality as the domain of the image, except in certain special cases: for instance, in stereo matching the disparity vector domain becomes one-dimensional due to the additional epipolar constraints. Simply expressed, if the stereo or motion matching problem is considered, a dense disparity map v(x) has to be obtained. In stereo matching the most appropriate way to solve such a problem is global optimization [15]–[17], [31], [32]. Note that in the case of distribution-constrained OF the desired motion vectors belong to a set of discrete vectors v in {v^l}, and this set can be pre-estimated by the technique described in the previous section. The global energy minimization approach aims to find the desired disparity function v(x) that minimizes the energy function E(v(x)) in the disparity space image (DSI) C(x, v); see Fig. 4(b). The DSI (sometimes called the correlation volume or the cost volume) is the 4D discrete space that is mapped to the problem solution domain; it represents a discrete collection of correspondence costs. For example, if two compared pixels (x_1)_t and (x_1 + v_1)_{t+1} have the same luminance value (which means that these pixels are a potential match), the cost value C(x_1, v_1) should be minimal; vice versa, if the luminance values differ, the related cost value increases. The global energy usually contains two terms, the data term and the smoothness term:

E(v(x)) = SUM_{x in Omega} C(x, v(x)) + SUM_{x in Omega} G(v(x)),   (9)

where G is a smoothness function and Omega is the domain of the vector x. The domain of the vector v, composed of L motion vector values, is pre-defined through the disparity labeling algorithm described in Section III. The cost values C(v(x)) that form the DSI are computed as follows:

C(x, v) = |I_{t+1}(x + v) - I_t(x)| + alpha(x) |grad I_{t+1}(x + v) - grad I_t(x)|,   (10)
where I_{t+1}, grad I_{t+1}, I_t, grad I_t are the luminance and the gradient of luminance values of two neighboring images in the dynamic sequence, respectively. The local weight parameter alpha(x) is calculated as

alpha(x) = 1 - rho(|grad I_t(x)|),   (11)

where rho() is the cumulative distribution of the gradient modulus of the image luminance I_t. The goal of introducing the function alpha(x) in (10) is to augment the matching robustness in the texture-less regions of an image by increasing the weight of the gradient in the cost function. In (11) we slightly modify the idea proposed in [27]. The function G in the smoothness term of (9) is given by

G(v(x)) = omega(x) SUM_{d in D} SUM_{r in R} f(|v_r(x_d + 1) - v_r(x_d)|),   (12)

where omega(x) is a locally adaptive function used to penalize motion vector discontinuities, and R and D are the dimensionalities of the motion vector and image spaces, respectively. Later in this paper the shortcut

f(|v(x_d + 1) - v(x_d)|) = SUM_{r in R} f(|v_r(x_d + 1) - v_r(x_d)|)   (13)

is used for the distance measure. A positive definite increasing function f is usually proportional to the gradient of the motion vector or its squared value:

G(v(x)) = omega(x) SUM_{d in D} |v(x_d + 1) - v(x_d)|.   (14)

To prevent over-penalizing discontinuities, a more flexible smoothness function is used:

G_bar(v(x)) = omega(x) SUM_{d in D} min( f(|v(x_d + 1) - v(x_d)|), f(g) ),   (15)

where g is a truncation threshold, and the function omega(x) can be expressed as

omega(x) = 2 lambda if rho(|grad I_t(x)|) < 0.7;  lambda otherwise,   (16)
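The data term (10)-(11) and the adaptive weight (16) can be sketched in NumPy. This is a minimal sketch under assumptions: the function names are mine, the empirical CDF is computed via per-pixel ranks, the gradient difference uses an L1 norm, and circular `np.roll` warping ignores border handling.

```python
import numpy as np

def gradient_rank_cdf(It):
    """Empirical CDF rho(|grad I_t|) of eq. (11), via per-pixel ranks."""
    gy, gx = np.gradient(It)
    mag = np.hypot(gy, gx)
    ranks = mag.ravel().argsort().argsort().reshape(mag.shape)
    return ranks / max(mag.size - 1, 1)

def data_cost(It, It1, v):
    """C(x, v) of eq. (10) for one candidate integer vector v = (dy, dx)."""
    alpha = 1.0 - gradient_rank_cdf(It)     # eq. (11): high weight in flat areas
    warped = np.roll(It1, shift=(-v[0], -v[1]), axis=(0, 1))   # It+1(x + v)
    g0y, g0x = np.gradient(It)
    g1y, g1x = np.gradient(warped)
    return np.abs(warped - It) + alpha * (np.abs(g1y - g0y) + np.abs(g1x - g0x))

def smoothness_weight(It, lam):
    """omega(x) of eq. (16): doubled penalty where the gradient CDF < 0.7."""
    return np.where(gradient_rank_cdf(It) < 0.7, 2.0 * lam, lam)
```

Evaluating `data_cost` for every candidate vector of the pre-estimated label set yields the DSI slices over which the MRF energy (9) is minimized.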
Fig. 4. (a) Scheme of correspondence matching between two images. (b) DSI as a collection of correspondence costs C (x, v), and a desired function of disparity value v(x).
where lambda is an experimental constant. In other words, if the value of the local gradient of the image I_t(x) is less than a threshold calculated on the basis of the cumulative distribution rho(), then omega(x) = 2 lambda. The idea of such an adaptive function is proposed in [19]. In our experiments we set the threshold g equal to 3 and choose the most popular truncated linear prior, f(|a|) = |a|, in (15). Also, the parameter lambda of the prior function in (15) is taken proportional to the mean value <C(x, v)> of the DSI cost:

lambda = 3 <C(x, v)> / g.   (17)
The desired discrete function v(x) is the solution of the global energy minimization problem

v(x) = arg min_v ( E(v(x)) ),   (18)

where E is introduced in (9). In general, the energy minimization problem is NP-hard, so approximate minimization algorithms have to be chosen to solve it. To make our choice we follow the analysis given in [33] and finally apply the TRW-S algorithm described in [30]. TRW-S was developed within the belief propagation paradigm; the sequential approach makes this algorithm convergent and fast. For the truncated linear and quadratic priors the method usually reaches 1% approximation accuracy in a few iterations, outperforming the popular graph cut expansion algorithm in both accuracy and speed. An additional advantage of TRW-S is that it requires half as much memory as traditional message passing approaches.

B. Occlusion Handling

The solution v(x) of (18) can suffer from inaccuracy, especially in occluded regions of sequences with a considerable range of integer motion vectors. On the other hand, pixels invisible in the frame I_t become visible in the frame I_{t+1}. Thus, one can expect that analysis of the results of two integer OF estimations, v(x_t) and v(x_{t+1}), helps to handle the occlusion problem, as was done for stereo matching in [34]. For this purpose, the forward and backward flows have to be estimated with the procedure described in the previous subsection. Then these pre-flows are used for a more accurate estimation of the desired integer flow with the algorithm described in this subsection. Let us consider an unwrapped mirror integer OF

v_2(x_t) = -v(x_{t+1} + v(x_t)),   (19)
where x_{t+1} in Omega_{t+1} and x_t in Omega_t are coordinates in the two domains of two consecutive frames of the video sequence, and v(x_{t+1}) and v(x_t) are the two respective results of integer OF estimation with the two different matching directions t -> t + 1 and t <- t + 1. If we define v_1(x_t) = v(x_t), then the two integer OFs v_1 and v_2 have to be equal for all non-occluded pixels. Thus, the confidence map in (8) can be defined as

eta(x) = (1 + |v_2(x) - v_1(x)|)^(-1),   (20)

where the function eta(x) achieves its maximal value at pixels with strict equality of the two different integer OFs, v_2(x) = v_1(x). Occlusion-based confidence measures similar to (20) have been introduced in [35]. If more than two images of a sequence are available, it is useful to estimate another integer OF v_3(x) relative to the previous frame (matching direction t - 1 <- t), and its mirror integer OF v_4(x):

v_4(x_t) = -v(x_{t-1} + v_3(x_t)).   (21)
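The mirror flow (19) and confidence map (20) can be sketched as follows. This is a hedged sketch: the function names are mine, the norm in (20) is taken as L1, and out-of-frame forward-warped positions are clipped to the nearest border instead of receiving dedicated occlusion treatment.

```python
import numpy as np

def mirror_flow(v_fwd, v_bwd):
    """Unwrapped mirror integer OF of eq. (19): minus the backward flow
    sampled at the pixel the forward flow points to. Fields are (H, W, 2)
    integer arrays in (row, col) order; border indices are clipped."""
    H, W = v_fwd.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    ty = np.clip(ys + v_fwd[..., 0].astype(int), 0, H - 1)
    tx = np.clip(xs + v_fwd[..., 1].astype(int), 0, W - 1)
    return -v_bwd[ty, tx]

def confidence_map(v1, v2):
    """eta(x) = (1 + |v2(x) - v1(x)|)^(-1), eq. (20); L1 norm used here."""
    diff = np.abs(v2 - v1).sum(axis=-1)
    return 1.0 / (1.0 + diff)
```

For perfectly consistent forward and backward flows the map is identically 1; any forward-backward disagreement pulls it toward 0, which is what masks occluded pixels in (8).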
Our motivation for introducing these additional integer OFs is based on the assumption that v_3(x) = v_4(x) ≈ v_1(x) = v_2(x), and also that the reciprocal occlusion regions of the pair of images I_t and I_{t+1} in general do not coincide with the reciprocal
TABLE II
AVERAGE AND MEDIAN (IN BRACKETS) EE FOR DIFFERENT STEPS OF THE PROPOSED ALGORITHM AND DIFFERENT ACCURACY LEVELS o(delta_k). D: DIMETRODON. G2: GROVE 2. H: HYDRANGEA. RW: RUBBERWHALE. U2: URBAN 2. IOF: INTEGER OF. PPOF: POST-PROCESS OF. IOF+LK: INTEGER OF PLUS LUCAS-KANADE REFINEMENT.

Step        | o(delta_k) | D             | G2            | H             | RW            | U2
IOF v1      | 0.5        | 0.399 (0.395) | 0.467 (0.413) | 0.291 (0.187) | 0.301 (0.239) | 0.867 (0.486)
IOF v       | 0.5        | 0.395 (0.397) | 0.447 (0.407) | 0.302 (0.180) | 0.291 (0.239) | 0.668 (0.507)
OF v_bar1   | 0.25       | 0.215 (0.196) | 0.280 (0.232) | 0.276 (0.180) | 0.211 (0.174) | 0.439 (0.255)
OF v_bar12  | 0.002      | 0.097 (0.068) | 0.112 (0.035) | 0.156 (0.034) | 0.072 (0.029) | 0.228 (0.066)
PPOF v_barPP| 0.002      | 0.088 (0.061) | 0.109 (0.031) | 0.154 (0.034) | 0.065 (0.028) | 0.211 (0.058)
IOF+LK      | 0.002      | 0.251 (0.186) | 0.387 (0.182) | 0.271 (0.121) | 0.187 (0.086) | 0.604 (0.229)
Fig. 5. Quantitative evaluation of the Gaussian noise impact. (a) Lost bins (%) versus the Gaussian noise standard deviation. (b) Average EE versus the Gaussian noise standard deviation. (c) Redundancy (%) versus the Gaussian noise standard deviation.
occlusion regions of the pair of images I_{t-1} and I_t. Similar temporal continuity constraints have been researched in [36]. To estimate the final integer OF we apply the same global optimization approach as described in the previous subsection. The difference is that the merging algorithm uses the previously estimated backward and forward flows as an observation. Thus the cost term C(x, v) in (9) is now not a color matching similarity measure, but a likelihood estimate based on local statistics of the previously calculated flows. If, for example, all previously calculated flow values of a local neighborhood l centered at a pixel x are equal to v^l, then the probability of the event "(x)_t corresponds to (x + v^l)_{t+1}" is 1. In this particular case we expect that the cost function is C(x, v = v^l) = 0 and C(x, v != v^l) = infinity. The colors of the image now appear only in the bilateral kernel, which defines the local neighborhood of a pixel x. Consequently, the cost function is calculated as follows:

C(x, v) = - log ( SUM_{t=1}^{T} phi(t) SUM_{|r| < l} eta^2(x + r) 1_v(v_t(x + r)) )
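The flow-merging data term described above can be sketched as a confidence-weighted vote count over a local neighborhood. This is an assumption-laden sketch, since the equation is only partially legible here: the function name and signature are mine, the bilateral color kernel is replaced by a plain square window, and the temporal weight phi(t) is folded into the confidence maps.

```python
import numpy as np

def merge_cost(flows, etas, candidates, x, radius=2):
    """Cost of label v at pixel x: -log of the confidence-weighted count of
    previously estimated flow vectors in a square neighborhood that equal v.
    flows: list of (H, W, 2) integer flow fields; etas: matching confidence
    maps; candidates: iterable of (dy, dx) label tuples."""
    H, W = flows[0].shape[:2]
    y0, x0 = x
    costs = {}
    for v in candidates:
        score = 1e-9                      # avoid log(0) for unseen labels
        for vt, eta in zip(flows, etas):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < H and 0 <= xx < W and tuple(vt[yy, xx]) == tuple(v):
                        score += eta[yy, xx]
        costs[tuple(v)] = -np.log(score)
    return costs
```

As intended by the limiting cases quoted above, a label supported by every vote in the neighborhood gets a strongly negative (i.e., minimal) cost, while a label never observed gets a very large one.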