ROBUST DIFFUSION KERNELS FOR OPTICAL FLOW SMOOTHING Ashish Doshi and Adrian G. Bors Department of Computer Science, University of York, York YO10 5DD, UK E-mail:
[email protected] ABSTRACT This paper provides a comparison study among a set of novel algorithms that implement robust diffusion on optical flows. The proposed algorithms combine the anisotropic smoothing ability of the heat kernel and the outlier rejection mechanism of robust statistics algorithms. The diffusion kernel is considered Gaussian, where the covariance matrix is the local Hessian. This enables the kernel to detect significant transitions in the signal. In this study we show that diffusion does not eliminate outliers but rather spreads them around. We calculate the resulting bias induced by diffusing the outliers in their neighbourhood. On the other hand robust statistics operators reject the outliers from the diffusion process. Alpha-trimmed mean and median statistics are considered in combination with the diffusion processing. The proposed algorithms are applied for smoothing optical flow. 1. INTRODUCTION Analyzing motion is essential for understanding visual surroundings. Representative applications of motion analysis [1] include video coding, robotic vision, moving object segmentation [2], tracking [3], etc. The consistency of localized grey-level patterns in consecutive frames is the basis for optical flow estimation. The block matching algorithm decides the motion vector according to the best correlation in pairs of blocks localized in consecutive frames [2]. However, the lack of contrast, the presence of noise and the variation in illumination can lead to wrong motion estimation. These problems are amplified when modelling complex movement such as that of fluids or of multi-object systems. This paper develops a methodology that combines the advantages of two different approaches: diffusion and robust statistics and applies it for smoothing vector fields. Anisotropic diffusion was employed by Perona and Malik for image smoothing while preserving the edges [4]. Their work is extended to colour image diffusion by Black et al. [5]. 
While Perona and Malik used a Lorentzian function for edge detection, Black et al. proposed using the Tukey’s biweight function for preserving sharper boundaries. Ling and Bovik combined the median filter and the diffusion operator of Black et al. [5], for smoothing noisy molecular
images [6]. The stochastic formulation of the heat equation employing partial differential equations (PDE) was used for shape contour modelling in [7]. Tschumperlé and Deriche provide a simpler interpretation of the diffusion process in terms of local filtering with spatially adaptive Gaussian kernels [8]. Their method separates the regularization process into the smoothing itself and the underlying geometry that drives the smoothing. This algorithm was applied for smoothing colour images and for inpainting [8].

According to the study provided in this paper, most diffusion algorithms are not able to deal properly with outliers: outliers are diffused into their neighbourhood instead of being removed. This paper introduces a new category of algorithms that combine robust statistics [9] and Hessian based diffusion kernels. These algorithms are applied for smoothing optical flows, which is important for applications including frame prediction [3] and video coding. The concept behind our approach is to equip the smoothing process with an outlier rejection mechanism. We extend these algorithms to employ 3-D Hessian based kernels. As a result we increase the number of vectors used in the smoothing process by including the temporal information. Section 2 describes the application of Hessian based kernels to the optical flow, while Section 3 introduces the robust statistics diffusion kernels. Section 4 provides the experimental results of this study on both artificial data and image sequences, while the conclusions of this study are outlined in Section 5.

2. DIFFUSION KERNELS

Generally, the two dimensional heat equation of a geometric manifold can be described as [10, 11]:

$$\frac{\partial I(\mathbf{x}, t)}{\partial t} - \nabla^2 I(\mathbf{x}, t) = 0 \tag{1}$$

where $I(\mathbf{x}, t)$ describes the heat at location $\mathbf{x}$ and time $t$, starting with the initial conditions $I(\mathbf{x}, 0) = I(\mathbf{x})$. The solution to the heat equation yields [10]:

$$I(\mathbf{y}, t) = \int_{M} K_t(\mathbf{x}, \mathbf{y})\, I(\mathbf{x})\, d\mathbf{x} \tag{2}$$
where $K_t(\mathbf{x}, \mathbf{y})$ is the heat kernel, which applies a diffusion operation onto $M$, and $\mathbf{x}, \mathbf{y} \in M$.

If we assume $M$ to be compact, then $K_t(\mathbf{x}, \mathbf{y})$ can be decomposed with respect to its eigenvalues and eigenvectors. In a 2D space we have the eigenvalues $\{c_1, c_2\}$, while the eigenvectors $\{\mathbf{u}, \mathbf{w}\}$ define an orthogonal space. The heat equation uses oriented Laplacians and in this case becomes [8]:

$$\frac{\partial I(\mathbf{x})}{\partial t} = c_1 \frac{\partial^2 I}{\partial \mathbf{u}^2} + c_2 \frac{\partial^2 I}{\partial \mathbf{w}^2} = \mathrm{trace}(\mathbf{T}\mathbf{H}(\mathbf{x})) \tag{3}$$

where the second derivatives of the data are calculated along the directions of the eigenvectors, $\mathbf{H}(\mathbf{x})$ is the Hessian matrix at location $\mathbf{x}$, while $\mathbf{T}$ is the 2 × 2 tensor defined by:
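$$\mathbf{T} = c_1 \mathbf{u}\mathbf{u}^T + c_2 \mathbf{w}\mathbf{w}^T \tag{4}$$

When $M \equiv \mathbb{R}$, the heat kernel is the Gaussian kernel [10]. In this case, the solution to the heat equation (1) yields:

$$\hat{I}(\mathbf{z}_c) = \frac{1}{\sqrt{4\pi d}} \int_{\mathbb{R}} \exp\left[-\frac{1}{4d} (\mathbf{x} - \mathbf{z}_c)^T \Sigma^{-1} (\mathbf{x} - \mathbf{z}_c)\right] I(\mathbf{x})\, d\mathbf{x} \tag{5}$$

where $\Sigma$ represents the covariance matrix, $\mathbf{z}_c$ is the kernel center and $d$ is a normalization coefficient. The Hessian of local data can be considered as the covariance matrix for the heat kernel.

In the following we show how the optical flow is estimated from image sequences. Using the first order Taylor expansion in an image sequence [1]:

$$I(\mathbf{x} + \delta\mathbf{x}, t + \delta t) \approx I(\mathbf{x}, t) + \nabla I \cdot \delta\mathbf{x} + I_t\, \delta t \tag{6}$$

where $\nabla I = \left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right)$ and $I_t$ represent the first order partial spatial and temporal derivatives, respectively. Assuming that the intensity is preserved along the motion, $I(\mathbf{x} + \delta\mathbf{x}, t + \delta t) = I(\mathbf{x}, t)$, and dividing by $\delta t$, equation (6) yields:

$$\nabla I \cdot \mathbf{V} + I_t = 0 \tag{7}$$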
where $\mathbf{V} = (V_x, V_y)$ denotes the motion vector and $\nabla I$ is the spatial intensity gradient. Equation (7) is known as the optical flow constraint equation [1]. Extending its calculation to multiple frames we obtain a vector field on a 3D lattice, where each plane of the lattice corresponds to the motion between two consecutive frames. Motion vectors can be initially calculated using the block matching algorithm [2]. In the block matching algorithm we search for the best correlation in the original frame of each block of pixels from the reference frame. However, the block matching algorithm often leads to wrong decisions, particularly in areas with similar texture and colour, or due to noise and changes in the illumination conditions.

2.1. The 2D Hessian Kernel

Generally, in the case of image sequences one challenge is to keep the optical flow robust against violations of the underlying assumptions, which are commonly met in practice. In such cases vector field smoothing is necessary. However, indiscriminate smoothing could produce undesired effects in the estimated optical flow. In order to obtain smooth optical flows, second order differential methods [1] or robust statistics [2] are needed. The Hessian $\mathbf{H}_{2D}$ is a matrix whose entries are the second order local partial derivatives, and it represents a suitable measure of variation in the geometry of the local statistics [11]. The correlation with the local Hessian can be viewed as the application of a moving mask that represents the variation in the local data. Therefore, the Hessian can be used as a detector of change in the direction of the optical flow, indicating the location of moving object boundaries. The Hessian of the local data is considered as the covariance matrix in the heat kernel from (5). The non-singularity of the local Hessian can be enforced by using various procedures [8]. The optical flow associated with complex motion, such as that resulting from rotation, zooming or the movement of turbulent fluids, can be better represented after being smoothed with a Hessian kernel. After discretizing and normalizing equation (5), we obtain:

$$\hat{\mathbf{V}}^{t+1}_{k,c} = \frac{\sum_{\mathbf{x}_i \in \eta(\mathbf{z}_c)} \mathbf{V}^{t}_{k,i} \exp\left[-(\mathbf{x}_i - \mathbf{z}_c)^T \mathbf{H}^{-1}_{2D,c} (\mathbf{x}_i - \mathbf{z}_c)\right]}{\sum_{\mathbf{x}_i \in \eta(\mathbf{z}_c)} \exp\left[-(\mathbf{x}_i - \mathbf{z}_c)^T \mathbf{H}^{-1}_{2D,c} (\mathbf{x}_i - \mathbf{z}_c)\right]} \tag{8}$$

where $\mathbf{V}^{t}_{k,i}$ is the vector at location $i$ within a 3 × 3 neighbourhood $\eta(\mathbf{z}_c)$ centered at location $\mathbf{z}_c$, $t$ denotes the iteration number, $k$ is the frame number, and the dependency on $d$ is no longer necessary.

2.2. Multiple 2D Hessian Kernels

The 2D Hessian kernel from (8) applies smoothing on motion vector fields calculated between pairs of consecutive frames. We can extend this approach by considering multiple frames. In this case we have a 3-D lattice formed from parallel and equidistant vector fields modelling the optical flow from the entire image sequence. Firstly, we apply the local smoothing by using (8) at each motion vector location from each frame. Thereafter, we apply temporal smoothing by averaging the results of the Hessian based diffusion from the neighbouring frames. In the experiments two vector fields are considered on either side of the central frame.

2.3. 3D Hessian Kernel

The diffusion kernel can be extended to 3D lattices:

$$\hat{\mathbf{V}}^{t+1}_{k,c} = \frac{\sum_{\mathbf{x}_i \in \eta_{3D}(\mathbf{z}_c)} \mathbf{V}^{t}_{k,i} \exp\left[-(\mathbf{x}_i - \mathbf{z}_c)^T \mathbf{H}^{-1}_{3D,c} (\mathbf{x}_i - \mathbf{z}_c)\right]}{\sum_{\mathbf{x}_i \in \eta_{3D}(\mathbf{z}_c)} \exp\left[-(\mathbf{x}_i - \mathbf{z}_c)^T \mathbf{H}^{-1}_{3D,c} (\mathbf{x}_i - \mathbf{z}_c)\right]} \tag{9}$$

where the neighbourhood is defined in 3D as $\eta_{3D}(\mathbf{z}_c) = 3 \times 3 \times 4$, and $k$ is the central frame. In (9), the 2D Hessian kernel is extended to 3D in order to accommodate the spatio-temporal variation in the optical flow. By processing a larger amount of data, the optical flow transitions and moving object boundaries would be better modelled whilst diffusing the vector field. The 3D Hessian matrix is given by:

$$\mathbf{H}_{3D} = \begin{bmatrix} \psi_{xx} & \psi_{xy} & \psi_{xt} \\ \psi_{yx} & \psi_{yy} & \psi_{yt} \\ \psi_{tx} & \psi_{ty} & \psi_{tt} \end{bmatrix} \tag{10}$$

where $\psi_{xx} = \frac{\partial^2 \mathbf{V}}{\partial x^2}$, $\psi_{yy} = \frac{\partial^2 \mathbf{V}}{\partial y^2}$, $\psi_{xt} = \frac{\partial^2 \mathbf{V}}{\partial x \partial t}$, $\psi_{yt} = \frac{\partial^2 \mathbf{V}}{\partial y \partial t}$, $\psi_{tx} = \frac{\partial^2 \mathbf{V}}{\partial t \partial x}$, $\psi_{ty} = \frac{\partial^2 \mathbf{V}}{\partial t \partial y}$ and $\psi_{tt} = \frac{\partial^2 \mathbf{V}}{\partial t^2}$, and where $t$ here denotes the frame index.
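The update in (8) can be illustrated numerically. The following Python/NumPy code is only a minimal sketch under stated assumptions, not the implementation used in the experiments: each flow component is filtered separately with its own finite-difference Hessian, the Hessian is made positive definite by replacing its eigenvalues with their magnitudes plus a small constant eps (one possible way of enforcing non-singularity), and borders are replicated. The names hessian_2d, regularised_inverse, hessian_diffuse and smooth_flow are hypothetical.

```python
import numpy as np

def hessian_2d(f):
    """Finite-difference Hessian of a scalar field f, shape (H, W) -> (H, W, 2, 2)."""
    fy, fx = np.gradient(f)            # axis 0 is y (rows), axis 1 is x (columns)
    fyy, fyx = np.gradient(fy)
    fxy, fxx = np.gradient(fx)
    hess = np.empty(f.shape + (2, 2))
    hess[..., 0, 0] = fxx
    hess[..., 1, 1] = fyy
    hess[..., 0, 1] = hess[..., 1, 0] = 0.5 * (fxy + fyx)   # symmetrised
    return hess

def regularised_inverse(hess, eps=1e-2):
    """Inverse of a positive-definite surrogate of the Hessian.

    Assumption: eigenvalues are replaced by |lambda| + eps so that the
    matrix can play the role of a covariance matrix in the kernel.
    """
    lam, vec = np.linalg.eigh(hess)
    inv_lam = 1.0 / (np.abs(lam) + eps)
    return (vec * inv_lam[..., None, :]) @ np.swapaxes(vec, -1, -2)

def hessian_diffuse(f, eps=1e-2):
    """One iteration of the normalised kernel average of Eq. (8), 3x3 window."""
    f = np.asarray(f, dtype=float)
    h_inv = regularised_inverse(hessian_2d(f), eps)
    padded = np.pad(f, 1, mode="edge")                  # replicate borders
    rows, cols = f.shape
    num = np.zeros_like(f)
    den = np.zeros_like(f)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            d = np.array([dx, dy], dtype=float)           # x_i - z_c
            q = np.einsum("i,...ij,j->...", d, h_inv, d)  # d^T H_c^{-1} d per pixel
            w = np.exp(-q)
            num += w * padded[1 + dy:1 + dy + rows, 1 + dx:1 + dx + cols]
            den += w
    return num / den

def smooth_flow(flow, iterations=1, eps=1e-2):
    """Apply hessian_diffuse to each component of a flow field of shape (H, W, 2)."""
    out = np.asarray(flow, dtype=float)
    for _ in range(iterations):
        out = np.stack([hessian_diffuse(out[..., c], eps)
                        for c in range(out.shape[-1])], axis=-1)
    return out

rng = np.random.default_rng(0)
noisy = rng.normal(size=(16, 16, 2))
smoothed = smooth_flow(noisy, iterations=2)
```

Because the centre sample always receives the weight exp(0) = 1, the denominator cannot vanish. With this particular regularisation the kernel is narrow where the second derivatives are small and wide where they are large, so the amount of smoothing follows the local Hessian.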
3. ROBUST STATISTICS DIFFUSION KERNELS
3.1. The bias introduced by diffused outliers
In the following we perform a study of the bias introduced when diffusing outliers. Let us consider the following one-dimensional (1D) signal:

$$f(x) = \begin{cases} y, & x \neq K \\ y + M, & x = K \end{cases} \tag{11}$$

This signal has an outlier of height $M$ at the location $x = K$, and its discretized version is displayed in Fig. 1(a) for $M = 100$. If we consider the implementation of the Hessian based diffusion in 1D, the expression (8) becomes:

$$g(z_c) = \frac{\sum_{j=-N/2}^{N/2} f(z_c + j) \exp\left[-j^2 \left(\frac{\partial^2 f(z_c + j)}{\partial x^2}\right)^{-1}\right]}{\sum_{j=-N/2}^{N/2} \exp\left[-j^2 \left(\frac{\partial^2 f(z_c + j)}{\partial x^2}\right)^{-1}\right]} \tag{12}$$

where $N$ is the size of the diffusion window. The location where the diffusion is calculated corresponds to the center of the window located at $z_c = K - i$. It can be observed that the second derivatives of the signal from (11) are zero except at the locations $j = \{i - 1, i, i + 1\}$. After substituting (11) into (12) we obtain the following result:

$$g(K - i) = y + \frac{M \exp\left(-i^2 \frac{2}{M}\right)}{\exp\left(-(i - 1)^2 \frac{4}{M}\right) + \exp\left(-i^2 \frac{2}{M}\right) + \exp\left(-(i + 1)^2 \frac{4}{M}\right)}$$

The second term on the right-hand side is the bias that results from diffusing the outlier present in the signal from (11). The diffusion of the outlier will influence several signal values in its neighbourhood, depending on the window size $N$. We assume a window of $N = 3$ and we apply diffusion repeatedly on the signal displayed in Fig. 1(a). The results produced by diffusing the signal for successive iterations are shown in Fig. 1(b). As we can observe, the outlier is diffused into the surrounding signal. After a certain number of iterations, the values of the diffused signal become stationary and remain biased. In conclusion, a robust diffusion operator should be used in order to eliminate or reduce the bias.
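The spreading of the outlier can be reproduced with a few lines of Python/NumPy. This is a minimal sketch of (12) for a window of N = 3, under assumptions that are not taken from the paper: the second derivative is approximated by the second difference, its magnitude plus a small constant eps is used so that the exponent stays finite and negative, and the signal ends are replicated. The function name diffuse_1d is hypothetical.

```python
import numpy as np

def diffuse_1d(f, eps=1e-2):
    """One iteration of Eq. (12) with a window of N = 3 samples (j = -1, 0, 1)."""
    f = np.asarray(f, dtype=float)
    n = f.size
    padded = np.pad(f, 1, mode="edge")
    # Second difference as a stand-in for the second derivative (assumption);
    # magnitude plus eps keeps the kernel exponent finite and negative.
    second = np.abs(padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) + eps
    second = np.pad(second, 1, mode="edge")
    num = np.zeros_like(f)
    den = np.zeros_like(f)
    for j in (-1, 0, 1):
        w = np.exp(-j**2 / second[1 + j:1 + j + n])   # derivative taken at z_c + j
        num += w * padded[1 + j:1 + j + n]
        den += w
    return num / den

# Signal of Eq. (11): constant level y with an outlier of height M at x = K.
y, M, K = 10.0, 100.0, 10
signal = np.full(21, y)
signal[K] += M

once = diffuse_1d(signal)          # the outlier leaks into its neighbours
many = signal.copy()
for _ in range(12):
    many = diffuse_1d(many)        # repeated diffusion does not remove the bias
```

Each output sample is a convex combination of input samples in which the centre sample has weight 1, so any value above the level y stays above y after every iteration: the outlier is lowered and spread, but never rejected.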
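Fig. 1. The effect of diffusion on outliers: (a) initial signal (function against location); (b) diffused signal (diffused function against location, curves labelled k = 1, 3, 6 and 12).

3.2. The alpha-trimmed mean of Hessian kernel

In the following we show how we can robustly diffuse vector fields. The alpha-trimmed mean algorithm [9], of which interquartile averaging is a special case, ranks the given data samples and eliminates a certain percentage of them at the extremes of the ranked array. The aim of this method is to remove outliers and to apply the diffusion algorithm only on data that is statistically consistent. The updating equation is:

$$\hat{\mathbf{V}}^{t+1}_{k,c} = \frac{\sum_{i=\alpha N}^{N - \alpha N} \mathbf{V}^{t}_{(i)} \exp\left[-(\mathbf{x}_i - \mathbf{z}_c)^T \mathbf{H}^{-1}_{c} (\mathbf{x}_i - \mathbf{z}_c)\right]}{\sum_{i=\alpha N}^{N - \alpha N} \exp\left[-(\mathbf{x}_i - \mathbf{z}_c)^T \mathbf{H}^{-1}_{c} (\mathbf{x}_i - \mathbf{z}_c)\right]}$$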
where $\alpha$ is the trimming percentage from a ranked array of $N$ vectors from the neighbourhood $\eta$, and $\mathbf{V}^{t}_{(i)}$ represents the $i$-th ordered vector at iteration $t$. This algorithm reduces the computational cost of the diffusion. In the experimental results $N = 9$ and $\alpha N = 2$, so 4 vectors are eliminated from the smoothing process. The Hessian can be either $\mathbf{H}_{2D}$ or $\mathbf{H}_{3D}$ from (10).

3.3. The median of directional Hessian kernels

Another robust statistics approach consists of combining the use of median statistics with the diffusion kernel. The median algorithm has the ability to eliminate up to 50% outliers [9] and has been used for robustifying Gaussian kernels [2] as well as in combination with diffusion kernels for smoothing molecular images [6]. The way this algorithm works is illustrated in Fig. 2. The Hessian-based diffusion algorithm is initially applied as in (8), but calculated directionally instead of centrally with respect to the window location. The window locations for which the diffusion is calculated are indicated by crosses in Fig. 2. Subsequently, the median operator is applied onto the results produced by the directional diffusions:

$$\hat{\mathbf{V}}^{t+1}_{k,c} = \mathrm{Median}\left(\hat{\mathbf{V}}^{t+1}_{(i)}, \eta_k(med)\right) \tag{13}$$

[Fig. 2 diagram: nine blocks labelled Diff and one block labelled MED]
Fig. 2. Median of directional Hessian kernels.

where $\eta_k(med)$ is the window of the median operator centered at $k$, usually 3 × 3 vectors that resulted from directional diffusions. This algorithm takes into account extended neighbourhoods, aiming to reduce overlaps among local estimates. The total number of vectors considered in the smoothing process is extended from 3 × 3 vectors to a total of 5 × 5 vectors when assuming overlaps among the diffused neighbourhoods. The influence of outliers will be diffused during the first operation. However, during the second processing operation any biased influence is eliminated. This operator can be applied onto the 2D Hessian, the multiple frame 2D Hessian or the 3-D Hessian, where the neighbourhood becomes $\eta(med) = 5 \times 5 \times 4$ vectors.

4. EXPERIMENTAL RESULTS

The proposed robust diffusion algorithms have been applied on artificial vector fields and on optical flows extracted from image sequences. In the first set of experiments an artificial vector field is considered for testing 2-D robust diffusion kernels. Let us consider the function:

$$Z(x, y) = 3(1 - x)^2 e^{-x^2 - (y + 1)^2} - 10\left(\frac{x}{5} - x^3 - y^5\right) e^{-x^2 - y^2} - \frac{1}{3} e^{-(x + 1)^2 - y^2} \tag{14}$$

The vector field components are obtained as $\mathbf{V} = \left(\frac{\partial Z}{\partial x}, \frac{\partial Z}{\partial y}\right)$. The resulting vector field is shown in Fig. 3(a). We corrupt the vector field using Gaussian and Poisson noise distributions, both added independently along the x and y axes. The noisy vector field corrupted with Poisson noise of variance $\sigma^2 = 0.25$ is shown in Fig. 3(b).
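Before turning to the results, the two robust operators of Sections 3.2 and 3.3 can be illustrated on a single 3 × 3 window. The Python/NumPy code below is only a sketch under assumptions that are not taken from the paper: samples are ranked per component (scalar ordering), an isotropic Gaussian stands in for the Hessian kernel weights, and the directional diffusion results fed to the median are made-up numbers. The function names alpha_trimmed_kernel_mean and median_of_estimates are hypothetical.

```python
import numpy as np

def alpha_trimmed_kernel_mean(values, weights, n_trim=2):
    """Kernel-weighted mean of the samples left after trimming n_trim from each end."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")     # rank the samples
    keep = order[n_trim:values.size - n_trim]     # drop both extremes of the ranking
    return float(np.sum(weights[keep] * values[keep]) / np.sum(weights[keep]))

def median_of_estimates(estimates):
    """Median of the directional diffusion results, in the spirit of Eq. (13)."""
    return float(np.median(np.asarray(estimates, dtype=float)))

# 3x3 window (N = 9) of one flow component with an outlier at the centre.
window = np.array([10.0, 10.0, 10.0, 10.0, 110.0, 10.0, 10.0, 10.0, 10.0])
offsets = np.array([(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)], dtype=float)
# Placeholder weights: an isotropic Gaussian replaces exp(-d^T H^-1 d) here.
weights = np.exp(-np.sum(offsets**2, axis=1))

# Plain kernel average: the outlier biases the result.
plain = float(np.sum(weights * window) / np.sum(weights))
# Alpha-trimmed version with alpha*N = 2: the outlier is ranked last and removed.
trimmed = alpha_trimmed_kernel_mean(window, weights, n_trim=2)
# Median over nine directional estimates (illustrative values, three of them biased).
directional = [10.0, 10.0, 10.0, 60.0, 44.0, 60.0, 10.0, 10.0, 10.0]
robust = median_of_estimates(directional)
```

With N = 9 and n_trim = 2 the trimmed mean uses five of the nine samples, which matches the setting quoted for the experiments. How vectors are ranked in the original method is not stated in the text, so the scalar ordering used here is only one possible choice.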
[Fig. 3 panels: (a) Artificial vector field, (b) Poisson noise, (c) 2DH, (d) Perona-Malik]
[Fig. 3 panels, continued: (e) ATM2DH, (f) MED2DH]
Fig. 3. Artificial vector field after corruption with Poisson noise with $\sigma^2 = 0.25$ and after 5 iterations of smoothing by various methods.

The algorithms described in this paper and two well known diffusion algorithms, namely Perona-Malik (PM) proposed in [4] and the algorithm of Black et al. described in [5], have been used for smoothing the noisy data with the aim of reconstructing the original vector fields. Table 1 provides the numerical results obtained after one iteration of smoothing synthetic vector fields corrupted by noise. The algorithms presented in this paper are denoted as: 2DH - 2D Hessian, ATM2DH - Alpha-Trimmed Mean 2D Hessian, and MED2DH - Median 2D Hessian. Five different variance values are considered for both Gaussian and Poisson noise. The numerical results are assessed in terms of two error measures representing the difference between the original and the smoothed vector fields: the mean square error (MSE) and the mean cosine error (MCE). The MSE is given by:

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} (\mathbf{V}_i - \hat{\mathbf{V}}_i)(\mathbf{V}_i - \hat{\mathbf{V}}_i)^T}{N} \tag{15}$$

where $N$ is the total number of vectors in the given vector field, $\mathbf{V}_i$ is the ground truth vector before adding the noise and $\hat{\mathbf{V}}_i$ is the noisy vector at location $i$ after smoothing. The MCE is given by:

$$\mathrm{MCE} = \frac{\sum_{i=1}^{N} \mathbf{V}_i \cdot \hat{\mathbf{V}}_i}{N} \tag{16}$$

For vectors of unit length, the dot product between the two vectors provides the cosine of the angle between them.

Table 1. MSE and MCE results for the synthetic vector field after one iteration of smoothing.

Noise     σ²    Perona-Malik       Black              2DH                ATM2DH             MED2DH
                MSE      MCE       MSE      MCE       MSE      MCE       MSE      MCE       MSE      MCE
Gaussian  0.01  0.0111   0.7217    0.0111   0.7217    0.0170   0.6811    0.0218   0.6259    0.0113   0.7674
Gaussian  0.10  0.0612   0.5317    0.0612   0.5315    0.0834   0.5355    0.2155   0.4419    0.0431   0.6045
Gaussian  0.25  0.1557   0.4354    0.1556   0.4351    0.1663   0.4730    0.4733   0.3573    0.0961   0.5490
Gaussian  0.30  0.2137   0.4690    0.2133   0.4686    0.2467   0.5005    0.4618   0.3752    0.1402   0.5815
Gaussian  0.40  0.4253   0.4147    0.4247   0.4140    0.4595   0.4552    0.6156   0.3239    0.3310   0.5448
Poisson   0.01  0.0211   0.9607    0.0211   0.9611    0.0618   0.9518    0.0049   0.9764    0.0098   0.9858
Poisson   0.05  0.2906   0.8032    0.2906   0.8051    0.6488   0.7817    0.0090   0.9714    0.1508   0.8819
Poisson   0.10  1.3642   0.6311    1.3641   0.6332    2.2359   0.6050    0.0699   0.9254    0.9082   0.6918
Poisson   0.25  8.0080   0.3741    8.0077   0.3755    10.2883  0.3552    2.4078   0.6323    7.6896   0.3464
Poisson   0.40  19.5920  0.2391    19.5909  0.2399    22.5858  0.2318    10.6945  0.3705    19.3195  0.2149

The method providing the best result of the five tested, for each noise distribution, variance and error measure, is highlighted in bold in Table 1. MED2DH provides the best performance when smoothing out Gaussian noise, whilst ATM2DH is the best kernel when smoothing out Poisson noise. Figs. 3(c)-3(f) show the results after applying diffusion kernels onto the noisy artificial vector field. From these figures we conclude that MED2DH provides well defined moving objects and smooth vector fields.

The second set of experiments provides a comparative study when diffusion algorithms are applied on optical flows extracted from image sequences. The block matching algorithm was used to initialize the optical flow. The image sequences considered in the experimental results are enumerated in Table 2. Various smoothing algorithms have been considered on the resulting optical flows. The algorithms are denoted according to the type of kernel used for smoothing. The proposed algorithms are applied on both 2-D and 3-D vector fields. The new notations represent: 3DH - 3D Hessian kernel, ATM-3DH - alpha-trimmed mean of 3D Hessian, MED-3DH - median of 3D Hessian. Frame 4 from the “Concorde take-off” sequence is shown in Fig. 4(a). This sequence was chosen for its complex motion characteristics such as rotational movement, turbulent air from jet thrusters, blocky artifacts from frame compression and camera movement combined with a rigid moving object. Fig. 4(b) shows the optical flow estimated using the block matching algorithm between frames 4 and 6.
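The synthetic experiment and the error measures (15) and (16) can be sketched in Python/NumPy as follows. This is an illustration under stated assumptions rather than the setup used for Table 1: the grid size (60 × 60) and range ([-3, 3]) are guesses, the derivatives are taken numerically, only Gaussian noise is shown, and the MCE is evaluated on vectors scaled to unit length so that the dot product is a cosine. The function names peaks, synthetic_field, mse and mce are hypothetical.

```python
import numpy as np

def peaks(x, y):
    """Surface Z(x, y) of Eq. (14)."""
    return (3.0 * (1.0 - x)**2 * np.exp(-x**2 - (y + 1.0)**2)
            - 10.0 * (x / 5.0 - x**3 - y**5) * np.exp(-x**2 - y**2)
            - np.exp(-(x + 1.0)**2 - y**2) / 3.0)

def synthetic_field(n=60, extent=3.0):
    """Vector field V = (dZ/dx, dZ/dy) on an n x n grid (assumed size and range)."""
    axis = np.linspace(-extent, extent, n)
    x, y = np.meshgrid(axis, axis)          # x varies along columns, y along rows
    dz_dy, dz_dx = np.gradient(peaks(x, y), axis, axis)
    return np.stack([dz_dx, dz_dy], axis=-1)

def mse(truth, estimate):
    """Mean squared vector difference, Eq. (15)."""
    diff = np.asarray(truth, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.mean(np.sum(diff**2, axis=-1)))

def mce(truth, estimate, tiny=1e-12):
    """Mean cosine, Eq. (16), with both vectors scaled to unit length (assumption)."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    dot = np.sum(truth * estimate, axis=-1)
    norms = np.linalg.norm(truth, axis=-1) * np.linalg.norm(estimate, axis=-1)
    return float(np.mean(dot / (norms + tiny)))   # tiny guards against zero vectors

clean = synthetic_field()
rng = np.random.default_rng(0)
# Gaussian noise of variance 0.25 added independently to both components.
noisy = clean + rng.normal(scale=np.sqrt(0.25), size=clean.shape)

error_mse = mse(clean, noisy)
error_mce = mce(clean, noisy)
```

In this form a lower MSE and a higher MCE both indicate a field closer to the ground truth; a smoothing method would be scored by replacing noisy with its output.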
In order to assess the efficiency of the smoothed optical flows we calculate the PSNR (peak signal-to-noise ratio) between the real frame and the frame reconstructed using the smoothed optical flow. This is an important measure for frame prediction based on the optical flow, particularly in video coding algorithms when used in the “Concorde” sequence. Fig. 5 displays the PSNR convergence, obtained when displacing each block of the frame individually with its corresponding smoothed vector, for the various smoothing algorithms.

[Fig. 4 panels: (a) Frame 4, (b) Original vector field, (c) 2DH, (d) 3DH, (e) ATM-3DH, (f) MED-M2DH]
Fig. 4. Smoothed optical flow in “Concorde” sequence.

Figs. 4(c)-4(f) show the smoothed vector fields after 5 iterations when using 2DH, 3DH, ATM-3DH and MED-M2DH. Whilst diffusing the noise, the vector field is smoothed uniformly along the edges of the rigid object in the case of robust kernels.
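The PSNR evaluation described above can be sketched in Python/NumPy. The code is a minimal illustration under assumptions that are not given in the paper: blocks are 8 × 8 pixels, each block of the predicted frame is copied from the reference frame at the position shifted back by its (rounded) motion vector, coordinates are clipped at the frame border, and the peak value is 255. The function names predict_frame and psnr are hypothetical.

```python
import numpy as np

def predict_frame(reference, flow, block=8):
    """Displace each block of the reference frame by its motion vector (dx, dy)."""
    reference = np.asarray(reference, dtype=float)
    h, w = reference.shape
    predicted = np.empty_like(reference)
    for bi in range(0, h, block):
        for bj in range(0, w, block):
            dx, dy = flow[bi // block, bj // block]
            # Source coordinates of this block, clipped to the frame (assumption).
            rows = np.clip(np.arange(bi, min(bi + block, h)) - int(round(float(dy))), 0, h - 1)
            cols = np.clip(np.arange(bj, min(bj + block, w)) - int(round(float(dx))), 0, w - 1)
            predicted[bi:bi + block, bj:bj + block] = reference[np.ix_(rows, cols)]
    return predicted

def psnr(actual, predicted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    err = np.mean((np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float))**2)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak**2 / err))

rng = np.random.default_rng(0)
frame_a = rng.integers(0, 256, size=(32, 32)).astype(float)
frame_b = np.roll(frame_a, 2, axis=1)      # content moved 2 pixels to the right
flow = np.zeros((4, 4, 2))
flow[..., 0] = 2.0                         # every block moves by dx = 2, dy = 0
prediction = predict_frame(frame_a, flow)

quality_with_flow = psnr(frame_b, prediction)
quality_without_flow = psnr(frame_b, frame_a)
```

A smoothed optical flow would be passed as flow, and a higher PSNR against the actual next frame indicates a better prediction.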
Fig. 5. PSNR for the predicted “Concorde” frame 6.

Table 2. PSNR (dB) between the predicted frame based on smoothed optical flow after 1 iteration and the actual frame.

Method      Taxi    Concorde  Fighter  Clouds  Tornado  Traffic
2DH         19.38   18.01     17.46    18.11   21.78    14.01
PM          18.15   17.40     16.65    16.93   21.14    12.97
Black       18.05   17.39     16.68    16.99   21.21    13.00
ATM-2DH     20.51   16.97     16.73    17.06   22.33    14.10
M2DH        19.35   19.20     18.40    17.20   21.84    13.68
ATM-M2DH    20.20   18.06     16.10    16.36   23.97    13.38
3DH         19.34   17.94     17.54    18.02   21.86    14.10
ATM-3DH     19.92   17.91     15.91    16.24   23.72    13.26
MED-2DH     21.11   19.09     19.17    20.66   22.84    16.06
MED-M2DH    20.74   20.56     20.17    20.12   23.05    16.22
MED-3DH     21.22   19.08     19.08    20.55   23.25    16.03

Table 2 provides the PSNR calculated between the actual frame and the predicted frame for six different image sequences. The best results in Table 2 are highlighted in bold. From these results it can be concluded that robust kernels such as MED-M2DH, ATM-3DH, ATM-M2DH, and MED-2DH have better overall performance than other algorithms such as Perona-Malik, Black or 2DH. In general, methods that operate on 3D lattices such as 3DH (10) and MED-2DH kernels (13) are better than those that employ 2DH kernels, as for example (8).

5. CONCLUSIONS

A set of robust diffusion algorithms is employed for optical flow smoothing. The local Hessian diffusion kernel performs smoothing along the main optical flow structure, thus preserving the moving object borders. In this study, we have shown that diffused outliers produce bias in the neighbouring data. The extension to 3D Hessian based kernels considers the temporal information from multiple frames. Robust statistics algorithms such as alpha-trimmed mean and marginal median are used for eliminating outliers. The proposed algorithms are applied on artificial data and on motion estimated from image sequences. The proposed robust diffusion methodology provides clear improvements over classical diffusion algorithms. The improvements are especially evident when dealing with complex optical flows.

6. REFERENCES

[1] J. L. Barron, D. J. Fleet, and S. Beauchemin, “Performance of optical flow techniques,” Int. Journal of Computer Vision, vol. 12, no. 1, pp. 43–77, 1994.
[2] A. G. Bors and I. Pitas, “Optical flow estimation and moving object segmentation based on median radial basis function network,” IEEE Trans. on Image Processing, vol. 7, no. 5, pp. 693–702, 1998.
[3] A. G. Bors and I. Pitas, “Prediction and tracking of moving objects in image sequences,” IEEE Trans. on Image Processing, vol. 9, no. 8, pp. 1441–1445, 2000.
[4] P. Perona and J. Malik, “Scale-space and edge detection using anisotropic diffusion,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 629–639, 1990.
[5] M. J. Black, G. Sapiro, D. H. Marimont, and D. Heeger, “Robust anisotropic diffusion,” IEEE Trans. on Image Processing, vol. 7, no. 3, pp. 421–432, 1998.
[6] J. Ling and A. C. Bovik, “Smoothing low-SNR molecular images via anisotropic median-diffusion,” IEEE Trans. on Medical Imaging, vol. 21, no. 4, pp. 377–384, 2002.
[7] G. Unal, H. Krim, and A. Yezzi, “Stochastic differential equations and geometric flows,” IEEE Trans. on Image Processing, vol. 11, no. 12, pp. 1405–1416, 2002.
[8] D. Tschumperlé and R. Deriche, “Vector-valued image regularization with PDEs: a common framework for different applications,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 27, no. 4, pp. 506–517, 2005.
[9] I. Pitas and A. N. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications, Kluwer Academic, Norwell, MA, 1992.
[10] R. A. Hummel, B. Kimia, and S. W. Zucker, “Gaussian blur and the heat equation: forward and inverse solutions,” in Proc. of Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 1985, pp. 668–671.
[11] J. Lafferty and G. Lebanon, Diffusion Kernels on Statistical Manifolds, CMU-CS-04-101, Technical Report, Carnegie Mellon University, USA, 2004.