IEICE TRANS. INFORMATION AND SYSTEMS, VOL. E79-D, NO. 7 JULY 1996
915
PAPER
Optical Flow Detection Using a General Noise Model

Naoya OHTA†, Member

SUMMARY  In the usual optical flow detection, the gradient constraint, which expresses the relationship between the gradient of the image intensity and its motion, is combined with the least-squares criterion. This criterion amounts to assuming that only the time derivative of the image intensity contains noise. In this paper, we assume that all the image derivatives contain noise and derive a new optical flow detection technique. Since this method requires knowledge of the covariance matrix of the noise, we also discuss a method for its estimation. Our experiments show that the proposed method can compute optical flow more accurately than the conventional method.

key words: image motion, maximum likelihood estimation, least-squares criterion, covariance matrix, generalized eigenvalues, derivative kernels
1. Introduction
Detecting optical flow from motion images is one of the most fundamental processes in computer vision, and many techniques for it have been proposed in the past [2]. One of the most basic approaches is the use of the gradient constraint, which is expressed in the form

E_x u + E_y v + E_t = 0,  (1)

where E_x, E_y and E_t are the partial derivatives of the image intensity with respect to the space coordinates x, y and time t, respectively; u and v are the x- and y-components of the flow, respectively. This approach, however, has the following two problems:

1. The gradient constraint alone is insufficient to determine the flow uniquely at each pixel.
2. The gradient constraint imposes only an approximate relationship between the flow and the image derivatives.

The first problem is often referred to as the aperture problem. In order to obtain additional constraints, we need some assumption about the spatial distribution of the flow. The following are typical examples:

Constant flow assumption: The flow is constant over a small image patch [7].

Manuscript received October 11, 1995. Manuscript revised March 6, 1996.
†The author is with the Department of Computer Science, Gunma University, 1-5-1 Tenjin-cho, Kiryu 376 Gunma, Japan.
Smoothness assumption: The flow changes smoothly over the entire image [3].

The second problem arises for several reasons. First of all, electronic noise of the imaging device invalidates Eq. (1). Furthermore, in actual computation we must approximate the image derivatives by the values obtained by applying a digital kernel. A simple way to cope with this problem is to introduce an error term \epsilon into Eq. (1) in the form

E_x u + E_y v + E_t = \epsilon,  (2)

and to apply the least-squares criterion. Here, symbols without bars represent actually computed values. Combining the least-squares criterion with the constant flow assumption, we obtain the following minimization problem:

J_{LS} = \sum_{patch} \epsilon^2 \to \min.  (3)

The symbol \sum_{patch} denotes summation over the pixels in the image patch over which the flow is assumed to be constant. We can view the above minimization as the following statistical estimation. Suppose the time derivative E_t is observed with additive Gaussian noise \Delta E_t, but E_x and E_y are free from noise:

E_x = \bar{E}_x, \quad E_y = \bar{E}_y, \quad E_t = \bar{E}_t + \Delta E_t.  (4)

The minimization (3) is equivalent to maximum likelihood estimation for this noise model (Appendix). However, this noise model is inconsistent with the fact that the observed image intensity contains noise and all the derivatives are computed by digital kernels. Hence, it is more reasonable to assume that all the derivatives are observed with Gaussian noise in the form

E_x = \bar{E}_x + \Delta E_x, \quad E_y = \bar{E}_y + \Delta E_y, \quad E_t = \bar{E}_t + \Delta E_t.  (5)

In the next section, we present an optical flow detection method based on this noise model coupled with the constant flow assumption.
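To make the conventional criterion concrete: under the constant flow assumption, the minimization (3) reduces to solving 2 × 2 normal equations for (u, v). The following sketch illustrates this (the function and variable names are our own, not part of the paper):

```python
import numpy as np

def flow_least_squares(Ex, Ey, Et):
    """Conventional least-squares flow for one patch (Eq. (3)).

    Ex, Ey, Et: 1-D arrays of the derivative values at the pixels of
    the patch.  Minimizing sum (Ex*u + Ey*v + Et)^2 over (u, v)
    leads to the 2x2 normal equations solved below.
    """
    A = np.stack([Ex, Ey], axis=1)        # n x 2 matrix of spatial derivatives
    b = -Et
    # Normal equations: (A^T A) (u, v)^T = A^T b
    u, v = np.linalg.solve(A.T @ A, A.T @ b)
    return u, v

# Synthetic patch whose derivatives exactly satisfy Eq. (1) with flow (0.5, -0.2)
rng = np.random.default_rng(0)
Ex = rng.normal(size=100)
Ey = rng.normal(size=100)
Et = -(0.5 * Ex - 0.2 * Ey)
print(flow_least_squares(Ex, Ey, Et))     # recovers approximately (0.5, -0.2)
```

Note that this estimator treats only E_t as noisy, which is exactly the assumption of Eq. (4) that the rest of the paper relaxes.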
2. Flow Detection
2.1 Definitions

We define the true intensity gradient vector \nabla\bar{E}, the observed gradient vector \nabla E, the three-dimensional
representation of the flow u and the three-dimensional Gaussian noise vector e_\alpha as follows:

\nabla\bar{E}_\alpha = (\bar{E}_x\ \bar{E}_y\ \bar{E}_t)^\top,  (6)
\nabla E_\alpha = (E_x\ E_y\ E_t)^\top,  (7)
u = (u\ v\ 1)^\top,  (8)
e_\alpha = (\Delta E_x\ \Delta E_y\ \Delta E_t)^\top.  (9)

In the above equations, the subscript \alpha runs over the pixels in the patch over which the flow is assumed to be constant, and the superscript \top denotes transpose. We assume that the noise e_\alpha has zero mean and the same covariance matrix V_e at all pixels. In terms of the above vectors, the noise model is

\nabla E_\alpha = \nabla\bar{E}_\alpha + e_\alpha,  (10)

and the gradient constraint has the form

(\nabla\bar{E}_\alpha, u) = 0,  (11)

where ( \cdot , \cdot ) denotes the inner product of vectors. Equation (2) can be rewritten as

(\nabla E_\alpha, u) = \epsilon_\alpha.  (12)

Under the noise model of Eq. (10) and the constraint (11), the maximum likelihood estimator \hat{u} of the flow u is given as the solution of the following minimization (Chap. 7 of Ref. [6] or Chap. 7 of Ref. [5]):

J = \sum_{\alpha=1}^{n} \frac{(\nabla E_\alpha, u)^2}{(u, V_e u)} \to \min.  (13)

Here, n is the number of pixels in the patch.

2.2 Minimization Computation

Since the function J given in Eq. (13) is independent of the scale of u, the minimization can be conducted over a sphere in three dimensions. Using the three-dimensional vector

v = (p\ q\ r)^\top  (14)

and the matrix

M = \sum_{\alpha=1}^{n} \nabla E_\alpha \nabla E_\alpha^\top,  (15)
we can rewrite the problem (13) as follows (\|\cdot\| denotes the norm of a vector):

J = \frac{(v, M v)}{(v, V_e v)} \to \min, \quad \|v\| = 1.  (16)

The vector v can be converted into the vector u by normalizing the third component to 1: u = v/r. The solution of (16) is obtained by solving the following generalized eigenvalue problem of M with respect to V_e (Sect. 2.2 of Ref. [6] or Sect. 8.2 of Ref. [5]):

M v = \lambda V_e v.  (17)

The solution of this generalized eigenvalue problem is obtained by the following procedure:

Step 1: Compute the eigenvalues \sigma_i and the corresponding unit eigenvectors m_i of matrix V_e (i = 1, 2, 3).

Step 2: Compute

P = \sum_{i=1}^{3} \frac{m_i m_i^\top}{\sqrt{\sigma_i}}.  (18)

Step 3: Compute the eigenvalues \lambda_i and the corresponding unit eigenvectors n_i (i = 1, 2, 3) of matrix P M P.

Step 4: The generalized eigenvalues and the corresponding unit generalized eigenvectors of Eq. (17) are given by \lambda_i and P n_i / \|P n_i\|, respectively.

The solution \hat{v} = (\hat{p}\ \hat{q}\ \hat{r})^\top of the problem (16) is given by the generalized eigenvector for the smallest generalized eigenvalue of Eq. (17). The estimate \hat{u} of the flow is given by \hat{v}/\hat{r}.

2.3 Reliability of the Detected Flow

It is very important to know the reliability of the detected flow [2],[8]. The reliability is measured by the covariance matrix V_u of the estimated flow \hat{u}, which can be evaluated as follows (see Sect. 7.1 of Ref. [6] for the details):

1. Approximate the function J given in Eq. (13) by a quadratic function \tilde{J} in u and \nabla E_\alpha around their true values \bar{u} and \nabla\bar{E}_\alpha.
2. Let u = \bar{u} + \Delta u, and determine the value \Delta u that minimizes \tilde{J} as a linear function in \nabla E_\alpha.
3. Evaluate the covariance matrix V_u = E[\Delta u \Delta u^\top].

It can be shown that the resulting covariance matrix has the form

V_u = (\bar{u}, V_e \bar{u}) \left( \sum_{\alpha=1}^{n} (P_k \nabla\bar{E}_\alpha)(P_k \nabla\bar{E}_\alpha)^\top \right)^-,  (19)

where

P_k = diag(1, 1, 0).  (20)

Here, (\cdot)^- denotes the (Moore-Penrose) generalized inverse, and diag(a, b, c) denotes the diagonal matrix with diagonal elements a, b and c in that order. It can be proved that Eq. (19) gives a theoretical bound, called the Cramér-Rao lower bound, on the attainable accuracy [4],[6].
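Steps 1 through 4 above translate directly into a few lines of numerical code. The following is a minimal sketch (the function name and the n × 3 gradient array layout are our own assumptions):

```python
import numpy as np

def flow_general_noise(grads, Ve):
    """Flow estimate under the general noise model (Steps 1-4).

    grads: n x 3 array whose rows are the observed gradient vectors
    (Ex, Ey, Et) over the patch; Ve: 3x3 noise covariance matrix.
    Solves M v = lambda Ve v and returns u = v / r.
    """
    M = grads.T @ grads                    # Eq. (15)
    # Step 1: eigenvalues sigma_i and unit eigenvectors m_i of Ve
    sigma, m = np.linalg.eigh(Ve)
    # Step 2: P = sum_i m_i m_i^T / sqrt(sigma_i), Eq. (18)
    P = (m / np.sqrt(sigma)) @ m.T
    # Step 3: eigensystem of P M P (eigh sorts eigenvalues ascending)
    lam, n_vecs = np.linalg.eigh(P @ M @ P)
    # Step 4: generalized eigenvector for the smallest eigenvalue
    v = P @ n_vecs[:, 0]
    return v / v[2]                        # normalize third component to 1

# Noise-free example: gradients satisfying Eq. (11) with flow (0.3, -0.1)
rng = np.random.default_rng(1)
Ex = rng.normal(size=100)
Ey = rng.normal(size=100)
Et = -(0.3 * Ex - 0.1 * Ey)
grads = np.stack([Ex, Ey, Et], axis=1)
Ve = np.diag([2.0, 2.0, 1.0])
print(flow_general_noise(grads, Ve))       # close to (0.3, -0.1, 1)
```

The whitening matrix P satisfies P V_e P = I, which is why the ordinary eigenproblem of P M P in Step 3 is equivalent to the generalized problem (17).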
In an actual computation, the true values \bar{u} and \nabla\bar{E}_\alpha in Eq. (19) are approximated by the estimated value \hat{u} and the observed values \nabla E_\alpha, respectively.

Since the third component of u is 1, the elements in the third column and the third row of V_u are all zero. Hence, the covariance matrix of (\hat{u}\ \hat{v})^\top is given in the form

V[u, v] = (\hat{u}, V_e \hat{u}) \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}^{-1},  (21)

where M_{ij} is the (i, j) element of the matrix M defined by Eq. (15).
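Equation (21) is a one-line computation once M is available; the following sketch (names are our own) evaluates the 2 × 2 covariance of the detected flow from the observed gradients, the noise covariance, and the estimate û:

```python
import numpy as np

def flow_covariance(grads, Ve, u_hat):
    """Covariance of the detected flow (u, v) via Eq. (21).

    grads: n x 3 observed gradients; Ve: 3x3 noise covariance;
    u_hat: estimated flow vector (u, v, 1).  The scalar
    (u_hat, Ve u_hat) multiplies the inverse of the upper-left
    2x2 block of M.
    """
    M = grads.T @ grads                    # Eq. (15)
    scale = u_hat @ Ve @ u_hat
    return scale * np.linalg.inv(M[:2, :2])
```

A condition-number check on the 2 × 2 block would be advisable before inverting: nearly parallel gradients in the patch (the aperture problem) make the inverse, and hence the reported reliability, unstable.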
3. Estimation of the Noise Covariance Matrix
In order to compute the estimator \hat{u}, we need to know the covariance matrix V_e of the noise e_\alpha. Since the scale of V_e does not affect the solution of the minimization (13), the absolute magnitude of V_e is not very important. However, we need to know at least the ratios between the elements.

Although we do not know the true value \nabla\bar{E}_\alpha, the covariance matrix V_e can be estimated from the distribution of the residual \epsilon_\alpha if the true flow is known. Let \bar{u}_\alpha = (u_\alpha\ v_\alpha\ 1)^\top be the true flow at the \alpha-th pixel. From Eqs. (10), (11) and (12), the residual can be written as

\epsilon_\alpha = (\nabla\bar{E}_\alpha + e_\alpha, \bar{u}_\alpha) = (e_\alpha, \bar{u}_\alpha).  (22)

Since e_\alpha is a Gaussian random variable of mean zero, so is \epsilon_\alpha. Its variance is given by

s_\alpha = (\bar{u}_\alpha, V_e \bar{u}_\alpha).  (23)

If we observe \epsilon_\alpha, \alpha = 1, \ldots, m, the log-likelihood l is given by

l = -\frac{m}{2} \log 2\pi - \frac{\tilde{l}}{2},  (24)

where

\tilde{l} = \sum_{\alpha=1}^{m} \left( \log s_\alpha + \frac{\epsilon_\alpha^2}{s_\alpha} \right).  (25)

The maximum likelihood estimator of V_e is obtained by minimizing \tilde{l} under the constraint that V_e is positive definite.

4. Experiments
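As a sketch of this estimation, restricted to a diagonal V_e (the case assumed in the experiments of Sect. 4.2), one can minimize l̃ numerically. The log-variance parametrization and all names below are our own assumptions, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize

def estimate_Ve_diag(grads, flows):
    """Maximum likelihood estimation of a diagonal V_e from residuals
    at pixels whose true flow is known (sketch of Sect. 3).

    grads: m x 3 observed gradients; flows: m x 3 true flow vectors
    (u, v, 1).  Parametrizing the diagonal by log-variances keeps
    V_e positive definite without explicit constraints.
    """
    eps = np.einsum('ij,ij->i', grads, flows)     # residuals, Eq. (22)

    def l_tilde(log_var):
        var = np.exp(log_var)
        s = (flows ** 2) @ var                    # variances, Eq. (23)
        return np.sum(np.log(s) + eps ** 2 / s)   # Eq. (25)

    x0 = np.full(3, np.log(np.var(eps)))          # rough common scale
    res = minimize(l_tilde, x0, method='Nelder-Mead',
                   options={'maxiter': 10000, 'xatol': 1e-10, 'fatol': 1e-10})
    return np.diag(np.exp(res.x))
```

As noted above, only the ratios between the elements matter for the flow estimate itself, so the absolute scale recovered here is incidental.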
4.1 Generation of Image Sequences

We simulated motion images by the following procedure:

1. We took 16 shots of a static scene with a video
camera (SONY XC-75) and averaged them to remove noise. The averaged image (which we call image A) is regarded as a noise-free image; it is shown in Fig. 1(a). The frame size is 640 \times 480 pixels; the maximum intensity level is 255.

2. We created images B_i by shifting image A along the x-axis (horizontal axis) by i pixels (i = -4, -3, \ldots, 4).

3. We created images C_i by convolving images B_i with a Gaussian kernel of standard deviation 2.4 pixels and decimating the pixels to reduce the frame sizes by 1/4. This procedure simulates the imaging process of a camera which has a Gaussian-shaped receptive field of standard deviation 0.6 (= 2.4/4) pixels.

4. We took 20 images of a white panel. The intensity level is almost homogeneous, around 200. We averaged them and extracted the camera noise by subtracting the averaged image from each of the 20 images. We then cut out ten 160 \times 120-pixel patches N_j (j = 1, 2, \ldots, 10) from the resulting images without overlaps. One of them is shown in Fig. 1(b).

5. We created image I_i by adding the noise image N_{i+5} to image C_i (i = -4, -3, \ldots, 4). The resulting images are regarded as the second frames of the simulated motion images. We separately created image I_p by adding the remaining noise image N_{10} to C_0. This image is regarded as the first frame common to all the motions. Thus, image I_p and image I_i constitute the first and second frames of a simulated motion image, for which the true flow (u\ v)^\top is (i/4\ 0)^\top (i = -4, -3, \ldots, 4).

4.2 Estimation of the Noise Covariance Matrix

We assume that the off-diagonal elements of V_e are zero, namely, that the x-, y- and t-components of the noise are independent. We also assume that \Delta E_x and \Delta E_y have the same variance, because the same kernel is used for computing both derivatives. It follows that we only need to consider one-dimensional motions. Let

V_e = \begin{pmatrix} s^2 & 0 \\ 0 & t^2 \end{pmatrix},  (26)

and
\bar{u} = (u\ 1)^\top, \quad \nabla\bar{E} = (\bar{E}_x\ \bar{E}_t)^\top.  (27)

We smoothed images I_p and I_i by applying a Gaussian kernel of standard deviation 1 pixel before computing the derivatives. Let S_p and S_i be the resulting images, respectively. We computed

E_{x \alpha i} = \frac{S_i[x+1, y] - S_i[x-1, y] + S_p[x+1, y] - S_p[x-1, y]}{4},  (28)
E_{t \alpha i} = S_i[x, y] - S_p[x, y],  (29)

\bar{u}_i = \frac{i}{4},  (30)

\epsilon_{\alpha i} = E_{x \alpha i} \bar{u}_i + E_{t \alpha i},  (31)

where S_p[x, y] and S_i[x, y] are the intensities of images S_p and S_i at the \alpha-th pixel with coordinates (x, y), respectively. Thus, the data consist of 126,000 triplets \{E_{x \alpha i}, \bar{u}_i, \epsilon_{\alpha i}\}, \alpha = 1, \ldots, 140 \times 100†, i = -4, -3, \ldots, 4. The distribution of E_{x \alpha i} in the data is heavily concentrated at small values. We reduced this concentration by the following procedure. We divided the data into 180 (= 20 \times 9) bins by selecting such triplets that