Motion Parameter Estimation from Optical Flow without Nuisance Parameters

Naoya Ohta
Department of Computer Science, Gunma University
Kiryu, Gunma 376-8515, Japan

Abstract

images are the nuisance parameters. To illustrate this distinction, consider the simple problem of line fitting (fig. 1). We have feature points on an image and want to fit a line to them. To formulate this problem as statistical estimation, we assume the existence of a true line with true feature points (black circles in the figure) on it. The observed feature points (white circles) are regarded as true feature points contaminated by random noise (e.g. Gaussian noise). The true line is specified by the parameters h (the distance from the origin) and θ (the angle), which are the structural parameters. To specify the positions of the true feature points, we need one parameter dα for each, and these are the nuisance parameters. Note that if one more feature point (observation) arises, we need one more dα; in contrast, h and θ are independent of the number of feature points. The conventional estimation strategy is to estimate all the parameters h, θ, and {dα} by the maximum likelihood method, making no distinction between the structural and nuisance parameters. In addition to the computer vision problems stated above, bundle adjustment also adopts this strategy. However, according to statistical theory, this strategy yields poor estimation results [1]. More precisely, estimators based on it are, in general, inconsistent and asymptotically inefficient [12]. One way to cope with this problem is to use a method based on semiparametric models [1, 3], where the estimates are computed through so-called estimating functions. Although an application of this approach to computer vision has been reported in [13], it is difficult in general to obtain ready-to-use estimating functions for a wide range of computer vision problems. A more widely applicable approach is to eliminate the nuisance parameters by assuming a probability distribution over them [11].
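To make the parameter bookkeeping concrete, here is a small numeric sketch of the line-fitting example; the variable names and values are illustrative, not from the paper:

```python
import numpy as np

# Structural parameters: h (distance from origin) and theta (angle of the
# line normal). Their count never depends on the number of observations.
rng = np.random.default_rng(0)
h, theta = 2.0, np.radians(30)
n = np.array([np.cos(theta), np.sin(theta)])    # unit normal of the true line
t = np.array([-np.sin(theta), np.cos(theta)])   # unit direction along the line

# Nuisance parameters: one offset d_alpha per true feature point.
N = 5
d = rng.uniform(-3.0, 3.0, N)
true_pts = h * n + d[:, None] * t               # true points, all on the line
obs_pts = true_pts + rng.normal(0.0, 0.1, (N, 2))  # Gaussian observation noise

# Every true point satisfies the line equation n . x = h exactly.
print(np.allclose(true_pts @ n, h))             # True
# One more observation adds one more d_alpha, but h and theta stay put.
```

Doubling the observations doubles the nuisance-parameter count while the structural count stays at two, which is exactly why the maximum likelihood strategy over all parameters runs into trouble.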
This approach resembles Bayesian estimation, but differs in that the nature of the nuisance parameter distribution is itself estimated. The purpose of this paper is to apply this approach to the problem of estimating motion parameters from optical flow, a typical computer vision problem, and to compare the resulting estimation accuracy with that of the conventional estimation method.

Many kinds of computer vision problems can be formalized as statistical estimation problems with nuisance parameters. In the past, such problems have been solved without making any distinction between the nuisance parameters and the structural ones. However, statistical theory suggests that eliminating the nuisance parameters by assuming a probability distribution on them improves the estimation accuracy of the structural parameters. In this paper, we apply this strategy to the problem of estimating motion parameters from optical flow, which is a typical computer vision problem, and compare the estimation accuracy with that obtained by the conventional estimation method.

1. Introduction

Statistical parameter estimation methods have been applied to many computer vision problems in the past [7]. Examples are found in camera motion and object shape computation from either point correspondences in two views [6, 7] or optical flow [2, 8], as well as homography computation between two images [6, 9]. In the first and second examples, the motion of a camera and the 3-D positions corresponding to image features are the parameters to be estimated. In the third example, a homography matrix and the positions of corresponding image points are the statistical parameters.

In statistics, there are two types of parameters: structural¹ and nuisance parameters [12]. The former are usually our main concern, and have the characteristic that their number is independent of the number of observations. The latter, in contrast, are defined as parameters whose number increases as the number of observations increases; we are usually less interested in them. The computer vision problems stated above contain both types of parameters. The components of the camera motion and the homography matrix make up the structural parameters, and the 3-D positions and the corresponding points in the

¹ Structural parameters are also referred to as interest, principal, or model parameters.


and rotation velocity ω = (ω_x ω_y ω_z)^⊤, and Z_α is the Z coordinate of the 3-D point imaged on the image plane at x_α = (x_α y_α)^⊤, then the optical flow u_α = (u_α v_α)^⊤ observed at that point is expressed by the equation

u_\alpha = p_\alpha A_\alpha h + B_\alpha \omega, \quad (1)

where

p_\alpha = \frac{1}{Z_\alpha}, \qquad
A_\alpha = \begin{pmatrix} -f & 0 & x_\alpha \\ 0 & -f & y_\alpha \end{pmatrix}, \quad (2)

B_\alpha = \begin{pmatrix} x_\alpha y_\alpha / f & -(x_\alpha^2/f + f) & y_\alpha \\ y_\alpha^2/f + f & -x_\alpha y_\alpha / f & -x_\alpha \end{pmatrix}. \quad (3)
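As a concrete check of eqs. (1)-(3), the following sketch evaluates the flow at one image point; the numeric test values are mine, not the paper's:

```python
import numpy as np

def flow(x, y, p, h, omega, f):
    """Optical flow of eq. (1): u = p * A @ h + B @ omega."""
    A = np.array([[-f, 0.0, x],
                  [0.0, -f, y]])                       # eq. (2)
    B = np.array([[x * y / f, -(x * x / f + f), y],
                  [y * y / f + f, -x * y / f, -x]])    # eq. (3)
    return p * A @ h + B @ omega

f = 800.0                         # focal length in pixels
h = np.array([0.0, 0.0, 1.0])     # pure forward translation
omega = np.zeros(3)               # no rotation
u = flow(100.0, 50.0, 0.02, h, omega, f)
print(u)                          # [2. 1.] -- flow radiates from the image center
```

For pure forward motion (h along the optical axis, ω = 0), eq. (1) reduces to u = p(x, y)^⊤, i.e. the familiar radial expansion pattern, which the printed value confirms.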

The optical flow is observed at multiple points on the image. The subscript α (= 1, . . . , N) in the above equations is used to distinguish these individual points. We regard the inverse depth p_α as the parameter representing the object shape. Although eq. (1) expresses the optical flow expected from the geometrical relation, the optical flow u*_α actually computed from images contains noise Δu_α, as follows:

Figure 1: Line fitting as a statistical estimation problem.

Issues surrounding structural and nuisance parameters have long been noted in computer vision (see e.g. [4], [5], [15]). For instance, the computation of nuisance parameters was eliminated, under the name of the reduced approach, in [15]. However, the purpose of that elimination was to lower the computation cost, not to increase the estimation accuracy. In other words, the estimator was based on conventional maximum likelihood estimation and was assumed optimal, the elimination being performed for ease of computation by approximating the steps of the estimator. In contrast, we aim here to develop a method capable of giving fundamentally more accurate results, in a statistical sense, than those obtained by conventional maximum likelihood methods. In the following, we formalize the motion parameter estimation problem and describe the conventional estimation method, which includes nuisance parameters, as well as the new one without nuisance parameters. Then, we provide experimental results comparing the accuracies of the two methods.

u^*_\alpha = u_\alpha + \Delta u_\alpha. \quad (4)

We assume that the noise Δu_α has a Gaussian distribution with mean 0 and covariance matrix V_α, and that V_α can be expressed as follows, using the squared noise level ε² and the normalized covariance matrix V_{0α} [7, 8]:

V_\alpha = \epsilon^2 V_{0\alpha}. \quad (5)

In addition, we assume that the normalized covariance matrix V_{0α} is known, and that the squared noise level ε² is unknown but common to all the image points. Our problem is to estimate the motion parameter (h, ω) on the basis of the optical flow {u*_α} (α = 1, . . . , N) observed at N image points. If we are interested in the object shape, we also estimate the parameters {p_α} (α = 1, . . . , N). The motion parameter (h, ω) is the structural parameter, and the inverse depths {p_α} are the nuisance parameters, because each p_α is associated with an individual optical flow vector u_α.

2. Motion Parameter Estimation from Optical Flow

The motion parameter estimation problem from optical flow is formalized as follows. We adopt central projection as the imaging model of the camera, and set the 3-D coordinates (X, Y, Z) so that the Z axis coincides with the optical axis of the camera. The image plane is placed parallel to the X-Y plane, at a distance of f (the focal length) from the origin (the projection center). We assume that f is known (i.e. a calibrated camera). The image coordinates (x, y) are set so that their origin coincides with the point where the optical axis (Z axis) passes through the image plane, and the x and y axes are parallel to the X and Y axes, respectively. If the camera moves with translation velocity h = (h_x h_y h_z)^⊤

3. Conventional Estimation Method
If we do not distinguish between the structural and nuisance parameters, the maximum likelihood estimate is computed as follows. Given the observed optical flow {u*_α} at N image points, the likelihood function is expressed, on the basis of the noise model of eqs. (4) and (5), as

\Pr(\{u^*_\alpha\}) = \prod_{\alpha=1}^{N} \frac{1}{2\pi\epsilon^2 |V_{0\alpha}|^{1/2}} \exp\left( -\frac{1}{2\epsilon^2} (u^*_\alpha - u_\alpha)^\top V_{0\alpha}^{-1} (u^*_\alpha - u_\alpha) \right), \quad (6)

where u_α is as defined in eq. (1). The maximum likelihood estimate is therefore the parameter value that maximizes this quantity. In eq. (6), (h, ω) and p_α appear only within the exponent, so the estimates are computed as the values minimizing

J_{ML} = \sum_{\alpha=1}^{N} (u^*_\alpha - u_\alpha)^\top V_{0\alpha}^{-1} (u^*_\alpha - u_\alpha). \quad (7)

Differentiating J_{ML} with respect to p_α and setting the result to zero, the value of p_α minimizing J_{ML} is

p_\alpha = \frac{h^\top A_\alpha^\top V_{0\alpha}^{-1} (u^*_\alpha - B_\alpha \omega)}{h^\top A_\alpha^\top V_{0\alpha}^{-1} A_\alpha h}. \quad (8)

Figure 2: Computation of the error vector Δh for the translation velocity.

Substituting eq. (8) for p_α in eq. (1), p_α can be eliminated from J_{ML}, and the minimization is carried out in (h, ω) space only. However, because J_{ML} is independent of the length of h, the minimization must be done under the constraint |h| = 1. For this reason, we re-parametrized h by the two angles expressing its direction, and minimized the objective function by the Powell method [14]. The estimate p̂_α of the inverse depth p_α is computed by substituting the estimated motion parameter (ĥ, ω̂) for (h, ω) in eq. (8).
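A minimal sketch of this conventional method, under the simplifying assumption V_{0α} = I (the helper names are mine): p_α is eliminated in closed form via eq. (8), and J_ML of eq. (7) is evaluated on the remaining (h, ω) space with h given by its two direction angles.

```python
import numpy as np

def A_B(x, y, f):
    A = np.array([[-f, 0.0, x], [0.0, -f, y]])
    B = np.array([[x * y / f, -(x * x / f + f), y],
                  [y * y / f + f, -x * y / f, -x]])
    return A, B

def J_ML(angles, omega, pts, flows, f):
    th, ph = angles                                  # h re-parametrized on |h| = 1
    h = np.array([np.sin(th) * np.cos(ph),
                  np.sin(th) * np.sin(ph),
                  np.cos(th)])
    J = 0.0
    for (x, y), u_obs in zip(pts, flows):
        A, B = A_B(x, y, f)
        Ah = A @ h
        r = u_obs - B @ omega
        p = (Ah @ r) / (Ah @ Ah)                     # eq. (8) with V_0alpha = I
        res = r - p * Ah                             # u* - u at the optimal p_alpha
        J += res @ res                               # one term of eq. (7)
    return J

# On noiseless flows generated from the true parameters, J_ML vanishes.
f = 800.0
th, ph = 0.3, 0.7
h_true = np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])
omega = np.array([0.001, -0.002, 0.0005])
pts = [(100.0, 50.0), (-200.0, 120.0), (30.0, -80.0)]
flows = []
for (x, y), p in zip(pts, [0.02, 0.01, 0.05]):
    A, B = A_B(x, y, f)
    flows.append(p * A @ h_true + B @ omega)
print(J_ML((th, ph), omega, pts, flows, f))          # ~0 on noiseless data
```

The paper minimizes this objective by the Powell method; for instance, scipy.optimize.minimize with method='Powell' could play that role in a reimplementation.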

4. Estimation without Nuisance Parameters

Figure 3: Optical flow generated under the standard condition.

Here, we regard the parameter p_α as a stochastic variable, and eliminate it from the parameters to be estimated. If we adopt a Gaussian distribution with mean m_p and variance σ_p² as the distribution of p_α, the true optical flow u_α also has a Gaussian distribution, with mean E[u_α] and covariance matrix V[u_α]:

E[u_\alpha] = m_p A_\alpha h + B_\alpha \omega, \quad (9)
V[u_\alpha] = \sigma_p^2 A_\alpha h h^\top A_\alpha^\top. \quad (10)

Note that V[u_α] is singular. If the noise in the optical flow is assumed to be Gaussian as expressed by eqs. (4) and (5), as in the conventional estimation, the observed optical flow u*_α also has a Gaussian distribution. Its mean E[u*_α] and covariance matrix V[u*_α] are given by

E[u^*_\alpha] = m_p A_\alpha h + B_\alpha \omega, \quad (11)
V[u^*_\alpha] = \sigma_p^2 A_\alpha h h^\top A_\alpha^\top + \epsilon^2 V_{0\alpha}. \quad (12)

Therefore, the likelihood function is

\Pr(\{u^*_\alpha\}) = \prod_{\alpha=1}^{N} \frac{1}{2\pi |V[u^*_\alpha]|^{1/2}} \exp\left( -\frac{1}{2} (u^*_\alpha - E[u^*_\alpha])^\top V[u^*_\alpha]^{-1} (u^*_\alpha - E[u^*_\alpha]) \right). \quad (13)

Taking the logarithm of the above equation and ignoring the constant, the quantity to be minimized is

J_{NN} = \sum_{\alpha=1}^{N} \left( \log |V[u^*_\alpha]| + (u^*_\alpha - E[u^*_\alpha])^\top V[u^*_\alpha]^{-1} (u^*_\alpha - E[u^*_\alpha]) \right). \quad (14)

Estimation is now carried out by minimizing this quantity with respect to (h, ω, m_p, σ_p², ε²). In the forthcoming experiments, h was re-parametrized by the two angles of its direction, and J_{NN} was minimized by the Powell method [14], as in the conventional estimation. In this estimation, we eliminated the parameter p_α to increase the accuracy of the motion parameter (ĥ, ω̂). However, if we are interested in the object shape, we can compute the estimate p̂_α of p_α by substituting the estimated value (ĥ, ω̂) for (h, ω) in eq. (8).
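The objective of eq. (14) can be sketched as follows; V_{0α} = I is my simplifying assumption, and the test values are invented:

```python
import numpy as np

def J_NN(h, omega, m_p, sigma_p2, eps2, pts, flows, f):
    """Negative log-likelihood of eq. (14), assuming V_0alpha = I."""
    J = 0.0
    for (x, y), u_obs in zip(pts, flows):
        A = np.array([[-f, 0.0, x], [0.0, -f, y]])
        B = np.array([[x * y / f, -(x * x / f + f), y],
                      [y * y / f + f, -x * y / f, -x]])
        Ah = A @ h
        mean = m_p * Ah + B @ omega                          # eq. (11)
        V = sigma_p2 * np.outer(Ah, Ah) + eps2 * np.eye(2)   # eq. (12)
        d = u_obs - mean
        J += np.log(np.linalg.det(V)) + d @ np.linalg.solve(V, d)
    return J

# Changing only m_p leaves V fixed, so the true mean gives the smaller J_NN.
f, m_p, sigma_p2, eps2 = 800.0, 0.02, 0.008 ** 2, 1.0
h = np.array([0.0, 0.0, 1.0])
omega = np.zeros(3)
pts = [(100.0, 50.0), (-200.0, 120.0)]
flows = [m_p * np.array([x, y]) for x, y in pts]   # flows exactly at the mean
assert J_NN(h, omega, m_p, sigma_p2, eps2, pts, flows, f) < \
       J_NN(h, omega, 0.05, sigma_p2, eps2, pts, flows, f)
```

Unlike J_ML, this objective is minimized jointly over (h, ω, m_p, σ_p², ε²), so the per-point inverse depths never enter the search space.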

5. Experiments


We conducted experiments to compare the accuracies of the motion parameters estimated by the method without nuisance parameters and by the conventional one.

The sequence of the experiments was as follows. First, we fixed the true motion parameter (h̄, ω̄) and the true shape parameters {p̄_α}, and generated the true optical flow. Then, we added isotropic noise (mainly Gaussian) to each of the optical flow vectors, and estimated the motion parameters (ĥ, ω̂) by the two methods. We repeated this estimation 500 times with different random-number seeds for the noise, and computed the average error of the estimated motion parameters. This evaluation was performed for different values of the true parameters and noise levels.

The average error of the motion parameters was measured as follows. For each estimation, we computed the difference vectors Δh and Δω between the true and estimated parameters. Then, we calculated the second moment matrices E[Δh Δh^⊤] and E[Δω Δω^⊤] over the 500 trials, and used the traces of these matrices as the quantitative error measures err_h and err_ω:

\mathrm{err}_h = \sqrt{\mathrm{tr}(E[\Delta h\, \Delta h^\top])}, \quad (15)
\mathrm{err}_\omega = \sqrt{\mathrm{tr}(E[\Delta \omega\, \Delta \omega^\top])}. \quad (16)

Note that, since the translation velocity h is constrained to the unit sphere (|h| = 1), the error vector Δh must be measured on the tangent space at the true value h̄ (fig. 2). The projection is carried out as

\Delta h = \frac{\hat{h}}{\bar{h}^\top \hat{h}} - \bar{h}. \quad (17)

We assumed that the motion parameter (h, ω) was our main concern rather than the object shape {p_α}. However, if the motion parameter is estimated more accurately, the object shape computed by eq. (8) should also be more accurate. Therefore, we evaluated the accuracy of the reconstructed shape using the quantity

\mathrm{err}_p = E\!\left[ \frac{1}{N} \sum_{\alpha=1}^{N} \frac{|\hat{p}_\alpha - \bar{p}_\alpha|}{\sqrt{V_0[\bar{p}_\alpha]}} \right]. \quad (18)

This is the average error of p̂_α with weights 1/√V_0[p̄_α], where V_0[p̄_α] is an approximated normalized variance of p̂_α, computed as

V_0[\bar{p}_\alpha] = \frac{1}{\bar{h}^\top A_\alpha^\top V_{0\alpha}^{-1} A_\alpha \bar{h}}. \quad (19)

While it is desirable to evaluate the two methods under exhaustive experimental conditions, doing so makes it difficult to report the results in a clear and systematic way, since many parameters specify a condition. Therefore, we first defined a standard condition, and then varied each parameter of the experimental condition independently. As the standard experimental condition, we set the motion parameters to

\bar{h} = (\sin 15^\circ \cos 60^\circ,\ \sin 15^\circ \sin 60^\circ,\ \cos 15^\circ)^\top, \quad (20)
\bar{\omega} = r\,(\sin 60^\circ \cos 45^\circ,\ \sin 60^\circ \sin 45^\circ,\ \cos 60^\circ)^\top, \quad (21)
r = (2\pi/360) \times 0.2. \quad (22)

Figure 4: Accuracy of parameters under the standard condition ((a) err_h, (b) err_ω, (c) err_p, (d) E[m̂_p], (e) E[σ̂_p], (f) E[ε̂]).
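The error measures of eqs. (15)-(17) can be sketched as follows; the trial vectors below are made up for illustration:

```python
import numpy as np

def delta_h(h_hat, h_bar):
    """Eq. (17): project the estimate onto the tangent plane at the true h."""
    return h_hat / (h_bar @ h_hat) - h_bar

def err(deltas):
    """Eqs. (15)/(16): sqrt of the trace of the second moment matrix."""
    M = np.mean([np.outer(d, d) for d in deltas], axis=0)
    return np.sqrt(np.trace(M))

h_bar = np.array([0.0, 0.0, 1.0])
h_hat = np.array([0.01, -0.02, 1.0])
h_hat /= np.linalg.norm(h_hat)          # estimates live on the unit sphere
d = delta_h(h_hat, h_bar)
print(h_bar @ d)                        # 0.0 -- the error is purely tangential
```

Scaling ĥ so that its component along h̄ equals one and then subtracting h̄ leaves only the component orthogonal to h̄, which is exactly the tangent-space construction of fig. 2.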



Figure 5: Estimation accuracy under various conditions ((a) shape, (b) motion, (c) flow number, (d) noise; top: err_h, bottom: err_ω; white: conventional, shaded: new).

We assumed a Gaussian distribution for the true object shape parameter p̄_α, characterized by mean m̄_p = 0.02 and variance σ̄_p² = 0.008². The size of the image was assumed to be 640×480 pixels, and the optical flow was computed at 2640 (60×44) points on it. The focal length of the camera was set to 800 pixels. The optical flow generated by these parameters is shown in fig. 3. The noise Δu_α added to the true optical flow was isotropic Gaussian noise.

The estimation results are shown in fig. 4. The horizontal axes of all the graphs show the noise level ε in pixel units. The solid and dashed lines correspond, respectively, to the conventional (ML) and new (NN) methods. The estimation accuracy of the new method is superior at all noise levels (graphs (a) and (b)); indeed, its err_h and err_ω values are less than 1/3 of those of the conventional method. The accuracy of the reconstructed shape (err_p) is shown in graph (c). Although there is not a large difference between the two methods, a slight improvement is observed. The dotted line (TR) shows the values computed using the true motion parameter (h̄, ω̄), which can be regarded as a tentative lower bound; the line of the new method overlaps it. Graphs (d) and (e) show the average values of the estimated mean m̂_p and standard deviation σ̂_p of the nuisance parameter p_α, which are available only for the new method. As their true values are 0.02 and 0.008, respectively, they are estimated quite accurately. Graph (f) shows the estimate of the noise level ε. While the new method estimates it accurately, the conventional method yields a clearly lower estimate. This is an example of the fact that the straightforward application of the maximum likelihood method to this type of problem is not consistent [12], as stated in section 1. A discussion related to this subject is also found in [10].

In the above experiments, we assumed a Gaussian distribution for the true shape parameter p̄_α, which is the most favorable condition for the new method. We therefore examined three other distributions: a uniform distribution between 0.004 and 0.036 (Uniform-p), a uniform distribution in Z_α = 1/p_α between 30 and 250 (Uniform-Z), and four values 30, 50, 70, and 90 occurring evenly for Z_α (4-value). The results are shown in fig. 5(a). The upper graph is for err_h (translation) and the lower for err_ω (rotation). The white and shaded bars correspond, respectively, to the conventional and new methods. Only the values at noise level ε = 2 pixels are shown, since the general variation with respect to the noise level is the same as under the standard condition shown in fig. 4. The label Gauss indicates the values under the standard condition. These graphs show that the new method improves the accuracy even if the shape distribution is not Gaussian, though the degree of improvement is largest for the Gaussian distribution.

To examine the dependency on the motion parameter, we used three settings: h̄ = (0, 0, 1)^⊤, ω̄ = (0, 0, 0)^⊤ (A); h̄ = (sin45°cos60°, sin45°sin60°, cos45°)^⊤, ω̄ = (0, 0, 0)^⊤ (B); and r = (2π/360)×0.3 with eqs. (20) and (21) (C). The results are shown in fig. 5(b), drawn in the same way as in fig. 5(a); the label ref corresponds to the standard condition. The degree of improvement becomes smaller as the direction of the translation velocity h̄ approaches a line parallel to the image plane (B), or as the rotation velocity ω̄ becomes faster (C). However, an improvement in accuracy is observed for all the motion parameters.

The graphs in fig. 5(c) show the dependency on the number of optical flow vectors, namely the number of nuisance parameters. The number is 2640 under the standard condition, and we also examined 1/4 (660) and 1/16 (165) of it. The average errors of both methods increase as the number decreases.
In addition, the degree of improvement by the new method is smaller for smaller numbers of optical flow vectors. This is reasonable, because the effect of eliminating the nuisance parameters is significant when the number of nuisance parameters is large.

Next, we investigated the dependency on the distribution of the noise Δu_α in the optical flow. We examined two distributions other than Gaussian: a uniform distribution in the region |Δu_α| ≤ 2 (Uniform), and a discrete distribution generated by rounding Gaussian noise with a standard deviation of 2 pixels to the nearest integers (Discrete). The graphs in fig. 5(d) show that the estimation accuracy is almost independent of the distribution of the flow noise for both methods.

We have reported results in which each factor defining the experimental condition was varied independently; this was done only for conciseness. We in fact conducted the experiment under many other conditions, and observed that in almost every case the new method improved the accuracy. One exception arose when each of the four values for Z_α was assigned to one quadrant of the image. In that case, unstable behavior of the new method was observed. However, we have not yet investigated this phenomenon sufficiently to report on it, and plan to do so in a succeeding paper.

6. Conclusion

In this paper, we developed a method of computing camera motion from optical flow using a statistical approach without nuisance parameters, and compared its accuracy with that of the conventional method, which retains the nuisance parameters. The experiments showed that the new method mostly gave more accurate estimates; in the best cases, its estimation error was about 1/5 of that of the conventional method. However, unstable behavior of the new method was observed for one special arrangement of depths; details are left to a succeeding report. Nevertheless, the experimental results suggest that methods for other computer vision problems could also be improved by formulating them as statistical estimation without nuisance parameters.

Acknowledgments

The author thanks Prof. Kenichi Kanatani of Okayama University, Prof. Michael J. Brooks of the University of Adelaide, Prof. Yoichi Seki of Gunma University, and Dr. Takayuki Okatani of Tohoku University, who discussed technical issues with the author and gave helpful suggestions.

References

[1] S. Amari and M. Kawanabe, Estimation of Linear Relation: Is the Least Square Estimator Optimal?, Ōyō Sūri, Vol. 6, No. 2, pp. 86–109, June 1996. (in Japanese)
[2] M.J. Brooks, W. Chojnacki and L. Baumela, Determining the Egomotion of an Uncalibrated Camera from Instantaneous Optical Flow, Journal of the Optical Society of America A, Vol. 14, No. 10, pp. 2670–2677, Oct. 1997.
[3] P.J. Bickel, C.A.J. Klaassen, Y. Ritov and J.A. Wellner, Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins University Press, London, 1993.
[4] W. Chojnacki, M.J. Brooks, A. van den Hengel and D. Gawley, On the Fitting of Surfaces to Data with Covariances, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11, pp. 1294–1303, Nov. 2000.
[5] W. Förstner, Direct Optimal Estimation of Geometric Entities Using Algebraic Projective Geometry, Festschrift anläßlich des 60. Geburtstages von Prof. Dr.-Ing. Bernhard Wrobel, Technische Universität Darmstadt, pp. 69–88, 2001.
[6] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, Cambridge, 2000.
[7] K. Kanatani, Statistical Optimization for Geometric Computation: Theory and Practice, Elsevier, Amsterdam, 1996.
[8] K. Kanatani, Y. Shimizu, N. Ohta, M.J. Brooks, W. Chojnacki and A. van den Hengel, Fundamental Matrix from Optical Flow: Optimal Computation and Reliability Evaluation, Journal of Electronic Imaging, Vol. 9, No. 2, pp. 194–202, Apr. 2000.
[9] K. Kanatani, N. Ohta and Y. Kanazawa, Optimal Homography Computation with a Reliability Measure, IEICE Transactions on Information and Systems, Vol. E83-D, No. 7, pp. 1369–1374, June 2000.
[10] K. Kanatani, Model Selection for Geometric Inference, Proceedings of the 5th Asian Conference on Computer Vision (ACCV 2002), Melbourne, Australia, Vol. 1, pp. xxi–xxxii, Jan. 2002.
[11] J. Nagano and T. Han, Construction of MDL Criteria in the Presence of Nuisance Parameters and Its Application to Geometric Model Selection Problems, IEICE Transactions A, Vol. J83-A, No. 1, pp. 83–95, Jan. 2000. (in Japanese)
[12] J. Neyman and E.L. Scott, Consistent Estimates Based on Partially Consistent Observations, Econometrica, Vol. 16, No. 1, pp. 1–32, 1948.
[13] T. Okatani and K. Deguchi, On Bound of Estimation Accuracy of the Structure from Motion Problem, Proceedings of the Meeting on Image Recognition and Understanding, Vol. II, pp. 335–340, Aug. 2002.
[14] W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, Numerical Recipes in C, Cambridge University Press, Cambridge, 1988.
[15] B. Triggs, Optimal Estimation of Matching Constraints, Proceedings of the European Workshop on 3D Structure from Multiple Images of Large-scale Environments, June 1998.