Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO), Shenzhen, China, December 2013

Velocity Estimator via Fusing Inertial Measurements and Multiple Feature Correspondences from a Single Camera

Guyue Zhou 1, Fangchang Ma 1, Zexiang Li 1 and Tao Wang 2

1: Department of Electronic and Computer Engineering, the Hong Kong University of Science and Technology
2: Hong Kong Dajiang Innovation Technology Co., Ltd.

{zhouguyue, fma, eezxli}@ust.hk, [email protected]

Abstract—In this paper, we present a novel real-time velocity estimation algorithm. A sensor assembly consisting of a monocular camera and an inertial measurement unit with a three-axis accelerometer and gyroscope is considered. To improve the robustness of the velocity estimator with respect to image noise, we apply a coarse-to-fine structure based on multiple feature correspondences over three consecutive frames. The presented algorithm starts with an initial guess obtained by solving a set of linear equations derived from modified epipolar constraints, which has increased accuracy and computational efficiency in comparison to previous work. A highly accurate velocity estimate is then achieved by non-linear minimization of the reprojection errors using the Levenberg-Marquardt algorithm. We implement our approach and present results both in simulation and on real data.

I. INTRODUCTION

The simultaneous localization and mapping (SLAM) problem is one of the most important topics in the field of mobile robotics. It asks whether a robot can determine its location in an unknown environment while at the same time building a map of that environment. Most of the prevailing SLAM techniques in the literature, such as Kalman filter based approaches [1]-[4], particle filters [5]-[7] and smoothing approaches [8]-[10], can successfully recover the poses of the robot. However, in many applications besides robot navigation and exploration, we are interested in the velocity of the robot instead of its absolute positions. For example, information about the velocity of a quadrotor helicopter is more helpful than information about its absolute positions for stabilizing the aircraft and for flight control. Moreover, an accurate real-time measurement of the velocity of a smart phone provides an intuitive and user-friendly interface for many applications. A simple and straightforward solution for estimating the velocity is to derive the poses of the robot first and then divide the distances between adjacent poses by the time difference. This approach, however, does not reflect the fact that the velocity may vary even within a short time interval due to non-trivial accelerations. The result computed by this approach is in fact the average velocity over a short time interval, instead of the true instantaneous velocity. A more sophisticated solution has been proposed in [11]. It provides a closed-form solution to the velocity determination problem by employing a monocular camera and an Inertial Measurement Unit (IMU). The solution is computed with a


single feature observed over three consecutive camera poses and the inertial readings. A more complete analysis of all the physical quantities that can be estimated and computed in closed form is given in [12]. [13] evaluates this approach on real data, and the results show that the algorithm is highly sensitive to image noise. This is because deterministic approaches consider only short time intervals and suffer from wrong assumptions about the noise in the data. Therefore, closed-form solutions similar to [11] should only be used as a coarse estimate. We present a monocular vision-inertial algorithm which applies a coarse-to-fine structure based on multiple feature correspondences. This algorithm computes the instantaneous velocity using epipolar constraints as a coarse initial estimate, and refines the result by minimizing the reprojection errors over three consecutive image frames. This paper makes two main contributions:

• A novel parametrization of both the epipolar geometry problem and the optimization in the graph-based SLAM problem.
• An integration of a variant of the deterministic approach in [11] and the probabilistic approach in [8]-[10] that is both robust to image noise and computationally efficient.

The remainder of the paper is organized as follows: Section 2 introduces the notation and assumptions used in the paper. Section 3 reviews the existing closed-form solution for velocity estimation fusing vision and IMU data and then proposes our improved algorithm, a variant based on the epipolar constraints. Section 4 provides a probabilistic model to refine the estimation result. Section 5 describes the experiments and results, demonstrating the efficiency and robustness of our algorithms.

II. NOTATION AND ASSUMPTIONS

Following mathematical conventions, scalar variables (e.g. $t$, $T_1$, etc.) are set in an italic typeface and vectors (e.g. $\mathbf{v}$, $\mathbf{x}$, etc.) are in bold throughout this paper.

A. Overview

The estimation algorithms discussed in this paper are based on a sensor assembly consisting of a single monocular camera and an inertial sensor equipped with a three-axis accelerometer and gyroscope. Given the

fact that the sampling frequency of inertial sensors is much higher than that of the camera, we can assume that the inertial and visual observations are taken at the time instances $t = kT_1$ and $t = kT_2 = kmT_1$ respectively, where $k = 0, 1, 2, \ldots$, and $m$ is a large positive integer. We define the camera frame as follows: the $z$ axis is along the optical axis and the origin of the camera frame is at the center of the normalized projection plane. The $x$ axis and $y$ axis are parallel to the imaging sensor's columns and rows. $S_i$ and $\bar{S}_i$ represent the camera frames at the $i$-th sampling time of the camera and the inertial sensor respectively, which implies that $S_k$ is equivalent to $\bar{S}_{mk}$ where $k = 0, 1, 2, \ldots$.

Fig. 1. An illustration of camera motions, showing the camera frames $S_0(\bar{S}_0)$, $S_1(\bar{S}_m)$ and $S_2(\bar{S}_{2m})$.

As all the algorithms discussed in this paper are based on 3 image frames, we only consider the first 3 successive camera samplings starting from time 0, without loss of generality. The global frame is tied to the frame $S_0$. An illustration of the camera motions and the position of a feature point is shown in Figure 1.

B. Inertial Readings

Most of the commonly used low-cost IMUs have the following two characteristics: the bias of the gyroscopes is calibrated online by fusing the information from a magnetic compass, and the acceleration of gravity is eliminated based on an accurate attitude estimation [12]. The bias and uncertainty of inertial sensors have also been discussed in detail in [12] for the case where vision information is fused. Therefore, the relative rotations and body accelerations provided by the IMU can be assumed to be reliable in the short term. With pre-calibrated extrinsic parameters between the camera and the inertial sensors, we can assume that the attitude $R_k$ and the acceleration $\mathbf{a}_k$ in the global frame are known at each inertial sampling time $t = kT_1$. We represent the translations among the frames $S_0$, $S_1$ and $S_2$ using the inertial readings and an unknown camera velocity $\mathbf{v}$, which is the camera velocity in the global frame at the time instance $t = T_2$. For instance, the translation from $S_1$ to $S_2$ is denoted as $\mathbf{t}_{12}$, representing the position of the origin of $S_2$ in the frame $S_1$; $R_{12}$ is the relative rotation from $S_1$ to $S_2$. The relationship between $\mathbf{t}_{12}$ and the unknown parameter $\mathbf{v}$ is given by

$$\mathbf{t}_{12} = R_{12}\left(T_2\mathbf{v} - T_1^2\,\mathbf{s}_{12}\right) \tag{1}$$


where $\mathbf{s}_{12}$ is a weighted sum of the acceleration vectors given by the accelerometers. The expression of $\mathbf{s}_{12}$ depends on the specific method used for integrating the accelerations; the rectangular rule is adopted in this paper, as shown below:

$$\mathbf{s}_{12} = \sum_{i=m}^{2m}\left(2m + 1 - i\right)\mathbf{a}_i \tag{2}$$
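To make these relations concrete, the sketch below predicts $\mathbf{t}_{12}$ from a buffer of gravity-free accelerometer samples via (1) and (2). It is a minimal illustration only: the sampling rates, the toy values and the exact summation limits in (2) are our assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of the IMU-based translation prediction in (1)-(2).
# T1, m and the acceleration samples are assumed toy values; the summation
# limits of s12 follow our reading of the rectangular integration rule.
T1 = 0.005                 # inertial sampling period (s), assumed
m = 20                     # camera period T2 = m * T1
T2 = m * T1

rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.5, size=(2 * m + 1, 3))  # gravity-free accelerations a_i, global frame

# Rectangular-rule weighted sum of the accelerations between S1 and S2, cf. (2)
s12 = sum((2 * m + 1 - i) * a[i] for i in range(m, 2 * m + 1))

R12 = np.eye(3)                    # relative rotation from S1 to S2 (identity toy value)
v = np.array([1.0, 0.0, 0.2])      # camera velocity at t = T2 (ground truth in this toy)

# Translation from S1 to S2 predicted from v and the inertial readings, cf. (1)
t12 = R12 @ (T2 * v - T1**2 * s12)
print(t12)
```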

We leave out the exact expressions of $\mathbf{t}_{01}$, $\mathbf{t}_{02}$, $\mathbf{s}_{01}$ and $\mathbf{s}_{02}$, which can be calculated in a similar way.

C. Visual Readings

We assume that the camera intrinsic and distortion parameters have been calibrated in advance. The vector $\mathbf{f}_0$, which represents the displacement from the origin of $S_0$ to a feature point $F$ in the frame $S_0$, forms an image reading. However, the lack of depth information means that $\mathbf{f}_0$ is essentially an up-to-scale vector. Therefore, we define $\mathbf{f}_0$ as a normalized vector with the third element being 1, so that the representation of $\mathbf{f}_0$ is unique. We also assume that a triplet $\mathbf{f}_0$, $\mathbf{f}_1$ and $\mathbf{f}_2$, the correspondences of a feature in different views, is always correct when small noise is ignored. In practice, feature correspondences are usually recovered using local tracking or global matching methods based on sparse feature descriptors. By employing hypothesis-testing methods such as RANSAC [14], incorrect matches are rejected as outliers. However, the inlier correspondences are still subject to noise, such as feature detector and descriptor errors. We therefore make the reasonable assumption that the image points are corrupted by independent and identically distributed Gaussian noise, as feature points are extracted independently from images by the same algorithm [15].
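As an illustration of this normalized feature representation, the following sketch back-projects a pixel onto the plane $z = 1$ and perturbs it with the i.i.d. Gaussian noise assumed above. The intrinsic matrix and the noise level are hypothetical values for the example only.

```python
import numpy as np

# Sketch: form a feature vector f with third element 1 from a pixel
# measurement, assuming a calibrated pinhole camera with intrinsics K.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])   # assumed intrinsic matrix

def normalize(u, v):
    """Back-project pixel (u, v) onto the normalized projection plane (z = 1)."""
    f = np.linalg.solve(K, np.array([u, v, 1.0]))   # K^{-1} [u, v, 1]^T
    return f / f[2]                                 # enforce third element = 1

f0 = normalize(400.0, 260.0)

# Model the i.i.d. Gaussian image noise on the first two elements of f0
sigma = 1.0 / 500.0                                 # ~1 pixel on the normalized plane
noise = np.random.default_rng(1).normal(0.0, sigma, size=2)
f0_noisy = f0 + np.array([noise[0], noise[1], 0.0])
```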

III. DETERMINISTIC SOLUTION

In this section, all feature correspondences are assumed to be inliers, and two deterministic solutions for estimating the camera velocity are described. Both algorithms fuse vision and IMU data and make use of multiple image feature correspondences. We first give a quick review of the extended algorithm from previous work [11], [13], and then propose a variant of this algorithm based on coplanarity constraints, which are modified epipolar constraints.

A. Depth Solution

Consider the data fusion problem between IMU data and a single point feature correspondence in 3 successive camera frames. A deterministic solution, referred to as the depth solution in this paper, has been proposed in [12], [13]:

$$A(\Phi)\begin{bmatrix}\mathbf{v}\\ z\end{bmatrix} = \mathbf{b}(\Phi) \tag{3}$$

where $z$ is the depth of the point feature, while the $4 \times 4$ matrix $A$ and the $4 \times 1$ vector $\mathbf{b}$ are expressed in terms of the parameters contained in the set $\Phi$ representing the sensor readings. Specifically, $\Phi$ consists of $\mathbf{f}_0$, $\mathbf{f}_1$, $\mathbf{f}_2$, $R_{01}$, $R_{12}$, $R_{02}$, $\mathbf{s}_{01}$, $\mathbf{s}_{12}$, $\mathbf{s}_{02}$, $T_1$ and $T_2$. A unique solution for $\mathbf{v}$ exists and can be obtained by solving (3) when $\mathbf{t}_{01}$ is non-zero. In the case of multiple features, the total number of features is denoted by $n$. There is a pair $A_i$ and $\mathbf{b}_i$ for the $i$-th feature. Defining the first three columns of $A_i$ as $C_i$ and the last column of $A_i$ as $\mathbf{d}_i$, (3) can be extended to

$$\begin{bmatrix} C_1 & \mathbf{d}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ C_2 & \mathbf{0} & \mathbf{d}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ C_n & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{d}_n \end{bmatrix} \begin{bmatrix} \mathbf{v} \\ z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \vdots \\ \mathbf{b}_n \end{bmatrix} \tag{4}$$

By defining two new variables, a $4n \times (3 + n)$ matrix $G_d$ and a $4n \times 1$ vector $\mathbf{h}_d$, (4) is further simplified as

$$G_d(\Phi)\begin{bmatrix}\mathbf{v}\\ z_1\\ \vdots\\ z_n\end{bmatrix} = \mathbf{h}_d(\Phi) \tag{5}$$

The velocity $\mathbf{v}$ can therefore be recovered as the first three rows of the vector $(G_d^T G_d)^{-1} G_d^T \mathbf{h}_d$, which is the least-squares solution to (4).
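A minimal sketch of the resulting linear solve follows: per-feature blocks $C_i$, $\mathbf{d}_i$, $\mathbf{b}_i$ are stacked into $G_d$ and $\mathbf{h}_d$ as in (4)-(5), and the velocity is read off the least-squares solution. The blocks here are random stand-ins; in the algorithm they are built from the sensor readings $\Phi$.

```python
import numpy as np

# Sketch of the depth solution (4)-(5): stack per-feature blocks into the
# sparse system G_d [v; z_1..z_n] = h_d and take the least-squares solution.
rng = np.random.default_rng(2)
n = 5                                              # number of features
C = [rng.normal(size=(4, 3)) for _ in range(n)]    # first three columns of A_i
d = [rng.normal(size=4) for _ in range(n)]         # last column of A_i
b = [rng.normal(size=4) for _ in range(n)]         # right-hand sides b_i

Gd = np.zeros((4 * n, 3 + n))
hd = np.zeros(4 * n)
for i in range(n):
    Gd[4 * i:4 * i + 4, :3] = C[i]      # columns multiplying the velocity v
    Gd[4 * i:4 * i + 4, 3 + i] = d[i]   # column multiplying the depth z_i
    hd[4 * i:4 * i + 4] = b[i]

sol, *_ = np.linalg.lstsq(Gd, hd, rcond=None)
v_est, depths = sol[:3], sol[3:]        # the first three rows give the velocity
```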

B. Coplanarity Solution

The above depth solution solves for the vehicle velocity and the point feature depth simultaneously. Therefore, the accuracy of the velocity estimate depends on that of the feature depth calculation. Any image noise leads to a biased depth and, when the depth of the point features dominates in terms of scale, the accuracy of the velocity estimate is significantly reduced. The following coplanarity solution is therefore designed with the motivation of calculating the velocity without estimating the feature depth. Note that the vectors $\mathbf{f}_1$ and $\mathbf{t}_{12}$ in the frame $S_1$ and $\mathbf{f}_2$ in the frame $S_2$ lie in the same geometric plane, which implies

$$(R_{12}\mathbf{f}_2)^T \hat{f}_1\, \mathbf{t}_{12} = 0 \tag{6}$$

where $\hat{f}_1$ stands for the skew-symmetric matrix of the feature vector $\mathbf{f}_1$. By replacing $\mathbf{t}_{12}$ with (1) in equation (6), an equation in the velocity $\mathbf{v}$ is obtained. Similar equations are derived from $\mathbf{t}_{01}$ and $\mathbf{t}_{02}$, and linear equations in $\mathbf{v}$ are constructed as below:

$$\begin{bmatrix} (R_{12}\mathbf{f}_2)^T \hat{f}_1 R_{12} \\ (R_{01}\mathbf{f}_1)^T \hat{f}_0 R_{01} \\ (R_{02}\mathbf{f}_2)^T \hat{f}_0 R_{02} \end{bmatrix} \mathbf{v} = \alpha \begin{bmatrix} (R_{12}\mathbf{f}_2)^T \hat{f}_1 R_{12}\,\mathbf{s}_{12} \\ (R_{01}\mathbf{f}_1)^T \hat{f}_0 R_{01}\,\mathbf{s}_{01} \\ (R_{02}\mathbf{f}_2)^T \hat{f}_0 R_{02}\,\mathbf{s}_{02} \end{bmatrix} \tag{7}$$

where $\alpha = T_1^2/T_2$ is a fixed ratio. Alternatively,

$$A(\Phi)\,\mathbf{v} = \mathbf{b}(\Phi) \tag{8}$$

It is straightforward to extend (8) to the case with $n$ feature correspondences:

$$G_c(\Phi)\,\mathbf{v} = \mathbf{h}_c(\Phi) \tag{9}$$

where

$$G_c(\Phi)^T = \begin{bmatrix} A_1(\Phi)^T & A_2(\Phi)^T & \cdots & A_n(\Phi)^T \end{bmatrix} \tag{10}$$

$$\mathbf{h}_c(\Phi)^T = \begin{bmatrix} \mathbf{b}_1(\Phi)^T & \mathbf{b}_2(\Phi)^T & \cdots & \mathbf{b}_n(\Phi)^T \end{bmatrix} \tag{11}$$

According to the geometric meaning, there must be a solution $\mathbf{v}$ to (8); therefore $\mathbf{v}$ can be uniquely solved from (8) if and only if $A$ is invertible. Before calculating the determinant of $A$, recall the relationship $(\mathbf{a} \times \mathbf{c}) \times (\mathbf{b} \times \mathbf{c}) \sim \mathbf{c}$, where $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{R}^3$ and $\sim$ stands for up-to-scale equality. Then

$$\begin{aligned} \det(A) &= \big[(R_{12}\mathbf{f}_2)^T \hat{f}_1 R_{12}\big] \cdot \big[(R_{01}\mathbf{f}_1)^T \hat{f}_0 R_{01}\big] \times \big[(R_{02}\mathbf{f}_2)^T \hat{f}_0 R_{02}\big] \\ &\sim \big[(R_{12}\mathbf{f}_2)^T \hat{f}_1\big]\,(R_{01}\mathbf{f}_0) \\ &= (R_{12}\mathbf{f}_2) \cdot \mathbf{f}_1 \times (R_{01}\mathbf{f}_0) \end{aligned} \tag{12}$$

Note that $R_{12}\mathbf{f}_2$, $\mathbf{f}_1$ and $R_{01}\mathbf{f}_0$ are observations of the same feature point from different views, rotated into the same frame $S_1$. Therefore a unique solution $\mathbf{v}$ to (8) exists when the feature is not coplanar with the three origins of $S_0$, $S_1$ and $S_2$. In the case of multiple features, $G_c$ is clearly of full column rank if any of the $A_i$ is of full rank. Therefore, the least-squares solution to (9), $\mathbf{v} = (G_c^T G_c)^{-1} G_c^T \mathbf{h}_c$, exists when there is at least one feature that is not coplanar with the three origins of $S_0$, $S_1$ and $S_2$. As desired, this coplanarity solution solves for $\mathbf{v}$ independently, without the need to calculate the feature depth. It also reduces the dimension of the matrix inversion operation to a constant 3, instead of the $n+3$ required by the depth solution in the presence of $n$ feature correspondences. For a detailed comparison of both solutions in accuracy and time efficiency, please refer to Section V.
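The sketch below assembles the stacked coplanarity system (9) from feature triplets and solves the 3-dimensional least-squares problem for $\mathbf{v}$, illustrating that no depth variables appear. The rotations, features and integrated-acceleration terms are toy stand-ins for the quantities obtained from the IMU and the feature tracker.

```python
import numpy as np

def skew(f):
    """Skew-symmetric matrix f_hat with f_hat @ x = f x x (cross product)."""
    return np.array([[0.0, -f[2], f[1]],
                     [f[2], 0.0, -f[0]],
                     [-f[1], f[0], 0.0]])

# Sketch of the coplanarity solution (7)-(9) with n features (toy inputs).
rng = np.random.default_rng(3)
n, T1, T2 = 4, 0.005, 0.1
alpha = T1**2 / T2                      # fixed ratio from (7)

rows, rhs = [], []
for _ in range(n):
    f0, f1, f2 = [np.append(rng.normal(size=2), 1.0) for _ in range(3)]  # normalized features
    R01 = R12 = R02 = np.eye(3)                                          # toy rotations
    s01, s12, s02 = [rng.normal(size=3) for _ in range(3)]               # integrated accelerations
    # One scalar constraint per row of (7)
    for Rf, fhat, R, s in [(R12 @ f2, skew(f1), R12, s12),
                           (R01 @ f1, skew(f0), R01, s01),
                           (R02 @ f2, skew(f0), R02, s02)]:
        rows.append(Rf @ fhat @ R)
        rhs.append(alpha * (Rf @ fhat @ R @ s))

Gc, hc = np.array(rows), np.array(rhs)
v_est, *_ = np.linalg.lstsq(Gc, hc, rcond=None)   # 3-dim solve, independent of any depth
```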

IV. PROBABILISTIC SOLUTION

In the previous section, a coarse velocity estimate can be obtained by minimizing the algebraic error of either (4) or (9). Using this coarse estimate as an initial guess, we can apply a refinement process that minimizes the reprojection errors to achieve higher accuracy.

A. Minimizing Reprojection Error

For each point feature at position $F$ in the global frame, we take an image observation defined as

$$\bar{\mathbf{z}}(\Phi) = \begin{bmatrix} [\mathbf{f}_0]_x & [\mathbf{f}_0]_y & [\mathbf{f}_1]_x & [\mathbf{f}_1]_y & [\mathbf{f}_2]_x & [\mathbf{f}_2]_y \end{bmatrix}^T \tag{13}$$

where $[\mathbf{f}]_x = f_x$, $[\mathbf{f}]_y = f_y$ and $[\mathbf{f}]_z = f_z$ for an arbitrary vector $\mathbf{f} = [f_x\ f_y\ f_z]^T \in \mathbb{R}^3$. Note that the three camera center positions, i.e., $\mathbf{t}_{02}$ in the $S_0$ frame, $\mathbf{t}_{12}$ in the $S_1$ frame and $\mathbf{t}_{22}$ in the $S_2$ frame, can be determined from the camera velocity $\mathbf{v}$ and the sensor readings $\Phi$ using (1). Therefore, a theoretical image projection can be computed as

$$\hat{\mathbf{z}}(F, \mathbf{v}, \Phi) = \begin{bmatrix} u_0 & v_0 & u_1 & v_1 & u_2 & v_2 \end{bmatrix}^T \tag{14}$$

where

$$u_i = \frac{\big[R_{i2}(\Phi)F + \mathbf{t}_{i2}(\mathbf{v},\Phi)\big]_x}{\big[R_{i2}(\Phi)F + \mathbf{t}_{i2}(\mathbf{v},\Phi)\big]_z} \tag{15}$$

$$v_i = \frac{\big[R_{i2}(\Phi)F + \mathbf{t}_{i2}(\mathbf{v},\Phi)\big]_y}{\big[R_{i2}(\Phi)F + \mathbf{t}_{i2}(\mathbf{v},\Phi)\big]_z} \tag{16}$$

for $i = 0, 1, 2$. Comparing the theoretical projection with the observation, we define the reprojection error as $\|\bar{\mathbf{z}} - \hat{\mathbf{z}}\|^2$. With a total number of $n$ features, we construct a $(3n+3) \times 1$ state vector

$$\mathbf{x} = \begin{bmatrix} \mathbf{v}^T & F_1^T & \cdots & F_n^T \end{bmatrix}^T \tag{17}$$
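For one feature, the projection model (14)-(16) and the reprojection error reduce to the short sketch below. The rotations, translations and feature position are hypothetical values; in the algorithm the translations are functions of $\mathbf{v}$ through (1).

```python
import numpy as np

# Sketch of the theoretical projection (14)-(16) and the reprojection error
# for a single feature observed in three views (toy values throughout).
F = np.array([0.3, -0.2, 4.0])                     # feature position

def project(R, t, F):
    """(u_i, v_i) from (15)-(16): perspective division of R F + t."""
    p = R @ F + t
    return np.array([p[0] / p[2], p[1] / p[2]])

views = [(np.eye(3), np.zeros(3)),                 # (R_i2, t_i2) stand-ins, i = 0, 1, 2
         (np.eye(3), np.array([0.05, 0.00, 0.00])),
         (np.eye(3), np.array([0.10, 0.00, 0.01]))]

z_hat = np.concatenate([project(R, t, F) for R, t in views])   # cf. (14)
z_bar = z_hat + np.random.default_rng(4).normal(0.0, 1e-3, 6)  # noisy observation, cf. (13)
reproj_error = np.sum((z_bar - z_hat) ** 2)                    # ||z_bar - z_hat||^2
```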

as well as a $6n \times 1$ observation vector

$$\bar{\mathbf{y}}(\Phi)^T = \begin{bmatrix} \bar{\mathbf{z}}_1^T & \bar{\mathbf{z}}_2^T & \cdots & \bar{\mathbf{z}}_n^T \end{bmatrix} \tag{18}$$

Then a $6n \times 1$ theoretical projection vector $\mathbf{y}(\mathbf{x}, \Phi)^T = \begin{bmatrix} \hat{\mathbf{z}}_1^T & \hat{\mathbf{z}}_2^T & \cdots & \hat{\mathbf{z}}_n^T \end{bmatrix}$ can be derived from $\mathbf{x}$ and $\Phi$, and we want to find the $\mathbf{x}$ that minimizes the total reprojection error $\|\mathbf{y} - \bar{\mathbf{y}}\|^2$. The problem can be expressed as

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}\ \|\mathbf{y}(\mathbf{x}, \Phi) - \bar{\mathbf{y}}(\Phi)\|^2 \tag{19}$$

Solving the nonlinear least-squares problem (19) is equivalent to pursuing the maximum-likelihood (ML) estimate, which maximizes the conditional probability density function $p(\Phi|\mathbf{x})$ when the image errors are Gaussian and independent. If some prior information, say low-frequency and sporadic global positioning system (GPS) signals, is available and its noise can be assumed to be Gaussian as well, we can easily obtain a maximum a posteriori (MAP) estimate based on Bayes' rule:

$$p(\mathbf{x}|\Phi) \propto p(\Phi|\mathbf{x})\,p(\mathbf{x}) \tag{20}$$

leading to another nonlinear least-squares problem:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}\ \|\mathbf{y}(\mathbf{x}, \Phi) - \bar{\mathbf{y}}(\Phi)\|^2 + \|g(\mathbf{x})\|^2 \tag{21}$$

where $g(\mathbf{x})$ is the distance function from the prior Gaussian center. There are various numerical methods for finding the local minimum of a nonlinear least-squares problem given an initial value. In this paper, the well-known Levenberg-Marquardt algorithm [16] is employed, and the initial value is obtained by using the deterministic methods from the previous section together with the midpoint triangulation algorithm [17] for depth estimation.

B. Uncertainty Propagation

Under the previous Gaussian image noise assumption, the observation vector follows $\bar{\mathbf{y}} \sim \mathcal{N}(\mathbf{y}, C_y)$ with $C_y = \lambda\sigma^2 I_{6n \times 6n}$, where $\lambda$ is the scale factor from pixel space to essential space. Our goal is to approximately analyze the uncertainty of the velocity $\mathbf{v}$ using the first-order error propagation method in [18]: let $\mathbf{y} \rightarrow f(\mathbf{y})$ be a $C^1$ continuous function with Jacobian $\frac{\partial f}{\partial \mathbf{y}}$; then we have the error propagation

$$f(\bar{\mathbf{y}}) \sim \mathcal{N}\!\left(f(\mathbf{y}),\ \frac{\partial f}{\partial \mathbf{y}}\, C_y\, \frac{\partial f}{\partial \mathbf{y}}^T\right) \tag{22}$$
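The refinement step can be prototyped directly with an off-the-shelf Levenberg-Marquardt solver, as sketched below for a single feature. The measurement model is a deliberately simplified stand-in for the chain of (1) and (14)-(16); all constants are assumed.

```python
import numpy as np
from scipy.optimize import least_squares

# Sketch of the refinement (19): minimize stacked reprojection residuals over
# x = [v, F] with Levenberg-Marquardt, starting from a coarse initial guess.
T2 = 0.1
c = [np.zeros(3),                                   # known per-view IMU-derived offsets
     np.array([0.02, 0.00, 0.00]),                  # (stand-ins for the s-terms in (1))
     np.array([0.00, 0.03, 0.01])]

def t_of_v(v, i):
    """Camera translation of view i as a linear function of v, mimicking (1)."""
    return i * T2 * v + c[i]

def project(F, t):
    p = F + t                                       # identity rotations in this toy model
    return np.array([p[0] / p[2], p[1] / p[2]])

def residuals(x, z_bar):
    v, F = x[:3], x[3:]
    z_hat = np.concatenate([project(F, t_of_v(v, i)) for i in range(3)])
    return z_hat - z_bar                            # y(x) - y_bar, cf. (19)

v_true, F_true = np.array([1.0, 0.0, 0.2]), np.array([0.3, -0.2, 4.0])
z_bar = residuals(np.concatenate([v_true, F_true]), np.zeros(6))  # noise-free observations
x0 = np.concatenate([v_true + 0.1, F_true + 0.2])                 # coarse initial guess
res = least_squares(residuals, x0, args=(z_bar,), method='lm')    # Levenberg-Marquardt [16]
v_refined = res.x[:3]
```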


We omit the invariant $\Phi$ in (19), so that the reprojection error minimization problem becomes $f(\bar{\mathbf{y}}) = \arg\min_{\mathbf{x}} \|\mathbf{y}(\mathbf{x}) - \bar{\mathbf{y}}\|^2$. As $\mathbf{y}$ is $C^2$ continuous with a full-rank Jacobian $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, Proposition 1 in [18] can be applied:

$$\frac{\partial f}{\partial \mathbf{y}} = \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}^T \frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^{-1} \frac{\partial \mathbf{y}}{\partial \mathbf{x}}^T \tag{23}$$

Therefore $\hat{\mathbf{x}} \sim \mathcal{N}(\mathbf{x}, C_x)$ with

$$C_x = \lambda\sigma^2 \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}^T \frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^{-1} \tag{24}$$

Furthermore, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ can be divided into blocks according to the definitions of $\mathbf{x}$ and $\mathbf{y}$.
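Given the Jacobian $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, the covariance (24) is a one-liner, as the sketch below shows; the Jacobian here is a random full-column-rank stand-in, and the noise-scale values are assumptions.

```python
import numpy as np

# Sketch of the first-order covariance propagation (23)-(24).
rng = np.random.default_rng(5)
n = 4                                       # number of features
J = rng.normal(size=(6 * n, 3 + 3 * n))     # stand-in for dy/dx, shape 6n x (3n+3)
lam, sigma = 1.0 / 500.0**2, 1.0            # assumed pixel-to-normalized scale and pixel noise

Cx = lam * sigma**2 * np.linalg.inv(J.T @ J)   # cf. (24)
Cv = Cx[:3, :3]                             # marginal covariance of the velocity estimate
print(np.sqrt(np.diag(Cv)))                 # 1-sigma uncertainty of each velocity component
```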
