ISSN 2075-1087, Gyroscopy and Navigation, 2013, Vol. 4, No. 1, pp. 1–13. © Pleiades Publishing, Ltd., 2013.
Robust Vision-Aided Inertial Navigation Algorithm via Entropy-Like Relative Pose Estimation

F. Di Corato, M. Innocenti, and L. Pollini

Department of Energy and Systems Engineering, University of Pisa, Largo Lucio Lazzarino 1, 56122 Pisa, Italy

E-mails: [email protected], [email protected], [email protected]

Received May 31, 2012
Abstract: The paper presents a loosely coupled approach for the improvement of state estimation in autonomous inertial navigation, augmented via image-based relative motion estimation. The proposed approach uses a novel pose estimation technique based on the minimization of an entropy-like cost function, which is robust by nature to the presence of noise and outliers in the visual features. Experimental evidence of the performance of this approach is given and compared to a state-of-the-art algorithm. An indirect Kalman filter is used for navigation in the framework of stochastic cloning. The robust relative pose estimate given by our novel technique is used to feed a relative position fix to the navigation filter. The simulation results are presented and compared with the results obtained via the classical RANSAC-based direct linear transform approach.

DOI: 10.1134/S2075108713010033
1. INTRODUCTION
Inertial navigation suffers from drifts due to several factors, in particular, inertial sensor errors. Usually aiding sensors, like GPS, barometers or laser-doppler speedometers, are used to provide corrections to the navigation system. A viable alternative to these assistance systems is a vision system that estimates the motion of the camera, and thus of the body, given a stream of successive images. The use of vision for navigation is often referred to as visual odometry. A core problem in the field of visual odometry/visual navigation in an unknown environment is the estimation of the pose of the vision system with respect to the observed scene. Pose estimation is the concluding step in a sequence of different phases: 1) detection of significant features in the scene from camera images; 2) tracking of the features between successive frames; 3) pose estimation. The existing techniques in the field of camera pose estimation differ mainly in the methods adopted for the three steps listed above. The most challenging issue in solving the pose estimation problem is the presence of noise and outliers in the data. The latter are due mainly to inaccurate keypoint matching and/or tracking between the left and right images, in the stereo vision case, and between successive time instants. The outlier rejection problem is often solved via minimization techniques (L2, L∞, etc.) [1] or via iterative refinements [2, 3, 4], that is, via image pre/post-processing techniques. The well-known robust approaches to estimating camera poses are RANSAC-type algorithms [2, 3], which have no guarantee of optimality. Almost all the schemes proposed in the literature perform outlier rejection as a pre/post-processing phase, and most of them do so by minimizing a squared norm of the estimation error.

Recently, integrated vision-inertial navigation systems have appeared; the most recent works in the field of vision-aided inertial navigation differ mainly in the coupling approach adopted between the vision and inertial navigation systems. Papers [5] and [6] use a tightly coupled approach, that is, each collected keypoint is added to the navigation filter state and cooperates in the estimation phase. In contrast, papers [7] and [8], in particular, use a loosely coupled approach, that is, the navigation filter is provided with relative position fixes only. In [7], the stochastic cloning approach is introduced and the relative pose estimation is computed via a classical least-squares minimization. Paper [8] uses a similar approach as in [7], while the filter is provided with relative pose measurements which are obtained via a robust 2-norm minimization approach using the Huber cost function [9] in a framework of M-estimation. The work in [10] uses a stereo vision system as the main navigation sensor, while the processed IMU measurements are used to feed attitude corrections to the EKF.
Here we propose a loosely coupled approach which uses a stereo vision system and an inertial measurement unit. The relative pose estimates given by the vision system are computed using an entropy-like cost function, which is robust by nature with respect to the outliers in the data. The estimated pose is then used to provide relative position fixes to the indirect Kalman navigation filter in the framework of stochastic cloning, as discussed in [7]. The main contribution of the paper is to show that the adoption of the proposed robust pose estimation algorithm, which is robust to a large class of disturbances, provides a net improvement in navigation accuracy, and that there is still room for further improvements that better exploit the peculiar characteristics of the proposed pose estimation algorithm.

2. MATHEMATICAL BACKGROUND

This section presents the mathematical background and the nomenclature used throughout the paper for both inertial navigation and vision-based pose estimation. For the purposes of this paper, we employed a combination of strapdown inertial mechanization and an indirect or, as it is often called, error-state Kalman filter. We assumed the use of an inertial measurement unit (IMU) with fiber-optic gyroscopes and MEMS accelerometers; the accuracy of this type of IMU is usually sufficient for attitude estimation but not for navigation [10, 15]: large position estimation drifts arise due to accelerometer inaccuracies and time-varying biases. These IMUs are usually complemented by GPS or other external sensors for providing position error corrections and estimation of accelerometer biases. This paper proposes instead the use of a vision system as a replacement for GPS, or simply to be used when GPS is not available, which follows the same rationale: attitude estimation is performed using inertial measurements only, while velocity and position estimation is aided by a vision system.

2.1. Inertial Navigation

Upon the considerations above, the inertial mechanization state variables are the direction cosine matrix $C_b^n$, which rotates from body (b) to navigation (n) frames, the velocity vector $V^n = [V_{North}\ V_{East}\ V_{Down}]^T$ (expressed in the navigation frame), and the position vector $r^e = [\varphi\ \lambda\ h]^T$ composed of the latitude, longitude, and altitude of the navigation frame with respect to an earth-fixed frame (e). Any navigation and earth-fixed frame can be used; the results presented in this paper were achieved using the NED and ECEF reference frames. Following these assumptions, the mechanization equations can be divided into two parts, one for attitude, and the other for position and velocity:

$$\begin{cases} \dot{C}_b^n = C_b^n \left[ \left( \omega_{ib}^b - C_n^b \omega_{in}^n \right) \wedge \right] \\ \dot{x} = f\left( x, C_b^n, f_{ib}^b \right), \end{cases} \quad (1)$$

where $x = [V^n, r^e]$, $f_{ib}^b$ and $\omega_{ib}^b$ are the strapdown accelerometer and gyroscope measurements, and $\omega_{in}^n$ is the transport rate. Full derivation of the above equations can be found in several textbooks and is omitted here (see, for instance, [11]).

As anticipated, this paper uses the error-state or indirect Kalman filter (IKF) approach, and we have already identified the accelerometers as the major source of errors for the class of IMUs considered in this research. According to the procedure described in [11], the following error variables were first defined: the velocity error $\delta V^n$, the position error $\delta r^e$, and the accelerometer error (or bias) $\delta f_{ib}^b$, which establish the relationship between actual values (variables without a hat) and estimated values (with a hat) or measured values (with a tilde):

$$\begin{cases} \hat{V}^n = V^n + \delta V^n \\ \hat{r}^e = r^e + \delta r^e \\ \tilde{f}_{ib}^b = f_{ib}^b + \delta f_{ib}^b + \nu_f, \end{cases} \quad (2)$$

where $\nu_f$ is a zero-mean Gaussian process with variance $E[\nu_f \nu_f^T] = Q_f$ that represents accelerometer noise. The error variable dynamics was then formulated as:

$$\delta\dot{x}(t) = \begin{bmatrix} \delta\dot{V}^n \\ \delta\dot{r}^e \\ \delta\dot{f}_{ib}^b \end{bmatrix} = \begin{bmatrix} F_{VV} & F_{Vr} & C_b^n \\ F_{rV} & F_{rr} & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \delta V^n \\ \delta r^e \\ \delta f_{ib}^b \end{bmatrix} + \begin{bmatrix} C_b^n & 0 \\ 0 & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} \nu_f \\ \nu_{rf} \end{bmatrix} = F(t)\,\delta x(t) + G(t)\,u_c(t), \quad (3)$$

where the time-varying components of matrix $F$ are the Jacobians of the error dynamics, and the accelerometer bias dynamics was modeled as Brownian motion, with dummy dynamics $\delta\dot{f}_{ib}^b = \nu_{rf}$, $E[\nu_{rf}\nu_{rf}^T] = Q_{rf}$.

Equation (3) represents the linear parameter varying (LPV) approximation of the actual nonlinear error dynamics; this LPV formulation is well suited for use inside an extended Kalman filter. In the remainder, for the purpose of discrete-time Kalman filtering, we will consider a time-discretized version of the above dynamics:

$$\delta x_{k+1} = \Psi_k \delta x_k + \Gamma_k u_{c,k}. \quad (4)$$

Several approaches could be used to obtain the system matrices in Equation (4); the results of this paper were achieved using the Euler integration method with sample time $\Delta T$, yielding $\Psi_k = I + F(t_k)\Delta T$ and $\Gamma_k = G(t_k)\Delta T$.
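As an illustration, a minimal Python sketch of the Euler discretization of Equations (3) and (4) follows; the Jacobian blocks and the attitude DCM are placeholders, since their full expressions are omitted here (see [11]).

```python
import numpy as np

def discretize_error_dynamics(F, G, dT):
    """Euler discretization of the LPV error dynamics (3)-(4):
    Psi_k = I + F(t_k) dT,  Gamma_k = G(t_k) dT."""
    Psi = np.eye(F.shape[0]) + F * dT
    Gamma = G * dT
    return Psi, Gamma

# Block structure of Eq. (3): 9 error states
# (3 velocity, 3 position, 3 accelerometer bias), 6 noise inputs.
Z = np.zeros((3, 3))
F_VV = F_Vr = F_rV = F_rr = Z          # placeholder Jacobian blocks
C_bn = np.eye(3)                       # placeholder body-to-navigation DCM
F = np.block([[F_VV, F_Vr, C_bn],
              [F_rV, F_rr, Z],
              [Z,    Z,    Z]])
G = np.block([[C_bn, Z],
              [Z,    Z],
              [Z,    np.eye(3)]])
Psi_k, Gamma_k = discretize_error_dynamics(F, G, dT=0.01)
```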
Fig. 1. Example of a possible matching ambiguity connected with the use of covariant feature detectors (e.g., SIFT).
2.2. Pose Estimation Using Stereo Vision

The vision-aided inertial navigation problem, with relative measurements, requires the estimation of camera motion between successive frames. In a stereo vision system, each couple of left and right images produces a cloud of N 3D keypoints $\{P_{i,k}\}$ obtained by detection, matching and triangulation of two corresponding sets (one for the left and one for the right image) of N 2D features $\{p_{i,k}\}$. Tracking of the 2D features at two successive time instants $t_k$ and $t_{k+h}$ produces two clouds of 3D keypoints $P_{i,k}$ and $P_{i,k+h}$ that are related by a rigid motion relationship. This relationship represents, essentially, the camera motion, translation $T_k^{k+h}$ and rotation $C_k^{k+h}$, between $t_k$ and $t_{k+h}$. Thus, the following relationship holds:

$$P_{i,k+h} = C_k^{k+h} P_{i,k} + T_k^{k+h} = g_k^{k+h} P_{i,k}, \quad (5)$$

where $g_k^{k+h} = \left\{ C_k^{k+h}, T_k^{k+h} \right\} \in SE(3)$ is the transformation mapping the pose of the camera at time $t_k$ to the pose of the camera at time $t_{k+h}$. The notation $g_k^{k+h} P_{i,k}$ denotes the application of the rotation and translation transformations $C_k^{k+h}, T_k^{k+h}$ to point $P_{i,k}$. Given any three-dimensional parameterization of the rotation matrix, the transformation matrix in Equation (5) can be written as $g_k^{k+h}(\theta_p)$, $\theta_p \in \Theta_p \subset \mathbb{R}^6$, where $\Theta_p$ is the set of all possible motion parameters (angular displacements and translations). The pose estimation problem becomes the estimation of the unknown motion parameter vector $\theta_p$, given the two clouds of N features at times $t_k$ and $t_{k+h}$. The solution of the problem can be found by using a minimization approach over the estimation residuals:

$$E_{i,k+h} = P_{i,k+h} - g_k^{k+h}(\theta_p) P_{i,k}, \quad (6)$$

that is,

$$\hat{\theta}_p = \arg\min_{\theta_p} \sum_{i=1}^{N} \mathcal{L}\left\{ P_{i,k+h} - g_k^{k+h}(\theta_p) P_{i,k} \right\}, \quad (7)$$

where $\mathcal{L}\{\cdot\}$ is a suitable cost function of the pose estimation residual (6). Due to triangulation and calibration errors, we need at least $N > 3$ non-aligned points, tracked along the camera motion.

The feature selection and tracking approach used in the outdoor experiments of this article uses stereo vision and the scale invariant feature transform (SIFT) algorithm [12], as is conventional for this application. The algorithm starts by acquiring a stereo image pair and relies on feature disparity to obtain 3D points by triangulation. The SIFT algorithm is applied to detect, select and match features from the left and right images. The stereo matching of the keypoints between the left and right images is performed by comparing the squared distance between the descriptors of each keypoint in the two images and selecting the pair with the smallest distance. Only the keypoints that are present in both the left and right images are used for triangulation and tracking. With the same distance-based approach it is possible to track the keypoints that are present in the current and past images; this makes the selection of keypoints $P_{i,k}$ and $P_{i,k+h}$ possible.
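A minimal sketch of the residual construction of Equations (6) and (7) in Python; the rotation parameterization is a placeholder, and the least-squares cost shown here is the classical choice of $\mathcal{L}\{\cdot\}$ that Section 3 replaces with the entropy-like function.

```python
import numpy as np

def rigid_residuals(theta_p, P_k, P_kh, rotation_from_params):
    """3D alignment residuals of Eq. (6):
    E_i = P_{i,k+h} - g(theta_p) P_{i,k}.
    theta_p stacks 3 rotation parameters and 3 translation components;
    P_k, P_kh are (N, 3) clouds of triangulated keypoints;
    rotation_from_params is a placeholder for any fair 3-parameter
    rotation map (Section 3.2 discusses the choice)."""
    C = rotation_from_params(theta_p[:3])
    T = theta_p[3:]
    return P_kh - (P_k @ C.T + T)

def least_squares_cost(theta_p, P_k, P_kh, rotation_from_params):
    # Classical choice L{.} = ||.||^2 in Eq. (7); Section 3 replaces
    # this sum of squares with the entropy-like cost.
    E = rigid_residuals(theta_p, P_k, P_kh, rotation_from_params)
    return float(np.sum(E * E))
```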
3. ROBUST CAMERA POSE ESTIMATION

The desirable behavior of a feature detector is the capability of exactly tracking the same warped image regions as they are deformed along changes in viewpoints. To do so, covariant feature detectors, such as SIFT, are designed to mod-out the effects of transformations belonging to some group. In SIFT, the features are canonized with respect to translation, rotation and scale, that is, the effects of these transformations are compensated and features become invariant to them. Canonization induces a certain amount of loss of information in the detected features, thus ambiguities could arise. Figure 1 shows an example where this information loss leads to a mismatch: the two highlighted image regions share almost the same appearance information even if they occur in distant areas; the only difference between them is that one is the rotated version of the other. As soon as they undergo the canonization process involved in extracting invariant features, they can be considered by the detector as being the same scene point, leading to a mismatch. As a result, the whole set of features collected during the acquisition, matching and tracking phases may be affected by a certain amount of outliers.

For the purposes of this work, we found it convenient to recast the definition of outliers in terms of the effect they have on the motion description. In particular, keypoint mismatching between the left/right frames (in a stereo vision configuration) and keypoint mistracking between successive time instants, or features belonging to moving objects in the scene, will generate points which move on the image plane in a different way with respect to the good points. A robust procedure is thus generally needed in order to guarantee a certain amount of insensitivity to the set of possible deviations from the nominal model assumptions. What is clear from the discussion made so far is that the observations will show a certain amount of dispersion, with respect to the assumptions induced by the nominal model, if any outliers are involved.
In the following, a technique which is able to give a measure of the degree of dispersion of the data will be used to design a robust pose estimator.

3.1. Least Entropy-Like (LEL) Estimator

A robust nonlinear alternative to least-squares estimation was recently proposed [13]. The aim of this estimator is to give a representation of the dispersion of the residuals; the cost function is built on the concept of Gibbs' entropy [14], which is the reason why the estimator was given the name least entropy-like (LEL) estimator. Given N observations $y_i$, $i = 1, \ldots, N$, extracted from a general nonlinear system parameterized via the vector $\theta$,

$$y_i = f(u, \theta) + \epsilon_i, \quad i = 1, 2, \ldots, N, \quad (8)$$

the LEL estimator can be designed around the relative squared residuals $\pi_i$ defined as:

$$\pi_i = \frac{r_i^2}{\sum_{j=1}^{N} r_j^2}, \quad r_i = y_i - \hat{y}_i, \quad (9)$$

where $u$ is the known system input vector, $\epsilon_i$ is the normally distributed innovation term, and $\hat{y}_i = f(u, \hat{\theta})$ is the prediction of the system output, given the estimate $\hat{\theta}$ of the unknown parameter vector. By construction,

$$\pi_i \in [0, 1], \quad \sum_{i=1}^{N} \pi_i = 1. \quad (10)$$

Given Equation (9), the normalized entropy-like cost function is defined as follows:

$$H = \begin{cases} 0 & \text{if } \sum_{j=1}^{N} r_j^2 = 0 \\ -\dfrac{1}{\log N} \displaystyle\sum_{i=1}^{N} \pi_i \log \pi_i & \text{otherwise,} \end{cases} \quad (11)$$

where the condition $\sum_{j=1}^{N} r_j^2 = 0$ is introduced for mathematical completeness and represents a perfect fit (in this situation, an optimization based on the 2-norm would provide the exact match). Reference [13] proposed the following estimator:

$$\hat{\theta}_{LEL} = \arg\min_{\theta} H. \quad (12)$$

The idea behind this estimator is to make the relative squared residuals as unequally distributed as possible. If this is the case, most residuals are small (with respect to the least-squares cost $\sum_{j=1}^{N} r_j^2$) and few of the residuals are large (see [13]). Data points corresponding to these large residuals are outlier candidates.
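A minimal Python sketch of the cost (11), assuming the usual convention that $0 \cdot \log 0 = 0$ (enforced here by clipping):

```python
import numpy as np

def lel_cost(residuals, eps=1e-12):
    """Normalized entropy-like cost H of Eq. (11) from a vector of
    scalar residuals r_i (Eq. (9))."""
    r2 = np.asarray(residuals, dtype=float) ** 2
    total = r2.sum()
    if total == 0.0:            # perfect fit, first branch of Eq. (11)
        return 0.0
    pi = r2 / total             # relative squared residuals, Eq. (9)
    pi = np.clip(pi, eps, 1.0)  # pi*log(pi) -> 0 as pi -> 0, so clipping
                                # only avoids log(0) numerically
    return -np.sum(pi * np.log(pi)) / np.log(len(r2))
```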
3.2. Robust Pose Estimation Using LEL

When the residuals $r_i$ in Equation (9) are defined using the pose estimation error as in Equation (6), the LEL estimator can be applied to solve the robust pose estimation problem:

$$\hat{\theta}_{LEL} = \arg\min_{\theta} \left( -\frac{1}{\log N} \sum_{i=1}^{N} \pi_i \log \pi_i \right), \quad (13)$$

where

$$\pi_i = \frac{\psi(r_i)}{\sum_{j=1}^{N} \psi(r_j)}, \quad (14)$$

$$r_i = p_{i,k+h} - K_c\, \mathrm{proj}\left\{ g_k^{k+h}(\theta_p) P_{i,k} \right\}, \quad (15)$$

where $\psi(\cdot)$ is the following Huber-like function [9]:

$$\psi(a) = \begin{cases} a^2, & \text{if } |a| \le d \\ d^2, & \text{otherwise,} \end{cases} \quad (16)$$

and $K_c$ and $\mathrm{proj}\{\cdot\}$ are the pinhole-model camera calibration matrix and the perspective projection operator usually adopted in computer vision.

This formulation of the least entropy-like solution to the pose estimation problem represents a substantial improvement over the one the authors proposed in [15] and [16]. In the cited works, the relative residuals $\pi_i$ in (14) were defined using the standard 3D form of the residuals, as in (6). Triangulated features usually have a significant ambiguity in the depth direction due to the intrinsic image noise. Such ambiguity grows proportionally with the relative distance between the camera and the feature points, and it can seriously reduce the estimation accuracy. One way to improve the estimation results, while keeping scale information (proper of pose estimation schemes between 3D points), is to employ the reprojection error scheme, since it involves only image coordinates, which are invariant to changes in depth [17]. This is why in this work the pose estimation residual is recast in terms of the reprojection error, leading to better estimation accuracy. The employment of the Huber-like function avoids bad conditioning of the entropy-like cost function by limiting the upper bound of the denominator in (14), thus preventing the uncontrolled growth of the sum of the residual norms due to numerical sensitivities.

So far, no parameterization of the homogeneous matrix $g_k^{k+h}(\theta_p)$ has been adopted yet. Actually, only a parameterization for the rotation matrix $C_k^{k+h}$ must be chosen, which can be made in several ways (Euler angles, axis-angle, quaternion, etc.). What is desirable for a parameterization involved in a nonlinear optimization problem is for it to be fair [18], that is, it should not introduce more numerical sensitivity than the one involved in the problem itself. For the purposes of this work (see also [16]), the solution proposed in [19] was used. The idea behind the proposed approach is to parameterize the normalized quaternion $q \in \mathbb{Q}$ with a 3-vector $v \in \mathbb{V}$, such that $q = B(v)$, with $B(\cdot): \mathbb{V} \to \mathbb{Q}$ a certain nonlinear mapping defined in [19]. Using this approach, the quaternion can be described by a reduced set of parameters, that is, the components of the 3-vector $v = [v_1\ v_2\ v_3]^T$. Although this parameterization is not singularity-free, it was shown in [19] that it can be successfully adopted in minimization problems where the quaternion changes at each algorithm step are small; furthermore, this approach allows the algorithm to change vector $v$ freely, without imposing the unity-norm constraint (which is always met by the corresponding quaternion $q = B(v)$), so that an unconstrained minimization can be carried out.
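The complete robust pose cost of Equations (13)-(16) can be sketched as follows; the mapping pose_from_params (e.g., the 3-vector quaternion parameterization of [19]) and the threshold d are placeholders, and the Huber-like clamp is applied to the reprojection residual norms:

```python
import numpy as np

def huber_like(a, d):
    """Clamped square of Eq. (16)."""
    a2 = a * a
    return np.where(np.abs(a) <= d, a2, d * d)

def lel_pose_cost(theta_p, P_k, p_kh, K_c, pose_from_params,
                  d=3.0, eps=1e-12):
    """Entropy-like pose cost of Eqs. (13)-(15).
    P_k:  (N, 3) triangulated keypoints at time t_k;
    p_kh: (N, 2) tracked pixel coordinates at time t_{k+h};
    pose_from_params: theta_p -> (C, T), a placeholder for the
    rotation/translation parameterization discussed above."""
    C, T = pose_from_params(theta_p)
    Pc = P_k @ C.T + T                      # g(theta_p) P_{i,k}
    uv = Pc @ K_c.T                         # pinhole projection ...
    uv = uv[:, :2] / uv[:, 2:3]             # ... and perspective division
    r = np.linalg.norm(p_kh - uv, axis=1)   # reprojection residual norms
    w = huber_like(r, d)                    # Eq. (16)
    pi = w / max(w.sum(), eps)              # Eq. (14)
    pi = np.clip(pi, eps, 1.0)
    return -np.sum(pi * np.log(pi)) / np.log(len(r))
```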
3.3. Optimization Algorithm

The entropy-like penalty function is nonlinear and multiple local minima may exist. Thus, the minimization has to be computed with particular attention to the initial conditions. Nonlinear optimization problems are usually solved after a preliminary (usually linear) optimization step, by finding solutions that are close to the optimal values. This raw estimate is then used as the initial condition for the nonlinear refinement. The purposes of this work are such that we expect to have an acceptable local estimate of the motion given by the inertial mechanization performed over a short period of time (between the two successive frames). This can be considered a feasible assumption since, in this short period, acceleration measurements describe in a quite good manner the actual motion experienced by the system. Therefore, it is possible to start the nonlinear estimation with the parameter $\hat{\theta}_{p,0}$ that can be extracted from the best available estimate of the relative transformation:

$$\hat{g}_k^{k+h} = \left( \hat{g}_{k+h}^- \right)^{-1} \hat{g}_k^+ \rightarrow \hat{\theta}_{p,0}, \quad (17)$$

where $\hat{g}_k^+$ is the best (corrected by the filter in the past) estimate of the navigation at time $t_k$, and $\hat{g}_{k+h}^-$ is the navigation prediction at the current time $t_{k+h}$. The arrow symbol in Equation (17) means that the value of $\theta_{p,0}$ is extracted from the transformation $\hat{g}_k^{k+h}$.

It is often assumed in the literature dealing with the minimization of similar problems that the shape of the cost function near the local optimum can be considered quasi-parabolic; thus, the optimization can be performed locally using the Levenberg-Marquardt algorithm. The estimation vector $\theta_{p,k} \in \mathbb{R}^6$ is then iteratively updated by the following approximate equation:

$$\hat{\theta}_{p,k} = \hat{\theta}_{p,k-1} - \gamma_k \left( \nabla_{\theta_p} H^T \nabla_{\theta_p} H + \mu_k\, \mathrm{diag}\left\{ \nabla_{\theta_p} H^T \nabla_{\theta_p} H \right\} \right)^{-1} \nabla_{\theta_p} H^T. \quad (18)$$

The update equation (18) moves the solution of the optimization problem along the directions where the gradient $\nabla_{\theta_p} H \in \mathbb{R}^{1\times 6}$ is smaller, depending on the value of the (approximated) Hessian matrix $\nabla^2_{\theta_p} H \approx \nabla_{\theta_p} H^T \nabla_{\theta_p} H \in \mathbb{R}^{6\times 6}$. The gradient of the entropy-like function can be computed numerically or analytically; in this work the former solution was chosen, and this approach produced acceptable results. The algorithm is applied until the estimation vector $\hat{\theta}_{p,k}$ reaches a stationary point, within a tolerance $\epsilon$, such that $\| \hat{\theta}_{p,k} - \hat{\theta}_{p,k-1} \| < \epsilon$.
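A minimal sketch of the iteration (18) with a numerical gradient; the step size γ and damping μ are held fixed here, since the paper does not detail their scheduling, and a small ridge term guards the matrix inversion:

```python
import numpy as np

def numerical_gradient(cost, theta, h=1e-6):
    """Central-difference gradient of a scalar cost."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (cost(theta + e) - cost(theta - e)) / (2.0 * h)
    return g

def lm_minimize(cost, theta0, gamma=1.0, mu=1e-3, tol=1e-8, max_iter=100):
    """Levenberg-Marquardt-style iteration of Eq. (18) on a scalar cost,
    with the Gauss-Newton Hessian approximation grad^T grad."""
    theta = theta0.astype(float).copy()
    for _ in range(max_iter):
        g = numerical_gradient(cost, theta)           # gradient of H
        Happrox = np.outer(g, g)                      # grad^T grad (6x6)
        A = Happrox + mu * np.diag(np.diag(Happrox)) \
            + 1e-12 * np.eye(g.size)                  # ridge for safety
        step = gamma * np.linalg.solve(A, g)
        theta = theta - step
        if np.linalg.norm(step) < tol:                # stationary point
            break
    return theta
```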
4. EXPERIMENTAL RESULTS OF POSE ESTIMATION

In [16], the authors showed that the proposed method performed better than ICP, a standard 2-norm based approach. In this section the proposed algorithm is compared with a RANSAC-based direct linear transform (DLT) [20], with nonlinear refinement via bundle adjustment [21]. The comparison is made by performing some experiments of pure pose estimation (without IMU integration). The RANSAC algorithm used by DLT was set up with our best knowledge of the image noise in order to obtain the best result possible from the refined DLT algorithm. This optimal estimation was used for comparisons with our approach.

We performed outdoor experiments with a handheld Firewire stereo camera system; we selected a resolution of 516 × 388 pixels, since it was judged to be the best in terms of the ratio between the speed of image processing and the accuracy in feature selection and matching. An industrial 1.6 GHz PC with 1 GB RAM was used to collect the test videos; the video frames were then processed offline, together with the estimation algorithm.

4.1. Initialization without INS Mechanization

In the case of pure camera pose estimation, no IMU mechanization was available, thus no local motion estimation could be inferred. In such conditions, a proper initialization of the algorithm must be made by using image measurements only. The 8-point algorithm [20], with ten steps of RANSAC, is a good candidate for this aim, and it was used to obtain a raw estimate of the pose. This algorithm employs monocular image-space measurements only, thus it is able to recover translation only up to a scale factor (due to the perspective projection framework). We used this approach since it provides a fast closed-form solution to the problem, and the pose estimate obtained can be considered close enough to the optimum. In particular, this initialization procedure allows recovering the rotation matrix with a high accuracy.
Fig. 2. Outdoor experiment #1. Image pairs with point correspondences and estimation results. The green dots are the matched SIFT features. The green circles are the features reprojected using the LEL pose estimation result.
It was our experience that, by forcing one term, between the translation or the rotation, toward its optimal value, the minimum of the entropy-like cost function with respect to the other was in general in the neighborhood of the expected optimal value. The discussed approach proved very useful in obtaining the rotational information with appreciable accuracy. The only drawback resides in the scale ambiguity which affects the 8-point algorithm estimate. In order to overcome this ambiguity and to adjust the actual translation scale, we used the distance between two randomly chosen points in space whose depths are known thanks to the triangulation of tracked features. The resulting initial pose estimate was used to initialize the nonlinear refinement via minimization of the entropy-like function. This minimization was performed by using the whole dataset of the collected features.
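The initialization can be sketched with OpenCV as follows; note that OpenCV's essential-matrix machinery stands in for the paper's 8-point-plus-RANSAC implementation, and the two reference points used for the scale fix are hypothetical inputs:

```python
import cv2
import numpy as np

def initialize_pose(p1, p2, K, pair1, pair2, metric_dist):
    """Raw pose initialization in the spirit of Section 4.1 (a sketch,
    not the paper's exact pipeline). p1, p2: (N, 2) float pixel
    correspondences between the two views; K: calibration matrix;
    pair1, pair2: (2, 2) pixel coordinates of two chosen tracked points
    whose metric separation metric_dist is known from stereo
    triangulation (hypothetical inputs)."""
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K)
    # Monocular triangulation of the two chosen points; t (and hence the
    # triangulated cloud) is recovered only up to scale.
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P0, P1, pair1.T, pair2.T)
    X = (X[:3] / X[3]).T
    # Rescale the translation so the two points reproduce metric_dist.
    scale = metric_dist / np.linalg.norm(X[0] - X[1])
    return R, t * scale
```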
4.2. Results

The following figures show the results of some experiments. In particular, Figs. 3 and 6 show the sorted 2-norm of the reprojection residuals:

$$\left\| e_{r,i} \right\|_2 = \left\| p_{i,2} - K_c\, \mathrm{proj}\left\{ g_1^2\big( \hat{\theta}_p^{LEL,\,DLT} \big) P_{i,1} \right\} \right\|_2,$$

using the motion parameters $\hat{\theta}_p$ estimated by the two methods, LEL and robust DLT. The camera calibration matrix $K_c$ was determined experimentally. $P_{i,1}$ are the 3D keypoints triangulated in the first position of the camera (at time $t_1$); $p_{i,2}$ are the image-space coordinates (in pixels) of the corresponding features on the image plane of the image acquired in the final position of the camera (at time $t_2$).

Fig. 3. Outdoor experiment #1. Sorted reprojection errors.

Fig. 4. Outdoor experiment #1. Evaluation of the entropy-like function using the estimates given by the two tested methods. The value of the entropy-like function is smaller around the estimate of our approach, which points out, in this case, a slightly bigger dispersion of the relative residuals in the case of LEL minimization.
Fig. 5. Outdoor experiment #2. Image pairs with point correspondences and estimation results. The green dots are matched SIFT features; the green circles are the robust DLT estimated inliers; and the black squares (behind the green circles) are the LEL estimated inliers.
Fig. 6. Outdoor experiment #2. Sorted reprojection errors.
Figures 2 and 5 show the reprojections of the transformed points $g_1^2\big(\hat{\theta}_p^{LEL}\big) P_{i,1}$ on the plane of the second image, in the case of LEL estimation, that is, the quantities

$$\hat{p}_{i,2} = K_c\, \mathrm{proj}\left\{ g_1^2\big( \hat{\theta}_p^{LEL} \big) P_{i,1} \right\}.$$

Only estimated inliers were used in Fig. 5; Fig. 2, instead, highlights the mismatch between the measured and estimated projections once the optimal transformation is applied to one outlier. One outlier is marked in Fig. 2 with an "x" connected via a dashed black line. Once the pose estimation results are obtained by the two selected techniques, it is possible to compute the resulting normalized squared residuals $\hat{\pi}_i$ (Equation (14)) around the estimates of the LEL and the robust DLT algorithms; then, the terms

$$H_i(\hat{\theta}_j) = -\hat{\pi}_{i,j} \log \hat{\pi}_{i,j}, \quad i = 1, \ldots, N, \quad j = \{LEL, DLT\}$$

were computed and plotted with respect to $\hat{\pi}_{i,j}$ in Fig. 4. The outliers correspond to the residuals $\hat{\pi}_{i,j}$ whose value is very close to unity. Moreover, the dispersion of these residuals with respect to the others (the ones related to the inliers, that is, residuals closer to zero) can also be seen. The robust DLT algorithm performs in a similar manner, since its value was computed by excluding the outlier and the noisiest points from the dataset. Thus, it can be stated that the introduced approach (which is run on the whole dataset) is able to perform as well as a two-norm approach once the dataset is purged of outliers and ambiguous points.
Fig. 7. Comparison of translation estimation errors.
Such a conclusion was first drawn in [13]; these results provide further support for it. Although no analytical guarantee exists, in general the LEL algorithm performs as well as the robust DLT algorithm with nonlinear refinement (tuned with our best efforts), which was used as a benchmark. In some particular experiments, the accuracy of the methods cannot be absolutely stated, since no ground truth was available with which to compare the algorithms. Outdoor experiment #2, for instance, presents a case whose analysis can be considered ambiguous. The scene undergoes a wide change in the baseline (due to a wide camera motion) and the resulting images exhibit a large number of mismatches, as Fig. 5 shows. The two approaches show sufficient accuracy in estimating the change in pose, but, since a ground-truth reference (i.e., an additional high-accuracy sensor) was not available, there is no way to state whether one algorithm performs better than the other. In order to perform a qualitative comparison, we defined two sets of possible inliers by selecting the points whose squared reprojection error was below the threshold of 2 square pixels. Considering only these residuals, the LEL estimation produces squared reprojection errors which are slightly below the ones produced by the robust DLT algorithm, approximately up to residual #40. Thus, it can be assumed that the LEL algorithm performs slightly better than the other in the situation highlighted by Fig. 5.

In order to assess numerically the performance of both algorithms, we prepared a Monte Carlo simulation: LEL and DLT were compared in 50 different pose estimation problems with 100 features each; the features were randomly distributed in the camera viewing cone, and artificial outliers were added in 5 different ratios: 10, 15, 20, 25 and 30%, for a total of 250 test cases.
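A sketch of how one such contaminated test case might be generated; the calibration matrix, scene volume and outlier magnitudes below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[400.0,   0.0, 258.0],
              [  0.0, 400.0, 194.0],
              [  0.0,   0.0,   1.0]])   # assumed calibration, 516 x 388

def project(P):
    uv = P @ K.T
    return uv[:, :2] / uv[:, 2:3]

def make_test_case(n_features=100, outlier_ratio=0.10, px_sigma=0.7):
    """One Monte Carlo pose estimation problem: features in the camera
    viewing cone, Gaussian pixel noise, and a fraction of gross
    outliers emulating mismatches."""
    P = rng.uniform([-5.0, -5.0, 2.0], [5.0, 5.0, 20.0],
                    (n_features, 3))                       # 3D cloud
    p = project(P) + rng.normal(0.0, px_sigma, (n_features, 2))
    n_out = int(outlier_ratio * n_features)
    idx = rng.choice(n_features, n_out, replace=False)
    p[idx] += rng.uniform(-40.0, 40.0, (n_out, 2))         # mismatches
    return P, p
```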
The mean and standard deviation of the resulting distribution of the squared pose estimation error (separately for position and attitude) were recorded and plotted against the outlier level. The results are shown in Figs. 7 and 8; it is evident that LEL outperforms the robust DLT method.

5. KALMAN FILTER FOR THE FUSION OF VISUAL AND INERTIAL MEASUREMENTS

The pose-based loosely coupled approach in vision-aided inertial navigation suffers from drifts, since the navigation algorithm is built upon the philosophy of dead-reckoning navigation [7, 10]. This is due to the fact that the correction feeds (being relative in nature) can be seen as pseudo-velocities; thus, for instance, a bias or an almost fixed error in each pseudo-measurement will lead to a divergent estimation [17]. Since a divergent estimation cannot be avoided when using relative measurements only, the navigation filter's aim is to provide the smallest possible drift [8].

5.1. Definition of the Relative Pose Pseudo-Measurement Error

Usually, all aiding sensors produce absolute measurements (with respect to a known and fixed reference) of the estimated variables (e.g., GPS measures $r^e$ and $V^e$), while, in the case of visual odometry, the motion measurements are relative only (i.e., only the relative displacement between two successive images is measured). This section summarizes the equations used for the fusion of the relative motion measurements, given by the pose estimation algorithm, and the inertial data.
Fig. 8. Comparison of attitude estimation errors.
Without loss of generality, we can assume that the inertial system is initially aligned with the geodetic reference frame (Euler angles equal to zero); this is equivalent, for the purposes of this work, to assuming knowledge of the initial attitude.

The camera pose at time $t_k$ with respect to the initial position (at time $t_0$) can be related to the inertial mechanization states (position and attitude) as:

$$g_k^0 = \left\{ R_k^0, T_k^0 \right\}, \quad (19)$$

where $R_k^0 = C_b^n(t_k)$ and $T_k^0$ represents the position of the origin of the navigation/body frame at time $t_k$ seen from the initial navigation frame (at time $t_0$). For the purposes of this paper, we assumed that the relative displacement (latitude and longitude) between successive images is small enough that the navigation frame (NED) can be considered not to change its orientation (with respect to the ECEF frame); thus, a simple planar projection can be used (approximation of a flat surface) to approximate the motion in the neighborhood of the starting point. The camera position can then be obtained as:

$$T_k^0 = \xi\left( r^e(t_k) \right) = \begin{bmatrix} P_n (\varphi_k - \varphi_0) \\ P_n \cos\varphi_k\, (\lambda_k - \lambda_0) \\ h_k \end{bmatrix}, \quad (20)$$

where $r^e(0) = [\varphi_0\ \lambda_0\ 0]$ represents the position of the vehicle, in latitude and longitude, when the navigation task began its execution (at time $t_0$), and $P_n$ is the radius of curvature normal to the ellipsoid surface at the point of tangency at the given latitude $\varphi$ [11].

Given two stereo pairs at successive times $t_k$ and $t_{k+h}$, the relative motion $g_k^{k+h}$, which must be computed by the vision system, is related to the absolute poses at times $t_k$ and $t_{k+h}$ by

$$g_k^{k+h} = \left\{ C_k^{k+h}, T_k^{k+h} \right\} = \left\{ C_0^{k+h} C_k^0,\ C_0^{k+h}\left( T_k^0 - T_{k+h}^0 \right) \right\}. \quad (21)$$

Here the concept of successive simply states that the images used to estimate the relative motion are related to two different time instants, which can be far apart both in time and space.

The proposed reduced version of the indirect Kalman filter estimates the errors in velocity ($\delta V^n$), position in the ECEF frame ($\delta r^e$), and accelerometer biases ($\delta f_{ib}^b$). The state vector is

$$\delta x_k = \left[ \delta V_k^n\ \ \delta r_k^e\ \ \delta f_{ib,k}^b \right].$$

It is now necessary to define a filter output that can be used to construct a measurement residual with the vision system output. Thus, first, we construct a navigation position error estimate using the planar projection operator $\xi(\cdot)$:

$$\delta y_k = H_k \delta x_k \triangleq \delta T_k^0,$$

where

$$H_k = C_{b,k}^n \left[ 0 \;\; \frac{\partial T_k^0}{\partial r_k^e}\bigg|_{\hat{r}_k^e} \;\; 0 \right].$$

Note that the nonzero element in the $H_k$ matrix is the Jacobian of the function $\xi(r_k^e)$ with respect to the estimated position in the ECEF frame, evaluated around $\hat{r}_k^e$. Then we can estimate the relative navigation error $\Delta\delta y_{k+h}$ between times $t_{k+h}$ and $t_k$ as:

$$\Delta\delta y_{k+h} = H_{k+h}\, \delta x_{k+h} - H_k\, \delta x_k.$$
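A minimal sketch of the planar projection (20) and its Jacobian (the nonzero block of $H_k$); the analytic derivative here is our own derivation, treating $P_n$ as locally constant:

```python
import numpy as np

def planar_projection(r_e, r0_e, P_n):
    """Planar projection xi(.) of Eq. (20): geodetic position
    [lat, lon, alt] mapped to a local position relative to the start.
    P_n is the normal radius of curvature at the current latitude."""
    phi, lam, h = r_e
    phi0, lam0, _ = r0_e
    return np.array([P_n * (phi - phi0),
                     P_n * np.cos(phi) * (lam - lam0),
                     h])

def planar_projection_jacobian(r_e, r0_e, P_n):
    """d(xi)/d(r^e), the nonzero block of H_k; P_n is treated as
    locally constant with respect to latitude."""
    phi, _, _ = r_e
    _, lam0, _ = r0_e
    lam = r_e[1]
    return np.array([
        [P_n,                               0.0,               0.0],
        [-P_n * np.sin(phi) * (lam - lam0), P_n * np.cos(phi), 0.0],
        [0.0,                               0.0,               1.0],
    ])
```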
Note that $\Delta\delta y_{k+h}$ is the estimate of the navigation error of the value $\hat{T}_k^{k+h}$ computed by the inertial mechanization. Since the vision system actually measures $\bar{g}_k^{k+h} = \{\bar{C}_k^{k+h}, \bar{T}_k^{k+h}\}$ (where the rotation term is discarded, since we assume navigation-grade gyroscopes are used), it is possible to compute a pseudo-measure of the relative pose error from the measured relative pose $\bar{T}_k^{k+h}$ and its estimate $\hat{T}_k^{k+h}$ reconstructed from the navigation equations, as a function of the filter state. In our case, this error can be written as

$$\Delta\delta y^*_{k+h} = \bar{T}_k^{k+h} - \hat{T}_k^{k+h},$$

where the estimated relative motion is

$$\hat{T}_k^{k+h} = C_0^{k+h}\left( \hat{T}_k^0 - \hat{T}_{k+h}^0 \right) \quad (22)$$

and the measured relative motion is, by definition, equal to the actual data corrupted by noise $\nu$:

$$\begin{aligned} \bar{T}_k^{k+h} &= T_k^{k+h} + \nu_k \\ &= C_0^{k+h}\left( T_k^0 - T_{k+h}^0 \right) + \nu_k \\ &= C_0^{k+h}\left( \hat{T}_k^0 - \delta T_k^0 - \hat{T}_{k+h}^0 + \delta T_{k+h}^0 \right) + \nu_k \\ &= C_0^{k+h}\left( \hat{T}_k^0 - \hat{T}_{k+h}^0 \right) - C_0^{k+h}\left( \delta T_k^0 - \delta T_{k+h}^0 \right) + \nu_k. \end{aligned}$$

Thus, the pseudo-measure of the relative pose error can be rewritten as a function of the states (current and past) of the indirect Kalman filter only:

$$\Delta\delta y^*_{k+h} = C_0^{k+h}\left( \delta T_{k+h}^0 - \delta T_k^0 \right) + \nu_k.$$

An important aspect to be noted is that the above formulation of the position measurement error depends on the motion variables corresponding to the current time (via $\hat{r}_{k+h}^e$ and $C_0^{k+h}$) and to some steps in the past (via $\hat{r}_k^e$). Thus, it is necessary to augment the filter state with a memory of the past; this allows us to keep track of the cross-covariance of the estimated navigation between the two time instants [15]. The state of the error navigation filter is augmented with one exact copy $\delta\bar{x}_k$ of itself when a key frame is acquired. Suppose a new reference keyframe arrives at time $t_k$; the state of the Kalman filter will be set to

$$\delta\bar{x}_k = \begin{bmatrix} \delta x_k \\ \delta x_k \end{bmatrix} \quad (23)$$

and the state covariance matrix is set to

$$\bar{P}_k = \begin{bmatrix} P_k & P_k \\ P_k & P_k \end{bmatrix}, \quad (24)$$

with $P_k = E\left[ \delta x_k \delta x_k^T \right]$. The state copy is initialized to $\delta x_k$ and is kept constant during the filter propagation, whereas the state vector $\delta x_k$ is propagated according to the error dynamics.

5.2. Propagation of Error Navigation Equations

At each time step, inertial mechanization is performed to obtain a new estimate of the vehicle state (position, velocity and accelerometer biases):

$$\hat{x}_{k+1}^- = f\left( \hat{x}_k, u_k \right),$$

where $f(\cdot)$ represents the discretized version of the standard INS mechanization, which maps the corrected navigation states to the states at the next time step; variable $\hat{x}_{k+1}^-$ is the estimate of the vehicle position and velocity (at time $t_{k+1}$) before the corrections, if any, produced by the Kalman filter are applied (i.e., the a priori estimate). According to the above discussion, the indirect Kalman filter prediction step is performed using

$$\begin{bmatrix} \delta\bar{x}_{k+1} \\ \delta x_{k+1} \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & \Psi_k \end{bmatrix} \begin{bmatrix} \delta\bar{x}_k \\ \delta x_k \end{bmatrix} + \begin{bmatrix} 0 \\ \Gamma_k \end{bmatrix} w_k = \bar{\Psi}_k\, \delta\bar{x}_k + \bar{\Gamma}_k w_k. \quad (25)$$

The propagation equation for the covariance matrix is as for standard Kalman filtering:

$$\bar{P}_{k+1} = \bar{\Psi}_k \bar{P}_k \bar{\Psi}_k^T + \bar{\Gamma}_k Q \bar{\Gamma}_k^T, \quad (26)$$

where $Q$ is the process noise covariance matrix. After $h$ steps (the time span needed to obtain the second image), the covariance matrix becomes

$$\bar{P}_{k+h} = \begin{bmatrix} P_k & P_k \left( \prod_{i=1}^{h} \Psi_{k+i} \right)^T \\ \left( \prod_{i=1}^{h} \Psi_{k+i} \right) P_k & P_{k+h} \end{bmatrix}. \quad (27)$$

The off-diagonal blocks represent the cross-correlation between the navigation errors at times $t_k$ and $t_{k+h}$. When the vision system provides a new relative pose measurement, the update step is performed as follows:

$$\begin{aligned} S_{k+h} &= \left[ H_k\ H_{k+h} \right] \bar{P}_{k+h}^- \begin{bmatrix} H_k^T \\ H_{k+h}^T \end{bmatrix} + R_k \\ K_{k+h} &= \bar{P}_{k+h}^- \begin{bmatrix} H_k^T \\ H_{k+h}^T \end{bmatrix} S_{k+h}^{-1} \\ \Delta\delta y^*_{k+h} &= \bar{T}_k^{k+h} - \hat{T}_k^{k+h} \\ \delta\bar{x}_{k+h}^+ &= \delta\bar{x}_{k+h}^- + K_{k+h}\left( \Delta\delta y^*_{k+h} - \Delta\delta y_{k+h} \right) \\ \hat{x}_{k+h}^+ &= \hat{x}_{k+h}^- + \delta x_{k+h}^+ \\ \bar{P}_{k+h}^+ &= \bar{P}_{k+h}^- - K_{k+h} \left[ H_k\ H_{k+h} \right] \bar{P}_{k+h}^-, \end{aligned} \quad (28)$$

where $R_k$ is the measurement noise covariance matrix. Variable $\hat{x}_{k+h}^+$ is the estimate of the vehicle position and velocity (at time $t_{k+h}$), given the corrections produced by the Kalman filter (i.e., the a posteriori estimate).
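The cloning, propagation and update steps of Equations (23)-(28) can be summarized in the following sketch; the sign of the clone block in the stacked measurement matrix follows the residual definition $\Delta\delta y_{k+h} = H_{k+h}\delta x_{k+h} - H_k \delta x_k$, which the compact notation $[H_k\ H_{k+h}]$ leaves implicit:

```python
import numpy as np

class CloningIKF:
    """Minimal indirect KF with stochastic cloning, Eqs. (23)-(28).
    n is the error-state dimension (9 here: velocity, position, bias)."""

    def __init__(self, n=9):
        self.n = n
        self.dx = np.zeros(2 * n)        # [clone at t_k; current state]
        self.P = np.zeros((2 * n, 2 * n))

    def clone(self, dx0, P0):
        # Eqs. (23)-(24): duplicate the error state at keyframe time t_k.
        self.dx = np.concatenate([dx0, dx0])
        self.P = np.block([[P0, P0], [P0, P0]])

    def predict(self, Psi, Gamma, Q):
        # Eqs. (25)-(26): the clone block is the identity, so the copy
        # stays constant while the current state propagates.
        n = self.n
        Z = np.zeros((n, n))
        Psi_bar = np.block([[np.eye(n), Z], [Z, Psi]])
        Gamma_bar = np.vstack([np.zeros_like(Gamma), Gamma])
        self.dx = Psi_bar @ self.dx
        self.P = Psi_bar @ self.P @ Psi_bar.T + Gamma_bar @ Q @ Gamma_bar.T

    def update(self, H_k, H_kh, dy_star, R):
        # Eq. (28); the minus sign on the clone block realizes
        # delta(dy) = H_{k+h} dx_{k+h} - H_k dx_k.
        H = np.hstack([-H_k, H_kh])
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.dx = self.dx + K @ (dy_star - H @ self.dx)
        self.P = self.P - K @ H @ self.P
        return self.dx[self.n:]          # correction for the current state
```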
Fig. 9. Synthetic environment used for simulations.
Fig. 10. Position error transient during one of the simulations (Tx, Ty, Tz components; LEL vs. robust DLT).
6. SIMULATION RESULTS

This section presents some simulations using the proposed entropy-like robust pose estimation method in the described inertial navigation framework. Our approach was compared against the results obtained with the RANSAC-based direct linear transform with nonlinear refinement via bundle adjustment.

The simulations were performed by generating sample (noisy) accelerations and clean angular velocities. The result was a sample camera trajectory in 6 DOF. The acceleration and angular velocity profiles were generated with the use of a VTOL quad-rotor aircraft simulator. Figure 9 shows a snapshot of the synthetic environment [22] used to simulate the camera trajectory and to collect image points. Along this trajectory, some 3D points were collected along the path and projected onto the image planes of the stereo pair. The IMU process was supposed to run 5 times faster than the vision one. Noise and a certain amount of outliers were used to corrupt the 2D features. The pixel noise was white with zero mean and a variance of 0.5 px². In order to emulate the presence of outliers in the data, random numbers were added to the image-space 2D coordinates of the keypoints; the mean error introduced by the outliers was 5–40 px, and the percentage of outliers with respect to the cardinality of the data set was in the range 5–50%. At each time step corresponding to a new image acquisition, we generated a new outlier contamination probability and we drew a certain number of outliers according to this probability.
Fig. 11. Accelerometer bias estimation error during one of the simulations (panels: LEL and DLT).
Figure 10 shows the absolute navigation error (position) for both LEL and DLT. Figure 11 shows the accelerometer bias estimation error for both algorithms. It is clear that LEL performs better than DLT. In order to evaluate and compare the algorithmic complexity of LEL and DLT, the execution time of the two algorithms was recorded during all the experiments; the results are shown in the table below. The two algorithms were implemented in Matlab; we followed coding best practices, and we believe that no substantial improvement in the execution time could be achieved within this environment. Although these figures of merit cannot substitute for a computational complexity analysis, we may conclude that the two algorithms have comparable execution times.

Comparison of the execution time of the DLT and LEL algorithms

                                          DLT      LEL
Average computational time [ms]          125.0    116.2
Standard deviation, 1 sigma [ms]          69.7     51.4

7. CONCLUSIONS

The paper has proposed a robust loose-coupling solution to the vision-augmented inertial navigation problem which makes use of a novel cost function, the entropy of the relative squared residuals. This cost function has been shown, with simulations and experimental results, to be robust to the presence of noise and outliers in the visual features. The simulation results have shown that using the LEL approach to solve the pose estimation problem allows the drift in navigation to be reduced with respect to a robust technique based on 2-norm minimization plus nonlinear refinement via bundle adjustment. Furthermore, the algorithm is computationally simple (its execution time is comparable to that of DLT), it requires low memory resources (due to loose coupling), and its robustness to outliers allows the use of simple feature trackers. The proposed integrated navigation approach may be used in all cases where the IMU accuracy, although relatively good for attitude estimation, may not be used for long-term position estimation, and, in addition, the GPS signal is not available, or only seldom so; as a matter of fact, the estimation filter structure could easily be adapted to include occasional absolute position fixes (from GPS or landmarks).
REFERENCES

1. Hartley, R. and Kahl, F., Optimal Algorithms in Multiview Geometry, Asian Conference on Computer Vision, 2007.
2. Milella, A. and Siegwart, R., Stereo-Based Ego-Motion Estimation Using Pixel Tracking and Iterative Closest Point, Proc. of the 4th IEEE International Conference on Computer Vision Systems, 2006.
3. Nistér, D., Preemptive RANSAC for Live Structure and Motion Estimation, 9th IEEE International Conference on Computer Vision, 2003.
4. Campa, G., Mammarella, M., Napolitano, M.R., Fravolini, M.L., Pollini, L., and Stolarik, B., A Comparison of Pose Estimation Algorithms for Machine Vision Based Aerial Refueling for UAVs, IEEE Mediterranean Conference on Control and Automation, 2006, vol. 1.
5. Jones, E. and Soatto, S., Visual-Inertial Navigation, Mapping and Localization: A Scalable Real-Time Causal Approach, International Journal of Robotics Research, 2011, vol. 30, issue 4.
6. Mourikis, A., Trawny, N., Roumeliotis, S., Johnson, A., Ansar, A., and Matthies, L., Vision-Aided Inertial Navigation for Spacecraft Entry, Descent, and Landing, IEEE Transactions on Robotics, April 2009, vol. 25, no. 2, pp. 264–280.
7. Roumeliotis, S., Johnson, A., and Montgomery, J., Augmenting Inertial Navigation with Image-Based Motion Estimation, Proc. of IEEE International Conference on Robotics and Automation, 2002.
8. Tardif, J.-P., George, M., Laverne, M., Kelly, A., and Stentz, A., A New Approach to Vision-Aided Inertial Navigation, Proc. of International Conference on Intelligent Robots and Systems (IROS), 2010.
9. Huber, P.J. and Ronchetti, E.M., Robust Statistics, Second Edition, John Wiley & Sons, Inc., 2009.
10. Konolige, K., Agrawal, M., and Solà, J., Large-Scale Visual Odometry for Rough Terrain, Robotics Research; Springer Tracts in Advanced Robotics, 2011, vol. 66, pp. 201–212.
11. Rogers, R., Applied Mathematics in Integrated Navigation Systems, American Institute of Aeronautics and Astronautics, 2007.
12. Lowe, D., Object Recognition from Local Scale-Invariant Features, Proc. of the International Conference on Computer Vision (ICCV), 1999.
13. Indiveri, G., An Entropy-Like Estimator for Robust Parameter Identification, Entropy, 2009, vol. 11, pp. 560–585.
14. Chakrabarti, C. and De, K., Boltzmann-Gibbs Entropy: Axiomatic Characterization and Application, Journal of Mathematics and Mathematical Sciences, 2000, vol. 23, no. 4.
15. Di Corato, F., Innocenti, M., and Pollini, L., An Entropy-Like Approach to Vision-Aided Inertial Navigation, Proc. of 18th IFAC World Congress, Milan, Italy, 2011.
16. Di Corato, F., Innocenti, M., Indiveri, G., and Pollini, L., An Entropy-Like Approach to Vision-Based Autonomous Navigation, Proc. of the IEEE International Conference on Robotics and Automation, 2011.
17. Nistér, D., Naroditsky, O., and Bergen, J., Visual Odometry, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2004.
18. Hornegger, J. and Tomasi, C., Representation Issues in the ML Estimation of Camera Motion, Proc. of the 7th IEEE Int. Conference on Computer Vision, 1999.
19. Schmidt, J. and Niemann, H., Using Quaternions for Parametrizing 3-D Rotations in Unconstrained Nonlinear Optimization, Proc. of the Vision Modeling and Visualization Conference, 2001.
20. Hartley, R.I. and Zisserman, A., Multiple View Geometry in Computer Vision, Cambridge University Press, 2000.
21. Triggs, B., McLauchlan, P., Hartley, R., and Fitzgibbon, A., Bundle Adjustment – A Modern Synthesis, Vision Algorithms: Theory and Practice, Lecture Notes in Computer Science, 2000, vol. 1883, pp. 298–372.
22. Pollini, L. and Innocenti, M., A Synthetic Environment for Dynamic Systems Control and Distributed Simulation, IEEE Control Systems Magazine, 2000, vol. 20, no. 2, pp. 49–61.