UAV Position and Attitude Estimation using IMU, GNSS and Camera

Cesario Vincenzo Angelino, Vincenzo Rosario Baraniello, Luca Cicala
CIRA, the Italian Aerospace Research Centre, Capua, Italy
Email: {c.angelino,v.baraniello,l.cicala}@cira.it

Abstract—The aim of this paper is to present a method for integrating measurements provided by inertial sensors (gyroscopes and accelerometers), GPS and a video system in order to estimate the position and attitude of a UAV (Unmanned Aerial Vehicle). Inertial sensors are widely used for aircraft navigation because they represent a low-cost and compact solution, but their measurements suffer from several errors which cause a rapid divergence of the position and attitude estimates. To avoid divergence, inertial sensors are usually coupled with other systems, for example a GNSS (Global Navigation Satellite System). This paper examines the possibility of coupling the inertial sensors with a camera as well. A camera is generally installed on board UAVs for surveillance purposes, and it presents several advantages with respect to GNSS, such as greater accuracy and a higher data rate. Moreover, it can be used in urban areas or, more generally, where multipath effects can prevent the use of GNSS. A camera, coupled with a video processing system, can provide attitude and position (up to a scale factor), but it has a lower data rate than inertial sensors and its measurements have latencies which can compromise the performance and effectiveness of the flight control system. The integration of inertial sensors with a camera allows exploiting the best features of both systems, providing better performance in position and attitude estimation.

I. INTRODUCTION

The integration of measurements provided by inertial sensors (gyroscopes and accelerometers) and video systems (VS) has gained increasing popularity in the last decade [1]. These kinds of sensors have complementary characteristics. Indeed, inertial sensors cannot discern a change in inclination from a body acceleration, due to Einstein's equivalence principle. Moreover, inertial sensors present high uncertainty at low velocities and low relative errors at high velocities, while camera motion estimation becomes less reliable at high velocity because of the low camera acquisition framerate. Furthermore, the camera estimates the motion up to a scale factor, i.e., a near object with low relative speed appears the same as a far object with high relative speed. Based on these observations, the motivation for integrating vision and inertial sensing is clear. There are two broad approaches, usually called loosely and tightly coupled. The loosely coupled approach uses separate INS and VS blocks, running at different rates and exchanging information. Tightly coupled systems combine the disparate raw data of vision and inertial sensors in a single, optimum filter, rather than cascading two filters, one for each sensor. Typically, a Kalman filter is used to fuse the data.

The integration of visual and inertial sensing opens up new directions for applications in the fields of autonomous navigation, robotics and many others. The use of inertial sensors in machine vision applications was first proposed more than twenty years ago, and further studies have investigated the cooperation between inertial and visual systems in autonomous navigation of mobile robots, in image correction and in improving motion estimation for 3D reconstruction of structures. More recently, a framework for cooperation between vision sensors and inertial sensors has been proposed. The use of gravity as a vertical reference allows the calibration of the camera focal length with a single vanishing point, and the segmentation of vertical and horizontal planes. In [2] a method for detecting the vertical (gravity) direction and for 3D mapping is presented, and in [3] such a vertical inertial reference is used to improve the alignment and registration of the depth map. Virtual reality applications have always required the use of motion sensors worn by the user. Augmented reality, where virtual reality is superimposed on a real-time view, is particularly sensitive to any discrepancy between the estimated and the actual motion of the user. An accurate estimate of the user position can be obtained through the use of many sensors, using external vision and specific markers such as radio transponders, ultrasound beams or lasers, etc. With MEMS inertial sensors available at low cost, these are used in combination with computer vision techniques [4]. The objective is to have a visual-inertial tracker, able to operate in arbitrary environments relying only on naturally observable features, suitable for augmented reality applications. In [5] an extended Kalman filter is used to combine the low-frequency stability of vision sensors with the high frequency of the gyroscope, thereby reaching a stable tracking of the pose (6 degrees of freedom), both static and dynamic. Augmented reality systems are based on hybrid trackers to fuse the images successfully in real time with dynamic 3D models [4], [6], [7], [8]. Applications in robotics are increasing. Initial works on vision systems for automated passenger vehicles have also incorporated inertial sensors and have explored the benefits of visual-inertial tracking [9], [10]. Other applications include agricultural vehicles [11], wheelchairs [12] and robots in indoor environments [13], [14]. Other recent work related to Unmanned Aerial Vehicles (UAVs) includes fixed-wing

aircraft [15], [16], [17], [18], rotorcraft [19], [20] and spacecraft [21]. For underwater vehicles (AUVs), recent results can be found in [22], [23], [24]. A good overview of the methods applied to attitude estimation using a video system is provided in [25]. These methods can generally be distinguished into: a) methods based on horizon detection, b) methods based on the analysis of the optical flow, c) methods based on the detection of salient features in the field of view. In [26] the pitch and roll angles and the three body angular rates are determined by detection of the horizon in a sequence of images and by the analysis of the optical flow. The horizon detection technique does not allow determining the value of the yaw angle, whose knowledge is required, for example, in trajectory tracking algorithms. Moreover, for some configurations of the video system or for particular maneuvers, the horizon might not be present in the images (for example, the cameras could frame only the sky if they are also used for collision avoidance). In [27] another method, based on the detection of the horizon in an image sequence, is proposed to estimate the roll and pitch angles. In [28] the complete aircraft attitude is determined using single camera observations. Specifically, the proposed method requires the knowledge of the positions of the own aircraft and of the points identified in the camera images (a georeferenced map of the surrounding space is required). In [29] another method is proposed to estimate both aircraft position and attitude using a single camera. This application also requires the presence of some reference points or some known geometries in the surrounding space.

This paper is organized as follows. In Section II some notations are introduced. Section III provides a description of the camera system. Section IV and Section V deal with the estimation of the camera ego-motion and the Kalman filter respectively. The experimental setup and numerical analysis are described in Section VI. Finally, conclusions are drawn in Section VII.

II. NOTATIONS

We first introduce some notation. Matrices are denoted by capital italics. Vectors are denoted by bold fonts, either capital or small. A three-dimensional column vector is specified by $(s_1, s_2, s_3)^T$. A vector is sometimes regarded as a column matrix, so vector operations such as the cross product ($\times$) and matrix operations such as matrix multiplication are applied to three-dimensional vectors. Matrix operations precede vector operations. $\mathbf{0}$ denotes a zero vector. For a matrix $A = [a_{ij}]$, $\|A\|$ denotes the Euclidean norm of the matrix, i.e., $\|[a_{ij}]\| = \sqrt{\sum_{ij} a_{ij}^2}$. We define a mapping $[\cdot]_\times$ from a three-dimensional vector to a 3 by 3 matrix:

$$[(x_1, x_2, x_3)^T]_\times = \begin{pmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{pmatrix} \qquad (1)$$

Using this mapping, we can express the cross product of two vectors by the matrix multiplication of a 3 by 3 matrix and a column matrix:

$$X \times Y = [X]_\times Y \qquad (2)$$
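To make the notation concrete, here is a minimal NumPy sketch (an illustrative addition, not part of the original paper) of the mapping (1) and the identity (2).

```python
import numpy as np

def skew(x):
    """Return the 3x3 skew-symmetric matrix [x]_x of a 3-vector x, as in eq. (1)."""
    x1, x2, x3 = x
    return np.array([[0.0, -x3,  x2],
                     [ x3,  0.0, -x1],
                     [-x2,  x1,  0.0]])

X = np.array([1.0, 2.0, 3.0])
Y = np.array([-0.5, 0.4, 2.0])
# Eq. (2): the cross product equals the matrix-vector product [X]_x Y.
assert np.allclose(np.cross(X, Y), skew(X) @ Y)
```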

The reference system s in which the coordinates of a vector x are expressed is indicated as a superscript, with the notation $x^s$. The reference systems considered in the paper are the following:
i: the inertial system;
e: the ECEF (Earth Centered Earth Fixed) system;
n: the NED (North East Down) system, tangent to the Earth ellipsoid at a reference latitude and longitude (Lat0, Lon0);
b: the "body" system as seen by the IMU (Inertial Measurement Unit);
c: the camera system, with fixed orientation with respect to the IMU.

III. CAMERA EGO-MOTION

A. Coordinate System

The default coordinate system in which the UAV position, i.e. (Lat, Lon, Alt) of the vehicle center of mass, is given is the ECEF system. $C^e_n$ is the reference change matrix from NED to ECEF. The attitude and heading of the vehicle are given as $C^n_b(\phi, \theta, \psi)$, where $C^n_b$ is the reference change matrix associated with the roll ($\phi$), pitch ($\theta$) and yaw ($\psi$) angles. In general, the camera standard reference system might be rotated with respect to the body reference system. Thus, the transformation from the camera reference to ECEF is given by

$$C^e_c = C^e_n(\mathrm{Lat}, \mathrm{Lon})\; C^n_b(\phi, \theta, \psi)\; C^b_c \qquad (3)$$

where $C^b_c$ represents the constant reference change matrix from the camera system c to the body system b.
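To make the chain of reference changes in (3) concrete, the following NumPy sketch (an illustrative addition; it assumes the usual aerospace ZYX Euler convention consistent with eq. (10), the standard NED/ECEF relation, hypothetical numeric values, and a camera aligned with the body axes) composes $C^e_c$ from latitude/longitude, Euler angles and a fixed camera mounting matrix.

```python
import numpy as np

def C_n_to_e(lat, lon):
    """DCM from NED to ECEF for geodetic latitude/longitude in radians."""
    sl, cl = np.sin(lat), np.cos(lat)
    so, co = np.sin(lon), np.cos(lon)
    return np.array([[-sl * co, -so, -cl * co],
                     [-sl * so,  co, -cl * so],
                     [ cl,      0.0, -sl     ]])

def C_b_to_n(phi, theta, psi):
    """DCM from body to NED for roll/pitch/yaw, i.e. the transpose of R in eq. (10)."""
    cph, sph = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cps, sps = np.cos(psi), np.sin(psi)
    R_n2b = np.array([
        [cth * cps,                   cth * sps,                  -sth],
        [sph * sth * cps - cph * sps, sph * sth * sps + cph * cps, sph * cth],
        [cph * sth * cps + sph * sps, cph * sth * sps - sph * cps, cph * cth]])
    return R_n2b.T

# Hypothetical values, for illustration only.
lat, lon = np.radians(41.1), np.radians(14.2)
phi, theta, psi = np.radians([2.0, -1.5, 30.0])
C_c_to_b = np.eye(3)   # camera assumed aligned with the body axes in this sketch

# Eq. (3): C^e_c = C^e_n(Lat, Lon) C^n_b(phi, theta, psi) C^b_c
C_c_to_e = C_n_to_e(lat, lon) @ C_b_to_n(phi, theta, psi) @ C_c_to_b
print(C_c_to_e)
```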

B. Essential Matrix

Suppose we have a calibrated camera, i.e. a camera with known intrinsic parameters, moving in a static environment along an unknown trajectory. Consider two images taken by the camera at different times. The equation linking two image points $p'$ and $p$ which are the projection of the same 3D point is:

$$p'^T E\, p = 0 \qquad (4)$$

where $p' = [u', v', 1]^T$ and $p = [u, v, 1]^T$ are the (normalized) homogeneous coordinates in the second and the first camera reference frames respectively. The matrix $E = [r^{c_2}_{21}]_\times C^{c_2}_{c_1}$ is called the essential matrix and contains the information about the camera movement. The essential matrix depends on 3 parameters for the rotation and 2 for the translation, since eq. (4) does not depend on the modulus of $r^{c_2}_{21}$. Here $(C^{c_2}_{c_1}, r^{c_2}_{21})$ represents the rigid transformation (rotation and translation) which brings points (i.e. transforms coordinates) from the camera 1 to the camera 2 standard reference system. The coordinate change equation is given by

$$x^{c_2} = C^{c_2}_{c_1} x^{c_1} + r^{c_2}_{21} \qquad (5)$$

where $x^{c_k}$ represents the coordinates in the camera k standard reference system. The translation vector between the two camera centers, $r^s_{12} = O^s_2 - O^s_1$, expressed in a generic reference system s, is equal to $-C^s_{c_2} r^{c_2}_{21}$. The rotation matrix $C^{c_2}_{c_1}$ contains different contributions: 1) the rotation of the camera by the pan-tilt unit; 2) changes in the vehicle attitude and heading; 3) the rotation between the two NED systems (at $t_2$ and $t_1$). Indeed it can also be expressed as

$$C^{c_2}_{c_1} = C^{c_2}_e C^e_{c_1} = (C^e_{c_2})^T C^e_{c_1} \qquad (6)$$

By means of eq. (3) and with a little algebra we obtain

$$C^{c_2}_{c_1} = C^{c_2}_{b_2} C^{b_2}_{n_2} \underbrace{C^{n_2}_e C^e_{n_1}}_{\approx I} C^{n_1}_{b_1} C^{b_1}_{c_1} \qquad (7)$$

In the above equation we assumed that (Lat, Lon) does not change significantly between two consecutive frames, hence the identity approximation. This also means that the two NED reference systems (i.e., those corresponding to $n_1$ and $n_2$) can be considered coincident up to a translation. Eq. (7) then reads

$$C^{c_2}_{c_1} \approx C^{c_2}_{b_2} C^{b_2}_{n_2} C^{n_1}_{b_1} C^{b_1}_{c_1}. \qquad (8)$$

Therefore, assuming a fixed camera orientation with respect to the body, the functional dependency is only on the rotation of the body, i.e. on the change of the vehicle attitude and heading,

$$C^{b_2}_{b_1} \approx C^{b_2}_{n_2} C^{n_1}_{b_1} \approx R(d\phi, d\theta, d\psi) = R_x(d\phi) R_y(d\theta) R_z(d\psi) \qquad (9)$$

where $d\phi = \phi_2 - \phi_1$, $d\theta = \theta_2 - \theta_1$ and $d\psi = \psi_2 - \psi_1$. The full expression of R is given in (10):

$$R(d\phi, d\theta, d\psi) = \begin{pmatrix} \cos d\theta \cos d\psi & \cos d\theta \sin d\psi & -\sin d\theta \\ \sin d\phi \sin d\theta \cos d\psi - \cos d\phi \sin d\psi & \sin d\phi \sin d\theta \sin d\psi + \cos d\phi \cos d\psi & \sin d\phi \cos d\theta \\ \cos d\phi \sin d\theta \cos d\psi + \sin d\phi \sin d\psi & \cos d\phi \sin d\theta \sin d\psi - \sin d\phi \cos d\psi & \cos d\phi \cos d\theta \end{pmatrix} \qquad (10)$$

For small angle variations it can be approximated by

$$R(d\phi, d\theta, d\psi) \approx \begin{pmatrix} 1 & d\psi & -d\theta \\ -d\psi & 1 & d\phi \\ d\theta & -d\phi & 1 \end{pmatrix} = I - W\, dt \qquad (11)$$

where $\omega = [d\phi, d\theta, d\psi]^T / dt$ is the angular velocity (in the body reference system) and $W = [\omega]_\times$.
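The small-angle form (11) suggests a direct way to read the attitude increments off a relative rotation matrix. The following NumPy sketch (an illustrative addition, assuming the rotation matrix follows the convention of (10)) extracts $(d\phi, d\theta, d\psi)$ both with the small-angle approximation and with the exact inverse of (10).

```python
import numpy as np

def delta_euler_small_angle(R):
    """Attitude increments from eq. (11): R ~ I - W*dt with W = [omega]_x."""
    dphi   = 0.5 * (R[1, 2] - R[2, 1])
    dtheta = 0.5 * (R[2, 0] - R[0, 2])
    dpsi   = 0.5 * (R[0, 1] - R[1, 0])
    return np.array([dphi, dtheta, dpsi])

def delta_euler_exact(R):
    """Exact inversion of eq. (10), valid for |dtheta| < 90 deg."""
    dphi   = np.arctan2(R[1, 2], R[2, 2])
    dtheta = -np.arcsin(R[0, 2])
    dpsi   = np.arctan2(R[0, 1], R[0, 0])
    return np.array([dphi, dtheta, dpsi])

# Example: a pure yaw increment of 0.5 degrees, built per eq. (10).
dpsi = np.radians(0.5)
R = np.array([[ np.cos(dpsi), np.sin(dpsi), 0.0],
              [-np.sin(dpsi), np.cos(dpsi), 0.0],
              [ 0.0,          0.0,          1.0]])
print(delta_euler_small_angle(R), delta_euler_exact(R))
```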

Once $r^{c_2}_{21}$ and $C^{c_2}_{c_1}$ have been estimated from the essential matrix E, they give information on v(t) and ω(t) respectively. This information can be used either as a prediction or as a measurement for the Kalman filter. Similarly, the IMU output can be used as a priori information for the vision algorithms.

IV. COMPUTATION OF E AND ERROR ANALYSIS

A. Fast computation of image point correspondences

To calculate the essential matrix by video analysis, the point correspondences between two successive frames can be computed by a computer vision system based on the Lucas-Kanade method [30]. This method assumes that the displacements of the points in the image plane are small and approximately constant. Thus the velocities of the points in the image plane must satisfy the optical flow equation

$$I_u(p)\,\dot u + I_v(p)\,\dot v = -\dot I(p) \qquad (12)$$

where $p = (u, v)$ is a point in the image plane, $I_u, I_v$ are the components of the spatial gradient of the image intensity in the image plane, and $\dot x$ denotes the temporal derivative of x. The Lucas-Kanade method [30] can be used only when the image flow vector between the two frames is small enough for the differential equation of the optical flow to hold. When this condition does not hold, a rough preliminary prediction of the optical flow must be performed with other methods, and the Lucas-Kanade method can then be used for refinement only. When external sensors, such as an IMU or a GNSS (Global Navigation Satellite System), are available, the camera motion is predictable, so that a rough estimate of the optical flow can easily be initialized starting from the sensor measurements [31]. To accomplish this task, a basic knowledge of the scene 3D geometry is needed, or at least an approximate one. This is the case for the Earth surface, which can be modeled as an ellipsoid, a geoid or a digital elevation model. Using this prior information, the computational burden of the optical flow estimation can be strongly reduced, so that the delay of the video processing can be made compatible with the application of the proposed sensor fusion technology to the navigation scenario. To further reduce the computational burden of the computer vision algorithms, an improvement of the Lucas-Kanade method, the Kanade-Lucas-Tomasi (KLT) feature matching algorithm [32], has been adopted. The KLT has been improved using the ellipsoid model of the Earth surface, increasing the speed of the optical flow computation.
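A minimal sketch of the sensor-aided KLT refinement described above, using OpenCV's pyramidal Lucas-Kanade tracker. The prediction step below assumes, purely for illustration, that the inter-frame motion is dominated by rotation, so that tracked points can be pre-warped with the homography $K C^{c_2}_{c_1} K^{-1}$ built from an IMU-predicted rotation; the camera matrix `K` and the predicted rotation `R_c2_c1` are hypothetical inputs, and the paper's ellipsoidal-Earth flow prediction is a more general alternative.

```python
import cv2
import numpy as np

def track_klt_with_prior(img1, img2, pts1, K, R_c2_c1):
    """Track pts1 (Nx2) from img1 to img2, seeding KLT with a rotation-based guess."""
    # Pre-warp the points with H = K * R_c2_c1 * K^-1 (rotation-only assumption).
    H = K @ R_c2_c1 @ np.linalg.inv(K)
    ones = np.ones((len(pts1), 1))
    pred = (H @ np.hstack([pts1, ones]).T).T
    pred = pred[:, :2] / pred[:, 2:3]          # predicted positions in img2

    p1 = pts1.astype(np.float32).reshape(-1, 1, 2)
    p2_guess = pred.astype(np.float32).reshape(-1, 1, 2)
    # Refine the residual flow only, starting from the prediction.
    p2, status, _ = cv2.calcOpticalFlowPyrLK(
        img1, img2, p1, p2_guess,
        winSize=(21, 21), maxLevel=2,
        flags=cv2.OPTFLOW_USE_INITIAL_FLOW,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    ok = status.ravel() == 1
    return p1.reshape(-1, 2)[ok], p2.reshape(-1, 2)[ok]
```

In practice, `pts1` could come from `cv2.goodFeaturesToTrack` on the first frame, and `R_c2_c1` from integrating the gyroscope output over the inter-frame interval.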

B. Essential matrix calculation

Having obtained a set of correspondences of image points for a pair of successive frames, our objective is to find the essential matrix E. Each point correspondence gives one equation (4), which is linear and homogeneous in the elements of E, so k point correspondences give k such equations. Let

$$E = \begin{pmatrix} e_1 & e_4 & e_7 \\ e_2 & e_5 & e_8 \\ e_3 & e_6 & e_9 \end{pmatrix}, \qquad e = (e_1, e_2, \ldots, e_9)^T. \qquad (13)$$

Rewriting equation (4) in the elements of E using k point correspondences, we have

$$A\, e = 0 \qquad (14)$$

where

$$A = \begin{pmatrix} u_1 u'_1 & u_1 v'_1 & u_1 & v_1 u'_1 & v_1 v'_1 & v_1 & u'_1 & v'_1 & 1 \\ u_2 u'_2 & u_2 v'_2 & u_2 & v_2 u'_2 & v_2 v'_2 & v_2 & u'_2 & v'_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ u_k u'_k & u_k v'_k & u_k & v_k u'_k & v_k v'_k & v_k & u'_k & v'_k & 1 \end{pmatrix} \qquad (15)$$

In the presence of noise, A is a full-rank matrix. Therefore we solve for the unit vector h such that

$$\|A h\| = \min. \qquad (16)$$



The solution h is the unit eigenvector of $A^T A$ associated with the smallest eigenvalue. Then E is determined by

$$E = [E_1\; E_2\; E_3] = \sqrt{2} \begin{pmatrix} h_1 & h_4 & h_7 \\ h_2 & h_5 & h_8 \\ h_3 & h_6 & h_9 \end{pmatrix}. \qquad (17)$$

C. Getting motion parameters from E

Once the essential matrix has been obtained, the theorem of Huang and Faugeras [33] allows factorizing E into a rotation ($C^{c_2}_{c_1}$) and a translation ($r^{c_2}_{21}$). Let $E = U D V^T$ be the singular value decomposition (SVD) of E and let

$$S' = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad \text{and} \qquad R' = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (18)$$

Then there are four possible factorizations for $E = [r^{c_2}_{21}]_\times C^{c_2}_{c_1}$, given by

$$[r^{c_2}_{21}]_\times = U (\pm S') U^T \qquad (19)$$

$$C^{c_2}_{c_1} = \beta U R' V^T \qquad \text{or} \qquad C^{c_2}_{c_1} = \beta U R'^T V^T \qquad (20)$$

where

$$\beta = \det(U V^T). \qquad (21)$$

As observed in [34], the right factorization can be determined by imposing that the 3D points (which can be found by triangulation) must always lie in front of the camera, i.e., their z-coordinate must be positive.

D. Error analysis

In order to perform an error analysis, we want to relate the errors in the correspondence matrix A to the solution of the minimization problem (16). The sources of errors in the image coordinates include spatial quantization, feature detector errors, point mismatching and camera distortion. In a system that is well calibrated, so that systematic errors are negligible, the errors in the image coordinates of a feature can be modeled by random variables. These errors result in errors in the estimates of the motion parameters. The covariance matrix for e is

$$\Sigma_e = 2\Sigma_h \simeq 2 D_h \Sigma_{A^T} D_h^T \qquad (22)$$

where $\Sigma_{A^T}$ is the covariance matrix of $A^T$ and

$$D_h = G_h G_{A^T A} \qquad (23)$$

$$G_{A^T A} = [F_{ij}] + [G_{ij}] \qquad (24)$$

where $[F_{ij}]$ and $[G_{ij}]$ are matrices with 9 by n submatrices $F_{ij}$ and $G_{ij}$, respectively. $F_{ij} = a_{ij} I_9$, and $G_{ij}$ is a 9 by 9 matrix with the ith column being the column vector $A_j$ and all other columns being zeros. The matrix $G_h$ is given by

$$G_h = S \Delta S^T [h_1 I_9\; h_2 I_9\; \ldots\; h_9 I_9] \qquad (25)$$

where the columns of S represent the eigenvectors (ordered in nondecreasing order of the eigenvalues) of $A^T A$, $\Delta = \mathrm{diag}\{0, (\lambda_1 - \lambda_2)^{-1}, \ldots, (\lambda_1 - \lambda_n)^{-1}\}$, the $\lambda$ being the eigenvalues of $A^T A$, and $[h_1, \ldots, h_9]$ is the solution of the minimization problem (16). Assume that the errors between different points and different components in the image coordinates are uncorrelated and have the same variance $\sigma^2$. With this assumption we get

$$\Sigma_{A^T} = \sigma^2 \mathrm{diag}\{\Sigma_1, \Sigma_2, \ldots, \Sigma_n\} \qquad (26)$$

where $\Sigma_i$, $1 \le i \le n$, is a 9 by 9 submatrix:

$$\Sigma_i = \begin{pmatrix} p'_i p'^T_i & 0 & 0 \\ 0 & p'_i p'^T_i & 0 \\ 0 & 0 & p'_i p'^T_i \end{pmatrix} + \begin{pmatrix} u_i u_i J & u_i v_i J & u_i J \\ u_i v_i J & v_i v_i J & v_i J \\ u_i J & v_i J & J \end{pmatrix} \qquad (27)$$

where

$$J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (28)$$
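Steps IV-B and IV-C can be condensed into a few lines of NumPy. The sketch below is an illustrative addition: it follows the linear solution (13)-(17) and the SVD-based factorization (18)-(21), and returns the four candidate (R, t) pairs; selecting the physically valid pair via the positive-depth (cheirality) test of [34] is left out of the sketch.

```python
import numpy as np

def estimate_E(p1, p2):
    """Linear estimate of E from normalized points p1, p2 (each Nx2), eqs. (13)-(17)."""
    u, v = p1[:, 0], p1[:, 1]
    up, vp = p2[:, 0], p2[:, 1]
    # One row of eq. (15) per correspondence.
    A = np.column_stack([u*up, u*vp, u, v*up, v*vp, v, up, vp, np.ones_like(u)])
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                              # unit vector minimizing ||A h||, eq. (16)
    E = np.sqrt(2.0) * h.reshape(3, 3).T    # columns (h1,h2,h3), (h4,h5,h6), (h7,h8,h9), eq. (17)
    return E

def factorize_E(E):
    """Four candidate (R, t) factorizations of E, eqs. (18)-(21)."""
    U, _, Vt = np.linalg.svd(E)
    S1 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
    R1 = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    beta = np.linalg.det(U @ Vt)            # eq. (21)
    candidates = []
    for s in (+1.0, -1.0):
        t_cross = U @ (s * S1) @ U.T        # [t]_x, eq. (19)
        t = np.array([t_cross[2, 1], t_cross[0, 2], t_cross[1, 0]])
        for Rp in (R1, R1.T):
            R = beta * U @ Rp @ Vt          # eq. (20)
            candidates.append((R, t))
    return candidates
```

Note that `Vt[-1]`, the right singular vector associated with the smallest singular value of A, is exactly the unit eigenvector of $A^T A$ with the smallest eigenvalue required by the text.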

V. KALMAN FILTER

The final purpose of the proposed approach is the estimation of the position and attitude of the UAV. The data fusion algorithm is based on the Extended Kalman Filter (EKF), because the dynamic and observation equations are non-linear in their original form. The state vector includes the position and speed of the UAV in the tangent reference frame, and the rotation matrix from the body to the tangent reference frame. The EKF is applied to estimate the state variables. The input measurements for the prediction of the state variables are the inertial accelerations and angular speeds processed by an IMU. The fusion algorithm is composed of three EKFs, as sketched in Figure 1. With reference to Figure 1:
• the GPS-IMU EKF block uses the linear accelerations provided by the IMU and the position and speed provided by the GPS in order to compute the position and speed of the aircraft center of gravity;
• the VIDEO-IMU(1) block is based on the angular speed measurements provided by the IMU and on the attitude differences measured by the video system in order to estimate the aircraft attitude. The attitude estimates are provided to the other two EKF blocks;
• the VIDEO-IMU(2) block takes the linear accelerations provided by the IMU, as in GPS-IMU, but it uses as observables the position differences provided by the video system, in order to estimate the position and speed of the aircraft center of gravity. This filter is used only during GPS outages in order to propagate position and speed. When it is not activated, its output is set equal to zero.

Fig. 1. Proposed fusion algorithm block diagram.

For the sake of simplicity, we consider the IMU installed at the center of gravity of the aircraft (in order to eliminate lever arm effects) with its axes aligned with the body axes. Euler angles are used to represent the aircraft attitude. Therefore, we have

$$\ddot{\hat r}^n_{cg} = \hat C^n_b \tilde f^b + g^n \qquad (29)$$

$$\dot{\hat r}^n_{cg} = \hat v^n_{cg} \qquad (30)$$

and

$$\begin{pmatrix} \dot{\hat\phi} \\ \dot{\hat\theta} \\ \dot{\hat\psi} \end{pmatrix} = J(\hat\phi, \hat\theta, \hat\psi) \cdot \tilde\omega^b_{nb} \qquad (31)$$

Starting from these non-linear equations, the perturbed form of the EKF prediction equations is derived. The output equations are based on the following GPS predicted measurements:

$$\hat p^s_{GPS} = C^s_n \left( \hat r^n_{cg} + C^n_b \, r^b_{cg-GPS} \right) \qquad (32)$$

$$\hat v^n_{GPS} = \hat v^n_{cg} + C^n_b \left( \tilde\omega^b_{nb} \times r^b_{cg-GPS} \right). \qquad (33)$$

For the video system, considering that it provides delta-attitudes (derived from the essential matrix), a pseudo-attitude video measurement is built starting from an initial known value:

$$\begin{pmatrix} \hat\phi \\ \hat\theta \\ \hat\psi \end{pmatrix}_{video(k)} = \begin{pmatrix} \hat\phi \\ \hat\theta \\ \hat\psi \end{pmatrix}_{video(k-1)} + \begin{pmatrix} d\hat\phi \\ d\hat\theta \\ d\hat\psi \end{pmatrix}_{video(k)} \qquad (34)$$

The output equation for the estimation of speed based on video measurements is:

$$\hat v^n_{cg} = \frac{\Delta p}{\Delta t} \, |V| \qquad (35)$$

where Δp is provided by the video system and represents the unit vector in the direction of the movement, |V| is the Euclidean norm of the speed, considered constant (it is the last valid value obtained from the GPS-IMU sensor fusion algorithm), and Δt is the time difference between two measurements of the video system.

VI. NUMERICAL ANALYSIS

Two sensor configurations have been considered. The first one features a large lever arm between the GPS antenna and the aircraft center of gravity (10 meters on each axis), while the second one is a more typical configuration, because it considers a lever arm only along the y-body axis, with a magnitude of two meters. Figure 2 shows the UAV attitude estimation results for the first configuration. The attitude is reported in terms of roll, yaw and pitch angles. An improvement of the attitude estimation accuracy is clearly observable, since the accuracy of the video-based estimation is better than the GPS-based one. Indeed, the static accuracy of the video system is of the order of only 1 degree, while the GPS-based accuracy is of the order of 5 degrees. For the second configuration, the results are reported in Figure 3. It can be observed that the φ angle estimation shows the same behavior as in the first configuration for both the GPS-aided and the video-aided estimation algorithms. This happens because observability is guaranteed by the lever arm. On the contrary, for the θ (not reported here for the sake of brevity) and ψ angles, the estimation error with GPS only is much greater than in the previous configuration. This is principally due to the absence of a significant lever arm, in which case the estimation error is that of simple integration of the INS measurements. However, the IMU+VIDEO algorithm is more robust to the lever arm effect, since it still performs as well as in the first configuration. The distance between the GPS antenna and the aircraft center of gravity is generally minimized because of the aircraft configuration and the need to reduce cables and wiring. Moreover, a big lever arm introduces errors in the estimation of the aircraft center of gravity position and speed. Figures 4 and 5 show the results of the position and speed estimations with video measurements. At time instant 120 sec a GPS outage has been simulated, with a duration of 120 sec. This situation could happen, for example, while traveling through a tunnel or in a wooded area, in indoor navigation and/or in hostile environments where the GPS signal is not available. The use of video measurements allows continuing the estimation of position and speed with good accuracy throughout the sequence. Without the fusion of these measurements, the solution would diverge after a short time, due to the integration of the inertial measurement errors.

VII. CONCLUSION

In this paper an innovative technique for the estimation of the position and attitude of a UAV has been proposed.

Fig. 2. Attitude estimation in terms of Euler angles for the big lever arm configuration. From top to bottom: roll (φ), pitch (θ) and yaw (ψ) angles. Groundtruth (red), IMU+GPS estimation (black) and video-aided estimation (blue).

The main innovative aspect of this technique concerns the introduction of a vision-based system composed of a low-cost camera device. The information carried by the camera is then integrated with the classical data coming from the IMU and GPS units in a sensor fusion algorithm.

Fig. 3. Attitude estimation in terms of Euler angles for the small lever arm configuration. Roll (top) and yaw (bottom) angles. Groundtruth (red), IMU+GPS estimation (black) and IMU+Video estimation (blue).

A linearized Kalman filter has been implemented, because the dynamic and observation equations are non-linear in their original form. The simulation setup is composed of accurate models for realistic UAV flight behavior, sensor simulation and 3D photorealistic rendering. Preliminary results are very encouraging, since they show an improvement of up to one order of magnitude in attitude accuracy using an HD camera. Future work will address the validation of the simulation results in a real flight test using a mini-UAV platform.

REFERENCES

[1] P. Corke, J. Lobo, and J. Dias, "An introduction to inertial and visual sensing," I. J. Robotic Res., vol. 26, no. 6, pp. 519–535, 2007.
[2] J. Lobo and J. Dias, "Vision and inertial sensor cooperation using gravity as a vertical reference," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1597–1608, Dec. 2003.
[3] J. Lobo and J. Dias, "Inertial sensed ego-motion for 3d vision," J. Robot. Syst., vol. 21, no. 1, pp. 3–12, Jan. 2004.
[4] S. You, U. Neumann, and R. Azuma, "Orientation tracking for outdoor augmented reality registration," IEEE Computer Graphics and Applications, vol. 19, pp. 36–42, 1999.
[5] S. You and U. Neumann, "Fusion of vision and gyro tracking for robust augmented reality registration," in Proceedings of the Virtual Reality 2001 Conference (VR'01), ser. VR '01. Washington, DC, USA: IEEE Computer Society, 2001, pp. 71–.

Fig. 4. Velocity estimation in the North-East-Down (NED) reference system (panels: Vel North, Vel East and Vel Down [m/sec] versus Time [sec]). Groundtruth (black), IMU+Video estimation (blue).

Fig. 5. Down position estimation [m]. Top: entire simulation. Bottom: zoom during the GPS outage. Groundtruth (black), IMU+Video estimation (blue).

[6] L. P., K. A., P. A., and B. G., "Inertial tracking for mobile augmented reality," in Proc. of the IMTC 2002, vol. 2, Anchorage, USA, 2002, pp. 1583–1587. [Online]. Available: Lang2002.pdf
[7] U. Neumann, S. You, J. Hu, B. Jiang, and J. Lee, "Augmented virtual environments (AVE): Dynamic fusion of imagery and 3d models," in Proceedings of the IEEE Virtual Reality 2003, ser. VR '03. Washington, DC, USA: IEEE Computer Society, 2003, pp. 61–.
[8] B. Jiang, U. Neumann, and S. You, "A robust hybrid tracking system for outdoor augmented reality," in Proceedings of the IEEE Virtual Reality 2004, ser. VR '04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 3–.
[9] E. D. Dickmanns, "Vehicles capable of dynamic vision: a new breed of technical beings?" Artif. Intell., vol. 103, no. 1-2, pp. 49–76, Aug. 1998.
[10] J. Goldbeck, B. Huertgen, S. Ernst, and L. Kelch, "Lane following combining vision and DGPS," Image and Vision Computing, vol. 18, no. 5, pp. 425–433, 2000.
[11] T. Hague, J. A. Marchant, and N. D. Tillett, "Ground based sensing systems for autonomous agricultural vehicles," Computers and Electronics in Agriculture, vol. 25, no. 1-2, pp. 11–28, 2000.
[12] T. Goedemé, M. Nuttin, T. Tuytelaars, and L. V. Gool, "Vision based intelligent wheel chair control: The role of vision and inertial sensing in topological navigation," Journal of Robotic Systems, vol. 21, no. 2, pp. 85–94, 2004.
[13] D. D. Diel, P. DeBitetto, and S. Teller, "Epipolar constraints for vision-aided inertial navigation," in Proc. IEEE Motion and Video Computing, 2005, pp. 221–228.
[14] I. Stratmann and E. Solda, "Omnidirectional vision and inertial clues for robot navigation," Journal of Robotic Systems, vol. 21, no. 1, pp. 33–39, 2004.
[15] J. Kim and S. Sukkarieh, "Real-time implementation of airborne inertial-SLAM," Robot. Auton. Syst., vol. 55, no. 1, pp. 62–71, Jan. 2007.
[16] M. Bryson and S. Sukkarieh, "Building a robust implementation of bearing-only inertial SLAM for a UAV," J. Field Robotics, vol. 24, no. 1-2, pp. 113–143, 2007.
[17] J. Nygårds, P. Skoglar, M. Ulvklo, and T. Högström, "Navigation aided image processing in UAV surveillance: Preliminary results and design of an airborne experimental system," J. Robot. Syst., vol. 21, no. 2, pp. 63–72, Feb. 2004.
[18] S. Graovac, "Principles of fusion of inertial navigation and dynamic vision," J. Robot. Syst., vol. 21, no. 1, pp. 13–22, Jan. 2004.
[19] L. Muratet, S. Doncieux, Y. Briere, and J.-A. Meyer, "A contribution to vision-based autonomous helicopter flight in urban environments," Robotics and Autonomous Systems, vol. 50, no. 4, pp. 195–209, 2005.
[20] P. Corke, "An inertial and visual sensing system for a small autonomous helicopter," J. Field Robotics, vol. 21, no. 2, pp. 43–51, 2004.
[21] S. I. Roumeliotis, A. E. Johnson, and J. F. Montgomery, "Augmenting inertial navigation with image-based motion estimation," in ICRA. IEEE, 2002, pp. 4326–4333.
[22] R. Eustice, H. Singh, J. Leonard, M. Walter, and R. Ballard, "Visually navigating the RMS Titanic with SLAM information filters," in Proceedings of Robotics: Science and Systems, Cambridge, USA, June 2005.
[23] A. Huster and S. M. Rock, "Relative position sensing by fusing monocular vision and inertial rate sensors," Ph.D. dissertation, Stanford University, 2003.
[24] M. Dunbabin, K. Usher, and P. I. Corke, "Visual motion estimation for an autonomous underwater reef monitoring robot," in FSR, 2005, pp. 31–42.
[25] A. E. R. Shabayek, C. Demonceaux, O. Morel, and D. Fofi, "Vision based UAV attitude estimation: Progress and insights," Journal of Intelligent & Robotic Systems, vol. 65, pp. 295–308, 2012.
[26] D. Dusha, W. Boles, and R. Walker, "Attitude estimation for a fixed-wing aircraft using horizon detection and optical flow," in Proceedings of the 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications, ser. DICTA '07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 485–492.
[27] S. Thurrowgood, D. Soccol, R. J. D. Moore, D. Bland, and M. V. Srinivasan, "A vision based system for attitude estimation of UAVs," in IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, October 2009, pp. 5725–5730.
[28] V. Sazdovski, P. M. G. Silson, and A. Tsourdos, "Attitude determination from single camera vector observations," in IEEE Conf. of Intelligent Systems. IEEE, July 2010, pp. 49–54.
[29] Z. Zhu, S. Bhattacharya, M. U. D. Haag, and W. Pelgrum, "Using single-camera geometry to perform gyro-free navigation and attitude determination," in IEEE/ION Position Location and Navigation Symposium (PLANS), vol. 24, no. 5. IEEE, May 2010, pp. 49–54.
[30] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI'81), vol. 2. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1981, pp. 674–679.
[31] M. Hwangbo, J.-S. Kim, and T. Kanade, "Inertial-aided KLT feature tracking for a moving camera," in IROS. IEEE, 2009, pp. 1909–1916.
[32] C. Tomasi and T. Kanade, "Detection and tracking of point features," International Journal of Computer Vision, Tech. Rep., 1991.
[33] T. S. Huang and O. D. Faugeras, "Some properties of the E matrix in two-view motion estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 12, pp. 1310–1312, Dec. 1989. [Online]. Available: http://dx.doi.org/10.1109/34.41368
[34] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature, vol. 293, no. 5828, pp. 133–135, 1981.

[28] V. Sazdovski, P. M. G. Silson, and A. Tsourdos, “Attitude determination from single camera vector observations,” in IEEE Conf. of Intelligent Systems. IEEE, July 2010, pp. 49–54. [29] Z. Zhu, S. Bhattacharya, M. U. D. Haag, and W. Pelgrum, “Using singlecamera geometry to perform gyro-free navigation and attitude determination,” in IEEE/ION Position Location and Navigation Symposium (PLANS), vol. 24, no. 5. IEEE, May 2010, pp. 49–54. [30] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proceedings of the 7th international joint conference on Artificial intelligence - Volume 2, ser. IJCAI’81. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1981, pp. 674–679. [31] M. Hwangbo, J.-S. Kim, and T. Kanade, “Inertial-aided klt feature tracking for a moving camera,” in IROS. IEEE, 2009, pp. 1909–1916. [32] C. Tomasi and T. Kanade, “Detection and tracking of point features,” International Journal of Computer Vision, Tech. Rep., 1991. [33] T. S. Huang and O. D. Faugeras, “Some properties of the e matrix in two-view motion estimation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 12, pp. 1310–1312, Dec. 1989. [Online]. Available: http://dx.doi.org/10.1109/34.41368 [34] H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature, vol. 293, no. 5828, pp. 133–135, 1981.