Proceedings of the 2015 IEEE International Conference on Mechatronics and Automation, August 2-5, Beijing, China

Robust Direct Visual Inertial Odometry via Entropy-based Relative Pose Estimation

Jianjun Gui, Dongbing Gu and Huosheng Hu
School of Computer Science and Electronic Engineering, University of Essex
Wivenhoe Park, Colchester CO4 3SQ, UK. Emails: {jgui,dgu,hhu}@essex.ac.uk

Abstract— Visual solutions such as monocular visual odometry and MonoSLAM have attracted increasing interest in the robotics area. However, due to the large computational burden of processing volumes of sequential images, it is still hard to run many vision-based algorithms on highly agile platforms such as Micro Aerial Vehicles (MAVs) in real time. In this paper, we present a method that combines the direct image information from a monocular camera and the measurements from an inertial sensor in an Extended Kalman Filter (EKF) framework to perform an effective odometry solution. In contrast to other odometry methods, our solution dispenses with traditional feature extraction and description, and instead uses the mutual information between images to perform the tracking. This entropy based tracking method enhances the robustness to illumination variation. Our method has been tested on real data.

Index Terms— Visual inertial odometry, Tracking, Pose estimation, Entropy, Mutual information

I. INTRODUCTION

With the development of Microelectromechanical Systems (MEMS), tiny sensors of relatively high quality have become available. Many mobile platforms such as smart phones, MAVs or even toys are able to provide live video or system dynamic parameters to users. This information is easily accessible, but it comes with a large amount of redundant data and noise. In order to enlarge the application area with such limited sensor information, research on localisation, mapping and navigation using only a monocular camera and an IMU has become increasingly popular. However, there is always a trade-off between exploiting more environmental information and keeping the computational burden low. In recent years, the community has released several algorithms and frameworks for odometry and SLAM research [1], [2], [3], but most of them still operate in controlled laboratory environments with salient features or rely on high-performance computational units.

A. Related Work

For visual-inertial fusion, there are two major methodological branches. The most computationally efficient approaches are the loosely coupled ones, i.e., those processing inertial data and image measurements separately. For instance, in [4] the authors first process the consecutive images to estimate the relative motion and then fuse the result with the inertial measurements. Alternatively, as in [5], the inertial measurements can be used to estimate rotation and then be fused with an image-based algorithm; or, as in [6], the visual framework is regarded as a black box that can be replaced by any other pose estimation from different sensors and then fused with the inertial data. Although separating these two measurement sources reduces the computational cost, it comes at the expense of information loss, for example the measurement correlations and the bias estimation. To achieve higher precision, tightly coupled methods are more favourable: they jointly estimate the robot pose and the scene structure using both visual and inertial data, and thus naturally involve the calibration between the two sensors. In the work of [7], the authors use the feeds from these two sensors in a tightly coupled structure to statistically optimise the 6-DoF IMU pose, the inter-sensor calibration, the visual scale factor of the monocular system, and the gravity vector in the vision frame. Although tightly coupled methods inherently carry more computational complexity, strategies such as marginalising out the oldest states to keep a moving window [8] or selecting keyframes among the sequential images [9] can still bound these algorithms to an acceptable computational level.

How the visual information is exploited largely determines the efficiency of the whole framework and the quality of the recovered robot pose and scene structure. The most traditional approach is to extract a sparse set of salient features and express them as feature descriptors [10]. As a consequence, such methods are limited to scenes that contain certain feature patterns (e.g. corners or lines). Nevertheless, thanks to the rotation and scale invariance of some feature descriptors and detectors, classical methods in visual SLAM and visual odometry (e.g. PTAM [1], MonoSLAM [11]) have been developed and are widely accepted. However, to use such algorithms for real-time on-board guidance in MAV missions, the computational burden of mapping features into high-dimensional descriptors and of performing outlier rejection cannot be neglected. In contrast, direct methods recover structure and motion directly from the intensity and gradient values of image regions; both magnitude and direction can be used in the estimation, compared with the distance discriminant used alone in the feature matching process. The most prominent characteristic of direct methods is that all the information in a single image can be fully exploited, even in environments with little texture, few keypoints, or severe camera defocus and blur. Only recently have some direct visual odometry methods been proposed.


In [12] and [2], fully dense depth maps are built, relying on the computational power of state-of-the-art GPUs, which makes these methods hard to deploy in practice. The researchers in [13] therefore proposed a semi-dense depth mapping method that restricts the dense regions in order to reduce the computational complexity. The method in [3] repeatedly combines direct information with keypoints to improve the intra-frame tracking results and the overall frame rate. However, all these methods focus purely on visual algorithms and overlook the easily obtained proprioceptive information available on MAVs (e.g. from the IMU, barometer or compass) that could be used to improve the estimation results.

Since image intensity is quite sensitive to illumination change, using the intensity directly can produce large pixel differences even under small movements. Thus, the information reflected by the intensity is more valuable than the intensity itself. In information theory, such shared information between two signals is defined as mutual information, proposed by Shannon [14]. Considering two images as two different signal sources, the higher the mutual information, the better the match between the images. This measure has been widely used in image registration [15] and visual servoing [16], but it has not yet been adopted in inertial-aided visual odometry.

B. Outlines

In this paper, we propose an effective approach to pose estimation driven only by IMU data and the direct information of sequential images. With the aim of applying the algorithm in realistic scenarios with limited on-board computational ability, an entropy based tracking method is adopted, while in another thread a probabilistic depth map is evaluated within regions of large gradient. In our framework, the IMU is regarded as the proprioceptive sensor, which provides the state prediction in the EKF; the monocular camera works as the exteroceptive sensor, and the pose estimated from vision is fused directly with the prediction in the update step. As far as we know, no method in the literature currently adopts mutual information for filtering based visual inertial odometry.

This paper is structured as follows. Tracking based on mutual information is introduced first in Section II, followed by the depth mapping part. In Section III, the IMU dynamic equations and a brief measurement model are stated in an EKF framework. Preliminary results based on real data are analysed in Section IV.

II. TRACKING AND MAPPING

A. Tracking on direct image information

1) Expression of mutual information: In information theory, the entropy H(X) defines the theoretical number of bits needed to encode a random variable X. This random variable could stand for an event, a sample or a character drawn from a distribution or a data stream. Here, for visual tracking, we consider the images or regions of interest as the random variables. If a discrete random variable X has possible values {x_1, ..., x_n}, its entropy H(X) is given by:

H(X) = Σ_{i=1}^{n} P(x_i) I(x_i) = − Σ_{i=1}^{n} P(x_i) log2 P(x_i)    (1)

where I(x) = −log2 P(x) represents the self-information. Intuitively, the more values of x that are equally probable, the larger the entropy H(X) becomes, until it reaches the maximum: H(p_1, ..., p_n) ≤ H(1/n, ..., 1/n) = log2(n). Based on the definition of entropy, a system containing two or more random variables can be treated similarly. The joint entropy H(X, Y) describes the uncertainty associated with a set of variables, defining the theoretical number of bits needed to encode a joint system of two random variables X and Y with possible values {x_1, ..., x_n} and {y_1, ..., y_m}, respectively:

H(X, Y) = − Σ_{i=1}^{n} Σ_{j=1}^{m} P(x_i, y_j) log2 P(x_i, y_j)    (2)
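As a concrete illustration of equations (1) and (2) (a minimal Python sketch, not code from the paper), the snippet below evaluates the entropy and joint entropy of two small discrete signals from normalised histograms; the signal values are made up for the example.

import numpy as np

def entropy(p):
    """Equation (1): H = -sum p_i log2 p_i, ignoring zero-probability bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# two toy discrete signals taking values in {0, 1, 2, 3}
x = np.array([0, 0, 1, 2, 2, 2, 3, 3])
y = np.array([0, 1, 1, 2, 2, 3, 3, 3])

# marginal histograms normalised to probabilities
px = np.bincount(x, minlength=4) / x.size
py = np.bincount(y, minlength=4) / y.size

# joint histogram P(x_i, y_j), used in equation (2)
pxy = np.histogram2d(x, y, bins=[4, 4], range=[[0, 4], [0, 4]])[0] / x.size

H_x = entropy(px)
H_y = entropy(py)
H_xy = entropy(pxy.ravel())
print(H_x, H_y, H_xy)   # note H(X,Y) <= H(X) + H(Y)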

The joint entropy is bounded by max(H(X), H(Y)) ≤ H(X, Y) ≤ H(X) + H(Y). The second bound holds with equality if and only if X and Y are statistically independent, while the first bound becomes an equality when Y ⊆ X or X ⊆ Y, which means that X or Y can fully represent the other. Under the definition of joint entropy, the alignment problem can be viewed as finding the minimum of the joint entropy, so that the two sets of variables represent each other to a large extent. In practice, adding a variable Y to a system only increases the entropy H(X, Y), whereas adding a variable set that is exactly equal to the original set X, i.e. X = Y, or only a constant set, adds no variability to the system. In the constant case the set Y obviously cannot express X, and the alignment fails. The concept of mutual information was proposed to solve this issue [14]. The mutual information (MI), or transinformation, of two random variables is a measure of the variables' mutual dependence. For two random variables X and Y, their MI is given by:

I(X, Y) = H(X) + H(Y) − H(X, Y)    (3)

Substituting equations (1) and (2), this yields:

I(X, Y) = Σ_{x∈X} Σ_{y∈Y} P(x, y) log2( P(x, y) / (P(x) P(y)) )    (4)

where P(x, y) is the joint probability distribution function of X and Y, and P(x) and P(y) are the marginal probability distribution functions of X and Y, respectively. As can be seen from equation (3), if the random variables X and Y are independent, then I(X, Y) = 0; if they depend entirely on each other, say X = Y, then I(X, Y) = H(X) = H(Y). It can easily be checked that even for a set Y consisting mostly of constant outliers, I(X, Y) = 0 because H(Y) = 0, and the alignment will fail. Thus, a larger value of the mutual information indicates a better match between the two signals. Compared to the sum of squared differences (SSD), this alignment measure does not require a linear relation between the two signals [17].
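Similarly, the mutual information of equation (4) between two image patches can be evaluated from a joint intensity histogram. The sketch below is only illustrative; the number of bins and the 8-bit intensity range are assumptions, not values prescribed by the paper.

import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Equation (4): I(X,Y) = sum P(x,y) log2( P(x,y) / (P(x)P(y)) )."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(),
                                bins=bins, range=[[0, 256], [0, 256]])
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal P(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal P(y)
    nz = pxy > 0                          # avoid log2(0)
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))

# identical patches -> I equals the entropy; independent noise -> I close to 0
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64))
print(mutual_information(patch, patch))                           # large
print(mutual_information(patch, rng.integers(0, 256, (64, 64))))  # near zero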

2) Pose tracking based on mutual information of images: The goal of visual tracking is to obtain the current pose through image alignment. Here the random variables mentioned in the previous subsection become the pixel intensities of the images. The relationship between pixel pairs under a camera movement is illustrated in Fig. 1: a rigid body transformation ξ denotes the rotation and translation, and a point ^C p_u in 3D space maps to the image coordinate u via the camera projection.

Fig. 1. Camera movement and spatial projection. A 3D point ^C p_u is projected into two image frames linked by the motion ξ.

With this transformation ξ, the mutual information of the two images is given by:

I(X, Y) = Σ_i Σ_j P(i, j, ξ) log2( P(i, j, ξ) / (P(i, ξ) P(j)) )    (5)

where i and j are the pixel intensities in the images I_C and I_{C′}, respectively, P(i, ξ) and P(j) are the probabilities of the intensities i and j in the images, and P(i, j, ξ) is the joint probability of the two intensities. They can be computed from a normalised histogram:

P(i, ξ) = (1/N) Σ_u Φ( i − (λ′/λ) I_C(u, ξ) )
P(j) = (1/N) Σ_u Φ( j − (λ′/λ) I_{C′}(u) )    (6)
P(i, j, ξ) = (1/N) Σ_u Φ( i − (λ′/λ) I_C(u, ξ) ) Φ( j − (λ′/λ) I_{C′}(u) )

where N is the number of pixels in the selected region, λ is the number of gray levels of the original image and λ′ is the desired value space. Φ(·) is a B-spline function:

Φ(t) = { t + 1  if t ∈ [−1, 0];   −t + 1  if t ∈ [0, 1];   0  otherwise }    (7)
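A minimal sketch of the B-spline smoothed joint histogram of equations (6) and (7) is given below. The function names, the number of bins and the per-pixel loop are illustrative choices, not the paper's implementation; an efficient version would vectorise the accumulation.

import numpy as np

def bspline1(t):
    """Equation (7): first-order (triangular) B-spline kernel."""
    return np.clip(1.0 - np.abs(t), 0.0, None)

def joint_histogram(img_ref, img_cur, n_bins=16, n_levels=256):
    """Equation (6): B-spline smoothed joint probability P(i, j).
    img_ref, img_cur: same-size arrays of gray levels in [0, n_levels).
    The rescaling factor plays the role of lambda'/lambda."""
    scale = (n_bins - 1.0) / (n_levels - 1.0)
    a = img_cur.ravel() * scale      # rescaled current intensities
    b = img_ref.ravel() * scale      # rescaled reference intensities
    p = np.zeros((n_bins, n_bins))
    bins = np.arange(n_bins)
    for u in range(a.size):          # accumulate Phi(i - a) * Phi(j - b)
        wa = bspline1(bins - a[u])   # weights over bins i, non-zero near a[u]
        wb = bspline1(bins - b[u])   # weights over bins j
        p += np.outer(wa, wb)
    return p / a.size                # normalise by the number of pixels N

rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (32, 32)).astype(float)
pij = joint_histogram(ref, ref)      # strongly diagonal for identical images
print(pij.sum())                     # approximately 1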

Having the mutual information between the images, we can acquire the pose through the optimization problem

ξ* = arg min_ξ ( −I( I_C(ξ), I_{C′} ) )    (8)

which can be solved by the Levenberg-Marquardt method [18],

ξ = −α ( H_ξ + β [H_ξ]_d )^{−1} G_ξ^T    (9)

where G_ξ and H_ξ are the gradient and the Hessian matrix of equation (8), respectively, and α and β are parameters. The gradient can be computed as

G_ξ = − ∂I( I_C(ξ), I_{C′} ) / ∂ξ = − Σ_{i,j} ( ∂P(i, j)/∂ξ ) ( 1 + log2( P(i, j) / P(i) ) )    (10)

where the derivative of the joint probability can be calculated further by the chain rule from equation (6),

∂P(i, j)/∂ξ = − (1/N) Σ_u (λ′/λ) (∂Φ/∂I_C) (∂I_C/∂u) (∂u/∂ξ) Φ( j − (λ′/λ) I_{C′}(u) )    (11)

where ∂I_C/∂u is the image gradient and ∂u/∂ξ is the interaction matrix at the point u, given by the perspective projection:

∂u/∂ξ = [ −1/d_u    0        u/d_u    uv        −(1 + u²)   v
           0        −1/d_u   v/d_u    1 + v²    −uv         −u ]    (12)

The depth value d_u above is estimated through a mapping process similar to the one in [19]. Having the gradient, the Hessian can be approximated by omitting a higher-order term,

H_ξ = ∂G_ξ/∂ξ ≈ − Σ_{i,j} ( ∂P(i, j)/∂ξ )^T ( ∂P(i, j)/∂ξ ) ( 1/P(i, j) − 1/P(i) )    (13)
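The sketch below illustrates two pieces of this tracking step: the interaction matrix of equation (12) for a normalised image point with depth d_u, and one damped update in the spirit of equation (9). G_ξ and H_ξ are assumed to have been computed already via equations (10) and (13); the placeholder values, the damping constants, and the reading of [H_ξ]_d as the diagonal part of H_ξ are assumptions for illustration only.

import numpy as np

def interaction_matrix(u, v, d_u):
    """Equation (12): 2x6 interaction matrix of a normalised image point
    (u, v) with depth d_u, mapping the pose increment xi to image motion."""
    return np.array([
        [-1.0 / d_u, 0.0, u / d_u, u * v, -(1.0 + u * u), v],
        [0.0, -1.0 / d_u, v / d_u, 1.0 + v * v, -u * v, -u],
    ])

def damped_step(G_xi, H_xi, alpha=1.0, beta=1e-2):
    """Equation (9): xi = -alpha * (H + beta * diag(H))^-1 * G^T,
    a Levenberg-Marquardt style step on the negative mutual information."""
    H_damped = H_xi + beta * np.diag(np.diag(H_xi))
    return -alpha * np.linalg.solve(H_damped, G_xi.T).ravel()

L = interaction_matrix(0.1, -0.2, 2.5)                # example point, depth 2.5 m
G = np.array([[0.4, -0.1, 0.05, 0.02, -0.3, 0.1]])    # placeholder 1x6 gradient
H = np.eye(6) * 2.0                                   # placeholder 6x6 Hessian
print(damped_step(G, H))                              # 6-vector pose increment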

III. VISUAL INERTIAL FUSION

A. IMU data driven dynamic model

The IMU state vector of a 3D rigid body at any time instant can be defined as a 16 × 1 vector,

x_I = [ ^I_W q̄^T   ^W p_I^T   ^W v_I^T   b_g^T   b_a^T ]^T    (14)

where ^I_W q̄ is the unit quaternion describing the rotation from the world frame {W} to the IMU frame {I}, ^W p_I and ^W v_I are the position and velocity with respect to {W}, and b_g and b_a are 3 × 1 vectors describing the biases affecting the gyroscope and accelerometer measurements, respectively. Assuming that the inertial measurements are corrupted by zero-mean Gaussian noise, denoted n_g and n_a, the true angular velocity ω and the true acceleration a are related to the gyroscope and accelerometer measurements in the following form:

ω_m = ω + b_g + n_g,    a_m = a + b_a + n_a    (15)
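As a small illustration of the measurement model in equation (15), the sketch below subtracts the estimated biases from raw IMU readings to obtain the rate and acceleration estimates that the prediction step later integrates; the function name and all numerical values are made up for the example.

import numpy as np

def correct_imu(omega_m, a_m, b_g_hat, b_a_hat):
    """Invert equation (15) with the current bias estimates (the zero-mean
    noise terms cannot be removed): returns the bias-corrected angular
    velocity and acceleration used for state propagation."""
    return omega_m - b_g_hat, a_m - b_a_hat

omega_m = np.array([0.01, -0.02, 0.30])    # rad/s, example gyro reading
a_m = np.array([0.1, 0.0, 9.9])            # m/s^2, example accelerometer reading
b_g_hat = np.array([0.002, -0.001, 0.004])
b_a_hat = np.array([0.05, -0.02, 0.10])
print(correct_imu(omega_m, a_m, b_g_hat, b_a_hat))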

The data driven dynamic model is a combination of the 3D rigid body dynamics and the above IMU measurements, and can be represented in the following form:

^I_W q̄˙(t) = (1/2) Ω(ω) ^I_W q̄
^W v̇_I = C^T_{^I_W q̄} a − g
^W ṗ_I = ^W v_I    (16)
ḃ_g = n_wg
ḃ_a = n_wa

where C_{^I_W q̄} denotes the rotation matrix described by ^I_W q̄ and g is the gravity vector in the world frame {W}. ω = [ω_x ω_y ω_z]^T is the angular velocity expressed in the IMU frame {I}, while

Ω(ω) = [ −⌊ω×⌋   ω
          −ω^T    0 ]

is the quaternion kinematic matrix, with ⌊ω×⌋ representing the skew-symmetric matrix of ω. The IMU biases are modelled as random walk processes driven by the Gaussian noises n_wg and n_wa. Apart from the current IMU state x_I mentioned above, a widely used state vector for filtering based approaches also includes the current camera pose ^W χ_C = [ ^W_C q̄^T   ^W p_C^T ]^T, which can be transformed from the general form ξ by the methods in [20]. Thus the system state can be written as

x = [ ^I_W q̄^T   ^W p_I^T   ^W v_I^T   b_g^T   b_a^T   ^W_C q̄^T   ^W p_C^T ]^T    (17)

together with two additional differential equations,

^W_C q̄˙ = 0,    ^W ṗ_C = 0    (18)

Applying the expectation operator to the above equations, we obtain the prediction equations of the IMU data driven dynamic model:

^I_W q̄̂˙ = (1/2) Ω(ω_m − b̂_g) ^I_W q̄̂
^W v̂˙_I = C^T_{^I_W q̄̂} (a_m − b̂_a) − g
^W p̂˙_I = ^W v̂_I    (19)
b̂˙_g = 0,    b̂˙_a = 0,    ^W_C q̄̂˙ = 0,    ^W p̂˙_C = 0
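A minimal sketch of propagating the nominal state with the prediction equations (19), using the Ω(ω) matrix defined above. The scalar-last quaternion storage, the helper names, the direction of the rotation matrix, and the single Euler step (instead of the Runge-Kutta integration mentioned later in the text) are simplifying assumptions and must match the conventions of the actual filter.

import numpy as np

def omega_matrix(w):
    """Quaternion kinematic matrix Omega(omega) used in equations (16), (19);
    quaternion stored as [x, y, z, w] with the scalar part last."""
    wx, wy, wz = w
    return np.array([[0.0,  wz, -wy, wx],
                     [-wz, 0.0,  wx, wy],
                     [ wy, -wx, 0.0, wz],
                     [-wx, -wy, -wz, 0.0]])

def rotation_from_quaternion(q):
    """Rotation matrix of a unit quaternion [x, y, z, w] (assumed IMU-to-world)."""
    x, y, z, w = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)]])

def propagate(q, p, v, b_g, b_a, omega_m, a_m, g, dt):
    """One Euler step of the prediction equations (19)."""
    w_hat = omega_m - b_g                      # bias-corrected angular rate
    a_hat = a_m - b_a                          # bias-corrected acceleration
    q = q + 0.5 * omega_matrix(w_hat) @ q * dt
    q = q / np.linalg.norm(q)                  # keep the quaternion unit length
    R_wi = rotation_from_quaternion(q)         # plays the role of C^T in (19)
    v = v + (R_wi @ a_hat - g) * dt
    p = p + v * dt
    return q, p, v, b_g, b_a                   # biases are constant in (19)

q = np.array([0.0, 0.0, 0.0, 1.0])
p = np.zeros(3); v = np.zeros(3); b_g = np.zeros(3); b_a = np.zeros(3)
g = np.array([0.0, 0.0, 9.81])
q, p, v, b_g, b_a = propagate(q, p, v, b_g, b_a,
                              omega_m=np.array([0.0, 0.0, 0.1]),
                              a_m=np.array([0.0, 0.0, 9.81]), g=g, dt=0.01)
print(q, v)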

For the position, velocity, and bias state variables the arithmetic difference can be used (i.e., the error in the estimate x̂ of a quantity x is defined as x̃ = x − x̂), but the error quaternion should be defined under the assumption of a minimal local parametrisation [8]. If q̄̂ is the estimated value of the quaternion q̄, then the orientation error is described by the error quaternion δq̄, defined by the relation q̄ = δq̄ ⊗ q̄̂, i.e. δq̄ = q̄ ⊗ q̄̂^{−1}. In this expression, the symbol ⊗ denotes quaternion multiplication. The error quaternion δq̄ can be written as

δq̄ = [ (1/2) δθ ;  √(1 − (1/4) δθ^T δθ) ] ≈ [ (1/2) δθ ;  1 ]    (20)

Intuitively, the quaternion δq̄ describes the small rotation that causes the true and the estimated attitude to coincide. Thus, the 21-element error state vector is

x̃ = [ δθ_WI^T   ^W p̃_I^T   ^W ṽ_I^T   b̃_g^T   b̃_a^T   δθ_WC^T   ^W p̃_C^T ]^T    (21)

By stacking the differential equations of the error state, the linearised continuous-time error state equation can be formed as

x̃˙ = F_c x̃ + G_c n    (22)

with the noise vector n = [ n_a^T   n_wa^T   n_g^T   n_wg^T ]^T. The covariance matrix of n depends on the noise characteristics of the IMU, Q_c = diag( σ²_{n_a}, σ²_{n_wa}, σ²_{n_g}, σ²_{n_wg} ).
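To illustrate the error-state conventions around equation (20), the sketch below builds a small error quaternion from an attitude error δθ and recovers δθ again as twice the vector part. The scalar-last storage and the quaternion multiplication convention are assumptions that must match the rest of the filter.

import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions stored as [x, y, z, w] (scalar last)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return np.array([aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw,
                     aw*bw - ax*bx - ay*by - az*bz])

def quat_inv(q):
    """Inverse of a unit quaternion: its conjugate."""
    return np.array([-q[0], -q[1], -q[2], q[3]])

def error_quaternion(q_true, q_est):
    """delta_q = q (x) q_hat^-1 as in the text above equation (20)."""
    return quat_mul(q_true, quat_inv(q_est))

def delta_theta_from_error(dq):
    """Small-angle attitude error: equation (20) gives dq ~ [0.5*dtheta; 1],
    so dtheta is approximately twice the vector part."""
    return 2.0 * dq[:3]

q_est = np.array([0.0, 0.0, 0.0, 1.0])
dtheta = np.array([0.01, -0.02, 0.005])                 # small rotation error
q_true = quat_mul(np.r_[0.5 * dtheta, 1.0], q_est)      # built as in eq. (20)
q_true /= np.linalg.norm(q_true)
print(delta_theta_from_error(error_quaternion(q_true, q_est)))  # ~ dtheta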

In practice, the inertial measurements for state propagation are obtained from the IMU in discrete form. We therefore assume that the gyroscope and accelerometer signals are sampled with a time interval ∆t, and the state is propagated by numerical integration, e.g. with Runge-Kutta methods. Moreover, the covariance matrix of the EKF is defined as

P_{k|k} = [ P_{II_{k|k}}     P_{IC_{k|k}}
            P_{IC_{k|k}}^T   P_{CC_{k|k}} ]    (23)

where P_{II_{k|k}} is the covariance matrix of the IMU state, P_{CC_{k|k}} is the 6N × 6N covariance matrix of the camera pose estimates, and P_{IC_{k|k}} is the correlation between the errors in the IMU state and the camera pose estimates. With this notation, the covariance matrix can be propagated by

P_{k+1|k} = F_d P_{k|k} F_d^T + Q_d    (24)

where the state-transition matrix can be calculated by assuming F_c and G_c to be constant over the integration interval between two consecutive steps,

F_d = exp(F_c ∆t) = I + F_c ∆t + (1/2) F_c² ∆t² + ⋯    (25)

and the discrete-time covariance matrix Q_d can also be derived through numerical integration,

Q_d = ∫_0^{∆t} F_d(τ) G_c Q_c G_c^T F_d^T(τ) dτ    (26)

The mean and covariance propagation process can be summarized as follows:
1) When the IMU data ω_m and a_m, arriving at a certain sample frequency, are available to the filter, the state vector is propagated by numerical integration of Eqs. (19).
2) Calculate F_d and Q_d according to equations (25) and (26), respectively.
3) Compute the propagated state covariance matrix from (24).
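A sketch of the discretisation and covariance propagation of equations (24)-(26) is given below. The truncation order of the matrix exponential and the trapezoidal integration of Q_d are illustrative choices, and the toy matrices only exercise the shapes.

import numpy as np

def discretize(F_c, G_c, Q_c, dt, n_steps=10):
    """Equations (25)-(26): truncated matrix exponential for F_d and a simple
    trapezoidal numerical integration for Q_d."""
    n = F_c.shape[0]
    F_d = np.eye(n) + F_c * dt + 0.5 * (F_c @ F_c) * dt**2   # eq. (25), 2nd order
    def integrand(tau):
        # integrand of eq. (26): F_d(tau) G_c Q_c G_c^T F_d(tau)^T
        F_tau = np.eye(n) + F_c * tau + 0.5 * (F_c @ F_c) * tau**2
        return F_tau @ G_c @ Q_c @ G_c.T @ F_tau.T
    taus = np.linspace(0.0, dt, n_steps + 1)
    vals = np.array([integrand(t) for t in taus])
    Q_d = np.trapz(vals, taus, axis=0)                        # eq. (26)
    return F_d, Q_d

def propagate_covariance(P, F_d, Q_d):
    """Equation (24): P_{k+1|k} = F_d P F_d^T + Q_d."""
    return F_d @ P @ F_d.T + Q_d

# tiny 2-state toy example just to exercise the shapes
F_c = np.array([[0.0, 1.0], [0.0, 0.0]])
G_c = np.eye(2)
Q_c = np.diag([1e-4, 1e-3])
F_d, Q_d = discretize(F_c, G_c, Q_c, dt=0.01)
print(propagate_covariance(np.eye(2) * 1e-2, F_d, Q_d))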

B. Visual measurement model and updating

Due to the biases and noise in the IMU data, the prediction from the propagation step degrades as time increases. However, the measurements from the visual sensor provide the key information needed to bound the growing errors. To do so, in an EKF framework, the key information extracted from the images has to be cast into measurement equations. There are various ways to build a measurement model. For loosely coupled methods, image alignment is used to obtain the change in position and orientation, which is then fused with the prediction via the EKF framework [6]. In general, a non-linear algebraic equation can be viewed as the measurement equation:

z_k = h(x_k) + n_{m,k}    (27)

where n_m models the Gaussian noise in the visual measurement, with zero mean and covariance R.

After linearisation, the measurement error is expressed in a linear form:

z̃ = ẑ − z = H x̃ + n_m    (28)

where H is the Jacobian matrix and the noise term is Gaussian distributed and uncorrelated with the state error; ẑ and z represent the prediction and the real measurement, respectively. The spatial relationship between a point u in the camera frame and a point ^W p_u in the world frame {W} is established by the following equation:

^C p_u = C_{^C_W q̄} ( ^W p_u − ^W p_C )    (29)

The point ^C p_u in the camera frame is linked to the state variables ^W p_C and ^W_C q̄ via the camera projection model. The measurement model can thus be expressed briefly as

z = u = (1/z) [ x   y ]^T + n_m    (30)

where [x y z]^T are the coordinates of ^C p_u. The estimate of this point in the world frame, ^W p̂_u, can be calculated beforehand by iterative minimisation methods [21]. When the update step is ready, the Kalman gain is calculated as

K = P_{k|k} H^T ( H P_{k|k} H^T + R )^{−1}    (31)

and the final correction as x̃_{k+1} = K · z̃. Lastly, the error state covariance is updated as

P_{k+1|k+1} = (I − K H) P_{k+1|k}    (32)
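Relating to equations (29) and (30) above, the following sketch predicts a noise-free visual measurement by transforming a world point into the camera frame and projecting it to normalised coordinates; the rotation-matrix convention and the example values are assumptions made for illustration.

import numpy as np

def predict_measurement(R_cw, p_wc, p_w_point):
    """Equations (29)-(30): transform a world point into the camera frame and
    project it to normalised image coordinates. R_cw is the rotation matrix of
    the world-to-camera quaternion, p_wc the camera position in the world."""
    p_c = R_cw @ (p_w_point - p_wc)          # eq. (29)
    x, y, z = p_c
    return np.array([x / z, y / z])          # eq. (30), noise-free prediction

R_cw = np.eye(3)                             # camera aligned with the world
p_wc = np.array([0.0, 0.0, 0.0])
p_w_point = np.array([0.5, -0.2, 2.0])       # example landmark, 2 m ahead
print(predict_measurement(R_cw, p_wc, p_w_point))   # -> [0.25, -0.1]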

The full update process is summarized as follows:
1) When visual data (a raw image) is available, the key information is extracted by image processing.
2) Calculate the residual as in (28).
3) Compute the Kalman gain as in (31).
4) Update the state by adding the correction.
5) Update the error state covariance as in (32).
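The update steps above correspond to a standard EKF correction. The sketch below is a generic illustration of equations (28), (31) and (32) with toy dimensions; it is not the specific measurement Jacobian H of the paper, which depends on the chosen visual measurement.

import numpy as np

def ekf_update(P, x_err, z, z_hat, H, R):
    """One EKF update following equations (28), (31) and (32):
    residual, Kalman gain, error-state correction and covariance update.
    Shapes: P (n,n), H (m,n), R (m,m), z and z_hat (m,)."""
    residual = z_hat - z                                   # eq. (28) convention
    S = H @ P @ H.T + R                                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                         # eq. (31)
    correction = K @ residual                              # x_tilde = K z_tilde
    P_new = (np.eye(P.shape[0]) - K @ H) @ P               # eq. (32)
    return x_err + correction, P_new

# toy dimensions only, to exercise the equations
n, m = 6, 2
P = np.eye(n) * 0.1
H = np.zeros((m, n)); H[0, 0] = H[1, 1] = 1.0
R = np.eye(m) * 0.01
x_err, P = ekf_update(P, np.zeros(n), z=np.array([0.02, -0.01]),
                      z_hat=np.array([0.0, 0.0]), H=H, R=R)
print(x_err)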

IV. RESULTS

The algorithm described in the preceding sections has been tested on real data recorded in outdoor environments. The experimental setup consisted of a monocular camera mounted beneath the MAV, with its optical axis at 45° to the gravity direction, while the on-board IMU provided inertial measurements at a rate of 100 Hz. The MAV flew over Martins Farm, St. Osyth, Colchester, Essex, UK. During the experiment all data were stored on an on-board memory card and processed off-line on a desktop computer with an Intel i5-3570 CPU at 3.40 GHz. The original video was recorded by a GoPro Hero 3+ camera at a resolution of 1280 × 720 pixels and 48 fps in wide view, and was then defished with the GoPro Studio software. The camera intrinsic parameters were acquired by a standard calibration process on a video of a black-and-white grid pattern, recorded and defished in the same way. The appearance of our MAV is shown in Fig. 3, and some sample images along a movement are shown in Fig. 2.

Fig. 2. Sample images from a forward movement. The histograms corresponding to the images on the left indicate the trend of the pixel intensities during the movement, which is used in the entropy based pose tracking.

In Fig. 4, the trajectory is plotted on a Google map of the area where the experiment took place. The estimated trajectory starts at the white point. We used a radio control panel to fly the MAV, and the initial and final landing spots were measured with a metre rule. Although the true final position with respect to the initial pose was approximately [−3, −2, −0.5]^T m, the estimated final position was [−8.1352, −9.8206, −3.7473]^T m for a modified dense tracking method based on [19] and [−13.2563, −7.3573, −2.0541]^T m for our entropy based tracking method. The final position error is therefore approximately 9.3559 m for the modified dense method and 11.5712 m for ours, over a flying distance of about 2.5 km. This is remarkable given that the algorithm neither detects loop closures nor relies on prior information about the area.

The results indicate that the proposed method is capable of estimating the trajectory of a moving platform carrying only a monocular camera and an IMU. With further development, it should be able to help highly agile platforms such as MAVs produce accurate pose estimates in real time. It is worth pointing out that the outdoor environment inevitably contains varying and complex illumination conditions as well as magnetic interference; for example, a facula may appear when facing the sun, which has a severe impact on the image quality.


Fig. 3. The appearance of our MAV platform.

Fig. 4. Estimated trajectory on a real map. The red dense dashed line shows the estimate from pure inertial propagation, the blue solid line shows the result of a modified dense method based on [19], and the yellow loose dashed line shows the entropy based tracking estimate.

V. CONCLUSION

In this paper, we have proposed a method that combines entropy based pose tracking from a monocular camera with inertial prediction in an Extended Kalman Filter (EKF) framework to perform an effective visual inertial odometry solution. In contrast to other odometry methods, our solution dispenses with traditional feature extraction and description and instead uses the mutual information between images. The method is naturally robust because it exploits the significant information of the whole image rather than sparse features. Moreover, in entropy based methods the exact relationship between the two sets of image pixels and the related adjustable parameters are not compulsory, so the optimisation can easily be adapted to the computational ability of the hardware. The experiment, with a promising result, indicates the potential of the method as a navigation aid for highly agile platforms. However, the results can be further improved by analysing the properties of the system and tuning the parameters properly.

ACKNOWLEDGMENT

The first author has been financially supported by a scholarship from the China Scholarship Council. All the authors would like to thank Mr. Robin Dowling and Mr. Ian Dukes at the University of Essex for their technical support.

REFERENCES

[1] G. Klein and D. Murray, "Parallel tracking and mapping for small AR workspaces," in Mixed and Augmented Reality (ISMAR 2007), 6th IEEE and ACM International Symposium on. IEEE, 2007, pp. 225-234.
[2] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, "DTAM: Dense tracking and mapping in real-time," in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 2320-2327.
[3] C. Forster, M. Pizzoli, and D. Scaramuzza, "SVO: Fast semi-direct monocular visual odometry," in Proc. IEEE Int. Conf. on Robotics and Automation, 2014.
[4] S. Shen, Y. Mulgaonkar, N. Michael, and V. Kumar, "Multi-sensor fusion for robust autonomous flight in indoor and outdoor environments with a rotorcraft MAV," in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 4974-4981.
[5] R. Brockers, S. Susca, D. Zhu, and L. Matthies, "Fully self-contained vision-aided navigation and landing of a micro air vehicle independent from external sensor inputs," in SPIE Defense, Security, and Sensing. International Society for Optics and Photonics, 2012.
[6] S. Weiss, M. W. Achtelik, S. Lynen, M. C. Achtelik, L. Kneip, M. Chli, and R. Siegwart, "Monocular vision for long-term micro aerial vehicle state estimation: A compendium," Journal of Field Robotics, vol. 30, no. 5, pp. 803-831, 2013.
[7] J. Kelly and G. S. Sukhatme, "Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration," The International Journal of Robotics Research, vol. 30, no. 1, pp. 56-79, 2011.
[8] M. Li and A. I. Mourikis, "Improving the accuracy of EKF-based visual-inertial odometry," in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 828-835.
[9] S. Leutenegger, P. T. Furgale, V. Rabaud, M. Chli, K. Konolige, and R. Siegwart, "Keyframe-based visual-inertial SLAM using nonlinear optimization," in Robotics: Science and Systems, 2013.
[10] D. Scaramuzza and F. Fraundorfer, "Visual odometry [tutorial]," IEEE Robotics & Automation Magazine, vol. 18, no. 4, pp. 80-92, 2011.
[11] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052-1067, 2007.
[12] M. Pizzoli, C. Forster, and D. Scaramuzza, "REMODE: Probabilistic, monocular dense reconstruction in real time," in Proc. IEEE Int. Conf. on Robotics and Automation, 2014.
[13] J. Engel, J. Sturm, and D. Cremers, "Semi-dense visual odometry for a monocular camera," in Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, pp. 1449-1456.
[14] C. E. Shannon, "A mathematical theory of communication," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 5, no. 1, pp. 3-55, 2001.
[15] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, "Multimodality image registration by maximization of mutual information," IEEE Transactions on Medical Imaging, vol. 16, no. 2, pp. 187-198, 1997.
[16] A. Dame and E. Marchand, "Entropy-based visual servoing," in Robotics and Automation (ICRA 2009), IEEE International Conference on. IEEE, 2009, pp. 707-713.
[17] P. Viola and W. M. Wells III, "Alignment by maximization of mutual information," International Journal of Computer Vision, vol. 24, no. 2, pp. 137-154, 1997.
[18] J. J. Moré, "The Levenberg-Marquardt algorithm: implementation and theory," in Numerical Analysis. Springer, 1978, pp. 105-116.
[19] J. Engel, T. Schöps, and D. Cremers, "LSD-SLAM: Large-scale direct monocular SLAM," in Computer Vision - ECCV 2014. Springer, 2014, pp. 834-849.
[20] R. M. Murray, Z. Li, and S. S. Sastry, A Mathematical Introduction to Robotic Manipulation. CRC Press, 1994.
[21] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint Kalman filter for vision-aided inertial navigation," in Robotics and Automation, 2007 IEEE International Conference on. IEEE, 2007, pp. 3565-3572.

