
A New Combined Vision Technique for Micro Aerial Vehicle Pose Estimation

Haiwen Yuan 1,2,*, Changshi Xiao 1,3,4,*, Supu Xiu 1, Yuanqiao Wen 1,3,4, Chunhui Zhou 1,3 and Qiliang Li 2,*

1 School of Navigation, Wuhan University of Technology, Wuhan 430063, China; [email protected] (S.X.); [email protected] (Y.W.); [email protected] (C.Z.)
2 Electrical & Computer Engineering, George Mason University, Fairfax, VA 22030, USA
3 Hubei Key Laboratory of Inland Shipping Technology, Wuhan 430063, China
4 National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan 430063, China
* Correspondence: [email protected] (H.Y.); [email protected] (C.X.); [email protected] (Q.L.); Tel.: +571-992-7338 or 86-155-2727-2422 (H.Y.); +86-136-6727-3296 (C.X.); +703-993-1596 (Q.L.)

Academic Editors: Kah Bin Lim and Chui Chee Kong
Received: 5 December 2016; Accepted: 23 March 2017; Published: 28 March 2017

Abstract: In this work, a new combined vision technique (CVT) is proposed, comprehensively developed, and experimentally tested for stable, precise unmanned micro aerial vehicle (MAV) pose estimation. The CVT combines two measurement methods (multi- and mono-view) based on different constraint conditions. These constraints are considered simultaneously by a particle filter framework to improve the accuracy of visual positioning. The framework, which is driven by an onboard inertial module, takes the positioning results from the visual system as measurements and updates the vehicle state. Moreover, experimental testing and data analysis have been carried out to verify the proposed algorithm, including the multi-camera configuration, the design and assembly of the MAV system, and the marker detection and matching between different views. Our results indicate that the combined vision technique is very attractive for high-performance MAV pose estimation.

Keywords: micro aerial vehicle (MAV); triangulation; perspective-three-point (P3P); pose estimation; particle filter

1. Introduction

In recent years, micro aerial vehicles (MAVs) have shown an amazing capability to perform certain difficult tasks and maneuvers, such as exact landing [1], team cooperation [2], building blocks [3], etc. In performing these tasks, the state of the aerial vehicle must be estimated precisely and in real time, because any small error or delay may lead to a complete failure. Therefore, the perception of the environment and of the MAV pose is important for precise and real-time control. However, in a conventional GPS (Global Positioning System)/inertial system, GPS is not reliable enough for such tasks due to signal blockage or multi-path scattering. The inertial measurement unit (IMU) is able to offer good pose measurements but suffers from accumulative error over time and therefore needs frequent correction [4]. Although Light Detection and Ranging (LIDAR) can be used to explore the surrounding environment and calculate the relative pose of a MAV, its size and weight exceed the load capacity of a micro aerial robot. In comparison, the most widely adopted method follows a vision-based approach, which is able to provide enough accuracy to satisfy the needs of these MAV tasks, such as take-off, landing, and indoor navigation.

State estimation for the MAV can be defined as the process of tracking the three-dimensional (3D) pose of the vehicle. The vision-based approach has been intensively studied and has recently developed into many different methods. According to the arrangement of the cameras, vision-based state estimation can be roughly divided into two categories: onboard navigation and on-ground navigation. Onboard navigation refers to a MAV that tracks some structured or natural object to estimate its own state using onboard vision, such as relative target and landmark navigation [5–7], visual odometry (VO) [8,9], and simultaneous localization and mapping (SLAM) [10,11]. In the early stage of onboard navigation, relative target and landmark navigation were used to estimate the relative MAV pose via onboard vision [5–7], where the structure of the reference targets/landmarks is known. Methods based on structure from motion (SFM), SLAM, and VO were then proposed one by one; using these methods, the MAV pose can be estimated by tracking typical features of the natural scene. Without the transformation from the image plane to 3D space, bio-inspired optical flow techniques have also been applied to directly infer the MAV pose and motion from the image information alone [12,13]. Alternatively, a MAV can obtain its pose from an external visual measuring system, which is usually called on-ground navigation. In such a system, one or more cameras are distributed in the surroundings and capture image data about the MAV in real time. Normally used as a testbed, on-ground navigation is applied to test algorithms for robot global positioning, control, and planning. In this situation, neither carrying load nor computation capacity is a limiting problem. Even though current studies mainly focus on onboard vision, like the above-mentioned methods, on-ground navigation is worth studying as well [14–18]. However, these systems used only the triangulation method in a stereo camera configuration or the PnP (perspective-n-point) method in a single camera configuration.

For the on-ground navigation approach, we have developed a new combined vision technique to significantly enhance real-time MAV pose estimation, and built an on-ground visual system to test it. In this CVT, two positioning strategies with different constraints are integrated by a particle filter framework at the same time: multi-view triangulation and the monocular perspective-three-point (P3P) algorithm. The multi-view triangulation algorithm is based on the intersection of different views, while the monocular P3P technique is based on the shape change of a rigid body in the image. The visual system is expected to have a more stable performance when these two location methods (mono-view P3P and multi-view triangulation) are considered simultaneously by a PF framework.

The remainder of this paper is organized as follows: Section 2 reviews the related work on visual measurement in MAV navigation. A summary of the visual system, including hardware and software, is presented in Section 3. In the hardware part, the MAV and camera configurations are introduced; in the software part, the feature point detection and multi-camera calibration are described. Section 4 illustrates the proposed CVT visual algorithm, which is used to calculate MAV poses by integrating the two different methods. Finally, experimental testing results are demonstrated and conclusions are drawn in Sections 5 and 6, respectively.
2. Previous Research in On-Ground Visual Navigation for MAV Pose Estimation

MAV pose estimation in different applications and many advanced strategies to achieve this task have been intensively studied over the past two decades. On-ground visual navigation is a commonly used strategy for motion capture, which employs a set of external cameras with known camera parameters to track and locate the MAV. These cameras are usually installed around the flight space to observe the flight simultaneously. This strategy is usually applicable to a limited space but offers good accuracy. For example, the VICON system (Vicon Motion Systems Ltd., Oxford, UK), a commercial product, is able to track and locate more than one robot at the same time and exhibits good positioning accuracy and processing rate. Based on this system, MAV control and navigation have been well demonstrated in [19–21], where MAVs are controlled by the feedback of the visual system to dance, play musical instruments, build houses, etc. However, such a configuration requires multiple high-speed external IR cameras and is too expensive. An indoor test-bed named RAVEN (Real-time indoor Autonomous Vehicle test ENvironment), based on the VICON system, has been developed by the Aerospace Control Lab [22] for the autonomous navigation of multiple rotorcraft, where vision is the only means used for MAV pose estimation.
The test-bed acquired pose information by detecting and locating markers installed on the MAV. The markers are usually beacons or reflective spheres, which are easy to recognize with infrared cameras. Therefore, such a system has a high positioning resolution and has been used to perform complicated tasks, such as autonomous landing, charging, fancy flying, and multi-MAV motion planning. Martínez et al. [14] proposed two visual approaches, onboard vision and on-ground vision, to enable MAV automation. One detects and tracks a planar structure with an onboard camera; the other reconstructs the 3D position of the MAV with an on-ground camera system. The on-ground camera system consists of three visible-light cameras, by which the color markers attached to the flying vehicle can be tracked and located. However, the results are rough because the markers on the aircraft are too large and the view is limited. In the work by Faessler et al. [15], the MAV pose estimation was done with the P3P algorithm. An infrared camera is used as the major sensor in such a system to assist detection. However, it takes a certain amount of computation time to distinguish markers attached to different positions on the aerial body, as they look the same in the infrared image. Different from the above, Hyondong et al. [16] employed a visible-light camera instead of an infrared camera to detect and track the indoor MAV pose with color balls as the markers. Since the onboard markers are attached at pre-defined locations and have different color features, the position and attitude of the MAV can be recognized by color features and triangulation. In addition, an extended Kalman filter (EKF) framework integrating the MAV dynamics was introduced to improve the performance of the visual measurement system. In [17], two stationary, upward-looking cameras were placed on the ground to track four black balls attached to a helicopter; the errors between the positions of the tracked balls and pre-specified references are taken as the visual feedback control input. A pair of ground and onboard cameras was employed for the problem of quadrotor pose estimation in [18], where the two cameras were set to face each other to estimate the full 6-degrees-of-freedom (DOF) pose of the MAV.

Different from the above-mentioned works, this paper does not pay much attention to the system configuration, but focuses on how to make use of the image data from a visual system to achieve a stable pose estimation. In this work we consider the two kinds of location methods (mono-view P3P and multi-view triangulation) simultaneously in a PF framework.

3. Description of the Visual Measurement System

The on-ground visual measurement system is designed to test the proposed CVT and includes the cameras, an image grabber, a PC processor, and a self-built MAV with markers. The major focuses in this study are the calibration of the multi-camera system, the MAV system design, and MAV detection.

3.1. Multi-Camera System Calibration

Before starting the measuring work, the multi-camera system should be calibrated to acquire a mapping between features in the real 3D space and the 2D image. Considering the basic pinhole camera model, the calibration is separated into two steps. The first step is intrinsic parameter calibration, in which each camera is individually calibrated with the Bouguet camera calibration toolbox [23].
The second step is extrinsic parameter calibration, carried out with the multi-camera self-calibration toolbox [24]. In our measurement system (see Figure 1), four analog CCD (Charge Coupled Device) cameras are connected to a PC station via a general 8-channel image grabber, so the data from the cameras can be captured synchronously and processed in real time with the help of the OpenCV library. First, we use several chessboard images from different camera views to calibrate the four intrinsic parameters of every camera, namely the focal length (f_x, f_y) and the principal point (u_0, v_0). These parameters constitute the internal matrix (K-matrix) in the imaging model, which can be written as:

$$ s\,\tilde{x}_{image} = K\,[\,R \mid t\,]\,\tilde{X}_{world} = P\,\tilde{X}_{world}, \quad \text{where } K = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (1) $$

where x̃_image is the 2D image point represented by the homogeneous vector (u, v, 1)^T, X̃_world is the 3D world point represented by the homogeneous four-vector (X, Y, Z, 1)^T, and s is a scalar representing the depth information. R is the rotation matrix and t is the translation vector from the world frame to the camera frame. R, t, and the K-matrix constitute the mapping between the 3D world and the 2D image, which can be called the projective matrix (P-matrix).

Figure 1. Structure diagram of the multi-camera measurement system.
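To make the projection model of Equation (1) concrete, the following is a minimal numerical sketch of the mapping from a 3D world point to an image point. The K, R, t, and test point values are made-up placeholders for illustration, not calibration results from the described system.

```python
# Minimal illustration of Equation (1): s * x_image = K [R | t] X_world = P X_world.
# K, R, t and the test point below are made-up values, not the paper's calibration data.
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])          # intrinsic matrix (f_x, f_y, u_0, v_0)
R = np.eye(3)                             # world-to-camera rotation
t = np.array([[0.0], [0.0], [2.0]])       # world-to-camera translation

P = K @ np.hstack((R, t))                 # 3x4 projective matrix (P-matrix)
X_world = np.array([0.5, 0.2, 1.0, 1.0])  # homogeneous 3D point (X, Y, Z, 1)

x = P @ X_world                           # s * (u, v, 1)^T
u, v = x[:2] / x[2]                       # divide out the depth scalar s
print(u, v)
```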

Once the K-matrix is acquired, only seven parameters remain unknown for every camera: the scalar s, the 3-vector r, and the 3-vector t of the P-matrix. All the image points, 3D-world points, and camera projection matrices from all the cameras can be put into the following equation:

$$ W_s = \begin{bmatrix} s_1^1 \tilde{x}_1^1 & \cdots & s_n^1 \tilde{x}_n^1 \\ \vdots & \ddots & \vdots \\ s_1^m \tilde{x}_1^m & \cdots & s_n^m \tilde{x}_n^m \end{bmatrix}_{3m \times n} = \begin{bmatrix} P^1 \\ \vdots \\ P^m \end{bmatrix}_{3m \times 4} \begin{bmatrix} \tilde{X}_1 & \cdots & \tilde{X}_n \end{bmatrix}_{4 \times n} \qquad (2) $$

Based on the equation above, W_s can be factorized to recover the projective motion P and the projective shape X if enough noiseless points (u_j^i, v_j^i) are collected. During the calibration of the extrinsic parameters, a bright spot is waved through the working volume of the camera system, as shown in Figure 2a. Besides the intrinsic parameters, the coordinates of the bright spot in all images are the known information. The detailed computation and calibration processes have been explained previously in [24]. With the multi-camera self-calibration toolbox, not only the extrinsic parameters of all cameras but also the coordinates of the bright spots in 3D space can be recovered, as shown in Figure 2b.
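As a rough illustration of the factorization behind Equation (2) — not the toolbox's actual algorithm, which also recovers the unknown scales and handles noise and missing points — the core rank-4 step can be sketched with an SVD as follows.

```python
# Sketch of the rank-4 factorization behind Equation (2), assuming the scaled measurement
# matrix Ws (3m x n) is already available and noiseless. The real multi-camera
# self-calibration toolbox additionally estimates the scales s and upgrades the result
# from a projective to a Euclidean reconstruction; this is only the central SVD step.
import numpy as np

def factorize_rank4(Ws):
    """Split Ws into projective motion P_hat (3m x 4) and shape X_hat (4 x n)."""
    U, S, Vt = np.linalg.svd(Ws, full_matrices=False)
    P_hat = U[:, :4] * S[:4]    # stacked camera projections (up to a 4x4 projective ambiguity)
    X_hat = Vt[:4, :]           # homogeneous 3D points (same ambiguity)
    return P_hat, X_hat
```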


Figure 2. Multi-camera extrinsic calibration. (a) The waving bright spot in the camera view is detected; (b) recovery results of the camera structure and spot motion.

3.2. MAV Design and Detection

As shown in Figure 3, the MAV system is built with one airframe, four speed controllers with rotors and propellers, an onboard microcontroller, a communication module, and an IMU sensor. In this system, the main part of the control and information processing is operated by a computer (PC) on the ground. The onboard microcontroller, an STM32F103 (STMicroelectronics, Shanghai, China), is in charge of receiving speed commands from the ground PC and generating the corresponding Pulse-Width Modulation (PWM) signals for the speed controllers to drive the four motors. The microcontroller is also used to capture the inertial information from the IMU sensor and send it to the ground PC in real time. The IMU, an MPU9150 (InvenSense, San Jose, CA, USA), is the inertial sensor of the drone, which continuously sends measurements of the attitude angle, acceleration, and angular velocity to the microcontroller at 100 Hz. The communication module, an Xbee PRO 900HP S3B, is responsible for the communication between the onboard microcontroller and the ground PC and is able to work in full-duplex mode. A joystick is connected to the ground PC by a simulator and is used to change the flight state of the MAV. In order to conveniently and efficiently test the control system in the lab, the quadrotor is mounted on a plank via a cardan joint so that its state can be flexibly adjusted without flying far away.

Figure 3. The MAV (micro aerial vehicle) system (a) and ground PC station (b).

As color is one of the most distinct features in the environment, we set three different color balls around the airframe as markers for detection. First, Gaussian filtering with a five-by-five patch is applied to every image to remove sharp and unnecessary information. Because of their different colors, the three balls can be well segmented from the scene in HSV (hue, saturation, value) space.
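A minimal OpenCV sketch of this segmentation step is given below; the HSV thresholds and the helper name are illustrative assumptions and would have to be tuned to the actual ball colors and lighting.

```python
# Sketch of the marker detection step: Gaussian smoothing, HSV thresholding, and
# centroid extraction for one color ball. The threshold values are illustrative only.
import cv2
import numpy as np

def detect_marker(frame_bgr, hsv_low=(35, 80, 80), hsv_high=(85, 255, 255)):
    """Return the (u, v) centroid of one color marker, or None if it is not visible."""
    blurred = cv2.GaussianBlur(frame_bgr, (5, 5), 0)           # five-by-five patch
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_low), np.array(hsv_high))
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None
    return m["m10"] / m["m00"], m["m01"] / m["m00"]            # centroid in pixels
```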

However, in the multi-camera configuration, the measurements in the images from different views (cameras) are not independent of each other: there is an epipolar geometry constraint between every two fixed views. The constraint relates an image point x in one view to the corresponding image point x' in the other view, and can be written as:

$$ \tilde{x}'^{\,T} F \tilde{x} = 0 \qquad (3) $$

where F is the fundamental matrix, which contains the information of the two cameras' intrinsic and extrinsic parameters and can be acquired in advance according to Equation (4):

$$ F = K'^{-T} R K^{T} \left[ K R^{T} t \right]_{\times} \qquad (4) $$

where K' and K are the intrinsic matrices of the two cameras, while R and t represent the 3 × 3 rotation matrix and the 3 × 1 translation vector between them. From this equation, once the point x is found in one view, the corresponding x' in the other view can be determined along the epipolar line l = Fx. As a result, the detection in all views takes less processing time and has better noise immunity. The results of the marker detection are shown in Figure 4a–e, showing that the detection is steady and the mean error is less than one pixel.
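The sketch below shows how such a constraint can be used in code: it assembles F from the two intrinsic matrices and the relative pose (written in the standard, algebraically equivalent form F = K'^{-T}[t]_×RK^{-1}) and measures how far a candidate detection lies from the epipolar line. All matrices, names, and thresholds here are illustrative assumptions, not values from the paper.

```python
# Sketch of the epipolar constraint (Equations (3) and (4)): build F from the two camera
# intrinsics and their relative pose, then gate marker candidates in the second view by
# their distance to the epipolar line of the first-view detection.
import numpy as np

def skew(v):
    """Map a 3-vector to its anti-symmetric (cross-product) matrix."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def fundamental_matrix(K1, K2, R, t):
    """F such that x2^T F x1 = 0, with (R, t) taking view-1 coordinates to view 2."""
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

def epipolar_distance(F, x1_px, x2_px):
    """Distance (pixels) of the view-2 point from the epipolar line of the view-1 point."""
    x1 = np.array([x1_px[0], x1_px[1], 1.0])
    x2 = np.array([x2_px[0], x2_px[1], 1.0])
    l = F @ x1                                   # epipolar line l = F x
    return abs(x2 @ l) / np.hypot(l[0], l[1])
```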

Figure 4. Detection of one marker in two views. (a) Marker detection and the corresponding epipolar line in one view; (b) marker detection and the corresponding epipolar line in the other view; (c) the image coordinates by continuous detection for the three markers in (a); (d) the image coordinates by continuous detection for the three markers in (b); (e) amplification of the green marker detection in (a).

4. Pose Estimation

4.1. Marker Location and MAV Pose Computation

It is noted that there is a unique solution for the position of a 3D point in two-view geometry by triangulation. Given the parameters of one camera, every point in its image provides two equations for the three unknown coordinates of the corresponding point in 3D space according to the projection model (Equation (2)). Therefore, in the case of two or more views, there are more than enough equations to obtain the solution, and the more equations that are considered, the higher the accuracy of the solution. The fundamentals of the triangulation method are shown in Figure 5a. In this method, the color marker represented by point X can be located in the 3D world based on the knowledge of the corresponding positions in the image planes taken by Camera 1 through Camera N. The detailed computation is shown in Equation (5), which is a combination of all the projective equations:

$$ \left[\, s^1 \vec{x}^{\,1} \;\; \cdots \;\; s^N \vec{x}^{\,N} \right]^T = \begin{bmatrix} P^1 \\ \vdots \\ P^N \end{bmatrix}_{3N \times 4} \vec{X} \qquad (5) $$

where x⃗^i denotes the homogeneous coordinate of the marker in image plane i, P^i denotes the projective matrix of camera i, and X⃗ denotes the homogeneous coordinate of the marker in the 3D world.
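A compact sketch of this linear multi-view triangulation (a standard DLT/SVD solution of the stacked system in Equation (5)) is given below; the function and variable names are illustrative.

```python
# Sketch of the multi-view triangulation in Equation (5): stack the projection equations
# from all cameras that see the marker and solve the homogeneous system by SVD.
import numpy as np

def triangulate(projections, pixels):
    """projections: list of 3x4 P-matrices; pixels: list of (u, v); returns the 3D point."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # From s*(u, v, 1)^T = P X: eliminate the scale s to get two linear equations.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # homogeneous solution with the smallest singular value
    return X[:3] / X[3]
```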

Figure 5. (a) The multi-view triangulation method. (b) The mono-view P3P (perspective-three-point) method. (c) MAV pose computation.

At the same time, by using the known relative relations among the three color markers on the rigid body, the marker positions and the rigid-body pose relative to the camera frame can be derived. This is the so-called P3P problem, where the only given information is the actual distance between each pair of markers. The details of the P3P method are shown in Figure 5b. The image point m_i(u_i, v_i) with focal normalization is calculated with Equation (6). The side lengths of the triangle formed by the markers M1, M2, and M3 are denoted by a, b, and c, respectively. In fact, the vector [x_ci, y_ci, 1]^T has the same direction as the unit vector e_i, which is expressed in Equation (7).

$$ \begin{bmatrix} x_{ci} & y_{ci} & 1 \end{bmatrix}^T = K^{-1} \begin{bmatrix} u_i & v_i & 1 \end{bmatrix}^T \qquad (6) $$

$$ \vec{e}_i = \frac{1}{\sqrt{x_{ci}^2 + y_{ci}^2 + 1}} \begin{bmatrix} x_{ci} & y_{ci} & 1 \end{bmatrix}^T \qquad (7) $$

Consequently, the cosines of the angles between the three unit vectors can be determined by Equation (8). The three equations relating the distances d_1, d_2, and d_3 between the points M1–M3 and the camera center can then be obtained from the cosine law, as described in Equation (9). The equations can be solved by linear iteration, yielding two real solutions for d_1–d_3.

$$ \begin{cases} \cos\alpha = \vec{e}_2^{\,T} \vec{e}_3 \\ \cos\beta = \vec{e}_1^{\,T} \vec{e}_3 \\ \cos\gamma = \vec{e}_1^{\,T} \vec{e}_2 \end{cases} \qquad (8) $$

$$ \begin{cases} d_2^2 + d_3^2 - 2 d_2 d_3 \cos\alpha = a^2 \\ d_1^2 + d_3^2 - 2 d_1 d_3 \cos\beta = b^2 \\ d_1^2 + d_2^2 - 2 d_1 d_2 \cos\gamma = c^2 \end{cases} \qquad (9) $$
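The distance step of Equations (6)–(9) can be sketched as follows. The paper solves the system by linear iteration; this illustration simply hands the cosine-law residuals to a generic nonlinear solver, so it is an assumption-laden stand-in rather than the authors' implementation, and K, the pixel coordinates, and the side lengths are placeholders.

```python
# Sketch of the P3P distance step (Equations (6)-(9)). K, the pixel coordinates, the side
# lengths a, b, c, and the initial guess d0 are assumptions for illustration.
import numpy as np
from scipy.optimize import fsolve

def p3p_distances(K, pixels, a, b, c, d0=(1.0, 1.0, 1.0)):
    """Return camera-frame distances (d1, d2, d3) to markers M1-M3 and the unit rays e_i."""
    # Equation (6): focal normalization of each image point (u_i, v_i).
    rays = [np.linalg.inv(K) @ np.array([u, v, 1.0]) for (u, v) in pixels]
    # Equation (7): unit bearing vectors e_i.
    e = [r / np.linalg.norm(r) for r in rays]
    # Equation (8): cosines of the angles between the rays.
    cos_a, cos_b, cos_g = e[1] @ e[2], e[0] @ e[2], e[0] @ e[1]

    def residuals(d):
        d1, d2, d3 = d
        # Equation (9): cosine law for each marker pair.
        return [d2**2 + d3**2 - 2.0*d2*d3*cos_a - a**2,
                d1**2 + d3**2 - 2.0*d1*d3*cos_b - b**2,
                d1**2 + d2**2 - 2.0*d1*d2*cos_g - c**2]

    d = fsolve(residuals, d0)
    return d, e

# Equation (10): marker positions in the camera frame follow as M_i = d_i * e_i.
```

Starting the solver from different initial guesses recovers the two real solutions mentioned above; as discussed in Section 5.1, the triangulation result can then be used to discard the wrong one.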

Finally, the positions of the three markers M1–M3 relative to the camera reference frame can be determined by Equation (10). Since the camera orientation in the 3D world frame has been calibrated in advance, the positions of the three markers M1–M3 relative to the 3D world frame can also be calculated.

$$ \vec{M}_i = d_i \vec{e}_i, \quad i = 1, 2, 3 \qquad (10) $$

Because the positions of the markers M1–M3 on the MAV are given, the pose of the MAV relative to the 3D world reference can be determined from the discussion above. The MAV pose contains both position and attitude, as shown in Figure 5c. The center of the markers M1 and M2 is defined as the MAV position. In the rotation matrix R = [n, o, a], the three vectors n, o, and a represent the x_b, y_b, and z_b directions of the body frame in the 3D world coordinates, respectively. As shown in Equation (11), the MAV body reference frame {x_b, y_b, z_b} can be built with the vector from the midpoint of M2 and M3 to M1 as x_b, the vector from M2 to M3 as y_b, and z_b determined by the right-hand rule:

$$ \begin{cases} \vec{n}' = \vec{M}_1 - \frac{1}{2}\left(\vec{M}_2 + \vec{M}_3\right) \\ \vec{n} = \vec{n}'/\|\vec{n}'\| \\ \vec{o}' = \vec{M}_3 - \vec{M}_2 \\ \vec{o} = \vec{o}'/\|\vec{o}'\| \\ \vec{a} = \vec{n} \times \vec{o} \end{cases} \qquad (11) $$

where a can be determined by the cross product of n and o, following the orthogonality of R. Now the rotation and translation relationship between the body frame and the world frame has been derived. The three attitude angles (roll, pitch, yaw) of the MAV can be solved by the Rodrigues transformation expressed in Equation (12):

$$ \begin{bmatrix} \varphi & \theta & \psi \end{bmatrix}^T = \mathrm{Rodrigues}\!\left(R_{3\times 3}\right) \qquad (12) $$

where ψ, φ, and θ represent the yaw, pitch, and roll angles, respectively. It should be noted that each angle needs to be adjusted to a defined range so that it is consistent with the IMU output. In general, for a rigid body (MAV) the three attitude angles (roll, pitch, and yaw) have a defined range and rotation direction: for example, the range of the roll angle is set to [−180°, 180°], the range of the pitch angle is set to [−90°, 90°], and the range of the yaw angle is set to [0°, 360°]. However, the result computed directly from the visual system lies in [0°, 360°], so the two sets of attitudes must be adjusted to be consistent. This adjustment is also called "normalization".
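A minimal sketch of this pose-recovery step (Equations (10)–(12)) is shown below, assuming the three marker positions have already been expressed in the world frame. The Euler extraction uses a standard ZYX (yaw–pitch–roll) decomposition as an illustration of Equation (12), and the range normalization discussed above is left out; all names are illustrative.

```python
# Sketch of Equations (10)-(12): build the body frame from the three marker positions
# (already in the world frame) and read off roll/pitch/yaw. The marker layout is assumed
# to make n and o orthogonal; the angle extraction is a standard ZYX decomposition.
import numpy as np

def mav_pose_from_markers(M1, M2, M3):
    """Return (position, R, (roll, pitch, yaw)) of the MAV body frame in the world frame."""
    M1, M2, M3 = (np.asarray(M, dtype=float) for M in (M1, M2, M3))
    n = M1 - 0.5 * (M2 + M3)            # Equation (11): x_b, midpoint of M2M3 towards M1
    n /= np.linalg.norm(n)
    o = M3 - M2                          # y_b, from M2 towards M3
    o /= np.linalg.norm(o)
    a = np.cross(n, o)                   # z_b by the right-hand rule
    R = np.column_stack((n, o, a))       # rotation from body frame to world frame

    position = 0.5 * (M1 + M2)           # MAV position: center of markers M1 and M2

    pitch = -np.arcsin(R[2, 0])          # ZYX (yaw-pitch-roll) angles from R
    roll = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return position, R, (roll, pitch, yaw)
```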

4.2. Pose Estimation

As the measurements are taken from the visual system, the MAV pose is continuously calculated from multi-view triangulation and mono-view P3P, and is used to correct or update the state of the MAV in a fusion framework. In general, inertial measurements taken at a high rate (100 Hz–2 kHz) are fused with lower-rate exteroceptive updates from vision or GPS. For MAV pose estimation, conventional approaches are based on indirect formulations of Extended or Unscented Kalman Filters and Particle Filters (PF). Since multiple cameras measure one quantity at approximately the same time, the framework in this work is based on the indirect formulation of an iterated PF, where the state prediction is driven by the IMU.

With I = {e1, e2, e3} as a right-hand inertial frame, B = {b1, b2, b3} defines a body-fixed frame with its center at the center of mass of the vehicle. The ground 3D-world reference of the visual system is set to be consistent with the inertial frame. The motion model of the rigid body can be derived from the kinematic equation of the attitude and Newton's equation of motion, as expressed in Equation (13), where ξ_I = (x, y, z) is the position of the center of mass of the vehicle in frame I, v_I is the linear velocity in frame I, Ω = (p, q, r)^T is the angular velocity of the airframe with respect to frame B, and the attitude of the rigid body is given by the rotation matrix R_B^I(φ, θ, ψ): B → I.

$$ \begin{cases} \dot{\xi}_I = v_I \\ \dot{v}_I = g e_3 + \frac{1}{m} R_B^I F \\ \dot{R}_B^I = R_B^I \,\Omega_{\times} \\ \mathbf{I}\dot{\Omega} = -\Omega \times \mathbf{I}\Omega + \tau \end{cases} \qquad (13) $$

The operator (·)_× maps Ω into an anti-symmetric matrix, m represents the rigid object's mass, and I ∈ R^{3×3} represents the inertia matrix, which is constant. The vectors F and τ ∈ {B} represent the principal non-conservative forces and moments applied to the rotorcraft airframe by the aerodynamics of the rotors, respectively. It is difficult for a PF to deal with high-dimensional states, because the filter is likely to diverge as the dimension increases. The estimated state of the MAV consists of the linear velocity (ẋ, ẏ, ż) and the attitude angles (φ, θ, ψ) in the inertial frame, plus the acceleration and the angular velocity in the body frame. The acceleration and angular velocity measured by the IMU are assumed to be the sum of the real value, a bias, and Gaussian noise. Therefore, the dimension of the PF state is 12, as expressed in the following:

$$ x_k = \left[\dot{x}, \dot{y}, \dot{z}, \varphi, \theta, \psi, a_{bx}, a_{by}, a_{bz}, p, q, r\right]^T \qquad (14) $$

Measurements of the MAV state include its position (x, y, z) and attitude (φ, θ, ψ) in the inertial reference coordinate system. Measurement noises are complicated and mainly come from camera calibration, marker detection, and varying environmental conditions. A particle filter is applied to deal with these noises, because it is capable of modeling and canceling random noise: with a large number of particles, it can represent the true state of the MAV together with the multiple measurements of that state obtained by the different methods. Therefore, it is used to estimate the state of the MAV in this work. Similar to other filter frameworks [25–27], our PF-based estimation mainly consists of a prediction period and an updating period. The detailed process is shown in Figure 6.

During the prediction period, the state of every particle evolves from x_k to x_{k+1}, driven by the IMU according to the motion model discussed above, with the weights kept fixed for the moment. This is followed by the updating period, in which the weights of all particles are adjusted more than once; the weights are proportional to the separation between the predicted state and the measurement in every updating step. First, the weights of all particles are adjusted one by one based on the measurement results from each camera. Then, after the updating period, the particle weights based on the multi- and mono-view measurements are normalized. As a result, the state estimate is obtained by accumulating the product of each particle and its weight. In addition, particles with little weight are abandoned, while particles with large weight are resampled, so that there are always enough particles in the procedure.
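The following is a minimal, self-contained sketch of one cycle of such a filter, reduced to a position/velocity state for brevity. The IMU-driven prediction, the two successive visual updates (triangulation and mono-view P3P), the weight normalization, and the resampling step mirror the procedure of Figure 6, but all function names, noise levels, and measurement values are illustrative assumptions, not the authors' implementation.

```python
# Simplified particle-filter sketch of the CVT fusion: predict with an IMU-style input,
# then apply the multi-view and mono-view position measurements as two weight updates.
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, accel_world, dt, process_std):
    """Propagate [x, y, z, vx, vy, vz] particles with an IMU-style acceleration input."""
    particles[:, 0:3] += particles[:, 3:6] * dt + 0.5 * accel_world * dt**2
    particles[:, 3:6] += accel_world * dt
    particles += rng.normal(0.0, process_std, particles.shape)
    return particles

def update(weights, particles, meas_pos, meas_std):
    """Re-weight particles against one position measurement (triangulation or P3P)."""
    err = np.linalg.norm(particles[:, 0:3] - meas_pos, axis=1)
    weights *= np.exp(-0.5 * (err / meas_std) ** 2)
    return weights

def resample(particles, weights):
    """Resample so that low-weight particles are dropped and strong ones are duplicated."""
    idx = rng.choice(len(particles), size=len(particles), p=weights / weights.sum())
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# One filter cycle: IMU-driven prediction, then the two visual updates.
N = 1000
particles = rng.normal(0.0, 0.1, (N, 6))
weights = np.full(N, 1.0 / N)

particles = predict(particles, accel_world=np.zeros(3), dt=0.01, process_std=0.01)
weights = update(weights, particles, meas_pos=np.array([0.56, 0.22, 0.65]), meas_std=0.02)  # triangulation
weights = update(weights, particles, meas_pos=np.array([0.56, 0.21, 0.61]), meas_std=0.05)  # mono-view P3P
weights /= weights.sum()
estimate = weights @ particles            # weighted mean: the accumulation step of Figure 6
particles, weights = resample(particles, weights)
```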

Figure 6. State estimation based on multiple measurements from the visual system.


5. Experiments and Analyses

The vision-based pose estimation algorithm is tested in our laboratory with the small quadrotor shown in Figure 7. The visual system is set up with four cameras and a PC with an image grabber. As shown in Figure 7, the quadrotor flies within the view of the visual system, and its position and attitude relative to the ground 3D reference frame are calculated by the above-mentioned procedure. The low-cost visible-light cameras employed in the system are distributed around a 4 m × 3 m × 3 m testing room.

Figure 7. The multi-camera measurement system and the testing MAV.

5.1. Pose Computation Results

When the positions of the three markers in the images are available, the location of the MAV can be calculated by multi- or mono-view geometry. As shown in Figure 8A,B, the position and attitude of the testing MAV are calculated using the two methods, respectively. With the P3P method introduced in Section 4, two sets of solutions are obtained, distinguished by the red and blue lines in Figure 8B. Even though the two position solutions are almost the same, the small disparity gives rise to a large difference in the subsequent attitude angle calculations.

It can be observed that one of the two solutions is closer to the triangulation solution displayed in Figure 8A; the other solution can be discarded using the triangulation solution as the reference. As a result, this provides a way to distinguish between the two P3P solutions. Then, by continuously calculating the position (x, y, z) and the attitudes (roll, pitch, yaw), the corresponding histograms of the multi- and mono-view measurements are obtained, as shown in Figures 9 and 10, respectively. This information helps us understand in advance some performance aspects of the two location methods, such as stability, if their results are regarded as measurements for the filter framework. Through the histograms, each measured variable, whether position or attitude, is centered on a fixed value and changes around it.

In Figure 11, we compare the estimated attitude angles with an attitude and heading reference system (AHRS) to verify the validity of the visual system. The precision of the AHRS is 0.5°–1° and its resolution is 0.1°. As shown in Figure 11, both the IMU and visual measurements are obtained and displayed at the same time while the quadrotor is fixed on the plank via the cardan lock. The results indicate that the visual measurements, including the roll, pitch, and yaw angles, are quite consistent with the IMU data. This comparison indicates that the proposed technique is quite accurate.

(Figure 8 plots the pose of the stationary quadrotor against the image sequence in six panels per method: (a) position in X direction (unit: mm); (b) position in Y direction (unit: mm); (c) position in Z direction (unit: mm); (d) roll angle (unit: °); (e) pitch angle (unit: °); (f) yaw angle (unit: °). The two P3P solutions are plotted as solution1 and solution2.)

Figure 8. Pose computation when the quadrotor keeps a fixed state. (A) Results of the multi-view triangulation method. (B) Results of the mono-view P3P method.

(Figure 9 shows histograms of X (mm), Y (mm), Z (mm), roll angle (°), pitch angle (°), and yaw angle (°).)

Figure 9. Histograms of MAV at fixed orientation by the multi-view measurement.

(Figure 10 shows histograms of X (mm), Y (mm), Z (mm), roll angle (°), pitch angle (°), and yaw angle (°).)

Figure 10. Histograms of MAV at fixed orientation by the monocular measurement.

(Figure 11 plots roll (°), pitch (°), and yaw (°) versus time (s) for the visual estimate and the IMU.)
Figure 11. MAV attitudes from visual (blue) and IMU (Inertial Measurement Unit) (red) measurements.

5.2. MAV Pose Estimation

It is necessary to model and study the noise in the two measurements from the visual system in order to finally obtain accurate design parameters for the filter. The noise model of the measurements has been obtained by analyzing the histograms shown in Figures 9 and 10. In this work, the noises of the process and of the measurements are considered to be approximately zero-mean Gaussian distributions.

More particles are required when the estimation is expected to stay close to the true state, but the computational cost grows accordingly. After numerous attempts, the particle number N was set to 1000. As a result, the cycle time of the visual measurement system is 0.421 s, using the Matlab platform on a PC with a 1.6 GHz Intel i5 processor. Figure 12 shows the results of state estimation using the PF with measurements from the triangulation module and the IMU while the quadrotor is hovering in the air. The mean value of the state of the hovering quadrotor is expected to be approximately zero.

As a consequence, the standard deviations of the estimated state and of the original measurements are [0.0580, 0.1475, 0.0138, 0.0161, 0.0230, 0.0197, 1.7052, 0.9917, 6.1615, 118.4207, 71.4597, 44.4818] and [0.3067, 0.8531, 0.2053, 0.0335, 0.0380, 0.0364, 3.5187, 3.2062, 7.7488, 121.5115, 76.2109, 52.0670], respectively. The results shown in Figure 12 indicate that the state estimation by the proposed technique is more stable than the measurements directly obtained by the visual system or the IMU.

Figure 12. MAV estimated state (blue) and observed measurements (red) from the visual system or IMU. (a) Translational velocity (v_x, v_y, v_z) in the inertial reference frame; (b) accelerated velocity (a_x, a_y, a_z) in the body reference frame; (c) attitude angles (roll, pitch, yaw) in the inertial reference frame; (d) angular velocities (p, q, r) in the body reference frame.

Meanwhile, another experiment on real-time pose estimation has been carried out while the quadrotor flies freely within the control region of the visual system. In this test, the position and heading of the quadrotor are remotely controlled by a ground station, and its attitude is autonomously stabilized by the onboard IMU. Part of the recovered flight trajectory estimated by the PF framework is shown in Figure 13. For better visualization, it should be noted that the Z-axis of the 3D world (inertial) frame has been converted to the opposite direction and points upwards.

Figure 13. Real-time trajectory estimation (blue) and measurement pose (red) of the flying quadrotor in the 3D reference frame.

From the results and discussion above, the proposed CVT enables a stable estimation when the From the results and discussion above, the proposed CVT enables a stable estimation when the flying quadrotor is within the view of the designed vision system. Additionally, the camera view flying quadrotor is within the view of the designed vision system. Additionally, the camera view or or image noise may lead to different errors in each location method based on different constraints. image noise may lead to different errors in each location method based on different constraints. It is It is helpful to achieve a stable result when the solutions from triangulation and P3P are referred to by helpful to achieve a stable result when the solutions from triangulation and P3P are referred to by a a filter. filter. 6. Conclusions 6. Conclusions In summary, we have presented a combined vision technique and built a visual measurement In summary, we have presented combined vision technique built a visual measurement system for MAV pose estimation. In athis work, an indoor visual and measurement system has been system for MAV pose estimation. In this work, an indoor visual measurement system hasdesign been developed and introduced, including multi-camera parameter calibration, MAV detection, and developed and introduced, including multi-camera parameter calibration, MAV detection, and of a quadrotor. The unique contribution of this paper is the proposed CVT in which two visual design of a quadrotor. The unique contribution of this paper is the proposed CVT in which two positioning methods can be integrated in a PF framework to obtain a stable estimated pose. With the visual positioning methods can be integrated in a PF framework to obtain a stable estimated designed visual measurement system, some interesting proof-of-concept tests have been carriedpose. out. With the designed visual measurement system, some interesting proof-of-concept tests havefuture been The results show that the proposed CVT can provide stable pose estimation for MAVs. In our carried The results show that proposed CVT provide navigation stable pose platform estimation MAVs. plan, theout. proposed system will bethe assembled into ancan on-ground tofor guide the In our future plan, the proposed system will be assembled into an on-ground navigation platform to MAV for safe and precise landing. It can also be considered as a low-cost and high-performance flight guide the MAV for safe and precise landing. It can also be considered as a low-cost and test-bed, which can be applied to study the dynamics, control, and navigation of MAVs. high-performance flight test-bed, which can be applied to study the dynamics, control, and Acknowledgments: This work reported in this paper is the product of several research stages at the Wuhan navigation of MAVs.

6. Conclusions

In summary, we have presented a combined vision technique and built a visual measurement system for MAV pose estimation. An indoor visual measurement system has been developed and introduced, covering multi-camera parameter calibration, MAV detection, and the design of a quadrotor. The unique contribution of this paper is the proposed CVT, in which two visual positioning methods are integrated in a PF framework to obtain a stable estimated pose. With the designed visual measurement system, several proof-of-concept tests have been carried out, and the results show that the proposed CVT can provide stable pose estimation for MAVs. In future work, the proposed system will be assembled into an on-ground navigation platform to guide the MAV for safe and precise landing. It can also serve as a low-cost, high-performance flight test-bed for studying the dynamics, control, and navigation of MAVs.

Acknowledgments: The work reported in this paper is the product of several research stages at the Wuhan University of Technology and has been sponsored in part by the Natural Science Foundation of China (51579204), the Double First-rate Project of WUT (472-20163042), and the Graduate Self-determined and Innovative Research Funds of WUT (2015-JL-014 and 2016IVA064). We thank the three peer reviewers for their greatly appreciated comments and criticisms, which helped to improve this paper.

Author Contributions: H.Y. and C.X. conceived and designed the experiments; H.Y. and S.X. performed the experiments; Y.W. and C.Z. contributed the quadrotor platform and the experimental materials; H.Y. analyzed the data; H.Y. and Q.L. wrote the paper.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Sanchez-Lopez, J.L.; Pestana, J.; Saripalli, S.; Campoy, P. An Approach toward Visual Autonomous Ship Board Landing of a VTOL UAV. J. Intell. Robot. Syst. 2014, 74, 113–127. [CrossRef]
2. Maza, I.; Kondak, K.; Bernard, M.; Ollero, A. Multi-UAV Cooperation and Control for Load Transportation and Deployment. J. Intell. Robot. Syst. 2010, 57, 417–449. [CrossRef]
3. Mulgaonkar, Y.; Araki, B.; Koh, J.S.; Guerrero-Bonilla, L.; Aukes, D.M.; Makineni, A.; Tolley, M.T.; Rus, D.; Wood, R.J.; Kumar, V. The Flying Monkey: A Mesoscale Robot That Can Run, Fly, and Grasp. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016.
4. Brockers, R.; Humenberger, M.; Kuwata, Y.; Matthies, L.; Weiss, S. Computer Vision for Micro Air Vehicles. In Advances in Embedded Computer Vision; Springer: New York, NY, USA, 2014; pp. 73–107.
5. Mondragón, I.F.; Campoy, P.; Martinez, C.; Olivares-Méndez, M.A. 3D Pose Estimation Based on Planar Object Tracking for UAVs Control. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Anchorage, AK, USA, 3–7 May 2010.
6. Gomez-Balderas, J.E.; Flores, G.; García Carrillo, L.R.; Lozano, R. Tracking a Ground Moving Target with a Quadrotor Using Switching Control. J. Intell. Robot. Syst. 2013, 70, 65–78. [CrossRef]
7. Xu, G.; Qi, X.; Zeng, Q.; Tian, Y.; Guo, R.; Wang, B. Use of Land's Cooperative Object to Estimate UAV's Pose for Autonomous Landing. Chin. J. Aeronaut. 2013, 26, 1498–1505. [CrossRef]
8. Shen, S.; Mulgaonkar, Y.; Michael, N.; Kumar, V. Vision-Based State Estimation for Autonomous Rotorcraft MAVs in Complex Environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6–10 May 2013.
9. Chowdhary, G.; Johnson, E.N.; Magree, D.; Wu, A.; Shein, A. GPS-denied Indoor and Outdoor Monocular Vision Aided Navigation and Control of Unmanned Aircraft. J. Field Robot. 2013, 30, 415–438. [CrossRef]
10. Weiss, S.; Achtelik, M.W.; Lynen, S.; Achtelik, M.C.; Kneip, L.; Chli, M.; Siegwart, R. Monocular Vision for Long-term Micro Aerial Vehicle State Estimation: A Compendium. J. Field Robot. 2013, 30, 803–831. [CrossRef]
11. Yang, S.; Scherer, S.A.; Schauwecker, K.; Zell, A. Autonomous Landing of MAVs on an Arbitrarily Textured Landing Site Using Onboard Monocular Vision. J. Intell. Robot. Syst. 2014, 74, 27–43. [CrossRef]
12. Thurrowgood, S.; Moore, R.J.; Soccol, D.; Knight, M.; Srinivasan, M.V. A Biologically Inspired, Vision-Based Guidance System for Automatic Landing of a Fixed-Wing Aircraft. J. Field Robot. 2014, 31, 699–727. [CrossRef]
13. Herissé, B.; Hamel, T.; Mahony, R.; Russotto, F.X. Landing a VTOL Unmanned Aerial Vehicle on a Moving Platform Using Optical Flow. IEEE Trans. Robot. 2012, 28, 77–89. [CrossRef]
14. Lupashin, S.; Hehn, M.; Mueller, M.W.; Schoellig, A.P.; Sherback, M.; D'Andrea, R. A Platform for Aerial Robotics Research and Demonstration: The Flying Machine Arena. Mechatronics 2014, 24, 41–54. [CrossRef]
15. Michael, N.; Mellinger, D.; Lindsey, Q.; Kumar, V. The GRASP Multiple Micro-UAV Testbed. IEEE Robot. Autom. Mag. 2010, 17, 56–65. [CrossRef]
16. Valenti, M.; Bethke, B.; Fiore, G.; How, J.P. Indoor Multi-Vehicle Flight Testbed for Fault Detection, Isolation and Recovery. In Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit, Keystone, CO, USA, 21–24 August 2006.
17. How, J.; Bethke, B.; Frank, A.; Dale, D.; Vian, J. Real-Time Indoor Autonomous Vehicle Test Environment. IEEE Control Syst. Mag. 2008, 28, 51–64. [CrossRef]
18. Martínez, C.; Mondragón, I.F.; Olivares-Méndez, M.A.; Campoy, P. On-board and Ground Visual Pose Estimation Techniques for UAV Control. J. Intell. Robot. Syst. 2010, 61, 301–320. [CrossRef]
19. Faessler, M.; Mueggler, E.; Schwabe, K.; Scaramuzza, D. A Monocular Pose Estimation System Based on Infrared LEDs. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014.
20. Oh, H.; Won, D.Y.; Huh, S.S.; Shim, D.H.; Tahk, M.J.; Tsourdos, A. Indoor UAV Control Using Multi-Camera Visual Feedback. J. Intell. Robot. Syst. 2011, 61, 57–84. [CrossRef]
21. Yoshihata, Y.; Watanabe, K.; Iwatani, Y.; Hashimoto, K. Multi-Camera Visual Servoing of a Micro Helicopter under Occlusions. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA, 29 October–2 November 2007.


22. Altug, E.; Ostrowski, J.P.; Taylor, C.J. Control of a Quadrotor Helicopter Using Dual Camera Visual Feedback. Int. J. Robot. Res. 2005, 24, 329–341. [CrossRef]
23. Bouguet, J.-Y. Camera Calibration Toolbox for Matlab. 2008. Available online: http://www.vision.caltech.edu/bouguetj/calib_doc/index.html (accessed on 2 March 2015).
24. Svoboda, T.; Martinec, D.; Pajdla, T. A Convenient Multi-Camera Self-Calibration for Virtual Environments. PRESENCE: Teleoper. Virtual Environ. 2005, 14, 407–422. [CrossRef]
25. Lynen, S.; Achtelik, M.W.; Weiss, S.; Chli, M.; Siegwart, R. A Robust and Modular Multi-Sensor Fusion Approach Applied to MAV Navigation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo, Japan, 3–7 November 2013.
26. De Marina, H.G.; Pereda, F.J.; Giron-Sierra, J.M.; Espinosa, F. UAV Attitude Estimation Using Unscented Kalman Filter and TRIAD. IEEE Trans. Ind. Electron. 2012, 59, 4465–4475. [CrossRef]
27. Gustafsson, F.; Gunnarsson, F.; Bergman, N.; Forssell, U.; Jansson, J.; Karlsson, R.; Nordlund, P.J. Particle Filters for Positioning, Navigation, and Tracking. IEEE Trans. Signal Process. 2002, 50, 425–438. [CrossRef]

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).