
International Journal of Advanced Robotic Systems

ARTICLE

Laser-based Online Sliding-window Approach for UAV Loop-closure Detection in Urban Environments Regular Paper

Anqing Wang1, Chi Li1, Yisha Liu2, Yan Zhuang1*, Chunguang Bu3 and Jizhong Xiao4
1 School of Control Science and Engineering, Dalian University of Technology, Dalian, China
2 Information Science and Technology College, Dalian Maritime University, Dalian, China
3 State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, China
4 Department of Electrical Engineering, The City College, City University of New York, USA
*Corresponding author(s) E-mail: [email protected]
Received 23 August 2015; Accepted 27 February 2016
DOI: 10.5772/62755
© 2016 Author(s). Licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Online loop-closure detection serves as an essential task for Unmanned Aerial Vehicles (UAVs) equipped with laser scanners. Due to the inherent errors in UAVs' pose estimation, a 3-D reconstruction algorithm is adopted to perform 3-D map building, which establishes probabilistic models of the system according to the assumption of errors. To meet the demand of online loop-closure detection using sequential 2-D laser data, a robust ISW-NDT (incremental sliding-window-based NDT) approach is proposed, which compares the appearance similarity between two scans by sliding a window with fixed size. Compared with the conventional 3-D NDT approach, the proposed loop-closure detection algorithm is capable of providing superior performance in large-scale outdoor environments, achieving higher recall rate at 100% precision and ensuring successful online implementation. Experimental results show the validity and robustness of the proposed method.

Keywords Loop-closure Detection, 3-D Laser Point Cloud, Unmanned Aerial Vehicles (UAVs)

1. Introduction

Unmanned Aerial Vehicles (UAVs), particularly low-flying small ones, are portable, flexible and potentially cost-effective for robotic applications such as mapping, navigation and exploration. Recently, a variety of approaches have been developed for aerial survey tasks with UAVs in both indoor and outdoor environments [1-4], among which the loop-closure detection problem has aroused great interest for its significance in self-localization and simultaneous localization and mapping (SLAM). Loop-closure detection is presented as the problem of identifying previously visited locations. However, it is still a challenging task for UAVs to detect loop-closures based on sequential laser scanning in large-scale outdoor environments.

Int J Adv Robot Syst, 2016, 13:61 | doi: 10.5772/62755

A wealth of studies has been carried out on loop-closure detection over the past decade for both UAVs and Unmanned Ground Vehicles (UGVs). Among all the loop-closure detection methods, the majority are based on visual sensors. Visual loop-closure detection using Bag-of-Words (BoW) has been extensively studied in robotics research [5, 6]. Another method called FAB-MAP is considered an important step in appearance-based loop-closure detection, especially in large-scale environments [7, 8]. More recently, direct feature matching [9] and tree structures [10, 11] have also received considerable attention in visual loop-closure detection. Although visual sensors are powerful and widely used in loop-closure detection, their sensitivity to lighting conditions and their small fields of view remain unsolved problems.

Meanwhile, laser range finders (LRFs) provide a wider field of view than most monocular or stereo cameras and are able to provide high-quality data even in dark environments. In recent years, LRFs have been increasingly used for loop-closure detection, although most laser-based loop-closure algorithms were designed for UGVs [12-16]. A machine learning framework based on AdaBoost was presented in [12], which expressed the problem of loop-closure detection as a classification task. A more detailed version in [13] presented extensions of this work by using various publicly available data sets. The authors showed detection rates of 63% and 53% for 3-D outdoor and indoor environments, respectively. Magnusson et al. proposed an appearance-based loop detection approach, in which the normal distributions transform (NDT) [14] was used as the local appearance descriptor of a 3-D scan [15]. Not only an outdoor data set but also highly similar mine data were used to support the effectiveness of the proposed algorithm. Moreover, the methods in [13] and [15] both used a rotation invariance approach. In a more recent work, Steder et al. [16] presented an approach to transform point clouds into a range image, so that interest points could be obtained to extract features and to score candidate transformations. According to the authors, the performance of rotation invariance would not be affected as long as the robot was moving on flat ground.
Their experiment covered both a real campus environment and a freely available data set.

Recently, some groups have been trying to use small UAVs for laser-based loop-closure detection research. The authors in [17] employed a method for loop-closure detection in a pose graph in order to augment the performance of Kalman filter-based navigation, which used the FLIRT feature for the detector and descriptor pair. The FLIRT feature, a multi-scale interest region operator for 2-D range data, was first proposed by Tipaldi and Arras in order to solve the 2-D navigation task in [18]. However, the method is unable to ensure robustness for 3-D loop-closure detection. In addition, most experiments in the paper are implemented with a freely available data set, where the laser data are obtained in a stop-and-start way. So far, there is no general method readily available for sequential 2-D laser data collected on the fly.

In this paper, a laser-based online sliding-window approach is proposed for 3-D loop-closure detection with small UAVs in large-scale outdoor environments. The UAV data collection with the LRF (laser range finder) is illustrated in Figure 1, where the scan lines, perpendicular to the UAV's heading direction, are generated by the LRF sequentially.

Figure 1. 3-D point clouds are composed of sequential 2-D laser scan lines acquired by the laser range finder on the UAV

The work focuses on two aspects:

1. 3-D Reconstruction. Compared with UGVs, low-flying small UAVs are much less constrained on account of their navigational capabilities. The vibration of UAVs can bring in pose estimation errors, which makes loop-closure detection on the distorted point clouds unsatisfactory. In order to provide accurate 3-D maps for detecting "loops", the raw 3-D point clouds must first be aligned. A 3-D reconstruction algorithm is used here based on the work by Thrun et al. in [19]. The algorithm establishes probabilistic models of the system according to the assumption of errors. The reconstruction problem is then converted to an optimization problem of an integrated probabilistic model. The 3-D reconstruction performance and computational efficiency of the proposed method have been demonstrated with experimental results.

2. Loop-closure Detection. How to detect loop-closures effectively with sequential laser data is another problem stemming from real applications. This paper presents an online incremental sliding-window-based NDT (ISW-NDT) algorithm to deal with the 3-D loop-closure detection problem. As the UAV flies forward, laser scans are generated by the forward movement of a window with fixed size. Yet the number of scans with this method increases greatly compared with the traditional scan segmentation method. In order to keep ISW-NDT efficient, we compute only the newly added data and rotate the NDT grids directly rather than the point clouds. Experimental results show that ISW-NDT can not only robustly detect loop-closures, but also achieve online efficiency.

The rest of this paper is organized as follows: Section 2 describes the 3-D reconstruction algorithm with sequential 2-D laser scanning data. Section 3 presents the ISW-NDT loop-closure detection approach. The experimental results are introduced in Section 4. Finally, the conclusions and future work are given in Section 5.

2. 3-D Reconstruction Using Sequential Laser Scanning Data

3-D reconstruction with raw laser data is considered an important procedure in UAV loop-closure detection. The location error induced by UAV vibration makes it difficult to build accurate 3-D maps, which has a great impact on the subsequent loop-detection performance. In this section, we introduce an approach using sequential 2-D laser data and the UAV's pose estimation to perform 3-D reconstruction, based on the method proposed by Thrun et al. in [19]. The proposed method converts the 3-D mapping problem into the optimization of a probabilistic model, which can be solved iteratively. The algorithm includes three parts: measurement model establishment, local smoothness model establishment and optimization.

2.1 Measurement model

The micro UAV is equipped with an LRF, a GPS receiver, an optical flow sensor and an AHRS, which is a common configuration for performing a 3-D reconstruction task. In urban environments, where trees and buildings shade the GPS signal (typical GPS-denied environments), the optical flow sensor is used along all the routes to obtain the local pose increment for the 3-D reconstruction application. Let $x_t$ denote the real pose at time $t$, composed of three coordinates and three Euler angles (pitch, yaw, roll). Similarly, the measured pose is presented as $y_t = x_t + \varepsilon_t$, where $\varepsilon_t$ is assumed to be white Gaussian noise. Additionally, the noise on each state variable is, for practical purposes, considered to be mutually independent. The probability of measuring $y_t$ if the correct pose is $x_t$ is presented as $p(y_t \mid x_t)$:

$$p(y_t \mid x_t) = \frac{1}{(2\pi)^{3}\sqrt{\det(A)}} \exp\left(-\tfrac{1}{2}(y_t - x_t)^T A^{-1} (y_t - x_t)\right) \propto \exp\left(-\tfrac{1}{2}(y_t - x_t)^T A^{-1} (y_t - x_t)\right) \tag{1}$$

The measurement covariance $A$ is a 6 × 6 diagonal matrix. Since all sensors work with systematic error (e.g., drift), the incremental measurement can naturally provide more accurate results. We thus introduce the differential model $p(\Delta y_t \mid \Delta x_t)$, where $\Delta y_t = y_t - y_{t-1}$ and $\Delta x_t = x_t - x_{t-1}$ denote the difference between two consecutive measurements and two consecutive real poses, respectively.

$$p(\Delta y_t \mid \Delta x_t) = \frac{1}{(2\pi)^{3}\sqrt{\det(B)}} \exp\left(-\tfrac{1}{2}(\Delta y_t - \Delta x_t)^T B^{-1} (\Delta y_t - \Delta x_t)\right) \tag{2}$$

Here $B$ is the covariance matrix of the differential measurement noise, which is also assumed to be Gaussian. However, the two models above are both established based on the pose of the UAV directly, which is insufficient for accurate mapping.

2.2 Local smoothness model

In order to improve the positioning accuracy based on the preliminary location estimates of an odometer, most SLAM algorithms implement registration of point clouds depending on the overlap between neighbouring scans. However, two neighbouring scans in our application can hardly overlap, since the heading direction of the UAV is perpendicular to the scanning plane. Thus, we introduce the local smoothness model, using the scanning results within a short time frame as reference for data alignment at time $t$.

In structured or semi-structured environments, there are always planar structures such as walls, flat roads and road edges, which provide a regular distribution of point clouds. This local smoothness property is used to align the scan line at time $t$. The scan line at time $t$ needs to be rotated and translated until it coincides with the laser data aligned before time $t$. Note that there are always discontinuous regions, for example, the edges of a raised or depressed part of a smooth surface, or pedestrians or cars appearing unexpectedly. Matching performance will be affected if these discontinuous point clouds are introduced. The laser points within discontinuous regions are removed by setting the maximum allowable variation $th_s$. If the distance between one point and its nearest neighbour is larger than $th_s$, the pair of laser points will be deleted. The local smoothness model in this subsection is established after filtering the jumping laser points at the edges.

Let $z_t = \{z_t^i \mid i = 1, \dots, n\}$ denote the scan measurement at time $t$, where $z_t^i$ is the position of the $i$th point. Relative to the $k$ previous scans $z_{t-1}, \dots, z_{t-k}$, the local smoothness model is defined as follows:

$$p(z_t \mid x_t, x_{t-1}, \dots, x_{t-k}, z_{t-1}, \dots, z_{t-k}) \propto \exp\left(-\tfrac{1}{2}\sum_{i=1}^{n}\left(\left(R\, z_t^i + T - z_{t^-}^i\right)\cdot n_i\right)^2\right) \tag{3}$$

We set $z_{t^-}^i = \arg\min_{z} (z_t^i - z)^T (z_t^i - z)$ with $z \in \{z_{t-1}, \dots, z_{t-k}\}$, picking out the point nearest to $z_t^i$ within the previous $k$ scans. $R$ and $T$ indicate the rotation matrix and the translation vector, respectively. The dot product projects the residual vector onto $n_i$, where $n_i$ is the normal vector of the plane calculated from the points in the neighbourhood of $z_{t^-}^i$. The probability reaches its maximum only when the two vectors are perpendicular, i.e., when the transformed point lies on that plane.

Considering that a 2-D laser sensor is used in our 3-D reconstruction system, lacking the measurement information of the third dimension, we omit the dimension along which the laser range sensor moves forward while calculating the maximum in (3).

2.3 Optimization

After the assumption of measurement and systematic errors, the resulting probabilistic model is proportional to the following product:


$$P \propto p(y_t \mid x_t)\, p(\Delta y_t \mid \Delta x_t)\, p(z_t \mid x_t, x_{t-1}, \dots, x_{t-k}, z_{t-1}, \dots, z_{t-k}) \tag{4}$$

The negative log-likelihood is expressed as follows:

$$E(x_t) = -\log(P) = -\log\left[p(y_t \mid x_t)\, p(\Delta y_t \mid \Delta x_t)\, p(z_t \mid x_t, x_{t-1}, \dots, x_{t-k}, z_{t-1}, \dots, z_{t-k})\right] \tag{5}$$

According to the theory of maximum likelihood estimation, the real pose $x_t$ can be recovered by minimizing $E(x_t)$, as shown below (additive constants are dropped).

$$\min: \; E(x_t) = \tfrac{1}{2}(y_t - x_t)^T A^{-1} (y_t - x_t) + \tfrac{1}{2}(\Delta y_t - \Delta x_t)^T B^{-1} (\Delta y_t - \Delta x_t) + \tfrac{1}{2}\sum_{i=1}^{n}\left(\left(R\, z_t^i + T - z_{t^-}^i\right)\cdot n_i\right)^2 \tag{6}$$

Note that $E(x_t)$ contains a discrete optimization problem, which involves finding the corresponding $z_{t^-}^i$ within the $k$ previous scans (we set $k = 2$ in our work) for each measurement $z_t^i$. In addition, the rotation matrix $R$ makes $E(x_t)$ a non-linear function. This problem is solved by iterating repeatedly until the pose $x_t$ converges to the optimal pose vector $x_t^*$. The solution above is an algorithm that carries out the optimization in an incremental and iterative fashion, where the pose at time $t$ is calculated from the poses and scan measurements at the previous two times, $t-1$ and $t-2$. In addition, the proposed algorithm has low computational complexity, which can satisfy the real-time processing requirement. It should be noted that formulas (1)-(6) are presented by Thrun et al.; the detailed derivation can be found in [19].

3. Sliding-window Approach for Loop-closure Detection

3.1 Loop-closure detection framework

The structure of the framework is illustrated in Figure 2. The input data consist of the newly acquired sequential 2-D laser data $R_k$ and the current pose data $P_k$ of the UAV. In the first step, the raw laser data and pose data are fused and optimized to reconstruct the current 3-D map $M_k$ by the method introduced in Section 2. Noting that $M_k$ contains all the 3-D laser data the UAV has collected until the $k$th moment, a 3-D laser scene $S_k$ is selected by a sliding-window-based algorithm from the long map $M_k$ (i.e., defining the current place the UAV is visiting). For the current scene $S_k$, we extract robust appearance-based features to get the descriptor $D_k$. The scene generation step and the feature extraction step together constitute the ISW-NDT algorithm, which will be introduced later in Section 3.3. Finally, a matching step is conducted to compare the descriptor $D_k$ of the current place with the descriptors $D_1, D_2, \dots, D_{k-1}$ of previously visited places. If $D_k$ is similar to one of the previous descriptors in the reference database, the current scene $S_k$ is considered a loop-closure. When all the steps have been completed, the descriptor $D_k$ is put into the reference database and used for the next matching steps.

As the UAV flies, new places are encountered continuously. The loop-closure detection framework repeatedly runs all the steps above to determine whether a place is a loop-closure or not. Note that the size of the reference database grows incrementally as the UAV flies. However, the framework remains computationally efficient, because the database is updated with appearance descriptors rather than storing the raw 3-D points of the scene.

Figure 2. Loop-closure detection framework

3.2 Appearance descriptor extraction using 3D-NDT


We use a 3D-NDT algorithm to generate the appearance descriptor $D_k$ from the current scene $S_k$. 3D-NDT was presented by Magnusson et al. for 3-D laser-based loop-closure detection. The 3D-NDT algorithm represents a laser scan as Gaussian distributions $N(\mu, \Sigma)$ within a 3-D grid structure $q \in \mathbb{R}^3$, in which every cell uses the parameters of the probability density function (PDF) of $N(\mu, \Sigma)$ to describe the local shape. In practice, the parameters of the PDF are obtained from the covariance matrix $\Sigma_{q_i}$ of the 3-D points in the cell, labelling the local surface shape as spherical, linear or planar with different orientations. The appearance descriptor of the laser scan, which is used for distinguishing loop-closures, is then created from the shape histogram of all cells. In order to make the appearance descriptor invariant to rotation, the 3D-NDT algorithm normalizes the orientation of the 3-D scan to standard orientations calculated from the planar classes, producing multiple normalized scans. For every normalized scan, shape histograms are generated within various metric intervals to make the appearance descriptor invariant to distance. Histograms from all normalized scans and all metric intervals together form the final appearance descriptor.

3.3 Incremental sliding-window-based NDT algorithm

The 3D-NDT algorithm has solved the problem of extracting feature descriptors from a single 3-D scene. However, for the on-the-fly data from the UAV platform, there is no clear definition of a scene. In order to deal with on-the-fly data, this paper proposes the ISW-NDT algorithm, which combines a scene selection mechanism with the 3D-NDT computation. ISW-NDT can not only extract feature descriptors from on-the-fly data, but is also efficient enough to meet the online computing requirement.

In ISW-NDT, a sliding window is applied to complete the scene segmentation online. Similarly to the approach in other fields such as image processing, ISW-NDT uses a window with fixed size $d$ to label the boundaries of a scene. The details of how the sliding window works are illustrated in Figure 3. As the UAV moves, the window slides forward with the distance increment $\lambda$, the newly generated 3-D laser scan lines $L^a$ (scan lines to add) are inserted into the area of the sliding window and the old 3-D scan lines $L^d$ (scan lines to delete) outside the area of the sliding window are deleted. For every step $\lambda$, a new scene is generated from all the scan lines in the window.

Figure 3. Illustration of the insertion and deletion procedure for the sliding window. Left: starting position within the sliding window, represented in red. Right: the UAV moves forward with distance increment λ and the area of the window (in red) overlaps partially with some areas. The scan lines to delete are shaded in green; the scan lines to add are shaded in blue.

Next we examine how to efficiently compute the appearance descriptor for the scene in the current sliding window. The original 3D-NDT algorithm turns out to be time-consuming when trying to ensure rotation invariance of the appearance descriptor. The 3D-NDT algorithm first calculates the NDT grids $G$, then rotates the raw 3-D scene several times and calculates the NDT grids $G'$ a second time. However, since both $G$ and $G'$ preserve the properties of the raw point clouds, there is no need to rotate the enormous raw data of the scene. The ISW-NDT algorithm in this paper rotates the NDT grids $G$ directly to get $G'$ and removes the recalculation of the NDT grids. The algorithm is shown to be effective and efficient in the experiments in Section 4.

The ISW-NDT algorithm uses an incremental storage technique to further increase efficiency. Let $G$ denote the NDT grids of the "area of sliding-window after UAV motion" (i.e., the current scene), $G^a$ the NDT grids of the "scan lines to add" and $G^d$ the NDT grids of the "scan lines to delete". The NDT grids of the current $k$th scene can then be calculated as $G_k = G_{k-1} - G_k^d + G_k^a$, where $G_{k-1}$ represents the NDT grids of the $(k-1)$th scene. For every step the sliding window moves, the algorithm only needs to calculate $G_k^a$ to get $G_k$ ($G_k^d$ is a part of $G_{k-1}$ and need not be calculated in the current step), so that the sliding window works in an incremental way. The details of the whole ISW-NDT algorithm are described in Algorithm 1.

Algorithm 1 The ISW-NDT algorithm
Inputs: 3-D map $M_k$
Output: The descriptor $D_k$ of the scene $S_k$
Initialize $k = 1$, sliding window $W$ at the start point of the UAV, NDT grids $G_0$ = null.
while UAV flies do
    Move $W$ forwards by a step of $\lambda$, generate the current scene $S_k$ from $M_k$, get $L_k^a$ (scan lines to add) and $L_k^d$ (scan lines to delete).
    Get NDT grids $G_k^d$ from $G_{k-1}$ according to $L_k^d$.
    Compute NDT grids $G_k^a$ with $L_k^a$ and get NDT grids $G_k = G_{k-1} - G_k^d + G_k^a$ of the current scene $S_k$.
    Rotate $G_k$ to get $G_k'$ according to the method in [15] to make the result invariant to rotation.
    Compute the appearance descriptor $D_k$ with $G_k'$.
    $k = k + 1$.
end while

3.4 Matching and reference database updating

The matching step compares the current descriptor $D_k$ of $S_k$ with the previous descriptors $D_i \in \{D_1, D_2, \dots, D_{k-1}\}$ of $S_i$ by a difference measure, which uses the Euclidean distance with normalization to describe the difference between two descriptors. A single difference threshold $t_d$ is used to distinguish loop-closures. To be concise, the difference threshold $t_d$ is set to a fixed value based on automatic selection. If the difference measure between $D_k$ and $D_i$ is less than $t_d$, the current scan $S_k$ is considered as overlapping; otherwise it is non-overlapping.

In order to increase the efficiency of the matching step, two strategies are applied to reduce the number of comparisons. The first strategy is based on the fact that the robot cannot possibly close a loop in practical SLAM within several scene sequences. The distance of indexes for $S_k$ and $S_i$ can be described as $|k - i|$ and must be larger than an index threshold $t_x$. The second strategy requires that $S_i$ must be within a specific metric distance threshold (defined as $t_r$) of $S_k$. With these two strategies, only the remaining candidate descriptors are compared to determine whether the current scan is considered a loop-closure. The descriptor $D_k$ is then put into the reference database and used for the next matching steps.

Figure 4. The HOKUYO UTM-30LX-EW 2D laser scanner mounted on a small UAV. The white parts are used for protecting the laser sensors.
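The incremental update of Section 3.3, $G_k = G_{k-1} - G_k^d + G_k^a$, can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the paper: each NDT cell is stored as sufficient statistics (point count, coordinate sum, sum of outer products) so that the contribution of a scan line can be subtracted exactly, the window advances one scan line at a time instead of by a step $\lambda$, and the names `line_stats` and `SlidingNDT` are hypothetical.

```python
from collections import deque

import numpy as np

CELL = 0.5  # NDT cell size q in metres, the value quoted in Section 4.5


def line_stats(points, cell=CELL):
    """Per-cell statistics (count, sum, sum of outer products) of one scan line."""
    pts = np.asarray(points, dtype=float)
    keys = np.floor(pts / cell).astype(int)
    stats = {}
    for key, p in zip(map(tuple, keys), pts):
        n, s, q = stats.get(key, (0, np.zeros(3), np.zeros((3, 3))))
        stats[key] = (n + 1, s + p, q + np.outer(p, p))
    return stats


class SlidingNDT:
    """Keeps the NDT statistics of the last `window_lines` scan lines (assumed design)."""

    def __init__(self, window_lines):
        self.window_lines = window_lines
        self.lines = deque()  # per-line statistics, oldest first
        self.grid = {}        # cell index -> (count, sum, sum of outer products)

    def _apply(self, stats, sign):
        # sign=+1 adds a line (G^a), sign=-1 removes a line (G^d).
        for key, (n, s, q) in stats.items():
            n0, s0, q0 = self.grid.get(key, (0, np.zeros(3), np.zeros((3, 3))))
            n1 = n0 + sign * n
            if n1 == 0:
                self.grid.pop(key, None)
            else:
                self.grid[key] = (n1, s0 + sign * s, q0 + sign * q)

    def push(self, points):
        """Add one new scan line and drop the lines that left the window."""
        stats = line_stats(points)
        self.lines.append(stats)
        self._apply(stats, +1)
        while len(self.lines) > self.window_lines:
            self._apply(self.lines.popleft(), -1)

    def gaussians(self, min_points=3):
        """Mean and sample covariance of every cell with enough points."""
        out = {}
        for key, (n, s, q) in self.grid.items():
            if n >= min_points:
                mu = s / n
                out[key] = (mu, (q - n * np.outer(mu, mu)) / (n - 1))
        return out
```

Only the statistics of the newly added line are computed from raw points; the deleted line is removed by subtracting statistics that were already stored, which mirrors the remark that $G_k^d$ is a part of $G_{k-1}$. Rotating the resulting Gaussians and building the shape histograms are left out of the sketch.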

4. Experimental Results

4.1 3-D laser data set acquisition with UAV platform

In this study, a small UAV with six fixed rotors is used to collect sequential 2-D laser data in a complex urban environment, with a Hokuyo UTM-30LX-EW LRF, PCM-3362 CPU, UBLOX LEA-6h GPS, PX4 optical flow and MAHRS, the configurations of which are listed in Table 1. The point clouds are obtained from the Hokuyo UTM-30LX-EW 2D laser range finder mounted on the UAV, as illustrated in Figure 4. The laser range finder provides the UAV with sequential 2-D scan lines at a rate of 40 Hz with 0.25° resolution, oriented roughly perpendicular to the robot's flight direction with a view of 270°, as shown previously in Figure 1. The UAV is flown under manual control.

In order to evaluate the performance of the algorithm, we chose two routes on the DUT campus, named DUT-UAV1 and DUT-UAV2 respectively, as denoted in Figure 5. In each figure, the route of the UAV is labelled with red lines, of which the pink dashed lines are hand-labelled ground truth. The DUT-UAV data sets cover a typical campus environment, containing common urban structures such as buildings, trees, cars, roads, etc. The details of the DUT-UAV data sets can be found on the following link to the college's website: http://scse.dlut.edu.cn/English/Research/Projects/UAV_laser_Dataset.htm.


| Name of the configuration | Parameter of the configuration |
|---|---|
| Hokuyo UTM-30LX-EW laser scanner | at a rate of 40 Hz; 0.25 degree resolution; a view of 270 degrees; weight 370 g |
| SDI-MAHRS | at a rate of 100 Hz; static angle error of ±0.1 degree; dynamic angle error ±1.0 degree; weight 18 g |
| UBLOX LEA-6h GPS | at a rate of 4 Hz; positioning accuracy 2.5 m CEP; weight 33 g |
| PX4 optical flow | at a rate of 100 Hz; 16 mm M12 lens; weight 32 g |
| PCM-3362 CPU | Intel Atom N450 1.66 GHz; 2 GB memory; weight 580 g |

Table 1. Configurations of the UAV

Figure 5. An aerial view of the DUT-UAV data set; trajectories of the data set are overlaid on Google Earth. The UAV flew along the red lines in the figure with increasing distance. (a) DUT-UAV1 is about 551 m; (b) DUT-UAV2 is about 933 m. The proportion of loop-closures for DUT-UAV1 and DUT-UAV2 is 10.1% and 19.6%, respectively.

4.2 3-D reconstruction results

The goal of optimization in Section 2.3 is to minimize the objective function $E(x_t)$ without taking the geometry of the actual environment into consideration. However, as mentioned in Section 2.2, there may be some discontinuous regions in the scene. Once $E(x_t)$ falls into a local minimum, the scan line will be aligned erroneously. In our experiment, we set the maximum jumping distance $th_s = 0.2$, which achieves good performance.

We tested the performance of the 3-D reconstruction algorithm in a variety of structured or semi-structured environments on the DUT campus. Some typical experimental results are provided in Figure 6, which are randomly selected from the DUT-UAV data set, including the 3-D reconstruction results of a vertical wall scene, a slope scene and a tennis court scene. It should be noted that the following 3-D reconstruction results are obtained while omitting the vibration of roll.

Figure 6. Three groups of 3-D reconstruction results selected from the DUT-UAV data set, including structured or semi-structured environments. (Top) raw point clouds; (Bottom) the same point clouds after 3-D reconstruction. The execution time for Fig. 6(a), 6(b) and 6(c) is 78.61 s, 81.02 s and 188.43 s, respectively.

4.3 Timing analysis of 3-D reconstruction

As stated above, the Hokuyo LRF works at a rate of 40 Hz with 0.25° resolution; in other words, each scan line contains 1,081 points. However, there is no need to use so many points in practical applications, especially for smaller scenes or other LRFs with lower resolution.

Figure 7. A group of down-sampling results of the 3-D reconstruction algorithm. (a) raw 3-D point clouds; (b) the 3-D reconstruction result of the same scene without down-sampling; (c) the 3-D reconstruction result of the same scene by one-third down-sampling; (d) the 3-D reconstruction result of the same scene by one-fifth down-sampling.

| Label of scene | Num. of lines | Time (s) without down-sampling | Time (s) by 1/3 down-sampling | Time (s) by 1/5 down-sampling |
|---|---|---|---|---|
| Fig. 6(a) | 2,850 | 78.61 | 29.15 | 21.49 |
| Fig. 6(b) | 2,470 | 81.02 | 31.36 | 22.11 |
| Fig. 6(c) | 5,877 | 188.43 | 72.03 | 50.44 |

Table 2. Execution Time by Different Down-Samplings
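As a rough cross-check of Table 2, the per-line processing cost can be compared with the 25 ms interval between scan lines at 40 Hz. The sketch below is only an illustration: the choice of averaging the per-scene means and the use of the scan period as an online budget are assumptions made here, not statements from the paper.

```python
# Values copied from Table 2:
# (number of scan lines, seconds without down-sampling, by 1/3, by 1/5).
TABLE_2 = {
    "Fig. 6(a)": (2850, 78.61, 29.15, 21.49),
    "Fig. 6(b)": (2470, 81.02, 31.36, 22.11),
    "Fig. 6(c)": (5877, 188.43, 72.03, 50.44),
}

SCAN_RATE_HZ = 40.0
BUDGET_MS = 1000.0 / SCAN_RATE_HZ  # time between two scan lines


def mean_ms_per_line(column):
    """Average over the three scenes of (execution time / number of lines), in ms."""
    per_scene = [1000.0 * row[column] / row[0] for row in TABLE_2.values()]
    return sum(per_scene) / len(per_scene)


for label, column in (("full", 1), ("1/3", 2), ("1/5", 3)):
    ms = mean_ms_per_line(column)
    print(f"{label:>4}: {ms:5.1f} ms per line, within budget: {ms <= BUDGET_MS}")
```

Under this averaging the sketch gives roughly 30.8 ms, 11.7 ms and 8.4 ms per line, so only the two down-sampled settings stay below the 25 ms scan period on the test machine.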

We introduce a down-sampling strategy and test its effect on 3-D reconstruction performance. The laser points are down-sampled by one-third and one-fifth, respectively. The execution times for the scenes in Figure 6 are summarized in Table 2. The runtime analysis is carried out on a laptop with a Core i5-4200 CPU and 4 GB of RAM, running Windows 8. Down-sampling clearly gives a remarkable reduction in processing time: the average time per line with no down-sampling, one-third down-sampling and one-fifth down-sampling is 30.8 ms, 11.7 ms and 8.3 ms, respectively. Meanwhile, a group of 3-D reconstruction results for the same scene (Figure 6(a)) under different down-samplings is shown in Figure 7, from which it can be seen that down-sampling has a negligible effect on 3-D reconstruction performance. Considering the online requirement in the experiment, we finally use the reconstruction results from one-third down-sampling for the following loop-closure detection approach, where each scan line contains 361 points.

4.4 Loop-closure detection evaluation

In general, there are two types of evaluation method to judge the discrimination ability of 3D-NDT: Full Evaluation and SLAM Scenario [15]. Considering the practicability in our application, we assess the efficiency of the ISW-NDT algorithm in the SLAM scenario. A scan Si is labelled as a loop closure only if the distance from Si to the most similar scan Si’ is less than the distance threshold tr and, simultaneously, the difference between the two scans is below the difference threshold td. The algorithm is evaluated in terms of precision rate (P) and recall rate (R), which are defined as

\[ \mathrm{Precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \tag{7} \]

\[ \mathrm{Recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}} \tag{8} \]
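The two definitions in Eqs. (7) and (8) can be written as a short sketch. The function name `precision_recall` and the counts in the usage lines are illustrative assumptions, not results from the experiments.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) following Eqs. (7) and (8).

    Raises ValueError when a denominator is zero, since the rate is
    undefined in that case (a design choice of this sketch).
    """
    detected = true_pos + false_pos      # everything reported as a loop closure
    ground_truth = true_pos + false_neg  # everything that really is a loop closure
    if detected == 0 or ground_truth == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return true_pos / detected, true_pos / ground_truth


# Hypothetical counts: 40 correct detections, no false alarms, 10 misses.
precision, recall = precision_recall(true_pos=40, false_pos=0, false_neg=10)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

With no false positives the precision is 100% regardless of how many loop closures are missed, which is why the recall at 100% precision is the figure of interest here.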

The true positives and false negatives together constitute the ground truth, which is obtained in our work by manually labelling online-generated 3-D laser scans as either “overlapping” or “non-overlapping”. The labelling is done for a series of individual scans instead of all combinations of scan pairs, because labelling every pair is impractical once the data set contains hundreds of scenes, and some scan pairs are also not easy to judge. Precision and recall are important characteristics for any detection problem and, generally, it is difficult to achieve a high precision rate and a high recall rate simultaneously. For the loop detection problem in the SLAM application, even a single false positive (i.e., a non-overlapping scan mistakenly considered overlapping) has a destructive effect on the subsequent global optimization. Thus the priority in this work is to keep precision at 100% while maximizing the recall rate.


4.5 Parameters

The following parameters of the proposed appearance descriptor are used for the loop-closure detection experiments; their values were chosen empirically:

• NDT cell size q = 0.5 m
• Sliding window size sw = 300 laser lines
• The distance threshold tr = 30 m
• The index threshold tx = 3000/λ
• The difference threshold td = 0.051

The NDT cell size q is chosen mainly based on the configuration of the LRF. If the grid is too small, the scanner noise will dominate the appearance descriptor, especially for sparsely distributed point clouds at the farther parts of a scan. On the other hand, if the grid is too large, it is unable to describe the details of the point clouds accurately. The experimental results show that q = 0.5 m works well for ISW-NDT loop-closure detection on our UAV platform equipped with a Hokuyo 2-D LRF.

In addition, the sliding-window size sw is also an empirical parameter, expressed as a number of laser lines. Using 300 laser lines per scene produced good results in our experiment. Another choice is to generate scenes by fixed distance, which is less convenient for the sliding-window approach. Note that our UAV flies under manual control at a fixed speed, so line-based and distance-based scene generation produce almost the same performance.

As mentioned above, this study requires that the distance between the two scans being compared is below the distance threshold tr. Cummins and Newman [8] use a 40-m threshold for a trajectory of about 100 km, which is too large for our data set, while the work in [15] uses a 2.6-m threshold for a trajectory of 111 m. Almost all the loop closures in our work can be detected when the distance threshold tr is set to 30 m. For our DUT-UAV data set, the distance is less than 15 m for 98% of the detected scans and within 5 m for 88% of them.
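The acceptance rule from Section 4.4, combined with the threshold values listed here, can be sketched as follows. The `Match` container and the function name `is_loop_closure` are hypothetical, and the sample matches are made-up values for illustration only.

```python
from dataclasses import dataclass

T_R = 30.0    # distance threshold tr in metres (value from the parameter list)
T_D = 0.051   # difference threshold td (value from the parameter list)


@dataclass
class Match:
    """Best candidate found for the current scan (hypothetical container)."""
    distance_m: float   # distance between the scan and its most similar scan
    difference: float   # appearance difference between the two scans


def is_loop_closure(match, t_r=T_R, t_d=T_D):
    """Accept only if both the distance and the difference are below threshold."""
    return match.distance_m < t_r and match.difference < t_d


# Made-up examples: close and similar, close but dissimilar, similar but far.
for m in (Match(4.2, 0.030), Match(4.2, 0.080), Match(45.0, 0.030)):
    print(m, "->", is_loop_closure(m))
```

Both conditions must hold at once, so a very similar scan that lies beyond tr is still rejected.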
A minimum loop size has also been applied in our practical SLAM experiment, based on the fact that the UAV cannot return to a previously visited place within only a few steps. When finding the most similar scan of Sk, we only compare the scans that are more than tx steps away in the scan sequence. The index threshold tx is defined as 3000/λ, where λ is the slide increment. When λ = 300 (equal to the size of the sliding window), tx takes the fixed value of 10 steps; once λ changes, tx varies with λ accordingly. Moreover, in order to discuss the ability to detect “loops” as λ changes, the difference threshold td is set to a fixed value according to the automatic threshold selection in [15]. We use the DUT-UAV1 data set to obtain td = 0.051, and the DUT-UAV2 data set is then used for performance testing as discussed below.

The slide increment λ, which is not a fixed parameter in our experiment, has a great effect on the performance of ISW-NDT loop-closure detection, so it is of vital importance to find a good value for it. Using a too-small value greatly increases the number of scenes as well as the number of comparisons between scenes; thus the expected time increases with the number of scenes. Conversely, a too-large value causes an evident reduction in recall rate. For example, as λ increases, a place captured in one scene may be split across several scenes when the same place is revisited, losing some distinctive features and making the difference between these scans larger than td; in other words, a false negative is introduced. Figure 8 illustrates the influence of the slide increment, showing how the precision-recall curves and the elapsed time change for the DUT-UAV2 data set as λ varies.

Figure 8. (a) The precision and recall curves as λ varies. (b) The average elapsed time as λ varies.

As shown in Figure 8(a), the difference threshold td can guarantee a precision of 100% in most conditions as λ changes. Yet, once λ is less than 50, the available td produces a false positive, which can result in mistakes for subsequent mapping in SLAM. Meanwhile, the recall keeps increasing as λ is reduced, with an obvious rise when λ ≤ 150. This can be explained by the sliding-window size of 300 laser lines: once the slide increment λ is larger than half of the window, some scenes generated from the same revisited place overlap by less than 50%, causing the drop in recall. In addition, the maximum recall rate for DUT-UAV2 is 84.7% at 100% precision, rising to 86.9% at 99% precision.

Figure 8(b) shows the average elapsed time per comparison and per scan. The average elapsed time per comparison is almost unchanged at about 100 ms. In contrast, the average elapsed time per scan keeps increasing as λ decreases. This is mainly because the candidate database expands as the UAV flies forward; thus the current scan Sk needs to be compared with more candidate scans.

When λ ≥ 40, the average elapsed time per scan is less than 630 ms, which can generally meet the online requirement. Considering the effectiveness for SLAM application as well as the efficiency for online operation, λ = 50 is finally chosen as the most appropriate value for detecting loops in the experiment, where the recall rate is 84.7% at 100% precision and the elapsed time per scan is about 510 ms.
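The roles of the window size sw, the slide increment λ and the index threshold tx can be made concrete with a minimal sketch. All function names are hypothetical, the window bookkeeping is an assumed implementation rather than the authors' code, and the 1,000-line sequence in the usage lines is a made-up example.

```python
def sliding_windows(num_lines, window=300, lam=50):
    """Yield (start, end) line-index ranges (end exclusive) of successive scenes."""
    if window <= 0 or lam <= 0:
        raise ValueError("window and lam must be positive")
    start = 0
    while start + window <= num_lines:
        yield (start, start + window)
        start += lam


def index_threshold(lam):
    """Index threshold tx = 3000 / lambda, in steps."""
    return 3000 / lam


def candidate_indices(k, lam):
    """Earlier scans that are more than tx steps away from scan k."""
    tx = index_threshold(lam)
    return [j for j in range(k) if k - j > tx]


def overlap_fraction(window, lam):
    """Fraction of laser lines shared by two consecutive windows."""
    return max(0.0, 1.0 - lam / window)


def slide_period_s(lam, scan_rate_hz=40.0):
    """Time needed to acquire lam new laser lines at the LRF scan rate."""
    return lam / scan_rate_hz


scenes = list(sliding_windows(num_lines=1000, window=300, lam=50))
print(len(scenes), scenes[0], scenes[-1])
print(index_threshold(300), index_threshold(50))
print(overlap_fraction(300, 150), slide_period_s(50))
```

In this sketch a smaller λ yields more scenes and a larger tx, so each new scan has more candidates to compare against, while a λ above half the window drops the overlap of consecutive scenes below 50%.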

4.6 Experimental results of loop-closure detection

The loop-closure performance using λ = 50 for the DUT-UAV2 data set is visualized in the maps shown in Figure 9. Due to the strategy of one-third down-sampling, each scan line now contains 361 points. In this case, each scene covers 300 scan lines, containing 108,300 points in total. The whole data set contains 809 scans (with a 270-degree field of view) and covers a trajectory of about 933 m. The recall rate at 100% precision is 84.7% at td = 0.051, with 24 false negatives (15.3% of the 159 overlapping scans). Of the omissions (labelled in green in Figure 9), the scans at locations A, C and D are missed because they are from the corners of buildings, where the scan lines are divergent. The scans at location B are from an open environment, making it difficult to obtain usable appearance descriptors. In addition, the stretch A-C is revisited in the opposite direction, which suggests that the ISW-NDT algorithm is robust to viewpoint changes.

The ISW-NDT algorithm has also been compared with the 3D-NDT algorithm, which is the state-of-the-art loop-closure detection method used for UGVs. The comparison between the 3D-NDT and ISW-NDT algorithms on the DUT-UAV data sets (the laser data are collected while the UAV is flying) is summarized in Table 3, where an obvious improvement can be seen in both timing and recall rate. Moreover, if the 3-D reconstruction approach is not applied beforehand, the recall rate for the DUT-UAV2 data set with λ = 50 is about 75.6% at 100% precision, which is about 9 percentage points lower than the loop detection results above.

                     DUT-UAV1                              DUT-UAV2
Algorithm            Timing per   Recall   Prec.           Timing per   Recall   Prec.
                     scan (ms)    (%)      (%)             scan (ms)    (%)      (%)
3D-NDT algorithm     2,840        45.5     100             2,910        42.2     100
ISW-NDT algorithm    498          83.3     100             510          84.7     100
(λ = 50)

Table 3. Comparison Between 3D-NDT And ISW-NDT



Figure 9. Loop-closure maps for the DUT-UAV2 data set with λ=50. The ground truth is shown in pink, while the green colour denotes the false negatives (loop closures that are not correctly detected). A total of 135 loop closures are detected, with no false positives.

4.7 Timing of loop-closure detection

The runtime analysis of the loop-closure detection approach was carried out on a laptop with a Core i5-4200 CPU and 4 GB of RAM, running Windows 8. For the DUT-UAV2 data set, there are 108,300 laser points in each scene and the whole data set contains 809 scans when setting λ = 50. As shown in Figure 8, with the slide increment λ = 50, there are 4,740 comparisons in total and each comparison costs about 100 ms. The ISW-NDT algorithm costs about 413 s in total and an average of 510 ms per scan to complete the loop-detection task (including generating histograms and computing the similarity between two scans). As can be seen in Table 3, the 3D-NDT algorithm proposed by Magnusson et al. [15] has also been run on the DUT-UAV2 data set, where it costs about 2.9 s per scan for loop-closure detection. The proposed ISW-NDT algorithm is almost six times faster than their work.

Taking the time of the 3-D reconstruction into consideration, if we set λ = 50, it takes about 600 ms for each slide to complete the 3-D reconstruction task. Thus the whole process of loop-closure detection costs about 1.1 s for each slide of the window. On the other hand, the LRF used in this study works at 40 Hz with 0.25° resolution, so it takes 1.25 s to slide forward by 50 laser lines. Hence, the time cost of the loop-closure detection task can well ensure its online implementation.

5. Conclusion and Future Works

This paper presented the ISW-NDT algorithm to solve a UAV’s loop-closure detection problem using sequential 2-D laser data. Considering the measurement error of UAVs, a 3-D reconstruction approach is first used to build accurate 3-D maps, which converts the mapping problem into the optimization of a probabilistic model. In order to solve the loop-closure detection problem with sequential 2-D laser data, the ISW-NDT approach is then proposed, which can detect revisited places effectively by sliding a window with fixed size. Experimental results have shown that the presented approach can achieve high recall rates at 100% precision in large-scale outdoor environments. With the strategy of one-third down-sampling, the total time for 3-D reconstruction and loop-closure detection is about 1.1 s for a scene with 108,300 points, which can well ensure the online implementation.


To further improve the performance of this approach, future work should focus on improving the validity of the 3-D reconstruction algorithm, since the reconstruction can at present only be done with a piecewise strategy (if the heading direction of the UAV changes, the reconstruction algorithm needs initialization). In addition, further work should consider dynamic disturbances, such as cars or pedestrians passing by in the scene.

6. Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant No. 61305128, 61375088) and the State Key Laboratory of Robotics (Grant No. 2013-O07).

7. References

[1] Zhang X, Chen J, Xin B, et al. Online path planning for UAV using an improved differential evolution algorithm [C]//The 18th IFAC World Congress. 2011, 18.
[2] Kumar V, Michael N. Opportunities and challenges with autonomous micro aerial vehicles [J]. The International Journal of Robotics Research, 2012, 31(11): 1279-1291.
[3] Piciarelli C, Micheloni C, Martinel N, et al. Outdoor Environment Monitoring with Unmanned Aerial Vehicles [M]//Image Analysis and Processing–ICIAP 2013. Springer Berlin Heidelberg, 2013: 279-287.
[4] Nex F, Remondino F. UAV for 3D mapping applications: a review [J]. Applied Geomatics, 2014, 6(1): 1-15.
[5] Newman P, Cole D, Ho K. Outdoor SLAM using visual appearance and laser ranging [C]//Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on. IEEE, 2006: 1180-1187.
[6] Angeli A, Doncieux S, Meyer J A, et al. Real-time visual loop-closure detection [C]//Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on. IEEE, 2008: 1842-1847.
[7] Cummins M, Newman P. FAB-MAP: Probabilistic localization and mapping in the space of appearance [J]. The International Journal of Robotics Research, 2008, 27(6): 647-665.
[8] Cummins M, Newman P. Appearance-only SLAM at large scale with FAB-MAP 2.0 [J]. The International Journal of Robotics Research, 2011, 30(9): 1100-1123.
[9] Kawewong A, Tongprasit N, Tangruamsub S, et al. Online and incremental appearance-based SLAM in highly dynamic environments [J]. The International Journal of Robotics Research, 2011, 30(1): 33-55.
[10] Liu Y, Zhang H. Indexing visual features: Real-time loop closure detection using a tree structure [C]//Robotics and Automation (ICRA), 2012. IEEE International Conference on. IEEE, 2012: 3613-3618.
[11] Heng L, Honegger D, Lee G H, et al. Autonomous visual mapping and exploration with a micro aerial vehicle [J]. Journal of Field Robotics, 2014, 31(4): 654-675.
[12] Granstrom K, Callmer J, Ramos F, et al. Learning to detect loop closure from range data [C]//Robotics and Automation, 2009. ICRA 2009. IEEE International Conference on. IEEE, 2009: 15-22.
[13] Granström K, Schön T B, Nieto J I, et al. Learning to close loops from range data [J]. The International Journal of Robotics Research, 2011, 30(14): 1728-1754.
[14] Biber P, Straßer W. The normal distributions transform: A new approach to laser scan matching [C]//Intelligent Robots and Systems, 2003. IROS 2003. Proceedings. 2003 IEEE/RSJ International Conference on. IEEE, 2003, 3: 2743-2748.
[15] Magnusson M, Andreasson H, Nüchter A, et al. Automatic appearance-based loop detection from three-dimensional laser data using the normal distributions transform [J]. Journal of Field Robotics, 2009, 26(11-12): 892-914.
[16] Steder B, Grisetti G, Burgard W. Robust place recognition for 3D range data based on point features [C]//Robotics and Automation, 2010. ICRA 2010. IEEE International Conference on. IEEE, 2010: 1400-1405.
[17] Crocoll P, Caselitz T, Hettich B, et al. Laser-aided navigation with loop closure capabilities for Micro Aerial Vehicles in indoor and urban environments [C]//Position, Location and Navigation Symposium-PLANS 2014, 2014 IEEE/ION. IEEE, 2014: 373-384.
[18] Tipaldi G D, Arras K O. Flirt-interest regions for 2d range data [C]//Robotics and Automation, 2010. ICRA 2010. IEEE International Conference on. IEEE, 2010: 3616-3622.
[19] Thrun S, Diel M, Hähnel D. Scan alignment and 3D surface modeling with a helicopter platform [C]//The 4th International Conference on Field and Service Robotics, 2003.

