Generalized Simultaneous Localization and Mapping (SLAM) on Graphs with Multimodal Probabilities and Hyperedges by
Max Pfingsthorn
A thesis submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Computer Science
Approved, Dissertation Committee
Prof. Dr. Andreas Birk
Prof. Dr. Kaustubh Pathak
Prof. Dr. Udo Frese
Date of Defense: 31. May 2013
School of Engineering and Science
I would like to thank my wife, Joanna, for her never-ending support. Without you, this thesis would never have happened. Of course, my thanks also go to my parents for their continuous support, and for encouraging me to attend Jacobs University in the first place. I would also like to thank my supervisor, Prof. Andreas Birk, for offering his guidance whenever I needed it and for allowing me to explore my scientific self when I wanted to. My gratitude also goes to my PhD committee members, Prof. Kaustubh Pathak and Prof. Udo Frese. Thank you for taking the time to read this hopefully coherent account of my scientific work over the last few years. And last, but by no means least, a great thank you to my fellow lab mates and all the friends I made on campus; you know who you are. Thank you very much for making working and living here fun and rewarding.
Abstract
Simultaneous Localization and Mapping (SLAM) is one of the cornerstones of robotics research. Any mobile robot that is to operate in a previously unknown area requires a method for estimating both a model of its new surroundings and its location within them. Even stationary robots that have to process previously unknown objects require a method to model these objects as they are detected. SLAM methods offer solutions for these situations, utilizing varying sensors, robot models, and estimation techniques. This thesis focuses on Graph-based SLAM methods, in which sensor observations are related to one another by spatial constraints, forming a network. Numerical optimization methods are used to estimate the most likely global configuration of observations, which, when merged, represents the model of the environment or object in question. At the same time, the original graph topology and sensor observations are kept intact and remain reusable for further mapping operations. The thesis consists of two main parts. First, several contributions to traditional Graph-based SLAM research are discussed. Novel uncertainty estimation techniques for 2D and 3D spectral registration methods are described that allow such methods to be used in Graph-based SLAM. Spectral registration methods are especially robust to noise, and their computational requirements depend only on the resolution of the data, not on its structure. Additionally, novel approaches to multi-robot Graph-based SLAM under communication constraints are described, including a formal description of the underlying graph structure and techniques to optimize the use of available bandwidth. Second, novel contributions in the field of robust optimization techniques for Graph-based SLAM are presented and collected in the description of the Generalized Graph SLAM framework.
Specifically, the two major sources of errors in traditional Graph-based SLAM are addressed. The first is the presence of multiple local optima in the registration cost function (local ambiguity), which can severely impact the performance of traditional methods; in the Generalized Graph SLAM framework, such optima are represented as multimodal probability distributions within the spatial constraints. The second major source of errors lies in place recognition methods, which are needed to improve the map by relating current sensor observations to much older ones. Significant work has been done to eliminate false positives, usually at the cost of false negatives. When a repetitive environment gives rise to multiple independent places in the map that fit the current sensor observation (global ambiguity), traditional methods either discard such ambiguous results, if they can be detected, or diverge fatally. In order to take full advantage of even such globally ambiguous cases, they are represented as hyperedges in the Generalized Graph SLAM framework. Multiple estimation methods that are applicable to these extended graph structures are described. Additionally, a method to generate multimodal registration results is presented. All contributions are further substantiated with extensive experimental results using both synthetic and real-world data sets. Synthetic data sets are used for systematic analysis of the involved parameters and for comparison with ground truth. Results on real-world data sets show the applicability and effectiveness of the respective methods in realistic scenarios.
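To make local ambiguity concrete, the following toy sketch in Python (with NumPy) builds a hypothetical 1D pose graph: three odometry edges plus one loop closure whose registration result is a two-component mixture of Gaussians. It contrasts two strategies loosely in the spirit of the "Max" and "Exhaustive" reduction methods named in the table of contents: committing to the most heavily weighted mode before optimizing, versus optimizing one unimodal graph per mode and keeping the most probable result. The 1D setting, all numbers, and all function names are illustrative assumptions; this is not the implementation used in the thesis.

```python
import math

import numpy as np

# Hypothetical 1D pose graph; pose x0 is held fixed at 0.
# A unimodal edge (i, j, mean, sigma) encodes x_j - x_i ~ N(mean, sigma^2).
ODOMETRY = [(0, 1, 1.0, 0.1), (1, 2, 1.0, 0.1), (2, 3, 1.0, 0.1)]
# One multimodal loop closure on x3 - x0, given as mixture components
# (weight, mean, sigma). The more heavily weighted mode is the wrong one.
LOOP = (0, 3, [(0.6, 2.0, 0.1), (0.4, 3.0, 0.1)])
N_POSES = 4


def gauss_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def solve(edges):
    """Weighted linear least squares over x1..x3 for a unimodal 1D pose graph."""
    rows, rhs = [], []
    for i, j, mean, sigma in edges:
        row = np.zeros(N_POSES)
        row[j] += 1.0 / sigma
        row[i] -= 1.0 / sigma
        rows.append(row)
        rhs.append(mean / sigma)
    A = np.array(rows)[:, 1:]  # drop the column of the fixed pose x0 = 0
    x_free, *_ = np.linalg.lstsq(A, np.array(rhs), rcond=None)
    return np.concatenate(([0.0], x_free))


def log_prob(x):
    """Log-likelihood of poses x, evaluating the loop closure as a full mixture."""
    lp = sum(math.log(gauss_pdf(x[j] - x[i], m, s)) for i, j, m, s in ODOMETRY)
    i, j, comps = LOOP
    lp += math.log(sum(w * gauss_pdf(x[j] - x[i], m, s) for w, m, s in comps))
    return lp


def with_component(k):
    """Unimodal graph in which the loop closure keeps only mixture component k."""
    i, j, comps = LOOP
    _, m, s = comps[k]
    return ODOMETRY + [(i, j, m, s)]


components = range(len(LOOP[2]))
# Max-style: commit to the most heavily weighted mode, then optimize once.
k_max = max(components, key=lambda k: LOOP[2][k][0])
x_max = solve(with_component(k_max))
# Exhaustive-style: optimize once per mode, keep the most probable solution.
x_exh = max((solve(with_component(k)) for k in components), key=log_prob)

print("max:       ", np.round(x_max, 3), round(log_prob(x_max), 3))
print("exhaustive:", np.round(x_exh, 3), round(log_prob(x_exh), 3))
```

With these assumed numbers, the heavily weighted mode (mean 2.0) contradicts the odometry, so the Max-style solution compresses the trajectory to x3 = 2.25, whereas the exhaustive search recovers x3 = 3.0 with a much higher log-likelihood. Because the exhaustive search optimizes one unimodal graph per combination of components, its cost grows with the product of the component counts of all multimodal edges.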
Contents
List of Figures
List of Tables
Glossary
Part I Introduction
1 Motivation
1.1 Simultaneous Localization and Mapping
1.2 Multi-Robot Simultaneous Localization and Mapping
1.3 Local and Global Ambiguities in Simultaneous Localization and Mapping
1.4 The Research Question
1.5 Structure of this Thesis
1.6 Note on the Timeline of Publications on Robust Pose Graph SLAM
2 Graph-based Simultaneous Localization and Mapping
2.1 The Pose Graph Map Data Structure
2.2 General Cost Function for Pose Graph Optimization
2.3 Multi-Robot Pose Graph SLAM
2.3.1 The Multi-Robot Pose Graph Map Data Structure
2.3.2 Map Merging is Equivalent to Loop Closing
2.3.3 What Data is Necessary to Build a Shared Map?
2.4 Parts of a Working Pose Graph SLAM System
2.4.1 The Front-End
2.4.2 Constraint Generation Methods
2.4.3 Data Transmission Methods
2.4.4 The Back-End
2.4.5 Final Map Rendering
2.5 A Quality Metric for Experimental Map Evaluation
3 Local and Global Ambiguities in Graph-based Methods
3.1 Motivation
3.1.1 Local Ambiguity in Practice
3.1.2 Global Ambiguity in Practice
3.2 The Generalized Graph SLAM Framework
3.2.1 Using Multimodal Mixtures of Gaussians to Represent Local Ambiguity
3.2.2 Using Hyperedges to Represent Global Ambiguity
3.2.3 Properties of Mixtures of Gaussians
3.2.4 Generalized Graph SLAM Complexity Metric
Part II Applications of Pose Graph SLAM
4 Uncertainty Estimation of Registration Results for the Use in Pose Graph SLAM
4.1 Motivation
4.2 Estimating Mean and Covariance for Spectral Image Registration
4.2.1 The iFMI image registration algorithm
4.2.2 Uncertainty Information from Spectral Image Registration
4.3 Estimating Mean and Covariance for Spectral Registration without Scale
4.3.1 A 2D Scan Matching Variant of the iFMI Image Registration Algorithm
4.3.2 Post-processing of the Translation Parameters
4.4 Estimating Mean and Covariance for Spectral Voxel Grid Registration
4.5 Experimental Results
4.5.1 2D Affine iFMI Mapping
4.5.1.1 Cold Corals
4.5.1.2 Rocky Testing Pool
4.5.1.3 Real-World ROV Data
4.5.2 2D Spectral Scan Matching
4.5.2.1 Real-World 2D Sonar Range Scans
4.5.3 3D iFMI Voxel Grid Mapping
4.5.3.1 Simulated 3D Sonar Range Scans
4.5.3.2 Real-World 3D Sonar Range Scans
5 Multi-Robot Graph-based SLAM
5.1 Bandwidth Advantages of the Multi-Robot Pose Graph
5.1.1 Transmission
5.1.2 Analysis
5.1.3 Experiments
5.1.4 Discussion
5.2 Prioritizing Data Transfers Based on Map Estimates for Limited Bandwidth Environments
5.3 Results in Severely Bandwidth Limited Environments
5.3.1 Realistic Underwater Acoustic Modem Data Rates
5.3.2 Message Definitions
5.3.3 Underwater Multi-Robot Image SLAM in 2D
5.3.4 Underwater Multi-Robot Full 3D SLAM
5.3.4.1 Real 3D Sonar Data
5.3.4.2 A Simulated Team of Four Robots
Part III Solving Local and Global Ambiguity with Generalized Graph SLAM
6 Optimization Methods for the Generalized Graph SLAM Framework
6.1 Particle-Based Optimization
6.2 Trust-Region Newton
6.3 Reduction Methods
6.3.1 Basic methods: Exhaustive, Max, and Multi-Edge
6.3.2 MoG-only Prefilter
6.3.3 Generalized Prefilter for Hypergraphs
6.3.4 Optimizing the resulting unimodal pose graph
7 Results of Systematic Experiments on Local Ambiguity with Synthetic Datasets
7.1 Generation of Random Multimodal Graphs
7.2 Optimization Results over Increasing Levels of Complexity
7.2.1 The Influence of Good Odometry Estimates
8 Results on Local Ambiguity from Real World Datasets
8.1 Generating Mixture of Gaussian Estimates from Registration Methods
8.1.1 Mixture of Gaussian Results from Plane Registration
8.2 Bremen City Center Dataset
8.3 Hannover Fair Dataset
9 Results of Experiments on Global and Local Ambiguity using the Full Generalized Graph SLAM Framework
9.1 Systematic Evaluation with Synthetic Data
9.2 Real World Dataset: Bremen City Center
Part IV Conclusion
10 Conclusions
10.1 Significant Contributions of this Thesis
10.1.1 The Generalized Graph SLAM Framework
10.1.2 Mixture of Gaussian Results from Plane-based Registration
10.1.3 The Prefilter Method
10.1.4 The Trust-Region Newton Method
10.1.5 Multi-Robot Pose Graph Transmission Methods
10.1.6 Uncertainty Analysis of Spectral Registration Methods
10.2 Summary of Answers to the Research Question
Bibliography
List of Figures
1.1 Sense-Plan-Act cycle adapted to Sense-Estimate-Move for Simultaneous Localization and Mapping (inspired by [159])
1.2 A visualization of an ambiguous registration process. Structural ambiguity in the environment can be one source of registration errors, here, for example, in the form of the crossing of two corridors. Two scans (bottom left) are to be registered; one scan contains the corridor to the North, while the other includes the East corridor after the robot turned to the right. When a simple correlation-based method is used to match the two scans, multiple maxima are immediately visible in the resulting parameter space (top left). Some registration results corresponding to the local maxima are shown on the right, ordered by quality. However, in this case, A is not the globally correct result; C is. (published in [140])
2.1 A schematic depiction of a pose graph. The vertices in the graph correspond to poses where sensor observations were made. Matching sensor observations give rise to constraints on the edges. Sequential edges follow the trajectory; non-sequential edges “close loops”.
2.2 A schematic depiction of a multi-robot pose graph. Inter-robot edges join previously disconnected components in the pose graph, one per robot.
4.1 Example results from the iFMI image registration method.
4.2 Two matched pairs before and after scan matching. The left pair matches well. Systematic errors in the sonar sensor, such as the “bow-tie” shape of straight walls shown on the right, give rise to features that make some pairs harder to match.
4.3 This Dirac spectrum shows multiple possible translations (circles) for the right pair in Figure 4.2. Note the distinct “V” shape of possible positions, corresponding directly to sliding overlaps in the scans.
4.4 Two sequential scans with a successful registration result from the flood gate data set. The top row shows the scans before and after matching. The bottom row shows the corresponding POMF results with matched normal distributions. The PNR values are high: yaw 0.9608, roll/pitch 29.6166, translation 37.3184. Note that the probability mass functions are described well by normal distributions.
4.5 Two non-sequential scans from the flood gate data set with low overlap, leading to an unsuccessful matching result. The top row shows the scans before and after matching. The bottom row shows the corresponding POMF results with matched normal distributions. The PNR values are low: yaw 0.1955, roll/pitch 0.0505, translation 0.0279. Note that the probability mass functions are not described well by normal distributions, since the matching was unsuccessful.
4.6 Short sequence of three images from the cold water corals data set. The detail at the bottom shows the merged image before optimization, and the one above shows the much clearer structures of the optimized image map. The corresponding pose graph, with a slightly misaligned top, is shown on the right. Note the global covariance matrices plotted as ellipses in magenta.
4.7 Complete image map generated from the testing pool data set, before (left) and after (right) graph optimization. Note the blurred lower left part of the map before optimization and the misalignment of two rocks (arrows). After optimization, the map is visibly improved and the rocks are aligned properly.
4.8 Detail of the first and last frame of the testing pool image map, before and after pose graph optimization.
4.9 Pose graph that represents the map shown in Figure 4.7, before (top) and after (bottom) optimization with the TORO graph optimization library. It consists of 35 vertices and 57 edges. The starting pose is green, constraint edges from registration are blue, and vertices that are projected to multiple poses because of accumulated errors in a loop are shown in red. Note that these conflicts are resolved after optimization.
4.10 The Romeo ROV used in the experiments. The bottom image shows the monocular camera assembly.
4.11 Rendered Photomosaic Maps
4.12 Details of a small sea mount structure in the map. Note the significant improvement when using spectral registration and subsequent graph optimization (right). Note the obvious image seams pointed out by arrows. These seams almost completely vanish after optimization.
4.13 High-resolution crop of the start area. Note the gross error of the map without loops. These errors are corrected after optimization with loops.
4.14 Complete Maps
4.15 Examples of successful registration results. Note that the rotation is greater than 90 degrees in the two examples on the left. Top right: one wall is missing from one scan, but the scan matcher still succeeds.
4.16 Two simulated sonar scans as point clouds before (top left) and after (top right) registration. The bottom row shows the resulting PMFs and the fitted normal distribution used for mapping.
4.17 Two screenshots of the simulated world with the robot model. The total area of the world was 200 m × 200 m.
4.18 The 3D point cloud map before (top) and after (bottom) optimization. Note the improvement in the top left area. The map contains approximately three million points. The underlying pose graph structure is shown in blue. Height is color coded, ranging from blue (low) to red (high).
4.19 A 3D map of the Lesumer Sperrwerk, a flood gate in Bremen. It is generated by sequentially matching several 3D sonar scans with the 6 DoF spectral registration method by Bülow and Birk [18]. The 3D map is shown once as a 3D voxel grid (top) and once projected onto ground truth imagery from Google Earth (bottom).
5.1 Total bandwidth required for the map generated with the Intel labs (upper left), Freiburg Campus outdoor (upper right), Freiburg albert (lower left), and Freiburg fr079 (lower right) data sets.
5.2 Size comparison of updates for an occupancy grid vs. the pose graph for the Intel labs (upper left), Freiburg Campus outdoor (upper right), Freiburg albertb (lower left), and Freiburg FR079 (lower right) data sets. Note the log scale on the y axis.
5.3 Left: Sample paths on a high-resolution image for a team of four robots. Start locations are circled. Right: Example images extracted for the start locations of robots 1 (bottom) and 3 (top).
5.4 An example result of the cooperative mapping strategy. The pose graph structure is superimposed (poses are black arrow heads, edges are blue, inter-robot edges are yellow, residual errors are red, start poses are marked with circles, and ground truth is shown in gray). Details of the areas indicated by red squares are shown in Figure 5.5.
5.5 Map details from Figure 5.4. The No Comms case (left) is compared with the map built with the Multi-Robot strategy (right). Numbers correspond to the indicated areas in Figure 5.4.
5.6 Bytes transferred per time step, first experiment. The multi-robot update strategy uses the available bandwidth almost optimally.
5.7 The final shared map overlaid on Google Earth imagery. Note that due to angle-dependent sensor noise, the lower parts of the structure show more error in the range measurements and thus seem closer to the sensor position (bottom left).
5.8 Top: The four single-robot maps, using ground truth poses. Bottom left: The resulting map using no communication. Obvious misalignments are visible. Bottom right: The final optimized map generated by robot 1 within the bandwidth limits. Maps by the other robots are identical. Loops found between robot trajectories are drawn in green.
5.9 Used bandwidth per time step. For comparison, the bandwidth needed for full communication, in which all sensor data is sent indiscriminately, is also shown, once for plane clouds and once for raw, unprocessed point clouds. Note the logarithmic scale on the y axis.
7.1 This figure shows an example multimodal pose graph of complexity C(G) = 4, meaning it contains four two-component mixtures; all other edge distributions are unimodal Gaussians. Edges that are assigned a low-probability transformation are shown in dark grey/magenta. The ground truth is shown in light grey in the background. Note that the left column represents the state of the art up to 2012.
7.2 Performance of each method by condition. Reported figures are the minimum, lower quartile, median, upper quartile, and maximum of the residual SSE errors relative to ground truth. Smaller is better. The marker location specifies the median, a vertical line is drawn between the minimum and maximum, and horizontal line ticks indicate the quartiles. In some cases, the median marker may occlude the quartile marks. Note the log scale on the y axis. SSE_xy is shown in the top graph; the bottom shows SSE_θ.
7.3 Performance of the Particle method by condition and different levels of fused odometry. Reported figures are the minimum, lower quartile, median, upper quartile, and maximum. The marker location specifies the median, a line is drawn between the minimum and maximum, and line ticks indicate the quartiles. Note the log scale on the y axis. SSE_xy is shown in the top graph; the bottom shows SSE_θ. Also note the different numbers of particles used for the three cases (90, 160, and 10,000).
7.4 Performance of the Max LM and Prefilter LM methods by condition and different levels of fused odometry. Reported figures are the minimum, lower quartile, median, upper quartile, and maximum. The marker location specifies the median, a line is drawn between the minimum and maximum, and line ticks indicate the quartiles. Note the log scale on the y axis. SSE_xy is shown in the top graph; the bottom shows SSE_θ.
7.5 Performance of the Max SGD and Prefilter SGD methods by condition and different levels of fused odometry. Reported figures are the minimum, lower quartile, median, upper quartile, and maximum. The marker location specifies the median, a line is drawn between the minimum and maximum, and line ticks indicate the quartiles. Note the log scale on the y axis. SSE_xy is shown in the top graph; the bottom shows SSE_θ.
8.1 Above: Scans 1 (right) and 2 (left) of the Bremen City data set. Below: The effects of ambiguity due to occlusion on plane registration of the scan pair. The left image shows the most likely result as reported by the plane registration method. The second most likely registration result, shown on the right, is the globally correct but less certain one (see the covariance determinants in the first row of Table 8.2). Detail views of where the scans meet at the church tower are shown in the right column. Note that no odometry exists to disambiguate the results.
8.2 Final maps in plane patch representation after optimization with the traditional unimodal Max SGD method (top) and the multimodal Prefilter LM method (bottom). The planes used for matching as well as the graph structure are shown. The log probability of the traditional result is −1.26933 · 10^9, while the log probability of the multimodal result is −1.01384 · 10^5.
8.3 Mapping results in point cloud representation using the traditional unimodal Max SGD method (top) and the multimodal Prefilter LM method (bottom). Laser reflectance values are used for assigning greyscale values (coded with the Jet colormap in color). See also Extension 1 for an animated view of these maps.
8.4 Final map after optimization with Prefilter LM overlaid on aerial imagery of Bremen City Center from Google Earth. Note that due to the height of the buildings, some parallax exists in the aerial image, and some ground features (fountains, small trees) as well as some high structures (cathedral tower) do not match exactly. The image shows the down-projected map at the general roof level. The height is used to assign greyscale values (coded with the Jet colormap in color).
8.5 Hannover Fair map. Top: Traditional Max SGD method using only the locally most likely registration result. Bottom: Result of the Prefilter LM method. The local z coordinate was used to assign greyscale values (coded with the Jet colormap in color). See also Extension 1 for an animated view of these maps.
9.1 One example graph of the data set of complexity class 7 and C(G) = 32, with a total of 16 multimodal MoG edges with two components and 16 hyperedges with two hypercomponents. Ground truth is shown in gray in the background.
9.2 Final SSE metric relative to ground truth for each of the 11 complexity classes. The median and upper/lower quartiles are shown. Note the log scale on the y axis. The final SSE metric of the optimization result using the ground truth graph is also shown for comparison.
9.3 Runtimes for each of the methods on all 11 complexity classes. The median and upper/lower quartiles are shown. Times were recorded on an Intel i7-3770 3.4GHz with 16GB RAM. Note the log scale on the y axis. Runtimes varied from 0.01 s to 1.45 s over all methods; quartiles are between 0.03 s and 0.44 s.
9.4 Orthographic view of the planar maps generated from the exhaustively matched Bremen City Center dataset after optimization.
9.5 Perspective view of the planar maps generated from the exhaustively matched Bremen City Center dataset after optimization.
List of Tables
5.1 Total number of bytes transmitted
5.2 Average number of bytes transmitted per update, with and without particle changes
5.3 Message definitions for the experiments below.
5.4 Comparisons of map quality with respect to ground truth.
5.5 Comparisons of map accuracy with respect to ground truth.
5.6 Map sizes and optimization run times of the final map.
7.1 The 11 conditions used in the experiments with their different amounts of multimodal edges and their degree of multimodality C(G). The overall percentage of multimodal edges (MM%) is also shown. On the right, the minimum Euclidean and squared Mahalanobis distances between two components from the same mixture are also shown. On average, components were located 189.045 units from each other, with an average Mahalanobis distance of 20,825.7.
7.2 Successes by method and condition (Table 7.1) in terms of the number of trials that produced a result within 5 times the SSE_xy and SSE_θ error of the Exhaustive SGD method. The Particle method used 10,000 particles.
7.3 Runtimes (means and standard deviations) in seconds by method and condition. The experiments were run on an Intel Core i7-2720QM, 2.2GHz, 8GB RAM. All methods were implemented in C++ except Newton, which was implemented in Matlab. The Particle method used 10,000 particles. M-E LM stands for Multi-Edge LM.
7.4 Additional trials of Prefilter SGD to investigate its asymptotic behavior. Note the exponential complexity. The mean runtime is shown in seconds. Means and standard deviations of the final SSE error are reported. As in Table 7.2, successful trials are reported as a percentage within 5 times the SSE error of Exhaustive SGD.
7.5 Runtimes (means and standard deviations) in seconds for the Particle method for different levels of odometry applied to the initial pose graph. The experiments were run on an Intel Core i7-2720QM, 2.2GHz, 8GB RAM.
7.6 Successes by the Particle, Max LM, Prefilter LM, Max SGD, and Prefilter SGD methods (as in Table 7.2). Odometry information with different noise levels (L_1 and L_2) was merged with the pose graphs before the methods were applied. None means no odometry was fused, as with the other methods summarized in Table 7.2. The unfused results are shown for comparison.
8.1 Plane matching parameters used for the Bremen City data set.
8.2 The multimodal edges in the Bremen City map. The left column shows the edge in question; the other columns show the list of modes in the order reported by the plane-based registration. Each mode is shown in the form of the estimated translation T() and rotation R(), as well as the determinant of the covariance det(C) associated with it. Furthermore, each globally correct mode is highlighted in gray.
8.3 Runtimes in seconds and result quality for the traditional Max SGD/LM, Multi-Edge LM, and Prefilter SGD/LM on the Bremen City Data Set, recorded on a Core i7-2720QM 2.2GHz with 8GB of RAM. The SSE metric (see Section 2.5) was computed relative to the de facto ground truth transformations given by the marker-based registration. Max SGD/LM were initialized with a breadth-first traversal of the graph; the rest of the methods were initialized with the Prefilter method described in Section 6.3.2.
8.4 Runtimes in seconds and result quality for different numbers of particles (labeled # above) of the Particle method on the Bremen City Data Set. Due to the nondeterministic nature of the Particle method, 100 trials were run for each particle count and summarized by the mean and standard deviation. The data was recorded on a Core i7-2720QM 2.2GHz with 8GB of RAM. The SSE metric (see Section 2.5) was computed relative to the de facto ground truth transformations given by the marker-based registration.
8.5 Plane matching parameters used for the Hannover Fair data set.
8.6 Runtimes in seconds and the final log probability achieved by the Particle method using different particle counts (labeled # above) on the Hannover Fair data set. Due to the nondeterminism of the Particle method, 100 trials were run per count. The data was recorded on a Core i7-2720QM 2.2GHz with 8GB of RAM.
8.7 Runtimes in seconds and results for the traditional Max SGD/LM, Multi-Edge LM, and Prefilter SGD/LM on the Hannover Fair data set, recorded on a Core i7-2720QM 2.2GHz with 8GB of RAM. Max SGD/LM were initialized with a breadth-first traversal of the graph; the rest of the methods were initialized with the poses computed by the Prefilter method described in Section 6.3.2.
8.8 Multimodal edges in the Hannover Fair map, continued in Table 8.9. Again, the left column shows the edge containing the modes on the right. Up to five modes per edge were detected. The globally correct mode is highlighted.
8.9 Multimodal edges in the Hannover Fair map, continued from Table 8.8. Again, the left column shows the edge containing the modes on the right. Up to five modes per edge were detected. The globally correct mode is highlighted.
9.1 The 11 complexity classes used in the experiments.
9.2 Connectivity matrix between all 13 scans, showing the number of components in the multimodal registration result per pair. A missing number in the upper triangle means that no registration result was found for that pair.
9.3 SSE errors relative to the “gold standard” marker-based registration for each optimization method. SC stands for Switchable Constraints.
Glossary
$\mu$: Mean
$\ominus$: The pose difference operator
$\oplus$: The pose compounding operator
$\Sigma$: Covariance
$a.b$: If $a$ is a tuple with member $b$, e.g. $a = (b, c, d)$, this notation stresses that the $b$ belonging to this particular tuple $a$ is meant, in case there are several such tuples.
$a^*$: An optimal value of the variable $a$
$a_t$: A value of the variable $a$ at time $t$
$a_{1:t,1:n}$: A list of values of the variable $a$ from time 1 to $t$, for robots 1 through $n$
$a_{1:t,n}$: A list of values of the variable $a$ from time 1 to $t$, for robot $n$
$a_{1:t}$: A list of values of the variable $a$ from time 1 to $t$
$m$: A map
SSE: State Squared Error, from [122]; the same as Mean Square Error, but explicitly defined for pose graphs.
$t$: A time, usually the current time
$t'$: A future time, usually relative to the current time $t$
$u$: A control input (e.g. a motor voltage)
$x$: A pose
$z$: A sensor observation (e.g. a laser range reading)
DoF(s): Degree(s) of Freedom
MSE: Mean Square Error
Pose: A position and orientation, mathematically a member of the Special Euclidean group $SE(2)$ for 2D or $SE(3)$ for 3D problems.
RMSE: Root Mean Square Error
Part I
Introduction
Chapter 1
Motivation
In general, mobile robots move in an environment to achieve a task, for example a delivery, manipulation, or search task. The more that is known about the environment, e.g. object locations or topology, the better the robot can perform its task. Clearly, there exists a whole spectrum of levels of knowledge about the environment across robotics applications. Factory robots are at one end of the spectrum, where the complete environment is predefined, known, and certain. Such robots rely on the fact that a welding point, for example, will be at the same location in every cycle of a production line. Warehouses are sometimes automated in a comparable way, where transport robots constantly move items from one known location to another so that outgoing parcels can be packed as quickly as possible. Any outside interference in the placement of items would break the system. Mobile robots operating in unstructured environments are at the other end of this spectrum. Often, no initial knowledge about the environment is assumed at all, and the robot begins with a completely empty representation of the world that surrounds it. This may happen, for example, in a planetary or underwater survey scenario, in settings after natural or man-made disasters, or simply if data about the environment is not available in a usable or affordable format. Thus, these robots require a method to incrementally build a representation of the environment and estimate their location within it, in a way that supports the robot’s original task.
1.1
Simultaneous Localization and Mapping
The Simultaneous Localization and Mapping (SLAM) problem is a formalization of exactly this scenario, and methods that try to solve this problem have been investigated since the 1980s and 1990s [42, 105, 106, 163, 174]. In the SLAM problem, a robot is assumed to have one or more sensors that allow it to observe the environment (exteroceptive) or itself (interoceptive). Examples of exteroceptive sensors include touch sensors, cameras (monocular and stereoscopic, color and infrared, etc.), laser range sensors, sonar range sensors, Global Positioning System (GPS) receivers, a compass, or structured-light RGB-D cameras (such as the Microsoft Kinect or Asus Xtion) that produce color and range information simultaneously at very high rates. Examples of interoceptive sensors are wheel encoders to produce odometry, voltmeters and ammeters to observe motor power consumption, or inertial measurement units (IMU). Also, the robot is assumed to have a certain propulsion mechanism, or another form of actuator, which allows it to move in its environment. This actuator is driven by control commands or inputs that the robot issues. While the robot is traversing an initially unknown environment, it records sensor observations as well as control inputs to its actuators at certain intervals. The SLAM problem states that a map of the environment should be computed, and the robot pose in that map should be estimated, at the same time from these readings. One very popular formulation of the SLAM problem is probabilistic: given a sequence of sensor observations $z_{1:t}$ and control inputs $u_{1:t}$ until time $t$, compute the most probable map $m$ and trajectory $x_{1:t}$:
$$x^*_{1:t}, m^* = \operatorname*{argmax}_{x_{1:t},\, m} \; p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \tag{1.1}$$
Figure 1.1: Sense-Plan-Act cycle adapted to Sense-Estimate-Move for Simultaneous Localization and Mapping (inspired by [159]). [Diagram: the mobile robot senses the environment, updates its map and pose estimate, and moves.]
In their textbook on Probabilistic Robotics, Thrun et al. [174] discuss a number of algorithms to solve this probabilistic estimation problem, including several versions of the Kalman filter [89] and the particle filter [43, 64]. The most important point Thrun et al. make is that this probability can be factored as:
$$p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) = p(x_{1:t} \mid z_{1:t}, u_{1:t}) \, p(m \mid x_{1:t}, z_{1:t}) \tag{1.2}$$
It is thus possible to estimate the trajectory separately from the map. Successful SLAM methods therefore reduce the full SLAM problem above to a localization problem in order to compute good estimates for the trajectory $x_{1:t}$ [65, 112]:
$$x^*_{1:t} = \operatorname*{argmax}_{x_{1:t}} \; p(x_{1:t} \mid z_{1:t}, u_{1:t}) \tag{1.3}$$
Deriving the map $m^*$ from the optimal trajectory $x^*_{1:t}$ is then straightforward. Depending on the map representation, it is usually done by applying forward sensor models [129, 171] from the respective poses in the optimal trajectory, or by transforming point clouds into a shared coordinate system relative to the first pose $x_1$.
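For the point-cloud case, this step can be sketched in a few lines. The following is a minimal Python sketch, not the implementation used in this thesis; it assumes planar poses $(x, y, \theta)$ and 2D point scans for brevity, and the helper names `pose_diff`, `transform_point`, and `build_point_map` are hypothetical.

```python
import math
from typing import List, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, theta), theta in radians (assumed planar case)
Point2D = Tuple[float, float]


def wrap_angle(a: float) -> float:
    """Normalize an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def pose_diff(xj: Pose2D, xi: Pose2D) -> Pose2D:
    """x_j (-) x_i: pose x_j expressed in the coordinate frame of x_i."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = math.cos(xi[2]), math.sin(xi[2])
    return (c * dx + s * dy, -s * dx + c * dy, wrap_angle(xj[2] - xi[2]))


def transform_point(pose: Pose2D, p: Point2D) -> Point2D:
    """Map a point from the sensor frame at `pose` into the frame `pose` is given in."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return (x + c * p[0] - s * p[1], y + s * p[0] + c * p[1])


def build_point_map(trajectory: List[Pose2D], scans: List[List[Point2D]]) -> List[Point2D]:
    """Merge all scans into one point map expressed in the frame of the first pose x_1."""
    origin = trajectory[0]
    point_map: List[Point2D] = []
    for pose, scan in zip(trajectory, scans):
        rel = pose_diff(pose, origin)  # optimized pose relative to x_1
        point_map.extend(transform_point(rel, p) for p in scan)
    return point_map


# Usage: two optimized poses, each observing a point 1 m straight ahead.
trajectory = [(1.0, 0.0, 0.0), (2.0, 0.0, math.pi / 2)]
scans = [[(1.0, 0.0)], [(1.0, 0.0)]]
print(build_point_map(trajectory, scans))
```

Because the map is derived purely from the trajectory, re-optimizing the trajectory only requires re-running this transformation; the stored observations stay unchanged.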
1.2
Multi-Robot Simultaneous Localization and Mapping
An obvious extension of the general Simultaneous Localization and Mapping problem involves utilizing multiple robots. Before the actual task of a single robot or a team of robots can begin, i.e. until enough information is available about the environment to attempt executing the original task, the environment needs to be explored, and knowledge about it needs to be accumulated and stored in a map. This exploration task is inherently parallelizable, since different robots may explore different parts of the environment at the same time. Thus, multi-robot SLAM methods have great potential to save time and additionally provide redundancy [21, 53, 56, 57, 92, 126, 158, 162, 170, 172, 182]. However, a distributed SLAM system presents unique challenges. Apart from the distributed exploration problem, it poses significant challenges in inter-robot communication (which may be disrupted) [115, 156], in map registration between the single-robot maps to relate them spatially without odometry estimates [24, 173], and in map representation in general [31, 82]. Sensor observations are inherently large, e.g. a high-resolution camera image or a 3D laser scan consisting of several thousand up to several million points, and require substantial bandwidth to be transferred in a timely manner to the other robots in the team in order to build a shared map on each robot. In the event that the communication link is lost, the multi-robot SLAM system should still be able to function as a single-robot SLAM system would. Furthermore, when the communication link is reacquired, the accumulated changes in the maps of the respective robots have to be synchronized and merged into a joint map. If several robots start mapping an initially unknown environment from initially unknown locations, their local maps have no known spatial relationship.
With time, as robots get close enough to a) establish a communication link and synchronize their map and trajectory data and b) observe the same parts of the environment, it becomes possible to relate the initially separate maps through similar shared sensor observations. This problem is known as map matching, or map registration. Several alternatives exist for this purpose in the literature [9, 25, 26, 40, 75, 95]. In the multi-robot case, eq. 1.1 becomes
$$x^*_{1:t,1:n}, m^* = \operatorname*{argmax}_{x_{1:t,1:n},\, m} \; p(x_{1:t,1:n}, m \mid z_{1:t,1:n}, u_{1:t,1:n}) \tag{1.4}$$
where $x_{1:t,1:n}$ is the list of $n$ robot trajectories through time $t$, $z_{1:t,1:n}$ is the list of sensor observations made by the robot team, and $u_{1:t,1:n}$ is the list of per-robot control inputs. Note that one single shared map $m$ is estimated using information from all robots in the team. Again, this probability can be factored as above:
$$x^*_{1:t,1:n} = \operatorname*{argmax}_{x_{1:t,1:n}} \; p(x_{1:t,1:n} \mid z_{1:t,1:n}, u_{1:t,1:n}) \tag{1.5}$$
1.3
Local and Global Ambiguities in Simultaneous Localization and Mapping
The major source of errors in SLAM is faulty data association. Specifically, two types of data association errors are identified in this thesis: a) errors in identifying common data in two consecutive sensor observations (local ambiguity) and b) errors in identifying common data in temporally distant sensor observations (global ambiguity).
[Figure 1.2 panels: correlation-based registration parameter space with multiple maxima (top left), the two corridor scans to be registered (bottom left), and candidate registration results labeled A to H (right)]
Figure 1.2: A visualization of an ambiguous registration process. Structural ambiguity in the environment can be one source of registration errors, here for example in the form of the crossing of two corridors. Two scans (bottom left) are to be registered; one scan contains the corridor to the North, while the other scan includes the East corridor after the robot turned to the right. When a simple correlation-based method is used to match the two scans, multiple maxima are immediately visible in the resulting parameter space (top left). Some registration results corresponding to the local maxima are shown on the right, ordered by quality. However, in this case, A is not the globally correct result; C is. (published in [140])
Specifically, the term ambiguity is used for the situation where no clear global optimum of the respective registration cost function (in the local case) or of the data association metric (in the global case) can be found, but multiple local optima are present instead. The transformation estimate between two consecutive scans can become ambiguous, e.g. in a repetitive corridor (translational), at a corridor crossing (rotational), under occlusions or low overlap (both), or due to violated assumptions and harsh approximations of the registration method used, e.g. 3D motion for a 2D scan matcher. In particular, the registration between observations from two different robots may be ambiguous because of vastly different perspectives, low overlap, and/or occlusions. When the robot moves from $x_a$ to $x_b$ and acquires sensor data $z_a$ and $z_b$ at these locations, there are many possible reasons why a registration of $z_a$ and $z_b$ may not lead to a unique solution. One fundamental issue is ambiguity in the environment. Consider, for example, a hallway with a repetitive pattern of doors or ceiling lights, or a corridor intersection. Any form of processing of the observed sensor data can only resolve the robot motion up to the distance to the nearest door in the hallway case; it is impossible to determine which door seen in the previous observation is the currently nearest one. Similarly, in the symmetric corridor intersection case, it is possible to know the relative position to a corridor entrance, but not which exact corridor entrance it is. Thus, in these ambiguous cases, the resulting probability distribution contains a discrete number of rather pronounced local maxima, one per possible robot motion. A traditional sensor data registration method may either just randomly choose one of these modes and report it, or report a result with an exaggerated uncertainty. In these examples of translational and rotational ambiguity, it is impossible to compute a single accurate motion estimate by only considering data within the observation pair being registered. Other possible sources of ambiguity include changes in the environment over time, occlusions, limited sensor range, or limitations of the concrete registration methods themselves. An illustrative example of a registration process with ambiguous sensor data is presented in figure 1.2, showing the corridor intersection case described above.
Registration methods that enumerate multiple possible transformations that all lead to acceptable registration results are virtually non-existent. While some authors [121] noticed that their registration cost function contains multiple optima, and while it has long been known that registration methods using local gradient information (such as ICP [6] or the Vasco scan matcher from the CARMEN framework¹) converge to different solutions from different initial guesses, which also indicates multiple optima, no registration method exists that takes these multiple optima into account. This specific form of ambiguity is referred to as local ambiguity. A registration result of two scans is called locally ambiguous if these ambiguities cannot be resolved using only information present in the observations themselves. In other words, the registration cost function has multiple optima and could be represented as a multimodal Mixture of Gaussians (MoG) probability distribution:
$$p(x_t \mid x_{t-1}) = \sum_{m=1}^{M_k} \pi_m \, \mathcal{N}(x_t \ominus x_{t-1} \mid \mu_m, \Sigma_m) \tag{1.6}$$
with $\sum_{m=1}^{M_k} \pi_m = 1$. Each mean $\mu_m$ corresponds to an optimum in the registration cost function, $\Sigma_m$ usually corresponds to the inverse of the Hessian at that point, and the weight $\pi_m$ should be proportional to the value of the registration cost function at $\mu_m$. Global ambiguity corresponds to the case where the current and a temporally distant sensor observation may show the same section of the environment, or where the current observation matches a number of different places in the map. Formally, there exists a probability mass function (PMF) which is defined over all previous poses in the trajectory, plus a null hypothesis in case the current observation is completely new. This PMF can be represented as the weights $\pi_i$ of a more general mixture over all previous poses and an uninformative uniform distribution representing the null hypothesis:
$$p(x_t \mid x_{1:t-1}) = \pi_0 \, U(\mathbb{R}^d) + \sum_{i=1}^{t-1} \pi_i \, p(x_i \ominus x_t \mid c_i) \tag{1.7}$$
with $\sum_{i=0}^{t-1} \pi_i = 1$, and where $x_{1:t-1}$ are all poses from time 1 to $t-1$, $p(dx \mid c)$ is any probability distribution representing a registration result, $\pi_0$ is the weight of the null hypothesis, and $U(\mathbb{R}^d)$ is the uniform distribution over $\mathbb{R}^d$, where $d$ is the number of degrees of freedom of the poses. Note that local ambiguity may also occur in the registration result referenced in the global ambiguity case. Thus they describe orthogonal problems; both, either, or neither may occur in any given SLAM problem. Solutions to both are required, though in the past both have been neglected in favor of a simple traditional unimodal SLAM model. In this traditional case, only the component with the highest probability or weight $\pi$ is used.
¹ From http://carmen.sourceforge.net
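To make equations 1.6 and 1.7 concrete, the following minimal Python sketch evaluates both mixtures for planar poses. It is an illustration under stated assumptions, not the thesis implementation: poses are $(x, y, \theta)$, every component uses a diagonal covariance, each candidate match in eq. 1.7 is modeled as a single Gaussian, and because $U(\mathbb{R}^d)$ is improper it is replaced by a user-chosen constant density `u0`. All function names are hypothetical. The last function mimics the traditional unimodal model, which keeps only the highest-weight component.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pose2D = Tuple[float, float, float]             # (x, y, theta), assumed planar poses
Mode = Tuple[float, Pose2D, Tuple[float, ...]]  # (weight, mean, diagonal variances)


def wrap_angle(a: float) -> float:
    return math.atan2(math.sin(a), math.cos(a))


def pose_diff(xj: Pose2D, xi: Pose2D) -> Pose2D:
    """x_j (-) x_i: x_j expressed in the coordinate frame of x_i."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = math.cos(xi[2]), math.sin(xi[2])
    return (c * dx + s * dy, -s * dx + c * dy, wrap_angle(xj[2] - xi[2]))


def gaussian_diag(d: Pose2D, mean: Pose2D, var: Sequence[float]) -> float:
    """Normal density N(d | mean, diag(var)); the angular residual is wrapped."""
    r = (d[0] - mean[0], d[1] - mean[1], wrap_angle(d[2] - mean[2]))
    norm = 1.0 / math.sqrt(math.prod(2.0 * math.pi * v for v in var))
    return norm * math.exp(-0.5 * sum(ri * ri / v for ri, v in zip(r, var)))


def local_mixture(xt: Pose2D, xprev: Pose2D, modes: List[Mode]) -> float:
    """Eq. (1.6): sum_m pi_m * N(x_t (-) x_{t-1} | mu_m, Sigma_m)."""
    assert abs(sum(w for w, _, _ in modes) - 1.0) < 1e-9, "weights must sum to 1"
    d = pose_diff(xt, xprev)
    return sum(w * gaussian_diag(d, mu, var) for w, mu, var in modes)


def global_mixture(xt: Pose2D, past: List[Pose2D], pi0: float, u0: float,
                   links: Dict[int, Mode]) -> float:
    """Eq. (1.7): pi_0 * U + sum_i pi_i * p(x_i (-) x_t | c_i), with U replaced by u0."""
    assert abs(pi0 + sum(w for w, _, _ in links.values()) - 1.0) < 1e-9
    total = pi0 * u0
    for i, (w, mu, var) in links.items():
        total += w * gaussian_diag(pose_diff(past[i], xt), mu, var)
    return total


def unimodal(xt: Pose2D, xprev: Pose2D, modes: List[Mode]) -> float:
    """Traditional model: only the highest-weight component is kept."""
    _, mu, var = max(modes, key=lambda m: m[0])
    return gaussian_diag(pose_diff(xt, xprev), mu, var)


# Usage: a translational (corridor-like) ambiguity with two modes 1 m apart.
modes = [(0.6, (1.0, 0.0, 0.0), (0.01, 0.01, 0.01)),
         (0.4, (2.0, 0.0, 0.0), (0.01, 0.01, 0.01))]
x_prev, x_true = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
print(local_mixture(x_true, x_prev, modes), unimodal(x_true, x_prev, modes))
```

If the true motion corresponds to the lower-weight mode, the mixture still assigns it substantial density, whereas the unimodal model assigns it almost none; this is the failure case that the traditional model cannot recover from.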
1.4
The Research Question
The main research question this thesis aims to answer is: How can a robot or a team of robots robustly estimate a map of an environment that is repetitive or otherwise leads to ambiguous loop detection (global ambiguity) and sensor data registration results (local ambiguity) efficiently and effectively?
In order to answer the major part of the research question above, the following issues concerning SLAM under local and global ambiguities will be addressed.
• How can a registration method produce all potential solutions, not just the locally most likely one?
• How can taking the rest of the map into account resolve local ambiguity?
• Can global ambiguity be resolved similarly?
• What improvements can an explicit treatment of these sources of ambiguity lead to?
In order to answer the multi-robot SLAM aspect of the research question, the following issues will be addressed.
• What data is necessary for a team of robots to build a shared map?
• How can a team of robots share a map efficiently and under communication constraints?
The following issues concerning SLAM in general will also be addressed.
• How can uncertainty information be extracted from results of spectral registration methods for use in SLAM?
All these issues will be addressed in the context of a SLAM paradigm rooted in graph theory. A brief motivation and the fundamentals of Graph-based Simultaneous Localization and Mapping will be introduced in the next chapter.
1.5
Structure of this Thesis
The thesis is organized as follows. Chapter 2 introduces graph-based SLAM, derives popular cost functions, gives an overview of related work in the field, and describes what a complete graph-based SLAM system consists of. Chapter 3 describes work beyond the current state of the art in graph-based SLAM. Specifically, several new ideas in the direction of robust SLAM methods are described that can deal with local and global ambiguity in sensor observations, registration results, and loop recognition results. These are consolidated in a novel Generalized Graph SLAM framework. In Part II, Chapter 4 describes methods to obtain important uncertainty information from spectral registration methods, both in 2D and 3D, and how this information is used in a graph-based SLAM framework. Chapter 5 shows how graph-based SLAM can be employed in multi-robot teams, especially under communication constraints. Part III describes several methods to resolve global and local ambiguity, as well as several experiments to analyze their effectiveness. Chapter 6 introduces and discusses several methods that take global and local ambiguity into account within the Generalized Graph SLAM framework. The experiments in Chapter 7 systematically evaluate these graph optimization methods and their performance under varying degrees of local ambiguity in synthetic graphs that allow direct comparison to a known ground truth. Further results in Chapter 8 corroborate the findings of the previous chapter with real-world datasets. Chapter 9 shows the most recent results using the full Generalized Graph SLAM framework to resolve local as well as global ambiguity. Part IV concludes the thesis with Chapter 10.
Parts of this thesis have been published in several peer-reviewed journals and conference proceedings:
• Chapter 4
  – Max Pfingsthorn, Andreas Birk, Sören Schwertfeger, Heiko Bülow, and Kaustubh Pathak. Maximum likelihood mapping with spectral image registration. In Robotics and Automation, 2010. ICRA 2010. Proceedings of the 2010 IEEE International Conference on, 2010
  – Heiko Bülow, Max Pfingsthorn, and Andreas Birk. Using robust spectral registration for scan matching of sonar range data. In 7th Symposium on Intelligent Autonomous Vehicles (IAV), IFAC. IFAC, 2010
  – Max Pfingsthorn, Andreas Birk, and Heiko Bülow. Uncertainty estimation for a 6-DoF spectral registration method as basis for sonar-based underwater 3D SLAM. In Robotics and Automation, 2012. Proceedings. ICRA ’12. IEEE International Conference on. IEEE Press, 2012
  – M. Pfingsthorn, H. Bülow, A. Birk, F. Ferreira, G. Veruggio, M. Caccia, and G. Bruzzone. Large-Scale Mosaicking with Spectral Registration based Simultaneous Localization and Mapping (iFMI-SLAM) in the Ligurian Sea. In OCEANS 2013 Bergen, June 2013
• Chapter 5
  – Max Pfingsthorn, Yashodhan Nevatia, Todor Stoyanov, Ravi Rathnam, Stefan Markov, and Andreas Birk. Towards collaborative and decentralized mapping in the Jacobs Virtual Rescue Team. In L. Iocchi, H. Matsubara, A. Weitzenfeld, and C. Zhou, editors, RoboCup 2008: Robot Soccer World Cup XII. Springer Verlag, Berlin, 2008
  – Max Pfingsthorn and Andreas Birk. Efficiently communicating map updates with the pose graph. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2008
  – M. Pfingsthorn, A. Birk, and H. Bülow. An efficient strategy for data exchange in multi-robot mapping under underwater communication constraints. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages 4886–4893, Oct 2010. doi: 10.1109/IROS.2010.5650270
  – Max Pfingsthorn, Andreas Birk, Narunas Vaskevicius, and Kaustubh Pathak. Cooperative 3D mapping under underwater communication constraints. In IEEE Oceans, 2011
• Parts of chapters 3, 7, and 8
  – Max Pfingsthorn and Andreas Birk. Simultaneous Localization and Mapping with Multimodal Probability Distributions. The International Journal of Robotics Research, 32(2):143–171, 2013. doi: 10.1177/0278364912461540. URL http://ijr.sagepub.com/content/32/2/143.abstract
• Chapter 9
  – Max Pfingsthorn and Andreas Birk. Handling local and global ambiguities via a generalized graph SLAM framework based on multimodal and hyperedge constraints. In Proceedings of the 1st Workshop on Robust and Multimodal Inference in Factor Graphs at ICRA 2013, May 2013
  – Max Pfingsthorn and Andreas Birk. Representing and Solving Local and Global Ambiguities as Multimodal and Hyperedge Constraints in a Generalized Graph SLAM Framework. In Robotics and Automation, 2014. Proceedings. ICRA ’14. IEEE International Conference on, 2014
Several publications produced leading up to or related to the results presented here are:
• Jann Poppinga, Max Pfingsthorn, Soeren Schwertfeger, Kaustubh Pathak, and Andreas Birk. Optimized octtree datastructure and access methods for 3D mapping. In IEEE Safety, Security, and Rescue Robotics (SSRR). IEEE Press, 2007
• Vytenis Sakenas, Olegas Kosuchinas, Max Pfingsthorn, and Andreas Birk. Extraction of semantic floor plans from 3D point cloud maps. In International Workshop on Safety, Security, and Rescue Robotics (SSRR). IEEE Press, 2007
• Andreas Birk, Kaustubh Pathak, Jann Poppinga, Sören Schwertfeger, Max Pfingsthorn, and Heiko Bülow. The Jacobs test arena for safety, security, and rescue robotics (SSRR). In WS on Performance Evaluation and Benchmarking for Intelligent Robots and Systems, Intern. Conf. on Intelligent Robots and Systems (IROS). IEEE Press, 2007
• Yashodhan Nevatia, Todor Stoyanov, Ravi Rathnam, Max Pfingsthorn, Stefan Markov, Rares Ambrus, and Andreas Birk. Augmented autonomy: Improving human-robot team performance in urban search and rescue. In International Conference on Intelligent Robots and Systems (IROS). IEEE Press, 2008
• Ravi Rathnam, Max Pfingsthorn, and Andreas Birk. Incorporating large scale SSRR scenarios into the high fidelity simulator USARSim. In IEEE International Workshop on Safety, Security, and Rescue Robotics (SSRR), pages 1–6. IEEE Press, 2009
• A. Birk, B. Wiggerich, V. Unnithan, H. Bülow, M. Pfingsthorn, and S. Schwertfeger. Reconnaissance and camp security missions with an unmanned aerial vehicle (UAV) at the 2009 European Land Robots Trials (ELROB). In IEEE International Workshop on Safety, Security and Rescue Robotics, SSRR, November 2009
• Ioana Varsadan, Andreas Birk, and Max Pfingsthorn. Determining map quality through an image similarity metric. In Luca Iocchi, Hitoshi Matsubara, Alfredo Weitzenfeld, and Changjiu Zhou, editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI), pages 355–365. Springer, 2009
• Kaustubh Pathak, Max Pfingsthorn, Narunas Vaskevicius, and Andreas Birk. Relaxing loop-closing errors in 3D maps based on planar surface patches. In International Conference on Advanced Robotics, Munich, Germany, June 2009
• K. Pathak, N. Vaskevicius, J. Poppinga, M. Pfingsthorn, S. Schwertfeger, and A. Birk. Fast 3D mapping by matching planes extracted from range sensor point-clouds. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, 2009
• Kaustubh Pathak, Andreas Birk, Narunas Vaskevicius, Max Pfingsthorn, Soeren Schwertfeger, and Jann Poppinga. Online 3D SLAM by registration of large planar surface segments and closed form pose-graph relaxation. Journal of Field Robotics, Special Issue on 3D Mapping, 27(1):52–84, 2010
• Andreas Birk, Kaustubh Pathak, Narunas Vaskevicius, Max Pfingsthorn, Jann Poppinga, and Soeren Schwertfeger. Surface representations for 3D mapping: A case for a paradigm shift. KI - Kuenstliche Intelligenz, 24(3):249–254, 2010
• Andreas Birk, Burkhard Wiggerich, Heiko Bülow, Max Pfingsthorn, and Soeren Schwertfeger. Safety, security, and rescue missions with an unmanned aerial vehicle (UAV): Aerial mosaicking and autonomous flight at the 2009 European Land Robots Trials (ELROB) and the 2010 Response Robot Evaluation Exercises (RREE). Journal of Intelligent and Robotic Systems, 64(1):57–76, 2011
• Max Pfingsthorn, Andreas Birk, and Narunas Vaskevicius. Semantic annotation of ground and vegetation types in 3D maps for autonomous underwater vehicle operation. In IEEE Oceans, 2011
• Andreas Birk, Max Pfingsthorn, and Heiko Bülow. Advances in underwater mapping and their application potential for safety, security, and rescue robotics (SSRR). In IEEE International Symposium on Safety, Security, Rescue Robotics (SSRR). IEEE Press, 2012
• K. Pathak, M. Pfingsthorn, H. Bülow, and A. Birk. Robust Estimation of Camera-Tilt for iFMI based Underwater Photo-Mapping using a Calibrated Monocular Camera. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, May 2013
• M. Pfingsthorn, H. Bülow, Igor Sokolovski, and A. Birk. Underwater Stereo Data Acquisition and 3D Registration with a Spectral Method. In OCEANS 2013 Bergen, June 2013
1.6
Note on the Timeline of Publications on Robust Pose Graph SLAM
The original draft of the journal article eventually published in the International Journal of Robotics Research (IJRR) [140] (described in Part III) was submitted to IEEE Transactions on Robotics (TRO) in December 2010. It spent around a year in review there and was finally rejected after an initial conditional acceptance. This final rejection was due to TRO policy, since only two rounds of reviews are allowed. The reviewers insisted on more results featuring real-world datasets, for which the page limit at TRO was too restrictive. The article was instead submitted to IJRR in December 2011, resulting in a rather fast acceptance with minor modifications, which were finalized in August 2012. Finally, the article was published online first in October 2012. At the time of the initial submission to TRO, no related work other than the work by Stachniss et al. [166] from 2007 existed. The paper by Sünderhauf and Protzel [169] was the earliest related work towards robust pose graph SLAM methods, published at ICRA in May 2012. Two other papers at RSS 2012 (July) by Olson and Agarwal [124] and Latif et al. [100] also worked towards robust pose graph SLAM. Both Sünderhauf and Protzel [168] and Latif et al. [99] published follow-up papers at IROS 2012 (October). According to the summary on their website¹, Olson and Agarwal appear to have an extended paper in the upcoming RSS special issue of IJRR. Since then, the field has become very popular and contributions are being made by various researchers. Several preprints of upcoming ICRA 2013 and submitted IROS 2013 papers are already available, amongst others from Burgard’s group at the University of Freiburg and John Leonard’s group at MIT. Also, a workshop is planned for ICRA 2013, jointly organized by Olson (University of Michigan), Sünderhauf (TU Chemnitz), and John Leonard (MIT), where the most recent findings reported in chapter 9 will be presented.
¹ Available at http://openslam.org/maxmixture.html
Chapter 2
Graph-based Simultaneous Localization and Mapping
One intuitive description of the SLAM task is the following: as the robot moves in its environment, it takes snapshots of its immediate surroundings, relates them spatially to previously made snapshots in an optimal way, and tracks its own trajectory relative to the stored snapshots. This intuition translates naturally into a graph of spatial relations connecting the poses where snapshots were taken [105]. Approaches using this graph-based formulation require an explicit numerical optimization method to arrive at a maximum likelihood estimate with respect to equation 1.1. Efficient sparse optimization methods have been developed only recently, which led to a resurgence of research in this area¹. The survey article by Grisetti et al. [70] describes some of the history of this approach. Graph-based methods were found to be the most suitable within the context of this thesis. Other methods either lack the expressive power needed to achieve many of the results presented (e.g. the Kalman filter [89]), have excessive computational demands (e.g. the Gaussian Sum or Mixture Kalman filters [34]), or do not scale acceptably with additional robots or mapping complexity (e.g. particle filters [43, 64]). The exact advantages of graph-based methods for SLAM, both computationally and in terms of representational power, will be described, discussed, and analyzed throughout this thesis. The rest of this chapter focuses on the theoretical underpinnings of graph-based SLAM methods, while the next chapter focuses on the explicit representation of the novel concepts of local and global ambiguity (as introduced in section 1.3) in such graphs.
2.1
The Pose Graph Map Data Structure
Formally, a pose graph is an undirected graph $G = (V, E)$ consisting of vertices $V$ and edges $E$. The vertices $v_i \in V$ denote poses where the robot obtained sensor observations $z_i$. A pose estimate $x_i$ is also associated with each vertex, so that a vertex is a tuple $v_i = (x_i, z_i)$. In addition to the vertices it connects, each edge $e_k \in E$ contains a constraint $c_k$ on the pose estimates of the associated vertices, thus $e_k = (v_i, v_j, c_k)$. While the graph itself is undirected, each edge has to declare an observation direction, i.e. the direction in which the constraint was generated. If the edge is traversed in the reverse direction, the constraint $c$ must be inverted. What exactly that entails depends on the representation of the constraint.
¹ At the time of this writing, there are 6890 hits for “+pose +graph +mapping +localization +robot” on Google Scholar. Half of these have been written since 2008.
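The data structure can be sketched as follows. This minimal Python sketch is an illustration, not the implementation used in this thesis; it assumes that a constraint is represented only by a planar relative pose $t_{ji} = (x, y, \theta)$, so that inverting a constraint means inverting that rigid transform. The uncertainty a real constraint would carry is omitted, and all class and method names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, theta), assumed planar case


def invert(t: Pose2D) -> Pose2D:
    """Inverse of a planar rigid transform: (R, p) -> (R^T, -R^T p)."""
    x, y, th = t
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), -(-s * x + c * y), math.atan2(math.sin(-th), math.cos(-th)))


@dataclass
class Vertex:
    x: Pose2D  # pose estimate x_i
    z: Any     # sensor observation z_i, e.g. a laser scan


@dataclass
class Edge:
    i: int     # vertex the constraint was observed from
    j: int     # vertex the constraint points to
    c: Pose2D  # constraint c_k, here only the relative pose t_ji

    def constraint_from(self, start: int) -> Pose2D:
        """Constraint oriented away from `start`; inverted when traversed backwards."""
        if start == self.i:
            return self.c
        if start == self.j:
            return invert(self.c)
        raise ValueError(f"vertex {start} is not incident to this edge")


@dataclass
class PoseGraph:
    vertices: List[Vertex] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, x: Pose2D, z: Any) -> int:
        self.vertices.append(Vertex(x, z))
        return len(self.vertices) - 1

    def add_edge(self, i: int, j: int, c: Pose2D) -> Edge:
        edge = Edge(i, j, c)
        self.edges.append(edge)
        return edge

    def neighbors(self, v: int) -> List[Tuple[int, Pose2D]]:
        """Undirected adjacency: (neighbor, constraint oriented from v towards it)."""
        out = []
        for e in self.edges:
            if v in (e.i, e.j):
                other = e.j if v == e.i else e.i
                out.append((other, e.constraint_from(v)))
        return out


g = PoseGraph()
a = g.add_vertex((0.0, 0.0, 0.0), "scan 0")
b = g.add_vertex((1.0, 0.0, math.pi / 2), "scan 1")
g.add_edge(a, b, (1.0, 0.0, math.pi / 2))
print(g.neighbors(b))  # traversed from b back to a, so the stored constraint is inverted
```

Storing the constraint once, in its observation direction, and inverting it on demand keeps the graph undirected for traversal while preserving how each measurement was actually taken.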
[Figure 2.1 legend: robot trajectory, pose graph vertex, environmental feature, sensor observation, sequential pose graph edge, non-sequential pose graph edge (loop closure)]
Figure 2.1: A schematic depiction of a pose graph. The vertices in the graph correspond to poses where sensor observations were made. Matching sensor observations give rise to constraints on the edges. Sequential edges follow the trajectory, non-sequential edges “close loops”.
Generally, two robot poses $x_i$ and $x_j$ are connected by a constraint if and only if a part of the environment was observed in both corresponding sensor readings $z_i$ and $z_j$. Figure 2.1 shows this concept schematically. Specifically, two different types of pose graph edges are distinguished: a) sequential edges, where two consecutive vertices are connected (i.e. $j = i + 1$), e.g. by constraints originating from odometry, and b) non-sequential loop closing edges, where two temporally distant vertices are connected because the robot revisited a previously visited part of the environment (i.e. drove in a loop). Formally, they partition $E$ into two disjoint subsets, $E = E_s \cup E_l$, where $E_s = \{e \mid e = (v_i, v_j, c_k) \in E \wedge j = i + 1\}$ consists of all sequential edges and $E_l = \{e \mid e = (v_i, v_j, c_k) \in E \wedge j \neq i + 1\}$ consists of all non-sequential loop closing edges. In general, $E_l$ is much harder to obtain than $E_s$, since a loop closure requires recognizing a previously visited place rather than relating two consecutive readings. Pose graph SLAM has been the method of choice in the recent literature on SLAM in dynamic environments [179], portable SLAM systems for humans [46, 51], long-term autonomous SLAM [2, 110], SLAM for autonomous cars [102], SLAM with micro aerial vehicles (MAV) [58, 101], and underwater SLAM [50, 147].
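The following is a minimal Python sketch of this data structure. A vertex's position in the list stands in for the temporal index $i$. The class and method names are hypothetical. The partition into $E_s$ and $E_l$ follows the definitions above literally, so an edge stored as $(v_{i+1}, v_i)$ counts as non-sequential.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Vertex:
    pose: Any         # pose estimate x_i
    observation: Any  # sensor observation z_i


@dataclass
class Edge:
    i: int            # index of the vertex the constraint was observed from
    j: int            # index of the other vertex
    constraint: Any   # constraint c_k, e.g. a (mean, covariance) pair


@dataclass
class PoseGraph:
    vertices: List[Vertex] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, pose, observation):
        """Append a vertex and return its index i."""
        self.vertices.append(Vertex(pose, observation))
        return len(self.vertices) - 1

    def add_edge(self, i, j, constraint):
        """Store e_k = (v_i, v_j, c_k), keeping the observation direction i -> j."""
        self.edges.append(Edge(i, j, constraint))

    def sequential_edges(self):
        """E_s: edges with j = i + 1."""
        return [e for e in self.edges if e.j == e.i + 1]

    def loop_closing_edges(self):
        """E_l: all remaining edges (j != i + 1)."""
        return [e for e in self.edges if e.j != e.i + 1]
```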
2.2 General Cost Function for Pose Graph Optimization
We consider the general case, where a constraint between vertices $v_i$ and $v_j$ is of the form

$$
\begin{aligned}
&p(x_j \ominus x_i \mid z_i, z_j, u_{i:j}) && (2.1)\\
&\quad \overset{\text{def.}}{=} p(t_i^j \mid z_i, z_j, u_{i:j}) && (2.2)
\end{aligned}
$$

where $\ominus$ is the pose difference operator [163]. It produces the relative transformation $t_i^j$ from the coordinate frame at $v_i$ (namely $x_i$) to the frame at $v_j$ ($x_j$), and $u_{i:j}$ is the sequence of control inputs between the two. The constraint is generated, for example, by odometry using $u_{i:j}$ or by a sensor data registration algorithm using $z_i$ and $z_j$. In general, the constraint $c_k$ stored in the graph will have parameters that depend on $z_i$, $z_j$, and $u_{i:j}$. Thus, we will write

$$
p(t_i^j \mid z_i, z_j, u_{i:j}) = p(t_i^j \mid c_k) \quad \text{for constraint } k \tag{2.3}
$$

It is now possible to express the probability of the robot trajectory as a function of the constraints:

$$
p(x_{1:t} \mid z_{1:t}, u_{1:t}) = \prod_i \prod_j p(x_j \ominus x_i \mid z_i, z_j, u_{i:j}) \tag{2.4}
$$

$$
p(x_{1:t} \mid G) = \prod_{(v_i, v_j, c_k) \in E} p(x_j \ominus x_i \mid c_k) \tag{2.5}
$$
As not all observations are directly related to each other, the specific graph of constraints is used to formulate the probability in a more intuitive and tractable way. In the literature, the constraint probability density (eq. 2.3) has mostly been modeled with a single multivariate normal distribution. This has allowed very efficient optimization algorithms to be formulated to solve eq. 1.3. The density then takes the form

$$
p(t_i^j \mid c_k) = \frac{1}{|2\pi\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2} (t_i^j \ominus \mu_k)^T \Sigma_k^{-1} (t_i^j \ominus \mu_k)\right) \tag{2.6}
$$

Here, $\mu_k$ is the mean of the transformation estimate for the constraint $c_k$, and $\Sigma_k$ is the corresponding covariance matrix. Note that $|2\pi\Sigma_k| = (2\pi)^d |\Sigma_k|$ if $\Sigma_k \in \mathbb{R}^{d \times d}$ [135]. In the context of optimization, the logarithm of the full trajectory probability (eq. 2.5) is most useful due to its numerical stability and ease of computation:

$$
\ln p(t_i^j \mid c_k) = -\tfrac{1}{2} \ln |2\pi\Sigma_k| - \tfrac{1}{2} (t_i^j \ominus \mu_k)^T \Sigma_k^{-1} (t_i^j \ominus \mu_k) \tag{2.7}
$$

$$
\ln p(x_{1:t} \mid G) = -\tfrac{1}{2} \sum_{(v_i, v_j, c_k) \in E} \ln |2\pi\Sigma_k| \;-\; \tfrac{1}{2} \sum_{(v_i, v_j, c_k) \in E} (t_i^j \ominus \mu_k)^T \Sigma_k^{-1} (t_i^j \ominus \mu_k) \tag{2.8}
$$
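Neglecting the constant terms and the constant factor $-\tfrac{1}{2}$ turns this into a cost function equal to the sum of squared Mahalanobis distances over all constraints:

$$
\mathrm{cost}(x_{1:t} \mid G) = \sum_{(v_i, v_j, c_k) \in E} (t_i^j \ominus \mu_k)^T \Sigma_k^{-1} (t_i^j \ominus \mu_k) \tag{2.9}
$$

Since $\mathrm{cost}(x_{1:t} \mid G) = -2 \ln p(x_{1:t} \mid G) + \text{const}$, minimizing eq. 2.9 is equivalent to maximizing eq. 2.8.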
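As an illustration, eq. 2.9 can be evaluated directly for planar poses. The sketch below assumes SE(2) poses $(x, y, \theta)$. It implements $\ominus$ in one common concrete form, which expresses the first argument in the frame of the second. This operator definition and the function names are our assumptions; they are not taken from [163].

```python
import numpy as np


def wrap_angle(a):
    """Map an angle to [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi


def ominus(a, b):
    """a (-) b for SE(2) poses/transforms (x, y, theta): a expressed in the frame of b."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    c, s = np.cos(b[2]), np.sin(b[2])
    # Rotate the world-frame difference by -theta_b
    return np.array([c * dx + s * dy, -s * dx + c * dy, wrap_angle(a[2] - b[2])])


def graph_cost(poses, edges):
    """Sum of squared Mahalanobis distances over all constraints (eq. 2.9).

    poses -- array of shape (n, 3), one (x, y, theta) per vertex
    edges -- iterable of (i, j, mu_k, sigma_k)
    """
    total = 0.0
    for i, j, mu, sigma in edges:
        t = ominus(poses[j], poses[i])   # t_i^j = x_j (-) x_i
        r = ominus(t, mu)                # t_i^j (-) mu_k
        total += float(r @ np.linalg.solve(sigma, r))
    return total
```

For example, take a single edge with $\mu_k = (1, 0, 0)$, $\Sigma_k = \mathrm{diag}(0.25, 0.25, 0.01)$, and $x_1 \ominus x_0 = (1.5, 0, 0)$. The residual is $(0.5, 0, 0)$, so the cost is $0.5^2 / 0.25 = 1$.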
This cost function was first described by Lu and Milios [105]. It is equivalent to the $\chi^2$ metric. Since minimizing eq. 2.9 is a weighted least-squares optimization problem, a number of classical methods are applicable, including Gauss-Newton, Levenberg-Marquardt, and Conjugate Gradient methods [14, 117].
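To show Gauss-Newton concretely without the SE(2) linearization, the sketch below assumes a translation-only 2D pose graph. There, $\ominus$ reduces to vector subtraction and each residual $r_k = (x_j - x_i) - \mu_k$ is linear. Each iteration accumulates $H = \sum_k J_k^T \Sigma_k^{-1} J_k$ and $b = \sum_k J_k^T \Sigma_k^{-1} r_k$, then solves $H \Delta = -b$. The first pose is held fixed to remove the gauge freedom. Function and parameter names are hypothetical.

```python
import numpy as np


def gauss_newton_translation(num_poses, edges, iterations=3):
    """Gauss-Newton for a translation-only 2D pose graph.

    edges -- list of (i, j, mu_k, sigma_k) with mu_k in R^2 and sigma_k 2x2
    Pose 0 is fixed at the origin to remove the gauge freedom.
    """
    x = np.zeros((num_poses, 2))
    for _ in range(iterations):
        H = np.zeros((2 * num_poses, 2 * num_poses))
        b = np.zeros(2 * num_poses)
        for i, j, mu, sigma in edges:
            omega = np.linalg.inv(sigma)          # information matrix
            r = (x[j] - x[i]) - mu                # residual of constraint k
            si, sj = slice(2 * i, 2 * i + 2), slice(2 * j, 2 * j + 2)
            # Jacobian blocks: dr/dx_i = -I, dr/dx_j = +I
            H[si, si] += omega
            H[sj, sj] += omega
            H[si, sj] -= omega
            H[sj, si] -= omega
            b[si] -= omega @ r
            b[sj] += omega @ r
        # Solve H dx = -b for all poses except the fixed pose 0
        dx = np.zeros(2 * num_poses)
        dx[2:] = np.linalg.solve(H[2:, 2:], -b[2:])
        x += dx.reshape(num_poses, 2)
    return x


def cost(x, edges):
    """Eq. 2.9 for the translation-only case."""
    total = 0.0
    for i, j, mu, sigma in edges:
        r = (x[j] - x[i]) - mu
        total += float(r @ np.linalg.solve(sigma, r))
    return total
```

Because the residuals are linear here, the first iteration already reaches the least-squares optimum. For SE(2) or SE(3) constraints, the Jacobians depend on the current estimate, and several iterations are required. Consider a three-pose loop whose closing edge disagrees with the odometry by 0.2 in $x$. The solution spreads this error evenly, 1/15 per edge, over the three equally weighted edges.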
2.3 Multi-Robot Pose Graph SLAM
2.3.1 The Multi-Robot Pose Graph Map Data Structure
[Figure 2.2 legend: robot trajectory; pose graph vertex; intra-robot sequential pose graph edge; intra-robot non-sequential pose graph edge; inter-robot pose graph edge]
Figure 2.2: A schematic depiction of a multi-robot pose graph. Inter-robot edges join previously disconnected components in the pose graph, one per robot.
A multi-robot pose graph consists of one or more disconnected components, with vertices and edges supplied by each member of the team. Components can become connected by successfully registering sensor data collected by two different robots. Formally, the multi-robot pose graph

$$ G^+ = (V^+, E^+) \tag{2.10} $$

consists of multiple disjoint sets of vertices

$$ V^+ = \bigcup_{i=1}^{N} V_i \tag{2.11} $$

where $N$ is the number of robots in the team. The edge set $E^+$ can be expressed similarly as

$$ E^+ = \left( \bigcup_{i=1}^{N} E_i \right) \cup \left( \bigcup_{i=1}^{N-1} \bigcup_{j=i+1}^{N} E_{i/j} \right) \tag{2.12} $$

Here $E_i$ is the set of edges involving only vertices of robot $i$, and $E_{i/j}$ is the set of edges connecting vertices of robot $i$ with vertices of robot $j$. $E_i$ can be further subdivided into sequential edges $E_{i,s}$ and loop closing edges $E_{i,l}$ as in section 2.1. For ease of notation in this section, the set of inter-robot edges in the graph is denoted $E^\times$:

$$ E^\times = \bigcup_{i=1}^{N-1} \bigcup_{j=i+1}^{N} E_{i/j} \tag{2.13} $$
Figure 2.2 shows a schematic depiction of a multi-robot pose graph. Edges are color coded depending on which set they belong to. Edges in E1,s are blue, edges in E2,s are green, edges in E1,l are red, and edges in E1/2 are orange. The edges in Ei/j ∪ Ej are significant as they allow robot i to close additional loops, for example when two robots start at one location and meet at another after having taken different routes. Robots i and j do not have to revisit places in their own trajectory, but can reuse their teammates’ trajectories for the same purpose. Because of its simplicity, the pose graph data structure has been rather popular in multi-robot SLAM applications [31, 84, 90, 91, 152].
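The edge partition of eqs. 2.12 and 2.13 and the notion of disconnected components can be sketched in a few lines. The sketch assumes, hypothetically, that each vertex is identified by a (robot, index) pair. Connected components are counted with a union-find structure.

```python
from collections import defaultdict


def classify_edges(edges):
    """Split multi-robot edges into E_{i,s}, E_{i,l} and E_{i/j}.

    Each vertex is a (robot, index) pair; each edge is a pair of vertices.
    """
    sequential = defaultdict(list)   # E_{i,s}, keyed by robot i
    loops = defaultdict(list)        # E_{i,l}, keyed by robot i
    inter = defaultdict(list)        # E_{i/j}, keyed by (i, j) with i < j
    for (ra, ia), (rb, ib) in edges:
        edge = ((ra, ia), (rb, ib))
        if ra == rb:
            (sequential if ib == ia + 1 else loops)[ra].append(edge)
        else:
            inter[(min(ra, rb), max(ra, rb))].append(edge)
    return sequential, loops, inter


def count_components(vertices, edges):
    """Number of connected components, via union-find with path halving."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(v) for v in vertices})
```

For example, two robots with three poses each form two components until a single inter-robot edge in $E_{1/2}$ joins them.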
2.3.2 Map Merging is Equivalent to Loop Closing
In the past, a number of approaches have been developed to register maps from multiple robots with each other in order to relate their coordinate frames spatially [9, 24–26, 40, 75, 95]. This work has mostly been restricted to occupancy grid maps produced by particle filter SLAM methods. In a pose graph SLAM context, however, the map estimates are encoded in the graph structure. They are not readily available in the form these approaches rely on, e.g. as occupancy grid maps. Given the formulation above, map merging in the pose graph case is equivalent to discovering pairwise constraints between sensor observations of different robots, i.e. the edges in $E_{i/j}$. When multiple separate single-robot maps are to be merged, all single-robot pose graphs have to be combined into a single pose graph structure. This is a straightforward union operation on the vertex and edge sets as described above. Consider the single-robot case where sequential registration fails, e.g. due to a sudden unexpected movement (the kidnapped robot problem [35]). As in that case, the union results in multiple disconnected components in the complete multi-robot pose graph. What remains is the problem of discovering constraints that link these disconnected components, i.e. edges in $E^\times$. As discussed in section 2.4.1, the general problem in loop detection is to find a previous sensor observation $z_{t'}$, $t' < t$, that is similar to the current observation $z_t$. This concept can be extended to the multi-robot pose graph to include not only previous observations of the same robot, but those of all other robots. Thus, an observation $z_t$ …

$$
C(i,j) = \begin{cases} 1, & S_1(i,j) > 0 \wedge S_2(i,j) > 0 \\ 0, & \text{otherwise} \end{cases}
\qquad
f(S_1, S_2) = \sum_i \sum_j C(i,j)
$$

[Figure 4.3 plot: Dirac spectrum over translation offsets, both axes from −250 to 250.]
Figure 4.3: This Dirac spectrum shows multiple possible translations (circles) for the right pair in figure 4.2. Note the distinct "V" shape of possible positions, which corresponds directly to sliding overlaps in the scans.
More overlap results in more pixels having a non-zero value in both scans, which increases this metric. The candidate translation with the highest overlap is reported as the result. All local maxima that are at least 80% as large as the global maximum are taken into account. Unfortunately, this processing depends heavily on the scans themselves. Some scans have many similarities, which reduces the number of local maxima, so this metric may not need to be evaluated at all. Other scans, such as the one shown in figure 4.2, give rise to many local maxima, as shown in figure 4.3. However, this refinement step is entirely optional, so the basic algorithm retains its constant-time property.
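The candidate selection described above might be sketched as follows. The sketch makes three simplifying assumptions. The scan grids $S_1$ and $S_2$ are non-negative. Candidate translations are given as integer cell offsets together with their peak heights from the Dirac spectrum. `np.roll` applies a circular shift, which matches the periodicity implied by the FFT. The names are hypothetical.

```python
import numpy as np


def overlap(s1, s2):
    """f(S1, S2): number of cells that are non-zero in both scans."""
    return int(np.count_nonzero((s1 > 0) & (s2 > 0)))


def refine_translation(s1, s2, candidates, ratio=0.8):
    """Pick the candidate translation with the largest overlap.

    candidates -- list of ((dy, dx), peak_height) pairs from the Dirac spectrum
    ratio      -- only peaks >= ratio * global maximum are evaluated (80% in the text)
    """
    global_max = max(height for _, height in candidates)
    best, best_score = None, -1
    for (dy, dx), height in candidates:
        if height < ratio * global_max:
            continue  # too weak compared to the global maximum
        # Move S2 by the candidate translation (circular shift, a simplification)
        shifted = np.roll(s2, shift=(dy, dx), axis=(0, 1))
        score = overlap(s1, shifted)
        if score > best_score:
            best, best_score = (dy, dx), score
    return best, best_score
```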
4.4 Estimating Mean and Covariance for Spectral Voxel Grid Registration
The 6-DoF spectral registration method introduced in [18] also relies on the fact that two signals with the same spectral magnitude carry their shift information in the phase, just like its 2D variant above. Again, any rotation of the 3D signal results in a rotation of the spectral magnitude and is thus decoupled from translation. A spherical re-projection of the 3D spectral magnitude transforms the three rotations into two translations (roll and pitch) and another rotation (yaw). Finally, the yaw parameter can be transformed into a signal shift by an appropriate re-projection as well. All these shifts can be determined by a Phase-Only Matched Filter (POMF). An important element of this registration method is the non-trivial resampling of the data on which the POMFs are applied. This resampling is very fast and efficient, but it introduces inherent distortions. The method hence trades high robustness against noise and partial overlap for limitations on the maximum amount of rotation between scans. Concretely, the 6-DoF registration method by Bülow and Birk [18] is restricted to changes of yaw of up to ±90° combined with simultaneous changes of roll and pitch of up to ±35° between the two scans to be registered. A detailed discussion of the method itself, as well as experimental evaluations including a comparison with ICP, can be found in [18]. The 6-DoF registration method allows 3D maps to be generated by sequential registration of scans. Obviously, registration errors then accumulate and the resulting map is subject to drift. It is hence of interest to improve the map by using SLAM, which requires an uncertainty estimate for each registration result. This estimate is derived as follows. Each POMF generates a peak at the location of the shift, i.e. at the corresponding parameter(s) of the related DoFs. In theory, the peaks are ideal Dirac pulses.
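Phase correlation, a symmetric variant of phase-only matched filtering, illustrates how a shift is recovered from the phase. The cross-power spectrum is normalized to unit magnitude, so its inverse FFT is ideally a Dirac pulse at the shift. This is a sketch for integer circular shifts in any number of dimensions. It does not reproduce the exact filter normalization or the resampling steps of [18].

```python
import numpy as np


def phase_correlation(a, b, eps=1e-12):
    """Phase-only correlation of two equally sized arrays (any dimension)."""
    Fa = np.fft.fftn(a)
    Fb = np.fft.fftn(b)
    cross = Fb * np.conj(Fa)
    # Discard the magnitude, keep only the phase difference
    return np.real(np.fft.ifftn(cross / (np.abs(cross) + eps)))


def estimate_shift(a, b):
    """Integer circular shift s such that b is approximately a shifted by s."""
    spectrum = phase_correlation(a, b)
    peak = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    return tuple(int(p) for p in peak)
```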
In reality, however, they are degraded by noise as well as by occlusions and partial overlap of the scans, much like in the 2D case above. Figures 4.4 and 4.5 show corresponding examples of POMF results. The three separate POMF results used in the registration method are hence normalized and treated as probability mass functions (PMF). More precisely, the DoFs are treated as follows, closely following the decoupling achieved in the registration method. The yaw parameter is computed using a one-dimensional POMF and thus results in a one-dimensional PMF. Both roll and pitch parameters are computed simultaneously and are encoded in a two-dimensional PMF. Finally, the three translational parameters, which are determined by a 3D POMF, result in a three-dimensional PMF. For a thorough description of the registration method and why the 6 parameters are decoupled this way, see [18]. For each parameter set, a normal distribution is fitted to the neighborhood of the parameter with the highest weight. For the one-dimensional case, the normal distribution is fitted as follows:

$$ \mu = \sum_i \mathrm{pmf}(i) \cdot p(i) \tag{4.9} $$

$$ C = \sum_i \mathrm{pmf}(i) \cdot (p(i) - \mu)^2 \tag{4.10} $$
where $\mu$ is the resulting mean, $C$ is the variance, and $p(i)$ is the parameter corresponding to cell $i$. The mean and variance are computed using a neighborhood of size $N$ around the cell with the maximum probability $i^*$, so $i = i^* - N, \ldots, i^* + N$. Usually, $N \approx 10$ achieves a good balance between accuracy, computation speed, and robustness. Larger neighborhoods potentially include outliers that distort the result significantly. For higher-dimensional PMF results, the sum is expanded to encompass all dimensions, and the variance becomes a covariance matrix. Only the three-dimensional case is shown here; the two-dimensional case is defined similarly. With $p(i,j,k)$ and $\mu$ taken as column vectors:

$$ \mu = \sum_i \sum_j \sum_k \mathrm{pmf}(i,j,k) \cdot p(i,j,k) \tag{4.11} $$

$$ C = \sum_i \sum_j \sum_k \mathrm{pmf}(i,j,k) \cdot (p(i,j,k) - \mu)\,(p(i,j,k) - \mu)^T \tag{4.12} $$
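Eqs. 4.9 to 4.12 can be sketched for a PMF of any dimension as shown below. Two details are our assumptions rather than statements from the text. First, the neighborhood is clipped at the array border; the yaw PMF is periodic and could be wrapped instead. Second, the weights inside the neighborhood are renormalized to sum to one, so that mass outside the window does not bias the mean.

```python
import numpy as np


def fit_gaussian(pmf, grids, n=10):
    """Fit mean and covariance to a PMF around its maximum (eqs. 4.9-4.12).

    pmf   -- d-dimensional array of non-negative weights (a normalized POMF result)
    grids -- list of d 1-D arrays; grids[a][k] is the parameter value of cell k on axis a
    n     -- neighborhood half-width N, so cells i* - N ... i* + N are used
    """
    peak = np.unravel_index(np.argmax(pmf), pmf.shape)
    # Neighborhood around the peak, clipped at the array border (assumption)
    window = tuple(slice(max(p - n, 0), min(p + n + 1, size))
                   for p, size in zip(peak, pmf.shape))
    axes = np.meshgrid(*[g[w] for g, w in zip(grids, window)], indexing="ij")
    params = np.stack([ax.ravel() for ax in axes], axis=1)   # one row p(i, j, k) per cell
    weights = pmf[window].ravel()
    weights = weights / weights.sum()   # renormalize over the neighborhood (assumption)
    mu = weights @ params
    diff = params - mu
    cov = (weights[:, None] * diff).T @ diff   # sum of w * (p - mu)(p - mu)^T
    return mu, cov
```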
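In total, there are thus three means $\mu_{yaw} \in \mathbb{R}$, $\mu_{roll,pitch} \in \mathbb{R}^2$, and $\mu_{x,y,z} \in \mathbb{R}^3$, with their respective covariances. The final registration result is then:

$$ \mu = (\mu_{x,y,z}^T, \mu_{roll,pitch}^T, \mu_{yaw})^T \tag{4.13} $$

$$ C = \begin{pmatrix} C_{x,y,z} & 0_{3\times2} & 0_{3\times1} \\ 0_{2\times3} & C_{roll,pitch} & 0_{2\times1} \\ 0_{1\times3} & 0_{1\times2} & C_{yaw} \end{pmatrix} \tag{4.14} $$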
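Assembling eqs. 4.13 and 4.14 from the three decoupled estimates is a simple stacking operation. A hypothetical helper is sketched below.

```python
import numpy as np


def assemble_registration(mu_xyz, C_xyz, mu_rp, C_rp, mu_yaw, C_yaw):
    """Stack the three decoupled estimates into a 6-DoF mean and block-diagonal covariance."""
    mu = np.concatenate([np.ravel(mu_xyz), np.ravel(mu_rp), np.ravel(mu_yaw)])
    C = np.zeros((6, 6))
    C[0:3, 0:3] = C_xyz                        # translation block
    C[3:5, 3:5] = C_rp                         # roll/pitch block
    C[5:6, 5:6] = np.reshape(C_yaw, (1, 1))    # yaw variance
    return mu, C
```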
In figures 4.4 and 4.5, the interpolated mean and the covariances are plotted along with the analyzed POMF result. All ellipses show a 95% confidence region. Along with the mean and covariance, the analysis also computes a peak-to-noise ratio (PNR) in order to estimate the reliability of the registration result. Intuitively, the better the two scans fit together, the more closely the peak in the POMF result approximates a Dirac pulse (figure 4.4). On the other hand, the POMF result is more uniform when the scans do not share enough common features (figure 4.5). Thus, a good prediction of the registration reliability can be obtained by comparing two energies. The first is the energy of the POMF result within the neighborhood used to compute the mean and covariances above; the second is the total energy. The idea is that good results have a single pronounced maximum around which the distribution is fitted, while bad results contain a noisy but rather uniform distribution. Thus, the more energy (probability mass) is left out of the calculation, the worse the result. Since the POMF result is normalized to obtain a PMF, this leads to the following sum and ratio:

$$ E = \sum_i \mathrm{pmf}(i) \tag{4.15} $$

$$ \mathrm{PNR} = \frac{E}{1 - E} \tag{4.16} $$

where $i$ again ranges over all cells in the neighborhood of the peak, as in the computations above. As with the computation of the mean, the sum is expanded to cover all dimensions present in the PMF. As mentioned, figures 4.4 and 4.5 show two examples that illustrate two extreme cases. In figure 4.4, a relatively easy scan pair is chosen, which has a high overlap and moderate noise levels, leading to a good registration result. This is reflected in the corresponding PNR values and covariances. Figure 4.5 shows a hard case, where the two scans were chosen such that the registration fails, which is clearly reflected in the PNR values and covariances. A threshold on the PNR can be used to distinguish good and bad registration results during the mapping process.
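The following sketch implements eqs. 4.15 and 4.16. It uses the same neighborhood convention as the mean and covariance fit, with the window clipped at the array border, which is our assumption.

```python
import numpy as np


def peak_to_noise_ratio(pmf, n=10):
    """PNR = E / (1 - E), E = probability mass in the neighborhood of the peak (eqs. 4.15-4.16)."""
    pmf = pmf / pmf.sum()                          # treat the POMF result as a PMF
    peak = np.unravel_index(np.argmax(pmf), pmf.shape)
    window = tuple(slice(max(p - n, 0), min(p + n + 1, size))
                   for p, size in zip(peak, pmf.shape))
    E = pmf[window].sum()
    # A perfect Dirac pulse leaves no mass outside the window
    return np.inf if E >= 1.0 else E / (1.0 - E)
```

A result with 80% of its mass at the peak yields a PNR of $0.8 / 0.2 = 4$.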
[Figure 4.4 plots: top row, the scans before and after matching (axes x, y, z); bottom, the POMF results with fitted normal distributions, including the yaw PMF (probability over yaw in radians).]
Figure 4.4: Two sequential scans with a successful registration result from the flood gate data set. The top row shows the scans before and after matching. The bottom shows the corresponding POMF results with matched normal distributions. The PNR values are high: yaw 0.9608, roll/pitch 29.6166, translation 37.3184. Note that the probability mass functions are described well by normal distributions.
[Figure 4.5 plots: top row, the scans before and after matching (axes x, y, z); bottom, the POMF results with fitted normal distributions, including the yaw PMF (probability over yaw in radians).]
Figure 4.5: Two non-sequential scans from the flood gate data set with low overlap, leading to an unsuccessful matching result. The top row shows the scans before and after matching. The bottom shows the corresponding POMF results with matched normal distributions. The PNR values are low: yaw 0.1955, roll/pitch 0.0505, translation 0.0279. Note that the probability mass functions are not described well by normal distributions, since the matching was unsuccessful.
4.5 Experimental Results
4.5.1 2D Affine iFMI Mapping
4.5.1.1 Cold Corals
Figure 4.6: Short sequence of three images from the cold water corals data set. The detail on the bottom shows the merged image before optimization, and the one above shows the much clearer structures of the optimized image map. The corresponding pose graph with a slightly misaligned top is shown on the right. Note the global covariance matrices plotted as ellipses in magenta.
The first data set discussed here was recorded off the coast of Sweden. It was collected by a remotely operated vehicle (ROV) and contains a video stream showing cold water corals. The data is quite challenging for an image registration algorithm, as fish, plankton, and the ground itself all give rise to different flow patterns. Our results show that the iFMI registration algorithm is robust against such noise and generates covariance matrices usable for maximum likelihood mapping with the pose graph. Figure 4.6 shows a set of three images which were mutually registered. All three images are rendered transparently on top of each other using a Gaussian kernel. The mutual registration is reflected in the resulting pose graph shown on the right: each of the three vertices is connected to the two other vertices. The global covariances of the poses are plotted in magenta. Note that the first node (shown in green) does not have any positional uncertainty, as it is defined as the origin of the global map coordinate frame. The final result after pose graph optimization is shown on the left of figure 4.6. Even before optimization, the error is minimal, as seen in the close-up (bottom center) and in the pose graph on the right. The top vertex in the graph is actually projected to two different poses, depending on the path taken to accumulate the registration results. Such conflicts are visualized in red. The detail view in the middle of figure 4.6 shows how maximum likelihood mapping mitigates the slight error that accumulates even over two steps. Before optimization, distinguishing features are blurred and smudged in the composite image, shown in the bottom center image. Once the underlying pose graph is optimized, the highlighted features are much crisper since the images are aligned better. In addition, the significant drop in the objective function used during optimization shows quantitatively that the map improved. Over all edges, the sum of squared Mahalanobis distances, or $\chi^2$ metric, is 0.611929 before optimization. After optimization, it is reduced to $3.05721 \cdot 10^{-32}$. This demonstrates that all three constraints are satisfied simultaneously. The computation time of the pose graph optimization with three vertices and three edges is negligible: less than 0.2 ms on an Intel Core i7 2.67 GHz with the SGD method implemented in the TORO optimization library [68].

4.5.1.2 Rocky Testing Pool
The following data was recorded in a small pool used for testing landers and crawlers at Jacobs University Bremen's Ocean Lab. The bottom of the pool was covered with sand and several differently sized rocks. Conditions in the pool were not favorable, as it was mostly overgrown by algae and the ground was barely visible through the green-tinted water. However, since it was a very controlled environment, we were able to collect data that contains many loops and is thus very suitable for mapping. Figure 4.7 shows a map of the pool ground and rocks before and after optimization. Before optimization, as shown on the left of the figure, the map uses only the cumulative transformations calculated by sequential image registration. It is quite obvious that the lower part of the map is blurred beyond recognition. Two rocks, which seem to be properly localized in the upper portion of the map, are actually visible twice, as pointed out by the arrows. Due to translational errors in the right part of the map, two rocks (the triangular one and its left neighbor) seem properly sized, but are localized incorrectly relative to each other. After optimization with the SGD method implemented in the TORO library [73], the map, shown on the right of figure 4.7, has much crisper and more visible features. The bottom left of the map is clearer, and the two rocks indicated by the arrows are localized properly. The optimization of the pose graph finished after 5.25 ms on an Intel Core i7 2.67 GHz. This short computation time is due to the relatively small size of the graph: the map contains only 35 vertices and 57 edges, as shown in figure 4.9. Please refer to [73, 119] for a detailed investigation and performance analysis of the optimization algorithm. An interesting detail is revealed by looking only at the first and last picture of the sequence. Figure 4.8 shows how far apart the two pictures are localized before optimization.
This specific artifact is the same as the one indicated by arrows in the complete map (figure 4.7). After optimization, the two images overlap properly. This is especially visible when looking at the diamond-shaped rock: before optimization, it appears twice in the picture; afterwards, both images show the rock at the same location.

Figure 4.7: Complete image map generated from the testing pool data set, before (left) and after (right) graph optimization. Note the blurred lower left part of the map before optimization and the misalignment of two rocks (arrows). After optimization, the map is visibly improved and the rocks are aligned properly.

Figure 4.8: Detail of the first and last frame of the testing pool image map, before and after pose graph optimization.

Figure 4.9 shows the underlying pose graph data structure. The arrow heads represent the global poses of the vertices. Blue lines show edges between vertices. Conflicts in the pose graph, where the same vertex is projected to two or more different poses due to loops, are highlighted in red. A similar distribution of errors is visible in this visualization. Vertices on the lower left loop of the graph are not very well localized before optimization. One of the vertices projects to significantly different poses, shown here in red, with lines connecting them to the most likely pose drawn in black. After optimization, all constraints are satisfied and no conflicts remain. The significant decrease in the sum of squared Mahalanobis distances demonstrates this as well: before optimization, the value is 367.478; afterwards, it drops to $9.527 \cdot 10^{-4}$. These values quantify the error in the map described above and show that the optimization algorithm was able to use the generated uncertainty information in a meaningful way. The performance of the registration is dominated by the underlying FFT implementation. The following experiments use the FFT implementation of the GNU Scientific Library. Preliminary tests have shown that more optimized FFT implementations may increase performance by 30%. On an Intel Core i7 2.67 GHz with 6 GB RAM, registration of two images takes on average 0.055 seconds (σ = 0.0071). Please note that the program was not multi-threaded and thus did not use the processor to its fullest extent. Performance can be improved if the images are sequential, since some of the previous FFT computations can then be reused. The registration then takes only 0.028 seconds on average (σ = 0.0058) for each sequential registration after the first one. These numbers include the time needed to extract the described uncertainty information. Generating the map shown in figure 4.7 took 1.9 seconds in total.
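The reuse of FFT computations for sequential images might be organized as shown below. Each frame's spectrum is computed once and cached, so every registration after the first needs one new forward FFT instead of two. This sketch only recovers integer translations by phase correlation. The full iFMI pipeline also resamples the spectra for rotation and scale, and those intermediate results could presumably be cached in the same way. The class name is hypothetical.

```python
import numpy as np


class SequentialPhaseCorrelator:
    """Registers each new frame against its predecessor, reusing the predecessor's FFT."""

    def __init__(self, eps=1e-12):
        self.eps = eps
        self.prev_spectrum = None
        self.forward_ffts = 0   # number of forward FFTs computed so far

    def add_frame(self, image):
        """Return the integer shift of `image` relative to the previous frame (None for the first)."""
        spectrum = np.fft.fft2(image)
        self.forward_ffts += 1
        shift = None
        if self.prev_spectrum is not None:
            cross = spectrum * np.conj(self.prev_spectrum)
            corr = np.real(np.fft.ifft2(cross / (np.abs(cross) + self.eps)))
            shift = tuple(int(v) for v in np.unravel_index(np.argmax(corr), corr.shape))
        self.prev_spectrum = spectrum   # cached for the next call
        return shift
```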
Figure 4.9: Pose graph that represents the map shown in Figure 4.7, before (top) and after (bottom) optimization with the TORO graph optimization library. It consists of 35 vertices and 57 edges. The starting pose is green, constraint edges from registration are blue, and vertices that are projected to multiple poses because of accumulated errors in a loop are shown in red. Note that these conflicts are resolved after optimization.

4.5.1.3 Real-World ROV Data
Figure 4.10: The Romeo ROV used in the experiments. The bottom image shows the monocular camera assembly.
Graph-based SLAM with the robust iFMI image registration method is applied to imagery collected by the Romeo Remotely Operated Vehicle (ROV) system (see Figure 4.10) in the summer of 2005 in the Ligurian Sea near Portofino, Italy. The trial took 44 minutes and 15 seconds. This corresponds to 13275 processed images of 360 × 272 pixels at a 5 Hz rate, covering an area of approximately 23 m² at a depth of around 20 m. The Romeo ROV navigated through way-points along a two-dimensional grid (lawn mowing pattern) in auto-altitude mode and at constant heading. For more details about the setup, please refer to [23]. The open source implementation OpenFABMAP [62] was used to detect loops in the video stream using SURF features. To make the results as comparable as possible to previous work on this data set, the same SURF descriptor parameters were used as in [54]: 3 for the number of levels and octaves, and 300 as the minimum Hessian value. Loop closure candidates had to be at least 10 images in the past. The minimum match value was lowered from the default of 0.98 to 0.2, allowing OpenFABMAP to generate many more, possibly false, loop hypotheses. However, the registration method was robust enough to detect these false positives and to add only true positive registration results to the final map. Figure 4.11 shows several maps generated from the input data. The map generated by Ferreira et al. [54] is shown in figure 4.11(a). The pose graph map shown in 4.11(c) consists of 4075 vertices and 6818 edges (image registration results). The map in 4.11(b) contains only sequential edges, and thus consists of 4075 vertices and 4074 edges. A Matlab-based implementation registers more than 10 image pairs per second, yielding scaling, rotation, and translation estimates as well as a corresponding covariance matrix. A C++ implementation of iFMI achieves more than 50 registrations per second at the same resolution.
This can be further sped up by caching FFT results if the images are sequential. The final optimization of the graph was done with a Gauss-Newton method implemented in the g2o library [98]. Optimizing the pose graph map shown in figure 4.11(c) took 0.505 s on an Intel i7-3770 3.4 GHz with 16 GB RAM. Figures 4.12 and 4.13 show how significantly the addition of loop closures from OpenFABMAP and subsequent optimization improves the overall map quality. Note the highlighted image seams, which are corrected after optimization. In the results presented, only image registration was used; other navigation data from the vehicle, as well as the installed laser altimeter, were not used. The only assumption made was that the vehicle kept a constant heading.

Figure 4.11: Rendered Photomosaic Maps. (a) Map by Ferreira et al. [54]; (b) without loop constraints; (c) with loop constraints.

Figure 4.12: Details of a small sea mount structure in the map: (a) map by Ferreira et al. [54]; (b) without loop constraints; (c) with loop constraints. Note the significant improvement when using spectral registration and subsequent graph optimization (right). Note the obvious image seams pointed out by arrows. These seams almost completely vanish after optimization.

Figure 4.13: High-resolution crop of the start area: (a) without loop constraints; (b) with loop constraints. Note the gross error of the map without loops. These errors are corrected after optimization with loops.
4.5.2 2D Spectral Scan Matching
4.5.2.1 Real-World 2D Sonar Range Scans
The scan matcher was tested on sonar data collected in an abandoned marina in Spain (see [153, 154], published in the Radish repository by [83]). A total of 217 full 360° scans were extracted from the data set and motion-compensated with the available Doppler Velocity Log (DVL) odometry data. Motion compensation was necessary because the rotating-head sonar scanner takes around 14 seconds for a full revolution, during which the robot is not stationary. To compensate, each beam was given a starting pose within a local scan-centric coordinate frame. This pose was interpolated from the synchronized Doppler Velocity Log readings. The beam was then projected onto a 512 by 512 discrete grid with a 20 cm cell resolution. This grid size was chosen because the FFT is more efficient for power-of-two sizes than for other sizes. Since the proposed method is based on correlation with the POMF, some features of walls precluded a correct matching result. Such an example is shown in figure 4.2 on the right. Measurement noise makes the wall appear increasingly wide in the sonar data, and this widening is a large enough feature to disturb the correlation result. This also happened in the corridor to the right of the map shown in figure 4.14(a). Both parallel walls showed such features, so it appeared that the robot did not move at all, even though it was moving down the corridor. However, with some features, such as in figure 4.2, failure is easily detectable by observing the Dirac spectrum. The algorithm chooses the best peak in the spectrum, but there may be multiple other alternatives. Specific failure conditions, such as the number of peaks, their relative height, etc., are currently being investigated. Figure 4.14(a) shows a map generated by incrementally matching pairs of scans that were taken up to 20 meters apart. Note that the scan matcher failed to match scans close to the long corridor, as the DVL used for motion compensation lost accuracy. As a result, the scans were distorted even after motion compensation, and the scan matcher was unable to find a good match. It was, however, possible to close a large loop once the robot arrived at the lower left corner again. The very first and the 123rd scan were successfully matched (shown in figure 4.2 on the left). A short trajectory from the 123rd to the 145th scan was added at the bottom of the map. Figure 4.14(b) shows a map containing a number of loops. The good overall fit shows that the scan matcher can robustly and consistently match different paths to the same location. Only one error is obvious: the top wall is present twice. This is due to a slight error while matching two opposite scans of the "V" shape that ends in the top wall. As shown in figure 4.2, these can be hard to match.

Figure 4.14: Complete Maps. (a) A map generated by incrementally matching scans with the proposed scan matcher. (b) A map containing loops. Note that some loops result in slight misalignments, such as the erroneous double wall on the top of the map. This is due to errors caused by low overlap, as shown in figure 4.2. (c) The map from (b) optimized with a translation-only variant of Lu and Milios [105]. (d) The map from (b) optimized with the SGD method implemented by Grisetti et al. [73].
Figure 4.15: Examples of successful registration results. Note that the rotation is greater than 90 degrees in the two examples on the left. Top right: One wall is missing from one scan, but the scan matcher still succeeds.
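The per-beam motion compensation described above for the marina data set might be sketched as follows. Only the 512 × 512 grid and the 20 cm cell size come from the text. Everything else is a hypothetical assumption:

- DVL odometry is available as timestamped planar poses.
- Poses are interpolated linearly.
- Each beam contributes a single range return, whereas a real sonar beam carries an intensity profile.
- The scan origin lies at the grid center.

```python
import numpy as np

GRID_SIZE = 512   # cells per side, as in the text
CELL_SIZE = 0.2   # metres per cell, as in the text


def interpolate_pose(t, times, poses):
    """Linearly interpolate an (x, y, theta) pose at time t from DVL-based poses."""
    theta = np.unwrap(poses[:, 2])   # avoid interpolating across the +-pi jump
    return (np.interp(t, times, poses[:, 0]),
            np.interp(t, times, poses[:, 1]),
            np.interp(t, times, theta))


def project_scan(beam_times, beam_angles, beam_ranges, times, poses):
    """Project one motion-compensated sonar revolution into a scan-centric grid.

    Hypothetical simplification: each beam contributes a single range return.
    """
    grid = np.zeros((GRID_SIZE, GRID_SIZE))
    # The pose at the first beam defines the scan-centric frame
    x0, y0, th0 = interpolate_pose(beam_times[0], times, poses)
    c0, s0 = np.cos(th0), np.sin(th0)
    for t, angle, rng in zip(beam_times, beam_angles, beam_ranges):
        x, y, th = interpolate_pose(t, times, poses)   # sensor pose for this beam
        wx = x + rng * np.cos(th + angle)               # beam end point, world frame
        wy = y + rng * np.sin(th + angle)
        lx = c0 * (wx - x0) + s0 * (wy - y0)           # ... in the scan-centric frame
        ly = -s0 * (wx - x0) + c0 * (wy - y0)
        col = int(np.floor(lx / CELL_SIZE)) + GRID_SIZE // 2
        row = int(np.floor(ly / CELL_SIZE)) + GRID_SIZE // 2
        if 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE:
            grid[row, col] += 1.0
    return grid
```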
On an Intel Core 2 Duo 2.8 GHz laptop running MATLAB, a single scan matching operation takes on average 0.6711 seconds (σ = 0.0328). The runtime depends only on the resolution of the scan and is thus nearly constant, as shown by the low standard deviation. A preliminary C++ implementation achieves up to 50 Hz for the above operation. Depending on the number of extra peaks that need to be evaluated, the post-processing (also implemented in MATLAB) needs from 0.42 to 8 seconds (µ = 2.5356, σ = 1.7145) after the actual scan matching operation is finished. As evident from the large standard deviation, the required computation time varies significantly, as expected. Generating the map shown in figure 4.14(b) took 211.64 seconds in total; it includes 66 matched pairs. The map in figure 4.14(a) contains 36 scans, each matched in sequence. Since acquiring the data took more time than processing it, the method can be considered real-time. Note that most (three out of four) maps in figure 4.14 are distorted due to false registration results for very few pairs. This highlights the need for the robust graph optimization methods discussed in part III of this thesis.
4.5.3
3D iFMI Voxel Grid Mapping
4.5.3.1
Simulated 3D Sonar Range Scans
Simulated 3D sonar data was collected in a high-fidelity simulator developed for the EU FP7 project "Cooperative Cognitive Control for Autonomous Underwater Vehicles (Co3AUVs)". It is a distributed multi-robot simulator built upon popular open-source components for physics simulation and graphics, namely Bullet Physics and OGRE. A ROS interface was developed for easy access to and control of the simulated robots. The simulated 3D sonar sensor was modeled after the Tritech Eclipse imaging sonar at the maximum resolution, as used in the real data set described in the next section. Its field of view is 120 degrees horizontally and 45 degrees vertically, with 256 and 91 beams, respectively. Synthetic zero-mean, normally distributed noise is added to the ground truth scans collected in the simulation. All scans received noise with a rather large standard deviation of 2% of the ground truth distance of the beam. Over the sensor range of 100 meters, this usually resulted in deviations of 2-3 meters, similar to those seen in the real scans. A total of 700 scans were recorded along a 3D trajectory forming a loop in the environment shown in figure 4.17. The map in figure 4.18 contains a total of 787 registration pairs. Only the 3D spectral registration of the resulting point clouds was used to generate these maps; no motion sensors such as an INS or DVL were used. The PNR values were used to classify good registration results. During the analysis of the experiment, it became clear that the PNR of the translation result is sufficient to predict good matching performance. As translation estimation is the last step of the process, its PNR combines the quality of the previous steps. A simple threshold of 0.1 was enough to reliably filter out bad results of registrations involving temporally near pairs, e.g., scan number 1 with scan number 5. Another, much more aggressive, threshold of 40 was used to filter temporally far but spatially near pairs for loop closure, e.g., scan number 1 with scan number 700.
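The noise model and the two-threshold filtering described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the 2% noise factor and the PNR thresholds 0.1 and 40 come from the text, while the index gap `MAX_NEAR_GAP` separating "temporally near" from "temporally far" pairs is a hypothetical parameter, since the text does not state where that boundary lies.

```python
import random

NOISE_FRACTION = 0.02   # sigma = 2% of the ground-truth beam range (from the text)
PNR_NEAR = 0.1          # threshold for temporally near pairs (from the text)
PNR_LOOP = 40.0         # threshold for loop-closure pairs (from the text)
MAX_NEAR_GAP = 10       # hypothetical cut-off between "near" and "far" scan indices


def add_range_noise(ranges, rng):
    """Add zero-mean Gaussian noise whose sigma is proportional to each beam's range (meters)."""
    return [r + rng.gauss(0.0, NOISE_FRACTION * r) for r in ranges]


def accept_registration(i, j, translation_pnr):
    """Keep the registration of scans i and j if its translation PNR reaches the threshold."""
    threshold = PNR_NEAR if abs(i - j) <= MAX_NEAR_GAP else PNR_LOOP
    return translation_pnr >= threshold


rng = random.Random(0)
noisy = add_range_noise([10.0, 50.0, 100.0], rng)  # sigmas of 0.2 m, 1 m and 2 m
print(accept_registration(1, 5, 0.5))     # True
print(accept_registration(1, 700, 25.0))  # False
```

Only the translation PNR is tested, reflecting the observation that it already combines the quality of the earlier registration steps.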
The final map shown in figure 4.18 was constructed in a pose graph SLAM framework. The freely available TORO [73] optimizer is used to optimize the pose graph. Using a breadth-first initialization of vertex poses, the initial log-probability of the complete pose graph was 849.525. After optimization with TORO, the log-probability rose to 2879.63. Two hundred iterations of TORO required 0.64 seconds of computation time on a Core 2 Duo 2.8 GHz. The mean squared distance to the ground truth also decreased significantly. Before optimization, the edge-wise mean square error proposed in [97], computed relative to the ground truth from the simulation, was 5.8025 m² for translation and 0.0044569 rad² for rotation. After optimization based on our uncertainty estimation, the error decreased to 0.737942 m² and 0.00075762 rad², respectively. The mean square error metrics between global vertex poses and ground truth, SSExyz and SSEφψθ (see section 2.5), decreased from 115.414 to 25.124 for translation and from 0.063867 to 0.0432832 for rotation.

[Figure 4.16 plots: point clouds (axes x, y, z); PMF over yaw (radians) with axis "probability"]
Figure 4.16: Two simulated sonar scans as point clouds before (top left) and after (top right) registration. The bottom row shows the resulting PMFs and the fitted normal distribution used for mapping.
Figure 4.17: Two screenshots of the simulated world with the robot model. The total area of the world was 200 m × 200 m.
4.5.3.2
Real-World 3D Sonar Range Scans
This data set consists of 18 scans that were collected around a river flood gate and lock in the north of Bremen, Germany, as shown in figure 4.19. The sensor used is a Tritech Eclipse sonar, a multibeam sonar with time-delay beamforming and electronic beam steering. Its core acoustic sensing parameters are:
• Operating Frequency: 240 kHz
• Beam Width: 120°
• Number of Beams: 256
• Acoustic Angular Resolution: 1.5°
• Effective Angular Resolution: 0.5°
• Depth/Range Resolution: 2.5 cm
• Maximum Range: 120 m
• Minimum Focus Distance: 0.4 m
• Scan Rate: 140 Hz at 5 m, 7 Hz at 100 m
• Vertical Scan Range: 45°
The sensor data was recorded and used with only minor processing. Specifically, the echo intensities were thresholded to discard erroneous readings. The data set is used for a qualitative evaluation of the uncertainty estimation method. The scans were recorded sequentially. Hence, the further apart two scans are in the sequence, the less they overlap, and the higher the uncertainty one can expect when registering them. Figures 4.4 and 4.5 show two examples from the opposite ends of the registration tests, namely the registration results of S1 and S2, as well as S1 and S13, respectively. As also supported by visual inspection, the result of the S1/S2 registration is very accurate, whereas the S1/S13 registration fails completely, which is not very surprising as 11 scans in the sequence were skipped. These qualitative observations are also reflected in the corresponding uncertainty estimates. In particular, all PNR values decrease significantly from figure 4.4 to figure 4.5, dropping well below the threshold of 0.3 and thus correctly identifying the second pair as erroneous. Figure 4.19 shows the complete 3D map built by this method, once as a voxel grid and once projected onto ground truth imagery from Google Earth.

Figure 4.18: The 3D point cloud map before (top) and after optimization (bottom). Note the improvement in the top left area. The map contains approximately three million points. The underlying pose graph structure is shown in blue. Height is color coded ranging from blue (low) to red (high).

Figure 4.19: A 3D map of the Lesumer Sperrwerk, a flood gate in Bremen. It is generated by sequentially matching several 3D sonar scans with the 6 DoF spectral registration method by Bülow and Birk [18]. The 3D map is shown once as a 3D voxel grid (top) and once projected onto ground truth imagery from Google Earth (bottom).
Chapter 5
Multi-Robot Graph-based SLAM
5.1
Bandwidth Advantages of the Multi-Robot Pose Graph
While not immediately obvious, even transmitting a full pose graph including all sensor observations, e.g., to a robot operator station, is much more bandwidth-efficient than transmitting occupancy grids. In this section, the specific case of SLAM with 2D laser range finders is discussed. Several experiments show that transmitting a pose graph map representation and updating it over time requires significantly less bandwidth than updating a traditional occupancy grid map, e.g., as generated by particle filter methods such as [71]. To allow a fair comparison, only the map representation and its sequential updates are analyzed, which required implementing a derived pose graph representation of a particle filter mapping result. Thus, the results in this section do not contain any edge information, as the particles in the particle filter only contain trajectory information, i.e., global pose estimates for the individual sensor observations.
5.1.1
Transmission
There are various ways to transmit updates to both the occupancy grid and the pose graph. Several message types are defined below in order to formalize their bandwidth requirements. The most efficient way to update a grid map is to send partial updates, either as a list of changed grid cells with their new values, or as a complete subset of the map defined by a bounding box. In the following, these are named CellList and BoundingBox, respectively. To be as efficient as possible, BestGrid dynamically chooses the better of CellList and BoundingBox for each update, since each message excels in different scenarios. If many cells close to each other change, BoundingBox may be best. Otherwise, CellList will probably require less bandwidth. Specifically, the message types require the following amount of memory:
1. CellList: Transmits $(x_c, y_c, v_c)^{N_c}$, which requires $3 \cdot N_c \cdot 4$ bytes. Let CellList($N_c$) denote this size.
2. BoundingBox: Transmits $(x_1, y_1, x_2, y_2, v_c^{(x_2 - x_1)(y_2 - y_1)})$, which requires $(4 + N_b) \cdot 4$ bytes, where $N_b = (x_2 - x_1)(y_2 - y_1)$. Let BoundingBox($N_b$) denote this size.
3. BestGrid: Let BestGrid($N_c$, $N_b$) = min(CellList($N_c$), BoundingBox($N_b$)),
with the number of changed cells $N_c$ and the number of cells in a bounding box $N_b$. The above assumes that 4-byte floats or ints are used to represent the data.
For the pose graph, the plain laser range scan is sent, along with the pose of the sensor when the scan was taken. This corresponds to the full vertex information v = (x, z). If the current particle changes, the updated poses of all previously sent laser range scans are transmitted, i.e., the list $x_{1:t}$. Since both messages are necessary to transmit the simplified pose graph completely, they will be jointly named PoseGraph. It is important to notice that the regular update message for PoseGraph has constant size, since it depends only on the physical resolution of the sensor. Again assuming 4-byte floats or ints, the message will consist of (3 + 181) · 4 = 736 or (3 + 361) · 4 = 1456 bytes for a laser range finder with 181 or 361 beams, respectively. In case the poses of previous observations are included, the message requires $(3 + N_l + 3 \cdot N_s) \cdot 4$ bytes, with $N_l$ being the number of beams of the laser sensor and $N_s$ being the number of previous scans. Let PoseGraph($N_l$, $N_s$) denote this size. Note that $N_l$ depends on the sensor and is here set to a constant value of $N_l = 181$. The three parameters $N_c$, $N_b$, $N_s$ and their implications are briefly discussed in the next section.
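The message-size formulas above translate directly into a few helper functions, which makes it easy to compare the grid and pose graph messages for concrete update patterns. This is a sketch under the stated 4-byte assumption; the function names are illustrative and the bounding box is passed by its corner coordinates.

```python
BYTES_PER_VALUE = 4  # 4-byte floats or ints, as assumed in the text


def cell_list_size(n_c: int) -> int:
    """CellList: one (x_c, y_c, v_c) triple per changed cell."""
    return 3 * n_c * BYTES_PER_VALUE


def bounding_box_size(x1: int, y1: int, x2: int, y2: int) -> int:
    """BoundingBox: the two corners plus every cell value inside the box."""
    n_b = (x2 - x1) * (y2 - y1)
    return (4 + n_b) * BYTES_PER_VALUE


def best_grid_size(n_c: int, box: tuple) -> int:
    """BestGrid: whichever of the two grid messages is smaller for this update."""
    return min(cell_list_size(n_c), bounding_box_size(*box))


def pose_graph_size(n_l: int, n_s: int = 0) -> int:
    """PoseGraph: sensor pose (3 values), N_l ranges, and 3 values per re-sent pose."""
    return (3 + n_l + 3 * n_s) * BYTES_PER_VALUE


print(pose_graph_size(181), pose_graph_size(361))  # 736 1456
print(best_grid_size(50, (0, 0, 10, 10)))           # 416: the 10x10 box beats 600 bytes of CellList
print(pose_graph_size(181, n_s=100))                # 1936
```

The regular PoseGraph update (`n_s = 0`) is independent of the map size, whereas both grid messages grow with the number of changed cells.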
5.1.2
Analysis
To properly compare the two map representations, the corner cases as well as the best and worst cases need to be analyzed. First, the globally best and worst cases are considered. For both representations, the best case is to have no particle changes at all, that is, the initial estimate is always the correct one. In this case, only incremental updates are needed, for which the least amount of data has to be sent. In practice, Nc, Nb, and Ns are all minimized. In the worst case, each map update causes the best particle to change. This applies to both representations as well. A particle change would mean sending the entire map for the occupancy grid, and sending the whole path along with the current laser range scan for the pose graph. This would result in the largest possible messages being sent for every update. In terms of the previously identified parameters, all of (Nc, Nb, Ns) are maximized. However, these scenarios are not very informative for a comparison between the two. Much more interesting are the cases where one representation requires significantly less data to be sent than the other, i.e. BestGrid(Nc , Nb ) N then 34 sort X by joint probability of assigned vertex poses per hypothesis; 35 truncate X to contain only the N most probable elements; 36 end 37 end
off time and memory requirements versus accuracy, only the most probable N assignment sets are kept after each iteration. The algorithm is deterministic: it produces the same sets of global pose assignments given the same parameters. This is especially useful, as it allows for fast and consistent filtering of congruent pose estimates in the MoG pose graph. Only the most probable global pose assignment set, i.e. the best-ranked set after algorithm 3, is used to select the component of each edge mixture for further optimization. For each edge, the pose difference $t_i^j = x_j \ominus x_i$ between the two connected vertices vi and vj is computed from their assigned global poses in this set. Then the component that assigns the highest weighted probability to the pose difference (its net contribution to the full mixture probability) is chosen. Concretely, the selected component is

$$m^* = \operatorname*{argmax}_m \left[ \pi_m \, \mathcal{N}(t_i^j \mid \mu_m, \Sigma_m) \right] \qquad (6.2)$$

or equivalently

$$m^* = \operatorname*{argmax}_m \left[ 2 \ln \frac{\pi_m}{\sqrt{|2\pi\Sigma_m|}} - (t_i^j \ominus \mu_m)^T \, \Sigma_m^{-1} \, (t_i^j \ominus \mu_m) \right] \qquad (6.3)$$
This method is denoted as Prefilter. Other approaches were also considered and tested, including:
• Random selection of components during each iteration
• Greedy selection of the most probable components during each iteration (as in Olson and Agarwal [124])
• Greedy selection of the closest components based on the residual during each iteration
• Random selection of components, subsequent optimization, and finally reporting the best result (RANSAC-like)
However, each of these more heuristic methods performed poorly, especially in comparison to the above Prefilter method, and they are thus not included in the discussion here due to space constraints.
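The component selection rule of Eq. (6.3) can be illustrated with a short NumPy sketch for 2D poses (x, y, θ). It assumes that ⊖ is the usual SE(2) relative pose operator (the first pose expressed in the frame of the second); the function names are hypothetical and this is not the thesis implementation.

```python
import numpy as np


def ominus(a, b):
    """SE(2) pose difference a ⊖ b: pose a expressed in the frame of pose b."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    c, s = np.cos(b[2]), np.sin(b[2])
    dth = np.arctan2(np.sin(a[2] - b[2]), np.cos(a[2] - b[2]))
    return np.array([c * dx + s * dy, -s * dx + c * dy, dth])


def select_component(t, weights, means, covs):
    """Index m* maximizing 2 ln(pi_m / sqrt|2 pi Sigma_m|) - r^T Sigma_m^-1 r, with r = t ⊖ mu_m."""
    scores = []
    for pi_m, mu, cov in zip(weights, means, covs):
        cov = np.asarray(cov, dtype=float)
        r = ominus(t, mu)
        _, logdet = np.linalg.slogdet(2.0 * np.pi * cov)
        scores.append(2.0 * np.log(pi_m) - logdet - r @ np.linalg.solve(cov, r))
    return int(np.argmax(scores))


means = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
covs = [np.eye(3), np.eye(3)]
print(select_component((4.8, 0.1, 0.0), [0.7, 0.3], means, covs))  # 1: the residual dominates
print(select_component((2.5, 0.0, 0.0), [0.7, 0.3], means, covs))  # 0: equal residuals, weight decides
```

Working in log space with `slogdet` avoids underflow of the Gaussian density for components far from the pose difference, while selecting the same component as Eq. (6.2).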
6.3.3
Generalized Prefilter for Hypergraphs
The extension of the original Prefilter algorithm of the previous section towards handling hyperedges in addition to multimodal ones is rather straightforward. Algorithm 4 shows the pseudocode for the Prefilter method extended to hypergraphs. The main difference from the original Prefilter is that choosing which hypercomponent to follow changes the underlying graph topology for each sample. Therefore, the state of the whole minimum spanning tree traversal has to be kept associated with the corresponding pose sample in a list of traversal states T. For simplicity, each MoG component also gives rise to a new traversal state instance, even though MoG components do not change the graph topology and some data is duplicated. This simplification also allows straightforward parallelization of the algorithm for large values of N and complex graphs. However, the implementation used in the experiments is single-threaded to allow a fair comparison.
6. Optimization Methods for the Generalized Graph SLAM Framework
Algorithm 4: The Prefilter algorithm.
Input: MoG Hyper PoseGraph G, maximum number of hypotheses N
Output: X: a set of N sets of vertex poses X = {xi}
1  initialize an empty list T of traversal states;
2  let t be a traversal state;
3  t.X = {x1};
4  t.Vused = {v1};
5  t.Eused = ∅;
6  initialize priority queue t.P to sort by edgeweight(e);
7  for all adjacent edges e of v1 do
8      enqueue(t.P, (v1, e));
9      t.Eused = t.Eused ∪ {e};
10 end
11 append t to T;
12 while ∃t ∈ T : ¬empty(t.P) ∧ |t.Vused| < |V| do
13     for ∀t ∈ T : ¬empty(t.P) ∧ |t.Vused| < |V| do
14         (v, e) = dequeue(t.P);
15         if v = e.vstart then
16             for every hyperedge component j do
17                 ExpandMultimodal(T, t, v, vj, cj);
18             end
19         else
20             let j be the hyperedge component of e where vj = v;
21             ExpandMultimodal(T, t, vj, v, invert(cj));
22         end
23         if ∑_{j=1}^{N} e.πj = 1 then
24             remove t from T;
25         end
26     end
27     if |T| > N then
28         sort T by joint probability of assigned vertex poses X of each element;
29         truncate T to contain only the N most probable elements;
30     end
31 end
32 X = ⋃_{t∈T} t.X;
Note that the null hypothesis is never directly referenced in the algorithm. By design, keeping an unmodified version of the current traversal state t in the list T corresponds to the case where the current edge e is not used, i.e. where the null hypothesis is chosen. This works because edges are marked as used when they are enqueued in the priority queues, and dequeueing an edge from t without using it effectively deletes it from the graph topology for t. Furthermore, calling ExpandMultimodal(. . . ) does not change the passed current traversal state; it only generates new traversal states corresponding to all modes. This means that by line 23 in algorithm 4, the current traversal state t is unchanged and thus corresponds to the null hypothesis. Line 23 checks whether the null hypothesis is admissible by checking its weight: if the component weights of e sum to one, the null hypothesis has zero weight, and t is removed from T, so that the null hypothesis is not followed. The final set of sorted vertex poses X can be used to select not only components from a MoG, as described in the previous section, but also hyperedge components in the same way.

Algorithm 5: edgeweight(e)
Input: MoG Hyper PoseGraph edge e ∈ E
Output: computed edge weight ω
ω = 0;
for all constraints cj in e do
    ω = ω + Mk;
end
return ω;

Algorithm 6: ExpandMultimodal(T, t, v, vnext, c)
Input: List of traversal states T, current traversal state t, current vertex v, next vertex vnext, multimodal constraint c
Output: Modified list of traversal states T
for every multimodal component m in c do
    make a new traversal state tnew as a copy of t;
    x = pose of v in t.X;
    tnew.X = tnew.X ∪ {x ⊕ µm};
    tnew.Vused = tnew.Vused ∪ {vnext};
    for every edge eadj adjacent to vnext that is not in tnew.Eused do
        enqueue(tnew.P, (vnext, eadj));
        tnew.Eused = tnew.Eused ∪ {eadj};
    end
    append tnew to T;
end
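The core of algorithm 6, copying the traversal state once per mixture component and composing the pose with ⊕, can be sketched in Python as below. This is a simplified illustration under several assumptions: poses are SE(2) tuples (x, y, θ), the priority queue t.P is replaced by a plain list, the graph is encoded as a hypothetical map from vertex to adjacent edge ids, and no probabilities are tracked.

```python
import copy
import math
from dataclasses import dataclass, field


def oplus(a, b):
    """SE(2) composition a ⊕ b: apply relative pose b in the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + math.cos(at) * bx - math.sin(at) * by,
            ay + math.sin(at) * bx + math.cos(at) * by,
            math.atan2(math.sin(at + bt), math.cos(at + bt)))


@dataclass
class TraversalState:
    poses: dict = field(default_factory=dict)        # t.X: vertex -> global pose
    used_vertices: set = field(default_factory=set)  # t.Vused
    used_edges: set = field(default_factory=set)     # t.Eused
    queue: list = field(default_factory=list)        # t.P, a plain list in this sketch


def expand_multimodal(states, t, v, v_next, means, adjacent_edges):
    """Append one new traversal state per component mean mu_m (cf. Algorithm 6).

    The passed state t is left unchanged, so it can stand for the null hypothesis."""
    for mu in means:
        t_new = copy.deepcopy(t)
        t_new.poses[v_next] = oplus(t.poses[v], mu)
        t_new.used_vertices.add(v_next)
        for e in adjacent_edges.get(v_next, []):
            if e not in t_new.used_edges:
                t_new.queue.append((v_next, e))
                t_new.used_edges.add(e)
        states.append(t_new)
    return states


root = TraversalState(poses={0: (0.0, 0.0, 0.0)}, used_vertices={0}, used_edges={"e01"})
states = expand_multimodal([root], root, 0, 1,
                           [(1.0, 0.0, math.pi / 2), (0.0, 2.0, 0.0)],
                           {1: ["e01", "e12"]})
print(len(states))      # 3: the unchanged root plus one state per mode
print(states[1].queue)  # [(1, 'e12')]
```

Keeping the unmodified root in the list mirrors the null hypothesis handling described above: it simply remains in T unless line 23 removes it.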
6.3.4
Optimizing the resulting unimodal pose graph
For the second step of optimizing the resulting unimodal pose graph, three state-of-the-art methods are used. First of all, an improved version of Olson's Stochastic Gradient Descent (SGD) [118] as implemented in the popular TORO library [68] is used. An important detail of this implementation, and of SGD in general, is that it uses a relative parameterization of the vertex poses, such that any vertex pose is computed as the sum of relative pose increments from the start of the trajectory to itself. In particular, TORO uses a spanning tree for this relative parameterization in order to make better use of the underlying graph structure. Only the pose increments of each vertex relative to its parent in the spanning tree are changed during optimization. This allows good global convergence, because small changes to the relative increments can change the global vertex poses significantly, so SGD is less likely to get stuck in local minima. However, the optimization may also oscillate significantly, which is managed by a step size schedule that reduces the step length every iteration. Additionally, for speed, a diagonal approximation of the Hessian is used in SGD, which may reduce convergence speed and accuracy.
Furthermore, a sparse Levenberg-Marquardt (LM) method as implemented in the g2o library [98] is employed. LM is a very well-known least-squares method and is guaranteed to converge, but only to the nearest optimum. The method cannot jump out of one basin of convergence into another, as SGD can, which makes it even more important to find a good initial guess. In contrast to SGD, LM uses the full Hessian information, and thus also the off-diagonal elements of the constraint covariances, which is expected to make it converge faster. The g2o library implements a robust least-squares formulation, namely iteratively reweighted least squares (IRLS), and is thus the only method usable for the Multi-Edge approach. In this method, each constraint is assigned a weight in each iteration depending on the current residual. The g2o library utilizes the Huber cost function [86], which is quadratic close to zero and linear for larger residuals, but still smooth and convex. In theory, this should allow the Levenberg-Marquardt method to behave well even in the presence of outliers such as inconsistent constraints in the graph. The robust Levenberg-Marquardt formulation is denoted with LM in all methods below that use it as the second step, except in the Exhaustive case, where no outliers due to the incorrect selection of modes are expected. Additionally, a sparse Gauss-Newton (GN) method, also implemented in the g2o library [98], is employed. GN, similarly to LM, is a classical least-squares method that approximates the Hessian of the squared cost function using the Jacobian. Since it omits the adaptive damping of LM, each iteration is simple and very fast. Most of the other caveats of LM apply, however, such that it will not escape local minima as well as SGD does, for example. As this method is also implemented in the g2o library, it also takes advantage of the robust Huber cost function implemented there.
In the following, all Reduction methods are denoted with the corresponding prefix for the first step and the corresponding suffix for the second step. For example, Exhaustive SGD refers to exhaustive filtering as the first step for finding a unimodal replacement, combined with Stochastic Gradient Descent (SGD) as the subsequent unimodal optimization in the second step. Accordingly, Exhaustive LM is the combination of exhaustive filtering with Levenberg-Marquardt optimization, and Prefilter GN refers to the combination of the Prefilter reduction step and the Gauss-Newton optimization method.
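To make the IRLS idea with the Huber cost concrete, the following toy example estimates a 1D location from samples containing one gross outlier. It is a didactic sketch, not g2o's implementation: the threshold `k` and the sample values are illustrative, and a real pose graph solver would apply the same weights per constraint inside each LM or GN iteration.

```python
def huber_weight(r: float, k: float) -> float:
    """IRLS weight w(r) = rho'(r) / r of the Huber cost with threshold k."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def huber_cost(r: float, k: float) -> float:
    """Huber cost: quadratic for |r| <= k and linear beyond, with matching slope at k."""
    a = abs(r)
    return 0.5 * r * r if a <= k else k * (a - 0.5 * k)


def irls_location(samples, k, iterations=20):
    """Toy 1D robust estimate: repeatedly solve a weighted least-squares problem."""
    est = sum(samples) / len(samples)  # ordinary least-squares start
    for _ in range(iterations):
        w = [huber_weight(x - est, k) for x in samples]
        est = sum(wi * x for wi, x in zip(w, samples)) / sum(w)
    return est


samples = [1.0, 1.1, 0.9, 1.0, 10.0]  # one gross outlier
print(sum(samples) / len(samples))    # roughly 2.8 with plain least squares
print(irls_location(samples, k=0.5))  # about 1.125
```

Because residuals beyond `k` receive weight `k / |r|`, the outlier's influence is bounded rather than eliminated, which is why the estimate settles slightly above the inlier mean.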
Chapter 7
Results of Systematic Experiments on Local Ambiguity with Synthetic Datasets
The discussion of experimental results is split into two distinct parts. This chapter focuses on a systematic evaluation and characterization of the proposed optimization methods using synthetic MoG pose graphs to motivate and analyze the generic aspects involved in multimodal graph SLAM. The synthetic data sets have the important advantage that the amount of multimodality in the constraints can be easily controlled and varied. In particular, it will be shown that standard SLAM methods, namely traditional graph-based and particle filter SLAM methods, are challenged by ambiguous registration results. Furthermore, MoG pose graphs that model these ambiguities in constraints with multimodal distributions, together with Prefilter SGD/LM optimization, are a very promising alternative for this case.
7.1
Generation of Random Multimodal Graphs
The random MoG pose graphs used in section 7.2 for a systematic evaluation of the different methods are generated with algorithm 7. In the experiments, multimodal registration results are assumed to be an exception, i.e. most edges contain a standard Gaussian and only a few contain a mixture. In addition, the number of modes is assumed to be small. Note that the exact number of multimodal edges with their exact number of modes is provided as input to the algorithm; thus, the amount of multimodality can be exactly controlled. Algorithm 7 ensures that there is a proper mode per edge, i.e. one in-principle correct (but noisy) registration, by checking that the related spatial transformation does not pass through a wall, such that there is a line of sight between the two related vertices. The Gaussian noise for the proper first constraint per edge is initialized with covariance Σ = diag(1 + 0.05|x|, 1 + 0.05|y|, 0.01 + 0.01|θ|). So, in algorithm 7, σxy = 0.05 and σθ = 0.01, as well as δxy = 1 and δθ = 0.01. Here, x, y, and θ denote the ground truth pose of the target vertex relative to the base vertex of the constraint. If an additional constraint is added, a random weight between 0.01 and 1 is generated; this is followed by a renormalization of all weights.
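The covariance of the proper mode and the weight renormalization for added components can be sketched as follows. The parameter values are the ones quoted above; drawing the new weight uniformly from [0.01, 1] is an assumption, since the text only states the range.

```python
import math
import random

# Values quoted in the text for algorithm 7.
SIGMA_XY, SIGMA_THETA = 0.05, 0.01   # variance factors
DELTA_XY, DELTA_THETA = 1.0, 0.01    # variance offsets


def proper_mode_cov_diag(x, y, theta):
    """Diagonal of Sigma for the proper (first) component of an edge, given the
    ground-truth pose (x, y, theta) of the target relative to the base vertex."""
    return (DELTA_XY + SIGMA_XY * abs(x),
            DELTA_XY + SIGMA_XY * abs(y),
            DELTA_THETA + SIGMA_THETA * abs(theta))


def add_random_weight(weights, rng):
    """Append a random weight in [0.01, 1] (uniform here, by assumption) and renormalize."""
    new = list(weights) + [rng.uniform(0.01, 1.0)]
    total = sum(new)
    return [w / total for w in new]


rng = random.Random(42)
print(proper_mode_cov_diag(100.0, -20.0, 0.5))  # approximately (6.0, 2.0, 0.015)
weights = add_random_weight([1.0], rng)
print(weights, math.fsum(weights))
```

The variance grows linearly with the length of the relative transformation, so longer edges are treated as less certain, while the offsets keep even very short edges from becoming overconfident.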
7. Results of Systematic Experiments on Local Ambiguity with Synthetic Datasets
(a) Max SGD
(b) Prefilter SGD
(c) Max LM
(d) Prefilter LM
(e) Particle (10k)
(f) Trust-Region Newton
Figure 7.1: An example multimodal pose graph of complexity C(G) = 4, i.e. it contains four two-component mixtures, while all other edge distributions are unimodal Gaussians. Edges that are assigned a low-probability transformation are shown in dark grey/magenta. The ground truth is shown in light grey in the background. Note that the left column represents the state of the art up to 2012.
Algorithm 7: Algorithm to generate the multimodal graphs
Input: Minimum and maximum vertex distances d− and d+
Input: Number of vertices and edges NV and NE
Input: For each number of components m = 2...M, the number of desired edges containing this number of components Nm
Input: Translation variance factor σxy and rotation variance factor σθ
Input: Translation variance offset δxy and rotation variance offset δθ
Output: MoG Pose Graph G
ex = (1, 0, 0)
ey = (0, 1, 0)
eθ = (0, 0, 1)
while |V| < NV do
    Sample a pose x in free space with random orientation
    if ∃vi ∈ V : the distance from x to xi is between d− and d+ and the line between them does not intersect an obstacle then
        tgt = x ⊖ xi
        Σ = diag(δxy + σxy ||tgt^T ex||, δxy + σxy ||tgt^T ey||, δθ + σθ ||tgt^T eθ||)
        µ = tgt ⊕ N(0, Σ)
        vnew = (x, ·)
        V = V ∪ {vnew}
        E = E ∪ {(vi, vnew, ({1}, {µ}, {Σ}))}
    end
end
while |E| < NE do
    Select random vertex vi ∈ V
    for all vertices vj not connected to vi do
        if the distance from xi to xj is not between d− and d+ or there is no line of sight between the two then
            continue
        end
        tgt = xj ⊖ xi
        Σ = diag(δxy + σxy ||tgt^T ex||, δxy + σxy ||tgt^T ey||, δθ + σθ ||tgt^T eθ||)
        µ = tgt ⊕ N(0, Σ)
        E = E ∪ {(vi, vj, ({1}, {µ}, {Σ}))}
    end
end
for ∀m = 2...M do
    for ∀n = 1...Nm do
        Select random edge ei ∈ E which has only one component
        (va, vb, c) = ei
        (π, µ, Σ) = c
        while |c| < m do
            Sample a pose x in free space with random orientation
            if the distance from xa to x is not between d− and d+ or there is no line of sight between them then
                continue
            end
            µ = x ⊖ xa
            Σ = diag(σxy ||µ^T ex||, σxy ||µ^T ey||, σθ ||µ^T eθ||)
            c = c ∪ {(πnew, µ, Σ)} with a random weight πnew between 0.01 and 1, followed by renormalizing all weights of c
        end
    end
end
Furthermore, the algorithm ensures that the edges have a reasonable length, i.e. that no registrations are assumed between far-away places. All 110 synthetic graphs are sampled from the environment shown in Fig. 7.1. This environment spans an area of 1300 by 900 units. Each graph consists of 128 vertices and 256 edges. By design, every edge is between 75 and 230 units long, with an average length of approximately 100 units. Note that all "ground truth" modes, i.e. the ones which represent an in-principle correct but noisy spatial transformation, are generated in the same way as all other components in the mixtures. This reflects the idea of local ambiguity: any of the components in a mixture is a potential registration result.
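The edge acceptance test used by algorithm 7 (length within [d−, d+] and an unobstructed line of sight) can be sketched with a standard segment intersection test. Walls are assumed to be given as 2D line segments, and the defaults use the 75 to 230 unit range stated above as d− and d+, which is an assumption about the actual parameter values.

```python
import math


def _orient(a, b, c):
    """Sign (-1, 0, +1) of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, c):
    """True if c, known to be collinear with a-b, lies within the segment's bounding box."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """Classic orientation test for intersection (including touching) of 2D segments."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and _on_segment(q1, q2, p1)) or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1)) or (d4 == 0 and _on_segment(p1, p2, q2)))


def edge_admissible(xi, xj, walls, d_min=75.0, d_max=230.0):
    """Accept a candidate edge between poses xi and xj (x, y, theta) if its length lies in
    [d_min, d_max] and the straight line between them crosses no wall segment."""
    a, b = xi[:2], xj[:2]
    if not d_min <= math.dist(a, b) <= d_max:
        return False
    return not any(segments_intersect(a, b, w0, w1) for w0, w1 in walls)


walls = [((50.0, -10.0), (50.0, 10.0))]
print(edge_admissible((0.0, 0.0, 0.0), (100.0, 0.0, 1.0), walls))  # False: wall blocks the line
print(edge_admissible((0.0, 0.0, 0.0), (0.0, 100.0, 1.0), walls))  # True
print(edge_admissible((0.0, 0.0, 0.0), (50.0, 20.0, 0.0), []))     # False: too short
```

Only the positions matter for this test; the orientation of each pose is ignored, as in the distance and line-of-sight conditions of algorithm 7.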
7.2
Optimization Results over Increasing Levels of Complexity
Synthetic pose graphs were generated to test the presented methods in depth. A total of 110 different random pose graphs are used, broken up into 11 separate cases with varying complexity with respect to the amount of multimodality in the constraints. In all cases, multimodal constraints are assumed to be an exception, i.e. most edges contain a standard Gaussian and only a few contain a mixture. Also, the number of modes in mixtures is quite small. As will be shown, even these relatively small amounts of multimodality have significant effects. All 110 graphs are generated from the same environment, shown in Fig. 7.1. The exact algorithm used to generate the graphs is given as algorithm 7 in section 7.1. The degree of multimodality plays a significant role in the shape of the log-probability function. As this effect is mainly due to the combinatorial properties of pose composition in the graph with multiple components, the following complexity metric can be defined, where Mk denotes the number of mixture components of constraint ck:

$$C(G) = \log_2 \prod_{(v_i, v_j, c_k) \in E} M_k = \sum_{(v_i, v_j, c_k) \in E} \log_2(M_k) \qquad (7.1)$$

Condition  C(G)    #edges with X modes        MM %   Min. distance  Min. Mahalanobis
                   X=1   X=2   X=3   X=4
1          1       255   1     0     0        0.4    38.996         509.816
2          2       254   2     0     0        0.8    15.909         160.058
3          3       253   3     0     0        1.2    70.701         1917.990
4          4       252   4     0     0        1.6    29.372         428.785
5          8       248   8     0     0        3.2    21.000         416.107
6          16      240   16    0     0        6.4    5.428          9.109
7          32      224   32    0     0        12.8   12.322         134.981
8          7.92    251   0     5     0        2.0    6.957          180.949
9          8       252   0     0     4        1.6    5.419          98.743
10         15.92   244   6     5     1        4.8    25.285         362.238
11         31.85   232   12    10    2        9.6    6.052          60.956
Table 7.1: The 11 conditions used in the experiments with their different amounts of multimodal edges and their degree of multimodality C(G). The overall percentage of multimodal edges (MM %) is also shown. On the right, the minimum Euclidean and squared Mahalanobis distances between two components from the same mixture are also shown. On average, components were located 189.045 units from each other, with an average Mahalanobis distance of 20,825.7.

Method          Condition:
                1     2     3     4     5     6     7     8     9     10    11
Max LM          40%   0%    30%   40%   20%   0%    0%    20%   20%   0%    0%
Max SGD         60%   40%   0%    0%    0%    0%    0%    0%    0%    0%    0%
Particle        0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
Multi-Edge LM   40%   0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
Newton          80%   70%   60%   60%   100%  70%   100%  70%   70%   80%   80%
Prefilter LM    80%   0%    60%   60%   40%   50%   70%   70%   60%   70%   50%
Prefilter SGD   100%  100%  100%  100%  100%  100%  90%   100%  100%  100%  100%
Table 7.2: Successes by method and condition (table 7.1) in terms of the percentage of trials that produced a result within 5 times the SSExy and SSEθ error of the Exhaustive SGD method. The Particle method used 10,000 particles.
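The complexity metric of Eq. (7.1) and the corresponding number of mode combinations that an exhaustive search would have to consider can be computed from the per-edge component counts. In this sketch, `condition_edges` is a hypothetical helper that expands the "#edges with X modes" columns of Table 7.1 into one count per edge.

```python
import math


def complexity(component_counts):
    """C(G) = sum_k log2(M_k); unimodal edges (M_k = 1) contribute zero."""
    return sum(math.log2(m) for m in component_counts)


def mode_combinations(component_counts):
    """Number of joint mode assignments, prod_k M_k = 2**C(G)."""
    return math.prod(component_counts)


def condition_edges(n1, n2=0, n3=0, n4=0):
    """Per-edge component counts from the '#edges with X modes' columns of Table 7.1."""
    return [1] * n1 + [2] * n2 + [3] * n3 + [4] * n4


cond8 = condition_edges(251, 0, 5)
print(round(complexity(cond8), 2), mode_combinations(cond8))  # 7.92 243
print(round(complexity(condition_edges(232, 12, 10, 2)), 2))  # 31.85
```

The product form shows why the exhaustive approach becomes infeasible quickly: condition 11 already corresponds to roughly 2^31.85 joint mode assignments.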
For example, a graph with a single two-component edge and all other edges being unimodal would have a multimodal complexity of C(G) = 1. In condition 8, a graph with 5 three-component edges has a complexity of C(G) = log2(3^5) = 7.92. Similarly, a graph without any MoG constraints would have C(G) = 0. The different cases considered for the experiments in this thesis are shown in table 7.1. Each method described in chapter 6 was used to optimize each generated pose graph. As traditional optimization methods for comparison with the standard unimodal case, the Max SGD and Max LM methods are assumed not to know about multimodal constraints, and are thus initialized with a breadth-first assignment of poses using the maximum component for each edge. Trust-Region Newton, Prefilter SGD and Prefilter LM, as well as Multi-Edge LM, are initialized with the most probable set of global vertex poses produced by the Prefilter step described in section 6.3.2. Here, N = 200 was chosen as a good trade-off between computation speed (≈ 2 ms) and accuracy. Only the best of these pose sets was used as the initial condition for optimization and to select modes as described above. The Particle method did not need initialization, and since it is not deterministic, it was run 10 times per graph. All other methods are deterministic and thus did not need additional trials. Table 7.2 summarizes the number of trials that delivered a result within 5 times the residual of the Exhaustive SGD method, which is a canonical comparison basis, as it represents the best result achievable by all methods described in this thesis. Exhaustive LM did not perform quite as well on these graphs, most probably due to the differences between the underlying SGD and LM optimization methods (see section 6.3.4). Both methods are initialized with the same graph and starting conditions. LM is very sensitive to the starting conditions, while SGD is slightly more robust in that respect.
LM was run without the robust cost function for the Exhaustive case, since no outliers due to erroneous mode selection in the reduction step are expected there. By and large, the differences between LM and SGD in the Exhaustive case are coincidental and not of concern here. These results serve to show what is achievable with either method, and are mainly used as a comparison basis for the reduction heuristics discussed below. From this brief summary, it is clear that the standard graph-based methods Max SGD and Max LM do not work well with multimodal registration results, even if there are only very few of them. Max LM can sometimes achieve reasonable results in a few relatively simple conditions, mostly because of its robust cost function. Max SGD achieves only a few good results in the two simplest conditions. With increasing complexity, the Max SGD and Max LM methods no longer achieve any reasonable results. It is also clear that the Particle method often fails to converge, especially with highly multimodal graphs, even though an extremely large number of 10,000 particles was used. More surprising is the complete failure of the Multi-Edge LM method. It appears that instead of converging to a single mode per constraint, it converged to some configuration in between multiple modes. Thus, the final distance to ground truth is quite large. The Trust-Region Newton method performs very well, and is only outperformed by Prefilter SGD. This method is of note since it is the only one taking the full multimodal Mixture of Gaussians cost function into account. The Multi-Edge LM method also uses all components, but does not retain the component weights. The performance of the Prefilter SGD method proposed here is much better and more consistent: it achieves a very accurate result in nearly all cases. An illustrative example result of the Max SGD/LM and Prefilter SGD methods is shown in Fig. 7.1. Fig. 7.2 shows the distribution of SSE residual errors after optimization for each method by condition.
Since these distributions can be very skewed, the figures use the five-number summary (minimum, lower quartile, median, upper quartile, and maximum) to represent the performance of each method. The figure shows that the performance of Prefilter SGD is almost identical to that of Exhaustive SGD, which requires substantially more computation time. Likewise, Exhaustive LM and Prefilter LM converge to nearly the same result. The remaining difference between the final results is caused by the differences between the underlying SGD and LM optimization methods, not by the quality of the Prefilter approach. The Trust-Region Newton method converges to very good results as well, usually similar to those of the LM-based methods above, probably because it is closely related to LM. Unfortunately, the conceptually interesting Multi-Edge LM method yields results as poor as those of the Max SGD/LM methods: it performs well on the lower-complexity graphs, but diverges significantly from a complexity of 8 (condition 5) upward. The high similarity of the results achieved by the Prefilter and Exhaustive methods, when used with the same underlying optimization method, shows that the Prefilter step described in section 6.3.2 closely approximates the combination of modes used by the Exhaustive method in significantly less time. While the exhaustive search requires time exponential in the graph complexity, i.e. O(2^{C(G)}), the filtering is done in O(|V|). Since the optimal combination is only approximated, the filtering can report a wrong assignment. This happened, for example, in condition 7, where Prefilter SGD in particular produced a larger variance of results (see both Table 7.2 and Fig. 7.2). The Particle method did not consistently converge to a good result, even with 10,000 particles.
It appears that the globally best particles are assigned too little probability early in the filtering process and are thus discarded during the resampling phase.
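To make the exponential cost of the Exhaustive methods concrete, the following sketch enumerates the mode combinations they have to optimize. It assumes, consistent with the O(2^{C(G)}) bound above, that the graph complexity C(G) is the base-2 logarithm of the number of mode combinations, i.e. the sum of log2 of the number of modes per edge; the function names and the example graph are hypothetical.

```python
import itertools
import math


def graph_complexity(modes_per_edge):
    """C(G) as log2 of the number of mode combinations (assumed definition)."""
    return sum(math.log2(k) for k in modes_per_edge)


def mode_combinations(modes_per_edge):
    """Yield one tuple of chosen mode indices per combination.

    An Exhaustive method runs one unimodal optimization (SGD or LM)
    per yielded tuple, which is why its runtime grows as 2**C(G).
    """
    return itertools.product(*(range(k) for k in modes_per_edge))


# Hypothetical graph: three bimodal edges and one edge with three modes.
modes = [2, 2, 2, 3]
combos = list(mode_combinations(modes))
print(len(combos))                          # 24
print(round(2 ** graph_complexity(modes)))  # 24
```

Each tuple corresponds to one unimodal pose graph, so the number of full optimizations doubles with every unit of C(G), whereas the Prefilter step makes a single selection in a pass that is linear in the number of vertices.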
7.2 Optimization Results over Increasing Levels of Complexity
[Figure 7.2 plot. Top panel: SSExy; bottom panel: SSEθ; both on a log scale versus complexity condition 1 to 11. Series: Particle (10k), Max LM, Max SGD, Multi-Edge LM, Prefilter LM, Prefilter SGD, Exhaustive LM, Exhaustive SGD, Trust-Region Newton.]
Figure 7.2: Performance of each method by condition. Reported values are the minimum, lower quartile, median, upper quartile, and maximum of the residual SSE errors relative to ground truth; smaller is better. The marker location indicates the median, a vertical line is drawn between the minimum and maximum, and horizontal ticks indicate the quartiles. In some cases, the median marker may occlude the quartile ticks. Note the log scale on the y axis. SSExy is shown in the top graph, SSEθ in the bottom graph.
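The five-number summary used in these plots can be computed as in the following sketch. It assumes NumPy's default linear interpolation for the quartiles, since the exact quartile convention is not stated; the residual values are made up for illustration.

```python
import numpy as np


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Quartiles use NumPy's default linear interpolation (an assumption).
    """
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    return float(v.min()), float(q1), float(med), float(q3), float(v.max())


# Hypothetical SSE residuals of ten trials; one trial failed badly.
sse = [0.8, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 250.0, 0.85, 1.15]
print(five_number_summary(sse))
```

The single failed trial (250.0) sets the maximum but barely moves the median and quartiles, which is why this summary represents skewed residual distributions better than a mean and standard deviation would.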
However, increasing the number of particles even further did not yield much improvement, and neither did increasing or lowering the minimum effective particle count Nmin in algorithm 2. Another interesting observation is that the residual error to ground truth of the Max SGD and Max LM methods, as shown in Fig. 7.2, actually increases exponentially with the graph complexity C(G) (note the log scale): the more complex the graph, the worse the approximation that Max SGD/LM relies on. Table 7.3 shows the measured runtimes by method and condition. The Max SGD/LM, Multi-Edge LM, and Prefilter SGD/LM methods have nearly constant runtimes, as their computational cost is dominated by the underlying Stochastic Gradient Descent or Levenberg-Marquardt implementation, which in turn depends only on the size of the graph. The Trust-Region Newton method is the slowest, mainly because of its prototype implementation in Matlab. A large part of its processing time is spent evaluating the MoG Pose Graph cost function in each iteration. Since the number of iterations until convergence varies, this factor also makes the standard deviation of the runtime especially large. We believe that an efficient C++ implementation would not be slower than the other methods, as the same sparsity patterns can be exploited. The filtering of components requires only around one millisecond for the graph sizes studied here, which adds very little computation time to the Prefilter SGD/LM methods. Some runtimes of Exhaustive SGD and Exhaustive LM were not recorded, since the exponential factor would have made the experiment infeasible; for example, individual trials in conditions 7 and 11 would have taken almost 10 years to complete. However, because the ground truth is known, it was still possible to compute the ideal results by considering only the correct combination.
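As a quick check of the "almost 10 years" figure, the estimated Exhaustive runtimes for condition 7 in Table 7.3, which are given there as powers of ten in seconds, can be converted to years. The sketch assumes a 365.25-day year.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year (assumption)


def seconds_to_years(seconds):
    """Convert a duration in seconds to years."""
    return seconds / SECONDS_PER_YEAR


# Estimated condition-7 runtimes from Table 7.3, as log10(seconds).
estimates = {"Exhaustive LM": 8.5, "Exhaustive SGD": 8.2}
for method, log10_seconds in estimates.items():
    years = seconds_to_years(10 ** log10_seconds)
    print(f"{method}: about {years:.1f} years")
```

This prints about 10.0 years for Exhaustive LM and about 5.0 years for Exhaustive SGD, which agrees with the order of magnitude stated in the text.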
Table 7.4 shows a few additional results that give some idea of the asymptotic behavior of Prefilter SGD. More graphs of the same size as those described in Table 7.1 were generated, with up to 11 modes per edge and a complexity of up to C(G) = 512. In the most complex graph, only a single edge was left with a unimodal Gaussian constraint. The table shows the predicted combinatorial explosion.
7.2.1 The Influence of Good Odometry Estimates
We believe that good odometry estimates are the single most important factor that has allowed traditional methods to succeed so far. To substantiate this claim, and also to compare with the results achieved by Stachniss et al. [166], a number of trials were run in which odometry estimates were fused with the Mixture of Gaussian pose graphs from the previous results. Odometry at two separate noise levels was used to illustrate its effect in the context of the Particle method. The odometry noise added to the ground truth transformation was computed as follows:

$$\sigma_{xy} = v_{xy} \cdot d \qquad (7.2)$$

$$\sigma_\theta = v_\theta \cdot \max(0.4\pi,\; a + 0.00001\, d) \qquad (7.3)$$

$$\mathrm{odo}_{ji} = x_j \ominus x_i \oplus \mathcal{N}\big(0, \operatorname{diag}(\sigma_{xy}, \sigma_{xy}, \sigma_\theta)\big) \qquad (7.4)$$
where d is the ground truth distance and a is the absolute ground truth rotation angle. In the first noise level (Level 1), v_xy = 0.01 and v_θ = 0.002. In the second noise level,
Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Exhaustive LM | 0.0333 σ0.0056 | 0.1447 σ0.0969 | 0.2544 σ0.1393 | 0.4504 σ0.2660 | 18.5471 σ8.5286 | ≈ 10^3.7 | ≈ 10^8.5 | 18.0294 σ8.7641 | 12.4474 σ8.7379 | ≈ 10^3.7 | ≈ 10^8.4
Exhaustive SGD | 0.0659 σ0.0022 | 0.1390 σ0.0025 | 0.2755 σ0.0091 | 0.5494 σ0.0153 | 8.9363 σ0.2580 | ≈ 10^3.3 | ≈ 10^8.2 | 4.1744 σ4.1132 | 4.4003 σ4.3367 | ≈ 10^3.7 | ≈ 10^8.4
Max LM | 0.0262 σ0.0112 | 0.0350 σ0.0174 | 0.0553 σ0.0568 | 0.0475 σ0.0380 | 0.0227 σ0.0058 | 0.0376 σ0.0352 | 0.0719 σ0.0274 | 0.0202 σ0.0079 | 0.0465 σ0.0352 | 0.0738 σ0.0500 | 0.0287 σ0.0166
Max SGD | 0.0298 σ0.0010 | 0.0322 σ0.0031 | 0.0303 σ0.0009 | 0.0310 σ0.0023 | 0.0303 σ0.0013 | 0.0297 σ0.0012 | 0.0298 σ0.0009 | 0.0310 σ0.0021 | 0.0305 σ0.0017 | 0.0306 σ0.0005 | 0.0313 σ0.0016
M-E LM | 0.0452 σ0.0307 | 0.0306 σ0.0208 | 0.0522 σ0.0328 | 0.0691 σ0.0533 | 0.0762 σ0.0416 | 0.0533 σ0.0435 | 0.0423 σ0.0304 | 0.0534 σ0.0293 | 0.0790 σ0.0316 | 0.0402 σ0.0330 | 0.0295 σ0.0209
Particle | 2.2399 σ0.0591 | 2.2329 σ0.0466 | 2.5432 σ0.4449 | 2.4625 σ0.3710 | 2.8486 σ0.4083 | 2.6239 σ0.4432 | 2.8482 σ0.3115 | 2.5284 σ0.3911 | 2.8559 σ0.3266 | 2.7521 σ0.3410 | 2.5556 σ0.3424
Newton | 9.5689 σ11.8021 | 24.9870 σ32.4403 | 3.2163 σ1.6373 | 1.3982 σ0.1244 | 4.9639 σ3.3042 | 2.7607 σ0.8787 | 12.4078 σ13.3939 | 18.7410 σ23.2703 | 2.5514 σ1.0478 | 2.7327 σ1.2642 | 2.4349 σ1.1637
Prefilter LM | 0.0231 σ0.0083 | 0.0224 σ0.0072 | 0.0269 σ0.0081 | 0.0356 σ0.0138 | 0.0245 σ0.0098 | 0.0322 σ0.0084 | 0.0354 σ0.0111 | 0.0205 σ0.0070 | 0.0385 σ0.0167 | 0.0289 σ0.0063 | 0.0276 σ0.0076
Prefilter SGD | 0.0392 σ0.0262 | 0.0299 σ0.0005 | 0.0309 σ0.0021 | 0.0309 σ0.0023 | 0.0301 σ0.0012 | 0.0308 σ0.0027 | 0.0305 σ0.0021 | 0.0298 σ0.0008 | 0.0304 σ0.0014 | 0.0304 σ0.0015 | 0.0316 σ0.0017
Table 7.3: Runtimes (means and standard deviations) in seconds by method and condition (columns 1 to 11). The experiments were run on an Intel Core i7-2720QM, 2.2GHz, 8GB RAM. All methods were implemented in C++ except Newton, which was implemented in Matlab. The Particle method used 10,000 particles. M-E LM stands for Multi-Edge LM.
C(G) | Runtime (s) | Successes | SSExy | SSEθ
32 | 0.03 | 100% | 51.07 σ16.86 | 0.00 σ0.00
64 | 0.05 | 80% | 535.46 σ840.28 | 0.03 σ0.07
128 | 0.93 | 10% | 3250.20 σ3654.21 | 0.08 σ0.05
256 | 44.32 | 20% | 5255.02 σ5607.53 | 0.22 σ0.17
512 | 232.22 | 0% | 155257.52 σ348952.17 | 0.91 σ1.34
Table 7.4: Additional trials of Prefilter SGD to investigate its asymptotic behavior. Note the exponential complexity. The mean runtime is shown in seconds. Means and standard deviations of the final SSE error are reported. As in Table 7.2, the success rate is the percentage of trials whose final SSE error is within 5 times that of Exhaustive SGD.
v_xy = 0.04 and v_θ = 0.004. For each noise level, one odometry estimate per edge was generated and fused with the corresponding edge, using the measurement update equations of the Gaussian Sum Filter [1, p.214]. As usual, the mean and covariance of a single component m are updated as follows:

$$\mu_m = \bar{\mu}_m + K_m(\mu_{odo} - \bar{\mu}_m) \qquad (7.5)$$

$$\Sigma_m = \bar{\Sigma}_m - K_m \bar{\Sigma}_m \qquad (7.6)$$

$$K_m = \bar{\Sigma}_m(\bar{\Sigma}_m + \Sigma_{odo})^{-1} \qquad (7.7)$$

This is done for each component. Finally, the fused weights are

$$\pi_m \propto \bar{\pi}_m\, p(\bar{\mu}_m \mid \mu_{odo}, \bar{\Sigma}_m + \Sigma_{odo}) \qquad (7.8)$$

where μ̄_m, Σ̄_m, and π̄_m are the component parameters before fusion. As a result of the weight update, components that are far away from the odometry estimate receive significantly lower weights. In the results described in section 7.2.1, a component is dropped from the fused mixture if its weight falls below 0.005 after normalization. After one or more components have been dropped, the remaining weights are normalized again. The approximation error can be made arbitrarily small by selecting a lower threshold. Figure 7.3 shows the performance of the Particle method on these fused graphs, as well as on the unfused ones for comparison. For the first noise level, 90 particles were used, and 160 for the second. Fewer particles were needed with odometry than without, since the fusion process severely discounted modes far away from the odometry estimate and thus reduced multimodality. These results are comparable with those of Stachniss et al. [166]. Clearly, the Particle method does not converge to a good solution without odometry information, and in fact shows a very large variance across all computed solutions. With fused odometry, the Particle method shows a much reduced variance and better results, especially in the rotation error, as expected from the corrective effect of odometry. Note that the recovered trajectory was of good quality in the cases with odometry while using a reasonable number of particles, which replicates and validates the findings of Stachniss et al. [166]. Additionally, it is clear from the data that
[Figure 7.3 plot. Top panel: SSExy; bottom panel: SSEθ; both on a log scale versus complexity condition 1 to 11. Series: Odometry Level 1 (90), Odometry Level 2 (160), No Odometry Information (10k).]
Figure 7.3: Performance of the Particle method by condition for different levels of fused odometry. Reported values are the minimum, lower quartile, median, upper quartile, and maximum. The marker location indicates the median, a line is drawn between the minimum and maximum, and line ticks indicate the quartiles. Note the log scale on the y axis. SSExy is shown in the top graph, SSEθ in the bottom graph. Also, note the different numbers of particles used for the three cases (90, 160, and 10,000).
[Figure 7.4 plot. Top panel: SSExy; bottom panel: SSEθ; both on a log scale versus complexity condition 1 to 11. Series: Max LM and Prefilter LM, each with Level 1 odometry, Level 2 odometry, and no odometry.]
Figure 7.4: Performance of the Max LM and Prefilter LM methods by condition for different levels of fused odometry. Reported values are the minimum, lower quartile, median, upper quartile, and maximum. The marker location indicates the median, a line is drawn between the minimum and maximum, and line ticks indicate the quartiles. Note the log scale on the y axis. SSExy is shown in the top graph, SSEθ in the bottom graph.
[Figure 7.5 plot. Top panel: SSExy; bottom panel: SSEθ; both on a log scale versus complexity condition 1 to 11. Series: Max SGD and Prefilter SGD, each with Level 1 odometry, Level 2 odometry, and no odometry.]
Figure 7.5: Performance of the Max SGD and Prefilter SGD methods by condition for different levels of fused odometry. Reported values are the minimum, lower quartile, median, upper quartile, and maximum. The marker location indicates the median, a line is drawn between the minimum and maximum, and line ticks indicate the quartiles. Note the log scale on the y axis. SSExy is shown in the top graph, SSEθ in the bottom graph.
Odometry (particles) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Level 1 (90) | 0.0197 σ0.0013 | 0.0195 σ0.0007 | 0.0199 σ0.0011 | 0.0199 σ0.0008 | 0.0197 σ0.0009 | 0.0202 σ0.0010 | 0.0202 σ0.0008 | 0.0195 σ0.0008 | 0.0198 σ0.0009 | 0.0198 σ0.0010 | 0.0199 σ0.0009
Level 2 (160) | 0.0335 σ0.0014 | 0.0331 σ0.0013 | 0.0333 σ0.0013 | 0.0332 σ0.0011 | 0.0335 σ0.0014 | 0.0340 σ0.0017 | 0.0336 σ0.0013 | 0.0337 σ0.0015 | 0.0336 σ0.0014 | 0.0336 σ0.0017 | 0.0335 σ0.0017
None (10k) | 2.2399 σ0.0591 | 2.2329 σ0.0466 | 2.5432 σ0.4449 | 2.4625 σ0.3710 | 2.8486 σ0.4083 | 2.6239 σ0.4432 | 2.8482 σ0.3115 | 2.5284 σ0.3911 | 2.8559 σ0.3266 | 2.7521 σ0.3410 | 2.5556 σ0.3424
Table 7.5: Runtimes (means and standard deviations) in seconds for the Particle method with different levels of odometry applied to the initial pose graph, by condition (columns 1 to 11). The experiments were run on an Intel Core i7-2720QM, 2.2GHz, 8GB RAM.
Method | Odometry | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Particle (90/160/10k) | L1 | 80% | 100% | 80% | 90% | 80% | 60% | 100% | 90% | 90% | 90% | 70%
Particle (90/160/10k) | L2 | 80% | 30% | 50% | 60% | 50% | 70% | 40% | 80% | 30% | 70% | 10%
Particle (90/160/10k) | None | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Max LM | L1 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Max LM | L2 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Max LM | None | 40% | 0% | 30% | 40% | 20% | 0% | 0% | 20% | 20% | 0% | 0%
Prefilter LM | L1 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Prefilter LM | L2 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Prefilter LM | None | 80% | 0% | 60% | 60% | 40% | 50% | 70% | 70% | 60% | 70% | 50%
Max SGD | L1 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 90% | 100% | 100%
Max SGD | L2 | 90% | 100% | 90% | 80% | 80% | 100% | 90% | 100% | 80% | 90% | 100%
Max SGD | None | 60% | 40% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Prefilter SGD | L1 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 90% | 100% | 100%
Prefilter SGD | L2 | 90% | 100% | 90% | 80% | 80% | 100% | 90% | 100% | 90% | 90% | 100%
Prefilter SGD | None | 100% | 100% | 100% | 100% | 100% | 100% | 90% | 100% | 100% | 100% | 100%
Table 7.6: Successes by the Particle, Max LM, Prefilter LM, Max SGD, and Prefilter SGD methods (as in Table 7.2). Odometry information with different noise levels (L1 and L2) was merged with the pose graphs before the methods were applied. None means no odometry was fused, as with the other methods summarized in Table 7.2. The unfused results are shown for comparison.
with increasing multimodality of the graph, the performance of the particle filter decreases, even with fused odometry. Figure 7.4 shows the performance of the Max LM and Prefilter LM methods on the fused graphs. One main observation is that with odometry, the two methods perform virtually identically. The strong effect of the odometry estimate on the MoG weights, even with very noisy odometry, allows the Max LM method to choose the right component. However, as odometry noise increases, the fused component means move further away from ground truth, and both methods converge to a worse result. Much the same can be seen in Fig. 7.5, which shows the performance of the Max SGD and Prefilter SGD methods; the reweighting effect of the odometry estimates is present here as well. Table 7.5 shows the times required to run the Particle method on the fused pose graphs. The Particle method exhibits a rather constant but large time requirement, linear in the number of particles. Naturally, the lower number of particles required in the cases with odometry decreases the runtimes accordingly. Additionally, fewer components have to be sampled, depending on the fusion result. The time required by the graph-based methods remained the same as in the cases without odometry discussed above, since the size of the graphs did not change. Table 7.6 shows the success rates of the Particle, Max LM, Prefilter LM, Max SGD, and Prefilter SGD methods on the graphs with odometry information and, as a reference, on those without. In general, the results achieved with odometry of either level are much better.
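The component-wise fusion of Eqs. (7.5) to (7.8), followed by the pruning and renormalization described above, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used for these experiments: the relative pose (x, y, θ) is treated as a Euclidean vector, which is only reasonable when heading differences are small, and the example mixture and odometry covariance are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of the multivariate normal N(mean, cov) evaluated at x."""
    d = x - mean
    norm = np.sqrt((2.0 * np.pi) ** len(x) * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)


def fuse_odometry(weights, means, covs, mu_odo, cov_odo, threshold=0.005):
    """Fuse a Gaussian odometry estimate into a MoG edge constraint.

    Applies Eqs. (7.5) to (7.8) to every component, then drops components
    whose normalized weight is below `threshold` and renormalizes the rest.
    """
    fused_w, fused_mu, fused_cov = [], [], []
    for w, mu, cov in zip(weights, means, covs):
        s = cov + cov_odo                            # combined covariance
        gain = cov @ np.linalg.inv(s)                # K_m, Eq. (7.7)
        fused_mu.append(mu + gain @ (mu_odo - mu))   # Eq. (7.5)
        fused_cov.append(cov - gain @ cov)           # Eq. (7.6)
        fused_w.append(w * gaussian_pdf(mu, mu_odo, s))  # Eq. (7.8), unnormalized
    fused_w = np.array(fused_w) / np.sum(fused_w)
    keep = np.flatnonzero(fused_w >= threshold)
    kept_w = fused_w[keep] / np.sum(fused_w[keep])
    return kept_w, [fused_mu[i] for i in keep], [fused_cov[i] for i in keep]


# Hypothetical bimodal edge: one mode near the odometry estimate, one far away.
weights = [0.6, 0.4]
means = [np.array([1.0, 0.0, 0.0]), np.array([-3.0, 2.0, 1.5])]
covs = [np.diag([0.01, 0.01, 0.001])] * 2
mu_odo = np.array([1.05, 0.02, 0.01])
cov_odo = np.diag([0.04, 0.04, 0.004])

w, mus, sigmas = fuse_odometry(weights, means, covs, mu_odo, cov_odo)
print(w)       # only the component near the odometry estimate survives
print(mus[0])  # its mean is pulled slightly toward the odometry estimate
```

In this example the distant mode receives a negligible weight after Eq. (7.8) and is pruned, which mirrors the reduced multimodality that allowed the Particle method to work with far fewer particles.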
Chapter 8
Results on Local Ambiguity from Real World Datasets
8.1 Generating Mixture of Gaussian Estimates from Registration Methods
It is important to note that the specific method used to generate Mixture of Gaussian (MoG) motion estimates or registration results is separate from the general theoretical framework introduced above. Much like in the traditional unimodal case, the choice of sensors and registration method is generally irrelevant to the pose graph data structure and to the optimization method used in the SLAM system. Since traditional registration methods usually report the uncertainty of their single registration result as a linear Gaussian, some changes are needed to generate MoG constraints. Several approaches are possible. For example, any iterative method like ICP [6], which may converge to different local minima given different perturbed initial guesses, could be run repeatedly, much like the computation of the proposal distribution in [166]. Also, many registration methods already generate a ranked list of results, which can be employed in a canonical way to generate a combined MoG result. A concrete example of the second case is presented in this section, namely the extension of a plane-based registration algorithm to generate multimodal constraints. However, no registration method currently exists that provides multiple registration results as a multimodal MoG in a probabilistically correct way. One candidate solution may be 2D iFMI (see section 4.3.2), but fitting a MoG to the Dirac parameter domain with Expectation Maximization proved difficult, and a complete method was not implemented. Instead, this section extends the plane-based registration method [133] to generate good multimodal MoG constraints.
8.1.1 Mixture of Gaussian Results from Plane Registration
The plane matching algorithm introduced in [133] is based on large planar surface patches extracted from range scans. Concretely, plane matching uses an algorithm called Minimally Uncertain Maximal Consensus (MUMC) to determine the unknown plane correspondences by maximizing geometric consistency through minimization of the uncertainty volume in configuration space. These correspondences give rise to a least squares transformation estimate that respects the plane parameter uncertainties computed during plane extraction. The method also includes closed-form expressions for the covariances of the final transformation. Plane matching is a very recent method, but it has already been used successfully in several applications where only very limited overlap occurs between scans [131, 132, 134]. Nevertheless, multimodal results can also be observed here. MUMC produces a ranked list of candidate transformations in which the top-ranked result is often the correct one. As long as some parts of the data overlap, the correct transformation usually does appear in the list, but not necessarily as the highest-ranked one.

Within the main processing loop of the plane-based registration algorithm by Pathak et al., new hypotheses are created during each execution and appended to a list [133, Alg. 2]. Instead of choosing only the best result ω̄, the list W of all potential results is post-processed here to deliver a small number of good results to incorporate into multimodal constraints. The list W consists of tuples ω_i, each containing a possible registration result together with some accompanying information and metrics. In the following, the notation ω.a denotes a member a of a tuple ω. Specifically, each tuple ω in W contains the following:
1. the translation vector ${}^{\ell}_{r}\mathbf{t}$
2. the rotation quaternion ${}^{\ell}_{r}\check{\mathbf{q}}$
3. the translation covariance ${}^{\ell}_{r}\mathbf{C}_{tt}$
4. the rotation covariance ${}^{\ell}_{r}\mathbf{C}_{\check{q}\check{q}}$
5. the uncertainty volume α
6. the plane correspondence overlap metric o_p
Note that the same notation as in [37] is used to describe the respective coordinate frames of the plane clouds as left (denoted ℓ) and right (denoted r). See also [133] for a more detailed description. The overlap metric o_p is computed as follows:

$$o_p = \frac{\#\Gamma}{\#\,{}^{r}\mathcal{P}} \qquad (8.1)$$
where #Γ is the number of plane correspondences used and $\#\,{}^{r}\mathcal{P}$ is the number of planes in the right plane cloud [133]. In effect, this quantifies how much of the complete scene described by planes was used to compute the registration result. The more planes are available for matching, the more should be used as correspondences in a successful match. However, since ambiguous results are also expected due to low overlap, a high threshold for o_p would only help to prevent coincidental results in very cluttered scenes. As a side effect of explicitly including multiple registration results, the overlap o_p is allowed to be significantly lower, which yields more results that would otherwise have been discarded. This allows the inclusion of locally not very likely registration results that may be globally correct. In some cases, these parameter settings produce up to 200 results in the list W, many of which are either very similar to each other due to minor variations in the set of
Algorithm 8: Post-processing of the complete list of potential solutions W.
input: A list of all consistent registrations W from Algorithm 2 in [133]
output: The reduced list of potential solutions W∗
Initialize W∗ = ∅
Sort the elements ω_i ∈ W by the uncertainty volume of the solution ω_i.α
Set ω_min to the first ω_i ∈ W where the maximum eigenvalue of ω_i.C_q̌q̌ and ω_i.C_tt is less than λ_max
if no such ω_min exists then
    return
end
Set W∗ = {ω_min}
Set α_max = L_unc · ω_min.α
Set o_min = O_min · ω_min.o_p
for ∀ω_i ∈ W do
    if ω_i.o_p < O_p or ω_i.o_p < o_min or the maximum eigenvalue of ω_i.C_q̌q̌ or ω_i.C_tt > λ_max or ω_i.α > α_max then
        continue
    end
    Set j_rep = −1
    for ∀ω∗_j ∈ W∗ do
        if ||ω_i.t − ω∗_j.t|| < D_min or the rotation angle between ω_i.q̌ and ω∗_j.q̌ < R_min then
            j_rep = j
            break
        end
    end
    if j_rep == −1 then
        Append W∗ ← W∗ ∪ {ω_i}
    else
        This ω_i may replace ω∗_{j_rep}:
        if ω_i.o_p > ω∗_{j_rep}.o_p and ω_i.o_p < O_max and ω_i.α < ω∗_{j_rep}.α · L_rep and ||ω_i.t − ω∗_{j_rep}.t|| < D_max and the rotation angle between ω_i.q̌ and ω∗_{j_rep}.q̌ < R_max then
            Replace ω∗_{j_rep} with ω_i
        end
    end
end
correspondences used, or simply very unlikely. To reduce the number of redundant results and to exclude very unlikely ones, this list is processed with algorithm 8. Several parameters are used to quickly discard solutions in algorithm 8:
1. O_p is the minimum overall overlap a solution must have to be considered.
2. L_unc is the maximum uncertainty volume α relative to the least uncertain solution in W, so the actual maximum uncertainty volume is α_max = L_unc · ω_min.α.
3. O_min is the minimum overlap relative to the least uncertain solution in W, so the actual minimum overlap is o_min = O_min · ω_min.o_p.
4. D_min is the minimum Euclidean distance between accepted solutions.
5. R_min is the minimum angular distance between accepted solutions, computed as the absolute angle of the rotation between two solutions in axis-angle notation.
6. λ_max is the maximum covariance eigenvalue allowed.
In addition, some parameters are needed to decide whether a slightly less likely solution (based on the uncertainty volume) may replace a more likely one. This is needed if a very likely solution that has only very few correspondences is close to, but not as accurate as, a slightly less likely solution that used many more correspondences.
1. L_rep is the maximum relative uncertainty volume the replacement is allowed to have.
2. O_max is the maximum overlap that allows a potential solution to be considered for replacement.
3. D_max is the maximum Euclidean distance to the replacement candidate.
4. R_max is the maximum angular distance to the replacement candidate.
Each result left in W∗ after post-processing is used as a component in the final Mixture of Gaussians for a constraint in the graph. Several ways to compute a good mixture weight for each solution were tried, but the specific method did not seem to make a difference in practice.
In the experiments below, the weights are close to uniform, but their order preserves the ranking returned by the processing algorithm above.
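A direct transcription of Algorithm 8 into Python is sketched below to make its control flow explicit. It is illustrative rather than the implementation used here: the Candidate fields are assumed stand-ins for the tuple members listed above, rotations are assumed to be unit quaternions (w, x, y, z), the rotation distance is assumed to be the relative rotation angle 2·arccos(|q1·q2|) in radians, and L_unc and L_rep from Table 8.1 are read as 1e15 and 1e3.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Candidate:
    """One hypothesis omega from the list W (field names are assumptions)."""
    t: np.ndarray     # translation vector
    q: np.ndarray     # unit quaternion (w, x, y, z), assumed representation
    C_tt: np.ndarray  # translation covariance
    C_qq: np.ndarray  # rotation covariance
    alpha: float      # uncertainty volume
    op: float         # plane correspondence overlap metric o_p


def rotation_angle(q1, q2):
    """Angle of the relative rotation between two unit quaternions."""
    return 2.0 * np.arccos(np.clip(abs(np.dot(q1, q2)), 0.0, 1.0))


def max_cov_eig(w):
    """Largest eigenvalue over the rotation and translation covariances."""
    return max(np.linalg.eigvalsh(w.C_qq).max(), np.linalg.eigvalsh(w.C_tt).max())


def postprocess(W, p):
    """Reduce the candidate list W following the steps of Algorithm 8."""
    W = sorted(W, key=lambda w: w.alpha)  # least uncertain first
    w_min = next((w for w in W if max_cov_eig(w) < p["lambda_max"]), None)
    if w_min is None:
        return []
    W_star = [w_min]
    alpha_max = p["L_unc"] * w_min.alpha
    o_min = p["O_min"] * w_min.op
    for w in W:
        if (w.op < p["O_p"] or w.op < o_min
                or max_cov_eig(w) > p["lambda_max"] or w.alpha > alpha_max):
            continue  # quick rejection
        j_rep = -1
        for j, s in enumerate(W_star):
            if (np.linalg.norm(w.t - s.t) < p["D_min"]
                    or rotation_angle(w.q, s.q) < p["R_min"]):
                j_rep = j  # w is redundant with an accepted solution
                break
        if j_rep == -1:
            W_star.append(w)
            continue
        s = W_star[j_rep]  # w may replace this nearby solution
        if (s.op < w.op < p["O_max"] and w.alpha < s.alpha * p["L_rep"]
                and np.linalg.norm(w.t - s.t) < p["D_max"]
                and rotation_angle(w.q, s.q) < p["R_max"]):
            W_star[j_rep] = w
    return W_star


# Parameters of Table 8.1.
params = {"O_p": 0.1, "L_unc": 1e15, "O_min": 0.667, "D_min": 5.0,
          "R_min": 0.065, "lambda_max": 8e5, "L_rep": 1e3,
          "O_max": 0.24, "D_max": 0.2, "R_max": 0.005}
```

With these parameters, a replacement can only happen between solutions less than 0.2 apart in translation, so it refines a near-duplicate rather than merging distinct modes.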
8.2 Bremen City Center Dataset
The first experiment with real world data is based on 13 scans that were recorded with a Riegl VZ-400 in the center of Bremen, Germany. Each point cloud consists of between 15 and 20 million points with reflectance information. The scanner was mounted on a tripod without a mobile base, so no odometry information is available. However, markers were placed in the environment beforehand to allow for a comparison with the “gold standard” for geodetic applications, i.e. registration with artificial markers in the Riegl software, which requires additional manual assistance such as confirming or re-selecting correspondences. This marker-based registration can also be used to seed methods that need a good initial guess, e.g. ICP-based methods like 6D-SLAM [15]. Note that no initial guess (no initial marker-based registration, no motion estimates, no GPS, or anything similar) is used in the results presented here, other than for comparison. Translations between the different scanning locations were quite large, up to 50 m. As mentioned, 6D-SLAM requires the marker-based Riegl registration as an initial guess to run successfully on this dataset. Plane registration, in contrast, performs very well without any initial motion estimates [134]. On most pairs, the plane registration results are very close to the marker-based registration, and under close visual inspection of the point cloud data they even appear to be more precise. But the method also fails on several scan pairs. These pairs tend to have very large occlusions and very low overlap, and hence display considerable ambiguity in terms of possible plane correspondences. Furthermore, the correct solutions for these pairs are among the top candidates in the MUMC ranking. Fig. 8.1 shows one example where occlusion diminishes the overlap and thus leads to ambiguous registration results. The plane matching parameters used are shown in Table 8.1. As in the synthetic case above, the Prefilter step again used N = 200 samples in the filtering process. Using the multimodal plane registration method described in section 8.1, the 13 scans and the registrations between them gave rise to a graph with 13 vertices and 23 edges, 7 of which are multimodal. This MoG pose graph has a complexity of C(G) = 7.58496. Table 8.2 gives an overview of the 7 multimodal edges, in particular their modes in terms of the estimated transformations, given as translations T and rotations R, as well as the determinants det(C) of the associated covariances.
The smaller the determinant, the more certain the candidate registration result. Using the settings described in Table 8.1, at most 3 modes per edge were encountered. For each pair shown in Table 8.2, the correct mode is indicated by gray shading. Note that in four cases the top-ranked mode is correct, while in three cases the second-ranked mode is the proper solution. Also, the components ranked higher than the globally correct ones are all supposedly more certain, which shows the need to take less certain registration results into account to achieve global convergence. Furthermore, note that the modes tend to be quite far apart, i.e. they tend to correspond to significantly different spatial transformations. This is only partially an effect of the post-processing applied to the MUMC results, which enforces a minimum distance of 5 m between components (see Table 8.1), since the distances between the reported components are much larger. Choosing the wrong mode therefore has a strong impact on SLAM, which cannot easily be repaired with additional loop closures. Figures 8.2 and 8.3 show the final maps after optimization using both the traditional unimodal Max SGD method and the Prefilter SGD/LM methods. The results from Pre

O_p | L_unc | O_min | D_min | R_min
0.1 | 1e15 | 0.667 | 5 | 0.065
λ_max | L_rep | O_max | D_max | R_max
800000 | 1e3 | 0.24 | 0.2 | 0.005
Table 8.1: Plane matching parameters used for the Bremen City data set.
Pair | #1 | #2 | #3
1 → 2 | T = (8.22, −4.88, 0.45), R = (2.03, 0.69, −68.12), det(C) = 2.38e−23 | T = (−1.67, 41.22, −0.26), R = (2.30, −0.02, −157.07), det(C) = 6.97e−19 | -
5 → 6 | T = (−22.21, 0.14, 0.27), R = (0.78, 0.77, 72.69), det(C) = 4.94e−26 | T = (−22.14, 0.24, 6.67), R = (0.64, 0.76, 72.69), det(C) = 2.08e−23 | -
6 → 7 | T = (−21.30, 5.89, 0.19), R = (−1.41, 1.10, −132.24), det(C) = 7.03e−22 | T = (0.85, 4.52, −0.08), R = (−2.31, 0.58, −42.07), det(C) = 6.70e−16 | -
1 → 12 | T = (−47.18, −48.63, −1.49), R = (−1.11, −0.13, 139.99), det(C) = 2.26e−21 | T = (3.13, 2.22, 0.10), R = (0.14, −0.80, 50.63), det(C) = 6.92e−20 | -
8 → 10 | T = (40.83, 15.20, −1.25), R = (1.54, −3.78, −176.13), det(C) = 6.89e−16 | T = (1.15, 16.00, 0.04), R = (0.06, −1.22, −113.78), det(C) = 2.33e−13 | T = (−8.52, 12.75, 0.21), R = (2.14, −1.13, 10.13), det(C) = 1.05e−12
9 → 11 | T = (−20.42, 49.43, −1.27), R = (1.47, 0.90, 28.85), det(C) = 1.54e−11 | T = (7.73, 32.00, −0.44), R = (2.34, 0.78, 124.47), det(C) = 1.14e−08 | -
9 → 12 | T = (2.64, 3.50, −0.08), R = (−0.49, −1.19, −53.71), det(C) = 5.83e−19 | T = (−41.38, 61.60, −1.75), R = (−0.87, −0.87, −50.20), det(C) = 2.86e−13 | -
Table 8.2: The multimodal edges in the Bremen City map. The left column shows the edge in question; the other columns show the list of modes in the order reported by the plane-based registration. Each mode is shown in the form of the estimated translation T and rotation R, as well as the determinant det(C) of the associated covariance. Furthermore, each globally correct mode is highlighted in gray.
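The reported complexity can be checked against the mode counts in Table 8.2. The sketch below assumes, as the value C(G) = 7.58496 suggests, that C(G) is the base-2 logarithm of the number of mode combinations; under that assumption the six bimodal edges and the single trimodal edge 8 → 10 give 2^6 · 3 = 192 combinations.

```python
import math

# Number of modes per multimodal edge, read off Table 8.2
# (8 -> 10 is the only edge with a third mode).
modes = {"1->2": 2, "5->6": 2, "6->7": 2, "1->12": 2,
         "8->10": 3, "9->11": 2, "9->12": 2}

combinations = math.prod(modes.values())
complexity = sum(math.log2(k) for k in modes.values())
print(combinations)         # 192
print(f"{complexity:.5f}")  # 7.58496
```

Each of the 192 combinations corresponds to one unimodal pose graph that an Exhaustive method would have to optimize.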
Figure 8.1: Above: Scans 1 (right) and 2 (left) of the Bremen City data set. Below: The effects of ambiguity due to occlusion on plane registration of the scan pair. The left image shows the most likely result as reported by the plane registration method. The second most likely registration result, shown on the right, is the globally correct but less certain one (see the covariance determinants in the first row of Table 8.2). Detail views of where the scans meet at the church tower are shown in the right column. Note that no odometry exists to disambiguate the results.
8.2 Bremen City Center Dataset
119
8. Results on Local Ambiguity from Real World Datasets
Method | Max SGD | Max LM | Multi-Edge LM | Prefilter SGD | Prefilter LM
Initialization | 0.000226 | 0.000231 | 0.000276 | 0.000279 | 0.000269
Optimization | 0.006497 | 0.001963 | 0.022238 | 0.006288 | 0.003624
Log Probability | −1.26001·10^9 | −3.56238·10^9 | −1.45329·10^10 | −1.30253·10^5 | −1.01384·10^5
SSExyz | 3.65999·10^9 | 6.63873·10^9 | 2.65199·10^9 | 1.02629·10^5 | 1.02628·10^5
SSEρθφ | 7.92405·10^−1 | 1.11384·10^0 | 1.61360·10^0 | 3.19938·10^−5 | 3.30090·10^−5
Table 8.3: Runtimes in seconds and result quality for the traditional Max SGD/LM, Multi-Edge LM, and Prefilter SGD/LM methods on the Bremen City data set, recorded on a Core i7-2720QM 2.2GHz with 8GB of RAM. The SSE metric (see section 2.5) was computed relative to the de-facto ground truth transformations given by the marker-based registration. Max SGD/LM were initialized with a breadth-first traversal of the graph; the remaining methods were initialized with the Prefilter method described in section 6.3.2.
Figure 8.2: Final maps in plane patch representation after optimization with the traditional unimodal Max SGD method (top) and the multimodal Prefilter LM method (bottom). The planes used for matching as well as the graph structure are shown. The log probability of the traditional result is −1.26933·10^9, while the log probability of the multimodal result is −1.01384·10^5.
Figure 8.3: Mapping results in point cloud representation using the traditional unimodal Max SGD method (top) and the multimodal Prefilter LM method (bottom). Laser reflectance values are used to assign greyscale values (coded with the Jet colormap in color). See also Extension 1 for an animated view of these maps.
Figure 8.4: Final map after optimization with Prefilter LM, overlaid on aerial imagery of Bremen City Center from Google Earth. Note that due to the height of the buildings, some parallax exists in the aerial image, and some ground features (fountains, small trees) as well as some tall structures (cathedral tower) do not match exactly. The image shows the map projected down at the general roof level. The height is used to assign greyscale values (coded with the Jet colormap in color).
# | 25 | 50 | 100 | 200 | 400 | 800
Time | 0.001554 (σ 2.05189·10^−4) | 0.002379 (σ 1.57122·10^−4) | 0.004469 (σ 2.72404·10^−4) | 0.008602 (σ 4.38217·10^−4) | 0.016455 (σ 4.87811·10^−4) | 0.032582 (σ 8.26252·10^−4)
Log Probability | −1.28219·10^8 (σ 6.140699·10^8) | −1.43149·10^6 (σ 1.117032·10^7) | −2.98506·10^5 (σ 2.333723·10^4) | −2.90968·10^5 (σ 1.073785·10^4) | −2.86378·10^5 (σ 4.710047·10^3) | −2.84246·10^5 (σ 4.490705·10^3)
SSExyz | 2.850298·10^8 (σ 9.643072·10^8) | 9.127352·10^6 (σ 7.872734·10^7) | 5.3707584·10^5 (σ 1.314959·10^6) | 2.8931014·10^5 (σ 5.761819·10^5) | 2.2318785·10^5 (σ 5.652077·10^4) | 2.2788351·10^5 (σ 5.825175·10^4)
SSEρθφ | 0.024349 (σ 0.157559) | 5.061603·10^−5 (σ 6.387661·10^−5) | 4.427961·10^−5 (σ 8.960579·10^−7) | 4.442851·10^−5 (σ 6.917717·10^−7) | 4.439304·10^−5 (σ 6.021706·10^−7) | 4.436361·10^−5 (σ 6.852595·10^−7)
Table 8.4: Runtimes in seconds and result quality for different numbers of particles (labeled # above) of the Particle method on the Bremen City data set. Due to the nondeterministic nature of the Particle method, 100 trials were run for each particle count and summarized by the mean and standard deviation σ. The data was recorded on a Core i7-2720QM 2.2GHz with 8GB of RAM. The SSE metric (see section 2.5) was computed relative to the de-facto ground truth transformations given by the marker-based registration.
filter LM and Prefilter SGD are virtually indistinguishable, with almost the same residual error (see table 8.3). The planar representation along with the pose graph structure is shown in Fig. 8.2. Fig. 8.3 shows a high quality point cloud rendering using the mapped reflectance value as color. Evidently, the traditional Max SGD/LM methods fail, while Prefilter SGD as well as Prefilter LM are able to find good map estimates. Two comparisons to ground truth are made: one against the marker-based registration, and one by superimposing the map on aerial imagery from Google Earth. Table 8.3 shows runtimes as well as the quality of the results based on the final log probability and an error metric relative to the marker-based transformations. Fig. 8.4 shows the final multimodal map computed by Prefilter LM in relation to the Google Earth imagery. The figures in table 8.3 clearly show that, for negligible computational overhead, Prefilter LM and Prefilter SGD arrive at significantly better solutions. Not only is the log probability of the resulting map more than four orders of magnitude higher given the edge constraints (here, the full log joint probability is used for both methods), but the result is also much closer to the de-facto ground truth given by the marker-based registration. Additionally, it aligns well with the aerial imagery. Note that the SSE values are mean squared errors in mm², so an SSExy value of 102628 for both Prefilter SGD and Prefilter LM is very small (a root mean square error of around 30cm per position). The SSExy of 3.58956·10^9 achieved by Max SGD, on the other hand, is very large, corresponding to around 60m. The Particle method was also applied to this data set. Table 8.4 shows its performance for different particle counts over 100 trials. The resulting log probability stabilizes towards 800 particles, and many more particles would be needed to achieve a better result. While its performance is not bad, and in fact much better than that of the standard Max SGD/LM methods, a result comparable to Prefilter SGD/LM cannot be achieved. Additionally, the required computation time is an order of magnitude larger.
8.3 Hannover Fair Dataset
Op | Lunc | Omin | Dmin | Rmin
0.1 | e22 | 0.667 | 2 | 0.065
λmax | Lrep | Omax | Dmax | Rmax
800000 | e3 | 0.24 | 0.5 | 0.035
Table 8.5: Plane matching parameters used for the Hannover Fair data set.
In a second experiment with real world data, a set of 22 scans was recorded in Hall 22 at the Hannover Fair exhibition grounds during evening hours with a nodding 2D Sick S300 laser scanner actuated by a cheap and rather inaccurate servo. The set was recorded during the RoboCup German Open 2009 competition, and many screens, booths, and competition fields are visible in the data. People occasionally moved through the scans, introducing additional noise; due to the rather slow motion of the scanner, they appear as blurry blobs. A further challenging aspect of this data set is that rather large translations and rotations occurred between scans, so the overlap between scans is often very small. The average translation between two scans is around 5m, with a maximum sensor range of 30m. Odometry of the mobile robot base is available in principle, but it is very imprecise, as the robot uses tracked locomotion over a mixture of carpets and slippery floors. The plane matching parameters used are shown in table 8.5. As above, the Prefilter step used N = 200 samples in the filtering process. The final map consists of 23 vertices and 53 edges. Tables 8.8 and 8.9 show all 21 edges containing multimodal constraints. At most 5 modes were detected per edge, and in the worst case, the correct mode is actually the fifth one. The resulting MoG pose graph complexity is C(G) = 30.4167. Compared to the modes encountered in the Bremen City data set (table 8.2), these modes are spatially closer to each other. This is due to a much more cluttered scene with many parallel planes (e.g. partitions, screens) and noise (e.g. from moving people and jitter in the servo control), which generate ambiguities that tend to have a negative impact on the registration method. Despite this relative spatial closeness, the wrong optima of the plane registration are so far from the proper solution that the standard Max SGD and Max LM methods fail (see Fig. 8.5).
Table 8.6 shows the results achieved by the Particle method with different numbers of particles. Since the graph is rather small and the method is stochastic, 100 trials were run to produce these numbers. As expected, the computation time increases linearly with the number of particles used, and the log probability of the final MoG pose graph configuration increases when more particles are used. Around 1600 particles are needed to find a good result reliably. At first glance, this may seem fewer than for the synthetic graphs discussed above, but note the large difference in graph size. Still, Prefilter SGD and Prefilter LM converge to a better result than the Particle method, and they do so at least one order of magnitude faster. Table 8.7 shows the runtime and map quality comparisons between the traditional Max SGD/LM methods, the Multi-Edge LM method, and the Prefilter SGD/LM methods.
Figure 8.5: Hannover Fair map. Top: Traditional Max SGD method using only the locally most likely registration result. Bottom: Result of the Prefilter LM method. The local z coordinate was used to assign greyscale values (coded with the Jet colormap in color). See also Extension 1 for an animated view of these maps.
# | 100 | 200 | 400 | 800 | 1600 | 3200 | 6400 | 12800
Time | 0.010071 (σ 2.831905·10^−4) | 0.019808 (σ 3.68352·10^−4) | 0.037288 (σ 6.222948·10^−4) | 0.072761 (σ 1.710345·10^−3) | 0.147081 (σ 6.048332·10^−3) | 0.295176 (σ 1.216805·10^−2) | 0.579942 (σ 2.648059·10^−2) | 1.213969 (σ 7.467676·10^−2)
Log Probability | −5.285083·10^8 (σ 8.736545·10^8) | −1.923713·10^8 (σ 5.632018·10^8) | −1.060568·10^7 (σ 1.027818·10^7) | −5.349669·10^6 (σ 4.192651·10^7) | −3.769561·10^6 (σ 1.698311·10^6) | −3.353374·10^6 (σ 1.632804·10^6) | −3.269732·10^6 (σ 6.91194·10^4) | −3.244765·10^6 (σ 4.488209·10^4)
Table 8.6: Runtimes in seconds and the final log probability achieved by the Particle method using different particle counts (labeled # above) on the Hannover Fair data set. Due to the nondeterminism of the Particle method, 100 trials were run per count. The data was recorded on a Core i7-2720QM 2.2GHz with 8GB of RAM.
Method | Max SGD | Max LM | Multi-Edge LM | Prefilter SGD | Prefilter LM
Initialization | 0.000457 | 0.000462 | 0.000312 | 0.000353 | 0.000304
Optimization | 0.014856 | 0.048428 | 0.059520 | 0.015385 | 0.064116
Final Log Prob. | −2.54952·10^9 | −1.20496·10^8 | −2.33928·10^10 | −2.36122·10^6 | −7.76502·10^5
Table 8.7: Runtimes in seconds and results for the traditional Max SGD/LM, Multi-Edge LM, and Prefilter SGD/LM methods on the Hannover Fair data set, recorded on a Core i7-2720QM 2.2GHz with 8GB of RAM. Max SGD/LM were initialized with a breadth-first traversal of the graph; the remaining methods were initialized with the poses computed by the Prefilter method described in section 6.3.2.
The five methods do not differ significantly in their runtime, but produce very different results. The Prefilter SGD/LM methods arrive at maps whose log joint probability is up to almost four orders of magnitude higher than that of the traditional methods. Fig. 8.5 shows the two resulting maps. Clearly, the traditional Max SGD method fails, resulting in vastly misplaced observation poses and visibly inaccurate map parts. The same holds for the Max LM method, which also produces obviously distorted maps. Both the Prefilter SGD and Prefilter LM methods, however, converge to nearly the same very good final result, even though the joint probability of the results is quite low. This may be explained by the many multimodal constraints that actually assign a
Pair 3→4
8→9
18 → 19
19 → 20
22 → 23
23 → 24
24 → 25
2→4
5→7
7 → 10
7 → 11
#1 T = (3.94, 0.02, 0.17) R = (−0.29, 0.32, 3.38) det(C) = 1.83e − 21 T = (3.23, −7.09, 7.11) R = (−89.05, 20.77, −2.88) det(C) = 1.55e − 24 T = (2.62, 0.65, 0.04) R = (−0.33, −0.02, 12.09) det(C) = 1.81e − 20 T = (1.32, 0.91, 0.03) R = (−0.65, −0.36, 17.24) det(C) = 5.06e − 17 T = (4.30, −9.80, 0.22) R = (−3.34, 0.48, 32.70) det(C) = 1.50e − 14 T = (0.71, −0.36, −0.07) R = (0.11, −0.02, −5.77) det(C) = 4.33e − 19 T = (4.20, 0.11, 0.23) R = (−0.31, 0.17, −1.63) det(C) = 1.67e − 11 T = (0.35, −0.14, 0.04) R = (−0.33, 0.06, −0.23) det(C) = 1.40e − 19 T = (8.61, −0.33, 0.43) R = (0.74, −1.65, −3.50) det(C) = 3.58e − 20 T = (8.71, 0.32, 0.15) R = (−1.26, 0.80, 22.18) det(C) = 1.38e − 21 T = (14.61, 7.57, 2.49) R = (−92.61, −39.57, 179.09) det(C) = 3.25e − 21
#2 T = (3.87, 0.00, 0.74) R = (−0.17, 4.56, 3.59) det(C) = 5.32e − 14 T = (4.17, 0.13, 0.23) R = (−1.05, −0.88, 21.01) det(C) = 3.50e − 23 T = (1.45, 2.96, −0.05) R = (−0.90, −0.30, 12.56) det(C) = 3.41e − 15 T = (3.33, 1.31, 0.19) R = (−0.39, 0.72, 19.52) det(C) = 2.13e − 15 T = (2.25, −2.53, 0.06) R = (1.07, 1.74, −44.70) det(C) = 2.54e − 14 T = (2.89, −0.64, 0.09) R = (0.16, −0.25, −5.50) det(C) = 9.49e − 16 T = (−2.60, 0.70, −0.49) R = (1.17, −4.77, −0.88) det(C) = 1.94e − 11 T = (5.65, 0.79, 2.30) R = (−93.61, 3.16, 175.76) det(C) = 3.24e − 16 T = (−6.01, 1.35, 2.14) R = (91.82, −5.18, −2.01) det(C) = 3.29e − 19 T = (0.24, 0.52, 0.06) R = (−2.45, −3.22, 21.72) det(C) = 8.54e − 15 T = (−3.70, −8.97, −0.64) R = (−91.18, 46.23, −1.69) det(C) = 1.97e − 20
det(C) = 9.93e − 17 T = (0.75, 0.49, 0.01) R = (−0.07, 0.01, 8.04) det(C) = 1.83e − 14 -
#3 T = (2.04, −0.04, 0.03) R = (0.08, 1.48, −15.49) det(C) = 3.43e − 13 T = (4.34, 5.22, 2.11) R = (83.89, −22.27, 90.23)
R = (0.93, 15.27, −4.79) det(C) = 1.85e − 12 -
T = (−0.46, 0.41, 0.09) R = (1.77, 0.37, −42.38) det(C) = 2.60e − 07 T = (1.33, −0.32, 2.16)
-
det(C) = 3.23e − 16 -
T = (3.56, 0.08, 2.36) R = (0.48, 3.18, 20.43)
#4 -
-
-
-
-
-
#5 -
R = (−1.72, 0.94, 47.72)
T = (10.39, 1.83, 0.30)
-
T = (7.03, −0.18, 0.33) R = (−0.59, 0.14, −0.37) det(C) = 3.12e − 14 -
det(C) = 9.08e − 17
R = (−1.04, 0.28, −42.12)
T = (0.69, 5.97, 0.28)
-
-
-
-
-
-
-
-
T = (−0.38, 0.43, 0.09) R = (1.56, −0.39, −27.46) det(C) = 1.53e − 08 T = (−4.96, −13.59, −0.51) R = (0.51, −1.17, −9.87) det(C) = 7.30e − 13 -
det(C) = 2.93e − 20
Table 8.8: Multimodal edges in the Hannover Fair map, continued in table 8.9. Again, the left column shows the edge containing the modes on the right. Up to five modes per edge were detected. The globally correct mode is highlighted.
T = (1.25, 4.42, 0.13) R = (−3.20, 2.38, 112.68) det(C) = 5.41e − 21 T = (5.52, 3.09, 0.26) R = (−2.33, 0.20, 60.02) det(C) = 2.25e − 17 T = (6.99, 1.47, 0.16) R = (1.08, 1.41, −72.35) det(C) = 8.66e − 15
det(C) = 1.48e − 15
det(C) = 9.19e − 24 T = (2.30, 2.30, 0.13) R = (−3.48, 0.88, 80.71) det(C) = 5.22e − 25 T = (1.68, 4.31, −0.95) R = (87.78, −48.40, 71.85) det(C) = 6.54e − 26 T = (3.20, 0.01, 0.13) R = (−2.07, −0.69, 60.96) det(C) = 6.39e − 19 T = (8.63, 1.56, −1.16) R = (−2.39, 1.73, −150.53) det(C) = 1.86e − 17
det(C) = 1.99e − 23 T = (2.13, 0.84, 0.11) R = (−0.24, −0.61, 26.36)
det(C) = 6.60e − 22 T = (5.39, 1.43, 5.19) R = (−81.72, −73.25, 169.04) det(C) = 6.64e − 16 T = (0.23, 1.48, 12.04) R = (−93.50, 46.82, 173.12) det(C) = 2.54e − 19 T = (0.64, 1.62, 2.13) R = (−92.31, −39.37, −112.83) det(C) = 7.96e − 21 T = (0.99, 2.51, −2.96) R = (89.03, −17.37, 71.29) det(C) = 4.24e − 16 T = (4.87, 6.26, 3.10) R = (92.29, 40.49, 70.00) det(C) = 5.90e − 25 T = (3.73, −2.25, 0.23) R = (−1.46, −1.29, 60.83) det(C) = 1.69e − 17 T = (7.57, −0.97, 0.11) R = (0.30, 1.24, −72.06)
det(C) = 1.69e − 24 T = (2.24, −0.33, 10.86) R = (175.90, −2.60, 165.37) det(C) = 2.67e − 20 T = (3.71, 4.71, 0.25) R = (−2.28, 1.79, 134.30)
det(C) = 1.40e − 16 -
det(C) = 1.74e − 16 T = (−2.76, 2.65, 0.27) R = (−0.84, −1.62, 26.93)
det(C) = 2.10e − 16 T = (1.23, 8.75, 0.30) R = (1.40, 3.87, −136.78)
T = (3.97, 3.35, 5.72) R = (−91.80, −66.75, −91.24) det(C) = 2.26e − 14 T = (5.60, 3.07, 0.32) R = (−3.65, 1.95, 102.15)
#3 T = (2.16, 6.37, 0.27) R = (−3.61, 2.02, 13.55) det(C) = 4.58e − 13 -
#2 T = (−5.07, 0.40, 3.58) R = (99.93, 75.13, 101.31) det(C) = 5.12e − 12 T = (1.05, 1.97, −3.94) R = (91.64, 10.44, 1.92) det(C) = 4.40e − 18 T = (3.92, 0.23, 0.26) R = (−1.24, 0.07, 20.85)
#1 T = (9.96, 3.36, 0.12) R = (−2.49, 1.71, 102.66) det(C) = 2.65e − 21 T = (4.38, 5.74, 0.10) R = (−1.55, 2.03, 166.28) det(C) = 7.35e − 24 T = (0.75, 0.25, 0.03) R = (−1.53, −1.18, 19.53)
=
det(C) = 3.59e − 12
T = (8.58, 1.57, 0.23) R = (1.69, 5.52, −150.08)
T = (1.23, 4.37, 2.29) R = (−3.15, 2.02, 112.82) det(C) = 1.93e − 20 -
det(C) = 4.42e − 15 -
T = (3.69, 4.56, 0.29) R = (−0.01, −0.35, 25.79)
T = (2.58, 10.60, 0.47) R (−0.69, 4.28, −168.06) det(C) = 4.73e − 16 -
-
-
#4 -
det(C) = 9.17e − 09
T = (0.49, 0.12, 0.07) R = (−0.09, 1.75, −34.80)
-
-
-
-
-
-
-
-
#5 -
Table 8.9: Multimodal edges in the Hannover Fair map, continued from table 8.8. Again, the left column shows the edge containing the modes on the right. Up to five modes per edge were detected. The globally correct mode is highlighted.
19 → 23
19 → 21
10 → 13
10 → 12
9 → 11
8 → 13
8 → 12
8 → 10
7 → 14
Pair 7 → 12
low weight to the correct mode. Still, the information from the other constraints appears to reduce the likelihood of the other optima sufficiently. This data set also illustrates that the modes do not have to be far apart from each other for the Prefilter step to work.
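The effect described above, where a low-weight correct mode still wins once other constraints are taken into account, can be illustrated with a minimal one-dimensional sketch. The mixture weights, means, and the second unimodal constraint below are hypothetical values chosen for illustration; they are not taken from the Hannover Fair data, and the log-sum-exp evaluation is simply a standard way to compute a MoG log density, not necessarily the one used in the implementation.

```python
import math


def log_gauss(x, mu, sigma):
    """Log density of the 1D Gaussian N(x; mu, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def log_mog(x, modes):
    """Log density of a 1D Mixture of Gaussians given (weight, mu, sigma) tuples.

    Uses the log-sum-exp trick so that far-away modes do not underflow.
    """
    terms = [math.log(w) + log_gauss(x, mu, s) for w, mu, s in modes]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical MoG constraint: the correct mode (at 0.0) has the LOW weight,
# while a wrong mode (at 5.0) has the high weight.
MOG_CONSTRAINT = [(0.2, 0.0, 1.0), (0.8, 5.0, 1.0)]
# Hypothetical unimodal constraint from the rest of the graph, agreeing with 0.0.
OTHER_MU, OTHER_SIGMA = 0.0, 1.0


def joint_log_prob(x):
    """Sum of the log densities of both constraints (independent factors)."""
    return log_mog(x, MOG_CONSTRAINT) + log_gauss(x, OTHER_MU, OTHER_SIGMA)


for x in (0.0, 5.0):
    print(f"x = {x}: MoG only {log_mog(x, MOG_CONSTRAINT):.3f}, joint {joint_log_prob(x):.3f}")
```

On its own, the mixture prefers the wrong mode at 5.0; once the second constraint is added, the joint log probability clearly favors the correct mode at 0.0.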
Chapter 9
Results of Experiments on Global and Local Ambiguity using the Full Generalized Graph SLAM Framework
9.1 Systematic Evaluation with Synthetic Data
Class | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
C(G) | 1 | 2 | 3 | 4 | 8 | 16 | 32 | 64 | 82.72 | 105.36 | 126.99
Number of edges with X modes/hypercomponents (for X > 1 given as MoG constraints/hyperedges):
X=1 | 255 | 254 | 253 | 252 | 248 | 240 | 224 | 192 | 192 | 192 | 192
X=2 | 0/1 | 1/1 | 2/1 | 2/2 | 4/4 | 8/8 | 16/16 | 32/32 | 16/16 | 8/8 | 4/4
X=3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16/16 | 8/8 | 4/4
X=4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16/16 | 8/8
X=5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16/16
% non-simple edges | 0.4 | 0.8 | 1.2 | 1.6 | 3.2 | 6.4 | 12.8 | 25 | 25 | 25 | 25
Table 9.1: The 11 complexity classes used in the experiments.
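As a consistency check, the C(G) row of Table 9.1 can be recomputed from the edge counts in the same table using the complexity metric of eq. (9.1), which sums log2 of the total component count over all edges. The sketch below is a hypothetical transcription of the table, not code from the experiments; it assumes, as the text states, that a MoG constraint with X modes and a hyperedge with X unimodal hypercomponents both count X components.

```python
import math

# Non-simple edges per complexity class, transcribed from Table 9.1.
# Key: number of components X; value: one (MoG constraints, hyperedges) pair
# per class 1..11. Simple edges (X = 1) add log2(1) = 0 and are omitted.
TABLE_9_1 = {
    2: [(0, 1), (1, 1), (2, 1), (2, 2), (4, 4), (8, 8),
        (16, 16), (32, 32), (16, 16), (8, 8), (4, 4)],
    3: [(0, 0)] * 8 + [(16, 16), (8, 8), (4, 4)],
    4: [(0, 0)] * 9 + [(16, 16), (8, 8)],
    5: [(0, 0)] * 10 + [(16, 16)],
}


def complexity(component_counts):
    """Eq. (9.1): sum over edges of log2 of the edge's total component count.

    A MoG constraint with X modes and a hyperedge with X unimodal
    hypercomponents both contribute log2(X).
    """
    return sum(math.log2(x) for x in component_counts)


def class_edges(cls):
    """Total component count of every non-simple edge in class cls (1-based)."""
    counts = []
    for x, per_class in TABLE_9_1.items():
        mog, hyper = per_class[cls - 1]
        counts += [x] * (mog + hyper)
    return counts


for cls in range(1, 12):
    print(cls, round(complexity(class_edges(cls)), 2))
```

This reproduces the C(G) row up to rounding in the last digit for class 11 (the sum evaluates to about 126.98).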
A synthetic data set was generated, much like the one used in [140]. Each generated graph consists of 128 vertices and 256 edges. The aim of the experiments is to determine how robust the methods are to ambiguities in the data, so pose graphs of different complexities were generated. Specifically, the complexity metric introduced in [140] was extended to encompass hyperedges as well:
C(G) = \log_2 \prod_{e \in E} \sum_{j=1}^{N} M_j = \sum_{e \in E} \log_2 \sum_{j=1}^{N} M_j    (9.1)
where, for each edge e, N is the number of its hypercomponents and M_j is the number of MoG components of hypercomponent j. This way, a hyperedge with n unimodal hypercomponents has the same complexity as
(a) Optimization result with Max-Mixture from the traditional breadth-first initialization, SSExy = 631554, SSEθ = 4.75323.
(b) Optimization result with Switchable Constraints, also from breadth-first initialization, SSExy = 1162181, SSEθ = 2.02238.
(c) Optimization result with RRR, also from breadth-first initialization, SSExy = 625717, SSEθ = 4.31798.
(d) Robust Gauss-Newton optimization result using the MoG and hyperedge components chosen by Prefilter, SSExy = 168.343, SSEθ = 0.00173.
Figure 9.1: One example graph of the data set of complexity class 7 and C(G) = 32, with a total of 16 multimodal MoG edges with two components and 16 hyperedges with two hypercomponents. Ground truth is shown in gray in the background.
a hyperedge with just one hypercomponent containing a MoG constraint with n modes. The metric hence captures the fact that hypercomponents and MoG components represent different alternative spatial relations between nodes in the graph, which can lead to a combinatorial explosion. Table 9.1 summarizes the generated complexity classes and the distribution and number of components in the MoG constraints and hyperedges. The overall percentage of non-simple edges is also shown. A total of 110 graphs were generated in 11 classes of increasing complexity, 10 per class. The first 8 classes only contain MoG constraints or hyperedges with two components, in varying numbers. In class 3, for example, the graphs contain two MoG constraints and one hyperedge, each with two components (X = 2 in the table). Classes 9 through 11 do not add more non-simple edges, but add more components to the existing MoG constraints and hyperedges. Instead of generating a completely new random graph for the more complex classes,
[Plot, two panels: final SSExy and SSEθ versus complexity condition 1 to 11, log-scale y axis; series: Max, Max-Mixture, Switchable Constraints, RRR, Prefilter, Optimization of Ground Truth.]
Figure 9.2: Final SSE metric relative to ground truth for each of the 11 complexity classes. The median and upper/lower quartiles are shown. Note the log scale on the y axis. The final SSE metric of the optimization result using the ground truth graph is also shown for comparison.
[Plot: runtime (s) versus complexity condition 1 to 11, log-scale y axis; series: Max, Max-Mixture, Switchable Constraints, RRR, Prefilter.]
Figure 9.3: Runtimes for each of the methods on all 11 complexity classes. The median and upper/lower quartiles are shown. Times were recorded on an Intel i7-3770 3.4GHz with 16GB RAM. Note the log scale on the y axis. Runtimes varied from 0.01s to 1.45s over all methods; the quartiles lie between 0.03s and 0.44s.
the already generated, less complex graphs were reused. Thus, a total of 10 base graphs were generated, consisting entirely of simple edges. For class 1, one hypercomponent was added to a random simple edge of the base graph. For class 2, one MoG component was added to a random simple edge of the same graph. For class 3, one MoG component was added to another random simple edge, and so on. For more complex classes, existing components on either hyperedges or MoG constraints were reused and additional components added where necessary. This way, the differences between graphs of increasing complexity are minimal, so the performance differences of the methods across classes are solely due to the additional components in the MoG constraints or hyperedges. Five methods were tested and compared. The state-of-the-art Max-Mixture [124], Switchable Constraints [168, 169], and RRR [100] methods were used as a comparison basis to evaluate the extended Prefilter method described above. Open source implementations of these methods were published by their respective authors and use the g2o library¹. The comparison between the methods is mostly one of initialization, as Max-Mixture is very sensitive to the initial condition, as also noted by the authors of [124]. The Switchable Constraints method is less susceptible, but still suffers significantly from bad initial conditions. RRR is designed to be independent of the initial configuration, but here fails to even maintain the connectedness of the graph. A fifth method, called Max, represents the traditional case where only the highest-weighted component (hypercomponent or MoG component) is chosen, i.e. j*, m* = argmax_{j,m} π_j π_m; it is used as a baseline for the other methods [140]. This approach models the case where an unreliable loop detection method and a unimodal registration method are used.
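The selection rule of the Max baseline, j*, m* = argmax π_j π_m, can be sketched in a few lines. The nested-list layout of the weights and the function name `max_component` are hypothetical choices for this illustration, not the interface of any of the compared implementations.

```python
def max_component(hyper_weights, mode_weights):
    """Select (j*, m*) = argmax over j, m of pi_j * pi_m.

    hyper_weights[j]: weight pi_j of hypercomponent j of a hyperedge.
    mode_weights[j][m]: MoG weight pi_m of mode m inside hypercomponent j.
    A plain MoG edge is the special case of one hypercomponent with pi_j = 1.
    """
    candidates = [
        (j, m) for j, modes in enumerate(mode_weights) for m in range(len(modes))
    ]
    return max(candidates, key=lambda jm: hyper_weights[jm[0]] * mode_weights[jm[0]][jm[1]])


# Hypothetical hyperedge: hypercomponent 0 is weighted higher, but its two
# modes split the weight, so the single dominant mode of hypercomponent 1 wins.
print(max_component([0.6, 0.4], [[0.5, 0.5], [0.9, 0.1]]))  # (1, 0)
```

The example shows why Max is fragile: it commits to a single component based on local weights alone, without consulting the rest of the graph.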
A traditional breadth-first initialization was performed before optimization with any of these methods. The result of Prefilter was used to select components of all hyperedges and MoG constraints, analogous to the approach described in [140], for optimization with a standard robust optimization method implemented in the g2o library [98]. The same solver, a Gauss-Newton method implemented in g2o, was used for all approaches other than RRR, so that their convergence and computational complexity can be compared fairly. For the Max, Max-Mixture, and Prefilter methods, a Cauchy robust kernel was used. Switchable Constraints implements explicit reweighting, so no additional implicit reweighting via a robust kernel was applied. The RRR implementation hardcodes its solver of choice, which was left as is. Figure 9.2 shows the median and upper and lower quartiles of the final SSExy and SSEθ errors [122] relative to Ground Truth for the ten sample pose graphs per complexity class after optimization with each of the investigated methods. Note the log scale of the y axis. As a reference for the results achievable with Ground Truth initialization and no outliers, the final SSE errors of the optimized Ground Truth base graphs are also included. Since this optimization did not use a robust kernel, it is surprising, but not impossible, that the Ground Truth optimization result has a higher final error than some of the other methods. Despite the high variance of all methods, Prefilter performs multiple orders of magnitude better than Max-Mixture, and around an order of magnitude better than Switchable Constraints, especially in the highly complex classes. Switchable Constraints is slightly more robust to graph complexity than Max-Mixture, though it also exhibits a very large variance in the results. Surprisingly, RRR fails on all graphs, and changing either of the exposed parameters (odometry and loop rate) has no effect. This happens because a large number of constraints are falsely rejected, which breaks the graph apart (see figure 9.1c). Figure 9.3 shows the runtimes of the compared methods. Again, note the log scale of the y axis. The runtime of the Max-Mixture method increases significantly with graph complexity. A similar trend is evident for the Switchable Constraints method; note its large median runtime. The Prefilter method only occasionally needs more computation time on very complex graphs, as indicated by its low median runtime even at high complexities. At these complexities (classes 8-11), however, the other methods no longer converge to satisfactory results at all. Thus, the occasionally longer runtimes of Prefilter in these classes, relative to the less complex ones, are well worth the significant gain in robustness over the other methods.
¹ Max-Mixture: https://github.com/agpratik/max-mixture, Switchable Constraints: http://openslam.org/vertigo.html, RRR: https://github.com/ylatif/rrr
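The Cauchy robust kernel used with the Max, Max-Mixture, and Prefilter methods replaces the quadratic cost of a squared error s by a logarithmic one. The following is a minimal sketch of the common textbook form ρ(s) = c² log(1 + s/c²) and its reweighting factor; the kernel width c is a hypothetical value, since the width used in the experiments is not stated here, and g2o's own Cauchy kernel class may differ in parameterization details.

```python
import math


def cauchy_rho(sq_error, c):
    """Cauchy robust cost of a squared error s: rho(s) = c^2 * log(1 + s / c^2)."""
    c2 = c * c
    return c2 * math.log1p(sq_error / c2)


def cauchy_weight(sq_error, c):
    """Derivative d rho / d s = 1 / (1 + s / c^2), i.e. the reweighting factor
    an iteratively reweighted least squares step applies to the residual."""
    return 1.0 / (1.0 + sq_error / (c * c))


c = 1.0  # hypothetical kernel width, not the value used in the experiments
for s in (0.1, 1.0, 100.0):
    print(f"s = {s:6.1f}  least squares = {s:6.1f}  "
          f"Cauchy = {cauchy_rho(s, c):.3f}  weight = {cauchy_weight(s, c):.3f}")
```

Small residuals are treated almost quadratically, while a residual with s = 100 receives a weight of about 1/101, which limits the damage a single wrong component can do during Gauss-Newton iterations.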
9.2 Real World Dataset: Bremen City Center
This real-world dataset consists of 13 scans that were recorded with a Riegl VZ-400 in the center of Bremen, Germany. Each point cloud consists of between 15 and 20 million points with reflectance information. The scanner was mounted on a tripod without a mobile base, thus no odometry information is available. However, markers were placed in the environment beforehand to allow for a comparison with the “gold standard” for geodetic applications, i.e. registration with artificial markers in the Riegl software, which requires additional manual assistance such as confirming or re-selecting correspondences. This marker-based registration can also be used to seed methods that need a good initial guess, e.g. ICP-based methods like 6D-SLAM [15]. Note that no initial guess (no marker-based registration, no motion estimates, no GPS, or anything similar) is used in the results presented here, other than for comparison. This dataset has been used for multimodal SLAM before, namely in the original paper presenting the Prefilter method [140]. The same multimodal plane matching method is used here, with one slight change: the absolute minimum overlap parameter Op was lowered again, to 0.045, allowing even more potential results to be considered. In the experiments performed in [140], loop closures were added to the map only after validating that one of the reported results actually was the correct one. Here, all scans are exhaustively matched with each other, resulting in a very dense graph. Two cases of registration results were treated slightly differently: registrations between two sequential scans were added to the graph as a regular edge, i.e. not a hyperedge, while registrations between a scan and all of its preceding scans except its immediate predecessor were added as a single hyperedge. Any registration result was allowed to be multimodal, also in the sequential case.
Table 9.2 shows a connectivity matrix between all pairs of scans. Note that the graph is almost completely connected because loop closures were generated exhaustively. However, there are only 23 edges in the graph: the 12 sequential MoG edges and the 11 nonsequential MoG hyperedges. For example, the loop closing MoG hyperedge from scan 7 to its predecessors 0 through 5 contains 6 hypercomponents with a total of 19 MoG components. This results in a graph complexity of C(G) = 36.95, which is large given the small size of the graph.
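The edge layout described above can be sketched as follows. The function name and data layout are hypothetical, and the sketch assumes that every scan pair yields a registration result; in the real graph, pairs without a result (missing entries in Table 9.2) would simply not appear as hypercomponents. The mode counts for scan 7 are read off row 7 of Table 9.2.

```python
import math


def build_exhaustive_layout(num_scans):
    """Hypothetical edge layout for exhaustively matched scans 0..num_scans-1.

    Assumes every scan pair yields a registration result. Sequential pairs
    (i-1, i) become regular (possibly multimodal) edges; all other predecessors
    of scan i form the hypercomponents of one hyperedge for scan i.
    """
    sequential = [(i - 1, i) for i in range(1, num_scans)]
    hyperedges = {i: list(range(i - 1)) for i in range(2, num_scans)}
    return sequential, hyperedges


seq, hyper = build_exhaustive_layout(13)
print(len(seq), len(hyper), len(seq) + len(hyper))  # 12 11 23
print(hyper[7])  # [0, 1, 2, 3, 4, 5]

# Mode counts of the scan-7 hyperedge, read off row 7 of Table 9.2.
modes_scan7 = {0: 4, 1: 1, 2: 1, 3: 2, 4: 4, 5: 7}
total = sum(modes_scan7.values())
print(total, round(math.log2(total), 2))  # 19 4.25
```

Under eq. (9.1), this single hyperedge alone contributes log2(19) ≈ 4.25 to the total complexity of C(G) = 36.95.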
# 0 1 2 3 4 5 6 7 8 9 10 11
1 3 -
2 4 1 -
3 3 1 1 -
4 2 2 1 1 -
5 5 1 1 1 1 -
6 3
3 1 -
7 4 1 1 2 4 7 1 -
8 1 1 1 1 4 9 1 -
9 1 2 2 2 2 1 2 1 -
10 1 1 1 2 1 2 2 1 1 -
11 1 1 1 1 1 2 1 4 2 1 1 -
12 1 1 4 2 1 1 1 2 3 1 1 1
Table 9.2: Connectivity matrix between all 13 scans, showing the number of components in the multimodal registration result per pair. A missing number in the upper triangle means that no registration result was found for that pair.
Method | SSExyz | SSEψφθ | runtime (s)
Max | 4.931·10^9 | 2.51842 | 0.01051
Max-Mixture | 4.582·10^9 | 2.50516 | 0.1632
SC | ∞ | 3.65766 | 0.0473
Prefilter | 75762.5 | 0.00006 | 0.00935
Table 9.3: SSE errors relative to the “gold standard” marker-based registration for each optimization method. SC stands for Switchable Constraints.
(a) Optimization result of Max method.
(b) Optimization result of Max-Mixture method.
(c) Optimization result of Prefilter method.
(d) Ground Truth by marker-based registration.
Figure 9.4: Orthographic view of the planar maps generated from the exhaustively matched Bremen City Center dataset after optimization.
Table 9.3 shows the final SSE errors relative to the marker-based ground truth for all tested optimization methods. The implementation of RRR by Latif et al. [100] currently does not support 3D pose graphs, so it was not evaluated on this dataset. The same solvers, robust kernels, and numbers of iterations were used here as in the experiments with synthetic graphs above. Switchable Constraints diverged to a result so far from the ground truth that the squared error overflowed the double precision floating point representation. This happened at a distance of around 10^19 meters from the ground truth. Interestingly, Max-Mixture does not improve significantly on the result of the Max method, even though it takes significantly longer. Clearly, the Prefilter method outperforms all others, both in the quality of the optimization result and in efficiency. Note that the mean square errors SSE in the table are in mm², so the final SSExyz of 75762.5 corresponds to a mean distance of 0.27m to each ground truth vertex pose. The mean distance achieved by the Max method is 70.22m, while Max-Mixture achieves a distance of 67.69m on average. Figures 9.4 and 9.5 show the actual maps computed by the different methods, with the ground truth shown for comparison. The changes in graph topology induced by the Prefilter method after rejecting incongruent MoG and hyperedge components are especially visible in figure 9.4, which shows the map from the top in an orthographic projection. Note also the large number of constraints used in the Max-Mixture method, as no outliers are rejected beforehand.
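The distances quoted above follow from taking the square root of the SSE values in Table 9.3 (mean squared errors in mm²) and converting millimeters to meters. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
import math


def rms_distance_m(sse_mm2):
    """Square root of a mean squared position error given in mm^2, in meters."""
    return math.sqrt(sse_mm2) / 1000.0


for name, sse in [("Max", 4.931e9), ("Max-Mixture", 4.582e9), ("Prefilter", 75762.5)]:
    print(f"{name}: {rms_distance_m(sse):.3f} m")
```

Up to rounding, this yields about 70.22m for Max, 67.69m for Max-Mixture, and a little over a quarter of a meter for Prefilter.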
(a) Optimization result of Max method.
(b) Optimization result of Max-Mixture method.
(c) Optimization result of Prefilter method.
(d) Ground Truth by marker-based registration.
Figure 9.5: Perspective view of the planar maps generated from the exhaustively matched Bremen City Center dataset after optimization.
Part IV
Conclusion
Chapter 10
Conclusions
10.1 Significant Contributions of this Thesis
This thesis presented several contributions to the state of the art in Simultaneous Localization and Mapping (SLAM). These contributions focus primarily on representing local and global ambiguity in Graph-based SLAM in a theoretically sound manner, and on offering approaches to solve SLAM problems that exhibit such ambiguities. Furthermore, the first method to generate multimodal Mixture of Gaussian (MoG) registration results from a plane-based registration method was presented, allowing the proposed solutions to ambiguity in SLAM to be applied to real-world data. An additional contribution facilitates the efficient use of Graph-based SLAM methods in multi-robot teams under communication constraints. Finally, novel uncertainty estimation techniques allow especially robust spectral registration methods to be used in Graph-based SLAM.
10.1.1 The Generalized Graph SLAM Framework
A formal description of robust Graph-based SLAM under local and global ambiguity, using multimodal constraint probability densities and hyperedges, was presented in section 3.2, to my knowledge for the first time in the literature. Local ambiguity is represented as Mixture of Gaussian (MoG) probability densities in the graph constraints. Global ambiguity is represented as a mixture of multiple candidate loop constraints and a null hypothesis, stored as a hyperedge. Generalized Graph SLAM serves as an umbrella framework for the state-of-the-art robust Graph-based SLAM methods currently described in the literature. Specifically, the representations used in the methods by Olson and Agarwal [124], Sünderhauf and Protzel [169], and Latif et al. [100] are shown to be special cases of this framework. Such a formal description allows for a theoretically founded comparison, discussion, and analysis of previously disjoint methods. It also allows the limitations of the discussed methods to be identified precisely, by checking which aspects of the general framework are neglected in the specific model used by each method.
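To make the two kinds of ambiguity concrete, the following sketch represents a locally ambiguous constraint as a weighted sum of Gaussians, and a globally ambiguous loop closure as a hyperedge: a weighted mixture of candidate constraints plus a null hypothesis. It is a one-dimensional simplification, not the data structures of section 3.2; the class names are hypothetical, scalar residuals stand in for full pose errors, and the null hypothesis is assumed to have a constant density (a stand-in for a broad, e.g. uniform, component).

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian1D:
    mean: float
    var: float

    def pdf(self, x: float) -> float:
        return math.exp(-0.5 * (x - self.mean) ** 2 / self.var) / math.sqrt(2.0 * math.pi * self.var)


@dataclass
class MoGConstraint:
    """Local ambiguity: one measured relation with several weighted Gaussian modes."""
    weights: List[float]
    components: List[Gaussian1D]

    def pdf(self, residual: float) -> float:
        return sum(w * c.pdf(residual) for w, c in zip(self.weights, self.components))


@dataclass
class Hyperedge:
    """Global ambiguity: candidate loop constraints plus a null hypothesis ("none is correct")."""
    candidate_weights: List[float]
    candidates: List[MoGConstraint]
    null_weight: float
    null_density: float  # assumption: constant density for the null hypothesis

    def pdf(self, residuals: List[float]) -> float:
        # residuals[i] is the error of candidate i; each candidate may link a different vertex pair
        mixture = sum(w * c.pdf(r) for w, c, r in zip(self.candidate_weights, self.candidates, residuals))
        return mixture + self.null_weight * self.null_density


# Two equally likely modes, e.g. caused by a symmetric structure in the sensor data (hypothetical values)
ambiguous = MoGConstraint([0.5, 0.5], [Gaussian1D(-1.0, 0.1), Gaussian1D(1.0, 0.1)])
loop = Hyperedge([0.4, 0.4], [ambiguous, ambiguous], null_weight=0.2, null_density=0.01)
print(ambiguous.pdf(1.0), ambiguous.pdf(0.0), loop.pdf([1.0, 3.0]))
```

The null term keeps the hyperedge density strictly positive even when every candidate has a large residual, which is what allows an optimizer to effectively reject all candidates of a wrong loop closure.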
10.1.2 Mixture of Gaussian Results from Plane-based Registration
Since the use of multimodal probability densities in SLAM is still in its infancy, no registration method specifically designed to produce multimodal Mixture of Gaussian (MoG) results existed yet. Section 8.1.1 describes how to generate such MoG results with a plane-based registration method, to my knowledge also the first of its kind. Several parameters of the original plane-based registration method could be relaxed significantly; this produces more results, but also increases the likelihood that the correct result is among them.
10.1.3 The Prefilter Method
The Prefilter method was presented in section 6.3.2 and shown to be the most robust of the evaluated methods for solving problems within the Generalized Graph SLAM framework. Several experiments with graphs containing multimodal constraint probability densities as well as hyperedges show the significant improvement achievable with the Prefilter method relative to current state-of-the-art methods. The method is inspired by the particle filter and uses a minimum spanning tree traversal of the graph that minimizes the number of mixture or hyperedge components encountered, keeping the number of combinations to track small. The estimate of global vertex poses generated by this method can be used either to select the mixture and hyperedge components that are most globally consistent, or as an initial estimate for gradient-based methods that take all components into account.
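The spanning tree idea can be illustrated with Kruskal's algorithm, weighting each edge by its number of mixture or hyperedge components (1 for an ordinary unimodal constraint). This is a simplified sketch under stated assumptions: hyperedges are flattened to pairwise edges carrying a component count, the function names are hypothetical, and the particle-filter-style tracking of component combinations along the traversal is omitted.

```python
from collections import defaultdict, deque
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (vertex_a, vertex_b, number_of_components); hypothetical flattening


def min_component_spanning_tree(num_vertices: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm with union-find, using the component count as edge weight."""
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree: List[Edge] = []
    for a, b, k in sorted(edges, key=lambda e: e[2]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two separate subtrees, so it cannot close a cycle
            parent[root_a] = root_b
            tree.append((a, b, k))
    return tree


def traversal_order(root: int, tree: List[Edge]) -> List[Edge]:
    """Breadth-first traversal of the tree, returning edges oriented away from the root."""
    adjacency = defaultdict(list)
    for a, b, k in tree:
        adjacency[a].append((b, k))
        adjacency[b].append((a, k))
    order: List[Edge] = []
    visited = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, k in adjacency[u]:
            if v not in visited:
                visited.add(v)
                order.append((u, v, k))
                queue.append(v)
    return order


# Odometry chain 0-1-2-3 (unimodal) plus an ambiguous loop closure 3-0 with 3 components
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 3)]
tree = min_component_spanning_tree(4, edges)
print(traversal_order(0, tree))
```

Because a minimum spanning tree depends only on the ordering of the edge weights, the tree that minimizes the sum of component counts also minimizes the sum of their logarithms, i.e. the product of the counts, which is the number of combinations a filter would have to track along the tree.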
10.1.4 The Trust-Region Newton Method
As a departure from the least-squares methods commonly used in SLAM, section 6.2 describes the classic Newton optimization method with a step-size limiting approach based on an adaptive trust region around the current estimate. This is a popular and very general non-linear optimization approach for problems where the Hessian is not guaranteed to be positive semi-definite, unlike the Hessian approximation used in least-squares problems. Therefore, this approach is also applicable to SLAM with multimodal constraints, whose multiple optima give rise to saddle points around which the Hessian is indefinite. To my knowledge, this exact approach has not been used in SLAM research before, though the Gauss-Newton and Levenberg-Marquardt methods used in state-of-the-art approaches can be viewed as special cases of the general Newton method.
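A compact illustration of a trust-region Newton iteration on a function with a saddle point follows. It is a sketch, not the implementation of section 6.2: the test function f(x, y) = (x² − 1)²/4 + y²/2 is a hypothetical stand-in for the negative log-likelihood of a bimodal constraint (minima at (±1, 0), a saddle at the origin), and the trust-region subproblem is solved only approximately. The full Newton step is taken when the Hessian is positive definite and the step fits, the Cauchy point otherwise, and a boundary step along the most negative curvature direction at a saddle. The radius-update thresholds (0.25, 0.75, eta = 0.1) are common textbook choices, assumed here.

```python
import numpy as np


def f(p: np.ndarray) -> float:
    # hypothetical double-well objective with a saddle point at the origin
    x, y = p
    return 0.25 * (x**2 - 1.0) ** 2 + 0.5 * y**2


def grad(p: np.ndarray) -> np.ndarray:
    x, y = p
    return np.array([x * (x**2 - 1.0), y])


def hess(p: np.ndarray) -> np.ndarray:
    x, _ = p
    return np.array([[3.0 * x**2 - 1.0, 0.0], [0.0, 1.0]])


def tr_step(g: np.ndarray, H: np.ndarray, radius: float) -> np.ndarray:
    """Approximate trust-region subproblem solution (Newton step, Cauchy point, or negative curvature)."""
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    gnorm = np.linalg.norm(g)
    if gnorm < 1e-12:
        # stationary point: escape a saddle along the most negative curvature direction
        return radius * eigvecs[:, 0] if eigvals[0] < 0 else np.zeros_like(g)
    if eigvals[0] > 0:
        p_newton = -np.linalg.solve(H, g)
        if np.linalg.norm(p_newton) <= radius:
            return p_newton
    gHg = g @ H @ g
    tau = 1.0 if gHg <= 0 else min(1.0, gnorm**3 / (radius * gHg))
    return -tau * radius / gnorm * g  # Cauchy point


def trust_region_newton(p0, radius=1.0, max_radius=10.0, eta=0.1, tol=1e-10, max_iter=100):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(p), hess(p)
        if np.linalg.norm(g) < tol and np.linalg.eigvalsh(H)[0] > 0:
            break  # local minimum reached
        s = tr_step(g, H, radius)
        predicted = -(g @ s + 0.5 * s @ H @ s)  # decrease predicted by the quadratic model
        actual = f(p) - f(p + s)
        rho = actual / predicted if predicted > 0 else -1.0
        if rho < 0.25:
            radius *= 0.25  # model was poor: shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(s), radius):
            radius = min(2.0 * radius, max_radius)  # model was good and the step hit the boundary
        if rho > eta:
            p = p + s  # accept the step
    return p


print(trust_region_newton([0.0, 0.0]), trust_region_newton([0.1, 2.0]))
```

Started exactly at the saddle, a pure Newton method would not move because the gradient is zero; the trust-region iteration instead steps along the negative curvature direction and reaches one of the two minima.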
10.1.5 Multi-Robot Pose Graph Transmission Methods
Multi-robot Graph-based SLAM is a practical and efficient choice, since the graph representation allows for easy integration of information from other robots and for deferred optimization. Section 5.2 shows a simple algorithm to prioritize sensor data transmissions between two cooperating robots. Since transmitting sensor data requires significant bandwidth, it is important to first send the data most likely to be of value to the other robot and to the shared map as a whole. This allows a shared trajectory estimate and graph map to be constructed efficiently and effectively on several cooperating robots, even under severe communication constraints.
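The prioritization idea can be sketched as a greedy scheduler over a priority queue. The sketch is hypothetical: it assumes that each pending sensor scan already carries an estimated utility for the shared map (how that utility is computed is the subject of section 5.2 and is not reproduced here), and it ranks scans by utility per byte, a standard greedy heuristic for filling a fixed bandwidth budget.

```python
import heapq
from typing import List, Tuple

Scan = Tuple[int, int, float]  # (scan_id, size_bytes > 0, estimated_utility); all hypothetical


def plan_transmissions(scans: List[Scan], budget_bytes: int) -> List[int]:
    """Greedily pick the scans with the highest estimated utility per byte that fit the budget."""
    # heapq is a min-heap, so the ratio is negated to pop the most valuable scan first
    heap = [(-utility / size, scan_id, size) for scan_id, size, utility in scans]
    heapq.heapify(heap)
    remaining = budget_bytes
    schedule: List[int] = []
    while heap:
        _, scan_id, size = heapq.heappop(heap)
        if size <= remaining:
            schedule.append(scan_id)
            remaining -= size
    return schedule


pending = [(0, 5000, 1.0), (1, 1000, 0.9), (2, 3000, 0.3)]
print(plan_transmissions(pending, budget_bytes=6000))
```

With a 6000-byte budget the small but valuable scan 1 goes first, then scan 0; scan 2 no longer fits. Ranking by utility per byte rather than by raw utility favours many cheap, informative transmissions when bandwidth is scarce.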
10.1.6 Uncertainty Analysis of Spectral Registration Methods
Spectral registration methods in 2D and 3D are especially robust to noise in the sensor data; however, no previous work had estimated uncertainty information in the form of covariance matrices around the estimated registration result. To use the full information obtainable from such methods, a simple yet effective approach to extract covariance matrices from them was presented in sections 4.2.2 and 4.4 for the 2D and 3D variants, respectively. The benefit of this approach is shown by estimating good maps from several datasets exhibiting significant noise.
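One plausible way to obtain a covariance from a spectral registration is to treat the phase-correlation surface around its peak as a discrete probability mass and compute its weighted covariance. The sketch below does this for 2D translation only; it is an illustration under stated assumptions, not necessarily the approach of sections 4.2.2 and 4.4, and the window size and relative threshold are hypothetical parameters.

```python
import numpy as np


def phase_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Normalized cross-power spectrum; its inverse FFT peaks at the shift of b relative to a."""
    cross = np.fft.fft2(b) * np.conj(np.fft.fft2(a))
    cross /= np.maximum(np.abs(cross), 1e-12)
    return np.real(np.fft.ifft2(cross))


def peak_shift_and_covariance(surface: np.ndarray, window: int = 5, rel_threshold: float = 0.1):
    """Weighted mean and covariance of the thresholded peak region (hypothetical parameters)."""
    n0, n1 = surface.shape
    i0, i1 = np.unravel_index(np.argmax(surface), surface.shape)
    offsets = np.arange(-window, window + 1)
    d0, d1 = np.meshgrid(offsets, offsets, indexing="ij")
    patch = surface[(i0 + d0) % n0, (i1 + d1) % n1]  # wrap around, the surface is periodic
    weights = np.where(patch >= rel_threshold * patch.max(), patch, 0.0)
    weights /= weights.sum()
    pts = np.stack([d0.ravel(), d1.ravel()], axis=1).astype(float)
    w = weights.ravel()
    mean = w @ pts
    centered = pts - mean
    cov = (centered * w[:, None]).T @ centered
    # convert the peak index to a signed integer shift, then add the sub-cell offset
    peak = np.array([i0 if i0 <= n0 // 2 else i0 - n0, i1 if i1 <= n1 // 2 else i1 - n1], dtype=float)
    return peak + mean, cov


rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
b = np.roll(a, shift=(3, -2), axis=(0, 1)) + 0.2 * rng.standard_normal((64, 64))
est_shift, cov = peak_shift_and_covariance(phase_correlation(a, b))
print(est_shift, cov)
```

If the peak occupies a single cell, the resulting covariance is (near) zero, so a practical implementation would add a small floor, for example the variance of a uniform distribution over one cell (1/12), before using it as constraint uncertainty.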
10.2 Summary of Answers to the Research Question
The research question stated in section 1.4 was: "How can a robot or a team of robots robustly estimate a map of an environment that is repetitive or otherwise leads to ambiguous loop detection (global ambiguity) and sensor data registration results (local ambiguity) efficiently and effectively?" This question gave rise to a number of issues that this thesis addressed. The major part of the research question, concerning SLAM under local and global ambiguity, is answered by the contributions highlighted in sections 10.1.1, 10.1.2, 10.1.3, and 10.1.4. Specifically, globally ambiguous loop detection results are represented as hyperedges, and locally ambiguous multimodal registration results as Mixture of Gaussian constraint probability densities. Such extended Pose Graphs can be optimized efficiently and effectively with the Prefilter or Trust-Region Newton methods. The multi-robot SLAM aspect is addressed by the multi-robot Pose Graph formalization in 1.2 and the multi-robot Pose Graph construction method highlighted in section 10.1.5. Here, the focus is on efficiently communicating the Pose Graph updates and sensor data necessary to construct inter-robot constraints in the graph effectively. The presented method prioritizes the data estimated to be most useful for the overall map quality, and is shown to construct a map equivalent to that obtained with full communication, even under severe communication constraints. Novel advances useful for SLAM methods in general, in the form of uncertainty estimates for spectral registration methods in 2D and 3D, are presented in sections 4.2.2 and 4.4.
Bibliography
[1] B. D. O. Anderson and J. B. Moore. Optimal filtering. Prentice-Hall, Englewood Cliffs, N.J., 1979.
[2] T.D. Barfoot, B. Stenning, P. Furgale, and C. McManus. Exploiting reusable paths in mobile robotics: Benefits and challenges for long-term autonomy. In Computer and Robot Vision (CRV), 2012 Ninth Conference on, pages 388–395, 2012. doi: 10.1109/CRV.2012.58.
[3] S. Barkby, S. Williams, O. Pizarro, and M. Jakuba. An efficient approach to bathymetric slam. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages 219–224, 2009.
[4] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (surf). Computer Vision and Image Understanding, 110(3):346–359, 2008.
[5] Ola Bengtsson and Albert-Jan Baerveldt. Location in changing environments – estimation of a covariance matrix for the idc algorithm. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 1931–1937, Oct. 2001.
[6] Paul J. Besl and Neil D. McKay. A method for registration of 3-d shapes. IEEE Trans. on Pattern Analysis and Machine Intelligence, 14(2):239–256, Feb 1992.
[7] A. Birk, B. Wiggerich, V. Unnithan, H. Bülow, M. Pfingsthorn, and S. Schwertfeger. Reconnaissance and camp security missions with an unmanned aerial vehicle (uav) at the 2009 european land robots trials (elrob). In IEEE International Workshop on Safety, Security and Rescue Robotics, SSRR, November 2009.
[8] Andreas Birk. A quantitative assessment of structural errors in grid maps. Autonomous Robots, 28:187–196, 2010.
[9] Andreas Birk and Stefano Carpin. Merging occupancy grid maps from multiple robots. IEEE Proceedings, special issue on Multi-Robot Systems, 94(7):1384–1397, 2006.
[10] Andreas Birk, Kaustubh Pathak, Jann Poppinga, Sören Schwertfeger, Max Pfingsthorn, and Heiko Bülow. The jacobs test arena for safety, security, and rescue robotics (ssrr). In WS on Performance Evaluation and Benchmarking for Intelligent Robots and Systems, Intern. Conf. on Intelligent Robots and Systems (IROS). IEEE Press, 2007.
[11] Andreas Birk, Kaustubh Pathak, Narunas Vaskevicius, Max Pfingsthorn, Jann Poppinga, and Soeren Schwertfeger. Surface representations for 3d mapping: A case for a paradigm shift. KI - Kuenstliche Intelligenz, 24(3):249–254, 2010.
[12] Andreas Birk, Burkhard Wiggerich, Heiko Bülow, Max Pfingsthorn, and Soeren Schwertfeger. Safety, security, and rescue missions with an unmanned aerial vehicle (uav): Aerial mosaicking and autonomous flight at the 2009 european land robots trials (elrob) and the 2010 response robot evaluation exercises (rree). Journal of Intelligent and Robotic Systems, 64(1):57–76, 2011.
[13] Andreas Birk, Max Pfingsthorn, and Heiko Bülow. Advances in underwater mapping and their application potential for safety, security, and rescue robotics (ssrr). In IEEE International Symposium on Safety, Security, Rescue Robotics (SSRR). IEEE Press, 2012.
[14] Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
[15] Dorit Borrmann, Jan Elseberg, Kai Lingemann, Andreas Nüchter, and Joachim Hertzberg. Globally consistent 3d mapping with scan matching. Robotics and Autonomous Systems, 56(2):130–142, 2008.
[16] Mary Ann Branch, Thomas F. Coleman, and Yuying Li. A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems. SIAM J. Sci. Comput., 21(1):1–23, 1999.
[17] Heiko Bülow and Andreas Birk. Fast and robust photomapping with an unmanned aerial vehicle (uav). In International Conference on Intelligent Robots and Systems (IROS). IEEE Press, 2009.
[18] Heiko Bülow and Andreas Birk. Spectral registration of noisy sonar data for underwater 3d mapping. Autonomous Robots, 30(3):307–331, 2011.
[19] Heiko Bülow, Andreas Birk, and Vikram Unnithan. Online generation of an underwater photo map with improved fourier mellin based registration. In IEEE OCEANS. IEEE Press, 2009.
[20] Heiko Bülow, Max Pfingsthorn, and Andreas Birk. Using robust spectral registration for scan matching of sonar range data. In 7th Symposium on Intelligent Autonomous Vehicles (IAV), IFAC. IFAC, 2010.
[21] W. Burgard, D. Fox, M. Moors, R. Simmons, and S. Thrun. Collaborative multirobot exploration. In Proceedings of the IEEE International Conference on Robotics and Automation. IEEE Press, 2000.
[22] W. Burgard, C. Stachniss, G. Grisetti, B. Steder, R. Kummerle, C. Dornhege, M. Ruhnke, A. Kleiner, and J.D. Tardos. A comparison of slam algorithms based on a graph of relations. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages 2089–2095, 2009.
[23] M. Caccia. Vision-based SLAM for ROVs: preliminary experimental results. In Proc. of 7th IFAC Conference on Manoeuvring and Control of Marine Craft, 2006.
[24] Stefano Carpin. Fast and accurate map merging for multi-robot systems. Autonomous Robots, 25(3):305–316, 2008. ISSN 0929-5593. doi: 10.1007/s10514-008-9097-4. URL http://dx.doi.org/10.1007/s10514-008-9097-4.
[25] Stefano Carpin and Andreas Birk. Stochastic map merging in rescue environments. In Daniele Nardi, Martin Riedmiller, and Claude Sammut, editors, RoboCup 2004: Robot Soccer World Cup VIII, volume 3276 of Lecture Notes in Artificial Intelligence (LNAI), pages 483ff. Springer, 2005.
[26] Stefano Carpin, Andreas Birk, and Victoras Jucikas. On map merging. International Journal of Robotics and Autonomous Systems, 53:1–14, 2005.
[27] M.A. Carreira-Perpinan. Mode-finding for mixtures of Gaussian distributions. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(11):1318–1323, Nov 2000.
[28] A. Censi, L. Iocchi, and G. Grisetti. Scan matching in the hough domain. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on, pages 2739–2744, April 2005.
[29] Andrea Censi. An accurate closed-form estimate of icp's covariance. In IEEE Int. Conf. on Robotics and Automation, pages 3167–3172, April 2007.
[30] H.J. Chang, C.S.G. Lee, Yung-Hsiang Lu, and Y.C. Hu. Simultaneous localization and mapping with environmental structure prediction. In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on, pages 4069–4074, 2006.
[31] H.J. Chang, C.S.G. Lee, Y.C. Hu, and Yung-Hsiang Lu. Multi-robot slam with topological/metric maps. In Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on, pages 1467–1472, 2007. doi: 10.1109/IROS.2007.4399142.
[32] H.J. Chang, C.S.G. Lee, Yung-Hsiang Lu, and Y.C. Hu. P-slam: Simultaneous localization and mapping with environmental-structure prediction. Robotics, IEEE Transactions on, 23(2):281–293, 2007.
[33] Q. Chen, M. Defrise, and F. Deconinck. Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16:1156–1168, 1994.
[34] Rong Chen and Jun S. Liu. Mixture kalman filters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(3):493–508, 2000. ISSN 1467-9868. doi: 10.1111/1467-9868.00246. URL http://dx.doi.org/10.1111/1467-9868.00246.
[35] Howie Choset, Kevin M. Lynch, Seth Hutchinson, George Kantor, Wolfram Burgard, Lydia E. Kavraki, and Sebastian Thrun. Principles of Robot Motion. MIT Press, 2005.
[36] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to algorithms. MIT Press, 2001.
[37] J. J. Craig. Introduction to robotics – Mechanics and control. Prentice Hall, 2005.
[38] Mark Cummins and Paul Newman. Fab-map: Probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research, 27(6):647–665, 2008.
[39] Mark Cummins and Paul Newman. Appearance-only slam at large scale with fab-map 2.0. The International Journal of Robotics Research, 30(9):1100–1123, 2011.
[40] G. Dedeoglu and G.S. Sukhatme. Landmark-based matching algorithm for cooperative mapping by autonomous robots. In Proceedings of the 5th International Symposium on Distributed Autonomous Robotic Systems (DARS). 2000.
[41] Frank Dellaert. Square root SAM. In Proceedings of Robotics: Science and Systems, Cambridge, USA, June 2005.
[42] M. W. M. Gamini Dissanayake, Paul Newman, Steven Clark, and Hugh F. Durrant-Whyte. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans. on Robotics and Automation, 17(3):229–241, June 2001.
[43] Arnaud Doucet, Nando de Freitas, Kevin P. Murphy, and Stuart J. Russell. Rao-blackwellised particle filtering for dynamic bayesian networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pages 176–183. Morgan Kaufmann Publishers Inc., 2000.
[44] P. Elinas, R. Sim, and J.J. Little. σSLAM: stereo vision slam using the rao-blackwellised particle filter and a novel mixture proposal distribution. In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on, pages 1564–1570, 2006.
[45] J. Elseberg, D. Borrmann, and A. Nüchter. Efficient processing of large 3d point clouds. In Information, Communication and Automation Technologies (ICAT), 2011 XXIII International Symposium on, pages 1–7, Oct. 2011. doi: 10.1109/ICAT.2011.6102102.
[46] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard. An evaluation of the rgb-d slam system. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 1691–1696, 2012. doi: 10.1109/ICRA.2012.6225199.
[47] Evologics GmbH. http://www.evologics.de, 2010.
[48] N. Fairfield, G. Kantor, and D. Wettergreen. Towards particle filter slam with three dimensional evidence grids in a flooded subterranean environment. In Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA), pages 3575–3580, 2006.
[49] Nathaniel Fairfield, George A. Kantor, and David Wettergreen. Real-time slam with octree evidence grids for exploration in underwater tunnels. Journal of Field Robotics, 24(1-2):3–21, 2007.
[50] Maurice F. Fallon, Hordur Johannsson, Michael Kaess, John Folkesson, Hunter McClelland, Brendan J. Englot, Franz S. Hover, and John J. Leonard. Simultaneous localization and mapping in marine environments. In Mae L. Seto, editor, Marine Robot Autonomy, pages 329–372. Springer New York, 2013. ISBN 978-1-4614-5658-2. doi: 10.1007/978-1-4614-5659-9_8. URL http://dx.doi.org/10.1007/978-1-4614-5659-9_8.
[51] M.F. Fallon, H. Johannsson, J. Brookshire, S. Teller, and J.J. Leonard. Sensor fusion for flexible human-portable building-scale mapping. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 4405–4412, Oct. 2012. doi: 10.1109/IROS.2012.6385882.
[52] William Feller. The fundamental limit theorems in probability. Bulletin of the American Mathematical Society, 51:800–832, 1945.
[53] J.W. Fenwick, P.M. Newman, and J.J. Leonard. Cooperative concurrent mapping and localization. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation, ICRA. IEEE Computer Society Press, 2002.
[54] F. Ferreira, G. Veruggio, M. Caccia, and G. Bruzzone. An online slam-based mosaicking using local maps for rovs. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 1058–1063, 2011. doi: 10.1109/ICRA.2011.5980521.
[55] Martin A. Fischler and Robert C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Graphics and Image Processing, 24(6):381–395, 1981.
[56] D. Fox, W. Burgard, H. Kruppa, and S. Thrun. A probabilistic approach to collaborative multi-robot localization. Autonomous Robots, Special Issue on Heterogeneous Multi-Robot Systems, 8(3):325–344, 2000.
[57] D. Fox, J. Ko, K. Konolige, B. Limketkai, D. Schulz, and B. Stewart. Distributed multirobot exploration and mapping. Proceedings of the IEEE, 94(7):1325–1339, July 2006. ISSN 0018-9219.
[58] F. Fraundorfer, L. Heng, D. Honegger, G.H. Lee, L. Meier, P. Tanskanen, and M. Pollefeys. Vision-based autonomous mapping and exploration using a quadrotor mav. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 4557–4564, Oct. 2012. doi: 10.1109/IROS.2012.6385934.
[59] U. Frese, P. Larsson, and T. Duckett. A multilevel relaxation algorithm for simultaneous localization and mapping. Robotics, IEEE Transactions on, 21(2):196–207, April 2005. ISSN 1552-3098. doi: 10.1109/TRO.2004.839220.
[60] Udo Frese. A discussion of simultaneous localization and mapping. Autonomous Robots, 20:25–42, 2006.
[61] Franck Gerossier, Paul Checchin, Christophe Blanc, Roland Chapuis, and Laurent Trassoudaine. Trajectory-oriented ekf-slam using the Fourier-Mellin transform applied to microwave radar images. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages 4925–4930, Oct. 2009. doi: 10.1109/IROS.2009.5354548.
[62] A. Glover, W. Maddern, M. Warren, S. Reid, M. Milford, and Gordon Wyeth. Openfabmap: An open source toolbox for appearance-based loop closure detection. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 4730–4735, 2012. doi: 10.1109/ICRA.2012.6224843.
[63] M. Golfarelli, D. Maio, and S. Rizzi. Correction of dead-reckoning errors in map building for mobile robots. Robotics and Automation, IEEE Transactions on, 17(1):37–47, Feb 2001. ISSN 1042-296X. doi: 10.1109/70.917081.
[64] N.J. Gordon, D.J. Salmond, and A.F.M. Smith. Novel approach to nonlinear/non-gaussian bayesian state estimation. Radar and Signal Processing, IEE Proceedings F, 140(2):107–113, 1993. ISSN 0956-375X.
[65] G. Grisetti, C. Stachniss, and W. Burgard. Improving grid-based slam with rao-blackwellized particle filters by adaptive proposals and selective resampling. In IEEE Int. Conf. Robotics and Automation, pages 2443–2448, Barcelona, Spain, 2005.
[66] G. Grisetti, S. Grzonka, C. Stachniss, P. Pfaff, and W. Burgard. Efficient estimation of accurate maximum likelihood maps in 3d. In Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on, pages 3472–3478, Oct. 29–Nov. 2, 2007.
[67] G. Grisetti, C. Stachniss, and W. Burgard. Improved techniques for grid mapping with rao-blackwellized particle filters. Robotics, IEEE Transactions on, 23(1):34–46, 2007.
[68] G. Grisetti, C. Stachniss, S. Grzonka, and W. Burgard. A tree parameterization for efficiently computing maximum likelihood maps using gradient descent. In Proceedings of Robotics: Science and Systems, Atlanta, GA, USA, June 2007.
[69] G. Grisetti, D.L. Rizzini, C. Stachniss, E. Olson, and W. Burgard. Online constraint network optimization for efficient maximum likelihood map learning. In Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on, pages 1880–1885, May 2008.
[70] G. Grisetti, R. Kümmerle, C. Stachniss, and W. Burgard. A tutorial on graph-based slam. Intelligent Transportation Systems Magazine, IEEE, 2(4):31–43, 2010. ISSN 1939-1390. doi: 10.1109/MITS.2010.939925.
[71] Giorgio Grisetti, Cyrill Stachniss, and Wolfram Burgard. Improving grid-based slam with rao-blackwellized particle filters by adaptive proposals and selective resampling. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA, 2005.
[72] Giorgio Grisetti, Cyrill Stachniss, and Wolfram Burgard. Improving grid-based slam with rao-blackwellized particle filters by adaptive proposals and selective resampling. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA, 2005.
[73] Giorgio Grisetti, Cyrill Stachniss, Slawomir Grzonka, and Wolfram Burgard. A tree parameterization for efficiently computing maximum likelihood maps using gradient descent. In Robotics: Science and Systems (RSS), 2007.
[74] Giorgio Grisetti, Rainer Kümmerle, Cyrill Stachniss, Udo Frese, and Christoph Hertzberg. Hierarchical optimization on manifolds for online 2d and 3d mapping. In Robotics and Automation, 2010. ICRA '10. IEEE International Conference on, pages 273–278, 2010.
[75] Wesley H. Huang and Kristopher R. Beevers. Topological map merging. In Proceedings of the 7th International Symposium on Distributed Autonomous Robotic Systems (DARS). 2004.
[76] D. Hahnel, W. Burgard, D. Fox, and S. Thrun. An efficient fastslam algorithm for generating maps of large-scale cyclic environments from raw laser range measurements. In Intelligent Robots and Systems, 2003. (IROS 2003). Proceedings. 2003 IEEE/RSJ International Conference on, volume 1, pages 206–211, 2003.
[77] Frank Hanson and Stojan Radic. High bandwidth underwater optical communication. Applied Optics, 47(2):277–283, 2008.
[78] Emili Hernandez, Pere Ridao, David Ribas, and Angelos Mallios. Probabilistic sonar scan matching for an auv. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages 255–260, Oct. 2009. doi: 10.1109/IROS.2009.5354656.
[79] Christoph Hertzberg. A framework for sparse, non-linear least squares problems on manifolds. Master's thesis, University of Bremen, 2008.
[80] Paul W. Holland and Roy E. Welsch. Robust regression using iteratively reweighted least-squares. Communications in Statistics - Theory and Methods, 6:813–827, 1977. URL http://www.informaworld.com/10.1080/03610927708827533.
[81] Armin Hornung, Kai M. Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard. OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Autonomous Robots, 2013. doi: 10.1007/s10514-012-9321-0. URL http://octomap.github.com. Software available at http://octomap.github.com.
[82] A. Howard. Multi-robot mapping using manifold representations. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 4198–4203. 2004.
[83] Andrew Howard and Nicholas Roy. The robotics data set repository (radish). http://radish.sourceforge.net/, 2003.
[84] Andrew Howard, Lynne E. Parker, and Gaurav S. Sukhatme. The sdr experience: Experiments with a large-scale heterogeneous mobile robot team. In Marcelo H. Ang Jr. and Oussama Khatib, editors, Experimental Robotics IX, volume 21 of Springer Tracts in Advanced Robotics, pages 121–130. Springer Berlin Heidelberg, 2006. ISBN 978-3-540-28816-9. doi: 10.1007/11552246_12. URL http://dx.doi.org/10.1007/11552246_12.
[85] Marco Huber, Tim Bailey, Hugh Durrant-Whyte, and Uwe Hanebeck. On entropy approximation for gaussian mixture random vectors. In IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems, 2008. URL http://www-personal.acfr.usyd.edu.au/tbailey/publications/gmmentropybounds.htm.
[86] Peter J. Huber. Robust regression: Asymptotics, conjectures and monte carlo. The Annals of Statistics, 1(5):799–821, 1973. ISSN 00905364. URL http://www.jstor.org/stable/2958283.
[87] Peter J. Huber and Elvezio M. Ronchetti. Robust Statistics. John Wiley & Sons, Inc., 2nd edition, March 2009. ISBN 978-0-470-12990-6.
[88] S. Jaruwatanadilok. Underwater wireless optical communication channel modeling and performance evaluation using vector radiative transfer theory. Selected Areas in Communications, IEEE Journal on, 26(9):1620–1627, 2008.
[89] R.E. Kalman. A new approach to linear filtering and prediction problems. Transactions of ASME. Journal of Basic Engineering, 83, 1960.
[90] Been Kim, M. Kaess, L. Fletcher, J. Leonard, A. Bachrach, N. Roy, and S. Teller. Multiple relative pose graphs for robust cooperative mapping. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 3185–3192, 2010. doi: 10.1109/ROBOT.2010.5509154.
[91] A. Kleiner, J. Prediger, and B. Nebel. Rfid technology-based exploration and slam for search and rescue. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 4054–4059, 2006. doi: 10.1109/IROS.2006.281867.
[92] J. Ko, B. Stewart, D. Fox, K. Konolige, and B. Limketkai. A practical, decision-theoretic approach to multi-robot mapping and exploration. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2003.
[93] A. Koenig, J. Kessler, and H.-M. Gross. A graph matching technique for an appearance-based, visual slam-approach using rao-blackwellized particle filters. In Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on, pages 1576–1581, 2008.
[94] A. Kolling, A. Kleiner, M. Lewis, and K. Sycara. Pursuit-evasion in 2.5d based on team-visibility. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages 4610–4616, Oct. 2010. doi: 10.1109/IROS.2010.5649270.
[95] K. Konolige, D. Fox, B. Limketkai, J. Ko, and B. Steward. Map merging for distributed robot navigation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 212–217. 2003.
[96] Kurt Konolige, Giorgio Grisetti, Rainer Kümmerle, Wolfram Burgard, Benson Limketkai, and Regis Vincent. Efficient sparse pose adjustment for 2d mapping. In Intelligent Robots and Systems, 2010. IROS 2010. IEEE/RSJ International Conference on. In Press, 2010.
[97] Rainer Kuemmerle, Bastian Steder, Christian Dornhege, Michael Ruhnke, Giorgio Grisetti, Cyrill Stachniss, and Alexander Kleiner. On measuring the accuracy of slam algorithms. Autonomous Robots, 27(4):387–407, 2009.
[98] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. G2o: A general framework for graph optimization. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 3607–3613, May 2011. doi: 10.1109/ICRA.2011.5979949.
[99] Y. Latif, C. Cadena, and J. Neira. Realizing, reversing, recovering: Incremental robust loop closing over time using the irrr algorithm. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 4211–4217, Oct. 2012. doi: 10.1109/IROS.2012.6385879.
[100] Yasir Latif, Cesar Cadena Lerma, and Jose Neira. Robust loop closing over time. In Proceedings of Robotics: Science and Systems, Sydney, Australia, July 2012.
[101] R. Leishman, J. Macdonald, T. McLain, and R. Beard. Relative navigation and control of a hexacopter. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 4937–4942, May 2012. doi: 10.1109/ICRA.2012.6224983.
[102] Jongwoo Lim, Jan-Michael Frahm, and Marc Pollefeys. Online environment mapping using metric-topological maps. The International Journal of Robotics Research, 31(12):1394–1408, 2012. doi: 10.1177/0278364912461455. URL http://ijr.sagepub.com/content/31/12/1394.abstract.
[103] LinkQuest Inc. http://www.link-quest.com, 1999-2010.
[104] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[105] F. Lu and E. Milios. Globally consistent range scan alignment for environment mapping. Autonomous Robots, 4(4):333–349, 1997. ISSN 0929-5593. doi: 10.1023/A:1008854305733.
[106] B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings DARPA Image Understanding Workshop, pages 121–130, 1981.
[107] A. Mallios, P. Ridao, E. Hernandez, D. Ribas, F. Maurelli, and Y. Petillot. Pose-based slam with probabilistic scan matching algorithm using a mechanical scanned imaging sonar. In OCEANS 2009-EUROPE, 2009. OCEANS '09., pages 1–6, May 2009. doi: 10.1109/OCEANSE.2009.5278219.
[108] R.L. Marks, S.M. Rock, and M.J. Lee. Real-time video mosaicking of the ocean floor. Oceanic Engineering, IEEE Journal of, 20(3):229–241, Jul 1995.
[109] Tim K. Marks, Andrew Howard, Max Bajracharya, Garrison W. Cottrell, and Larry H. Matthies. Gamma-slam: Visual slam in unstructured environments using variance grid maps. Journal of Field Robotics, 26(1):26–51, 2009.
[110] J. McDonald, M. Kaess, C. Cadena, J. Neira, and J.J. Leonard. Real-time 6-dof multi-session visual SLAM over large-scale environments. Robotics and Autonomous Systems, (0):–, 2012. ISSN 0921-8890. doi: 10.1016/j.robot.2012.08.008. URL http://www.sciencedirect.com/science/article/pii/S0921889012001406.
[111] M. Montemerlo and S. Thrun. Simultaneous localization and mapping with unknown data association using FastSLAM. In Robotics and Automation, 2003. Proceedings. ICRA '03. IEEE International Conference on, volume 2, pages 1985–1991, 2003.
[112] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI National Conference on Artificial Intelligence, Edmonton, Canada, 2002. AAAI.
[113] Kevin Murphy. Bayesian map learning in dynamic environments. In Advances in Neural Information Processing Systems (NIPS), pages 1015–1021, 1999.
[114] J. Neira and J. D. Tardos. Data association in stochastic mapping using the joint compatibility test. Robotics and Automation, IEEE Transactions on, 17(6):890–897, 2001.
[115] Eric Nettleton, Sebastian Thrun, Hugh Durrant-Whyte, and Salah Sukkarieh. Decentralised SLAM with low-bandwidth communication for teams of vehicles. In Shin'ichi Yuta, Hajime Asama, Erwin Prassler, Takashi Tsubouchi, and Sebastian Thrun, editors, Field and Service Robotics, volume 24 of Springer Tracts in Advanced Robotics, pages 179–188. Springer Berlin Heidelberg, 2006. ISBN 978-3-540-32801-8. doi: 10.1007/10991459_18. URL http://dx.doi.org/10.1007/10991459_18.
[116] Yashodhan Nevatia, Todor Stoyanov, Ravi Rathnam, Max Pfingsthorn, Stefan Markov, Rares Ambrus, and Andreas Birk. Augmented autonomy: Improving human-robot team performance in urban search and rescue. In International Conference on Intelligent Robots and Systems (IROS). IEEE Press, 2008.
[117] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag New York, Inc., first edition, 1999.
[118] E. Olson, J. Leonard, and S. Teller. Fast iterative alignment of pose graphs with poor initial estimates. In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on, pages 2262–2269, May 2006. ISSN 1050-4729. doi: 10.1109/ROBOT.2006.1642040.
[119] E. Olson, J. Leonard, and S. Teller. Fast iterative alignment of pose graphs with poor initial estimates. In J. Leonard, editor, Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on, pages 2262–2269, 2006.
[120] E. Olson, J. Leonard, and S. Teller. Spatially-Adaptive Learning Rates for Online Incremental SLAM. In Proceedings of Robotics: Science and Systems, Atlanta, GA, USA, June 2007.
[121] E. B. Olson. Real-time correlative scan matching. In Robotics and Automation, 2009. ICRA '09. IEEE International Conference on, pages 4387–4393, 2009.
[122] Edwin Olson. Robust and Efficient Robotic Mapping. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, June 2008.
[123] Edwin Olson. Recognizing places using spectrally clustered local matches. Robotics and Autonomous Systems, Dec. 2009.
[124] Edwin Olson and Pratik Agarwal. Inference on networks of mixtures for robust robot mapping. In Proceedings of Robotics: Science and Systems, Sydney, Australia, July 2012.
[125] Edwin Olson, Matthew Walter, John Leonard, and Seth Teller. Single cluster graph partitioning for robotics applications. In Proceedings of Robotics: Science and Systems, pages 265–272, 2005.
[126] L. E. Parker. Current state of the art in distributed autonomous mobile robots. In L. E. Parker, G. Bekey, and J. Barhen, editors, Distributed Autonomous Robotic Systems 4, pages 3–12. Springer, 2000.
[127] K. Pathak, N. Vaskevicius, J. Poppinga, M. Pfingsthorn, S. Schwertfeger, and A. Birk. Fast 3D mapping by matching planes extracted from range sensor point clouds. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, 2009.
[128] K. Pathak, M. Pfingsthorn, H. Bülow, and A. Birk. Robust Estimation of Camera-Tilt for iFMI based Underwater Photo-Mapping using a Calibrated Monocular Camera. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, May 2013.
[129] Kaustubh Pathak, Andreas Birk, Jann Poppinga, and Sören Schwertfeger. 3D forward sensor modeling and application to occupancy grid based sensor fusion. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, San Diego, Nov. 2007.
[130] Kaustubh Pathak, Max Pfingsthorn, Narunas Vaskevicius, and Andreas Birk. Relaxing loop-closing errors in 3D maps based on planar surface patches. In International Conference on Advanced Robotics, Munich, Germany, June 2009.
[131] Kaustubh Pathak, Andreas Birk, and Narunas Vaskevicius. Plane-based registration of sonar data for underwater 3D mapping. In International Conference on Intelligent Robots and Systems (IROS), pages 4880–4885, 2010.
[132] Kaustubh Pathak, Andreas Birk, Narunas Vaskevicius, Max Pfingsthorn, Soeren Schwertfeger, and Jann Poppinga. Online 3D SLAM by registration of large planar surface segments and closed form pose-graph relaxation. Journal of Field Robotics, Special Issue on 3D Mapping, 27(1):52–84, 2010.
[133] Kaustubh Pathak, Andreas Birk, Narunas Vaskevicius, and Jann Poppinga. Fast Registration Based on Noisy Planes with Unknown Correspondences for 3D Mapping. IEEE Transactions on Robotics, 26(2):1–18, March 2010. doi: 10.1109/TRO.2010.2042989.
[134] Kaustubh Pathak, Dorit Borrmann, Jan Elseberg, Narunas Vaskevicius, Andreas Birk, and Andreas Nüchter. Evaluation of the robustness of planar-patches based 3D-registration using marker-based ground-truth in an outdoor urban scenario. In International Conference on Intelligent Robots and Systems (IROS), pages 5725–5730, 2010.
[135] K. B. Petersen and M. S. Pedersen. The matrix cookbook, Oct. 2008. URL http://www2.imm.dtu.dk/pubdb/p.php?3274. Version 20081110.
[136] M. Pfingsthorn, A. Birk, and H. Bülow. An efficient strategy for data exchange in multi-robot mapping under underwater communication constraints. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages 4886–4893, Oct. 2010. doi: 10.1109/IROS.2010.5650270.
[137] M. Pfingsthorn, H. Bülow, A. Birk, F. Ferreira, G. Veruggio, M. Caccia, and G. Bruzzone. Large-Scale Mosaicking with Spectral Registration based Simultaneous Localization and Mapping (iFMI-SLAM) in the Ligurian Sea. In OCEANS 2013 Bergen, June 2013.
[138] M. Pfingsthorn, H. Bülow, Igor Sokolovski, and A. Birk. Underwater Stereo Data Acquisition and 3D Registration with a Spectral Method. In OCEANS 2013 Bergen, June 2013.
[139] Max Pfingsthorn and Andreas Birk. Efficiently communicating map updates with the pose graph. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2008.
[140] Max Pfingsthorn and Andreas Birk. Simultaneous Localization and Mapping with Multimodal Probability Distributions. The International Journal of Robotics Research, 32(2):143–171, 2013. doi: 10.1177/0278364912461540. URL http://ijr.sagepub.com/content/32/2/143.abstract.
[141] Max Pfingsthorn and Andreas Birk. Handling local and global ambiguities via a generalized graph SLAM framework based on multimodal and hyperedge constraints. In Proceedings of the 1st Workshop on Robust and Multimodal Inference in Factor Graphs at ICRA 2013, May 2013.
[142] Max Pfingsthorn and Andreas Birk. Representing and Solving Local and Global Ambiguities as Multimodal and Hyperedge Constraints in a Generalized Graph SLAM Framework. In Robotics and Automation, 2014. Proceedings. ICRA '14. IEEE International Conference on, 2014.
[143] Max Pfingsthorn, Yashodhan Nevatia, Todor Stoyanov, Ravi Rathnam, Stefan Markov, and Andreas Birk. Towards collaborative and decentralized mapping in the Jacobs Virtual Rescue Team. In L. Iocchi, H. Matsubara, A. Weitzenfeld, and C. Zhou, editors, RoboCup 2008: Robot Soccer World Cup XII. Springer Verlag, Berlin, 2008.
[144] Max Pfingsthorn, Andreas Birk, Sören Schwertfeger, Heiko Bülow, and Kaustubh Pathak. Maximum likelihood mapping with spectral image registration. In Robotics and Automation, 2010. ICRA 2010. Proceedings of the 2010 IEEE International Conference on, 2010.
[145] Max Pfingsthorn, Andreas Birk, and Narunas Vaskevicius. Semantic annotation of ground and vegetation types in 3D maps for autonomous underwater vehicle operation. In IEEE Oceans, 2011.
[146] Max Pfingsthorn, Andreas Birk, Narunas Vaskevicius, and Kaustubh Pathak. Cooperative 3D mapping under underwater communication constraints. In IEEE Oceans, 2011.
[147] Max Pfingsthorn, Andreas Birk, and Heiko Bülow. Uncertainty estimation for a 6-DOF spectral registration method as basis for sonar-based underwater 3D SLAM. In Robotics and Automation, 2012. Proceedings. ICRA '12. IEEE International Conference on. IEEE Press, 2012.
[148] S. T. Pfister, K. L. Kriechbaum, S. I. Roumeliotis, and J. W. Burdick. Weighted range sensor matching algorithms for mobile robot displacement estimation. In Robotics and Automation, 2002. Proceedings. ICRA '02. IEEE International Conference on, volume 2, pages 1667–1674, 2002.
[149] Jann Poppinga, Max Pfingsthorn, Soeren Schwertfeger, Kaustubh Pathak, and Andreas Birk. Optimized octtree datastructure and access methods for 3D mapping. In IEEE Safety, Security, and Rescue Robotics (SSRR). IEEE Press, 2007.
[150] Jann Poppinga, Andreas Birk, Kaustubh Pathak, and Narunas Vaskevicius. Fast 6-DOF path planning for autonomous underwater vehicles (AUV) based on 3D plane mapping. In IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages 1–6. IEEE Press, 2011.
[151] Ravi Rathnam, Max Pfingsthorn, and Andreas Birk. Incorporating large scale SSRR scenarios into the high fidelity simulator USARSim. In IEEE International Workshop on Safety, Security, and Rescue Robotics (SSRR), pages 1–6. IEEE Press, 2009.
[152] R. Reid and T. Bräunl. Large-scale multi-robot mapping in MAGIC 2010. In Robotics, Automation and Mechatronics (RAM), 2011 IEEE Conference on, pages 239–244, 2011. doi: 10.1109/RAMECH.2011.6070489.
[153] David Ribas, Pere Ridao, Jose Neira, and Juan D. Tardos. SLAM using an imaging sonar for partially structured underwater environments. In Pere Ridao, editor, Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 5040–5045, 2006.
[154] David Ribas, Pere Ridao, Juan Domingo Tardós, and José Neira. Underwater SLAM in man-made structured environments. J. Field Robot., 25(11-12):898–921, 2008. ISSN 1556-4959. doi: 10.1002/rob.v25:11/12.
[155] H. Riksfjord, O. T. Haug, and J. M. Hovem. Underwater acoustic networks - survey on communication challenges with transmission simulations. In Sensor Technologies and Applications, 2009. SENSORCOMM '09. Third International Conference on, pages 300–305, 2009.
[156] Martijn N. Rooker and Andreas Birk. Multi-robot exploration under the constraints of wireless networking. Control Engineering Practice, 15(4):435–445, 2007.
[157] Peter J. Rousseeuw and Annick M. Leroy. Robust Regression and Outlier Detection. John Wiley & Sons, Inc., 2005. ISBN 9780471725381. doi: 10.1002/0471725382.fmatter. URL http://dx.doi.org/10.1002/0471725382.fmatter.
[158] N. Roy and G. Dudek. Collaborative exploration and rendezvous: Algorithms, performance bounds and observations. Autonomous Robots, 11, 2001.
[159] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall International, 1995.
[160] Vytenis Sakenas, Olegas Kosuchinas, Max Pfingsthorn, and Andreas Birk. Extraction of semantic floor plans from 3D point cloud maps. In International Workshop on Safety, Security, and Rescue Robotics (SSRR). IEEE Press, 2007.
[161] C. Schroeter and H.-M. Gross. A sensor-independent approach to RBPF SLAM - Map Match SLAM applied to visual mapping. In Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on, pages 2078–2083, 2008.
[162] R. G. Simmons, D. Apfelbaum, W. Burgard, D. Fox, M. Moors, S. Thrun, and H. L. S. Younes. Coordination for multi-robot exploration and mapping. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 852–858, 2000.
[163] Randall Smith, Matthew Self, and Peter Cheeseman. Estimating uncertain spatial relationships in robotics. In I. J. Cox and G. T. Wilfong, editors, Autonomous Robot Vehicles, pages 167–193, New York, 1990. Springer-Verlag.
[164] E. M. Sozer, M. Stojanovic, and J. G. Proakis. Underwater acoustic networks. Oceanic Engineering, IEEE Journal of, 25(1):72–83, 2000.
[165] C. Stachniss, G. Grisetti, and W. Burgard. Recovering particle diversity in a Rao-Blackwellized particle filter for SLAM after actively closing loops. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on, pages 655–660, 2005.
[166] C. Stachniss, G. Grisetti, W. Burgard, and N. Roy. Analyzing Gaussian Proposal Distributions for Mapping with Rao-Blackwellized Particle Filters. In Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on, pages 3485–3490, Nov. 2007. doi: 10.1109/IROS.2007.4399005.
[167] M. Stojanovic. Recent advances in high-speed underwater acoustic communications. Oceanic Engineering, IEEE Journal of, 21(2):125–136, 1996.
[168] N. Sünderhauf and P. Protzel. Switchable constraints for robust pose graph SLAM. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 1879–1884, Oct. 2012. doi: 10.1109/IROS.2012.6385590.
[169] N. Sünderhauf and P. Protzel. Towards a robust back-end for pose graph SLAM. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 1254–1261, May 2012. doi: 10.1109/ICRA.2012.6224709.
[170] S. Thrun. A probabilistic online mapping algorithm for teams of mobile robots. International Journal of Robotics Research, 20(5):335–363, 2001.
[171] S. Thrun. Learning occupancy grids with forward sensor models. Autonomous Robots, 15:111–127, 2003.
[172] S. Thrun, W. Burgard, and D. Fox. A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 321–328, 2000.
[173] Sebastian Thrun and Yufeng Liu. Multi-robot SLAM with sparse extended information filters. In Paolo Dario and Raja Chatila, editors, The Eleventh International Symposium on Robotics Research (ISRR), volume 15, pages 254–266. Springer, 2005.
[174] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. The MIT Press, Cambridge, MA, 2005.
[175] G. D. Tipaldi and K. O. Arras. FLIRT - Interest regions for 2D range data. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 3616–3622, May 2010. doi: 10.1109/ROBOT.2010.5509864.
[176] D. Titterington, A. Smith, and U. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons, 1985.
[177] M. Tomono. Monocular SLAM using a Rao-Blackwellised particle filter with exhaustive pose space search. In Robotics and Automation, 2007 IEEE International Conference on, pages 2421–2426, 2007.
[178] Ioana Varsadan, Andreas Birk, and Max Pfingsthorn. Determining map quality through an image similarity metric. In Luca Iocchi, Hitoshi Matsubara, Alfredo Weitzenfeld, and Changjiu Zhou, editors, RoboCup 2008: Robot WorldCup XII, Lecture Notes in Artificial Intelligence (LNAI), pages 355–365. Springer, 2009.
[179] A. Walcott-Bryant, M. Kaess, H. Johannsson, and J. J. Leonard. Dynamic pose graph SLAM: Long-term mapping in low dynamic environments. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 1871–1878, Oct. 2012. doi: 10.1109/IROS.2012.6385561.
[180] J. Welle, D. Schulz, T. Bachran, and A. B. Cremers. Optimization techniques for laser-based 3D particle filter SLAM. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 3525–3530, 2010.
[181] W. S. Wijesoma, L. D. L. Perera, and M. D. Adams. Toward multidimensional assignment data association in robot localization and mapping. Robotics, IEEE Transactions on, 22(2):350–365, 2006.
[182] S. B. Williams, G. Dissanayake, and H. Durrant-Whyte. Towards multi-vehicle simultaneous localisation and mapping. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation, ICRA. IEEE Computer Society Press, 2002.
[183] Woods Hole Oceanographic Institute. Acoustic communications. http://acomms.whoi.edu, 2006.
[184] Son-Cheol Yu, T. Ura, and N. Yoshiaki. Multi-AUV based cooperative observations. In T. Ura, editor, Autonomous Underwater Vehicles, 2004 IEEE/OES, pages 7–13, 2004.