SkiMap: An Efficient Mapping Framework for Robot Navigation
Daniele De Gregorio and Luigi Di Stefano
DISI, University of Bologna, Italy
Fig. 1. SkiMap encodes seamlessly a full 3D reconstruction of the environment (left), a height map (center) and a 2D occupancy grid (right). The three representations can be delivered on-line with decreasing time complexity. The displayed maps have been obtained on the Freiburg Campus dataset.
Abstract— We present a novel mapping framework for robot navigation which features a multi-level querying system capable of rapidly delivering representations as diverse as a 3D voxel grid, a 2.5D height map and a 2D occupancy grid. These are inherently embedded into a memory- and time-efficient core data structure organized as a Tree of SkipLists. Compared to the well-known Octree representation, our approach exhibits better time efficiency, thanks to its simple and highly parallelizable computational structure, and a similar memory footprint when mapping large workspaces. Peculiarly within the realm of mapping for robot navigation, our framework supports real-time erosion and re-integration of measurements upon reception of optimized poses from the sensor tracker, so as to continuously improve the accuracy of the map.
I. INTRODUCTION

Key to autonomous robot navigation is the ability to attain a sufficiently rich perception of the environment, so as to allow the robot to plan a path, localize itself and avoid obstacles. This kind of perception is realized through suitable sensors and algorithms. As for the former, laser rangefinders have traditionally been employed to capture a planar view of the surroundings, while visual sensors, and in particular RGB-D cameras, are becoming more and more widespread on account of their potential to model the environment in 3D. In the space of algorithms, many proposals concern full-fledged SLAM (Simultaneous Localization And Mapping) systems aimed at both building a map of the workspace and localizing the sensor (i.e. the robot) therein. Other works, instead, focus on the mapping task and address issues such as memory efficiency, quite mandatory to enable navigation in large spaces, and time efficiency, which concerns creating on-line the representation required by the navigation system, such as a 2D occupancy grid to plan a path through the environment, a 3D reconstruction to avoid obstacles reliably while the robot moves around, or
a 2.5D (a.k.a. height) map to assess free space at the flight altitude of a MAV (Micro Aerial Vehicle). Along the latter research line, in this paper we focus on mapping and present a novel approach, dubbed SkiMap, which is particularly time efficient and flexible enough to seamlessly support different kinds of representations that may be delivered on-line according to the application requirements (Figure 1). Another favorable trait of our mapping framework is the ability to erode and fuse back measurements in real-time upon receiving optimized poses from the sensor localization module, in order to improve the accuracy of the map. Indeed, many recent sensor localization algorithms based on visual data can perform pose optimization on-line, e.g. upon detection of a loop closure, which holds the potential to continuously improve the map as long as sensor measurements may be injected therein according to the new optimized poses rather than the old ones. Our framework has been implemented as a ready-to-use ROS [1] package freely distributed for research and education purposes1. The package can be configured either to achieve mapping in conjunction with any external sensor localization module or as a full-fledged SLAM system, Slamdunk [2] providing camera poses optimized on-line in the latter option.

1 https://github.com/m4nh/skimap_ros

II. RELATED WORK

As described in [3], the classical mapping approach for robot navigation is the 2D occupancy grid. Accordingly, sensor measurements (typically from planar laser scanners) are fused into a 2D Grid wherein each tile (i.e. a square chunk of the space) contains an occupancy probability which can be interpreted as the likelihood that the tile belongs to an obstacle. Many robot navigation systems, often referred to as Grid-Based SLAM, rely on this 2D occupancy grid [4],
which is available in ROS [1] and can be considered as a baseline for robot navigation. Yet, planar sensing and the related 2D mapping may not be reliable enough due to a defective reconstruction of the environment, e.g. when dealing with a MAV (Micro Aerial Vehicle) for indoor navigation or, more generally, with any mobile robot that cannot be modeled as a bi-dimensional agent or that has more than 3 DOFs. A conceptually straightforward approach to pursue 3D mapping when deploying sensors capable of delivering 3D measurements, e.g. visual sensors, would consist in extending the occupancy map to a 3D Grid by cutting the 3D space into Voxels (i.e. small cubes) [5], each voxel storing the probability for an obstacle to be located therein. However, handling a 3D occupancy grid may easily become impractical when dealing with large workspaces due to the excessive memory footprint; for example, should the probability stored in each voxel be encoded as a float number (4 Bytes), the 3D occupancy grid would require as many bytes of memory as

$$M_{OG} = \frac{x \times y \times z}{r^3} \times 4 \qquad (1)$$
x, y, z being the sizes of the dimensions of the workspace and r the voxel resolution (i.e. the voxel size expressed in the same units as x, y, z). A quite popular memory-efficient alternative to the 3D occupancy grid is the Octree [6], whereby the 3D space is recursively partitioned into octants (the 3D equivalent of quadrants) until voxels reach the desired resolution. As such, this data structure is a tree in which each node has exactly 8 children; unlike the 3D grid, though, the Octree avoids modeling the empty space, as only leaf and inner nodes associated with occupied space need to be allocated, thereby yielding significant memory savings when representing large environments. Hence, well-known mapping frameworks like Octomap [7] rely on this kind of data structure to build the required workspace representation. In particular, Hornung et al. [7] shape their data structure so that voxels (the leaves of the Octree) are the only nodes storing mapping information, all other ones containing references to children only. Therefore, the memory occupancy in bytes can be expressed as:

$$M_{OCT} = n_{leaf} \times B_{leaf} + n_{inner} \times B_{inner} \qquad (2)$$
where $B_{leaf}$ and $B_{inner}$ are the occupancy in bytes of leaf and inner nodes, and $n_{leaf}$, $n_{inner}$ the number of leaf and inner nodes corresponding to non-empty space, respectively. Accordingly, the memory footprint depends on the amount of space actually occupied and not on the overall size of the environment. However, using an Octree rather than a 3D grid implies a space vs. time trade-off: the memory footprint is reduced at the expense of the query time, the computational complexity of a random voxel access being just O(1) for a 3D Grid but as large as O(log d) for an Octree (d denoting the depth of the tree). This issue has motivated a recent proposal by Labschütz et al. [8], who mix the two approaches into a novel data structure referred to as Jittree and fully managed by the GPU. Researchers have also explored other solutions, such as Multi-Level Surface Maps [9] and Multi-Volume Occupancy [10], aimed at ameliorating the memory efficiency of the data structures, referred to as height or 2.5D maps, that endow a 2D Grid with measurements concerning the height of obstacles. In particular, information about occupied and free space is accounted for through a dynamic 2D grid where each element is a list of voxels. Thus, akin to Eq. 2, the memory occupancy may be expressed as:

$$M_{MLS} = \frac{x \times y}{r^2} \times B_{tile} + n_{voxels} \times B_{voxel} \qquad (3)$$
where x, y are the sizes of the projection of the workspace onto a plane (e.g. the ground plane), r the resolution of the above-mentioned 2D grid, $B_{tile}$ the occupancy in bytes of a tile of the grid, $B_{voxel}$ that of each element of a voxel list and $n_{voxels}$ the number of voxels dealing with non-empty space. This kind of approach compares favorably w.r.t. the Octree in terms of memory footprint [10], though, again, at the expense of time complexity: as the data structure is basically a linked list on top of a grid, a random voxel access takes O(n) (n being the number of voxels in a list). As already mentioned, a basic 2D map built from planar range sensors is often not reliable for navigation due to the lack of information concerning the height of obstacles. On the other hand, due to the complexity of pursuing path planning directly in the 3D space, most proposals, such as [11], [12], [13], deploy the rich information embedded into a 3D map so as to create a reliable 2D projection that is actually used for the sake of planning. Consequently, the time efficiency of visiting both the entire 3D map as well as the local neighborhood of a given 3D point (aka radius search) is a crucial aspect in the selection of the right data structure. The mapping framework proposed in this paper, dubbed SkiMap, features a memory footprint similar to an Octree when dealing with large environments and a better time complexity, i.e. $O(\log_k n)$, thanks to a highly parallelizable computational structure. Besides, it inherently embodies a 3D, a 2.5D and a 2D map that can be delivered with a time complexity which decreases alongside the richness of the representation. Moreover, unlike the previously mentioned mapping frameworks proposed for robot navigation, SkiMap can deploy the ability of the sensor localization module to deliver optimized poses in order to update the map in real-time. Indeed, such a task is pursued nowadays only by state-of-the-art SLAM systems, such as [14] or [15], aimed at producing high-quality 3D scans of the workspace and meant to run on desktop computers equipped with high-performance and power-hungry GPU cards, an application domain quite different from mobile robotics, which calls for compact, low-power computing platforms mounted on-board. In particular, the recent proposal in [15] represents the 3D space using VoxelHashing [16], a memory-efficient data structure managed by the GPU which enables fast random voxel access but turns out to be vastly inefficient for radius search, a key requirement for robot navigation.
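To give a concrete sense of the memory scale that motivates these alternatives to a dense grid, consider Eq. 1 evaluated on a workspace the size of the Freiburg Campus dataset (292 × 167 × 28 m³, cf. Table II) at a resolution of 0.05 m:

$$M_{OG} = \frac{292 \times 167 \times 28}{0.05^3} \times 4\,\mathrm{B} \approx 1.09 \times 10^{10} \times 4\,\mathrm{B} \approx 43.7\,\mathrm{GB}$$

whereas Table II reports that, on this dataset and at this resolution, both the Octree and SkiMap shrink the footprint by more than 98%.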
III. SKIMAP MAPPING ALGORITHM

In this section we explore the SkiMap algorithm in its entirety, describing the key data structure as well as how mapping is carried out differently from standard approaches like the Octree or the 3D Grid. Furthermore, we highlight the inherent parallelism of the proposed data structure, which is conducive to notably improved time efficiency in key tasks dealing with robot navigation.

A. DATA STRUCTURE: TREE OF SKIPLISTS
Fig. 5. Grouping voxels into a Tree of SkipLists. Each voxel (blue box) is linked to the rootNode by a yNode (green tile) which in turn is linked to an xNode (red tile).
Fig. 3. Tree structure to group voxels according to their coordinates. The maximum depth of the tree is 3, nodes with depth d3 being voxels while those with depths d1, d2 are transient nodes. Nodes at depths d1, d2 store only integer numbers representing the associated quantized coordinate, while voxels (blue nodes) can be deployed to store user data, such as for example the Occupancy Probability [3].
Fig. 4. The visible part of a SkipList is identical to a LinkedList. The hidden segment of a SkipList ensures a random access complexity of O(log n) rather than O(n).
SkiMap relies on the basic concept of grouping voxels within a tree as outlined in Figure 3. The actual voxels are nodes at depth 3, which are grouped into nodes at depth 2 according to equal quantized (x, y) coordinates, the nodes at depth 2 being in turn grouped into nodes at depth 1 according to equal quantized x coordinates. However, adopting a classical tree structure to realize the concept illustrated in Figure 3 would not be efficient because of the unbounded number of siblings at each depth level (unlike the Octree, where each node always has 8 children). Indeed, should the children of each node be stored in an ordinary list, performing a random access would exhibit O(n) complexity. To overcome this efficiency issue, we adopted a rather uncommon data structure called SkipList, proposed by Pugh [17]. As shown in Figure 4, a SkipList is apparently similar to an ordered linked list, but the former also embeds a superstructure aimed at bringing the computational complexity associated with random access from O(n) down to O(log n). In a SkipList elements are kept ordered and thus, compared to an ordinary list, insertion time grows from O(1) to O(log n) due to each insertion requiring a search. Figure 5 shows the actual realization of the concept illustrated in Figure 3. A first SkipList keeps track of quantized x coordinates, thereby realizing depth level 1 of Figure 3; the items of this first SkipList are referred to as xNodes and colored in red in Figure 5; each xNode is in turn a SkipList which keeps track of quantized y coordinates, thus implementing depth level 2 of Figure 3; the items in these nested SkipLists are dubbed yNodes (green) in Figure 5; eventually, each yNode is a SkipList of zNodes (blue), which represent the actual voxels and provide the containers for any kind of user data. Therefore, the concept shown in Figure 3 is realized by a novel data structure that may be thought of as a Tree of SkipLists. It is worth pointing out that with the proposed data structure the coordinates of a voxel can be obtained by iterating through its predecessors and thus need not be stored in the containers together with user data; for example, for the voxel referred to as z3 in Figure 5, iterating back through the predecessors provides the coordinates (x0, y1, z3). A similar technique is used in Octomap to avoid storing coordinates in the leaves of the Octree [7]. As a detailed description of the SkipList is outside the scope of this paper and can be found in [17], we conclude this section with a brief review of the key concepts. A SkipList is a multi-level linked list in which the first level is a list containing all the elements ordered by a Key (each node being a pair <Key, Value>); level i contains about half of the elements of level i − 1, still ordered by Key. Similarly to a Binary Tree, a search is performed starting from level i = imax down to i = 1 in O(log n), at the expense of memory footprint (due to replicated elements). The depth (i.e. the number of levels) is a parameter of the SkipList to be chosen based on the application settings. Indeed, there exists an upper limit beyond which one gets no further benefit in terms of timing performance while significantly increasing the memory footprint. As reported in Table I, this is also borne out by our experimental findings.
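To make the nesting concrete, the following is a minimal sketch of the three-level structure, using std::map as a stand-in for the SkipList (both keep their keys ordered and offer logarithmic search); the type and function names are illustrative and do not reproduce the actual SkiMap API.

```cpp
#include <cstdint>
#include <map>

// Sketch of the Tree of SkipLists (Figures 3 and 5). std::map stands in for
// the SkipList: both keep their keys ordered and provide O(log n) access.
struct Voxel {            // zNode: the only level carrying user data
  float occupancy = 0.f;  // e.g. occupancy probability
  float weight    = 0.f;  // integration weight (cf. Eqs. 8 and 9)
};

using ZList = std::map<int16_t, Voxel>;   // a yNode owns a SkipList of zNodes
using YList = std::map<int16_t, ZList>;   // an xNode owns a SkipList of yNodes
using XList = std::map<int16_t, YList>;   // the root owns a SkipList of xNodes

// The iterative query h(g(f(Ix), Iy), Iz) of Section III-B, here allocating
// missing nodes on the fly; negative indices are legal since they are Keys
// of ordered containers rather than offsets into an array.
inline Voxel& voxelAt(XList& root, int16_t ix, int16_t iy, int16_t iz) {
  return root[ix][iy][iz];
}
```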
TABLE I. Analysis of SkipList depth: tests performed on the Freiburg Campus dataset with a resolution of 0.05 m. The table reports the average computation time to integrate new sensor measurements (∼180K points), the average time for a full visit of the map and the memory footprint of the map.

SkipList Depth | Integration Time | Visiting Time | Memory
4  | 56 ms | 215 ms | 432 MB
8  | 29 ms | 269 ms | 588 MB
16 | 29 ms | 215 ms | 900 MB
32 | 32 ms | 259 ms | 1524 MB
64 | 33 ms | 258 ms | 2743 MB

B. VOXEL INDEXING

As each node of our data structure is addressable by a Key, we can use it to map real-world coordinates to quantized indexes just as would happen in a 3D Grid. Thus, to retrieve the voxel v(Ix, Iy, Iz) corresponding to a 3D point p(x, y, z):

$$I_x = \left\lfloor \frac{x}{r} \right\rfloor, \quad I_y = \left\lfloor \frac{y}{r} \right\rfloor, \quad I_z = \left\lfloor \frac{z}{r} \right\rfloor \qquad (4)$$

r denoting, as usual, the voxel resolution. Unlike a 3D Grid, however, we can also use negative indexes, as they represent Keys of a map rather than simple indexes of an array. This is important for mapping applications as, more often than not, the ground reference of the map (aka Zero Reference Frame) is not known a priori. With our data structure, querying for a voxel f(Ix, Iy, Iz) = v consists in executing the iterative query h(g(f(Ix), Iy), Iz) = v. Thus, with reference to Figure 5:
• f(•) retrieves a red tile / xNode
• g(•) retrieves a green tile / yNode
• h(•) retrieves a blue box / zNode / Voxel
Each of the three query functions f(•), g(•), h(•) can result in either a Hit or a Miss. Moreover, each generic function φ(f(Ix)) may be performed concurrently because it involves separate branches of the SkipList Tree (see again Figure 5).

C. PARALLELIZATION

As highlighted in the previous section, the proposed data structure inherently provides for a high degree of parallelization. Besides, even a single SkipList may enable a certain level of parallelization by using locks on nodes [18]. However, we decided to exploit only the high parallelism among voxel indexing operations enabled by our data structure, without deploying the lock-based technique to further parallelize accesses within a SkipList, mainly to keep the code lean and simple and, secondly, because lock-based algorithms are often unpredictable, which makes them unsuited to real-time tasks. As already mentioned, the operations involving separate branches of the first level of our SkipList Tree, i.e. such that $f(I_{x_i}) \neq f(I_{x_j}) \rightarrow x_i \neq x_j$, can be performed in parallel. We can classify all the possible operations on the data structure into two main categories:
• Visiting Operations: visiting the whole tree (i.e. reaching each voxel of the map) consists in visiting all first-level nodes in parallel and collecting the results:

$$\pi(\Gamma) = \sum_{i=min}^{max} \pi(f(I_{x_i})) \qquad (5)$$

• Updating Operations: upon performing a generic update operation we cannot know in advance whether it will produce a new allocation, a deallocation, or just update the content of an existing voxel without any further search. To ensure that an update operation is not concurrent with others we can reuse the previous technique: we assume that two update operations do not conflict if they belong to two separate first-level branches. In a typical application scenario we are given a set of sensor measurements to be integrated into the map, i.e. a set of 3D points C = {p(x, y, z)}. Hence, we can group these points into subsets according to their first quantized coordinate

$$C = \bigcup_{i=min}^{max} C_{x_i}, \quad C_{x_i} = \left\{ p \;\middle|\; \left\lfloor \frac{x}{r} \right\rfloor = I_{x_i} \right\} \qquad (6)$$

so as to perform the integration operation dealing with each of the subsets in parallel while ensuring no concurrent memory access.
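As an illustration of the grouping in Eq. 6, the sketch below buckets the points of a scan by their quantized x index and hands each bucket to a separate thread; since every bucket touches a different first-level branch, no locking is required. The names and the integrate() body are placeholders rather than the actual SkiMap interface, and a real implementation would typically rely on OpenMP or a thread pool instead of spawning raw threads.

```cpp
#include <cmath>
#include <cstdint>
#include <functional>
#include <map>
#include <thread>
#include <vector>

struct Point3 { float x, y, z; };

// Quantization of Eq. 4: I = floor(coordinate / resolution).
inline int16_t quantize(float v, float r) {
  return static_cast<int16_t>(std::floor(v / r));
}

// Placeholder for the per-branch update: locate/allocate the voxel of every
// point inside the xNode shared by the bucket and fuse the measurement
// (e.g. with the weighted update of Eq. 8).
void integrate(const std::vector<Point3>& branchPoints, float r) {
  (void)branchPoints;
  (void)r;
}

// Eq. 6: split the measurement set C into subsets sharing the same quantized
// x coordinate, then process the subsets concurrently; distinct subsets can
// never touch the same branch of the SkipList Tree.
void integrateScan(const std::vector<Point3>& cloud, float r) {
  std::map<int16_t, std::vector<Point3>> buckets;
  for (const Point3& p : cloud) buckets[quantize(p.x, r)].push_back(p);

  std::vector<std::thread> workers;
  for (const auto& kv : buckets)
    workers.emplace_back(integrate, std::cref(kv.second), r);
  for (std::thread& t : workers) t.join();
}
```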
This kind of parallelization is useful not only to improve timing performance but also in scenarios in which map updates may come from separate sensors in separate chunks, for example in multi-agent localization and mapping [19]. For the record, parallelization across yNodes is also possible, but may introduce computational overhead, as only the xNodes are extensive; nonetheless, we plan to investigate a possibly deeper parallelization of the computation.

D. NEAREST NEIGHBOR AND RADIUS SEARCH

In a SkipList the Nearest Neighbor Search is straightforward: when we search for a Key in the list we always know the previous and next Keys present in the set, even when the searched Key is missing. A Radius Search around a target index is performed by collecting all the elements between two indexes $I_{r^-}$, $I_{r^+}$ obtained from the center index I and the discretized radius:

$$I_{r^+}, I_{r^-} = I \pm \frac{radius}{resolution} \qquad (7)$$

As a SkipList is an ordered linked list, iterating from $I_{r^-}$ to $I_{r^+}$ allows for executing the search with O(k) time complexity, k being the number of elements within the range (Range Search). We can extend this approach to each of the SkipLists present in our Tree, so as to perform a Range Search along each of the x, y, z dimensions and obtain a Box Search. Then, filtering all the points found within the Box based on the distance from the box center allows for fetching a Sphere and thus achieving a Radius Search.
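A sketch of this box-then-sphere strategy follows, again using std::map as a stand-in for the SkipList (ordered keys, lower_bound for the range scan); the Hit type and the function name are illustrative only.

```cpp
#include <cmath>
#include <cstdint>
#include <map>
#include <vector>

struct Voxel { float occupancy = 0.f; };
using ZList = std::map<int16_t, Voxel>;
using YList = std::map<int16_t, ZList>;
using XList = std::map<int16_t, YList>;   // ordered keys stand in for SkipLists

struct Hit { int16_t ix, iy, iz; const Voxel* v; };

// Eq. 7: box search between I - radius/resolution and I + radius/resolution
// along each axis (range scans over ordered keys), then keep only the voxels
// whose centres fall inside the sphere.
std::vector<Hit> radiusSearch(const XList& m, float cx, float cy, float cz,
                              float radius, float r) {
  auto lo = [&](float c) { return (int16_t)std::floor((c - radius) / r); };
  auto hi = [&](float c) { return (int16_t)std::floor((c + radius) / r); };
  std::vector<Hit> hits;
  for (auto ix = m.lower_bound(lo(cx)); ix != m.end() && ix->first <= hi(cx); ++ix)
    for (auto iy = ix->second.lower_bound(lo(cy)); iy != ix->second.end() && iy->first <= hi(cy); ++iy)
      for (auto iz = iy->second.lower_bound(lo(cz)); iz != iy->second.end() && iz->first <= hi(cz); ++iz) {
        float dx = (ix->first + 0.5f) * r - cx;   // voxel centre
        float dy = (iy->first + 0.5f) * r - cy;
        float dz = (iz->first + 0.5f) * r - cz;
        if (dx * dx + dy * dy + dz * dz <= radius * radius)
          hits.push_back({ix->first, iy->first, iz->first, &iz->second});
      }
  return hits;
}
```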
As will be shown in Section IV, thanks to the parallelization approach enabled by our data structure and discussed in the previous section, our method outperforms standard implementations of search algorithms such as the kd-tree or the Octree.

E. MAP UPDATE ON POSE GRAPH OPTIMIZATION
Fig. 6. The Pose History consists of a set of queues associated with Sensor Measurements (SM). This structure allows for linking different poses to any SM, so as to keep track of which pose has been used to integrate them into the map as well as of the existence of newer ones possibly produced by the on-line pose optimization process. For example, at time t2 the history linked to SM0 shows that the measurements have been fused into the map according to P0 but there exists a newer pose, i.e. P2: the Pose Integrator may choose to erode SM0 from the map according to P0 and fuse the measurements back according to P2, then marking the latter as the last integrated pose for SM0. Conversely, the last pose and the last fused pose associated with SMn coincide, so no action would be taken by the Pose Integrator for those measurements.
The idea of Erosion of past sensor measurements and Fusion (or Integration) of new ones in a voxel grid was first introduced by Fioraio et al. [14]. The integration procedure, described by Curless and Levoy [20], allows fusing a sensor measurement into a voxel grid according to a weight; for example, to integrate the occupancy probability:

$$P'(v) = \frac{P(v)W(v) + p_i(v)w_i(v)}{W(v) + w_i(v)}, \quad W'(v) = W(v) + w_i(v) \qquad (8)$$

where $P'(v)$, $P(v)$ are the new and old occupancy probability of voxel v, respectively, and $W'(v)$, $W(v)$ the new and the old weight. As proposed in [14], the Erosion process consists in just inverting the integration process:

$$P'(v) = \frac{P(v)W(v) - p_i(v)w_i(v)}{W(v) - w_i(v)}, \quad W'(v) = W(v) - w_i(v) \qquad (9)$$

Erosion and fusion of sensor measurements may be deployed in conjunction with any sensor localization module capable of delivering optimized poses, e.g. upon detection of a loop closure. Thereby, the map may be updated by removing sensor measurements according to old poses and fusing them back according to the new, optimized poses. Our mapping system supports this feature through a weight field and a generic data type associated with each voxel, which allows the user to handle any desired kind of measurement (Occupancy Probability, SDF, RGB, ...) in order to implement Equations 8 and 9.
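A direct transcription of Eqs. 8 and 9 for a single voxel payload is given below; the struct is illustrative, and a real implementation would additionally guard the erosion against a vanishing weight and deallocate voxels whose weight drops to zero.

```cpp
struct WeightedVoxel {
  float value  = 0.f;   // e.g. occupancy probability (or SDF, a colour channel, ...)
  float weight = 0.f;
};

// Eq. 8: weighted fusion of a new measurement (p_i, w_i) into voxel v.
inline void fuse(WeightedVoxel& v, float p_i, float w_i) {
  v.value  = (v.value * v.weight + p_i * w_i) / (v.weight + w_i);
  v.weight = v.weight + w_i;
}

// Eq. 9: erosion, the exact inverse of Eq. 8, used to remove a measurement
// integrated under an old pose before fusing it back under the optimized one.
inline void erode(WeightedVoxel& v, float p_i, float w_i) {
  v.value  = (v.value * v.weight - p_i * w_i) / (v.weight - w_i);
  v.weight = v.weight - w_i;
}
```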
However, though a sensor tracker typically produces poses at a certain controlled and approximately fixed pace (e.g. at every new sensor measurement or a controlled subset of them), optimized poses are delivered asynchronously with respect to such a regular rhythm, e.g. because a loop closure has been detected, and may happen to compete with live tracking as concerns updating the map. Therefore, as illustrated in Figure 6, we have endowed SkiMap with a Pose Manager capable of creating a Pose History: the system treats live poses and optimized poses seamlessly by inserting them into a set of queues, each associated with the sensor measurements (e.g. a depth image) taken at a certain time stamp; a Pose Integrator chooses from the Pose History a subset of poses and integrates the associated sensor measurements into the voxel map; if the pose that is about to be integrated is an optimized one, its predecessor is eroded from the map first. The choice of the subset of poses to be integrated into the map occurs according to the following criteria:
• live poses must be integrated as soon as possible;
• among optimized poses, those spatially closer to the current live pose are picked first;
• the upper bound of the subset cardinality is fixed, to ensure a predictable computation time.

F. GROUND TRACKING AND 2D QUERYING

Although our proposal may be considered a generic 3D mapping framework, it has been conceived to address robot navigation scenarios. Therefore, we found it useful to endow the framework with a module dedicated to tracking the ground plane. Thus, upon activation of the ground tracking module, the camera mounted on-board the robot must get a shot of the ground plane in the very first frame. The main plane found in the first frame is treated as the ground plane, which allows for easily classifying all the 3D points sensed in the successive frames as either ground or obstacle points. This technique also permits setting the Zero Reference Frame of our map at the centroid of the first floor, thereby ensuring that z coordinates are zero near the ground. More generally, if the core SkiMap algorithm is provided with measured points classified as either ground or obstacles, they can be integrated into the map differently and, in particular, so as to reduce the time complexity of integrating the former. Indeed, with reference to Figure 5, integrating a ground point boils down to allocating or updating only a green tile/yNode rather than a voxel, which implies reaching just depth level 2 of our SkipList Tree, whilst integrating obstacle points requires going deeper to reach level 3. The same holds when visiting the SkipList Tree: should we wish to retrieve only the information about the ground in order to obtain a 2D Map, we would need to visit the tree only up to depth level 2, thereby reducing the time complexity dramatically, as vouched by Figure 9. The figure also shows that the ability to create a 2D view of the 3D Map extremely rapidly is peculiar to SkiMap, a classical approach like the Octree being much slower due to the need to visit all voxels and project them onto the ground in order to retrieve a 2D Map.
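A sketch of this depth-limited visit is shown below, once more with std::map standing in for the SkipList; in the actual data structure the ground payload lives directly in the yNode, whereas here the tile coordinates alone illustrate why the 2D query never needs to touch the per-voxel level. Names are illustrative, not the SkiMap API.

```cpp
#include <cstdint>
#include <functional>
#include <map>

struct Voxel { float occupancy = 0.f; };
using ZList = std::map<int16_t, Voxel>;
using YList = std::map<int16_t, ZList>;
using XList = std::map<int16_t, YList>;   // ordered keys stand in for SkipLists

// Full 3D visit: reach every voxel, i.e. traverse down to depth level 3.
void visit3D(const XList& m,
             const std::function<void(int16_t, int16_t, int16_t, const Voxel&)>& f) {
  for (const auto& x : m)
    for (const auto& y : x.second)
      for (const auto& z : y.second)
        f(x.first, y.first, z.first, z.second);
}

// 2D query of Section III-F: stop at depth level 2 and enumerate the occupied
// (x, y) tiles, e.g. to rasterize a 2D occupancy grid, without ever visiting
// the zNodes.
void visit2D(const XList& m, const std::function<void(int16_t, int16_t)>& f) {
  for (const auto& x : m)
    for (const auto& y : x.second)
      f(x.first, y.first);
}
```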
G. IMPLEMENTATION DETAILS

SkiMap is implemented in C++ and wrapped in a ROS package, so as to maximize its portability and usability within the robotics community. Thanks to the widespread use of C++ Generics, the SkiMap data structure is contained in a couple of header files. Furthermore, C++ Generics allow choosing the data type used to represent coordinates: for example, in our current implementation we have chosen short as the index data type, allowing values in the range [−32768, 32767], which results in a map of 655.36 m along each dimension with a resolution of 0.01 m. Voxels are templatized as well, so as to allow the user to store any kind of information therein.

IV. RESULTS

The SkiMap mapping framework has been evaluated using heterogeneous datasets categorized as follows:
• Medium-sized datasets captured with RGB-D sensors [21].
• Public large-sized datasets captured with laser scanners mounted on pan-tilt units (Freiburg Campus2, New College [22]).
• Small and medium-sized datasets captured in our Lab through an RGB-D sensor mounted on mobile robots (Figure 12).
The public datasets are endowed with ground truth camera poses, while in the experiments concerning our datasets we deploy Slamdunk [2] to track the camera. Thus, the quantitative evaluation reported in Figures 7, 8, 9, 10 deals with the first two categories only - because of the availability of ground truth poses - and concerns a comparison between SkiMap and the Octree3, which is, to the best of our knowledge, the foremost mapping solution in terms of memory efficiency. To attain a more comprehensive assessment, for each dataset we have considered multiple map resolutions, i.e. 0.05 m, 0.1 m and 0.2 m. In Figure 10 we have also considered the kd-tree4 because of its wide adoption in spatial search tasks such as radius search. All the experiments have been run on a 5th generation Intel Core i7.
First we have assessed basic tasks like “Integrating New Measurements” (Figure 7) and “Visiting the Map” (Figure 8), finding that SkiMap is almost always more efficient than the Octree. Figure 9 highlights how the 2D Query feature introduced in Section III-F allows outperforming the Octree in obtaining a similar representation. A qualitative example of the 2D Query feature can also be seen in Figure 11, with the ground correctly reconstructed; it is worth pointing out here that, as vouched by Figure 9, obtaining this kind of representation by performing a per-voxel projection to the ground would imply a significantly higher time complexity. Finally, Figure 10 is about the timing performance of the radius search task, quite relevant, e.g., for the sake of avoiding obstacles while navigating within the workspace under reconstruction.

2 Courtesy of B. Steder, available at http://ais.informatik.uni-freiburg.de/projects/datasets/fr360/
3 version used: https://github.com/OctoMap/octomap
4 version used: http://pointclouds.org/
TABLE II. Percentage of memory savings with respect to a full 3D Grid.

Dataset | Type | 0.05 m | 0.1 m | 0.2 m
Freiburg Campus (292 × 167 × 28 m³) | octree | 98.75% | 96.21% | 90.61%
Freiburg Campus (292 × 167 × 28 m³) | skimap | 98.52% | 94.28% | 83.33%
New College (250 × 161 × 33 m³) | octree | 99.74% | 99.00% | 96.76%
New College (250 × 161 × 33 m³) | skimap | 99.77% | 98.84% | 95.30%
Freiburg Long Office (23 × 25 × 10 m³) | octree | 90.50% | 84.71% | 74.91%
Freiburg Long Office (23 × 25 × 10 m³) | skimap | 82.62% | 71.30% | 54.63%
Figure 10 points out the much higher efficiency of SkiMap with respect to both the Octree and the kd-tree, even without considering the initialization time needed to build the index required by the kd-tree, which is not accounted for in the figure. As for memory occupancy, Table II highlights how SkiMap tends to be almost as efficient as the Octree in case of large environments, while providing smaller memory savings in smaller workspaces.
As for the experiments dealing with the datasets taken in our Lab, we used two mobile robots, namely a Youbot [23] and a Tiago5, equipped with an Asus Xtion RGB-D sensor and, rather than relying on ground truth information, deployed SlamDunk [2] to track the robot/camera 6-DOF pose and fuse sensor measurements into the map according to the estimated poses. Furthermore, leveraging on the Pose Optimization module offered by SlamDunk, we can realize the Map Update feature of SkiMap (see Section III-E). Both robots were operated manually, in small (Youbot) and medium (Tiago) sized environments within our Lab, so as to collect and fuse together multiple sensor measurements in order to reconstruct a map of the explored workspace. Figure 12 depicts examples of reconstructed maps with and without the Map Update process enabled by SlamDunk’s Pose Optimization module. It is worth pointing out that with our approach the optimized maps are not attained off-line within a post-processing step but built in real-time, as described in Section III-E.

5 http://tiago.pal-robotics.com/

V. CONCLUDING REMARKS

In this work we have described a novel mapping approach mainly devoted to robot navigation. The primary objective was to provide an efficient mapping framework suitable for real-time applications on embedded robotic platforms. Thus, unlike approaches that focus on dense and accurate 3D reconstruction, such as [15], our method is aimed at building as efficiently as possible the kinds of representation required to support robot navigation effectively. In its current state the framework can also provide some basic form of semantic information, such as telling apart ground and obstacles. We plan to enrich the degree of semantic perception accommodated by SkiMap by incorporating the detection of certain object instances [24], e.g. items to be picked or
Fig. 7. Time to integrate new measurements into the map with an increasing number of total points. The first three datasets deal with RGB-D sensors (∼320k points per scan) while the last one was acquired by a laser scanner mounted on a pan-tilt unit (∼180k points per scan). SkiMap provides inferior performance on the last dataset due to the scans featuring widely spread and distant points (up to 50 m).
Fig. 8. Time to visit the whole map.
Fig. 9. Comparison between 3D and 2D reconstructions. The Octree requires the same time to perform a full 3D or a 2D reconstruction because in both cases it needs to iterate over all the 3D points. SkiMap, instead, turns out to be faster than the Octree in obtaining a 3D map as well as much faster in creating a 2D map, thanks to the 2D Query feature.
Fig. 10. Time to perform a radius search with increasing radius size. SkiMap outperforms both the Octree and the kd-tree on all datasets.
manipulated by the robot, as well as by leveraging on per-frame Semantic Segmentation so as to fuse category labels into the map [25], [26].

REFERENCES

[1] M. Quigley, K. Conley, B. P. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: an open-source robot operating system,” in ICRA Workshop on Open Source Software, 2009.
[2] N. Fioraio and L. Di Stefano, “SlamDunk: Affordable real-time RGB-D SLAM,” in Computer Vision - ECCV 2014 Workshops: Zurich, Switzerland, September 6-7 and 12, 2014, Proceedings, Part I, 2015, pp. 401–414.
[3] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, 2005.
[4] G. Grisetti, C. Stachniss, and W. Burgard, “Improved techniques for grid mapping with Rao-Blackwellized particle filters,” IEEE Transactions on Robotics, vol. 23, no. 1, pp. 34–46, Feb 2007.
[5] Y. Roth-Tabak and R. Jain, “Building an environment model using depth information,” Computer, vol. 22, no. 6, pp. 85–90, June 1989.
[6] D. J. Meagher, “Geometric modeling using octree encoding,” in Computer Graphics and Image Processing, 1982, pp. 129–147.
[7] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “OctoMap: an efficient probabilistic 3D mapping framework based on octrees,” Autonomous Robots, vol. 34, no. 3, pp. 189–206, 2013. [Online]. Available: http://dx.doi.org/10.1007/s10514-012-9321-0
[8] M. Labschütz, S. Bruckner, M. E. Gröller, M. Hadwiger, and P. Rautek, “JiTTree: A just-in-time compiled sparse GPU volume data structure,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 1, pp. 1025–1034, Jan 2016.
Fig. 11. A map built from the Corridor dataset collected in Octomap [7]. SkiMap allows for efficiently detecting the ground and, without further computational cost, discarding higher obstacles like the roof (red voxels in the left image) and labeling the ground voxels as navigable (white regions in the right image).
Fig. 12. The first row concerns a small room (5m × 4m × 3m) reconstructed by the Youbot in an eye-on-hand configuration. The second row represents a medium-size environment (8m × 35m × 3m) reconstructed by the Tiago through an RGB-D camera mounted on its head. The middle column highlights the significant improvement in reconstruction accuracy provided by the real-time map optimization process.
[9] R. Triebel, P. Pfaff, and W. Burgard, “Multi-level surface maps for outdoor terrain mapping and loop closing,” in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct 2006, pp. 2276–2282.
[10] I. Dryanovski, W. Morris, and J. Xiao, “Multi-volume occupancy grids: An efficient probabilistic 3D mapping model for micro aerial vehicles,” in Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, Oct 2010, pp. 1553–1559.
[11] J. S. Gutmann, M. Fukuchi, and M. Fujita, “A floor and obstacle height map for 3D navigation of a humanoid robot,” in Proceedings of the 2005 IEEE International Conference on Robotics and Automation, April 2005, pp. 1066–1071.
[12] D. Maier, A. Hornung, and M. Bennewitz, “Real-time navigation in 3D environments based on depth camera data,” in 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), Nov 2012, pp. 692–697.
[13] J. Biswas and M. Veloso, “Depth camera based indoor mobile robot localization and navigation,” in Robotics and Automation (ICRA), 2012 IEEE International Conference on, May 2012, pp. 1697–1702.
[14] N. Fioraio, J. Taylor, A. Fitzgibbon, L. Di Stefano, and S. Izadi, “Large-scale and drift-free surface reconstruction using online subvolume registration,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 4475–4483.
[15] A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt, “BundleFusion: Real-time globally consistent 3D reconstruction using online surface re-integration,” arXiv preprint arXiv:1604.01093, 2016.
[16] M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger, “Real-time 3D reconstruction at scale using voxel hashing,” ACM Transactions on Graphics (TOG), 2013.
[17] W. Pugh, “Skip lists: A probabilistic alternative to balanced trees,” Commun. ACM, pp. 668–676, 1990.
[18] ——, “Concurrent maintenance of skip lists,” Tech. Rep., 1998.
[19] L. E. Parker, K. Fregene, Y. Guo, and R. Madhavan, “Multi-robot localization, mapping, and path planning,” in Multi-Robot Systems: From Swarms to Intelligent Automata: Proceedings from the 2002 NRL Workshop on Multi-Robot Systems. Springer Science & Business Media, 2013, p. 21.
[20] B. Curless and M. Levoy, “A volumetric method for building complex models from range images,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. ACM, 1996, pp. 303–312.
[21] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 573–580.
[22] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman, “The New College vision and laser data set,” The International Journal of Robotics Research, vol. 28, no. 5, pp. 595–599, May 2009. [Online]. Available: http://www.robots.ox.ac.uk/NewCollegeData/
[23] R. Bischoff, U. Huggenberger, and E. Prassler, “KUKA youBot - a mobile manipulator for research and education,” in Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011, pp. 1–4.
[24] N. Fioraio and L. Di Stefano, “Joint detection, tracking and mapping by semantic bundle adjustment,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013.
[25] T. Cavallari and L. Di Stefano, “Volume-based semantic labeling with signed distance functions,” in Pacific-Rim Symposium on Image and Video Technology. Springer International Publishing, 2015, pp. 544–556.
[26] ——, “On-line large scale semantic fusion,” in Computer Vision – ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part III, 2016, pp. 83–99.