Generation of RGB-D data for SLAM using robotic framework V-REP

P.S. Gritsenko, I.S. Gritsenko, A.Zh. Seidakhmet and A.E. Abduraimov

Al-Farabi Kazakh National University, Almaty, Kazakhstan

[email protected], [email protected], [email protected], [email protected]
Abstract. In this article, we present a methodology for generating test data and debugging RGB-D SLAM systems. We created a model of a laboratory with an area of 250 m2 (25 m x 10 m) filled with objects of different types. The V-REP Microsoft Kinect sensor model was used as the basis of the robot vision system. The motion path of the sensor model contains multiple loops. We wrote a program in V-REP's native scripting language, Lua, to record a data array from the Microsoft Kinect sensor model. The array includes both RGB and depth streams at full resolution (640 x 480) for every 10 cm of the path. The simulated path is exact, since it comes from a simulation, and is represented by an array of 4 x 4 transformation matrices. The data array is 1000 steps, or 100 m, long. The path simulates cases that frequently occur in SLAM, including loop closures. It is worth noting that the path was modeled for a mobile robot: it is a 2D path parallel to the floor at a height of 40 cm.
INTRODUCTION

Data arrays in the public domain can significantly advance practically any field of science. Such data creates the opportunity for objective analysis and testing of various theories and algorithms. This article addresses one of the most pressing problems in robotics, the problem of simultaneous localization and mapping (SLAM). The SLAM problem has been well studied for sensors such as sonars, laser scanners, cameras, TOF sensors and others. However, comparatively inexpensive RGB-D sensors, such as the Kinect, have appeared only recently, as have SLAM systems using them [1, 2, 3]. There are arrays of real RGB-D data in the public domain, but real data is needed for testing the production version of an algorithm rather than for developing and debugging a new one. Unlike real data, simulation is necessary at the initial stages of system design, when any change, even replacement of the hardware used for the vision system, can be introduced with minimal additional labor costs. Simulation is crucial for creating a fully autonomous system capable of solving not only SLAM, but also tasks of path planning, path execution, object identification, etc. It is necessary because the development process is iterative: changes will be introduced on a regular basis (sensors, algorithms, architecture, etc.) and the system should be tested each time changes are introduced. Using simulation, one can thus avoid significant labor costs.
Literature review

The problem of simultaneous localization and mapping has seen many developments both in robotics [4]-[7] and in computer vision [5, 8, 9].
Various types of sensors have already been studied, including monocular cameras [5, 9, 12], two-dimensional laser scanners [10], stereo systems [13], 3D scanners [11] and, recently, RGB-D sensors such as the Microsoft Kinect [1, 2, 3]. There are many data sets for laser-based SLAM systems, such as Rawseeds and Intel [14, 15]. For SLAM systems based on stereo images, visual odometry data with true positions has been presented [16]; it is worth noting that this data set contains no depth images. For SLAM systems based on RGB-D sensors, there are two data arrays. The first is a non-textured point cloud recorded with a Microsoft Kinect; a special tracking system was used to record this data array [17]. The second was recorded without any tracking system: given no correct sensor positions, the sensor path was supposed to be restored from the RGB images, but the method failed to reach the required level of accuracy [18].
Microsoft Kinect sensor model

Kinect (formerly Project Natal) is a game controller originally introduced for the Xbox 360 console and, much later, for personal computers. The Kinect kit includes (Fig. 1):

• a 640 x 480 RGB camera to capture a color image;
• an infrared (IR) emitter and a depth sensor: the emitter emits infrared rays and the depth sensor reads the IR rays reflected back; the reflected rays are transformed into depth, i.e. the distance between an object and the sensor;
• an array of microphones to capture sound; since there are several microphones, it is possible to identify the location of a sound source and the direction of an audio wave;
• a 3-axis accelerometer to determine the current Kinect orientation.
FIGURE 1. Microsoft Kinect components.
V-REP supports connection and usage of a real Microsoft Kinect sensor, as well as usage of its model (Fig. 2). It is worth noting that the model simulates only the RGB camera and the depth camera, i.e. the microphone array and the accelerometer are not present in the model.
FIGURE 2. V-REP Microsoft Kinect sensor model.
We use the Kinect because it was originally built into the architecture of our system. However, V-REP provides a powerful abstraction of the sensors used in the Kinect model, as well as in all other sensor models:

• Proximity sensors (ray-type, randomized ray-type, pyramid-type, cylinder-type, disk-type, cone-type);
• Vision sensors (orthographic projection-type, perspective projection-type).
Using this abstraction and parameterization, one can create a model that will serve as an analog for almost any real sensor.
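As an illustration of this common read interface, the following minimal Lua sketch (written in the style of a V-REP child script) reads a proximity sensor and the depth buffer of a vision sensor through the same pattern of resolving a handle and querying it. The object names 'proxSensor' and 'kinect_depth' are placeholders for this example, not names taken from our scene.

    -- Minimal sketch (V-REP child script, Lua): the same read pattern applies to any
    -- sensor built on the proximity-sensor or vision-sensor abstraction.
    if (sim_call_type == sim_childscriptcall_initialization) then
        proxHandle  = simGetObjectHandle('proxSensor')    -- any proximity sensor (ray, cone, pyramid, ...)
        depthHandle = simGetObjectHandle('kinect_depth')  -- any vision sensor (perspective or orthographic)
    end

    if (sim_call_type == sim_childscriptcall_sensing) then
        -- Proximity-sensor abstraction: one call returns whether something was detected
        -- and the distance to it, regardless of the sensor sub-type.
        local detected, dist = simReadProximitySensor(proxHandle)
        if detected > 0 then
            simAddStatusbarMessage(string.format('obstacle at %.3f m', dist))
        end

        -- Vision-sensor abstraction: the depth buffer comes back as values in [0,1]
        -- between the near and far clipping planes, whatever the projection type.
        local depth = simGetVisionSensorDepthBuffer(depthHandle)
    end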
Array of data

Our sample contains both an array of RGB-D data and an array of transformation matrices that provide the correct path of the sensor. A model of a laboratory with an area of 250 m2, filled with objects of different types, was created (Fig. 3). While the sensor model executes the path, an RGB image and a depth image are recorded every 10 cm, as well as the transformation matrices necessary for correct path restoration. The depth image is recorded as a distance scaled to the range 0-255 and is saved in image format rather than in text format; this way, the volume of data is reduced approximately 20 times. The total amount of data is 234 MB per 1000 steps, or 234 KB per step.
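The following is a condensed, illustrative sketch of such a recording script in Lua (V-REP 3.x API). The sensor names follow the V-REP Kinect model, while the file names, the 10 cm bookkeeping and the depth-to-image conversion are assumptions made for this sketch rather than the exact code we used.

    -- Condensed sketch of the recording script (V-REP 3.x, Lua); names and paths are illustrative.
    local STEP = 0.10  -- record every 10 cm of travelled path

    if (sim_call_type == sim_childscriptcall_initialization) then
        rgbHandle   = simGetObjectHandle('kinect_rgb')
        depthHandle = simGetObjectHandle('kinect_depth')
        lastPos     = simGetObjectPosition(rgbHandle, -1)
        stepIndex   = 0
        poseLog     = io.open('poses.txt', 'w')
    end

    if (sim_call_type == sim_childscriptcall_sensing) then
        local pos = simGetObjectPosition(rgbHandle, -1)
        local dx, dy = pos[1] - lastPos[1], pos[2] - lastPos[2]
        if math.sqrt(dx*dx + dy*dy) >= STEP then
            lastPos, stepIndex = pos, stepIndex + 1

            -- RGB frame: char buffer straight from the vision sensor, stored as an image file
            local rgb = simGetVisionSensorCharImage(rgbHandle)
            simSaveImage(rgb, {640, 480}, 0, string.format('rgb_%04d.png', stepIndex), -1)

            -- Depth frame: values in [0,1] between the clipping planes; scale to 0-255 and
            -- store as an image, which keeps the data set ~20x smaller than a text dump
            local depth = simGetVisionSensorDepthBuffer(depthHandle)
            local bytes = {}
            for i = 1, #depth do
                local d = math.floor(depth[i] * 255 + 0.5)
                bytes[#bytes+1] = d; bytes[#bytes+1] = d; bytes[#bytes+1] = d  -- replicate to RGB
            end
            simSaveImage(simPackUInt8Table(bytes), {640, 480}, 0,
                         string.format('depth_%04d.png', stepIndex), -1)

            -- Ground-truth pose: 3x4 matrix (row-major); appending the row 0 0 0 1
            -- gives the 4x4 transformation matrix used in the data set
            local m = simGetObjectMatrix(rgbHandle, -1)
            poseLog:write(table.concat(m, ' ') .. '\n')
        end
    end

    if (sim_call_type == sim_childscriptcall_cleanup) then
        poseLog:close()
    end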
FIGURE 3. Laboratory model and sensor path.
It is worth noting that loop closures are present on the simulated path. Loop closures are used as a correction tool in many SLAM systems [19, 20]; thus, loop closures are necessary to perform an objective analysis of SLAM systems. V-REP and our laboratory model provide additional advantages:

• Powerful visualization. It creates the ability to analyze a SLAM system from different angles and to identify the most likely cause of a problem. Simulation provides a complete picture of the environment, whereas with real data, depending on the sensor type, specific tools are required to restore the environment.
• Parameterization. Once real data is captured, it can no longer be changed. Simulation, in contrast, provides full control over the environment and the robot, so a SLAM system can be tested in a much larger variety of cases. This also includes control over the level of noise and interference in the sensor data (a minimal noise-injection sketch is given after this list). This is a particular advantage for debugging, since SLAM system robustness can be examined under different levels of noise.
• Testability of an experiment. This solves the problem of testing fundamentally different sensors, for example a laser scanner and an RGB-D sensor, under identical conditions. This way, it is possible to test many combinations of sensors of different types and SLAM systems.
• Minimal labor costs. Simulation effectively excludes the human factor from an experiment, which eliminates errors in the obtained data and significantly speeds up and reduces the cost of producing and testing a complete SLAM system.
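As an example of the noise control mentioned above, the following minimal sketch perturbs each value of the normalized depth buffer with additive Gaussian noise generated by the Box-Muller transform; the function name and the noise model are illustrative, and the buffer is assumed to have been read as in the recording script.

    -- Minimal sketch: add zero-mean Gaussian noise (std. dev. sigma) to a depth buffer in [0,1]
    local function addDepthNoise(depth, sigma)
        local noisy = {}
        for i = 1, #depth do
            local u1, u2 = 1 - math.random(), math.random()          -- u1 in (0,1], u2 in [0,1)
            local n = math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
            noisy[i] = math.max(0, math.min(1, depth[i] + sigma * n)) -- clamp to buffer range
        end
        return noisy
    end

    -- e.g. local depth = addDepthNoise(simGetVisionSensorDepthBuffer(depthHandle), 0.01)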
SLAM system evaluation

One common metric does not require a correct path: it measures internal errors after map optimization, such as reprojection errors or, more commonly, χ2 errors [7, 21]. However, a low χ2 error does not guarantee the quality of the map or the accuracy and correctness of the path. Since we have a correct path represented by an array of transformation matrices, we can use another metric: evaluating a SLAM system by its output values, a map or a path, in other words, comparing the obtained result with the correct one [21]. A SLAM system can be verified from the results of its mapping, for example by comparing the resulting map with the corresponding model, or by the correspondence between the reconstructed path and the real one. Two commonly used methods are the relative pose error (RPE) and the absolute trajectory error (ATE). RPE measures the difference between the relative motions of the reconstructed path and the correct one over a fixed interval. It can be used to estimate the error of visual odometry systems [22] or the accuracy of SLAM with loop closures [19, 20]. Instead of evaluating relative differences, ATE first aligns the two paths and then directly estimates the absolute differences. This method is well suited for evaluating visual SLAM systems [21, 23], but requires a correct path.
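To make the metric concrete, the sketch below computes the RMS translational RPE in Lua, assuming the ground-truth path gt and the estimated path est are given as time-aligned arrays of 4 x 4 rigid transformation matrices stored as nested Lua tables; the trajectory alignment required for ATE (e.g. Horn's method) is not implemented here.

    -- Inverse of a rigid 4x4 transform [R t; 0 1]: [R^T, -R^T t; 0 1]
    local function invert(T)
        local inv = {{0,0,0,0},{0,0,0,0},{0,0,0,0},{0,0,0,1}}
        for r = 1, 3 do
            for c = 1, 3 do inv[r][c] = T[c][r] end
            inv[r][4] = -(T[1][r]*T[1][4] + T[2][r]*T[2][4] + T[3][r]*T[3][4])
        end
        return inv
    end

    -- 4x4 matrix product
    local function mul(A, B)
        local C = {}
        for r = 1, 4 do
            C[r] = {}
            for c = 1, 4 do
                local s = 0
                for k = 1, 4 do s = s + A[r][k] * B[k][c] end
                C[r][c] = s
            end
        end
        return C
    end

    -- RMS translational RPE over a fixed step delta (e.g. delta = 1 means 10 cm of path)
    local function rpe(gt, est, delta)
        local sum, n = 0, 0
        for i = 1, #gt - delta do
            local dg = mul(invert(gt[i]),  gt[i + delta])   -- true relative motion
            local de = mul(invert(est[i]), est[i + delta])  -- estimated relative motion
            local E  = mul(invert(dg), de)                  -- residual motion
            sum = sum + E[1][4]^2 + E[2][4]^2 + E[3][4]^2
            n = n + 1
        end
        return math.sqrt(sum / n)
    end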
Conclusions

In this article, we presented a methodology for generating RGB-D data and debugging RGB-D SLAM systems. A model of a laboratory with an area of 250 m2, filled with objects of different types, was created. The motion path was modeled in such a way that it contains multiple loop closures, which are important for SLAM systems. A program recording the data array from the Kinect sensor model was written in Lua. The array includes both RGB and depth images at full resolution (640 x 480) for every 10 cm of the path. The simulated path is exact, since it comes from a simulation, and it is represented by an array of 4 x 4 transformation matrices. The data array is 1000 steps, or 100 m, long. Evaluation metrics for SLAM performance were also presented.
ACKNOWLEDGMENTS

This research is financially supported by a grant from the Ministry of Science and Education of the Republic of Kazakhstan (Grant No. 0762/GF4).
REFERENCES

[1] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments, in Intl. Symp. on Experimental Robotics (ISER), 2010.
[2] N. Engelhard, F. Endres, J. Hess, J. Sturm, and W. Burgard, Realtime 3D visual SLAM with a hand-held RGB-D camera, in RGB-D Workshop on 3D Perception in Robotics at the European Robotics Forum, 2011.
[3] C. Audras, A. Comport, M. Meilland, and P. Rives, Real-time dense appearance-based SLAM for RGB-D sensors, in Australasian Conf. on Robotics and Automation, 2011.
[4] F. Lu and E. Milios, Globally consistent range scan alignment for environment mapping, Autonomous Robots, vol. 4, no. 4, pp. 333-349, 1997.
[5] G. Klein and D. Murray, Parallel tracking and mapping for small AR workspaces, in IEEE and ACM Intl. Symposium on Mixed and Augmented Reality (ISMAR), 2007.
[6] G. Grisetti, C. Stachniss, and W. Burgard, Non-linear constraint network optimization for efficient map learning, IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 428-439, 2009.
[7] R. Kummerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, g2o: A general framework for graph optimization, in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2011.
[8] H. Jin, P. Favaro, and S. Soatto, Real-time 3-D motion and structure of point features: Front-end system for vision-based control and interaction, in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2000.
[9] J. Stuhmer, S. Gumhold, and D. Cremers, Real-time dense geometry from a handheld camera, in DAGM Symposium on Pattern Recognition (DAGM), 2010.
[10] G. Grisetti, C. Stachniss, and W. Burgard, Improved techniques for grid mapping with Rao-Blackwellized particle filters, IEEE Transactions on Robotics (T-RO), vol. 23, pp. 34-46, 2007.
[11] A. Segal, D. Haehnel, and S. Thrun, Generalized-ICP, in Robotics: Science and Systems (RSS), 2009.
[12] H. Strasdat, J. Montiel, and A. Davison, Scale drift-aware large scale monocular SLAM, in Proc. of Robotics: Science and Systems (RSS), 2010.
[13] A. Comport, E. Malis, and P. Rives, Real-time quadrifocal visual odometry, Intl. Journal of Robotics Research (IJRR), vol. 29, pp. 245-266, 2010.
[14] C. Stachniss, P. Beeson, D. Hahnel, M. Bosse, J. Leonard, B. Steder, R. Kummerle, C. Dornhege, M. Ruhnke, G. Grisetti, and A. Kleiner, Laser-based SLAM datasets. The Rawseeds project, http://www.rawseeds.org/rs/datasets/.
[16] A. Geiger, P. Lenz, and R. Urtasun, Are we ready for autonomous driving? The KITTI vision benchmark suite, in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Providence, USA, June 2012.
[17] F. Pomerleau, S. Magnenat, F. Colas, M. Liu, and R. Siegwart, Tracking a depth camera: Parameter exploration for fast ICP, in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2011.
[18] S. Bao and S. Savarese, Semantic structure from motion, in IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2011.
[19] R. Kummerle, B. Steder, C. Dornhege, M. Ruhnke, G. Grisetti, C. Stachniss, and A. Kleiner, On measuring the accuracy of SLAM algorithms, Autonomous Robots, vol. 27, pp. 387-407, 2009.
[20] W. Burgard, C. Stachniss, G. Grisetti, B. Steder, R. Kummerle, C. Dornhege, M. Ruhnke, A. Kleiner, and J. Tardos, A comparison of SLAM algorithms based on a graph of relations, in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2009.
[21] E. Olson and M. Kaess, Evaluating the performance of map optimization algorithms, in RSS Workshop on Good Experimental Methodology in Robotics, 2009.
[22] K. Konolige, M. Agrawal, and J. Sola, Large scale visual odometry for rough terrain, in Intl. Symposium on Robotics Research (ISRR), 2007.
[23] W. Wulf, A. Nuchter, J. Hertzberg, and B. Wagner, Ground truth evaluation of large urban 6D SLAM, in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2007.