
A Cost Effective Open-Source Three-dimensional Reconstruction System and Trajectory Analysis for Mobile Robots

Rafael Ferrari Pinto, Andre G. S. Conceicao, P. C. M. A. Farias and Eduardo T. F. Santos

Abstract— This paper discusses the acquisition of image and depth data from the Microsoft Kinect's sensors for the subsequent reconstruction of a three-dimensional environment, with emphasis on the data pre-processing stage. Additionally, we propose an algorithm that uses the 3D reconstruction to verify whether a selected trajectory is a viable route without obstacles along its path. Results from three-dimensional environment reconstructions and viable trajectories are presented to demonstrate the performance of the approach.

Index Terms— Computer Vision, Kinect, 3D Reconstruction, Trajectory planning, Mobile robots.

I. INTRODUCTION

Computer vision provides a set of techniques and theoretical tools that enable machines to visually perceive aspects of the real world and interpret them in order to achieve certain goals [1]. A complete computer vision system needs sensors that can perceive the environment and an efficient system that can extract the relevant features. In several application areas, mapping a physical environment can be very useful: for example, in the analysis of inhospitable environments such as deep wells [2], in localization systems for robotics [3] [4], or in futuristic holographic three-dimensional environments. All of these applications require appropriate sensing to map and visualize the real world with the goal of creating a useful model.

Digital signal processing techniques applied to computer vision enable the extraction of parameters for object recognition and of other information from the environment. The main problem with monocular systems in these applications is that the depth of the environment cannot be determined without another sensor [5], which reinforces the current trend toward TOF ("Time Of Flight") cameras. TOF cameras obtain the depth associated with the pixels of an image. Thus, the depth of each pixel of a color image can be obtained in order to generate a three-dimensional (3D) surface of the region under analysis. The cost of these cameras has fallen thanks to their popularization in the electronic entertainment industry, making them viable for general applications. A notable example is the Microsoft Kinect [6], which has conventional and depth cameras and provides color and depth images of a three-dimensional environment.

In the literature there are several approaches to three-dimensional reconstruction, and the most suitable technique depends on the application and the available resources. The Kinect depth sensor is not based on a pair of cameras (stereo vision), so techniques such as those described in [7] and [8] are not a good option for the research described in this article, whose main objective is a procedure for three-dimensional reconstruction based on low-cost sensors. The use of Microsoft Kinect sensors as three-dimensional reconstruction tools has become increasingly relevant [9], resulting in more documentation and justifying the choice of this type of sensor for future application in mobility devices such as wheelchairs [10]. This kind of mobile robot has used the Kinect in its control systems, for example to map the surrounding environment and avoid obstacles [11] or to recognize gestures and, consequently, commands [12].

This paper presents a procedure for data acquisition and 3D reconstruction from the depth data of the Microsoft Kinect for Xbox 360. The Kinect depth map is usually incomplete, so pre-processing may be required to obtain a full depth map, even if some values are interpolated; here this was done using the inpainting technique [13]. Additionally, we propose an algorithm that uses the 3D reconstruction to verify whether a desired trajectory is a viable route without obstacles along its path. The paper is organized as follows: Section II explains the three-dimensional reconstruction system, which includes data acquisition, preprocessing and 3D reconstruction. Section III describes the proposed trajectory analysis algorithm. Finally, Section IV presents the conclusion and future work.

R. F. Pinto, A. G. S. Conceicao and P. C. M. A. Farias are with the LaR Robotics Laboratory, Department of Electrical Engineering, Federal University of Bahia, Salvador, Brazil (e-mail: [email protected]). E. T. F. Santos is with the Instituto Federal de Educação, Ciência e Tecnologia da Bahia (e-mail: [email protected]).

II. THREE-DIMENSIONAL RECONSTRUCTION SYSTEM

A. Equipment and Software

The Microsoft Kinect is a system composed of an RGB camera, for the acquisition of color images, and an infrared-based system (emitter/receiver) for the determination of depth maps [14]. This integrated system has been widely used in robotics because it provides a robust set of sensors and is relatively inexpensive and quite efficient. To compute depth, it emits infrared light in a pseudo-random pattern, which is received by the CMOS sensor after reflecting off physical objects in the environment [15].

The OpenKinect project emerged as a set of open-source tools geared toward comprehensive acquisition and use of the Kinect's sensors. With the drivers and features implemented in its core library, the data read by the sensors can be exported to files or integrated directly into any type of application. Matlab was used for programming and image processing, since it has robust features for handling numerical arrays and a wide variety of libraries available for extension and compatibility.

B. Data acquisition and preprocessing

The geometric model of the RGB and depth cameras of the Microsoft Kinect, which projects a three-dimensional point X into an image point $[u, v]^T$, is given by [16]:

\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} s \\ t \\ 1 \end{bmatrix} \tag{1}
\]

\[
\begin{bmatrix} s \\ t \\ 1 \end{bmatrix} =
\underbrace{(1 + k_1 r^2 + k_2 r^4 + k_5 r^6)}_{\text{radial distortion}}
\begin{bmatrix} p \\ q \\ 0 \end{bmatrix}
+ \underbrace{\begin{bmatrix} 2 k_3 p q + k_4 (r^2 + 2 p^2) \\ 2 k_4 p q + k_3 (r^2 + 2 q^2) \\ 1 \end{bmatrix}}_{\text{tangential distortion}} \tag{2}
\]

\[
r^2 = p^2 + q^2, \qquad \begin{bmatrix} p z \\ q z \\ z \end{bmatrix} = R (X - C) \tag{3}
\]

where $k = [k_1, k_2, \ldots, k_5]$ are the distortion parameters, K is the camera calibration matrix, R is the rotation and C is the camera center. The Kinect's depth camera is associated with the geometry of the infrared sensor. It gives the inverse depth $d$ along the $z$ axis (depth) for each pixel $[u, v]^T$ as follows:

\[
\begin{bmatrix} x \\ y \\ d \end{bmatrix} =
\begin{bmatrix} u - u_0 \\ v - v_0 \\ \dfrac{1}{c_1} \cdot \dfrac{1}{z} - \dfrac{c_0}{c_1} \end{bmatrix} \tag{4}
\]
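As a rough illustration of how the model in (1) to (4) can be evaluated numerically, the following Python sketch projects a 3D point to pixel coordinates and inverts the inverse-depth relation to recover metric depth. The symbol names (p, q for normalized coordinates, s, t for distorted coordinates, c0 and c1 for the depth-model parameters), the function names and all numeric values are assumptions made for illustration only; they are not calibration results from this work, and the system described here was implemented in Matlab rather than Python.

```python
def distort(p, q, k):
    """Apply radial and tangential distortion to normalized coordinates (p, q).

    k = (k1, k2, k3, k4, k5); k1, k2, k5 are radial terms, k3, k4 tangential.
    """
    k1, k2, k3, k4, k5 = k
    r2 = p * p + q * q
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k5 * r2 ** 3
    s = radial * p + 2.0 * k3 * p * q + k4 * (r2 + 2.0 * p * p)
    t = radial * q + 2.0 * k4 * p * q + k3 * (r2 + 2.0 * q * q)
    return s, t


def project(X, R, C, K, k):
    """Project the 3D point X to a pixel (u, v).

    R is a 3x3 rotation, C the camera center, K a 3x3 calibration matrix
    whose last row is assumed to be [0, 0, 1].
    """
    diff = [X[i] - C[i] for i in range(3)]
    # Camera-frame coordinates R (X - C)
    xc, yc, zc = [sum(R[row][i] * diff[i] for i in range(3)) for row in range(3)]
    if zc <= 0:
        raise ValueError("point is not in front of the camera")
    p, q = xc / zc, yc / zc
    s, t = distort(p, q, k)
    u = K[0][0] * s + K[0][1] * t + K[0][2]
    v = K[1][0] * s + K[1][1] * t + K[1][2]
    return u, v


def inverse_depth(z, c0, c1):
    """Inverse depth d reported by the sensor for a true depth z."""
    return (1.0 / c1) * (1.0 / z) - c0 / c1


def depth_from_inverse_depth(d, c0, c1):
    """Recover the depth z by solving the inverse-depth relation for z."""
    return 1.0 / (c1 * d + c0)


# Hypothetical values, for illustration only (not from the paper).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K_EXAMPLE = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
NO_DISTORTION = (0.0, 0.0, 0.0, 0.0, 0.0)
C0_EXAMPLE, C1_EXAMPLE = 3.33, -0.00307

u, v = project([0.2, -0.1, 2.0], IDENTITY, [0.0, 0.0, 0.0], K_EXAMPLE, NO_DISTORTION)
d = inverse_depth(2.0, C0_EXAMPLE, C1_EXAMPLE)
z = depth_from_inverse_depth(d, C0_EXAMPLE, C1_EXAMPLE)
print(u, v, z)
```

With all distortion coefficients set to zero the projection reduces to the ideal pinhole model, which gives a convenient sanity check before real calibration parameters are plugged in.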