3D Color-based Tracking System for Real-time using Kinect Sensor

Francisco Jurado, Guillermo Palacios, Francisco Flores
División de Estudios de Posgrado e Investigación/Instituto Tecnológico de la Laguna
Blvd. Revolución y Calzada Cuauhtémoc S/N, Torreón, Coahuila de Zaragoza, México
Email: [fjurado][gpalacios][fflores]@itlalaguna.edu.mx

ABSTRACT

In this paper, we propose a method for real-time color-based object tracking in a three-dimensional (3-D) space using color and depth information provided by the Microsoft KINECT™ sensor (color and depth cameras). The object segmentation is performed using color filtering based on the HSV (Hue, Saturation, Value) color model (color camera). After the segmentation procedure is carried out, the centroid of the target is calculated. With this information and the calibration parameters between the depth and color cameras, the system obtains the depth information (depth camera) corresponding to the centroid of the target. In this way, the 3-D position of the center of the target object becomes available. Our system is based on open source and free software for the Windows 64-bit platform. The performance of our tracking system is validated via experimental results in real time.

KEYWORDS: HSV color model, Kinect sensor, object tracking, open source.
1 INTRODUCTION
Vision systems are now focusing on new applications thanks to innovations in cameras. Visual object tracking is one of these applications, with uses in fields like traffic control, surveillance systems, human-computer interfaces, video compression, and robotics. The target object may have special characteristics like shape, color, or size. Robust tracking strategies have been proposed to deal with changes in target appearance and to track targets with complex motions. Optic flow, matching, shape-based, and color-based techniques can be used to solve the object tracking problem. Many works deal with the color-based object tracking approach, which consists of processing a 2-D image captured by either a color or a grayscale camera. Robust color-based techniques with modest computational cost include CamShift [1] and MeanShift [2]. Object tracking in a 3-D space has remained complex due to the lack of low-cost depth sensors. Nowadays, this problem has been alleviated since the launch of the Microsoft KINECT™ sensor in November 2010 (see Fig. 1).
Figure 1. Microsoft KINECT™ sensor. From left to right: IR light source, RGB camera, and monochrome camera (depth sensor).
The KINECT™ sensor is a low-cost device that provides, at a frame rate of 30 Hz, an RGB video stream as well as a monochrome intensity-coded depth map, both in VGA resolution (640×480 pixels). The video offers 8-bit resolution, while the depth map has 11-bit resolution. In [3], part numbers of the hardware inside the KINECT™ are available. The sensor has an angular field of view of 57° horizontally and 43° vertically. The RGB camera uses a 1.3-Megapixel MT9M112 sensor and the monochrome camera (depth sensor) uses a 1.3-Megapixel MT9M001C12STM sensor. The depth sensor has a practical operating range between 1.2 m and 3.5 m [4] and an extended range between 0.4 m and 7.0 m. The KINECT™ sensor also has a microphone array as well as a motorized tilt mechanism. Connection between a PC/XBOX 360™ and the KINECT™ peripheral device is achieved through a proprietary connector combining USB communication with additional power. The KINECT™ sensor was designed to be used only with the XBOX 360™ video game console but, thanks to Microsoft, OpenKinect [5], and OpenNI [6], the peripheral device can be connected to a PC.

OpenNI is a not-for-profit organization formed to certify and promote the compatibility and interoperability of natural interaction (NI) devices, applications, and middleware [6]. One goal of the OpenNI organization is to introduce NI applications into the marketplace. OpenNI was founded by leading organizations like Willow Garage [7] (experts in personal robotics applications), Asus [8] (hardware for NI applications), Side Kick [9] (motion control games), and PrimeSense™ [10].

In [11], the use of the KINECT™ sensor in a real-time control application for a four-rotor helicopter, known as a quadrotor, has been reported. Camera calibration was performed via a stereo vision technique. Intrinsic parameters of the RGB camera were calculated using the chessboard approach implemented by a function from OpenCV [12] (Open Source Computer Vision Library). Calibration of the intrinsic monochrome camera parameters as well as the extrinsic parameters of both cameras was performed manually. Depth mapping was used to control the altitude of the quadrotor. The sensor was mounted under the aircraft pointing towards the ground. An embedded attitude controller running on the quadrotor hardware maintained attitude stabilization, while control of altitude was handled by the visual controller. Experimental results showed that the PI controller allowed the quadrotor to maintain its altitude around an offset from a set-point until the reflection from a wooden floor gave incorrect depth measurements, which caused the aircraft to become unstable. Future research concerns the combination of images from both cameras in order to integrate the KINECT™ sensor into the navigation layer of the quadrotor system.

A method for real-time three-dimensional object tracking based on the integration of 3-D range and color information, using the KINECT™ sensor, is proposed in [13]. It is assumed that the intrinsic and extrinsic parameters of the depth and RGB cameras are known. Integration of 3-D range and color information is achieved using the estimated intrinsic parameters and the relative transformation between cameras. Calibration of a depth mapping, in order to map the 3-D depth to a practical distance in a 3-D scene, using a special function whose algorithm is not described, has been assumed. Position and size of the object are estimated under a particle filtering framework. Validation of the proposed method is shown through experimental results using a robotic arm.

In vision camera applications, it is often useful to provide feedback to the user about the live camera image, object/color detection, image manipulation results, and virtual reality.
C# [14] (pronounced C Sharp), an evolution of Microsoft C and Microsoft C++, is a powerful programming language designed for building a wide range of enterprise applications that run on the .NET framework. C# is simple, type safe, modern, and object oriented. Emgu CV [15] is a cross-platform .NET wrapper for OpenCV that allows OpenCV functions to be called from .NET-compatible languages such as C#, Visual Basic, Visual C++, and IronPython, just to mention a few. In this work, in order to achieve real-time 3-D object tracking using the KINECT™ sensor, we propose a color-based approach where the images obtained from the color and depth cameras are processed via C# and Emgu CV and then displayed on a GUI (Graphical User Interface). The performance of our approach is validated via experimental tests in real time using a 5-DOF Mitsubishi robot manipulator.
2 METHODOLOGY

2.1. Getting depth and color images from KINECT™ sensor
The KINECT™ sensor delivers simultaneously, at 30 Hz or 30 fps (frames per second), a 640×480 RGB image and 640×480 depth raw data. In order to process the image data by means of C# and Emgu CV, it is necessary to translate them into a proper format. On one hand, the RGB image from the KINECT™ sensor is captured in a raw format. Thus, to recover the red, green, and blue channel values of each pixel, a nested double loop has been implemented to create a bitmap-format image that can be recognized by C#. Then, it is necessary to translate the bitmap-format image into an Emgu CV structure so that it can be processed by the OpenCV libraries. On the other hand, the depth raw data have an 11-bit resolution. Hence, in order to display this information via C# and Emgu CV, a matching into an 8-bit resolution image is needed, as sketched after this list. The matching process is performed as follows:
• Getting data from memory using pointers.
• Normalizing data to values between 0 and 255 to get a histogram of the depth.
• Creating a valid bitmap for C# using the normalized data.
• Converting the bitmap to an Emgu CV structure.
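As an illustrative sketch (not the authors' original code), the matching above could be implemented in C# roughly as follows; here depthRaw, width, and height are assumed to come from the OpenNI depth stream:

// Sketch: map 11-bit KINECT depth raw data to an 8-bit grayscale bitmap.
// Assumption: depthRaw, width, and height come from the OpenNI depth stream.
using System.Drawing;
using System.Drawing.Imaging;

static Bitmap DepthToBitmap(ushort[] depthRaw, int width, int height)
{
    const int MaxRaw = 2047;  // largest 11-bit value
    var bmp = new Bitmap(width, height, PixelFormat.Format24bppRgb);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // Normalize 0..2047 down to 0..255.
            int v = depthRaw[y * width + x] * 255 / MaxRaw;
            // SetPixel is slow; a production version would use Bitmap.LockBits
            // and pointer access, as the first bullet above suggests.
            bmp.SetPixel(x, y, Color.FromArgb(v, v, v));
        }
    }
    return bmp;
}

The resulting bitmap can then be wrapped into an Emgu CV structure, e.g., new Image<Bgr, Byte>(bmp).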
2.2. Resizing RGB image
Resizing the color image from 640×480 to 320×240 pixels is required in order to process it quickly. This is performed using the Emgu CV resize function, in which a linear interpolation method is implemented.
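A one-line sketch of this step, assuming Emgu CV 2.x's Image.Resize on an Image<Bgr, Byte> named colorImage:

// Sketch of the resize step, assuming Emgu CV 2.x's Image.Resize.
Image<Bgr, Byte> small = colorImage.Resize(320, 240,
    Emgu.CV.CvEnum.INTER.CV_INTER_LINEAR);  // linear interpolation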
2.3. Converting RGB image to HSV color model
Once the RGB image has been resized, it is converted to the HSV color model. The advantages of using the HSV color model [16] are its suitability for robust color filtering [17], [18], [19] as well as its relative immunity to illumination conditions. Conversion from RGB to the HSV model is achieved by the algorithm [17] described below:
Let $R, G, B \in [0,1]$, $V = \max(R, G, B)$, $m = \min(R, G, B)$, and $\Delta = V - m$. Then

$$S = \begin{cases} \Delta / V & \text{if } V \neq 0 \\ 0 & \text{if } V = 0 \end{cases}$$

and, if $\Delta \neq 0$,

$$H = \begin{cases} 60\,(G - B)/\Delta & \text{if } V = R \\ 120 + 60\,(B - R)/\Delta & \text{if } V = G \\ 240 + 60\,(R - G)/\Delta & \text{if } V = B \end{cases}$$

if $H < 0$ then $H \leftarrow H + 360$; return $(H, S, V)$.
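For illustration, a minimal C# realization of the algorithm above is given next; in the actual pipeline, Emgu CV's Convert<Hsv, Byte>() performs the equivalent mapping (with H stored in [0, 180) for 8-bit images).

// Sketch of the RGB-to-HSV conversion above (not the authors' exact code).
// Inputs r, g, b in [0,1]; outputs h in [0,360), s and v in [0,1].
using System;

static void RgbToHsv(double r, double g, double b,
                     out double h, out double s, out double v)
{
    double max = Math.Max(r, Math.Max(g, b));
    double min = Math.Min(r, Math.Min(g, b));
    double delta = max - min;
    v = max;
    s = (max != 0.0) ? delta / max : 0.0;
    if (delta == 0.0)  h = 0.0;                          // achromatic pixel
    else if (max == r) h = 60.0 * (g - b) / delta;
    else if (max == g) h = 120.0 + 60.0 * (b - r) / delta;
    else               h = 240.0 + 60.0 * (r - g) / delta;
    if (h < 0.0) h += 360.0;
}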
2.4. Segmentation of the target by color filtering
Once the RGB image is converted into the HSV color model, a segmentation process is performed to remove pixels that are out of range with respect to the HSV values of the target object, under the operational condition
$$\mathrm{dst}(I) = \mathrm{lower} \leq \mathrm{src}(I) \leq \mathrm{upper} \qquad (1)$$

with $\mathrm{dst}$, $\mathrm{lower}$, $\mathrm{upper}$, and $\mathrm{src}$ as destination array, inclusive lower boundary, inclusive upper boundary, and source array, respectively. In order to establish a range, a GUI has been implemented so that the user can easily set the boundary values.
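Assuming Emgu CV's InRange is the function behind (1), the filtering step might look as follows; the HSV boundaries shown are placeholder values for a blue target, set from the GUI in practice:

// Sketch: color segmentation via Emgu CV's InRange (cf. Eq. (1)).
// The HSV bounds are placeholders; OpenCV stores H in [0, 180) for 8-bit data.
Image<Hsv, Byte> hsv = small.Convert<Hsv, Byte>();
Image<Gray, Byte> mask = hsv.InRange(new Hsv(100, 80, 80),     // lower bound
                                     new Hsv(130, 255, 255));  // upper bound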
2.5. Removing segmentation noise
Depending on the scene and the range established through the GUI, the segmentation may result "clean", yielding only the target. However, most of the time segmentation noise will be present. Therefore, post-processing may be necessary to refine the segmentation. To do this, two morphological operations [20] can be applied. The first one is the so-called erosion, which is defined as follows
$$A \ominus B = \{\, z \mid B_z \subseteq A \,\} \qquad (2)$$

where $B_z$ is the translation of $B$ by the vector $z$, i.e., $B_z = \{\, b + z \mid b \in B \,\}$. Emgu CV sets as default a 3×3 rectangular structuring element. This operation can be repeated $n$ times. The last one is the so-called dilation, which is defined as

$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\} \qquad (3)$$

where $\hat{B}$ denotes the symmetric of $B$, i.e., $\hat{B} = \{\, -b \mid b \in B \,\}$. Similarly, as with the erosion operation, a 3×3 rectangular structuring element is used. These two procedures are optional and can be applied independently; the user can easily select them from the GUI. Figure 2 shows the post-processing results when using the procedures described above. If, even after this post-processing, similar objects are still present, i.e., more than one region is exhibited, an extra filtering step can be applied depending on the area of each object. This area threshold can be visually adjusted in real time through the GUI.
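As a sketch, both operations map directly onto Emgu CV calls that use the default 3×3 rectangular structuring element; the iteration counts here are illustrative:

// Sketch: optional morphological post-processing (cf. Eqs. (2) and (3)).
Image<Gray, Byte> cleaned = mask.Erode(2)     // remove small noise blobs
                                .Dilate(2);   // restore the target's size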
Figure 2. From left to right, top to bottom: RGB image, HSV filtering, Erode operation, Erode & Dilation.

2.6. Getting the centroid for the target
After the segmentation and post-processing procedures have been carried out, a binary image containing only the pixels of our target is obtained. The next procedure consists of getting the centroid of the target (see Figure 3). The centroid coordinates are calculated using the equations:
$$x_c = \frac{x_{\max} + x_{\min}}{2} \qquad (4)$$

$$y_c = \frac{y_{\max} + y_{\min}}{2} \qquad (5)$$

where the coordinate pair $(x_c, y_c)$ denotes the location of the centroid of the target, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values, respectively, on the horizontal axis of the edge of the region defined to wrap the target object. Similarly, $y_{\max}$ and $y_{\min}$ are the maximum and minimum values, respectively, on the vertical axis of the region of interest.
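One plausible realization of Eqs. (4)-(5) in C# with Emgu CV computes the bounding rectangle of the largest segmented region; this is a hypothetical sketch, not necessarily the authors' implementation:

// Sketch: centroid of the target from the bounding box of the largest
// contour in the binary image (Emgu CV 2.x contour API).
using System.Drawing;

Rectangle box = Rectangle.Empty;
double maxArea = 0.0;
for (Contour<Point> c = cleaned.FindContours(); c != null; c = c.HNext)
{
    if (c.Area > maxArea) { maxArea = c.Area; box = c.BoundingRectangle; }
}
int xc = (box.Left + box.Right) / 2;   // Eq. (4): (x_max + x_min) / 2
int yc = (box.Top + box.Bottom) / 2;   // Eq. (5): (y_max + y_min) / 2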
Figure 3. Centroid for the target object
2.7. Getting depth value for the centroid
In [21], a method to calibrate the intrinsic and extrinsic parameters of the color and depth cameras of the KINECT™ sensor has been proposed. This method gives almost the same calibration accuracy as that given by the manufacturer [10]. In [22], a typical camera calibration procedure using OpenCV [12] has been carried out. There are also different applications, available for Windows and Linux, to accomplish this calibration procedure [23], [24]. In this work, it has been assumed that all the intrinsic and extrinsic parameters of both cameras are similar to those given by PrimeSense Ltd. [10], which exhibit a tolerance error of about ±3 mm [21]. To make possible the coordinate transformation between the depth and color cameras (see Fig. 4), knowledge of the calibration parameters and the transformation matrix is essential. Since this information, jointly with the conversion of the 11-bit output from the 3-D depth sensor into a practical distance in millimeters, is given in advance [10], it is assumed known here. The depth information for each pixel of the captured depth image is obtained through a function from OpenNI. Once the centroid of the target object has been calculated, and after the distortion correction and coordinate transformation have been carried out, the depth and color images are aligned and the occlusion problem is solved (see Fig. 4); that is, the depth information for the centroid $(x_c, y_c)$ can be read directly from the registered depth map.
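A minimal sketch of this lookup, assuming the depth map has already been registered to the color image and converted to millimeters (as described above), and that the centroid computed on the 320×240 working image is scaled back to 640×480:

// Sketch: depth (in mm) at the target centroid. depthMm is assumed to be
// the registered 640x480 depth map; (xc, yc) comes from the 320x240 image.
int xd = xc * 2;                      // scale centroid back to 640x480
int yd = yc * 2;
ushort zMm = depthMm[yd * 640 + xd];  // 3-D depth of the centroid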
Figure 4. From top to bottom: images from the cameras without transformation, images from the cameras with transformation
3 EXPERIMENTAL RESULTS
In order to evaluate the real-time performance of our method via experimental results, we use a 5-DOF Mitsubishi robot manipulator. The hardware used is an HP Pavilion G6 notebook, Intel® Core™ i5-2430M CPU @ 2.40 GHz, 6 GB RAM, and a Microsoft KINECT™ sensor for XBOX 360™ [4]. The software used is Windows 7 Home Premium 64-bit SP1 (OS), Visual C# Express 2010, OpenNI 1.5.2.23 [6], PrimeSense 3.1.2 [10], and Emgu CV 2.3.0.1416 [15]. Our goal is to achieve 3-D color-based tracking in real time using the color and depth cameras inside the KINECT™ sensor. We have chosen the gripper of the 5-DOF Mitsubishi robot manipulator as the target object. From Fig. 3, it can be seen that the gripper is blue. We have established a 3-D rectangular path as the prescribed trajectory for the gripper. Figure 5 shows the 3-D trajectory described by the gripper as a solid line, with measurements from the KINECT™ sensor represented by dots.
Figure 5. Trajectory described by the gripper (solid line) and tracking measurements from the KINECT™ sensor (dots)

Figure 6 exhibits the deviations of the KINECT™ sensor measurements, for each axis, with respect to the trajectory described by the gripper. The overall procedure for segmentation, post-processing, and display of the results on the GUI takes an average time of 5 to 6 milliseconds at a frame rate of 30 Hz. The experimental results in real time show that the proposed approach achieves 3-D color-based object tracking via the KINECT™ sensor.
Figure 6. Tracking errors: a) [mm vs n]; b) [mm vs n]; c) [mm vs n]

4 CONCLUSION
In this work, 3-D color-based object tracking using a KINECT™ sensor has been carried out in real time. It can be seen from the experimental results that the proposed approach accomplishes 3-D color-based tracking with good accuracy. The tracking algorithm is simple and fast. Through a GUI, the user can make adjustments in real time, such as ignoring objects or choosing another color for the target to track. It is important to note that all the software used here is open source and free. The validation of the proposed approach in real time is an interesting step toward the control of dynamic systems using a low-cost vision system.
5 ACKNOWLEDGMENT
The authors would like to thank UVM Campus Torreón for allowing them the use of its facilities. The authors would also like to thank Luis A. Vázquez for his assistance with the operation of the 5-DOF Mitsubishi robot. The second author thanks Cecilia Correa for her assistance with the KINECT™ sensor setup. This work has been supported by PROMEP under grant ITLAG-EXB-000.
REFERENCES

[1] Bradski GR. Real time face and object tracking as a component of a perceptual user interface. Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision. 1998.
[2] Comaniciu D, Ramesh V, Meer P. Real-time tracking of non-rigid objects using mean shift. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2000.
[3] ChipWorks company. Available at: http://www.chipworks.com (Accessed on: May 16, 2012).
[4] Support XBox site. Available at: http://support.xbox.com (Accessed on: Dec 20, 2011).
[5] OpenKinect community. Available at: http://openkinect.org (Accessed on: Dec 20, 2011).
[6] OpenNI group. Available at: http://openni.org/ (Accessed on: Jan 6, 2012).
[7] Willow Garage site. Available at: http://www.willowgarage.com/ (Accessed on: Apr 5, 2012).
[8] Asus company. Available at: http://www.asus.com/ (Accessed on: Dec 20, 2012).
[9] Side Kick company. Available at: http://www.sidekick.co.il/ (Accessed on: May 10, 2012).
[10] PrimeSense Ltd. Available at: http://primesense.com (Accessed on: Jan 10, 2012).
[11] Stowers J, Hayes M, Bainbridge-Smith A. Altitude control of a quadrotor helicopter using depth map from Microsoft Kinect sensor. Proceedings of the IEEE International Conference on Mechatronics. Istanbul, Turkey, April 13-15, 2011.
[12] OpenCV libraries. Available at: http://opencv.willowgarage.com/wiki/ (Accessed on: Nov 10, 2011).
[13] Nakamura T. Real-time 3-D object tracking using Kinect sensor. Proceedings of the IEEE International Conference on Robotics and Biomimetics. Phuket, Thailand, December 7-11, 2011.
[14] C Sharp reference. Available at: http://msdn.microsoft.com/en-us/library/aa287558 (Accessed on: Dec 10, 2011).
[15] Emgu CV libraries. Available at: http://emgu.com (Accessed on: Dec 10, 2011).
[16] Penn State's Visualization Group. Available at: http://viz.aset.psu.edu (Accessed on: Dec 15, 2011).
[17] Smith AR. Color gamut transform pairs. Proceedings of the SIGGRAPH Conference. New York, New York, USA, August 1978.
[18] Sural S, Qian G, Pramanik S. Segmentation and histogram generation using the HSV color space for image retrieval. Proceedings of the IEEE International Conference on Image Processing. 2002.
[19] Zhao M, Bu J, Chen C. Robust background subtraction in HSV color space. Multimedia Systems and Applications. Boston, Massachusetts, USA, 4861:325, 2002.
[20] Serra J. Image Analysis and Mathematical Morphology. London: Academic Press, 1982.
[21] Herrera Castro D, Kannala J, Heikkilä J. Accurate and practical calibration of a depth and color camera pair. Proceedings of the International Conference on Computer Analysis of Images and Patterns. 2011.
[22] Kramer J, Parker M, Burrus N, Echtler F. Hacking the Kinect. Apress, 2012.
[23] Burrus N. Kinect RGB Demo. Available at: http://labs.manctl.com/rgbdemo/ (Accessed on: Dec 11, 2011).
[24] Computer vision and pattern recognition group. Kinect calibration utility. Available at: http://www.cse.usf.edu/gaherna2/kinect (Accessed on: Dec 10, 2011).