A Fast Floor Segmentation Algorithm for Visual-based Robot Navigation

F. Geovani Rodríguez-Telles, L. Abril Torres-Méndez
Robotics and Advanced Manufacturing Group, CINVESTAV Campus Saltillo, Ramos Arizpe, Coahuila, Mexico
Emails: [email protected], [email protected]

Edgar A. Martínez-García
Lab. de Robótica, Inst. of Eng. and Tech., Universidad Autónoma de Cd. Juárez, Juárez, Chihuahua, Mexico
Email: [email protected]

Abstract—We present a novel technique that robustly segments free space for robot navigation purposes. In particular, we are interested in reactive visual navigation, in which the rapid and accurate detection of free space where the robot can navigate is crucial. Contrary to existing methods that use multiple cameras in different configurations, we use a downward-facing monocular camera to search for free space in a large and complicated room environment. The proposed approach combines two techniques. First, we apply the Simple Linear Iterative Clustering (SLIC) superpixel algorithm to the input images. Then, relying on particular characteristics of floor superpixels, we use a simple classification method based on the SSD similarity measure to group together those superpixels that belong to the floor (considered as free space). The method intermittently examines low-resolution images (80 × 60) in the CIE Lab color model. Experimental results show that our segmentation approach is robust, even in the presence of severe specular reflections, and allows for real-time navigation.

Keywords: SLIC superpixels; segmentation

I. INTRODUCTION

In robotics, the term navigation refers to the capability of moving autonomously in the environment. The study of navigation in living beings has a long history in neuroscience [7], [5], [18]. Discoveries over the last fifty years have provided a physiological grounding for how spatial locations are represented in our brain. Those spatial locations are usually linked to the relationships we build between the information captured from the environment through our senses and the conceptual information based on previous knowledge about the functional characteristics of the environment. When walking through an environment, we have all experienced the need to perceive our spatial sense, also known as spatial awareness or proximity sense; that is, to know the dimensions our body occupies with respect to the empty space of the environment in which we can walk. Because of this spatial sense, we are able to get a clearer perception of how much we have moved forward relative to where we were, and to associate it with what we see next. When driving a new car, we need to adjust our spatial sense to the size of the car. Our brain does this adjustment while driving, not before. Most cars now have a rear-view camera


to improve the driver's ability when backing the vehicle. The field of view of these cameras is limited but sufficient to detect empty space (also known as free space) up to a certain distance. What drivers do is a quick detection of the free space in the captured images. Estimating the spatial sense of a mobile robot and how far it must travel toward free space is somewhat similar to the car example. The detection of free space is linked to the robot's motor capabilities and the type of environment. For the case of a structured indoor environment and a mobile robot that translates in a forward direction and rotates (pans) around its vertical axis, the floor represents the free space to navigate. Also, by knowing where the floor is, obstacle avoidance can be easily achieved. To segment the floor, certain patterns in the set of image pixels need to be recognized. Texture and color are both important characteristics. However, specular reflections and shadows on floors due to illumination conditions are commonplace. This is one of the main difficulties faced by most segmentation algorithms. There is a vast amount of research on textural image segmentation. These techniques can be classified according to the use of contours, features or regions [15], [21], and the level of supervision or prior knowledge of the kind of texture to segment [4]. For the particular case of segmenting the floor for robot navigation, many techniques try to solve the problem by detecting the immediate region around the robot [13], [24], [23]. Other techniques focus on detecting the specific wall-floor boundary [25]. However, of the above techniques, only a few allow for real-time processing; for robotics applications this is a requirement. Moreover, some techniques make assumptions about the environment or require special conditions, such as the ceiling being visible. In this paper, we present a novel algorithm that segments the floor of low-resolution images in real time, thus allowing for mobile robot navigation. Contrary to existing approaches, our method uses neither homographies nor optical flow, as the ground plane constraint is not considered. Therefore, the camera does not need to be calibrated. We exploit the fact that high-resolution information is not needed for

navigation, as has been demonstrated in studies of the behavior of human drivers under adverse conditions [9], [17], [2]. Our technique applies the Simple Linear Iterative Clustering (SLIC) superpixel algorithm [1] to low-resolution images (80 × 60). A relevant characteristic of this algorithm is its linear computational complexity, unlike existing superpixel segmentation algorithms, which are of complexity O(n log n) or greater. Thus, the objects and the free space (the floor) can be segmented in a fast and efficient way. Each superpixel region is represented by a pixel (called the superpixel), usually located at the center of the region, which carries the region's color and texture characteristics. The superpixels in the image are then grouped together according to their common characteristics. Our approach uses a small set of "floor" superpixels from which relationships between greyscale and color subspaces are used to classify the rest of the superpixels. The main idea is to use the independent variables (the color, position, and area shape of the superpixels) to predict the behavior of the dependent variable (whether or not a superpixel belongs to the floor). We have carried out experiments on images from several indoor environments at different times in order to show the feasibility of our approach.

The outline of the paper is as follows. Section 2 presents related work. The free-space detection methodology we propose is presented in Section 3. The analysis to decrease the time complexity and some experimental results are described in Section 4. Finally, in Section 5, conclusions and future work are given.

II. RELATED WORK

Detection of free space is an important problem for mobile robot navigation. A great deal of research has focused on the obstacle avoidance problem. These techniques search for free space in the immediate region around the robot. They use information about the disparity or motion of pixels that lie on the ground plane, for example by using stereo cameras [19] or optical flow [22]. Some researchers have considered the floor detection problem as a more natural way to search for free space. Techniques can be divided into those that automatically segment the floor, those that estimate the lines dividing the floor from the walls, and those that focus on segmenting images into regions with an associated class. These techniques also assume the ground plane constraint. For example, Zhou and Li [26] estimate the planar homographies that are best associated with the ground plane. However, the resulting segmentation is sparse, since only point features of the ground are extracted. Ohnishi and Imiya [16] calculate the optical flow in an image sequence, and the dominant flow is considered to be the floor. The main drawback of their approach is the assumption that the floor is the largest region in the image and that the camera displacement is small. Li and Birchfield [12] use multiple cues to compute boundary lines between floor and walls. Although

their approach is well suited to corridors, it may not adapt to open spaces with multiple obstacles or more complicated structures. In indoor scenes, illumination plays an important role, since it causes specular reflections on surfaces, especially on floors, as well as shadows from objects in the scene. Image-based segmentation techniques must be robust enough to cope with these effects.

In recent years, the use of superpixels for image segmentation has become very popular. Superpixel algorithms group pixels with similar characteristics, for example according to their color (in one or more color models) and their position in the image. The goal of grouping these pixels is to reduce image complexity and facilitate post-processing, since operations are carried out on the superpixels instead of on the total number of image pixels. There are several approaches to generating superpixels [20], [6], [14], [10]. The most recent one, the SLIC superpixel algorithm proposed by Achanta et al. [1], presents clear advantages in terms of speed, ability to adhere to image boundaries, and impact on segmentation performance. SLIC is a simple algorithm that adapts k-means clustering to generate superpixels. It has linear complexity in the number of pixels N (independent of the number of superpixels K). Color similarity and spatial proximity are combined through a weighted distance measure. This measure also provides control over the size and compactness of the superpixels. Once superpixels are computed, they can be processed to segment the objects in the image. A good approach to segmenting with superpixels is to compute color, texture, geometry, and location features. Segmentation approaches can be classified as edge-based, clustering-based, region-based, threshold-based and graph-based [11]. However, they may not be suitable for real-time applications. Another important aspect is that segmentation algorithms should be robust enough to cope with specular reflections as well as certain types of shadows.

III. OUR METHOD

The Pioneer 3DX mobile robot is equipped with a PointGrey DragonFly Express camera that provides color images at a 480 × 640 pixel resolution. The camera is centered on top of the robot and directed downwards in order to capture the floor (see Figure 1). Detection of free space consists of two stages: a) image segmentation using SLIC superpixels; b) floor segmentation using a simple classifier.

A. Image Segmentation using SLIC superpixels

The SLIC superpixel segmentation algorithm [1] starts from a regular grid of centers or segments, and grows the superpixels by clustering pixels around the centers. At each iteration, the centers are updated and the superpixels are grown again. The original method is efficient but does not run in real time. We solve this problem by subsampling the image to a small resolution of 80 × 60 pixels.

Figure 1. Our mobile robot with the camera.

A parameter to define is the desired number of superpixels K in the image. The approximate size of each superpixel is then N/K, where N is the number of pixels in the image. For superpixels having a similar number of pixels, a superpixel center Ck is set at regular grid intervals S = √(N/K), as in the original SLIC formulation [1]. Each superpixel center is a 5-tuple defined by Ck = [lk, ak, bk, xk, yk], with k = 1, ..., K, where lk, ak and bk are the corresponding channels in the CIELab color space and (xk, yk) are the pixel coordinates. The SLIC algorithm is summarized in Algorithm 1 (for more details of this algorithm the reader is directed to [1]).


Algorithm 1: SLIC superpixel segmentation

1: Initialize cluster centers Ck = [lk, ak, bk, xk, yk]T by sampling pixels at regular grid steps S.
2: Move each cluster center to the lowest gradient position in an n × n neighborhood.
3: Set label l(i) = −1 for each pixel i.
4: Set distance d(i) = ∞ for each pixel i.
5: repeat
6:   for each cluster center Ck do
7:     for each pixel i in a 2S × 2S region around Ck do
8:       Compute the distance D between Ck and i.
9:       if D < d(i) then
10:        d(i) ← D
11:        l(i) ← k
12:      end if
13:    end for
14:  end for
15:  Compute new cluster centers.
16:  Compute the residual error E.
17: until E ≤ threshold
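This stage maps directly onto off-the-shelf implementations. Below is a minimal sketch using scikit-image's slic function and OpenCV for the subsampling step; the input file name is hypothetical, and the compactness value is an assumption of ours, not a parameter reported in the paper.

import cv2
import numpy as np
from skimage.segmentation import slic

# Load a frame and subsample it to 80 x 60 pixels, as proposed above.
image = cv2.imread("frame.png")  # hypothetical input file
small = cv2.resize(image, (80, 60), interpolation=cv2.INTER_AREA)
small_rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)

# K = 300 superpixels; convert2lab=True clusters in CIELab, as in Algorithm 1.
# compactness=10 is an assumed value balancing color and spatial proximity.
labels = slic(small_rgb, n_segments=300, compactness=10, convert2lab=True)

print(labels.shape)             # (60, 80): one superpixel label per pixel
print(np.unique(labels).size)   # number of superpixels actually produced

Note that SLIC does not always return exactly n_segments clusters, so the resulting number of superpixels can differ slightly from K.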

We set the number of superpixels to K = 300. Thus, for an image of size 480 × 640, the approximate size of each superpixel is 1024 pixels and, for an image of 60 × 80 pixels, the average size is 16 pixels.

B. Floor segmentation

In order to segment the floor from the other objects in an indoor scene, we build a graph using the superpixels obtained in the previous section. The edges indicate whether two nodes belong to the same object. The nodes of the graph are the superpixels in the image, and each contains information about the area (number of pixels) of its superpixel, its coordinates in the image, and the values of its L, a, b channels. Additionally, we consider the shape of each superpixel's area. We have observed that the shapes of superpixel areas near object boundaries are not regular (see Fig. 2), and we exploit this characteristic in our segmentation algorithm.

Figure 2. An example of how the shape of the SLIC superpixels changes according to the objects in the scene. Panel (a) shows the SLIC superpixel segmentation. Panel (b) shows the zoomed-in image regions highlighted by the yellow and red squares. It can be observed that the shapes of the superpixels near object boundaries are irregular, whereas the shapes of superpixels on textured surfaces of the same objects are more regular (square-shaped).

We start by defining a dependent variable whose value indicates the probability that a superpixel belongs to the floor. The training phase uses the value of this dependent variable for a small set of superpixels that are considered part of the floor. We assume that the superpixels in the center columns of the two bottom rows of the images captured by our robot always correspond to the floor (given our camera setting, this is always true). Thus, the shape of the area covered by a superpixel and its color are the main characteristics considered by our classifier. Specifically, in order to determine the probability that a superpixel belongs to the floor, we consider the following aspects, which result from observations of the superpixel segmentation results:

• The color of the superpixels (i.e., of the center pixel in the superpixel area) belonging to the floor should be very similar.
• The superpixels located in the upper region of the image are less likely to be part of the floor.
• The shape (bounding box) of a superpixel area that contains pixels with very similar texture tends to be regular, like a square.



Thus, the independent variables that we consider for each superpixel are: 1) the lk, ak and bk channels of the superpixel (2 variables); 2) the actual area of the superpixel (1 variable); 3) the width, height and diagonal measures of the superpixel area (4 variables); and 4) xk, yk, the superpixel cluster center coordinates (2 variables). Therefore, the total number of independent variables is 8. We use a normalized SSD measure to classify whether or not a superpixel belongs to the floor. We first compute the mean values of each independent variable over the superpixels in the training set. Then, these mean values are subtracted from the corresponding independent variables of each superpixel to be classified. For the classification, we define a threshold value equal to the maximum normalized SSD measure obtained on the training set plus a coefficient d with a small value, so as not to restrict the segmentation of the floor too much.
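The classifier is simple enough to express in a few lines. The following is a sketch under our reading of the description above, with the a and b channels standing in for the color variables; the exact feature layout, the normalization, and the value of d are assumptions of ours, not details reported in the paper.

import numpy as np

def superpixel_features(labels, lab):
    # One feature vector per superpixel: color, area, bounding box, centroid.
    feats = []
    for k in np.unique(labels):
        ys, xs = np.nonzero(labels == k)
        a_mean = lab[ys, xs, 1].mean()           # a channel
        b_mean = lab[ys, xs, 2].mean()           # b channel
        area = xs.size                           # number of pixels
        w, h = np.ptp(xs) + 1, np.ptp(ys) + 1    # bounding-box width/height
        diag = np.hypot(w, h)                    # bounding-box diagonal
        cx, cy = xs.mean(), ys.mean()            # cluster-center coordinates
        feats.append([a_mean, b_mean, area, w, h, diag, cx, cy])
    return np.asarray(feats)

def classify_floor(feats, train_idx, d=0.05):
    # Floor / non-floor labels from a normalized SSD threshold.
    mu = feats[train_idx].mean(axis=0)           # per-variable training means
    scale = feats.std(axis=0) + 1e-9             # normalization (assumed form)
    ssd = (((feats - mu) / scale) ** 2).sum(axis=1)
    threshold = ssd[train_idx].max() + d         # max training SSD plus d
    return ssd <= threshold                      # True = floor superpixel

In this sketch, train_idx would select the superpixels whose centers fall in the center columns of the two bottom image rows, which the method assumes are always floor.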

Figure 3. Some examples of applying our floor segmentation algorithm when considering the three Lab channels versus only the a and b channels. We observed that specularities on the floor do not appear in the a channel but mainly in the L channel. The input color image (first column) is 480 × 640 pixels. The second to fourth columns show the images of the L, a and b channels, respectively. The last two columns show the resulting segmentation when applying all three L, a, b channels and when applying only the a and b channels.

IV. EXPERIMENTAL RESULTS

A. Simplifications to improve time complexity

Since we require real-time navigation, we focus mainly on reducing the computational time of our segmentation algorithm as much as possible while keeping its robustness. First, we analyze the effect of using the CIE Lab color space in our classification, since this is the color space the SLIC superpixels use. We note that the false changes in intensity caused by specularities on the floor are captured mostly by the luminance (L) channel. These specularities are present due to the illumination conditions and the properties of the floor itself. However, the a channel basically ignores these specularities, as they contain no red component. Figure 3 shows some examples. The last two columns are the resulting segmentation images when applying the Lab channels and the ab channels, respectively. Superpixels with a blue dot at their center indicate that all pixels in their area belong to the floor, while superpixels with a red dot are considered non-floor.
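As an illustration, this simplification amounts to one color-space conversion and a channel split; a minimal sketch with OpenCV follows (the file name is hypothetical).

import cv2

bgr = cv2.imread("frame.png")               # hypothetical input file
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
L, a, b = cv2.split(lab)                    # specularities live mostly in L
features = cv2.merge([a, b])                # classify on the a and b channels only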

Second, we compare the results of the floor segmentation at different image scales. Fig. 4 depicts examples of the same indoor scene at different resolutions, from the original size of 480 × 640 down to 60 × 80. We observed that the segmentation of the floor remains similar at all scales. This may be due to the fact that very little information is lost (less than 5% of the total), which can be analyzed mathematically by computing the entropy of the images. In fact, we noted that the segmentation results get even better down to a certain scale (in our case 60 × 80). This might be because the texture details tend to smooth out. We ran our algorithm on the onboard computer of the robot. Table I reports the computational times of our floor segmentation algorithm at each image scale.

Figure 4. An example of floor segmentation at different scales using only the a and b channels. It can be observed that the results are similar at all scales; the segmentation gets even better for smaller images, since texture details tend to smooth out.

Table I. Computational times at different image resolutions.

Image size    Time (sec)
480 × 640     3.4850
240 × 320     0.8843
120 × 160     0.2480
60 × 80       0.0928
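The entropy argument can be checked directly. Below is a small sketch assuming scikit-image is available (the file name is hypothetical): shannon_entropy returns bits per pixel, so nearly equal values across scales indicate that downsampling discards little information.

import cv2
from skimage.measure import shannon_entropy

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
for w, h in [(640, 480), (320, 240), (160, 120), (80, 60)]:
    scaled = cv2.resize(gray, (w, h), interpolation=cv2.INTER_AREA)
    print(f"{h} x {w}: {shannon_entropy(scaled):.3f} bits/pixel")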

B. Results

In order to test the performance of our approach, we acquired a set of more than 200 images of typical indoor environments with different floor textures and illumination conditions. We also intentionally placed small objects of different sizes and textures on the floor. This set of images was captured by our mobile robot while remotely driving it through the environment. The images were processed by the onboard computer of the robot in order to measure the exact computational time required for the segmentation task. By using only two channels of the color space and low-resolution images, we significantly decrease the computation time of our algorithm. Figure 5 shows a set of different indoor images. Our approach achieves nearly 90% detection of free space on the images in our database. The segmentation is good even on highly textured floors and when specularities are present.

V. CONCLUSIONS

Reactive robot navigation is generally achieved by integrating information from multiple sensors, such as cameras and range finders, as they tend to have limited capabilities if used alone. In this work, we present a floor segmentation algorithm suitable for real-time mobile robot navigation. We have demonstrated that, with only one monocular camera and low-resolution images, the detection of free space can be robustly achieved. Our approach uses SLIC superpixels to initially segment the input low-resolution image and a simple SSD similarity measure to classify the superpixels that

belong to the floor (free space). The results show that even with specular reflections, shadows cast by distant objects, and small objects on the floor, our method efficiently segments the free space.

ACKNOWLEDGMENT

The authors would like to thank CONACyT for funding this project.

REFERENCES

[1] R. Achanta et al., "SLIC superpixels," École Polytechnique Fédérale de Lausanne (EPFL), Tech. Rep. 149300, 2010.
[2] J. C. Brooks and D. A. Owens, "Effects of luminance, blur, and tunnel vision on postural stability," Journal of Vision, 1(3), 304.
[3] X.-N. Cui, Y.-G. Kim, and H. Kim, "Floor segmentation by computing plane normals from image motion fields for visual navigation," International Journal of Control, Automation, and Systems, 7(5):788–798, 2009.
[4] F. Derraz, A. Taleb-Ahmed, L. Peyrodie, A. Pinti, A. Chikh, and F. Bereksi-Reguig, "Active contours based Bhattacharyya gradient flow for texture segmentation," in 2nd International Congress on Image and Signal Processing (CISP), Tianjin, 2009.
[5] M. J. Farah, "Disorders of visual behavior," in The Handbook of Neuropsychology, F. Boller and J. Grafman, Eds., Elsevier, Amsterdam, pp. 395–413, 1989.
[6] P. Felzenszwalb and D. Huttenlocher, "Efficient graph-based image segmentation," Int'l J. Computer Vision, 59(2):167–181, September 2004.
[7] R. G. Golledge and G. Zannaras, "Cognitive approaches to the analysis of human spatial behaviour," in Environmental Cognition, pp. 59–94, 1973.
[8] Y.-G. Kim and H. Kim, "Layered ground floor detection for vision-based mobile robot navigation," in Proc. IEEE Int'l Conference on Robotics and Automation (ICRA), 2004.
[9] H. W. Leibowitz, C. S. Rodemer, and J. Dichgans, "The independence of dynamic spatial orientation from luminance and refractive error," Perception & Psychophysics, 25(2):75–79, 1979.
[10] A. Levinshtein, A. Stere, K. Kutulakos, D. Fleet, S. Dickinson, and K. Siddiqi, "TurboPixels: fast superpixels using geometric flows," IEEE Trans. Pattern Analysis and Machine Intelligence, 31(12):2290–2297, Dec. 2009.
[11] A. Lucchi, K. Smith, R. Achanta, V. Lepetit, and P. Fua, "A fully automated approach to segmentation of irregularly shaped cellular structures in EM images," in Proc. Int'l Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), 2010.
[12] Y. Li and S. T. Birchfield, "Image-based segmentation of indoor corridor floors for a mobile robot," in Proc. IEEE/RSJ Conference on Intelligent Robots and Systems (IROS), 2010.


[13] L. M. Lorigo, R. A. Brooks, and W. E. L. Grimson, "Visually-guided obstacle avoidance in unstructured environments," in IEEE/RSJ Intelligent Robots and Systems (IROS), 1997, vol. 1, pp. 373–379.
[14] A. Moore, S. Prince, J. Warrell, U. Mohammed, and G. Jones, "Superpixel lattices," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2008.
[15] S. Mukhopadhyay and B. Chanda, "Multiscale morphological segmentation of gray-scale images," IEEE Transactions on Image Processing, 12(5):533–549, 2003.
[16] N. Ohnishi and A. Imiya, "Model-based plane-segmentation using optical flow and dominant plane," in MIRAGE, vol. 4418 of Lecture Notes in Computer Science, pp. 295–306, Springer, 2007.
[17] D. A. Owens, "Twilight vision and road safety," in J. Andre, D. A. Owens, and L. O. Harvey Jr. (Eds.), Visual Perception: The Influence of H. W. Leibowitz, Washington: American Psychological Association, 2009.
[18] M. Riddoch and G. Humphreys, Neuropsychology of Visual Perception, Hillsdale: Lawrence Erlbaum Associates, pp. 79–103, 1989.
[19] K. Sabe, M. Fukuchi, J.-S. Gutmann, T. Ohashi, K. Kawamoto, and T. Yoshigahara, "Obstacle avoidance and path planning for humanoid robots using stereo vision," in Proc. IEEE International Conference on Robotics and Automation (ICRA), 2004.
[20] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888–905, Aug. 2000.
[21] W. Shuang, M. Xiuli, Z. Xiangrong, and J. Licheng, "Watershed-based textural image segmentation," in International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Xiamen, pp. 312–315, 2007.
[22] N. O. Stoffler, T. Burkert, and G. Farber, "Real-time obstacle avoidance using an MPEG-processor-based optic flow sensor," in International Conference on Pattern Recognition (ICPR), Barcelona, September 2000.
[23] I. Ulrich and I. Nourbakhsh, "Appearance-based obstacle detection with monocular color vision," in Proc. AAAI, 2000.
[24] H. Wang, K. Yuan, W. Zou, and Y. Peng, "Real-time obstacle detection with a single camera," in IEEE Industrial Technology (ICIT), 2005, pp. 92–96.
[25] Y. Li and S. T. Birchfield, "Image-based segmentation of indoor corridor floors for a mobile robot," in IEEE/RSJ Intelligent Robots and Systems (IROS), 2010, pp. 837–843.
[26] J. Zhou and B. Li, "Robust ground plane detection with normalized homography in monocular sequences from a robot platform," in Proc. International Conference on Image Processing (ICIP), 2006.

Figure 5. Examples of floor segmentation in different indoor scenes. Note how the segmentation of the floor works well even in the presence of small objects on the floor.
