Supervised Traversability Learning for Robot Navigation

Ioannis Kostavelis, Lazaros Nalpantidis, and Antonios Gasteratos

Laboratory of Robotics and Automation, Department of Production and Management Engineering, Democritus University of Thrace, Xanthi, Greece
{gkostave,lanalpa,agaster}@pme.duth.gr

Abstract. This work presents a machine learning method for terrain traversability classification. Stereo vision is used to provide the depth map of the scene. Then, a v-disparity image calculation and processing step extracts suitable features describing the scene's characteristics. The resulting data are used as input for the training of a support vector machine (SVM). The traversability classification is evaluated with a leave-one-out cross validation procedure applied on a test image data set, which includes manually labeled traversable and non-traversable scenes. The proposed method is able to classify the scene of further stereo image pairs as traversable or non-traversable, which is often the first step towards more advanced autonomous robot navigation behaviours.

Keywords: traversability classification, robot navigation, SVM, machine learning, stereo vision, v-disparity image

1 Introduction

The development of an efficient method for inspecting the traversability of a scene is an active research topic [5,2]. Obstacle detection and traversability evaluation are important, as they both provide crucial information for the navigation of mobile robots. The goal of this work is the development of a trained system capable of detecting non-traversable scenes using solely stereo vision input. This task demands reliable and robust machine vision algorithms. Towards this direction, a stereo vision algorithm is used, which retrieves information about the environment from a stereo camera and produces the disparity map of the scene. The v-disparity image is then calculated based on the disparity map [17] and a feature extraction procedure extracts useful information from the v-disparity image. This procedure is repeated for all the available v-disparity images in order to form a data set. The data set is then utilised for the training and the testing phase of a support vector machine (SVM) classifier. Fig. 1 summarises the steps of the proposed methodology.

R. Groß et al. (Eds.): TAROS 2011, LNAI 6856, pp. 289–298, 2011. © Springer-Verlag Berlin Heidelberg 2011


Fig. 1. Flow chart of the proposed methodology

1.1 Related Work

Stereo vision is often used in vision-based robotics, instead of monocular sensors, due to the simpler calculations involved in the depth estimation. A correspondence search between the two stereo images can provide dense information about the depth of the depicted scene. Recently, efficient stereo algorithms have been presented that can provide accurate and reliable depth estimations at frame rates suitable for autonomous robotic applications [11].

Estimation of terrain traversability has been of interest to the mobile robotics community for decades; as early as 1994, a statistics-based method for classifying field regions as traversable or not was proposed in [7]. More recently, the development of autonomous planetary rovers and the DARPA Grand Challenge have triggered rapid advancements in the field [5,13,15]. Machine learning methodologies have often been employed [12] and stereo vision has been widely used as input for such systems [4,6]. One of the most popular methods for terrain traversability analysis is the initial estimation of the v-disparity image [2]. This method is able to cope with the noise in low quality disparity images [17,14] and to model the terrain as well as any existing obstacles.

Several researchers have proposed robot navigation methods based on terrain classification. These methods use features derived from remote sensor data, such as colour, image texture and surface geometry. Initially, colour-based methods were proposed for the classification of outdoor scenes using mixture-of-Gaussians models [8]. Additionally, in [3] a terrain classification method based on texture features of the images was introduced. A more sophisticated and computationally demanding method based on 3D point clouds, which uses their statistical distribution in space, has been proposed in [9]. Those navigation methods are applied to traversable and non-traversable scenes alike, consuming valuable resources and time.
For efficient robot navigation, non-traversable scenes should not be examined in detail. Therefore, scenes should firstly be inspected and classified according to their overall traversability. Supervised machine learning techniques have the advantage that, while the training phase is slow, the classification of new unseen instances is performed very fast. The main contribution of the present work is that the training phase can be significantly accelerated by using features of the v-disparity image, rather than features of the input images or the depth map. This is a more abstract description of the scene and reduces the general problem to a simpler one. Once the scene is classified as traversable, higher level navigation algorithms can be applied.

2 Algorithm Description

2.1 Stereo Vision and V-Disparity Image Computation

The disparity maps are computed using a local stereo correspondence algorithm [10]. The utilised stereo algorithm combines low computational complexity with appropriate data processing. Consequently, it is able to produce dense disparity maps of good quality at frame rates suitable for robotic applications. The main attributes that differentiate this algorithm from the majority of the others are that the input images are enhanced by superimposing the outcome of a Laplacian of Gaussian (LoG) edge detector, and that the matching cost aggregation step consists of a sophisticated Gaussian-weighted sum of absolute differences (SAD) rather than a simple constant-weighted one. Furthermore, the disparity selection step is a simple winner-takes-all choice, as the absence of any iteratively updated selection process significantly reduces the computational load of the overall algorithm. The per-pixel optimum disparity values are filtered in two consecutive steps. Firstly, the reliability of the selected disparity value is validated. That is, for every pixel of the disparity map a certainty measure is calculated, indicating the likelihood that the pixel's selected disparity value is the right one. The certainty measure cert is calculated for each pixel (x, y) as in Eq. 1:

cert(x, y) = | SAD(x, y, disp(x, y)) − (1/d) Σ_{z=0}^{d−1} SAD(x, y, z) |    (1)

According to this, the certainty cert for a pixel (x, y) that the computed disparity value disp(x, y) is actually right equals the absolute value of the difference between the minimum matching cost SAD(x, y, disp(x, y)) and the average matching cost for that pixel over all d candidate disparity levels. This measure evaluates how much the selected disparity value is differentiated from the remaining candidates. The more differentiated the disparity value is, the more probable it is that the selected minimum is a real one and not due to noise or other effects.


A threshold is applied to this metric: only pixels whose certainty-to-value ratio cert(x, y) / SAD(x, y, disp(x, y)) is equal to or greater than 30% are counted as valid. The value of this threshold has been chosen after exhaustive experimentation so as to reject as many false matches as possible while retaining the majority of the correct ones. Moreover, a bidirectional consistency check is applied: the selected disparity values are approved only if they are consistent irrespective of which image is the reference and which is the target. Thus, even more false matches are disregarded. The outcome of the presented stereo algorithm is a sparse disparity map, as shown in Fig. 2(c), containing disparity values only for the most reliable pixels. The remaining pixels, shown in black in Fig. 2(c), are not considered at all.
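As an illustrative sketch (not the authors' implementation), the certainty measure of Eq. 1 and the 30% ratio threshold can be applied to a matching-cost volume as follows; `sad[y, x, z]` is assumed to hold the aggregated SAD cost of pixel (x, y) at candidate disparity z:

```python
import numpy as np

def filter_disparities(sad, ratio_threshold=0.30):
    """Winner-takes-all disparity selection followed by certainty filtering."""
    disp = np.argmin(sad, axis=2)               # disp(x, y): cheapest candidate
    min_cost = np.min(sad, axis=2)              # SAD(x, y, disp(x, y))
    mean_cost = np.mean(sad, axis=2)            # average cost over all d candidates
    cert = np.abs(min_cost - mean_cost)         # certainty measure of Eq. 1
    valid = cert >= ratio_threshold * min_cost  # cert / SAD ratio of at least 30%
    return np.where(valid, disp, -1)            # -1 marks rejected pixels

# Tiny example: a 1 x 2 image with 3 candidate disparities per pixel.
sad = np.array([[[1.0, 9.0, 9.0],    # pronounced minimum -> high certainty, kept
                 [5.0, 5.0, 5.1]]])  # flat costs -> low certainty, rejected
print(filter_disparities(sad))       # [[ 0 -1]]
```

The bidirectional consistency check is omitted here for brevity; it would simply intersect the valid masks obtained with each image as reference.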

(a) Left image

(b) Right image

(c) Disparity map

Fig. 2. A stereo image pair and the resulting disparity map

Using the sparse disparity map obtained from the stereo correspondence algorithm, a reliable v-disparity image can be computed, as shown in Fig. 3(a). In a v-disparity image each pixel has a positive integer value that denotes the number of pixels in the input image that lie on the same image line (its ordinate) and have a disparity value equal to its abscissa. The terrain in the v-disparity image is modelled by a linear equation. The parameters of this linear equation can be found using the Hough transform [2], provided that the majority of the input images' pixels belong to the terrain and not to obstacles. A tolerance region on both sides of the terrain's linear segment is considered, and any point outside this region can be safely considered as originating from an obstacle. The linear segments denoting the terrain and the tolerance region, overlaid on the v-disparity image, are shown in Fig. 3(b).
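The v-disparity construction described above can be sketched as follows (an illustrative sketch, not the authors' code; invalid pixels of the sparse disparity map are assumed to be marked with −1 and are skipped):

```python
import numpy as np

def v_disparity(disp, d_levels):
    """v_disp[row, d] counts the pixels on `row` whose disparity equals d."""
    v_disp = np.zeros((disp.shape[0], d_levels), dtype=int)
    for r in range(disp.shape[0]):
        valid = disp[r][disp[r] >= 0]  # drop pixels rejected by the filtering
        v_disp[r] = np.bincount(valid, minlength=d_levels)[:d_levels]
    return v_disp

disp = np.array([[2, 2, -1, 1],   # a 2 x 4 sparse disparity map
                 [0, 0, 0, 2]])
print(v_disparity(disp, 3))
```

Fitting the terrain line with the Hough transform would then operate on this histogram image; that step is left out of the sketch.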

2.2 Feature Extraction and SVM-Based Learning

(a) Calculated v-disparity image

(b) V-disparity image with terrain modelled by the continuous line and the tolerance region shown between the two dashed lines

Fig. 3. V-disparity images for a stereo image pair

A feature extraction procedure is then applied to the v-disparity image. For each scanline of a v-disparity image, e.g. that of Fig. 3(b), the values of the pixels lying outside the tolerance region are aggregated. The outcome of this procedure is a feature vector for each v-disparity image (or equivalently, for each stereo image pair) that has as many dimensions as the number of the image's scanlines, where each value stems from the aforementioned aggregation. More specifically, for a given v-disparity image of M × N dimensions, the output of the feature extraction method is a vector x = [x1, x2, ..., xM], where xi denotes the sum of the pixels lying outside the tolerance region for row i, with i = 1, 2, ..., M indexing the rows of the disparity map. As an example, the pixels of the v-disparity image whose values are aggregated to obtain the x50 component of the feature vector are the ones lying within the red rectangular regions of Fig. 4. In this figure the red rectangular regions are exaggerated for readability. The feature vectors for all the stereo pairs of the used data set comprise a data matrix D = [x1, x2, ..., xL]^T, where each vector xj corresponds to a stereo pair and j = 1, 2, ..., L indexes the samples. For the evaluation of our method a small data set consisting of 23 traversable and 10 non-traversable indoor scenes was used. One traversable and one non-traversable sample reference image are given in Fig. 5(a) and Fig. 6(a) respectively. The traversability of those scenes is deduced from the distance of the closest object to the camera, as will be discussed in the next section. The next step of the proposed methodology is the training procedure for the aforementioned data set. This is an off-line and, therefore, non time-critical part, so we chose to use an SVM classifier [16]. For the classification, the LIBSVM library was used. More detailed information about the selected library and about the selection of the optimal parameters can be found in [1]. The SVM approach constructs a binary classifier for each pair of classes by building a function that is positive for one class (i.e. traversable) and negative for the other (i.e. non-traversable). Linear,


Fig. 4. Feature extraction for the 50th image line

polynomial and Gaussian kernels have been tested. The model regularisation parameter C, which penalises large errors, is chosen equal to 100 in order to optimise data separation. The parameter γ, used to control the width of the Gaussian distribution, was set to 0.1. As for the polynomial kernel, the second order polynomial function was chosen for testing, since the third order polynomial did not offer any additional classification gain. In order to validate the results, the proposed method has also been tested using a k-nearest neighbour (k-nn) classifier. The limited size of the data set makes this additional examination of the method's efficiency necessary. Towards this direction, overtraining effects and polarised classification, which may stem from SVM classification, were examined. The parameter of the k-nearest neighbour classifier which gave the greatest separability between the two classes was also selected using a leave-one-out cross validation procedure and set to k = 5 neighbours.
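The feature extraction and the evaluation protocol above can be sketched together as follows. This is an illustrative reconstruction, not the authors' code: it uses scikit-learn rather than the LIBSVM interface the paper used, the `inside` tolerance-band mask is a hypothetical input, and the training data are synthetic stand-ins for the real feature vectors:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def feature_vector(v_disp, inside):
    """x_i = sum of v-disparity entries on row i lying outside the band."""
    return np.where(inside, 0, v_disp).sum(axis=1)

# Synthetic stand-ins: 23 "traversable" and 10 "non-traversable" samples,
# matching the class sizes reported in the paper.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (23, 20)),
               rng.normal(4.0, 1.0, (10, 20))])
y = np.array([1] * 23 + [0] * 10)

classifiers = {
    "5-nn": KNeighborsClassifier(n_neighbors=5),
    "poly SVM": SVC(kernel="poly", degree=2, C=100, gamma=0.1),  # paper's settings
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
    print(f"{name}: {acc:.1%}")
```

On the real, weakly separated data set the two classifiers diverge (74.3% vs 91.2%); the synthetic classes here are well separated, so both score near 100%.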

3 Experimental Validation

The methodology described in this paper does not require a balanced data set for the SVM training phase. In the used input image pairs the traversable scenes were more numerous than the non-traversable ones (i.e. 23 traversable scenes and 10 non-traversable scenes). The used images had a resolution of 512 × 384 pixels. Each scene was manually labeled according to the distance of the nearest depicted obstacle. If that distance was less than a threshold value, set in our experiments to 50 cm, the scene was labeled as non-traversable. Otherwise, the scene was considered traversable. Fig. 5 and Fig. 6 present the

(a) Reference image

(b) Sparse disparity map

(c) V-disparity image

(d) Obstacles highlighted on the reference image

(e) Histogram of features

Fig. 5. Reference image and experimental results for a traversable scene tested

experimental results for an indicative traversable and non-traversable scene, respectively. The reference (left) images of each stereo image pair are given in Fig. 5(a) and Fig. 6(a). The corresponding sparse disparity maps are given in Fig. 5(b) and Fig. 6(b). These disparity maps are used for the computation of the v-disparity images, given in Fig. 5(c) and Fig. 6(c). The obstacles indicated by these v-disparity images are highlighted in red in Fig. 5(d) and Fig. 6(d). Finally, Fig. 5(e) and Fig. 6(e) show the results of the v-disparity image's feature extraction process as a histogram of the detected features in each scanline. The k-nearest neighbour classifier achieves a 74.3% classification rate in leave-one-out cross validation with k = 5 nearest neighbours. Considering that this classifier is inherently prone to errors, it can be deduced that the preprocessing procedures are efficient and that the feature extraction method indeed creates features which contain crucial information about the traversability of the scenes. Additionally, the SVM classifier achieved a 91.2% classification rate using the second order polynomial kernel. This indicates strong separability between the two classes, considering that this classification rate corresponds to 30 correct out of 33 classified samples. The SVM classifier with a linear kernel, as well as with a Gaussian one, achieved success rates between those of the other two classifiers. Table 1 presents the classification rates for the different classifiers that have been tested.


(a) Reference image

(b) Sparse disparity map

(c) V-disparity image

(d) Obstacles highlighted on the reference image

(e) Histogram of features

Fig. 6. Reference image and experimental results for a non-traversable scene tested

Table 1. Classification rate for different classifiers

Classifier Type     Classification Rate
k-nn                74.30%
Linear SVM          87.88%
Gaussian SVM        81.83%
Polynomial SVM      91.20%

4 Conclusion

A traversability classification system for autonomous robot navigation has been proposed. The system consists of an optimised stereo algorithm that ultimately produces v-disparity images. The v-disparity images are then coded, in a novel and simple way, so as to form the feature vectors of a two-class data set. The scenes are manually labeled as traversable or non-traversable and an SVM classifier is trained to separate the two classes. The efficiency of our method was first tested using a k-nearest neighbour classifier, which achieved a 74.3% classification rate, and then using an SVM classifier that employed a second order polynomial kernel. This classifier achieved a classification rate of 91.2%. The high classification ability achieved stems from the production of noise-free disparity maps, which result in reliable v-disparity images.


This initial step is very important for the success of the proposed method, as the input feature vectors of the SVM classifier contain crucial and concise information about the traversability of the scene. The high efficiency rate encourages the use of the traversability inspection system for the primary exploratory analysis of a scene. The trained system can be used for autonomous robot navigation, diminishing the computational cost and minimising the required on-line execution time. It should be noted that the aforementioned classification rates were obtained using a very limited set of input images, i.e. only 33 stereo image pairs. Moreover, the set of input images was not balanced: the traversable scenes were more than twice as numerous as the non-traversable ones. To conclude, the experimental results of the proposed methodology are encouraging. The training of the system can be performed off-line and the separation ability is high. The separation ability is expected to improve even further if a larger and balanced set of stereo input images is used. Such a trained system is expected to be able to perform terrain traversability classification with limited on-line effort and with high success rates.

References

1. Chang, C., Lin, C.: LIBSVM: a library for support vector machines (2001), software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
2. De Cubber, G., Doroftei, D., Nalpantidis, L., Sirakoulis, G.C., Gasteratos, A.: Stereo-based terrain traversability analysis for robot navigation. In: IARP/EURON Workshop on Robotics for Risky Interventions and Environmental Surveillance, Brussels, Belgium (2009)
3. Dima, C.S., Vandapel, N., Hebert, M.: Classifier fusion for outdoor obstacle detection. In: IEEE International Conference on Robotics and Automation, vol. 1, pp. 665–671 (2004)
4. Happold, M., Ollis, M., Johnson, N.: Enhancing supervised terrain classification with predictive unsupervised learning. In: Robotics: Science and Systems, Philadelphia, USA (August 2006)
5. Howard, A., Turmon, M., Matthies, L., Tang, B., Angelova, A., Mjolsness, E.: Towards learned traversability for robot navigation: From underfoot to the far field. Journal of Field Robotics 23(11-12), 1005–1017 (2006)
6. Kim, D., Sun, J., Oh, S.M., Rehg, J.M., Bobick, A.F.: Traversability classification using unsupervised on-line visual learning for outdoor robot navigation. In: IEEE International Conference on Robotics and Automation (2006)
7. Langer, D.: A behavior-based system for off-road navigation. IEEE Transactions on Robotics and Automation 10(6), 776–783 (1994)
8. Manduchi, R.: Learning outdoor color classification from just one training image. In: European Conference on Computer Vision, vol. 4, pp. 402–413 (2004)
9. Vandapel, N., Huber, D., Kapuria, A., Hebert, M.: Natural terrain classification using 3-D ladar data. In: IEEE International Conference on Robotics and Automation, vol. 5, pp. 5117–5122 (2004)
10. Nalpantidis, L., Sirakoulis, G.C., Carbone, A., Gasteratos, A.: Computationally effective stereovision SLAM. In: IEEE International Conference on Imaging Systems and Techniques, Thessaloniki, Greece, pp. 453–458 (July 2010)
11. Nalpantidis, L., Sirakoulis, G.C., Gasteratos, A.: Review of stereo vision algorithms: from software to hardware. International Journal of Optomechatronics 2(4), 435–462 (2008)
12. Shneier, M.O., Shackleford, W.P., Hong, T.H., Chang, T.Y.: Performance evaluation of a terrain traversability learning algorithm in the DARPA LAGR program. In: Performance Metrics for Intelligent Systems Workshop, Gaithersburg, MD, USA, pp. 103–110 (2006)
13. Singh, S., Simmons, R., Smith, T., Stentz, A., Verma, V., Yahja, A., Schwehr, K.: Recent progress in local and global traversability for planetary rovers. In: IEEE International Conference on Robotics and Automation, vol. 2, pp. 1194–1200 (2000)
14. Soquet, N., Aubert, D., Hautiere, N.: Road segmentation supervised by an extended V-disparity algorithm for autonomous navigation. In: IEEE Intelligent Vehicles Symposium, Istanbul, Turkey, pp. 160–165 (2007)
15. Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P., Gale, J., Halpenny, M., Hoffmann, G., Lau, K., Oakley, C., Palatucci, M., Pratt, V., Stang, P., Strohband, S., Dupont, C., Jendrossek, L.E., Koelen, C., Markey, C., Rummel, C., van Niekerk, J., Jensen, E., Alessandrini, P., Bradski, G., Davies, B., Ettinger, S., Kaehler, A., Nefian, A., Mahoney, P.: Stanley: The robot that won the DARPA Grand Challenge. Journal of Field Robotics 23(9), 661–692 (2006)
16. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995)
17. Zhao, J., Katupitiya, J., Ward, J.: Global correlation based ground plane estimation using V-disparity image. In: IEEE International Conference on Robotics and Automation, Rome, Italy, pp. 529–534 (2007)
