JAISCR, 2013, Vol. 3, No. 3, pp. 201 – 213 DOI 10.2478/jaiscr-2014-0014
COMPUTATIONALLY INEXPENSIVE APPEARANCE BASED TERRAIN LEARNING IN UNKNOWN ENVIRONMENTS

Prabhakar Mishra¹ and Anirudh Viswanathan²

¹ PES Centre for Intelligent Systems, PES Institute of Technology, BSK Stage III, Bangalore, Karnataka, India
email: [email protected]

² Robotics Institute, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
email: [email protected]

Abstract
This paper describes a computationally inexpensive approach to learning and identification of maneuverable terrain to aid autonomous navigation. We adopt a monocular vision based framework, using a single consumer grade camera as the primary sensor, and model the terrain as a Mixture of Gaussians. Self-supervised learning is used to identify navigable terrain in the perception space. Training data is obtained using pre-filtered pixels, which correspond to near-range traversable terrain. The scheme allows for on-line, in-motion update of the terrain model. The pipeline architecture used in the proposed algorithm is made amenable to real-time implementation by restricting computations to bit-shifts and accumulate operations. Color based clustering using the dominant terrain texture is then performed in a perception sub-space. Model initialization and update follow at the coarse scale of an octave image pyramid, and are back projected onto the original fine scale. We present results of terrain learning, tested in heterogeneous environments, including urban roads, suburban parks, and indoor settings. Our scheme provides orders of magnitude improvement in time complexity, when compared to existing approaches reported in the literature.
1 Introduction
This paper describes a Computer Vision algorithm developed to identify navigable terrain, for autonomous rover operation in unknown environments. Autonomous rovers deployed on remote planetary surfaces, to function as surrogate explorers, need to be endowed with the ability of in-motion mapping and localization. This capability is central to the mission objectives, which decide the system configuration and payload. The utility of such intelligent systems lies in minimal query and decision support from an Earth-based mission operator. For successful exploration of the environment,
the perception module should be able to distinguish regions of free-space from those corresponding to obstacles. Additionally, practical concerns impact the choice of the sensor suite based on size, power, and operability constraints ([21]). The on-board scientific instrumentation, specific to the mission, is allocated significant payload volume, weight, and power resources. The perception subsystem, in turn, is constrained to minimize operability requirements, power budget, and on-board computational resources. This motivates our case to investigate methodologies for optimizing the perception module. In this paper, we restrict the sensor suite to a single consumer grade camera, with a proximity
sensor for near-range obstacle detection. We restrict our investigation to enhancing monocular vision based algorithms. The use of multiple sensors and probabilistic multi-sensor data fusion, though necessary in applications of this complexity, is not included in the scope of this work. Terrain characterization, typical for navigation applications, is associated with the relief profile, rock abundance, and soil granularity [10]. In the monocular framework, these engineering parameters are associated with models in the perceptual space, such as terrain hue and color. A limited Earth-based solution is to use a priori terrain models, trained using examples of off-line data, an approach adopted by [8]. However, such classifiers, when restricted to a finite number of learned models, become error prone with illumination and terrain surface variations over the mission lifetime. An on-line terrain classification scheme in perception space is proposed by [7]. This system is similar to our perception architecture, and learns color models of navigable terrain using self-supervised learning. The significant difference between the two is at the training stage. The work by [7] relies on using a laser scanner to identify navigable regions in perception space. Our proposed scheme is restricted to monocular vision, and emphasizes computationally inexpensive methods that are orders of magnitude faster than previous approaches.
image1.png

Figure 1. Typical terrain types in which we expect to identify regions of free-space for autonomous rover navigation. Dynamic urban roads (a), unstructured outdoor (b,c), and cluttered indoor environments (d). (a) is from the standard RAT-SLAM dataset, (b-c) are custom datasets, and (d) is a test case from the IDOL dataset.
To overcome the limitations of algorithms proposed in the literature, and achieve an implementation amenable to real-time resource constraints, we propose schemes that adapt to in-motion learning of terrain types. Figure 1 shows typical terrain in different environments. Our approach relies on a self-supervised learning algorithm, trained by pre-filtering pixels in the perception space. The pre-filtered pixels are assumed to correspond to regions of free-space immediately in front of the rover. The vision algorithm learns dominant colors, as a Mixture of Gaussians. We propose sub-space clustering, implemented using bit-shifts, to learn color models of maneuverable regions. Additionally, a horizon detection scheme restricts the model-update computation to the ground region. The combined strategies of learning models in scale space, with look-up tables to establish the pixel-wise mapping between scales, allow for real-time throughput which is orders of magnitude faster than the technique given in [7].
2 Previous Work
Previous work in learning models to represent terrain is largely a part of the Defense Advanced Research Projects Agency (DARPA) programs. Other programs include terrain classification modules targeted at driver assistance systems and autonomous vehicles. The reader is directed towards the survey paper by [9] for early work in vision based mobile robot navigation. The authors describe perception schemes for terrain learning, applicable in both structured and unstructured environments. More recently, three DARPA programs, namely the Grand Challenge (DGC - 2004), the Urban Challenge (DUC - 2007), and Learning Applied to Ground Robots (LAGR - 2004–2008), were involved with autonomous mobile robot navigation ([13]). The DGC involved autonomous driving across 132 miles of desert terrain, the results of which are summarized by [5]. The winner of the DGC was Stanford University's team. The architecture describing Stanley, the team's entry, is described by [20]. The vision module ([7]) enabled far-range
terrain classification using probabilistic data-fusion across a laser range finder and monocular vision. Training data is delineated by the laser scanner, to identify traversable terrain in front of the robot. Subsequently, pixels corresponding to the training region are used to model terrain. The solution allows the vehicle to identify far-range navigable terrain, while driving at high speed.
The DUC involved autonomous navigation in urban environments. The teams had to complete a circuit spanning 96 km, while obeying traffic rules and avoiding other competitors and moving traffic ([2]). Additionally, vehicles had to accurately localize the curb, relative to the road, to avoid rollover. The typical solution used by teams included stereo-vision, GPS based localization, and laser scanners to successfully maneuver the urban environment. Carnegie Mellon University's Boss won the DUC, and the robot architecture is described by [22]. The perception system employs a sensor layer and a fusion layer. An object hypothesis set is used in dynamic obstacle detection and tracking, and is updated using the sensor layer. The fusion layer implements global classification and object management, for successful navigation on urban roads.
LAGR ([12], [13]) led teams into terrain learning in outdoor, unstructured environments. The LAGR platform was equipped with stereo-vision, proximity sensors, an inertial measurement unit, and infrared rangefinders. The nature of forested terrain and dense foliage led to the development of techniques for in-motion terrain learning ([23]). The technique employs three stages: feature learning, feature training, and terrain prediction. Feature learning is performed by identification of points which correspond to the ground plane, using LIDAR data. The training stage uses a visual classifier to label obstacles and regions of free-space. This is followed by the predictive stage, which allows for high speed terrain recognition. [11] employs labels from the stereo-camera, which are used to train a deep hierarchical network, and predict real-time traversability. [3] describe methods for short-range and long-range terrain classification. The long-range terrain classification is image-based and learns geometry-based terrain appearance. All of the above solutions emphasize the use of multiple sensors and multi-sensor data-fusion techniques. Consequently, they are not directly applicable to our monocular vision framework.
2.1 Contributions

The key contributions of this work lie in:

– Monocular vision based terrain learning.

– Computationally inexpensive terrain model and update.

– Pipeline based perception architecture – sequential individual stages and pixel level parallelization.

– Computations restricted to integer-only arithmetic, and bit-shift based clustering.

– Speedup of up to 60x in pixel labeling, processing real-time HD video.
3 Algorithm Description
The monocular vision perception scheme is a pipeline based design. The individual stages in the pipeline are amenable to parallelization, for operations at the pixel level, as shown in Figure 2. The five major parts of the pipeline are as follows:

A. Formation of the octave pyramid.

B. Pre-filtering near-range pixels, which correspond to navigable terrain.

C. Color based K-Means pixel clustering, at the coarse scale version of the image.

D. Horizon detection – removal of the sky region.

E. Scoring pixels in perception space, by the learned models.

We describe each stage of the pipeline individually.
3.1 Formation of octave pyramid
An image pyramid, as shown in figure 3, is formed for each frame f in the incoming video feed. Such a pyramid representation, denoted by g_k at level k, is helpful, since the search space for color based clustering is smaller at the coarser level of the pyramid.
image2.png

Figure 2. An overview of the algorithm – with individual stages and their corresponding outputs. The processing pipeline is sequential and each stage enables parallel processing at the pixel level.
Decimation is performed for each image, to reduce the resolution. We formulate the downsampling scheme as described by [18]. The original image, f, is convolved with a low-pass filter, h (to avoid aliasing), evaluated at every r-th sample. Pixel coordinates are denoted by the ordered pair (i, j). The decimation process is implemented using (1).

g(i, j) = Σ_{k,l} f(k, l) h(i − rk, j − rl)    (1)

where the smoothing kernel h(k, l) is the binomial filter, introduced by [6]. We use the binomial kernel as in (2).

h = (1/16) [1 4 6 4 1]    (2)

The convolution operation can be implemented using bit-shifts and addition operations, and is computationally inexpensive. This is illustrated in figure 4. In our specific implementation, we choose 5 levels for the pyramid. At the coarse scale, the image is 1/32 times the original size. Color based clustering is now performed on the coarse scale of the octave pyramid.

image3.png

Figure 3. Formation of the octave image pyramid. The figure shows the decimation process restricted to three levels.
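As a rough illustration of this stage, the sketch below builds the octave pyramid with the binomial kernel of (2) using only integer arithmetic; the function names, the separable-filter shortcut, and the default decimation factor are our own assumptions, not details taken from the paper.

```python
# Minimal NumPy sketch of the octave-pyramid decimation of Section 3.1,
# assuming an 8-bit single-channel image; names are illustrative only.
import numpy as np

BINOMIAL = np.array([1, 4, 6, 4, 1], dtype=np.int32)  # kernel of Eq. (2), scaled by 16

def decimate(image, r=2):
    """Low-pass filter with the binomial kernel, then keep every r-th sample (Eq. 1)."""
    img = image.astype(np.int32)
    # Separable convolution: rows, then columns. Division by 16 is a 4-bit right shift,
    # so each level costs only small integer multiplies, additions, and shifts.
    for axis in (0, 1):
        img = np.apply_along_axis(
            lambda line: np.convolve(line, BINOMIAL, mode="same") >> 4, axis, img)
    return img[::r, ::r].astype(image.dtype)

def octave_pyramid(frame, levels=5):
    """g_0 is the input frame; g_k is 1/2^k the size, so level 5 is 1/32 of the original."""
    pyramid = [frame]
    for _ in range(levels):
        pyramid.append(decimate(pyramid[-1]))
    return pyramid
```

In this form the per-pixel cost of each level reduces to a handful of integer operations, which is what makes the subsequent coarse-scale clustering cheap.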
3.2 Pre-filtering near-range pixels

This section is based on our previous work in [15]. Our perception scheme assumes the following properties of the environment:

– Obstacles are different in color from the dominant terrain shade.

– The immediate vicinity, of 50 cm, in front of the rover is navigable.

The justification for this premise lies in the fact that we operate a proximity sensor for near-range obstacle detection. The proximity sensor module is functionally independent of the perception module, and triggers the control mechanism to execute a turn-in-place maneuver, till a clear field of view is obtained. The discussion of near-field obstacle avoidance control schemes is beyond the scope of this paper.

A computationally inexpensive pre-filter is used to identify pixels at the coarse level which correspond to navigable terrain. The details of our previous approach [15] are summarized below.
image4.png

Figure 4. Bit-shifted perception sub-space. The original and bit-shifted versions of the image are shown in (a)-(e): (a) original, no shift (b = 0); (b) b = 3; (c) b = 4; (d) b = 5; (e) b = 6. Perceptible quality degradation and false contouring occur after 5 shifts. Consequently we choose b = 4, a trade-off between computation throughput and perceptual image quality.

The pre-filter stage is implemented using Region of Interest (ROI) based processing. The ROI operator is a rectangular window X, of dimension 2^p x 2^q, as shown in figure 5. In our specific implementation, we choose the window to be of size 64 x 128 pixels, at the fine scale.

Formally, let {(u1, v1), (u2, v2), . . . , (um, vm)} ∈ S represent the positions of pixel inliers in the ROI. The i-th pixel in color space is represented by g(ui, vi) ∈ R³. Consequently, pixels in the ROI are represented by X ⊂ {S ∩ f}. Individual pre-filtered pixels in the ROI are represented by x(u, v) ∈ X.

image5.png

Figure 5. Pre-filtering based on the Region of Interest (ROI), highlighted in orange. The image on the right shows pixels which fall within one standard deviation of the ROI mean.

Additionally, we adopt the following convention in our discussion:

– µX − Mean of the ROI.

– P(x_u,v) − Image histogram, at the pixel intensity value x_u,v.

The mean pixel value in the active window is given by (3):

Mean: µX = Σ_{(u,v)∈S} z_u,v P(z_u,v)    (3)

To reduce the computational complexity, the ROI is a window of size 64 x 128 pixels. Consequently, the above equation reduces to an accumulate step in (4), and a bit shift in (5).

Accumulate: µX = Σ_{(u,v)∈S} z_u,v    (4)

For the window size as above, we require an 11 bit right-shift for division as in (5).

Shift: µX = µX ≫ (p + q)    (5)

The computation of the pixel mean is used in the cluster initialization stage, as described next.
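A minimal sketch of the pre-filter statistics of (3)-(5) is given below, assuming an integer coarse-scale image and a 2^p x 2^q window; the helper names and the one-standard-deviation mask (following Figure 5) are illustrative, not the paper's code.

```python
# Sketch of the ROI pre-filter of Section 3.2; parameter values are assumptions.
import numpy as np

def roi_mean(coarse_image, top, left, p=6, q=7):
    """Mean of a 2^p x 2^q ROI using only an accumulate and a right shift."""
    roi = coarse_image[top:top + (1 << p), left:left + (1 << q)].astype(np.int64)
    acc = int(roi.sum())          # accumulate step, Eq. (4)
    return acc >> (p + q)         # divide by the window area with a bit shift, Eq. (5)

def prefilter_mask(roi, mean, sigma):
    """Keep pixels within one standard deviation of the ROI mean (cf. Figure 5)."""
    return np.abs(roi.astype(np.int64) - mean) <= sigma
```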
3.3 Clustering in Perception Space

The pre-filter stage allows for cluster initialization using the dominant terrain color. Subsequently, we operate on a perception sub-space, T ⊂ X. Individual pixels t(u, v) ∈ T correspond to bit-shifted versions of x(u, v). We define a scalar shift operator b to obtain t(u, v) ← x(u, v) ≫ b, which is a bit-shifted version of x. Consequently, the mapping X → T is defined by the shift b. This pre-processing before clustering reduces the search space from 2^24, for the standard 24 bit representation of color images, to 2^(24−3b) unique combinations of colors. In our implementation, we set b to 4. This choice led to substantial improvement in computation throughput, while maintaining sufficient resolution in color space, with distinguishing features.

K-means clustering (Duda and Hart [1]) is performed on the m training examples. Specifically, the training data set is {t^(1), t^(2), . . . , t^(m)} ∈ T.

The cluster initialization is performed for each of the K cluster centroids {µ1, µ2, . . . , µK ∈ R³}. µ1 is set to the mean µX, and each of the remaining µk are randomly initialized.

The cluster assignment step, for the i-th training example, is indexed by (6).

C(i) ← k that minimizes ||t^(i) − µk||    (6)

Subsequently, the move centroid step is performed using (7).

µk ← (1 / |Ck|) Σ_{i∈Ck} t^(i)    (7)

The cost function which is minimized is given by (8).

J(c^(1), . . . , c^(m), µ1, . . . , µK) = (1/m) Σ_{i=1}^{m} ||x^(i) − µ_c(i)||    (8)

At this stage of the pipeline, we have K color clusters, which become training models representative of navigable terrain, as shown in figure 6. We use this data, and a past history of the color clusters, to update the terrain model.

image6.png

Figure 6. Original image (a). Training clusters (K=3) in (b-d).
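The following sketch illustrates the sub-space mapping and the K-means steps of (6)-(8), assuming 8-bit RGB training pixels; the function names, the fixed iteration count, and the random seed are our own choices rather than the paper's.

```python
# Sketch of bit-shifted sub-space clustering (Section 3.3); layout is an assumption.
import numpy as np

def to_subspace(pixels, b=4):
    """Map 24-bit colors into the reduced sub-space T by right-shifting each channel by b."""
    return pixels >> b

def kmeans_colors(training_pixels, roi_mean_color, K=3, iters=10, b=4):
    """K-means on pre-filtered pixels; mu_1 is the ROI mean, the rest are random."""
    t = to_subspace(training_pixels.reshape(-1, 3).astype(np.int32), b)
    rng = np.random.default_rng(0)
    centroids = np.vstack([to_subspace(np.asarray(roi_mean_color, dtype=np.int32), b),
                           rng.integers(0, 256 >> b, size=(K - 1, 3))])
    for _ in range(iters):
        # Cluster assignment step, Eq. (6)
        d = np.linalg.norm(t[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move centroid step, Eq. (7)
        for k in range(K):
            members = t[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids, labels
```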
image7.png

Figure 7. Horizon detection, performed on the original images (a),(c). Filtered results after horizon removal shown in (b),(d).

3.4 Horizon detection
To update the model, we score individual pixels in the visual field, based on the computation of a distance metric from each cluster center. To reduce the number of computations, pixels which comprise the sky region are removed using a horizon detection scheme. Horizon detection is performed at the coarse scale by thresholding pixel intensity values. The variance of the dominant color model, which corresponds to the ROI, is computed using (9).

σ²_X = Σ_{(u,v)∈X} (z_u,v − m_X)² P(z_u,v)    (9)

We apply a thresholding scheme for pixels falling within two standard deviations of the mean in the active window, denoted by g_horizon.

g_horizon(u, v) = 1, if z_u,v ∈ (−2σ²_X, 2σ²_X); 0, otherwise.    (10)

The horizon in this case is defined as the row of pixels in g_horizon which has the largest occurrence of zeros, or minimal sum. Formally, the horizon H, for an n x m image, is defined as a collection of pixels at row u_h: {h(u_h, 1), h(u_h, 2), . . . , h(u_h, m)} ∈ H.

u_h is obtained using (11).

u_h ← u_i that minimizes Σ_{v=1}^{m} g_horizon(u_i, v)    (11)

This scheme is limited to instances where a clear horizon is defined in color space; it fails when the rover is turning. In such scenarios, u_h gets assigned null, and the horizon detection stage is skipped in the overall pipeline. g_horizon is updated according to (12).

g_horizon(u, v) = 1, if u < u_h; 0, otherwise.    (12)
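A compact sketch of this horizon test, under the assumption of a coarse-scale grayscale image and the ROI statistics from the pre-filter, could look as follows; the no-horizon fallback used here is only one possible reading of the failure case described above.

```python
# Sketch of the horizon-detection stage of Section 3.4; names are illustrative.
import numpy as np

def horizon_row(coarse_gray, roi_mean, roi_sigma):
    """Return the row index u_h with the fewest terrain-like pixels (Eqs. 10-11),
    or None when no clear horizon is found (e.g. while the rover is turning)."""
    # Eq. (10): 1 where the pixel lies within two standard deviations of the ROI mean.
    g = (np.abs(coarse_gray.astype(np.int32) - roi_mean) <= 2 * roi_sigma).astype(np.uint8)
    row_sums = g.sum(axis=1)              # Eq. (11): pick the row with minimal sum
    u_h = int(row_sums.argmin())
    if row_sums[u_h] == g.shape[1]:       # every row looks like terrain: no horizon
        return None
    return u_h

def sky_mask(shape, u_h):
    """Eq. (12): rows above u_h are treated as sky and excluded from the model update."""
    mask = np.zeros(shape, dtype=bool)
    if u_h is not None:
        mask[:u_h, :] = True
    return mask
```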
Figure 7 shows images after the horizon removal has been performed. We now have our training data ready, to update the Mixture of Gaussians.

Individual pixels in the visual field are scored based on the Mahalanobis distance from the learned models. At this stage, we have K color clusters, characterized by their mean µ, covariance Σ, and number of pixels in the cluster, or cluster mass, ω. In addition to the K training models, we have N learned models, which incorporate the past history of dominant terrain color, with N > K.

Initially, all the N learned models are null. A model update occurs if there is an overlap between the k-th training model and the n-th learned model, according to (13) (where subscript T denotes one of the K training models, and L denotes one of the N learned models). These equations are reproduced from the research by [7], with modifications to reduce computation cost.

(µL − µT)ᵀ (ΣL + ΣT)⁻¹ (µL − µT) ≤ 1    (13)

We restrict the covariance matrices to be diagonal, and save the computation time associated with full covariance matrix inversion. When the condition in (13) is satisfied, the learned model is updated. The update equations are as in (14)–(16):

µL ← (µL ωL + µT ωT) / (ωL + ωT)    (14)

ΣL ← (ΣL ωL + ΣT ωT) / (ωL + ωT)    (15)

ωL ← ωL + ωT    (16)

3.5 Scoring pixels in perception space

If the current training model does not match any of the learned models, as per (13), then the model update is performed using (17), applied to the learned model L that minimizes ωn:

µL ← µT,  ΣL ← ΣT,  ωL ← ωT    (17)

This summarizes the scoring mechanism and learning scheme, to incorporate a history of terrain color in the perception scheme. Figure 8 shows learning of dominant color models, in an indoor environment.
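The learned-model bookkeeping of (13)-(17), with diagonal covariances stored as vectors, might be sketched as below; the Model container, the scoring helper, and the fixed pool of N models are illustrative assumptions and not taken verbatim from the paper.

```python
# Sketch of the Mixture-of-Gaussians terrain-model update (Section 3.4/3.5).
import numpy as np
from dataclasses import dataclass

@dataclass
class Model:
    mu: np.ndarray      # mean color, shape (3,)
    var: np.ndarray     # diagonal covariance, shape (3,)
    mass: float         # cluster mass, omega

def overlaps(learned, training):
    """Eq. (13) with diagonal covariances, so the inverse is an element-wise division."""
    d = learned.mu - training.mu
    return np.sum(d * d / (learned.var + training.var)) <= 1.0

def update_models(learned_models, training_model):
    """Merge into an overlapping learned model (Eqs. 14-16) or replace the lightest one (Eq. 17)."""
    for m in learned_models:
        if m.mass > 0 and overlaps(m, training_model):
            w = m.mass + training_model.mass
            m.mu = (m.mu * m.mass + training_model.mu * training_model.mass) / w     # Eq. (14)
            m.var = (m.var * m.mass + training_model.var * training_model.mass) / w  # Eq. (15)
            m.mass = w                                                               # Eq. (16)
            return
    lightest = min(learned_models, key=lambda m: m.mass)                             # Eq. (17)
    lightest.mu, lightest.var, lightest.mass = (training_model.mu.copy(),
                                                training_model.var.copy(),
                                                training_model.mass)

def score_pixel(pixel, learned_models):
    """Mahalanobis-style score of a pixel against the closest non-null learned model."""
    return min((np.sum((pixel - m.mu) ** 2 / m.var) for m in learned_models if m.mass > 0),
               default=float("inf"))
```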
image8.png

Figure 8. Learning terrain models in the indoor environment. Original images (a),(c),(e); dominant terrain colors overlaid with model mass are shown in (b),(d), and (f). Model update occurs by replacing the cluster with lowest mass with new training data. Images look best when viewed in color.

4 Experiments

We validate the proposed scheme on several standard databases, across indoor and outdoor environments. Figure 9 is representative of typical test scenarios. Additionally, we also compare the proposed approach to labeling training data by [7], using the Microsoft Kinect™ sensor as our range finder. A brief description of the datasets and experimental setup follows. Figure 10 shows our experimental rover platform on which we implemented our vision system.

4.1 Description of Datasets

We chose to validate terrain learning on heterogeneous illumination conditions, varying obstacle field configurations, and diverse environment dynamism. The standard databases include monocular urban road sequences, from the RAT-SLAM field experiments as described by [4]. The dataset comprises several miles of urban road, and includes moving vehicles and pedestrians during different times of the day.

The New College Dataset is a collection at Oxford, for vision and mobile robot research. The dataset comprises images captured from an omnidirectional camera, a laser ranger, and stereo imagery, over the college's grounds and neighboring parks. We validate our algorithm using the vision data captured by the Ladybug platform. The dataset is described in [17].

For validation in the indoor environment, we run tests on the standard IDOL2 (Image Database for rObot Localization) database. A formal description is available by [14]. The dataset incorporates several trials under varying illumination conditions: cloudy, sunny, and at night. Additionally, the data is scattered with instances of moving people.

The above datasets span structured outdoor and indoor environments. Additional experiments, in unstructured environments, were also conducted. The dataset, populated at Cubbon Park in Bangalore, offered several navigable terrain profiles. We demonstrate the efficacy of the method on a walkway, in a bamboo forest, and on dirt and grass terrain at the park. We describe the outcome of the experiments, based on absolute pixel misclassification error, in the following section.

image10.png

Figure 10. The 'Freelancer,' a custom built robotic platform, equipped with a camera and proximity sensors, is used for field testing our perception scheme.
image9.png

Figure 9. The figure shows the spectrum of experiments used to validate the proposed method, on standard datasets. The variations in test conditions include illumination changes, environment dynamism, and structure. The tests were conducted off-line on standard datasets, and on-line for the field and parking lot tests. Images look best when viewed in color.

Dataset columns: New College Dataset, Oxford (outdoor, structured, low dynamism, cloudy weather); St. Lucia Suburb, RAT-SLAM Dataset (outdoor, structured, high dynamism, clear skies); IDOL2 Dataset, KTH (indoor, structured, low dynamism, varying lighting); Cubbon Park Dataset (outdoor, unstructured, static, varying lighting); Urban Road and Apartment Trials (outdoor, structured, varying dynamism, varying lighting).

Panels: (a) Tree at the Quad, (b) In urban traffic, (c) Moving people, (d) Grassy terrain, (e) Parking lot, (f) Free space, (g) Car as obstacle, (h) Person avoided, (i) Free space, (j) Free space, (k) At the Quad, (l) Shadow on road, (m) Cluttered indoor, (n) Sunlight on dirt terrain, (o) Traffic intersection, (p) Learns grass terrain, (q) Shadow is navigable, (r) Free space, (s) Learns patchy terrain, (t) Delineates road, misclassified white cars.
4.2 Training Stage Validation

In this section we describe a comparison of our method with the standard implementation using a laser range finder, by [7]. The pixels chosen as training samples are delineated as navigable by parsing through range finder data. The Kinect sensor captures three dimensional scene depth, up to 10 meters from the current position of the sensor. The point cloud is a collection of samples {d1, d2, . . . , dm} ∈ D, where di ∈ R³. di denotes the Cartesian coordinates of a point in space, with reference to the camera coordinates at the origin.

To obtain training samples, the authors of [7] model the height difference uncertainties between samples as a temporal Markov chain. Their implementation uses a laser scanner and a 6 degree of freedom pose estimator ([19]).

We simplify the procedure by thresholding the point cloud, to restrict data to near-range navigable terrain. Formally, we define a threshold d_th computed using (18). The choice of the exponential function is motivated by our previous investigation into learning from raw-sensor depth ([16]), where the parameters a and b are learned at system initialization.

d_th = a e^(bz)    (18)

Point cloud filtering is performed to obtain the pixel locations (x, y) ∈ X_D of the training samples by sub-space decomposition. The decomposition is performed according to (19).

X_D = {(x, y) | (x, y, z) ∈ D and z < d_th}    (19)

Subsequently, we compare pairwise location correspondence using X_D and the pre-filtered pixel locations X in the ROI. Figure 11 shows the precision-recall curve at d_th. Our experimental results suggest that, in the worst case, the position difference between pixels in X and X_D is limited to 30.
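A minimal sketch of this depth-based delineation, under assumed values for the learned parameters a and b, is given below; the array layout of the point cloud and the function name are illustrative.

```python
# Sketch of the Kinect-based delineation of training pixels (Eqs. 18-19).
import numpy as np

def near_range_pixels(points, pixel_coords, a=0.5, b=0.3):
    """Keep pixel locations whose 3-D points lie closer than d_th = a * exp(b * z).

    points:       (m, 3) array of Cartesian samples d_i in the camera frame
    pixel_coords: (m, 2) array of the (x, y) image locations of those samples
    """
    z = points[:, 2]
    d_th = a * np.exp(b * z)          # Eq. (18): range-dependent threshold
    keep = z < d_th                   # Eq. (19): sub-space decomposition
    return pixel_coords[keep]
```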
image11.png

Figure 11. Precision-Recall curve for the training stage, with true positives ∈ X_D ∩ X and false positives ∈ X.

Table 1. Average computation time at different pipeline stages. The averages, standard deviation, and max time are calculated per frame update. The results are compiled for the Cubbon Park dataset, for a 5 level octave pyramid, bit-shift b = 3, N = 5, and k = 3.

Pipeline Stage | Average time (s) | Standard Deviation (s) | Max Time (s)
Octave Pyramid | 0.008133 | 0.054102 | 0.151172
Bit-shift level | 0.000095 | 0.000018 | 0.000119
Pre-Filter Stage | 0.008524 | 0.000502 | 0.008974
Pixel Clustering (Training Stage) | 0.000236 | 0.001347 | 0.000265
Pixel Clustering (Model Update) | 0.006710 | 0.036943 | 0.009510
Horizon Detection | 0.000326 | 0.002104 | 0.009565
5 Results
Our experimental results are summarized in figure 9. Statistics such as the worst case pixel misclassification are found in our previous work ([15]). The change in frame throughput, using the octave pyramid, the bit-shifted perception space, and a combination of both, is shown in the graphs below. The performance curves (figures 12, 13 and 14) show increased throughput from forming the octave pyramid. The time for individual operations in the pipeline is included in table 1. The saving in computation time is less pronounced when using only the bit-shifted perception subspace. A combination of the two schemes allows a reduction of 0.8 seconds, at b = 1.
image12.png

Figure 12. Exponential reduction in processing time is seen at coarse levels of the octave pyramid.

image13.png

Figure 13. The gain in throughput increases till b = 3, and saturates beyond that value. Choosing b > 5 produces resultant images of poor visual quality.

Figure 14. On combining the two schemes, it is evident from the figure that operating at the coarse level allows for significant reduction in computation overhead. Additionally, a boost (of 0.87 seconds) is obtained by bit-shifting the perception subspace, using b > 2. An overall boost in performance of up to 60x is achieved.

6 Conclusion

In this paper, we have described a computationally inexpensive algorithm to identify navigable terrain, using a color based clustering approach. Additionally, our primary sensing modality is a single consumer grade web-camera. The proposed monocular vision scheme is shown to be as effective in delineating training samples as multi-sensor data fusion from a laser ranger. Table 2 shows the mean error rate of the classifier, based on the experiments we conducted.

The scheme is completely implemented using an integer data-path; it employs only bit-shifts and accumulate operations. Our perception method has exponentially faster per-frame throughput, compared to the original approach. The computation time is reduced by orders of magnitude, by learning the dominant terrain color in a bit-shifted perception sub-space, at the coarse level of an octave pyramid.

Limitations of the monocular vision framework lie in misclassifying objects which are similar to the dominant terrain color. Such corner cases violate our assumption of heterogeneous free-space and obstacle color. Specific examples are evident in the case of gray cars misclassified as traversable, on the urban road tests. We acknowledge this limitation in our perception scheme, which makes a strong case for multi-sensor data fusion. The fusion module would operate at a higher hierarchical layer, taking into account the terrain learnt using a single camera, and there are gains to be realized here.
We tested monocular vision based terrain learning in diverse terrain types. Validation spanning variations in obstacle field configuration, illumination conditions, and environment structure, shows the feasibility of our approach.
7 Acknowledgements
The authors gratefully acknowledge Professor Koshy George, Director at the PES Centre for Intelligent Systems, for his feedback and suggestions.
Table 2. Mean error rate of the self-supervised learning algorithm on different datasets.

Dataset | Outdoor/Indoor | Structured/Unstructured | Static/Dynamic | Unmodified Algorithm | Image Pyramid (scaled 1/32x) | Image Pyramid and Bit-shifts (b = 3)

Standard Test Datasets
New College Dataset (Oxford Mobile Robotics) | Outdoor | Structured | Static | 5.88% | 6.87% | 7.24%
St. Lucia Loop (Rat-SLAM Dataset) | Outdoor | Structured | Dynamic | 12.13% | 12.55% | 12.67%
IDOL2 Dataset | Indoor | Structured | Dynamic | 5.5% | 7.32% | 7.76%

Custom Test Datasets
Apartment Complex | Outdoor | Structured | Static | 11.97% | 12.45% | 13.66%
Urban Road | Outdoor | Structured | Dynamic | 15.34% | 16.12% | 16.83%
University Campus | Outdoor | Structured | Dynamic | 10.67% | 11.31% | 12.98%
Cubbon Park Dataset | Outdoor | Unstructured | Dynamic | 15.56% | 17.88% | 18.45%
examples areDeSouza evident inand the case gray cars misclassified [8] G.N. A.C.of Kak. Vision for mo- References as traversable, on the urban road tests. We acknowledge bile robot navigation: a survey. Pattern Analysis terrain. In d.g. Gaurav Sukhatme, Stefan References this limitation in ourIntelligence, perception scheme, which makes on, a [1] R.o. desert duda, p.e. hart, and stork,S.pattern classifiand Machine IEEE Transactions Schaal, Wolfram Burgard, and Dieter Fox, editors, strong case for multi-sensor desert In wiley Gaurav S. Sukhatme, Stefan References cation, newterrain. york: John & sons, 2001, pp. xx 24(2):237–267, 2002.data fusion. The fusion mod[1]would R.o. operate duda, p.e. and d.g. stork, pattern Robotics: Science and Systems II,24(2):305– August 16-19, ule on hart, a higher hierarchical layer, classifitaking Schaal, Burgard, Dieter Fox, editors, + 654, isbn: Wolfram 0-471-05669-3. J. and Classif., cation, new york: John wiley & sons, 2001, pp. xx [1] R.o. duda, p.e. hart, and d.g. stork, pattern classifi2006. University of Pennsylvania, Philadelphia, into account the terrain learnt using a single camera, and [9] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern 307,Robotics: SeptemberScience 2007. and Systems II, August 16-19, +Classification, 654, isbn: 0-471-05669-3. J.&Classif., 24(2):305– cation, new york: John wileyand sons, pp. xx there are gains to be realized here. Pennsylvania, USA. MIT Press, 2006. John Wiley Sons, 2001, 2nd Edition, 2006. University of The Pennsylvania, Philadelphia, 307, September 2007. + 654, isbn: 0-471-05669-3. J. Classif., 24(2):305– 2001. vision based terrain learning in [2] [8] Pennsylvania, USA. The MIT Press, 2006. We New testedYork monocular J. Field Robot., 25(9), 2008. Bob Davies and Rainer Lienhart. Using cart to 307, September 2007. diverse terrain types. 25(9), Validation variations [2] J. Field Robot., 2008.P. spanning segment roadand images. pages 60730U–60730U–12, [8] Bob Davies Rainer Lienhart. Using cart to [10] John A. Grant, Matthew Golombek, John P. in obstacle field configuration, illumination conditions, [3] Maxsegment Bajracharya, Andrew Howard, Larry H. 2006. [2] J.Grotzinger, Field Robot.,Sharon 25(9), 2008. road images. pages 60730U–60730U–12, A. Wilson, Michael M. [3]environment Max Bajracharya, Larry and structure, Andrew shows theHoward, feasibility of ourH. Matthies, Benyang Tang, and Michael Turmon. 2006. Watkins, Ashwin R. Andrew Vasavada, Jennifer Griffes, [9] G.N. DeSouza A.C. Kak. for moMatthies, Benyang Tang, and MichaelL. Turmon. approach. [3] Max Bajracharya, Howard, Larry H. Autonomous off-roadand navigation with Vision end-to-end and Timothy J. Parker. The for bile robot navigation: a survey. Pattern Analysis Autonomous off-road navigation with process end-to-end [9] G.N. DeSouza and A.C. Kak. Vision for Matthies, Benyang Tang, andscience Michael Turmon. learning for the lagr program. Journal of Field moselecting for thethe landing site for theJournal 2011end-to-end mars sciand robot Machine Intelligence, IEEE Pattern Transactions on, learning lagr program. of Field bile navigation: Analysis Autonomous off-road navigation with Robotics, 26(1):3–25, 2009.a survey. ence laboratory. Planetary Space of Science, Aknowledgements 24(2):237–267, 2002. Robotics, 26(1):3–25, 2009. andJournal and Machine Intelligence, IEEE Transactions on, learning for the lagr program. Field 59(1112):1114 – 1127,2009. 2011. 
Acknowledgements

The authors gratefully acknowledge Professor Koshy George, Director at the PES Centre for Intelligent Systems, for his feedback and suggestions.

References

[1] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. New York: John Wiley & Sons, 2001, pp. xx + 654, ISBN: 0-471-05669-3. J. Classif., 24(2):305–307, September 2007.
[2] J. Field Robot., 25(9), 2008.
[3] Max Bajracharya, Andrew Howard, Larry H. Matthies, Benyang Tang, and Michael Turmon. Autonomous off-road navigation with end-to-end learning for the LAGR program. Journal of Field Robotics, 26(1):3–25, 2009.
[4] David Ball, Scott Heath, Janet Wiles, Gordon Wyeth, Peter Corke, and Michael Milford. OpenRatSLAM: an open source brain-based SLAM system. Autonomous Robots, 34(3):149–176, 2013.
[5] Martin Buehler. Summary of DGC 2005 results: Editorial. J. Robot. Syst., 23(9):659–660, September 2006.
[6] P.J. Burt and E.H. Adelson. The Laplacian pyramid as a compact image code. Communications, IEEE Transactions on, 31(4):532–540, 1983.
[7] Hendrik Dahlkamp, Adrian Kaehler, David Stavens, Sebastian Thrun, and Gary R. Bradski. Self-supervised monocular road detection in desert terrain. In Gaurav S. Sukhatme, Stefan Schaal, Wolfram Burgard, and Dieter Fox, editors, Robotics: Science and Systems II, August 16-19, 2006, University of Pennsylvania, Philadelphia, Pennsylvania, USA. The MIT Press, 2006.
[8] Bob Davies and Rainer Lienhart. Using CART to segment road images. pages 60730U–60730U-12, 2006.
[9] G.N. DeSouza and A.C. Kak. Vision for mobile robot navigation: a survey. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(2):237–267, 2002.
[10] John A. Grant, Matthew P. Golombek, John P. Grotzinger, Sharon A. Wilson, Michael M. Watkins, Ashwin R. Vasavada, Jennifer L. Griffes, and Timothy J. Parker. The science process for selecting the landing site for the 2011 Mars Science Laboratory. Planetary and Space Science, 59(11–12):1114–1127, 2011. Geological Mapping of Mars.
[11] Raia Hadsell, Pierre Sermanet, Jan Ben, Ayse Erkan, and Marco Scoffier. Learning long-range vision for autonomous off-road driving. Journal of Field Robotics, pages 120–144, 2009.
[12] Wes Huang, Greg Grudic, and Larry Matthies. Editorial. Journal of Field Robotics, 26(1):1–2, 2009.
[13] L.D. Jackel, Douglas Hackett, Eric Krotkov, Michael Perschbacher, James Pippine, and Charles Sullivan. How DARPA structures its robotics programs to improve locomotion and navigation. Commun. ACM, 50(11):55–59, November 2007.
[14] J. Luo, A. Pronobis, B. Caputo, and P. Jensfelt. Incremental learning for place recognition in dynamic environments. In Proc. IROS07, 2007.
[15] P. Mishra and A. Viswanathan. Computationally inexpensive labeling of appearance based navigable terrain for autonomous rovers. In Computational Intelligence in Vehicles and Transportation Systems (CIVTS), 2013 IEEE Symposium on, pages 88–92, April 2013.
[16] P. Mishra, A. Viswanathan, and A. Srinivasan. A supervised learning approach to far range depth estimation using a consumer-grade RGB-D camera. In Electronics, Computing and Communication Technologies (CONECCT), 2013 IEEE International Conference on, pages 1–6, 2013.
[17] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman. The New College vision and laser data set. The International Journal of Robotics Research, 28(5):595–599, May 2009.
[18] Richard Szeliski. Computer vision: algorithms and applications. Springer, 2010.
[19] S. Thrun, M. Montemerlo, and A. Aron. Probabilistic terrain analysis for high-speed desert driving. In Proceedings of Robotics: Science and Systems, Philadelphia, USA, August 2006.
[20] Sebastian Thrun, Mike Montemerlo, Hendrik Dahlkamp, David Stavens, Andrei Aron, James Diebel, Philip Fong, John Gale, Morgan Halpenny, Gabriel Hoffmann, Kenny Lau, Celia Oakley, Mark Palatucci, Vaughan Pratt, Pascal Stang, Sven Strohband, Cedric Dupont, Lars-Erik Jendrossek, Christian Koelen, Charles Markey, Carlo Rummel, Joe van Niekerk, Eric Jensen, Philippe Alessandrini, Gary Bradski, Bob Davies, Scott Ettinger, Adrian Kaehler, Ara Nefian, and Pamela Mahoney. Stanley: The robot that won the DARPA Grand Challenge: Research articles. J. Robot. Syst., 23(9):661–692, September 2006.
[21] Edward Tunstel and Ayanna Howard. Sensing and perception challenges of planetary surface robotics, 2002.
[22] Christopher Urmson, Joshua Anhalt, Hong Bae, J. Andrew (Drew) Bagnell, Christopher R. Baker, Robert E. Bittner, Thomas Brown, M. N. Clark, Michael Darms, Daniel Demitrish, John M. Dolan, David Duggins, David Ferguson, Tugrul Galatali, Christopher M. Geyer, Michele Gittleman, Sam Harbaugh, Martial Hebert, Thomas Howard, Sascha Kolski, Maxim Likhachev, Bakhtiar Litkouhi, Alonzo Kelly, Matthew McNaughton, Nick Miller, Jim Nickolaou, Kevin Peterson, Brian Pilnick, Ragunathan Rajkumar, Paul Rybski, Varsha Sadekar, Bryan Salesky, Young-Woo Seo, Sanjiv Singh, Jarrod M. Snider, Joshua C. Struble, Anthony (Tony) Stentz, Michael Taylor, William (Red) L. Whittaker, Ziv Wolkowicki, Wende Zhang, and Jason Ziglar. Autonomous driving in urban environments: Boss and the Urban Challenge. Journal of Field Robotics Special Issue on the 2007 DARPA Urban Challenge, Part I, 25(8):425–466, June 2008.
[23] Shengyan Zhou, Junqiang Xi, Matthew W. McDaniel, Takayuki Nishihata, Phil Salesses, and Karl Iagnemma. Self-supervised learning to visually detect terrain surfaces for autonomous robots operating in forested terrain. J. Field Robot., 29(2):277–297, March 2012.