Break on through: Tunnel-based exploration to learn about outdoor terrain

Brian P. Gerkey
Willow Garage
[email protected]

Motilal Agrawal
SRI Artificial Intelligence Center
[email protected]
Abstract— We study the problem of designing a ground robot that learns how to navigate in natural, outdoor settings. This robot must learn, through its own experience, a traversability model. This model provides a mapping from the robot’s sensors to a traversability measure. We propose a novel approach to exploration in which the robot purposefully probes into apparent obstacles, with the goal of improving its traversability model. We formulate a decision-theoretic algorithm that identifies potential “tunnels” through apparent obstacles and estimates the benefit of exploring them. We have implemented and tested the algorithm on an outdoor robot platform and we present results from initial experiments that demonstrate the operation of our algorithm and quantify its performance advantage over a baseline system.
I. INTRODUCTION

Adaptability is a key characteristic of intelligence. The ability to adapt to one's surroundings is necessary for survival in animals, and is a desirable skill for autonomous robots. This skill is all the more valuable for robots that operate in outdoor environments, where they experience a wide variety of structured and unstructured terrain (Fig. 1). We aim to provide robots with the facilities to adapt to, and behave intelligently in, such environments. In particular, we study the problem of designing a ground robot that learns how to navigate in natural, outdoor settings (Fig. 5). This robot must learn, through its own experience, a traversability model. This model provides a mapping from the robot's sensors (e.g., cameras, laser range finders) to a traversability measure. The traversability measure indicates the ease with which the robot can drive over a particular kind of terrain; it intuitively varies from easy (e.g., flat ground) to difficult (e.g., sand dune) to impossible (e.g., redwood tree). Such models are most often hand-coded, using combinations of heuristics, such as thresholds on the gradient of the ground or the height of objects relative to the robot. This process is obviously time- and labor-intensive, and requires detailed knowledge of both the robot's sensors and the environment in which it will operate. It is straightforward to learn a traversability model passively, for example, by recording the robot's experience as it is teleoperated by a human. Again, significant time and expertise are required of the developer and/or operator to produce a sufficiently complete model, and the resulting model exhibits little flexibility during autonomous operation. In particular, a passively learning robot can acquire new information about difficult terrain by colliding with obstacles that appeared to be traversable (assuming that it can sense collisions), but it will never learn about the opposite case. That is, the passive robot will never learn on its own that an apparent obstacle is actually traversable terrain, because under autonomous control it will always avoid the apparent obstacle.
Fig. 1. Our mobile robot navigating in an outdoor environment.
We focus instead on actively learning traversability models, via intelligent exploration. We propose a novel approach to exploration in which the robot purposefully probes into apparent obstacles, with the goal of improving its traversability model. We formulate a decision-theoretic algorithm that identifies potential "tunnels" through apparent obstacles and estimates the benefit of exploring them. This algorithm has the key advantages of being extremely efficient to execute, being independent of the learning algorithm used to construct traversability models, and minimally intruding on the robot's existing planning and control systems. We have implemented and tested the algorithm on the LAGR robot platform (Fig. 4). We present results from initial experiments that demonstrate the operation of our algorithm and quantify its performance advantage over a baseline system.

II. PROBLEM

A. Background

The general problem of deciding where to explore, absent a concrete task or goal, is ill posed, and provides no obvious basis for evaluation. Instead, we focus on a robot that is executing a specific task, with the aim of improving its performance on this task through exploration. In particular, we assume that the robot's task is a "goto": from some initial location, the robot must navigate to a given goal location, while minimizing travel distance and time. The goal
Fig. 2. Top-down view of cost grids produced during an experimental run. The color code: blue=unknown, red=obstacle, green=traversable (darker green is higher cost), and purple=exploration tunnel. The robot's pose is indicated by the red arrow, the robot's trajectory so far is the green line leading up to the robot, and the optimal path to goal is the cyan line leading away from the robot. Shown in (a) and (b) are, respectively, the cost grid C and the expanded cost grid E, both without exploration. Shown in (c) is the same grid with exploration enabled. Note how the path circumnavigates the obstacle in (b) and is biased to pass through the tunnel in (c).
is specified in a known frame (we use GPS), and the robot continuously estimates its pose within the same frame (we use a combination of GPS, inertial measurement, and visual odometry [1], [2]). The environment, however, is unknown. That is, the robot knows where it is and where the goal is, but has no a priori knowledge about the intervening terrain. The robot must safely and efficiently navigate to the goal, recognizing and avoiding obstacles along the way. We assume that the robot is equipped with sensors that allow it to perceive terrain at distance (we use stereo vision). As the robot drives around, its sensors measure the environment within some sensor-specific field of view. These measurements are incorporated into a two-dimensional, uniform-resolution grid (we use 20 cm×20 cm cells). We represent the sensed data as a feature vector f ∈ Rk , where k is the number of features that the sensors can extract (we use eight geometric features). The sensors yield one feature vector fij for each grid cell (i, j) that is currently in the robot’s field of view. In other words, the robot measures its three-dimensional environment, computes features, and then projects the feature values on a two-dimensional grid, thereby approximating the world as a plane. The result is a feature grid F that contains one vector fij for each grid cell (i, j) that has been observed so far. Newer observations generally overwrite older ones. Unobserved cells contain null vectors. To identify obstacles and plan paths, we must convert the k-dimensional feature grid F into a parallel, one-dimensional cost grid C. This grid contains a cost cij ∈ R for each cell (i, j). Higher costs represent more difficult terrain, and costs above a certain threshold are considered obstacles that the robot cannot traverse (Fig. 2(a)). We assume that the robot uses a traversability model to perform the mapping from feature grid F to cost grid C, and that the robot learns this model online, from its own experience (it may be provided with an initial model that will be updated online).
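To make the grid bookkeeping concrete, the following is a minimal sketch of how the feature grid F might be maintained; the grid dimensions, helper names, and use of NaN for unobserved cells are our own illustrative choices, not a description of the robot's actual implementation.

```python
import numpy as np

CELL_SIZE = 0.20   # meters per cell (20 cm x 20 cm, as above)
GRID_DIM = 400     # illustrative grid size; not the robot's actual map extent
NUM_FEATURES = 8   # k geometric features per cell

# Feature grid F: one k-dimensional vector per cell; NaN marks unobserved cells.
F = np.full((GRID_DIM, GRID_DIM, NUM_FEATURES), np.nan)

def world_to_cell(x, y, origin=(0.0, 0.0)):
    """Map a world-frame (x, y) position to integer grid indices (i, j)."""
    return int((x - origin[0]) / CELL_SIZE), int((y - origin[1]) / CELL_SIZE)

def update_feature_grid(observations):
    """Project per-cell feature vectors into F; newer observations overwrite older ones."""
    for (x, y), f in observations:      # f is a length-k feature vector for one cell
        i, j = world_to_cell(x, y)
        if 0 <= i < GRID_DIM and 0 <= j < GRID_DIM:
            F[i, j, :] = f
```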
For each cell of terrain over which the robot drives, the learner is given a new training instance (f, c), where f is the observed feature vector associated with the cell, and c is the proprioceptively measured cost of traversing the cell. Easily traversed terrain (e.g., where the robot can drive fast) is given low cost, difficult terrain (e.g., where the robot’s wheels slip) is given higher cost, and nontraversable terrain (e.g., where the robot collides with an obstacle) is given the highest cost. As new training instances are provided, the learner updates its model. Simultaneously, this model is queried to provide costs for observed cells that the robot has not driven over. That is, for each observed cell (i, j), the learner is given the feature vector fij and must predict a corresponding cost cij . The predicted costs are incorporated into the grid C, which is in turn used for path planning. The learner must also supply an associated confidence value γij ∈ [0, 1] for each cell (i, j), which estimates the probability of the learner being correct in its prediction for that cell. Lower confidence values indicate greater prediction uncertainty, and vice versa. Confidence values are incorporated into a confidence grid Γ, parallel to C. B. Problem statement Under autonomous control, the robot will most often reinforce its existing model, as it drives over apparently low-cost terrain and avoids apparently high-cost terrain. Occasionally, when the model incorrectly predicts low cost for an area that is actually an obstacle, the robot will learn otherwise by colliding with the obstacle and incorporating the resulting training instance. However, the robot will never experience the opposite situation, in which the model incorrectly predicts high cost for an area that is actually traversable, because the robot will always avoid such areas. We study the problem of providing the robot with these important experiences that it otherwise lacks. Specifically, we wish to modify the robot’s behavior so that it is willing to explore apparent obstacle regions so as to
more efficiently reach the goal location, learning an improved traversability model along the way. We assume independent pose-estimation, feature-extraction, model-learning, path-planning, and motion-control systems, none of which is our focus in this paper. We concentrate our efforts on using the cost grid C, the confidence grid Γ, and knowledge of the goal location to identify regions that should be explored.
III. RELATED WORK

The topic of exploration has been studied extensively, most commonly in the context of map building. The goal is to identify exploration targets that will cause one or more (usually indoor) robots to build the most complete map [3], [4], [5], [6], [7]. These robots strictly obey a fixed (usually laser-based) obstacle model, and reason only about the value of observing unknown space, not about whether their obstacle model is actually correct. Variations on this exploration problem include coverage [8], [9] and deployment [10]. More closely related to the problem we study is work on learning policies for indoor obstacle avoidance [11]. Slip estimation and compensation have also been studied previously [12], [13], [14]. The focus in such work is usually on adjusting motor commands to keep a slipping robot on a desired trajectory. Because we simply wish to mitigate the risk of exploring difficult terrain (Section V-D), we employ a simpler slip-compensation mechanism.

IV. APPROACH

We approach the problem laid out in Section II as a decision-theoretic question: Where should the robot explore so as to maximize the expected benefit in its performance? We answer this question with a simple and efficient algorithm that exploits both the structure of the robot's task and the structure of our path planner, the salient aspects of which we briefly explain.

A. Path planning

For global planning, we use a gradient planner [15], [16] that computes optimal paths from the goal to the robot, given the cost grid C [1]. For efficiency, our planner employs the common trick of expanding high-cost areas (e.g., obstacles) in the cost grid by half the width of the robot, to produce a parallel expanded cost grid E. Planning then occurs in the grid E as if the robot were a single point (details such as non-holonomic constraints are handled by a local predictive controller [1]). Fig. 2(a) and Fig. 2(b) show a cost grid without and with expanded obstacles, respectively. The planner deterministically selects the optimal (i.e., lowest-cost) path through the expanded grid, with the only constraint being that the robot must not pass through obstacles. A key characteristic of this planner is that it operates by computing a navigation function [17] over the entire cost grid. The value of the navigation function, called a potential, is computed for every cell in the cost grid C, and is stored in a parallel potential grid P. The potential pij ∈ R for cell (i, j) is the total cost of the optimal path from (i, j) to the goal. Because of its computational efficiency, the planner can be executed at the same rate at which we command the robot, around 20 Hz. Thus at all times we have the potential (cost of the optimal path to the goal) for every cell in the cost grid.
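The potentials themselves can be computed with any single-source shortest-path method over the expanded grid. The sketch below uses a plain Dijkstra wavefront from the goal; it omits the interpolation of the gradient planner [15], [16], but yields the same quantity pij that the tunneling step relies on. The obstacle threshold here is an assumed placeholder.

```python
import heapq
import numpy as np

OBSTACLE_COST = 254  # assumed threshold: cells at or above this value are obstacles

def compute_potentials(E, goal):
    """Dijkstra wavefront from the goal over the expanded cost grid E.
    Returns P, where P[i, j] is the cost of the cheapest path from (i, j) to the goal."""
    P = np.full(E.shape, np.inf)
    P[goal] = 0.0
    heap = [(0.0, goal)]
    while heap:
        p, (i, j) = heapq.heappop(heap)
        if p > P[i, j]:
            continue  # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < E.shape[0] and 0 <= nj < E.shape[1] and E[ni, nj] < OBSTACLE_COST:
                cand = p + E[ni, nj]
                if cand < P[ni, nj]:
                    P[ni, nj] = cand
                    heapq.heappush(heap, (cand, (ni, nj)))
    return P
```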
Fig. 3. Demonstration of the tunneling algorithm. From the robot's position (blue triangle), a ray is cast in the cost grid, passing through an obstacle region. The last nonobstacle cell is designated o−, the first obstacle cell is o′, the last obstacle cell is o″, and the first nonobstacle cell to follow is o+.
B. Tunneling

The first step in our approach is to identify potential regions for exploration. Because the planner will, in its regular operation, send the robot through any nonobstacle region, we restrict our attention to obstacles. We work first in the expanded cost grid E. Beginning at the robot's current grid position r ∈ Z², we search for nearby obstacles. This search is implemented by raytracing (Fig. 3); in E we cast a set of rays O = {o = (r, θ) | θ ∈ Θ}, where the angles Θ are a discrete sample of the possible angles. We stop a ray o when either the ray impacts an obstacle cell, or the ray's length exceeds a maximum value dO. Because we operate on a discrete grid, a ray is represented by the list of grid cells through which it passes. In our implementation, we cast rays at 10° intervals around the robot, and set dO = 6 m. Rays that exceed the maximum length are discarded; no nearby obstacles lie in those directions. For each ray o that impacts an obstacle, the last nonobstacle cell is designated o−, and the first obstacle cell is o′. The ray o is then extended until it impacts another nonobstacle cell, which is designated o+. The last obstacle cell is designated o″. Rays that never reach nonobstacle cells are discarded. We collect the remaining rays in the set Ô. Each ray o ∈ Ô originates at the robot, passes through a nearby obstacle region, and terminates in a nonobstacle region. We call the rays in Ô tunnels, as they represent potential passageways through obstacles that the robot might exploit to more quickly reach its goal.
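A hedged sketch of this procedure follows. The ray representation, helper names, and the sampled (rather than Bresenham) raytracer are illustrative simplifications of what is described above.

```python
import math

def raytrace(r, theta, max_cells):
    """Cells along a ray from cell r at angle theta (a simple sampled ray;
    Bresenham's algorithm [18] is the efficient alternative)."""
    cells, seen = [], set()
    step = 0
    while len(cells) < max_cells:
        step += 1
        c = (int(round(r[0] + 0.5 * step * math.cos(theta))),
             int(round(r[1] + 0.5 * step * math.sin(theta))))
        if c not in seen:
            seen.add(c)
            cells.append(c)
    return cells

def find_tunnels(E, r, is_obstacle, d_max_cells, angles):
    """Identify candidate tunnels: rays from the robot's cell r that enter an obstacle
    region within d_max_cells cells and later exit it into nonobstacle space."""
    tunnels = []
    for theta in angles:
        cells = raytrace(r, theta, 4 * d_max_cells)   # extra length so the ray can exit
        entry = next((k for k, c in enumerate(cells[:d_max_cells]) if is_obstacle(E, c)), None)
        if entry is None or entry == 0:
            continue   # no nearby obstacle in this direction
        exit_ = next((k for k in range(entry + 1, len(cells)) if not is_obstacle(E, cells[k])), None)
        if exit_ is None:
            continue   # the ray never re-emerges; discard it
        o_minus, o_prime = cells[entry - 1], cells[entry]    # last free cell, first obstacle cell
        o_dprime, o_plus = cells[exit_ - 1], cells[exit_]    # last obstacle cell, first free cell
        tunnels.append((o_minus, o_prime, o_dprime, o_plus, cells[entry:exit_]))
    return tunnels
```

With the 20 cm cells and dO = 6 m described above, d_max_cells would be 30, and the angles would cover 360° in 10° steps.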
C. Evaluation

We wish to compute the expected benefit of sending the robot through each tunnel o ∈ Ô. While in general this question would be difficult or impossible to answer, we can exploit the fact that the robot is tasked to reach a specific goal location, as quickly as possible. First we compute the difference in potential between the cells before and after the obstacle through which the tunnel passes:

∆po = po− − po+    (1)
Thus ∆po is the difference between the cost of reaching the goal from the beginning of the tunnel and the cost of reaching the goal from the end of the tunnel. If ∆po > 0, then the robot would reach its goal more efficiently if it were to jump through the tunnel instead of circumnavigating the intervening obstacle. Of course, the robot cannot instantaneously jump through the tunnel, so we must consider the travel cost of the tunnel itself. For this purpose, we assume that the cost of each obstacle cell in the tunnel is actually some nonobstacle value ct. In our implementation, we set ct to a moderate value, slightly higher than the cost of flat ground. We compute the cost of passing through the tunnel as

co = L(o′, o″) ct    (2)
where the operator L((i, j), (k, l)) returns the number of grid cells on the straight (raytraced) line between the two given cells. We also incorporate the confidence values that are produced by the traversability model. Recall that γij ∈ [0, 1] is the estimated probability that the associated cost cij for cell (i, j) is correct. We compute the average confidence of the obstacle cells through which the tunnel passes as

γ̄o = ( Σ_{(i,j)=o′}^{o″} γij ) / L(o′, o″)    (3)

where the sum iterates over the list of cells in the tunnel. Putting it all together, we compute the expected benefit of a tunnel o as

bo = (1 − γ̄o)(∆po − co)    (4)
In other words, the benefit of a tunnel is the difference in the expected cost of traveling to the goal using the tunnel and not using the tunnel, weighted by the probability that the tunnel will actually turn out to be traversable. If bo > 0, then we expect that the robot will perform better by exploring tunnel o than by avoiding the obstacle through which it passes. Tunnels with nonpositive benefits are discarded. Note that we incur nearly zero computational overhead for the benefit computation, because the necessary cost, confidence, and potential values were already computed over the entire grid as part of the regular operation of the traversability model and planner. The primary overhead of this algorithm lies in the raytracing operations, which can be performed extremely efficiently, for example, using Bresenham’s algorithm [18].
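Given the potential grid P, the confidence grid Γ, and a tunnel's list of obstacle cells, equations (1)-(4) reduce to a few lines. The function below is a sketch under the conventions of the tunneling sketch above; the argument names and bookkeeping are ours.

```python
def tunnel_benefit(P, Gamma, o_minus, o_plus, tunnel_cells, c_t):
    """Expected benefit b_o of a tunnel, per equations (1)-(4)."""
    delta_p = P[o_minus] - P[o_plus]                                     # (1) potential saved
    c_o = len(tunnel_cells) * c_t                                        # (2) optimistic tunnel cost
    gamma_bar = sum(Gamma[c] for c in tunnel_cells) / len(tunnel_cells)  # (3) mean confidence
    return (1.0 - gamma_bar) * (delta_p - c_o)                           # (4) expected benefit

# Tunnels with nonpositive benefit are discarded before the plan is biased toward the rest.
```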
Fig. 4. The LAGR robot used in experiments.
D. Modifying the plan

Given a set of potentially beneficial tunnels, we must decide how to modify the robot's behavior so that it will explore them. One possibility would be to construct a new cost grid in which the cost of the tunnel's cells is set to ct, then plan a path from the robot to the far end of a tunnel, and then plan another path from the end of the tunnel to the goal. Alternatively, we could manually construct a path that passes directly through a tunnel and then continues on to the goal. Both of these approaches require us to select a single tunnel, and force the robot to pass through that tunnel. We opt instead for a simpler and less intrusive approach that allows for multiple tunnels to be considered simultaneously. Rather than directing the robot through a particular tunnel, we bias it toward one or more tunnels by lowering the cost of the tunnels' cells in the expanded cost grid E, and then running the path planner again. The result is that, if the appearance of a tunnel causes the planner to select a path through the tunnel, the robot will go that way, and otherwise it will not (Fig. 2). In this way, we avoid explicitly designating intermediate goals that are different from the real goal. We also maintain the regular operation of the planner, which, for example, uses an internal hysteresis mechanism to prevent high-frequency oscillations in the chosen path.

V. EXPERIMENTAL SETUP

A. Vehicle

This work was conducted as part of the DARPA Learning Applied to Ground Robotics (LAGR) project. The LAGR robot (Fig. 4) is equipped with two stereo camera devices encompassing a 110° field of view, each with a baseline of 12 cm. The robot is nearsighted: depth information degrades rapidly after 6 m. The cameras constitute the robot's only means of detecting distant obstacles. There is also an inertial measurement unit (IMU) with angular drift of several degrees per minute, and a WAAS-enabled GPS. There are four Pentium-M 2 GHz computers, one for each stereo device, one for planning and map making, and one for control
of the robot and integration of GPS and IMU readings. In our setup, each stereo computer performs local map making and visual odometry, and sends registered local maps to the planner, where they are integrated into a global map. The planner is responsible for global planning and local control, sending commands to the controller. The exploration algorithm that we describe in this paper is integrated into the planner.

B. Traversability model

We base our traversability model on a set of features developed by Happold et al. [19]. They define an eight-dimensional space that comprises key geometric features, such as mean height above ground, terrain slope, and point density. These features are computed from the three-dimensional point clouds produced by the robot's stereo vision system. Because it is strictly geometric, this feature space is invariant to color, which can change dramatically when the robot moves into a new environment.

C. Learning algorithm

The goal of the learning algorithm is to learn the association between feature vectors and cost from the robot's experience. Each cell that the robot drives over or bumps into adds a new training sample for the learner. We use AdaBoost to learn this association. AdaBoost [20], [21] is a very popular learning algorithm and has been successfully applied to many computer vision problems, including face detection [22]. The input to AdaBoost is a set of labeled training data. AdaBoost calls a given weak learner repeatedly in a series of rounds and maintains a distribution of weights over the training set. Initially, all weights are set equally, but on each round, the weights of incorrectly classified examples are increased so that the weak learner is forced to focus on the hard examples in the training set. Typically, each weak learner is a simple rule. In our case, the weak learner is a simple decision stump (a decision tree with two leaves) [22]. We formulate learning as a two-class problem. Easily traversable terrain with low cost is taken as the positive class; nontraversable terrain (the robot gets a bumper hit) and difficult terrain (the robot's wheels slip) are taken as the negative class. In practice, the training dataset will be imbalanced, because in most practical environments the area of traversable ground is much larger than the footprint of obstacles. Learning from imbalanced datasets, where the number of examples of one (majority) class is much higher than the others, presents an important challenge to the machine learning community. AdaUBoost [23] is a modification of the basic algorithm that deals with uneven datasets by initially assigning higher weights to the minority class. This weighting is controlled by an additional parameter β, which also guides the weight update rule in each round of AdaBoost, increasing the weights of the minority class more aggressively than those of the majority class. Each round of AdaBoost produces a weak learner that is a simple rule ht : f → [−1, 1] such that the sign of ht
is the predicted binary label and |ht| is the confidence of the classification. We also get a weight αt that controls the influence of each ht. The final strong classifier for T rounds of AdaBoost is simply

h(f) = Σ_{t=1}^{T} αt ht(f)    (5)

As before, the sign of h is the predicted binary label and |h| is the confidence of the classification.
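A decision stump and the strong classifier of (5) are simple enough to write out directly. This is a generic sketch of boosting with stumps, not the AdaUBoost variant with the β parameter; the field names and the hard {−1, +1} stump output are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Stump:
    feature: int       # index of the geometric feature to test
    threshold: float   # split point
    polarity: float    # +1 or -1: which side of the split is the positive class

    def __call__(self, f):
        """Weak hypothesis h_t(f); here a hard decision in {-1, +1}."""
        return self.polarity * (1.0 if f[self.feature] > self.threshold else -1.0)

def strong_classify(stumps, alphas, f):
    """Equation (5): h(f) = sum_t alpha_t h_t(f).
    The sign is the predicted class, the magnitude the confidence."""
    return sum(a * s(f) for a, s in zip(alphas, stumps))
```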
When the robot starts up, it builds an initial classifier from prior training data. This training data is collected offline by the robot during a training phase on similar terrain. Each new training instance (f, y), where f is the observed feature vector associated with the cell and y ∈ {−1, +1} is the class label, is appended to the training data. The instance is classified using the current classifier, and the predicted class is compared against the observed class; thus the robot continuously monitors its classification accuracy. Once the classification accuracy drops too low, the AdaUBoost training algorithm is called with all of the training data collected so far to update the classifier. One of the advantages of AdaBoost is that classification using (5) is extremely fast, since each weak classifier is a very simple decision stump. Not only is classification very fast, but the training phase is also fast for several thousand training instances. The main drawback, however, is that AdaBoost is not designed for online incremental learning. We therefore have to store all of the training data and start from scratch each time we need to update the classifier. Since the training phase of AdaBoost is fast, this is not a major issue, but we are looking into ways of using AdaBoost for online learning [24].

D. Risk mitigation

Exploration of apparent obstacle terrain is an inherently risky proposition. The robot will almost certainly guess incorrectly and collide with obstacles some of the time. Such occurrences do not cause serious problems for the LAGR platform, as it is physically robust and equipped with a front bumper to sense (and partially absorb) obstacle impacts. The more difficult issue with the LAGR platform is its vulnerability to becoming stuck in slippery terrain, especially loose soil such as sand or gravel, and in dense underbrush or deadfall. It is quite possible for the LAGR robot to become irretrievably stuck in such scenarios. We mitigate this risk in two ways, sketched in code below. First, we filter exploration tunnels that are highly likely to contain true obstacles: if a tunnel o passes through at least one obstacle cell (i, j) for which the confidence γij exceeds a threshold γmax ∈ [0, 1], we discard o from consideration. The intuition is that, no matter what the expected benefit might be, the robot should refuse to explore obstacle regions for which the traversability model has extremely high confidence. For the experiments presented in this paper, we set γmax = 0.8. Second, we implemented a slip estimation and compensation mechanism.
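Both mechanisms admit short implementations, sketched below. The confidence filter follows the description above directly; the slip check anticipates the mechanism detailed after Fig. 5 and assumes a particular slip-ratio definition and velocity cap, so those constants and formulas are placeholders rather than the values used on the robot.

```python
GAMMA_MAX = 0.8        # confidence threshold used in our experiments
SLIP_THRESHOLD = 0.5   # assumed slip-ratio threshold; a placeholder value
SLIP_SCALE = 0.25      # assumed velocity cap for a slipping wheel; also a placeholder

def tunnel_is_too_risky(Gamma, tunnel_cells, is_obstacle_cell):
    """Discard a tunnel if any obstacle cell in it has confidence above GAMMA_MAX."""
    return any(is_obstacle_cell(c) and Gamma[c] > GAMMA_MAX for c in tunnel_cells)

def limit_wheel_velocity(v_cmd, wheel_speed, visual_speed):
    """Cap the velocity commanded to a wheel whose slip ratio exceeds the threshold,
    in the spirit of automotive traction control."""
    slip = 1.0 - (visual_speed / wheel_speed) if wheel_speed > 1e-3 else 0.0
    return SLIP_SCALE * v_cmd if slip > SLIP_THRESHOLD else v_cmd
```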
Fig. 5. The course that we used for testing, viewed from the robot's starting position. The goal is directly in front of the robot, approximately 23 m away (a). The direct line to the goal is blocked by tall grass (b). To the robot's right is a flat, but circuitous, path (c).
Comparing the robot's motion as estimated by visual odometry to the motion reported by the wheels, we continuously estimate the slip ratio of each wheel. When the slip ratio for a wheel exceeds a threshold, we apply a threshold function to the maximum velocity allowed for that wheel. This mechanism works similarly to traction control in automobiles, which selectively applies brake pressure to slipping wheels. When the robot compensates for slip in this manner, it is far more capable of escaping from high-slip situations.

E. Course description

To test our exploration algorithm, we built the outdoor course shown in Fig. 5. From the robot's starting position, the goal is straight ahead and approximately 23 m away. The direct line to the goal is obstructed by tall grass that appears to be an obstacle, but which the robot can actually drive through. To the robot's right is a flat, easily traversable path that takes a longer, more circuitous route to the goal. To measure the impact of exploration, we tested the robot with and without exploration on this course. To isolate the effect of exploration and to ensure experimental repeatability, we disabled run-to-run learning. Thus, in all tests, the robot started with the same initial traversability model, learned offline by manually gathering training instances and providing them to AdaBoost. Because this model classifies the tall grass as an obstacle, we expect the baseline system (without exploration) to take the more circuitous route. The exploring robot should be willing to attempt the direct route through the tall grass.

VI. RESULTS

We performed three runs with exploration and three runs without exploration, on the same course. In all three runs without exploration, the robot behaved as expected; it avoided the tall grass and took the long path to the goal. In all three runs with exploration, the robot selected a tunnel through the tall grass and negotiated a shorter route to the
goal. We measured three quantities: time to reach the goal, total distance traveled to reach the goal, and number of bumper hits (i.e., collisions with obstacles) en route.
TABLE I
Results comparing three runs each with and without exploration. The experimental mean and (standard deviation) are given for elapsed time and travel distance. Bumper hits (i.e., collisions) are reported as a total for the three runs.

                     No exploration     Exploration
Time [seconds]       55.243 (11.996)    32.793 (3.610)
Distance [meters]    41.825 (6.628)     24.221 (1.385)
Bumps [total]        0                  2
The results are summarized in Table I. Exploration provides a clear (and statistically significant) benefit over the baseline system in terms of time and distance. On average, the exploring robot reached the goal 41% faster, with a 42% shorter path, than the baseline system. On the other hand, the exploring robot also incurred more bumper hits. While the baseline system experienced zero collisions, the exploring robot collided in some runs with hay bales. This tradeoff is unavoidable; if the robot is willing to explore apparent obstacles, it will occasionally decide poorly and collide with an actual obstacle. However, the robot also learns from these collisions, updating its traversability model and thus becoming less willing in the future to explore similar terrain.

VII. SUMMARY

We have presented a novel approach to exploration in which the robot purposefully probes into apparent obstacles, with the goal of improving its traversability model. We presented a decision-theoretic algorithm that identifies potential "tunnels" through apparent obstacles and estimates the benefit of exploring them. This algorithm has the key advantages of being extremely efficient to execute, being independent of the learning algorithm used to construct
traversability models, and minimally intruding on the robot's existing planning and control systems. We validated our algorithm in experiments on an outdoor course, showing a clear advantage, in terms of time and travel distance, for the exploring robot over the baseline system. However, we also discovered that the exploring robot is more likely to collide with obstacles. More generally, an exploring robot is less conservative and thus more likely to get into troublesome situations than a nonexploring robot. While simple collisions and occasional wheel slip are not necessarily problematic, clearly there exist exploratory actions that would be fatal (e.g., driving off a cliff, falling into a body of water). We expect that in many applications, the most reliable behavior can be produced by a combination of manual training and semi-supervised autonomous exploration.

Our ongoing work in this area focuses on two issues. First, we are studying more principled techniques for estimating and mitigating the risk of exploration, to avoid taking fatal actions. Second, we are determining the most effective way to employ run-to-run, or even lifelong, learning, in which the robot's traversability model persists and is improved over an extended period of time. We expect such prolonged learning to further demonstrate the value of our approach to exploring in difficult terrain.

VIII. ACKNOWLEDGMENTS

This work is supported in part by the DARPA LAGR program. We thank Mark Ollis for providing us with his feature extraction library.

REFERENCES

[1] K. Konolige, M. Agrawal, R. C. Bolles, C. Cowan, M. Fischler, and B. P. Gerkey, "Outdoor mapping and navigation using stereo vision," in Proc. of the Intl. Symp. on Experimental Robotics (ISER), Rio de Janeiro, Brazil, July 2006.
[2] M. Agrawal and K. Konolige, "Real-time localization in outdoor environments using stereo vision and inexpensive GPS," in Proc. of the Intl. Conf. on Pattern Recognition (ICPR), August 2006.
[3] S. Koenig and R. G. Simmons, "Exploration with and without a map," in Proc. of the AAAI Workshop on Learning Action Models at the Eleventh National Conference on Artificial Intelligence (AAAI), 1993, pp. 28–32 (also available as AAAI Technical Report WS-93-06).
[4] B. Yamauchi, "Frontier-based exploration using multiple robots," in Proc. of Autonomous Agents, Minneapolis, Minnesota, May 1998, pp. 47–53.
[5] W. Burgard, D. Fox, M. Moors, R. Simmons, and S. Thrun, "Collaborative multi-robot exploration," in Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA), San Francisco, California, 2000, pp. 476–481.
[6] R. Zlot, A. Stentz, M. B. Dias, and S. Thayer, "Multi-robot exploration controlled by a market economy," in Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA), Washington, DC, May 2002, pp. 3016–3023.
[7] W. Burgard, M. Moors, C. Stachniss, and F. Schneider, "Coordinated multi-robot exploration," IEEE Transactions on Robotics, vol. 21, no. 3, 2005.
[8] A. Howard, M. J. Matarić, and G. S. Sukhatme, "Mobile sensor network deployment using potential fields: A distributed, scalable solution to the area coverage problem," in Proc. of the Intl. Symp. on Distributed Autonomous Robotic Systems (DARS), 2002, pp. 299–308.
[9] M. Batalin and G. S. Sukhatme, "The design and analysis of an efficient local algorithm for coverage and exploration based on sensor network deployment," IEEE Transactions on Robotics, vol. 23, no. 4, pp. 661–675, Aug. 2007.
[10] C. Ortiz, R. Vincent, and B. Morisset, "Task inference and distributed task management in the Centibots robotic system," in Proc. of the Intl. Conf. on Autonomous Agents and Multiagent Systems, July 2005.
[11] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA), vol. 4, May 2002, pp. 3404–3410.
[12] K. Iagnemma and S. Dubowsky, "Traction control of wheeled robotic vehicles in rough terrain," The Intl. J. of Robotics Research, vol. 23, no. 10–11, pp. 1029–1040, 2004.
[13] A. Angelova, L. Matthies, D. Helmick, and P. Perona, "Slip prediction using visual information," in Proc. of Robotics: Science and Systems, Philadelphia, USA, Aug. 2006.
[14] D. Helmick, S. Roumeliotis, Y. Chang, D. Clouse, M. Bajracharya, and L. Matthies, "Slip-compensated path following for planetary exploration rovers," Advanced Robotics, vol. 20, no. 11, pp. 4782–4795, Nov. 2006.
[15] K. Konolige, "A gradient method for realtime robot control," in Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2000.
[16] R. Philippsen and R. Siegwart, "An interpolated dynamic navigation function," in Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA), 2005.
[17] J.-C. Latombe, Robot Motion Planning. Norwell, Massachusetts: Kluwer Academic Publishers, 1991.
[18] J. Bresenham, "Algorithm for computer control of a digital plotter," IBM Systems J., vol. 4, no. 1, pp. 25–30, 1965.
[19] M. Happold, M. Ollis, and N. Johnson, "Enhancing supervised terrain classification with predictive unsupervised learning," in Proc. of Robotics: Science and Systems, Philadelphia, USA, Aug. 2006.
[20] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.
[21] R. E. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," in Computational Learning Theory, 1998, pp. 80–91.
[22] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. of the Computer Vision and Pattern Recognition Conf. (CVPR), vol. 1, 2001, pp. 511–518.
[23] G. Karakoulas and J. Shawe-Taylor, "Optimizing classifiers for imbalanced training sets," in Proc. of Advances in Neural Information Processing Systems (NIPS), 1999, pp. 253–259.
[24] H. Syed-Mohammed, J. Leander, M. Marbach, and R. Polikar, "Can AdaBoost.M1 learn incrementally? A comparison to Learn++ under different combination rules," in Intl. Conf. on Artificial Neural Networks (ICANN), Athens, Greece, 2006, pp. 254–263.