PUBLISHED IN IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS–PART B, VOL. 26, NO. 3, PP. 361–364, 1996. SPECIAL ISSUE ON LEARNING AUTONOMOUS ROBOTS
Editorial Introduction to the Special Issue on Learning Autonomous Robots

Marco Dorigo, Member, IEEE
Guest Editor

Programming an autonomous robot so that it acts reliably in a dynamic environment is difficult. This is due to problems such as information that is missing at design time, the unpredictability of the environment dynamics, and the inherent noise of the robot's sensors and actuators. For example, a robot may have to move in a cluttered environment whose topology is a priori unknown, people may be moving around it, and it may have to sense its surroundings with noisy devices such as sonars and dead-reckoning.

It is clear that a learning autonomous robot, that is, an autonomous robot that can acquire knowledge by interacting with the environment and subsequently adapt or change its behavior in the course of its life, could greatly simplify the work of its designer. A learning robot need not be given all the details of the environment in which it is going to act: it will acquire them by direct interaction and exploration. Likewise, its sensors and actuators need not be finely tuned: they will adapt to the specific task requirements and environmental conditions.

The goal of this special issue is to present some of the most interesting ongoing research on the design and building of learning robots. All but one of the papers in this special issue present results obtained with real robots. This reflects the growing consensus in the field that, to be considered useful for real robotic applications, a learning technique must be tested on a real robot. Simulations can only be an approximate model of the real world, and building a good simulation is at least as costly as building a good working robot. Moreover, researchers often build their simulation environments in ways that make the learning problems easier than those the corresponding real robot would have to solve. For example, global information is often assumed, such as exact knowledge of the robot's position, or sensors are credited with capacities they simply do not have, such as the ability to reliably recognize different kinds of objects in the environment.

Nevertheless, for the time being, working only with real robots is too costly in terms of time and resources. Simulation therefore retains an important role: researchers use it to speed up the building and prototyping of their systems. Running simulations requires much less experimental effort than running real-world experiments and lets one arrive at a working, if perhaps coarse, version of the learning system. Still, once the major issues have been solved in simulation, the developed learning system must be tested on the real robot. Although simulation results often carry over, with minor changes, to the real world, sometimes even very good results obtained with sophisticated simulators do not hold on the real robot, and new techniques must be developed for it. An example of this is described by Yamauchi and Beer in their correspondence paper.

Given the relatively young age of the field, the applications described in these papers are simple compared to what we would like our robots to be able to do. We can divide them into two main categories.
In the first category, animat-like robots interact with real environments to accomplish fairly simple tasks such as avoiding obstacles, following objects, reaching places, and recharging batteries. The papers by Colombetti, Dorigo, and Borghi; Donnart and Meyer; Floreano and Mondada; Millán; and Meeden, together with the correspondence by Gaudiano, Zalama, and López-Coronado, fall into this category. The second category, robots learning to navigate in a given environment, includes the papers by Donnart and Meyer; Millán; Tani; Greiner and Isukapalli; and Yamauchi and Beer. In addition, Baluja's paper addresses learning the steering direction of an autonomous vehicle.

Obviously, there are many ways to organize the papers of this special issue, and the one above is only one possibility. Another is to use the learning technique as the discriminating variable, which yields a few, often overlapping, groups. In the first group of four papers the learning technique is based mainly on the evolutionary computation (EC) paradigm: in one paper a genetic algorithm is part of a learning classifier system (Colombetti, Dorigo, and Borghi), while in three papers (Floreano and Mondada; Baluja; Meeden) evolutionary algorithms are used to evolve a neural controller. Many of the papers are concerned with neural networks. Besides those just cited, we find the paper by Millán, in which a neural controller is built incrementally at run time by a reinforcement learning technique based on temporal differences; the paper by Tani, which uses a recurrent neural network (RNN) trained by the backpropagation-through-time algorithm [9]; and the correspondence by Gaudiano, Zalama, and López-Coronado, which uses an architecture based on self-organizing neural networks. Besides these two main paradigms, there are Donnart and Meyer's paper, which uses a hierarchical learning classifier system; Greiner and Isukapalli's paper, which proposes a supervised learning algorithm employing the statistical tests and evaluation criteria of the "probably approximately correct" (PAC) learning framework [10]; McCallum's paper, which addresses the hidden state problem with a technique called instance-based learning; and Yamauchi and Beer's paper, which proposes adaptive place networks, a special-purpose learning algorithm designed for spatial learning.

Still another way to classify the papers is by learning paradigm. Most of the authors use reinforcement learning (Colombetti, Dorigo, and Borghi; Donnart and Meyer; Millán; McCallum; Meeden). Floreano and Baluja use evolutionary learning, that is, they use an evolutionary algorithm to learn the weights (Floreano) or both
the weights and the topology (Baluja) of a neural network. Colombetti, Dorigo, and Borghi also use evolutionary learning, but within the learning classifier system (LCS) framework [1]. Finally, Tani uses supervised learning; Greiner and Isukapalli use supervised PAC learning; Gaudiano, Zalama, and López-Coronado use unsupervised self-organizing learning; and Yamauchi and Beer use unsupervised learning of topological and metric spatial information. Table I summarizes the different classifications.
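Because evolutionary learning recurs in several of these classifications, a minimal sketch may help fix ideas: an evolutionary algorithm searches directly over the weight vector of a small neural controller, guided only by a scalar fitness score. Everything below (the network sizes, the fitness function, the fake "sensor readings") is invented for illustration and stands in for real robot trials; it is not any author's actual system. Floreano and Mondada, for instance, compute fitness from runs on the physical KHEPERA rather than from a simulated stand-in.

```python
import numpy as np

rng = np.random.default_rng(42)
N_IN, N_HID, N_OUT = 8, 5, 2            # e.g., 8 infrared readings in, 2 motor speeds out
N_W = N_IN * N_HID + N_HID * N_OUT      # genome length = total number of weights

def controller(weights, sensors):
    """Feedforward neural controller: sensor vector in, motor commands out."""
    w1 = weights[:N_IN * N_HID].reshape(N_HID, N_IN)
    w2 = weights[N_IN * N_HID:].reshape(N_OUT, N_HID)
    return np.tanh(w2 @ np.tanh(w1 @ sensors))

def fitness(weights):
    """Invented stand-in for a robot trial: reward fast, straight motion
    while the (fake) proximity sensors read low."""
    score = 0.0
    for _ in range(50):
        sensors = rng.random(N_IN)                # fake infrared readings
        left, right = controller(weights, sensors)
        speed = (left + right) / 2.0
        turning = abs(left - right)
        score += speed * (1.0 - turning) * (1.0 - sensors.max())
    return score

# Generational loop: keep the 10 best genomes, refill with mutated copies.
pop = rng.normal(0.0, 1.0, (30, N_W))
for generation in range(100):
    scores = np.array([fitness(ind) for ind in pop])
    elite = pop[np.argsort(scores)[-10:]]          # truncation selection
    children = elite[rng.integers(0, 10, 20)] + rng.normal(0.0, 0.1, (20, N_W))
    pop = np.vstack([elite, children])             # elitism + mutated offspring
print("best fitness:", max(fitness(ind) for ind in pop))
```

Truncation selection plus Gaussian mutation is only one of many possible choices; Baluja's population-based incremental learning, for example, replaces the explicit population with a probability vector from which candidate genomes are sampled.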
Table I. Main characteristics of the robots and of the learning systems presented in the papers of this special issue.

| Authors | Type of robot | Real robot application | Sensors used (# of sensors) | Learning technique | Learning paradigm |
|---|---|---|---|---|---|
| Colombetti, Dorigo, Borghi | AUTONOMOUSE; ROBUTER; IBM 7547 SCARA | AUTONOMOUSE: search and follow a moving light; ROBUTER: collect objects and store them; IBM 7547 SCARA: reach a still object | AUTONOMOUSE: light sensors (2), sonars (1), bumpers (3), change-of-direction sensor (1); ROBUTER: sonars (24), chromatic sensor (1), dead-reckoning; IBM 7547 SCARA: infrared (1), proprioceptive sensor (1) | Learning classifier system and evolutionary computation | Reinforcement learning and evolutionary learning |
| Donnart, Meyer | KHEPERA | Navigation; goal reaching; obstacle avoidance | Infrared (6), dead-reckoning | Learning classifier system | Reinforcement learning |
| Floreano, Mondada | KHEPERA | Goal reaching; obstacle avoidance | Infrared (8), light sensors (2), floor brightness sensor (1) | Evolutionary computation and recurrent neural network | Evolutionary learning |
| Millán | NOMAD 200 | Goal reaching; obstacle avoidance; navigation | Infrared (16), sonar (16), bumpers (20), dead-reckoning | Temporal differences and incremental neural network | Reinforcement learning |
| Tani | YAMABICO | Navigation; goal reaching; obstacle avoidance | Laser range finder (1), CCD camera (3) | Recurrent neural network | Supervised learning |
| Greiner, Isukapalli | NOMAD 200 | Navigation | CCD camera (1) | PAC learning | Supervised PAC learning |
| Baluja | NAVLAB | Control of steering direction of an autonomous vehicle | Video camera (1) | Evolutionary computation and neural network | Evolutionary learning |
| McCallum | — | — | — | Memory-based and Q-learning | Reinforcement learning |
| Meeden | CARBOT | Seek and avoid light; obstacle avoidance | Bumpers (4), light sensors (2) | Evolutionary computation and recurrent neural network | Reinforcement learning and evolutionary learning |
| Gaudiano, Zalama, López-Coronado | ROBUTER | Goal reaching; trajectory following | Dead-reckoning | Self-organizing neural network | Unsupervised learning |
| Yamauchi, Beer | NOMAD 200 | Navigation | Infrared (16), sonar (16), laser range finder (1), dead-reckoning | Adaptive place network | Unsupervised learning |
The papers in this special issue should give a good overview of ongoing research in the field. Obviously, a lot of interesting work is not represented here. The interested reader can find a further collection of works in a book recently edited by J.H. Connell and S. Mahadevan [2]. Also, two other journals are currently editing collections of papers on learning robots: the special issue on Robot Learning of the Machine Learning journal, edited by J. Franklin, T. Mitchell, and S. Thrun [5], and the special issues on Reinforcement Learning and Robotics, edited by Ben Kröse [7], and on the Animat Approach to Control Autonomous Robots Interacting with an Unknown World, edited by Philippe Gaussier [6], both to be published by the Robotics and Autonomous Systems journal. These collections will appear at approximately the same time as this special issue.

THE PAPERS IN THIS SPECIAL ISSUE

In the first paper of this collection, Colombetti, Dorigo, and Borghi propose a new technological discipline, which they call Behavior Engineering, whose main concern is, in the authors' words, "to establish methodologies, models and tools for shaping the behavior of robots." They also present a behavior engineering methodology called Behavior Analysis and Training (BAT) for designing learning robots. After a detailed introduction to the BAT methodology, three case studies are presented in which the methodology is applied to two autonomous robots and to an industrial mechanical arm. The learning system they use is ALECSYS [3], [4], based on Holland's learning classifier system [1]. Their applications are animat-style tasks.

Donnart and Meyer propose MonaLysa, a control architecture inspired by the motivational system of animals. The architecture comprises both a reactive and a planning module, and implements an autonomous animat capable of choosing its course of action as a function of its sensory input, its internal state, and the expected consequences of its current and future behavior. They use a hierarchical learning classifier system as the learning technique, and their application is goal reaching with obstacle avoidance. Their learning system is also capable of inferring planning rules from previous experience.

Floreano and Mondada propose an evolutionary approach to developing obstacle avoidance and battery recharging behaviors for the small KHEPERA robot [8]. They have been able to run experiments on the real robot lasting ten days without human intervention. An interesting aspect of their research is the use of a neuroethologically grounded approach to analyzing the neural controllers they evolved. Their robotic application is also an animat-style task.

In his paper, Millán proposes a reinforcement connectionist learning architecture, called TESEO, for acquiring navigation strategies. The main characteristic of TESEO is that it allows the robot to become operational very quickly and to improve its performance incrementally. The application presented is learning to reach a goal position along a short and smooth trajectory.

Tani discusses problems such as symbol grounding and situatedness. He argues that his model-based learning scheme allows a robot to become grounded, and he shows how his robot, once it has learned, can become situated in the environment. He uses an RNN trained by supervised backpropagation through time. The application is navigation with obstacle avoidance and goal reaching.
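The training procedure Tani relies on can be made concrete with a toy example. The sketch below implements backpropagation through time [9] for a tiny recurrent network that predicts the next point of a repeating two-dimensional "trajectory"; the network size, task, learning rate, and gradient clipping are invented minimal choices, not Tani's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 8, 2                          # hidden units; input/output dimension
Wxh = rng.normal(0, 0.5, (H, D))     # input -> hidden
Whh = rng.normal(0, 0.5, (H, H))     # hidden -> hidden (the recurrent weights)
Why = rng.normal(0, 0.5, (D, H))     # hidden -> output

# Toy task: predict the next 2-D point on a small repeating "trajectory".
seq = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]] * 5)
lr = 0.05

for epoch in range(2000):
    # Forward pass: unroll the network over the whole sequence.
    h = [np.zeros(H)]
    errs, loss = [], 0.0
    for t in range(len(seq) - 1):
        h.append(np.tanh(Wxh @ seq[t] + Whh @ h[-1]))
        err = Why @ h[-1] - seq[t + 1]          # prediction minus target
        errs.append(err)
        loss += 0.5 * err @ err
    # Backward pass: push each error back through every earlier time step.
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dh_next = np.zeros(H)
    for t in reversed(range(len(seq) - 1)):
        dWhy += np.outer(errs[t], h[t + 1])
        dz = (Why.T @ errs[t] + dh_next) * (1 - h[t + 1] ** 2)  # through tanh
        dWxh += np.outer(dz, seq[t])
        dWhh += np.outer(dz, h[t])
        dh_next = Whh.T @ dz                    # carries error to step t-1
    for W, dW in ((Wxh, dWxh), (Whh, dWhh), (Why, dWhy)):
        W -= lr * np.clip(dW, -5, 5)            # clipped gradient step

print(f"final sequence loss: {loss:.4f}")
```

The backward loop is what distinguishes backpropagation through time from ordinary backpropagation: the hidden-state error `dh_next` is carried across time steps, so weights are credited for consequences arbitrarily far in the future of the unrolled sequence.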
Greiner and Isukapalli's paper focuses on learning a selection function that chooses, from the set of landmarks that might be visible from the robot's estimated current position, those that can be used most reliably to estimate the robot's position. The authors also prove, using a statistical technique, that the learned selection function is, with high probability, at a local optimum in the space of possible functions.

Baluja discusses the use of an evolutionary algorithm called population-based incremental learning to learn the weights of a feedforward neural network whose task is to control the steering direction of an autonomous vehicle. Experimental results obtained with Carnegie Mellon's NAVLAB show that the proposed evolutionary approach outperforms the traditional backpropagation approach.

McCallum's paper deals with the hidden state problem, that is, the problem that arises when the robot's choice of its next action depends on information hidden from its sensors because of occlusion, a bounded field of view, and the like. To solve this problem he proposes instance-based state identification, a method that learns with an order of magnitude fewer steps than several previous state identification techniques. Although the paper does not contain results on a real robot, it is an interesting contribution for real-robot practitioners, given that hidden state is a problem many real robots will have to face. (A toy illustration of the hidden state problem appears after these summaries.)

Finally, the issue is concluded by three correspondence papers. In the first, Meeden considers whether a local method of reinforcement (such as backpropagation) or a global method (such as a genetic algorithm) is the better means of providing guidance to an adaptive neural-network-based robot controller. In the second, Gaudiano, Zalama, and López-Coronado apply their NETMORC neural architecture [11] to various low-level control problems on a ROBUTER and compare their approach to a classical nonlinear controller. The issue closes with Yamauchi and Beer's correspondence on learning a map of the environment with the ELDEN system. They introduce adaptive place networks and integrate them with a reactive controller. The application presented is navigation in a dynamic environment containing multiple forms of change, including moving people and rearranged obstacles.
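To make the hidden state problem concrete, consider the toy sketch below: two positions in a one-dimensional corridor produce the same sensor reading, so a learner keyed on the raw observation alone cannot tell them apart, while one keyed on a short window of recent observations and actions can. The corridor world, window length, and parameters are all invented for illustration, and this windowing remedy is a far simpler cousin of, not a reimplementation of, McCallum's instance-based state identification.

```python
import random
from collections import defaultdict

# A 1-D corridor the robot senses only locally: several positions read
# "open", so the raw observation hides where the robot actually is.
CORRIDOR = ["wall", "open", "open", "wall", "open", "goal"]

def observe(pos):
    return CORRIDOR[pos]

Q = defaultdict(float)              # Q-values keyed by (history window, action)
ACTIONS = [-1, +1]                  # step left / step right
alpha, gamma, eps = 0.2, 0.9, 0.1

for episode in range(2000):
    pos, history = 0, (observe(0),)
    for step in range(30):
        # Epsilon-greedy action selection over the history-keyed Q-table.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[history, act])
        pos = min(max(pos + a, 0), len(CORRIDOR) - 1)
        r = 1.0 if observe(pos) == "goal" else -0.01
        # The "state" is the last few observations and actions, which
        # disambiguates the two look-alike "open" cells.
        nxt = (history + (a, observe(pos)))[-4:]
        best_next = max(Q[nxt, b] for b in ACTIONS)
        Q[history, a] += alpha * (r + gamma * best_next - Q[history, a])
        history = nxt
        if r > 0:
            break
```

Fixing the window length in advance is exactly what McCallum avoids: his method decides from experience how much history is needed to identify the state.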
SOME STATISTICS

Twenty-seven papers were submitted to this special issue. Of these, eight were accepted as full papers and three as correspondences. Each submitted paper was reviewed by at least three referees, and all accepted papers were revised at least once (most went through two revisions). Of the submitted papers, 9.3 were from US-based researchers, 13.2 from the European Union, 1.5 from Switzerland, one from Israel, one from Japan, and one from Korea.¹ The accepted papers are: 5.3 from US-based researchers, 4.2 from the European Union, one from Japan, and 0.5 from Switzerland.

ACKNOWLEDGMENTS

I am grateful to the authors for their efforts in producing such high-quality papers. I would like to thank Andrew Sage, Editor-in-Chief of these Transactions, for giving me the opportunity to edit this special issue and for being very flexible about deadlines. I am also indebted to the some 60 referees listed below; with their time and work they greatly contributed to the success of this enterprise. This work was also made possible by a CEC Human Capital and Mobility Programme Fellowship to Marco Dorigo for the years 1994–1996.

LIST OF REVIEWERS

P. Agre, S. Baluja, R.D. Beer, H. Bersini, P. Bessiere, A. Bonarini, G. Bontempi, G. Borghi, V. Caglioti, P.V.C. Caironi, G. Cembrano, M. Colombetti, J.H. Connell, C. Decaestecker, M. Dorigo, D. Floreano, T. Fogarty, D. Fogel, A. Fraser, L.M. Gambardella, P. Gaudiano, P. Gaussier, M. Gini, R. Greiner, B. Hallam, M. Heger, P. Husbands, I. Kamon, Y.S. Kim, P. Lima, J. López-Coronado, D. Luzeaux, S. Mahadevan, V. Maniezzo, A. McCallum, L. Meeden, J.-A. Meyer, O. Michel, J.d.R. Millán, F. Mondada, M. Patel, R. Pfeifer, J. Rice, R. Riolo, A. Saffiotti, A. Schultz, C. Sheaffer, N. Swarup, J. Tani, S. Thrun, C. Torras, C. Touzet, N. Tschichold-Gürman, T. Tyrrell, T. Van de Merckt, C. Versino, B. Yamauchi, E. Zalama, S. Zrehen, and three additional anonymous referees who reviewed the submission by M. Colombetti, M. Dorigo, and G. Borghi.

REFERENCES
[1] L. Booker, D.E. Goldberg, and J.H. Holland, "Classifier Systems and Genetic Algorithms," Artificial Intelligence, vol. 40, no. 1-3, pp. 235–282, 1989.
[2] J.H. Connell and S. Mahadevan, Eds., Robot Learning. Kluwer Academic Publishers, 1993.
[3] M. Dorigo, "ALECSYS and the AutonoMouse: Learning to Control a Real Robot by Distributed Classifier Systems," Machine Learning, vol. 19, no. 3, 1995.
[4] M. Dorigo and E. Sirtori, "ALECSYS: A Parallel Laboratory for Learning Classifier Systems," in Proceedings of the Fourth International Conference on Genetic Algorithms, San Diego, CA, R.K. Belew and L.B. Booker, Eds., Morgan Kaufmann, pp. 296–302, 1991.
[5] J. Franklin, T. Mitchell, and S. Thrun, Eds., Special issue on Robot Learning, Machine Learning, 1995, forthcoming.
[6] P. Gaussier, Ed., Special issue on the Animat Approach to Control Autonomous Robots Interacting with an Unknown World, Robotics and Autonomous Systems, forthcoming.
[7] B. Kröse, Ed., Special issue on Reinforcement Learning and Robotics, Robotics and Autonomous Systems, vol. 15, no. 3, July 1995.
[8] F. Mondada, E. Franzi, and P. Ienne, "Mobile Robot Miniaturization: A Tool for Investigation in Control Algorithms," in Experimental Robotics III: Proceedings of the 3rd International Symposium on Experimental Robotics, 1993, T. Yoshikawa and F. Miyazaki, Eds., Springer-Verlag, pp. 501–513, 1994.
[9] D. Rumelhart, G. Hinton, and R. Williams, "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing, D.E. Rumelhart and J.L. McClelland, Eds., Cambridge, MA: MIT Press, 1986.
[10] L.G. Valiant, "A Theory of the Learnable," Communications of the ACM, vol. 27, no. 11, pp. 1134–1142, 1984.
[11] E. Zalama, P. Gaudiano, and J. López-Coronado, "A Real-Time, Unsupervised Neural Network for the Low-Level Control of a Mobile Robot in a Nonstationary Environment," Neural Networks, vol. 8, no. 1, pp. 102–123, 1995.
¹ Fractions are due to multi-national submissions.