A neural network model of avoidance and approach behaviors for mobile robots

Carolina Chang and Paolo Gaudiano
Boston University Neurobotics Lab
Dept. of Cognitive and Neural Systems
677 Beacon Street, Boston, MA 02215
E-mail: {cchang, gaudiano}@cns.bu.edu

Supported by the Office of Naval Research (ONR N00014-96-1-0772 and N00014-95-1-0409) and by the mediaCenter GmbH, Friedrichshafen, Germany.

Abstract

In this paper we describe a neural network for reactive and adaptive robot navigation. The network is based on a model of classical and operant conditioning first proposed by Grossberg [3]. The network has been implemented successfully on the real Khepera robot. This work shows the potential of applying self-organizing neural networks to the area of intelligent robotics.

Introduction

In order to survive, an autonomous agent must interact successfully with its environment. Because the environment changes constantly, the agent must learn about the causes and consequences of the events that take place around it. Moreover, it must learn to predict the consequences of its own actions, in order to promote favorable events and to prevent unfavorable ones. In this respect, one impressive aspect of animal behavior is the facility with which organisms learn to interact with their environment as they gather food, seek mates, and avoid predators and other natural hazards. At our lab, we believe that biologically inspired neural networks have great potential for achieving autonomy in real robots. In particular, through operant conditioning our robot learns to avoid obstacles reactively with only minimal information about its own structure and its environment. Using the same principle, the robot learns to approach a source of environmental light. In this article we summarize the proposed neural network and report its successful implementation on the real Khepera robot.

Classical and Operant Conditioning

Models of classical and operant conditioning [5, 6, 3] have emerged from the field of psychology. In the classical conditioning paradigm, learning occurs by repeated association of a Conditioned Stimulus (CS), which normally has no particular significance for an animal, with an Unconditioned Stimulus (UCS), which does have significance and always gives rise to an Unconditioned Response (UCR). For example, a dog that repeatedly hears a bell before being fed will eventually begin to salivate when the bell is heard. The response that comes to be elicited by the CS after classical conditioning is known as the Conditioned Response (CR). A related form of learning is known as operant conditioning. In this case the animal learns the consequences of its actions: it learns to exhibit more frequently a behavior that has led to a reward, and to exhibit less frequently a behavior that has led to punishment. For example, a hungry cat placed in a cage from which it can see some food will learn to press a lever that allows it to escape the cage and reach the food. In this situation the animal cannot simply wait for things to happen; it must generate different behaviors and learn which are effective. This kind of learning is related to reinforcement learning [6].

Conditioning and obstacle avoidance

We have designed a neural network model (fig. 1) for obstacle avoidance based on the conditioning circuit initially proposed by Grossberg [3] and subsequently refined by Grossberg & Levine [4]. In this model the sensory cues (CSs) are stored in Short Term Memory (STM) within the population labeled S. The CS nodes correspond to activations of the robot's range sensors. At the bottom of the figure the drive node D (e.g., fear) is represented; conditioning can occur only when the drive node is active. For simplicity, the drive node is directly associated with the UCS (e.g., a collision). The population of nodes P learns to "imitate" the UCR. The neurons at the far right of the figure represent the response (conditioned or unconditioned), and are thus connected to the motor system.
Figure 1: Conditioning model for obstacle avoidance. See text for details.

The motor population represents an angular velocity map, in which the leftmost node corresponds to turning left at the maximum rate, the central node corresponds to moving straight ahead, and so on. The activation of the map is distributed as a Gaussian centered on the desired angular velocity. One reason for using a Gaussian distribution of activity is depicted in fig. 2: when an excitatory Gaussian is combined with an inhibitory Gaussian at a slightly shifted position, the resulting net pattern of activity exhibits a maximum peak that is shifted away from the peak of the inhibitory Gaussian. Hence, the presence of an obstacle to the right causes the robot to shift to the left, and vice versa. Further details on this operation can be found elsewhere [2].

Figure 2: Gaussian peak shift property.

The robot is trained by allowing it to make random movements in a cluttered environment. Whenever the robot collides with an obstacle (or comes very close to one), the nodes corresponding to the largest (closest) range sensor measurements just prior to the collision will be active. Activation of the drive node allows two different kinds of learning to take place: the learning that couples sensory nodes with the drive node, and the learning of the angular velocity pattern that existed just before the collision. The first type of learning allows the robot to predict, from its current sensor readings, when it is about to collide with an obstacle. The second type of learning associates the sensor activations with the angular velocity map: by using an inhibitory learning law, each sensory node learns to generate a pattern of inhibition that matches the activity profile present at the time of collision.

Once learning has occurred, the activation of the angular velocity map is given by two components (fig. 2). An excitatory component, generated directly by the sensory system, reflects the angular velocity required to reach a given target in the absence of obstacles. The second, inhibitory component, generated by the conditioning model in response to sensed obstacles, moves the robot away from the obstacles as a result of the activation of sensory signals in the conditioning circuit.

Figure 3: Khepera (K-Team SA, Préverenges, Switzerland) is a miniature (2" diameter) round robot with eight infrared proximity detectors and a 68331-based microcontroller.
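The peak-shift combination of excitatory and inhibitory Gaussians over the angular velocity map can be sketched in a few lines of Python. The map size, Gaussian widths, amplitudes, and peak positions below are illustrative assumptions, not values from the paper:

```python
import math

# Discrete angular velocity map: node 0 = hard left, the middle node =
# straight ahead, the last node = hard right. All numbers here are
# illustrative assumptions, not the parameters used in the model.
N = 101

def gaussian_profile(center, sigma=10.0, amplitude=1.0):
    """Gaussian activity pattern over the angular velocity map."""
    return [amplitude * math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
            for i in range(N)]

# Excitatory component: target straight ahead (map center).
excitatory = gaussian_profile(center=50)
# Inhibitory component: learned response to an obstacle sensed on the right.
inhibitory = gaussian_profile(center=60, amplitude=0.8)

# Summation of the two components (cf. fig. 2).
net = [e - i for e, i in zip(excitatory, inhibitory)]
winner = max(range(N), key=lambda i: net[i])

# The net peak is shifted away from the inhibitory peak: the winning node
# lies left of center, so the robot steers left, away from the obstacle.
print(winner)
```

Because the inhibitory Gaussian is narrower in effect than the region it suppresses, the surviving maximum lands on the far side of the excitatory peak, which is exactly the steering behavior described above.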
Experimental results

Our earlier simulations [1] were based on the Khepera Simulator, which closely matches the behavior of the real robot. We have now reproduced our results on the real Khepera (fig. 3). The robot learns to avoid obstacles rapidly. During an initial training phase, the model develops a set of weights by letting the robot perform movements at different angular velocities in an environment cluttered with obstacles of randomly chosen sizes and positions. After learning, the robot is able to reach target positions while avoiding obstacles in unknown or constantly changing environments (e.g., with moving obstacles).

We have also used the mechanism explained in the previous section to develop an approach behavior. In this case, the Khepera's infrared sensors are used to detect and approach a source of environmental light. Instead of being punished, the robot is rewarded each time it moves closer to the light.
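The punishment- and reward-driven learning just described can be caricatured as a gated associative update: CS-to-drive weights grow only while the drive node (fear after a collision, or reward near the light) is active. The update rule, learning rate, and sensor values below are hypothetical illustrations, not the network equations actually used:

```python
# Minimal associative-learning sketch (hypothetical rule, not the paper's
# equations): sensor-to-drive weights strengthen only when the drive
# node is active, bounded so each weight stays in [0, 1].

def update_weights(weights, sensors, drive, lr=0.1):
    """Gated Hebbian-style update of CS -> drive associations."""
    return [w + lr * drive * s * (1.0 - w) for w, s in zip(weights, sensors)]

# Eight range-sensor activations just before a collision (front-right high).
sensors = [0.0, 0.1, 0.2, 0.9, 0.8, 0.1, 0.0, 0.0]
weights = [0.0] * 8

# Repeated collisions in the same configuration build up the association.
for _ in range(20):
    weights = update_weights(weights, sensors, drive=1.0)

# Sensors that were active at collision time now predict an imminent
# collision whenever a similar sensor pattern recurs.
collision_prediction = sum(w * s for w, s in zip(weights, sensors))
print(round(collision_prediction, 2))
```

The same scheme works for approach behavior by letting a reward signal, rather than a collision, activate the drive node, so light-correlated sensor patterns come to excite rather than inhibit the corresponding movement directions.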
Conclusions

In this article we have described a model that learns to generate avoidance and approach behaviors for a wheeled mobile robot using a conditioning paradigm. The robot progressively learns to avoid obstacles without external supervision, simply through the "punishment" signals produced by its collisions. One of the main properties of the model is that neither the robot's geometry nor the configuration of the proximity sensors on its surface needs to be known, because the robot learns from past experience to avoid directions of movement that make it collide with obstacles. Similarly, the robot learns to approach a source of light using "reward" signals.

We are extending this model of conditioning to develop more complex behaviors. In particular, we are investigating conditioning circuits that allow the robot to choose among different behaviors depending on the moment-by-moment combination of sensory information and its internal needs. For example, a more complex system of sensory and drive nodes could be used to modulate how strongly the robot tries to approach a target position, depending on its need to recharge its batteries.

Acknowledgments

Initial simulations of the model were done with the Khepera Simulator Package version 2.0, written at the University of Nice Sophia-Antipolis by Olivier Michel (downloadable from the World Wide Web at http://wwwi3s.unice.fr/~om/khep-sim.html).

References

[1] Paolo Gaudiano, Eduardo Zalama, Carolina Chang, and Juan López Coronado. A model of operant conditioning for adaptive obstacle avoidance. In P. Maes, M. J. Mataric, J. Meyer, J. Pollack, and S. W. Wilson, editors, From Animals to Animats 4, pages 373–381, Cambridge, MA, 1996. MIT Press.

[2] Paolo Gaudiano, Eduardo Zalama, and Juan López Coronado. An unsupervised neural network for real-time, low-level control of a mobile robot: noise resistance, stability, and hardware implementation. IEEE Transactions on Systems, Man, and Cybernetics, 26:485–496, 1996.

[3] S. Grossberg. On the dynamics of operant conditioning. Journal of Theoretical Biology, 33:225–255, 1971.

[4] S. Grossberg and D. S. Levine. Neural dynamics of attentionally modulated Pavlovian conditioning: blocking, interstimulus interval, and secondary reinforcement. Applied Optics, 26:5015–5030, 1987.

[5] Robert A. Rescorla and Allan R. Wagner. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy, editors, Classical Conditioning II, chapter 3, pages 64–99. Appleton, New York, 1972.

[6] Richard S. Sutton and Andrew G. Barto. Toward a modern theory of adaptive networks: expectation and prediction. Psychological Review, 88:135–170, 1981.