Improving Hallway Navigation in Mobile Robots with Sensor Habituation
Carolina Chang
[email protected]
Grupo de Inteligencia Artificial, Departamento de Computación y Tecnología de la Información
Universidad Simón Bolívar, Apartado Postal 89000, Caracas 1080, Venezuela
Abstract
Habituation is a form of non-associative learning observed in a variety of species of animals. Arguably, it is the simplest form of learning. Nonetheless, the ability to habituate to certain stimuli implies plastic neural systems and adaptive behaviors. This article describes how computational models of habituation can be applied to real robots. In particular, we discuss the problem of the oscillatory movements observed when a Khepera robot navigates through narrow hallways. Results show that habituation to the proximity of the walls can lead to smoother navigation. Habituation to sensory stimulation to the sides of the robot does not interfere with the robot’s ability to turn at dead ends and to avoid obstacles outside the hallway. This work shows that simple biological mechanisms of learning can be adapted to achieve better performance in real mobile robots.
Keywords: Mobile Robots, Habituation, Robot Learning, Unsupervised Neural Networks.
1 Introduction
Over the past years we have applied biological learning theories to achieve adaptive behaviors in mobile robots (Chang & Gaudiano, 1998). The behaviors are learned by an unsupervised neural network, which imitates principles of classical conditioning and operant conditioning. We have used our neural network model to train a Khepera miniature mobile robot (K-Team S.A., Preverenges, Switzerland) to avoid obstacles and approach lights. As a result of its own experiences, the robot learns simultaneously to generate more often those movements that lead to increased light levels, while suppressing those movements that lead to collisions. Learning of these behaviors requires no knowledge about the robot’s geometry or the location of the sensors on the robot’s body. Moreover, no prewired reflex behaviors are used. Instead, the robot learns to sort out which sensors are associated with obstacle or light sensing (Chang & Gaudiano, 1998; Gaudiano & Chang, 1997).
Our model for reactive navigation has proven to be robust and fast. However, oscillations in the robot’s movements are observed when the robot navigates through narrow corridors. This is primarily due to the short range of the Khepera’s infrared sensors, which can detect obstacles only up to a distance of about 2 cm. The neural network learns to generate abrupt turns (avoidance behaviors) when it detects an obstacle. The continuous detection of the walls in a narrow hallway elicits the obstacle avoidance behaviors very often, yielding oscillatory movements.
Oscillations in hallway navigation are undesired behaviors produced by our neural network model. Notice, however, that other approaches to reactive obstacle avoidance have similar problems when negotiating narrow hallways. For example, the built-in Braitenberg vehicle of the Khepera robot produces erratic movements very often (Braitenberg, 1984; K-Team, 1995). Moreover, its navigation of hallways is not collision-free, and the robot can get stuck at dead ends.
Other solutions such as the Distributed Adaptive Control (DAC) (Pfeifer & Verschure, 1992; Verschure, Kröse, & Pfeifer, 1992) cannot even let the robot pass through corridors because the obstacle avoidance behaviors cause it to back off in the presence of strong sensory stimulation.
To address the problem of oscillations in hallway navigation we have included in our neural network a mechanism for habituation to prolonged sensory input. In addition to learning through classical conditioning and operant conditioning, the neural network described here includes a mechanism of non-associative learning that is similar to habituation.
Section 2 reviews mechanisms of habituation observed in animals. We describe some physiological changes that take place in the synapses during habituation and discuss the computational model of neurotransmitter accumulation and depletion proposed by Grossberg (1972). Experiments using habituation in a real robot are described in section 3. Results show that habituation to the proximity of walls reduces oscillatory movements in narrow hallways. Finally, section 4 presents the conclusions of this work.
2 Habituation
Habituation is a decrease in the strength of a response after repeated presentation of the stimulus that produces the response (Mazur, 1994). For example, a person may fail to show a startle reaction after repeated presentation of a loud noise. Habituation is a form of non-associative learning since only one type of stimulus is involved. From the protozoan Stentor coeruleus to Homo sapiens, many species of the animal kingdom share this type of learning. For this reason, it has been hypothesized that the same physiological mechanisms of habituation may be shared by different species.
Some researchers have investigated the mechanisms of habituation in simple animals. For example, the study of the gill-withdrawal reflex of Aplysia has led to the discovery of the physiological changes that are responsible for its habituation (Castellucci, Pinsker, Kupfermann, & Kandel, 1970; Kandel & Schwartz, 1982). When the animal’s siphon is touched, its gill contracts for a few seconds. If the siphon is stimulated repeatedly, the gill-withdrawal reflex habituates (smaller contractions and faster return to a normal position). The study of the neural circuit of Aplysia revealed that the presynaptic (sensory) neurons released less transmitter into the synapses with repeated stimulus presentation. On the other hand, there was no change in the sensitivity of the postsynaptic neuron to the transmitter. Therefore, habituation in Aplysia results from changes in the effectiveness of existing synapses rather than from anatomical changes such as the growth of new connections between neurons.
It has been more difficult to identify the physiological changes involved in habituation in mammals due to the complexity of their nervous system. However, as in Aplysia, there is evidence of changes in existing synapses of the nervous system. Moreover, these changes take place in the sensory areas of the neural circuit (Davis, Gendelman, Tischler, & Gendelman, 1982; Condon & Weinberger, 1991).
Transmitter accumulation-depletion mechanisms have been proposed by Grossberg as part of his psychophysiological theory of learning (Grossberg, 1972, 1982). “Transmitter gates” imitate the role of chemical transmitter depletion in the neurons’ synapses. The amount of transmitter g determines the efficacy of the synaptic transmission. Each gate multiplies the input signal x to form a gated output signal T:
$$ T = x g \qquad (1) $$
The output signal is a joint function of the input and the availability of transmitter. Hence, the output is large when a strong input is presented and the level of chemical transmitter is high. However, the transmitter level g changes over time depending on the strength of the sensory input x. Through a process of habituation, the transmitter level decreases as the input increases, according to:
$$ \frac{dg}{dt} = \alpha(\beta - g) - T x g \qquad (2) $$
where α, β, and T are positive constants that determine the habituation rate of the transmitter (T here is a constant depletion rate, distinct from the gated output signal of Equation 1). The term α(β − g) indicates that the transmitter g recovers toward its maximum value β at the rate α. The term T x g indicates that the transmitter is depleted at a rate proportional to both the input x and the amount of transmitter available. Equation 2 ensures that an increase in the input signal depletes more transmitter, while the transmitter returns to its maximum level in the absence of an input signal. This simple non-associative learning mechanism is used in section 3 to improve the navigation of a real mobile robot.
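Equations 1 and 2 can be explored numerically. The following is a minimal sketch, not the controller used in the experiments: it integrates Equation 2 with a forward-Euler step, and the step size `dt`, the input schedule, and the names `simulate_gate` and `depletion` (standing for the constant written T in Equation 2) are assumptions introduced here. The default parameter values are those reported for the experiments in section 3.

```python
def simulate_gate(inputs, alpha=0.005, beta=1.0, depletion=1.0, dt=0.1, g0=None):
    """Forward-Euler integration of dg/dt = alpha*(beta - g) - depletion*x*g.

    `inputs` is a sequence of sensor activations x, one per time step.
    Returns a list of (g, gated_output) pairs, where gated_output = x * g.
    `dt` is an assumed step size, not a value from the paper.
    """
    g = beta if g0 is None else g0  # start with a fully replenished transmitter
    history = []
    for x in inputs:
        g += dt * (alpha * (beta - g) - depletion * x * g)
        history.append((g, x * g))
    return history


# Hypothetical schedule: a wall is sensed (x = 1) for 1000 steps,
# then the sensor is silent (x = 0) for another 1000 steps.
steps = 1000
trace = simulate_gate([1.0] * steps + [0.0] * steps)

g_depleted = trace[steps - 1][0]   # transmitter level at the end of the input
g_recovering = trace[-1][0]        # transmitter level after the silent period

print(f"g after sustained input:      {g_depleted:.4f}")
print(f"g after the input is removed: {g_recovering:.4f}")
```

With these settings the transmitter level falls to roughly αβ/(α + T x) ≈ 0.005 while the input is held at 1, and it climbs back only slowly, at rate α, once the input is removed.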
3 Robot Hallway Navigation
We have proposed a neural network model that learns simultaneously to produce obstacle avoidance behaviors and light approach behaviors in a mobile robot (Chang & Gaudiano, 1998). Learning is achieved through classical conditioning and operant conditioning. Figure 1 shows a simplified network that can learn only to avoid obstacles. The sensory nodes x1i at the upper left of figure 1 receive activation from the robot’s range sensors. There is no knowledge built into the network about the kind of sensor information (e.g., infrared or sonar), or the position of the sensor on the robot’s body. At the bottom of figure 1 the drive node D is represented: learning can only occur when the drive node is active. In our model this happens when the robot collides with an obstacle, which could be detected through a bump sensor, or when any one of the range sensors indicates that an obstacle is closer than the sensor’s minimum range. The neurons xmi represent the response (conditioned or unconditioned), and are thus connected to the motor system. In our model the motor population consists of neurons encoding desired angular velocities. For instance, the leftmost node corresponds to turning left at the maximum rate, the central node corresponds to moving straight ahead, and so on. Finally, neurons in population P require joint activation of the sensory neurons and the drive node in order to become active and learn to inhibit the angular velocity pattern that existed just before the collision. As a result of its own experiences, the robot learns to predict and avoid impending collisions. Additional details on this neural network can be found elsewhere (Chang & Gaudiano, 1998).
[Figure 1: network diagram. Labels in the figure: Sensor Readings; S; CS1, CS2, CS3; x1i; x2i; P; CR; UCR; xmj; Collision; D; Survival; Angular Velocity Map.]
Figure 1: Conditioning model for obstacle avoidance. The range sensor activities represent the conditioned stimuli. A collision detector activates the unconditioned stimulus. Motor learning occurs at a population coding the robot’s target angular velocity. After conditioning, the pattern of activity across the range sensors can predict a collision and modify the robot’s angular velocity to avoid the obstacle.
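The idea of a population-coded angular velocity map that is inhibited after collisions can be illustrated with a toy example. The code below is not the network of Chang & Gaudiano (1998): the map size, the bell-shaped excitation that favours driving straight, the learning rate, and the function names `select_turn` and `learn_on_collision` are all assumptions made for this sketch. It only shows how inhibition learned at collision time can steer the selected angular velocity away from a previously punished choice, without using any knowledge of where the sensors sit on the robot.

```python
import numpy as np

N_SENSORS = 8   # assumed number of range sensors
N_MOTOR = 7     # hypothetical size of the angular velocity map
MAX_TURN = 1.0  # hypothetical maximum turn rate (arbitrary units)

# Preferred angular velocity of each motor node:
# leftmost node = maximum left turn, central node = straight ahead.
PREFERRED = np.linspace(-MAX_TURN, MAX_TURN, N_MOTOR)

# Assumed baseline excitation that favours moving straight ahead.
EXCITATION = np.exp(-PREFERRED ** 2)


def select_turn(sensors, weights):
    """Return the angular velocity of the most active motor node
    after subtracting the learned, sensor-driven inhibition."""
    activity = EXCITATION - weights @ sensors  # weights: (N_MOTOR, N_SENSORS)
    return PREFERRED[int(np.argmax(activity))]


def learn_on_collision(weights, sensors_before, node_before, rate=0.5):
    """Drive node active (collision): strengthen inhibition from the sensors
    that were active onto the motor node chosen just before the collision."""
    updated = weights.copy()
    updated[node_before] += rate * sensors_before
    return updated


weights = np.zeros((N_MOTOR, N_SENSORS))
sensors = np.zeros(N_SENSORS)
sensors[0] = 1.0  # one sensor sees an obstacle

turn_before = select_turn(sensors, weights)           # untrained: goes straight
weights = learn_on_collision(weights, sensors, N_MOTOR // 2)
turn_after = select_turn(sensors, weights)            # trained: turns away from straight

print(turn_before, turn_after)
```

In this toy version the left and right neighbours of the punished node are equally active, so the turn direction is whichever `np.argmax` encounters first; a real controller would break that symmetry through further experience.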
Our reactive method for obstacle avoidance does not make use of any information about the location of the sensors on the robot’s body, as discussed above. As a consequence, the robot never navigates parallel to walls because any activation of the side sensors predicts a collision, causing the robot to turn away from the walls. While this is not a problem when the robot navigates open environments, oscillations are observed when navigating through narrow corridors, as shown in figure 2(a). The figure is a digital image captured from a camera mounted above the Khepera’s environment, which is a closed and narrow corridor made out of LEGO bricks. A tracking algorithm localizes the robot’s position and direction (square) and traces the trajectory described by the robot (black dots).
Figure 2: (a) The obstacle avoidance behavior causes the robot to oscillate in narrow hallways. (b) Hallway navigation using transmitter gates. Sensory adaptation eliminates the robot’s oscillations.
A mechanism for sensory habituation was added to the neural network to reduce the robot’s oscillations. The activity of the sensory nodes x1i is propagated to the rest of the network through transmitter gates, as described in section 2. The use of transmitter gates allows the robot to navigate through narrow hallways without oscillations. As the robot’s sensors are activated by the nearby walls, habituation decreases the gated output of the sensory neurons, thereby preventing oscillations, as shown in figure 2(b). The values α = 0.005, β = 1.0, and T = 1.0 were used in the experiment of figure 2(b).
Different levels of habituation can be obtained depending on the parameter values of the transmitter gates. In particular, we can study the effect of α, the replenishment rate of the transmitter. When the transmitter recovery is fast, the robot does not habituate to the proximity of the walls. On the other hand, if the replenishment is slow, the robot habituates, reducing the number of turns it makes to avoid the walls. Table 1 shows the average number of obstacle avoidance behaviors observed in the robot for β = 1.0 and T = 1.0 and different values of α. The robot navigates a 55 × 9.5 cm hallway 10 times. The number of turns produced in each trial is counted, with the exception of the required turns at the two ends of the hallway. For example, 15 turns are observed in figure 2(a).
                  no habituation   α = 0.020   α = 0.010   α = 0.005   α = 0.002
Number of turns        15.0           11.9         7.7         3.5         2.0
Table 1: Average number of turns for different transmitter replenishment rates. As α decreases the robot’s oscillations are reduced. The decrease in the number of turns is related to the reduction of the turning angle produced by the robot. For example, 2 turns are observed in figure 2(b). However, the turning angles are so small that navigation is smooth and free of oscillations.
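As a hedged illustration of why a smaller replenishment rate produces stronger habituation, one can set the derivative in Equation 2 to zero for a constant input x. This worked step is not part of the original text; it follows from Equation 2 under the simplifying assumption that the sensor input stays constant, which is only an approximation for a moving robot. Here g* denotes the resulting steady-state transmitter level, and T is the depletion constant of Equation 2.

```latex
\begin{align*}
0 &= \alpha(\beta - g^{*}) - T x g^{*}
\quad\Longrightarrow\quad
g^{*} = \frac{\alpha\beta}{\alpha + T x}, \\[4pt]
x g^{*} &= \frac{\alpha\beta x}{\alpha + T x}.
\end{align*}
```

Taking β = 1.0, T = 1.0, and an assumed full-strength input x = 1, the steady-state gated output is 0.020/1.020 ≈ 0.0196 for α = 0.020 and 0.002/1.002 ≈ 0.0020 for α = 0.002. Under these assumptions the sustained signal that reaches the rest of the network shrinks by roughly a factor of ten across the range of α in Table 1.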
Oscillations are also affected by the width of the hallway. When the walls are too far apart, they are not detected constantly by the robot. Therefore, fewer turns are observed as the hallway gets wider. However, habituation does not eliminate such turns because there is more time for the transmitter to recover in wide areas. As the hallway gets narrower, habituation yields a larger improvement in navigation. Table 2 shows the average number of turns in 10 trials with and without habituation for different hallway widths. The parameter values of the transmitter gates were α = 0.005, β = 1.0, and T = 1.0 for all the trials.
Hallway width (cm)            7.5     9.5    13.5    17.5
Turns without habituation    34.0    15.0     8.2     5.0
Turns with habituation        1.1     3.5     4.0     2.4
Percentage reduction (%)    96.76   76.67   52.50   52.00
Table 2: Reduction of the number of obstacle avoidance behaviors produced by the robot in hallways of different widths. More oscillations are produced in narrow corridors. In such environments habituation improves the navigation by reducing the number of turns of the robot. Oscillations are not observed in wider environments but habituation still reduces the number of turns.
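The percentage reduction row can be recomputed from the two rows of averages. The snippet below is a small sketch that assumes the reduction is defined as 100 × (turns without habituation − turns with habituation) / turns without habituation; the definition is not stated in the text, and the names `turns` and `percent_reduction` are introduced here for illustration.

```python
# Average number of turns per hallway width (cm), copied from Table 2:
# width -> (without habituation, with habituation)
turns = {
    7.5: (34.0, 1.1),
    9.5: (15.0, 3.5),
    13.5: (8.2, 4.0),
    17.5: (5.0, 2.4),
}


def percent_reduction(without, with_habituation):
    """Relative drop in the number of turns, in percent (assumed definition)."""
    return 100.0 * (without - with_habituation) / without


reductions = {
    width: round(percent_reduction(without, with_hab), 2)
    for width, (without, with_hab) in turns.items()
}

for width, value in reductions.items():
    print(f"{width:5.1f} cm: {value:.2f}%")
```

Under this definition the 7.5, 9.5, and 17.5 cm columns reproduce the tabulated 96.76, 76.67, and 52.00. The 13.5 cm column evaluates to about 51.22 from the rounded averages, against 52.50 in the table.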
It is in very narrow corridors that oscillations become a real problem for the robot, and it is in such cases that habituation is most valuable. As shown in table 2, there is a reduction of 96.76% in the number of turns in a 7.5 cm wide corridor. Notice that this is a very narrow corridor since the robot’s diameter is 5.5 cm. Not only can the robot pass through, but the resulting navigation is also smooth along the entire corridor.
4 Conclusions
Reactive hallway navigation based only on infrared or ultrasonic sensors is a problem for many controllers. For example, the DAC architecture in its original version does not even allow a robot to enter a hallway. The excessive activation of the range sensors causes the robot to move backwards. This problem has been addressed by Salomon (1998), who incorporated proprioceptive sensors to differentiate strong sensory input from actual collisions. Training with the improved architecture allowed the robot to pass through narrow places. It is not clear, however, whether oscillations in hallways are observed with this new architecture.
Our approach consists of letting the robot habituate to the strong and continuous sensory input obtained in narrow places. To this end, we adopted the transmitter gates model proposed by Grossberg (1972, 1982). The mechanism of habituation reduces the transmitter released into the synapses when a stimulus is presented repeatedly. We have included transmitter gates only in the sensory neurons of our neurocontroller, as suggested by the available studies of habituation in biological organisms. Our experimental results confirm that short-term habituation can be achieved with no need for anatomical changes, such as the growth of new connections between neurons.
Since the mechanism of habituation only introduces changes in the effectiveness of existing synapses, we were able to incorporate habituation into our neural network a posteriori. No additional training was required for the network to perform correctly. This suggests that habituation can be easily integrated with other architectures, not only to reduce oscillations in hallways but also to produce other behavioral changes due to repeated stimulation.
Habituation is a simple but effective form of learning. The phenomena of habituation, rehabituation and recovery have been implemented in hardware (Goru & Akers, 1996). However, the usefulness of habituation for adaptive robots has not yet been fully explored. The results of this work suggest that habituation can be combined easily with other forms of learning in order to improve the robot’s behaviors.
5 Acknowledgments
Initial parts of this work were supported by the Office of Naval Research and the Navy Research Laboratory, through grant ONR-00014-96-1-0772. The realization of this work was possible thanks to the generosity and support of Dr. Paolo Gaudiano, Director of the Boston University Neurobotics Lab.
References
Braitenberg, V. (1984). Vehicles: Experiments in Synthetic Psychology. MIT Press.
Castellucci, V., Pinsker, H., Kupfermann, I., & Kandel, E. R. (1970). Neuronal mechanisms of habituation and dishabituation of the gill-withdrawal reflex in Aplysia. Science, 167, 1745–1748.
Chang, C., & Gaudiano, P. (1998). Application of biological learning theories to mobile robot avoidance and approach behaviors. Journal of Complex Systems, 1(1), 79–114.
Condon, C. D., & Weinberger, N. M. (1991). Habituation produces frequency-specific plasticity of receptive fields in the auditory cortex. Behavioral Neuroscience, 105, 416–430.
Davis, M., Gendelman, D. S., Tischler, M. D., & Gendelman, P. (1982). A primary acoustic startle circuit: Lesions and stimulation studies. Journal of Neuroscience, 2, 791–805.
Gaudiano, P., & Chang, C. (1997). Adaptive obstacle avoidance with a neural network for operant conditioning: experiments with real robots. In Proceedings of the 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), pp. 13–18. Monterey, California.
Goru, V. K., & Akers, L. A. (1996). VLSI Implementation of Rehabituation. In Proceedings of the 1996 IEEE International Conference on Neural Networks. Washington D.C.
Grossberg, S. (1972). A neural theory of punishment and avoidance, II: Quantitative theory. Mathematical Biosciences, 15, 253–285.
Grossberg, S. (1982). A psychophysiological theory of reinforcement, drive, motivation and attention. Journal of Theoretical Neurobiology, 1, 286–369.
K-Team, S. A. (1995). Khepera: User Manual. Switzerland.
Kandel, E. R., & Schwartz, J. H. (1982). Molecular biology of learning: Modulation of transmitter release. Science, 218, 433–443.
Mazur, J. E. (1994). Learning and Behavior (Third edition). Prentice-Hall, Inc.
Pfeifer, R., & Verschure, P. (1992). Distributed adaptive control: a paradigm for designing autonomous agents. In Varela, F. J., & Bourgine, P. (Eds.), Toward a practice of autonomous systems, pp. 21–30. MIT Press, Cambridge, Massachusetts.
Salomon, R. (1998). Improving DAC. In From Animals to Animats 5. Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, pp. 331–339. MIT Press.
Verschure, P. F. M. J., Kröse, B. J. A., & Pfeifer, R. (1992). Distributed adaptive control: The self-organization of structured behavior. Robotics and Autonomous Systems, 9, 181–196.