In From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior. Meyer, J-A., Berthoz A., Floreano, D., Roitblat, H., & Wilson, S.W. (Eds.) MIT Press: Cambridge, MA. Pages 157-166.

An embodied model of action selection mechanisms in the vertebrate brain Fernando Montes Gonzalez, Tony J. Prescott, Kevin Gurney, Mark Humphries, and Peter Redgrave Department of Psychology, University of Sheffield, Western Bank, Sheffield S10 2TP, UK. {f.montes, t.j.prescott, k.gurney, pcp98mdh, p.redgrave} @sheffield.ac.uk

Abstract

In previous research we have demonstrated a computational model of the intrinsic circuitry of the vertebrate basal ganglia based on the proposal that these central brain structures play an important role in action selection. The current work embeds this model within the control architecture of a Khepera mobile robot allowing action selection between multiple behaviors styled on some of the home cage activities of a laboratory rat. Our results demonstrate appropriate and clean switching by the embodied basal ganglia model between wall-following, search, 'food'-pickup, corner-finding, and 'food'-deposit behaviors. The robot can be seen to select appropriate actions for different circumstances and to generate integrated sequences of behavior. The normal function of the basal ganglia is sensitive to fluctuations in the level of the neurotransmitter dopamine. The effects on the robot model of varying the simulated dopamine level show interesting similarities to those observed in animals. This research demonstrates that the proposed functional model of the basal ganglia is sufficient for effective action selection within a fully specified behavioral control architecture.

1. Introduction

Action selection is the task of resolving conflicts between competing behavioral alternatives. This problem has received considerable attention in the growing adaptive behavior literature (see reviews in Maes, 1995; Prescott, Redgrave, and Gurney, 1999), much of which has built on earlier research in ethology (see, e.g., McFarland, 1989). Recent research has moved towards evaluating candidate action selection mechanisms in simulated agents embedded in virtual environments or in the control architectures of mobile robots acting in the real world (see Prescott et al., 1999, for review). Only limited attention has been paid, however, either in the ethology or adaptive behavior literature, to the neural circuitry that supports action selection in animals. Research in the neurosciences suggests where action selection mechanisms might be found in animal nervous systems and how they might operate (Prescott et al., to appear). A recent proposal that we have made (Prescott et al., 1999; Redgrave, Prescott, and Gurney, 1999a) is that a group of functionally related structures in the vertebrate brain, called the basal ganglia, may be suitably connected and configured to serve as a specialized action selection mechanism.

In section 2 we briefly review some of the evidence in support of this proposal, while in section 3 we describe some of the findings from our computational model of the intrinsic circuitry of the vertebrate basal ganglia (Gurney, Prescott, and Redgrave, 1998; Gurney, Prescott, and Redgrave, submitted). Section 4 describes how we have embedded this basal ganglia model within the control architecture of a Khepera mobile robot. Finally, section 5 describes some of the experiments we have carried out to date using the robot model to determine the effectiveness of the basal ganglia as an embodied action selection mechanism.

This research has several objectives. First, we wish to evaluate our hypothesis that the vertebrate basal ganglia provides a neural substrate for action selection. In this respect, the robot model constitutes a strong test of the sufficiency of the proposed basal ganglia model as an effective action selection device. Second, we know that in a biological setting animal behavior switching is strongly influenced by changes in the level of the neuromodulator dopamine (Redgrave, Prescott, and Gurney, 1999a,b). Such changes are very important from a clinical point of view as abnormal dopamine levels underlie human brain disorders such as Parkinson's disease, Tourette's syndrome, and obsessive-compulsive behavior.
The effect of dopamine modulation on basal ganglia function has been simulated in our model of intrinsic basal ganglia circuitry, and using the robot model we are able to observe the consequences of this modulation for actual behavioral sequences. To the extent that we are able to simulate the behavioral effects of dopamine modulation in animals this will add validity to the model as a simulation of basal ganglia function, and may also aid our understanding of the role of dopamine in animal and human behavior switching. Finally, given that vertebrates are versatile and successful autonomous agents, we hope that a better understanding of the substrate of action selection in vertebrate brains may assist in the design of future control architectures for artificial multi-tasking agents such as mobile robots.

2. The vertebrate basal ganglia viewed as an action selection device

There are likely to be multiple substrates for action selection in a complex control architecture such as the vertebrate brain, and we have discussed a number of possible neural selection mechanisms elsewhere (Prescott et al., to appear). Action selection may also be an emergent function (see, e.g., Maes, 1995). That is, it may arise through the interaction of control system elements with different functional roles without the need for specific elements dedicated to the resolution of selection conflicts. However, we have also argued (Prescott et al., 1999) that there are potential benefits to be had from the inclusion in a control architecture of specialized action selection components. First, there is the advantage of modularity. To the extent that the problem of selection can be distinguished from the perceptual and motor control problems involved in coordinating behavior, it can be advantageous to decouple the selection circuitry from other parts of the control system. As separate components, each can be improved or modified independently. In contrast, in a circuit that displays emergent selection, a change directed at some other aspect of function could impact on the switching behavior of the network with possibly undesirable consequences. The advantages incurred by modularity in dissociating functionally distinct components of the system are probably as significant for evolved systems as they are for engineered ones (Wagner and Altenberg, 1996). The decoupling of action selection from other aspects of control is also a feature of a growing number of robot control architectures (Bryson, in press).

A second advantage of specialized selection mechanisms is economy of inter-connectivity. This may be a particularly important criterion for biological control systems as there are many important constraints on brain size including the high metabolic cost of neural tissue (Leise, 1990). As we have previously pointed out (Prescott et al., 1999), interconnectivity can increase at a rate O(n^2) in a distributed control architecture that lacks centralized switching components (where n is the number of competing action systems). This is the case, for instance, in systems that use recurrent reciprocal inhibition between the different competitors. The use of a centralized action selection device, on the other hand, reduces the required connectivity to O(n), which means that interconnectivity within the control system may scale at a more manageable rate as the number of competing action systems is increased.

The specific proposal that we have made is that the basal ganglia may act as a centralized selection mechanism in the vertebrate brain. Below we briefly describe the basal ganglia and summarize some of the neurobiological evidence in support of this conjecture.

The principal components of the basal ganglia include the striatum and pallidum in the base of the vertebrate forebrain, and the substantia nigra and ventral tegmental area in the midbrain. Figure 1 shows these main basal ganglia nuclei and some of their intrinsic and extrinsic connections within the mammalian brain. The basal ganglia are archaic in evolutionary terms and homologous structures are found in the nervous systems of all classes of jawed vertebrates and possibly in all vertebrates (Medina and Reiner, 1995). Neurobehavioral studies also suggest that the core function of the basal ganglia may be similar across the different vertebrate classes (see Redgrave et al., 1999a).

The proposal that the basal ganglia are involved in action selection is based on a growing consensus amongst neuroscientists that a key function of these structures is to enable desired actions and to inhibit undesired, potentially competing, actions (see, e.g., Chevalier and Deniau, 1990; Mink, 1996). The following provides a brief summary of the proposed functional architecture; a full account of this view has been provided elsewhere (Redgrave et al., 1999a; Prescott et al., 1999). Anatomical evidence shows that cortical and midbrain sensorimotor systems, plus several of the forebrain limbic structures, communicate directly with motor and pre-motor mechanisms in the brainstem and spinal cord.
However, these systems also project, usually via a collateral (split) pathway, to the striatum, the main input center of the basal ganglia. This branch could allow them to enter into a competition for control of motor outputs hosted within the basal ganglia. Afferents from a wide range of sensory and motivational systems also arrive at striatal input neurons. This connectivity could allow both extrinsic and intrinsic motivating factors to influence the strength of rival 'bids'. The level of activity in different populations of striatal neurons may therefore form a 'common currency' in which competing requests for access to actuating systems can be effectively compared. The main output centers of the basal ganglia (parts of the substantia nigra, ventral tegmental area, and pallidum) are tonically active and direct a continuous flow of inhibition at neural centers throughout the brain that either directly or indirectly generate movement. This tonic inhibition places a powerful brake on these motor mechanisms such that the basal ganglia seem to hold a 'veto' over all voluntary movement. Signals emanating from active striatal neurons inhibit the (inhibitory) basal ganglia output centers and can thereby disinhibit their target movement systems. This disinhibitory mechanism thus forms the primary basis for action selection by the basal ganglia. A key pathway here is the direct striatonigral projection, which is shown by the bold arrow in figure 1. This pathway is found in all jawed vertebrates and possibly in all vertebrates.
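The connectivity argument made earlier in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that reciprocal inhibition needs one directed connection from every competitor to every other competitor, and that a centralized switch needs one connection to and one connection from each competitor; both function names are hypothetical.

```python
def reciprocal_inhibition_links(n: int) -> int:
    """Directed links when each of n competitors inhibits every other one."""
    return n * (n - 1)


def centralized_switch_links(n: int) -> int:
    """Directed links when each competitor has one link to and one link from
    a central switch (an assumed wiring scheme, used only for counting)."""
    return 2 * n


if __name__ == "__main__":
    # Print the two counts side by side for a few system sizes.
    for n in (5, 10, 50, 100):
        print(n, reciprocal_inhibition_links(n), centralized_switch_links(n))
```

Under these assumptions the distributed scheme grows quadratically while the centralized scheme grows linearly, which is the O(n^2) versus O(n) contrast drawn in the text.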

Figure 1. Diagrammatic representation of the principal structures of the mammalian basal ganglia showing some of their intrinsic and extrinsic connections. Abbreviations: SN/VTA, substantia nigra/ventral tegmental area; STN, subthalamic nucleus. Adapted from Prescott et al. (1999).

Projection lines through the various sub-components of the basal ganglia appear to be largely organized into segregated parallel ‘channels’. This segregation is maintained in the disinhibitory output projections. Behavioral studies in mammals indicate that although the architecture of these channels is similar throughout most of the basal ganglia, different areas are functionally heterogeneous. For instance, restricted lesions at different locations in the striatum affect different actions such as forelimb manipulation, biting and gait. This would suggest that the circuitry in these local areas in the striatum may primarily be used to resolve conflicts between competitors bidding for incompatible uses of specific groups of muscles. More generally, each local group of parallel circuits may be competing for a single output mechanism thereby forming a single, multi-way 'switch'. It is interesting to note that recurrent reciprocal inhibition within these local striatal regions probably provides one of the mechanisms that helps to resolve selection competitions (Wickens, 1997). The axon collaterals of individual striatal neurons typically occupy an area of about 0.5 mm in diameter, so reciprocal inhibition can be expected to occur in local regions of approximately this size. By implementing this form of selection within local areas of the striatum, the vertebrate brain may have evolved to exploit the potential of distributed switching whilst minimizing the cost of inter-competitor connectivity. Figure 2 illustrates our general hypothesis of the basal ganglia as an action selection mechanism showing the resolution, by disinhibition, of the conflict between multiple competing sensorimotor systems with different initial salience levels. If this hypothesis is correct, then the basal ganglia as a whole may provide an array of similar switching devices.


Figure 2. The basal ganglia viewed as an action selection mechanism. Multiple sensorimotor systems project to both the basal ganglia and to motor mechanisms elsewhere in the brain. The competition between rival sensorimotor systems is resolved in the basal ganglia. The winning competitor inhibits tonically active neurons in basal ganglia output structures which then selectively disinhibit required motor mechanisms. Other basal ganglia intrinsic pathways excite output neurons, contributing to the increased inhibition of losing competitors. Thicker lines indicate stronger excitatory or inhibitory signals. Here, the central channel has the strongest initial activation (salience) and is selectively disinhibited. See text for further explanation. From Prescott et al. (1999).

As we have already mentioned, a number of human brain disorders may be related to abnormal levels of the neuromodulator dopamine in the basal ganglia. An important role of dopamine appears to be to moderate the balance between the different control pathways through the basal ganglia. A deficit of dopamine, such as is seen in Parkinson’s disease, leads to too much inhibition on motor systems and, hence, to slowing of movement (bradykinesia) or difficulty in initiating movement. Excess dopamine, on the other hand, leads to the excessive movement seen in Huntington’s disease and Tourette’s syndrome, or causes certain activities to become over-dominant as seen in obsessive-compulsive disorders.

3. A computational model of basal ganglia intrinsic circuitry

A number of computational models of basal ganglia function, at both the cellular and circuit level, have been investigated (see Houk, Davis, and Beiser, 1995). However, there are as yet few models that capture the distinctive neurodynamics of basal ganglia circuits while mimicking their behavioral functions. Our research is directed at developing models of exactly this sort, and, as a first step, we have developed a system-level simulation in which different functional components of the intrinsic basal ganglia circuit are modelled as leaky integrator units (Arbib, 1995). A full quantitative description of the model and simulation results are available elsewhere (Gurney et al., 1998; Gurney et al., submitted); here we briefly outline some of our principal findings.

In order to develop a detailed model we have focused on the intrinsic basal ganglia circuitry of the rat. Since the evolution of the basal ganglia has been relatively conservative this model should generalise reasonably well to other mammals, and to a lesser extent to other vertebrate classes. In mammals, the various basal ganglia nuclei have a rich interconnectivity, partly illustrated in Figure 1, whose function is only partly understood. However, Mink (1996) has reviewed anatomical and electrophysiological evidence indicating that the different intrinsic pathways provide complementary mechanisms that act to focus activity in the basal ganglia output nuclei, disinhibiting desired channels while maintaining or increasing inhibition on competing channels. This proposal has been explored and extended in our computational model.

The new model proposes two forms of action selection within the basal ganglia. The first occurs via recurrent reciprocal inhibitory circuits within local regions of the striatum. This intra-striatal circuitry selects a single winner from a group of competitors and generates an output for that winner which is proportional to its input (salience). Since striatal reciprocal inhibition is topographically localised, this mechanism may result in local winners in several regions of the striatum. We have therefore hypothesised that a second form of switching occurs between these multiple winners via a feed-forward selection mechanism that involves other basal ganglia intrinsic nuclei such as the subthalamic nucleus and the pallidum.

The effects of changes in dopamine level have also been incorporated into our model. Our results demonstrate that dopamine modulation operates synergistically with the various intrinsic basal ganglia pathways to enhance or reduce the system’s capacity to switch in a manner which is consistent with the observed effects of abnormal dopamine levels in human clinical conditions. Good matches have also been found between signal traces generated by the model and those obtained from single-cell recordings of intrinsic basal ganglia nuclei. We believe that these results are indicative of common architectural characteristics between the model and the biological basal ganglia.

Meeting the requirements for effective selection

Earlier work (Snaith and Holland, 1990; Maes, 1995; Prescott et al., 1999) has identified a number of important requirements for effective action selection. These can be loosely divided into those related to making appropriate selections and those concerned with effective switching between competitors. With respect to selection, a widely applied criterion, and one that appears to be exploited in vertebrate decision-making (see, e.g., McFarland, 1989), is to prefer the most strongly supported competitor as indicated by relevant external and internal cues. We have shown analytically that our model of basal ganglia intrinsic circuitry is suitably configured to satisfy this minimum requirement for effective selection (Gurney et al., submitted).

In order to facilitate effective and timely switching between competitors a number of useful properties have been identified that an action selection mechanism should possess (Snaith and Holland, 1990; Prescott et al., 1999). First, a competitor with a slight edge over its rivals should see the competition resolved rapidly and decisively in its favor. This can be termed clean switching. Second, the presence of other competitors which are activated but not engaged should not interfere with the performance of the winning behavior once the competition has been resolved. This can be termed lack of distortion. Finally, it may also be useful for a winning competitor to remain active at lower input levels than are initially required for it to overcome the competition.
This characteristic, termed persistence or hysteresis, can prevent unnecessary switching and is often implemented through some form of non-linear, positive feedback loop (e.g. Houston and Sumida, 1985). Results from our basal ganglia intrinsic model suggest the capacity to support both clean switching and lack of distortion. The study of an embedded model should help us to further evaluate the performance of the basal ganglia on these criteria. With regard to the issue of persistence, neuroscientific evidence suggests that the mammalian thalamus may provide a route for positive feedback to cortical sensorimotor systems (Redgrave et al., 1999a). This possibility is briefly considered below.

Persistence through a neurobiologically plausible model of positive feedback

Specific nuclei in the mammalian thalamus appear to have reciprocal excitatory projections to cortical sensorimotor systems that send input to the basal ganglia.

Interestingly, these thalamic nuclei are also under the inhibitory control of basal ganglia output neurons. Selective disinhibition of the thalamus by the basal ganglia could therefore lead to an increase in the excitation of cortical sensorimotor systems, so completing a positive feedback loop. This role for thalamic feedback has been investigated by Humphries and Gurney (1999) in an extended computational model of action selection by the basal ganglia. This new model also incorporates a structure called the thalamic reticular nucleus which plays an important role in modulating the thalamic feedback circuit. Results generated using the extended model show that the basal ganglia-thalamocortical circuit is capable of meeting all the requirements for effective switching outlined above, while also demonstrating an appropriate degree of persistence that was lacking in the original model of intrinsic basal ganglia circuitry. A version of the thalamic feedback circuit developed by Humphries and Gurney has therefore been incorporated into the current embodied model of action selection by the basal ganglia.
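The link between positive feedback and persistence can be illustrated with a deliberately simple sketch. This is not the Humphries and Gurney model: it assumes a single channel, a fixed selection threshold, and a binary feedback term that is added to the channel's input only while the channel is selected. The names and the values of `threshold` and `feedback_gain` are hypothetical.

```python
LEVELS = [i / 10 for i in range(11)]  # input levels 0.0, 0.1, ..., 1.0


def sweep(inputs, threshold=0.5, feedback_gain=0.25):
    """Return the selected/deselected state of one channel after each input.

    While the channel is selected, an assumed positive feedback term
    (feedback_gain) is added to its input before thresholding.
    """
    selected = False
    states = []
    for x in inputs:
        feedback = feedback_gain if selected else 0.0
        selected = (x + feedback) > threshold
        states.append(selected)
    return states


def switch_points(levels, threshold=0.5, feedback_gain=0.25):
    """Sweep the input up and then down; return (switch-on, switch-off) levels."""
    path = levels + levels[::-1]
    states = sweep(path, threshold, feedback_gain)
    n = len(levels)
    on_level = next(x for x, s in zip(path[:n], states[:n]) if s)
    off_level = next(x for x, s in zip(path[n:], states[n:]) if not s)
    return on_level, off_level


if __name__ == "__main__":
    on_level, off_level = switch_points(LEVELS)
    print("selected at input", on_level, "and deselected at input", off_level)
```

In this sketch the channel is deselected at a lower input level than the one at which it was first selected, which is the hysteresis property described in the text.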

4. An embodied model of action selection in the vertebrate basal ganglia

The modeling work considered above focused on understanding some of the intrinsic circuitry of the mammalian brain aside from any specific embedding within a full control system. These models can therefore be regarded as demonstrating signal selection (see Gurney et al., submitted, for definition) by the basal ganglia rather than action selection per se. To demonstrate the capability of the basal ganglia model to act as an action selection device we believe that it needs to be embodied in a real-time sensorimotor interaction with the physical world. An important goal was therefore to construct an embedded basal ganglia model in which selection occurs between multiple, physically-realized behaviors in a mobile robot.

Our specific program of research has been inspired by observing the behavior-switching of an adult rat in the square arena illustrated in figure 3. Here, bedding material from the animal’s home cage is positioned in one shaded corner of the arena, and items of food are placed in a small dish in the center of the arena. When the rat is initially placed in this environment it shows exploratory behavior and thigmotaxis (staying close to the walls) and a strong preference for the ‘nest’ corner of the box. A common behavior on locating the food dish is to carry a food item back to the nest where it is consumed. The balance between locomotion, feeding, and resting is of course sensitive to the level of hunger of the animal and its familiarity with the arena. Salomone (1988) has demonstrated that effective behavior switching in a similar environment is compromised by the dopamine antagonist Haloperidol and by dopamine-depleting lesions of the striatum.

Figure 3. The behavior of an adult rat in this square arena with a shaded nest area (top right) and central food resource has provided the inspiration for the embedded model of action selection by the vertebrate basal ganglia.

Our initial efforts have focused on producing a similar, if much simplified, problem setting for a Khepera mobile robot and have resulted in the model described below. The wheeled robot, which possesses a gripper-arm and a ring of infra-red (IR) distance sensors, is placed in a square, walled arena in which a number of small cylindrical objects are also placed. The cylinders substitute for food pellets, so the collection and consumption of food is modeled by collecting cylinders and depositing them in the corners of the arena. The robot has five action sub-systems which it can switch between at any time: cylinder-seek, cylinder-pickup, wall-seek, corner-seek, and cylinder-deposit. The ‘salience’ of each sub-system at a given moment depends on the values of various extrinsic and intrinsic variables. Extrinsic variables are computed by perceptual sub-systems that process the raw sensory data available to the robot; currently these include wall-detect, corner-detect, cylinder-detect, and gripper-status. Intrinsic variables are computed by motivational sub-systems as a function of recent behavior and experience. The simulation currently includes two simple intrinsic drives loosely analogous to ‘hunger’ and ‘fear’. The ‘fear’ drive is a function of exposure to the environment and is reduced by time spent exploring the environment. ‘Hunger’, on the other hand, gradually increases with time, and is reduced when cylinders are deposited in the corners of the arena. Each action sub-system is also capable of generating an intrinsic variable, relevant to its own salience, as a function of its current state. At each time-step both extrinsic and intrinsic variables are provided to the input components of the model basal ganglia which compute a salience value for each behavior as a weighted function of all the variables relevant to that behavior. The intrinsic circuitry of the basal ganglia model then resolves the competition between behaviors and

disinhibits winning action sub-systems. This general architecture is illustrated in figure 4. Note that this simulation is operating at two levels of approximation to the biological control system we are trying to model. The intrinsic basal ganglia model is intended as a systems level approximation of actual neural pathways and structures in the mammalian brain. With the exception of the thalamus, the remaining components of the model are not intended as simulations of specific brain structures but are included in order to provide a complete control architecture that can be evaluated in terms of objective measures of behavior.

Figure 4. Architecture of the embodied basal ganglia model. The ‘salience’ for each action sub-system is calculated in the basal ganglia where inputs from the perceptual, motivational, and action sub-systems converge. The basal ganglia also resolve the competition between action sub-systems and disinhibit the motor outputs (to gripper and wheel motors) of the winning sub-system. Disinhibition also provides positive feedback, via the thalamus, to the winning competitor. Note that only a small sub-set of actual connections are shown. See text for further explanation.

Unfortunately there is insufficient space here to give full details of the implementation of the model, which will be described elsewhere (Montes Gonzalez, in preparation). The following account therefore provides an outline description of the key components of the model.

Timing

The simulation operates on a series of discrete time-steps of equal duration. At each step all modules are updated, conflict resolution is carried out by the basal ganglia model, and the winning motor commands are executed.

Sensory data

At each step the values of the eight IR detectors, and of an optical sensor in the gripper jaw, are provided as required to the action and perceptual sub-systems.

Action sub-systems

Each action sub-system operates independently to generate a continuous stream of motor command vectors that are directed toward the motor systems. At each time-step the command vector is a function of relevant sensory data. So, for instance, cylinder-seek will use the IR signals to judge the direction toward the nearest cylinder and will generate a motor command vector that specifies movement in that direction. Each action sub-system also contains an integer-valued short-term memory (STM) register. Action sub-systems such as cylinder-pickup require the execution of an ordered sequence of movements (e.g. reverse into lifting position, lower gripper, close gripper, lift gripper) and for these systems the STM register records the current position within this sequence. Note that this movement subsequence could alternatively be decomposed into the behavior of multiple action sub-systems at a finer level of granularity. This issue will be addressed in the discussion section.

Action sub-systems receive feedback from the basal ganglia via a component of the control architecture modeled on the mammalian thalamus. As explained earlier, thalamic neurons are tonically active and have reciprocal excitatory connections with (cortical) action sub-systems. Disinhibition of a thalamic channel by the basal ganglia will therefore increase the excitation directed at the corresponding action sub-system. In our model thalamic feedback has two related uses. First, it is used to complete the positive feedback loop required to generate behavioral persistence. Second, it is used by those action sub-systems that generate movement sequences to determine whether or not they are currently selected (i.e. controlling the motor output systems).
Specifically, if the thalamic feedback for a channel is above a preset threshold then the corresponding action sub-system assumes itself to be selected, and increments its STM register. If, on the other hand, the feedback is below threshold, the sub-system assumes itself to be deselected and sets the STM register to zero. Two signals are sent by each action sub-system to the input side of the basal ganglia. The first, referred to below as the persistence signal, varies directly with the thalamic feedback signal, so completing the positive feedback loop. The second, termed the motor-state signal, allows the sub-system to make a contribution to its own salience calculation in the basal ganglia. This signal is set to zero by default but can be given a value of +1 by the motor component of an active sub-system. For instance, a sub-system such as cylinder-pickup may use the motor-state signal to indicate to the basal ganglia that it is in a critical phase of a movement sequence (such as reversing in order to pick up a cylinder).

Perceptual and motivational sub-systems

At each time-step the output of each perceptual sub-system is a bipolar value indicating the presence (+1) or absence (-1) of the relevant target feature (e.g. a cylinder, a wall, a corner, or an object in the gripper) as determined by processing relevant sensory signals. The output of each motivational sub-system is a single scalar value in the range [0, 1] which gradually decreases (fear) or increases (hunger) over time. Hunger is also reduced by a fixed amount when a cylinder is deposited in a corner of the arena.

Salience

Basal ganglia input neurons (striatal medium spiny cells) have very large dendritic trees and may receive input from many different regions of the brain. This can be viewed as a mechanism for assimilating relevant signals from a variety of sources that can then be integrated (by summing over the dendritic tree) to formulate a measure of salience for the activity associated with a specific basal ganglia channel. In the robot control architecture each basal ganglia channel receives input from diverse sources: the persistence and motor-state signals from the corresponding action sub-system, together with signals from perceptual and motivational sub-systems relevant to the selection of that action. These signals form the input vector for the channel and the salience is then computed as the dot product of this vector with a vector of hand-picked weights. Of course, in principle, the relevant weights for each channel could also be learned. This possibility will be given some consideration in section 6.
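The signal conventions described above can be summarized in a short sketch. It is an illustration under stated assumptions, not the implementation used on the robot: the function names, the drive step size, the deposit relief, the feedback threshold, and the example weights are all hypothetical, and feedback exactly equal to the threshold is treated here as deselection.

```python
def perceptual_output(feature_present: bool) -> int:
    """Bipolar perceptual signal: +1 if the target feature is present, else -1."""
    return 1 if feature_present else -1


def update_drives(hunger, fear, cylinder_deposited, step=0.125, deposit_relief=0.25):
    """One time-step of the two drives, kept within [0, 1].

    Hunger rises by `step` and drops by `deposit_relief` on a deposit;
    fear falls by `step`. Both amounts are arbitrary illustrative values.
    """
    hunger += step
    if cylinder_deposited:
        hunger -= deposit_relief
    fear -= step
    return min(1.0, max(0.0, hunger)), min(1.0, max(0.0, fear))


def update_stm(stm: int, thalamic_feedback: float, threshold: float = 0.5) -> int:
    """Increment the STM register while selected, reset it to zero otherwise."""
    return stm + 1 if thalamic_feedback > threshold else 0


def salience(inputs, weights):
    """Salience of one channel as the dot product of its inputs and weights."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


if __name__ == "__main__":
    hunger, fear = update_drives(0.5, 0.25, cylinder_deposited=False)
    # Hypothetical input layout for one channel:
    # [persistence, motor-state, cylinder-detect, gripper-status, hunger, fear]
    inputs = [0.0, 0.0, perceptual_output(True), perceptual_output(False), hunger, fear]
    weights = [0.5, 0.5, 0.25, -0.25, 0.5, -0.5]  # hand-picked, illustrative only
    print(salience(inputs, weights))
```

Keeping every input on a small, bounded scale, as the text describes, means the hand-picked weights alone determine how strongly each signal contributes to a channel's bid.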
Action selection by the intrinsic basal ganglia model

In our earlier description of the intrinsic basal ganglia model we described two complementary mechanisms for action selection in the basal ganglia: recurrent inhibitory connections within local areas of the striatum, and feedforward selection mechanisms that can operate to select between non-neighbouring striatal areas. In the experiments described below all channels compete through the local reciprocal inhibitory connections. The exploration of action selection in an embedded model using the feedforward mechanism alone will be investigated in future work. Note that the use of local recurrent inhibition does not mean that the basal ganglia is simply implementing a winner-takes-all competition. The output of the striatum will be proportional to the salience of the strongest channel, which will itself be dependent on the behavior of the thalamic feedback pathway. Additionally, dopamine modulation of intrinsic pathways has a significant effect on the ability of a weakly salient channel to override the tonic inhibitory signals generated by the basal ganglia output nuclei.

Motor systems

A selected basal ganglia channel gates the motor command vector emanating from the corresponding action sub-system by removing tonic inhibition from its motor pathway. In the embodied model this is simulated by a multiplicative gate. Specifically, the motor command vector for each sub-system is multiplied by the value (1 – ky) where y (0≤y≤1) is the strength of the inhibitory output signal for its basal ganglia channel and k is a constant. The motor command vector itself is expressed as the desired change in the current robot state, that is, as changes in wheel-speeds, gripper-arm elevation, and gripper-mouth position. This means that in the event of full basal ganglia inhibition the motor command vector will have zero value and the current robot state will not change (i.e. the robot will freeze in its current position). In the event of partial basal ganglia inhibition the robot will act but its movements will be slowed by the resulting reduction in the size of the command vector. The motor vectors arriving from all action sub-systems are summed following gating by the basal ganglia; thus the vector generated by any losing action sub-system whose output is not fully inhibited by the basal ganglia will be combined with that of the winner. This mechanism allows for the possibility of distortion in the event of ineffective suppression of competitors by the basal ganglia.
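The multiplicative gate and the summation of gated command vectors can be sketched as follows. This is a hedged illustration rather than the robot's actual code: the value k = 1, the ordering of the command vector components, the function names, and the example numbers are all assumptions.

```python
# Illustrative sketch only. k = 1.0, the component layout, and the example
# command vectors are assumptions, not values reported in the text.

def gate(command, y, k=1.0):
    """Scale a motor command vector by (1 - k*y), with y in [0, 1]."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    factor = 1.0 - k * y
    return [factor * c for c in command]


def combine(commands, inhibitions, k=1.0):
    """Sum the gated command vectors of all action sub-systems."""
    gated = [gate(c, y, k) for c, y in zip(commands, inhibitions)]
    return [sum(components) for components in zip(*gated)]


# Assumed layout: changes in [left wheel, right wheel, arm elevation, gripper mouth].
winner = [2.0, 2.0, 0.0, 0.0]  # e.g. a sub-system that drives forwards
loser = [0.0, 0.0, 4.0, 0.0]   # e.g. a sub-system that raises the arm

# Clean selection: winner fully disinhibited (y = 0), loser fully inhibited (y = 1).
clean = combine([winner, loser], [0.0, 1.0])

# Ineffective suppression: the winner is slowed and the loser leaks through.
distorted = combine([winner, loser], [0.5, 0.75])

print(clean)      # [2.0, 2.0, 0.0, 0.0]
print(distorted)  # [1.0, 1.0, 1.0, 0.0]
```

In the second case the winning command is halved, which corresponds to slowed movement under partial inhibition, and the arm component of the losing sub-system appears in the summed output, which corresponds to the distortion described above.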

5. Results

Our initial results have demonstrated appropriate and clean switching by the embodied basal ganglia model. In other words, given adequate dopamine levels and suitable salience weighting vectors, the robot selects appropriate actions for different circumstances and generates sequences of integrated behavior over time. Of course, this is not a particularly impressive achievement in terms of robot control—alternative and perhaps simpler control architectures could be as effective in performing these simple tasks. However, from our perspective the experiment does demonstrate that the proposed basal ganglia selection mechanism is capable of effective and clean switching when embedded in a complete architecture. The behavior of the embodied model is illustrated in Figure 5 below. Graphs are shown for basal ganglia activity relating to each of the five action sub-systems for a test period of 700 simulation steps. The graphs in the left-hand column illustrate the component of the salience of each subsystem due to intrinsic and extrinsic variables from the action, perceptual, and motivational sub-systems. Those in the central column show the total salience which includes the effects of the persistence signal from the thalamic feedback loop. The graphs on the right show the basal ganglia output. Note that a selected channel should ideally have zero output, and a non-selected one should have a relatively high output.

The behavior of the simulated robot in this experiment was as follows. The robot begins with a high level of ‘fear’, which results in slightly higher salience for wall-seek than for other actions. After finding a wall, corner-seek becomes more salient and the robot spends time finding a corner, then waits there until increasing hunger and decreasing fear drive up the salience of cylinder-seek. When the salience for corner-seek falls below that of cylinder-seek the robot switches to looking for cylinders. When it finds a cylinder, cylinder-pickup is selected, followed by wall-seek, corner-seek, and finally cylinder-deposit.

The left-hand column graphs in Figure 5 show that in some circumstances two or more action sub-systems can have similar levels of salience, thus generating a requirement for action selection by the basal ganglia. The central column of graphs shows the role of the thalamic feedback in boosting the salience of a selected sub-system so that it is less likely to suffer interruption due to variations in salience caused by changing sensor signals and sensor noise. Finally, the graphs of basal ganglia output (right-hand column) show clean switching between action sub-systems, with selected sub-systems fully disinhibited.

[Figure 5 plots: one row of three panels for each of Cylinder_seek, Cylinder_pickup, Wall_seek, Corner_seek, and Cylinder_deposit. In each row the first two panels have vertical axes running from -1.5 to 3.5 and the third has a vertical axis running from 0 to 1.]

Figure 5. Basal ganglia activity relating to each of the five action sub-systems for a test period of 700 simulation steps. Left: the component of the salience of each sub-system due to intrinsic and extrinsic variables and excluding persistence. Center: total salience including the effect of the persistence signals. Right: basal ganglia output; selected sub-systems should show near-zero activity. The salience weightings used in the simulation were as follows. Cylinder-seek: persistence +1.0, motor-state 0.0, gripper-status –0.3, cylinder-detect –0.3, hunger +0.9. Cylinder-pickup: persistence +1.5, motor-state +0.1, gripper-status –0.3, cylinder-detect +0.3, hunger +0.9. Wall-seek: persistence +1.0, motor-state 0.0, wall-detect –0.3, gripper-status +0.3, fear +0.9. Corner-seek: persistence +1.9, motor-state 0.0, wall-detect +0.3, gripper-status +0.3, fear +0.8. Cylinder-deposit: persistence +1.5, motor-state +0.1, gripper-status +0.75, corner-detect +0.75, hunger +0.5. See text for further explanation.

Initial experiments with simulated dopamine modulation have also generated encouraging results. For instance, lowering dopamine below the normal tonic level causes the robot difficulty in approaching a cylinder or in picking one up. Furthermore, in a reduced dopamine condition, a selected action sub-system may not always be fully disinhibited, resulting in the desired movement being performed more slowly than is normal. There may be an interesting similarity between the slowing of movement observed in the robot and the bradykinesia observed in human patients with Parkinson’s disease. In circumstances of increased dopamine the robot may select two action sub-systems simultaneously, resulting in inappropriate movements such as repeated lifting and lowering of the arm during cylinder-seek. Inappropriate limb movements are also a characteristic of the human basal ganglia disorder Tourette’s syndrome, which is related to abnormally high levels of dopamine.

6. Discussion

In the following we briefly review some of the issues that have been raised in the previous section and that we are planning to address in our future work.

Hierarchies of action selection and automatisation

An important issue concerns the granularity of the action selection provided by the basal ganglia. For instance, some researchers have proposed a role for the basal ganglia in a more fine-grained sequencing of movement than selecting between competing behavioral alternatives (see Mink, 1996, for review). Indeed, this level of action selection would be equivalent to the type of movement sequencing currently performed within our action sub-systems (e.g. cylinder-pickup). The suggestion that the basal ganglia is involved in movement sequencing can, however, be reconciled with the view of the basal ganglia as an action selection device on the grounds that such tasks can be regarded as action selection problems on a much shorter timescale. Elsewhere (Prescott et al., 1999) we have proposed a general hypothesis, which appears to be consistent with the anatomy of the basal ganglia, that similar switching circuitry is employed in different regions of the basal ganglia to resolve selection problems at different levels of functional integration. It seems likely, however, that in the case of innate or well-practiced movement patterns, fine-grained control of movement generally takes place outside the basal ganglia. Two recent studies help to demonstrate this point. First, following extensive investigation of the role of the basal ganglia in the ‘syntax’ of rat grooming behavior, Cromwell and Berridge (1996) conclude that the role of sequencing the component movements of basic grooming acts is satisfied by pattern-generators in the rat brainstem. The function of the basal ganglia, they propose, "is not so much for the generation of the serial order pattern [...] as for the implementation of that pattern in the normal flow of behavior" (p. 3455). Second, Carelli, Wolske, and West (1997) have shown that striatal neurons that fire while a rat is learning a lever pressing task cease firing once the behavior is well-practiced. The conclusion they derive is that the striatal activity needed to learn a particular motor response may not be required for its performance once the action has become automated. This result is open to a number of interpretations; however, one interesting possibility is that action selection by the basal ganglia may be involved in constructing new movement sequences which, following practice, then become available for selection as larger ‘chunks’ of behavior (see e.g. Graybiel, 1995).

Learning

The basal ganglia are strongly implicated in instrumental conditioning and in various forms of sequential learning. Many dopaminergic neurons in the midbrain basal ganglia areas appear to fire in conjunction with rewarding events, or prior to anticipated rewarding events. Schultz, Dayan, and Montague (1997) have proposed that the afferents from these structures to the striatal input neurons may provide a training signal similar to the temporal difference error used in artificial reinforcement learning methods. Houk, Adams, and Barto (1995) have further speculated that something akin to an actor-critic learning system may be operating in the basal ganglia. In this paper we have considered an alternative role for dopamine in modulating the sensitivity of basal ganglia switching; however, a dual role for dopamine in both learning and selection seems likely (see Redgrave et al., 1999b). The current model uses hand-crafted weights at the input to the basal ganglia. In future work we plan to explore the use of reinforcement learning and/or genetic algorithms in order to determine appropriate weight values.
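For readers unfamiliar with the temporal difference error mentioned above, the following sketch shows its standard one-step form from the reinforcement learning literature. It is not part of the robot model described in this paper; the function name, the discount factor, and the example values are assumptions chosen for illustration.

```python
# Illustrative sketch of the standard one-step temporal difference (TD) error.
# Not part of the robot model; gamma and the example values are assumptions.

def td_error(reward, value_now, value_next, gamma=0.9):
    """TD error: reward + gamma * V(next state) - V(current state)."""
    return reward + gamma * value_next - value_now


# An unpredicted reward gives a positive error.
unexpected = td_error(reward=1.0, value_now=0.0, value_next=0.0)

# A fully predicted reward gives zero error.
predicted = td_error(reward=1.0, value_now=1.0, value_next=0.0)

# A predicted reward that fails to arrive gives a negative error.
omitted = td_error(reward=0.0, value_now=1.0, value_next=0.0)

print(unexpected, predicted, omitted)  # 1.0 0.0 -1.0
```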
Multiple-winners and non-local switching

The simulation described above made use of local interstriatal switching circuits in order to resolve selection competitions. However, as we have pointed out, it is possible that competitions for different motor systems may take place in different areas of the striatum. In the future we plan to investigate this possibility in the robot model by allowing separate competitions for the use of the driven wheels and the gripper-arm. An interesting issue concerns the selection of compatible actions by different motor systems. Whilst it is clear that it is useful to be able to walk while chewing gum, in general, selections performed in different parts of the motor system must meet the constraints of generating coherent behavior. For instance, it is not such a good idea to walk forwards while looking backwards. Addressing these issues is an important goal of our future research.

7. Conclusion

In this paper we have demonstrated that a model of the intrinsic circuitry of the vertebrate basal ganglia can provide effective action selection when embedded in the control architecture of an autonomous robot engaged in simple, animal-like behaviors. We hope that future experiments will show further specific similarities between behavior switching in the model and in vertebrates. Of course, we are currently a very long way from modeling the much richer behavior switching observed in real animals such as the rat shown in figure 3; however, we do believe we have made a start.

References

Arbib, M. (1995). Introducing the neuron. In The Handbook of Brain Theory and Neural Networks. M. A. Arbib. Cambridge, MA, MIT Press.
Bryson, J. (in press). Cross-paradigm analysis of autonomous agent architecture. Journal of Experimental and Theoretical Artificial Intelligence.
Carelli, R. M., M. Wolske and M. O. West (1997). Loss of lever press-related firing of rat striatal forelimb neurons after repeated sessions in a lever pressing task. Journal of Neuroscience 17(5): 1804-1814.
Chevalier, G. and J. M. Deniau (1990). Disinhibition as a basic process in the expression of striatal functions. Trends in Neurosciences 13(7): 277-280.
Cromwell, H. C. and K. C. Berridge (1996). Implementation of action sequences by a neostriatal site - a lesion mapping study of grooming syntax. Journal of Neuroscience 16(10): 3444-3458.
Graybiel, A. M. (1995). Building action repertoires: memory and learning function of the basal ganglia. Current Opinion in Neurobiology 5: 733-741.
Gurney, K., T. J. Prescott and P. Redgrave (1998). The basal ganglia viewed as an action selection device. 8th Int. Conf. Artificial Neural Networks, Skövde, Sweden.
Gurney, K., T. J. Prescott and P. Redgrave (to appear). A computational model of action selection in the basal ganglia. Biological Cybernetics.
Houk, J. C., J. L. Adams and A. G. Barto (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In Houk, Davis, & Beiser (1995).
Houk, J. C., J. L. Davis and D. G. Beiser (1995). Models of Information Processing in the Basal Ganglia. Cambridge, MA, MIT Press.
Humphries, M. and K. Gurney (1999). A computational model of action selection in the basal ganglia: Thalamic and cortical interactions. AI Vision Research Unit Memo, no. 132. University of Sheffield.
Houston, A. and B. Sumida (1985). A positive feedback model for switching between 2 activities. Animal Behaviour 33(FEB): 315-325.
Leise, E. M. (1990). Modular construction of nervous systems: a basic principle of design for invertebrates and vertebrates. Brain Research Reviews 15: 1-23.
Maes, P. (1995). Modelling adaptive autonomous agents. Artificial Life: An Overview. C. G. Langton. Cambridge, MA, MIT Press.
McFarland, D. (1989). Problems of Animal Behaviour. Harlow, UK, Longman.
Medina, L. and A. Reiner (1995). Neurotransmitter organization and connectivity of the basal ganglia in vertebrates - implications for the evolution of basal ganglia. Brain Behavior and Evolution 46(4-5): 235-258.
Mink, J. W. (1996). The basal ganglia: Focused selection and inhibition of competing motor programs. Progress in Neurobiology 50(4): 381-425.
Montes Gonzalez, F. (in preparation). An embodied model of action selection by the vertebrate basal ganglia. PhD thesis, University of Sheffield, UK.
Prescott, T. J. (to appear). The evolution of action selection. In Towards the whole iguana. D. McFarland and O. Holland. Cambridge, MA, MIT Press.
Prescott, T. J., P. Redgrave and K. N. Gurney (1999). Layered control architectures in robots and vertebrates. Adaptive Behavior 7(1): 99-127.
Redgrave, P., T. Prescott and K. N. Gurney (1999a). The basal ganglia: A vertebrate solution to the selection problem? Neuroscience 89: 1009-1023.
Redgrave, P., T. J. Prescott and K. Gurney (1999b). Is the short latency dopamine burst too short to signal reward error? Trends in Neurosciences 22: 146-151.
Salamone, J. D. (1988). Dopaminergic involvement in activational aspects of motivation - effects of haloperidol on schedule-induced activity, feeding, and foraging in rats. Psychobiology 16(3): 196-206.
Schultz, W., P. Dayan and P. R. Montague (1997). A neural substrate for prediction and reward. Science 275: 1593-1599.
Snaith, S. and O. Holland (1990). An investigation of two mediation strategies suitable for behavioural control in animals and animats. First International Conference Simulation of Adaptive Behaviour, Paris.
Wagner, G. P. and L. Altenberg (1996). Perspective: Complex adaptations and the evolution of evolvability. Evolution 50(3): 967-976.
Wickens, J. (1997). Basal ganglia: structure and computations. Network-Computation in Neural Systems 8(4): R77-R109.