Combining Visual and Proprioceptive Information in a Model of Spatial Learning and Navigation Ricardo Chavarriaga and Wulfram Gerstner Laboratory of Computational Neuroscience I&C and Brain Mind Institute, EPFL CH-1015, EPFL, Lausanne, Switzerland E-mail:
[email protected]
Abstract— Animals can adopt different navigation strategies according to the environment and the task they have to solve. In mammals, the selection of the appropriate navigation strategy involves parallel pathways. The hippocampus has been proposed as the basis for a first type of navigation strategy, in which a representation of space is required; this representation uses spatially tuned neurons (place cells) found in the rat hippocampus. A second, different navigation strategy, involving the dorsal striatum in the basal ganglia, is used by rats if the target can be identified by a visible cue. This paper presents a computational model of the rat hippocampus and its interactions with the basal ganglia that reproduces neurobehavioral experiments on self-localisation, on the integration of sensory and proprioceptive information, and on the automatic selection of appropriate navigation strategies.
I. INTRODUCTION

Neurophysiological findings suggest that spatial self-localisation in rats is supported by place-sensitive and direction-sensitive cells. Neurons in the rat hippocampus, termed place cells [1], are active only when the rat is at a specific location of the environment. In other words, hippocampal activity encodes a spatial representation of the environment in allocentric (i.e. external) coordinates (see [2] for a review of this topic). Another set of neurons in the hippocampal formation, called head-direction cells, encodes directional information and can be seen as an allocentric compass of the animal. Place cells and head-direction cells interact with each other and form a neural circuit for spatial representation [3].

The firing properties of place cells are controlled both by external (allothetic) sensory input and by internal (idiothetic) sources such as vestibular or proprioceptive cues. Several neurobehavioral studies have addressed the influence of these sources of information on the hippocampal space code. It has been observed that the firing pattern of the place cell population can rotate following the rotation of a visual landmark [3]. In another study [4], rats were trained to shuttle on a track between a fixed and a movable reward site. In the initial part of the journey, cells fired at fixed distances from the point of departure (suggesting a representation based on path integration), whereas farther along the track the spatial code became aligned with external stimuli. This supports the idea that external sensory input can influence the spatial representation in case of ambiguity with the idiothetic information.

All navigational tasks where the target cannot be directly identified by any cue (or sequence of cues) require a spatial representation and are classified as locale navigation. Such a representation is related to the theory of cognitive maps, whose anatomical locus seems to be the hippocampus [1]. Other navigation tasks can be solved by a simple landmark-guidance behaviour (taxon navigation). Furthermore, when navigation towards a goal always requires the same specific motor sequence acquired by training, the task is classified as praxic navigation. Animals can adopt any of those strategies depending on the environment they are embedded in.

The model presented in this paper extends the capabilities of a previous model [5], [6] in two ways. First, the new model allows a direct interaction between external and internal sources of information in the hippocampal formation. Second, the new model is able to switch between different navigation strategies depending on the environment and the task to be solved. The model was tested in several experimental paradigms and reproduces and explains the results observed in behavioural experiments with rats.

II. HIPPOCAMPAL MODEL

The computational model of the rat hippocampus presented in this paper learns a representation of space by exploration. Starting with no prior knowledge, the system grows incrementally based on agent-environment interaction: at each new location that the agent has not seen before, a place cell is added. Once the spatial representation has been obtained, it can be used to perform navigational tasks. In our neural model of place cells, external (allothetic) sensory information is correlated with proprioceptive (idiothetic) information using Hebbian learning. This yields a stable space representation where ambiguities in the visual data are resolved by the use of idiothetic information.
In the same way, the idiothetic estimate, which normally drifts due to the accumulation of errors over time, is stabilised using information from the allothetic pathway. Figure 1 presents a functional diagram of the model and the corresponding structures in the rat brain. Allothetic (visual) stimuli are encoded in a population of view cells (VC), which project to another population (referred to as allothetic place cells, APC) where a vision-based representation of
the space is built. This representation is proposed to reside in the superficial layers of the rat entorhinal cortex. The transformation of visual input into a vision-based representation is part of the allothetic pathway leading to the hippocampus. A second pathway is based on proprioceptive (idiothetic) self-motion information. We refer to the ability to navigate using self-motion information (i.e. odometry) as path integration (PI); it allows the rat to navigate in darkness or in the absence of visual cues. In our model, the superficial layer of the medial entorhinal cortex is suggested as the neural locus of this representation. Both the allothetic (APC) and the idiothetic (PI) representations project onto the hippocampal formation to form the population of place cells (PC). We now discuss our model in more detail.

Fig. 1. Functional diagram of the hippocampal model. Allothetic pathway: local views are stored in view cells (VC), which drive the allothetic place cells (APC). Idiothetic pathway: odometry signals (v, θ) drive the path integration (PI) population. Both the allothetic and the idiothetic spatial representations project onto the hippocampal place cells (PC).

1) Idiothetic input: In our model, the idiothetic representation of space, encoded in the PI cells, is implemented by an evenly distributed population of cells relying on preconfigured metric interrelations. If the agent is moving with speed v(t), we estimate its position p^PI(t) = (x(t), y(t)) by integration, starting from the previous estimate

p^est(t − 1) = (x^est(t − 1), y^est(t − 1))    (1)

x(t) = x^est(t − 1) + ∫_{t−1}^{t} v(t′) cos(θ(t′)) dt′    (2)

y(t) = y^est(t − 1) + ∫_{t−1}^{t} v(t′) sin(θ(t′)) dt′    (3)

Proprioceptive information provides the speed v of the agent, and we assume that a population of head-direction cells provides the angle θ (a neural model of head-direction cells has been implemented previously, see [7]). After a movement, the activity r_j^PI of each cell j in the path integration module (PI) is updated according to

r_j^PI = exp( − (p^PI − p_j^PI)^2 / (2 σ_PI^2) )    (4)

where p_j^PI is the center of the field of cell j, σ_PI its width, and p^PI = (x(t), y(t)) is the updated estimate based on path integration.

2) Allothetic input: Behavioural experiments suggest that the hippocampal spatial representation is sensitive to the position of landmarks placed around the environment [8]. In our model, each view cell encodes the current local view using the distances (d_ψ) to the closest wall in 21 different directions ψ in the visual field Ω. A similar sensory system was proposed in [9]. The features of a local view as perceived by an agent in a simulated environment are presented in Figure 2.

Fig. 2. Left: the agent situated in a simulated environment. Right: the local view as perceived by the agent. The features (d_ψ), stored by the view cells, are the normalised distances to the walls in 21 different directions ψ (from −135° to 135°) in the visual field (Ω = 270°).

The activity r_m^VC of a view cell m is computed by comparing the features of the current local view (d_Ω = {d_ψ | ψ ∈ Ω}) with the local view (d_Ω^m) stored when the cell was created,

r_m^VC = exp( − (1/|Ω|) Σ_{ψ∈Ω} |d_ψ − d_ψ^m|^2 / (2 σ_VC^2) )    (5)
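As a concrete illustration, the path-integration update of Eqs. (1)-(4) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the time step, the grid of PI-cell centres and the width σ_PI are illustrative choices that the text does not specify.

```python
import numpy as np

# Discrete-time sketch of the idiothetic update, Eqs. (1)-(4).
# dt, the grid of PI-cell centres and sigma_pi are illustrative.

def integrate_position(p_est, v, theta, dt):
    """Eqs. (2)-(3): advance the position estimate using odometry
    (speed v, heading theta from the head-direction system)."""
    x, y = p_est
    return (x + v * np.cos(theta) * dt, y + v * np.sin(theta) * dt)

def pi_activity(p_pi, centres, sigma_pi):
    """Eq. (4): Gaussian tuning of each PI cell around its centre."""
    d2 = np.sum((centres - np.asarray(p_pi)) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma_pi ** 2))

# Example: a 5x5 grid of PI cells covering a unit arena.
xs, ys = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
centres = np.column_stack([xs.ravel(), ys.ravel()])

p = integrate_position((0.5, 0.5), v=0.2, theta=0.0, dt=0.1)
r_pi = pi_activity(p, centres, sigma_pi=0.2)
# The most active PI cell is the one whose centre is closest to p.
```

The population activity peaks at the cell nearest the current estimate, so the "activity bump" tracks the agent as it moves.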
View cells (VCs) project downstream onto a population of cells (APC) that codes for location using only visual information. Unsupervised Hebbian learning is applied to those projections to allow the integration of information from several local views into a single APC cell. Specifically, a synapse from a view cell j to an APC cell i changes according to

Δw_ij = η r_i^APC ( r_j^VC − w_ij )    (6)

where η is a small learning rate.

A. Combining Idiothetic and Allothetic Input

Allothetic and idiothetic information streams are combined in two different ways. First, both streams project onto a common target area, the hippocampus. Second, there is a direct interaction between the two streams that avoids the accumulation of errors. The two representations of space discussed above, driven by visual and proprioceptive input respectively, reside in the APC and PI populations. These two populations project onto the hippocampal module, where visual and proprioceptive information is combined.
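A minimal sketch of the view-cell response (Eq. 5) and the Hebbian plasticity of the VC → APC projections (Eq. 6) may help make the rules concrete. The 21 viewing directions follow the text; σ_VC, η and the random local view are illustrative assumptions.

```python
import numpy as np

# Sketch of Eq. (5) (view-cell response) and Eq. (6) (Hebbian rule).

def vc_activity(d_current, d_stored, sigma_vc=0.2):
    """Eq. (5): the response decays with the mean squared difference
    between the current and the stored local view."""
    mismatch = np.mean(np.abs(d_current - d_stored) ** 2)
    return np.exp(-mismatch / (2.0 * sigma_vc ** 2))

def hebbian_update(w, r_apc, r_vc, eta=0.01):
    """Eq. (6): Delta w_ij = eta * r_i^APC * (r_j^VC - w_ij).
    w has shape (n_apc, n_vc)."""
    return w + eta * r_apc[:, None] * (r_vc[None, :] - w)

rng = np.random.default_rng(0)
d = rng.uniform(0.0, 1.0, 21)     # normalised wall distances (one local view)

# A perfectly matching view gives the maximal response of 1.
r_match = vc_activity(d, d)

# Repeated pairing drives each synapse towards the presynaptic rate.
w = np.zeros((4, 21))
r_apc = np.ones(4)
for _ in range(2000):
    w = hebbian_update(w, r_apc, d)
```

The subtractive term −w_ij in Eq. (6) makes the rule self-normalising: weights converge towards the presynaptic activity instead of growing without bound.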
Fig. 3. Activity of the PC population. The cross marks the population vector (p^PC) of the hippocampal place code.

During exploration, new place cells (PC) are recruited and connected to simultaneously active APC and PI cells. These connections are modified by Hebbian learning, using an algorithm analogous to Eq. 6, in order to integrate idiothetic and allothetic information. To interpret the information represented by the ensemble of hippocampal neurons, we employ population vector decoding, i.e. the center of mass

p^PC = Σ_j r_j^PC p_j^PC / Σ_k r_k^PC    (7)

of the activity cloud represents the location of the agent. Here, p_j^PC is the center of the place field of PC cell j. Figure 3 shows the activity of the PC population when the agent is at the same location as in Figure 2.

The additional direct interaction between the allothetic and idiothetic pathways makes it possible to correct a mismatch between them. External input is used to prevent the idiothetic representation from accumulating error over time. Furthermore, it allows the path integrator to be reinitialised when the internal information has been corrupted (e.g. when the agent enters the arena at an unknown point, or if it has been disoriented by the experimenter). Equations (2)-(4) describe the update of the idiothetic position estimate by integrating information about the movement. To incorporate external sensory information, we let the PI population also depend on the location estimate of the allothetic pathway,

r_j^PI = exp( − (p^est − p_j^PI)^2 / (2 σ_PI^2) )    (8)

where

p^est = α p^PI + (1 − α) p^APC    (9)

and p^PI and p^APC (the latter computed as in Eq. 7) are the estimated locations based on idiothetic and allothetic information, respectively. This update rule modifies the idiothetic activity so as to keep it consistent with the estimate based on external sensory input.

III. MODELLING NAVIGATION STRATEGIES

The mammalian basal ganglia have generally been associated with stimulus-reward associations [10], [11], as well as with the selection of motor commands [12]. Several studies [13] suggest that at least two memory systems involving this structure work in parallel. One, involving the dorsolateral striatum, mediates a form of learning in which stimulus-response (SR) associations, or habits, are incrementally acquired. The other system participates in the association between motivational states (or highly processed sensory inputs) and behavioural actions; the hippocampus and the ventral striatum are components of this system. These systems work in parallel in a competitive way, according to the situation in which learning occurs [14], and they seem to mediate different navigation strategies: locale strategies have been observed to be hippocampus-dependent, whereas taxon strategies are impaired by lesions of the dorsal striatum [15], [16].

To allow our model to implement different navigation strategies, the navigation system described in [6] has been extended into a mixture of experts. Each expert corresponds to one memory system (and thus to one navigation strategy). Actions are learnt by applying reinforcement learning through exploration. A gating network selects the most appropriate strategy according to the external input and the internal state of the system. Figure 4 shows a diagram of the navigation model. The action emitted by the system results from a competition among the experts: for an expert k, the probability of being selected is a function of a gating value g_k. The biological locus of this competition could be the interactions between the striatum and the substantia nigra pars reticulata (see [12] for an anatomical review). Both the experts and the gating network modify their parameters by reinforcement learning; the learning algorithm, proposed in [17], uses an actor-critic architecture.

Fig. 4. Architecture of the navigation model. PC: spatial information from hippocampal place cells. CI: cortical sensory input (I_ψ, d_ψ). A^CI: action selected by the stimulus-driven (taxon) strategy. A^PC: action selected by the place-driven (locale) strategy. A: action selected by the gating network.

One of the experts corresponds to the ventral and dorsomedial striatum and is driven by hippocampal activity. The set of hippocampal place cells represents the current state s, and the expert learns a mapping between this hippocampal spatial representation (PC) and a set of actions, implementing a place-based (locale) strategy. The other expert receives external sensory inputs (CI) and encodes simple stimulus-response behaviours (taxon strategy). The sensory input to this module consists of a vector of the 21 distances to the closest walls (d_ψ) and the corresponding 21 grey values (I_ψ); this set of 42 values represents the current sensory state s of the expert. The biological locus for this expert may be the dorsolateral striatum. Both experts encode actions in a population of action cells (AC). An AC cell i in population k represents a specific direction of movement φ_i^k, and its firing rate a_i^k represents, in the sense of reinforcement learning, the action value Q(s, i) for the current state s and action i. Here k stands for PC or CI, depending on the expert. The action values for the place cell-driven (a^PC) and the stimulus-driven (a^CI) strategy are computed as

a_i^PC = Σ_j r_j^PC w_ij^PC,    a_i^CI = Σ_j r_j^CI w_ij^CI    (10)
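Under the assumption that the weight matrices are given, the linear readouts of Eq. (10) can be sketched as follows; the population sizes and the random initialisation are illustrative, not values from the paper.

```python
import numpy as np

# Sketch of Eq. (10): each expert computes action values as a linear
# readout of its input population -- place cells (PC) for the locale
# expert, cortical sensory input (CI) for the taxon expert.

rng = np.random.default_rng(1)
n_pc, n_ci, n_actions = 50, 42, 8   # 42 = 21 distances + 21 grey values

w_pc = rng.uniform(0.0, 0.1, (n_actions, n_pc))  # locale-expert weights
w_ci = rng.uniform(0.0, 0.1, (n_actions, n_ci))  # taxon-expert weights

r_pc = rng.uniform(0.0, 1.0, n_pc)  # place-cell activity (state s)
r_ci = rng.uniform(0.0, 1.0, n_ci)  # cortical input (d_psi, I_psi)

a_pc = w_pc @ r_pc   # action values a_i^PC of the locale expert
a_ci = w_ci @ r_ci   # action values a_i^CI of the taxon expert
```

Each entry of a_pc and a_ci plays the role of an action value Q(s, i) for one movement direction.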
For each expert k, the direction of the population vector of its ensemble of action cells represents the continuous direction Φ_k that maximises the reward, and the length of the population vector, A_k = | Σ_i a_i^k φ_i^k |, represents the quality of the action. The two experts compete for action selection. The probability that the action Φ_k proposed by expert k (k ∈ {PC, CI}) is selected is a function of its gating value g_k,

g_k = Σ_j (z_kj^PC r_j^PC) + Σ_j (z_kj^CI r_j^CI)    (11)

The probability depends on both hippocampal (PC) and cortical (CI) activity,

P(direction = Φ_k) = g_k A_k / Σ_i (g_i A_i)    (12)
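The competition of Eqs. (11)-(12), together with the population-vector readout of each expert's action cells, can be sketched as follows. Population sizes, gating weights and firing rates are illustrative assumptions.

```python
import numpy as np

# Sketch of the action-selection competition, Eqs. (11)-(12).

def population_vector(a, phis):
    """Direction Phi_k and length A_k of an ensemble of action cells
    with firing rates a and preferred directions phis (radians)."""
    vx = np.sum(a * np.cos(phis))
    vy = np.sum(a * np.sin(phis))
    return np.arctan2(vy, vx), np.hypot(vx, vy)

def gating_values(z_pc, z_ci, r_pc, r_ci):
    """Eq. (11): g_k = sum_j z_kj^PC r_j^PC + sum_j z_kj^CI r_j^CI."""
    return z_pc @ r_pc + z_ci @ r_ci   # one gating value per expert

def selection_probs(g, A):
    """Eq. (12): P(direction = Phi_k) = g_k A_k / sum_i g_i A_i."""
    return g * A / np.sum(g * A)

# Example: 8 action cells with evenly spaced preferred directions.
phis = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
a_pc = np.ones(8)
a_pc[0] = 5.0   # the locale expert strongly prefers direction 0

phi_pc, A_pc = population_vector(a_pc, phis)
probs = selection_probs(np.array([0.6, 0.4]), np.array([A_pc, 0.5 * A_pc]))
# probs sums to one; the expert with the larger g_k * A_k is favoured.
```

Note that A_k acts as a confidence term: an expert whose action cells agree on a direction produces a long population vector and thus a higher chance of winning the competition.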
After an action has been selected, the reward prediction error δ is computed,

δ_k(t) = [R_t − A_k(t − 1)] + γ A_k(t)    (13)

where R_t is the reward received at time t. The weights (w^PC and w^CI) in Eq. 10 are updated using a Q-learning algorithm [10],

Δw_ij^k = η^k δ_k h_k a_i^k    (14)
Here η^k is the learning rate of expert k, δ_k is the reward prediction error from Eq. 13, and a_i^k is the action value from Eq. 10. The scale factor h_k is a function of the gating value and of the prediction error for that expert,

h_k = g_k c_k / Σ_i (g_i c_i)    (15)

where c_k = exp(−ρ δ_k^2), with ρ > 0. This ensures that the weight update is most significant for those experts that have consistently small reward prediction errors (c_k ≈ 1) and a high probability of being selected (g_k > g_i). The gating network is updated according to

Δz_kj = ξ (h_k − g_k) r_j^{PC,CI}    (16)
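The full learning step of Eqs. (13)-(16) can be sketched as follows; γ, ρ and the learning rates are illustrative values, and the example inputs are invented.

```python
import numpy as np

# Sketch of one learning step, Eqs. (13)-(16): a TD-style reward
# prediction error per expert, the confidence-weighted scale factor
# h_k, and the updates of the expert weights and the gating weights.

def prediction_errors(R, A_prev, A_curr, gamma=0.9):
    """Eq. (13): delta_k = [R - A_k(t-1)] + gamma * A_k(t)."""
    return (R - A_prev) + gamma * A_curr

def scale_factors(g, delta, rho=1.0):
    """Eq. (15), with c_k = exp(-rho * delta_k^2): h_k favours experts
    with consistently small errors and high gating values."""
    c = np.exp(-rho * delta ** 2)
    return g * c / np.sum(g * c)

def update_expert(w, delta_k, h_k, a, eta=0.05):
    """Eq. (14): Delta w_ij = eta * delta_k * h_k * a_i. As printed,
    the update depends only on the postsynaptic action value a_i."""
    return w + eta * delta_k * h_k * a[:, None]

def update_gating(z, h, g, r, xi=0.05):
    """Eq. (16): Delta z_kj = xi * (h_k - g_k) * r_j, per expert k."""
    return z + xi * (h[:, None] - g[:, None]) * r[None, :]

delta = prediction_errors(R=1.0,
                          A_prev=np.array([0.8, 0.2]),
                          A_curr=np.array([0.9, 0.1]))
h = scale_factors(np.array([0.6, 0.4]), delta)
# h is a proper weighting: non-negative and summing to one.
```

Because h_k multiplies both updates, an expert whose predictions have been poor (large δ_k, small c_k) is largely shielded from weight changes, while the gating network shifts credit towards the more reliable expert.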
with the same scale factor h_k and a learning rate ξ. Notice that the experts modify their weights such that, even if an expert is not selected, it can improve its performance on the current task. The system described so far is able to select among different strategies according to its perceptual input. At each timestep, every expert proposes an action on the basis of its afferents and its gating value. However, a strategy should be applied during several timesteps in order to exhibit a coherent behaviour and to better assess its suitability for the current task. Therefore, instead of imposing a competition at every timestep, a chosen strategy continues until its accumulated prediction error (since the moment it was chosen) reaches a threshold; then a new competition among the experts is performed (allowing, but not forcing, the selection of a different strategy).

IV. EXPERIMENTS

A. Mismatch Correction between Allothetic and Idiothetic Cues

In order to elucidate how external landmarks and internal information control the activity of hippocampal place cells, Gothard et al. [4] trained rats to alternate between a fixed reward site at one end and a movable reward box at the other end of an elevated linear track. The movable box remained in a constant position during the inbound journey (i.e. when the rat goes from the fixed site to the reward box). While the rat returns to the fixed site (outbound journey) and presumably cannot see the movable box behind it, the box is moved to a different location (Fig. 5). Their results show that in the initial part of the journey the place cells fire at fixed distances relative to the point of departure (driven mostly by path integration); later on, the place code changes its activity according to the external stimuli. When applied to the same behavioural task, our model reproduces the results obtained with rats. Figure 6 shows the place fields of four simulated hippocampal cells.
Notice that at the beginning of each journey the activity of the cells is only slightly altered with respect to the starting point; later on, the mismatch between the current path-integration estimate and the external (visual) cues changes the activity of the hippocampal ensemble. The dashed lines in Figure 6 show how the encoded position of a particular cell changes during this task. The shift of the firing profile of the PC population with respect to the change of the box position (displacement slope) [4] is shown in Figure 7. The slope takes the value 1.0 if the firing profile (rate as a function of the position on the track) shifts by the same amount as the box; if the firing profile does not change, the slope is 0.0. The horizontal axis corresponds to the position on the track where each cell was most active in the longest track configuration (measured from the fixed site). It shows that cells which fire close to the fixed reward site have a small slope, and that this value increases as the rat moves toward the box (where the mismatch between idiothetic and allothetic cues occurs).

Fig. 5. Behavioural apparatus. The rat moves back and forth between a fixed reward site and a movable box (positions Box 1 to Box 5), which is displaced during outbound journeys.

Fig. 6. Firing profiles (rate as a function of the position on the track) of four hippocampal cells, for different positions of the movable box. Two of the cells are active during the inbound journey (left) and the other two during the outbound journey (right). The positions of the movable box are arranged from the longest (top) to the shortest track (bottom).

Fig. 7. Displacement slope for the inbound (left) and outbound (right) journeys. The figure shows how much the firing profile shifts with respect to the preferred firing location along the track.

B. Goal-Based Navigation

The Morris water maze is one of the most popular experimental paradigms in hippocampus-dependent navigation: an animal has to find a hidden escape platform while swimming in milky water. In our agent experiment, a positive reward is given at a specific location of the environment, but there is no specific cue signalling the location of the reward, so a locale strategy is required. Figure 8 presents the navigation map obtained after training in the environment shown in Figure 2.

Fig. 8. Navigation map leading to a hidden goal. The goal position corresponds to the dark circle.

In another experimental paradigm, Devan and White [15] trained rats concurrently in a hidden (locale navigation task) and a visible (taxon navigation task) version of the Morris water maze. After training, a competition test was performed in which the visible platform was placed at a different location than during training. Hippocampal lesions prevented spatial learning and produced a preference for the cue response (cue responders). Dorsolateral striatum lesions biased the behaviour toward a spatial strategy (place responders). Intact animals were able to solve the task, but nearly half of them swam first to the former platform location and only then towards the visible cue. This suggests that those animals relied first on a hippocampal (place-based) strategy and then switched to a strategy using only the visible cue.

We tested the model in the paradigm described above. During training in an environment surrounded by white walls, a dark visible cue was placed at the location of the reward, at a height inside the visual field of the agent. After training, the agent was tested in a competition trial with the visible cue in a different position. Figure 9 shows the navigational map after training in the hidden/visible water maze, with the visible cue still located at the reward position. Note that both strategies solve the task, and hence the probabilities of being selected (i.e. the gating values) are close to equal across the environment. The competition trial is shown in Figure 10, where the locale strategy still leads the agent towards the location of the goal used during training, whereas the taxon strategy points towards the visible cue. Figure 11 shows two different trajectories followed by the agent in the competition situation. In one case, the agent selects a locale strategy at the beginning of the trial (place response) and, when it fails to find the platform where it used to be, switches to a taxon strategy. In the other case, the cue-based strategy is selected from the beginning and the agent goes straight to the visible platform, reproducing the behaviour reported for unlesioned animals.

Fig. 9. (Left) Navigation maps of both experts. Given that the reward is located at the same position as the cue, both maps lead to the same position. (Right) Gating values for each position in the environment; dark grey corresponds to the locale strategy.

Fig. 10. Competition test. (Left) Navigation map encoding the locale strategy; this expert still encodes trajectories towards the place where the goal was located during training. (Right) The taxon strategy leads the agent to the current location of the cue.

Fig. 11. Example trajectories observed during the competition test. (Left) The agent follows a locale strategy before going towards the visible goal (place responder). (Right) The agent uses a taxon strategy from the beginning of the trial (cue responder).

V. CONCLUSIONS

Our model qualitatively captures the results reported in several experimental paradigms and provides a simple framework to evaluate the interactions between allothetic and idiothetic information in the formation of a spatial representation. One possible extension is the modelling of the dynamics of those interactions. Studies suggest that whenever a large mismatch occurs there is a delay before the visual information exerts its influence over the place and heading representations. This leads to the hypothesis that those representations are mainly driven by idiothetic stimuli, with the visual pathway as a secondary influence. How and when this influence is exerted constitutes an interesting modelling problem, and is a key aspect in the building and maintenance of robust representations such as the hippocampal place code.

In addition, by implementing two parallel competing navigation systems, coding for locale (place-response associations) and taxon (stimulus-response associations) strategies, the system is able to develop coexisting strategies and to select the most appropriate one according to the conditions of the task. In conflicting situations, such as the competition trial in [15], the behaviour exhibited by the model is comparable to the results reported in neurobehavioral studies. This model can be a good starting point for further developments dealing with more complex situations or more elaborate sensory inputs. Likewise, several experts could be implemented in the system to develop different strategies (several SR associations and/or different contexts signalled by the hippocampal activity).

ACKNOWLEDGEMENT

This work was supported by the Swiss National Science Foundation under grant no. 200020-100265/1.

REFERENCES
[1] J. O'Keefe and L. Nadel, The Hippocampus as a Cognitive Map. Oxford: Clarendon Press, 1978.
[2] A. Redish, Beyond the Cognitive Map: From Place Cells to Episodic Memory. London: MIT Press-Bradford Books, 1999.
[3] J. Knierim, H. Kudrimoti, and B. McNaughton, "Place cells, head direction cells, and the learning of landmark stability," Journal of Neuroscience, vol. 15, pp. 1648-1659, 1995.
[4] K. Gothard, W. Skaggs, and B. McNaughton, "Dynamics of mismatch correction in the hippocampal ensemble code for space: Interaction between path integration and environmental cues," Journal of Neuroscience, vol. 16, no. 24, pp. 8027-8040, 1996.
[5] A. Arleo, F. Smeraldi, S. Hug, and W. Gerstner, "Place cells and spatial navigation based on 2d visual feature extraction, path integration, and reinforcement learning," in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. MIT Press, 2001, pp. 89-95.
[6] T. Strösslin and W. Gerstner, "Reinforcement learning in continuous state and action space," in Artificial Neural Networks - ICANN 2003, 2003.
[7] A. Arleo and W. Gerstner, "Spatial orientation in navigating agents: Modeling head-direction cells," Neurocomputing, vol. 38-40, pp. 1059-1065, 2001.
[8] R. Muller and J. Kubie, "The effects of changes in the environment on the spatial firing of hippocampal complex-spike cells," Journal of Neuroscience, vol. 7, pp. 1951-1968, 1987.
[9] N. Burgess, A. Jackson, T. Hartley, and J. O'Keefe, "Predictions derived from modelling the hippocampal role in navigation," Biological Cybernetics, vol. 83, pp. 301-312, 2000.
[10] R. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
[11] W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593-1599, 1997.
[12] J. W. Mink, "The basal ganglia: focused selection and inhibition of competing motor programs," Prog Neurobiol, vol. 50, no. 4, pp. 381-425, Nov. 1996.
[13] M. G. Packard and B. J. Knowlton, "Learning and memory functions of the basal ganglia," Annu Rev Neurosci, vol. 25, pp. 563-593, 2002.
[14] N. M. White and R. J. McDonald, "Multiple parallel memory systems in the brain of the rat," Neurobiol Learn Mem, vol. 77, no. 2, pp. 125-184, Mar. 2002.
[15] B. D. Devan and N. M. White, "Parallel information processing in the dorsal striatum: relation to hippocampal function," J Neurosci, vol. 19, no. 7, pp. 2789-2798, Apr. 1999.
[16] M. G. Packard and J. L. McGaugh, "Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects expression of place and response learning," Neurobiol Learn Mem, vol. 65, no. 1, pp. 65-72, Jan. 1996.
[17] G. Baldassarre, "A modular neural-network model of the basal ganglia's role in learning and selecting motor behaviours," Cognitive Systems Research, vol. 3, no. 1, pp. 5-13, Mar. 2002.