In From Animals to Animats 2: The Proceedings of the Second International Conference on Simulation of Adaptive Behaviour. MIT Press: Cambridge, MA. pp. 233-242, 1993.
Building Long-range Cognitive Maps using Local Landmarks Tony J. Prescott and John E. W. Mayhew Artificial Intelligence Vision Research Unit, Sheffield University, Sheffield S10 2TN, United Kingdom. Telephone +44 742 768555 ext. 6546/6554 Fax +44 742 766515 Email:
[email protected] [email protected]
Abstract

Cognitive maps can be built using only information about the relative positions of locally visible landmarks. We describe a system that can compute paths between any two locations, irrespective of whether they share common visible landmarks, without using 'compass' senses or dead reckoning abilities. This is achieved by encoding the position of each landmark in the barycentric co-ordinate frames defined by groups of neighbouring cues. Paths between distant points are computed by calculating, using these frames, the positions relative to the agent of landmarks further and further away from the immediate scene. Once the relative positions of landmarks local to the goal are known, a vector giving its position in the agent's egocentric frame can be found. This implicit map allows the agent to compute the overall distance and direction to distant targets; find and follow paths to goal locations; generate explicitly the layout of the whole environment relative to its own position; and discriminate between perceptually similar landmarks. The system is robust to noise, its calculations require only linear mathematics, and its memory requirements are proportional to the total number of landmarks.
1 Introduction
There is substantial evidence (see O’Keefe and Nadel (1978); Gallistel (1990); O’Keefe (1990) for reviews) that
birds and mammals are able to find novel or straight-line routes to distant positions outside the current visible scene. The spatial knowledge underlying this navigation competence appears to be declarative rather than procedural in nature in that it describes the relations between places rather than encoding specific instructions of how to travel from one place to another. It also represents metric (rather than merely topological) information since both the distance and direction to important locations can be found. These observations can be summarised by saying that animals construct cognitive maps of their environments that code the locations of salient places in allocentric (i.e. non-egocentric) co-ordinate frames. There is some controversy, however, over how perceptual data is used to construct the cognitive map. Several researchers (Gallistel 1990; McNaughton, Chen et al. 1991) have argued that the principal sources of position information are dead-reckoning skills (integrating changes in position from sensory signals) and 'compass' senses (determining orientation by using non-local cues such as the sun, or by sensing physical gradients such as geomagnetism). These skills are used to maintain an estimate of the animal's current position and heading relative to the origin (e.g. the nest) of a global allocentric frame. Places are coded in terms of the distance and direction from this origin or from single landmarks whose positions in the global frame are already stored. Following McNaughton et al. (1991) such a mapping scheme is here called a vector coding system. An alternative view, proposed for instance by O’Keefe (1990), is that a place is
encoded in terms of the array of local landmarks visible from that position. In other words, the cognitive map stores the locations of potential goals in the local allocentric co-ordinate frames determined by groups of (two or more) salient cues. Such a system is called a relational coding. This paper is not concerned with the empirical question of which coding method is used in animal cognitive maps. Indeed, there seems no particular reason to believe that any one method will be relied upon to the exclusion of others. Rather, since robust navigation skills are critical to survival, multiple coding systems may be employed. In other words, for any specific navigation task, both vector and relational solutions may be computed and behaviour chosen to be as consistent as possible with all available constraints. The goal of this paper is instead to consider some of the computational issues involved in using relational codes. McNaughton et al. base their preference for vector encoding over relational methods partly on the grounds that the former is computationally simpler and substantially more economical in memory use. We describe here a model of a relational map that encodes spatial knowledge and calculates paths over a wide environment yet requires memory and processing time proportional only to the number of landmarks. The system performs parallel computations using linear mathematics that can be implemented by a simple associative neural network. It is therefore not significantly more complex or demanding of memory than a vector coding. This encourages the view that similar mechanisms could be employed in animal navigation.
2 Using relational maps
The task of navigating a large-scale environment using relational methods divides into three problems: identification and re-identification of salient landmarks; encoding, and later remembering, goal locations in terms of sets of visible local cues; and finally, calculating routes between positions that share no common view. The first task, landmark identification, has been considered elsewhere both from the point of view of animal and robot navigation systems (Zipser (1983a,b); Levitt and Lawton (1990); O’Keefe (1990)). In this paper landmarks are taken to be prominent, visually distinct objects that have point-based locations in egocentric space. We assume the agent is able to select suitable landmarks from the visual scene and determine their positions relative to itself. Our primary concerns are therefore with the problems of encoding and
remembering goals in local landmark frames, and extending these methods to long-range navigation tasks. The problems arising from inaccurate perceptual data and visually ambiguous landmarks are also briefly discussed.
3 Local allocentric frames
O'Keefe (1990) has proposed an interesting model of a local relational coding inspired by empirical studies of ‘place’ cells in the hippocampus of the rat. He suggests that the rat brain computes the origin and orientation of a polar coordinate frame from the vectors giving the egocentric locations of the set of salient visible cues. Specifically, he proposes that these location vectors are averaged to compute the origin (or centroid) of the polar frame, and that the gradients of vectors between each pair of cues are averaged to compute its orientation (or slope). Goal positions can then be recorded in the co-ordinate system of this allocentric frame in a form that will be invariant regardless of the position and orientation of the animal. However, there are problems with this hypothesis. Firstly, the computation of the slope is such that the resulting angle will differ if the cues are taken in different orders¹. Since any ordering is essentially arbitrary, a specific sequence will have to be remembered in order to generate the same allocentric frame from all positions within sight of the landmark set. Secondly, as landmarks move out of sight, are occluded by each other, or new ones come into view, the values of the slope and centroid will change. Rather than changing the global frame each time a landmark appears or disappears it seems to us more judicious to maintain multiple local frames based on subsets of the available cues. These would supply several mutually-consistent encodings making the mapping system robust to changes in individual landmarks. The use of multiple local frames has been proposed by Levitt and Lawton (1990). They observe that the minimum number of landmarks required to generate a co-ordinate frame is two (in two dimensions, three in 3D). They also provide a useful analysis of how the constraints generated by multiple local frames can be combined, even in the presence of very poor distance information, to provide robust location estimates.
To calculate, from a novel position, a goal location that has been encoded in a two-landmark frame requires non-linear computations (trigonometric functions and square roots). It also requires that an arbitrary ordering of the two landmarks is remembered in order to specify a unique co-ordinate system. Zipser (1986), who also considered a landmark-pair method (Zipser 1983b), points out that if one more landmark is used to compute the local frame then all the calculations required become linear. In fact, all that is required to encode a goal location using three landmarks (in 2D, four in 3D) is that one constant is associated with each cue. Zipser called these constants “β-coefficients”; they are, however, identical to the barycentric² co-ordinates that have been known to mathematicians since Moebius (see for instance Farin, 1988). Since the system for long-range navigation described below uses this three-landmark method the following section considers it in more detail. The remainder of the paper treats the navigation problem as two-dimensional; however, the extension of these methods to three dimensions is straightforward.

¹This arises because the gradient of a line is a scalar function with a singularity at infinity when the line is parallel to the y axis. Hence gradients must be averaged in terms of angles or vectors, in which case the order in which the points are compared is important.
4 Barycentric co-ordinates

Figure One shows the relative locations of a group of three landmarks (hereafter termed an L-trie) labelled A, B, and C, seen from two different viewing positions V and V′. A goal site G is assumed to be visible only from the first viewpoint.

Figure One: relative positions of three landmarks and a goal from two viewpoints V and V′.

The column vectors x_i = (x_i, y_i, 1)^T and x′_i = (x′_i, y′_i, 1)^T give the locations in homogeneous co-ordinates of object i in the egocentric frames centred at V and V′ respectively. The two frames can therefore be described by the matrices

X = [x_A x_B x_C], X′ = [x′_A x′_B x′_C].

If the three landmarks are distinct and not collinear then there exists a unique vector β = (β_A, β_B, β_C)^T such that

Xβ = x_G and X′β = x′_G.

In other words, by remembering the invariant β, the egocentric goal position from any new viewing position V′ can be determined by the linear sums

x′_G = β_A x′_A + β_B x′_B + β_C x′_C,
y′_G = β_A y′_A + β_B y′_B + β_C y′_C,
1 = β_A + β_B + β_C.

Note that since each constant is tied to a specific cue the ordering of the landmarks is irrelevant. The β vector can be determined directly by computing the inverse matrix X⁻¹, since

β = X⁻¹Xβ = X⁻¹x_G.

²This term, originally used in physics, is derived from “barycentre”, meaning “centre of gravity”.

Though this inverse calculation uses only linear maths, the value of the β-encoding as a possible biological model has been questioned on the grounds of its apparent mathematical complexity (Worden 1992). However, Zipser (1986) points out that an even simpler computational mechanism is possible by allowing the β values to converge to a solution under feedback. This can be viewed as adapting the connection strengths of a linear perceptron-like (Rosenblatt 1961) learning unit. A network architecture that instantiates this mechanism is as follows. The network consists of two types of simple processing unit. The first are object-position units (object-units hereafter) whose activation represents the locations in egocentric space of specific goal-sites and salient landmarks. These units receive their primary input from high-level perceptual processing systems that identify goals and landmarks and determine their positions relative to the self. The second type of processor is termed a β-coding unit. This receives input from three object-units and adapts its connection strengths (the β values) to match its output vector to the activation of a fourth. An example of this architecture is illustrated in Figure Two, which shows a β-coding unit G/ABC that receives the positions of the landmarks A, B, and C as its inputs. The unit adapts its weights (β_A, β_B, β_C) till the output (x, y, z) matches the goal vector (x_G, y_G, 1). The unit is assumed to be triggered whenever all three input nodes are active. Gradient-descent learning is used to adapt the connection strengths. For the weight β_i from the ith object unit this gives the update rule at each iteration

Δβ_i = η[(x_G − x)x_i + (y_G − y)y_i + (1 − z)]

where the parameter η is the learning rate. The network rapidly converges to an accurate estimate of the β values.

Figure Two: The β-coding unit. A perceptron model of the β-coefficient calculation.

Figure Three: Coding a goal position (G) relative to the three landmarks A, B, and C using barycentric co-ordinates.
In order to provide a further understanding of the β-encoding a geometrical interpretation can be given. The coefficient associated with each landmark is the ratio of the perpendiculars from the goal and that landmark to the line between the other two cues. For example, consider landmark A in Figure Three. The coefficient β_A defines an axis that is perpendicular to the line BC and scaled according to the ratio of the two perpendiculars h_G/h_A (this can also be thought of as the ratio of the areas of the triangles GBC and ABC). Taken together the three β coefficients define a barycentric co-ordinate frame.
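The feedback alternative can be sketched just as briefly. This is our illustration of the gradient-descent update rule given above, not the authors' code; the landmark layout and learning rate are invented:

```python
# A beta-coding unit learning its weights by gradient descent, using
# the rule: delta beta_i = eta * [(xG - x)x_i + (yG - y)y_i + (1 - z)].
landmarks = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]  # A, B, C as perceived
goal = (2.0, 1.0)                                 # G as perceived
beta = [0.0, 0.0, 0.0]                            # connection strengths
eta = 0.02                                        # learning rate (ad hoc)

for _ in range(2000):
    # Output of the unit: beta-weighted sum of the inputs (x_i, y_i, 1).
    x = sum(b * p[0] for b, p in zip(beta, landmarks))
    y = sum(b * p[1] for b, p in zip(beta, landmarks))
    z = sum(beta)
    # Adjust each weight in proportion to the output error.
    for i, (xi, yi) in enumerate(landmarks):
        beta[i] += eta * ((goal[0] - x) * xi + (goal[1] - y) * yi + (1.0 - z))
```

For this layout the weights converge to (1/4, 5/12, 1/3), the same values that direct inversion would give.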
5 Long-range navigation
We now describe how this coding method can be extended to navigate between points over a wide environment that share no common landmarks. The essence of the method is to build a two-layer relational network of object and β-coding units that stores the positions of landmarks in the local frames defined by neighbouring L-trie groups. The resulting structure will therefore record the relationships between multiple local frames. Thereafter, the locations of distant landmarks (and goal sites) can be found by propagating local view information through this network. Zipser (1983b) and Levitt and Lawton (1990) have both discussed methods of this type for long-range navigation using landmark-pair co-ordinate frames. The advantage of using the three-landmark method, however, is that following a sequence of transformations through the network is significantly simpler. Since all calculations are linear and independent of landmark order, the process can be carried out by spreading activation through the relational network. In contrast, a landmark-pair method would require networks of local processing units of considerably greater complexity in order to perform the necessary non-linear transformations.
6 Encoding the cognitive map
The relational network that encodes the cognitive map is constructed whilst exploring the environment. The specific method investigated here is as follows. Each time the agent moves, the egocentric positions of all visible landmarks are computed. If there are any new landmarks in this set then new object-units are recruited to the lower layer of the network to represent landmark locations. Then, for each L-trie combination in the set a test is made to see if a β-coding unit (with this local frame) already exists for each of the
remaining visible cues. If not, new β-units are recruited to the network’s upper layer and linked appropriately with the object-nodes. The β-coefficients are then calculated either directly (using matrix inversion) or gradually (via the perceptron learning mechanism) as the agent moves within sight of the relevant cues. Figure Four shows an example of this process for a simple environment of five landmarks. From the current view position four landmarks A, B, C, and D are visible (as a simplification we assume 360° vision), for which the agent generates β-coding units A/BCD, B/ACD, C/ABD, and D/ABC. Following adequate exploration the network illustrated in Figure Five will have been generated.
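The recruitment step can be sketched as follows. This is our illustration only: the data structures and names are invented, and the β coefficients themselves are left unlearned here (marked None until the agent has seen the cues from enough positions):

```python
# For every three-landmark group (L-trie) in the current view, recruit
# a beta-coding unit for each remaining visible cue.
from itertools import combinations

def update_map(beta_units, visible):
    """beta_units: dict (target, frozenset(trie)) -> beta coefficients
    (None until learned); visible: landmark identities now in view."""
    visible = set(visible)
    for trie in combinations(sorted(visible), 3):
        for target in visible - set(trie):
            key = (target, frozenset(trie))
            if key not in beta_units:
                beta_units[key] = None  # recruit; coefficients learned later
    return beta_units

units = update_map({}, {"A", "B", "C", "D"})
# With four cues in view this recruits A/BCD, B/ACD, C/ABD and D/ABC.
```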
Given this network the agent can determine the location of any target landmark when it is within sight of any group of three others. For instance if cues A, B, and C are visible and E is required, then the active object units will trigger D/ABC (activating unit D) and hence E/BCD to give the egocentric location of the target. The method clearly generalises to allow the position of any goal site that is encoded within an L-trie frame to be found.
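A minimal sketch of this propagation (ours, with the layout and the pre-computed β values invented for illustration): the object units for A, B, and C hold their perceived positions, and a β-unit fires whenever all three of its input cues are known:

```python
# 'Domino' propagation: fire beta-units until the target is located.
def propagate(known, beta_units, target):
    """known: dict cue -> (x, y) egocentric position;
    beta_units: dict (output_cue, input_trie) -> beta coefficients."""
    changed = True
    while changed and target not in known:
        changed = False
        for (out, trie), beta in beta_units.items():
            if out not in known and all(c in known for c in trie):
                # Fire the unit: beta-weighted sum of the cue positions.
                known[out] = tuple(
                    sum(b * known[c][k] for b, c in zip(beta, trie))
                    for k in (0, 1))
                changed = True
    return known.get(target)

# Perceived cues and two pre-computed beta-units (D/ABC and E/BCD).
known = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (1.0, 3.0)}
units = {("D", ("A", "B", "C")): (-0.75, 5 / 12, 4 / 3),
         ("E", ("B", "C", "D")): (1 / 9, -13 / 9, 7 / 3)}
pos_E = propagate(known, units, "E")  # D is computed first, then E
```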
7 The topology of the relational net
The connectivity of the relational network defines a graph of the topological arrangement of local landmark frames. For instance, the network shown above instantiates the L-trie graph shown in Figure Six.
Figure Four: An environment with five landmarks; the agent is oriented towards the right with field of view as indicated by the circle.

Figure Five: A relational network for the five-landmark environment. The network consists of an input/output layer of object-units and a hidden layer of β-coding units that encode landmark transformations in specific local frames. The thin lines indicate input of known landmark positions to the hidden layer; thick lines show the output of newly calculated cue locations.

Figure Six: the L-trie adjacency graph for the five-landmark environment.
The links between nodes in this graph correspond to the β-coding units. Although the graph shown here has entirely bilateral connections, there is nothing intrinsically symmetrical about the coding method. For instance, it would be quite possible to encode the relationship D/ABC and not the reverse A/BCD. This could plausibly happen if the agent, whilst moving through its environment, encodes the positions of landmarks in front with respect to those it is already passing but not vice versa. This property of the mapping mechanism accords with observations of non-reflexive navigation behaviour and spatial knowledge in humans (for a review of this evidence see Kuipers, 1982). Figures Seven and Eight show respectively an environment of twenty-four landmarks and the adjacency graph generated after exploration by random walk. In learning this environment the system was restricted to encoding the relative locations of only the four closest
landmarks at any time. This reduces the connectivity of the graph and the memory requirements of the network substantially. However, even without this restriction, the memory requirements of the system are O(N) (i.e. proportional to the number of landmarks) rather than O(N²), since only local landmark relations are stored.
Figure Seven: Landmark locations for a sample environment. The circle indicates the range of the simulated perceptual system. The box indicates a target landmark (see below).

Figure Eight: An L-trie graph. Each vertex represents an L-trie node and is placed at the position corresponding to the centre of the triangle formed by its three landmarks (in the previous figure). The boxes enclose the L-trie nodes local to the agent’s start position and the target landmark ‘X’ (the goal).
8 Target finding: estimating the distance and direction to desired goals

From Figure Eight it is evident that there are multiple possible paths through the network that will connect any two landmark groups. Hence the system represents a highly redundant coding of landmark locations. As described in the previous section, the (external) activation of the object units for any group of three landmarks triggers the firing of all associated β-coding units, which in turn activate further object units. This ‘domino’ effect will eventually propagate egocentric position estimates of all landmarks throughout the graph. Indeed, due to the redundancy of the paths through the graph, many estimates will be computed for each landmark, each arriving at the object node after differing periods of delay. For any specific goal or landmark the delay (between the initial activation and the arrival of the first estimate) will be proportional to the length of the shortest sequence of frames linking the two positions. Assuming noise in the perceptual mechanisms that determine the relative positions of visible cues (and hence noise in the stored β-coefficients), the position estimates arriving via different routes through the graph will vary. The question then arises as to how the relative accuracy of these different estimates can be judged. The simplest heuristic to adopt is to treat the first estimate that arrives at each node as the best approximation to that landmark’s true position. This is motivated by the observation that each transition in a sequence of frames can only add noise to a position estimate; hence better estimates will (on average) be provided by sequences with a minimal number of frames (whose outputs will arrive first). We call this accordingly the minimal sequence heuristic. However, there is a second important factor that affects the accuracy of propagated location estimates, which is the spatial arrangement of the cues in each L-trie group.
The worst case is if all the landmarks in a given group lie along a line; in this situation the β-coefficients for an encoded point will be undefined. In general, landmark groups that are near collinear will also give large errors when computing cue positions in the presence of noise. One possibility, not explored here, is for the system to calculate estimates of the size of these errors and propagate these together with the computed landmark positions. Each computed location would then arrive with a label indicating how accurate an estimate it is judged to be. In the following examples landmark positions were calculated simply by rejecting information from L-trie
frames that were near-collinear (i.e. within some margin of error) and otherwise using the minimal-sequence heuristic, that is, adopting the first location estimate to arrive at each node. The issue of combining multiple estimates to obtain greater accuracy is considered further below. Finding the direction and distance to a specific goal by propagating landmark positions is here called target finding. This competence is sufficient to support behaviours such as orienting toward distant locations, and moving in the direction of a goal with the hope of finding a direct route. However, this mechanism will not always be appropriate as a method of navigation, for two reasons. First, the direct line to a goal location is clearly not always a viable path. Secondly, the target finding system is susceptible to cumulative noise. We have simulated the effect of 0, 5, 10 and 20% gaussian relative noise³ in the measurement of all egocentric position vectors that occurs during map learning and navigation. Figure Nine shows an example of the effect of this noise on estimates for the position of landmark X (in the environment shown in Figure Seven) relative to the agent’s starting location. The figure demonstrates that with the less accurate perceptual data only a rough approximation to a desired goal vector can be obtained. Of course it would be possible for the agent to move in the direction indicated by target finding and then hope to use landmarks that are recognised en route to gradually correct the initial error and so home in on the goal. However, this will often be a risky strategy as a direct line will not necessarily cross known territory. The following section describes a method which exploits a heuristic of this sort but in a form that is more likely to generate a successful path to the goal.
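The relative-noise model used in these simulations (each component of a perceived vector scaled by 1 + n, with n drawn from a zero-mean gaussian) can be written directly; the function name below is our invention:

```python
# Relative gaussian noise on a perceived egocentric position vector.
import random

def perceive(p, noise_sd, rng):
    """Perceive (x, y) as (x + nx*x, y + ny*y), nx, ny ~ N(0, noise_sd)."""
    return tuple(v * (1.0 + rng.gauss(0.0, noise_sd)) for v in p)

rng = random.Random(42)
exact = (10.0, -4.0)
noisy = perceive(exact, 0.05, rng)  # 5% relative noise
clean = perceive(exact, 0.0, rng)   # zero noise leaves the vector exact
```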
9 Path following: a robust method for moving to a goal location

An alternative strategy is to move, not in the direction of the goal itself, but towards landmarks that are known to lie along a possible path to the goal. This method, here called path following, involves determining a sequence of adjacent local frames that link start and goal positions prior to setting out, then navigating by moving from frame to frame through this topological sequence. Because perceptual information about known landmarks will (very likely) become available as each frame is crossed, the agent will be able to replace estimates of cue positions with ‘hard’ data, thus avoiding the build-up of noise encountered by the target finding system. There is, however, some computational overhead to be incurred through the need to calculate a suitable sequence of adjoining frames. Since, again, there are multiple sequences to choose from, some heuristics are required to determine which should be preferred. The minimal sequence heuristic is again appropriate, though on the slightly different grounds that shorter sequences should (on average) give more direct routes. Other heuristics are possible; for instance, estimates of the actual distances covered by alternate routes could be calculated, allowing a more informed judgement as to which is the shortest path. To find the minimal sequence we simply reverse the process of propagating information through the relational network. In other words, we perform a spreading-activation search from the goal back toward the start position. This is easiest to imagine in the context of the L-trie graph (Figure Eight); however, it could be implemented directly in the relational network by backward connections between units.

³If n_x and n_y are samples from the gaussian noise distribution, the vector (x, y) is perceived as (x + n_x x, y + n_y y).

Figure Nine: Target finding in the presence of 0, 5, 10 and 20% relative noise in perceived landmark positions.
Figure Ten: Spread of activation through the L-trie graph (Figure Eight) after 0, 4, and 9 time steps (λ = 0.95). The points in the figures represent the vertices in the L-trie graph. The size of each point shows the level of activation of that L-trie node. The boxes indicate that a minimal sequence ABC BCE CEL EGL GLM LMN MNP MNW NVW VWX was found.
Our simulations model this parallel search process through a series of discrete time-steps. This occurs as follows. The L-trie node closest to the goal is activated and clamped on (i.e. its activity is fixed throughout the search) while all other nodes are initialised with zero activity. The signal at the goal is then allowed to diffuse through the adjacency graph, decaying by a constant factor λ for each link that is traversed⁴. Once the activation reaches the L-trie node local to the agent the minimal sequence can be found. Beginning with the start node, this sequence is traced through the network simply by connecting each node to its most active neighbour. This spread of activation is illustrated in Figure Ten. The three frames show the activity after 0, 4, and 9 time-steps, after which time the activity has filtered through to the start node. The path found is indicated by the boxes enclosing the winning nodes. Having found a minimal sequence the path following method proceeds as follows. The agent moves toward the average position of the landmark vectors for the first L-trie in the path. Once that position is reached (it will be near the centre of the three cues), the position of the next L-trie is generated (using direct perceptual data as far as possible) and so on till the goal is reached. Figure Eleven illustrates this mechanism for the noise-free case, and Figure Twelve for noise levels of 5, 10 and 20%. The second figure demonstrates that path following is extremely robust to noise as the error in the final goal estimate is independent of total path-length.

⁴This is achieved by, at each time-step, updating the activity of each L-trie node to be equal to the maximum of its own activation and that of any of its immediate neighbours (multiplied by the decay factor) at the previous time-step.
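The diffusion-and-trace procedure can be sketched as follows. This is our illustration over an invented adjacency graph (the node names merely echo the five-landmark example); the update per time-step takes the maximum of a node's own activity and λ times each neighbour's:

```python
# Spreading-activation search for the minimal L-trie sequence.
def find_sequence(adj, start, goal, lam=0.95):
    """adj: dict node -> list of neighbouring L-trie nodes."""
    act = {n: 0.0 for n in adj}
    act[goal] = 1.0                      # goal node is clamped on
    while act[start] == 0.0:             # diffuse until start is reached
        new = {n: max([act[n]] + [lam * act[m] for m in adj[n]])
               for n in adj}
        new[goal] = 1.0
        act = new
    path, node = [start], start          # trace via most active neighbour
    while node != goal:
        node = max(adj[node], key=act.get)
        path.append(node)
    return path

# A toy chain of adjacent L-tries (invented connectivity).
adj = {"ABC": ["BCE"], "BCE": ["ABC", "CDE"],
       "CDE": ["BCE", "BDE"], "BDE": ["CDE"]}
seq = find_sequence(adj, "ABC", "BDE")
```

Note the search assumes the start and goal nodes are connected; a disconnected graph would need an explicit iteration limit.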
Figure Eleven: Moving to the goal by path following. The dotted lines indicate additional landmark locations that were utilised en route.
Figure Twelve: Performance of the path following system in the presence of 5, 10 and 20% relative noise in perceived landmark positions. (Different L-trie sequences were followed in each case as networks with different connectivity were acquired during exploration.)

Figure Thirteen: An environment of twenty-six landmarks with the agent positioned and oriented as shown.
10 Building a predictive map
The ‘domino’ effect that propagates local landmark positions through the relational net will eventually generate egocentric location estimates for all landmarks (with connections to the L-trie graph). The resulting activity can be thought of as a dynamic map of the environment which could update itself as the agent moves and is automatically arranged with the agent at the centre and oriented towards its current heading. Figures Thirteen and Fourteen show this egocentric map computed (with 20% noise) for an environment of twenty-six landmarks. As a result of cumulative error the exact layout of landmarks is more accurately judged close at hand than further away; however, the topological relations between landmarks are reproduced throughout. One use to which such a map might be put is to disambiguate perceptually similar landmarks by calculating predictions of upcoming scenes. In other words, if the agent sees a landmark which appears identical to one it already knows, then it can judge whether the two cues actually arise from the same object from the extent to which the place where the landmark appears agrees with the location predicted for it by the mapping system. If there is a large discrepancy between actual and predicted locations then the new landmark could be judged to be a distinct object. On the other hand, if there is a good match the agent could conclude that it is observing the original cue.
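The disambiguation test reduces to a distance check between predicted and observed positions; a minimal sketch (the tolerance value is our assumption, not the paper's):

```python
# Judge whether a visually matching cue is the known landmark or a
# distinct lookalike, by comparing it with the map's prediction.
import math

def same_landmark(predicted, observed, tol=1.0):
    """Accept the cue as the known landmark only if it appears within
    tol of the position the map predicts for it."""
    return math.dist(predicted, observed) <= tol

match = same_landmark((3.0, 4.0), (3.2, 4.1))    # near the prediction
lookalike = same_landmark((3.0, 4.0), (9.0, -2.0))  # far from it
```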
Figure Fourteen: The cognitive map in the agent’s egocentric co-ordinate frame generated from the viewing position shown in the previous figure.
11 Future Work
An examination of the error tolerance of the computation of β values reveals several interesting facts. One is that the error in encoding the position of an arbitrary goal relative to an L-trie is minimised if the three landmarks form an equilateral triangle. This arises because the size of the error in the β-coefficients is proportional to the variance (due to noise) in the area of this triangle. This variance is lowest when the ratio of the area to the total length of the sides is maximal. By the same argument, when encoding each of four landmarks relative to the other three, the optimal configuration is for the landmarks to form a square. This suggests that certain landmark configurations should be given greater weight during path computation. A second factor affecting the size of the error in location estimates is the viewing position at which the β computation is made. Through a Monte Carlo simulation we have found that the error is generally lower when the agent is within the area enclosed by the four positions and has a minimum near to the centroid of the L-trie. The difference in error due to viewing position also becomes more exaggerated as the landmark configuration becomes less optimal. We hope to use these findings to determine better methods for coping with noise in the map computations, both in encoding the β-coefficients during exploration and in combining multiple estimates of landmark positions during navigation.
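The effect of L-trie shape on encoding error can be checked with a small Monte Carlo experiment. The sketch below is our illustration (layouts, noise level and trial count are invented): it encodes a goal from noise-perturbed cue positions, decodes it with the true positions, and compares the mean error for a well-spread and a near-collinear triangle:

```python
# Monte Carlo comparison of beta-encoding error for two L-trie shapes.
import math
import random

def beta(A, B, C, G):
    """Barycentric co-ordinates of G relative to triangle ABC."""
    (ax, ay), (bx, by), (cx, cy), (gx, gy) = A, B, C, G
    det = ax * (by - cy) + bx * (cy - ay) + cx * (ay - by)
    return ((gx * (by - cy) + bx * (cy - gy) + cx * (gy - by)) / det,
            (ax * (gy - cy) + gx * (cy - ay) + cx * (ay - gy)) / det,
            (ax * (by - gy) + bx * (gy - ay) + gx * (ay - by)) / det)

def decode_error(cues, G, noise_sd, trials, rng):
    """Mean distance between the true goal and the goal decoded from
    beta values encoded from noise-perturbed cue positions."""
    total = 0.0
    for _ in range(trials):
        seen = [(x + rng.gauss(0.0, noise_sd), y + rng.gauss(0.0, noise_sd))
                for x, y in cues]
        b = beta(*seen, G)              # encode from noisy percepts
        gx = sum(bi * x for bi, (x, y) in zip(b, cues))
        gy = sum(bi * y for bi, (x, y) in zip(b, cues))
        total += math.hypot(gx - G[0], gy - G[1])
    return total / trials

rng = random.Random(1)
spread = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.5)]  # well-spread L-trie
flat = [(0.0, 0.0), (4.0, 0.1), (2.0, 0.2)]    # near-collinear L-trie
goal = (2.0, 1.5)
e_spread = decode_error(spread, goal, 0.05, 500, rng)
e_flat = decode_error(flat, goal, 0.05, 500, rng)
# e_flat comes out far larger than e_spread, as the argument predicts
```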
12 Conclusion
We have described a system for way-finding between places that share no common views. The system works by encoding landmark locations in the local co-ordinate frames defined by groups of three nearby cues. These relations can be stored in a simple linear network of perceptron-like units. The system can orient towards distant targets, find and follow a path to a goal, form an overall map, and disambiguate perceptually similar landmarks. The robust and economical character of this system makes it a feasible mechanism for implementing a large-scale cognitive map.
Acknowledgements The authors are grateful to John Porrill, John Frisby, Neil Thacker, Pete Coffey, and Mark Blades for advice and discussion. This research was funded by the Science and Engineering Research Council.
References

Farin, G. (1988). Curves and Surfaces for Computer Aided Geometric Design: A Practical Guide. Boston, Academic Press.
Gallistel, C. R. (1990). The Organisation of Learning. Cambridge, MA, MIT Press.
Kuipers, B. (1982). “The ‘map in the head’ metaphor.” Environment and Behavior 14: 202-220.
Levitt, T. S. and D. T. Lawton (1990). “Qualitative navigation for mobile robots.” Artificial Intelligence 44: 305-360.
McNaughton, B. L., Chen, L. L. and E. J. Markus (1991). “Landmark learning and the sense of direction - a neurophysiological and computational hypothesis.” Journal of Cognitive Neuroscience 3(2): 192-202.
O’Keefe, J. (1990). The hippocampal cognitive map and navigational strategies. In Brain and Space. Oxford, Oxford University Press.
O’Keefe, J. A. and L. Nadel (1978). The Hippocampus as a Cognitive Map. Oxford, Oxford University Press.
Rosenblatt, F. (1961). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington, DC, Spartan Books.
Worden, R. (1992). “Navigation by fragment fitting: a theory of hippocampal function.” Hippocampus 2(2): 165-188.
Zipser, D. (1983a). The representation of location. Institute for Cognitive Science, UCSD, ICS 8301.
Zipser, D. (1983b). The representation of maps. Institute for Cognitive Science, UCSD, ICS 8304.
Zipser, D. (1986). Biologically plausible models of place recognition and place location. In Parallel Distributed Processing: Explorations in the Micro-structure of Cognition, Volume 2. Cambridge, MA, Bradford Books. 432-470.