The Dynamic Approach to Autonomous Robot Navigation

Axel Steinhage
Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany
[email protected]

Gregor Schöner
Centre de Recherche en Neurosciences Cognitives, C.N.R.S., Marseille, France
[email protected]
Abstract — We present a dynamical systems approach to the generation of autonomous behavior. For the example of navigation and exploration by an autonomous agent, we develop a dynamical system that produces classical control-type behaviors as well as complex behavioral sequences and discrete behavior selections. Examples of the control-type behaviors that are generated are obstacle avoidance, position estimation, recalibration and target acquisition. Examples of the complex behaviors are adaptive exploration, adaptive path planning, adaptive target selection and searching. The approach works with continuous differential equations throughout and combines the advantages of a formulation as a classical control problem with those of a formulation as an algorithmic problem.

Keywords — Dynamical Systems, Autonomous Behavior, Navigation, Behavioral Sequence, Action Selection.
I. Introduction

Autonomy requires systems that are not only capable of controlling their motion in response to sensor inputs (such as when avoiding obstacles), but that are also capable of reacting to "unexpected" events, like qualitative changes in the environment or strong perturbations that lead to a complete decalibration. Moreover, autonomous systems must be able to plan their actions in a flexible way so as to make good use of their own resources. Although these requirements go well beyond simple input-output systems, there is not necessarily a need to introduce symbolic representations and algorithmic structures as they are commonly used in artificial intelligence (AI). We will show this for the example of autonomous navigation, because this problem contains control-type elements as well as complex behaviors. Furthermore, much work has been done in this field already, offering the possibility to compare the classical solutions with our approach. By "behaviors" we understand all processes occurring along the stream from sensing to acting, even when this requires extending the meaning of the word "behavior" beyond its everyday use. For producing control-type behaviors like obstacle avoidance and target acquisition, the formulation of the problem as a continuous differential equation has many advantages [1]: the stability of the system can be analysed algebraically, and there is a close link between the observables of the behavior (heading direction and velocity, for instance) and the corresponding behavioral variables of the model. Furthermore, the influence of sensor inputs on the behavioral variables can be explicitly incorporated as mathematical terms in the model equation. The so-called "dynamical systems approach" is a clear guideline for how to derive those differential equations and behavioral variables. It also tells us how to incorporate and fuse information from many, possibly conflicting or erroneous, sensors. A set of basic principles that have to be obeyed by the designer helps to avoid instabilities when coupling many mostly nonlinear dynamical systems into a complete model. In the next chapter, we will explain these principles briefly, as we use this approach for the control-type parts of our navigation model. The basic property of the control-type dynamics is that the system described by it is in a defined state at all times: velocity and heading direction have unique values at all times. The dynamical system is therefore instantiated: the behavioral variable that describes the corresponding behavior has a unique value at all times too. For the more complex behaviors, this may not be true: when doing target acquisition, for instance, there may be more than one appropriate target or even none at all. Another example is a situation in which a certain behavior relies on sufficient sensor information: in visual snapshot navigation, a position estimate is only recalibrated if a camera picture is similar to a previously stored one. Sufficient sensor information in this case is only available at discrete locations in space, namely at those at which a snapshot has been recorded previously. At all other places, this information is absent and the behavioral variable that describes the visual position estimate has no value at all. For situations like these, the instantiated dynamics is not appropriate. Two other concepts have therefore been incorporated into the dynamic approach so far: the neural field concept and the competitive dynamics concept. The neural field concept is appropriate if the underlying behavioral dimension has a topology, like the x,y-plane in our example of visual snapshot navigation. A review of this approach is given in [1].

In this paper we focus on the competitive dynamics approach for handling non-instantiated behavioral dimensions that do not possess a metric, like the index space of all implemented complex behaviors or the index space of potential targets. Although there is nothing like a "distance" between the possible instances of the behavioral dimension in this case, there is an interaction between them. We will describe how adaptive and flexible behavioral sequences are generated by designing the interactions according to a control-flow plan which describes the logical inter-dependencies between the behaviors. The advantage of the competitive dynamics concept is that, although the underlying behavioral dimension is not instantiated, every single one of its possible instances is described by an instantiated dynamics. In the following, we will first describe the basic concepts of the dynamic approach, focussing on the control-type instantiated dynamics and on the competitive dynamics concept. Then we will present our navigation model as an implementation of these concepts.

II. Basic principles of the dynamic approach to autonomous behavior generation
Autonomous systems consist of behaviors. Every behavior is characterized by a (possibly vector-valued) variable that defines quantitatively the state of the system projected onto the corresponding behavioral dimension. These so-called "behavioral variables" must fulfill two requirements. First, the tasks which the system should solve by performing the behavior must be expressible as points or regions in the space spanned by the behavioral dimensions. The second requirement is that the variables must be linked to sensory or motor surfaces of the system. This linkage can be direct or indirect, i.e. going through other behavioral levels. In other words: sensory information must be available to specify the values of the behavioral variables. An example of a behavioral variable is a vehicle's heading direction φ_h, measured over the behavioral dimension φ relative to a world-fixed reference φ_0 = 0. If the task is to drive to a target located in direction ψ_t, the behavior to be performed is called "target acquisition". An example of a direct linkage to sensory information is a sensory array of photoreceptors on the vehicle's periphery giving the angular position of a shining target. An indirect linkage would be if the angular target position were calculated from the retrieved x,y-position of a target stored in a target representation memorized from earlier sensory inputs. Behavior is generated by evolving in time the solutions of a dynamical system (see [4] for an introduction to dynamical systems theory) that governs the behavioral variables. As an important design principle, we require that these solutions be at all times in or close to a stable attractor state of the dynamics, which can be achieved through an adequate choice of the timescales. The advantage of this is that the system is endowed with stability properties against the influence of noise.
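As an illustration of such a direct sensory linkage, the angular position of a shining target can be read off a ring of photoreceptors by a population vector. This is a minimal sketch of ours, not part of the original model; the receptor count and activation values are invented:

```python
import math

def target_direction(receptor_activations):
    """Estimate the angular position psi of a bright target from a ring of
    photoreceptors mounted on the vehicle's periphery.

    Receptor i covers the angle 2*pi*i/N. The population vector (vector sum
    of receptor directions weighted by activation) gives a smooth, noise-
    tolerant angle estimate."""
    n = len(receptor_activations)
    x = sum(a * math.cos(2 * math.pi * i / n) for i, a in enumerate(receptor_activations))
    y = sum(a * math.sin(2 * math.pi * i / n) for i, a in enumerate(receptor_activations))
    return math.atan2(y, x)  # psi in (-pi, pi], relative to the reference direction

# a target shining from roughly 90 degrees: receptor 25 of 100 is most active
activations = [0.0] * 100
activations[24], activations[25], activations[26] = 0.5, 1.0, 0.5
psi = target_direction(activations)
```

The estimate psi can then serve directly as the attractor position ψ_t of a target-acquisition dynamics.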
Further, we can use the qualitative theory of dynamical systems [1] to design the dynamics if we restrict the analysis to the attractor states. The attractors are defined by the tasks the system must solve (see Fig. 1): a target specifies an attractor at the value ψ_t, an obstacle specifies a repellor at ψ_o. The stability of an attractor is given by the timescale τ with which the system relaxes from a perturbation. We call the inverse of this timescale the strength λ = τ⁻¹ of the attractor. Finally, every attractor (repellor) is endowed with a range σ that defines its basin of attraction: only if the behavioral state lies within this range is the system affected by the attractor (repellor). A typical dynamical system of this type is

    φ̇_h = Σ_i λ_i (ψ_i − φ_h) e^{−(ψ_i − φ_h)² / (2σ_i²)}    (1)

Here, φ̇_h is the time derivative of the heading direction φ_h, and the ψ_i are the angular positions at which the target attractors (λ_i > 0) and the obstacle repellors (λ_i < 0) are located. The λ_i can be made functions of the distances of the targets (obstacles) to the robot, so that attractors (repellors) at small distances are stronger. The ranges σ_i can be made functions of the robot's diameter and the distance: targets and obstacles have a larger range of influence at small distances and for large robot diameters, because at small distances they cover a larger angular range, and the robot must be able to turn towards a target or away from an obstacle in time when approaching it (see [3] for a hardware implementation of this concept). Fig. 1 shows this situation: a robot which is driving with a constant forward velocity v and whose heading direction is governed by the dynamics (1) will approach one of the targets if its heading direction falls into that target's basin of attraction.

Fig. 1. Obstacle avoidance and target acquisition. The phase plot of (1) is drawn in polar coordinates, φ̇_h = r(φ) (bold line). The targets specify attractors (negative slope), the obstacle specifies a repellor (positive slope). The strength λ is a function of the distance, the range σ is a function of the robot's diameter and the distance.

An important property of the dynamics is that the system is capable of deciding when to average between two attractors and when to select one of them. Fig. 2 shows two attractors at various distances. If the ranges of the attractors do not overlap, the system will fall into one of them; otherwise it will average between them. We use this feature of (1) for fusing the position estimates of different sensors, as we will describe later.

Fig. 2. A continuous variation of the distance between the fixed points leads to a bifurcation at a critical distance.

Qualitative changes in the behavior are induced by continuous variation of the dynamics' parameters: in our example, a continuous variation of the distance between the attractors leads to a switch from the behavior "selecting" to the behavior "averaging". Mathematically, the dynamics undergoes bifurcations when its parameters are varied: two stable fixed points fuse into one when their distance is reduced. This is an important design principle too, as it offers the possibility to express discrete phenomena (like switching between behaviors) through continuous differential equations, as we will show later. How can behavior be generated if the system is in a stable state at all times? The solution is to let the attractors (repellors) move on a slower time scale than the time scale on which the dynamics relaxes into the stable state. This movement is brought about by the changing sensor information: the directions in which the robot "sees" the targets (obstacles) change on the time scale of the robot's movement. The designer must guarantee that this time scale is sufficiently slow (small robot velocity v), so that the dynamics (time scale τ) can always follow the movement of the stable states. The same principle is used when coupling two behavioral dynamics: if one dynamics specifies the stable states of the other, this dominating dynamics must have a slower time scale than the dominated one. This implies that every dynamics in a coupled system must have its "own" time scale, and the slowest time scale is the one which governs the overall behavior.
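The selection/averaging decision of the dynamics (1) is easy to reproduce numerically. The following sketch is our own illustration (the attractor positions, strengths and ranges are invented): two target attractors of equal strength are integrated with the Euler method; when they are far apart relative to their ranges, the heading converges to one of them, and when they are close it settles on the average direction:

```python
import math

def heading_rate(phi, attractors):
    """Right-hand side of dynamics (1): sum of attractor contributions.
    attractors: list of (psi, lam, sigma) with lam > 0 for targets."""
    return sum(lam * (psi - phi) * math.exp(-(psi - phi) ** 2 / (2 * sigma ** 2))
               for psi, lam, sigma in attractors)

def relax(phi0, attractors, dt=0.01, steps=20000):
    """Euler integration until the heading has relaxed into a stable state."""
    phi = phi0
    for _ in range(steps):
        phi += dt * heading_rate(phi, attractors)
    return phi

sigma = 0.3  # range of each attractor (rad)

# Targets far apart (ranges do not overlap): the system selects one target.
far = [(-1.0, 1.0, sigma), (1.0, 1.0, sigma)]
phi_far = relax(phi0=0.1, attractors=far)   # starts slightly right of the middle

# Targets close together (ranges overlap): the system averages.
near = [(-0.2, 1.0, sigma), (0.2, 1.0, sigma)]
phi_near = relax(phi0=0.1, attractors=near)
```

Shrinking the separation continuously moves the system through the bifurcation of Fig. 2: the two stable fixed points fuse into the averaging solution at a critical distance.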
III. The competitive dynamics approach

As we mentioned in the introduction, there are cases in which we need non-instantiated behavioral dimensions. An example of such a dimension is an index space, i.e. the set of numbers counting discrete instances. In our navigation model, for instance, we have a set of locations which have a special meaning to the robot: these so-called home bases must be visited often to recalibrate the robot's internal position estimate by recognizing previously stored views from these locations (see Fig. 8). Whenever the agent does target acquisition towards a user-defined target, it tries to plan a path leading through the home bases, which are subtargets in this case. The behavioral variable for this subtarget acquisition is the index i of the next home base on the path. This variable is non-instantiated: there can be none or more than one sub-target between the robot and the main target. We take this into account by defining an activation variable n_i for every home base. The activation n_i is an auxiliary variable measuring how much the instance i of the behavior is activated. In our example, n_i defines whether the corresponding home base is the agent's next subtarget (|n_i| = 1) or not (n_i = 0). The dynamical system for the activations n_i is a competitive dynamics:

    τ ṅ_i = α_i n_i − |α_i| n_i³ − Σ_{j≠i} γ_{i,j} n_j² n_i + ξ    (2)

Here, τ is the time scale of the dynamics, ξ is a stochastic noise term, γ_{i,j} is an element of the interaction matrix, and α_i is the so-called competitive advantage of instance i. As the dynamics (2) is an instantiated control dynamics, it has all the properties mentioned in the previous chapter. In particular, for an appropriate choice of the time scale τ it is at all times in one of the stable fixed points ±1, 0, depending on the competitive advantage α_i and on the state of the other competing n_j (γ_{i,j} ≠ 0). The stochastic noise prevents the system from getting stuck in unstable fixed points. Fig. 3 shows phase plots of the system (2) for a so-called winner-takes-all dynamics, γ_{i,j} = 1: a continuous increase of α_i across 0 leads to a bifurcation from the fixed point n_i = 0 to |n_i| = 1, but only if all other instances are "switched off" (n_j = 0).

Fig. 3. Phase plot and bifurcation diagram of the dynamics (2) with γ_{i,j} = 1. In the top panel we set n_j = 1 for one j ≠ i to show the effect of the competitive term: the dynamics has only a stable fixed point at 0, although we set 1 > α_i > 0. For the other three panels we set n_j = 0 for all j ≠ i, so that the competitive term is zero (which can also be achieved by setting γ_{i,j} = 0). In the second panel we set α_i > 0, and in the third α_i < 0. The bifurcation diagram in the fourth panel shows that (2) has a stable fixed point (solid line) at n_i = 0 for α_i < 0, which becomes unstable (dashed line) for α_i > 0, while two new stable fixed points appear at |n_i| = 1.

By an appropriate design of the γ_{i,j}, we can define which instances (i, j) can be active simultaneously (γ_{i,j} = 0) and which compete with each other (γ_{i,j} > 0). In our example of trajectory planning, we chose γ_{i,j} = 1 because only one base should be a subtarget at a time. The α_i for the home bases are functions of the base's distance d_i to the robot and the angular difference |φ_i − φ_t| between the direction towards the base and the direction from the robot to the main target: that base wins which is nearest to the robot and lies closest to the direct path towards the main target (see Fig. 4).

Fig. 4. Sketch of the parameters that define the competitive advantage α_i in the path-planning behavior.

An interesting stability property of the dynamics (2) is hysteresis: the currently active instance |n_i| = 1 is not switched off even when another α_j becomes bigger than α_i. Only when α_i itself drops below 0 does another instance j get its chance to win the competition. We use the dynamics (2) not only to select between subtargets, but also to switch between different behaviors: the behavioral variable is then the index of the competing behaviors, and the n_i define whether the system is in behavior i or not. In this case, the hysteresis is of major importance: the currently active behavior should only be deactivated when its own competitive advantage becomes negative.

IV. Interaction between instances

While the competitive advantage of one sub-target was independent of the others, the α_i for behavior selection are functions of the state of the other behavioral instances. The interaction matrix has only two possible elements: a) two behaviors i and j are not allowed to coexist (γ_{i,j} = 1), and b) the two behaviors are independent of each other and can therefore coexist (γ_{i,j} = 0) (see [5] for an application of interaction matrices with entries other than 0 and 1). The elements of the activation matrix A have three possible values: A_{i,j} = 1 means behavior j must be active to allow an activation of behavior i, A_{i,j} = −1 means behavior j must be inactive to activate behavior i, and A_{i,j} = 0 means the activation of behavior i is independent of the state of behavior j. In principle, this concept is a local one: we define only interactions between pairs of behaviors. The behavioral sequence is generated by an appropriate design of the A-matrix: every behavior has to prepare the activation conditions for the next behavior in the sequence. This approach has three advantages: a) there is no need for a high-level representation of the whole sequence, b) the local design allows for some flexibility: if a sequence cannot be acted out in one situation, the system can fall back into a default behavior and enter the previously cancelled sequence at some other time (but then from another behavioral state), and c) as we keep all the principles of the control-dynamics approach, the system is always in a behaviorally stable state. The actual forms of A and γ have to be designed such that the desired behavioral sequences are generated.

Fig. 5. The concept of behavior selection: |n_i| is the activation of a behavior i. The interaction between the behaviors is defined by the interaction matrix γ ∈ {0, 1}, which controls which behaviors can coexist and which compete. The behavioral layer receives input from the competitive advantage layer α, which is controlled by the behavioral states n through the activation matrix A ∈ {−1, 0, 1}.

The function for the α_i looks like this:

    I_i = C_i Π_j ( 1 − (A_{i,j} + A_{i,j}²)/2 + A_{i,j} |n_j| )
    α_i = 2 r_i I_i − 1    (3)

As Π is a multiplication, (3) implements a logical AND-condition: if one of the activation conditions is not fulfilled, the input I_i is zero and the activation α_i becomes negative, so that behavior i will be deactivated. The variable C_i ∈ [0, 1] builds the interface to the control dynamics. By default we set C_i = 1, but some behaviors must be activated depending on sensory inputs or a specific state of the control dynamics. This input must be transformed into the continuous logical value C_i. As an example, we consider the searching behavior i: if the system is lost, it searches for a home base to recalibrate by recognizing a snapshot view. This searching behavior should be deactivated if the behavior j, "snapshot recognition", becomes active. As the system tries to recognize stored snapshots at all times, this behavior must coexist with all other behaviors, so that γ_{j,i} = γ_{i,j} = 0. The recognition of a snapshot is indicated by a high maximal correlation between the current and the stored views. The maximal correlation K_max ∈ [0, 1] can be directly used as the input C_j from the control system. The activation of n_j should be independent of all other behaviors, so that A_{j,·} = 0. Finally, an active recalibration behavior |n_j| = 1 should deactivate the searching behavior, so that A_{i,j} = −1. A stability problem arises if the deactivating behavior is active for a short time only. In our example this happens if the system recognizes a snapshot while driving, so that the input C_j is a short pulse only. In that case, the competitive dynamics is too slow to let the next behavior in the sequence come up. Instead, the searching behavior will reactivate right after it has been switched off. This leads to unwanted oscillations in the behavior selection. To prevent this, a refractory dynamics r_i is incorporated in (3):

    τ_r ṙ_i = I_i − r_i + I_i r_i − I_i r_i²    (4)

This dynamics has the following characteristics: while behavior i receives input I_i = 1, the refractory variable r_i is in the stable fixed point r_i = 1. A short deactivating pulse I_i = 0 sets r_i = 0. As long as I_i = 0, the fixed point r_i = 0 is stable. When behavior i receives input I_i = 1 again, the fixed point r_i = 0 becomes unstable and the refractory variable approaches the stable fixed point r_i = 1 on the time scale τ_r. In other words, τ_r defines how long behavior i is prevented from becoming activated again. Choosing τ_r long enough ensures that the next behavior in the sequence can win the competition in (2).
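The winner-takes-all case of the competitive dynamics (2) can be sketched numerically. The following is our own illustration, not the authors' code; the three-instance setup, the constants and the noise level are invented:

```python
import math
import random

def step(n, alpha, gamma, tau=1.0, dt=0.01, noise=0.01):
    """One Euler step of the competitive dynamics (2):
    tau * dn_i/dt = alpha_i*n_i - |alpha_i|*n_i**3 - sum_{j!=i} gamma_ij*n_j**2*n_i + noise."""
    out = []
    for i, ni in enumerate(n):
        comp = sum(gamma[i][j] * n[j] ** 2 * ni for j in range(len(n)) if j != i)
        dn = alpha[i] * ni - abs(alpha[i]) * ni ** 3 - comp + random.gauss(0.0, noise)
        out.append(ni + dt * dn / tau)
    return out

random.seed(2)
# Winner-takes-all interaction: gamma_ij = 1 for all i != j.
gamma = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Instance 1 has the largest competitive advantage; instance 0 is disabled.
alpha = [-0.5, 0.8, 0.3]
n = [0.0, 0.05, 0.0]   # small initial activation seeds the competition
for _ in range(20000):
    n = step(n, alpha, gamma)
# instance 1 relaxes to |n_1| ~ 1; the competitive term holds the others near 0
```

Raising alpha[2] above alpha[1] afterwards would not switch the winner: because of the hysteresis of (2), instance 1 stays active until its own α drops below zero.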
V. Our navigation model

All the principles and concepts described so far have been applied to a dynamical systems model of complex navigation behavior. We will describe the basic properties of this model briefly (for a detailed description see [2]). The agent interacts with the environment through two sensor channels and a motor channel (for this and the following see Fig. 6). The sensor models are kept very simple so that they can easily be implemented in hardware. The visual sensor is modelled as a one-dimensional 360° retina: the agent's environment in a fixed azimuthal plane is represented as a binary intensity distribution with an angular resolution of 3.6°. This retinal picture is then differentiated twice to enhance the representation of vertical edges. The result is a 100-element data set which is sufficiently correlated with the current location of the agent in the environment. For a limited number of locations, these so-called local views are stored in an associative matrix memory together with the agent's internal position estimate X_int, Y_int at storage time. This memory of topologically ordered visual representations is used as a cognitive map in our model.

Fig. 6. The navigation model.

The current position estimate X_int, Y_int is the stable state of a sensor-fusion dynamics (5) of the control type (1) (shown for the dimension X only):

    Ẋ_int = λ_int,dr (X_dr − X_int) e^{−(X_dr − X_int)² / (2σ_dr²)} + λ_int,vis (X_vis − X_int) e^{−(X_vis − X_int)² / (2σ_vis²)}    (5)

    Ẋ_dr = λ_dr (X_int − X_dr) + v cos(φ_dr)    (6)

    λ_int,dr = c₁ ;  λ_int,vis = c₂ K_max ;  λ_dr = c₃ K_max    (7)

One attractor in (5) is specified by the visual estimate X_vis. This estimate is the location of the stored local view that has the highest correlation K_max ∈ [0, 1] with the current view. The other attractor in (5) is the so-called dead-reckoning estimate X_dr, i.e. the stable state of the dynamics (6). It is calculated by integrating the robot's movement commands v, φ_dr: a sufficiently low movement velocity v shifts the stable fixed point X_dr = X_int slowly enough to keep the system in a stable state. As (5) and (6) are the components of a coupled dynamical system, the time scales λ⁻¹ given by (7) specify which is the dominating component: if no stored local view is recognized (K_max ≈ 0), (6) is the slower dynamics and thus dominating. At the same time, the dead-reckoning attractor in (5) is stronger than the vision attractor, so that the internal estimate X_int is governed by the dead-reckoning estimate X_dr. If a local view is recognized (K_max ≈ 1), (5) is dominating. In that case, X_int is governed by the vision estimate X_vis. Due to wheel slip, for instance, the movement command is not acted out exactly, and therefore X_dr accumulates errors, which we simulate too. In time, X_dr drifts away from the real X-position of the robot. While no local view is recognized, the internal estimate is governed by the dead-reckoning estimate, so that X_int accumulates the same errors. When a view is recognized, the attractor in (6) gets strong, forcing the dead-reckoning estimate towards the internal estimate X_int. As X_int in that case is governed by the visual estimate, which, as a world-fixed cue, does not accumulate errors, the dead-reckoning estimate is recalibrated. This concept of inverting the hierarchical order between two dynamical levels by manipulating the dynamical time scales (7) is called time scale inversion.

Obstacle avoidance is done as described with Fig. 1. For measuring the distances, we simulate a simple infra-red type sensor that delivers a 360° distance scan of the robot's surroundings. The action selector implements our competitive dynamics concept for selecting between 13 "higher level" behaviors:
- User-target acquisition (default behavior): a target in world coordinates set by an operator has to be acquired.
- Sub-target acquisition of base i: home base i is the next subtarget on the trajectory.
- Check home base i: home base i is made a main target to check its recalibration suitability.
- Home base i acquired: the internal position estimate is near a memorized home base location.
- Recognition of a recalibration: a local view is recognized.
- Recognition of a decalibration: no local view is recognized although the internal position estimate predicts it.
- Recalibration failure at base i: recalibration at base i fails.
- Second recalibration failure at base i.
- Recalibration timer i exceeds limit: base i has not served as a recalibration location for a duration T.
- Homing: the initial home base is made the main target.
- Recreate home base i: collecting local views to renew base i while driving a spiral trajectory.
- Search: a spiral trajectory is generated to search for a recalibrating base.
- Call operator: emergency-halt the system.

Through their C-parameter, some of these behaviors get direct input from the control dynamics, like the position estimate or the visual correlation K_max. The interaction and activation matrices γ and A are designed such that the behavioral schedule in Fig. 7 is generated. The motor output of the system is generated by the dynamics

    φ̇_dr = λ_ran ξ + λ_obs O + λ_tar T + λ_spi S ,  v = const    (8)

The target and obstacle contributions T and O are right-hand sides of dynamics of the type (1). For T, we add up attractors at the angular direction of the main target and at all active sub-targets; for O, we add up repellors at the angular positions at which the distance scan sensed objects. S is the right-hand side of a dynamics that produces a spiral trajectory by constantly decreasing an initially high angular velocity contribution φ̇ = ω. The λ in (8) are proportional to the behavioral activations |n_i|: an |n_i| = 1
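The recalibration effect of the coupled system (5)–(7) can be sketched in one dimension. This is our own illustration with made-up constants, not the authors' simulation: dead reckoning integrates a biased velocity command (simulated wheel slip) and drifts; near world-fixed snapshot locations K_max jumps to 1, the time scales invert, and the drift is pulled out again:

```python
import math

c1, c2, c3 = 2.0, 5.0, 5.0           # time-scale constants of (7) (invented values)
sigma_dr = sigma_vis = 5.0           # fusion ranges in (5)
dt, v = 0.01, 1.0                    # Euler step, forward speed
snapshots = [10.0, 20.0, 30.0]       # world-fixed snapshot locations

x_true, x_dr, x_int = 0.0, 0.0, 0.0
drift = 0.05                         # wheel-slip bias on the movement command

for _ in range(4000):                # 40 s of driving
    x_true += v * dt
    # a view is "recognized" only close to a stored snapshot location
    k_max = 1.0 if any(abs(x_true - s) < 0.5 for s in snapshots) else 0.0
    x_vis = min(snapshots, key=lambda s: abs(s - x_true)) if k_max else 0.0
    # dead reckoning, eq. (6): integrates the biased command; the attractor
    # towards x_int only acts while a view is recognized (lam_dr = c3*K_max)
    x_dr += dt * (c3 * k_max * (x_int - x_dr) + (v + drift))
    # sensor fusion, eq. (5): dead-reckoning and vision attractors
    f_dr = c1 * (x_dr - x_int) * math.exp(-(x_dr - x_int) ** 2 / (2 * sigma_dr ** 2))
    f_vis = c2 * k_max * (x_vis - x_int) * math.exp(-(x_vis - x_int) ** 2 / (2 * sigma_vis ** 2))
    x_int += dt * (f_dr + f_vis)
```

Without the vision term the dead-reckoning error would grow linearly (about drift times elapsed time, here 2 units after 40 s); with recalibration at the snapshots, the residual error stays bounded by the drift accumulated since the last recognized view.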