Anchoring Symbols in Hybrid Autonomous Systems Using Isomap Sequences Erich Prem, Erik Hoertnagl, Patrick Poelz Austrian Research Institute for Artificial Intelligence {erich, erik, patrick}@oefai.at Freyung 6/6, A-1010 Vienna, Austria Abstract. We describe a novel approach to anchoring symbols in the sensory data of a hybrid autonomous system. Using an autonomous mobile robot as a test platform, we show how an Isomap can be used to detect and properly classify time-series of sensory data. In the past, similar approaches have used self-organizing maps (SOMs) for this purpose. Isomap can be regarded as an improved technique for generating a topology-preserving low-dimensional embedding of higher-dimensional data. The interactions of the robot with objects in its environment produce a sequence of points on the map. For object recognition, these sequences need to be correctly classified. This technique forms the basis of our symbol anchoring architecture.
1. Introduction Hybridization is a topic of central importance in the area of robotics research. Ever since the advent of reactive architectures (Brooks 91), the discussion has focussed on how to properly establish the connection between sensory-motor interactions and “higher” reasoning capabilities of a more symbolic type. Even for those arguing for a purely reactive approach, the question of symbolic (e.g. linguistic) interaction with the robot remains an important issue. This is why we find a number of researchers in this field who focus on problems related to symbols and artificial systems: work in this field ranges from hybrid architectures (Heinen & Osorio 02) to behaviour-based robots using signs and language (Steels & Vogt 97, Steels 01, Billard & Dautenhahn 97, 99). In particular, technical approaches to mapping objects in an autonomous robot’s environment onto structures internal to the robot (“Symbol Anchoring”) are an active field of research. 1.1 Symbol Grounding and Symbol Anchoring Hybrid systems in robotics centre on signs and their meaning. These issues arise from at least three different perspectives: The first one originates in the aim of creating a system that uses signs for communicative acts with humans or other artificial systems. The underlying motivation here can either be to achieve this desired communication or to study processes of child-like language acquisition or even language evolution. The second perspective is to connect the meaning of internal structures (“representations” or “anchors”) to objects in the world. Here, the goal often is to create a model of the robot’s environment for planning purposes. Finally, another perspective (often alluded to by the two others)
focuses on the more philosophical “Symbol Grounding Problem” that arose in discussions following John Searle’s famous “Chinese Room” argument. Harnad poses the question as to how it is possible for an artificial agent to acquire symbols that possess intrinsic meaning (Harnad 90, 93) which is not “parasitic” on the meaning of other symbols in our head. Harnad originally proposed neural networks as a potential solution to the problem, and a number of neural approaches to hybrid systems have since been developed. The challenge remains how to establish and maintain the relationship of sensory data of objects in the agent’s environment with symbolic representations, i.e. to develop descriptors for sensory signals that allow the classification of a part of the sensory stream as caused by an object in the environment of the agent. This process has been termed symbol anchoring and several approaches to the problem have been presented recently (Coradeschi & Saffiotti 03). The techniques used range from prototypes and feature-based approaches to more sophisticated dynamical-systems solutions. In our architecture, the robot is supposed to learn word-object associations based on input by a human teacher (cf. Fig. 1). For this, we need an internal mediating representation that captures the “meaning” of symbols. Note that in most examples described in the literature, symbols refer to static objects or to features of these objects. Verbs and other word categories are only rarely grounded in practice. In this paper we also focus on grounding identifiers for objects.
[Figure 1 schematic: the testbed with objects (tube, corner, box), the KURT-2 robot, the Isomap, internal object representations, and “common name” object representations such as “Corner”.]
Figure 1 Overview of the Symbol Anchoring process. The robot actively explores its environment and maps sensory-motor experiences on the Isomap. A sequence of Isomap points corresponds to objects in the robot’s environment which can be labelled by a human teacher. The robot then reproduces the labels as it recognizes objects in its environment.
1.2 The Test Platform We use a wheeled mobile robot platform (KURT-2), originally designed for sewage pipe inspection, for our experiments. The robot is equipped with six wheels, twelve infrared sensors, and two sonar sensors as well as other sensors not used in the work described here. The robot carries a conventional laptop running control software developed at our institute.
The test environment of the robot consists of rooms at our offices equipped with a number of everyday objects such as cardboard boxes, walls, bins, etc. as obstacles and objects. For our experiments here, the robot control software drives the robot along walls and from time to time randomly changes direction to explore the whole space. The wall-following behaviour ensures that the robot experiences the objects in similar ways whenever it passes them. 1.3 Dimensionality Reduction and Architecture Most mobile robot platforms today are equipped with sensors such as infrared light detectors that deliver streams of values corresponding to distance measurements between the robot and its environment. An important sub-problem in recognizing objects in the robot’s environment is to correctly detect parts of the signal stream which correspond to measurements during the robot’s interaction with these objects. This detection is difficult due to a number of problems including noise, sensor drift, sensor failure, etc. Also, it is often not clear at what point in time a time-series of values corresponding to an object starts. Interestingly, the time-series of the measurements are usually highly correlated. Thus there usually is severe redundancy in the data, and the complete higher-dimensional space of potential measurements is only sparsely populated by “informative” measurements. Often this means that the space can be embedded in lower dimensions corresponding to the constrained space of the robot interacting with its environment. A number of researchers have in the past made efforts to apply techniques for dimensionality reduction to robot data. A commonly pursued approach is to make use of the dimensionality reduction features of Kohonen maps. While this approach can lead to interesting results, Kohonen maps also suffer from a number of problems. A major disadvantage is that the map uses a fixed grid for the output units.
This, of course, introduces a mapping error whenever a high-dimensional data point is mapped on the two-dimensional grid. Other disadvantages concern the unreliable convergence properties of the map and the unknown error function which the Kohonen map minimizes (Flexer 99). In this paper we present an approach that is targeted at a more statistically sound and robust technique for dimensionality reduction, Isomap. The general idea behind this dimensionality reduction is to map a point manifold on a corresponding lower-dimensional embedding. For robotics, a two- or three-dimensional map is particularly useful because the robot’s “path” on the map can be followed visually and compared with the robot’s movement through the environment. As the robot passes an object, a sequence of sensor readings (real-valued vectors) is generated and mapped using the Isomap or a similar technique. Passing an object in the real world will then generate a sequence of points on the map. These “trajectories” can be interpreted as significant signatures of the interaction of the robot with an object. For anchoring symbols, these signatures need to be stored and, during object recognition, compared with those trajectories already in memory. Note that this mapping is not only performed for the purpose of object recognition. The Isomap in particular has a number of additional features such as visualizing the robot’s path, local neighbourhood preservation, and the ability to “predict” sensor vectors in the vicinity of any given point on the map.
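The mapping step can be illustrated with a small sketch (our illustration, not the authors' code) using scikit-learn's `Isomap` on synthetic stand-in data; the 12 channels mirror the robot's infrared sensors, while the signal itself is simulated:

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)

# Synthetic stand-in for a stream of 12-channel IR readings: a smooth
# path through sensor space plus noise, so the data lies near a
# low-dimensional manifold, as the paper argues real readings do.
t = np.linspace(0, 6 * np.pi, 500)
readings = np.stack([np.sin(t + k) for k in range(12)], axis=1)
readings += 0.05 * rng.standard_normal((500, 12))

# Embed the 12-D readings in 2-D; consecutive readings then form a
# trajectory ("signature") on the map.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(readings)
print(embedding.shape)  # (500, 2)
```

Consecutive rows of `embedding` can then be plotted to follow the robot's "path" on the map visually.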
2. Isomapping Robot Data Isomap (Tenenbaum et al. 00) is a technique capable of preserving the nonlinear structure of the input space as captured in the geodesic manifold distances between all pairs of data points. The technique can be considered an improved approach to classical multidimensional scaling (MDS - Kruskal 64) and has recently been successfully applied to video data segmentation problems (Jenkins & Mataric 02). Tenenbaum's Isomap algorithm extracts intrinsic structure (and therefore reduces dimensionality) by measuring the distances between data points in the geometric shapes formed by items in a nonlinear data set while considering locality.
[Figure 2 plots; panel title: “Two-dimensional Isomap embedding (with neighborhood graph)”. Axis ticks omitted.]
Figure 2 Isomap of sensory data (IR values) recorded during environment exploration of the mobile robot KURT. Data shown corresponds to approx. three minutes of movement. Sensor values have been recorded at 11 Hz, which yields approx. 1700 vectors. The left graph depicts a three-dimensional embedding, the right graph the two-dimensional embedding of the same data.
The technique measures the intrinsic geometry of the surface based on the distances between points along the surface. That is, input-space distances are used for neighbouring points, and then, for distant points, the geodesic distance is approximated by summing sequences of “short hops” between neighbouring points. On the resulting distance matrix, classical MDS is applied. While MDS works fine for linear data, the Isomap is tailored to more complicated nonlinear spaces, in particular those which can be “flattened out” like cylinders or Swiss rolls. For data that maps mathematically to a roll shape, for instance, the algorithm measures the geodesic distance between any two points on the shape, then uses these geodesic distances in combination with the classical multidimensional scaling algorithm in order to construct a low-dimensional representation of that data. Figure 2 shows the result of applying the Isomap algorithm to sensory data generated from our mobile robot test platform to yield a two-dimensional (right) and three-dimensional (left) embedding. A good map for object recognition should present the data as it develops over time, serve as a basis for discriminating a number of objects, be tolerant to faulty sensor values, and be tolerant with respect to temporal dilation (if the robot moves at a slower pace, for example). The statistical properties of the Isomap nicely fulfil these requirements.
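As a hedged illustration of the three steps just described (neighbourhood graph, geodesic distances via short hops, classical MDS), a minimal from-scratch version might look as follows; this is our sketch of the published algorithm, not the implementation used in the experiments:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=8, n_components=2):
    """Minimal Isomap: kNN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    D = cdist(X, X)                                  # input-space distances
    # Keep only each point's k nearest neighbours ("short hops").
    G = np.full((n, n), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)                           # symmetrize the graph
    # Approximate geodesics by summing short hops along shortest paths.
    geo = shortest_path(G, method="D", directed=False)
    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ (geo ** 2) @ J
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

# Example: flatten a 3-D spiral (a "roll"-like shape) into 2-D.
t = np.linspace(0, 4 * np.pi, 300)
X = np.stack([t * np.cos(t), t * np.sin(t), t], axis=1)
Y = isomap(X, n_neighbors=6)
print(Y.shape)  # (300, 2)
```

The sketch assumes the neighbourhood graph is connected; production implementations additionally handle disconnected components.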
[Figure 3 flowchart, object recognition: Start → next sensor input → “can be mapped?” → if No, rebuild Isomap; if Yes, check against the list of objects whether the point “corresponds to the first point of an object”; if so, read the next sensor inputs until the end of the object sequence.]
Figure 3 The overall architecture for Isomap-based object recognition. The algorithm reads sensor values and maps them using the Isomap. If the point is in the list of objects, points are read until the end of the object sequence. See text for further description.
In our overall architecture, the robot starts with a first phase of “learning” the Isomap which corresponds to learning the statistical properties of the sensor values in a particular case of robot-environment interaction. Currently, we first collect data from training runs and train the Isomap offline. In our project this approach marks a first step of “growing up” the robot. The algorithm described in the next section assumes that most “experiences” have already been made, or at least that no completely new sensor situations arise. Such situations can, however, be detected so that the map can be re-trained if necessary. 3. Object Recognition In order to use the Isomap for object recognition, it is necessary to classify trajectories over points on the map (as the robot moves in its environment) as objects. An example of how the Isomap is used for object recognition and how it translates the problem into sequence comparison is depicted in Figure 3. In the example, the robot passes a dust bin in a time interval of approximately five seconds.
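The recognition loop of Figure 3 can be sketched roughly as follows. This is a simplified, hypothetical rendering: the function names, the Euclidean nearest-neighbour matching against stored prototype points, and the plausibility limit are our assumptions, not the authors' code.

```python
import numpy as np

def nearest_map_point(reading, prototypes, limit):
    """Map a raw sensor vector onto the closest stored map point
    (represented here by its sensor-space prototype), or return None
    if no point lies within the plausibility limit ("can be mapped?")."""
    d = np.linalg.norm(prototypes - reading, axis=1)
    i = int(np.argmin(d))
    return i if d[i] <= limit else None

def recognise(stream, prototypes, objects, limit=1.0):
    """objects: dict name -> stored sequence of map-point indices.
    When a reading maps onto the first point of a stored object,
    keep reading until the end of that object sequence (cf. Fig. 3)."""
    recognised = []
    current = None                       # (object name, position in sequence)
    for reading in stream:
        i = nearest_map_point(reading, prototypes, limit)
        if i is None:
            continue                     # unseen situation: map rebuild needed
        if current is None:
            for name, seq in objects.items():
                if seq[0] == i:          # first point of a stored object
                    current = (name, 1)
                    break
        else:
            name, pos = current
            seq = objects[name]
            if pos < len(seq) and seq[pos] == i:
                pos += 1
                current = (name, pos)
            if pos == len(seq):          # end of object sequence reached
                recognised.append(name)
                current = None
    return recognised

# Toy example: the stored "bin" object is the map-point sequence 0 -> 1 -> 2.
prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 5.0]])
objects = {"bin": [0, 1, 2]}
stream = np.array([[0.1, 0.0], [1.0, 0.1], [2.0, 0.0]])
print(recognise(stream, prototypes, objects))  # ['bin']
```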
Figure 4 Trajectories of points on the Isomap (right) based on sensor readings as the robot passes a dustbin in the test environment. The run amounts to approx. 60 points. The left graph depicts the sensor values during one run (for 4 sensors).
During object recognition, the current sensor value is mapped using a previously generated map and compared with potential candidates of “starting points” of previously stored objects. This means that we map sensor readings onto the nearest point in the Isomap (within some plausible limit). If such a point exists, the next points are compared with the fitting sequence. In this way, we recognize situations similar to those that have already been stored in memory. If we train the robot on situations where it interacts with an object (such as passing the dustbin in Fig. 4), this effectively amounts to object recognition. During recognition, the problem of recognizing objects is translated into judging the similarity between sequences on the map. While the Isomap supports recognition because it reduces the dimensionality in the data and thus makes the sequences easily recognizable, two similar objects will not produce exactly the same sequence in two consecutive runs. Examples of typical deviations between sequences which stem from passing the same object are depicted in Figure 4.
Fig. 4 Typical cases of deviations between sequences mapped on the Isomap as the robot passes the same object in two consecutive trials. Single points may lie off the original sequence and in some cases points may have been inserted.
The algorithm for comparing sequences should recognise pairs of trajectories which correspond to the same object and pairs which do not. As the examples in Figure 4 imply, a simple point-to-point Euclidean distance is unlikely to deliver good results (in particular in the case where a single outlier distorts the sequence, or points are inserted). We also studied more elaborate variants of Euclidean distances which eliminate “multiple” points. Better results can be achieved using a simple variant of “Dynamic Time Warping” (Park et al. 99) in which the sequences are dilated or shortened so as to minimize the error between a given pair of trajectories. For our application here, we simply limit the amount of dilation etc. to remain below a problem-dependent threshold. Figure 5 depicts three different objects on the Isomap which are used as prototypes. It can be seen that points may, of course, be parts of the descriptions of several objects. Also, in many cases a few points may be sufficient for predicting which object the robot currently perceives. A core problem of the approach is to find the starting and end points of sequences during training. In a symbol anchoring application, the problem is less difficult to overcome: when using a supervised approach, event detection is provided implicitly during training, since prototypical situations are explicitly labelled for the robot.
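A minimal version of such a bounded time-warping distance between two map trajectories might look as follows; this is our sketch, with a Sakoe-Chiba-style band standing in for the "problem-dependent threshold" on dilation mentioned above:

```python
import numpy as np

def dtw_distance(a, b, band=10):
    """Dynamic-time-warping distance between two sequences of map points.
    The band limits how far the sequences may be dilated or shortened
    against each other."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = np.linalg.norm(np.asarray(a[i - 1]) - np.asarray(b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two runs past the same object (one with an inserted point) should be
# closer to each other than to a trajectory from a different object.
run_a = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], dtype=float)
run_b = np.array([[0, 0], [1, 0], [1.1, 0], [2, 0], [3, 0]])  # inserted point
other = np.array([[0, 5], [1, 5], [2, 5], [3, 5]], dtype=float)
print(dtw_distance(run_a, run_b) < dtw_distance(run_a, other))  # True
```

The warping absorbs inserted points and moderate temporal dilation, which is exactly the kind of deviation shown in Figure 4.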
Figure 5 Three objects depicted as trajectories on the Isomap. Small squares mark the starting points of the sequences. Many sequences start in a region around “idle” points, i.e. points with little sensor activity. (See text for further description.)
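The role of such "idle" points as natural sequence boundaries can be sketched as follows; this is our illustrative code with an assumed activity threshold, not the paper's sensor preprocessing:

```python
import numpy as np

def segment_by_idle(readings, idle_threshold=0.1):
    """Split a stream of sensor vectors into candidate object sequences,
    cutting wherever total sensor activity is near zero ("idle" points
    indicating empty space between objects)."""
    active = np.linalg.norm(readings, axis=1) > idle_threshold
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                      # an object sequence begins
        elif not is_active and start is not None:
            segments.append((start, i))    # an idle point ends the sequence
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments

# Toy stream: two bursts of sensor activity separated by idle readings.
readings = np.array([[0.0, 0.0], [0.5, 0.2], [0.6, 0.1],
                     [0.0, 0.0], [0.0, 0.0], [0.4, 0.4], [0.0, 0.0]])
print(segment_by_idle(readings))  # [(1, 3), (5, 6)]
```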
An important feature of this approach is that during recognition it does not need complicated event detection for finding the start points of objects. This information is already present in the collected prototypes. 4. Discussion The approach to recognizing objects and situations follows a clear two-stage process. In the first step, the robot learns about the statistical properties of its interactions with the environment and constructs a lower-dimensional embedding of its higher-dimensional sensor space. In the second step, the robot is trained to recognize objects by memorizing sequences of points on the map. Two questions arise in this context: what happens if the robot should learn about a sequence not present during the learning of the Isomap, and how can the robot recognize start and end points of the sequences? 4.1 Previously Unseen Situations The rationale behind our algorithm is to base learning of objects on sufficiently trained Isomap embeddings of the robot’s experiences. It is, however, not necessary that all the objects to be classified are already present during Isomap training as long as the Isomap receives sufficient information about the structure of the robot-environment interaction. As an example consider the case depicted in Fig. 6. Here, the robot passes a small tube that was not present in the part of the data set used for Isomap generation. However, in principle, the sensory experience during recognizing this object is similar to other situations, so that the algorithm can easily construct a sequence of points for this situation. Fig. 6 also depicts the Isomap sequence for passing the same tube again at a slightly different angle.
Figure 6 Trajectories of points on the Isomap (left) based on sensor readings as the robot passes a tube in the test environment. The two sequences correspond to two distinct runs in which the robot passed the same object at different angles and velocities. Both runs amount to approx. 60 points. The right graph depicts the sensor values during one run (for 4 sensors).
If, however, the sensor vectors become too different from those that were originally used for Isomap training, re-calculation of the Isomap becomes necessary. 4.2 Event Points Apart from simply telling the robot when exactly it passes an object, it is often possible to use significant points during learning to correctly find start and end points of objects. One simple approach uses “idle” points in the sensory data. These idle points correspond to low sensor activity (a vector of all zeros or other values corresponding to “no object” readings of the sensors), which can be a good indicator for empty space between objects to be recognized. Figure 5 shows that in the simple test case, all three object descriptions start at or near such an “idle” point. This nicely reflects the structure of the environment used in our experiments. For nearly all objects, there is some “empty” space between these objects. More generally, it is also possible to use “difference” points for this purpose. These are points that mark significant changes in the stream of sensory values. Note that the Isomap itself can serve as an indicator for finding such difference points since they often correspond to significant jumps on the map. This simply follows from the properties of the lower-dimensional embedding. In all these cases, however, the system can no longer rely on “one shot” learning, and statistical techniques need to be employed for finding the right start and end points. 4.3 Further Work The data presented here describe our first experiments with the Isomap approach to recognizing objects and situations in robot-environment interaction. While the technique works well for selected test cases, more evaluation and automation of the procedures is necessary. This concerns primarily the detection of start and end points for object-type
situations. We are also working on developing an incremental variant of the Isomap training so that the re-learning of the map can be performed on-line. The map itself has many interesting features, such as that nearby points on the map correspond to similar sensor readings. This can be used for short-term extrapolation of current trajectories on the map, i.e. for predicting what will happen next. The most interesting feature of the Isomap is that it performs a statistically sound dimensionality reduction and visualization of the data typically encountered in robot-environment interaction, and that it has excellent convergence properties. Acknowledgements This research is supported by the European Commission’s Information Society Technology Programme project “SIGNAL” IST-2000-29225. Partners in this project are University of Bonn, Napier University, Istituto di Elettronica e di Ingegneria dell’Informazione e delle Telecomunicazioni at Genoa, and the Austrian Research Institute for AI, which is also supported by the Austrian Federal Ministry for Education, Science, and Culture. References [1]
A. Billard, K. Dautenhahn, Grounding communication in situated, social robots, Proc. of TIMR 97, Towards Intelligent Mobile Robots Conference, Tech. Rep. UMCS-979-1, Dept of Computer Science, Manchester University, 1997.
[2]
A. Billard, K. Dautenhahn. Experiments in learning by imitation – grounding and use of communication in robotic agents. Adaptive Behavior, 7 (3/4), 1999.
[3]
R.A. Brooks, Intelligence without representation. Artificial Intelligence, 47 (1-3), 1991.
[4]
A. Cangelosi, Evolution of Communication and Language Using Signals, Symbols, and Words, IEEE Transactions on Evolutionary Computation, Vol.5 (2), pp. 93-101, 2001.
[5]
S. Coradeschi, A. Saffiotti, Perceptual anchoring – Anchoring symbols to sensor data in single and multiple robot systems, Special Issue of Robotics and Autonomous Systems, Vol. 43 (1-2), 2003.
[6]
A. Flexer, On the Use of Self-Organizing Maps for Clustering and Visualization, Principles of Data Mining and Knowledge Discovery, pp. 80-88, 1999.
[7]
S. Harnad, Categorical Perception: The Groundwork of Cognition. Cambridge University Press, Cambridge, UK, 1987.
[8]
S. Harnad, The symbol grounding problem, Physica D, 42, pp. 335-346, 1990.
[9]
S. Harnad, Symbol grounding is an empirical problem, Proc.of the 15th Annual Conference of the Cognitive Science Society, Boulder, CO, Lawrence Erlbaum Associates, pp. 169-174, 1993.
[10]
F. Heinen, F.S. Osório, A Robust Hybrid Control Architecture for Autonomous Robots. 2nd International Conference on Hybrid Systems (HIS’02), Santiago, Chile, 2002.
[11]
O.C. Jenkins, M.J. Mataric, Automated modularization of human motion into actions and behaviours. USC Center for Robotics and Embedded Systems, Technical Report CRES-02-002, University of Southern California, 2002.
[12]
J. Kruskal, Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29:115-129, 1964.
[13]
S. Park, D. Lee, W.W. Chu, Fast Retrieval of Similar Subsequences in Long Sequence Databases. Proceedings of the 3rd IEEE Knowledge and Data Engineering Exchange Workshop, 1999.
[14]
P.M. Poelz, E. Hoertnagl, E. Prem, Processing and Clustering Time Series of Mobile Robot Sensory Data. Report TR-2003-10, Austrian Research Institute for Artificial Intelligence, Vienna, Austria, 2003.
[15]
E. Prem, E. Hoertnagl, G. Dorffner, Growing event memories for autonomous robots. In Proceedings of the Workshop On Growing Artifacts That Live, Seventh Int. Conf. on Simulation of Adaptive Behavior, Edinburgh, Scotland, 2002.
[16]
L. Steels, Language games for autonomous robots. In: Semisentient Robots, IEEE Intelligent Systems, 16 (5), pp. 16-22, 2001.
[17]
L. Steels, P. Vogt, Grounding adaptive language games in robotic agents. In: P. Husbands, I. Harvey (eds.), Fourth European Conference on Artificial Life (ECAL97), MIT Press/Bradford Books, Cambridge, MA, pp. 474-482, 1997.
[18]
J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, 2000.