Concurrent Object Identification and Localization for a Mobile Robot

Hans A. Kestler, Stefan Sablatnög, Gerhard K. Kraetzschmar, Steffen Simon, Stefan Enderle, Axel Baune, Friedhelm Schwenker, Günther Palm
Abstract

Identification and localization of task-relevant objects is an essential problem for advanced service robots. We integrate state-of-the-art techniques for both object identification and object localization to solve this problem. Based on a multi-level spatial representation architecture, our approach integrates methods for mapping, self-localization, and spatial reasoning for navigation with visual attention, feature detection, and hierarchical neural classifiers. By combining probabilistic representations with qualitative spatial representations, the robot can robustly localize and navigate to previously detected objects, and can also associate symbolic knowledge with task-relevant objects, which is essential for task planning and interaction with humans.
1 Introduction and Motivation

Robots like MINERVA [25] convincingly demonstrate the state of the art in the field of mobile robots. Researchers are now following various directions to extend the capabilities of mobile robots [12, 26]. One direction of particular interest in our group is to enhance the robot’s capability to identify and locate objects [16, 19], which become task-relevant either immediately or at a later point in time. This capability would allow the robot to perform new kinds of tasks, which most current robots cannot perform. A simple example is to find a well-known object, which happens to be on different desks over time (“Find the French dictionary!”). The described capability is obviously also a prerequisite for most object manipulation capabilities.

For illustration, we introduce a very rudimentary scenario (see Figure 1). The task of the robot is to navigate securely within a typical office environment and to update the locations of a set of objects with simple geometrical shapes. Whenever the robot enters a room, it should scan the room, verify the presence and location of previously detected objects, update their locations, and register newly discovered objects. On demand, the robot should be able to present a list of objects it has seen, together with symbolic descriptions of their locations, to a human user, e.g. by sending email. The French dictionary mentioned above could occur in the list as “A French dictionary was seen on Birgit’s desk in room 4.27 at 12:30”.

In order to successfully manage such a task, the robot must have functionality a) to represent symbolic information about objects (object representation), e.g. descriptive identifiers and object attributes, b) to represent spatial information about objects and relate them to spatial knowledge about the environment (object localization), and c) to visually identify objects (object identification). Accordingly, the software architecture consists of three major components (see Figure 2).

Figure 1: A typical scene for the object identification and localization task.

Figure 2: A rough sketch of the software architecture, consisting of three components: object representation (knowledge base with descriptive identifiers and object attributes; spatial inference), object identification (visual attention, feature detection, hierarchical classification), and object localization (multi-level spatial representations, sensor interpretation, sensor fusion, spatial abstraction, spatial queries).

In order to properly localize an identified object, relative position information derived from camera images is combined with the robot’s own position and symbolically represented knowledge about the object (e.g. its size). Both object localization and identification depend on noisy sensor information; in order to deal with the resulting uncertainties, probabilistic representations are of special interest. If an object was previously known, the localization component merely updates the object’s position. Registration of a new object encompasses the creation of an appropriate object instance in the knowledge base and the mapping of the object in the localization component.
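To make the combination of relative and absolute position information concrete, the following is a minimal sketch, assuming a flat 2D world model and a bearing/distance parametrization of the camera measurement; the function and parameter names are hypothetical, not the system’s actual interface.

import math

def object_world_position(robot_x, robot_y, robot_theta, rel_bearing, rel_distance):
    """Combine the robot's estimated pose with a camera-derived relative
    measurement (bearing and distance to the object) to obtain the
    object's position in world coordinates.

    robot_theta and rel_bearing are in radians; rel_bearing is measured
    relative to the robot's heading.
    """
    angle = robot_theta + rel_bearing
    obj_x = robot_x + rel_distance * math.cos(angle)
    obj_y = robot_y + rel_distance * math.sin(angle)
    return obj_x, obj_y

# Example: robot at (2.0, 3.0) facing 90 degrees, object seen 1.5 m away,
# 30 degrees to the right of the heading.
print(object_world_position(2.0, 3.0, math.pi / 2, -math.pi / 6, 1.5))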
2 Object Representation
The knowledge base used for our project provides a semantic hierarchy of object classes needed to model typical office environments. Object instances have descriptive identifiers, which are useful for interaction with humans, and attributes describing object features like color, weight, size, etc. Further object attributes are linked to data objects in other representation mechanisms, thereby serving as a means to combine symbolic and subsymbolic information. For example, each physical object is associated with a region in a region map, which represents its position and extent in the object localization component. The knowledge base was implemented with the LOOM knowledge representation system [18]. In addition to LOOM, we used a visual environment modelling tool [10], which allows three-dimensional visualizations of environments and uses quantitative information provided in the LOOM knowledge base. All quantitative calculations are performed outside of LOOM and are restricted to collision detection between the three-dimensional models of the objects under consideration [10]. The integration of subsymbolic (typically neural network-based) and symbolic representations (neurosymbolic integration) is a core research interest of our group. Topics of particular interest are space, time, and uncertainty. Representing space, time, and uncertainty in symbolic knowledge representation systems is quite hard. Although well-known calculi exist for dealing with qualitative aspects of spatial [5] and temporal [1] knowledge, efficient implementations are difficult to obtain. As a consequence, we decided to handle these issues at lower representation levels, using special mechanisms.
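As a rough illustration of this linkage between symbolic and subsymbolic information, the following Python sketch models an object instance whose symbolic attributes point to a region in the region map. The class and attribute names are our assumptions for illustration only and do not reflect the actual LOOM model.

from dataclasses import dataclass, field

@dataclass
class Region:
    """Placeholder for a region-map entry: position and extent of an object."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

@dataclass
class ObjectInstance:
    """Symbolic object description linked to its subsymbolic region."""
    identifier: str            # descriptive identifier, e.g. "french-dictionary-1"
    object_class: str          # semantic class in the hierarchy, e.g. "book"
    attributes: dict = field(default_factory=dict)  # color, weight, size, ...
    region: Region | None = None  # link into the object localization component

dictionary = ObjectInstance(
    identifier="french-dictionary-1",
    object_class="book",
    attributes={"color": "blue", "size_cm": (20, 13, 5)},
    region=Region(4.10, 2.35, 4.30, 2.48),
)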
3 Object Localization

The object localization component provides essential services to a variety of other components in our robot’s software architecture:

- Building and maintaining spatial representations suitable for navigation, including maps for path planning, trajectory planning, and collision avoidance.
- Determining the position and orientation of the robot within these environment models (self-localization).
- Localizing objects, which includes spatial abstraction from low-level representations and associating such abstractions with symbolic representations of objects in the knowledge base.
- Answering spatial queries, including the instantiation of relational primitives in qualitative spatial representations (a sketch of such a primitive follows Figure 3).

As these services span a wide variety of representational requirements, we decided to build a multi-level spatial representation system, called DYNAMO (see Figure 3).
Figure 3: The DYNAMO spatial representation architecture. The levels are: knowledge base (qualitative symbolic knowledge), region map (quantitative grounding of symbols, e.g. regions A and B with the qualitative relation “A tpp B”), and occupancy map (temporal and multimodal sensor fusion).
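The relation “tpp” (tangential proper part) shown in Figure 3 is one of the RCC-8 relational primitives that spatial queries can instantiate. As a minimal sketch of such an instantiation, the following classifies the RCC-8 relation between two regions; the restriction to axis-aligned rectangles and the function name are our simplifying assumptions, not the DYNAMO implementation.

def rcc8_rect(a, b, eps=1e-9):
    """Classify the RCC-8 relation between two axis-aligned rectangles
    a and b, each given as (x_min, y_min, x_max, y_max).

    Returns one of: 'dc' (disconnected), 'ec' (externally connected),
    'po' (partial overlap), 'tpp' (tangential proper part),
    'ntpp' (non-tangential proper part), 'eq' (equal),
    plus the inverses 'tppi' and 'ntppi'.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b

    # No shared points at all -> disconnected.
    if ax1 < bx0 - eps or bx1 < ax0 - eps or ay1 < by0 - eps or by1 < ay0 - eps:
        return "dc"

    a_in_b = ax0 >= bx0 - eps and ay0 >= by0 - eps and ax1 <= bx1 + eps and ay1 <= by1 + eps
    b_in_a = bx0 >= ax0 - eps and by0 >= ay0 - eps and bx1 <= ax1 + eps and by1 <= ay1 + eps

    if a_in_b and b_in_a:
        return "eq"
    if a_in_b:
        # Touching b's boundary makes a a *tangential* proper part.
        touches = (abs(ax0 - bx0) < eps or abs(ay0 - by0) < eps or
                   abs(ax1 - bx1) < eps or abs(ay1 - by1) < eps)
        return "tpp" if touches else "ntpp"
    if b_in_a:
        touches = (abs(bx0 - ax0) < eps or abs(by0 - ay0) < eps or
                   abs(bx1 - ax1) < eps or abs(by1 - ay1) < eps)
        return "tppi" if touches else "ntppi"

    # Interiors overlap -> partial overlap; boundary-only contact -> ec.
    if ax1 > bx0 + eps and bx1 > ax0 + eps and ay1 > by0 + eps and by1 > ay0 + eps:
        return "po"
    return "ec"

# Example: A sits inside B and shares B's left edge -> 'tpp'.
print(rcc8_rect((0.0, 1.0, 2.0, 3.0), (0.0, 0.0, 5.0, 5.0)))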
3.1 Occupancy Maps

On our robot, occupancy maps [6] are probabilistic spatial representations which are automatically built and maintained by interpreting data from distance sensor systems like sonar rings and laser scanners. We use the generic architecture proposed in [7], which divides the map-building task into three subtasks: a) sensor interpretation, b) sensor fusion, and c) temporal integration.

Sensor Interpretation. The sensor interpretation task for a specific sensor class (e.g. laser scanner, sonar sensors, etc.) transforms a single sensor scan $s$ into an egocentric occupancy map $m$, and is formally modeled as a mapping $I: s \mapsto m$. The interpretation functions $I_{\text{sonar}}$ and $I_{\text{laser}}$ are different, because the sensors have different physical characteristics, which are reflected in the sensors’ data. The sonar scan, which consists of 24 distance readings, is interpreted by a neural network architecture proposed in [8]. The network determines the distance to the nearest sonar-detectable obstacle in a given direction (see Figure 4). This approach performs early fusion of the data obtained by different sonar sensors and is able to correct errors that would be made by looking at only a single sonar reading. The input of the preprocessing module consists of the sensor scan $s = (s_1, \ldots, s_{24})$, as well as the angle $\alpha_{xy}$ and the distance $d_{xy}$ to the desired point $(x, y)$. The preprocessing step mainly extracts a partial scan $s'$ which holds only the readings of the $k$ sensors pointing in direction $\alpha_{xy}$. This step was proposed in [24] and reduces the network size. Unlike Thrun, we represent $\alpha_{xy}$ relative to the “middle” sonar. Formally, let $\gamma_i$ be the direction of sonar $i$; the preprocessing step then determines the sonar pointing closest to $\alpha_{xy}$ and extracts the partial scan $s'$ around it.
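The following is a minimal sketch of this preprocessing step, assuming a ring of 24 equally spaced sonars; the concrete sensor geometry and all names are our assumptions, not the system’s actual code.

import math

NUM_SONARS = 24
# Assumed geometry: sonar i points in direction i * 15 degrees on a ring.
SONAR_DIRECTIONS = [2 * math.pi * i / NUM_SONARS for i in range(NUM_SONARS)]

def angle_diff(a, b):
    """Smallest signed difference between two angles in radians."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi

def partial_scan(scan, alpha, k):
    """Extract the readings of the k sonars pointing closest to the
    target direction alpha, centered on the 'middle' sonar, and return
    them together with alpha expressed relative to that middle sonar.

    scan  -- sequence of 24 distance readings
    alpha -- direction of the desired point (x, y), robot-centric, radians
    k     -- number of sensors to keep (odd, so the middle is unique)
    """
    # Index of the sonar whose direction is closest to alpha.
    mid = min(range(NUM_SONARS),
              key=lambda i: abs(angle_diff(SONAR_DIRECTIONS[i], alpha)))
    half = k // 2
    indices = [(mid + j) % NUM_SONARS for j in range(-half, half + 1)]
    readings = [scan[i] for i in indices]
    # Present alpha relative to the middle sonar, as described in the text.
    alpha_rel = angle_diff(alpha, SONAR_DIRECTIONS[mid])
    return readings, alpha_rel

# Example: a uniform scan, querying 60 degrees with a 5-sensor window.
scan = [2.5] * NUM_SONARS
print(partial_scan(scan, math.radians(60), k=5))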