Sonderforschungsbereich 314 Künstliche Intelligenz - Wissensbasierte Systeme
KI-Labor am Lehrstuhl für Informatik IV
Leitung: Prof. Dr. W. Wahlster

VITRA

Universität des Saarlandes
FB 14 Informatik IV
Postfach 151150
D-66041 Saarbrücken
Fed. Rep. of Germany
Tel. 0681 / 302-2363

Bericht Nr. 114

KANTRA - A Natural Language Interface for Intelligent Robots

Thomas Laengle, Tim C. Lueth Eva Stopp, Gerd Herzog, Gjertrud Kamstrup

März 1995

ISSN 0944-7814


KANTRA - A Natural Language Interface for Intelligent Robots

Thomas Laengle*, Tim C. Lueth*, Eva Stopp, Gerd Herzog, Gjertrud Kamstrup**

SFB 314 - Project VITRA, FB 14 Informatik, University of the Saarland, D-66041 Saarbrücken, email: [email protected]
* Institute for Real-Time Computer Systems and Robotics, University of Karlsruhe, D-76128 Karlsruhe, email: [email protected]
** Faculty of Electrical Engineering and Computer Science, Norwegian Institute of Technology, N-7034 Trondheim, Norway, email: [email protected]

Abstract

The future use of advanced technical systems, for example robots in the area of service and maintenance, will place high demands on man-machine and machine-man interaction in order to make these systems more easily accessible to human operators. Natural language can therefore be an efficient means of using robots in a more flexible manner: information can be conveyed in varying degrees of condensation, and communication can be performed on different levels of abstraction in an application-specific way. In order to fully exploit the capabilities of natural language access, a dialogue-based approach is used. In this article, we report on the joint efforts of the University of Karlsruhe and the University of the Saarland in providing natural language access to the autonomous mobile two-arm robot KAMRO, which is being developed at IPR. A closer view of the interface architecture is given, and we document how the man-machine interface should be integrated into the control architecture of the robot in order to provide access to all internal information and models that are necessary for autonomous behaviour.

Key Words. Natural Language, Man-Machine Interface, Robot-Human Interaction

1 Introduction

Natural language and robotics are two major areas of Artificial Intelligence, but they have been studied rather independently in the past. Only a few research efforts use natural language as a tool in human-machine interaction; some of them are presented in the next section. As advanced robot systems steadily gain higher intelligence and greater autonomy, the requirements on the design of flexible interfaces for controlling these systems increase. This is, for instance, the case, according to [Rembold et al. 93], for the future use of intelligent robots in manufacturing or as service robots for different applications. Natural language, as the communication medium of humans, is therefore an efficient means of making a technical system more easily accessible to its users [Wahlster 89]. A practical advantage of natural language access is the possibility to convey information in varying degrees of condensation and to communicate on different levels of abstraction in an application-specific way.

In: Proceedings of the 4th International Conference on Intelligent Autonomous Systems

2 State of the Art

Some of the research which has been carried out in order to combine the fields of robotics and natural language processing will now be presented. In [Sondheimer 76], the focus is on the problem of spatial reference in natural-language machine control. The well-known SHAKEY system [Nilsson 84], a mobile robot without manipulators, is able to understand simple commands given in natural language. The work described in [Sato & Hirai 87] concentrates on language-aided instruction for teleoperational control: specific words can be utilized to simplify the specification of teleoperational functions for the instruction of a remote robot system. In [Torrance 94], a natural language interface for a navigating indoor office-based mobile robot is presented. In addition to giving commands and asking questions about the robot's plans, the user can associate arbitrary names with specific locations in the environment. Some theoretical aspects of natural language communication with robot systems from the perspective of computational linguistics are discussed in [Lobin 92]. Other approaches have been concerned with natural language control of autonomous agents within simulated 2D or 3D environments [Badler et al. 91; Chapman 91; Vere & Bickmore 90]. One salient aspect of natural language access to robot systems is the relationship between sensory information and verbal descriptions. Such issues have already been investigated in the field of integrated natural language and vision processing [Bajcsy et al. 85; Herzog & Wazinski 94; Neumann 89; Wahlster et al. 83].

3 The Intelligent Mobile Robot KAMRO

Higher intelligence and greater autonomy of more advanced robot systems increase the requirements on the design of a flexible interface for controlling the system on different levels of abstraction. In the KAMRO (Karlsruhe Autonomous Mobile RObot) project, for example, an autonomous mobile robot (Fig. 1) for assembly tasks is being developed with the capability of recovering from error situations [Lüth & Rembold 94]. KAMRO is a two-arm robot system that consists of a mobile platform with an omnidirectional drive system, two Puma 260 manipulators, and different sensors for navigation, docking, and manipulation. KAMRO is capable of performing assembly tasks (Fig. 2) autonomously. The tasks or robot operations can be described on different levels: assembly precedence graphs, implicit elementary operations (pick, place), and explicit elementary operations (grasp, transfer, fine motion, join, exchange, etc.). A given complex task is transformed by the control architecture (Fig. 3) from the assembly-precedence-graph level to the explicit-elementary-operation level.

Figure 1: The Mobile Robot KAMRO

The generation of suitable sequences of elementary operations depends on the position and orientation of the assembly parts on the worktable, while the execution is controlled by the real-time robot control system. Status and sensor data, which are fed back to the planning system, enable KAMRO to monitor the execution of the plan and to correct it if necessary.
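The refinement from implicit to explicit elementary operations described above can be illustrated with a minimal sketch. The expansion table below is an illustrative assumption, not KAMRO's actual planner; the operation names are taken from the levels listed in the text.

```python
# Hypothetical sketch of task refinement: each implicit elementary
# operation ("pick", "place") expands into a fixed sequence of explicit
# elementary operations. The concrete expansions are assumptions.
IMPLICIT_TO_EXPLICIT = {
    "pick":  ["transfer", "fine-motion", "grasp"],
    "place": ["transfer", "fine-motion", "join"],
}

def refine(implicit_ops):
    """Expand (operation, object) pairs into explicit elementary operations."""
    explicit = []
    for op, obj in implicit_ops:
        for step in IMPLICIT_TO_EXPLICIT[op]:
            explicit.append((step, obj))
    return explicit

plan = refine([("pick", "sideplate"), ("place", "sideplate")])
```

In the real system the generated sequence additionally depends on the position and orientation of the parts on the worktable, which this static table deliberately ignores.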

4 Human-Robot Interaction

Intelligently behaving autonomous robot systems have several sensor systems, e.g., tactile, acoustic, and vision sensors, which provide the perceptual capabilities necessary to explore and analyze their environment. They use this information to generate an environment model. But an intelligent robot is sometimes unable to complete missing information from its sensors and its knowledge base alone. In such a situation, it is an advantage to be able to query the human operator for the missing information. We therefore argue that a natural language interface should not use natural language merely as a command language: there should be a dialogue between the user and the autonomous system to resolve ambiguities and misunderstandings.
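The fallback to a clarification dialogue can be sketched as follows. This is not the KANTRA implementation; the object names and the string-matching heuristic are illustrative assumptions.

```python
# Sketch of dialogue-based disambiguation: when the world model cannot
# resolve a referent uniquely, the system asks the operator instead of
# failing or guessing. Matching by substring is a toy heuristic.
def resolve_object(description, world_model, ask_user):
    """Return the unique object matching `description`, or ask the user."""
    candidates = [obj for obj in world_model if description in obj]
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous or unknown referent: fall back to a clarification question.
    return ask_user(f"Which object do you mean by '{description}'? "
                    f"Candidates: {candidates or 'none known'}")

world = ["left sideplate", "right sideplate", "pendulum"]
# Unique match: resolved directly, no question is asked.
obj = resolve_object("pendulum", world, ask_user=input)
```

With "sideplate" instead of "pendulum", two candidates match and the `ask_user` callback would be invoked.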


Figure 2: The Cranfield Assembly Benchmark

In the context of natural language access we consider four main situations of human-machine interaction:

- Task specification: Operations and tasks to be performed by the robot can be given on different levels of abstraction: from high-level commands like "assemble benchmark", through implicit robot operations, e.g., "pick sideplate", to explicit robot operations like "grasp" or "fine motion". Alternatively, the operator can simply describe the final positions of the objects concerned.

- Execution monitoring: One of the most significant features of autonomous systems is the ability to carry out an assembly mission in different orders. Because of this property, the operator should be informed about what the robot is actually doing: descriptions and explanations can be given in more or less detail.

- Explanation of error recovery: Autonomous systems are normally able to recover from error situations. This capability can cause comprehension problems for the user, because the robot sometimes does not behave as expected. An explanation of why and how plans have been changed therefore increases cooperativeness.

- Updating and describing the environment representation: Since the visual field of an autonomous mobile robot is restricted, geometric and visual data can hardly be complete in dynamic, complex environments. The human operator can aid the robot in maintaining the environment representation by providing additional information in natural language. On the other hand, he should also have the possibility to ask for verbal descriptions of the scene.

Most existing natural language interfaces have been developed in order to provide access to databases or expert systems. In general, three main modules can be distinguished:

Figure 3: Structure of the KAMRO System (the knowledge base and the plan execution system FATE turn an action plan into robot operations; the real-time robot control RT-RCS executes them on KAMRO and reports status back)

- Analysis component: Natural language input must be translated by a parser into a semantic representation encoded in a knowledge representation language.

- Evaluation component: The utterances are then interpreted with respect to the internal world knowledge of the intelligent system. This component forms the interface between the natural language access and the autonomous system. Feedback from the application system is passed back to the dialogue system, which has to contact the user.

- Generation component: The information provided by the evaluation component then has to be translated into natural language utterances, depending on the situational context.

Fig. 4 shows the resulting architecture of our KANTRA system (KAmro Natural language TRAnslator). The autonomous system and the dialogue system must continuously update their environment models; this must be done especially after the execution of a command. In order to analyse an utterance, we must be able to identify the objects it mentions because, in general, one cannot rely on unique identifiers. According to [Herskovits 86], spatial expressions are used to describe the location of an object in order to identify it. Such spatial expressions must be related to visual and geometric information about the environment, i.e., a referential semantics must be defined. After the analysis of an utterance, its result must be transferred to the robot using a representation for the different robot commands. Since the autonomous mobile robot KAMRO is a maximally cooperative system, instructions can be given as short as possible: underspecified information can, to a certain degree, be completed by the robot itself, while some other uncertainties are removed by the dialogue system.

Figure 4: Structure of the Natural Language Interface KANTRA (analysis, evaluation, and generation components, drawing on morpho-syntactic knowledge, conceptual knowledge, encapsulated knowledge of the robot, a user model, and a linguistic dialogue memory, connected to KAMRO's task, environment, and execution representations)

An autonomous system has a planning component which is responsible for the correct execution of plans. If the commands are given by the user, certain error situations can occur; e.g., a manipulator can only place an object if it has picked it up before. This information is often intended by the user but not mentioned in the utterance. Another problem is that a robot only has a certain number of manipulators: if the operator gives a sequence with more pick commands than manipulators, without any place commands between them, the robot will not be able to perform the instructions.
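The manipulator-capacity constraint just described can be checked before execution. The following is a minimal sketch under the stated assumption that KAMRO's two arms are the only resources; the function name and command encoding are illustrative, not part of the actual planner.

```python
# Sketch of a pre-execution consistency check: a command sequence is
# infeasible if at some point more objects are held than the robot has
# manipulators (two for KAMRO).
def feasible(commands, num_manipulators=2):
    """Check that picks never exceed the number of free manipulators."""
    held = 0
    for op, _obj in commands:
        if op == "pick":
            held += 1
            if held > num_manipulators:
                return False
        elif op == "place":
            held = max(0, held - 1)  # placing frees a manipulator
    return True

feasible([("pick", "a"), ("pick", "b"), ("pick", "c")])   # False: third pick
feasible([("pick", "a"), ("place", "a"), ("pick", "b")])  # True
```

A real check would also enforce the first constraint mentioned above, that an object can only be placed after it has been picked up.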

5 Environment Representation

The correct environment representation must be permanently accessible to KAMRO and its natural language interface. For this purpose, the robot uses one of its visual sensors, the overhead camera, to record the situation on the workbench (Fig. 5). In order to make this information available to the KANTRA system, it is stored in a common database. Since the world representation changes over time, it is important to store a timestamp with each snapshot: this way, it is possible to merge older and newer knowledge about the environment. For each object, the database contains the following information:
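The timestamp-based merging described above can be sketched as follows. The database shape and field names are illustrative assumptions, not the actual KAMRO schema.

```python
# Sketch of merging timestamped camera snapshots into a shared object
# database: an entry is overwritten only by a strictly newer observation.
def update_database(db, snapshot, timestamp):
    """Merge a timestamped snapshot {name: pose} into the database."""
    for name, pose in snapshot.items():
        entry = db.get(name)
        if entry is None or entry["timestamp"] < timestamp:
            db[name] = {"pose": pose, "timestamp": timestamp}
    return db

db = {}
update_database(db, {"sideplate": (0.10, 0.25, 90.0)}, timestamp=1)
update_database(db, {"sideplate": (0.12, 0.25, 90.0)}, timestamp=2)
# db["sideplate"] now holds the newer pose observation.
```

Stale snapshots arriving out of order are simply ignored, so both KAMRO and KANTRA always read the most recent knowledge about each object.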