
Semisentient Robots

Training Personal Robots Using Natural Language Instruction
Stanislao Lauria, Guido Bugmann, and Theocharis Kyriacou, University of Plymouth
Johan Bos and Ewan Klein, University of Edinburgh


As domestic robots become pervasive, uninitiated users will need a way to instruct them to adapt to their particular needs. The authors are designing a practical system that uses natural language to instruct a vision-based robot.

Intelligent robots must execute reasonably complicated tasks with some degree of autonomy. This requires both adaptivity to a dynamic environment and the ability to generate plans. Domestic robots in particular must adapt to their user’s special needs. However, most users are naive about computer language and thus cannot personalize robots using standard programming methods. Indirect methods, such as learning by reinforcement or imitation, are also inappropriate for acquiring user-specific knowledge. Learning by reinforcement, for example, is a lengthy process and, while suitable for refining low-level motor controls, is impractical for complex tasks. Learning by imitation also has limited scope. Neither method can readily generate knowledge representations that the user can interrogate.

We are currently exploring an alternative method: Instruction-Based Learning (IBL), which trains robots using natural language instruction. As the “Related Work” sidebar explains, most previous work in this area has focused on issuing commands or language learning. IBL uses unconstrained language in a real-world robotic application that learns prior to execution. It thus offers several potential advantages. Natural language can concisely express rules and command sequences. Also, because it uses symbols and syntactic rules, it is well suited to interact with robots that represent knowledge at the symbolic level. Such symbolic communication can help robots learn faster [1] than those that learn at the sensory-motor association level.

Here we describe our initial steps toward realizing an IBL system. Along with an overview of how IBL works, we discuss its key steps in detail, our progress on each, and the challenges we’re encountering along the way.

How IBL works
In IBL, we convert the user’s verbal instructions into new internal program code that represents new procedures. Such procedures become part of a procedure pool that the robot reuses to learn increasingly complex procedures. Hence, the robot should be capable of executing increasingly complex tasks.

Process overview
We start this process with predefined initial knowledge. This “innate” knowledge consists of primitive sensory-motor procedures with names, such as turn_left or follow_the_road. These names constitute the symbols; the pieces of computer program that execute procedures are the actions (see Figure 1a). We ground the symbols by associating each one with an action [2]. When a user explains a new procedure to the robot (say, a route from A to B that involves several primitive actions), the IBL system assigns a name to the new procedure and writes a new piece of program code to execute it. Within that code, primitive actions are referred to by name. The system does not duplicate the low-level code defining these primitives. Given this, the new program can be seen as a combination of symbols rather than a combination of actions (see Figure 1b). Because all new procedures are constructed from grounded symbols, they become grounded by inheritance and thus the system can “understand” them when they are referred to in natural language. When explaining a new procedure, users can also refer to old procedures they’ve previously defined. In this way, they increase the complexity of the robot’s symbolic knowledge (see Figure 1c). Generating program code is thus a key part of IBL; we accomplish this using the Python scripting language.

Error detection
Human–machine communication is notoriously error-prone. Error types range from the robot misrecognizing a word to the user issuing incomplete instructions. One way to detect errors is to verify whether the learned action sequence is executable. To this end, we represent each procedure as a state-action-state (SAS) triplet, S_i A_ij S_j, with properties similar to productions in the State Operator and Result system (SOAR) [3]. The state S_i is the precondition for action A_ij, and the state S_j is the final state, resulting from action A_ij applied to the initial state S_i. To realize a sequence of actions, the final state of one action must be compatible with the precondition of the next one. If necessary, the system will attempt to satisfy this condition by asking the user for more information.

Toward an application
To evaluate IBL’s potential and limitations, we’ve developed a real-world instruction task that is simple enough to realize and generic enough to produce conclusions that will hold for other task domains. Our environment is a miniature town that is 170 cm × 120 cm (see Figure 2). Our task is a simple route scenario that uses real speech input and a vision-based robot to navigate the instructed route [4]. We’ll now discuss in detail the key components of IBL (corpus collection and data analysis, natural language instruction, and robot manager design) and follow this with a general discussion of our work.

Corpus collection and data analysis
In the corpus collection phase, we record samples of task-specific dialogues. We then analyze the data to create a functional vocabulary.

1094-7167/01/$10.00 © 2001 IEEE
SEPTEMBER/OCTOBER 2001

Related Work
Previous work on verbal communication with robots has mainly focused on issuing commands, that is, on activating preprogrammed procedures using a limited vocabulary. An example of this type of work is the office navigation contest held in 1995 at the International Joint Conference on Artificial Intelligence. Other work has focused on language learning, such as that by Deb Roy and his colleagues [1]. Only a few research groups have considered learning as the stable and reusable acquisition of new procedural knowledge. One inspiring project was InstructoSOAR [2]. This system used textual input into a simulation of a manipulator with a discrete state and action space. Another investigation [3] used voice input to teach displacements within a room and mathematical operations, but with no reusability. In other research [4], textual input was used to build a graph representation of spatial knowledge. This system was brittle because researchers used odometric data for place recognition and IR sensors for reactive motion control. Also, knowledge acquisition was concurrent with navigation, not prior to it. Concurrent learning was also used in Jijo-2, an “office conversant mobile robot” [5]. More recent projects with related scope are CARL [6] and Hermes [7].

References
1. D. Roy, Learning Words from Signs and Sounds: A Computational Model, doctoral dissertation, MIT Media Laboratory, Cambridge, Mass., 1999.
2. S.B. Huffman and J.E. Laird, “Flexibly Instructable Agents,” J. Artificial Intelligence Research, vol. 3, 1995, pp. 271–324.
3. C. Crangle and P. Suppes, “Language and Learning for Robots,” CSLI Lecture Notes No. 41, Center for the Study of Language and Information, Stanford, Calif., 1994.
4. M.C. Torrance, Natural Communication with Robots, MSc thesis, MIT Dept. of Electrical Engineering and Computer Science, Cambridge, Mass., 1994.
5. H. Asoh et al., “Socially Embedded Learning of the Office-Conversant Mobile Robot, Jijo-2,” Proc. 15th Int’l Joint Conf. Artificial Intelligence (IJCAI’97), Springer-Verlag, Berlin, 1997, pp. 880–885.
6. L. Seabra Lopes and A.J.S. Teixeira, “Human-Robot Interaction through Spoken Language Dialogue,” Proc. IEEE/RSJ Int’l Conf. Intelligent Robots and Systems, IEEE CS Press, Los Alamitos, Calif., 2000.
7. R. Bischoff and T. Jain, “Natural Communication and Interaction with Humanoid Robots,” Second Int’l Symp. Humanoid Robots, 1999; available online at campus/LRT6/staff/bif/bif_a17.htm (current 10 Sept. 2001).

Figure 1. IBL’s symbolic learning process (diagram labels: symbolic level, innate links, action level). (a) A schematic representation of the initial system, comprising symbols associated with preprogrammed primitive action procedures. (b) We define a new procedure (the open circle) as a combination of symbols. The new procedure is grounded because we construct it out of grounded symbols. (c) We define a new procedure by combining one we’ve previously defined with primitive action procedures.
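The composition scheme in Figure 1 can be illustrated with a small sketch. It is not the authors’ implementation: the class ProcedurePool, its method names, and the example routes go_to_post_office and go_to_library are hypothetical, and only the primitive names turn_left and follow_the_road come from the text. The sketch stores learned procedures as lists of symbols, so a new procedure is accepted only when every symbol it uses is already grounded.

```python
class ProcedurePool:
    """Hypothetical pool of named procedures (symbols).

    Primitives are linked to callables (the "innate links" of Figure 1a).
    Learned procedures are stored as lists of symbols, not as copied code.
    """

    def __init__(self):
        self._actions = {}   # symbol -> primitive callable
        self._learned = {}   # symbol -> list of symbols it is built from

    def add_primitive(self, symbol, action):
        self._actions[symbol] = action

    def is_grounded(self, symbol):
        return symbol in self._actions or symbol in self._learned

    def learn(self, symbol, parts):
        # A new procedure is grounded by inheritance: all parts must be known.
        unknown = [part for part in parts if not self.is_grounded(part)]
        if unknown:
            raise KeyError(f"ungrounded symbols: {unknown}")
        self._learned[symbol] = list(parts)

    def run(self, symbol, log):
        if symbol in self._actions:
            self._actions[symbol](log)
        else:
            for part in self._learned[symbol]:
                self.run(part, log)


pool = ProcedurePool()
pool.add_primitive("turn_left", lambda log: log.append("turn_left"))
pool.add_primitive("follow_the_road", lambda log: log.append("follow_the_road"))

# Figure 1b: a new procedure built only from primitives.
pool.learn("go_to_post_office", ["follow_the_road", "turn_left"])
# Figure 1c: a new procedure that reuses a previously learned one.
pool.learn("go_to_library", ["go_to_post_office", "follow_the_road"])

trace = []
pool.run("go_to_library", trace)
```

Running go_to_library expands the learned symbols down to primitive actions, which is the sense in which a learned procedure inherits its grounding.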






Figure 2. The IBL experimental environment. (a) Users give the robot route instructions to navigate through the miniature town. Letters indicate the destinations and origins of various routes, including the museum (M), the post office (X), and the library (Y). The miniature robot (b) is 8 cm × 8 cm at base and carries a CCD color TV camera (628 × 582 pixels), a TV VHF transmitter, and an FM radio. (c) The view from the onboard color camera. Images are processed by a PC, which acquires images with a TV capture card. The PC sends the robot motion commands through an FM radio.

Data collection
To collect linguistic and functional data specific to route learning, we recorded 24 subjects as they gave the robot route instructions in the environment. We divided the subjects into three groups of eight. The first two groups (A and B) used totally unconstrained speech and provided the performance baseline. We assume that a robot that can understand these instructions as well as a human operator represents the ideal standard. Subjects from group C communicated with a remote human operator, who then induced shorter utterances based on brief interactions with each subject. We told the subjects in groups A and B that a human operator would use their recorded instructions to teleoperate the robot at a later date. We specifically told them that the human operator would be located in another room, seeing only the image from the robot’s wireless onboard video camera (see Figure 2c). This encouraged subjects to use a camera-centered viewpoint relevant for future robot autonomous navigation. Each subject described six routes that had the same starting point and six different destinations. We changed the starting points after every two subjects and collected a total of 144 route descriptions. More details on the corpus collection and analysis are available elsewhere [4].

Dialogue sample
We derived the following sample from our corpus collection, and we’ll use it throughout to explain IBL and its implementation. In the sample, the user asks the robot to go from the museum to the library. The robot responds with a request for explanation. The user replies with three utterances to explain the route, using the post office route explained earlier in the session. The robot’s initial position is the museum (M in Figure 2a).

User: Go to the library.
Robot: How do I go to the library?
User: Go to the post office, go straight ahead, the library is on your left.

Figure 3. Average number of unique procedures as a function of the number of collected route instructions (y-axis: average unique functions; x-axis: route descriptions). We obtained the curve by averaging over 50 sets comprising N route descriptions randomly sampled from 144 collected descriptions. The x-axis shows the number N. The slope of the curve indicates that, on average, we’ll add one new function to the functional lexicon for every 38 additional route instructions collected.

Table 1. Primitive navigation procedures collected from route descriptions.

No.  Count      Primitive procedure
1    (missing)  MOVE FORWARD UNTIL [(past | over | across) ] | [(half_way_of | end_of) street ] | [ after [left | right]] | [road_bend]
2    183        TAKE THE [] turn [(left | right)] | [(before | after | at) ]
3    147        IS LOCATED [left | right | ahead] | [(at | next_to | left_of | right_of | in_front_of | past | behind | on | opposite | near) <landmark>] | [(half_way_of | end_of | beginning_of | across) street] | [between and ] | [on turning (left | right)]
4    62         GO (before | after | to)
5    49         GO ROUND ROUNDABOUT [left | right] | [(after | before | at) ]
6    42         TAKE THE EXIT [(before | after | at) ]
7    12         FOLLOW KNOWN ROUTE TO UNTIL (before | after | at)
8    4          TAKE ROADBEND [left | right]
9    4          STATIONARY TURN [left | right | around] | [at | from ]
10   2          CROSS ROAD
11   2          TAKE THE ROAD in_front
12   2          GO ROUND TO [front | back | left_side | right_side]
13   1          PARK AT
14   1          EXIT [car_park | park]

Corpus analysis: The functional vocabulary
Our aim in corpus analysis is twofold. First, we define the users’ vocabulary to tune the speech recognition system for optimal task performance. Second, we establish the functional vocabulary, a list of primitive procedures that users refer to in their instructions. We then preprogram these so that the robot can directly translate from natural language to the grounded symbols (we discuss task vocabulary in more detail elsewhere [4]). To functionally analyze the corpus from our experiments, we merged groups A, B, and C. Our discussion here of instruction annotation in terms of procedures is somewhat subjective and influenced by two considerations:

1. The defined primitives will be realized as computer programs. Therefore, we transcribed the corpus into a few general procedures, characterized by several parameters (see Table 1).
2. Knowledge representation is an important issue. According to the SAS (state-action-state) representation described earlier, primitive procedures always have a starting and terminating condition. However, subjects rarely specified explicitly the starting point of an action, nor did they always define the final state in the same utterance. We assumed that the IBL system would retrieve missing information from the context. Therefore, we assumed that all actions that the subjects referred to were of the SAS type. For example, when a subject specified a nonterminating action, such as keep going, we classified it as move forward until, assuming that the system would infer a termination point from the next specified action.
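The assumption that a nonterminating action such as keep going takes its termination point from the next specified action can be sketched as follows. This is only an illustration: the dictionary keys procedure and until, the start_of: marker, and the function name infer_terminations are hypothetical and are not taken from the IBL system.

```python
def infer_terminations(actions):
    """Fill in missing termination conditions from the following action.

    `actions` is a list of dicts with a "procedure" name and an "until"
    condition (None when the user gave none, as in "keep going").
    Returns the completed list plus the indexes that are still open and
    would have to be settled by asking the user.
    """
    resolved, unresolved = [], []
    for index, action in enumerate(actions):
        completed = dict(action)  # do not modify the caller's data
        if completed.get("until") is None:
            if index + 1 < len(actions):
                # Terminate where the next specified action begins.
                next_name = actions[index + 1]["procedure"]
                completed["until"] = "start_of:" + next_name
            else:
                unresolved.append(index)
        resolved.append(completed)
    return resolved, unresolved


route = [
    {"procedure": "move_forward_until", "until": None},   # "keep going"
    {"procedure": "take_turn", "until": "turn_completed"},
]
completed_route, open_items = infer_terminations(route)

last_only, still_open = infer_terminations(
    [{"procedure": "move_forward_until", "until": None}]
)
```

When the open-ended action is the last one, nothing follows it to supply a termination point, so the sketch reports it as unresolved rather than guessing.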

Our analysis methodology differs from other, similar analyses [5]. In our analysis, there are no statements describing landmarks (these are integrated into the procedure specifications), and consequently there are no actions without reference to landmarks either. We annotated the instructions manually because there is no off-the-shelf tool to do it automatically. Table 1 shows the list of procedures we found in the route descriptions. Most subjects used Procedure 3 to indicate a route’s last leg, when the goal is in sight. Figure 3 shows that the number of distinct procedures increases with the number of sampled instructions. In our case, we discovered an average of one new procedure for every 38 route instructions. This rate is much lower than the rate at which subjects introduced new words, which averaged about one per route instruction [4]. In Table 1, the new procedures typically appear with the least frequency.

Natural language instruction
The process of understanding instructions in spoken English in the personal robot training domain is typically divided into four subtasks:

• speech recognition,
• linguistic analysis,
• ambiguity resolution, and
• dialogue updating.

To realize these subtasks, we designed a dialogue manager, which acts as an interface between the user and the robot manager (see Figure 4). As the figure shows, the dialogue manager converts speech input into a semantic representation and converts robot manager requests into user dialogue. The system runs its components as different processes communicating with each other through a blackboard architecture. The robot manager listens for new messages from the dialogue manager while processing previous ones using a multithreaded approach. It does this by launching a message-evaluation thread, the execution process, through its communication interface. It then resumes listening to the dialogue manager. The execution process thread’s aim is to understand the dialogue manager’s message and act accordingly.

Figure 4. IBL system architecture. The dialogue manager acts as a bidirectional interface between the robot manager and the user. (Diagram labels: dialogue manager, speech recognizer, utterance generator, Dialogue Move Engine (DME), generation and synthesis, communication interface, knowledge database, communication interface, execution process, process launcher, robot procedure, robot manager.)

Speech recognition
Speech recognition maps acoustic signals into symbolic representations. In general, the speech recognizer’s output is a word lattice (covering all possible recognized patterns), but in its simplest form, it is a list of words associated with a confidence score. Given our limited domain (and a lexicon well under 1,000 words), off-the-shelf speech recognition tools give us enough power to perform the task, even when we account for speaker-independent requirements. The prototype we are currently developing uses Nuance speech tools for speech recognition. Nuance’s speech technology lets you specify a speech recognition package on the basis of its grammar specification language (GSL). Nuance technology uses a GSL grammar for both language modeling and parsing. This contrasts with traditional approaches, in which the speech recognizer’s output is typically a word lattice, which is fed to the parser to filter out nongrammatical interpretations and produce a meaningful representation of the user’s instruction.

We faced various difficulties in successfully integrating a speech recognition component into a working prototype. Some of them have practical importance, such as speaking in noisy environments, speaking in situations with different acoustic qualities, speaking with variable distances between the speaker and the robot’s microphone, or knowing when users are talking to the robots and when they’re not. Other challenges are more theoretical and related to the process of designing a GSL grammar, which we believe should be primarily driven by linguistic knowledge.

Figure 5. The sample dialogue’s discourse representation structure (DRS). The dialogue manager produces the DRS, which is similar to a world model. Discourse referents stand for model objects with assigned properties.

Linguistic analysis
Typically, users manually build GSL grammars for specific applications. We use a rather different approach, compiling the GSL grammars from linguistically motivated unification grammars. In our prototype, the unification grammar encodes linguistic knowledge based on the corpus analysis and is tuned for our domain (talking to mobile robots). The grammar rules support features for both syntactic constraints and for producing logical forms. Because GSL doesn’t support left-recursive rules or a feature-value system, to compile the unification grammar to GSL format we eliminated left-recursive rules within the grammar and replaced syntactic category symbol features (and their possible values). As a consequence, our speech recognition language models are enormous compared to the original unification grammar, but still feasible for small lexicons (a few hundred words, in the case of IBL). More importantly, our models are linguistically motivated. Each word in the lexicon is given a semantic representation, and we compile the syntactic structure’s semantic operations into GSL as well. As a pleasant side effect, speech recognition and semantic construction are integrated into one component. In other words, we combine parsing and compositional semantics in a single component. Hence, the speech recognizer’s direct output is a logical form, rather than a list of words.
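The linguistic analysis step mentions eliminating left-recursive rules before compiling to GSL. The sketch below shows the textbook transformation for immediate left recursion (A → A α | β becomes A → β A_tail, A_tail → α A_tail | empty). It is an illustration of that standard technique only; the text does not say which transformation the authors’ compiler applied, whether GSL accepts empty expansions, or how indirect left recursion was handled. The grammar, the _tail naming, and the function name are hypothetical.

```python
def eliminate_immediate_left_recursion(grammar):
    """Rewrite rules of the form A -> A alpha | beta.

    `grammar` maps each nonterminal to a list of right-hand sides, each a
    tuple of symbols. The empty tuple stands for the empty expansion.
    Only immediate left recursion is handled in this sketch.
    """
    result = {}
    for head, bodies in grammar.items():
        # alpha parts of the left-recursive alternatives (head stripped off)
        recursive = [body[1:] for body in bodies if body and body[0] == head]
        others = [body for body in bodies if not body or body[0] != head]
        if not recursive:
            result[head] = list(bodies)
            continue
        tail = head + "_tail"  # fresh nonterminal, assumed not to clash
        result[head] = [body + (tail,) for body in others]
        result[tail] = [alpha + (tail,) for alpha in recursive] + [()]
    return result


grammar = {
    "NP": [("NP", "PP"), ("Det", "N")],   # NP -> NP PP is left-recursive
    "PP": [("P", "NP")],
}
converted = eliminate_immediate_left_recursion(grammar)
```

The converted grammar accepts the same noun phrases but introduces an extra nonterminal per recursive category, which is one reason a compiled grammar grows relative to its source.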

    import go_museum_postoffice, is_located

    def action():
        go_museum_postoffice.action()                    # user-defined procedure
        is_located.action('library', 'left_position')    # procedure primitive
        return ()

Figure 6. The Python procedure created from Figure 5’s DRS for the museum-to-library route.

IEEE INTELLIGENT SYSTEMS

Ambiguity resolution
Used in isolation, the meaning of natural language can be highly ambiguous. To deal with ambiguities in user utterances, we encode the logical forms as underspecified discourse representation structures (DRSs). Our approach is based on discourse representation theory [6]; our goal is to cover a range of context-sensitive expressions such as pronouns and presupposition triggers. Typically, a DRS captures the entire dialogue between user and robot. Given this, we must embed the meaning representation of a new utterance into the context’s DRS and resolve any ambiguities with respect to that context. Among the phenomena in question are lexical ambiguities, structural ambiguities, and referential ambiguities (such as pronouns). Figure 5 shows the DRS for the example dialogue from earlier; we’ve marked imperative expressions such as go to the post office with the delta operator.

We view DRSs as small models of the world. Within this world model we assert a set of discourse referents (standing for objects in the world model) and assign properties (or relations) to these referents. Graphically, we do this by placing the discourse referents in the upper part of the DRS, and the properties and relations between them in its lower part. Moreover, as the figure shows, DRSs are recursive structures and thus can appear as subcomponents of other DRSs. Discourse referents signify objects for possible later reference, for example, by use of a pronoun. However, a DRS’s internal structure constrains pronoun resolution, which is realized by an accessibility relation. Accessibility is governed by the way DRSs are nested into each other, and hence narrows down the choice of an antecedent in the pronoun-resolution process. Figure 5 shows this internal structure.

Dialogue updates
We view a dialogue as a series of alternating “moves” between the user and robot. Common moves are assertions, questions, or instructions, but moves can also include acknowledgments, acceptances, or denials. As Figure 5 shows, the DRS explicitly represents such moves. The dialogue move engine coordinates the integration of new moves (from the user as well as the robot) and decides the next move (for the robot) using the recent information-state approach to dialogue processing [7]. We realize robot utterances by generating prosodically annotated strings from the DRS and feeding these to a speech synthesizer.

Using DRSs for modeling dialogue has an obvious advantage: they let us make inferences, as there is a standard translation available from DRSs to first-order logic (and for first-order logic, various theorem provers are available [8]). This lets the robot make “intelligent” responses. Inferences are required not only to resolve ambiguities in user input (of a scopal, referential, or lexical nature), but also to

• detect the move associated with a new utterance (for example, did the user answer a question or raise a new issue?),
• plan the next utterance or action, and
• generate natural-sounding utterances (by distinguishing old from new information within an utterance).
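The way nesting constrains pronoun resolution, as described under Ambiguity resolution, can be sketched with a deliberately simplified DRS class. This is an assumption-laden illustration, not the dialogue manager’s data structure: the class DRS, its methods, and the referent names are hypothetical, and accessibility is reduced to “a box can see its own referents and those of the boxes that enclose it,” which leaves out the finer rules of discourse representation theory.

```python
class DRS:
    """Simplified discourse representation structure.

    referents:  discourse referents introduced in this box (upper part)
    conditions: properties, relations, or nested DRSs (lower part)
    parent:     the enclosing DRS, or None for the top-level box
    """

    def __init__(self, referents=(), conditions=(), parent=None):
        self.referents = list(referents)
        self.conditions = list(conditions)
        self.parent = parent

    def sub_drs(self, referents=(), conditions=()):
        # Nested boxes are stored as conditions of the enclosing box.
        child = DRS(referents, conditions, parent=self)
        self.conditions.append(child)
        return child

    def accessible_referents(self):
        # Walk outward: own referents first, then those of enclosing boxes.
        found, box = [], self
        while box is not None:
            found.extend(box.referents)
            box = box.parent
        return found


def resolve_pronoun(box, is_candidate):
    """Return the first accessible referent accepted by is_candidate."""
    for referent in box.accessible_referents():
        if is_candidate(referent):
            return referent
    return None


dialogue = DRS(referents=["library"])
instruction = dialogue.sub_drs(referents=["post_office"])

inside = instruction.accessible_referents()
outside = dialogue.accessible_referents()
antecedent = resolve_pronoun(dialogue, lambda name: name == "post_office")
```

A referent introduced inside the nested instruction box is visible there but not from the enclosing dialogue box, so a pronoun at the top level cannot pick it up.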

Robot manager design
The robot manager analyzes the dialogue manager’s DRSs and controls the robot. As Figure 4 shows, the execution process thread interprets the dialogue manager’s message and initiates the relevant subprocesses. In particular, if the user utterance is an executable command (that is, if there is a corresponding procedure), the execute process starts. Otherwise, if it is an unknown execution command, the learn process starts along with a dialogue manager interaction to resolve the impasse. We also plan to let the user issue a stop command to suspend or end any other process.

We wrote the robot manager using C and the Python scripting language (www. An important feature of scripting languages is their ability to write their own code. For example, when the user gives a route instruction, the robot manager saves it as a Python script, which then becomes part of the procedure set available for execution or future learning.

During learning, the dialogue manager sends the updated DRS to the process launcher, which searches it for user commands (indicated by !δ) or explanations. In Figure 5, an explanation is represented as either a box in square brackets tagged with a δ symbol (such as [δ DRS]), or as a pair made by the predicate state(X) and an X followed by DRS (such as state(X) and X: DRS). For example, the box containing the predicates event(O), agent(O,D), go(O), and to(O,B) is an instruction that is part of the explanation of how to go to the library. Once the process launcher finds such a box, it detects the action (go) and the necessary attributes (to) using action formats stored in the knowledge database. For example, the primitive action go, as defined using the corpus, matches the user instruction and requires the attribute to, which is also in the DRS that the dialogue manager sent. In this way, the process launcher generates a first list that contains the actions and their attributes contained in the user’s instructions.
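The lookup that the process launcher performs on an instruction box can be sketched as below. The predicates event(O), agent(O,D), go(O), and to(O,B) come from the text; everything else is an assumption made for illustration, including the tuple encoding of DRS conditions, the contents of ACTION_FORMATS (standing in for the knowledge database), the direction attribute for turn, and the function name extract_action.

```python
# Hypothetical action formats: action name -> attributes it requires.
ACTION_FORMATS = {"go": ["to"], "turn": ["direction"]}


def extract_action(conditions, action_formats=ACTION_FORMATS):
    """Find the action in one instruction box and collect its attributes.

    `conditions` is a list of tuples such as ("go", "O") or ("to", "O", "B"),
    an assumed encoding of DRS conditions. Returns None when no known
    action is present; otherwise a dict that also lists missing attributes.
    """
    events = {cond[1] for cond in conditions if cond[0] == "event"}
    for predicate, *args in conditions:
        if predicate in action_formats and len(args) == 1 and args[0] in events:
            event = args[0]
            attributes = {}
            for name in action_formats[predicate]:
                values = [
                    cond[2] for cond in conditions
                    if cond[0] == name and len(cond) == 3 and cond[1] == event
                ]
                attributes[name] = values[0] if values else None
            missing = [name for name, value in attributes.items() if value is None]
            return {"action": predicate, "attributes": attributes, "missing": missing}
    return None


go_box = [("event", "O"), ("agent", "O", "D"), ("go", "O"), ("to", "O", "B")]
go_entry = extract_action(go_box)

turn_box = [("event", "E"), ("agent", "E", "D"), ("turn", "E")]
turn_entry = extract_action(turn_box)
```

A non-empty missing list corresponds to the case in which the system has to go back to the user, for example when turn is given without a direction.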
The process launcher then checks whether the user gave the necessary parameters for an action’s requested attributes (for example, post office for the to attribute associated with the go action). The system can also infer the starting point (from(C1,E) and museum(E)) from the context. Finally, the system generates the action’s procedure name (go_museum_postoffice). It then checks to see if the database has a procedure matching that name. If so, it adds the procedure to the program that defines the new procedure. If the procedure is not in the database, or the attributes list is incomplete (for example, if the user action is turn, but no direction is specified), the system will start an interaction with the dialogue manager to solve the impasse and put the learn thread on hold.

Figure 6 shows the procedure go_museum_library after the system has terminated the learning process. According to the user instructions, the first procedure to be called is go_museum_postoffice.action, which allows the robot to reach the post office. When the next procedure (is_located.action('library','left_position')) is called, the system follows the remaining user instructions to reach the library. Note that in Figure 5, the boxes T and S represent the instructions “the library is located on your left” and “go straight ahead.” The robot manager has combined them, extracting the correct procedure primitive to perform that part of the instruction.

The next time the user asks the robot to execute a command that the system can associate with an existing procedure (for example, asking the robot to “go to the library,” which is associated with go_museum_library), the process launcher will successfully call the Python procedure through the execute thread. So far, we have tested the full conversion from natural language utterances into procedures using primitives executing preprogrammed robot displacements, rather than the vision-based navigation procedures. We are currently developing the latter.
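The step in which a learned route is saved as a Python script can be sketched as follows. The generated text follows the layout of Figure 6, and the names go_museum_postoffice, is_located, and go_museum_library come from the article; the helper functions render_procedure and save_procedure, the (module, arguments) step format, and the use of a temporary folder are assumptions for illustration, not the robot manager’s actual code.

```python
import tempfile
from pathlib import Path


def render_procedure(steps):
    """Build the source text of a new procedure from (module, args) steps.

    Each step refers to an existing procedure by name; no low-level code
    is copied into the new script.
    """
    modules = []
    for module, _ in steps:
        if module not in modules:
            modules.append(module)
    lines = ["import " + ", ".join(modules), "", "def action():"]
    for module, args in steps:
        arg_text = ", ".join(repr(arg) for arg in args)
        lines.append(f"    {module}.action({arg_text})")
    return "\n".join(lines) + "\n"


def save_procedure(name, steps, directory):
    """Write the procedure to <directory>/<name>.py and return the path."""
    path = Path(directory) / f"{name}.py"
    path.write_text(render_procedure(steps), encoding="utf-8")
    return path


steps = [
    ("go_museum_postoffice", ()),                 # user-defined procedure
    ("is_located", ("library", "left_position")), # procedure primitive
]
source = render_procedure(steps)

with tempfile.TemporaryDirectory() as folder:
    saved = save_procedure("go_museum_library", steps, folder)
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
```

Because the script only names the procedures it calls, it can itself be referenced by name in later instructions, which is how the procedure set grows.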

Discussion
Using natural language to teach a robot a route is an application of a more general instruction-based learning methodology. We selected the corpus-based approach because it lets users operate the robot using unconstrained speech, while also creating an efficient natural language processing system using a restricted lexicon. With this method, we created a small functional vocabulary that contains only 14 primitives, but remains open. New users can thus either formulate all instructions in terms of these existing primitives, or teach the robot new primitives (such as “cross the road”). To enable such learning, the robot must possess an additional set of primitives that let users refer to lower-level robot actions (such as the number of wheel turns) in their instructions. In their research, Luis Seabra Lopes and Antonio Teixeira used lower-level robot actions in this way [9]. Our approach requires a new corpus collection process to determine the necessary additional primitive procedures. Another solution might be to use dialogue management to reformulate instructions. By using the corpus-based approach, we expect to minimize such “repair dialogues.” An open question is how to detect new functions in the user’s utterance, as the lexicon might not contain the required vocabulary.

Theoretical foundations
From one perspective, our approach to robot control attempts to integrate the good properties of behavior-based control [10] and classical AI. Behavior-based control is an effective method for designing low-level primitives that can cope with real-world uncertainties, and AI has developed effective tools for symbol manipulation and reasoning. However, our system differs from both methods in several ways. In our approach, the corpus defines which symbols and primitives to use. Consequently, some of the primitives are rather complex functions that execute planning tasks using representations of the environment. These are not always compatible with the representation-less philosophy of behavior-based systems. From an AI perspective, our system does not use the full range of reasoning capabilities offered by systems such as SOAR. Our goals with symbolic processing are simply to verify the consistency of instructions and build new procedure specifications. In particular, at this stage in the project, we don’t need planning at the symbolic level. Instead, users do the planning and communicate the plan to the robot using natural language. This limits the robot’s autonomy but improves safety by limiting its unpredictable behavior.
Symbol grounding
Many researchers have investigated other hybrid architectures integrating behavior-based systems and AI to solve the symbol-grounding problem [11–14]. Essentially, the problem is to maintain coherence between representations that reflect actions and events, and the stream of sensory information a dynamic environment produces. Stevan Harnad and his colleagues offer one of several detailed discussions of this problem [15]. Generally, researchers accept that the problem is avoidable if a robot’s reasoning process somehow depends on its relation to the world, that is, if the development of the internal categories and their transformations depends on external interactions. Accordingly, truly sentient robots need learning abilities that constrain abstract reasoning in relation to dynamically changing external events and the results of their own actions.

Several works have addressed this issue. Christopher Malcom and his colleagues developed a system that could carry out complex tasks in a relatively well-ordered and predictable world [11]. However, given that the symbol system only functioned under internal syntactic constraints, the grounding problem was not really addressed. In a system developed by Karl MacDorman and Jun Tani and their colleagues, the development of internal categories and their transformations depended on external interactions [12, 13]. However, the system did not interact with users, and thus users (experienced or not) could not modify the grounding. In contrast, Hideki Asoh and his colleagues developed a system that could learn new actions through natural language dialogues, but only while the robot was performing them; that is, it could only learn a new route from A to B while actually moving from A to B and dialoguing with the user [14].

In our IBL system, learning occurs solely at the symbolic level and thus happens prior to performance. Because we can predict future states, using SAS representations, the robot can engage in a verification dialogue with users before execution errors occur. If the environment changes and invalidates an instruction, the robot will detect this from the mismatch between the expected result and the actual one. However, learning is not autonomous. The robot must interact with a human user to learn new symbols and their meaning. This simplifies robot design, transferring part of the cognitive load to the user.
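The two checks that the SAS representation makes possible, verifying a learned sequence before execution and detecting a mismatch during execution, can be sketched as follows. This is a minimal illustration under stated assumptions: states are plain strings, “compatible” is modeled as equality, and the names SAS, find_gaps, matches_expectation, and the state labels are hypothetical.

```python
from collections import namedtuple

# One procedure as a state-action-state triplet: precondition, action, result.
SAS = namedtuple("SAS", ["pre", "action", "post"])


def find_gaps(sequence, start_state):
    """Check a learned sequence before execution.

    Returns a list of (index, state_reached, state_required) entries for
    every step whose precondition does not match the predicted state.
    Each entry is a point where the robot would ask the user for more
    information.
    """
    gaps = []
    state = start_state
    for index, step in enumerate(sequence):
        if step.pre != state:
            gaps.append((index, state, step.pre))
        state = step.post  # predicted state after this step
    return gaps


def matches_expectation(step, observed_state):
    """During execution: does the observed result match the predicted one?"""
    return observed_state == step.post


route = [
    SAS("at_museum", "go_museum_postoffice", "at_post_office"),
    SAS("at_post_office", "move_forward_until", "library_on_left"),
]
broken_route = [
    SAS("at_museum", "go_museum_postoffice", "at_post_office"),
    SAS("at_roundabout", "take_exit", "on_exit_road"),
]

route_gaps = find_gaps(route, "at_museum")
broken_gaps = find_gaps(broken_route, "at_museum")
```

The first check runs entirely on symbols, which is why it can happen before the robot moves; the second needs an observed state and so applies only during execution.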


Our future experiments will focus on two questions. First, can this approach create effective and socially acceptable domestic robots? We might begin this exploration by focusing on an autonomous wheelchair. Second, can our approach be generalized to other instruction contexts? Given the relation between human language and human motor and cognitive skills, we might find that unconstrained natural language programming is the most appropriate approach for domestic robots or intelligent appliances that perform human-like tasks.

Acknowledgments

Our work is supported by EPSRC grants GR/M90023 and GR/M90160. We are grateful to Angelo Cangelosi and Kenny Coventry for enlightening discussions.

References

1. A. Cangelosi and S. Harnad, “The Adaptive Advantage of Symbolic Theft over Sensorimotor Toil: Grounding Language in Perceptual Categories,” to be published in Evolution Communication, 2001.
2. L. Steels, “The Origins of Syntax in Visually Grounded Robotic Agents,” Artificial Intelligence, vol. 103, nos. 1–2, 1998, pp. 133–156.
3. J.E. Laird, A. Newell, and P.S. Rosenbloom, “SOAR: An Architecture for General Intelligence,” Artificial Intelligence, vol. 33, no. 1, 1987, pp. 1–64.
4. G. Bugmann et al., Using Verbal Instruction for Route Learning, tech. report UMC-01-41, Dept. of Computer Science, Manchester Univ., Manchester, UK, 2001.
5. M. Denis, “The Description of Routes: A Cognitive Approach to the Production of Spatial Discourse,” Current Psychology of Cognition, vol. 16, no. 4, 1997, pp. 409–458.
6. H. Kamp and U. Reyle, From Discourse to Logic, Kluwer Academic Publishers, Norwell, Mass., 1993.
7. D. Traum et al., “A Model of Dialogue Moves and Information State Revision,” Trindi Report D2.1, 1999; projects/trindi (current 30 Aug. 2001).
8. P. Blackburn et al., “Inference and Computational Semantics,” 3rd Int’l Workshop Computational Semantics (IWCS-3), Kluwer Academic Publishers, Norwell, Mass., 1999, pp. 5–19.
9. L. Seabra Lopes and A.J.S. Teixeira, “Human-Robot Interaction through Spoken Language Dialogue,” Proc. IEEE/RSJ Int’l Conf. Intelligent Robots and Systems, IEEE CS Press, Los Alamitos, Calif., 2000, pp. 528–534.


The Authors

Stanislao Lauria is a research fellow at the University of Plymouth. His general research interests are in neural networks, artificial intelligence, and robot vehicles. He was previously a research fellow at the University of Reading. He received a degree in physics from the Universita di Napoli in Napoli, Italy, and his PhD in cybernetics from the University of Reading. Contact him at the Centre for Neural and Adaptive Systems, School of Computing, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK; [email protected].

Guido Bugmann is a senior research fellow in the University of Plymouth’s School of Computing, where he develops vision-based navigation systems for robots and investigates biological planning and spatial memory. He previously worked at the Swiss Federal Institute of Technology in Lausanne and NEC’s Fundamental Research Laboratories in Japan, and has three patents and more than 90 publications. Bugmann received his PhD in physics at the University of Geneva. He is a member of the Swiss Physical Society, the Neuroscience Society, and the British Machine Vision Association. Contact him at the Centre for Neural and Adaptive Systems, School of Computing, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK; [email protected].

Theocharis Kyriacou is a graduate student in the University of Plymouth’s School of Computing, where he works on instruction-based learning for mobile robots. He earned his B.Eng (Hons) in electronics engineering from the University of Sheffield. Contact him at the Centre for Neural and Adaptive Systems, School of Computing, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK; [email protected].


Johan Bos is a research fellow in the Language Technology Group at the University of Edinburgh, Scotland, where he works on the design of human-machine dialogue systems. Trained as a computational linguist, his research interests are in computational semantics, with particular focus on the speech-semantics interface, aspects of knowledge representation and inference, and discourse analysis. Contact him at the Institute for Communicating and Collaborative Systems, Division of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland, UK; [email protected].

Ewan Klein is a reader in the Division of Informatics at the University of Edinburgh and director of Natural Language Research at Edify Corporation. His research interests include computational approaches to phonology, syntax, and semantics; multimodal interfaces; natural language specification of hardware design; and dialogue with intelligent systems. He received a BS in social and political science and a PhD in formal semantics from the University of Cambridge and an MS in general linguistics from Reading University.

10. R.A. Brooks, “Intelligence without Representation,” Artificial Intelligence, vol. 47, nos. 1–3, 1991, pp. 139–159.
11. C.A. Malcom, “The SOMASS System: A Hybrid Symbolic and Behaviour-Based System to Plan and Execute Assemblies by Robot,” Hybrid Problems and Hybrid Solutions, J. Hallam et al., eds., Oxford Univ. Press, Oxford, UK, 1995, pp. 157–168.
12. K.F. MacDorman, “Grounding Symbols through Sensorimotor Integration,” J. Robotics Soc. of Japan, vol. 17, no. 1, 1999, pp. 20–24.
13. J. Tani, “Model Based Learning for Mobile Robot Navigation from the Dynamical System Perspective,” IEEE Trans. Sys. Man Cybernetics, Part B, vol. 26, no. 3, 1996, pp. 421–436.
14. H. Asoh et al., “Socially Embedded Learning of the Office-Conversant Mobile Robot, Jijo2,” Proc. 15th Int’l Joint Conf. Artificial Intelligence (IJCAI’97), Springer-Verlag, Berlin, 1997, pp. 880–885.
15. S. Harnad, “The Symbol Grounding Problem,” Physica D, vol. 42, 1990, pp. 335–346; available online at uk/~harnad/Papers/Harnad/harnad90.sgproblem.html (current 10 Sept. 2001).

SEPTEMBER/OCTOBER 2001