Appears in Proceedings of the First Workshop on Embodied Conversational Characters, Tahoe City, CA, October 1998
Task-Oriented Dialogs with Animated Agents in Virtual Reality
Jeff Rickel and W. Lewis Johnson
Information Sciences Institute & Computer Science Department University of Southern California 4676 Admiralty Way, Marina del Rey, CA 90292-6695
[email protected],
[email protected] http://www.isi.edu/isd/VET/vet.html
Abstract
We are working towards animated agents that can carry on tutorial, task-oriented dialogs with human students. The agent's objective is to help students learn to perform physical, procedural tasks, such as operating and maintaining equipment. Although most research on such dialogs has focused on verbal communication, nonverbal communication can play many important roles as well. To allow a wide variety of interactions, the student and our agent cohabit a three-dimensional, interactive, simulated mock-up of the student's work environment. The agent, Steve, can generate and recognize speech, demonstrate actions, use gaze and gestures, answer questions, adapt domain procedures to unexpected events, and remember past actions. This paper focuses on Steve's methods for generating multi-modal behavior, contrasting our work with prior work in task-oriented dialogs, multimodal explanation generation, and animated conversational characters.
Introduction
We are working towards animated agents that can carry on tutorial, task-oriented dialogs with human students. The agent's objective is to help students learn to perform physical, procedural tasks, such as operating and maintaining equipment. Thus, like most earlier research on task-oriented dialogs, the agent (computer) serves as an expert that can provide guidance to a human novice. Research on such dialogs dates back more than twenty years (Deutsch 1974), and the subject remains an active research area (Allen et al. 1996; Lochbaum 1994; Walker 1996). However, the vast majority of that research has focused solely on verbal dialogs, even though the earliest studies clearly showed the ubiquity of nonverbal communication in human task-oriented dialogs (Deutsch 1974). To allow a wider variety of interactions among agents and human students, we use virtual reality; agents and students cohabit a three-dimensional, interactive, simulated mock-up of the student's work environment.
Virtual reality offers a rich environment for multi-modal interaction among agents and humans. Like standard desktop dialog systems, agents can communicate with humans via speech, using text-to-speech and speech recognition software. Like previous simulation-based training systems, the behavior of the virtual world is controlled by a simulator, and agents can perceive the state of the virtual world via messages from the simulator, and they can take action in the world by sending messages to the simulator. However, an animated agent that cohabits a virtual world with students has a distinct advantage over previous disembodied tutors: the agent can additionally communicate nonverbally using gestures, gaze, facial expressions, and locomotion. Students also have more freedom; they can move around the virtual world, gaze around (via a head-mounted display), and interact with objects (e.g., via a data glove). Moreover, agents can perceive these human actions; virtual reality software can inform agents of the location (in x-y-z coordinates), field of view (i.e., visible objects), and actions of humans. Thus, virtual reality is an important application area for multi-modal dialog research because it allows more human-like interactions among synthetic agents and humans than desktop interfaces can.

Although practically ignored until recently, nonverbal communication can play many important roles in task-oriented tutorial dialogs. The agent can demonstrate how to perform actions (Rickel & Johnson 1997a). It can use locomotion, gaze, and deictic gestures to focus the student's attention (Lester et al. 1998; Noma & Badler 1997; Rickel & Johnson 1997a). It can use gaze to regulate turn-taking in a mixed-initiative dialog (Cassell et al. 1994). Head nods and facial expressions can provide unobtrusive feedback on the student's utterances and actions without unnecessarily disrupting the student's train of thought. All of these nonverbal devices are a natural component of human dialogs. Moreover, the mere presence of a life-like agent may increase the student's arousal and
motivation to perform the task well (Lester et al. 1997; Walker, Sproull, & Subramani 1994). To explore the use of animated agents for tutorial, task-oriented dialogs, we have designed such an agent: Steve (Soar Training Expert for Virtual Environments). Steve is fully implemented and integrated with the other software components on which it relies (i.e., virtual reality software, a simulator, and commercial speech recognition and text-to-speech products). We have tested Steve on a variety of naval operating procedures; it can teach students how to operate several consoles that control the engines aboard naval ships, as well as how to perform an inspection of the air compressors on these engines. Moreover, Steve is not limited to this domain; it can provide instruction in a new domain given only the appropriate declarative domain knowledge. Our work on Steve complements the long line of research on verbal task-oriented dialogs.
Steve's Capabilities
To illustrate Steve's capabilities, suppose Steve is demonstrating how to inspect a high-pressure air compressor aboard a ship. The student's head-mounted display gives her a three-dimensional view of her shipboard surroundings, which include the compressor in front of her and Steve at her side. As she moves or turns her head, her view changes accordingly. Her head-mounted display is equipped with a microphone (to allow her to speak to Steve) and earphones (through which Steve speaks to her).

After introducing the task, Steve begins the demonstration. "I will now check the oil level," Steve says, and he moves over to the dipstick. Steve looks down at the dipstick, points at it, looks back at the student, and says "First, pull out the dipstick." Steve pulls it out (see Figure 1). Pointing at the level indicator, Steve says "Now we can check the oil level on the dipstick. As you can see, the oil level is normal." To finish the subtask, Steve says "Next, insert the dipstick" and he pushes it back in.

Continuing the demonstration, Steve says "Make sure all the cut-out valves are open." Looking at the cut-out valves, Steve sees that all of them are already open except one. Pointing to it, he says "Open cut-out valve three," and he opens it. Next, Steve says "I will now perform a functional test of the drain alarm light. First, check that the drain monitor is on. As you can see, the power light is illuminated, so the monitor is on" (see Figure 2). The student, realizing that she has seen this procedure before, says "Let me finish." Steve acknowledges that she can finish the task, and he shifts to monitoring her performance.
Figure 1: Steve pulling out a dipstick
Figure 2: Steve describing a power light
Figure 3: Steve pressing a button

The student steps forward to the relevant part of the compressor, but is unsure of what to do first. "What should I do next?" she asks. Steve replies "I suggest that you press the function test button." The student asks "Why?" Steve replies "That action is relevant because we want the drain monitor in test mode." The student, wondering why the drain monitor should be in test mode, asks "Why?" again. Steve replies "That goal is relevant because it will allow us to check the alarm light." Finally, the student understands, but she is unsure which button is the function test button. "Show me how to do it" she requests. Steve moves to the function test button and pushes it (see Figure 3). The alarm light comes on, indicating to Steve and the student that it is functioning properly.

Now the student recalls that she must extinguish the alarm light, but she pushes the wrong button, causing a different alarm light to illuminate. Flustered, she asks Steve "What should I do next?" Steve responds "I suggest that you press the reset button on the temperature monitor." She presses the reset button to extinguish the second alarm light, then presses the correct button to extinguish the first alarm light. Steve looks at her and says "That completes the task. Any questions?"

The student only has one question. She asks Steve why he opened the cut-out valve.[1] "That action was relevant because I wanted to dampen oscillation of the stage three gauge," he replies.

[1] Unlike all other communication between the student and Steve, such after-action review questions are posed via a desktop menu, not speech. Steve generates menu items for all the actions he performed, and the student simply selects one. A speech interface for after-action review would require more sophisticated speech understanding.

This example illustrates a number of Steve's capabilities. It can generate and recognize speech, demonstrate actions, use gaze and gestures, answer questions, adapt domain procedures to unexpected events, and remember past actions. The remainder of the paper focuses on Steve's methods for generating multi-modal communicative acts. For additional technical details on this and other aspects of Steve's capabilities, see (Rickel & Johnson 1998a). For additional motivation behind this research, as well as a discussion of the related software components (e.g., the virtual reality software and the simulator), see (Johnson et al. 1998). For a description of Steve's use in team training, where multiple students and agents can practice tasks that require coordinated action by multiple team members, see (Rickel & Johnson 1998b).
Generating Multi-Modal Behavior
Like many other autonomous agents that deal with a real or simulated world, Steve consists of two components: the first, implemented in Soar (Laird, Newell, & Rosenbloom 1987; Newell 1990), handles high-level cognitive processing, and the second handles sensorimotor processing. The cognitive component interprets the state of the virtual world, constructs and carries out plans to achieve goals, and decides how to interact with the student. The sensorimotor component serves as Steve's interface to the virtual world, allowing the cognitive component to perceive the state of the world and cause changes in it. It monitors messages from the simulator describing changes in the state of the world, from the virtual reality software describing actions taken by the student and the student's position and field of view (the set of objects within the viewing frustum), and from speech recognition software describing the student's requests and questions posed to Steve.[2] The sensorimotor component sends messages to the simulator to take action in the world, to the virtual reality software to control Steve's animated body, and to text-to-speech software to generate speech.

[2] Steve does not currently incorporate any natural language understanding; it simply maps predefined phrases to speech acts.
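To make this division of labor concrete, the following minimal Python sketch shows one way such a sensorimotor layer could translate incoming messages into percepts for the cognitive component and decompose abstract motor commands into outgoing messages. The class, message names, and command vocabulary are our own illustrative assumptions, not Steve's actual interfaces.

```python
# Minimal sketch (not Steve's actual code): a sensorimotor layer that
# mediates between the cognitive component and the external software.

from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SensorimotorLayer:
    percepts: List[Tuple[str, dict]] = field(default_factory=list)   # for the cognitive component
    outbox: List[Tuple[str, dict]] = field(default_factory=list)     # (destination, message)

    # --- Perception: translate external messages into percepts ---
    def on_simulator_message(self, attribute: str, value: str) -> None:
        # e.g., attribute="button1_state", value="depressed"
        self.percepts.append(("world_state", {"attribute": attribute, "value": value}))

    def on_vr_message(self, student_position, visible_objects) -> None:
        # Student's position and field of view reported by the VR software.
        self.percepts.append(("student_view", {"position": student_position,
                                               "field_of_view": visible_objects}))

    def on_speech_message(self, speech_act: str) -> None:
        # Speech recognition maps predefined phrases to speech acts.
        self.percepts.append(("student_utterance", {"speech_act": speech_act}))

    # --- Action: decompose abstract motor commands into outgoing messages ---
    def execute(self, command: str, **args) -> None:
        if command == "speak":
            self.outbox.append(("text_to_speech", {"text": args["text"]}))
        elif command == "move_to":
            self.outbox.append(("vr", {"locomotion_target": args["object"]}))
        elif command in ("press", "pull", "turn"):
            self.outbox.append(("simulator", {"action": command, "object": args["object"]}))
        else:
            raise ValueError(f"unknown motor command: {command}")


if __name__ == "__main__":
    layer = SensorimotorLayer()
    layer.on_simulator_message("button1_state", "depressed")
    layer.execute("speak", text="First, pull out the dipstick.")
    print(layer.percepts, layer.outbox)
```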
Steve's high-level behavior is guided by three primary types of knowledge: a model of the current task, Steve's current plan for completing the task, and a representation of who has the task initiative. Steve's model of a task is encoded in a hierarchical partial-order plan representation, which it generates automatically using task decomposition planning (Sacerdoti 1977) from its declarative domain knowledge. As the task proceeds, Steve uses the task model to maintain a plan for how to complete the task, using a variant of partial-order planning techniques (Weld 1994). Finally, it maintains a record of whether Steve or the student is currently responsible for completing the task; this task initiative can change during the course of the task at the request of the student.[3]

[3] In the future, we plan to use the approach used in TOTS (Rickel 1988) to allow Steve to initiate shifts in task initiative based on a model of the student's knowledge.

When the student has the task initiative, Steve's primary role is to answer questions and evaluate the student's actions. Steve's answers are currently just verbal. Steve follows the student around by attaching a miniaturized version of its body to the corner of the student's field of view; this allows Steve to remain in the student's view without requiring the student to shift attention between Steve and the objects of the task. When evaluating the student's actions, Steve accompanies negative feedback with a shake of its head, and provides positive feedback on correct actions only nonverbally, by nodding its head. Our rationale is that such positive feedback should be as unobtrusive as possible, to avoid disrupting the student, and we expect verbal comments to be more disruptive.

When Steve has the task initiative, its role is to demonstrate how to perform the task. In this role, it follows its plan for completing the task, demonstrating each step. Because its plan only provides a partial order over task steps, Steve uses a discourse focus stack (Grosz & Sidner 1986) to ensure a global coherence to the demonstration. The focus stack also allows Steve to recognize digressions and resume the prior demonstration when unexpected events require a temporary deviation from the usual order of task steps.
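A discourse focus stack of this kind can be sketched in a few lines. The sketch below is a simplification under our own assumptions (the task-step names are hypothetical); it only shows how pushing a digression and later popping it lets an agent resume the interrupted demonstration.

```python
# Minimal sketch of a discourse focus stack (our simplification, not Steve's code).

class FocusStack:
    def __init__(self):
        self._stack = []  # innermost focus is the last element

    def push(self, task_step: str) -> None:
        """Begin discussing or demonstrating a step (possibly a digression)."""
        self._stack.append(task_step)

    def pop(self) -> str:
        """Finish the current step and return to whatever enclosed it."""
        return self._stack.pop()

    def current(self):
        return self._stack[-1] if self._stack else None


if __name__ == "__main__":
    focus = FocusStack()
    focus.push("check oil level")             # current demonstration step
    focus.push("handle unexpected alarm")     # digression forced by the world
    focus.pop()                               # digression resolved...
    assert focus.current() == "check oil level"  # ...resume the prior demonstration
```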
Most of Steve's multi-modal communicative behavior arises when demonstrating a primitive task step (i.e., an action in the simulated world). For example, to demonstrate an object manipulation action, Steve would typically proceed as follows:

1. First, Steve moves to the location of the object it needs to manipulate. The cognitive component sends a locomotion motor command to the sensorimotor component, along with the object to which it wants to move, then waits for perceptual information to indicate that the body has arrived.

2. Once Steve arrives at the desired object, it explains what it is going to do. This involves describing the step while pointing to the object to be manipulated. To describe the step, Steve outputs a speech specification with three pieces of information (sketched in code after this list):

- the name of the step - this will be used to retrieve the associated text fragment
- whether Steve has already demonstrated this step - this allows Steve to acknowledge the repetition, as well as choose between a concise or verbose verbal description (both descriptions are provided in the domain knowledge)
- a rhetorical relation indicating the relation in the task model between this step and the last one Steve demonstrated - this is used to generate an appropriate cue phrase (Grosz & Sidner 1986; Moore 1993)

Once Steve sends the motor command to generate the speech, it waits for an event from the sensorimotor component indicating that the speech is complete.

3. When the speech is complete, Steve performs the task step. This is done by sending an appropriate motor command and waiting for evidence in its perception that the command was executed. For example, if it sends a motor command to press button1, it waits for a message from the simulator indicating the resulting state: button1_state depressed.

4. If appropriate, Steve explains the results of the action, using appropriate text fragments and pointing gestures.
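The speech specification in step 2 can be pictured as a small record with the three fields just described. The following sketch is a hypothetical rendering; the field names, cue-phrase table, and text fragments are our assumptions rather than Steve's actual representation.

```python
# Hypothetical sketch of the speech specification produced in step 2 above.

from dataclasses import dataclass

# Cue phrases keyed by the rhetorical relation between this step and the
# previously demonstrated one (illustrative values only).
CUE_PHRASES = {
    "first-substep": "First,",
    "next-substep": "Next,",
}

@dataclass
class SpeechSpec:
    step_name: str               # used to retrieve the associated text fragment
    already_demonstrated: bool   # acknowledge repetition; concise vs. verbose text
    rhetorical_relation: str     # used to select an appropriate cue phrase

def render(spec: SpeechSpec, fragments: dict) -> str:
    """Compose an utterance from a cue phrase and a stored text fragment."""
    concise, verbose = fragments[spec.step_name]
    cue = CUE_PHRASES.get(spec.rhetorical_relation, "")
    if spec.already_demonstrated:
        return " ".join(filter(None, [cue, "as before,", concise]))
    return " ".join(filter(None, [cue, verbose]))

if __name__ == "__main__":
    fragments = {
        "pull-out-dipstick": ("pull out the dipstick.",
                              "pull out the dipstick so we can read the level indicator."),
    }
    spec = SpeechSpec("pull-out-dipstick", already_demonstrated=False,
                      rhetorical_relation="first-substep")
    print(render(spec, fragments))  # "First, pull out the dipstick so we can read the level indicator."
```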
This sequence of events in demonstrating an action is not hardwired into Steve. Rather, Steve has a class hierarchy of action types (e.g., manipulate an object, move an object, check a condition), and each type of action is associated with an appropriate suite of communicative acts. Each suite is essentially an augmented transition network (ATN) represented as Soar production rules. Each node in an ATN represents a high-level act, such as moving to an object, explaining a task step, or performing a step. Arcs represent the conditions for terminating one act and beginning another. Each action type in the class hierarchy inherits from the action types above it; thus, the communicative suite for an action type can be compactly represented as the deviations from its parent's suite. Currently all action types inherit their communicative suite from one of a few general action types; however, our approach allows us to easily extend Steve's behavior if these suites prove inadequate for new types of actions encountered in new domains. Moreover, by representing a suite as an ATN rather than a fixed sequence of acts, Steve's demonstration of an action can be more reactive and adaptive; transitions are sensitive to the state of the virtual world as well as the state of the student (e.g., when Steve references an object and points to it, if the object is not in the student's field of view, Steve says "Look over here!" and waits until the student is looking before proceeding with the demonstration).

Steve's communicative suites are similar to the schemata approach to explanation generation pioneered by McKeown (McKeown 1985). In contrast, Andre et al. (Andre, Rist, & Mueller 1998) employ a standard top-down discourse planning approach to generating the communicative behavior of their animated agent, and they compile the resulting plans into finite state machines for efficient execution. The tradeoffs between these two approaches to discourse generation are well known (Moore 1995).

The individual nodes in an ATN serve as domain-independent building blocks for Steve's behavior. Each results in a set of motor commands sent from Steve's cognitive component to the sensorimotor component. These motor commands are abstract: speak a text string to someone, look at someone or something, move to an object, point at an object, manipulate an object (including several variations such as press, pull, and turn), nod the head in agreement or shake it in disagreement, and move the hand to a neutral position (i.e., not manipulating or pointing at anything). The sensorimotor component implements these commands by decomposing them into messages sent to the simulator, text-to-speech software, and virtual reality software.
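To convey the flavor of a communicative suite, the sketch below encodes one as a small transition network over high-level acts, with arcs that test the state of the world and of the student. The particular states, conditions, and percept names are illustrative assumptions; Steve's actual suites are Soar production rules rather than Python tables.

```python
# Illustrative sketch (not Steve's Soar rules): a communicative suite for an
# object-manipulation step, expressed as a tiny transition network whose arcs
# test percepts about the world and the student.

MANIPULATE_SUITE = {
    # state:           [(condition over percepts,                       next state), ...]
    "move_to_object":  [(lambda p: p["arrived"],                        "explain_step")],
    "explain_step":    [(lambda p: not p["object_in_student_view"],     "redirect_gaze"),
                        (lambda p: p["speech_done"],                    "perform_step")],
    "redirect_gaze":   [(lambda p: p["object_in_student_view"],         "explain_step")],
    "perform_step":    [(lambda p: p["world_state_changed"],            "explain_result")],
    "explain_result":  [(lambda p: p["speech_done"],                    "done")],
}

def step(state: str, percepts: dict) -> str:
    """Follow the first arc whose condition holds; otherwise stay in the same act."""
    for condition, next_state in MANIPULATE_SUITE.get(state, []):
        if condition(percepts):
            return next_state
    return state

if __name__ == "__main__":
    s = "explain_step"
    # Student looked away mid-explanation: redirect her gaze and wait.
    s = step(s, {"object_in_student_view": False, "speech_done": False})
    assert s == "redirect_gaze"
    s = step(s, {"object_in_student_view": True})
    assert s == "explain_step"
```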
While the communicative suites provide knowledge of how to demonstrate different types of actions, Steve's detailed behavior need not be scripted. Instead, much of the behavior is more general, independent of any particular suite. Some of this behavior is generated as deliberate acts in the cognitive component. This includes such things as looking at someone when waiting for them or listening to them or releasing the conversational turn, nodding the head when Steve is informed of something or when the student takes an appropriate action, and shaking the head when the student makes a mistake. Other actions are generated in the sensorimotor component to accompany motor commands from the cognitive component; for example, Steve looks where it is going, looks at an object immediately before manipulating or pointing at it, looks at someone immediately before speaking to them, and changes facial expression to a "speaking face" (i.e., mouth open and eyebrows slightly raised) when speaking. Finally, low-level behavior that requires a frame-by-frame update is implemented in code that is linked in with the virtual reality software; this includes the animation of Steve's locomotion and arm movements, periodic blinking, slight periodic movement of the lips when speaking, and the tracking abilities of Steve's gaze (including three varieties: focus with the entire body, look with the head and neck, and glance with the eyes only). The low-level behavior also takes care of motion constraints; for example, if an object is moving around Steve, he will track it over his left shoulder until it moves directly behind him, at which point he will track it over his right shoulder.

Because Steve's dialog with the student is mixed-initiative, it must support interruptions from the student. Steve's ability to handle such interruptions is provided as general behavior; it need not be specified in communicative suites for demonstrating actions. The student's behavior is unconstrained; she can speak or interact with objects at any time. Although Steve is always aware of the student's actions, it will always complete the current utterance or action before responding. However, Steve gives the student frequent openings; after telling the student something or showing them something, Steve briefly releases the conversational turn by looking at them in silence for about one second. (Again, this behavior need not be specified in the communicative suites.) Steve receives a message from the speech recognition software whenever the student starts speaking; this allows Steve to interrupt a demonstration and listen to the student. When the student's utterance is complete, the speech recognition software sends a representation of its content, and Steve stops listening and responds. If the student interacts with objects during Steve's demonstration, Steve may have to revise its plan for completing the task. Such a revision may even occur in the middle of demonstrating a task step; for example, Steve might be describing the action it is about to do when the student performs that action, at which point Steve will acknowledge that it does not need to do the action and continue with the task.[4] Such abilities to handle interruptions and unexpected events are crucial to achieving true mixed-initiative interaction.

[4] Steve does not currently reproach the student for such interruptions, although we are considering such an extension (Elliott, Rickel, & Lester 1997).

While the cognitive component reasons about interaction with the student and the world at a high level of abstraction, the sensorimotor component requires some spatial knowledge of the virtual world. First, the virtual reality software provides bounding spheres for all objects, thus giving Steve knowledge of each object's position and a coarse approximation of its spatial extent. Second, Steve requires a vector pointing at the front of each object (from which Steve determines where to stand) and, optionally, vectors specifying the direction to press or grasp each object.
Finally, to support collision-free locomotion through the virtual world, Steve requires an adjacency graph: each node in the graph represents an object, and there is an edge between two nodes if there is a collision-free path directly between them. The sensorimotor component uses this graph to plan Steve's locomotion: given a motor command to move to a new object, Steve uses Dijkstra's shortest path algorithm (Cormen, Leiserson, & Rivest 1989) to identify a collision-free path. Although more detailed geometric knowledge of the virtual world would allow Steve to interact with it more precisely, the above knowledge is simple to provide and maintain, and it supports the critical functionality for task-oriented dialog.
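For illustration, the sketch below runs Dijkstra's algorithm over a toy adjacency graph. The object names and edge weights are hypothetical (the paper does not specify edge costs, so we assume distances), but the structure matches the description above: nodes are objects, and an edge marks a direct collision-free path.

```python
# Sketch of locomotion planning over an adjacency graph (hypothetical objects
# and distances; not Steve's actual shipboard graph).

import heapq

ADJACENCY = {
    "dipstick":             {"cut_out_valves": 2.0, "drain_monitor": 4.5},
    "cut_out_valves":       {"dipstick": 2.0, "drain_monitor": 1.5},
    "drain_monitor":        {"dipstick": 4.5, "cut_out_valves": 1.5, "function_test_button": 1.0},
    "function_test_button": {"drain_monitor": 1.0},
}

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: return the sequence of objects along the cheapest route."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # no collision-free route exists

if __name__ == "__main__":
    # e.g., a motor command to move from the dipstick to the function test button
    print(shortest_path(ADJACENCY, "dipstick", "function_test_button"))
    # -> ['dipstick', 'cut_out_valves', 'drain_monitor', 'function_test_button']
```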
Discussion
In contrast to prior work in multi-modal explanation generation (Maybury 1993), which focused mainly on combining text and graphics, media allocation seems less of an issue for animated agents. The decision between conveying information in text or graphics is particularly difficult because graphics can be used in many ways. In contrast, the nonverbal behavior of an animated agent, though important, is a far less expressive medium. Therefore, nonverbal body language serves more to complement and enhance verbal utterances, but has less ability to replace them than graphics does. (Although see (Cassell forthcoming) for a deeper discussion of this issue.) The two areas where nonverbal actions can significantly replace verbal utterances are demonstrations and facial expressions. Demonstrating an action may be far more effective than trying to describe how to perform the action, and is perhaps the biggest advantage of an animated agent. Our work in controlling Steve's facial expressions has only recently begun, but we hope to use them to give a variety of different types of feedback to students when a verbal comment would be unnecessarily obtrusive.

One important area for further research is the synchronization of nonverbal acts with speech at the level of individual words or syllables. This capability is needed to support many features of human conversation, such as the use of gestures, head nods, and eyebrow movements to highlight emphasized words. Like Steve, most current animated characters are incapable of such precise timing (Andre, Rist, & Mueller 1998; Lester et al. 1998; Stone & Lester 1996). One exception is the work of Cassell and her colleagues (Cassell et al. 1994). However, they achieve their synchronization through a multi-pass algorithm that generates an animation file for two synthetic, conversational agents. Achieving a similar degree of synchronization during
a real-time dialog with a human is a more challenging problem that will require further research.

Our work has focused more on multi-modal behavior generation than multi-modal input. To model face-to-face communication, we must extend the range of nonverbal communicative acts that students can use. To handle multi-modal input in virtual reality, the techniques of Billinghurst and Savage (Billinghurst & Savage 1996) would nicely complement Steve's current capabilities. Their agent, which is designed to train medical students how to perform sinus surgery, combines natural language understanding and simple gesture recognition. They parse both types of input into a single representation, and their early results confirm the intuitive advantages of multi-modal input: (1) different types of communication are simpler in one or the other mode, and (2) in cases where either mode alone would be ambiguous, the combination can help disambiguate. The work of Thorisson and Cassell on the Gandalf agent (Thorisson 1996; Cassell & Thorisson 1998) is even more ambitious; people talking with Gandalf wear a suit that tracks their upper body movement, an eye tracker that tracks their gaze, and a microphone that allows Gandalf to hear their words and intonation. Although Gandalf was not developed for conversation in virtual reality, many of the techniques used for multi-modal input would apply.

Our work on Steve complements the long line of research on verbal task-oriented dialogs in the computational linguistics community. Steve currently has no natural language understanding capabilities; it can only understand phrases that we add to the grammar for the speech recognition program. Steve's natural language generation capabilities are also simple; all of Steve's utterances are generated from text templates, although more sophisticated methods could be added without affecting other aspects of Steve's behavior. We are particularly interested in integrating Steve with recent work on spoken dialog systems. For example, the TRAINS system (Allen et al. 1996; Ferguson, Allen, & Miller 1996) supports a robust spoken dialog between a computer agent and a person working together on a task. However, their agent has no animated form and does not cohabit a virtual world with users. Because TRAINS and Steve carry on similar types of dialogs with users, yet focus on different aspects of such conversations, a combination of the two systems seems promising.

Our work also complements research on sophisticated control of human figures (Badler, Phillips, & Webber 1993). Such work targets more generality in human figure motion. Our human figure control is efficient and predictable, and it results in smooth animation.
However, it does not provide human-like object manipulation (Douville, Levison, & Badler 1996), and it would not suffice for movements such as reaching around objects or through tight spaces. Our architecture is carefully designed so that a new body, along with its associated control code, can be easily integrated into Steve; a well-defined API separates Steve's control over its body from the detailed motion control code.
Conclusion
Steve illustrates the enormous potential for face-to-face, task-oriented dialogs between students and synthetic agents in virtual environments. Although verbal exchanges may be sufficient for some tasks, we expect that many domains will benefit from an agent that can additionally use gestures, gaze, facial expressions, and locomotion. Although Steve has only been tested on a virtual shipboard environment for naval training, it can be used for other domains given only a description of the domain procedures and minimal knowledge of the spatial environment; none of Steve's dialog capabilities or multi-modal behaviors are specific to the naval domain. Moreover, Steve's architecture is designed to accommodate advances in related research areas, such as natural language processing and human figure control.
Acknowledgments
This work was funded by the Office of Naval Research, grant N00014-95-C-0179. We are grateful for the contributions of our many collaborators: Randy Stiles and his colleagues at Lockheed Martin developed the virtual reality software; Allen Munro and his colleagues at Behavioral Technologies Laboratory developed the simulation authoring and execution software; and Richard Angros, Ben Moore, Behnam Salemi, Erin Shaw, and Marcus Thiebaux at ISI contributed to Steve. We are especially grateful to Marcus, who developed the 3D model of Steve's current body and the code that controls its animation.
References
Allen, J. F.; Miller, B. W.; Ringger, E. K.; and Sikorski, T. 1996. Robust understanding in a dialogue system. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 62-70.
Andre, E.; Rist, T.; and Mueller, J. 1998. Employing AI methods to control the behavior of animated interface agents. Applied Artificial Intelligence. Forthcoming.
Badler, N. I.; Phillips, C. B.; and Webber, B. L. 1993. Simulating Humans. New York: Oxford University Press.
Billinghurst, M., and Savage, J. 1996. Adding intelligence to the interface. In Proceedings of the IEEE Virtual Reality Annual International Symposium (VRAIS '96), 168-175. Los Alamitos, CA: IEEE Computer Society Press.
Cassell, J., and Thorisson, K. R. 1998. The power of a nod and a glance: Envelope vs. emotion in animated conversational agents. Applied Artificial Intelligence. Forthcoming.
Cassell, J.; Pelachaud, C.; Badler, N.; Steedman, M.; Achorn, B.; Becket, T.; Douville, B.; Prevost, S.; and Stone, M. 1994. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In Proceedings of ACM SIGGRAPH '94.
Cassell, J. Forthcoming. Embodied conversation: Integrating face and gesture into automatic spoken dialogue systems. In Luperfoy, S., ed., Automatic Spoken Dialogue Systems. MIT Press.
Cormen, T. H.; Leiserson, C. E.; and Rivest, R. L. 1989. Introduction to Algorithms. New York: McGraw-Hill.
Deutsch, B. G. 1974. The structure of task oriented dialogs. In Proceedings of the IEEE Speech Symposium. Pittsburgh, PA: Carnegie-Mellon University. Also available as Stanford Research Institute Technical Note 90.
Douville, B.; Levison, L.; and Badler, N. I. 1996. Task-level object grasping for simulated agents. Presence 5(4):416-430.
Elliott, C.; Rickel, J.; and Lester, J. C. 1997. Integrating affective computing into animated tutoring agents. In Proceedings of the IJCAI Workshop on Animated Interface Agents: Making Them Intelligent, 113-121.
Ferguson, G.; Allen, J.; and Miller, B. 1996. TRAINS-95: Towards a mixed-initiative planning assistant. In Proceedings of the Third Conference on AI Planning Systems.
Grosz, B. J., and Sidner, C. L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3):175-204.
Johnson, W. L.; Rickel, J.; Stiles, R.; and Munro, A. 1998. Integrating pedagogical agents into virtual environments. Presence 7(6).
Laird, J. E.; Newell, A.; and Rosenbloom, P. S. 1987. Soar: An architecture for general intelligence. Artificial Intelligence 33(1):1-64.
Lester, J. C.; Converse, S. A.; Kahler, S. E.; Barlow, S. T.; Stone, B. A.; and Bhogal, R. S. 1997. The persona effect: Affective impact of animated pedagogical agents. In Proceedings of CHI '97, 359-366.
Lester, J. C.; Voerman, J. L.; Towns, S. G.; and Callaway, C. B. 1998. Deictic believability: Coordinating gesture, locomotion, and speech in lifelike pedagogical agents. Applied Artificial Intelligence. Forthcoming.
Lochbaum, K. E. 1994. Using Collaborative Plans to Model the Intentional Structure of Discourse. Ph.D. Dissertation, Harvard University. Technical Report TR-25-94, Center for Research in Computing Technology.
Maybury, M. T., ed. 1993. Intelligent Multimedia Interfaces. Menlo Park, CA: AAAI Press.
McKeown, K. R. 1985. Text Generation. Cambridge University Press.
Moore, J. D. 1993. What makes human explanations effective? In Proceedings of the 15th Annual Conference of the Cognitive Science Society, 131-136.
Moore, J. D. 1995. Participating in Explanatory Dialogues. Cambridge, MA: MIT Press.
Newell, A. 1990. Unified Theories of Cognition. Cambridge, MA: Harvard University Press.
Noma, T., and Badler, N. I. 1997. A virtual human presenter. In Proceedings of the IJCAI Workshop on Animated Interface Agents: Making Them Intelligent, 45-51.
Rickel, J., and Johnson, W. L. 1997a. Integrating pedagogical capabilities in a virtual environment agent. In Proceedings of the First International Conference on Autonomous Agents. ACM Press.
Rickel, J., and Johnson, W. L. 1997b. Intelligent tutoring in virtual reality: A preliminary report. In Proceedings of the Eighth World Conference on Artificial Intelligence in Education, 294-301. IOS Press.
Rickel, J., and Johnson, W. L. 1998a. Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Applied Artificial Intelligence. Forthcoming.
Rickel, J., and Johnson, W. L. 1998b. Animated pedagogical agents for team training. In Proceedings of the ITS Workshop on Pedagogical Agents, 75-77.
Rickel, J. 1988. An intelligent tutoring framework for task-oriented domains. In Proceedings of the International Conference on Intelligent Tutoring Systems.
Sacerdoti, E. 1977. A Structure for Plans and Behavior. New York: Elsevier North-Holland.
Stone, B. A., and Lester, J. C. 1996. Dynamically sequencing an animated pedagogical agent. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 424-431. Menlo Park, CA: AAAI Press/MIT Press.
Thorisson, K. R. 1996. Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills. Ph.D. Dissertation, Massachusetts Institute of Technology.
Walker, J. H.; Sproull, L.; and Subramani, R. 1994. Using a human face in an interface. In Proceedings of CHI-94, 85-91.
Walker, M. A. 1996. The effect of resource limits and task complexity on collaborative planning in dialogue. Artificial Intelligence 85:181-243.
Weld, D. S. 1994. An introduction to least commitment planning. AI Magazine 15(4):27-61.