them; participate in successful relationships with people that provide benefit over the long term; understand people as people—in social-psychological terms to appreciate our goals, beliefs, feelings, etc.; and communicate with us in human-centered terms. Given that robots have been exploring other planets for years, it's ironic to think of your home as the "final frontier" for robots. But the complexity of human society raises many new challenges for them. However, with every advance in these "grand challenge" areas, R2-D2 and C-3PO are coming closer to a home near you.

ABOUT THE AUTHOR

A pioneer in the areas of human-robot interaction and sociable robotics, Cynthia Breazeal is an assistant professor of Media Arts and Sciences at the MIT Media Lab, where she is director of the Robotic Life Group and holds the LG Group career development chair. Her research interests focus on the scientific pursuit and technological innovation necessary to create machines that understand and engage people in social and affective terms. Her first book, Designing Sociable Robots, was published by The MIT Press in 2002.
Towards Collaboration with Robots in Shared Space: Spatial Perspective and Frames of Reference

Alan C. Schultz
[email protected]

J. Gregory Trafton
[email protected]

IMAGINE MOBILE ROBOTS OF THE FUTURE, working side by side with humans, collaborating in a shared workspace. For this to become a reality, robots must be able to do something that humans do constantly: understand how others perceive space and the relative positions of objects around them—they need the ability to see things from another person's point of view. Our research group and others are building computational, cognitive, and linguistic models that can deal with frames of reference. The issues include constantly changing frames of reference, changes in spatial perspective, deciding what actions to take, the use of new words, and common ground. Our approach is an implementation informed by cognitive and computational theories: we develop computational cognitive models (CCMs) of certain high-level cognitive skills that humans possess and that are relevant to collaborative tasks, and we then use these models as reasoning mechanisms for our robots. Why do we propose using CCMs rather than more traditional programming paradigms for robots? We believe that by giving robots representations and reasoning mechanisms similar to those used by humans, we will build robots that act in ways more compatible with humans.

Hide and Seek

Our foray into this area started when we were developing computational cognitive models of how young children learn the game of hide and seek [1]. The purpose was to enable our robots to use human-level cognitive skills to decide where to look for people or for things hidden by people. The research resulted in a hybrid architecture with a reactive/probabilistic system for robot mobility [5] and a high-level cognitive system based on ACT-R [6] that made the decisions about where to hide or where to seek, depending on which role the robot was playing. Videos of the robot playing a game of hide and seek can be seen at www.nrl.navy.mil/aic/iss/aas. While this work was interesting in its own right, it led us to the realization that perspective-taking is a critical cognitive ability for humans, particularly when they want to collaborate.
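To make the shape of such a hybrid architecture concrete, here is a minimal sketch in Python. The class names, the spot-scoring rules, and the mobility stub are our illustrative assumptions, not the actual system, which used ACT-R [6] for the cognitive layer and NRL's reactive/probabilistic navigation system [5] for mobility.

```python
# Minimal sketch of a hybrid architecture: a high-level cognitive layer picks
# hiding/seeking locations in human-like terms, while a reactive/probabilistic
# layer handles mobility. All names and rules here are illustrative assumptions.

from dataclasses import dataclass
from typing import List

@dataclass
class Location:
    name: str              # e.g., "behind the couch"
    occludes_agent: bool    # can a hider fit out of sight here?
    last_searched: float    # seconds since this spot was last checked

class CognitiveLayer:
    """Chooses *where* to hide or seek."""
    def choose_hiding_spot(self, spots: List[Location]) -> Location:
        # Prefer spots that actually occlude the robot, as a child would.
        candidates = [s for s in spots if s.occludes_agent]
        return candidates[0] if candidates else spots[0]

    def choose_search_spot(self, spots: List[Location]) -> Location:
        # Search the spot that has gone unchecked the longest.
        return max(spots, key=lambda s: s.last_searched)

class MobilityLayer:
    """Reactive/probabilistic layer: localization, path planning, obstacle avoidance."""
    def go_to(self, target: Location) -> None:
        print(f"navigating to {target.name}")   # placeholder for real navigation

def play_round(role: str, spots: List[Location]) -> None:
    brain, base = CognitiveLayer(), MobilityLayer()
    target = (brain.choose_hiding_spot(spots) if role == "hide"
              else brain.choose_search_spot(spots))
    base.go_to(target)
```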
Spatial Perspective in Space

To determine just how important perspective and frames of reference are in collaborative tasks in shared space (and also because we were working on a DARPA-funded project to move these capabilities to the NASA Robonaut), we analyzed a series of tapes of two astronauts and a ground controller training in the NASA Neutral Buoyancy Tank facility for an assembly task for Space Station mission 9A. We performed a protocol analysis of several hours of these tapes, focusing on the use of spatial language and commands from one person to another. We found that the astronauts changed their frame of reference (as seen in their dialogue) approximately every other utterance. As an example of how prevalent these changes in frame of reference are, consider the following utterance from ground control: "... if you come straight down from where you are, uh, and uh, kind of peek down under the rail on the nadir side, by your right hand, almost straight nadir, you should see the..." Here we see five changes in frame of reference in a single sentence! This rate of change is consistent with the findings of Franklin, Tversky, and Coon [4]. In addition, we found that the astronauts had to take another person's perspective, or forced others to take their perspective, about 25 percent of the time [3]. Obviously, the ability to handle changing frames of reference and to understand spatial perspective will be a critical skill for robots such as NASA's Robonaut and, we would argue, for any other robotic system that needs to communicate with people in spatial contexts (e.g., any construction task, direction giving, etc.).
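As a concrete illustration of what a change in frame of reference involves computationally, the sketch below treats each frame (the speaker's, the addressee's, the station's "nadir" frame) as a pose in one shared world frame and rewrites a direction uttered in one frame into another before it can drive action. The 2D, heading-only simplification and all names are our assumptions, not the Robonaut implementation.

```python
import numpy as np

# Every frame of reference is modeled as a heading relative to a shared world
# frame. A direction word uttered in one frame must be rotated into the
# listener's frame before it can drive looking or reaching.

def rotation(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

class Frame:
    def __init__(self, heading: float):
        self.heading = heading                   # radians, relative to world

    def to_world(self, v: np.ndarray) -> np.ndarray:
        return rotation(self.heading) @ v        # body frame -> world frame

    def from_world(self, v: np.ndarray) -> np.ndarray:
        return rotation(-self.heading) @ v       # world frame -> body frame

# "By your right hand": a direction given in the astronaut's frame,
# re-expressed in the robot's frame.
astronaut = Frame(heading=np.pi / 2)             # rotated 90 degrees from the robot
robot = Frame(heading=0.0)
right_in_astronaut_frame = np.array([1.0, 0.0])  # +x = "right" by convention
direction_for_robot = robot.from_world(astronaut.to_world(right_in_astronaut_frame))
print(direction_for_robot)                       # -> [0., 1.] in the robot's frame
```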
Models of Perspective Taking

Imagine the following task: An astronaut and his robotic assistant are working together to assemble a structure in shared space. The human says to the robot, "Pass me the wrench." From the robot's point of view, two wrenches are visible; the human, whose view is partially occluded, can see only one of them. What should the robot do? Evidence suggests that humans in similar situations will pass the wrench that they know the other person can see, since it is a jointly salient feature [7].
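One minimal way to realize this kind of visual perspective-taking is for the robot to simulate the human's line of sight over its own model of the scene and then prefer the jointly visible object. The sketch below does this with a crude circular-occluder model; the geometry, names, and selection rule are illustrative assumptions rather than the models described next.

```python
import numpy as np

# Sketch of visual perspective-taking for the wrench scenario: the robot keeps a
# simple geometric model of the scene, simulates the human's line of sight, and
# prefers the object that is jointly visible (salient to both parties).

def segment_blocked(p, q, center, radius) -> bool:
    """True if the segment p->q passes through a circular occluder."""
    p, q, center = map(np.asarray, (p, q, center))
    d = q - p
    t = np.clip(np.dot(center - p, d) / np.dot(d, d), 0.0, 1.0)
    closest = p + t * d                       # closest point on the segment
    return np.linalg.norm(center - closest) < radius

def visible(viewpoint, obj, occluders) -> bool:
    return not any(segment_blocked(viewpoint, obj, c, r) for c, r in occluders)

def choose_object(robot_view, human_view, objects, occluders):
    """Prefer an object both parties can see; fall back to what the robot sees."""
    robot_sees = [o for o in objects if visible(robot_view, o["pos"], occluders)]
    jointly = [o for o in robot_sees if visible(human_view, o["pos"], occluders)]
    return (jointly or robot_sees or objects)[0]

# Two wrenches; a panel occludes one of them from the human's viewpoint only.
wrenches = [{"name": "wrench A", "pos": (3.0, 2.0)},
            {"name": "wrench B", "pos": (3.0, -2.0)}]
occluders = [((1.5, 0.0), 0.5)]               # (center, radius) of the panel
picked = choose_object(robot_view=(4.0, 0.0), human_view=(0.0, 2.0),
                       objects=wrenches, occluders=occluders)
print(picked["name"])                         # -> "wrench A", the jointly visible one
```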
We developed two models of perspective taking that can handle this scenario in a general sense. The first uses the ACT-R/S system [8] to model perspective taking with a cognitively plausible spatial representation. The second uses Polyscheme [2] to model the cognitive process of mental simulation: humans tend to mentally simulate situations in order to resolve problems. Using these models, we have demonstrated a robot solving problems similar to the wrench problem. Videos of a robot and human performing this task can be seen at http://www.nrl.navy.mil/aic/iss/aas/. It is clear that if humans are to work as peers with robots in shared space, the robot must understand the natural human tendency to use different frames of reference, and it must be able to take the human's perspective. To create robots with these capabilities, we propose using CCMs rather than more traditional programming paradigms, for three reasons. First, a natural and intuitive interaction reduces the human's cognitive load. Second, more predictable behavior engenders trust. Finally, more understandable decisions allow the human to recognize and more quickly repair mistakes.
REFERENCES

1. Trafton, J. G., Schultz, A. C., Cassimatis, N. L., Hiatt, L., Perzanowski, D., Brock, D. P., Bugajska, M., & Adams, W. (2004). Using similar representations to improve human-robot interaction. Agents and Architectures (in press). Erlbaum.
2. Cassimatis, N. L., Trafton, J. G., Bugajska, M., & Schultz, A. C. (2004). Integrating cognition, perception, and action through mental simulation in robots. Robotics and Autonomous Systems, 49(1-2), 13-23.
3. Trafton, J. G., Cassimatis, N. L., Brock, D. P., Bugajska, M. D., Mintz, F. E., & Schultz, A. C. (under review). Enabling effective human-robot interaction using perspective-taking in robots.
4. Franklin, N., Tversky, B., & Coon, V. (1992). Switching points of view in spatial mental models. Memory & Cognition, 20(5), 507-518.
5. Schultz, A., Adams, W., & Yamauchi, B. (1999). Integrating exploration, localization, navigation and planning through a common representation. Autonomous Robots, 6(3), June 1999.
6. Anderson, J. R., & Lebiere, C. (1998). Atomic components of thought. Mahwah, NJ: Erlbaum.
7. Clark, H. H. (1996). Using language. Cambridge University Press.
8. Harrison, A. M., & Schunn, C. D. (2002). ACT-R/S: A computational and neurologically inspired model of spatial reasoning. In W. D. Gray & C. D. Schunn (Eds.), Proceedings of the Twenty-Fourth Annual Meeting of the Cognitive Science Society (p. 1008). Fairfax, VA: Lawrence Erlbaum Associates.

ABOUT THE AUTHORS
Alan C. Schultz is head of the Intelligent Systems Section, Navy Center
for Applied Research in Artificial Intelligence at the Naval Research Laboratory (NRL) in Washington, D.C. He has 18 years of experience and over 70 publications in robotics, human-robot interaction, and machine learning, and is responsible for establishing and running the robotics laboratory at NRL. Mr. Schultz taught at the first IEEE/RAS Summer School on Human-Robot Interaction and has chaired many conferences and workshops on robotics and human-robot interaction. His research is in the areas of human-robot interaction, machine learning, autonomous robotics, and adaptive systems.

Greg Trafton is a cognitive scientist at the Naval Research Laboratory. He received a B.S. in computer science (with a second major in psychology) from Trinity University in San Antonio, TX, in 1989, and a master's and Ph.D. in psychology from Princeton University in 1994. He is interested in putting appropriate computational cognitive models on robots to facilitate human-robot interaction.
Robots as Laboratory Hosts

Candace L. Sidner
[email protected]

Christopher Lee
[email protected]

AT MITSUBISHI ELECTRIC RESEARCH LABORATORIES, we are investigating the role of hosting activities with humanoid robots. Hosting activities are activities in which an agent in an environment provides services, particularly information and entertainment services. They include tasks such as serving as a docent in a museum or a host in a laboratory, and greeting and guiding visitors in stores and shopping malls. This research is driven by three scientific questions:

• How do people convey connection, both verbally and non-verbally, in their interactions with one another?
• Given what is learned from the above, how can robots convey engagement in interactions with people?
• Under the constraints of current robotics, vision, and language technologies, can humanoid robots act as hosts to people and successfully engage with them?

Our robot behavior has been informed by a study of human-to-human interactions involving laboratory hosting [1]. From videotapes of people hosting visitors to our lab, we have extracted an understanding of the engagement process: the process by which people begin, maintain, and end their perceived connection to one another. While the abilities to converse and collaborate are central to this process, non-verbal behaviors are also critical—getting them wrong is likely to cause a human to misinterpret the robot's behavior. In our study we focused on the use of the face to look at the other person or at the surrounding environment as a feature of engagement. In general, looking at the other person conveys engagement, while long looks away indicate a decline in maintaining engagement. However, since people cannot and do not always look at their conversational partners, even when

[Figure 1: Mel and the IGlassware Demo]
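As a rough illustration of how this gaze heuristic might be operationalized, the sketch below accumulates an engagement estimate from whether the partner is currently in the robot's gaze. The update rates and threshold are arbitrary assumptions for illustration, not the behavior implemented on our robot Mel.

```python
# Illustrative sketch of the gaze heuristic described above: looking at the
# partner maintains engagement, while sustained looks away lower it.

def update_engagement(score: float, partner_in_gaze: bool, dt: float) -> float:
    """Return an engagement estimate in [0, 1] after dt seconds of observation."""
    if partner_in_gaze:
        score += 0.5 * dt      # mutual gaze rebuilds engagement quickly
    else:
        score -= 0.1 * dt      # brief glances away are tolerated...
    return min(1.0, max(0.0, score))

def is_disengaging(score: float) -> bool:
    # ...but a long look away eventually drives the estimate below threshold.
    return score < 0.3
```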