Preliminary Field Trial for Teleoperated Communication Robots
Satoshi Koizumi, Takayuki Kanda, Masahiro Shiomi, Hiroshi Ishiguro, and Norihiro Hagita, Member, IEEE
Abstract—This paper introduces a “teleoperated communication robot” whose distinguishing feature is that its language capability is provided by a human operator, avoiding the difficulty of automatic recognition of spoken language. Its nonverbal behavior, in contrast, is conceptually controlled autonomously. One important question is whether ordinary people will accept this type of communication robot. We therefore conducted a preliminary field trial at a shopping center, which revealed positive prospects as well as technical problems to be solved.
I. INTRODUCTION
Our research aims to develop “communication robots” that interact naturally with humans and support everyday activities at places such as train stations and shopping centers (Figure 1). Since the target audience of a communication robot is the general public, who do not have specialized computing and engineering knowledge, a conversational interface using both verbal and nonverbal expressions is becoming more important. Previous studies in robotics have emphasized the merit of robot embodiment, showing the effectiveness of facial expression [1], eye gaze [2], and gestures [3]. On the other hand, communication robots do not yet have sufficient capability for verbal communication. The principal difficulty concerns speech recognition of colloquial utterances in noisy environments: current technology is only capable of recognizing formal utterances in noiseless environments. Although research is being done in robot audition (for example, [4]), the difficulties in daily
Fig. 1 Scene of a communication robot providing information service in a shopping center
(a) Human model by D. Norman ([5], p. 28): reflective level above behavioral and visceral control, between sensors and actuators
Manuscript received March 15, 2006. This work was supported in part by the Ministry of Internal Affairs and Communications of Japan.
S. Koizumi is with the ATR Intelligent Robotics and Communication Laboratories, Kyoto, Japan (phone: +81-774-95-1476; fax: +81-774-95-1408; e-mail: [email protected]).
T. Kanda is with the ATR Intelligent Robotics and Communication Laboratories, Kyoto, Japan (e-mail: [email protected]).
M. Shiomi is with the ATR Intelligent Robotics and Communication Laboratories, Kyoto, Japan, and Osaka University, Osaka, Japan (e-mail: [email protected]).
H. Ishiguro is with the ATR Intelligent Robotics and Communication Laboratories, Kyoto, Japan, and Osaka University, Osaka, Japan (e-mail: [email protected]).
N. Hagita is with the ATR Intelligent Robotics and Communication Laboratories, Kyoto, Japan (e-mail: [email protected]).
(b) Teleoperated communication robots: the language level is performed by humans, while behavioral and reactive control are implemented in software
Fig. 2 Model of teleoperated communication robots
environments are still beyond the grasp of current technology.
In addition, there is a fundamental difficulty in the development process for communication robots in real fields: they must be placed in real situations, because we cannot otherwise reproduce such situations. For example, we tried to reproduce real field settings in our laboratory with several researchers, but failed due to unexpected human behavior. People in the field do not behave as predicted. This trend becomes obvious when the number of people is large, when there are various kinds of people such as children, parents, and senior citizens, and when there are many objects in the environment. These factors greatly promote interaction among people, which results in an excessively chaotic situation.
To address these problems, our solution is to use semi-autonomous robots supported by human operators. This allows robots to perform useful tasks in real fields. Moreover, it enables human developers to continuously improve the robots’ capabilities so that they can be gradually automated.
Figure 2 describes our teleoperated communication robot model, which is inspired by D. Norman’s human model (p. 28 in [5]). In his model (Figure 2(a)), the visceral layer corresponds to actions that can be performed by simple creatures, such as lizards. The behavioral layer corresponds to the behavior of mammals acquired through repeated training. Behavior in these two layers is executed unconsciously, which frees humans to think at the reflective layer while the two lower layers carry out actions. In our model of a teleoperated communication robot, the reflective layer is operated by humans (operators and developers). In the beginning, most of this layer will be performed by an operator, while the reactive and behavioral layers will be implemented as software modules. Simple vocal communication such as greetings can be implemented as behavioral modules, as in Robovie [6]. Replacing the reflective layer with software may remain difficult; in other words, most language communication will continue to be managed by human operators, and new behaviors will be continually supplied by human developers.
An important consideration is the robot’s partial autonomy. There is a considerable amount of time when the robot is not talking and is idle because no one is interacting with it.
Partial autonomy will potentially enable one operator to manage two or more robots simultaneously. (Some robots can be fully autonomous without the reflective part if their tasks are simple enough, such as a playmate at an elementary school [7] or a guide robot in a museum [8].)
An important question that should be examined before implementing such a teleoperated communication robot is whether people will actually accept a teleoperated service; sometimes a robot is preferred precisely because it is not human. We therefore conducted preliminary field trials to verify this and to identify what kinds of services will be accepted.
Much previous work exists on the teleoperation of robots, including stable methods for controlling robot posture (such as [9]). Several researchers have developed systems that utilize robots as a medium of interhuman communication, such as the well-known “tele-presence” concept [10]. Watanabe et al. developed an embodied interaction system that allows people to communicate vocally with each other while robots exchange bodily information [11]. In addition, there are several research works in the field of human-robot
Fig. 3 Teleoperated communication robot system
Fig. 4 Shopping center
interaction using a Wizard-of-Oz (WOZ) method [12, 13]. However, little research has tested the viability of robots in such public service fields.

II. SYSTEM CONFIGURATION

To provide services such as guidance or information with a robot, we developed a prototype system for communication robots, as shown in Figure 3. This system consists of a humanoid robot, cameras attached to the environment, and an operation system.
A. Setting
Figure 4 shows an example of the system installed in a shopping center. The robot, set at point A, provides shop-guidance and information services to visitors using a map on a sign. The robot was teleoperated from the operation room installed at point B; movement and behaviors were controlled over a wireless LAN. An operator monitored video footage taken by cameras attached to the environment, listened to sounds received from the humanoid robot, transmitted his or her voice to the robot, and ordered the robot to perform behaviors.
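As a rough sketch of the semi-autonomous control model described in Section I — reactive and behavioral layers in software, with the reflective (language) layer delegated to the operator — the control loop might look as follows. All class and function names here are hypothetical illustrations, not part of the actual system.

```python
# Hypothetical sketch of the layered control model (Fig. 2(b)):
# reactive/behavioral layers run autonomously in software, while
# the reflective (language) layer is delegated to a human operator.

class ReactiveLayer:
    """Simple reflexes, e.g. stopping before an obstacle."""
    def act(self, sensors):
        if sensors.get("obstacle_cm", 999) < 30:
            return "STOP"
        return None

class BehavioralLayer:
    """Trained routines, e.g. a greeting triggered by proximity."""
    def act(self, sensors):
        if sensors.get("person_cm", 999) < 100:
            return "GREET"
        return None

class ReflectiveLayer:
    """Language-level decisions, forwarded to the human operator."""
    def __init__(self, operator):
        self.operator = operator
    def act(self, sensors):
        # The operator picks a behavior module from the console.
        return self.operator(sensors)

def control_step(layers, sensors):
    # Lower layers take priority for immediate, "unconscious" actions;
    # otherwise the reflective (operator) layer decides.
    for layer in layers:
        command = layer.act(sensors)
        if command is not None:
            return command
    return "IDLE"

# Usage: a callback stands in for the teleoperation console.
layers = [ReactiveLayer(), BehavioralLayer(),
          ReflectiveLayer(lambda s: "POINT_RIGHT")]
print(control_step(layers, {"person_cm": 500}))  # -> POINT_RIGHT
```

Because the lower layers answer first, the operator is only consulted when no autonomous module fires — which is what would let one operator manage several robots at once.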
Fig. 5 Robovie and an interaction scene
B. Hardware
This system is composed of the humanoid robot, a personal computer for teleoperation, and cameras attached to the environment. Figure 5 shows the humanoid “Robovie” [6], which is capable of human-like expression and of recognizing individuals using various actuators and sensors. Its body possesses highly articulated arms (4 DOF), eyes (2 DOF), and a head (3 DOF), which were designed to produce gestures sufficient for effective communication with humans. The sensory equipment includes auditory, tactile, ultrasonic, and vision sensors, which allow the robot to behave autonomously and interact with humans. All processing and control systems, such as the computer and motor-control hardware, are located inside the robot’s body. The cameras have a wireless video-transmission system. As shown in Figure 4, the monitors and the personal computer for teleoperation were set up in the operation room and display video footage taken by cameras located at the four points shown. Using the operation system described below, the operator directed the humanoid robot to perform tasks and to move. The personal computer is connected to the robot by wireless LAN.
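The command link between the teleoperation PC and the robot runs over the wireless LAN. The paper does not specify the wire protocol, so the sketch below is purely illustrative: a behavior-module command or a direct motion command is serialized as one JSON message over TCP. The message fields and function names are assumptions.

```python
import json
import socket

# Illustrative only: the actual wire protocol of the system is not
# described in the paper. A behavior command or a direct speed/angle
# command is encoded as one JSON line.

def encode_behavior(name):
    """Behavior-module command, e.g. one of Table 1 (NOD_SLOW, ...)."""
    return (json.dumps({"type": "behavior", "name": name}) + "\n").encode()

def encode_motion(speed_mm_s, angle_deg):
    """Direct locomotion command: translation speed and turn angle."""
    return (json.dumps({"type": "motion", "speed": speed_mm_s,
                        "angle": angle_deg}) + "\n").encode()

def send_command(host, port, payload):
    """Deliver one encoded command to the robot over TCP."""
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(payload)

# Usage (encoding only; no network side is assumed to be running):
msg = encode_behavior("POINT_RIGHT")
print(msg.decode().strip())
```

A line-oriented text protocol like this keeps the operator console and robot loosely coupled; any real implementation would also need acknowledgements, since the trial showed subjects are sensitive to response delays.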
C. Operation System
The teleoperation system consists of four monitors that display video footage taken by cameras attached to the environment, one personal computer for controlling the robot, and a database in which operation logs and video footage were recorded. Figure 3 shows the flow of information among visitors, robot, and operator. The operator recognizes the situation around the robot from video footage, shown in Figure 6(a), and chooses the proper behavior module using the operation system. Figure 6(b) shows a window of the operation system. On receiving commands from the system, the robot performs tasks such as pointing or nodding. The operator also controls the robot’s movement by directly sending speed and angle control commands.
Fig. 6 Example of video footage and a screen capture of the operation system
For voice interaction, we use an IP telephone system. The operator hears sounds from a microphone attached to the robot through a hands-free headset, and the operator’s voice is transmitted from the headset microphone to a speaker mounted on the robot. Conventionally, Robovie incorporates an autonomous control mechanism for communication; however, this mechanism was deactivated because our target is to provide
services by teleoperation.

III. FIELD TRIALS

A. Details of Experiments
We conducted verification experiments with the robot providing services in a shopping center. In these experiments, we administered four tasks to the subjects, such as shop guidance and offering information. Since this shopping center is subdivided into sections, visitors in a given area often have difficulty understanding shop locations. The purpose of the experiments was to verify the model of a teleoperated communication robot and to examine the possibility of future robotic services.
With the objective of providing services in a public space such as a shopping center, we selected subjects unfamiliar with the shopping center’s layout and surroundings. All subjects were in their twenties: three males and three females. The experiments were conducted six times, once per subject for each task. For the convenience of the shopping center, we performed the experiments late at night. Subjects not currently performing a task played passersby.
To verify the model of the teleoperated communication robot, 10 robot behaviors, including pointing and nodding, were carefully selected. Table 1 shows the behaviors used in the experiments.

TABLE 1: LIST OF ROBOVIE’S BEHAVIORS
Head:
  UP_FACE      look up into user’s face
  NOD_SLOW     nod Robovie’s head slowly
  NOD_RAPID    nod Robovie’s head rapidly
  FACE_AHEAD   face ahead
  FACE_RIGHT   face to the right
  FACE_LEFT    face to the left
Arm:
  POINT_AHEAD  point ahead
  POINT_RIGHT  point across to the right
  POINT_LEFT   point across to the left
  MARCH        swing Robovie’s arms while walking

The following experimental tasks were given to the subjects.
1) Guide service task: The subject asks the robot, which is set beside a sign, the way to a shop that the subject cannot find using the sign.
2) Guide service task collaborating with a portable device: The subject calls the robot with a portable device into which a passive RFID tag reader has been installed. Then the robot shows the subject the way to the destination. Here, we assumed that destination information is pre-registered in the portable device.
3) Information supplement service task: The subject
asks the robot for information such as the degree of congestion in a shop that the subject hopes to visit.
4) Greeting service task using personal information: While walking around the shopping center, the subject is greeted by the robot using personal information, such as his or her name.
We obtained evaluations in free-description questionnaires for each experimental task. There are two reasons why we did not use numerical measurements. Since this is a novel area of research, there are no established numerical evaluation methods; we therefore decided to use a free-description questionnaire as a starting point. We also wanted to concentrate on collecting comments to improve the robot system.

B. Results
We reviewed the comments on the free-description forms and found that subjects mainly mentioned the following two aspects:
1. Desire for each robot service
2. Satisfaction with each robot service
Thus, we analyzed whether subjects expressed positive or negative views on these aspects for each service. Two evaluators not involved in this experiment established the evaluation criteria and independently judged whether subject responses were positive or negative.
1) Desire for each robot service
Figure 7 shows the results of the analysis of the desire for each robot service. The evaluators’ judgments matched highly (κ = 0.703). We adopted only the responses on which the two evaluators’ judgments matched, omitting the responses on which they did not. All subjects mentioned the desire for robot tasks 1, 2, and 3 positively. As a result of Fisher’s exact test, positive answers were significantly high in tasks 1, 2, and 3 (p = .03). Although there was no significant difference in task 4 (p = .62), four subjects gave negative answers for this service; thus, this service is less desirable for subjects.
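For reference, the two statistics used in this analysis — Cohen’s κ for inter-evaluator agreement and Fisher’s exact test on the positive/negative counts — can be computed in plain Python as below. The input data are made-up examples, not the study’s actual responses.

```python
from math import comb

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters' categorical judgments."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    cats = set(labels_a) | set(labels_b)
    # Observed agreement vs. agreement expected by chance.
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_exp = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    def table_p(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column margins.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    p_obs = table_p(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables at least as extreme.
    return sum(p for p in (table_p(x) for x in range(lo, hi + 1))
               if p <= p_obs + 1e-12)

# Fisher's classic "lady tasting tea" table as a sanity check:
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # -> 0.4857
```

`scipy.stats.fisher_exact` and scikit-learn’s `cohen_kappa_score` give the same results; the dependency-free versions above simply make the definitions explicit.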
Here we introduce some of the positive comments:
- Because the robot asked, “Are you looking for an optician’s shop?” I immediately realized that the robot was talking to me. (The subject had input an optician’s shop as his destination in the portable device.) (Task 2)
- I found the information service about the crowdedness of the shop very useful, because such information cannot be obtained from a map or any other existing source. (Task 3)
We believe that these comments reveal a desire for a robot that can help people with sophisticated language capabilities as well as individualized services.
Fig. 7 Desire for each robot service (positive vs. negative responses for Task 1: Route, Task 2: PDA, Task 3: Info., Task 4: Name)
On the other hand, there was also much negative feedback on task 4. For example:
- I was embarrassed when the robot called me by name because there were many people around.
- I appreciate a personalized information service, but I cannot accept a service that reveals personal information (including my name) by voice.
Thus, subjects did not accept a robot that greeted them by name, because such vocalization violated their privacy.
2) Satisfaction with each robot service
Figure 8 shows the results of the analysis of satisfaction with each robot service. In contrast with desire, satisfaction was mostly negative. The κ value for the two evaluators’ judgments was 0.412, which indicates marginal agreement. We omitted subject responses on which the judgments did not match. Fisher’s exact test indicated a marginally significant difference in tasks 1 and 4 (p = .06). There was no significant difference in task 2 (p = .38) or task 3 (p = 1.0). To summarize, in all tasks, subjects were mostly unsatisfied with the service from the robot. For example, a subject mentioned that “when communication timing was inappropriate, I felt
Fig. 8 Satisfaction with each robot service (positive vs. negative responses for Task 1: Route, Task 2: PDA, Task 3: Info., Task 4: Name)
uncomfortable, as if something was wrong,” and “I was confused because the robot moved slowly.” Subjects mentioned the following aspects that require improvement:
- pauses (silence, no responses) during conversation
- speed of movements (arm, head, and locomotion)
- voice quality
There were also a few positive opinions, such as “I talked comfortably with the robot because I could talk with it as if talking with a person,” and “I talked in a small voice, so I worried whether it could hear me. Since it responded correctly, I was impressed.” Recognizing quiet utterances in noisy environments will remain very difficult even with near-future technology; we believe this indicates one positive aspect of our approach. Nevertheless, many technological problems remain.

IV. DISCUSSION AND CONCLUSION

The purpose of the preliminary experiment was to verify our approach to teleoperated communication robots; that is, to initially deploy a useful robot controlled by operators and gradually replace its functions with autonomous software. The important question was whether people would accept such a robot service controlled by operators. The experimental results indicated positive prospects for teleoperated communication robots in the near future. Subjects accepted that robots can fulfill route-guidance tasks and information services with a human operator as the controller. As task 2 revealed, a service based on a portable device suggests the possibility of making route-guidance tasks autonomous.
There were negative responses from subjects who were called by name, which is the opposite of previous case studies in an elementary school [7] and a science museum [8], where people were greatly pleased. This seems to reflect partly the presence of others and privacy concerns, and partly the robot’s recognition capability: since the robot communicated with people in natural language, subjects were not impressed that it could call them by name.
On the other hand, many issues of robot usability must be improved. For example, subjects were sensitive to delays in verbal and nonverbal responses, which were mostly caused by the usability of the operator’s remote-control interface. Moreover, some simple responses, such as nodding, should be performed autonomously to reduce pauses during conversation. Many similar issues have already been discussed in human-robot interaction (HRI) research on teleoperation interfaces for robots with physical tasks, such as visualization of sensory information, control of multiple robots, and comfort with partial autonomy. We believe that “teleoperated communication robots” involve
many new research problems in the field of HRI that need to be solved.

ACKNOWLEDGMENT

This research was supported in part by the Ministry of Internal Affairs and Communications of Japan.

REFERENCES

[1] C. Breazeal and B. Scassellati, “Infant-like social interactions between a robot and a human caretaker,” Adaptive Behavior, vol. 8, no. 1, 2000.
[2] K. Nakadai, K. Hidai, H. Mizoguchi, H. G. Okuno, and H. Kitano, “Real-time auditory and visual multiple-object tracking for robots,” Proc. Int. Joint Conf. on Artificial Intelligence, pp. 1425-1432, 2001.
[3] O. Sugiyama, T. Kanda, M. Imai, H. Ishiguro, and N. Hagita, “Three-layered draw-attention model for humanoid robots with gestures and verbal cues,” IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2005), pp. 2140-2145, 2005.
[4] K. Nakadai, D. Matsuura, H. G. Okuno, and H. Kitano, “Applying scattering theory to robot audition system,” IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2003), pp. 1147-1152, 2003.
[5] D. A. Norman, Emotional Design, Basic Books, 2003.
[6] T. Kanda, H. Ishiguro, M. Imai, and T. Ono, “Development and evaluation of interactive humanoid robots,” Proceedings of the IEEE, vol. 92, no. 11, pp. 1839-1850, 2004.
[7] T. Kanda, T. Hirano, D. Eaton, and H. Ishiguro, “Interactive robots as social partners and peer tutors for children: a field trial,” Human-Computer Interaction, vol. 19, no. 1-2, pp. 61-84, 2004.
[8] M. Shiomi, T. Kanda, H. Ishiguro, and N. Hagita, “Interactive humanoid robots for a science museum,” 1st Annual Conf. on Human-Robot Interaction (HRI 2006), 2006.
[9] N. E. Sian, K. Yokoi, S. Kajita, F. Kanehiro, and K. Tanie, “Whole body teleoperation of a humanoid robot integrating operator’s intention and robot’s autonomy,” IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2003), pp. 1651-1656, 2003.
[10] D. Sekiguchi, M. Inami, and S. Tachi, “RobotPHONE: RUI for interpersonal communication,” CHI 2001 Extended Abstracts, pp. 277-278, 2001.
[11] T. Watanabe, M. Okubo, M. Nakashige, and R. Danbara, “InterActor: speech-driven embodied interactive actor,” Int. J. of Human-Computer Interaction, vol. 17, no. 1, pp. 43-60, 2004.
[12] S. Woods, M. Walters, K. L. Koay, and K. Dautenhahn, “Comparing human robot interaction scenarios using live and video based methods: towards a novel methodological approach,” Int. Workshop on Advanced Motion Control, 2006.
[13] A. Green, H. Hüttenrauch, and K. Severinson Eklundh, “Applying the Wizard-of-Oz framework to cooperative service discovery and configuration,” Proc. IEEE Int. Workshop on Robot and Human Interactive Communication, 2004.