Ghost-in-the-Machine: Initial Results

Sebastian Loth
Bielefeld University, Faculty for Linguistics and Literature, Universitätsstr. 25, 33615 Bielefeld, Germany, +49-521-106-3669

Manuel Giuliani
fortiss GmbH, An-Institut TUM, Guerickestr. 25, 80805 München, Germany, +49-89-360-3522-544

Jan P. de Ruiter
Bielefeld University, Faculty for Linguistics and Literature, Universitätsstr. 25, 33615 Bielefeld, Germany, +49-521-106-6928

ABSTRACT

We describe the design of the newly developed Ghost-in-the-Machine paradigm and present initial results of an experiment addressing the initiation of service interactions at a bar. To develop policies for a robotic bartender, we investigated which sensor modalities were most informative to humans and which actions they selected as socially appropriate responses. The results showed that participants used two nonverbal cues for their initial response to a new customer: the customer’s distance to the bar and whether the customer’s torso was directed towards the bar. To acknowledge a new customer, the participants typically responded nonverbally by looking and smiling at the customer. All results can be directly transferred into robotic decision policies.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems – software psychology.

General Terms
Design, Experimentation.

Keywords
Human-Robot Interaction, Social Robotics, Psychology.

1. INTRODUCTION

To enable users to interact intuitively with a robot, its recognition and planning components have to interpret the natural social behavior of its users correctly, and the robot has to respond appropriately. However, the robot’s input is limited to its sensor data; thus, its responsiveness depends on interpreting these data correctly [2]. In turn, the planner has to select a socially appropriate response from the robot’s action repertoire [3]. We were therefore interested in how a socially skilled human participant would interpret the robotic sensor data and what constitutes a socially appropriate response. However, the Wizard-of-Oz paradigm (WOz) gives the participant who controls the robot access to unfiltered audio and/or video of the user [5]. Thus, we developed the Ghost-in-the-Machine paradigm (GiM) to address our research question. In this report, we focus on initiating a service interaction at a bar. This is the first step in short, goal-oriented interactions. It is crucial because the whole interaction would fail if the robot did not recognize that a customer intends to place an order.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s). HRI’14, March 3–6, 2014, Bielefeld, Germany. ACM 978-1-4503-2658-2/14/03. http://dx.doi.org/10.1145/2559636.2563696

2. GHOST-IN-THE-MACHINE

In the GiM paradigm, the main participant (ghost) is presented with the output of the robotic sensors and responds to the participants ordering drinks (customers) by selecting an appropriate response from the robot’s action repertoire. In contrast to the WOz paradigm (e.g., [6]), the ghosts were not allowed to observe the customers directly. Rather, the ghosts observed the scene through the eyes of the robot’s planner. The recognizer software issues an update to the planner after major changes in the scene [3]. These updates were presented to the ghosts through a user interface (Figure 1) that made the data comprehensible to the user. The Boolean variables (e.g., Customer is visible) were displayed as traffic lights. The angles of the customer’s body and head orientation were shown as arrows. The position of the customer’s face was shown as a dot in a coordinate system reflecting the space in front of the bar. Finally, the most likely hypothesis of the automatic speech recognizer (ASR), including its certainty level in percent, was presented. Importantly, this presentation did not add any interpretation or social content to the recognizer update, as, for example, displaying a torso in front of the bar would have.

The user interface (Figure 1) proceeded through the updates one by one after the ghost selected a response option in the control panel (Figure 2). This panel showed the robot’s action repertoire grouped by output modality (e.g., head gestures). The actions within each group were mutually exclusive; for example, the robot can look either happy or sad, but not both at the same time. Actions across different groups could be combined into a complex response (e.g., looking at the customer and saying ‘Hello’).
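The selection rule of the GiM control panel (actions within one output-modality group are mutually exclusive, actions from different groups can be combined into a complex response) could be sketched roughly as follows. This is only an illustrative sketch: the group names, action names, and the `Response` class are hypothetical and are not taken from the GiM software.

```python
from dataclasses import dataclass, field

# Hypothetical action repertoire, grouped by output modality.
# Group and action names are illustrative assumptions, not the GiM originals.
REPERTOIRE = {
    "gaze": {"look_at_customer", "look_at_objects"},
    "face": {"happy", "sad"},
    "speech": {"greeting", "ask_for_order"},
}


@dataclass
class Response:
    """A complex response holding at most one action per modality group."""
    selected: dict = field(default_factory=dict)

    def select(self, group: str, action: str) -> None:
        if group not in REPERTOIRE:
            raise KeyError(f"unknown group: {group}")
        if action not in REPERTOIRE[group]:
            raise ValueError(f"{action!r} is not in group {group!r}")
        # A new choice within a group replaces the earlier one (mutual exclusion).
        self.selected[group] = action

    def actions(self) -> list:
        """Return the combined actions of all groups in alphabetical order."""
        return sorted(self.selected.values())


response = Response()
response.select("face", "sad")
response.select("face", "happy")             # replaces "sad"
response.select("gaze", "look_at_customer")  # combined across groups
response.select("speech", "greeting")
print(response.actions())  # ['greeting', 'happy', 'look_at_customer']
```

Storing one entry per group makes the mutual exclusion structural: a second selection in the same group overwrites the first, while selections in other groups accumulate.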

Figure 1: Example of the GiM user interface.

Figure 2: Example of the GiM control interface.

The user interface was introduced in a self-paced presentation. The categorical responses of the ghosts as well as their eye-tracking data were recorded. The experiment consisted of two practice items followed by six intention recognition trials and six speech recognition trials; in this report, we focus on intention recognition. These trials terminated as soon as the ghost first acted towards a new customer. The indicator for Seeking attention was not active (both ‘lights’ off) during these trials. The experiment used pre-recorded data from the JAMES evaluation [1]. Twelve participants were recruited from the general university population.

3. RESULTS

Eleven trials were excluded because the participants did not respond to the customer until the update unequivocally indicated a drink order. This resulted in 61 informative intention recognition trials. Table 1 shows which indicators were true (green light) or active (arrow present) when the ghosts acknowledged a customer. The table also shows the average dwell time spent on each indicator between the last update on screen and the acknowledgment of the customer. The face orientation (arrow and traffic light) was never shown during the experiment because the computer vision software did not provide this information. Table 2 shows which actions the ghosts selected for acknowledging a new customer.

Table 1: Data displayed to ghosts and dwelling time.

Feature                   Trials with true/active   Averaged dwelling time
Visible                   97%                       326 ms
Close to bar              54%                       560 ms
Location (left/right)     98%                       297 ms
Body direction (arrow)    89%                       217 ms
Body faces bar            51%                       331 ms
Said something            0%                        63 ms

Table 2: Actions selected for acknowledging new customers.

Action                    Frequency (count)   Frequency (%)
Looking at customer       46                  75%
Happy face                26                  43%
Greeting (e.g., Hello)    14                  23%
Looking at objects        9                   15%
Asking for order          5                   8%
Other                     2                   3%

4. DISCUSSION

The participants reported that they experienced GiM as very immersive and occasionally tried to engage in dialogues with the customers despite being in an offline setting. Analyzing the data visible on screen showed that the ghosts never waited for a verbal utterance of the customers before acknowledging them. Rather, the ghosts responded to nonverbal cues. The indicator showing the distance to the bar received the greatest share of attention and thus was very important to the ghosts. However, the ghosts also acknowledged customers who were not close to the bar. In these cases, the customers’ body direction was displayed as an arrow, and the corresponding indicator was active but not always true. Thus, the ghosts used the body posture to anticipate that the customers would be close to the bar. This also implies that the threshold for the binary indicator Body faces bar was too strict. The ghosts successfully compensated for the missing information about the face direction by relying on the body direction. In sum, the ghosts attended to nonverbal cues and responded if the customers were visible to the system, (anticipated to be) close to the bar, and facing the bar. This replicated an earlier finding using natural stimuli [4], demonstrating the validity of the results.

In most cases, the ghosts acknowledged a new customer nonverbally. Only one third of the trials comprised a verbal greeting or a prompt for an order. About 75% of all trials involved visibly shifting the robot’s attention to the customers by looking at them. This was often accompanied by smiling at the customers.

5. CONCLUSION

The first results of the new GiM paradigm showed that a robotic bartender should acknowledge new customers if they approach the bar and direct themselves towards the counter. This replicated findings acquired using natural data [4]. Furthermore, the results showed how missing data can be compensated for, e.g., by using body direction in place of face direction. The ghosts acknowledged new customers by looking and smiling at them, but only one third of the responses included a verbal greeting. The participants reported that the design was very immersive. In sum, GiM is a reliable and engaging research tool that provides new insight into social behavior.

6. ACKNOWLEDGMENTS

This research was part of the JAMES project and received funding from the European Union’s Seventh Framework Programme under grant agreement number 270435.

7. REFERENCES

[1] Foster, M.E. et al. 2012. Two people walk into a bar: dynamic multi-party social interaction with a robot agent. Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI 2012) (Santa Monica, USA, 2012).

[2] Gaschler, A. et al. 2012. Modelling State of Interaction from Head Poses for Social Human-Robot Interaction. Proceedings of the Gaze in Human-Robot Interaction Workshop held at the 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2012) (Boston, USA, 2012).

[3] Giuliani, M. et al. 2013. Comparing task-based and socially intelligent behaviour in a robot bartender. Proceedings of the 15th ACM International Conference on Multimodal Interaction (ICMI 2013) (Sydney, Australia, 2013), 263–270.

[4] Loth, S. et al. 2013. Automatic detection of service initiation signals used in bars. Frontiers in Psychology. 4, (2013).

[5] Riek, L. 2012. Wizard of Oz Studies in HRI: A Systematic Review and New Reporting Guidelines. Journal of Human-Robot Interaction. 1, 1 (Aug. 2012), 119–136.

[6] Rieser, V. et al. 2011. Adaptive Information Presentation for Spoken Dialogue Systems: Evaluation with human subjects. Proceedings of the 13th European Workshop on Natural Language Generation (Nancy, France, 2011), 102–109.