People's Interpretations of Agents' Attitude from Artificial Sounds Expressed by Agents with Different Appearances

Takanori Komatsu ([email protected])
International Young Researchers Empowerment Center, Shinshu University, 3-15-1 Tokida, Ueda, Nagano 386-8567 Japan

Seiji Yamada ([email protected])
Digital Content and Media Science Research Division, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda, Tokyo 101-8430 Japan

Abstract

We conducted experiments in which participants were presented with artificial sounds that were intended to convey the attitudes of three different agents: a MindStorms robot, an AIBO robot, and a laptop PC. The participants were then asked to select the correct attitudes based on the sounds expressed by these three agents. The results showed that the participants had higher interpretation rates when the PC presented the sounds and lower rates when MindStorms and AIBO presented them, even though these agents expressed completely identical information.

Keywords: Agent Appearance; Agent Attitudes; Human-Agent Interaction; Subtle Expressions; Impression Formation.

Introduction

Recently, one of the "hottest" topics in human-computer and human-agent interaction studies concerns how agent appearance affects interactions with people. Since people supposedly determine agent behavior models based on agent appearance, appearance significantly affects their interaction (Yamada & Yamaguchi, 2004). For example, when people encounter a dog-like robot, they expect dog-like behavior from it, and they naturally speak to it using commands and other utterances intended for dogs, such as "sit," "lie down," and "fetch." However, they do not act this way toward a cat-like robot. Several studies have focused on the effects of agent appearance on interactions with people (Kiesler et al., 1995; Kiesler & Sproull, 1997; Goetz et al., 2003). Kiesler et al. (1995) conducted a psychological experiment in which participants played a prisoner's dilemma game with virtual characters (a human and a dog) that appeared on a computer display. The results showed that participants who had owned dogs interacted significantly more effectively with the dog-like virtual agent, e.g., cooperating significantly more with it. Goetz et al. (2003) investigated the effects of different head designs of humanoid robots on people's impressions. The results showed that participants considered robots with human-like heads good at social tasks, while robots with machine-like heads were considered adept at industrial tasks. These are pioneering studies on the effects of agent appearance on people's impressions.

Based on the above findings, we focused on how people interpret the information expressed by agents with different appearances, especially just after they have determined the agents' behavior models based on their appearances. In other words, we focused on the relationship between agent appearances and their expressed information. As mentioned above, when people face certain agents, they determine the agents' behavior models; they then interpret the agents' expressed information based on these models and start interacting with the agents. Therefore, this study constitutes a significant step toward clarifying how agent appearance affects interaction with people. Moreover, the results of this study will contribute to a design policy for agents by suggesting what appearances agents should have to interact effectively with people and what kind of information these agents should express.

Related Studies

Several approaches have been taken to investigate the effects of agent appearance on users' impressions. For example, Matsumoto et al. (2005) proposed a "Minimal Design Policy" for designing interactive agents and concluded that agent appearance should be minimized. In fact, they applied this minimal design policy in developing Muu, their interactive robot (Okada et al., 2000), and Talking Eye, a life-like agent (Suzuki et al., 2000). Although their work focused on the effects of agent appearance on people's impressions, much like Kiesler's study above (1995), it did not address the relationship between agent appearances and their expressed information; instead, it proposed a rather abstract design strategy for interactive agents. In their "Media Equation" studies, Reeves and Nass (1998) showed that anthropomorphized agents or computers can induce natural human behaviors, as if people were behaving toward other humans. Their studies focused on the human propensity to anthropomorphize artifacts, even though these artifacts do not have human-like appearances. These studies were not concerned with the relationship between appearance and expressed information. Kanda et al. (2005) conducted experiments to observe people's behavior toward two different types of humanoid robots: ASIMO (Honda Motor Co., Ltd.) and Robovie (ATR-IRC). The results showed that the different appearances of these two robots affected people's nonverbal expressions, such as gestures and body movements. However, this study also did not address the relationship between appearance and expressed information.

Agent Attitudes, Appearance, and Expressed Information

We conducted a simple experiment to investigate how people interpret presented artificial sounds as expressed information, to see whether they could determine specific attitudes from agents with different appearances, and to clarify the relationship between agent appearance and expressed information. We selected positive and negative attitudes corresponding to valence values (Reeves & Nass, 1998) as the primitive attitudes that the agents should express. Informing people of these two primitive attitudes is crucial if agents are to interact effectively with people (Kobayashi & Yamada, 2005). Next, simple artificial sounds without any verbal information were presented to inform people of the agents' primitive attitudes. Komatsu (2007) showed that people can estimate different primitive attitudes from simple beep-like sounds of different durations and inflections. These simple but intuitive expressions are called subtle expressions (Liu & Picard, 2003; Komatsu, 2006). Regarding the interpretation of these subtle expressions, Komatsu (2005) argued that how the expressed information changes in terms of its natural energy flow, such as the natural damping phenomenon in speech sounds, determines people's interpretation: 1) when the presented information drastically deviates from its natural energy flow, the information sender's expressed attitude is interpreted as negative; 2) when the presented information agrees with its natural energy flow, the expressed attitude is interpreted as positive. For the agents, we selected three artifacts with different appearances: a Mindstorms robot (The LEGO Group), an AIBO robot (Sony Corporation), and a normal laptop PC (Let's note W2, Panasonic Inc.). We expected these artifacts to evoke mechanical, familiar, and non-agent-like impressions, respectively.
First, in a preliminary experiment, we investigated which artificial sounds might evoke positive or negative attitudes. We then conducted an experiment in which the sounds selected in the preliminary experiment were presented to the participants by the three different agents, and we investigated how the participants interpreted them. Finally, we summarized the results and discussed the effects of agent appearance on people's interpretation of agents' expressed information.

Preliminary Experiment

Before the actual experiment, we conducted a preliminary experiment to determine which artificial sounds effectively evoke certain attitudes, either positive or negative. We focused on artificial sounds that acted as subtle expressions in a previous study (Komatsu & Nagasaki, 2004) and investigated which kinds of sounds were interpreted as positive or negative. We prepared 44 different types of triangle-wave sounds with four different durations and 11 different fundamental frequency (F0) contours. The four durations were 189, 418, 639, and 868 ms. The 11 F0 contours were set so that the transition range of the F0 values between the onset and endpoint of each sound stimulus was -125, -100, -75, -50, -25, 0, 25, 50, 75, 100, or 125 Hz; the contours were linearly downward or upward (Figure 1). All 44 stimuli had the same average F0 of 131 Hz and the same sound pressure (around the participants' heads: 60 dB (FAST, A)). In addition, these sounds had a tone that resembled a computer beep. The sounds were presented by a normal laptop PC (Let's note W2, Panasonic Inc.).
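The stimulus construction described above can be sketched as follows (a minimal illustration, not the authors' actual synthesis code; the function name, sampling rate, and the arcsin-of-sine triangle-wave formula are assumptions):

```python
import numpy as np

def make_stimulus(duration_ms, f0_start, f0_end, sr=44100):
    """Triangle-wave beep whose F0 ramps linearly from f0_start to f0_end (Hz)."""
    n = int(sr * duration_ms / 1000)
    f0 = np.linspace(f0_start, f0_end, n)           # linear F0 contour (Hz)
    phase = 2 * np.pi * np.cumsum(f0) / sr          # integrate frequency over time
    return (2 / np.pi) * np.arcsin(np.sin(phase))   # triangle wave in [-1, 1]

# e.g. "189u025": 189 ms, F0 rising 25 Hz around the 131 Hz average F0
wave = make_stimulus(189, 131 - 12.5, 131 + 12.5)
```

Integrating the instantaneous frequency (the `cumsum` step) rather than multiplying frequency by time keeps the phase continuous while the pitch sweeps, which avoids audible discontinuities.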


Figure 1: 11 different F0 contours (duration: 189 ms). For example, "189u25" indicates that the duration was 189 ms and the F0 transition range was 25 Hz with an upward slope (increasing intonation).

[Figure 2 table omitted: each of the 44 stimuli (189d025 through 868du0) is listed with each of the ten participants' (A-J) ratings of "+", "-", or "*".]

The three attitudes were defined for the participants as follows:
- Positive: the PC's internal state appears positive.
- Negative: the PC's internal state appears negative.
- Indistinguishable: it is unclear whether the PC's internal state is positive or negative.

One randomly selected sound among the 44 prepared sounds was presented, and the participants were then asked to select one of the three attitudes. Each participant heard all 44 prepared sounds, and the order of presentation was counterbalanced across the ten participants.

Results

The results of this preliminary experiment are depicted in Figure 2, which indicates which sounds were interpreted as "positive (+)," "negative (-)," or "indistinguishable (*)." All ten participants believed the PC had a positive attitude for the following five sounds: 189 ms with an upward slope range of 125 Hz (189u125), 418u125, 639u100, 639u125, and 868u125. They also all believed the PC had a negative attitude for the following six sounds: 418 ms with a downward slope range of 25 Hz (418d25), 418d50, 418d75, 418d125, 639d50, and 639d75. Thus, regardless of duration, sounds with faster increasing intonation were considered positive, while sounds of around 500 ms with slower decreasing intonation were considered negative. This result agreed with the argument about the interpretation of subtle expressions mentioned above (Komatsu, 2005). Among the five sounds interpreted as positive, we eliminated the sound labeled 639u100 because it had the slowest slope range. Among the six sounds interpreted as negative, we eliminated the sounds 418d125 and 639d75 because they had the fastest slope ranges. The remaining eight sounds were then selected for the agents with different appearances in the next experiment.
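The unanimity criterion used to pick the candidate sounds can be sketched as follows (a minimal illustration; the ratings for "639du0" below are made up for demonstration, while the other two match the unanimous ratings reported above):

```python
# Ratings per stimulus across the ten participants: '+', '-', or '*'.
ratings = {
    "189u125": ["+"] * 10,             # unanimously positive (as reported)
    "418d050": ["-"] * 10,             # unanimously negative (as reported)
    "639du0":  ["+"] * 4 + ["*"] * 6,  # mixed ratings: hypothetical example
}

def unanimous_attitude(votes):
    """Return '+' or '-' if all votes agree and are not '*'; otherwise None."""
    return votes[0] if len(set(votes)) == 1 and votes[0] != "*" else None

# Keep only the stimuli every participant rated identically.
selected = {s: unanimous_attitude(v) for s, v in ratings.items()
            if unanimous_attitude(v)}
# selected -> {"189u125": "+", "418d050": "-"}
```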

Figure 2: Results of the preliminary experiment: "+" means "positive," "-" means "negative," and "*" means "indistinguishable."

Participants

Ten Japanese university students (six men and four women, aged 19 to 23) participated. Hearing tests established that all had normal hearing.

Procedure

First, participants were instructed to determine the attitudes of a PC based on the presented sounds. After listening to each of the 44 prepared sounds, they selected one of three attitudes: "positive," "negative," or "indistinguishable." The three attitudes were defined for the participants as described earlier.

Experiment

This experiment investigated the effects of agent appearance on how participants interpreted agent attitudes. The participants were presented with the sounds selected in the preliminary experiment by agents with different appearances, i.e., the Mindstorms robot, the AIBO robot, and a normal laptop PC (Figure 3). The participants were then asked to determine the appropriate attitude, among positive, negative, or indistinguishable, based on the sounds expressed by these agents.

Participants

Twenty Japanese university students (17 men and three women, aged 19 to 24) participated. No participant was familiar with the robots or had taken part in the preliminary experiment. Hearing tests established that all had normal hearing.


Figure 3: AIBO robot, Mindstorms robot, and laptop PC (from left to right)

Figure 4: Experimental setting (the sound-expressing PC transmits sounds to Mindstorms via an FM transmitter and radio tuner, the AIBO-operating PC sends motion commands to AIBO over wireless LAN, and the laptop PC is operated remotely via RealVNC over wireless LAN)

Procedure

First, participants were instructed to select one of three attitudes, "positive," "negative," or "indistinguishable," after the agent presented an artificial sound. All participants experienced the following three experimental conditions:
1. Eight sounds expressed by Mindstorms (MS condition): the eight sounds selected in the preliminary experiment came from an FM radio tuner mounted on Mindstorms. This radio tuner received the sounds transmitted from a sound-expressing PC (Figure 4).
2. Eight sounds expressed by AIBO (AIBO condition): the sounds were presented using AIBO's operating software, "AIBO Entertainment Player," installed on an AIBO-operating PC. This software remotely controlled the AIBO.
3. Eight sounds expressed by the laptop PC (PC condition): conditions nearly identical to those in the preliminary experiment were used. This laptop PC was the same one utilized in the preliminary experiment and was remotely operated by the sound-expressing PC.
All participants experienced these three conditions; the order of the experimental conditions and the presentation order of the eight sounds were counterbalanced.

Figure 5: Average interpretation rates of the three agents (MS, AIBO, and PC conditions)

Results
We calculated interpretation rates to indicate how often the participants correctly determined the agents' attitudes in the three experimental conditions. The results show that the participants had interpretation rates of 0.56 (4.45 correct responses out of eight experimental stimuli) in the MS condition, 0.55 in the AIBO condition, and 0.83 in the PC condition (Figure 5). The interpretation rate in the PC condition was lower than in the preliminary experiment (where the rate was 1.0, because we utilized only the sounds for which all ten participants gave identical interpretations) because two participants answered "indistinguishable" in all experimental trials. An ANOVA on the interpretation rates showed significant differences among these three experimental conditions (F(2, 38) = 15.56, p < .001).
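The interpretation-rate measure can be reproduced in a few lines (a sketch only; the per-participant counts below are illustrative, not the authors' raw data):

```python
def interpretation_rate(correct_counts, n_stimuli=8):
    """Mean proportion of the eight sounds whose intended attitude was chosen."""
    return sum(c / n_stimuli for c in correct_counts) / len(correct_counts)

# Illustrative correct counts (out of 8) for 20 participants in one condition;
# counts averaging ~4.5 give a rate near the 0.56 reported for the MS condition.
ms_counts = [5, 4] * 10
rate = interpretation_rate(ms_counts)  # 0.5625
```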
