Int J Soc Robot (2011) 3: 125–142 DOI 10.1007/s12369-010-0071-x
Communication of Emotion in Social Robots through Simple Head and Arm Movements Jamy Li · Mark Chignell
Accepted: 2 August 2010 / Published online: 4 September 2010 © Springer Science & Business Media BV 2010
Abstract Understanding how people perceive robot gestures will aid the design of robots capable of social interaction with humans. We examined the generation and perception of a restricted form of gesture in a robot capable of simple head and arm movement, referring to point-light animation and video experiments in human motion to derive our hypotheses. Four studies were conducted to look at the effects of situational context, gesture complexity, emotional valence and author expertise. In Study 1, four participants created gestures with corresponding emotions based on 12 scenarios provided. The resulting gestures were judged by 12 participants in a second study. Participants’ recognition of emotion was better than chance and improved when situational context was provided. Ratings of lifelikeness were found to be related to the number of arm movements (but not head movements) in a gesture. In Study 3, five novices and five puppeteers created gestures conveying Ekman’s six basic emotions which were shown to 12 Study 4 participants. Puppetry experience improved identification rates only for the emotions of fear and disgust, possibly because of limitations with the robot’s movement. The results demonstrate the communication of emotion by a social robot capable of only simple head and arm movement.
Funding provided by the Japan Society for the Promotion of Science (JSPS), the Natural Sciences and Engineering Research Council of Canada (NSERC), Bell University Labs and the University of Toronto.
J. Li (corresponding author) · M. Chignell
Department of Mechanical and Industrial Engineering, University of Toronto, 5 King's College Road, Toronto, ON M5S 3G8, Canada
J. Li, e-mail: [email protected]
M. Chignell, e-mail: [email protected]
Keywords Human-robot interaction · Gesture design · Communication of emotions · Puppetry
1 Introduction Modern robots are no longer being designed only to function as manufacturing aids, but they are also being introduced as social partners. In light of the roles robots are adopting as household pets (e.g., Sony’s AIBO), domestic helpers (iRobot’s Roomba), healthcare assistants (RIKEN Japan’s Ri-Man), emotional companions (AIST’s PARO) and educational aids (MIT’s Kismet and Leo), appropriate social behaviour is critical for people to develop personal relationships with such agents. Many authors have called for better design of robots capable of engaging in meaningful social interactions with people (e.g., [1, 2]). This new breed of robots is called “socially interactive robots” or “social robots” [1]. The use of gestures has been identified as crucial to the design of such robots [2]. Research on robot gestures is needed because: (1) studying gesture interpretation is necessary to improve human-robot interaction especially for robots that have limited ability for vocal and facial expressivity; (2) previous research in HRI has focused on how gestures are created without evaluating people’s understanding of those gestures, so little is known about what factors affect gesture perception; and (3) no previous work has investigated the characteristics of “good” designers and the role of expertise in gesture authorship. Current practice in the design of robot gestures has robot inventors and researchers devising gestures based on their own experience and sometimes drawing upon fields such as dance (e.g. [3]). These methods may be convenient but little work has been done to
evaluate their success. Can simple robot gestures, viewed in absence of other cues, convey emotions to viewers? What contextual and motion characteristics aid gesture understanding? Should robot engineers design robot gestures—or should those gestures be designed by an “expert” such as a puppeteer? Greater understanding of the authoring and interpretation of robot gestures would be beneficial to robot makers. To investigate how people understand gestures of social robots, we ground our work in the field of human gesturing. Just as interpersonal behavior (i.e., how people interact with other people) influences how people treat computers and has been used extensively to motivate human-computer interaction (HCI) design [4], we apply these same concepts in order to understand how people respond to robot gestures, and to motivate human-robot interaction (HRI) design. Indeed, previous work has suggested that people treat robots as social agents (e.g., [5–7]). Our focus, however, is not to confirm whether people consider robots to be social agents per se, but rather to assess how specific concepts in human gesture communication may aid the design of the social robot experience. To this end we review relevant work in point-light animation and video experiments of human gesturing and derive hypotheses on robot gesture communication (Sect. 2). Section 3 describes the methodology and results of the first paired study conducted in Japan, which explored situational context, gesture complexity and emotion valence in gesture communication using a simple bearlike robot (Fig. 1) capable of head and arm movement only. Section 4 describes the second paired study conducted in Canada to investigate expertise in gesture authoring using the same robot. Overall conclusions based on both sets of studies, as well as proposed future work, are presented in Sect. 5.
2 Background 2.1 Gestures in Human Interpersonal Communication Perceiving the moods and actions of others is one of our fundamental social skills and gesturing provides valuable information for this process. In the words of Sir Francis Bacon [8], “As the tongue speaketh to the ear, so the gesture speaketh to the eye.” Gestures are a form of non-verbal communication that is closely linked to human cognition and speech [9]. When a person speaks, not only are linguistic sounds transmitted, but the entire body is moving constantly and spontaneously to facilitate and modify what is said. This behavioral phenomenon is termed embodied communication, and gestures are a key component in the process of conveying and understanding meaning among people [10]. As a simple model, Argyle [11] characterized social communication in terms of dialogues involving successive rounds of coding and decoding. In the present context, encoding involves gesture creation and decoding involves gesture understanding. Previous research has identified several different roles for gestures during interpersonal communication. In particular, Nehaniv [12] grouped gestures into five broad classes which we summarize in Table 1 and relate to the classifications of other authors. Here we focus on gestures that communicate emotional content (class 2) because of the importance of emotionality in establishing personal relationships between robots and people [13]. Examples of gestures that express emotion include fist-clenching that indicates aggression and moving of hands when excited. We exclude conversational gestures such as those employed by embodied conversational agents [14, 15] and concentrate exclusively on the communicativeness of “stand-alone” gestures. In light of the capabilities of the robot used as well as our desire to investigate simple robotic movement, the type of gesture consisted of simple head and arm movements (excluding articulated movements of hand, finger, wrist joint or elbow joint that are typically present in human interpersonal communication). Several factors affect the coding-decoding process for expressive gestures, four of which are reviewed below: type of emotion conveyed, situational context, motion characteristics and individual differences. 2.2 Factors Affecting the Perception of Human Gestures: A Survey
Fig. 1 Participant manipulating robot used in this study
We surveyed point-light animation and body video experiments found in social psychology literature to understand how various factors influence the perception of body gestures (some relevant material on hand gestures is also presented). Point-light animations portray biological movement using small point lights attached to the body of an individual
referred to as the "actor" (Fig. 2A). These points are sometimes connected with lines to create stick figures or superimposed with facial and body forms to create an embodied agent (e.g., [16]). Conversely, body videos show human motion through recorded video, often with the face blurred or occluded to mask facial expression (Fig. 2B). Our review is pragmatic in that it focuses on behavioral effects in gesture communication and does not explore neural, evolutionary or other bases for such behaviour. We use this literature to motivate hypotheses on how people may perceive robot gestures.

Fig. 2 Point-light animation (A) and body video (B) techniques as used in psychology (from [17])

Table 1 Classification of gestures (modified from [12])
Class 1. "Irrelevant"/manipulative gestures. Defining characteristics and goal: influence on the non-animate environment; manipulation of objects; side effects of body motion. Examples: grasping a cup; motion of the arms when walking.
Class 2. Expressive gestures. Defining characteristics and goal: expressive marking; communication of affective states. Examples: excited raising and moving of hands while talking; fist-clenching (aggression). Related classifications: emotion gestures (Argyle [11]).
Class 3. Symbolic gestures. Defining characteristics and goal: signal in communicative interaction; communicative of semantic content. Examples: waving 'hello'; illustrating size/shape with hands; gesture 'language' used in broadcasting, the military, etc. Related classifications: iconic and metaphoric gestures (McNeill [9], Beattie [71]); illustrator gestures (Argyle [11]); emblem gestures (McNeill [9], Argyle [11]).
Class 4. Interactional gestures. Defining characteristics and goal: regulation of interaction with a partner. Example: nodding the head to indicate listening.
Class 5. Referential/pointing gestures. Defining characteristics and goal: indication of objects, agents or topics. Example: pointing of all kinds. Related classifications: reinforcer gestures (Argyle [11]); beat gestures (McNeill [9], Beattie [71]).

2.2.1 Emotion Type
Emotions play an important role in human communication and have a strong tendency to be affected by body motion; for example, people can be emotionally engaged when watching dance performances. Emotions have been called the most important drivers for agents to interact in a lifelike and intelligent manner with other entities in their environment [18]. Emotions can be described in terms of the two dimensions of arousal and valence, and the experience of these emotions has been found to be associated with a set of physiological responses [19]. Ekman et al. [20] identified six basic emotions that can be detected in facial expressions: happiness, surprise, anger, sadness, fear and disgust/contempt. This particular set of emotions has since been used frequently in studies of social psychology and HCI. Emotions can also be categorized using the dimension of valence, which refers to whether an emotion is positive, negative or neutral [21]. With respect to Ekman’s emotions, we consider the emotion of happiness to be positive, surprise to be neutral and anger, sadness, fear and disgust to be negative. Previous research suggests that viewers can identify emotions in videos of body gestures without speech or facial expressions using standard recognition tasks such as selection from a list of emotions [22–25]. Basic emotions are also identifiable with point-light animations of arm movement [26], body movement [17] and paired conversations [27]. Recent research has also shown that emotion valence may mediate emotion recognition of gestures [28]: for example, subjects who see videotapes of body movement only
can more accurately judge negative emotion, but not positive emotion, than those who view the face only [29]. However, other work has found that using facial expression to judge emotion resulted in much higher accuracy than either body or tone of voice, suggesting that although body gestures in isolation may be used to judge emotion, people tend to rely on facial expression as the key indicator of emotion [30]. It is therefore of interest to test whether, in the absence of an expressive face, robots can still convey emotions using simple head and arm gestures.
• Hypothesis 1: Viewers of robot gestures involving non-articulated head and arm movements only will be able to recognize emotion at better-than-chance levels.

2.2.2 Situational Context
Research in social psychology has shown that in human-to-human communication, the perceived meaning of a gesture depends on its social and environmental context [11, 31]. For example, depending on the situation an open palm can mean different things: "Give me something" versus "Let's work together" versus "It's your turn." Clarke et al. [27] found that knowledge of social context aided perception of emotion in point-light body movement videos: subjects could better judge the emotion of a pair of point-light actors when both were seen together rather than each in isolation. Likewise, for static body posture, emotional expressions are better recognized when the actions are shown in a congruent social context [32]. From a theoretical perspective, the benefit of context in gesture recognition may arise because an individual's experience of different emotions is characterized by "situational meaning structures." These structures are based on cognition of how the situation affects an individual and his or her judgment of whether that effect is desirable [33]. In other words, the same event can cause a variety of different responses in people depending on how it is interpreted, what aspects are emphasized or overlooked, and what preconceptions are present, all of which are affected by contextual knowledge.
• Hypothesis 2: Knowledge of situational context will improve recognition of emotional content in robot gestures.

2.2.3 Motion Characteristics
There are three main classes of motion characteristics that affect the perception of body gestures [34]:
(1) Structural information: an object's form and composition
(2) Kinematics: an object's displacement, velocity and acceleration
(3) Dynamics: an object's mass and force
Here we look at gesture complexity as a measure of kinematic and dynamic information. Considerable importance has been placed on the role of kinematics and dynamics in visual perception, as evinced by the ubiquity of point-light animation experiments, which exclude static information and focus on motion information. These studies show that kinematic information found in body movements provides sufficient guidance for people to perceive expression of emotion [34]. For example, motion characteristics affected perception of emotion in point-light experiments featuring knocking and drinking arm movements [26]: fast, jerky movements were more associated with anger and happiness, while slow, smooth movements were linked with sadness. Differences in the kinematics of arm movements have also been found to help viewers distinguish between joy, anger and sadness as portrayed by dancers moving only their arms [35]. Furthermore, previous work has established that both adults and children use the patterns and timing of motion to judge life or sentience (reviewed in [36]). These cues include autonomy (i.e., "does the movement appear self-directed?"), speed ("is the movement speed similar to the speed of human motion?") and goal-directedness ("does the movement appear to be achieving a goal?") [37–40]. However, some cues such as autonomy are subjective and too vague to be of use [41], while others such as speed may be difficult to measure in robotic movement due to the absence of appropriate sensors. An alternative characteristic of gesture motion, changes in an object's movement direction, has been found to influence perceptions of animate life for both children and adults [42, 43] and is used in this study. While we expect body motion characteristics to influence the perception of robot gestures, we do not make specific predictions on which types of movements will be easiest to judge or which will be most lifelike.

2.2.4 Individual Differences: Puppeteers vs. Laypeople
Previous studies have indicated that there are differences in gesture communication among individuals. Trafton et al. [44] studied the use of iconic gestures by both experts and "journeymen" (i.e., apprentices) in the domains of neuroscience and meteorological forecasting and found that experts perform more gestures than non-experts to convey information. Experts and novices have also been found to employ different knowledge structures in a variety of activities, which affect their performance in those activities [45, 46]. Moreover, when watching point-light animations of body movements, observers can not only determine the meaning of the gestures but are also able to accurately judge characteristics of the gesturer such as identity [47]. We expect that differences in the ability to create gestures will also exist between experts and amateurs. For the creation of emotional robotic gestures, however, who might the experts be? There are probably a number of
different types of expertise depending on which class of gesture is being discussed: symbolic gestures, for example, require different knowledge and skills compared to expressive gestures. The domain of use is also a consideration: in designing gestures for a dancing robot, for example, the obvious choice would be to consult professional dancers, and for a meteorology robot a human meteorologist. But for a robot designed to express emotion for everyday social interaction, puppeteers are leading candidates for "gesture experts" because they have experience manipulating puppets that have similar form factors to social robots of the type considered here, with gestures that are meant to portray emotion. Puppeteers are artists who perform by manipulating puppets. They have been called "a special kind of actor-technician who channels performance through a foreign body, using illusion, exhibition, and animation, to achieve the ultimate disguise" [48]. The puppets they operate are figures that usually represent a humanoid or animal character, real or imaginary, and can include the forms of marionettes, shadow puppets, Bunraku puppets, rod puppets, finger puppets and many others (a general review is given in [49]). Experienced puppeteers are able to create interactive and engaging personalities through the use of puppet movement, as well as voice and facial expression in some cases. Moreover, social interaction is a primary goal in puppetry. As stated by David Logan [50], a renowned puppeteer: "Puppetry is a highly effective and dynamically creative means of exploring the richness of interpersonal communication." Put another way, "Puppetry is the act of bringing inanimate objects to life through direct manipulation" [51]. Latshaw [48] describes the general preparation and performance skills puppeteers employ, some of which may be relevant to the design of robotic movement:
• Experience working with alternate anatomies for puppets that may not follow the human skeletal and joint system
• Manipulation of motion timing to slow down or speed up actions
• External observation of his or her own performance as it is happening
• Control of all facets of production, from conception and design to performance
• Ability to understand the construction and movement capabilities of puppets
Furthermore, puppeteers' performance methods as described by Latshaw [48] share some similarities with roboticists' design of movements: the puppeteer's movements are restricted to those that move the puppet effectively; the puppeteers themselves cannot see or be seen by an audience; and all communication is indirect, from puppeteer to puppet and from puppet to audience. To the last point, Fukuda and Ueda [52] found that laypeople perceive a robot's animacy differently depending on whether they are controlling the robot or
whether they are just observing it, suggesting that gesture design may be particularly challenging for those without experience. We therefore hypothesize that puppeteers will be more adept at authoring effective emotional gestures for our robot.
• Hypothesis 3a: Puppeteers will create robot gestures that are more lifelike than those created by amateurs.
• Hypothesis 3b: Puppeteers will create robot gestures that are more liked than those created by amateurs.
• Hypothesis 4: Puppeteers will create robot gestures that convey emotion better than those created by amateurs.

2.3 Gestures in Human-Robot Communication
Gestures have been used frequently in social robotics. The HRI literature includes several examples of the use of body movement to show emotions: Mizoguchi et al. [3] employed ballet-like poses for a mobile robot; Lim, Ishii and Takanishi [53] used different motions for a bipedal humanoid robot; Marui and Matsumaru [7] used the same robot bear employed in this study to look at how participants used head and arm movements to convey emotion. These earlier studies focused on the creation of emotion-conveying gestures, rather than the perception of such gestures. One notable exception is work by Nakagawa [54], who developed an "affective nuance model" with which a corpus of gestures can be characterized along the dimensions of valence and arousal. Participants could judge valence and arousal from the movements generated; however, that study did not evaluate the interpretation of specific emotions from multiple authors as done in this study. Past studies have suggested that motion characteristics affect how lifelike a social robot appears. Closely related to lifelikeness, "social presence" is a term used in the HRI literature. As experienced by a person interacting with a robot, social presence has been defined as "access to another intelligence" [55]. Kidd and Breazeal [56] showed that people judge a robot that moves its eyes to be more socially present and engaging of their senses than a similar animated character. Lee et al. [5] showed that participants can feel engaged with a social presence when interacting with a robot and that judgments of social presence are affected by robot personality as evinced through movement and appearance. Human-robot gesture interaction has also been investigated in situ. Ono et al. [57] proposed a model of gesture understanding during embodied communication that consisted of two requirements: a relationship derived from a mutually entrained gesture, and a joint viewpoint built on this relationship. They investigated these factors using a route-finding scenario in which a robot with head and arm motion ability gave directions to participants trying to locate a specific room. Kanda et al. [58] used motion capture systems
to measure how people move their bodies in response to a humanoid robot that speaks and makes gestures. They found that subjects who judged their interaction with the robot positively also had coordinated, "entrained" gesture behaviour, such as synchronized movements and eye contact.

2.4 Approaches to Design of Robot Gestures
How are expressive gestures for robots currently being created? The design of social robots is a challenging and complex task. Because successful integration into social environments requires not only conceptual and behavioral considerations about HRI but also the actual construction of the robot's mechanical, electrical and motor control systems, it is not surprising that development methods differ widely. While there is no literature to our knowledge that categorizes design methodologies for robot gestures specifically, Fong [2] describes two primary methods for the development of social robots in general which can be related to robot gestures: "biologically inspired," in which robots internally simulate social processes in living things, and "functionally inspired," in which robots give the outward appearance of being socially competent but may not have an internal design based on nature. There are two main reasons for biologically inspired designs. First, nature is believed to be the best source for lifelike behavioral models, and people can more easily understand robots and their capabilities if they appear similar to familiar living creatures [59]. Second, using biological inspirations for robots allows researchers to test scientific theories related to design, such as ethology, theory of mind and developmental psychology. For example, MIT's "Cog" is a humanoid robot that can be used to explore cognitive theory. Its behaviour and movement are based on high-level cognitive modules in which environmental and intentionality information is taken into account. Robots that learn gestures by imitating other living entities also fit this category, such as matching a robot's observed motion sequences to known motor primitives [60]. With functionally inspired designs, it is sufficient to implement the mechanisms and behaviors which create the impression of an intelligent social actor, even if the internal design has no motivation in nature. These robots generally have limited abilities compared to biologically inspired counterparts, which have more flexible social behaviour. However, the advantage of functional designs is that they may be sufficient to produce compelling interactions, since many robots may only require superficial or low-level social skills. Su et al. [18], for example, developed a pet robot using a behaviour-based architecture in which complex gestures are decomposed into multiple low-level control modules.
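To make the idea of decomposition concrete, the following is a minimal illustrative sketch, not the architecture of Su et al. [18], of how a functionally inspired design might represent an expressive gesture as a sequence of low-level joint commands. The classes, joint names and angle values are hypothetical and introduced here only for illustration.

```python
# Illustrative sketch only: decomposing an expressive gesture into low-level
# control primitives, loosely in the spirit of the behaviour-based approach
# cited above [18]. Class and joint names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class JointCommand:
    joint: str         # e.g., "head_pitch", "left_arm_lift"
    angle_deg: float   # target angle for the motor
    duration_s: float  # time allotted to reach the target


@dataclass
class GestureBehaviour:
    name: str
    primitives: List[JointCommand]

    def schedule(self) -> List[str]:
        """Return a human-readable motor schedule for the gesture."""
        return [f"{p.joint} -> {p.angle_deg:.0f} deg over {p.duration_s:.1f}s"
                for p in self.primitives]


# A "happy" gesture built from simple head and arm primitives.
happy = GestureBehaviour("happy", [
    JointCommand("head_pitch", 20, 0.5),
    JointCommand("left_arm_lift", 70, 0.6),
    JointCommand("right_arm_lift", 70, 0.6),
    JointCommand("head_pitch", -10, 0.4),
])
print("\n".join(happy.schedule()))
```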
2.5 Puppetry and Human-Robot Interaction
Scientific research linking puppetry and HRI has focused on several areas. Related to HCI, puppetry has long been incorporated into techniques for motion capture and animation through the use of puppeteers. Ever since "Mike the Talking Head" (one of the earliest examples of computer puppetry) was developed by deGraf-Wahrman for Silicon Graphics in 1988 and demonstrated at Siggraph 1989, the tradition of using puppeteers to control animated and robotic characters in the film and media industry has continued to the present day [51]. Typically, one or more puppeteers control, in real time, a character's facial expressions, mouth shape, head and limb positions and other features by manipulating their puppet. Puppetry practices have also been applied to the design and study of HRI. Xing and Chen [61] designed a robot with form factors and control mechanisms inspired by traditional puppets. Plaisant [62] describes the use of robot dolls as a therapeutic storytelling tool for children, much as puppets have been used to elicit expressive communication among children in crisis or with behavioral challenges. Sabanovic et al. [63] applied shadow puppetry techniques (in which the shadows of hands cast on a wall portray characters and tell a story) as a model for synchronous social interaction in HRI. Meisner et al. [64] further developed non-verbal interactive capabilities for a robot based on shadow puppet play, such that the robot could engage in shadow puppet games. To our knowledge, no previous research has involved puppeteers in the design process for robots or robot gestures.
3 Studies 1 and 2: Motion Characteristics, Situational Context and Emotion Valence 3.1 Method We conducted two related studies to investigate whether motion characteristics and situational context influence gesture encoding and decoding. In the first study, participants created gestures using the bear robot; in the second, participants viewed and judged the gestures created by participants in the first study either as a video of the robot or as an animation. Our method (viewing conditions, recognition measures, etc.) was modeled on video and point-light experiments in human gesturing (e.g., [21]). In particular, we chose to show participants of the second study videos of the robot as opposed to a co-present robot to emulate body videos used in studies of human movement and to facilitate more reproducible results (as all participants were guaranteed to have viewed the exact same gestures).
3.1.1 Study 1 Apparatus
The robot used in the present research, "RobotPHONE" [65], had the form and size of a traditional teddy bear (Fig. 1). The bear had six motors (two in the head and two in each of the arms) that allowed it to move its head and each of its arms both vertically and laterally. It could perform movements such as nodding its head and waving its arms, but it was unable to move its torso and did not have elbows. Its movements were recorded by the RobotPHONE software interface running on a Sony Vaio laptop PC with Windows XP, connected via USB to the robot. This software allowed the movements made by manipulating the robot directly to be recorded and later played back, either on the robot itself or as an animation in the software interface.

3.1.2 Study 1 Participants
Four participants (two female, two male) ranging in age from 21 to 35 (M = 25, SD = 5) were recruited from the Keio University community. The study was conducted in a lab room at the Shonan Fujisawa Campus located near Tokyo.

3.1.3 Study 1 Procedure
In the first study, four participants created a gesture (by moving the head and arms of the robot) for each of 12 scenarios presented as one or two sentences of text (Table 2). A translated excerpt of the Japanese instructions is as follows: "First, listen to the scenario. Pretend that you are in the scenario. Then pretend that you see the bear robot. Create a gesture that the bear will do. The gesture must try to convey an emotion or message to you." Participants were given time to practice the gesture before recording it, to review the gesture created, and to re-record if they were dissatisfied with the result. They were asked to write down the meaning of their gesture. Participants also filled out demographic information and the Negative Attitudes toward Robots Scale (NARS) survey [66].

3.1.4 Study 2 Apparatus
Participants viewed robot gestures as animations in the RobotPHONE software interface and as QuickTime videos of the bear robot performing each gesture (Fig. 3). The videos were created by playing back the robot gestures created in Study 1 and recording them with a digital video camera. Sessions were conducted on a Sony Vaio laptop PC with Windows XP.

3.1.5 Study 2 Participants
Twelve participants (three female, nine male) ranging in age from 18 to 60 (M = 26) were recruited from the Keio University population, as in Study 1. All Study 1 participants were excluded from participating in Study 2.
Table 2 List of scenarios
1. You have just returned home.
2. You are in the living room watching TV. You laugh at a funny show.
3. You reach your hand to the bear to pick it up.
4. You pat the bear's head.
5. You have been working at your desk for a long time. You yawn.
6. You take your clothes off.
7. You start eating dinner.
8. You say "goodnight" to the bear.
9. You start crying.
10. You start playing music on your computer.
11. You are sitting at your computer and working.
12. You start drinking a beer.
Fig. 3 Participants viewed gestures either as (a) animations or (b) videos
3.1.6 Study 2 Procedure
In the second study, participants were shown the corpus of 48 gestures created in Study 1. For each gesture they selected one emotion they thought was being conveyed from a list of possible emotions (Table 3). This list was coded by two investigators and one translator from the written descriptions given by the gesture authors (Study 1 participants). The descriptions from Study 1 included both basic emotions, such as "I am angry", and some states that participants treated as emotions, such as "I am confused." It is important to note that while the scenarios provided influenced the emotions created, they did not necessarily define those emotions: for example, given the scenario "You are in the living room watching TV. You laugh at a funny show", different participants generated a diverse set of emotions coded as "I'm happy", "I'm confused", and "I'm interested." Clearly, emotions such as these are not uniquely identified by the scenario that motivated them.

Table 3 List of emotions
I am happy
I am interested
I love you
I am confused
I am embarrassed
I am sad
I am feeling awkward
I am angry
I am surprised
Neutral/none

Participants rated the lifelikeness of the gesture, how much they liked the gesture and their degree of confidence in the selected emotion on single-item Likert scales from 1 (strongly disagree) to 7 (strongly agree) (sample item: "I liked this gesture"). Participants also filled out demographic information and the Negative Attitudes toward Robots Scale (NARS) survey.

The study was structured as a 2 (medium: animation vs. video) × 2 (context: provided vs. withheld) balanced, within-subject experiment. For medium, either an animated version of the gesture was shown ("animation") or a video of the robot was shown ("video") (see Fig. 3). This factor was included to test the effect of display methodology, i.e., whether results would differ between an animation and a video of the actual robot. The animation and video formats were analogous to the point-light animation and body video methods of social psychology. Videos were used instead of a co-present robot to ensure that all participants saw identical gestures (similar to human body motion videos). For context, either the appropriate scenario was read to the participant immediately before the gesture ("context") or it was omitted ("no context"). Experimental order was randomized for each participant.

3.2 Measures
Two measures of successful transmission of gesture emotion were used. The first represented the amount of agreement between author and viewer and was calculated by comparing the author's written meaning with each viewer's selected codes. The second represented consensus agreement among viewers and was based on the frequency of the most selected code among viewers, without regard to the author's intended emotion. Gesture complexity was measured in two ways: (a) head movements, determined by counting the number of changes in direction of the robot's head; and (b) arm movements, determined by similar counting with the robot's arms and summing over both arms. For the analyses below, both measures were divided into two approximately equal groups representing high movement and low movement.
3.3 Results

3.3.1 Prior Attitudes
The prior attitudes of gesture designers were assessed to give the following NARS scores: negative attitudes toward the social influence of robots (M = 15.4, SD = 5.4, on a scale from 5, least negative, to 25, most negative) and negative attitudes toward emotional interaction with robots (M = 7.3, SD = 4.5, on a scale from 3, least negative, to 15, most negative). For Study 2 participants, the assessment gave: negative attitudes toward the social influence of robots (M = 15.4, SD = 5.2) and negative attitudes toward emotional interaction with robots (M = 8.44, SD = 2.39). These results are similar to normative data [66] and indicate that neither designers nor viewers had overly negative attitudes toward robots.

3.3.2 Corpus of Robot Gestures
Characteristics of the corpus of robot gestures are summarized in Table 4, which gives motion characteristics and ratings grouped by perceived emotion. Right arm movements were used more than left arm movements across all emotion types, likely because the gesture authors were right-handed (handedness here is relative to the viewer, not the robot). All participants were observed to manipulate the bear while it was facing them (rather than facing away). Positive emotions tended to be associated with the highest numbers of right and left arm movements and the lowest number of head movements. In particular, "I love you" was associated with an average of two changes in head direction but seven changes in both right and left arm direction. Negative emotions had balanced numbers of head, right arm and left arm movements, while neutral emotions had fewer movements overall. Neutral emotions had shorter average durations, while "I am confused" had the longest average duration.

3.3.3 Emotion Recognition
Hypothesis 1 was supported. The ability of people to detect intended emotions was assessed in the following way. First, the 432 emotion judgments were scored as either correct or incorrect based on whether or not the selected emotion
agreed with the emotion that had been attached to the gesture by its author (trials without agreement on coded emotion were excluded). Using the Z approximation to the binomial test as implemented in SPSS, the observed proportion of correct responses (22%) was significantly greater than the one in ten (10%) correct responses that would have been expected by chance (p < .001), although fully 78% of the emotions were not judged correctly. Thus our expectation that the emotions associated with gestures would be recognized better than chance was confirmed, although the recognition rate was relatively low.

Table 4 Characteristics of perceived emotions (means with standard error in brackets; like and lifelikeness ratings on a 7-point Likert scale, 1 = strongly disagree, 7 = strongly agree; right and left are from the perspective of the viewer, not the bear)

Emotion                 Head         Right arm    Left arm     Gesture      Like         Lifelikeness
                        movements    movements    movements    time, sec    rating       rating
Positive emotions:
  I am happy            2.33 (0.39)  4.62 (0.52)  4.46 (0.55)  7.29 (0.36)  4.53 (0.12)  4.67 (0.12)
  I am interested       5.13 (1.08)  5.80 (0.85)  5.37 (0.97)  7.87 (0.59)  4.83 (0.23)  4.76 (0.22)
  I love you            2.00 (0.53)  7.70 (0.96)  7.13 (0.98)  8.47 (0.65)  4.70 (0.22)  5.43 (0.22)
  Overall               3.01 (0.38)  5.47 (0.41)  5.16 (0.44)  7.64 (0.28)  4.64 (0.10)  4.83 (0.10)
Negative emotions:
  I am confused         5.16 (0.92)  6.62 (0.96)  5.11 (1.14)  8.98 (0.72)  3.80 (0.16)  4.18 (0.22)
  I am embarrassed      3.19 (0.68)  3.25 (0.68)  2.25 (0.56)  7.59 (0.49)  4.09 (0.21)  4.34 (0.21)
  I am sad              3.48 (0.70)  3.90 (0.88)  2.65 (0.60)  8.19 (0.73)  3.81 (0.22)  4.52 (0.22)
  I am feeling awkward  3.36 (0.59)  3.64 (0.74)  2.38 (0.54)  7.85 (0.57)  3.87 (0.21)  4.47 (0.20)
  I am angry            3.39 (0.66)  3.67 (0.68)  3.02 (0.58)  7.65 (0.58)  3.92 (0.18)  4.18 (0.17)
  Overall               3.75 (0.33)  4.28 (0.37)  3.16 (0.34)  8.06 (0.28)  3.89 (0.09)  4.32 (0.10)
Neutral emotions:
  I am surprised        4.18 (0.69)  4.06 (0.71)  3.20 (0.76)  7.33 (0.55)  4.90 (0.19)  4.59 (0.21)
  (neutral)             3.24 (0.57)  2.52 (0.47)  1.96 (0.44)  6.64 (0.39)  3.24 (0.13)  3.58 (0.14)
  Overall               3.58 (0.44)  3.08 (0.40)  2.42 (0.40)  6.89 (0.32)  3.84 (0.13)  3.95 (0.13)

3.3.4 Effect of Situational Context
Hypothesis 2 was partially supported. A two-way repeated measures analysis of variance (ANOVA) was conducted with medium and context as within-subjects factors. Context had a borderline significant main effect on author-viewer emotion agreement, F (1, 11) = 4.46, p = .058. Figure 4 shows how the experimental conditions affected the accuracy of identifying the correct emotion. Context tended to improve emotion recognition (M = 26.7%, SD = 3.61%) compared with no context (M = 15.2%, SD = 3.48%). There was no main or interaction effect of medium. No effect of context was found on participant ratings of identification confidence, although in post-study interviews participants reported that judging gestures was easier when the scenario was provided.
Fig. 4 Emotion recognition versus experimental condition. Error bars show 95% confidence intervals
3.3.5 Effect of Emotion Valence Aside from the two main hypotheses tested in the first paired study, some additional post-hoc analyses were conducted. Emotion valence was investigated in a post-hoc analysis with a one-way repeated measures ANOVA conducted with emotion valence of the gesture (as defined by positive, negative, or neutral emotion) as the within-subject factor, and the likeability rating as the dependent measure. Emotion valence had a significant effect (F (2, 22) = 12.56, p < .001,
r = .60) on the likeability of a gesture. As shown in Table 4, gestures conveying positive emotions were rated as more likeable (M = 4.64), followed by gestures conveying neutral emotions (M = 3.89) and negative emotions (M = 3.84). A simple contrast was computed for the emotion valence variable, using neutral emotions as the control category to which positive and negative emotions were compared. The results show a significant difference between positive and neutral emotions, F (1, 11) = 28.34, p < .001, r = .84, but not between negative and neutral emotions. Thus, viewers liked gestures conveying positive emotions more than those conveying neutral or negative emotions. An identical one-way repeated measures ANOVA was conducted with lifelikeness as the dependent measure. Emotion valence had a significant effect, F (2, 44) = 9.07, p < .001, r = .41, on the perceived lifelikeness of a gesture. Gestures conveying positive emotions appeared more lifelike (M = 4.83), followed by gestures conveying negative emotions (M = 4.32) and neutral emotions (M = 3.95). Simple contrasts revealed that gestures with positive emotions were judged to be significantly more lifelike than those with neutral emotions, F (2, 22) = 16.86, p < .001, r = .66. Again, no significant difference was found between negative and neutral emotions.

3.3.6 Effect of Motion Characteristics
As an additional post-hoc analysis, motion characteristics were investigated with two three-way repeated measures ANOVAs: one with medium, context and gesture arm movement as within-subject factors, and one with medium, context and gesture head movement as within-subject factors. The main effect of arm movement complexity was statistically significant, F (1, 11) = 7.530, p = .019. Figure 5 shows the ratings of lifelikeness that resulted from movements of the arm and head. Gestures with a large number of changes in arm movement direction were perceived as more lifelike (M = 4.68, SD = .147) than those with fewer arm movements (M = 4.22, SD = .123). This effect was not found for head movements, and there were no interaction effects of medium or context.
Fig. 5 Lifelikeness versus arm and head movement complexity. Error bars show 95% confidence intervals
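For readers who want to reproduce this style of analysis, the following is a minimal sketch, not the authors' analysis script, of a one-way repeated-measures ANOVA on per-participant likeability means by emotion valence (cf. Sect. 3.3.5), using the statsmodels library rather than the software reported in the paper. The data frame and its column names are fabricated placeholders.

```python
# A minimal sketch (not the authors' analysis code): likeability ratings are
# averaged per participant within each valence category, then tested for a
# valence main effect with a repeated-measures ANOVA. Data are placeholders.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

raw = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "valence":     ["positive", "negative", "neutral"] * 3,
    "liking":      [4.6, 3.8, 3.9, 4.9, 3.7, 3.6, 4.4, 4.0, 3.8],
})

# One mean rating per participant x valence cell (the within-subject design
# requires a balanced table with a single value per cell).
cell_means = raw.groupby(["participant", "valence"], as_index=False)["liking"].mean()

result = AnovaRM(cell_means, depvar="liking",
                 subject="participant", within=["valence"]).fit()
print(result)  # F test for the main effect of emotion valence
```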
3.4 Discussion
The present results indicate that simple robot gestures (not involving articulated joints) can provide useful information concerning emotions. However, these simple gestures by themselves do not provide a lot of information, and recognition, although significantly better than chance, was low. In practice, people are aided by their knowledge of the current situation in which gestures are made. Thus most realistic uses of social robots involve settings where the context in which gestures are made is known. This is fortunate, since our results show that context has a large impact on how well gestures are understood, boosting emotion recognition accuracy to nearly 30% for the types of gestures and scenarios used in this study.

Arm movements, but not head movements, were found to facilitate the perception of lifelikeness. This is in spite of the fact that the robot used in this study was capable of only simple arm movements and did not have a wrist, elbow or shoulder with which to make more subtle arm movements. The discriminating ability of arm movements identified in studies of human gestures therefore seems to apply to robot gestures as well. This finding also corresponds with research by Nomura and Nakao [67], which suggests that selective attention (or "cognitive bias") to a robot's specific body parts (such as head, arms, hands and legs) correlates with a viewer's accuracy of emotion recognition.

Gestures conveying positive emotions were more liked and perceived to be more lifelike than those conveying either negative or neutral emotions. Closer inspection revealed that gestures associated with positive emotions had more arm movements, indicating that the results on emotional valence and movement complexity may be inter-related. We also found that different emotions had distinct movement patterns, complementing work by Marui and Matsumaru on temporal and spatial tendencies of robot bear gestures conveying different emotions [7].
4 Studies 3 and 4: Author Expertise 4.1 Method Studies 3 and 4 used the same paired-study methodology as employed by Studies 1 and 2.
4.1.1 Study 3 Apparatus
Study 3 used the same robotic bear as in Study 1.

4.1.2 Study 3 Participants
Ten people (four female, six male) ranging in age from 21 to 65 years old (M = 31.5, SD = 11.8) participated in Study 3. Five participants (two female, three male) were recruited from the University of Toronto community and had no prior experience in puppetry. The remaining five participants were recruited from the puppetry community in Toronto (puppetry students, professionals affiliated with puppet theaters and freelance puppeteers) and had between three and 39 years of professional puppetry experience (M = 13.6, SD = 14.6).

4.1.3 Study 3 Procedure
Participants directly manipulated the bear robot to convey each of the six basic emotions identified by Ekman et al. [20]: anger, disgust, fear, happiness, sadness and surprise. They were given practice time and had the option to view and re-record each gesture if they were dissatisfied with the result. The instructions given were similar to those used in Study 1. Participants were told that they were free to make comments about their gesture creation during the session, and detailed quotations were recorded from this think-aloud procedure.

4.1.4 Study 4 Apparatus
The same set-up was used as in Study 2.
4.1.5 Study 4 Participants
Twelve people (three female) ranging in age from 21 to 30 years old (M = 25.1, SD = 2.57) participated in Study 4. All participants were recruited from the University of Toronto community. All Study 3 participants were excluded from participating in Study 4.

4.1.6 Study 4 Procedure
Study 4 participants were shown the corpus of gestures created in Study 3 (ten authors for each of six emotions, for a total of 60 gestures). They were given the option to re-watch each gesture as many times as they needed. They rated liking and lifelikeness, as well as the degree to which they believed each of the six emotions was being conveyed by the robot (sample item: "This gesture's meaning is: I'm angry"), using seven-point Likert scales. The study was structured as a 2 (expertise: puppeteer vs. amateur) × 6 (emotion type: anger, disgust, fear, happiness, sadness, surprise) balanced, repeated measures experiment. Experimental order was randomized for each participant. Participants completed questionnaires on demographics, personality and attitude toward robots.

4.2 Measures
How well observers recognized gestures was assessed using two measures. The first measure, the "raw rating," was the observer's selected rating for the emotion that the author was attempting to convey. For example, if the author's intended emotion for a gesture was sadness and the observer had selected 7 (Strongly Agree) on the Likert scale for "This gesture's meaning is: I'm sad," then this measure would be 7. While the raw rating gives a general sense of the communicative effectiveness of a gesture, it does not take into account overlap with the other emotion ratings. Participants could in principle have viewed a gesture and selected 7 for all emotions, in which case they did not really identify the emotion, even though the raw rating would be high. Consequently, the second measure, the "normalized rating," was calculated as the observer's selected rating for the correct emotion minus the average of the other five ratings for any given viewing. In the above example this measure would be 7 minus the average of the observer's ratings for all emotions except sadness. This measure better describes how well viewers were able to distinguish one emotion above the others when viewing the robot gesture, and it ranges from -6 (worst accuracy: the viewer selected 1 for the correct emotion and 7 for all others) to 6 (best accuracy: the viewer selected 7 for the correct emotion and 1 for all others).
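As a concrete illustration of the two recognition measures, the following minimal sketch (not the authors' code) computes the raw and normalized ratings for a single viewing, assuming the six Likert scores are stored in a dictionary keyed by emotion name; all names and example values are illustrative.

```python
from typing import Dict


def raw_rating(ratings: Dict[str, int], intended: str) -> int:
    """Raw rating: the viewer's Likert score for the author's intended emotion."""
    return ratings[intended]


def normalized_rating(ratings: Dict[str, int], intended: str) -> float:
    """Normalized rating: score for the intended emotion minus the mean of the
    other five scores; ranges from -6 (worst) to +6 (best)."""
    others = [score for emotion, score in ratings.items() if emotion != intended]
    return ratings[intended] - sum(others) / len(others)


# Example viewing of a gesture whose author intended "sadness".
viewing = {"anger": 2, "disgust": 3, "fear": 4,
           "happiness": 1, "sadness": 7, "surprise": 2}
print(raw_rating(viewing, "sadness"))         # 7
print(normalized_rating(viewing, "sadness"))  # 7 - (2+3+4+1+2)/5 = 4.6
```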
4.3 Results

4.3.1 Prior Attitudes
The prior attitudes of gesture designers were assessed to yield the following NARS scores: negative attitudes toward the social influence of robots, M = 14.6, SD = 3.24; negative attitudes toward emotions in interaction with robots, M = 9.8, SD = 1.81. The attitudes of gesture viewers were: negative attitudes toward the social influence of robots, M = 15.3, SD = 2.75; negative attitudes toward emotions in interaction with robots, M = 8.58, SD = 2.07. These results indicate that the prior attitudes of designers and viewers were similar to normative data.

4.3.2 Robot Gesture Creation
Participants in Study 3 who had no puppetry experience created gestures with mean duration 6.40 seconds (SD = 2.94)
while puppeteers created gestures with mean duration 7.10 seconds (SD = 3.29). An independent samples t-test with gesture duration as the test variable and puppetry experience as the grouping variable revealed no significant difference in duration, t (58) = .690, p = .389, two-tailed. Anecdotally, puppeteers seemed to spend more time than the control group creating gestures, often taking up the entire 45 minutes scheduled. From observation of the think-aloud comments, all individuals with puppetry experience remarked that having the robot touch its face would be extremely useful and that its inability to do so was a severe limitation on its ability to convey emotion. (The bear robot used did not have the range of motion to allow touching of the face in a dynamic way, due to its lack of elbow joints.) One puppeteer, unable to manipulate the bear to touch its mouth and stomach to indicate disgust, said "I can't do disgusted... I can't do it." Upon manipulating the robot to convey fear, another puppeteer remarked, "I can't hide its eyes (laugh)." The control group of amateurs made few remarks about limitations. Differences in manipulation technique were observed between puppeteers and amateurs. All puppeteers manipulated the bear robot with the robot facing away from them; this allowed better simultaneous manipulation of the head and two arms of the robot with their thumb and fingers. All amateurs but one manipulated the robot with the robot facing them on the table.

4.3.3 Effect of Puppetry Experience
Hypothesis 3 was not supported. We did not find evidence that puppeteers created either more likeable or more lifelike gestures. To determine the effect of puppetry experience on the likeability or lifelikeness of robot gesture design, we conducted paired t-tests with puppetry as the independent factor and each of the ratings as the repeated dependent factor, with data split by emotion type. The upper rows of Table 5 present the results. Ratings of liking did not differ significantly between gestures created by experts and those created by amateurs for any emotion type. The same was true for lifelikeness, except for a marginally significant effect for happiness, t (11) = −1.71, p = .058. Hypothesis 4 was partially supported. To look at how puppetry experience affected emotion recognition, we conducted paired t-tests with the recognition measures as dependent factors; results are presented in the lower rows of Table 5. The raw ratings did not differ significantly, although for disgust there was a borderline effect of puppeteer-designed gestures being rated higher than amateur-designed gestures (M = 3.72, SD = .97 compared to M = 3.35, SD = .88, t (11) = −1.46, p = .086, one-tailed). The normalized measure of recognition was significantly better for expert-designed gestures than for novice-designed
gestures for the emotions of disgust and fear. Figure 6 illustrates how well viewers recognized the emotion of gestures created by the puppeteer and amateur groups. Disgust was conveyed better by puppeteers' gestures (M = .420, SD = .576) than by amateurs' gestures (M = .010, SD = .622), paired t (11) = −1.80, p = .050. Gestures conveying fear were also recognized better when created by puppeteers (M = .270, SD = .786) than by amateurs (M = −.197, SD = .546), paired t (11) = −2.26, p = .023. To put this in context, if gestures tended to receive neutral ratings of 4 (i.e., neither agreeing nor disagreeing that a certain emotion was being conveyed), puppeteers' gestures for fear would on average receive a 4.27, agreeing with the correct emotion, whereas novice-designed gestures for fear would on average receive a 3.82, disagreeing with it. Closer inspection reveals that a possible reason puppeteers created better gestures for fear is that, with the novice-created gestures, viewers mistook fear for happiness or surprise. We conducted a repeated measures analysis of variance (ANOVA) for the fear gestures with puppetry experience as the independent measure and all emotion ratings as dependent measures. For gestures designed to convey fear, normalized ratings of happiness were significantly higher for novices (M = .423, SD = .820) than for experts (M = −.444, SD = .723), F (1, 11) = 8.96, p = .012. The same tendency was found for ratings of surprise in the context of gestures intended to convey fear: ratings for novice gestures (M = .505, SD = .821) were significantly higher than for puppeteer gestures (M = −.428, SD = .600), F (1, 11) = 6.52, p = .027. Figure 7 presents video stills from the best and worst identified robot gestures designed to convey disgust and fear (full videos are available at http://www.youtube.com/user/robotstudy). For both emotions, the best identified videos were designed by puppeteers and the worst identified videos were designed by amateurs. These examples illustrate the large variability of movements employed by the designers to convey a given emotion.

4.3.4 Correlation Among Emotions
Pair-wise Pearson product-moment correlations were computed across all the emotion ratings. As shown in Table 6, all pairs of negative emotions (anger, fear, disgust and sadness) were positively correlated with each other, with the highest correlation being between anger and disgust, r = .59, p < .01. The negative emotions of anger and disgust were also negatively correlated with happiness (r = −0.11, p < .01 and r = −0.16, p < .01, respectively), but these correlations, while significant, were low.

4.4 Discussion
As in the first paired study, emotions communicated using only the gestures of a bear robot, without speech or facial expression information, were quite difficult to interpret.
Table 5 Effect of puppeteer experience on liking, lifelikeness and gesture recognition

                     Amateur M (SD)   Puppeteer M (SD)   t (df = 11)   p        η²
Liking:
  Anger              4.43 (.85)       4.57 (.61)         −.771         .229     .051
  Disgust            4.40 (.43)       4.48 (.82)         −.327         .375     .010
  Fear               4.45 (.63)       4.45 (1.04)        .000          .500     .000
  Happiness          4.49 (.92)       4.50 (.64)         −.066         .474     .000
  Sadness            4.37 (.65)       4.58 (.59)         −.974         .176     .079
  Surprise           4.52 (.49)       4.60 (.96)         −.321         .377     .009
Lifelikeness:
  Anger              4.15 (.97)       4.28 (1.10)        −.513         .309     .023
  Disgust            4.32 (.73)       4.13 (1.30)        .642          .267     .036
  Fear               4.20 (.93)       4.22 (.99)         −.046         .482     .000
  Happiness          4.20 (.96)       4.52 (.91)         −1.71         .058 a   .210
  Sadness            4.33 (.94)       4.33 (1.02)        .034          .487     .000
  Surprise           4.23 (.86)       4.38 (1.12)        −.515         .309     .024
Raw rating:
  Anger              3.40 (.86)       3.37 (1.09)        .119          .454     .001
  Disgust            3.35 (.88)       3.72 (.97)         −1.46         .086 a   .163
  Fear               3.40 (1.02)      3.32 (1.23)        .273          .395     .007
  Happiness          3.10 (.94)       2.97 (.92)         .498          .314     .022
  Sadness            3.40 (.89)       3.20 (.85)         .852          .207     .062
  Surprise           3.50 (1.14)      3.43 (.90)         .236          .409     .005
Normalized rating:
  Anger              .247 (.829)      .123 (.845)        .382          .355     .013
  Disgust            .010 (.622)      .420 (.576)        −1.80         .050 b   .228
  Fear               −.197 (.546)     .270 (.786)        −2.26         .023 b   .316
  Happiness          −.160 (.951)     −.337 (.674)       .500          .314     .022
  Sadness            .027 (.725)      −.157 (.703)       .592          .283     .031
  Surprise           .247 (.630)      .230 (.832)        .053          .479     .000

a Significant at the p < .10 level
b Significant at the p < .05 level
Fig. 6 Puppeteers created more recognizable gestures for fear and disgust. Error bars represent 95% confidence intervals
On the whole, viewers were not able to confidently identify emotions. One contributor to this was confusion among different emotions: ratings for the negative emotions (anger, fear, disgust and sadness) tended to correlate with one another, as did ratings for the positive and neutral emotions (happiness and surprise, respectively).
Fig. 7 Video stills from the first 5 seconds of gestures designed to convey fear and disgust: (a) fear, best recognized; (b) fear, worst recognized; (c) disgust, best recognized; (d) disgust, worst recognized. Gestures (a) and (c) were designed by puppeteers; (b) and (d) were designed by amateurs. Note that video (c) is only 4 seconds long
Table 6 Simple correlations among emotion ratings by gesture viewers

Emotion       Disgust    Fear      Happiness    Sadness    Surprise
Anger         0.59a      0.30a     −0.11a       0.27a      0.01
Disgust                  0.39a     −0.16a       0.28a      0.00
Fear                               −0.06        0.38a      0.20a
Happiness                                       −0.30a     0.47a
Sadness                                                    −0.20a

a Significant at the .01 level
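For readers reproducing this kind of analysis, the pair-wise Pearson correlations summarized in Table 6 can be computed roughly as follows. This is an illustrative sketch with placeholder data; the column names and the long-format layout (one row per viewed gesture, one rating column per emotion) are assumptions rather than the paper's actual data structure.

import numpy as np
import pandas as pd
from scipy import stats

emotions = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Placeholder ratings: one row per viewed gesture, one 1-7 rating per emotion.
rng = np.random.default_rng(1)
ratings = pd.DataFrame(rng.integers(1, 8, size=(120, 6)), columns=emotions)

# Full 6 x 6 Pearson correlation matrix (Table 6 reports its upper triangle).
print(ratings.corr(method="pearson").round(2))

# Significance test for a single pair, e.g. anger vs disgust.
r, p = stats.pearsonr(ratings["anger"], ratings["disgust"])
print(f"r = {r:.2f}, p = {p:.3f}")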
This suggests that people may be able to identify the general emotional valence of robot gestures more easily than specific emotions.

Despite the difficulty in emotion recognition, this experiment showed that puppetry experience had an effect on how well robot gestures were designed: puppeteers created gestures that better portrayed fear and disgust than novices did. A likely explanation for why this effect was not more pronounced is that the RobotPHONE platform used in this study did not have enough movement capability for the puppeteers' experience to fully come into play. With such a simple and constrained robot, non-puppeteers were able to create gestures that were similar in effectiveness to those of puppeteers. Indeed, the puppeteers complained about the limitations of the bear (e.g., its inability to touch its own face). Another consideration is that some emotions may have been more difficult to convey through robotic movement than others: perhaps fear and disgust required more complex movements and expressivity from the robot, which puppeteers were better able to provide than amateurs. Previous work [68] looking at expression of the six basic emotions found that recognition rates for fear were lower than for other emotions because viewers confused fear with disgust; likewise, in this study participants confused fear with happiness or surprise. Although Itoh et al. [68] used a more advanced robot with facial expressivity and wrist, finger and elbow movement, fear may be an emotion that is particularly difficult to convey through gesture alone, in the absence of facial expression.
5 Conclusion

5.1 Limitations and Future Work

One of the challenges in carrying out research on human-robot interaction is that many possible types of robots can be studied. This research uses a relatively simple robot to assess a lower bound for robot gesture perception and to facilitate easy manipulation by subjects. While our results are directly applicable only to social robots that are limited to arm and head movements, such as the one used in our studies, we expect that the design implications of the research presented here may generalize to more sophisticated robots. Indeed, there is evidence that a person's ability to discern information such as emotion from visual behaviour is based on motion information such as kinematics and dynamics rather than static information such as appearance or shape [23]. However, future research should examine how gestures of more complex robots are interpreted (including movements of the torso, legs and perhaps non-human limbs such as a tail).

The robot movements used in this research were very simple. While this was necessary to facilitate easy robot manipulation and production of gestures by laypeople, it limited the range of gestures that could be created, as none had hand articulation (such as differing hand shapes), wrist movement or elbow movement. We note that point-light animations also omit some of this information because of their focus on kinematic and dynamic information. However, we recognize that this limitation may have prevented the full skill of the puppeteers from being realized in robot gesture creation, and that the puppeteers might have out-performed amateurs more consistently with a more complex robot.

The methodology employed here used videos of robot movement instead of co-present robots (co-present robots are more common in communication studies of HRI, although examples of videos exist, e.g. [69]). However, our goal was not to compare results with other HRI studies per se, but rather to look at how gesture research in the social psychology literature could be applied to HRI. There was also the practical consideration that the robot used in this study was not able to play back recorded movements in a completely consistent fashion, so video recordings were needed to ensure that the same gesture was seen by all participants. Although this limits the comparability of our work with that of other gesture studies in HRI, we did find some similarities with other studies, notably that fear was particularly difficult to identify in gestures designed by amateurs.

This work chose puppeteers as our gesture experts; however, it is possible that individuals from other professions, such as choreographers, mimes or hearing-impaired individuals, could create more interpretable and lifelike gestures than puppeteers. As the issue of who should design gestures is of particular importance to social robot development, additional research is needed with more complex robots to identify what types of expertise, skills or characteristics are most beneficial in designing gestures.

Our study was meant as a preliminary test of whether professionals such as puppeteers are able to design movements that better communicate emotion. A natural extension of our work would be to evaluate why puppeteers designed better gestures. While we did not explicitly look at this, literature on the puppetry profession suggests specialized skills and attributes of puppeteers that may be applicable to the creation of robot movement. Future research in this area may help qualify what specific characteristics of the emotional gestures designed by puppeteers make them more recognizable, as well as what design techniques may be employed to create such gestures.
Although we conducted experiments in both Canada and Japan, this work did not explore cross-cultural differences because of disparities in study methodology. While we expect many of the effects identified here to be common across many cultures, some characteristics of non-verbal communication are culturally dependent [11, 70], and their impact on robot gesture understanding should be explored in greater depth.

One further concern, raised by a reviewer, is that the appearance of the teddy bear may have led people to expect more gestural and emotional expressivity than it was actually capable of. It is possible that the appearance of the robot as a bear made people more willing to accept a limited set of movements as indicators of emotion. However, the puppeteers also seemed frustrated by their inability to express the kinds of articulated gestures implied by the robot's appearance (one might, for instance, expect a bear to be able to touch its face). It would be interesting in future studies to examine the degree of expressivity or emotionality implied by the appearance of the robot versus the range of movements it is actually able to express. Is it better, for instance, to match the movement capabilities of the robot to the gestural expressivity implied by its appearance, or should the appearance of the social robot maximize its perceived expressiveness regardless of what movements it is actually capable of?

Because of the limitations noted above and the relative novelty of this research area, we did not attempt to infer design guidelines or recommendations from this study. Nevertheless, it is hoped that the results obtained in our studies will be tested further in future research, with the goal of developing guidelines for the design of robot gestures. One goal of future research might be to boost and evaluate the level of emotion understanding that can be achieved by adding facial expressiveness. A second approach may be to examine in more detail how gestures that convey positive emotions (and that have many arm movements) improve the lifelikeness of robots. The use of contextual information in decoding the meaning of gestures should also be studied in more detail. For instance, the use of speech, non-speech audio (e.g., chuckling or laughing while gesturing) or referential gestures (e.g., turning the head or pointing toward an object of interest) to provide context may be another way to enhance the impact and understandability of expressive gestures.

5.2 Main Findings and Contributions

The goal of this research was to explore the relatively new topic of gesture communication in social robots, using a simple robot as a case study. We posed three main research questions: Can emotions be conveyed through simple robot gestures involving only the head and arms? What factors affect that communication? Are puppeteers better than amateurs at designing robot gestures?

We investigated these questions using two paired experiments. Studies 1 and 2 were exploratory in that they tested perception of emotion in robot gestures and the influence of emotional valence, motion characteristics and situational context. Studies 3 and 4 focused on the effect of expertise, specifically that of puppeteers, on gesture communication.

The results demonstrate that even the gestures of a simple social robot capable of arm and head movement only (without moveable fingers, wrists or elbows) can convey emotional meaning. How well viewers were able to decode a gesture's intended meaning was affected by knowledge of the gesture's situational context (providing context improved understanding) and by the gesture's author (puppeteer-authored gestures were better understood for fear and disgust). How lifelike viewers judged a gesture to be was affected by emotional valence (positive emotion expression suggested greater lifelikeness) and by the complexity of arm movements (more arm movements improved gesture lifelikeness). Our work also illustrates how existing research and methodologies from the study of interpersonal behavior can be used to formulate and test hypotheses about how people understand robot movement.

5.3 Final Words

This paper has presented initial research concerning gesture understanding in human-robot interaction. As in human non-verbal communication, gestures of the body play an important role in HRI. Just as interpersonal communication has been used to motivate HCI, studies of human movement using point-light animation and body video can likewise be applied to robot gesture design. While visions of human-like androids integrating seamlessly into human society, as depicted in movies and shows such as Star Trek and Battlestar Galactica, may well come to pass at some point in the future, it seems likely that social robots will continue to have limited capabilities for the foreseeable future, making it all the more important that simple gestures be expressive and meaningful. The results of this research provide reason to believe that, in the absence of full artificial intelligence, it should still be possible for interaction designers to use interface elements such as gestures to increase the expressive power and usability of simple social robots.

Acknowledgements The authors are indebted to Professor Michiaki Yasumura (Keio University) for valuable guidance and support, as well as to Sachi Mizobuchi (Toyota), Ryo Yoshida (Keio University) and Flora Wan (University of Toronto) for their valuable research and translation assistance. Further thanks go to members of the IML lab at the University of Toronto and the iDL lab at Keio University; to our experimental participants from Keio University, the University of Toronto and the Toronto puppetry community; and to the Japan Society for the Promotion of Science and the Natural Sciences and Engineering Research Council of Canada for funding. We would also like to thank the organizers and participants of the 2nd International Conference on Human-Robot Personal Relationships for their insightful discussions of this work.
References

1. Breazeal C (2003) Toward sociable robots. Robot Auton Syst 42:167–175
2. Fong T, Nourbakhsh I, Dautenhahn K (2003) A survey of socially interactive robots. Robot Auton Syst 42:143–166
3. Mizoguchi H, Sato T, Takagi K, Nakao M, Hatamura Y (1997) Realization of expressive mobile robot. In: Proceedings of the international conference on robotics and automation, pp 581–586
4. Reeves B, Nass C (1996) The media equation. Cambridge University Press, Cambridge
5. Lee KM, Peng W, Jin S-A, Yan C (2006) Can robots manifest personality? An empirical test of personality recognition, social responses and social presence in human–robot interaction. J Commun 56:754–772
6. Sidner C, Lee C, Morency L-P, Forlines C (2006) The effect of head-nod recognition in human-robot conversation. In: Proc of ACM SIGCHI/SIGART conference on HRI, pp 290–296
7. Marui N, Matsumaru T (2005) Emotional motion of human-friendly robot: emotional expression with bodily movement as the motion media. Nippon Robotto Gakkai Gakujutsu Koenkai Yokoshu 23:2H12
8. Bacon F (1815) The works of Sir Francis Bacon. Jones, London
9. McNeill D (1987) Psycholinguistics: a new approach. Harper & Row, New York
10. Wachsmuth I, Lenzen M, Knoblich G (2008) Embodied communication in humans and machines. Oxford University Press, London
11. Argyle M (1994) The psychology of interpersonal behaviour, 5th edn. Penguin, London
12. Nehaniv C (2005) Classifying types of gesture and inferring intent. In: Proc AISB'05 symposium on robot companions, the society for the study of artificial intelligence and simulation of behaviour, pp 74–81
13. Levy D (2007) Intimate relationships with artificial partners. PhD thesis, University of Maastricht
14. Cassell J (2000) Embodied conversational interface agents. Commun ACM 43(4):70–78
15. Cassell J, Thorisson KR (1999) The power of a nod and a glance: envelope vs emotional feedback in animated conversational agents. Appl Artif Intell 13(4):519–538
16. Hodgins JK, O'Brien JF, Tumblin J (1998) Perception of human motion with different geometrical models. IEEE Trans Vis Comput Graph 4:307–317
17. Blake R, Shiffrar M (2007) Perception of human motion. Annu Rev Psychol 58:47–73
18. Su M-H, Lee W-P, Wang J-H (2004) A user-oriented framework for the design and implementation of pet robots. In: Proceedings of the 2004 IEEE international conference on systems, man and cybernetics, 10–13 October 2004, The Hague, Netherlands. IEEE, Piscataway, NJ
19. Silva DC, Vinhas V, Reis LP, Oliveira E (2009) Biometric emotion assessment and feedback in an immersive digital environment. Int J Soc Robot 1(4):301–317
20. Ekman P, Friesen WV, Ellsworth P (1972) Emotion in the human face: guidelines for research and an integration of findings. Pergamon Press, New York
21. Schlossberg H (1954) Three dimensions of emotion. Psychol Rev 61:81–84
22. Montepare J, Koff E, Zaitchik D, Albert M (1999) The use of body movements and gestures as cues to emotions in younger and older adults. J Nonverbal Behav 23(2):133–152
23. Atkinson AP, Dittrich WH, Gemmell AJ, Young AW (2004) Emotion perception from dynamic and static body expressions in point-light and full-light displays. Perception 33:717–746
24. de Meijer M (1989) The contribution of general features of body movement to the attribution of emotions. J Nonverbal Behav 13:247–268
25. Dittrich WH, Troscianko T, Lea S, Morgan D (1996) Perception of emotion from dynamic point-light displays represented in dance. Perception 25:727–738
26. Pollick FE, Paterson HM, Bruderlin A, Sanford AJ (2001) Perceiving affect from arm movement. Cognition 82:B51–B61
27. Clarke TJ, Bradshaw MF, Field DT, Hampson SE, Rose D (2005) The perception of emotion from body movement in point-light displays of interpersonal dialogue. Perception 34(10):1171–1180
28. Shaarani AS, Romano DM (2006) Basic emotions from body movements. In: (CCID 2006) The first international symposium on culture, creativity and interaction design, HCI 2006 workshops, the 20th BCS HCI group conference, Queen Mary University of London, UK
29. Ekman P, Friesen WV (1969) The repertoire of nonverbal behavior: categories, origins, usage and coding. Semiotica 1:49–98
30. Rosenthal R, DePaulo B (1979) Sex differences in eavesdropping on nonverbal cues. J Pers Soc Psychol 37(2):273–285
31. McNeill D (2005) Gesture and thought. The University of Chicago Press, Chicago
32. Kret ME, de Gelder B (2010) Recognition of emotion in body postures is influenced by social context. Exp Brain Res 206(1):169–180
33. Frijda N (1986) Emotions. Cambridge University Press, Cambridge
34. Atkinson A, Tunstall M, Dittrich W (2007) Evidence for distinct contributions of form and motion information to the recognition of emotions from body gestures. Cognition 104(1):59–72
35. Sawada M, Suda K, Ishii M (2003) Expression of emotions in dance: relation between arm movement characteristics and emotion. Percept Mot Skills 97:697–708
36. Rakison DH, Poulin-Dubois D (2001) Developmental origin of the animate–inanimate distinction. Psychol Bull 2:209–228
37. Leslie AM (1994) ToMM, ToBy and agency: core architecture and domain specificity. In: Hirschfield L, Gelman S (eds) Mapping the mind: domain specificity in cognition and culture. Cambridge University Press, Cambridge, pp 119–148
38. Morewedge C, Preston J, Wegner D (2007) Timescale bias in the attribution of mind. J Pers Soc Psychol 93(1):1–11
39. Opfer J (2002) Identifying living and sentient kinds from dynamic information: the case of goal-directed versus aimless autonomous movement in conceptual change. Cognition 86:97–122
40. Premack D (1990) The infant's theory of self-propelled objects. Cognition 36:1–16
41. Gelman R, Durgin F, Kaufman L (1995) Distinguishing between animates and inanimates: not by motion alone. In: Sperber S, Premack D, Premack A (eds) Causal cognition: a multidisciplinary debate. Oxford University Press, Cambridge, pp 150–184
42. Bassili JN (1976) Temporal and spatial contingencies in the perception of social events. J Pers Soc Psychol 33:680–685
43. Tremoulet PD, Feldman J (2000) Perception of animacy from the motion of a single object. Perception 29:943–951
44. Trafton J, Trickett S, Stitzlein C, Saner L, Schunn C, Kirschenbaum S (2006) The relationship between spatial transformations and iconic gestures. Spat Cogn Comput 6(1):1–29
45. Chase WG, Simon HA (1974) Perception in chess. Cogn Psychol 4:55–81
46. Chi MTH, Feltovich PJ, Glaser R (1981) Categorization and representation of physics problems by experts and novices. Cogn Sci 5:121–152
47. Loula F, Prasad S, Harber K, Shiffrar M (2005) Recognizing people from their movement. J Exp Psychol Hum Percept Perform 31:210–220
48. Latshaw G (1978) The complete book of puppetry. Dover, Mineola
49. Blumenthal E (2005) Puppetry: a world history. Harry N Abrams, New York
50. Logan D (2007) Puppetry. Brisbane Dramatic Arts Company, Brisbane
51. Sturman D (1998) Computer puppetry. IEEE Comput Graph Appl 18(1):38–45
52. Fukuda H, Ueda K (2010) Interaction with a moving object affects one's perception of its animacy. Int J Soc Robot 2(2):187–193
53. Lim H, Ishii A, Takanishi A (1999) Basic emotional walking using a biped humanoid robot. In: Proceedings of the IEEE SMC 1999
54. Nakagawa K, Shinozawa K, Ishiguro H, Akimoto T, Hagita N (2009) Motion modification method to control affective nuances for robots. In: Proceedings of the 2009 IEEE/RSJ international conference on intelligent robots and systems, pp 3727–3734
55. Biocca F (1997) The cyborg's dilemma: progressive embodiment in virtual environments. J Comput-Mediat Commun 3(2). Available: http://www.ascusc.org/jcmc/vol3/issue2/biocca2.html
56. Kidd C, Breazeal C (2005) Comparison of social presence in robots and animated characters. In: Proc of human-computer interaction (CHI)
57. Ono T, Ishiguro H, Imai M (2001) A model of embodied communications with gestures between humans and robots. In: Proceedings of 23rd annual meeting of the cognitive science society, Mahwah. Erlbaum, Hillsdale
58. Kanda T, Ishiguro H, Imai M, Ono T (2003) Body movement analysis of human-robot interaction. In: Proc of int joint conference on artificial intelligence (IJCAI 2003), pp 177–182
59. Zlatev J (1999) The epigenesis of meaning in human beings and possibly in robots. Lund University Cognitive Studies 79, Lund University
60. Demiris J, Hayes G (1999) Active and passive routes to imitation. In: Proceedings of the AISB symposium on imitation in animals and artifacts
61. Xing S, Chen I-M (2002) Design expressive behaviors for robotic puppet. In: Proceedings of 7th international conference on control, automation, robotics and vision (ICARCV '02), Dec 2002, Singapore, pp 378–382
62. Plaisant C, Druin A, Lathan C, Dakhane K, Edwards K, Vice JM, Montemayor J (2000) A storytelling robot for pediatric rehabilitation. In: Proc ASSETS '00
63. Sabanovic S, Meisner E, Caporael L, Isler V, Trinkle J (2009) Outside-in design for interdisciplinary HRI research. In: 2009 AAAI spring symposium on experimental design for real-world systems
64. Meisner E, Sabanovic S, Isler V, Caporael L, Trinkle J (2009) ShadowPlay: a generative model for nonverbal human-robot interaction. In: Proceedings of the 4th ACM/IEEE international conference on human robot interaction (HRI'09), 11–13 March 2009, La Jolla, California. ACM, New York
65. Sekiguchi D, Inami M, Tachi S (2004) The design of internet-based RobotPHONE. In: Proceedings of 14th international conference on artificial reality, pp 223–228
66. Nomura T, Suzuki T, Kanda T, Kato K (2006) Altered attitudes of people toward robots: investigation through the negative attitudes toward robots scale. In: Proc AAAI-06 workshop on human implications of human-robot interaction, pp 29–35
67. Nomura T, Nakao A (2010) Comparison on identification of affective body motions by robots between elder people and university students: a case study in Japan. Int J Soc Robot 2(2):147–157
68. Itoh K, Miwa H, Matsumoto M, Zecca M, Takanobu H, Roccella S, Carrozza MC, Dario P, Takanishi A (2004) Various emotion expression humanoid robot WE-4RII. In: 1st IEEE technical exhibition based conference on robotics and automation (TExCRA 2004), November 18–19, 2004, Tokyo, Japan, pp 35–36
69. Carpenter J, Davis J, Erwin-Stewart N, Lee T, Bransford J, Vye N (2009) Gender representation and humanoid robots designed for domestic use. Int J Soc Robot 1(3):261–265
70. Tanaka A, Koizumi A, Imai H, Hiramatsu S, Hiramoto E, de Gelder B (2010) I feel your voice: cultural differences in the multisensory perception of emotion. Psychol Sci (in press). doi:10.1177/0956797610380698
71. Beattie G (2003) Visible thought: the new psychology of body language. Routledge, London
Jamy Li is a Master of Applied Science graduate in Mechanical and Industrial Engineering from the University of Toronto, specializing in Human Factors and Human-Computer Interaction. His research interests include human-robot interaction, the social implications of Web 2.0 communication and cross-cultural differences in technology use. He has worked in user experience at Alias (now Autodesk) and DIRECTV.
Mark Chignell is a Professor of Mechanical and Industrial Engineering at the University of Toronto, where he has been on the faculty since 1990. He has a Ph.D. in Psychology (University of Canterbury, New Zealand, 1981), and an M.S. in Industrial and Systems Engineering (Ohio State, 1984). He has a number of research interests that are ultimately aimed at augmenting human capability through user-centered design of innovative collaboration and communication applications. He has been a visiting scientist at the IBM Centre for Advanced Studies in Toronto since 2002 and a visiting scientist at Keio University in Japan since 2005.