SPECIAL SECTION: Methods for the Analysis of Communication
(Non)cooperative Dialogues: The Role of Emotions

Federica Cavicchio, Università di Trento, Rovereto, Italy, and Massimo Poesio, University of Essex, Essex, United Kingdom, and Università di Trento, Rovereto, Italy

Objective: The effect of emotion on (non)cooperation in unscripted, ecological communication is investigated.

Background: The participants in an interaction are generally cooperative in that, for instance, they tend to reduce the chance of misunderstandings in communication. However, it is also clear that cooperation is not complete. Positive and negative emotional states also appear to be connected to the participants' commitment to cooperate or not, respectively. So far, however, it has proven remarkably difficult to test this because of the lack of entirely objective measurements of both cooperation levels and emotional responses.

Method: In this article, the authors present behavioral methods and coding schemes for analyzing cooperation and (surface) indicators of emotions in face-to-face interactions and show that they can be used to study the correlation between emotions and cooperation effectively.

Results: The authors observed large negative correlations between heart rate and cooperation, and a group of facial expressions was found to be predictive of the level of cooperation of the speakers.

Conclusion: It is possible to develop reliable methods to code for cooperation, and with such coding schemes it is possible to confirm the commonsense prediction that noncooperative behavior by a conversational participant affects the other participant in ways that can be measured quantitatively.

Application: These results shed light on an aspect of interaction that is crucial to building adaptive systems able to measure cooperation and to respond to the user's affective states. The authors expect their methods to be applicable to building and testing such interaction systems.

Keywords: multimodal communication, face-to-face interaction, multimodal coding scheme, coding scheme validation, pragmatics, facial expressions of emotions

Address correspondence to Federica Cavicchio, c/o CIMeC, Center for Mind/Brain Sciences, Università di Trento, Corso Bettini 31, 38068, Rovereto (Tn), Italy; [email protected].

HUMAN FACTORS, Vol. 54, No. 4, August 2012, pp. 546-559. DOI: 10.1177/0018720812440279. Copyright © 2012, Human Factors and Ergonomics Society.
Introduction
It is widely accepted that for the field of human–computer interaction (henceforth HCI) to develop anticipatory user interfaces that are human centered and based on naturally occurring multimodal human communication, it will be necessary to develop methods to understand and emulate behavioral cues such as affective and social signals (Bianchi & Lisetti, 2002). Picard (1997) listed several applications where it is beneficial for computers to recognize human affective states: For example, by recognizing the user's emotions, the computer can become a more effective tutor and could learn the user's preferences. Here, we are concerned with one specific effect of emotions, namely their impact on the level of cooperation of an agent, and with developing methods to analyze both cooperation and emotions that may be used in studying such effects.

The philosopher H. P. Grice was perhaps the first to point out the extent to which our ability to communicate effectively depends on speakers acting cooperatively (Grice, 1975). This tendency to cooperate has been a key tenet of subsequent theorizing in pragmatics, such as the work of Bratman and Tuomela in philosophy (Bratman, 1987; Tuomela, 2000), the highly influential work of Clark in psycholinguistics (e.g., Brennan & Clark, 1996; Clark, 1996; Clark & Brennan, 1991; Clark & Krych, 2004; Clark & Schaefer, 1987; Schober & Clark, 1989), and work on intelligent agents in artificial intelligence (see, e.g., the articles in Cohen, Morgan, & Pollack, 1990). However, Grice's early work already highlighted the difficulty of developing operational measures of cooperation, particularly in communication, and in recent years we have witnessed a reassessment of the extent to which people actually cooperate (Grice, 1989).

Emotions are one of the factors shown to be an important predictor of (non)cooperation, for example, in studies based on game theory. For
instance, Pillutla and Murnighan (1996) measured the feelings of respondents when confronted with unfair offers to predict their tendency to reject the offer and found that anger was positively correlated with the tendency to not cooperate. Other researchers showed that when respondents were treated unfairly, they felt not just anger but also sadness, irritation, and contempt (Xiao & Houser, 2005). Many studies have claimed that cooperators can be identified by honest and nonfalsifiable emotive facial expressions such as Duchenne smiles, allowing for mutual selection among cooperators (Boone & Buck, 2003; Brown, Palameta, & Moore, 2003; Krumhuber et al., 2007; Matsumoto & Willingham, 2006; Mehu, Little, & Dunbar, 2007; Oda, Yamagata, Yabiku, & Matsumoto-Oda, 2009). (Duchenne smiles involve the innervation of the orbicularis oculi, a facial muscle surrounding the eyes that is difficult to intentionally control, and have been empirically demonstrated to correlate with the experience of positive emotion [Ekman & Friesen, 1982; Frank & Ekman, 1993; Frank et al., 1993].) These findings suggested that cooperative individuals display higher levels of positive emotions than noncooperators, although the situation in which emotions are displayed may determine the degree to which they reflect cooperative disposition (Matsumoto & Willingham, 2006; Mehu et al., 2007). However, these studies used ultimatum and/or trust games to trigger cooperation and noncooperation in participants. Task-oriented dialogues are an effective methodology to investigate the relationship between cooperation and emotion, as they are likely to elicit unscripted, naturalistic dialogues without losing control of the context.

In this work, we investigate the relationship between cooperation and emotion by collecting and analyzing task-oriented dialogues. A key challenge in doing this is to develop sound methodologies to analyze and encode those factors. For the aspect of our work that is concerned with cooperation, we build on the work by Davies (1998, 2006, 2007), who was the first to develop and test a coding scheme to measure the level of cooperation in a task dialogue. Other coding schemes, such as Walton and Krabbe's (1995) noncooperative features analysis and
Traum and Allen's (1994) discourse obligations, did not determine the interannotator agreement of their coding schemes. Davies's coding scheme weights the level of effort that participants invest in their utterances. Her system provided a positive or negative score for each dialogue move with respect to the effort involved. When an instance of a cooperative behavior was found, a positive code was assigned according to the level of effort involved (e.g., the repetition of an instruction was coded as 3). Conversely, a negative code was assigned when a particular behavior should have been used but was not (e.g., an inappropriate reply, that is, the failure to introduce useful information when required, was coded as –3). Reliability tests for the coding scheme (Davies, 1998) yielded kappa scores ranging from .69 to 1.0, but Davies remarked that agreement between coders was not significant for some of the features because they were seldom present in her corpus. One of our objectives in this work is to develop a highly reliable coding scheme for cooperation.

Emotion coding has proven equally challenging. There is a vast body of literature on affective computing and emotion coding (Abrilian, Devillers, Buisine, & Martin, 2005; Callejas & López-Cózar, 2008; Devillers & Martin, 2010; Martin, Caridakis, Devillers, Karpouzis, & Abrilian, 2006; Vidrascu & Devillers, 2007) and recognition (see, e.g., Paiva, Prada, & Picard, 2007; Pantic & Rothkrantz, 2000; Pantic, Sebe, Cohn, & Huang, 2005). In the past three decades, the theoretical analysis of emotions has shifted from discrete or basic emotion theories (Ekman, 1984; Izard, 1993; Russell & Barrett, 1999) to dynamic architectural frameworks (Frijda, 1986, 2009; Scherer, 1984, 2001, 2009; Scherer & Ceschi, 2000). Whereas the former were based on the study of a limited number of innate, hardwired affect programs for basic emotions, such as anger, fear, joy, sadness, and disgust, leading to prototypical response patterns, the latter are built on appraisal and motivational changes. The appraisal mechanism requires interaction among many cognitive functions (i.e., to compare the features of stimulus events, to
retrieve representations in memory, to respond to motivational urges) and their underlying neural circuits (Scherer, 2009). In addition, the appraisal model of emotions relies heavily on implicit or explicit computation of probabilities of consequences, coping potential, and action alternatives. In appraisal theorists' view, the emotion process is considered as a varying pattern of change in several subsystems of the organism that is integrated into coherent clusters (Scherer, 1984, 2001). Results on emotion annotation provide support for the appraisal theory of emotion (Truong, 2009). But although appraisal models of emotion dominate in the scientific community, current corpus annotations for the study of verbal and nonverbal aspects of emotions tend to deal with a limited number of stereotypical emotions.

The first example of a database focused on emotions is the collection of pictures by Ekman and Friesen (1975), which is based on an early version of basic emotion theory. Similarly, most facial expression recognition research (for comprehensive reviews, see Fasel & Luettin, 2003; Pantic & Rothkrantz, 2003) has been inspired by Ekman and Friesen's work on action units (AUs), a set of visible muscle movements in the face proposed to code facial expressions, described in the Facial Action Coding System (Ekman & Friesen, 1978). Each AU has a muscular basis, and every facial expression can be described by a combination of AUs. The seminal work by Ekman and Friesen inspired the collection of many emotive data sets, pictures, and audio and/or video. Most facial expression recognizers follow either a feature-based approach, detecting specific features such as the corners of the mouth or eyebrow shape (Calder, Burton, Miller, Young, & Akamatsu, 2001; Calder & Young, 2005; Chen et al., 2005; Oliver, Pentland, & Berard, 2000), or a region-based approach, in which facial motions are measured in certain regions of the face such as the eye or eyebrow and the mouth (Black & Yacoob, 1995; Essa & Pentland, 1997; Mase, 1991; Otsuka & Ohya, 1997; Rosenblum, Yacoob, & Davis, 1996; Yacoob & Davis, 1996). In spite of the variety of approaches to facial expression analysis, the majority suffer from limitations. First, most of those methods
handle only the six basic prototypical facial expressions of emotions. Second, they do not provide a context-dependent interpretation of the observed facial behavior; therefore, inferences about the expressed mood and attitude cannot be made by current facial "affect" analyzers. Another problem with the current data sets is that in many multimodal (mainly audiovisual) corpora, emotive facial expressions are produced by expert or semiexpert actors. It is often taken for granted that these expressions are the "gold standard" for studying facial displays of emotion (Ekman, 1992) without relying on any sort of annotation. But this is not completely true; as Wagner (1993) pointed out, each actor's production should be validated by assessing how close it really is to the "standard" emotion representation that a group of judges has in mind. More recently, emotion corpora have been collected using multiparty dialogues, as in the AMI (Augmented Multi-party Interaction) corpus (Carletta, 2007). Another popular (and inexpensive) way to collect emotion corpora is by recording them from TV shows, news programs, and interviews. These "ecological" corpora usually feature a wide range of verbal and nonverbal behaviors (Douglas-Cowie et al., 2005) that are often extremely difficult to classify and analyze, in particular with categorical coding schemes of emotions (Abrilian et al., 2005; Callejas & López-Cózar, 2008; Martin et al., 2006). In general, previous attempts at coding emotions have produced poor agreement among annotators and, consequently, low coding scheme reliability (for reviews, see Artstein & Poesio, 2008; Cavicchio & Poesio, 2008; Jaimes & Sebe, 2007).

The aim of the present research was to study the effect of emotion on (non)cooperation in unscripted, ecological communication. It is crucial, then, to examine the role of appraisal and how it affects different response modalities such as facial expressions and physiological responses. This was done by collecting a corpus of dialogues between speakers performing a direction-giving dialogue task explicitly designed to elicit negative emotions; our hypothesis was that cooperation would be reduced in the speaker undergoing a negative
emotion. This hypothesis was tested by coding the collected dialogues for cooperation and for facial correlates of emotion and by recording psycho-physiological measures, in particular heart rate (henceforth HR), which has been shown to correlate with negative emotions (Gallo & Matthews, 1999; Kiecolt-Glaser, McGuire, Robles, & Glaser, 2002; Smyth et al., 1998; van Eck, Berkhof, Nicolson, & Sulon, 1996). But as Jacob et al. (1999) showed, positive and negative emotions are associated with comparable levels of HR. Therefore, to disambiguate the emotive state, it was essential to investigate the facial expressions that, together with HR, would be able to predict whether an agent engaged in an interaction is cooperating or not. The level of cooperation and the accompanying facial expressions in the utterances following the noncooperative elicitation were measured using a novel coding scheme.

The structure of the article is as follows. In the second section we present the method adopted to collect and annotate the data. In the third section, we present the results. A discussion section follows.

Method

The Task and the Participants
The task we used to elicit conversations is the Map Task (Anderson et al., 1991), often used to study cooperative behavior (Carletta et al., 1997; Carletta & Mellish, 1996; Davies, 1998; Isard & Carletta, 1995). The Map Task involves two participants sitting opposite each other. Each of them has a map (see Figure 1); one participant (the giver) has a route marked on her or his map, whereas the other one (the follower) does not. The participants are told that their goal is to reproduce the giver's route on the follower's map. They are told explicitly at the beginning of the task session that the maps are not identical. To elicit noncooperative behavior, during the interaction we delivered, under carefully controlled circumstances, a set of negative feedback utterances about task performance to one of the interactants (for a review of the methodology, see Anderson, Linden, & Habra, 2005). Participants were 14 Italian native speakers (7 men and 7 women; average age = 28.6, SD = 4.36), each matched with a male confederate partner.
The confederate was the same for all the interaction dialogues and was not related to the participants. A control group of 7 pairs of participants (2 female dyads, 2 male dyads, and 3 mixed-gender dyads; average age = 32.16, SD = 2.9) was also recorded while playing the Map Task with the same map but without negative feedback. The dyads were not related to each other. Game role (giver or follower) and gender were counterbalanced across the interactions.

Apparatus, Procedure, and Materials
A BIOPAC MP150 system was used to record the electrocardiogram. Two Ag/AgCl surface electrodes were fixed on the participant's wrists; a low-pass filter was set at 100 Hz, and the sampling rate was 200 samples per second. HR was automatically calculated by the system as the number of heart beats per minute. Artifacts resulting from movements were automatically removed using BIOPAC's AcqKnowledge software (Version 3.9.1). All sessions were video- and audiotaped with two VC-C50i Canon digital cameras and two free-field Sennheiser half-cardioid microphones with permanently polarized condenser, placed in front of each speaker.

In both the confederate and the control sessions, before starting the task, we recorded the participants' HR for 10 min without challenging them. In the confederate session we recorded the HR during the first 3 min of interaction; we call this period the task condition. The confederate produced noncooperative utterances by acting out the following negative emotion elicitation lines (adapted from Anderson et al., 2005) at minutes 4, 9, and 14 of each interaction. Those time intervals were chosen to ensure HR recovery, as cardiovascular activation after negative emotions lasts longer than after positive emotions (Brosschot & Thayer, 2003; Palomba, Sarlo, Angrilli, Mini, & Stegagno, 2000). In Figure 2, a sketch of the experiment procedure is presented. The following lines were delivered by the confederate to the followers:

• "You are sending me in the wrong direction, try to be more accurate!"
• "It's still wrong, you are not doing your best, try harder! Again, from where you stopped."
Figure 1. The Map Task: Maps for the instruction giver (lower part of the figure) and the instruction follower (upper part of the figure).
• "You're obviously not good enough at giving instructions."

When the confederate had the giver role, the script lines were the following:

• "You are going in the wrong direction, try to be more accurate!"
• "It's still wrong, you are not doing your best, try harder! Again, from where you stopped."
• "You're obviously not good enough at following instructions."
An HR reading was taken after each script sentence and lasted 2 minutes. At the end of the interactions, the 5-point Self-Assessment Manikin Scale (adapted from
Bradley & Lang, 1994) was completed by all the participants. The aim was to measure the valence (positive or negative) of the emotion felt by the participants during the interactions. Six of the participants rated the experience as quite negative, four rated it as almost negative, two rated it as negative, and two rated it as neutral. Regarding the control group, we collected four HR readings at equal intervals during the task (at the beginning of the task and at minutes 2, 4, and 6 of the interaction).

Coding of the Data
Figure 2. Sketch of the experiment procedure in the confederate condition.

A total of 64 conversational turns from the confederate session and the control session were annotated. The annotated segments were taken at the same time points at which the HR was measured. For the confederate condition, the segments were taken at the beginning of the task and after each of the noncooperative utterances. For the control condition, they were taken at the four points at which the HR was measured (at the beginning of the task and after 2, 4, and 6 minutes of interaction). The 64 conversational turns were transcribed and aligned with the annotation of cooperation and facial expression (upper and lower face configuration). All the audiovisual excerpts were annotated using the annotation tool ANVIL (Kipp, 2001). Regarding facial expressions, any movements produced by the upper and lower parts of the face were marked on different layers of the annotation tool and synchronized with the dialogue transcription. Six Italian native speakers independently coded a part of the collected corpus following the coding scheme guidelines (Cavicchio & Poesio, 2012).

Annotation of cooperation. The level of linguistic cooperation was annotated using a novel coding scheme that simplifies the coding scheme proposed by Davies (1998, 2006). We reduced the number of annotation labels because Davies remarked that coder agreement was not significant for some of her scheme's annotation features (Davies, 1998). This means either that there was no agreement on the use of those features or that those features were very rare. Like Davies's, our cooperation typology is based on the idea of having coders annotate utterances with moves in a slightly modified version of the HCRC dialogue moves coding
scheme (Carletta et al., 1997) and then assigning a numerical degree of cooperation to these moves: For example, spontaneously adding information is considered more cooperative than simply doing what one is asked to do. We used check, question answering, and giving instruction as measures of knowledge sharing (i.e., grounding) between the two speakers (see Table 1). Our check code covers queries and clarifications or objections. In the Map Task dialogues, a question is a way to check the extent of knowledge shared by the two speakers. Another group of dialogue moves is related to question answering. The last group of dialogue moves concerns giving instruction. In the Map Task, instruction giving is the main task. Regarding noncooperative dialogue moves, we coded instances in which a speaker fails or refuses to answer a question, to add information, or to repeat an instruction when required by the other speaker.

To calculate the level of effort needed in each move, we attributed the lowest value (0) when a behavior requiring an effort (e.g., answering a direct question) did not occur. A positive weighting value (2) was attributed when an effort move took place in the dialogue (e.g., adding information spontaneously). We also attributed a weight of 1 to the giving instruction move, which is the "minimum need" in the Map Task dialogue (Davies, 1998). The effort was calculated on an individual basis and not as a joint activity because, as Davies (2006) pointed out, in the Map Task the minimization of effort is made on a per-speaker basis. To check that annotators agreed on cooperation levels, a kappa statistic (Siegel & Castellan, 1988)
was computed on the annotated data. A high level of agreement was found (κ = .83, p < .01).

Table 1: Coding Scheme for Cooperation Annotation

Instructions (Cooperation Typology) | Cooperation Level
No answer to question: No answer given when required | 0
Inappropriate reply: Failure to introduce useful information when required | 0
No spontaneous add/Repetition of instruction: Information is not spontaneously added or repeated after a check | 0
Giving instructions: Task baseline | 1
Acknowledgment: A verbal response which minimally shows that the speaker has heard the move to which it responds | 2
Question answering (Y/N): Yes–no reply to a check | 2
Check: Questions (function or form) which solicit the other's understanding of information already offered | 2
Repeating instructions: Repetition of an instruction following a check | 2
Question answering + adding information: Yes–no reply + new information introduction | 2
Spontaneous info/description adding: Introduces new information relevant to the task | 2

Annotation of facial expressions. In our coding scheme, facial expressions are decomposed into eyebrow and mouth shapes (see Table 2). For example, a smile was annotated as ")" and a large smile or a laugh was marked as "+)," the latter indicating a wider opening of the mouth or teeth showing through. Other annotated features are grimace "(", asymmetric smiles (1cornerUp), lips in normal position/closed mouth, lower lip biting, and open lips (O). Regarding eyebrows, annotators marked them in the normal position, frowning (two levels: frown and +frown, the latter meaning a deeper frown), and eyebrows up (up and +up). The six coders reached a high level of agreement in facial expression annotation (upper-face intercoder agreement: κ = .855, p < .01; lower-face intercoder agreement: κ = .81, p < .001).

Table 2: Coding Scheme for Facial Expression Annotation

Upper or Lower Face Configuration | Annotation Label
Open mouth | O
Lips in relaxed position/closed mouth | closed
Lip corners up (e.g., smile) | )
Open smile or laugh | +)
Lip corners down (e.g., grimace) | (
Lower lip biting | Lbiting
1 mouth corner up (asymmetric smile) | 1cornerUp
Eyebrows in normal position | -
Eyebrows up | up
Eyebrows very up | +up
Frown | frown
Deep frown | +frown

An excerpt of a conversation, along with the HR, cooperation level, and facial expressions, is reported below. In this excerpt the confederate had the role of the giver; his HR at this point was 132 bpm. After he delivered the second sentence of the script, the confederate giver asked a question to the follower, who did not answer:

It's still wrong, you are not doing your best, try harder. Again, from where you stopped. Where do you stop?
We're at Mount Poldi
turn right and do not dare to pass through any of the vineyards
you're at a church now
u-uhm
Of course you're at the church otherwise you're going towards a lot of other churches I don't know on your map but on mine there are an awful lot of other churches
Ok
So you're at this church
Is it detached or
Well some are detached as this one but in this case it is detached
it is detached
no this one
ok
it's that one

The turns in this excerpt were annotated with the cooperation labels checking instructions, yes-answer, inappropriate reply (no giving info upon request), and acknowledgement, and with the facial expression labels closed, Lbiting, "(", "O", and up.
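The weighting and agreement check described in this section are straightforward to reproduce. The following Python sketch is illustrative only: the move names mirror Table 1, but the two coders' label sequences and the helper function are hypothetical, and scikit-learn's Cohen's kappa is used as a stand-in for the kappa statistic of Siegel and Castellan (1988).

```python
# Illustrative sketch (not the authors' code): mapping annotated dialogue moves
# to the cooperation weights of Table 1 and checking inter-annotator agreement.
from sklearn.metrics import cohen_kappa_score

# Cooperation weights from Table 1 (0 = noncooperative, 1 = task baseline, 2 = cooperative effort).
COOPERATION_WEIGHTS = {
    "no_answer": 0,
    "inappropriate_reply": 0,
    "no_spontaneous_add": 0,
    "giving_instructions": 1,
    "acknowledgment": 2,
    "question_answering_yn": 2,
    "check": 2,
    "repeating_instructions": 2,
    "question_answering_plus_info": 2,
    "spontaneous_info_adding": 2,
}

def segment_cooperation(moves):
    """Average cooperation weight over the moves of one annotated video segment."""
    weights = [COOPERATION_WEIGHTS[m] for m in moves]
    return sum(weights) / len(weights)

# Hypothetical annotations of the same four turns by two coders.
coder_a = ["giving_instructions", "check", "no_answer", "acknowledgment"]
coder_b = ["giving_instructions", "check", "inappropriate_reply", "acknowledgment"]

kappa = cohen_kappa_score(coder_a, coder_b)   # agreement on the move labels
print(f"kappa = {kappa:.2f}")
print(f"mean cooperation (coder A) = {segment_cooperation(coder_a):.2f}")
```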
Results

Heart Rate and Cooperation
We tested whether the confederate's noncooperative utterances would lead to a reduced level of cooperation in the participants. To test this, we first checked whether the eliciting protocol we adopted caused a change in the participants' HR. A one-way within-subjects ANOVA was performed in the confederate and in the control condition. HR was compared over the five time points of interest (baseline, beginning of the interaction, after minute 4, after minute 9, after minute 14) in the confederate condition. For the control group, in addition to the baseline value, HR was measured another four times at equal intervals during the interactions (see the Apparatus, Procedure, and Materials section). We found a significant effect of time, F(1.5, 19.54) = 125, p < .0001, in the confederate Map Task group but not in the plain Map Task group, F(2.1, 27) = 1.9, p < .16. Post hoc tests run in the confederate condition showed that there was a significant effect of the three sentences with respect to the task condition.
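A within-subjects analysis of this kind can be sketched as follows. The data frame layout, column names, and HR values below are hypothetical, and statsmodels' AnovaRM is used as a stand-in for the original analysis; the sphericity correction implied by the adjusted degrees of freedom reported above is not reproduced.

```python
# Illustrative sketch (hypothetical data): one-way within-subjects ANOVA of HR
# over the five time points of the confederate condition.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
time_points = ["baseline", "task", "sentence1", "sentence2", "sentence3"]
records = []
for participant in range(1, 15):            # 14 participants
    for i, tp in enumerate(time_points):
        # Simulated HR (bpm): a drift upward over time plus noise, for illustration only.
        records.append({"participant": participant,
                        "time": tp,
                        "hr": 70 + 5 * i + rng.normal(0, 3)})
df = pd.DataFrame(records)

# One-way repeated-measures ANOVA: does HR change across the five time points?
result = AnovaRM(df, depvar="hr", subject="participant", within=["time"]).fit()
print(result)
```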
After determining that our coding scheme for cooperation was reliable (see the Annotation of Cooperation section), we investigated the relationship between HR (following the anger or frustration provocation) and the level of cooperation. Our goal was to investigate whether HR could be one of the predictors of cooperative or noncooperative behavior in an interactive partner. Again, we took the four HR measures at the beginning of the task and after the confederate delivered the first, second, and third sentences. Then, we scored the cooperation level of 56 video excerpts of interactions with the confederate. The cooperation scores were normalized within each video segment by computing the average cooperation level for that segment. A linear regression with HR as the dependent variable and the normalized level of cooperation at the four time points (task, first, second, and third sentences) as predictors was run. The statistical model was significant (R² = .85, SE = 6.3, p < .001). HR was negatively correlated with the normalized cooperation level (p < .001) but positively correlated with sentence (p < .001). That is, HR increased linearly after each sentence, and higher HR was associated with a lower level of cooperation. We investigated whether individual differences in HR baseline (i.e., the HR taken before the participants started the task) had any effect on cooperation, but we did not find any (p < .4). A t test on the residuals showed that their mean was not significantly different from 0 (p = 1), and the Shapiro–Wilk normality test had a p value of .31. An inspection of the quantile–quantile plot of the residuals revealed an underlying normal distribution.
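A regression of this form can be sketched as follows; the variable names and simulated values are hypothetical, and statsmodels' ordinary least squares stands in for whatever software the authors used.

```python
# Illustrative sketch (hypothetical data): linear regression with HR as the
# dependent variable and normalized cooperation and sentence index as predictors.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 56                                                    # 14 participants x 4 segments
df = pd.DataFrame({
    "sentence": np.tile([0, 1, 2, 3], 14).astype(float),  # 0 = task, 1-3 = after each script sentence
    "cooperation": rng.uniform(0.5, 2.0, n),              # mean cooperation score per video segment
})
# Simulated HR: rises with each sentence and falls with cooperation (illustration only).
df["hr"] = 75 + 6 * df["sentence"] - 8 * df["cooperation"] + rng.normal(0, 4, n)

X = sm.add_constant(df[["cooperation", "sentence"]])
fit = sm.OLS(df["hr"], X).fit()
print(fit.summary())                                      # expect a negative coefficient for cooperation
```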
Cooperation and Facial Expression

The hypothesis that nonverbal expressions of negative emotions, such as specific facial configurations, would also predict cooperative or noncooperative communication behavior was tested via a multinomial logistic regression. The dependent variable "cooperation" was a categorical variable with three levels: 0 (noncooperative), 1 (giving instructions), and 2
(cooperative). We set 1 as the reference level in the model. The upper- and lower-face configurations annotated along with the three cooperation levels were included in the model. The model was significant (p < .001), and mouth configuration was found to be a better predictor of cooperation than upper-face configuration. We found that, with respect to giving instruction, cooperation was predicted by open smile (1.64, SE = 0.59, p < .05), whereas noncooperation was negatively predicted by smile (–1.17, SE = 0.55, p < .03; see Figure 3). To evaluate the model fit, the independence of irrelevant alternatives was tested via the Hausman–McFadden test. To implement this test, we ran an alternative multinomial logistic regression model on a subset comparing noncooperation with cooperation. As the sets of estimates were not statistically different from those of our model (p < .98), the independence of irrelevant alternatives held.
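A multinomial logit of this kind can be sketched as follows. The data are simulated and the variable names hypothetical; statsmodels' MNLogit is used as a stand-in, with the outcome recoded so that the task baseline (giving instruction) acts as the reference category, and the Hausman–McFadden test of the independence of irrelevant alternatives is not reproduced.

```python
# Illustrative sketch (hypothetical data): multinomial logit predicting the
# cooperation level of a turn from its annotated mouth configuration.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
mouth_labels = ["closed", ")", "+)", "(", "O"]
df = pd.DataFrame({"mouth": rng.choice(mouth_labels, size=200)})

# Hypothetical outcome coding: 0 = giving instructions, 1 = noncooperative, 2 = cooperative.
# MNLogit treats the lowest-valued category as the baseline, so the paper's reference
# level (giving instructions) is coded as 0 here.
def simulate_level(mouth):
    if mouth == "+)":
        return rng.choice([0, 1, 2], p=[0.3, 0.1, 0.6])  # open smile -> more cooperation
    if mouth == ")":
        return rng.choice([0, 1, 2], p=[0.5, 0.1, 0.4])  # smile -> rarely noncooperative
    return rng.choice([0, 1, 2], p=[0.4, 0.35, 0.25])

df["level"] = df["mouth"].apply(simulate_level)

# Dummy-code the mouth configurations and fit the multinomial model.
X = sm.add_constant(pd.get_dummies(df["mouth"], prefix="mouth", drop_first=True, dtype=float))
model = sm.MNLogit(df["level"], X).fit(disp=False)
print(model.summary())
```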
Discussion

The results of the present study indicate that the provocative sentences brought about a significant change in the HR of the participants. That change did not occur in the HR of the control group playing the Map Task without the anger-provoking sentences. We showed that HR increased with each new negative feedback. This finding fits with previous studies examining the role of hostile rumination in HR recovery (Hogan & Linden, 2002; Schwartz et al., 2000). Schwartz et al. (2000) found that after an anger provocation stimulus, participants who were given a chance to repetitively think about the anger-provoking event tended to have a slower HR recovery than those who were distracted. As reported by Linden et al. (2003), this effect might be linked to the continued focus on the preceding anger provocation stimulus. In our data collection, the slower recovery following the first anger-provoking stimulus could be reflected in the higher HR values following the second and third stimuli. Moreover, we found that higher HR values are correlated with a lower level of cooperation. Regarding facial expression, we found that cooperation can be predicted on the basis of mouth configurations.
Figure 3. Predicted probabilities of noncooperation (black), giving instruction (gray), and cooperation (light gray) against facial expressions of the mouth. The values reported in the table refer to the multinomial logit comparing the predicted probabilities of cooperation and noncooperation with the task baseline, giving instruction.
This suggests that the mouth is the facial component to look at to reliably predict cooperation. Our model baseline, giving instruction, was characterized mainly by smiles occurring in association with low HR. A number of studies have pointed out that facial expressions, such as a smile, that appear to convey an emotion often do not indicate the emotional state at all but have strictly communicative functions transcending the simple indications of one's current feelings (Bavelas, Black, Lemery, & Mullett, 1987; Fridlund, 1994, 2002; Provine, 2001). Moreover, smiling individuals are rated as less socially dominant (Keating & Bai, 1986). In this view, a smile could effectively communicate empathy and willingness to cooperate. The facial expression we found to be predictive of cooperation was the open smile, which, following previous studies (Brown et al., 2003; Mehu et al., 2007), may be
linked to the tendency of cooperators to openly express their positive emotions. However, we found that open smiles were not correlated with higher levels of HR with respect to the task baseline, suggesting that they were not expressing a higher level of positive emotion activation. Finally, that noncooperation was not predicted by any facial expression may seem surprising, but it is in fact consistent with the recent findings of Schug, Matsumoto, Horita, Yamagishi, and Bonnet (2010). Schug et al. examined the facial expressions of cooperators and noncooperators as they confronted unfair offers in the Ultimatum Game, finding that cooperators showed greater levels of Duchenne smiles compared with noncooperators, but both groups showed the same level of negative facial expressions when they were not cooperating. Schug et al. suggested that negative facial expressions are "faked" or "masked" more often when interactors are not cooperating. Our results support the idea that when cooperating, participants tend to express more positive facial expressions, whereas when they do not cooperate, negative facial expressions as well as smiles are suppressed (and in our findings noncooperation was indeed negatively correlated with smile). It is far more difficult to mask or suppress a psycho-physiological response than a facial expression, though. Therefore, it is the combination of HR and facial expression that can reliably indicate changes in the level of cooperation. As we have shown in our study, when investigating noncooperation, HR is the principal index to take into account, whereas when investigating cooperation it is important to take into account both facial expression and HR. A low HR with an open smile indicates that the interactor is cooperating, whereas a high HR with a "fuzzier" facial expression indicates that she or he is not. Last but not least, our coding scheme reliability was very high when compared with that achieved with other multimodal annotation schemes, in particular for emotion annotation. This is because we analyzed emotions with a coding scheme based on the decomposition of an emotive event into several factors.
In particular, we did not ask coders to label emotive terms directly. Given our results, it seems that the modeling of emotions is a multilevel process rather than a single-shot configuration prediction that can be labeled with an emotive word. Moreover, as we investigated a culturally homogeneous group of participants, an interesting application of our coding scheme would be to explore how different emotive sets (positive or negative) modify cooperation in different cultural settings. We believe that our coding scheme is an important step toward the creation of annotated multimodal resources for emotion recognition, which in turn are crucial for investigating multimodal communication and HCI.

Acknowledgments

We wish to thank Marco Baroni for his comments on the statistical analysis and Francesco Vespignani for his help in setting up the psychophysiological recordings. All participants gave informed consent; the experimental protocol was approved by the Human Research Ethics Committee of Trento University. This research was supported by a PhD studentship from the Department of Information Science and Engineering, Università di Trento.
Key Points

• The article contributes an experimental design to compare cooperation levels with HR and facial expressions.
• The article demonstrates the development of analysis techniques to establish that both HR and facial expressions predict (non)cooperation.
References

Abrilian, S., Devillers, L., Buisine, S., & Martin, J.-C. (2005, July). EmoTV1: Annotation of real-life emotions for the specification of multimodal affective interfaces. Paper presented at Human-Computer Interaction International, Las Vegas, NV. Anderson, A., Bader, M., Bard, E., Boyle, E., Doherty, G. M., Garrod, S., Isard, S., Kowtko, J., McAllister, J., Miller, J., Sotillo, C., Thompson, H. S., & Weinert, R. (1991). The HCRC Map Task Corpus. Language and Speech, 34, 351–366. Anderson, J. C., Linden, W., & Habra, M. E. (2005). The importance of examining blood pressure reactivity and recovery in anger provocation research. International Journal of Psychophysiology, 57, 159–163. Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34, 555–596. Bavelas, J. B., Black, A., Lemery, C. R., & Mullett, J. (1987). Motor mimicry as primitive empathy. In N. Eisenberg & J.
Strayer (Eds.), Empathy and its development (pp. 317–338). Cambridge, UK: Cambridge University Press. Bianchi, N., & Lisetti, C. L. (2002). Modeling multimodal expression of user’s affective subjective experience. User Modeling and User-Adapted Interaction, 12, 49–84. Black, M., & Yacoob, Y. (1995). Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In International Conference on Computer Vision (pp. 374–381). Cambridge: MIT Press. Boone, R. T., & Buck, R. (2003). Emotional expressivity and trustworthiness: The role of nonverbal behavior in the evolution of cooperation. Journal of Nonverbal Behavior, 27, 163–182. Bradley, M., & Lang, P. (1994). Measuring emotion: The selfassessment manikin and the semantic differential. Journal of Behavioral Therapy and Experimental Psychiatry, 25, 49–59. Bratman, M. (1987). Intention, plans, and practical reason. Cambridge, MA: Harvard University Press. Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 1482–1493. Brosschot, J. F., & Thayer, J. F. (2003). Heart rate response is longer after negative emotions than after positive emotions. International Journal of Psychophysiology, 50, 181–187. Brown, W. M., Palameta, B., & Moore, C. (2003). Are there nonverbal cues to commitment? An exploratory study using the zero-acquaintance video presentation paradigm. Evolutionary Psychology, 1, 42–69. Calder, A. J., Burton, M., Miller, P., Young, A. W., & Akamatsu, S. (2001). A principal component analysis of facial expressions. Vision Research, 41, 1179–1208. Calder, A. J., & Young, A. W. (2005). Understanding facial identity and facial expression recognition. Nature Neuroscience Reviews, 6, 641–653. Callejas, Z., & López-Cózar, R. (2008). Influence of contextual information in emotion annotation for spoken dialogue systems. Speech Communication, 50, 416–433. Carletta, J. (2007). Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus. Language Resources and Evaluation, 41, 181–190. Carletta, J. C., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G., & Anderson, A. (1997). The reliability of a dialogue structure coding scheme. Computational Linguistics, 23, 13–31. Carletta, J., & Mellish, C. (1996). Risk-taking and recovery in taskoriented dialogue. Journal of Pragmatics, 26, 71–107. Cavicchio, F., & Poesio, M. (2008). Annotation of emotion in dialogue: The Emotion in Cooperation Project. In André, E., Dybkjær, L., Neumann, H., Pieraccini, R., and Weber, M. (Eds.), Multimodal dialogue systems perception: Lecture notes in computer science (pp. 233–239). Berlin, Germany: Springer. Cavicchio, F., & Poesio, M. (2012). The Rovereto Emotion and Cooperation Corpus: A new resource to investigate cooperation and emotions. Journal of Language and Resources Evaluation. (Epub ahead of print.) doi:10.1007/s10579-011-9163-y Chen, L. S., Travis Rose, R., Parrill, F., Han, X., Tu, J., Huang, Z., Harper, M., Quek, F., McNeill, D., Tuttle, R., & Huang, T. S. (2005). VACE Multimodal Meeting Corpus. Lecture Notes in Computer Science, 3869, 40–51. Clark, H. H. (1996). Using language. Cambridge, UK: Cambridge University Press. Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. Resnick, J. Levine, & S. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127–149). Washington, DC: American Psychological Association.
Clark, H. H., & Krych, M. A. (2004). Speaking while monitoring addressees for understanding. Journal of Memory and Language, 50, 62–81. Clark, H. H., & Schaefer, E. F. (1987). Collaborating on contributions to conversations. Language and Cognitive Processes, 2, 19–41. Cohen, P., Morgan, J., &, Pollack, M. (1990). Intentions in communication. Cambridge: MIT Press. Davies, B. L. (1998). An empirical examination of cooperation, effort and risk in task-oriented dialogue (Unpublished doctoral thesis). University of Edinburgh, Edinburgh, UK. Davies, B. L. (2006). Testing dialogue principles in task-oriented dialogues: An exploration of cooperation, collaboration, effort and risk (Leeds Working Papers in Linguistics and Phonetics 11). Retrieved from http://www.leeds.ac.uk/linguistics/WPL/ WP2006/2.pdf Davies, B. L. (2007). Grice’s cooperative principle: Meaning and rationality. Journal of Pragmatics, 39, 2308–2331. Devillers, L., & Martin, J.-C. (2010). Emotional corpora. In C. Pelachaud (Ed.), Emotions (pp. 39–51). New York, NY: John Wiley. Douglas-Cowie, E., Devillers, L., Martin, J.-C., Cowie, R., Savvidou, S., Abrilian, S., & Cox, C. (2005). Multimodal databases of everyday emotion: Facing up to complexity. In INTERSPEECH-2005 (pp. 813–816). Available at http:// www.isca-speech.org/archive/interspeech_2005 Ekman, P. (1984). Expression and the nature of emotion. In K. Scherer & P. Ekman (Eds.), Approaches to emotion (pp. 319– 343). Hillsdale, NJ: Lawrence Erlbaum. Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169–200. Ekman, P., & Friesen, W. V. (1975). Unmasking the face: A guide to recognizing emotions from facial clues. Englewood Cliffs, NJ: Prentice Hall. Ekman, P., & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press. Ekman, P. & Friesen, W. V. (1982). Felt, false, and miserable smiles. Journal of Nonverbal Behavior, 6, 238-252. Essa, I., & Pentland, A. (1997). Coding, analysis, interpretation, and recognition of facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 757–763. Fasel, B., & Luettin, J. (2003). Automatic facial expression analysis: A survey. Pattern Recognition, 36, 259–275. Frank, M. G. & Ekman, P. (1993). Not all smiles are created equal: The differences between enjoyment and nonenjoyment smiles. Humor, 6, 9–26. Frank. M. G., Ekman, P., & Friesen, W. V. (1993). Behavioural markers and recognizability of the smile of enjoyment. Journal of Personality and Social Psychology, 64, 83–93. Fridlund, A. (1994). Human facial expression: An evolutionary view. San Diego, CA: Academic Press. Fridlund, A. J. (2002). The behavioral ecology view of smiling and other facial expressions. In M. H. Abel (Ed.), An empirical reflection on the smile (pp. 45–82). Lewiston, NY: Edwin Mellen Press. Frijda, N. H. (1986). The emotions. New York, NY: Cambridge University Press. Frijda, N. H. (2009). Emotions, individual differences and time course: Reflections. Cognition and Emotion, 23, 1444–1461. Gallo, L. C., & Matthews, K. A. (1999). Do negative emotions mediate the association between socioeconomic status and health? Annals of the New York Academy of Science, 896, 226–245.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics, Vol. 3: Speech acts (pp. 41–58). New York, NY: Academic Press. Grice, H. P. (1989). Studies in the way of words. Cambridge, MA: Harvard University Press. Hogan, B. E., & Linden, W. (2002). Anger response styles affect reactivity and recovery to an anger challenge in cardiac patients. Annals of Behaviour of Medicine, 27, 38–49. Isard, A., & Carletta, J. (1995). Transaction and action coding in the Map Task Corpus (Tech. Rep. HCRC/RP-65). Edinburgh, UK: University of Edinburgh, Human Communication Research Centre. Izard, C. (1993). Four systems for emotion activation: Cognitive and non-cognitive processes. Psychological Review, 100, 60–69. Jacob, R. G., Thayer, J. F., Manuck, S. B., Muldoon, M. F., Tamres, L. K., & Williams, D. M. (1999). Ambulatory blood pressure responses and the circumplex model of mood: A 4-day study. Psychosomatic Medicine, 61, 319–333. Jaimes, A., & Sebe, N. (2007). Multimodal human-computer interaction: A survey. Computer Vision and Image Understanding, 108, 116–134. Keating, C. F., & Bai, D. L. (1986). Children’s attribution of social dominance from facial cues. Child Development, 57, 1269– 1276. Kiecolt-Glaser, J. K., McGuire, L., Robles, T. F., & Glaser, R. (2002). Emotions, morbidity, and mortality: New perspectives from psychoneuroimmunology. Annual Review of Psychology, 53, 83–107. Kipp, M. (2001). ANVIL - A generic annotation tool for multimodal dialogue. In Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech) (pp. 1367-1370). ISCA archive: http://www.isca-speech.org/ archive/eurospeech_2001/e01_1367.html Krumhuber, E., Manstead, A. S. R., Cosker, D., Marshall, D., Rosin, P. L., & Kappas, A. (2007). Facial dynamics as indicators of trustworthiness and cooperative behavior. Emotion, 7, 730–735. Linden, W., Hogan, B., Rutledge, T., Chawla, A., Lenz, J. W., & Leung, D. (2003). There is more to anger coping than “in” or “out.” Emotion, 3, 12–29. Martin, J.-C., Caridakis, G., Devillers, L., Karpouzis, K., & Abrilian, S. (2006, May). Manual annotation and automatic image processing of multimodal emotional behaviors: Validating the annotation of TV interviews. Paper presented at the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy. Mase, K. (1991). Recognition of facial expressions from optical flow. IEICE Transactions, 74, 3474–3483. Matsumoto, D., & Willingham, B. (2006). The thrill of victory and the agony of defeat: Spontaneous expressions of medal winners of the 2004 Athens Olympic Games. Journal of Personality and Social Psychology, 91, 568–581. Mehu, M., Little, A. C., & Dunbar, R. I. M. (2007). Duchenne smiles and the perception of generosity and sociability in faces. Journal of Evolutionary Psychology, 5, 133–146. Oda, R., Yamagata, N., Yabiku, Y., & Matsumoto-Oda, A. (2009). Altruism can be assessed correctly based on impression. Human Nature, 20, 331–341. Oliver, N., Pentland, A., & Berard, F. (2000). LAFTER: A real-time face and lips tracker with facial expression recognition. Pattern Recognition, 33, 1369–1382. Otsuka, T., & Ohya, J. (1997). A study of transformation of facial expressions based on expression recognition from temporal
image sequences (Tech. rep.). Tokyo, Japan: Institute of Electronic information, and Communications Engineers. Paiva, A., Prada, R., & Picard, R. W. (2007, September). Affective computing and intelligent interaction. Paper presented at the Second International Conference on Affective Computing and Intelligent Interaction, Lisbon, Portugal. Palomba, D., Sarlo, M., Angrilli, A., Mini, A., & Stegagno, L. (2000). Cardiac responses associated with affective processing of unpleasant film stimuli. International Journal of Psychophysiology, 36, 45–47. Pantic, M., & Rothkrantz, L. J. M. (2000). Automatic analysis of facial expressions: The state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 1424–1445. Pantic, M., & Rothkrantz, L. J. M. (2003). Toward an affectsensitive multimodal human-computer interaction. Proceedings of the IEEE, 91, 1370–1390. Pantic, M., Sebe, N., Cohn, J., & Huang, T. S. (2005). Affective multimodal human-computer interaction. New York, NY: Association for Computing Machinery. Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press. Pillutla, M., & Murnighan, J. (1996). Unfairness, anger, and spite: Emotional rejections of ultimatum offers. Organizational Behavior & Human Decision Processes, 68, 208–224. Provine, R. R. (2001). Laughter: A scientific investigation. New York, NY: Penguin. Rosenblum, M., Yacoob, Y., & Davis, L. (1996). Human expression recognition from motion using radial basis function network architecture. IEEE Transactions on Neural Networks, 7, 1121–1138. Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76, 805–819. Scherer, K. R. (1984). Emotion as a multicomponent process: A model and some cross-cultural data. Beverly Hills, CA: Sage. Scherer, K. R. (2001). The nature and study of appraisal: A review of the issues. New York, NY: Oxford University Press. Scherer, K. R. (2009). The dynamic architecture of emotion: Evidence for the component process model. Cognition and Emotion, 23, 307–351. Scherer, K. R., & Ceschi, G. (2000). Criteria for emotion recognition from verbal and nonverbal expression: Studying baggage loss in the airport. Personality and Social Psychology Bulletin, 26, 327–339. Schober, M. F., & Clark, H. H. (1989). Understanding by addressees and over hearers. Cognitive Psychology, 21, 211–232. Schug, J., Matsumoto, D., Horita, Y., Yamagishi, T., & Bonnet, K. (2010). Emotional expressivity as a signal of cooperation. Evolution and Human Behavior, 31, 87–94. Schwartz, A., Gerin, W., Christenfeld, N., Glynn, L., Davidson, K., & Pickering, T. (2000). Effects of an anger recall task on post-stress rumination and blood pressure recovery in men and women. Psychophysiology, 37, 12–23. Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. New York, NY: McGraw-Hill. Smyth, J., Ockenfels, M. C., Porter, L., Kirschbaum, C., Hellhammer, D. H., & Stone, A. A. (1998). Stressors and mood measured on a momentary basis are associated with salivary cortisol secretion. Psychoneuroendocrinology, 23, 353–370. Traum, D. R., & Allen, J. F. (1994). Discourse obligations in dialogue processing. In Proceedings of the 32nd annual meeting of ACL (pp. 1–8). Stroudsburg, PA: ACL. DOI 10.3115/981732.981733
Truong, K. (2009). How does real affect affect affect recognition in speech? (Unpublished doctoral dissertation). University of Twente, Enschede, Netherlands. Tuomela, R. (2000). Cooperation: A philosophical study. Philosophical studies series 82. Dordrecht, Netherlands: Kluwer. van Eck, M., Berkhof, H., Nicolson, N., & Sulon, J. (1996). The effects of perceived stress, traits, mood states, and stressful daily events on salivary cortisol. Psychosomatic Medicine, 58, 447–458. Vidrascu, L., & Devillers, L. (2007). Five emotion classes detection in real-world call center data: The use of various types of paralinguistic features. In International workshop on paralinguistic speech—Between models and data, Available at http:// www2.dfki.de/paraling07/papers/05.pdf Wagner, H. L. (1993). On measuring performance in category judgment studies on nonverbal behavior. Journal of Nonverbal Behavior, 17, 3–28. Walton, D., & Krabbe, E. (1995). Commitment in dialogue: Basic concepts of interpersonal reasoning. Albany: State University of New York Press. Xiao, E., & Houser, D. (2005). Emotion expression in human punishment behavior. Proceedings of the National Academy of Sciences, 102, 7398–7401. Yacoob, Y., & Davis, L. (1996). Recognizing human facial expressions from long image sequences using optical flow. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 18, 636–642.
Federica Cavicchio is a Marie Curie postdoctoral fellow at the Center for Mind/Brain Sciences, Università di Trento, Rovereto, Italy. She obtained her PhD in cognitive and brain sciences at the Center for Mind/Brain, Università di Trento, in 2010. Her main fields of research are dialogue systems and psycholinguistics. Massimo Poesio is full professor and director of the Cognition, Language, Interaction and Computation (CLIC) Lab at the Center for Mind/Brain Sciences, Università di Trento, Rovereto, Italy. His main field of research is natural language processing. Date received: January 10, 2011 Date accepted: January 15, 2012