Quantifying Behavioral Mimicry by Automatic Detection of Nonverbal Cues from Body Motion

Sebastian Feese, Bert Arnrich and Gerhard Tröster
ETH Zurich, Wearable Computing Laboratory
{feese, barnrich, troester}@ife.ee.ethz.ch

Bertolt Meyer and Klaus Jonas
University of Zurich, Department of Psychology
{b.meyer, k.jonas}@psychologie.uzh.ch
Abstract—Effective leadership can increase team performance; however, the underlying micro-level behaviors that support team performance are still unclear. At the same time, traditional behavioral observation methods rely on manual video annotation, which is a time-consuming and costly process. In this work, we employ wearable motion sensors to automatically extract nonverbal cues from body motion. We utilize activity recognition methods to detect relevant nonverbal cues such as head nodding, gesticulating and posture changes. Further, we combine the detected individual cues to quantify behavioral mimicry between interaction partners. We evaluate our methods on data that was acquired during a psychological experiment in which 55 groups of three persons worked on a decision-making task. Group leaders were instructed to either lead with individual consideration or in an authoritarian way. We demonstrate that nonverbal cues can be detected with an F1-measure between 56% and 100%. Moreover, we show how our methods can highlight nonverbal behavioral differences between the two leadership styles. Our findings suggest that individually considerate leaders mimic head nods of their followers twice as often, and that their face touches are mimicked three times as often by their followers, when compared with authoritarian leaders.
I. INTRODUCTION

Whenever we interact with others, we not only take conversational turns and thus coordinate the flow of speech, but we also tend to align our gestures, postures, body movements and other nonverbal behaviors to match those of our interaction partners. Even though we are most often unaware of this matching and synchronisation of our nonverbal behavior, it "is present in nearly all aspects of our social lives, helping us to negotiate our daily face-to-face encounters" [1]. In psychology, the matching of nonverbal behaviors during face-to-face interaction is known as mimicry, which includes facial, prosodic and behavioral mimicry. Mimicry has been found to relate to liking and rapport and, in fact, to be a nonconscious tool to affiliate and disaffiliate with others [2]. When individuals want to affiliate with others, they unconsciously engage in more mimicry, whereas they mimic less when they disaffiliate. It has been shown that mimicry can lead to empathy, which helps in understanding the emotions of others, and that it can lead to more similar attitudes and shared viewpoints. Mimicry also leads to more prosocial behavior toward the mimicker. Altogether, mimicry "binds and bonds" people together and supports their face-to-face interactions [2].

Despite the fact that mimicry is pervasive in human behavior and has led to a vast amount of literature in psychology, only few approaches exist to measure and quantify mimicry. In psychotherapy, the measurement of body movement synchronisation has been shown to be a useful tool to track the quality of the patient-therapist relationship [3]. We envision that this tracking of relationship quality is a valuable tool also outside psychotherapy; for example, it could be a complementary tool to measure the interpersonal relationships of employees with their leader. In a leadership training scenario in which a team leader learns to engage and motivate his followers, such coordination measures could be part of objective feedback to the trainees about their interaction with their followers. Traditional measurements of behavioral mimicry depend on video footage and manual coding of mimicry, which is time-consuming, costly and bound to the lab setting. In order to go out of the lab and measure natural mimicry behavior in everyday situations without the need for any fixed infrastructure, we employ wearable motion sensors. In this work, we evaluate the potential of such a wearable setup.
A. Paper Contribution

In this paper, we concentrate on information derived from wearable motion sensors to automatically detect nonverbal cues of interaction partners. We present the following results:

1) We show how on-body motion sensors can provide quantitative information about the nonverbal behavior of humans in naturalistic interactions. In particular, we present a method to detect individual nonverbal cues from motion sensor data using activity recognition methods. We evaluate our methods on an evaluation set of 30 participants.

2) We show how the automatically detected nonverbal cues of individuals can be combined to capture the behavioral interdependencies between two or more interacting persons, and thus to measure and assess behavioral mimicry.

3) We apply our methods to sensor data recorded during a recent psychological study on leadership behavior in teams [4] and demonstrate how they can help to uncover nonverbal behavior differences between two leadership styles.
B. Related Work

1) Psychology Background: Leadership has been examined from many perspectives, and much work has been devoted to understanding different leadership styles. Two important types are individually considerate and authoritarian leadership. Individually considerate leadership is a person-focused leadership style: considerate leaders pay special attention to their followers' needs and listen effectively [5]. As such, individual consideration is connected to "preference for and use of two-way communication, empathy, and willingness to use delegation" [5, page 132]. As a substantial facet of transformational leadership, individual consideration has been found to increase team performance particularly well [6]. In contrast to individually considerate leaders, authoritarian leaders take decisions without consulting their followers [7]. Consequently, authoritarian leadership can only work as long as there is no need for input from followers and their motivation does not depend on their involvement in the decision-making process.

2) Social Computing: Previous work on the automatic analysis of social interactions in small groups has dealt with the automatic inference of conversational structure, the analysis of social attention and the detection of personality traits and roles. A review of the topic can be found in [8]. These works have mostly relied on speech-related cues such as speaking length, speaker turns and the number of successful interruptions. More recently, nonverbal cues extracted from audio and video have been used for predicting group cohesion [9] and identifying emerging leaders in small groups [10]. Individually considerate leaders were characterized using speech activity cues in [4]. First steps to measure mimicry using vision-based methods have been taken in [3], [11]. These works aim to measure movement synchronisation at a lower level using abstract signal features, whereas our approach relies on discrete nonverbal cues and is thus closer to the works of LaFrance [12] and Chartrand et al. [13]. In the field of human-robot interaction, head nods have been detected using vision-based methods [14]. In contrast to previous works on meeting corpora, our approach to extract nonverbal cues relies on sensor data from wearable motion sensors. Pentland and collaborators first investigated how wearable sensors can be employed to measure honest signals that capture aspects of human behavior in daily life [15].

II. EXPERIMENT

For this work, we use data recorded during a recent psychological experiment on leadership [4]. During the experiment, participants worked in groups of three (one leader, two followers) on a simulated personnel selection task. Each group was asked to rank four fictitious candidates with regard to their suitability for an open job position. For the task, each group member received five pieces of information about each candidate that were only partly shared among group members (hidden-profile decision-making task). Under the guidance of the group leader, the group discussed the suitability of each candidate and was asked to agree on a rank order, which served as a measure of group performance.

A. Leadership Manipulation

Half of the leaders were instructed to show individually considerate leadership, whereas the other half were instructed to display authoritarian leadership. Within the study, authoritarian leadership refers to the absence of individual consideration. The oldest group member was selected as the group leader and received a short leadership training focusing either on individually considerate or on authoritarian leadership. In one-minute instruction videos, typical behaviors of each leadership style were presented, and the leader was asked to show these behaviors throughout the later discussion. As an incentive, leaders received a raffle ticket for a cash prize for each behavior that they displayed. Individually considerate leaders were instructed to stimulate their followers, to make sure that their followers contribute to the final decision, to avoid pushing for their own opinion and to make suggestions on the structure of the discussion. Authoritarian leaders were instructed to determine the structure of the discussion, to be the first to suggest a rank order of the candidates, to interrupt unsuitable contributions of followers and to decide on the optimal rank order of candidates after listening to the followers' opinions.

B. Data Set

The upper body motion of each group member was captured with six inertial measurement units (IMU, XSens MTx) placed on the upper body. The IMUs were located on both lower and upper arms, the back and the head (see Fig. 1). Each IMU includes an accelerometer, a gyroscope and a magnetic field sensor. All sensors were sampled at a frequency of 32 Hz. For an easier setup, the IMUs were integrated into a sensor-shirt, a long-sleeve stretch shirt which allowed identical sensor placement on all participants. Additionally, speech was recorded with separate lapel microphones, and physiological data such as heart rate and breathing rate were recorded with a monitoring chest-belt (Zephyr BioHarness). In total, we recorded data from 165 subjects (112 female, 53 male; age 25.4 ± 4.2 years) in 55 group discussions. Due to sensor failures that occurred during the first groups under investigation, the sensor data of 11 group discussions are partially missing. This left us with 44 discussions and in total over 15 hours of discussion time.

C. Behavior Annotation

Based on the Discussion Coding Manual [16], we selected seven relevant nonverbal cues which can be derived from body motion. These nonverbal cues are summarized in TABLE I. Generally, the nonverbal cues can be categorized into static postures and dynamic gestures or posture changes. In order to evaluate our detection algorithms, we obtained a ground truth by manually labeling the first 8 minutes of 10 randomly selected sessions (five of each leadership style). Consequently, the evaluation set includes 30 subjects and totals 240 minutes of discussion data. For the labeling of the lower-arm cues, we followed a semi-automatic labeling approach by first employing an activity segmentation of the motion data stream to pre-segment the motion stream into static and dynamic segments (for details see Section III-A). Static segments of each lower arm were then labeled as either face touch, arms crossed or arms diagonal. Dynamic segments were labeled as gesticulating whenever the arm was used for a gesture, as fidgeting when the arm was lightly moved without changing the posture, and as posture change otherwise. In addition, head nodding was annotated manually without any automatic segmentation.

TABLE I
EXTRACTED NONVERBAL CUES. NUMBER OF INSTANCES (#) AND THE AVERAGE LENGTH IN SECONDS (LEN) IN THE EVALUATION SET.

cue             description (related to)                            #     len
face touch      touching one's own face (listening, thinking)      253    5.4
arm crossed     closed posture (hostile behavior)                  135   17.2
arm diagonal    hands folded on table (listening)                 1291    8.6
gesticulating   hands gesticulate (emphasis, dominance)            691    3.2
fidgeting       light arm movements (nervousness)                 1082    1.5
posture change  posture changes, incl. all other movements         279    2.7
nodding         head nodding (backchannel, agreement)              287    1.8
Fig. 1. An example of behavioral mimicry: the left and the right person both display the face touch posture. Motion sensors are placed on both lower arms, the back and the head.
Fig. 2. Definition of behavioral mimicry. Person A displays behavior x (yellow) five times and person B mimics person A four times (bold border).
III. INDIVIDUAL NONVERBAL CUES FROM BODY MOTION

A. Pre-processing and Segmentation

In a pre-processing step, we calibrated the heading (yaw angle) of the orientation data to face straight forward to the middle of the table. This heading calibration was done in order to be independent of the earth's magnetic north and allowed us to use the same model for all three participants around the table. For the detection of gestures and postures, we first segmented the motion stream of each body part into dynamic and static segments using a sliding window approach. On a sliding window (length: 500 ms; step size: 31.25 ms), a segmentation feature is calculated and a detection threshold is used to segment the motion streams. For the lower arms, the segmentation feature f_seg is the standard deviation, computed over the window, of the mean of the acceleration and gyroscope magnitudes, 1/2 (||acc||_2 + ||gyr||_2). The detection threshold τ controls the sensitivity of the motion segmentation and was empirically set to τ = 0.1, which we found to be a good balance between static postures and fidgeting movements. To smooth the segmentation output and prevent over-segmentation, we deleted all segments shorter than 500 ms and then merged all segments within 500 ms.
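To make the segmentation step concrete, the following Python sketch applies the sliding-window thresholding to synchronized accelerometer and gyroscope arrays of one limb. The array shapes, sensor units and the exact order of the smoothing operations are assumptions of this sketch rather than details given in the paper.

```python
import numpy as np


def runs(mask):
    """Return (start, end, value) triples for the constant runs of a boolean mask."""
    change = np.flatnonzero(np.diff(mask.astype(int))) + 1
    bounds = np.concatenate(([0], change, [len(mask)]))
    return [(int(b), int(e), bool(mask[b])) for b, e in zip(bounds[:-1], bounds[1:])]


def segment_motion(acc, gyr, fs=32, win_s=0.5, step_s=0.03125, tau=0.1):
    """Label every sample of one limb's motion stream as static (False) or dynamic (True).

    acc, gyr: (N, 3) accelerometer and gyroscope samples at 32 Hz.
    Window length 500 ms, step 31.25 ms and threshold tau = 0.1 follow the text;
    the smoothing details below are assumptions of this sketch.
    """
    # Mean of the acceleration and gyroscope magnitudes per sample.
    mag = 0.5 * (np.linalg.norm(acc, axis=1) + np.linalg.norm(gyr, axis=1))

    win, step = int(win_s * fs), max(1, int(round(step_s * fs)))
    dynamic = np.zeros(len(mag), dtype=bool)
    for s in range(0, len(mag) - win + 1, step):
        # Segmentation feature: standard deviation of the magnitude inside the window.
        if np.std(mag[s:s + win]) > tau:
            dynamic[s:s + win] = True

    # Smoothing: delete segments shorter than 500 ms, then merge segments within 500 ms.
    min_len = int(0.5 * fs)
    for b, e, is_dynamic in runs(dynamic):
        if is_dynamic and e - b < min_len:
            dynamic[b:e] = False          # drop short dynamic segments
    for b, e, is_dynamic in runs(dynamic):
        if not is_dynamic and e - b < min_len:
            dynamic[b:e] = True           # merge dynamic segments separated by short gaps
    return dynamic
```

The boolean mask returned by segment_motion can then be cut into static and dynamic segments, which are the inputs to the cue classification described next.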
B. Detection of Gestures and Postures

For each static segment, we calculated the mean of the orientation data represented by Euler angles (roll, pitch, yaw). In the case of dynamic segments, a number of additional signal-level features were calculated for each axis of the acceleration and gyroscope sensors. We used the following features: maximum, minimum, range, maximum absolute value, number of maximum peaks, number of minimum peaks, mean time between peaks and standard deviation. Because the dynamic cues fidgeting and posture change depend on their neighbouring static segments (fidgeting is enclosed by the same posture, whereas posture changes occur between two different postures), we also computed neighbourhood features. For each segment, we calculated the absolute difference of each mean orientation angle of the previous and the next segment to capture the similarity of the neighbouring static segments. Before classification, highly correlated features (Pearson's correlation coefficient r > 0.8) and features with low standard deviation (σ < 0.01) were removed. For the classification of nonverbal cues, we used logistic regression. For dynamic gestures, we added the lasso penalty term for automatic feature selection; the lasso penalty λ was set to 0.8. All our results are cross validated leaving one discussion (three subjects) out and are thus subject independent.

C. Detection of Head Nods

For the head nod detection, only the acceleration and gyroscope sensors of the head-mounted IMU are utilized. The detection approach is similar to the detection of arm gestures and postures; however, a sliding window with a fixed step size of 250 ms and a window length of 1.5 s is used in the segmentation step. In our experiments, this sliding window approach outperformed other segmentation approaches for head nod detection, and a window length of 1.5 s was found to be optimal.
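As an illustration of the feature extraction and cue classification just described, the following Python sketch computes per-axis segment features and evaluates an L1-penalized logistic regression with leave-one-discussion-out cross validation using scikit-learn. The peak-detection heuristic, the feature layout and the mapping of the lasso penalty λ = 0.8 onto scikit-learn's C parameter are assumptions of this sketch, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict


def dynamic_segment_features(acc, gyr):
    """Signal-level features of one dynamic segment (per axis of acc and gyro):
    max, min, range, max |value|, peak/valley counts, mean gap between peaks, std.
    The simple sign-change peak heuristic is an assumption of this sketch."""
    feats = []
    for sig in (acc, gyr):                                  # each of shape (L, 3)
        for axis in range(sig.shape[1]):
            x = sig[:, axis]
            d = np.diff(x)
            peaks = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
            valleys = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
            gaps = np.diff(peaks) if len(peaks) > 1 else np.array([len(x)])
            feats += [x.max(), x.min(), np.ptp(x), np.abs(x).max(),
                      len(peaks), len(valleys), gaps.mean(), x.std()]
    return np.array(feats)


def evaluate_cues(X, y, groups):
    """Leave-one-discussion-out evaluation of an L1-penalized logistic regression.

    X: one feature row per segment, y: cue labels, groups: discussion id per segment.
    """
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0 / 0.8)
    pred = cross_val_predict(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
    return f1_score(y, pred, average=None)                  # per-cue F1 scores
```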
IV. INTERPERSONAL CUES FROM BODY MOTION: BEHAVIORAL MIMICRY

In the previous section, we described how individual nonverbal cues can be detected from body motion. In the following, we combine the individual cues of two persons to measure mimicry. Behavioral mimicry takes place whenever a person A adopts the behavior of another person B. An example of behavioral mimicry is illustrated in Fig. 1.

A. Definition

We define mimicry as an event that occurs when person B follows person A in their behavior, that is, when both display the same nonverbal cue. Behavioral mimicry of one behavioral cue is illustrated schematically in Fig. 2. To count as a
mimicry event, person B needs to start displaying behavior x after person A started, but within a certain time dt after person A stopped displaying behavior x (compare examples 1, 2, 4 and 5). To avoid double counting in case person A displays the behavior x again (example 3), person B needs to display x before A starts again. In case person B displays x multiple times (example 5), only one mimicry event is counted. More formally, mimicry events are defined as follows: Given a sequence of behaviors b^A_{1...N} of person A out of a set of behaviors X, a behavior instance is given by b^A_i. Start and end times of each behavior instance are accessed by t_1[b^A_i] and t_2[b^A_i], respectively. A behavior b^A_i of person A is mimicked by person B if a behavior instance b^B_j exists that meets the following constraints:

b^A_i = b^B_j,
t_1[b^B_j] > t_1[b^A_i],
t_1[b^B_j] < t_2[b^A_i] + dt.
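Read operationally, this definition reduces to a simple interval test over the detected cue instances of two persons. The Python sketch below counts mimicry events for one cue; the function name, the (start, end) input representation and the placeholder value dt = 3 s are illustrative assumptions, as the excerpt does not fix a concrete lag.

```python
def count_mimicry(a_events, b_events, dt=3.0):
    """Count how often person B mimics person A for one behavioral cue.

    a_events, b_events: lists of (start, end) times in seconds, sorted by start time.
    dt: maximum lag after A stops displaying the cue; 3.0 s is only a placeholder.
    """
    count = 0
    for i, (a_start, a_end) in enumerate(a_events):
        # B must start after A started, and no later than dt after A stopped ...
        window_end = a_end + dt
        # ... but before A displays the cue again, to avoid double counting.
        if i + 1 < len(a_events):
            window_end = min(window_end, a_events[i + 1][0])
        # Several B instances inside the window still count as a single mimicry event.
        if any(a_start < b_start < window_end for b_start, _ in b_events):
            count += 1
    return count


# Example: A shows the cue twice, B starts 1.5 s after A's first onset -> one mimicry event.
print(count_mimicry([(0.0, 2.0), (10.0, 12.0)], [(1.5, 3.0)]))   # prints 1
```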