The Bilingual Mind and Brain Book Series 3
Halszka Bąk
Emotional Prosody Processing for Non-Native English Speakers Towards An Integrative Emotion Paradigm
The Bilingual Mind and Brain Book Series Volume 3
Series editors
Roberto R. Heredia, Department of Psychology and Communication, Texas A&M International University, Laredo, TX, USA
Anna B. Cieślicka, Department of Psychology and Communication, Texas A&M International University, Laredo, TX, USA
More information about this series at http://www.springer.com/series/13841
Halszka Bąk, Faculty of English, Adam Mickiewicz University, Poznań, Poland
The Bilingual Mind and Brain Book Series ISBN 978-3-319-44041-5 ISBN 978-3-319-44042-2 (eBook) DOI 10.1007/978-3-319-44042-2 Library of Congress Control Number: 2016948096 © Springer International Publishing Switzerland 2016 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. Printed on acid-free paper This Springer imprint is published by Springer Nature The registered company is Springer International Publishing AG Switzerland
To my parents, Bożena and Zenon
Acknowledgments
This work is my own, but as it consumed more and more of my attention and my time, it was the people around me who kept me focused, inspired, and sane enough to push it through to the end. I thank my Ph.D. supervisors, Profs. Roman Kopytko and Katarzyna Bromberek-Dyzman, who showed endless patience and offered much-needed advice on all aspects of completing the Ph.D. dissertation upon which this book is based. Above all, I thank them for the freedom they gave me in shaping my project and for the trust they showed in my deliberate pacing of the empirical work and the writing process. I thank Prof. Kopytko for inspiring my mind to fly and Prof. Bromberek-Dyzman for tethering it so that it would not stray beyond the boundaries of social acceptability. The integrative paradigm designed for this study would have been impossible to create without the help and cooperation of Prof. Jeanette Altarriba from the University at Albany–SUNY. She gave me the opportunity to work in her laboratory and offered priceless guidance through the bureaucratic, ethical, and formal thickets of doing research in a foreign country. The most critical and demanding portion of the empirical work for this book was concerned with developing the stimuli for the experimental stages of the study. Described in Chap. 6, this portion of the work would simply not have become a reality without Prof. Altarriba’s collaboration. While I appreciate her professional and committed help, I thank her in particular for her kindness and generosity towards a girl far from home and profoundly out of her depth. For all that I have learned and all I have gained, with fond memories of a hot plate of cinnamon churros: thank you, Jeanette. I am greatly indebted to my new and old friends from the University at Albany, mostly friends or members of Prof. Altarriba’s Cognition and Language Laboratory. Many thanks to Faye Knickerbocker for the great pointers and much patience with the mildly obtuse foreigner trying to get IRB approval.
Thanks to Stephanie Kazanas for her cool professionalism and candid nature, for keeping her doors open, and for setting no limit on the number of odd/silly questions about the how-tos and wherefores of an American university. To Kit Cho, who is secretly a superhero, for swooping in at the last moment with compatible equipment and much-needed infusions of Polish food and Taylor Swift music to save my project and me from hopeless despair. To Jenny Martin and Crystal Robins for braving Polish food at a place that seemed to have missed the memo about the invention of AC. To Kevin Berry for introducing me to the idea that Americans can produce, and indeed know a thing or two about making, a decent brew. In a professional vein, my thanks and deep appreciation go to Gabrielle M. Roy for assisting me and, indeed, largely bearing the brunt of data collection for Chap. 6. Without her dedication and commitment, this study would be a poor shadow of itself. Many thanks also to Catherine G. Payano for assisting Gabrielle with data collection. Last but not least, many thanks to Andrew and Julia Ross for rekindling an old friendship and letting me put them to the trouble of driving me around and showing me their beautiful city.
The majority of the work for this book was carried out at the Faculty of English, Adam Mickiewicz University in Poznań, in the course of the Interdisciplinary Ph.D. Program: Language, Society, Technology, & Cognition. All the members of the program were my friends and fellow commiserators throughout the long, long road from the first passable sentence of the first draft of my Ph.D. dissertation to the final full stop of the manuscript of this book. No words are deep enough or strong enough to tell each and every one of them how much our time together meant to me as a scholar and as a human being, but I may at least try. My thanks to my partner in crime, Rafał Jończyk, for sharing the pains and the joys of building and running the Language and Communication Laboratory we both worked at for most of our time in graduate school. To Marta Gruszecka for being that one friend we all need, the one who would rather make you a better human being by taking you down a peg than comfort you at every misstep you make.
To Marta Marecka for being the paragon of orderliness, professionalism, and exactitude that none of us will probably ever attain, and for showing us that those who think something impossible should truly step out of the way of those who think otherwise. To Michał Pikusa, for challenging me to live beyond all kinds of comfort zones. Keep running, my friend, and one day I will definitely catch up. To Paula Ogrodowicz for pushing past the limit of the sky and for being more patient with me than I deserve. I appreciate all the sweat, frustrations, cups of tea, and group hugs we shared over the last four years. Finally, my deepest apologies and appreciation to all my family, by blood and by choice. Thanks in particular to Magda, Michał, Karolina, Łukasz, Kasia, and Tomek for keeping me in your lives and your children’s lives. I apologize for putting you on hold while I worked on my book and thank you for waiting. I am back.
Contents
1 Emotional Relativity—Argument from Nurture
1.1 Introduction
1.2 The Relativity of Emotions in Anthropology
1.2.1 The Dawn of Relativity—Franz Boas and Salvage Anthropology
1.2.2 The Principle of Linguistic Relativity and the Dual System of Language—Edward Sapir
1.2.3 Relativity Through Habituation and the Seeds of Confusion—Benjamin Lee Whorf
1.2.4 From Linguistic Relativity Principle to the Sapir–Whorf Hypothesis
1.2.5 Relativity of Emotions in Syntactic Structures
1.2.6 Emotional Relativity in Semantics
1.2.7 Nonverbal and Pragmatic Emotional Relativity
1.3 Conclusions—Emotional Relativity
References
2 Emotion Universals—Argument from Nature
2.1 Universalism in the Psychological Research on Emotions
2.1.1 The Great Pioneer—Charles Darwin’s Expression of Emotions in Man and Animals
2.1.2 The Forefathers of Psychology: Wilhelm Wundt and William James
2.1.3 Between the Dawn and Rebirth—From the Forefathers to Paul Ekman
2.1.4 The Universalist—Paul Ekman
2.1.5 Resistance and Revisionism—The Post-Ekmanians
2.1.6 Conclusions—Emotional Universalism
2.2 Between Specificity and Universalism—Conclusion
References
3 Linguistics—The Great Absentee
3.1 Introduction
3.2 From Saussure to Chomsky—The Great Abstraction
3.3 Semiotics
3.4 Semantics
3.5 Pragmatics
3.6 Conclusions
References
4 A Different Look at Emotion Processing Models
4.1 A Different Approach to Modeling and Visualization
4.2 The Classic Models of Emotion Processing
4.3 Transition Stage—Discrete Emotions Versus Early Dimensional Models of Emotion Processing
4.4 Current Approaches—From Skeptical Resistance to Deep Complexity
4.5 Conclusions—The Cartesian See-Saw
References
5 The State of Emotional Prosody Research—A Meta-Analysis
5.1 Introduction
5.2 Consensus on the Nature of Emotional Prosody Processing
5.3 Literature Review Selection Criteria
5.3.1 On the Development and Validity of Stimuli for Emotional Prosody Research
5.3.2 On the Populations Involved in Emotional Prosody Research
5.4 The State of Emotional Research—Evaluation
5.5 Investigating Emotional Prosody in Nonnative English Speakers—Study Design
5.5.1 Creating Stimuli
5.5.2 Stimuli Exploration
5.5.3 Population Sampling—Nonnative English Speakers
5.6 Conclusion
References
6 The Development of Stimuli for Emotional Prosody Research: With Contributions from Prof. Dr. Jeanette Altarriba, State University of New York, Albany, USA
6.1 Introduction
6.2 Stimuli Creation Stage
6.2.1 Speakers Providing Emotional Speech Samples
6.2.2 Materials—Elicitation and Acting
6.2.3 Recording Procedure
6.2.4 Results—The Recorded Material and Emotion Elicitation Evaluation
6.3 Stimuli Exploration Study
6.3.1 The Judges
6.3.2 The Evaluation Procedures
6.3.3 Determining “Accuracy” Across Evaluation Procedures
6.3.4 The Results of the Exploration Study
6.4 Conclusions
References
7 Emotional Prosody Processing in Nonnative English Speakers
7.1 Introduction
7.2 Participants
7.3 Materials
7.4 Experimental Procedure
7.5 Data Processing and Determining Accuracy
7.6 Results
7.6.1 The Valence and Arousal Evaluation Task (ValAr) Results
7.6.2 The Categorization Task (Cat.) Results
7.6.3 The Free (Naming) Task Results—Statistical Analysis Results
7.6.4 The Free (Naming) Task—Qualitative Results Analysis
7.6.5 Task Difficulty Effects Analysis
7.6.6 The Post-probe Questionnaire Results
7.7 Discussion
References
8 Emotional Prosody Processing for Nonnative English Speakers
8.1 Introduction
8.2 On Reductionism
8.3 On Negativity Bias
8.4 On Language
8.5 On Method
8.6 Conclusions
References
Appendix A
Appendix B
Index
List of Abbreviations
*  Indicates the native language of experimental subjects
(pseudo-)  Indicates a pseudolanguage
A  Anger
Cat.  The categorization task
CHI  Chinese
D  Disgust
ENG  English
F  Fear
F  Female
FC  Forced choice task
FH  Stimuli coming from female speakers expressing happiness
FreeCat  The free naming task (Cat. level analysis)
FreeValAr  The free naming task (ValAr level analysis)
FS  Stimuli coming from female speakers expressing sadness
GER  German
H  Happiness
HIM  Himba
IND  Hindi
JAP  Japanese
M  Male
MH  Stimuli coming from male speakers expressing happiness
MS  Stimuli coming from male speakers expressing sadness
N  Neutral (tone)
PC  Propositional content
S  Sadness
SPA  Spanish
Sur  Surprise
SYR  Arabic (Syria)
TAG  Tagalog
ValAr  The valence/arousal evaluation task
Introduction
Emotional prosody is a collection of patterns of intonation and stress in speech which accompany various emotional states, the “how” of things said. As an object of emotion research, it has attracted the intermittent attention of various researchers. Research on emotional prosody is, however, laden with multiple problems of a practical and ethical nature which have prevented it from becoming as popular as research on facial expressions of emotion. The majority of the methodological problems stem from a contradiction between the nature of emotional prosody as a research object and the methods routinely implemented to study it. Methodology is one of the few points of consistency across emotional prosody research: virtually all of it is grounded in Paul Ekman’s universalism (Ekman 1992) and hinges on experiments based on forced choice tasks. This methodology works very well with facial expressions of emotion, which, for all their subtlety, have prototypical and easily definable features. With emotional prosody, it is very difficult to determine a prototypical pattern which would be universal, especially given the great variability of languages with respect to pronunciation and intonation patterns. Because of its dependence on highly variable speech patterns, emotional prosody is inherently ambiguous (Scherer et al. 2003), while the forced choice task based on universal basic emotions demands unambiguous categorizations. In other words, the methods used to investigate emotional prosody are incompatible with the subject matter.
It would hardly be an overstatement to say that the population of native monolingual English speakers is overrepresented within the body of research on emotional prosody. Cross-linguistic studies are comparatively rare, and most research so far has focused on the finer details of emotion recognition from prosody in native monolingual communities.
With this preferential focus on English-speaking communities, we have a reasonable overall understanding of how English emotional prosody is processed by native speakers of the language. For example, we know that females appear to integrate emotional cue information faster and are therefore better at recognizing emotions than males (Schirmer and Kotz 2003). We also know that gender-specific differences in processing emotion have their onset at puberty (Fujisawa and Shinohara 2011) and their offset in late adulthood, when age-specific differences take over (Paulmann et al. 2008). We know that the prosody of sadness, anger, and fear has relatively high recognition rates, but the prosody of happiness does not (Paulmann and Pell 2011). We also know that native English speakers can reliably recognize basic emotions in the prosody of other languages they do not know with accuracy well above chance (Thompson and Balkwill 2006). Conversely, speakers of other languages can recognize basic emotions in English emotional prosody at above-chance levels of accuracy without knowing the language (Pell et al. 2009). What we do not know is how the large and growing population of nonnative English speakers processes English emotional prosody. This study was designed to answer that question while systematically expanding on the reductionist paradigm prevalent in emotional prosody research.
In its devotion to the standard view of emotions, as Paul Ekman’s theory of universal basic emotions has been called (Russell 1994), emotional prosody research is a microcosm of emotion research in general. However, a closer look at the critical literature within the field, and at the limitations and future directions typically listed at the end of each empirical paper, reveals that approaches from beyond the mainstream standard view may be more compatible with the subject matter. The literature suggests that simple dimensional approaches are viable, as they include a perceived arousal component, which is strongly correlated with the fundamental frequency of the human voice (Scherer et al. 2003). The results implicating gender- and age-specific factors also suggest high-level processing along the lines of emotional appraisals (Ellsworth and Scherer 2003). However, nothing in the basic theory or the methodological practice of emotional prosody research suggests how to approach the subject matter using methods from outside the standard view.
The standard view is geared towards finding universal aspects of emotion processing, whereas in the case of nonnative speakers I expected differences. Indications that the emotional prosody of basic emotions is processed differently by speakers of different languages have been reported in the existing literature (Pell et al. 2009), but they have not been analyzed or interpreted. I therefore realized that to find and interpret differences in processing, I would need a new approach and a new theoretical framework that would allow me to systematically expand both the theoretical premise and the methodological approach of emotional prosody research. Based on the observations reported in previous literature and on an exploratory study reported here in Chap. 6, I postulated six hypotheses on the way nonnative English speakers would process emotional prosody produced by native English speakers. Nonnative English speakers are a heterogeneous group with regard to proficiency. I hypothesized that participants of higher English proficiency would be more accurate in emotion recognition than participants of lower proficiency, as better knowledge of the language would facilitate recognition. I also hypothesized that negative emotions (sadness) would be recognized significantly better than positive emotions (happiness), and that all emotions would be recognized better in female voices than in male voices. Based on initial reports from Drolet et al. (2012), I also included both natural and acted expressions of emotion. I hypothesized that the acted emotions, being intentionally communicative and transparent expressions, would be recognized more accurately than the less transparent natural expressions.
The exploratory study, conducted to test the experimental tasks and stimuli created for the main experiment and to make predictions about the effects, suggested two additional hypotheses. The three tasks designed for the study tapped into increasingly complex levels of emotion processing, each demanding greater specificity and incurring higher processing costs. I therefore hypothesized that the more costly the task, the lower the accuracy of emotion recognition from prosody would be. Additionally, in the results of the exploratory study I noticed a trend whereby accuracy scores were higher for negative emotions (sadness), while the error patterns for positive emotions (happiness) implied that they were misinterpreted as negative emotions (anger, fear, sadness). I therefore additionally hypothesized that negativity bias would manifest in the emotional prosody recognition scores of nonnative English speakers at all levels of processing.
To meet the study objectives, I had to start with a thorough historical review of the universalism versus culture-specificity (nature vs. nurture) debate on the nature of emotions. I had to develop new tools drawing on my expertise in affective databasing and linguistics, and to build my approach firmly on a constructive critique of the existing literature. Therefore, this book consists of two major parts. Part I, comprising Chaps. 1–4, is concerned with the broad theoretical background of the study. Part II, comprising Chaps. 5–8, is a four-part report on the study conducted to investigate how emotional prosody is processed by nonnative English speakers.
Chapter 1 comprises an overview of the anthropological theory and empirical evidence for linguistic relativism in emotional expressions. The chapter opens with a history of the linguistic relativity principle from its inception in Franz Boas’ (1910) work to its full vulgarization, meaning misinterpretation through oversimplification and decontextualization (Joseph 2002), and its redrafting as the Sapir–Whorf Hypothesis (Brown 1958). Following this historical overview, selected illustrative examples of anthropological evidence for linguistic relativism in the expression of emotions are presented in sections devoted to the syntax, semantics, and pragmatics of language. Anthropology and its evidence are presented as the culture-specificity side of the universalism versus culture-specificity of emotions debate.
Chapter 2 comprises an overview of the psychological perspective on the nature of emotional expression and emotion processing. The chapter starts with a historical overview of the theoretical progression of thought on emotions, from Darwin’s (1872) observations predating the emergence of psychology as a discrete discipline to the contemporary revisionists represented by James A. Russell and Lisa Feldman Barrett. The chapter traces the roots of the psychological perspective on emotions through the rise and fall of holistic versus reductionist thinking and of rational versus affective approaches to the human psyche. Psychology is presented as the universalism side of the universalism versus culture-specificity of emotions debate, due mainly to the lasting influence of Paul Ekman’s theory of panhuman emotional expressions.
Chapter 3 covers linguistics, the great absentee from the universalism versus culture-specificity of emotions debate. The applied linguistics fields of semiotics, semantics, lexicography, corpus linguistics, and experimental pragmatics are presented as potential contributors to our understanding of how emotions are processed. The complementary potential of linguistics for emotion research is also discussed.
Chapter 4 is devoted to selected psychological models of emotion processing based on the theories presented in Chap. 2. Instead of simply reiterating the strongly hierarchical models as described and visualized in previous literature, I reanalyzed and reinterpreted the models to emphasize their temporal aspects, which are crucial in emotional prosody processing. The models are presented as belonging to three stages of increasing complexity: the classic models (James–Lange, Cannon–Bard, Schachter–Singer), the transitional stage models (Schlosberg’s three-dimensional model, Ekman’s basic emotions model), and the advanced stage models (appraisal models, the circumplex, the Conceptual Act Model). The logical progression from simplicity to complexity demonstrated in the models is thus reimagined.
Chapter 5 comprises a detailed critical meta-analysis of the existing empirical literature on emotional prosody. The chapter first discusses the in-field consensus regarding the nature of emotional prosody processing. What follows is an analysis of the way stimuli are typically developed and validated in previous research, and of the way population samples are drawn. I then present an evaluation of the existing body of evidence on emotional prosody, pointing out the weak points in the methodologies implemented in previous research. The chapter concludes with a proposed study design for the present book, based on a critical analysis of previous research, and with the presentation of four main hypotheses.
Chapter 6 is concerned with the process of stimuli development and the exploration study conducted on the stimuli. The process of stimuli development is described in detail, including the preparation of propositional content for the emotion acting sessions, the movie clips used for emotion elicitation, the effect control measures implemented, and the recording and stimuli editing procedures. The stimuli exploration study is then described, including its materials, procedures, results, and their implications. The chapter concludes with the presentation of two additional hypotheses suggested by the results of the exploration study.
Chapter 7 is concerned with the experiment in which the six hypotheses postulated for this study were tested. Participant selection, effect control measures, materials, procedures, and results are reported in all relevant detail. A short discussion interpreting the more important results is also included. A detailed discussion of the results is presented in Chap. 8, where the results are placed in the wider context of emotion research and evaluated, and their limitations are described and addressed.
The study presented here was conducted using an integrative research paradigm geared towards searching both for universal aspects of emotion processing and for differences between emotion processing patterns in different communities. The paradigm proved viable and yielded reliable and interesting results, even though certain improvements will be necessary before it can work at optimal capacity. It revealed that fundamental perceptual mechanisms, such as heuristics and negativity bias, as well as complex mechanisms, such as gender stereotyping, are present and significant in emotion processing. It allowed for a comprehensive insight into how emotions are processed on multiple levels and revealed the gender-specificity of many aspects of emotion processing. It is my hope that this book, from its broad theoretical background to its novel methodological solutions, will be useful to anyone interested in emotional prosody research or emotion processing in bilinguals, as well as to anyone with an interest in emotion research broadly understood.
References
Boas, F. (1910). Psychological problems in anthropology. The American Journal of Psychology, 21(3), 371–384.
Brown, R. (1958). Words and things. New York: Free Press.
Darwin, C. (1872). The expression of the emotions in man and animals. New York: D. Appleton and Company.
Drolet, M., Schubotz, R. I., & Fischer, J. (2012). Authenticity affects the recognition of emotions in speech: Behavioral and fMRI evidence. Cognitive, Affective, & Behavioral Neuroscience, 12, 140–150.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.
Ellsworth, P. C., & Scherer, K. R. (2003). Appraisal processes in emotion. In R. J. Davidson, H. H. Goldsmith, & K. R. Scherer (Eds.), Handbook of affective sciences (pp. 572–596). New York and Oxford: Oxford University Press.
Fujisawa, T., & Shinohara, K. (2011). Sex differences in the recognition of emotional prosody in late childhood and adolescence. Journal of Physiological Science, 61, 429–435.
Joseph, J. (2002). From Whitney to Chomsky: Essays in the history of American linguistics. Amsterdam: John Benjamins Publishing.
Paulmann, S., & Pell, M. D. (2011). Is there an advantage for recognizing multi-modal emotional stimuli? Motivation and Emotion, 35, 192–201.
Paulmann, S., Pell, M. D., & Kotz, S. A. (2008). How aging affects the recognition of emotional speech. Brain and Language, 104, 262–269.
Pell, M. D., Paulmann, S., Dara, C., Alasseri, A., & Kotz, S. A. (2009). Factors in the recognition of vocally expressed emotions: A comparison of four languages. Journal of Phonetics, 37, 417–435.
Russell, J. A. (1994). Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies. Psychological Bulletin, 115(1), 102–141.
Scherer, K. R., Johnstone, T., & Klasmeyer, G. (2003). Vocal expression of emotion. In R. J. Davidson, H. H. Goldsmith, & K. R. Scherer (Eds.), Handbook of affective sciences (pp. 433–456). New York: Oxford University Press.
Schirmer, A., & Kotz, S. A. (2003). ERP evidence for a sex-specific Stroop effect in emotional speech. Journal of Cognitive Neuroscience, 15(8), 1135–1148.
Thompson, W. F., & Balkwill, L. L. (2006). Decoding speech prosody in five languages. Semiotica, 158(1/4), 407–424.
Chapter 1
Emotional Relativity—Argument from Nurture
1.1 Introduction
The history of emotion research across the various disciplines concerned with the subject shows that the one inevitable aspect of the field is antagonism. Theories, research frameworks, and paradigms form in continuous opposition to one another. The subject of emotions is as divisive as it is interesting, and because multiple related disciplines investigate it, certain definitional problems are bound to arise. To this day there is virtually no single, agreed-upon definition of emotions, and what is known as the emotion paradox frequently leads researchers to omit operational definitions from their works. The emotion paradox, noted already by Darwin (1872), was only clearly defined by Barrett (2011). It boils down to a simple observation: everybody feels they know what they are talking about when talking about emotions, but when asked to define emotions precisely, they fail. And so adherents of various theories have their own definitions, representatives of various disciplines have their own, and fringe researchers do too. Common certainties are few. All researchers appear to agree that emotions have physiological, neural, neurochemical, linguistic, psychological, social, and cultural aspects. Where they differ is on which of these aspects is the most important, prominent, primal, or generally central, and thus determines the true nature of emotions. The arguments surrounding this topic have formed into a number of dichotomies in the overall scientific discourse on emotions, reviewed in great detail by Lutz and White (1986). However, it was the dichotomy between the universalist (psychological) and culture-specific (anthropological) approaches to emotion that held the greatest sway over the entire field of emotion research.
From the very beginning of their discipline, psychologists sought validation of their claims in the physical, from observable physiological changes at the turn of the twentieth century to neurochemistry in the twenty-first century. More or less overtly, they also sought universal principles of human psychology that would be grounded in biology and would override any potential cross-cultural differences. Anthropology was concerned with documenting the rich variety of human cultures, with an emphasis on their differences. Both disciplines had an interest in emotions, but each had its own methods and its own body of evidence. The problem was that the evidence pointed each discipline towards different conclusions about the same phenomenon, while neither could deny the validity of the other’s observations. The nature of emotions was clearly not easily explicable within the framework of just one discipline. Still, research on emotions continued in both disciplines, and developments and discoveries in one influenced the other. In Chaps. 1 and 2 I will trace the historical roots of the debate between anthropology and psychology on the subject of emotions, which formed the theoretical and methodological background for emotional prosody research.
© Springer International Publishing Switzerland 2016 H. Bąk, Emotional Prosody Processing for Non-Native English Speakers, The Bilingual Mind and Brain Book Series 3, DOI 10.1007/978-3-319-44042-2_1
1.2 The Relativity of Emotions in Anthropology
Anthropological research on emotions has by and large been carried out within the framework of linguistic relativity and through the methods of anthropological linguistics. The notion of linguistic relativity has become arguably the most divisive idea in the history of emotion research. In its original form, developed in the early twentieth century by Franz Boas and two of his students, Edward Sapir and Benjamin Lee Whorf, the principle of linguistic relativity still guides anthropological research on the entanglement of emotion, cognition, and language. Outside anthropology, the notion of linguistic relativity has undergone what Joseph (2002) calls “vulgarization,” that is, misinterpretation through oversimplification and decontextualization. In the second half of the twentieth century, this vulgarized understanding of linguistic relativity, known across multiple disciplines as the Sapir–Whorf Hypothesis, would have an enormous impact on emotion research in psychology (Pavlenko 2005). Ideas about the nature of emotion in psychology would form in sometimes explicit, sometimes implicit, but always strong opposition to the vulgarized understanding of relativity (Duranti 2001). The rift between the two disciplines would continue to deepen throughout the late twentieth century until the emergence of revisionist and integrative approaches to emotion research in psychology. In the meantime, anthropologists versed in the subtleties of the principle of linguistic relativity in its original Boasian understanding continued to collect a substantial body of evidence in its support. In the following sections, I will first trace the development of the principle of linguistic relativity from Boas’ (1911) original postulate to Brown’s (1958) vulgarization and discuss the mechanisms which led to the popularization of the vulgarized interpretation of the principle.
Following this historical overview, I will present an illustrative selection of evidence supporting linguistic relativity collected in anthropological linguistics.
1.2.1 The Dawn of Relativity—Franz Boas and Salvage Anthropology
Throughout the late twentieth century, psychologists investigating emotions would take a highly critical and occasionally disparaging stance towards anthropology due to its adherence to the principle of linguistic relativity. It is therefore one of history’s little ironies that the principle was first formulated by a man who was a psychologist by training and an anthropologist merely by passion: Franz Boas. His is one of the most important and most often overlooked stories in the development of linguistic relativity. Franz Boas’ influence on American anthropology paralleled, to an extent, that of Wilhelm Wundt on German psychology. He trained some of the greatest anthropologists of the early twentieth century, such as Edward Sapir, Benjamin Lee Whorf, and the widely popular and later controversial Margaret Mead.1 His remarkably progressive thinking was influenced by a number of scientific breakthroughs of his time, from Albert Einstein’s relativity to William James’ psychological subjectivity, and it directed his students’ thinking towards the formulation of the linguistic relativity principle. Boas took a position towards the preliterate indigenous cultures he investigated that was remarkably enlightened for his time. He was alert to, and opposed, the ideas of racial or cultural superiority evident in the patronizing tone of Western scientific discourse (Pavlenko 2005). He understood very early on that illiteracy did not equate with cultural, moral, or intellectual paucity, and he instilled in his followers a professional and respectful attitude toward the cultures and peoples they investigated (Kay and Kempton 1984). This attitude, supported by Boas’ empathy towards the rapidly vanishing cultural heritage of multiple tribes in the Americas, would be the foundation of his school, often referred to as salvage anthropology.
The adherents of this school, led by Boas himself, endeavored to document indigenous cultures on the brink of disappearing without a decipherable trace as a result of societal, civilizational, and environmental changes. The documentation covered, often in exquisite detail, the four major objects of anthropological investigation according to Boas: language, culture (rituals, traditions, legends, religion, etc.), material artifacts, and human remains (Duranti 2003). Franz Boas is regarded not only as the father of American anthropology but also as the father of linguistic anthropology, and for good reason. As early as 1910 he postulated that language, with a special emphasis on linguistic variation, should become the central area of empirical interest for anthropologists (Boas 1910). Boas understood “language” to be the totality of means, verbal and nonverbal, used by humans to communicate their mental and emotional states to others (Boas 1938). Combined with his egalitarian outlook on cultures, this view of language and linguistic variation formed Boas’ understanding of the relationship between language, cognition, and emotions. Boas postulated that every human culture and every human language can express any concept produced by any human mind, however idiosyncratic the form of that expression (Boas 1911). In other words, human minds are equal in their capacity for understanding and expression, though the forms of expression for equivalent concepts may differ. Interestingly, he also observed that two cultures formally using the same language but subject to different historical and social circumstances will have distinct sets of concepts and distinct ways of thinking and feeling (Boas 1938). According to Boas, it is not language alone that is the bearer of mentality and culture, but language within the context of culture, with all the history and associations built up over the lifetime of that culture. On an individual level, Boas considered both emotion and cognition to be thoroughly subjective phenomena, each expressed in a specific way in language, though bound by cultural and situational contexts. Thought was expressed mainly through the verbal channel, while emotions were most naturally conveyed through the nonverbal channel, particularly through vocalizations and body language. However, Boas also acknowledged the ubiquity of emotions in the verbal channel. He observed that an emotional load will often be attached to words of particular cultural significance for a given population, and that there always exists a collection of vocabulary specialized for emotional expression (Boas 1938). In other words, Boas implied that every culture has its own culturally defined and linguistically idiosyncratic manner of expressing emotions. Boas’ main focus was on the primary mission of salvage anthropology: the documentation and preservation of disappearing cultures for future generations.
1 Margaret Mead’s Coming of Age in Samoa (1928) was widely read and appreciated in both popular and academic circles. The book made her name but over time garnered controversy, both for its candid descriptions of sexual practices in the Samoan community and for the lack of professional distance and impartiality in her descriptions of Samoan cultural practices.
Therefore, his ideas about the nature of language, cognition, emotions, and reality never crystallized into a coherent theory of relativity. They were more of a by-product of his thinking, albeit a significant one. The significance of Boas’ ideas on these subjects would come to fruition most fully in the writings of his student Edward Sapir. Boas’ central focus on language and linguistic variability, the implied idea of an inherent connection between thought, language, and cultural context, and his broad definition of language all played a role. Though still somewhat mired in Cartesian dualism, Boasian thought was exceptionally forward-looking and subtly complex. But it would be up to his students, Edward Sapir and Benjamin Lee Whorf, to develop the nascent idea of relativity present in Boas’ work into a formal principle of linguistic relativity.
1.2.2 The Principle of Linguistic Relativity and the Dual System of Language—Edward Sapir
As with his mentor, anthropology was not Edward Sapir’s first calling. He was a linguist first, and his linguistic training would color much of his anthropological work. Sapir’s linguistic background and anthropological training under Franz Boas made him the ideal person to flesh out the basics of the linguistic relativity principle. He focused mainly on the relationship between language and thought but acknowledged the importance of emotions for human communication and anthropological studies alike. His works elucidating the linguistic relativity principle reflect a mixture of anthropological and linguistic influences. On the one hand, the linguistic relativity principle was developed from the fundamentals proposed by Boas for anthropological research. On the other hand, the principles of Saussurean2 structuralism, one of the leading theories of language at the time, shaped Sapir’s systematic analysis of language itself. Sapir’s work focused on the structure of language within its cultural context and on how that relationship was reflected in thought processes. Sapir considered language in terms of two fundamental aspects, structure and function, distantly echoing Saussure’s division of linguistic competence into langue and parole. Structurally, Sapir saw language as a dual system of formal code and its mental representation. The code was a conventionalized system of primarily verbal signs used voluntarily to convey thoughts, emotions, and desires. The mental representations encompassed the concepts, meanings, and the grammar governing language on both levels (Sapir 1921). Sapir postulated that our processing of objective reality is filtered through the dual system of language. We subjectively perceive the world and consciously process reality in a logical fashion guided by our language (Sapir 1924). Critics working from the vulgarized, deterministic interpretation of linguistic relativity tend to focus on what seems to be a certain finality implied by this arrangement. For Sapir, however, the dual nature of the language system and the communicative purposes it is put to made for a very dynamic system. Language on the level of mental representations can develop new concepts and meanings and create new words, which can become widely adopted on the level of the conventional code of language.
On the level of the conventional code, we can encounter new words denoting novel concepts; these can be understood and adopted, enriching language on the level of mental representations. As Sapir succinctly put it: “the word, as we know, is not only a key; it may also be a fetter” (Sapir 1921). In other words, language limits our thinking, but as language can endlessly change and grow, there is no real limit on our capacity for perceiving reality in new ways. Much like Boas, Sapir considered emotions to be expressive phenomena belonging mainly to the nonverbal channel of communication. Functionally, he saw language in a way very similar to his mentor’s, as the totality of means used for interpersonal communication, and he distinguished between verbal and nonverbal channels of communication (Sapir 1921). Any physiologically defined means of communication, such as body language, facial expressions, or tone of voice, he categorized as nonverbal (Sapir 1927a). According to Sapir, however, the verbal channel, meaning actual speech with its propositional content, was not immune to the influence of emotions. He saw emotions as pervasive and highly significant for everyday social life and consequently for culture. In his anthropological and linguistic research he observed that there are two fundamental types of verbal expressions of emotion. On the one hand, there is always a trove of emotionally loaded words and words denoting specific emotions (Sapir 1927–32). These words are usually considered important, even constitutive of a culture and the basis of national identity. On the other hand, there are propositionally neutral, nonemotive verbal expressions which can be imbued with emotional meaning given the appropriate interpersonal or situational context (Sapir 1927b). To put it another way, emotions are ubiquitous, potentially omnipresent in everyday interpersonal communication, and the ways in which they are expressed in language are thus highly variable. However, Sapir also observed that things get even more complicated when cultural contexts are factored in. Display rules, that is, what a given community considers acceptable or unacceptable to express explicitly, manifested through taboos and rituals, further increase the variability of emotional expression between languages and cultures (Sapir 1930). Sapir believed that emotions, much like any aspect of mentality governed by language under his definition of language, are subject to relativistic effects. Linguistic relativity, according to Sapir, is a principle which explains the mechanism of variability across languages and cultures. Language, on the level of both codified convention and flexible mental representation, constitutes a perceptual filter between our conscious thought and objective reality. Language guides the way we think, subconsciously and pervasively.
2 Ferdinand de Saussure was a Swiss scholar best known for his major contributions to linguistics and semiotics. The basic tenet of the Saussurean theory of language was that language has two principal forms. The first was la langue, that is, the abstract, inherent, and perfect system of meanings and grammar retained in the mind. The second was la parole, that is, the actual manifestation of la langue in speech and interpersonal communication, which was imperfect but the only means of accessing and analyzing la langue (Saussure 1959).
Language limits the number of ways our cognition can spontaneously develop but does not limit our cognitive capacity. The way we express ourselves and process the expressions of others is highly variable and subject to contextual and conventional influences, from situational factors to display rules. Sapir emphasized that, given this model of language and human communication, the only way a culturally specific or language-specific concept can be understood is within the context of that culture and that language (Sapir 1926). In this respect, his was a thoroughly anthropological view of linguistic relativity, in that he always considered linguistic artifacts in their native context. Sapir’s version of linguistic relativity was thus already a coherent proposition. Far from a deterministic law, it was a principle explaining the general mechanics of linguistic variability in cultural contexts. The details of these mechanics would be worked out by Sapir’s student, Benjamin Lee Whorf.
1.2.3 Relativity Through Habituation and the Seeds of Confusion—Benjamin Lee Whorf
Much like his teachers, Boas and Sapir, Benjamin Lee Whorf did not start his professional career as an anthropologist. His first job was as a chemical and fire prevention engineer. However, he was probably among the first to be formally trained as an anthropological linguist. The majority of his fieldwork in anthropology was directed, likely by Franz Boas, to preliterate Mesoamerican cultures, and all of his work leaned heavily on linguistic analyses, following the teachings of Edward Sapir. Although Sapir and Whorf are usually both credited with the development of linguistic relativity, Whorf’s name would bear the brunt of the criticism. Some critical works would refer to the principle of linguistic relativity disparagingly as “Whorfianism.” However, even though his most often cited formulation of the principle sounds very strong when considered out of context, Whorf’s position was much closer to Sapir’s moderation than to the Nietzschean “prison-house of language.”3 Whorf’s original contribution to the development of the linguistic relativity principle was working out the mechanism driving the phenomenon. Whorf adopted Sapir’s definition of language as a dual system of code and representation but investigated it as a Boasian anthropological object. Where Sapir dwelled on the nature of the language system, Whorf focused on how the nature of the system would influence the anthropological investigation of languages. As an anthropologist, Whorf considered language a collective phenomenon, which could not be investigated at the level of the individual but only across a cross-section of individuals fulfilling all kinds of roles in a given community. In this respect, Whorf’s writings show the influence of the fundamental principles of immersion and participant observation postulated by the father of social anthropology, Bronislaw Malinowski.4 Thus, language on the level of codified convention was an outward manifestation of culture, which in turn was the collective mental representation of the world. In other words, for Whorf language was a “mass mind” expressing itself (Whorf 1939a).
The “mass mind,” the collective representations common and specific to any linguistic community, was a dynamic system where new representations, concepts, and elements of code could appear and be conventionalized. In this respect, Whorf’s understanding of the dual system of language remained the same as Sapir’s. Whorf believed habit to be the key mechanism driving linguistic relativity. He famously illustrated how language triggered certain habituated behavior with an example from his fire prevention practice (Whorf 1939a). In this example, employees at a gasoline drum storage facility take a smoking break next to a store of drums labeled “empty.” The label “empty” makes them believe that an open flame near those particular drums is harmless, since only “full” drums contain gasoline, which burns and explodes on contact with open flame. In fact, explains Whorf, the gasoline vapor rising from emptied drums is far more explosive than liquid gasoline, which makes the employees’ behavior fatally reckless. For the employees, the labels “empty” and “full” were elements of a conventionalized code that, in their habituated mental representation of this situational context, denoted “safe” and “dangerous,” respectively. For Whorf, as a fire prevention engineer, both “empty” and “full” denoted “dangerous” in the very same situation. The objective situation, wrote Whorf, is the same, but the habit of applying one denotation of the code rather than another causes a fundamental difference in the subjective perception of the world (Whorf 1939b). In his view, the dual system of language is acquired as a means of understanding and communication; hence, whatever limits the system might impose on perception are not consciously felt. In Whorf’s understanding, the language convention, that is, the historically motivated consensus on what mental representation each element of the code of language denotes, is a type of habit. Language forms patterns expressed not only in its semantics but also in its syntax, morphology, phonology, and pragmatics. Whorf predicted that, given a strict and systematic approach, relativistic effects would be found on every level of language complexity, from phonemes to sentence structure, and he even proposed a plan for such a systematic description (Whorf 1938). He predicted that those patterns would be found both in the code and in mental representations. A novel representation can give birth to a new code element, and a new code element can help develop a novel representation, but convention, patterns, and habit guide the dual system of language as a whole.
3 Even Nietzsche was not quite so radical. David Lovekin traced the origins of the “prison-house” metaphor to a poetic mistranslation of Nietzsche’s words by Erich Heller. Nietzsche’s original wording implied that logical thought is “constrained” rather than “imprisoned” by language (Lovekin 1991).
4 Bronislaw Malinowski was a Polish anthropologist and ethnographer most famous for his works on the Trobriand Island cultures, described largely in his seminal works Argonauts of the Western Pacific: An account of native enterprise and adventure in the Archipelagoes of Melanesian New Guinea (1922) and The Sexual Life of Savages in North-Western Melanesia (1929). His greatest methodological contribution to anthropology was the participant observation method.
Each culture forms its own habits of language use and its own mappings of meaning between code and representation (Whorf 1939a). However, like Sapir, Whorf did not see this as a process with a set end point. Rather, he believed that through interaction with objective reality and with other cultures and languages, that is, with other subjective visions of the world, our own subjective views of reality can undergo an “intellectual adjustment” (Whorf 1941a). Habits, after all, can be broken. The locus classicus of what is usually understood as the Whorfian definition of linguistic relativity goes as follows: “users of markedly different grammars are pointed by their grammars towards different types of observations and different evaluations of externally similar acts of observation, and hence are not equivalent as observers but must arrive at somewhat different views of the world” (Whorf 1940b).
While the passage itself has been quoted endlessly in the critical literature, its context is rarely, if ever, mentioned. The passage comes from a paper entitled “Linguistics as an exact science,” one of three papers Whorf published in the MIT Technology Review, a journal with a dedicated audience of hard scientists of varied specializations, from applied mathematics to engineering. The other two papers in this series were “Science and linguistics” (Whorf 1940a) and “Languages and logic” (Whorf 1941a). In all three, Whorf addressed his intended audience of hard scientists, sensitizing them to the possibility that linguistic relativity could compromise their scientific objectivity. At that time, English was already rising to the status of the lingua franca of science. Whorf warned his audience that the habituated thinking cultivated in a monolingual English environment could turn their coveted scientific objectivity into an English-centric subjectivity (Whorf 1941a). To prevent the ossification of scientific thinking within the framework of a single language, Whorf proposed including multilingual linguists to provide “corrective” perspectives from other languages on various problems. In other words, Whorf’s strongest formulation of linguistic relativity was made not about language in general, but about language as a medium of scientific communication. Like his predecessors, Whorf was much less radical in his ideas about linguistic relativity than later critics would make him out to be. While he did express concern over the potential of relativistic effects to arrest the development of science, he followed Boas and Sapir in considering linguistic relativity primarily as a principle applicable in anthropological research. He followed Sapir’s nuanced definition of language and the Boasian framework in which language is but one of four anthropological objects of research interest. Whorf explained the mechanism of habit by which the linguistic relativity principle operates in everyday life and how those habits do not constitute an insurmountable barrier for thought. Staying true to the teachings of both Boas and Sapir, he believed that the mental capacity of human beings exceeds any expressive limits imposed by the conventional codes we call human languages. Neither Sapir nor Whorf was the determinist portrayed in the later critical literature on the “Sapir–Whorf Hypothesis.”
The deterministic interpretation of their works on linguistic relativity was the product of an inevitable process of vulgarization that began soon after their deaths.
1.2.4 From Linguistic Relativity Principle to the Sapir–Whorf Hypothesis
For Boas, Sapir, and Whorf, linguistic relativity never became what psychologists would call a testable hypothesis. For them it was rather a principle observable in their subjects and applicable to anthropological research, which encompassed more than just language, however language might be defined. Sapir and Whorf developed the principle of linguistic relativity within the context of anthropological research on language as one of the four Boasian anthropological objects. The dual system of language, in their understanding, was always motivated by its social, cultural, and historical contexts and partly reflected in material artifacts. Language was to be interpreted within its broad anthropological context, not as a simple record of words and grammar. The relativity of languages did not operate between language and thought; it operated with respect to the cultural, social, and historical contexts of linguistic development. The greater the differences in context, the greater the differences between languages. Linguistic relativity was an anthropological research framework, and, applied with all the sensitivity and conditions stressed by Boas, Sapir, and Whorf, it was a valid and consistent one, over time yielding an impressive body of evidence. Given all of the above, it is natural to wonder who, if not Sapir and Whorf, postulated the “Sapir–Whorf Hypothesis.” Edward Sapir died in 1939 and Benjamin Lee Whorf in 1941. Franz Boas outlived both of his students and died in 1942. They could neither develop nor defend their principle as originally conceived against the vulgarization processes started by their successors. Those processes began almost immediately after Boas’ death in 1942 and were facilitated by the global social and political events of the time (Joseph 2002). The United States had just entered another world war, one driven even more than the previous one by unprecedented technological and scientific progress. Both the front and the home front needed more efficient ways to kill, supply, medicate, transport, and feed the global population engaged in total war. This was a time when Freud’s works on the subconscious entered public consciousness and the word “propaganda” began acquiring the devious connotations it has today. It was a time when antisemitic Nazi propaganda compared unwanted individuals and ethnic groups to insects and rats, and when the propaganda-driven Soviet legal system handed down death sentences based on ill-willed interpretations of the most innocent remarks regarding the enemy, the motherland, or the party line (Solzhenitsyn 1973). The world was in the midst of the greatest cultural, political, and ideological turmoil in its history, and any idea suggesting that certain words could subconsciously determine what and how we think, feel, or act would at once fall on willing ears and come under heavy criticism. One of the books that captured the public mood of this period was Samuel Ichiye Hayakawa’s Language in action.
Published for the first time in 1941, it included in its appendix one of Whorf’s papers from MIT’s Technology Review, “Science and Linguistics” (Whorf 1940a). Hayakawa was a linguist, specifically a semanticist, and he appeared to have arrived at conclusions regarding language and thought somewhat akin to those of Whorf. His avowed aim in Language in action, however, was far more immediately practical. The book is part textbook on semantics, part analysis of the interdependence of language and thought, and part guidebook to breaking out of linguistically motivated perceptual habits (Hayakawa 1947). The book played a significant role in popularizing the principle of linguistic relativity: it was among the 1941 selections of the Book of the Month Club, which brought sales in the thousands (Joseph 2002). This was the first time Whorf’s writings became available to a large and widely varied audience. Whorf was dead, and the book his paper was appended to appeared to lean toward a deterministic view of the influence of language on thought. The paper was given no commentary and no interpretation. In other words, upon being presented to the wider world, the linguistic relativity principle was taken out of its intended scientific context and inserted into a textual context of casual determinism. The stage was set for the vulgarization process and for the transformation of an idea from an anthropological principle of linguistic relativity into the “Sapir–Whorf Hypothesis.” What Edward Sapir and Benjamin Lee Whorf devoted their careers to was the anthropological principle of linguistic relativity.
1.2 The Relativity of Emotions in Anthropology
The “Sapir–Whorf Hypothesis” was not developed or even named by either of them. Should we name the Hypothesis after those who actually operationalized the principle into a testable hypothesis, it would have to be known as the “Brown–Lenneberg Hypothesis” (Pavlenko 2013). Roger W. Brown was a psychologist with wide-ranging interests and a highly influential publication record. Eric Lenneberg was a scientist of varied interests, trained in linguistics, neurology, and psychology, and a pioneer of the idea that language is innate. Working together, they conducted one of the first empirical trials of what they themselves understood as the “Whorf hypothesis,” using a simple color-naming task (Brown and Lenneberg 1954). Brown’s later works (Brown 1957, 1958) also presented linguistic relativity as an idea proposed by Whorf alone, without reference to his teachers and predecessors, Sapir and Boas. His and Lenneberg’s was the first recorded attempt to adapt the linguistic relativity principle for experimental research. Brown also reformulated “The Whorf Thesis” (Brown and Lenneberg 1954) into weak and strong versions. In his 1958 book Words and things, he defined the weak version along the moderate lines originally taken by Sapir, and the strong version along highly restrictive deterministic lines (Brown 1958, as cited in Pavlenko 2013). While Brown and Lenneberg operationalized linguistic relativity and attempted to investigate it experimentally, they still believed it to be Whorf’s idea alone. The idea behind the “Sapir–Whorf Hypothesis” may have been popularized on the back of Hayakawa’s publishing success, but in true relativistic fashion the idea needed to be named in order to become fixed in the academic mass mind. The term “Sapir–Whorf Hypothesis” was likely first used at a 1954 conference commemorating the lives and works of the two scholars, in a paper read by the linguistic anthropologist Harry Hoijer (Koerner 2003).
However, what would truly cement the term, and much of the deterministic bent in its interpretation, was the 1956 publication of Benjamin Lee Whorf’s selected works under the title Language, Thought, and Reality, edited by the psychologist John B. Carroll. Carroll’s Introduction to the collection framed the interpretation of the whole along the lines of the “Sapir–Whorf Hypothesis” rather than the linguistic relativity principle (Carroll 1956). He traced the roots of Whorf’s thinking on the subject back to Sapir, credited both the teacher and the student with the development of the principle, called it a hypothesis, and implied a deterministic bent in his interpretation of Whorf’s works. In this respect his argumentation was much akin to Hayakawa’s rhetoric, though here it served as an introductory comment to a collection of Whorf’s work, framing the readers’ perception of the whole. This is all the more significant because the great majority of later publications would use Carroll’s (1956) selection as the go-to source for Whorf’s works and as the major literature source for the “Sapir–Whorf Hypothesis.” To summarize, neither Sapir nor Whorf developed the hypothesis that bears their name. Their domain was anthropology, and they observed that, given particular contextual (social, cultural, historical, environmental) configurations, different cultures will develop different languages (under their definition of “language”). Linguistic relativity for them was an anthropological principle deduced from observation and material evidence.
1 Emotional Relativity—Argument from Nurture
The hypothesis in the sense used in experimental psychology, a radical logical statement which is testable and falsifiable, was a later creation. It was developed, named, and even tested to an extent by specialists from other fields, mainly linguists and psychologists, and it became something entirely new: a vulgarized version of the original principle. Simplified to fit the logical rigors of experimental methods and torn out of its disciplinary context, the linguistic relativity principle was replaced in the public and academic imagination by the “Sapir–Whorf Hypothesis.” Plentiful criticism rained down upon the “Sapir–Whorf Hypothesis” in its vulgarized form. Even so, the linguistic relativity principle, with all its subtleties, continued to be a useful research framing device in anthropology. An impressive body of evidence was collected to illustrate the linguistic relativity principle in many areas of life for a great variety of cultures. A substantial portion of the evidence covers linguistic relativity phenomena in the expression of emotions. As emotions are the main focus of this book, I will go over some of the more interesting findings which reveal the great variability in the linguistic and cultural construction of emotions documented by anthropology in recent decades.
1.2.5 Relativity of Emotions in Syntactic Structures
Boas and Sapir both believed emotions to be pervasive and important anthropological phenomena. Whorf largely omitted the subject of emotions in his works, but he believed relativistic effects could and would be found on every level of linguistic complexity. Sapir shared Boas’ belief that the natural channel for emotion expression was nonverbal. He also thought, however, that emotions could manifest themselves in the verbal channel and have a multitude of mental representations. Thus, in accordance with the original linguistic relativity principle, anthropologists could reasonably expect to find relativistic effects across all functional levels of language according to Morris’ trichotomy (Nöth 1995): syntax, semantics, and pragmatics. Of the three, syntax has been the most elusive vehicle of emotional expression to investigate within the framework of linguistic relativity. Since the late 1950s, syntax has been almost exclusively the domain of Chomskyan linguistics, which did not encourage empirical field research into the relationship between emotions and Universal Grammar (see Chap. 3 for further details). Anthropologists, due to the nature of the conditions and situations they would normally work with, had more interest in semantics and pragmatics. Therefore, what evidence there is for relativistic effects on emotional expression in syntactic structures is rather incidental and fragmentary. Tuvaluan is a language classified by UNESCO as definitely endangered (11,000 speakers in the year 2000). It is a Polynesian language spoken in Tuvalu and in a few locations in New Zealand. Besnier (1986) found that in this language, marking nouns ergatively or pseudo-ergatively expresses an extremely negative emotional attitude toward the object or person spoken of. Ergative marking is an increasingly rare syntactic structure whereby the direct object and the subject of an intransitive verb share the same morphological marker, which differs from that of the subject of a transitive verb (Comrie 1976). Unfortunately, Besnier gives no examples of affectively negative ergativity, as the emotional functions of the construction were not his primary focus. In Japanese, on the other hand, Niyekawa-Howard (1968) described adversative passive constructions, which help convey a specific, negatively charged emotional meaning. These passives are used to describe a situation in which an individual is subjected to an unpleasant event against his or her will. The author gives the following example (in romaji transliteration5). A man who was visited in hospital by a welcome guest reports to his wife Kyoo [guest] ga mimai ni kita. Should the guest be an unwelcome one, the man would instead report Kyoo [guest] ni mimai ni korareta. The latter is a grammatical construction that overtly conveys the unpleasant implications of having had an unwelcome guest. Finally, at least one source notes that in English, emotionally loaded phrases appear to disrupt theta-role assignment, whereby apparently no simple predicates exist that could express both the Subject/Target and the Causer roles in an emotionally loaded phrase (Pesetsky 1995). Pesetsky describes this matter in some technical detail as the “Target-Subject matter restriction” problem. The evidence for relativistic effects from studies of syntax is admittedly scarce, though intriguing. Syntax governs the logical structure of sentences, including meaning relationships such as causation and agency. There is therefore considerable potential for expressing complex emotions through specific syntactic structures. Sadly, the specialists in syntactic structures are linguists of the Chomskyan school, whose interests lie elsewhere, and this direction remains largely unexplored.
Still, with the rise and development of integrative paradigms in emotion research and the widespread movement toward interdisciplinarity in linguistics, there are reasonable chances of gaining more evidence and knowledge on this subject. For what it is worth, we have at least an evidence-based indication that syntactic structures too can convey emotional meaning. We also have an indication that the way this meaning is syntactically expressed is subject to relativistic effects.
1.2.6 Emotional Relativity in Semantics
From an anthropolinguistic perspective, our subjectively constructed views of the objective world vary according to relativistic effects, and this variability is coded and reflected in language. It would therefore be natural to expect differences in how emotions are represented and coded in various languages and cultures. Such differences have indeed been documented across multiple studies falling into two broad categories: emotion typologies and case studies of single terms of particular cultural significance.
5Romaji: a style of notation for the Japanese language using the Latin alphabet.
The evidence in this area usually comes from long-term immersive studies of single cultures and includes extensive details about the social, situational, and cultural contexts in which the investigated emotions occur. On the evidence reported, the ways in which members of different cultures experience emotions are often unique to the point of defying any systematic comparison across cultures, except in the broadest of terms. Anthropologists apparently often struggled with the Whorfian trap of English-centric subjectivity in their observations when faced with the rich variety of emotional expressions they encountered. They would routinely find emotion terms in one language without sensible equivalents in any other, and emotion typologies vastly different from the broadly accepted Western typology (cf. Ekman 1992). Occasionally they would even find that “emotions” did not constitute a distinct phenomenological or semantic category in some languages.6 The Ifaluk people of the Ifalik atoll in Micronesia have a number of terms denoting psychosomatic experiences triggered by highly specific interpersonal contexts. Their emotion typology was described in some detail by Lutz (1982), who noted the high degree of hierarchical social entanglement in all the analyzed Ifaluk emotion terms. Lutz observed that Ifaluk internal states cluster into meaningful categories which appear to obey the basic rules of prototypical organization of concepts (Rosch 1978). These clusters, however, do not appear equivalent to any of the basic emotions accepted in the West. The emotion terms themselves carry the basic meanings of visceral sensations bounded by the social context which evokes the emotions (Lutz 1982). For example, one cluster, which Lutz designates “emotions of connectedness and loss,” includes among others the terms fago and laloileng. Depending on the context of use, fago can denote love, compassion, or sadness (e.g., at the prospect of losing a loved one).
Laloileng denotes a mixture of loneliness, abandonment, and insecurity caused by being separated from loved ones. Another cluster, designated for social fears, includes terms like metagu (a nervousness accompanying situations where ritualized deference is expected), ma (an embarrassment caused by failing to perform in such a situation), and bobo (a disappointment felt at failing to perform appropriately). As is typical of closely knit island communities, hierarchical social deference plays an important role in Ifaluk culture, and the emotion terms the Ifaluk have developed reflect and help uphold their social structure. For the Ifaluk, the experience of emotions appears to be founded not on an independent self but on an interpersonal self. The very definitions of emotions are critically dependent on the social situations that trigger them. There are, however, other South Pacific communities for which the basic idea of emotions being an individual experience of an independent self is the norm. Fajans (1983) described three “sentiments” which collectively constitute the self-perceived national character of the Baining speakers of New Britain Island (Papua New Guinea): shame (angirrup, akalup), hunger (anaingi, airiski), and awumbuk.
6Coining the term “emotion” to denote a coherent category of psychosomatic experiences is usually attributed to Descartes, who first described it in his 1649 essay Les Passions de l’âme (Frijda 2008).
For the Baining, all three are inherently emotional in subjective experience. The case of the Baining language is particularly interesting, as there appears to be a certain gradation among these “sentiments” with respect to how well they translate into other languages. Shame is one of the most powerful emotions, regulating social behavior and defining transgressions of the social order. According to Fajans, the subjective experience of shame in Baining, as well as the situations that trigger it, is largely equivalent to shame in English. Hunger is different. The words denoting it, anaingi and airiski, refer to a conjoined visceral sensation of hunger and a psychosocial sense of isolation. This combined sensation accompanies long, lone journeys from one familial group to another, a frequent activity for the Baining. Finally, there is awumbuk, an emotion Fajans describes only half-jokingly as a “social hangover.” Awumbuk is a sense of diffuse sadness, loneliness, boredom, and lassitude which follows the departure of a loved one. The severity of awumbuk varies with the intensity and closeness of social interaction with the loved one and with the manner of departure (anything from the end of a visit to death). The themes of good social comportment, frequent migrations, and visits among familial groups run through the Baining cultural “sentiments” and frame their own unique emotional world. The idea of national character being expressed in a culture- and language-specific emotion term is not rare. One such case is the Japanese amae (甘え), described in commendable detail by Niiya et al. (2006). Japanese culture combines a deeply revered core of inherently Eastern traditions with ubiquitous, far-reaching Western influences, and it has long been a prized location for the study of linguistic relativity. Amae expresses one of the traditional aspects of the Japanese emotional world and echoes the fact that Japan belongs to the Eastern collectivist societies with an interdependent self (Markus and Kitayama 1991).
The emotion is usually defined as a type of affective dependency wherein one depends on another’s love, compassion, and indulgent care. It is usually evoked in a situation where one party asks a favor of another. Amae may carry a different subjective valence depending on how appropriate the favor is perceived to be and how it is received, which is quite an unusual property for an emotion (Niiya et al. 2006). The term itself is also used in various related ways. Amae can denote the subjective experience of the emotion, any situation which may trigger it, the types of behavior it may prompt, and even the fact that one subscribes to the belief that the emotion is an expression of Japanese culture. Japan is one of the high-context cultures (Hall 1976). In such cultures, interpersonal communication relies on minimal verbal content, compensated by a heavy dependency on situational context to disambiguate the nuances of what has been verbalized. Amae is an emotion with reciprocity mechanics built into its functional definition, a perfect expression of a collectivist high-context culture (Niiya et al. 2006). An entire culture of interpersonal relations echoed in a single heartfelt emotion. Examples of emotions considered constitutive of national character are not limited to the Pacific basin. Similar cases have been described in European languages. The Dutch have the term gezelligheid, usually defined as a type of low-key conviviality or sociability, with an accompanying lack of inhibitions or shame. Gezelligheid is subjectively felt to be an emotion. It is also a kind of consciously adopted affective disposition of openness and welcome extended toward others, in an effort to have them reciprocate gezelligheid (Lindemann 2009). The Danish have the term hygge, which also denotes a certain variety of sociability, specifically a state of emotional and psychological comfort outside social stratification. Hygge is also used, however, to denote an affective disposition whereby one actively opposes any manifestation of social or economic stratification among Danish compatriots: a feeling of togetherness, a sense of “being Danish together” (Linnett 2011). Polish has the term tęsknić, which can be translated by a variety of English words with the general meaning of “missing something/somebody.” None of these words expresses the exact type of “missing” denoted by the Polish term. Wierzbicka (1992) has aptly glossed tęsknić as “the pain of distance,” that is, the emotional suffering brought on by being helplessly and inevitably separated in space from the people or things one loves and cares for. The term has been of great cultural significance for Poles (Goddard and Wierzbicka 2008), arguably owing to the romantic tradition of the Great Emigration fixed in public consciousness and culture.7 Gezelligheid, hygge, and tęsknić are all single words which not only have profound meaning for the cultures they appear in but are also of great functional significance, framing and defining deeply rooted traits of national character. In broadly understood Western culture, one of the most ubiquitous and subjectively natural emotions is anger (Averill 1983). There are, however, cultures in which anger is an emotion repressed to the point of virtual nonexistence. Such is the case with the Utku, a nomadic Inuit tribe in northwestern Canada. The anthropologist Jean Briggs described the Utku emotion typology in her seminal work under the telling title Never in Anger (1970).
During her time as the adoptive daughter of an Utku family, Briggs observed that children of the tribe are allowed to express a range of emotions akin to the Western notion of anger, while for adults these emotions are effectively taboo. The divide between children and adults is defined by the concept of ihuma. The Utku understand ihuma as being in full emotional control and expressing the kind of cheerful, warm, and positive disposition which facilitates collaborative action. Subjectively felt, ihuma is thus a type of emotion, but also a moral disposition to suppress the expression of negative emotions, especially any form of emotive aggression. Children lacking ihuma often manifest ill-controlled outbursts of angry emotions such as huaq (verbal aggression), ningaq (physical aggression), qiquq (hostility), or urulu (annoyance). Utku adults possessed of ihuma are considered unable to feel any of these childhood varieties of anger. The revered social order is balanced precariously on this idea of responsible adults never expressing hostility, aggression, or negativity toward one another. Briggs gives an excruciating account of how she herself was ostracized for 6 months after breaking this taboo and openly displaying anger. Though a Westerner (a kapluna) herself, she was an adult and, by adoption, a member of the tribe. Therefore, her lack of ihuma constituted a breach of a fundamental emotional taboo, and she was severely punished for this transgression. If anger is repressed in some cultures for the overall benefit of the community, it is only natural to find cultures for which this emotion is central. For the Ilongot of the island of Luzon in the Philippines, the term liget serves to express the vital force of their culture and their people. In the West, anger is routinely filed under the label “negative.” This is true even of scholars who acknowledge that in evolutionary terms there is nothing inherently negative about the emotion and that its negative connotations are a later addition of a moralistic nature (Averill 1983). For the Ilongot, liget is similarly ambivalent, but on a higher level of complexity. The emotion is heavily dependent on situational context and, like the Japanese amae, can carry different valence depending on the situation. At its core, the fundamental meaning behind liget is energy or force, but it is most often glossed as “energy, anger, passion” (Rosaldo 1980), and indeed there appear to be three major threads of meaning behind the term. On the culturally functional level, liget is the affective driving force behind cathartic acts of ritual violence manifested through the traditional Luzon practice of headhunting. Outside of this ritual context, liget is also understood as the hedonistically positive sensation of strength and virility accompanying productive manual labor or sex, as well as satisfaction and nonphysical love. Finally, liget can be experienced negatively, as a force pent up and destructive from within, manifesting itself in ill health, social disruption, or withdrawal.
7The Great Emigration: a significant and heavily romanticized episode of Polish cultural history, typically dated to the decades following the failed November Uprising of 1830–1831, when the country remained partitioned between Austria, Prussia, and Russia. The Great Emigration was an exodus of the intellectual elite from Poland to countries sympathetic to the Polish cause, where they continued their political, scientific, and creative work. Frédéric Chopin was one of those Great Emigrants.
The Ilongot perceive liget viscerally, much like a raw somatic substance from which emotions are roughly shaped by the context in which they appear. From headhunting to lovemaking, liget is the axis of the Ilongot subjective emotional experience of the world. The study of the semantics of emotion terms provides some of the most intriguing and compelling evidence for the mechanics of the linguistic relativity principle, beyond the obvious litany of apparently culturally unique emotions. There was always some degree of separation between anthropologists’ own deeply subjective understanding of emotions and that of the cultures they investigated. Whatever that degree, they always managed to understand and convey what these unique emotions were. Thus, in testament to Boas’ and Sapir’s belief, they demonstrated time and again that our human capacity for mutual understanding is universal and far exceeds the variability of the conventions we use to code emotions. At the same time, they illustrated just how different human emotional perceptions can be. Across cultures and languages, emotions vary in their degree of abstraction and functional intensity. An emotion considered crucial in one culture may be deemed destructive and taboo in another. High-context cultures even have emotion terms which can switch valence depending on the situational context. Overall, the body of evidence for the workings of the linguistic relativity principle in semantics is one of the largest and most diverse in anthropology.
The cases given in this section are merely illustrative. The point to be taken from relativistic effects in semantics is that every community socially coherent enough to develop a distinct culture and language will naturally develop its own unique ways of expressing emotions linguistically. Within the framework of the anthropological principle of linguistic relativity, emotions are a type of social barometer policing the acceptability and appropriateness of behaviors, including emotional displays. Different cultures are based on different concepts of self (Markus and Kitayama 1991). Different languages also depend to different degrees on context to disambiguate the propositional content conveyed in the verbal channel (Hall 1976). Different cultures have different environmental and historical experiences which shape the way their community and family structures form, and these dictate the kinds of relationships that become established. The relationships in turn run on emotions, which will thus necessarily vary in expression from one culture to another, as the evidence shows. Certain emotions grow in social and cultural significance to such a degree that they are conceived of as inherent and indispensable aspects of the national character. This much was anticipated by Sapir. However, both Sapir and Boas believed verbal expressions of emotions (e.g., in syntactic and semantic structures) to be secondary at best. The most natural channel for the expression of emotions, they believed, was the nonverbal one, and this would prove one of the most difficult and elusive aspects to study.
1.2.7 Nonverbal and Pragmatic Emotional Relativity
The nonverbal channel of emotion communication is arguably one of the most notoriously difficult aspects of human communication to investigate. Language in general has crucial deictic properties which necessitate the inclusion of situational context in every study of nonverbal communication. But even disregarding context for the sake of argument, body language alone operates on so many variables that its potential relativistic variability may be effectively infinite. The human body operates in four-dimensional space and is subject to internal psychosomatic disruptions which may affect otherwise normal gestural expressions. Virtually every part of the body can be used expressively to convey emotions. Facial expressions, hand gestures, body postures, vocalizations, tone of voice, and on a certain level even body chemistry (bodily secretions) are all potential vectors of nonverbal emotion expression, whether conscious or not. The quality and quantity of certain kinds of nonverbal expression are subject to relativistic effects, as is the context in which these expressions occur. Context matters not only for its deictic relatedness to language but also for constructs such as personal space and display rules. Personal space can be actively managed and display rules obeyed or violated, and all of these are manifestations of the nonverbal communication of emotions. And while investigating nonverbal emotion communication remains a difficult task, some evidence for expressive linguistic relativity effects has been found.
All cultures and languages communicate naturally through body language. All have a range of gestures which are conventionalized much like language, each gesture carrying a highly specific meaning. All likewise have a range of personal and interpersonal space management habits which shape their communicative practices. In line with the linguistic relativity principle, researchers have found a number of differences both in iconic gestures and in the management of space. Labarre (1947) lists an impressive number of previously documented gestures, often taken for granted in the Western academic world, which can carry vastly different, even opposite, meanings across cultures. Nods and head shakes can signify agreement or disagreement, and either the right or the left hand can be preferred in welcoming gestures. Deference can be shown by standing up or by sitting down, and sticking out the tongue can mean playfulness or threat. According to Labarre, these differences run along a global line of demarcation roughly between the literate North-West and the preliterate South-East. It also appears that Hall’s (1976) division of languages into high- and low-context ones applies to nonverbal communication. Kleinsmith et al. (2006) found that speakers of high-context languages tend to rely more on gestures in overall communication than speakers of low-context languages, who rely more on plain propositional content. Yet even among cultures falling on the same side of the line of demarcation, there is still plenty of room for relativistic variation. Kirch (1979), for example, investigated differences in gestures and interpersonal space in communication between Americans and a number of European nations. Nonverbal communication, Kirch emphasizes, is most often unconscious and laced with emotional content. That content ranges from broad positivity when culture-specific rules are obeyed to anxiety when they are breached.
The broad East-West distinction in nonverbal emotion behavior has been investigated with some interest in business circles, with the aim of improving cross-cultural business negotiations. Much of the attention in this area of research has been given to the display of negative emotions and to issues of dominance, a concept strongly correlated with emotions (Bradley and Lang 1994). For example, Semnani-Azad and Adair (2011) found a difference in how Chinese (East) and Canadian (West) businessmen organize the physical space of their offices to signal dominance. Chinese businessmen will put a lot of effort into organizing their office spaces and into controlling every aspect of their body language in an effort to manifest control of the entire situation. Canadian businessmen, on the other hand, will tend to relax their body postures and create physical distance between themselves and their interlocutors to manifest their emotional ease with the situation. All the same, Westerners will tend to establish boundaries between professional and private settings, allowing themselves to be emotionally open in private but more restrained and closed-off at work (Moran et al. 2012). In other words, the professional and business sphere appears to operate on the kind of work ethic wherein open displays of emotion are considered more a hindrance than a help. There are certain conventions in each culture and society which dictate what emotions can be expressed, and to what extent, depending on the individuals present and their social status. These conventions are known as display rules. The Utku concept of ihuma could be said to be a display rule, and the acquisition of ihuma an emotive-behavioral rite of passage into adulthood. As it turns out, many kinds of display rules on nonverbal emotion expression are defined by the line dividing childhood and adulthood, and the majority of these rules pertain to negative emotions. For example, Novin et al. (2008) described marked differences in display rules between children raised in the Eastern-type Iranian culture and those raised in the Western-type Dutch culture. Iranians have an age-based social hierarchy. Children are taught from a very young age that displaying negative or extreme emotions in the presence of elders is deeply disrespectful. It is, however, considered perfectly normal to show such emotions to age peers. By contrast, Dutch children, raised in the spirit of individualism, follow the reverse pattern of display rules. They apply much more restrictive emotional display rules among peers so as not to seem vulnerable or weak. Within the nurturing and forgiving family context, losing such inhibitions is tolerated and even encouraged to build intimacy. Children usually acquire display rules from their next of kin. Significant qualitative and quantitative differences have been observed in nonverbal emotional exchanges between mothers and their infants depending on the mothers’ ethnicities (Tamis-LeMonda et al. 2012). Thus, starting from a preverbal age, infants become socialized within a certain limited range of nonverbal communicative acts. They are then exposed to various other members of their communities. Largely by trial and painful error, they expand their knowledge of what is and is not acceptable in terms of explicit emotional display. They learn their lessons on the acceptability of emotional displays by suffering negative interpersonal repercussions when transgressing a rule.
There is also some evidence that between the ages of 9 and 11, children in Western cultures learn that all kinds of emotions, including extreme ones, can be freely discharged in solitude (Zeman and Garber 1996). There also appear to be certain tendencies in negative emotion displays that once again set the East and the West apart. For adults in canonical Western societies such as the English-speaking United States or Canada, open and assertive displays of negative emotions in virtually every context are considered perfectly acceptable and are not usually socially sanctioned. For adult Japanese, any display of a potentially socially disruptive emotion such as anger, disgust, or contempt is unthinkable and usually carries severe social repercussions (Safdar et al. 2009). One aspect of nonverbal emotion expression which is of particular interest in this book is emotional prosody. Vocal expressions of emotion, as distinct from the propositional content of speech, are intriguing because they run concurrently with that content. While propositional content is under virtually complete conscious control, the vocalization itself is not. It has been found that such basic vocalizations as laughter and crying differ across languages in their vocal and causal patterns (Labarre 1947). Vocalizations signaling pain or distress also show distinct variability across languages and cultures (Gendron et al. 2014). And recent evidence shows, clearly echoing Sapir's prediction, that communities who
speak the same official language will not necessarily share the same culture, especially if the contexts in which these communities developed differ. Petri Laukka and his team investigated native English speakers of five different varieties of English: Kenyan, Australian, Indian, Singaporean, and American (Laukka et al. 2014). They found that although these five Englishes share the same fundamental grammar, the prosodic signatures of negative emotions such as anger, sadness, and fear are quite distinct across them. In line with the anthropological principle of linguistic relativity, the historical colonial contexts in which these five varieties of English developed molded a uniform code into unique manners of emotional expression. I will discuss the variability of emotional prosody within and across languages and cultures in more detail in Chap. 5. The nonverbal channel of communicating emotions, though postulated as the most natural for the expression of emotions, is rather underrepresented in the body of evidence in favor of linguistic relativity. This is, however, hardly surprising, all things considered. Language understood as "the code" is a well-structured and straightforward system, accessible to consciousness and conventionalized, all of which makes it a much easier point of access to the "mass mind" of a culture than body language. Body language, whether expressed in posture, facial expression, or gesture, is much more dependent on the barely controllable vagaries of human physiology swayed by emotions. Furthermore, because of its huge variability and the near impossibility (iconic gestures apart) of describing body language with any measure of systematicity, studies investigating the nonverbal aspects of emotion expression have naturally been slow in coming. Adequate tools for such a description are only now becoming available.
Nonetheless, even the small amount of evidence we have from the nonverbal channel tells us something about the relativity of emotion communication. Iconic gestures, which do not arise naturally from human physiology but must be acquired in the course of socialization, vary across cultures as much as verbal expressions do. Whether in global patterns of emotional display rules, single gestures, facial expressions, or tone of voice, relativistic effects are present in all languages.
1.3 Conclusions—Emotional Relativity
In sum, twentieth-century linguistic relativity scholarship was something of a Jekyll-and-Hyde case. On the one hand, there was the detailed and meticulous study of the rich diversity of human emotional and linguistic expression, grounded in the context of the investigated cultures. On the other, there was the vulgarized version of the linguistic relativity principle, operationalized as a research hypothesis, oversimplified and torn out of its proper anthropological context, and attracting academic scorn. Those investigating emotion expressions within the linguistic relativity framework faced quite a challenge. They were
trying to describe culturally unique emotion-specific syntax, emotion terms and typologies, display rules, gestures, and vocalizations in the lingua franca of modern science, English (Janney 1996). At the same time, they were doing their best not to fall into the Whorfian trap of having the objectivity of their observations compromised by an English-centric subjectivity (Leavitt 1996). Their works usually concerned small, remote indigenous groups, were highly detailed, and were carried out in the spirit of Boas' salvage anthropology and under the conditions of Malinowski's participant observation. They described their subject matter in impressive detail and analyzed it with due regard for the relevant cultural and social contexts, in accord with Sapir's recommendations. However, it is also true that these were by and large case studies lacking the kind of systematicity of description and analysis proposed by Whorf (1938), which precluded any meaningful comparison across cases. That kind of normativity and reliability of measurement is the domain of modern psychology, which would take an exceptionally keen interest in both emotion research and linguistic relativity. Much of the anthropological evidence for linguistic relativity, especially that on semantic variation, was and remains compelling. Throughout most of the second half of the twentieth century, this evidence would sit uncomfortably alongside the theoretical mainstream of psychology, with its universalist-reductionist approaches to the nature of emotions (see Chap. 2). The psychological mainstream in emotion theory would develop and mature in search of emotion universals and under contrarian rhetoric aimed at the vulgarized version of the linguistic relativity principle. And yet the early psychological theories of emotion were far more open and accommodating to anthropological nuance than is often appreciated. In Chap. 2 I will trace the development of the mainstream psychological theories of emotion from their postulated roots in the works of Darwin, Wundt, and James to the cutting edge of the holistic-integrative approaches of the twenty-first century.
References
Averill, J. R. (1983). Studies on anger and aggression: Implications for theories of emotion. American Psychologist, 38(11), 1145–1160.
Barrett, L. F. (2011). Constructing emotion. Psychological Topics, 20(3), 359–380.
Besnier, N. (1986). Word order in Tuvaluan. In P. Geraghty, L. Carrington, & S. A. Wurm (Eds.), FOCAL I: Papers from the fourth international conference on Austronesian linguistics (pp. 245–268). Canberra: Pacific Linguistics.
Boas, F. (1910). Psychological problems in anthropology. The American Journal of Psychology, 21(3), 371–384.
Boas, F. (1911). Introduction. In F. Boas (Ed.), Handbook of American Indian languages (pp. 5–83). Washington: G.P.O.
Boas, F. (1938). Language. In F. Boas (Ed.), General anthropology (pp. 124–145). Boston: D.C. Heath and Company.
Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49–59.
Briggs, J. L. (1970). Never in anger: Portrait of an Eskimo family. Boston: Aldine Publishing Company.
Brown, R. W. (1957). Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology, 55(1), 1–5.
Brown, R. W., & Lenneberg, E. H. (1954). A study in language and cognition. The Journal of Abnormal and Social Psychology, 49(3), 454–462.
Brown, R. (1958). Words and things. New York: Free Press.
Carroll, J. B. (1956). Introduction. In J. B. Carroll (Ed.), Language, thought, and reality. Selected writings of Benjamin Lee Whorf (pp. 1–34). Cambridge: MIT Press.
Comrie, B. (1976). Ergativity. In W. P. Lehmann (Ed.), Syntactic typology: Studies in the phenomenology of language (pp. 329–393). Austin: University of Texas Press.
Darwin, C. (1872). The expression of the emotions in man and animals. New York: D. Appleton and Company.
Duranti, A. (2001). Linguistic anthropology: History, ideas, and issues. In A. Duranti (Ed.), Linguistic anthropology: A reader (pp. 1–38). Oxford: Blackwell Publishers Ltd.
Duranti, A. (2003). Language as culture in U.S. anthropology. Current Anthropology, 44(3), 323–347.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.
Fajans, J. (1983). Shame, social action, and the person among the Baining. Ethos, 11(3), 166–180.
Frijda, N. (2008). The psychologist's point of view. In M. Lewis, J. M. Haviland-Jones, & L. F. Barrett (Eds.), Handbook of emotions (pp. 68–88). New York: The Guilford Press.
Gendron, M., Roberson, D., van der Vyver, J. M., & Barrett, L. F. (2014). Cultural relativity in perceiving emotion from vocalizations. Psychological Science, 1–10.
Goddard, C., & Wierzbicka, A. (2008). Universal human concepts as a basis for contrastive linguistic semantics. In M. A. G. Gonzales, J. L. Mackenzie, & E. M. Gonzales Alvarez (Eds.), Current trends in contrastive linguistics: Functional and cognitive perspectives (pp. 205–226). Amsterdam: John Benjamins Publishing Company.
Hall, E. T. (1976). Beyond culture. New York: Anchor Books Doubleday.
Hayakawa, S. I. (1947). Language in action. A guide to accurate thinking, reading, and writing. New York: Harcourt, Brace and Company.
Janney, R. W. (1996). Speech and affect: Emotive uses of English. Unpublished manuscript. Munich.
Joseph, J. (2002). From Whitney to Chomsky. Essays in the history of American linguistics. Amsterdam: John Benjamins Publishing.
Kay, P., & Kempton, W. (1984). What is the Sapir-Whorf hypothesis? American Anthropologist, 86(1), 65–79.
Kirch, M. S. (1979). Non-verbal communication across cultures. The Modern Language Journal, 63(8), 416–423.
Kleinsmith, A., De Silva, P. R., & Bianchi-Berthouze, N. (2006). Cross-cultural differences in recognizing affect from body posture. Interacting with Computers, 18, 1371–1389.
Koerner, E. F. K. (2003). Toward a history of American linguistics. London: Routledge.
Labarre, W. (1947). The cultural basis of emotions and gestures. Journal of Personality, 16(1), 49–68.
Laukka, P., Neiberg, D., & Elfenbein, H. A. (2014). Evidence for cultural dialects in vocal emotion expression: Acoustic classification within and across five nations. Emotion, 14(3), 445–449.
Leavitt, J. (1996). Meaning and feeling in the anthropology of emotions. American Ethnologist, 23(3), 514–539.
Lindemann, H. (2009). Autonomy, beneficence, and gezelligheid: Lessons in moral theory from the Dutch. Hastings Center Report, 39(5), 39–45.
Linnett, J. T. (2011). Money can't buy me hygge. Danish middle-class consumption, egalitarianism, and the sanctity of inner space. Social Analysis, 55(2), 21–44.
Lovekin, D. (1991). Technique, discourse, and consciousness: An introduction to the philosophy of Jacques Ellul. London and Toronto: Associated University Press.
Lutz, C. (1982). The domain of emotion words on Ifaluk. American Ethnologist, 9(1), 113–128.
Lutz, C., & White, G. M. (1986). Anthropology of emotions. Annual Review of Anthropology, 15, 405–436.
Malinowski, B. (1922). Argonauts of the Western Pacific: An account of native enterprise and adventure in the Archipelagoes of Melanesian New Guinea. London: Routledge and Kegan Paul.
Malinowski, B., & Ellis, H. (1929). The sexual life of savages in North-Western Melanesia. An ethnographic account of courtship, marriage, and family life among the natives of the Trobriand Islands, British New Guinea. New York: Eugenics Publishing Company.
Markus, H. R., & Kitayama, S. (1991). Culture and self: Implications for cognition, emotion, and motivation. Psychological Review, 98(2), 224–253.
Mead, M. (1928). Coming of age in Samoa: A psychological study of primitive youth for Western civilization. New York: William Morrow & Company.
Moran, C. M., Diefendorff, J. M., & Greguras, G. J. (2012). Understanding emotional display rules at work and outside of work: The effects of country and gender. Motivation and Emotion, 37, 323–334.
Niiya, Y., Ellsworth, P. C., & Yamaguchi, S. (2006). Amae in Japan and the United States: An exploration of a 'culturally unique' emotion. Emotion, 6(2), 279–295.
Niyekawa-Howard, A. M. (1968). A psycholinguistic study of the Whorfian hypothesis based on the Japanese passive. Honolulu: University of Hawaii Press.
Nöth, W. (1995). Handbook of semiotics. Stuttgart: J. B. Metzlersche Verlagsbuchhandlung.
Novin, S., Banerjee, R., Dadkhah, A., & Rieffe, C. (2008). Self-reported use of emotional display rules in the Netherlands and Iran: Evidence for sociocultural influence. Social Development, 18(2), 397–411.
Pavlenko, A. (2005). Bilingualism and thought. In J. F. Kroll & A. M. B. de Groot (Eds.), Handbook of bilingualism. Psycholinguistic approaches (pp. 433–453). Oxford: Oxford University Press.
Pavlenko, A. (2013). Sapir, Whorf and the hypothesis that wasn't. http://cup.linguistlist.org/academic-books/sapir-whorf-and-the-hypothesis-that-wasnt. Accessed: 20 July 2014.
Pesetsky, D. (1995). Zero syntax: Experiencers and cascades. Cambridge: MIT Press.
Rosaldo, M. (1980). Knowledge and passion. Cambridge: Cambridge University Press.
Rosch, E. (1978). Principles of categorization. In E. Margolis & S. Laurence (Eds.), Concepts: Core readings (pp. 189–206). Cambridge: MIT Press.
Safdar, S., Friedlmeier, W., Matsumoto, D., Yoo, S. H., Kwantes, C. T., Kakai, H., et al. (2009). Variations of emotional display rules within and across cultures: A comparison between Canada, USA, and Japan. Canadian Journal of Behavioural Science, 41(1), 1–10.
Sapir, E. (1921). Language. An introduction to the study of speech (Kindle Ed.). The Project Gutenberg.
Sapir, E. (1924). The grammarian and his language. American Mercury, 1, 149–155.
Sapir, E. (1926). Notes on psychological orientation in a given society: Hanover conference presentation and excerpts of discussion. (Reprinted from R. Darnell & J. T. Irvine (Eds.), (1999), The collected works of Edward Sapir III: Culture (pp. 73–97). Berlin: Walter de Gruyter).
Sapir, E. (1927–32). The psychology of culture, notes from lectures by Edward Sapir made by David Mandelbaum. (Reprinted from R. Darnell & J. T. Irvine (Eds.), (1999), The collected works of Edward Sapir III: Culture (pp. 421–675). Berlin: Walter de Gruyter).
Sapir, E. (1927a). Speech as a personality trait. American Journal of Sociology, 32, 892–905. (Reprinted from R. Darnell & J. T. Irvine (Eds.), (1999), The collected works of Edward Sapir III: Culture (pp. 120–132). Berlin: Walter de Gruyter).
Sapir, E. (1927b). Language as a form of human behavior. The English Journal, 16, 421–433. (Reprinted from R. Darnell & J. T. Irvine (Eds.), (1999), The collected works of Edward Sapir III: Culture (pp. 204–216). Berlin: Walter de Gruyter).
Sapir, E. (1930). The cultural approach to the study of personality: Hanover conference presentation and excerpts of discussion. (Reprinted from R. Darnell & J. T. Irvine (Eds.), (1999), The collected works of Edward Sapir III: Culture (pp. 199–254). Berlin: Walter de Gruyter).
Saussure, F. (1959). Course in general linguistics. New York: Philosophical Library.
Semnani-Azad, Z., & Adair, W. L. (2011). The display of 'dominant' nonverbal cues in negotiation: The role of culture and gender. International Negotiation, 11, 452–479.
Solzhenitsyn, A. (1973). Arkhipelag Gulag. Paris: YMCA Press.
Tamis-LeMonda, C. S., Song, L., Leavell, A. S., Kahana-Kalman, R., & Yoshikawa, H. (2012). Ethnic differences in mother-infant language and gestural communications are associated with specific skills in infants. Developmental Science, 15(3), 384–397.
Whorf, B. L. (1938). Language: Plan and conception of arrangement, personal memo circulated by Whorf in his department in 1938. In J. B. Carroll (Ed.), (1956), Language, thought, and reality. Selected writings of Benjamin Lee Whorf (pp. 125–133). Cambridge: MIT Press.
Whorf, B. L. (1939a). The relation of habitual thought and behavior to language. In J. B. Carroll (Ed.), (1956), Language, thought, and reality. Selected writings of Benjamin Lee Whorf (pp. 134–159). Cambridge: MIT Press.
Whorf, B. L. (1939b). Gestalt technique of stem composition in Shawnee. In J. B. Carroll (Ed.), (1956), Language, thought, and reality. Selected writings of Benjamin Lee Whorf (pp. 160–172). Cambridge: MIT Press.
Whorf, B. L. (1940a). Science and linguistics. In J. B. Carroll (Ed.), (1956), Language, thought, and reality. Selected writings of Benjamin Lee Whorf (pp. 207–219). Cambridge: MIT Press.
Whorf, B. L. (1940b). Linguistics as an exact science. In J. B. Carroll (Ed.), (1956), Language, thought, and reality. Selected writings of Benjamin Lee Whorf (pp. 220–232). Cambridge: MIT Press.
Whorf, B. L. (1941a). Languages and logic. In J. B. Carroll (Ed.), (1956), Language, thought, and reality. Selected writings of Benjamin Lee Whorf (pp. 233–245). Cambridge: MIT Press.
Wierzbicka, A. (1992). Semantics, culture, and cognition: Universal human concepts in culture-specific configurations. Oxford: Oxford University Press.
Zeman, J., & Garber, J. (1996). Display rules for anger, sadness, and pain: It depends on who is watching. Child Development, 67, 957–973.
Chapter 2
Emotion Universals—Argument from Nature
2.1 Universalism in the Psychological Research on Emotions
Although anthropologists included emotions in the scope of their research, the subject was always a minor one for them. Emotions mattered only as one of the factors influencing the formation of language and culture and regulating behavior within the investigated communities. In psychology, on the other hand, emotions have been one of the major subjects of interest since the formal inception of the discipline. Throughout its history, scholarly thinking about emotions in psychology followed the typical back-and-forth of opposing ideas gaining and losing ground in what was considered the scientific mainstream. Apart from the period when behaviorist and neobehaviorist models of the human psyche dominated psychology, the subject of emotions enjoyed a remarkably steady status as one of the definitional aspects of human nature (Averill 1983). In the second half of the twentieth century, emotions all but took center stage, owing mainly to the academic and popular appeal of Paul Ekman's theory of universal basic emotions. The late twentieth- and early twenty-first-century theories of emotion formed the new mainstream of psychological research. And while these theories stood on their own on the strength of empirical evidence, they often sought validation in other disciplines and in the past, mainly in the works of the forefathers of their discipline: Charles Darwin, William James, and Wilhelm Wundt. What follows is a historical overview of the development of contemporary theories of emotion. Their roots are traced back to the forefathers of the modern discipline of psychology, through the reductionist paradigms of the twentieth century, to the revisionist-integrative approaches of the twenty-first. The discussion of models of emotion processing will be left for Chap. 4, while here I will focus specifically on the formation of emotion theory within psychology and on how it was influenced by anthropology.
© Springer International Publishing Switzerland 2016 H. Bąk, Emotional Prosody Processing for Non-Native English Speakers, The Bilingual Mind and Brain Book Series 3, DOI 10.1007/978-3-319-44042-2_2
2.1.1 The Great Pioneer—Charles Darwin's Expression of Emotions in Man and Animals
The first study of human emotions to approach the subject in a focused and systematic manner actually predates the emergence of psychology as a discrete scientific discipline. Published in 1872, The Expression of the Emotions in Man and Animals by Charles Darwin remains one of the most prescient documents of its time regarding emotions. It embraced the mental, physiological, and expressive aspects of emotions and anticipated many of the issues that would trouble and divide the nascent science of psychology for decades to come (Hess and Thibault 2009). Darwin was very much a product of his times: a gentleman scientist with a keen interest in a number of the broadly understood natural sciences, ranging from geology to medicine. The knowledge he gained from these various disciplines allowed him to form an understanding of human emotions and their origins that was, for his time, uniquely comprehensive and forward-looking. Although his method of collecting data¹ (Darwin 1872) leaves a lot to be desired by modern scientific standards, the insights into the nature of emotions that Darwin offered his contemporaries remain decidedly astute by those same standards. Darwin considered human emotions from the perspective of his own theory of descent with modification,² and The Expression of the Emotions in Man and Animals hinges entirely on a complete acceptance of that theory. He saw a clear continuity from the more whole-body, visceral manifestations of emotion in animals distantly related to humans to the more detached and psychologically elaborate human emotions. Modern interpretations of Darwin's work pay a lot of attention to his in-depth analysis of facial expressions of emotion, to the detriment of his work on a broad spectrum of other nonverbal expressions of emotion.
Among other telling examples, Darwin described dancing as an almost primal manifestation of "joy, high spirits, love, tender feelings, devotion." He listed physiological reactions beyond full conscious control, such as crying, as expressive of "suffering of body and mind," suggesting the psychosomatic nature of emotional causality. He even mentioned various aspects of speech tone characteristic of the expression of certain emotions, such as the "choked up voice" accompanying different manifestations of anger. While he naturally focused on the face as the most expressive body part engaged directly in communication, Darwin already appreciated the visceral, broadly somatic nature of emotion experience and perception. Certain ideas presented in The Expression of the Emotions in Man and Animals foreshadowed the shape of things to come in the psychological study of emotions.
¹ Through correspondence with scientists and artists around the world.
² In deference to Darwin's cautiousness, I will use his preferred term for the process. Contrary to popular belief, he did not like the term "evolution." He hardly used it in any of his writings on the subject, and it did not appear at all in his major work on the subject, On the Origin of Species (1859).
There are indications in the book that Darwin understood how emotions are experienced and how they are expressed to be two interrelated but discrete problems. Likewise, he observed that while humans find it easy to recognize emotions in others, they struggle to put names to those emotions or to define them verbally. Psychologists, being human, would valiantly try and inevitably fail to reach a consensus on the definition of "emotion" for decades to come (see Kleinginna and Kleinginna 1981 for an equally informative and entertaining historical review of the existing definitions). Darwin believed that both social processes (later to become known as socialization and enculturation) and biological and neural processes play important roles in emotion mechanisms. The social psychology of emotions would rest on the former, while theories such as Ekman's basic emotions sought validation in the latter. Finally, Darwin proposed that there exists a range of innate emotions, inherited by modern humans from our animal ancestors through descent with modification. He believed that at their core these innate emotions would be panhuman, and suggested that their existence could be proven by comparing evidence from the Western world with evidence collected from remote, preferably preliterate cultures. The idea that emotions are universal, and that this universality can be proven by comparing literate and preliterate emotional expression, particularly in the face, would become the principal direction of modern emotion research led by Paul Ekman. Researchers, however, tend to focus so fiercely on the chance of discovering a universal principle that they often overlook the crucial caveat Darwin attached to this postulate.
The caveat was that languages vary immensely in their patterns of emotional expression, and that universal basic emotions could serve as a sensible tertium comparationis for meaningfully distinguishing the varying characteristics of emotional expression across languages (Darwin 1872). In other words, Darwin's ideas appear to have more in common with those of Edward Sapir than with those of the later reductionist theorists of emotion who claim Darwin's intellectual legacy regarding the nature of emotions for themselves (Ekman 1998). Darwin was not a psychologist, but he was the first scientist of the modern age to make emotions an object of serious scientific scrutiny. His broad and varied education, alongside the means and time afforded by his social status, allowed him to conceive a holistic and largely coherent theory of emotions. This theory was grounded in both the social and the physiological sciences and thus anticipated many of the problems future researchers of emotion would face. Though there is no indication that this was his intention, Darwin's theory of emotions predicted both the central focus and the direction psychology would eventually take in its empirical pursuit of the core nature of emotional phenomena. Much to the detriment of its import, The Expression of the Emotions in Man and Animals would gradually come to be read only selectively, in a manner betraying confirmation bias. Only recently has the whole of Darwin's work on emotions begun to enjoy a modest revival, with all of its depth and subtlety coming to be appreciated. Yet it remains to be seen whether his confidently drawn catalog of bodily expressions of emotion (other than the facial kind) can find empirical confirmation in the psychological research on emotional expressions.
2.1.2 The Forefathers of Psychology: Wilhelm Wundt and William James
The period between Darwin and the mid-twentieth-century neobehaviorist dip in scientific interest in emotions was a busy time for psychology. For one thing, psychology was formally established as a discrete discipline of the social sciences. It earned the recognition of the Nobel Prize committee. It produced two prominent scholars who would forever after bear the laurels of the forefathers of psychology: Wilhelm Wundt and William James. With the works of these two men, psychology was legitimized as a true scientific discipline with a solid theoretical foundation and a drive toward the experimental verification of claims. Owing to them, psychology has been an empirical science since its inception. Wundt and James would found psychology and watch it grow to maturity in a rapidly changing world. The first half of the twentieth century was a time of great social and ideological turmoil. During this time, as psychology was establishing itself, the geopolitical scales would tip in favor of America, making it the leader in the world economy, science, and popular culture alike. Popular culture itself would start playing a role in how academic ideas were formed and pursued, from the selection of research questions to matters of ethics. Both Wundt and James lived out the most productive periods of their lives at the turn of the twentieth century. Hence, their work by and large sits halfway between Darwin's broad and holistic ideal of the nineteenth-century gentleman scientist and the fragmented and highly specialized multiverse of modern science. Wundt and James were, to put a modern term to good anachronistic use, interdisciplinary in their approaches, drawing ideas from multiple disciplines such as physiology and medicine to explain the psychological phenomena they investigated.
But they also identified themselves primarily as dedicated psychologists, and while their opinions on the nature of psychological processes differed, they had a lot in common. Both studied medicine, though neither took up practice, and both specialized in philosophy and physiology before turning to psychology. Both used their training in medicine and physiology to ground their claims about the human psyche in the ultimate universal principle of human biology (Goodman 2013; Kim 2014). Both men, in true nineteenth-century scholar fashion, took an interest in the fashionable academic subjects of their period. Wundt engaged in politics, with unfortunate consequences for his good name, as his theories of culture and society were taken up in the "virulently anti-Semitic" rhetoric and policies of early twentieth-century German nationalists (Kim 2014). James risked his frail health by venturing into the Amazonian forest on a naturalist specimen-collecting expedition with the geological superstar of the day, Louis Agassiz (Goodman 2013). And in true twentieth-century scientist fashion, both Wundt and James had sweeping and complex but thematically focused visions of what their newfangled discipline of psychology encompassed. Likewise, each had highly idiosyncratic ideas about what emotions were and what role they played in the human psyche, and these ideas would prove enduring in emotion research later in the twentieth century.
2.1.2.1 The Great Teacher—Wilhelm Maximilian Wundt
It would hardly be an overstatement to say that the title of father of psychology has been applied to Wilhelm Wundt so often that it has almost lost its weight. However, in Wundt's case, even when fully appreciated, the title barely hints at the magnitude of his influence upon the young discipline he helped create. Over an incredible 65 years of an active and prolific career, he produced an estimated 53,000 pages of publications, supervised and promoted 186 Ph.D. dissertations, and established the first experimental psychology laboratory in the world. Crucially, he also persuaded his university to recognize his laboratory as such, which was a first for any type of social science (Kim 2014). Even by the demanding standards of his day, rich though it was in voluminous scientific output, Wundt was in a class of his own. His charisma and innovations were a formative inspiration for William James, who even used some of Wundt's works in his lectures. Wundt's students would staff many of the newly forming psychology departments at universities in both Europe and the United States. Wundt's theory of emotions was remarkably progressive, especially in hindsight. Wundt postulated that psychology was the only discipline which could competently uncover the nature of the human psyche with all the strictures of the scientific method (Wundt 1902). By implementing the scientific method, he believed, he would equip his discipline with the kind of tools that would make its data-driven conclusions valid. He also believed emotions to be vitally important aspects of the human psyche that significantly influence how humans think and act (Wundt 1894). The investigation of emotions was therefore of vital importance for psychology as well, since understanding human emotions would help us understand human behavior.
The latter was particularly important since Wundt proposed that emotions can, and often do, influence our thoughts and behavior subconsciously. He differentiated between emotions, of which we were consciously aware, and feelings, of which we were not (Wundt 1894). Not incidentally, this distinction presaged the inevitable dispute between psychologists regarding the primacy of affect versus cognition in emotional processing, exemplified by the debate between Zajonc (1980) and Lazarus (1984). Wundt’s idea of the nature of emotions is an intriguing subject. Unfortunately, he never consolidated his thoughts on the matter the way James did; rather, he reflected on it throughout his many publications. The idea emerging from his writings is quite complex and appears to anticipate several crucial developments in emotion research. Wundt proposed, echoing Darwin, that certain emotions are innate, defined by human physiology. He elaborated on that idea by proposing that each of these emotions has its own dedicated facial expression, likewise innate (Wundt 1894). This was in line with the classic argument from nature, a fundamentally universalist idea which much later would become the heart of Paul Ekman’s theory of basic universal emotions. On the other hand, Wundt embraced the argument from nurture as well, stating that culture and socialization play an equally important role in shaping human emotion expressions. And yet he believed there ought to be a sort of universal principle controlling emotion processes. For him the principle was that
2 Emotion Universals—Argument from Nature
whatever the variation in the form of expression, all emotions were at their somatic origin sorrow and joy. For him those two were the core emotions from which all others developed. He would go on to specify that hedonic “pleasure” and its perceived “intensity” can be universally determined for all human emotions (Wundt 1902). This idea would later bloom into the dimensional theory of emotions. Wundt’s view, then, was that emotions are complex phenomena which ontologically derive from sorrow and joy and which all have the fundamental properties of hedonic “pleasure” and “intensity.” All emotions are innate, but they are also all subject to social-cultural molding. For select innate emotions there exist innate facial expression configurations which are likely universal for all mankind. Emotions are a vital aspect of the human psyche, and hence also of the study of psychology. Psychology is an empirical science and obeys the scientific method. These were Wundt’s convictions, and this is the kind of perspective and knowledge he imparted to his many students. Some of his ideas, such as the converging influences of both nature and nurture upon emotion expressions, are only now gaining widespread acceptance within revisionist-integrative models such as the conceptual act model (Barrett 2011). Others have been implemented widely in various reductionist emotion perception theories. Following closely in his footsteps, but on the other side of the Atlantic, William James would become to American psychology what Wundt was to Europe.
2.1.2.2 The Great Writer—William James
William James is a deservedly revered father figure for American psychology. However, his influence upon the development of psychology as a discipline was of an entirely different kind than that of Wundt.
James opened the first experimental psychology laboratory in the United States, and he wrote and lectured on psychology and an impressive range of other subjects, but this is about the extent of the similarities between the two men in terms of how they influenced their successors. James is prized for his signal contributions to philosophy and psychology, but also to linguistics and sociology (Parajes 2002). His take on the philosophical idea of pragmatism, first proposed by Charles Sanders Peirce, had a definite influence on his concept of psychology as a scientific discipline. James exercised his intellectual influence over psychology mainly through writing, specifically through his magnum opus The Principles of Psychology. Commissioned by the publisher Henry Holt, the book was intended to become a college psychology textbook (Goodman 2013). Writing the book took James 12 years, an effort broken up by bouts of the chronic depression and neurasthenia3 which plagued him all his life (Parajes 2002). The result was a two-volume work of twelve hundred pages in total, made unique as much by its comprehensive scope as by its literary value.4 It was published in full in 1890 and has not really been out of print since. The Principles of Psychology exemplifies James’ introspective method of “doing psychology.” James drew in equal measure on the evidence of his own senses and on his immense knowledge of a variety of subjects. He was a voracious acquirer of knowledge, traveling repeatedly to antebellum Europe, then the capital of world learning, to take stock of the cutting edge of science himself (Goodman 2013). He took an interest in recent developments in neurology and physiology, and both appear to have influenced his understanding of emotions. James proposed that emotions are temporally complex phenomena which progress from an automatic physiological-behavioral reflexive reaction to a stimulus into cognitive awareness and conscious control. Reason, or conscious awareness and control, can, in moments highly charged with emotion, be overwhelmed and become a “passive spectator” to powerful physiological-behavioral emotional displays (James 1890). However, he also believed that in most cases reason can suppress and modify automatic emotional outbursts (James 1893). He believed this cognitive-affective balance of power in human expressions of emotion to be a product of adaptive fit, an idea borrowed from Darwin (James 1890). Furthermore, James postulated, in contrast to the leading neurological theories of the time, that emotions do not have dedicated centers in the brain (James 1890). This may seem obvious, since his definition of emotion prioritizes preconscious somatic reflexes, which are neurologically dispersed. But solid evidence for this idea (e.g., Panksepp 1998) and its wide acceptance in psychology (e.g., Zimbardo et al. 2009) came only a century after James first proposed it.
3Neurasthenia was a frequent diagnosis at the time and it remains a recognized medical condition to this day. The World Health Organization’s International Classification of Diseases (ICD) defines it as a psychosomatic disorder characterized by mental and physical fatigue, distractedness, and a stress-related drop in immunity, often accompanied by depression and emotional instability. Although James suffered greatly from this disorder, it likely saved him from being drafted to participate in the American Civil War (WHO 2015).
For Wundt there were two distinct affective phenomena: preconscious feelings and conscious emotions. For James there was one phenomenon, emotion, which progressed in time from a preconscious to a conscious state of awareness. While these definitions are not entirely incompatible, James’ idea allows for a more complex and dynamic system of emotional reactivity. There is some indication in James’ work that he believed emotional excitation to be a continuous state, never fading completely, merely rising and falling in intensity and conscious perceptibility in response to stimuli. These stimuli could be both external, coming from objective reality, and internal, coming from the inner world of thoughts and memories. The emotions thus evoked were of two kinds. One was a limited set of “coarse” emotions: primal, innate in the Darwinian sense, powerful, and well defined in terms of bodily expression. These are, according to James, “anger, fear, love, hate, joy, grief, shame, pride, and their varieties” (James 1893). All other emotions, of a more intellectual or aesthetic nature, are much less defined, more subtle, and ambiguous both in perception and in expression. Significantly, James believed that it would be hard to determine a set of necessary and sufficient requirements to define specific emotions in a way that would apply to all human beings. Emotions were profoundly subjective and subject to so much variation within individuals that establishing a definition of an exemplary prototypical emotion would be, for all intents and purposes, impossible (James 1890). Intriguingly, James was aware of the existence of potentially nonequivalent emotion terms in different languages, but he did not pursue the subject. His focus was on the mechanisms governing emotional processing, not on the names human imagination might have given them. He understood emotions as ubiquitous, ever-present, and continuous phenomena, not transient but temporally dynamic in nature. Emotions in his understanding were a constant but not constantly conscious presence in our lives, perceptions, and thoughts. James observed that emotions vary in intensity of perception and expression, and that their expression is partially subject to conscious control. And though he understood the deep subjectivity and possibly the cultural variability of emotions, he attempted to classify them according to innateness and intensity. Crucially, he put forward the idea that emotions are primarily automatic somatic reflexes. This idea would become the foundation of what later became known as the James–Lange theory of emotion.5 This theory in itself would become a staple of theoretical psychology rhetoric, but it would be the idea of discrete “coarse” emotions that would have the strongest impact on emotion research. This, however, is a topic to be discussed in the context of Paul Ekman’s theory of universal basic emotions.
4All of the James children were seemingly bound for excellence in literature. William’s brother was the popular writer Henry James, and his sister was the now increasingly appreciated diarist Alice James. William himself, however, was deeply dissatisfied with The Principles, despite the praise heaped upon it. Wilhelm Wundt praised it as a literary tour de force, but he was also mostly condescending regarding its import for psychology; as he put it: “It is literature, it is beautiful, but it is not psychology” (Parajes 2002).
5Professor Carl Lange specialized in medicine and investigated emotions from the perspective of outward expression, while James tackled them from the internal, psychological perspective. Both appear to have come to very similar conclusions about the primarily somatic nature of emotion at around the same time. Hence the name of the theory honors them both in equal measure (Titchener 1914).
2.1.3 Between the Dawn and Rebirth—From the Forefathers to Paul Ekman
William James died in 1910 and Wilhelm Wundt followed 10 years later. With them went any serious or dedicated effort in emotion research. Many factors contributed to this state of affairs. For one thing, even for Wundt and James, emotions were but one of the many aspects of the human psyche that psychology should investigate. For another, psychology was still a young science at the beginning of the twentieth century, and neither a scientific consensus nor a consistent methodology had yet been established for the discipline as a whole. Furthermore, there was the global historical context. Psychology as a discipline was barely off the ground when it was subjected to the interbellum political agenda. The Great War was an unprecedented traumatic event for the global population on every level: physical, economic, emotional, and psychological. But those in the know in the world of politics knew the Armistice was just that, a temporary and uneasy truce. The world would go to war again, and that meant keeping up the fighting spirit of nations that had already been bled. For example, cases of active suppression of research on what was then known as shell shock, and what is now usually called posttraumatic stress disorder (PTSD), are fairly well documented (Jones et al. 2007). In plain terms, psychologists had plenty to do without investigating emotions, and research on subjects that would highlight the reality of emotional trauma was not looked upon favorably by the powers responsible for research funding. The psychology mainstream took a deliberate turn toward empiricism and behaviorism, both of which had great potential for legitimizing psychology in the eyes of the general public. Behaviorism and neobehaviorism, which ruled mainstream psychology between the 1920s and 1950s, considered emotions no more than an “intervening variable” (Averill 1983). However, several signal developments in both theory and laboratory practice did take place in emotion research during this period. The 1920s saw the development of the Cannon–Bard theory of emotion, in critical response to the James–Lange theory, as well as landmark experiments by Watson and Landis. Walter B. Cannon and his doctoral student Philip Bard were dissatisfied with the James–Lange postulate that physiological sensation should precede conscious cognitive awareness of emotion (Cannon 1931). Their idea was that physiological excitation and conscious awareness in response to a stimulus occurred simultaneously, and that together these constituted emotions.
Present-day critics point out that the main weakness of the Cannon–Bard theory was its overextension. Cannon and Bard modeled their theory on experiments with decorticated animals expressing a limited range of basic emotions (anger, fear, joy, and disgust), but extended their theory to humans and to all emotional states (Dror 2014). While the contemporary criticisms are solid, the Cannon–Bard theory of emotions stands as both an important step in emotion theory and a signal of one of the perennial problems in the field: the definition of emotion. Cannon and Bard were primarily physiologists, whereas James and Lange were primarily psychology-oriented, and their postulates reflect their training and their fundamental disagreement on what “emotions” actually are. John B. Watson’s fear conditioning experiments, published in 1920 (Watson and Rayner 1920), illustrated the fact that not all emotional reactions are innate. In what are now known as the Little Albert experiments, Watson demonstrated that fear can be evoked in response to completely harmless and unthreatening stimuli through basic Pavlovian conditioning, and that such conditioning is subject to generalization. Landis (1924) provided initial empirical evidence for the existence of dedicated and specialized facial expressions for discrete emotions through a series of inventive but disturbing tasks, one of which involved the participants cutting off the heads of live rats. Both Watson’s and Landis’ experiments were landmarks in terms of research methodology and garnered some infamy regarding the ethics of their research conduct.6 The fact of the matter was, however, that ethical standards in human subjects research did not yet exist at the time. The landmark in raising ethical awareness in this area came much later in the form of the Tuskegee syphilis experiment (Skloot 2010), which ran between 1932 and 1972. Meanwhile, psychology as a discipline was coming of age, and once again the Henry Holt publishing company had a hand in setting educational standards. In 1938, Robert Sessions Woodworth’s Experimental Psychology came out and became one of the main psychology textbooks for a fresh generation of psychologists. Comprehensive and thoroughly practical, the book emphasized functional knowledge and empirical skill, and owing to its status it soon became known as the “Columbia Bible”7 (Winston 1990). Among its educational merits, the book contained several ideas that would be picked up and developed once emotion became a mainstream topic of psychology research in the second half of the twentieth century. Most notably, Woodworth postulated six basic emotions corresponding to discrete facial expressions (Woodworth 1938), an idea later pursued and elevated by Paul Ekman. The 1940s were a rather quiet period for emotion research in psychology, though overall the academic world went through a crucial transformation. While the hint of a global power shift had been in the air ever since the end of the Great War, in the late 1940s the shift became complete. Both economic and academic leadership shifted to the United States. Studies of great importance in neuropsychology and traumatology were being carried out at the time in the USSR by Alexander Luria (Luria 1947), but they remained largely isolated from the global academic world behind the Iron Curtain. The center of gravity for the study of psychology and emotions lay in the USA.
6Watson’s experiments and Little Albert’s reactions were captured on film and can easily be obtained today from a variety of online sources. The video allows us to assess the commendable coherence and consistency with which the experimental procedure was conducted. However, the video also reveals the disturbing levels of distress Albert was subjected to, and an analysis of the study background reveals that no informed consent from the child’s parents was ever obtained. Landis’ experiments were a model of effective emotion evocation, but the procedures involved in the evoking stage and in subject recruitment were highly questionable. One of the procedures to evoke disgust had the participants take a live rat and cut its head off with a kitchen knife. The subject screening was so poor that it allowed the participation of a teenager referred to Landis’ Psychology Department with an initial diagnosis of emotional instability. Both experiments are today often featured in research ethics seminars as examples of various facets of unethical conduct in human subjects research.
7Prior to publication, fragments of the textbook circulated among Columbia University students, who found them of great value in their study of psychology. Hence the name “Columbia Bible.”
The 1950s and early 1960s again saw crucial developments that would prove influential upon the shape of things to come. In 1954, Harold Schlosberg proposed a three-dimensional theory of emotions, which postulated that however varied emotions may be, it should always be possible to determine three properties, or dimensions, of their affective meaning. These dimensions, conceptualized as continua, were attention-rejection, hedonic pleasantness-unpleasantness, and sleep-tension related to physiological arousal (Schlosberg 1954). The same year saw Brown and Lenneberg’s attempt to operationalize linguistic relativity as an experimental hypothesis. Two years later, in 1956, Language, Thought, and Reality was published (Whorf 1956), and in 1958 the vulgarization of the anthropological principle of linguistic relativity was cemented by Brown’s formulation of its deterministic reinterpretation (Brown 1958). Finally, 1956 was the year of the MIT Symposium on Information Theory, during which Noam Chomsky presented his paper “Three models for the description of language.” This paper and those by Miller, and by Newell and Simon, collectively mark the intellectual beginning of new chapters in linguistics and psychology. In psychology they started the subdiscipline of cognitive psychology (Eysenck and Keane 1995), and in linguistics they started the so-called “Chomskyan Revolution” (see Chap. 3). The 1960s would become the decade when emotions started garnering more and more attention from psychologists as well as from neurologists and specialists in related fields. In 1962, Stanley Schachter and Jerome E. Singer reworked the James–Lange and the Cannon–Bard theories of emotion into a new one now bearing their names. They were struck by the variety of emotions and emotion-like phenomena (moods, feelings) which made defining emotions such an elusive goal (Schachter and Singer 1962). They came to the conclusion that cognitive appraisal or labeling was the key to emotional perception. What made an emotion was a state of physiological arousal combined with a certain cognition and a label imposed upon it. Cognitive awareness of what caused a given physiological arousal allows us to consciously interpret that arousal as a specific emotion appropriate to a given situation.
Their theory was the culmination of several decades of emotion theory and research, still rooted in physiological reasoning, but already making use of the evidence for the neurochemical reactivity of emotions.8 This theory was on the cusp of a new wave in the psychological mainstream. Jerome E. Singer was instrumental in the rapid rise of cognitive psychology, the new subdiscipline kick-started in 1956 at the MIT Symposium on Information Theory. That subdiscipline would now enter the mainstream and focus increasingly on emotions as objects of both research and theoretical work. This trend would peak toward the end of the decade with seminal works on facial expressions of emotion by Paul Ekman. It would hardly be an exaggeration to say that the next four decades would revolve around Paul Ekman and his universal basic emotions, sometimes agreeing with him, sometimes battling his uncompromising universalism.
8They used epinephrine/placebo injections and suggestion manipulation to evoke different emotions from the same neurochemical/physiological arousal.
2.1.4 The Universalist—Paul Ekman
Few psychologists in history could boast quite the level and scope of academic, professional, and even popular success that Paul Ekman has achieved. His idea of six universal basic emotions recognizable from the face regardless of cultural or linguistic differences had an undeniable appeal during the boom period of emotion research in the second half of the twentieth century, owing to its simplicity, its easily definable aspects, and its straightforward nature, which translated well into the language of research hypotheses. His Facial Action Coding System (FACS) remains to this day one of the finest tools in existence for the meticulous description of facial expressions (Sayette et al. 2001), and has found multiple uses in matters of US national security and defense as well as in commercial environments (Fischer 2013). His ideas about facial cues of deception have even been popularized in lay consciousness through TV shows such as Fox’s Lie to Me (Fischer 2013), and through popular honors such as being named one of TIME Magazine’s 100 most influential people in the world in 2009 (Taylor 2009). The majority of the honors he has received were on behalf of his work on deception, but his research on the nature of emotions also left a great and lasting impact on the discipline of psychology. In its fundamental form, his theory of universal basic emotions went from strength to strength: from a breakout new idea in 1969, to a standard of introductory psychology textbooks in the 1970s, to the canonical understanding of emotions in psychology in the late 1980s (Russell 1994). Well into the twenty-first century his theory was so entrenched in the mainstream psychology of emotions that it came to be referred to as the standard view. Ekman’s early publications on the nature of emotion showed a mildly reconciliatory position between Woodworth’s early categorical view and Schlosberg’s early dimensional view (Ekman and Friesen 1967). They also focused, true to Darwin’s description of the matter, on a broad range of body language expressions of emotions (Ekman 1965). This almost noncommittal and broad position was then brushed aside by a much more focused and radical idea in 1969.
That year “Pan-Cultural Elements in Facial Displays of Emotion” (Ekman et al. 1969) appeared in Science, presenting the results of one of a series of studies of facial expressions conducted among the Fore people of Papua New Guinea. The results indicated that there exists a range of universally recognizable facial expressions for a limited range of basic emotions. These expressions are innate, universal, and universally recognized regardless of cultural or linguistic differences. The basic emotions postulated from the start were anger, fear, disgust, sadness, surprise, and happiness (Ekman et al. 1969). The premise was boldly stated and simple, which increased its appeal to prospective emotion researchers. The initial formulations of this universalist theory of emotions were not radically opposed to issues of linguistic or cultural relativity, which were treated as peripheral to the theory. Still, explicitly and implicitly, Ekman’s theory was built as a critical response to the anthropological findings supporting relativistic effects through evidence of the cultural specificity of emotion. In the course of his long and prolific career, Ekman’s attitude to this cultural-specificity alternative to his universalism would vary from disdainful (Ekman 1998)9 to positively vitriolic (Ekman 1994a).10
9This was, somewhat ironically, in an Afterword to an edition of Charles Darwin’s The Expression of the Emotions in Man and Animals. The disdainful comments were directed at the notable anthropologist Margaret Mead.
10His acerbic response to Russell’s detailed critique of his work.
Regardless of his attitude, however, Ekman could not negate the large and growing body of evidence for a certain degree of variability and of cultural and linguistic relativity in emotional phenomena. Anthropological evidence of relativistic effects in culture- and language-specific emotion expressions was both solid and compelling. Apart from the evidence from remote cultures such as the Ifaluk or the Utku, there was evidence of high variability of emotion concepts and their use within languages. Wallace and Carson (1973) tallied around 2,000 English dictionary items with emotional meaning while documenting different patterns of their use depending on the users’ professions. The simplicity and focus of Ekman’s universalism, it quickly became apparent, were both its strength and its weakness. The theory was simple, but to account for all emotional phenomena it would need additional provisions. To address this issue, in 1992 Ekman clarified some of the details of his original study with the Fore, reporting that he and his collaborators had found various facial expressions for given emotions. For example, they found “more than 60 expressions” for anger alone (Ekman 1992). Facial expressions varied within the categorical boundaries of basic emotions in a type/token fashion; that is, the 60 expressions were all tokens of one type: anger. The variability was caused, explained Ekman, by display rules which governed how, and which, emotions can be overtly expressed in what social situations (Ekman 1970). Regarding the variability of emotion terms, Ekman firmly declared that his theory refers to the psychosomatic phenomena, not the labels conventionally used to refer to them. He used the terms as useful simulacra for the discussion of the underlying emotional phenomena (Ekman 1994b). The different related terms for emotions, he believed, were conceptually arranged in a prototype-like fashion, with the basic emotion terms in the prototype position (Ekman 1994a).
Only the terms for basic emotions bore the necessary and sufficient semantic characteristics to be prototypes. The necessary and sufficient criteria for what constitutes the actual psychosomatic and expressive phenomena of emotions were a more difficult subject. Although the core pool of six basic emotions (anger, fear, disgust, sadness, surprise, and happiness) never changed, both the total number of “basic emotions” and their definitions fluctuated over time. The initial number of criteria categorizing an emotion as basic was nine (Ekman 1992), but it shifted to seven (Ekman 1994a), then to eleven (Ekman 1999), only to peak at twelve (Ekman and Cordaro 2012). The criteria changed from the simple requirement that each emotion have its own dedicated and distinct facial expression to including specific appraisals in emotion perception and concurrent existence in other primates. As he fought his critics and adapted his theory to new evidence conflicting with it, Ekman shifted his position from a radical denial of the existence of any “non-basic” emotions (Ekman 1992) to one where such emotions existed and fulfilled his categorical criteria only partially (Ekman 1994a). As the number of inclusion criteria grew, so did the catalog of basic emotions. It started with six (Ekman et al. 1969), then doubled (Ekman 1992), then jumped to 15 (Ekman 1999), to culminate implicitly in an even larger number as Ekman (2003) declared the existence of 16 positive emotions alone. All in all, although in its late form Ekman’s universalist theory accounted for the majority of affective phenomena, it lost much of the simplicity and focus which had made it so appealing initially.
The fundamental idea behind Ekman’s theory of emotions remains that there exists a universal principle behind emotions. The principle dictates that there are at least six discrete emotions with dedicated facial expressions which are panhuman and universally recognized. There is a certain variability in these facial expressions caused by display rules, but these variations fit within the discrete emotion categories in a type/token fashion. The variability of emotion terms is explained through prototypical structures of emotion concepts, with the basic emotions occupying the prototype positions. The necessary and sufficient criteria for inclusion in the basic emotion category vary depending on the version of the theory, but the initial six basic emotions are always included in the final tally of basic universal emotions. Ekman’s theory of emotions shows multiple but highly selective influences. He echoes Darwin, Woodworth, and James in his postulate of dedicated facial expressions for discrete emotions, but rejects Darwin’s observation of cultural and linguistic influence on emotion expression and James’ postulate of the deep subjectivity of emotions. He believes in the absolute universality of basic emotions and actively opposes the anthropological evidence of the cultural specificity of emotions, but accepts a degree of variability within his discrete emotion categories through display rules. Paul Ekman’s universal theory of emotion is a powerful one, and there is a substantial body of evidence in its favor, but it is also fairly radical and continues to attract a lot of criticism from multiple fields and angles. The theory dominated much of the psychological research on emotions in the late decades of the twentieth century. It continues to define the field today, though in a different role. The critics and revisionists of Ekman’s theory have been increasing in number and in the strength of their arguments, all backed with solid empirical evidence.
Starting in the 1980s and continuing to this day, new theorists and researchers of emotions have demoted Paul Ekman’s universalism from the position of the ultimate answer11 to the problem of emotions to that of a theory which is simply not entirely and ultimately right.
2.1.5 Resistance and Revisionism—The Post-ekmanians Just as Paul Ekman’s universal basic emotions theory was making its way towards becoming mainstream, alternative theories were being developed on the fringes. Beginning in the 1960s with Magda Arnold’s appraisal theory and culminating in the revisionist theories of the 2010s, there were several things that the researchers in these alternatives to the standard view had in common. For one thing, all of them addressed crucial questions of the self, culture, language, and context in the perception and expression of emotions. In acknowledging these factors in emotion processing, by their own admission (Russell 1991), they were returning to
11Ekman stated his desire to give the ultimate answer quite unequivocally: “My goal was to settle the matter [of defining what emotions are] decisively” (Ekman 1998).
2.1 Universalism in the Psychological Research on Emotions
positions very much like the original anthropological formulation of the linguistic relativity principle. For another, they all appeared to admit that the problem with Ekman’s theory was not that it was wrong. The problem was that it was not entirely right, not the “decisive” answer Ekman believed it to be. Virtually all of the advanced integrative approaches to emotion developed at the turn of the twenty-first century would include Ekman’s model on some level as a component of a much more complex emotion recognition system. With the exception of the early works on the appraisal (Arnold 1960) and dimensional theories of emotion (Schlosberg 1954), all of the alternatives to Ekman’s theory formed in more or less open opposition to it. As Ekman’s theory grew into its mainstream position as the standard view, these alternatives took peripheral positions and, with time, took on a less critical and more revisionist character. The major alternative/revisionist theories were the appraisal theory, the dimensional theory, and the integrative theory of emotion.
2.1.5.1 Appraisal Theory of Emotions
In definitional terms, appraisals may be one of the more elusive concepts in emotion theory. The first comprehensive definition of appraisals was put forward by the psychologist Magda Arnold (1960), and it was, by all accounts, an ambitious attempt to resolve the entanglement of cognition and emotion. Arnold’s theory of emotions was still rooted in the long-standing tradition of grounding the psychology of emotions in physiology, but it innovatively extended into the behavioral and pragmatic dimensions. For her, emotions were temporally and structurally dynamic cognitive–affective states whereby an initially evoked physiological arousal is filtered through appraisals and cognitively framed. Emotions could be evoked by objects interpreted in relation to the perceiving self.
Depending on whether the appraisals framed a given object as “positive” or “negative,” an approach or withdrawal behavior might be aroused in the self, subject to renewed interpretations and reappraisals. Appraisals, according to Arnold, are a set of evaluative filters which direct our interpretation of perceptual input in accordance with the knowledge and experience we gain in the course of socialization and enculturation. Appraisals are automatic and typically unconscious, but they can be accessed consciously, and they govern action potentials for approach and withdrawal. Action potentials in turn signify a kind of mobilization, a readiness for action that depends on the ultimate outcome of appraisal/reappraisal evaluation. Appraisals alone can evoke action tendencies and behavior. And, according to Arnold, they cause emotions: without physiological arousal in response to an object in relation to the self, evaluated through appraisals, emotions cannot exist (Arnold 1960). Arnold’s appraisals thus have more in common with the kinds of learned automatic mechanisms postulated by Damasio (1994) than with conscious, laborious processing of the whys and wherefores of an emotion-evoking object. There are obvious similarities between the James–Lange theory of emotions and Arnold’s appraisal theory. Both have resolved the dual cognitive-affective nature
2 Emotion Universals—Argument from Nature
of emotion perception and expression to the satisfaction of their own postulates. However, for many emotion researchers this relic of Cartesian philosophy still formed the axis of a divisive argument over the primacy of affect versus cognition in emotion processing. The most prominent figures in this great primacy debate were Robert Zajonc and Richard S. Lazarus. Interestingly enough, in hindsight their theories were not so much at odds as differentially focused on select aspects of Arnold’s appraisal theory (Kappas 2006). Invoking Wundt, Zajonc postulated the primacy of affect, which he defined as a state of automatic and irrevocable hedonic evaluation of the object evoking the affect. This evaluation was ontologically prior to any conscious cognitive mechanism and was thus primary to cognition (Zajonc 1984). In interpersonal communication, the naturally affective nonverbal channel coincides with the typically cognitive verbal channel, and thus the two permeate each other continuously. And yet Zajonc, much like Lazarus, believed cognition and affect to be only partially codependent (Zajonc 1981; Lazarus 1981). Lazarus countered Zajonc by placing what he emphatically called cognitive appraisals in the primary position (Lazarus 1981). Like Arnold, Lazarus postulated that emotions were relational in nature, and for him that relational nature established the primacy of cognition. For Lazarus, the self was in a continuous evaluative relation with reality, and appraisals, because they were socially and conventionally acquired, were cognitive in nature (Lazarus 2006). Hence cognitions were primary to emotions. Against the background of Arnold’s work, the debate between Zajonc and Lazarus is really a mere matter of semantics and of the varying definitions of “cognition” and “emotion” (Kappas 2006).
Still, whichever side researchers took in the debate, appraisal theory continued to be developed and refined, even though some of its claims only began to find empirical confirmation in the 2000s. Arguably the greatest champion of appraisal theory is Klaus R. Scherer, who has been a powerhouse for the theory for almost half a century and is the author of one of the more advanced models of emotion within the appraisal framework (see Chap. 4). Scherer’s theory of appraisal is advanced and reconciliatory in nature, skillfully weaving together evidence from psychology and anthropology to try to explain the cross-cultural variability of emotions (Scherer and Wallbott 1994). Scherer defines appraisals as a series of perceptual filters of an evaluative and subjective nature, which give shape to the stream of stimulation from objective reality (Ellsworth and Scherer 2003). These appraisal filters range from the ontologically primitive and universal (e.g., hedonic pleasantness, arousal) to the highly complex and socially and culturally variable (e.g., power, identity, justice). The more primitive appraisals constitute what Scherer calls the “push” factors in emotions, which condition the reflexive physiological arousal in emotions. The more advanced appraisals in turn constitute the “pull” factors, which mold spontaneous emotional expressions into forms that are socially acceptable under given display rules (Scherer et al. 2011). Working continuously together, the various appraisals constitute our conscious awareness of self. According to Scherer, the self is a relational emotional entity formed from a mixture of inherent and universal psycho-physiological factors and culture-specific high cognitive factors, structured much like the Maslowian model of basic needs (Scherer 1997). The “backbone of the appraisal system” consists of hedonic pleasantness and activity, both of which are also recognized as fundamental in dimensional theories of emotion, where they are referred to as valence and arousal respectively (Ellsworth and Scherer 2003). All in all, appraisal theory is a theoretical bridge between the anthropological evidence of emotion diversity and the psychological evidence of emotion universality. It explains in psychological terms what Sapir meant by relativity, but avoids confusion by not referring to language as a defining factor. While connections between the structure of language and the configuration of appraisals can be extrapolated from the preferential use of emotion naming or recall tasks (Ellsworth and Scherer 2003), language was never the focus of appraisal theory. On the contrary, because the theory embraced the cultural and societal factors in the development of the self-constructing appraisals, it also left room for virtually limitless variability of emotion terms. It was a theory that explained the mechanism of the Whorfian limitation on the perception of reality, but instead of grammatical habituation it used perceptual habituation as its explanatory principle. And yet there remained the issue of language: how the habituated perceptions of emotions may influence or represent the emotional realities of various cultures, and to what degree those realities are comparable or universal. That question, as well as the matter of operationalizing appraisals, would be addressed directly by the dimensional theory of emotions.
2.1.5.2 Dimensional Theory of Emotions
Dimensional theory of emotions goes all the way back to Wundt and his idea that all emotions ultimately derive from two main sources: sorrow and joy (Wundt 1902). He conceived of this derivation in ontological terms, but he also believed the basic meanings of pleasure and arousal could be determined for every emotion in existence.
This idea was expanded upon by Schlosberg, who believed there to be not two but three basic dimensions of meaning for every emotion. There were hedonic pleasure and arousal, referred to as the pleasantness–unpleasantness and sleep–tension dimensions respectively, but also what could be called an action potential dimension of attention–rejection (Schlosberg 1954). By including this last dimension, Schlosberg came closer to appraisal theory than to what would later normally be thought of as dimensional theory. The next step for dimensional theory came in the mid-1990s. In 1994 Margaret M. Bradley and Peter J. Lang proposed a three-dimensional approach with the basic dimensions of valence (pleasantness), arousal (intensity), and dominance. Dominance referred roughly to the level of perceived control over an emotional reaction (Bradley and Lang 1994). This three-dimensional scale has since been widely applied in the form of the Self-Assessment Manikin (SAM) to rate large databanks of stimuli for emotion research.12
12For example, in the International Affective Picture System (IAPS) (Bradley et al. 2008) or the International Affective Digitized Sounds (IADS) (Bradley and Lang 1999).
The most advanced and best-developed version of the dimensional theory came in James A. Russell’s work on Minimal Universality and core affect (Russell 1995). There are many similarities between the appraisal and dimensional theories of emotion, but one crucial difference shifts the focus of the theory. The concept of self in appraisal theory is relational, with the emotional self existing in relation to the objects evoking emotions. In dimensional theory, the self is predicated on deep psychological subjectivity in the Jamesian spirit. Russell considers emotions phenomenologically as psychosomatic constructs of the mind. These constructs are complete and consist of the physiological arousal associated with a given emotion, a catalog of objects, contexts, and situations that can evoke that emotion, and the names of emotions (Russell 2003). Russell and his collaborators have found that the names of emotions are critical in processing emotional expressions, both one’s own and others’ (Lindquist et al. 2006). Concepts of emotion are not in any way unique at the level of mental representations. They are, according to Russell, organized prototypically, with classic fuzzy boundaries overlapping with both other emotional and nonemotional concepts (Russell 1983). Such an approach allows Russell to explain both the cross-cultural variability of emotions and the polysemy of certain emotion terms (Russell 1991). This overlap also means that, for Russell, emotion terms, however important, remain merely “guideposts” to meaning (Russell and Barrett 1999). Still, from those “guideposts” we can infer a great deal about the culture, history, and environment that shaped both the psyche and the language of the people who use them. By making this argument, Russell circled back to the fundamentals of Whorfian relativity in its original form, though as a psychologist he was still not very comfortable with the idea (Russell 1991).
Russell’s definition of emotions is thus almost organic, and because it is predicated on the idea of subjective experience and constructed phenomenologically, it is also easily adaptable in application. Despite his views seemingly approaching the positions of linguistic relativity, Russell was a universalist of his own kind. A significant portion of his ideas about the universal versus culture-specific nature of emotional experiences formed in the course of his detailed and systematic revision of the standard view of emotions (Russell 1994). The main point of disagreement between Ekman and Russell on the universality of emotion was one of degree. For Russell, emotional experience was universal at the level of the two primal dimensions of valence and arousal, which are distinct but subjectively experienced as a uniform sensation (Russell 2003). Because emotion terms are “guideposts” to emotion, Russell deduced those basic dimensions from multiple semantic scales of affective meaning that converged statistically on valence and arousal (Russell and Mehrabian 1977). The two dimensions correlated with evidence from physiology and could be determined for every emotion in virtually every language (e.g., Russell et al. 1989). Russell called the subjective experience of these two dimensions as a single affective state “core affect” (Russell and Mehrabian 1977). Core affect, expressed in valence and arousal, is according to Russell the minimal degree of universality in emotional expression and experience across cultures, and this is what constitutes Russell’s Minimal Universality.
Russell’s definition of emotions is distinctive, as it does not fit any one scientific framework. It contains elements from various schools of psychology, such as Jamesian subjectivity, the appraisal perspective, and Ekmanian universalism, as well as aspects of anthropological linguistic relativity. Russell considers emotions an epistemological phenomenon that exists on and permeates various levels and planes of reality: the psychological, physiological, social, cultural, and linguistic. At the same time, every aspect of his theory and of his model of emotions, the circumplex (see Chap. 4), is backed by solid empirical evidence. Within Minimal Universality, Russell accepts the universal nature of emotional experience in its evolutionary and physiological roots, expressed in the conscious experience of core affect. Whatever dimensions of meaning extend beyond core affect are subject to relativistic effects and can be used to measure and operationalize the cross-cultural variability of emotions. But he also observes that although language is an ontologically late aspect of emotional experience, this does not make language meaningless for how we perceive and express emotions in ourselves and others. Emotion terms are part and parcel of emotion concepts, and they influence how we acquire and implement those concepts. From another perspective, emotion terms are “guideposts” pointing the informed interpreter toward the correct emotional meaning. Thus, the language of emotions is both the medium and a constitutive part of emotions. Still, his theory of emotions within the dimensional framework explains what emotions are, but not exactly how they become what they are. That detail would be worked out by Lisa Feldman Barrett and her collaborators within the Conceptual Act Theory (CAT) of emotions.
2.1.5.3 Integrative Theory of Emotions
The mainstream of psychology has been dominated by the standard view since the 1970s.
All other theories of emotion were developed on the fringes, some as continuations of older ideas, such as the appraisal theories, others as a critical response to the reductionism inherent in the standard view. James A. Russell was among the first to openly and systematically question the hegemony of the standard view and propose an empirically grounded alternative. His work on Minimal Universality and core affect, however, already hinted at a bigger idea: operationalizing the phenomenological space beyond core affect into a new kind of theory and model of emotional processing. This idea would be fleshed out by another great revisionist of Paul Ekman’s theory, Lisa Feldman Barrett, and her team. The crux of this new idea was to combine the empirically backed aspects of various theories of emotion from multiple disciplines and create a complex, integrated research framework that could at last produce a consensus on the nature of emotions. The idea grew partly from dissatisfaction with the reigning reductionist paradigms and partly from the increasingly vocal revisionist movements systematically reexamining both the methods and the conclusions of the standard view (e.g., Elfenbein and Ambady 2002a, b). Barrett’s Conceptual Act Theory stands as one of the most advanced examples of the integrative approach to date.
Barrett’s definition of emotions has a lot in common with Russell’s, but focuses significantly more on the causal mechanics of emotion processing. Emotions, according to Barrett (2011), do not exist inherently as discrete phenomenological entities but are constructed ad hoc based on existing knowledge and past experience. We are born with certain physiological primitives, mainly a sense of valence and arousal, which ensure our survival. Core affect works continuously, like a kind of hedonic barometer for self-preservation. Then, in the course of social learning and the acquisition of language and cultural norms, we build upon those primitives. Further, more complex appraisals help us build the concept of self and of the situation we are in, and through language we learn to name particular configurations of core affect, complex appraisals, and contexts as particular emotions. We store our knowledge of such instances, labeled with emotion terms, in long-term memory and learn to deploy this knowledge when necessary (Barrett et al. 2011). Emotion, says Barrett, is a temporally complex construct, whereby core affect is excited by an external object. Our minds then quickly match the situational antecedents to our catalog of emotion concepts stored in long-term memory. This matching produces what we subjectively experience as a specific emotion, whether one of Ekman’s basic six or any other. The matching may be imperfect, activating only selected parts of concepts and thus giving us a clear emotional sensation which may nevertheless be hard to define or name exactly. The “Conceptual” in “Conceptual Act Theory” thus stands for the overlapping appraisals and memory matching, which in combination with the sensations from core affect produce the psychosomatic sensation of emotion. The “Act” stands for the ad hoc nature of emotion construction (Barrett 2006).
In this definition of emotions, words play a role in the process of social learning, as they form something between the Russellian guideposts and the Ekmanian simulacra: labeled directories into which all relevant knowledge of a particular emotional episode is stored (Barrett 2011). Within those directories, knowledge is organized prototypically, with classic fuzzy boundaries between emotional and nonemotional concepts. At the same time, emotion words are entrenched as parts of the emotion concepts they refer to. Barrett’s team has shown that semantic satiation with an emotion term can block access to the entire emotion concept that the term refers to (Lindquist et al. 2014). Language is thus at the heart of what Barrett calls the emotion paradox. Human beings are extremely good at experiencing and perceiving emotions, but find it extremely hard to name them exactly. There is always a margin of error, a gray definitional area in determining and naming an emotion. This is because the names we have for emotions point our minds to a fuzzy concept which may overlap with other emotional or nonemotional concepts. In the mental lexicon, postulates Barrett, there can be no such thing as a discrete category with firm boundaries (Wilson-Mendenhall et al. 2011). For this reason, the standard view’s claim that anger, fear, sadness, happiness, disgust, and surprise constitute not only discrete but also universal categories is too radical (Barrett 2006). Even if equivalent names for these emotions can be found across cultures, they may not occupy equivalent conceptual prototype positions or have the same antecedents, associated display rules, etc. Barrett thus points out the English-centricity of the standard view and its six-way division of the basic emotions. She also points out that the high levels of cross-cultural agreement on this division are a methodological artifact of the preferential use of forced-choice paradigms, with the six predefined basic emotion categories imposed as the options to choose from (Barrett 2011). Barrett’s Conceptual Act Theory thus accepts Minimal Universality as its core. It demotes the basic Ekmanian emotions from central universal constructs to a few of the many emotions acquired by English speakers in the course of their socialization and enculturation. It determines the nature of the emotion paradox and explains why it exists, thus possibly creating a path toward a general consensus on the definition of emotions. It integrates and explains the role of language in the acquisition, formation, and processing of emotions in transparent terms. It demonstrates the problems of cross-linguistic nonequivalence and the functional distribution of certain terms in everyday use. It boldly postulates that emotions do not exist as inherent categories beyond the limits of Minimal Universality, but are constructed as concepts in the course of socialization and enculturation. The subjective experiences of emotion, in turn, are constructed ad hoc based on our acquired knowledge, experience, and culture. The model based on this theory is one of the most complex and inclusive proposed to date. Including elements from appraisal theory, dimensional theory, and the standard view, as well as from anthropology and linguistics, the Conceptual Act Model (CAM; see Chap. 4) is the culmination of the protracted argument over the multiple dichotomies of emotion research and emotion theory (see Lutz and White 1986).
2.1.6 Conclusions—Emotional Universalism
It could be said that the history of emotion research and theory in psychology is one of continuous deconstruction. The initial ideas proposed by Darwin, Wundt, and James were broad, rich in detail, and hedged with multiple conditional caveats that captured the nuances of emotion processing on multiple levels. These ideas were then picked up fragmentarily, often out of context and stripped of their carefully placed caveats. Emotion research and theory faced the same struggles and challenges to their scientific legitimacy as their parent discipline did. The balance between ethics on the one hand and the ecological validity and effectiveness of research on the other has perhaps come the longest way since Landis’ knives and rats. And the theory went from positions in which both language and culture were embraced as factors in emotion research, through wholesale rejection and denial of both, back to reconciliation again. Emotion research itself went from the introspective theory of the forefathers, through the reductionist experimental paradigms of the physiologist-psychologists of the early twentieth century, to a solid dynamic of data-driven models and theory. The models were probably the most valuable concepts applied to the theory and practice of emotion psychology, as they structured and guided research on emotions and simplified the process of proposing and testing hypotheses. The theory and research in emotion psychology thus swung from broad construction to narrow reductionism and deconstruction and back to broad construction again, with an ever-increasing emphasis on empiricism.
2.2 Between Specificity and Universalism—Conclusion
If there is a conclusion to be drawn from the dichotomy of the universalist versus culture-specific nature of emotions, it is that emotions are not an easy object to investigate systematically, mainly because they are not a coherent phenomenon that obeys simple laws and yields to structured description. They can be considered on many levels, from the biological through the psychological to the linguistic-cultural, and the truth of their nature cannot be determined on any single level, but only across all these levels and at their intersections. Anthropologists observed cultures, languages, and behaviors and inferred the probable principles of human psychological makeup that might govern them. Psychologists have postulated and demonstrated the existence of certain psychological constructs that cause certain emotional reactions and behaviors. Each discipline approached the subject in its own way, and they ultimately reached similar conclusions. Anthropology found that culture and language are reflections of our partially universal mental and emotional reality. Psychology found that culture and language influence how our thoughts and emotions work. The stable tradition of anthropological research on emotions, in the pure, non-vulgarized tradition of the linguistic relativity principle, produced substantial and compelling evidence of the expressive and perceptual diversity of emotional phenomena. Psychology struggled its way from holistic through reductionist to integrative approaches to emotions, searching all the while for the elusive universal principle of emotional expression and reluctantly embracing the evidence for emotion variability. The reconciliation between the two disciplines is a fact in all but name in the latest integrative theories and models of emotional processing. And yet there is at least one unresolved issue between the two: the issue of language. However defined, language is a slippery issue.
Emotions manifest themselves in communication with people, in interaction with objective reality, or in a memory or future projection of such communication or interaction. And they manifest themselves through verbal and nonverbal channels. We speak with our bodies and with our voices, and these acts of speech are usually the doorways to understanding the underlying emotion for both psychology and anthropology. It is therefore curious that the discipline whose domain is language does not appear in the history of emotion research. Linguistics is indeed the great absentee from the historical debate on the nature of emotions, despite the fact that virtually all research on emotions is in some way, and to a significant degree, language-based. Whether emotion terms are considered guideposts, simulacra, or parts of concepts, the linguistic labels for the underlying processes have always played a significant role in psychological investigation. In anthropology, language was one of the objects of investigation, from syntax through semantics to pragmatics. And yet actual formal linguistic analysis is absent from both disciplines. Linguistics has, however, developed a number of concepts, tools, and methods which could disambiguate many gray areas of emotion research where language is involved. It has the capacity to complement anthropological analyses and psychological models. Therefore, before turning to psychological models of emotions (Chap. 4), I will discuss the great absentee, linguistics, and how it can help explain what emotions are (Chap. 3).
References
Arnold, M. B. (1960). Emotion and personality. Vol. I: Psychological aspects; Vol. II: Neurological and physiological aspects. New York: Columbia University Press.
Averill, J. R. (1983). Studies on anger and aggression: Implications for theories of emotion. American Psychologist, 38(11), 1145–1160.
Barrett, L. F. (2006). Are emotions natural kinds? Perspectives on Psychological Science, 1(1), 28–58.
Barrett, L. F. (2011). Constructing emotion. Psychological Topics, 20(3), 359–380.
Barrett, L. F., Mesquita, B., & Gendron, M. (2011). Context in emotion perception. Current Directions in Psychological Science, 20(5), 286–290.
Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49–59.
Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1. The Center for Research in Psychophysiology, University of Florida.
Bradley, M. M., Miccoli, L., Escrig, M. A., & Lang, P. J. (2008). The pupil as a measure of emotional arousal and autonomic activation. Psychophysiology, 45(4), 602–607.
Brown, R. (1958). Words and things. New York: Free Press.
Cannon, W. B. (1931). The interrelations of emotions as suggested by recent physiological researches. The American Journal of Psychology, 25(2), 256–282.
Damasio, A. R. (1994). Descartes’ error: Emotion, reason, and the human brain. New York: Avon Books.
Darwin, C. (1859). On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. London: John Murray.
Darwin, C. (1872). The expression of the emotions in man and animals. New York: D. Appleton and Company.
Dror, O. T. (2014). The Cannon-Bard thalamic theory of emotions: A brief genealogy and reappraisal. Emotion Review, 6(1), 13–20.
Ekman, P. (1965). Differential communication of affect by head and body cues. Journal of Personality and Social Psychology, 2(5), 726–735.
Ekman, P. (1970). Universal facial expressions of emotion. California Mental Health Research Digest, 8(4), 151–158.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.
Ekman, P. (1994a). All emotions are basic. In P. Ekman & R. Davidson (Eds.), The nature of emotion (pp. 15–19). Oxford: Oxford University Press.
Ekman, P. (1994b). Strong evidence for universals in facial expressions: A reply to Russell’s mistaken critique. Psychological Bulletin, 115(2), 268–287.
Ekman, P. (1998). Afterword: Universality of emotional expression? A personal history of the dispute. In C. Darwin, The expression of the emotions in man and animals (pp. 363–393).
Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. J. Power (Eds.), Handbook of cognition and emotion (pp. 45–60). New York: John Wiley and Sons Ltd.
Ekman, P. (2003). Sixteen enjoyable emotions. Emotion Researcher, 18, 6–7.
Ekman, P., & Cordaro, D. (2011). What is meant by calling emotions basic. Emotion Review, 3(4), 364–370.
Ekman, P., & Friesen, W. V. (1967). Head and body cues in the judgment of emotion: A reformulation. Perceptual and Motor Skills, 24, 711–724.
Ekman, P., Sorenson, E. R., & Friesen, W. V. (1969). Pan-cultural elements in facial displays of emotion. Science, 164(3875), 86–88.
Elfenbein, H. A., & Ambady, N. (2002a). On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychological Bulletin, 128(2), 203–235.
Elfenbein, H. A., & Ambady, N. (2002b). Is there an in-group advantage in emotion recognition? Psychological Bulletin, 128(2), 243–249.
Ellsworth, P. C., & Scherer, K. R. (2003). Appraisal processes in emotion. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 572–596). New York and Oxford: Oxford University Press.
Eysenck, M. W., & Keane, M. T. (1995). Cognitive psychology: A student’s handbook. Hoboken: Taylor and Francis.
Fischer, S. (2013). About face. Boston Magazine. http://www.bostonmagazine.com/news/article/2013/06/25/emotions-facial-expressions-not-related. Accessed 3 Mar 2015.
Goodman, R. (2013). William James. The Stanford encyclopedia of philosophy (Winter 2013 Ed.). E. N. Zalta (Ed.). http://plato.stanford.edu/archives/win2013/entries/james. Accessed 1 Mar 2014.
Hess, U., & Thibault, P. (2009). Darwin and emotion expression. American Psychologist, 64(2), 120–128.
James, W. (1890). The principles of psychology. New York: Henry Holt and Company.
James, W. (1893). Psychology. New York: Henry Holt and Company.
Jones, E., Fear, N. T., & Wessely, S. (2007). Shell shock and mild traumatic brain injury: A historical review. American Journal of Psychiatry, 164(11), 1641–1645.
Kappas, A. (2006). Appraisals are direct, immediate, intuitive, and unwitting and some are reflective. Cognition and Emotion, 20(7), 952–975.
Kim, A. (2014). Wilhelm Maximilian Wundt. The Stanford encyclopedia of philosophy (Winter 2014 Ed.). E. N. Zalta (Ed.). http://plato.stanford.edu/archives/win2014/entries/wilhelmwundt. Accessed 3 Mar 2015.
Kleinginna, P. R., & Kleinginna, A. M. (1981). A categorized list of emotion definitions, with suggestions for a consensual definition. Motivation and Emotion, 5(4), 345–379.
Landis, C. (1924). Studies of emotional reactions. General behavior and facial expression. Journal of Comparative Psychology, 4(5), 447–501.
Lazarus, R. S. (1981). A cognitivist’s reply to Zajonc on emotion and cognition. American Psychologist, 36, 222–223.
Lazarus, R. S. (1984). On the primacy of cognition. American Psychologist, 39(2), 124–129.
Lazarus, R. S. (2006). Emotions and interpersonal relationships: Toward a person-centered conceptualization of emotions and coping. Journal of Personality, 74(1), 9–46.
Lindquist, K. A., Barrett, L. F., Bliss-Moreau, E., & Russell, J. A. (2006). Language and the perception of emotion. Emotion, 6(1), 125–138.
Lindquist, K. A., Gendron, M., Barrett, L. F., & Dickerson, B. C. (2014). Emotion perception, but not affect perception, is impaired with semantic memory loss. Emotion, 14(2), 375–387.
Luria, A. R. (1947). Travmaticheskaya afaziya. Klinika, semiotika i vostanovlitel’naya terapia. Moskva: Izdatel’stvo Akademii Meditsinskich Nauk SSSR.
Lutz, C., & White, G. M. (1986). Anthropology of emotions. Annual Review of Anthropology, 15, 405–436.
Panksepp, J. (1998). Affective neuroscience: The foundations of human and animal emotions. Oxford: Oxford University Press.
Pajares, F. (2002). William James: Our father who begat us. In B. J. Zimmerman & D. H. Schunk (Eds.), Educational psychology: A century of contributions (pp. 41–64). London: Lawrence Erlbaum Associates.
Russell, J. A. (1983). Pancultural aspects of the human conceptual organization of emotions. Journal of Personality and Social Psychology, 45(6), 1281–1288.
Russell, J. A. (1991). Culture and the categorization of emotions. Psychological Bulletin, 110(3), 426–450.
Russell, J. A. (1994). Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies. Psychological Bulletin, 115(1), 102–141.
Russell, J. A. (1995). Facial expressions of emotion: What lies beyond minimal universality? Psychological Bulletin, 118(3), 379–391.
Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172.
Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76(5), 805–819.
Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11, 273–294.
Russell, J. A., Lewicka, M., & Niit, T. (1989). A cross-cultural study of a circumplex model of affect. Journal of Personality and Social Psychology, 57(5), 848–856.
Sayette, M., Cohn, J. F., Wertz, J. M., Perrot, M. A., & Parrott, D. J. (2001). A psychometric evaluation of the facial action coding system assessing spontaneous expression. Journal of Nonverbal Behavior, 25(3), 167–185.
Schachter, S., & Singer, J. E. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69(5), 379–399.
Scherer, K. (1997). The role of culture in emotion-antecedent appraisal. Journal of Personality and Social Psychology, 73(5), 902–922.
Scherer, K. R., Clark-Polner, E., & Mortillaro, M. (2011). In the eye of the beholder? Universality and cultural specificity in the expression and perception of emotion. International Journal of Psychology, 46(6), 401–435.
Scherer, K., & Wallbott, H. G. (1994). Evidence for universality and cultural variation of differential emotional response patterning. Journal of Personality and Social Psychology, 66(2), 310–328.
Schlosberg, H. (1954). Three dimensions of emotions. The Psychological Review, 61(2), 81–88.
Skloot, R. (2010). The immortal life of Henrietta Lacks. New York: Macmillan.
Taylor, B. (2009). Paul Ekman. TIME Magazine. http://content.time.com/time/specials/packages/article/0,28804,1894410_1893209_1893475,00.html. Accessed 1 Feb 2014.
Titchener, E. B. (1914). An historical note on the James-Lange theory of emotion. The American Journal of Psychology, 25(3), 427–447.
Wallace, A. F. C., & Carson, M. T. (1973). Sharing and diversity in emotion terminology. Ethos, 1(1), 1–29.
Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental Psychology, 3(1), 1–14.
Whorf, B. (1956). Language, thought & reality. Selected writings of Benjamin Lee Whorf. Cambridge: MIT Press.
Wilson-Mendenhall, C. D., Barrett, L. F., & Barsalou, L. W. (2011). Situating emotional experience. Frontiers in Human Neuroscience. doi:10.3389/fnhum.2013.00764.
Winston, A. S. (1990). Robert Sessions Woodworth and the “Columbia Bible”: How the psychological experiment was redefined. American Journal of Psychology, 15, 79–83.
Woodworth, R. S. (1938). Experimental psychology. New York: Henry Holt and Company.
World Health Organization. (2015). F48 Other neurotic disorders. Neurasthenia. http://apps.who.int/classifications/icd10/browse/2015/en#/F48.0. Accessed 1 May 2014.
Wundt, W. (1894). Lectures on human and animal psychology. London: Swan Sonnenschein and Co.
Wundt, W. (1902). Outlines of psychology. Leipzig: Wilhelm Engelmann Publishers.
Zajonc, R. (1980). Feeling and thinking. Preferences need no inferences. American Psychologist, 35(2), 151–175.
Zajonc, R. (1981). A one-factor mind about mind and emotion. American Psychologist, 36, 102–103.
Zajonc, R. (1984). On the primacy of affect. American Psychologist, 39, 117–123.
Zimbardo, P. G., Johnson, R. L., & McCann, V. (2009). Psychology: Core concepts. New Jersey: Pearson.
Chapter 3
Linguistics—The Great Absentee
3.1 Introduction
Linguistics stands somewhat apart from the disciplines of anthropology and psychology. Purely linguistic perspectives are conspicuously absent from anthropological and psychological research on emotions, to the detriment of both. Over the same period that saw the greatest developments in psychology and anthropology, linguistics developed a great number of approaches to, theories of, and tools for the investigation of language. These could not only complement the emotion theories and models developed in anthropology and psychology but also clarify many gray areas in the interpretation of evidence from both disciplines. There were several reasons for the isolation of linguistic expertise from psychological and anthropological research on emotions. For one thing, first Saussurean structuralism and then the Chomskyan Revolution turned linguistics inward, toward increasingly abstract notions of both prescriptivism and an ideal “Language”. For another, academic circles typically lumped linguistics together with literary studies as an inherently armchair-style humanities discipline. Whatever empirical developments took place in linguistics were either largely overlooked or classified as belonging to other disciplines. Thus psychologists of emotion, plagued by the emotion paradox, never appear to have turned to lexicography, semantics, or corpus linguistics to define the links between words, their meanings, and their usage, respectively. Research in interdisciplinary fields such as psycholinguistics, neurolinguistics, and sociolinguistics was often classified as psychology, neurology, or sociology, respectively, usually because language in such studies served as a functional gateway to underlying psychological, neurological, or sociological processes. Though neglected, however, linguistics retained its great potential as a supporting discipline in emotion research.
© Springer International Publishing Switzerland 2016 H. Bąk, Emotional Prosody Processing for Non-Native English Speakers, The Bilingual Mind and Brain Book Series 3, DOI 10.1007/978-3-319-44042-2_3
3.2 From Saussure to Chomsky—The Great Abstraction
In the first half of the twentieth century, linguistics was dominated by Saussurean structuralism. Ferdinand de Saussure is considered one of the forefathers of modern linguistics. A Swiss linguist who lived from 1857 to 1913, he had an interest in semiotics that would influence his theory of language (Joseph 2002). Despite his early death, Saussure became almost literally the teacher and mentor of all linguists until the advent of the Chomskyan Revolution. The metaphor is hardly an exaggeration, as Saussure’s only major publication was a collection of his lectures, recorded and edited by his students and published posthumously in 1916 as the Course in General Linguistics. Before Saussure, linguistics was devoted primarily to diachronic (comparative, historical) studies of language, concerning itself with questions of language origin and change, etymologies, and grammars. While Saussure himself did some work in this vein, his innovation was to legitimize synchronic studies of language, that is, studies investigating the current state of a language (McCabe 2011). Linguistics could and should, proclaimed Saussure, focus on what languages are and not exclusively on how they came to be what they are. He laid the foundations for the many forms of formal structuralism by distinguishing between the structure of language, which he called la langue, and its functional use, which he called la parole. La langue paralleled Sapir’s understanding of language as a dual-code representational system, while la parole was language as evident in everyday use, with all its pragmatic layers of meaning. La langue was formal linguistic competence; la parole was linguistic performance. Saussure believed la langue to be largely universal in its overall structure and mechanics, and thus stable across languages, and held that its abstracted, perfect form should be the main object of linguistic investigation.
La parole, in turn, was highly variable, subject to the vagaries of context and of speakers’ imperfect competence, and should be explored only to gather evidence for our understanding of la langue (Joseph 2002). The idea of investigating an abstract and perfect version of language and its idealized structures would prove enduring. As in any discipline, opposing ideas about the nature of the main object of investigation were proposed and argued over even while the linguistic mainstream was firmly occupied by structuralist and post-structuralist theories. On the whole, therefore, the only point of agreement in the field’s definition of language was that it is a complex means of communication. In the late 1950s this idea would shift rather dramatically. What began as a novel idea first formulated by Noam Chomsky in his paper for the 1956 MIT Symposium on Information Theory would blossom into a radical new direction in linguistics. This paper and Chomsky’s later publications would enliven the field for a relatively brief period (about 20 years) by promising a simple and systematic solution to the fundamental problem of describing all the complexity and chaos of human languages. This promise, long and impressively dubbed the “Chomskyan Revolution” (Newmeyer 1986), however,
dissolved into increasingly abstract, abstruse, and fractious arguments over the finer details of Universal Grammar theory, and many linguists grew increasingly disillusioned with Chomskyanism as the mainstream (Droste and Joseph 1991). Funded by the US Department of Defense (Chomsky 1956), Chomsky’s early research was dedicated to the development of computing languages for the fledgling field of computer science (Joseph 2002). Chomsky’s (1956) presentation already set the goal for the science of linguistics: to describe an idealized “Language”, an object unbiased by the performative aspects of communication, with the kind of systematic simplicity and rigor that was the norm in mathematics. Chomskyan linguistics would pursue this basic premise toward ever greater abstraction from semantics, pragmatics, and every aspect of actual language use, favoring instead the idea of a perfect “deep structure” of grammar carried by the idealized speaker–hearer (Joseph 2002). Mature Chomskyan thought, with its mentalistic approach to language as a phenomenon, rejected the communicative aspects of language altogether (Allan 2009), which goes some way toward explaining why problems such as social interaction or emotions rarely graced mainstream linguistics in the second half of the twentieth century. Whereas Saussure still encouraged linguists to go out into the field and investigate la parole to improve our understanding of la langue, Chomsky’s successors focused so intently on an idealized abstraction of “Language” that they actively dismissed the performative aspects of language as irrelevant. In this they came dangerously close to the deterministic vulgarization that was the strong version of the Sapir–Whorf Hypothesis (Brown 1958). But, as in psychology, alternative approaches to the investigation of language continued to be developed, some within and some outside the mainstream.
Semiotics explained the relationship between language, its mental representation, and objective reality. Semantics delved into problems of linguistic meaning, which lexicography then cataloged. Addressing the risk that increasingly formal descriptions would lower the ecological validity of the object of investigation, corpus linguistics aimed to document and illustrate how language is actually performed. Finally, pragmatics set out to explain systematically the context of language performance and the processes governing the perception and salience of contextual elements in interpersonal communication. Admittedly, these subdisciplines rarely ventured into emotion research specifically, but the language–cognition entanglement they tackled paralleled the language–emotion–cognition entanglement problems in psychology and anthropology. Additionally, the isolation of linguistics from other fields makes its conclusions particularly intriguing: although the discipline was largely free of anthropological or psychological preconceptions about emotions, in many areas it reached the same conclusions as those two disciplines. And because linguistics focused primarily on language and communication, its perspective complemented those of anthropology, which focused on culture, and psychology, which focused on underlying psychoaffective processes.
3.3 Semiotics
For both anthropologists and psychologists, the basic idea of an emotion occurrence is that an object existing in external reality causes an emotion in the self. The entire situation is balanced between objective reality and the subjective self, with the body serving as a sort of avatar of the latter within the former. Linguistics focuses on language rather than on the psyche it communicates or the culture it reflects. It is therefore unrestrained by the need to incorporate subjective perspectives or cultural variability into its understanding of language, thought, reality, and how they relate to one another. This basic relationship is investigated by semiotics, one of the sister disciplines of linguistics (Nöth 1990). The first basic formulation of the relationship came from Ferdinand de Saussure, who proposed a thoroughly subjective, psychological model whereby concepts (representations of objective reality) are connected indirectly to arbitrary signs (words). In this model the relation to objective reality is merely implied, not explicit (Saussure 1959). Words are arbitrary points of reference to specific concepts, an idea echoing the conventionality of language and the dual system of language proposed by Edward Sapir. Through this and later semiotic models of linguistic denotation, the linguistic relativity principle in its pure form did gain a measure of acceptance in formal linguistics (Allan 2009). The most widely accepted and implemented model of linguistic denotation includes the phenomenological triad of language, thought, and reality. The model was originally proposed by Charles K. Ogden and Ivor A. Richards in 1923 and survives with minimal modifications to this day as the semiotic triangle (Fig. 3.1). The basic idea behind the semiotic triangle was not new in 1923: its philosophical roots can be traced back to the Greek Stoics of the second century BCE and, further still, to Plato’s Cave (Allan 2009).
The novelty of Ogden and Richards’ model lay in its attention to the nature of the relationships between its different parts. The model consists of language (the conventional code: words), thought (mental representations of reality: concepts), and referents (objects denoted in objective reality), as well as the relations between them (Ogden and Richards 1923). The relationship between thought and reality is direct, as semiotics assumes that we form representations through the unbiased evidence of our senses. The relationship between language and thought is also direct, because language, though an arbitrary creation, is internalized and mentally represented.
Fig. 3.1 Basic semiotic triangle
The relationship between language and reality is indirect, as it is assumed that nothing inherent in formal language nonarbitrarily reflects objective reality. Words, Ogden and Richards postulate, merely point to aspects of reality but are not inherent parts of it. The semiotic triangle is a very basic idea that can easily be extended to emotions, which are technically the kind of mental entity that semiotics classifies as concepts or mental representations. It also spells out a problem that had merely floated indistinctly in the background of psychological and anthropological theories of emotion. Anthropologists postulated that emotion terms reflected the mass mind of every culture but had no tools to investigate the mind. Psychologists had the tools but were deeply uneasy about language as a constituent of emotion in any sense, treating it merely as a guidepost or simulacrum of emotion. By postulating a definitively arbitrary but direct link between language and thought, semiotics defines and clarifies the problem of this relationship. Language is a constituent part of every concept, but it is arbitrary, and as such it can shift and change as new circumstances of thought or reality arise. The link between language and reality is sensibly made indirect, leaving room for both contextual variability of denotation and polysemy of emotion terms. Furthermore, if the direct connection between reality and thought is taken to represent the psychological notion of the self, virtually all major psychological models of emotion, as well as the linguistic relativity principle, could be reconciled within the semiotic triangle. The direct connection between language and thought would then stand for the enculturation and socialization processes that conventionalize acquired perceptions of reality.
The language–thought side would represent the Sapirian dual language system, with appraisals forming a perceptual filter between this system and objective reality. Ekman’s basic emotions would be confined to the mental representation constituent, with an active feedback loop to objective reality. Barrett’s integrative model would include both Sapir’s dual language system and Ekman’s subsystem. The beauty of the semiotic triangle lies in its simplicity and in the fact that it does not prioritize any of its constituents over the others. In other words, it can serve not only as a model for emotion definitions but also as a model explaining the disparities between different approaches to the nature of emotion.
3.4 Semantics
Semantics, in the simplest of terms, is the study of meaning. It is rooted in the basic principles of semiotics but also draws on pragmatics for associative meanings. It concerns itself with words, with what and how they mean, and with the meaning relations between words. Semantics is also closely connected with lexicography, that is, the study of dictionaries and the means of compiling them, and with corpus linguistics, that is, the quantitative and qualitative study of large bodies of texts (corpora). With the advance of computer-assisted language research
the three disciplines began to truly support and inform one another. At the same time, with the ponderous but deliberate shift toward empiricism in linguistics, semantics became increasingly technical and reached out to fields such as psychology and anthropology to broaden the scope of its investigation of meaning. The most advanced approach to semantics in this vein was developed by the linguist Anna Wierzbicka. Over an active academic career spanning half a century, she has deservedly gained recognition in both anthropology and psychology for her insights into the nature of linguistic meaning, its function in creating culture, and its role in the human psyche (Wierzbicka 1986b). Her major contribution to semantics was the Natural Semantic Metalanguage, that is, a rudimentary language of perceptual–semantic primitives that can be used to assess semantic equivalence across languages. Her major contribution to emotion research was her explanation of the ways in which language can express emotions and of why the structure and meaning of different emotion terms across languages matter. The definition of the Natural Semantic Metalanguage is contained in its name. “Natural” refers to the postulate that it reflects universal perceptual primitives so basic in nature that they are limited only by the universal biology and cognitive capacity of the human species. This should also guarantee a 1:1 equivalence of all items across languages (Wierzbicka 1980). “Semantic” refers to the postulate that each item of this metalanguage is lexicalized, that is, expressed in lexical form (as a word or words) in every human language. “Metalanguage” refers to the fact that this minimal language is to be used to describe existing human languages (Wierzbicka 1986a).
Wierzbicka calls the Natural Semantic Metalanguage (NSM) the “bedrock of intellectual understanding” for all mankind, accessible to all, and particularly to laypeople unburdened by the preconceptions of scientific classifications of reality. While the NSM may sound appealing to universalist psychology, Wierzbicka proposes using it not to extol universalist approaches to human nature but to determine the points of divergence and the degree of variability of lexical expression across languages (Harkins and Wierzbicka 2001). In this respect the NSM becomes a tool for the anthropological-linguistic description of language- and culture-specific terms. The most recent version of the NSM was described in detail by Harkins and Wierzbicka (2001) and comprises 61 items. These range from substantives that serve to distinguish the self from others (items such as I, you, or someone) to perceptual and cognitive capabilities (e.g., think, know, see, or hear) to determiners of space (e.g., where, here, far, or near) and time (e.g., when, now, after, or before). While the NSM does contain the perceptual primitive feel and the evaluative primitives good and bad, Wierzbicka considers emotions too complex to fit logically within the NSM beyond these basic dimensions. By including feel for perception and good and bad for evaluation, Wierzbicka’s NSM echoes Russell’s Minimal Universality. And, like Russell, Wierzbicka falls definitively on the culture-specificity side of the argument regarding emotions and brings evidence from semantics to the theoretical debate with Ekmanian universalism. There is one crucial difference between
Wierzbicka’s understanding of emotions and psychological approaches. For her, emotions are not experienced and expressed, but communicated. Emotions start as basic primitives and are molded by societal and cultural pressures through acts of interpersonal communication (Wierzbicka 1986b). Through interpersonal communication we acquire the patterns in which emotions are communicated in our language, and these fall into two basic categories. First, there are the optional functions of grammar, such as diminutives, honorifics, or derogatory labels for people, all of which function grammatically as simple pronouns but carry an additional emotional semantic load. Second, there are lexicalized emotional expressions that serve specific emotive functions in communication, such as interjections, taboo words, and emotive or vulgar vocabulary (Wierzbicka 1986b). Wierzbicka also includes in the second category the differential use of registers depending on the interlocutors and their social status. She thus expands the scope of lexicalization. At its most basic level, lexicalization means the coding of a concept into a single lexical item (a word). In the case of emotions, Wierzbicka proposes, what started as a universal NSM percept can be molded into a wide variety of lexical expressions (words, phrases, metaphors, or circumlocutions), all referring to the same basic universal emotive core in culturally specific ways (Wierzbicka 1998). For example, one of the Aboriginal Australian languages investigated by Wierzbicka makes no distinction between what the English-speaking world knows as the emotions of fear and shame. In English-speaking cultures, everyday situations in which shame is felt without fear, or fear without shame, do occur. In this Aboriginal culture, similar circumstances produce what to a Westerner would be a mixture of fear and shame (Wierzbicka 1986a).
Such differences in lexicalization reflect what Wierzbicka calls the “habits of the heart” of every culture (Harkins and Wierzbicka 2001). Wierzbicka thus extends the Whorfian idea of habituation to the ways in which we experience emotions and communicate that experience. If semantics is the study of meanings, lexicography is the study of recording words and their meanings for general use as well as for posterity. Lexicography and corpus linguistics, two sides of the same coin, have great potential to complement the study of emotion terms. In comparison with both psychology and anthropology, lexicography, the study of compiling dictionaries, has a truly venerable tradition. Apocryphally, lexicography formally started in 1755 with the publication of Samuel Johnson’s Dictionary of the English Language. It then progressed laboriously until the dawn of the digital age, when it accelerated to the point of almost matching the pace of language change, an unprecedented feat in the history of linguistics (Durkin 2015). For emotion research, the greatest potential lies in the typical monolingual dictionary of definitions paired with a thesaurus. Together they will typically provide an exhaustive list of all words that share the basic denotation of a specific emotional state. Because dictionaries and thesauri record all language with little discrimination, such a list will often contain colloquial, regional, or even dated terms. However, such a list will have been verified by a panel of specialist lexicographers and native speakers as denoting a specific emotion with specific antecedents and consequences (Hartmann and James 1998). Such an
exhaustive list may be useful, as it captures multiple registers according to the habitual patterns of expression of a specific culture (Wierzbicka 1986b). What dictionaries could not tell us until about the 1980s was the likelihood of encountering one emotion term rather than another denoting the same or a similar emotional phenomenon (Scott 2012). That kind of data would be provided by corpus linguistics. Corpora, whose collection and analysis were made possible by the advent and rapid development of computational linguistics, allow researchers to investigate how specific words are used, in what contexts, and in collocation with which other words. Some, such as the British National Corpus, even provide partial demographic information on the authors of the recorded texts (Jensen 2014). Corpora are organized by the source of the texts they contain (e.g., records of spoken language, news media, literature, academic publications), and all items within them are tagged for part of speech. Crucially, they allow the frequency of occurrence of specific words to be analyzed in absolute terms (total frequency in the corpus) and in relative terms (e.g., values on the Zipf scale). In recent years two online corpora have grown in relevance: the Corpus of Contemporary American English, known as COCA (Davies 2008), and the SUBTLEX initiative (Van Heuven et al. 2014). COCA is a massive online corpus (520 million words) of varied texts and registers, which allows researchers to investigate not just how emotion terms are used but in what contexts and circumstances. SUBTLEX is a set of corpora compiled from movie subtitles in Dutch (Keuleers et al. 2010), Chinese (Cai and Brysbaert 2010), American English (Brysbaert et al. 2009), British English (Van Heuven et al. 2014), Spanish (Cuetos et al. 2011), German (Brysbaert et al. 2011), Greek (Dimitropoulou et al. 2010), and Polish (Mandera et al. 2014).
The idea behind SUBTLEX is to capture maximally natural spoken expression: its compilers use movie subtitles on the assumption that movie scripts are written with spoken language in mind. Combined, the COCA and SUBTLEX corpora can provide excellent insight not only into what emotion words mean in context but also into how frequently they are used and, roughly, by whom (by contrasting registers in COCA). To summarize, semantics is the study of word meaning, lexicography is the study of recording words and meanings, and corpus linguistics is the study of word distribution. Combined, they can inform emotion research in the one area that remains poorly explored: the language of emotions. As I remarked in Chaps. 1 and 2, anthropologists investigate language as merely a reflection of culture, and psychologists are deeply uneasy about incorporating emotion language into their investigation of emotions proper. The reluctance in both fields to delve deeper into the matter of language and meaning in the thought–language relation of the semiotic triangle is perfectly understandable. Linguistic signs are arbitrary, after all, and emotions themselves, as both disciplines agree, are expressed primarily through nonverbal and largely unstructured body language, while the verbal communication of emotions is subject to the emotion paradox. Semantics, however, especially within the framework proposed by Anna Wierzbicka, postulates quite reasonably that the arbitrariness of language does not make it meaningless. For one thing, linguistic evidence from semantics supports Russell’s Minimal
Universality. For another, the systematic and functional analyses of emotion language made possible by lexicography and corpus linguistics, respectively, allow for valid and reliable investigation of emotion language. Semantic analysis using dictionary and corpus data can serve as a point of comparison or contrast for the results of experiments using free-labeling paradigms in emotion recognition tasks. An analysis of the valence and arousal measures inherent in certain emotion or emotive terms can additionally guide investigations into the nature of the language–emotion interface. Thus semantics can complement emotion research in terms of what is verbalized.
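The absolute and relative frequency measures mentioned above in connection with corpora can be made concrete with a short calculation. The Python sketch below converts a raw corpus count into a frequency per million words and into a value on the Zipf scale, which Van Heuven et al. (2014) proposed as the base-10 logarithm of a word’s frequency per billion words (equivalently, log10 of the frequency per million words plus 3). It is a minimal sketch under stated assumptions: the three emotion terms, their counts, and the 50-million-word corpus size are invented for illustration and are not taken from COCA, SUBTLEX, or any other real corpus; the function names are hypothetical; and the simple unsmoothed formula is used, so words with a count of zero are rejected rather than corrected.

```python
import math

def per_million(count: int, corpus_size: int) -> float:
    """Relative frequency: occurrences per million words of running text."""
    return count * 1_000_000 / corpus_size

def zipf_value(count: int, corpus_size: int) -> float:
    """Zipf scale value: log10(frequency per million words) + 3,
    i.e. log10 of the frequency per billion words.
    Unsmoothed, so the count must be positive."""
    if count <= 0:
        raise ValueError("the unsmoothed Zipf formula needs a positive count")
    return math.log10(per_million(count, corpus_size)) + 3

# Hypothetical counts for three emotion terms in a hypothetical
# 50-million-word corpus (illustrative numbers only, not real corpus data).
CORPUS_SIZE = 50_000_000
counts = {"afraid": 12_500, "terrified": 1_500, "apprehensive": 250}

# Rank the terms from most to least frequent by absolute count,
# then report the two relative measures for each.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for rank, (word, n) in enumerate(ranked, start=1):
    print(f"{rank}. {word:<13} count={n:>6}  "
          f"per million={per_million(n, CORPUS_SIZE):7.2f}  "
          f"Zipf={zipf_value(n, CORPUS_SIZE):.2f}")
```

With these invented counts, afraid comes out at 250 occurrences per million words, or about 5.4 on the Zipf scale, while apprehensive comes out at 5 per million, or about 3.7. Because the scale is logarithmic, a difference of one Zipf unit corresponds to a tenfold difference in frequency, which keeps very frequent and very rare emotion terms on a compact, comparable scale.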
3.5 Pragmatics
What semantics cannot address is whatever is not expressed verbally. The contextual and associative layers of word meaning can be incorporated into word definitions and can dictate the composition of thesaurus entries, but analyzing those layers lies beyond the domain of semantics. Within linguistics, they are covered by the field of pragmatics. If semantics focuses largely on the language–thought arm of the semiotic triangle, pragmatics deals with the oblique arm connecting language with reality (Fig. 3.1). As a distinct field, pragmatics is deeply rooted in philosophy and phenomenology and has long been an open but strictly theoretical discipline. Experimental pragmatics is a novel approach that has been under development in Europe since the early 2000s through initiatives such as Euro-XPRAG and the Poznań School of Pragmatics under Roman Kopytko (Kopytko 2002). The latter has increased its output in recent years, with research on a variety of topics related to affective pragmatics, such as irony processing (Bromberek-Dyzman 2012, 2014, 2015), the neuropragmatics of emotion (Jończyk 2015; Pikusa and Jończyk 2015; Jończyk et al. 2016), and emotional prosody processing (Bąk 2013). The empirical vein in pragmatics nonetheless remains at a nascent stage, and it is the field's venerable theoretical tradition that can more reliably inform emotion research. Pragmatics, at its most basic level, is the study of communicative contexts, and as such it has developed two ideas that are central to the pragmatic approach to emotions but have been somewhat downplayed in psychology and anthropology. One is the idea of agency in the communication of emotions, developed most fully in Kopytko's notion of Individualized Affective Potential (IAP). The other is the mechanism governing human attention to those aspects of context that disambiguate the communication of ideas and emotions alike, known in pragmatics as Relevance Theory (Sperber and Wilson 2012).
In 2002 Roman Kopytko published his book The mental aspects of pragmatic theory: An integrative view. It became the cornerstone of the Poznań School of Pragmatics and set out the basic premise of affective pragmatics, which became the topic of choice for his students. Kopytko's take on affective pragmatics rests on his critique of reductionist approaches to investigating emotions in mainstream psychology, and it combines influences from various related fields into a new pragmatic framework for emotion research. His critique of reductionism targets the perspective implicit in both the research and the conclusions of mainstream psychology. In this reductionist perspective, emotions are discussed and investigated as phenomena virtually without a source, entities that can be interpreted independently of their source: emotions are simply expressed or recognized. In Kopytko's linguistic–pragmatic perspective, by contrast, emotions are communicated. What exists between interlocutors is not a void into which one party expresses emotions for another to fish out and interpret. It is a rich context of perceptions, preconceptions, display rules, and situational circumstances, and above all it is a reciprocal space in which the interlocutors' intentions of mutual communication meet. In such a space, emotions are an important aspect of communication, but only one of many. Emotions permeate every instance of communication, but they are subordinate to the communicative intent, and as such they can be controlled, negotiated, (mis)interpreted, or even strategically deployed. Kopytko's affective pragmatics theory is built within this perspective. In phenomenological terms, Kopytko sees emotions as non-Cartesian objects; that is, their nature goes beyond body–mind dualism and implicates social, cultural, and contextual factors, as well as the various interactions between these factors. Conceptually, he places emotions on a truncated continuum from mostly cognitive to mostly affective in character, with no purely cognitive or purely affective concepts at its extremes. And in the context of communication, he sees emotions as temporally dynamic phenomena with a distinct stimulus onset, a peak of intensity, and an offset; in this he comes close to Barrett's (2011) position on the continuous and undulating nature of emotional experience.
Most crucially, Kopytko introduces the notion of agency into the communication of emotion. In simple terms, this means that what matters is not so much which emotion is expressed but by whom, to whom, and in what circumstances. There are echoes here of Arnold's (1960) take on emotion appraisals, but Kopytko places greater emphasis on agency and the self in the context of communication. Specifically, he proposes that in the course of socialization and enculturation each individual acquires an Individualized Affective Potential (IAP), that is, a degree of competence and a personal style of emotional communication. Our emotional communication competence refers to our knowledge of, and ability to defer to, socially sanctioned display rules in appropriate circumstances. Our emotional communication style is our personal “habit of the heart,” as Wierzbicka (Harkins and Wierzbicka 2001) would put it, and it refers to our self-perceived license to express emotion within or outside the norms defined by our culture, language, or socially imposed display rules. Within the intentionally reciprocal communicative perspective of Kopytko's affective pragmatics, IAP thus also refers to our capacity to compensate for our interlocutors' lack of competence or vagaries of style in the communication of emotions. Taken together, pragmatics holds that emotions are to be studied within the communicative context in which they occur, including the interlocutors involved, with all the cognitive–affective capacities ensconced in their IAPs.
Furthermore, in interpersonal communication we do not merely communicate in context; we can also use context as a means of communication. Psychology and anthropology deal with the problem of the language of emotions by drawing a line between verbal and nonverbal communication and assuming that each word or expression denotes an emotional state in a simple linear fashion, from sign to meaning. Pragmatics embraces the fact that in communication denotation is anything but straightforward. Irony, humor, the strategic placement of silence, aspects of praxis: all of these rely on a partial or complete decoupling of sign and meaning. In pragmatics the divide is not between what is expressed verbally and nonverbally but between the overt, explicit signs (what is manifest in communication, verbal or nonverbal) and the covert, implicit meaning (Janney 1996). This distinction is crucial for the pragmatic theory of emotions. The fact that human beings navigate such rich and complex contextual input and still arrive at correct conclusions about their conspecifics' intended meaning implies that some kind of mechanism must support the interpretative process. Within pragmatics this mechanism is described by Relevance Theory, which Sperber and Wilson have formulated most succinctly as two basic principles:
The Cognitive Principle of Relevance: Human cognition tends to be geared to the maximum of relevance.
The Communicative Principle of Relevance: Every act of overt communication conveys a presumption of its optimal relevance.
(Sperber and Wilson 2012)
The Cognitive Principle of Relevance, in other words, dictates that we attend to those aspects of the communicative context that are most relevant to the current task of interpreting what is being communicated. The Communicative Principle of Relevance specifies that every time we communicate overtly, whether through verbal or nonverbal means, that communication carries a presumption of relevance. Thus, applying Relevance Theory to emotion communication, affective pragmatics postulates that every type of verbal or nonverbal expression of emotion in interpersonal communication is relevant and will be taken into consideration in interpreting the emotional import of the ongoing communication. It could be argued that what pragmatics achieves for emotion research is to resolve the emotion paradox by rephrasing the problem. Instead of puzzling over the mismatch between the linguistic representation, bodily expression, and psychosomatic reality of emotional experience, pragmatics simply postulates that, whether they match or not, they are all overt communication. As such, all are relevant and will be taken into account in interpreting emotions in communication. The successful conveyance of implicit emotion through explicit means is the focus of the pragmatic theory of emotion. Thus, the only things that may stand in the way of the successful communication of emotion are a lack of competence in communicating emotions or a personal style of emotion communication that includes habitually neglecting certain cues relevant to the successful recognition of specific emotions. This idea, at the convergence of Kopytko's IAP and Sperber and Wilson's Relevance Theory, may be the most significant potential contribution of linguistic pragmatics to emotion research, at least at the level of evidence interpretation. Most experiments in emotion research are built around emotion recognition tasks of various kinds. Pragmatics suggests that we do not so much recognize emotions as interpret them as an attribute of a specific speaker in specific circumstances. Hence, who the speaker is and what the circumstances are both matter for emotion recognition.
3.6 Conclusions
Emotions have rarely been considered within linguistics, but the discipline can nonetheless complement emotion research in several important ways. Most broadly, linguistics has the theoretical and practical means to investigate the language of emotions. And because a comprehensive and informed investigation of the linguistic aspects of emotional phenomena has largely been eschewed by both psychology and anthropology, linguists can fill this crucial gap. A grounding in semiotics as the organizing principle allows linguists to orient themselves among the evidence supporting different emotion theories and approaches. A long tradition of lexicographic record, together with recent developments in exploring the functional and distributional aspects of language in corpus linguistics, gives linguists a solid baseline of semantic data against which to compare the results of experiments on emotional language. And pragmatics resolves the emotion paradox and reframes the two problems of emotion expression and recognition as a single problem of reciprocal emotion communication. Anthropology, specifically anthropological linguistics, focuses on the reflections and manifestations of culture and national character in the language of emotions. Psychology focuses on the workings of the mind and body in emotions and considers language a flawed gateway to understanding the underlying psychosomatic mechanisms. In other words, anthropological evidence roughly aligns with the reality/referent point of the semiotic triangle, and psychology with the thought/concept point. What linguistics offers is an understanding of the language of emotions itself, thus completing the trifecta. It is my belief that a comprehensive understanding of the nature of emotions can only be achieved by integrating the theory and practice of all three disciplines.
This variant of integration, however, pertains only to the theoretical and methodological framing of the study presented in Chaps. 6–7. More directly, the integrative approach named in the title of this book refers to the paradigm constructed for this study. The paradigm draws generally on anthropology, psychology, and linguistics, but owes the most to the various models of emotion processing developed in psychology over the last century. Therefore, in Chap. 4 I will discuss those models that had the greatest bearing on the eventual development of the integrative paradigm for this study.
References
Allan, K. (2009). The western classical tradition in linguistics. London: Equinox.
Arnold, M. B. (1960). Emotion and personality. Vol. I: Psychological aspects; Vol. II: Neurological and physiological aspects. New York: Columbia University Press.
Bąk, H. (2013). Empathy and distress: Internal and external context dependency of emotion term selection. Poznań Studies in Contemporary Linguistics, 49(2), 137–165.
Barrett, L. F. (2011). Constructing emotion. Psychological Topics, 20(3), 359–380.
Bromberek-Dyzman, K. (2012). Affective twist in irony processing. Humana.Mente Journal of Philosophical Studies, 23, 83–111.
Bromberek-Dyzman, K. (2014). Attitude and language. On explicit and implicit attitudinal meaning processing. Poznań: Wydawnictwo Naukowe UAM.
Bromberek-Dyzman, K. (2015). Irony processing in L1 and L2: Same or different? In R. R. Heredia & A. Cieślicka (Eds.), Bilingual figurative language processing (pp. 215–240). New York: Cambridge University Press.
Brown, R. (1958). Words and things. New York: Free Press.
Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A. M., Bölte, J., & Böhl, A. (2011). The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology, 58, 412–424.
Brysbaert, M., New, B., & Keuleers, E. (2009). Adding part of speech information to the SUBTLEX-US word frequencies. Behavior Research Methods, 44(4), 991–997.
Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS ONE, 5(6), e10729.
Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory. doi:10.1109/TIT.1956.1056813.
Cuetos, F., Glez-Nosti, M., Barbon, A., & Brysbaert, M. (2011). SUBTLEX-ESP: Spanish word frequencies based on film subtitles. Psicológica, 32, 133–143.
Davies, M. (2008). The corpus of contemporary American English: 520 million words, 1990–present. http://corpus.byu.edu/coca. Accessed January 1, 2014.
Dimitropoulou, M., Duñabeitia, J., Avilés, A., Corral, J., & Carreiras, M. (2010). Subtitle-based word frequencies as the best estimate of reading behaviour: The case of Greek. Frontiers in Psychology, 1(218), 1–12.
Droste, F. G., & Joseph, J. E. (1991). Introduction: Linguistic theory and grammatical description. In F. G. Droste & J. E. Joseph (Eds.), Linguistic theory and grammatical description: Nine current approaches (pp. 1–23). Amsterdam: John Benjamins Publishing Company.
Durkin, P. P. (2015). Borrowed words. A history of loanwords in English. Oxford: Oxford University Press.
Harkins, J., & Wierzbicka, A. (2001). Introduction. In J. Harkins & A. Wierzbicka (Eds.), Emotions in crosslinguistic perspective (pp. 1–34). Berlin: Mouton de Gruyter.
Hartmann, R. R. K., & James, G. (1998). Dictionary of lexicography. London: Routledge Taylor & Francis.
Janney, R. W. (1996). Speech and affect: Emotive uses of English (Unpublished manuscript). Munich.
Jensen, K. E. (2014). Linguistics and the digital humanities: (Computational) corpus linguistics. MedieKultur: Journal of Media and Communication Research, 57, 115–134.
Jończyk, R. (2015). Hemispheric asymmetry of emotion words in a non-native mind: A divided visual field study. Laterality: Asymmetries of Body, Brain and Cognition, 20(3), 326–347.
Jończyk, R., Boutonnet, B., Musiał, K., Hoemann, K., & Thierry, G. (2016). The bilingual brain turns a blind eye to negative statements in the second language. Cognitive, Affective, & Behavioral Neuroscience, 16(3), 527–540.
Joseph, J. (2002). From Whitney to Chomsky. Essays in the history of American linguistics. Amsterdam: John Benjamins Publishing.
Keuleers, E., Brysbaert, M., & New, B. (2010). SUBTLEX-NL: A new frequency measure for Dutch words based on film subtitles. Behavior Research Methods, 42(3), 643–650.
Kopytko, R. (2002). The mental aspects of pragmatic theory: An integrative view. Poznań: Motivex.
Mandera, P., Keuleers, E., Wodniecka, Z., & Brysbaert, M. (2014). SUBTLEX-PL: Subtitle-based word frequency estimates for Polish. Behavior Research Methods, 47(2), 471–483.
McCabe, A. (2011). An introduction to linguistics and language studies. London: Equinox.
Newmeyer, F. J. (1986). Has there been a “Chomskyan Revolution” in Linguistics? Language, 62(1), 1–18.
Nöth, W. (1990). Handbook of semiotics. Bloomington: Indiana University Press.
Ogden, C. K., & Richards, I. A. (1923). The meaning of meaning. Cambridge: Cambridge University Press.
Pikusa, M., & Jończyk, R. (2015). Functional abnormalities in Broca's area in adolescents with ADHD: A resting-state fMRI study. Poznań Studies in Contemporary Linguistics, 51(1), 163–177.
Saussure, F. (1959). Course in general linguistics. New York: Philosophical Library.
Scott, M. (2012). Looking back or looking forward in corpus linguistics: What can the last 20 years suggest about the next? Ibérica, 24, 75–86.
Sperber, D., & Wilson, D. (2012). Meaning and relevance. Cambridge: Cambridge University Press.
Van Heuven, W. J. B., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-UK: A new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology, 67, 1176–1190.
Wierzbicka, A. (1980). Lingua mentalis. Leiden: Brill Academic Pub.
Wierzbicka, A. (1986a). Human emotions: Universal or culture-specific? American Anthropologist, 88(3), 584–594.
Wierzbicka, A. (1986b). Does language reflect culture? Evidence from Australian English. Language in Society, 15(3), 349–373.
Wierzbicka, A. (1998). Russian emotional expression. Ethos, 26(4), 456–483.
Chapter 4
A Different Look at Emotion Processing Models
4.1 A Different Approach to Modeling and Visualization
In Chap. 2 I traced the development of the existing theories of emotion from their roots in Darwinian thought to the most recent integrative approaches. The main focus of that chapter was to establish the definitional premises, outlining what the mainstream holds to be the nature of emotional phenomena. One of the more consistent aspects of all those definitions was the temporally dynamic nature of emotions, their existence along a timeline with a discernible onset, peak, and offset. The differences between approaches to and definitions of emotion often come down, in large part, to disagreements over the way emotions work, that is, over the chain of events leading from a stimulus to an emotion. These chains of events are usually represented as emotion processing models, which come in an impressive variety and are usually lumped into three groups: psychological, computational, and embodiment models. Computational models are constructed around the idea of maximum processing efficiency and are typically used in human–machine communication, emotion synthesis studies, and the behavioral sciences to simulate emotions on an input/output basis (Marsella et al. 2010). Embodiment models have grown out of the long-standing tradition of validating the results of psychology research through corroborating evidence from neurology and neurophysiology. The basic premise of these models is that emotion processing is partly based on the continuous reading of feedback from our neurophysiological reactions to the environment (Niedenthal 2007). However, the study reported in Part II of this book was predicated entirely on psychological models. Therefore, I limit my critical overview to the psychological models developed on the basis of the most influential theories described in Chap. 2.
Most of the emotion models in psychology explain the nature of emotional processing by placing various cognitive-affective and neurophysiological components in a logical order rather than along a specific timeline. The hierarchy of component processing and the detail in the description of components take precedence over the temporal progression of emotion processing. Despite the differences between the models, however, certain components are consistently present in all of them. Read as descriptions of temporally dynamic phenomena, the models all start with some kind of interaction between an emotion-evoking stimulus and the individual in whom the emotion is evoked. This interaction then evokes physiological excitation and cognitive excitation, which may lead to conscious awareness of the cognitive-affective state evoked. The stimulus (S), the physiological excitation (PE), and the cognitive component (CC) are fundamental components in every model, but the models differ significantly in all other details. Because of the heavy emphasis on structure and hierarchy in the existing psychological models of emotion processing, these models seem modular even if they are not. The authors of the models sometimes cut across disciplinary boundaries, mixing neuropsychological or neurochemical components into models that are otherwise purely cognitive-affective in nature. At best, the classic visualizations of these models obscure the temporal aspect of emotion processing; at worst, they ignore it altogether. This is particularly problematic for emotional prosody, which develops and is processed along a specific timeline. Therefore, rather than simply reproducing and describing the models as they are, I will reinterpret and revisualize them to emphasize the temporal aspects of each. Reinterpreting the models from this perspective reveals a very deliberate structural progression from the simplest classic models (James–Lange, Cannon–Bard, Schachter–Singer), through transitional models of moderate complexity (Universal Basic Emotions, Schlosberg's Three-Dimensional Model), to the most recent high-complexity models (Appraisal Model, Core Affect Model, Conceptual Act Model). Where relevant, the unique features of each model are incorporated and described.
© Springer International Publishing Switzerland 2016. H. Bąk, Emotional Prosody Processing for Non-Native English Speakers, The Bilingual Mind and Brain Book Series 3, DOI 10.1007/978-3-319-44042-2_4
4.2 The Classic Models of Emotion Processing
Carl Lange's Om Sindsbevægelser (Eng. On Emotions) appeared in the original Danish in 1885, preceding the publication of William James's The Principles of Psychology in 1890, but it was only translated into English and accorded proper recognition in 1912. Lange's physiological studies complemented James' psychological theory, and together their conclusions about the nature of emotions became known as the James–Lange model of emotions. The empirical work of Walter B. Cannon and Philip Bard spanned more than two decades, from the early 1910s to the 1930s; both published their results systematically throughout this period, focusing heavily on the physiological aspect of emotional phenomena. Their conclusions about the nature of emotional experience went down in history as the Cannon–Bard model of emotion processing. Stanley Schachter and Jerome E. Singer developed their own model in much the same way as their predecessors and predicated it on very similar assumptions, but introduced a measure of specificity absent from the earlier models. This model was proposed in 1962 as the Schachter–Singer model of emotions.
Fig. 4.1 Classic models of emotion processing
Collectively, the James–Lange, Cannon–Bard, and Schachter–Singer models are traditionally referred to as the classic models of emotion processing, “classic” in the sense of revered obsolescence. Although 77 years separate the first and the last of these models, there are indeed more similarities than differences between them. The typical visualizations of all three models stress their differences more than their similarities, although the theories behind the models have much in common. While all theories and models of emotion acknowledge emotions and emotion-like phenomena, the models usually describe prototypical emotional episodes caused by external stimuli (S). The three classic models treat emotions as discrete phenomena that are, in essence, direct psychophysiological responses to an emotion-evoking stimulus. They are visualized as step-by-step causal chains from a stimulus to physiological excitation (PE) associated with a cognitive component reaction (CC) at a set point in the timeline of the emotion process. Figure 4.1 illustrates the three classic models. All three classics were simple linear models starting with a stimulus and culminating in an emotional state. The major difference between them came down to the exact moment at which a simple cognitive-physiological reaction to a stimulus became a fully fledged emotion according to a given emotion theory and model. Minor differences included the points at which the emotion produced emotional behaviors or reflexes, and the degree of specificity of the various processing components. In the James–Lange model, emotions started as an excitation in the parasympathetic nervous system. This excitation constituted emotion even before any cognitive components or conscious awareness were engaged. At this early stage emotional reflexes could occur. With the engagement of the cognitive component, the emotion became fully fleshed out and could motivate complex emotional behaviors. Conversely, the Cannon–Bard model postulated that the physiological excitation and the cognitive component were engaged simultaneously upon exposure to an emotion-evoking stimulus. Within this model, emotions without a cognitive component did not exist, and emotions could produce both emotional reflexes and complex emotional behaviors at any point along the processing timeline. The Schachter–Singer model drew from both its predecessors. In this model a stimulus evoked a physiological excitation that could prompt some emotional reflexes, but this excitation only became emotion when cognitive components were engaged. The model additionally postulated specific labels for emotional experiences, as well as the existence of a whole catalogue of highly specialized patterns of emotional activation. Specifically, whereas the James–Lange and the Cannon–Bard models were intended to be applicable to all emotions, the Schachter–Singer model postulated that specific stimuli evoke stimulus-specific physiological excitation patterns (SSPE) and stimulus-specific cognitive components (SSCC) with appropriate labels. The Schachter–Singer model was the first to imply that emotions are highly differentiated phenomena in terms of processing. The three classic models are quite similar in how they relate to language and culture as factors in emotion processing. While James gave some consideration and thought to both, Lange, Cannon, Bard, Schachter, and Singer were all focused almost exclusively on physiology. The influence of culture is at most implied, while language is disregarded as a factor altogether in all but the Schachter–Singer model.
While it is true that Schachter and Singer's “labeling” had more to do with cognitive mapping to prototypical emotion concepts, they implicated language in the process in much the same way James Russell would in his later works: as guideposts or markers pointing to the appropriate concept. All in all, however, the three classics are remarkably uniform. All present emotions as discrete events triggered by external stimuli, with differential engagement of the physiological and cognitive components along a distinct timeline. In all three models, emotions likewise evoke a range of emotional reflexes and behaviors when the appropriate configuration of components occurs in accord with the basic tenets of a given theory. While simplicity is usually a commendable feature of any processing model, it was plain to the researchers modeling emotions that this simplicity prevented these models from accounting for a great variety of emotional phenomena. This prompted them to propose their own, more complex models.
4.3 Transition Stage—Discrete Emotions Versus Early Dimensional Models of Emotion Processing
Because of the way emotion theory developed in psychology in the second half of the twentieth century, the stages of model development by complexity do not correspond exactly to the chronological order in which the models appeared. Schlosberg's (1954) three-dimensional model predates the Schachter–Singer model, but its composition places it firmly at a more advanced stage in terms of processing complexity. Paul Ekman's discrete basic emotions model was developed between 1969 and 1992, overlapping with the period when the most complex models were being developed, yet it bears a somewhat surprising resemblance to the classic models of emotion processing. Schlosberg (1954) visualized his model geometrically as a cone-shaped three-dimensional space, with the valence (pleasant–unpleasant) and attention (attention–rejection) continua crossing at the base of the cone and the arousal (sleep–tension) continuum forming its axis. Ekman's (1992) model was never formally visualized, but it was described in enough relevant detail to construct an appropriate visualization. And while the two models technically belong to opposing theoretical camps, they share the same intermediate level of complexity between the classic and the most recent models. Ekman's and Schlosberg's models share two crucial characteristics. One concerns how the models relate to emotion-evoking stimuli, and the other is the assumption of emotion discreteness. Regarding the stimuli, both models incorporate a new processing stage between the perception of a stimulus and the emotional reaction. At this new stage a mental representation of the perceived stimulus is formed, and the emotion is evoked in response to that representation rather than to the objectively present stimulus. In Ekman's model the stimulus is rapidly processed into a stimulus-specific representation (SSR), and in Schlosberg's model the stimulus is processed through the three-dimensional space, which imparts specific characteristics to the stimulus representation (SR). Regarding discreteness, it is overtly expressed in the six basic emotion categories of Ekman's model but only implied in Schlosberg's model.
The latter assumes that every emotion concept occupies a discrete location within his three-dimensional space. These two models are more advanced in that they no longer assume that we emotionally react to an unbiased perception of objective reality. But they are also firmly rooted in the classic idea that the cognitive component and the physiological excitation are engaged simultaneously in the formation of emotional experiences. Apart from the introduction of subjectivity and the idea of the categorical discreteness of emotions, the two models differ in almost every respect, as illustrated in the visualizations in Fig. 4.2. Ekman's model assumes a fairly radical demand for specificity in emotion processing. Objective stimuli are perceived and represented as belonging to a specific class of stimuli (SSR) associated with a particular basic emotion category and expressive response. While stimulus-specific parallel activation of the cognitive component (SSCC) and the physiological excitation (SSPE) was already present in the Schachter–Singer model, Ekman's model demands an unprecedented level of specificity. Ekman postulates six basic categories of emotional experience, each with its own dedicated category of emotional expressions: happiness, anger, surprise, fear, sadness, and disgust. A specific mental representation of a stimulus (SSR) prompts a specific emotion category to be evoked and a specific emotional expression to occur.
4 A Different Look at Emotion Processing Models
Fig. 4.2 Transitional models of emotion processing
This simple action–representation–reaction mechanism is also the guiding principle of Schlosberg’s model, but all claims to specificity are dropped in favor of a focus on the subjective experience of emotions. In this model our subjective representation (SR) of a stimulus is constructed by filtering the perception of an objective stimulus (S) through a three-dimensional space in which we evaluate the objective stimulus in relation to the self. These evaluations, along the valence (pleasantness-unpleasantness), attention (attention-rejection), and arousal (sleep-tension) continua, are fast, subconscious, inevitable, and irrevocable, and they shape our subjective representation (SR) of the stimulus (S). Whatever emotion these evaluations evoke, it will necessarily be colored by them at the level of both the cognitive component and the physiological excitation. Another difference between the two models lies in the behavior they postulate as the output of the evoked emotional states. In his original works Paul Ekman largely limited his postulates in this area to facial expressions, but his successors and collaborators have since expanded into other areas, such as emotional prosody (Sauter et al. 2009). Schlosberg, on the other hand, retained the classic conception of emotional output as a broad spectrum of emotional reflexes and behaviors. What these transition-stage models still do not fully embrace is the influence of culture or language. Answering his critics in the middle period of his career, Paul Ekman conceded that cultural and social context can alter emotional expressions through display rules. Like many later dimensional models of emotion processing, Schlosberg’s model implies that the cognitive component carries the cultural and linguistic norms of emotional expression internalized over the course of socialization and enculturation.
Still, both models appear to treat language and culture as intervening variables rather than integral parts of emotion concepts or phenomena. Both Ekman’s discrete emotions model and Schlosberg’s dimensional model share characteristics of the classic models: linear construction, simplicity, and the assumption that emotional phenomena are discrete. However, they also include the notions of subjectivity and of the representationally mediated nature of emotional experience. Each also represents a different approach to emotions. Ekman’s model became a staple of emotion research and his theory dominated mainstream psychology, while Schlosberg’s model became the foundation of the dimensional and appraisal theories developed in opposition to the Ekmanian mainstream. Ekman’s radical universalism, encapsulated in his discrete emotions model of processing, went on to become the preferred framework for emotional prosody research. Schlosberg’s nascent minimal universalism would inspire a healthy measure of skepticism toward unquestioning devotion to radical universalism. And from this skepticism the most complex models of emotion processing would be developed.
4.4 Current Approaches—From Skeptical Resistance to Deep Complexity
Scherer’s own appraisal model, dubbed the Component Process Model (Scherer 2009), and Russell’s own dimensional model, dubbed the circumplex (Russell 1980), have been visualized and described in great detail by their authors. Barrett’s own integrative model, dubbed the Conceptual Act Model (Barrett 2011), has been visualized only partially, but it has been described in sufficient detail to make a revisualization possible. All these original visualizations have specific downsides, especially regarding temporal progression. Scherer’s model is rich in detail and describes the relationships between different subroutines, but it includes multiple feedback and reappraisal loops that obscure the mechanism. The emphasis on processing components and the relationships between them gives the model modular characteristics that run counter to the dynamic nature of processing it postulates. Russell’s visualization consists of a pair of axes intersecting at right angles, with the horizontal axis representing valence and the vertical axis representing arousal. Emotions rated numerically on valence and arousal scales are plotted as points in this space. The model thus emphasizes the role of core affect within Minimal Universality, but it does not include other aspects of perception that Russell describes as crucial for emotion processing within his theory. Barrett’s model is partially visualized at the level of core affect, which her model incorporates as one of the fundamental mechanisms of emotion processing, but once more this visualization fails to reflect the temporal dynamics of the model or its components beyond core affect. Figure 4.3 illustrates my revisualizations of all three of these most complex models. The appraisal and dimensional models are very similar overall, differing mainly in the emphasis they place on the various processing components.
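To make the circumplex geometry concrete, the Python sketch below places emotions rated on valence and arousal into a two-dimensional space of the kind Russell describes and reads off each point’s angle, its distance from the neutral origin, and its quadrant. The 1-9 rating scale, the rescaling to [-1, 1], the class and function names, and the example ratings are all illustrative assumptions, not Russell’s published procedure or data.

```python
import math
from dataclasses import dataclass


def rescale(rating: float, lo: float = 1.0, hi: float = 9.0) -> float:
    """Map a rating on an assumed lo..hi scale onto [-1, 1], with the midpoint at 0."""
    return 2.0 * (rating - lo) / (hi - lo) - 1.0


@dataclass
class CircumplexPoint:
    label: str
    valence: float  # -1 = unpleasant, +1 = pleasant (horizontal axis)
    arousal: float  # -1 = deactivated, +1 = activated (vertical axis)

    @property
    def angle_deg(self) -> float:
        """Polar angle in [0, 360), counter-clockwise from the pleasant end of the valence axis."""
        return math.degrees(math.atan2(self.arousal, self.valence)) % 360.0

    @property
    def intensity(self) -> float:
        """Euclidean distance from the neutral origin."""
        return math.hypot(self.valence, self.arousal)

    @property
    def quadrant(self) -> str:
        v = "pleasant" if self.valence >= 0 else "unpleasant"
        a = "high-arousal" if self.arousal >= 0 else "low-arousal"
        return f"{v}/{a}"


# Invented ratings on a hypothetical 1-9 scale, for illustration only.
anger = CircumplexPoint("anger", valence=rescale(2), arousal=rescale(8))
happiness = CircumplexPoint("happiness", valence=rescale(8), arousal=rescale(8))

for p in (anger, happiness):
    print(f"{p.label}: {p.quadrant}, {p.angle_deg:.0f} deg, intensity {p.intensity:.2f}")
```

With these invented ratings, anger and happiness land at the same distance from the origin in mirror-image positions (about 135 and 45 degrees): on this plot they are separated only by the sign of valence.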
Barrett’s integrative Conceptual Act Model incorporates both appraisal and dimensional components at various levels. There thus appears to be a certain logical progression in the development of these models. They also share several crucial theoretical assumptions and important processing components. All three propose that both the cognitive component (CC) and the physiological excitation (PE) are continuous in human experience. The cognitive component is continuously active and is perceived as (self-)awareness and consciousness at various levels of strength and prevalence. The physiological excitation, generally accepted to be composed of valence and arousal evaluations, forms a kind of somatovisceral barometer that continuously monitors the environment for potential emotion-evoking stimuli. Both are constantly active, but neither in itself constitutes emotion. For all three models, emotion begins when an objective stimulus (S) becomes its subjective representation (SR) and briefly becomes the focus of both parallel streams of cognitive-affective processing (CC and PE). Where the three models differ is in exactly how this transition from an objective stimulus to a subjective representation loaded with affective meaning takes place. Furthermore, all three models rest on the assumption that emotion concepts are not fundamentally different from nonemotional concepts. All concepts, emotional and nonemotional alike, are stored in the same manner in long-term memory, are organized prototypically, and overlap to various extents on various aspects of meaning along their fuzzy boundaries.
Fig. 4.3 Complex models of emotion processing
In the appraisal model the objective stimulus (S) enters our subjective consciousness through a series of preconscious perceptual filters with built-in valence and arousal evaluation levels, which constrain how that stimulus is subjectively represented (SR). In this model, emotion begins when the evaluative mechanisms are activated in the preconscious appraisals, peaks once the representation is formed, and fades as time progresses. Crucially, if the objective stimulus remains within the field of perception, or if our relation to it shifts over time, the representation may change in the course of reappraisal based on the changing circumstances. In the dimensional model the core idea is that valence and arousal are the most clearly felt aspects of our subjective experience. However, more complex appraisals help differentiate the diffuse and vague sensation of core affect into specific emotions, drawing on knowledge of previous emotional experiences retrieved from memory. In both the appraisal and the dimensional models, emotional reflexes and behaviors coincide with the emergence of emotion within our cognitive-affective processing stream. The integrative-constructivist process model takes a more detailed and radical approach. Emotions in this model are constructed ad hoc out of a set of cognitive-affective primitives, which are activated in our continuous cognitive-affective processing stream when an objective stimulus (S) appears within our field of perception.
Working on multiple parallel levels of cognition, including appraisals (A), personal experience (Exp), general sociocultural knowledge (Kn), and emotion terms (ET), as well as the affective level of core affect, we construct a subjective representation of the objective stimulus (SC!). The process resembles completing a jigsaw puzzle. When we perceive an objective stimulus embedded in its situational context, our minds pick up on the antecedents, the social circumstances, the sensation of self, the utterances produced, and the nonverbal cues; all of these are puzzle pieces. The pieces are then assembled in our consciousness until the overall pattern approximates one of our prototypical emotion concepts, at which moment the emotion is defined and consciously identified. In this model emotional reflexes and behaviors may emerge at the point where multiple perceptual primitives are activated and the emergent construct triggers specific somatic reflexes, but also at any later stage until emotion offset, which comes with any change in the circumstances or in our relation to the objective stimulus. In this model emotions are emphatically nondiscrete and no different from any other, nonemotional, concept. The model also explains why phenomena such as the emotion paradox or misinterpretations of emotions exist. It is also the first to include overtly both sociocultural conditioning, as one of the primitives of initial perception, and language, in the form of emotion terms. Finally, the model retains deep subjectivity by including personal experience, and good generalizability by including core affect and the continuous cognitive-affective stream.
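The jigsaw-puzzle metaphor can be illustrated with a toy Python sketch: cues arrive one at a time, are added to an assembled pattern, and the pattern is compared with stored prototype concepts until one is matched closely enough. This is not Barrett’s formal model; the cue labels, the prototype contents, the Jaccard overlap measure, and the 0.6 threshold are hypothetical choices made only to show the incremental, prototype-matching logic.

```python
from typing import Optional

# Hypothetical prototype concepts, each a set of cue labels ("puzzle pieces").
PROTOTYPES: dict[str, set[str]] = {
    "anger": {"goal_blocked", "other_blamed", "high_arousal", "unpleasant", "raised_voice"},
    "sadness": {"loss", "low_arousal", "unpleasant", "slow_speech"},
    "happiness": {"goal_achieved", "high_arousal", "pleasant", "raised_voice"},
}


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared cues divided by all cues in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def construct_emotion(cues: list[str], threshold: float = 0.6) -> tuple[Optional[str], int]:
    """Assemble cues one at a time; stop at the first prototype matched at or above threshold.

    Returns (emotion label or None, number of cues consumed).
    """
    assembled: set[str] = set()
    for step, cue in enumerate(cues, start=1):
        assembled.add(cue)
        # Pick the prototype that best fits the pattern assembled so far.
        best = max(PROTOTYPES, key=lambda name: overlap(assembled, PROTOTYPES[name]))
        if overlap(assembled, PROTOTYPES[best]) >= threshold:
            return best, step
    return None, len(cues)


print(construct_emotion(["unpleasant", "high_arousal", "goal_blocked", "raised_voice"]))  # ('anger', 3)
print(construct_emotion(["high_arousal", "raised_voice"]))  # (None, 2)
```

In the first call the pattern reaches the anger prototype on the third cue, so the fourth cue is never needed; in the second, arousal-related cues alone never reach the threshold for any prototype, a toy analogue of an underspecified pattern leaving the emotion unidentified.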
The appraisal, dimensional, and integrative models are the most advanced models of emotion processing to date. Rather than shrinking from the problems posed by cultural and linguistic factors, these models embrace them. Appraisal theory probes appraisals through vocabulary choices in emotion naming tasks. Dimensional models such as the circumplex plot emotion terms on valence and arousal scales to assess core affect within Minimal Universality. The integrative-constructivist model incorporates both language and culture as factors in the general processing of objective reality. The appraisal model postulates that emotions form through filtered perceptions; the dimensional model, that these perceptions are measurable at the level of core affect; and the integrative-constructivist model explains the nature of these perceptions in the case of emotions and the mechanism by which an emotional representation of objective stimuli is constructed. The complexity of these three models is especially striking in contrast to the classic and transition-stage models.
4.5 Conclusions—The Cartesian See-Saw
Models of emotion in psychology have undoubtedly come a long way on several levels: from simplicity to complexity, from rejecting to embracing cultural and linguistic factors, and, perhaps most importantly, from a physiological to a psychological focus. The classic models were all steeped in physiological ideas of emotion, in which the emotional reaction, whether or not it contained a cognitive component, was just that: a bodily reaction to an external stimulus. In other words, these models leaned toward the “body” side of the Cartesian duality. The models of the transition stage shifted from this quintessentially physiological perspective to a more psychological one by acknowledging that our perceptions of reality are not unbiased but mediated through subjective cognitive-affective representations. Up to this point in the development of emotion processing models, authors often sought to validate their theories and models by drawing parallels with, or designing experiments within, physiological or neurological frameworks. With the emergence and growing significance of previously marginalized models rooted in emotion theories outside the Ekmanian mainstream, this reliance on disciplines outside psychology lessened. The appraisal, dimensional, and integrative-constructivist models all represent an almost purely psychological perspective. In terms of emotion processing, all subroutines take place within the psyche and can be explained within the limits of psychology. In this respect these models lean toward the “mind” side of the Cartesian duality. But this is also a return to the Jamesian idea that emotions are the domain of psychology, and to the Wundtian idea that only psychologists have the requisite tools and skills to probe the human psyche in all its aspects, including emotions. Visualizing all these models through their common features also reveals how psychology’s understanding of emotional phenomena has progressed.
Psychology has moved from treating emotions as inherently unique phenomena contingent upon physiology and controlled by reason to considering them no different from any other concepts conceived by the human mind, contingent upon a myriad of factors, including the once eschewed cultural and linguistic factors. In a sense, then, the history of emotion processing models traces a long path toward reconciliation with the anthropological principle of cultural and linguistic relativity. It is increasingly appreciated today that, in the nature versus nurture debate, nurture has an equal if not greater influence on the way we process and express emotions. Our perceptions of reality are subjective, and our means of expression are idiosyncratic. Emotions are communicated, negotiated, and strategically deployed in interpersonal communication, in which display rules are learned and policed by emotionally competent interlocutors. Emotions permeate communication and language in both the verbal and the nonverbal channels, and the emotion paradox, while explicable within the integrative-constructivist model, remains a problem. For all these reasons, direct evidence that culture and language do indeed play a part in emotion processing would require locating a sufficiently homogeneous population whose members have mastered a language but lack the socialization and enculturation regarding emotions and display rules that children acquire in their communities. Fortunately, with the rise of English as the global lingua franca, such populations have become readily available: highly proficient nonnative speakers of English. Having learned English in formal settings in their native countries, they have had negligible exposure to native English-speaking social contexts, which would facilitate the acquisition of socially sanctioned means of expressing emotion in that language. Exactly how such individuals process emotions in English is the focus of this book.
References
Barrett, L. F. (2011). Constructing emotion. Psychological Topics, 20(3), 359–380.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.
Lange, C. G. (1885/1912). Om Sindsbevaegelser [On Emotions]. Whitefish: Kessinger Publishing.
Marsella, S., Gratch, J., & Petta, P. (2010). Computational models of emotion. In K. R. Scherer, T. Bänziger, & E. Roesch (Eds.), Blueprint for an affectively competent agent: Cross-fertilization between emotion psychology, affective neuroscience, and affective computing. Oxford: Oxford University Press.
Niedenthal, P. M. (2007). Embodying emotion. Science, 316, 1002–1005.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
Sauter, D. A., Eisner, F., Ekman, P., & Scott, S. K. (2009). Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. PNAS, 107(6), 2408–2414.
Scherer, K. R. (2009). The dynamic architecture of emotion: Evidence for the component process model. Cognition and Emotion, 23(7), 1307–1351.
Schlosberg, H. (1954). Three dimensions of emotions. The Psychological Review, 61(2), 81–88.
Chapter 5
The State of Emotional Prosody Research—A Meta-Analysis
© Springer International Publishing Switzerland 2016 H. Bąk, Emotional Prosody Processing for Non-Native English Speakers, The Bilingual Mind and Brain Book Series 3, DOI 10.1007/978-3-319-44042-2_5
5.1 Introduction
In linguistics, prosody is a suprasegmental feature of speech: a composite feature spanning multiple smaller phonological segments arranged in a distinct and meaningful pattern. Psychologists use the term emotional prosody for a class of emotion-specific prosodic patterns that functionally cue emotions in emotion recognition tasks. For nonspecialists, emotional prosody is the emotionally colored tone of voice that accompanies speech. From the standpoint of emotion research, emotional prosody is a subject as interesting as it is difficult. It occupies the intersection between the verbal and nonverbal channels of emotional expression, and its mechanics are therefore shaped by characteristics of both. The primary component of human speech is its propositional content, i.e., “what” is said, arranged into a logical structure and bearing meaning. This component is under virtually complete conscious control, except on the limited occasions when social or linguistic inhibitions are abandoned because of temporary incapacitation, whether through psychological or emotional upheaval or through intoxication of various origins. The other quintessential component of speech is prosody, including emotional prosody: the “how” of what is said, which significantly affects the overall meaning of the propositional content. Because it depends in large part on the respiratory system, controlled by the sympathetic nervous system, prosody is also subject to automatic and reflexive physiological factors. In terms of emotion expression, human speech is therefore a balancing act between what we can and cannot control. For this reason, and because of its dynamic temporal nature, emotional prosody is widely held to be research material of high ecological validity (Banse and Scherer 1996; Elfenbein and Ambady 2002a). Some even argue that eliciting emotional prosody under laboratory conditions does not diminish its excellent ecological validity, precisely because of its inherently physiological nature: if an emotion is successfully evoked, it will naturally override any vestige of conscious control over the respiratory system (Xu 2010). And yet the need to elicit emotions, the nigh uncontrollable nature of vocal emotion expression, and the entanglement of semantics and prosody make emotional prosody an immensely complex object of research.
This complexity may also explain the fitful way in which research on emotional prosody has so far been carried out. One conspicuous feature of existing emotional prosody research is its lack of continuity of inquiry. While several researchers and labs appear fairly consistently throughout the body of evidence and follow continuous lines of research, a large proportion of emotional prosody studies are one-offs for the researchers who carried them out. On the whole, therefore, the field and the body of evidence concerning emotional prosody are heterogeneous and fragmentary. It also appears that the disciplines most interested in emotional prosody are at the same time profoundly uninterested in the general principles and mechanics of normal emotional prosody processing. Instead, they have a vested interest in malfunctioning or maladaptive emotional prosody as a diagnostic tool for psychiatric or neurological conditions. Fairly substantial research has been done in psychiatry on the emotional prosodic properties of auditory hallucinations in schizophrenia (see Hoekert et al. 2007 for a review). Neurologists have used emotional prosody as one means of assessing the degree of deterioration or damage in neurodegenerative disorders and traumatic brain injuries (see Taler et al. 2008 for evidence from Alzheimer’s disease; Kan et al. 2002 for evidence from Parkinson’s disease; Speedie et al. 1990 for evidence from Huntington’s disease; Schmidt et al. 2010 for evidence from traumatic brain injuries). Psychologists have used emotional prosody to diagnose developmental or behavioral disorders (see Uekerman et al. 2010 for evidence from ADHD; Lindner and Rosen 2006 for evidence from Asperger’s syndrome). Although clinical populations are studied most frequently, psychologists have amassed a reasonable body of evidence on emotional prosody processing in healthy populations, which serve as formal control baselines for comparison. Disciplines outside the social and medical sciences have also investigated emotional prosody, notably the field of affective databasing, which is dedicated to creating meticulously annotated, multi-purpose, and often multimodal databases used in psychology and in man–machine communication research. In psychology, affective databases have been used to create stimuli, while in man–machine communication research they have largely served to teach machines to recognize emotions. Some of these databases have very respectable potential for emotion research (for reviews see Ververidis and Kotropoulos 2003, 2006; The HUMAINE project website). However, despite the multiplicity of fields involved and the amount of research done, there is remarkably little consensus about the nature of emotional prosody itself.
5.2 Consensus on the Nature of Emotional Prosody Processing
Given the many disciplines involved and the overall discontinuity of research, there are precious few points on which researchers appear to agree. These points mainly concern the entanglement of semantics and prosody, the role of gender in emotion communication, the specificity of emotions that can be recognized from prosody, and the frameworks and methodology used to study emotional prosody. Researchers agree that, from the crudest vocalizations to the subtlest tonal shifts, prosody can be either congruent or incongruent with the meaning of the propositional content. Prosodic features congruent with semantics enhance the expressive and illocutionary power of an utterance (Scherer et al. 2003). The interpretation of emotional expressions becomes more complicated when prosody is incongruent with the semantics it colors. It has long been known that extreme emotions can be manifested by patently incongruent vocalizations, such as weeping for joy or laughing in anger, sadness, or disgust (Russell et al. 2003). In less extreme instances, when emotional expression in speech is ambiguous because of an apparent incongruence between propositional content and vocal tone, researchers generally agree that reliance on the less controllable prosodic cues leads to more accurate recognition, making emotional prosody a disambiguating feature of communication (Ekman 2003). The entanglement of semantics and prosody, and the possibility of manipulating the congruity between the two, have been exploited or dealt with in various ways across studies of emotional prosody. There are a few methods of disentangling the two or masking semantics, and their use is reviewed below. Gender also plays a role in emotional prosody processing.
There is consistent evidence that male and female participants perceive emotions differently, resulting in significant differences in the accuracy they achieve on emotion recognition tasks (Kotz and Paulmann 2007; Schirmer and Kotz 2003; Schirmer et al. 2005; Goerlich et al. 2011). Even more intriguingly, gender stereotyping may play a role in emotion recognition from prosody, as it has been shown several times that the gender of the person expressing emotions is a statistically significant factor in attributing emotional states (Banse and Scherer 1996; Schirmer and Kotz 2003; Wallbott and Scherer 1986; Bach et al. 2008). The finding that we factor in the speaker’s gender when interpreting emotional expressions, and that this can significantly alter the pattern of emotion attribution, may be one of the most important yet downplayed results in emotional prosody research. Regarding emotions themselves, it appears that while emotional prosody disambiguates speech when semantics and prosody are incongruent in valence, it is inherently ambiguous when analyzed in isolation from semantics. Sadness is recognized from minimal samples of isolated prosody with reliably high accuracy, as are fear and anger (Paulmann and Pell 2011; Thompson and Balkwill 2006; Schirmer 2010). However, when participants are asked to discriminate between happiness and anger in isolated prosody, they often confuse the two (Cornew et al. 2009), a confusion often attributed to the similar levels of perceived arousal or intensity in both emotions (Scherer et al. 2003). Arousal is closely associated with the acoustic property of fundamental frequency (F0), the physical basis of perceived pitch and intonation in speech, and F0 has long been accepted as a crucial discriminating feature in emotional prosody research (Busso et al. 2009). The power of valence as a discriminating feature, on the other hand, has long been questioned (Russell and Barrett 1999; Scherer et al. 2003), mainly because of recurring errors such as confusing anger and happiness, two emotions of similar arousal but opposite valence. On the one hand, emotional prosody research is typically an incidental aspect of larger subjects in emotion research. On the other hand, a common theme runs through the research: the vast majority of studies are carried out within the standard view of emotions and seek to substantiate the theory of universal basic emotions (Ekman 1992). The fundamental assumption of emotional prosody research methodology is that basic emotions are discernible from prosody as reliably as from facial expressions. The most frequently employed experimental task is forced-choice emotion recognition, with response options limited to a selection of basic emotions and occasionally a neutral option (Russell et al. 2003). Within this paradigm, even in cross-linguistic studies, above-chance recognition accuracy is practically universal, and ceiling accuracy is not uncommon (Russell 1994). These results are uniformly interpreted as evidence supporting the universality hypothesis at the heart of the standard view. At the same time, a fairly strong in-group advantage has been observed in the data: emotional prosody of one’s own language and culture is recognized better than that of any other language or culture (Elfenbein and Ambady 2002b).
The slight variability in recognition rates typically found in emotional prosody studies across languages is usually explained by conceding some influence of linguistic relativity on the data (Wallbott and Scherer 1986; Mesquita and Frijda 1992). Finally, although research on emotional prosody by necessity relies on acted rather than authentic emotional speech samples, there is palpable discomfort about using materials containing a measure of artifice (Scherer et al. 2003; Drolet et al. 2012). Nascent evidence suggests that naturally expressed and acted emotions are perceived differently, potentially to a significant degree (Drolet et al. 2012). For the time being, however, practical and ethical concerns outweigh the pressure to use natural emotional expressions, and evidence based on the perception of acted emotions is widely held to be sufficient for valid conclusions about the nature of emotional prosody processing. All in all, much of what researchers of emotional prosody agree on appears to center on the agenda behind the standard view. Basic emotions are expressed in prosody much as they are in the face, and they are reliably recognizable from prosody within forced-choice paradigms. The slight fluctuations in recognition rates observable in the data all fall within the margin of universality, slightly influenced by relativistic effects that depend on the gender of speakers and interpreters, the emotion category expressed, and the manner of its expression. Taken as a whole, the consensus on the nature of emotional prosody appears to contain a curious contradiction. There appears to be agreement that valence is not a reliable feature for discriminating emotions in prosody, whereas general arousal, expressed in fundamental frequency, is. Yet instead of pursuing a dimensional framework, the entire field seems quaintly devoted to the discrete basic emotions framework. In other words, the consensus appears to be that only a very general assessment can be made of emotional prosody isolated from semantics, while the discrete emotions framework significantly raises the demand for specificity of identification. Indeed, it has been argued that the high accuracy rates result in substantial part from procedural artifacts of the forced-choice tasks preferred in this field (Russell 1994). If this is indeed the case, a systematic and critical review of the preferred methods should reveal the strong and weak points of the existing research and suggest how a new integrative approach might be constructed.
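The concern about forced-choice artifacts can be made concrete with a little arithmetic. With k response options, a guessing listener is correct with probability 1/k, so “above chance” is judged against a baseline that depends entirely on how many labels the task offers. The Python sketch below computes that baseline and an exact one-sided binomial test for a single hypothetical listener; the trial counts are invented, and this simple test does not correct for response biases toward particular labels.

```python
from math import comb


def chance_level(n_options: int) -> float:
    """Proportion correct expected from uniform guessing among n_options labels."""
    return 1.0 / n_options


def binomial_tail(correct: int, trials: int, p: float) -> float:
    """Exact one-sided P(X >= correct) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical listener: 24 of 60 forced-choice trials correct (40% accuracy).
correct, trials = 24, 60
for n_options in (5, 2):
    p0 = chance_level(n_options)
    p_value = binomial_tail(correct, trials, p0)
    print(f"{n_options} options: chance {p0:.0%}, observed {correct / trials:.0%}, p = {p_value:.4f}")
```

The same 40% accuracy is well above chance with five options (chance 20%) but below chance with two (chance 50%), so a recognition rate is only interpretable together with the number of alternatives the task offered.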
5.3 Literature Review Selection Criteria The fact that emotional prosody has been investigated in so many related though distinct fields resulted in a reasonably substantial but fundamentally heterogeneous body of evidence. In spite of that, the methods implemented in this loosely organized field are remarkably uniform and based in large part on the methods accepted in the standard view. Because of the variety of disciplines involved in emotional prosody research, gaining an understanding of how emotional prosody is processed was not easy but absolutely necessary for my research objectives. My study design was based in large part on a critical meta-analysis of the existing empirical literature. However, the state of research being as it is, I had to specify a set of selection criteria to focus only on the studies which contained sufficiently complete procedural details to ensure replicability. Additionally, the studies had to include behavioral data. It quickly became apparent that on the two basic criteria of replicability and behavioral data content a large proportion of studies was disqualified. Only 31 studies fit the basic criteria and those employed an entire spectrum of emotional prosody manifestations from simple vocalizations to complex narrative passages in some of which propositional content was masked using a variety of methods. Those studies also often concerned clinical populations, included stimulus modalities other than auditory or various sets of basic emotions. To make a systematic comparison of methods and data from normal populations across these studies I had to apply further, more detailed selection criteria. Thus, the meta-analysis of existing literature in this chapter included: • In clinical studies—only data from healthy controls; • In studies of multiple modalities of stimuli—only data from prosody; • In studies including congruence manipulation between prosody and semantics—only data from tasks where prosody and semantics were congruent;
5 The State of Emotional Prosody Research—A Meta-Analysis
• In studies where isolated prosody and isolated semantics stimuli were used: only data from isolated prosody;
• In studies using neuroimaging: only behavioral data;
• In studies with multiple groups from the same language: data collapsed across the groups;
• In all studies: only data on the processing of happiness and sadness.
The meta-analysis is summarized in Table 5.1 (stimulus development and validation procedures) and in Table 5.2 (population sampling and results). It is organized by analogy with a prototypical context of emotion communication between two interlocutors. On the one hand, I focus on the design of the stimuli employed in the existing literature, since within an experimental context the stimuli stand in for the interlocutor expressing emotions. Regarding stimuli, therefore, I consider who the speakers engaged to provide emotional speech samples were, what their gender was, what languages they spoke, what emotions they were asked to express and how the emotions were evoked, what type of expression they used, whether and how prosody was isolated from semantics and, finally, whether the stimuli were formally validated. In other words: who said what and how, and how much of the expression was presented to the other interlocutor. On the other hand, the other interlocutor in the experimental context is, of course, represented by the study participants. Regarding participants, I thus consider who they were in terms of gender and native language, how many of them there were, what recognition or identification tasks they performed, and how well they performed in terms of speed and accuracy. Finally, having assessed these various aspects of methodology in the existing literature, I designed my own experimental approach and proposed four major hypotheses to test in the course of my study.
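The inclusion rules listed above amount to a filter applied to each reported result, followed by a collapsing step. The sketch below is a hypothetical illustration of that logic, not the procedure actually used to compile Tables 5.1 and 5.2: the record fields, the category labels, the invented example records, and the choice of an unweighted mean when collapsing groups are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical record layout: one reported result per row.
@dataclass(frozen=True)
class Observation:
    study: str
    group: str        # "healthy" or "clinical"
    modality: str     # "prosody", "semantics", "face", ...
    congruence: str   # "congruent", "incongruent", or "n/a" (no manipulation)
    measure: str      # "behavioral" or "neuroimaging"
    language: str     # participants' native language
    emotion: str      # e.g. "happiness"
    accuracy: float   # proportion of correct responses

TARGET_EMOTIONS = {"happiness", "sadness"}

def passes_criteria(obs: Observation) -> bool:
    """Apply the inclusion rules to a single observation."""
    return (
        obs.group == "healthy"                      # healthy controls only
        and obs.modality == "prosody"               # prosody only: no other modalities, no isolated semantics
        and obs.congruence in {"congruent", "n/a"}  # congruent trials only where congruence was manipulated
        and obs.measure == "behavioral"             # behavioral data only
        and obs.emotion in TARGET_EMOTIONS          # happiness and sadness only
    )

def collapse_groups(observations):
    """Average accuracy over groups sharing study, language and emotion.

    The unweighted mean is an assumption; weighting by group size
    would be an equally defensible choice.
    """
    buckets = defaultdict(list)
    for obs in observations:
        if passes_criteria(obs):
            buckets[(obs.study, obs.language, obs.emotion)].append(obs.accuracy)
    return {key: mean(values) for key, values in buckets.items()}

# Invented records for illustration only; not data from any cited study.
records = [
    Observation("Study A", "healthy", "prosody", "n/a", "behavioral", "English", "happiness", 0.80),
    Observation("Study A", "healthy", "prosody", "n/a", "behavioral", "English", "happiness", 0.70),
    Observation("Study A", "clinical", "prosody", "n/a", "behavioral", "English", "happiness", 0.40),
    Observation("Study A", "healthy", "face", "n/a", "behavioral", "English", "happiness", 0.95),
    Observation("Study A", "healthy", "prosody", "incongruent", "behavioral", "English", "sadness", 0.60),
    Observation("Study A", "healthy", "prosody", "n/a", "behavioral", "English", "anger", 0.90),
]

print(collapse_groups(records))
```

Only the two healthy-control, prosody-only, behavioral happiness rows survive the filter, so the script reports a single entry, ('Study A', 'English', 'happiness'), with an average accuracy of 0.75.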
5.3.1 On the Development and Validity of Stimuli for Emotional Prosody Research
The primary focus of emotional prosody research has been the question of how prosody is perceived and how emotions are recognized or identified from it. The construction of stimuli has therefore been treated instrumentally, as a means to an end. Indeed, analyzing the available literature from a pragmatic standpoint, one reaches the inevitable conclusion that most researchers treat emotions as signals sent out into a kind of interpersonal void by one party, to be fished out and interpreted by another party along the lines of universal emotion categories. Yet it is apparent from the discussion sections and the theoretical papers that the neurologists and psychologists investigating emotional prosody understand it to be relational and communicative in nature, and recognize that emotions can even be negotiated or strategically deployed (Scherer et al. 2003). Communication of emotions implies a communicative intent, that is, the intent to make something apparent or understood.
Masking
No
No
No
No
No
No No
Yes
Yes
No
Type of PC
Sentences
Words
Sentences
Vocalizations
Sentences
Sentences Sentences
Sentences
Sentences
Passage
Source
Kotz and Paulmann (2007) Schirmer and Kotz (2003)
Schirmer et al. (2005)
Bostanov and Kotchoubey (2003)
Drolet et al. (2012)
Rota et al. (2008) Paulmann et al. (2008a)
Paulmann et al. (2010) Paulmann et al. (2008b) Fujiki et al. (2008)
(pseudo-) German (pseudo-) German English
German German
German
German
German
German
German
Language
4
Full (1)
–
e.g.
100 1400
e.g. e.g.
–
Full (1)
–
–
PC inventory e.g.
64 90
178
5
98
148
Number of stimuli 117
Table 5.1 Meta-analysis of stimuli development in emotional prosody research
–
–
750–870ms in oddball 630–980ms in priming M = 1.76s Range = 0.357– 4.843 2s M = 2.92s (A) M = 3.62s (D) M = 4.05s (F) –
–
–
–
Stimulus length
H, S, A, F, D, Sur H, S, A, F
H, A, F, D
H, S, A, N A, F, D, N
H, S, A, F
H, S
H, S
H, A, N
H, A, N
Emotions
N = 48 (?,?) forced choice from (H, A, F, S, and I don’t know) (continued)
–
Reference to previous study
Pre-existing standardized test –
Propositional content rated in undisclosed study
N = 23 (15 M-8F) valence (−2) “v.angry” to (2 +) “v.happy” N = 46 (23M−23F) for propositional content valence (−2) to (2+) Undisclosed number of raters rate valence of propositional content valence (−2) to (2+) –
Validation procedure
5.3 Literature Review Selection Criteria 85
No
No
Words
Sentences
Schirmer (2010)
Dimoska et al. (2010)
No
Numbers
No
No
Sentences
Hopyan-Misakyan et al. (2009) Alba-Ferrara et al. (2011)
Sentences
No
Sentences
Schmidt et al. (2010)
Mitchell (2006b)
Masking
Type of PC
Source
Table 5.1 (continued)
(audio filter) English
English
English
English
English
English
Language
–
e.g.
64
e.g.
–
–
PC inventory –
2880
–
96
24
Number of stimuli 32
M = 1.7s Range = 0.9– 2.5s
–
4.7s
M = 2010ms Range = 1800– 2080ms
–
3s
Stimulus length
H, A, F, Sur
H, S, A, N
H, S, A, pride, guilt, boredom H, S, N
H, S, A, F, D, Sur, N, sleepiness H, S, A, F
Emotions
(continued)
Propositional content rated for H/S (N = 20) Recorded samples validated−no details N = 30 (15M−15F) for propositional content valence (−2) “v.negative” to (2+) “v.positive”, (0) “non-arousing” to (4) “highly arousing” N = 30 (15M−15F) for prosody first forced choice (A, S, H, N, other) and (−2) “v.negative” to (2+) “v.positive”, (0) “non-arousing” to (4) “highly arousing” Reference to previous study
Reference to an invalid online resource
Pre-existing standardized test
Pre-existing standardized test
Validation procedure
Yes
Sentences
Paulmann and Pell (2011)
No
Sentences
Mitchell et al. (2003)
No
No
Vocalizations and words
Sorocco et al. (2009)
Sentences
No
Sentences
Mitchell (2006a)
Mitchell et al. (2004)
Masking
Type of PC
Source
Table 5.1 (continued)
(pseudo-) English
(audio filter) English
(audio filter) English (audio filter) English
(audio filter) English
Language
e.g.
236
e.g.
–
e.g.
e.g.
40
–
PC inventory e.g.
Number of stimuli –
–
4.7s
4.7s
H, S, A, D, Sur, N
H, S, N
H, S, A, Sur, N, disinterest H, S, N
H, S, N
4.7s
–
Emotions
Stimulus length
(continued)
Propositional content rated for H/S (N = 20) Recorded samples validated— no details Propositional content rated for H/S (N = 20) Recorded samples validated— no details N = 20, forced-choice out of 6
Propositional content rated for H/S (N = 20) Recorded samples validated— no details Pre-existing standardized test
Validation procedure
No
Yes
Phrases
Words
Yes
Sentences
Castro and Lima (2010) Goerlich et al. (2011)
Fujisawa and Shinohara (2011) Kitayama and Ishii (2002)
Yes
Sentences
Rigoulot et al. (2013)
Yes
Yes
Sentences
Cornew et al. (2009)
Words
Masking
Type of PC
Source
Table 5.1 (continued)
(audio filter) Japanese, English
Japanese
(pseudo-) English (pseudo-) Portuguese (pseudo-) Dutch
(pseudo-) English
Language
Full (1) Full (171)
208
–
e.g.
e.g.
PC inventory e.g.
64
48
448
64
Number of stimuli 144
–
–
–
M = 8 syl.
1250ms
M = 2.7s
Stimulus length
positive, negative
H, S, A
H, S, A, F, D, Sur, N H, S
H, A, F, N
H, A, N
Emotions
N = 72 (F), forced choice out of 7 and 7-point intensity scale N = 10, valence (−4) “v.sad” to (4+) “v.happy”, inter-rater agreement at min. 90 % N = 2 (the researchers) arbitrary qualitative assessment JAP N = 13, valence (1) “v.unpleasant” to (5) “v.pleasant” for both propositional content and prosody ENG N = 23, valence (1) “v.unpleasant” to (5) “v.pleasant” for prosody (continued)
N = 40 forced choice out of 3 and “how well a sentence expresses the intended emotion” from (1) “v.weakly” to (5) “v.intensely” N = 24, forced-choice out of 4
Validation procedure
No
Vocalizations
Sauter et al. (2009)
(pseudolanguage) English, German, Arabic, Hindi (pseudo-) Spanish, English, German, Arabic English, German, Chinese, Japanese, Tagalog English, Hindi
Language
180
–
Full (2)
e.g.
160
80
PC inventory e.g.
Number of stimuli 125
–
–
S, A, F, D, Sur, amusement, achievement, pleasure, relief
H, S, A, F
H, S, A, F, D, Sur, N
H, S, A, F, D
6–14 syl.
8–14 syl.; 1–3s
Emotions
Stimulus length
–
N = 10, prosody assessed on how “typical” each expression is for a given emotion
Reference to previous study
–
Validation procedure
In Table 5.1 the “PC” abbreviation stands for Propositional Content. “(pseudo-)” signifies that the language used in the study was a pseudolanguage, and the real language(s) listed after this abbreviation were the base from which the pseudolanguage was derived. “e.g.” indicates that only some examples of the propositional content were listed in the study, and “Full” followed by a number in brackets indicates that all propositional content was listed, the number giving the number of unique items. The “syl.” abbreviation indicates syllables. The following abbreviations indicate emotions: H, happiness; S, sadness; A, anger; F, fear; D, disgust; Sur, surprise; N, neutral (in the Emotions column).
No
Yes
Sentences
Pell et al. (2009a)
Sentences
Yes
Sentences
Pell et al. (2009b)
Thompson and Balkwill (2006)
Masking
Type of PC
Source
Table 5.1 (continued)
Rota et al. (2008) Paulmann et al. (2008a) Paulmann et al. (2010)
Kotz and Paulmann (2007) Schirmer and Kotz (2003) Schirmer et al. (2005) Bostanov and Kotchoubey (2003) Drolet et al. (2012)
Source
M
(11M-1F)
(9M-4F)
12
13
(26M-22F)
48
10
(10M-10F)
20
F
(32M-32F)
64
24
M–F Ratio (16M-18F)
N 34
M = 46.0 SD = 11.86
M = 25.0 Range = 24–38 M = 49.2
M = 24.0
M = 24.0 SD = 2.1 M = 24.5
M = 24.5 SD = 3.25
Age(s) M = 25.7 SD = 3.0
FC 5 (H, A, F, D, N)
FC 4 (H, S, A, N) FC 4 (A, D, F, N)
FC 4 (A. F, H, S) FC 2 (genuine) x (acted)
Oddball out
M ≈ 93 % positive M ≈ 96 % negative
FC 3 (positive, negative, N) LDT
M ≈ 80 % (real language) M ≈ 53 % (pseudolanguage)
Probability estimates: M = 0.27; SE = 0.04 (H-genuine) M = 0.28; SE = 0.07 (H-acted) M = 0.60; SE = 0.05 (S-genuine) M = 0.45; SE = 0.07 (S-acted) M = 83 %; SD = 12 % (collapsed for all emotions) M = 68.54 % (collapsed for all emotions)
M ≈ 88 % (H) M ≈ 82 % (S) “All participants correctly identified [the stimuli]”
Accuracy M = 95.31 % (H) M = 96.29 % (A)
Tasks Word probe (yes/no)
Table 5.2 Meta-analysis of the emotional prosody research procedures and results
German
German
≈910 ms positive ≈915 ms negative ≈1220 ms happy ≈1225 ms sad –
German
German
–
–
(continued)
German
Mean (all genuine) = 2.34 s; SD = 0.39 s Mean (all acted) = 2.41 s; SD = 0.46 s –
German
German
Language German
Response times –
M–F Ratio (32M-32F)
(8M-11F)
(50M-19F)
N 64
19
69
Fujiki et al. (2008)
Schmidt et al. (2010)
(7M-11F)
M
(8M-36F)
(56M-56F)
(12M-6F)
18
19
44
112
18
Schirmer (2010)
Dimoska et al. (2010)
Hopyan-Misakyan et al. (2009) Alba-Ferrara et al. (2011) Mitchell (2006b)
Paulmann et al. (2008b)
Source
Table 5.2 (continued)
M = 44.4 SD = 12.2 Range = 23–62
M ≈ 21.69 SD ≈ 1.6
M = 24.8 SD = 8.79 Range = 18–51 M = 18.7 SD = 1.3
M = 10.3 SD = 1.5
M = 12.02 SD = 2.49
M = 9.8 SD = 1.8
Age(s) Range = 18–50
Old x new recall on (H, S, N) Same/different judgment FC 4 (H, A, F, Sur)
FC 6 (H, S, A, pride, guilt, boredom) FC 2 (H, S)
FC 6 (H, S, A, F, D, Sur, I don’t know) FC 8 (H, S, A, F, D, Sur, N, sleepy) FC 4 (H, S, A, F)
Tasks FC 7 (H, S, A, F, D, Sur, N)
Probability of recall: M = 0.77; SD = 0.15 (S) M = 0.74; SD = 0.15 (H) M = 76.6 % (same/different) M = 86.5 % (all emotions)
Healthy faster than TBI (non-significant)
(continued)
English
English
English Mean (all congruent) = 2.52 s; SD = 0.29 s –
M = 94.55 %; SD = 6.36 % (collapsed for all emotions)
English
–
M = 73.92 %; SD = 0.01 (H) M = 71.21 %; SD = 0.001 (S)
English
–
English
English
–
–
Language German
Response times –
M ≈ 76.67 % (H) M ≈ 83.34 % (S)
M ≈ 71.88 % (collapsed for all emotions)
Accuracy M = 70.0 % (H-young) M = 52.1 % (H-middle-aged) M = 72.4 % (S-young) M = 53.7 % (S-middle-aged) M = 92.10 % (H) M = 77.63 % (S)
Paulmann and Pell (2011) Cornew et al. (2009) Rigoulot et al. (2013)
(25M-42F)
(15M-16F)
31
M
13
Mitchell et al. (2004)
67
M
13
Mitchell et al. (2003)
(12M-12F)
?
29
24
M–F Ratio (6M-22F)
N 28
Mitchell (2006a) Sorocco et al. (2009)
Source
Table 5.2 (continued)
M ≈ 21.50 SD ≈ 2.5 M = 23.8 SD = 4.8
M = 21.9 SD = 2.4
M = 32.2 SD = 0.93 Range = 18–60 M = 32.2 SD = 3.6
Age(s) M = 20.9 SD = 3.1 M = 23.3
Yes/no facial recall (prosodic influence on facial processing)
FC 3 (H, A, N)
FC (H, S, A, D, Sur, N)
Go/no-go (H)
FC 6 (H, S, A, Sur, disinterest, N) Go/no-go (H)
Tasks FC 2 (H, S)
(continued)
English
English
–
–
English
–
M = 88.0 %; SD = 6 % (H)
English
–
English
English
–
–
Language English
Response times –
“[In] emotional prosody no areas of functional activation correlated with accuracy” “In emotion prosody: schizophrenics activate left insula more; bipolar patients activate amygdala, bilateral STG and right IFG less than healthy controls” Hu scores: M = 0.28; SD = 0.02 (H) M = 0.63; SD = 0.03 (S) M ≈ 81.7 %; SD ≈ 13.36 (H)
Accuracy M = 91.8 %; SD = 2.3 % (congruent) M = 89.4 % (collapsed for all emotions)
Fujisawa and Shinohara (2011)
Goerlich et al. (2011)
Castro and Lima (2010)
Source
(53M-50F)
(16M-16F)
32
103
M–F Ratio (8M-72F)
N 80
Table 5.2 (continued)
M = 12.33 SD = 1.16
M = 23.8 SD = 4.0
Age(s) M = 21.8 SD = 6.1
FC 4 (H, S, A, N)
FC 2 (H, S)
Tasks FC 7 (H, S, A, F, D, Sur, N) Intensity scale (1) to (7).
Words priming prosody: no significant difference between sadness and happiness. Prosody priming words: female participants primed by both valences, male participants by positive valence alone M ≈ 65 % (H) M ≈ 80 % (S)
Accuracy M = 75 % (H-lexical) M = 59 % (H-pseudolanguage) M = 84 % (S-lexical) M = 82 % (S-pseudolanguage)
–
Response times Mean (happy lexical) ≈ 3270 ms Mean (happy pseudolanguage) ≈ 3050 ms Mean (sad lexical) ≈ 3200 ms Mean (sad pseudolanguage) ≈ 2900 ms Mean (congruent) ≈ 695 ms
(continued)
Japanese
Dutch
Language Portuguese
Pell et al. (2009a)
Pell et al. (2009b)
Kitayama and Ishii (2002)
Source
(10M-10F)
(10M-9F)
(29M-32F)
19 (Syr.)
61
(12M-12F)
24 (Eng.) 24 (Ger.) 20 (Ind.)
M = 24.9 SD = 7.9 M = 24.2 SD = 2.8 M = 21.55 SD = 3.0 M = 23.9 SD = 5.1 M = 27.0 SD = 4.0
(university/college students)
ENG (14M-24F)
38 (Eng)
(12M-12F)
Age(s) (university/college students)
M–F Ratio JAP (23M-87F)
N 110 (Jap)
Table 5.2 (continued)
FC 6 (H, S, A, F, D, N)
FC 7 (H, S, A, F, D, Sur, N) Intensity scale from (1) to (5)
FC 2 (pleasant, unpleasant)
Tasks FC 2 (pleasant, unpleasant)
ENG M = 79.6 % (H) ENG M = 90.5 % (S) GER M = 59.6 % (H) GER M = 72.6 % (S) IND M = 67.1 % (H) IND M = 75.7 % (S) SYR M = 59.9 % (H) SYR M = 74.7 % (S) *SPA M = 89 % (H) *SPA M = 51 % (S) ENG M = 32 % (H) ENG M = 74 % (S) GER M = 57 % (H) GER M = 65 % (S) SYR M = 59 % (H) SYR M = 77 % (S)
M = 89 %; SD = 10 % (positive congruent) M = 98 %; SD = 12 % (negative congruent)
Accuracy M = 78 %; SD = 20 % (positive congruent) M = 82 %; SD = 17 % (negative congruent)
–
Response times M = 1346 ms; SD = 327 ms (positive congruent) M = 1398 ms; SD = 388 ms (negative congruent) M = 1196 ms; SD = 215 ms (positive congruent) M = 1152 ms; SD = 235 ms (negative congruent) –
(continued)
*Spanish English German Arabic
Arabic
Hindi
German
English
English
Language Japanese
Table 5.2 (continued)
Source | N | M–F ratio | Age(s) | Tasks | Accuracy | Response times | Language
Thompson and Balkwill (2006) | 20 | (8M-12F) | M = 21.95 | FC 4 (H, S, A, F) | *ENG M ≈ 99 % (H); *ENG M ≈ 90 % (S); GER M ≈ 57 % (H); GER M ≈ 84 % (S); CHI M ≈ 48 % (H); CHI M ≈ 61 % (S); JAP M ≈ 58 % (H); JAP M ≈ 79 % (S); TAG M ≈ 50 % (H); TAG M ≈ 97 % (S) | – | *English, German, Chinese, Japanese, Tagalog
Sauter et al. (2009) | 51 (Eng) | (21M-30F) | M = 28.85 | FC 2 (arbitrary case-wise selection) | *ENG M ≈ 95 % (H); *ENG M ≈ 93.75 % (S); HIM M ≈ 82.5 % (H); HIM M ≈ 65 % (S) | – | *English
Sauter et al. (2009) | 58 (Himba) | (26M-32F) | ? | FC 2 (arbitrary case-wise selection) | ENG M ≈ 65 % (H); ENG M ≈ 82.5 % (S); *HIM M ≈ 70 % (H); *HIM M ≈ 82.5 % (S) | – | *Himba
In Table 5.2 the “M–F ratio” refers to the male-to-female participant ratio. “FC” indicates a forced-choice task; the number following the abbreviation indicates the number of options in the forced-choice task. The following abbreviations indicate emotions: H, happiness; S, sadness; A, anger; F, fear; D, disgust; Sur, surprise; N, neutral (in task and accuracy descriptions). The following abbreviations indicate languages: ENG, English; GER, German; IND, Hindi; SYR, Arabic (Syrian); SPA, Spanish; CHI, Chinese; JAP, Japanese; TAG, Tagalog; HIM, Himba. An asterisk (*) indicates the native language of the population sample participating in a given study or study arm.
Communication is a reciprocal and holistic interaction, and in an experimental context we attempt to approximate that interaction under controlled laboratory conditions. In such an approximation, stimuli stand in for an interlocutor communicating emotions through prosody to an experimental participant. And while in such an arrangement the development of stimuli would seem to be as important as the selection of participants, this aspect of design is rather downplayed, if not entirely disregarded.
5.3.1.1 Who Are the Speakers in the Stimuli?
The speakers employed to provide samples of emotional speech are a fairly varied group across the board, and there appears to be little agreement as to the speaker selection process. With the exception of one study, which engaged a native speaker of Bulgarian to provide samples for a study in German (Bostanov and Kotchoubey 2003), and two studies which did not report the designated speakers’ nationalities (Hopyan-Misakyan et al. 2009; Sorocco et al. 2009), all studies routinely involved native speakers of the language in which the study was conducted. The question of the designated speakers’ gender is intimately connected to the number of speakers involved. That number ranges from 1 to 120, and in studies involving an even number of speakers gender balance is usually maintained. Across the 31 studies included in this overview, 14 involved a single speaker, in 7 cases female (Cornew et al. 2009; Goerlich et al. 2011; Kotz and Paulmann 2007; Rota et al. 2008; Schirmer 2010; Schirmer et al. 2005; Schmidt et al. 2010) and in 7 cases male (Bostanov and Kotchoubey 2003; Mitchell et al. 2003, 2004; Mitchell 2006a, b; Paulmann et al. 2010, 2008b). One study involved two female speakers (Castro and Lima 2010), and two did not disclose the number and gender of the speakers involved (Hopyan-Misakyan et al. 2009; Sorocco et al. 2009).
The remaining 14 studies used between 2 and 120 speakers, who were usually well balanced and matched in terms of gender (Schirmer and Kotz 2003; Drolet et al. 2012; Paulmann et al. 2008a; Fujiki et al. 2008; Alba-Ferrara et al. 2011; Dimoska et al. 2010; Paulmann and Pell 2011; Rigoulot et al. 2013; Fujisawa and Shinohara 2011; Kitayama and Ishii 2002; Pell et al. 2009a, b; Thompson and Balkwill 2006; Sauter et al. 2009). Finally, with the exception of Drolet et al. (2012), all studies used exclusively acted emotional prosody. Seven studies employed amateur actors with some documented experience in theater and drama (Cornew et al. 2009; Dimoska et al. 2010; Goerlich et al. 2011; Paulmann and Pell 2011; Paulmann et al. 2008b; Schirmer 2010; Rota et al. 2008), two employed professional actors (Alba-Ferrara et al. 2011; Fujisawa and Shinohara 2011), two involved professional singers (Castro and Lima 2010; Sauter et al. 2009), one opted for a phonetician with specialist training in voice modulation (Bostanov and Kotchoubey 2003), and two once again did not report the professional background of the speakers (Hopyan-Misakyan et al. 2009; Sorocco et al. 2009). The remaining 17 studies opted for naïve speakers instructed ad hoc about the emotional tone they were supposed to adopt (Kotz and Paulmann 2007; Schirmer and Kotz 2003; Schirmer et al. 2005; Drolet et al. 2012; Paulmann et al. 2008a, 2010; Fujiki et al. 2008; Schmidt et al. 2010; Mitchell 2006a, b; Mitchell et al. 2003, 2004; Rigoulot et al. 2013; Kitayama and Ishii 2002; Pell et al. 2009a, b; Thompson and Balkwill 2006). All in all, only about half of the studies involved both male and female speakers, care has usually been taken to use native speakers, and there has been a roughly even split between studies using naïve speakers and those using speakers with some acting or vocal training. Although the consensus in the field points to marked differences between males and females at both the production and reception ends of emotional prosody communication, these differences are not routinely controlled in study designs. Furthermore, while commendable effort is put into engaging native speakers to provide speech samples, the choice between professional and naïve actors is dealt with in a variety of ways. Researchers use naïve speakers, professional and semi-professional actors, singers, and even a trained phonetician, all of whom are typically instructed to speak in a specified emotional tone of voice. Deviations from these overall regularities are comparatively rare, as in the case of the non-native German speaker providing vocalization samples in Bostanov and Kotchoubey (2003), or the extensive use of authentic emotion displays in Drolet et al. (2012).
5.3.1.2 What Are They Saying?
Because prosody is, for the most part, not an independent means of expressing emotion but one inherently tied to semantics in natural speech, the propositional content of speech plays a role in stimulus construction. The speakers engaged to provide samples of speech need material through which to express emotional prosody. The only type of stimulus classified in the literature as prosodic which technically has no propositional content is vocalization.
Since the focus in emotional prosody research is on the “how” rather than the “what” of speech, propositional content is one of the most patchily reported aspects of stimulus construction. Parameters such as stimulus length or the number of propositional content variants used to create prosodic stimuli are also scarcely and inconsistently reported. Because prosody is a suprasegmental feature, discrete patterns of emotional prosody typically become fully realized only over utterances comprising several words, and this is usually respected in experimental designs: 21 of the 31 studies meeting the selection criteria opted for sentences as basic stimulus structures (see Table 5.1). Of the remaining studies, one used a narrative passage as its propositional content (Fujiki et al. 2008), one used an honorific phrase (Fujisawa and Shinohara 2011), one used numbers (Alba-Ferrara et al. 2011), one used both vocalizations and words (Sorocco et al. 2009), two used vocalizations only (Bostanov and Kotchoubey 2003; Sauter et al. 2009), and four used words only (Kitayama and Ishii 2002; Schirmer 2010; Schirmer and Kotz 2003; Goerlich et al. 2011). Eight studies (Paulmann et al. 2008b, 2010; Pell et al. 2009a, b; Cornew et al. 2009; Paulmann and Pell 2011; Castro and Lima 2010; Goerlich et al. 2011) added a complication for the speakers by having them emote in a pseudolanguage, that is, a language constructed according to the phonotactic and morphological rules of a given language but devoid of meaning. Other studies opted for a more speaker-friendly means of dealing with semantics, such as low-pass filtering in postproduction. Some studies constructed stimuli in which semantics and prosody matched, each facilitating the processing of the other (Schirmer and Kotz 2003; Paulmann et al. 2008b; Schirmer 2010; Drolet et al. 2012); others investigated the relationship between semantics and prosody by making them alternately congruent and incongruent (Mitchell 2006b; Kotz and Paulmann 2007; Rota et al. 2008). Yet another solution involved using propositional content of verified neutral valence, that is, emotionally neutral content (Schirmer et al. 2005; Fujiki et al. 2008; Thompson and Balkwill 2006; Hopyan-Misakyan et al. 2009; Schmidt et al. 2010; Sorocco et al. 2009; Alba-Ferrara et al. 2011). Whether in a pseudolanguage or in an existing language, the prepared linguistic material with which speakers are asked to emote is impressively varied across the board. The total number of stimuli used in the studies ranges from as few as four short narratives (Fujiki et al. 2008) to a staggering 2880 single words (Schirmer 2010), while the number of propositional content items on which the stimuli are based ranges from one vocalization (Bostanov and Kotchoubey 2003), one phrase (Fujisawa and Shinohara 2011) or one passage (Fujiki et al. 2008) to 171 different words (Kitayama and Ishii 2002). Reporting average stimulus length is by no means the norm, and where it is reported, length is expressed either in units of time (milliseconds, seconds) or in phonetic units (syllables).
The lengths reported in units of time varied from a low of 750 ms (some vocalizations in Bostanov and Kotchoubey 2003) to a high of 4843 ms (some utterances in Drolet et al. 2012). When studies reported length in syllables, the lowest number was 6 (Pell et al. 2009b) and the highest 14 (Pell et al. 2009a, b). Only one study (Bostanov and Kotchoubey 2003) reports all relevant types of information: the total number of stimuli, a full transcript of the propositional content, and the mean length of stimuli in seconds. Five studies (Cornew et al. 2009; Paulmann et al. 2008a; Rota et al. 2008; Dimoska et al. 2010; Rigoulot et al. 2013) report the number of stimuli, selected examples of the propositional content used, and mean stimulus lengths in seconds or milliseconds. Three studies (Castro and Lima 2010; Pell et al. 2009a, b) report the total number of stimuli, selected examples, and the mean number of syllables across stimuli. Four studies (Thompson and Balkwill 2006; Kitayama and Ishii 2002; Fujiki et al. 2008; Fujisawa and Shinohara 2011) report the number of stimuli and full transcripts of propositional content, though, with the exception of Kitayama and Ishii (2002), who used a total of 171 words (92 in Japanese, 79 in English), these transcripts comprise only one or two items. Four studies (Paulmann and Pell 2011; Kotz and Paulmann 2007; Paulmann et al. 2010; Sorocco et al. 2009) report the number of stimuli with selected examples but no mean lengths. Three studies (Drolet et al. 2012; Schmidt et al. 2010; Alba-Ferrara et al. 2011) report the number of stimuli and their average lengths in seconds or milliseconds but no examples of propositional content, though Alba-Ferrara et al. (2011) do report using “numbers” as propositional content. Four studies led by Mitchell (Mitchell et al. 2003, 2004; Mitchell 2006a, b) report selected examples and mean stimulus lengths in seconds, but not the number of stimuli involved. Finally, seven studies (Schirmer and Kotz 2003; Schirmer et al. 2005; Paulmann et al. 2008b; Hopyan-Misakyan et al. 2009; Sauter et al. 2009; Schirmer 2010; Goerlich et al. 2011) report only the number of stimuli and no further details, though Sauter et al. (2009) note that they used vowel-based vocalizations. All in all, there is more variety than consistency in the propositional content speakers use to convey emotions. Vocalizations have been used and called “prosody” even though, by definition, they are not suprasegmental features but simple phonemes. More often, researchers have used words and sentences, with an occasional passage or phrase. The issue of masking semantics pushed some researchers to tax their speakers additionally with emoting over fragments of nonsensical pseudolanguage. Once recorded, the speech samples varied considerably in length, from below a second to nearly 5 s. In other words, it would be difficult to speak of any established or prevalent trend regarding the propositional content used in emotional prosody research, with the possible exception of a preference for roughly sentence-length stimuli.
5.3.1.3 How Are They Saying It?
Speakers engaged to provide emotional speech samples for emotional prosody research take on the de facto role of actors during the recording of stimulus material. Whether they are trained actors or naïve speakers, the standard procedure across virtually all studies is to instruct them to express all or a selection of the basic emotions using their voice and the propositional content provided. The only notable exception to this general rule is a portion of the stimuli in Drolet et al. (2012), which comprised authentic expressions of emotion collected from news reports and live interviews on public radio. The act of emoting in a laboratory context is fraught with conflicting objectives, and the stimuli thus created are of necessity the result of design compromises. Good research practice in psycholinguistics dictates strict control over stimulus length, prompting some researchers to lean towards words, as they are easy to match and control on multiple variables. On the other hand, emotional prosody demands longer units such as sentences, which are not as easy to match and control. Prosody is tied to semantics in speech, so investigating prosody without interference from semantics demands neutralizing semantic influence in some way. However, neutralizing semantics with pseudolanguages, for example, increases the perceived difficulty and artifice of the acted emotions from the speakers’ perspective. This is most likely the reason why pseudolanguages are not widely used as a method of neutralizing or masking semantics, with only 8 of the 31 studies analyzed implementing this method (Paulmann et al. 2008b, 2010; Pell et al. 2009a, b; Cornew et al. 2009; Castro and Lima 2010; Goerlich et al. 2011; Paulmann and Pell 2011).
Other studies used different methods of masking or emotionally neutralizing semantics, such as low-pass filtering or semantically neutral content. What researchers call emotionally or semantically neutral content varied somewhat, with some researchers using numbers (Alba-Ferrara et al. 2011) or non-semantic vocalizations (Bostanov and Kotchoubey 2003; Sorocco et al. 2009; Sauter et al. 2009). Low-pass audio filtering in postproduction has also been used a number of times (Kitayama and Ishii 2002; Mitchell et al. 2003, 2004; Mitchell 2006a; Sorocco et al. 2009; Dimoska et al. 2010; Rigoulot et al. 2013). Low-pass filtering is a method of editing an audio signal to remove the upper portion of its frequency spectrum, rendering the semantic content opaque while retaining the prosodic information. This method of removing semantic content shows promise, especially as it shifts the burden of dissociating prosody from semantics from the speakers to the researchers, who can carry it out in postproduction. However, of the analyzed studies only Sorocco et al. (2009) reportedly restricted the signal to a band narrow enough (70–300 Hz) to truly remove the semantic content. The remaining studies reported only upper cutoff frequencies, all of which are too high to guarantee complete removal of semantic content: Dimoska et al. (2010) reported cutoffs of 360 Hz for male and 400 Hz for female speakers, Mitchell (2006a) a cutoff of 350 Hz, Mitchell et al. (2003, 2004) a cutoff of 333 Hz, and Kitayama and Ishii (2002) a uniform cutoff of 400 Hz.
With the exception of the natural expressions dataset in Drolet et al. (2012), all results on emotional prosody processing are based on playacted emotions. The standard procedure is to instruct speakers to portray specific emotions by describing the vocal tone the researchers hope to obtain.
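To make the logic of the low-pass filtering discussed above more concrete, the sketch below passes two pure tones through a windowed-sinc FIR low-pass filter with a 400 Hz cutoff, the highest cutoff among those reported above. It is an illustration under stated assumptions, not a reconstruction of any cited study's procedure: the 16 kHz sampling rate, the 201-tap Hamming-windowed design, the function names, and the use of a 150 Hz tone (a stand-in for the fundamental-frequency region carrying the pitch contour) and a 2000 Hz tone (a stand-in for higher-frequency energy carrying segmental detail) are all hypothetical choices. Real speech spreads energy across many frequencies, so a tone test shows only what the filter passes and blocks, not how intelligible filtered speech remains.

```python
import math

def lowpass_kernel(cutoff_hz, sample_rate, num_taps=201):
    """Windowed-sinc FIR low-pass kernel (Hamming window), scaled to unit gain at 0 Hz."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the kernel has a single centre tap")
    fc = cutoff_hz / sample_rate  # cutoff as a fraction of the sampling rate
    centre = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - centre
        # Ideal (brick-wall) low-pass impulse response sampled at integer offset k.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        # Hamming window tapers the truncated response to reduce ripple.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def apply_fir(signal, taps):
    """Convolve signal with taps; the output is aligned with the input (same length)."""
    centre = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + centre - j
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out

def rms(values):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def tone_gain(freq_hz, cutoff_hz, sample_rate=16000, duration_s=0.25):
    """Ratio of output to input RMS for a pure tone passed through the filter."""
    n = int(sample_rate * duration_s)
    tone = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    filtered = apply_fir(tone, lowpass_kernel(cutoff_hz, sample_rate))
    edge = 200  # skip samples where the kernel overhangs the ends of the signal
    return rms(filtered[edge:-edge]) / rms(tone[edge:-edge])

for freq in (150, 2000):
    print(f"{freq} Hz tone, 400 Hz cutoff: gain = {tone_gain(freq, 400):.3f}")
```

The 150 Hz tone should pass with a gain close to 1, while the 2000 Hz tone should be almost entirely removed. Because a finite filter has a transition band (on the order of a couple of hundred hertz for this kernel length), energy somewhat above the nominal cutoff is only partly attenuated, which is one reason the exact cutoff matters when judging whether semantic content has been fully removed.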
The discrete basic emotions of the standard view begin to play their crucial role in the design of these studies at the level of the instructions given to speakers. The sole exception was Kitayama and Ishii (2002), who instructed their speakers to express themselves in a “smooth and round tone” associated with positive sentiments or “a harsh and constricted tone” associated with negative sentiments in Japanese culture. All other studies used the typical basic emotion designations, with occasional additions of what are referred to as more complex emotions or a neutral tone. The selection of emotions appears to be of high importance here, and some emotions are used more frequently than others. With the exception of Kitayama and Ishii (2002), who used a different framework, and Paulmann et al. (2008a), all the studies included in this analysis included happiness as one of the emotions investigated in prosody. Twenty-three studies also included sadness (see Table 5.1), making it the second most popular choice. Six studies opted for anger in place of sadness, and anger was an addition in 16 more studies, making it the third most popular choice. Fourteen studies involved fear, nine involved disgust, and eight some form of surprise. The design of the stimuli thus hinges on the assumption that the universal basic emotions are inherently understood by the speakers. To conclude, regarding the “how” of the prosodic stimuli, there is also very little agreement beyond the use of actors to portray emotions in a specified vocal tone. Researchers differ in their approaches to stimulus composition: some use pseudolanguages, risking an increased processing and production load for the speakers; some manipulate the valence of the propositional content; and some use audio filters in postproduction. Only with respect to the emotions typically selected for research is there a fairly stable trend: happiness and sadness are chosen most consistently, while the other basic emotions are used less frequently across the body of research.
5.3.1.4 From Speech Sample to Stimulus—Issues of Stimuli Validation
The main reason emotional prosody research is stuck with playacted portrayals is the ever-present ethical concern surrounding the induction of genuine emotions in laboratory settings. The history of ethics in human subjects research over the past century shows that experimental psychology has become a tightrope act, balancing precariously between methods that are highly effective and those that are ethical. The methods implemented in Watson’s experiments with Little Albert (Watson and Rayner 1920) and in Landis’ (1924) experiments on inducing emotions were undeniably brilliant and effective, but at the same time ethically highly questionable. The increased ethical stricture in emotion research today is undoubtedly commendable, but it has also likely forced researchers to drop the most effective and straightforward methods of inducing emotions from their methodological toolboxes. This tension between effective methods and good ethical research conduct can, however, be resolved. One example of a solution can be found in the methods, both effective and ethically sound, used to induce emotions in the Belfast Induced Natural Emotions Database (BINED) (Sneddon et al. 2007). Since inducing and recording emotions in laboratory conditions is limited by considerable ethical concerns, the seemingly obvious next best source of material would be existing recordings of emotional expressions from sources such as news stations, which, after all, capture human drama on a regular basis.
Yet this option is rife with its own set of problems. Not all such material is in the public domain, and some of it does not fall under the conditions of fair use for research purposes. What is available often has mixed audio quality, making appropriate normalization of the material for stimulation purposes impossible (Banse and Scherer 1996). Thus, sooner or later, every researcher investigating emotional prosody has to face the reality of using a playacted approximation of emotional reality. Given that an acted emotion is not the same as a naturally expressed one (Drolet et al. 2012), stimulus material should undergo norming procedures once recorded to make sure it is valid for the purposes of the research it is to serve. Unfortunately, norming or validation studies are among the most neglected aspects of stimulus creation in the reporting of research. The great majority of emotional prosody studies certainly do carry out validation or norming studies. The problem from a meta-analytical point of view is that the details of these validation studies are very poorly reported, which lowers overall replicability. The few details that are reported testify to the heterogeneous nature of the field of emotional prosody research. Across the thirty-one studies included here, five did not report conducting any validation studies (Bostanov and Kotchoubey 2003; Paulmann et al. 2008a, b; Pell et al. 2009b; Sauter et al. 2009), four used pre-existing tests assessing various aspects of prosodic aptitude (Rota et al. 2008; Hopyan-Misakyan et al. 2009; Sorocco et al. 2009; Schmidt et al. 2010), three referred to previous studies (Pell et al. 2009a; Dimoska et al. 2010; Paulmann et al. 2010), one referred to a terminated online source (Alba-Ferrara et al. 2011), and one to an undisclosed previous study (Drolet et al. 2012). Two studies reported validating the propositional content of the stimuli but not the resultant prosodic stimuli (Schirmer and Kotz 2003; Schirmer et al. 2005), and a total of nine validation study descriptions were missing important details regarding either the gender of the raters (Kitayama and Ishii 2002; Thompson and Balkwill 2006; Paulmann and Pell 2011; Rigoulot et al. 2013) or the validation procedure (Mitchell et al. 2003, 2004, 2006a, b; Fujisawa and Shinohara 2011). Of the remaining six studies, three gave satisfactory details on the number of raters and the validation procedure (Fujiki et al. 2008; Cornew et al. 2009; Goerlich et al. 2011), and three gave a complete set of details on the raters (their number and gender) and on the validation procedures, making them properly replicable (Kotz and Paulmann 2007; Castro and Lima 2010; Schirmer 2010). The number of raters, when reported, varied from as few as 2 (Fujisawa and Shinohara 2011) to as many as 72 (Castro and Lima 2010). The validation procedures, when reported, also differed, though they usually involved some variation on scalar/dimensional evaluation. Castro and Lima (2010) used a combination of forced choice and perceived emotion intensity in their validation procedure, and Kotz and Paulmann (2007) placed anger at the negative end of a valence scale and happiness at the positive end.
Schirmer (2010) used standard scales of valence and arousal combined with forced choice validation to maximize reliability. Goerlich et al. (2011) used a valence scale with sadness and happiness at opposite extremes and selected stimuli based on a high threshold of inter-rater agreement. Cornew et al. (2009) used a combination of forced choice and a custom scale of strength of expression for each stimulus. Fujiki et al. (2008) opted for forced choice only, with an added option of “I don’t know”. Interestingly, the procedural designs at the validation stage do not always match the tasks eventually performed by the experimental participants. The great majority of experimental tasks were based on forced choice paradigms, whereas at the validation stage scalar evaluation is clearly preferred, at least in the few studies that reported enough details to allow such an assessment. One conclusion that appears inevitable upon reviewing the approaches to stimulus creation and validation in emotional prosody research is that this aspect of research is not given its due attention. The stimuli are of crucial importance in this field, and though the disadvantages of using playacted rather than genuine emotional expressions are openly admitted, the efforts to control the effectiveness of the stimuli through validation appear inadequate. It also appears significant that there are no emotion elicitation procedures for the speakers, and their emotional states during production are never probed or controlled. The only type of validation is external, through the perceptual evaluations of observers.
A rather significant piece of information we never get from the existing research is whether the speakers felt the emotions they were asked to act out.
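Goerlich et al. (2011) are described above as selecting stimuli on the basis of a high threshold of inter-rater agreement. A minimal sketch of such a selection step, assuming forced-choice validation ratings stored per stimulus, is given below; the data layout, the function name, and the 0.7 threshold are hypothetical and are not taken from that study. Like the validation procedures reviewed here, it checks only the observers' perception against the intended category and says nothing about what the speaker felt.

```python
from collections import Counter


def select_validated_stimuli(ratings, intended, threshold=0.7):
    """Return {stimulus_id: agreement} for stimuli that pass the threshold.

    ratings:   {stimulus_id: [label chosen by each rater, ...]}
    intended:  {stimulus_id: emotion the speaker was asked to portray}
    threshold: minimum share of raters choosing the intended label
               (0.7 is a hypothetical cutoff).
    """
    kept = {}
    for stimulus_id, labels in ratings.items():
        if not labels:
            continue  # unrated stimuli cannot be validated
        agreement = Counter(labels)[intended[stimulus_id]] / len(labels)
        if agreement >= threshold:
            kept[stimulus_id] = agreement
    return kept


ratings = {
    "s01": ["sad", "sad", "sad", "happy"],
    "s02": ["happy", "neutral", "sad", "happy"],
}
intended = {"s01": "sad", "s02": "happy"}
print(select_validated_stimuli(ratings, intended))  # {'s01': 0.75}
```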
5.3.2 On the Populations Involved in Emotional Prosody Research
How emotional prosody is perceived, or more precisely how accurately it is recognized, is the central focus of emotional prosody research. In the grand scheme of translating a real-life emotional interaction between two interlocutors to laboratory conditions, the participation aspect covers the interlocutor who perceives and interprets the emotions communicated through the stimuli. The devotion of emotional prosody researchers to the standard view framework in the design of experimental tasks is especially apparent here, though the fitful and fragmentary nature of the field has also left its mark. The studies selected for this meta-analysis include fMRI, EEG, and classic behavioral studies, each of which puts different demands on population sampling and counterbalancing and is interested in different types of data. The ages within and across population samples also often vary widely, though there is a justified preference for young adults. What is less justifiable is the English-centricity of population sampling in emotional prosody research, especially given the field’s commitment to the standard view. That commitment is especially visible in the preferential use of forced choice recognition tasks over any other means of probing the mechanisms underlying emotional prosody processing. Because study objectives differ across the literature, the results are often reported in highly idiosyncratic ways, making comparisons across studies extremely difficult or even impossible. Interestingly, although response times should be easy to collect alongside accuracy scores in forced choice tasks, they are, as a rule, rarely recorded or analyzed. Finally, the available literature focuses overwhelmingly on native contexts, with only three studies venturing into cross-cultural and cross-linguistic contexts.
On the whole, what we can gain from an analysis of the existing literature is a very good, thorough understanding of how a native speaker of English processes emotional prosody in his/her native tongue. The population samples in the literature selected for this review vary a great deal depending on the type of study described in each paper, from N = 10 in Rota et al.’s (2008) fMRI study to a total N = 148 in Kitayama and Ishii’s (2002) behavioral semantic/prosodic decision study. fMRI studies also stand out for their frequent single-gender sampling, dictated by documented gender-specific differences in activation patterns for processes other than emotional prosody (Mitchell et al. 2003, 2004; Rota et al. 2008; Alba-Ferrara et al. 2011; Drolet et al. 2012). With few exceptions (e.g., Sorocco et al. 2009 did not report the male–female ratio), however, every effort is usually made to create gender-balanced samples (see Table 5.2). The age of the participants does not vary much across the literature, and there is a marked preference for young and middle-aged adults. This preference is grounded in the fact that the gender-specificity of emotional prosody processing emerges around puberty (Fujisawa and Shinohara 2011) and gives way to gender-neutral age-related effects in late adulthood (Paulmann et al. 2008b). Of the thirty-one studies included in this review, only four focused on prepubescent children: three within English native contexts (Fujiki et al. 2008; Hopyan-Misakyan et al. 2009; Schmidt et al. 2010) and one within a Japanese native context (Fujisawa and Shinohara 2011). Among the remaining studies with adult populations, only two failed to report reliable data on the exact ages of the participants. Sauter et al. (2009) worked with preliterate Himba tribesmen, who do not share the Western concept of measuring the passage of time in years, so determining age was impossible. The other was the Kitayama and Ishii (2002) study, in which only the vague designation of university and college age was reported. With these exceptions, the overall picture points to 18–45 years as the preferred age range for participants in emotional prosody studies. Although the types of studies and research agendas differ across the available literature, the basic structure of the experimental tasks is remarkably stable across the implemented designs. That basic structure is a forced choice task in which the available choices usually comprise basic emotion categories (happiness, sadness, anger, fear, disgust, surprise) in different arrays, with the occasional addition of a neutral tone option. Among the studies reviewed here, 24 opted for a forced choice task (see Table 5.2). Of those, two offered a choice between two very general hedonic categories: positive/pleasant versus negative/unpleasant emotions (Kitayama and Ishii 2002; Schirmer and Kotz 2003).
The remaining 22 studies included basic emotion categories, though they varied in the number of categories offered as possible choices, from just two (Goerlich et al. 2011: happiness and sadness only) to the full set of six (e.g., Paulmann et al. 2008b). Four of the 22 studies added response options for emotions or sensations outside the basic emotion categories. Fujiki et al. (2008) included I don’t know as an option for the children in their study to account for their not yet fully developed emotional intelligence. Schmidt et al. (2010) likewise added sleepy as an option, Sorocco et al. (2009) added disinterest, and Alba-Ferrara et al. (2011) added pride, guilt, and boredom. Sixteen studies included neutral as a response option (see Table 5.2). Seven studies included in this review used tasks other than forced choice, but these were nonetheless predicated on the participants’ ability to distinguish between discrete emotion categories; these tasks included word probe (Kotz and Paulmann 2007), lexical decision (Schirmer and Kotz 2003), oddball (Bostanov and Kotchoubey 2003), recall (Schirmer 2010; Rigoulot et al. 2013), and go/no-go (Mitchell et al. 2003, 2004). On the whole, although the consensus leans towards dimensional interpretations of results in emotional prosody research, the research paradigms are rooted firmly in the standard view. Furthermore, the only options made available in forced choice tasks usually correspond to the categories of stimuli created for a given study. One apparent implication is that the majority of emotional prosody research investigates not so much emotion recognition as emotion categorization.
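Because the number of response options ranges from two to six basic emotions (plus, in many studies, neutral), the same raw accuracy can mean very different things. The sketch below computes the guessing rate for k options and a one-sided exact binomial p-value for a listener's score. It assumes independent trials and no response bias, simplifications the reviewed studies do not necessarily make, and the function names are illustrative.

```python
from math import comb


def chance_level(n_options):
    """Expected accuracy when guessing uniformly among n_options labels."""
    return 1.0 / n_options


def p_above_chance(correct, trials, n_options):
    """One-sided exact binomial p-value: P(X >= correct) under pure guessing."""
    p = chance_level(n_options)
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# 12 correct out of 40 (30 % accuracy) under two different response sets.
print(p_above_chance(12, 40, 2))  # two options (e.g. happy/sad): far from significant
print(p_above_chance(12, 40, 7))  # six basic emotions plus neutral: below 0.05
```

With two options, 30 % accuracy is actually below the 50 % guessing rate, whereas with seven options it is roughly twice the guessing rate of about 14 %.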
Given the nature of emotional prosody research, it is perhaps not very surprising to find certain inconsistencies in the overall style of reporting accuracy and response time data. Indeed, only seven of the studies included in this review gave any response time data, and those were often collapsed across conditions or merely descriptive, so no wide-scale systematic comparisons could be carried out (see Table 5.2). All the studies did, however, report accuracy data, though in some studies these reports are only descriptive and treated as incidental to the main purpose of the study (Bostanov and Kotchoubey 2003; Mitchell et al. 2003, 2004; Goerlich et al. 2011). Most of the studies report accuracy in percentages, though some report prediction probability scores (Schirmer 2010; Drolet et al. 2012) or unbiased accuracy (Hu) scores (Paulmann and Pell 2011). Table 5.2 summarizes the accuracy scores for the recognition of happiness and sadness, as these two emotions are the focus of the study reported in Chaps. 5 and 6. Accuracy for both happiness and sadness (or equivalent positive/negative sentiments) was reported in 17 of the 31 studies. In seven more, the scores were collapsed across various conditions, tasks, or emotions regardless of their valence. While the interpretations of the results leave no doubt that they support the universalist theory of emotions, there is still noticeable variability in accuracy scores across languages and in cross-linguistic emotion recognition. Accuracy scores become interesting in light of the standard view when they are considered in total and as a function of within- or cross-linguistic study contexts. Twenty-eight of the studies analyzed here investigated emotional prosody in native contexts, and only three in cross-linguistic contexts.
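Raw percentages reward listeners who over-use a response category. The unbiased hit rate (Hu) reported by Paulmann and Pell (2011) corrects for this: as it is commonly defined (following Wagner 1993), Hu for a category is the number of correct responses squared, divided by the product of the number of stimuli in that category and the number of times that response was given. A minimal sketch computing Hu from a confusion matrix is shown below; the matrix is invented for illustration and does not come from any reviewed study.

```python
def unbiased_hit_rates(confusion, labels):
    """Unbiased hit rate Hu per category from a confusion matrix.

    confusion[i][j] counts stimuli of category labels[i] that received
    response labels[j] (rows = intended emotion, columns = response).
    Hu_i = hits_i**2 / (stimuli_i * responses_i), i.e. the raw hit rate
    multiplied by the share of labels[i] responses that were correct.
    """
    hu = {}
    for i, label in enumerate(labels):
        hits = confusion[i][i]
        stimuli = sum(confusion[i])                   # row total
        responses = sum(row[i] for row in confusion)  # column total
        hu[label] = hits**2 / (stimuli * responses) if stimuli and responses else 0.0
    return hu


labels = ["happy", "sad", "neutral"]
confusion = [
    [8, 2, 0],   # happy stimuli
    [0, 10, 0],  # sad stimuli: 100 % raw accuracy...
    [0, 6, 4],   # ...partly because "sad" is also chosen for neutral stimuli
]
print(unbiased_hit_rates(confusion, labels))  # happy 0.8, sad about 0.56, neutral 0.4
```

Sadness reaches 100 % raw accuracy here, but its Hu falls to about 0.56 because "sad" was also the response to six neutral stimuli, a bias that raw forced-choice percentages do not reveal.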
By collapsing evidence across all studies, it is possible to form a reasonable idea of how average speakers of some of the investigated languages process emotional prosody. English natives interpreting English emotional prosody are documented in 18 different studies, German natives interpreting German emotional prosody in 10 studies, and Japanese natives interpreting Japanese emotional prosody in 2 studies. Portuguese, Dutch, Spanish, Arabic, Hindi, and Himba natives are documented in just one study each. Collapsing the populations across the literature gives us an insight into how English-centric, or even Germanic-centric, research on emotional prosody has been so far. Figure 5.1 gives a summary of averaged recognition accuracy scores in percentages for happiness and sadness in native contexts. When collapsed, the populations contributing to the average scores in Fig. 5.1 are as follows: for English N = 637 (299M:309F; age M = 21.4), for German N = 313 (158M:155F; age M = 31.01), for Japanese N = 213 (76M:137F; age M = 12.33 + college-age), for Portuguese N = 80 (8M:72F; age M = 21.8), for Spanish N = 61 (29M:32F; age M = 27.0), for Himba N = 58 (26M:32F; age M not available), for Dutch N = 32 (16M:16F; age M = 23.8), for Hindi N = 20 (10M:10F; age M = 21.55), and for Arabic N = 19 (10M:9F; age M = 23.9). In other words, of all participants enrolled in the emotional prosody studies reviewed here, approximately 45 % are native English speakers, approximately 22 % are native speakers of German, and approximately 2 % are native speakers of Dutch (not represented in Fig. 5.1, as Goerlich et al. (2011) did not report percentage accuracy data). This means that a total of approximately 69 % of all participants contributing to the body of evidence in emotional prosody research are speakers of languages of the Germanic family. A further 15 % are native speakers of Japanese, leaving only about 17 % to be shared among speakers of Portuguese, Spanish, Arabic, Hindi, and Himba. Thus, data from Germanic languages, with English overwhelmingly in the lead, constitute the greatest contribution to our understanding of emotional prosody in general.
[Figure: bar chart of recognition accuracy (%) for happiness and sadness by native and nonnative listeners (series: Happiness-native, Happiness-non-native, Sadness-native, Sadness-non-native) for English, German, Japanese, Arabic, and Himba]
Fig. 5.1 A comparison of recognition rates for sadness and happiness in different languages
One of the more interesting insights to be gained from collapsing the results in this way is that, although all recognition rates here are indeed above chance, there are marked differences in how native speakers of different languages process the emotional prosody of the assumed universal basic emotions in their own native languages. Since such basic emotions as happiness and sadness are recognized differently within various languages, one could expect interpreting emotional prosody in a foreign language to pose a greater challenge and result in a marked dip in accuracy scores. Such a result has been described as an in-group advantage (Elfenbein and Ambady 2002a, b), that is, a tendency to recognize the emotions of one’s own language and culture more accurately than those of another language and culture. Across the 31 studies included in this review, the emotional prosody of five languages was evaluated by both native and nonnative populations: English, German, Japanese, Arabic, and Himba. Figure 5.2 illustrates a comparison of how native and nonnative populations recognized the emotional prosody of happiness and sadness. The data on the nonnative populations evaluating the emotional prosody of each of these languages come from three studies, all of which reported that their participants were monolingual and had little to no exposure to the investigated foreign languages (Thompson and Balkwill 2006; Pell et al. 2009a; Sauter et al. 2009).
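The participant shares can be recomputed directly from the collapsed native-context sample sizes listed above. The short computation below does so, grouping English, German, and Dutch as Germanic; it is a sketch of the arithmetic only and assumes that the collapsed samples do not overlap.

```python
# Collapsed native-context sample sizes (N) as listed in this section.
native_n = {
    "English": 637, "German": 313, "Japanese": 213, "Portuguese": 80,
    "Spanish": 61, "Himba": 58, "Dutch": 32, "Hindi": 20, "Arabic": 19,
}
GERMANIC = ("English", "German", "Dutch")

total = sum(native_n.values())
share = {lang: 100.0 * n / total for lang, n in native_n.items()}
germanic = sum(share[lang] for lang in GERMANIC)
japanese = share["Japanese"]
others = 100.0 - germanic - japanese  # Portuguese, Spanish, Himba, Hindi, Arabic

print(f"total N = {total}")                 # total N = 1433
for lang in GERMANIC:
    print(f"{lang}: {share[lang]:.1f} %")   # 44.5 %, 21.8 %, 2.2 %
print(f"Germanic: {germanic:.1f} %")        # Germanic: 68.5 %
print(f"Japanese: {japanese:.1f} %")        # Japanese: 14.9 %
print(f"others: {others:.1f} %")            # others: 16.6 %
```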
The comparison here is only meant to give a rough idea of the kinds of differences one can expect when comparing emotional prosody recognition across languages, as the profound imbalance of the population samples makes a statistically valid comparison impossible. The total population sample of individuals for whom English was the native tongue was N = 637, compared to a total of only N = 119 for individuals with negligible knowledge of English (N = 61 Spanish speakers in Pell et al. 2009a, N = 58 Himba speakers in Sauter et al. 2009). For German it was N = 313 to N = 81 (English speakers N = 20 in Thompson and Balkwill 2006 and Spanish speakers N = 61 in Pell et al. 2009a). For Japanese it was N = 213 to N = 20 (English speakers N = 20 in Thompson and Balkwill 2006). Only for Himba was there some semblance of balance, with a ratio of N = 58 to N = 61 (English speakers in Sauter et al. 2009). Despite these imbalances, one tentative observation can be made: what constitutes a discrete pattern of emotional prosody in one language can be perceived somewhat differently in another, especially when the participants of a given study do not know the language the prosodic samples come from. One question remains unexplored: how do individuals who know a given language, but do not belong to and have not been raised in its native community, process the emotional prosody of that language?
[Figure: bar chart of recognition accuracy (%) for happiness and sadness in English, German, Japanese, Portuguese, Hindi, Arabic, Spanish, and Himba]
Fig. 5.2 A comparison of recognition rates of happiness and sadness across languages
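The imbalance between native and nonnative samples can be expressed as a simple ratio per stimulus language. The sketch below uses the N values given above, where a ratio of 1.0 would mean equal sample sizes; the dictionary layout and the function name are illustrative.

```python
# (native N, nonnative N) per stimulus language, as given in this section.
samples = {
    "English": (637, 119),
    "German": (313, 81),
    "Japanese": (213, 20),
    "Himba": (58, 61),
}


def native_to_nonnative_ratio(native_n, nonnative_n):
    """How many native listeners there are for every nonnative listener."""
    return native_n / nonnative_n


for lang, (native_n, nonnative_n) in samples.items():
    ratio = native_to_nonnative_ratio(native_n, nonnative_n)
    print(f"{lang}: {native_n} vs {nonnative_n} -> {ratio:.2f} : 1")
# English 5.35, German 3.86, Japanese 10.65, Himba 0.95
```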
5.4 The State of Emotional Research—Evaluation
All things considered, it is perhaps charitable to call emotional prosody research a field. There are very few researchers committed to continuous research on the subject (notably Marc D. Pell, Silke Paulmann, Annett Schirmer, and their teams), the research agendas are incredibly varied, and the style of reporting procedural details for replication purposes is extremely inconsistent. The only aspects holding this area of research together are the preferential choice of the standard view of emotions as the basic theoretical framework and the use of forced choice tasks in experimental procedures. Dismayingly often, those inquiring into the nature of emotional prosody processing fail to heed the advice of their predecessors. Though the effects of gender on both the production and the perception of emotional prosody are part of the overall consensus, more than half of the researchers in emotional prosody opt to use only one speaker, thus failing to balance gender effects in the stimuli (Banse and Scherer 1996). The usual research practice is also to involve naïve native speakers with no training in acting and to instruct them to act out specified vocal tones of emotion over specified propositional content. The propositional content varies from semantically meaningless vocalizations to complex passages, with a marked preference for sentences. Researchers also appear to be very intent on analyzing emotional prosody in isolation, which leads them to any number of methods of disentangling the semantic and prosodic layers of meaning in speech. Some tax the speakers with emoting in a pseudolanguage, lowering even further the ecological validity of the stimulus material thus produced. Others argue for the use of semantically neutral language, seemingly overlooking the fact that, given the nature of language and of emotions, there can really be no such thing as a word completely devoid of affective meaning (Russell 1994; Kopytko 2002). The validity of the stimuli in emotional prosody research is the result of multiple and inevitable compromises, which makes them vulnerable on many levels. The poor record of stimulus validation across the literature therefore does the researchers no service. There are two major issues on the participation side of the research. One is the inordinate proportion of studies conducted exclusively in the English language context. The other is the use of forced choice task results to make claims about emotion recognition rather than categorization. Ekman (2003) postulated that emotional prosody was a universal disambiguating feature in emotional communication. The research on emotional prosody, however, being as profoundly English-centric as it is, seems unfit to support this claim with its body of evidence. Still, the typical choices in population sampling do tell us something about the nature of emotional prosody processing in the average young adult English speaker.
We know that such an individual can recognize the emotional prosody of his/her own native language within the limits of a forced choice task, that recognition accuracy within these limits is always well above chance, and that male and female individuals process emotional prosody somewhat differently. The issue of the preferential use of forced choice tasks is more difficult. While this is an established and highly reliable method of evaluating emotion recognition, it also remains true that recognition by categorization into the prototypical emotion concepts of the English language has its limits. Compounding this problem is the fact that, across the literature on this subject, the construction of the stimuli dictates the options within forced choice absolutely. The one laudable exception was the study by Fujiki et al. (2008), who allowed that the child participants might not have the capacity to comprehend certain emotions and thus provided an I don’t know option. Other than that, there is a perfect match between the emotions produced by speakers and the emotion category options available within the forced choice. The problem with such an arrangement is that it leaves precious little room for errors, mistakes, and misinterpretations, and it is only through those that we can determine both the scale and the points of divergence between the native speakers of an investigated language and other populations. Whereas Paul Ekman believed emotional prosody to be a disambiguating feature (Ekman 2003), Klaus Scherer (Scherer et al. 2003) insisted on it being inherently ambiguous. The body of evidence from emotional prosody research appears to suggest that emotional prosody is ambiguous unless viewed through the rigorously disambiguating categorical choices of the Ekmanian standard view.
5.5 Investigating Emotional Prosody in Nonnative English Speakers—Study Design
In its devotion to the theoretical assumptions and empirical solutions of Ekmanian universalism, emotional prosody research, such as it is, appears out of step with the most recent developments in emotion research in general. In its preferred frameworks and basic methods, it is a perfect picture of the state of emotion research a couple of decades ago, when the critics of the standard view were still operating outside the psychological mainstream. The role of both cultural and linguistic relativity in emotion processing has been gaining acceptance in psychology. Methodological alternatives to forced choice within the standard view have been tried and tested in empirical studies on various subjects across the broad spectrum of emotion research. The focus of this study on nonnative speakers of English thus stems from both the theoretical developments in emotion theory and the methodological limits of emotional prosody research. On the theoretical level, investigating highly proficient but nonnative English speakers with little to no meaningful exposure to the formative early-childhood influence of English-speaking culture could help to disentangle culture-relativistic and language-relativistic effects on emotional prosody processing. On the methodological level, proficient nonnative speakers of any language, as opposed to listeners unfamiliar with the language, have not yet been investigated in emotional prosody processing. Also on the methodological level, the consensus suggests an involvement of arousal as a discriminating feature in emotional prosody perception, and the gender- and age-related effects bespeak the involvement of appraisals. Emotional prosody may therefore be the perfect aspect of emotional communication on which to deploy a fully realized integrative research paradigm based on an integrative/constructivist model of emotion processing.
Such a model, as it would involve processing levels demanding varying specificity in emotion identification, would in turn be an ideal approach for investigating emotional prosody processing in a population quite different in its characteristics from those investigated before. Adding to these intersecting purposes is the increasing relevance of investigating different aspects of social communication in nonnative speakers of English. This relevance follows from the fact that even the more conservative estimates show that by 2050 the global population of nonnative speakers of English will overtake that of native speakers of the language (Graddol 2003). The time is therefore more than ripe for gaining an understanding of how nonnative speakers process emotions in English.
5.5.1 Creating Stimuli
The language of the study will be English, and the speakers will be native speakers of English, equal numbers of them male and female. The speakers will not be professional actors. They will contribute both natural and acted emotional expression samples, and the emotions will be elicited in them under laboratory conditions. The emotions will be evoked using sad and happy evocative film clips, a well-established emotion elicitation method (see Rottenberg et al. 2007). Natural expressions of emotion will be elicited in the course of a semi-structured dialogue between the speaker and the researcher following the viewing of each prepared clip. A session of acting out emotions using a series of prepared sentences will follow, in which the speakers will be instructed to adopt a certain tone of voice (happy or sad) and say each sentence presented to them in that tone. The sentences will be taken from an existing spoken corpus of English, but they will be based on an existing set from the Velten technique of mood elicitation (Velten 1968). Throughout the procedure, the speakers’ mood and emotions will be monitored. Speakers will be randomly assigned to one of two study conditions, so that each will be primed for and perform only one emotion. The language used throughout the recording procedure will be English, and prosody will be isolated in postproduction using band-pass filters passing the 60–300 Hz range. Both video and audio recordings will be collected for analysis in this and other related studies.
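The design above calls for equal numbers of male and female speakers, each randomly assigned to a single condition (happy or sad). One way to keep both conditions gender-balanced is stratified randomization: shuffle speakers within each gender and deal them out to the conditions in turn. The sketch below assumes speakers are given as (id, gender) pairs; the function and its `seed` parameter are hypothetical and not part of the described procedure.

```python
import random
from collections import Counter


def assign_conditions(speakers, conditions=("happy", "sad"), seed=None):
    """Gender-stratified random assignment of speakers to conditions.

    speakers: list of (speaker_id, gender) pairs.
    Within each gender the speakers are shuffled and dealt to the conditions
    in turn; the dealing position carries over between genders so that
    overall condition sizes also stay as equal as possible.
    """
    rng = random.Random(seed)
    by_gender = {}
    for speaker_id, gender in speakers:
        by_gender.setdefault(gender, []).append(speaker_id)

    assignment = {}
    position = 0
    for gender in sorted(by_gender):
        ids = by_gender[gender]
        rng.shuffle(ids)
        for speaker_id in ids:
            assignment[speaker_id] = conditions[position % len(conditions)]
            position += 1
    return assignment


speakers = [(f"M{i}", "male") for i in range(4)] + [(f"F{i}", "female") for i in range(4)]
assignment = assign_conditions(speakers, seed=7)
gender_of = dict(speakers)
print(Counter((gender_of[s], c) for s, c in assignment.items()))
# each (gender, condition) cell contains 2 speakers
```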
5.5.2 Stimuli Exploration
The stimuli exploration study planned for this dissertation will aim to obtain ratings and evaluations of the material recorded from the speakers in four modalities: (a) Audio-Visual Unfiltered, containing video samples with normal, unfiltered audio; (b) Audio-Visual Filtered, containing video samples with filtered audio; (c) Audio Unfiltered, containing samples of normal, unfiltered audio; and (d) Audio Filtered, containing samples of filtered audio. Only the Audio Filtered (AF) stimuli will be used in the experiment proper in this study. The ratings of the other types of stimuli will be used in further research aimed at establishing the relationship between the prosodic (stimuli with filtered audio), semantic (stimuli with unfiltered audio), and visual (stimuli with video) cues to emotional expression in speech. The ratings will be obtained from participants recruited from the same population as the speakers, and the rating tasks and procedures they perform will be the same as those to be performed by the participants in the experiment proper. Both male and female participants will be included, and every effort will be made to balance the ratio of male to female raters.
5.5.3 Population Sampling—Nonnative English Speakers
Fluent bilingual speakers of English as a foreign language at two levels of proficiency will be the designated population samples for this study. All participants will be recruited from the Faculty of English, Adam Mickiewicz University. The lower-proficiency sample will be recruited from among students at an estimated English proficiency level of B1 on the Common European Framework of Reference for Languages (CEFR), and the higher-proficiency sample from among students at an estimated level of C2 on the CEFR. All participants will be screened for factors that may influence their capacity to understand or perform the experimental tasks, particularly alexithymia and language history. Alexithymia is a subclinical state manifesting itself, among other things, in an inability to identify emotions by name (Bagby et al. 1994). Language backgrounds will be assessed using a Language History Questionnaire (Li et al. 2014) to control for potential long-term immersive exposure to native English-speaking cultures. An additional questionnaire administered after the experimental procedure will probe whether the participants could identify the speakers’ gender and language from the filtered audio presented in the stimuli, and will ask participants to rank the procedural tasks in order of difficulty. The details of the procedure will be described in Chap. 6.
Because response time data are scarce and inconclusive in the existing body of evidence, the hypotheses to be tested in the course of the empirical work are all based on accuracy scores. Based on the previous research in the field of emotional prosody reviewed here, and on what constitutes general consensus in the field, the following four main hypotheses are proposed to be tested in this dissertation:
Hypothesis 1 Accuracy scores for happiness and sadness recognition will be lower for the participants of lower English language proficiency than for the participants of higher English language proficiency.
Hypothesis 2 Accuracy scores for sadness will be higher than those for happiness for all participants regardless of their proficiency in English.
Hypothesis 3 Accuracy scores for female speakers will be higher than those for male speakers regardless of the emotion expressed.
Hypothesis 4 Accuracy scores for acted emotion will be higher than those for natural emotion for all participants regardless of their proficiency in English.
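All four hypotheses are stated as directional comparisons of accuracy scores, so the descriptive side of the analysis can be sketched as below. This is a hypothetical Python illustration. The `Trial` record, its field names and level labels ("B1"/"C2", "acted"/"natural", and so on), and the toy data are assumptions, and the accuracies are pooled over all other factors. Comparing raw proportions shows only the direction of an effect. Testing the hypotheses would require inferential statistics, for example mixed-effects models with participants and speakers as random effects.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Trial:
    """One hypothetical forced-choice response (field names are illustrative)."""
    proficiency: str      # "B1" or "C2"
    emotion: str          # "happiness" or "sadness"
    speaker_gender: str   # "female" or "male"
    mode: str             # "acted" or "natural"
    correct: bool

def accuracy_by(trials, factor):
    """Proportion of correct responses for each level of one factor."""
    hits, totals = defaultdict(int), defaultdict(int)
    for trial in trials:
        level = getattr(trial, factor)
        totals[level] += 1
        hits[level] += trial.correct     # True counts as 1, False as 0
    return {level: hits[level] / totals[level] for level in totals}

def check_directions(trials):
    """Whether the pooled accuracies point in the direction of H1 to H4."""
    prof = accuracy_by(trials, "proficiency")
    emo = accuracy_by(trials, "emotion")
    gender = accuracy_by(trials, "speaker_gender")
    mode = accuracy_by(trials, "mode")
    return {
        "H1 (C2 > B1)": prof["C2"] > prof["B1"],
        "H2 (sadness > happiness)": emo["sadness"] > emo["happiness"],
        "H3 (female > male speakers)": gender["female"] > gender["male"],
        "H4 (acted > natural)": mode["acted"] > mode["natural"],
    }

toy = [
    Trial("C2", "sadness", "female", "acted", True),
    Trial("B1", "happiness", "male", "natural", False),
    Trial("C2", "happiness", "female", "natural", True),
    Trial("B1", "sadness", "male", "acted", True),
]
print(accuracy_by(toy, "proficiency"))
print(check_directions(toy))
```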
5.6 Conclusion
The body of evidence on emotional prosody recognition suffers from a number of problems. It is overwhelmingly concerned with monolingual rather than cross-linguistic contexts, and it is largely English-centric. The entanglement of prosody and semantics is still an unresolved problem, and each proposed solution has its drawbacks. Though sentences appear to be generally accepted as the best compromise in stimulus length, their exact lengths in seconds or syllables are rarely reported, and the same is true of the exact transcripts of the sentences. Happiness and sadness are very consistently featured as target emotions, and acting is the preferred mode of expressing them. There is no uniform approach to the number of speakers/actors employed: half of the studies employ only one, and the other half employ multiple speakers/actors, balancing their number by gender. There is also very little consistency in how best to carry out validation studies. Regarding perception, the numbers of participants and the composition of population samples vary greatly, though the response paradigms are by and large limited to forced choice out of basic emotions. Apart from above-chance recognition, the accuracy scores show little consistency across studies, the response times are inconclusive, and the studies of cross-linguistic recognition of English prosody yield somewhat contradictory results. The consensus in emotional prosody theory points to integrative paradigms as the way forward for research, while the existing research suffers from a number of consistency issues. Therefore, the method to be used in this dissertation was designed both to apply an integrative paradigm systematically and to avoid the problems and inconsistencies present in the existing research. The standard-view procedure of forced choice out of basic emotion categories will be included to establish common methodological ground for comparisons between this work and the existing literature. The dimensional procedure of evaluating valence and arousal will be included to investigate the recurring universal arousal effect more directly than has been done before, and to test once again for possible effects of valence. The free naming procedure, in which participants name the feelings and emotions expressed in the stimuli using words of their own choosing, will be included to investigate how varied the interpretations of emotional prosody can be.
Combining those three types of procedures may not only improve the quality and quantity of data we have on emotional prosody processing in general, but also reveal some of the dynamic mechanisms behind emotion processing on various levels. Additionally, as the stimuli will be constructed in a new way for the field, it is my hope that the planned stimuli exploration study will yield results which will aid interpretation of the results of the experiment proper.
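The conclusion notes that above-chance recognition is the one consistent finding across studies. One common way of establishing "above chance" for a forced-choice task is an exact one-sided binomial test, with chance set at 1/k for k response options. The Python sketch below illustrates this under that assumption. The function name and the example numbers are hypothetical. Real analyses would also need to address response bias and repeated measures within participants.

```python
from math import comb

def above_chance_p(correct, trials, n_options):
    """One-sided exact binomial p-value: P(X >= correct) when every one of
    `trials` responses is a guess among `n_options` equally likely labels."""
    chance = 1 / n_options
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))

# Hypothetical example: 8 of 10 correct in a two-alternative task
print(above_chance_p(8, 10, 2))   # 56/1024 = 0.0546875
```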
References
Alba-Ferrara, L., Hausmann, M., Mitchell, R. L., & Weis, S. (2011). The neural correlates of emotional prosody comprehension: Disentangling simple from complex emotion. PLoS ONE, 6(12), 1–10.
Bach, D. R., Grandjean, D., Sander, D., Herdener, M., Strik, W. K., & Seifritz, E. (2008). The effect of appraisal level on processing of emotional prosody in meaningless speech. NeuroImage, 42, 919–927.
Bagby, R. M., Parker, J. D. A., & Taylor, G. J. (1994). The twenty-item Toronto Alexithymia Scale—I. Item selection and cross-validation of the factor structure. Journal of Psychosomatic Research, 38(1), 23–32.
Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.
Bostanov, V., & Kotchoubey, B. (2003). Recognition of affective prosody: Continuous wavelet measures of event-related brain potentials to emotional exclamations. Psychophysiology, 41, 259–268.
Busso, C., Lee, S., & Narayanan, S. (2009). Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech and Language Processing, 17(4), 582–596.
Castro, S. L., & Lima, C. F. (2010). Recognizing emotions in spoken language: A validated set of Portuguese sentences and pseudosentences for research on emotional prosody. Behavior Research Methods, 42(1), 74–81.
Cornew, L., Carver, L., & Love, T. (2009). There’s more to emotion than meets the eye: A processing bias for neutral content in the domain of emotional prosody. Cognition and Emotion, 24(7), 1133–1152.
Dimoska, A., McDonald, S., Pell, M. D., Tate, R. L., & James, C. M. (2010). Recognizing vocal expressions of emotion in patients with social skills deficits following traumatic brain injury. Journal of the International Neuropsychological Society, 16, 369–382.
Drolet, M., Schubotz, R. I., & Fischer, J. (2012). Authenticity affects the recognition of emotions in speech: Behavioral and fMRI evidence. Cognitive, Affective and Behavioral Neuroscience, 12, 140–150.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.
Ekman, P. (2003). Sixteen enjoyable emotions. Emotion Researcher, 18, 6–7.
Elfenbein, H. A., & Ambady, N. (2002a). On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychological Bulletin, 128(2), 203–235.
Elfenbein, H. A., & Ambady, N. (2002b). Is there an in-group advantage in emotion recognition? Psychological Bulletin, 128(2), 243–249.
Fujiki, M., Spackman, M. P., Brinton, B., & Illig, T. (2008). Ability of children with language impairment to understand emotion conveyed by prosody in a narrative passage. International Journal of Language and Communication Disorders, 43(3), 330–345.
Fujisawa, T., & Shinohara, K. (2011). Sex differences in the recognition of emotional prosody in late childhood and adolescence. Journal of Physiological Sciences, 61, 429–435.
Goerlich, K. S., Witteman, J., Aleman, A., & Martens, S. (2011). Hearing feelings: Affective categorization of music and speech in alexithymia, an ERP study. PLoS ONE, 6(5), 1–11.
Graddol, D. (2003). The decline of the native speaker. In G. Anderman & M. Rogers (Eds.), Translation today. Trends and perspectives (pp. 152–167). Clevedon: Multilingual Matters Ltd.
Hoekert, M., Kahn, R. S., Pijnenborg, M., & Aleman, A. (2007). Impaired recognition and expression of emotional prosody in schizophrenia: Review and meta-analysis. Schizophrenia Research, 96, 135–145.
Hopyan-Misakyan, T. M., Gordon, K. A., Dennis, M., & Papsin, B. C. (2009). Recognition of affective speech prosody and facial affect in deaf children with unilateral right cochlear implants. Child Neuropsychology, 15, 136–146.
HUMAINE (Human-Machine Interaction Network on Emotion) Project. (2013). http://emotion-research.net/projects/humaine/aboutHUMAINE. Accessed May 1, 2013.
Kan, Y., Kawamura, M., Hasegawa, Y., Mochizuki, S., & Nakamura, K. (2002). Recognition of emotion from facial, prosodic, and written verbal stimuli in Parkinson’s disease. Cortex, 38, 623–630.
Kitayama, S., & Ishii, K. (2002). Word and voice: Spontaneous attention to emotional utterances in two languages. Cognition and Emotion, 16(1), 29–59.
Kopytko, R. (2002). The mental aspects of pragmatic theory: An integrative view. Poznań: Motivex.
Kotz, S. A., & Paulmann, S. (2007). When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Research, 1151, 107–118.
Landis, C. (1924). Studies of emotional reactions. General behavior and facial expression. Journal of Comparative Psychology, 4(5), 447–501.
Li, P., Zhang, F., Tsai, E., & Puls, B. (2014). Language history questionnaire (LHQ 2.0): A new dynamic web-based research tool. Bilingualism: Language and Cognition, 17(3), 673–680.
Lindner, J. L., & Rosen, L. A. (2006). Decoding of emotion through facial expression, prosody and verbal content in children and adolescents with Asperger’s syndrome. Journal of Autism and Developmental Disorders, 36, 769–777.
Mesquita, B., & Frijda, N. H. (1992). Cultural variations in emotions: A review. Psychological Bulletin, 112(2), 179–204.
Mitchell, R. L. C. (2006a). How does the brain mediate interpretation of incongruent auditory emotions? The neural response to prosody in the presence of conflicting lexico-semantic cues. European Journal of Neuroscience, 24, 3611–3618.
Mitchell, R. L. C. (2006b). Does incongruence of lexicosemantic and prosodic information cause discernible cognitive conflict? Cognitive, Affective and Behavioral Neuroscience, 6(4), 298–305.
Mitchell, R. L. C., Elliott, R., Barry, M., Cruttenden, A., & Woodruff, P. W. R. (2003). The neural response to emotional prosody, as revealed by functional magnetic resonance imaging. Neuropsychologia, 41, 1410–1421.
Mitchell, R. L. C., Elliott, R., Barry, M., Cruttenden, A., & Woodruff, P. W. R. (2004). Neural response to emotional prosody in schizophrenia and in bipolar affective disorder. British Journal of Psychiatry, 184, 223–230.
Paulmann, S., & Pell, M. D. (2011). Is there an advantage for recognizing multi-modal emotional stimuli? Motivation and Emotion, 35, 192–201.
Paulmann, S., Pell, M. D., & Kotz, S. A. (2008a). Functional contributions of the basal ganglia to emotional prosody: Evidence from ERPs. Brain Research, 1217, 171–178.
Paulmann, S., Pell, M. D., & Kotz, S. A. (2008b). How aging affects the recognition of emotional speech. Brain and Language, 104, 262–269.
Paulmann, S., Sebastian, S., & Kotz, S. A. (2010). Orbito-frontal lesions cause impairment during late but not early emotional prosodic processing. Social Neuroscience, 5(1), 59–75.
Pell, M. D., Monetta, L., Paulmann, S., & Kotz, S. A. (2009a). Recognizing emotions in a foreign language. Journal of Nonverbal Behavior, 33, 107–120.
Pell, M. D., Paulmann, S., Dara, C., Alasseri, A., & Kotz, S. A. (2009b). Factors in the recognition of vocally expressed emotions: A comparison of four languages. Journal of Phonetics, 37, 417–435.
Rigoulot, S., Wassiliwizky, E., & Pell, M. D. (2013). Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition. Frontiers in Psychology, 4, 1–14.
Rota, G., Veit, R., Nardo, D., Weiskopf, N., Birbaumer, N., & Dogil, G. (2008). Processing of inconsistent emotional information: An fMRI study. Experimental Brain Research, 186, 401–407.
Rottenberg, J., Ray, R. D., & Gross, J. J. (2007). Emotion elicitation using films. In J. A. Coan & J. J. B. Allen (Eds.), Handbook of emotion elicitation and assessment (pp. 9–28). New York: Oxford University Press.
Russell, J. A. (1994). Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies. Psychological Bulletin, 115(1), 102–141.
Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76(5), 805–819.
Russell, J. A., Bachorowski, J., & Fernandez-Dols, J. (2003). Facial and vocal expressions of emotion. Annual Review of Psychology, 54, 329–349.
Sauter, D. A., Eisner, F., Ekman, P., & Scott, S. K. (2009). Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. PNAS, 107(6), 2408–2414.
Scherer, K. R., Johnstone, T., & Klasmeyer, G. (2003). Vocal expression of emotion. In R. J. Davidson, H. Goldsmith, & K. R. Scherer (Eds.), Handbook of the affective sciences (pp. 433–456). New York: Oxford University Press.
Schirmer, A. (2010). Mark my words: Tone of voice changes affective word representations in memory. PLoS ONE, 5(2), e9080.
Schirmer, A., & Kotz, S. A. (2003). ERP evidence for a sex-specific Stroop effect in emotional speech. Journal of Cognitive Neuroscience, 15(8), 1135–1148.
Schirmer, A., Kotz, S. A., & Friederici, A. D. (2005). On the role of attention for the processing of emotions in speech: Sex differences revisited. Cognitive Brain Research, 24, 442–452.
Schmidt, A. T., Hanten, G. R., Li, X., Orsten, K. D., & Levin, H. S. (2010). Emotion recognition following pediatric traumatic brain injury: Longitudinal analysis of emotional prosody and facial emotion recognition. Neuropsychologia, 48(10), 2869–2877.
Sneddon, I., McRorie, M., McKeown, G., & Hanratty, J. (2007). The Belfast induced natural emotion database. IEEE Transactions on Affective Computing, 3(1), 32–41.
Sorocco, K. H., Monnot, M., Vincent, A. S., Ross, E. D., & Lovallo, W. R. (2009). Deficits in affective prosody comprehension: Family history of alcoholism versus alcohol exposure. Alcohol and Alcoholism, 45(1), 25–29.
Speedie, L. J., Brake, N., Folstein, S. E., Bowers, D., & Heilman, K. M. (1990). Comprehension of prosody in Huntington’s disease. Journal of Neurology, Neurosurgery and Psychiatry, 53, 607–610.
Taler, V., Baum, S. R., Chertkow, H., & Saumier, D. (2008). Comprehension of grammatical and emotional prosody is impaired in Alzheimer’s disease. Neuropsychology, 22(2), 188–195.
Thompson, W. F., & Balkwill, L. L. (2006). Decoding speech prosody in five languages. Semiotica, 158(1/4), 407–424.
Uekermann, J., Kramer, M., Abdel-Hamid, M., Schimmelmann, B. G., Hebebrand, J., Daum, I., et al. (2010). Social cognition in attention-deficit hyperactivity disorder (ADHD). Neuroscience and Biobehavioral Reviews, 34, 734–743.
Velten, E. (1968). A laboratory task for induction of mood states. Behaviour Research and Therapy, 6, 473–482.
Ververidis, D., & Kotropoulos, C. (2003). A state of the art review on emotional speech databases. In Proceedings of 1st Richmedia Conference. doi:10.1.1.420.6988
Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48, 1162–1181.
Wallbott, H. G., & Scherer, K. R. (1986). Cues and channels in emotion recognition. Journal of Personality and Social Psychology, 51(4), 690–699.
Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental Psychology, 3(1), 1–14.
Xu, Y. (2010). In defense of lab speech. Journal of Phonetics, 38, 329–336.
Chapter 6
The Development of Stimuli for Emotional Prosody Research: With Contributions from Prof Dr. Jeanette Altarriba, State University of New York, Albany, USA
6.1 Introduction
One of my major concerns regarding the development of stimuli for this study was increasing their ecological validity relative to the present standards in emotional prosody research. The existing literature on the subject did offer guidance, but on several points I had to move beyond established disciplinary boundaries and frameworks to solve design problems. There were four major issues to consider in constructing the experimental stimuli. First, there was the issue of selecting speakers to provide samples of emotional speech to be converted into emotional prosodic stimuli. They would have to be native speakers of English, preferably without acting or vocal training, who could nonetheless be primed to feel certain emotions in a fashion similar to the Stanislavsky method, a solution suggested by Scherer and Ellgring (2007). Both male and female speakers should be involved. Second, there was the matter of selecting the emotions to be investigated. Based on a critical overview of the selections and conclusions reported in previous research, I decided to limit my investigation to happiness and sadness. These emotions correspond roughly to the general dichotomy of positivity/negativity, are well represented in the existing body of evidence, can be adjusted for arousal levels, and have opposing valence, which provides a useful contrast for the interpretation of the data. Third, there was the matter of evoking emotions under controlled laboratory conditions. Here I had to draw on the fields of affective databasing and emotion elicitation for materials, methods, and rationale to make this procedure successful. Finally, there was the issue of producing valid propositional content for the speakers to use while acting out their emotions. Because the materials in the existing literature were either poorly reported or inadequate, I had to use a combination of emotion elicitation materials and corpus linguistic tools to create an appropriate set of utterances. The details of the process of creating the stimuli are covered in this chapter.
© Springer International Publishing Switzerland 2016 H. Bąk, Emotional Prosody Processing for Non-Native English Speakers, The Bilingual Mind and Brain Book Series 3, DOI 10.1007/978-3-319-44042-2_6
The entirety of the work described in this chapter was carried out in the fall of 2014 at Prof. Jeanette Altarriba’s Cognition and Language Laboratory at the Department of Psychology, University at Albany, SUNY, USA. Validation data were collected by Gabrielle M. Roy with some assistance from Catherine G. Payano.
6.2 Stimuli Creation Stage
The procedure of creating the stimuli for this study was based on the critical review of the literature presented in Chap. 5 and on the methodological recommendations of the authors of the various publications analyzed there. The stimulus material developed in the procedure described in this chapter consists of four subsets, only one of which is used in this particular study. The subsets not used here will be analyzed and reported in separate publications, but the entire set of stimuli is described in order to illustrate the procedure fully.
6.2.1 Speakers Providing Emotional Speech Samples
The speakers were 8 (4 male, 4 female) native speakers of American English (age: M = 18.63, SD = 0.48). All speakers were students at the University at Albany, SUNY, and all participated in this procedure for partial credit in their Psychology 101 course. All were screened with the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al. 2007), which determined that they had negligible exposure to cultures and languages other than American English. Following a suggestion from Goerlich et al. (2011), I also screened all speakers for alexithymia, a subclinical condition characterized by difficulties in identifying and naming emotions in oneself and others. This screening was done using the Toronto Alexithymia Scale (TAS-20) questionnaire (Bagby et al. 1994). On the alexithymia screening, the male speakers scored consistently higher than the female speakers on all subscales of the TAS-20. On the “Difficulty Identifying Feelings” subscale male speakers scored M = 25.00 (SD = 5.24) to female speakers’ M = 23.50 (SD = 3.84), out of a maximum of 35 points. On the “Difficulty Describing Feelings” subscale male speakers scored M = 15.50 (SD = 1.80) to female speakers’ M = 11.75 (SD = 1.30), out of a maximum of 25 points. On the “Externally-Oriented Thinking” subscale male speakers scored M = 21.00 (SD = 1.41) to female speakers’ M = 19.75 (SD = 1.64), out of a maximum of 40 points. Overall, male speakers scored M = 61.50 (SD = 6.02) to female speakers’ M = 55.00 (SD = 6.20). The male speakers’ mean thus exceeded what would normally be the cutoff point for a general diagnosis of alexithymia, set at 61 points (TAS-20 total scores can range from 20 to 100 points). However, such a score is considered to be still within the norm for young males (Mattila et al. 2006). Therefore, the slightly elevated alexithymia scores did not constitute an exclusion criterion for any of the speakers. The material from these 8 speakers was selected arbitrarily from a total of 22 speakers recorded (11 male, 11 female). The material not used in this study will be analyzed and published separately.
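The group totals above are the sums of the three subscale means, and the TAS-20 score range follows from its 20 items (7, 5, and 8 per subscale), each rated on a 1–5 scale. The Python sketch below checks this arithmetic and applies the conventional TAS-20 bands: 51 or below for no alexithymia, 52–60 for possible alexithymia, and 61 or above for alexithymia. The function names are hypothetical. The bands are meant for individual integer scores, so applying them to group means, as here, is only illustrative.

```python
# Number of TAS-20 items per subscale; every item is rated from 1 to 5.
SUBSCALE_ITEMS = {
    "Difficulty Identifying Feelings": 7,
    "Difficulty Describing Feelings": 5,
    "Externally-Oriented Thinking": 8,
}

def score_range(n_items, lowest=1, highest=5):
    """Minimum and maximum possible score for a set of Likert items."""
    return n_items * lowest, n_items * highest

def tas20_category(total):
    """Conventional TAS-20 bands for an individual total score."""
    if total >= 61:
        return "alexithymia"
    if total >= 52:
        return "possible alexithymia"
    return "no alexithymia"

# Subscale means reported above for the two speaker groups
male = {"Difficulty Identifying Feelings": 25.00,
        "Difficulty Describing Feelings": 15.50,
        "Externally-Oriented Thinking": 21.00}
female = {"Difficulty Identifying Feelings": 23.50,
          "Difficulty Describing Feelings": 11.75,
          "Externally-Oriented Thinking": 19.75}

for label, means in (("male", male), ("female", female)):
    total = sum(means.values())        # total mean = sum of the subscale means
    print(label, total, tas20_category(total))
print("TAS-20 range:", score_range(sum(SUBSCALE_ITEMS.values())))
```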
6.2.2 Materials—Elicitation and Acting
Playacted expressions of emotion are the accepted norm in emotional prosody research, and I used this method of creating samples of emotional speech. However, I also used a combination of tools from affective databasing and emotion elicitation to create a viable approximation of natural (as opposed to acted) emotional expressions. The materials used to design and carry out the procedure comprised the films used to elicit the emotions of sadness and happiness, the questionnaires used to control for the effects of elicitation, and a list of utterances to be used as propositional content for playacting emotions.
6.2.2.1 Emotion Elicitation Using Films and Controlling for the Effects of Emotion Manipulation
Affective databasing is a small but increasingly valued field at the intersection of the computational and cognitive sciences, dealing with methods of evoking and recording emotional expressions in laboratory conditions for a variety of purposes (Ververidis and Kotropoulos 2003, 2006). Emotion elicitation is another field, with a somewhat longer history in psychology, which deals with the tools and materials used to evoke or induce emotions in laboratory conditions (Westermann et al. 1996). One of the better examples of how these two fields can be combined to produce dynamic facial expressions of emotions is the tellingly named Belfast Induced Natural Emotion Database (BINED) (Sneddon et al. 2007). The method used there, and in several other instances, to induce the emotions of happiness and sadness was to show participants evocative film clips from various motion pictures; watching the clips evokes the target emotions in them. I decided to use this method but to check the effectiveness of the film clips in evoking emotions using a combination of the Post-Film Questionnaire proposed by Rottenberg et al. (2007) and the Self-Assessment Manikin (SAM) in the original form developed by Lang (1980).
There have been several publications on emotion elicitation using films in recent years, mostly concerned with systematizing the methodology and validating clips to be used in emotion elicitation studies. The methodological framework in the fields of both emotion elicitation and affective databasing is the standard view, which means both happiness- and sadness-evoking clips are usually to be found in the existing databases. Details on the clips I used are summarized in Table 6.1.
Table 6.1 Details on the film clips used to elicit emotions in the speakers
Emotion elicited | Movie | Director | Year | Clip start | Clip end | Clip length | Scene description
Happiness | It’s a Wonderful Life | Frank Capra | 1946 | 02:01:20 | 02:07:58 | 6:38 | A man is reunited with his family and friends who show how much they appreciate him by collecting money to cover his debt.
Happiness | When Harry Met Sally | Rob Reiner | 1989 | 00:42:39 | 00:45:15 | 2:36 | A couple bickers over lunch about gender stereotypes in a crowded diner. To prove a point the female character simulates a loud orgasm. Everyone stops eating and watches the couple.
Happiness | Remember the Titans | Boaz Yakin | 2000 | 01:39:02 | 01:45:10 | 6:08 | The final showdown in a tied football match leading to the protagonists’ winning touchdown. The team and their fans celebrate in the middle of the pitch.
Happiness | The Hangover | Todd Phillips | 2009 | 00:22:41 | 00:28:28 | 5:47 | Three men wake up in a devastated hotel room in the aftermath of a drunken night out. They remember nothing and there is a live tiger in the room with them.
Happiness | Frozen | Chris Buck and Jennifer Lee | 2013 | 00:31:07 | 00:34:44 | 3:37 | The “Let It Go” segment of the movie. One of the main characters sings an empowerment anthem about letting go of fear and embracing her magical powers.
Sadness | The Champ | Franco Zeffirelli | 1979 | 01:50:29 | 01:55:11 | 2:53 | A boy witnesses the death of his father from an injury sustained in a boxing match. The boy does not seem to understand that his father will not wake up.
Sadness | My Girl | Howard Zieff | 1991 | 01:23:14 | 01:25:47 | 2:33 | A girl witnesses the funeral service for her best friend, for whose death she blames herself. She is overcome with grief and runs from the house.
Sadness | Saving Private Ryan | Steven Spielberg | 1998 | 00:28:43 | 00:33:08 | 4:25 | In World War II, officials in the US Army realize that a mother of four sons is about to receive news that three of them were killed in action.
Sadness | Up | Pete Docter and Bob Peterson | 2009 | 00:07:19 | 00:11:39 | 4:20 | A series of scenes tells the life story of an old couple, Carl and Ellie, from the moment they were married until Ellie’s death.
Note: the clip from The Champ was edited following Rottenberg et al. (2007), so its length is shorter than the interval between its start and end times.
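The clip lengths in Table 6.1 follow from subtracting each start timecode from the corresponding end timecode. The short Python sketch below, with hypothetical helper names, reproduces this arithmetic and sums the lengths of the four happiness-elicitation clips. For The Champ the same calculation gives 4:42 rather than 2:53, which is consistent with that clip having been edited.

```python
def to_seconds(timecode):
    """Convert an 'hh:mm:ss' or 'm:ss' timecode to whole seconds."""
    seconds = 0
    for part in timecode.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def to_mss(seconds):
    """Format a number of seconds as 'm:ss'."""
    return f"{seconds // 60}:{seconds % 60:02d}"

def clip_length(start, end):
    """Length of a clip given its start and end timecodes."""
    return to_mss(to_seconds(end) - to_seconds(start))

# Happiness-elicitation clips (Frozen, shown only after the sadness procedure, excluded)
happiness = [("02:01:20", "02:07:58"), ("00:42:39", "00:45:15"),
             ("01:39:02", "01:45:10"), ("00:22:41", "00:28:28")]
for start, end in happiness:
    print(start, end, clip_length(start, end))
total = sum(to_seconds(end) - to_seconds(start) for start, end in happiness)
print("total:", to_mss(total))  # total: 21:09
```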
The clips were selected from existing validated databases. The clips from The Champ (1979) and When Harry Met Sally (1989) were selected on the recommendation of Rottenberg et al. (2007) as classic choices for emotion elicitation using films. The clip from The Champ was edited according to the directions provided by Rottenberg et al. (2007). The clips from It’s a Wonderful Life (1946), Dead Poets Society (1989), My Girl (1991), Saving Private Ryan (1998), Remember the Titans (2000), The Hangover (2009), and Up (2009) were selected after Bartolini (2011). The original happiness- and sadness-elicitation procedures each comprised 6 trials, with one clip per trial. However, owing to unforeseeable circumstances on the Albany campus the day before the procedures were to start, 2 clips had to be dropped from the sadness-elicitation procedure because their thematic content increased the risk of adverse emotional reactions in the participants. The happiness-elicitation procedure was likewise shortened to 4 trials and 4 clips to match the length of the sadness-elicitation procedure. As a result, the clips left in the procedures all came either from Rottenberg et al. (2007) or from Bartolini (2011), while the clips from Benny & Joon (1993), The Shawshank Redemption (1994), and There’s Something About Mary (1998), originally included after Schaefer et al. (2010), and the clip from Dead Poets Society (1989), selected after Bartolini (2011), were all removed from the set. On the recommendation of Prof. Altarriba, I added an extra happiness-eliciting clip at the very end of the sadness-elicitation procedure to bring the speakers out of the induced sadness. The clip I chose came from the animated movie Frozen (2013). This choice was not based on any previous study, but it was well received by all speakers assigned to the sadness-elicitation procedure.
6.2.2.2 Propositional Content for Playacting Emotions
As shown in Chap. 5, one of the major aspects of the existing research on emotional prosody that hampers its replicability is the incidental and fragmentary reporting of the propositional content used by speakers to act out emotions. Therefore, to find enough suitable linguistic material to create my stimuli, I initially turned to one of the classics in the field of emotion elicitation, the Velten technique (Velten 1968), which is based on sets of utterances designed to evoke positive and negative moods. While the original sets of utterances of the Velten technique are still in occasional use today, having been designed in the 1960s, many of them are distinctly dated. I therefore decided to create my own set of utterances based on a spoken corpus of contemporary American English, to ensure that the utterances were as close to natural spoken English as possible. To create the set, I used the original Velten sets as a guideline, extracting from each utterance the keywords bearing affective meaning. I then performed a keyword search within the spoken subset of the Corpus of Contemporary American English (COCA) (Davies 2008). That subset consists largely of transcripts of interviews and news reports, and every record is tagged with the time the interview or report was originally aired. I limited the searches to corpus sources recorded between 1990 and 2012 to avoid vocabulary that the speakers might perceive as dated. The full inventory of acted utterances used in the study is presented in Appendix 1, and that of the natural utterances in Appendix 2. Originally I prepared 120 such utterances: 60 for each emotion-elicitation procedure, 10 for each trial. Eventually, however, because 2 trials were dropped from each procedure, only 80 utterances were used: 40 for each procedure, 10 for every trial in each procedure. The spoken subset of COCA consists of rather faithful transcripts of real-life speech, which are full of hesitations, corrections, slips of the tongue, pauses, and similar performance errors that characterize natural speech. All of these were marked in some fashion in the transcripts. Where necessary, I edited the utterances to remove such markers from the transcripts to make reading easier for the speakers. Likewise, the occasional long, run-on transcript sentence was divided into two or three shorter ones to make it more reader-friendly. Because of the nature of the source material, the propositional content used here ranges from single sentences to short passages composed of a few short sentences. By their own reports, the speakers found the utterances easy to understand and perform, and the language fitting and natural for speech.
6.2.3 Recording Procedure
Material from each speaker was recorded individually, and each participant was randomly assigned to either the happiness- or the sadness-elicitation procedure. The speakers filled out the LEAP-Q and TAS-20 questionnaires, after which the recording session started. The recording session was composed of four trials; all materials except the questionnaires were presented on a computer screen placed at a comfortable distance from the speaker. The researcher present informed the speakers whenever the recording equipment was being turned on and off. Each trial started with the presentation of a film clip to elicit the procedure-specific emotion. The clip was followed by a semi-structured interview designed to elicit candid expressions of emotion from the speakers, who were asked about their feelings regarding the characters in the film clips and about the characters’ fate and prospects. Following the interview, the speakers filled out a questionnaire probing their emotional state in that trial. As described in the materials, this questionnaire combined those of Rottenberg et al. (2007) and Lang (1980). Finally, a set of utterances was presented to each speaker one by one on the computer screen. Each speaker’s task was to read each sentence quietly several times and then act it out in a tone of voice specific to their emotion-elicitation procedure. The procedure ended once the last sentence had been acted out. The speakers in the happiness-elicitation procedure were then thanked and released, while the speakers in the sadness-elicitation procedure stayed to watch the final clip from Frozen, after which they too were thanked and released. From the beginning of the first clip to the coda of the last utterance, the entire procedure was recorded in audio and video using a SONY HDR-CX210E video camcorder at a video resolution of 720 × 576, a bitrate of 9356 kb/s, and 25 fps. Audio was recorded in stereo at a bitrate of 256 kb/s and a sampling rate of 48 kHz. The setup of the recording room was modeled on the optimal setting described in detail by Quiros-Ramirez et al. (2012).
6.2.4 Results—The Recorded Material and Emotion Elicitation Evaluation
The questionnaire probing the speakers’ emotional state was administered four times during the recording procedure, once in each trial. Among other questions, it included the SAM scales, which allowed me to assess the speakers’ emotional state in terms of valence and arousal, as well as a discrete emotion probe similar to PANAS (Watson et al. 1988) but with only 18 items. The latter included all six basic emotion categories, which allowed me to assess the speakers’ emotional state in categorical terms. The questionnaire results indicate that the emotion elicitation was effective overall, both for valence and for the categorical emotions evoked. The SAM is a 9-point pictorial scale, and here I analyzed its results as a continuous numerical scale from (−4) to (+4). In the happiness-elicitation procedure, female speakers reported low-level arousal (M = 0.8) and positive valence (M = 1.8); male speakers reported relatively low arousal (M = −0.625) and an even more clearly positive valence (M = 2.5). In the sadness-elicitation procedure, female speakers reported low-level arousal (M = −0.38) and negative valence (M = −1.88), while male speakers reported even lower arousal (M = −1.13) and negative valence (M = −1.25). The PANAS-like probe listed 18 emotion terms, each rated on a 9-point scale from (0) to (8). The participants were to indicate how strongly they had felt each of the 18 listed emotions while watching each film clip. These results also indicate that the emotion manipulation was effective. In the happiness-elicitation procedure, both male and female speakers reported feeling happiness (M = 5.92), joy (M = 5.62), and amusement (M = 5.38) most strongly, together with a strong feeling of interest (M = 5.77) and some measure of surprise (M = 2.46).
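Treating the 9-point SAM as a numerical scale from (−4) to (+4) amounts to subtracting the scale midpoint from each raw response. The Python sketch below assumes the raw icon positions were coded 1 to 9, with 1 the most negative (or least aroused) and 9 the most positive (or most aroused); the chapter does not state the raw coding, and the ratings in the example are hypothetical, not the study's data.

```python
from statistics import mean

# Assumption: raw SAM responses are coded 1..9, so the midpoint 5 maps to 0.
SAM_MIN, SAM_MAX, SAM_MIDPOINT = 1, 9, 5


def sam_to_centered(raw: int) -> int:
    """Map a raw 9-point SAM response onto the -4..+4 scale."""
    if not SAM_MIN <= raw <= SAM_MAX:
        raise ValueError(f"SAM response out of range: {raw}")
    return raw - SAM_MIDPOINT


def condition_mean(raw_ratings: list[int]) -> float:
    """Mean centered rating pooled over speakers and trials in one condition."""
    return mean(sam_to_centered(r) for r in raw_ratings)


# Hypothetical valence ratings: 2 speakers x 4 trials in one condition.
valence_ratings = [7, 6, 8, 7, 6, 7, 8, 6]
print(condition_mean(valence_ratings))  # 1.875
```

Centering keeps the sign meaningful: negative means fall on the unpleasant (or calm) half of the scale and positive means on the pleasant (or aroused) half, which is how the group means above are read.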
In the sadness-elicitation procedure, both male and female speakers reported experiencing unhappiness (M = 5.75) and sadness (M = 5.69), together with a moderate feeling of interest (M = 3.94) and love (M = 2.22). The relatively high scores for interest likely reflect the attention paid to the film clips. The elevated score for surprise may reflect the speakers’ limited previous exposure to the films in the happiness-elicitation procedure: none of the speakers had seen It’s a Wonderful Life, three out of four had not seen When Harry Met Sally, and one speaker had seen neither Remember the Titans nor The Hangover. The elevated score for love in the sadness-elicitation procedure can be explained by the inclusion of a clip from Up, whose main theme is the love between the main characters. All things considered, by the speakers’ own self-assessment, they experienced the emotions I intended to elicit for the duration of the recording session. As illustrated in Chap. 5, previous research on emotional prosody was by and large based on acted material, where the speakers were required not to feel emotions but to sound emotional. Therefore, the stimulus validation procedures in those studies hinged on successful recognition ratings of various types, which usually treated categorical emotions of opposing valence as a continuous variable. In this study, however, the speakers who provided the emotional speech samples also assessed their own internal states. They were not merely instructed to act out emotions; they were primed to actually experience the target emotions of sadness or happiness. In the methodological framework of experimental pragmatics adopted for this study, the speakers’ self-assessments would serve as a satisfactory measure of validation. I decided, however, to run an exploratory investigation of the stimuli created for the study to (a) test the experimental procedures and tasks that would eventually be used, and (b) formulate predictions regarding the results of the study. The norming study would be conducted within the same population of native speakers of English that the speakers came from and would, in essence, be the first concept test of the integrative paradigm for emotional prosody research designed for this study. The stimulus material was recorded in HD video format. On average, the recording for each speaker was 38 min 52 s long. Clips to be processed into stimuli were cut from the material using Adobe Premiere Pro video editing software. For each speaker there were 40 clips of acted emotional expressions and an average of 15 (range 12–21) natural utterances. This yielded an initial set of 439 clips.
Of these, 320 contained acted emotion expressions, 160 of sadness and 160 of happiness (40 utterances × 2 speakers × 2 genders), and 119 contained natural emotion expressions, 61 of happiness (27 from female speakers, 34 from male speakers) and 58 of sadness (28 from female speakers, 30 from male speakers). The clips were then inspected, and all those containing noise pollution or audio glitches were removed from the initial set. This exclusion process yielded 361 clips. Of those, 242 were acted expressions of emotion, 125 expressing happiness (65 from female speakers, 60 from male speakers) and 114 expressing sadness (61 from female speakers, 53 from male speakers). All 119 clips of natural expressions of emotion were free of noise pollution and were therefore all included in the norming study. From the cleared set of 361 video clips, I extracted the audio, creating a parallel audio-only set of clips, all of which were subjected to noise reduction and volume matching using Adobe Audition CC audio editing software. Another parallel set was created by processing all audio clips in Praat (Boersma 2001) through a low band-pass filter removing the 60–300 Hz spectrum. This filtered audio was then spliced with the original video clips to create the fourth and final set of stimuli, consisting of video clips with filtered audio. All four sets of stimuli were subjected to the subsequent norming, although only the stimuli and data from the filtered audio set were used in this study. Several researchers have pointed out that in studying emotional expressions of different valences, balancing the proportions of the valences is paramount. Failure to balance the proportion of positive-valence to negative-valence stimuli quickly leads to an affective stimulus bias, that is, preferential focusing of attention on the valence more strongly represented in the stimuli, accompanied by increased accuracy and response times (Wagner 1997). For this reason, I selected a sample of stimuli matched and balanced for both speaker gender and valence. The acted utterances could be matched on propositional content, so I sought instances where all four speakers in a given emotion-elicitation condition had expressed a given prepared utterance and none of the clips for that instance had been removed from the set due to glitches or noise pollution. There were 14 such instances in the happiness-elicitation condition and 12 in the sadness-elicitation condition. I therefore selected all 12 instances from the sadness-elicitation condition, which yielded 48 stimuli of acted sadness (12 from each speaker), and removed 2 instances at random from the happiness-elicitation condition, thereby keeping a matched set of 48 stimuli of acted happiness (12 from each speaker). Given the speakers’ idiosyncrasies of expression and the unscripted nature of the natural expressions of emotion, the clips with natural expressions were highly variable in both length and propositional content. To bring a measure of balance into this set, all clips shorter than 3 s or longer than 9 s were removed. The natural emotion expressions were somewhat harder to balance by gender than the acted expressions, but they were balanced by emotion.
A total of 66 stimuli with expressions of natural emotion were selected for the experiment proper, 33 of them with natural expressions of happiness (15 from female speakers, 18 from male speakers), and 33 with natural expressions of sadness (16 from female speakers, 17 from male speakers). Thus the total number of stimuli that went into the experiment was 162, 96 acted and 66 natural.
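The selection of complete, balanced instances described above can be expressed as a small filtering routine. The Python sketch below is illustrative only: the speaker codes, the utterance numbering, and which clips are missing are hypothetical, chosen so that the example reproduces the reported counts (14 complete instances for happiness, 12 for sadness). Treating the 3 s and 9 s duration bounds as inclusive is also an assumption.

```python
import random


def complete_instances(clips, speakers):
    """Return IDs of utterances for which every speaker has a retained clip.

    clips: set of (speaker, utterance_id) pairs that survived noise screening.
    """
    speakers_by_utterance = {}
    for speaker, utterance in clips:
        speakers_by_utterance.setdefault(utterance, set()).add(speaker)
    return sorted(u for u, found in speakers_by_utterance.items()
                  if found >= set(speakers))


def balance(happy_ids, sad_ids, rng):
    """Randomly drop instances from the larger condition so both sizes match."""
    n = min(len(happy_ids), len(sad_ids))
    return sorted(rng.sample(happy_ids, n)), sorted(rng.sample(sad_ids, n))


def keep_natural_clip(duration_s, shortest=3.0, longest=9.0):
    """Natural clips shorter than 3 s or longer than 9 s are excluded."""
    return shortest <= duration_s <= longest


# Hypothetical data: four speakers per condition, 40 prepared utterances each.
happy_speakers = ["HF1", "HF2", "HM1", "HM2"]
sad_speakers = ["SF1", "SF2", "SM1", "SM2"]
happy_clips = {(s, u) for s in happy_speakers for u in range(1, 15)}
happy_clips |= {(s, u) for s in happy_speakers[1:] for u in range(15, 41)}
sad_clips = {(s, u) for s in sad_speakers for u in range(1, 13)}
sad_clips |= {(s, u) for s in sad_speakers[1:] for u in range(13, 41)}

happy_ids = complete_instances(happy_clips, happy_speakers)  # 14 instances
sad_ids = complete_instances(sad_clips, sad_speakers)        # 12 instances
happy_kept, sad_kept = balance(happy_ids, sad_ids, random.Random(0))

acted = len(happy_kept) * 4 + len(sad_kept) * 4  # 48 + 48 = 96
natural = 33 + 33  # counts reported for the natural expressions
print(acted, natural, acted + natural)  # 96 66 162
```

Seeding the random generator makes the random removal of the two surplus happiness instances reproducible, which is useful when the selection has to be documented.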
6.3 Stimuli Exploration Study
The database of stimuli generated at the stimuli creation stage consisted of four sets of 361 clips each, one set for each of the four audio-visual conditions described above. All these stimuli would be arranged into sets counterbalanced for the speakers’ gender (male and female), the emotion expressed (happiness and sadness), the manner of expression (acted and natural), and stimulus type (audio-visual, filtered audio-visual, audio, and filtered audio). Judges drawn from the same population as the speakers would evaluate these stimuli in three different tasks in a between-subjects design, so that each judge would evaluate all stimulus types but within only one of the three evaluative tasks. Here I will focus only on the results of the evaluation of the filtered audio stimulus set. The evaluation by the judges would serve both as a concept test for the various experimental tasks designed for this study and as an exploration of the direction the effects might take. The results would serve as a point of reference for interpreting the results of the main experiment described in Chap. 7. The number of stimuli involved in this exploration study was substantial, as the 361 clips processed into four separate sets of different types totaled 1444 stimuli to be judged. These had to be counterbalanced for the variables within stimuli (speaker gender, emotion, manner of expression) and for stimulus type to avoid stimulus bias, and divided into smaller sets that the judges could evaluate without emotional fatigue. Therefore, the number of judges had to be substantial. The judges were drawn from the same population as the speakers, which was ethnically diverse. Therefore, to control for the potential effects of cultural variability, the judges were screened with the LEAP-Q questionnaire (Marian et al. 2007). Because identifying emotions would be an essential part of this exploration study, the TAS-20 (Bagby et al. 1994) screening for alexithymia was also applied. Additionally, to control for potential adverse effects of exposure to multiple expressions of emotion, many of them negative (sadness), the judges filled out the PANAS-X (Watson et al. 1988) questionnaire before and after the evaluation procedure.
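A rough calculation shows why a large pool of judges was needed. The Python sketch below uses figures reported in this chapter (1444 stimuli, three judges per stimulus within each of the three procedures, and 112 to 120 stimuli per judge) and assumes, for simplicity, that every judge in a procedure rates the same number of stimuli.

```python
import math

N_CLIPS = 361            # clips surviving the noise screening
N_STIMULUS_TYPES = 4     # audio-visual, filtered audio-visual, audio, filtered audio
JUDGES_PER_STIMULUS = 3  # ratings per stimulus within each procedure
N_PROCEDURES = 3         # valence/arousal, categorization, free naming


def judges_needed(stimuli_per_judge: int) -> int:
    """Minimum number of judges across all procedures for a given per-judge load."""
    stimuli = N_CLIPS * N_STIMULUS_TYPES        # 1444 stimuli
    ratings = stimuli * JUDGES_PER_STIMULUS     # 4332 ratings per procedure
    per_procedure = math.ceil(ratings / stimuli_per_judge)
    return per_procedure * N_PROCEDURES


print(judges_needed(120), judges_needed(112))  # 111 117
```

On these assumptions, roughly 111 to 117 judges are required, which is in line with the 118 judges recruited.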
6.3.1 The Judges
The judges were 118 speakers of English. Data from 9 judges had to be removed from analysis due to a coding error, and data from a further 2 judges were removed because LEAP-Q screening revealed that English was not their dominant language. This left 107 judges, 43 male and 64 female (age M = 19.2, SD = 2.8). The judges were largely native speakers of American English. In several cases the judges self-identified with other (non-English-speaking) cultures, but those individuals typically also reported having spent their entire lives on US soil and using English predominantly in all or most everyday contexts. Of the judges, 53 reported knowing English only and identifying primarily with US American culture. The other 54 spoke additional languages. The most common second language among the judges was Spanish (46 judges), followed to a much lesser degree by French (6 judges), Chinese and Arabic (3 judges each), and Japanese, Hindi, and Tagalog (2 judges each). There were also individual judges with some knowledge of a wide variety of languages reflecting their family histories (e.g., Polish, German, Yoruba, Akan, Tamil, Gujarati, Hebrew, or ASL). On the whole, however, all the judges could be classified as native speakers based on their LEAP-Q responses. They were all dominant speakers of American English, had spent all or most of their lives in the USA, spoke American English in all or most everyday contexts, and identified either with broadly understood US American culture or with one of its many subcultures. In short, they constituted a representative sample of the population the speakers came from in gender proportions, ethnicity, and age. All judges were students at the University at Albany, SUNY, and participated for partial credit in their Psychology 101 course.
The alexithymia scores in this group were somewhat elevated, though the effect could be attributed to a combination of the judges’ young mean age and some procedural priming. The judges were given a general description of the procedure, which included the information that they would be required to identify or evaluate emotions. This might have prompted them to take a critical look at their skills in this respect and biased some of their responses on the TAS-20. As a result, the overall mean alexithymia score, M = 66.83 (SD = 9.40), was above the threshold of 61 points. Apart from these relatively higher scores, the judges’ scores echoed those of the speakers in all other respects. Overall scores for the male judges (M = 68.43, SD = 9.44) were higher than those of the female judges (M = 65.74, SD = 9.21). The same was true of each of the three TAS-20 factors. On the “Difficulty identifying feelings” subscale, male judges scored M = 29.09 (SD = 5.72) against female judges’ M = 27.71 (SD = 5.60), out of a maximum of 35 points. On the “Difficulty describing feelings” subscale, male judges scored M = 17.05 (SD = 2.89) against female judges’ M = 16.06 (SD = 3.51), out of a maximum of 25 points. On the “Externally-oriented thinking” subscale, male judges scored M = 22.30 (SD = 3.63) against female judges’ M = 21.97 (SD = 4.10), out of a maximum of 40 points. No judges were outliers on the overall TAS-20 score or on any of the subscales. Given the average age of the judges, the priming inherent in the study description, and the absence of outliers, I did not treat the alexithymia scores as an exclusion criterion for any of the judges. As for the PANAS-X scores before and after the evaluation procedure, there were no serious adverse effects apart from a certain level of fatigue, indicated by increased scores on drowsiness and decreased scores on attentiveness and concentration.
There was some indication in the PANAS-X scores that the judges were sensitized to low-level negative emotions, as shown by slight increases in scores on items such as sad or downhearted. Other than that, the evaluation procedure appears not to have had any adverse effects on the judges.
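The 61-point threshold and the subscale scores come from the standard TAS-20 scoring procedure. The Python sketch below follows the item-to-factor assignment and the reverse-keyed items as commonly published for the TAS-20 (Bagby et al. 1994), with cutoffs of 61 and above for alexithymia and 52 to 60 for possible alexithymia. These assignments are reproduced from the general literature rather than from this chapter, so they should be checked against the scale's published key. Because five items are reverse-keyed, answering 5 to every item yields 21 on “Difficulty describing feelings” and 24 on “Externally-oriented thinking,” whereas the attainable maxima are 25 and 40.

```python
# Item-to-factor assignment as commonly published for the TAS-20.
FACTORS = {
    "DIF": (1, 3, 6, 7, 9, 13, 14),          # Difficulty identifying feelings
    "DDF": (2, 4, 11, 12, 17),               # Difficulty describing feelings
    "EOT": (5, 8, 10, 15, 16, 18, 19, 20),   # Externally-oriented thinking
}
REVERSE_KEYED = {4, 5, 10, 18, 19}


def score_tas20(responses: dict[int, int]) -> dict[str, int]:
    """Score TAS-20 responses given as {item number: rating on a 1-5 scale}."""
    if sorted(responses) != list(range(1, 21)):
        raise ValueError("expected exactly one response for each of items 1-20")

    def item_score(item: int) -> int:
        rating = responses[item]
        if not 1 <= rating <= 5:
            raise ValueError(f"item {item}: rating {rating} outside 1-5")
        # Reverse-keyed items are scored as 6 - rating.
        return 6 - rating if item in REVERSE_KEYED else rating

    scores = {name: sum(item_score(i) for i in items)
              for name, items in FACTORS.items()}
    scores["total"] = scores["DIF"] + scores["DDF"] + scores["EOT"]
    return scores


def classify(total: int) -> str:
    """Apply the conventional TAS-20 cutoffs to a total score."""
    if total >= 61:
        return "alexithymia"
    if total >= 52:
        return "possible alexithymia"
    return "no alexithymia"


all_fives = score_tas20({i: 5 for i in range(1, 21)})
print(all_fives)  # {'DIF': 35, 'DDF': 21, 'EOT': 24, 'total': 80}
```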
6.3.2 The Evaluation Procedures
In previous research, emotion recognition from prosody was defined as the judges’ ability to categorize a stimulus as belonging to one of the basic discrete emotion categories, with valence occasionally applied as a co-determinant of accuracy, as illustrated in Chap. 5. Here I designed three separate procedures, each of which would give a unique insight into a different level of emotion processing. In the first, judges evaluated the valence and arousal expressed in the emotional stimuli on continuous scales, with the very broad labels “positive” and “negative” emotion at either end of the valence scale, and “high” and “low” intensity of expression at either end of the arousal scale. This task tapped into the most general level of processing and demanded minimal specificity in identification. In the second procedure, judges performed a classic forced-choice identification task through categorization, with happiness, sadness, and neutrality as possible options. Here the demand for specificity increased, as the participants had to assign the stimuli to specific emotion categories by matching perceptual input to category-specific features stored in long-term memory. Finally, in the free naming procedure the judges named the emotions expressed in the stimuli using single words of their choice. Here the demand for specificity in identification was the highest of all, as nothing in the procedure indicated what emotions might be present in the stimuli. Each judge evaluated stimuli in only one procedure, and judges were assigned to procedures at random. The judges evaluated all four stimulus types (video with normal audio, video with filtered audio, normal audio, filtered audio) in dedicated blocks, with the blocks presented in different orders. Every effort was made to ensure that the samples presented to each judge were balanced on every variable (speaker gender, emotion, manner of expression), and that every judge evaluated the same number of stimuli of each stimulus type. In total, each judge evaluated between 112 and 120 stimuli. The counterbalancing was set up so that, within each procedure, every one of the 1444 stimuli was evaluated by 3 judges. All judges were tested individually, with the evaluation procedure carried out on a computer and the questionnaires in pen-and-paper format. For all judges the session started with the PANAS-X questionnaire, followed by the evaluation procedure, a second administration of the PANAS-X, and finally the TAS-20 and LEAP-Q questionnaires. In the valence/arousal evaluation procedure, the judges first saw a fixation point “+” for 1000 ms, followed by stimulus presentation.
After each stimulus ended, a valence scale from (−3) for maximally “negative” to (+3) for maximally “positive” appeared at the top of the screen, with a response window at the center of the screen. The judges evaluated the valence of the preceding stimulus using the scale provided, typing the number of their choice into the response window. Once they had typed in their valence evaluation, the judges pressed Enter, upon which the arousal evaluation scale appeared at the top of the screen. This scale ran from (−3) for “low” to (+3) for “high” and referred to the level of emotional intensity in the preceding stimulus. The judges evaluated the level of arousal using the scale, typing the number of their choice into the response window below it. Pressing Enter after the arousal evaluation completed a full trial. In this procedure the judges used the number pad of a standard PC keyboard to enter their responses. In the categorization procedure, each trial started with a fixation point “+” presented for 1000 ms, followed by stimulus presentation. After each stimulus, the judges saw a screen prompt with the forced-choice options, each shown with an icon of a facial expression and a category name: “happy,” “neutral,” and “sad.” The arrangement of the on-screen prompt corresponded to the response keys on the keyboard. The response keys were “v,” “b,” and “n,” each marked with a colored strip bearing the icon of, respectively, the “happy,” “neutral,” and “sad” option. The judges made their evaluations in this task by pressing the key corresponding to the category to which they decided the preceding stimulus belonged. Pressing any of the response keys completed a full trial. The judges were instructed to categorize the emotions in the stimuli as happiness, sadness, or (emotional) neutrality. The free naming procedure also started with a fixation point “+” lasting 1000 ms, followed by stimulus presentation. After stimulus presentation, the judges saw a response screen prompting them to type in their response. The instructions specified that they should use one word only, though phrasal verbs were allowed as well. The judges were further advised not to worry about possible spelling mistakes but to focus on identifying and naming emotions. After typing in their response, the judges pressed Enter to complete a full trial. In this procedure the judges were allowed to use the entire keyboard, including Backspace, to enter and correct their responses.
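The keypress coding of the categorization task can be combined with the accuracy rule this chapter applies to it: only a “happy” response to a happy stimulus or a “sad” response to a sad stimulus counts as correct, and “neutral” never does. The Python sketch below follows the v/b/n order given above; the function names and string labels are illustrative assumptions, not the software actually used in the study.

```python
# Response keys in the order given in the text: v = happy, b = neutral, n = sad.
KEY_TO_CATEGORY = {"v": "happy", "b": "neutral", "n": "sad"}
STIMULUS_EMOTIONS = {"happy", "sad"}


def decode_key(key: str) -> str:
    """Translate a response key into the chosen category."""
    try:
        return KEY_TO_CATEGORY[key.lower()]
    except KeyError:
        raise ValueError(f"not a response key: {key!r}") from None


def categorization_correct(stimulus_emotion: str, key: str) -> bool:
    """A response is correct only if it names the elicited emotion."""
    if stimulus_emotion not in STIMULUS_EMOTIONS:
        raise ValueError(f"unknown stimulus emotion: {stimulus_emotion!r}")
    # 'neutral' can never match, because every stimulus is happy or sad.
    return decode_key(key) == stimulus_emotion


trials = [("happy", "v"), ("happy", "b"), ("sad", "n"), ("sad", "v")]
print([categorization_correct(e, k) for e, k in trials])  # [True, False, True, False]
```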
6.3.3 Determining “Accuracy” Across Evaluation Procedures
Each evaluation procedure tapped into a different level of processing and was premised on a different framework. The valence/arousal evaluation procedure tapped into the most general level and was based on Russell’s (1980) circumplex within Minimal Universality. The categorization evaluation procedure tapped into the intermediate, categorical level of emotion processing and was based on Ekman’s (1992) discrete basic emotions within the universalist theory of emotions. The free naming evaluation procedure demanded the most specificity and precision in emotion identification and was based on appraisal theory, in which vocabulary choices are interpreted as reflecting the structures of emotion appraisal (Scherer 2005). Each of these three procedures makes different basic assumptions about the nature of emotions. Therefore, what constituted correct and incorrect identification of the emotion in a stimulus differed depending on the procedure. Within Minimal Universality, sadness is an emotion of negative valence and happiness an emotion of positive valence. In the valence/arousal procedure the judges evaluated the valence of the presented stimuli using a scale from (−3) to (+3), and their scalar evaluations were compared against the speakers’ self-assessments during the recording procedure. All evaluations of sad stimuli falling on the negative part of the scale, from (−3) to (−1), were marked as correct. Similarly, all evaluations of happy stimuli falling on the positive part of the scale, from (+1) to (+3), were marked as correct. All other cases, including all evaluations of valence as (0) as well as misevaluations of sad stimuli as “positive” or of happy stimuli as “negative,” were marked as incorrect.
The speakers’ self-reports indicated that levels of arousal were relatively low and not well differentiated between sadness and happiness, so evaluations of arousal were not used as a criterion of accuracy.
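The valence criterion just described reduces to a sign check on the judge's rating. A minimal Python sketch, assuming integer ratings from (−3) to (+3) and the emotion labels “happy” and “sad”; the arousal rating is accepted but deliberately ignored, mirroring the decision not to use arousal as an accuracy criterion.

```python
from typing import Optional


def valence_correct(stimulus_emotion: str, valence: int,
                    arousal: Optional[int] = None) -> bool:
    """Score one valence/arousal trial; the arousal rating is not used."""
    if valence not in range(-3, 4):
        raise ValueError(f"valence rating outside -3..+3: {valence}")
    if stimulus_emotion == "sad":
        return -3 <= valence <= -1   # negative half of the scale
    if stimulus_emotion == "happy":
        return 1 <= valence <= 3     # positive half of the scale
    raise ValueError(f"unknown stimulus emotion: {stimulus_emotion!r}")


# A rating of 0 is incorrect for both emotions.
print(valence_correct("sad", -2), valence_correct("happy", 0),
      valence_correct("happy", 3, arousal=-1))  # True False True
```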
Within the standard view of basic discrete emotions, sadness and happiness are two of the six basic panhuman emotion categories. In the categorization evaluation, the judges were asked to assign each stimulus to one of three categories: the two emotions actually present in the stimuli and an (emotional) neutrality category for stimuli that could not be clearly categorized as either emotion. The determination of correctness was much more straightforward in this task. All categorizations of sad stimuli as sad and of happy stimuli as happy were marked as correct. All other cases, including any stimulus categorized as neutral, any happy stimulus categorized as sad, and any sad stimulus categorized as happy, were marked as incorrect. Within appraisal theory, determining the necessary and/or sufficient characteristics of any emotion is rather difficult, since accuracy appears to be dictated by communicative success. However, some insight into the nature of both appraisals and the mechanisms guiding them can be gained by analyzing vocabulary choices in a free naming paradigm, and such vocabulary choices are exactly what the judges in the free naming evaluation procedure provided. Here, determining accuracy required a method of interpreting and classifying the many and varied responses the judges gave to the feelings and emotions they perceived in the stimuli. However, referring the judges’ output to other human judges for classification would mean that the secondary classification judgments would once more be filtered through those judges’ appraisals, and more human involvement at this stage would create more potential for human error and bias. Therefore, I drew on linguistic resources to construct a tool capable of classifying the judges’ evaluation output according to its overall valence and basic emotion category membership. The determination of accuracy in the free naming procedure was thus two-tiered.
On the valence level, all evaluations of sad stimuli classified as “negative” and all evaluations of happy stimuli classified as “positive” were marked as correct, while all those classified as “neutral” or impossible to classify were marked as incorrect. On the categorical level, all evaluations of sad stimuli classified as belonging to the category sadness and all evaluations of happy stimuli classified as belonging to the category happiness were marked as correct. All evaluations falling into other basic emotion categories or outside them were marked as incorrect. To classify the output of the free naming evaluation, I constructed a simple tool based on a corpus created by Warriner et al. (2013) and on dictionary and thesaurus entries for basic emotion terms. Among other measurements, the Warriner et al. (2013) corpus contains valence ratings for 13,915 English words. The output of the judges’ free naming evaluation procedure was compared against this corpus, and each item in the output record was tagged as “positive,” “negative,” or “neutral” according to the corpus valence ratings. The Warriner et al. (2013) corpus gave its raters an evaluation scale of 1–9, with 1 standing for “negative” and 9 for “positive,” and the raters were instructed to choose 5 to indicate “neutral.” Given the raters’ idiosyncrasies and the nature of the data, I designated overall mean ratings of 4.0–6.5 as “neutral,” overall mean ratings of 1.0–3.99 as “negative,” and overall mean ratings of 6.51–9.0 as “positive.” This valence corpus matching determined the accuracy of the judges’ output in terms of valence. For the categorical classifications, the output was compared against a collection of synonyms for each of the basic emotion terms. Initially I included only the synonym sets for happiness and sadness. However, an overview of the judges’ output revealed the presence of other emotion categories, which prompted me to include all basic emotion categories in the tool. The judges’ output was compared against the synonym lists, and if a label assigned by a judge to a stimulus was present in a given emotion category list, the label was tagged as belonging to that emotion category. Thus, in categorical terms, all labels for sad stimuli tagged as belonging to the emotion category sadness and all labels for happy stimuli tagged as belonging to the category happiness were marked as correct. All labels tagged as belonging to other emotion categories or falling outside the basic emotion categories were marked as incorrect. The use of synonyms was motivated by Paul Ekman’s notion of emotion families and of the prototypical structure of basic emotion concepts (Ekman 1994). The dictionaries and thesauri used to create the basic sets of emotion term synonyms were the online versions of the Merriam-Webster Dictionary (2014) and the Oxford English Dictionary (2014), the former being the most representative and respected source for American English, the latter for English in general. Overall, the two dictionaries yielded 288 synonyms for happiness and 223 for sadness, including the headwords in each tally. There were 252 synonyms for anger, 227 for fear, 75 for disgust, and 87 for surprise. Although the Warriner et al. (2013) corpus is not as venerable as the Affective Norms for English Words (ANEW) (Bradley and Lang 1999), it is considerably larger and covers a wider spectrum of vocabulary across parts of speech, which allowed the classification tool to be more inclusive and flexible. After running a few comparison tests on the judges’ output with both the Warriner et al. (2013) and the ANEW (Bradley and Lang 1999) corpora, I found the great majority of classifications to match across the corpora, the only notable difference being the amount of output that could be classified with the former versus the latter corpus. I therefore chose the Warriner et al. (2013) corpus for its size and scope. The classification tool was built using simple matching and tagging functions in MS Office Excel.
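The two-tier classification performed by the Excel tool can be sketched in Python as a pair of lookups. This is a sketch under stated assumptions, not the author's tool: the small valence table holds placeholder numbers, not the actual Warriner et al. (2013) ratings, and the synonym sets are short illustrative stand-ins for the dictionary-derived lists. Labels are only stripped and lowercased, so the judges' spelling variants would need further normalization. Assuming the mean ratings are given to two decimal places, the bands 1.0–3.99, 4.0–6.5, and 6.51–9.0 leave no gaps, and the comparisons below reproduce them. How the original tool handled a word listed under several categories is not stated; here a label counts as correct if the target category is among its tags.

```python
# Placeholder valence means on the 1-9 scale (NOT the real Warriner et al. values).
VALENCE_NORMS = {"joyful": 8.2, "sad": 2.1, "calm": 6.0, "angry": 2.5}

# Illustrative stand-ins for the dictionary-derived synonym lists.
SYNONYMS = {
    "happiness": {"happy", "joyful", "cheerful"},
    "sadness": {"sad", "gloomy", "downhearted"},
    "anger": {"angry", "mad", "furious"},
}

# Target valence tag and category for each elicited emotion.
TARGET = {"happy": ("positive", "happiness"), "sad": ("negative", "sadness")}


def normalize(label: str) -> str:
    return label.strip().lower()


def valence_tag(label: str):
    """Return 'negative', 'neutral', 'positive', or None if the word is not in the norms."""
    rating = VALENCE_NORMS.get(normalize(label))
    if rating is None:
        return None            # impossible to classify
    if rating < 4.0:           # 1.0-3.99
        return "negative"
    if rating <= 6.5:          # 4.0-6.5
        return "neutral"
    return "positive"          # 6.51-9.0


def category_tags(label: str) -> set:
    """All basic emotion categories whose synonym list contains the label."""
    word = normalize(label)
    return {cat for cat, words in SYNONYMS.items() if word in words}


def free_naming_accuracy(stimulus_emotion: str, label: str):
    """Return (valence-tier correct, category-tier correct) for one response."""
    target_valence, target_category = TARGET[stimulus_emotion]
    return (valence_tag(label) == target_valence,
            target_category in category_tags(label))


print(free_naming_accuracy("happy", " Joyful"))  # (True, True)
print(free_naming_accuracy("sad", "gloomy"))     # (False, True)
print(free_naming_accuracy("sad", "calm"))       # (False, False)
```

The second example shows why the two tiers can disagree: a word may sit in a synonym list yet be missing from the valence norms, in which case it fails the valence tier while passing the categorical one.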
6.3.4 The Results of the Exploration Study
Assume, on the basis of the speakers’ self-assessments, that the speakers felt the emotions as elicited. Then every instance in which the elicited emotion was recognized correctly was a laboratory equivalent of the successful real-life communication of an emotion. Thus, the accuracy scores in the norming procedure served to establish the instances of successful communication of emotion between native speakers of English. Additionally, the instances of inaccuracy, that is, of miscommunication, would indicate the overall recognition trends on different levels of processing
as well as the potential directionality of the results of the study with nonnative English speakers. Every stimulus was evaluated by three judges on each of the three levels. If at least one judge recognized the emotion in a given stimulus correctly, the stimulus was included in the analysis of accurate recognition cases. The overall patterns of emotion recognition from prosody differed between the evaluation procedures. The initial pool consisted of 361 stimuli (242 acted and 119 natural). In the valence/arousal evaluation procedure, 303 of these (209 acted and 94 natural) were recognized correctly by at least one judge, which constituted 83.93 % of all stimuli evaluated. In the categorization procedure there was a marked drop, to 252 stimuli (178 acted and 74 natural), or 69.81 % of the initial pool. In the free naming procedure the accuracy check had two tiers: valence tags and basic emotion category tags. By valence tags, 314 stimuli (240 acted and 74 natural) were recognized accurately, which constituted 86.98 % of the total. By basic emotion category tags, however, only 238 stimuli (177 acted and 61 natural), or 65.93 % of the total, were recognized. The results from the valence/arousal evaluation procedure were analyzed on two levels. First, the judges’ overall valence and arousal scores were compared with those from the speakers’ self-evaluations during the emotion elicitation procedure. Second, the accuracy and error patterns were analyzed. A broad comparison of the mean valence and arousal scores revealed an interesting pattern (see Fig. 6.1). The comparison set the speakers’ subjective experience against the judges’ evaluations of the expressions based on that experience.
The valence scores were relatively similar between the speakers and the judges regardless of the speakers’ gender. The arousal scores for male speakers, however, showed a pronounced difference, with the judges indicating much higher levels of arousal than the speakers did. The same pattern emerged in the evaluations of female speakers, but the difference was much less pronounced. Overall, the judges rated the female speakers expressing happiness as showing positive valence (M = 1.78) with moderate arousal (M = 1.04). They rated the male speakers expressing happiness as showing similarly positive valence (M = 1.88) with somewhat higher arousal (M = 1.43). The female speakers expressing sadness were rated as showing negative valence (M = −2.04) with lowered arousal (M = 0.00). The male speakers expressing sadness were rated as showing similarly negative valence (M = −1.98) with somewhat higher arousal (M = 0.54). Overall accuracy was relatively high in this evaluation procedure, and correct recognitions were remarkably well balanced with respect to the speakers’ gender and the emotions expressed. In total, 154 stimuli from the female speakers were recognized correctly. This included 88 stimuli from female speakers expressing happiness recognized correctly as showing positive valence (66 acted, 22 natural). It also included 86 stimuli from female speakers expressing sadness recognized correctly as showing negative valence (62 acted, 24 natural). For male speakers there were 149 correctly recognized emotional prosody stimuli. This included 73 stimuli of male speakers expressing happiness recognized as showing positive valence (52 acted, 21 natural). It also included 76 stimuli of male speakers expressing sadness recognized as showing negative valence (49 acted, 27 natural). Interestingly, differences between the processing of male and female speakers’ emotions began to appear even on this very general level of processing. A similarly interesting pattern was observed in the recognition errors. Expressions of happiness were misinterpreted as showing negative valence in 24 stimuli, whereas expressions of sadness were mistaken for positive in only 16 cases. Taken together, the two angles of exploration within the valence/arousal evaluation procedure indicate that emotions in male voices are recognized less accurately. They also indicate that this difference in accuracy is connected to the dimension of perceived emotional arousal. Additionally, there may be a stronger tendency to misidentify positive emotions as negative than negative emotions as positive.
Fig. 6.1 Valence and arousal scores from speakers’ self-evaluations and judges’ assessments
In the evaluation through categorization the accuracy was markedly lower than in the more general valence/arousal evaluation, and the drop can be attributed to the increase in processing costs. In the valence/arousal evaluation the judges only had to evaluate stimuli in very broad terms. They did so on a scale, which further relaxed the demands on the specificity of emotion identification. In the categorization procedure the judges had to recognize the characteristics of basic emotion categories and assign each stimulus to a specific category while compensating for the potential
non-prototypicality of each emotion expression. The procedure demanded a three-way categorization between happiness, sadness, and neutral. By the speakers’ self-evaluations there were no neutral expressions in the stimuli, but the option was left in to allow for a margin of error that had not been provided for before. In total, 123 stimuli from the female speakers were recognized correctly. This included 46 expressions of happiness recognized as such (35 acted, 11 natural) and 77 expressions of sadness recognized as such (58 acted, 19 natural). From the male speakers, 129 stimuli were recognized correctly. This included 61 expressions of happiness recognized as such (39 acted, 22 natural) and 68 expressions of sadness recognized as such (45 acted, 23 natural). The most frequent error in emotion identification in this procedure was mistaking expressions of sadness for expressions of happiness (51 stimuli). The second most frequent was categorizing expressions of both happiness and sadness as neutral (43 stimuli). Mistaking expressions of happiness for expressions of sadness was the least frequent error (15 stimuli). The recognition patterns are much less balanced with respect to the emotions expressed and the speakers’ gender. There is a marked difference in the number of recognition errors between female speakers expressing happiness and female speakers expressing sadness. The analysis of the free naming evaluation procedure was more complex and was carried out in two steps. The first was an analysis of the judges’ output by the valence of the emotion labels used to identify emotions. The second was an analysis of the judges’ output by its assignment to basic emotion categories. All the judges’ output was spellchecked before being tagged with the tool programmed for that purpose, and the spelling was corrected where necessary. Some of the output could not be unequivocally determined, such as the entry “ad”, which could have been meant as “sad”, “bad”, or “mad”.
Such cases were treated as errors and dropped from the accuracy analysis. Analyzing the output on the valence dimension, I found 314 stimuli recognized accurately. In total, 160 stimuli from female speakers were named with words that the Warriner et al. (2013) corpus classifies as appropriately positive or negative in valence. These included 66 stimuli with expressions of happiness (47 acted, 19 natural) named with words tagged as positive. They also included 94 stimuli with expressions of sadness (68 acted, 26 natural) named with words tagged as negative. For the male speakers, 75 stimuli with expressions of happiness (51 acted, 24 natural) were named with words tagged as positive. A further 79 stimuli with expressions of sadness (50 acted, 29 natural) were named with words tagged as negative. The remaining 47 stimuli were either tagged as neutral in valence or could not be classified using the Warriner et al. (2013) corpus. In this procedure the demands on the specificity of emotion identification, and therefore the processing costs, were the highest. Consequently, this procedure carried the largest margin for error. Previous research on emotional prosody has shown that happiness tends to be mistaken for anger, so determining the types of errors made by the judges would be very informative. The error types could be fairly reliably determined by analyzing the judges’ output by basic emotion category tags. While the free naming procedure carried the largest potential for error, it also allowed for the greatest precision of identification. On the second
level of the free naming evaluation procedure, the judges’ output was tagged according to basic emotion category membership determined by lexicographic data. It immediately became apparent that, in terms of semantics, all basic emotions conventionally considered negative are interconnected. For basic emotion tagging, this meant that some words were classified lexicographically as synonyms of more than one basic emotion term. Consequently, some items of the judges’ output were marked with more than one basic emotion tag. Thus anger tags frequently co-occurred with disgust tags, and sadness tags co-occurred with both fear and disgust tags. No such co-occurrence was found for the only conventionally positive emotion under scrutiny, happiness. In other words, in the context of this procedure and the lexicographic-semantic data, happiness is a semantically pure basic emotion category, while sadness is intermingled with other emotions. Still, any item of the judges’ output that took the sadness tag was considered correct if it was used to name the emotion of sadness as expressed in a stimulus. In total, 131 stimuli from female speakers were recognized correctly, including 43 expressions of happiness (33 acted, 10 natural) and 88 expressions of sadness (64 acted, 24 natural). From male speakers, 107 stimuli were recognized correctly, including 40 expressions of happiness (27 acted, 13 natural) and 67 expressions of sadness (43 acted, 24 natural). The recognition patterns indicate that the judges made multiple emotion recognition errors, with the processing costs at their maximum in this evaluation procedure. Interestingly, markedly more errors were made on the recognition of happiness, and this pattern held for both male and female speakers. Because each stimulus was evaluated by three judges, there were up to three opportunities to misidentify each stimulus in a different way. Still, certain error patterns were discernible in the evaluation data.
The judges misidentified stimuli with expressions of happiness much more frequently than stimuli with expressions of sadness. Expressions of happiness were most frequently misidentified as sentiments falling outside the basic emotion categories (49 different items used a total of 147 times). The second most frequent error was misidentification of happiness as sadness mixed with fear (4 different items used a total of 51 times). This was followed by misidentification as anger mixed with disgust (3 items used 25 times), as fear (4 items used a total of 10 times), and as pure sadness (4 different items used a total of 9 times). Next came misidentification as sadness mixed with fear and disgust (1 item used 4 times). The least frequent error was misidentification of happiness as surprise (1 item used once). Expressions of sadness were also most frequently misidentified as sentiments beyond the basic emotions (27 items used 67 times). The second most frequent error was misidentification of sadness as happiness (2 items used 15 times). The third was misidentification of sadness as anger mixed with disgust (3 items used 6 times), and the fourth was misidentification of sadness as fear (2 items used twice). Overall, the errors made on the recognition of happiness were more frequent and more differentiated, and they leaned decisively towards the valence opposite to happiness.
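Each error pattern above is reported as a pair: the number of distinct labels and the total number of times they were used (e.g., "4 different items used a total of 51 times"). A minimal Python sketch of that tally follows. The labels and error categories are hypothetical toy data. The function names `tally_errors` and `rank_errors` are illustrative and are not part of the author's tool. The sketch assumes that case variants of the same label count as one distinct item.

```python
from collections import Counter, defaultdict


def tally_errors(misidentifications):
    """Given (label, error_category) pairs, return {category: (distinct labels, total uses)}."""
    uses = Counter()               # total uses per error category
    distinct = defaultdict(set)    # distinct normalized labels per error category
    for label, category in misidentifications:
        uses[category] += 1
        distinct[category].add(label.strip().lower())
    return {category: (len(distinct[category]), uses[category]) for category in uses}


def rank_errors(tally):
    """Order error categories from most to least frequent by total uses."""
    return sorted(tally.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical toy data: labels given by judges to misidentified happy stimuli.
toy_errors = [
    ("bored", "outside basic emotions"),
    ("Bored", "outside basic emotions"),
    ("tired", "outside basic emotions"),
    ("worried", "sadness + fear"),
    ("surprised", "surprise"),
]
```

Here `tally_errors(toy_errors)` gives 2 distinct items used 3 times for the "outside basic emotions" category. Ranking by total uses rather than by distinct items follows the ordering used in the text.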
6.4 Conclusions
In previous research on emotional prosody, one of the few points of consistency was the method of creating stimuli: it was typically a simple matter of instructing designated speakers to act out specified emotional tones. The emotional prosodic stimuli would then be perceptually validated in a variety of ways. In this study I decided to evoke emotions in my designated speakers, which would color their natural expressions of emotion and emotionally prime their acting out of emotions. Given this crucial conceptual difference in how the emotional speech samples were produced, the validation procedure hinged on the speakers’ self-assessments of their emotional states throughout the emotion elicitation procedure. To examine the perceptual properties of the stimuli I had created, I conducted an exploratory study. This study additionally allowed me to test the experimental procedures that would make up the integrative paradigm to be deployed in the experiment proper (Chap. 7). The exploration study revealed several trends and patterns that speak to the nature of emotional prosody processing and could reasonably be expected to manifest in nonnative English speakers as well. The procedures ranged from a broad scalar evaluation of valence and arousal, through the more structured categorization by basic emotions, to highly specific emotion identification and naming. As the demands on the specificity of emotion identification increased, so did the processing costs and the proportion of errors in emotion recognition. The valence/arousal evaluation procedure revealed that male speakers are consistently perceived as expressing emotions with higher levels of arousal than female speakers, regardless of the emotions expressed. The levels of arousal perceived by outside observers were markedly higher than those indicated by the speakers’ self-assessments.
The categorization evaluation procedure revealed that happiness is notably harder to recognize correctly from emotional prosody than sadness. The free naming evaluation procedure, analyzed on dimensional and categorical levels, revealed the specific nature of the errors committed in emotion identification from prosody. On the dimensional level, the accuracy of recognition was remarkably high. However, the class of “negative” valence vocabulary items used to describe the emotions in the stimuli included basic emotion categories other than sadness that are conventionally considered negative. Analysis on the categorical level revealed that emotions such as anger, disgust, and fear were routinely perceived where they were not expressed. A lexicographic analysis conducted while creating the classification tool additionally revealed that the basic emotions of sadness, fear, disgust, and anger are semantically linked to one another in a variety of ways. These categories appear to overlap in parts of their semantics and in valence, which may have contributed to some of the misidentifications. However, a high proportion of stimuli expressing happiness were misinterpreted as expressing some category of “negative” emotion, which suggests a different, more fundamental underlying mechanism. With the processing demands in the free naming task
being high, the judges may have fallen back on negativity bias, a powerful adaptive mechanism. Because this mechanism was also shaped by evolution, it could be expected to play a role in the experiment proper, where nonnative English speakers would be interpreting emotional prosody.
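The recognition rates reported for the exploration study can be re-derived from the reported counts. All stimulus counts below come from the text. The helper names are illustrative, and `recognized_by_any` simply encodes the stated inclusion rule: a stimulus counts as recognized if at least one of its three judges was correct. The sketch is a worked check of the arithmetic and not the author's analysis code.

```python
# Worked check of the exploration-study recognition rates (counts from the text).
ACTED, NATURAL = 242, 119
TOTAL = ACTED + NATURAL  # 361 stimuli in the initial pool

RECOGNIZED = {
    "valence/arousal": 303,
    "categorization": 252,
    "free naming, valence tags": 314,
    "free naming, category tags": 238,
}


def recognized_by_any(judge_correct):
    """Inclusion rule: recognized if at least one judge identified the emotion correctly."""
    return any(judge_correct)


def recognition_rate(count, total=TOTAL):
    """Share of the initial pool as a percentage, rounded to two decimals as in the text."""
    return round(100 * count / total, 2)


rates = {procedure: recognition_rate(n) for procedure, n in RECOGNIZED.items()}
```

The category-tag tier of free naming (65.93 %) comes out lowest and the categorization procedure (69.81 %) next. Both fall well below the two valence-based measures, consistent with the specificity trend summarized above.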
References
Arnold, S., Roth, D., & Chechik, J. S. (1993). Benny and Joon (Motion Picture). USA: Twentieth Century Fox.
Bagby, R. M., Parker, J. D. A., & Taylor, G. J. (1994). The twenty-item Toronto Alexithymia Scale: I. Item selection and cross-validation of the factor structure. Journal of Psychosomatic Research, 38(1), 23–32.
Bartolini, E. E. (2011). Eliciting emotion with film: Development of a stimulus set. Unpublished BA thesis, Wesleyan University.
Beddor, F., Steinberg, M., Thomas, B., Wessler, C. B., Farrelly, B., & Farrelly, P. (1998). There’s Something About Mary (Motion Picture). USA: Twentieth Century Fox.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–345.
Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1, The Center for Research in Psychophysiology, University of Florida.
Bruckheimer, J., Oman, C., & Yakin, B. (2000). Remember the Titans (Motion Picture). USA: Jerry Bruckheimer Films.
Bryce, I., Gordon, M., Levinsohn, G., Spielberg, S., & Spielberg, S. (1998). Saving Private Ryan (Motion Picture). USA: DreamWorks SKG.
Capra, F., & Capra, F. (1946). It’s a Wonderful Life (Motion Picture). USA: Liberty Films.
Davies, M. (2008). The Corpus of Contemporary American English: 520 million words, 1990–present. http://corpus.byu.edu/coca. Accessed January 1, 2014.
Del Vecho, P., Buck, C., & Lee, J. (2013). Frozen (Motion Picture). USA: Walt Disney Animation.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169–200.
Ekman, P. (1994). All emotions are basic. In P. Ekman & R. Davidson (Eds.), The nature of emotion (pp. 15–19). Oxford: Oxford University Press.
Goerlich, K. S., Witteman, J., Aleman, A., & Martens, S. (2011). Hearing feelings: Affective categorization of music and speech in alexithymia, an ERP study. PLoS ONE, 6(5), 1–11.
Goldberg, D., Phillips, T., & Phillips, T. (2009). The Hangover (Motion Picture). USA: Warner Bros.
Grazer, B., & Zieff, H. (1991). My Girl (Motion Picture). USA: Columbia Pictures.
Haft, S., Witt, P. J., Thomas, T., & Weir, P. (1989). Dead Poets Society (Motion Picture). USA: Touchstone Pictures.
Lang, P. J. (1980). Behavioral treatment and bio-behavioral assessment: Computer applications. In J. B. Sidowski, J. H. Johnson, & E. A. Williams (Eds.), Technology in mental health care delivery systems (pp. 119–137). Norwood, NJ: Ablex.
Lovell, D., & Zeffirelli, F. (1979). The Champ (Motion Picture). USA: Metro-Goldwyn-Mayer.
Marian, V., Blumenfeld, H. K., & Kaushanskaya, M. (2007). The language experience and proficiency questionnaire (LEAP-Q): Assessing language profiles in bilinguals and multilinguals. Journal of Speech, Language, and Hearing Research, 50, 940–967.
Marvin, N., & Darabont, F. (1994). The Shawshank Redemption (Motion Picture). USA: Castle Rock Entertainment.
Mattila, A. K., Salminen, J. K., Nummi, T., & Joukamaa, M. (2006). Age is strongly associated with alexithymia in the general population. Journal of Psychosomatic Research, 61, 629–635.
Merriam-Webster Dictionary. (2014). http://www.merriam-webster.com/dictionary. Accessed May 1, 2014.
Oxford English Dictionary. (2014). http://www.oed.com. Accessed May 1, 2014.
Quiros-Ramirez, M. A., Polikovsky, S., Kameda, Y., & Onisawa, T. (2012). Towards developing robust multimodal databases for emotion analysis. The 6th International Conference on Soft Computing and Intelligent Systems.
Reiner, R., Scheinman, A., & Reiner, R. (1989). When Harry Met Sally (Motion Picture). USA: Metro-Goldwyn-Mayer.
Rivera, J., Docter, P., & Peterson, B. (2009). Up (Motion Picture). USA: Pixar Animation.
Rottenberg, J., Ray, R. D., & Gross, J. J. (2007). Emotion elicitation using films. In J. A. Coan & J. J. B. Allen (Eds.), Handbook of emotion elicitation and assessment (pp. 9–28). New York: Oxford University Press.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
Schaefer, A., Nils, F., Sanchez, X., & Philippot, P. (2010). Assessing the effectiveness of a large database of emotion-eliciting films: A new tool for emotion researchers. Cognition and Emotion, 24(7), 1153–1172.
Scherer, K. R. (2005). What are emotions? And how can they be measured? Trends and developments: Research on emotions. Social Science Information, 44(4), 695–729.
Scherer, K. R., & Ellgring, H. (2007). Are facial expressions of emotion produced by categorical affect programs or dynamically driven by appraisal? Emotion, 7(1), 113–130.
Sneddon, I., McRorie, M., McKeown, G., & Hanratty, J. (2007). The Belfast induced natural emotion database. IEEE Transactions on Affective Computing, 3(1), 32–41.
Velten, E. (1968). A laboratory task for induction of mood states. Behaviour Research and Therapy, 6, 473–482.
Ververidis, D., & Kotropoulos, C. (2003). A state of the art review on emotional speech databases. Proceedings of 1st Richmedia Conference. doi:10.1.1.420.6988
Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48, 1162–1181.
Wagner, H. L. (1997). Methods for the study of facial behavior. In J. A. Russell & J. M. Fernandez-Dols (Eds.), The psychology of facial expression (pp. 31–54). Cambridge, MA: Cambridge University Press.
Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191–1207.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54(6), 1063–1070.
Westermann, R., Spies, K., Stahl, G., & Hesse, F. W. (1996). Relative effectiveness and validity of mood induction procedures: A meta-analysis. European Journal of Social Psychology, 26, 557–580.
Chapter 7
Emotional Prosody Processing in Nonnative English Speakers
7.1 Introduction
Emotional prosody is the melody of speech modulated by emotions, a channel of expression straddling the controllable and the uncontrollable in emotional expression in speech. Emotional prosody research is a loosely organized area of study. It is bound merely by its subject matter and a universal acceptance of both the theoretical premise and the methodological framework of the standard view of emotions (Ekman 1992). Such as it is, however, the research has its limitations in theory, methodology, and population sampling, as described in detail in Chap. 5. This study was designed to address these limitations. On the theoretical level, emotional prosody research is almost exclusively limited to the universal basic emotions theory, at least in its initial premises and assumptions. In many cases, however, study conclusions and critical reviews within the area point to dimensional or appraisal approaches to explain various effects or error patterns. Notable examples include the observation that arousal (Busso et al. 2009) and/or valence (Russell and Barrett 1999; Scherer et al. 2003) may be better discriminating perceptual features for emotional prosody than categorical basic emotion traits, which suggests a dimensional approach. Various reports of age (Paulmann et al. 2008; Fujisawa and Shinohara 2011) and gender (Banse and Scherer 1996; Schirmer and Kotz 2003; Wallbott and Scherer 1986; Bach et al. 2008) influencing the perception of emotional prosody, in turn, suggest that affective appraisals may be involved at some critical level of emotion recognition. In other words, emotional prosody research is consistently based on the standard view framework, but its results suggest that dimensional and appraisal frameworks are viable for the subject matter as well.
The one theory and framework that combines all three of these basic approaches (dimensional, standard, and appraisal) is the psychological construction of emotion (Barrett 2011) with its Conceptual Act Model (CAM), both discussed in Chaps. 2 and 4.
© Springer International Publishing Switzerland 2016 H. Bąk, Emotional Prosody Processing for Non-Native English Speakers, The Bilingual Mind and Brain Book Series 3, DOI 10.1007/978-3-319-44042-2_7
To expand upon
the theoretical framework of the existing emotional prosody research, I would therefore conduct my study within the framework of psychological construction. On the methodological level, as Tables 5.1 and 5.2 in Chap. 5 illustrate, research on emotional prosody has its limitations as well, the greatest being procedural. The experimental procedures are one of the few points of consistency across studies in this area. The great majority of procedures are based on forced choice tasks, with the response options limited to basic emotions, frequently with the addition of an emotionally neutral tone. Any procedure other than forced choice is the exception rather than the rule. Dimensional and scalar evaluation procedures are occasionally deployed in validation studies. However, as I argued in Chap. 5, validation studies are inconsistently reported and generally neglected as a procedurally significant detail. The forced choice procedure has been criticized in other areas of broadly understood emotion research (Russell 1994) as partially responsible for high recognition accuracy rates. With a limited number of distinct categories with distinct characteristics, an emotion recognition task effectively turns into a stimulus categorization task. In this study, I decided to include the forced choice procedure but to add a response option labeled other. This option was for any stimuli that, in the participants’ opinion, did not fit the target emotion categories of happiness or sadness. I chose other rather than neutral, the option used in the stimuli exploration study. The aim was to loosen the procedural task constraints and to avoid suggesting to the participants that a distinct category of emotional neutrality was present. This would constitute a task with intermediate demands on the specificity of emotion recognition.
I also decided to include a task in which participants would evaluate the emotions expressed in the stimuli on continuous scales of valence and arousal, a task with low demands on recognition specificity. Finally, I included a task in which participants would be asked to name the emotions expressed in the stimuli, a task with high demands on recognition specificity. All three tasks would be presented to every participant. The combination of three tasks, each corresponding to a hierarchically different level of emotion processing, would form what I called the integrative study paradigm. By analyzing recognition rates and error patterns across such different tasks, I expected to gain comprehensive insight into the overall nature of emotional prosody processing in nonnative English speakers. The tasks would span three levels:
broad positive/negative and high/low intensity generalizations, with low processing cost and a narrow potential margin of error;
basic emotion categorization, with medium processing cost and margin of error;
highly specific emotion identification by naming, with high processing cost and a large potential margin of error.
Using an integrative paradigm of this kind, I would be able to define both the points of similarity and the points of divergence between native and nonnative speakers of English. There were two major reasons to opt for nonnative English speakers as the population of choice for this study. First, bilingual populations have been uniquely underutilized in the existing research. As shown in Chap. 5, research on emotional prosody has so far focused on monolingual speakers interpreting emotions in prosody in their own language or in a language foreign to them. With the preferential focus on English in the existing body of evidence, we have a
reasonable understanding of how native English speakers process emotional prosody in their mother tongue, if only on the categorical level. What this processing might be like for individuals for whom English is a known but not native language remains unknown. Reliable estimates indicate that the global population of nonnative English speakers will overtake that of native speakers by the year 2050 (Goddard 2003), so this is a significant gap in our knowledge. The second reason is that nonnative speakers can provide evidence in the ongoing debate on the universality versus cultural specificity of emotions. Consider nonnative speakers of any language, specifically those not raised bilingual but trained in the language only in formal educational contexts. They possess the language but not the socially acquired culture in the anthropological sense. For a psychologist, such a bilingual is an individual possessing two conventional codes of communication, each mastered at a different level of proficiency. For an anthropologist, such a bilingual is a native of one language and culture with an imperfect shadow of competence in the second language, a code of communication torn out of its cultural and social context. If enculturation has no bearing on emotion processing, nonnative English speakers should have no problems recognizing such basic emotions as happiness and sadness in English prosody. If, on the other hand, socially acquired culture does influence how we process emotions, we should expect the nonnative speakers to deviate significantly from the norms for native speakers established in previous research (see Fig. 6.1). To use Wierzbicka’s terms (Harkins and Wierzbicka 2001), the question here is whether nonnative English speakers who have two linguistic habits of the mind also develop two habits of the heart. The question was operationalized in four main hypotheses.
Three of these were based on previous research. Testing them would allow me to determine the points of similarity between the nonnative speakers and the reported patterns of emotion processing in native English speakers. As stated in Chap. 5, these were Hypotheses 2–4. Hypothesis 2 stated that sadness would be recognized better than happiness in prosody. Hypothesis 3 stated that emotions would be recognized better when expressed by female rather than male speakers. Hypothesis 4 stated that acted emotion expressions would be recognized more accurately than natural expressions. Hypothesis 1 was exploratory in character. It postulated that the English language proficiency of the nonnative English speakers would influence their ability to recognize emotional prosody, with participants of higher proficiency outperforming those of lower proficiency on overall accuracy. Additionally, based on the results of the exploratory study reported in Chap. 6, I venture to postulate two more hypotheses. Hypothesis 5 states that the accuracy of emotion recognition from prosody will drop significantly as a function of recognition task difficulty. Hypothesis 6 states that negativity bias will be observable both in the accuracy scores and in the error patterns in the qualitative data from the free naming recognition task. Testing Hypotheses 1, 5, and 6 would allow me to determine the points of divergence between the patterns of emotional prosody recognition in native versus nonnative English speakers. Testing Hypotheses 2–4 would allow me to determine the points of similarity between the two populations.
7 Emotional Prosody Processing in Nonnative English Speakers
This study systematically expands on existing research on emotional prosody while investigating the nature of emotion processing in nonnative English speakers, a substantial and heretofore uninvestigated population. In terms of theory, the study design is grounded in psychological construction, with elements of anthropology and linguistic pragmatics. In terms of method, it combines three major research paradigms into a coherent integrative paradigm, which allows emotional prosody processing to be observed on multiple levels and supports a comprehensive understanding of the mechanisms involved. Additionally, the lexicographic and corpus-based tools designed for the qualitative and quantitative analyses of the free naming task reduce the risk of human error in classifying the output and demonstrate the potential of contemporary linguistic methods for emotion research. Should the results indicate that nonnative English speakers process emotional prosody differently than native speakers reportedly do, the implications may be significant for radical universalist approaches to emotion theory. Such a result would also provide initial support for complex and integrative approaches such as psychological construction or appraisal theory. In practical terms, this study should demonstrate the viability of procedurally complex designs, such as the integrative paradigm proposed here, and their value for emotion research in general.
7.2 Participants
All participants enrolled in this study were English language majors at the Faculty of English, Adam Mickiewicz University, and participated for partial credit in a variety of courses. There were 133 participants in total (23 male and 110 female; age M = 21.4, SD = 1.9), divided into two groups by English language proficiency. The lower proficiency group (henceforth: LoPro) comprised first-year students at the Faculty of English at the B1 proficiency level of the CEFR (Common European Framework of Reference for Languages). This group included 65 participants (14 male and 51 female; age M = 19.9, SD = 1.3). The higher proficiency group (henceforth: HiPro) comprised students in the Faculty of English MA programs at the C2 proficiency level of the CEFR. This group included 68 participants (9 male and 59 female; age M = 22.9, SD = 1.2). The original sample included six more individuals, two male and one female in the LoPro group and three female in the HiPro group. Their data were dropped because they did not fit the design-specific population profile, which called for a homogeneous population of bilingual Polish-English speakers: these six individuals were Erasmus program exchange students from Spain, Morocco, China, Ukraine, and Russia. All participants were screened with the Language History Questionnaire 2.0 (Li et al. 2014) to establish their linguistic backgrounds and their levels of exposure to culture in native English-speaking contexts. Only 14 individuals, four in the LoPro group and ten in the HiPro group, had spent any length of time in English-speaking countries, and although their cumulative lengths of stay ranged from 3 to 9 months, no continuous stay exceeded 3 months. The locations of these stays were the UK (seven participants), the USA (three participants), Ireland (three participants), Scotland (one participant), and Wales (one participant). Given that none of these individuals was bilingual from birth, all were raised as native speakers of Polish, and all had visited English-speaking countries in their late teens or early adulthood, I decided not to make their exposure to native English social contexts a criterion for exclusion. In other aspects of linguistic background there were marked differences between the LoPro and HiPro groups. The participants in the LoPro group spoke three languages on average, with Polish as their first and native language and English as their second. Their English was acquired in formal education, with an average total learning time of M = 11.4 years (SD = 2.61). The participants in this group also spoke a variety of third languages, all likewise acquired through formal education. The most frequently reported third languages were German (M = 6.5 years of education, SD = 2.87; 21 participants), Russian (M = 1.92 years, SD = 1.68; 13 participants), Dutch (M = 6.66 years, SD = 6.49; 11 participants), and French (M = 3.67 years, SD = 1.15; 9 participants). Other participants who reported a third language indicated some knowledge of Bulgarian, Japanese, Italian, and Spanish. The participants in the HiPro group spoke four languages on average, with Polish as their first and native language and English as their second. Their English was acquired in formal education, with an average total learning time of M = 13.4 years (SD = 2.91). The participants in this group also named a fairly diverse variety of third and fourth languages.
The most frequent among these was Spanish (M = 2.65 years, SD = 1.93; 28 participants), followed by German (M = 8.19 years, SD = 4.28; 17 participants), French (M = 4.85 years, SD = 1.98; also 17 participants), and Russian (M = 3.88 years, SD = 2.2; 8 participants). Other participants in this group indicated some knowledge of Arabic, Dutch, Norwegian, Welsh, and Hindi. The participants were also screened for alexithymia with the Toronto Alexithymia Scale (Bagby et al. 1994). An independent-samples t-test revealed no significant difference in alexithymia scores between the LoPro (M = 53.08, SD = 12.98) and HiPro (M = 49.25, SD = 11.97) groups, t(131) = 1.77, p = 0.079, so the scores were collapsed for further analysis. The participants' alexithymia scores fell within the normal range expected for this population on all factors of the TAS-20. As a group, the participants scored well below the cutoff for a positive alexithymia diagnosis (scores ≥ 61), with a mean score of M = 51.12 (SD = 12.52). On "Difficulty Identifying Feelings" the participants scored M = 18.65 (SD = 6.26) out of a maximum of 35, on "Difficulty Describing Feelings" M = 14.24 (SD = 4.81) out of a maximum of 21, and on "Externally-Oriented Thinking" M = 18.23 (SD = 4.71) out of a maximum of 24. All in all, the scores fit well with the largely female and well-educated population involved here (Mattila et al. 2006).
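The t statistic and the collapsed mean above can be reproduced from the group summaries alone. The Python sketch below assumes a Student's (pooled-variance) independent-samples t-test, which is consistent with the reported df = 131 = 65 + 68 − 2; the helper names are illustrative and not part of the original analysis.

```python
import math


def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t (equal variances assumed) from group summaries."""
    df = n1 + n2 - 2
    # Pooled variance: the two group variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff, df


def combined_mean(m1, n1, m2, n2):
    """Sample-size-weighted mean of two groups."""
    return (m1 * n1 + m2 * n2) / (n1 + n2)


# TAS-20 totals reported above: LoPro (n = 65) and HiPro (n = 68)
t, df = pooled_t_from_summary(53.08, 12.98, 65, 49.25, 11.97, 68)
print(f"t({df}) = {t:.2f}")                                        # t(131) = 1.77
print(f"collapsed M = {combined_mean(53.08, 65, 49.25, 68):.2f}")  # collapsed M = 51.12
```

Welch's unequal-variance variant would yield a non-integer df; the pooled version is shown only because it matches the degrees of freedom reported in the text.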
7.3 Materials
Details of the procedures for emotion elicitation, recording, and processing of the stimuli can be found in Chap. 6 of this book. The stimuli were 162 samples of emotional speech, 96 acted and 66 natural expressions of emotion, all processed through a low-frequency band-pass filter (60–300 Hz) to remove semantic information. The stimuli came from eight native English speakers (four male and four female; age M = 18.63, SD = 0.48), half of whom were primed to experience and express happiness and the other half sadness. Thus, of the 96 acted expressions of emotion, 48 conveyed sadness (24 from female and 24 from male speakers) and the other 48 conveyed happiness (24 from female and 24 from male speakers). Of the 66 natural expressions, 33 conveyed sadness (16 from female and 17 from male speakers) and the other 33 conveyed happiness (15 from female and 18 from male speakers). All stimuli were validated against the speakers' self-evaluations of their emotional states during the procedure in which the samples of emotional speech were elicited. The stimuli were sorted into three fixed sets of 54 stimuli, counterbalanced for speaker gender (male × female), emotion expressed (happiness × sadness), and manner of expression (acted × natural). The experimental procedure was designed so that each participant would encounter all 162 stimuli across three different experimental tasks, 54 stimuli in each. The experiment was built in E-Prime 2.0. The hardware comprised DELL Optiplex 7010 PCs (Intel® Core™ i3-3220 dual-core CPU) with 22.9″ DELL P2312Ht screens (60 Hz), standard DELL PC keyboards, and closed Sennheiser HD201 headphones. Three questionnaires were also used: the Language History Questionnaire 2.0 or LHQ (Li et al. 2014), the Toronto Alexithymia Scale or TAS-20 (Bagby et al. 1994), and a custom Post-probe questionnaire designed for this study. The Post-probe questionnaire asked the participants how many speakers they believed had produced the stimuli, what genders they were and what languages they spoke, and how difficult they perceived the experimental tasks to be. All questionnaires were administered in pen-and-paper format.
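The chapter does not specify how the 162 stimuli were divided among the three fixed sets. The Python sketch below shows one simple way such a counterbalanced split could be produced, assuming stimuli are dealt out cell by cell (gender × emotion × manner) with a single round-robin counter; `assign_to_sets` and the cell labels are hypothetical, not the procedure actually used.

```python
# (manner, emotion, speaker gender) -> number of stimuli, as listed in Section 7.3
CELL_SIZES = {
    ("acted", "sadness", "female"): 24, ("acted", "sadness", "male"): 24,
    ("acted", "happiness", "female"): 24, ("acted", "happiness", "male"): 24,
    ("natural", "sadness", "female"): 16, ("natural", "sadness", "male"): 17,
    ("natural", "happiness", "female"): 15, ("natural", "happiness", "male"): 18,
}


def assign_to_sets(cell_sizes, n_sets=3):
    """Deal stimuli into n_sets using one round-robin counter that runs across all cells.

    Consecutive stimuli from the same cell go to consecutive sets, so each cell is split
    as evenly as possible; because the counter is never reset between cells, the set
    sizes also differ by at most one (exactly 54 each when there are 162 stimuli).
    """
    sets = [[] for _ in range(n_sets)]
    k = 0
    for cell, size in cell_sizes.items():
        for index in range(size):
            sets[k % n_sets].append((cell, index))  # (cell label, stimulus index within cell)
            k += 1
    return sets


sets = assign_to_sets(CELL_SIZES)
print([len(s) for s in sets])  # [54, 54, 54]
```

Cells whose size is not divisible by three (e.g. the 16 natural sad female stimuli) necessarily end up split 6/5/5, so perfect balance within every cell is impossible; the single running counter keeps the deviations to at most one stimulus per cell.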
7.4 Experimental Procedure
Each participant was tested individually in the Language and Communication Laboratory at the Faculty of English, Adam Mickiewicz University. The procedure began with the participants filling out the LHQ, continued with the three experimental tasks, and ended with the TAS-20 and Post-probe questionnaires. The three experimental tasks were administered to the participants in different, counterbalanced orders. The instructions and the experimental procedure were in English. In all experimental tasks the participants were told that they would hear a series of audio clips of human speech conveying various emotions, and that these clips had been modified so that the words could not be understood.
In one procedure the participants were instructed to listen to the audio clips and evaluate the emotional content, first as positive or negative on a scale from (−3) to (+3), and second as being of high or low intensity, also on a scale from (−3) to (+3). The first evaluation pertained to valence and the second to arousal, so the procedure was documented as the ValAr procedure. The valence scale included (0) for neutral valence, and the intensity scale included (0) for moderate intensity. Each trial began with a fixation mark (+) presented at the center of the screen. The participants heard a stimulus (accompanied on screen by a visual waveform cue), after which they saw the valence scale at the top of the screen with a response window below. The participants evaluated valence and typed in their response using the number pad on their keyboards. After entering their valence evaluation, the participants pressed Enter, which brought up the intensity scale at the top of the screen with a response window below. Again the participants made their evaluation, typed in a response, and pressed Enter, which completed the trial. In another procedure the participants were instructed to listen to the audio clips and categorize them as conveying happiness, sadness, or any emotion or feeling other than happiness or sadness. The stimulus material exploration study had shown that using neutral emotional tone as one of the response options may be too restrictive. Tonal and emotional neutrality have certain category-specific characteristics that have to be sought in the perceptual input and matched to conceptual categories stored in long-term memory. In other words, scanning for neutrality in emotional tone may be as costly in terms of processing as scanning for specific emotions.
To lessen the processing cost in this task, I changed the third response option from neutral to other. This procedure was documented as the Categorization (Cat.) procedure. Each trial began with a fixation mark (+) presented at the center of the screen. The participants heard a stimulus (accompanied on screen by a visual waveform cue), after which they saw a prompt screen showing the response options with both icons and category names: “happy”, “other”, and “sad”. The participants responded by pressing the appropriate key on their keyboards. The response keys were (v) for happy, (b) for other emotion, and (n) for sad, each marked with a colored paper slip bearing the corresponding icon. Pressing any of the response keys completed the trial. In the remaining task the participants were instructed to listen to the audio clips and name the emotion conveyed in each using one word. Phrasal verbs were allowed, as were repeated use of the same terms and use of the backspace key. The participants were advised to focus on finding the most appropriate term rather than on spelling. This procedure was documented as the Free (naming) procedure. Each trial began with a fixation mark (+) presented at the center of the screen. The participants heard a stimulus (accompanied on screen by a visual waveform cue), after which they saw a prompt screen instructing them to “Type in your response:” with a response window underneath. The participants typed in their responses using the entire keyboard and pressed Enter, which completed the trial.
7.5 Data Processing and Determining Accuracy
The data from each experimental task had to be processed for statistical analysis in a different way, and accuracy was determined differently in each. Both the processing and the accuracy determination procedures were the same as in the stimulus exploration study described in detail in Chap. 6. In the ValAr task, all responses evaluating sadness-conveying stimuli as negative were tagged as correct, as were all responses evaluating happiness-conveying stimuli as positive. All responses evaluating either stimulus type as neutral, as well as those evaluating either type as having the opposite valence, were tagged as incorrect. In the Categorization task, all responses categorizing sadness-conveying stimuli as “sad” and all responses categorizing happiness-conveying stimuli as “happy” were tagged as correct. All responses categorizing either stimulus type as “other”, as well as all responses categorizing happiness-conveying stimuli as “sad” or sadness-conveying stimuli as “happy”, were tagged as incorrect. Finally, in the Free (naming) task the analysis began with spellchecking the participants' responses and correcting their spelling where necessary. The responses were then tagged for both valence and basic emotion category membership using a dictionary- and corpus-based tool designed for that purpose (see Chap. 6 for details). The tagging procedure thus yielded material for a two-tier analysis. On the more general level, the participants' responses were tagged for valence based on valence evaluations in existing corpora. Any response to a sadness-conveying stimulus that was corpus-tagged as negative was therefore tagged as correct, as was any response to a happiness-conveying stimulus that was corpus-tagged as positive.
All cases in which a response to either type of stimulus was corpus-tagged with the opposite valence or with neutral valence, and all cases in which a given response did not appear in the corpus database, were tagged as incorrect. This level of analysis was documented as FreeValAr. On a more specific level, the responses were tagged for their basic emotion category/family membership based on lexicographic data from English language thesauri and dictionaries. On this level, documented as FreeCat, all responses to sadness-conveying stimuli tagged as belonging to the basic emotion category sadness, and all responses to happiness-conveying stimuli tagged as belonging to the basic emotion category happiness, were tagged as correct. All responses to either type of stimulus tagged as belonging to the opposite category (happiness for sadness-conveying stimuli and vice versa), all responses tagged as belonging to basic emotion categories other than happiness or sadness, and all responses falling outside the basic emotion categories were tagged as incorrect. The data were analyzed for accuracy in quantitative terms and for error patterns in qualitative terms in an attempt to explain the underlying causes of emotion recognition errors.
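The four accuracy rules above can be restated as small predicates. The Python sketch below is a hypothetical restatement, not the tool described in Chap. 6: `VALENCE_TAGS` and `CATEGORY_TAGS` are toy stand-ins for the corpus valence tags and the thesaurus-based category tags, and the handling of words carrying several category tags is an assumption.

```python
TARGET_VALENCE = {"sadness": "negative", "happiness": "positive"}
CAT_KEYS = {"v": "happiness", "b": "other", "n": "sadness"}  # Cat. task response keys


def score_valar(target: str, rating: int) -> bool:
    """ValAr: correct only if the valence rating (-3..+3) has the target's sign; 0 is an error."""
    return rating < 0 if TARGET_VALENCE[target] == "negative" else rating > 0


def score_cat(target: str, key: str) -> bool:
    """Cat.: correct only if the pressed key names the target; 'other' is always an error."""
    return CAT_KEYS[key] == target


def score_free_valar(target: str, word: str, valence_tags: dict) -> bool:
    """FreeValAr: opposite, neutral, or missing (not in the corpus) valence all count as errors."""
    return valence_tags.get(word) == TARGET_VALENCE[target]


def score_free_cat(target: str, word: str, category_tags: dict) -> bool:
    """FreeCat: correct if the word's basic-emotion tags include the target category.

    Assumption: a word tagged with several negative categories (e.g. sadness and anger)
    still counts as correct for a sadness stimulus if sadness is among its tags.
    """
    return target in category_tags.get(word, set())


# Toy lexicons (hypothetical entries, for illustration only)
VALENCE_TAGS = {"gloomy": "negative", "cheerful": "positive", "calm": "neutral", "furious": "negative"}
CATEGORY_TAGS = {"gloomy": {"sadness"}, "cheerful": {"happiness"}, "furious": {"anger"}}

print(score_valar("sadness", -2), score_valar("happiness", 0))  # True False
print(score_cat("sadness", "n"), score_cat("happiness", "b"))   # True False
print(score_free_valar("sadness", "furious", VALENCE_TAGS))     # True: valence matches
print(score_free_cat("sadness", "furious", CATEGORY_TAGS))      # False: tagged anger, not sadness
```

The last two calls show a response that counts as correct at the FreeValAr level but incorrect at the FreeCat level, the kind of case discussed in Section 7.6.4.1.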
7.6 Results
To test the main hypotheses (1–4), the data were analyzed using a three-way repeated measures ANOVA with speakers' Gender (female × male), Emotion expressed (happiness × sadness), and Manner of expression (acted × natural) as within-subjects factors and Proficiency (LoPro × HiPro) as a between-subjects factor. Although there were interactions in different tasks, the main effects are reported and analyzed as well, since they speak directly to the hypotheses. Hypothesis 5 was tested using a one-way repeated measures ANOVA with task difficulty (ValAr × Cat. × FreeValAr × FreeCat) as the within-subjects factor and Proficiency (LoPro × HiPro) as the between-subjects factor. Hypothesis 6 was evaluated on the basis of the results of testing Hypotheses 1–5 and a qualitative analysis of the Free (naming) task. I will first report the results for each experimental task separately, starting with the quantitative analysis and following it with a brief report on the results of the qualitative analysis. Second, I will report on the task difficulty analysis. Finally, I will review the error patterns across all tasks to verify Hypothesis 6. The results for the main effects are summarized in Tables 7.1 and 7.2.
7.6.1 The Valence and Arousal Evaluation Task (ValAr) Results
In the ValAr task there was a significant main effect of speakers' Gender, F(1, 131) = 12.20, η² = 0.085, p = 0.001; emotional prosody was recognized significantly better in female voices (M = 0.494, SE = 0.011) than in male voices (M = 0.454, SE = 0.012). There was also a significant main effect of Emotion expressed, F(1, 131) = 48.42, η² = 0.270, p < 0.001; the emotional prosody of sadness (M = 0.545, SE = 0.015) was recognized significantly better than that of happiness (M = 0.402, SE = 0.014). The main effect of Manner of expression was not significant, F(1, 131) = 2.12, η² = 0.016, p = 0.148; recognition of acted emotional prosody (M = 0.483, SE = 0.013) did not differ significantly from recognition of natural emotional prosody (M = 0.465, SE = 0.011). There was a significant interaction between Emotion and Manner, F(1, 131) = 6.26, η² = 0.046, p = 0.014. Post hoc comparisons with Bonferroni correction revealed that acted expressions of sadness (M = 0.567, SE = 0.017) were recognized significantly better than natural expressions of sadness (M = 0.524, SE = 0.016), whereas for happiness there was no significant difference between acted (M = 0.398, SE = 0.016) and natural (M = 0.406, SE = 0.017) expressions. There was also a significant between-subjects effect of Proficiency, F(1, 131) = 4.09, η² = 0.030, p = 0.045: Bonferroni-corrected comparisons showed that the participants in the LoPro group (M = 0.494, SE = 0.014) recognized emotions in prosody significantly better than the participants in the HiPro group (M = 0.454, SE = 0.014). The between-subjects effects of Proficiency are summarized in Tables 7.3 and 7.4, and all significant interactions are summarized in Table 7.5.
Table 7.1 Main effects summary across the ValAr, Cat., and Free (both levels of analysis) tasks
Task        Effect     F        df         η²      p
ValAr       Gender      12.20   (1, 131)   0.085   0.001
            Emotion     48.42   (1, 131)   0.270   <0.001
            Manner       2.12   (1, 131)   0.016   0.148
Cat.        Gender      54.46   (1, 131)   0.294   <0.001
            Emotion    176.31   (1, 131)   0.574   <0.001
            Manner      34.28   (1, 131)   0.207   <0.001
FreeValAr   Gender       5.86   (1, 131)   0.043   0.017
            Emotion    272.93   (1, 131)   0.676   <0.001
            Manner       1.97   (1, 131)   0.015   0.163
FreeCat     Gender      81.83   (1, 131)   0.384   <0.001
            Emotion    216.58   (1, 131)   0.623   <0.001
            Manner      15.65   (1, 131)   0.107   <0.001
Table 7.2 Main means comparisons in the ValAr, Cat., and Free (both levels of analysis) tasks
Factor    Level       ValAr M (SE)    Cat. M (SE)     FreeValAr M (SE)   FreeCat M (SE)
Gender    Female      0.494 (0.011)   0.448 (0.010)   0.398 (0.011)      0.221 (0.010)
          Male        0.454 (0.012)   0.372 (0.010)   0.373 (0.010)      0.142 (0.008)
Emotion   Happiness   0.402 (0.014)   0.305 (0.013)   0.246 (0.011)      0.077 (0.007)
          Sadness     0.545 (0.015)   0.516 (0.011)   0.525 (0.014)      0.286 (0.014)
Manner    Acted       0.483 (0.013)   0.440 (0.010)   0.393 (0.012)      0.199 (0.010)
          Natural     0.465 (0.011)   0.380 (0.010)   0.378 (0.010)      0.164 (0.008)
Table 7.3 Between-subjects effects across the ValAr, Cat., and Free (both levels of analysis) tasks
Task        F       df         η²      p
ValAr       4.09    (1, 131)   0.030   0.045
Cat.        0.148   (1, 131)   0.001   0.701
FreeValAr   7.57    (1, 131)   0.055   0.007
FreeCat     8.07    (1, 131)   0.058   0.005
Table 7.4 Between-subjects means comparison in the ValAr, Cat., and Free (both levels of analysis) tasks
Task        LoPro M (SE)    HiPro M (SE)
ValAr       0.494 (0.014)   0.454 (0.014)
Cat.        0.414 (0.013)   0.407 (0.013)
FreeValAr   0.411 (0.013)   0.359 (0.013)
FreeCat     0.205 (0.012)   0.158 (0.012)
Table 7.5 All significant interactions across the ValAr, Cat., and Free (both levels of analysis) tasks
Task        Interaction                      F       df         η²      p
ValAr       Emotion × Manner                 6.26    (1, 131)   0.046   0.014
Cat.        Gender × Emotion                 10.42   (1, 131)   0.074   0.002
            Gender × Emotion × Manner        16.12   (1, 131)   0.110   <0.001
            Proficiency × Gender × Manner    9.25    (1, 131)   0.066   0.003
FreeValAr   Gender × Manner                  5.10    (1, 131)   0.038   0.026
            Emotion × Manner                 9.94    (1, 131)   0.071   0.002
FreeCat     Gender × Emotion                 43.07   (1, 131)   0.247   <0.001
            Gender × Manner                  4.70    (1, 131)   0.035   0.032
            Emotion × Manner                 31.31   (1, 131)   0.193   <0.001
            Gender × Emotion × Manner        11.74   (1, 131)   0.082   0.001
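The effect sizes (η²) reported in these tables behave as partial eta squared, which for an effect with df1 numerator and df2 error degrees of freedom equals F·df1 / (F·df1 + df2). The short Python sketch below recomputes it from a few of the F values in Table 7.1 as a consistency check; it is an illustration, not the original analysis software, and the function name is hypothetical.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio: F*df1 / (F*df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)


# (F, reported eta squared) pairs from Table 7.1, all with df = (1, 131)
reported = {
    "ValAr Gender": (12.20, 0.085),
    "Cat. Emotion": (176.31, 0.574),
    "FreeCat Manner": (15.65, 0.107),
}
for label, (f, eta_reported) in reported.items():
    eta = partial_eta_squared(f, 1, 131)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.3f}")
```

Because every effect here has df = (1, 131), the same one-line formula can be applied to any F value in Tables 7.1, 7.3, and 7.5.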
The mean valence and arousal evaluations made by the participants differed markedly from both the speakers' self-assessments of their core affect (Chap. 5) and the judges' evaluations of perceived core affect (Chap. 6). Collapsed across Proficiency groups, the participants rated the female speakers expressing sadness higher in both valence (M = −0.49) and arousal (M = 0.18) than either the speakers themselves or the judges did. For the female speakers expressing happiness, both the valence (M = 0.03) and arousal (M = 0.68) evaluations were lower than those of the speakers and the judges. For the male speakers expressing sadness, the participants' valence evaluations (M = −0.55) were higher than those of the speakers and the judges, and their arousal evaluations (M = 0.53) were higher than the speakers' but equivalent to the judges'. For the male speakers expressing happiness, the valence evaluations (M = −0.19) were noticeably lower than those of both the speakers and the judges, and the arousal evaluations (M = 1.17) were slightly lower than the judges' but much higher than the speakers'. Figure 7.1 illustrates the differences in mean valence and arousal across the speakers' self-assessments, the judges' evaluations, and the nonnative participants' evaluations.
Fig. 7.1 Valence and arousal scores for the speakers, judges, and the nonnative participants
7.6.2 The Categorization Task (Cat.) Results
In the Cat. task there was a significant main effect of Gender, F(1, 131) = 54.56, η² = 0.294, p < 0.001; emotional prosody was recognized significantly better in female voices (M = 0.448, SE = 0.010) than in male voices (M = 0.372, SE = 0.010). There was also a significant main effect of Emotion, F(1, 131) = 176.31, η² = 0.574, p < 0.001; the emotional prosody of sadness (M = 0.516, SE = 0.011) was recognized significantly better than that of happiness (M = 0.305, SE = 0.013). The Manner of emotional expression also yielded a significant main effect, F(1, 131) = 34.28, η² = 0.207, p < 0.001; acted expressions of emotion (M = 0.440, SE = 0.010) were recognized better than natural expressions (M = 0.380, SE = 0.010). The between-subjects effect of Proficiency was not significant, F(1, 131) = 0.148, η² = 0.001, p = 0.701: in the Categorization task the LoPro group (M = 0.414, SE = 0.013) and the HiPro group (M = 0.407, SE = 0.013) performed comparably.
There was a significant interaction between Gender and Emotion, F(1, 131) = 10.42, η² = 0.074, p = 0.002. Post hoc comparisons with Bonferroni correction revealed that male (M = 0.451, SE = 0.015) and female (M = 0.581, SE = 0.015) speakers differed significantly only when expressing sadness; when they expressed happiness, the difference between male (M = 0.294, SE = 0.016) and female (M = 0.315, SE = 0.016) speakers was not significant. There was also a significant three-way interaction between Gender, Emotion, and Manner, F(1, 131) = 16.12, η² = 0.110, p < 0.001. Post hoc comparisons with Bonferroni correction revealed that for female speakers the participants recognized acted sadness (M = 0.628, SE = 0.020) significantly better than natural sadness (M = 0.533, SE = 0.016), whereas for male speakers the advantage of acted over natural expressions appeared for happiness instead, with acted happiness (M = 0.346, SE = 0.019) recognized significantly better than natural happiness (M = 0.242, SE = 0.017). The participants did not differentiate significantly between acted (M = 0.317, SE = 0.017) and natural (M = 0.313, SE = 0.022) happiness in female voices, or between acted (M = 0.469, SE = 0.020) and natural (M = 0.433, SE = 0.017) sadness in male voices. Finally, there was a significant three-way interaction between Proficiency, Gender, and Manner of expression, F(1, 131) = 9.25, η² = 0.066, p = 0.003. The participants in the LoPro group found emotion in male voices easier to identify when it was acted (M = 0.426, SE = 0.018) rather than natural (M = 0.318, SE = 0.017), whereas the participants in the HiPro group found emotion in female voices easier to recognize when it was acted (M = 0.478, SE = 0.018) rather than natural (M = 0.403, SE = 0.017).
Conversely, the participants in the LoPro group did not differ in their recognition of acted (M = 0.469, SE = 0.018) versus natural (M = 0.433, SE = 0.019) emotion expressed by female speakers, and the participants in the HiPro group did not differ in their recognition of acted (M = 0.389, SE = 0.017) versus natural (M = 0.357, SE = 0.017) emotion expressed by male speakers.
7.6.3 The Free (Naming) Task Results—Statistical Analysis Results
The Free (naming) task results were analyzed on two levels: first, accuracy was determined by valence; second, accuracy was determined by basic emotion category membership. In accordance with psychological construction theory, emotional content is processed on multiple overlapping levels of complexity, involving both core affect, expressed in valence and arousal, and appraisals, expressed in specific vocabulary choices. The results from the Free (naming) task were therefore analyzed at the FreeValAr and FreeCat levels as if they came from separate tasks, even though from the participants' perspective only one task was performed.
In the FreeValAr analysis there was a significant main effect of Gender, F(1, 131) = 5.86, η² = 0.043, p = 0.017: emotions in female voices (M = 0.398, SE = 0.011) were recognized better than emotions in male voices (M = 0.373, SE = 0.010). There was also a significant main effect of Emotion, F(1, 131) = 272.93, η² = 0.676, p < 0.001: sadness (M = 0.525, SE = 0.014) was recognized significantly better than happiness (M = 0.246, SE = 0.011). The main effect of Manner of expression was not significant in this analysis, F(1, 131) = 1.97, η² = 0.015, p = 0.163; recognition did not differ significantly between acted (M = 0.393, SE = 0.012) and natural (M = 0.378, SE = 0.010) expressions. There was a significant between-subjects effect of Proficiency, F(1, 131) = 7.57, η² = 0.055, p = 0.007: the participants in the LoPro group (M = 0.411, SE = 0.013) recognized emotional prosody significantly more accurately than the participants in the HiPro group (M = 0.359, SE = 0.013). There was a significant interaction between Gender and Manner, F(1, 131) = 5.10, η² = 0.038, p = 0.026: emotions in female voices (M = 0.415, SE = 0.014) were recognized significantly better than emotions in male voices (M = 0.371, SE = 0.013), but only in the acted expressions; in the natural expressions there was no significant difference between female (M = 0.380, SE = 0.012) and male (M = 0.375, SE = 0.012) voices. There was also a significant interaction between Emotion and Manner, F(1, 131) = 9.94, η² = 0.071, p = 0.002: sadness was recognized better when acted (M = 0.548, SE = 0.017) than when natural (M = 0.501, SE = 0.015), whereas for happiness there was no significant difference between acted (M = 0.238, SE = 0.013) and natural (M = 0.255, SE = 0.014) expressions.
In the FreeCat analysis there was a significant main effect of Gender, F(1, 131) = 81.83, η² = 0.384, p < 0.001: emotions were recognized significantly better in female voices (M = 0.221, SE = 0.010) than in male voices (M = 0.142, SE = 0.008). There was also a significant main effect of Emotion, F(1, 131) = 216.58, η² = 0.623, p < 0.001: sadness (M = 0.286, SE = 0.014) was recognized significantly better than happiness (M = 0.077, SE = 0.007). The main effect of Manner of expression was also significant, F(1, 131) = 15.65, η² = 0.107, p < 0.001: acted expressions of emotion (M = 0.199, SE = 0.010) were recognized significantly better than natural expressions (M = 0.164, SE = 0.008). There was also a significant between-subjects effect of Proficiency, F(1, 131) = 8.07, η² = 0.058, p = 0.005: once again, the participants in the LoPro group (M = 0.205, SE = 0.012) were better at recognizing emotional prosody than those in the HiPro group (M = 0.158, SE = 0.012). There were four significant interactions at this level of analysis. The first was between Gender and Emotion, F(1, 131) = 43.07, η² = 0.247, p < 0.001: the participants recognized sadness significantly more accurately in female voices (M = 0.360, SE = 0.018) than in male voices (M = 0.212, SE = 0.015), but there was no significant difference in the recognition of happiness in female (M = 0.083, SE = 0.009) versus male (M = 0.071, SE = 0.008) voices.
The second significant interaction was between Gender and Manner, F(1, 131) = 4.70, η² = 0.035, p = 0.032. In female voices the participants recognized acted emotion (M = 0.248, SE = 0.014) significantly better than natural emotion (M = 0.195, SE = 0.011). The pattern was the same in male voices, with acted emotion (M = 0.151, SE = 0.011) recognized significantly better than natural emotion (M = 0.132, SE = 0.009), although recognition scores were significantly lower overall for male than for female voices. The third significant interaction was between Emotion and Manner, F(1, 131) = 31.31, η² = 0.193, p < 0.001: acted sadness (M = 0.325, SE = 0.017) was recognized significantly better than natural sadness (M = 0.247, SE = 0.014), whereas there was no significant difference between acted (M = 0.074, SE = 0.008) and natural (M = 0.080, SE = 0.008) happiness. Finally, there was a significant three-way interaction between Gender, Emotion, and Manner, F(1, 131) = 11.74, η² = 0.082, p = 0.001. The advantage of acted over natural sadness was larger in female voices (acted M = 0.420, SE = 0.024; natural M = 0.300, SE = 0.018) than in male voices, although in male voices acted sadness (M = 0.230, SE = 0.018) was also recognized more accurately than natural sadness (M = 0.195, SE = 0.015). No comparable differences emerged for happiness: for the female speakers, acted expressions of happiness (M = 0.075, SE = 0.009) did not differ from natural ones (M = 0.090, SE = 0.012), and recognition of female happiness did not differ from recognition of male happiness, whether acted (M = 0.073, SE = 0.009) or natural (M = 0.069, SE = 0.009).
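Assuming that the estimated marginal means weight all cells equally, as is usual for a fully crossed within-subjects design, each main-effect and two-way mean should be the unweighted average of the three-way cell means it spans. The Python sketch below recomputes these averages from the FreeCat cell means reported in this section so that they can be compared with the FreeCat column of Table 7.2; the function and variable names are illustrative.

```python
from statistics import mean

FACTORS = ("gender", "emotion", "manner")
# FreeCat three-way cell means reported above: (gender, emotion, manner) -> M
CELLS = {
    ("female", "sadness", "acted"): 0.420, ("female", "sadness", "natural"): 0.300,
    ("male", "sadness", "acted"): 0.230, ("male", "sadness", "natural"): 0.195,
    ("female", "happiness", "acted"): 0.075, ("female", "happiness", "natural"): 0.090,
    ("male", "happiness", "acted"): 0.073, ("male", "happiness", "natural"): 0.069,
}


def marginal_mean(cells, **levels):
    """Unweighted mean of all cells matching the given factor levels (equal weights assumed)."""
    positions = {factor: FACTORS.index(factor) for factor in levels}
    return mean(m for key, m in cells.items()
                if all(key[positions[f]] == level for f, level in levels.items()))


for factor, options in (("gender", ("female", "male")),
                        ("emotion", ("happiness", "sadness")),
                        ("manner", ("acted", "natural"))):
    for level in options:
        print(f"{factor:<8} {level:<10} {marginal_mean(CELLS, **{factor: level}):.4f}")

# Two-way means, e.g. for the Gender x Emotion interaction
print(f"female happiness {marginal_mean(CELLS, gender='female', emotion='happiness'):.4f}")
print(f"male happiness   {marginal_mean(CELLS, gender='male', emotion='happiness'):.4f}")
```

Small discrepancies of up to 0.0005 are expected, because the published cell means are themselves rounded to three decimals.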
7.6.4 The Free (Naming) Task—Qualitative Results Analysis
The participants’ output in the Free (naming) task was tagged first for valence and second for basic emotion category membership (see Chap. 6 for details of the tagging procedure). To establish the direction of errors in emotional prosody identification, I analyzed the basic emotion tags in all trials that had been tagged as errors on the basis of valence. For example, if a participant identified a stimulus conveying sadness using a vocabulary item that the corpus tagged as positive in valence, the trial was treated as an emotion identification error of interest. In all such cases I then examined the basic emotion tags to establish which emotion the target emotion had been misidentified as. This angle of analysis entailed considerable data loss. Responses using items absent from the valence corpus accounted for 1418 trials, about 15% of all trials collapsed across Proficiency groups. A further 17% of the data (1571 trials) was lost because the items were corpus-tagged as conveying “neutral” valence. Of the remaining 6429 trials (68% of all trials), 2807 trials (30% of all trials) constituted errors of interest for this qualitative, exploratory analysis.
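The trial-filtering logic described above can be sketched as a small classification function: a response is dropped if its word is missing from the valence corpus or tagged as neutral, counted as correct if its valence matches that of the stimulus, and otherwise kept as an error of interest. This is a minimal sketch under stated assumptions: the lexicon entries are hypothetical placeholders rather than values from the corpus used in the study (see Chap. 6), and the function and label names are illustrative. The final lines recompute the reported percentages from the trial counts in the text, whose sum (9418) is the implied total number of trials.

```python
from collections import Counter

# Hypothetical valence lexicon (placeholder entries, NOT the study's corpus):
# response word -> "positive" | "negative" | "neutral"
VALENCE_LEXICON = {
    "happy": "positive",
    "cheerful": "positive",
    "sad": "negative",
    "angry": "negative",
    "calm": "neutral",
}

# Valence of the emotion conveyed by each stimulus type
STIMULUS_VALENCE = {"happiness": "positive", "sadness": "negative"}


def classify_trial(stimulus_emotion: str, response: str) -> str:
    """Assign one Free (naming) trial to an outcome class."""
    valence = VALENCE_LEXICON.get(response.lower())
    if valence is None:
        return "absent from corpus"  # lost: word not in the valence corpus
    if valence == "neutral":
        return "neutral"  # lost: corpus-tagged as neutral valence
    if valence == STIMULUS_VALENCE[stimulus_emotion]:
        return "correct valence"
    return "error of interest"  # response valence opposite to the stimulus


# Toy trials: (emotion conveyed by the stimulus, word typed by the participant)
trials = [
    ("sadness", "sad"),
    ("sadness", "happy"),
    ("happiness", "angry"),
    ("happiness", "calm"),
    ("happiness", "ecstatic"),
]
print(Counter(classify_trial(stim, resp) for stim, resp in trials))

# Reported trial counts (from the text) and their share of all trials
counts = {"absent from corpus": 1418, "neutral": 1571, "remaining": 6429}
total = sum(counts.values())  # 9418 trials in all
for label, n in counts.items():
    print(f"{label:18s} {n:5d}  {n / total:.0%}")
print(f"{'errors of interest':18s} {2807:5d}  {2807 / total:.0%}")
```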
7 Emotional Prosody Processing in Nonnative English Speakers
As specified in Chap. 6, the basic emotions considered in lay terms to be broadly “negative”, that is, sadness, anger, fear, and disgust, were found to overlap semantically, so that certain vocabulary items were tagged as belonging to more than one basic emotion category. The same was not true of happiness, the only basic emotion considered broadly “positive” in lay terms. Certain aspects of this analysis are therefore based on “negative” emotion classifications that include categorical overlap due to the multiple basic emotion membership of some vocabulary items. The instructions in this experimental task allowed the participants to reuse words and to use both adjectives and nouns. I will therefore consider both the total number of trials in which a given vocabulary item was used and the total number of distinct items per basic emotion category, since items could be used multiple times.
7.6.4.1 The Free (Naming) Task—Analysis of Correct Responses
In this experimental task, recognition accuracy was determined in the first instance by valence, which is a broader and more encompassing dimension than discrete emotion categories. There were therefore a number of cases where responses to stimuli conveying certain emotions were tagged as correct on the FreeValAr level but as incorrect on the FreeCat level. Yet whether analyzed as correct or incorrect, the basic emotion tags were distributed in very similar patterns with respect to the types of responses given to expressions of happiness versus sadness in male versus female voices. As discussed in Chap. 6, in lexicographic terms all conventionally “negative” basic emotion terms overlap semantically. However, the data from this experiment showed that some negative emotions overlap with one another more frequently than others do.
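Since a vocabulary item may carry a combined tag (for example SF for sadness and fear) and may be reused across trials, each category needs two counts: the number of trials and the number of distinct items. A minimal sketch of that bookkeeping in Python, treating each combined tag as a category of its own, as the counts in this section do; the input rows are the five SF rows of Table 7.6 (a dash is stored as 0), while the function name and data layout are illustrative assumptions.

```python
from collections import defaultdict

# (item, female count, male count, FreeCat tag) for the rows of Table 7.6
# tagged SF (sadness + fear); a dash in the table is stored as 0.
ROWS = [
    ("Sad", 552, 274, "SF"),
    ("Worried", 36, 19, "SF"),
    ("Unhappy", 23, 9, "SF"),
    ("Regretful", 6, 7, "SF"),
    ("Distressed", 1, 0, "SF"),
]


def tally_by_tag(rows: list[tuple[str, int, int, str]], speaker: str) -> dict[str, tuple[int, int]]:
    """Return {tag: (trials, distinct items)} for one speaker gender.

    Items with a zero count for that speaker are not counted as used.
    """
    col = 1 if speaker == "female" else 2
    trials: dict[str, int] = defaultdict(int)
    items: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        count, tag = row[col], row[3]
        if count > 0:
            trials[tag] += count
            items[tag].add(row[0])
    return {tag: (trials[tag], len(items[tag])) for tag in trials}


print(tally_by_tag(ROWS, "female"))  # {'SF': (618, 5)}
print(tally_by_tag(ROWS, "male"))    # {'SF': (309, 4)}
```

The two outputs match the figures given in the text for this tag: five terms in 618 trials for female speakers and four items in 309 trials for male speakers.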
A summary of the distribution and tallies of all the items discussed in this section can be found in Table 7.6 (for correct responses to stimuli conveying sadness) and Table 7.7 (for correct responses to stimuli conveying happiness). In Tables 7.6 and 7.7, the “Items” column lists the specific vocabulary items the participants used to correctly identify the emotion in the prosodic stimuli. The “Speakers (counts)” column is divided into “Female” and “Male” and gives the total number of times a given vocabulary item was used to correctly identify emotions in female vs. male speakers. The “FreeCat Tag” column indicates the basic emotion category membership assigned to a given vocabulary item: H stands for happiness, S for sadness, A for anger, F for fear, D for disgust, and Su for surprise; combined tags such as SF or AD mark items belonging to more than one category.1 Sadness and fear, both associated with withdrawal behaviors (flight rather than fight) and both relatively low in arousal, appear to overlap to a considerable extent, and both are more frequently attributed to female speakers expressing sadness. The participants used 22 different terms for pure sadness in
1 This analysis only includes the relevant sadness/happiness data. To obtain full tallies of the response items tagged as other emotions, please contact the author ([email protected]).
Table 7.6 The tally of vocabulary items used to correctly identify sadness in Free (naming) task
Items            Speakers (counts)  FreeCat Tag
                 Female   Male
Sad              552      274       SF
Worried          36       19        SF
Unhappy          23       9         SF
Regretful        6        7         SF
Distressed       1        –         SF
Depressed        61       48        S
Disappointed     46       48        S
Sadness          37       10        S
Bad              9        19        S
Miserable        14       –         S
Disappointment   7        5         S
Melancholy       5        2         S
Sorrow           3        4         S
Regret           2        4         S
Hopeless         4        2         S
Desperate        3        2         S
Despair          4        1         S
Depression       4        1         S
Boredom          1        3         S
Gloomy           1        2         S
Mourning         1        2         S
Sullen           –        2         S
Despondent       1        1         S
Tearful          2        –         S
Discouraged      1        1         S
Grievous         –        2         S
Sorrowful        1        –         S
Heartbroken      1        –         S
Shameful         1        –         S
Upset            14       9         SFD
Nervous          17       17        F
Scared           17       9         F
Anxious          18       8         F
Afraid           11       4         F
Anxiety          5        1         F
Frightened       2        4         F
Fear             4        1         F
Scary            –        5         F
Unwilling        1        3         F
Worry            3        1         F
Apprehensive     2        1         F
Uncertainty      –        2         F
Doubt            2        –         F
Fearful          1        1         F
Terrified        2        –         F
Cowardly         1        –         F
Timid            –        1         F
Angry            25       162       AD
Annoyed          6        12        AD
Mad              3        13        AD
Anger            2        7         AD
Furious          1        2         AD
Annoyance        –        2         AD
Displeased       1        –         AD
Bitter           1        3         A
Irritation       1        2         A
Resentful        –        2         A
Rage             –        2         A
Hostile          –        1         A
Spiteful         1        –         A
Argumentative    –        1         A
Disgusted        1        2         D
Discontent       2        –         D
209 trials to accurately identify stimuli conveying sadness. In addition, they used 14 different terms for fear in 86 trials, five different terms tagged as both sadness and fear in 618 trials, and one term tagged as a mixture of sadness, fear, and disgust in 14 trials. This totaled 927 trials using 42 different terms for the low-arousal, withdrawal-type emotions of sadness and fear to describe female speakers expressing sadness, all of which can reliably be classified as correct identifications on both the FreeValAr and FreeCat levels of analysis. The same could not be said of the 44 trials using a total of 11 vocabulary items tagged as anger mixed with disgust (38 trials, six items), pure anger (three trials, three items), or pure disgust (three trials, two items). Conversely, anger and disgust, both associated with approach behaviors (fight rather than flight) and both relatively high in arousal, overlap to a considerable extent, and both are more frequently attributed to male speakers expressing sadness. This pattern held in a total of 211 trials in which 13 different terms for anger and/or disgust were used, including 198 trials using six items tagged as both anger and disgust, 11 trials using six items tagged as pure anger, and two trials using one item tagged as pure disgust. Once again, this portion of the responses that are technically correct on the FreeValAr level can be challenged
Table 7.7 The tally of vocabulary items used to correctly identify happiness in Free (naming) task
Items            Speakers (counts)  FreeCat Tag
                 Female   Male
Happy            101      104       H
Good             20       10        H
Cheerful         11       6         H
Satisfied        5        11        H
Content          11       4         H
Hopeful          8        1         H
Happiness        1        8         H
Glad             3        5         H
Optimistic       2        4         H
Joyful           –        4         H
Encouraging      1        3         H
Joy              2        1         H
Pleased          2        –         H
Helpful          1        1         H
Upbeat           2        –         H
Jolly            1        1         H
Merry            –        1         H
Carefree         –        1         H
Thankful         –        1         H
Right            1        –         H
Lucky            1        –         H
Satisfaction     1        –         H
Surprised        8        20        Su
Amazed           –        3         Su
Apologetic       6        3         F
Care             2        –         F
Passionate       1        1         A
Cool             –        1         A
on the FreeCat level. However, for male speakers as for female speakers, there was a large proportion of responses that were reliably correct on both levels of analysis. There was a total of 477 such correctly identified trials using 24 items: in 309 trials four items were tagged as both sadness and fear, in 159 trials 19 items were tagged as pure sadness, and in nine trials one item was tagged as a mixture of sadness, fear, and disgust. In the case of correct identifications of happiness, there were no marked differences between male and female voices. Participants used 18 different terms for happiness to correctly identify the emotion in 174 trials for female speakers, and 17 different terms in 166 trials for male speakers. Additionally, two items used 23 times for male speakers and one item used eight times for female speakers denoted positive-valence surprise, which could be considered marginally
correct on both the FreeValAr and FreeCat levels of analysis. There were also a few conflicting classifications. Four items were tagged as correctly positive on the FreeValAr level but as negative fear (“apologetic”, “care”) or anger (“passionate”, “cool”) on the FreeCat level; this can be explained by the fact that these words can be considered positive or negative depending on the situational context in which they are typically used.
7.6.4.2 The Free (Naming) Task—Analysis of Incorrect Responses
The 2807 trials constituting errors of interest included a total of 271 different vocabulary items. Of this total, 708 trials (98 items) were erroneous responses to stimuli conveying sadness, and 2099 trials (173 items) were erroneous responses to stimuli conveying happiness. In other words, happiness was about three times more likely to be misidentified as a negative emotion than sadness was to be misidentified as a positive emotion. In terms of basic emotions, the largest proportion of errors was tagged as belonging to the category of other (emotions), that is, falling outside the basic emotion categories. Errors of interest tagged as other (emotions) comprised 1072 trials (184 items). The misidentifications of sadness as positive emotions included 172 trials using 21 different vocabulary items tagged as belonging to the basic emotion category of happiness, 16 trials using two items tagged as surprise, seven trials using one item tagged as fear, one trial using one item tagged as anger, and one trial using one item tagged as sadness. Some items denoting negative emotions were classified as erroneous (positive-valence) identifications of sadness because they fell close to the neutral valence ratings in the corpus tagging, or because they carry a mixed semantic meaning that can become positive or negative depending on the context in which they are used.
The items were, respectively, “apologetic” for fear, “passionate” for anger, and “blue” for sadness. A summary of the distribution and tallies of all the items discussed in this section can be found in Table 7.8 (for erroneous responses to stimuli conveying sadness) and Table 7.9 (for erroneous responses to stimuli conveying happiness). The misidentifications of happiness as negative emotions presented a much less ambiguous case in terms of valence, but a much more ambiguous one in terms of category membership. In misidentifying happiness, the participants most often used items tagged as belonging to both the anger and disgust categories: eight items used in 683 trials. Furthermore, items tagged as pure anger numbered 10, used in 38 trials, and items tagged as pure disgust numbered two, used in five trials. A further 19 trials using one item (“upset”) tagged as a mixture of sadness, fear, and disgust make disgust and anger the emotions for which happiness was most frequently mistaken. The next most frequent error was the misidentification of happiness using vocabulary items tagged as a mixture of sadness and fear, with 431 trials using four items. This direction of error was further reinforced by the 206 trials (using 20 items) in which happiness was misidentified as pure sadness and the 154 trials (using 15 items) in which happiness was misidentified as pure fear. Finally, in two trials
Table 7.8 The tally of vocabulary items used erroneously to name sadness in Free (naming) task
Items            Speakers (counts)  FreeCat Tag
                 Female   Male
Happy            54       33        H
Good             20       8         H
Satisfied        5        4         H
Content          5        2         H
Joyful           4        2         H
Hopeful          3        2         H
Optimistic       1        3         H
Pleased          3        1         H
Carefree         1        3         H
Promising        1        1         H
Cheerful         2        –         H
Glad             2        –         H
Thankful         –        2         H
Blessed          2        –         H
Helpful          1        1         H
Satisfaction     –        1         H
Blissful         1        –         H
Joy              1        –         H
Happiness        –        1         H
Encouraging      –        1         H
Jolly            –        1         H
Surprised        10       5         Su
Surprise         1        –         Su
Apologetic       6        1         F
Passionate       1        –         A
Blue             –        1         S
(using one item) happiness was mistaken for surprise. All in all, happiness was most frequently misidentified as the high-arousal negative emotions implying an approach attitude, anger and disgust. The second most frequent direction of error was the misidentification of happiness as the low-arousal negative emotions implying a withdrawal attitude, sadness and fear. The distribution of error types varied markedly with the speakers’ gender for stimuli conveying happiness but not for stimuli conveying sadness. When erring in their identification of sadness, participants most often interpreted the emotion as happiness, both in female voices (106 trials with 16 items) and in male voices (66 trials with 16 items). In female voices sadness was also misidentified as surprise (11 trials, two items), fear (6 trials, 1 item), and anger (1 trial, 1 item). In male voices sadness was also misidentified as surprise (5 trials, 1 item), fear (1 trial, 1 item), and sadness (1 trial, 1 item). For the
Table 7.9 The tally of vocabulary items used erroneously to name happiness in Free (naming) task
Items            Speakers (counts)  FreeCat Tag
                 Female   Male
Sad              287      83        SF
Worried          26       10        SF
Unhappy          10       9         SF
Regretful        5        1         SF
Disappointed     55       28        S
Depressed        15       13        S
Bad              9        18        S
Disappointment   9        5         S
Sadness          11       2         S
Miserable        8        2         S
Regret           1        4         S
Hopeless         3        1         S
Boredom          –        3         S
Tearful          3        –         S
Grief            3        –         S
Despair          1        1         S
Pain             1        1         S
Desperate        1        1         S
Discouraged      2        –         S
Sorrowful        1        –         S
Lame             –        1         S
Grieving         1        –         S
Despondent       –        1         S
Sorrow           –        1         S
Upset            8        11        SFD
Nervous          31       55        F
Anxious          14       6         F
Afraid           12       2         F
Scared           9        3         F
Uncertainty      4        1         F
Fear             4        1         F
Frightened       2        1         F
Doubt            2        –         F
Unwilling        –        1         F
Tense            1        –         F
Terrified        1        –         F
Frantic          1        –         F
Worry            –        1         F
Scary            1        –         F
Panic            –        1         F
Angry            128      373       AD
Annoyed          28       46        AD
Mad              15       31        AD
Anger            8        27        AD
Furious          2        18        AD
Annoyance        2        3         AD
Outraged         –        1         AD
Livid            –        1         AD
Irritation       6        14        A
Resentful        2        3         A
Bitter           5        –         A
Cranky           2        –         A
Spiteful         –        1         A
Temper           –        1         A
Unpleasant       –        1         A
Wrath            –        1         A
Argumentative    –        1         A
Madness          –        1         A
Discontent       2        1         D
Disgusted        1        1         D
Confusion        1        1         Su
misidentifications of happiness as negative emotions, there appears to be an inverse relationship between the high- and low-arousal basic emotion categories and the speakers’ gender. Male speakers expressing happiness were more frequently misidentified as expressing the high-arousal emotions of anger and disgust, while female speakers expressing happiness were more frequently misidentified as expressing the low-arousal emotions of sadness and fear. For male speakers, happiness was most frequently misidentified using vocabulary items tagged as both anger and disgust (500 trials, 8 items), items tagged as pure anger (23 trials, 8 items), one item tagged as sadness, fear, and disgust (11 trials, 1 item), and items tagged as pure disgust (2 trials, 2 items). This totaled 19 items tagged as anger, disgust, or a mixture of those emotions, for a total of 536 errors in the identification of happiness in male voices. The low-arousal emotions of sadness and fear were attributed to male speakers much less often, with 103 trials using four items tagged as both sadness and fear, 82 trials using 15 items tagged as pure sadness, 72 trials using 10 items tagged as pure fear, and 11 trials using one item tagged as a mixture of sadness, fear, and disgust. This totaled 30 different items used in a total of 268 trials of erroneous interpretation of happiness in male voices.
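The two male totals are not disjoint: the item tagged as a mixture of sadness, fear, and disgust (“upset”, 11 trials) is counted both among the 536 high-arousal errors and among the 268 low-arousal errors. The sketch below uses the per-tag trial counts given above to make the overlap explicit; grouping SFD with both arousal classes follows the text, while the variable names are illustrative.

```python
# Trial counts for errors on male voices expressing happiness, by FreeCat tag
# (figures from the text; SFD = sadness + fear + disgust, i.e. "upset").
MALE_ERRORS = {"AD": 500, "A": 23, "D": 2, "SF": 103, "S": 82, "F": 72, "SFD": 11}

HIGH_AROUSAL = {"AD", "A", "D", "SFD"}  # anger/disgust side, as grouped in the text
LOW_AROUSAL = {"SF", "S", "F", "SFD"}   # sadness/fear side, as grouped in the text

high = sum(MALE_ERRORS[t] for t in HIGH_AROUSAL)
low = sum(MALE_ERRORS[t] for t in LOW_AROUSAL)
distinct = sum(MALE_ERRORS.values())  # each trial counted once

print(high, low, high + low, distinct)  # 536 268 804 793
```

Summing the two totals therefore gives 804, whereas the number of distinct trials behind them is 793.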
This high- to low-arousal misidentification pattern was inverted for female speakers. For them, the low-arousal emotions of sadness and fear were more numerous, with 328 trials using 4 items tagged as both sadness and fear, 124 trials using 16 items tagged as pure sadness, 82 trials using 12 items tagged as pure fear, and 8 trials using one item tagged as a mixture of sadness, fear, and disgust. This totaled 542 trials in which happiness in female voices was misidentified as a low-arousal negative emotion, using a total of 33 different vocabulary items. The high-arousal negative emotions of anger and disgust were erroneously attributed to expressions of happiness in female voices in 209 trials using 13 items: in 183 trials participants used 6 items tagged as both anger and disgust, in 15 trials four items tagged as pure anger, in three trials two items tagged as pure disgust, and in eight trials one item tagged as a mixture of sadness, fear, and disgust. In other words, the participants were much more likely to misinterpret male expressions of happiness as high-arousal negativity, fleshed out as anger or disgust, whereas female expressions of happiness were more likely to be misinterpreted as low-arousal negativity, fleshed out as sadness or fear.
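The inverse relationship can be summarized, descriptively, as the share of high-arousal (anger/disgust) errors among the negative-valence misidentifications of happiness for each speaker gender. In the sketch below, which uses the totals given in the text, the trials with the sadness, fear, and disgust item are removed from both sides so that no trial is counted twice; that choice, like the variable names, is an illustrative assumption, and the resulting percentages are not a statistical test.

```python
# Negative-valence misidentifications of happiness (totals from the text):
# gender -> (high-arousal total, low-arousal total, SFD trials counted in both)
TOTALS = {
    "male": (536, 268, 11),
    "female": (209, 542, 8),
}

for gender, (high, low, shared) in TOTALS.items():
    # Remove the SFD ("upset") trials from both sides so that every
    # remaining trial belongs to exactly one arousal class.
    high_only, low_only = high - shared, low - shared
    share_high = high_only / (high_only + low_only)
    print(f"{gender:6s} high = {high_only:3d}  low = {low_only:3d}  "
          f"high-arousal share = {share_high:.1%}")
```

With this convention, roughly two thirds of the negative-valence errors on male voices (67.1%) but only about a quarter of those on female voices (27.3%) fall on the high-arousal side.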
7.6.5 Task Difficulty Effects Analysis
A one-way repeated measures ANOVA was conducted with Task Difficulty (ValAr × Cat. × FreeValAr × FreeCat) as the within-subjects factor and Proficiency (LoPro × HiPro) as the between-subjects factor. Although the Free (naming) task was a single task from the participants’ perspective, the two levels of statistical analysis of its data were treated as two separate tasks because of the nature of the underlying multilevel emotion processing. There was a significant interaction between Task Difficulty and Proficiency, F(1, 131) = 7.82, η² = 0.056, p = 0.006. Overall, the more difficult the task and the higher the corresponding processing costs, the more emotional prosody recognition accuracy dropped in both Proficiency groups. Post hoc comparisons with Bonferroni correction revealed significant differences in accuracy scores between the Proficiency groups. There was a marginally significant difference between the Proficiency groups in the ValAr task, with the LoPro group recognizing emotional prosody more accurately (M = 0.493, SE = 0.015) than the HiPro group (M = 0.451, SE = 0.013), p = 0.050. In the Cat. task there was no significant difference between the LoPro group (M = 0.417, SE = 0.013) and the HiPro group (M = 0.412, SE = 0.013), p = 0.804. On the FreeValAr level of analysis, the LoPro group was significantly better (M = 0.412, SE = 0.014) than the HiPro group (M = 0.356, SE = 0.013) at recognizing emotional prosody, p = 0.004. Finally, on the FreeCat level of analysis, the LoPro group was also significantly more accurate (M = 0.204, SE = 0.012) than the HiPro group (M = 0.157, SE = 0.012), p = 0.005. Within each Proficiency group, all differences in accuracy between the experimental tasks were significant at p