Modeling emotions using a shallow natural-language processing technique Diploma thesis Alexander Osherenko
Humboldt-Universität zu Berlin Mathematisch-Naturwissenschaftliche Fakultät Institut für Informatik
Examiners: Prof. Dr. Hans-Dieter Burkhard, Dipl.-Inf. Mirjam Minor
Berlin, 31 October 2004
Acknowledgements I would like to thank my examiners Prof. Dr. Hans-Dieter Burkhard and Dipl.-Inf. Mirjam Minor for their helpful and valuable comments. Their critical contributions added to the quality of this diploma thesis. I especially thank Mirjam Minor, who first brought up the idea of using movie reviews as the basis for the approach evaluation. Furthermore, I express my gratitude to her for pointing out the term brain plasticity, which makes this thesis especially interesting not only for computer scientists, but also for progressive neurologists. I give my special thanks to Lori Grosland for polishing my English in this thesis. I will “pay” her back when she writes her diploma thesis in Russian.
Contents
1. Introduction ..................................................... 1
 1.1. Definitions ................................................... 2
 1.2. Motivation from the shift negotiation scenario ................ 3
 1.3. The problem description ....................................... 4
 1.4. Application examples .......................................... 5
 1.5. Properties of modeling scenarios using emotions ............... 6
 1.6. The structure of this thesis .................................. 7
2. Preliminaries .................................................... 8
 2.1. Sociolinguistic preliminaries ................................. 9
 2.2. Psychological preliminaries .................................. 10
 2.3. Linguistic preliminaries ..................................... 11
  2.3.1. Finding emotions in a text ................................ 11
  2.3.2. The numerical appraisal of the emotional content .......... 14
 2.4. Computer science preliminaries ............................... 16
3. The proposed approach ........................................... 21
 3.1. Integrating preliminaries .................................... 21
 3.2. Limitations of the introduced approach ....................... 22
 3.3. Working with the system ...................................... 22
  3.3.1. Identifying emotional clusters ............................ 22
  3.3.2. Setting the context of the discussion ..................... 23
  3.3.3. Adjusting the weights in the HMM .......................... 25
  3.3.4. The working procedure ..................................... 29
4. The implemented system .......................................... 36
 4.1. The system architecture ...................................... 36
 4.2. The system database .......................................... 38
 4.3. The dialogue course .......................................... 41
 4.4. The used software ............................................ 45
5. The approach evaluation ......................................... 46
 5.1. Evaluation on the base of the shift negotiation scenario ..... 46
 5.2. Evaluation on the base of the movie review scenario .......... 48
6. Conclusions ..................................................... 59
 6.1. Summary ...................................................... 59
 6.2. Related work ................................................. 59
 6.3. Outlook ...................................................... 60
 6.4. Final remarks ................................................ 64
Selbständigkeitserklärung .......................................... 65
Einverständniserklärung ............................................ 65
References ......................................................... 66
Glossary ........................................................... 71
List of figures
Fig. 1. The shift negotiation scenario .............................. 4
Fig. 2. Different views on emotions ................................. 8
Fig. 3. Linguistic factors influencing expression of emotions ...... 11
Fig. 4. The WOZ architecture ....................................... 18
Fig. 5. A hidden Markov model representing emotions ................ 19
Fig. 6. The HMM extension by one node .............................. 23
Fig. 7. A sample utterance in the conversational context ........... 24
Fig. 8. The initial HMM graph for personality profiling ............ 30
Fig. 9. Defining the context of discussion ......................... 32
Fig. 10. Defining options for changing the HMM ..................... 33
Fig. 11. Carrying on a dialogue .................................... 33
Fig. 12. Emotions’ distribution diagram ............................ 34
Fig. 13. Meaning of the flags in the emotions distribution diagram . 35
Fig. 14. Sketch of the system architecture ......................... 36
Fig. 15. Class diagram for emotional appraisal ..................... 37
Fig. 16. Database scheme ........................................... 38
Fig. 17. Entering empirical data for emotional words ............... 39
Fig. 18. Entering data for the context ............................. 40
Fig. 19. Entering data for the discourse particles ................. 41
Fig. 20. Initial HMM for review appraisal (Step 1) ................. 42
Fig. 21. Step 2 of the weight adjustment ........................... 43
Fig. 22. Step 3 of the weight adjustment ........................... 44
Fig. 23. The final HMM with adjusted weights (Step 4) .............. 45
Fig. 24. Determining the estimation range .......................... 54
List of tables
Table 1. Emotions’ clusters from the point of view of linguistics .. 13
Table 2. Transforming the context variables in the numerical form .. 24
Table 3. Empirical data for some example words ..................... 27
Table 4. Reviews for entering data ................................. 53
Table 5. Computer mean values for different human estimations ...... 54
Table 6. Reviews for testing the proposed approach ................. 57
Modeling emotions using a shallow natural-language processing technique "… You have but to catch the thief." "And what if we do catch him?" "Then he gets what he deserves." "You are certainly logical. But what of his conscience?" "Why do you care about that?" "Simply from humanity." "If he has a conscience he will suffer for his mistake. That will be his punishment – as well as the prison." "But the real geniuses," asked Razumihin frowning, "those who have the right to murder? Oughtn't they to suffer at all even for the blood they've shed?" "Why the word /ought/? It's not a matter of permission or prohibition. He will suffer if he is sorry for his victim. Pain and suffering are always inevitable for a large intelligence and a deep heart. The really great men must, I think, have great sadness on earth," he added dreamily, not in the tone of the conversation.
[Dost03]
1. Introduction Each man is different, and so is human behavior. Although there is a huge number of theories trying to explain it, currently there is no theory that unambiguously clarifies human actions. Computer science as an exact discipline doesn’t tolerate inaccuracy. It tries to find a model for everything, even for human behavior. Computer scientists are not satisfied with the vague answers from philosophers and psychologists and try to find a mathematical representation for such ephemeral concepts as human nature, human behavior, and human emotions. The requirement “to model” life, i.e. human behavior, brings a new trend into the development of computer systems. Software systems should perform more complex tasks than they did before and provide more intelligent support to the user, utilizing theories from the humanities that explain human behavior. Such a requirement raises some questions. How does the human personality function? What are the main dependencies between particular components of intelligent behavior? What aspects of human acts
should be brought to the forefront when analyzing a life situation? How should one model a real-life situation taking the above questions into consideration? From this huge amount of questions, this work deals with possibly the most basic question for computer science – how should a developer of a computer system model a situation applicable to the field of human computer interaction? It may seem unexpected that this work describes many “non-computer” theories. The reason is that I introduce an interdisciplinary approach that goes beyond pure technical issues and needs corresponding handling. Although computer science, as the science providing instruments for the numerical formalization of user actions, can be considered the most important for this diploma thesis, I also show other theories, e.g. from linguistics, that accompany the proposed approach. I use as input the natural-language utterances of the persons taking part in the interaction, as this way seems to me to be the simplest and the most general. 1.1. Definitions Before I describe the problem and approaches to its solution in more detail, I provide some definitions necessary to understand substantial issues of this thesis.
• Personality. The personality describes the emotional characteristics of a person taking part in a discussion through a description of transitions from one affective state to another. For example, it defines the probability that a person being furious will calm down during the conversation. An approach for mathematically modeling such transitions using hidden Markov models (HMMs) is provided in section 2.4.
• Discussion. A conversation in a group of people that are communicating with each other.
• Negotiation. “The interaction type… which… is initiated from a conflict of interests. …negotiation is motivated by a need to make a deal while selfishly maximizing personal goals” ([Faratin00]). Hence, the negotiation can be seen as a special case of discussion.
• Discussion scenario, or simply scenario. A discussion scenario provides a description of the situation in which a conversation takes place. It also specifies the environment and the persons participating in it.
• Utterance. An utterance is defined in this thesis as a phrase that either arises in a dialogue between two humans or is simply a text whose emotional appraisal is made.
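To make the Personality definition concrete, the transitions between affective states can be sketched as a small transition table; the states and probabilities below are illustrative assumptions of mine, not values used in the thesis:

```python
# Sketch: a personality as probabilities of transitions between
# affective states. States and probabilities are illustrative assumptions.
transition = {
    "calm":    {"calm": 0.9, "furious": 0.1},
    "furious": {"calm": 0.6, "furious": 0.4},  # this person calms down easily
}

def next_state_distribution(current):
    """Return the probability distribution over the next affective state."""
    return transition[current]

# The probability that a furious person calms down in the next turn:
print(next_state_distribution("furious")["calm"])  # 0.6
```

A different personality would simply be a different table, e.g. one where the probability of staying furious dominates.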
1.2. Motivation from the shift negotiation scenario The shift negotiation scenario is a representative of a large group of scenarios involving human computer interaction. In this section I introduce this scenario in order to give an idea of what I mean by discussion scenarios. The shift negotiation scenario describes negotiations between humans on a shift plan in a hospital environment. A shift plan consists of particular shifts that can be exchanged (negotiated) between employees of a hospital. The necessity of negotiating a particular shift arises from the requirement of a hospital employee to exchange his shift in order to have free time on a certain day for whatever reason. What possibilities basically exist for changing the shift plan? One possible variant is the generation of a new shift plan that meets the requirements presented by this hospital employee. This alternative seems obvious, but isn’t always successful. The shift plan is very hard to change because of problems such as difficulties in satisfying constraints that result from administrative issues. That’s why the employee has to find another employee that can work his shift and negotiate with this employee, i.e. converse with him and discuss the shift plan in order to exchange the corresponding shift (Fig. 1).
Fig. 1. The shift negotiation scenario
Note that the conversation takes place in a hospital, i.e. an organization defining some significant properties of the conversational context, e.g. its influence on hospital employees (s. section 2.1). 1.3. The problem description This thesis studies human computer interaction (HCI) and its scenarios. These scenarios are very different. One common feature for all of them is the participation of humans with their personalities. If it were possible to find a common part of human personality and model it, these notions could simplify the analysis of scenarios in HCI and ease the development of computer systems based on it. Doubtless there are many things that can be used in order to find commonalities between humans. In this thesis I choose as a common part of humans their emotions. Hence, the main aim of this thesis is the elaboration of a general model of human emotional behavior suitable for the description of different scenarios in HCI. The demanded model should reflect the peculiarities of the human conscience such as its transience, suddenness and inaccuracy. It should adjust in the course of time in order to integrate possible variances. 1.4. Application examples The model to develop can be used in applications involving HCI as the basis for understanding a user’s acts and reactions. For example, consider a high-strung person wishing to draw cash. This person would “communicate” with the ATM (Automatic Teller Machine) by such utterances as “I want my damned money”. The ATM can analyze this utterance, construct hypotheses on the person’s personality and act accordingly. Although the described situation is most likely far-fetched, the model is helpful to understand user actions and react to them. The model to develop can be adopted e.g. in the intelligent house, where the house environment reacts interactively to the user’s actions ([Lesser et al. 99]). Acts of the computer system in such a house can be influenced by the psychological profile of the person living in it. The analysis of emotions is the field of study in the area of affective dialogue ([Vicente et al. 00]). Besides other issues this approach also studies models of emotional behavior that underlie the corresponding computer systems. Possible models of reacting to human emotional states are described in [MIT03].
The applications include:
• AffQuake – Quake changes the size of the player’s avatar in relation to the user’s response, representing player excitement by the average skin conductivity level and growing the avatar’s size when this level is high;
• Affective Jewelry and Accessories – Wearable jewelry and other clothing designs with embedded sensors for sensing physiological changes associated with emotions;
• Affective Tangibles – A squeeze mouse, affective pinwheels that are mapped to skin conductivity, and a voodoo doll that can be shaken to express frustration;
• Affective Touchables – Physical objects that sense affective parameters through being held or touched, and communicate the emotions abstractly through sight, sound, or haptic changes;
• Affective Tutor – the Learning Companion – An agent that senses affective states like boredom, anxiety, and engagement, and adjusts its response to the user in accord with the user’s state;
• The Galvactivator – A wearable device which maps the skin’s conductivity to a glowing red LED; through it the device learns about and communicates the body’s response;
• Interface Tailor – An agent that attempts to adapt the system in response to affective feedback;
• Learning and pattern recognition – Computers that learn patterns of behavior (physiological and otherwise) depending on the user (personality, goals, preferences, etc.) and on his or her situation (work, commute, too much caffeine, etc.);
• Mood Interfaces – This project explores graphical interfaces in which physiological signals drive the visual display, allowing users or their conversational partners to engage in a computer-mediated dialogue of emotional expression, viewing a graphical representation of their current physical/emotional state;
• Robotic Computer – A personal computer with an active, robotic monitor, which expresses and responds to social-emotional cues.
1.5. Properties of modeling scenarios using emotions [MerriamWebster03] provides the conventional definition of the term “emotion” as the “affective aspect of consciousness, or a state of feeling, a psychic or physical reaction (as anger or fear) subjectively experienced as strong feeling and physiologically involving changes that prepare the body for immediate vigorous action.”
This definition implies a wide examination of different aspects of emotions. However, in this diploma thesis I choose only the user’s utterances in the discussion as a manifestation of emotions. Here I summarize the advantages of using emotions in systems with HCI. These include:
• A general approach for modeling scenarios in HCI. Emotions provide a general instrument for HCI scenarios, defining a standard in modeling this interaction and thereby allowing for the unification and the simplification of implementation issues. Since HCI scenarios are very different and manifold, there is a need for a general method that can be applied for the description of most scenarios. As emotions are a constituent part of humans, this property can be used to describe such scenarios;
• The improvement of user modeling quality and thus the quality of HCI. Emotions allow for a more exact understanding of HCI through better modeling of the persons taking part in discussions; this includes understanding the reactions and behavior of interaction participants and also the observation of discussions with respect to investigating dependencies between different emotions;
• Modeling the human psyche. Affective modeling provides a possibility to model the user’s psyche. For example, in [Urbig et al. 03] the system chooses possible acts using empirical observations on psychological groups of users and not on a numerical basis. Affective modeling can be used to improve the quality of this modeling.
The emotional approach has the following disadvantages:
• Unexplainable behavior. Difficulties regarding the nature of emotions, which sometimes seem to be unexplainable;
• Individual nature. Emotions are the property of individuals and not that of groups of people. That’s why it is impossible to analyze HCI scenarios involving groups of people, although it is possible to develop some formalism to calculate the “emotions” of groups.
1.6. The structure of this thesis In the following chapters, I introduce an approach for the analysis of emotions. Chapter 2 describes preliminaries from different sciences for a better understanding of the problems concerning human emotions. Chapter 3 illustrates the proposed approach that results from the preliminaries in Chapter 2, and Chapter 4 the implemented computer system. Chapter 5 gives the approach evaluation.
Chapter 6 provides a summary, the related work, an outlook and final remarks.
2. Preliminaries As already mentioned, the emotional approach provides a general instrument for analyzing scenarios in human machine interaction. That’s why in this section I introduce some theories that help to find approaches for the numerical evaluation of emotions. The theories are described in considerable detail in order to give a general idea of existing trends in different sciences, as it is almost impossible to make an optimal choice for HCI regarding the required theories and even the necessary sciences. Different sciences present their approaches to modeling and estimating emotions (Fig. 2).
Fig. 2. Different views on emotions (a hierarchy of disciplines: physics, chemistry, biology, …, psychology, linguistics etc.)
Fig. 2 shows the hierarchy of disciplines as in [Weigand03]. [Salter95] defines different emotional aspects of the communication between humans – the command. Thereby he differentiates commands as defined in philosophy, economics, sociology, anthropology, sociolinguistics and psychology. [Wright95] examines the human mind from the viewpoint of different sciences – anthropology, psychiatry, sociology, political science – concentrating on evolutionary psychology. A thorough discussion of the aspects introduced by all possible sciences would go beyond the scope of this thesis. That’s why I provide
only the sociolinguistic, psychological, linguistic, and computer science preliminaries that are important for analyzing emotions. 2.1. Sociolinguistic preliminaries Since a discussion between different individuals is by nature more complex than a simple observation of these persons without the discussion context, a general approach should consider the context of the conversation. In this section I introduce sociolinguistic preliminaries in order to understand the requirements of emotions regarding the context of discussion. The context for social systems is described by environmental settings characterizing the discussion. Formally, the utterance context is defined by observations on the communicative situation described by the following context variables ([Belikov et al. 01]):
• Sender of the utterance;
• Receiver of the utterance;
• Relationship between the utterance’s sender and receiver (good – bad – satisfactory);
• Communicative tone (official – neutral – friendly);
• Goal of communication;
• Communicative means (language or its subsystem – dialect, style, as well as gesticulation, mimics);
• Way of communication (oral/written, contact/distant);
• Place of communication.
[Martin80], following Hymes, provides another definition of the discussion context. Accordingly, it can be described by 8 sets of variables:
• Participants (speaker, sender, addresser/hearer, receivers, audience, addressee);
• Instrumentalities (channels: spoken, written; forms of speech: language, dialect);
• Act sequence (form of the message, content of the message);
• Setting and scene (setting: time and location; scene: e.g. relatively normal);
• Key (the way of conveying the message: exact);
• Ends (outcomes, goals);
• Norms (the rules the interaction between participants has to follow);
• Genre (message style: poetic, ironic).
The context in this thesis is determined using empirical observations on the emotional properties of an environment and a scenario. For example, an environment such as a hospital implies a nervous, excited atmosphere. If two hospital employees are speaking with each other, this state of affairs will influence the conversation and its form. From the set of possible sociolinguistic variables I choose for the further analysis three that can be analyzed and specified relatively simply – the relationship between the utterance’s sender and receiver, the communicative tone, and the genre. Other variables can be
supplied in the next version of the system. The approach to the numerical context definition is shown in section 3.3.2. 2.2. Psychological preliminaries Psychology, as a science studying human emotions, provides a lot of information concerning emotions ([Lazarus91]). [Dörner03] defines emotion as a reaction of an agent, which may be a man, an animal or an artificial system, to two aspects of its relation to reality, namely to the entropy or uncertainty of the environment and to competence.
As emotions are by their nature extremely diverse and fluent in their life cycle, depending on various causes, it is advisable to use clusters (groups) of emotions in order to reduce the domain complexity. [Ortony et al. 88] introduces a possible emotional taxonomy describing emotions classified by their causes. Different emotions, e.g. happy-for, are grouped (clustered) together according to various reactions to diverse events. [SchmidtAtzert81] presents another approach for clustering emotions: Joy (Freude), Lust, Affection (Zuneigung), Compassion (Mitgefühl), Aspiration (Sehnsucht), Disquiet (Unruhe), Antipathy (Abneigung), Aggression (Aggressionslust), Sadness (Traurigkeit), Abashment (Verlegenheit), Envy (Neid), Fear (Angst).
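As a minimal sketch, such a clustering can be realized as a word-to-cluster lexicon; the cluster names follow [SchmidtAtzert81], while the English word lists are my own illustrative assumptions, not the empirical data of the thesis:

```python
# Sketch of a word-to-cluster lexicon; the word lists are illustrative
# assumptions, not the empirical data used in the implemented system.
CLUSTERS = {
    "Joy": ["glad", "happy", "delighted"],
    "Fear": ["afraid", "scared", "anxious"],
    "Sadness": ["sad", "mournful", "gloomy"],
    "Aggression": ["furious", "angry"],
}

# Invert the table for fast lookup of a single word.
WORD_TO_CLUSTER = {w: c for c, words in CLUSTERS.items() for w in words}

def cluster_of(word):
    """Return the emotional cluster of a word, or None for a neutral word."""
    return WORD_TO_CLUSTER.get(word.lower())

print(cluster_of("Furious"))  # Aggression
print(cluster_of("table"))    # None
```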
One more approach to clustering is built from 135 basic emotions ([Lazarus91]). These emotions can belong to six groups – Love, Joy, Surprise, Anger, Sadness, and Fear. I described the above emotional taxonomies in order to stress the huge number of emotions and also to give a flavor of grouping these emotions into clusters. Since each particular application domain would emphasize different emotions that play an important role in the system, it is up to the system administrator which emotional clusters he or she uses and which clusters better describe the main issue of the application. For example, a human resources director could be interested in whether a potential employee can keep calm in an unusual situation. Emotional clusters that are important in this scenario are Disquiet and Joy (s. also section 3.3). 2.3. Linguistic preliminaries Linguistics is chosen as the basis for the emotional analysis. That’s why I will study linguistic preliminaries more exhaustively. 2.3.1. Finding emotions in a text In order to find the meaning of a text, one analyzes not the whole text, but its constituent parts. One such part is the emotional component of the text. Fig. 3 shows factors that influence verbal expression, e.g. the verbal expression of emotions ([Fries96]).
Fig. 3. Linguistic factors influencing expression of emotions (CS, SF, SYN, PF, AF, grammar, mimic, gesture, other knowledge)
The factors in Fig. 3 (cf. [Jahr00]):
• The conceptual structure (CS) describes different semantic aspects of the meaning of the utterance:
o The emotional aspect;
o The thematic organization of an utterance;
o The elocution (the relation between the speaker and the world – in other words, the context of the utterance);
o The representational contents.
Although the conceptual structure deserves a more thorough discussion, it can be disregarded at the moment due to its complexity.
• The semantic form (SF) provides different approaches for analyzing the meaning of utterances regarding the usage of their lexical and grammatical structure. For example, interjections by their nature are an expression of emotions. Sometimes specific suffixes express affect; in German, e.g., -ling, -chen, and -ler can be interpreted as manifestations of emotion.
• The syntactic form (SYN) can define the emotional contents of an utterance, e.g. in German:
o Dass du immer soviel trinken musst! (You always drink too much! – connecting word is missing);
o Feuer! (Fire! – missing article and governing category);
o Unvorstellbar! (Unimaginable! – missing governing category);
o Raus! (Get out! – prepositional phrase without governing category);
• Articulation (AF) and phonological form (PF).
Following [Mees91], [Jahr00] introduces a hierarchical scheme of emotions (Table 1):
Main class | Rating element | Emotional group | Emotional type
Event-based emotions | Happiness/unhappiness | Empathy | Mitfreude (gladness), Mitleid (pity), Schadenfreude (gloating joy), Neid (envy)
Event-based emotions | Happiness/unhappiness | Expectational emotions | Hoffnung (hope), Furcht (fear), Befriedigung (satisfaction), Erleichterung (relief)
Event-based emotions | Happiness/unhappiness | Welfare emotions | Freude (pleasure), Leid (sorrow)
Attribution emotions | Approval/Disapproval | Internal Attribution | Scham (shame), Stolz (pride)
Attribution emotions | Approval/Disapproval | External Attribution | Billigung (approval), Zorn (fury)
Relational emotions | Estimation/disrespect | Estimation emotion | Bewunderung (admiration), Verachtung (abhorrence)
Relational emotions | Like/don’t like | Attractiveness emotion | Liebe (love), Hass (hatred)
Connection emotions | Satisfaction and Approval | Self-praise and joy | Selbstzufriedenheit (self-satisfaction)
Connection emotions | Satisfaction and Approval | Approval and joy | Dankbarkeit (thankfulness)
Connection emotions | Dissatisfaction and Disapproval | Self-reproach and sorrow | Selbstunzufriedenheit (self-dissatisfaction)
Connection emotions | Dissatisfaction and Disapproval | Reproach and sorrow | Ärger (anger)
Table 1. Emotions’ clusters from the point of view of linguistics
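The hierarchy of Table 1 can be represented as a nested data structure, sketched here in Python with the class, group and type names taken from the table (the rating elements are omitted for brevity):

```python
# The hierarchical emotion scheme of Table 1 as nested dictionaries:
# main class -> emotional group -> emotional types (German type names
# as in the table).
EMOTION_HIERARCHY = {
    "Event-based emotions": {
        "Empathy": ["Mitfreude", "Mitleid", "Schadenfreude", "Neid"],
        "Expectational emotions": ["Hoffnung", "Furcht", "Befriedigung", "Erleichterung"],
        "Welfare emotions": ["Freude", "Leid"],
    },
    "Attribution emotions": {
        "Internal Attribution": ["Scham", "Stolz"],
        "External Attribution": ["Billigung", "Zorn"],
    },
    "Relational emotions": {
        "Estimation emotion": ["Bewunderung", "Verachtung"],
        "Attractiveness emotion": ["Liebe", "Hass"],
    },
    "Connection emotions": {
        "Self-praise and joy": ["Selbstzufriedenheit"],
        "Approval and joy": ["Dankbarkeit"],
        "Self-reproach and sorrow": ["Selbstunzufriedenheit"],
        "Reproach and sorrow": ["Ärger"],
    },
}

def find_group(emotion):
    """Return (main class, emotional group) of an emotion type, or None."""
    for main_class, groups in EMOTION_HIERARCHY.items():
        for group, types in groups.items():
            if emotion in types:
                return main_class, group
    return None
```

Such a structure allows a system to map a recognized emotion type back to its cluster, e.g. `find_group("Zorn")` yields the attribution emotions.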
[Leech et al. 94] introduces one more linguistic approach for analyzing the emotional background of utterances. He distinguishes emotive emphases in speech, e.g. interjections and exclamations.
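A crude detector in the spirit of this observation could simply count interjections and exclamations in an utterance; the interjection list below is an illustrative assumption of mine, not taken from [Leech et al. 94]:

```python
import re

# Illustrative list of English interjections; an assumption for this sketch.
INTERJECTIONS = {"oh", "ah", "wow", "alas", "ouch", "hey"}

def emotive_emphases(utterance):
    """Count interjections and exclamation marks as emotive emphases."""
    tokens = re.findall(r"[a-zA-Z']+", utterance.lower())
    interjections = sum(1 for t in tokens if t in INTERJECTIONS)
    exclamations = utterance.count("!")
    return interjections + exclamations

print(emotive_emphases("Oh, what a day! Wow!"))  # 4
```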
2.3.2. The numerical appraisal of the emotional content Having said some introductory words, I describe the numerical appraisal of emotions. The numerical appraisal is based upon investigations in [Jahr00]. The formula to study emotions in a text (in [Jahr00]), rewritten for this thesis, is:

El = B * (|SM| + Σ FEx) * FI / |W|

where
• El is the estimation of the emotional intensity of an emotion. The value range for this variable is discussed along with the interpretation approach in section 3.3;
• B is the concernment factor, i.e. the personal interest in the dialogue topic; it is an I-variable with the value 1 for personal interest and 0 for no personal interest. The I-variables following [Lazarus91] are:
o Psychological nearness to the themes in the text (I-variable 1);
o Importance of the contents for the society (I-variable 2);
o Importance for the person (I-variable 3);
o Expectancy or non-expectancy (I-variable 4);
o Need for dispraise or deserving (I-variable 5);
o Rate of social acceptance (I-variable 6);
o Rate of own sureness (I-variable 7);
o Increased concernment (I-variable 8).
Without loss of generality, I can assume the value 1 for B, meaning the highest concernment factor;
• SM is the set of linguistic features ([Williams03]), e.g. stylistic figures (metaphors, hyperboles etc.), upon which the estimation is performed. For example, [Scheler et al. 97] introduces “discourse particles“ (in German: aber, auch, bloß/nur, denn, doch, eben, eigentlich, einfach, etwa, erst, halt, ja, mal, nur, ruhig, schon, vielleicht, wohl; in English: well, yes, oh, ah, okay, uh, um);
• FEx are the empirical weights of the linguistic features;
• FI is the cumulative value of the weighted I-variables’ values, where the weight of the dominant I-variable is an empirical weight;
• W is the set of words in the estimated utterance.
Note that the emotional intensity of an utterance is linearly dependent on the number of linguistic figures. An emotional word is defined as a word that can be ascribed a special emotional meaning in an utterance. In order to show the usage of the formula described above, I bring an example from a juristic magazine. Although the text is provided in German, it gives an idea of analyzing emotions in a text. Kaiser: Strafen statt Erziehen? (Zeitschrift für Rechtspolitik 30 (1997), 451-458.) Im Rückblick von einhundert Jahren fällt es daher die Bilanz im sanktionsrechtlichen Umgang mit jugendlichen Straftätern nicht uneingeschränkt positiv oder ermutigend aus. Offenbar waren die Erwartungen der Reformväter zu hoch gesteckt und es ist Bescheidenheit angezeigt. Denn Jugendstrafrecht und Jugendgerichtsbarkeit weisen gravierende Gebrechen auf. Trotz des skizzierten Mängelprofils empfiehlt es sich, ebenso am Erziehungsgedanken wie an Verfahrensgrundsätzen und Interventionssystem des Jugendstrafrechts festzuhalten. Freilich läßt es sich nicht leugnen, daß es sich hierbei nicht nur um unterschiedliche und zum Teil gegenläufige Ideen und Strategien handelt, sondern daß auch unterschiedliche Handlungsstile durch Konflikte zwischen miteinander rivalisierenden Professionen fest angelegt sind. Sonst stehen den mehr bewahrenden Jugendpädagogen im wachsenden Umfang experimentierfreudige Sozialpädagogen und sonstige Sozialwissenschaftler gegenüber, die Ihnen das Feld im Umgang mit jungen Straffälligen streitig machen. Die unterschiedliche Beteiligung der verschiedenen Berufsgruppen an Fachveranstaltungen, etwa der Deutschen Vereinigung für Jugendgerichte und Jugendgerichtshilfen, lässt dies darüber hinaus die „vested interests“ erkennen.
Die Verrechtlichung und der Informalismus haben aber in gleicher Weise international für einen Reformschub im Jugendrecht der Gegenwart gesorgt.
Emotional figures (empirical weights FEx are provided in brackets): nicht uneingeschränkt positiv (+0.5) - ermutigend (0)– offenbar (-0.5) – Reformväter (0) - zu hoch gesteckt (0) - Bescheidenheit angezeigt (0) - gravierende Gebrechen (+1) - Mängelprofil (0) -
freilich (-0.5) - läßt es sich nicht leugnen (0) - nicht nur… sondern auch (-0.5) - Konflikte… fest angelegt (+0.5) – bewahrende Jugendpädagogen (0) – wachsender Umfang (0) – experimentierfreudige Sozialpädagogen (0) – rivalisierende Pädagogen (0) – Feld streitig machen (0) – „vested interests“ (0) – Reformschub (0).
Empirically revealed dominant emotion: Unmut (resentment)
|SM|: 19
FEx: +0.5
B: 1
|W|: 166
Intensity value: El = B · (|SM| + FEx) / |W| = 1 · (19 + 0.5) / 166 = 0.117
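As a cross-check, the calculation above can be reproduced in a few lines (the function name is mine, not part of the thesis):

```python
# Sketch of the intensity formula El = B * (|SM| + FEx) / |W|,
# using the values from the example above.
def emotional_intensity(b: float, num_figures: int, fex: float, num_words: int) -> float:
    """b - empirical base factor B, num_figures - |SM|, fex - cumulative
    weight of the dominant I-variables, num_words - |W|."""
    return b * (num_figures + fex) / num_words

el = emotional_intensity(b=1.0, num_figures=19, fex=0.5, num_words=166)
print(round(el, 3))  # 0.117
```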
Although some values in the calculation are provided on an empirical basis and are therefore subject to error, the above formula can generally be used to estimate emotions. Unfortunately, the calculated value 0.117 lacks an interpretation. It is therefore necessary to develop a method that gives this value a distinct meaning. In section 3.3.3, I introduce such an interpretation approach along with a formula that is derived from the above one and is used in the implemented system.

2.4. Computer science preliminaries

The importance of understanding human language and its ambiguities grows as more routine tasks are handed over to mechanical devices, and communication with them must become more robust and easy ([Knoll et al. 03]). Computer science, and in particular AI, tries to find approaches to analyzing interaction between humans and thereby to provide significant notions for understanding HCI. It defines two main ways to model the human mind ([Weizenbaum76]) – symbolic systems (information processing) and connectionism (neural networks). Symbolic systems assume that the human brain is an information-processing system that accomplishes thinking by processing, copying and reorganizing symbols. Furthermore, the brain solves problems by creating a symbolic representation of the problem; the search for a solution is not carried out by trial and error, but is selective.
Connectionism is a movement in cognitive science which hopes to explain human intellectual abilities using artificial neural networks. Neural networks are simplified models of the brain composed of large numbers of units (the analogs of neurons) together with weights that measure the strength of the connections between the units ([Garson03]). Such models of the human mind are the basis for diverse applications, e.g. text analysis. [Chomsky57] analyzes text semantics and provides an overview of monographs concerning the understanding of natural-language text. Modern approaches realize artificial consciousness in the form of embodied conversational agents. Such agents are computer programs that simulate human behavior and represent aspects of it ([Hill et al. 03], [Matheson et al. 03]). The ELIZA system presents an approach to reacting to natural-language utterances ([Weizenbaum76]). ELIZA's algorithm searches for patterns in a given textual input and, if one is found, produces a specified answer. Whereas the ELIZA approach is relatively simple, it is general enough to analyze emotions in a dialogue. [Batliner et al. 00] describes a Wizard-of-Oz (WOZ) scenario for analyzing emotions in a conversation. Conversation utterances are annotated with respect to lexical, conversational, and prosodic peculiarities. From this data, two costs – for the emotional and for the neutral state – are calculated. Depending on which is more probable, the utterance is classified as emotional or as neutral. The analysis takes place on the basis of recorded audio information from 20 dialogs (2254 turns). The WOZ scenario defines the following architecture for analyzing the text meaning (Fig. 4).
Fig. 4. The WOZ architecture
Thus, the WOZ system is divided into two parts – user-independent (context-independent) and user-adaptive (context-dependent) ([Batliner et al. 00_1]). The marked behavior corresponds to the behavior in an emotional state. The WOZ system distinguishes prosodic peculiarities, repetitions and re-formulations. The analysis of mimics is not implemented, but can be integrated into the system. Prosodic peculiarities, repetitions and re-formulations are weighted according to their importance and, depending upon the calculated weight and costs, the corresponding action of the system is performed. The action is either a restricted, system-guided dialogue that limits communication to a predefined course, a clarification dialogue that clears up some issues in the current dialogue, or a hand-over to a human operator in order to get human assistance for the answer and to yield graceful recovery. The highlighted areas represent the already implemented parts of the computer system (s. also section 4.1).
This thesis doesn't study grammatical issues of the spoken language. It concentrates on the linguistic approach to identifying emotions in the spoken language lexically and doesn't use prosodic information as the WOZ system does. A reliable and flexible model of emotions has to meet the following requirements:
• Reflect emotions or emotional groups (clusters) that are important in the particular scenario;
• Reflect coherences between emotions, considering their fluency and thus their probabilistic nature;
• Provide a way of considering the context.
Hidden Markov models (HMM) provide a simple way of specifying emotional connections ([Picard97]). An example of a model of this kind is shown in Fig. 5.
Fig. 5. A hidden Markov model representing emotions
The HMM is a probabilistic state automaton. Thus, the HMM in Fig. 5 means the following – the nodes correspond to emotional states of a person: Joy, Interest, Distress; the arcs to possible transitions from and to emotional states with a designated probability (Pr(X | Y)). Depending on the current state in the HMM, either the Joy, the Interest, or the Distress state is active. The HMM can transit with the specified probability to another emotional state. Transitions in the HMM are tracked in order to create observations (O(V | X)). Probabilities Pr(X | Y) lie in the range [0, 1]; the observations characterizing the affective state can have any range. The HMMs have the advantage of simplicity, but also a disadvantage – they define only one emotion that can be active at a moment, although it is possible to define states that represent combinations of emotions. Neural networks provide another way of modeling emotions ([Anderson95]). In this thesis, I use the HMM model as the basis for the further discussion due to its simplicity. Although neural networks don't have the discussed disadvantage (several nodes in the network can be activated simultaneously), they are hard to debug and to understand, as the dependencies in them are not always self-evident.
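The HMM of Fig. 5 can be sketched as a transition table; the concrete probabilities below are invented for illustration only:

```python
# Minimal sketch of the emotional HMM from Fig. 5: three states and
# transition probabilities Pr(X | Y). The numbers are made up; the
# thesis learns such weights from dialogue data.
states = ["Joy", "Interest", "Distress"]

# pr[y][x] = Pr(X | Y): probability of moving from state y to state x.
pr = {
    "Joy":      {"Joy": 0.6, "Interest": 0.3, "Distress": 0.1},
    "Interest": {"Joy": 0.2, "Interest": 0.7, "Distress": 0.1},
    "Distress": {"Joy": 0.1, "Interest": 0.2, "Distress": 0.7},
}

# From every state some transition is taken, so each row must sum to 1.
for y in states:
    assert abs(sum(pr[y].values()) - 1.0) < 1e-9
```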
3. The proposed approach

In the following, I explain what conclusions I draw from the described preliminaries and present the computer system I've implemented. During a discussion, this system identifies the user's emotions and creates a model of emotional behavior on the basis of his or her natural-language utterances. In order to simplify the analysis and reduce the complexity of natural-language understanding, I abstain from factors induced by articulation, phonological forms and mimics/gesticulation, making the presumption that the omitted aspects can be supplied and their effect on emotions studied in future. To avoid misapprehension, I state that the main components of the proposed approach constitute three pillars that can be used independently of each other as a general instrument for analyzing HCI scenarios. These main components are the HMM of emotions, the context model, and the emotional appraisal engine. The HMM of emotions is a probabilistic automaton describing transitions from one emotional state to another during the affective conversation (s. also section 2.4). The context model defines the environment for the discussion (s. also section 2.1), and the emotional appraisal engine is the instrument providing the affective analysis (s. also section 2.3.2). A connection between these components, e.g. in the calculation of the emotional appraisal, is not present in the current system. As an illustrating example scenario for further studies, the shift negotiation scenario is chosen (s. also section 3.3).

3.1. Integrating preliminaries

I presented approaches to the description of emotions from the point of view of different sciences (s. section 2). In this chapter, I describe an integral approach that combines the described theories. In order to analyze emotions, the implemented system:
• From the point of view of sociolinguistics (s. section 2.1):
o Takes the context of communication between humans into consideration;
• From the point of view of psychology (s. section 2.2):
o Groups emotions in clusters – the emotional clusters should be chosen corresponding to the problem to solve;
• From the point of view of linguistics (s. section 2.3):
o Analyzes utterances lexically using [Jahr00] and an empirical database approach;
• From the point of view of computer science (s. section 2.4):
o Uses the HMM model.
3.2. Limitations of the introduced approach

In this section, I introduce some limitations and assumptions that are primarily imposed by natural-language issues:
• Natural-language utterances are analyzed only as separate words.
o The meaning of complex phrases like idioms is not analyzed;
o The problems concerning the correct interpretation of the reference subjects of pronouns, e.g. you, they, it, are disregarded.
• The conversation is assumed to contain no internal code-switching (e.g. changes of conversational style).
The influence of the above issues is considered a possible methodological drawback of the approach, whose negative impact can be minimized by a grammatical parser (s. also section 6).

3.3. Working with the system

In this section, I first describe the theoretical steps of working with the system and then, in section 3.3.4, the practical working procedure as implemented in the current system.

3.3.1. Identifying emotional clusters

The HMM of emotions shown in Fig. 5 is very simple and in most cases not applicable to real-life scenarios. Real-life scenarios assume more than three emotions and six dependencies between them. That is why I suggest an approach for the iterative extension of HMMs.
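The extension step can be sketched directly: adding one node to an HMM with n nodes creates 2n + 1 arcs – a pair of arcs per existing node plus a loop (the function and variable names below are mine):

```python
# Sketch of the HMM extension by one node. Weight bookkeeping beyond
# the initial values is omitted.
def add_emotion(arcs: dict, nodes: list, new_node: str) -> int:
    """Adds new_node to the HMM; returns the number of arcs created."""
    added = 0
    for node in nodes:
        arcs[(node, new_node)] = 0.0   # Pr(new | node)
        arcs[(new_node, node)] = 0.0   # Pr(node | new)
        added += 2
    arcs[(new_node, new_node)] = 1.0   # circular arc Pr(new | new)
    nodes.append(new_node)
    return added + 1

nodes = ["Joy", "Interest", "Distress"]
arcs = {}
created = add_emotion(arcs, nodes, "Fear")
print(created)  # 2 * 3 + 1 = 7
```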
Fig. 6. The HMM extension by one node
Fig. 6 shows such an approach. Consider an existing emotional HMM with its nodes and dependencies. The system administrator adds a new emotion, i.e. a new emotional state, to the HMM. This node appends 2n+1 arcs to the HMM – a pair of probability arcs (Pr(Emotion | Emotion1), Pr(Emotion1 | Emotion)) from and to the added emotional node for each other emotional node in the HMM, and a circular arc Pr(Emotion | Emotion), where n is the number of nodes in the graph before the node insertion. Hence, a new emotional state is defined. Additionally, the HMM can specify a neutral quasi-state “no emotions” as a starting point for the emotional analysis.

3.3.2. Setting the context of the discussion

In order to analyze the impact of the context, one can “lay” the context upon the emotional estimation of the utterance, thereby excluding the possible influence of the current situation on the human. Accordingly, context attributes such as the relationship, communicative tone and genre parameters are transformed into a form suitable for numerical analysis and compared with the corresponding utterance values (Fig. 7).
Fig. 7. A sample utterance in the conversational context (bar chart; X-axis: the emotional clusters Joy, Lust, Disquietness, Antipathy, Aggression, Sadness, Envy, Fear; Y-axis: contribution values from 0 to 4.5; series: Utterance and Context)
Fig. 7 shows a bar diagram for a sample context and utterance. The X-axis shows the significant emotional clusters, and the Y-axis the contribution values of an utterance to these emotional clusters. For this example, I assume bad as the value of the relationship variable, official for the communicative tone variable and neutral for the genre variable. For the Joy, Lust, Disquietness, Antipathy, Aggression, Sadness, Envy and Fear emotional clusters, these variables are transformed into numerical form as shown in Table 2.

Emotional cluster   Bad   Official   Neutral   Cumulative contextual values
Joy                  0       0          0                 0
Lust                 0       0          0                 0
Disquietness         1       1          0                 2
Antipathy            1       1          0                 2
Aggression           1       0          0                 1
Sadness              2       0          0                 2
Envy                 1       0          0                 1
Fear                 2       1          0                 3

Table 2. Transforming the context variables into numerical form
Note that the values are empirical and can be modified by the system administrator.
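The transformation of Table 2 amounts to a per-cluster sum of the contributions of the chosen context settings; a minimal sketch:

```python
# Cumulative contextual values as in Table 2: relationship = bad,
# communicative tone = official, genre = neutral. The per-variable
# contributions are the empirical values from the table.
clusters = ["Joy", "Lust", "Disquietness", "Antipathy",
            "Aggression", "Sadness", "Envy", "Fear"]

bad      = [0, 0, 1, 1, 1, 2, 1, 2]
official = [0, 0, 1, 1, 0, 0, 0, 1]
neutral  = [0, 0, 0, 0, 0, 0, 0, 0]

cumulative = {cl: b + o + n
              for cl, b, o, n in zip(clusters, bad, official, neutral)}
print(cumulative["Fear"])          # 3
print(cumulative["Disquietness"])  # 2
```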
3.3.3. Adjusting the weights in the HMM

I developed a method to learn the probability weights in the HMM that reflect the emotional transitions during the discussion. Under the considerations described above, the formula for the emotional appraisal in section 2.3.2 is transformed as follows:

El(cl, utterance) = Σ1≤i≤M contribution(wordsutterance(i), cl)

where El(cl, utterance) is the numerical appraisal of the emotional cluster cl for an utterance utterance; Σ1≤i≤M contribution(wordsutterance(i), cl) is the sum of the emotional contributions of the words (from 1 to M) in utterance utterance to cluster cl; M is the number of words in utterance utterance. The emotional contribution of a word to a cluster is a number measuring the degree to which this word belongs to the particular cluster. In the current system, the values of the emotional contributions are empirical (s. also Table 3). Remembering the interpretation issue in section 2.3.2 regarding the variable El, it is necessary to provide an explanation of its value. The main idea is to compare this value with the values of the other emotional clusters from the current scenario and, on the basis of this comparison, to find the dominant emotional cluster. No other interpretation result is required for the implemented algorithm (s. also Algorithm 1 below). The weight adjustment in the HMM is done as follows:
1. The HMM arc weights are initialized to 0 if the arc doesn't represent a loop, and to 1 (100%) otherwise. The cumulative initial value for a particular state is therefore 100%. The cumulative intensity of the dominant emotion, i.e. the sum of the intensities of the dominant emotion over the previous utterances, is initialized to 0. The dominant emotion is set to the generic neutral state (step a) in Algorithm 1 and the neutral state definition in section 3.3.1).
2. The system calculates the dominant emotional set for all utterances using a database table with emotional words (step b) in Algorithm 1 – s. also section 4.2). The dominant emotional set Edom(utterance) of the utterance is the set of emotional states (clusters) that are the strongest ones at this particular moment, corresponding to the maximal sum of the contribution values of the words in the utterance:
Edom(utterance) = argmax1≤cl≤N Σ1≤i≤M contribution(wordsutterance(i), cl)

where N is the number of emotional clusters and M is the number of words in the utterance utterance. For each word in the utterance, the system goes through all emotional clusters in the EmotionalWords table, extracts the contribution of this word to the corresponding emotional cluster and adds the extracted value to the cumulative contribution value calculated from the previous words in the utterance. On the basis of this cumulative value, the system can determine which emotion is the dominant (maximal) one and increment the probability of the transition from the current to this dominant emotional state. For example, assume that an HMM with the emotional nodes Disquietness, Sadness, Joy and Neutral is in the Disquietness emotional state. The utterance I'm tired causes the Sadness emotion to become dominant, because the word tired has Sadness as its dominant emotion and all other words can be considered to have no emotional meaning. The cumulative contribution value for the Sadness emotional cluster is therefore maximal. Hence, the system adds the learning rate parameter to the arc from the Disquietness to the Sadness emotional state (s. step d) in Algorithm 1). Table 3 shows some examples of empirical data (the example words are taken from the utterance I'm pretty tired, I'll better go home now).
Emotional cluster   Tired   Better   Home   Now
Joy                   0        1       3     0
Lust                  0        1       2     0
Disquietness          1        0       0     0
Antipathy             1        0       0     0
Aggression            1        0       0     0
Sadness               2        0       0     0
Envy                  1        0       0     0
Fear                  2        0       0     0

Table 3. Empirical data for some example words
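Using the Table 3 data, the dominant emotional set of the full example utterance can be computed as sketched below (the function names are mine; for this utterance the Joy cluster wins with the cumulative value 4):

```python
# Computing the dominant emotional set from the Table 3 contributions.
contributions = {  # word -> {cluster: contribution}, taken from Table 3
    "tired":  {"Disquietness": 1, "Antipathy": 1, "Aggression": 1,
               "Sadness": 2, "Envy": 1, "Fear": 2},
    "better": {"Joy": 1, "Lust": 1},
    "home":   {"Joy": 3, "Lust": 2},
}

def dominant_emotions(words):
    """Returns the set of clusters with the maximal cumulative contribution."""
    totals = {}
    for w in words:
        for cl, value in contributions.get(w.strip(",.!?").lower(), {}).items():
            totals[cl] = totals.get(cl, 0) + value
    if not totals:
        return {"Neutral"}   # no emotional words found
    best = max(totals.values())
    return {cl for cl, v in totals.items() if v == best}

print(dominant_emotions("I'm pretty tired, I'll better go home now".split()))  # {'Joy'}
```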
Note that the word Now is actually redundant, as it contains only zero elements, and can simply be omitted in the database (s. also section 4.2). The dominant emotional set can (improbably, but mathematically possibly) contain several elements. In this situation, the system chooses the first element of the set as the dominant emotion. Note that the choice of the dominant emotion from the set of possible states plays an important role and must be fair, as defined for Petri nets ([Reisig98]). Otherwise, the system chooses the same state from the set each time it calculates the dominant emotional set and doesn't allow fair handling. In future, the correct choice of the dominant emotion has to be reviewed.
3. Depending on whether the dominant emotion is the same as in the previous utterances, the system either adds the calculated emotional value of the current utterance to the value from the previous utterances or sets the cumulative intensity to the calculated value (step c) in Algorithm 1).
4. If the cumulated value exceeds the threshold value parameter, the system adjusts the transition probability using the learning rate parameter (s. step d) in Algorithm 1). The threshold value is used to avoid oscillation, i.e. chaotic weight changes in the HMM as an effect of analyzing only one utterance. Since a single utterance rarely contains a distinct emotional expression and hence can't be used for a reliable emotional appraisal, the threshold value ensures that a particular
emotional transition really took place, and improves the quality of the approach.
5. The new HMM must reflect the changed probabilities of the emotional transition from the old to the new state and stay consistent with the requirement that the overall transition probability of the outgoing arcs of the current state remains 100% before and after the adjustment. That is why, after the learning rate parameter is added to the corresponding arc weight in the HMM, the weights of all other outgoing arcs of the current state must be reduced (steps e) and f) in Algorithm 1). Note that the adjustment of the HMM is done permanently during the discussion. Intuitively, this corresponds to the human ability to learn and to restructure the brain, called brain plasticity ([Kaas01]).
6. In order to reduce the transition probability, the system first finds the outgoing arcs with sufficient weight, to avoid negative weights in the HMM (step e) in Algorithm 1), and then “distributes” the learning rate value among these arcs (step f) in Algorithm 1). Note that a state is deleted from the set SetOfSufficientStates if weight(currentState, otherState) is less than LEARNINGRATE / |SetOfSufficientStates|; the power of the set then becomes smaller and LEARNINGRATE / |SetOfSufficientStates| correspondingly greater. Hence, the scan over SetOfSufficientStates must be restarted, since states that were sufficient before the deletion may no longer be (label start).
7. Concluding the weight adjustment after exceeding the threshold value, the system sets the current emotional state to the current dominant state.
The algorithm for the weight adjustment of the analyzed utterances is given in the following pseudocode (STATES – the set of all emotional states in the HMM, UTTERANCES – the set of all phrases to be analyzed, NEUTRALSTATE – the neutral state in the HMM, THRESHOLD and LEARNINGRATE – the threshold resp. learning rate parameters):
a) forall state, otherState in STATES with state ≠ otherState {
       weight(state, state) = 1
       weight(state, otherState) = 0
   }
   currentState = NEUTRALSTATE
   cumulativeIntensityOfDominantState = 0
b) forall utterance in UTTERANCES {
       dominantEmotionalState = Edom(utterance).elems(0)
c)     if currentState == dominantEmotionalState
           cumulativeIntensityOfDominantState += El(dominantEmotionalState, utterance)
       else
           cumulativeIntensityOfDominantState = El(dominantEmotionalState, utterance)
d)     if cumulativeIntensityOfDominantState > THRESHOLD {
           weight(currentState, dominantEmotionalState) += LEARNINGRATE
e)         SetOfSufficientStates = STATES \ {dominantEmotionalState}
           start: forall otherState in SetOfSufficientStates {
               if (weight(currentState, otherState) < LEARNINGRATE / |SetOfSufficientStates|) {
                   SetOfSufficientStates = SetOfSufficientStates \ {otherState}
                   goto start
               }
           }
f)         forall adjacentState in SetOfSufficientStates {
               weight(currentState, adjacentState) -= LEARNINGRATE / |SetOfSufficientStates|
           }
g)         currentState = dominantEmotionalState
       } //if THRESHOLD
   } //forall UTTERANCES

Algorithm 1. The general weight adjustment
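A possible rendering of Algorithm 1 in Python is sketched below. El and Edom are passed in as callables, and the goto of step e) is replaced by a restartable scan; apart from that, the steps follow the pseudocode:

```python
# Sketch of Algorithm 1: learning HMM transition weights from utterances.
def adjust_weights(states, utterances, neutral, el, e_dom,
                   threshold=1.0, learning_rate=0.1):
    # a) loops start at 1 (100 %), all other arcs at 0
    weight = {(s, t): 1.0 if s == t else 0.0 for s in states for t in states}
    current = neutral
    cumulative = 0.0
    for utterance in utterances:                   # b)
        dominant = e_dom(utterance)[0]
        if current == dominant:                    # c)
            cumulative += el(dominant, utterance)
        else:
            cumulative = el(dominant, utterance)
        if cumulative > threshold:                 # d)
            weight[(current, dominant)] += learning_rate
            # e) keep only the arcs that can give up their share without
            #    going negative; restart the scan after every deletion
            sufficient = set(states) - {dominant}
            restart = True
            while restart:
                restart = False
                for other in list(sufficient):
                    if weight[(current, other)] < learning_rate / len(sufficient):
                        sufficient.discard(other)
                        restart = True
                        break
            # f) distribute the added probability mass among the rest
            for other in sufficient:
                weight[(current, other)] -= learning_rate / len(sufficient)
            current = dominant                     # g)
    return weight
```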
3.3.4. The working procedure

In order to define an emotional HMM on the basis of the given scenario and to adjust the weights of its arcs using the developed computer system, the following steps are performed:
• Identifying emotional clusters for the current scenario. The system administrator identifies significant emotional clusters for the
current scenario and adjusts the HMM accordingly (s. also section 3.3.1). In future research, it is necessary to explore the dependency between possible discussion scenarios and emotional clusters in order to give recommendations on making such a choice. Once the significant emotional clusters are defined, the system administrator creates an HMM, e.g. with the Joy, Disquiet, Aggression, Sadness, Fear and the initial Neutral emotional states (s. also section 2.2). Fig. 8 shows the corresponding HMM.
Fig. 8. The initial HMM graph for personality profiling
Nodes in Fig. 8 correspond to the significant emotional clusters in the scenario. These nodes are connected with each other by weighted arcs that describe the probability of changing the emotional state during the conversation. At initialization, the weight of an arc is 1 for loops and 0 for all other arcs. Weight 1 (100%) means that, whatever happens, a person keeps the same emotional state as before and experiences the identical emotion as in the previous utterance, whereas weight 0 corresponds to the impossibility of a transition to another emotional state.
• Setting (learning) data for utterance words. An utterance is analyzed by dividing it into individual words and then finding the emotional value of every word. It is therefore required to enter emotional words for dialogues and thereby prepare the system for the analysis (s. also section 4.2). In future research, it is necessary to explore the dependency between scenarios and emotional words, since different domains and scenarios introduce their own emotional words. For example, the word shift has a higher affective meaning in the shift negotiation scenario, imposing a strong psychological involvement of all discussion participants, whereas in a simple discussion this word isn't especially distinguished. A word can have not only one, but many senses. However, for the sake of simplicity, I use generalized values for all word senses, considering the possible inaccuracy a methodological drawback of the proposed approach. A more accurate sense of a word can be retrieved using the statistical disambiguation technique introduced in [Yarowsky92].
• Setting the context of discussion. In the next step, the user defines the context of the discussion. Some contextual parameters, e.g. the communication means, are predefined by the discussion scenario and don't change depending on the conversation participants, while other parameters have to be specified by the system administrator in order to estimate the progress of the discussion (Fig. 9). Thus, parameters such as the relationship between the utterance's sender and receiver (good–bad–satisfactory), the communicative tone (official–neutral–friendly) and the genre (poetic–ironic–neutral) have to be set for the conversation. Each time the context of the conversation changes, e.g. because of newly participating persons, it should be redefined (s. also section 3.3.2);
Fig. 9. Defining the context of discussion
If the differences between the emotional estimation of an utterance and that of the context are noticeable and last for a long time, this means that the context of the discussion has changed. For now, I assume an unchanged context, although the dynamics of the context should be studied thoroughly in future.
• Setting the system parameters. In the next step, the user defines the system parameters in the Options dialog window (Fig. 10). The learning rate identifies the delta value for adjusting the weights in the HMM, the threshold value defines the definite moment of the state transition, reached when the cumulative value exceeds it, and the Whose utterances parameter identifies the person whose utterances are analyzed (s. also Algorithm 1 in section 3.3.3).
Fig. 10. Defining options for changing the HMM
• Carrying on a dialogue. The user loads a dialogue by clicking the Dialogue file button and starts the analysis by clicking the Analyze all button (Fig. 11).
Fig. 11. Carrying on a dialogue
The system shows only utterances of a person specified as the Whose utterances parameter in the Options dialog window.
An utterance is marked cyan during the analysis process and blue after it. The words whose emotional contribution values are stored in the system database are marked cyan. The emotional distribution diagram can be opened by double-clicking an utterance on the dialogue panel (Fig. 12).
Fig. 12. Emotions’ distribution diagram
Fig. 12 presents a bar chart, created by the system for an utterance, that shows the appraised emotional values for the chosen emotional clusters. If the user is unaware of the current emotion and needs a more thorough estimation of it, he can carry out the psychological tests described in [Emotional Analyst 03]. In order to analyze the impact of the context on the emotional appraisal of the utterance, the emotions' distribution diagram contains corresponding flags – the With context flag is used to show the emotional appraisal of the context for all emotional clusters, and the Trim context flag to reduce the emotional appraisal values by the impact of the context (Fig. 13).
Fig. 13. Meaning of the flags in the emotions distribution diagram
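A minimal sketch of my reading of the Trim context flag – the contextual appraisal is subtracted per cluster, clipped at zero; the thesis only states that the values are reduced by the impact of the context:

```python
# Possible semantics of the "Trim context" flag: subtract the contextual
# appraisal from the utterance appraisal per cluster (clipped at zero).
# This interpretation is an assumption, not taken verbatim from the thesis.
def trim_context(utterance_values: dict, context_values: dict) -> dict:
    return {cl: max(0.0, v - context_values.get(cl, 0.0))
            for cl, v in utterance_values.items()}

utterance = {"Sadness": 2.0, "Fear": 3.0, "Joy": 1.0}
context   = {"Sadness": 2.0, "Fear": 1.0}   # e.g. cumulative values as in Table 2
print(trim_context(utterance, context))  # {'Sadness': 0.0, 'Fear': 2.0, 'Joy': 1.0}
```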
Hence, it is possible to analyze not only particular utterances, but also their influence in the environment specified by the context settings.
• Adjusting the weights in the HMM. In the case that the emotional estimations of the dialogue utterances are explicitly saved in a dialogue course file, the user can adjust the HMM weights by using the File, Load predefined course command (s. also section 4.3). The algorithm for adjusting the HMM weights is provided in section 3.3.3.
4. The implemented system

In order to test the proposed approach, I developed a computer system that monitors the user's utterances and builds a corresponding emotional model. I've chosen an implementation based on a conventional computer system and not, e.g., on a multi-agent system, in order to avoid the corresponding difficulties and reduce the design complexity.

4.1. The system architecture

Remembering the WOZ system in Fig. 4, its architecture can be revised in order to meet the requirement of analyzing not only the two emotional states neutral/emotional, but any number of significant emotions defined by the scenario (Fig. 14).
Fig. 14. Sketch of the system architecture (elements: marked behavior in the context, word recognition (repetitions, reformulations), weighting, classifier, action, hand over to human operator; user-adaptive training)
Fig. 14 shows the architecture sketch of the developed system. The marked behavior in the context element corresponds to user behavior in the scenario context. The word recognition element covers the information extraction from the system database as well as linguistic figures like repetitions and reformulations, which are not implemented in the current system. The classifier element classifies the extracted information as, e.g., a simple utterance word, a discourse particle or potentially a linguistic figure. Depending on the result of this classification and the importance of the particular data, the system uses different weights for the various types of extracted information. As an action, it adjusts the weights in the dialogue model in order to reflect changes of emotional states. The hand over to human operator element is intended for the case where the system quality is insufficient and must be improved by the human operator; this is done by changing emotional values in the system database. The class diagram of the implemented system is presented in Fig. 15.
Fig. 15. Class diagram for emotional appraisal
The PPSystem class represents the whole system. It contains panels for specifying the hidden Markov model, the context and the dialogue. The following classes represent the most significant elements of the system GUI (s. also section 3.3). The HMMPanel class stands for the panel that shows the HMM model. It allows for defining and storing the HMM model and also for editing it by adding and deleting particular nodes as well as changing labels (names of emotions) (s. also Fig. 6). The ContextPanel class manages the data defining the context of the negotiation, i.e. the relationship, communicative tone and genre parameters (s. also Fig. 7).
The DialoguePanel class represents the panel that is used for carrying on a dialogue (discussion) or for the simple analysis of utterances. The Analyzer class analyzes the dialogue utterances by reading emotional values from the database and accumulating them (s. also Fig. 16). It stores the word representation of the remarks, whose emotional appraisal is fetched through the DBWrapper singleton class. During the analysis phase, the system checks whether the database contains a particular word from the utterance. If it finds this word, it instantiates the EmotionalWord class and reads its empirical emotional values. A word can be a discourse particle (oh, ah etc.). In the current system, discourse particles don't influence the behavior of the system, but it is particularly important to stress the meaning of such words and to analyze their impact in future. The OptionsPanel panel allows for changing and setting the options stored in the Options singleton class, like the learning rate, the threshold value and the parameter identifying the person whose utterances are analyzed (s. also Fig. 10).

4.2. The system database

The system database scheme is shown in Fig. 16.
Fig. 16. Database scheme
The Clusters table stores the description of the emotional clusters that are significant in the application e.g. the clusters in section 2.2. The Cluster field contains a text describing the cluster.
The EmotionalWords table defines the emotional words in an utterance ([EQI03]). It references emotional values in the NumericalEmotionalValueForWords table. The values for the emotional words in the NumericalEmotionalValueForWords table are provided on an empirical basis (Fig. 17). Since discourse particles are not analyzed thoroughly in this thesis, I provide the DiscourseParticles table only as a recommendation. A record in this table stores a discourse particle along with an intensification factor for the emotional clusters. In order to simplify entering empirical data into the NumericalEmotionalValueForWords table (the Value field), I developed an EmotionalWordsInput tool containing tabs for entering values for the emotional words, the context and the discourse particles.
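The scheme of Fig. 16 can be sketched in SQLite as follows; the table names follow the figure, while the column types and key layout are my assumptions, since the figure does not spell them out (the agoraphobic values are taken from Fig. 17):

```python
# Sketch of the system database scheme and the lookup the Analyzer performs.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Clusters (
        ClusterId INTEGER PRIMARY KEY,
        Cluster   TEXT                 -- textual description of the cluster
    );
    CREATE TABLE EmotionalWords (
        WordId INTEGER PRIMARY KEY,
        Word   TEXT
    );
    CREATE TABLE NumericalEmotionalValueForWords (
        WordId    INTEGER REFERENCES EmotionalWords(WordId),
        ClusterId INTEGER REFERENCES Clusters(ClusterId),
        Value     REAL                 -- empirical contribution, 0 .. 10.0
    );
""")
con.execute("INSERT INTO Clusters VALUES (1, 'Sadness'), (2, 'Fear')")
con.execute("INSERT INTO EmotionalWords VALUES (1, 'agoraphobic')")
con.execute("INSERT INTO NumericalEmotionalValueForWords VALUES (1, 1, 5.0), (1, 2, 10.0)")

# Look up the emotional values of a word.
rows = con.execute("""
    SELECT c.Cluster, v.Value
    FROM EmotionalWords w
    JOIN NumericalEmotionalValueForWords v ON v.WordId = w.WordId
    JOIN Clusters c ON c.ClusterId = v.ClusterId
    WHERE w.Word = 'agoraphobic'
    ORDER BY c.ClusterId
""").fetchall()
print(rows)  # [('Sadness', 5.0), ('Fear', 10.0)]
```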
Fig. 17. Entering empirical data for emotional words
Fig. 17 shows the word agoraphobic and its emotional values (abnormal fear of being helpless in an embarrassing or unescapable situation that is characterized especially by the avoidance of open or public places – [MerriamWebster03]). The Emotional values table shows all emotional clusters that are significant in the current scenario and the corresponding values. The
values lie in the range of 0 to 10.0 (here 5.0 for the Sadness cluster and 10.0 for the Fear cluster). The Save button is used to save the values entered in the Emotional values table. In order to show the values for an emotional word the user chooses this word in the Emotional word list and presses the Enter key. Note that the EmotionalWords table contains not only adjectives but also words with other grammatical functions, such as nouns and verbs. The context for the conversation is specified in the Context tab (Fig. 18). Values can be calculated as specified in Table 2.
Fig. 18. Entering data for the context
Entering empirical data for the discourse particles (Fig. 19) is similar to entering values for the emotional words. However, since emotional words and discourse particles can be seen as semantically different, the interpretation of the values for discourse particles is consequently also different and should be studied more thoroughly in future.
Fig. 19. Entering data for the discourse particles
4.3. The dialogue course
The system can process a dialogue course saved in a text file. Each line in this file has the format Emotion Power and corresponds to a particular dialogue utterance. It shows the dominant emotion in this utterance and its affective intensity. For example, the file containing

Fondness 2
Fondness 4
Displeasure 5
Fondness 3
Fondness 3
describes five dialogue utterances. The first one expresses the dominant Fondness emotion with the intensity 2, the 2nd – the dominant Fondness emotion with the intensity 4, the 3rd – the dominant Displeasure emotion with the intensity 5 and the two last ones – the dominant Fondness emotion with the intensity 3. Given that the steps described in section 3.3.4 except the last step are already done and the learning
rate parameter is 0.1 and the threshold value – 5, the system calculates the resulting HMM weights using Algorithm 1. The adjustment of the weights in the HMM is done as follows: Step 1. The system initializes the HMM (Fig. 20), where the loops get the weight 1 and all other arcs the weight 0. The dominant emotion is the initial neutral emotion.
Fig. 20. Initial HMM for review appraisal (Step 1)
Or the HMM in table form:

            Neutral  Displeasure  Fondness
Neutral        1          0           0
Displeasure    0          1           0
Fondness       0          0           1
Step 2. To decide what emotional state will be the next one, the system looks at the first row (Fondness 2) in the dialogue course file. Obviously the value 2 is less than the threshold value 5; that's why this value is stored by the system until the next row (Fondness 4) in the dialogue course file is processed.
The system adds together the stored value and the current one and gets the result 6, which exceeds the threshold. This means that the system goes from the neutral state to the Fondness state. The weights of the underlying HMM have to be adjusted correspondingly: the probability of the transition to the Fondness state grows (weight(Neutral, Fondness) + learning rate parameter), whereas the weights of the other outgoing arcs fall (in this example weight(Neutral, Neutral) - learning rate parameter). The Fondness state becomes the current one (Fig. 21).
Fig. 21. Step 2 of the weight adjustment
            Neutral  Displeasure  Fondness
Neutral       0.9         0           0
Displeasure    0          1           0
Fondness      0.1         0           1
Step 3. The system processes the third row of the dialogue course (Displeasure 5). Since the intensity value is equal to the threshold value, the HMM transits to the Displeasure state (Fig. 22).
Fig. 22. Step 3 of the weight adjustment
            Neutral  Displeasure  Fondness
Neutral       0.9         0           0
Displeasure    0          1          0.1
Fondness      0.1         0          0.9
Step 4. The last two rows in the dialogue course are processed one after another (Fondness 3 and Fondness 3; 3 + 3 > 5). The adjusted HMM is shown in Fig. 23.
Fig. 23. The final HMM with adjusted weights (Step 4)
            Neutral  Displeasure  Fondness
Neutral       0.9         0           0
Displeasure    0         0.9         0.1
Fondness      0.1        0.1         0.9
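The four steps above can be reproduced with a short sketch of Algorithm 1. The class and method names are illustrative; as a simplification, a stored sub-threshold intensity is carried over to the next row regardless of that row's emotion, which suffices for this example:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the weight adjustment (Algorithm 1) applied to the example
// dialogue course. Class and method names are illustrative; a stored
// sub-threshold intensity is carried over to the next row regardless of
// that row's emotion, which suffices for this example.
public class WeightAdjustment {
    static final double LEARNING_RATE = 0.1;
    static final int THRESHOLD = 5;
    static final List<String> STATES = Arrays.asList("Neutral", "Displeasure", "Fondness");

    // course: rows of {emotion, intensity}; returns the weights w[to][from]
    static double[][] adjust(String[][] course) {
        int n = STATES.size();
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) w[i][i] = 1.0; // loops get the weight 1
        int current = STATES.indexOf("Neutral");   // initial neutral emotion
        int stored = 0;                            // intensity kept from sub-threshold rows
        for (String[] row : course) {
            int target = STATES.indexOf(row[0]);
            int intensity = stored + Integer.parseInt(row[1]);
            if (intensity < THRESHOLD) { stored = intensity; continue; }
            stored = 0;
            if (target != current) {
                w[target][current] += LEARNING_RATE;  // the transition arc grows
                w[current][current] -= LEARNING_RATE; // the loop of the left state falls
                current = target;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        String[][] course = {{"Fondness", "2"}, {"Fondness", "4"},
                {"Displeasure", "5"}, {"Fondness", "3"}, {"Fondness", "3"}};
        for (double[] row : adjust(course)) System.out.println(Arrays.toString(row));
    }
}
```

On the example dialogue course the sketch reproduces the weights of Fig. 23 (Neutral 0.9/0/0, Displeasure 0/0.9/0.1, Fondness 0.1/0.1/0.9).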
4.4. The used software
The implemented system uses the following software:
1. Java 2 Platform, Enterprise Edition with the JDBC database interface ([Sun04]). The system is implemented in Java because of its platform independence; furthermore, there are no special requirements on the system performance.
2. The PtPlot 5.2 library, a 2D data plotter and histogram tool implemented in Java ([Berkeley04]). It is used in the system to plot the emotional distribution diagram. Besides its features (such as drawing charts), PtPlot was chosen because it is freely available, e.g. it may be copied or modified.
5. The approach evaluation
5.1. Evaluation on the basis of the shift negotiation scenario
For the purpose of studying emotional dialogues and testing the proposed approach it is important to obtain a corpus with discussions that a human would classify as emotional. The Switchboard corpus meets this requirement ([Wood et al. 03]). The dialogues are thus genuine, naturally occurring conversations, but occurring in an unusual, highly significant and emotionally charged situation… The Switchboard corpus includes 200,000 utterances. Below is an example of a dialogue provided in the Switchboard corpus “as is” ([LDC03]):

1 ...SPEAKER_A: how they fund whatever um illness they wish i[t's]- it's kind of hard to
2 SPEAKER_B: right
3 SPEAKER_A: to get an even keel on that i mean it's really hard to say well there's you know a hundred thousand per year affected by this one so let's give it this amount and then this other one's only a tenth of that so give it a tenth of that and and just keep doling it out that way or
4 SPEAKER_B: well i'm i'm glad i'm not the one that's that's in charge of of making those decisions they uh
5 SPEAKER_B: you wish that there could just be money for all these problems because_1 they're all so serious you know i mean it's not so serious i guess until it affects your family and then all the sudden it's the most important thing
6 SPEAKER_A: right
7 SPEAKER_B: so i i would hate to have that responsibility just personally but but then again we really we really do have it as a society just decides which things we need to address
8 SPEAKER_B: i uh
9 SPEAKER_B: i guess i it it frightens me to think of so many people with with AIDS and with cancer and
10 SPEAKER_B: many of those things um if they're not able to to be insured then the country's gonna pay for it one way or the other
11 SPEAKER_A: right
12 SPEAKER_B: whether it's through
13 SPEAKER_B: prevention or or treatment or you know
14 SPEAKER_B: just uh just helping the people when they are not able to take care of themselves
15 SPEAKER_B: so it seems like one way or the other we're gonna end up paying for it
16 SPEAKER_A: that's true
17 SPEAKER_A: you know the the other thought that i had um i've had several minutes to think about this after i uh while i was finding people i uh
18 SPEAKER_A: i could think about the topic longer than the person that receives it so it's kind of a
19 SPEAKER_A: a unfair advantage as they were but…
I assume that such a dialogue can take place within the limits of the shift negotiation scenario during a conversation between two hospital employees. I have carried out some experiments following the procedure described in section 3.3.4. As it turned out, the dialogues from the shift negotiation scenario resp. from the Switchboard corpus present the following problems for a systematic evaluation:
1. They don't possess the necessary emotional expressiveness, which can be seen in the small number of words that can be considered emotional.
   a. They don't have a clear goal of the discussion between the conversation participants.
   b. Intuitively, the emotional analysis of such a dialogue corresponds to the task of a psychoanalyst who tries to understand the feelings of his patient. In this case the result is achieved not during a single session but over the course of the whole therapy;
2. They don't come with an emotional estimation from a human expert and therefore must be additionally processed to obtain this estimation.
In order to avoid solving the above problems I choose another evaluation scenario – the movie review scenario.
5.2. Evaluation on the basis of the movie review scenario
The movie review scenario answers the question whether the review author approves or disapproves of a particular movie ([Berardinelli04]). The transition from dialogues to movie reviews may seem unexpected – thus far I have spoken about dialogues and not about plain texts such as movie reviews. Nevertheless, movie reviews can be used as a valuable simplification of an emotional natural-language dialogue: the interaction between discussion participants is absent, as a movie review has only one participant – its author. Reviews contain the author's emotions regarding what he or she thinks of the movie, so the dialogue model (HMM) is actually unnecessary here; the evaluation results should show whether the approach can be used for identifying emotions or not. Hence, I can test the analysis engine alone and leave aside the discussion context and possible models (HMMs, neural networks etc.), as these issues are very complex and should be studied more thoroughly in future. The reviews are provided with affective estimations from the review's author, expressed numerically using the star notation: an appraisal of more than 2 stars means that the author favors the movie, whereas 2 stars or less means that the author disapproves of the film. For analyzing movie reviews I choose only two simple emotional clusters – Fondness and Displeasure. In order to populate the system database I enter 450 emotional words from a training set of 49 movie reviews. Then I test the quality of this input by comparing the appraisals from the system and from the review's author for a testing set of 49 movie reviews. Note that I don't edit the reviews to exclude the plot description, which is part of most reviews and may contain words that the system falsely interprets as a significant part of the review's appraisal.
The best example of this is the film Iris, which tells the story of a woman sick with Alzheimer's disease. Although the author considers this film good (3.5 stars), the review contains many words that the system can interpret as expressing a negative opinion about the film. I still leave these words in the database, as I assume that they can be used in some other review to illustrate real negative emotions of the author rather than the plot description.
Before I provide the whole table with the reviews used for entering data, I describe the most characteristic reviews. The words in the excerpts that are considered to be emotional are provided in italics: 1. Independence Day (1996). http://movie-reviews.colossus.net/movies/i/id4.html. Appraisal: 2 stars. Computer appraisal: 53.0:149.0=0.355. Excerpt: It's useless to advise people not to see Independence Day, so I'll issue a warning instead: curb your enthusiasm and don't expect much. With suitably low expectations, you're likely not to be too disappointed, unless you make the mistake of actually thinking about what's taking place onscreen while it's going on. The last half hour is built on a series of contrivances and implausibilities that even a six-year old could find serious flaw with, so be prepared to use the "brain off" switch. But Independence Day isn't about logic and intelligence. It's about space battles, mass destruction, and a laughably "rousing" speech by the President. This is a spectacle, pure and simple. Unfortunately, because the film makers mistakenly tried to inject a load of weak dramatic elements, Independence Day turns out to be overlong, overblown, and overdone.
Comments: This is a typical negative review. 2. Titanic (1997). http://movie-reviews.colossus.net/movies/t/titanic.html. Appraisal: 4 stars. Computer appraisal: 223.0:79.0=2.822. Excerpt: Short of climbing aboard a time capsule and peeling back eight and one-half decades, James Cameron's magnificent Titanic is the closest any of us will get to walking the decks of the doomed ocean liner. Meticulous in detail, yet vast in scope and intent, Titanic is the kind of epic motion picture event that has become a rarity. You don't just watch Titanic, you experience it -- from the launch to the sinking, then on a journey two and one-half miles below the surface, into the cold, watery grave where Cameron has shot never-before seen documentary footage specifically for this movie.
Comments: This is a typical positive review. 3. Iris (2001). http://movie-reviews.colossus.net/movies/i/iris.html
Appraisal: 3.5 stars. Computer appraisal: 200.0:113.0=1.769 > 1.7644 (!) Excerpt: Iris persuasively illustrates how Alzheimer's affects not only the afflicted, but those who are close to the victim. During the early stages, Iris - once brilliant, now faltering - is at the tragedy's epicenter. But, as the disease advances, the burden shifts to John. By then, Iris has been reduced to a state where she is unaware of what has happened to her. She is an infant in an adult's body. She does not recognize that her brilliance has been obliterated. That cross is John's to bear. Director Richard Eyre (primarily known for his stage work in England) facilitates our empathy for a man who is powerless to act as the most important person in his life is slowly, inexorably diminished before his eyes. We share John's pain because we, through the flashbacks, have known the younger Iris and recognize what the two mean to one another. And we know that he is ill-suited to care for her (the unhealthy state of their house - with litter and grime all over - emphasizes this).
Comments: This is a typical review for a film considered by the review’s author as good although the plot description contains words that express negative emotions. Note that data entering is not as straightforward as considering only one sense of an emotional word from the review text. While choosing the values for the weights of the emotional words in the system database it is also necessary to consider the probability distribution for possible word senses. For example, the word best can be used not only in phrases expressing approval, but also in negations, where it communicates disapproval e.g. in the It was not the best performance phrase the best word is used in this phrase to express disapproval. For this reason, I enter 6 for the word best and not the maximum 10. In other cases, e.g. for the word undercooked, it is highly improbable that this word would be used in a review communicating an emotion other than displeasure. Hence, this word gets an estimation of 3 for the displeasure emotional cluster with no doubts if it is used for expressing non-displeasure as the literal meaning of this word is almost unlikely to appear in a movie review (s. also the empirical data issue in section 6.3). There are also words containing estimations for both emotions, for the displeasure, but also for the fondness emotional clusters. The
50
word crazy is difficult to classify and to forecast in what situations it expresses what emotion, that’s why it is classified as having an estimation of 4 for the both clusters. Note that in the case that if two emotions have the same values for both emotional clusters it has different impact on the final result of the analysis. For example, given that an utterance contains the word crazy and the cumulative values are 200 for the fondness and 100 for the displeasure emotional clusters. The word crazy is estimated to have the 4 value for the both clusters. Hence, this value has greater meaning for the displeasure emotional cluster as it was analyzed with the less estimation (100 < 200). Consequently, the final ratio 200:100=2 is less than 196:96=2.0416 with the excluded components of the word crazy. This asymmetry can be considered as a slight lack of the proposed approach. As the films are different and describe various issues, I also present MPAA classification in the corresponding column in order to show that the analysis results don’t depend on the plot of the film. According to [MPAA04] movies can be classified as: • G:"General Audiences-All Ages Admitted." • PG:"Parental Guidance Suggested. Some Material May Not Be Suitable For Children." • PG-13:"Parents Strongly Cautioned. Some Material May Be Inappropriate For Children Under 13." • R:"Restricted, Under 17 Requires Accompanying Parent Or Adult Guardian." • NC-17:"No One 17 And Under Admitted." The values in the Human column correspond to the number of stars given by a human, whereas the value in the Computer column is the ratio of estimations for the Fondness and the Displeasure emotional clusters. For example, in the ratio 293:75 the value 293 corresponds to the cumulative value for the Fondness emotional cluster and the value 75 – to the cumulative value for the Displeasure emotional clusters (Table 4). I marked red in the table the wrong classified films (for the classification range provided below).
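The asymmetry can be checked with a short calculation (a worked version of the numbers above, not code from the system):

```java
// Worked check of the "crazy" example: with cumulative values 200 (Fondness)
// and 100 (Displeasure), including the word crazy (value 4 in both clusters)
// lowers the final ratio compared to excluding it.
public class CrazyAsymmetry {
    public static void main(String[] args) {
        double withCrazy = 200.0 / 100.0;     // 2.0
        double withoutCrazy = 196.0 / 96.0;   // 2.0416...
        System.out.println(withCrazy < withoutCrazy); // the ratio falls
    }
}
```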
Film | MPAA classification | Human | Computer
1. Amadeus | PG (original); R (director's cut) (brief nudity, mild profanity) | 4 | 293.0:75.0=3.906
2. Farewell My Concubine | NR (Violence, mature themes, sexual situations) | 4 | 113.0:34.0=3.323
3. Forrest Gump | PG-13 (Mature themes, implied sex, discreet nudity, language, violence) | 4 | 107.0:17.0=6.294
4. God Father | R (Violence, mature themes, language, brief nudity) | 4 | 145.0:12.0=12.083
5. Lost in Translation | R (Profanity, mature themes, brief nudity) | 4 | 162.0:28.0=5.785
6. Pulp Fiction | R (Language, violence, mature themes) | 4 | 168.0:20.0=8.4
7. Titanic | PG-13 (Mayhem, nudity, sex, profanity, mild violence) | 4 | 223.0:79.0=2.822
8. 8 Women | R (Sexual themes) | 3.5 | 109.0:14.0=7.785
9. Amelie | R (Sexual content, brief nudity) | 3.5 | 202.0:38.0=5.315
10. American Beauty | R (Profanity, nudity, sexual situations, drugs, violence) | 3.5 | 159.0:94.0=1.691
11. Far from Heaven | PG-13 (Mature themes, brief profanity & violence) | 3.5 | 93.0:15.0=6.2
12. Iris | R (Nudity, sex, profanity) | 3.5 | 200.0:113.0=1.769
13. Magnolia | R (Profanity, violence, sex) | 3.5 | 131.0:90.0=1.455
14. Leaving Las Vegas | R (Profanity, violence, sex, nudity) | 3.5 | 134.0:13.0=10.307
15. About Schmidt | R (Profanity, nudity) | 3 | 90.0:38.0=2.368
16. AI | PG-13 (Profanity, sexual situations) | 3 | 153.0:153.0=1.0
17. Amores Perros | R (Violence, profanity, sex, nudity) | 3 | 88.0:41.0=2.146
18. Billy Elliot | R (Profanity) | 3 | 53.0:21.0=2.523
19. Black Cat, White Cat | R (profanity, mature themes, drug use) | 3 | 106.0:24.0=4.416
20. For Love of the Game | PG-13 (Profanity, sexual situations) | 3 | 212.0:16.0=13.25
21. Piano | R (Nudity, sex, violence, mature themes) | 3 | 84.0:40.0=2.1
22. A la Mode | R (Nudity, sexual situations, language) | 2.5 | 46.0:15.0=3.066
23. About Adam | R (Sex, profanity) | 2.5 | 102.0:21.0=4.857
24. O Brother, where are thou? | PG-13 (Profanity, violence) | 2.5 | 94.0:50.0=1.88
25. Primary Colors | R (Extreme and frequent profanity, sexual references, suicide) | 2.5 | 58.0:43.0=1.348
26. Princess Caraboo | PG (Sexual innuendo) | 2.5 | 77.0:35.0=2.2
27. Y tu mamá también (And Your Mother Too) | Unrated (Graphic sex, nudity, profanity, drug use) | 2.5 | 54.0:30.0=1.8
28. You Only Live Twice | PG (Sexual innuendo, violence) | 2.5 | 46.0:20.0=2.3
29. Fortress | R (Violence, gore, nudity, sex, language) | 2 | 102.0:62.0=1.645
30. Independence Day | PG-13 (Violence, profanity) | 2 | 53.0:149.0=0.355
31. Loaded Weapon | PG-13 (Language, sex) | 2 | 34.0:49.0=0.693
32. Palmetto | R (Violence, profanity, sex) | 2 | 77.0:31.0=2.483
33. Primal Fear | R (Violence, profanity, sex, nudity) | 2 | 43.0:51.0=0.843
34. Romeo Must Die | R (Violence, profanity, brief nudity) | 2 | 94.0:65.0=1.446
35. The Vanishing | R (Violence, language) | 2 | 48.0:30.0=1.6
36. Ace Ventura: Pet Detective | PG-13 (Potentially offensive humor, language) | 1.5 | 57.0:45.0=1.266
37. American Outlaws | PG-13 (Violence, brief profanity) | 1.5 | 41.0:67.0=0.611
38. Aspen Extreme | PG-13 (Language, partial nudity) | 1.5 | 62.0:42.0=1.476
39. Faculty | R (Violence, gore, profanity, nudity) | 1.5 | 66.0:48.0=1.3
40. Killer Condom (Kondom des Grauens) | No MPAA Rating (Sex, violence, profanity) | 1.5 | 35.0:23.0=1.521
41. On the Line | PG (Mild innuendo) | 1.5 | 65.0:92.0=0.706
42. Wedding Planner | PG-13 (Mature themes, profanity) | 1.5 | 115.0:67.0=1.716
43. Air Bud | PG (Sophomoric humor) | 1 | 57.0:67.0=0.850
44. Kazaam | PG (nothing offensive) | 1 | 44.0:46.0=0.956
45. Knight Moves | R (Language, violence, nudity) | 1 | 36.0:60.0=0.6
46. Love Don't Cost a Thing | PG-13 (Sexual situations, profanity) | 1 | 114.0:18.0=6.333
47. Mission to Mars | PG-13 (Profanity, violence) | 1 | 130.0:69.0=1.884
48. The Ninth Gate | R (Profanity, violence, sex, nudity) | 1 | 26.0:59.0=0.440
49. Trees Lounge | R (Language, mature themes) | 1 | 48.0:31.0=1.548
Table 4. Reviews for entering data
Notably, the training set already contains falsely classified data. Possible reasons for such errors, e.g. the imperfectness of the analysis engine, are explored below together with the other sources of classification errors. The following table provides the mean values for the data in the Computer column of Table 4.
Human estimation | Computer mean
4    | 1049:265 = 3.958490
3.5  | 1028:377 = 2.726790
3    | 786:333 = 2.360360
2.5  | 477:214 = 2.228971
2    | 451:437 = 1.032036
1.5  | 441:384 = 1.1484375
1    | 455:350 = 1.3
Table 5. Computer mean values for different human estimations
In order to find a reliable range of values for differentiating between the fondness and the displeasure emotional clusters, I analyze the mean values for the emotional groups (Fig. 24).
Fig. 24. Determining the estimation range
Fig. 24 shows the function of the mean values. For the human estimations from 4 down to 2 it is monotonically decreasing, and then it increases. I consider this “increase” to be unreliable because the differences between the mean values are too small. That's why the range for estimating the emotional clusters is calculated as follows:
• The upper boundary for the fondness cluster is +∞.
• The lower boundary for the fondness cluster is the mean of the mean value for the human estimation 2.5 stars (2.228971) and the maximal mean value for estimations in the displeasure emotional cluster (1.3 for the human estimation 1 star). Hence, the lower boundary is (477:214 + 455:350) / 2 = (2.228971 + 1.3) / 2 = 1.764485981.
• The upper boundary for the displeasure cluster is consequently the same as the lower boundary for the fondness cluster.
• The lower boundary for the displeasure cluster is 0.
Thus, the ranges differentiating the fondness and the displeasure emotional clusters are defined as:
• The displeasure emotional cluster: [0 .. 1.7644859)
• The fondness emotional cluster: [1.7644859 .. ∞)
According to this estimation range the system classifies 42 of the 49 films in the above table correctly, i.e. 86%. In the following I present the table with the film reviews used for testing the entered data. I marked red the films that were classified incorrectly.
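The derivation of the boundary can be checked with a short calculation; the class and method names are illustrative, not taken from the implemented system:

```java
// Check of the classification boundary: the mean of the computer mean for
// the 2.5-star estimation (477:214) and the maximal displeasure-side mean
// (455:350, for the 1-star estimation). Names are illustrative.
public class EstimationRange {
    static final double LOWER_FONDNESS = (477.0 / 214 + 455.0 / 350) / 2;

    // classifies a computer appraisal ratio into one of the two clusters
    static String classify(double ratio) {
        return ratio >= LOWER_FONDNESS ? "Fondness" : "Displeasure";
    }

    public static void main(String[] args) {
        System.out.println(LOWER_FONDNESS);        // 1.7644859...
        System.out.println(classify(200.0 / 113)); // Iris (1.769)
        System.out.println(classify(53.0 / 149));  // Independence Day (0.355)
    }
}
```

Note that Iris (ratio 1.769) lands just above the boundary and is therefore classified correctly, as remarked earlier.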
Film | MPAA classification | Human | Computer
50. Before Sunrise | R (Mature themes, language) | 4 | 72.0:27.0=2.666
51. Bonnie and Clyde | M (Violence, sexual situations) | 4 | 129.0:38.0=3.394
52. Courage Under Fire | R (Profanity, violence, mature themes) | 4 | 114.0:24.0=4.75
53. Gettysberg | PG (Violence) | 4 | 122.0:36.0=3.388
54. Hard | R (Extreme violence, profanity, brief nudity) | 4 | 59.0:34.0=1.735
55. Reservoir Dogs | R (Extreme violence and gore, excessive profanity) | 4 | 51.0:9.0=5.666
56. Taxi Driver | R (Violence, profanity, mature themes) | 4 | 79.0:13.0=6.076
57. The Bridges of Madison County | PG-13 (Mature themes, brief nudity) | 3.5 | 84.0:14.0=6.0
58. A Bronx Tale | R (Violence, language) | 3.5 | 42.0:18.0=2.333
59. Bullets Over Broadway | R (Violence, mature themes, language) | 3.5 | 96.0:7.0=13.714
60. Everyone Says I Love You | R (Profanity, angst) | 3.5 | 130.0:35.0=3.714
61. Manhattan Murder Mystery | PG (Mature themes) | 3.5 | 58.0:22.0=2.636
62. Nosferatu | Unrated (Mature themes, violence) | 3.5 | 113.0:45.0=2.511
63. Speed | R (Language, violence) | 3.5 | 72.0:31.0=2.322
64. Absolute Power | R (Violence, sex, profanity) | 3 | 176.0:41.0=4.292
65. Babe | G | 3 | 61.0:2.0=30.5
66. Bodies, Rest and Motion | R (Sexual situations, mature themes, language) | 3 | 46.0:14.0=3.285
67. Ethan Frome | PG (Mature themes) | 3 | 61.0:18.0=3.388
68. Evita | PG (Minor violence) | 3 | 145.0:36.0=4.027
69. What Happened Was… | No MPAA Rating (Profanity, mature themes, brief nudity) | 3 | 41.0:21.0=1.952
70. Wife | No MPAA Rating (Profanity, mature themes, brief nudity) | 3 | 56.0:19.0=2.947
71. Against the Ropes | PG-13 (Profanity, boxing violence, sensuality) | 2.5 | 55.0:35.0=1.571
72. Daylight | PG-13 (Profanity, disaster-related peril and destruction) | 2.5 | 32.0:64.0=0.5
73. Far Off Place | PG (Mature themes) | 2.5 | 62.0:29.0=2.137
74. Killing Zoe | R (Violence, sexual situations, nudity, language, drug use) | 2.5 | 84.0:13.0=6.461
75. The Mask | PG-13 (Language, violence, mature themes) | 2.5 | 69.0:16.0=4.312
76. Naked In New York | R (Language, sex, mature themes) | 2.5 | 74.0:0.0=Infinity
77. True Romance | R (Extreme violence, language, sex, nudity) | 2.5 | 103.0:44.0=2.340
78. Bad Boys | R (Violence, language, mature themes) | 2 | 54.0:77.0=0.701
79. Cliffhanger | R (Violence, language) | 2 | 87.0:88.0=0.988
80. Dazed and Confused | R (Language, drug and alcohol use) | 2 | 64.0:48.0=1.333
81. Fear | R (Violence, sex, profanity) | 2 | 86.0:29.0=2.965
82. Last Action Hero | PG-13 (Violence) | 2 | 85.0:57.0=1.491
83. Murder at 1600 | R (Violence, profanity, sex) | 2 | 66.0:26.0=2.538
84. Renaissance Man | PG-13 (Language) | 2 | 72.0:47.0=1.531
85. Alien 3 | R (Violence, gore, profanity, mature themes) | 1.5 | 57.0:18.0=3.166
86. An American Werewolf in Paris | R (Violence, gore, profanity, nudity) | 1.5 | 70.0:45.0=1.555
87. The Cable Guy | PG-13 (Mature themes, profanity) | 1.5 | 44.0:32.0=1.375
88. Coneheads | PG (Cartoon violence) | 1.5 | 37.0:40.0=0.925
89. Dumb & Dumber | PG-13 (Language, mature themes) | 1.5 | 55.0:68.0=0.808
90. Saved! | PG-13 (Profanity, sexual situations) | 1.5 | 68.0:34.0=2.0
91. Shadow Conspiracy | R (Profanity, violence) | 1.5 | 47.0:44.0=1.068
92. Ace Ventura: When Nature Calls | PG-13 (Bawdy humor) | 1 | 31.0:70.0=0.442
93. Almost Heroes | PG-13 (Profanity, vulgarity, brief nudity) | 1 | 90.0:50.0=1.8
94. Antitrust | PG-13 (Profanity, violence, mature themes) | 1 | 82.0:70.0=1.171
95. Avengers | PG-13 (Violence, mild profanity, brief nudity) | 1 | 68.0:72.0=0.944
96. Bad Girls | R (Violence, language, mature themes, bad acting) | 1 | 25.0:52.0=0.480
97. Envy | PG-13 (Mature themes, profanity) | 1 | 58.0:31.0=1.870
98. Turbulence | R (Violence, profanity) | 1 | 21.0:47.0=0.446
Table 6. Reviews for testing the proposed approach
Using the estimation range provided above the system classifies correctly 40 of the 49 films in the above table, i.e. 82%. The film classification columns in Table 4 and Table 6 don't show any correlation between the classification of the film and the computer estimation. Below, I explore possible reasons for the classification errors:
• The review author. Since everyone has their favourite words and phrases, the training data is subjective, which presents an additional source of errors. Another review author would use different words to designate the same emotion, and these words are probably not stored in the system database;
• Imperfectness of the analysis engine. Consider the phrase “On the Line isn't good enough to get us to that point” from the review for the film On the Line. Using the simple word-by-word approach results in an overall positive estimation of this sentence because of the presence of the word good, although it is clear that the review author means the opposite. In order to improve the quality of the classification, the analysis engine in the next version of the system should scrutinize not only the separate words of an utterance, but also compound phrases, using a grammatical parser;
• Overfitting. Note that while entering data the overfitting problem should be considered. Too much data in the database can worsen the results of the analysis ([Mitchell97]).
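Short of a full grammatical parser, one conceivable lightweight improvement, sketched here purely as a hypothetical illustration and not part of the implemented system, is a negation window that credits the value of an emotional word to the opposite cluster when the word directly follows a negation token (the word value is invented):

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not part of the implemented system): a one-word
// negation window that credits the value of an emotional word to the
// opposite cluster when the word directly follows a negation token, so that
// "isn't good enough" no longer counts towards Fondness.
public class NegationWindow {
    static final Map<String, Double> FONDNESS = Map.of("good", 6.0);
    static final Set<String> NEGATIONS = Set.of("not", "isn't", "never");

    // returns {fondness, displeasure} cumulated with the negation window
    static double[] score(String utterance) {
        double fondness = 0, displeasure = 0;
        String[] words = utterance.toLowerCase().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            Double v = FONDNESS.get(words[i]);
            if (v == null) continue; // word carries no emotional value
            boolean negated = i > 0 && NEGATIONS.contains(words[i - 1]);
            if (negated) displeasure += v; else fondness += v;
        }
        return new double[]{fondness, displeasure};
    }

    public static void main(String[] args) {
        double[] s = score("On the Line isn't good enough");
        System.out.println(s[0] + " " + s[1]); // 0.0 6.0
    }
}
```

A one-word window is of course only a crude stand-in for a grammatical analysis of negation scope.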
6. Conclusions
6.1. Summary
In this diploma thesis I introduced an approach for modeling human emotions using HMMs and described a software system that implements this approach. The proposed modeling method is both simple and suggestive, which makes it easy to use this approach for different HCI scenarios. Despite its simplicity the proposed solution shows very good results in identifying emotions in natural-language utterances and hence can be used as a basis for further system extensions. In Chapter 2, I described preliminaries from different sciences that are necessary for a better understanding of the problems concerning human emotions and hence the issues of this thesis. In order to group emotions I used psychological clusters. I introduced a computer model for human emotions based on hidden Markov models (HMMs) and showed its advantages and disadvantages. Chapter 3 defined the proposed approach that results from the preliminaries in Chapter 2. Since the approach implies some limitations, I also described them and presented an algorithm for adjusting the weights of the HMM. Chapter 4 described the implemented computer system and the way to use it. Chapter 5 illustrated the approach evaluation along with the method of choosing the classification range and the corresponding data. Chapter 6 provides a summary, the related work, an outlook and the final remarks.
6.2. Related work
When discussing related work it is necessary to distinguish two aspects – issues concerning natural-language processing and those regarding the modeling of human behavior. The general question when working with natural-language processing is the decision which approach to choose – deep or shallow. The deep
approaches perform a thorough analysis of natural-language utterances, e.g. of their grammatical structure, whereas the shallow approaches omit this stage. Deep approaches have a possible disadvantage: they complicate the analysis, which does not necessarily improve its accuracy (s. also [Verbmobil04]). Several projects exist that focus on defining computer models for human behavior and that test these models using computer systems. The WOZ system described in section 2.4 identifies emotions using prosodic dialogue data. The system is based on a statistical approach allowing for the identification of two emotional states (neutral/emotional). In contrast, the approach proposed in this thesis provides a means for identifying more than two emotions, whereby the affective information is processed as text and not as audio data. Another project that should be mentioned is the project from [Dörner99]. The goal of the project is the development of a software agent (called the agent) that models human behavior and can “live” in the real world. The underlying theory uses a model based on neural networks whose input consists of different stimuli of the agent, e.g. the hunger stimulus. The agent responds to appearing stimuli, which can also be natural-language utterances. The theory defines emotions as modulated behavior of the agent, where possible modulators are the arousal, resolution-level, selection threshold and background control parameters ([Dörner et al. 02]). The approach proposed in this thesis can complement the theory by identifying emotions in natural-language utterances.
6.3. Outlook
In this section, I provide some ideas concerning possible system extensions. But before I do, I have to point out that every system enhancement must be validated with regard to system quality. As each extension makes the analysis more complicated, the reasons for such an improvement have to be balanced very carefully: the system quality vs.
system complication. For example, if the system is extended with a grammar parser, the system quality should be studied exhaustively in order to find out whether such an extension is dispensable; it can be discarded if it brings no significant improvement to the system quality.
Future versions of the system should study problems that concern the choice of an emotional model (HMMs vs. artificial neural networks) with regard to different factors, e.g. modeling simultaneous emotions or the influence of the emotions of other individuals. Future systems should provide a true connection between the pillars of the general framework (the HMM of emotions, the context model, and the emotional appraisal engine) in order to investigate the interdependencies between these three pillars. It is also possible to research in this regard such notions as the joke (s. also [Freud85] for a joke taxonomy), irony, satire, the grotesque, and parody, e.g. on the basis of the number and the linguistic type of stylistic figures in the scrutinized text as well as on the basis of contextual parameters. For example, two hyperboles in an utterance in the context of a bad relationship between the utterance's sender and receiver could represent an ironic mood of the sender. In future research it is essential to explore the dependency between possible discussion scenarios and the choice of emotional clusters in order to give some recommendations on making such a choice. Furthermore, the dependency between the domain of the discussion scenarios and the emotional words should be investigated more exhaustively (s. also section 3.3). The current system doesn't analyze the dynamics of the emotional dialogue that result from scrutinizing its development. That's why the next version of the system should carry out this analysis, e.g. by constructing time sequences of emotional contribution vectors that are composed from the contribution values of utterances as follows:
c_u = (contribution_{u,1}, ..., contribution_{u,n}),
where c_u is the emotional contribution vector of the utterance u, n is the number of emotional clusters, and contribution_{u,i} stands for the emotional contribution of the utterance u to the emotional cluster i. Hence, the sequence of contribution vectors corresponding to the utterances in a dialogue can be used for modeling the dynamics of this dialogue.
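Such a time sequence could be sketched as follows. The class name, the cluster layout, and the sample contribution values are hypothetical illustrations and not part of the implemented system:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the dynamics of a dialogue as a time sequence of emotional
// contribution vectors, one vector per utterance. The cluster count and
// the sample values below are invented for illustration.
public class EmotionalDynamics {

    // contribution[i] = emotional contribution of one utterance to cluster i
    public static double[] contributionVector(double... perCluster) {
        return perCluster.clone();
    }

    // Per-cluster change between two contribution vectors.
    public static double[] delta(double[] prev, double[] next) {
        double[] d = new double[prev.length];
        for (int i = 0; i < prev.length; i++) {
            d[i] = next[i] - prev[i];
        }
        return d;
    }

    public static void main(String[] args) {
        List<double[]> sequence = new ArrayList<>();
        // three utterances, two emotional clusters (e.g. calm / panic)
        sequence.add(contributionVector(0.8, 0.1));
        sequence.add(contributionVector(0.5, 0.4));
        sequence.add(contributionVector(0.1, 0.9));

        double[] d = delta(sequence.get(0), sequence.get(2));
        System.out.printf("calm: %+.1f, panic: %+.1f%n", d[0], d[1]);
    }
}
```

The per-cluster deltas between consecutive vectors would then expose, for instance, a drift from a calm cluster towards a panic cluster over the course of the dialogue.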
The proposed approach is mainly based on the linguistic preliminaries. However, in order to improve the quality of modeling, the next version of the system could also use notions from other sciences. I did not study thoroughly the impact of discourse particles or that of the grammatical structure of a phrase ([Norvig95], [Lohnstein00]). Analyzing sentence grammar leads to a consideration of the advantages of shallow, deep, or possibly heterogeneous techniques (s. also MultiNet in [Helbig01]). Future versions of the system should pay more attention to these issues. The correct interpretation of pronominals should also be studied along with the grammar issues (s. also [Koch et al. 00]). The proposed approach is based on analyzing separate words, which can have different meanings. In order to improve the accuracy of the analysis, the system can use the word-sense disambiguation technique introduced in [Yarowsky92]. The implemented system doesn't analyze complex phrases, e.g. idioms, due to the complexity of this issue. The next version of the system should include an analysis of idioms and figures of speech. I sketch here a possible solution. The EmotionalWords table defines not only single words, but also phrases that are stored in the database in the form of regular expressions ([Hommel04]) along with their emotional meaning. For example, the phrase not calm is matched by the regular expression not (\w+), and the string MAXVAL - [1] stores the emotional meaning of this phrase; for the phrase rather panic than calm, the regular expression is rather (\w+) than (\w+) and [1] - [2] is the emotional meaning, where [i] refers to the i-th captured word:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Matcher m0 = Pattern.compile("not (\\w+)").matcher("not calm");
boolean matched0 = m0.matches();
String word0 = matched0 ? m0.group(1) : null;  // "calm"
// emotional meaning: MAXVAL - value(word0)

Matcher m1 = Pattern.compile("rather (\\w+) than (\\w+)")
    .matcher("rather panic than calm");
boolean matched1 = m1.matches();
String word1a = matched1 ? m1.group(1) : null; // "panic"
String word1b = matched1 ? m1.group(2) : null; // "calm"
// emotional meaning: value(word1a) - value(word1b)
Program code 1. Analyzing the emotional meaning of complex phrases.
The difference in the second case means the difference between the emotional contribution vectors of the particular words, for example, the difference between the contribution vectors of the word panic and the word calm. For phrases having no emotional meaning, the emotional meaning field contains the value 0. For example, the idiom fast and furious has no particular emotional meaning in the sentence Everything at Gostiny Dvor - and all other shopping malls as well - is so fast and furious - so many people shopping, so little time before Christmas ([Philpott04]). If the current system analyzed this phrase, it would be misled by the word furious, which does have an emotional meaning. The proposed recommendation provides a means for solving such problems. The phrase to be (all) up in arms, which means to be very angry, letting that anger be known, opposing something, e.g. in The students were up in arms over the cancellation of the class trip to Novgorod - they had really wanted to go, had been looking forward to it, and now felt cheated, should be interpreted as the simple word angry in the emotional meaning field. Each regular expression must be associated with corresponding emotional values in the NumericalEmotionalValueForWords table.
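A possible overall shape of such a phrase lookup is sketched below. The method name appraise, the constant MAXVAL, and the word values are hypothetical stand-ins for the EmotionalWords and NumericalEmotionalValueForWords tables, which in a real system would be queried from the database rather than hard-coded; Java regular expressions with capturing groups are used instead of the [?]+ notation:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: a phrase table mapping regular expressions to rules that compute
// an emotional meaning from the matched words. MAXVAL and the per-word
// values are hypothetical illustrations.
public class PhraseTable {

    static final int MAXVAL = 4;

    // hypothetical per-word emotional values
    static final Map<String, Integer> WORD_VALUE =
            Map.of("calm", 1, "panic", 3, "angry", 3);

    public static int appraise(String phrase) {
        Matcher not = Pattern.compile("not (\\w+)").matcher(phrase);
        if (not.matches()) {
            // e.g. "not calm" -> MAXVAL - value(calm)
            return MAXVAL - WORD_VALUE.getOrDefault(not.group(1), 0);
        }
        Matcher rather = Pattern.compile("rather (\\w+) than (\\w+)").matcher(phrase);
        if (rather.matches()) {
            // e.g. "rather panic than calm" -> value(panic) - value(calm)
            return WORD_VALUE.getOrDefault(rather.group(1), 0)
                 - WORD_VALUE.getOrDefault(rather.group(2), 0);
        }
        // idioms with no emotional meaning map to 0
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(appraise("not calm"));               // 3
        System.out.println(appraise("rather panic than calm")); // 2
        System.out.println(appraise("fast and furious"));       // 0
    }
}
```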
The correct choice of the dominant emotion from the dominant emotional set has to be reviewed in order to provide fair handling, e.g. by randomizing this choice. The empirical data used in this thesis in the NumericalEmotionalValueForWords table is an area to adjust and experiment with. Particular words can have other weights than those used in the current system. The next version of the system should introduce a more exact justification of the choice of emotional values, e.g. by basing them on neutral emotional values weighted by the frequency of word usage in the language ([Baayen01], [Ruoff90]). In order to test the proposed approach on live dialogues carried on by humans, the system should be redesigned as e.g. a multi-agent system. In this case, one agent stands for one human and his or her psyche, and the conversation held by the individuals is monitored by the underlying multi-agent system, which tracks the emotional behavior of the humans and registers the changes of emotions. In a next development step, the acquired data can be used to assign the constructed emotional models to conversational agents that can then build realistic emotional dialogues.
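The frequency weighting mentioned above could, as a rough sketch, look as follows. The frequencies, the base values, and the logarithmic damping are invented for illustration and would have to be calibrated against frequency data such as that of [Baayen01] or [Ruoff90]:

```java
import java.util.Map;

// Sketch: weighting a word's neutral emotional value by its relative
// frequency of usage in the language, so that frequent words are damped
// relative to rare but strong ones. All numbers are hypothetical.
public class FrequencyWeighting {

    // hypothetical relative frequencies per million words
    static final Map<String, Double> FREQ =
            Map.of("furious", 4.0, "angry", 45.0);

    public static double weightedValue(String word, double neutralValue) {
        double freq = FREQ.getOrDefault(word, 1.0);
        // logarithmic damping: the more frequent the word, the smaller
        // its effective emotional value
        return neutralValue / Math.log(1.0 + freq);
    }

    public static void main(String[] args) {
        System.out.printf("furious: %.2f%n", weightedValue("furious", 3.0));
        System.out.printf("angry:   %.2f%n", weightedValue("angry", 3.0));
    }
}
```

Under this scheme the rarer word furious retains a higher effective value than the frequent word angry, even though both start from the same neutral value.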
6.4. Final remarks I hope this thesis has provided a useful contribution to research in human-computer interaction. I have tried to give an insight into this fascinating field, considering that it will play a much more significant role in the future, when the dialogue between machines and humans becomes vitally important. Future work in this field is manifold and multilateral. In conclusion, I quote a saying of Confucius that can be interpreted here as an encouragement for further research: "Aspire to the way, align with virtue, abide by benevolence, and immerse yourself in the arts".
Selbständigkeitserklärung (declaration of authorship): I hereby declare that I have written the present thesis independently and only with the use of the cited literature and aids.
Berlin,
Date, signature
Einverständniserklärung (declaration of consent): I hereby declare my consent that the present thesis may be put on display in the library of the Institut für Informatik of the Humboldt-Universität zu Berlin.
Berlin,
Date, signature
References
1. [Anderson95] Anderson, J. A. An Introduction to Neural Networks. The MIT Press. Cambridge. 1995.
2. [Batliner et al. 00] Batliner, A., Huber, R., Niemann, H., Nöth, E., Spilker, J., Fischer, K. The Recognition of Emotion. In Wahlster, W. (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer. 2000.
3. [Batliner et al. 00_1] Batliner, A., Fischer, K., Huber, R., Spilker, J., Nöth, E. Desperately Seeking Emotions: Actors, Wizards, and Human Beings. In Proceedings of the ISCA Workshop on Speech and Emotion. URL: http://www.qub.ac.uk/en/isca/proceedings/pdfs/batliner.pdf. 2000.
4. [Baayen01] Baayen, R. H. Word Frequency Distributions. Kluwer. 2001.
5. [Belikov et al. 01] Belikov, V. I., Krysin, L. P. Социолингвистика (Sociolinguistics). 2001.
6. [Berardinelli04] Berardinelli, J. URL: http://moviereviews.colossus.net/master.html. 2004.
7. [Berkeley04] The PtPlot 5.2 library. UC Berkeley, EECS. URL: http://ptolemy.eecs.berkeley.edu/java/ptplot. 2004.
8. [Budanitsky99] Budanitsky, A. Lexical Semantic Relatedness and its Application in Natural Language Processing. Technical Report CSRG-390, Department of Computer Science, University of Toronto. 1999.
9. [Chomsky57] Chomsky, N. Syntactic Structures. Mouton. 1957.
10. [Dörner99] Dörner, D. Bauplan für eine Seele. 1. Aufl. Rowohlt. 1999.
11. [Dörner et al. 02] Dörner, D., Brieler, N. Die Simulation von Gefühlen. Psychologisches Institut der Universität Bamberg. 8. Systemwissenschaftliches Kolloquium der Universität Osnabrück. 2002.
12. [Dörner03] Dörner, D. The Mathematics of Emotions. 2003.
13. [Dost03] Dostojevsky, F. Crime and Punishment. Project Gutenberg. URL: http://www.gutenberg.net/. 2003.
14. [Emotional Analyst 03] Geneva Research Group. Department of Psychology, University of Geneva. URL: http://www.unige.ch/fapse/emotion/demo/TestAnalyst/GERG/apache/htdocs/index.php.
15. [EQI03] Emotional Intelligence. URL: http://www.eqi.org. 2003.
16. [Garson03] Garson, J. Connectionism. URL: http://plato.stanford.edu/entries/connectionism/. 2003.
17. [Faratin00] Faratin, P. Automated Service Negotiation Between Autonomous Computational Agents. Ph.D. Dissertation. University of London, Queen Mary College, Department of Electronic Engineering. URL: http://www.ana.lcs.mit.edu/peyman/papers.htm. 2000.
18. [Freud85] Freud, S. Der Witz und seine Beziehung zum Unterbewusstsein. Gustav Kiepenheuer Verlag. 1985.
19. [Fries96] Fries, N. Sprachsystem und Emotionen. In: Sprache und Subjektivität. Klein, W. (ed.). LiLi, pp. 37-69. 1996.
20. [Helbig01] Helbig, H. Die semantische Struktur natürlicher Sprache: Wissensrepräsentation mit MultiNet. Springer. 2001.
21. [Hill et al. 03] Hill, R. W., Jr., Gratch, J., Marsella, S., Rickel, J., Swartout, W., Traum, D. Virtual Humans in the Mission Rehearsal Exercise System. In: KI Magazine. November 2003.
22. [Hommel04] Hommel, S. A. Regular Expressions. URL: http://java.sun.com/docs/books/tutorial/extra/regex. 2004.
23. [Jahr00] Jahr, S. Emotionen und Emotionsstrukturen in Sachtexten. Ein interdisziplinärer Ansatz zur qualitativen und quantitativen Beschreibung der Emotionalität von Texten. Walter de Gruyter. 2000.
24. [Kaas01] Kaas, J. H. (ed.). The Mutable Brain: Dynamic and Plastic Features of the Developing and Mature Brain. Harwood. 2001.
25. [Knoll et al. 03] Knoll, A. C., Burgard, W., Christaller, T. In: Handbuch der Künstlichen Intelligenz. Görz, G., Rollinger, C. R., Schneeberger, J. (eds.). 4. Auflage. Oldenbourg Verlag. 2003.
26. [Koch et al. 00] Koch, S., Küssner, U., Stede, M. In: Verbmobil: Foundations of Speech-to-Speech Translation. Wahlster, W. (ed.). Springer. 2000.
27. [Lazarus91] Lazarus, R. S. Emotion and Adaptation. Oxford University Press. New York [u.a.]. 1991.
28. [Leech et al. 94] Leech, G. N., Svartvik, J. A Communicative Grammar of English. 2nd edition. Longman Group Limited. 1994.
29. [Lesser et al. 99] Lesser, V., Atighetchi, M., Benyo, B., Horling, B., Raja, A., Vincent, R., Wagner, T., Xuan, P., Zhang, S. A Multi-Agent System for Intelligent Environment Control. In Proceedings of the Third International Conference on Autonomous Agents. URL: http://mas.cs.umass.edu/paper/120. January 1999.
30. [LDC03] Linguistic Data Consortium. Position 2284553. URL: http://www.ldc.upenn.edu. 2003.
31. [Lohnstein00] Lohnstein, H. Satzmodus – konzeptionell. Zur Parametrisierung der Modusphrase im Deutschen. Akademie Verlag. Berlin. 2000.
32. [Martin80] Martin, R. Ch. Soziolinguistik und Situationsbegriff. Die vernachlässigte Rolle der Situation in soziolinguistischen Untersuchungen und ihre pädagogische Interpretation. 1980.
33. [Mees91] Mees, U. Die Struktur der Emotionen. Hogrefe. 1991.
34. [MerriamWebster03] Merriam-Webster Online. URL: http://www.m-w.com. 2003.
35. [Mitchell97] Mitchell, T. M. Machine Learning. McGraw-Hill International Press. 1997.
36. [MIT03] Affective Computing Research Group. MIT. URL: http://affect.media.mit.edu/AC_research/emotions.html. 2003.
37. [MPAA04] The Motion Picture Association of America. URL: http://www.mpaa.org/home.htm. 2004.
38. [Norvig95] Russell, S. J., Norvig, P. Artificial Intelligence: A Modern Approach. Prentice-Hall International. 1995.
39. [Ortony et al. 88] Ortony, A., Clore, G. L., Collins, A. The Cognitive Structure of Emotions. Cambridge University Press. 1988.
40. [Philpott04] Philpott, P. Peter Philpott's English Idioms: Featuring Special Examples from St. Petersburg. URL: http://home.t-online.de/home/toni.goeller/idioms/index.html. 2004.
41. [Picard97] Picard, R. W. Affective Computing. The MIT Press. 1997.
42. [Reisig98] Reisig, W. Elements of Distributed Algorithms. Modeling and Analysis with Petri Nets. Springer. 1998.
43. [Ruoff90] Ruoff, A. Häufigkeitswörterbuch gesprochener Sprache : gesondert nach Wortarten alphabetisch, rückläufigalphabetisch und nach Häufigkeit geordnet / Arno Ruoff. - 2., unveränd. Aufl. . - Tübingen : Niemeyer, 1990. 44. [Salter95] Salter, F. K. Emotions in command. A Natural Study of Institutional Dominance. Forschungsstelle für Humanethologie in der Max-Planck-Gesellschaft, Andechs, Germany. Oxford University Press. 1995. 45. [Scheler et al. 97] Scheler, G., Fischer, K. The Many Functions of Discourse Particles: A computational model of pragmatic interpretation. In Proceedings of CogSci. 1997. 46. [SchmidtAtzert81] Schmidt-Atzert, L. Emotionspsychologie. Verlag Kohlhammer. Stuttgart [u.a.]. 1981. 47. [Sun04] Java 2 Platform, Enterprise Edition. URL: http://java.sun.com/j2ee/download.html. 2004. 48. [Urbig et al. 03] Urbig, D. / Monett Diaz, D. / Schröter, K. Introducing the IPS framework to model altruistic relationbuilding agents in negotiation. Submitted to the International Joint Conference on Artificial Intelligence. 2003. 49. [Verbmobil04] Verbmobil Projekt. DFKI. URL: http://verbmobil.dfki.de. 2004. 50. [Vicente et al. 00] de Vicente, A., Pain, H. A Computational Model of Affective Educational Dialogues. Papers from the 2000 AAAI Fall Symposium: Building Dialogue Systems for Tutorial Applications, North Falmouth, Massachusetts, November 3-5, 2000. Technical Report FS-00-01, AAAI Press, Menlo Park, CA, USA. pp 113-121. URL: http://www.iac.es/galeria/ angelv/papers/AVicente00.pdf . 2000. 51. [Weigand03] Weigand, E. Emotions: The simple and the complex. URL: http://www.uni-muenster.de/Ling/Emotions_preliminary.pdf. 52. [Weizenbaum76] Weizenbaum, J. ELIZA – A Computer Program fort the Study of Natural Language Communication between Man and Machine. Communication of the ACM, Volume 9. 1976. 53. [Williams03] Williams, G. Figures of Speech. URL: http://www.nipissingu.ca/faculty/williams/figofspe.htm. 2003. 54. [Wood et al. 
03] Wood, M., Craggs, R., Fletcher, I., Maguire P. Rare Dialogue Acts Common in Oncology Consultations. URL:
69
http://www.cs.man.ac.uk/~craggsr7/papers/RareDialogActs.pdf. 2003. 55. [Wright95] Wright R. The Moral Animal. Why we are the way we are: the new science of evolutionary psychology. Vintage Books. 1995. 56. [Yarowsky92] Yarowsky, D. Word-Sense Disambiguation Using Statistical Models of Roget's Categories Train on Large Corpora. University of Texas at Arlington. Department of computer science and engineering. URL: http://ranger.uta.edu/~alp/ix/ readings/coling92WordSenseRogets.pdf. 2004.
Glossary
Context model
    An environment representation of the discussion ..................... 21, 61
Context variables
    Variables defining the environment of the discussion ..................... 9
Numerical form
    The numerical representation of the values of the context variables ..... 24
Discussion
    A conversation in a group of people ...................................... 2
Discussion scenario
    A description of a situation in which a conversation takes place ......... 3
Dominant emotional set
    A set of emotional states that are the strongest ones at a particular
    moment .................................................................. 25
Emotional appraisal
    The numerical representation of emotional intensity ..................... 21
    Formula ................................................................. 25
Emotional cluster
    A group of emotions ..................................................... 10
Emotional contribution
    A number measuring the degree of belonging of a word to a particular
    cluster ................................................................. 25
Emotional state
    A state in the HMM corresponding to an emotion .......................... 19
Emotional word
    A word that can be ascribed an emotional meaning ........................ 15
Hidden Markov models (HMM)
    A probabilistic state automaton ......................................... 19
HMM of emotions
    A probabilistic state automaton whose states represent emotions ..... 21, 61
Learning rate
    The parameter setting the weight changes in the HMM ............ 26, 27, 32
Movie review scenario ....................................................... 48
Personality
    Emotional characteristics of a person taking part in a discussion ........ 2
Shift negotiation scenario ................................................... 3
Threshold value
    A cumulative value setting the emotional intensity for a transition in
    the HMM ............................................................. 27, 32
Utterance
    A phrase that either can arise in a dialogue or is an analyzed text ...... 3
Whose utterances
    Parameter identifying the person whose utterances are analyzed
    (Speaker A, Speaker B, All utterances) .............................. 32, 33