Towards recognizing emotion with affective dimensions through body gestures

P. Ravindra De Silva, Minetada Osano, Ashu Marasinghe
Software Engineering Laboratory, University of Aizu, Aizu Wakamatsu 965-8580, Japan
Tel: (+81) 242 372790, Fax: (+81) 242 372753, email: {d8052201,osano,ashu}@u-aizu.ac.jp

Ajith P. Madurapperuma
Faculty of Information Technology, University of Moratuwa, Colombo, Sri Lanka
Tel: (+94) 11 4619777, Fax: (+94) 11 4619774, email: [email protected]

Abstract

Due to the ever-increasing importance of computers in many areas of today's society, such as e-Learning, tele home-health care, and entertainment, their ability to interact well with humans is essential. Currently, researchers are using facial expression and voice recognition modalities to create systems that interact with humans. But two problems still exist: gesture is not yet considered as a channel of affective communication in interactive technology, and existing systems only model discrete categories but not affective dimensions, e.g., intensity. Our focus has been on creating an affective gesture recognition system that recognizes a child's emotion with intensity through body gestures in the context of a game. This information is then used by a game control module that uses a rule-based adaptation model to change the game level according to the child's intensity of emotion. Results show that the affective gesture recognition model recognized the child's emotion in over 79% of the cases, and that the proposed intensity estimation model has a strong relationship with observer perception except at the low intensity level.

1 Introduction

Currently, a common method used to recognize affective states is to study facial expressions. We acknowledge the importance of using modalities such as face and voice for identifying the emotional state of a human. However, it has been noted that the use of whole body postures to identify human emotional state is an important and novel research area [1]. Some researchers are considering emotions as discrete categories to create an emotional model through body postures [4]. But two problems still exist: gesture is not yet considered as a channel of affective communication in interactive technology, and existing systems only model discrete categories, but not affective dimensions, e.g., intensity.

A major question addressed by research into the use of computational models is what types of emotional factors need to be modeled. Modeling the intensity of emotion has not received much attention from researchers in affective computing, machine vision, and robotics. However, in the virtual environment and synthetic autonomous agent literature, researchers are implementing computational models for estimating the intensity of emotions. One study [13] implemented a computational model in a synthetic autonomous agent called Cathexis. By considering both cognitive and non-cognitive elicitors of emotions, such as sensorimotor and motivational states, Cathexis models six basic emotions: anger, fear, distress/sadness, enjoyment/happiness, disgust, and surprise. The intensity of emotions was estimated using levels of intensity-dependent factors with emotion elicitors. This model is limited because these values and weights have to be predefined. Nakajima [12] proposed an emotional model for a life-like agent. In this model he proposed an intensity estimation model that considers several levels of emotion-dependent factors. It uses fuzzy inference rules to implement the emotion-eliciting condition rules, and several threshold values to estimate the intensity of emotions.

One study [14] created a chat system that can be used to capture information about a user's affective state, which in turn could be used to increase the user's involvement in the conversation. They used Galvanic Skin Response (GSR) sensors to detect arousal and manually controlled the valence. However, measuring physiological signals is not an easy task in normal situations. Thus, there is a need for computational models for estimating the intensity of emotions. In this work, an affective gesture recognition system that can recognize emotion with intensity in a game scenario was developed. The rest of this paper presents this work in three parts: a description of the affective gesture recognition system (Sec. 2); the system architecture (Sec. 3); and training and testing the affective gesture recognition model (Sec. 4).




2 Description of the affective gesture recognition system

Children playing games express many emotions according to whether they win or lose. Therefore, our system attempts to recognize a child's emotions and estimate their intensity in a game-playing situation. We propose two separate models: one to recognize emotion and one to estimate its intensity. Body gesture information classified by a Hidden Markov Model (HMM) is mainly used to recognize the emotion; the HMM needs to be trained prior to its use. Body expression, the probability of winning/losing the game, excitatory/inhibitory effects, sensorimotor information (body expression), and emotion elicitor parameters are used to estimate the intensity of emotion.

Computer games are often designed specifically to beat humans. Broekens [6] argues that game developers try to reduce the human's performance level rather than trying to increase it. Our aim in this work is to create an interactive system that can recognize the affective states of children and the intensity of these states in real time, and then to propose an emotion-intensity-based adaptation model to increase the performance of the child and encourage them to express their emotions.

Figure 1 depicts an overview of the system. It is based on a game played by two children, each controlling the same game from a different computer. One of the children wears a motion capture system (in the figure that child is marked with a rectangular boundary). The main control system resides on an observer's computer. The observer will gauge the affective states of the child and give feedback about them. In this game, a child controls a line that can be moved within the game square. The objective of the game is to obtain the longest line while attempting to block the opponent's line. When a player blocks the opponent's line, the length of each line is measured and the player with the longest line is declared the winner. If one child's line moves faster, he can obtain a longer line more quickly than the other child; therefore, winning depends on the speed of the line.

Figure 1. Overall setup of the gesture recognition system.

3 The system architecture


Figure 2 illustrates the system architecture. The system consists of seven functional modules: gesture descriptions, affective gesture recognition model, intensity estimation module, observer interface, feedback processing, game control module, and game information. The observer module (interface update) presents the recognized emotion and its intensity to the observer. The observer gauges the child's emotion and its intensity and gives feedback, which the system uses to check the accuracy of the gesture recognition model and the intensity estimation model; this process is executed in the feedback module. The game control module controls the game (the moving speed of the line) according to the intensity of the child's emotion. It tries to encourage the child to interact more naturally with the game by assisting the child to win or lose by changing the moving speed of the lines; it always tries to keep the child's negative emotions at a low intensity level and positive emotions at a middle intensity level. The game information module takes game information from the children's computers and distributes it to the other modules.

3.1 The gesture descriptions module

The gesture descriptions module obtains body features by analyzing motion information collected through a motion capture system. The Xsens [15] motion capture system was used with 8 markers to capture motion information, according to the results of our previous work [5]. These 8 markers were attached to the child's upper body, and each marker gives Euler angles according to a user-defined coordinate system. Three Euler angle measurements were obtained from each sensor, providing a total of 24 measurements to describe affective gestures.

A major problem in a real-time affective gesture recognition system is that the whole data sequence cannot be used to recognize affective states in real time, due to the high time complexity of estimating the model's parameters: within 1 minute the motion capture system produces on the order of 1000 frames of body motion information, and using all of those frames makes parameter estimation too slow for real-time recognition. Therefore, the most important key frames describing the gestures were selected. Suppose that a set of key points (key frames) can be defined as $[\alpha_i^a, t_i]$, where $\alpha_i^a$ is the Euler angle of sensor $a$ at frame $i$ and $t_i$ is the time at frame $i$. A key point is then created if it is a local minimum or maximum, with a sufficient angle difference $\beta_1$ and time lag $\beta_2$ between two consecutive key points, i.e., if the following condition is satisfied:

$$KF_i^a = |\alpha_i^a - \alpha_{i-1}^a| > \beta_1,\quad |t_i - t_{i-1}| > \beta_2 \tag{1}$$

where $\beta_1$ and $\beta_2$ are thresholds set to 0.2 [rad] and 0.1 [sec], respectively. This procedure is applied to one set of data per sensor and Euler angle, and repeated for all the sensors. 100 key frames were selected for each gesture. The selected frames, with gesture description information, are sent to the affective gesture recognition module.
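To make the selection rule concrete, here is a minimal Python sketch of Eq. (1) under the stated thresholds. The function and variable names are illustrative, not taken from the paper, and the local-extremum test is our reading of the key-point definition.

```python
BETA1 = 0.2   # [rad] minimum angle difference between consecutive key points
BETA2 = 0.1   # [sec] minimum time lag between consecutive key points

def select_key_frames(angles, times, beta1=BETA1, beta2=BETA2):
    """angles: per-frame Euler angle of one sensor axis; times: timestamps."""
    key_frames = []
    last_angle, last_time = angles[0], times[0]
    for i in range(1, len(angles) - 1):
        # local minimum or maximum: the slope changes sign at frame i
        is_extremum = (angles[i] - angles[i - 1]) * (angles[i + 1] - angles[i]) < 0
        if (is_extremum
                and abs(angles[i] - last_angle) > beta1
                and times[i] - last_time > beta2):
            key_frames.append(i)
            last_angle, last_time = angles[i], times[i]
    return key_frames
```

Running this once per sensor axis and pooling the results reproduces the per-sensor, per-angle procedure described above.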


3.2 Affective gesture recognition model

We used HMMs [10] to recognize the child's affective states. Once these models are trained, we can compute, for a given sequence of observations (body features of motion sequences in our case), the probability that the sequence was produced by each of the models. The sequence of observations is then assigned to the affective state whose model has the highest probability of generating it.
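As an illustration of this train-then-score scheme, here is a hedged Python sketch assuming the hmmlearn library; the paper does not specify its HMM implementation, topology, state count, or training procedure, so those choices are assumptions.

```python
import numpy as np
from hmmlearn import hmm

def train_models(sequences_by_emotion, n_states=5):
    """Train one Gaussian HMM per emotion from lists of feature
    sequences (each sequence: key frames x 24 Euler-angle features)."""
    models = {}
    for emotion, seqs in sequences_by_emotion.items():
        X = np.vstack(seqs)                # stack all frames
        lengths = [len(s) for s in seqs]   # per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=100)
        m.fit(X, lengths)
        models[emotion] = m
    return models

def recognize(models, sequence):
    """Assign a (frames x features) sequence to the emotion whose
    model gives it the highest log-likelihood."""
    return max(models, key=lambda e: models[e].score(sequence))
```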

3.3 Computational model for estimating the intensity of emotion

Figure 2. System architecture for real-time emotion recognition with intensity.


Several researchers consider cognitive and non-cognitive appraisals as emotion-dependent factors, and the levels of those factors help to estimate the intensity of emotions. The proposed intensity estimation model was created by combining several psychological aspects and existing computational models in the synthetic agent literature [3]. It uses three intensity-dependent factors for estimating the intensity of emotion: excitatory/inhibitory emotions, the emotion elicitor, and the previous intensity level with game states. The following exponential equation is used to estimate the intensity of emotions. Estimated intensity values vary between 0 and a saturation value (maximum value), where $I_{E_k^t}$ represents the intensity of emotion $E_k$ at time $t$ and $x_{E_k^t}$ represents the intensity-dependent factors for emotion $E_k$ at time $t$. A brief description of the means of estimating the emotion intensity dependent factors, and some theoretical background for those estimations, is given next.

$$I_{E_k^t} = \frac{1}{1 + \exp\left(\frac{-x_{E_k^t} + 0.5}{0.1}\right)} \tag{2}$$

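Equation (2) is a logistic squashing function; a direct transcription in Python (the function name is ours) reads:

```python
import math

def emotion_intensity(x):
    """Eq. (2): map the summed intensity-dependent factors x to [0, 1]."""
    return 1.0 / (1.0 + math.exp((-x + 0.5) / 0.1))
```

For example, x = 0 gives an intensity near 0, x = 0.5 gives exactly 0.5, and x = 1 is already close to the saturation value of 1.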

Excitatory/inhibitory emotions: Minsky [9] argues that some emotions send an excitatory/inhibitory signal, whose strength is proportional to their intensity (negative/positive feedback), that affects the intensity of other emotions. Minsky calls this concept the "avalanche effect". We consider this type of interaction when estimating the intensity of emotions. Information gain theory can be used to find excitatory and inhibitory emotions. Suppose at time $t$ there exists an emotion $E_k^t$, and the rest of the emotions at time $t-1$ are $E_i^{t-1}$, where $i = 1..n$ and $i \neq k$. If emotions $E_i^{t-1}$ and $E_j^{t-1}$ are informative for emotion $E_k^t$, then the information gains $Gain(E_k^t, E_i^{t-1})$ and $Gain(E_k^t, E_j^{t-1})$ are greater than 0. According to the psychological literature, emotions of the same valence have an excitatory effect on each other and emotions of different valence have an inhibitory effect on each other. Suppose $E_i^{t-1}$ ($i = 1..n$, $i \neq k \neq j$) is excitatory; then $Gain(E_k^t, E_i^{t-1}) > 0$. Likewise, if $E_j^{t-1}$ ($j = 1..n$, $j \neq k \neq i$) is inhibitory, then $Gain(E_k^t, E_j^{t-1}) > 0$. The information gain values



with their previous intensity values become one of the emotion intensity dependent factors, where $I_{E_i^{t-1}}$ is the intensity of emotion $E_i^{t-1}$ at time $t-1$ ($i \neq k \neq j$), $En$ denotes entropy, and $Pr$ denotes probability:

$$Gain(E_k^t, E_i^{t-1}) = En(E_k^t) - Pr(E_i^{t-1})\,En(E_i^{t-1})$$

$$\sum_i Gain(E_k^t, E_i^{t-1})\,I_{E_i^{t-1}} - \sum_j Gain(E_k^t, E_j^{t-1})\,I_{E_j^{t-1}} \tag{3}$$
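A sketch of how Eq. (3) might be computed, assuming binary entropies and treating the probability estimates as given inputs; all names are illustrative, and the simplified gain form follows the expression above.

```python
import math

def binary_entropy(p):
    """En(E): entropy of an event occurring with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def gain(p_k, p_prev):
    """Gain(E_k^t, E_i^{t-1}) in the simplified form above."""
    return binary_entropy(p_k) - p_prev * binary_entropy(p_prev)

def excitatory_inhibitory_factor(p_k, excitatory, inhibitory):
    """excitatory / inhibitory: lists of (probability, previous intensity)
    pairs for the emotions E_i^{t-1} and E_j^{t-1}; returns Eq. (3)."""
    return (sum(gain(p_k, p) * inten for p, inten in excitatory)
            - sum(gain(p_k, p) * inten for p, inten in inhibitory))
```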

Emotion elicitor: Some researchers propose cognitive and non-cognitive processes that elicit emotions [11][8]. The computational model proposed in this work uses a combination of these two concepts for estimating the emotion elicitor number. Suppose at time $t$ the game win/loss probability is $P(G^t)$ and the current emotion is $E_k^t$ with probability $P(E_k^t)$. For each emotion, $P(E_k^t)$ is determined by the output of the affective gesture recognition model (based on the HMM). The speed of the line the child builds is a controllable parameter; by controlling it, it is possible to affect the probability that the child loses or wins a game. The probability of winning a game by altering the speed of the line is called $P(H^t)$. The gesture recognition model recognizes emotion states according to the highest-probability sequence of body features; that probability, $P(E_k^t)$, represents the sensorimotor information. For emotion $E_k^t$ to occur, a set of alternative events is involved: game information (winning or losing the game), $P(G^t)$, and alternative help to win or lose the game, $P(H^t)$ (power appraisal factors). Therefore the probability of success $P(E_{ks}^t)$ (for emotion $E_k^t$ to occur) can be defined as $P(E_{ks}^t) = P(E_k^t|G^t)P(G^t) + P(E_k^t|H^t)P(H^t)$. The entropy of success, $P_{sf_k^t}$, is defined as the emotion elicitor number. Therefore the second intensity-dependent factor can be defined as

$$P_{sf_k^t} = -\left[P(E_{ks}^t)\ln P(E_{ks}^t) + (1 - P(E_{ks}^t))\ln(1 - P(E_{ks}^t))\right] \tag{4}$$
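The elicitor number of Eq. (4) is simply the binary entropy of the success probability; a minimal sketch, with illustrative argument names for the conditional and prior probabilities:

```python
import math

def elicitor_number(p_e_given_g, p_g, p_e_given_h, p_h):
    """Eq. (4): binary entropy of the success probability P(E_ks^t)."""
    p_s = p_e_given_g * p_g + p_e_given_h * p_h   # P(E_ks^t)
    if p_s <= 0.0 or p_s >= 1.0:
        return 0.0
    return -(p_s * math.log(p_s) + (1.0 - p_s) * math.log(1.0 - p_s))
```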

Previous intensity level with game states: Ekman [7] proposed two ways of understanding the cause of moods. One is that moods can be caused by changes in biochemical states, which might in turn be caused by changes such as lack of sleep or lack of food. The second cause could be the repetition of a specific emotion of very high intensity at high frequency. For instance, repeated occurrences of joyful emotions usually modify an individual's emotional state and induce him or her to be in a good or happy mood. Adopting Ekman's theory with the Cathexis model [13], the previous intensity is considered as the mood. Further, mood is considered to change according to game

states. Thus, mood with game states is considered as the third emotion intensity dependent factor. Suppose it is required to estimate the intensity of emotion $E_k^t$ at time $t$, $I_{E_k^t}$, and it is known that at time $t-1$ the intensity of emotion $E_k^{t-1}$ was $I_{E_k^{t-1}}$. Suppose at time $t$ the game win/loss probability is $Pr(G^t)$. Information gain can then be used to check how game information affects emotion $E_k^t$ at time $t$: $Gain(E_k^t, Pr(G^t)) > 0$. If the information gain value is multiplied by the previous intensity value, it is possible to find out how this intensity-dependent factor (previous intensity with game information) helps to increase or decrease the intensity of emotion $E_k^t$. It can then be defined as:

$$I_{E_k^{t-1}} \cdot Gain(E_k^t, Pr(G^t)) \tag{5}$$

Finally, the total intensity factor is obtained by summing the above three terms (Equations 3, 4, and 5). This total value $x_{E_k^t}$ is applied to Equation (2), and the resulting $I_{E_k^t}$ gives the intensity of emotion $E_k^t$ at time $t$.
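Putting the pieces together, a short sketch of this final step (the sum of the three factors fed into the sigmoid of Eq. (2), restated inline); the function name is ours:

```python
import math

def total_intensity(excit_inhib, elicitor, mood_with_game):
    """Sum the three factors (Eqs. 3, 4, 5) and apply Eq. (2)."""
    x = excit_inhib + elicitor + mood_with_game
    return 1.0 / (1.0 + math.exp((-x + 0.5) / 0.1))
```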

3.4 The game control module

In the current system the game control module is responsible for controlling the game according to the intensity of the child's emotions. In other words, the game control module tries to adjust game states by demonstrating behavior similar to that of a human partner. As explained previously, the intensity of emotion can vary between 0 and its saturation value (maximum value). The game control module creates 3 intensity levels between 0 and the saturation value. It uses two rules for adapting the game: for positive emotions, always try to keep the child's affective state at level 2, and for negative emotions, try to keep it at level 1. The game control module can control the speed of the child's lines to change the affective state level of both positive and negative emotions; it changes the probability of winning a game by altering the speed of the line. Suppose that a child's emotion is at one of the intensity levels and the game control module should try to change the intensity of emotion to another level. To achieve this change, a suitable $Pr(H)$ value (changing the probability of winning a game by altering the speed of the line) needs to be found by varying the intensity values between the minimum and maximum of the target intensity level so as to maximize the value of $\alpha$ defined below. Suppose at time $t$ emotion $E_k^t$ exists and the rest of the emotions are $E_i^t$, where $i = 1..n$ and $i \neq k$. Suppose $E_i^t$ ($i = 1..n$, $i \neq k \neq j$) are excitatory emotions, $E_j^t$ ($j = 1..n$, $j \neq k \neq i$) are inhibitory emotions, and $P_{sf}$ is the emotion elicitor (game information).



$$\alpha_k^t = Gain(E_k^t, P_{sf_k^t}) + \sum_i Gain(E_k^t, E_i^t)\,I_{E_i^t} - \sum_j Gain(E_k^t, E_j^t)\,I_{E_j^t} \tag{6}$$

The rationale for this is the need to explore all possible ways to increase or decrease the intensity of emotion $E_k$.
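A hedged sketch of the rule-based adaptation described in this section follows. The paper does not give an explicit speed-update rule, so mapping the intensity level to a fixed speed step, and all names here, are assumptions.

```python
POSITIVE = {"happy", "joyful"}

def intensity_level(intensity, saturation=1.0):
    """Map an intensity in [0, saturation] to levels 1-3 (same cut
    points as Section 4.2: 0-0.25, 0.25-0.75, 0.75-1)."""
    if intensity < 0.25 * saturation:
        return 1
    if intensity < 0.75 * saturation:
        return 2
    return 3

def adapt_speed(emotion, intensity, speed, step=0.1):
    """Nudge the child's line speed so the intensity moves toward the
    target level (level 2 for positive emotions, level 1 for negative);
    a faster line raises the chance of winning."""
    target = 2 if emotion in POSITIVE else 1
    delta = target - intensity_level(intensity)
    if delta == 0:
        return speed
    # winning intensifies positive emotions and damps negative ones
    direction = 1 if (delta > 0) == (emotion in POSITIVE) else -1
    return speed + direction * step
```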


4 Training and testing the affective gesture recognition model

Several psychological experiments were conducted to identify the kinds of affective states children reach while playing the game. The experimental results show that the main emotions expressed by children are sad, frustrated, joyful, and happy. These 4 emotions were used to evaluate the gesture recognition model and the intensity estimation model presented in this work. In the testing process, pairs of children aged 10-11 were asked to play the game; one child wore the motion capture system and this child's affective states were monitored. The HMM had to be trained before its use as a gesture recognition model. For training, gestures from 30 children were collected, together with teachers' feedback about the children's affective states corresponding to each gesture. 30 affective gestures for each of the emotions were selected and used for training the model. When the trained HMM was tested using the same data set, a high recognition rate (over 85.6%) was achieved. This trained HMM was then used for real-time emotion recognition. In real time, the game was connected to the main system (on the observer's computer). After the completion of each game all the information is displayed on the observer's interface, and the observer inputs information on the child's affective state and its intensity. 20 children were used to test the model and each child was asked to play the game 5 times. For each game the following data were recorded: the emotion recognized by the HMM, the estimated intensity of emotion, and the observer's feedback about the emotion and its intensity. At the end of the experiments 100 emotional gestures had been obtained; according to the observer's feedback there were 30 affective gestures for sad, 24 for frustrated, 22 for happy, and 24 for joy. Figure 3 shows examples of children affectively expressing each emotion, with intensity.

4.1 Concordance between recognition model and observer's feedback

The first task carried out during the experiments was to test the recognition power of the gesture recognition model (HMM).

Figure 3. Examples of children affectively expressing each emotion, with intensity: (a) Sad, intensity = 0.61; (b) Frustrated, intensity = 0.72; (c) Happy, intensity = 0.52; (d) Joy, intensity = 0.87.

Overall, the confusion matrix shows a 79% agreement between the observer's feedback and the gesture recognition model (HMM). To test the significance of this agreement (the null hypothesis being that the observer's feedback is independent of the gesture recognition model's evaluations), the χ² test was applied to the recognition model evaluations and the observer's feedback data. The resulting value of 159.78 for 9 degrees of freedom is much higher than the critical value of 27.877 for α = 0.001, implying that the null hypothesis can be rejected and that the two groups have a significant relationship.
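For reproducibility, a minimal sketch of such a test using scipy; the 4x4 confusion matrix below is a placeholder consistent with the reported per-emotion totals and 79% diagonal agreement, not the paper's actual matrix.

```python
from scipy.stats import chi2_contingency

confusion = [[24, 2, 2, 2],    # rows: observer labels (sad, frustrated,
             [3, 18, 2, 1],    #       happy, joy)
             [1, 2, 17, 2],    # cols: model labels, same order
             [2, 1, 1, 20]]
chi2, p, dof, expected = chi2_contingency(confusion)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")   # dof = 9 for 4x4
```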

4.2 Relationship between model-estimated intensity and observer's feedback intensity

To check the relationship between the model-estimated intensity and the observer's feedback intensity, 79 emotions were selected, these being the cases in which the observer and the gesture recognition model agreed. Analysis of Variance (ANOVA) was used to check this relationship. A linear trend (F(1,78) = 112.79, P < .001) was observed in the relationship between estimated intensity and observer's feedback intensity. This evaluation is not enough to study which intensity levels show significant differences; in other words, it was required to check at which intensity levels the proposed computational model performs well. To evaluate this, 3 intensity levels between 0 and the saturation value (maximum value, equal to 1) were created: intensity 0 to 0.25 as level 1, intensity 0.25 to 0.75 as level 2, and intensity 0.75 to 1 as level 3.


Table 1. Significant differences between levels of model-estimated intensity and observer's feedback intensity (significant differences marked with *).

Emotion      Intensity level   F and P-value
Sad          level 1           F(1,4) = 9.5,  P = 0.037 *
Sad          level 2           F(1,8) = 4.5,  P = 0.067
Sad          level 3           F(1,12) = 3.6, P = 0.082
Frustrated   level 1           F(1,5) = 6.45, P = 0.052
Frustrated   level 2           F(1,5) = 6.3,  P = 0.053
Frustrated   level 3           F(1,6) = 5.4,  P = 0.068
Happy        level 1           F(1,5) = 7.6,  P = 0.04 *
Happy        level 2           F(1,6) = 4.5,  P = 0.078
Happy        level 3           F(1,6) = 3.32, P = 0.11
Joy          level 1           F(1,8) = 8.25, P = 0.02 *
Joy          level 2           F(1,7) = 5.43, P = 0.053
Joy          level 3           F(1,7) = 4.31, P = 0.08

Then each of the emotion categories was considered separately and it was checked whether the intensity levels differed significantly. F and P values for the differences between intensity levels can be seen in Table 1, where significant differences are marked with an asterisk. The results show that there are significant differences at intensity level 1, except for the emotion "frustrated", whose p-value at level 1 was nevertheless very close to significant. This confirms results achieved through psychological experiments [2], in which it was always very difficult for observers to perceive low-level intensity; an observer can recognize a high intensity level much more easily than a low one.
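As an illustration, a per-level comparison of this kind could be run with scipy's one-way ANOVA (two groups give the F(1, n-2) statistics reported in Table 1); the intensity values below are placeholders, not the experimental data.

```python
from scipy.stats import f_oneway

# model-estimated vs observer-rated intensities for one emotion at one level
model_vals    = [0.12, 0.18, 0.21]
observer_vals = [0.20, 0.24, 0.23]
F, p = f_oneway(model_vals, observer_vals)
print(f"F = {F:.2f}, p = {p:.3f}")
```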

5 Conclusion

Recent research in affective computing has attempted to use emotional states to create interactive systems that adapt to individual users. Results of this research have been used in application areas such as e-Learning, robotics, and affective computing. However, many of these systems lack adaptability and were designed to respond in predetermined ways to specific situations. In our work we propose a dynamic and flexible emotion intensity estimation model that can be used to adapt game states to change the intensity of a child's emotion. Estimating the intensity of emotion is one of the most relevant areas of research in affective computing, human-computer interaction, and the robotics literature, but has so far had limited success. Previous studies in this area have proposed computational models for estimating the intensity of emotion for emotion expression, but not for emotion recognition. Results of the experiments with

the proposed model show that the gesture recognition module of the system can recognize a child's emotions in real time with a high recognition rate of over 79%. The proposed intensity estimation model has a strong relationship with observers' perception (observer feedback), except in the case of low intensity levels, which corresponds to findings reported in psychological experiments. The game control module used the level of emotion intensity, rather than the emotion category alone, to adapt the game.

References

[1] M. Argyle. Bodily Communication, 2nd edition. London: Methuen & Co. Ltd., 1998.
[2] C. Bartneck, J. Reichenbach, and A. van Breemen. In your face, robot! The influence of a character's embodiment on how users perceive its emotional expressions. In Proceedings of Design and Emotion, 2004.
[3] D. Bui, D. Heylen, M. Poel, and A. Nijholt. ParleE: An adaptive plan-based event appraisal model of emotions. In KI 2002: Advances in Artificial Intelligence. Springer-Verlag, 2002.
[4] M. Coulson. Attributing emotion to static body postures: recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior, 28:117-139, 2004.
[5] P. R. de Silva and N. Bianchi-Berthouze. Modeling human affective postures: An information theoretic characterization of posture features. Journal of Computer Animation and Virtual Worlds, 15:269-276, 2004.
[6] D. DeGroot and J. Broekens. Using negative emotions to impair game play. In 15th Belgian-Dutch Conference on Artificial Intelligence, 2003.
[7] P. Ekman. Moods, emotions, and traits. In The Nature of Emotion: Fundamental Questions. New York: Oxford University Press, 1994.
[8] C. E. Izard. Four systems for emotion activation: Cognitive and noncognitive processes. Psychological Review, 100(1):68-90, 1993.
[9] M. Minsky. The Society of Mind. New York: Simon & Schuster, 1986.
[10] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.
[11] I. J. Roseman, P. Jose, and M. S. Spindel. Appraisals of emotion-eliciting events: Testing a theory of discrete emotions. Journal of Personality and Social Psychology, 59(5):899-915, 1990.
[12] H. Ushida, Y. Hirayama, and H. Nakajima. Emotion model for life-like agent and its evaluation. AAAI, 1997.
[13] J. Velasquez. Modeling emotions and other motivations in synthetic agents. AAAI, 1998.
[14] H. Wang, H. Prendinger, and T. Igarashi. Communicating emotions in online chat using physiological sensors and animated text. In Proceedings of CHI'04, 2004.
[15] Xsens motion capture system. http://www.xsens.com/.
