Based on Support Vector Regression for Emotion Recognition using Physiological Signals

Chuan-Yu Chang, Senior Member, IEEE, Jun-Ying Zheng, and Chi-Jane Wang

Abstract—Facial expressions are widely used for emotion recognition. However, because different people express emotions differently, recognition based on facial expressions alone is unavoidably inaccurate. Physiological reactions, in contrast, are governed by the autonomic nervous system, so the reactions and their corresponding signals are hard to control voluntarily when emotions are aroused. Therefore, an emotion recognition system that takes physiological signals into account is proposed in this paper. A specifically designed mood induction experiment is performed to collect physiological signals from subjects. Five biosensors are used: electrocardiogram (ECG), respiration, galvanic skin response (GSR), blood volume pulse (BVP), and pulse. Support Vector Regression (SVR) is then used to train three regression curves for three emotions (sadness, fear, and pleasure). Experimental results show that the proposed SVR-based emotion recognition method achieves good accuracy.

I. INTRODUCTION

Human emotional expressions play an important role in human-to-human interaction. General expressions may include word choices, tone of voice, and body language, such as posture and physiological responses. Besides these perceptible reactions, emotional expression is accompanied by changes in physiological signals. Some accessible human physiological signals, including respiration, heart rate, blood volume pressure, finger temperature, skin conductivity, electromyogram (EMG), electrocardiogram (ECG), and electroencephalogram (EEG), are used in medicine, psychology and physiology, the study of mental disorders, and human-computer interaction. In addition, physiological signals can greatly help in assessing and quantifying stress, tension, anger, and other emotions that influence health. In general, physiological reactions are governed by the autonomic nervous system, so the reactions and their corresponding signals are hard to control voluntarily when emotions are aroused. Researchers have therefore used them to determine and classify different kinds of emotion [2]. Leon et al. [3] analyzed four kinds of signals to classify eight classes of emotions. Mauss et al. [4] calculated five characteristics from the ECG to analyze the impact of angry memories on participants. Bailenson et al. [5] combined facial and physiological signals to obtain 251 features and used a feature selection approach to find the significant features of emotions.

An emotion recognition system requires a large number of physiological signals that represent specific emotional states. Kim et al. [6] used dolls and a situational story to establish an environment that induces participants' emotions. Mandryk and Atkins [7] recorded the physiological signals of participants playing video games. Kim and André [1] designed a musical induction method that spontaneously leads subjects to real emotional states without any deliberate lab setting; participants were requested to fill out an emotional model (see Fig. 1). Katsis et al. [8] designed a racing simulation device for participants to experience and measured the participants' excitement level.

Many classifiers have been used in emotion classification, such as Support Vector Machine (SVM) [5-6], Multilayer Perceptron (MLP) [3, 9], Adaptive Neuro-Fuzzy Inference System (ANFIS) [8], and Linear Discriminant Analysis (LDA) [1]. The features extracted from different physiological signals are used as inputs to those classifiers. Since the same emotion produces similar physiological responses, Support Vector Regression (SVR) is adopted in this paper to find the trend curve of each emotion.

The rest of this paper is organized as follows. Section II presents the emotion induction experiment and the extracted physiological signals. The Support Vector Regression approach is described in Section III. Experimental results are given in Section IV. Conclusions are given in Section V.

This work was supported by the National Science Council, Taiwan, under grant NSC 98-2218-E-006-004. Corresponding author: Chuan-Yu Chang, Associate Professor, is with the Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Yunlin, Taiwan 640 (e-mail: [email protected]).

Fig. 1. Emotional model [1].

II. EMOTION INDUCTION AND PHYSIOLOGICAL SIGNALS

A. Emotion Induction

In order to collect emotional physiological signals, an experiment that stimulates the participant's emotion is performed. Audio-visual equipment has been used to induce emotional responses in several studies [5, 10, 11]. In the method of Bailenson et al. [5], participants are requested to watch a 9-minute film clip whose segments are amusing, neutral, sad, and neutral. Nasoz and Lisetti designed a 45-minute slide show to induce five emotions (anger, surprise, fear, frustration, and amusement) [10]. We believe that short videos or static images are insufficient to induce participants' emotions. Therefore, in this study participants are required to watch a complete movie of about 1.5 hours. In the proposed emotion induction experiment, three movies are selected to induce three emotions: sadness, fear, and pleasure. The selected movies are briefly described as follows:

1) Sad Movie is a romantic comedy film released in South Korea in 2005. The movie tells the stories of four relationships and their trials, pains, heartaches, and subsequent separations. This movie is used to induce sadness.

2) The Grudge: Old Lady in White is a 2009 Japanese horror film directed by Ryuta Miyake. The movie describes a curse that is born when someone dies. At a certain house, a son brutally murders all five of his family members after failing the bar exam. He then hangs himself, leaving behind a cassette recorder at the scene on which he can be heard saying, "Go...Go now" in unison with a strange female voice. That voice belongs to a victim of the family massacre. This movie is used to induce fear.

3) Superhero Movie is a 2008 American spoof film directed by Craig Mazin. It is a spoof of superhero films, mainly the first Spider-Man, and of other modern Marvel Comics film adaptations. This movie is used to induce pleasure.

Figure 2 shows the procedure of the emotion induction experiment. In order to provide participants with an undisturbed space in which to watch the emotion induction movies, we prepared an audio-visual room. When the participant sat down, we helped him/her put on the physiological signal devices and described the function of these devices. The participant then rests with eyes closed for 5 minutes, during which all lights are turned off. Next, the participant is requested to fill out a pre-test questionnaire, which mainly surveys the participant's current emotional state. The participant then watches the movie alone in the audio-visual room. When the movie has finished, the participant is requested to fill out a post-test questionnaire, which mainly asks questions about the movie plot. Figure 3 shows a participant watching a movie; the movie is played on a 42" LCD TV. Figure 4 shows a participant wearing five biosensors to collect the emotional physiological signals.

Fig. 2. The flow diagram of the emotion induction experiment: experiment introduction and instruction; wear the five biosensors; take a rest of 5 minutes; fill out the pre-test questionnaire; watch the emotion induction movie; fill out the post-test questionnaire.

Fig. 3. A participant is watching a movie.

Fig. 4. A participant wears five biosensors to collect the emotional physiological signals.

B. Physiological Signals

The physiological signals were acquired using the ML870 PowerLab 8/30 with five biosensors: electrocardiogram (ECG), respiration, galvanic skin response (GSR), blood volume pulse (BVP), and pulse. The sampling rate was 400 Hz for all channels. The ECG was measured from both wrists and the right ankle with the three-electrode method. Respiration was measured from the chest with a flexible strap. GSR was measured with two metal electrodes attached to the index and middle fingers of the right hand. BVP was measured with a clamp containing an infrared device attached to the middle finger of the left hand. Pulse was measured with a piezoelectric device attached to the ring finger of the left hand.

III. METHODOLOGY

The block diagram of the proposed emotion recognition system is shown in Fig. 5. In the training phase, after preprocessing, the physiological signals are input to the SVR, and the sadness, fear, and pleasure trend curves are obtained. In the testing phase, when new data are input to the system, their degree of similarity to each of the three trend curves is calculated. Finally, the emotion of the input data is determined according to the degree of similarity.

A. Preprocessing

There are four processes in the preprocessing stage. A low-pass filter is used to remove noise from the physiological signals. In addition, a high-pass filter is applied to remove the baseline from certain signals. Then, the physiological signal collection software is used to obtain the R-R interval, respiration rate, peak of BVP, and peak of pulse. To reduce the amount of data, the GSR signal is re-sampled at 32 Hz, and the R-R interval, respiration rate, and peaks of BVP and pulse are re-sampled at 4 Hz [1].

In the post-test questionnaire, each question represents a movie plot, and the participant must select an emotion as the answer: sadness, fear, pleasure, or another emotion. From the questionnaire answers, we know which emotion each movie plot induced in the participant. Because emotion is a short-term physiological response, we extract a 10-second instantaneous emotion signal. Moreover, the intensity of each extracted signal is normalized to [0, 1]. These 10-second signals are the input to the Support Vector Regression.

B. Support Vector Regression

Support vector machines (SVMs), proposed by Vapnik et al., are supervised learning machines [12-13]. Since an SVM is capable of generating a hyperplane that separates two data sets and of providing good generalization, it is a powerful classification approach. Moreover, SVMs can also be used to deal with regression problems, in which the label corresponding to each instance is a continuous real number rather than a discrete class. An SVM applied to a regression problem is known as Support Vector Regression (SVR) [14]. Like SVM, the goal of SVR is to find a hyperplane in a feature space. However, whereas SVM finds a hyperplane that separates the data into two parts, SVR finds a hyperplane that accurately predicts the distribution of the original data (shown in Fig. 6). In this paper, SVR is used to find the three emotion trend curves. Suppose a training data set is given by

$$\{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\} \subset \mathcal{X} \times \mathbb{R} \tag{1}$$

where $x_i$ denotes the time index, $y_i$ denotes the corresponding signal intensity of one of the five signals E (R-R interval), R (respiration rate), G (GSR), B (peak of BVP), and P (peak of pulse), and $l$ is the signal length. The parameter $l$ is set to 40 for all signals except GSR.
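The signal lengths follow from the 10-second window and the re-sampling rates given in the preprocessing step (4 Hz gives 40 samples, 32 Hz gives 320 for GSR). The sketch below is a minimal illustration, not the authors' implementation: it reproduces that arithmetic and shows one common way to scale a window to [0, 1]. Min-max scaling is an assumption, since the paper does not state the normalization formula, and the function names are hypothetical.

```python
WINDOW_SECONDS = 10  # length of one extracted emotion signal (from the paper)

# Re-sampling rates in Hz stated in the preprocessing step.
RESAMPLE_HZ = {"rr_interval": 4, "respiration_rate": 4, "gsr": 32,
               "bvp_peak": 4, "pulse_peak": 4}


def signal_length(rate_hz, seconds=WINDOW_SECONDS):
    """Number of samples l in one window at the given re-sampling rate."""
    return rate_hz * seconds


def min_max_normalize(values):
    """Scale a sequence to [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant signal: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


lengths = {name: signal_length(hz) for name, hz in RESAMPLE_HZ.items()}
print(lengths["rr_interval"], lengths["gsr"])  # 40 320
print(min_max_normalize([2.0, 4.0, 3.0]))      # [0.0, 1.0, 0.5]
```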

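The parameter $l$ is set to 320 for the GSR signal.

Fig. 5. Block diagram of the proposed emotion recognition system.

Fig. 6. Linear SVR in the feature space. Markers denote input instances.

The goal of ε-SVR is to find a function $f(x)$ that deviates from the actual target values $y_i$ by at most ε for all training data and is at the same time as flat as possible [15]. The linear function $f$ is described as follows:

$$f(x) = \langle w, x \rangle + b, \quad w \in \mathcal{X},\ b \in \mathbb{R} \tag{2}$$

where $w$ is the normal vector of the hyperplane solved in SVR. We can rewrite this problem as a convex optimization problem:

$$
\begin{aligned}
\text{minimize} \quad & \frac{1}{2}\lVert w \rVert^2 \\
\text{subject to} \quad & y_i - \langle w, x_i \rangle - b \le \varepsilon, \\
& \langle w, x_i \rangle + b - y_i \le \varepsilon
\end{aligned}
\tag{3}
$$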

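where $\varepsilon \ge 0$ denotes the maximum deviation allowed between the actual and predicted targets. However, noise exists in most applications. It can be handled by introducing (non-negative) slack variables $\xi_i, \xi_i^*$ that measure the deviation of training samples outside the ε-insensitive zone. SVM regression is thus formulated as the minimization of the following functional:

$$
\begin{aligned}
\text{minimize} \quad & \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\
\text{subject to} \quad & y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \\
& \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0
\end{aligned}
\tag{4}
$$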

In Eq. (4), each training instance has its own $\xi_i$ and $\xi_i^*$, which are used to determine whether the training instance falls outside the ε-insensitive zone. The penalty parameter $C > 0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than ε are tolerated. The optimization problem of Eq. (4) can be solved by the Lagrange multiplier technique and standard quadratic programming. Finally, the regression function $f(x)$ is given by

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b \tag{5}$$

where $K(x_i, x) = \varphi(x_i)^T \varphi(x)$ is known as the kernel function and $\varphi$ denotes the mapping into the feature space. A number of coefficients $(\alpha_i - \alpha_i^*)$ have nonzero values, and the corresponding training instances are known as support vectors, which have approximation errors equal to or larger than the error level ε. In this paper, an SVR with the RBF kernel function is adopted to train the three emotion models.
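To make Eq. (5) and the role of ε concrete, the following sketch evaluates a kernel expansion and the ε-insensitive error in plain Python. It is an illustration under stated assumptions, not the authors' code: the RBF kernel is written in the common form exp(-gamma * (u - v)^2) for scalar inputs, and the support vectors, coefficients, and bias in the usage lines are made-up values rather than fitted ones. Only the gamma and ε values are taken from the paper.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel for scalar inputs, assumed form exp(-gamma * (u - v)^2)."""
    return math.exp(-gamma * (u - v) ** 2)


def svr_predict(x, support_x, coeffs, b, gamma):
    """Evaluate Eq. (5): f(x) = sum_i coeffs[i] * K(x_i, x) + b.

    coeffs[i] stands for (alpha_i - alpha_i*) of support vector support_x[i].
    """
    return sum(c * rbf_kernel(xi, x, gamma)
               for xi, c in zip(support_x, coeffs)) + b


def eps_insensitive_loss(y_true, y_pred, eps):
    """Deviation beyond the eps tube; zero inside the eps-insensitive zone."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical support vectors, coefficients, and bias (not fitted to any data).
support_x = [0.0, 1.0]
coeffs = [0.5, -0.25]
f0 = svr_predict(0.0, support_x, coeffs, b=0.1, gamma=0.57738)
print(round(f0, 4))
print(eps_insensitive_loss(0.5, 0.6, eps=0.2))  # 0.0, inside the tube
print(eps_insensitive_loss(1.0, 0.5, eps=0.2))  # about 0.3, outside the tube
```

Points whose deviation stays inside the tube contribute no loss, which is why only the instances on or outside it end up as support vectors.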

IV. EXPERIMENTAL RESULTS

In the emotion induction experiment, we collected physiological signals from eleven persons (eight males and three females). Three kinds of emotional physiological signals (sadness, fear, and pleasure) were collected. After the preprocessing stage, the emotion signals were divided into two groups: one half for training and the other half for testing. Samples of the three emotional physiological signals obtained from one participant are shown in Fig. 7. The length of each physiological signal is ten seconds. Since people take only 2 or 3 breaths in 10 seconds, the respiration signal is not suitable for emotion recognition over such a short time.

In Fig. 7 (a-d), for sadness, the R-R interval signal decreases sharply and then increases slowly. The peak of BVP and peak of pulse signals of sadness are similar to the R-R interval signal, but their increasing slope is smaller than that of the R-R interval signal. The GSR signal decreases rapidly after it rises to its peak. In Fig. 7 (e-h), for fear, the R-R interval and the peaks of BVP and pulse decrease, while the GSR signal increases. In Fig. 7 (i-l), for pleasure, the R-R interval shows a decreasing and then increasing waveform, and its response time is usually short. The peak of BVP and peak of pulse signals of pleasure are similar to those of sadness, but their increasing slope is smaller than in the sadness signals. The GSR signal decreases slowly after it rises to its peak.

In the ε-SVR, three parameters, ε, gamma, and C, should be set appropriately to obtain high accuracy. Since the penalty parameter C does not affect the prediction capacity, C is set to 1. Gamma is the width parameter of the RBF kernel function and can be calculated from the input instances of the SVR. In this paper, the gamma value is set to 0.57738. To obtain a proper ε for the SVR, experiments with different values of ε and fixed C and gamma were performed. The accuracies under various ε are shown in Fig. 8. From Fig. 8, the highest accuracy (90.6%) was obtained when ε was set to 0.2. Accordingly, all experiments were carried out with C = 1, gamma = 0.57738, and ε = 0.2.

In the testing stage, a distance measurement is used to determine the emotion. The unknown emotion signal ($f_T$) is compared with the emotion trend curves ($f_k$) obtained from SVR by

$$D_k = \lVert f_T - f_k \rVert \tag{6}$$

where $k \in \{s, f, p\}$ represents the emotions sadness, fear, and pleasure, respectively. Three $D_k$ values are therefore obtained, and the emotion with the smallest $D_k$ is taken as the recognition result. Tables I-IV show the recognition results for the different signals. In Tables I-IV, the lowest average accuracy is 87.8% and the highest is 94.0%.
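The decision rule of Eq. (6) can be sketched as follows. This is a minimal illustration, assuming the distance is the Euclidean norm between the sampled test signal and each sampled trend curve; the paper does not spell out which norm is used, and the three short curves below are invented for the example.

```python
import math


def distance(test_signal, trend_curve):
    """Euclidean distance between two equally long sampled signals (assumed norm)."""
    if len(test_signal) != len(trend_curve):
        raise ValueError("signals must have the same length")
    return math.sqrt(sum((t - c) ** 2 for t, c in zip(test_signal, trend_curve)))


def recognize(test_signal, trend_curves):
    """Return the emotion whose trend curve is closest to the test signal."""
    return min(trend_curves, key=lambda k: distance(test_signal, trend_curves[k]))


# Invented 4-sample trend curves, for illustration only.
trend_curves = {
    "sadness": [1.0, 0.4, 0.5, 0.7],
    "fear": [1.0, 0.8, 0.5, 0.2],
    "pleasure": [0.6, 0.2, 0.6, 0.9],
}
print(recognize([0.9, 0.7, 0.5, 0.3], trend_curves))  # fear
```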

Fig. 7. Samples of physiological signals for three emotions: (a-d) the R-R interval, GSR, peak of BVP, and peak of pulse of sadness signals; (e-h) the R-R interval, GSR, peak of BVP, and peak of pulse of fear signals; (i-l) the R-R interval, GSR, peak of BVP, and peak of pulse of pleasure signals.

Fig. 8. Accuracy for different values of ε.

TABLE I
EMOTION RECOGNITION ACCURACY BY R-R INTERVAL

| R-R interval | Sadness | Fear | Pleasure | Accuracy (%) |
|---|---|---|---|---|
| Sadness | 19 | 0 | 3 | 86.4 |
| Fear | 0 | 64 | 0 | 100.0 |
| Pleasure | 0 | 2 | 42 | 95.5 |
| Average accuracy | | | | 94.0 |
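The accuracies in Table I can be reproduced from its counts. The sketch below assumes that each row holds the true emotion and each column the recognized emotion, which matches the reported percentages. The average is taken over the per-class accuracies after rounding them to one decimal, an assumption that reproduces the reported 94.0%.

```python
# Counts from Table I (R-R interval); rows are assumed to be the true emotion,
# columns the recognized emotion, both ordered sadness, fear, pleasure.
LABELS = ["sadness", "fear", "pleasure"]
TABLE_I = [
    [19, 0, 3],
    [0, 64, 0],
    [0, 2, 42],
]


def per_class_accuracy(confusion):
    """Percentage of each row's samples that fall on the diagonal."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]


def average_accuracy(confusion):
    """Unweighted mean of the per-class accuracies rounded to one decimal."""
    rounded = [round(a, 1) for a in per_class_accuracy(confusion)]
    return sum(rounded) / len(rounded)


for label, acc in zip(LABELS, per_class_accuracy(TABLE_I)):
    print(f"{label}: {acc:.1f}%")
print(f"average: {average_accuracy(TABLE_I):.1f}%")
```

Averaging the unrounded per-class values instead gives about 93.9%, so the last digit of the reported average depends on where the rounding is applied.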

TABLE II
EMOTION RECOGNITION ACCURACY BY GSR

| GSR | Sadness | Fear | Pleasure | Accuracy (%) |
|---|---|---|---|---|
| Sadness | 21 | 0 | 0 | 100.0 |
| Fear | 0 | 21 | 1 | 95.4 |
| Pleasure | 9 | 2 | 30 | 73.2 |
| Average accuracy | | | | 89.5 |

In Table II, the GSR signal has a lower accuracy for pleasure because the GSR signals of sadness and pleasure are similar. The GSR signal of pleasure decreases more slowly than that of sadness, but the signals are sometimes affected by noise, so they are not always classified correctly. In Table IV, the peak of pulse signal has a lower accuracy for fear. The peak of pulse signals of sadness and fear both decrease initially, but the end of the signal can take either of two forms, increasing or decreasing, and this difference causes misclassification.

TABLE III
EMOTION RECOGNITION ACCURACY BY PEAK OF BVP

| BVP Peak Intensity | Sadness | Fear | Pleasure | Accuracy (%) |
|---|---|---|---|---|
| Sadness | 24 | 0 | 1 | 96.0 |
| Fear | 0 | 44 | 3 | 93.6 |
| Pleasure | 3 | 3 | 31 | 83.8 |
| Average accuracy | | | | 91.1 |

TABLE IV
EMOTION RECOGNITION ACCURACY BY PEAK OF PULSE

| Pulse Peak Intensity | Sadness | Fear | Pleasure | Accuracy (%) |
|---|---|---|---|---|
| Sadness | 9 | 1 | 0 | 90.0 |
| Fear | 8 | 29 | 0 | 78.4 |
| Pleasure | 2 | 0 | 38 | 95.0 |
| Average accuracy | | | | 87.8 |

V. CONCLUSION

In this paper, we propose an SVR-based emotion recognition system. In order to obtain genuine emotional responses from participants, we designed an experiment to induce participants' emotions. In the emotion induction experiment, a participant watches a movie without interference from others, so he/she can become immersed in the plot. Five physiological signals are extracted to train the SVR, and three emotional regression curves (sadness, fear, and pleasure) are obtained. An unknown input signal is compared with the obtained emotional regression curves to determine the emotion. Experiments show that the proposed SVR-based method is useful for emotion recognition, with an accuracy of 90.2%. In future work, we will add long-term analysis information to the emotion recognition system. Moreover, we will look for correlations between various features and try to measure the participant's emotion level.

REFERENCES

[1] J. Kim and E. André, "Emotion Recognition Based on Physiological Changes in Music Listening," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, pp. 2067-2083, 2008.

[2] C. Y. Chang, J. S. Tsai, C. J. Wang, and P. C. Chung, "Emotion recognition with consideration of facial expression and physiological signals," Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pp. 278-283, 2009.

[3] E. Leon, G. Clarke, V. Callaghan, and F. Sepulveda, "Real-time detection of emotional changes for inhabited environments," Computers & Graphics, vol. 28, pp. 635-642, 2004.

[4] I. B. Mauss, C. L. Cook, J. Y. J. Cheng, and J. J. Gross, "Individual differences in cognitive reappraisal: Experiential and physiological responses to an anger provocation," International Journal of Psychophysiology, vol. 66, pp. 116-124, 2007.

[5] J. N. Bailenson, E. D. Pontikakis, I. B. Mauss, J. J. Gross, M. E. Jabon, C. A. C. Hutcherson, C. Nass, and O. John, "Real-time classification of evoked emotions using facial feature tracking and physiological responses," International Journal of Human-Computer Studies, vol. 66, pp. 303-317, 2008.

[6] K. H. Kim, S. W. Bang, and S. R. Kim, "Emotion recognition system using short-term monitoring of physiological signals," Medical & Biological Engineering & Computing, vol. 42, pp. 419-427, 2004.

[7] R. L. Mandryk and M. S. Atkins, "A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies," International Journal of Human-Computer Studies, vol. 65, pp. 329-347, 2007.

[8] C. D. Katsis, N. Katertsidis, G. Ganiatsas, and D. I. Fotiadis, "Toward Emotion Recognition in Car-Racing Drivers: A Biosignal Processing Approach," IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 38, pp. 502-512, 2008.

[9] E. Leon, G. Clarke, V. Callaghan, and F. Sepulveda, "A user-independent real-time emotion recognition system for software agents in domestic environments," Engineering Applications of Artificial Intelligence, vol. 20, pp. 337-345, 2007.

[10] F. Nasoz and C. L. Lisetti, "MAUI avatars: Mirroring the user's sensed emotions via expressive multi-ethnic facial avatars," Journal of Visual Languages & Computing, vol. 17, pp. 430-444, 2006.

[11] O. Pollatos, B. M. Herbert, E. Matthias, and R. Schandry, "Heart rate response after emotional picture presentation is modulated by interoceptive awareness," International Journal of Psychophysiology, vol. 63, pp. 117-124, 2007.

[12] V. Vapnik and A. Lerner, "Pattern recognition using generalized portrait method," Automation and Remote Control, vol. 24, pp. 774-780, 1963.

[13] V. Vapnik and A. Chervonenkis, "A note on one class of perceptrons," Automation and Remote Control, vol. 25, 1964.

[14] V. Vapnik, S. E. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation, and signal processing," Advances in Neural Information Processing Systems, vol. 9, pp. 281-287, 1996.

[15] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.

[16] C. C. Chang and C. J. Lin. (2001). LIBSVM: a library for support vector machines. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm