The Machine Knows What You Are Hiding: An Automatic Micro-expression Recognition System

Qi Wu 1,2, Xunbing Shen 1,2, and Xiaolan Fu 1,*

1 State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
[email protected]
2 Graduate University of Chinese Academy of Sciences, Beijing, China
* Corresponding author.

Abstract. Micro-expressions are among the most important behavioral clues for detecting lies and dangerous demeanors. However, micro-expressions are difficult for humans to detect. In this paper, a new approach for automatic micro-expression recognition is presented. The system is fully automatic and operates in a frame-by-frame manner: it locates the face, extracts features using Gabor filters, and then employs GentleSVM to identify micro-expressions. For spotting, the system obtained 95.83% accuracy; for recognition, it reached 85.42% accuracy, higher than the performance of trained human subjects. To further improve performance, future research should focus on a more representative training set, a more sophisticated testing bed, and an accurate image alignment method.

Keywords: Micro-expression, Mutual information, Dynamical weight trimming, GentleSVM, Gabor filters.


1 Introduction

The frequency of violent and extreme actions around the world is increasing, forcing scientists and engineers to develop new techniques to detect them. Psychologists have made some progress in this domain. In 1969, Ekman and Friesen discovered the existence of micro-expressions [1]. A micro-expression is an extremely quick facial expression of emotion that lasts for 1/25s to 1/5s. It is an involuntary expression shown on the face when humans try to conceal or repress their emotions, and it is expressed in the form of the seven universal facial expressions [2] [3] [4] [6] [7] [8]. Micro-expressions reveal humans' real intentions and can therefore be essential behavioral clues for lie and dangerous-demeanor detection [2] [4]. The US government has already employed this technique in its counter-terrorism practice [5].

However, micro-expressions are barely perceptible to humans [1]. To address this problem, Ekman developed the Micro Expression Training Tool (METT) [6]. Frank et al. reported that it was still difficult for humans to detect real-life micro-expressions


even after METT training (about 40% correct) [7]. Given this limitation, researchers and practitioners have to analyze videos frame by frame, which is very time-consuming. Efficient technical adjuncts are therefore greatly needed in this domain, and the combination of computer science and psychology can satisfy this need.

Two independent groups of computer scientists have already addressed this issue. In [8], Polikovsky et al. used a 3D gradient orientation histogram descriptor to represent motion information. Their results show that this method can recognize action units with good precision in some facial regions and can also analyze the phases of expressions. Shreve et al. proposed novel expression segmentation methods in [9] and [10]. Utilizing the optical strain caused by non-rigid motion, their algorithms can automatically spot macro- and micro-expressions.

These studies have some shortcomings. The method proposed by Polikovsky et al. [8] cannot directly measure the durations of expressions. In addition, their algorithm was evaluated on an artificial dataset in which subjects were asked to simulate micro-expressions with low facial muscle intensity. However, micro-expressions can hardly be falsified, and it is the duration, not the intensity, that distinguishes micro- from macro-expressions (expressions whose durations are longer than the micros' are called macro-expressions) [1] [4]. The main dataset used by Shreve et al. [9] [10] is also artificial, and the false alarm rate of their method for micro-expressions is too high.

One of the challenges faced by researchers is that there is no sufficiently large standard micro-expression database for training an accurate system on dynamical features, as in [25]. In fact, no researchers have ever investigated the dynamics of micro-expressions. However, the appearance of micro-expressions completely resembles that of the seven basic expressions [3] [4] [6] [8], so it is possible to train a system on existing facial expression databases using only appearance-based features while ignoring the dynamical information.

In this paper, we propose a new approach for automatic micro-expression recognition. The system ignores dynamical information and analyzes videos frame by frame. The structure of the paper is as follows. In Section 2, an algorithm for facial expression recognition is described and evaluated, providing the foundation for micro-expression recognition. In Section 3, we modify and apply the algorithm to micro-expressions. We conclude the paper in Section 4.



2 An Algorithm for Facial Expression Recognition

2.1 Dataset, Face Detection, and Preprocessing

The algorithm was evaluated for its facial expression recognition performance on Cohn and Kanade's dataset (CK) [11]. The neutral frame and one or two different peak frames of 374 sequences from 97 subjects were employed, resulting in 518 images. For each selected subject, only one neutral frame was chosen. Algorithms were trained to recognize the six basic expressions and the neutral expression. Images are first converted to 8-bit grayscale. The faces are then automatically detected using the algorithm developed by Kienzle et al. [12], and each detected face is rescaled to 48×48 pixels.


2.2 Feature Extraction

Gabor filters are employed to extract features from the detected faces. The two-dimensional Gabor filters are defined as follows:

$$\Psi_{u,v}(z) = \frac{\|k_{u,v}\|^2}{\sigma^2} \, e^{-\|k_{u,v}\|^2 \|z\|^2 / 2\sigma^2} \left( e^{i\, k_{u,v} \cdot z} - e^{-\sigma^2/2} \right) \tag{1}$$

where $k_{u,v} = k_v e^{i\phi_u}$, $k_v = k_{\max}/f^v$, $\phi_u = u\pi/8$ with $\phi_u \in [0, \pi)$, and $z = (x, y)$. The term $e^{i\, k_{u,v} \cdot z}$ is the oscillatory wave function. In our proposed algorithm, we use Gabor filters with 9 scales and 8 orientations, with $\sigma = 2\pi$, $k_{\max} = \pi/2$, and $f = \sqrt{2}$.
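To make the filter bank concrete, here is a minimal NumPy sketch of Eq. (1). The 48×48 kernel size and the feature layout (magnitude responses sampled at every pixel) are our assumptions, as the paper does not specify them; the authors' own implementation was in Matlab.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(v, u, size=48, sigma=2*np.pi, k_max=np.pi/2, f=np.sqrt(2)):
    """One Gabor kernel Psi_{u,v} following Eq. (1)."""
    k = (k_max / f**v) * np.exp(1j * u * np.pi / 8)   # k_{u,v} = k_v e^{i phi_u}
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]           # z = (x, y)
    k2 = np.abs(k)**2
    z2 = x**2 + y**2
    wave = np.exp(1j * (k.real * x + k.imag * y))     # oscillatory wave e^{i k.z}
    dc = np.exp(-sigma**2 / 2)                        # DC-compensation term
    return (k2 / sigma**2) * np.exp(-k2 * z2 / (2 * sigma**2)) * (wave - dc)

# 9 scales x 8 orientations = 72 kernels, as specified in the paper
bank = [gabor_kernel(v, u) for v in range(9) for u in range(8)]

def gabor_features(face):
    """Magnitude responses of all 72 kernels on a 48x48 face, flattened."""
    return np.concatenate([np.abs(fftconvolve(face, g, mode="same")).ravel()
                           for g in bank])
```

Each resulting component is the raw material from which Gentleboost later selects features.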

2.3 Improved Gentleboost

Adaboost has been applied successfully to a variety of problems (e.g., [14]). In contrast to Adaboost, Gentleboost converges faster and performs better on object detection problems [15]. Considering its excellent performance [16], we initially chose Gentleboost as the classifier.

Mutual Information and Dynamical Weight Trimming. The methods used to select and combine weak classifiers determine the efficiency of a boosting algorithm [17]. Here we incorporate mutual information (MI) [14] into Gentleboost to eliminate non-effective weak classifiers and thereby improve its performance. Before a weak classifier is selected, the MI between the new classifier and the already-selected classifiers is examined to make sure the information carried by the new classifier has not already been captured. At stage $T+1$, when $T$ weak classifiers $\{h_{v(1)}, h_{v(2)}, \ldots, h_{v(T)}\}$ have been selected, the maximum MI between a candidate classifier $h_j$ and the selected classifiers is defined as

$$R(h_j) = \max_{t=1,2,\ldots,T} MI(h_j, h_{v(t)}) \tag{2}$$

where $MI(h_j, h_{v(t)}) = H(h_j) + H(h_{v(t)}) - H(h_j, h_{v(t)})$, $H(x)$ is the entropy of the random variable $x$, and $H(x, y)$ is the joint entropy of the random variables $x$ and $y$. Each weak classifier is treated as a random variable, so $MI(h_j, h_{v(t)})$ represents the mutual information between $h_j$ and $h_{v(t)}$. For a weak classifier with discrete outputs, such as the regression stump employed in this study, the probabilities required for the MI calculation can be estimated by simply counting the occurrences of each possible output and dividing by the total number of training samples. To determine whether the new classifier is effective, $R(h_j)$ is compared with a pre-defined threshold of mutual information (TMI); if $R(h_j)$ is larger than TMI, the classifier is considered non-effective and a new one is selected from the candidate classifiers.
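As a concrete illustration of this counting estimate and of Eq. (2), here is a small NumPy sketch (Python for illustration, not the authors' Matlab code). Each argument is the vector of a stump's discrete outputs over the training set; base-10 logarithms give bans, the unit used throughout the paper.

```python
import numpy as np

def entropy(outputs, base=10.0):
    """Plug-in entropy: count each discrete output, divide by sample count."""
    _, counts = np.unique(outputs, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p) / np.log(base)))

def joint_entropy(a, b, base=10.0):
    """Joint entropy: count each (a, b) output pair as one joint outcome."""
    _, counts = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p) / np.log(base)))

def redundancy(h_j, selected):
    """R(h_j) of Eq. (2): the largest MI between the candidate's outputs
    and the outputs of any already-selected weak classifier."""
    return max(entropy(h_j) + entropy(h_v) - joint_entropy(h_j, h_v)
               for h_v in selected)

# A candidate is rejected as non-effective when redundancy(...) > TMI.
```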


Because the training of Gentleboost is time-consuming, researchers are often forced to train and evaluate the algorithm on low-dimensional, small datasets, and the extra computation required by MI makes the training of Gentleboost even longer [14]. Here we extend the idea of dynamical weight trimming (DWT) [18] to Gentleboost to compensate for this problem; we still call this implementation DWT. At each round, training samples are filtered according to a threshold $t(\beta)$, the $\beta$th percentile of the weight distribution over the training data at the corresponding iteration. Samples with weights below the threshold are not used for training. Since the weak classifier is not trained on the whole training set and is selected according to MI, the weighted error of the selected classifier may be greater than or equal to 0.5. To exclude such non-effective classifiers, $t(\beta)$ is dynamically adjusted according to the classification error. Details are shown in Fig. 1.

For multiclass classification, the one-against-all method is implemented, and the expression category is decided by choosing the classifier with the maximum decision-function value for the test example. This is also detailed in Fig. 1.

Fig. 1. The improved Gentleboost algorithm for a binary classifier
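The DWT step in Fig. 1 can be sketched as follows. The retry rule (halving β when the trimmed fit is no better than chance) is our reading of the figure, not a verbatim transcription:

```python
import numpy as np

def dwt_round(weights, fit_stump, weighted_error, beta=0.1):
    """One Gentleboost round with dynamical weight trimming (DWT).

    fit_stump(idx) fits a regression stump on the sample subset idx;
    weighted_error(h) returns its weighted error on the full training set.
    """
    b = beta
    while True:
        t_beta = np.percentile(weights, 100 * b)   # threshold t(beta)
        idx = np.where(weights >= t_beta)[0]       # drop low-weight samples
        h = fit_stump(idx)
        if weighted_error(h) < 0.5 or b < 1e-3:    # keep only effective classifiers
            return h
        b /= 2                                     # relax threshold, retry
```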


Experimental Results and Discussions. The proposed algorithm was implemented in Matlab and tested on a Core i5 650 system with 4GB of memory. Ten-fold cross-validation was used as the testing paradigm: subjects were randomly separated into 10 groups of roughly equal size, and 'leave one group out' cross-validation was performed. TMI and β were both set to 0.1 (throughout this paper, mutual information is measured in bans). The best results were obtained when 120 features were employed by each binary classifier.
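A subject-wise split of this kind can be reproduced with scikit-learn's GroupKFold; the data here are random stand-ins, since the point is only the grouping protocol (the original experiments were run in Matlab):

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.svm import LinearSVC

# Dummy stand-ins for the 518 feature vectors, labels, and 97 subject ids
rng = np.random.default_rng(0)
X = rng.normal(size=(518, 120))
y = rng.integers(0, 7, size=518)
subject_ids = rng.integers(0, 97, size=518)

# Split by subject so no person appears in both training and test folds;
# GroupKFold itself is deterministic, so shuffle subject assignment first
# if an explicitly random grouping is needed.
cv = GroupKFold(n_splits=10)
scores = []
for train_idx, test_idx in cv.split(X, y, groups=subject_ids):
    clf = LinearSVC().fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(f"mean accuracy: {np.mean(scores):.4f}")
```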

Table 1. Performance of improved Gentleboost

Classifier              Accuracy   Training time*
Gentleboost, MI+DWT     88.61%     85019.71s
Gentleboost, DWT        85.33%     84644.05s
Gentleboost, MI         87.83%     122817.91s
Original Gentleboost    86.48%     122364.53s

* Total time for the 10 training runs.

As shown in Table 1, the MI+DWT combination improved Gentleboost's accuracy. Although this combination achieved the highest accuracy, the 88.61% obtained by our algorithm is still not fully satisfactory. The results also show that the MI+DWT combination improved the training speed of Gentleboost (1.44 times faster than the original), an effect brought by the DWT implementation. It should be noted that, for most problems, β = 0.1 should yield acceptable results, because the data used for training then carries 90 percent of the total weight. As shown in Table 1, the accuracy of Gentleboost with DWT was only slightly lower than that of the original Gentleboost. A larger value of β brings faster training, but it also introduces more errors [18]; a trade-off between speed and accuracy must be considered.

2.4 GentleSVM

It has been shown that SVMs provide state-of-the-art accuracy for facial expression recognition [13] [16]. To further improve the performance of our proposed algorithm, we use the Gentleboost algorithm as a feature selector preceding the SVM classifiers; we call this combination GentleSVM. In feature selection by Gentleboost, each Gabor filter is treated as a weak classifier. Gentleboost picks the best of these classifiers and then adjusts the sample weights according to the classification results. The next filter is selected as the one that performs best on the errors of the previous filter. All the selected features are united to form a new representation, and 1-norm soft-margin linear SVMs are trained on the selected Gabor features. For multiclass classification, the one-against-all method is implemented; the expression category is decided by choosing the classifier with the maximum geometric margin for the test example.
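A minimal sketch of the GentleSVM pipeline, assuming a select_features function wrapping the boosting-based selection described above; the regularization constant C and the scikit-learn backend are our assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_gentlesvm(X, y, select_features, n_classes=7, n_feats=200):
    """One-vs-all GentleSVM: for each class, Gentleboost (select_features)
    picks n_feats Gabor features, then a 1-norm soft-margin linear SVM
    (hinge loss) is trained on that reduced representation."""
    models = []
    for c in range(n_classes):
        yc = np.where(y == c, 1, -1)                # one-against-all labels
        feats = select_features(X, yc, n_feats)     # indices chosen by boosting
        svm = LinearSVC(loss="hinge", C=1.0).fit(X[:, feats], yc)
        models.append((feats, svm))
    return models

def classify(models, x):
    """Label = the binary classifier with the largest decision value."""
    scores = [svm.decision_function(x[feats].reshape(1, -1))[0]
              for feats, svm in models]
    return int(np.argmax(scores))
```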



Experimental Results and Discussions. The proposed algorithm was implemented and tested on the same platform and with the same testing paradigm as in Section 2.3. TMI and β were set to 0.1. The best results were obtained when 200 features were selected for each expression category.

Table 2. Performance of GentleSVM

Classifier           Accuracy   Training time*
Original SVM         88.03%     883.62s
GentleSVM, MI+DWT    92.66%     18.85s
GentleSVM, DWT       90.93%     19.31s
GentleSVM, MI        92.47%     17.79s
Original GentleSVM   90.35%     19.00s

* Total time for the 10 training runs.

Table 3. Accuracy for each expression category

Classifier           Sadness   Surprise   Anger    Disgust   Fear     Happiness   Neutral
Original SVM         82.43%    93.06%     84.13%   93.75%    75.41%   92.55%      91.11%
GentleSVM, MI+DWT    85.14%    98.61%     87.03%   96.86%    86.89%   96.81%      94.44%
GentleSVM, DWT       85.14%    98.61%     87.03%   96.88%    80.33%   94.68%      91.11%
GentleSVM, MI        85.13%    100%       85.71%   96.88%    86.89%   96.81%      93.33%
Original GentleSVM   82.43%    98.61%     88.89%   95.31%    83.61%   96.81%      85.56%

As shown in Table 2, all GentleSVM variants outperformed both the original SVM and our improved Gentleboost, and all GentleSVM variants completed the 10 training runs within 20s. This means that, once Gentleboost has been trained, the system can be further improved at only a small cost in time. Among the GentleSVM variants, the best result was obtained by the MI+DWT combination (92.66%). Taking all the results together, we can safely conclude that the MI+DWT combination improves the performance of Gentleboost both for classification and for feature selection.

As shown in Table 3, all GentleSVM variants and the original SVM recognized surprise, disgust, happiness, and neutral expressions with good precision. For sadness, anger, and fear, the performance of all classifiers was lower; however, the accuracy scores of the GentleSVM variants remained above 80%.

Bartlett et al. [13] conducted similar experiments using the Computer Expression Recognition Toolbox on CK and reported 93% correct for an alternative forced choice among the six basic expressions plus neutral [13]. To the best of our knowledge, this is the highest performance obtained by an appearance-based approach on CK. Their results are not directly comparable to ours due to different protocols,



preprocessing methods, and so on, but we can see that our best accuracy (92.66%) is very similar to theirs. Such accuracy is encouraging for an automatic system without complicated preprocessing procedures, and it establishes a firm foundation for micro-expression recognition.

3 An Automatic Micro-expression Recognition System

To recognize micro-expressions, the algorithm described above is applied to videos to determine the expression label of each frame. The label outputs are then scanned, and the durations of expressions are measured from the transition points and the video's frame rate. For example, if a video's frame rate is 30 fps and its label output is 111222, then the transition points are the first frame, the point where 1 → 2, and the last frame, so the durations of expressions 1 and 2 are both 1/10s. Micro-expressions and their labels are then extracted according to the definition: expressions that last for 1/25s to 1/5s are considered micro-expressions, whereas expressions with longer durations are considered macro-expressions and are ignored [1] [3] [4]. Since contempt is not as universal as the other six basic expressions [7], the system is presently trained to recognize micro-expressions only from the six prototypic categories. Fig. 2 illustrates its framework, and a sketch of the spotting step follows the figure.

Fig. 2. Overview of the proposed automatic micro-expression recognition system
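The duration-based spotting step can be sketched as follows; skipping neutral runs is our assumption, and the example reproduces the 1/10s arithmetic above:

```python
def spot_micro_expressions(labels, fps, neutral=0):
    """Scan per-frame expression labels, measure run durations from the
    frame rate, and keep runs lasting 1/25 s to 1/5 s (micro-expressions).
    The first frame, every label change, and the last frame are the
    transition points, as in the paper's 111222 example."""
    micros, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:   # transition point
            duration = (i - start) / fps
            if labels[start] != neutral and 1/25 <= duration <= 1/5:
                micros.append((labels[start], start, i - 1, duration))
            start = i
    return micros

# At 30 fps, a 3-frame flash of expression 1 lasts 3/30 = 1/10 s:
print(spot_micro_expressions([0, 0, 1, 1, 1, 0, 0], fps=30))
# -> [(1, 2, 4, 0.1)]
```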

3.1 A New Training Set, the Test Set, and the Extra Preprocessing Procedure

A new training set was collected to improve the system's generalization ability: 1109 images were gathered from facial expression databases [11] [19] [20] [21] [22] [23], and 26 images were downloaded from the Internet.

Micro-expression videos from METT [6] were employed as the test set; all forty-eight micro-expression videos from the six basic expression categories were used. In each video, a flash of a frontal-view micro-expression is preceded and followed by about 2s of frontal-view neutral expression. The system's performance was evaluated by two indexes. For spotting, the system had to identify the hidden expressions as micro-expressions, and no false alarms were accepted. For recognition, the system additionally had to label the micro-expressions correctly.


To compensate for illumination differences between the facial expression databases, an extra preprocessing step is performed: the grayscale values of each rescaled face are normalized to zero mean and unit variance.
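This normalization is a one-liner; the small epsilon guarding against a zero standard deviation is our addition:

```python
import numpy as np

def normalize_face(face):
    """Normalize a rescaled 48x48 grayscale face to zero mean and unit
    variance, reducing illumination differences between databases."""
    face = face.astype(np.float64)
    return (face - face.mean()) / (face.std() + 1e-8)  # epsilon guards flat patches
```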

3.2 Experimental Results and Discussions

The system was implemented and tested in the same environment as in Section 2. TMI and β were set to 0.1, and two hundred features were selected for each expression category.

Fig. 3. The performance of the automatic micro-expression recognition system on METT

Table 4. Recognition performance for Caucasian and Asian faces

Race        Recognition accuracy   Sample size*
Caucasian   95.8%                  904
Asian       75%                    160

* Number of examples of that race in the new training set.

As shown in Fig. 3, when the system was trained on CK without the extra preprocessing, its recognition accuracy on METT was only 50%, even lower than the performance of naive human subjects in the pretest phase of METT [7] [24]. When the system was trained on the new training set, its spotting accuracy reached 91.67% and its recognition accuracy 81.25%. When the extra preprocessing step was added as well, the spotting accuracy reached 95.83% and the recognition accuracy 85.42%. This recognition accuracy is even higher than the performance of trained human subjects (70%–80%) in the posttest phase of METT [7] [24]. These results highlight the need for a large training set in practice and the necessity of the extra preprocessing step. Considering that the contribution of the extra preprocessing was only about 4% once the training set was large, the results also suggest that researchers should focus more on the training set than on the preprocessing method.



A detailed analysis indicates that our new training set is still not representative enough. In our test set, half of the subjects are Caucasian and the other half are Asian, and the final system performed differently for these two races (Table 4). Compared with the large sample of Caucasians in the new training set, the sample of Asians is small, which limits the performance of our final system for Asians. A more representative training set should be collected in the future.

The results obtained above are still preliminary. The micro-expressions in METT are in their simplest form: they are frontal-view, and the videos contain no large head rotations or speech-related expressions. Micro-expressions in real-life situations are more complicated than those in METT. It is therefore necessary to collect a more comprehensive testing bed in order to evaluate the system in more realistic situations and to investigate the dynamics of micro-expressions.

It should also be noted that no image alignment is performed in our system. Inaccurate image alignment may impair the system's performance in real-life applications [16], so future improvements should focus on the image alignment method in order to handle head rotations and image shifts.

Finally, intelligent systems like the one described above must be applied cautiously in practice. Although the micro-expression hypothesis is widely accepted among psychologists, some empirical studies suggest that overreliance on micro-expressions as an indicator of deception is likely to be ineffective. It is necessary to take other behavioral clues into consideration [3] [5].

4 Conclusions

In this paper, we presented a new approach for facial micro-expression recognition. The system ignores dynamical information and analyzes videos frame by frame. Results showed that the performance of our final system on the METT test was better than that of trained human subjects.

Acknowledgments. This research was supported in part by grants from the 973 Program (2011CB302201) and the National Natural Science Foundation of China (61075042).

References

1. Ekman, P., Friesen, W.V.: Nonverbal Leakage and Clues to Deception. Psychiatry 32, 88–97 (1969)
2. Ekman, P.: Lie Catching and Microexpressions. In: Martin, C. (ed.) The Philosophy of Deception, pp. 118–133. Oxford University Press, Oxford (2009)
3. ten Brinke, L., MacDonald, S., Porter, S., O'Connor, B.: Crocodile Tears: Facial, Verbal and Body Language Behaviors Associated with Genuine and Fabricated Remorse. Law Hum. Behav., 1–11 (2011)


4. Ekman, P.: Telling Lies, 2nd edn. Norton, New York (2009)
5. Weinberger, S.: Intent to Deceive: Can the Science of Deception Detection Help to Catch Terrorists? Nature 465, 412–415 (2010)
6. Ekman, P.: Micro Expression Training Tool. University of California, San Francisco (2003)
7. Frank, M.G., Herbasz, M., Sinuk, K., Keller, A., Nolan, C.: I See How You Feel: Training Laypeople and Professionals to Recognize Fleeting Emotions. In: The Annual Meeting of the International Communication Association, Sheraton New York, New York City (2009), http://www.allacademic.com/meta/p15018_index.html
8. Polikovsky, S., Kameda, Y., Ohta, Y.: Facial Micro-Expressions Recognition Using High Speed Camera and 3D-Gradients Descriptor. In: Proceedings of the 3rd International Conference on Imaging for Crime Detection and Prevention, pp. 1–6 (2009)
9. Shreve, M., Godavarthy, S., Manohar, V., Goldgof, D., Sarkar, S.: Towards Macro- and Micro-Expression Spotting in Videos using Strain Patterns. In: Proceedings of the IEEE Workshop on Applications of Computer Vision, pp. 1–6 (2009)
10. Shreve, M., Godavarthy, S., Goldgof, D., Sarkar, S.: Macro- and Micro-Expression Spotting using Spatio-temporal Strain. In: Face and Gesture, Santa Barbara (March 2011), http://www.cse.usf.edu/~mshreve/publications/FG11.pdf
11. Kanade, T., Cohn, J.F., Tian, Y.: Comprehensive Database for Facial Expression Analysis. In: Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 46–53 (2000)
12. Kienzle, W., Bakir, G., Franz, M., Schölkopf, B.: Face Detection - Efficient and Rank Deficient. In: Advances in Neural Information Processing Systems, vol. 17, pp. 673–680 (2005)
13. Bartlett, M., Whitehill, J.: Automated Facial Expression Measurement: Recent Applications to Basic Research in Human Behavior, Learning, and Education. In: Calder, A., Rhodes, G., Haxby, J.V., Johnson, M.H. (eds.) Handbook of Face Perception. Oxford University Press, USA (2010), http://mplab.ucsd.edu/~marni/pubs/Bartlett_FaceHandbook_2010.pdf
14. Shen, L., Bai, L.: Mutualboost Learning for Selecting Gabor Features for Face Recognition. Pattern Recogn. Lett. 27, 1758–1767 (2006)
15. Torralba, A., Murphy, K.P., Freeman, W.T.: Sharing Features: Efficient Boosting Procedures for Multiclass Object Detection. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 762–769 (2004)
16. Whitehill, J., Littlewort, G., Fasel, I., Bartlett, M., Movellan, J.: Toward Practical Smile Detection. IEEE Trans. Pattern Anal. Mach. Intell. 31, 2106–2111 (2009)
17. Gao, N., Tang, Q.: On Selection and Combination of Weak Learners in AdaBoost. Pattern Recogn. Lett. 31, 991–1001 (2010)
18. Jia, H., Zhang, Y.: Fast Adaboost Training Algorithm by Dynamic Weight Trimming. Chinese J. Comput. 32, 336–341 (2009)
19. Pantic, M., Valstar, M.F., Rademaker, R., Maat, L.: Web-Based Database for Facial Expression Analysis. In: Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 317–321 (2005)
20. Wallhoff, F.: Facial Expressions and Emotion Database. Technische Universität München (2006), http://www.mmk.ei.tum.de/~waf/fgnet/feedtum.html


21. Lyons, M.J., Akamatsu, S., Kamachi, M., Gyoba, J.: Coding Facial Expressions with Gabor Wavelets. In: Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 200–205 (1998)
22. Roy, S., Roy, C., Fortin, I., Éthier-Majcher, C., Belin, P., Gosselin, F.: A Dynamic Facial Expression Database. J. Vis. 7, 944a (2007)
23. Ekman, P., Friesen, W.V.: Pictures of Facial Affect. Consulting Psychologists Press, California (1976)
24. Russell, T.A., Chu, E., Phillips, M.L.: A Pilot Study to Investigate the Effectiveness of Emotion Recognition Remediation in Schizophrenia Using the Micro-Expression Training Tool. Brit. J. Clin. Psychol. 45, 579–583 (2006)
25. Koelstra, S., Pantic, M., Patras, I.: A Dynamic Texture-Based Approach to Recognition of Facial Actions and Their Temporal Models. IEEE Trans. Pattern Anal. Mach. Intell. 32(11), 1940–1954 (2010)
