Comparison Between Euclidean and Manhattan Distance Measure for Facial Expressions Classification

Latifa Greche1, Maha Jazouli2, Najia Es-Sbai1, Aicha Majda2, Arsalane Zarghili2

1 Laboratory of Renewable Energy and Intelligent Systems
2 Laboratory of Intelligent Systems and Applications
Faculty of Sciences and Technology, SMBA University, Fez, Morocco
{latifa.greche;maha.jazouli;najia.essbai;aicha.majda;arsalane.zarghili}@usmba.ac.ma

Abstract-- In this paper, we compare the classification results for six facial expressions (joy, surprise, sadness, anger, disgust, and fear) obtained with two different methods of computing distances between 121 landmark points on the face. Facial features were computed using the L1 norm (Manhattan distance) in the first case and the L2 norm (Euclidean distance) in the second. Training and test data were collected using the Kinect sensor. The labelled dataset contains sequences of 121 landmark points extracted from the face of each subject while displaying the six facial expressions. Classification was realized using a multi-layer feed-forward neural network with one hidden layer. Good recognition rates were achieved in early stages of training with the Euclidean facial distances.

Keywords-- facial expression recognition; Euclidean distance; Manhattan distance; neural network; Kinect sensor.

I. INTRODUCTION

Understanding individuals' emotions is becoming a very interesting subject for psychologists, developers, and researchers. Indeed, facial expressions can tell us a great deal about the psychological state of a person. Since the universality of emotions across individuals was established [1], several studies have addressed the facial expression recognition problem. Automatic facial expression recognition systems (AFERS) have many applications, including the detection of mental disorders and of patients' pain, person identification, understanding how customers feel, and the design of individual avatars. Every AFERS relies on three important stages: collection of training data, extraction and selection of relevant features from the labelled data, and generation of classifiers using machine learning techniques. Over the last decades, many open access databases have been made available for researchers to train and test their classifiers. Most of these databases consist of sets of images photographed with ordinary cameras. Nowadays, the use of smart cameras is becoming more attractive. Smart cameras are able to track, with high accuracy, the movement and relevant features of objects. Kinect [2] is a smart camera whose embedded system can track several kinds of patterns in a sequence of images in real time. This camera can track 121 relevant feature points on the face in order to manage an avatar's

emotions [3, 4]. Unlike the ordinary way of forming data, new training and test data have been constructed, pre-processed, and normalized. The labelled dataset contains many recorded sequences of 121 landmarks extracted from the face of each subject while displaying six different facial expressions over a short time. The objective of this paper is to find an optimal neural network structure able to classify the six facial expressions. The dataset has been pre-processed by computing distances between facial landmark points: in the first case, the facial distances were computed with the Manhattan distance; in the second case, the Euclidean distance was used. The remainder of this paper is structured as follows: in section II we review related work on the emotion recognition problem. In section III we describe the two main stages of the proposed recognition system: the feature extraction stage and the classification stage. Section IV then presents the obtained results. Finally, conclusions and further work are detailed in section V.

II. RELATED WORKS

Despite the human ability to understand others' feelings during social interaction, facial expressions remain complex for machines to understand and read. Detecting emotions requires accurate descriptors able to capture texture, shape information, and edge orientations in the image. The main issue in such a recognition task is to find an algorithm that can detect, track, and classify facial feature movements with high accuracy. Many feature descriptors have been proposed over the years. Among all these descriptors, facial landmark points are the most relevant information that can be extracted and used to track and classify expressions. According to various studies and surveys on emotion recognition, facial landmark classification can be realized in three different ways. The first relies on classification using combinations of a set of action units [5, 6, 7, 8]. Each action unit describes a specific movement of a specific region of interest on the face. A combination of action units can

have a meaning only if it is similar to one of the combinations outlined by Ekman and Friesen [9]. The second way is template matching [10], which finds the best degree of similarity between an input image and the templates inside the labelled data. The third way is classification by means of classifiers generated with supervised machine learning methods. Among these methods, the most used in the literature are: k-nearest neighbours (KNN) [11, 12], support vector machines (SVM) [13, 14, 15], and neural networks [16, 17, 18, 19, 20]. Y. Saatci and C. Town [21] propose a cascade of facial expression classifiers. Each cascade stage contains a binary SVM classifier for one emotion. If a classifier recognizes the expression at an early stage, a positive response is output; otherwise the cascade passes to the following stage, until the last stage, where the response is either positive in case of good classification or negative when the window does not contain the subject. The work in this paper is based on a neural network that predicts the class of a set of input facial distances.

III. AUTOMATIC FACIAL EXPRESSION RECOGNITION SYSTEM

This section describes the proposed system for the recognition of emotional facial expressions. Our work is based on the Kinect sensor and an artificial neural network approach. Figure 1 presents an overall view of the proposed system of automatic emotion recognition.

Fig. 1. Overall view of the proposed system structure.

A. Facial feature extraction

In this paper, a new labelled dataset has been constructed and stored using the Face Tracking SDK of the Kinect sensor. The Face Tracking SDK contains a face tracking engine, which can analyse the frames captured by the Kinect camera and detect the head pose and face features in real time. The face tracking engine determines the 3D positions of semantic facial feature points as well as the 3D head pose. It tracks the 3D locations of 121 points, which are depicted in figure 2. To collect and store our own data, a sequence of these 121 landmark points has been extracted from the face of each subject while displaying six different facial expressions: joy, surprise, sadness, anger, disgust, and fear. The sequences have been recorded over a short time. The feature registration of each sequence starts with a just observable emotion on the subject's face and ends with a very expressive one. Before training the neural network, the training and test data have been pre-processed and normalized.

Fig. 2. The 121 facial landmark points used to create the training and test dataset.

The distances shown in figure 2 have been calculated in two ways, using the Manhattan distance d_M in the first case and the Euclidean distance d_E in the second. The distances have then been normalized by the average distance between the eyes of each template inside the data. The d_M and d_E formulas are, respectively:

$$d_M(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

$$d_E(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

where p and q are points of an n-dimensional vector space with a fixed Cartesian coordinate system.
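To make the feature computation concrete, the following sketch computes the pairwise L1 or L2 distances between the tracked points and normalizes them by the inter-eye distance. This is a minimal illustration, not the authors' code: the array shape and the eye landmark indices (`left_eye_idx`, `right_eye_idx`) are assumptions, since the SDK's point indexing is not given here.

```python
import numpy as np

def facial_distance_features(points, left_eye_idx, right_eye_idx, norm="euclidean"):
    """Compute normalized pairwise distances between facial landmarks.

    points: (121, 3) array of 3D landmark coordinates from the tracker.
    left_eye_idx / right_eye_idx: hypothetical indices of the eye landmarks.
    """
    diffs = points[:, None, :] - points[None, :, :]   # (121, 121, 3) differences
    if norm == "manhattan":
        dists = np.abs(diffs).sum(axis=-1)            # L1: sum of |p_i - q_i|
    else:
        dists = np.sqrt((diffs ** 2).sum(axis=-1))    # L2: sqrt of sum of squares
    # Normalize by the inter-eye distance so the features are scale invariant.
    eye_dist = np.linalg.norm(points[left_eye_idx] - points[right_eye_idx])
    dists /= eye_dist
    # Keep the upper triangle only: C(121, 2) = 7260 unique distances.
    iu = np.triu_indices(len(points), k=1)
    return dists[iu]
```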

B. Classification and recognition using neural network

Designing a neural network requires an iterative process to find the optimal structure: there is no rule for determining in advance the exact number of hidden neurons or hidden layers. Too few or too many hidden neurons/layers lead to underfitting or overfitting [22], respectively, and in both cases the neural network performs poorly. Growing or pruning neurons are the usual methods for designing an artificial neural network. Growing means that training begins with a low number of hidden neurons/layers, and neurons are then added until good performance is achieved. Pruning, conversely, begins with an overfitted network and decreases the number of hidden neurons until an optimal network structure is obtained. The longest stage of neural network structuring is the construction of the hidden layer. To determine the number of neurons in the hidden layer, we decided to start with a small hidden layer of few neurons and then increase the quantity until achieving higher performance. During training we used the back-propagation algorithm, which targets an output value of 1 for the correct facial expression class and 0 for all other expressions. These two values are estimated in the final layer by a softmax function. The activation function in the single hidden layer is a sigmoid function.

IV. RESULTS AND DISCUSSION

In this part we present the results of the training and test procedures of the algorithm based on neural network classification. The database consists of ten individuals, each displaying the six facial expressions (joy, surprise, sadness, anger, disgust, and fear), which gives a database of 60 recorded emotions. The data has been divided as follows: 70% for the learning phase and 30% for the test.
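As an illustration of the growing procedure and the 70/30 split, the sketch below trains one-hidden-layer networks of increasing size and keeps the best one. It is an assumption-laden reconstruction using scikit-learn, which the paper does not mention: `X` is taken to hold the normalized distance vectors and `y` the six expression labels, and the candidate sizes come from Table 1.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def grow_hidden_layer(X, y, sizes=(5, 10, 20, 30, 58)):
    """Grow a single hidden layer until recognition stops improving.

    The hidden layer uses a sigmoid ('logistic') activation; scikit-learn
    applies a softmax output for multi-class targets, as in the paper.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0)      # 70% train / 30% test
    best_net, best_score = None, 0.0
    for h in sizes:
        net = MLPClassifier(hidden_layer_sizes=(h,), activation="logistic",
                            max_iter=2000, random_state=0)
        net.fit(X_train, y_train)                  # back-propagation training
        score = net.score(X_test, y_test)          # average recognition rate
        if score > best_score:
            best_net, best_score = net, score
        if score == 1.0:                           # perfect rate: stop growing
            break
    return best_net, best_score
```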

In the first experiment, bad results were obtained because of data redundancy: the registration speed of facial features is fast (25 fps), so the same facial landmark points were recorded repeatedly over time. After training the multi-layer feed-forward network on the new labelled data with 0% redundancy, very good results were obtained. During the first neural network training, the number of hidden neurons was set to five and then incremented iteratively until an optimal and efficient neural network was obtained. The figures below show the classification results of the trained network in the two cases: in the first case (figure 3) the facial distances were computed using the L1 norm, whereas in the second case (figure 4) the L2 norm was used. In these two figures, classification results are shown only for four training stages, with an increasing number of neurons in the hidden layer.
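The redundancy removal step itself is not detailed in the paper; a minimal sketch, assuming each row of the feature matrix is one recorded frame, is to drop exact duplicate rows before training:

```python
import numpy as np

def drop_duplicate_frames(X, y):
    """Remove frames whose feature vectors repeat exactly.

    At 25 fps, consecutive frames often carry identical landmark
    positions; keeping only unique rows yields 0% redundancy.
    """
    _, keep = np.unique(X, axis=0, return_index=True)
    keep = np.sort(keep)        # preserve the original frame order
    return X[keep], y[keep]
```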

TABLE 1. COMPARISON BETWEEN AVERAGE RECOGNITION RATES OF THE SIX FACIAL EXPRESSIONS STUDIED IN THE CASE OF MANHATTAN AND EUCLIDEAN DISTANCES

Hidden neurons | Average recognition rate
               | Manhattan (L1 norm) | Euclidean (L2 norm)
       5       |        0.837        |        0.840
      10       |        0.919        |        0.930
      20       |        0.988        |        1.000
      30       |        0.966        |        0.977
      58       |        1.000        |         ---

In table 1, regarding the Euclidean distance, good recognition rates were obtained at an early training stage, with an average of 100% at 20 neurons in the hidden layer, whereas the best Manhattan rate was reached only at 58 hidden neurons. Despite the fact that the Manhattan distance gives the closest approximation to the real distance between image pixels, the Euclidean distance gives better results than the Manhattan distance, with a difference of 28 neurons in the hidden layer, which amounts to a difference of 203448 interconnections between the two trained neural networks.
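The interconnection count can be checked under the assumption, consistent with the 121 tracked points, that the input vector holds the $\binom{121}{2} = 7260$ pairwise distances and the output layer has 6 neurons, so that each hidden neuron carries $7260 + 6 = 7266$ connections:

$$\Delta = 28 \times (7260 + 6) = 28 \times 7266 = 203448.$$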

Fig. 3. Manhattan classification results based on a neural network classifier.

Fig. 4. Euclidean classification results based on a neural network classifier.

V. CONCLUSION

Emotion recognition is a subject that has gained the interest of scientists from different fields, including psychology, medicine, computer science, and mathematics. This paper reports a comparative study between two methods of distance computation for the facial expression recognition problem. The classification of the Manhattan and Euclidean distance features was realized with a neural network classifier recognizing six emotions: joy, surprise, sadness, anger, disgust, and fear. The two methods attain the same best average recognition rate of 100%, but each one attains this rate at a different neural network training stage. The Euclidean distance classifies the studied facial expressions with high accuracy and fewer hidden neurons than the Manhattan distance. As a first experiment, the paper presents classification results for six universal psychological states of normal individuals stored in a small dataset. Future directions would be to recognize other facial expressions (not necessarily universal emotions) and the behaviour of individuals suffering from various anomalies, including health and mental problems. To this end, coming work will be dedicated to collecting and classifying data according to normal or unnatural behaviour, and then training classifiers to recognize unnatural behaviours besides normal ones.

REFERENCES

[1] C. Darwin, "The Expression of Emotions in Man and Animals", University of Chicago Press, 1965.
[2] Z. Zhang, "Microsoft Kinect sensor and its effect", IEEE Multimedia, vol. 19, no. 2, pp. 4-10, 2012.
[3] T. Weise, S. Bouaziz, H. Li, et al., "Realtime performance-based facial animation", ACM Transactions on Graphics (TOG), p. 77, 2011.
[4] L. Vera, J. Gimeno, I. Coma, et al., "Augmented mirror: interactive augmented reality system based on Kinect", in IFIP Conference on Human-Computer Interaction, Springer Berlin Heidelberg, pp. 483-486, 2011.
[5] J. F. Cohn, K. Schmidt, R. Gross, et al., "Individual differences in facial expression: Stability over time, relation to self-reported emotion, and ability to inform person identification", in Proc. 4th IEEE International Conference on Multimodal Interfaces, IEEE Computer Society, p. 491, 2002.
[6] D. McDuff, A. Mahmoud, M. Mavadati, et al., "AFFDEX SDK: a cross-platform real-time multi-face expression recognition toolkit", in Proc. 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, ACM, pp. 3723-3726, 2016.
[7] T. Senechal, D. McDuff, and R. Kaliouby, "Facial action unit detection using active learning and an efficient non-linear kernel approximation", in Proc. IEEE International Conference on Computer Vision Workshops, pp. 10-18, 2015.
[8] B. Fasel, F. Monay, and D. Gatica-Perez, "Latent semantic analysis of facial action codes for automatic facial expression recognition", in Proc. 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, ACM, pp. 181-188, 2004.
[9] P. Ekman and W. Friesen, "Facial Action Coding System: A Technique for the Measurement of Facial Movement", Consulting Psychologists Press, Palo Alto, 1978.
[10] L. Greche and N. Es-Sbai, "Automatic system for facial expression recognition based histogram of oriented gradient and normalized cross correlation", in 2016 International Conference on Information Technology for Organizations Development (IT4OD), IEEE, pp. 1-5, 2016.
[11] D. Han, N. Al Jawad, and H. Du, "Facial expression identification using 3D geometric features from Microsoft Kinect device", in SPIE Commercial + Scientific Sensing and Imaging, International Society for Optics and Photonics, p. 986903, 2016.
[12] A. E. Youssef, S. F. Aly, A. S. Ibrahim, et al., "Auto-optimized multimodal expression recognition framework using 3D Kinect data for ASD therapeutic aid", International Journal of Modeling and Optimization, vol. 3, no. 2, p. 112, 2013.
[13] F. Malawski, B. Kwolek, and S. Sako, "Using Kinect for facial expression recognition under varying poses and illumination", in International Conference on Active Media Technology, Springer International Publishing, pp. 395-406, 2014.
[14] S. Piana, A. Staglianò, F. Odone, et al., "Real-time automatic emotion recognition from body gestures", arXiv preprint arXiv:1402.5047, 2014.
[15] Q. Mao, X. Pan, Y. Zhan, et al., "Using Kinect for real-time emotion recognition via facial expressions", Frontiers of Information Technology & Electronic Engineering, vol. 16, no. 4, pp. 272-282, 2015.
[16] H. A. Alabbasi, F. Moldoveanu, A. Moldoveanu, et al., "Facial Emotion Expression Recognition With Brain Activates Using Kinect Sensor V2", International Research Journal of Engineering and Technology (IRJET), vol. 2, no. 2, pp. 421-428, 2015.
[17] G. R. Vineetha, C. Sreeji, and J. Lentin, "Face expression detection using Microsoft Kinect with the help of artificial neural network", Trends in Innovative Computing, 2012.
[18] M. Puica and A.-M. Florea, "Towards a computational model of emotions for enhanced agent performance", Ph.D. thesis, University Politehnica of Bucharest, Romania, 2013.
[19] N. Yamaguchi, M. Navarro Caceres, F. de la Prieta, et al., "Facial Expression Recognition System for User Preference Extraction", in Distributed Computing and Artificial Intelligence, 13th International Conference, Springer International Publishing, pp. 453-461, 2016.
[20] P. V. Saudagare and D. S. Chaudhari, "Facial expression recognition using neural network - An overview", International Journal of Soft Computing and Engineering (IJSCE), vol. 2, no. 1, pp. 224-227, 2012.
[21] Y. Saatci and C. Town, "Cascaded classification of gender and facial expression using active appearance models", in 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006), IEEE, pp. 393-398, 2006.
[22] M. T. Hagan, H. B. Demuth, M. H. Beale, et al., "Neural Network Design", PWS Pub. Co., Boston, 1996.