Fuzzy Neural Networks (FNN)-based Approach for Personalized Facial Expression Recognition with Novel Feature Selection Method
Dae-Jin Kim and Zeungnam Bien
[email protected], [email protected]
Div. of EE, Dept. of EECS, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Korea
Kwang-Hyun Park
[email protected]
Human-friendly Welfare Robotic System Engineering Research Center, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Korea
Abstract— Facial expression recognition is an important part of many human-robot and human-computer interaction systems. Although much work has already been done on this subject, the technology is not yet directly usable because it ignores individual differences. As one solution to this problem, we introduce a personalized facial expression recognition system. Many previous works on facial expression recognition focus on the six formal universal facial expressions with a unified classification approach. However, it is very difficult for an ordinary person to produce such expressions without considerable effort and training. Thus, to deal with such individual characteristics in facial expression, we propose a method for personalizing both the classifier and the image processing steps. The results show that the proposed method is feasible, with a low computation load and a relatively simple structure.
Index Terms— Fuzzy Neural Networks, Facial Expression Recognition, Personalized, Feature Selection.
I. INTRODUCTION

The facial expression recognition is one of the biosignal-based recognition techniques that has attracted the attention of many researchers in recent years [1]. Based on this technology, various applications are possible, such as user identification systems and emotion monitoring systems. As with other biosignal-based recognition techniques, facial expressions show very different characteristics from individual to individual, and it is hard to find a set of common features for various users. In many previous works on facial expression recognition, the main research directions are based on the psychological results of Ekman et al. [2]. According to [2], there exist six universal facial expressions for six emotions (happy, sad, fear, angry, surprise and disgust) regardless of differences in race, culture, region, sex and so on. In the robotics area, the facial expression is also a very important component of human-machine interaction. KISMET (AI Lab., MIT) and Face Robot (Science University of Tokyo) are good examples [3][4]. They can show various kinds of facial expressions based on the
Ekman’s emotional model. In computer science and other engineering fields, there are also interesting and promising results using NN-based approaches [5][6], a motion-based approach [7], an expert system for facial expression recognition [8] and so on. However, for real-world applications, conventional approaches are hard to apply directly because they ignore the personalized aspect of facial expressions, which differ from person to person.

After constructing a classifier, it is better to make it as simple as possible in view of computation time, cost and so on. Furthermore, in view of pattern classification, a parsimonious network is desirable in any case. To make the classifier as simple as possible, feature selection (or node pruning) is considered after the initial construction of the classifier. Based on related works on feature selection [9][10][11][12][13], we can summarize two important points for feature selection as follows: 1) consideration of statistical properties in the input domain, and 2) consideration of the structural aspect of the classifier itself.

Thus, in this paper, we propose a personalized facial expression recognition system that can adapt to (or learn) each person’s characteristics. In particular, fuzzy neural networks are adopted for constructing the personalized classifier because they can implement a human expert’s knowledge-based decision making and also provide a learning function. With fuzzy neural networks, it is therefore easy to incorporate previous work in the field of psychology (Dr. Ekman’s work) together with a learning capability that captures individual characteristics. A novel feature selection method based on the structure of the classifier is proposed for further reduction of the network size.

This paper is organized as follows. In Section II, the concept of personalized facial expression recognition is given.
Following this basic concept, each process necessary for facial expression recognition is described in the subsequent sections. In Section III, the personalized facial feature extraction
Fig. 1. Structure of personalized facial expression recognition system
Fig. 2. Face segmentation using CHAT procedure
process is described step by step, from face segmentation to facial feature extraction. Next, the fuzzy neural networks-based classifier and the feature selection method are explained in Section IV. The results of facial expression recognition are shown in Section V. Finally, concluding remarks are given in Section VI.

II. PERSONALIZED FACIAL EXPRESSION RECOGNITION SYSTEM

In this paper, we adopt the term ‘personalized’ for the following reasons. First, in view of image processing, we developed an adaptive method that deals with each individual’s different facial color components for effective face detection (or face segmentation) in a cluttered scene. Furthermore, we also used various methods to deal with individual differences in the arrangement of facial components, the extraction of facial features and so on. Next, in view of the classifier, each individual’s classifier has a different structure and different input nodes according to their importance for the classification result. Thus, we need a method to deal with this ‘personalization’ concept during the classification task. For this, we propose a novel method based on the relations between the input nodes and the output nodes in the classifier, and fuzzy neural networks are a very appropriate tool for this purpose. Furthermore, after learning is done, we can derive linguistic rules for each person’s facial expressions by analyzing the fuzzy neural networks.

Fig. 1 shows the whole structure of our recognition scheme. As shown in Fig. 1, the ‘personalized’ concept is included in the face segmentation process and in the feature selection process. According to this structure, image features from the first ‘Feature Extraction’ stage are passed to the second ‘FNN-based classifier’ stage as inputs of the fuzzy neural networks-based classifier.

III. FACIAL FEATURE EXTRACTION FOR PERSONALIZED FACIAL EXPRESSION RECOGNITION

A. Face Segmentation

For effective face segmentation, we used the I1I2I3 color space [14] and a ‘color histogram based adaptive threshold (CHAT)’ procedure. The I1I2I3 color space is very effective for skin color segmentation. The CHAT procedure is basically a method to obtain new thresholds for skin color segmentation based on the color histogram of each sub-band of a color image [15]. With this simple and fast procedure, we can get the necessary thresholds for skin color segmentation. Fig. 2 shows a temporal sequence of segmented
(a) Original facial images
(b) T-shape based deformable template matching for (a) Fig. 3. Coarse estimation of facial components
face images based on the CHAT procedure. As shown in Fig. 2, the final face image contains only the face region without any other artifacts.

B. Facial Component Extraction

For effective facial component extraction, we propose a ‘Coarse-to-Fine’ approach [16]. This approach is motivated by the way humans locate facial components. First, to get the coarse position of all facial components, we used ‘T-shape’ based deformable template matching. As shown in Fig. 3, the facial components are arranged in a ‘T-shape’ even when the face direction changes widely. Next, to locate each facial component finely, the well-known projection-based approach is used.

C. Facial Feature Extraction

From the previous steps, we obtain the location of each facial component. These facial components are categorized as permanent components, which are always present regardless of changes in facial expression. In contrast, transient components appear or disappear according to changes in facial expression. Fig. 4 shows a typical set of transient components [17]. From each facial component, some meaningful features should be extracted for facial expression recognition. Among many possible features for different components, we selected five features according to their importance for facial expressions, based on previous related work [18]. The selected features are (1) degree of mouth openness, (2) degree of eye openness, (3) the distance between the eyebrows and the eyes, (4) degree of nasal root wrinkles (NLR) and (5) degree of nasolabial furrows (NLF). For (3), (4) and (5), we used well-known image features such as the Euclidean distance between two components and Gabor-filtered coefficients [19].
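As a rough illustration of the Gabor-filtered coefficients used for features (3) to (5), the sketch below builds a real-valued 2D Gabor kernel with NumPy and takes the mean absolute response over an image patch. The kernel parameters (`sigma`, `wavelength`, `theta`, `gamma`), the function names and the use of the mean absolute response as a wrinkle measure are illustrative assumptions; the text does not specify the filter bank that was actually used.

```python
import numpy as np


def gabor_kernel(size=9, sigma=2.0, wavelength=4.0, theta=0.0, gamma=0.5):
    """Real part of a 2D Gabor filter (all parameter values are assumed)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier wave runs along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    return envelope * carrier


def filter_valid(image, kernel):
    """Plain 'valid' 2D correlation, written out to avoid extra dependencies."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def wrinkle_degree(patch, kernel):
    """Hypothetical wrinkle measure: mean absolute Gabor response."""
    return float(np.mean(np.abs(filter_valid(patch, kernel))))


kernel = gabor_kernel()
flat = np.full((20, 20), 0.5)                      # smooth patch, no texture
cols = np.arange(20)
striped = np.tile(0.5 + 0.5 * np.cos(2 * np.pi * cols / 4.0), (20, 1))
print(wrinkle_degree(flat, kernel), wrinkle_degree(striped, kernel))
```

A patch with stripes at the kernel's wavelength gives a much larger response than a flat patch, which is the property a wrinkle or furrow feature relies on.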
To get effective image features for (1) and (2), we propose a novel feature extraction method inspired by the human visual system. Humans can extract meaningful features for the eyes and mouth from a combination of global and local features without much effort [20].

1) Degree of mouth openness: The degree of mouth openness is measured by combining global features (the height ratio and the area ratio between the whole face and the mouth region) with a local feature that captures the mouth openness itself. For example, if a person has much thicker lips than others, we cannot easily determine the degree of mouth openness using global features alone. For the local feature, we propose a Gabor-Gaussian feature [21]. The Gabor-Gaussian feature is a Gaussian-weighted sum of the vertically projected Gabor-filtered coefficients of the mouth region. Eq. (1) shows the final form of the Gabor-Gaussian feature.

Fig. 4. Transient Facial Components [17]
Fig. 5. Feature extraction for eye openness: (a) Opened eyes, (b) Closed eyes

IV. FACIAL EXPRESSION RECOGNITION USING FUZZY NEURAL NETWORKS

A. Fuzzy Neural Networks
Fuzzy neural networks are a neural network-based implementation of a fuzzy decision making system. They combine the advantages of both systems: the knowledge representation of human experts and the learning function of neural networks [23]. A fuzzy neural network mainly consists of a 5-layered structure: 1) ‘Input Variable Node (or Input Feature Node)’, 2) ‘Input Linguistic Node’, 3) ‘Rule Node’, 4) ‘Output Linguistic Node’, and 5) ‘Output Variable Node’. Input/Output Variable Nodes are similar to those of conventional neural networks. The second layer represents the input linguistic terms for each input variable node, and the fourth layer represents the output linguistic terms for each output variable node. Each input/output linguistic node produces a membership value between 0 and 1 using the well-known bell-shaped Gaussian function defined by a center value and a deviation parameter. In the third layer, nodes in the second layer are connected to nodes in the fourth layer. This structural relationship corresponds to fuzzy decision making rules with antecedents (nodes in the second layer) and consequents (nodes in the fourth layer). Fig. 6 shows a simple network that illustrates each layer.

In applying this to the facial expression recognition problem, Dr. Ekman’s psychological work is taken as a starting point for constructing a fuzzy neural networks-based classifier for each individual. By including additional examples of each person’s different facial expressions, the learning phase of the fuzzy neural networks can be performed for personalization. In the learning phase, the well-known error-backpropagation algorithm is used to determine appropriate parameters of the bell-shaped Gaussian function in each input/output linguistic node.

To make the fuzzy neural networks-based classifier as small as possible, we have to consider the differences from person to person.
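The role of a linguistic node can be sketched as follows. The snippet evaluates a Gaussian membership function and takes one gradient-descent step on its center and deviation, which is the kind of update error backpropagation performs on these parameters. The exact exponent form, the squared-error loss, the learning rate and all names are assumptions made for illustration, not the formulation used in the paper.

```python
import math


def membership(x, center, sigma):
    """Gaussian membership in (0, 1]; the form exp(-((x - c) / sigma)^2) is assumed."""
    return math.exp(-((x - center) / sigma) ** 2)


def gradient_step(x, target, center, sigma, lr=0.1):
    """One gradient-descent step on 0.5 * (mu - target)^2 w.r.t. center and sigma."""
    mu = membership(x, center, sigma)
    err = mu - target
    # Chain rule through mu = exp(-u^2) with u = (x - center) / sigma.
    d_center = err * mu * 2.0 * (x - center) / sigma ** 2
    d_sigma = err * mu * 2.0 * (x - center) ** 2 / sigma ** 3
    return center - lr * d_center, sigma - lr * d_sigma


# Hypothetical linguistic term 'H' of one normalized input feature.
center_h, sigma_h = 1.0, 0.5
x = 0.8                                   # this person's feature value for 'H'
before = membership(x, center_h, sigma_h)
new_center, new_sigma = gradient_step(x, 1.0, center_h, sigma_h)
after = membership(x, new_center, new_sigma)
print(before, after)
```

After the step the center moves toward the observed value, so the membership of this person's sample in the term 'H' increases; this is how additional personal examples can reshape the initial expert-defined terms.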
From our careful observation of various kinds of facial expressions from many people, we found that some
(1)
where the symbols denote, in turn, the Gaussian weights, the absolute value of the derivative of the projected value, and the height of the Gabor-filtered image.

2) Degree of eye openness: Next, for the degree of eye openness, we investigated various eye images and found that the shape of the upper eyelid is quite different between closed eyes and opened eyes. Thus, by extracting the shape information of the upper eyelid, we can easily estimate the degree of eye openness. For this purpose, we adopted the ‘dip’ feature in the log-polar mapped image [22], as shown in Fig. 5. The dip feature is used as a global feature in this case. For the local feature, we used Gabor-filtered coefficients.
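The Gabor-Gaussian feature described for mouth openness (a Gaussian-weighted sum of the absolute derivative of the projected Gabor-filtered coefficients) can be sketched as below. This is only one reading of the verbal description: projecting along each row to obtain a profile over the image height, centering the Gaussian weights on the middle row, the `spread` parameter and the function name are all assumptions.

```python
import numpy as np


def gabor_gaussian_feature(filtered, spread=0.25):
    """Gaussian-weighted sum of |d/dy| of the row-wise projection.

    `filtered` is a 2D array of Gabor-filtered coefficients (rows = height).
    Centering the weights on the middle row and `spread` are assumptions.
    """
    height = filtered.shape[0]
    projection = np.abs(filtered).sum(axis=1)      # one value per row
    slope = np.abs(np.diff(projection))            # |derivative| along the height
    rows = np.arange(slope.size)
    center = (slope.size - 1) / 2.0
    sigma = spread * height
    weights = np.exp(-((rows - center) ** 2) / (2.0 * sigma ** 2))
    return float(np.sum(weights * slope))


# Toy responses: a uniform region versus one with a strong band in the middle,
# standing in for the edges of an opened mouth.
closed = np.ones((20, 30))
opened = closed.copy()
opened[8:12, :] = 3.0
print(gabor_gaussian_feature(closed), gabor_gaussian_feature(opened))
```

The uniform region gives a zero feature value, while the band produces two strong transitions near the center that the Gaussian weights emphasize.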
Fig. 6. Simple example for FNN: 5-layered structure
specific facial features are more important than others for each individual. For example, for a person who can move his or her eyebrows at will, the eyebrows are a more important factor than other facial features for classifying various facial expressions. In the same spirit, feature selection (or input node pruning) is a method to select an appropriate feature set from the original feature set according to a specific measure that evaluates the importance of each input feature. In this paper, a novel feature selection method is proposed for the fuzzy neural networks-based ‘personalized’ facial expression recognition problem on the basis of a histogram of input linguistic nodes.

B. Feature Selection by Histogram of Input Linguistic Nodes

Feature selection has been studied by many researchers in the field of neural networks-based structures [9][10][11][12][13]. A few works concentrate on fuzzy neural networks; however, they do not consider the unique characteristics of fuzzy neural networks such as the usage of a human expert’s knowledge [11]. Thus, in this paper, we focus on the fact that fuzzy neural networks can represent a human expert’s rules as input/output relations through each rule node. To consider the input/output relations in the classifier, we define a concept called the ‘histogram of input linguistic nodes’. Briefly, the histogram of input linguistic nodes is information derived, in numerical form, from the input/output relations in the classifier. When we take the input linguistic nodes as the bins of a histogram, we can easily obtain the histogram of input linguistic nodes, which is related to the output linguistic nodes. For better understanding, we explain how to get the histogram of input linguistic nodes using the simple network in Fig. 6. This network consists of two input variable nodes, each with two linguistic nodes ‘L’ and ‘H’. The output part has a similar structure to the input part.
From the structure of Fig. 6, for each rule node in the third layer, we can obtain a 2D relational matrix according to the connections between nodes in the second layer and nodes in the fourth layer, as shown in Table I. In Table I, each element denotes the relation for a pair of input linguistic node and output linguistic node as an integer value: ‘0’ stands for a non-connected pair, and ‘1’ or ‘2’ denotes how many connections exist for the pair. Symbolically, each element in the 2D relational matrix can be represented as . For example, if we consider the pair of ‘ ’ and ‘ ’, the pair is connected two times in Fig. 6 and is allocated the value ‘2’.
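The counting that produces such a relational matrix can be sketched as follows. The rule base below is hypothetical: it is not the network of Fig. 6 and is not meant to reproduce Table I. The node names and the convention that rows are input linguistic nodes and columns are output linguistic nodes are likewise assumptions.

```python
from itertools import product

INPUT_NODES = ["x1_L", "x1_H", "x2_L", "x2_H"]
OUTPUT_NODES = ["y1_L", "y1_H", "y2_L", "y2_H"]

# Hypothetical rule base: (antecedent input terms, consequent output terms).
RULES = [
    (("x1_L", "x2_L"), ("y1_L", "y2_H")),
    (("x1_L", "x2_H"), ("y1_H", "y2_H")),
    (("x1_H", "x2_L"), ("y1_L", "y2_L")),
]


def relational_matrix(rules, input_nodes, output_nodes):
    """Count, for every (input term, output term) pair, how many rule nodes link them."""
    row = {name: i for i, name in enumerate(input_nodes)}
    col = {name: j for j, name in enumerate(output_nodes)}
    matrix = [[0] * len(output_nodes) for _ in input_nodes]
    for antecedents, consequents in rules:
        for a, c in product(antecedents, consequents):
            matrix[row[a]][col[c]] += 1
    return matrix


matrix = relational_matrix(RULES, INPUT_NODES, OUTPUT_NODES)
for name, counts in zip(INPUT_NODES, matrix):
    print(name, counts)
```

In this toy rule base the pair (`x1_L`, `y2_H`) appears in two rules, so its entry is 2, in the same way that an element of Table I takes the value ‘2’ when a pair is connected twice.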
Now, in view of the input linguistic nodes, we can define the two functions that are necessary to construct the histogram of input linguistic nodes, as shown in Eqn. (2).
(2)
In Eqn. (2), is the familiar, conventionally used histogram whose bins are the input linguistic nodes. For each input linguistic node , all the elements in the 2D relational matrix are added to give the histogram value. However, this is not sufficient to represent the relation between input and output effectively. Because the term ‘H’ is more important than the term ‘L’ in view of classification, we can assign an additional value to each element to reflect the importance of the term ‘H’ as . In , the weight function ‘ ’ is defined as follows.
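A minimal sketch of the two histograms, applied to the matrix of Table I, is given here under stated assumptions: rows are taken to be input linguistic nodes, columns are taken to be output linguistic nodes in the order L, H, L, H, and the weight is taken to be +1 for an output term ‘H’ and -1 for ‘L’. These conventions and all names are illustrative guesses and should not be read as the paper's definition of the weight function.

```python
import numpy as np

# Table I, read row-wise; rows are assumed to be input linguistic nodes and
# columns output linguistic nodes in the assumed order L, H, L, H.
R = np.array([[0, 0, 1, 2],
              [1, 1, 1, 0],
              [1, 2, 2, 1],
              [1, 0, 1, 1]])
OUTPUT_TERMS = ["L", "H", "L", "H"]


def plain_histogram(relation):
    """Unweighted histogram: total number of connections per input node."""
    return relation.sum(axis=1)


def weighted_histogram(relation, output_terms, w_high=1, w_low=-1):
    """Weighted histogram; the +1 / -1 weights for 'H' / 'L' are assumed."""
    weights = np.array([w_high if t == "H" else w_low for t in output_terms])
    return relation @ weights


def prune_candidates(weighted):
    """Indices of input linguistic nodes whose weighted value is negative."""
    return [int(i) for i in np.flatnonzero(weighted < 0)]


plain = plain_histogram(R)
weighted = weighted_histogram(R, OUTPUT_TERMS)
print(plain.tolist(), weighted.tolist(), prune_candidates(weighted))
```

Under these assumptions the unweighted histogram only reflects how often a node is connected, whereas the weighted one can become negative for nodes that are mostly linked to ‘L’ output terms, which is what makes a sign-based pruning rule possible.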
Thus, using values to construct the histogram of input linguistic nodes, we can account for the importance of each input linguistic node more effectively in view of classification. According to Eqn. (2), the 2D relational matrix in Table I has two kinds of histogram, as shown in Fig. 7(a) and Fig. 7(b). As shown in Fig. 7, is more useful than for determining the importance of each input variable node (or input feature). As a simple strategy, we may delete input linguistic nodes with negative values because their importance is lower than that of
TABLE I
2D RELATIONAL MATRIX
0 0 1 2
1 1 1 0
1 2 2 1
1 0 1 1

(a) Histogram by
(b) Histogram by
Fig. 7. Histogram of input linguistic nodes for the network shown in Fig. 6
Fig. 8. An example set of facial images with seven facial expressions [2]
other nodes with positive values. In some cases, we may instead delete only the nodes with the most negative values, since they perform worst among all nodes with negative values.

V. RECOGNITION RESULTS

As the classifier, we construct a fuzzy neural networks-based classifier as shown in Fig. 9. The inputs are five image features and the outputs are the degree of each facial expression. To show the performance of our proposed method, we used a well-known facial image DB which contains 110 images from 14 persons [2]. Here, we focus on the performance of our proposed feature selection method with respect to the size of the network, the computation time, the recognition ratio under statistically varying disturbance and so on. Using the five image features of the previous section and fuzzy neural networks with five inputs and seven outputs, we construct a facial expression recognition system based on the proposed procedure. The seven outputs correspond to the six universal facial expressions plus the neutral face from Ekman’s work (see Fig. 8). Next, we generate two additional classifiers with the proposed feature selection method. Following the procedure given in the previous section, we get the histogram of input linguistic nodes shown in Fig. 10. Based on this histogram, we can make two different classifiers: one with two nodes excluded (the nodes with the most negative values, ‘ ’ and ‘ ’) and the other with five nodes excluded (all nodes with negative values, ‘ ’, ‘ ’, ‘ ’, ‘ ’ and ‘ ’). For each classifier, we input each person’s facial feature set with a different standard
Fig. 10. Histogram of input linguistic nodes for personalized facial expression recognition system
Fig. 11. Recognition ratio of different FNNs given standard deviations
deviation value. Specifically, we used values of 1%, 2%, 5%, 10%, 20% and 40%. Fig. 11 shows the recognition ratio for this test for varying standard deviation values and different structures. As shown in Fig. 11, even with 5% variation, the recognition ratio stays around 100%. However, from 10% variation onward, large errors occur for the FNN with five nodes excluded. Table II compares the three classifiers in terms of size and computation time. From this result, we see that the proposed method provides a more efficient classifier with a highly reduced structure and also less computation time.

Fig. 9. Structure of Fuzzy Neural Networks

TABLE II
COMPARISON AMONG THREE CLASSIFIERS
                # of branches   # of nodes    Reduced size   Computation time (msec)
No exception    73              43            -              18.5
2 exceptions    64 (87.6%)      41 (95.3%)    83.5%          17.7 (95.6%)
5 exceptions    47 (64.4%)      38 (88.4%)    56.9%          16.1 (87.3%)

VI. CONCLUDING REMARKS

In this paper, a personalized facial expression recognition scheme is proposed with a novel feature selection scheme. A preliminary result for the proposed scheme is given with different structures of the classifier and various deviations of image features. As shown in Section V, we can build a reasonably small and fast recognition system with the proposed methods. The proposed scheme is very promising for various human-friendly systems such as pet robots, intelligent residential spaces, other service robotic systems and so on. As further work, we are studying the use of statistical properties of input features to make a better feature selection scheme.

REFERENCES
[1] 1st Workshop on Biometrics, Korea, 2001.
[2] P. Ekman et al., “Facial Action Coding System,” 1978.
[3] C. Breazeal et al., “A context-dependent attention system for a social robot,” IJCAI99, pp. 1146–1151, 1999.
[4] F. Hara and H. Kobayashi, “An Animate Face Robot,” The 5th Conference on Interface for Real and Virtual Worlds, 1996.
[5] C. Huang and Y. Huang, “Facial expression recognition using model-based feature extraction and action parameters classification,” Journal of Visual Communication and Image Representation, vol. 8, no. 3, pp. 278–290, 1997.
[6] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, “Coding facial expressions with Gabor wavelets,” in IEEE International Conference on Automatic Face and Gesture Recognition, pp. 200–205, 1998.
[7] I. Essa and A. Pentland, “Coding, analysis, interpretation, and recognition of facial expressions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 757–763, 1997.
[8] M. Pantic and L. Rothkrantz, “Automatic analysis of facial expressions: The state of the art,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1424–1445, 2000.
[9] K. Chung et al., “Performance Comparison of Several Feature Selection Methods Based on Node Pruning in Handwritten Character Recognition,” Proceedings of ICDAR’99, pp. 11–15, 1997.
[10] V. Onnia et al., “Feature Selection method using neural networks,” Proceedings of ICIP2001, pp. 513–516, 2001.
[11] D. Chakraborty et al., “Designing rule-based classifiers with on-line feature selection: a neuro-fuzzy approach,” Proceedings of AFSS2002, pp. 251–259, 2002.
[12] J.-S. Han, W.-C. Bang and Z. Zenn Bien, “New Feature Set Selection Algorithm based on Soft Computing Techniques and its Application to EMG Pattern Classification,” International Journal of Fuzzy Optimization and Decision Making (in print).
[13] M.-J. Kim, J.-S. Han, K.-H. Park, W.-C. Bang and Z. Zenn Bien, “Classification of Arrhythmia Based on Discrete Wavelet Transform and Rough Set Theory,” ICCAS 2001 (International Conference on Control, Automation and Systems), Jeju National University, Jeju, Korea, October 17-21, 2001.
[14] Y. Ohta et al., “Color Information for Region Segmentation,” Computer Graphics and Image Processing, vol. 13, pp. 222–241, 1980.
[15] D.-J. Kim, “Image-based Personalized Facial Expression Recognition using Fuzzy Neural Networks,” Doctor’s Thesis Proposal, KAIST, Korea, 2002.
[16] F. Fleuret et al., “Coarse-to-fine visual selection,” citeseer.nj.nec.com/fleuret99coarsetofine.html.
[17] Y.-L. Tian et al., “Recognizing action units for facial expression analysis,” IEEE Transactions on PAMI, vol. 23, no. 2, pp. 97–115, 2001.
[18] G.-T. Park et al., “Fuzzy Observer approach to Automatic Recognition of Happiness using Facial Wrinkle Features,” FUZZ-IEEE’99, pp. 1573–1578, 1999.
[19] Z. Zhang et al., “Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron,” IEEE International Conference on Automatic Face and Gesture Recognition, pp. 454–459, 1998.
[20] G. Donato et al., “Classifying facial actions,” IEEE Transactions on PAMI, vol. 21, no. 10, pp. 974–989, 1999.
[21] D.-J. Kim et al., “Effective intention reading technique as a means of human-robot interaction for human centered systems,” IEEE International Conference on Fuzzy Systems, S303, 2001.
[22] C.F. Weiman, “Tracking algorithm using log-polar mapped image coordinates,” SPIE: Intelligent Robots and Computer Vision VIII: Algorithms and Techniques, vol. 1192, pp. 843–853, 1989.
[23] C.T. Lin, “Neural Fuzzy Control Systems with Structure and Parameter Learning,” World Scientific, 1994.
[24] S. Abramowski et al., “Service management for personalized services,” Intelligent Network ’94 Workshop, vol. 1, pp. 239–251, 1994.