FUZZY-LOGICAL IMPLEMENTATION OF CO-OCCURRENCE RULES FOR COMBINING AUS

A. Wojdeł, L.J.M. Rothkrantz and J.C. Wojdeł
Knowledge Based Systems Group
Delft University of Technology
Mekelweg 4, 2628 CD Delft
The Netherlands
[email protected]

ABSTRACT
In this paper we present how to implement the co-occurrence rules defined by the psychologist Paul Ekman in a computer-animated face. The rules describe the dependencies between the atomic observable movements of the human face (so-called Action Units). They are defined in a form suited to a human observer who needs to produce a consistent binary scoring of visible occurrences on the human face. They are not directly applicable to automated animation systems, which must deal with facial geometry, smooth changes in occurrence intensities, etc. In order to utilize the knowledge about human faces that is present in the work of Ekman, we chose a fuzzy-logical approach, defining the co-occurrence rules as specific fuzzy-logical operators.

KEY WORDS
Facial animation, FACS, A.I. based animation, fuzzy logic
1 Introduction

It is a common human desire to understand in depth what message lies behind the verbal part of communication. But a message also has non-verbal aspects. If we are aware of the effect that our body language or facial expressions can have on the other person (whether he/she is conscious of this influence or not), we can control them in such a way that the communication proceeds in the most efficient and beneficial way [1]. The number of popular books on the topics of body language, emotional conversation, etc. shows clearly that the influence of the non-verbal part of human-to-human communication should not be underestimated [2]. The appearance of the human face is not only responsible for the non-verbal part of communication; it also plays an active role in speech understanding. It is known that even normal-hearing people use lip-reading to some extent. Further, studies show that visibility of the whole face [3], together with the rest of the human body [4, 5], increases communication efficiency. Appropriate facial expressions or body gestures not only improve the intelligibility of speech but can also be used as a
replacement for specific dialogue acts (such as confirmation or spatial specification). Therefore, it is understandable that, as soon as computers became multi-modal communication devices, the need for robust facial animation became apparent. The topic of computer-generated facial animation ranges from cartoon-like characters [6], through simple 3D models [7], to realistic 3D models that can be used in movies instead of real actors [8]. A common approach to facial animation is to use 3D parametric models of the face represented by a polygonal topology. The parameterization is done by grouping vertices together to perform predefined tasks. The parameters can be varied, and each of their infinitely many combinations represents some facial expression. In this flexibility lies both the strength and the weakness of parametric models. Parametric models, in contrast to for example key-frame models, can easily be used to generate unrealistic facial expressions: expressions that are either physically or psychologically impossible. Only the complex physiologically based models guarantee the physical validity of rendered expressions. Therefore, in automated facial animation systems, it is very important to define constraints and co-occurrence rules for the parameters.
2 Facial Expressions Modeler

Our system for facial animation is inspired by the Facial Action Coding System (FACS) introduced by P. Ekman and W.V. Friesen [9]. FACS is based on Action Units (AUs), where each AU represents an elementary facial movement that cannot be divided into more basic ones. There are 44 AUs representing movements of the face surface, 8 AUs describing movements of the whole head, and 6 AUs related to gaze direction. According to Ekman and Friesen, each facial expression can be described as an appropriate combination of those AUs.

Our animation system is developed in two separate parts: text processing and expressions processing (Figure 1). The first part, text processing, facilitates interaction with the user. Here, the user can design the animation. The user is aided in this task by being provided with a facial expressions script language.
Figure 1. An overview of the developed animation system

Figure 2. A screenshot of the implemented facial animation system
This script language contains a predefined set of facial expressions together with descriptions of their meanings and a multi-modal query system. More about our facial expression script language can be read in [10] (see also the query system at http://www.kbs.twi.tudelft.nl/People/Students/E.J.deJongh/).

The second part of the system, expressions processing, is fully automatic. Input data in the form of text accompanied by a representation of facial expressions is processed automatically and results in a rendered facial animation.

The facial model used in our system was designed with two constraints in mind: it should produce realistic (in the behavioral sense) facial expressions and it should not limit the number of available expressions. For those reasons our model is performance based (the facial movements are modeled from recordings of a real person) and at the same time parameterized (so that we are not confined only to the movements that were actually recorded). We also want to reuse as much of the knowledge contained in the works of Ekman as possible. Therefore, in our facial model, each parameter corresponds to one of the AUs from FACS. Each facial parameter is automatically adjusted in such a way that the resulting facial deformation optimally represents the AU performed by the subject on which the model is trained [11]. Moreover, we have implemented methods to accumulate the displacements from separate AUs, and rules on how to show different combinations of AUs [12]. This should be sufficient, according to Ekman, to generate any desired facial expression. Figure 2 shows the software for directly controlling the facial model.

The animation system is aimed at animating the face in the context of non-verbal communication between people or between human and machine. Therefore, we restricted our implementation to those facial parameters that correspond to AUs which are actually used in everyday face-to-face communication. A fair number of AUs were not taken into consideration for this reason (such as AU29 - Jaw Thrust, AU33 - Blow, or AU35 - Suck). A full list of implemented AUs is presented in Table 1.
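To give an idea of how such a parameterized, performance-based model can be driven by AU activations, the following is a minimal sketch in Python. It assumes the model stores one learned vertex-displacement field per AU and accumulates them linearly; the function and variable names, and the simple linear accumulation, are our illustrative assumptions and not the exact accumulation method described in [12].

```python
import numpy as np

def apply_aus(neutral_vertices, au_displacements, activations):
    """Deform a neutral face mesh by accumulating per-AU displacement fields.

    neutral_vertices : (V, 3) array with the neutral face geometry.
    au_displacements : dict mapping AU name -> (V, 3) displacement field
                       learned from recordings of the trained subject.
    activations      : dict mapping AU name -> activation in [0, 1],
                       already corrected by the AU Blender.
    """
    deformed = neutral_vertices.copy()
    for au, intensity in activations.items():
        if au in au_displacements:
            # Linear blend: each AU contributes its displacement field
            # scaled by its (corrected) activation value.
            deformed = deformed + intensity * au_displacements[au]
    return deformed
```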
3 Action Units and their co-occurrences

Although AUs are defined so that they can be scored independently, there are restrictions on how different AUs interact with each other and on whether they are allowed to occur together at all. Ekman introduces 5 generic co-occurrence rules which describe the ways in which AUs combine and influence each other. First of all, the combination of AUs can be additive. In such a case they operate as if they were activated separately, and the resulting facial movement is a plain summation of the separate displacements. Additive combinations usually occur when the involved AUs appear on separate areas of the face. Further, one AU can dominate over another, thereby diminishing the result of the activation of the latter AU.
Figure 3. Dependencies between Action Units implemented in our system
An example of such an interaction is the combination of AU9 and AU10: activation of AU9 raises the upper lip as a side effect of nose wrinkling, and therefore diminishes the result of AU10 activation. In cases where AUs cannot be scored simultaneously, because the anatomy of the face does not allow both AUs to be performed at the same time, we say that they combine in an alternative way. There is also the possibility of substitution, when the occurrence of two AUs at the same time is equivalent to the activation of a third AU alone. Finally, all of the exceptions that cannot be modeled in the above-mentioned ways fall into a group of different ways of combining AUs. Even though there are only 5 classes of AU interaction, the overall set of restrictions in FACS is far from simple. Figure 3 contains a chart with the co-occurrence rules for selected AUs that are implemented in our system. The graph in Figure 3 is directed, which reflects the fact that not all of the interactions are mutual (e.g. AU15 dominates over AU12, but changes of AU12 do not influence AU15 at all).

The description of the co-occurrence rules provided by Ekman is in verbal form and operates on a binary scoring system in which any given AU can be either active (1) or not (0). There are several exceptions to this binary schema. In cases where the intensity of the observed facial deformation cannot be disregarded, FACS introduces three additional categories of AU intensity called low, medium and high. They are denoted by appending one of the letters x, y or z, respectively, to the AU number. It is obvious that the facial model cannot be based directly on discrete values of AU activations. The changes in the facial geometry need to be continuous in order to yield a smooth and realistic (not to mention visually pleasant) animation. That requires a continuous set of control parameters. The AU co-occurrence rules can be used to establish the dependencies between the facial parameters. We need to ensure that the results obtained from the animation system comply with those rules for all combinations of model parameters.
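One way to bridge the binary FACS scoring and the continuous model parameters is to fuzzify AU activations with membership functions for the FACS intensity categories. The sketch below, in Python, uses trapezoidal membership functions of the kind mentioned in Section 4; the threshold values are purely illustrative assumptions, not the ones used in our system.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def au_intensity_memberships(activation):
    """Fuzzify a continuous AU activation in [0, 1] into the FACS intensity
    categories low (x), medium (y) and high (z). Thresholds are illustrative."""
    return {
        "x": trapezoid(activation, 0.0, 0.15, 0.35, 0.5),
        "y": trapezoid(activation, 0.35, 0.5, 0.65, 0.8),
        "z": trapezoid(activation, 0.65, 0.85, 1.0, 1.001),
    }
```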
4 Implementation of co-occurrence rules

In order to implement the restrictions described in Ekman's work, we decided to introduce a separate module in our system. This module is called the AU Blender; it takes a list of AUs with their respective activation values and produces a new list in which the activations are modified so that they conform to the co-occurrence rules described in FACS. We denote the incoming AU activations by their respective names, and the outgoing activations are put in square brackets. This process is realized as a form of fuzzy processing that extends the Boolean logic described in FACS. We present here all the implemented classes of interactions between AUs using specific examples. Each implementation is referred to by the name of the rule and is followed by an example in the notation used in FACS.

Domination (63>5). The domination rule says that if AU63 is activated, it overshadows AU5. In other words, AU5 is activated only if the absence of AU63 allows for it. The Boolean logic of this rule would be:
$(\neg AU63 \wedge AU5) \Rightarrow [AU5]$

The fuzzy-logical implementation of the above rule is:

$[AU5] = \min\{1 - AU63,\ AU5\}$
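In code, this domination operator is simply a min over the dominated activation and the complement of the dominating one. A minimal sketch in Python (the function name is ours, not the system's):

```python
def dominate(dominated, dominator):
    """Fuzzy domination: [AU5] = min(1 - AU63, AU5).
    The dominated AU keeps only as much activation as the absence of
    the dominating AU allows."""
    return min(1.0 - dominator, dominated)

# Example: AU63 (Eyes Up) at 0.8 leaves at most 0.2 of AU5 (Upper Lid Raiser).
au5_corrected = dominate(dominated=0.6, dominator=0.8)   # -> 0.2
```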
Domination of multiple AUs (6>7, 9>7). AU7 is suppressed if either AU6 or AU9 is activated. This is a straightforward extension of the previous rule:
$(\neg AU6 \wedge AU7) \wedge (\neg AU9 \wedge AU7) \Rightarrow [AU7]$

which is equivalent to the following:

$(\neg AU6 \wedge \neg AU9 \wedge AU7) \Rightarrow [AU7]$

Therefore it is implemented as:

$[AU7] = \min\{1 - AU6,\ 1 - AU9,\ AU7\}$
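Generalizing the sketch above to any number of dominating AUs is straightforward; again, the helper name and calling convention are only illustrative:

```python
def dominate_many(dominated, dominators):
    """Fuzzy domination by several AUs at once, e.g.
    [AU7] = min(1 - AU6, 1 - AU9, AU7)."""
    return min([1.0 - d for d in dominators] + [dominated])

# Example: AU6 = 0.3 and AU9 = 0.5 limit AU7 = 0.9 to min(0.7, 0.5, 0.9) = 0.5.
au7_corrected = dominate_many(0.9, [0.3, 0.5])
```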
Table 1. Implemented Action Units

AU     Description             AU     Description       AU     Description
AU1    Inner Brow Raiser       AU17   Chin Raiser       AU51   Head Turn Left
AU2    Outer Brow Raiser       AU18   Lip Puckerer      AU52   Head Turn Right
AU4    Brow Lowerer            AU20   Lip Stretcher     AU53   Head Up
AU5    Upper Lid Raiser        AU22   Lip Funneler      AU54   Head Down
AU6    Cheek Raiser            AU23   Lip Tightener     AU55   Head Tilt Left
AU7    Lid Tightener           AU24   Lip Presser       AU56   Head Tilt Right
AU9    Nose Wrinkler           AU25   Lips Part         AU61   Eyes Turn Left
AU10   Upper Lip Raiser        AU26   Jaw Drop          AU62   Eyes Turn Right
AU12   Lip Corner Puller       AU27   Mouth Stretch     AU63   Eyes Up
AU15   Lip Corner Depressor    AU28   Lip Suck          AU64   Eyes Down
AU16   Lower Lip Depressor     AU43   Eyes Closed

Domination of AU combination (20+23>18). AU20 and AU23 together dominate over AU18. That means that AU18 is suppressed only if both AU20 and AU23 are activated:

$(\neg(AU20 \wedge AU23) \wedge AU18) \Rightarrow [AU18]$

After fuzzification:

$[AU18] = \min\{1 - \min\{AU20,\ AU23\},\ AU18\}$

Domination of a strong AU (15z>12). AU15 dominates over AU12 only if it is strongly activated. The Boolean version of this rule is simply a realization of the domination rule:

$(\neg AU15z \wedge AU12) \Rightarrow [AU12]$

It introduces a new logical variable AU15z, which represents the subclass of all facial deformations described by AU15 that can be considered strong. In a fuzzy-logical implementation, AU15z is actually a function of the activation value of AU15. We can use here a typical trapezoidal membership function, as often used in fuzzification. The final implementation of this rule follows the one described for the domination rule, with AU15z used instead of AU15.

Exclusion (18@28). The interaction between AU18 and AU28 is described in FACS in such a way that they cannot be scored together. In our implementation we privileged AU18 in such a way that its appearance cancels the scoring of AU28. This relation is quite similar to the domination rule, but with a much stronger interaction between the AUs. This kind of behavior can be described as follows: AU28 may be scored only if the activation of AU18 is negligibly small. This interpretation of the rule yields the following Boolean realization:

$(\neg AU18x \wedge AU28) \Rightarrow [AU28]$

Opposite AUs (51@52). The FACS manual describes the interaction between AU51 and AU52 also as exclusion. However, we can see that AU51 and AU52 describe two opposite movements of the head. In order to preserve the apparent symmetry of this relation, we need a fuzzy-logical opposition operator that does not have a Boolean counterpart:

$[AU51] = \max\{0,\ AU51 - AU52\}$
$[AU52] = \max\{0,\ AU52 - AU51\}$

The resulting activations of AU51 and AU52 are depicted in Figure 4.
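Putting the examples above together, the AU Blender can be pictured as a single pass over the activation vector that applies each rule's fuzzy operator in turn. The sketch below is only an assumption of how such a pass might look; the rule instances mirror the examples in this section, the thresholds for AU15z and AU18x are illustrative, and the real module implements the full rule set of Figure 3.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 below lo, 1 above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def blend(aus):
    """Correct a dict of AU name -> activation in [0, 1] with the
    co-occurrence operators discussed in this section."""
    out = dict(aus)
    a = lambda name: out.get(name, 0.0)

    # Domination (63>5) and domination by multiple AUs (6>7, 9>7).
    out["AU5"] = min(1.0 - a("AU63"), a("AU5"))
    out["AU7"] = min(1.0 - a("AU6"), 1.0 - a("AU9"), a("AU7"))

    # Domination of AU combination (20+23>18).
    out["AU18"] = min(1.0 - min(a("AU20"), a("AU23")), a("AU18"))

    # Domination of a strong AU (15z>12): AU15z is a fuzzy function of AU15.
    au15z = ramp(a("AU15"), 0.6, 0.9)          # illustrative thresholds
    out["AU12"] = min(1.0 - au15z, a("AU12"))

    # Exclusion (18@28): AU28 is scored only if AU18 is negligibly small.
    au18x = ramp(a("AU18"), 0.0, 0.05)         # illustrative threshold
    out["AU28"] = min(1.0 - au18x, a("AU28"))

    # Opposition (51@52): symmetric fuzzy opposition.
    au51, au52 = a("AU51"), a("AU52")
    out["AU51"] = max(0.0, au51 - au52)
    out["AU52"] = max(0.0, au52 - au51)

    return out
```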
5 Results

With all of the co-occurrence rules from FACS implemented, the AU Blender module can be used to correct the input activations so that they do not conflict with each other. Our fuzzy-logical implementation has been tested on a wide range of input parameters. In this section we go through some examples of the obtained corrections. Figure 5 presents these examples in tabular form. Each row contains two independent facial expressions generated by our system, their uncorrected combination, and the result of blending them together in accordance with the co-occurrence rules. The first example in Figure 5 shows the result of applying the exclusion rule when combining expressions containing AU25 and AU27 (27@25). It can be seen that those two AUs, when combined together, result in an abnormal shape of the mouth opening. In the second example, AU1 is dominated by AU9 (1