FUZZY-LOGICAL IMPLEMENTATION OF CO-OCCURRENCE RULES FOR COMBINING AUS

A. Wojdeł, L.J.M. Rothkrantz and J.C. Wojdeł
Knowledge Based Systems Group
Delft University of Technology
Mekelweg 4, 2628 CD Delft
The Netherlands
[email protected]

ABSTRACT
In this paper we present how to implement the co-occurrence rules defined by the psychologist Paul Ekman in a computer-animated face. The rules describe the dependencies between the atomic observable movements of the human face (so-called Action Units). They are defined in a form suitable for a human observer who needs to produce a consistent binary scoring of visible occurrences on the human face. They are not directly applicable to automated animation systems, which must deal with facial geometry, smooth changes in occurrence intensities, etc. In order to utilize the knowledge about human faces present in the work of Ekman, we chose a fuzzy-logical approach, defining the co-occurrence rules as specific fuzzy-logical operators.

KEY WORDS
Facial animation, FACS, A.I. based animation, fuzzy logic

1 Introduction

It is a common human desire to understand in depth what message lies behind the verbal part of communication. A message, however, also has non-verbal aspects. If we are aware of the effect that our body language or facial expressions can have on the other person (whether he/she is conscious of this influence or not), we can control them in such a way that the communication proceeds in the most efficient and beneficial way [1]. The number of popular books on body language, emotional conversation, etc. clearly shows that the influence of the non-verbal part of human-to-human communication should not be underestimated [2].

The appearance of a human face is not only responsible for the non-verbal part of communication. It also plays an active role in speech understanding. It is known that even normal-hearing people use lip-reading to some extent. Further, studies show that the visibility of the whole face [3], together with the rest of the human body [4, 5], increases communication efficiency. Appropriate facial expressions or body gestures not only improve the intelligibility of speech but can also be used as a replacement for specific dialogue acts (such as confirmation or spatial specification). Therefore, it is understandable that, as soon as computers became multi-modal communication devices, the need for robust facial animation became apparent.

The topic of computer-generated facial animation ranges from cartoon-like characters [6], through simple 3D models [7], to realistic 3D models that can be used in movies instead of real actors [8]. A common approach to facial animation is to use 3D parametric models of the face represented by a polygon topology. The parameterization is done by grouping vertices together to perform predefined tasks. The parameters can be varied, and each of their infinite combinations represents some facial expression. In this flexibility lies both the strength and the weakness of parametric models. Parametric models – in contrast to, for example, key-frame models – can easily be used to generate unrealistic facial expressions: expressions that are either physically or psychologically impossible. Only complex physiologically based models guarantee the physical validity of rendered expressions. Therefore, in automated facial animation systems, it is very important to define constraints and co-occurrence rules for the parameters.

2 Facial Expressions Modeler

Our system for facial animation is inspired by the Facial Action Coding System (FACS) introduced by P. Ekman and W.V. Friesen [9]. FACS is based on Action Units (AUs), where each AU represents a simplest facial movement that cannot be divided into more basic ones. There are 44 AUs representing movements of the face surface, 8 AUs describing movements of the whole head, and 6 AUs related to gaze direction. According to Ekman and Friesen, each facial expression can be described as an appropriate combination of those AUs.

Our animation system is developed in two separate parts: text processing and expressions processing (Figure 1). The first part, text processing, facilitates interaction with a user. Here, the user can design the animation. The user is aided in this task by being provided with a facial expressions script language.

Figure 1. An overview of the developed animation system (text processing: Dictionary of Facial Expressions and Facial Expressions Generator; expressions processing: Expressions Synchronizator, Lips Synchronizator, sign-to-set-of-AUs Translator, Action Units Blender and Face Animator operating on the Model of the Face to produce the 3D animated face)

Figure 2. A screenshot of the implemented facial animation system

This script language contains a predefined set of facial expressions together with descriptions of their meanings and a multi-modal query system. More about our facial expression script language can be read in [10] (see also the query system at http://www.kbs.twi.tudelft.nl/People/Students/E.J.deJongh/). The second part of the system, expressions processing, is fully automatic. Input data in the form of text accompanied by the representation of facial expressions is processed automatically and results in a rendered facial animation.

The facial model used in our system was designed with two constraints in mind: it should produce realistic (in the behavioral sense) facial expressions, and it should not limit the number of available expressions. For those reasons our model is performance based (the facial movements are modeled from recordings of a real person) and at the same time parameterized (so that we are not confined only to the movements that were actually recorded). We also want to reuse as much of the knowledge contained in the works of Ekman as possible. Therefore, in our facial model, each parameter corresponds to one of the AUs from FACS. Each facial parameter is automatically adjusted in such a way that the resulting facial deformation optimally represents the AU performed by the subject on which the model is trained [11]. Moreover, we have implemented methods to accumulate displacements from separate AUs, and rules on how to show different combinations of AUs [12]. This should be sufficient – according to Ekman – to generate any desired facial expression. Figure 2 shows the software for directly controlling the facial model.

The animation system is aimed at animating the face in the context of non-verbal communication between people or between human and machine. Therefore we restricted our implementation to only those facial parameters that correspond to AUs which are really used in everyday face-to-face communication. A fair number of AUs were not taken into consideration for this reason (such as AU29 - Jaw Thrust, AU33 - Blow or AU35 - Suck). A full list of implemented AUs is presented in Table 1.
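The paper gives no code for this accumulation of AU displacements, but the underlying idea can be pictured as a weighted sum of per-AU vertex offsets applied to the neutral mesh. The sketch below is only illustrative: the class name, data layout and NumPy usage are assumptions, and the real system additionally applies the co-occurrence corrections described in Section 4.

```python
import numpy as np

class ParametricFaceModel:
    """Toy parametric face model: one displacement field per AU (assumed data layout)."""

    def __init__(self, neutral_vertices, au_displacements):
        # neutral_vertices: (N, 3) array with the neutral face mesh
        # au_displacements: dict mapping AU name -> (N, 3) array of vertex offsets
        #                   measured at full activation of that AU
        self.neutral = neutral_vertices
        self.displacements = au_displacements

    def deform(self, activations):
        """Accumulate displacements of all active AUs (activation values in [0, 1])."""
        vertices = self.neutral.copy()
        for au, value in activations.items():
            if au in self.displacements:
                vertices += value * self.displacements[au]
        return vertices
```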

3 Action Units and their co-occurrences

Not all AUs can be scored independently of each other. There are restrictions on how different AUs interact with each other, or whether they are allowed to occur together at all. Ekman introduces 5 generic co-occurrence rules which describe the way in which AUs combine and influence each other.

First of all, the combination of AUs can be additive. In such a case they operate as if they were activated separately, and the resulting facial movement is a plain summation of the separate displacements. Additive combinations usually occur when the involved AUs appear on separate areas of the face. Further, one AU can dominate over another, diminishing the results of the activation of the latter AU.

Figure 3. Dependencies between Action Units implemented in our system (directed graph of domination, exclusion and opposition relations between the implemented AUs)

An example of such an interaction is the combination of AU9 and AU10. Activation of AU9 raises the upper lip as a side effect of nose wrinkling, and therefore it diminishes the result of AU10 activation. In cases where AUs cannot be scored simultaneously, because the anatomy of our face does not allow us to perform both AUs at the same time, we say that they combine in an alternative way. There is also the possibility of substitution, when the occurrence of two AUs at the same time is equivalent to the activation of a third AU alone. Finally, all of the exceptions that cannot be modeled in the above-mentioned ways fall into a group of different ways of combining AUs.

Even though there are only 5 classes of AU interaction, the overall set of restrictions in FACS is far from simple. Figure 3 contains a chart with co-occurrence rules for selected AUs that are implemented in our system. The graph in Figure 3 is directed, which reflects the fact that not all of the interactions are mutual (e.g. AU15 dominates over AU12, but changes of AU12 do not influence AU15 at all).

The description of co-occurrence rules provided by Ekman is in a verbal form and operates on a binary scoring system in which any given AU can be either active (1) or not (0). There are several exceptions to this binary schema. In cases where the intensity of the observed facial deformation cannot be disregarded, FACS introduces three additional categories of AU intensity called low, medium and high. They are denoted by appending one of the letters x, y or z, respectively, to the AU number.

It is obvious that the facial model cannot be based directly on discrete values of AU activations. The changes in the facial geometry need to be continuous in order to yield a smooth and realistic (not to mention visually pleasant) animation. That requires a continuous control parameter set. The AU co-occurrence rules can be used to establish the dependencies between facial parameters. We need to ensure that the results obtained from the animation system comply with those rules for all combinations of model parameters.
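To illustrate the mismatch between the discrete FACS scoring and a continuous parameter set, the sketch below maps a continuous activation in [0, 1] onto a FACS-style score. The threshold values are purely illustrative assumptions, not taken from FACS or from the paper.

```python
def facs_score(activation, au="AU15"):
    """Map a continuous activation in [0, 1] to a FACS-style label (illustrative thresholds)."""
    if activation < 0.05:
        return "0"            # not scored
    if activation < 0.35:
        return au + "x"       # low intensity
    if activation < 0.7:
        return au + "y"       # medium intensity
    return au + "z"           # high intensity
```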

4 Implementation of co-occurrence rules

In order to implement the restrictions described in Ekman's work, we decided to implement a separate module in our system. This module is called the AU Blender; it takes a list of AUs with their respective activation values and produces a new list with modified activations that conform to the co-occurrence rules described in FACS. We denote the incoming AU activations by their respective names, and put the outgoing activations in square brackets. This process is realized in the form of fuzzy processing that extends the Boolean logic described in FACS. We present here all the implemented classes of interactions between AUs using specific examples. Each implemented class is referred to by its name and followed by the example notation used in FACS.

Domination (63>5). The domination rule says that if AU63 is activated, it overshadows AU5. In other words, AU5 is activated only if the absence of AU63 allows for it. The Boolean logic of this rule would be:

(¬AU63 ∧ AU5) ⇒ [AU5]

The fuzzy-logical implementation of the above rule is:

[AU5] = min{1 − AU63, AU5}
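A minimal sketch of how such a fuzzy domination operator could look in code is given below; the function name and the dictionary-based interface are illustrative assumptions, not part of the paper.

```python
def dominate(target, dominator):
    """Fuzzy domination: the target activation survives only to the extent
    that the dominating AU is inactive, i.e. min{1 - dominator, target}."""
    return min(1.0 - dominator, target)

# Example: raised gaze (AU63) suppresses the upper-lid raiser (AU5).
activations = {"AU5": 0.8, "AU63": 0.6}
activations["AU5"] = dominate(activations["AU5"], activations["AU63"])  # -> 0.4
```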

Domination of multiple AUs (6>7, 9>7). AU7 is suppressed if either AU6 or AU9 is activated. This is a straightforward extension of the previous rule:

(¬AU6 ∧ AU7) ∧ (¬AU9 ∧ AU7) ⇒ [AU7]

which is equivalent to the following:

(¬AU6 ∧ ¬AU9 ∧ AU7) ⇒ [AU7]

Therefore it is implemented as:

[AU7] = min{1 − AU6, 1 − AU9, AU7}
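The same operator extends naturally to any number of dominating AUs; as before, the code is only a sketch with assumed names.

```python
def dominate_many(target, *dominators):
    """Fuzzy domination by several AUs: min{1 - d1, ..., 1 - dn, target}."""
    return min([target] + [1.0 - d for d in dominators])

# Example: AU7 (lid tightener) suppressed by AU6 (cheek raiser) and AU9 (nose wrinkler).
au7 = dominate_many(0.9, 0.3, 0.5)  # -> 0.5
```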

Table 1. Implemented Action Units

AU    description            AU    description        AU    description
AU1   Inner Brow Raiser      AU17  Chin Raiser        AU51  Head Turn Left
AU2   Outer Brow Raiser      AU18  Lip Puckerer       AU52  Head Turn Right
AU4   Brow Lowerer           AU20  Lip Stretcher      AU53  Head Up
AU5   Upper Lid Raiser       AU22  Lip Funneler       AU54  Head Down
AU6   Cheek Raiser           AU23  Lip Tightener      AU55  Head Tilt Left
AU7   Lid Tightener          AU24  Lip Presser        AU56  Head Tilt Right
AU9   Nose Wrinkler          AU25  Lips Part          AU61  Eyes Turn Left
AU10  Upper Lip Raiser       AU26  Jaw Drop           AU62  Eyes Turn Right
AU12  Lip Corner Puller      AU27  Mouth Stretch      AU63  Eyes Up
AU15  Lip Corner Depressor   AU28  Lip Suck           AU64  Eyes Down
AU16  Lower Lip Depressor    AU43  Eyes Closed

Domination of an AU combination (20+23>18). AU20 and AU23 together dominate over AU18. That means that AU18 is suppressed only if both AU20 and AU23 are activated:

(¬(AU20 ∧ AU23) ∧ AU18) ⇒ [AU18]

After fuzzification:

[AU18] = min{1 − min{AU20, AU23}, AU18}
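A sketch of this combined-domination operator, with the conjunction of the dominating AUs modeled as a fuzzy AND (minimum); the names are again illustrative.

```python
def dominate_by_combination(target, dominators):
    """Suppress the target only to the degree that ALL dominating AUs are active together."""
    joint = min(dominators)              # fuzzy AND of the dominating AUs
    return min(1.0 - joint, target)

# Example: AU18 (lip puckerer) suppressed by AU20 and AU23 acting together.
au18 = dominate_by_combination(0.7, [0.9, 0.4])  # joint = 0.4 -> au18 = 0.6
```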

Domination of a strong AU (15z>12). AU15 dominates over AU12 only if it is strongly activated. The Boolean version of this rule is simply a realization of the domination rule:

(¬AU15z ∧ AU12) ⇒ [AU12]

This introduces a new logical variable, AU15z, which represents the subclass of all facial deformations described by AU15 that can be considered strong. In the fuzzy-logical implementation, AU15z is actually a function of the activation value of AU15. We can use here the typical trapezoid membership functions often used in fuzzification. The final implementation of this rule follows the one described for the domination rule, with AU15z being used instead of AU15.
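The paper does not give the parameters of the trapezoid membership function, so the breakpoints below are illustrative assumptions; the sketch only shows how AU15z could be derived from AU15 and plugged into the domination operator.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoid membership function: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def strong(activation):
    """Membership of the 'strongly activated' class (AU15z); breakpoints are assumed."""
    return trapezoid(activation, 0.5, 0.8, 1.0, 1.0)

# Domination of a strong AU: AU12 is suppressed only by a strongly activated AU15.
au15, au12 = 0.7, 0.8
au12_out = min(1.0 - strong(au15), au12)  # AU15 only moderately "strong" -> AU12 partially kept
```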

Exclusion (18@28). The interaction between AU18 and AU28 is described in FACS in such a way that they cannot be scored together. In our implementation we privileged AU18, so that its appearance cancels the scoring of AU28. This relation is actually quite similar to the domination rule, but with a much stronger interaction between the AUs. This kind of behavior can be described as follows: you are allowed to score AU28 only if the activation of AU18 is negligibly small. This interpretation of the rule yields the following Boolean realization:

(¬AU18x ∧ AU28) ⇒ [AU28]
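In code this again reduces to the domination pattern, but driven by a membership function that saturates already at very small activations of the privileged AU. The membership shape and threshold below are illustrative assumptions.

```python
def at_least_weak(activation, eps=0.05):
    """Membership of the class 'activated at all' (AU18x); eps is an assumed width."""
    return min(activation / eps, 1.0)

def exclude(target, privileged):
    """Exclusion: the target AU keeps its activation only while the privileged AU is negligible."""
    return min(1.0 - at_least_weak(privileged), target)

# Example: even a small activation of AU18 (lip puckerer) quickly cancels AU28 (lip suck).
au28 = exclude(0.6, 0.03)  # at_least_weak(0.03) = 0.6 -> AU28 reduced to 0.4
```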

Opposite AUs (51@52). The FACS manual describes the interaction between AU51 and AU52 also as exclusion. However, we can see that AU51 and AU52 describe two opposite movements of the head. In order to preserve the apparent symmetry of the relation, we need a fuzzy-logical opposition operator that does not have a Boolean counterpart:

[AU51] = max{0, AU51 − AU52}
[AU52] = max{0, AU52 − AU51}

The resulting activations of AU51 and AU52 are depicted in Figure 4.
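A sketch of the opposition operator; as before, the function name is an assumption made for illustration.

```python
def oppose(a, b):
    """Fuzzy opposition of two mutually opposite AUs (e.g. AU51 / AU52):
    only the surplus of the stronger activation survives, the weaker one is cancelled."""
    return max(0.0, a - b), max(0.0, b - a)

# Example: head-turn-left (AU51) and head-turn-right (AU52) requested at the same time.
au51, au52 = oppose(0.7, 0.3)  # -> (0.4, 0.0)
```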

5 Results

With all the co-occurrence rules from FACS implemented, the AU Blender module can be used to correct the input activations so that they do not conflict with each other. Our fuzzy-logical implementation has been tested on a wide range of input parameters. In this section we go through some examples of the obtained corrections. Figure 5 shows the presented examples in tabular form. Each row contains two independent facial expressions generated by our system, their uncorrected combination, and the result of blending them together in accordance with the co-occurrence rules.

The first example in Figure 5 shows the results of applying the exclusion rule when combining the expressions containing AU25 and AU27 (27@25). It can be seen that those two AUs, when combined together, result in an abnormal shape of the mouth opening. In the second example, AU1 is dominated by AU9 (1
