An Expressive System for Endowing Robots or Animated Characters with Affective Facial Displays

Craig Latta, Nancy Alvarado, Sam S. Adams, Steve Burbeck
IBM, Thomas J. Watson Research Center

Abstract

The expressive system presented in this chapter is designed to link a model of emotion (or other underlying internal states) with socially appropriate expressive behaviors in an intelligent agent or robot. The system was initially developed to provide an affective interface to Joshua Blue, a computer simulation of an embodied mind that includes emotion and is designed to learn from its environment and interact effectively in social contexts. While many of the components of this system are not new, their combination in an expressive system is innovative. The advantages of this implementation include the ability to accommodate different emotion models (or models of other internal states), to easily specify dynamic sequences of expressive behavior, and to directly inspect, continuously monitor, and intervene to change the underlying states in a system as they occur. The system currently uses a real-time 3D facial avatar with a simplified anatomical model, but may be readily modified to direct the expressive behaviors of more complex animated characters or robots.

Introduction

Expressive behavior in intelligent agents or robots serves several important purposes. First, it permits an entity to interact effectively with other expressive beings in social contexts. In many applications, this may be the primary function of the expressive behavior. Naturalness and believability, social effectiveness, and meaningfulness of displays to human observers are important concerns. Second, expressive behavior provides information to the designer about the internal states of the software. This latter function is essential to proving assertions that an emotion model has influenced the behavior of an agent or robot. In more complex systems with greater behavioral autonomy, the better the information conveyed by expressive behavior, the better a designer can understand the impact of affect on a machine's cognition, which is less directly observable. Third, expressive behavior, particularly facial expression, may provide a visualization tool for monitoring the complex internal states of a computer system. In this application, where expression replaces analog or numerical readouts, it is most important that the expressions corresponding to different states be quickly identifiable and well differentiated from each other, so that few interpretation errors are made by human observers. Robots may use a variety of methods to portray facial expression, ranging from mechanical faces like that of Kismet (Breazeal, 2001) to animated faces portrayed on a display screen mounted on the robot's body. Two main approaches to the portrayal of facial activity by animated figures are embodied conversational agents and avatars. A conversational agent interacts with computer users and may thus be considered a type of user interface. An avatar, on the other hand, represents the internal states of a computer system or a human user (for whom it stands in proxy). An avatar may also interact with users, but its function and purpose in doing so may be different than that of a conversational agent. Both kinds of figures may employ similar methods of animation for different purposes. Whether implemented in a robot, conversational agent, or avatar, the greater the behavioral repertoire or the autonomy of a system, the greater the need for a model-driven expressive system. Methods of animating faces are reviewed by Parke & Waters (1996). Ruttkay & Noot (2001) enumerate the shortcomings of applications, such as FaceWorks, used to create synthetic characters: "The facial expressions of a specific type are identical, due to the fact that single (tracked or synthetic)
expressions are wired-in the system. The blending and concatenation of expressions is based on simple principles, often resulting in unnatural facial movement. While for visual speech the co-articulation and concatenation has been studies [sic] extensively, little attention has been paid on developing principles and methods to superimpose and blend facial expressions in time. The production of a subtle facial animation of a synthetic character is a tedious, low-level process which requires much professional skill and time, as there is neither enough available knowledge on the dynamism of human facial expressions, nor appropriate paradigms and tools to animate synthetic faces. The 'look' of a synthetic head is often non-photorealistic, but the expressions are realistic." The use of models to guide facial expression generation overcomes the problem of wired-in predictability (Paradiso & Abbate, 2001; Ball & Breese, 2000), but designers have had little empirical research to guide them in creating such models. For example, Paradiso & Abbate (2001) use facial expression similarity scaling data (Schlosberg, 1952) to infer methods for blending expressions. Their assignment of intensity weights to modify their character's basic expressions is ad hoc and untested: no research exists as a guide to how to differentiate intense and non-intense versions of the same affect, how to blend mixed emotional states, or how to combine affect with speech or other non-emotional facial movements. The expressive system described here addresses this difficulty and also provides a means of investigating these issues empirically. Believability and effectiveness in interactive contexts may be emphasized in conversational agents, whereas accuracy of representation is emphasized in avatars. Paradoxically, the believability of an agent may be enhanced by decreasing its realism. Picard and Klein developed an animated agent called Bruzzard. They deliberately altered the agent's appearance and speech characteristics to make it appear vaguely foreign, with the goal of increasing tolerance for its less-than-perfect realism. Others have similarly modified their agents to resemble animals or exaggerated cartoon characters, so that the implicit assumption that realism is not being attempted will generate greater acceptance of imperfections in accomplishing realism. A similar approach may be used in robotics. Investigation of the aspects of facial behavior that evoke social rejection can aid designers in building more believable agents, without resorting to extreme unrealism (which may have other drawbacks, including potential insult by caricature and stereotype, lack of gravity of presentation, lessened authority, and less social identification or empathy with the agent). Unrealism may be appropriate for entertainment contexts but less appropriate in other settings, especially when an agent is used to motivate important behavior. It seems likely that the principles best employed in the creation of embodied conversational agents may be different than those best employed in the creation of avatars or especially visualization tools, where realism must contribute to accurate understanding of an internal state. Facial expressions are decoded by humans in order to attribute complex internal states to others. We may even have an evolved capacity for doing this quickly and concurrently with other cognitive activity. A problem experienced by those designing complex computer systems is how to present information about the status of such a system in a manner readily grasped by human operators. It may be difficult for an operator to make sense of an array of fluctuating values on a computer screen. Computer scientists interested in visualization techniques try to find ways of graphically presenting data so that it can be easily interpreted. The more complex the system, the more difficult this becomes. Building upon the pre-existing capacity of humans to interpret facial expressions may provide a visualization metaphor for representing complex internal states of computer-controlled systems. An avatar that smiles when a system is running optimally but shows negative affect when the system is outside important control parameters can be useful in letting human operators tell at a glance that something is wrong, and perhaps even what, just as facial expression does in human interactions. For such uses, subtle affect may be inappropriate and the basic emotions may be the best solution. Graded representation of such states may also be important, suggesting the need to understand how intensity of affect is conveyed by facial expression.
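
To make the visualization idea concrete, the following sketch (not part of the system described in this chapter; the metric names, thresholds, and mapping are hypothetical) shows one way a monitored system's health could be folded into a valence and arousal pair for an avatar to display.

```python
# Hypothetical sketch: fold a monitored system's health metrics into a
# (valence, arousal) pair an avatar could display. Metric names and the
# thresholds used here are illustrative assumptions, not part of Joshua Blue.

def affect_from_metrics(error_rate: float, load: float) -> tuple[float, float]:
    """Map two example health metrics (each 0.0-1.0) to (valence, arousal).

    Valence falls as the error rate rises; arousal grows as the system
    moves away from a nominal operating load of 0.5.
    """
    valence = max(-1.0, 1.0 - 2.0 * error_rate)  # 1.0 = all is well, -1.0 = serious trouble
    arousal = min(1.0, 2.0 * abs(load - 0.5))    # calm near nominal load, agitated far from it
    return valence, arousal

# A healthy system yields a positive, calm face; a failing one a negative,
# highly aroused face an operator can read at a glance.
v, a = affect_from_metrics(error_rate=0.02, load=0.55)
print(round(v, 2), round(a, 2))  # 0.96 0.1
v, a = affect_from_metrics(error_rate=0.60, load=0.95)
print(round(v, 2), round(a, 2))  # -0.2 0.9
```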

Additionally, we felt that the utility of an expressive system would be greatly enhanced if it were able to: (1) support different models of emotion; (2) be configured to control a wide variety of behaviors with minimal human effort; and (3) allow the designer to easily inspect, monitor, or change the internal affective states of the underlying emotion system, through the same expressive interface. We designed this expressive system for use with Joshua Blue, a computer simulation of an embodied mind that must ultimately function in a wide variety of environments, virtual and physical (see Alvarado, Adams, Burbeck & Latta, 2001). These requirements made versatility an important design goal. However, we believe our approach may also enhance simpler, less general systems by making it easier to incorporate future improvements in animation techniques or robot technology that will make expressive systems more socially effective. The need for a research tool to explore user responses to the dynamics of facial expression motivated our creation of a stand-alone expressive system independent of the Joshua Blue simulation. All of the applications employing facial expression in computer-animated agents, avatars and robots depend upon a better understanding of human facial expression and human responses to the expressions of such figures. Designers, computer scientists and engineers make some attempt to test their creations. However, their studies are generally aimed at confirming the value of what has been done, not at testing competing approaches, much less testing theories or models of expressive behavior. A convenient tool for the study of the dynamics of expression, coupled with study of human responses to them, would enable us to expand the parameters of expression.

Psychological Research on Facial Expressions

Aside from a set of basic emotional expressions, thought to be universally recognized cross-culturally (Ekman, 1992), little research attention in psychology has been focused upon investigating the characteristics of the stream of facial activity ongoing in everyday contexts, present with less intense emotional and cognitive states (such as interest, boredom, or confusion), or indicative of moods or physical states such as anxiety, hunger, or fatigue. Further, little attention has focused upon how felt emotion is expressed in facial activity during speech, much less the timing of emotion-related facial activity and speech-related mouth movement. Theorists have complained that too great an emphasis on studying basic emotional expressions has led to neglect of many other important aspects of facial activity, including the study of the dynamics of facial expression in naturalistic settings and in everyday contexts. These complaints are expressed most eloquently by the various contributors to Russell and Fernandez-Dols's (1997) edited volume on the psychology of facial expression. This neglect of facial expression by psychologists has practical consequences for designers of expressive systems. For example, computer-animated figures enhance the effectiveness of interactive tasks such as computer-aided tutoring. Psychological expertise has been incorporated into the design of the instructional elements of such programs, but there is little research to guide the creation of the animated figures. Is it better if a computerized tutor smiles and shows encouragement, or would such affect be interpreted in certain contexts as sarcastic or derisive, making a more neutral expression the better choice? With the advent of computer-animated expressive characters in user interfaces of various kinds (such as computerized tutors, service agents, game participants, personal trainers, companions), the need to specify effective, believable facial behavior has become urgent. Current efforts to create believable agents have revealed the incompleteness of Ekman and Friesen's (1978) six basic emotional expressions as a model of emotional facial expression. Designers of animated characters have been finding that these expressions, although highly recognizable, are frequently viewed as too extreme, too intense, or emotionally inappropriate for many routine interactive contexts. In many computer-generated applications, understanding the facial activity needed to convey friendliness, trustworthiness, interest, concern, and so on, is more important than conveying anger, fear, disgust, and the other basic emotions.

Further, while some implementation of swallowing, head turns and blinks has occurred (Cassell, Sullivan, Prevost & Churchill, 2000), researchers have not systematically studied the myriad small idiosyncratic facial movements that might make a figure appear natural and more life-like to observers. To date, research on facial expressions of emotion has focused largely upon: (1) demonstrating the existence of a set of universal basic expressions recognizable cross-culturally, or challenging the existence of such expressions; (2) identifying new candidate universal expressions, cf. Keltner's studies of facial expressions of shame/embarrassment/guilt (Keltner & Buswell, 1996) or Reeve's study of interest expressions (Reeve & Nix, 1997); (3) investigating voluntary control of expression in deception or comparing posed vs. spontaneous expressions; (4) relating facial expressions to underlying emotional experience (or appraisals or contextual antecedents) by correlating facial activity with other measures of emotion, such as self-report and autonomic response; (5) localizing recognition of basic emotional expressions to a particular area of the brain; or (6) identifying effects of cognitive impairment on the ability to recognize and produce basic expressions. An additional large body of research uses facial expression as a dependent variable to illuminate various emotion-related phenomena. All of these categories involve the basic emotions approach, including the study of deception and display rules, which attempts to explain why basic emotions are not invariably observed to accompany emotional states. When dynamics have been studied, it has been to address different questions than those most important to designers of expressive systems. Because decoding studies have primarily used still posed expressions, research has focused upon the effects of posing expressions rather than spontaneously generating them. Although both posed and spontaneous expressions arise in both the lab and in real life, other critics continue to challenge the validity of use of posed stimuli in research, especially still photos. While differences between posed and spontaneous expressions have been noted, these have never been shown to affect the results of affect-recognition studies. Nevertheless, critics of studies conducted using
still photos continue to emphasize the existence of disparities between still and dynamic images. Such criticisms seem intuitively plausible given that we subjectively experience expressions as part of a stream of behavior in everyday life, not as still frames. On the other hand, vision researchers consider motion to be a perceptual construction, not something we directly experience through our visual system (Hoffman, 2000). The importance of dynamics to the decoding of affect remains an open question. In contrast, encoding studies emphasize methods of evoking affect and use the dynamics of the evoked expressions as a dependent variable. The measured dynamics typically consist of frequency, intensity, duration, and co-occurrence of muscle movements. Little effort has been devoted to understanding the dynamics of expression beyond the search for reliable expressive behavior that can be linked to emotion, personality, or psychopathology. In many cases, simple presence or absence of a set of specific muscle group movements characterizes a proposed expression associated with some phenomenon, such as pain or embarrassment (for examples, see Ekman and Rosenberg, 1997). Are there facial expressions beyond the small set of basic emotions already identified? What is the relation of facial activity to less emotional states? It is clear that the English lexicon considers some internal states to be less emotional than others (e.g., sleep, hunger compared to fear, anger). Ortony, Clore and Collins (1988) also report a distinction between cognitive states such as interest or confusion and emotions such as fear or anger. It is less clear whether such a distinction is relevant to facial expression. A review of the findings for pain expressions (Harris & Alvarado, in press) shows marked similarities with emotional expressions that make it worth considering what all such expressions may have in common. Kaiser, Wehrle & Schmidt (1998) suggest that a component-based approach (one that considers individual action units or muscle group movements as components linked to components of emotional experience or cognition) may reveal regularities associated with social intentions or intrapsychic regulation. They state: “We have to extend our theoretical models to include these different
functions... If regularities in the relation between facial expressions and cognitive evaluation processes can be established, we can apply them to emotional as well as nonemotional facial expressions." This exploration is just beginning. Research explicitly investigating the dynamics of facial expression must be differentiated from research employing dynamic expressions as stimuli or dependent variable. Very little of the former has been published. Several methodological volumes have described use of film or video and analysis of facial behavior. Most recent are Ekman's comments in the edited volume on analyzing spontaneous facial behavior using FACS, the Facial Action Coding System (Ekman and Rosenberg, 1997). Beyond this, as noted above, an important focus of previous research on dynamics of expression has been upon identifying differences between posed and spontaneous expressions. Given the dual innervation of the musculature of the face, interest has focused upon whether voluntary muscle control originating in the motor cortex produces observable differences compared to expressions originating in the limbic areas of the brain. Lateral asymmetries in onset/offset or extent of muscle contraction were found to be an indicator of deception or voluntary control of expression (Ekman, Hager, & Friesen, 1981; Frank, Ekman & Friesen, 1993); however, an analysis by Campbell (1986) challenges these earlier findings. Even in studies of detection of deception, Hess and Kleck (1990) state that the extent of asymmetry (irregularity) of posed versus spontaneous expressions provided the primary cue for differentiating them. The dynamics (onset or offset speed) provided only a secondary cue. Further, while subjects are able to perceive lateral asymmetries in judgment tasks in the lab, no study has established that these asymmetries influence decoding in everyday situations. This implies that such asymmetries are unlikely to contribute to either decoding accuracy or realism of animated expressions. There is some evidence that access to the dynamics of facial expression helps subjects disambiguate degraded stimuli (or stimuli viewed under less than optimal conditions); see Wehrle,
Kaiser, Schmidt and Scherer (2000). However, there is little evidence that dynamics are used in the identification of basic emotional expressions under normal viewing conditions. Wallbott (1992) found that distortions of the temporal and spatial resolution of video stimuli had little effect on attribution of emotion, except when both types of distortion were combined and were severe. That temporal distortion (reduction of refresh rate in frames per second) by itself interfered little with such judgments suggests that the information conveyed by rate of movement is little used when interpreting emotional meaning in basic expressions, though it may be used for other types of judgments. Line drawings have been used in facial expression research for well over a century, beginning with sketches by the anatomist Bell, and later Piderit, in the mid-1800s (Woodworth, 1938). Considerable research using Piderit’s components was conducted in the 1920s and 1930s, as reviewed by Woodworth. It was concluded that no meaning was reliably interpreted in facial expression and this line of research was abandoned for several decades until Ekman and Izard, independent of each other, provided evidence that certain basic emotional expressions were reliably recognized cross-culturally. Since then, photos of the basic expressions have served as a standard for evaluating line drawings. Line drawings have been found to produce comparable results when they portray the appropriate action units. Due to the expense of animation, film and video were preferred as a presentation medium until the advent of computer animation. Massaro and colleagues have long been investigating use of animated figures to teach deaf individuals the mouth movements needed for speech. Their “talking head,” Baldi, simulates mouth movements needed to produce different syllables (Massaro, Cohen, Beskow & Cole, 2000). Ellison & Massaro (1997) used simple line drawings that manipulated eyebrow and mouth movements to test the fuzzy logic model of perception (FLMP). Etcoff and Magee (1992) investigated categorical perception of facial expressions of emotion using morphed line drawings of facial expressions and later replicated their findings using photos, producing comparable results (Calder,
Young, Perrett & Etcoff, 1996). Wehrle, Kaiser, Schmidt and Scherer (2000) used animated and still synthetic stimuli, line drawings consisting of combinations of facial components, testing the theory that such components are related to cognitive appraisals. With this manipulation, the recognition percentages for their static versions were generally less than 50%, much lower than in typical judgment studies of Ekman and Friesen's static Pictures of Facial Affect. Dynamic presentation improved recognition of their images, but it is unclear what information the dynamic stimuli contributed to obtain the improvement. Spencer-Smith and colleagues (Spencer-Smith et al., 2001) have created a set of AU-specific male and female animated faces using Poser4 animation software. Models of facial expression linking component movements to internal states are unlikely to produce reliable interpretations of meaning by observers. Alvarado conducted a series of studies of the contribution of specific component movements (action units) to the decoding of meaning in facial expressions of emotion. This research involved computer editing of digital video images to create still photo stimuli with the needed characteristics, and computer presentation of stimuli in judgment paradigms measuring response times or labeling of stimuli. Results (Alvarado & Jameson, 2002) suggest that: (1) action units do not modify the meaning of a basic emotional expression except to make it more or less recognizable; (2) intensity of an emotional expression is most likely not conveyed by extent of muscle movement but by presence or absence of certain action units; (3) so-called blends are most likely interpreted as bad exemplars of basic emotions, not as mixed states described using specific verbal labels. These counter-intuitive findings are problematic for current approaches such as component theory, which seeks to link specific cognitive appraisals to the presence of specific facial action units. Encoding studies may reveal that correlations do exist between action units and cognitive appraisals (e.g., Smith, 1989), but these associations do not appear to influence an observer's decoding judgments, at least in Alvarado's work to date.

Finally, we would like to note the importance of null findings to this endeavor. Given the intense competition for journal pages, publication of null findings has been given less priority than results establishing some theory-based argument. Given the difficulties establishing the handful of basic expressions, and the lack of success finding regularities for anything beyond them (reviewed by Alvarado and Jameson, 2002), we suspect that research on dynamics may have been similarly frustrating. Numerous theorists (Russell & Fernandez-Dols, 1997) propose possibilities related to dynamics that appear untested in the literature and yet may have been tested and abandoned as unpromising by previous researchers. In the absence of published tests, such assertions appear plausible and further appear to substantiate a purposeful neglect of the more naturalistic dynamic stimuli by those in the literature. We cannot know what results may be residing unpublished in psychologists' file drawers. As a result of these difficulties, the only empirical guidance available to designers of expressive systems remains the basic emotions approach. Until human expressive behavior is better understood, the best approach to modeling facial behavior may be to directly study humans in the contexts in which robot or agent behavior may occur, and specify expressions comparable to those observed in people. This is an admittedly ad hoc approach, but one that seems viable as a way to supplement Ekman's basic facial expressions. Such study would enable designers to specify a model to guide spontaneous expressive behavior in different contexts, where one is not to be found in the psychological literature. The expressive system described here permits specification of ad hoc expressive behavior, as well as expressions derived from models, such as Ekman's basic expressions.

Expressive System Overview

The expressive system consists of the following components: (a) an emotion model; (b) an expressive behavior model; (c) a 3D facial avatar; (d) an emotional state display; (e) tools for editing emotional expressions and changing the internal state of the system. Each of these parts is described below, for the current implementation. The specifics of the avatar, emotion model and expressive behavior model can be changed with minimal impact on the remaining parts of the system, which make internal system values open to inspection and propagate changes between subsystems, all with fluid interactivity. The system is implemented in Squeak, a dynamic, open-source object system.

Emotion Model

The emotion model in the Joshua Blue system incorporates the two dimensions of affective experience, valence (appraisal as good or bad) and arousal (level of excitement), originally described by Osgood (1966) and later presented as an emotion model by Russell (1991; 1997). Since these dimensions are essentially abstractions derived from the dominant aspects of subjective experience, they are hypothesized to be present in any emotional phenomenon. For example, when multidimensional scaling is applied to similarity judgments of facial expressions of emotion, a first dimension of valence, accounting for the majority of the variance, emerges, with a second dimension of arousal accounting for another 10-15% of the variance. Together these two dimensions account for 70-80% of the variance, depending on the stimuli used. Studies of other emotional self-report or expressive behavior, or studies employing other data reduction techniques, show similar results. Two dimensions do not permit differentiation of closely similar items. Higher dimensionality is needed for that. However, it seems likely that the disadvantage of the inability to capture such subtleties in the model may be offset by the practical advantages of using a less complex model. By using an abstraction that spans such a range of affective phenomena, a variety of emotion models, from Ekman's (1992) basic emotions to simple pain-versus-pleasure models, can be readily mapped into the same two-dimensional space. If reduction to two dimensions is undesirable, such as when finer distinctions are needed, multidimensional maps can be created in the same manner as was used to generate the simpler two-dimensional system. Additional dimensions might model cognitive components such as expectancies or feelings of control, aspects of relationship such as the object of affect or display rules, or contextual contingencies. While this added dimensionality complicates visualization, the system is not limited to using 2-D pointers but can support a 3-D controller. An alternative approach is to specify higher dimensionality in a series of 2-D maps, each including different specified pairs of dimensions. In this implementation, regions of the affective space are partitioned by ten equal increments along each axis. Discrete emotions are mapped to the space, as shown in Figure 1. The terms shown in Figure 1 are descriptive and suggest in a general way what occurs facially in each region. The actual specification of facial movement is defined by the expressive behavior model, described below.

[INSERT FIGURE 1 ABOUT HERE.]
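
As a rough illustration of this kind of partitioning (a sketch only, not the authors' Squeak code; the grid quantization, emotion labels, and region boundaries below are hypothetical stand-ins for those shown in Figure 1), a two-dimensional affect space with ten increments per axis might be represented as follows:

```python
# Hypothetical sketch of a valence/arousal space partitioned into a 10 x 10
# grid, with descriptive emotion terms attached to regions of cells.

GRID_STEPS = 10  # ten equal increments along each axis

def to_cell(valence: float, arousal: float) -> tuple[int, int]:
    """Quantize continuous valence and arousal (each in [-1.0, 1.0]) to a grid cell."""
    def quantize(x: float) -> int:
        return min(GRID_STEPS - 1, int((x + 1.0) / 2.0 * GRID_STEPS))
    return quantize(valence), quantize(arousal)

# Hypothetical region labels: ((valence cell range), (arousal cell range)) -> term
REGIONS = {
    ((7, 9), (7, 9)): "elated",
    ((7, 9), (0, 3)): "content",
    ((0, 2), (7, 9)): "distressed",
    ((0, 2), (0, 3)): "despondent",
}

def label_for(valence: float, arousal: float) -> str:
    """Return the descriptive term for the region containing (valence, arousal)."""
    v, a = to_cell(valence, arousal)
    for ((v_lo, v_hi), (a_lo, a_hi)), label in REGIONS.items():
        if v_lo <= v <= v_hi and a_lo <= a <= a_hi:
            return label
    return "neutral"

print(label_for(0.8, 0.9))    # elated
print(label_for(-0.7, -0.6))  # despondent
```

A region label of this kind is only descriptive; in the system described here, the facial movement for each cell is specified by the expressive behavior model presented next.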

Expressive Behavior Model

Facial expressions for animating the avatar were specified using the definitions of action units corresponding to basic emotional states (Ekman and Friesen, 1978). Action units for additional emotional states were identified based on empirical studies by Snodgrass (1992) and Alvarado (1996). We determined contraction values for muscle groups in each expression using action units defined by the Facial Action Coding System (Ekman & Friesen, 1978). Intensities of muscle movement (expressed as slider position) were specified for each step in the two-dimensional affective space of the emotion model. A separate map was defined for each muscle group. This specified a facial expression for each state possible in the Joshua Blue system's emotion model. Basic expressions concurrent with extreme emotional arousal have been tested for reliable interpretation by Ekman and Friesen. To the extent that our versions contain the same action units and thus conform to the specified basic expressions, we expect to obtain similar results in observer judgment studies. Because no model exists for facial activity corresponding to less intense emotional states, those expressions corresponding to states not defined as basic emotions must be tested for interpretability by human observers. An advantage of this model is that specific muscles, on either side of the face, can be modified independently within the muscle maps in order to fine-tune the presented expressions.

Facial Avatar

The current expressive display employs an animated facial avatar, rendered from a three-dimensional mesh of external facial skin points deformed under the influence of underlying musculature, as described by Waters (1987) (see Figure 2). We adapted a Smalltalk implementation of Waters' avatar (Notarfrancesco, 2001) for use as an expressive display driven by the affective states of the Joshua Blue system. A user may manually modify the tensions of 12 major facial muscle groups by opening and manipulating sliders. While not anatomically precise, this simplified muscle model permits simulation of reasonably believable affective facial expressions. It is the first of three main elements in the expressive system's user interface, the other two being the emotional state display and modification tools.

Emotional State Display

The user may view and manipulate the procession of Joshua Blue's emotional state through the affect space over time with a specialized monitor. This monitor samples Joshua Blue's current emotional
state at a user-definable rate, and displays it as a color trace in the two-dimensional space. Previously displayed samples persist on the display, but change color over time as they age. Over time, the monitor accumulates a visual record of the changes in Joshua Blue's emotional state. The monitor provides transport controls, so that the user may see the expression corresponding to a previous emotional state, and control the sampling of subsequent states. When the user clicks on a region in the space, the monitor highlights that point in outline, and the avatar takes on the expression corresponding to the associated emotional state. The internal state of the Joshua Blue system can also be changed to a specific combination of valence and arousal values using this tool. A slider below the affect space display provides random access to particular points in the trace's history. Finally, buttons at the bottom of the monitor provide traditional playback control, toggle the recording of new samples, and enable storage and loading of traces.

Editing Tools

In addition to rendering Joshua Blue's current emotional state as a facial expression, the avatar provides access to tools for editing muscular responses. As the user moves the pointer over the avatar, the system indicates the muscle groups in the vicinity, and displays their names. The user may then select a particular muscle group, which displays its current contraction value along with a slider showing the range of possible contraction values. Having selected a muscle group, the user may also open an editor on the mapping between affect and muscle contraction. This editor shows a particular muscle group's contraction response for each point in the affect space, indicated by color. The user may rapidly specify the response for many points by selecting a contraction value from a palette and painting its associated color in the affect space. Direct manipulation of the muscle in the avatar also selects contraction values in the muscle editor; the user is free to interpret the palette selection as a number or as a visible expressive cue. The user may open any number of muscle editors concurrently, supporting quick composition of complex facial expressions over the affect space.

Research Expressive System

For research purposes, the avatar was modified to operate independently of the remaining Joshua Blue system. As in the original expressive system, the user can manually modify the tensions of 12 major facial muscle groups by opening and manipulating sliders. Left and right sides of the face may be specified independently. Onset, offset, intensity, and transitions can be timed precisely (to a hundredth of a second) to create, store, and replay streams of behavior of any length. Because images are generated from a model rather than from stored image files, there is no practical limit to the length of a specified behavior stream. While not anatomically precise, this simplified muscle model permits simulation of reasonably realistic affective and non-affective facial expressions. Further, with a limited amount of programming the model can be changed to expand the muscle groups represented, if it appears that the initial set is inadequate. The groups already implemented exceed those available in many other animated stimulus sets, including AU 6 and 7, and AU 20. Mouth movements are incomplete (no AU 23 or 24) but can be improved as needed. Although originally developed as part of an IBM project, the avatar itself comes from the open-source community and thus can be offered to other researchers. The research advantages of this approach are: (1) precise stimulus control of dynamics; (2) fast and easy development of more realistic or believable ad hoc model-driven animated facial behavior; and (3) availability of a repeatable method for testing facial dynamics.
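
To illustrate how such timed behavior streams might be represented (a hedged sketch under stated assumptions: the muscle names, keyframe layout, and linear ramp below are illustrative and are not taken from the Squeak implementation), consider the following:

```python
# Hypothetical sketch of a timed stream of muscle contractions with 0.01 s
# resolution; left and right sides of the face are specified independently.

from dataclasses import dataclass

@dataclass
class Keyframe:
    muscle: str       # e.g. "left_zygomatic_major" (hypothetical name)
    onset: float      # seconds, specified to a hundredth of a second
    offset: float     # seconds
    intensity: float  # target contraction (slider position), 0.0-1.0

def contraction_at(stream: list[Keyframe], muscle: str, t: float) -> float:
    """Ramp linearly from 0 to the target intensity over each keyframe's interval."""
    for kf in stream:
        if kf.muscle == muscle and kf.onset <= t <= kf.offset:
            progress = (t - kf.onset) / max(kf.offset - kf.onset, 0.01)
            return kf.intensity * progress
    return 0.0

# An asymmetric smile: the left side leads the right by a quarter second.
stream = [
    Keyframe("left_zygomatic_major", onset=0.00, offset=0.50, intensity=0.8),
    Keyframe("right_zygomatic_major", onset=0.25, offset=0.75, intensity=0.8),
]
print(contraction_at(stream, "left_zygomatic_major", 0.25))   # 0.4
print(contraction_at(stream, "right_zygomatic_major", 0.25))  # 0.0
```

Because a stream of this kind is generated from a model rather than from stored image files, its length is limited only by the keyframes specified, which is the property that makes repeatable testing of facial dynamics practical.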

Squeak Implementation Benefits

The Squeak Smalltalk system (see Ingalls, Kaehler, Maloney, Wallace & Kay, 1997) affords us a great deal of flexibility on several fronts. Squeak's message-oriented organization allows us to establish a strong separation of concerns, such that interdependencies between major functional parts are few. As a result, it is straightforward for us to change the emotion model, expressive behavior, and human interface subsystems independently. Since Squeak is a "late-binding" system, in which objects' message responses are determined at runtime, we may make such changes during operation. This allows us to incorporate new insights while continuing to monitor expressions and record emotional states. The Squeak architecture is completely open. This gives us the ability to make fundamental operational changes, such as might be required when profiling performance. It also permits complete understanding of the system, both by us and by other researchers, and promotes a broad consistency throughout the system. Largely due to its openness, Squeak has an active worldwide development community. Several contributions from this community have saved us a great deal of development time, and we have been able to contribute significant work in return. We have seen most of these features in other systems, but only in Squeak have we found them combined in a single system.

Conclusion

While the relationship between cognition and expression is not yet well understood, we believe that many useful elements for its pursuit are in hand. We have created an environment for fluid, interactive evaluation of these elements, which we hope will yield new insights into their composition and development.

References

Alvarado, N. (1996). Congruence of Meaning Between Facial Expressions of Emotion and Selected Emotion Terms. Motivation and Emotion, 20, 33-61.

Alvarado, N., Adams, S. S., Burbeck, S. & Latta, C. (2001). Integrating Emotion and Motivation into Intelligent Systems. Unpublished manuscript, IBM, T.J. Watson Research Center.
Alvarado, N. & Harris, C. (2001). Facial Expression and Coping Style. Manuscript in submission.
Alvarado, N. & Jameson, K. (2002). Varieties of anger: The relation between emotion terms and components of anger expressions. Motivation and Emotion, 26, 153-182.
Breazeal, C. (2001). Designing Sociable Machines. The MIT Press.
Calder, A., Young, A., Perrett, D. & Etcoff, N. (1996). Categorical perception of morphed facial expressions. Visual Cognition, 3, 81-117.
Campbell, R. (1986). Asymmetries of facial action: Some facts and fancies of normal face movement. In R. Bruyer (Ed.), The neuropsychology of face perception and facial expression (pp. 247-267). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cassell, J., Sullivan, J., Prevost, S. & Churchill, J. (2000). Embodied conversational agents. Cambridge, MA: MIT Press.
Ekman, P. (1992). An Argument for Basic Emotions. Cognition & Emotion, 6, 169-200.
Ekman, P. & Friesen, W. (1978). Facial Action Coding System (FACS). Consulting Psychologists Press.
Ekman, P., Hager, J. & Friesen, W. (1981). The symmetry of emotional and deliberate facial actions. Psychophysiology, 18, 101-106.

Ekman, P. & Rosenberg, E. (1997). What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). New York, NY: Oxford University Press.

Ellison, J. & Massaro, D. (1997). Featural evaluation, integration, and judgment of facial affect. Journal of Experimental Psychology: Human Perception and Performance, 23, 213-225.
Etcoff, N. & Magee, J. (1992). Categorical perception of facial expressions. Cognition, 44, 227-240.
Frank, M.G., Ekman, P. & Friesen, W. (1993). Behavioral markers and recognizability of the smile of enjoyment. Journal of Personality and Social Psychology, 64, 83-93.
Harris, C. & Alvarado, N. (In Press). Pain facial expression: Individual variability undermines the specific adaptationist account. Commentary on Williams, Facial expression of pain: An evolutionary account. Behavioral and Brain Sciences.
Hess, U. & Kleck, R. (1990). Differentiating emotion elicited and deliberate emotional facial expressions. European Journal of Social Psychology, 20, 369-385.
Hoffman, D. (2000). Visual intelligence: How we create what we see. New York, NY: W.W. Norton & Co.
Ingalls, D., Kaehler, T., Maloney, J., Wallace, S. & Kay, A. (1997). Back to the Future: The Story of Squeak, a Practical Smalltalk Written in Itself. Proceedings of the 1997 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, 318-326.
Kaiser, S., Wehrle, T. & Schmidt, S. (1998). Emotional episodes, facial expressions, and reported feelings in human-computer interactions. In A. H. Fischer (Ed.), Proceedings of the Xth Conference of the International Society for Research on Emotions (pp. 82-86). Wurzburg: ISRE Publications.
Keltner, D. & Buswell, B. (1996). Evidence for the distinctness of embarrassment, shame, and guilt: A study of recalled antecedents and facial expressions of emotion. Cognition and Emotion, 10, 155-171.

Massaro, D., Cohen, M., Beskow, J. & Cole, R. (2000). Developing and evaluating conversational agents. In J. Cassell, J. Sullivan, S. Prevost, & J. Churchill (Eds.), Embodied conversational agents (pp. 287-318). Cambridge, MA: MIT Press.
Notarfrancesco, L. (2001). An adaptation of the Waters facial animation system. Unpublished source code.
Ortony, A., Clore, G. & Collins, A. (1988). The cognitive structure of emotions. New York: Cambridge University Press.
Osgood, C.E. (1966). Dimensionality of the Semantic Space for Communication Via Facial Expression. Scandinavian Journal of Psychology, 7, 1-30.
Paradiso, A. & Abbate, M. (2001). A model for the generation and combination of emotional expressions. In Multimodal Communication and Context in Embodied Agents. Proceedings of the AA'01 Workshop 7 at the 5th International Conference on Autonomous Agents (AA'01), Montreal. The COGITO Project.
Parke, F. & Waters, K. (1996). Computer facial animation. Wellesley, MA: A.K. Peters.
Reeve, J. & Nix, G. (1997). Expressing intrinsic motivation through acts of exploration and facial displays of interest. Motivation and Emotion, 21, 237-250.
Russell, J. (1991). The contempt expression and the relativity thesis. Motivation and Emotion, 15, 149-168.
Russell, J. (1997). Reading emotions from and into faces: Resurrecting a dimensional-contextual perspective. In J. Russell & J. Fernandez-Dols (Eds.), The psychology of facial expression (pp. 295-320). Cambridge, UK: Cambridge University Press.

Russell, J. & Fernandez-Dols, J. (1997). The psychology of facial expression. Cambridge, UK: Cambridge University Press.
Ruttkay, Z. & Noot, H. (2001). FESINC: Facial expression sculpturing with interval constraints. In Multimodal Communication and Context in Embodied Agents. Proceedings of the AA'01 Workshop 7 at the 5th International Conference on Autonomous Agents (AA'01), Montreal.
Smith, C. (1989). Dimensions of appraisal and physiological response in emotion. Journal of Personality and Social Psychology, 56, 339-353.
Snodgrass, J. (1992). Judgment of Feeling States From Facial Behavior. Unpublished doctoral dissertation, University of British Columbia. Reported by Russell (1997).
Spencer-Smith, J., Wild, H., Innes-Ker, A., Townsend, J., Duffy, C., Edwards, C., Ervin, K., Paik, J. & Prause, N. (2001). Making faces: Creating 3-dimensional, ecologically-motivated poseable expressions. Behavior Research Methods, Instruments and Computers, 33, 115-123.
Wallbott, H. (1992). Effects of distortion of spatial and temporal resolution of video stimuli on emotion attributions. Journal of Nonverbal Behavior, 16, 5-20.
Waters, K. (1987). A muscle model for animating three-dimensional facial expressions. Computer Graphics (SIGGRAPH '87), 21(4).
Wehrle, T., Kaiser, S., Schmidt, S. & Scherer, K. (2000). Studying the dynamics of emotional expression using synthesized facial muscle movements. Journal of Personality and Social Psychology, 78, 105-119.
