
Motivation Driven Learning for Interactive Synthetic Characters

Song-Yee Yoon
Synthetic Characters Group, Media Laboratory & Department of Brain and Cognitive Sciences
MIT E15-306, 20 Ames St., Cambridge, MA 02139
+1.617.253.5917 [email protected]

Bruce M. Blumberg
Synthetic Characters Group, Media Laboratory
MIT E15-306, 20 Ames St., Cambridge, MA 02139
+1.617.253.9832 [email protected]

Gerald E. Schneider
Department of Brain and Cognitive Sciences
MIT E25-634, 45 Carleton St., Cambridge, MA 02139
+1.617.253.5795 [email protected]

ABSTRACT Adaptation capability and a transparent motivation system greatly aid real time interactions between humans and synthetic characters. These components enhance the life-like impression that the characters make, and enable comfortable communication between the characters and human participants. We extended the behavioral action selection system of Blumberg[2] and Kline[14] with these needs in mind, and developed a creature kernel that enables the design of characters with communicative motivational and emotional states, and with learning abilities based on feedback from the motivation system. In this paper, we introduce this new approach to character design and show how various learning algorithms have been incorporated within this framework. The main characters of an interactive installation, (void*): A cast of characters, were created using this creature kernel. We describe results with examples of alteration of attitudes, learning of concepts, and formation of emotional reactions to locations based on experience.

Keywords Motivation, affect, emotion, learning, adaptation, interactive synthetic character

1. INTRODUCTION Synthetic characters are 3D virtual creatures that are intelligent enough to do and express the right things in a particular situation or scenario [14]. The research goal of our group is to create lifelike synthetic characters that can interact with human participants in real time. This setting forces the characters to face the following challenges, among others:

• Intentionality[6]: Human participants can comfortably interact with interactive characters only when they feel that they can understand what is going on in the characters’ minds. That is, they must be able to infer a character’s beliefs and desires through its observed actions, and the quality of those actions. Fundamentally then, a character’s actions must be driven by its beliefs and desires.

• Adaptability: Real time interaction with human participants brings a strong measure of unpredictability to the virtual world. Not every situation can be predicted at the character design stage, so it is very difficult to make characters behave intelligently based solely on what the designer anticipated during development. In addition, failure to show even a very primitive level of intelligent behavior damages the life-like impression made by the characters. For example, it is hard to feel sympathy for a character that keeps approaching, rather than avoiding, a participant who has punished it at every interaction. In other words, adaptability is crucial for a character to "survive" as one that can interact with humans in a compelling manner when the details of interactions are not pre-scripted.

These two problems that synthetic characters face are similar to those real animals face in nature. With limited or no language ability, animals must still communicate with each other to coordinate group behaviors and to make an individual’s needs known to other members of the species. Also, animals are born with a limited set of built-in skills and survival strategies [15]. To survive, they need to adapt to the particular world that they are born into, and to the kinds of interactions they face throughout their lifetime, whose details could not have been completely predicted before their birth.

Animals deal with these problems through adaptation, i.e., various types of learning. Adaptation refers to a mechanism through which they can modify their innate tendencies and behavioral preferences to better fit the world. The motivation system, which includes drives and affect, forms an intrinsic reward mechanism and naturally gives rise to the reinforcement signals for learning that are crucial for adaptability. Animals learn to avoid objects that evoked fear or pain in the past, and try to return to situations that brought pleasure and joy. Also, emotional states and needs are conveyed as modulators of facial expressions, gestures and vocalization to let observing animals understand an individual’s intention. Such expressions form the basic means of communication for animals.

Taking this idea, we implemented a creature kernel, extending the Blumberg[2] and Kline-et-al[14] behavior-based character design approach. The new kernel more fully incorporates the motivation system and adds a learning mechanism that uses the output of the motivation system as its reinforcement signal. This learning mechanism is the topic of this paper. We present the model and learning algorithms in detail, and show how they were implemented in (void*): A cast of characters, an interactive installation presented in the Emerging Technologies session at SIGGRAPH 99. This paper concludes with a discussion of the current methods and suggestions for further development.

2. RELATED WORK There has been a series of efforts to make artifacts with their own motivations and emotions, which has led to impressive robotic and software agents that show a sympathetic mind[25,8]. Breazeal[4] built a robot called Kismet that can express nine emotions: anger, surprise, fear, happiness, calm, interest, tired, disgust and sadness. In a social situation (interaction with a human user), Kismet displays one of those nine emotional states through its facial expressions. The duration and ‘intensity’ of certain types of interactions that the designer had in mind are the main factors that affect its drive states, which in turn are used to select one of the nine emotional states as the primary emotion of that moment to be expressed through its face. Though Kismet expresses emotion that reflects the nature of its interaction with the human user, the system is designed as a kind of reflex model, so Kismet does not learn to try a different strategy even if a certain situation causes it pain.

While Breazeal’s robot design focuses on the feed-forward operation of motivation (drives and emotional states influence behavior selection and facial expression), Velasquez’s robot, Yuppy, was designed more around the feed-backward operation of emotion: previous emotional experiences are fed back to the behavior system and influence future action selection strategy in the same or similar situations[25]. Within a behavior engine similar to Breazeal’s, his robot keeps forming emotional memory, which affects its behavioral attitude when it re-encounters an object with associated emotional memory. However, since Yuppy cannot generalize from past experience, it does not know how to deal with objects or situations whose features (whether they cause pleasure or pain to the robot) were not prespecified by the designer, and thus it does not show an emotional response to a novel object or situation.

Bates led the OZ project[1,16], which paid a lot of attention to the emotional aspects of synthetic actors called Woggles. Individual Woggles had specific habits and interests or baseline emotional states, which appear as different personalities for different Woggles. However, because the project was designed with application to interactive cinema in mind, which is different from the goal of making an adaptive animat, emotion, as well as the rest of the components of a motivation system, functions more like a set of rules for determining each Woggle’s specific ways of reacting to events or objects.

Blumberg[3] implemented an action selection method, and incorporated learning based on feedback from the drive system. The work demonstrated a way of understanding and implementing classical and operant conditioning within the behavior architecture. But this implementation did not fully explore the affect (emotional feeling) part of the motivation system as a provider of reinforcers for learning, and thus was restricted to certain types of behavioral adaptation.

3. A SYNTHETIC CHARACTER We developed our learning model with the purpose of making it useful for characters themselves as well as for the designers. As discussed above, real time interaction brings in unpredictable situations where characters still need to show appropriate emotional responses. But it is very difficult for designers to predict all possibilities, and carefully specify appropriate reactions. By incorporating a model of the motivation system, and having it interact with other parts of the creature kernel, as opposed to having it function as a behavior modifier which comes in at the last instant, we developed a creature kernel model that can deal with more general types of learning from motivational feedback. It can also incorporate higher cognitive-level influences on behavioral strategy updates at the same time as it utilizes the motivational inputs as ’adverbs,’ which influence the way each behavior is performed. Several modalities of motivational learning interact with each other to bring generalization ability that results in a biasing of behaviors appropriately according to past experiences. We briefly introduce our character design paradigm in this section and then continue with the learning algorithms.

3.1 Organization of the creature kernel The creature kernel of our characters is responsible for deciding what to do and how to do it. A creature kernel is composed of four main parts: the perception, motivation, behavior and motor systems. These four components need to be well coordinated for a character to present as a functioning whole. The perception, behavior and motivation systems are each organized as a network of basis units. For example, the behavior system is modeled as a hierarchically connected network of behavior units as in Tinbergen’s behavior model [24]. The number of units in each network is decided both by the ontogeny and the phylogeny of the character. Units in a network interact to excite or inhibit each other in a manner similar to that implemented by Blumberg [2]. The influence of one unit on another depends on the strength of the connection, which ranges between 0 and 1, as well as on the unit’s own activation level.

The perception system refers to the system of sensors that is used to extract what is going on inside and outside of the character. For characters in a physical world, the perception system includes visual, auditory or tactile sensors. Similarly, a character in a virtual world uses virtual sensors to extract visual features, such as the color or shape of other objects, or olfactory features, such as the smells of other objects in the world.

The motivation system is composed of affect and drives. Depending on its personality and species, a character has a different set of drives with different parameters. Affect refers to the state of feelings, associated with emotion, mood, etc.

The behavior system performs action selection. That is, it chooses the most relevant behavior to perform given the relevant perception and motivation input. The behavior system in turn sends a signal to the motor system, which is a set of action primitives and available motor skills, indicating which action should be performed and how it should be performed. This ‘how’ parameter comes from the motivation system. For example, when the behavior system sends a ‘walk’ signal to the motor system to activate a walk skill, it also indicates how happily or sadly the character should walk based on its current emotional state. In our system, where characters are 3D animated creatures, the motor system is composed of a set of labeled example verbs, where the label indicates where the verb lives in adverb space. At runtime, the motor system uses multitarget motion interpolation to interpolate among examples to arrive at motion that corresponds to the current adverb setting [20,13]. The motivation system provides the intra-kernel common currency in other ways as well, such as biasing action selection and sending adjective parameters to facial animations in this four-component creature kernel framework.

Figure 1 shows the schematic diagram of the creature kernel. Solid arrows represent information flow between components. Any loop made by a succession of arrows represents a possible sequence, e.g., from the creature’s perception of the world, to internal information processing and then to exertion on the outside world.

Figure 1. A schematic diagram of a situated character. Arrows represent information flow among systems that bind this character as a whole. Well-coordinated communication among the four systems (perception, motivation, behavior and motor) is required for a character to successfully function in a dynamic world.

3.2 The motivation system As mentioned previously, the motivation system is composed of two parts: a drive system and an affect system. The drive system includes drives that depend on internal states and sensory-induced drives. It is organized as a semi-hierarchical network whose connections are modifiable. Each creature is born with species-specific drives, and over time, drive units can form modified connections, or even new units, so that the individual creature can gain specific interests and desires. For example, in the SWAMPED! world (footnote 1), a raccoon is born with three character-specific drives: curiosity, hunger and dislike of chicken. Based on its past interaction with a chicken, it may change its attitude toward the chicken, either hating it even more if the chicken has been irritating it too much, or tolerating it better if it has been quiet for a long enough time. This type of change can be implemented as a shift in the relative significance of drives. And even if the raccoon was not born with a specific drive to go to the chicken coop, if it found eggs there many times it might develop a desire to go to the chicken coop, induced by the hunger drive, which is located one level higher in the drive network.

Footnote 1. http://characters.www.media.mit.edu/groups/characters/swamped/

The affect system forms the other half of the motivation system. Affect can be roughly categorized into three kinds depending on its source, as shown in Figure 2: affect associated with drives due to the character’s internal state, affect associated with appetitive or consummatory behavior or social signaling, and affect associated with sensory-induced drives. All three are implemented using the same kinds of building blocks in our system.

Figure 2. Components of the motivation system, which consists of the drive system and the affect system, each of which consists of a corresponding sub-network. Parts of the behavior system that have intimate connections to the motivation system are shown as well.

The highest level of the affect system is composed of three basis units, which represent valence, stance and arousal. Valence is the state of goodness (good-bad axis), stance is the tendency to approach an object (approach-avoidance axis) and arousal is the intensity of the change of affective state, i.e., the first derivative of the intensity of the affective state. Outputs from these units can be combined and observed as less rudimentary affective states, such as one of six primary emotional states [4,7,21]. Like the drive system, components of the affect system are connected together in a hierarchical manner. Within this affect network, nodes located at the higher level correspond to what we usually call "mood", which provides an underlying bias to the character’s operation (Figure 2), whereas nodes at the lower level correspond to affects that apply to specific objects or events.

4. MOTIVATION DRIVEN LEARNING The motivation system provides a reinforcement signal for learning. Driven by its innate affect and drives, a character behaves in such a way as to satisfy its daily needs. Emotion, with its associated "affective tag" (footnote 2), biases a character to behave in an adaptive manner by providing "values" to measure the level of goodness of its behavior [18]. An imbalance in the drive system hints to the character, through the affect system, what he should do next or which behavior is most needed. Likewise, the motivation system provides feedback indicating the plausibility of the behavior that the character has performed[11,26]. Success at performing a consummatory behavior, which reduces certain drives and motivations or brings an experience of happiness or joy, functions as a positive reinforcer and makes the character more likely to trigger that behavior again. Failure, which increases drives or brings sadness or fear, functions as a negative reinforcer and makes the behavior less likely to be triggered[17]. Based on this reinforcement signal from the motivation system, three different kinds of learning happen in our system: organizational learning, concept learning and affective tag formation.

Footnote 2. Affective tags are learned or generalized affective attitudes toward certain objects or events. For example, a character who was hit by a red car in the past might not choose a red umbrella because the color red reminds him of a bad memory. In this case, we say that the color red has an affective tag with a negative stance value. This can arise as a result of generalization from an innate attitude toward objects with certain properties. We describe its implementation in section 4.3.

4.1 Organizational learning Organizational learning refers to the updates in any of the networks that are part of the creature kernel. Each network is composed of nodes and weights. Organizational learning includes updating weights of the connections as well as changes in structure of the network. We call these two types of modification preference learning and strategy learning, respectively. For example, suppose there is a part of the behavior network that represents a ’going into house’ behavior. Next to the ’approaching the house’ node, there may be several nodes such as ’taking the front door’, ’taking the back door’ and ’going through the window.’ With no special preferences for any of these methods, the character may decide to go into the house through the window. After being yelled at for this behavior, he will reduce the tendency to choose to do so and the ’going through the window’ behavior becomes less likely to be chosen the next time the character approaches the house close enough to choose a method of entry. This is implemented as a decrease in connection weight from the ’approaching the house’ parent behavior node to ’going through the window’ child behavior node, and is called preference learning. Or, he may break his leg while going into the house and drop the connection weight to zero, in which case he would not consider going through the window any more. On the other hand, if he finds out that he can go down the chimney, he may add a new behavior node ’going down the chimney’ and consider activating that node the next time he arrives at the house. These deletions or additions of connections bring changes in the structure of the behavior network, and this is called strategy learning. Now we explain how this organizational learning has been implemented in detail.
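The difference between preference learning and strategy learning can be illustrated with a small sketch of the house example. It is not the authors’ implementation: the class `BehaviorGroup`, its method names and the numeric weights are hypothetical. The only constraints taken from the text are that connection strengths lie between 0 and 1 and that a behavior whose weight has dropped to zero is no longer considered.

```python
class BehaviorGroup:
    """Hypothetical container for the child behaviors of one parent behavior."""

    def __init__(self, parent, weights):
        self.parent = parent
        self.weights = dict(weights)  # child name -> connection weight in [0, 1]

    def adjust_preference(self, child, delta):
        """Preference learning: change a weight, clamped to [0, 1]."""
        new_weight = self.weights[child] + delta
        self.weights[child] = min(1.0, max(0.0, new_weight))

    def add_strategy(self, child, weight):
        """Strategy learning: add a new child behavior to the group."""
        self.weights[child] = weight

    def drop_strategy(self, child):
        """Strategy learning: stop considering a child (weight set to zero)."""
        self.weights[child] = 0.0

    def candidates(self):
        """Children that can still be selected."""
        return sorted(name for name, w in self.weights.items() if w > 0.0)


group = BehaviorGroup(
    "approaching the house",
    {"front door": 0.5, "back door": 0.5, "window": 0.5},
)
group.adjust_preference("window", -0.2)  # yelled at: window becomes less likely
group.add_strategy("chimney", 0.5)       # discovers a new way in
group.drop_strategy("window")            # breaks a leg: never again
print(group.candidates())
```

In this sketch a dropped behavior stays in the dictionary with weight zero, which keeps the two kinds of learning easy to compare; removing the entry entirely would model the structural change equally well.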

4.1.1 Groups The behavior network is composed of behaviors, behavior groups and connections among them. ‘Behavior group’ refers to a set of mutually exclusive behaviors, and activation of a behavior group is always preceded by the activation of its parent behavior [2]. Thus, in the previous example the various alternative ways of getting into the house would be contained within one behavior group. The activation level of each behavior is determined as a function of the creature’s internal state, releasing stimuli, level of interest and inhibitory gain combined with the level of preference from the group’s parent behavior to each of the behaviors in the group[2,14]. This preference is proportional to the expected reward from performing each behavior. This reward comes through the motivation system, which, in turn, is represented by stance and valence values. In other words, the expected reward from behavior i, $E(R_i)$, is calculated as the weighted sum of the expected valence, $E(v_i)$, and the expected stance, $E(s_i)$, that come from performing the behavior. Assuming $w_{v_i}$ and $w_{s_i}$ are mixing weights which represent the behavior-specific relative importance of valence and stance, respectively, the expected reward is given by Equation (1).

$$E(R_i) = w_{v_i} E(v_i) + w_{s_i} E(s_i) \tag{1}$$

Here, we chose linear summation for combining these two values for simplicity. Note that in our system the weights $w_{v_i}$ and $w_{s_i}$ are set by the designer, and it is the expected reward, valence and stance that are learned.
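As a minimal sketch of Equation (1), the expected reward can be computed as follows. The class name `BehaviorReward` and its field names are hypothetical, and the numbers in the usage line are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BehaviorReward:
    """Hypothetical record for one behavior i in a behavior group."""

    w_valence: float         # designer-set mixing weight for valence
    w_stance: float          # designer-set mixing weight for stance
    expected_valence: float  # learned E(v_i)
    expected_stance: float   # learned E(s_i)

    def expected_reward(self) -> float:
        """Equation (1): weighted linear sum of expected valence and stance."""
        return (self.w_valence * self.expected_valence
                + self.w_stance * self.expected_stance)


window = BehaviorReward(w_valence=0.7, w_stance=0.3,
                        expected_valence=0.5, expected_stance=-1.0)
print(window.expected_reward())
```

Keeping the mixing weights separate from the learned expectations mirrors the statement that only the latter change with experience.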

4.1.2 Bayesian Inference and Learning The preference level from the parent behavior to a given child behavior is updated each time a child behavior is activated following the activation of the parent behavior. Let $E(v_i, t)$ represent the expected valence of child behavior i at time t. After child behavior i is activated, the updated value $E(v_i, t+1)$ is computed as[9]

$$E(v_i, t+1) = E(v_i, t) \cdot \frac{N+\alpha}{N+\alpha+1} + \frac{I \cdot v_i}{N+\alpha+1} \tag{2}$$

Here, I is the arousal level of the affect and $v_i$ is the actual valence that the creature just experienced as the result of the activation. $\alpha$ is the accumulated number of activations of the behavior group to which child behavior i belongs, and N is a constant related to the confidence in the initial $E(v_i)$ value. If behavior j is another child behavior that belongs to the same behavior group, the expected valence for behavior j is updated as

$$E(v_j, t+1) = E(v_j, t) \cdot \frac{N+\alpha}{N+\alpha+1} \tag{3}$$

The expected stance value is calculated using a similar update rule, and the two are combined to update the expected reward for the behavior. The updated expected rewards are normalized within the behavior group and become the preference given to each behavior from the parent behavior.
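The update rules in Equations (2) and (3) can be sketched as a single function applied to every behavior in a group. The function name, the dictionary representation and the example numbers are assumptions made for illustration. The normalization step is left out because the text does not say how it is carried out.

```python
def update_expected_valence(expected, chosen, valence, arousal, n_prior, alpha):
    """Return updated expected valences for one behavior group.

    expected: dict mapping behavior name -> E(v, t)
    chosen:   name of the child behavior that was just activated
    valence:  actual valence v_i experienced after the activation
    arousal:  arousal level I of the affect
    n_prior:  constant N, confidence in the initial expectations
    alpha:    accumulated number of activations of this group
    """
    denominator = n_prior + alpha + 1
    decay = (n_prior + alpha) / denominator
    updated = {}
    for name, value in expected.items():
        new_value = value * decay                       # Equation (3)
        if name == chosen:
            new_value += arousal * valence / denominator  # extra term of Eq. (2)
        updated[name] = new_value
    return updated


before = {"window": 0.5, "front door": 0.5}
after = update_expected_valence(before, chosen="window", valence=-1.0,
                                arousal=1.0, n_prior=4, alpha=0)
print(after)
```

In this example the unpleasant outcome lowers the expectation for the chosen behavior more than the shared decay lowers its sibling, so the sibling ends up relatively preferred. The caller would increment `alpha` after each activation of the group.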

4.2 Concept learning By concept, we mean each character’s attitude to or belief about certain objects or events, which influences his behavior when he encounters such objects or events. For example, a character who holds the concept that tigers are scary will behave cautiously and try to run away when he meets a tiger. Another character who holds the concept that animals are fun to play with can only guess, on meeting a tiger for the first time, that it is going to be fun to play with, and he will try to approach it. We assume that each character is “born” with some built-in concepts. Another assumption is that there is a set of features that each character cares about when he judges whether a certain concept is right or wrong with respect to a given object or event. For example, assume that a character uses the size and brightness of objects as reliable features for judging the nature of the objects (e.g., scariness). This is graphically represented in Figure 3(a). The gray area is where he believes the size and brightness features of scary animals are located. So initially his belief is that all animals are scary no matter how small they are or how bright their color is. Given this, at the first instance of meeting an animal (e=1), he will behave under the assumption that the animal is scary, following the MDL (Minimum Description Length) principle. Assume that he met a white cat and behaved very cautiously. After figuring out that the white cat is fun and not scary, he will narrow his concept down and start thinking that only dark animals are scary. Then he meets a little black mouse (e=2) with the assumption that the mouse will be scary because it is a dark animal. If he again figures out that the mouse is not scary, he will update his concept again to include only dark and large animals in the category of scary animals, and apply this assumption the next time he meets an animal (e=3).

Figure 3, panels (a) to (c). Assume a character whose initial belief is that all animals are scary no matter how small they are or how bright their fur colors are (a). As he interacts with animals that break this current belief, he updates his concept to reflect his own experience. This is called concept learning and is implemented using Bayesian belief update and the MDL principle. See text for details.

This process of belief update is called concept learning, and is implemented as part of the creature kernel learning system as follows. Each object is represented as a vector whose dimension equals the number of features that the character cares about, and each concept is coded as the set of vectors that belong to the concept. So, assume that for a concept C, the character has example vectors $\{x_i\}$ that all belong to C. When he encounters a new object y, he thinks that y belongs to the concept C with probability

$$P(y \in C \mid \{x_i\}) = \frac{1}{\left(1 + \frac{d}{R}\right)^{n-1}} \tag{4}$$

Here n is the dimension of the vector, and R is the farthest distance between two example data that belong to the set $\{x_i\}$ along each dimension. d represents the shortest distance between y and a datum that belongs to the example set along the feature axis. This measure is multiplied for every dimension to calculate the probability of y belonging to the same concept as the set $\{x_i\}$ represents. So, if all features of y fall in the area bounded by the example features, the creature will behave with the strong belief that y belongs to the concept C, whereas if d is large for every dimension, his belief that y belongs to the concept C will be low. If, after interaction with y, it is discovered that y belongs to the concept C even though $P(y \in C \mid \{x_i\})$ was not 1, the character adds this new datum y to the set representing the concept C, and thereby updates the concept. As examples are added and R for a certain feature axis gets larger, that axis becomes less and less useful in describing the concept[23]. This learning rule is also used for representing and updating beliefs in certain events, situations or behaviors.
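One possible reading of Equation (4) is sketched below, with several assumptions that go beyond the text. The function names are hypothetical. The distance d along an axis is taken to be zero when y lies inside the range spanned by the examples, and otherwise the distance to the nearest example. The exponent is passed in explicitly rather than fixed to n−1. An axis on which all examples coincide (R = 0) is treated as an exact-match requirement.

```python
def membership_probability(y, examples, exponent):
    """Product over feature axes of 1 / (1 + d / R) ** exponent.

    y:        feature vector of the new object
    examples: feature vectors already known to belong to the concept
    exponent: power applied on each axis (n - 1 in Equation (4))
    """
    probability = 1.0
    for axis in range(len(y)):
        values = [x[axis] for x in examples]
        low, high = min(values), max(values)
        extent = high - low  # R: farthest distance between two examples
        if low <= y[axis] <= high:
            distance = 0.0   # assumption: inside the example range means d = 0
        else:
            distance = min(abs(y[axis] - low), abs(y[axis] - high))
        if distance == 0.0:
            factor = 1.0
        elif extent == 0.0:
            factor = 0.0     # assumption: degenerate axis demands an exact match
        else:
            factor = 1.0 / (1.0 + distance / extent) ** exponent
        probability *= factor
    return probability


def add_example(examples, y):
    """Concept update: y turned out to belong to the concept after all."""
    return examples + [y]


scary = [(0.0, 0.0), (2.0, 4.0)]   # hypothetical (size, darkness) examples
newcomer = (4.0, 2.0)
print(membership_probability(newcomer, scary, exponent=len(newcomer) - 1))
scary = add_example(scary, newcomer)
print(membership_probability(newcomer, scary, exponent=len(newcomer) - 1))
```

Adding the newcomer widens the range R along the first axis, so later objects that differ along that axis are accepted more readily, which matches the remark that a widening axis becomes less useful for describing the concept.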

4.3 Affective tag formation One of the primary roles of the motivation system is to offer a very efficient way of making quick decisions, through a mechanism that Damasio called the "Somatic Marker" [5]. Here we introduce a more general concept, the affective tag, in order to avoid the unproven hypothesis that such a mechanism is always associated with a peripheral state of the body. An affective tag biases action selection in the form of emotional memory. Even when there is no other strong cue to prefer one option over another, an affective tag can make the character feel a negative stance toward a red umbrella he has never seen before, just because he had a bad accident with a red car. Or it can influence the character to schedule an appointment on Thursday rather than Wednesday, even if Thursday has no clear advantage, because a joyful experience last Thursday biases him toward a positive attitude about Thursdays. In our system, this functionality is implemented as Hebbian connections[10] between the units in the motivation system and objects of interest, so the learning rule can be written as $\Delta W_{ij} \propto O_i \cdot M_j$. The weight is updated whenever an object $O_i$ is involved in a behavior and motivational feedback $M_j$ is received as the result of that behavior. So, for example, when a behavior selection is made, the motivation units bias each behavior differently based on the objects of interest that the behavior is likely to engage. Though the influence is small, positive feedback resulting from the mutual inhibition mechanism favors behaviors with stronger affective tag inputs.
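The Hebbian rule $\Delta W_{ij} \propto O_i \cdot M_j$ can be sketched for the red car example. Everything specific here is an assumption for illustration: the function names, the learning rate, the choice of object features (red, car, umbrella) and the choice of motivation units (valence, stance).

```python
def update_affective_tags(weights, object_activity, motivation_feedback,
                          learning_rate=0.1):
    """Return weights after one Hebbian step: W_ij += rate * O_i * M_j."""
    return [
        [w + learning_rate * o * m for w, m in zip(row, motivation_feedback)]
        for row, o in zip(weights, object_activity)
    ]


def tag_bias(weights, object_activity):
    """Bias delivered to each motivation unit by the currently active features."""
    units = len(weights[0])
    return [
        sum(o * row[j] for row, o in zip(weights, object_activity))
        for j in range(units)
    ]


# Rows: red, car, umbrella. Columns: valence, stance.
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

# A painful accident involving a red car: negative valence and stance.
weights = update_affective_tags(weights, [1.0, 1.0, 0.0], [-1.0, -1.0])

# A red umbrella, never seen before, still inherits the tag on "red".
print(tag_bias(weights, [1.0, 0.0, 1.0]))
```

Because the tag attaches to the shared feature rather than to the whole object, the negative stance carries over to the umbrella without any direct experience of it.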

5. IMPLEMENTATION Using the creature kernel with the motivation system and learning capability described above, we implemented three main characters for an interactive installation, (void*): a cast of characters, which provided a real time interaction experience to several hundred people at SIGGRAPH 99 (footnote 3). In this section, we introduce the type of interaction and characters we presented in (void*). Then we continue with an explanation of how each subpart of the character kernel and learning ability were realized through these characters.

Footnote 3. http://www.siggraph.org/s99/conference/etech/projects.html

5.1 (void*): a cast of characters (Void*): a cast of characters is an installation implementing a diner setting (Figure 4). There are three main characters - Trucker, Dude and Salesman - and each of them can interact with a human user through instrumented “buns-and-forks”. A human participant can "possess" a character by putting the buns-and-forks on one of three plates between her and the screen; each plate represents one of the three characters. This results in a "POSSESSED" signal being sent to the corresponding character. As an acknowledgement of the possession, the character gets up and walks toward the center of the diner hall. From this point on, the character's behavior system receives a strong influence from the interface until it is unpossessed.

5.2 Interaction At the beginning, characters have different attitudes toward dancing and thus towards possession. Trucker represents a personality with much anger. He thinks it is irritating to be possessed and to be forced to dance. Salesman is a nervous and timid character. He is frightened by possession and dances because he is forced to do so by an authority, but does not quite enjoy it. Dude likes dancing and thus he enjoys being possessed.

Characters respond to this possession and forced dancing differently, based on each character’s personality, past experience and motivation. Depending on how the human participant controls the interface, the possessed character may have fun dancing a type of dance that it likes, or it may keep falling on the floor and have a painful time. This emotional experience is fed back to the character itself, and its attitude toward dancing is updated, so it may come to differ from the character’s original attitude. The current attitude toward possession and dancing is indicated by its emotional expression. Even when the character performs the same LEG_LIFT in response to the participant lifting one bun, it may do so full of joy, or with resistance or hesitance, depending on whether it is enjoying the dance or experiencing too much pain. It may reject being possessed and walk out of the dining hall when the human participant is causing too much pain. If it receives acknowledgement of un-possession, it regains the freedom to act according to its own will: it may go back to its seat or keep dancing. When another character is possessed, it pays attention to the possessed character’s behavior and watches that character’s dancing and reactions to the human participant. It expresses cheer or sympathy depending on how the possessed character is doing.

Figure 4. (void*): A cast of characters implements a diner setting. (a) Three distinctive characters sit at the counter. (b) A human participant can come in and possess one of the three characters using a wireless interface mimicking the buns-and-forks from Charlie Chaplin’s movie The Gold Rush. From then on, he can force the character to dance in a certain way by wiggling the interface in the way he wishes the legs of the possessed character to move.

5.3 Creature kernel

Within the creature kernel framework, these interactions are implemented as follows. A character becomes aware of its possession through its perception system. Even though its motivation to sit at the counter is much higher than its motivation to walk toward the middle of the diner, direct influence from the interface, passed through the perception system to the behavior system, forces it to move toward the center. While it is possessed, the action selection mechanism of its behavior system is dominated by the input from the interface, unless it becomes so upset that it decides to reject being possessed and walk out of the dining hall. Because the influence from the participant dominates the activation of the behavior system during possession, the character performs the type of dance the participant wants it to do. While its behavior system is sending externally driven input to the motor system, the motivation system sends signals originating within the character itself to the motor system. As a result, the quality of the motion is modulated by the character’s emotional state. Other interactions are implemented in a similar way.

5.4 Learning

At every tick, a character undergoes two modes of operation: first, forward mode operation, and then backward mode operation. Forward mode operation is the behavior system’s making a decision to perform a certain behavior and executing it through the motor system based on the currently available best information from its perception, motivation and behavior systems. Immediately after that, backward mode operation follows which corresponds to an update in the structure and connection weights of the networks that form the perception, motivation and behavior systems and connections between those systems. The update is perceived as learning and it is achieved in the way described in the previous sections. Three main types of learning go on in parallel with the interactions, demonstrating the operation of each part of the system. These are illustrated by the following phenomena.
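The two-phase tick described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `Character` class, the behavior names, the learning rate and the weight-update rule are all hypothetical, and the sketch adjusts preference weights only, whereas the system described here also updates network structure.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    # Hypothetical preference weight per behavior (not the paper's data structure).
    preferences: dict = field(default_factory=lambda: {"sit": 1.0, "dance": 0.5})
    learning_rate: float = 0.2  # assumed value, chosen for illustration

    def forward(self, external_drive=None):
        """Forward mode: choose the behavior with the highest total activation."""
        scores = dict(self.preferences)
        for behavior, boost in (external_drive or {}).items():
            scores[behavior] = scores.get(behavior, 0.0) + boost
        return max(scores, key=scores.get)

    def backward(self, behavior, feedback):
        """Backward mode: move the chosen behavior's weight toward the feedback."""
        old = self.preferences.get(behavior, 0.0)
        self.preferences[behavior] = old + self.learning_rate * (feedback - old)

    def tick(self, external_drive, feedback_for):
        """One tick: forward mode first, then backward mode."""
        behavior = self.forward(external_drive)
        self.backward(behavior, feedback_for(behavior))
        return behavior
```

With these assumed numbers, a character that prefers sitting is pushed to dance by an external drive, and two ticks of pleasant feedback are enough for it to choose dancing on its own once the drive is removed.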

5.4.1 Organizational learning

Once a character is un-possessed, it may continue dancing or go back to its seat by its own free choice, since there is no longer any external input which dominates its behavior system as occurred while it was being possessed. The strength of the desire to dance, which is determined by the overall affective feedback from the previous dancing experience, influences the character’s decision between going back to the seat or continuing to dance. This attitude or desire to dance is a modifiable parameter that is learned through possession and dancing experience and implemented as the preference learning part of organizational learning. Upon the decision to keep dancing, it tends to repeat the type of dance that it enjoyed during its previous interaction session. This is because, in its behavior system, the dance types that gave the character greater fun gained a higher preference value than the others and were dynamically assigned as child nodes of the autonomous dancing behavior parent node. This is an example of strategy learning. A session of free dancing gradually reduces the desire to dance because of fatigue and consummation of its desire, and eventually the character goes back to its seat.
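A minimal sketch of the free-dancing behavior described above, assuming a scalar desire to dance that drops by a fixed amount per dance. The function name, the threshold, the fatigue step and the dance name `SHUFFLE` are hypothetical; only `LEG_LIFT` appears in the text.

```python
def free_dance_session(desire_to_dance, dance_preferences,
                       seat_threshold=0.3, fatigue=0.25):
    """Return the dances performed before the character returns to its seat.

    desire_to_dance   -- learned scalar desire (assumed representation)
    dance_preferences -- learned preference value per dance type
    seat_threshold    -- assumed level below which sitting wins
    fatigue           -- assumed reduction in desire after each dance
    """
    performed = []
    while desire_to_dance > seat_threshold:
        # The most enjoyed dance type is the one the character repeats.
        favorite = max(dance_preferences, key=dance_preferences.get)
        performed.append(favorite)
        # Fatigue and consummation gradually reduce the desire to dance.
        desire_to_dance -= fatigue
    return performed


session = free_dance_session(1.0, {"LEG_LIFT": 0.9, "SHUFFLE": 0.2})
```

A character with a low learned desire to dance never enters the loop and goes straight back to its seat.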

5.4.2 Concept learning

The main concept that a character builds through an interaction experience is the attitude toward dancing and possession. In this particular installation, the data that these characters collect are the motivational feedback that they experience during the dancing interaction session. In other words, the $x_i$ are one-dimensional data that are collected while the dancing interaction continues. These data, in turn, influence the character’s attitude toward dancing, which then alters the drive system part of the motivation system, changing the desire to dance and the desire to interact with the human participant. The updated attitude toward dancing constitutes a change in the affect part of the motivation system of the character, which is displayed as its emotional state through facial expression and modulation of its motor system at the next time tick.
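As a rough illustration of how a stream of one-dimensional feedback samples could shift an attitude, the sketch below keeps a running mean that starts from a prior. This is a deliberately simplified stand-in, not the estimator used by the creature kernel; the `Attitude` class, the prior and its weight are assumptions.

```python
class Attitude:
    """Running estimate of the attitude toward dancing (hypothetical model)."""

    def __init__(self, prior=0.0, prior_weight=1.0):
        self.value = prior          # initial bias, e.g. negative for a reluctant character
        self.weight = prior_weight  # how many samples the prior counts for

    def observe(self, x):
        """Fold one feedback sample x into the running mean and return it."""
        self.weight += 1.0
        self.value += (x - self.value) / self.weight
        return self.value


attitude = Attitude(prior=-1.0)
for feedback in [1.0, 1.0, 1.0]:  # three pleasant dancing experiences
    attitude.observe(feedback)
```

A character that starts with a negative attitude ends up with a positive one after a few pleasant samples, which is the kind of attitude change the section describes.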

5.4.3 Affective tag formation

Characters have the ability to choose where in the diner to dance. They can choose to dance slightly to the left, to the right or in the middle. This decision is made very quickly, from a combination of personal preference, physical proximity to the point and past experience. The past experience is coded as affective tags attached to locations, which influence the character’s decision when it dances the next time. For example, a bad experience in the right corner of the diner makes the character avoid that corner, whereas having lots of fun in one corner increases the probability that the character will choose that corner in the forthcoming session.
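One way to picture the location choice is to sum the three influences into a score per location and turn the scores into probabilities. The additive score, the softmax and the `temperature` parameter are assumptions made for this sketch; the text states only that preference, proximity and affective tags are combined.

```python
import math


def location_probabilities(preference, distance, affective_tag, temperature=1.0):
    """Map each location to a choice probability (hypothetical scoring rule).

    preference    -- personal preference per location
    distance      -- distance to each location (closer is better)
    affective_tag -- learned tag per location, negative after bad experiences
    """
    scores = {loc: preference[loc] - distance[loc] + affective_tag[loc]
              for loc in preference}
    peak = max(scores.values())  # subtract the peak for numerical stability
    weights = {loc: math.exp((s - peak) / temperature) for loc, s in scores.items()}
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}


probs = location_probabilities(
    preference={"left": 0.0, "middle": 0.0, "right": 0.0},
    distance={"left": 1.0, "middle": 1.0, "right": 1.0},
    affective_tag={"left": 1.0, "middle": 0.0, "right": -2.0},
)
```

With equal preference and distance, the fun remembered on the left raises its probability and the bad experience on the right lowers it, without ruling any location out.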

6. DISCUSSION Tight coupling among the four systems that constitute the creature kernel made it possible to build characters that show their affective states transparently. In our implementation, the motivation system continually sends modulating signals to the motor system so that the character behaves in an emotionally appropriate way, while its signals to the behavior system update the behavioral preferences and concepts. Because the human participant can see these changes while the behavior system is being updated, the person can easily feel sympathetic toward the character when it shows a change in attitude. This was the main response we received from SIGGRAPH 99 participants who had a chance to interact with the (void*) characters.

In the installation, the three characters initially have very distinctive and strong personalities. Trucker’s basic emotional state is anger; he represents a conservative character who has a very negative attitude toward dancing. Dude is a happy character. He is interested in doing novel and cool things. Though he does not expect to be possessed before he has any experience of possession, he generally has a very positive attitude toward dancing. And when he figures out that possession leads to dancing, he comes to look forward to being possessed. Salesman’s underlying emotional state is fear. He is willing to do whatever he is forced to do by an authority figure. Through repeated experiences, he gradually builds a certain attitude toward possession and the dancing experience associated with it.

Depending on how the user controls the interface, the characters can have a fun time dancing or they can have painful experiences falling down on the floor. Happy dancing experiences act as positive rewards for the characters and result in their looking forward to being possessed, whereas too much painful experience makes the characters build up a negative attitude toward dancing and possession. Eventually this can make them upset enough to walk out of the hall, ignoring the signals from the interface.

Given this, it is notable that although the three main characters in the installation showed very different personalities, their creature kernels have almost the same structure. Different initial biases toward different desires, different preferences for certain types of dance, different learning rates, etc., made the characters look and behave very differently, and they conveyed strong and easily recognizable archetypes.

From a programmer’s point of view, this motivation-based creature design freed us from considering every possible instance or situation a character might encounter. Once we had created emotionally compelling synthetic characters, it was easy to add complicated situations and possible actions and still get realistic and emotional responses that could elicit sympathetic interest from human participants. That is, we are able to give the characters internally active and externally expressed feelings that can guide their adaptation to environments not fully anticipated by their creators.

One major assumption that this system makes lies in the feature extraction and attention part. Characters all know where to attend and which features are interesting, which certainly is not true in the real world and becomes computationally too expensive if we introduce the character to a world with too many interesting objects. We think that this problem can be handled by introducing an adaptive attention mechanism, so that characters can learn where to attend, what to attend to, and which features are relevant. Also, unlike the other three systems of the creature kernel, the motor system has a flattened hierarchy and its units are large, i.e., mostly complex animated sequences. It will be interesting to see how learning mechanisms similar to those we have discussed can be applied to the motor system so that characters learn new motor skills.

7. ACKNOWLEDGMENTS We thank everyone who worked on the (void*) project: Bill Tomlinson, Michael Patrick Johnson, Marc Downie, Ari Benbasat, Jed Wahl, Delphine Nain, Matt Berlin, Walter Dan Stiehl and Dr. Joe Paradiso. Also our thanks go to others who provided valuable insights for the project: Christopher Kline and Michal Hlavac.

8. REFERENCES

[1] Bates, J. The nature of character in interactive worlds and the oz project. Technical Report CMU-CS-92-200, School of Computer Science, Carnegie Mellon University, October 1992.

[2] Blumberg, B. Old Tricks, New Dogs: Ethology and Interactive Creatures. Ph.D. in Media Arts and Sciences, The Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, September 1996.

[3] Blumberg, B., Todd, P. and Maes, P. No Bad Dogs: Ethological Lessons for Learning. In From Animals To Animats, Proceedings of the Fourth International Conference on the Simulation of Adaptive Behavior. MIT Press, Cambridge, MA, 1996.

[4] Breazeal (Ferrell), C. A motivational system for regulating human-robot interaction. In Proceedings of AAAI 98. Madison, WI, 1998.

[5] Damasio, A. Descartes’ Error: Emotion, Reason, and the Human Brain. Putnam Publishing Group, 1994.

[6] Dennett, D. The Intentional Stance. MIT Press, Cambridge, MA, 1987.

[7] Ekman, P. Emotion in the human face. 2nd Edition, Cambridge University Press, Cambridge, UK, 1982.

[8] El-Nasr, M. S., Ioerger, T. R. and Yen, J. A Web of Emotions. In Workshop on Emotion based Agent Architecture (EBAA ’99), Seattle, WA, 1999.

[9] Heckerman, D. Learning Bayesian Networks. In The Ninth Annual Conference on Neural Information Processing Systems, Denver, CO, November 1995.

[10] Hertz, J. A., Palmer, R. G. and Krogh, A. Introduction to the Theory of Neural Computation, volume 1 of Santa Fe Institute studies in the sciences of complexity. Addison-Wesley Publishing Co., Redwood City, CA, 1991.

[11] Hull, C. L. Principles of Behavior. Appleton-Century-Crofts, New York, NY, 1943.

[12] Johnson, M. P., Wilson, A., Blumberg, B., Kline, C. and Bobick, A. Sympathetic Interfaces: Using a Plush Toy to Direct Synthetic Characters. In Proceedings of CHI 99, 1999.

[13] Johnson, M. P. Multi-Dimensional Quaternion Interpolation. In ACM SIGGRAPH99 Conference Abstracts and Applications, page 258, August 1999.

[14] Kline, C. and Blumberg, B. The Art and Science of Synthetic Character Design. Proceedings of the AISB 1999 Symposium on AI and Creativity in Entertainment and Visual Art, Edinburgh, Scotland, 1999.

[15] Lorenz, K. The Foundations of Ethology. Springer-Verlag, New York, NY, 1981.

[16] Mateas, M. An OZ-centric review of interactive drama and believable agents. Technical Report CMU-CS-97-156, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, June 1997.

[17] Mowrer, O. H. Learning Theory and Behavior. John Wiley and Sons, Inc., New York, NY, 1960.

[18] Plutchik, R. The psychology and biology of emotion. Harper Collins College Publishers, New York, NY, 1994.

[19] Rolls, E. T., Burton, M. J. and Mora, F. Neurophysiological analysis of brain stimulation reward in the monkey. Brain Research, 194:339-357, 1980.

[20] Rose, C., Cohen, M. F. and Bodenheimer, B. Verbs and adverbs: Multidimensional motion interpolation. IEEE Computer Graphics & Applications, 18(5), September-October 1998.

[21] Russell, J. A circumplex model of affect. Journal of Personality and Social Psychology, 39:1161-1178, 1980.

[22] Spier, E. H. From Reactive Behaviour to Adaptive Behaviour: Motivational models for behaviour in animals and robots. Ph.D. Thesis, Balliol College, University of Oxford, 1997.

[23] Tenenbaum, J. B. Bayesian modeling of human concept learning. In Advances in Neural Information Processing Systems 11, M. S. Kearns, S. A. Solla and D. A. Cohn (eds.). MIT Press, Cambridge, MA, 1999.

[24] Tinbergen, N. The Study of Instinct. Oxford University Press, New York, NY, 1951.

[25] Velasquez, J. A computational framework for emotion-based control. In Proceedings of the Grounding Emotions in Adaptive Systems Workshop, SAB ’98, Zurich, Switzerland, 1998.

[26] Young, P. T. The role of hedonic processes in motivation. In Nebraska symposium on motivation, 187-188. Lincoln: University of Nebraska Press, 1955.
