Animated Storytelling System via Text

Kaoru Sumi
National Institute of Information and Communications Technology
3-5 Hikaridai, Seikacho, Kyoto, Japan
+81-774-98-6880
[email protected]

Mizue Nagata
The Department of Early Childhood Care and Education, The Faculty of Human Life, Jumonji University
Sugasawa, Niiza-shi, Saitama, Japan
[email protected]
ABSTRACT
This paper describes a system, called Interactive e-Hon, for helping children understand difficult content. It works by transforming text into an easily understandable storybook style with animation and dialogue. In this system, easy-to-understand content is created by a semantic tag generator through natural language processing, an animation generator using an animation archive and animation tables, a dialogue generator using semantic tag information, and a story generator using the Soar AI engine. Based on the results of an experiment, this paper describes the advantages of Interactive e-Hon in attracting interest, supporting and facilitating understanding, and improving parent-child communication.
Categories and Subject Descriptors H.5.1 [Multimedia Information Systems]
General Terms Design
Keywords Animation, Semantic Information, Media Transformation
1. INTRODUCTION
In this paper, we introduce a storytelling system for helping children understand content. When interacting with adults about a common topic, children face barriers to understanding, including difficult expressions, missing background knowledge, and so on. Although parents can choose words and concepts that a child may know, a communication gap may remain. Something to bridge this gap between parent and child would help their conversation. Our research goal is to remove such barriers and build bridges that foster children's understanding and curiosity and support parents' explanations. We think visualization is helpful for this goal: pictures can show static images, and animation can add action.
This paper describes a system, called Interactive e-Hon, for helping children understand difficult topics. We introduce Interactive e-Hon as both a storytelling system and a multimedia communication tool. The system provides storytelling in the form of animation and dialogue translated from original text. The text can be free text input by a user, existing electronic content, and so on. Users can change the story by interacting with the text.

Interactive e-Hon uses animation to help children understand content. Visual data attract a child's interest, and the use of concrete examples such as metaphors facilitates understanding, because each person learns according to his or her own mental model [1],[2], formed by his or her background. For example, if a user poses a question about something, a system that answers with a concrete example matched to the user's specialization would be very helpful. With this approach in mind, our long-term goal is to help broaden children's intellectual curiosity [3] by broadening their world.

The basic idea is to (1) divide the text into parts, (2) select animations from an animation archive, which holds animation data divided into parts corresponding to the divisions of the text, (3) combine the animation parts, and (4) present voice and animation synchronously. The users of the system are a parent and a child, who watch it like a movie while talking to each other. As a result, the child can understand the content and discuss the topics it includes.

Attempts to transform natural language (NL) into animation began in the 1970s with SHRDLU [4], which represents a building-block world and shows animations of adding or removing blocks. In the 1980s and 1990s, more applications appeared [5],[6],[7] in which users operate human agents or other animated entities derived from NL understanding. Recently, there has been research on the natural behavior of life-like agents in interactions with users [8],[9],[10]. The main theme in this line of inquiry is how to make agents as human-like as possible in terms of dialogicality, believability, and reliability. Our research, in contrast, aims to make content easier for users to understand rather than to make agents more human-like. WordsEye [11] is a text-to-scene generation system that includes special data; in contrast, our system generates animation rather than static scenes. We have pursued this line of inquiry because little or no attention has been paid to media translation of content with the goal of improving users' understanding.
2. Interactive e-Hon
Interactive e-Hon is a kind of word-translation medium that provides expression through the use of 3D animation and dialogue explanation in order to help children understand Web content or other electronic resources, e.g., news, novels, essays, free text, and so on. For given content, animation plays in synchronization with a dialogue explanation, which is spoken by a voice synthesizer. The main points of our approach are to present (1) animation of behavioral semantics, i.e., who is doing what, etc.; and (2) dialogue between the shadows of a parent and a child, represented as cartoon characters through a simple visual interface and intended for situations in which a parent and child use the system together.

The main functions of Interactive e-Hon are (1) transforming NL into animation word by word, with additional story generation; (2) generating dialogue from the NL of content; (3) presenting the content by using parent and child agents; and (4) explaining concepts through animations transformed from metaphors.

Figure 1 shows the system framework of Interactive e-Hon. It consists of a tag generator, an animation generator, a dialogue generator, and a story generator. Our system uses the Japanese language for text. The tag generator uses ontologies and a Japanese morphological and dependency structure analyzer for natural language processing (NLP). The animation generator uses a multimedia semantic archive for the animation data, and a world-view database and ontologies for animation explanation using metaphors. The dialogue generator uses the abovementioned structure analyzer and a voice synthesizer for output. The story generator uses the Soar AI engine as a rule-based system for story generation, and the multimedia semantic archive for storage of animation data.

Figure 2 shows the process flow for Interactive e-Hon. This processing is based on text information containing semantic tags, which are tags with several semantic meanings for every morpheme. Recently, the Semantic Web [12] and its associated activities have adopted tagged documentation. Tagging is also expected to be applied in the next generation of Web documentation. Interactive e-Hon generates documents with semantic tags (.tag files), morphological and dependency structure information (.morph files), and animation files (.ehon), according to the .x file format of DirectX. The system utilizes an action animation table and a background table, and it uses the following software libraries in the Windows XP environment:
- Cabocha, the Japanese morphological and dependency structure analyzer;
- Microsoft Access, for the tables;
- the DirectX SDK, for the animation environment;
- FineSpeech, the Japanese voice synthesizer;
- Xerces C++, for XML;
- a Japanese thesaurus; and
- a Japanese lexicon, Goitaikei.

Figure 1: System framework of Interactive e-Hon. (The figure shows the tag generator, animation generator, dialogue generator, and story generator on a Web server, drawing on a Japanese thesaurus, ontologies, a world-view database, a multimedia semantic archive, the Soar AI engine, the Japanese morphological/dependency structure analyzer, the Japanese lexicon, and a voice synthesizer; the resulting animation and dialogue are presented on the user's PC.)
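To make the division of labor among the four generators concrete, the following minimal Python sketch outlines the data flow; the class and method names are our own illustrative stand-ins (the actual system is implemented on Cabocha, Microsoft Access, DirectX, FineSpeech, Xerces C++, and Soar, as listed above), so this should be read as a sketch rather than the real implementation.

```python
# Illustrative sketch of the Interactive e-Hon data flow (hypothetical names).
class InteractiveEHon:
    def __init__(self, tag_gen, story_gen, anim_gen, dialogue_gen):
        self.tag_gen = tag_gen            # NLP: morphology, dependencies, ontologies
        self.story_gen = story_gen        # Soar-style commonsense rules
        self.anim_gen = anim_gen          # multimedia semantic archive, world-view DB
        self.dialogue_gen = dialogue_gen  # colloquial rewriting + voice synthesis

    def present(self, text):
        tagged = self.tag_gen.generate(text)          # .tag / .morph information
        tagged = self.story_gen.extend(tagged)        # optional story extension
        animation = self.anim_gen.generate(tagged)    # .ehon animation
        dialogue = self.dialogue_gen.generate(tagged) # parent/child agent lines
        return animation, dialogue                    # played back synchronously
```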
In the following subsections, we describe the key aspects of Interactive e-Hon: the transformation of electronic content into animation, the information presentation model using parent and child agents, the transformation of electronic content into dialogue expressions, and the expression of conceptual metaphors by animation.
2.1 Transformation from text into animation
Interactive e-Hon transforms content into animation by using semantic tags, morphological and dependency structure information, an action animation table, and a background table. First, the system attempts to reduce long, complicated sentences to simple sentences. One animation and one dialogue are generated for each sentence and then played at the same time.

The semantic tags consist of items concerning time, space, weather, and objects (from one to many objects). Each item is divided into smaller items as follows: time (season, date, time); space (continent, country, region, scene); weather; object (name, category, feature, position, action (name, category, feature, target (name, category, function, feature))). To look up semantic categories, the system uses both the Japanese morphological analyzer and the Japanese lexicon mentioned above.

An animation is selected according to the action animation table and the background table. The action animation table includes an action's name, category, and modifier; an object's name, category, function, and modifier; and a priority score. If these registered patterns correspond to the pattern of a sentence, the registered animation is selected. This depends on the registration in the table, but basically, a subject is treated as a character, and a predicate is treated as an action. An object is also treated as a character, and an associated predicate is treated as an action. A background for an animation is selected according to the background table. This table includes a scene ID, season, year, month, day, hour, minute, continent, country, region, scene,
weather, and priority score. If these registered patterns correspond to the pattern of a sentence, the registered background is selected. This likewise depends on the registration in the table.

Figure 2: Overview of the process flow for Interactive e-Hon. (The flow runs from the original text through adding semantic tags (.tag) and morphological and dependency structure information (.morph), to searching the world-view databases, the action animation table, and the background table, story generation with Soar, generation of animations transformed from metaphors, e.g., "A president is similar to a king in the sense of being a person who governs a nation," and finally generation of animations (.ehon) including dialogues.)

Many animations have been recorded in our database. At present, the system has 18 characters, 67 behaviors, and 31 backgrounds, and the database is still being expanded. Characters and actions each have one-to-many relationships with names: various character names are linked to each character, and various action names are linked to each action, because several different names often indicate the same action. Actions can be shared among characters, which provides a common framework of characters. The representation in an animation comes from real motions, gestures, emphasized behaviors, and cartoon-like deformations. When registering the action animation table, we registered a man in a suit as the character to be selected when the subject or object is a company, a government, or a public agency. If there is no corresponding word but the semantic tag describes a person, the system selects a generic person character.
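To illustrate the data involved, here is a small Python sketch of the semantic-tag structure and of pattern matching against the action animation table. The field names mirror the items listed above, but the classes, the table representation, and the priority-based tie-breaking are our own hedged reading of the description, not the system's actual data model.

```python
# Hedged sketch of the semantic-tag items and action-animation-table lookup
# (hypothetical classes and field names; toy data).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Action:
    name: str
    category: str = ""
    feature: str = ""
    target: Optional["Obj"] = None           # the thing acted upon, if any

@dataclass
class Obj:                                    # one "object" item of a semantic tag
    name: str
    category: str = ""
    feature: str = ""
    position: str = ""
    action: Optional[Action] = None

@dataclass
class SemanticTag:                            # one tag per simplified sentence
    time: dict = field(default_factory=dict)      # season, date, time
    space: dict = field(default_factory=dict)     # continent, country, region, scene
    weather: str = ""
    objects: List[Obj] = field(default_factory=list)

def select_animation(tag: SemanticTag, action_table: List[dict]) -> Optional[dict]:
    """Return the registered animation whose pattern matches an object/action in
    the tag, preferring higher priority scores (assumed tie-breaking rule)."""
    matches = []
    for row in action_table:
        for obj in tag.objects:
            act = obj.action
            if act is None:
                continue
            name_or_cat = (row["action_name"] == act.name
                           or row["action_category"] == act.category)
            if name_or_cat and row["object_category"] in ("", obj.category):
                matches.append(row)
    return max(matches, key=lambda r: r["priority"], default=None)
```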
The system first generates a .tag file and a .morph file from the text information of content; it then divides the sentences in the text. For example, it might generate the following lists from the long sentences shown below, using the following abbreviations: MS: modifier of subject; S: subject; MO: modifier of object; O: object; MP: modifier of predicate; P: predicate.

(Original Sentence 1) It is said that a confectioner, who read the newspaper, made a stuffed bear, found the nickname "Teddy," and named it a "teddy bear."
(Information of .morph file)
- S: confectioner; MS: who read the newspaper; P: make; O: stuffed bear.
- S: confectioner; P: find; O: nickname "Teddy"; MO: his.
- S: confectioner; P: name; MP: "teddy bear".
- S: it; P: say.

(Original Sentence 2) But, the president refused to shoot the little bear, and he helped it instead.
(Information of .morph file)
- S: president; P: shoot; O: little bear.
- S: president; P: refuse; O: to shoot the little bear.
- S: president; P: help; O: little bear.
In generating an animation, the system combines separate animations, for example the subject as a character, the object as a passive character, and the predicate as an action, according to the animation table.
For example, in the case of Original Sentence 2 above, the following animations are selected in order:
- first, president (character) shoot (action), and little bear (character; passive) is shot (action; passive);
- then, president (character) refuse (action);
- finally, president (character) help (action), and little bear (character; passive) is helped (action; passive).
This articulation into animation is used only for verbs with clear actions. The be-verb and certain common expressions, such as "come from" and "said to be" in English, cannot be expressed. Because there are so many expressions like these, the system does not register such verbs as candidates for animation.
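The same combination step can be sketched in a few lines of Python; the clause dictionaries, the character and verb sets, and the "shot" tuples are illustrative stand-ins for the system's internal representation, not its actual data structures.

```python
# Hedged sketch: clause lists for Original Sentence 2 combined into animation
# "shots" (subject = acting character, object = passive character, predicate = action).
clauses = [
    {"S": "president", "P": "shoot",  "O": "little bear"},
    {"S": "president", "P": "refuse", "O": "to shoot the little bear"},
    {"S": "president", "P": "help",   "O": "little bear"},
]
CHARACTERS = {"president", "little bear"}       # characters registered in the archive
ANIMATED_VERBS = {"shoot", "refuse", "help"}    # verbs registered with animations

def to_shots(clauses):
    shots = []
    for c in clauses:
        if c["P"] not in ANIMATED_VERBS:        # be-verbs, idioms: no animation
            continue
        shot = [(c["S"], c["P"], "active")]
        if c.get("O") in CHARACTERS:            # only registered characters are animated
            shot.append((c["O"], c["P"], "passive"))
        shots.append(shot)
    return shots

print(to_shots(clauses))
# [[('president', 'shoot', 'active'), ('little bear', 'shoot', 'passive')],
#  [('president', 'refuse', 'active')],
#  [('president', 'help', 'active'), ('little bear', 'help', 'passive')]]
```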
2.2 Parent-child agent model
We believe that (1) an easy, natural, unforced viewing style and (2) a feeling of assimilation into the content are important for encouraging users to accept and become interested in content. With a guide agent, for example, however eagerly it talks to a user, the user may not be interested and may no longer want to watch. We regard this as a forced style, similar to how the more parents tell children to study, the less the children feel like studying. Furthermore, for users to be interested in content, it must provide some form of assimilation for them. If users feel that a virtual world is completely separate from themselves, they will lose interest in it.

Our system has agents that mediate a user's understanding through intelligent information presentation. In the proposed model, a parent agent (mother or father) and a child agent have a conversation while watching a "movie" about the content, and the user (or users, in the case of a child and parent together) watches the agents. The agents take the form of moving shadows of the parent and child: the shadow of the child agent represents the shadow of the real child user, and the shadow of the parent agent represents the shadow of the child's real parent. These shadows appear on the system's screen and move freely without the user's intervention, similarly to the way Peter Pan's shadow moves independently. The shadows enhance the feeling of assimilation between the real world (users) and the virtual world (content), and this assimilation should lead to a feeling of affinity.

There are two kinds of agents: avatars, which represent the self, and guides or actors, which represent others. Avatars are more agentive, dialogical, and familiar than guides or actors [13]. Thus, we designed the avatars in Interactive e-Hon so that users would feel familiarity and affinity with the agents, helping them gain a better understanding of content.
2.3 Transformation from content into dialogue expressions
To transform content into dialogue, the system first generates a list of subjects, objects, predicates, modifiers and so on from the
text information of the content. It also attempts to shorten and divide long, complicated sentences. Then, by collecting these words and connecting them in a friendly, colloquial style, the system generates conversational sentences. In addition, it reduces the level of repetition for the conversational partner by changing phrases according to a thesaurus. It prepares explanations through abstraction and concretization based on ontologies, meaning that it adds explanations of background knowledge. For example, in the case of abstraction, “Antananarivo in Madagascar” can be changed into “the city of Antananarivo in the nation of Madagascar,” which uses the ontologies, “Antananarivo is a city” and “Madagascar is a nation.” Similarly, in the case of concretization, “woodwind” can be changed into “woodwind; for example, a clarinet, saxophone, or flute.” These transformations make it easier for children to understand concepts. In the case of abstraction, the semantic tag meaning of “person” adds the expression, “person whose name is”; “location” adds “the area of” or “the nation of”; and “organization” adds “the organization of.” In the case of concretization, if a target concept includes lower-level concepts, the system employs explanations of these concepts. The system then generates dialogue lines one by one, putting them in the order (in Japanese) of the subject’s modifier, the subject, the object’s modifier, the object, the predicate’s modifier, and the predicate, according to the divided units. To provide the characteristics of real storytelling, the system uses past tense and speaks differently depending on whether the parent agent is a mother or father. Sometimes the original content uses reverse conjunction, as with “but” or “however,” as in the following example: “But, what do you think happens after that?”; “I can’t guess. Tell me the story.” In such cases, the parent and child agents speak by using questions and answers to spice up the dialogue. Also, at the end of every scene, the system repeats the same meaning with different words by using synonyms.
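As an illustration of the abstraction and concretization rewrites described above, the following Python sketch uses the paper's own examples as toy ontology entries; the function names and dictionary-based lookup are assumptions for illustration, not the system's ontology interface.

```python
# Hedged sketch of abstraction / concretization (toy ontology data).
IS_A = {                        # upper-level concepts, used for abstraction
    "Antananarivo": "city",
    "Madagascar": "nation",
}
EXAMPLES_OF = {                 # lower-level concepts, used for concretization
    "woodwind": ["clarinet", "saxophone", "flute"],
}

def abstract(term):
    kind = IS_A.get(term)
    return f"the {kind} of {term}" if kind else term

def concretize(term):
    examples = EXAMPLES_OF.get(term)
    if not examples:
        return term
    listed = ", ".join(examples[:-1]) + f", or {examples[-1]}"
    return f"{term}; for example, a {listed}"

print(abstract("Antananarivo"), "in", abstract("Madagascar"))
# -> the city of Antananarivo in the nation of Madagascar
print(concretize("woodwind"))
# -> woodwind; for example, a clarinet, saxophone, or flute
```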
2.4 Searching for metaphors and transforming them into animation
If a user does not know the meaning of a term like "president," it is helpful to present a dialogue explaining that "a president is similar to a king in the sense of being the person who governs a nation," together with an animation of a king in a small window, as illustrated in Figure 3. As noted above, people come to understand unfamiliar concepts by transforming them according to their own mental models [1],[2], and the above example follows this process.

The dialogue explanation depends on the results of searching world-view databases. These databases describe the real world, storybooks (with which children are readily familiar), insects, flowers, stars, and so on. The world used depends on a user's curiosity, as determined from the user's input in the main menu. For example, "a company president controls a company" appears in the common world-view database, while "a king reigns over a country" appears in the world-view database for storybooks, which is the target database for the present research. The explanation of "a company president" is searched for in the storybook world-view database by utilizing synonyms from a
thesaurus. The system then searches for "king" and obtains the explanation: "A company president, who controls a company, is similar to a king, who governs a nation." If the user asks the meaning of "company president," the system shows an animation of a king in a small window while the parent agent, voiced by the voice synthesizer, explains the meaning by expressing the results of the search process.

In terms of search priorities, the system uses the following order: (1) complete correspondence of an object and a predicate; (2) correspondence of an object and a predicate, including synonyms; (3) correspondence of a predicate; and (4) correspondence of a predicate, including synonyms.

Commonsense computing [14] is a related line of research on describing world-views by using NL processing. In that research, world-views are transformed into networks with well-defined data, like semantic networks. A special feature of our research is that we directly apply NL with semantic tags by using ontologies and a thesaurus.

Figure 3: A sample view from Interactive e-Hon. Here, Interactive e-Hon is explaining the concept of a "president" by showing an animation of a king. The mother and child agents talk about the content. The original text appears in the text box above the animation, and the user can ask questions directly in the text box.

The original text: "President Roosevelt went bear hunting and met a dying bear in autumn of 1902. However, the President refused to shoot and kill the bear, and he helped it instead. Along with a caricature by Clifford Berryman, the occurrence was carried by the Washington Post as a heartwarming story."

The following is the dialogue explanation for this example:
Parent Agent: President Roosevelt went bear hunting. Then, he met a small, dying bear.
Child Agent: The President met a small bear that was likely to die.
A real child: What is a president, mummy?
(His mother then operates the e-Hon system by clicking the word, and an animation using the retrieved metaphor is played.)
Parent Agent: A president is similar to a king, as a person who governs a country (with the king animation in a small window).
A real parent: A president is a great man, you know?
Parent Agent: But, what do you think happens after that?
Child Agent: I can't guess. Tell me the story.
Parent Agent: The President refused to shoot and kill the bear. And he helped it instead.
Child Agent: The President assisted the small bear.
Parent Agent: The occurrence was carried by the Washington Post as a heartwarming story, with a caricature by Clifford Berryman.
Child Agent: The episode was carried by the newspaper as a good story.
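The four-level search priority described above can be made concrete with a small Python sketch; the world-view entries, the synonym table, and the function names are toy illustrations under our reading of the text, not the system's actual database schema.

```python
# Hedged sketch of the metaphor search priorities (toy world-view and thesaurus data).
STORYBOOK_WORLD = [
    {"subject": "king", "predicate": "governs", "object": "nation"},
]
SYNONYMS = {"controls": {"governs"}, "company": {"nation"}}   # toy thesaurus entries

def expand(word):
    return {word} | SYNONYMS.get(word, set())

def find_metaphor(predicate, obj):
    tests = [                                                 # priority order (1)-(4)
        lambda e: e["object"] == obj and e["predicate"] == predicate,
        lambda e: e["object"] in expand(obj) and e["predicate"] in expand(predicate),
        lambda e: e["predicate"] == predicate,
        lambda e: e["predicate"] in expand(predicate),
    ]
    for test in tests:
        for entry in STORYBOOK_WORLD:
            if test(entry):
                return entry
    return None

# "A company president controls a company" -> matched to the king entry via synonyms.
print(find_metaphor("controls", "company"))
```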
2.5 Story generation
Using the Soar AI engine, the system exchanges documents, stories, tags, and animations. Soar is a production system in which we can define rules ourselves, and its processing is fast enough to be used even in game technology. Interactive e-Hon can supplement stories with this rule-based system: defining commonsense rules improves the presentation, and the original text is extended by adding text and animation according to the defined rules. The I/O of the Soar engine consists of the tags and the text. The essential idea is automatic content extension in real time by using commonsense knowledge. Richer content can be represented with the story generator than without it.

For example, for the following original text, clicking the Open button launches the following content:
(Original Sentence 1) "President Roosevelt went bear hunting." (with an animation of Roosevelt walking)
If the user adds the text "in winter," the original text information is changed as follows:
(Original Sentence 2) "In winter, President Roosevelt went bear hunting."
Then, by clicking the Soar button and the Open button, the original text is supplemented and an animation is added:
(Original Sentence 3) "In winter, President Roosevelt went bear hunting. It was snowing, and a rabbit was coming." (with an animation of Roosevelt walking, then an animation of snow and a rabbit coming)
In this example, the rule descriptions are as follows:
- IF (^season winter) THEN (^weather snow)
- IF (^season winter) AND (^location forest) THEN (^object rabbit) (^action come)
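For illustration, the two rules above can be mimicked with a tiny forward-chaining sketch in Python; this is only a hedged rendering of the idea, since the actual system encodes such rules as Soar productions over the tag attributes.

```python
# Hedged sketch: the two rules above as a toy forward-chaining loop
# (the real system uses Soar productions over tag attributes).
RULES = [
    (lambda f: f.get("season") == "winter",
     {"weather": "snow"}),
    (lambda f: f.get("season") == "winter" and f.get("location") == "forest",
     {"object": "rabbit", "action": "come"}),
]

def extend(facts):
    """Repeatedly apply every rule whose condition holds, adding its attributes."""
    changed = True
    while changed:
        changed = False
        for condition, additions in RULES:
            if condition(facts) and not additions.items() <= facts.items():
                facts.update(additions)
                changed = True
    return facts

print(extend({"season": "winter", "location": "forest"}))
# {'season': 'winter', 'location': 'forest', 'weather': 'snow',
#  'object': 'rabbit', 'action': 'come'}
```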
3. EXPERIMENT
We tried converting four actual examples of content from the Web, newspapers, and novels. For example, we transformed the content "the origin of the teddy bear's name," taken from a Web source, into an animation and a dialogue (Figure 3). The users of Interactive e-Hon were assumed to be a pair consisting of a parent and a child, so by observing such users we could evaluate the effect of the system through their interactions. We therefore conducted an experiment with real subjects to examine whether Interactive e-Hon was helpful for users' understanding. We used pairs consisting of a teacher and a child instead of a parent and a child.

The subjects were two preschool teachers and four children. Teacher A was in his/her fifties and had been teaching for 25 years. Teacher B was in his/her forties and had been teaching for 3 years. Children S and H were both boys, approximately 5 years and 6 months old. Child C was a girl, also 5 years and 6 months old. Child M was a girl, 4 years and 3 months old.

In the experiment, the subjects viewed the content of "the origin of the teddy bear" via a dialogue and an animation generated from the Web text. The content was presented on the screen with explanation via the dialogue of the parent agent. Each pair of users was asked to sit in front of the display and talk freely while
viewing the system. Their interactions were recorded on video, and each teacher was asked to respond to a questionnaire afterward.

According to the responses, the teachers and children were previously unaware of "the origin of the teddy bear's name." We presumed that the content was not so easy for the children, because both teachers said that it included some difficult words and concepts. Consider the following example of a teacher checking a child's understanding:
Teacher A: Do you know about the United States of America?
Child S: I don't know. (He shakes his head.)
Teacher A: You don't know?
Teacher A: Do you know about bear hunting? It means catching a bear.
Child S: (He nods.)

According to the responses, both teachers reported that visualization is an advantage of the system. Because of the visualization, the teachers could easily explain concepts even to small children with limited vocabularies. Regarding the effectiveness of visualization, we observed that both teachers repeatedly pointed to the display during their interactions with the children.
Teacher A: The Washington Post is a newspaper, you know. (She points to the display.)
Child S: (He nods.)
Teacher A: That story was published in a newspaper in the United States of America. (She points to the display.)
Teacher B: Then, the company made and sold the stuffed bears. They sold a lot of them. (She points to the display.)

Teacher A reported that the animation attracted the children's attention, indicating another advantage of the system. Child H was very interested in the animation on the display, from the beginning to the middle.
Child H: What's this? What?
Teacher A: (She nods.) (The story starts.)
Child H: Uuuuu! Ooooh!
Explanation using animated representation can thus facilitate children's understanding and enable easier explanation. It also attracts a child's attention, as shown by the above example. Teacher A also pointed out the possibility for content to combine children's experience and existing knowledge with their imaginations. The next example illustrates this type of interaction.
Child S: In my house……
Teacher A: Yes?
Child S: I have a stuffed bear… A big one…. I have it during sleeping time….
Teacher A: Oh. That's nice.
Child S: Such a big one.
Teacher A: You have a bear in your house.
Child S: Yes.
Teacher B: Do you have a stuffed bear in your house?
Child M: (She nods.) Yes. A blue ribbon one.
Teacher B: It has a blue ribbon? Like this? (She pointed to the display.)
Child M: (She nods.) I always take care of it.
Teacher B: You always take care of it.
Child M: (She nods.)
We also observed the children asking the teachers for explanations.
(Voice: The company exhibited the stuffed bear at an expo.)
Child M: What is the meaning of "exhibit"?
Teacher A: It means "bring and show." He is bringing it, see? (She is pointing to the display.)
This example illustrates the possibility of a child working with an adult and actively gaining knowledge through their interaction. The next example shows the possibility of supporting the understanding of children who can read by showing text on the screen.
Teacher A: Then, what is it called in the text?
Child H: I don't know.
Teacher A: Here it is. (She pointed to the display.)
Child H: Teddy.
Teacher A: Yes. Oh, yes.
The next example demonstrates the acceleration of a child's understanding as a result of interaction with the teacher.
(Voice: Then, 3000 bears were ordered and there was a teddy bear boom in America. So the name "teddy bear" became established.)
Teacher A: Ah… Americans thought the teddy bear was cute, and it attracted their attention. Then, everyone said, "I want to buy a bear." So the company made a lot of them, like this. (She pointed to the display while explaining.) 3000 bears were ordered, you know? That's so many, isn't it?
Child S: Yes it is.
Teacher A: You understand? 3000 bears is a lot.
Child S: Yes.
Teacher A: Then, so many bears were ordered. All these people in America said, "I want to buy a teddy bear," and they bought them. (She pointed to the display.)
Child S: Now, you know what?
Teacher A: Yes?
Child S: Ah, the teddy bear was everybody's favorite?
Teacher A: Yes. It was everybody's favorite.

From the expressions "3000 bears were ordered" and "there was a teddy bear boom in America," it would be difficult for a child to understand the consequence expressed by "the teddy bear was everybody's favorite." Additionally, the concepts of "3000 bears," "order," and "boom" are not easy, and the inference of why ordering 3000 bears led to a boom is also difficult. As a result of interaction via the e-Hon system, however, Child S understood and put the idea into his own words as "everybody's favorite." Consequently, through this experiment we have demonstrated the possibility of actively supporting children's understanding by using our system while interacting with an adult, taking advantage of visualization, and explaining by showing a related concept.

4. DISCUSSION
Interactive e-Hon's method of expression through dialogue and animation is based on NL processing of content. For dialogue expression, the system generates a plausible, colloquial style that is easy to understand by shortening long sentences and extracting the subject, objects, predicate, modifiers, and other components from them. For animation expression, the system generates a helpful animation by connecting individual animations selected for the subject, objects, predicate, and so on. The result is expression through dialogue with animation that can support a child user's understanding, as demonstrated by our experiment with real subjects.
In our experiment, we observed that the Interactive e-Hon system has the advantages of attracting interest, supporting and facilitating understanding, and improving communication. In other words, the system supports communication between a parent and a child via contextual understanding. If the NLP is successful and animations appropriate for the corresponding words are displayed, the system can transform the original text into dialogue and animation. Referring expressions (e.g., "it," "that," "this") and omission of the subject are open problems in NLP and remain issues in our system. As a tentative solution, we have manually embedded word references in the tags; a fully automatic process for resolving such references will depend on further progress in NLP. When translating a word into an animation, ambiguity may also be a problem. Ambiguity between different meanings of the same word can be resolved with a dictionary. For ambiguity in animation expression, however, we have to consider several cases, because the appropriate action depends on the conditions, the character, and so on.
5. CONCLUSION
We have introduced Interactive e-Hon, a system for facilitating children's understanding of electronic content by transforming it into animation and dialogue. We have conducted media transformation of actual content and demonstrated the effectiveness of this approach through an experiment with real subjects. We have thus shown that Interactive e-Hon can generate satisfactory explanations of concepts through both animation and dialogue, which can be readily understood by children. Interactive e-Hon could be widely applied as an assistant that supports the understanding of difficult content or concepts by various kinds of people with different backgrounds, such as the elderly, people from different regions or cultures, or laypeople learning about a difficult field.
6. REFERENCES
[1] Philip N. Johnson-Laird: Mental Models. Cambridge: Cambridge University Press; Cambridge, Mass.: Harvard University Press (1983).
[2] D. A. Norman: The Psychology of Everyday Things. Basic Books (1988).
[3] Hatano and Inagaki: Intellectual Curiosity. Cyuko Shinsho (in Japanese) (1973).
[4] Terry Winograd: Understanding Natural Language. Academic Press (1972).
[5] Vere, S. and Bickmore, T.: A basic agent. Computational Intelligence, 6:41-60 (1990).
[6] Richard A. Bolt: "Put-That-There": Voice and gesture at the graphics interface. In Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press (1980).
[7] N. Badler, C. Phillips, and B. Webber: Simulating Humans: Computer Graphics, Animation and Control. Oxford University Press (1993).
[8] Justine Cassell, Hannes Hogni Vilhjalmsson and Timothy Bickmore: BEAT: the Behavior Expression Animation Toolkit. In Life-Like Characters, Helmut Prendinger and Mitsuru Ishizuka (Eds.), pp. 163-187, Springer (2004).
[9] Hozumi Tanaka et al.: Animated Agents Capable of Understanding Natural Language and Performing Actions. In Life-Like Characters, Helmut Prendinger and Mitsuru Ishizuka (Eds.), Springer (2004).
[10] Stacy Marsella, Jonathan Gratch and Jeff Rickel: Expressive Behaviors for Virtual Worlds. In Life-Like Characters, Helmut Prendinger and Mitsuru Ishizuka (Eds.), Springer (2004).
[11] Bob Coyne and Richard Sproat: WordsEye: An Automatic Text-to-Scene Conversion System. In Proceedings of SIGGRAPH 2001, the 28th Annual Conference on Computer Graphics, Los Angeles, California, USA. ACM (2001).
[12] D. Fensel, J. Hendler, H. Lieberman, and W. Wahlster (Eds.): Spinning the Semantic Web. MIT Press (2002).
[13] Toyoaki Nishida, Tetsuo Kinoshita, Yasuhiko Kitamura and Kenji Mase: Agent Technology. Omu Sya (in Japanese) (2002).
[14] Hugo Liu and Push Singh: Commonsense reasoning in and over natural language. In Proceedings of the 8th International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES-2004) (2004).