Using Multi-Agent Systems to Visualize Text Descriptions
Edgar Bolaño-Rodríguez, Juan C. González-Moreno, David Ramos-Valcarcel and Luis Vázquez-López
Abstract People usually form a visual representation of what they read; this is inherent to human nature. Many efforts have been made in recent years to find a direct method for translating a text written in a natural language into a graphical representation. Some solutions mainly represent a static scene containing the objects that appear in the text; far fewer approaches try to build an animation from the actions described in the text. In either case, the approaches found in the literature try to create the visual representation directly from the text description. In this paper we present a new approach based on the creation of an intermediate model using the INGENIAS agent-oriented methodology. Once this model has been obtained, a second translation produces a visual model that can be viewed and modified with the Alice system. A multi-agent system is proposed to control and direct the translation process.
Edgar Bolaño-Rodríguez, Juan C. González-Moreno, David Ramos-Valcarcel, Luis Vázquez-López: Departamento de Informática, Universidad de Vigo, e-mail: [email protected]

1 Introduction

Natural language can be understood as a vital and ubiquitous part of human intelligence and social relations. It is therefore natural to believe that technologies which automatically process natural language (NLP technologies) can facilitate the everyday lives of ordinary people. Indeed, many NLP technologies have
Fig. 1 A basic 3D scene modeled with the Alice 3 system using Sims characters. Alice is a programming environment designed (mainly) for teaching programming through building 3D virtual worlds, without the possibility of syntax errors.
already been applied to build end-user applications that help millions of ordinary people on a daily basis. For example, automatic language translation technologies help people access information presented in foreign languages, and automatic grammar analysis technologies help people avoid grammatical mistakes while writing. Another vital part of human intelligence is the ability to visualize abstractions, more precisely the ability to visualize what people describe when speaking or writing. This ability is basic in many domains in which people need to visualize a description in order to understand what happens. Moreover, computer animation is nowadays a significant part of our society. It not only has enormous commercial value for the entertainment and advertising industries, but also vast potential to become a useful tool in education and communication. Currently, the creation of computer animation is mostly done manually, a process that can be tedious, time consuming and labor intensive, as the following example illustrates. The scene shown in Figure 1 is a virtual environment containing a room, two men, two women, and a little boy. Although there is no furniture at all, building even this simple scene takes a qualified designer several hours of hard work. Moreover, such a scene usually represents only the starting point of a particular animation. If someone wished to tell a story taking place in this virtual environment, he or she would have to manipulate the entities within it so that the story could be properly visualized. Traditionally, this manipulation is performed by a human animator who manually controls every relevant aspect of the virtual environment according to the story being told. In recent years several works [13, 10, 9] have proposed to automate it on the basis of natural language instructions. As pointed out by Patrick Ye in his PhD thesis [13]: "generating computer animation from natural language instructions is a complex task that encompasses several key aspects of artificial intelligence including natural language understanding, computer graphics and knowledge representation". Traditionally, this task has been approached with rule-based systems that were highly successful in their respective domains but were hard to generalize to other domains. Previous works [7, 6] have shown that it is possible to obtain an initial INGENIAS model by means of NLP techniques. Nevertheless, it should be noted that the process of generating this INGENIAS model is not fully automatic. This is mainly due to the ambiguity of natural language and to the fact that it is not
Fig. 2 Global meta-model of the INGENIAS methodology. The specification of a MAS is structured in five viewpoints: the definition, control and management of each agent's mental state; the agent interactions; the MAS organization; the environment; and the tasks and goals assigned to each agent.
usual for the client to give a full description of the required application. To manage this problem, the solution adopted was the implementation of a multi-agent system (NLP4INGENIAS [7]) that interacts with the stakeholders of the requested application. Taking this previous work as a starting point, and using a method similar to the ones presented in [2, 12], this paper proposes an iterative and incremental method to obtain a model that can be run and modified in the Alice system [8, 4]. The organization of this paper is as follows: Section 2 briefly presents the INGENIAS methodology and the Alice system. Section 3 summarizes the most relevant related works. Section 4 summarizes the NLP4INGENIAS MAS and presents the translation models. Finally, section 5 presents the conclusions and future work.
2 INGENIAS and Alice in a nutshell

2.1 INGENIAS Methodology

INGENIAS [11] is a recent methodological proposal for the development of applications based on multi-agent systems. The proposal has a good definition of the modeling elements on which it is based, uses an adaptation of the Unified Development Process for its life cycle, and allows consistently obtaining correct and executable code that can be traced back to the customer requirements. INGENIAS is based on the concept of meta-model. A meta-model, according to INGENIAS, defines the primitives as well as the syntactic and semantic properties of a model. Moreover, INGENIAS follows a Model Driven Development (MDD) approach [1], so it is based on the definition of a set of meta-models that describe the elements that form a MAS from several viewpoints (see Fig. 2). Each viewpoint is specified using a different meta-model: the agent meta-model, the interaction meta-model, the organization meta-model, the environment meta-model, and the tasks and goals meta-model.
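To make the five viewpoints concrete, the following minimal Python sketch (our own illustration; the class and field names are invented and are not part of the INGENIAS tooling) shows how a MAS specification could be held in memory as one record per viewpoint:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative containers only; the real INGENIAS meta-models are far richer.

@dataclass
class AgentView:              # mental state and its control/management
    agent: str
    beliefs: List[str] = field(default_factory=list)
    goals: List[str] = field(default_factory=list)

@dataclass
class InteractionView:        # who talks to whom, and why
    initiator: str
    collaborators: List[str]
    purpose: str

@dataclass
class MASSpecification:       # the five INGENIAS viewpoints together
    agents: List[AgentView]
    interactions: List[InteractionView]
    organization: List[str]   # groups and roles
    environment: List[str]    # perceived and affected resources
    tasks_goals: List[str]    # task-to-goal assignments
```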
Fig. 3 The Alice 2.2 system interface. There are five regions in the Alice 2 interface: 1) the scene window, 2) the object tree, 3) the object details area, 4) the animation editing area, and 5) the behaviors area. In Alice, users can save both worlds and individual characters with their animations and behaviors, facilitating both the reuse of animations for particular characters and the sharing of characters with friends.
2.2 The Alice System

The Alice system, provided freely as a public service by Carnegie Mellon University (it can be downloaded from http://www.alice.org), was originally developed as part of a research project in virtual reality. The system lets anyone be the director of a movie, or the creator of a video game, in which 3D objects are presented in an on-screen virtual world and move around according to the directions programmed. Objects in the language refer to any character that can appear in the scene, including the world, the camera and the light, while the names of the default methods and functions are English words whose meaning is close to the action they perform, like "move forward" or "turn right". A prominent point is that "it is not possible to have programming errors"; in other words, a programmer cannot make syntax mistakes. Of course, logical mistakes can still appear, for instance by telling one of the objects in the scene to move forward when the intended action was to move it backward. This is because of the interface: the elements that may be used in a program (commands, programming constructs, 3D objects, object properties, and variables) are tiles that users drag and drop into the animations they are creating. When users drop a command tile requiring parameters, Alice displays menus with valid parameter choices to select from. The system also allows fast feedback, because changes can be tested immediately: one only has to press the play button (see Figure 3). Moreover, the system makes it easy to apply the principles of the agile development manifesto. Once the animation is finished, it can be exported as an embeddable Java applet, ready to be interacted with, or as a movie in QuickTime format.
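The tiles a user drags correspond roughly to method calls on scene objects. The following Python-flavoured sketch (our own transcription with hypothetical object and method names, not Alice's actual export format) illustrates the kind of program that a set of tiles denotes:

```python
# A rough, hypothetical transcription of a tile-built Alice animation.
# Alice itself is driven by drag-and-drop; these calls only mirror the
# "move forward" / "turn right" style of methods described above.

class SceneObject:
    def __init__(self, name):
        self.name = name
    def move(self, direction, meters):
        print(f"{self.name} moves {direction} {meters}m")
    def turn(self, direction, revolutions):
        print(f"{self.name} turns {direction} {revolutions} rev")
    def say(self, text):
        print(f"{self.name}: {text}")

boy = SceneObject("boy")
camera = SceneObject("camera")

# "do in order": statements run sequentially, as in Alice's editor
boy.move("forward", 2)
boy.turn("right", 0.25)
boy.say("Hello!")
camera.move("backward", 1)
```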
Fig. 4 The CONFUCIUS architecture. The dashed part of the figure is the knowledge base, including language knowledge (lexicons and a syntax parser), used in NLP, and visual knowledge, such as 3D models of characters, props, and animations of actions, used for generating the 3D animations. The main novelty of this approach is the TTS engine used by the Merlin narrator.
3 State of the Art of visualizing text descriptions

Over the last two decades, many researchers in the NLP community, jointly with the computer graphics community, have been developing techniques to enable computers to understand human natural language. These techniques can aid artists in creating virtual reality for storytelling, or help researchers understand social and human behavior. NLP systems usually use knowledge bases containing linguistic information to analyze sentences and produce data structures representing their syntactic structure together with the semantic dependencies of that information. In the computer games and animation industry, computer artists create virtual characters, props, and whole scenes of stories. The construction of these scenes is so labor intensive that it consumes much more time than expected, and the result is hardly reusable. Many works in the last decade have tried to build a system that automatically generates virtual reality from stories in natural language. Perhaps they take as inspiration Aristotle's phrase, "The soul never thinks without a mental image", and his basic idea that the representational power of language is derived from imagery, and that spoken words are the symbols of inner images. In this section some of those works are summarized.
3.1 CONFUCIUS

CONFUCIUS [9] is an intelligent storytelling system which converts natural language into 3D animation and audio. CONFUCIUS is implemented using VRML, Java and JavaScript, and it integrates existing tools for language parsing and text-to-speech synthesis. The architecture of CONFUCIUS is given in Figure 4. The input of CONFUCIUS is natural language text, but it deals only with single sentences with non-metaphorical word senses, and it outputs animations of rigid objects and human bodies. Although natural language expresses concepts at different levels of abstraction,
the approach only handles concepts, actions and states at a low abstraction level. Most of the sentences used in the implementation and evaluation are chosen from children's stories like "Alice in Wonderland", because the entities in these stories are usually not abstract but tangible and amenable to visualization. CONFUCIUS' multimodal output includes 3D animation with speech and non-speech auditory icons, and a presentation agent, Merlin the narrator. The work on CONFUCIUS focuses mainly on synthesizing multimodal output in order to generate virtual character animation and speech, with particular emphasis on the generation of virtual humans' movements associated with action verbs. Nevertheless, CONFUCIUS is only able to visualize single sentences containing action verbs with a visual valency of up to three, e.g. "John left the gym" or "Nancy gave John a loaf of bread". It cannot handle verbs that involve 3D morphing (e.g. "melt"), deformable or breakable objects (e.g. "bend", "break"), or verbs without a distinct visualization out of context (e.g. "play"). Likewise, high-level behaviors and routine events (e.g. "interview") are out of its scope.
3.2 Patrick Ye Approach

In [13] the problem is viewed as a black box, in which the final system has two inputs and one output. The inputs are "S", a natural language description of a list of visualizable actions, and "V", an accessible and manipulatable virtual environment in which all the actions in S take place; the output is a computer animation that faithfully and coherently visualizes S within V. The main contributions of this research include a novel method for performing semantic role labeling on prepositional phrases, a novel method for performing verb sense disambiguation, and, most importantly, a domain-independent empirical approach for mapping verb semantics to animation procedures based on training data. With respect to the task of natural language based animation generation, the most important findings of this research are that: (1) semantic role labeling makes a significantly positive contribution; (2) verb sense disambiguation can contribute positively to the task, but its contribution is marginal; and (3) virtual scene features can be used in conjunction with linguistic features to perform planning and reasoning.
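The black-box formulation can be stated as a single function over S and V. The sketch below is only our paraphrase of the inputs, output, and three stages named above; every function is a stub standing in for a learned component, not Ye's actual code:

```python
from typing import Dict, List

# Paraphrase of the black-box task in [13]; all names are illustrative.

def semantic_role_labeling(sentences: List[str]) -> List[Dict]:
    return [{"sentence": s, "roles": {}} for s in sentences]      # stub

def verb_sense_disambiguation(labeled: List[Dict]) -> List[Dict]:
    for item in labeled:
        item["verb_sense"] = None                                 # stub
    return labeled

def plan_animation(semantics: List[Dict], scene: Dict) -> List[str]:
    # maps verb semantics to animation procedures (learned from data in [13])
    return [f"animate({item['sentence']!r})" for item in semantics]

def generate_animation(S: List[str], V: Dict) -> List[str]:
    """S: natural language actions; V: the virtual environment."""
    return plan_animation(
        verb_sense_disambiguation(semantic_role_labeling(S)), V)

print(generate_animation(["John left the gym"], {"objects": ["John", "gym"]}))
```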
3.3 A Multi-modal 3D Approach

The work presented by K. R. Glass in his PhD thesis [5] addresses the conversion of fiction text into multi-modal animated virtual environments as the solution to two problems: first, the analysis of the natural language text to create a structured intermediate representation; and second, the interpretation of this intermediate representation to create a corresponding animated virtual environment. The text analysis begins with the creation of surface annotations, which involves identifying
the structural and syntactic properties of the fiction text. The annotations are built using a pattern-based machine learning approach. The interpretation of the annotated fiction text involves formulating structured scene descriptions, quantifying entity behavior in a virtual environment, and populating the corresponding virtual environments. The work uses knowledge-poor techniques to formulate scene descriptions from the annotations, including: a list of the different scenes to visualize (using the Setting annotation); the entities that populate each scene (using the Avatar and Object annotations and a library of geometric models); and structured descriptions of entity behavior in each scene (by translating the Transition and Relation annotations into time-based constraints).
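The annotation categories just listed can be pictured as a small record per scene. The field names in this sketch follow the annotation names cited from [5]; the structure itself is only our own illustration:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneDescription:
    setting: str                                         # Setting annotation
    avatars: List[str] = field(default_factory=list)     # Avatar annotations
    objects: List[str] = field(default_factory=list)     # Object annotations
    # Transition/Relation annotations become time-based constraints,
    # here rendered as (time, subject, behaviour) triples:
    constraints: List[Tuple[float, str, str]] = field(default_factory=list)

scene = SceneDescription(
    setting="forest clearing",
    avatars=["hunter"],
    objects=["lantern"],
    constraints=[(0.0, "hunter", "walks to lantern")],
)
```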
4 Building Alice programs from scratch

The main motivation of this paper is to present a method for building Alice programs from text descriptions written in a natural language such as Spanish. In [6] a requirements procurement process for the INGENIAS methodology was presented, together with a MAS that obtains an initial INGENIAS model from a customer's text description. The process implemented in that MAS is based on the discovery of agents/roles, tasks and goals, and is very similar to the one proposed in the work referred to in section 3.3, but its result is a model of the given description. [2] presents a 3D Electronic Institutions methodology, which supports human integration into MAS-mediated environments. One of the bases of that proposal is the equivalence established between the meta-model elements needed to specify an electronic institution and those needed to model a 3D virtual world. Taking into account the characteristics of the Alice system (see 2.2) together with those of the INGENIAS meta-model, this section presents a two-phase process that can be used as an alternative to the systems described in section 3.
4.1 NLP4INGENIAS

The process presented in [7, 6] is based on the discovery of the goals, tasks and agents/roles that appear in the requirements of a customer description. This process does not use a controlled language, as is usual in similar approaches, but produces as output a controlled version of the given description. This controlled version is more helpful to engineers in modeling the desired MAS application than the original one. The requested system is modeled as a multi-agent system that arises from the system goals and from an initial organization model presenting the agents/roles involved. Goals are associated with use cases described by means of scenes. In these scenes the identified agents/roles perform individual tasks and/or interactions among them. If we now consider that the application modeled is an animation corresponding to the initial text description,
then what we obtain by applying the NLP4INGENIAS approach is precisely a model of a 3D application. A detailed description of the process can be found in [6].
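Schematically, the discovery step is a function from free text to the agents/roles, tasks and goals the INGENIAS model is built from. The code below is a toy stand-in for the MAS described in [7, 6]; the real system negotiates ambiguities with the stakeholders rather than using the naive keyword pass shown here:

```python
# Toy stand-in for the NLP4INGENIAS discovery step; illustrative only.

def discover_model(description: str) -> dict:
    agents, tasks, goals = set(), set(), set()
    for sentence in description.split("."):
        words = sentence.split()
        if not words:
            continue
        agents.add(words[0])                    # crude: subject as agent
        if len(words) > 1:
            tasks.add(" ".join(words[1:3]))     # crude: verb phrase as task
    goals.add("visualize the described scene")  # goals come from use cases
    return {"agents/roles": agents, "tasks": tasks, "goals": goals}

print(discover_model("Merlino gives the old woman a box."))
```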
4.2 INGENIAS2Alice
Fig. 5 An example in Alice of the proposed solution.
Taking into account that the goal is to build an animation in the Alice system, after establishing an agent model (an INGENIAS model) it is desirable to have a translation from this model to the Alice system. As pointed out in 2.2, Alice is not a traditional programming environment; moreover, the Alice program structure (in version 2.2) consists of a compressed file containing a directory structure that merges binary files with XML files recording the behavior and structure of the animation. In order to perform the conversion, a mapping must be fixed between the INGENIAS model elements and the Alice program elements. To obtain it, Alice programs themselves have been modeled by means of the INGENIAS methodology. Each scene in Alice is modeled by means of an organization model, which contains the groups, agents and roles involved in the scene. The specification of the scene takes into consideration that not every character plays the same role, nor has the same kind of participation. Moreover, the following restrictions have been fixed over the organization diagrams. Each Alice scene must have exactly one organization diagram. The diagram contains at most three predefined groups of agents: Dynamic, Static and Tools. A Dynamic agent (i.e. a main character) is a character that has its own dynamic behavior or is managed dynamically by users. A Static agent (i.e. a secondary character) is a character whose behavior (possibly none) is static and which is partially present throughout the scene. A Tool agent (i.e. a helping character) is a character that may or may not appear in the scene depending on the behavior of the main characters, with a predefined functionality supplementing the knowledge or behavior of the rest. Each agent must have an agent model specifying the roles, tasks and goals of that character in the modeled scene. At least one task and goal diagram has to be specified in order to express the relations between goals, and further diagrams express the tasks that satisfy each goal. The interactions between characters are specified by means of interaction diagrams.
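The three-group restriction on organization diagrams can be pictured with a small data model. The following sketch is our own illustration; the class and field names are invented and belong to neither INGENIAS nor Alice:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Group(Enum):
    DYNAMIC = "dynamic"  # main characters: own behavior or user-driven
    STATIC = "static"    # secondary characters: fixed, scene-long presence
    TOOLS = "tools"      # helping characters: appear on demand

@dataclass
class SceneAgent:
    name: str
    group: Group
    roles: List[str] = field(default_factory=list)
    goals: List[str] = field(default_factory=list)

@dataclass
class SceneOrganization:     # exactly one per Alice scene
    scene: str
    agents: List[SceneAgent]

scene = SceneOrganization("gift scene", [
    SceneAgent("Merlino", Group.DYNAMIC, roles=["giver"], goals=["give box"]),
    SceneAgent("old woman", Group.STATIC, roles=["receiver"]),
    SceneAgent("box", Group.TOOLS),  # prop modeled as a tool agent
])
```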
Finally, use case diagrams are also used to relate the full original description to the INGENIAS specification. Once such an INGENIAS specification has been obtained, a translation process produces an Alice representation. The translation considers each agent as a character (object) in Alice; roles activate or deactivate some functionalities of the characters; goals represent the actions performed in the scene; use cases present a particular view of the scene that can be controlled by a camera; and tasks and agent interactions specify the methods and functions that the characters can perform. This basic approach has been applied to several examples provided by a well-known product, the "Merliño game". This game was built several years ago to help Galician children learn the Galician language and understand the Galician culture. The game is presented as a set of stories in which children interact with the characters in a magic adventure. The tests performed consisted of applying the method to the description of each scene, using several characters pre-designed accordingly, and checking the results against the appearance and behavior of the original application. Figure 5 presents the result obtained for a basic scene with two characters and one object that models the description: "Merliño gives the old woman a box".
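Putting the mapping together, the following toy sketch renders the translation of the example description; all function and method names are ours, and the real translator emits Alice's program files rather than strings:

```python
# Toy rendering of the INGENIAS -> Alice mapping for the example
# "Merlino gives the old woman a box". Names are illustrative only.

MAPPING = {
    "agent": "Alice character/object",
    "role": "enable/disable character functionality",
    "goal": "action performed in the scene",
    "use case": "camera view of the scene",
    "task/interaction": "character method or function",
}

def translate(spec: dict) -> list:
    script = [f"scene.add({a})" for a in spec["agents"]]            # agents
    script += [f"camera.frame({u!r})" for u in spec["use_cases"]]   # use cases
    script += [f"{s}.{m}({o})" for (s, m, o) in spec["interactions"]]
    return script

spec = {
    "agents": ["merlino", "old_woman", "box"],
    "use_cases": ["gift scene"],
    "interactions": [("merlino", "give", "old_woman, box")],
}
print("\n".join(translate(spec)))
```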
5 Conclusions and Future Work

In this paper a new approach to the animation of text descriptions in natural language has been presented. The approach consists of establishing an intermediate agent model (an INGENIAS model). This approach covers the other solutions presented in the literature and solves some of the problems that appear in them. With respect to [9], the approach presented here allows the use of complex sentences and of any kind of verb, because NLP4INGENIAS creates an intermediate model that allows refining a complex action into simpler tasks. The work of [13] is covered by the iterative and incremental process performed by NLP4INGENIAS. Moreover, the use of the intermediate model makes it possible to keep the same visual representation for several equivalent text descriptions, or to change the animation when the semantics of some phrase must be changed, without changing the text description. This is an advantage over the annotation of the text done in [5]. As future work, we plan to develop a new methodology based on the use of agents to build multimedia applications. Work is also in progress on translating into programs that run on the beta Alice 3 system, whose main advantages are that it integrates the Sims avatars and gives the environments a better look.
References

[1] C. Atkinson and T. Kühne. Model-driven development: a metamodeling foundation. IEEE Software, 20(5):36-41, 2003.
[2] Anton Bogdanovych, Marc Esteva, Simeon J. Simoff, Carles Sierra, and Helmut Berger. A methodology for developing multiagent systems as 3D electronic institutions. In Michael Luck and Lin Padgham, editors, AOSE, volume 4951 of Lecture Notes in Computer Science, pages 103-117. Springer, 2007.
[3] Juan M. Corchado, Sara Rodríguez, James Llinas, and José M. Molina, editors. International Symposium on Distributed Computing and Artificial Intelligence, DCAI 2008, University of Salamanca, Spain, 22nd-24th October 2008, volume 50 of Advances in Soft Computing. Springer, 2009.
[4] Wanda P. Dann, Stephen Cooper, and Randy Pausch. Learning to Program with Alice. Prentice Hall Press, Upper Saddle River, NJ, USA, 2nd edition, 2008.
[5] K. R. Glass. Automating the Conversion of Natural Language Fiction to Multi-Modal 3D Animated Virtual Environments. PhD thesis, Rhodes University, 2008.
[6] Juan Carlos González-Moreno and Luis Vázquez-López. Design of multiagent system architecture. In COMPSAC, pages 565-568. IEEE Computer Society, 2008.
[7] Juan Carlos González-Moreno and Luis Vázquez-López. Using techniques based on natural language in the development process of multiagent systems. In Corchado et al. [3], pages 269-273.
[8] Caitlin Kelleher and Randy Pausch. Using storytelling to motivate programming. Communications of the ACM, 50:58-64, July 2007.
[9] Minhua Ma. Automatic Conversion of Natural Language to 3D Animation. PhD thesis, University of Ulster, Faculty of Engineering, 2006.
[10] Masaki Oshita. Generating animation from natural language texts and semantic analysis for motion search and scheduling. The Visual Computer, 26(5):339-352, 2010.
[11] Juan Pavón, Jorge J. Gómez-Sanz, and Rubén Fuentes-Fernández. The INGENIAS Methodology and Tools, article IX, pages 236-276. Idea Group Publishing, 2005.
[12] Tomas Trescak, Marc Esteva, and Inmaculada Rodríguez. A virtual world grammar for automatic generation of virtual worlds. The Visual Computer, 26(6-8):521-531, 2010.
[13] Patrick Ye. Natural Language Understanding in Controlled Virtual Environments. PhD thesis, University of Melbourne, Department of Computer Science and Software Engineering, 2009.