
Multilingual Natural Language Generation for 3D Learning Environments

Charles B. Callaway, Brent H. Daniel, and James C. Lester
Multimedia Laboratory, Department of Computer Science
North Carolina State University, Raleigh, NC 27695-8206
http://www.multimedia.ncsu.edu/

fcbcallaw,bhdaniel,[email protected]

Abstract. It is crucial for knowledge-based learning environments to be able to provide students with dynamic, realtime explanations. Achieving this requires the integration of 3D animated explanations with speech. Unfortunately, planning the integrated creation of 3D animation and the types of linguistic utterances capable of reactively and dynamically explaining them requires a wide range of knowledge about 3D models, pedagogical strategies, student models and linguistic utterances that can describe what is seen onscreen at any moment. Furthermore, the development costs associated with state-of-the-art technologies currently limit the introduction of these learning environments to monolingual systems. We present a multilingual framework for generating multimodal explanations of complex physical phenomena that combines 3D animated technologies with multilingual speech. This framework achieves multilingual generation without requiring any changes to be made to the underlying knowledge base that contains the domain knowledge.

1 Introduction

As multimedia technologies reach ever higher levels of sophistication, knowledge-based learning environments and intelligent training systems can create increasingly effective educational experiences. Moreover, if learning environments could leverage the growing body of work on intelligent multimedia systems in the form of knowledge-based 2D graphics generation (Roth, Mattis, & Mesnard 1991, Mittal et al. 1995), automated static 3D graphics production (Feiner 1985, Feiner & McKeown 1993, Wahlster et al. 1993), or even 3D animation generation (Karp & Feiner 1993, Butz & Kruger 1996, Christianson et al. 1996, Bares & Lester 1997), they could flexibly generate multimedia explanations that clearly communicate complex concepts. If work on multilingual natural language generation systems (Paris et al. 1995, Vander Linden & Scott 1995, Aguado et al. 1998, Bateman & Sharoff 1998, Beale et al. 1998) could be integrated into multimedia explanation systems, the overall cost of deploying intelligent multilingual multimedia learning environments would be substantially reduced. Because text-only explanations are notoriously inadequate for expressing complex physical relationships, realtime multimedia explanation generation could contribute significantly to a broad range of learning environments and training systems.

Unfortunately, planning the integrated creation of 3D animation and multilingual linguistic utterances in realtime requires coordinating the visual presentation of 3D objects while generating appropriate descriptive phrases that accurately reflect the complex objects and events involved. Although a number of projects have studied the automated coordination of natural language and 2D graphics (Feiner & McKeown 1993), previous work on knowledge-based 3D animation either avoids accompanying narration altogether (Karp & Feiner 1993, Butz & Kruger 1996, Christianson et al. 1996), employs canned audio clips in conjunction with generated 3D graphics (Bares & Lester 1997), or focuses on either basic coordination issues (Wahlster et al. 1993) or on the challenges of incorporating animated characters (Andre & Rist 1996). Furthermore, none of these projects consider the complex issues involved in multilingual output, and work that focuses on multilingual generation does not involve utterances tailored to either the pedagogical aspects of knowledge-based learning environments or the demanding exactness of systems for complex animated physical environments (Paris et al. 1995, Vander Linden & Scott 1995, Aguado et al. 1998, Bateman & Sharoff 1998, Beale et al. 1998).

To address this problem, we propose a multilingual explanation planning framework for generating multimedia explanations in English and Spanish that combine 3D animation and speech which complement one another. Because 3D animation planners require spatial knowledge in a geometric form that does not impact upon language choice, we have been able to develop this framework in such a way that only the natural language generator and speech synthesis system need to be changed in order to implement different language versions. This framework has been implemented in a multilingual version of CineSpeak (Towns et al. 1998), a multilingual multimedia explanation generator consisting of a media-independent explanation planner, a visuo-linguistic mediator, a 3D animation planner, and a realtime multilingual natural language generator with a speech synthesizer. CineSpeak has been used in conjunction with PhysViz (Fig. 1) (Towns et al. 1998), a prototype 3D learning environment in the domain of physics, to generate realtime multimedia explanations of three-dimensional electro-magnetic fields, forces, and electrical current.

Support for this work was provided by grants from the NSF (IRI-9701503, REC9973157), the IntelliMedia Initiative of NC State University, the William R. Kenan Institute for Engineering, Technology and Science, and a gift from Novell, Inc.

Figure 1. PhysViz explaining the right-hand rule.

2 Multilingual Natural Language Generation

The primary functions of a natural language generator in a learning environment are to provide the student with new information and to provide feedback on their progress. When generating a response to a question, the learning environment must decide which portion of its domain knowledge is relevant to the answer and decide how to express it in natural language, which can then be sent to the speech synthesizer. Many NLG researchers have been working on the problem of answering questions and generating explanations in non-pedagogical contexts. These classic pipelined NLG architectures (the unshaded boxes in Fig. 2) consist of three major processes:

- Discourse Planning: When posed a question, e.g., (explain battery), the discourse planner (e.g., Suthers 1991, Cawsey 1992, Hovy 1993, Moore 1995, Lester & Porter 1997) performs three important functions: it selects the portions of the domain knowledge base that pertain to the correct answer, decomposes them into sentence-sized chunks, and orders the knowledge into a pedagogically motivated sequence that aids the student in acquiring the information.
- Sentence Planning: Once the information has been selected and ordered, a sentence planner (e.g., Callaway & Lester 1995) determines which elements of each sentence have which semantic roles and have focus, retrieves lexicalized domain elements from the lexicon, and composes complex noun phrases from individual lexicon elements.
- Surface Realization: Each semantic chunk from the sentence planner is unified with a grammar (Elhadad 1992) of a particular language to determine a surface syntactic form and order, and a morphology component then adds the appropriate prefix and suffix endings to root lexical words.
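The three stages above can be sketched as a chain of functions. This is only an illustrative toy, not the CineSpeak implementation: the knowledge-base facts, the lexicon entries, and the names `discourse_plan`, `sentence_plan`, `realize`, and `generate` are all assumptions introduced here, and the "grammar" is reduced to a single fixed English clause pattern.

```python
from dataclasses import dataclass

# Toy knowledge base keyed by topic. The facts are illustrative assumptions,
# not content taken from the PhysViz knowledge base.
KB = {
    "battery": [
        ("battery", "produces", "electric-current"),
        ("electric-current", "flows-through", "wire"),
    ],
}

# Hypothetical English lexicon: concept identifier -> root word(s).
LEXICON = {
    "battery": "battery",
    "electric-current": "electric current",
    "wire": "wire",
    "produces": "produce",
    "flows-through": "flow through",
}


@dataclass
class SentenceSpec:
    agent: str
    verb: str
    patient: str


def discourse_plan(goal):
    """Select the facts relevant to the goal, as ordered sentence-sized chunks."""
    _, topic = goal
    return list(KB[topic])


def sentence_plan(fact):
    """Assign semantic roles and retrieve lexical items for one chunk."""
    subj, rel, obj = fact
    return SentenceSpec(agent=LEXICON[subj], verb=LEXICON[rel], patient=LEXICON[obj])


def realize(spec):
    """Order the constituents and add third-person singular morphology."""
    head, *rest = spec.verb.split()
    verb = " ".join([head + "s", *rest])
    return f"The {spec.agent} {verb} the {spec.patient}."


def generate(goal):
    return [realize(sentence_plan(fact)) for fact in discourse_plan(goal)]


for sentence in generate(("explain", "battery")):
    print(sentence)
```

Because each stage only consumes the output of the previous one, a language-specific stage can be swapped without touching the stages before it, which is the property the rest of this section relies on.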

2.1 Knowledge Representation and NLG

In the domain of physics, a sample query from a student might be, "How much force does the magnetic field exert on the wire?" After the domain problem solver determines the correct answer, the discourse planner extracts the related domain concepts such as wire, electric-current, and magnetic-field from the knowledge base and creates a discourse plan with the number of sentence specifications necessary to answer the query. These are then ordered and the semantic roles of the domain concepts determined. The grammar then ensures that grammatically correct sentences will be generated and creates the final surface string (e.g., "The magnetic field exerts 10 newtons of force on the wire.") and sends it to the speech synthesizer.

Figure 2. Architecture of the NLG component. (Diagram boxes: Communicative Goal; Discourse Planner; Sentence Planner; Spanish Lexicon, English Lexicon, Other Lexica; Spanish Grammar, English Grammar, Other Grammars; Surface Generator; Multilingual Text.)

The first requirement when answering a student query in this manner is to have the necessary domain knowledge on hand. To answer the above question, the knowledge base must contain information representing the relevant propositions, e.g., "wires are metal, and thus are subject to magnetic force," and "magnetic force is equal to mass multiplied by acceleration." While some systems add linguistic concepts to the domain ontology, our goal has been to leave the knowledge base unmodified (because the physics knowledge is independent of the language used to communicate it) and also to leave the discourse planner unmodified (because level-of-detail and explanation strategies are constant across languages). Furthermore, the knowledge base itself should be completely independent of linguistic information in order to allow multiple NLG systems to use the knowledge it contains (i.e., tying linguistic knowledge into a domain knowledge base forces others to accept a particular ontology in order to use that NLG approach).
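One way to picture this separation is a read-only store of language-neutral propositions alongside per-language lexica that are consulted only at lexicalization time. The sketch below is a hypothetical illustration: the proposition identifier `exert-1`, the role names, the Spanish word choices, and the function `lexicalize` are assumptions made here, not structures from the paper's knowledge base.

```python
from types import MappingProxyType

# Language-neutral domain knowledge: concept identifiers and values only.
# MappingProxyType makes the top-level store read-only, mirroring the goal
# of leaving the knowledge base unmodified.
KNOWLEDGE_BASE = MappingProxyType({
    "exert-1": {
        "agent": "magnetic-field",
        "quantity": 10,
        "unit": "newton",
        "theme": "force",
        "target": "wire",
    },
})

# Per-language lexica (assumed entries). Supporting another language means
# adding another dictionary here; KNOWLEDGE_BASE is never edited.
LEXICA = {
    "en": {"magnetic-field": "magnetic field", "newton": "newtons",
           "force": "force", "wire": "wire"},
    "es": {"magnetic-field": "campo magnético", "newton": "newtons",
           "force": "fuerza", "wire": "cable"},
}


def lexicalize(proposition_id, lang):
    """Return a copy of the proposition with concept ids replaced by words."""
    lexicon = LEXICA[lang]
    proposition = KNOWLEDGE_BASE[proposition_id]
    return {
        role: lexicon.get(value, value) if isinstance(value, str) else value
        for role, value in proposition.items()
    }


print(lexicalize("exert-1", "en"))
print(lexicalize("exert-1", "es"))
```

The same proposition serves both languages, and any other NLG system could read `KNOWLEDGE_BASE` without adopting these particular lexica.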

2.2 Effects of Multilinguality on NLG Architecture

In keeping with the goal of language-independent discourse planning, the sentence planner, which is the next element in the NLG pipeline, is the first component to be informed of the target language. The sentence planner contains a clause-level component that determines which domain-semantic concepts should occupy which case role-semantic elements of a sentence. For example, it might map wire to medium and magnetic-force to agent. Furthermore, it may decide that wire should be assigned the sentence focus (perhaps because wire was mentioned in the previous sentence). Additionally, the noun phrase generator (or referring expression generator) must be modified to take into account changes such as extracting the language-appropriate lexicon entry and adding default gender for all nouns.

To enable a single sentence planner to generate texts in multiple languages, it is necessary to design the input of each grammar to be as semantically similar as possible, while retaining differing lexical items at the leaf levels of instantiated sentential trees. Because the Surge grammar (Elhadad 1991) is already well-defined and is employed by a relatively large number of researchers, we decided to construct a Spanish grammar to accommodate it as closely as possible. As a side effect, we save substantial time during the grammar creation process by reusing a large number of the English grammar rules contained in Surge. This is in accord with results obtained by others in creating Spanish grammars from pre-existing English grammars (Aguado et al. 1998). Modifying Surge to create a systemic-functional Spanish grammar entails addressing the following issues:

- Linear Precedence: Clausal and phrasal constituents must be reordered at all levels. Most adjectives and classifying nouns follow rather than precede the base noun in a noun phrase, e.g., "the civil war" vs. "la guerra civil". Also, indirect object pronouns precede rather than follow the main verb, e.g., "I gave him the ball," vs. "Yo le di la pelota."
- Default Features: Spanish has a number of default lexical features not present in English, such as nominal gender, pronoun formality, and determiner gender, e.g., "this" vs. "este" or "esta".
- Pronouns: Spanish pronouns perform a number of functions, have different syntactic cases, and undergo transformations not found in English. For example, pronouns in Spanish can be used with passivization, have objective case when governed by prepositions, and can undergo surface transformations under certain circumstances, such as "le" becoming "se".
- Verbal Changes: Spanish has different methods of expressing verbal relations, e.g., Spanish infinitives are preferred over gerunds, as in "Al entrar..." for "Upon entering...".

Despite these changes, fully 90% of the Spanish grammar that was created in this manner was completely unchanged from the Surge grammar for English.¹ Although relatively few changes were needed for the sentence planner and grammar, the lexicon required many more changes. Because lexical elements are unpredictable and have fundamentally different information (although similar ways of expressing it, e.g., gender as a syntactic feature is common to most languages), they must contain full-scale entries of lexical information for each language that can be generated. In our implementation, we thus have an English and Spanish version of every lexical item needed in the generation process.

¹ Our Spanish grammar implementation, which required approximately one month to construct, currently has over 60% of the syntactic coverage compared with Surge, as well as a number of examples not found in English, such as gender-inflected demonstrative pronouns.
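The linear-precedence and default-feature issues listed above can be illustrated with a small noun-phrase builder that takes the same semantic input for both languages and differs only in per-language tables. This is a minimal sketch under stated assumptions: the tables, the feature name `gender`, and the function `noun_phrase` are hypothetical, and it does not reflect how Surge or the Spanish grammar actually encodes these rules. Adjective agreement is ignored because "civil" has the same form for both genders.

```python
# Hypothetical lexicon entries. Gender is a default feature that only the
# Spanish entries need to carry.
LEXICON = {
    "en": {"war": {"word": "war"}, "civil": {"word": "civil"}},
    "es": {"war": {"word": "guerra", "gender": "f"}, "civil": {"word": "civil"}},
}

# Definite articles indexed by the noun's gender feature (None for English).
DEFINITE_ARTICLE = {
    "en": {None: "the"},
    "es": {"m": "el", "f": "la"},
}

# Linear precedence: does a (typical) adjective follow the head noun?
ADJECTIVE_FOLLOWS_NOUN = {"en": False, "es": True}


def noun_phrase(noun, adjective, lang):
    """Build a definite noun phrase from language-neutral concept names."""
    noun_entry = LEXICON[lang][noun]
    adjective_word = LEXICON[lang][adjective]["word"]
    article = DEFINITE_ARTICLE[lang][noun_entry.get("gender")]
    if ADJECTIVE_FOLLOWS_NOUN[lang]:
        core = [noun_entry["word"], adjective_word]
    else:
        core = [adjective_word, noun_entry["word"]]
    return " ".join([article, *core])


print(noun_phrase("war", "civil", "en"))  # the civil war
print(noun_phrase("war", "civil", "es"))  # la guerra civil
```

Keeping the input identical across languages and pushing the differences into the lexicon and ordering tables follows the same idea as keeping the grammar inputs semantically similar while varying only the leaf-level lexical items.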

Finally, morphological changes also require extensive work. The types of morphological changes we made fall into four major categories:

- Verbs: Spanish has a large number of irregular verb stems, endings that differ with tense, mood, and time, irregular present and past participles, stem-changing verbs, etc.
- Pronouns: Spanish pronouns must be inflected for person, number, at least seven different cases, gender, formality, possessiveness, etc. (e.g., "yo", "me", "mí", "mi", "mis").
- Nouns: Spanish nouns have number and gender, and use different pluralization rules than their English counterparts.
- Contractions and Enclitization: Spanish articles can contract with their prepositions in some cases, and there are special rules for joining prepositions to their governing verbs (e.g., prepositions governed by imperative and progressive verbs as in "