Towards a Text Representation for Text Expanding

Yllias Chali¹, Elsa Pascual¹ and Jacques Virbel²

Abstract. Text expanding makes it possible to relate a story either through its kernel version or through increasingly detailed versions, up to the maximal one, which contains all the available information. It corresponds to text versioning through the progressive introduction of details about the material of the text. This process is based on a text representation that includes a sentence representation with its characterizations (following Harris's grammar) and a question/answer structure reflecting the semantic links between sentences. This representation integrates both linguistic resources and semantic knowledge, and it is dedicated to texts that narrate the events and states of a story.

1 INTRODUCTION

Text expanding makes it possible to relate a story in all its details, yielding a text with the complex and large structures studied in [17, 18, 19]; it can also tell the story in its minimal version, which we call the kernel version of the text. In other words, a story can be told in a text at variable depth and with variable detail. The organization of the introduction of details into the kernel version constitutes text versioning. These text versions of a story depend on their contents. The aim of text expanding through the introduction of details is to answer the how, when, where and why questions supported by the utterances of the text, so that the addressee comes to know the events and states narrated by the story. This explanation proceeds by text versioning, step by step, enlarging the text from the kernel version until it reaches the largest version, which is assumed to include all the details. Since natural language generation is a process that goes from a representation of a situation to a textual expression, in a natural language, of some relevant portion of that situation, most work in natural language generation has addressed the problem of expressibility, trying to map representations back to natural-language expressions [14, 9]. A situation can be the representation of the world in a database, the state of a program, or a person's mental state, and some relevant portion of the situation contributes to the message to be expressed. Hence, to ensure that the message corresponding to the relevant portion of the situation is realized in natural language, this representation must be provided at the same time with:

• linguistic knowledge, and
• semantic knowledge,

enabling it to produce utterances that respect both the linguistic resources and the communicative goals.

¹ Institut de Recherche en Informatique de Toulouse, Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse Cedex, France.
² Institut de Recherche en Informatique de Toulouse, Centre National de la Recherche Scientifique, 118 route de Narbonne, 31062 Toulouse Cedex, France.

© 1996 Y. Chali, E. Pascual and J. Virbel. Proceedings of the ECAI 96 Workshop Gaps and Bridges: New Directions in Planning and Natural Language Generation. Edited by K. Jokinen, M. Maybury, M. Zock and I. Zukerman.

Below we give a formal representation of the text structures that can support the expansion operation of text versioning. This representation is based on linguistic knowledge and on the logico-semantic links holding between the minimal units of text. The minimal units of text are the minimal units of meaning with respect to a given language³, following Gross's analysis [4]. Our application domain for text expanding is narrative texts relating a story through its events and states. This is only one of the kinds of text concerned by the process of explanation in Maybury's taxonomy [13]. These texts are characterized by a sequence of three episodes: initial state, event, and final state; generally, the first and last episodes contrast with each other. The event episode is itself broken down into three sub-episodes: anecdote, peripeteia, and resolution. This application domain is connected with experimental protocols in neuropsycholinguistics for studying the relations between text structure and text comprehension⁴.

2 CONTEXT OF THE PROJECT

This project has linguistic and logical aspects, which are detailed in the following sections. It also has a psycholinguistic aspect, through a collaboration with researchers of the Neuropsycholinguistic Laboratory of the University of Toulouse I (Prof. J-L. Nespoulous). This context sets some limits and determines several methodological aspects, which we summarize here. The intended aims are twofold:

• to assess and characterize the understanding and memorizing performance of different kinds of subjects (normal and pathological) through their answers to questions about a text. The decomposition of textual sentences into elementary sentences gives access to the queriability of the text for a given class of questions (the chosen class being presumed relevant by the psycholinguist for the understanding performance under study);
• to evaluate and characterize the way in which subjects can expand the basic text by iteratively requesting the information they wish to acquire. It is this situation that underlies the presentation given below. Instead of being used to ask subjects questions that evaluate their understanding of the text, the same set of elementary sentences is used as a support that allows a subject to indicate whether he or she wishes to acquire new information relative to the current version of the text during the expansion process (the request being incorporated into the current version of the text).

This experimental situation allows us to specify the following points:

• It is probably possible to determine one or several basic texts from the formal properties of the question/answer structure and of the set of decomposed sentences. But such a basic version can also be determined experimentally by normal subjects, and it is this version that serves as the reference.
• The level of decomposition into elementary sentences can itself vary. For instance [6, 7], noun determination (a/the) or verb tense can be treated as primitives, or be decomposed further so as to fix entity references and temporal intervals. To keep the handled sentences as natural as possible, we have chosen the first alternative in this study.
• The types of questions asked about a text form only a restricted subset of all possible questions, namely those that bring out, on the one hand, the function of significant versus insignificant descriptive details within the story and, on the other hand, narrative causality.
• In our study, the expansion strategy for the basic text is not defined a priori: it results from the expansion process managed by the subject. Planning can serve here as a means of characterizing this process. It thus plays an unusual role, since it does not guide the generation process but characterizes the subject's actions. The amount of preliminary work still to be done does not yet allow us to study the concrete role of such planning.

³ Our application is to texts expressed in French.
⁴ PRESCOT project (Pole of Research in Cognitive Science of Toulouse).

3 BASIS OF TEXT REPRESENTATION

With the aim of formalizing the text expanding process, we developed a representation of text based on three theoretical principles:

• The text is represented by its minimal units of meaning [4], from which meaning can be composed in a calculable way. The set of minimal units available for expanding the kernel version into other texts is obtained by decomposing the largest version, which is the data of our study. This decomposition into a set of elementary sentences and a set of associated characterizations follows Harris's approach in Mathematical Structures of Language [6]. The minimal units are elementary sentences; all other sentences are obtained from this set of elementary sentences by means of transformations.
• The kernel version of the text is defined as the version that cannot be reduced, following [15]. It corresponds to the synopsis of the original text, in line with Mann and Thompson's prediction in their rhetorical analysis of text [12]. They argue that if units that only function as satellites and never as nuclei are deleted from a text, the resulting text is still coherent and its message resembles that of the original. However, if any unit that functions as a nucleus is deleted, the result should be incoherent and the central message of the text difficult or impossible to comprehend; and if a particular nucleus is removed, the significance of the material discussed in the text will no longer be noticeable.
• The representation of the semantic dimension of the studied class of texts is enriched by associating a question/answer structure with the elements of the previous set; this structure captures all the links between the minimal units of meaning according to the story. It is related to the logic of questions [1, 3, 2, 8, 11, 5]. Each elementary sentence is linked to other elementary sentences through this structure, which is elaborated on the basis of the set-of-answers reduction methodology [5]: a question supported by an elementary sentence is identified with its set of answers, which are themselves elementary sentences.

4 TEXT REPRESENTATION

4.1 Textual base

Following Harris's approach in [6], the maximal version of the text is decomposed into a set of elementary sentences that constitutes the textual base [20]. For example, text (1) is decomposed into the set (2) of elementary sentences.

(1) An old man screamed for help
(2) a. A man was old
    b. The man screamed for help

With each elementary sentence we associate a syntactic schema, which allows complex sentences to be built by transformations of the elementary sentences of the textual base. For example, the following syntactic schemas are associated with the elementary sentences (2a) and (2b), respectively:

(3) a. N0 be Adjective
    b. N0 V Prep N1

Text (1) is obtained by applying the and-conjunction operator, whose arguments are instantiated by sentences (2a) and (2b), then the reduction-to-zero operator, and finally the permutation operator to the result of the previous operation. This representation, based on a set of elementary sentences, their characterizations, and their associated transformational operators, describes the internal relations of the sentences and allows complex sentences to be composed. We need to extend this representation to handle relations between text units and to allow them to be composed into a text.
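The derivation of text (1) from the elementary sentences (2) can be mimicked with a toy model in which each operator is a function on a small sentence structure. This is a minimal sketch under our own assumptions: the classes `NP`, `Copular`, `VerbPrep` and `Conj`, the rendering rules, and the intermediate form "A man, old, screamed for help" are hypothetical illustrations, not the authors' formalism; genuine Harrisian transformations operate on far richer structures.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


# Hypothetical toy structures; not the authors' representation.
@dataclass(frozen=True)
class NP:
    det: str                    # "a" or "the"
    noun: str
    pre: Tuple[str, ...] = ()   # adjectives placed before the noun
    post: Tuple[str, ...] = ()  # adjectives left after the noun by zeroing

    def render(self) -> str:
        head = " ".join(self.pre + (self.noun,))
        det = "an" if self.det == "a" and head[0] in "aeiou" else self.det
        text = f"{det} {head}"
        if self.post:
            text += ", " + ", ".join(self.post) + ","
        return text


@dataclass(frozen=True)
class Copular:  # schema (3a): N0 be Adjective
    n0: NP
    adj: str

    def render(self) -> str:
        return f"{self.n0.render()} was {self.adj}"


@dataclass(frozen=True)
class VerbPrep:  # schema (3b): N0 V Prep N1
    n0: NP
    verb: str
    prep: str
    n1: str

    def render(self) -> str:
        return f"{self.n0.render()} {self.verb} {self.prep} {self.n1}"


@dataclass(frozen=True)
class Conj:  # output of the and-conjunction operator
    left: Copular
    right: VerbPrep

    def render(self) -> str:
        return f"{self.left.render()} and {self.right.render()}"


def and_conjunction(left: Copular, right: VerbPrep) -> Conj:
    return Conj(left, right)


def reduce_to_zero(c: Conj) -> VerbPrep:
    """Zero the repeated subject and the copula; the adjective stays after the noun."""
    if c.left.n0.noun != c.right.n0.noun:
        raise ValueError("reduction to zero needs a shared subject")
    subject = replace(c.left.n0, post=c.left.n0.post + (c.left.adj,))
    return replace(c.right, n0=subject)


def permute(s: VerbPrep) -> VerbPrep:
    """Move postposed adjectives in front of the noun."""
    n0 = replace(s.n0, pre=s.n0.pre + s.n0.post, post=())
    return replace(s, n0=n0)


def sentence(s: Union[Copular, VerbPrep, Conj]) -> str:
    text = s.render()
    return text[0].upper() + text[1:]


a = Copular(NP("a", "man"), "old")                         # (2a)
b = VerbPrep(NP("the", "man"), "screamed", "for", "help")  # (2b)
steps = [and_conjunction(a, b)]
steps.append(reduce_to_zero(steps[-1]))
steps.append(permute(steps[-1]))
for step in steps:
    print(sentence(step))
```

Printed in order, the three steps are "A man was old and the man screamed for help", "A man, old, screamed for help", and finally text (1), "An old man screamed for help". Keeping each operator a pure function on immutable structures makes every intermediate stage of a derivation inspectable.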

4.2 Question/Answer structure

In transformational theory, the processing of meaning is linked to the notion of acceptability. It is a process for defining the set of elementary sentences (i.e. sentences with one main verb, a subject, and various objects). This set is considered the set of minimal units of meaning [4, 7], and it is used, by means of transformations, to generate complex sentences. We therefore need to extend this representation to improve the semantic component of the representation of texts relating a story. To achieve this, we link each elementary sentence to the set of answers to the questions supported by that elementary sentence, drawing on the theory of the logic of questions [1, 3, 2, 8, 11, 5]. The principle of this theory is that the meaning of a sentence is constituted by the different answers to the questions supported by that sentence. This principle is stated by Hamblin's postulates [5]:

1. An answer to a question is a statement.
2. Knowing what counts as an answer is equivalent to knowing the question.
3. The possible answers to a question are an exhaustive set of mutually exclusive possibilities.
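Hamblin's postulates can be made concrete with a possible-worlds reading, which is our own assumption and not part of the paper: a statement is modelled as the set of worlds in which it holds (postulate 1), a question is identified with its set of possible answers (postulate 2), and postulate 3 then amounts to requiring that these answer sets partition the space of worlds. The function name `is_hamblin_question` and the day-based worlds below are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List

World = str
Statement = FrozenSet[World]     # postulate 1: an answer is a statement
Question = FrozenSet[Statement]  # postulate 2: a question is its set of answers


def is_hamblin_question(answers: Question, worlds: FrozenSet[World]) -> bool:
    """Postulate 3: the possible answers are exhaustive and mutually exclusive."""
    cells: List[Statement] = list(answers)
    non_empty = all(cells)
    exhaustive = frozenset().union(*cells) == worlds
    exclusive = all(not (p & q) for p, q in combinations(cells, 2))
    return non_empty and exhaustive and exclusive


# Hypothetical worlds: the day on which the man found himself pinned on the roof.
worlds = frozenset({"monday", "tuesday", "wednesday"})
when_pinned = frozenset({frozenset({d}) for d in worlds})
print(is_hamblin_question(when_pinned, worlds))   # True
overlapping = frozenset({frozenset({"monday", "tuesday"}),
                         frozenset({"tuesday", "wednesday"})})
print(is_hamblin_question(overlapping, worlds))   # False
```

The set-of-answers reduction used in this paper links sentences to answer sentences; the partition check above only illustrates postulate 3 in isolation.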


Following this theoretical principle, and according to the set-of-answers reduction methodology, we supply the set of basic elementary sentences with question/answer relations, which allow the meaning of the text to be composed. These relations link an elementary sentence that supports questions to its set of answers. Four types of relations are relevant in this context [20]:

• manner, answering the how? question;
• place, answering the where? question;
• temporal, answering the when? question;
• implicational, answering the why? question, which is subdivided into:
  – implicational-motive, whose answer is introduced by because;
  – implicational-purpose, whose answer is introduced by in the purpose of, in order to, etc.

For example, Table 1 shows the question/answer relations holding between the elements of the set of elementary sentences (5), which corresponds to the decomposition of text (4).

(4) One Tuesday, while the man had just destroyed a wasps' nest, he found himself pinned on the 3-meter-high roof. Because he aimed to go down very quickly, he knocked down the ladder that he had precariously balanced against the wall. Since he started to scream for help, a courageous child, who was playing quietly in the street, raised his head, understood the situation, and put back the ladder, which was on the ground near a rosebush.

(5) a. The wasps had made a nest
    b. The man had just destroyed the nest
    c. One Tuesday, the man found himself pinned on the roof
    d. The roof was 3 meters high
    e. The man aimed to go down very quickly
    f. The man knocked down the ladder
    g. The man had precariously balanced the ladder against the wall
    h. The man started to scream for help
    i. A child was courageous
    j. The child was playing quietly in the street
    k. The child raised his head
    l. The child understood the situation
    m. The child put back the ladder
    n. The ladder was on the ground
    o. The ladder was near a rosebush
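A question/answer structure of this kind can be stored as a map from (supporting sentence, relation) pairs to sets of answer sentences, so that each question is represented by its answer set. The sketch below is a minimal illustration under our own assumptions: the `Relation` and `QAStructure` names are hypothetical, and the three links are our reading of the connectives in text (4) ("while", "because", "since"), not a transcription of Table 1.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, FrozenSet, Set, Tuple


class Relation(Enum):
    # The four relation types, with the implicational one split in two.
    MANNER = "how"
    PLACE = "where"
    TEMPORAL = "when"
    MOTIVE = "why (because)"
    PURPOSE = "why (in order to)"


class QAStructure:
    """Links a question-supporting elementary sentence to its set of answers."""

    def __init__(self) -> None:
        self._links: Dict[Tuple[str, Relation], Set[str]] = defaultdict(set)

    def link(self, support: str, relation: Relation, answer: str) -> None:
        self._links[(support, relation)].add(answer)

    def answers(self, support: str, relation: Relation) -> FrozenSet[str]:
        # The question is identified with this answer set (empty if none is known).
        return frozenset(self._links.get((support, relation), set()))


# A few elementary sentences from (5), keyed by their labels.
SENTENCES = {
    "b": "The man had just destroyed the nest",
    "c": "One Tuesday, the man found himself pinned on the roof",
    "e": "The man aimed to go down very quickly",
    "f": "The man knocked down the ladder",
    "h": "The man started to scream for help",
    "k": "The child raised his head",
}

qa = QAStructure()
qa.link("c", Relation.TEMPORAL, "b")  # "while the man had just destroyed ..."
qa.link("f", Relation.MOTIVE, "e")    # "Because he aimed to go down ..."
qa.link("k", Relation.MOTIVE, "h")    # "Since he started to scream ..."

for label in sorted(qa.answers("f", Relation.MOTIVE)):
    print(f"why? ({SENTENCES['f']}) -> {SENTENCES[label]}")
```

Keying on the pair (sentence, relation) keeps the how, where, when and why questions supported by one sentence apart, while a missing key simply yields an empty answer set.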

5 NEW DIRECTIONS AND FUTURE WORK

The computational model of text expanding decomposes into representations and the algorithms through which the text versioning process operates. In this paper we have described only the representation that can support this process; our aim is to bring the two together. Starting from the kernel version of a text, text expanding proceeds by the incremental introduction of details that enlarge the text version. The whole set of details attached to one event or state, which are elementary sentences with their associated characterizations, can be partitioned into ordered subsets, thus defining levels of detail. This order makes it possible to distinguish between more and less important details within the scope of the events and states narrated by the story. In our case, it must be computed with the help of the question/answer relations holding between elementary sentences. How to insert details into a text version so as to expand it while still obtaining a coherent text is another problem that must be considered. For this, we refer to an approach to text organization like the one Hovy and Paris use for planning coherent multisentential paragraphs [10, 16] with the help of rhetorical relations [12].
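How the levels of detail are computed from the question/answer relations is left open here. One hypothetical rule, given only as a sketch, is a breadth-first layering: level 0 is the kernel version, and level k+1 collects the answers to questions supported by level-k sentences that have not been placed yet. The function names `detail_levels` and `version` and the toy labels are assumptions; the coherent insertion of the selected details into the text, which we delegate to rhetorical planning, is not addressed.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def detail_levels(kernel: Iterable[str],
                  answers_of: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Hypothetical layering of details by distance from the kernel version."""
    current = frozenset(kernel)
    placed: Set[str] = set(current)
    levels: List[FrozenSet[str]] = []
    while current:
        levels.append(current)
        # Answers to the questions supported by the current level, not yet placed.
        nxt = {a for s in current for a in answers_of.get(s, set())} - placed
        placed |= nxt
        current = frozenset(nxt)
    return levels


def version(levels: List[FrozenSet[str]], depth: int) -> FrozenSet[str]:
    """Sentences of the text version obtained after `depth` expansion steps."""
    return frozenset().union(*levels[: depth + 1])


# Toy question/answer graph (hypothetical labels): support -> answers.
answers_of = {"k1": {"d1", "d2"}, "d1": {"d3"}, "d2": {"d3", "k1"}}
levels = detail_levels({"k1"}, answers_of)
for depth in range(len(levels)):
    print(depth, sorted(version(levels, depth)))
```

On this toy graph the layering yields three levels, {k1}, {d1, d2} and {d3}; the link from d2 back to k1 is ignored because k1 is already placed, so each sentence enters the text at the earliest level that reaches it.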

REFERENCES

[1] Åqvist, L. (1965). A New Approach to the Logical Theory of Interrogatives. Almqvist & Wiksell, Uppsala.
[2] Belnap, N., Steel, T. (1976). The Logic of Questions and Answers. Yale, New Haven.
[3] Cresswell, M. (1965). On the Logic of Incomplete Answers. Journal of Symbolic Logic, 30:65-68.
[4] Gross, M. (1990). Sur la notion harisienne de transformation et son application au français. Revue Langages, Les grammaires de Harris et leurs applications, by Daladier, A., Gross, M., Harris, Z. A., Lentin, A., Ryckman, T., 99:39-56. Larousse, Paris.
[5] Harrah, D. (1984). The Logic of Questions. In Gabbay, D. and Guenthner, F., editors, Handbook of Philosophical Logic, volume 2, pages 715-764. D. Reidel Publishing Company.
[6] Harris, Z. S. (1971). Structures Mathématiques du Langage. Dunod, Paris.
[7] Harris, Z. S. (1990). La genèse de l'analyse des transformations et de la métalangue. Revue Langages, Les grammaires de Harris et leurs applications, by Daladier, A., Gross, M., Harris, Z. A., Lentin, A., Ryckman, T., 99:9-20. Larousse, Paris.
[8] Hintikka, J. (1976). The Semantics of Questions and the Questions of Semantics: Case Studies in the Interrelations of Logic, Semantics and Syntax. North-Holland, Amsterdam.
[9] Horacek, H., Zock, M. (1993). New Concepts in Natural Language Generation: Planning, Realization and Systems. Communication in Artificial Intelligence Series, Pinter Publishers.
[10] Hovy, E. D. (1988). Planning Coherent Multisentential Paragraphs. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics, Buffalo, New York, pages 163-169.
[11] Lehnert, W. (1978). The Process of Question Answering. Wiley, New York.
[12] Mann, W. C., Thompson, S. A. (1987). Rhetorical Structure Theory: a Theory of Text Organization. Technical Report ISI/RS-87-190, Information Sciences Institute.
[13] Maybury, M. T. (1992). Communicative Acts for Explanation Generation. International Journal of Man-Machine Studies, 37:135-172.
[14] Meteer, M. W. (1992). Expressibility and the Problem of Efficient Text Planning. Communication in Artificial Intelligence, Pinter Publishers Ltd.
[15] Nespoulous, J. L., Virbel, J. (1991). Compréhension/mémorisation de textes de différentes structures par sujets normaux et pathologiques. Technical Report of the Research Program in Cognitive Sciences of Toulouse.
[16] Paris, C. L., Swartout, W. R., Mann, W. C. (1991). Natural Language Generation in Artificial Intelligence and Computational Linguistics. Kluwer Academic Publishers.
[17] Pascual, E. (1991). Représentation de l'architecture textuelle et génération de texte. PhD thesis, Paul Sabatier University, Toulouse.
[18] Pascual, E. (1993). The Modeling of Text Formating for Text Generation. Fourth European Symposium on Natural Language Generation, Pisa, Italy.
[19] Pascual, E. (1996). Integrating Text Formatting and Text Generation. In Adorni, G. and Zock, M., editors, Trends in Natural Language Generation: an Artificial Intelligence Perspective, Lecture Notes in Artificial Intelligence 1036, pages 205-221. Springer.
[20] Virbel, J. (1993). Décomposition du texte en un ensemble de phrases élémentaires et caractérisations associées. Technical Report of the Research Program in Cognitive Sciences of Toulouse.


Table 1. Question/Answer relations holding between the elementary sentences of text (4). [Table: rows = question-supporting sentences a to o; columns = answer sentences for Why-motive, How, When.]
