Text structure Modelling and Language Comprehension processes

Yllias Chali, Elsa Pascual and Jacques Virbel

AFFILIATION: Institut de Recherche en Informatique de Toulouse, Centre National de la Recherche Scientifique, Université Paul Sabatier, 118, route de Narbonne, 31062 Toulouse Cedex, France

KEYWORDS: text structure, text comprehension, text representation

E-MAIL:

[email protected] [email protected] [email protected] (33) 61 55 83 25 FAX NUMBER: PHONE NUMBER: (33) 61 55 67 64

1. Statement of the study

The present work is part of a project¹ that brings together researchers in linguistics, neuropsycholinguistics and computer science. The aims are [15]:
• to obtain a precise definition of text structure, in order to use it in a natural language generation system;
• to study the impact of text structure on comprehension and memorization processes. The tested populations are normal and pathological subjects (e.g. Alzheimer and right brain-damaged patients).

This paper focuses on the first aim: we propose a model of text that is both constrained by and needed for the second aim. This framework led us to examine the notion of text structure in an original way, which follows from the nature of the project. A class of neuropsycholinguistic experimental protocols, whose purpose is to evaluate text comprehension, is based on a system of questions asked to the subjects. The relevance of their answers makes it possible to assess their comprehension and memorization. However, these protocols lack a formal characterization of text: the features of the questions cannot be controlled, and the set of possible questions cannot be defined. An appropriate text representation is therefore needed to address this specific problem. For this reason, the proposed model is based on:
• a decomposition of the texts into elementary sentences that support the questions, and
• a questions/answers structure.

We applied this model to a particular text, which we call the text of reference. This text has been built according to neuropsycholinguistic hypotheses about comprehension and retrieval in the studied patients: it is a narrative text. The model is intended to be the basis of a system that generates multiple versions of the same text, characterized by their differences in structure. The long-term project is to integrate this system into a new experimental protocol. At present, the neuropsycholinguist builds a text according to his or her hypotheses. Our idea is to enable the subject to build the text himself, using the generation system: the beginning of the text is generated by the system; the system then asks the subject a question; according to the subject's answer, the following part of the text is generated, and so on until the whole text is generated. The final text results from a set of choices made by the subject. This text defines a particular version, which can be interpreted according to the same neuropsycholinguistic hypotheses.
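The interactive protocol described above (generate a fragment, ask a question, continue according to the answer) can be illustrated with a minimal sketch. It is not the project's generation system: the `Step` structure, the `build_version` function and the toy sentences are assumptions introduced only for illustration, and the subject's answers are simulated by a function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One step of the protocol (hypothetical structure)."""
    text: str                                   # fragment generated at this step
    question: Optional[str] = None              # question asked after the fragment
    branches: Dict[str, str] = field(default_factory=dict)  # answer -> next step id


def build_version(steps: Dict[str, Step], start: str,
                  choose: Callable[[str, List[str]], str]) -> List[str]:
    """Generate one text version from the choices returned by `choose`."""
    fragments: List[str] = []
    current = start
    while True:
        step = steps[current]
        fragments.append(step.text)
        if step.question is None:               # no question left: text is complete
            return fragments
        options = sorted(step.branches)
        current = step.branches[choose(step.question, options)]


# Toy content, invented for the example.
steps = {
    "start": Step("A man walks to the station.", "How does he travel?",
                  {"by train": "train", "by bus": "bus"}),
    "train": Step("He takes the train."),
    "bus": Step("He takes the bus."),
}

# Simulated subject who always picks the first option in alphabetical order.
version = build_version(steps, "start", lambda question, options: options[0])
print(" ".join(version))
```

Each distinct sequence of answers yields a distinct version of the text, which is the property the protocol relies on.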

2. The model of text

2.1 The basis of the model

Our computational model takes into account:
• the text properties that are needed for verifying the neuropsycholinguistic hypotheses [15],
• the text structures that are needed for automated text generation [12, 18, 10, 17, 14, 16].

The text structure representation makes explicit the knowledge about the descriptions of, and the relations between, the structures of:
• the elements of the sentences,
• the sentences, and
• the text units.

This representation is designed to be the basis for generating several versions of the same text. These versions are characterized by their level of detail: the text is produced at a variable depth and degree of development. Consequently, the representation has to account for elements that can be selected and structurally composed to define one text version. To define a text structure representation that meets these needs, we draw on the following theoretical approaches:
• the surface sentences of the reference text are decomposed into a set of elementary sentences, with which a set of composition operators is associated, following Harris’ approach [6, 7];
• the coherence [13, 9] and the cohesion of the text are achieved by the dependency relations which hold between the utterances of the text. These dependencies are semantic, and consist of questions/answers relations whose logical types are implication, manner, time and place;
• the constitutive elements are arranged into text classes, formalized with a text grammar [19, 18, 17].

2.2 Harris’ operators system

Harris [6, 7] formulates all the necessary and sufficient properties and relations of natural language in terms of a mathematical system, and gives a characterization of the expressions of the language. He proves that sentences can be decomposed, in a one-to-one and computable way, into a set of elementary sentences, except in some degenerate cases; conversely, he defines a recursive process that generates all the sentences from a finite set of elementary assertions (K) by means of a finite set of operators (f).

2.3 The textual base

Following this approach, the reference text is decomposed into a set of elementary sentences. This set constitutes the textual base. The complex sentences are built by transforming the elementary sentences of the textual base, with the help of their associated characterizations. For each elementary sentence, these characterizations are [15]:
• a syntactic schema in terms of elementary constituents;
• a set of unary transformations (e.g. nominalization, pronominalization);
• a set of internal questions, to which the elementary sentence constitutes an answer. These questions follow from the associated syntactic schema.

Subordination relations hold on the set of elementary sentences in order to form complex sentences. The discourse is built by a recursive application of operators, either to assertions (i.e. elementary sentences) or to sentences obtained by previous applications of operators. This text representation, in terms of elementary sentences and associated transformational operators, describes the internal relations of the sentences. We need to extend it to handle relations between the elements of the text structure, in particular:
• the coherence/cohesion notion holding between the text units;
• the ordering of the constitutive elements of the text classes.
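As an illustration of the textual base, the following sketch represents an elementary sentence with a deliberately simplified syntactic schema (subject, verb, object), one unary transformation and one binary composition operator. The class and function names, the three-slot schema and the English examples are assumptions made for this sketch; they are not Harris' formal system nor the representation used in the project.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ElementarySentence:
    """An elementary assertion with a simplified schema: subject, verb, object."""
    subject: str
    verb: str
    obj: str = ""

    def surface(self) -> str:
        """Surface form obtained by reading the schema from left to right."""
        return " ".join(p for p in (self.subject, self.verb, self.obj) if p)

    def internal_questions(self) -> Tuple[str, ...]:
        """Naive questions derived from the schema: one per filled constituent."""
        questions = [" ".join(p for p in ("who", self.verb, self.obj) if p) + "?"]
        if self.obj:
            questions.append(f"{self.subject} {self.verb} what?")
        return tuple(questions)


def pronominalize(sentence: ElementarySentence, pronoun: str) -> ElementarySentence:
    """Unary transformation: replace the subject by a pronoun."""
    return replace(sentence, subject=pronoun)


def subordinate(main: str, connector: str, subordinate_clause: str) -> str:
    """Binary operator on surface forms; its result can itself be an operand."""
    return f"{main} {connector} {subordinate_clause}"


# Toy textual base, invented for the example.
k1 = ElementarySentence("the man", "opens", "the door")
k2 = ElementarySentence("the man", "hears", "a noise")
k3 = ElementarySentence("the man", "sleeps")

complex_sentence = subordinate(k1.surface(), "because", pronominalize(k2, "he").surface())
# Recursive application: the operator is applied to an already composed sentence.
longer_sentence = subordinate(complex_sentence, "while", pronominalize(k3, "he").surface())
print(longer_sentence)
```

Working on surface strings keeps the sketch short; a fuller model would compose structured objects so that transformations remain applicable after composition.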

2.4 The semantic relation between sentences

In transformational theory, the processing of meaning is linked to the notion of acceptability. It is a process for defining the set of simple elementary sentences (i.e. sentences with one verb, a subject and different objects). This set is considered to be the set of minimal units of meaning [4, 7]. By means of transformations, it is used for generating the complex sentences. The systematic application of the unary and binary transformations to the simple and complex elementary sentences generates many text variants (which differ in meaning) from the same basic set of elementary sentences. Moreover, a text corresponds to a particular description of a world situation, so we have to restrict the combinatorial range of the composition operations that the elementary sentences can support. To solve the problem of sentence semantics in the text, we refer to the theory of the logic of questions [1, 3, 2, 8, 11, 5]. The principle of this theory is that the meaning of a sentence is constituted by the different answers to the questions that are supported by this sentence. This principle is stated, in an informal way, by Hamblin’s postulates [5]:
1. an answer to a question is a statement;
2. knowing what counts as an answer is equivalent to knowing the question;
3. the possible answers to a question are an exhaustive set of mutually exclusive possibilities.

In a text, and following this principle, a sentence evokes only a part of the global description conveyed by the text, and the only way for this sentence to interact with the other descriptive elements is through the questions/answers relations. For this reason, we provide the set of basic elementary sentences with questions/answers relations, which allow them to compose the meaning of the text. These relations link an elementary sentence, which supports questions, to its set of answers. Four types of relations are distinguished:
• temporal, answering a "when" question;
• manner, answering a "how" question;
• place, answering a "where" question;
• implicational, answering a "why" question, which is subdivided into:
  • implicational motive, whose answer is introduced by "because";
  • implicational purpose, whose answer is introduced by "for the purpose of", "in order to", etc.
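The questions/answers relations listed above can be sketched as typed links between elementary sentences. The `Relation` enumeration mirrors the relation types of the text; the `QARelations` class, its method names and the toy sentences are hypothetical and serve only to show how such links restrict which sentences may be composed.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List, Tuple


class Relation(Enum):
    """Relation types named in the text, with their question word."""
    TEMPORAL = "when"
    MANNER = "how"
    PLACE = "where"
    MOTIVE = "why (because)"
    PURPOSE = "why (in order to)"


class QARelations:
    """Links a sentence that supports questions to the sentences answering them."""

    def __init__(self) -> None:
        self._links: Dict[Tuple[str, Relation], List[str]] = defaultdict(list)

    def link(self, support: str, relation: Relation, answer: str) -> None:
        self._links[(support, relation)].append(answer)

    def answers(self, support: str, relation: Relation) -> List[str]:
        return list(self._links.get((support, relation), []))

    def allowed_compositions(self, support: str) -> List[Tuple[Relation, str]]:
        """Only sentences linked by a relation may be composed with `support`."""
        return [(rel, ans)
                for (sup, rel), answers in self._links.items() if sup == support
                for ans in answers]


# Toy sentences, invented for the example.
qa = QARelations()
qa.link("the man opens the door", Relation.MOTIVE, "the man hears a noise")
qa.link("the man opens the door", Relation.TEMPORAL, "the sun rises")

for relation, answer in qa.allowed_compositions("the man opens the door"):
    print(f"{relation.value}: {answer}")
```

Under this sketch, a sentence with no recorded link to the support sentence is simply not a candidate for composition, which is one way to limit the number of variants a generator has to consider.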

2.5 The organization of the constitutive elements of text

Large texts with a complex structure can be divided into classes. To each class of text corresponds a description in terms of constitutive elements. The ordering of these elements can be formalized by a text grammar [19, 18, 17]. For example, the class of technical or scientific papers corresponds to the following description:

Most of these constitutive elements can themselves be recursively decomposed into other constitutive elements, following the textology rules, until elementary (non-decomposable) constitutive elements are reached. Thus, a hierarchical structure is defined on the constitutive elements. According to this text structure, the set of elementary sentences of the text is divided into disjoint subsets. In the framework of our project, we consider a class of narrative texts. In this class, the set of elementary sentences is partitioned into three disjoint subsets, corresponding to the three text episodes: initial situation, event, final situation. In addition, the subset relative to the event episode is itself subdivided into three subsets: anecdote, peripeteia, resolution.

3. Conclusion

The representation in terms of elementary sentences and associated characterizations, presented in sections 2.2 and 2.3, takes into account only the internal elements of the sentences. It is suitable for the generation of complex sentences. Its extension, presented in sections 2.4 and 2.5, takes into account the descriptions of and the links between the elements of the text structure, in particular the relations between the sentence structures and between the constitutive elements of the text. The whole representation provides the basis of a system for producing coherent texts with a complex structure. This representation is both syntactic and semantic in nature. At the moment, we are working, on the one hand, on the logic of questions, in order to characterize the semantic component further, and on the other hand, on the mechanism for producing text versions. In the short term, the impact of this study is to make it possible to address the second aim of the project, namely to exhibit the underlying interpretations of the differences between text structures. These structures are represented by our model. In return, the results obtained should clarify the notion of text structure. In the long term, a major impact could concern the links between memory and language. In particular, this kind of work may help to better understand the comprehension of certain subjects (e.g. Alzheimer and right brain-damaged patients), and could help to evaluate more exactly the evolution of the qualitative treatment of textual information.

References

[1] Aqvist, A New Approach to the Logic Theory of Interrogatives. Almqvist & Wiksell, Uppsala, 1965.
[2] N. Belnap and T. Steel, The Logic of Questions and Answers. Yale, New Haven, 1976.
[3] M. Cresswell, On the Logic of Incomplete Answers. Journal of Symbolic Logic, (30):65-68, 1965.
[4] M. Gross, Sur la notion Harissienne de transformation et son application au français. Revue Langage, (99), 1990.
[5] D. Harrah, The Logic of Questions. In D. Gabbay and F. Guenthner Eds., Handbook of Philosophical Logic, volume II, pages 715-764. D. Reidel Publishing Company, 1984.
[6] Z. S. Harris, Structures Mathématiques du Langage. Edition Dunod, Paris, 1971.
[7] Z. S. Harris, La génèse de l’analyse des transformations et de la métalangue. Revue Langage, (99), 1990.
[8] J. Hintikka, The Semantics of Questions and the Questions Of Semantics: Case Studies in the Interrelations of Logic, Semantics and Syntax. North-Holland, Amsterdam, 1976.
[9] J. R. Hobbs, On the Coherence and Structure of Discourse. Technical Report CSLI-85-37, Center For the Study of Language and Information, 1985.
[10] E. H. Hovy, Approaches to the Planning of Coherent Text. In W. R. Swartout, C. L. Paris and W. C. Mann Eds., Natural Language Generation in Artificial Intelligence and Computational Linguistics, pages 83-102. Kluwer Academic Publishers, 1991.
[11] W. Lehnert, The Process of Questions Answering. Wiley, New York, 1978.
[12] W. C. Mann, Text Generation: The Problem of Text Structure. In D. D. McDonald and L. Bolc Eds., Natural Language Generation Systems, Symbolic Computation, pages 47-68. Springer-Verlag, 1988.
[13] W. C. Mann and S. A. Thompson, Rhetorical Structure Theory. In G. Kempen Eds., Natural Language Generation, New Results in Artificial Intelligence, Psychology and Linguistics, pages 85-96. Martinus Nijhoff Publishers, 1987.
[14] M. W. Meteer, Expressibility and the Problem of Efficient Text Planning. Communication in Artificial Intelligence Series, Pinter Publishers, 1992.

[15] J. L. Nespoulous and J. Virbel, Compréhension/mémorisation de textes de différentes structures par sujets normaux et pathologiques. Technical Report, Programme de Recherche en Sciences Cognitives de Toulouse, 1991.
[16] C. L. Paris, User Modeling in Text Generation. Communication in Artificial Intelligence Series, Pinter Publishers, 1993.
[17] E. Pascual, Représentation de l’architecture textuelle et génération de texte. PhD thesis, Université Paul Sabatier, Toulouse, 1991.
[18] E. Pascual and J. Virbel, Le problème de la génération de textes structurés. Septième Congrès Reconnaissance des Formes et Intelligence Artificielle, pages 1181-1188, Paris, 1989.
[19] G. Sabah, L’intelligence artificielle et le langage, processus de compréhension, volume 2, Edition Hermès, 1989.

1. This project is supervised by Jean-Luc Nespoulous and Jacques Virbel. It takes place in the Research Program in Cognitive Science of Toulouse, National Center of Scientific Research (Cognisciences Program). It is financed by the Ministry of Education and Research.
