Lexical choice as pattern matching

J-F Nogier and M Zock*
A good lexical component is a vital part of any natural-language system. The paper discusses an implemented lexical component that is part of a larger system currently being developed for information retrieval. Two special features of the system are its unique knowledge-representation formalism on the various levels, conceptual graphs, and a unique lexicon for the parser and the generator. Lexical choices depend on various knowledge sources (pragmatic, conceptual, linguistic etc.). The conceptual component, i.e. the words' underlying meanings or definitions, is discussed in the paper. The authors believe that word meanings and utterance meanings are isomorphic, in the sense that (a) words, sentences and texts are simply different units for conveying a message (words being shorthand labels for larger conceptual chunks), and (b) the core meanings of words and texts (sentences) can be expressed by the same formalism: conceptual graphs. This view allows the process of lexical choice to be modelled by matching definition graphs (word definitions) on an utterance graph (conceptual input). Further, it provides a natural basis for paraphrases and for explanations concerning the conceptual differences between a set of words.

Keywords: natural-language generation, lexical choice, paraphrases, conceptual graphs
LEXICAL COMPONENT: POOR COUSIN OF NATURAL-LANGUAGE SYSTEMS
It is a truism to say that a good lexical component is a vital part of any natural-language system, be it for analysis† or synthesis (generation, abstracting, paraphrasing or translation). Despite this fact, the most important part of the lexicon, its conceptual component (meaning), has not received the amount of attention that it deserves (for an exception, see Reference 1). Compared with other aspects (syntax and morphology), it remains an underdeveloped component. This last view is shared by many leading researchers in the field:

With only few exceptions, generation researchers have so far paid little attention to the nature of words... Cumming's 1985 review of generation lexicons identifies many more open problems than it does accepted solutions. 2

Most approaches simply provide engineering tools that allow their systems to make lexical choice in a reasonable, if relatively unsophisticated way... a truly satisfactory theoretical approach for lexical choice has yet to be developed. 3

In some important sense, these systems have no real knowledge of lexical semantics... They use fragments of linguistic structure which eventually have words as their frontiers, but have little or no explicit knowledge of what these words mean. At best, these systems assume that each conceptual primitive corresponds to a particular unique lexical item or phrase, trivializing the problem of lexical semantics to the claim that the meaning of the word can be represented by the same word in upper case. 4

While it is true that little progress has been accomplished since Goldman's seminal work 5,6, one must admit that things are changing. Several collections of papers have been edited by linguists 7, psychologists 8 and computational scientists 9,10. There are two monographs by psychologists 11,12, various empirical studies 13-16, and at least two good discussions in psycholinguistic textbooks 17,18. Finally, there are excellent survey papers written from a psychological viewpoint 19,20, and from a computational point of view 21. A journal special issue has also been published (see, in particular, References 22-28), and, finally, there have been a great number of publications in proceedings and journals in which computational linguists directly address the problem of lexical choice in natural-language generation (see, for example, References 2, 21 and 29-60).

†While this seems obvious for generation, it would be wrong to believe that such a component is of little importance for parsing. True text understanding requires going beyond the information given, i.e. inferencing.

Division SDC, Thomson-CSF, Service DT-PTI, 79 rue des Mathurins, BP 10, F-92223 Bagneux, France
*LIMSI-CNRS, BP 133, F-91403 Orsay, France
Paper received 31 January 1992. Accepted 13 April 1992
FRAMEWORK

The lexical component described in this paper has been designed in the context of a larger system named KALIPSOS (Knowledge Acquisition/Logical Inference Process/Symbolic Oriented Software), whose goal it is to retrieve information from financial texts. The knowledge is represented in terms of conceptual graphs 61. Only a short description is given in this paper. For more details, see Reference 62.

Figure 1. Architecture of KALIPSOS system
Figure 2. Utterance graph G1

KALIPSOS uses a single semantic lexicon for both analysis and synthesis. As can be seen in Figure 1, the parser and the generator share the lexical database. Words (lexemes) are associated with conceptual graphs which encode their underlying meaning*. The parser uses these graphs to reconstruct the underlying meaning of the sentence and its components, words†. Conceptual graphs thus represent not only the sentence meaning (utterance graph), but also the meaning of the utterance's components, its individual words (word-definition graphs). This has, of course, important consequences, as will be seen. If, later on, a user asks for information, the system's reasoning component looks at the database, and retrieves the relevant piece of information. The result of this search and reasoning process is, again, a conceptual graph that represents the message to be conveyed (utterance graph). The generator takes this message as input, and produces the corresponding output (sentence). For more details on text analysis and information retrieval based on conceptual graphs, see Reference 62.
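Although the paper gives no code (for implementation details, see References 72 and 73), the representation can be made concrete. The following fragment is a minimal illustrative Python encoding of a conceptual graph, not the authors' Prolog implementation; the exact shape of G1 and the relation names SPEED, LOC and DIR are assumptions reconstructed from the worked example discussed below (AGT, INSTR, SIZE and the concept types do appear in the text):

# Minimal illustrative encoding of conceptual graphs. Labels such as
# 'A VERB' (see the section on the lexicon below) are stored in a
# separate field of the concept.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    type: str            # e.g. 'MOVEMENT', 'BOY'
    referent: str = '*'  # '*' = generic concept
    label: str = ''      # e.g. 'VERB' for the A VERB label

@dataclass
class Graph:
    # edges are (source concept, relation, target concept) triples
    edges: list = field(default_factory=list)

    def concepts(self):
        seen = []
        for src, _, tgt in self.edges:
            for c in (src, tgt):
                if c not in seen:
                    seen.append(c)
        return seen

# A hand-built stand-in for the utterance graph G1 of Figure 2,
# underlying 'the little boy runs towards the house'
boy, move, house = Concept('BOY'), Concept('MOVEMENT'), Concept('HOUSE')
G1 = Graph([
    (move, 'AGT',   boy),               # the boy is the agent
    (move, 'INSTR', Concept('LEG')),    # moving on legs
    (move, 'SPEED', Concept('FAST')),   # fast (assumed encoding)
    (move, 'LOC',   Concept('GROUND')), # on the ground
    (move, 'DIR',   house),             # towards the house
    (boy,  'SIZE',  Concept('SMALL')),  # a small boy
])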
CHOICE OF WORDS ON BASIS OF CONCEPTUAL GRAPHS
Suppose that it is desired to express the meaning in Figure 2, encoded by the utterance graph G1. Keep in mind that the same formalism (conceptual graph) is used to represent the meanings on various levels (word, sentence). While G1 represents the underlying meaning of a sentence, the definition graphs introduced below represent the meaning of a word. Obviously, this graph can be verbalized in many ways. There are two extremes: (a) each concept/relation corresponds to a word (one-to-one mapping), (b) a single word can express the entire message, i.e. all the concepts and relations encoded in the utterance graph (all-to-one mapping). In most cases, the reality lies between these two extremes (many-to-many mapping). An excellent example of this all-to-one mapping is found in a seminal paper by Lashley 63, in which he cites work done by Chamberlain 64, an ethnomethodologist:

The Cree Indian word 'kekawewechetushekamikowanowow' is analyzed by Chamberlain [1911] into the verbal root, tusheka, 'to remain', and the various particles which modify it as follows: ke(la)wow, the first and last syllables, indicating second person plural; ka, a prefix of the future tense; we, a sort of imperative mode expressing a wish; weche, indicating conjunction of subject and object; mik, a suffix bringing the verb into agreement with a third person subject and second person object; and owan, a suffix indicating that the subject is inanimate and the object animate. A literal translation: 'You will I wish together remain he-you it-man you', or, freely, 'may I remain with you'. 63

*Of course, word definitions are not enough to account for meaning. To determine what a word really means, one needs also to know the context in which it is used.
†On the conceptual level, there are no such units as words or sentences. One could think of an analogy of words and concepts on the one hand, and sentences and propositions on the other. However, as will be seen, this kind of comparison does not lead very far.
Obviously, languages differ in their capacity to integrate conceptual chunks of variable size in single words. The example given by Lashley seems to be an extreme case. However, this need not be the case. Actually, there are many common words that need to be decomposed to varying degrees according to the addressee's expertise. Words such as 'justice', 'discovery', 'inflation', 'computer' etc. are frequently used. However, on occasion, each one of them is too dense or too abstract. That is why it is necessary to decompose them. Take, for example, such a common word as 'computer'. How would one translate this idea if one were talking to a person who came from a culture where there were no such machines? Of course, one could use a periphrase such as 'a machine that processes information'. However, that does not solve the problem: what is a machine? What does it mean to process symbols or information? Each one of these defining terms may need to be decomposed in its turn. A similar problem arises in translation, where the text of the source language contains two words, while the target language requires just one. Take, for example, such frequently used words as 'very much' and 'not know'. In each case, English requires two words, where in French one needs only one: 'beaucoup' and 'ignorer'. Besides interlingual differences, there are also differences within a given language. Consider the following pairs, in which two concepts are expressed by one word or several words:

ignore / not consider
unhappy / not happy
punch / hit hard
leave / go away
How the system chooses the main verb expressing the central action (movement) of the episode encoded* in G1 is described below (more details are given further below). The generation of the corresponding sentence is also given further below. As the meanings of the words are defined in the same way as the meaning of the utterance, lexicalization consists in matching definition graphs (word meanings) on an utterance graph.

LEXICON

The words in the lexicon contain three kinds of information: their meanings (word-definition graphs), their base forms (lexemes, e.g. infinitive for verbs), and their possible syntactic structures (type of transitivity, voice etc.). The graphs in Figure 3, which express the underlying meaning of movement verbs, are a subset of the lexicon†. As the graphs express the word's underlying meaning, they are called word-definition graphs. Each lexical entry has the following structure:

NAME is CG

NAME (for instance, VERB('to walk', VB_INTRANS)) is a functional term, whose name (e.g. 'VERB') is the word's syntactic category (part of speech).
• The first argument is the lexeme: 'to walk'.
• The second argument is the name of a generic syntactic graph (e.g. VB_INTRANS). This syntactic graph represents the syntactic structure which is used to express the given meaning.

CG is a generic conceptual graph which describes the underlying meaning of the word given as the first argument. It should be noted that special signs have been added to this graph (A SUB, A VERB). These signs, which are called labels, are used later in the process of word choice. For example, the label A VERB is used to signal that this concept will surface as a verb.

As noted above, lexical items are associated not only with conceptual graphs, but also with syntactic information (the generic syntactic graph). Actually, each word contains as many conceptual graphs as there are different syntactic structures. In other words, a hybrid form of representation is used. The reason for this is quite simple: the same meaning can be expressed by different words, each of which may exhibit different syntactic structures. Take, for example, the following conceptual input or utterance graph, cause(X, Y), where X and Y can be expressions of arbitrary complexity (simple or complex propositions, i.e. noun phrases). In this case, one may use verbs belonging to either of the following groups:

X (causes) Y (NP-V-NP)
Y (is due to) X (NP-V-PP)
X (gives rise to) Y (NP-V-NP-PP)
Y (follows from) X (NP-V-PP)
X (brings about) Y (NP-V-PP)
Y (is the result of) X (NP-V-NP-PP)

According to the verb chosen (lexeme), one is committed to a specific syntactic structure; lexical and syntactic choices cannot be taken independently. For a more thorough discussion of the interdependency between conceptual, pragmatic and linguistic choices (syntax and semantics), see Reference 71.

First steps: selection of word

Step 1: preselection (keyword retrieval)
Because the same conceptual input can be expressed by many words, there is a great risk of combinatorial explosion. To avoid this problem, a first rough choice is made (keyword retrieval) by trying to find a word for the most central concept of the conceptualization (typically an ACTION, STATE o r PROCESS). The task is thus to find a word which conveys the central idea of [MOVEMENT], as encoded in G~. As this kind of concept typically maps onto a verb, the system tries to find a verb expressing a movement, that is, the system distinguishes movement verbs from, say, mental states (see, hear), transfer of possession verbs (buy, give, lend) etc. These verbs pertain to other semantic fields or domains. Given the discussion above about the role of the label A, the word-definition graphs, from which one may choose, must contain the concept [MOVEMENT:* AVERB]. This yields in this case the following list: 'to walk', 'to drive', 'to move', 'to swim', 'to run' (see Figure 4).
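To illustrate the entry structure and the keyword retrieval, the sketch below continues the Python fragment above, under the same caveats. The definition graphs are simplified stand-ins for those of Figure 3 (only the traits the text discusses are kept: agent, instrument, location, speed), and preselect implements Step 1:

# Illustrative lexical entries of the form NAME is CG, e.g.
# VERB('to walk', VB_INTRANS) -> definition graph.
def entry(lexeme, syn_graph, graph):
    return {'cat': 'VERB', 'lexeme': lexeme, 'syn': syn_graph, 'cg': graph}

def mv(extra):
    """A movement-verb definition: [MOVEMENT: * A VERB] with an agent
    slot [ENTITY: * A SUB], plus verb-specific edges."""
    head = Concept('MOVEMENT', label='VERB')
    return Graph([(head, 'AGT', Concept('ENTITY', label='SUB'))]
                 + [(head, rel, c) for rel, c in extra])

LEXICON = [
    entry('to move',  'VB_INTRANS', mv([])),
    entry('to walk',  'VB_INTRANS', mv([('INSTR', Concept('LEG')),
                                        ('LOC',   Concept('GROUND'))])),
    entry('to run',   'VB_INTRANS', mv([('INSTR', Concept('LEG')),
                                        ('LOC',   Concept('GROUND')),
                                        ('SPEED', Concept('FAST'))])),
    entry('to swim',  'VB_INTRANS', mv([('LOC', Concept('SURFACE_OF_WATER'))])),
    entry('to drive', 'VB_TRANS',   mv([('OBJ', Concept('VEHICLE'))])),
]

def preselect(lexicon, concept_type):
    """Step 1: keep the entries whose definition graph contains the
    central concept marked to surface as a verb."""
    return [e for e in lexicon
            if any(c.type == concept_type and c.label == 'VERB'
                   for c in e['cg'].concepts())]

candidates = preselect(LEXICON, 'MOVEMENT')
# -> 'to move', 'to walk', 'to run', 'to swim', 'to drive'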
Step 2: choice of possible candidates by use of pattern matching

As the resulting list contains more than one candidate, it is necessary to eliminate all but one. For this to be done, pattern matching (covering, filtering) is relied on, i.e. the projection operation as defined by Sowa 59,61 is used‡. The result of this operation is kernel graphs.
*Note the following:
• Sentence generation is verb-driven.
• Only verb choice is discussed here (for the lexicalization of other syntactic categories, see Reference 65).
• It is assumed here that a given conceptual structure is typically (by default) expressed by a specific syntactic structure. This is, of course, a simplification. Predicates may surface not only as verbs, but also as nouns, adjectives etc. While this paper does not deal with the interaction between word choice and syntactic structure, there is a means of solving this problem: as many syntactic structures as the particular word allows for are associated with each lexeme. For instance, if a verb allows for both the active and passive voice, the lexicon must contain a graph for each one of these forms. A similar approach is taken in Gross's lexicon-grammar 66 and Joshi's tree-adjoining grammar 67.

†Note the following:
• Even though the descriptions are given in English, they are based on the analysis of French words.
• Even if these definitions can be challenged on various grounds (linguistic or psychological) (see, for example, References 68-70), the changes do not affect the authors' general point of view: lexical choice is a process of matching definition graphs on an utterance graph.
• The reader should bear in mind that the authors' object is not so much to provide a fine-grained knowledge representation for a specific class of words (semantic field), but rather to provide a method that can account for word choice. That is, the authors focus on the process rather than on the data.

‡For implementation details, see References 72 and 73.
Figure 3. Word-definition graphs for movement verbs
Figure 4. Projection of the candidate word-definition graphs (lexical database) onto the utterance graph G1: words remaining after keyword selection (movement verbs)

SYNTACTIC_STRUCTURE(VB_INTRANS) is [ACTION: * A VERB] -> (SUB) -> [ENTITY: * A SUB]

Figure 6. Generic syntactic graph
Figure 5. Kernel graph G2

All the conceptual graphs previously selected are projected onto the utterance graph G1 (see Figure 4). A projection succeeds if the word definition is a generalized subgraph of G1. The system chooses the graphs that match perfectly with G1. The following words remain possible candidates: 'to move', 'to walk' and 'to run'. All the other items failed the test*. As there is still more than one candidate, and as all of them express more or less precisely the intended content, it is necessary to reduce the list further to choose the best candidate.

Step 3: selection of best candidate on basis of correlation factor

The problem of word choice consists not only in selecting one item from a list of candidates, but above all in determining the word that expresses most accurately the intended content (access versus meaning). Obviously, there is more to word choice than meaning; there is also rhetorical effect. Hence, besides conceptual information, stylistic and pragmatic factors must be taken into account. Among the three candidates, not all the words fit equally well. In this case, 'to run' is the most adequate, as it expresses more of the utterance graph G1 than 'to walk' and 'to move'. While the former conveys the notion of a movement, it does not specify the speed. On the other hand, 'to move' does not convey any information concerning the instrument or the location (ground, water, air) of the action. To determine the suitability of each word, a correlation factor is computed. The latter reflects the word's appropriateness, i.e. its accuracy in expressing the given conceptual structure. To compute the correlation factor, the graph resulting from the projection of the definition graph VERB('to run', VB_INTRANS) on G1 is considered. The result of this projection is called the kernel graph. In the example, the projection results in the graph G2 (see Figure 5). The correlation factor is the total number of concepts expressed by the kernel graph (which is, of course, identical to the number of concepts expressed by a given word, i.e. its underlying definition graph). The correlation factor is used to select the most accurate word, i.e. the word whose underlying definition graph matches the greatest number of concepts contained in the utterance graph; the more concepts the definition graph has, the more specific it is, and hence the more accurate the word is. In consequence, the possible candidates can now be ordered in terms of accuracy, 'to run' (5) being the best candidate, as it obtains the best score, followed by 'to walk' (4) and 'to move' (2). The benefits of this technique are conceptual density and economy of linguistic resources (number of words), and hence conciseness. Instead of using one word per concept, the system tries to find a word with optimal coverage, that is to say, a word that integrates the greatest number of concepts† without leaving holes. By introducing criteria such as appropriateness, accuracy etc. and by operationalizing these terms, objective means are provided for making principled choices of an item from among a set of alternatives.

*In the case of 'to drive', the concept [VEHICLE] and the relation OBJ (object) cannot be unified with [LEG] and INSTR; consequently, this projection fails. For the verb 'to swim', the edge of the [SURFACE_OF_WATER] cannot be projected onto G1, because [SURFACE_OF_WATER] does not match with [GROUND]. Therefore, this definition is rejected also.
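Continuing the sketch, the fragment below approximates Steps 2 and 3. Sowa's projection operator is reduced to naive edge-by-edge matching: there is no type hierarchy (the generic type ENTITY stands in for subsumption), and a faithful projection would also enforce a single consistent mapping of concepts. Under these simplifications it reproduces the scores reported in the text:

def matches(def_c, utt_c):
    # crude type compatibility; the real system consults a type lattice
    return def_c.type == utt_c.type or def_c.type == 'ENTITY'

def project(definition, utterance):
    """Map every edge of the definition graph onto an edge of the
    utterance graph; return the kernel graph, or None on failure."""
    kernel = []
    for d_src, d_rel, d_tgt in definition.edges:
        hit = next(((s, r, t) for s, r, t in utterance.edges
                    if r == d_rel and matches(d_src, s)
                                  and matches(d_tgt, t)), None)
        if hit is None:
            return None   # e.g. 'to drive': OBJ -> [VEHICLE] has no image
        kernel.append(hit)
    return Graph(kernel)

def correlation_factor(kernel):
    """Number of concepts covered by the kernel graph."""
    return len(kernel.concepts())

scored = []
for e in candidates:
    kernel = project(e['cg'], G1)
    if kernel is not None:   # 'to drive' and 'to swim' fail here
        scored.append((correlation_factor(kernel), e['lexeme'], kernel))
scored.sort(key=lambda t: t[0], reverse=True)
# -> [(5, 'to run', ...), (4, 'to walk', ...), (2, 'to move', ...)]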
Step 4: instantiation
Having chosen the lexical item, it is still necessary to determine the syntactic structure. Remember that the lexical entry VERB('to run', VB_INTRANS) has been chosen. The generic syntactic graph, given as the second argument, is encoded as shown in Figure 6. It should be noted that, from now on, the relations linking the lexical items of the future sentence are syntactic relations, and no longer conceptual relations as in the utterance graph G1. What used to be an AGT relation becomes SUBJECT (SUB), linking a verb to the subject of the sentence. This kind of graph is called a syntactic graph. The next step consists of instantiating the generic syntactic graph with the concepts of the utterance graph. The pending edges of the utterance graph, whose concepts are marked syntactically in the kernel graph (A SUB or A OBJ, according to the verb chosen), are linked to the correspondingly marked concepts of the generic syntactic graph. The same process is applied to the kernel graph G2. Once a word is found for a given concept, it replaces the corresponding variable in the generic syntactic graph. For example, the concept [ACTION: * A VERB] becomes ('to run') in the partial syntactic graph. As can be seen, during this process, there is a hybrid form of representation: [BOY] -> (SIZE) -> [SMALL] is purely conceptual, whereas [BOY] <- (SUB) <- [TO RUN] is both conceptual and syntactic. The last operation consists of removing the labels (A...), which are of no use any more. Therefore, the generic syntactic graph becomes a partial syntactic graph G3 (see Figure 7). At the same time, the kernel graph G2 is transformed into a partial conceptual graph G4 (see Figure 8).

†Of course, there are cases where maximal coverage does not lead to an optimal solution. This may be the case if a word covers the entire conceptual structure: declarative sentences require at least two words, a subject and a predicate.
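The instantiation step can be sketched in the same vein; only the AGT-to-SUB conversion for the chosen verb is shown (the object slot and the pending edges of the utterance graph would be handled analogously), and the function name is illustrative, not the authors' API:

def instantiate(lexeme, kernel):
    """Step 4 sketch: the concept marked A VERB is replaced by the
    chosen lexeme, the conceptual AGT relation becomes the syntactic
    SUB relation, and the labels (A ...) are dropped."""
    subject = next(tgt for _, rel, tgt in kernel.edges if rel == 'AGT')
    subject = Concept(subject.type, subject.referent)  # labels removed
    # [BOY] <- (SUB) <- ['to run']: a syntactic, no longer conceptual, link
    return Graph([(Concept(lexeme), 'SUB', subject)])

best_factor, best_lexeme, best_kernel = scored[0]
G3 = instantiate(best_lexeme, best_kernel)  # partial syntactic graph G3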
Figure 7. Partial syntactic graph G3
Figure 8. Partial conceptual graph G4
Figure 9. Graph of 'Le petit garçon court vers la maison' ('The little boy runs towards the house.')
Step 5: replacement of conceptual structure by syntactic structure

By now, the partial syntactic graph G3 has been instantiated with words. The partial conceptual graph G4 underlying this syntactic construction is also complete. The final operation consists in replacing the partial conceptual graph G4 by the partial syntactic graph G3 in the original utterance graph G1, and, of course, lexicalizing the remaining unexpressed concepts (HOUSE, BOY, SMALL) by performing the same kind of operations. Since the partial conceptual graph G4 has become an exact subgraph of the utterance graph, it is now certain that the substitution is feasible. Finally, after the process of substitution, G1 becomes the graph shown in Figure 9. In this way, the conceptual graph gradually becomes a syntactic graph, which is a syntactic representation of the future sentence. This graph is interpreted by a generation automaton which transforms it into a sentence (cf. References 65, 74 and 75). Figure 10 and Table 1 summarize and illustrate the whole process.

SEMANTIC PARAPHRASING AND EXPLANATION OF DIFFERENCE BETWEEN TWO WORDS

As lexical choices are based on the substitution of words for parts of the utterance graph, and as words have different potentials with regard to expressing variable sizes of the utterance graph*, there is a natural way of (a) paraphrasing and (b) explaining the differences between a set of words. Paraphrases are obtained by varying the number of concepts that one tries to express by a word. Note that word choice affects not only the conceptual density per word (and hence the total number of words in the sentence), but also the syntactic structure of the sentence and its component elements (words). To paraphrase, the system backtracks each time at its last decision point, moving from the most inclusive word to the most specific word. More details on the use of PROLOG backtracking for paraphrasing are given in Reference 65. For a similar approach, see Reference 76†. This is illustrated by an example. Suppose that it is desired to express the content shown in Figure 11. In this case, the system can generate either a short version, such as the first phrase below, or different versions of increasingly larger sentences, as in the second, third and fourth sentences below:

• The inflation
• The increase of M1 causes a rise in prices.
• Increasing money in circulation hikes prices.
• The increase of money in circulation gives rise to an increase in prices.

Obviously, as mentioned above, the integrative power of words (conceptual density) varies among and within a given language. In the last example, English allows for shorter versions than French (see further below), as, for a given conceptual chunk, there are several short words in English for which there is no corresponding word in French. Hence, periphrases are necessary:

the increase of (money in circulation) -> the increase of M1 / la hausse de la masse monétaire
cause ((an increase) (in price)) -> hike prices / provoquer la hausse des prix

The paraphrases are thus built on the basis of the correlation factor. The system starts from a short version and moves to increasingly longer versions. In other words, the correlation factor can be used to adapt lexical choice to the user's expertise: it is dense and concise for experts, and verbose and explicit for the layperson. It should be noted, however, that conceptual density, while very useful, is not the sole criterion for word choice. Besides linguistic factors, context, i.e. pragmatic and stylistic factors, should also be taken into account. For example, in the fourth sentence in the list above,
*Remember that there are a large number of linguistic possibilities, ranging from words expressing only one concept to words expressing a large conceptual structure, or informational chunk.
†Using backtracking to generate all the possible coverings of the input (conceptual structure), they also build different syntactic representations of the same semantic network. However, unlike Boyer, the authors do not use any lexical transformation.
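Under the same assumptions as the earlier fragments, the backtracking regime can be approximated by enumerating the matching candidates in decreasing order of correlation factor: the densest word yields the shortest version, and less dense words yield increasingly explicit paraphrases (lexicalizing the uncovered residue is omitted here). This Python generator is an illustrative analogue of the Prolog backtracking, not the authors' implementation:

def paraphrase_choices(lexicon, utterance, concept_type):
    """Yield verbs from the most inclusive (dense, expert-oriented) to
    the least inclusive (verbose, layperson-oriented) wording."""
    ranked = []
    for e in preselect(lexicon, concept_type):
        kernel = project(e['cg'], utterance)
        if kernel is not None:
            ranked.append((correlation_factor(kernel), e['lexeme']))
    for factor, lexeme in sorted(ranked, key=lambda t: t[0], reverse=True):
        yield factor, lexeme

for factor, lexeme in paraphrase_choices(LEXICON, G1, 'MOVEMENT'):
    print(factor, lexeme)   # 5 to run / 4 to walk / 2 to move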
Figure 10. Summary of the word-choice process (Steps 2 and 3): the preselected word-definition graphs are projected onto the utterance graph and ranked by correlation factor ('to move': 2, 'to walk': 4, 'to run': 5)