Generating Descriptions in Context

Emiel Krahmer & Mariët Theune
IPO, Center for Research on User-System Interaction, Eindhoven University of Technology
{krahmer/[email protected]}

1 INTRODUCTION
In their interesting 1995 paper, Dale & Reiter present various algorithms, developed alone or in tandem, to determine the content of a distinguishing description, that is, a definite description which is an accurate characterization of the entity being referred to, but not of any other object in the current 'context set' (taken to be the set of objects speaker and hearer are currently attending to). They argue that their Incremental Algorithm, discussed in more detail below, is the best one from a computational point of view (it is fast) as well as from a psychological point of view (humans appear to do it in a similar way). However, if we want to use this algorithm in a Concept-to-Speech system (a system which converts structured non-linguistic concepts into spoken texts) we encounter two problems. First, the algorithm is not sensitive to the linguistic context. Consider:

(1) a. Al bought the small grey poodle and the blue siamese cat.
    b. Unlike the cat, the dog was a bargain.

In (1.a), two distinguishing descriptions are used: the small grey poodle and the blue siamese cat. If a speaker wants to refer to these animals again, she typically will not repeat the initial distinguishing descriptions, but rather use anaphoric, reduced descriptions which, as it were, are distinguishing given the linguistic context. It would not be very attractive to define a separate algorithm for such anaphoric descriptions, which would operate alongside the Incremental Algorithm. As we shall argue in this paper, this is not necessary; it turns out that with some modifications we can use the Incremental Algorithm for both kinds of definite descriptions. The second aspect of the Incremental Algorithm which prohibits immediate embedding in a Concept-to-Speech system is that its output is not a natural language expression, but a list of properties which uniquely selects one object from a context set. The question of how this list of properties is to be expressed in natural language is not addressed. Still, when developing a Concept-to-Speech system, that is exactly what we are interested in. Additionally, we want to know which properties realized in the definite description should be accented, either because they are new to the discourse or because they stand in a contrast relation (as is the case with the properties dog and cat in example (1.b)). However, when we embed the proposed modified algorithm in LGM,¹ a language and domain independent Concept-to-Speech system, the generated descriptions are automatically provided with prosodic annotations indicating which words should be accented. The modifications to the Incremental Algorithm which we want to argue for are based on the idea that definite descriptions refer to the most salient element satisfying the descriptive content (Lewis 1979:348-350; see Krahmer 1998 for a formalization in dynamic semantics). Lewis mentions the following example (due to McCawley):

(2) The dog got in a fight with another dog.

¹ See e.g., Van Deemter & Odijk (1997), Theune et al. (1997), Klabbers et al. (1998).
Lewis notes that this statement can only be true in a domain which contains at least two dogs, which entails that the dog cannot be a distinguishing description. According to Lewis, (2) means that the most salient dog got in a fight with some less salient dog. Lewis does not mention descriptions which refer to 'unique' objects (i.e., distinguishing descriptions), but it is readily seen that they can also be understood in terms of salience: if there is only one object with the relevant properties, it has to be the most salient one. The outline of this paper is as follows. In section 2, Dale & Reiter's Incremental Algorithm is discussed, while in section 3 a modification of this algorithm is described which incorporates a notion of salience. It is shown that this paves the way for the context-sensitive generation of both distinguishing and anaphoric descriptions, while retaining the attractive properties of the original algorithm. In section 4, two ways of assigning salience weights to objects are discussed, one based on the hierarchical focusing constraints of Hajičová (1993), the other on the constraints of Centering Theory (Grosz et al. 1995). In section 5 we discuss the embedding of the modified algorithm in the aforementioned Language Generation Module (LGM). We end with some concluding remarks and pointers to future research.
2 THE INCREMENTAL ALGORITHM
Suppose, for the sake of illustration, that we have a domain D1 of objects consisting of the three entities d1, d2 and d3, together with a list of attribute-value pairs, or properties, of the respective entities (Dale & Reiter 1995:258).

d1: ⟨type, chihuahua⟩, ⟨size, small⟩, ⟨colour, black⟩
d2: ⟨type, chihuahua⟩, ⟨size, large⟩, ⟨colour, white⟩
d3: ⟨type, siamese cat⟩, ⟨size, small⟩, ⟨colour, black⟩

Following Dale & Reiter (1995:254) we make the following basic assumptions about domains in general: (i) each entity in the domain is characterized by a list of attribute-value pairs, (ii) each entity has at least an attribute 'type' and (iii) there may be a subsumption hierarchy on certain attributes, generally on the attribute 'type'. For instance: 'dog' subsumes 'chihuahua' and 'poodle', 'cat' subsumes 'siamese cat', and the type 'animal' in its turn subsumes 'dog' and 'cat'. Additionally, there can be a 'basic level value' (Rosch 1978) for a certain attribute. Following Dale & Reiter, we assume that the basic level values for 'chihuahua' and 'siamese cat' are 'dog' and 'cat' respectively. The algorithms discussed by Dale & Reiter all have a common task: collecting a number of properties which uniquely determine the intended referent, and doing so in a 'minimal' way. For example, the so-called Full Brevity algorithm, originally from Dale (1992), guarantees that its output is the shortest possible distinguishing description. However, this algorithm has a complexity which is NP-hard. Dale & Reiter discuss three alternative algorithms with a polynomial complexity, of which the Incremental Algorithm is computationally the least expensive, since it is independent of the number of attribute-value pairs which characterize an object d in the domain. The input for the Incremental Algorithm is an object r, a set C (the context set) consisting of alternative objects from which r has to be distinguished (the 'distractors') and, crucially, an ordered list of preferred attributes. This list contains, in order of preference, the attributes that human speakers and hearers prefer for a particular domain.
For instance, we may assume that a human speaker would first try to describe an animal by its 'type' (is it a dog? is it a cat?), and if that does not help, attributes like 'colour' and 'size' may be used. It is reasonable to assume that speakers have a general preference for absolute properties such as 'colour', which are easily observed without taking the other objects into account, over relative properties such as 'size', which are less easily observed and always require inspection of the distractors.² Thus let us assume that the list of preferred attributes for the example domain is ⟨type, colour, size, ...⟩. Essentially, the Incremental Algorithm goes through the list of preferred attributes, and for each attribute it encounters it finds the best value of this property. The best value of a property is the value closest to the basic level value such that none of the values it subsumes rules out more objects in the domain. If adding this best value to the already selected properties has the effect of ruling out any of the remaining distractors, it is included in the list of properties to be used in the generation of the distinguishing description. The algorithm stops when the list of distractors is empty (success) or when the end of the list of preferred attributes is reached (failure). A noteworthy feature is that there is no backtracking (hence the term 'incremental'): once a property p has been selected, it will be realised in the final description, even if a property which is added later would render the inclusion of p redundant 'with hindsight'. This aspect is partly responsible for the efficiency of the algorithm, but Dale & Reiter additionally claim that this property is 'psychologically realistic', since human speakers also often include redundant modifiers in their referring expressions (see e.g., Pechmann 1989).³ As noted in the introduction, if we want to use the Incremental Algorithm in a full-fledged Concept-to-Speech system we encounter a number of issues which need to be resolved. For instance, we do not want to restrict ourselves to distinguishing descriptions; we also want to be able to generate anaphoric descriptions. It would not be very attractive to define a separate algorithm for such anaphoric descriptions, which would operate alongside the Incremental Algorithm.
Fortunately, this is not necessary; it turns out that with some simple modifications we can use the Incremental Algorithm for both kinds of definite descriptions.
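To make the original algorithm concrete, here is a minimal executable sketch of Dale & Reiter's Incremental Algorithm over the toy domain D1 above. The representation of the subsumption hierarchy and all helper names (`subsumes`, `rules_out`, and the simplified two-level `find_best_value`) are our own illustration, not Dale & Reiter's code.

```python
# Sketch of Dale & Reiter's Incremental Algorithm over toy domain D1.
# Helper names and the simplified hierarchy are our own illustration.

DOMAIN = {
    "d1": {"type": "chihuahua", "size": "small", "colour": "black"},
    "d2": {"type": "chihuahua", "size": "large", "colour": "white"},
    "d3": {"type": "siamese cat", "size": "small", "colour": "black"},
}

# value -> more general value, encoding the 'type' subsumption hierarchy
PARENT = {"chihuahua": "dog", "poodle": "dog", "siamese cat": "cat",
          "dog": "animal", "cat": "animal"}
# basic level values in the sense of Rosch (1978)
BASIC_LEVEL = {"chihuahua": "dog", "siamese cat": "cat", "poodle": "dog"}

def subsumes(general, specific):
    """True iff `general` equals or subsumes `specific` in the hierarchy."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

def rules_out(attr, value, distractors):
    """The distractors that do NOT have the property (attr, value)."""
    return {d for d in distractors
            if not subsumes(value, DOMAIN[d][attr])}

def find_best_value(r, attr, distractors):
    """Start from the basic level value; specialise only if the more
    specific value rules out strictly more distractors (a two-level
    simplification of Dale & Reiter's recursive FindBestValue)."""
    specific = DOMAIN[r][attr]
    value = BASIC_LEVEL.get(specific, specific)
    if len(rules_out(attr, specific, distractors)) > \
       len(rules_out(attr, value, distractors)):
        value = specific
    return value

def make_referring_expression(r, preferred=("type", "colour", "size")):
    distractors = set(DOMAIN) - {r}
    description = []
    for attr in preferred:
        value = find_best_value(r, attr, distractors)
        ruled_out = rules_out(attr, value, distractors)
        if ruled_out:                 # property is discriminating: keep it
            description.append((attr, value))
            distractors -= ruled_out  # no backtracking afterwards
        if not distractors:           # success: r uniquely identified
            return description
    return None                       # failure

print(make_referring_expression("d2"))  # [('type', 'dog'), ('colour', 'white')]
```

Note how, for d2, the basic level value 'dog' is preferred over 'chihuahua' because the more specific value rules out no additional distractors, and 'colour' is then needed to exclude d1.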
3 AN ALTERNATIVE ALGORITHM BASED ON SALIENCE
The underlying idea of the modifications we want to make to the Incremental Algorithm is the following:

A definite description 'the N' is a suitable description of an object d in a state s iff d is the most salient object with the property expressed by N in state s.

Since the denotation of the N is an important factor, we use the notion of a value set. Let L be the list of properties expressed by some N. The value set of L in some domain D (notation ValD(L)) is the set of objects d ∈ D which have the properties expressed by L. Thus, ValD1({⟨size, small⟩}) = {d1, d3}. The domain subscript and the attribute are omitted when this does not lead to confusion; e.g., we write Val(small). By definition, the value set of the empty list of properties is the entire domain (ValD({}) = D). Following common practice, we use |S| to denote the cardinality of a set S.

² The literature on perception contains a wealth of empirical evidence for such general orderings; see, for instance, Pechmann (1989), Levelt (1989) and, more recently, Beun & Cremers (1998).
³ Dale & Reiter (1995:248): "For example, in a typical experiment a participant is shown a picture containing a white bird, a black cup, and a white cup and is asked to identify the white bird; in such cases, participants generally produce the referring expression the white bird, even though the simpler form the bird would have been sufficient." Yet, it seems to us that the Incremental Algorithm would produce the description the bird in this situation: if we make the natural assumption that 'type' is the most preferred attribute, the property 'bird' will be the first one selected and immediately rules out the black and the white cup. In general, it should be noted that Pechmann's notion of incrementality refers to speech production: the fact that speakers, when describing an object, start uttering properties of that object without making sure whether these are actually distinctive or not. Dale & Reiter's incrementality refers to the lack of backtracking in property selection; the order in which properties are selected is not related to the order in which properties are realized in speech.
How can we model the salience of an entity? For that purpose we use a function variable sw (salience weight), which per state represents a function mapping each element in the domain to a natural number. For the sake of simplicity, we shall assume that in the initial state, say the beginning of the generation process, all entities are minimally salient (represented as a zero salience level). Formally: ∀d ∈ D : sw0(d) = 0. When this can be done without creating confusion, the indices are suppressed. A central question is how an object increases in salience. Heim (1982:21), in a discussion of Lewis' notion of salience, offers the following possibility: "A necessary condition of an utterance of a sentence S to promote an object x to maximal salience is that S contain either an NP that refers to x or a singular indefinite whose predicate is true of x". Even if this is a necessary condition, it certainly is not a sufficient one. Many additional factors play an important role, e.g., whether the NP in question occupies the subject or the object position in the sentence, whether it occurs in the topic or the focus of the sentence, whether the NP is definite or indefinite, what the structure of the preceding text is, etc. Below, in section 4, we discuss and compare two methods for salience weight assignment; for the time being we simply assume that the salience weights are given. So let sws be the function assigning salience weights in state s; then we define that an object r is the most salient object having certain properties L in a state s (notation: MostSalient(r, L, s)) if, and only if, every object in Val(L) different from r has a lower salience weight in s than r itself. More formally: ∀d ∈ Val(L) : (d ≠ r → sws(d) < sws(r)). Let us now turn to the question of how the Incremental Algorithm can be generalized. Figure 1 contains our proposal for a modified algorithm in pseudo-code.
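The value-set and salience machinery just defined can be sketched directly in code. The following is our own illustration over domain D1; the names mirror the paper's Val, sw and MostSalient, but the representation (properties as attribute-value pairs, weights as a dictionary) is an assumption of ours.

```python
# Sketch of the value-set and salience definitions of section 3,
# over toy domain D1 (our own rendering).

DOMAIN = {
    "d1": {"type": "chihuahua", "size": "small", "colour": "black"},
    "d2": {"type": "chihuahua", "size": "large", "colour": "white"},
    "d3": {"type": "siamese cat", "size": "small", "colour": "black"},
}

def val(properties):
    """Val(L): the set of objects having every property in L.
    The empty list of properties denotes the whole domain."""
    return {d for d, attrs in DOMAIN.items()
            if all(attrs.get(a) == v for a, v in properties)}

# sw: the salience weight function for the current state;
# initially every entity is minimally salient (weight 0)
sw = {d: 0 for d in DOMAIN}

def most_salient(r, properties, sw):
    """MostSalient(r, L, s): r is in Val(L) and every other object in
    Val(L) has a strictly lower salience weight than r."""
    candidates = val(properties)
    return r in candidates and all(sw[d] < sw[r]
                                   for d in candidates if d != r)

print(sorted(val([("size", "small")])))               # ['d1', 'd3']
print(most_salient("d2", [("colour", "white")], sw))  # True: d2 is the only white object
```

The second call illustrates the observation made in the introduction: a unique satisfier of a property is trivially the most salient one, even when all weights are equal.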
We have stuck as closely as possible to the algorithm from Dale & Reiter (1995:257) to ease comparison.⁴ Below, we illustrate it with a number of examples. First, we give a general, somewhat informal overview. The algorithm is called by MakeReferringExpression(r, P, s); that is, we try to generate a definite description for a referent r, given some pre-defined list P of preferred attributes, in a state s. L is a list of the properties which have been selected for inclusion in the expression to be generated; it is initialized as the empty list. The variable tree contains the syntactic tree for the N under construction which corresponds with the current list of properties L. Finally, contrast is a boolean variable which indicates whether the property under consideration is contrastive or not. As in the original version, the main loop iterates through the list P of preferred attributes. For each property (attribute-value pair) on this list, the best value V is sought (essentially in the same way as done in the Incremental Algorithm). Once the best value V is found, it is checked whether adding this property to the list of already selected properties 'shrinks' the value set (and thus rules out one or more 'distractors'). If this is so, it is checked whether V is contrastive. The function Contrastive(r, A, V) checks whether the property under consideration serves to distinguish an object in some linguistic context C. C is defined as the set of objects/discourse referents (DRs) which are referred to in the previous sentence (PrevS) or in the sentence currently being generated (CurrS),⁵ and of which the basic level value of the attribute 'type' has the same parent as that of the intended referent r. A property V of r is considered to be contrastive if there is an element c ∈ C which has a different value for the current attribute.⁶ Subsequently, the algorithm tries to incorporate the property V in the N under construction, using the

⁴ For this same reason, we also adopted the notation employed by Dale & Reiter, which is essentially the WEB style from Knuth (1986:viiff).
⁵ Embedding the modified algorithm in LGM entails that the objects referred to in the previous discourse can be retrieved from the discourse model. Objects referred to in the current sentence are available from the syntactic template. See section 5.
⁶ Thus, loosely speaking, the property 'large' in the NP the large dog is contrastive in the context of a small cat, but not in the context of a small car. This treatment of contrast is related to the proposal of Prevost (1996), who presents an algorithm for deciding which properties should receive contrastive accent in a manner which is somewhat similar to the Incremental Algorithm. See Theune (1997) for some further discussion of Prevost's approach to contrast.
MakeReferringExpression(r, P, s)
    L ← {}
    tree ← nil
    contrast ← false
    for each member Ai of list P do
        V ← FindBestValue(r, Ai, BasicLevelValue(r, Ai), s)
        if |Val(L ∪ {⟨Ai, V⟩})| < |Val(L)|
        then contrast ← Contrastive(r, Ai, V)
             if (tree ← UpdateTree(tree, V, contrast)) ≠ nil
             then L ← L ∪ {⟨Ai, V⟩}
             endif
        endif
        if MostSalient(r, L, s) = true
        then if ⟨type, X⟩ ∈ L for some X
             then return [NP [Det the] tree]
             else V ← BasicLevelValue(r, type)
                  if (tree ← UpdateTree(tree, V, false)) ≠ nil
                  then return [NP [Det the] tree]
                  endif
             endif
        endif
    return failure

FindBestValue(r, A, initial-value, s)
    if UserKnows(r, A, initial-value) = true
    then value ← initial-value
    else value ← nil
    endif
    if MostSalient(r, {⟨A, value⟩}, s) = false
       ∧ (more-specific-value ← MoreSpecificValue(r, A, value)) ≠ nil
       ∧ (new-value ← FindBestValue(r, A, more-specific-value, s)) ≠ nil
       ∧ |Val({⟨A, new-value⟩})| < |Val({⟨A, value⟩})|
    then value ← new-value
    endif
    return value

MostSalient(r, L, s)
    if ∀d ∈ Val(L) : (d ≠ r → sws(d) < sws(r))
    then return true
    else return false
    endif

Contrastive(r, A, V)
    C ← {d ∈ DR(PrevS ∪ CurrS) | d ≠ r ∧ Parent(BasicLevelValue(d, type)) = Parent(BasicLevelValue(r, type))}
    if ∃d ∈ C : Value(d, A) ≠ V
    then return true
    else return false
    endif

Figure 1: Full sketch of the modified algorithm.
function UpdateTree(tree, V, contrast).⁷ If this does not succeed (the lexical or syntactic restrictions of the generation module make it impossible to express the property), V is rejected. If it does succeed, the current property is added to the list of selected properties. Then it is checked whether the intended referent r is the most salient object in the current state of the discourse which satisfies L. If so, the algorithm succeeds, and the constructed N is combined with the definite determiner the to produce a full definite description.

First example: non-anaphoric description. Reconsider our example domain D1. Suppose we start generating a monologue in the state s0. Thus, by assumption, all elements of the domain are equally salient. Now we want to generate an expression for d2. MakeReferringExpression(d2, P, s0) is called (assuming that P is ⟨type, colour, size, ...⟩). The list of properties L is initialized as {}, tree as nil and contrast as false.⁸ We consider the first property of d2, ⟨type, chihuahua⟩. The best value for this attribute is 'dog' (since Val(chihuahua) = Val(dog) = {d1, d2}). This property has sufficient descriptive content to be included in the description under construction: |Val(dog)| = 2 < |Val({})| = 3. As a result the function UpdateTree is called, which returns a simple tree consisting of an N with an N0 dog. The value of L is now {⟨type, dog⟩}. MostSalient(d2, dog, s0) fails (because d1 is also a dog, and d1 and d2 are equally salient by assumption). So we proceed by taking the second property of d2, ⟨colour, white⟩. Now we have |Val(white, dog)| < |Val(dog)|; this property is discriminating, and again we call the function UpdateTree, which updates the current N tree (which only contained the head noun dog) and adds an AP to this tree for the property 'white'. Now MostSalient(d2, {white, dog}, s0) is true: d2 is the only white dog in the domain D1, so it is by definition the most salient one. Since the type of d2 is present in the constructed N tree, the definite article is added to this tree, and the resulting tree for the white dog is returned. When this description is conveyed to the user, d2 increases in salience (for a discussion of salience weight assignment, see section 4), which makes d2 more salient than d1 and d3. Notice that when we assume that all entities in the domain are always equally salient (thus sw represents the constant function mapping each d ∈ D to some n), the modified algorithm selects exactly the same properties as the original Incremental Algorithm. In other words, our version truly generalizes the original, which brings us to the next example.

Second example: anaphoric description (1). Suppose we want to refer to d2 in a situation s′ where d2 is more salient than the other animals in the domain (e.g., because d2 was referred to in the previous sentence). We call MakeReferringExpression(d2, P, s′). The first property encountered is ⟨type, chihuahua⟩, and the best value for this attribute is 'dog'. Again |Val(dog)| < |Val({})|, hence the property 'dog' is selected. But now MostSalient(d2, dog, s′) is true: Val(dog) = {d1, d2}, and the only element in this value set different from d2 has a lower salience weight than d2. So the result is (a tree for) the dog. Thus: if d2 is more salient than all other dogs in the domain, we can subsequently use the description the dog to refer back to it. Of course, Dale & Reiter's version, being insensitive to differing contexts, would always express d2 using the properties 'dog' and 'white'.

⁷ A full description of this function is beyond the scope of this paper. It differs from the core algorithm in that it is, to a large extent, domain and language dependent. Essentially what it does is the following: it attempts to integrate each new property V in the syntactic tree for the N constructed so far. In general, the 'type' attribute is realised as the head noun, and further adjectives are added in the order of selection, provided the result is grammatically correct. If contrast is true, the expression of the property V is marked by a [+c] feature, which is taken into account by the AddProsody function of LGM discussed in section 5.
⁸ We assume, for the sake of simplicity, that the linguistic context is empty and that the current sentence contains only one reference to an element of the domain, which entails that contrast will remain false in this as well as the following two examples. In the fourth and final example, the effect of contrast is described in more detail.
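The two worked examples can be reproduced with a small executable sketch of the salience-modified algorithm. This is our own simplified rendering: UpdateTree, contrast marking and the syntactic tree are omitted, the description is rendered as a flat string, and the subsumption hierarchy is encoded directly; only the control flow of Figure 1 is retained.

```python
# Executable sketch of the salience-modified algorithm (Figure 1),
# reproducing the first two worked examples. Our own simplified
# rendering: no UpdateTree, no contrast, flat string output.

DOMAIN = {
    "d1": {"type": "chihuahua", "size": "small", "colour": "black"},
    "d2": {"type": "chihuahua", "size": "large", "colour": "white"},
    "d3": {"type": "siamese cat", "size": "small", "colour": "black"},
}
BASIC_LEVEL = {"chihuahua": "dog", "siamese cat": "cat"}
SUBSUMED = {"dog": {"dog", "chihuahua", "poodle"},
            "cat": {"cat", "siamese cat"}}

def val(props):
    def has(d, a, v):
        actual = DOMAIN[d][a]
        return actual == v or actual in SUBSUMED.get(v, set())
    return {d for d in DOMAIN if all(has(d, a, v) for a, v in props)}

def most_salient(r, props, sw):
    cands = val(props)
    return r in cands and all(sw[d] < sw[r] for d in cands if d != r)

def find_best_value(r, attr, sw):
    """Prefer the basic level value; specialise only when r is not yet
    the most salient object with that value and the specific value has
    a strictly smaller value set."""
    specific = DOMAIN[r][attr]
    value = BASIC_LEVEL.get(specific, specific)
    if (not most_salient(r, [(attr, value)], sw)
            and len(val([(attr, specific)])) < len(val([(attr, value)]))):
        value = specific
    return value

def make_referring_expression(r, sw, preferred=("type", "colour", "size")):
    L = []
    for attr in preferred:
        v = find_best_value(r, attr, sw)
        if len(val(L + [(attr, v)])) < len(val(L)):  # shrinks the value set
            L.append((attr, v))
        if most_salient(r, L, sw):
            # 'type' is selected first here, so reversing the selection
            # order puts the adjectives before the head noun
            return "the " + " ".join(v for a, v in reversed(L))
    return None  # failure

# First example: all entities equally salient -> full description
print(make_referring_expression("d2", {"d1": 0, "d2": 0, "d3": 0}))   # the white dog
# Second example: d2 already salient -> reduced anaphoric description
print(make_referring_expression("d2", {"d1": 0, "d2": 10, "d3": 0}))  # the dog
```

With a constant salience function the sketch behaves like the original Incremental Algorithm, as claimed in the first example; raising the weight of d2 yields the reduced description of the second example.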
Third example: anaphoric description (2). An anaphoric description generally contains less information than its antecedent. In the previous example this was reflected by the omission of the property 'white' when d2 was referred to a second time. But it may also happen that a more general head noun is used for subsequent reference. To show how the modified algorithm captures this intuition, let us add a (small, white) poodle to domain D1 and refer to this new domain as D2.

d1: ⟨type, chihuahua⟩, ⟨size, small⟩, ⟨colour, black⟩
d2: ⟨type, chihuahua⟩, ⟨size, large⟩, ⟨colour, white⟩
d3: ⟨type, siamese cat⟩, ⟨size, small⟩, ⟨colour, black⟩
d4: ⟨type, poodle⟩, ⟨size, small⟩, ⟨colour, white⟩

If we call MakeReferringExpression(d2, P, s) in a situation s where all elements of D2 have an equal salience weight, the result will now not be the white dog, as in the first example, but the white chihuahua, since in this case FindBestValue(d2, type, dog, s) will return 'chihuahua' (since |ValD2(chihuahua)| < |ValD2(dog)|). Now consider the same situation as in the previous example, where d2 is more salient than the other elements of D2, and a subsequent call of MakeReferringExpression(d2, P, s′) occurs. The BasicLevelValue for d2 with attribute 'type' is 'dog', and since d2 is the currently most salient dog in the domain, this basic level value is the best value as well. Thus, initial reference to d2 (in domain D2) yields the white chihuahua, and subsequent reference the dog (on the assumption that the initial reference is not accompanied by references to other dogs).⁹ Dale & Reiter's algorithm would always express d2 (for domain D2) using the properties 'white' and 'chihuahua'.
Fourth example: contrast. Now consider calling the function MakeReferringExpression(d2, P, s) (again for domain D2) when the preceding sentence referred to both d1 and d2 (e.g., 'the black chihuahua and the white chihuahua ...'). Then the first property added to the description under construction will be 'chihuahua' (as the reader may check for herself). Now we consider the second property, 'white'. This property will also be added to the description, since it distinguishes d2 from d1. Moreover, this property is also contrastive, since d1 and d2 have the same basic level value for the 'type' attribute and different values for the 'colour' attribute. The result is 'the white[+c] chihuahua' (where [+c] is a feature indicating a contrast relation; these and other features are used in LGM for the assignment of pitch accents, see below).

Summarizing. We have discussed a generalization of the Incremental Algorithm which differs from the original version in that the discourse context is taken into account, success is defined in terms of salience, and the output is a syntactic NP tree with markers for contrastiveness. We have seen that this generalization entails that creating a referring expression for one object (d2) can result in, e.g., the white chihuahua (in an 'empty' context), the dog (in a context like the white chihuahua ...), the white[+c] chihuahua (sample context: the black chihuahua and the white chihuahua ...), or the dog[+c] (sample context: the white chihuahua and the cat ...). This means that we have resolved some of the issues mentioned in the introduction which complicated the usage of Dale & Reiter's algorithm in a Concept-to-Speech system (context-sensitivity, natural language output). In section 5 we discuss how the modified algorithm can be embedded in LGM, and how, as a side-effect of this embedding, the referring expressions are 'dressed up' with prosodic information. First, however, let us discuss the assignment of salience weights.

⁹ The reader is invited to check that when both d2 and d4 are referred to in the previous sentence and are equally salient (the white chihuahua and the poodle ...), a subsequent call of MakeReferringExpression does not produce the dog but the chihuahua (with a contrastive accent on the head noun, see below).
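The contrast check used in the fourth example can be sketched as follows. This is our own rendering of the Contrastive function: the set of mentioned discourse referents is supplied by hand rather than retrieved from LGM's discourse model, and the hierarchy is hard-coded for domain D2.

```python
# Sketch of Contrastive(r, A, V): a property is contrastive iff some
# other mentioned entity of a comparable type (same parent of the
# basic level type value) has a different value for the attribute.
# Our own rendering over domain D2; mentioned referents given by hand.

DOMAIN = {
    "d1": {"type": "chihuahua", "colour": "black"},
    "d2": {"type": "chihuahua", "colour": "white"},
    "d3": {"type": "siamese cat", "colour": "black"},
    "d4": {"type": "poodle", "colour": "white"},
}
BASIC_LEVEL = {"chihuahua": "dog", "siamese cat": "cat", "poodle": "dog"}
PARENT = {"dog": "animal", "cat": "animal"}

def contrastive(r, attr, value, mentioned):
    """True iff some entity in `mentioned`, other than r, whose basic
    level type has the same parent as r's, differs from r on `attr`."""
    parent_r = PARENT[BASIC_LEVEL[DOMAIN[r]["type"]]]
    comparable = {d for d in mentioned
                  if d != r
                  and PARENT[BASIC_LEVEL[DOMAIN[d]["type"]]] == parent_r}
    return any(DOMAIN[d][attr] != value for d in comparable)

# 'the black chihuahua and the white chihuahua ...': 'white' is contrastive
print(contrastive("d2", "colour", "white", mentioned={"d1", "d2"}))  # True
# no comparable competitor mentioned: no contrast
print(contrastive("d2", "colour", "white", mentioned={"d2"}))        # False
```

This mirrors footnote 6: 'large' in the large dog would come out contrastive in the context of a small cat (same parent 'animal'), but not in the context of a small car.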
4 DETERMINING SALIENCE WEIGHTS

In this section we discuss and compare two different methods to assign salience weights, one based on the hierarchical focusing constraints of Hajičová (1993), the other on the Centering approach of Grosz et al. (1995). Our intention is not so much to argue in favour of one of these approaches. Rather, we want to show that the modified algorithm can be associated with at least two realistic ways of determining salience weights.
4.1 Hajičová: Hierarchical Focus Constraints

The general goal of Hajičová and co-workers is to account for the relations between the changes of salience and the choices of referring expressions in discourse. The work is couched in Sgall's Functional Generative Description of language (Sgall et al. 1986). Part of this Description is the Stock of Shared Knowledge (i.e., the common ground). This common ground is argued to be of a dynamic character: as the discourse progresses, certain entities receive a higher degree of activation. To model the degrees of activation, objects are assigned natural numbers. Let us cast this approach in terms of our salience weight assignment function. The salience function given below closely corresponds to the rules given by Hajičová (1993), except that in our version entities are mapped to a higher number as they become more salient, with a maximum of 10 for the most salient entities, whereas in Hajičová (1993) entities are assigned a lower number as they become more salient (maximal salience being indicated by a zero). As a consequence, the Praguian approach allows for an infinite decrease in salience weight, and thus keeps track of objects which have faded from the discourse long ago. In our approach, the salience weight of an entity cannot decrease below zero.

DEFINITION 1 (Salience Weight assignment based on Hajičová (1993))
Let swi be some total function mapping the objects in the domain D to the set {0, ..., 10}. Let Uk be a sentence uttered in the context of swi, let topic(Uk) ⊆ D and focus(Uk) ⊆ D be the sets of entities which are referred to in the topic and the focus part of Uk respectively. Then the new salience function swi+1 is defined as follows:

swi+1(d) =
    10                  if d ∈ focus(Uk)
    9                   if d ∈ topic(Uk) and d is referred to by a definite NP
    swi(d)              if d ∈ topic(Uk) and d is referred to by a pronoun
    max(0, swi(d) − 1)  if d ∉ topic(Uk) ∪ focus(Uk) and d ∈ topic(Ul) for some l < k
    max(0, swi(d) − 2)  otherwise
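Definition 1 can be sketched as a state update function. The rendering below is our own: topic/focus membership, the form of reference, and the record of earlier topics are supplied by hand rather than computed from the grammar. The example run mirrors the first two sentences of example (4) in section 4.3 (John and the two dogs he bought).

```python
# Sketch of the Hajičová-based salience update (Definition 1).
# Our own rendering; the linguistic analysis of each utterance
# (topic, focus, NP form, earlier topics) is given by hand.

def update_hajicova(sw, topic, focus, form, ever_topical):
    """Map an old salience function sw to a new one. `form` gives the
    form of reference ('def_np' or 'pronoun') for topical entities;
    `ever_topical` is the set of entities in some earlier topic."""
    new = {}
    for d, w in sw.items():
        if d in focus:
            new[d] = 10                    # mentioned in focus: maximal
        elif d in topic and form.get(d) == "def_np":
            new[d] = 9                     # definite NP in topic
        elif d in topic and form.get(d) == "pronoun":
            new[d] = w                     # pronoun in topic: unchanged
        elif d in ever_topical:
            new[d] = max(0, w - 1)         # unmentioned, once topical
        else:
            new[d] = max(0, w - 2)         # unmentioned, never topical
    return new

# (4.a) John bought the sausage dog (d45) and the poodle (d53):
# the two dogs are introduced in the focus, John sits in the topic.
sw = {"john": 0, "d45": 0, "d53": 0}
sw = update_hajicova(sw, topic={"john"}, focus={"d45", "d53"},
                     form={"john": "def_np"}, ever_topical=set())
print(sw)  # {'john': 9, 'd45': 10, 'd53': 10}
# (4.b) The sausage dog was a bargain.
sw = update_hajicova(sw, topic={"d45"}, focus=set(),
                     form={"d45": "def_np"}, ever_topical={"john"})
print(sw)  # {'john': 8, 'd45': 9, 'd53': 8}
```

Note how the two decay rates come apart in the second step: John (once topical) loses one point, while d53 (never topical) loses two.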
4.2 Grosz et al.: Centering Theory

In Centering Theory, the forward looking centers of an utterance are ranked, primarily by grammatical function: subject > object > other. The salience weight assignment we describe here is based on the ranking of the forward looking centers of an utterance.

DEFINITION 2 (Salience Weight assignment: the Centering approach)
Let swi be some total function mapping the objects in the domain D to {0, ..., 10}. Let Uk be a sentence uttered in the context of swi, in which reference is made to {d1, ..., dn} ⊆ D. Let Cf(Uk) (the forward looking centers of Uk) be a partial order defined over {d1, ..., dn} ⊆ D. Then the new salience function swi+1 is defined as follows:

swi+1(d) =
    0    if d ∉ Cf(Uk)
    n    otherwise, where n = level(d, Cf(Uk))

level(d, Cf(Uk)) refers to the level of the occurrence of d in the ordering Cf(Uk), defined in such a way that the highest element(s) in the ordering are mapped to 10, the element(s) immediately below are mapped to 9, etc. It is worth noticing that swi+1 is determined completely independently of swi. This is in line with the Centering claim that only the previous sentence is relevant for determining whether the current sentence is a coherent continuation in general, and for the choice of referring expressions in particular. Of course, this is not to say that swi and swi+1 are completely unrelated. In fact, one of the central claims of Centering Theory, alluded to above, concerns the relation between swi and swi+1, and can roughly be stated as follows: the 'tighter' the connection between swi and swi+1, the more coherent the sequence of utterances will be perceived to be. For more details we refer to Grosz et al. (1995) and the references cited there.
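Definition 2 is even simpler to sketch, since the new weights do not depend on the old ones. In the rendering below (our own), the forward looking centers are given as a list of rank levels, highest first; entities sharing a level appear in the same inner list.

```python
# Sketch of the Centering-based salience update (Definition 2).
# Cf is a list of rank levels, highest first (our own encoding).

def update_centering(domain, cf):
    """sw(d) = 0 for entities outside Cf(Uk); otherwise the highest
    level maps to 10, the next to 9, and so on. Note that the result
    is independent of the previous salience function."""
    new = {d: 0 for d in domain}
    for level, group in enumerate(cf):
        for d in group:
            new[d] = 10 - level
    return new

# 'It(d2) barked at the white chihuahua (d1)': in Cf, the subject
# referent d2 outranks the object referent d1
sw = update_centering({"d1", "d2", "d3", "d4"}, cf=[["d2"], ["d1"]])
print(sw["d2"], sw["d1"], sw["d3"])  # 10 9 0
```

This reproduces the swc values given for sentence (3.c) in the next subsection: the pronominal subject stays maximally salient, the newly introduced object referent lands one level below, and unmentioned entities drop to zero.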
4.3 Examples, Predictions and Comparison

Here we shall not make a detailed comparison between the hierarchical focus constraints of Hajičová and the Centering Theory of Grosz et al., as this falls well beyond the scope of the current paper.¹² Rather, we discuss two illustrative examples in some detail, with special emphasis on the repercussions of choosing one particular way of assigning salience weights for the context-sensitive, incremental generation of referring expressions. Suppose the domain of discourse is D2. We use swh to indicate the salience weights as assigned by the Hajičová-based approach (definition 1), while swc gives the salience weights based on the Centering approach (definition 2). For the sake of simplicity, we only keep track of the individuals in the domain and only list values which are not zero. Indices on words refer to entities in the domain.

¹² The interested reader is referred to Kruijff-Korbayová and Hajičová (1997).
In all sentences, the focus is underlined. Let us first consider a variant of an example discussed in Grosz et al. (1995:217).

(3) a. The2 black chihuahua was angry.
       swh(d2) = 9;  swc(d2) = 10
    b. It2 could not find its new bone anywhere.
       swh(d2) = 9;  swc(d2) = 10
    c. It2 barked at the1 white chihuahua.
       swh(d2) = 9, swh(d1) = 10;  swc(d2) = 10, swc(d1) = 9
    d. {The1 dog (H) / The1 white dog (C)} has been acting quite odd recently.
       swh(d2) = 8, swh(d1) = 10;  swc(d1) = 10
Let us discuss this example twice and assume that the definite descriptions are generated by the modified Incremental Algorithm. First we take the Praguian point of view, that is: the algorithm uses swh as defined above. In (3.a) the modified Incremental Algorithm produces the description the black chihuahua to refer to d2 . Thereby, d2 becomes the most salient object, with the highest-but-one degree of salience, as it is introduced in the topic of the sentence. In the next sentence, reference to d2 is made by a pronoun. Since this pronoun occurs in the topic of (3.b) the salience weight of d2 is unchanged. In (3.c), d2 is again referred to by a pronoun, and the d1 is introduced to the discourse using the description the white chihuahua. Since this introduction is done in the focus of the sentence, d1 rises to maximal salience. When a reference to d1 is made in the next sentence, pronominalization is out: even though d1 is the most salient object in the entire domain D2 , the subject position of the previous utterance is occupied by a different object which would also count as a suitable antecedent for a pronoun it, namely d2 . Hence a definite description is constructed, and in this case the result is the dog, as d1 is the most currently most salient dog. This result, though marginally acceptable, is not what we want.13 A simple solution to this problem, and one which is also suggested in KruijffKorbayov´a and Haji˘cov´a (1997:41), is to ignore small differences in activation degree between competitors. In our terminology, this would amount to re-defining MostSalient(r; L; s) in such a way that this condition is met if, and only if, every object in Val(L) different from r has a salience weight in s which is at least, say, two points lower than the salience weight in s of r itself. Let us now discuss the example (3) from the Centering perspective. 
In the first sentence, the modified Incremental Algorithm creates the definite description the black chihuahua to refer to d2. After the first sentence has been uttered, we find that Cf(3.a) is the singleton set containing d2, and as a result this object is now the most salient object. In both (3.b) and (3.c) we refer to d2 with a pronoun. In the latter sentence we also refer to d1; this object is again introduced by the definite description the white chihuahua. The set of forward looking centers of (3.c) now contains both d1 and d2, and since d2 is referred to in subject position, it is ranked higher than d1. Consequently, to refer to d1 in (3.d) the modified Incremental Algorithm correctly produces the white dog.

13 It should be noted that for other examples a similar result may be obtained when using swc. Consider for instance: The2 white chihuahua chases the4 poodle. According to definition 1 (swh), the poodle (d4) is more salient than the chihuahua when this sentence has been uttered, since d4 is introduced in the focus. According to definition 2 (swc), by contrast, the white chihuahua (d2) is more salient, since it is referred to by the subject. This means that when the modified algorithm uses swh, the generated description for d4 in a consecutive sentence is the dog (as with example (3) above), and similarly, when the modified algorithm uses swc, the generated description for d2 in a consecutive sentence is also the dog.

Now consider the following example. Suppose John went to a dog-fair (thus, the domain of discourse, which we shall not list here, contains hundreds of dogs of all kinds, sizes and colours) and bought two dogs.

(4)
a. John bought the45 large black long-haired sausage dog and the53 small grey pygmy poodle with the perm wave.
   swh(john) = 9; swh(d45) = 10; swh(d53) = 10
   swc(john) = 10; swc(d45) = 9; swc(d53) = 9
b. The45 sausage dog was a bargain.
   swh(john) = 8; swh(d45) = 9; swh(d53) = 8
   swc(d45) = 10
c. It45 cost only $25.
   swh(john) = 7; swh(d45) = 9; swh(d53) = 6
   swc(d45) = 10
d. {The53 poodle (H) / The53 small grey pygmy poodle with the perm wave (C)} was much more expensive though.
   swh(john) = 6; swh(d45) = 8; swh(d53) = 9
   swc(d53) = 10
Again, we discuss this example from two perspectives, beginning with the Praguian view. In the first sentence, two entities are introduced besides john: the dogs d45 and d53, for which the respective descriptions the large black long-haired sausage dog and the small grey pygmy poodle with the perm wave are generated. Since both dogs are introduced in the focus of the sentence, they are promoted to maximal salience, and John, being introduced in the topic, is promoted to the one-but-maximal salience level. In (4.b), d45 is referred to. Pronominalization is not possible (the first sentence contains two equally salient potential antecedents for a pronoun), hence a description is returned, and since d45 is the currently most salient sausage dog, the description produced by the modified Incremental Algorithm is the sausage dog. Since d45 is now part of the topic, and referred to by a definite description, it has a salience level of 9. Since they are not mentioned in sentence (4.b), both john and d53 receive a lower salience weight (and notice that the salience weight of d53 reduces faster than that of john, since the former but not the latter was introduced in the focus of (4.a)). In (4.c) we once again refer to d45, and now we pronominalize. After this sentence has been uttered, d45 is still the most salient object, and both john and d53 have a lower salience weight, of 7 and 6 respectively. Now, in (4.d) we want to refer to d53. Obviously, pronominalized reference is out of the question, so a definite description is constructed. Since d53 is the most salient poodle at this stage, the generated description is the poodle.

Looking at this example from a Centering perspective we find something rather different. For the first sentence, the generation process does not change; the only difference with the Praguian approach is that on the Centering account it is john who is assigned the highest salience weight.
For the generation process of the second sentence this makes no difference: d45 is still the most salient sausage dog and the modified Incremental Algorithm yields the sausage dog. d45 is the only object referred to in (4.b), hence it is the only forward looking center of this utterance, and thus trivially the most salient object in the domain. The salience weight of d53, on the other hand, reduces to zero. The pronominal reference to d45 in (4.c) does not change the picture, so that when we refer to d53
in (4.d), the modified Incremental Algorithm again produces the description the small grey pygmy poodle with the perm wave, just as it did for the first mention of this dog in (4.a). In our opinion, this shows that the Centering restriction that only the last sentence is relevant for determining the forward looking centers and the associated salience levels is too strong from the perspective of definite descriptions. Again, there is an obvious way to remedy this shortcoming: we have to take the structure of the discourse into account (see e.g. Walker 1997 for a proposal to this effect).

In sum: we have seen that the two approaches rest on different assumptions and are formalized in different ways. Probably the most important distinctions are (i) the Praguian assumption that information introduced in the focus of an utterance has a somewhat higher salience weight than information introduced in the topic, as opposed to the Centering assumption that information referred to in subject position is more salient than information referred to in object position,14 and (ii) the Centering assumption that only objects mentioned in the previous sentence can have a non-zero salience weight, while in the Praguian approach the determination of salience weights is not restricted to the previous sentence. We have discussed two examples, each of which provides an illustration of one of these differences. In the first case, the Centering approach yields better results, while in the second case, it is the Hajičová way of determining salience weights which pays off. However, for both problems there are relatively straightforward solutions. In any case, it seems safe to conclude that there are at least two principled ways to determine salience weights to be used in the modified algorithm.
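To make the contrast concrete, the two weighting schemes can be simulated on example (4) in a few lines of Python. The update rules below are our own simplifying reconstruction from the weights given above (focus mentions get the maximal weight 10, topical descriptions get 9, topical pronouns leave the weight unchanged, unmentioned entities decay faster when they have never been topical; the Centering side keeps only the forward looking centers of the last utterance, subject first). They are illustrative, not the official definitions of Hajičová or Grosz et al.

```python
MAX = 10

def update_swh(sw, ever_topical, mentions):
    """One Praguian-style update step. `mentions` maps each entity
    mentioned in the current utterance to a (sentence_part, form) pair,
    e.g. ('topic', 'pronoun'). Unmentioned entities decay: by 1 if they
    have ever been topical, by 2 otherwise (our reading of example (4))."""
    for e in list(sw):
        if e in mentions:
            part, form = mentions[e]
            if part == 'focus':
                sw[e] = MAX
            elif form == 'description':
                sw[e] = MAX - 1
            if part == 'topic':
                ever_topical.add(e)
        else:
            sw[e] = max(0, sw[e] - (1 if e in ever_topical else 2))

def update_swc(sw, mentions):
    """One Centering-style update step: only the forward looking centers
    of the current utterance keep a non-zero weight, subject first."""
    for e in list(sw):
        if e not in mentions:
            sw[e] = 0
        else:
            sw[e] = MAX if mentions[e] == 'subject' else MAX - 1

# (4.a) John bought the sausage dog (d45) and the poodle (d53)...
swh, topical = {'john': 0, 'd45': 0, 'd53': 0}, set()
update_swh(swh, topical, {'john': ('topic', 'description'),
                          'd45': ('focus', 'description'),
                          'd53': ('focus', 'description')})
# (4.b) The sausage dog was a bargain.
update_swh(swh, topical, {'d45': ('topic', 'description')})
print(swh)  # {'john': 8, 'd45': 9, 'd53': 8}, as in the text

swc = {'john': 0, 'd45': 0, 'd53': 0}
update_swc(swc, {'john': 'subject', 'd45': 'object', 'd53': 'object'})
update_swc(swc, {'d45': 'subject'})
print(swc)  # {'john': 0, 'd45': 10, 'd53': 0}: only the last sentence counts
```

The two traces reproduce the weights listed under (4.b) above and illustrate why, on the Centering side, everything outside the previous utterance drops to zero.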
In the next and final section, we show how the resulting algorithm can be embedded in a general Concept-to-Speech system called LGM, and how as a side-effect of this embedding the natural language expressions are dressed up with prosodic features.
5 LGM
LGM is a generic language generation module for use in Concept-to-Speech systems. Starting point for generation in LGM is a typed data structure representing the information to be expressed in natural language. The output is a so-called enriched text, i.e., a text which is annotated with prosodic markers, indicating the placement of pitch accents and intonation boundaries. This enriched text can be fed into a speech synthesis module, which makes use of the information from the prosodic markers. Text generation within LGM is done using syntactic templates. A syntactic template consists of (among other things) the syntactic tree of a phrase, containing at least one lexical item and a number of open slots to be filled by other syntactic trees. A simplified example is given in Figure 2.15 The syntactic trees in the templates serve several purposes. First, they are used to check whether a potential sentence violates the Binding Theory. If this is the case, the sentence will be rejected. Second, knowledge of the syntactic structure of phrases is used to generate elliptic constructions (such as the black and the white chihuahua). Third, the syntactic structure of a sentence plays a role in determining the placement of accents within the sentence. Van Deemter & Odijk (1997), Theune et al. (1997a) and Klabbers et al. (1998) provide detailed descriptions of how syntactic templates are selected, filled and ordered to generate a well-formed natural language text. Here we restrict ourselves to illustrating those aspects of the generation process which are relevant for this paper. We do this using an example, starting from the point when a specific template has been selected. The domain is D2. Let us assume that the sentence to be generated is the second one of a text, and that the first sentence was The white chihuahua bites the cat, where the white chihuahua refers to d2 and the cat refers to d3. The discourse model maintained by LGM accordingly contains references to objects d2 and d3. Now we wish to express the fact that d2 is angry. This can be done using the simple template shown on the left in Figure 2. The basic condition on the use of this template is that the data structure contains the information that there is an entity which is angry, and this condition is met by d2. The open slot in the template, indicated by angled brackets, must be filled with an expression referring to d2, which is generated by the Express function that is called in the template. When the example template has been selected to be used for the generation of a sentence, the function ApplyTemplate, shown (simplified and in pseudo-code) in Figure 3, is called. ApplyTemplate in turn calls the function FillSlots to obtain the set of all possible trees that can be generated from the template, using all possible combinations of slot fillings generated by the Express function.

14 In most cases, but not always, the subject of a sentence refers to topical information. In those cases, the approaches we have discussed make different predictions.

15 There is a certain resemblance with Lexicalized TAG (LTAG), which is a combination of Joshi's TAG with Lexical Grammar. This issue is discussed in further detail in Klabbers et al. (1998). For the use of LTAG trees for generation see Stone & Doran (1997).

[S [NP ⟨FROM Express(x, P, nom, s)⟩ ] [VP [V is ] angry ] ]

Express(r, P, case, s)
    PN, PR, RE ← nil
    trees ← {}
    PN ← MakeProperName(r)
    PR ← MakePronoun(r, case)
    RE ← MakeReferringExpression(r, P, s)
    trees ← PN ∪ PR ∪ RE
    return trees
Figure 2: Example template (left) and Express function (right)

The MakeReferringExpression function, discussed in the previous section, can be embedded in LGM as part of the Express function, shown in pseudo-code on the right in Figure 2. This function has as input the entity to be expressed, the list of preferred attributes P, the case of the NP to be generated and the current state. The output of the function is the set of all possible expressions (in the form of trees) that can be found for the entity. The function MakeProperName returns the name of the entity, if it has one. The function MakePronoun returns a pronoun with the appropriate case, gender and number, and MakeReferringExpression returns a definite description created in the manner described above. The output of all these functions is gathered together and returned by the Express function. For the entity d2, assuming it has no name, the Express function will return trees for the following NPs: {it, the dog[+c]}. The definite description the dog can be used as d2 is the currently most salient dog in the domain. The property 'dog' has been marked as contrastive because d2 and d3 are both animals, but with a different value for their 'type' attribute. Given this set, FillSlots returns the set (of trees for) {It is angry, The dog[+c] is angry}. Now it is checked for each tree (1) whether it does not violate the Binding Theory and (2) whether updating the old discourse model with (information from) the new tree results in a well-formed discourse model. The trees which do not pass these tests are filtered out. From the remaining ones, the function PickAny chooses one at random. Its prosodic properties are computed and the discourse model is updated. Finally, the fringe of the tree (= the annotated sentence) is sent to the Speech Generation Module to be pronounced.
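The slot-filling and filtering pipeline just described might be sketched as follows. The function names echo LGM's (Express, ApplyTemplate, PickAny), but the bodies are drastically simplified stand-ins: strings instead of syntactic trees, and a discourse-model check that only tests pronoun ambiguity.

```python
import random

def express(entity, dm):
    """Return the candidate NPs for `entity`: a pronoun plus a definite
    description (proper names are omitted for brevity)."""
    return ['it', dm['descriptions'][entity]]

def wellformed(np, dm):
    """A pronoun only passes if the discourse model supplies a unique
    antecedent; definite descriptions always pass this toy test."""
    if np == 'it':
        return len(dm['referents']) == 1
    return True

def apply_template(template, entity, dm):
    """Fill the open slot with every candidate NP, filter out fillings
    that would break the discourse model, and pick a survivor at random."""
    candidates = [template.format(np=np)
                  for np in express(entity, dm) if wellformed(np, dm)]
    return random.choice(candidates) if candidates else None

# d2 and d3 are both in the discourse model, so 'it' would be ambiguous
# and only the definite description survives the filter.
dm = {'referents': {'d2', 'd3'}, 'descriptions': {'d2': 'the dog'}}
print(apply_template('{np} is angry', 'd2', dm))  # the dog is angry
```

The random choice only matters when several candidates survive; here the filter leaves a single tree, just as in the worked example below.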
In the example, both trees returned by FillSlots obey the Binding Theory, since both the pronoun it and the R-expression the dog are 'free'. However, the first tree cannot be used to update the current discourse model, since this contains no unique antecedent for the pronoun it: this pronoun might refer to either d2 or d3 and would therefore be ambiguous. As a consequence, only the second tree is applicable and PickAny can only select this tree for generation.

ApplyTemplate(template)
    all_trees, allowed_trees ← {}
    chosen_tree, final_tree, sentence ← nil
    all_trees ← FillSlots(template)
    for each member ti of all_trees do
        if ViolateBT(ti) = false ∧ Wellformed(UpdateDiscourseModel(ti)) = true
        then allowed_trees ← allowed_trees ∪ ti
        endif
    if allowed_trees = {} then
        return false
    else
        chosen_tree ← PickAny(allowed_trees)
        UpdateContext(chosen_tree)
        final_tree ← AddProsody(chosen_tree)
        sentence ← Fringe(final_tree)
        Pronounce(sentence)
        return true
    endif

Figure 3: The function ApplyTemplate

On the basis of this tree, the discourse model is updated by, among other things, adding a new referent for d2 to it, and coindexing it with its antecedent. As part of the process, the salience weight of the entity is increased. Then prosodic markers are added to the sentence by the function AddProsody. For more details about this function we refer to Theune et al. (1997b); here we only give an informal overview. First, pitch accents are assigned. A word or phrase may be accented if it was marked as contrastive in the MakeReferringExpression algorithm, or if it expresses new information. Basically, all information that is not present in, or derivable from, the discourse model is regarded as being 'new'. All other information is treated as 'given'; such information is not accented unless it is contrastive. In our example sentence (The dog[+c] is angry), the VP is angry expresses new information and must receive an accent. By default, the accent lands on the rightmost element of a phrase, in this case the adjective angry. The phrase the dog represents given information, as it refers to d2, which has already been referred to in the preceding sentence. Additionally, the word dog is regarded as expressing given information since the concept 'dog' has been invoked by the previous mention of the word chihuahua. However, despite its 'double' givenness, AddProsody still assigns a pitch accent to the word dog to indicate its contrastiveness. After accentuation has been finished, intonational boundaries are added to the resulting tree. In our example, the syntactic structure does not motivate the addition of boundaries. Now the sentence can be sent to the speech output module to be pronounced, and together with the first sentence it forms the following small discourse (accents are indicated using small caps, boundaries by '/'):

(5)
a. The WHITE CHIHUAHUA / bites the CAT.
b. The DOG is ANGRY.

6 CONCLUDING REMARKS
In this paper we have discussed a generalization of Dale & Reiter's Incremental Algorithm which differs from the original version in that the linguistic context is taken into account, success is defined in terms of salience, and the output is a syntactic NP tree with markers for contrastiveness. Moreover,
we have shown how the modified algorithm can be embedded in a Concept-to-Speech system (LGM) which adds prosodic annotations to the generated definite descriptions. As we have seen, our version of the Incremental Algorithm is a real generalization of the Dale & Reiter version, of which the attractive features have been retained. First, the modified algorithm can be shown to have the same theoretical complexity as Dale & Reiter's (polynomial), and in general it requires even less computational effort, since the use of context enables us to reduce the number of attributes mentioned in the final referring expression. Second, even though we are reluctant to claim psychological reality, the modified algorithm might be argued to be at least as 'psychologically plausible' as Dale & Reiter's original, since it does more justice to the observation, illustrated by example (1), that the creation of a referring expression is co-determined by the linguistic context.

We certainly do not wish to claim psychological reality for LGM. In particular, the 'generate-and-test' strategy and making a random choice from the suitable sentences is probably not the way humans do it. For instance, when an unambiguous reference is possible, LGM still generates both a definite description and a pronoun and makes a random choice between the two, whereas humans appear to prefer a pronoun whenever possible. Abandoning this random choice between different types of NPs may increase the coherence of the generated texts, but this will happen at the expense of variation. In any case, LGM is fast. Due to the fact that LGM is coded in Elegant, a high-level language based on C which employs lazy evaluation,16 the system operates in real time (even though it is difficult to estimate the overall theoretical complexity of such a complex system).

There are a number of issues for future research related to our version of the Incremental Algorithm.
As Dale & Reiter (1995:256) themselves point out, it would be interesting to further generalize the algorithm by applying it to indefinite and quantificational noun phrases. Similarly, we feel that the pronominalization decision can also be taken within the algorithm. Of course, both the work of Hajičová and colleagues and the work of Grosz et al. provide excellent starting points for such a more subtle pronominalization strategy, which could also replace LGM's pronominalization method. Moreover, the algorithm could be further improved by taking semantic information into account. One possibility, which we hope to explore in the future, would be the simultaneous construction of a semantic representation for the N in the usual compositional fashion, which would simplify the algorithm, while at the same time doing more justice to the semantic complexities of adjectives.
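Returning briefly to the implementation note in footnote 16: the benefit of lazy evaluation is easy to illustrate in Python. If slot fillings are enumerated by a generator, candidate sentences are built one at a time, so a template with five 10-way slots need not materialize all 10^5 combinations before a choice is made. This is an illustrative sketch in Python, not Elegant code.

```python
from itertools import islice, product

def candidate_sentences(slots):
    """Lazily enumerate all slot combinations of a template: each
    candidate sentence is built only when the consumer asks for it."""
    for filling in product(*slots):
        yield ' '.join(filling)

# Five open slots with 10 possible fillers each: 10**5 combinations
# in principle, but islice touches only the first three of them.
slots = [[f's{i}f{j}' for j in range(10)] for i in range(5)]
first_three = list(islice(candidate_sentences(slots), 3))
print(first_three[0])  # s0f0 s1f0 s2f0 s3f0 s4f0
```

In LGM the consumer is the Binding Theory and discourse-model filter, so in the common case only a few of the combinatorially many fillings are ever constructed.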
16 Lazy evaluation is particularly beneficial when a template with, say, five open slots is used for the generation of a sentence. If each slot can be filled by 10 phrases, the template in question can produce 10^5 sentences. Generating all of these and making a random choice afterwards would be computationally extremely inefficient.

REFERENCES

[1] Beun, R.-J. and A. Cremers (1998), Object Reference in a Shared Domain of Conversation, Pragmatics & Cognition, to appear.
[2] Dale, R. (1992), Generating Referring Expressions, MIT Press, Cambridge, MA.
[3] Dale, R. and E. Reiter (1995), Computational Interpretations of the Gricean Maxims in the Generation of Referring Expressions, Cognitive Science 19(2):233-263.
[4] van Deemter, K. and J. Odijk (1997), Context Modelling and the Generation of Spoken Discourse, Speech Communication 21(1/2):101-121.
[5] Grosz, B., A. Joshi and S. Weinstein (1995), Centering: A Framework for Modeling the Local Coherence of Discourse, Computational Linguistics 21(2):203-225.
[6] Hajičová, E. (1993), Issues of Sentence Structure and Discourse Patterns, Theoretical and Computational Linguistics, Vol. 2, Charles University, Prague.
[7] Heim, I. (1982), The Semantics of Definite and Indefinite Noun Phrases, doctoral dissertation, University of Massachusetts, Amherst.
[8] Horacek, H. (1997), An Algorithm for Generating Referential Descriptions with Flexible Interfaces, Proceedings of the 35th ACL/EACL, Madrid, 206-213.
[9] Klabbers, E., E. Krahmer and M. Theune (1998), A Generic Algorithm for Generating Spoken Monologues, Proceedings ICSLP'98, Sydney, Australia.
[10] Krahmer, E. (1998), Presupposition and Anaphora, CSLI Publications, Stanford.
[11] Kruijff-Korbayová, I. and E. Hajičová (1997), Topics and Centers: A Comparison of the Salience-based Approach and the Centering Theory, Prague Bulletin of Mathematical Linguistics 67:25-50.
[12] Knuth, D. (1986), Computers and Typesetting, Volume B, Addison-Wesley, Reading, MA.
[13] Levelt, W. (1989), Speaking: From Intention to Articulation, MIT Press, Cambridge, MA.
[14] Lewis, D. (1979), Scorekeeping in a Language Game, Journal of Philosophical Logic 8:339-359.
[15] Pechmann, T. (1984), Überspezifizierung und Betonung in referentieller Kommunikation, doctoral dissertation, Mannheim University.
[16] Pechmann, T. (1989), Incremental Speech Production and Referential Overspecification, Linguistics 27:98-110.
[17] Prevost, S. (1996), An Information Structural Approach to Spoken Language Generation, Proceedings of the 34th ACL, Santa Cruz, 294-301.
[18] Rosch, E. (1978), Principles of Categorization, in: E. Rosch and B. Lloyd (eds.), Cognition and Categorization, Erlbaum, Hillsdale, NJ, 27-48.
[19] Sgall, P., E. Hajičová and J. Panevová (1986), The Meaning of the Sentence in its Semantic and Pragmatic Aspects, Reidel, Dordrecht.
[20] Stone, M. and C. Doran (1997), Sentence Planning as Description Using Tree Adjoining Grammar, Proceedings of the 35th ACL/EACL, Madrid, 198-205.
[21] Theune, M. (1997), Contrastive Accent in a Data-to-Speech System, Proceedings of the 35th ACL/EACL, Madrid, 519-521.
[22] Theune, M., E. Klabbers, J. Odijk and J.R. de Pijper (1997a), From Data to Speech: A Generic Approach, manuscript, http://www.tue.nl/ipo/people/theune/manuscript.ps.Z.
[23] Theune, M., E. Klabbers, J. Odijk and J.R. de Pijper (1997b), Computing Prosodic Properties in a Data-to-Speech System, Workshop on Concept-to-Speech Generation Systems, ACL/EACL, Madrid, 39-46.
[24] Walker, M. (1997), Centering, Anaphora Resolution and Discourse Structure, in: M. Walker, A. Joshi and E. Prince (eds.), Centering in Discourse, Oxford University Press, Oxford.