Ambiguity, Underspecification and Discourse Interpretation

Massimo Poesio
University of Edinburgh Centre for Cognitive Science
[email protected]

Abstract

A formal analysis of ambiguity processing is a necessary prerequisite for the development of a theory of underspecification and discourse interpretation for NLP systems. The analysis presented here is based on a distinction between semantic ambiguity and perceived ambiguity. A sentence is semantically ambiguous if it has a multiplicity of interpretations; a form of underspecified representation is introduced that can be used as the translation of a sentence that is semantically ambiguous in this sense. Perceived ambiguity, on the other hand, is captured in terms of hypothesis generation in context.
1. Introduction

Although semantic ambiguity is often mentioned as a problem for Natural Language Processing (NLP) systems, more often than not the discussion begins and ends with the statement that ambiguity is a problem and that a system developer has to find a way not to generate all the readings of an ambiguous sentence. The development of solutions to the problem—such as the popular idea of 'underspecified representations'—must, however, be based on a clear analysis of the phenomenon of semantic ambiguity, if for no other reason than that otherwise we would not even know whether the form of underspecified representation we develop does the job it is supposed to do.
2. Ambiguity in Natural Language

2.1. The Combinatorial Explosion Paradox

Advances in modern syntactic and semantic theory typically result in the discovery that sentences have many more interpretations than previously thought. This progress has an unfortunate side effect: the alternative syntactic readings of sentences such as (1) under such theories number in the hundreds, whereas the number of scopally distinct readings of sentences such as (2) may well be in the hundreds of thousands, if we do not take syntactic and semantic constraints into account. Yet human beings appear able to deal with these sentences effortlessly. A lot of work on ambiguity in Natural Language Processing (NLP) is motivated by this Combinatorial Explosion Paradox.

(1)
We should move the engine at Avon, engine E1, to Dansville to pick up the boxcar there, then move it from Dansville to Corning, load some oranges, and then move it on to Bath.
(2)
A politician can fool most voters on most issues most of the time, but no politician can fool all voters on every single issue all of the time.
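To see the scale of the problem, note that with no constraints at all, every linear ordering of a sentence's scope-bearing operators is a candidate reading. The sketch below is my own back-of-the-envelope illustration, not a calculation from the paper; the operator count assumed for a sentence like (2) is approximate and depends on the grammar adopted.

```python
from math import factorial

def unconstrained_scopings(n_operators: int) -> int:
    # With no syntactic or semantic constraints, every linear order of
    # the n scope-bearing operators counts as a distinct reading.
    return factorial(n_operators)

# Three operators already give 6 readings; eight (roughly the number of
# quantified NPs, quantified adverbials, modals and negation one might
# posit for a sentence like (2)) give tens of thousands.
print(unconstrained_scopings(3))  # 6
print(unconstrained_scopings(8))  # 40320
```

Multiplying in lexical and structural ambiguity pushes the total higher still, which is the combinatorial explosion at issue.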
International Workshop on Computational Semantics

In their eagerness to explain the paradox, researchers have often forgotten other aspects of the problem of ambiguity; most importantly, the fact that sentences can be intended to be ambiguous. 'Ambiguity elimination' solutions to the combinatorial explosion paradox such as Kempson and Cormack's (1981) or Verkuyl's (1992) have had some success in showing that certain classes of ambiguity—especially ambiguities associated with plural noun phrases or certain classes of scopal ambiguities—can be done without, but such proposals cannot be extended to eliminate structural and lexical ambiguity, and in any case the applicability of these techniques is limited even as far as scopal ambiguity is concerned.1 The existence of syntactic and semantic constraints on the available readings (May, 1985) is, likewise, only part of the story: we do not want, for example, a theory that assigns a single syntactic structure to a sentence such as They saw her duck.
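The point can be made concrete with a toy chart parser: a CKY-style counter over a small hand-written grammar assigns They saw her duck two distinct structures. The grammar, category names, and the small-clause analysis of the verbal reading below are my own illustrative assumptions, not the paper's.

```python
from collections import defaultdict

# A toy CNF grammar for "they saw her duck".  'her duck' is either a
# possessive NP (Det N) or a small clause SC ('her' NP + intransitive
# 'duck' VPI); both category names are hypothetical.
binary = [('S', 'NP', 'VP'), ('VP', 'V', 'NP'), ('VP', 'V', 'SC'),
          ('SC', 'NP', 'VPI'), ('NP', 'Det', 'N')]
lexicon = {'they': ['NP'], 'saw': ['V'],
           'her': ['NP', 'Det'], 'duck': ['N', 'VPI']}

def count_parses(words, start='S'):
    n = len(words)
    # chart[i][j][cat] = number of distinct trees of category cat
    # spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in lexicon[w]:
            chart[i][i + 1][cat] = 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in binary:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(count_parses(['they', 'saw', 'her', 'duck']))  # 2
```

A theory that collapsed these two structures into one would also have to collapse their distinct semantic translations, which is exactly what we do not want.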
2.2. Eager Disambiguation and Deliberate Ambiguity

A more drastic way around the Combinatorial Explosion Paradox is to conclude that ambiguous sentences are just not interpreted. There is, however, little doubt that humans do not take such a lazy approach to discourse interpretation. First of all, there is psychological evidence indicating that whenever human subjects are presented with sentences as part of a task, whether as participants in a psychological experiment or when they have to solve a problem as in the TRAINS conversations,2 they perform a substantial amount of disambiguation, whether the sentences being processed are lexically, structurally, referentially, or scopally ambiguous. In fact, in the case of the first three kinds of ambiguity at least, it is commonly accepted that disambiguation not only occurs, but takes place rather early, as shown by phenomena like garden-path sentences (Frazier and Fodor, 1978; Crain and Steedman, 1985; Altmann, 1989). In the case of scopal ambiguity as well, there is evidence that preferences exist (Kurtzman and MacDonald, 1993); and although it is not clear when disambiguation takes place, effects similar to garden paths can be had with quantifier scope, as shown by (3):

(3)

Statistics show that every 11 seconds a man is mugged here in New York City. We are here today to interview him.

Furthermore, there is evidence that humans entertain more than one interpretation when disambiguating, which, again, is not what one would expect from 'lazy' processors. For one thing, humans are able to detect ambiguity in context when it occurs: this is shown both by the fact that a sentence's perceived ambiguity can be exploited for rhetorical effect,3 and by the fact that when clarity is a goal, people tend to construct the sentences occurring in natural conversations and texts in such a way as to avoid ambiguity, so that most sentences one runs across in real texts or transcripts of natural conversations have preferred readings in context.4 All of this suggests that although humans could not possibly consider all of the theoretically possible interpretations of a sentence, they may entertain more than one possibility and they do attempt to disambiguate, at least partially. Should this fact be of any concern to the developer of an NLP system? It depends on the application. There are at least two reasons for giving the kind of systems I am concerned with, systems that engage in conversations with their users, the ability to recognize an ambiguity: in order to ask for clarifications, and in order to make their own output unambiguous. I suspect other systems may need the ability to recognize whether a text is ambiguous as well—for example, systems that check whether a text is easy to understand.

1 For example, the two readings of Few students know many languages are truly distinct.
2 The aim of the TRAINS project is to study task-oriented conversations. The project involves both collecting a corpus of conversations between human beings involved in a task—the TRAINS domain is transportation of goods by train—and the development of theories about the aspects of natural language interpretation and plan reasoning observed in these transcripts. These theories are tested by the development of prototype systems able to engage in conversations (Allen et al., 1995).
3 Raskin (1985) claims that humor crucially relies on exploitation of ambiguity. He discussed examples such as the following (pp. 25-26):

(4)

The first thing that strikes a stranger in New York is a big car.

The joke relies on two assumptions about human processing: that the clause the first thing that strikes a stranger in New York gets interpreted before the end of the sentence, strikes receiving the 'surprise' interpretation; secondly, that the reader is able to go back, arrive at a second interpretation, and entertain them both simultaneously.
4 That discourse interpretation may result in more than one interpretation is also the conclusion arrived at by psychological research on discourse: there is evidence that both lexical processing and syntactic processing are processes during which several hypotheses are generated in parallel, and then filtered on the basis of contextual information (Seidenberg et al., 1982; Swinney, 1979; Kurtzman, 1985; Crain and Steedman, 1985; Gibson, 1991). Kurtzman and MacDonald (1993) suggest a similar model for scope disambiguation as well. As far as reference interpretation is concerned, there is some evidence that all pragmatically available referents become active before a referent is identified (see, e.g., Spivey-Knowlton et al., 1994).
3. A Basic Theory of Ambiguity

3.1. Semantic Ambiguity and Perceived Ambiguity

In order to address the Combinatorial Explosion Paradox without ruling out the possibility that a human (or a system) may recognize whether a sentence is ambiguous, a distinction between the notion of semantic ambiguity discussed above, and what I will call perceived ambiguity, is needed. The Combinatorial Explosion Paradox is solved if our theory of ambiguity does not require that all distinct interpretations of a semantically ambiguous sentence are actually generated. On the other hand, we want the theory to allow that, in a given context, more than one interpretation becomes available, although typically the number of such interpretations will be much smaller than the number of possible semantic interpretations. I will say that in the case in which the number of interpretations obtained at the end of sentence processing is strictly greater than one, we have a perceived ambiguity. The theory of ambiguity I propose consists therefore of two main parts: 1. an underspecified language which can be used to encode semantic ambiguity implicitly, thus eliminating the need to generate all semantic interpretations; and 2. a theory of the disambiguation process that may result in a perceived ambiguity.

3.2. Characterizing Semantic Ambiguity

I will illustrate the intuitions about semantic ambiguity that inform this work by discussing one form of semantic ambiguity, lexical ambiguity. According to what we may call the 'intuitive' notion of ambiguity, the word croak is ambiguous because, given a vocabulary for the language L—say, American English—we can find in it (at least) two distinct interpretations for that word. The usual way to make this intuition—and, in particular, the idea of 'distinct interpretations'—precise is to introduce a translation language TL, such that each expression of TL denotes a single object, and to use distinct expressions of this language to encode the distinct interpretations of the string in L. (Typically, TL is a logical language with the usual ingredients such as predicates, connectives, etc.) We can now rephrase the 'intuitive' definition above by saying that the word croak is ambiguous because it has two translations in terms of the objects denoted by the language TL: one of them is the (denotation of the) predicate CROAK1, which includes objects that produce a sound like that produced by frogs; the other is the object denoted by the predicate CROAK2, which is a property of people who died. This notion of semantic ambiguity can be formalized as follows: a string α (a word or a larger constituent, such as a sentence) of language L (say, American English) is ambiguous with respect to the translation function τ that maps expressions of L into (the denotations of the) expressions of the language TL, interpreted with respect to the model M = ⟨U,F⟩, if τ(α) = {o1, …, on}, where o1, …, on are distinct objects of M.

Before continuing, I'd like to clarify the definition above a bit more. One case that is worth discussing is words like tall. Given the way tall is usually translated (in a vocabulary or in an NLP system), and given the definition above, tall would not be classified as ambiguous. This is because although different people may have different notions of what it means for someone to be tall, and although the extension of the predicate TALL-PERSON cannot be characterized very precisely, nevertheless each person assigns a single translation to the word.5 This notion of semantic ambiguity can also be used to classify sentences. Let's call the semantic correlate of sentences propositions. According to the definition above, a sentence is ambiguous if it translates into two propositions. The sentence Kermit croaked, for example, would be considered ambiguous under the translation function above because it denotes two propositions: the proposition that attributes to Kermit the property of producing the sound that frogs produce; and the proposition that attributes to Kermit the property of dying.

Note that whether a sentence comes out as ambiguous depends in part on the kind of objects that are used to model the notion of proposition. If we identify propositions with truth values, the sentence Kermit croaked turns out to be unambiguous with respect to a model if Kermit has both the property of dying and the property of producing a frog-like sound in that model, or if he (it) has neither property. Furthermore, we need an intensional notion of proposition: we do not want to say that a sentence is ambiguous if Kermit has the property of producing a froggy sound in situation s and of dying in situation s'. Both requirements can be satisfied if we assume that propositions are partial functions from situations to truth values, as is common in most recent work in semantics.6 A sentence can be semantically ambiguous for several reasons, besides the fact that (some of) its lexical constituents are ambiguous: for example, it may have more than one structural analysis (as the sentence They saw her duck) or it may be scopally ambiguous (as the sentence I can't find a piece of paper). Both kinds of ambiguity result in a semantically ambiguous sentence under the definition above if they result in distinct translations for the sentence.

5 I have ignored here the fact that tall is context-dependent in the sense that a tall giraffe is taller than a tall person. I assume this issue can be dealt with by translating tall as a predicate modifier.
3.3. The Underspecification Hypothesis

Most NLP systems, and many theories of ambiguity, have assumed (perhaps implicitly) what I will call here the Underspecification Hypothesis (URH):

Underspecification Hypothesis (URH) Humans are capable of representing semantic ambiguity implicitly by means of underspecified representations that do not require all aspects of interpretation to be resolved.

Examples of such underspecified representations are the 'Logical Form' proposed by Schubert and Pelletier (1982), the 'Situation Schemata' proposed by Fenstad et al. (1987), the 'Logical Form' discussed in Allen's textbook (1987), and the 'Quasi-Logical Forms' used in the Core Language Engine (Alshawi and Crouch, 1992). The 'uninterpreted conditions' produced during the intermediate steps of the DRT construction algorithm (Kamp and Reyle, 1993) can be considered underspecified representations as well. A typical example of underspecified representation is the representation for (5) proposed by Schubert and Pelletier, shown in (6), in which quantifiers are left in place and the referent for the definite description the tree is not specified.

(5)

Every kid climbed the tree.

(6)

[ ⟨every kid⟩ climbed ⟨the tree⟩ ]

These representations were originally conceived as a way to solve a problem in system implementation, namely, separating 'context-independent' from 'context-dependent' aspects of the interpretation, thus making either part reusable for different applications. The idea has been gaining consensus in the recent NLP literature that the underspecification approach may in fact have cognitive plausibility and, indeed, that it may explain the Combinatorial Explosion Paradox, the idea being that humans translate sentences into an underspecified language UL that can encode more than one distinct semantic interpretation. A semantically ambiguous sentence, therefore, need not cause problems for a human to process because it is not perceived as ambiguous. In order to implement the URH, we need a language that can be used to encode the alternative semantic interpretations of a sentence. Two questions have to be addressed: what kind of information this language should carry, and whether it should have a proper interpretation. As far as the first question is concerned, there is a fair amount of similarity between the underspecified languages proposed in the literature. The second question has raised much discussion. Only a few underspecified representations come with a proper semantics, and there is no agreement as to what this semantics should be. The uninterpreted conditions in DRT, which might be considered a form of underspecified representation, make their 'uninterpretability' their defining characteristic. These representations encode the ambiguity of a sentence in the sense that that sentence has the reading r iff that reading can be generated by repeatedly applying 'construction rules' to the underspecified representation.

If, however, we want to maintain a separation between semantic ambiguity and perceived ambiguity, and moreover, if we do not want to characterize the disambiguation process as one that simply generates all of the alternative semantic interpretations, we need a way to characterize a sentence's semantic ambiguity independently from the results of the process of disambiguation. In a
6 More complex notions of propositions, such as those used in Situation Semantics (Barwise and Cooper, 1993) ensure an even finer grained distinction, but this will not be required here.
theory in which semantic ambiguity and perceived ambiguity are distinct notions, the underspecified language into which sentences are translated must have an interpretation specifying the 'ambiguity potential' of those sentences. Given the amount of space at my disposal, I will only be able to illustrate in detail an underspecified representation used to encode the ambiguity potential of a lexically ambiguous sentence. Below I will briefly discuss extending this language into one that can be used to translate sentences that exhibit more complex forms of semantic ambiguity. More details are given in (Poesio, 1994).

My proposal about underspecification is tied fairly closely to the definition of semantic ambiguity discussed above. Given a language TL that includes a predicate symbol for each interpretation of each word in the language L, we use as target of the translation for a lexically ambiguous sentence (that is, as our underspecified representation) a lexically underspecified language LUL such that (i) TL ⊆ LUL, (ii) the same model M used to interpret TL is also used to interpret LUL, and (iii) the interpretation of each non-logical constant of LUL is a set of objects of the type denoted by the corresponding constants of TL. For example, the following lexically underspecified language can be used to translate the sentence Kermit croaked:

Terms: a single constant, k.
Predicates: the predicates CROAKU, CROAK1, and CROAK2.
Atomic Formulas: If t is a term and P is a predicate, then P(t) is an atomic formula.
Formulas: If φ is an atomic formula, then φ is a formula. If φ and ψ are formulas, then ¬φ and φ ∧ ψ are formulas.

Let M = ⟨U,F⟩ be a model for the 'basic' language TL. We assume that all expressions in TL are intensional, i.e., denote functions from situations to objects in the domain. The interpretation function for the 'underspecified' language LUL is then defined as follows:

[[k]] = {F(k)}
[[CROAK1]] = {F(CROAK1)}
[[CROAK2]] = {F(CROAK2)}
[[CROAKU]] = {F(CROAK1), F(CROAK2)}
[[P(t)]] = {f | f(s) = undefined if g(s) = undefined or h(s) = undefined; f(s) = 1 if g(s)[h(s)] = 1; f(s) = 0 if g(s)[h(s)] = 0, for g ∈ [[P]] and h ∈ [[t]]}
[[φ ∧ ψ]] = {f | f(s) = 0 if g(s) = 0 or h(s) = 0; f(s) = undefined if neither g(s) nor h(s) = 0, but either one is undefined; f(s) = 1 if both g(s) = 1 and h(s) = 1, for g ∈ [[φ]] and h ∈ [[ψ]]}
[[¬φ]] = {f | f(s) = undefined if g(s) = undefined; f(s) = 1 if g(s) = 0; f(s) = 0 if g(s) = 1, for g ∈ [[φ]]}

As can be seen from these clauses, a lexically underspecified language has two basic properties: (i) all expressions denote sets of objects of the type denoted by expressions of TL, and (ii) for each word of the (natural) language L such as croak that translates into two distinct objects of the same type, there is an expression in the 'underspecified' language such as CROAKU that denotes the set of objects that constitute the translation of croak. The non-logical constants of LUL that are also in TL denote the singleton set whose element is their denotation in TL. The clauses for negation and connectives show how ambiguity 'percolates up'. We propose that underspecified languages such as LUL are used to translate lexically ambiguous sentences, that is, that an 'underspecified' interpretation such as CROAKU is assigned to each semantically ambiguous predicate of English. The grammar generating an underspecified representation of Kermit croaked is as follows:

S → NP VP; VP′(NP′)
VP → croaked; CROAKU
NP → Kermit; k

The translation of Kermit croaked in LUL, CROAKU(k), denotes a set of two propositions (= functions from situations to truth values): the function that assigns 1 to a situation iff Kermit produced a frog-like sound in that situation; and the function that assigns 1 to a situation iff Kermit died in that situation. The example just discussed, although simple to the point of silliness, nevertheless introduces the main points of the particular implementation of the Underspecification Hypothesis that is proposed here: (i) translate natural language into underspecified representations, and (ii) let these denote the set of alternative interpretations that can be assigned to an ambiguous sentence (or sentence constituent).
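The interpretation clauses for LUL can be mimicked computationally. In the sketch below the modelling choices are mine: a proposition is a finite dictionary from situations to 0/1, with an absent key marking undefinedness, and an underspecified denotation is a list of such propositions. The connectives apply pointwise per the clauses for ¬ and ∧, so ambiguity 'percolates up'.

```python
from itertools import product

SITUATIONS = ['s1', 's2']

# A proposition: a dict from situations to 0/1; a missing key means the
# proposition is undefined at that situation.
croak1 = {'s1': 1, 's2': 0}   # 'made a frog-like sound': true in s1 only
croak2 = {'s1': 0, 's2': 1}   # 'died': true in s2 only

CROAK_U_k = [croak1, croak2]  # the two propositions denoted by CROAKU(k)

def neg(g):
    # Clause for negation: undefined stays undefined, 0 and 1 swap.
    return {s: 1 - g[s] for s in g}

def conj(g, h):
    # Clause for conjunction: 0 wins over undefined; 1 requires both 1.
    f = {}
    for s in SITUATIONS:
        if g.get(s) == 0 or h.get(s) == 0:
            f[s] = 0
        elif g.get(s) == 1 and h.get(s) == 1:
            f[s] = 1
        # otherwise f is undefined at s: leave the key out
    return f

def lift(op, *denotations):
    # Ambiguity 'percolates up': apply the clause to every choice of
    # disambiguation of the subexpressions.
    return [op(*choice) for choice in product(*denotations)]

print(lift(neg, CROAK_U_k))                   # two negated propositions
print(len(lift(conj, CROAK_U_k, CROAK_U_k)))  # 4
```

Note that lifting treats the two occurrences of an ambiguous subexpression independently; correlating repeated occurrences of the same word would require extra bookkeeping not shown here.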
3.4. Discourse Interpretation and Perceived Ambiguity

The evidence discussed above suggests that discourse interpretation is a process that involves reasoning with underspecified representations and may involve generating more than one hypothesis in parallel. This, as well as the fact that most of the inferences performed in discourse processing are defeasible, suggests that discourse interpretation should be formalized in terms of a theory of default inference such as Reiter's default logic (Reiter, 1980).7 The hypotheses generated by discourse interpretation can be thought of as the extensions of a default theory (DI,UF), where DI—the Discourse Interpretation Principles—are default inference rules, whereas UF is a set of underspecified formulas. For example, our set of Discourse Interpretation Principles could consist of the following inference rule:

CROAK1-IF-FROG: CROAKU(x), FROG(x) : CROAK1(x) ∧ ¬CROAK2(x) / CROAK1(x) ∧ ¬CROAK2(x)

That is, if an object x with the property CROAKU also has the property FROG, you can assume that the interpretation CROAK1 was intended and refine the current theory. Note also that each interpretation is assumed to be incompatible with the others.8 Of course, defaults are only plausible assumptions, and therefore they can be overridden by stronger information, as well as generate conflicts. Thus, if our default theory were to include a second discourse interpretation principle stating that the CROAK2 interpretation is plausible for human-like beings, and if we assume that Kermit is indeed a human-like being, we would obtain a second extension of the theory above. This second extension could in turn be overridden if the initial theory UF happened to include the fact that Kermit is currently alive.

The set of hypotheses that result from this inference process is filtered and organized by plausibility on the basis of commonsense knowledge. The preferred interpretation of (7), for example, is the one where the pronoun it refers to the serial port, because these are usually found in the back of computers.

(7)

Hook up the cable to the serial port. It is on the back of the computer.

Commonsense reasoning may also tell us that some extensions are equivalent for the purposes at hand, and thus can be merged. An example from the TRAINS corpus is the sentence Hook up the engine to the boxcar, and move it to Avon: even though it can refer either to the engine or to the boxcar, and therefore two extensions could be obtained by discourse interpretation, the difference between these two extensions would be immaterial as far as the plan is concerned, because moving one object would necessarily entail moving the other; the two hypotheses can therefore be merged. How precisely commonsense knowledge is used to 'filter' and 'merge' hypotheses is pretty much an open question; for our purposes here it is enough to assume that a Discourse Interpretation System includes, in addition to the set DI of discourse interpretation principles and the set UF of 'initial facts', a plausibility ranking function.9 We say that an ambiguity is perceived when this function cannot impose a total order on the hypotheses obtained by discourse interpretation starting from UF and DI.

3.5. The Anti-Random Hypothesis

The Underspecification Hypothesis is also consistent with a theory of discourse interpretation requiring, say, that once an underspecified interpretation is obtained, all possible disambiguated interpretations have to be generated; as a matter of fact, theories of discourse interpretation developed in the natural language processing literature, such as Hobbs and Shieber's scoping algorithm (Hobbs and Shieber, 1987), are of this kind. This kind of theory would be formalized in the framework just discussed by having discourse interpretation
7 The abductive model proposed by Hobbs et al. (Hobbs et al., 1990) is another way to formalize the process of discourse interpretation. Reiter's model has the advantage of being more readily extendible in the sense discussed in the next section.
8 One could of course imagine a scenario in which more than one property is attributed at once, but we will not consider that issue here.
9 One way to formalize a commonsense-knowledge-controlled 'plausibility ranking' among hypotheses is in terms of a priority ranking among models similar to that proposed by Shoham (1988). In practice, in systems such as TRAINS the planner is used to assign this ranking.
principles that simply generate all the semantically plausible interpretations, e.g., of the form:

CROAK1-AT-RANDOM: CROAKU(x) : CROAK1(x) ∧ ¬CROAK2(x) / CROAK1(x) ∧ ¬CROAK2(x)

This kind of discourse interpretation process would clearly produce all semantically available interpretations. The URH thus needs to be supplemented with a theory of disambiguation that makes the process that results in the final interpretation(s) highly constrained.10 More precisely, I propose that discourse interpretation is subject to the following constraint:

Anti-Random Hypothesis (ARH) Humans do not randomly generate alternative interpretations of an ambiguous sentence; only those few interpretations are obtained that (i) are consistent with syntactic and semantic constraints and (ii) are suggested by the context.

The Anti-Random Hypothesis should be thought of as a 'meta-constraint' on theories of interpretation: if we intend to account for the Combinatorial Explosion Paradox, we have to develop theories of interpretation (e.g., theories of parsing, or theories of definite description interpretation) that satisfy this constraint. An example illustrating the difference between theories of discourse interpretation that satisfy the Anti-Random Hypothesis, and theories that do not, is the question of how to interpret the pronoun it in utterance 33.18 in the following fragment from a TRAINS transcript:

(8)

33.4 M: we need
33.5 : once again an engine and a boxcar ...
33.6 : and there's
33.7 : okay
33.8 : there's an engine at Avon [2sec]
33.9 : um
33.10 : and there is _one_ engine also at Elmira
33.11 : but it looks like
33.12 : it might just be too complicated to figure out how to use that engine
33.13 : [snirk]
33.14 : are we gonna have enough _time_ to
33.15 : um
33.16 : use the engine from Avon
33.17 : and
33.18 : bring it all the way down to Bath

There is a clear preference for interpreting that pronoun as referring to the engine at Avon introduced in 33.16, as opposed to a boxcar introduced in 33.5. One theory of pronoun interpretation could be as follows: a listener searches for all possible antecedents in the discourse, generates for each of them a hypothesis to the effect that the pronoun refers to that antecedent, and ranks these hypotheses according to how plausible the situation is. This would be an example of a random hypothesis generation process. Such a process would leave the task of choosing one hypothesis to plan recognition; note that this would lead to trouble in the example in (8), since in order to refute the hypothesis that it refers to a boxcar, the plan reasoner would have to do some pretty complex reasoning—say, that it is unlikely that the speaker is referring to some boxcar to bring all the way down to Bath before having identified one. I am not aware of any plan reasoner able to do that, and anyway the explanation seems a bit far-fetched. A different kind of theory would be something along the lines of centering theory (Grosz, Joshi, and Weinstein, 1983; Brennan, Friedman, and Pollard, 1987), according to which each utterance establishes a 'backward looking center' (Cb), and a pronoun is by default interpreted to refer to the Cb. (There are a number of complexities I am glossing over here.) One could imagine that this theory would generate a single (or a few) hypotheses concerning the antecedent of it; the other possibilities, although semantically possible, would simply never come up. This is an example of a theory of discourse interpretation that satisfies the Anti-Random Hypothesis. Examples of theories of definite description interpretation, tense interpretation, the interpretation of modals in discourse, and scope disambiguation that satisfy the Anti-Random Hypothesis are discussed in (Poesio, 1994).

10 Part of the explanation of the Combinatorial Explosion Paradox may well be that sentence processing is incremental, in the sense that it starts before the sentence is complete, and that there is no 'explosion' in the number of readings because ambiguities are resolved locally, as the text is processed word by word or constituent by constituent. The case for incremental parsing is discussed, e.g., in (Crain and Steedman, 1985), and numerous parsing models based on the incrementality hypothesis have now been presented (Jurafsky, 1992). An incremental model of reference interpretation has been developed by Mellish (1985). I will simply mention for the moment that while it is likely that aspects of this idea will have to be incorporated in a theory of disambiguation, a theory of incremental interpretation depends on a theory of underspecification as well, as not all ambiguities can be immediately resolved. For example, the scope of the modal should in (1) is not determined before the NP some oranges is encountered, while the definite NPs the boxcar there and the engine at Avon are processed 'on the spot'. A revised model that does take the incremental aspect of sentence processing into account is under development. Preliminary work towards an incremental model of discourse interpretation is discussed in (Milward and Cooper, 1994).
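The picture of Section 3.4, disambiguation as the computation of extensions of a default theory, can be sketched for the Kermit example. The code below is a propositional toy of my own: predicate and rule names are ground to the single individual k, and extensions are found by enumerating application orders, which is adequate only for normal defaults. The theory with both rules yields two extensions, i.e., a perceived ambiguity.

```python
from itertools import permutations

# Initial facts UF, ground to k and treated propositionally; '-p' is
# the negation of p.
FACTS = frozenset({'croak_u', 'frog', 'humanlike'})

# Normal defaults as (prerequisites, consequents); the justification of
# a normal default equals its consequent.
DEFAULTS = [
    ({'croak_u', 'frog'},      {'croak1', '-croak2'}),   # CROAK1-IF-FROG
    ({'croak_u', 'humanlike'}, {'croak2', '-croak1'}),   # CROAK2 for human-like beings
]

def consistent(belief, new):
    # A set of new literals is consistent with the belief set iff no
    # literal's negation is already believed.
    return all(('-' + p if not p.startswith('-') else p[1:]) not in belief
               for p in new)

def extensions(facts, defaults):
    # Try the defaults in every order; each order yields one candidate
    # extension (sufficient for this finite normal-default theory).
    exts = set()
    for order in permutations(range(len(defaults))):
        belief = set(facts)
        for i in order:
            pre, cons = defaults[i]
            if pre <= belief and consistent(belief, cons):
                belief |= cons
        exts.add(frozenset(belief))
    return exts

for ext in extensions(FACTS, DEFAULTS):
    print(sorted(ext))
# Two extensions: one containing croak1/-croak2, one containing
# croak2/-croak1 -- the multiplicity that constitutes a perceived
# ambiguity when no plausibility ranking orders them.
```

Adding the fact that Kermit is alive, together with a rule making `-croak2` strict in that case, would eliminate the second extension, matching the override behaviour described in the text.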
4. Other Forms of Ambiguity

I briefly discuss how to extend the treatment of ambiguity proposed in the previous section to deal with other cases of semantic ambiguity, such as scopal ambiguity and referential ambiguity. This is done by preserving the basic ideas of the theory (semantic ambiguity as multiplicity, and perceived ambiguity as the occurrence of multiple extensions in a default theory) and introducing, on the one hand, a more complex language able to encode other forms of ambiguity and, on the other hand, more complex inference rules.

4.1. Logical Forms

Scopal and referential ambiguity can be represented in an underspecified way by preserving in our underspecified language UL the information about syntactic structure and the interpretation of lexical items. This can be done by adopting as a basis for the underspecified language a language in which the 'combinatoric' properties of lexical items can be encoded (such as Montague's Intensional Logic), and by allowing in UL trees labeled with syntactic categories and with the translations in UL of lexical items. The sentence I don't see the engine, for example, can be represented as in (9); the translation of the definite the engine is derived from Heim's proposal (Heim, 1982), as discussed in (Poesio, 1994).11

A logical form like (9) denotes the set of propositions that can be obtained by (i) assigning a scope to all the operators—quantifiers, negation, etc.—by means, say, of a procedure like Cooper's Store (Cooper, 1983); (ii) choosing a particular value for all parameters, of which (9) contains two, y_ and s_; and (iii) choosing a specific interpretation for all underspecified lexical items, such as SEEU. A function CV (for Cooper Value) can be defined that assigns to each subtree of (9) a set of sequences of length greater than 1, representing an interpretation and a set of objects 'in store', and to each tree of the form [S ] a set of sequences of length exactly 1, each of whose elements is an expression denoting a set of propositions corresponding to the interpretations of the tree obtained by fixing one particular scope order and propagating lexical and referential ambiguity. The denotation of the logical form is the union of such sets of propositions. Parameters are used to specify the aspects of a sentence's meaning that need to be resolved in context by discourse interpretation. Semantically, a parameter denotes a set of functions from situations to the objects made 'available' in the discourse situation.12 For example, if the objects a1, …, an are 'made available', in the sense described below, in the discourse situation d, a parameter like y_ in (9) denotes in d a set {f1, …, fn} of functions from situations to a1, …, an.
4.2. Discourse Interpretation as Defeasible Inference Over DRSs Each hypothesis generated during discourse interpretation is an hypothesis about the interpretation of a sentence given a certain context, at least to the extent that information about available discourse referents has to be included. An hypothesis about a context can be represented as in Discourse Representation Theory (DRT) (Kamp and Reyle, 1993), by means of Discourse Representation Structures (DRSs). A DRS is a pair hU,Ci, where U is a set of discourse referents, and C is a set of conditions on (predications about) these referents. In (Kamp and Reyle, 1993), the procedure used to assign truth conditions to a sentence is specified in terms of construction rules, operations from DRSs
(9) S NP
VP V
VP V
I
:
VP V
NP
SEEU
P [s_ j= [ENGINEU (x) ^ [x = y_ ]]] ^ P(x))
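Step (i) above, scope assignment via a storage mechanism, can be illustrated with a minimal sketch of Cooper-style retrieval: quantifiers are set aside in a store, and each retrieval order yields one scopally distinct reading. The string-based encoding is purely illustrative and not the paper's notation.

```python
# Minimal sketch of Cooper-style storage: the scopally distinct readings of
# a formula are obtained by retrieving the stored quantifiers in every
# possible order, each retrieval wrapping the formula built so far.
from itertools import permutations

def readings(core, store):
    """Enumerate one reading per retrieval order of the stored quantifiers."""
    out = set()
    for order in permutations(store):
        formula = core
        for quantifier, var in order:
            formula = f"{quantifier} {var}.({formula})"
        out.add(formula)
    return out

store = [("every", "x"), ("some", "y")]
for r in sorted(readings("see(x,y)", store)):
    print(r)
# → every x.(some y.(see(x,y)))
# → some y.(every x.(see(x,y)))
```

With n stored quantifiers this enumerates up to n! readings, which is exactly the combinatorial explosion that the underspecified representation avoids materializing.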
11 Although there is no room to discuss the issue here, it is an important assumption of my work on scopal disambiguation that underspecified representations should preserve syntactic information, both because certain inferential processes depend on the syntactic structure, and because the structure imposes constraints on the available interpretations.
12 Note that already in LUL, a term denotes a (singleton) set of such functions.
to DRSs. I propose in (Poesio, 1994) that the format adopted by Kamp and Reyle to specify their semantic interpretation procedure can be recycled to formalize discourse interpretation, in the sense that the operations that result in new hypotheses can be formalized as construction rules. These construction rules generalize the default inference rules discussed in the previous section, in that they can augment both the set of conditions and the set of discourse referents of a DRS (whereas a 'traditional' default inference rule only augments a set of conditions). The Anti-Random Hypothesis suggests another crucial difference between the construction rules that formalize discourse interpretation principles and the rules used by Kamp and Reyle. All discourse interpretation principles are context-dependent, in the sense that they depend on certain conditions being met by the content of a certain hypothesis in order to apply. For example, consider the following partial disambiguation rule MCR.EVERY, which assigns a scope to universally quantified NPs by 'rewriting' an underspecified representation.13 When applied, MCR.EVERY replaces the triggering condition with a new tripartite condition, whose restriction contains a discourse marker that is only accessible from the nuclear scope, and whose nuclear scope contains a logical form obtained by replacing the every-NP in the triggering condition with the new discourse marker.
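The rewriting operation just described can be sketched, very schematically, as an operation on a DRS data structure. The `DRS` class and the tuple encoding of conditions below are invented for illustration; they are not the paper's formalism.

```python
# Schematic rendering (with invented condition syntax) of a construction rule
# in the spirit of MCR.EVERY: it consumes a triggering condition and adds both
# a new discourse referent and a new tripartite condition -- unlike a
# traditional default rule, which only adds conditions.
from dataclasses import dataclass, field

@dataclass
class DRS:
    universe: set = field(default_factory=set)     # discourse referents U
    conditions: list = field(default_factory=list)  # conditions C

def mcr_every(drs, trigger):
    """Rewrite an ('every', restrictor, scope) condition into a tripartite
    condition over a fresh discourse marker d."""
    if trigger not in drs.conditions or trigger[0] != "every":
        return drs                                  # rule does not apply
    _, restrictor, scope = trigger
    d = f"d{len(drs.universe)}"                     # fresh discourse marker
    new = DRS(set(drs.universe),
              [c for c in drs.conditions if c != trigger])
    new.universe.add(d)
    # Restriction introduces d; the nuclear scope is the triggering logical
    # form with the every-NP placeholder replaced by d.
    new.conditions.append(("=>", (d, restrictor), scope.replace("NP", d)))
    return new

d0 = DRS(set(), [("every", "dog(x)", "bark(NP)")])
d1 = mcr_every(d0, ("every", "dog(x)", "bark(NP)"))
print(d1.universe, d1.conditions)
```

The point of the sketch is the shape of the operation, not the encoding: the output DRS has a larger universe as well as a rewritten condition set, which is what distinguishes construction rules from condition-only default rules.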
MCR.EVERY (first approximation)

Triggering configuration: [S … [NP λP ∀x [[s_ ⊨ Q(x)] → P(x)]] …]
Constraints: ANCHORED(s_, a)
Replace with: [d | Q(d)] ⇒ [ | S′], where S′ is the triggering S with the every-NP replaced by the new discourse marker d

This construction rule is very similar to the rule CR.EVERY proposed by Kamp and Reyle (p. 169), but crucially depends on one aspect of the representation of the quantified NP having been resolved. The Anti-Random Hypothesis is built into the theory of scope disambiguation developed in (Poesio, 1994) by making the assignment of a relative scope to operators depend on the listener's being able to identify the context-dependent aspects of the meaning of these operators; in the case of universally quantified NPs, this aspect is the domain of quantification, represented in the rule by the parameter s_.

5. Conclusions

I suggested that in developing a theory of discourse interpretation consistent with what we know about the problem of ambiguity, we need first of all to make a distinction between semantic ambiguity and perceived ambiguity, and then make at least two assumptions, which I called here the Underspecification Hypothesis and the Anti-Random Hypothesis. I presented a model of discourse interpretation in which semantic ambiguity is characterized model-theoretically in terms of multiplicity of meanings, whereas perceived ambiguity is characterized as a conflict between the hypotheses produced during discourse interpretation. The theory of ambiguity and underspecification discussed here has served as the basis for the SAD system at the University of Rochester, a component of the TRAINS-93 discourse understanding system.

Acknowledgments

I owe many of the ideas in this paper, and most importantly the realization of the importance of the phenomenon of perceived ambiguity, to my advisor Len Schubert and to Graeme Hirst. Thanks to James Allen, Robin Cooper, Richard Crouch, Kees van Deemter, Janet Hitzeman, Howard Kurtzman, Peter Lasersohn, Barbara Partee, Manfred Pinkal and Sandro Zucchi for helpful comments and discussion. This work was in part supported by the LRE Project 62-051 FraCaS.

References
Allen, J. F. 1987. Natural Language Understanding. Menlo Park, CA: Benjamin Cummings.
Allen, J. F., L. K. Schubert, G. Ferguson, P. Heeman, C. H. Hwang, T. Kato, M. Light, N. Martin, B. Miller, M. Poesio, and D. R. Traum. 1995. The TRAINS project: A case study in building a conversational planning agent. Journal of Experimental and Theoretical AI, to appear.
Alshawi, H. and R. Crouch. 1992. Monotonic semantic interpretation. In Proc. 30th ACL, pages 32–39, University of Delaware.
Altmann, G. T. M., editor. 1989. Parsing and Interpretation. Hove, East Sussex, UK: Lawrence Erlbaum.
Barwise, J. and R. Cooper. 1993. Extended Kamp notation. In P. Aczel, D. Israel, Y. Katagiri, and S. Peters, editors, Situation Theory and its Applications, v.3. CSLI, chapter 2, pages 29–54.
Brennan, S. E., M. W. Friedman, and C. J. Pollard. 1987. A centering approach to pronouns. In Proc. ACL-87, pages 155–162, June.
Cooper, R. 1983. Quantification and Syntactic Theory. Dordrecht, Holland: D. Reidel Publishing Company.
Crain, S. and M. Steedman. 1985. On not being led up the garden path: the use of context by the psychological syntax processor. In D. R. Dowty, L. Karttunen, and A. M. Zwicky, editors, Natural Language Parsing: Psychological, Computational and Theoretical Perspectives. Cambridge University Press, New York, pages 320–358.
Fenstad, J. E., P. K. Halvorsen, T. Langholm, and J. van Benthem. 1987. Situations, Language and Logic. Dordrecht: D. Reidel.
Frazier, L. and J. D. Fodor. 1978. The sausage machine: A new two-stage parsing model. Cognition, 6:291–295.
Gibson, E. 1991. A Computational Theory of Human Linguistic Processing: Memory Limitations and Processing Breakdown. Ph.D. thesis, Carnegie Mellon University, Pittsburgh.
Grosz, B. J., A. K. Joshi, and S. Weinstein. 1983. Providing a unified account of definite noun phrases in discourse. In Proc. ACL-83, pages 44–50.
Haddock, N. 1988. Incremental Semantics and Interactive Syntactic Processing. Ph.D. thesis, Dept. of AI and Centre for Cognitive Science, University of Edinburgh.
Heim, I. 1982. The Semantics of Definite and Indefinite Noun Phrases. Ph.D. thesis, University of Massachusetts at Amherst.
Hobbs, J. R. and S. M. Shieber. 1987. An algorithm for generating quantifier scopings. Computational Linguistics, 13(1-2):47–63, January-June.
Hobbs, J. R., M. Stickel, P. Martin, and D. Edwards. 1990. Interpretation as abduction. Technical Note 499, SRI International, Menlo Park, CA, December.
Jurafsky, D. 1992. An on-line computational model of human sentence interpretation. In Proc. AAAI-92, pages 302–308.
Kamp, H. and U. Reyle. 1993. From Discourse to Logic. Dordrecht: D. Reidel.
Kempson, R. and A. Cormack. 1981. Ambiguity and quantification. Linguistics and Philosophy, 4(2):259–310.
Kurtzman, H. 1985. Studies in Syntactic Ambiguity Resolution. Ph.D. thesis, MIT, Cambridge, MA.
Kurtzman, H. S. and M. C. MacDonald. 1993. Resolution of quantifier scope ambiguities. Cognition, 48:243–279.
Lewis, D. K. 1979. Scorekeeping in a language game. Journal of Philosophical Logic, 8:339–359.
May, R. 1985. Logical Form in Natural Language. Cambridge, MA: The MIT Press.
Mellish, C. S. 1985. Computer Interpretation of Natural Language Descriptions. Chichester and New York: Ellis Horwood and John Wiley.
Milward, D. and R. Cooper. 1994. Incremental interpretation: Applications and relationship to dynamic semantics. In Proc. COLING-94, Kyoto.
Poesio, M. 1994. Discourse Interpretation and the Scope of Operators. Ph.D. thesis, University of Rochester, Department of Computer Science, Rochester, NY.
Raskin, V. 1985. Semantic Mechanisms of Humor. Dordrecht and Boston: D. Reidel.
Reiter, R. 1980. A logic for default reasoning. Artificial Intelligence, 13(1–2):81–132, April.
Schubert, L. K. and F. J. Pelletier. 1982. From English to Logic: Context-free computation of 'conventional' logical translations. American Journal of Computational Linguistics, 10:165–176.
Seidenberg, M. S., M. K. Tanenhaus, J. Leiman, and M. Bienkowski. 1982. Automatic access of the meanings of ambiguous words in context: some limitations of knowledge-based processing. Cognitive Psychology, 14:489–537.
Shoham, Y. 1988. Reasoning About Change. Cambridge, MA: The MIT Press.
Spivey-Knowlton, M., J. Sedivy, K. Eberhard, and M. Tanenhaus. 1994. Psycholinguistic study of the interaction between language and vision. In Proc. AAAI-94, Seattle.
Swinney, D. A. 1979. Lexical access during sentence comprehension: (re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior, 18:545–567.
Verkuyl, H. J. 1992. Some issues in the analysis of multiple quantification with plural NPs. OTS Working Papers OTS-WP-TL-92-005, University of Utrecht, Research Institute for Language and Speech, The Netherlands. To appear in F. Hamm and E. Hinrichs, editors, Plural Quantification, Kluwer.

13 These rewrite operations can be considered as simple additions of 'less ambiguous' expressions.