Partial Proof Trees, Hybrid Logic, and Quantifier Scope

Aravind K. Joshi(+), Seth Kulick(+), Natasha Kurtonina
Department of Computer and Information Science (+) and Institute for Research in Cognitive Science
University of Pennsylvania
3401 Walnut Street, Suite 400A
Philadelphia, PA 19119
joshi, skulick, [email protected]
October 15, 1999

1 Introduction

Over the last few years there has been a shift in the transformational tradition from a representational perspective back to a derivational one, as exemplified by work in the Minimalist program. This shift brings such work closer to the resource-logical framework, which grows out of the tradition of categorial grammar and has always been derivational. Naturally, there has therefore been growing interest in exploring the connections between these frameworks. In this paper we address some issues related to semantic interpretation and “L(ogical) F(orm)” in the two frameworks. Using quantifier scope as an example, we argue that despite their similarities, there are still some fundamental theoretical differences between them in this respect. We then discuss in this context a system of categorial inference, using a “hybrid logic”, based on insights from Lexicalized Tree Adjoining Grammar (LTAG). We show how this system incorporates features from both traditions to compute semantic interpretation, and in that sense occupies a “middle ground”. We believe that this makes a significant contribution to the investigation of the connections between the Minimalist and resource-logical frameworks.

2 Logical Form in Derivational Minimalism and Resource Logics

In work stemming from the transformational (GB) tradition, logical form is a level of representation mapped from surface structure, and in more recent work in the Minimalist framework, it is a level of representation reached after “spell-out”. Stabler’s account of Derivational Minimalism [8] does not appear to use the notion of “spell-out”; instead, it creates a structure in which an NP may be phonetically located at one node while its semantic representation is at another node.

(1) Every student likes some course.
(2) a. For every student x, there is some course y such that x likes y.
    b. There is some course y such that for every student x, x likes y.

[Figure 1: Structures for Ambiguous Scope Readings in Derivational Minimalism. In panels (A) and (B), the string /every/ /student/ (t) likes /some/ /course/ is dominated by the semantic positions (every) (student) and (some) (course), stacked in opposite orders for the two readings.]

$$
\dfrac{\begin{array}{c} \vdots \\ \dfrac{A \Uparrow B : \alpha}{B : x^{i}} \\ \vdots \\ A : \beta \end{array}}{A : \alpha(\lambda x.\beta)}\,{\Uparrow}e^{i}
\qquad\qquad
\dfrac{B : \alpha}{A \Uparrow B : \lambda x.x(\alpha)}\,{\Uparrow}i
$$

Figure 2: Scope Elimination and Scope Introduction

Acknowledgements: We would like to thank Robin Clark, Gerhard Jaeger, Michael Moortgat, Mark Steedman, Richard Oehrle, Barbara Partee, Christian Retoré, and an anonymous reviewer for many valuable comments and discussion. This work was partially supported by NSF grant SBR8920230 and ARO grant DAAH040494-G-0426.

Consider, for example, a standard case of quantifier scope ambiguity such as (1), with the two readings in (2). The structures in Figure 1 would be generated, where a lexical item X is split across two locations: /X/ for phonetic output and (X) for semantic interpretation.¹ Further interpretation of the resulting structure yields a model-theoretic interpretation.

1 These structures are simplified and do not take into account the full treatment of quantifiers proposed in [8]. However, they are sufficient for the discussion in this paper.

Montague grammar and related approaches have historically taken a somewhat different approach to semantic computation: they employed what [1] referred to as a rule-by-rule approach, in which there is a tight connection between syntactic rules and semantic rules, with a correspondence between syntactic categories and semantic types. Modern resource-logic approaches (e.g., [2, 5, 6, 7]) have recast such rules in a more general and elegant deductive system, but remain faithful to the correspondence between syntax and semantic deduction; as [6] puts it, “the proof terms associated with categorial derivations relate structural composition in a systematic way to the composition of meaning.”² For example, [2] and [7] present a natural deduction form of quantifier elimination, illustrated in Figure 2, which uses hypothetical reasoning to allow a quantifier to take scope at a later point in the derivation. The two readings in (2) are derived by the two different orders of discharging the quantifiers, as shown in Figures 3 and 4, with $\textit{every student} = \lambda P.\forall w[(\mathit{student}\;w) \Rightarrow (P\;w)]$ and $\textit{some course} = \lambda P.\exists w[(\mathit{course}\;w) \wedge (P\;w)]$.

There are some similarities between these two approaches, in that both use some mechanism that allows a quantifier to be interpreted semantically at a higher position than the one where it appears phonetically. However, whereas in Stabler’s approach the semantic representation is built up after the syntactic derivation (where “syntactic” includes the movement of the semantic operators), the resource-logic approach builds up the semantic representation step by step.

$$
\dfrac{\dfrac{\dfrac{\dfrac{s \Uparrow np : \textit{every student}}{np : x^{1}}
\quad
\dfrac{(np\backslash s)/np : \lambda y.\lambda x.(\mathit{likes}\;x\;y)
\quad
\dfrac{s \Uparrow np : \textit{some course}}{np : y^{2}}}{np\backslash s : \lambda x.(\mathit{likes}\;x\;y)}\,/e}
{s : \mathit{likes}\;x\;y}\,\backslash e}
{s : \exists w[(\mathit{course}\;w) \wedge \mathit{likes}\;x\;w]}\,{\Uparrow}e^{2}}
{s : \forall u[(\mathit{student}\;u) \Rightarrow \exists w[(\mathit{course}\;w) \wedge \mathit{likes}\;u\;w]]}\,{\Uparrow}e^{1}
$$

Figure 3: Derivation of reading (2a)

$$
\dfrac{\dfrac{\dfrac{\dfrac{s \Uparrow np : \textit{every student}}{np : x^{1}}
\quad
\dfrac{(np\backslash s)/np : \lambda y.\lambda x.(\mathit{likes}\;x\;y)
\quad
\dfrac{s \Uparrow np : \textit{some course}}{np : y^{2}}}{np\backslash s : \lambda x.(\mathit{likes}\;x\;y)}\,/e}
{s : \mathit{likes}\;x\;y}\,\backslash e}
{s : \forall u[(\mathit{student}\;u) \Rightarrow \mathit{likes}\;u\;y]}\,{\Uparrow}e^{1}}
{s : \exists w[(\mathit{course}\;w) \wedge \forall u[(\mathit{student}\;u) \Rightarrow \mathit{likes}\;u\;w]]}\,{\Uparrow}e^{2}
$$

Figure 4: Derivation of reading (2b)
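The effect of the two discharge orders in Figures 3 and 4 can be checked model-theoretically. The following Python sketch is illustrative only: the toy model (the sets STUDENTS and COURSES and the relation LIKES) is hypothetical, and generalized quantifiers are encoded as Python functions from predicates to truth values, following the lambda terms for every student and some course given above. The quantifier whose hypothesis is discharged first ends up innermost, i.e., with narrow scope.

```python
from typing import Callable, FrozenSet, Tuple

Entity = str
Pred = Callable[[Entity], bool]
GQ = Callable[[Pred], bool]  # generalized quantifier: (e -> t) -> t

# Hypothetical toy model in which the two readings of (1) come apart.
STUDENTS: FrozenSet[Entity] = frozenset({"ann", "bo"})
COURSES: FrozenSet[Entity] = frozenset({"logic", "syntax"})
LIKES: FrozenSet[Tuple[Entity, Entity]] = frozenset({("ann", "logic"), ("bo", "syntax")})


def every(restrictor: FrozenSet[Entity]) -> GQ:
    """lambda P. forall w [(R w) => (P w)]"""
    return lambda p: all(p(w) for w in restrictor)


def some(restrictor: FrozenSet[Entity]) -> GQ:
    """lambda P. exists w [(R w) and (P w)]"""
    return lambda p: any(p(w) for w in restrictor)


def likes(x: Entity, y: Entity) -> bool:
    return (x, y) in LIKES


every_student = every(STUDENTS)
some_course = some(COURSES)


def reading_2a() -> bool:
    # Figure 3: discharge y (some course) first, then x (every student).
    return every_student(lambda x: some_course(lambda y: likes(x, y)))


def reading_2b() -> bool:
    # Figure 4: discharge x (every student) first, then y (some course).
    return some_course(lambda y: every_student(lambda x: likes(x, y)))


print(reading_2a(), reading_2b())  # True False
```

In this model each student likes some course, but no single course is liked by every student, so only reading (2a) comes out true.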

3 Partial Proof Trees and Hybrid Logic

LTAG is a tree-rewriting system and therefore deals with structural, rather than string, adjacency. Being lexicalized, it associates a structure with each lexical item. When viewed from the perspective of categorial grammar, LTAG can be seen as a system of partial proof trees (PPTs) (see [3] for more details than can be given here). The key idea is that instead of associating a type with each lexical item, we associate one or more partial proof trees, each obtained by unfolding the arguments of the type.³ Important characteristics of LTAG carry over to the PPT system, namely, the extended domain of locality and the consequent factoring of recursion from the domain of dependencies. When viewed from a logic perspective, this leads to the use of a hybrid logic.

The basic PPTs serve as the building blocks of the grammar, and complex proof trees are obtained by ‘combining’ these PPTs with two operations, substitution and stretching, illustrated in Figure 5. Substitution makes use of a terminal node of a tree, as in the substitution of the PPT for Bob into an np node in the tree for likes. The second operation, stretching, provides access to an internal node of a tree. Here the tree for passionately is linked to the “stretching” of the internal np\s node in the likes tree. As the figure illustrates, dependencies are represented in the elementary trees by the unfolding process. For example, while both NPs of likes are clearly semantic arguments and so are unfolded, only the verb phrase, and not the noun phrase, is an argument of the adverb passionately, so its tree unfolds only to the np\s level.

Since we are using the Lambek calculus, not only function application but also conditionalization plays an important role in the construction of the PPTs. Crucially, however, trace assumptions, once introduced, must be discharged within the same PPT in which they originate. Since such PPTs are of small, bounded size, this allows resource management to be localized, with corresponding computational advantages. The same advantages hold for the use of structural modalities, such as Permutation, which can be exploited only within the scope of an elementary tree and therefore have a more restricted capacity than in CG to influence the whole structure of generated strings. The operations of substitution and stretching do not affect this localization. The important point is that the constraints that license the elementary trees are distinct from the operations that combine them.

2 These considerations also hold for the CCG categorial system of Steedman [9].

[Figure 5: Sample Partial Proof Tree Derivation for Hazel likes Bob passionately. The likes tree unfolds (NP\S)/NP [NP] ⇒ (NP\S) and [NP] (NP\S) ⇒ S; the NP trees substitute into its [NP] leaves, and the passionately tree, [(NP\S)] (NP\S)\(NP\S) ⇒ (NP\S), is linked to a stretching of the internal (NP\S) node.]
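The unfolding of a lexical type into a PPT can be sketched as follows. This is a simplified illustration, not the construction of [3]: the Slash encoding is hypothetical, and the rule that a modifier type X\X or X/X stops unfolding once its single argument X is consumed is our assumption, chosen to reproduce the passionately tree in Figure 5. Bracketed leaves are the open assumptions at which other PPTs are later substituted.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Slash:
    """A/B (direction "/") seeks B on the right; B\\A (direction "\\") seeks B on the left."""
    result: "Type"
    arg: "Type"
    direction: str


Type = Union[str, Slash]


def fmt(t: Type, top: bool = True) -> str:
    """Render a type, parenthesizing complex subtypes."""
    if isinstance(t, str):
        return t
    left, right = (t.result, t.arg) if t.direction == "/" else (t.arg, t.result)
    body = f"{fmt(left, False)}{t.direction}{fmt(right, False)}"
    return body if top else f"({body})"


def unfold(t: Type) -> List[str]:
    """Unfold the arguments of t, one elimination step per argument (leaves in brackets)."""
    steps = []
    current = t
    while isinstance(current, Slash):
        hole = f"[{fmt(current.arg)}]"
        functor = fmt(current)
        premises = f"{functor} {hole}" if current.direction == "/" else f"{hole} {functor}"
        steps.append(f"{premises} => {fmt(current.result)}")
        if current.arg == current.result:
            break  # modifier X\X or X/X: stop at X (assumption, see lead-in)
        current = current.result
    return steps


VP = Slash("s", "np", "\\")          # np\s
LIKES = Slash(VP, "np", "/")         # (np\s)/np
PASSIONATELY = Slash(VP, VP, "\\")   # (np\s)\(np\s)

for step in unfold(LIKES):
    print(step)
# (np\s)/np [np] => np\s
# [np] np\s => s
print(unfold(PASSIONATELY)[0])  # [np\s] (np\s)\(np\s) => np\s
```

The verb unfolds both of its arguments down to s, whereas the adverb stops at np\s, matching the contrast drawn in the text.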

3.1 Hybrid Logic

A logical modelling of the PPT system makes use of a hybrid logic, i.e., two kinds of logic are involved. This is a consequence of LTAG being a tree-rewriting system.

3 Only arguments are unfolded, not arguments of arguments.

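[Figure 6: Non-Peripheral Extraction, in who Bob meets today. In the meets tree, (NP\S)/NP [NP]¹ ⇒ NP\S and [NP] [NP\S] ⇒ S; discharging assumption 1 yields S/NP, which combines with [(N\N)/(S/NP)] to give N\N and then, with [N], N. The today tree, [NP\S] (NP\S)\(NP\S) ⇒ NP\S, stretches the internal NP\S node.]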

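(3) a. $\textit{meets} : (np\backslash s)/np,\; np \Rightarrow np\backslash s$
    b. $\textit{Bob} : np,\; np\backslash s \Rightarrow s$
    c. $\textit{Bob meets}\; np \Rightarrow s$
    d. $\textit{Bob meets} \Rightarrow s/np$
    e. $\textit{who Bob meets} \Rightarrow n\backslash n; \quad n,\; n\backslash n \Rightarrow n$

(4) a. $np\backslash s\;\; \textit{today} \Rightarrow np\backslash s$

(5) a. $\textit{meets}\; np\;\; \textit{today} \Rightarrow np\backslash s$
    b. $np,\; np\backslash s \Rightarrow s$
    c. $\textit{Bob meets}\; np\;\; \textit{today} \Rightarrow s$
    d. $\textit{Bob meets today} \Rightarrow s/np$
    e. $\textit{who Bob meets today} \Rightarrow n\backslash n; \quad n,\; n\backslash n \Rightarrow n$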

We distinguish the logic of constructing basic trees from the logic of combining trees. Construction of basic trees is guided by the logic of a CG, while both operations for combining trees (substitution and stretching) are encoded by a single rule: Cut. The logic of constructing the basic trees is based on the usual understanding of a structure-sensitive consequence relation between formulas as types. However, since trees can be viewed as proofs, the logic of combining trees defines how a set of proofs can be transformed into another proof; its consequence relation is therefore defined on proofs. We first give an example of a hybrid inference (with some abuse of notation, and ignoring substitution), and then give the scheme of a general definition.

Consider the example of a relative clause, and the derivation illustrated in Figure 6. This derivation takes advantage of the possibilities for conditionalization discussed above. As can be seen, the tree for meets introduces an np assumption which is locally discharged. This is a case of non-peripheral extraction, a classic example of the use of a Permutation modality in categorial grammar. We show that hybridization of the inference allows us to avoid the use of the structural modality; more generally, it has been shown elsewhere [3, 4] that this approach allows structural modalities to be eliminated in some cases and localized in others.

The sequents in (3) model the derivation of the first part of the meets tree. For space reasons, we do not discuss the latter part of the tree, involving who. The sequent in (4) models the derivation for the today tree. It is important that every step of the derivation in (3) is presented, so that internal nodes are available for stretching. The sequents in (5) illustrate how the second logic, a result of the hybridization, is used to combine the sequents in (3) and (4). The first step is the application of Cut to (3a) and (4a), resulting in (5a). The second step replaces meets np with meets np today everywhere in the derivation of (3), resulting in (5b-e). Note that, crucially, the structure of (3) is not disturbed by this replacement. Therefore, while (5c) has the appearance of a violation of the Lambek calculus, this step is justified. The relations between the types in an elementary tree are fixed by the creation of the tree. Since stretching maintains these relations, the non-peripheral extraction is legitimate: it results from the application of the second logic, which does not disturb the type relations in the elementary trees.

Space reasons prevent us from giving the exact definition of models for stretching trees here. However, the general definition has the following form. Let $\pi_1$ be a proof containing (*) $X \Rightarrow A$, and let $\pi_2$ be a proof containing (**) $Y\,A\,Z \Rightarrow A$ as its last sequent. Then a new proof $\pi_3$ contains all sequents preceding (*) and (**) in $\pi_1$ and $\pi_2$ respectively, a new sequent $Y\,X\,Z \Rightarrow A$, and all sequents of $\pi_1$, provided that $X$ is replaced by $Y\,X\,Z$, with no change to the structure.

[Figure 7: Interpolation for subject-to-subject raising. The PPT for to walk, starting from (NP\S_inf), interpolates to NP\S, where the seems tree, (NP*\S)/(NP\S) with the assumption [(NP\S_inf)], supplies the inserted subproof; [NP] (NP\S) ⇒ S then completes the tree, with Bob substituted into the [NP] leaf.]
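The general definition of stretching can be sketched on the sequents (3) and (4). In this Python sketch (an illustration under our own assumptions, not the authors’ formalization), a sequent is a list of antecedent items, and a hypothesis withdrawn by conditionalization is kept as an invisible item. This bookkeeping device (the hypothetical discharged flag) stands in for the fixed structure of the elementary tree, so that Bob meets ⇒ s/np is stretched to Bob meets today ⇒ s/np without being re-derived. The new sequent Y X Z ⇒ A is the Cut of (*) into (**).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Item:
    label: str
    discharged: bool = False  # hypothesis already withdrawn (hidden, position kept)


@dataclass(frozen=True)
class Sequent:
    antecedent: Tuple[Item, ...]
    succedent: str

    def __str__(self) -> str:
        shown = " ".join(i.label for i in self.antecedent if not i.discharged)
        return f"{shown} => {self.succedent}"


def seq(ante: str, succ: str, hidden: Sequence[int] = ()) -> Sequent:
    """Build a sequent; positions listed in `hidden` are discharged hypotheses."""
    words = ante.split()
    return Sequent(tuple(Item(w, k in hidden) for k, w in enumerate(words)), succ)


def wrap(s: Sequent, x: List[str], y: List[str], z: List[str]) -> Sequent:
    """Replace the first occurrence of X in s by Y X Z, keeping X's own items unchanged."""
    items = list(s.antecedent)
    n = len(x)
    for k in range(len(items) - n + 1):
        if [i.label for i in items[k:k + n]] == x:
            new = items[:k] + [Item(w) for w in y] + items[k:k + n] + [Item(w) for w in z] + items[k + n:]
            return Sequent(tuple(new), s.succedent)
    return s  # X does not occur: copy the sequent unchanged


def stretch(pi1: List[Sequent], star: int, pi2: List[Sequent]) -> List[Sequent]:
    """pi1 contains (*) X => A at index `star`; pi2 ends in (**) Y A Z => A."""
    target, last = pi1[star], pi2[-1]
    if target.succedent != last.succedent:
        raise ValueError("(*) and (**) must share the type A")
    labels = [i.label for i in last.antecedent]
    k = labels.index(last.succedent)  # locate A inside Y A Z
    y, z = labels[:k], labels[k + 1:]
    x = [i.label for i in target.antecedent]
    cut = Sequent(tuple(Item(w) for w in y) + target.antecedent + tuple(Item(w) for w in z),
                  target.succedent)
    return pi1[:star] + pi2[:-1] + [cut] + [wrap(s, x, y, z) for s in pi1[star + 1:]]


derivation_3 = [
    seq("meets np", "np\\s"),                 # (3a)
    seq("np np\\s", "s"),                     # (3b), with Bob's type np
    seq("Bob meets np", "s"),                 # (3c)
    seq("Bob meets np", "s/np", hidden=[2]),  # (3d): the np is withdrawn
]
derivation_4 = [seq("np\\s today", "np\\s")]  # (4a)

for s in stretch(derivation_3, 0, derivation_4):
    print(s)
```

Because X keeps its items (including the withdrawn np) and only gains neighbours, the sketch mirrors the claim that stretching leaves the structure of (3) untouched.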

3.2 Interpolation

Stretching can be generalized as an interpolation rule, which is used for some other linguistic constructions, such as subject-to-subject raising, discussed in this section, and quantifier scoping, discussed in the next section. An example of interpolation is seen in the PPT for to walk in Figure 7, which interpolates from (np\s_inf) to np\s before continuing the unfolding. From a logic perspective, the double-dashed lines indicate that a subproof needs to be inserted which has an (np\s_inf) assumption and the conclusion np\s. In a general setting, an interpolated proof contains a step INTERPOLATION x → y, which indicates that to complete the proof construction we need to insert a subproof corresponding to a tree with the assumption x and the conclusion y. Due to space reasons, however, we must leave out the logical proofs for the derivations using interpolation here. We stress that interpolation, unlike stretching, is not something that can be generated in the course of a derivation. All interpolation specifications are stated on the basic PPTs, the units that are then put together during the derivation.

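An interpolation specification can be pictured as a typed gap in a basic PPT. The following Python sketch is hypothetical (the names InterpolationSlot, Subproof, and fits are ours, and types are plain strings): a subproof may fill the gap INTERPOLATION x → y only if it uses the assumption x and concludes y. Since slots are only declared on basic PPTs, nothing in the sketch creates new slots while trees are combined.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InterpolationSlot:
    """Gap declared on a basic PPT: INTERPOLATION source -> target."""
    source: str
    target: str


@dataclass(frozen=True)
class Subproof:
    """A tree with open assumptions (leaves) and a single conclusion (root)."""
    assumptions: Tuple[str, ...]
    conclusion: str


def fits(slot: InterpolationSlot, proof: Subproof) -> bool:
    """The subproof must take the slot's source as an assumption and derive its target."""
    return slot.source in proof.assumptions and proof.conclusion == slot.target


# Figure 7: the to-walk PPT interpolates from (np\s_inf) to np\s.
to_walk_gap = InterpolationSlot(source="np\\s_inf", target="np\\s")
# Simplified seems tree: assumption [(np\s_inf)], conclusion np\s.
seems = Subproof(assumptions=("np\\s_inf",), conclusion="np\\s")
# Hypothetical tree concluding s instead: it cannot be inserted into this gap.
finite_clause = Subproof(assumptions=("np\\s_inf",), conclusion="s")

# Section 4: a quantifier PPT interpolates from np to s; the likes tree fills it.
quantifier_gap = InterpolationSlot(source="np", target="s")
likes_tree = Subproof(assumptions=("np", "np"), conclusion="s")

print(fits(to_walk_gap, seems), fits(to_walk_gap, finite_clause))  # True False
print(fits(quantifier_gap, likes_tree))  # True
```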

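[Figure 8: PPTs for Every student likes some course. The likes tree unfolds (NP\S)/NP [NP] ⇒ (NP\S) and [NP] (NP\S) ⇒ S. The trees for every student and some course each build NP from NP/N and N, interpolate from NP to S ([S]), and conclude S(every) and S(some), respectively.]

[Figure 9: Semantic Representations of PPTs: ∀u (student(u) → S), with boxes u and S, for every student; likes x y, with boxes x and y, for likes; ∃w (course(w) ∧ S), with boxes w and S, for some course.]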

4 Quantifier Scoping in the PPT System

In this section we address the problem of how to derive the ambiguity of quantifier scope readings in the PPT system, and compare it to the two approaches discussed in Section 2. In order to derive the two readings of (1), we use interpolation to specify a basic PPT for the NPs, which can be considered a structural analog of type-raising. In addition, corresponding to each elementary PPT, we have a representation of its semantic contribution. At each step of the derivation, as the syntactic PPTs combine, the semantic representations combine as well. Figure 8 shows the basic PPTs used for the derivations of (1), with the PPTs for the quantifiers interpolating from np to s.⁴ Figure 9 shows the corresponding semantic representations. Each PPT is represented by a boxed semantic representation, with the smaller protruding boxes representing variables (resources) that need to be filled during the derivation. We stress that there is nothing crucial about the particular representation used; one can imagine different ways to handle it.⁵ The crucial aspect is that the representations correspond to a basic PPT as a single entity, not to the individual nodes in a PPT.

The derivation can proceed in two ways. We first illustrate the derivation in which every student takes wider scope. It proceeds by first combining the PPTs for likes and some course via interpolation (connecting the NP and S nodes), resulting in Figure 10A. The corresponding semantic step combines the semantic representations for likes and some course by unifying the w and y boxes, and the S box with the likes x y box. The result is shown in Figure 11A. The next syntactic step combines the PPTs for every student and likes some course in Figure 10A to form Figure 10B. The corresponding semantic step results in the LF in Figure 11B, in which u and x have unified, along with S and the representation for likes some course.⁶

To derive the reading with some course taking wider scope, the PPTs for every student and likes first combine, forming the PPT in Figure 12A, which then combines with the PPT for some course to form Figure 12B. The corresponding semantic derivation is shown in Figures 13A and 13B.

[Figure 10: Syntactic Derivation for every wide scope. (A) The likes tree combined with the some course tree by interpolation, concluding S(some); (B) the result combined with the every student tree, concluding S(every) above S(some).]

[Figure 11: Semantic Derivation for every wide scope. (A) ∀u (student(u) → S), with boxes u and S, alongside ∃w (course(w) ∧ likes x w), with box x; (B) ∀u [(student u) → ∃w [(course w) ∧ likes u w]].]

4 To avoid clutter, we have shown the N PPTs already substituted into the quantifier PPTs. This is a simple matter and does not affect the discussion here.

5 In particular, the boxes are not meant to suggest a deep connection to DRT. We believe that [6]’s comment holds for the PPT system as well: “The derivational semantics is fully neutral with respect to the particular ‘theory of natural language semantics’ one wants to plug in: an attractive design property of the type-logical architecture when it comes to portability.”

6 Due to space restrictions, this is a simplified representation of the underlying semantic composition, but it is adequate for present purposes. In future work we will show in more detail how the use of S in the semantic composition is eliminated in favor of (P w), giving a more direct representation of the usual semantics for generalized quantifiers. The small boxes request variables that can be taken as denoting discourse entities or events.

[Figure 12: Syntactic Derivation for some wide scope. (A) The likes tree combined with the every student tree, concluding S(every); (B) the result combined with the some course tree, concluding S(some) above S(every).]

[Figure 13: Semantic Derivation for some wide scope. (A) ∀u (student(u) → likes u y), with box y, alongside ∃w (course(w) ∧ S), with boxes w and S; (B) ∃w [(course w) ∧ ∀u [(student u) → likes u w]].]
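The semantic side of the two derivations (Figures 11 and 13) can be mimicked with templates whose holes play the role of the protruding boxes of Figure 9. This Python sketch is a deliberately naive string encoding of our own (the Rep class and the quantify helper are hypothetical), in keeping with the remark that nothing hinges on the particular representation. Combining a quantifier unifies its bound variable with one argument box of its scope and fills its S box with that scope; only the order of combination differs between the two readings.

```python
from dataclasses import dataclass


class _KeepMissing(dict):
    """Mapping for str.format_map that leaves unfilled holes such as {x} in place."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


@dataclass(frozen=True)
class Rep:
    body: str  # formula with holes written {S}, {x}, {y}

    def fill(self, **holes: str) -> "Rep":
        return Rep(self.body.format_map(_KeepMissing(holes)))


EVERY_STUDENT = Rep("∀u[(student u) → {S}]")
SOME_COURSE = Rep("∃w[(course w) ∧ {S}]")
LIKES = Rep("likes {x} {y}")


def quantify(q: Rep, bound: str, scope: Rep, slot: str) -> Rep:
    """Unify q's variable with the scope's `slot` box and q's S box with the scope."""
    return q.fill(S=scope.fill(**{slot: bound}).body)


# Figures 10/11: likes + some course first, then every student (every > some).
every_wide = quantify(EVERY_STUDENT, "u", quantify(SOME_COURSE, "w", LIKES, "y"), "x")
# Figures 12/13: likes + every student first, then some course (some > every).
some_wide = quantify(SOME_COURSE, "w", quantify(EVERY_STUDENT, "u", LIKES, "x"), "y")

print(every_wide.body)  # ∀u[(student u) → ∃w[(course w) ∧ likes u w]]
print(some_wide.body)   # ∃w[(course w) ∧ ∀u[(student u) → likes u w]]
```

The intermediate value quantify(SOME_COURSE, "w", LIKES, "y") is ∃w[(course w) ∧ likes {x} w], with the x box still open, which corresponds to Figure 11A.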

4.1 Comparison with Derivational Minimalism and Resource-Logic Approaches

Derivational Minimalism and Partial Proof Trees

The Derivational Minimalism approach derives the two structures in Figure 1 for the ambiguous quantifier scopings. The derivation of each tree is then followed by semantic interpretation. As shown in Figures 10B and 12B, the PPT system also derives structures which syntactically represent the different quantifier readings. The difference, however, is that the semantic interpretation does not take place after the derivation; it is computed simultaneously with the syntactic derivation.⁷ From the perspective of the PPT system, then, the syntactic tree with the different scope readings is a byproduct of the derivation, with no real significance. The use of interpolation in a basic PPT looks very similar to a “quantifier raising” (QR) approach, which is in effect what feature checking accomplishes in Derivational Minimalism. Indeed, it is likely possible to reconceptualize such feature checking as the mechanism that creates a basic PPT. The difference is that such QR would be, by the very nature of the formalism, localized to the (small) domain of a basic PPT.

7 We have not attempted here to fully replicate the semantic system used in [8], and have contented ourselves with deriving the usual logical representation. A more complete presentation, which we leave for future work, would attempt to derive the same semantic representation.

Further, because the very same machinery is used to derive both structures and scope readings, it is guaranteed that no extra formal power is needed to generate the scope readings. The overall system is still weakly equivalent to LTAG, because the PPT system is weakly equivalent to LTAG. For a system whose scope readings are derived by further operations at another level of representation, such an equivalence is not guaranteed at all. We also mention here another issue: different quantifiers do not all give rise to the same ambiguity. Various authors have pointed out that (6) does not allow the reading in which the object takes higher scope (the “inverse” reading), (7b).

(6) Some linguist speaks at most two languages.
(7) a. Some linguist x is such that x speaks at most two languages.
    b. There are at most two languages y such that some linguist or other speaks those two languages y.

In our approach, scope (7a) is derived by some being structurally type-raised, in the same way as the subject every student in Figure 10. Scope (7b) is ruled out by not allowing at most to take the raised object tree in the way that some does in Figure 12. Note in this connection the argument made by [8] against “semantic restructuring” accounts, which “compute the normal and inverse scope readings from a single syntactic structure by, in effect, storing and raising the embedded object quantifier in the computation of logical form”. Since such accounts make either argument of the predicate available to take higher scope, examples like (6) raise a problem for them. [8] suggests that

  Any account is going to have to reconstruct some sort of asymmetry between the two arguments of the verb, and then have some corresponding difference between the lexical entries for the various sorts of quantifiers so that derivations of the unwanted scopes are unavailable.

Stabler’s objection does not apply to our approach, since it is unproblematic to have a corresponding difference between the lexical entries. This is because our system is lexicalized, in the sense that each basic PPT is anchored by a lexical entry, so it is possible to “control” the possible PPTs for a particular entry. Systems which allow type-raising during the course of the derivation, for example, lose this control. We do not need this sort of derivational complexity, because the system is based on units larger than just types, which already encode all potentially needed partial proofs.

Resource Logics and Partial Proof Trees

The PPT derivation retains the step-by-step syntax/semantics correspondence during the derivation, just like all grammars in the categorial tradition. However, because of the use of a hybrid logic, the items used for syntactic and semantic composition are larger than those in more traditional approaches.
Just as each basic PPT can be considered a section of a Derivational Minimalism derivation, each basic PPT can also be considered part of a more traditional resource-logic/categorial derivation. For example, there is clearly a close connection between the Scope Elimination rule in Figure 2 from [2, 7] and our use of interpolation within a basic PPT for the quantifiers. However, the Scope Elimination rule must allow for an unboundedly large proof content between the introduction and the discharge of the hypothetical. In contrast, since interpolation can only be specified on basic PPTs, there can only be a small, bounded distance between the introduction and the discharge.⁸ The analysis of quantifier ambiguity illustrates how the use of a hybrid logic allows for a different approach to resource management, one which localizes the use of devices such as interpolation or structural modalities to the domain of the basic PPTs.
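The lexical control of scope argued for above, where at most lacks the raised-object PPT that some has, can be illustrated with a toy lexicon. In this hypothetical Python sketch, each determiner lists the object PPTs it anchors: "narrow" for a raised-object tree that combines with the verb before the subject quantifier does (as in Figure 10), and "wide" for one that can combine after it (as some course does in Figure 12). Omitting "wide" for at most two is how the sketch encodes the unavailability of (7b); the encoding is ours, not a specification from the paper.

```python
from typing import Dict, FrozenSet, List

# Hypothetical lexicon: which raised-object PPTs each determiner anchors.
OBJECT_PPTS: Dict[str, FrozenSet[str]] = {
    "some": frozenset({"narrow", "wide"}),
    "every": frozenset({"narrow", "wide"}),
    "at most two": frozenset({"narrow"}),  # no PPT for the inverse reading
}


def scope_readings(subject: str, obj: str) -> List[str]:
    """List the scope orders (outermost first) licensed by the object's lexical PPTs."""
    readings = []
    if "narrow" in OBJECT_PPTS[obj]:
        readings.append(f"{subject} > {obj}")  # surface scope
    if "wide" in OBJECT_PPTS[obj]:
        readings.append(f"{obj} > {subject}")  # inverse scope
    return readings


print(scope_readings("every", "some"))        # ['every > some', 'some > every']
print(scope_readings("some", "at most two"))  # ['some > at most two']
```

Because the restriction lives in the lexicon rather than in a derivational operation, no derivation can produce the missing reading.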

5 Conclusion

We have discussed an approach to capturing quantifier scope ambiguity within categorial inference, based on the use of a hybrid logic to impose two different levels of composition. The resulting analysis captures aspects of both the Derivational Minimalism and the Resource Logic approaches. We have suggested that the analysis described here can therefore be seen as a “middle ground” between those two approaches, and that in this context their differences regarding semantic interpretation are not as significant as they might otherwise appear.

8 Of course, one might then wonder about cases in which more than one clause intervenes between the assumption and its discharge. In that case, the two components are “pushed apart” by the use of stretching, although we do not have room to illustrate this here. It is the same idea as deriving the long-distance topicalization Apples Bill thinks John likes from a basic PPT for Apples John likes, as described in [4].

References

[1] Emmon Bach. Informal Lectures on Formal Semantics. State University of New York Press, 1989.
[2] Bob Carpenter. Quantification and scoping: A deductive account. 1994.
[3] Aravind K. Joshi and Seth Kulick. Partial proof trees as building blocks for a categorial grammar. Linguistics and Philosophy, 20:637–667, 1997.
[4] Aravind K. Joshi, Seth Kulick, and Natasha Kurtonina. An LTAG perspective on categorial grammar, 1998. Presented at Logical Aspects of Computational Linguistics.
[5] Alain Lecomte and Christian Retoré. Words as modules and modules as partial proof-nets. In John Benjamins, editor, Proceedings of ICML 96, 1996.
[6] Michael Moortgat. Categorial type logics. In J. Van Benthem and A. Ter Meulen, editors, Handbook of Logic and Language. North Holland, 1997.
[7] Glyn Morrill. Type Logical Grammar: Categorial Logic of Signs. Kluwer, Dordrecht, 1994.
[8] Edward Stabler. Computing quantifier scope. In Szabolcsi, editor, Ways of Scope Taking, pages 155–182. Kluwer, 1996.
[9] Mark Steedman. Surface Structure and Interpretation. Number 30 in Linguistic Inquiry Monograph. MIT Press, 1996.
