TR 2014/02
ISSN 0874-338X
Joint Proceedings of the Second Workshop on Natural Language and Computer Science (NLCS’14) & 1st International Workshop on Natural Language Services for Reasoners (NLSR 2014), affiliated to RTA-TLCA, VSL 2014, July 17-18, 2014, Vienna, Austria.
Valeria de Paiva, Nuance Communications & Walther Neuper, Graz University of Technology & Pedro Quaresma, University of Coimbra & Christian Retoré, University of Bordeaux & Lawrence S. Moss, Indiana University & Jordi Saludes, Polytechnic University of Catalonia
Center for Informatics and Systems of the University of Coimbra
September 9, 2014; updated version, November 27, 2014
Preface

This volume contains the papers presented at NLCS’14, the Second Workshop on Natural Language and Computer Science, & NLSR 2014, Natural Language Services for Reasoners: FLoC workshops affiliated to the RTA-TLCA joint conference, Vienna Summer of Logic, July 17-18, 2014, Vienna, Austria.

Natural Language in Computer Science (NLCS) is a workshop on the interface between computer science and linguistics, covering a wide range of topics connected to logic and the larger VSL. The overall theme is the two-way interplay between formal tools developed in logic, category theory, and theoretical computer science on the one hand, and natural language syntax and semantics on the other. Formal tools coming from logic and related areas are important for natural language processing, especially for computational semantics. Moreover, work on these tools borrows heavily from all areas of theoretical computer science. In the other direction, applications having to do with natural language have inspired developments on the formal side. The NLCS workshop invites papers on both topics, as well as on the combination of logical and statistical methods. Specific topics include, but are not limited to:

• linguistic, computational and logical aspects of the interface between syntax and semantics;
• logical aspects of linguistic theories;
• logic for semantics of lexical items, sentences, discourse and dialog;
• formal tools in textual inference, such as logics for natural language inference;
• applications of category theory in semantics;
• linear logic in semantics;
• formal approaches to unifying data-driven (quantitative, statistical) and declarative (logical) approaches to semantics;
• natural language processing tools using logic.

Natural Language Services for Reasoners (NLSR) is a forum to gather researchers interested in the natural language aspects of:
• Multilingual on-line accessible mathematical content;
• Advanced tools for automated and interactive theorem proving and problem solving;
• Rigorous reasoning methods and tools;
• Formal methods and tools (making them more accessible to non-experts);
• Generating explanations from business rules.

The complexity of ATP and ITP systems makes them usable only by experts. One of the cornerstones of addressing a more general audience is the ability to be queried and to generate results in natural language. This necessity is palpable in the fact that many systems in the theorem-proving community try hard to make proofs as close as possible to the “natural language” of mathematicians. In the area of education, intelligent tools such as automated and interactive theorem provers, the automatic discovery of properties, problem repositories, the formalisation of fragments of theories, etc., can bring a whole new dimension to mathematical education. The current development of these tools is focused on usability for experts, and it is still a major challenge to make such tools ready for widespread use. However, for some tools it seems to be the right time to begin addressing the next challenge, i.e., to link and adapt them to specific educational needs.
On the side of language technology, some dialog managers use theorem provers to drive the underlying logic, and controlled natural languages exist that are based on abstract representations compatible with the ones used in reasoners. A third aspect comes from the formal representation of mathematics with the aim of automatically checking or translating existing material.

This year there were 19 extended abstracts submitted to NLCS, of which 16 were accepted; 3 extended abstracts were submitted to NLSR, of which 2 were accepted.

We thank all those who have contributed to this meeting: the NLSR sponsor, the Association for Logic, Language and Information (FoLLI, http://www.folli.info/), and, most importantly, the invited speakers Aarne Ranta, Anne Abeillé and Laure Vieu, the contributing authors, the referees, the members of the programme committees and the local organizers, all of whose time and effort have contributed to the practical and scientific success of the meeting. Support from EasyChair is also gratefully acknowledged.

July 17-18, 2014, Vienna
Valeria de Paiva, Walther Neuper, Pedro Quaresma, Christian Retoré, Lawrence S. Moss, Jordi Saludes
Joint Proceedings NLCS’14 & NLSR 2014, July 17-18, 2014, Vienna, Austria
Programme Committee NLCS
Robin Cooper, University of Gothenburg
Valeria de Paiva, Nuance Communications
Christophe Fouqueré, University Paris 13
Lawrence S. Moss, Indiana University
Ian Pratt-Hartmann, University of Manchester, UK
Christian Retoré, University of Bordeaux
Wlodek Zadrozny, UNC Charlotte

Organisers NLCS
Valeria de Paiva, Nuance Communications (co-chair)
Lawrence S. Moss, Indiana University (co-chair)
Christian Retoré, University of Bordeaux (co-chair)

Programme Committee NLSR
Yannis Haralambous, Institut Mines-Télécom, Télécom Bretagne
Walther Neuper, Graz University of Technology
Bengt Nordström, Chalmers University of Technology
Pedro Quaresma, University of Coimbra
Jordi Saludes, Polytechnic University of Catalonia

Organisers NLSR
Walther Neuper, Graz University of Technology (co-chair)
Pedro Quaresma, University of Coimbra (co-chair)
Jordi Saludes, Polytechnic University of Catalonia (co-chair)
CISUC/TR 2014-02
NLCS’14 / NLSR 2014
Conference Program

Part I: Invited Speakers (p. 1)

3   Machine Translation: Green, Yellow, and Red
    Aarne Ranta

Part II: Contributed Papers NLCS (p. 5)

7   Solving Partee’s Temperature Puzzle in an EFL-Ontology
    Kristina Liefke
19  Mereological Delineation Semantics and Quantity Comparatives
    Heather Burnett
31  Signatures as open-ended types
    Tim Fernando
43  Divergence in Dialogues
    Christophe Fouqueré and Myriam Quatrini
55  Algebraic Effects and Handlers in Natural Language Interpretation
    Jiří Maršík and Maxime Amblard
67  Analyzing the Structure of Argument Compounds in Logic: the case of technical texts
    Juyeon Kang and Patrick Saint-Dizier
83  Grice, Hoare and Nash: some comments on conversational implicature
    Rohit Parikh
95  Computing Italian clitic and relative clauses with tupled pregroups
    Claudia Casadio and Aleksandra Kiślak-Malinowska
111 Toward a Logic of Cumulative Quantification
    Makoto Kanazawa and Junri Shimada
125 Modelling implicit dynamic introduction of function symbols in mathematical texts
    Marcos Cramer
137 On Translating Context-Free Grammars into Lambek Categorial Grammars
    Stepan Kuznetsov
143 Program Extraction Applied to Monadic Parsing
    Ulrich Berger, Alison Jones, Monika Seisenberger
155 How to do things with types
    Robin Cooper
165 A low-level treatment of generalised quantifiers in categorical compositional distributional semantics
    Ondřej Rypáček, Mehrnoosh Sadrzadeh

Part III: Contributed Papers NLSR (p. 179)

181 Deduction for Natural Language Access to Data
    Cleo Condoravdi, Kyle Richardson, Vishal Sikka, Asuman Suenbuel and Richard Waldinger
195 Exposing Predictive Analytics through Natural Language
    Jeroen Van Grondelle, Christina Unger and Frank Smit

Index of Authors (p. 203)
Part I
Invited Speakers
Machine Translation: Green, Yellow and Red

Aarne Ranta
University of Gothenburg

Abstract

Machine translation (MT) can be divided into quality-oriented and coverage-oriented approaches (also known as dissemination and assimilation, respectively). The current mainstream is coverage-oriented: most people use MT for getting an idea of what some document is about, but don’t rely on it when they want to publish their own documents. Coverage-oriented systems must be able to translate everything, whereas quality-oriented systems usually have to sacrifice coverage and specialize on some domain. Most available coverage-oriented systems are statistical (Google Translate, Bing), but there are also rule-based systems available (Systran, Apertium). In MT research, the main line of research is hybrid systems combining statistics with linguistic knowledge.

In this talk, we will present a hybrid MT approach based on GF, Grammatical Framework. Most of the previous work in GF has focused on small, quality-oriented systems working on controlled languages; the main asset has been the scalability to high numbers of parallel languages. But recent developments in GF runtime algorithms and language resources have made it possible to address the coverage-oriented task of “translating everything”. This happens of course with some loss of quality, but the great advantage of GF (and some other knowledge-based systems) is that we can make a clear distinction between levels of confidence. We have used this knowledge in translation programs by marking translations as green (reliable), yellow (grammatically correct but unreliable), and red (unreliable but “still better than nothing”). There is also a clear recipe for improving the quality by increasing the size of the “green” area; green translations are typically inherited from domain-specific controlled languages.

The talk will explain how grammars of different levels are created and combined, how statistics is used in the translation process and for bootstrapping grammars, and how the resulting system performs in comparative evaluation. The current system is available in eleven languages, both as a web service and as a mobile Android app: http://www.grammaticalframework.org/demos/translation.html. The talk is based on joint work with Krasimir Angelov, Inari Listenmaa, Prasanth Kolachina, Ramona Enache, and Thomas Hallgren, and partly funded by the Swedish Research Council under grant nr. 2012-5746 (Reliable Multilingual Digital Communication).
Part II
Contributed Papers NLCS
Solving Partee’s Temperature Puzzle in an EFL-Ontology∗
Kristina Liefke
Munich Center for Mathematical Philosophy, Ludwig-Maximilians-Universität, Munich, Germany
[email protected] Abstract According to the received view of type-logical semantics (suggested by Montague and adopted by many of his successors), the number of a semantics’ basic types depends proportionally on the syntactic and lexical diversity of the modeled natural language fragment. This paper provides a counterexample to this principle. In particular, it shows that Partee’s temperature puzzle – whose solution is commonly taken to require a basic type for indices (for the formation of individual concepts) or for individual concepts – can be interpreted in the poorer type system from [21], which only assumes basic individuals and propositions. We use this result to defend the invariance of formal semantic models under their objects’ codings. This result further contributes to the project of identifying the minimal semantic requirements on models for certain linguistic fragments.
1 Introduction
It is a commonplace in ontology engineering that, to model complex target systems, we need to assume many different kinds of objects. The semantic ontology of natural languages is no exception to this: To interpret a reasonably rich fragment of English, we assume the existence of individuals, propositions, properties of individuals, relations between individuals, situations, events (or eventualities), degrees, times, substances, kinds, and many other types of objects. These objects serve the interpretation of proper names, declarative sentences or complement phrases, common nouns or intransitive verbs, transitive verbs, neutral perception verbs, adverbial and degree modifiers, temporal adjectives, mass terms, bare noun phrases, etc.

Traditional type-logical semantics (esp. [21–23]) tames this zoo of objects by assuming only a small set of primitive objects, and by constructing all other types of objects from these primitives via a number of type-forming rules. In this way, Montague reduces the referents of the basic fragment of English from [21] (hereafter, the EFL-fragment) to two basic types of objects: individuals (type ι) and propositions (analyzed as functions from indices to truth-values, type σ → t; abbreviated ‘o’). From these objects, properties of individuals and binary relations between individuals are constructed as functions from individuals to propositions (type ι → o), respectively as curried functions from ordered pairs of individuals to propositions (i.e. as functions from individuals to

∗ Thanks to Ede Zimmermann, whose comments on my talk at Sinn und Bedeutung 18 have inspired this paper.
functions from individuals to propositions; type ι → (ι → o)). Since Montague’s semantics reduces the number of basic objects in the semantic ontology to (constructions out of) a small set of primitives, we hereafter refer to the above-described view of natural language semantics as the reduction view of formal semantics. This view is characterized in Definition 1:

Definition 1 (Reduction view). Many (types of) objects in the linguistic ontology can be coded as constructions out of a few (types of) semantic primitives. The coding relations between objects enable a compositional interpretation of natural language.

In the last forty years, revisions and extensions of Montague’s formal semantics have caused many semanticists to depart from the reduction view. This departure is witnessed by the introduction of a fair number of new basic types (including types for primitive propositions (cf. [4, 30]), situations (cf. [1, 19]), events (cf. [8, 9]), degrees (cf. [7]), and times (cf. [10])). The introduction of these new types is motivated by the coarse grain of set-theoretic functions (s.t. Montague’s semantics sometimes generates wrong predictions about natural language entailment; cf. [2, 6, 20]), and by the need to find semantic values for ‘new’ kinds of expressions (e.g. for neutral perception verbs, and for adverbial and degree modifiers) which are not included in Montague’s small linguistic fragments. Since many of these values are either treated as semantic primitives or are identified with constructions out of the new primitives, we will hereafter refer to this view of formal semantics as the ‘ontology engineering’ view.1 This view is captured below:

Definition 2 (‘Ontology engineering’ view). Many (types of) objects in the semantic ontology of natural language cannot be coded as constructions out of other (types of) objects. The compositionality of natural language interpretation is only due to the coding relations between a small subset of objects.
The ‘ontology engineering’ view of formal semantics is supported by Montague’s semantics from [21] and [23]. The latter interpret the EFL-fragment in models with primitive individuals and propositions, and interpret the fragment of English from [23] (called the PTQ-fragment) in models with primitive individuals, indices (i.e. possible worlds, or world/time-pairs), and truth-values. Since the PTQ-fragment extends the lexicon of the EFL-fragment via intensional common nouns (e.g. temperature, price) and intensional intransitive verbs (e.g. rise, change), since the frames of PTQ-models extend the frames of EFL-models via (constructions out of) individual concepts, and since PTQ-models interpret intensional common nouns and intransitive verbs as functions over individual concepts, it is commonly assumed that any empirically adequate model for the PTQ-fragment requires a basic type for indices, or for individual concepts.

The adoption of the ‘ontology engineering’ view has a number of practical advantages. In particular, the availability of a larger number of basic types facilitates work for
1 Notably, the ‘ontology engineering’ view still understands formal semantics as a compositional enterprise, which identifies each expression’s semantic value with its contribution to the truth-conditions of the sentence in which the expression occurs.
the empirical linguist: In a rich type system, fewer syntactic expressions are interpreted in a non-basic type.2 As a result, the compositional translations of many syntactic structures will be simpler, and will involve fewer lambda conversions than their ‘reductive’ counterparts.

However, the proliferation of basic types is not an altogether positive development. Specifically, the adoption of additional basic types reduces the number of representational relations between different types of objects (cf. [16, 25]) and obfuscates the semantic requirements on different linguistic fragments. For example, since Montague does not discuss the relation between his model-theoretic objects from [21] and [23], it remains unclear whether a primitive type for indices is required for the interpretation of the PTQ-fragment. However, an identification of these requirements is vital to our understanding of the semantic type system of natural language.

This paper aids the identification of the above requirements. In particular, it shows that Partee’s temperature puzzle from [23, pp. 267–268] – which is commonly taken to prove the impossibility (or undesirability) of interpreting the PTQ-fragment in an EFL-semantics – can be interpreted in the poorer semantic models from [21]. The paper is organized as follows: Section 2 introduces Partee’s temperature puzzle for extensional semantics, and presents Montague’s solution to this puzzle from [23]. Section 3 identifies a strategy for the puzzle’s solution in an EFL-ontology, and applies this strategy. The paper closes with a summary of our results and with pointers to future work.
2 Partee’s Temperature Puzzle and Montague’s Solution
Partee’s temperature puzzle [23, pp. 267–268] identifies a problem with extensional semantics for natural language, which regards their validation of the counterintuitive inference from (?):

(?)  The temperature is ninety.
     The temperature rises.
     Ninety rises.

The origin of this problem lies in the different readings of the phrase the temperature in the two premises of (?), and in the inability of extensional semantics to accommodate one of these readings: In the second premise, the occurrence of the phrase the temperature is intuitively interpreted as a function (type σ → ι) from worlds and times (or ‘indices’, type σ) to the temperature at those worlds at the given times.3 In the first premise, the occurrence of the phrase the temperature is interpreted as the value (type ι) of the above
2 For example, since linguists typically assign degree modifiers (e.g. very) the type for degrees δ (rather than the type for second-order properties of individuals, (ι → o) → o), gradable adjectives (e.g. tall) receive a translation in the type δ → (ι → o), instead of the type ((ι → o) → o) → (ι → o).
3 As a result, this reading is sometimes called the function reading [17, 28]. The reading of the phrase the temperature from the first premise is called the value reading.
function at the actual world at the current time. Since extensional semantics (like the semantics of Church’s Simple Theory of Types [5], and Montague’s semantics from [21]) do not have a type for indices – such that they also lack a type for index-to-value functions – they are unable to capture the reading of the phrase the temperature from the second premise.

The inference from the conjunction of the logical translations of the two premises of (?) to the translation of the conclusion of (?) in classical extensional logic (cf. [5]) is given in Figure 1. In the figure, the variables x and y range over individuals (type ι). The expression ninety is an individual constant (type ι). The expressions temp and rise are constants for first-order properties of individuals (i.e. for functions from individuals to propositions, type ι → o).

∃x∀y.(temp(y) ↔ x = y) ∧ x = ninety
∃x∀y.(temp(y) ↔ x = y) ∧ rise(x)
∴ rise(ninety)

Figure 1: The valid counterintuitive inference in extensional semantics.

The asserted identity of the temperature x with the value ninety in the formula in the first premise of Figure 1 justifies the (counterintuitive) substitution of the translation of the phrase the temperature from the second premise by the translation of the name ninety.

Montague’s semantics from [23] blocks the counterintuitive inference from Figure 1 by interpreting intensional common nouns (e.g. temperature) and intransitive verbs (e.g. rise) as properties of individual concepts (type (σ → ι) → o), and by restricting the interpretation of the copula is to a relation between the extensions of two individual concepts at the actual world @ at the current time (i.e. to a curried relation between individuals, type ι → (ι → o)). Since the first premise of (?) thus only asserts the identity of the individual ‘the temperature at @’ and the value ninety, it blocks the substitution of the ‘individual concept’-denoting phrase the temperature in the second premise of (?) by the name ninety. The invalidity of the inference from the conjunction of the two premises of (?) to the conclusion of (?) in (a streamlined version of) Montague’s Intensional Logic (cf. [13]) is captured in Figure 2. In the figure, c and c1 are variables over individual concepts (type σ → ι). The expressions temp and rise are constants for first-order properties of individual concepts (i.e. for functions from individual concepts to propositions, type (σ → ι) → o). ninety is a constant (type σ → ι) for the function from indices to the denotation of the constant ninety.4 We let @ be the type-σ constant for the current index. The remaining expressions are typed as above.
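The validity of the pattern in Figure 1 can be spot-checked mechanically on a toy finite model. The domain and the extensions chosen for temp and rise below are invented for the sketch; the point is only that the unique temperature-witness is forced to equal ninety, so the conclusion follows by substitution.

```python
# Finite-model spot check of Figure 1 (a sketch; model values are invented):
# whenever both premises hold, the unique witness x equals ninety, so rise(ninety) follows.
individuals = [85, 90]
ninety = 90
temp_ = lambda y: y == 90          # 'temperature' as a first-order property (type ι → o)
rise_ = lambda x: x >= 90          # 'rise' (type ι → o); any extension makes the point
unique_temp = lambda x: all(temp_(y) == (x == y) for y in individuals)
premise1 = any(unique_temp(x) and x == ninety for x in individuals)
premise2 = any(unique_temp(x) and rise_(x) for x in individuals)
conclusion = rise_(ninety)
```

In this model both premises come out true, and so does the conclusion; the substitution the text describes is exactly what forces this.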
4 Thus, ninety satisfies the axiom ∀iσ. ninety(i) = ninety.
∃c∀c1.(temp(c1) ↔ c = c1) ∧ c(@) = ninety
∃c∀c1.(temp(c1) ↔ c = c1) ∧ rise(c)
⊬ rise(ninety)

Figure 2: The invalid counterintuitive inference in PTQ-semantics.
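The blocking effect of Figure 2 can likewise be spot-checked on a toy finite model; the two-index frame and the concept values below are invented for the sketch.

```python
# Finite-model spot check of Figure 2 (a sketch; indices and values are invented).
# Individual concepts are maps from indices to individuals; index 0 plays the role of @.
c_temp = {0: 90, 1: 95}            # the (rising) temperature concept
c_ninety = {0: 90, 1: 90}          # the rigid concept denoted by 'ninety'
concepts = [c_temp, c_ninety]
temp = lambda c: c is c_temp       # 'temperature' holds of exactly one concept
rise = lambda c: c[1] > c[0]
unique_temp = lambda c: all(temp(c1) == (c is c1) for c1 in concepts)
premise1 = any(unique_temp(c) and c[0] == 90 for c in concepts)
premise2 = any(unique_temp(c) and rise(c) for c in concepts)
conclusion = rise(c_ninety)        # False: substituting 'ninety' is blocked
```

Both premises hold in this model, yet rise applied to the rigid concept for ninety fails: the first premise only identifies the concepts' extensions at @, not the concepts themselves.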
Since Montague’s models from [21] only assume basic types for individuals and propositions (s.t. they do not allow the construction of individual concepts via the available type-forming rules), it is commonly assumed that these models are unable to block the counterintuitive inference from (?). We show below that this assumption is mistaken.
3 A New, Extensional, Solution to the Puzzle
To demonstrate that Montague’s models from [21] enable a solution to Partee’s temperature puzzle, we first identify a strategy for the type-{ι, o} representation of indices, which allows us to code all linguistically relevant objects in the class of models from [23] as objects in the class of models from [21] (in Sect. 3.1). We then show that type-incompatibilities, which result from the application of this coding procedure to the compositional interpretation of the sentences from (?), are easily resolved through (EFL-correlates of) the familiar type-shifting operations from [25] and [3, 15, 18] (in Sect. 3.2).
3.1 Coding PTQ-Objects as EFL-Objects
To block the counterintuitive inference from (?) in the class of models from [21], we code indices as type-o propositions. This coding is made possible by the interpretation of o as the type for functions from indices to truth-values (s.t. there is an injective map, λiσ. i = w, from indices w to propositions), and is supported by a corollary of Stone’s Theorem. The existence of an injective relation between indices and propositions justifies the replacement of each occurrence of ‘σ’ in the type for individual concepts by the type o. The traditional linguistic designators of individual concepts will then be interpreted in the type o → ι. Intensional common nouns and intransitive verbs receive an interpretation in the type (o → ι) → o. Expressions from all other lexical classes of the PTQ-fragment retain their original type-assignment. In particular, extensional common nouns (e.g. man) and intransitive verbs (e.g. walk) are interpreted in the type ι → o.

The coding relation between objects in the models from [21] and [23] is captured in Definitions 5 and 6. In the definitions, TY2 types and EFL types are defined as follows:

Definition 3 (TY2 types). The set 2Type of TY2 types is the smallest set of strings such that ι, σ, t ∈ 2Type and, for all α, β ∈ 2Type, (α → β) ∈ 2Type.

Definition 4 (EFL types). The set 1Type of EFL types is the smallest set of strings such that ι, o ∈ 1Type and, for all α, β ∈ 1Type, (α → β) ∈ 1Type.
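The injectivity of the map λiσ. i = w can be illustrated on a small frame, with a proposition modeled extensionally as the set of indices at which it is true; the three-index frame is invented for the sketch.

```python
# Sketch of the injective map λiσ. i = w from indices to propositions, with a
# proposition modeled as the (frozen) set of indices at which it is true.
indices = ["w0", "w1", "w2"]
code = lambda w: frozenset(i for i in indices if i == w)   # the proposition λi. i = w
images = {w: code(w) for w in indices}
injective = len(set(images.values())) == len(indices)      # distinct indices, distinct codes
```

Each index w is sent to the singleton proposition true exactly at w, so no two indices share a code.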
Our use of the types from Gallin’s logic TY2 instead of the types from Montague’s logic IL from [23] is motivated by the greater perspicuity of TY2 (in comparison to IL; cf. [24, Ch. 3]), and by the possibility of embedding IL into TY2 (cf. [13]). Because of the codability of n-ary as unary function types [27] (cf. [24]), our definition of EFL types neglects the existence of n-ary function types.

It is clear from the above and from the definition of o as σ → t that all EFL types are TY2 types, but not the other way around. In particular, the TY2 type σ and constructions out of this type (esp. the types σ → ι and (σ → ι) → t) are not EFL types. The relation between TY2 types and EFL types is captured below:

Definition 5 (Type-conversion). The function ξ sends TY2 types to EFL types by the following recursion:

I. (i)   ξ(ι) = ι;  ξ(σ → t) = o;
   (ii)  ξ(σ → ι) = (o → ι);
   (iii) ξ(t) = o, if t is not a constituent of (an equivalent of) σ → t;
         ξ(σ) = o, if σ is not a constituent of (an equiv. of) σ → t or σ → ι;
II.      ξ(α → β) = (ξ(α) → ξ(β)), where α, β ∈ 2Type.
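Definition 5 can be transcribed as a small recursive function. The tuple encoding of function types and the string names for the base types are choices of this sketch, and clause I(iii)'s side conditions are approximated by matching the special cases σ → t and σ → ι before recursing into subtypes.

```python
# Sketch of the type-conversion function ξ from Definition 5. TY2 types are
# encoded as "i" (ι), "s" (σ), "t", or pairs (a, b) for a → b; "o" is the EFL
# proposition type. The special cases σ → t and σ → ι are matched first, which
# approximates the side conditions of clause I(iii).
def xi(a):
    if a == "i":
        return "i"                       # I(i): ξ(ι) = ι
    if a == ("s", "t"):
        return "o"                       # I(i): ξ(σ → t) = o
    if a == ("s", "i"):
        return ("o", "i")                # I(ii): ξ(σ → ι) = o → ι
    if a in ("t", "s"):
        return "o"                       # I(iii): standalone t or σ
    alpha, beta = a
    return (xi(alpha), xi(beta))         # II: ξ(α → β) = ξ(α) → ξ(β)
```

For instance, the type of properties of individual concepts, (σ → ι) → (σ → t), comes out as (o → ι) → o, as in the text below the definition.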
Clause I captures the conversion of the TY2 types for individuals, individual concepts, and propositions (cf. subclauses (i), (ii)), and of the TY2 types for indices and truth-values (under certain conditions, cf. subclause (iii)). Clause II captures the conversion of the remaining complex TY2 types. In particular, the conjunction of clause II with the clauses for the conversion of the TY2 types for individual concepts and for propositions enables the conversion of the type for properties of individual concepts, (σ → ι) → (σ → t), to the type ξ(σ → ι) → ξ(σ → t), i.e. the type for properties of functions from propositions to individuals, (o → ι) → o. The conditions on the two items from clause (I.iii) prevent the undesired conversion of types of the form α → (σ → t) into types ξ(α) → (ξ(σ) → ξ(t)) (i.e. into types ξ(α) → (o → o)).

This completes our discussion of the relation between TY2 types and EFL types. We next define the coding function on TY2 objects. In this definition, TY2 frames are frames which only contain objects of a TY2 type. EFL frames are frames which only contain objects of an EFL type:

Definition 6 (EFL encoding). Let F² = {D²α | α ∈ 2Type} and F¹ = {D¹α | α ∈ 1Type} be TY2 and EFL frames, respectively, for which it holds that D²ι = D¹ι and D²σ→t = D¹o. For each TY2 type α, we define a function Ξα : D²α → D¹ξ(α) by the following recursion:

I. (i)   Ξι(d) = d, if d ∈ D²ι;  Ξσ→t(d) = d, if d ∈ D²σ→t;
   (ii)  If f ∈ D²σ→ι, then Ξσ→ι(f) is the function g ∈ D¹o→ι s.t., for all w ∈ D²σ, f(w) = g(λiσ. i = w);
   (iii) If d ∈ D²t and d is not in the range of a function in D²σ→t, then Ξt(d) is the function f ∈ D¹o s.t., for all w ∈ D²σ, f(w) = d;
         If w ∈ D²σ and w is not in the domain of a function in D²σ→t or D²σ→ι, then Ξσ(w) is the function f ∈ D¹o s.t. f = (λiσ. i = w);
II.      If h ∈ D²α→β, then Ξα→β(h) is the function h′ ∈ D¹ξ(α)→ξ(β).
It is easy to prove that all functions Ξα are injections.
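Clause I(ii) of Definition 6 can be illustrated on a two-index frame; the frame and the values of the concept f below are invented for the sketch, and propositions are again modeled as frozensets of indices.

```python
# Sketch of clause I(ii) of Definition 6: recoding an individual concept
# f : σ → ι as a function g : o → ι with f(w) = g(λiσ. i = w).
indices = [0, 1]
code = lambda w: frozenset({w})          # the proposition λiσ. i = w
def Xi(f):
    table = {code(w): f[w] for w in indices}
    return lambda p: table[p]            # defined here only on coded indices
f = {0: 90, 1: 95}                       # an individual concept
g = Xi(f)
ok = all(f[w] == g(code(w)) for w in indices)
```

Since distinct concepts differ at some index, and coded indices are distinct, the recoding loses no information, in line with the injectivity claim above.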
3.2 Modeling the Puzzle in EFL-Frames
The previous section has identified a procedure for the coding of TY2 objects. The present section uses this coding procedure to solve Partee’s temperature puzzle in an EFL-ontology.

To facilitate our proof that the temperature puzzle can be solved in an {ι, o}-based semantics, we invert Montague’s strategy of “generalizing to the worst case” [26, p. 34] (cf. [16]): Rather than interpreting all expressions from the PTQ-fragment as functions over (the EFL-correlates of) individual concepts and obtaining their extensional equivalents through the use of meaning postulates, we interpret these expressions as functions over individuals. Only intensional common nouns and intensional intransitive verbs retain their interpretation as functions over (the EFL-correlates of) individual concepts. The application of the logical translations of intensional expressions to the translations of other PTQ-expressions is then handled through type-shifting.

In particular, to enable the application of the EFL-translations of determiners (i.e. expressions of the type5 (ι → o) → ((ι → o) → o)) to the translations of intensional common nouns (type (o → ι) → o) in an EFL-typed logic, we introduce the extensionalization operator ext. This operator sends properties of the EFL-correlates of individual concepts (type (o → ι) → o) to properties of individuals (type ι → o). Below, the type-o constant w denotes the EFL-correlate of the current index, @.6 The variables c (type o → ι) and T (type (o → ι) → o) are the EFL-proxies for variables over individual concepts, respectively over properties of individual concepts.

Definition 7. The function ext := λTλx∃c.T(c) ∧ x = c(w) sends terms of the type (o → ι) → o to terms of the type ι → o.

The operator ext enables the extensionalization of the type-((o → ι) → o) translation, temp, of the noun temperature to the term λx∃c.temp(c) ∧ x = c(w).
This term denotes the property of being identical to the result of applying some type-(o → ι) witness, c, of the property denoted by temp to the type-o correlate, w, of the current index. As a result, the term λx∃c.temp(c) ∧ x = c(w) denotes the property of being the temperature at the current index.
5 Here, the type of the argument is underlined.
6 Thus, w := (λiσ. i = @).
The possibility of interpreting intensional nouns in the type ι → o enables the EFL-translation of the first premise from (?). This translation is given in (1). In the translation, lift := λxλP_{ι→o}.P(x) is a variant of the type-lifting operator from [25]. The variables P and P₁ range over objects of the type ι → o. The variable Q ranges over objects of the type (ι → o) → o:

(1)
1. [np ninety]    ninety
2. [tv is]    λQλx.Q(λy.x = y)
3. [vp [tv is][np ninety]]    λQλx.Q(λy.x = y) [lift(ninety)]
   = λQλx.Q(λy.x = y) [λP.P(ninety)]
   = λx.[λP.P(ninety)](λy.x = y)
   = λx.(x = ninety)
4. [n temperature]    temp = λc.temp(c)
5. [det the]    λP₁λP∃x∀y.(P₁(y) ↔ x = y) ∧ P(x)
6. [np [det the][n temperature]]    λP₁λP∃x∀y.(P₁(y) ↔ x = y) ∧ P(x) [ext(temp)]
   = λP₁λP∃x∀y.(P₁(y) ↔ x = y) ∧ P(x) [λz∃c.temp(c) ∧ z = c(w)]
   = λP∃x∀y.([λz∃c.temp(c) ∧ z = c(w)](y) ↔ x = y) ∧ P(x)
   = λP∃x∀y.((∃c.temp(c) ∧ y = c(w)) ↔ x = y) ∧ P(x)
7. [s [np [det the][n temperature]][vp [tv is][np ninety]]]
   λP∃x∀y.((∃c.temp(c) ∧ y = c(w)) ↔ x = y) ∧ P(x) [λz.(z = ninety)]
   = ∃x∀y.((∃c.temp(c) ∧ y = c(w)) ↔ x = y) ∧ [λz.(z = ninety)](x)
   = ∃x∀y.((∃c.temp(c) ∧ y = c(w)) ↔ x = y) ∧ x = ninety
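The β-reductions in steps (1.1)–(1.3) can be checked mechanically; the sketch below encodes the λ-terms as Python closures, with the number 90 standing in for the constant ninety:

```python
# A mechanical check of steps (1.1)-(1.3), encoding the λ-terms as Python
# closures (the individual 90 stands in for the constant ninety):
ninety = 90

lift = lambda x: (lambda P: P(x))                  # lift := λxλP.P(x)
is_tv = lambda Q: (lambda x: Q(lambda y: x == y))  # is ⇝ λQλx.Q(λy.x = y)

vp = is_tv(lift(ninety))  # translation of [vp is ninety]

print(vp(90))  # True: behaves like λx.(x = ninety)
print(vp(75))  # False
```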
Notably, the term from the last line of (1.7) does not result from the term in the first premise of Figure 2 simply by replacing the variables ‘c’ and ‘c₁’ and the constants ‘temp’ and ‘rise’ by their EFL-counterparts. In particular, while the term in the first premise of Figure 2 states the existence of a unique witness of the type-((o → ι) → o) property of being a temperature, the term from (1.7) only states the existence of a unique witness of the type-(ι → o) property of being the temperature at the current index. However, since the occurrence of the phrase the temperature in the first premise of (?) receives an extensional interpretation (type ι), this weakening is unproblematic. We will see at the end of this section that our weaker EFL-term still blocks Partee’s temperature puzzle.

To enable the combination of determiners (traditionally, type (ι → o) → ((ι → o) → o)) with intensional common nouns (type (o → ι) → o) – and, thus, to enable the combination of noun phrases with intensional intransitive verbs –, we introduce the intensionalization operator int. This operator sends individuals to the EFL-correlates of individual concepts, and sends functions from properties of individuals to generalized quantifiers over individuals (type (ι → o) → ((ι → o) → o)) to functions from the EFL-
correlates of properties of individual concepts to the EFL-correlates of generalized quantifiers over individual concepts (type ((o → ι) → o) → (((o → ι) → o) → o)). In Definition 8, ninety is the EFL-correlate of the constant ninety.⁷

Definition 8. The operator int works as follows:

int(ninety) := ninety
int(λP₁λP∃x.P₁(x) ∧ P(x)) := λT₁λT∃c.T₁(c) ∧ T(c)
int(λP₁λP∀x.P₁(x) → P(x)) := λT₁λT∀c.T₁(c) → T(c)
int(λP₁λP∃x∀y.(P₁(y) ↔ x = y) ∧ P(x)) := λT₁λT∃c∀c₁.(T₁(c₁) ↔ c = c₁) ∧ T(c)

The operator int is similar to an ‘ι-to-(σ → ι)’-restricted variant of the intensionalization operator for extensional TY2 terms from [14] (cf. [11, Ch. 8.4], [3, 15, 18]). This operator systematically replaces each occurrence of ι in the type of a linguistic expression by the type σ → ι (our type o → ι). As a result, the type for generalized quantifiers over individuals, (ι → o) → o, will be replaced by the type ((σ → ι) → o) → o (our type ((o → ι) → o) → o).

The interpretation of intensional noun phrases in the type ((o → ι) → o) → o enables the translation of the second premise from (?). In the translation, the type-((o → ι) → o) term rise is the EFL-correlate of the constant rise.

(2)
1. [iv rises]    rise
2. [n temperature]    temp
3. [det the]    λP₁λP∃x∀y.(P₁(y) ↔ x = y) ∧ P(x)
4. [np [det the][n temperature]]    int(λP₁λP∃x∀y.(P₁(y) ↔ x = y) ∧ P(x)) [temp]
   = λT₁λT∃c∀c₁.(T₁(c₁) ↔ c = c₁) ∧ T(c) [temp]
   = λT∃c∀c₁.(temp(c₁) ↔ c = c₁) ∧ T(c)
5. [s [np [det the][n temperature]][iv rises]]    ∃c∀c₁.(temp(c₁) ↔ c = c₁) ∧ rise(c)
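Step (2.4) can likewise be sketched executably; the finite stock of concept-correlates and their names are assumptions made only so that the quantifiers can be evaluated:

```python
# A sketch of step (2.4): int applied to the determiner translation, then to
# temp, over a finite stock of concept-correlates (an assumption made purely
# so the quantifiers are executable; the names are illustrative).
concepts = ["c_temp", "c_other1", "c_other2"]

def the_int(T1):
    """λT1λT∃c∀c1.(T1(c1) ↔ c = c1) ∧ T(c)"""
    return lambda T: any(
        all(T1(c1) == (c == c1) for c1 in concepts) and T(c)
        for c in concepts)

temp = lambda c: c == "c_temp"                # exactly one temperature concept
rise = lambda c: c in ("c_temp", "c_other1")  # the concepts that rise

the_temperature = the_int(temp)
print(the_temperature(rise))  # True: the unique temperature concept rises
```

The uniqueness clause matters: if the restrictor held of more than one (or no) concept, the whole term would come out false.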
The interpretation of proper names in the type o → ι enables us to translate the conclusion from (?) as follows:

(3)
1. [np ninety]    ninety
2. [iv rises]    rise
3. [s [np ninety][iv rises]]    rise [int(ninety)]
   = rise(ninety)
This completes our translation of the ‘ingredient sentences’ for Partee’s temperature puzzle. The invalid inference from the conjunction of (1.7) and (2.5) to (3.3) in our EFL-semantics is captured in Figure 3.

⁷ Thus, ninety satisfies the metatheoretical axiom ∀p_o.ninety(p) = ninety.
∃x∀y.((∃c.temp(c) ∧ y = c(w)) ↔ x = y) ∧ x = ninety
∃c∀c₁.(temp(c₁) ↔ c = c₁) ∧ rise(c)
⊬ rise(ninety)

Figure 3: The invalid inference in EFL-semantics.

In particular, while the formula in the second premise attributes the property ‘rise’ to the type-(o → ι) object which has the property of being a temperature, the formula in the first premise attributes the property ‘is ninety’ only to the result (type ι) of applying a temperature-object to the EFL-correlate of @. In virtue of this fact – and the resulting invalidity of substituting ninety for c in the second premise of Figure 3 –, the formula in the conclusion does not follow from the conjunction of the two premise-formulas by the rules of classical logic.
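The invalidity recorded in Figure 3 can be witnessed by a small countermodel; the sketch below uses indices directly in place of their o-coded correlates and restricts quantification to two concepts, so it illustrates the pattern rather than the paper’s exact encoding:

```python
# A finite countermodel sketch for Figure 3 (indices are used directly in
# place of their o-coded correlates, and quantifiers range over a two-element
# stock of concepts; all data below is invented for illustration).
ninety = {"@": 90, "w1": 90}   # rigid correlate of the constant ninety
c_temp = {"@": 90, "w1": 95}   # the temperature: 90 now, higher later
concepts = [ninety, c_temp]
individuals = [70, 90, 95]

temp = lambda c: c is c_temp        # property of being a temperature
rise = lambda c: c["@"] < c["w1"]   # a concept rises iff its value increases

# Premise 1: ∃x∀y.((∃c.temp(c) ∧ y = c(@)) ↔ x = y) ∧ x = ninety
p1 = any(all(any(temp(c) and y == c["@"] for c in concepts) == (x == y)
             for y in individuals) and x == 90
         for x in individuals)

# Premise 2: ∃c∀c1.(temp(c1) ↔ c = c1) ∧ rise(c)
p2 = any(all(temp(c1) == (c is c1) for c1 in concepts) and rise(c)
         for c in concepts)

print(p1, p2, rise(ninety))  # True True False: the conclusion does not follow
```

Both premise formulas hold, yet the rigid concept ninety does not rise, which is exactly the substitution failure described above.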
4 Conclusion

This paper has demonstrated that Partee’s temperature puzzle can be solved in an EFL-semantics, which commands only basic types for individuals and propositions. We have obtained this result by coding indices in the type for propositions, and by defining shiftability relations between functions over individuals and the EFL-correlates of individual concepts. Since the temperature puzzle is a general litmus test for empirically adequate PTQ-models, the result shows that ontologically minimal models for the PTQ-fragment need not contain indices or individual concepts.

The identification of the minimal semantic requirements on PTQ-models is a contribution to the general program of reverse formal semantics. This program – named in analogy with the corresponding program, reverse mathematics, in the foundations of mathematics [12, 29] – attempts to identify the minimal formal semantics (with ‘minimality’ defined w.r.t. the number and complexity of basic types) which interpret certain fragments of natural language. Questions in reverse formal semantics include the following:

(1) What minimal number of types is required for the interpretation of certain fragments of natural language (e.g. the EFL- or PTQ-fragment)?

(2) What is the identity of the types in the smallest set of types from (1)? What are the objects in the types’ domains?

(3) Which equinumerous type-sets are equivalent (up to coding) to the sets from (1) and (2) (s.t. they preserve the explanatory and modeling power of a semantics with basic types from the original set)?

(4) By (1) to (3): Which existing formal semantics (for which fragments?) can be reduced to which other semantics (for which other fragments)?
(5) Which types (if any) resist coding through the familiar types (s.t. they require the jump from one to another, non-equivalent, set of types)?

The present paper has provided a partial answer for particular cases of questions (1), (3), and (4). In particular, it has shown that the set of basic types from [21] is equivalent to the basic-type set from [23] (cf. (3)), such that we can reduce Montague’s formal semantics for the PTQ-fragment to (a variant of) the semantics from [21] (cf. (4)). We hope that the answers to other cases of the above questions and the answers to the remaining questions will identify new relations between existing formal semantic models, and that they will further our insight into the type system of natural language.
References

[1] Jon Barwise, Scenes and other situations, The Journal of Philosophy 78/7 (July 1981), 369–397.
[2] Jon Barwise and John Perry, Situations and Attitudes, MIT Press, Cambridge, Mass., 1983.
[3] Johan van Benthem, Strategies of intensionalization, Intensional Logic, History of Philosophy, and Methodology: To Imre Ruzsa on the occasion of his 65th birthday (I. Bodnár, A. Máté, and L. Pólos, eds.), Department of Symbolic Logic, Eötvös University, Budapest, 1988.
[4] Gennaro Chierchia and Raymond Turner, Semantics and property theory, Linguistics and Philosophy 11 (1988), 261–302.
[5] Alonzo Church, A formulation of the Simple Theory of Types, Journal of Symbolic Logic 5/2 (December 1940), 56–68.
[6] Maxwell J. Cresswell, Logics and Languages, Methuen Young Books, London, 1973.
[7] ———, The semantics of degree, Montague Grammar (Barbara Partee, ed.), Academic Press, New York, 1976.
[8] Donald Davidson, The logical form of action sentences, Essays on Actions and Events: Philosophical essays of Donald Davidson, Clarendon Press, Oxford and New York, 2001 (first published 1967).
[9] David R. Dowty, Word Meaning and Montague Grammar: The semantics of verbs and times in generative semantics and in Montague’s PTQ, Synthese Language Library, vol. 7, D. Reidel Publishing Company, Dordrecht, 1979.
[10] Marie Duží, Bjørn Jespersen, and Pavel Materna, Procedural Semantics for Hyperintensional Logic: Foundations and applications of Transparent Intensional Logic, Logic, Epistemology, and the Unity of Science, vol. 17, Springer, Dordrecht, 2010.
[11] Jan van Eijck and Christina Unger, Computational Semantics with Functional Programming, Cambridge University Press, Cambridge and New York, 2010.
[12] Harvey Friedman, Some systems of second order arithmetic and their use, Proceedings of the International Congress of Mathematicians, Vol. 1, Canadian Mathematical Congress, Montreal, 1974.
[13] Daniel Gallin, Intensional and Higher-Order Modal Logic with Applications to Montague Semantics, North Holland, Amsterdam, 1975.
[14] Philippe de Groote and Makoto Kanazawa, A note on intensionalization, Journal of Logic, Language and Information 22/2 (April 2013), 173–194.
[15] Irene Heim and Angelika Kratzer, Semantics in Generative Grammar, Blackwell Textbooks in Linguistics, vol. 13, Blackwell, Malden, Mass. and Oxford, 1998.
[16] Herman Hendriks, Flexible Montague Grammar, ITLI Prepublication Series for Logic, Semantics and Philosophy of Language 08 (1990).
[17] Theo M.V. Janssen, Individual concepts are useful, Varieties of Formal Semantics: Proceedings of the 4th Amsterdam Colloquium (Fred Landman and Frank Veltman, eds.), Groningen-Amsterdam Studies in Semantics, vol. 3, 1984.
[18] Edward L. Keenan and L. Faltz, Boolean Semantics for Natural Language, Kluwer Academic Publishers, Dordrecht, 1985.
[19] Angelika Kratzer, An investigation into the lumps of thought, Linguistics and Philosophy 12 (1989), 607–653.
[20] David Lewis, General semantics, Synthese 22/1–2 (December 1970), 18–67.
[21] Richard Montague, English as a formal language, Formal Philosophy: Selected papers of Richard Montague (Richmond H. Thomason, ed.), Yale University Press, New Haven and London, 1976.
[22] ———, Universal grammar, Formal Philosophy, 1976.
[23] ———, The proper treatment of quantification in ordinary English, Formal Philosophy, 1976.
[24] Reinhard Muskens, Meaning and Partiality, CSLI Lecture Notes, FoLLI, Stanford, 1995.
[25] Barbara Partee, Noun phrase interpretation and type-shifting principles, Studies in Discourse Representation Theory and the Theory of Generalized Quantifiers (Jeroen Groenendijk, Dick de Jong, and Martin Stokhof, eds.), Foris Publications, Dordrecht, 1987.
[26] ———, The development of formal semantics, The Handbook of Contemporary Semantic Theory (Shalom Lappin, ed.), Blackwell, Oxford, 1996.
[27] Moses Schönfinkel, Über die Bausteine der mathematischen Logik, Mathematische Annalen 92 (1924), 305–316.
[28] Magdalena Schwager, Bodyguards under cover: The status of individual concepts, Proceedings of SALT XVII (T. Friedman and M. Gibson, eds.), Cornell University, Ithaca, 2007.
[29] Stephen G. Simpson, Subsystems of Second Order Arithmetic, Perspectives in Logic, vol. 2, Cambridge University Press, Cambridge, 2009.
[30] Richmond H. Thomason, A model theory for the propositional attitudes, Linguistics and Philosophy 4 (1980), 47–70.
Mereological Delineation Semantics and Quantity Comparatives∗

Heather Burnett¹,²

¹ Département de linguistique et de traduction, Université de Montréal
² Institut Jean Nicod, École normale supérieure, Paris

[email protected]
Abstract

This paper makes a new contribution to the development of new logical systems for the analysis of natural language expression and reasoning. More precisely, I present a new logical analysis of quantity comparatives (i.e. sentences like More linguists came to the party than stayed home to study) within the Delineation Semantics (DelS) approach to gradability and comparison (Klein, 1980, among many others). Along with the Degree Semantics (DegS) framework (see Kennedy, 1997, for a summary), DelS is one of the dominant logical frameworks for analyzing the meaning of gradable constituents of the adjectival syntactic category; however, there has been very little work investigating the application of this framework to the analysis of gradability outside the adjectival domain. This state of affairs distinguishes the DelS framework from its DegS counterpart, where such questions have been investigated in great detail since the beginning of the 21st century. Nevertheless, it has been observed (for example, by Doetjes et al. (2011)) that there is nothing inherently adjectival about the way that the interpretations of scalar predicates are calculated in DelS, and therefore that there is enormous potential for this approach to shed light on the nature of gradability and comparison in the nominal and verbal domains. This paper gives the first attempt at realizing this potential within a Mereological extension of a (simplified) version of Klein (1980)’s system.
1 Introduction

This paper makes a new contribution to the development of new logical systems for the analysis of natural language expression and reasoning. More precisely, I present a new logical analysis of nominal comparatives (i.e. sentences as in (1)) within the Delineation Semantics (DelS) approach to gradability and comparison (Klein, 1980, among many others).

∗ Thanks to Paul Égré, Ed Keenan, Jessica Rett, Robert van Rooij, Ed Stabler and the audience at the NLCS2 workshop in Vienna for helpful comments. This research was partially supported by a SSHRC postdoctoral fellowship to the author.
(1) More linguists came to the party than stayed home to study.
Along with the Degree Semantics (DegS) framework (see Kennedy, 1997, for a summary), DelS is one of the dominant logical frameworks for analyzing the meaning of gradable constituents of the adjectival syntactic category; however, there has been very little work investigating the application of this framework to the analysis of gradability outside the adjectival domain. This state of affairs distinguishes the DelS framework from its DegS counterpart, where such questions have been investigated in great detail since the beginning of the 21st century. Nevertheless, it has been observed (for example, by Doetjes et al. (2011)) that there is nothing inherently adjectival about the way that the interpretations of scalar predicates are calculated in DelS, and therefore that there is enormous potential for this approach to shed light on the nature of gradability and comparison in the nominal and verbal domains. This paper gives the first attempt at realizing this potential within a Mereological extension of a (simplified) version of Klein (1980)’s system.
2 Delineation Semantics for Adjectival Comparatives

The proposal that there exists an analytical relationship between context-sensitivity and gradability lies at the heart of the Delineation approach to the semantics of scalar predicates; in particular, in this framework, the orderings associated with adjectival predicates (often called their scales) are derived from looking at how the denotations of these predicates vary according to a contextually given comparison class. Formally speaking, we have two series of predicates: positive predicates (notated by the P class: P, P₁, P₂ . . .) and negative predicates (notated by the P̄ class: P̄, P̄₁, P̄₂ . . .). The semantics (for a language with constants and predicates of the type found in First Order Logic, as well as the negative predicates) is set up as follows:

Definition 2.1. Model. A model is a tuple M = ⟨D, ⟦·⟧⟩, where D is a non-empty domain of individuals, and ⟦·⟧ is a function from pairs consisting of a member of the non-logical vocabulary and a comparison class (a subset of the domain) satisfying:

• For each individual constant a₁, ⟦a₁⟧ ∈ D.
• For each X ⊆ D and for each predicate P, ⟦P⟧_X ⊆ X and ⟦P̄⟧_X ⊆ X.

With respect to the linguistic analysis, let us assume furthermore that the sets of individuals that the predicates pick out are those individuals that clearly satisfy them (or clearly do not satisfy them, in the case of negative predicates), i.e. only non-borderline individuals. In this framework, truth in a model is always given with respect to a distinguished comparison class. Here I suppose that if the subject of the sentence is not included in
the distinguished comparison class, the truth value of the sentence is undefined.¹

Definition 2.2. Semantics of the positive form. For all models M,² all comparison classes X ⊆ D, all predicates P and individuals a₁ ∈ X,

⟦P(a₁)⟧_{X,M} = 1 if ⟦a₁⟧_M ∈ ⟦P⟧_{X,M}
             = 0 if ⟦a₁⟧_M ∈ ⟦P̄⟧_{X,M}
             = i otherwise
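Definitions 2.1 and 2.2 can be sketched as follows; the clear (anti-)extensions of tall below are invented toy data:

```python
# A minimal sketch of Definitions 2.1-2.2 (the clear extensions below are my
# own toy data): denotations are relative to a comparison class, and the
# positive form is three-valued, with 'i' marking borderline cases.
D = {"ann", "bob", "cat"}

tall = lambda X: {x for x in X if x in {"ann"}}      # clearly tall in X
not_tall = lambda X: {x for x in X if x in {"cat"}}  # clearly not tall in X

def pos(a, X):
    """[[P(a)]]_X per Definition 2.2."""
    if a in tall(X):
        return 1
    if a in not_tall(X):
        return 0
    return "i"  # neither clearly P nor clearly P-bar

print(pos("ann", D))  # 1
print(pos("bob", D))  # 'i': bob is a borderline case in this class
```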
This very simple semantic analysis is now enriched with a set of extra constraints on the application of scalar predicates across classes. In this paper, we will adopt the constraint set proposed in van Rooij (2011a) as an analysis of the context-sensitivity and gradability properties of relative adjectives. Van Rooij proposes the following four constraints (set in my notation):³ For all predicates P, all comparison classes X ⊆ D, and all a₁, a₂ ∈ D,

(2) P*: ⟦P⟧_X ∩ ⟦P̄⟧_X = Ø

(2) ensures that P and P̄ behave as contraries.

(3) No Reversal*:
1. If ⟦P(a₁)⟧_X = 1 and ⟦P(a₂)⟧_X = 0, then there is no X′ ⊆ D such that ⟦P(a₂)⟧_{X′} = 1 and ⟦P(a₁)⟧_{X′} = 0.
2. If ⟦P̄(a₁)⟧_X = 1 and ⟦P̄(a₂)⟧_X = 0, then there is no X′ ⊆ D such that ⟦P̄(a₂)⟧_{X′} = 1 and ⟦P̄(a₁)⟧_{X′} = 0.

(3) ensures that, if in one comparison class, a₁ is categorized as (clearly) P and a₂ is categorized as (clearly) not P, then there are no comparison classes in which this categorization is reversed.

(4) Upward Difference*: If ⟦P(a₁)⟧_X = 1 and ⟦P̄(a₂)⟧_X = 1, then, for all X′ : X ⊆ X′, there is some a₃, a₄ such that ⟦P(a₃)⟧_{X′} = 1 and ⟦P̄(a₄)⟧_{X′} = 1.
(4) says that, if, in one comparison class, it is reasonable to make a P/P̄ distinction between a₁ and a₂, then, in all larger comparison classes that contain a₁ and a₂, we must continue to make some distinction (although not necessarily the same one). This axiom can be thought of as a principle of contrast preservation in categorization.

(5) Downward Difference*: If ⟦P(a₁)⟧_X = 1 and ⟦P̄(a₂)⟧_X = 1, then, for all X′ ⊆ X, if a₁, a₂ ∈ X′, then there is some a₃, a₄ such that ⟦P(a₃)⟧_{X′} = 1 and ⟦P̄(a₄)⟧_{X′} = 1.

¹ As suggested by our judgements concerning sentences like (i): #Mary is tall for a boy in this class.
² For readability considerations, I will often omit the model notation, writing only ⟦·⟧_X for ⟦·⟧_{X,M}.
³ Van Rooij’s constraints are modified versions of those in van Benthem (1982), which are called No Reversal, Upward Difference and Downward Difference, respectively. Hence the NR*, UD*, DD* labels.
(5) is another principle of contrast preservation which states that, if we make a distinction between a₁ and a₂ based on P/P̄ in one comparison class, then in all smaller comparison classes that include a₁ and a₂, we must continue to make some distinction (although, again, not necessarily the same one).

With these constraints in place, we can define an ordering relation (≻_P) associated with a scalar predicate P and a similarity relation (∼_P) based on P as follows, based on (van Rooij, 2011a, p. 13):
Definition 2.3. Implicit scale (≻) and similarity (∼). For all individuals a₁, a₂ ∈ D and all predicates P,

1. a₁ ≻_P a₂ iff a₁ ∈ ⟦P⟧_{{a₁,a₂}} and a₂ ∈ ⟦P̄⟧_{{a₁,a₂}}.
2. a₁ ∼_P a₂ iff a₁ ⊁_P a₂ and a₂ ⊁_P a₁.
Van Rooij shows that with the constraint set in (2)-(5), the ≻_P relations are semiorders: irreflexive, semi-transitive relations that satisfy the interval order property. A pleasant consequence of this is that, van Rooij argues, we immediately have an analysis of the properties of modes of comparison that do not involve special degree morphology, such as implicit comparative constructions like Mary is tall, compared to John. However, comparisons formed with the comparative morpheme -er/more (6) have different properties than comparisons that do not involve this morphology. Observe that, in contrast to the implicit comparative construction, for (6-a) to be true, Mary only needs to be slightly taller than John, not noticeably or significantly so.

(6) a. Mary is taller than John.
    b. This problem is more difficult than that one.

In order to reflect the difference in the order, van Rooij proposes that the orders that are relevant for evaluating the truth of sentences with explicit comparative morphology in English are stronger than the ones given by Definition 2.3, namely, they are strict weak orders: irreflexive, transitive and almost connected relations. As shown by Luce (1956), for every semi-order, there is a unique most refined strict weak order, which we will notate >_P (for the corresponding semi-order ≻_P), and this order can be constructed as follows: for all individuals a₁, a₂ ∈ D and all predicates P, a₁ >_P a₂ iff there is some a₃ ∈ D such that a₁ ∼_P a₃ and a₃ ≻_P a₂, or a₂ ∼_P a₃ and a₁ ≻_P a₃. This construction preserves the relations established by ≻_P, but also adds the individuals into the order that are found in the ‘gaps’ between a predicate’s clear extension (P)
and its clear anti-extension (P̄) in a comparison class. In addition, if we define an equivalence relation ≈_P based on >_P as a₁ ≈_P a₂ iff a₁ ≯_P a₂ and a₂ ≯_P a₁, then we can group individuals together into equivalence classes (notated as [a₁]_P) based on ≈_P in the usual way⁴ and construct a linear ordering between equivalence classes as follows: for all a₁, a₂ ∈ D, [a₁]_P >*_P [a₂]_P iff a₁ >_P a₂. These structures are appropriate for the analysis of comparatives in which both the main clause and the than clause contain a single adjective; however, comparatives can also be formed from (a subset of possible) adjective-individual pairs (7), and indeed it is reasonable to think (as already suggested by Klein (1980)) that sentences such as (6) are simply special cases in which the two adjectives happen to be the same.

(7) This table is longer than that table is wide.
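The route from Definition 2.3 through Luce's refinement can be sketched on toy data; here the pairwise comparison-class judgements are generated from invented numeric heights with a discrimination gap, whereas in the framework itself they come directly from the class-relative denotations:

```python
# A sketch of Definition 2.3 and Luce's refinement. The pairwise judgements
# [[tall]]{x,y} are generated here from invented numeric heights with a
# 5-unit discrimination gap; in the framework itself they come directly from
# the comparison-class-relative denotations.
D = ["a", "b", "c"]
height = {"a": 180, "b": 176, "c": 173}
GAP = 5

def semi(x, y):   # x clearly taller than y in the class {x, y} (semi-order)
    return height[x] - height[y] > GAP

def sim(x, y):    # x ~ y: neither clearly outranks the other
    return not semi(x, y) and not semi(y, x)

def strict(x, y): # Luce's most refined strict weak order
    return any((sim(x, z) and semi(z, y)) or (sim(y, z) and semi(x, z))
               for z in D)

print(semi("a", "b"))    # False: a 4-unit gap is below the threshold
print(strict("a", "b"))  # True: c witnesses the refinement
```

The refinement is visible in the output: a and b are indistinguishable in the semi-order, but the witness c separates them in the strict weak order.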
Thus, the semantics for explicit comparative constructions needs to consider both the scales associated with the adjective in the main clause and the scales associated with the adjective in the than clause. Within the logical analysis, we add to the language a quaternary predicate er that combines with two individuals and two predicates to form well-formed formulas of the form er(a₁, P₁, a₂, P₂) etc. Since formulas containing er can contain two context-sensitive predicates, they will be evaluated with respect to two (possibly distinct) comparison classes (one for each predicate),⁵ and their truth values will depend on whether the first individual occupies a higher rank on the first predicate’s scale than the second individual on the second predicate’s scale. To formalize this idea, the approach that I will pursue in this paper is that of Bale (2008), in which a ‘universal scale of comparison’ is constructed according to which individuals’ rankings on different scales can be compared. The first step in the construction involves a normalization process: pairs consisting of an individual a₁ and a linear scale >*_P are mapped to what, following the terminology (and notation) in van Rooij (2011b), I will call their Bale rank (b_P, for linear scale >*_P). Bale ranks are rational numbers between 0 and 1 whose values are calculated in the following way:

Definition 2.4. Bale rank (b_P). For all predicates P and all individuals a₁,

b_P(a₁) = |{[a₂]_P : a₂ ∈ D & [a₁]_P >*_P [a₂]_P}| / |{[a₃]_P : a₃ ∈ D}|

In other words, we map individuals, according to their equivalence classes (or ‘degrees’) on a linear scale, to the number of equivalence classes that lie below them on that scale divided by the total number of distinct degrees on the linear scale. Now that we can compare degrees on different scales by means of the universal scale, the truth of the explicit comparative construction is calculated with respect to comparing individuals belonging to equivalence classes that constitute these degrees:

⁴ For all a₂ ∈ D, a₂ ∈ [a₁]_P iff a₂ ≈_P a₁.
⁵ As a convention, I will use numerals to associate predicates and comparison classes, i.e. in an expression like ⟦er(a₁, P₁, a₂, P₂)⟧_{X₁,X₂}, X₁ is the class that goes with P₁ and X₂ goes with P₂.
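Definition 2.4 can be illustrated on an invented linear scale:

```python
# A worked sketch of Definition 2.4 on invented linear scales: the Bale rank
# of x is the number of degrees strictly below x's degree, divided by the
# total number of degrees on that scale.
from fractions import Fraction

# Equivalence classes ('degrees'), listed from bottom to top:
tall_degrees = [["c"], ["b"], ["a"]]
wide_degrees = [["t1", "t2"], ["t3"]]

def bale_rank(x, degrees):
    below = next(i for i, cls in enumerate(degrees) if x in cls)
    return Fraction(below, len(degrees))

print(bale_rank("a", tall_degrees))   # 2/3: two of three degrees lie below a
print(bale_rank("t3", wide_degrees))  # 1/2
```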
Definition 2.5. Semantics of the explicit comparative construction. For all comparison classes X₁, X₂ ⊆ D, predicates P₁, P₂, and individuals a₁, a₂,

⟦er(a₁, P₁, a₂, P₂)⟧_{X₁,X₂} = 1 iff b_{P₁}(a₁) > b_{P₂}(a₂)
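Definition 2.5 can then be sketched by comparing Bale ranks across two invented scales, echoing a type-(7) comparison:

```python
# A sketch of Definition 2.5: cross-scale comparison via Bale ranks on the
# universal scale (the degree data is invented; Fraction keeps the ranks
# exact rational numbers).
from fractions import Fraction

degrees = {
    "long": [["rug"], ["table"], ["hall"]],  # 3 degrees, bottom to top
    "wide": [["rug", "hall"], ["table"]],    # 2 degrees, bottom to top
}

def bale_rank(x, P):
    scale = degrees[P]
    return Fraction(next(i for i, c in enumerate(scale) if x in c), len(scale))

def er(a1, P1, a2, P2):
    """[[er(a1, P1, a2, P2)]] = 1 iff b_P1(a1) > b_P2(a2)."""
    return 1 if bale_rank(a1, P1) > bale_rank(a2, P2) else 0

print(er("hall", "long", "table", "wide"))   # 1: rank 2/3 exceeds rank 1/2
print(er("table", "long", "table", "wide"))  # 0: rank 1/3 does not
```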
I now extend this analysis of the scales associated with adjectives and the meaning of the explicit comparative construction to gradable constituents and comparatives outside the adjectival domain.
3 Delineation Semantics for Quantity Comparatives

The heart of the proposal involves combining three (I hope) relatively uncontroversial points:

1. The proposal will be set within a mereological extension (Simons, 1987; Hovda, 2008, among others) of the Delineation system presented above, in line with much of the work on the semantics of plural noun phrases and distributivity since Link (1983).⁶

2. I assume that subject determiner phrases denote generalized quantifiers (Barwise & Cooper, 1981, among others): second-order properties (i.e. properties of properties).

3. When the explicit comparative morpheme (er) appears in the nominal domain, this morpheme combines with noun phrases containing the ‘Q-adjectives’ many or few, as proposed by Bresnan (1973), among many others.

In line with the work of Link (1983), I assume that plural predicates of the distributive class are constructed in the syntax through the concatenation of a pluralizing distributivity operator *: if P is a predicate, then P* is a plural predicate. Furthermore, we also introduce a nominal pluralization operator ⋆: if P is a singular predicate, then P⋆ is a plural NP. Unlike in basic Delineation semantics, in which we supposed our domain to be an unordered set of individuals as in the models for first-order logic, we now interpret the expressions of our language in a domain that encodes mereological (i.e. part-structure) relations between its individuals. More precisely, we define our model structures as follows:

⁶ For convenience, I limit my attention to distributive predicates like come to the party and stay home to study, i.e. instances of plural predication that allow for inferences to instances of singular predications.
Definition 3.1. Model Structure. A model structure M is a tuple ⟨D, ≼⟩, where D is a finite set of individuals and ≼ is a binary relation on D.⁷ Furthermore, we stipulate that ⟨D, ≼⟩ satisfies the axioms of classical extensional mereology (CEM).⁸

1. Reflexivity. For all a₁ ∈ D, a₁ ≼ a₁.
2. Transitivity. For all a₁, a₂, a₃ ∈ D, if a₁ ≼ a₂ and a₂ ≼ a₃, then a₁ ≼ a₃.
3. Anti-symmetry. For all a₁, a₂ ∈ D, if a₁ ≼ a₂ and a₂ ≼ a₁, then a₁ = a₂.
4. Strong Supplementation.⁹ For all a₁, a₂ ∈ D, if, for all atoms a₃, a₃ ≼ a₁ implies a₃ ◦ a₂, then a₁ ≼ a₂.
5. Fusion Existence. For all X ⊆ D, if there is some a₁ ∈ X, then there is some a₂ ∈ D such that Fu(a₂, X).¹⁰

We can note that, in CEM, for every subset of D, not only does its fusion exist, but it is also unique (cf. Hovda (2008), p. 70), so we write ⋁X to notate the unique a₁ such that Fu(a₁, X), for all X ⊆ D. Furthermore, since we stipulated that every domain D is finite, every structure ⟨D, ≼⟩ is atomic. We define the notion of an atom as follows: a₁ ∈ D is an atom iff there is no a₂ ∈ D such that a₂ ≺ a₁, and we will write AT(X) for the set of atoms of a set X ⊆ D.

Above, it was proposed that the interpretation of predicates is relativized to comparison classes, i.e. distinguished subsets of the domain. Now, since our domain has more structure, we need to be more precise: while we will allow constants (a₁, a₂, a₃ . . .) to denote in the entire domain (i.e. both singular and plural individuals), we stipulate that both the denotations of singular predicates and the comparison classes according to which they are interpreted are restricted to the set of atoms of D.

Definition 3.2. Model. A model is a tuple M = ⟨D, ≼, ⟦·⟧⟩, where ⟨D, ≼⟩ is a model structure and ⟦·⟧ is a function satisfying:

1. If a is a constant, then ⟦a⟧ ∈ D.
2. If P is a singular predicate and X ⊆ AT(D), then ⟦P⟧_X ⊆ X.

⁷ Note that this symbol should not be confused with the ≻_P symbols that notate the semi-order ‘implicit scale’ relations. ≼ notates invariant, predicate-independent relations that are part of the model structure.
⁸ This particular axiomatization is taken from Hovda (2008) (p. 81). The version of fusion used here is what Hovda calls ‘type 1 fusion’.
⁹ Def. Overlap (◦). For all a₁, a₂ ∈ D, a₁ ◦ a₂ iff ∃a₃ ∈ D such that a₃ ≼ a₁ and a₃ ≼ a₂.
¹⁰ Def. Fusion (Fu). For a₁ ∈ D and X ⊆ D, Fu(a₁, X) (‘a₁ fuses X’) iff, for all a₂ ∈ D, a₂ ◦ a₁ iff there is some a₃ such that a₃ ∈ X and a₂ ◦ a₃.
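Since D is finite and atomic, the CEM axioms of Definition 3.1 admit a concrete realization that can be spot-checked mechanically: non-empty sets of atoms ordered by inclusion, with fusion as union (the three atoms below are invented):

```python
# A sketch of Definition 3.1: over a finite domain, CEM can be realized
# concretely by the non-empty sets of atoms ordered by inclusion, with
# fusion as union (the three atoms are invented).
from itertools import combinations

atoms = ["l1", "l2", "l3"]
D = [frozenset(s) for n in range(1, len(atoms) + 1)
     for s in combinations(atoms, n)]

part_of = lambda a, b: a <= b           # the parthood relation
fuse = lambda X: frozenset().union(*X)  # fusion = union, unique in CEM

# Spot-check reflexivity, transitivity and anti-symmetry on this realization:
assert all(part_of(a, a) for a in D)
assert all(part_of(a, c) for a in D for b in D for c in D
           if part_of(a, b) and part_of(b, c))
assert all(a == b for a in D for b in D if part_of(a, b) and part_of(b, a))
print(sorted(fuse([frozenset({"l1"}), frozenset({"l2"})])))  # ['l1', 'l2']
```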
The analysis of the semantics of distributive plural predicates will follow that of Link (1983). Link proposes that plural distributive predicates (e.g. linguists, have come to the party) are derived from singular predicates P (e.g. linguist, has come to the party) through the addition of a * or a ⋆ operator, which generates all the individual fusions of members of the extension of P.¹¹

Definition 3.3. Interpretation of plurality (*/⋆). For all singular predicates P and X ⊆ AT(D),

⟦P*/⋆⟧_X = {a₁ : Fu(a₁, A), for some A ⊆ ⟦P⟧_X}.
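Definition 3.3 can be sketched by realizing fusion as set union over a toy singular extension:

```python
# A sketch of Definition 3.3 with fusion realized as set union over atoms:
# [[P*]]_X collects the fusions of all non-empty subsets of the singular
# extension (the extension itself is toy data).
from itertools import combinations

def plural(ext):
    ext = sorted(ext)
    return {frozenset(sub) for n in range(1, len(ext) + 1)
            for sub in combinations(ext, n)}

linguist = {"l1", "l2"}  # invented singular extension of 'linguist'
print(sorted(map(sorted, plural(linguist))))
# [['l1'], ['l1', 'l2'], ['l2']]: the two singularities plus their fusion
```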
Following standard assumptions in the linguistics literature since Bresnan (1973), I propose that the explicit comparative relations are built off of the scales associated with DPs containing the Q-adjectives many and few. These items can combine with count plural noun phrases to form DPs, as shown in (8); therefore, we add to the language the expressions many and few, which combine with a plural NP P₁⋆ to form an expression of the category DP (i.e. many P₁⋆ or few P₁⋆).

(8) a. Many linguists came to the party.
    b. Few philosophers came to the party.
Expressions of the category DP combine with plural predicates P^∗ to form formulas, which are interpreted in a manner exactly parallel to formulas containing the positive form of scalar adjectives (cf. Def. 2.2). However, instead of opposing positive extensions of predicates P and negative extensions P̄, we oppose DPs containing many with those containing few.

¹¹ I have chosen to separate the nominal and predicative pluralizing operators in the logic because it makes the syntax of our small language much simpler; but since we assign to them exactly the same interpretation, as shown in Definition 3.3, I suggest that they should be thought of as two separate occurrences of an interpretable [+plur] morphological marking.
CISUC/TR 2014-02
26
NLCS’14 / NLSR 2014
Definition 3.5. For all predicates P₁, P₂, X₁, X₂ ⊆ AT(D) and Z ⊆ P(D),

⟦many P₁^?(P₂^∗)⟧_{Z,X₁,X₂} = 1 if ⟦P₂^∗⟧_{X₂} ∈ ⟦many P₁^?⟧_{Z,X₁}; 0 if ⟦P₂^∗⟧_{X₂} ∈ ⟦few P₁^?⟧_{Z,X₁}; i otherwise.

⟦few P₁^?(P₂^∗)⟧_{Z,X₁,X₂} = 1 if ⟦P₂^∗⟧_{X₂} ∈ ⟦few P₁^?⟧_{Z,X₁}; 0 if ⟦P₂^∗⟧_{X₂} ∈ ⟦many P₁^?⟧_{Z,X₁}; i otherwise.
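Definition 3.5 is, in effect, a three-valued (1/0/i) membership test. The following is a minimal executable sketch of that test; the representation of extensions as Python sets and all names are mine, not the paper's:

```python
# Sketch of Definition 3.5: three-valued evaluation of "many P1 (P2*)".
# "i" models the undefined truth value; extensions are given extensionally.

def eval_many(p2_ext, many_ext, few_ext):
    """Return 1, 0, or 'i' for [[many P1 (P2*)]] given the DP extensions.

    p2_ext   -- frozenset: the extension [[P2*]]
    many_ext -- set of frozensets: [[many P1]] (a subset of the class Z)
    few_ext  -- set of frozensets: [[few P1]] (disjoint from many_ext, by P?)
    """
    if p2_ext in many_ext:
        return 1
    if p2_ext in few_ext:
        return 0
    return "i"

# Tiny worked example over a domain of properties (sets of individuals):
came = frozenset({"a", "b", "c"})
stayed = frozenset({"d"})
many_linguists = {came}    # [[many linguists]] picks out the "large" cases
few_linguists = {stayed}   # [[few linguists]] the "small" ones

assert eval_many(came, many_linguists, few_linguists) == 1
assert eval_many(stayed, many_linguists, few_linguists) == 0
assert eval_many(frozenset({"e"}), many_linguists, few_linguists) == "i"
```

Properties outside both extensions receive the undefined value, mirroring the "i otherwise" clause.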
Of course, as in the adjectival domain, the analysis will be incomplete if we do nothing to restrict the application of many/few DPs across comparison classes. Thus, we impose higher-order versions of van Rooij (2011a)’s constraints that were adopted in the analysis of gradable adjectives above. The first constraint (9) ensures that DPs containing many and DPs containing few are contraries.
(9) P^?: For all P, all X ⊆ AT(D), and Z ⊆ P(D), ⟦many P^?⟧_{Z,X} ∩ ⟦few P^?⟧_{Z,X} = ∅
Furthermore, we assume appropriate versions of the constraints that characterize the application of scalar predicates across comparison classes.

(10) No Reversal^?:
  1. If ⟦many P₁^?(P₂^∗)⟧_{Z,X₁,X₂} = 1 and ⟦many P₁^?(P₃^∗)⟧_{Z,X₁,X₃} = 0, then there is no Z′ ⊆ P(D) such that ⟦many P₁^?(P₃^∗)⟧_{Z′,X₁,X₃} = 1 and ⟦many P₁^?(P₂^∗)⟧_{Z′,X₁,X₂} = 0.
  2. If ⟦few P₁^?(P₂^∗)⟧_{Z,X₁,X₂} = 1 and ⟦few P₁^?(P₃^∗)⟧_{Z,X₁,X₃} = 0, then there is no Z′ ⊆ P(D) such that ⟦few P₁^?(P₃^∗)⟧_{Z′,X₁,X₃} = 1 and ⟦few P₁^?(P₂^∗)⟧_{Z′,X₁,X₂} = 0.

(11) Upward Difference^?: If ⟦many P₁^?(P₂^∗)⟧_{Z,X₁,X₂} = 1 and ⟦few P₁^?(P₃^∗)⟧_{Z,X₁,X₃} = 1, then, for all Z′ with Z ⊆ Z′, there are some P₄, P₅ such that ⟦many P₁^?(P₄^∗)⟧_{Z′,X₁,X₄} = 1 and ⟦few P₁^?(P₅^∗)⟧_{Z′,X₁,X₅} = 1.

(12) Downward Difference^?: If ⟦many P₁^?(P₂^∗)⟧_{Z,X₁,X₂} = 1 and ⟦few P₁^?(P₃^∗)⟧_{Z,X₁,X₃} = 1, then, for all Z′ ⊆ Z, if P₂^∗, P₃^∗ ∈ Z′, then there are some P₄, P₅ such that ⟦many P₁^?(P₄^∗)⟧_{Z′,X₁,X₄} = 1 and ⟦few P₁^?(P₅^∗)⟧_{Z′,X₁,X₅} = 1.
With this architecture, we can now define the implicit scales associated with DPs containing many and few in an exactly parallel manner to the scales associated with adjectival predicates (see Def. 2.3).¹²

¹² Observe that since DPs containing many/few contain a context-sensitive predicate (P^?), the scales associated with the whole DP must be parametrized to a single interpretation of P^? at a comparison
Definition 3.6. Implicit quantitative scale (many/few). For all predicates P, properties P₂, P₃ ⊆ D, and X ⊆ AT(D),
1. P₂ [many P^?]_X P₃ iff P₂ ∈ ⟦many P^?⟧_{{P₂,P₃},X} and P₃ ∈ ⟦few P^?⟧_{{P₂,P₃},X}.
2. P₂ [few P^?]_X P₃ iff P₂ ∈ ⟦few P^?⟧_{{P₂,P₃},X} and P₃ ∈ ⟦many P^?⟧_{{P₂,P₃},X}.
If we do this, we can show that, as with adjectival predicates, the [many P^?]_X and [few P^?]_X relations are semi-orders. Theorem 3.1 follows straightforwardly from van Rooij (2011a)'s corresponding result in the adjectival domain.

Theorem 3.1. Mereological Semi-Orders. For all predicates P and all X ⊆ AT(D), [many P^?]_X is a semi-order, as is [few P^?]_X.

Now that we have associated semi-orders with DP constituents, we can construct strict weak orders from our [many P^?]_X relations in the way suggested by Luce (1956), as well as the corresponding linear orders >∗_{[many P^?]_X}.¹³ However, looking at the scales associated with a single DP would only give us the truth conditions for comparatives where the nominal restriction is the same in the main clause as in the than clause. To interpret comparatives where the DP restrictions are different (i.e. More linguists came to the party than philosophers), we need to make use of the constructions that we used to interpret indirect comparatives in the adjectival domain. In particular, using Bale's construction, we can assign to every property on a scale a Bale rank, and give the semantics of formulas with the explicit comparative relation when it holds between two properties and two quantifiers, which is simply a higher-order version of the semantics that we gave to explicit comparatives in the adjectival domain.¹⁴

Definition 3.7. ⟦er(P₁, many P₂^?, P₃, many P₄^?)⟧_{X₁,X₂,X₃,X₄} = 1 iff b_{[many P₂^?]_{X₂}}(⟦P₁⟧_{X₁}) > b_{[many P₄^?]_{X₄}}(⟦P₃⟧_{X₃})
class X. Therefore, we write [many P^?]_X to reflect this.

¹³ Def. Explicit (>_{[many P^?]_X}) and linear (>∗_{[many P^?]_X}) scales. For all properties P₁, P₂ ⊆ D, all predicates P, and all X ⊆ AT(D),

(i) P₁ >_{[many P^?]_X} P₂ iff there is some P₃ ⊆ D such that:
  1. P₁ ∼_{[many P^?]_X} P₃ and P₃ [many P^?]_X P₂, or
  2. P₂ ∼_{[many P^?]_X} P₃ and P₁ [many P^?]_X P₃.

(ii) [P₁]_{∼[many P^?]_X} >∗_{[many P^?]_X} [P₂]_{∼[many P^?]_X} iff P₁ >_{[many P^?]_X} P₂.

The definitions for expressions with few are identical.

¹⁴ Def. DP Bale Rank. For all predicates P, comparison classes X ⊆ AT(D), and properties P₁,

b_{[many P^?]_X}(P₁) =
|{[P₂]_{∼[many P^?]_X} : [P₁]_{∼[many P^?]_X} >∗_{[many P^?]_X} [P₂]_{∼[many P^?]_X}}| / |{[P₃]_{∼[many P^?]_X} : P₃ ⊆ D}|
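On this definition, the Bale rank of a property is just the proportion of equivalence classes strictly below its class in the linear order >∗. A small sketch under that reading (the list representation and all names are mine):

```python
from fractions import Fraction

# Sketch of the DP Bale rank: given the linear order >* over equivalence
# classes of properties (listed lowest first), a class's rank is the number
# of classes strictly below it divided by the total number of classes.

def bale_rank(cls, ordered_classes):
    """ordered_classes: list of equivalence classes, lowest first."""
    below = ordered_classes.index(cls)      # classes strictly below cls
    return Fraction(below, len(ordered_classes))

# Three classes on a quantity scale, lowest first:
classes = ["tiny", "middling", "huge"]
assert bale_rank("tiny", classes) == Fraction(0, 3)
assert bale_rank("huge", classes) == Fraction(2, 3)
```

Because ranks are normalized by the total number of classes, they can be compared across scales with different nominal restrictions, which is what Definition 3.7 exploits.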
Thus, in this system (modulo considerations of type and syntactic complexity), we have a complete parallelism between comparison in the adjectival and nominal domains.

(13) Adjectival Domain
  a. er(a₁, P₁, a₂, P₁): This table is longer than that one (is).
  b. er(a₁, P₁, a₁, P₂): This table is longer than it is wide.
  c. er(a₁, P₁, a₂, P₂): This table is longer than that one is wide.

(14) Nominal Domain
  a. er(P₁, many P₂^?, P₃, many P₂^?): More linguists came to the party than stayed home to study.
  b. er(P₁, many P₂^?, P₁, many P₄^?): More linguists came to the party than philosophers.
  c. er(P₁, many P₂^?, P₃, many P₄^?): More linguists came to the party than philosophers stayed home to study.
I therefore conclude that the Delineation approach to the semantics of gradable adjectives extends naturally to gradable constructions in the nominal domain.
References

Bale, Alan. 2008. A universal scale of comparison. Linguistics and Philosophy, 31, 1–55.
Barwise, Jon, & Cooper, Robin. 1981. Generalized quantifiers and natural language. Linguistics and Philosophy, 4, 159–219.
Bresnan, Joan. 1973. Syntax of the comparative clause construction in English. Linguistic Inquiry, 4, 275–343.
Doetjes, Jenny, Constantinescu, Camelia, & Součková, Kateřina. 2011. A neo-Kleinian approach to comparatives. Pages 124–141 of: Ito, S., & Cormany, E. (eds), Proceedings of Semantics and Linguistic Theory 19. Amherst: UMass.
Hovda, Paul. 2008. What is classical mereology? Journal of Philosophical Logic, 38, 55–82.
Kennedy, Christopher. 1997. Projecting the adjective. Ph.D. thesis, University of California, Santa Cruz.
Klein, Ewan. 1980. A semantics for positive and comparative adjectives. Linguistics and Philosophy, 4, 1–45.
Link, Godehard. 1983. The logical analysis of plurals and mass nouns: A lattice-theoretic approach. Pages 302–322 of: Bäuerle, Rainer, Schwarze, Christoph, & von
Stechow, Arnim (eds), Meaning, Use and the Interpretation of Language. The Hague: Mouton de Gruyter.
Luce, R. Duncan. 1956. Semiorders and a theory of utility discrimination. Econometrica, 24, 178–191.
Simons, Peter. 1987. Parts: A Study in Ontology. Oxford: Oxford University Press.
van Benthem, Johan. 1982. Later than late: On the logical origin of the temporal order. Pacific Philosophical Quarterly, 63, 193–203.
van Rooij, Robert. 2011a. Implicit vs explicit comparatives. Pages – of: Égré, Paul, & Klinedinst, Nathan (eds), Vagueness and Language Use. Palgrave Macmillan.
van Rooij, Robert. 2011b. Measurement and interadjective comparisons. Journal of Semantics, 28, 335–358.
Signatures as open-ended types

Tim Fernando
Trinity College Dublin, Ireland
[email protected]

Abstract. Natural language is open-ended in many ways, two of which are analyzed using Goguen and Burstall's notion of an institution. Signatures are associated with strings and records, representing time-as-change and variable adicity.
1 Introduction
An essential characteristic of natural language, open-endedness is partly (if not wholly) responsible for the interest in what Terry Parsons has dubbed "subatomic semantics," the study of those "formulas of English" that are treated as atomic formulas in most logical investigations of English (Parsons 1990, page ix). A famous example of open-ended predicate-argument structure is Jones did it slowly, deliberately, in the bathroom, with a knife, at midnight (Davidson 1967, page 81). Open-endedness lurks also in McTaggart's dictum that 'there could be no time if nothing changed' (Prior 1967, page 85), what counts as change being itself open-ended. Variable adicity and time-as-change are analyzed below using the notion of an institution, introduced by Goguen and Burstall (1992) to cope with the proliferation of logical systems in computer science.¹ At the center of an institution are relations |=_Σ of satisfaction between models and sentences of various signatures Σ. Signatures provide rudimentary types for subatomic semantics, or so we will argue. For variable adicity, we identify models with records, tracing open-endedness to the labels that may (or may not) appear in a record. For time-as-change, our models are strings, the alphabet of which is specified by the associated signature. It is helpful to flesh out the examples a bit more for orientation.
¹ The theory of institutions has attracted considerable attention and found numerous applications, including ontology design (e.g. Diaconescu 2012, Kutz et al. 2010).
Example A
We can represent a calendar year as the string

s_mo := Jan Feb Mar ··· Dec

of length 12, or, were we interested also in days d1, d2, ..., d31, the string

s_mo,dy := Jan,d1 Jan,d2 ··· Jan,d31 Feb,d1 ··· Dec,d31

of length 365 for a non-leap year.² In contrast to the points in the real line R, a box can split, as Jan in s_mo does (30 times) to

Jan,d1 Jan,d2 ··· Jan,d31
in s_mo,dy, on introducing days d1, d2, ..., d31 into the picture. Reversing direction and generalizing from mo := {Jan, Feb, ..., Dec} to any set A of temporal propositions, we define the function ρ_A on strings (of sets) to componentwise intersect with A

ρ_A(α₁ ··· αₙ) := (α₁ ∩ A) ··· (αₙ ∩ A)

(throwing out non-A's from each box) so that

ρ_mo(s_mo,dy) = Jan³¹ Feb²⁸ ··· Dec³¹.

Time-as-change is then enforced by compressing all repeating blocks αⁿ (for n ≥ 1) of a box α in a string s to α for its block compression bc(s)

bc(s) :=  bc(αs′)      if s = ααs′
          α bc(βs′)    if s = αβs′ with α ≠ β
          s            otherwise

so that if bc(s) = α₁ ··· αₙ then αᵢ ≠ αᵢ₊₁ for i from 1 to n − 1. In particular,

bc(Jan³¹ Feb²⁸ ··· Dec³¹) = s_mo.
Writing bc_A for the function mapping s to bc(ρ_A(s)), we have bc_mo(s_mo,dy) = s_mo.
² We draw boxes (instead of the usual curly braces { and }) around sets-as-symbols, stringing together "snapshots" much like a cartoon/film strip. While the example of a calendar is (in itself) of marginal interest to subatomic semantics, it clearly shows off some tools to analyze, for instance, the pair The soup cooled in an hour / The soup cooled for an hour familiar from Dowty 1979 (Fernando 2013, 2014).
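The maps ρ_A, bc, and bc_A above can be sketched executably. The iterative bc below computes the same function as the recursive definition; the tuple-of-frozensets encoding and all names are mine:

```python
# Executable sketch of Example A: rho_A, block compression bc, and bc_A.
# Strings of sets ("boxes") are modelled as tuples of frozensets.

def rho(A, s):
    """Componentwise intersection with A (throw out non-A fluents)."""
    return tuple(box & A for box in s)

def bc(s):
    """Block compression: collapse adjacent repeats of a box."""
    out = []
    for box in s:
        if not out or out[-1] != box:
            out.append(box)
    return tuple(out)

def bc_A(A, s):
    return bc(rho(A, s))

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
days_in = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
mo = frozenset(months)

s_mo = tuple(frozenset({m}) for m in months)
s_mo_dy = tuple(frozenset({m, "d%d" % (d + 1)})
                for m, n in zip(months, days_in) for d in range(n))

assert len(s_mo_dy) == 365
assert bc_A(mo, s_mo_dy) == s_mo     # bc_mo(s_mo,dy) = s_mo
```

The final assertion reproduces the calendar identity in the text: projecting the day-level string onto month fluents and block-compressing recovers the twelve-box string s_mo.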
In general, we can refine a string s_A of granularity A to one s_A′ of granularity A′ ⊇ A with bc_A(s_A′) = s_A. Iterating over a chain A ⊆ A′ ⊆ A″ ⊆ ···, we can glue together strings s_A, s_A′, s_A″, ... such that bc_X(s_X′) = s_X for X ∈ {A, A′, A″, ...}. That is, we can form the inverse limit of the maps bc_X. Details are supplied in section 2, where constraints can be added by arranging the sentence morphisms in an institution to be inclusions (leaving sentences intact). We move away from sentence morphisms as inclusions to analyze records such as (a) for Davidson's aforementioned example over an untyped universe U.

(a) [ who = jones, how = slow, deliberate, where = bathroom, when = midnight, with-what = knife ]
(b) [ l₁ = u₁, ..., lₙ = uₙ ]

Example B. Given a large set Lab of labels (or attributes), we equip a set U with a partial function F : U × Lab ⇀ U from U × Lab to U. The idea is that a record such as (b) is represented by an entity u ∈ U such that the restriction F ∩ (({u} × Lab) × U) of F to {u} × Lab is {((u, l₁), u₁), ..., ((u, lₙ), uₙ)}. The partiality of F reflects the fact that not every element of Lab need be a label of every record, nor need every element of U be a record. As a Kripke frame (over accessibility relations labeled by Lab), F can be turned into a Kripke model over a set T of tests by adding a valuation v ⊆ U × T. As we shall see in section 3, the notion of an institution extends v to a satisfaction relation |=_Σ between Σ-models in U and Σ-sentences based on T and Lab,³ subject to signature morphisms shaped by F. The signatures amount to types on sets of attribute-value pairs, where values may themselves be sets of attribute-value pairs. Types constitute the foundation of the proof-theoretic extension in Ranta 1994 of Discourse Representation Structures (Kamp and Reyle 1993).
The focus here is not on discourse or anaphora but on the open-endedness of subatomic semantics, including the use of frames/records for predication (Fillmore 1985, Cooper 2012). We adopt the usual notational conventions, and take for granted the category Set of sets and functions. An institution (Sig, Mod, Sen, {|=_Σ}_{Σ∈|Sig|}) consists of
³ The fonts on the record in (a) were chosen to suggest the following formalization: italics (e.g. who) for labels in Lab; small caps (jones) for elements of U; and boldface (e.g. slow) for tests in T.
- a category Sig with a set |Sig| of objects, called signatures,
- functors Mod : Sig^op → Set and Sen : Sig → Set returning, for each Σ ∈ |Sig|, a set Mod(Σ) of Σ-models and a set Sen(Σ) of Σ-sentences, and
- for each Σ ∈ |Sig|, a binary relation |=_Σ between Σ-models and Σ-sentences

such that

(†) for all σ ∈ Sig(Σ, Σ′), M′ ∈ Mod(Σ′) and ϕ ∈ Sen(Σ),
    Mod(σ)(M′) |=_Σ ϕ ⟺ M′ |=_{Σ′} Sen(σ)(ϕ).

The satisfaction condition (†) is applied below to refine the inverse limit mentioned in Example A (section 2), and the typings on records in Example B (section 3).
2 Time-as-change and bc-systems
Henceforth, we shorten "temporal proposition" to fluent, and fix an infinite set Φ of fluents. Let us write Fin(Φ) for the set of finite subsets of Φ. Given a set X, the power set 2^X of X is the set of all subsets of X, while X⁺ is the set of all non-empty strings over the alphabet X. A bc-system is a function f with domain Fin(Φ) such that for each A ∈ Fin(Φ), f(A) ∈ (2^A)⁺ and

f(B) = bc_B(f(A))   whenever B ⊆ A.
In the terminology of institutions, the category Sig is Fin(Φ), partially ordered by ⊆; i.e. Sig-morphisms are pairs (B, A) such that B ⊆ A ∈ Fin(Φ). Mod maps a finite subset A of Φ to the set

Mod(A) := {bc(s) | s ∈ (2^A)⁺}

and whenever B ⊆ A, Mod(A, B) : Mod(A) → Mod(B) is the restriction of bc_B to Mod(A); i.e., for s ∈ Mod(A), Mod(A, B)(s) = bc_B(s) ∈ Mod(B). The set of bc-systems is the inverse limit, lim← Mod, of the inverse system defined by the contravariant functor Mod. Returning to Example A, note that some bc-systems are better than others; for instance, we may wish to throw out functions f for which f(mo) = Jan,May. To refine lim← Mod, we can try refining Mod. Toward that end, we fix a Fin(Φ)-indexed family {Sn(A)}_{A∈Fin(Φ)} of sets Sn(A) of sentences ψ with associated denotations [[ψ]] ⊆ (2^A)⁺.⁴ We let Sen map B ∈ Fin(Φ) to

Sen(B) := {⟨A⟩ψ | A ⊆ B and ψ ∈ Sn(A)}
⁴ For example, Sn(A) might consist of all regular expressions over the alphabet 2^A. We need not, however, insist here on regularity.
and whenever B ⊆ C, let Sen(B, C) be the inclusion Sen(B) ↪ Sen(C) preserving ⟨A⟩ψ (for A ⊆ B, ψ ∈ Sn(A)), with

s |=_B ⟨A⟩ψ ⟺ bc_A(s) ∈ [[ψ]]

for all s ∈ Mod(B). One can verify that the satisfaction condition (†) holds. Furthermore, given a Fin(Φ)-indexed family {Th(A)}_{A∈Fin(Φ)} of sets Th(A) ⊆ Sen(A), we can refine Mod to the presheaf Mod^Th : Sig^op → Set

Mod^Th(A) := {s ∈ Mod(A) | (∀ϕ ∈ ⋃_{B⊆A} Th(B)) s |=_A ϕ}

with Mod^Th(A, C) as the restriction of Mod(A, C) to Mod^Th(A) whenever C ⊆ A. Can we view a bc-system f as a completed whole without resorting to its finite approximations f(A)? To string f out fully, let us work with the set Q of rational numbers, and define a Q-string over Φ to be an expansion A = ⟨Q, <, {U_a}_{a∈Φ}⟩ of the rational order by unary predicates U_a ⊆ Q (one for each a ∈ Φ), from which we extract a Q-segmentation of Q into intervals I₁ < ··· < I_{2k+1}.
Next, for 1 ≤ j ≤ 2k + 1, we collect each a ∈ A such that U_a holds of every element of I_j in

α_j := {a ∈ A | I_j ⊆ U_a}

and finally, block-compress for

f_A(A) := bc_A(α₁ α₂ ··· α_{2k+1}).

Recalling the strings s_mo and s_mo,dy from Example A, we have

f_A(mo) = Jan Feb Mar ··· Dec
f_A(mo ∪ {d1, d2, ..., d31}) = Jan,d1 Jan,d2 ··· Dec,d31

if A is the Q-string over Φ ⊇ mo ∪ {d1, d2, ..., d31} such that, for instance,

U_Jan = (−∞, 1], U_Feb = (1, 2], ..., U_Dec = (11, +∞)
U_d1 = (−∞, −31] ∪ (1, 1 1/28] ∪ (2, 2 1/31] ∪ ··· ∪ (11, 12]
U_d2 = (−31, −30] ∪ (1 1/28, 1 2/28] ∪ ··· ∪ (12, 13]
...
U_d31 = (−1, 1] ∪ (2 30/31, 3] ∪ ··· ∪ (41, +∞).

Proposition 2. Given Q-strings A and A′ over Φ, A ≅ A′ implies f_A = f_{A′}.
The converse of Proposition 2 fails without further assumptions. Already for the singleton Φ := {a}, there are five pairwise non-isomorphic Q-strings A with f_A({a}) = a, given by the possibilities (0, 1), [0, 1), (0, 1], [0, 1] and {1} for U_a. To cover all bc-systems f over a countable set Φ, it is enough to consider Q-strings over Φ meeting an additional requirement. Keeping in mind that a Q-segmentation cannot consist of exclusively open or exclusively closed intervals, unless there is only one interval in the Q-segmentation (i.e., Q), let us call a Q-string ⟨Q, ...

... such that u(q) ≥ u(p) and v(q) > v(p), or that u(q) > u(p) and v(q) ≥ v(p). There is no way to make one person better off without making the other person worse off. Nash assumes moreover that the space S is convex.⁸ Using very natural axioms on the solution concept, Nash proves that the final bargain will be the unique point p such that u(p) × v(p) is maximal. It is obvious that if the players are choosing a Pareto optimal point, and there are at least two such points, then there is a conflict: neither can gain without the other losing. The element of cooperation enters through Nash's notion of a fallback point. The fallback point F is the point to which the players "fall back" in case they cannot arrive at a bargain, and this point is worse (for both) than any other point in S.

⁸ To take an example rather like Nash's original one: suppose that the two are restricted to a point in the set {(x, y) | 2x + y ≤ 3}. The utilities are x for A and y for B. The fallback point is (0, 0). Then the Pareto optimal points will be all the points on the line 2x + y = 3. But which particular point should be chosen? The product of the utilities is maximized at the point (.75, 1.5). But Nash does not speak about communication, and there is no guarantee even that a Pareto optimal point will be reached, let alone Nash's "ideal" point. To take a real life example, it seems highly unlikely that a Pareto optimal point will be reached in Ukraine.
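The numbers in footnote 8 can be checked mechanically. The grid search below is my own illustration, not Nash's method; it recovers the product-maximizing point (.75, 1.5) on the Pareto frontier 2x + y = 3:

```python
# Numerical check of footnote 8: feasible set {(x, y) | 2x + y <= 3},
# fallback (0, 0); the Nash solution maximizes the product of utilities.

def nash_point(step=0.0001):
    """Search the Pareto frontier 2x + y = 3 for the max of x * y."""
    best, best_val = None, -1.0
    x = 0.0
    while x <= 1.5:
        y = 3.0 - 2.0 * x          # stay on the frontier
        if x * y > best_val:
            best, best_val = (x, y), x * y
        x += step
    return best

x, y = nash_point()
assert abs(x - 0.75) < 0.01 and abs(y - 1.5) < 0.01
```

Analytically, maximizing x(3 − 2x) gives x = 3/4, y = 3/2, matching the footnote's (.75, 1.5).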
Thus cooperation arises through the fact that both players want to avoid the fallback point, and each needs the help of the other to achieve this. Grice's cooperative principle is a special case of Nash's. For suppose the utilities are aligned, i.e., for any two points p and q we have u(q) > u(p) iff v(q) > v(p); then the Nash bargaining point which maximizes the product u(p) × v(p) is also the point which maximizes u(p). B gains by helping A to gain. The pedestrian helps the motorist to get gas for the sake of the small pleasure of helping another.⁹ But as we noted, this is not the only case. The mere fact that the players both want to avoid X does not imply that their utilities are fully aligned.

We now offer an example of how the Nash principle works. Suppose that an American tourist is in India and wants to buy a carved wooden elephant. He has already seen such an elephant in a store for Rs. 500 but sees a hawker selling the identical elephant for Rs. 400. It is customary to bargain with hawkers, but what should the tourist offer?¹⁰ In this situation, the fallback situation is that the tourist abandons the hawker and buys his elephant in the store. But the hawker himself has bought the elephant for Rs. 40, and so any price paid from 41 rupees to 499 rupees would be better for both than the fallback situation, which is no sale for the hawker and a cost of Rs. 500 for the tourist. The element of cooperation arises because both parties want to avoid the fallback situation, but given this fact there is an element of conflict in that the hawker wants to charge more and the tourist wants to pay less. Here we assume that the utilities of A and B are not aligned, although there must be some concord for communication to take place at all.¹¹

Consider the following scenario. You are in a small town in India.¹² You have hired a rickshaw, a small motorised vehicle, and you ask the driver where you can buy some silver jewelry.
Now there are in fact many places in Jaipur where you can do that, but the driver will tell you about one where the shopkeeper knows him and will give him a commission. He will say, "Go to Raja Motimal's Jewelry, they will give you the best quality. And in fact mention my name and they will give you a good price." He may even offer to take you there. Now it may be, and probably is, the case that if α is the action of going to "Raja Motimal" and β is the action of going to another place, say Rani Rukmini's shop, then β has a higher utility for you, as her prices are better, while α has the higher utility for the driver, as the Raja will give him a commission.
⁹ See for instance Tomasello [19].
¹⁰ In a similar situation, Aumann offered 200, the offer was accepted and Aumann bought the elephant, only to find that the proper price would have been Rs. 50.
¹¹ We do not consider the important and interesting case where A thinks they are aligned but they are not.
¹² My own experience was in Jaipur.
If X is the condition of being where you are and G is the goal of buying silver jewelry, then both the Hoare assertions {X}α{G} and {X}β{G} are correct. But α yields a higher utility for the driver and β yields the higher utility for you. But the driver may not tell you about the Rukmini shop, and you are stuck with Raja Motimal. The driver may not feel a pang of conscience if the utility of going to Raja Motimal's is higher for you than the utility of simply not buying any jewelry. Perhaps Nash [15] would approve of the "bargain" that the two of you have worked out, where each of you made a gain.

Stalnaker [18] lays out the conditions which govern the transaction when the utilities are only partially aligned and B has various options of what to say. Then B will say something which will benefit him as well, but knowing that A might be suspicious, he will confine himself to saying something believable. Here it may happen that B and A have strategies for speaking and for believing which are in some sort of equilibrium.

For a last example we consider the etchings dialog discussed by Steven Pinker in a YouTube video (https://www.youtube.com/watch?v=3-son3EJTrU) as well as in a joint PNAS paper [16]. A young man and his date have just had dinner at a restaurant near his apartment building.¹³ After the dinner the young man says to his date, "Would you like to come to my apartment and see my etchings?" Here the implicature might well be, "Would you like to come to my apartment and have sex?" It is not clear whether their utilities are aligned or whether they are looking for different Pareto optimal points. Perhaps the young man actually has etchings, and perhaps the young woman is interested in sex even though she has no interest in etchings. What bargain point will they arrive at? It is a mystery. In the full paper we will discuss this issue in more detail and propose a formalism.
4.1 Discussion and further work
Hoare logic is much more than just the simple case of a Hoare assertion which we mentioned; Hoare also has other rules. For instance, one of his rules is: from {X}α{Y} and {Y}β{Z}, derive

{X}α;β{Z}

where α;β is the composition of α and β.
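The composition rule can be illustrated with a toy model in which actions are state transformers and a triple {X}α{Y} is checked by brute force over a finite state space. The encoding below is my own sketch, not Hoare's formulation:

```python
# Toy rendering of the composition rule: if {X} alpha {Y} and {Y} beta {Z}
# hold, then {X} alpha;beta {Z} holds. Predicates are Python callables and
# triples are checked exhaustively over a finite state space.

def holds(pre, action, post, states):
    """Check the Hoare triple {pre} action {post} over the given states."""
    return all(post(action(s)) for s in states if pre(s))

def compose(alpha, beta):
    """The sequential composition alpha;beta."""
    return lambda s: beta(alpha(s))

states = range(-10, 11)
X = lambda s: s >= 0
Y = lambda s: s >= 1
Z = lambda s: s >= 2
alpha = lambda s: s + 1
beta = lambda s: s + 1

assert holds(X, alpha, Y, states)
assert holds(Y, beta, Z, states)
assert holds(X, compose(alpha, beta), Z, states)   # {X} alpha;beta {Z}
```

The brute-force check is of course no substitute for the proof rule; it just makes the rule's content concrete on a small example.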
¹³ This insertion about the location of the restaurant is mine, so as to make the scenario plausible.
It could well be that the procedure which A intends to follow has parts, e.g., α, β, which B could guess, and the implicature could refer to one of these parts. So a more complex theory than the one we have presented is possible.¹⁴

Acknowledgements: We thank Nicholas Allott, Anton Benz, Luciana Benotti, Michael Devitt, Stephen Neale, Prashant Parikh, Steven Pinker, Ram Ramanujam, Adriana Renero, Robert Stainton, and two referees for comments.
References

[1] Nicholas Allott, Game theory and communication, Game Theory and Pragmatics (Benz, Jäger and van Rooij, editors), Palgrave Macmillan (2006), pp. 123-152.
[2] Anton Benz, Gerhard Jäger and Robert van Rooij, An introduction to game theory for linguists, Game Theory and Pragmatics (Benz, Jäger and van Rooij, editors), Palgrave Macmillan (2006), pp. 1-82.
[3] Anton Benz and Robert van Rooij, Optimal assertions and what they implicate, Topoi, vol. 26 (2007), pp. 63-78.
[4] Luciana Benotti and Patrick Blackburn, Conversational implicatures, paper in progress, 2014.
[5] Wayne Davis, Implicature, Stanford Encyclopedia of Philosophy (online).
[6] Michael Devitt, What makes a property "semantic"?, Perspectives on Pragmatics and Philosophy (Capone, Piparo and Carapezza, editors), Springer (2013), pp. 87-112.
[7] Joseph Farrell and Matthew Rabin, Cheap talk, Journal of Economic Perspectives, vol. 10 (1996), no. 3, pp. 103-118.
[8] Michael Franke, Game theoretic pragmatics, Philosophy Compass, vol. 8 (2013), no. 3, pp. 269-284.
[9] Bart Geurts, Scalar implicature and local pragmatics, Mind & Language, vol. 24 (2009), pp. 51-79.
[10] Paul Grice, Studies in the Way of Words, Harvard, 1989.
[11] C. A. R. Hoare, An axiomatic basis for computer programming, Communications of the Association for Computing Machinery, vol. 12 (1969), no. 10, pp. 576-580.
[12] Laurence Horn, Implicature, The Handbook of Pragmatics (Horn and Ward, editors), Blackwell Publishing (2004), pp. 3-25.
[13] Dan Jurafsky, Pragmatics and computational linguistics, The Handbook of Pragmatics (Horn and Ward, editors), Blackwell Publishing (2004), pp. 578-604.
[14] Arthur Merin, Information, relevance and social decisionmaking, Logic, Language and Computation (volume 2) (Moss, Ginzburg and de Rijke, editors), CSLI Press (1999), pp. 179-221.
[15] John Nash, The bargaining problem, Econometrica (1950), pp. 155-162.

¹⁴ Indeed Grice himself supplies an example. Consider two algorithms αβ₁γ and αβ₂γ for making a cake.
Here α is a preparatory stage and γ is actually putting the cake in the oven; β₁ is adding sugar to the prepared mix, and β₂ is adding salt to the prepared mix. If the goal is to have a tasty cake, then β₁ will work as the middle step and β₂ will not.
[16] Steven Pinker, M. A. Nowak and J. J. Lee, The logic of indirect speech, Proceedings of the National Academy of Sciences (2008), pp. 833-838.
[17] Prashant Parikh, The Use of Language, CSLI, Stanford, 2001.
[18] Robert Stalnaker, Saying and meaning, cheap talk and credibility, Game Theory and Pragmatics (Benz, Jäger and van Rooij, editors), Palgrave Macmillan (2006), pp. 83-100.
[19] Michael Tomasello, Why We Cooperate, MIT Press, 2009.
[20] Deirdre Wilson and Dan Sperber, Relevance theory, The Handbook of Pragmatics (Horn and Ward, editors), Blackwell Publishing (2004), pp. 607-632.
Computing Italian clitic and relative clauses with tupled pregroups

Claudia Casadio¹
¹ Dept. of Philosophy, Univ. G. D'Annunzio, Chieti IT - [email protected]

Aleksandra Kiślak-Malinowska²
² Faculty of Mathematics and Computer Science, Univ. of Warmia and Mazury, Olsztyn PL - [email protected]
Abstract. Pregroup grammars were introduced by Lambek in 1999 as an algebraic tool for the formal analysis of natural languages. The main focus of the present paper is on a special extension of this calculus known as tupled pregroup grammars. Their applications to Italian, concerning different sentential structures involving clitic pronouns, relative pronouns and related phenomena, are taken into consideration, and the advantages of the tupled pregroup approach are shown from the point of view of both linguistic analysis and syntactic computation.

Keywords: pregroup grammars, natural language, clitic pronouns, relative clauses
1 Introduction
Some years ago E. Stabler [33] developed the idea of extending pregroups into tupled pregroup grammars and showed how to use them to treat some crucial aspects of English syntax. According to him, the pregroup operations provide simple feature checking, and the tupling allows additional operations like 'movement'. Languages definable by tupled pregroup grammars are weakly equivalent to multiple context-free grammars, with the power to define mildly context-sensitive languages like {xx | x ∈ {a, b}∗}, {aⁿbⁿcⁿ | n ≥ 1, a, b, c ∈ Σ}, and {aᵐbⁿcᵐdⁿ | m, n ≥ 1, a, b, c, d ∈ Σ}. In this paper¹, following Stabler's proposal, we introduce a tupled pregroup grammar for a fragment of Italian including clitics, relative pronouns and related phenomena. The result obtained has the advantage of involving local applications of the rules, and at the same time is consistent with the move vs. merge strategy put forward by Stabler in coherence with the Chomskyan minimalist theory of grammar.
¹ The authors are grateful to Jim Lambek and Wojciech Buszkowski for stimulating suggestions in the course of a number of discussions during the past years. Thanks are also due to the anonymous referees for useful comments. Support by MIUR (PRIN 2012, 60% 2013) is acknowledged by C. Casadio.
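Membership in a language such as {aⁿbⁿcⁿ | n ≥ 1}, mentioned above as a standard mildly context-sensitive example, is easy to check directly even though no context-free grammar generates it. The checker below is my own illustration, unrelated to the tupled pregroup grammars themselves:

```python
# Direct membership test for the mildly context-sensitive language
# {a^n b^n c^n | n >= 1} over the alphabet {a, b, c}.

def in_anbncn(w):
    n = len(w) // 3
    return n >= 1 and len(w) == 3 * n and w == "a" * n + "b" * n + "c" * n

assert in_anbncn("abc")
assert in_anbncn("aabbcc")
assert not in_anbncn("aabc")
assert not in_anbncn("")
```

A word is in the language exactly when it is a block of a's, then equally many b's, then equally many c's, which the equality test checks in one pass.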
2 Tupled pregroup grammars
The calculus of pregroups was developed by Lambek [20, 21] as an alternative to his Syntactic Calculus [19], a well known model of categorial grammar [24, 25]. Pregroups are a particular kind of substructural logic that is compact and non-commutative [6, 7]. In fact the calculus is a non-conservative extension of classical non-commutative multiplicative linear logic [1, 10]: the left and right 'iterable' adjoints of pregroups have as their counterparts the left and right 'iterable' negations of non-commutative multiplicative linear logic; in principle formulas can occur with n (left or right) adjoints (n ≥ 2), although 2 appears to be sufficient for linguistic applications; see [10, 21, 16]. In summary, a pregroup (G, ·, 1, ^l, ^r, →) is a partially ordered monoid in which each element a has a left adjoint a^l and a right adjoint a^r such that

a^l a → 1 → a a^l        a a^r → 1 → a^r a

where the dot '·', usually omitted, stands for the pregroup operation (the compact conjunction, or multiplication, with unit 1), the arrow denotes the partial order, the rules a^l a → 1 and a a^r → 1 are called contractions, and the opposite rules 1 → a a^l and 1 → a^r a are called expansions. From the point of view of linguistic analysis, the constant 1 represents the empty string of types, and the operation '·' is interpreted as concatenation. Contractions like a^l a → 1 and a a^r → 1 are the basic means to determine whether a string of words, to which pregroup types are assigned, is grammatical and, in particular, whether it is a sentence. A pregroup grammar is freely generated by a partially ordered set of basic types a1, ..., an: from each basic type a we form simple types by taking single or repeated adjoints:

. . . a^ll, a^l, a, a^r, a^rr . . .

A compound type, or just a type, is a string of simple types.
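To make the role of contractions concrete, here is a small sketch of our own (not from the papers under discussion) that decides whether a string of simple types contracts to the sentence type s. A simple type is a pair (base, k) with k = 0 for a plain type, k = -1 for a left adjoint and k = +1 for a right adjoint, so both contraction patterns a^l a → 1 and a a^r → 1 become (a, k)(a, k+1) → 1. Because overlapping redexes can conflict, the search backtracks over all contraction choices.

```python
# Sketch: deciding reducibility to the sentence type s by contractions only.
# A simple type is (base, k): k = -1 left adjoint, 0 plain, +1 right adjoint.
def contracts_to_s(types):
    seen = set()

    def search(ts):
        if ts in seen:
            return False
        seen.add(ts)
        if ts == (("s", 0),):
            return True
        for i in range(len(ts) - 1):
            (a, k), (b, m) = ts[i], ts[i + 1]
            if a == b and m == k + 1:  # a^l a -> 1  or  a a^r -> 1
                if search(ts[:i] + ts[i + 2:]):
                    return True
        return False

    return search(tuple(types))

# n . (n^r s o^l) . o  -- a subject, a transitive verb, an object
print(contracts_to_s([("n", 0), ("n", 1), ("s", 0), ("o", -1), ("o", 0)]))  # True
print(contracts_to_s([("s", 0), ("n", 0)]))  # False
```

The memoised backtracking is only needed because a middle occurrence can contract to either side; for short type strings the cost is negligible.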
For example, assigning the basic type n to the syntactic category of noun phrases, and the type (n^r s) to the syntactic category of verb phrases, one obtains that the concatenation (product) of type n with type n^r s leads to the type s of a sentence by applying the contraction (n · (n^r s)) → s. The types assigned to the words of a given language are assumed to be stored permanently in the speaker's 'mental' dictionary, and metarules are introduced into the dictionary to simplify lexical assignments and make syntactic calculations quicker [20, 21]. Metarules can also be used to introduce semantic information, depending on the language under consideration; see e.g. [3]. In recent years several works have been developed to account for the semantics of pregroups, and interesting connections have been identified with compact monoidal categories, within category-theoretical semantics [22, 28]. This field appears particularly promising for its relations to research in quantum logic and computation, with particular reference to vector space semantics. We cannot go into details here, but we refer in particular to [27, 29]. Given their 'simple' type-logical nature and their computational efficiency, pregroups have been applied, in a short span of time, to a wide range of languages, from
English and French to Polish, Arabic and Persian; see e.g. [9, 2, 3, 11, 16, 32]. Their computational properties have also been studied in several directions, e.g. [15, 4, 5, 14]. In this paper we present a new approach to pregroups known as tupled pregroup grammar. In tupled pregroup grammars the lexicon consists of tuples of ordered pairs whose first element is a type and whose second element is a symbol from the alphabet. Elements of the lexicon are called lexical entries. The idea of taking certain elements of an alphabet together in one tuple is motivated by the fact that in natural languages certain words tend to occur together in the sentence, as for example prepositions with nouns, pronouns with verbs, clitic pronouns with certain verbs, etc. In formal languages one may wish to copy each item of an alphabet (as in the copying language {xx | x ∈ {a, b}*}), to have the same number of occurrences of certain elements, etc. Let si be elements of a finite alphabet Σ ∪ {ε} (we can think of them as the words of a natural language, ε being the empty string), and let P be a set of simple types, partially ordered by ≤. Types (TP) are of the form p1^r p2^r . . . pk^r v w1^l w2^l . . . wm^l, for p1, p2, . . . , pk, v, w1, w2, . . . , wm ∈ P and k, m ≥ 0, where p^r and p^l are called the right and left adjoints of p, respectively, for any p ∈ P. The adjoints fulfil the following inequations: p^l p ≤ 1 and p p^r ≤ 1. The lexical entries take the following form:
⟨(t1, s1), (t2, s2), . . . , (tk, sk)⟩

Here s1, . . . , sk are elements of the alphabet and t1, . . . , tk are types (in the original two-row displays, each type is written above its word). A merge operation • applying to any pair of tuples is defined as concatenation:

⟨(t1, s1), . . . , (ti, si)⟩ • ⟨(ti+1, si+1), . . . , (tk, sk)⟩ = ⟨(t1, s1), . . . , (tk, sk)⟩

An operation of deleting the i-th coordinate, for any k-tuple with k > 0 and any 1 ≤ i ≤ k, is written with − i and defined as follows:

⟨(t1, s1), . . . , (ti−1, si−1), (ti, si), (ti+1, si+1), . . . , (tk, sk)⟩ − i = ⟨(t1, s1), . . . , (ti−1, si−1), (ti+1, si+1), . . . , (tk, sk)⟩
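As a plain-data sketch (our own illustration; representing an entry as a tuple of (type, word) pairs is an assumption of the sketch), merge is concatenation and deletion drops the i-th coordinate:

```python
# Sketch: lexical entries as tuples of (type, word) pairs.
def merge(e1, e2):
    """Merge of two entries (the bullet operation): concatenation."""
    return e1 + e2

def delete(entry, i):
    """Delete the i-th coordinate (1-based), written  entry - i  in the text."""
    return entry[:i - 1] + entry[i:]

data_me_la = (("p2", "data"), ("o3", "me"), ("o4", "la"))
print(delete(data_me_la, 2))  # (('p2', 'data'), ('o4', 'la'))
print(merge((("Nm", "Mario"),), data_me_la))
```

The type strings here are opaque labels; contraction inside a type is sketched separately below.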
Let us define a binary relation ⇒ on tupled pregroup expressions that holds in the following cases, for any tuples e1, e2 and sequences of tuples α, β:

(Mrg)  α e1 e2 β ⇒ α (e1 • e2) β
2 Every type must contain exactly one basic type v (without the superscript r or l) and may contain an arbitrary number (possibly equal to zero) of right and left adjoints. 3 Presenting types in that form is a great simplification, but it is done on purpose for our linguistic applications; for more details see [33].
(Move)  α ⟨(t1, s1), . . . , (tk, sk)⟩ β ⇒ α ⟨(ti, si), (tj, sj)⟩ • (⟨(t1, s1), . . . , (tk, sk)⟩ − i − j) β
(Move) applies to any k-tuple (k > 1), for any 1 ≤ i ≤ k and 1 ≤ j ≤ k. The type in any coordinate can be contracted, for any a, b such that a ≤ b:

(GCon)  α ⟨. . . , (x b^l a y, s), . . .⟩ β ⇒ α ⟨. . . , (x y, s), . . .⟩ β
(GCon)  α ⟨. . . , (x a b^r y, s), . . .⟩ β ⇒ α ⟨. . . , (x y, s), . . .⟩ β

Additionally, (Mrg) can be applied to a pair of tuples only when in one of the tuples all types are of the form v (without left and right adjoints), and (Move) takes two items of a tuple only if one of the types is of the form v. A tupled pregroup grammar is G = (Σ, P, ≤, I, S), where Σ is a finite alphabet, P is a set partially ordered by ≤, I ⊆ (TP × (Σ ∪ {ε}))* is the lexicon, and S is a designated type (here the type of a sentence). The language of the grammar is

LG = { s | I ⇒* ⟨(S, s)⟩ }

that is, the set of strings s such that some sequence of lexical entries from I derives the single pair with type S and string s.
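The (GCon) rule can be sketched in the same spirit (our own illustration; the order facts Nm ≤ π3, Nm ≤ o4 and π3 ≤ o4 are taken from the Italian fragment introduced below, with base-type names spelled in ASCII): inside one type, b^l a is deleted when a ≤ b, and a b^r is deleted when a ≤ b. Greedy left-to-right application happens to suffice for this example.

```python
# Sketch: repeated (GCon) steps inside one type, with a partial order
# on basic types; a pair (x, y) in LEQ means x <= y.
LEQ = {("Nm", "pi3"), ("Nm", "o4"), ("pi3", "o4")}

def leq(a, b):
    return a == b or (a, b) in LEQ

def gcon_once(t):
    """One generalized contraction on a type (list of (base, k))."""
    for i in range(len(t) - 1):
        (p, kp), (q, kq) = t[i], t[i + 1]
        if (kp == -1 and kq == 0 and leq(q, p)) or \
           (kp == 0 and kq == 1 and leq(p, q)):
            return t[:i] + t[i + 2:]
    return None

# Nm (pi3^r s1 o4^l) o4 : contract Nm pi3^r via Nm <= pi3, then o4^l o4
t = [("Nm", 0), ("pi3", 1), ("s1", 0), ("o4", -1), ("o4", 0)]
while (nxt := gcon_once(t)) is not None:
    t = nxt
print(t)  # [('s1', 0)]
```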
3 Applying tupled pregroup grammars to chosen aspects of Italian grammar
We now introduce an application of a tupled pregroup grammar to a well known aspect of the Italian language: clitic pronoun placement and occurrence within certain relative clauses. Single tuples are put together with tuples containing two or three coordinates, the first word in a tuple usually being a verb in its inflected form (recall that the order of the pairs (type, word) within a tuple is not important and can be switched without any consequences). Empty strings play the role of ordering the words, or strings of words, within the sentence. The idea behind the tuple definitions is that some words are indispensable in sentences with clitic pronouns and have to occur together in order to fulfil certain features and peculiarities of the grammar. In comparison with former approaches (see [9], [11]) the types assigned to the words are less complicated and less numerous; nonetheless, the proposed tupled pregroup grammar 'catches' all the acceptable clitic patterns and excludes those considered wrong by Italian grammarians (see e.g. [23]). Italian clitics exhibit two basic patterns: clitic pronouns can occur both in pre-verbal and in post-verbal position, keeping the same relative order of occurrence: locative / indirect object / direct object. Subjects do not allow clitic counterparts; as for the other verbal arguments, clitic unstressed elements can be attached both to the main
(Footnote: On the subject of clitic pronouns in Romance languages, with particular reference to French and Italian, there is an extensive and articulated literature, both in the field of theoretical linguistics and in that of type logical grammars. We cannot go into details here, but cf. [31, 23, 26]; on the side of pregroups, we refer to the two leading papers: [2, 9].)
verb or to the auxiliaries and the modal verbs. Therefore the set of clitic types will include types for the direct object (accusative), types for the indirect object (dative), and types for the locative object. The types used in our examples are:

s1 : the type of a sentence in the present tense
π3 : third person subject; we assume π3 ≤ o4m ≤ o4 (the assumption may be interpreted this way: a third person subject in the sentence can also play the role of a direct object with masculine gender, and the latter may be treated as a direct object when gender does not matter)
nm : masculine singular noun
nf : feminine singular noun
Nm : masculine singular noun phrase, for example un gatto, il cane; we assume Nm ≤ π3 and Nm ≤ o4m ≤ o4
Nf : feminine singular noun phrase, for example una foto, la foto; we assume Nf ≤ π3 and Nf ≤ o4f ≤ o4
i : infinitive of the verb, for example dare
ī : infinitive form of a verb phrase with a clitic, for example darla, darmi, darmela, poterla avere vista
o4 : accusative (direct) object (e.g. una ragazza, una bella ragazza)
o3 : dative (indirect) object (e.g. a me, a una ragazza)
ô4 : accusative (direct) personal pronoun occurring within a tuple, accompanied with the verb, as in ⟨(i, dare), (ô4, la)⟩
ô3 : dative (indirect) personal pronoun occurring within a tuple, accompanied with the verb, as in ⟨(i, dare), (ô3, mi)⟩
ǒ3 : dative (indirect) personal pronoun occurring within a tuple, accompanied with the verb and an accusative personal pronoun, as in ⟨(i, dare), (ǒ3, me), (ô4, la)⟩
(Note: we compare neither o4 with ô4 nor o3 with ô3.)
p2 : the plain form of the past participle (visto, dato) of a verb which takes the auxiliary avere
p̄2 : the full form of the past participle phrase (visto una ragazza, dato una mela a me) of a verb with the auxiliary avere; we assume p2 ≤ p̄2
Let the dictionary I contain (showing only tuples used in our examples; for more see [12]) the single-coordinate entries ⟨(Nm, Mario)⟩, ⟨(Nm nm^l, il)⟩, ⟨(Nm nm^l, un)⟩, ⟨(Nf nf^l, una)⟩, ⟨(nm, gatto)⟩, ⟨(nf, foto)⟩, ⟨(π3^r s1, dorme)⟩, ⟨(p̄2 ī^l, voluto)⟩ and ⟨(ī p̄2^l, avere)⟩, entries for the finite verbs vuole (type s1 ī^l) and ha (type s1 p̄2^l) with empty-string coordinates, the two-coordinate entry ⟨(i, vedere), (ô4, la)⟩, and the three-coordinate entry ⟨(p2, data), (ǒ3, me), (ô4, la)⟩.
The dictionary further contains ⟨(i, guardare), (ô4, lo)⟩, ⟨(π3^r s1 o4^l, vede)⟩, and tuples for the relative pronouns che (type w1, with an empty-string coordinate of type w1^r π3^r π3 s1^l) and cui (type w2, with empty-string coordinates of types w2^r nm nm^l and x^r π3^r π3 s1^l).

Example 1. Let us start with a very simple example, Mario vede [sees] Dario (Mario sees Dario), with the lexical entries ⟨(Nm, Mario)⟩, ⟨(Nm, Dario)⟩ and ⟨(π3^r s1 o4^l, vede)⟩. Using the rules (Mrg), (Move) and (GCon) we can justify the correctness of this sentence with the following derivation:

⟨(Nm, Mario)⟩ ⟨(Nm, Dario)⟩ ⟨(π3^r s1 o4^l, vede)⟩
⇒ (Mrg on the second and third tuples)
⟨(Nm, Mario)⟩ ⟨(π3^r s1 o4^l, vede), (Nm, Dario)⟩
⇒ (Nm ≤ o4, making use of the partial order)
⟨(Nm, Mario)⟩ ⟨(π3^r s1 o4^l, vede), (o4, Dario)⟩
⇒ (Move on the coordinates of the second tuple)
⟨(Nm, Mario)⟩ ⟨(π3^r s1 o4^l o4, vede Dario)⟩
⇒ (GCon in the second tuple)
⟨(Nm, Mario)⟩ ⟨(π3^r s1, vede Dario)⟩
⇒ (Mrg on the first and the second tuple)
⟨(Nm, Mario), (π3^r s1, vede Dario)⟩
⇒ (Nm ≤ π3, making use of the partial order)
⟨(π3, Mario), (π3^r s1, vede Dario)⟩
⇒ (Move on the coordinates of the tuple)
⟨(π3 π3^r s1, Mario vede Dario)⟩
⇒ (GCon within the tuple)
⟨(s1, Mario vede Dario)⟩

Sometimes the derivation is presented in graphical form by means of non-crossing underlinks over the concatenated types:

Nm (π3^r s1 o4^l) Nm ⇒ s1 (here Nm ≤ o4 and Nm ≤ π3)
The above example could also be done in classical pregroup grammars; it serves only to show how the mechanism of derivation of a sentence works. Examples with tuples with more coordinates are shown below.
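The underlink reduction of Example 1 can be replayed mechanically; the sketch below is our own, combining a backtracking contraction search with the order facts Nm ≤ π3 and Nm ≤ o4 (base-type names are spelled in ASCII).

```python
# Sketch: checking that  Nm (pi3^r s1 o4^l) Nm  reduces to s1.
LEQ = {("Nm", "pi3"), ("Nm", "o4")}

def leq(a, b):
    return a == b or (a, b) in LEQ

def reduces_to(types, target):
    seen = set()

    def search(ts):
        if ts in seen:
            return False
        seen.add(ts)
        if ts == target:
            return True
        for i in range(len(ts) - 1):
            (p, kp), (q, kq) = ts[i], ts[i + 1]
            ok = (kp == -1 and kq == 0 and leq(q, p)) or \
                 (kp == 0 and kq == 1 and leq(p, q))
            if ok and search(ts[:i] + ts[i + 2:]):
                return True
        return False

    return search(tuple(types))

mario_vede_dario = [("Nm", 0), ("pi3", 1), ("s1", 0), ("o4", -1), ("Nm", 0)]
print(reduces_to(mario_vede_dario, (("s1", 0),)))  # True
```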
Example 2. Using the rules (Mrg), (Move) and (GCon) we can justify the correctness of the sentence Mario la vuole vedere ([Mario] [her] [wants] [(to) see]: Mario wants to see her), starting from ⟨(Nm, Mario)⟩, ⟨(i, vedere), (ô4, la)⟩ and the entry for vuole (type s1 ī^l) with an empty-string coordinate of type s1^r ô4^r π3^r s1, and proceed as follows:

⟨(Nm, Mario)⟩ ⟨(i, vedere), (ô4, la)⟩ ⟨(s1 ī^l, vuole), (s1^r ô4^r π3^r s1, ε)⟩
⇒ (Mrg, then i ≤ ī and GCon, contracting ī^l ī in vuole vedere)
⟨(Nm, Mario)⟩ ⟨(s1, vuole vedere), (ô4, la), (s1^r ô4^r π3^r s1, ε)⟩
⇒ (Move and GCon, contracting s1 s1^r and ô4 ô4^r; the clitic la is thereby placed before vuole vedere)
⟨(Nm, Mario)⟩ ⟨(π3^r s1, la vuole vedere)⟩
⇒ (Mrg, Nm ≤ π3, Move and GCon, contracting π3 π3^r)
⟨(s1, Mario la vuole vedere)⟩

The derivation can also be expressed graphically with boxes (with the two coordinates of one tuple taken together), or by means of non-crossing underlinks over the type string

Nm ô4 (s1 ī^l) i (s1^r ô4^r π3^r s1) ⇒ s1

assigned to Mario la vuole vedere.
Apart from being a very useful tool for the syntactic analysis and computation of natural language, tupled pregroup grammars can also handle semantic analysis. We illustrate the semantic interpretation for our first two examples. Basic types are treated as functional or relational symbols, and the iterated types determine the argument places. Single basic types, which introduce no argument places, are translated by individual constants. It is possible to add a semantic translation for each lexical item; the translation of a sentence is then computed by substitution according to the links (contractions). For the sentences of Examples 1 and 2 we add translations to the semantic types in the following way:

Mario : Nm : mario
Dario : Nm : dario
vede : π3^r s1 o4^l : vedere(x1, x2)
vuole : s1 ī^l : volere(y)
la : ô4 : la (within a tuple)
vedere : i : vedere( (la)) (within a tuple)
ε : s1^r ô4^r π3^r s1 : id(z1(z3, z2))

Contractions according to the underlinks define the substitutions. Thus in our examples we get:

x1 → mario, x2 → dario
y → vedere( (la)), z1 → volere(y), z2 → la, z3 → mario

Now we are able to compute the semantics of the sentence Mario vede Dario by substitution in the following way: vedere(x1, x2) = vedere(mario, dario). Before we compute the semantics of the sentence Mario la vuole vedere, we would like to emphasise the meaning of the tuple ⟨(i, vedere), (ô4, la)⟩. The infinitive vedere [to see] should take as a complement a direct object in the accusative case, so its syntactic type on its own should be i o4^l; but in the tuple the complement (la) is already given. This fact is visible in the semantic types of both la and vedere, the first supplying la as a direct object, the second treating it as a trace.
In the underlink presentation of Mario la vuole vedere, the type string Nm ô4 (s1 ī^l) i (s1^r ô4^r π3^r s1) reduces to s1, and the meaning of the sentence is computed as follows:

id(z1(z3, z2)) = id(volere(y)(mario, la)) = id(volere(vedere( (la)))(mario, la)) = volere(vedere( (la)))(mario, la)

Here id(x) = x is the identity function. The meaning of the sentence is read as follows: the function volere(vedere) [to want to see] takes two arguments, one for the subject and the other for an object. In this case the direct object is a clitic which has to be placed before the verb phrase in the sentence, and the tuple enables putting it in the right place with respect to the syntax. Normally the direct object is governed by the verb vedere [to see] (this is depicted by vedere( (la)), which may be treated as a trace); however, in the present example the clitic pronoun in the accusative case must take preverbal position, and the modal verb volere [to want] is also involved.

Example 3. We now show how to justify the correctness of the following sentence: Mario me la ha voluto avere data ([Mario] [to me] [it] [has] [wished] [(to) have] [given]: Mario wished to have given it to me). The tuples below are appropriate for the derivation; attention should be given to the three-coordinate tuple ⟨(p2, data), (ǒ3, me), (ô4, la)⟩, whose lexical entries are indispensable and must be taken together. The derivation starts from the entries for Mario (Nm), ha (s1 p̄2^l), voluto (p̄2 ī^l), avere (ī p̄2^l), the three-coordinate tuple, and an empty-string coordinate of type s1^r ô4^r ǒ3^r π3^r s1. Successive (Mrg) steps, the promotion p2 ≤ p̄2, and (Move) and (GCon) steps contract p̄2^l p̄2, ī^l ī and then s1 s1^r, ô4 ô4^r, ǒ3 ǒ3^r and π3 π3^r (using Nm ≤ π3), placing the clitics me la before ha voluto avere data.
The remaining steps reduce the whole sequence to the single pair ⟨(s1, Mario me la ha voluto avere data)⟩. The same derivation can be presented with boxes, with the three-coordinate tuple indispensable, or with non-crossing underlinks over the type string

Nm ǒ3 ô4 (s1 p̄2^l) (p̄2 ī^l) (ī p̄2^l) p2 (s1^r ô4^r ǒ3^r π3^r s1) ⇒ s1

assigned to Mario me la ha voluto avere data, with the semantic translations

Mario : mario
me : me (within a tuple)
la : la (within a tuple)
ha : avere(x)
voluto : volere(y)
avere : avere(z)
data : dare( (me), (la)) (within a tuple)
ε : id(x1(x4, x3, x2))

The substitutions are as follows:

x1 → avere(x), x2 → la, x3 → me, x4 → mario, x → volere(y), y → avere(z), z → dare( (me), (la))

Now we are able to compute the semantics of the sentence Mario me la ha voluto avere data by substitution in the following way:
id(x1(x4, x3, x2)) = id(avere(volere(avere(dare( (me), (la)))))(mario, me, la)) = avere(volere(avere(dare( (me), (la)))))(mario, me, la)

The interpretation of the meaning of the sentence is similar to the one given in Example 2, with dare [to give] being a ditransitive verb.

Example 4. We can justify the correctness of a sentence with a relative pronoun, Mario che ha un gatto vede una foto ([Mario] [who] [has] [a] [cat] [looks] [at its] [picture]), using the entries for Mario, ha, un, gatto, vede, una and foto together with the che-tuple (type w1, with an empty-string coordinate of type w1^r π3^r π3 s1^l and an empty-string π3 trace). The derivation reduces to ⟨(s1, Mario che ha un gatto vede una foto)⟩. It can also be presented with boxes, the relative-pronoun tuple being indispensable, or with non-crossing underlinks (note that we make use of the assumptions Nm ≤ o4 and Nf ≤ o4):

Nm w1 (w1^r π3^r π3 s1^l) π3 (π3^r s1 o4^l) (Nm nm^l) nm (π3^r s1 o4^l) (Nf nf^l) nf ⇒ s1

The meaning of the sentence, obtained in a similar way as in the former examples, gives us the following semantics:

vedere((mario(avere(X, un(gatto)), che(X))), una(foto))

It can be seen that the function vedere(x, y) for vede [looks] takes two arguments (the subject and an object), and the subject (Mario) is modified by the appropriate relative clause.

Example 5. By means of tupled pregroup grammars we are also able to account for the sentencehood of the following string of words in Italian: Mario il cui gatto dorme vede una foto ([Mario] [-] [whose] [cat] [is sleeping] [looks] [at its picture]). For that reason we need the following tuples from our lexicon:
⟨(Nm, Mario)⟩, ⟨(Nm nm^l, il)⟩, the cui-tuple with coordinates (w2, cui), (w2^r nm nm^l, ε), (x, ε) and (x^r π3^r π3 s1^l, ε), ⟨(nm, gatto)⟩, ⟨(π3^r s1, dorme)⟩, ⟨(π3^r s1 o4^l, vede)⟩, ⟨(Nf nf^l, una)⟩ and ⟨(nf, foto)⟩. The derivation reduces to ⟨(s1, Mario il cui gatto dorme vede una foto)⟩. The same sentence can be presented with boxes, or with non-crossing underlinks over the type string

π3 x (x^r π3^r π3 s1^l) (Nm nm^l) w2 (w2^r nm nm^l) nm (π3^r s1) (π3^r s1 o4^l) (Nf nf^l) nf ⇒ s1

Example 6. By means of tupled pregroup grammars we are also able to account for the sentencehood of strings of words in Italian combining relative pronouns with clitics: Mario che ha un gatto lo vuole guardare ([Mario] [who] [has] [a] [cat] [it] [wants] [to observe]: Mario who has a cat wants to observe it) or Mario il cui gatto dorme lo vuole guardare ([Mario] [-] [whose] [cat] [is sleeping] [it] [wants] [to observe]: Mario whose cat is sleeping wants to observe it). Let us conclude with the last sentence, using the lexicon elements of Example 5 together with ⟨(i, guardare), (ô4, lo)⟩ and the entry for vuole (type s1 ī^l) with its empty-string coordinate of type s1^r ô4^r π3^r s1. Presented with boxes, the sentence looks as follows:
The boxes interleave the clitic tuple ⟨(i, guardare), (ô4, lo)⟩ and the vuole entry (with the type s1^r ô4^r π3^r s1 on an empty string) with the tuples of Example 5 for Mario, il, cui, gatto and dorme; the (Mrg), (Move) and (GCon) steps reduce the whole to ⟨(s1, Mario il cui gatto dorme lo vuole guardare)⟩.
4 Conclusions
In this paper examples involving both clitic pronouns and relative clauses have been considered, showing how the tuples introduced by this model of pregroup grammar can effectively constitute the building blocks of the grammatical structure of a natural language. The box presentations aim at suggesting this idea of step-by-step syntactic analysis, while the underlink representations allow one to sketch the lines of a semantic analysis based on appropriate substitutions. Although the examples presented here are relatively few, they are representative of two characteristic aspects of Italian grammar: preverbal clitic attachment and clitic climbing with modal auxiliaries, and the cooccurrence of clitic pronouns with relative clauses.
References
[1] Abrusci, M.: Classical Conservative Extensions of Lambek Calculus. Studia Logica 71, 277–314 (2002)
[2] Bargelli, D., Lambek, J.: An Algebraic Approach to French Sentence Structure. In: De Groote, P., Morrill, G., Retoré, C. (eds.) Logical Aspects of Computational Linguistics, LNAI 2099, Springer, Berlin, 62–78 (2001)
[3] Bargelli, D., Lambek, J.: An Algebraic Approach to Arabic Sentence Structure. Linguistic Analysis 31, 301–315 (2001)
[4] Béchet, D.: NP-completeness of grammars based upon products of free pregroups. In: Casadio, C., Coecke, B., Moortgat, M., Scott, P. (eds.) Categories and Types in Logic, Language and Physics, Festschrift on the occasion of Jim Lambek's 90th birthday, LNCS 8222, Springer, 51–62 (2014)
[5] Béchet, D., Foret, A.: A pregroup toolbox for parsing and building grammars of natural languages. Linguistic Analysis 36(1–4), 473–482 (2010)
[6] Buszkowski, W.: Lambek Grammars Based on Pregroups. In: Logical Aspects of Computational Linguistics, LNAI 2099, Springer, 95–109 (2001)
[7] Buszkowski, W.: Pregroups: Models and Grammars. In: Relational Methods in Computer Science, LNCS 2561, 35–49 (2002)
[8] Buszkowski, W., Lin, Z.: Pregroup Grammars with Letter Promotions. In: Language and Automata Theory and Applications, LNCS 6031, Springer, 130–141 (2010)
[9] Casadio, C., Lambek, J.: An Algebraic Analysis of Clitic Pronouns in Italian. In: Lecomte, A., Lamarche, F., Perrier, G. (eds.) Logical Aspects of Computational Linguistics, LNAI 2099, Springer, 110–124 (2001)
[10] Casadio, C., Lambek, J.: A Tale of Four Grammars. Studia Logica 71(2), 315–329 (2002)
[11] Casadio, C.: Agreement and Cliticization in Italian: A Pregroup Analysis. LNCS 6031, Springer, 166–177 (2010)
[12] Casadio, C., Kiślak-Malinowska, A.: Tupled Pregroups. A Study of Italian Clitic Patterns. Proceedings LENLS 9, ISBN 978-4-915905-51-3 (JSAI), 1–12 (2012)
[13] Casadio, C., Kiślak-Malinowska, A.: Italian Clitic Patterns in Pregroup Grammar: State of the Art. In: Categories and Types in Logic, Language, and Physics, Festschrift on the occasion of Jim Lambek's 90th birthday, LNCS 8222, Springer, 156–171 (2014)
[14] Foret, A.: A modular and parameterized presentation of pregroup calculus. Information and Computation 208(5), 510–520 (2010)
[15] Francez, N., Kaminski, M.: Commutation-Augmented Pregroup Grammars and Mildly Context-Sensitive Languages. Studia Logica 87(2/3), 295–321 (2007)
[16] Kiślak-Malinowska, A.: Polish Language in Terms of Pregroups. In: Casadio, C., Lambek, J. (eds.) Computational Algebraic Approaches to Natural Language, Polimetrica, 145–172 (2008)
[17] Kiślak-Malinowska, A.: Some Aspects of Polish Grammar in Terms of Tupled Pregroups. Linguistic Analysis, 93–119 (2010)
[18] Kiślak-Malinowska, A.: Extended Pregroup Grammars Applied to Natural Languages. Logic and Logical Philosophy 21, 229–252 (2012)
[19] Lambek, J.: The Mathematics of Sentence Structure. American Mathematical Monthly 65, 154–169 (1958)
[20] Lambek, J.: Type Grammars Revisited. In: Lecomte, A., Lamarche, F., Perrier, G. (eds.) Logical Aspects of Computational Linguistics, LNAI 1582, Springer, Berlin, 1–27 (1999)
[21] Lambek, J.: From Word to Sentence. Polimetrica (2008)
[22] Lambek, J.: Compact Monoidal Categories from Linguistics to Physics. In: Coecke, B. (ed.) New Structures for Physics, Lecture Notes in Physics 813, Springer-Verlag, 467–487 (2011)
[23] Monachesi, P.: A Lexical Approach to Italian Cliticization. Lecture Notes 84, CSLI, Stanford (1999)
[24] Moortgat, M.: Categorial Type Logics. In: van Benthem, J., ter Meulen, A. (eds.) Handbook of Logic and Language, Elsevier, Amsterdam, 93–177 (1997)
[25] Moortgat, M.: Symmetric Categorial Grammar. Journal of Philosophical Logic 38, 681–710 (2009)
[26] Moot, R., Retoré, C.: Les indices pronominaux du français dans les grammaires catégorielles. Lingvisticae Investigationes 29(1), 137–146 (2006)
[27] Preller, A.: Category Theoretical Semantics for Pregroup Grammars. In: Logical Aspects of Computational Linguistics, LNCS 3492, 238–254 (2005)
[28] Preller, A., Lambek, J.: Free Compact 2-Categories. Mathematical Structures in Computer Science 17, 309–340 (2007)
[29] Preller, A., Sadrzadeh, M.: Semantic Vector Models and Functional Models for Pregroup Grammars. Journal of Logic, Language and Information 20(4), 419–443 (2011)
[30] Preller, A., Degeilh, S.: Efficiency of Pregroups and the French Noun Phrase. Journal of Logic, Language and Information 14, 423–444 (2005)
[31] Rizzi, L.: Issues in Italian Syntax. Foris Publications, Dordrecht (1982); second printing: Mouton de Gruyter, The Hague (1993)
[32] Sadrzadeh, M.: Pregroup Analysis of Persian Sentences. In: Casadio, C., Lambek, J. (eds.) Recent Computational Algebraic Approaches to Morphology and Syntax, Polimetrica, Milan (2008)
[33] Stabler, E.: Tupled pregroup grammars. In: Casadio, C., Lambek, J. (eds.) Computational Algebraic Approaches to Natural Language, Polimetrica, 23–52 (2008)
Toward a Logic of Cumulative Quantification

Makoto Kanazawa, National Institute of Informatics, Tokyo, Japan ([email protected])
Junri Shimada, Keio University, Tokyo, Japan ([email protected])

Abstract
This paper studies entailments between sentences like three boys kissed five girls involving two or more numerically quantified noun phrases that are interpreted as expressing cumulative quantification in the sense of Scha (1984). A precise characterization of when one such sentence entails another such sentence that differs from it only in the numerals is crucially important to evaluate claims about scalar implicatures arising from the use of those sentences, as pointed out by Shimada (to appear). This problem turns out to be non-trivial and surprisingly difficult. We give a characterization of these entailments for the case of sentences with two noun phrases, together with a complete axiomatization consisting of two simple inference rules. We also give some valid inference rules for sentences with three noun phrases.
1 Introduction
This paper concerns sentences expressing cumulative quantification (Scha, 1984), exemplified by (1):

Three boys kissed five girls.    (1)

We call sentences like (1) cumulative sentences, focusing only on their cumulative reading. In general, a cumulative sentence may involve k numerically quantified noun phrases and a verb expressing a k-ary relation. A general form of a cumulative sentence may be schematically represented by

n1 N1 V n2 N2 . . . nk Nk,    (2)
where each ni is a number word, Ni is a count noun, and V is a verb. Following Krifka (1999), we assume that the relevant truth conditions of sentences of the form (2) are as given in (3), where we write πi(R) for the i-th projection { xi | (x1, . . . , xk) ∈ R } of a k-ary relation R:

∃X1 . . . ∃Xk ⋀_{i=1}^{k} ( |Xi| = ni ∧ Xi ⊆ [[Ni]] ∧ Xi = πi([[V]] ∩ (X1 × · · · × Xk)) )    (3)
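Truth conditions (3) can be checked by brute force over a finite model. The sketch below is our own illustration (the relation kissed is one concrete relation with the properties later ascribed to Figure 1; the exact edges of that figure may differ): it enumerates candidate witness sets X1, . . . , Xk of the required sizes and tests the projection conditions.

```python
# Sketch: brute-force evaluation of the cumulative truth conditions (3).
from itertools import combinations

def proj(rel, i):
    """i-th projection of a set of tuples."""
    return {t[i] for t in rel}

def cumulative(R, noun_exts, ns):
    """Does (3) hold?  R = [[V]], noun_exts = the k noun extensions [[N_i]],
    ns = the numerals (n_1, ..., n_k)."""
    k = len(ns)

    def search(i, xs):
        if i == k:
            RR = {t for t in R if all(t[j] in xs[j] for j in range(k))}
            return all(proj(RR, j) == xs[j] for j in range(k))
        return any(search(i + 1, xs + [set(c)])
                   for c in combinations(sorted(noun_exts[i]), ns[i]))

    return search(0, [])

boys = {"b1", "b2", "b3"}
girls = {"g1", "g2", "g3", "g4", "g5"}
kissed = {("b1", "g1"), ("b2", "g2"), ("b3", "g3"), ("b3", "g4"), ("b3", "g5")}
print(cumulative(kissed, [boys, girls], (3, 5)))  # True
print(cumulative(kissed, [boys, girls], (2, 5)))  # False
```

With ns = (3, 4) the same model also makes the sentence true, illustrating that (3) does not require the numbers to be maximal.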
Figure 1: A model where three boys kissed five girls and three boys kissed four girls are true but two boys kissed five girls is false.

The truth conditions in (3) are deliberately weaker than those suggested by Scha (1984), which may be represented as follows:

⋀_{i=1}^{k} |πi([[V]] ∩ ([[N1]] × · · · × [[Nk]]))| = ni    (4)
Krifka's (1999) idea was that (3) represents the truth conditions that are directly delivered by compositional semantics involving plural predication, but that they are pragmatically strengthened by scalar implicatures, resulting in something like (4). In more detail, Krifka (1999) assumes that when a speaker utters a sentence of the form (2) for a specific choice of N1, . . . , Nk, V, all the sentences of the form (2) for the same choice of N1, . . . , Nk, V constitute the set of relevant alternatives for the sentence, partially ordered by the entailment relation. According to Krifka, Grice's maxim of Quantity has the effect that the numbers n1, . . . , nk actually used in the utterance must be the highest numbers that make the sentence true. A problem with this account, as pointed out by Shimada (to appear), is that the partial order on N^k given by the entailment relation between the truth conditions (3) is not the familiar coordinatewise order defined by (n1, . . . , nk) ≤ (m1, . . . , mk) ⇐⇒ n1 ≤ m1 ∧ · · · ∧ nk ≤ mk. For instance, three boys kissed five girls, understood according to (3), entails three boys kissed four girls, but not two boys kissed five girls (see Figure 1). This means that a speaker who knows the first of these three sentences to be true may nevertheless choose to utter the last sentence to convey information not carried by the former. There is no reason, then, to expect that Grice's maxim forces the speaker to choose the highest
Representing plural individuals as sets of atomic individuals, we may suppose that if the bare form of a verb V denotes a k-ary relation JVK on atomicVindividuals, its plural form denotes its closure under cumulativity, which amounts to { (X1 , . . . , Xk ) | ki=1 Xi = πi (JVK ∩ (X1 × · · · × Xk )) }. Numerically quantified noun phrases ni Ni would denote generalized quantifiers { P | ∃X(X ⊆ JNi K ∧ |X| = n ∧ X ∈ P ) } over plural individuals and combine with the plural verb denotation in the standard way to produce (3).
CISUC/TR 2014-02
112
NLCS’14 / NLSR 2014
Joint Proceedings NLCS’14 & NLSR 2014, July 17-18, 2014, Vienna, Austria
numbers that make the sentence true. Shimada (to appear) claims that this is as it should be, and an utterance of two boys kissed five girls in fact does not implicate the negation of three boys kissed five girls. The actual entailment relation between cumulative sentences understood according to the truth conditions in (3), even for the case k = 2, is rather complex. A precise characterization of this relation is important to understand how the truth conditions of cumulative sentences may be pragmatically strengthened by scalar implicatures. Shimada (to appear) gives two sufficient conditions for the entailment to hold, for the case k = 2. In Section 3, we reformulate Shimada’s sufficient conditions and show that they form a complete set of inference rules. In Section 4, we consider the case k = 3, and give some valid inference rules which will hopefully form part of a complete system of rules. Let us write RX1 . . . Xk for the formula k ^
i=1
Xi = πi (R ∩ (X1 × · · · × Xk ))
(5)
and R(n1 , . . . , nk ) for the closed formula ∃X1 . . . ∃Xk
k ^
i=1
|Xi | = ni ∧ RX1 . . . Xk
!
.
(6)
Since the truth conditions (3) can be equivalently expressed in the form (6) with R = ⟦V⟧ ∩ (⟦N_1⟧ × ··· × ⟦N_k⟧), the entailment relation between cumulative sentences (with the same choice of nouns and verbs) corresponds to the logical consequence relation between the closed formulas of the form (6) for various choices of n_1, …, n_k. As usual, we write

R(n_1, …, n_k) |= R(m_1, …, m_k)   (7)

to mean R(m_1, …, m_k) is a logical consequence of R(n_1, …, n_k). Since (6) is evidently definable by an existential first-order formula with n_1 + ··· + n_k bound variables, the relation (7), understood as a 2k-ary relation on the natural numbers, is decidable. This fact alone, however, does not suggest any efficient procedure to decide whether (7) is true. Strictly speaking, logical consequence (in the model-theoretic sense) is a relation between closed formulas of a formal language, so when we write (7), we are using R as a predicate symbol, not as a k-ary relation (i.e., a set of ordered k-tuples). A model for formulas of the form R(n_1, …, n_k) is a structure M = (M, R^M), where M is a non-empty set and R^M is a k-ary relation on M. The relation (7) holds if and only if every model of R(n_1, …, n_k) is a model of R(m_1, …, m_k). In this paper, we choose to be sloppy and will not make a clear distinction between syntax and semantics. This should not lead to any confusion.
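The decidability observation above can be made concrete by brute force over subsets of a relation. The following Python sketch is ours, not the paper's; the relation `kiss` is a hypothetical encoding consistent with the caption of Figure 1, and the subset formulation of the truth conditions anticipates Lemma 5 below:

```python
from itertools import combinations

# hypothetical encoding of the Figure 1 model: boy 3 kisses girls g3, g4, g5
kiss = {(1, 'g1'), (2, 'g2'), (3, 'g3'), (3, 'g4'), (3, 'g5')}

def cumulative(R, m, n):
    """R(m, n) for a binary relation R: some non-empty subset of R has
    exactly m distinct first components and n distinct second components."""
    Rl = list(R)
    return any(len({x for x, _ in sub}) == m and len({y for _, y in sub}) == n
               for r in range(1, len(Rl) + 1)
               for sub in combinations(Rl, r))

assert cumulative(kiss, 3, 5)        # three boys kissed five girls
assert cumulative(kiss, 3, 4)        # three boys kissed four girls
assert not cumulative(kiss, 2, 5)    # but not: two boys kissed five girls
```

The three assertions reproduce exactly the pattern from Figure 1 that breaks the coordinatewise order.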
2 Preliminaries

We employ the projection (π) and selection (σ) operators from relational database theory, using component numbers 1, 2, …, k to pick out attributes of a k-ary relation (see Ullman, 1988). For example, if R is a ternary relation,

π_{1,2}(R) = { (x, y) | (x, y, z) ∈ R },
σ_{$1≠a}(R) = { (x, y, z) ∈ R | x ≠ a }.

Usually, the condition F in a selection operator σ_F must be a Boolean combination of equalities and inequalities, but in this paper we allow a selection operator of the form σ_{$i∈D}. When R is a binary relation, we also use standard notations like

• dom(R) and ran(R) for π_1(R) and π_2(R),
• R⁻¹ for π_{2,1}(R),
• R⁻¹(b) for π_1(σ_{$2=b}(R)),
• R⁻¹(D) for π_1(σ_{$2∈D}(R)),

etc. If R is a binary relation, its deterministic reduct is²

d(R) = { (x, y) ∈ R | ∀y′ ((x, y′) ∈ R → y′ = y) }.

Clearly, d(R) is a partial function.

Lemma 1. σ_{$1∈dom(d(R))}(R) = d(R).

Proof. Clearly, d(R) = { (x, y) ∈ R | ∀y′∀y″ (((x, y′) ∈ R ∧ (x, y″) ∈ R) → y′ = y″) }, and so dom(d(R)) = { x ∈ dom(R) | ∀y′∀y″ (((x, y′) ∈ R ∧ (x, y″) ∈ R) → y′ = y″) }. It easily follows that d(R) = { (x, y) ∈ R | x ∈ dom(d(R)) }.

Lemma 2. If |dom(R)| < |ran(R)|, then |ran(d(R))| < |dom(R)|.

Proof. Assume |dom(R)| < |ran(R)|. Since d(R) is a partial function, |dom(d(R))| ≥ |ran(d(R))|. This implies R ≠ d(R). Since R = σ_{$1∈dom(R)}(R), Lemma 1 implies dom(R) ⊃ dom(d(R)). So |ran(d(R))| ≤ |dom(d(R))| < |dom(R)|.

Lemma 3. d(σ_{$1∈D}(R)) = σ_{$1∈D}(d(R)).

² This piece of terminology comes from descriptive complexity theory (Immerman, 1999), where the deterministic transitive closure of a binary relation is defined as the transitive closure of its deterministic reduct.
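These operators are easy to experiment with. The following Python helpers are an illustration of ours (not part of the paper); they represent a relation as a set of tuples and spot-check Lemmas 1 and 2 on random binary relations:

```python
import random

def proj(R, *idx):
    """Projection: keep the listed (0-based) components of every tuple."""
    return {tuple(t[i] for i in idx) for t in R}

def select(R, i, pred):
    """Selection: keep the tuples whose i-th component satisfies pred."""
    return {t for t in R if pred(t[i])}

def det(R):
    """Deterministic reduct d(R) of a binary relation: the pairs (x, y)
    in R such that y is the only value related to x."""
    return {(x, y) for (x, y) in R
            if all(y2 == y for (x2, y2) in R if x2 == x)}

random.seed(0)
for _ in range(500):
    R = {(random.randrange(4), random.randrange(4))
         for _ in range(random.randrange(1, 10))}
    dom_d = {x for (x, _) in det(R)}
    # Lemma 1: selecting R on dom(d(R)) recovers exactly d(R)
    assert select(R, 0, lambda x: x in dom_d) == det(R)
    dom_R, ran_R = proj(R, 0), proj(R, 1)
    # Lemma 2: |dom(R)| < |ran(R)| forces |ran(d(R))| < |dom(R)|
    if len(dom_R) < len(ran_R):
        assert len(proj(det(R), 1)) < len(dom_R)
```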
Proof. We have

d(σ_{$1∈D}(R)) = { (x, y) ∈ σ_{$1∈D}(R) | ∀y′ ((x, y′) ∈ σ_{$1∈D}(R) → y′ = y) }
 = { (x, y) ∈ R | x ∈ D ∧ ∀y′ (((x, y′) ∈ R ∧ x ∈ D) → y′ = y) }
 = { (x, y) ∈ R | x ∈ D ∧ ∀y′ ((x, y′) ∈ R → y′ = y) }
 = σ_{$1∈D}({ (x, y) ∈ R | ∀y′ ((x, y′) ∈ R → y′ = y) })
 = σ_{$1∈D}(d(R)).

Lemma 4. If dom(R′) = dom(R) and R′ ⊆ R, then d(R′) ⊇ d(R).

Proof. Suppose dom(R′) = dom(R) and R′ ⊆ R. We have

d(R′) = { (x, y) | x ∈ dom(R′) ∧ ∀y′ ((x, y′) ∈ R′ → y′ = y) }
 = { (x, y) | x ∈ dom(R) ∧ ∀y′ ((x, y′) ∈ R′ → y′ = y) }
 ⊇ { (x, y) | x ∈ dom(R) ∧ ∀y′ ((x, y′) ∈ R → y′ = y) }
 = d(R).
It is easy to see that RX_1 … X_k, as defined by (5), is monotone in R: if R ⊆ R′, then RX_1 … X_k implies R′X_1 … X_k. It follows that RX_1 … X_k holds if and only if there exists some R′ ⊆ R such that X_i = π_i(R′) for i = 1, …, k. The following two lemmas are straightforward:

Lemma 5. The following are equivalent:

(i) R(n_1, …, n_k).
(ii) ∃R′ ⊆ R ( ⋀_{i=1}^{k} |π_i(R′)| = n_i ).

Lemma 6. The following are equivalent:

(i) R(n_1, …, n_k) |= R(m_1, …, m_k).
(ii) ⋀_{i=1}^{k} |π_i(R)| = n_i implies R(m_1, …, m_k).
Lemma 7. Let R be a k-ary relation. Let a ∈ π_i(R) and R′ = σ_{$i≠a}(R). Then we have

π_i(R′) = π_i(R) − {a},
π_j(R′) = π_j(R) − (d(π_{j,i}(R)))⁻¹(a)   for j ≠ i.
Proof. It is easy to see that π_i(R′) = π_i(R) − {a}. Now let j ≠ i and suppose y ∈ π_j(R′). Then for some (x_1, …, x_k) ∈ R, x_j = y and x_i ≠ a. Since (y, x_i) ∈ π_{j,i}(R), we have (y, a) ∉ d(π_{j,i}(R)). So y ∈ π_j(R) − (d(π_{j,i}(R)))⁻¹(a). Conversely, suppose y ∈ π_j(R) − (d(π_{j,i}(R)))⁻¹(a). This implies that there must be some z ≠ a such that (y, z) ∈ π_{j,i}(R), which implies that for some (x_1, …, x_k) ∈ R, x_i = z and x_j = y. Since x_i ≠ a, (x_1, …, x_k) ∈ R′. So y ∈ π_j(R′).

Lemma 8. If A = dom(R), B = ran(R), and ∅ ≠ B′ ⊂ B, then there exists a b ∈ B′ such that |(d(R))⁻¹(b)| ≤ ⌊(|A| − 1)/|B′|⌋.

Proof. Since |(d(R))⁻¹(B′)| = Σ_{y∈B′} |(d(R))⁻¹(y)|, it suffices to show (d(R))⁻¹(B′) ⊂ A. Suppose (d(R))⁻¹(B′) = A. Then dom(d(R)) = A and ran(d(R)) = B′. By Lemma 1, R = σ_{$1∈A}(R) = σ_{$1∈dom(d(R))}(R) = d(R). But ran(d(R)) = B′ ≠ B = ran(R), a contradiction.
3 Binary Cumulative Sentences

In what follows, we use letters k, l, m, n as variables ranging over the set ℕ − {0} of positive integers. The following lemmas are straightforward:

Lemma 9. If R(m, n) |= R(k, l), then one of the following holds: (i) m ≥ n and k ≥ l; (ii) m ≤ n and k ≤ l.

Lemma 10. If R(m, n) |= R(k, l), then m ≥ k and n ≥ l.

The following lemma is Proposition 1 of Shimada (to appear):

Lemma 11. If m > n, then R(m, n) |= R(m − 1, n).

Proof. Suppose m > n, and let R be a binary relation such that A = dom(R), B = ran(R), and |A| = m, |B| = n. By Lemmas 5 and 6, it suffices to show the existence of an R′ ⊆ R such that |dom(R′)| = m − 1 and |ran(R′)| = n. Since d(R⁻¹) is a partial function from B to A and |A| > |B|, there must be an a ∈ A such that (d(R⁻¹))⁻¹(a) = ∅. Let R′ = σ_{$1≠a}(R). By Lemma 7, dom(R′) = A − {a} and ran(R′) = B, and we are done.

The integer part ⌊m/n⌋ of a fraction m/n is the quotient of m divided by n. We write m mod n for the remainder of m divided by n. We always have m = n·⌊m/n⌋ + (m mod n). The following lemma is adapted from Proposition 2 of Shimada (to appear):

Lemma 12. If m ≥ n > 1, then R(m, n) |= R(m − ⌊m/n⌋, n − 1).
Proof. Suppose m ≥ n > 1, and let R be a binary relation such that A = dom(R), B = ran(R), and |A| = m, |B| = n. By Lemma 6, it suffices to show R(m − ⌊m/n⌋, n − 1). Since m = |A| ≥ |(d(R))⁻¹(B)| = Σ_{y∈B} |(d(R))⁻¹(y)|, there must be a b ∈ B such that |(d(R))⁻¹(b)| ≤ ⌊m/n⌋. Let R′ = σ_{$2≠b}(R). By Lemma 7, dom(R′) = A − (d(R))⁻¹(b) and ran(R′) = B − {b}. Let m′ = |A − (d(R))⁻¹(b)|. Then m′ ≥ m − ⌊m/n⌋. By Lemma 5, we have R(m′, n − 1). Since m ≥ n, we have

m − ⌊m/n⌋ ≥ n·⌊m/n⌋ − ⌊m/n⌋ = (n − 1)·⌊m/n⌋ ≥ n − 1.

So Lemma 11 implies R(m − ⌊m/n⌋, n − 1).
It is evident that the symmetric variants of Lemmas 11 and 12 hold as well: m < n implies R(m, n) |= R(m, n − 1), and 1 < m ≤ n implies R(m, n) |= R(m − 1, n − ⌊n/m⌋). Combining all these, we see that R(m, n) |= R(m − 1, n − ⌊n/m⌋) holds whenever m > 1, and symmetrically, R(m, n) |= R(m − ⌊m/n⌋, n − 1) holds whenever n > 1. Let us write R(m, n) ⊢ R(k, l) if R(k, l) can be deduced from R(m, n) using the following rules of inference:

R(m, n)
───────────────────  (R2–1), provided m > 1
R(m − 1, n − ⌊n/m⌋)

R(m, n)
───────────────────  (R2–2), provided n > 1
R(m − ⌊m/n⌋, n − 1)
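Since (R2–1) and (R2–2) involve only integer arithmetic, the derivability relation ⊢ can be computed mechanically. A small Python sketch of ours (an illustration, not part of the paper) closes {(m, n)} under the two rules:

```python
def derivable(m, n):
    """All (k, l) with R(m, n) ⊢ R(k, l): close {(m, n)} under (R2-1), (R2-2)."""
    seen, todo = set(), [(m, n)]
    while todo:
        a, b = todo.pop()
        if (a, b) in seen:
            continue
        seen.add((a, b))
        if a > 1:                        # (R2-1): R(a - 1, b - ⌊b/a⌋)
            todo.append((a - 1, b - b // a))
        if b > 1:                        # (R2-2): R(a - ⌊a/b⌋, b - 1)
            todo.append((a - a // b, b - 1))
    return seen

# the example from Figure 1: (3, 5) derives (3, 4) but not (2, 5)
assert (3, 4) in derivable(3, 5)
assert (2, 5) not in derivable(3, 5)
```

On the Figure 1 example this matches the discussion of the entailment order: three boys kissed five girls yields three boys kissed four girls but not two boys kissed five girls.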
Theorem 13 (Soundness). If R(m, n) ⊢ R(k, l), then R(m, n) |= R(k, l).

Lemma 14 (Characterization). Let m ≥ n. The following conditions are equivalent:

(i) R(m, n) |= R(k, l).
(ii) k ≤ m ∧ l ≤ n ∧ l ≤ k ≤ l·⌊m/n⌋ + min(m mod n, l).
(iii) R(m, n) ⊢ R(k, l).

Proof. (i) ⇒ (ii). By Lemmas 9 and 10, we only need to show

k ≤ l·⌊m/n⌋ + min(m mod n, l).   (8)

Let A = {a_0, …, a_{m−1}}, B = {b_0, …, b_{n−1}}, and consider the binary relation R = { (a_i, b_j) | j = i mod n }. Clearly, dom(R) = A and ran(R) = B, so R(m, n) holds. By the assumption (i), R(k, l). Let A′ ⊆ A, B′ ⊆ B be such that |A′| = k, |B′| = l, and RA′B′. We must have

A′ ⊆ ⋃_{b_j ∈ B′} { a_i | j = i mod n },

so

k ≤ Σ_{b_j ∈ B′} |{ i | j = i mod n }|.   (9)
We distinguish two cases.

Case 1. n divides m, i.e., m mod n = 0. In this case, it is clear that |{ i | j = i mod n }| = m/n = ⌊m/n⌋ holds for each j = 0, …, n − 1. By (9), k ≤ l·⌊m/n⌋, which implies (8).

Case 2. n does not divide m, i.e., m mod n ≥ 1. In this case,

|{ i | j = i mod n }| = ⌊m/n⌋ + 1 if 0 ≤ j < m mod n, and ⌊m/n⌋ if m mod n ≤ j ≤ n − 1.

Then (8) easily follows from (9).

(ii) ⇒ (iii). By (R2–1), it suffices to prove

R(m, n) ⊢ R(l·⌊m/n⌋ + min(m mod n, l), l).   (10)

We show this by induction on n − l. If n − l = 0, then l = n > m mod n, so l·⌊m/n⌋ + min(m mod n, l) = n·⌊m/n⌋ + (m mod n) = m, and (10) becomes R(m, n) ⊢ R(m, n), which holds trivially. Now suppose n − l ≥ 1. The induction hypothesis says

R(m, n) ⊢ R((l + 1)·⌊m/n⌋ + min(m mod n, l + 1), l + 1).

Let

k′ = (l + 1)·⌊m/n⌋ + min(m mod n, l + 1),

so that the right-hand side of the induction hypothesis becomes R(k′, l + 1). Since l + 1 > 1, an application of the rule (R2–2) gives R(k′, l + 1) ⊢ R(k′ − ⌊k′/(l + 1)⌋, l). Combining this with the induction hypothesis, we get

R(m, n) ⊢ R(k′ − ⌊k′/(l + 1)⌋, l).   (11)

Case 1. m mod n < l + 1. Then k′ = (l + 1)·⌊m/n⌋ + (m mod n). Now

k′/(l + 1) = ((l + 1)·⌊m/n⌋ + (m mod n))/(l + 1) = ⌊m/n⌋ + (m mod n)/(l + 1).

It follows that ⌊k′/(l + 1)⌋ = ⌊m/n⌋, since m mod n < l + 1. So

k′ − ⌊k′/(l + 1)⌋ = (l + 1)·⌊m/n⌋ + (m mod n) − ⌊m/n⌋ = l·⌊m/n⌋ + (m mod n).   (12)

Since m mod n < l + 1, m mod n ≤ l and min(m mod n, l) = m mod n. So (10) becomes R(m, n) ⊢ R(l·⌊m/n⌋ + (m mod n), l), which we can obtain from (11) by substituting (12) for k′ − ⌊k′/(l + 1)⌋.

Case 2. m mod n ≥ l + 1. Then k′ = (l + 1)·⌊m/n⌋ + l + 1 = (l + 1)·(⌊m/n⌋ + 1). So

k′ − ⌊k′/(l + 1)⌋ = (l + 1)·(⌊m/n⌋ + 1) − (⌊m/n⌋ + 1) = l·(⌊m/n⌋ + 1) = l·⌊m/n⌋ + l.   (13)

Since m mod n ≥ l + 1 > l, we have min(m mod n, l) = l, and (10) becomes R(m, n) ⊢ R(l·⌊m/n⌋ + l, l), which we can obtain from (11) by substituting (13) for k′ − ⌊k′/(l + 1)⌋.

(iii) ⇒ (i). By Theorem 13.

Theorem 15 (Completeness). R(m, n) ⊢ R(k, l) if and only if R(m, n) |= R(k, l).

Proof. Immediate from Lemma 14 and its symmetric variant for m ≤ n.

Figure 2 shows a Hasse diagram for the partial order representing the entailment relation between binary cumulative sentences R(m, n) with m, n ≤ 8.
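Condition (ii) of Lemma 14 is exactly the efficient decision procedure whose existence was not obvious from decidability alone. A Python sketch of ours (illustrative; the case m < n is handled through the symmetric variant by swapping both coordinate pairs):

```python
def entails(m, n, k, l):
    """Decide R(m, n) |= R(k, l) via Lemma 14(ii); for m < n apply the
    symmetric variant by swapping coordinates in both sentences."""
    if m < n:
        return entails(n, m, l, k)
    return (k <= m and l <= n
            and l <= k <= l * (m // n) + min(m % n, l))

assert entails(3, 5, 3, 4)          # three boys kissed five girls |= ... four girls
assert not entails(3, 5, 2, 5)      # but not: two boys kissed five girls
```

By Theorem 15 this procedure agrees with the derivability closure of (R2–1) and (R2–2), so either formulation can be used to reproduce the Hasse diagram of Figure 2.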
4 Ternary Cumulative Sentences

The entailment relation between ternary cumulative sentences seems to be much more complicated than the binary case, and we only have some partial results so far.

Lemma 16. Suppose m ≥ n. Then {R(m, n, p)} ∪ { ¬R(m − i, n − j, p) | 0 ≤ i < j < n } is satisfiable.
Figure 2: A Hasse diagram of entailment between binary cumulative sentences (from Shimada, to appear).

Proof. This set is satisfied by any ternary relation R such that R(m, n, p) and π_{1,2}(R) is a function.

Lemma 17. Suppose m ≤ n + p − 2. Then {R(m, n, p)} ∪ { ¬R(m′, n, p) | m′ < m } is satisfiable.

Proof. Let A = {a_1, …, a_m}, B = {b_1, …, b_n}, C = {c_1, …, c_p}, and

R = { (a_{min(j,m)}, b_j, c_p) | 1 ≤ j ≤ n − 1 } ∪ { (a_{min(n−1+k,m)}, b_n, c_k) | 1 ≤ k ≤ p − 1 }.

Then π_1(R) = A, π_2(R) = B, π_3(R) = C, so R(m, n, p). Note that for all x ∈ A, either (d(π_{2,1}(R)))⁻¹(x) ≠ ∅ or (d(π_{3,1}(R)))⁻¹(x) ≠ ∅. Suppose m′ < m and R(m′, n, p), so that π_1(R′) ⊂ A, π_2(R′) = B, π_3(R′) = C for some R′ ⊆ R. Let a ∈ A − π_1(R′). Then σ_{$1≠a}(R) ⊇ R′, so π_2(σ_{$1≠a}(R)) = B and π_3(σ_{$1≠a}(R)) = C. Lemma 7 then implies (d(π_{2,1}(R)))⁻¹(a) = (d(π_{3,1}(R)))⁻¹(a) = ∅, a contradiction.

Proposition 18. The following inference rule is valid:

R(m, n, p)
──────────────  (R3–1), provided m > n, m > p, and m > n + p − 2
R(m − 1, n, p)
Proof. Assume m > n, m > p, m > n + p − 2, and let R be a ternary relation such that A = π_1(R), B = π_2(R), C = π_3(R) and |A| = m, |B| = n, |C| = p. We show that there exists an R′ ⊆ R such that |π_1(R′)| = m − 1, |π_2(R′)| = n, |π_3(R′)| = p. Let S = d(π_{2,1}(R)) and T = d(π_{3,1}(R)). Since m > n and m > p, Lemma 2 implies |ran(S)| ≤ n − 1 and |ran(T)| ≤ p − 1. Since m > n + p − 2, it follows that there is an a ∈ A − (ran(S) ∪ ran(T)). Let R′ = σ_{$1≠a}(R). Since S⁻¹(a) = T⁻¹(a) = ∅, we get π_1(R′) = A − {a}, π_2(R′) = B, and π_3(R′) = C by Lemma 7.

Lemma 19. Suppose m − 1 ≥ 2(n − p + 1) and p ≥ 2. Then {R(m, n, p)} ∪ { ¬R(m − 1, n′, p) | n′ < n } is satisfiable.

Proof. If n ≤ p, it is easy to see that the conclusion of the lemma holds, so let us suppose n > p. Let A = {a_0, …, a_{m−1}}, B = {b_0, …, b_{n−1}}, C = {c_0, …, c_{p−1}}, and define

R = { (a_i, b_j, c_k) | (i ≤ m − 2 ∧ j = i mod (n − p + 1) ∧ k = 0) ∨ (i = m − 1 ∧ j = n − p + k ∧ k ≥ 1) }.

It is clear that A = π_1(R), B = π_2(R), C = π_3(R), so R(m, n, p) holds. For each j ≤ n − p, {a_j, a_{n−p+1+j}} ⊆ (d(π_{1,2}(R)))⁻¹(b_j), and for each j ≥ n − p + 1, c_{j−(n−p)} ∈ (d(π_{3,2}(R)))⁻¹(b_j). This property ensures that R(m − 1, n′, p) does not hold for any n′ < n.

Proposition 20. Suppose n − 1 ≥ 2(m − p + 1), m ≥ n, and p ≥ 2. Then R(m, n, p) |= R(m′, n′, p) implies m′ = m and n′ = n.

Proof. Suppose R(m, n, p) |= R(m′, n′, p). By Lemma 16, m − m′ ≥ n − n′. In particular, m′ = m implies n′ = n. Assume m′ < m. Since

m ≤ (n − 1)/2 + p − 1 ≤ n + p − 2,

Lemma 17 applies, so n′ < n.
121
NLCS’14 / NLSR 2014
Joint Proceedings NLCS’14 & NLSR 2014, July 17-18, 2014, Vienna, Austria
Let R be a ternary relation such that A = π_1(R), B = π_2(R), C = π_3(R), and |A| = m, |B| = n, |C| = p. By Lemmas 5 and 6, there exist R′ ⊆ R and A′, B′ such that π_1(R′) = A′, π_2(R′) = B′, π_3(R′) = C, and |A′| = m′, |B′| = n′. Let b_1, …, b_{n−n′} be the elements of B − B′, and for each i = 1, …, n − n′, pick a_i and c_i such that (a_i, b_i, c_i) ∈ R. Let

R″ = R′ ∪ {(a_1, b_1, c_1), …, (a_{n−n′−1}, b_{n−n′−1}, c_{n−n′−1})}.

Then R″ ⊆ R, and it is easy to see

|π_1(R″)| ≤ m′ + n − n′ − 1,  |π_2(R″)| = n − 1,  |π_3(R″)| = p.

Since m − m′ ≥ n − n′, we have

m′ + n − n′ − 1 ≤ m′ + m − m′ − 1 = m − 1.

This shows that R(m″, n − 1, p) holds for some m″ < m whenever R(m, n, p) holds. On the other hand, since n − 1 ≥ 2(m − p + 1) and p ≥ 2, a variant of Lemma 19 says {R(m, n, p)} ∪ { ¬R(m′, n − 1, p) | m′ < m } is satisfiable, a contradiction.

For example, (m, n, p) = (15, 13, 10) satisfies the conditions of Proposition 20, so R(15, 13, 10) |= R(m′, n′, 10) only if m′ = 15 and n′ = 13.

Proposition 21. The following inference rule is valid:

R(m, n, p)
──────────────────  (R3–2), provided 2(n − p + 1) ≥ m > p, 2(m − p + 1) ≥ n > p, and p ≥ 2
R(m − 1, n − 1, p)
Proof. Suppose 2(n − p + 1) ≥ m > p, 2(m − p + 1) ≥ n > p, and p ≥ 2. Let R be a ternary relation such that A = π_1(R), B = π_2(R), C = π_3(R), and |A| = m, |B| = n, |C| = p. We show that there is an R′ ⊆ R such that |π_1(R′)| = m − 1, |π_2(R′)| = n − 1, and |π_3(R′)| = p. Since n > p, Lemma 2 implies |ran(d(π_{3,2}(R)))| ≤ p − 1. So |B − ran(d(π_{3,2}(R)))| ≥ n − p + 1. Let B′ be a subset of B − ran(d(π_{3,2}(R))) with |B′| = n − p + 1. Since p ≥ 2, B′ ⊂ B. By Lemma 8, there is a b ∈ B′ such that |(d(π_{1,2}(R)))⁻¹(b)| ≤ ⌊(m − 1)/(n − p + 1)⌋. Since 2(n − p + 1) ≥ m, we have ⌊(m − 1)/(n − p + 1)⌋ ≤ 1, so (d(π_{1,2}(R)))⁻¹(b) is either empty or a singleton. Let R_1 = σ_{$2≠b}(R).
Since b ∉ ran(d(π_{3,2}(R))), we have (d(π_{3,2}(R)))⁻¹(b) = ∅. By Lemma 7, π_2(R_1) = B − {b} and π_3(R_1) = C.

Case 1. (d(π_{1,2}(R)))⁻¹(b) = {a} for some a ∈ A. By Lemma 7, we have π_1(R_1) = A − {a}, and we are done.

Case 2. (d(π_{1,2}(R)))⁻¹(b) = ∅. By Lemma 7, we have π_1(R_1) = A. By Lemma 2, |ran(d(π_{3,1}(R_1)))| ≤ p − 1, so |A − ran(d(π_{3,1}(R_1)))| ≥ m − p + 1. Let A′ be a subset of A − ran(d(π_{3,1}(R_1))) with |A′| = m − p + 1. By Lemma 8, we can find an a′ ∈ A′ such that |(d(π_{2,1}(R)))⁻¹(a′)| ≤ ⌊(n − 1)/(m − p + 1)⌋ ≤ 1. So (d(π_{2,1}(R)))⁻¹(a′) is either empty or a singleton. Since a′ ∉ ran(d(π_{3,1}(R_1))), we have (d(π_{3,1}(R_1)))⁻¹(a′) = ∅.

Case 2.1. (d(π_{2,1}(R)))⁻¹(a′) = {b′} for some b′ ∈ B. Let R_2 = σ_{$1≠a′}(R). We know that (d(π_{3,1}(R_1)))⁻¹(a′) = ∅. But since dom(π_{3,1}(R_1)) = π_3(R_1) = C = π_3(R) = dom(π_{3,1}(R)), Lemma 4 implies d(π_{3,1}(R_1)) ⊇ d(π_{3,1}(R)). It follows that (d(π_{3,1}(R)))⁻¹(a′) = ∅ as well. By Lemma 7, we have π_1(R_2) = A − {a′}, π_2(R_2) = B − {b′}, and π_3(R_2) = C.

Case 2.2. (d(π_{2,1}(R)))⁻¹(a′) = ∅. Since it is easy to see π_{2,1}(R_1) = π_{2,1}(σ_{$2≠b}(R)) = σ_{$1≠b}(π_{2,1}(R)), Lemma 3 implies d(π_{2,1}(R_1)) = σ_{$1≠b}(d(π_{2,1}(R))). This means that (d(π_{2,1}(R_1)))⁻¹(x) = (d(π_{2,1}(R)))⁻¹(x) − {b} for every x ∈ A. In particular, (d(π_{2,1}(R_1)))⁻¹(a′) = ∅ − {b} = ∅. Let R_3 = σ_{$1≠a′}(R_1). Since we also know that (d(π_{3,1}(R_1)))⁻¹(a′) = ∅, Lemma 7 gives π_1(R_3) = π_1(R_1) − {a′} = A − {a′}, π_2(R_3) = π_2(R_1) = B − {b}, and π_3(R_3) = π_3(R_1) = C.

Proposition 22. The following inference rule is valid:

R(m, n, p)
──────────────────────────────────────  (R3–3), provided m − ⌊(m − 1)/(n − p + 1)⌋ ≥ n + p − 3 and n > p ≥ 2
R(m − ⌊(m − 1)/(n − p + 1)⌋, n − 1, p)
Proof. Suppose that m − ⌊(m − 1)/(n − p + 1)⌋ ≥ n + p − 3 and n > p ≥ 2. Let R be a ternary relation such that A = π_1(R), B = π_2(R), C = π_3(R), and |A| = m, |B| = n, |C| = p. We show that R(m − ⌊(m − 1)/(n − p + 1)⌋, n − 1, p) holds. Let S = d(π_{3,2}(R)) and T = d(π_{1,2}(R)). Since n > p, |B − ran(S)| ≥ n − p + 1 by Lemma 2. Since p ≥ 2, n − p + 1 < n. By Lemma 8, there is a b ∈ B − ran(S) such that |T⁻¹(b)| ≤ ⌊(m − 1)/(n − p + 1)⌋. Since b ∉ ran(S), S⁻¹(b) = ∅. Let R′ = σ_{$2≠b}(R). By Lemma 7, π_1(R′) = A − T⁻¹(b), π_2(R′) = B − {b}, π_3(R′) = C, so R(m′, n − 1, p) for some m′ ≥ m − ⌊(m − 1)/(n − p + 1)⌋. Since m − ⌊(m − 1)/(n − p + 1)⌋ ≥ n + p − 3 ≥ (n − 1) + p − 2, Proposition 18 implies R(m − ⌊(m − 1)/(n − p + 1)⌋, n − 1, p).
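The ternary rules can be sanity-checked computationally. The sketch below is ours (illustrative, not from the paper); it verifies the conclusion of (R3–1) by exhaustive subset search, both on a hand-built relation and on random relations whose projection sizes happen to satisfy the side conditions:

```python
import random
from itertools import combinations

def sizes(R):
    """The triple of projection sizes (|π1(R)|, |π2(R)|, |π3(R)|)."""
    return tuple(len({t[i] for t in R}) for i in range(3))

def shrinkable(R, target):
    """Is there a non-empty R' ⊆ R whose projection sizes equal target?"""
    Rl = list(R)
    return any(sizes(set(sub)) == target
               for r in range(1, len(Rl) + 1)
               for sub in combinations(Rl, r))

# a hand-built relation meeting the side conditions of (R3-1): m=5, n=3, p=3
R0 = {(i, 0, 0) for i in range(5)} | {(0, 1, 0), (0, 2, 0), (0, 0, 1), (0, 0, 2)}
assert sizes(R0) == (5, 3, 3)
assert shrinkable(R0, (4, 3, 3))      # conclusion of (R3-1)

# random sweep: whenever the side conditions hold, the conclusion must too
random.seed(0)
for _ in range(200):
    R = {(random.randrange(8), random.randrange(3), random.randrange(3))
         for _ in range(8)}
    m, n, p = sizes(R)
    if m > n and m > p and m > n + p - 2:
        assert shrinkable(R, (m - 1, n, p))
```

The random sweep only asserts the conclusion when the side conditions fire, so it can never contradict the proved proposition; it is a smoke test, not a proof.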
References

Neil Immerman. Descriptive Complexity. Springer, Berlin, 1999.

Manfred Krifka. At least some determiners aren’t determiners. In Ken Turner, editor, The Semantics/Pragmatics Interface from Different Points of View, pages 257–291. Elsevier, Amsterdam, 1999.

Remko J. H. Scha. Distributive, collective and cumulative quantification. In Jeroen Groenendijk, Theo Janssen, and Martin Stokhof, editors, Truth, Interpretation and Information, pages 131–158. Foris Publications, Dordrecht, 1984.

Junri Shimada. Entailments and implicatures of cumulative sentences. In MIT Working Papers in Linguistics, to appear.

Jeffrey D. Ullman. Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press, Rockville, MD, 1988.
Modelling implicit dynamic introduction of function symbols in mathematical texts

Marcos Cramer
University of Luxembourg
[email protected]

Abstract

The specialized language of mathematics has a number of linguistically and logically interesting features. One of them, which to our knowledge has not been systematically studied before, is the implicit dynamic introduction of function symbols, exemplified by constructs of the form “for every x there is an f(x) such that . . . ”. We present an extension of Groenendijk and Stokhof’s Dynamic Predicate Logic – Typed Higher-Order Dynamic Predicate Logic – which formally models this feature of the language of mathematics. Furthermore, we illustrate how the implicit dynamic introduction of function symbols is treated in the proof checking algorithm of the Naproche system.
Contents

1 Introduction . . . 1
2 Dynamic quantification and Dynamic Predicate Logic . . . 2
   2.1 DPL semantics . . . 3
3 Implicit dynamic function introduction . . . 4
4 Typed Higher-Order Dynamic Predicate Logic . . . 5
   4.1 THODPL semantics . . . 6
5 Implicit dynamic function introduction and proof-checking . . . 8
6 Conclusion . . . 10

1 Introduction
Like other sciences, mathematics has developed its own specialized language, which we call the language of mathematics. This specialized language has a number of linguistically and logically interesting features: For example, on the syntactic level, it can incorporate complex symbolic material into natural language sentences. On the semantic level, it refers to rigorously defined abstract objects, and is in general less open to ambiguity than most other text types. On the pragmatic level, it reverses the
expectation on assertions, which have to be implied by the context rather than adding new information to it. The work presented in this paper has been conducted in the context of the Naproche project, an interdisciplinary project at the universities of Bonn and Duisburg-Essen which analyses the language of mathematics with methods from mathematical logic and computational and formal linguistics (see section 1.4 of [5]). The main aim of the Naproche project has been to develop a controlled natural language (CNL) – i.e. a subset of a natural language defined through a formal grammar – for writing mathematical texts, and a computer program, the Naproche system, that can check the correctness of mathematical proofs written in this CNL. For the development of this CNL and this system, we had to develop new linguistic and logical machinery, which is thoroughly discussed in the author’s PhD thesis [5]. In this paper we focus on one particular phenomenon of the language of mathematics, the implicit dynamic introduction of function symbols, and show how it can be modelled in a formal system. To our knowledge, this phenomenon has not been previously described in the literature or formally modelled in a formalism.1 Since this phenomenon is a special case of the phenomenon of dynamic quantification, we first briefly discuss this well-known phenomenon and one of the standard solutions to it, Groenendijk and Stokhof’s Dynamic Predicate Logic.
2 Dynamic quantification and Dynamic Predicate Logic
When translating natural language sentences to standard first-order formulae, there is a problem in the treatment of natural language quantifiers. For example, the indefinite article a is normally translated by an existential quantifier, but the natural translation (2) of (1) has universal quantifiers corresponding to the two occurrences of the indefinite article:

(1) If a farmer owns a donkey, he beats it.²
(2) ∀x∀y (farmer(x) ∧ donkey(y) ∧ owns(x, y) → beats(x, y))

Additionally, these universal quantifiers have wider scope than the implication sign, even though in (1) the indefinite articles are in the antecedent of the conditional. Because of this difference between the functioning of quantifiers in natural language and in standard first-order logic, formal linguists say that natural language has dynamic quantifiers, whereas standard first-order logic has static quantifiers.

¹ The only exception being the author’s paper [4], which sketched part of the material presented in this paper.
² This example sentence is one of a number of standard examples from the linguistic literature about dynamic quantification, which are usually called donkey sentences. Donkey sentences were originally introduced by Peter Geach [6].

Jeroen Groenendijk and Martin Stokhof have developed a variant of first-order logic called
Dynamic Predicate Logic (DPL) [7] which has dynamic instead of static quantifiers. In DPL, (1) can be translated as (3):

(3) ∃x (farmer(x) ∧ ∃y (donkey(y) ∧ owns(x, y))) → beats(x, y)

The syntax of DPL is identical to that of standard first-order predicate logic (PL), but its semantics is defined in such a way that (3) is equivalent to (2) in DPL. The natural language quantification used in mathematical texts also exhibits these dynamic features, as can be seen in the following quotation from [8, p. 36]:

If a space X retracts onto a subspace A, then the homomorphism i∗ : π₁(A, x₀) → π₁(X, x₀) induced by the inclusion i : A ↪ X is injective.
2.1 DPL semantics

We present DPL semantics in a way slightly different from but logically equivalent to its definition in [7]. Structures and assignments are defined as for standard first-order logic: a structure S specifies a domain |S| and an interpretation a^S for every constant, function or relation symbol a in the language. An S-assignment is a function from variables to |S|. G_S is the set of S-assignments. Given two assignments g, h, we define g[x]h to mean that g differs from h at most in what it assigns to the variable x. Given a DPL term t, we recursively define

[t]^g_S := g(t) if t is a variable,
[t]^g_S := t^S if t is a constant symbol,
[t]^g_S := f^S([t_1]^g_S, …, [t_n]^g_S) if t is of the form f(t_1, …, t_n).

In [7], DPL semantics is defined via an interpretation function ⟦•⟧_S from DPL formulae to subsets of G_S × G_S. We instead recursively define for every g ∈ G_S an interpretation function ⟦•⟧^g_S from DPL formulae to subsets of G_S:³
2. Jt1 = t2 KgS := {h|h = g and [t1 ]gS = [t2 ]gS }4
3. JR(t1 , . . . , t2 )KgS := {h|h = g and ([t1 ]gS , . . . , [t2 ]gS ) ∈ RS } 4. J¬ϕKgS := {h|h = g and there is no k ∈ JϕKhS }
5. Jϕ ∧ ψKgS := {h|there is a k s.t. k ∈ JϕKgS and h ∈ JψKkS }
6. Jϕ → ψKgS := {h|h = g and for all k s.t. k ∈ JϕKhS , there is a j s.t. j ∈ JψKkS }
7. J∃x ϕKgS := {h|there is a k s.t. k[x]g and h ∈ JϕKkS }
ϕ∨ψ and ∀x ϕ are defined to be a shorthand for ¬(¬ϕ∧¬ψ) and ∃x > → ϕ respectively. 3
This can be viewed as a different currying of the uncurried version of the interpretation function in [7]. 4 The condition h = g in cases 2, 3, 4 and 6 implies that the defined set is either ∅ or {g}.
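Clauses 1–7 can be executed directly on finite models. The following Python sketch uses a hypothetical encoding of DPL formulae as nested tuples (the helper names are ours, not Groenendijk and Stokhof’s); it implements the interpretation function and checks that the donkey sentence (3) agrees with its static paraphrase (2) on a small model, for every choice of the beats relation:

```python
from itertools import chain, combinations

def interp(phi, g, M):
    """The set of output assignments in [[phi]]^g_S (clauses 1-7).
    Assignments are frozensets of (variable, value) pairs, so sets of
    assignments are hashable; M = (domain, relations)."""
    dom, rel = M
    op = phi[0]
    if op == 'top':                                   # clause 1
        return {g}
    if op == 'pred':                                  # clause 3
        _, name, args = phi
        env = dict(g)
        return {g} if tuple(env[v] for v in args) in rel[name] else set()
    if op == 'not':                                   # clause 4
        return {g} if not interp(phi[1], g, M) else set()
    if op == 'and':                                   # clause 5
        return {h for k in interp(phi[1], g, M) for h in interp(phi[2], k, M)}
    if op == 'imp':                                   # clause 6
        ok = all(interp(phi[2], k, M) for k in interp(phi[1], g, M))
        return {g} if ok else set()
    if op == 'exists':                                # clause 7
        _, x, body = phi
        out = set()
        for d in dom:
            k = dict(g); k[x] = d
            out |= interp(body, frozenset(k.items()), M)
        return out
    raise ValueError(op)

def true_in(phi, M):
    """A closed formula is true iff its output set on the empty assignment is non-empty."""
    return bool(interp(phi, frozenset(), M))

P = lambda name, *args: ('pred', name, args)

# donkey sentence (3) and its static paraphrase (2)
antecedent = ('exists', 'x', ('and', P('farmer', 'x'),
              ('exists', 'y', ('and', P('donkey', 'y'), P('owns', 'x', 'y')))))
dpl_3 = ('imp', antecedent, P('beats', 'x', 'y'))

def forall(x, phi):                                   # ∀x φ := ∃x ⊤ → φ
    return ('imp', ('exists', x, ('top',)), phi)

static_2 = forall('x', forall('y',
    ('imp', ('and', ('and', P('farmer', 'x'), P('donkey', 'y')), P('owns', 'x', 'y')),
            P('beats', 'x', 'y'))))

# check (3) <=> (2) on a small model, over every choice of the beats relation
dom = ['a', 'b', 'd']
pairs = [(x, y) for x in dom for y in dom]
for sub in chain.from_iterable(combinations(pairs, r) for r in range(len(pairs) + 1)):
    rel = {'farmer': {('a',), ('b',)}, 'donkey': {('d',)},
           'owns': {('a', 'd')}, 'beats': set(sub)}
    assert true_in(dpl_3, (dom, rel)) == true_in(static_2, (dom, rel))
```

Note how the consequent of `dpl_3` uses x and y even though the existentials sit in the antecedent: the output assignments of the antecedent carry the bindings into the consequent, which is exactly the dynamic behaviour of clause 6.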
3 Implicit dynamic function introduction
Functions are often dynamically introduced in an implicit way in mathematical texts. For example, [10, p. 1] introduces the additive inverse function on the reals as follows:

(4) For each a there is a real number −a such that a + (−a) = 0.

Here the natural language quantification “there is a real number −a” locally (i.e. inside the scope of “For each a”) introduces a new real number to the discourse. But since the choice of this real number depends on a and we are universally quantifying over a, it globally (i.e. outside the scope of “For each a”) introduces a function “−” to the discourse. The most common form of implicitly introduced functions are functions whose argument is written as a subscript, as in the following example:

(5) Since f is continuous at t, there is an open interval I_t containing t such that |f(x) − f(t)| < 1 if x ∈ I_t ∩ [a, b]. [10, p. 62]

If one wants to later explicitly call the implicitly introduced function a function (or a map), the standard notation with a bracketed argument is preferred:

(6) Hence for each u ∈ R^n there is a number f(u) ∈ C with f(u) ≠ 0 such that (σ(α(u))³, σ(α(u)) Σ(α(u)), T(α(u))) = f(u)(x_1(u), x_2(u), x_3(u)). The function f is locally a quotient of continuous functions, so it is itself continuous. [2, p. 489]

(7) Suppose that, for each vertex v of K, there is a vertex g(v) of L such that f(st_K(v)) ⊂ st_L(g(v)). Then g is a simplicial map V(K) → V(L), and |g| ≃ f. [9, p. 19]

(8) Since the multi-map Φ⁻¹ is surjective, for every x ∈ X there is a point f(x) ∈ Y with x ∈ Φ⁻¹(f(x)), which is equivalent to f(x) ∈ Φ(x). It follows from the bornologity of Φ that the map f : X → Y is bornologous. [1, p. 5]

When no uniqueness claims are made about the object locally introduced to the discourse, implicit function introduction presupposes the existence of a choice function, i.e. presupposes the Axiom of Choice.
We hypothesize that the naturalness of such implicit function introduction in mathematical texts contributes to the wide-spread feeling that the Axiom of Choice must be true. Implicitly introduced functions are generally partial functions, i.e. they have a restricted domain and are not defined on the whole universe of the discourse. For example in (7), g is only defined on vertices of K and not on vertices of L. Implicit function introduction can also be used to introduce multi-argument functions. For example, subtraction on the reals could be introduced by a sentence of the following form:
(9) For all reals a, b, there is a real a − b such that (a − b) + b = a. For the sake of simplicity and brevity, we restrict ourselves to unary functions for the rest of this paper. See section 5.1 of [5] for an account of how to extend the presented formalization of implicit dynamic function introduction to multi-argument and curried functions.
4 Typed Higher-Order Dynamic Predicate Logic
Typed Higher-Order Dynamic Predicate Logic (THODPL) extends DPL to a higher-order system that formalizes the implicit dynamic introduction of function symbols. The type system used in THODPL is Church's Simple Type Theory [3] with two base types i (for objects) and o (for propositions), and a function type T1 → T2 for any two types T1, T2. We assume a countably infinite supply of variables and constants of each type. In the examples below we use x and y as variables of the basic type i, f as a variable of the function type i → i and R as a constant of type i → (i → o).

The distinctive feature of THODPL syntax is that it allows not only variables but any well-formed terms to come after quantifiers. Writing tT for a well-formed term of type T, we can define THODPL formulae as follows:

ϕ ::= to | tT = tT | def(tT) | ⊤ | ∃tT ϕ | ¬ϕ | ϕ ∧ ϕ | ϕ → ϕ

The intended meaning of def(tT) is that tT is a defined term. This is needed because THODPL allows for partial functions and hence for undefined terms. As in DPL, ϕ ∨ ψ and ∀t ϕ are defined to be shorthand for ¬(¬ϕ ∧ ¬ψ) and ∃t ⊤ → ϕ respectively. Instead of R(t1)(t2), we also write R(t1, t2) in uncurried notation.

Since terms can come after quantifiers, (11) is a well-formed THODPL formula and can be considered the THODPL formalization of (10):

(10) For every x, there is an f(x) such that R(x, f(x)).
(11) ∀x ∃f(x) R(x, f(x))

But what is the intended semantics of (11)? The truth conditions of (11) should turn out to be equivalent to those of (12). But given what we have said about implicit dynamic function introduction in the language of mathematics in section 3, (11), unlike (12), dynamically introduces the function symbol f to the context, and should hence be essentially equivalent to (13).

(12) ∀x ∃y R(x, y)
(13) ∃f (∀x R(x, f(x)))

We will come back to this example when clarifying the semantics of THODPL after its formal definition.
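The term and formula syntax just given can be transcribed into a small typed abstract syntax. A sketch in Python; the tuple encoding and the `type_of` checker are our own devices, not part of THODPL:

```python
# Types are 'i', 'o', or ('->', T1, T2); terms are ('var', name, T),
# ('const', name, T), or ('app', t0, t1).

def type_of(t):
    """Compute the type of a term, checking well-formedness of applications."""
    tag = t[0]
    if tag in ('var', 'const'):
        return t[2]
    if tag == 'app':
        ft, at = type_of(t[1]), type_of(t[2])
        assert ft[0] == '->' and ft[1] == at, "ill-typed application"
        return ft[2]
    raise ValueError(tag)

# The formula (11), forall x exists f(x) R(x, f(x)), with forall t phi
# desugared to  exists t (top -> phi)  as in the text:
x = ('var', 'x', 'i')
f = ('var', 'f', ('->', 'i', 'i'))
R = ('const', 'R', ('->', 'i', ('->', 'i', 'o')))
fx = ('app', f, x)                         # the complex term f(x)
Rxfx = ('app', ('app', R, x), fx)          # R(x, f(x)), of type 'o'
formula11 = ('exists', x, ('imp', ('top',), ('exists', fx, Rxfx)))
```

Note that the second quantifier binds the complex term `fx`, which is exactly the feature that distinguishes THODPL from ordinary typed predicate logic.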
4.1 THODPL semantics
Since implicitly introduced functions can be partial, THODPL does not assume function variables to refer to total functions. Partial functions can give rise to undefined terms, so we need to handle these in THODPL semantics. This is done by extending the domain of discourse by an object u used as the value of undefined terms. This motivates the following two definitions:

Definition 4.1. Let D be a non-empty set. Fix some u ∉ D. Then we define DT for every type T inductively as follows:

Di := D
Do := {⊤, ⊥}
DT1→T2 := (DT2 ∪ {u})^DT1, i.e. the set of functions from DT1 to DT2 ∪ {u}

Definition 4.2. A THODPL structure is a pair S = (D, F), where D is a non-empty set called the domain of S and F is a map that assigns to every constant of type T an element of DT.

In order to handle quantifiers followed by complex terms, assignments in THODPL can not only assign values to variables, but also to complex terms:

Definition 4.3. Given a THODPL structure S = (D, F), an S-assignment is a partial function g from THODPL terms to ⋃T DT satisfying the following two properties:

• g is defined on all variables.
• For every type T and every term t of type T, if g(t) is defined, then g(t) ∈ DT.

We denote the set of S-assignments by GS.

Definition 4.4. Given two assignments g and h, we define g[t1, . . . , tn]h to mean that dom(g) = dom(h) ∪ {t1, . . . , tn} and for all s ∈ dom(h) \ {t1, . . . , tn}, g(s) = h(s).

Now we are ready to present the semantics of THODPL in two definitions, one for THODPL terms and one for THODPL formulae:

Definition 4.5. Given a THODPL structure S = (D, F) and an S-assignment g, we recursively define [t]gS for THODPL terms t as follows:

[t]gS :=
  g(t) if t is a variable,
  F(t) if t is a constant symbol,
  [t0]gS([t1]gS) if t is of the form t0(t1) and [ti]gS ≠ u for i ∈ {0, 1},
  u if t is of the form t0(t1) and [ti]gS = u for some i ∈ {0, 1}.
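Definition 4.5 reads off directly as a recursive evaluator. A sketch in Python, with terms encoded as tuples ('var', name, type), ('const', name, type), ('app', t0, t1), partial functions modelled as dicts, and a sentinel object standing in for u; all names and the encoding are ours:

```python
UNDEF = object()  # the extra object u used as the value of undefined terms

def ev(t, F, g):
    """Interpret a term: g maps variables to values, F maps constant
    names to values; application is strict in u, as in Definition 4.5."""
    tag = t[0]
    if tag == 'var':
        return g[t]          # g is total on variables
    if tag == 'const':
        return F[t[1]]
    if tag == 'app':
        v0, v1 = ev(t[1], F, g), ev(t[2], F, g)
        if v0 is UNDEF or v1 is UNDEF:
            return UNDEF
        return v0.get(v1, UNDEF)  # dict lookup: missing key means undefined
    raise ValueError(tag)

# A partial "negation" function defined only on the argument 1:
x = ('var', 'x', 'i')
neg = ('const', 'neg', ('->', 'i', 'i'))
F = {'neg': {1: -1}}
assert ev(('app', neg, x), F, {x: 1}) == -1
assert ev(('app', neg, x), F, {x: 2}) is UNDEF
```

The second assertion is exactly the situation that the def(·) predicate of the formula language is there to detect.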
Definition 4.6. Given a THODPL structure S = (D, F), we recursively define for every g ∈ GS an interpretation function ⟦•⟧gS from THODPL formulae to subsets of GS:
1. ⟦t⟧gS := {g} if [t]gS = ⊤, and ∅ otherwise
2. ⟦t1 = t2⟧gS := {g} if [t1]gS = [t2]gS, and ∅ otherwise
3. ⟦def(t)⟧gS := {g} if [t]gS ≠ u, and ∅ otherwise
4. ⟦⊤⟧gS := {g}
5. ⟦∃t ϕ⟧gS := {h | there is a k such that k[t]g and h ∈ ⟦ϕ⟧kS}
6. ⟦¬ϕ⟧gS := {g} if there is no h such that h ∈ ⟦ϕ⟧gS, and ∅ otherwise
7. ⟦ϕ ∧ ψ⟧gS := {h | there is a k such that k ∈ ⟦ϕ⟧gS and h ∈ ⟦ψ⟧kS}
8. ⟦ϕ → ψ⟧gS := {h | h differs from g in at most some function variables f1, . . . , fn (where this choice of function variables is maximal), and there is a variable x such that for all k ∈ ⟦ϕ⟧gS, there is an assignment j ∈ ⟦ψ⟧kS such that j(fi(x)) = h(fi)(k(x)) for 1 ≤ i ≤ n, and if n > 0 then k[x]g}
In order to make case 8 of the definition more comprehensible, let us consider its role in determining the semantics of (11), i.e. of ∃x ⊤ → ∃f(x) R(x, f(x)): ⟦∃f(x) R(x, f(x))⟧kS is the set of assignments j satisfying R(x, f(x)) (i.e. for which ⟦R(x, f(x))⟧jS is non-empty) such that j[f(x)]k. ⟦∃x ⊤⟧gS is the set of assignments k such that k[x]g. So by case 8 with n = 1,

⟦∃x ⊤ → ∃f(x) R(x, f(x))⟧gS
= {h | h[f]g and there is a variable x such that for all k such that k[x]g, there is an assignment j satisfying R(x, f(x)) such that j[f(x)]k and j(f(x)) = h(f)(k(x)), and k[x]g}
= {h | h[f]g and for all k such that k[x]g, there is an assignment j satisfying R(x, f(x)) such that j[f(x)]k and j(f(x)) = h(f)(k(x))}
= {h | h[f]g and for all k such that k[x]h, k satisfies R(x, f(x))}
= ⟦∃f (∀x R(x, f(x)))⟧gS
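A first-order core of these clauses (⊤, atoms, ¬, ∧, and ∃ restricted to variables; case 8 and function variables are omitted) can be prototyped over a finite domain. A Python sketch; assignments are frozen sets of (variable, value) pairs so they can live in Python sets, and atomic formulae are Python predicates on assignments, an encoding of ours rather than of the paper:

```python
def sem(phi, g, dom):
    """Dynamic semantics: maps an input assignment g (a dict) to the set
    of output assignments, following clauses 4 (top), 1 (atoms), 6 (not),
    7 (and) and 5 (exists, for variables only)."""
    tag = phi[0]
    if tag == 'top':
        return {frozenset(g.items())}
    if tag == 'atom':                     # phi[1] is a predicate on g
        return {frozenset(g.items())} if phi[1](g) else set()
    if tag == 'not':                      # a test: passes g through or fails
        return set() if sem(phi[1], g, dom) else {frozenset(g.items())}
    if tag == 'and':                      # relational composition, clause 7
        return {h for k in sem(phi[1], g, dom)
                  for h in sem(phi[2], dict(k), dom)}
    if tag == 'exists':                   # clause 5: extend g at variable phi[1]
        return {h for v in dom
                  for h in sem(phi[2], {**g, phi[1]: v}, dom)}
    raise ValueError(tag)

# "There is an x such that x is even, and x > 1": the witness x = 2
# survives into the output assignment, ready for anaphoric reference.
phi = ('and', ('exists', 'x', ('atom', lambda g: g['x'] % 2 == 0)),
              ('atom', lambda g: g['x'] > 1))
out = sem(phi, {}, [1, 2, 3])
```

The output is the single assignment {x ↦ 2}, illustrating the DPL-style externally dynamic existential that the paper generalises to complex terms.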
The truth condition of a formula ϕ under (S, g) is determined by ⟦ϕ⟧gS being empty or non-empty (with emptiness corresponding to falsehood). So the claim made above that (11) and (12) have the same truth conditions means that ⟦∃x ⊤ → ∃f(x) R(x, f(x))⟧gS is empty iff ⟦∃x ⊤ → ∃y R(x, y)⟧gS is empty. This can be shown from the definitions as follows:

⟦∃x ⊤ → ∃f(x) R(x, f(x))⟧gS ≠ ∅
iff {h | h[f]g and for all k such that k ∈ ⟦∃x ⊤⟧gS, there is an assignment j ∈ ⟦∃f(x) R(x, f(x))⟧kS such that j(f(x)) = h(f)(k(x))} ≠ ∅
iff {h | h[f]g and for all k such that k[x]g, there is an assignment j such that j[f(x)]k, j ∈ ⟦R(x, f(x))⟧jS and j(f(x)) = h(f)(k(x))} ≠ ∅
iff {h | h[f]g and for all k such that k[x]g, F(R)(k(x), h(f)(k(x))) = ⊤} ≠ ∅ (where we write S = (D, F) again)

iff there is a function f̄ such that for all k such that k[x]g, F(R)(k(x), f̄(k(x))) = ⊤

iff for all k such that k[x]g, there is a ȳ such that F(R)(k(x), ȳ) = ⊤ (the right-to-left implication follows from the Axiom of Choice)

iff ⟦∃x ⊤ → ∃y R(x, y)⟧gS ≠ ∅.
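On a finite domain the step that invokes the Axiom of Choice is constructively harmless, and the whole chain can be checked by brute force: ∀x ∃y R(x, y) holds iff some function f satisfies ∀x R(x, f(x)). A small exhaustive check in Python, over a two-element domain of our choosing:

```python
from itertools import product

def forall_exists(dom, R):
    """Truth of  forall x exists y R(x, y)  on a finite domain."""
    return all(any(R(x, y) for y in dom) for x in dom)

def exists_function(dom, R):
    """Truth of  exists f forall x R(x, f(x)):  enumerate every total
    function f : dom -> dom as a tuple of values."""
    return any(all(R(x, f[i]) for i, x in enumerate(dom))
               for f in product(dom, repeat=len(dom)))

# Exhaustively check the equivalence for all 16 binary relations on {0, 1}:
dom = [0, 1]
for bits in product([False, True], repeat=4):
    R = lambda x, y, b=bits: b[2 * x + y]
    assert forall_exists(dom, R) == exists_function(dom, R)
```

The enumeration of candidate functions is exactly what becomes unavailable on infinite domains, which is where the choice principle has to step in.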
5 Implicit dynamic function introduction and proof-checking
The Naproche system mentioned in the introduction can check mathematical texts written in a controlled natural language. It is interesting to see how implicit dynamic function introduction can be used to introduce functions to the discourse without having to explicitly prove their existence. For this, we first briefly sketch the general working of the proof-checking implemented in the Naproche system.

For checking single proof steps, the Naproche system makes use of state-of-the-art automated theorem provers (ATPs) for standard first-order logic. Given a set of premises⁵ Γ and a conjecture ϕ, an ATP tries either to find a proof that Γ logically implies ϕ, or to build a model for Γ ∪ {¬ϕ}, which shows that Γ does not imply ϕ. A conjecture together with a set of premises handed to an ATP is called a proof obligation. We denote the proof obligation consisting of premises Γ and conjecture ϕ by Γ ⊢? ϕ.
⁵ In the ATP community, the term "axiom" is usually used for what we call "premise" here; the reason for our deviation from the standard terminology is that in the context of our work the word "axiom" means a completely different thing, namely an axiom stated inside a mathematical text that is to be checked by the Naproche system. The premises that we are considering here can originate from axioms, from definitions, from assumptions or from previously proved results.
The proof checking algorithm keeps track of a list of first-order premises considered to be true. This list gets updated continuously during the checking process. Each assertion is checked by an ATP based on the currently active premises.

We illustrate the functioning of the proof checking algorithm on an example. Suppose that the Naproche system has to check the text in (14). (15) is the PTL translation of (14), i.e. the input to the proof checking algorithm.

(14) Assume n is a square number. So there is a k such that n = k². If n is even, then k is even, i.e. 4 divides n.
(15) (square(n) → ∃k n = k² ∧ (even(n) → (even(k) ∧ divides(4, n))))

Suppose further that Γ is the set of premises that is active on encountering this text fragment (i.e. Γ is the set of premises that the proof-checking algorithm has learned from previous text parts). Then the proof checking algorithm will check (15) by sending the three proof obligations (16), (17) and (18) to an ATP:

(16) Γ, square(n) ⊢? ∃k n = k²
(17) Γ, square(n), n = k², even(n) ⊢? even(k)
(18) Γ, square(n), n = k², even(n), even(k) ⊢? divides(4, n)

Let us first explain the functioning of the algorithm in terms of how it works on a Naproche CNL text like (14): Reading in the assumption "Assume n is a square number", the proof checking algorithm adds "square(n)" to the premise list. It then checks the rest of the text, which is completely inside the scope of this assumption, using this extended premise list. (16) is the proof obligation that checks the sentence "So there is a k such that n = k²". Having checked the existence of a k with n = k², n = k² is then added to the premise list for checking the final sentence. Note that we do not add the checked existentially quantified formula ∃k n = k² to the premise list, but its Skolemized form n = k². This ensures that the k in this new premise corefers with the variable k in the final sentence.
(Adding ∃k n = k² to the premise list would be like adding ∃m n = m² for some unused variable m.) When checking the final sentence, the translation "even(n)" of the antecedent "n is even" of the implication is added to the premise list, and the two parts of the consequence are checked one after the other.

The Naproche system does not check the Naproche CNL input directly, but first translates it into a semantic representation format which is an extension of THODPL. The proof checking algorithm is then defined on input in this extension of THODPL (see chapter 6 of [5]). In order to show how the proof checking works in the context of implicitly introduced function symbols, we reconsider the example sentence (10) and the three example formulae whose relationship we described in section 4:
(10) For every x, there is an f(x) such that R(x, f(x)).
(11) ∀x ∃f(x) R(x, f(x))
(12) ∀x ∃y R(x, y)
(13) ∃f (∀x R(x, f(x)))

For checking (10), i.e. formally for checking (11), the conjecture ∃y R(x, y) will be checked under the active premise list (where y is some fresh variable). The premise list is then extended by ∀x R(x, f(x)); here the y that has just been proved to exist is replaced by its Skolemization f(x), which is actually the form it had in (10) and (11). So when checking (11), it behaves like (12), whereas the premise that is added after having checked it is the same as the premise added after having checked (13). This corresponds to the semantic relationship between these formulae discussed in section 4. Note that in this way, functions can be introduced to the discourse without having to check a proof obligation whose conjecture explicitly asserts the existence of a function.
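The premise-list bookkeeping sketched in this section (one obligation per assertion, assumptions scoping over the rest of the text, Skolemized forms of checked existentials added back as premises) can be prototyped without any real ATP. In the Python sketch below, `prove` is a stub standing in for the ATP call, and the step and formula encodings are ours, not Naproche's:

```python
def check(steps, premises, prove):
    """Walk a proof text.  Each step is ('assume', p), ('assert', p) or
    ('assert_exists', existential, skolemized).  Returns the list of
    proof obligations (premises, conjecture) that were discharged."""
    obligations = []
    for step in steps:
        kind = step[0]
        if kind == 'assume':
            premises = premises + [step[1]]
        elif kind == 'assert':
            obligations.append((tuple(premises), step[1]))
            assert prove(premises, step[1])
            premises = premises + [step[1]]
        elif kind == 'assert_exists':
            obligations.append((tuple(premises), step[1]))
            assert prove(premises, step[1])
            # Add the Skolemized form, not the existential itself, so the
            # witness variable can corefer with later occurrences.
            premises = premises + [step[2]]
    return obligations

# Text (14), with formulas as opaque strings and a trusting stub prover:
steps = [('assume', 'square(n)'),
         ('assert_exists', 'exists k. n = k^2', 'n = k^2'),
         ('assume', 'even(n)'),
         ('assert', 'even(k)'),
         ('assert', 'divides(4, n)')]
obs = check(steps, [], lambda prem, conj: True)
```

The three elements of `obs` reproduce the shapes of obligations (16), (17) and (18): in particular, the second obligation carries the Skolemized premise n = k² rather than the existential formula.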
6 Conclusion
The phenomenon of implicit dynamic introduction of function symbols discussed in this paper is interesting both from a theoretical and from a practical perspective. From a theoretical perspective, it is an interesting case of dynamic quantification which needs to be taken into consideration if one wants to give a full account of the nature of quantification in natural language (at least if one accepts specialized languages like the language of mathematics as instances of natural language). From a practical point of view, developers of a formal mathematics computer system can make their system more easily usable by allowing for implicit dynamic introduction of function symbols in the input language and treating it as described in section 5 above. In this way, users can introduce new function symbols to the discourse in a way that is common practice among mathematicians, without having to explicitly prove the existence of the function. The Naproche system is a formal mathematics system in which the implicit dynamic introduction of function symbols has already been implemented.

To our knowledge, the phenomenon of implicit dynamic introduction of function symbols has not been previously described in the linguistic or logical literature. In this paper, we have presented a formalization of this phenomenon in a higher-order extension of Dynamic Predicate Logic, and have illustrated its functioning in the proof checking algorithm of the Naproche system.
References

[1] T. Banakh and I. Zarichnyy. The coarse classification of homogeneous ultra-metric spaces. Preprint, 2008. arXiv:0801.2132.
[2] M. Bonk. On the second part of Hilbert's fifth problem. Mathematische Zeitschrift, 210(1), 1992.
[3] A. Church. A formulation of the simple theory of types. Journal of Symbolic Logic, 5:56–68, 1940.
[4] M. Cramer. Implicit dynamic function introduction and its connections to the foundations of mathematics. In O. Prosorov, editor, Proceedings of the International Interdisciplinary Conference on Philosophy, Mathematics, Linguistics: Aspects of Interaction (PhML 2012), pages 35–42, 2012.
[5] M. Cramer. Proof-checking mathematical texts in controlled natural language. PhD thesis, University of Bonn, 2013.
[6] P. T. Geach. Reference and Generality. An Examination of Some Medieval and Modern Theories. Cornell University Press, Ithaca, NY, 1962.
[7] J. Groenendijk and M. Stokhof. Dynamic Predicate Logic. Linguistics and Philosophy, 14(1):39–100, 1991.
[8] A. Hatcher. Algebraic Topology. Cambridge University Press, Cambridge, 2002.
[9] M. Lackenby. Topology and Groups, 2008.
[10] W. Trench. Introduction to Real Analysis. Prentice Hall, Upper Saddle River, NJ, 2003.
On Translating Context-Free Grammars into Lambek Categorial Grammars

Stepan Kuznetsov
Steklov Mathematical Institute, Russian Academy of Sciences
[email protected]

Abstract

We modify Buszkowski's translation of context-free grammars in Chomsky normal form to Lambek categorial grammars in order to deal with Montague-style semantics properly.
1 Types, Terms, and Lambek Categorial Grammars
We introduce the notion of semantic type in the following way. Let B be a (countable) set of basic types. Semantic types are built from basic types using one operation "→". We denote the set of all semantic types by T. Types are populated with terms (also called λ-terms): for every semantic type there is a countable number of variables of that type; we can also allow constants, which behave just like variables, except that for them substitution and λ-abstraction are prohibited. If f is of type A → B and u is of type A, then (f u) is of type B; if v is of type B and x is a variable of type A, then λx.v is of type A → B. If term u is of type A, then we write "u : A". By u[x := v] we denote the result of substituting term v for all free (not under λ) occurrences of variable x. The set of all terms will be denoted by Tmλ. Terms are considered modulo α-, β- and η-equivalence: one can freely rename bound variables and perform reductions on subterms: (λx.u)v ⇝β u[x := v] (if free variables of v are not bound in u) and (λx.(ux)) ⇝η u, if x is not a free variable of u.

On the other hand, we also have syntactic types. They are built from a set of primitive types Pr using two binary connectives \ and / (left and right division). The set of all syntactic types is denoted by Tp (more precisely, Tp(\, /)). We introduce a translation σ : Tp → T of syntactic types to semantic ones. On Pr this function is defined arbitrarily (the image of a primitive syntactic type need not be a basic type), and then σ is uniquely propagated to Tp: σ(A \ B) = σ(B / A) = σ(A) → σ(B). We denote both syntactic and semantic types by capital Latin letters (A, B, C, . . . ). If A is a syntactic type and u is a term, we can also write "u : A", meaning u : σ(A).

Now we introduce the product-free Lambek calculus. This calculus originates from [8], but here we enrich it with terms attached to types, in a Curry–Howard style (due to van Benthem [1]; see, for example, [3]). Capital Greek letters denote
sequences of syntactic types. If Π = A1 . . . An and ~x = (x1, . . . , xn) is a sequence of variables, then by ~x : Π we denote the sequence x1 : A1, . . . , xn : An. Expressions of the form ~x : Π → u : A are called sequents. Axioms are sequents of the form x : A → x : A. The rules are as follows:

From x : A, ~y : Π → u : B, infer ~y : Π → (λx.u) : (A \ B).

From ~y : Π, x : A → u : B, infer ~y : Π → (λx.u) : (B / A), where Π is non-empty.

From ~x : Π → u : A and ~y : Γ, t : B, ~z : ∆ → v : C, infer ~y : Γ, ~x : Π, f : (A \ B), ~z : ∆ → v[t := (f u)] : C.

From ~x : Π → u : A and ~y : Γ, t : B, ~z : ∆ → v : C, infer ~y : Γ, f : (B / A), ~x : Π, ~z : ∆ → v[t := (f u)] : C.
Let us call this calculus Lλ. Note that if Lλ ⊢ ~x : Π → u : A, then all free variables of u are contained in ~x, and indeed u is of type A. Finally, the cut rule is admissible and acts as the substitution operation on terms:

From ~x : Π → u : A and ~y : Γ, t : A, ~z : ∆ → v : C, infer ~y : Γ, ~x : Π, ~z : ∆ → v[t := u] : C.

Now we define the notion of an Lλ-grammar (semantically enriched Lambek categorial grammar): it is a tuple G = ⟨Σ, D, H⟩. Here Σ is a finite alphabet, H ∈ Tp is the final type, and D ⊂ Σ × Tp × Tmλ is the categorial dictionary. The set D is finite, and if ⟨a, A, u⟩ ∈ D, then u : A. Implicitly, the mapping σ : Tp → T is also part of G. A string w = a1 . . . an is accepted by this grammar iff there exist syntactic types A1, . . . , An and terms u1, . . . , un such that ⟨ai, Ai, ui⟩ ∈ D (i = 1, . . . , n), and Lλ ⊢ x1 : A1, . . . , xn : An → v : H (for some term v). The term v[x1 := u1, . . . , xn := un] is called the semantic value of w w.r.t. G. Note that a grammar can associate several semantic values to the same w via various choices of Ai or various derivations of the sequent.
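The propagation of σ from primitive types to all syntactic types is a two-line recursion. A sketch in Python; the tuple encoding of types and the example base assignment are our own, not from the paper:

```python
# Syntactic types: a primitive type is a string; ('\\', A, B) encodes A \ B
# and ('/', B, A) encodes B / A.  Semantic types: a basic type is a string
# and ('->', T1, T2) is a function type.

def sigma(syn, base):
    """Propagate sigma from primitive types (the dict base) to all
    syntactic types: sigma(A \\ B) = sigma(B / A) = sigma(A) -> sigma(B)."""
    if isinstance(syn, str):
        return base[syn]
    op, x, y = syn
    if op == '\\':               # syn is  x \ y  with x = A, y = B
        return ('->', sigma(x, base), sigma(y, base))
    if op == '/':                # syn is  x / y  with x = B, y = A
        return ('->', sigma(y, base), sigma(x, base))
    raise ValueError(op)

# Example: np \ s and s / np both translate to e -> t under the
# Montague-style choice sigma(np) = e, sigma(s) = t:
base = {'np': 'e', 's': 't'}
assert sigma(('\\', 'np', 's'), base) == ('->', 'e', 't')
assert sigma(('/', 's', 'np'), base) == ('->', 'e', 't')
```

The two assertions illustrate the defining equation σ(A \ B) = σ(B / A): directionality matters syntactically but is erased semantically.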
2 Context-Free Grammars with Bidirectional Labelling
Here we consider only context-free grammars in Chomsky normal form [4]. Such a grammar is a tuple G = ⟨N, Σ, P, s⟩, where N and Σ are disjoint alphabets (elements of N are called non-terminal symbols), s ∈ N is the starting symbol, and P is the set of rules; each rule is either of the form p ⇒ qr, where p, q, r ∈ N, or of the form p ⇒ a, where p ∈ N, a ∈ Σ. A rule p ⇒ α can be applied to a word ηpθ ∈ (N ∪ Σ)+, producing ηαθ. If w ∈ Σ+ can be obtained from s by several applications of the rules, then w is generated by grammar G.

Now we introduce semantic labels (see, for example, [5]). Firstly, we associate a semantic type with any non-terminal symbol. This is done by a function σ : N → T. For any rule of the form p ⇒ qr we require that either σ(r) = σ(q) → σ(p) or σ(q) =
σ(r) → σ(p). In the first case we call p ⇒ qr a right-to-left rule, in the second case—a left-to-right one. Finally, with any rule of the form p ⇒ a we associate a term u of type σ(p). Let w ∈ Σ+ be generated by G in the following way s ⇒ α1 ⇒ . . . ⇒ αm ⇒ w (each step here is an application of one rule). Going from the end to the beginning of this derivation, we associate a term with each occurrence of each non-terminal symbol. In the case of p ⇒ a we take u; in the case of a right-to-left rule p ⇒ qr, if u is associated with r and v is associated with q, then for p we take (uv). Note that, since u : σ(q) → σ(p), and v : σ(q), (uv) is a well-formed term of type σ(p). The left-to-right rule case is symmetric (we take (vu); v : σ(r) → σ(p), u : σ(r)). The term associated with the initial s is considered to be the semantic value of w.
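The bottom-up association of terms with a derivation is easiest to state on the parse tree itself. A Python sketch; the tree and lexicon encodings are ours, and semantic terms are modelled as Python values and callables:

```python
def value(tree, lex, direction):
    """Semantic value of a parse tree of a grammar in Chomsky normal form.
    tree: (p, a) for a rule p => a, or (p, left, right) for p => q r.
    lex[(p, a)]: the term attached to the rule p => a.
    direction[(p, q, r)]: 'rl' for a right-to-left rule (r applies to q),
    'lr' for a left-to-right one (q applies to r)."""
    if len(tree) == 2:                     # leaf: rule p => a
        return lex[tree]
    p, left, right = tree
    v, u = value(left, lex, direction), value(right, lex, direction)
    key = (p, left[0], right[0])
    return u(v) if direction[key] == 'rl' else v(u)

# Toy grammar: s => np vp (right-to-left: the vp applies to the np),
# np => "john", vp => "sleeps".
lex = {('np', 'john'): 'john',
       ('vp', 'sleeps'): lambda x: ('sleeps', x)}
direction = {('s', 'np', 'vp'): 'rl'}
tree = ('s', ('np', 'john'), ('vp', 'sleeps'))
```

Evaluating `value(tree, lex, direction)` applies the verb's function to the subject, yielding the term associated with the start symbol, i.e. the semantic value of the generated string.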
3 Translation of Context-Free Grammars into Lambek Categorial Grammars
Given a context-free grammar G with semantic labels, we aim to obtain an Lλ-grammar that accepts the same set of words and, moreover, gives the same semantic values to the words of this set as G does. For an arbitrary cycle-free context-free grammar not generating the empty word this is done in [7]. Here we provide an easier and more straightforward way to do it for grammars in Chomsky normal form using Buszkowski's [2] construction. Essentially, the construction of Kanazawa and Salvati [7] translates the grammar via Greibach normal form, whereas here we do it directly, preserving the structure of the parse tree. Here we restrict ourselves to a special class of semantic labels on context-free rules (one of the terms applied to the other), but the method can be generalised to arbitrary semantic markings on them. For the converse direction (translating Lambek grammars to context-free ones), Pentus proved [9] that every language generated by a Lambek grammar is context-free, and Kanazawa and Salvati [7] enriched this translation with semantics. However, being rather straightforward, Pentus' translation is quite inefficient (it leads to exponential growth of the grammar).

Consider all non-terminal symbols as primitive types (N ⊆ Pr); the σ mapping is also preserved. With any t ∈ N we associate a set I(t) ⊂ Tp in the following way:

• t ∈ I(t);
• if p ⇒ qt is a rule of G, then (q \ p) ∈ I(t);
• if t ∈ N and p ⇒ qr is a rule of G, then (q \ p) /(t \ r) ∈ I(t).

Now we are ready to define the Lλ-grammar. Let H = s. For any rule of the form t ⇒ a and A ∈ I(t), with a term u : σ(t) associated, we add an entry to the categorial dictionary D in the following way.
• If A = t, then we add ⟨a, A, u⟩.
• If A = q \ p and p ⇒ qt is a right-to-left rule, then σ(t) = σ(q) → σ(p), so u : A, and we add ⟨a, A, u⟩.
• If A = q \ p and p ⇒ qt is a left-to-right rule, then σ(q) = σ(t) → σ(p). Let f be a variable of type σ(q) and let ũ = λf.(f u). Then ũ is a well-formed term of type σ(q) → σ(p) = σ(q \ p), and we add ⟨a, A, ũ⟩.
• If A = (q \ p) /(t \ r) and p ⇒ qr is a right-to-left rule, then, since σ(r) = σ(q) → σ(p), σ(A) = (σ(t) → σ(r)) → σ(r). Let g be a variable of type (σ(t) → σ(r)) and let ũ = λg.(gu). We have ũ : ((σ(t) → σ(r)) → σ(r)), whence ũ : A, and we add ⟨a, A, ũ⟩.
• If A = (q \ p) /(t \ r) and p ⇒ qr is a left-to-right rule, then σ(q) = σ(r) → σ(p). Thus σ(A) = (σ(t) → σ(r)) → ((σ(r) → σ(p)) → σ(p)). Let g be a variable of type (σ(t) → σ(r)) and f be a variable of type (σ(r) → σ(p)). Then û = λg.λf.(f(gu)) is a well-formed term of type σ(A), and we add ⟨a, A, û⟩ to D.
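The sets I(t) can be generated by a single pass over the binary rules. A Python sketch; the tuple encoding of syntactic types is ours:

```python
def I(t, rules):
    """All syntactic types associated with non-terminal t by the
    construction above.  rules: list of binary rules (p, q, r) for
    p => q r.  ('\\', q, p) encodes q \\ p; ('/', x, y) encodes x / y."""
    out = {t}                                        # first bullet
    for (p, q, r) in rules:
        if r == t:                                   # rule p => q t
            out.add(('\\', q, p))                    # second bullet
        out.add(('/', ('\\', q, p), ('\\', t, r)))   # third bullet
    return out

# Toy grammar with the single binary rule s => np vp:
rules = [('s', 'np', 'vp')]
assert ('\\', 'np', 's') in I('vp', rules)   # vp may act as np \ s
```

For the toy rule, I(vp) also contains the third-bullet type (np \ s) / (vp \ vp), and I(np) contains (np \ s) / (np \ vp); both are needed so that decorated parse trees of arbitrary shape can be typed.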
Proposition 1. If w ∈ Σ+ is accepted by the constructed Lλ-grammar, then it is also generated by the context-free grammar G.
Proof. For every right-to-left rule of the form p ⇒ qr consider the sequent x : q, f : r → (f x) : p, and for every left-to-right rule p ⇒ rq consider the sequent f : r, x : q → (f x) : p. These sequents form a set A. Let us add them as new axioms and also add the cut rule, thus getting a new calculus Lλ + A. Now let t ∈ N and A ∈ I(t). Then one can easily prove that Lλ + A ⊢ x : t → x′ : A, where x′ is a term of type σ(A) (actually either x, x̃, or x̂). Now, if w = a1 . . . an is accepted by the Lλ-grammar, then Lλ ⊢ x1 : A1, . . . , xn : An → v : s. Using the cut rule, we get Lλ + A ⊢ x1 : t1, . . . , xn : tn → v[x1 := x′1, . . . , xn := x′n] : s.

Now we reformulate the calculus Lλ + A in an equivalent form. Instead of the set A of axioms we introduce a set R of rules:

From ~z1 : Γ1 → v : q and ~z2 : Γ2 → u : r, infer ~z1 : Γ1, ~z2 : Γ2 → (uv) : p, for a right-to-left rule p ⇒ qr;

From ~z1 : Γ1 → u : r and ~z2 : Γ2 → v : q, infer ~z1 : Γ1, ~z2 : Γ2 → (uv) : p, for a left-to-right rule p ⇒ rq.
The calculi Lλ + A and Lλ + R are equivalent. Moreover, Lλ + R enjoys the cut-elimination property. Therefore, Lλ + R ⊢ x1 : t1, . . . , xn : tn → v′ : s without using the cut rule. Since the types here do not contain any connectives, the only rules that can be used in the derivation are rules from R, and the derivation itself is actually a context-free derivation in G; thus w is generated by G.
Proposition 1 shows that the Lλ-grammar does not accept any extra words in comparison with G. Moreover, the cut-elimination procedure in Lλ + R preserves the semantic value (modulo α-, β- and η-equivalence; this can be proved in the same way as for Lλ, see [6]). Therefore v′ is the same term as v[x1 := x′1, . . . , xn := x′n]. Finally, on one hand, v′ is the semantic value assigned by G. On the other hand, the Lλ-grammar assigns the term v[x1 := x′1, . . . , xn := x′n], where the choice of xi, x̂i or x̃i for x′i is determined by Ai. Thus the semantic values coincide.

Proposition 2. If w ∈ Σ+ is generated by G, then w is accepted by the constructed Lλ-grammar with the same semantic value.

Proof. Proceed by induction on the G-derivation. Decorate all non-terminal symbols with syntactic types. The start symbol s is decorated with itself. On the induction step we apply a rule p ⇒ qr, decorate r with (q \ p) and consider several cases.

• p is decorated with itself. Then decorate q with q.
• p is decorated with a type of the form (q1 \ p1). Then decorate q with (q1 \ p1) /(q \ p).
• p is decorated with a type of the form (q1 \ p1) /(p \ s). Then decorate q with (q1 \ p1) /(q \ s).

One can prove by induction that any occurrence of a non-terminal symbol t is decorated with a type from I(t).

Now we backtrack the G-derivation and associate terms to occurrences of non-terminal symbols. This correspondence is actually twofold: one line of terms comes from the G-derivation itself, the other is going to come from the Lambek derivation. We shall prove the following statement by induction: if at some step of the G-derivation starting with s we get t1 . . . tm (ti ∈ N), and for each ti we have a term ui associated, then there exist terms u′1, . . . , u′m, such that for every i either u′i = ui, or u′i = ũi, or u′i = ûi, so that u′i : Ai, Lλ ⊢ x1 : A1, . . . , xm : Am → v̄ : s, and v̄[x1 := u′1, . . . , xm := u′m] = v is the term associated with the start symbol s (the semantic value); the syntactic types Ai are those with which the ti were decorated.

The base case is trivial: m = 1, A1 = s, and we just take u′1 = v. The step with a right-to-left rule is considered as follows. Let the rule be of the form p ⇒ qr; the term u associated with p is then of the form (hu1), with h associated with r and u1 associated with q. Also let p be decorated with A, q with B and r with C. We check that Lλ ⊢ u′1 : B, h′ : C → u′ : A, by considering possible cases for u′1, h′, and u′, and then apply the cut rule. The left-to-right case is considered symmetrically.
Finally, we apply the rules of the form ti ⇒ ai. Since the choice of the alternative for u′i is determined by Ai, the corresponding entries ⟨ai, Ai, u′i⟩ exist in D. Therefore the word a1 . . . an is accepted by the Lλ-grammar with semantic value v, q.e.d.
Acknowledgments

The author is grateful to Prof. Mati Pentus for constant attention to this work, and to the participants of the 'Natural Language and Computer Science' workshop held at the Vienna Summer of Logic 2014 for comments and discussion. This research was supported by the Russian Foundation for Basic Research (grants 12-01-00888-a and 14-01-00127-a) and by the Presidential Council for Support of Leading Scientific Schools (grant NŠ-1423.2014.1).
References

[1] J. van Benthem. The semantics of variety in categorial grammar. Report 83–29, Dept. of Mathematics, Simon Fraser University, Vancouver, 1983.
[2] W. Buszkowski. On the equivalence of Lambek categorial grammars and basic categorial grammars. ILLC Prepublication Series LP–93–07, Institute for Logic, Language and Computation, University of Amsterdam, 1993.
[3] B. Carpenter. Type-Logical Semantics. The MIT Press, Cambridge, Mass., 1997.
[4] N. Chomsky. Three models for the description of language. IRE Transactions on Information Theory, IT-2(3):113–124, 1956.
[5] D. R. Dowty et al. Introduction to Montague Semantics. D. Reidel, Dordrecht, 1981.
[6] H. Hendriks. Studied flexibility: categories and types in syntax and semantics. PhD thesis, University of Amsterdam, 1993.
[7] M. Kanazawa and S. Salvati. The string-meaning relations definable by Lambek grammars and context-free grammars. In G. Morrill and M.-J. Nederhof, editors, Formal Grammar 2012/2013, Lecture Notes in Computer Science 8036, pages 191–208. Springer, Berlin, 2013.
[8] J. Lambek. The mathematics of sentence structure. American Mathematical Monthly, 65(3):154–170, 1958.
[9] M. Pentus. Lambek grammars are context-free. In Proceedings of the 8th Annual IEEE Symposium on Logic in Computer Science, pages 429–433. IEEE Computer Society Press, Los Alamitos, California, 1993.
Program Extraction Applied to Monadic Parsing

Ulrich Berger, Alison Jones, Monika Seisenberger
Swansea University, Wales

Abstract

This paper outlines a proof-theoretic approach to developing correct and terminating monadic parsers. Using modified realisability, we extract provably correct and terminating programs from formal proofs. If the proof system is proven to be correct, then any extracted program is guaranteed to be so. By extracting parsers, we can ensure that they are correct, complete and terminating for any input. The work is ongoing, and is being carried out in the interactive proof system Minlog.
1 Introduction
Parsing any language can be problematic. With natural language, the problems are exacerbated by its inherent ambiguities and irregularities. In Linguistics, syntax can be analysed using several different formalisms, none of which is universally agreed upon. At the semantic level there is even less consensus. This sets natural languages apart from more formal languages encountered by parsers. Despite these differences, however, there are key properties that all parsers should ideally have: they should be complete and correct (with respect to a given grammar), and they should terminate on any input. The importance of termination is demonstrated by the well-known left-recursion problem.

In this paper, we focus on monadic parsers, which allow us to generate parse trees as "side-effects" in pure functional programming languages such as Haskell. Another advantage is that monadic parser combinators naturally accommodate non-determinism and ambiguous input. Monadic parsers can be directly implemented in Haskell. This is the approach taken in much of the literature (see [HM], [HM96], [Wad92b], [Wad92a]). Testing these parsers could check (to a degree) that they work as expected. A more rigorous approach would be to verify them by formalising the implementations and proving that they work as expected. A third option, and the one explored here, is to extract correct parsers directly from the formal proofs. This is carried out using the interactive proof system Minlog.
2
The Parsing Problem
In the rest of this paper, we use sets to formalise parser relations, which we call parsers for short. Let P(X) denote the classical powerset of X (i.e. all subsets of X including ∅
CISUC/TR 2014-02
143
NLCS’14 / NLSR 2014
Joint Proceedings NLCS’14 & NLSR 2014, July 17-18, 2014, Vienna, Austria
and X). The parser function of the composition P >>= Q is given by

[P >>= Q](s) = {(b, t) | ∃a, r. (a, r) ∈ [P](s) ∧ (b, t) ∈ [Qa](r)}
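This reading can be sketched in Python, under the assumption that a parser relation P pairs the consumed string with the result, so that [P](s) = {(a, t) | (u, a) ∈ P and s = u ++ t} (the function names are ours, not Minlog's):

```python
def parser_fun(P):
    """[P](s): all (result, rest) pairs where P consumed a prefix of s."""
    def run(s):
        return {(a, s[len(u):]) for (u, a) in P if s.startswith(u)}
    return run

def bind_fun(runP, runQ):
    """[P >>= Q](s) = {(b, t) | (a, r) in [P](s), (b, t) in [Q_a](r)}."""
    def run(s):
        return {(b, t) for (a, r) in runP(s) for (b, t) in runQ(a)(r)}
    return run
```

Note that the result of bind_fun on input s is a union over the finitely or infinitely many results of [P](s), which is exactly the point at which the finite-branching conditions of the next section matter.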
3.4 The Basic Parsers and Parser Combinators are Finitely Branching
To illustrate that these basic parsers and parser combinators are finitely branching, one must prove the following statements:

1. The return operation is finitely branching. This follows trivially from the parser function for return(x).

2. All basic parsers are finitely branching. This follows trivially from the parser function definitions.

3. If parser P and parser Q (of the same parsing result type) are finitely branching, then so is P ∪ Q:

   FBS,A(P) → FBS,A(Q) → FBS,A(P ∪ Q)    (2)

   Since [P ∪ Q](s) = [P](s) ∪ [Q](s), it suffices to prove that the union of two finite sets is finite.

4. If P is finitely branching and has the parsing result type A, and Qa is a family of finitely branching parsers with the result type B, parametrised over A, then their composition is finitely branching:

   FBS,A(P) → ∀a ∈ A FBS,B(Qa) → FBS,B(P >>= Q)    (3)
To prove this, one needs to prove that the union of finitely many finite sets is also finite.
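With parsers modelled as finite sets of (consumed, result) pairs, the finiteness arguments become concrete: [P ∪ Q](s) is a union of two finite sets. A small Python check (illustrative only; run is a hypothetical helper, not the paper's notation):

```python
def run(P, s):
    """[P](s), with P a finite set of (consumed-prefix, result) pairs."""
    return {(a, s[len(u):]) for (u, a) in P if s.startswith(u)}

P = {("a", 1), ("ab", 2)}
Q = {("", 0), ("a", 3)}

# [P ∪ Q](s) = [P](s) ∪ [Q](s), so its size is bounded by the sum:
union_results = run(P | Q, "ab")
assert union_results == run(P, "ab") | run(Q, "ab")
assert len(union_results) <= len(run(P, "ab")) + len(run(Q, "ab"))
```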
3.5 The Parsing Monad
The operations return and >>= endow the type operator A ↦ P(S × A) with the structure of a monad. Monads were connected with computation by [Mog91] and are now widely used in programming. The monad P(S × A) is different from, though related to, the parser monad p :: String -> [(a, String)] of [HM98] or, set-theoretically, the type of parser functions S → P(A × S) (as a functor of A), which can be given essentially the same monadic structure as the parser monad in [HM98]. More precisely, the mapping sending a parser (relation) P to its parser function [P] is a monad homomorphism. The monadic structure allows us to use the popular "do-notation":

do { a ← P ; Qa }    stands for    P >>= λa . Qa
do { P ; Q }         stands for    P >>= λa . Q    (a not in Q)

3.6 Recursive parsers
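The monad laws for the relational reading can be checked on a small example in Python (an illustrative sketch: relations are sets of (consumed, result) pairs, return consumes nothing, and bind concatenates consumed strings; none of this is the Minlog formalisation):

```python
def ret(x):
    return {("", x)}             # return consumes no input

def bind(P, Q):
    """P >>= Q on relations: consumed strings concatenate."""
    return {(u + v, b) for (u, a) in P for (v, b) in Q(a)}

P = {("a", 1), ("ab", 2)}
Q = lambda a: {("c", a + 10)}
R = lambda b: {("d", b * 2)}

assert bind(ret(1), Q) == Q(1)                                   # left identity
assert bind(P, ret) == P                                         # right identity
assert bind(bind(P, Q), R) == bind(P, lambda a: bind(Q(a), R))   # associativity
```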
Nearly all non-trivial parsers use some form of recursion. In our setting recursive parsers are relations that are defined inductively, that is, as the least fixed points of monotone operators. Hence, in order to prove that recursively defined parsers are finitely branching (and hence give rise to parsing programs) we need to identify conditions on monotone operators Φ : P(S × A) → P(S × A) that guarantee that their least fixed points are finitely branching. In the following, we assume that all operators under discussion are monotone, that is P ⊆ Q → Φ(P ) ⊆ Φ(Q). This assumption is harmless since in all constructions below it is clear that the involved operators are monotone. For a parser P ∈ P(S × A) and a natural number n we define the restricted parsers P≤n := {(s, a) ∈ P | |s| ≤ n}
P >>= Q is finitely branching provided Qa is for all a ∈ A. In order to prove regressiveness, parsers that reject the empty string are useful. We call a parser P ∈ P(S × A) hungry if ([], a) ∉ P for any a ∈ A. Similarly, a family of parsers Q : A → P(S × B) is called hungry if Qa is hungry for all a ∈ A. Now, one can
prove, for example, that the operator λP . P >>= Q is regressive if Qa is hungry for all a ∈ A. Of course, this leads to the question of how to prove that a parser is hungry. For example, a parser defined by a chain of compositions of parsers is hungry provided one parser in the chain is hungry, while for the union of two parsers to be hungry both parsers need to be hungry. Regarding recursively defined parsers, one can prove that a regressive operator that preserves hungriness has a hungry fixed point. The considerations above should be viewed as initial steps towards a library of proof tactics supporting a combinatorial style of proving that a parser is finitely branching, which can be automated. We illustrate the approach by showing that the finite iteration of a hungry and finitely branching parser is finitely branching. For a parser P ∈ P(S × A) we define inductively the parser many(P) ∈ P(S × A∗):

• ([], []) ∈ many(P)
• If (s, a) ∈ P and (t, as) ∈ many(P), then (s ++ t, a : as) ∈ many(P).

This definition can be rewritten as the least fixed point of the operator Φ : P(S × A∗) → P(S × A∗) defined by

Φ(Q) := {([], [])} ∪ (P >>= λa . Q >>= λas . return(a : as))

or, using do-notation,

Φ(Q) := {([], [])} ∪ do { a ← P ; as ← Q ; return(a : as) }

Now, if P is hungry and finitely branching, then Φ is regressive and finitely branching. Hence, its least fixed point, many(P), is finitely branching. Clearly, many(P) is not hungry, but its variant

many1(P) := do { a ← P ; as ← many(P) ; return(a : as) }

is (and is still finitely branching).
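The least fixed point construction of many(P) can be sketched by fixed-point iteration in Python (illustrative code, not the formal development; phi and many are our names, and inputs are bounded in length to make the iteration finite). Because the example parser is hungry, every pass through Φ consumes at least one character, so the iteration stabilises:

```python
def bind(P, Q):
    """P >>= Q on relations: consumed strings concatenate."""
    return {(u + v, b) for (u, a) in P for (v, b) in Q(a)}

def phi(P, Q):
    """Φ(Q) = {([], [])} ∪ do { a ← P ; as ← Q ; return(a : as) }."""
    step = bind(P, lambda a: bind(Q, lambda rest: {("", (a,) + rest)}))
    return {("", ())} | step

def many(P, max_len):
    """Least fixed point of Φ, restricted to consumed strings of bounded length."""
    Q = set()
    while True:
        nxt = {(s, r) for (s, r) in phi(P, Q) if len(s) <= max_len}
        if nxt == Q:
            return Q
        Q = nxt

item_a = {("a", "a")}   # hungry: consumes exactly one 'a', never the empty string
```

Results are tuples rather than lists so they can live in a set; the hungriness of item_a is exactly what guarantees that the restriction to bounded inputs cuts the iteration off.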
4 Formalisation and Program Extraction in Minlog
Minlog ([Min], [BBS+ 98]) is a proof system based on a first order natural deduction calculus. It is interactive, using a backwards reasoning style. It implements Heyting Arithmetic in finite types enriched with free predicate variables, program constants (denoting computable functionals) with term-rewriting rules, inductive data types (freely generated algebras), and inductively and coinductively defined predicates. It is not a type-theoretic system, but keeps formulas and proofs separate from types and terms and has a simple domain-theoretic semantics: the partial continuous functionals. The theoretical background of Minlog is explained in [SW11].
4.1 Formalisation
In the following, we discuss some relevant aspects of the formalisation in Minlog. So far, the development of the theory has been independent of the representation of sets. In the Minlog formalisation, we decided to treat sets as boolean functions. A set of elements of type α therefore has the type α → Boole. If P is a set and a is an element of type α, P a represents the statement a ∈ P. (In our example, α will be instantiated by the type A × S, i.e., pairs of parsing results and strings.) We define the operations empty set (Em), insertion (Ins) and union (Un) as program constants in Minlog with appropriate term-rewriting rules. For example, the empty set is a function that returns false for all elements. With these constants, we can define an inductive predicate Fin of arity alpha=>boole (where alpha is a type variable), containing all finite sets (cf. P(X) above). Fin consists of two closure axioms: one for the base case of the empty set, and one for the step case of a finite set with an additional inserted element. In Minlog, we give names to these closure axioms: InitFin and GenFin. Reverting to the set-theoretic syntax, these axioms are:

InitFin: ∅ ∈ Fin
GenFin: P ∈ Fin → Ins a P ∈ Fin

Program extraction associates with Fin a free algebra (algFin alpha), whose step-case constructor CGenFin has type (algFin alpha) -> alpha -> (algFin alpha).
It is worth noting that this corresponds exactly to the standard cons constructor for lists. I.e., in general, any proof that a parsing function is finitely branching yields, via program extraction, a program that lists all parsing results. Finally, the induction principle for the inductive definition corresponds to recursion on the program level.
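The computational content of Fin can be sketched in Python (an illustrative transcription, not Minlog output): a proof that a set is finite is realised by a list of its elements, built from two constructors that behave like nil and cons, as noted above. The names CInitFin and CGenFin follow the Minlog constructors mentioned in the text:

```python
def CInitFin():
    """Realiser for InitFin: the empty set is finite."""
    return []

def CGenFin(fin_p, a):
    """Realiser for GenFin: if P is finite, so is Ins a P."""
    return [a] + fin_p

# A finiteness witness for the set {1, 2}, built as Ins 1 (Ins 2 Em):
w = CGenFin(CGenFin(CInitFin(), 2), 1)
assert w == [1, 2]
```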
4.3 Extracted Union Program
From an inductive proof that the union of two finite sets is finite, we can automatically extract the following recursive program in Minlog, where we have replaced automatically generated variables by more meaningful ones:

[q-fin,p-fin]
 (Rec algFin alpha=>algFin alpha)p-fin q-fin
 ([p'-fin,un-p'q-fin,a](CGenFin alpha)un-p'q-fin a)

Here, the corner brackets denote lambda abstraction. p-fin and q-fin are variables of type (algFin alpha) representing the finiteness of p and q. The program runs by recursion on the finiteness of p. In the base case, when p is empty, the finiteness of the union follows from the finiteness of q (q-fin). In the step case, where p equals Ins a p', CGenFin produces a realiser for Un (Ins a p')q using un-p'q-fin and a. Essentially, the program implements the concatenation of lists.
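The behaviour of the extracted program can be transcribed into plain Python (a sketch of its behaviour, not the Minlog term itself): recursion on the finiteness witness of p, returning q's witness in the base case and re-attaching the head element in the step case, i.e. list concatenation:

```python
def union_realiser(p_fin, q_fin):
    """Realise finiteness of p ∪ q from finiteness witnesses (lists) for p and q."""
    if not p_fin:                        # base case: p = ∅, so the union is q
        return q_fin
    a, rest = p_fin[0], p_fin[1:]        # step case: p = Ins a p'
    return [a] + union_realiser(rest, q_fin)

assert union_realiser([1, 2], [3]) == [1, 2, 3]
```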
Note that the sets p and q do not occur in the program. This is due to our use of the non-computational quantifier allnc in the proof. Minlog will automatically generate a soundness proof stating the correctness of the extracted program. In this case, it makes use of the inductively defined predicate FinMR with two arguments: a term t and a set p, with the intended meaning that t is a realiser for the finiteness of p. The soundness theorem states that if we have realisers for the finiteness of p and q, then the extracted program applied to these realisers realises that the union of p and q is finite. We show the original output:

allnc q,(algFin alpha)^2354(
 FinMR(algFin alpha)^2354 q -->
 allnc p,(algFin alpha)^2355(
  FinMR(algFin alpha)^2355 p -->
  FinMR((Rec algFin alpha=>algFin alpha)
         (algFin alpha)^2355(algFin alpha)^2354
         ([(algFin alpha)^2364,(algFin alpha)^2365,a]
          (CGenFin alpha)(algFin alpha)^2365 a))
       ((Un alpha)p q)))
5 Extracting Monadic Parser Programs
This section illustrates how the previous proof can be used to extract correct and terminating parser combinators. In Section 3.2, we formalised the parser combinator for choice in terms of set union. In Section 4, we extracted a program from a proof that the union of two finite sets is finite. From this result it follows immediately that the pointwise union P ∪ Q of two finitely branching parsers P and Q is finitely branching (see (2) above). To extract parsing programs that use the choice combinator in Minlog, we just need to instantiate P and Q with concrete parsers. As an example, we will extract a program that combines the item parser and the return function. Extracting such a program is straightforward. We add program constants for the item parser and return function, prove that both are finitely branching and then use these to extract the program from an instantiated proof of (2):

FBS,A(item ∪ return c)

Here, in accordance with Section 2.1, a parser is finitely branching if its parsing function yields a finite set of results with any input string s. At the proof level, this follows from the finite union proof and the finiteness of the individual parsers. At the program level, it is realised by a concatenated list of parsing results. The extracted program takes a string st and a (return) character ‘c‘ as input. It can be run in Minlog as follows. With the argument string "aba" and character ‘c‘, the program yields a list with two elements: (‘a‘,"ba") from the item parser and (‘c‘,"aba") from the return function. In Minlog, this is displayed as:
(a@b::a:)::(c@a::b::a:): where :: is the cons operator for lists and @ is used for pair terms. As a second example, the empty string could be given as the first input argument. The item parser would fail, so the result from the return function would be returned as a singleton list: (c@(Nil char)): where (Nil char) is the empty list of type Char.
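The example of this section can be replayed with list-based parsers in Python (an illustrative sketch, not the extracted Minlog term; the choice combinator concatenates the result lists, matching the concatenation realiser above):

```python
def item(s):
    """Consume the first character of the input, if any."""
    return [(s[0], s[1:])] if s else []

def ret(c):
    """Succeed with result c without consuming input."""
    return lambda s: [(c, s)]

def choice(p, q):
    """item ∪ return c, realised as concatenation of result lists."""
    return lambda s: p(s) + q(s)

p = choice(item, ret('c'))
assert p("aba") == [('a', "ba"), ('c', "aba")]   # two results, as in the text
assert p("") == [('c', "")]                      # item fails; singleton result
```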
6 Conclusion
A fundamental benefit of treating monadic parsers at the proof level is that we can often work directly with parsers (that is, parser relations) as opposed to their parsing functions. Parsing functions are more complex objects, and verifying their correctness at the program level creates unnecessary work. To our knowledge, this is the first time that program extraction has been applied to parsing algorithms. We are still at the early stages, so it remains to be seen how far we can get with this approach. However, we think that it has good potential. A proof of correctness, completeness and termination is advantageous in any parsing context. In terms of potential applications to natural language semantics in particular, there is a common concern in the literature for modular grammar engineering ([BB04], [BB99]), which is facilitated by monads. We also share an interest in the functional programming framework and the lambda calculus [vEU10]. There is further potential for overlap in the area of discourse analysis and semantic inference [BBKdN98]. We welcome suggestions of other possible applications to natural language semantics.
References

[BB99] Patrick Blackburn and Johan Bos. Representation and Inference for Natural Language: A First Course in Computational Semantics (Volume II), 1999. http://www.let.rug.nl/bos/comsem/.
[BB04] Patrick Blackburn and Johan Bos. Representation and Inference for Natural Language: A First Course in Computational Semantics. Centre for the Study of Language & Information, 2004.
[BBKdN98] Patrick Blackburn, Johan Bos, Michael Kohlhase, and Hans de Nivelle. Automated Theorem Proving for Natural Language Understanding. In Proceedings of the CADE-15 workshop Problem-solving Methodologies with Automated Deduction, 1998.
[BBS+98] Holger Benl, Ulrich Berger, Helmut Schwichtenberg, Monika Seisenberger, and Wolfgang Zuber. Proof theory at work: Program development in the Minlog system. In Automated Deduction, volume II of Applied Logic Series, pages 41–71. Kluwer, 1998.
[BMSS11] Ulrich Berger, Kenji Miyamoto, Helmut Schwichtenberg, and Monika Seisenberger. Minlog - a tool for program extraction supporting algebras and coalgebras. In Andrea Corradini, Bartek Klin, and Corina Cîrstea, editors, Algebra and Coalgebra in Computer Science, volume 6859 of Lecture Notes in Computer Science, pages 393–399. Springer Berlin Heidelberg, 2011.
[BS05] Ulrich Berger and Monika Seisenberger. Applications of inductive definitions and choice principles to program synthesis. In Laura Crosilla and Peter Schuster, editors, From Sets and Types to Topology and Analysis: Towards Practicable Foundations for Constructive Mathematics. Oxford University Press, 2005.
[HM] Graham Hutton and Erik Meijer. Functional Pearls: Monadic Parsing in Haskell. http://eprints.nottingham.ac.uk/223/1/pearl.pdf.
[HM96] Graham Hutton and Erik Meijer. Monadic Parser Combinators, 1996. http://www.cs.nott.ac.uk/~gmh/monparsing.pdf.
[HM98] Graham Hutton and Erik Meijer. Monadic parsing in Haskell. Journal of Functional Programming, 8(4):437–444, July 1998.
[Kre59] Georg Kreisel. Interpretation of analysis by means of constructive functionals of finite types. In Constructivity in Mathematics, pages 101–128, 1959.
[Min] The Minlog System. http://www.minlog-system.de.
[Mog91] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991.
[SW11] Helmut Schwichtenberg and Stanley S. Wainer. Proofs and Computations. Perspectives in Logic. Cambridge University Press, 2011.
[vEU10] Jan van Eijck and Christina Unger. Computational Semantics with Functional Programming. Cambridge University Press, 2010.
[Wad92a] Philip Wadler. Comprehending Monads. Mathematical Structures in Computer Science, 2(4), 1992.
[Wad92b] Philip Wadler. The Essence of Functional Programming, 1992. www.math.pku.ecu.cn/teachers/qiuzy/plan/lits/essence.pdf.
How to do things with types∗ Robin Cooper University of Gothenburg
Abstract We present a theory of type acts, based on type theory, which relates to a general theory of action and can serve as a foundation for a theory of linguistic acts.
Contents
1 Introduction
2 Type acts
3 Coordinating interaction in games
4 Conclusion
1 Introduction
The title of this paper picks up, of course, on the seminal book How to do things with words by J.L. Austin [Aus62], which laid the foundation for speech act theory. In recent work on dialogue such as [Gin12] it is proposed that it is important for linguistic analysis to take seriously a notion of linguistic acts. To formulate this, Ginzburg uses TTR ("Type Theory with Records") as developed in [Coo12]. The idea was also very much a part of the original work on situation semantics [BP83] which has also served as an inspiration for the work on TTR. TTR was also originally inspired by work on constructive type theory [ML84, NPS90] in which there is a notion of judgement that an object a is of a type T, a : T.¹ We shall think of judgements as a kind of type act and propose that there are other type acts in addition to judgements. We shall also follow [Ran94] in including types of events (or more generally, situations) in a type theory suitable for building a theory of action and of linguistic meaning.

∗ The work for this paper was supported in part by VR project 2009-1569, SAICD.
¹ While TTR has borrowed liberally from the many important ideas in constructive type theory, it does not adhere rigidly to the intuitionistic programme of the original and it has features, such as intersection and union types and a general model theoretic approach, which might make some type theorists judge it not to be a type theory at all. We have found it productive, however, to relate the ideas from modern type theory to the classical model theoretic approach adopted in formal semantics stemming from the original work by [Mon74].
This paper attempts to explore what kind of general theory of action such a notion of linguistic acts could be embedded in and tries to develop the beginnings of such a theory using type theory as the starting point. We will here only develop a nonlinguistic example, although we will point out relationships to linguistic acts as we go along.
2 Type acts
Imagine a boy and a dog playing a game of fetch. (The boy throws a stick and the dog runs after it and brings it back to the boy.) The boy and the dog have to coordinate and interact in order to create an event of the game of fetch. This involves doing more with types than just making judgements. For example, when the dog observes the situation in which the boy raises the stick, it may not be clear to the dog whether this is part of a fetch-game situation or a stick-beating situation. The dog may be in a situation of entertaining these two types as possibilities prior to making the judgement that the situation is of the fetch type. We will call this act a query as opposed to a judgement. Once the dog has made the judgement that what it has observed so far is an initial segment of a fetch type situation, it has to make its own contribution in order to realize the fetch type, that is, it has to run after the stick and bring it back. This involves the creation of a situation of a certain type. Thus creation acts are another kind of act related to types. Creating objects of a given type often has a de se [see, for example, Per79, Lew79, Nin10, Sch11] aspect. The dog has to know that it itself must run after the stick in order to make this a situation in which it and the boy are playing fetch. There is something akin to what Perry calls an essential indexical here, though, of course, the dog does not have indexical linguistic expressions. It is nevertheless part of the basic competence that an agent needs in order to be able to coordinate its action with the rest of the world that it has a primitive sense of self which is distinct from being able to identify an object which has the same properties as itself. We will follow Lewis in modelling de se in terms of functional abstraction over the "self". In our terms this will mean that de se type acts involve dependent types.
In standard type theory we have judgements such as o : T ("o is of type T") and T true ("there is something of type T"). We want to enhance this notion of judgement by including a reference to the agent A which makes the judgement, giving judgements such as o :A T ("agent A judges that o is of type T") and :A T ("agent A judges that there is some object of type T"). We will call the first of these a specific judgement and the second a non-specific judgement. Such judgements are one of the three kinds of acts represented in (1) that we want to include in our type act theory.²

² What judgement is made and what queries are entertained we imagine being governed by some kind of Bayesian learning theory. An initial suggestion along these lines is made by [CDLL14a], who base the learning theory on a string of probabilistic Austinian propositions, records consisting of a situation, s, a type, T, and a probability, p, corresponding to a judgement by the learning agent that s is of type T with probability p.

(1) Type Acts
• judgements
  – specific: o :A T ("agent A judges object o to be of type T")
  – non-specific: :A T ("agent A judges that there is some object of type T")
• queries
  – specific: o :A T ? ("agent A wonders whether object o is of type T")
  – non-specific: :A T ? ("agent A wonders whether there is some object of type T")
• creations
  – non-specific: :A T ! ("agent A creates something of type T")

The three kinds of type acts in (1) underlie what are often thought of as core speech acts: assertion, query and command. Note that creations only come in the non-specific variant. You cannot create an object which already exists. Creations are also limited in that there are certain types which a given agent is not able to realize as the main actor. Consider for example the event type involved in the fetch game of the dog running after the stick. The human cannot be the main creator of such an event since it is the dog who is the actor. The most the human can do is wait until the dog has carried out the action and we will count this as a creation type act. This will become important when we discuss coordination in the fetch-game below and it is also important in accounting for turn-taking in the coordination of dialogue. It is actually important that the human makes this passive contribution to the creation of the event of the dog running after the stick and does not, for example, get the game confused by immediately throwing another stick before the dog has had a chance to retrieve the first stick. There are other cases of event types which require a less passive contribution from an agent other than the main actor. Consider the type of event where the dog returns the stick to the human. The dog is clearly the main actor here but the human also has a role to play in making the event realized. For example, if the human turns her back on the dog and ignores what is happening or runs away the event type will not be realized despite the dog's best efforts. Something similar holds in language. If it is your dialogue partner's turn to make an utterance, you still have to play your
part by paying attention and trying to understand. Other event types, such as lifting a piano, involve more equal collaboration between two or more agents, where it is not intuitively clear that any one of the agents is the main actor. So when we say "agent A creates something of type T" perhaps it would be more accurate to phrase this as "agent A contributes to the creation of something of type T", where A's contribution might be as little as not realizing any of the other types involved in the game until T has been realized. De se type acts involve functions which have the agent in their domain and return a type, that is, they are dependent types which, given the agent, will yield a type. We will say that agents are of type Ind ("individual") and that the relevant dependent types, T, are functions of type (Ind → Type). We characterize de se type acts in a way parallel to (1), as given in (2).

(2) De Se Type Acts
• judgements
  – specific: o :A T (A) ("agent A judges object o to be of type T (A)")
  – non-specific: :A T (A) ("agent A judges that there is some object of type T (A)")
• queries
  – specific: o :A T (A)? ("agent A wonders whether object o is of type T (A)")
  – non-specific: :A T (A)? ("agent A wonders whether there is some object of type T (A)")
• creations
  – non-specific: :A T (A)! ("agent A creates something of type T (A)")

From the point of view of the type theory, de se type acts seem more complex than non-de se type acts since they involve a dependent rather than a non-dependent type and a functional application of that dependent type to the agent. However, from a cognitive perspective one might expect de se type acts to be more basic. Agents which perform type acts using types directly related to themselves are behaving egocentrically, and one could regard it as a more advanced level of abstraction to consider types which are independent of the agent. This is a puzzling way in which our notions of type seem in conflict with our intuitions about cognition. While these type acts are prelinguistic (we need them to account for the dog's behaviour in the game of fetch) we will try to argue later that they are the basis on
which the notion of speech act [Aus62, Sea69] is built. The idea would be that our division of type acts into the three classes judgements, queries and creations underlies the core speech acts assertion, question and imperative, and that other kinds of speech acts are related to one of the three kinds of type acts. Our notion of using types in query acts seems intuitively related to work on inquisitive semantics [GR12] where some propositions (in particular disjunctions) are regarded as inquisitive. However, this will still allow us to make a distinction between questions and assertions in natural language as argued for by [Gin12] and [GCF14].
3 Coordinating interaction in games
Let us now apply these notions to the kind of interaction that has to take place between the human and the dog in a game of fetch. First consider in more detail what is actually involved in playing a game of fetch, that is, creating an event of the particular type represented by "game of fetch". Each agent has to keep track in some way of where they are in the game and in particular what needs to happen next. We analyze this by saying that each agent has an information state which we will model as a record. We need to keep track of the progression of types of information state for an agent during the course of the game. We will refer to the types of information states as gameboards. Our notions of information state and gameboard are taken from [Lar02] and [Gin12] respectively, as well as a great deal of related literature on the gameboard or information state approach to dialogue analysis originating from [Gin94]. We have adapted the notions somewhat to our own purposes, but we want to claim that the kind of information states that have been proposed in the literature on dialogue are developments from the kind of information states needed to account for non-linguistic coordination. The idea is that as each part of the event occurs, the agent's gameboard is updated so that an event of the next type in the string is expected. For now, we will consider gameboards which only place one requirement on information states, namely that there is an agenda which indicates the type of the next move in the game. Thus if the agent is playing fetch and observes an event of the type where the human throws the stick, then the next move in the game will be an event of the type where the dog runs after the stick. If the actor in the next move is the agent herself then the agent will need to create an event of the type of the next move if the game is to progress.
If the actor in the next move is the other player in the game, then the agent will need to observe an event and judge it to be of the appropriate type in order for the game to progress. The type of information states, InfoState, will be (3a). (In dialogue, we see more complex information states which include additional fields in the record types.) The type of the initial information state, InitInfoState, will be one where the agenda is required to be the empty list.

(3) a. [ agenda : [RecType] ]
    b. [ agenda=[] : [RecType] ]
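The agenda-based information states of (3) can be sketched in Python (an illustrative simplification: record types become plain dicts and event types become strings; none of these names are TTR notation):

```python
def init_info_state():
    """An initial information state in the spirit of (3b): empty agenda."""
    return {"agenda": []}

def expects(state, event_type):
    """Does the gameboard expect an event of this type as the next move?"""
    return bool(state["agenda"]) and state["agenda"][0] == event_type

s = init_info_state()
s["agenda"] = ["run_after(b,c)"]        # after observing the throw
assert expects(s, "run_after(b,c)")
assert not expects(s, "throw(a,c)")
```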
((3b) is a manifest field as discussed in [Coo12] based on an idea by Thierry Coquand. A manifest field restricts the type in the field to the singleton type whose only member is the object represented after '='.) We can now see the rules of the game corresponding to the type as a set of update functions which indicate, for an information state of a given type, what type the next information state may belong to if an event of a certain type occurs. These update functions correspond to the transitions in a finite state machine. This is given in (4).

(4) { λr:[agenda=[]:[RecType]] .
        [agenda=[[e:pick up(a,c)]]:[RecType]],
      λr:[agenda=[[e:pick up(a,c)]]:[RecType]] . λe:[e:pick up(a,c)] .
        [agenda=[[e:attract attention(a,b)]]:[RecType]],
      λr:[agenda=[[e:attract attention(a,b)]]:[RecType]] . λe:[e:attract attention(a,b)] .
        [agenda=[[e:throw(a,c)]]:[RecType]],
      λr:[agenda=[[e:throw(a,c)]]:[RecType]] . λe:[e:throw(a,c)] .
        [agenda=[[e:run after(b,c)]]:[RecType]],
      λr:[agenda=[[e:run after(b,c)]]:[RecType]] . λe:[e:run after(b,c)] .
        [agenda=[[e:pick up(b,c)]]:[RecType]],
      λr:[agenda=[[e:pick up(b,c)]]:[RecType]] . λe:[e:pick up(b,c)] .
        [agenda=[[e:return(b,c,a)]]:[RecType]],
      λr:[agenda=[[e:return(b,c,a)]]:[RecType]] . λe:[e:return(b,c,a)] .
        [agenda=[]:[RecType]] }

Since we are treating an empty agenda as the condition for the input to the initial state in the corresponding automaton, and also the output of the final state, we automatically get a loop effect from the final state to the initial state so that the game can be repeated indefinitely many times. In order to prevent the loop we would have to distinguish the types corresponding to the initial and final states. Note that the functions in (4) are of the type (5).

(5) ([agenda:[RecType]] → (Rec → RecType))
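The finite state machine reading of (4) can be sketched in Python (an illustrative transcription: the update functions become a transition table keyed by the expected event type, with a = human, b = dog, c = stick as in the text; FETCH and update are hypothetical names):

```python
FETCH = {
    None:                     "pick_up(a,c)",          # empty agenda: game starts
    "pick_up(a,c)":           "attract_attention(a,b)",
    "attract_attention(a,b)": "throw(a,c)",
    "throw(a,c)":             "run_after(b,c)",
    "run_after(b,c)":         "pick_up(b,c)",
    "pick_up(b,c)":           "return(b,c,a)",
    "return(b,c,a)":          None,                    # back to the empty agenda
}

def update(state, event_type=None):
    """Apply the update function matching the current agenda and event."""
    head = state["agenda"][0] if state["agenda"] else None
    if head is not None and event_type != head:
        return state                     # event not of the expected type
    nxt = FETCH[head]
    return {"agenda": [] if nxt is None else [nxt]}
```

The loop effect noted in the text appears here as the None entry for "return(b,c,a)": the final transition restores the empty agenda, which is also the initial condition, so the game can repeat.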
That is, they map an information state containing an agenda (modelled as a record containing an agenda field) and an event (modelled as a record) to a record type. This is true of all except for the function corresponding to the initial state which is of type (6).
(6) ([agenda:[RecType]] → RecType)
That is, it maps an information state directly to a record type and does not require an event. We can think of this set as the set of rules which define the game. It is of the type (7).

(7) {(([agenda:[RecType]] → (Rec → RecType)) ∨ ([agenda:[RecType]] → RecType))}
Let us call the type in (7) GameRules. Sets of game rules of this type define the rules for specific participants as in (4). In order to characterize the game in general we need to abstract out the roles of the individual participants in the game. This we will do by defining a function from a record containing individuals appropriate to play the roles in the game, thus revising (4) to (8).

(8) λr∗ : [ h : Ind,
           chuman : human(h),
           d : Ind,
           cdog : dog(d),
           s : Ind,
           cstick : stick(s) ] .
    { λr:[agenda=[]:[RecType]] .
        [agenda=[[e:pick up(r∗.h,r∗.s)]]:[RecType]],
      λr:[agenda=[[e:pick up(r∗.h,r∗.s)]]:[RecType]] . λe:[e:pick up(r∗.h,r∗.s)] .
        [agenda=[[e:attract attention(r∗.h,r∗.d)]]:[RecType]],
      λr:[agenda=[[e:attract attention(r∗.h,r∗.d)]]:[RecType]] . λe:[e:attract attention(r∗.h,r∗.d)] .
        [agenda=[[e:throw(r∗.h,r∗.s)]]:[RecType]],
      λr:[agenda=[[e:throw(r∗.h,r∗.s)]]:[RecType]] . λe:[e:throw(r∗.h,r∗.s)] .
        [agenda=[[e:run after(r∗.d,r∗.s)]]:[RecType]],
      λr:[agenda=[[e:run after(r∗.d,r∗.s)]]:[RecType]] . λe:[e:run after(r∗.d,r∗.s)] .
        [agenda=[[e:pick up(r∗.d,r∗.s)]]:[RecType]],
      λr:[agenda=[[e:pick up(r∗.d,r∗.s)]]:[RecType]] . λe:[e:pick up(r∗.d,r∗.s)] .
        [agenda=[[e:return(r∗.d,r∗.s,r∗.h)]]:[RecType]],
      λr:[agenda=[[e:return(r∗.d,r∗.s,r∗.h)]]:[RecType]] . λe:[e:return(r∗.d,r∗.s,r∗.h)] .
        [agenda=[]:[RecType]] }
(8) is of type (Rec→GameRules) which we will call Game.
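To make the update-function perspective concrete, here is a toy Python rendering of the transition table behind (4). The dict representation of information states and all names (e.g. `FETCH_SEQUENCE`) are our own illustration, not part of the TTR formalism.

```python
# A toy rendering of the update functions in (4): information states are
# dicts with an 'agenda' list of event-type labels; each rule maps the
# expected event at the head of the agenda to the next agenda.
FETCH_SEQUENCE = [
    "pick_up(a,c)", "attract_attention(a,b)", "throw(a,c)",
    "run_after(b,c)", "pick_up(b,c)", "return(b,c,a)",
]

def initial_update(state):
    """Rule for the empty agenda: put the first event type on it."""
    assert state["agenda"] == []
    return {"agenda": [FETCH_SEQUENCE[0]]}

def update(state, event):
    """Event-driven rules: if the observed event matches the head of the
    agenda, move to the next agenda (empty after the final event)."""
    head = state["agenda"][0]
    if event != head:
        raise ValueError("event not licensed by current agenda")
    i = FETCH_SEQUENCE.index(head)
    nxt = FETCH_SEQUENCE[i + 1:i + 2]   # [] after the last event
    return {"agenda": nxt}

# One full round of the game: the final state has an empty agenda, which
# is also the initial condition -- hence the loop effect noted above.
state = {"agenda": []}
state = initial_update(state)
for ev in FETCH_SEQUENCE:
    state = update(state, ev)
```

Running the loop once returns to the empty agenda, mirroring the observation that the game can be repeated indefinitely unless the initial and final state types are distinguished.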
Specifying the rules of the game in terms of update functions in this way will not actually get anything to happen, though. For that we need type acts of the kind we discussed. We link the update functions to type acts by means of licensing conditions on type acts. A basic licensing condition is that an agent can create (or contribute to the creation of) a witness for the first type that occurs on the agenda in its information state. Such a licensing condition is expressed in (9).

(9) If A is an agent, si is A's current information state, and si :A [agenda=T|R : [RecType]], then T !A is licensed.
Update functions of the kind we have discussed are handled by the licensing conditions in (10).

(10) a. If f : (T1 → (T2 → Type)) is an update function, A is an agent, si is A's current information state, si :A Ti, Ti ⊑ T1 (and si : T1), then an event e :A T2 (and e : T2) licenses si+1 :A f(si)(e).

     b. If f : (T1 → Type) is an update function, A is an agent, si is A's current information state, si :A Ti, Ti ⊑ T1 (and si : T1), then si+1 :A f(si) is licensed.

(10a) is for the case where the update function requires an event in order to be triggered and (10b) is for the case where no event is required. There are two variants of licensing conditions which can be considered. One variant is where the licensing conditions rely only on the agent's judgement of information states and events occurring. The other variant is where, in addition, we require that the information states and events actually are of the types which the agent judges them to be of. (These conditions are represented in parentheses in (10).) In practical terms an agent has to rely on its own judgement, of course, and there is one sense in which any resulting action is licensed even if the agent's judgement was mistaken. There is another, stricter sense of license which requires the agent's judgement to be correct. In the real world, though, the only way we have of judging a judgement to be correct is to look at judgements by other agents. Licensing conditions will regulate the coordination of successfully realized games like fetch. They enable the agents to coordinate their activity when they both have access to the same objects of type Game and are both willing to play. The use of the word "license" is important, however. The agents have free will and may choose not to do what is licensed, and also may perform acts that are not licensed. We cannot build a theory that will predict exactly what will happen, but we can have a theory which tells us what kinds of actions belong to a game.
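The licensing conditions can be given a toy operational reading as well. In the sketch below (our own illustration; the function names and the dict encoding of information states are assumptions, not Cooper's notation), (9) selects the first agenda type for a creation act, and (10) gates the application of an update function on the agent's type judgements.

```python
# A minimal illustration of licensing conditions (9) and (10).
def licensed_act(agent_state):
    """(9): with agenda = [T | R], the creation act T! is licensed;
    returns ('create', T), or None when the agenda is empty."""
    agenda = agent_state["agenda"]
    return ("create", agenda[0]) if agenda else None

def apply_update(f, domain_check, state, event=None):
    """(10): trigger an update function only if the agent judges the
    current state to be of (a subtype of) the function's domain type;
    (10a) additionally consumes an event, (10b) does not."""
    if not domain_check(state):
        return None                      # update not licensed
    return f(state) if event is None else f(state)(event)
```

Whether the stricter variant holds (the judgements being correct, the parenthesised conditions in (10)) is not something the code can check; it only models the agent's own perspective.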
It is up to the agents to decide whether they will play the game or not. At the same time, however, we might regard whatever is licensed at a given point in the game as an obligation. That is, if there is a general obligation to continue a game once you have embarked on it, then whatever type is
placed on an agent’s agenda as the result of a previous event in the game can be seen as an obligation on the agent to play its part in the creation of an event of that type.
4 Conclusion
We have sketched the beginnings of a theory of action based on ideas from type theory. We believe that a theory of linguistic acts in terms of updating information states as discussed in the literature on dialogue can be seen as a development of such a basic theory and that this makes a connection between the kind of complex coordination involved in dialogue and coordination that is needed between non-linguistic agents who interact with each other and the rest of the physical world.
References

[Aus62] J. Austin. How to Do Things with Words. Oxford University Press, 1962. Ed. by J. O. Urmson.
[BP83] Jon Barwise and John Perry. Situations and Attitudes. Bradford Books. MIT Press, Cambridge, Mass., 1983.
[CDLL14a] Robin Cooper, Simon Dobnik, Shalom Lappin, and Staffan Larsson. A probabilistic rich type theory for semantic interpretation. In Cooper et al. [CDLL14b], pages 72–79.
[CDLL14b] Robin Cooper, Simon Dobnik, Shalom Lappin, and Staffan Larsson, editors. Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS). Association for Computational Linguistics, Gothenburg, Sweden, April 2014.
[Coo12] Robin Cooper. Type theory and semantics in flux. In Ruth Kempson, Nicholas Asher, and Tim Fernando, editors, Handbook of the Philosophy of Science, volume 14: Philosophy of Linguistics, pages 271–323. Elsevier BV, 2012. General editors: Dov M. Gabbay, Paul Thagard and John Woods.
[GCF14] Jonathan Ginzburg, Robin Cooper, and Tim Fernando. Propositions, questions, and adjectives: a rich type theoretic approach. In Cooper et al. [CDLL14b], pages 89–96.
[Gin94] Jonathan Ginzburg. An update semantics for dialogue. In Harry Bunt, editor, Proceedings of the 1st International Workshop on Computational Semantics, Tilburg University, 1994. ITK Tilburg.
[Gin12] Jonathan Ginzburg. The Interactive Stance: Meaning for Conversation. Oxford University Press, Oxford, 2012.
[GR12] Jeroen Groenendijk and Floris Roelofsen. Course notes on inquisitive semantics, NASSLLI 2012. Available at https://sites.google.com/site/inquisitivesemantics/documents/NASSLLI-2012-inquisitive-semantics-lecture-notes.pdf, 2012.
[Lar02] Staffan Larsson. Issue-based Dialogue Management. PhD thesis, University of Gothenburg, 2002.
[Lew79] David Lewis. Attitudes de dicto and de se. Philosophical Review, 88:513–543, 1979. Reprinted in [Lew83].
[Lew83] David Lewis. Philosophical Papers, Volume 1. Oxford University Press, 1983.
[ML84] Per Martin-Löf. Intuitionistic Type Theory. Bibliopolis, Naples, 1984.
[Mon74] Richard Montague. Formal Philosophy: Selected Papers of Richard Montague. Yale University Press, New Haven, 1974. Ed. and with an introduction by Richmond H. Thomason.
[Nin10] Dilip Ninan. De se attitudes: ascription and communication. Philosophy Compass, 5(7):551–567, 2010.
[NPS90] Bengt Nordström, Kent Petersson, and Jan M. Smith. Programming in Martin-Löf's Type Theory, volume 7 of International Series of Monographs on Computer Science. Clarendon Press, Oxford, 1990.
[Per79] John Perry. The problem of the essential indexical. Noûs, 13(1):3–21, 1979. Reprinted in [Per93].
[Per93] John Perry. The Problem of the Essential Indexical and Other Essays. Oxford University Press, 1993.
[Ran94] Aarne Ranta. Type-Theoretical Grammar. Clarendon Press, Oxford, 1994.
[Sch11] Philippe Schlenker. Indexicality and de se reports. In Claudia Maienborn, Klaus von Heusinger, and Paul Portner, editors, Semantics: An International Handbook of Natural Language Meaning, pages 1561–1604. de Gruyter, 2011.
[Sea69] John R. Searle. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, 1969.
A low-level treatment of generalised quantifiers in categorical compositional distributional semantics

Extended Abstract

Ondřej Rypáček¹* and Mehrnoosh Sadrzadeh²†
¹ University of Oxford, Oxford, UK ([email protected])
² Queen Mary University, London, UK ([email protected])
Abstract. We show how one can formalise quantifiers in the categorical compositional distributional model of meaning. Our model is based on the generalised quantifier theory of Barwise and Cooper. We develop an abstract compact closed semantics using Frobenius algebras for these quantifiers and instantiate the abstract model in vector spaces and in relations. The former is an example of the distributional, corpus-based models of language and the latter of the truth-theoretic ones. We provide explanations and toy examples and defer formal proofs and large-scale empirical validation to the full version of the paper.
1 Introduction
Distributional models of natural language are based on Firth's hypothesis that meanings of words can be deduced from the contexts in which they often occur [6]. One then fixes a context window of n words and computes frequencies of how many times a word has occurred in this window with other words. These models have been applied to various language processing tasks, for instance thesauri construction [5]. Compositional distributional models of meaning extend these models from words to sentences. The categorical models of this kind [4, 1] do so by taking into account the grammatical structure of sentences and the vectors of the words occurring in them. These models have proven successful in practical natural language tasks such as disambiguation, term/definition classification and phrase similarity; for example see [7, 8]. Nevertheless, it has been an open problem how to deal with meanings of logical words such as quantifiers and conjunctions. In this paper, we present preliminary work which aims to show how quantifiers can be dealt with using the generalised quantifier theory of Barwise and Cooper [2]. Very briefly put, according to the 'living on' property of generalised quantifier theory, the meaning of a sentence with a natural language quantifier Det, such as 'Det Sbj Verb', is determined by first taking the intersection of the denotation of Sbj with the denotation of

* Author supported by EPSRC Grant EP/I037512/1.
† Author supported by an EPSRC CAF grant EP/J002607/1.
subjects of the Verb, then checking if this is an element of the denotation of Det(Sbj). The denotation of Det is specified separately; for example, for Det = ∃ it is the set of non-empty subsets of the universe, for Det = two it is the set of subsets of the universe that have two elements, and so on. As a result, for example, the meaning of the sentence "some men sleep" is true if the set of men who sleep is non-empty, and the meaning of "two men sleep" is true if the set of men who sleep has cardinality 2. In what follows, we work in the categorical compositional distributional model of [4], treat natural language quantifiers as generalised quantifiers, and model them in a way similar to how we modelled relative pronouns in previous work [12, 13]. We first present a brief preliminary account of compact closed categories and Frobenius algebras over them and review how vector spaces and relations provide instances. Then, we develop a compact closed categorical semantics for quantifiers, in terms of diagrams and morphisms of compact closed categories and Frobenius algebras over certain objects of them. We present two concrete interpretations for this abstract setting: relations and vector spaces. The former provides the basis for a truth-theoretic model of meaning and the latter for a corpus-based distributional model of meaning. In the full article version of the current extended abstract, we provide details of interpretation maps, formal proofs, and results of large-scale empirical evaluations.
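The set-theoretic recipe just described can be replayed directly in code. The following sketch is ours, for intuition only; the helper names (`subsets`, `holds`, etc.) are assumptions, not the paper's notation.

```python
# Barwise-Cooper 'living on' evaluation of "Det Sbj Verb":
# the sentence is true iff (Sbj ∩ subjects-of-Verb) ∈ Det(Sbj).
from itertools import chain, combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def some(noun):
    """Det = ∃: the non-empty subsets of the noun's denotation."""
    return [a for a in subsets(noun) if a]

def exactly(n):
    """Det = n: subsets of the noun's denotation of cardinality n."""
    return lambda noun: [a for a in subsets(noun) if len(a) == n]

def holds(det, noun, verb):
    """True iff noun ∩ verb lies in det(noun)."""
    return frozenset(noun & verb) in det(noun)

men = {"m1", "m2"}
sleepers = {"m1", "c1"}
```

For instance, `holds(some, men, sleepers)` checks "some men sleep" by intersecting the two denotations and testing membership in the quantifier's family of sets.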
2 Preliminaries
This subsection briefly reviews compact closed categories and Frobenius algebras. For a formal presentation, see [9, 10]. A compact closed category, C, has objects A, B; morphisms f : A → B; and a monoidal tensor A ⊗ B that has a unit I, that is, A ⊗ I ≅ I ⊗ A ≅ A. Furthermore, for each object A there are two objects A^r and A^l and the following morphisms:

  ε^r_A : A ⊗ A^r → I    η^r_A : I → A^r ⊗ A
  ε^l_A : A^l ⊗ A → I    η^l_A : I → A ⊗ A^l

These morphisms satisfy the following equalities, sometimes referred to as the yanking equalities, where 1_A is the identity morphism on object A:

  (1_A ⊗ ε^l_A) ∘ (η^l_A ⊗ 1_A) = 1_A
  (ε^l_A ⊗ 1_{A^l}) ∘ (1_{A^l} ⊗ η^l_A) = 1_{A^l}
  (ε^r_A ⊗ 1_A) ∘ (1_A ⊗ η^r_A) = 1_A
  (1_{A^r} ⊗ ε^r_A) ∘ (η^r_A ⊗ 1_{A^r}) = 1_{A^r}

These express the fact that A^l and A^r are the left and right adjoints, respectively, of A in the 1-object bicategory whose 1-cells are objects of C.

A Frobenius algebra in a monoidal category (C, ⊗, I) is a tuple (X, δ, ι, µ, ζ) where, for X an object of C, the triple (X, δ, ι) is an internal comonoid, i.e. the following are coassociative and counital morphisms of C:

  δ : X → X ⊗ X    ι : X → I

Moreover, (X, µ, ζ) is an internal monoid, i.e. the following are associative and unital morphisms:

  µ : X ⊗ X → X    ζ : I → X
And finally the δ and µ morphisms satisfy the following Frobenius condition:

  (µ ⊗ 1_X) ∘ (1_X ⊗ δ) = δ ∘ µ = (1_X ⊗ µ) ∘ (δ ⊗ 1_X)

Informally, the comultiplication δ dispatches the information contained in one object into two objects, and the multiplication µ unifies the information of two objects into one.

Finite Dimensional Vector Spaces. These structures together with linear maps form a compact closed category, which we refer to as FdVect. Finite dimensional vector spaces V, W are objects of this category; linear maps f : V → W are its morphisms, with composition being the composition of linear maps. The tensor product V ⊗ W is the linear algebraic tensor product, whose unit is the scalar field of the vector spaces; in our case this is the field of reals ℝ. Here, there is a natural isomorphism V ⊗ W ≅ W ⊗ V. As a result of the symmetry of the tensor, the two adjoints reduce to one and we obtain the isomorphism V^l ≅ V^r ≅ V*, where V* is the dual space of V. When the basis vectors of the vector spaces are fixed, it is further the case that V* ≅ V.

Given a basis {r_i}_i for a vector space V, the epsilon maps are given by the inner product extended by linearity; i.e. we have:

  ε^l = ε^r : V ⊗ V → ℝ    given by    Σ_ij c_ij ψ_i ⊗ φ_j ↦ Σ_ij c_ij ⟨ψ_i | φ_j⟩

Similarly, the eta maps are defined as follows:

  η^l = η^r : ℝ → V ⊗ V    given by    1 ↦ Σ_i r_i ⊗ r_i

Any vector space V with a fixed basis {v⃗_i}_i has a Frobenius algebra over it, explicitly given as follows, where δ_ij is the Kronecker delta:

  δ : V → V ⊗ V    given by    v⃗_i ↦ v⃗_i ⊗ v⃗_i
  ι : V → ℝ        given by    v⃗_i ↦ 1
  µ : V ⊗ V → V    given by    v⃗_i ⊗ v⃗_j ↦ δ_ij v⃗_i
  ζ : ℝ → V        given by    1 ↦ Σ_i v⃗_i
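For intuition, the fixed-basis FdVect maps can be realised numerically. The sketch below is our own encoding (tensors in V ⊗ V are represented as n × n matrices over the fixed basis), not part of the paper's formal development.

```python
# Fixed-basis FdVect sketch: vectors live in R^n, epsilon is the inner
# product, eta produces sum_i r_i (x) r_i (the identity matrix), and the
# Frobenius maps copy/merge basis coefficients along the diagonal.
import numpy as np

n = 3

def epsilon(v, w):            # eps : V (x) V -> R
    return float(np.dot(v, w))

def eta():                    # eta : R -> V (x) V, as an n x n matrix
    return np.eye(n)          # sum_i r_i (x) r_i

def delta(v):                 # delta : V -> V (x) V, copy onto the diagonal
    return np.diag(v)

def mu(t):                    # mu : V (x) V -> V, keep only the diagonal
    return np.diag(t).copy()

def iota(v):                  # iota : V -> R
    return float(np.sum(v))

def zeta():                   # zeta : R -> V
    return np.ones(n)
```

Note that with this algebra `mu(delta(v))` returns `v`: merging a copied vector recovers it, which is the sense in which δ duplicates and µ unifies information.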
Relations. Another important example of a compact closed category is Rel, the category of sets and relations. Here, ⊗ is the cartesian product, with the singleton set as its unit I = {⋆}, and the adjoint operators are identity morphisms on objects. Closure reduces to the fact that a relation between the sets A × B and C is equivalently a relation between A and B × C. Given a set S with elements s_i, s_j ∈ S, the epsilon and eta maps are given as follows:

  ε^l = ε^r : S × S → {⋆}    given by    {((s_i, s_j), ⋆) | s_i, s_j ∈ S, s_i = s_j}
  η^l = η^r : {⋆} → S × S    given by    {(⋆, (s_i, s_j)) | s_i, s_j ∈ S, s_i = s_j}

Every object in Rel has a Frobenius algebra over it, given by the diagonal and codiagonal relations, as described below:

  δ : S → S × S    given by    {(s_i, (s_j, s_k)) | s_i, s_j, s_k ∈ S, s_i = s_j = s_k}
  µ : S × S → S    given by    {((s_i, s_j), s_k) | s_i, s_j, s_k ∈ S, s_i = s_j = s_k}
  ι : S → {⋆}      given by    {(s_i, ⋆) | s_i ∈ S}
  ζ : {⋆} → S      given by    {(⋆, s_i) | s_i ∈ S}

For the details of verifying that, for each of the two examples above, the corresponding conditions hold, see [3].
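The Rel instance can be spelled out just as concretely. In the sketch below (our own encoding; relations are Python sets of pairs, and names like `compose` are assumptions), composing δ with µ yields the identity relation on S, one instance of the Frobenius structure at work.

```python
# Rel sketch: a relation A -> B is a set of (a, b) pairs; the tensor is
# the cartesian product, and the unit is the one-element set {STAR}.
STAR = "*"
S = {"s1", "s2", "s3"}

epsilon = {((a, b), STAR) for a in S for b in S if a == b}   # S x S -> {*}
eta     = {(STAR, (a, b)) for a in S for b in S if a == b}   # {*} -> S x S
delta   = {(a, (b, c)) for a in S for b in S for c in S if a == b == c}
mu      = {((a, b), c) for a in S for b in S for c in S if a == b == c}
iota    = {(a, STAR) for a in S}
zeta    = {(STAR, a) for a in S}

def compose(r, s):
    """Relational composition: x (r;s) z iff there is y with x r y, y s z."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}
```

Here `compose(delta, mu)` gives the identity relation {(s, s) | s ∈ S}: copying an element and then merging the copies returns the element itself.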
2.1 String Diagrams
The framework of compact closed categories and Frobenius algebras comes with a complete diagrammatic calculus that visualises derivations, and which also simplifies the categorical and vector space computations. Morphisms are depicted by boxes and objects by lines, representing their identity morphisms. For instance, a morphism f : A → B is drawn as a box labelled f with an input wire A and an output wire B, and an object A with the identity arrow 1_A : A → A is drawn as a single wire labelled A. [Diagrams omitted.]

Morphisms from I to objects are depicted by triangles with strings emanating from them. In concrete categories, these morphisms represent elements within the objects. For instance, a vector a in a vector space A is represented by the morphism a : I → A and depicted by a triangle with one string. The number of strings of such triangles depicts the tensor rank of the vector space; for instance, a ∈ A is drawn as a triangle with one string A, a′ ∈ A ⊗ B with two strings A, B, and a″ ∈ A ⊗ B ⊗ C with three strings A, B, C. [Diagrams omitted.]

The tensor products of objects and morphisms are depicted by juxtaposing their diagrams side by side, whereas compositions of morphisms are depicted by putting one on top of the other; for instance, this is how the object A ⊗ B, the morphism f ⊗ g, and the composition f ∘ h are drawn, for f : A → B, g : C → D, and h : B → C. [Diagrams omitted.]

The ε maps are depicted by cups, the η maps by caps, and yanking by their composition and the straightening of the strings. For instance, there are diagrams for ε^l : A^l ⊗ A → I, η^l : I → A ⊗ A^l, and the yanking equality (ε^l ⊗ 1_A) ∘ (1_A ⊗ η^l) = 1_A. [Diagrams omitted.]

As for Frobenius algebras, the monoid morphisms (µ, ζ) and the comonoid morphisms (δ, ι) are depicted by dots with the appropriate numbers of incoming and outgoing wires, and the Frobenius condition is depicted as an equality between such diagrams. [Diagrams omitted.]

The defining axioms guarantee that any picture depicting a Frobenius computation can be reduced to a normal form that only depends on the number of input and output strings of the nodes, independent of the topology. These normal forms can be simplified to so-called 'spiders'. [Diagram omitted.]
3 Diagrammatic Compact Closed Semantics
Following the terminology and notation of [2], given a quantified phrase such as “Det noun”, we call “Det” a determiner. A quantified phrase is a noun phrase which is created by the application of a determiner to a noun phrase. We suggest the following diagrammatic semantics for a determiner Det:
[Diagram omitted: the determiner Det drawn over N with Frobenius copy (δ) and merge (µ) nodes and an η cap.]

It corresponds to the following compact closed categorical morphism:

  (1_N ⊗ δ_N) ∘ (1_N ⊗ Det ⊗ µ_N) ∘ (1_N ⊗ η_N ⊗ 1_N) ∘ η_N

The meaning of the sentence with a quantified phrase in a subject position, and its normalised form, are as follows:
[Diagrams omitted: the sentence diagram for "Det Sbj Verb Obj" with word triangles Sbj, Verb, Obj over N, N ⊗ S ⊗ N, N, and its normalised form.]

Here, "Sbj", "Obj", and "Verb" are elements within the objects N, N, and N ⊗ S ⊗ N, respectively. The symbolic representation of the normal form diagram is as follows:

  (ε_N ⊗ 1_S) ∘ (Det ⊗ µ_N ⊗ 1_S) ∘ (δ_N ⊗ 1_{N⊗S} ⊗ ε_N)(Sbj ⊗ Verb ⊗ Obj)
The intuitive justification is that the determiner Det first makes a copy of the subject (via the Frobenius δ map), so now we have two copies of the subject. One of these is being unified with the subject argument of the verb (via the Frobenius µ map). In set-theoretic terms this is the intersection of the interpretations of subject and subjects-of-verb. The other copy is being inputted to the determiner map Det and will produce a modified noun based on the meaning of the determiner. The last step is the application of the unification to the output of Det. Set theoretically, this step will decide whether the intersection of the subject-of-verb and the noun belongs to the interpretation of the quantified noun. The diagrams, morphisms, and intuitions for a quantified phrase in an object position are obtained in a similar way.
4 Truth-Theoretic Interpretation
For this part, we work in the category Rel of sets and relations. We take N to be the set of all subsets of individuals from a universal reference set U, that is, N = P(U). A common noun in N is hence taken to be the set of all subsets of its individuals. We take S to be the one-dimensional space; the origin 0 represents false and the vector 1 represents truth. A verb is the set of all non-empty subsets of a relation (corresponding to its predicate). For an intransitive verb, this relation is on the set N × S, where each relation corresponds to a subset of N, since we have N × S ≅ N. For a transitive verb, it is a relation on the set N × S × N ≅ N × N. We denote the set-theoretical semantics of a word "w" by [[w]] and its categorical representation as developed above by ⟦w⟧. Then we have the following:

  ⟦Sbj⟧ = {A_i | A_i ∈ P([[Sbj]])}
  ⟦Verb⟧ = {B_j | B_j ∈ P([[Verb]])}
  Det ⟦Sbj⟧ = {D_k | D_k ∈ Det([[Sbj]])}

where [[Sbj]] ⊆ U and [[Verb]] ⊆ U are the set-theoretic meanings of "Sbj" and "Verb", and Det(S) is the same as in the generalised quantifier approach. The truth-theoretic meaning of the sentence "Det Sbj Verb" is obtained by computing, in the category Rel, the morphism developed in Section 3. Here, since we have S = I, the morphisms that are applied to the object S are dropped; that is, we have:

  ε_N (Det ⊗ µ_N)(δ_N ⊗ 1_N)(⟦Sbj⟧ ⊗ ⟦Verb⟧)

This is computed in three steps. In the first step, we obtain:

  (δ_N ⊗ 1_N)(⟦Sbj⟧ ⊗ ⟦Verb⟧) = {(A_i, A_i) | A_i ∈ P([[Sbj]])} ⊗ {B_j | B_j ∈ P([[Verb]])}

In the second step, we obtain:
  (Det ⊗ µ_N)({(A_i, A_i) | A_i ∈ P([[Sbj]])} ⊗ {B_j | B_j ∈ P([[Verb]])})
  = Det({A_i | A_i ∈ P([[Sbj]])}) ⊗ {A_i | A_i = B_j, A_i ∈ P([[Sbj]]), B_j ∈ P([[Verb]])}
  = {D_k | D_k ∈ Det([[Sbj]])} ⊗ {A_i | A_i = B_j, A_i ∈ P([[Sbj]]), B_j ∈ P([[Verb]])}

In the final step, we obtain:

  {D_k | D_k ∈ Det([[Sbj]])} ⊗ {A_i | A_i = B_j, A_i ∈ P([[Sbj]]), B_j ∈ P([[Verb]])}
  = {⋆ | D_k = A_i, D_k ∈ Det([[Sbj]]), A_i = B_j, A_i ∈ P([[Sbj]]), B_j ∈ P([[Verb]])}
If the result is non-empty, the meaning of the sentence is true; else it is false.

Example. As an example, suppose we have two male individuals m1, m2 and a cat individual c1. Suppose further that the verb 'sneeze' applies to the individuals m1 and c1. Hence, we have the following interpretations for the lemmas of the words "man", "cat", and "sneeze":

  [[men]] = P({m1, m2})
[[cat]] = P({c1 })
[[sneeze]] = P({m1 , c1 })
Consider the following quantified phrases and their interpretations: Some [[men]] = {{m1 }, {m2 }, {m1 , m2 }}
One [[man]] = {{m1 }, {m2 }}
In the first step of the computation of the meaning of "some men sneeze", we obtain:

  {(∅, ∅), ({m1}, {m1}), ({m2}, {m2}), ({m1, m2}, {m1, m2})} ⊗ {∅, {m1}, {c1}, {m1, c1}}

In the second step, we obtain:

  Some({{m1}, {m2}, {m1, m2}}) ⊗ µ({∅, {m1}, {m2}, {m1, m2}} ⊗ {∅, {m1}, {c1}, {m1, c1}})
  = {{m1}, {m2}, {m1, m2}} ⊗ {∅, {m1}}

In the last step, we obtain the following:

  {{m1}, {m2}, {m1, m2}} ⊗ {∅, {m1}} = {⋆}
Hence, the meaning of the sentence is true. For the sentence “One man sneezes”, one applies (One ⊗ µ) to the result of the first step, which is as above. Hence, the second and third steps of computation are as follows:
  One({{m1}, {m2}, {m1, m2}}) ⊗ µ({∅, {m1}, {m2}, {m1, m2}} ⊗ {∅, {m1}, {c1}, {m1, c1}})
  = {{m1}, {m2}} ⊗ {∅, {m1}} = {⋆}
Due to lack of space we have not given the necessary exposition here, but as established in [3] and used in a similar context for relative pronouns in [12, 13], the Frobenius map µ is the analog of set-theoretic intersection and the compact closed epsilon map is the analog of set-theoretic application. It then follows that our truth-theoretic interpretation of the compact closed semantics of quantified sentences provides us with the same truth-theoretic meaning as their generalised quantifier semantics. The proof of this claim uses the 'living on' property of generalised quantifiers.
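The three-step computation of the example can be checked mechanically. The sketch below is our own encoding (the function names are assumptions): subjects and verbs denote powersets, µ is realised as intersection of the two families, and the final ε step is a non-emptiness check against the determiner's family.

```python
# Replaying the Rel computation of "Det Sbj Verb" concretely.
from itertools import chain, combinations

def powerset(xs):
    """All subsets of xs, as a set of frozensets."""
    xs = list(xs)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))}

def sentence_true(det_family, sbj, verb):
    sbj_f, verb_f = powerset(sbj), powerset(verb)
    unified = sbj_f & verb_f              # Frobenius mu: the A_i = B_j
    return bool(det_family & unified)     # eps: some D_k equals some A_i

men, cat, sneeze = {"m1", "m2"}, {"c1"}, {"m1", "c1"}
some_men = {frozenset({"m1"}), frozenset({"m2"}), frozenset({"m1", "m2"})}
one_man  = {frozenset({"m1"}), frozenset({"m2"})}
```

With these denotations, "some men sneeze" and "one man sneezes" both come out true, matching the derivations above.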
5 Concrete Corpus-Based Interpretation
In a concrete vector space model, built from a corpus using distributional methods, we assume that the meaning vector of the subject is Σ_i C_i n⃗_i ∈ N and the linear map corresponding to the verb is Σ_jk C_jk n⃗_j ⊗ s⃗_k ∈ N ⊗ S. For n⃗_i a basis vector of N, we define the map Det as follows:

  Det(n⃗_i) = Φ{w⃗ ∈ N | d(n⃗_i, w⃗) = α}    (1)

where Φ is a linear function and α indicates how close w⃗ is to n⃗_i and depends on the quantifier expressed by Det.

The intuitive reading of the above is that Det of a word n⃗_i is a linear combination of all the words that are α-close to n⃗_i. For instance, if Det is 'few', then α is a small number (closer to 0 than to 1), indicating that we are taking the combination of vectors that are not so close to n⃗_i. If Det is 'most', then α will be a large number (closer to 1 than to 0), indicating that we are taking the combination of vectors that are close to n⃗_i. Here we are interpreting the notion of closeness based on the number of properties the words share. The distance α can be learnt from a corpus using a relevant task. The underlying idea here is that the quantitative way of quantifying in set-theoretic models, which depends on the cardinality of the quantified sets, is transformed into a geometric way of quantifying, where the meaning of the quantified phrase depends on its geometric distance from other words. Hence, a quantified phrase such as 'few cats' returns a representative noun (obtained by taking the average of all such nouns) that is far from the vector of 'cat' in the semantic space. This representative noun shares 'few' properties with 'cat'. A quantified phrase such as 'most cats' returns a representative noun that is close to the vector of 'cat' and stands for a noun that shares 'most' of the properties of 'cat'. With this instantiation, the meaning of "Det Sbj Verb" is obtained by computing the following:
  (ε_N ⊗ 1_S) ∘ (Det ⊗ µ_N ⊗ 1_S) ∘ (δ_N ⊗ 1_{N⊗S})(Sbj ⊗ Verb)
In the first step of the computation we have:

  (δ_N ⊗ 1_{N⊗S})(Σ_i C_i n⃗_i ⊗ Σ_jk C_jk n⃗_j ⊗ s⃗_k) = (Σ_i C_i n⃗_i ⊗ n⃗_i) ⊗ (Σ_jk C_jk n⃗_j ⊗ s⃗_k)

In the second step we obtain:

  (Det ⊗ µ_N ⊗ 1_S)((Σ_i C_i n⃗_i ⊗ n⃗_i) ⊗ (Σ_jk C_jk n⃗_j ⊗ s⃗_k))
  = Σ_ijk C_i C_jk Det(n⃗_i) ⊗ µ(n⃗_i ⊗ n⃗_j) ⊗ s⃗_k
  = Σ_ijk C_i C_jk Det(n⃗_i) ⊗ δ_ij n⃗_i ⊗ s⃗_k

The final step is as follows:

  (ε_N ⊗ 1_S)(Σ_ijk C_i C_jk Det(n⃗_i) ⊗ δ_ij n⃗_i ⊗ s⃗_k) = Σ_ijk C_i C_jk ⟨Det(n⃗_i) | δ_ij n⃗_i⟩ s⃗_k
Example. As a distributional example, take N to be the two-dimensional space with the basis {n⃗_1, n⃗_2} and S the two-dimensional space with the basis {s⃗_1, s⃗_2}. Suppose the linear expansion of the subject is

  N⃗ := C_1 n⃗_1 + C_2 n⃗_2

and the linear expansion of the verb is

  V⃗P := C_11 (n⃗_1 ⊗ s⃗_1) + C_12 (n⃗_1 ⊗ s⃗_2) + C_21 (n⃗_2 ⊗ s⃗_1) + C_22 (n⃗_2 ⊗ s⃗_2)

Suppose further the following for the interpretation of the determiner:

  Det(Sbj) = Det(C_1 n⃗_1 + C_2 n⃗_2) = Det(C_1 n⃗_1) + Det(C_2 n⃗_2) = C′_1 n⃗_1 + C′_2 n⃗_2    (2)

Then the result of the first step of the computation of a meaning vector for the sentence 'Det Sbj Verb' is:

  (C_1 n⃗_1 + C_2 n⃗_2) ⊗ (C_11 n⃗_1 ⊗ s⃗_1 + C_12 n⃗_1 ⊗ s⃗_2 + C_21 n⃗_2 ⊗ s⃗_1 + C_22 n⃗_2 ⊗ s⃗_2)

In the second step of the computation we obtain:

  C_1 C_11 Det(n⃗_1) n⃗_1 ⊗ s⃗_1 + C_1 C_12 Det(n⃗_1) n⃗_1 ⊗ s⃗_2 + C_2 C_21 Det(n⃗_2) n⃗_2 ⊗ s⃗_1 + C_2 C_22 Det(n⃗_2) n⃗_2 ⊗ s⃗_2
Since Det is a linear map, the above is equal to the following:

  Det(C_1 n⃗_1)(C_11 n⃗_1 ⊗ s⃗_1 + C_12 n⃗_1 ⊗ s⃗_2) + Det(C_2 n⃗_2)(C_21 n⃗_2 ⊗ s⃗_1 + C_22 n⃗_2 ⊗ s⃗_2)

According to the expansion assumed in equation 2, the above is equivalent to the following:

  (C′_1 n⃗_1)(C_11 n⃗_1 ⊗ s⃗_1 + C_12 n⃗_1 ⊗ s⃗_2) + (C′_2 n⃗_2)(C_21 n⃗_2 ⊗ s⃗_1 + C_22 n⃗_2 ⊗ s⃗_2)

Substituting this in the last step of the computation provides us with the following vector in S:

  C′_1 C_11 s⃗_1 + C′_1 C_12 s⃗_2 + C′_2 C_21 s⃗_1 + C′_2 C_22 s⃗_2    (3)
Small Corpus-Based Witness. Large-scale experimentation for this model constitutes work in progress. For the sake of providing intuitions for the above symbolic constructions, we provide a couple of corpus-based witnesses here. In distributional models, the most natural instantiation of the distance d in equation 1 is the co-occurrence distance. For a noun 'n' and determiners 'few' and 'most', we define these generally as follows:

  few(n) = Avg{nouns that share few properties with n}
  most(n) = Avg{nouns that share most properties with n}

For the purpose of this toy example, the above can be instantiated in the simplest possible way as follows:

  few(n) = Avg{nouns that co-occurred with n few times}
  most(n) = Avg{nouns that co-occurred with n most times}

In this case, a sample query from the online Reuters News Corpus, with at most 100 outputs per query, provides the following instantiations:

  few(dogs) = Avg{bike, drum, snails}
  most(dogs) = Avg{cats, pets, birds, puppies}
  few(cats) = Avg{fluid, needle, care}
  most(cats) = Avg{dogs, birds, rats, feces}

A cosine-based similarity measure over this corpus shows that any of the words in the 'most(n)' set is more similar to 'n' than any of the words in the 'few(n)' set. This is indeed because the words in the former set are geometrically closer to 'n' than the words in the latter set, since they have co-occurred with it more often. This is the first advantage of our model over a plain distributional model, where words such as 'few' and 'most' are treated as noise and hence the meanings of phrases such as 'few cats', 'most cats', and 'cats' become identical (and similarly for any other noun). Moreover, in our setting we can establish that 'most cats' and 'most dogs' have similar meanings, because of the overlap of their determiner sets. A larger corpus and a more thorough statistical analysis will let us establish more: for instance, that 'few cats' and 'few dogs' also have similar meanings.
At the level of sentence meanings, compositional distributional models do not interpret determiners (e.g. see the model of [11]). As a result, meanings of sentences such as 'cats sleep', 'most cats sleep' and 'few cats sleep' become identical; the meanings of the sentences 'most cats sleep' and 'few dogs snooze' become very close, since 'cats' and 'dogs' often occur in the same context, and so do 'sleep' and 'snooze'. In our setting, equation 3 tells us that these sentences have different meanings, since their quantified subjects have different meanings. To see this, take cats = C_1 n⃗_1 + C_2 n⃗_2, whereas few(cats) = C′_1 n⃗_1 + C′_2 n⃗_2 and most(cats) = C″_1 n⃗_1 + C″_2 n⃗_2. Instantiating these in equation 3 provides us with the following three different vectors:

  cats sleep = C_1 C_11 s⃗_1 + C_1 C_12 s⃗_2 + C_2 C_21 s⃗_1 + C_2 C_22 s⃗_2
  few cats sleep = C′_1 C_11 s⃗_1 + C′_1 C_12 s⃗_2 + C′_2 C_21 s⃗_1 + C′_2 C_22 s⃗_2
  most cats sleep = C″_1 C_11 s⃗_1 + C″_1 C_12 s⃗_2 + C″_2 C_21 s⃗_1 + C″_2 C_22 s⃗_2

On the other hand, we have that 'most cats sleep' and 'most dogs snooze' have close meanings, both of which are close to 'pets sleep'. This is because their quantified subjects and their verbs have similar meanings, that is, we have:

  most(dogs) ∼ most(cats) ∼ pets, sleep ∼ snooze  ⟹  most cats sleep ∼ most dogs snooze ∼ pets sleep

At the same time, 'few cats sleep' and 'most dogs snooze' have less close meanings, since their quantified subjects have different meanings, that is:

  most(dogs) ≁ few(cats)  ⟹  most dogs snooze ≁ few cats sleep
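The vector computation in equation 3 can also be replayed numerically. In the sketch below all coefficients are made-up toy values (ours, purely for illustration): Det acts by rescaling each basis coefficient of the subject, and different rescalings for 'few' and 'most' yield different sentence vectors.

```python
# Numeric replay of equation (3): N and S are 2-dimensional, the verb is
# a matrix over N (x) S, and Det rescales each subject coefficient
# (C_i -> C'_i), so the sentence vector is sum_ik C'_i C_ik s_k.
import numpy as np

verb = np.array([[0.6, 0.4],     # C_11, C_12
                 [0.1, 0.9]])    # C_21, C_22
cats = np.array([0.8, 0.2])      # C_1, C_2

def quantify(det_scale, noun):
    """Det as a diagonal linear map on the fixed basis of N."""
    return det_scale * noun

def sentence(det_scale, sbj, verb):
    """Equation (3): sum over i, k of C'_i * C_ik * s_k."""
    return quantify(det_scale, sbj) @ verb

few  = np.array([0.1, 0.9])      # toy rescaling coefficients
most = np.array([0.9, 0.1])
```

As the discussion above predicts, `sentence(few, cats, verb)` and `sentence(most, cats, verb)` differ, whereas a plain model that ignores determiners would assign both sentences the same vector.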
References

[1] M. Baroni and R. Zamparelli. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Conference on Empirical Methods in Natural Language Processing (EMNLP-10), Cambridge, MA, 2010.
[2] Jon Barwise and Robin Cooper. Generalized quantifiers and natural language. Linguistics and Philosophy, 4(2):159–219, 1981.
[3] B. Coecke and E. Paquette. Introducing categories to the practicing physicist. In B. Coecke, editor, New Structures for Physics, volume 813 of Lecture Notes in Physics, pages 167–271. Springer, 2008.
[4] B. Coecke, M. Sadrzadeh, and S. Clark. Mathematical foundations for a compositional distributional model of meaning. Lambek Festschrift. Linguistic Analysis, 36:345–384, 2010.
[5] J. Curran. From Distributional to Semantic Similarity. PhD thesis, School of Informatics, University of Edinburgh, 2004.
[6] J. R. Firth. A synopsis of linguistic theory 1930–1955. In Studies in Linguistic Analysis. 1957.
[7] E. Grefenstette and M. Sadrzadeh. Experimental support for a categorical compositional distributional model of meaning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1394–1404, 2011.
CISUC/TR 2014-02
176
NLCS’14 / NLSR 2014
Joint Proceedings NLCS’14 & NLSR 2014, July 17-18, 2014, Vienna, Austria
[8] E. Grefenstette and M. Sadrzadeh. Prior disambiguation of word tensors for constructing sentence vectors. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1590–1601, 2013. [9] G.M. Kelly and M.L. Laplaza. Coherence for compact closed categories. Journal of Pure and Applied Algebra, 19:193–213, 1980. [10] Anders Kock. Strong functors and monoidal monads. Archiv der Mathematik, 23:113–120, 1972. [11] Jeff Mitchell and Mirella Lapata. Composition in distributional models of semantics. Cognitive Science, 34:1388–1439, 2010. [12] M. Sadrzadeh, S. Clark, and B. Coecke. The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation, 23:1293–1317, 2013. [13] M. Sadrzadeh, S. Clark, and B. Coecke. The Frobenius anatomy of word meanings II: possessive relative pronouns. Journal of Logic and Computation, doi:10.1093/logcom/exu027, 2014.
Part III
Contributed Papers NLSR
Deduction for Natural Language Access to Data

Cleo Condoravdi1, Kyle Richardson2, Vishal Sikka3†, Asuman Suenbuel3 and Richard Waldinger4∗

1 Department of Linguistics, Stanford University, Stanford, California, U.S.A. [email protected]
2 Institute for Natural Language Processing, University of Stuttgart, Stuttgart, Germany [email protected]
3 Office of the CTO, SAP Labs, Palo Alto, California, U.S.A. [email protected] († now at Infosys)
4 Artificial Intelligence Center, SRI International, Menlo Park, California, U.S.A. [email protected]
1 Introduction
We outline a general approach to automated natural-language question answering that uses first-order logic and automated deduction. Our interest is in answering queries over structured data resources. We are concerned with queries whose answer is not stored directly in a single database but rather must be deduced and computed from information provided by a number of resources, which may not have been designed to work together. While the obstacles to understanding natural language queries are formidable, we simplify the problem by limiting ourselves to a well-understood subject domain and a known set of data resources. Using domain knowledge, queries in natural language are mapped to a logical representation and interpreted using an automated reasoner over a logical theory with semantic links to target knowledge sources. Examples are drawn from a prototype system called Quest, which is being developed for a business enterprise question-answering application. Users of such a system can query complex databases without needing to know the structure of the target knowledge sources or to write programs in a database query language.
2 Motivation
∗ From NLSR 2014, the workshop on Natural Language Services for Reasoners.

Automated question answering has long been a major research topic in artificial intelligence. In particular, there is a large literature on natural language interfaces to
databases, which focuses on using ordinary language to query database content (for a review, see [1, 11]). Given a natural language question, the goal of a question-answering system is to find an answer using a combination of language processing and reasoning. In the automated reasoning community, the deductive aspects of question answering on structured data have long been understood (for discussion see [19]), but theorem-proving tools have not been widely used in recent mainstream natural-language processing tasks (some exceptions: [4, 5, 16]), let alone for question answering. In general terms, answering a question in this framework reduces to the problem of proving a theorem about the existence of entities that satisfy the conditions of the question, which are expressed in a formal logic. The role of the language-processing component is therefore to produce such a formal representation. In natural language processing proper, work on interfaces to databases has traditionally focused on transforming a natural language question to an expression in a formal database query language, such as SQL [8, 9] or SPARQL [18]. The expression is then used by a standard database management system to access the target knowledge source and retrieve the desired answer (which in essence solves the reasoning task). Usually, the query is applied to a single (fixed) database with a predefined semantics. More recently, research has centered on learning the transformation from language to formal queries using various forms of data supervision [20, 12, 9, 2]. In this recent work especially, there has been scant attention to using richer formal representation languages, and the available datasets continue to rely on formal database query languages for representing query semantics. Such approaches tend to ignore complex semantic phenomena (e.g.
quantifier scope, temporal and spatial reasoning), despite the appearance of such constructs in benchmark datasets (see [6] for discussion). A major drawback to these approaches is that a simple natural language query may be complex when represented in a language like SQL, due to limitations in the expressive power of such languages [1]. Furthermore, the way the user naturally formulates a query may be quite different from the way the database designer chooses to represent information that answers the question. It may require world knowledge to translate the user’s formulation of the query into the form preferred by the database. Once the query is reduced to an SQL program, it is hard to apply world knowledge to transform it. The problem is compounded when the answer depends on information obtained from several heterogeneous sources, or involves information that changes over time. Each source may have adopted its own representational conventions, and, when sources are to be composed, the information provided by one source may not be in the form required by another. Another obstacle is that natural language queries are hard to parse and interpret, especially when the target representations are richly defined and aim to model deeper aspects of linguistic meaning. Using a well-defined subject domain and set of knowledge resources, one can use this knowledge (e.g. about the entities and types of relations between entities) to help guide the parsing process, an idea which has gained some traction in recent work in NLP on semantic parsing (e.g. [2]). Using a reasoner further
User Query (a):
Pull up a list of large companies that have a debt of more than 5 million euros. In particular, I am interested in the nationalities of these clients.
(computer-naive user)

Sample Databases (b)

Accounting Database 1:
company       | debt      | location | time   | employed
SL Foods      | $10526... | CH       | 9.2007 | 5000
Maria Morgan  | $5000     | US       | 8.2006 | 150

Name Database 2:
ID | Name
CH | Switzerland
US | United States

Currency Conversion Database 3:
USD(x) = x | EU(x) = x * 0.74 | CH Franc(x) = x * 0.91 | RU Ruble(x) = x * 35.81

Figure 1: Illustration of the problem setting: a natural language query (a), along with a set of knowledge resources, or databases (b). See text for full description.
simplifies the task of interpreting the language, since assumptions can be associated with symbols in the formal representation rather than encoded directly into the language analysis. In the following sections, we outline our approach and show examples from a small prototype question-answering system called Quest, which is being built for the enterprise software company SAP. We focus largely on the representation side of the problem and the general mechanics of how the reasoning works and interacts with the natural language analysis.
3 Problem Setting
We are interested in questions whose answers are spread over different knowledge sources. Figure 1 provides an illustration of the problem, and includes an English query (a) and a set of (simplified) databases (b). The question is from the domain of dunning (or debt collection) being investigated in the Quest project. The colors show the rough correspondence between individual phrases in the English query and values in multiple databases. Some of these correspondences are direct; for example the word ‘company’ is directly related to the company field in Database 1. Other correspondences are less direct; for example the modifier ‘large’ is a qualitative notion that relates to the size of the particular item being modified; for companies, this is taken here to mean the number of employees. There are also more abstract relations between the words
(e.g. that the ‘company’ has the ‘debt’ as one of its properties), which in general are less direct and require further specification or computation. Our approach is based on logic, so the meaning of a query is expressed symbolically as a logical formula and answered by proving a theorem in the logic about the existence of entities that satisfy the formula. Figure 2a informally shows how the query in Figure 1 is represented in logic. The starred symbols and indentation show the logical symbols (in this case, quantifiers) and their scope, the words in bold are the types of entities, and the underlined symbols are the relations. These symbols are further defined in a subject domain theory (sketched in Figure 2b), which consists of axioms, defining relations and other background assumptions and knowledge (e.g. the meaning of ‘large’), and procedural attachments, which specify how to link symbols with procedures, such as look-up procedures in a structured database. A semantic parser is used to translate the English queries to first-order logic, and an automated reasoner is used to find proofs of the query, each of which indicates an answer. When a formula with a linked symbol appears in the search for a proof, the relevant database is consulted, and information from the database is imported into the proof, just as if it had been represented axiomatically. For example, the procedural attachment in Figure 2b specifies that an existential variable of sort company in the linked relation company-record can be replaced by an entry from the company field in Database 1. In addition, multiple databases might need to be consulted in order to solve the conjecture; e.g. knowing if the debt value (in dollars) in Database 1 is above the specified value (in euros) requires doing (a real-time) currency conversion using Database 3, specified in the euro procedural attachment. Another database relates the two-character country code CH to the name of the country Switzerland.
Procedural attachment allows a set of databases to act as a virtual extension of the subject domain theory. Qualitative relations are further defined by ordinary axioms and can be crafted to fit a particular domain or refined by asking follow-up questions (e.g. defining what a large company means). The labor in our system therefore is divided between producing first-order logic representations from natural language questions (the semantic parsing component) and using these representations to find answers by proving theorems (the reasoning component). In the sections below, we describe the details of each component in the Quest system and show sample representations.
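As an illustration of how procedural attachments could bridge the theory and the databases of Figure 1, the following sketch hard-codes the sample tables. The table layouts, the "large company" threshold of 1000 employees, and the helper names (company_record, EU) are assumptions for illustration only, not Quest's actual interface; the debt figure for SL Foods is the one given later in the text:

```python
# Sketch of procedural attachment over the sample databases of Figure 1.
# Thresholds and helper names here are illustrative assumptions.

DATABASE_1 = [  # Accounting: company, debt (USD), location, time, employed
    ("SL Foods", 105_263_552, "CH", "9.2007", 5000),
    ("Maria Morgan", 5_000, "US", "8.2006", 150),
]
DATABASE_2 = {"CH": "Switzerland", "US": "United States"}  # country codes

def EU(x):
    """Currency conversion (Database 3): dollars to euros."""
    return x * 0.74

def company_record():
    """Procedural attachment: rows are imported into the proof on demand,
    as if each were a ground instance of the company-record relation."""
    yield from DATABASE_1

# The query of Figure 1: large companies (assumed here: > 1000 employees)
# with a debt of more than 5 million euros, plus their nationality.
answers = [(company, DATABASE_2[loc])
           for (company, debt, loc, time, employed) in company_record()
           if employed > 1000 and EU(debt) > 5_000_000]
print(answers)  # [('SL Foods', 'Switzerland')]
```

The point of the design is that the theorem prover never sees the tables wholesale; rows enter the proof only when a formula mentioning the linked relation symbol arises.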
4 Natural Language Query Processing
The first component of our system translates English questions to a formal meaning representation, a task traditionally known as semantic parsing. The goal of the representation is to encode the domain-specific relations that hold, in addition to logical operators (e.g. quantifiers, negations) and their scope. This mapping is done by
Conjecture (a):

there’s some* company
  that company is large
  there’s some* debt
    the company has that debt
    there’s some* debt amount
      the debt amount is 5 million euros
      the debt is more than the debt amount
  there’s some* nationality
    there’s some* client
      the client has that nationality
      the client is equal to the company
Subject Domain Theory (b):

;; axioms
COMPANY has DEBT:
  exists company-record(?company, ?debt, ..) AND ?debt > $0.00
COMPANY is large:
  exists company-record(?company, .., ?size) AND ?size > some Y
X more than Y euros:
  euro(x) > Y

;; procedural attachments
company-record(?company, ?debt, ..):
  for some row in Database 1 (DB1)
    replace ?company with value in company field
    replace ?debt with value in debt field
    replace ?size with value in employed field
euro(?value):
  replace ?value with result of EU(?value)
Figure 2: A sketch of the interpretation process for Figure 1, including a logic-like representation of the query (or the theorem to be proved) and an informal description of the associated knowledge. The colored objects in the knowledge point to the database values shown in Figure 1, and the underlined words show the relations.
detecting patterns within the surface syntax, patterns that are defined by a domain expert and based largely on the background subject domain. These rules are used to rewrite interpretable syntactic patterns to flat semantic clauses. Below is a fragment of a question (1), along with sample tree mapping rules (a-d) and syntactic fragments (i) that match these patterns.

1. .. company with a debt of 100 million euros... show the nationality...

(a) (COMPANY, (with (DEBT)_NP)_PP)_NP => (COMPANY-HAS-DEBT company debt)
    (i) [company_COMPANY [with_P [debt_DEBT]_NP]_PP]_NP
(b) (DEBT, (..? (MONEY_AMOUNT)_NP)_..?)_? => (DEBT-HAS-VALUE DEBT MONEY_AMOUNT)
    (i) [debt_DEBT [of_P [euros_MONEY_AMOUNT]_NP]_PP]_NP
((input Show a company with a debt of more than 100 million dollars. What is the nationality of the client ?) (top_level company_5 0) (top_level company_6 0) (definite company_6) (definite nationality_8) (desired_answer company_5) (quant exists debt_3 sort debt) (quant exists company_5 sort company) (quant exists nationality_8 sort nationality) (quant exists company_6 sort company) (quant (complex_num more than 100000000 dollar) dollar_4 sort money_amount) (exists_group ex_grp_45 (debt_3 nationality_8)) (scopes_over nscope company_5 ex_grp_45) (in nscope debt_3 (company-has-debt company_5 debt_3)) (in nscope dollar_4 (debt-has-money-value debt_3 dollar_4)) (in nscope nationality_8 (nationality-of company_6 nationality_8)))
Figure 3: Output of the SAPL semantic parser. Relations are expressed in a flat clausal form. See text for more details, as well as a description of the mapping rules in (1) used for producing these representations.
(c) (NUMBER, (..? (MONEY_AMOUNT)_NP)_..?)_NP => (HAS-MONEY-VALUE MONEY_AMOUNT NUMBER)
    (i) [100 million_NUMBER euros_MONEY_AMOUNT]_NP
(d) (DEFINITE_MARKER, NATIONALITY)_NP => (DEFINITE NATIONALITY)
    (i) [the_DEFINITE_MARKER nationality_NATIONALITY]_NP

This transduction procedure is implemented in a prototype parser called SAPL, which is built on top of components from the SAP Inxight text analytics engine. At an early stage of processing, the text analytics engine is used to perform entity (bold subscripts) and part-of-speech tagging. SAPL then uses a bottom-up parsing strategy and incrementally matches rules to syntactic patterns annotated with their entity types. For example, in 1a, the words ‘company’ and ‘debt’ are tagged as COMPANY and DEBT respectively and detected in an NP pattern. The mapping rule specifies that there is a COMPANY-HAS-DEBT relation, which holds between a COMPANY and DEBT in a subordinate PP. Since the pattern matches, it is rewritten into the corresponding semantic representation. In general, rules can be written at varying levels of detail; for example, the regex-like operator ..? (zero or one of anything) under-specifies details about the associated syntax. Other rules are linguistic in nature; for example, (1d) is a mapping rule for marking definite descriptions, which is useful for later reasoning about equality and anaphora. Additional heuristic rules for handling scope and other logical operators are also added. Technically, the SAPL parser builds on earlier work that uses the PARC XLE system [15], and MT-style transfer rules for mapping syntactic structures to semantic clauses [7]. Rewrite rules of this kind are commonplace in computational semantics and
can range from rather simple pattern or template matching ([18]) to the use of formal systems such as tree transducers [14]. In the latter case, recent work on data-driven semantic parsing has focused on learning these transformation rules from data [12], often by employing techniques and formal models from statistical machine translation [20] and statistical parsing [2]. Given enough data, such learning methods could be used to learn the patterns shown in (1), cutting out the need for writing rules. Most of the mapping rules encode information directly from the subject domain theory used by the automated reasoner, which provides an overall conceptualization of the target domain. This knowledge guides the parsing and interpretation of the language, in the sense that constraints on the types of semantic relations that hold make it easy to eliminate certain structural ambiguities. For example, the PP ‘of 100 million dollars’ syntactically can attach to either ‘company’ or ‘debt’, but only the second attachment is allowed by our knowledge of the domain (similarly for ‘with debt’ attaching to ‘company’). The idea of using knowledge like this has been applied at a larger scale for semantic parsing, e.g. in [2], which uses knowledge from the large FreeBase ontology [3] (as well as answers) to build representations for open-domain questions. The full output of SAPL applied to a particular example is shown in Figure 3. In this case, the query has multiple sentences, and the second question is a follow-up on the first. Similarly to the representations used in [7], the entities in the patterns are expressed as skolem terms, and the relations (including scopal relations) are expressed in a flat clausal form, which mirrors the local syntactic structures that the patterns are rewritten from. This representation is then unpacked to have a more conventional first-order-logic form (shown in Figure 4), which is passed to the automated reasoner.
The reasoning also fills in more details missing from the analysis; e.g. it can deduce the equality of ‘company’ and ‘client’ given the clue that ‘client’ is definite in the representation.
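The rewrite step can be caricatured in a few lines. The tagged-string format and regex rules below are simplified stand-ins for SAPL's tree patterns, not its actual rule syntax:

```python
# Toy version of SAPL-style mapping rules: entity-tagged text is rewritten
# to flat semantic clauses. Tags and rule format are illustrative only.

import re

RULES = [
    # (a): COMPANY with DEBT => COMPANY-HAS-DEBT
    (r"(?P<c>\w+)/COMPANY with a (?P<d>\w+)/DEBT",
     "(COMPANY-HAS-DEBT {c} {d})"),
    # (b): DEBT of MONEY_AMOUNT => DEBT-HAS-VALUE
    (r"(?P<d>\w+)/DEBT of (?P<m>[\w ]+)/MONEY_AMOUNT",
     "(DEBT-HAS-VALUE {d} {m})"),
]

def parse(tagged: str):
    """Apply every rule everywhere it matches, collecting flat clauses."""
    clauses = []
    for pattern, template in RULES:
        for m in re.finditer(pattern, tagged):
            clauses.append(template.format(**m.groupdict()))
    return clauses

tagged = "company/COMPANY with a debt/DEBT of euros/MONEY_AMOUNT"
print(parse(tagged))
# ['(COMPANY-HAS-DEBT company debt)', '(DEBT-HAS-VALUE debt euros)']
```

Real mapping rules operate over parse trees with entity-typed constituents rather than flat strings, but the shape of the output (a bag of flat clauses over shared variables) is the same.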
5 Reasoning
SAPL produces an intermediate semantic representation. This representation provides a description of the query, including the logical relations and the connectives and quantifiers that relate them. A component called SAPL-to-SNARK converts this into logical form; this form is regarded as a conjecture that is submitted to the theorem prover SNARK [17]. The proof is conducted within the subject domain theory, whose axioms embody the knowledge of our subject domain experts. They define the meaning of the concepts in the query, specify the capabilities of the data sources, and provide the background knowledge necessary to relate them. The logical form is decomposed and transformed according to these axioms. In proving the logical form, SNARK uses machine-oriented inference rules, such as resolution (for general-purpose reasoning) and paramodulation (for reasoning about equality). It has special facilities for reasoning about space and time. Its procedural-
attachment mechanism allows us to link the axiomatic theory to structured data, and its answer-extraction mechanism constructs an answer to the query from the completed proof. Although SNARK is ideally suited, Quest could use any theorem prover that has comparable facilities. SNARK is a refutation procedure; it looks for inconsistencies in sets of logical sentences. In trying to prove a new conjecture, SNARK will negate the conjecture, add it to the set of axioms in the theory, and try to show that the resulting set is inconsistent. For this purpose, it will apply inference rules to the negated conjecture and the axioms. The resulting inferred sentences are added to the set of sentences. Inference rules are then applied to the new sentences and the axioms. The process continues until a contradiction is obtained. Because the inference rules are all logically sound, all the sentences obtained follow from the negated conjecture and the axioms. Assuming that the axioms are all true, this shows that the negated conjecture must be false, i.e., the conjecture itself must be true. The order in which the axioms are considered is controlled by a set of heuristic principles, which are built into SNARK. We supply orderings and weights on the symbols of the theory; this information focuses the attention of the system. For example, subgoals of lower weight are given higher priority than sentences of higher weight. Within a given logical sentence, relation symbols that are higher in the ordering are attended to before relation symbols that are lower. Since theorem proving is undecidable in general, we give SNARK a time limit (currently six seconds); if the theorem is not proved in that time, we cannot give an answer. None of the examples in this paper required more than that time; most were done in less than a second. The answer-extraction mechanism originated in program synthesis applications.
The query requires us to find entities that satisfy a complex relationship, which may have many conditions. The entities to be found are represented by variables that are existentially quantified in the logical form. In proving the theorem, the theorem prover replaces these variables by more complex terms; variables in those terms are replaced in turn by other terms, and so on. When the proof is complete, the composition of these replacements indicates an answer that can be extracted from the proof. Typically, there are many proofs for each theorem; each proof may yield a different answer. Quest assembles all these answers for presentation to the user. As we have seen, using the procedural-attachment mechanism, symbols in the theory may be linked to tables in the database (or indeed to arbitrary procedures). The database can be accessed as the proof is under way. When a formula with a linked symbol appears in the search for a proof, the relevant database is consulted, and information from the database is imported into the proof, just as if it had been represented axiomatically. For example, a procedural attachment might specify that an existentially quantified variable company can be replaced by an entry from the company field in the Company Record database, and the variable debt can be replaced by an entry from the debt field. Other symbols can be linked to other databases, e.g. the
Currency Conversion database. Mathematical functions and relations are linked directly to arithmetic procedures, so that these operations do not need to be performed by the theorem prover. This means that the theorem prover can access not only the knowledge that resides in its axioms, but also the contents of available data resources. The procedural-attachment mechanism also allows us to mimic some higher-order-logic inference steps. If, after the user has viewed the answers, there is a follow-up question, the logical form is modified to incorporate the new conditions, and the theorem is proved again.
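The refutation loop described above can be sketched for the propositional case. This toy resolution prover (the clause and literal encodings are our own) negates the conjecture and searches for the empty clause, as SNARK does at first order with sorts, paramodulation, unification, and ordering heuristics:

```python
# Toy propositional resolution refutation. Clauses are frozensets of string
# literals; "~p" is the negation of "p". This mirrors the loop described
# above, minus SNARK's sorts, orderings, and first-order unification.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses."""
    return [frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
            for lit in c1 if negate(lit) in c2]

def refute(axioms, conjecture_lit):
    """Negate the conjecture, add it to the axioms, and saturate."""
    clauses = set(axioms) | {frozenset({negate(conjecture_lit)})}
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                for r in resolve(a, b):
                    if not r:
                        return True   # empty clause: contradiction found
                    new.add(r)
        if new <= clauses:
            return False              # saturated without a contradiction
        clauses |= new

# Tiny theory: has_debt, and has_debt -> dunned; conjecture: dunned.
axioms = [frozenset({"has_debt"}), frozenset({"~has_debt", "dunned"})]
print(refute(axioms, "dunned"))  # True
```

Soundness of resolution is what licenses the final step of the argument in the text: if the negated conjecture together with true axioms yields the empty clause, the conjecture itself must be true.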
6 Example
As a specific example, we consider the query Show a company with a debt of more than 100 million dollars. What is the nationality of the client? Although for a human being this is straightforward, for the parser it is highly ambiguous. A person knows that the word “with” pertains to the company, because companies may have debt. The parser, however, also considers the possibility that a debt could be a way of showing something, like pictures in Show your ideas with pictures. By using subject domain knowledge, the parser concludes that a debt cannot be a way of showing a company and so “with” must apply to the company itself. Also, the subject domain knowledge tells us that in this context the word “client” is synonymous with “company.” Hence, “client” in the second sentence is the same thing as “company” in the first. Consequently, we replace all occurrences of “client” with “company.” The full semantic representation for this query is shown in Figure 3. SAPL introduces a quantified variable for each named entity in the query and indicates what kind of quantifier is to be used and what scoping is to be employed. In our example, all the quantifiers are existential, because the variables stand for entities to be found. The rows in (1) indicate that the variables company_5 and debt_3 are to be existentially quantified in the logical form, and these variables have sorts company and debt, respectively. 1.
(quant exists company_5 sort company) (quant exists debt_3 sort debt)
SAPL introduces appropriate semantic relations between these variables. For instance, in the representation, the row in (2) indicates that the variable company_5 is related to the variable debt_3 by the relation company-has-debt. Furthermore the notation requires that this relation be within the scope of the variable debt_3. 2.
(in nscope debt_3 (company-has-debt company_5 debt_3))
The logical form obtained from this representation is shown in Figure 4. In other words, we are seeking a company that has a debt whose value is more than a hundred million dollars; we are also to find the nationality of this company.
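The unpacking from flat clauses to a nested logical form can be sketched as follows; the clause tuples are a simplified stand-in for SAPL's output format:

```python
# Toy "unpacking" step from SAPL-style flat clauses to a nested logical
# form, in the spirit of Figures 3 and 4. The input format is simplified.

quants = [("exists", "company_5", "company"),
          ("exists", "debt_3", "debt")]
body = [("company-has-debt", "company_5", "debt_3")]

def unpack(quants, body):
    """Wrap the conjunction of body clauses in nested exists quantifiers,
    innermost quantifier last in the list."""
    form = "(and " + " ".join("(%s)" % " ".join(c) for c in body) + ")"
    for q, var, sort in reversed(quants):
        form = "(%s ((%s :sort %s)) %s)" % (q, var, sort, form)
    return form

print(unpack(quants, body))
# (exists ((company_5 :sort company))
#   (exists ((debt_3 :sort debt)) (and (company-has-debt company_5 debt_3))))
```

The real unpacking must also respect the explicit scope annotations (scopes_over, in nscope) of Figure 3 rather than a fixed quantifier order.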
(exists ((company_5 :sort company))
  (exists ((nationality_8 :sort nationality))
    (exists ((debt_3 :sort debt))
      (exists ((dollar_4 :sort money_amount))
        (and (is-debt debt_3)
             (is-company company_5)
             (nationality-of company_5 nationality_8)
             (company-has-debt company_5 debt_3)
             (more-than dollar_4 (* 100000000 dollar))
             (debt-has-money-value debt_3 dollar_4))))))
:answer (ans company_5 nationality_8 debt_3 dollar_4)
Figure 4: Reformatted representation used by the theorem-prover for query in Figure 3.
7 Proof
As part of the theorem-proving process, the logical form is negated, its quantifiers are removed by skolemization, and its variables are renamed. The expression is expanded according to the following axiom, expand-company-has-debt:

(==> (company-has-debt ?c.company ?d.debt)
     (& (company-record ?c.company ?d.debt ?dso.dso ?l.location ?size.number (now))
        (> ?d.debt 0)))

Here ?c.company is a variable of sort company; it can be replaced only by terms of sort company (or subsorts of this sort) during the proof. The relation company-record holds if its arguments are linked to a row in the database of companies; ?dso.dso is a detailed record of a company’s debt; ?size.number is the size of the company; (now) is the current date. Note that the axiom requires that, for a company to be said to have a debt, its entry in the debt column of the database must be nonzero. A company with zero debt (or negative debt) is not thought of as having a debt at all. Applying this axiom to the negated logical form results in a sentence that contains the negation of (company-record ?c.company ... (now)). The symbol company-record has a procedural attachment which can access rows in the database as the proof is underway. The variables in the expression will be replaced by concrete terms from the database. For instance, one such row refers to the Swiss company SL Foods Inc., which has a debt of 105,263,552 dollars. (The data, based on actual SAP data, has been garbled to protect the privacy of clients.) Consequently a new sentence will be obtained in which ?c.company has been replaced by SL Foods Inc., ?d.debt has been replaced by a debt of 105,263,552 dollars, and ?l.location has
been replaced by a two-letter code for Switzerland. The new sentence will contain the condition that the debt is greater than 100 million dollars. Other sentences will correspond to other companies in the database, with other debts and nationalities. These turn out to have debts of less than 100 million dollars. The only proof, then, will yield the answer SL Foods Inc. of Swiss nationality.
8 Other Examples
In this section we mention some examples that illustrate other capabilities. What clients have a high debt or a long-term debt? Concepts such as high and low, and long-term and short-term debts, are defined by axioms. Having a theorem prover allows us to deal with questions with logical operators, such as “or”, and quantifiers. Temporal reasoning allows us to determine the duration of a debt. For example, in Show clients with a high debt within the last two years, the logical form will posit the existence of a temporal interval that represents the notion of the last two years. The duration of this interval is two years and its finish-point is the current time “now.” Relevant axioms for these concepts include the axiom duration-of-time-interval,

(= (duration (make-interval ?t1.time-point ?t2.time-point))
   (minus-time ?t2.time-point ?t1.time-point))

and the axiom last-of-time-interval,

(iff (last ?t.time-interval)
     (= (now) (finish-time ?t.time-interval)))

(In other words, the duration of a time interval is the difference between its end-points, and an interval is a “last” interval if its finish-time is “now.”) Debts are themselves temporal intervals, and the specified debt is restricted in the logical form to be within the posited two-year temporal interval. For What companies do not have a low debt? the following axiom, uniqueness-of-debt, implies that the debt a company owes SAP at a given time is unique:

[(company-record ?c.company ?d1.debt ?dso1.dso ?l1.location ?size1.number ?t.time) &
 (company-record ?c.company ?d2.debt ?dso2.dso ?l2.location ?size2.number ?t.time)]
=> (= ?d1.debt ?d2.debt)

Thus, when Quest finds a company with one debt that is not low, it reasons that the company could not have a low debt.
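The two temporal axioms can be mirrored procedurally; the sketch below uses calendar dates, and the helper names are illustrative rather than SNARK's actual time library:

```python
# Procedural mirror of duration-of-time-interval and last-of-time-interval.
# Helper names and the use of calendar dates are illustrative assumptions.

from datetime import date

def make_interval(t1, t2):
    return (t1, t2)

def duration(interval):
    """The duration of an interval is the difference of its end-points."""
    t1, t2 = interval
    return t2 - t1

def is_last(interval, now):
    """An interval is a 'last' interval if its finish-time is 'now'."""
    return interval[1] == now

now = date(2014, 7, 17)                            # the time of asking
last_two_years = make_interval(date(2012, 7, 17), now)

print(is_last(last_two_years, now))   # True
print(duration(last_two_years).days)  # 730
```

A debt, itself an interval, then satisfies the "within the last two years" restriction exactly when it lies inside such a posited interval.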
In Every company does not have a low high debt, Quest can answer immediately in the affirmative because the definitions of low and high are contradictory; it does not need to consult the database. Can you show me Swiss companies with a debt? Show only those that owe more than 50 million euros. Quest knows that debt means the same as money owed. Currency conversion rates for 35 currencies for the date in question are obtained from the European Central Bank [www.ecb.europa.eu]. Two-letter country codes in the SAP database are associated with country names in an ISO database [www.iso.org/iso/country_codes.htm]; another table associates country names (e.g. Switzerland) with nationalities (e.g. Swiss). What country had the lowest debt within the last two years? Show only a client who owes more than 50 dollars. What is the nationality of the company? The database gives the date each debt was contracted. Quest uses temporal reasoning to make sure that date is within two years of “now,” the date the question is being asked. Companies that owe less than 50 dollars are not accepted as answers. Of all the remaining proofs, Quest will select the one yielding the lowest debt as the answer.
9 Future Work and Summary
There is much to do in the business enterprise domain that we have not yet approached. While we can find the largest or smallest item from a set of answers, we cannot find their sum or average. We do not yet handle amalgamation questions, such as What percentage of companies have a high debt?, which are quite useful. While we find answers in a few seconds, we could greatly improve the efficiency of Quest by making more use of the facilities of a database query system such as SQL. We would need to prove a more complex theorem, which would result in the synthesis of a more complex but faster database query.

We have found that knowledge and reasoning in a specific subject domain can greatly assist in natural language processing. However, the task of developing an axiomatic theory for the subject domain of interest can be very labor intensive. In the next phase of our research, we plan to use natural language to assist in the formation or extension of an axiomatic knowledge base. Quest translates a query into a logical form that captures its intended meaning. We can apply the same techniques to translate declarative sentences into axioms. Supplying this technology to a subject domain expert allows an axiomatic theory to be constructed or extended without the expert needing to know any logic. Playing the axiom back in English, as Quest does with queries, allows us to ensure that the correct meaning of the original English sentence has been captured. Theorem-proving techniques can also be brought to bear to detect whether any inconsistencies have been introduced into the theory. The domain expert may also be alerted to any surprising consequence of the new axioms. For instance, the system may warn the expert if a new axiom simplifies or
subsumes a great many other axioms in the theory; this can be a sign that the axiom is stronger than expected.

Rather than developing axioms from scratch, we can use an existing ontology, such as SUMO [13] or Cyc [10]. SUMO gives us some 80,000 axioms for common concepts. Every word in the online lexical database WordNet maps into some SUMO concept. While we usually need to introduce more axioms to fill in gaps in the understanding of our subject domain, we can use a standard ontology as a starting point. Adopting a standardized representation allows us to take advantage of other efforts in axiomatizing the same subjects.

We can also partially automate the formation of new procedural attachments. A domain expert who is also familiar with the structure of a relevant database can assist in linking a new relation symbol with a database table. Each column in the database (e.g. size, location) must be associated with a concept in the ontology. We may then automatically generate code that accesses that table when the linked relation symbol occurs in the search for a proof. The domain expert may also provide declarative sentences that relate common relations in the subject domain to database concepts. This is an iterative process: when the system fails to answer a question, the domain expert may need to provide a new sentence, which corresponds to a missing axiom. At no point does the domain expert need to understand the underlying logical language.
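Such a procedural attachment can be sketched as follows. The table layout, relation name, and accessor below are hypothetical, invented for illustration, and do not reflect Quest's actual implementation:

```python
import sqlite3

# Toy stand-in for the SAP table; the column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_record (company TEXT, debt REAL, location TEXT)")
conn.executemany("INSERT INTO company_record VALUES (?, ?, ?)",
                 [("Acme", 60e6, "CH"), ("Globex", 2e6, "DE")])

# Mapping from an ontology relation symbol to a table and its columns.
ATTACHMENTS = {
    "company-record": ("company_record", ["company", "debt", "location"]),
}

def lookup(relation, **constraints):
    """Generated accessor: called when the linked relation symbol
    occurs during the search for a proof."""
    table, columns = ATTACHMENTS[relation]
    where = " AND ".join(f"{c} = ?" for c in constraints)
    sql = f"SELECT {', '.join(columns)} FROM {table}" + (f" WHERE {where}" if where else "")
    return conn.execute(sql, tuple(constraints.values())).fetchall()

print(lookup("company-record", location="CH"))
```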
10 Acknowledgements
The authors would like to thank Matthias Anlauf, Butch Anton, Danny Bobrow, Ray Perrault, the late Mark Stickel, James Tarver, Mabry Tyson, and Michael Wessel for discussions that helped formulate the ideas in this paper and get them to work.
References

[1] Ion Androutsopoulos, Graeme D. Ritchie, and Peter Thanisch. Natural language interfaces to databases: an introduction. Natural Language Engineering, 1(1):29–81, 1995.
[2] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on Freebase from question-answer pairs. In EMNLP, pages 1533–1544, 2013.
[3] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250, 2008.
[4] Johan Bos. Three stories on automated reasoning for natural language understanding. In Proceedings of ESCoR (IJCAR Workshop), pages 81–91, 2006.
[5] Johan Bos. Wide-coverage semantic analysis with Boxer. In Proceedings of the 2008 Conference on Semantics in Text Processing, pages 277–286, 2008.
[6] Philipp Cimiano and Michael Minock. Natural language interfaces: what is the problem? A data-driven quantitative analysis. In Natural Language Processing and Information Systems, pages 192–206, 2010.
[7] Richard Crouch. Packed rewriting for mapping semantics to KR. In Proceedings of the 6th International Workshop on Computational Semantics, pages 103–114, 2005.
[8] Anette Frank, Hans-Ulrich Krieger, Feiyu Xu, Hans Uszkoreit, Berthold Crysmann, Brigitte Jörg, and Ulrich Schäfer. Querying structured knowledge sources. In Proceedings of the AAAI-05 Workshop on Question Answering in Restricted Domains, pages 10–19, 2005.
[9] Alessandra Giordani and Alessandro Moschitti. Semantic mapping between natural language questions and SQL queries via syntactic pairing. In Natural Language Processing and Information Systems, pages 207–221, 2010.
[10] Douglas Lenat. Cycorp: Home of smarter solutions. Online, 2014. http://www.cyc.com.
[11] Mark T. Maybury, editor. New Directions in Question Answering. AAAI Press, 2004.
[12] Raymond J. Mooney. Learning for semantic parsing. In Proceedings of Computational Linguistics and Intelligent Text Processing, pages 311–324. Springer, 2007.
[13] Adam Pease, Ian Niles, and John Li. The Suggested Upper Merged Ontology: A large ontology for the semantic web and its applications. Online, 2002. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.6817.
[14] Adam Purtee and Lenhart Schubert. TTT: A tree transduction language for syntactic and semantic processing. In Proceedings of the EACL Workshop on Applications of Tree Automata Techniques in Natural Language Processing, pages 21–30, 2012.
[15] Kyle Richardson, Daniel G. Bobrow, Cleo Condoravdi, Richard J. Waldinger, and Amar Das. English access to structured data. In Proceedings of the IEEE International Conference on Semantic Computing, pages 13–20, 2011.
[16] Lenhart Schubert. From treebank parses to episodic logic and commonsense inference. In Proceedings of the ACL Workshop on Semantic Parsing, pages 55–60, 2014.
[17] Mark E. Stickel, Richard J. Waldinger, and Vinay K. Chaudhri. A guide to SNARK, 2000.
[18] Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st International Conference on World Wide Web, pages 639–648, 2012.
[19] Richard Waldinger. Whatever happened to deductive question answering? In Logic for Programming, Artificial Intelligence, and Reasoning, pages 15–16, 2007.
[20] Yuk Wah Wong and Raymond J. Mooney. Learning for semantic parsing with statistical machine translation. In Proceedings of HLT-NAACL, pages 439–446, 2006.
Exposing Predictive Analytics through Natural Language

Jeroen van Grondelle (1), Christina Unger (2), and Frank Smit (1)

(1) HU University of Applied Sciences Utrecht, [email protected]
(2) CITEC, Bielefeld University, [email protected]

Abstract
Data processing applications often have non-experts as their audience, which has led to increasing interest in verbalizing data in natural language. However, verbalizations often focus on observed or predicted facts, while predictive technologies additionally generate information such as probabilities and confidence scores. This gives rise to data that goes beyond facts alone and that also has to be communicated to users in order to enable a correct interpretation of the predictions. In this paper we explore the natural language needed when applications offer predictive analytics to their users, and demonstrate how to modularly implement the grammars needed to verbalize common aspects of predictions.
1 Introduction
In order to allow non-expert users to access data, a lot of study has gone into applying natural language generation to the verbalization of knowledge bases and databases [1], recently in the context of the Semantic Web [2]. However, data services often include some form of reasoning engine or predictive analytics. As a result, the dialog between user and system includes inferred data and predictions. With that, a number of aspects enter the dialog beyond simple fact assertion. For example, predictive techniques create statements about specific future expectations, and often include additional information on, e.g., the confidence, probability and provenance of the predicted facts. These aspects have subtle semantics that non-statistically trained people are not familiar with, but that do impact which expectations are justified based on the predictions. Sharing these formal attributes with the users of predictive services is important to ensure a correct interpretation of the results [3].

In order to enhance the accessibility as well as comprehensibility of predictive data for non-expert users, this paper proposes a number of natural language verbalizations for different kinds of predictive data. In a proof of concept, different machine learning algorithms are applied to flight delay data. Based on the LT3 framework [6], modular grammars are developed and used to generate verbalizations of flight delay predictions. These grammars comprise both a domain grammar module for verbalizations of flight delays and a generic grammar module covering predictive aspects that can be coupled with any other domain.
2 Verbalizing Predictive Data
Predictive analytics techniques discover and analyse patterns in data about historical behaviour and events, and use them to predict likely future events and behaviour, for example the revenue a customer is going to generate, expected house prices, or flight delays. The underlying technologies often make some form of generalisation over the data they train on, in order to make predictions on unseen data points. The basic data of predictive analytics are thus future facts, for instance the time by which the departure of a flight will be delayed. In its most basic form, a fact like The departure of the flight is delayed by 10 minutes can be turned into a prediction of a future fact by turning it into simple future tense: The departure of the flight will be delayed by 10 minutes.

In addition, predictive techniques can produce data on aspects that enrich a prediction with further information describing the prediction or the way it was reached, the specific semantics of which depends on the type of predictive model that is employed. In this paper we focus on the following aspects, which we identified as common across a number of predictive techniques:

Interval In case of continuous numerical data, a classifier can be used to predict a discretized class, i.e. an interval of values. Also, a regression technique could provide a margin, e.g. based on its error rate, in order to offer an interval prediction instead of an exact value, thereby encoding its inability to make a precise prediction. Intervals can be verbalized either by means of their limits (e.g. The departure of the flight will be delayed between 10 and 20 minutes) or by means of a name or description (e.g. The departure of the flight will be slightly delayed).

Probability Some predictive techniques, such as Naive Bayes classification, inherently produce probabilistic predictions, i.e. they predict the probability of a certain fact materializing.
That fact not materialising in a certain individual case by no means disqualifies the prediction; instead, a more detailed statistical analysis across multiple instances is needed to check whether the observed frequency of the fact can be reconciled with the predicted probability. Similar to intervals, the probability of a prediction can be expressed as an exact value (e.g. There is a chance of 83% that the departure of the flight will be delayed by 10 minutes) or as a description of the probability range expressed by an adjective or adverb (e.g. There is a high chance that the departure of the flight will be delayed by 10 minutes or The departure of the flight will most likely be delayed by 10 minutes). In addition, instead of trying to verbalize a probability in the context of a single occurrence, the very nature of probability suggests the alternative of verbalizing the effect of that probability across occurrences. This, again, can be done in an exact way, as a percentage of known cases (e.g. The departure of similar flights is delayed by 10 minutes in 83% of the cases), or in a descriptive way, codifying the
percentage by adverbs such as often (e.g. The departure of similar flights is often delayed for 10 minutes).
Confidence In addition, a measure of confidence can be associated with a prediction, specifying to what extent the prediction can be trusted. Its exact implementation depends on the type of classifier or regressor. In case no confidence measure is implemented, the prediction accuracy could serve as an indicator of confidence. In contrast to a probabilistic prediction, an individual prediction with an attached confidence value must be interpreted as false if it does not prescribe the actual future behaviour. In order to convey the semantics of a confidence associated with a prediction, it is important to clearly distinguish it from the prediction itself and its probability. This is most easily done by not verbalizing the predicted future event as a fact, but rather expressing the potentiality of the event using different modalities, indicated by modal markers such as certainly and maybe, modal auxiliaries (e.g. The departure of the flight might be delayed for 10 minutes), modal constructions (such as It seems like the departure of the flight will be delayed for 10 minutes), or speaker attitudes, e.g. by means of constructions such as I am convinced that..., I believe that..., I am undecided whether....

Some predictive techniques may use more than one of these classes of information in their predictions, e.g. predict a probability for a variable to be in a certain interval. The forms for the different kinds of predictive data can then be combined if the predictive technology produces results that have more than one of these properties associated with them.

In order to create a modular grammar resource for predictive verbalizations that facilitates porting those verbalizations across domains and languages, we use the LT3 framework introduced in [6]. Most importantly, its architecture decouples domain aspects (e.g. flight delays or customer revenues) from task aspects (e.g. predictions and dialog aspects).
This allows for an easy coupling of the predictive task with any domain of data. Moreover, its architecture relies on automatic generation of grammars from lexical resources that require less effort and linguistic expertise to create.
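To make the combination of these aspects concrete, the following minimal Python sketch composes verbalizations of a predicted fact with probability and attitude wording. It is our own illustration, not the GF-based implementation used in the prototype; the function names loosely mirror the aspect names above:

```python
def predict(fact_future):
    """A domain fact already rendered in future tense."""
    return fact_future

def lower_first(s):
    # Helper: lowercase the sentence-initial article for embedding.
    return s[0].lower() + s[1:]

def probability_exact(prediction, percent):
    return f"There is a chance of {percent} percent that {lower_first(prediction)}"

def probability_descr(prediction, level):
    adjective = {"Low": "slight", "Medium": "moderate", "High": "high"}[level]
    return f"There is a {adjective} chance that {lower_first(prediction)}"

def attitude(prediction, stance="expect"):
    # Speaker-attitude wording for a confidence-bearing prediction.
    return f"I {stance} that {lower_first(prediction)}"

p = predict("The departure will be delayed for 10 minutes")
print(probability_exact(p, 83.0))
print(probability_descr(p, "High"))
print(attitude(p))
```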
3 Case: Predicting and Communicating Flight Delays
In this section we instantiate LT3 for the use case of predicting flight delays. To this end, we realize a prototype for training common machine learning algorithms on flight delay data, predicting the delay of single flight instances, and building verbalizations that express the resulting prediction in natural language. The prototype is realized in Python (Version 2.7, http://python.org/) and is available at http://bitbucket.org/jcgronde/exposingpredictiveanalytics. It imports predictive techniques from the
Scikit-learn Toolkit (Version 0.14, http://scikit-learn.org/) and builds on Grammatical Framework (GF, Version 3.5, http://grammaticalframework.org/) [5] for natural language generation. The prototype slices the data, picking one (either specified or random) data point for prediction and all other data points for training, producing a prediction and a corresponding GF representation as output. The latter is piped through the GF shell for linearization in two languages: English and Dutch.

We applied the system to the flight records provided by RITA (http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp; data at http://stat-computing.org/dataexpo/2009/the-data.html), which capture information about U.S. domestic flights from 1987 on, including (among others) date information, the origin and destination of the flight, its carrier, the scheduled departure and arrival times, the departure and arrival delay as well as delay causes. As accurate prediction is not a goal in itself for our research, we have considered only a subset of the data, in particular all flights from 2008 that originate from one of three airports (John F. Kennedy International, Dallas Love Field, Honolulu International) and have specified delay causes, amounting to 53,817 flight instances. In general, the delays peak at up to 40 minutes, leaving a long tail of longer delays.

For training and prediction we have taken into account the following columns (disregarding all others): month, day of month, day of week, scheduled departure and arrival time, unique carrier code, origin and destination airport, travelled distance, departure and arrival delay (in minutes), as well as delay causes. In addition, we add columns that pool the values of the month, day of week, distance and delay columns into coarse classes, in particular no for delays of 0 minutes, low for delays from 1 to 20 minutes, medium for delays between 21 and 40 minutes, and high for delays greater than 40 minutes. The departure and arrival delays are the target columns to be predicted; the others are used as input for the prediction.
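The pooling of exact delays into coarse classes can be written down directly; a minimal Python sketch of the thresholds just described (the function name is ours):

```python
def delay_class(minutes):
    """Pool an exact delay (in minutes) into the coarse classes
    used as classification targets: no, low, medium, high."""
    if minutes <= 0:
        return "no"       # delays of 0 minutes
    elif minutes <= 20:
        return "low"      # 1 to 20 minutes
    elif minutes <= 40:
        return "medium"   # 21 to 40 minutes
    else:
        return "high"     # greater than 40 minutes

print([delay_class(m) for m in (0, 10, 35, 95)])
```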
Predictions are made with a Gaussian Naive Bayes classifier, a linear SVM classifier, a decision tree classifier, and a k-nearest neighbors regressor with k = 3. For arrival and departure delays as prediction targets, the regressor predicts an exact delay value, while the classifiers predict one of the above-mentioned coarse classes (no, low, medium, high).
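The training step then follows the standard scikit-learn pattern. The sketch below is our own illustration on toy data, not the prototype's actual feature encoding; two of the four model types mentioned above are shown:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsRegressor

# Toy numeric features: (month, day_of_week, scheduled_dep_hour, distance)
X = [[1, 2, 8, 400], [6, 5, 17, 2500], [12, 1, 22, 800], [3, 3, 7, 400]]
y_class = ["no", "high", "medium", "no"]  # coarse departure-delay classes
y_exact = [0, 95, 35, 5]                  # exact delays in minutes

clf = GaussianNB().fit(X, y_class)                        # predicts a coarse class
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_exact)  # predicts an exact value

unseen = [[1, 2, 9, 450]]
print(clf.predict(unseen)[0])         # one of "no"/"low"/"medium"/"high"
print(float(reg.predict(unseen)[0]))  # an exact delay estimate in minutes
```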
3.1 Lexical Resources and Domain Grammar
For generating verbalizations, we exploit the LT3 framework [6], which builds on a pipeline that generates grammar resources from conceptualizations and corresponding declarative lexicalizations. To apply this framework, we therefore construct a domain ontology that captures flights together with the properties captured by the dataset, in particular their origin and destination airport, their carrier, the distance they span, their scheduled departure and arrival time, as well as the departure and arrival delays.

The lexical representations of those domain aspects are captured in a lemon [4] lexicon (http://lemon-model.net). lemon is a model for the declarative specification of multilingual, machine-readable lexica in RDF that capture syntactic and semantic aspects of lexical items relative to some ontology. The meaning of a lexical item is given by reference to an ontology element, i.e. a class, property or individual, thereby ensuring a clean separation between the ontological and lexical layers. Both the ontology and the lexica in English and Dutch are available at Bitbucket: http://bitbucket.org/chru/lexica.

The domain ontology and the corresponding lexica then serve as input for lemongrass (http://bitbucket.org/chru/lemongrass), a grammar generation script that automatically constructs a GF grammar, mapping the ontology to an abstract syntax and each lexicon to a concrete syntax. Since the lexicon only contains domain-specific expressions (mainly nouns, verbs and adjectives), but no domain-independent functional expressions such as determiners, negation and coordination, the resulting domain grammar alone is incomplete for the purposes of a full-fledged dialog. It is thus combined with a core grammar module (available for English and Dutch at http://bitbucket.org/chru/grammars) containing domain-independent expressions as well as clause-building constructions.

This combined grammar now allows us to express domain facts, e.g. the fact that the departure of a flight is delayed for 10 minutes. The part of the domain-independent core grammar module that contains numerals and units allows for expressing either exact values (10 minutes) or vaguer descriptions (a few minutes or slightly). Examples 1–3 show the corresponding verbalizations in English and Dutch.

1. The departure is delayed for 10 minutes.
   De vlucht vertrekt met een vertraging van 10 minuten.
2. The departure is delayed for a few minutes.
   De vlucht vertrekt met een vertraging van enkele minuten.
3. The departure is slightly delayed.
   Het vertrek is licht vertraagd.
Similarly, the grammar module for numericals allows expressing intervals (e.g. between 10 and 20 minutes) as well as interval limits, for example:

4. The departure is delayed for at least 40 minutes.
   De vlucht vertrekt met een vertraging van tenminste 40 minuten.
Since the verbalization of exact and descriptive numerical values is contained in a domain-independent grammar module, it can be used in each domain that makes numerical statements, as well as in a task grammar module, e.g. for verbalizing exact or descriptive numerical probabilities, as shown below.
3.2 Generating Verbalizations of Predictive Data
Verbalization aspects of predictive analytics are captured in a domain-independent task grammar module (available for English, Dutch and German at http://bitbucket.org/chru/grammars), which, for now, is hand-crafted. It captures functions for building predictions from domain statements, as well as functions for attaching probability and confidence information to such predictions. The GF representation d of a domain fact can now be embedded in the task-specific prediction function predict, turning a domain fact into a simple prediction of a future fact by changing the domain verbalization into future tense, as shown in the following GF representation with corresponding verbalizations.

5. predict d
   The departure will be delayed for 10 minutes.
   De vlucht zal met een vertraging van 10 minuten vertrekken.
Analogously, any other domain fact can be embedded in the prediction, in order to yield descriptive verbalizations such as The departure will be delayed for a few minutes or The departure will be slightly delayed, and verbalizations specifying interval limits, for example The departure will be delayed for at least 40 minutes.
In addition to plain predictions of future facts, the task grammar defines functions that add probabilistic information to predictions p of the form predict d (with d any domain statement), either using exact values or descriptions, as in the following examples.

6. probability_exact p (value_float_unit 83.0 Percent)
   There is a chance of 83.0 percent that the departure will be delayed for 10 minutes.
   Er is een kans van 83.0 procent dat de vlucht met een vertraging van 10 minuten vertrekken zal.
7. probability_descr p High
   There is a high chance that the departure will be delayed for 10 minutes.
   Er is een hoge kans dat de vlucht met een vertraging van 10 minuten vertrekken zal.
Example 7 uses a descriptive measure (Low, Medium, or High) that can be verbalized either as an adjective (in English slight, moderate and high; in Dutch licht, redelijk and hoog) or as an adjective modifier (in English slightly, moderately and significantly; in Dutch licht, redelijk and flink). Similarly, the function confidence adds confidence information to a prediction p, expressing a confidence measure by means of attitudes, as in the following example.

8. attitude p Expect
   I expect that the departure will be delayed for 10 minutes.
   Ik verwacht dat de vlucht met een vertraging van 10 minuten vertrekken zal.
4 Future Work
In this paper, we have proposed generic verbalizations of common information that is relevant for predictions, in particular intervals, probabilities and confidence measures. This, however, does not yet cover all aspects and information that predictive techniques offer. In order to base decisions on predictions, it would, for example, be necessary to also verbalize provenance information, such as explanations of the produced prediction in terms of the prediction model, or information about the general accuracy of a predictor. Another aspect that can enrich verbalizations are expectations, e.g. conveying whether delays are expected for a particular airport or carrier, or whether they are surprising. Similarly, we could verbalize conditional predictions that explicitly name factors that often change and thus might still alter the prediction (e.g. If the weather conditions do not change, the departure of the flight will not be delayed). Finally, alternative forms such as exact numerical vs. descriptive verbalizations could be used to address different audiences, e.g. with different levels of expertise.

In addition to an extension of the verbalized aspects, future work will include an evaluation of the performance of the proposed verbalizations, investigating whether non-statistically trained users understand the sentences and their consequences, and whether users experience the actual outcomes to be in line with the expectations they got from the verbalized predictions.

Acknowledgments This work was partially funded by the EU project PortDial (FP7-296170).
References

[1] Ion Androutsopoulos. Natural language interfaces to databases: an introduction. Journal of Natural Language Engineering, 1:29–81, 1995.
[2] Nadjet Bouayad-Agha, Gerard Casamayor, and Leo Wanner. Natural language generation in the context of the Semantic Web. Semantic Web, 2013.
[3] Anthony Jameson. Understanding and dealing with usability side effects of intelligent processing. AI Magazine, 30:23–40, 2009.
[4] John McCrae, Guadalupe Aguado de Cea, Paul Buitelaar, Philipp Cimiano, Thierry Declerck, Asuncion Gomez-Perez, Jorge Gracia, Laura Hollink, Elena Montiel-Ponsoda, and Dennis Spohr. Interchanging lexical resources on the Semantic Web. Language Resources and Evaluation, 46(4):701–719, 2012.
[5] Aarne Ranta. Grammatical Framework: Programming with Multilingual Grammars. CSLI Publications, 2011.
[6] Jeroen Van Grondelle and Christina Unger. A Three-Dimensional Paradigm for Conceptually Scoped Language Technology. In Towards the Multilingual Semantic Web. Springer, 2014.
Index of Authors

——/ A /——
Amblard, Maxime . . . 55

——/ B /——
Berger, Ulrich . . . 143
Burnett, Heather . . . 19

——/ C /——
Casadio, Claudia . . . 95
Condoravdi, Cleo . . . 181
Cooper, Robin . . . 155
Cramer, Marcos . . . 125

——/ F /——
Fernando, Tim . . . 31
Fouqueré, Christophe . . . 43

——/ J /——
Jones, Alison . . . 143

——/ K /——
Kanazawa, Makoto . . . 111
Kang, Juyeon . . . 67
Kiślak-Malinowska, Aleksandra . . . 95
Kuznetsov, Stepan . . . 137

——/ L /——
Liefke, Kristina . . . 7

——/ M /——
Maršík, Jiří . . . 55

——/ P /——
Parikh, Rohit . . . 83

——/ Q /——
Quatrini, Myriam . . . 43

——/ R /——
Ranta, Aarne . . . 3
Richardson, Kyle . . . 181
Rypáček, Ondřej . . . 165

——/ S /——
Sadrzadeh, Mehrnoosh . . . 165
Saint-Dizier, Patrick . . . 67
Seisenberger, Monika . . . 143
Shimada, Junri . . . 111
Sikka, Vishal . . . 181
Smit, Frank . . . 195
Suenbuel, Asuman . . . 181

——/ U /——
Unger, Christina . . . 195

——/ V /——
Van Grondelle, Jeroen . . . 195

——/ W /——
Waldinger, Richard . . . 181